# COLOR AND FORM PERCEPTION: STRADDLING THE BOUNDARY

EDITED BY: Galina V. Paramei and Cees van Leeuwen

PUBLISHED IN: Frontiers in Psychology

#### *Frontiers Copyright Statement*

*© Copyright 2007-2016 Frontiers Media SA. All rights reserved. All content included on this site, such as text, graphics, logos, button icons, images, video/audio clips, downloads, data compilations and software, is the property of or is licensed to Frontiers Media SA ("Frontiers") or its licensees and/or subcontractors. The copyright in the text of individual articles is the property of their respective authors, subject to a license granted to Frontiers.*

*The compilation of articles constituting this e-book, wherever published, as well as the compilation of all other content on this site, is the exclusive property of Frontiers. For the conditions for downloading and copying of e-books from Frontiers' website, please see the Terms for Website Use. If purchasing Frontiers e-books from other websites or sources, the conditions of the website concerned apply.*

*Images and graphics not forming part of user-contributed materials may not be downloaded or copied without permission.*

*Individual articles may be downloaded and reproduced in accordance with the principles of the CC-BY licence subject to any copyright or other notices. They may not be re-sold as an e-book.*

*As author or other contributor you grant a CC-BY licence to others to reproduce your articles, including any graphics and third-party materials supplied by you, in accordance with the Conditions for Website Use and subject to any copyright notices which you include in connection with your articles and materials.*

> *All copyright, and all rights therein, are protected by national and international copyright laws.*

*The above represents a summary only. For the full conditions see the Conditions for Authors and the Conditions for Website Use.*

ISSN 1664-8714 ISBN 978-2-88919-857-3 DOI 10.3389/978-2-88919-857-3


## **COLOR AND FORM PERCEPTION: STRADDLING THE BOUNDARY**

Topic Editors:

**Galina V. Paramei,** Liverpool Hope University, UK

**Cees van Leeuwen,** KU Leuven, Belgium

Image taken from: McCann JJ, Parraman C and Rizzi A (2014) Reflectance, illumination, and appearance in color constancy. Front. Psychol. 5:5. doi: 10.3389/fpsyg.2014.00005

Starting from psychophysics, over the last 50 years, most progress in unravelling the mechanisms of color vision has been made through the study of single cell responses, mainly in LGN and striate cortex. A similar development in the study of form perception may seem to be underway, centred on the study of temporal cortex. However, because of the combinatorial characteristics of form perception, we are also observing the opposite tendency: from single-cell activity to population coding, and from static receptive field structures to system dynamics and integration and, ultimately, a synthetic form of psychophysics of color and form perception. From single cells to system integration: it is this development the present Research Topic wishes to highlight and promote. How does this development affect our views on the various attributes of perception? In particular, we are interested in the extent to which evolving knowledge in the field of color perception is relevant within a developing integrative framework of form perception.

The goal of this Research Topic is to bring together experimental research encompassing both color and form perception. For this volume, we planned a broad scope of topics – on color in complex scenes, color and form, as well as dynamic aspects of form perception. We expect that the Research Topic will be attractive to the community of researchers whose work straddles the boundary between the two visual perception fields, as well as to the wider community interested in integrative/systems neuroscience.

**Citation:** Paramei, G. V., van Leeuwen C., eds. (2016). Color and Form Perception: Straddling the Boundary. Lausanne: Frontiers Media. doi: 10.3389/978-2-88919-857-3

# Table of Contents


#### **1. Population code of color in the early visual cortex**


Christian J. Kellner and Thomas Wachtler

#### **2. Integration of color and orientation**

*Processing bimodal stimuli: integrality/separability of color and orientation* David L. Bimler, Chingis A. Izmailov and Galina V. Paramei

#### **3. The watercolor effect and filling-in**


Mark Vergeer, Stuart Anstis and Rob van Lier

#### **4. Color-shape associations**


#### **5. Color constancy for 3D objects**

*The effect of background and illumination on color identification of real, 3D objects* Sarah R. Allred and Maria Olkkonen

*Reflectance, illumination, and appearance in color constancy* John J. McCann, Carinna Parraman and Alessandro Rizzi

## Editorial: Color and Form Perception: Straddling the Boundary

#### Galina V. Paramei <sup>1</sup> \* and Cees van Leeuwen<sup>2</sup>

<sup>1</sup> Department of Psychology, Liverpool Hope University, Liverpool, UK, <sup>2</sup> Laboratory for Perceptual Dynamics, Faculty of Psychology and Educational Sciences, KU Leuven, Leuven, Belgium

#### Keywords: color and form relationship, early visual cortex, distributed processing, complex selectivity of neurons, contour-based filling-in

#### **The Editorial on the Research Topic**

#### **Color and Form Perception: Straddling the Boundary**

For many years, the dominating stance in neuroscience was that visual information processing is characterized by feature analysis (Hubel and Wiesel, 1959), followed by convergence and synthesis in a cascade of information processing stages (Hubel and Livingstone, 1987). In this cascade, color and features, such as orientation of achromatic contour segments, are initially separate (Zeki, 1978). So the question of how color and form perception are related was simply: At what level of processing do chromatic and achromatic features come together?

This question has taken a different form today. In the present volume, whereas Moutoussis presents a contemporary version of this classical view, Rentzeperis et al. argue that neuroscience has moved on to accommodate broadband selectivity and population coding of sensory information, as well as lateral and feedback connections, enabling context-selective tuning of receptive fields. This means that the neural architecture, as understood today, enables a broad variety of perceptual integration functions.

Therefore, we should not be surprised that integration of color and form appears at different levels and in various domains, from integration of color and orientation, through dynamic filling-in (as in the watercolor effect), to higher-order processes, such as implicit associations of color and shape in aesthetic judgments and color constancy for 3D objects.

These different topics are brought together in the present E-Book. We expect that the collection of articles will be attractive to the community of researchers whose work straddles the boundary between the two visual perception fields, color and form perception, as well as to the wider community interested in integrative/systems neuroscience.

### POPULATION CODING OF COLOR IN EARLY VISUAL CORTEX

Moutoussis revisits the classical view that at an early stage, form is processed by several independent systems that interact with each other, each one having different tuning characteristics in color space. At later processing stages, mechanisms emerge that are able to combine information coming from different sources. Rentzeperis et al. review classical psychophysical and neurophysiological studies on color and form perception from the perspective of recent developments in population coding. Color is typically believed to be encoded in the human retina in L-M and S/(L+M) opponent streams that are kept separate in the LGN. But in the early visual cortex, color selectivity is more widely varied as well as location-specific. Kellner and Wachtler show that such distributed selectivities may depend on the spatio-chromatic processing in the retina, suggesting that properties of the retinal signal play a role in shaping the cortical population code.

Edited and reviewed by: Rufin VanRullen, Centre de Recherche Cerveau et Cognition, France

> \*Correspondence: Galina V. Paramei parameg@hope.ac.uk

#### Specialty section:

This article was submitted to Perception Science, a section of the journal Frontiers in Psychology

Received: 17 January 2016 Accepted: 19 January 2016 Published: 09 February 2016

#### Citation:

Paramei GV and van Leeuwen C (2016) Editorial: Color and Form Perception: Straddling the Boundary. Front. Psychol. 7:104. doi: 10.3389/fpsyg.2016.00104

### INTEGRATION OF COLOR AND ORIENTATION

Bimler et al. studied how color and line orientation, two low-level visual attributes, interact in their contribution to global stimulus dissimilarity. The authors demonstrate that the degree of color and orientation integrality may vary significantly across individuals: rather than being either separable or integral, these attributes combine with variable weights, a finding that might indicate an inter-individual shift between uncorrelated and correlated feature conjunctions in primary visual cortex.

### THE WATERCOLOR EFFECT AND FILLING-IN

Three studies presented in this section involve the effect of color filling-in, also known as the watercolor effect (Pinna, 1987; Pinna et al., 2001). This is the effect of an illusory color that fills in between two enclosing bichromatic contours. Reeves et al. study the microgenesis of the illusion. They observe that the effect initially arises fast, within the first 100 ms from presentation and only during the presence of the eliciting stimulus. Already at this early stage, the meaning of the stimulus recognized as the "figure" facilitates the effect. Hazenberg and van Lier compare the watercolor illusion with its afterimage. They demonstrate that "watercolor afterimages" also show effects of filling-in but, despite the similarity, differ noticeably from the watercolor effect itself. Vergeer et al. study color averaging, a form of homogenization of color within an object contour that depends on the shape and luminance of the contour. Homogenization serves to enhance identity of an enclosed surface, as a distinct color percept, while differentiating it from its surrounding as part of the process of representing a world of objects.

### COLOR-SHAPE ASSOCIATIONS

Wassily Kandinsky claimed the existence of preferential associations between color and form: for instance, "yellow triangle, red square, blue circle" would make better color-form combinations than, say, yellow square, red circle, or blue triangle. Makin and Wuerger explore the existence of inherent color-form associations. The Implicit Association Test, however, failed to provide evidence for such a relationship underlying the perception of color and form. In comparison, Holmes and Zanker suggest stable associations of color and shapes may exist at the level of aesthetic preference, as assessed by a Gaze Driven Evolutionary Algorithm. Notably, while being consistent for individuals, the preferences for color-shape combinations are found to strongly vary between individuals.

### COLOR CONSTANCY FOR 3D OBJECTS

Color constancy has a function in supporting object identity under different conditions of illumination. Whereas this is a well-established phenomenon for 2D surfaces, the question of whether 3D objects show color constancy has been relatively unexplored. Two studies take up this issue. Allred and Olkkonen asked observers to make color matches to 3-dimensional objects (cubes) under varied conditions of illumination. They find that, in contrast to 2D scenes, an illuminant shift increases variability in color matches, but this is reduced by embedding the object within a background. The findings indicate that the addition of a background improves object segregation and, hence, stability of color identification. McCann et al. study the effect of non-uniform illumination that occurs as a consequence of having 3D blocks casting shadows on illuminated surfaces. The authors demonstrate that changes in color appearance depend on the spatial information in both the illumination and the reflectances of objects. They show that non-uniform illumination results in considerable variability in the sensation of lightness, hue, and chroma in 3D objects and departures from perfect constancy.

#### AUTHOR CONTRIBUTIONS

All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.

#### REFERENCES

Zeki, S. M. (1978). Functional specialization in the visual cortex of the rhesus monkey. Nature 274, 423-428.

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Paramei and van Leeuwen. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

## The physiology and psychophysics of the color-form relationship: a review

#### *Konstantinos Moutoussis\**

*Department of History and Philosophy of Science, National and Kapodistrian University of Athens, Athens, Greece*

The relationship between color and form has been a long-standing issue in visual science. A picture of functional segregation and topographic clustering emerges from anatomical and electrophysiological studies in animals, as well as from brain imaging studies in humans. However, one of the many roles of chromatic information is to support form perception, and in some cases it can do so in a way superior to achromatic (luminance) information. This occurs both at an early, contour-detection stage and at later, higher stages involving spatial integration and the perception of global shapes. Pure chromatic contrast can also support several visual illusions related to form perception. On the other hand, form seems a necessary prerequisite for the computation and assignment of color across space, and there are several respects in which the color of an object can be influenced by its form. Evidently, color and form are mutually dependent. Electrophysiological studies have revealed neurons in the visual brain able to signal contours determined by pure chromatic contrast, the spatial tuning of which is similar to that of neurons carrying luminance information. It seems that, especially at an early stage, form is processed by several independent systems that interact with each other, each one having different tuning characteristics in color space. At later processing stages, mechanisms able to combine information coming from different sources emerge. A clear interaction between color and form is manifested by the fact that color-form contingencies can be observed in various perceptual phenomena such as adaptation aftereffects and illusions. Such an interaction suggests a possible early binding between these two attributes, something that has been verified by both electrophysiological and fMRI studies.

#### *Edited by:*

*Cees Van Leeuwen, KU Leuven, Belgium*

#### *Reviewed by:*

*Ilias Rentzeperis, RIKEN Brain Science Institute, Japan Stewart Shipp, University College London, UK*

#### *\*Correspondence:*

*Konstantinos Moutoussis, Department of History and Philosophy of Science, National and Kapodistrian University of Athens, Athens, Greece kmoutou@phs.uoa.gr*

#### *Specialty section:*

*This article was submitted to Perception Science, a section of the journal Frontiers in Psychology*

*Received: 01 November 2014 Accepted: 03 September 2015 Published: 03 November 2015*

#### *Citation:*

*Moutoussis K (2015) The physiology and psychophysics of the color-form relationship: a review. Front. Psychol. 6:1407. doi: 10.3389/fpsyg.2015.01407*

Keywords: color, form, vision, brain, physiology, psychophysics

### On the Role of Color Vision

Color vision is an evolutionary gift to some organisms. As everything in nature has a purpose, it makes sense to ask what color vision is for, what type of purpose does it serve, and how does it improve the way in which visual perception supports knowledge acquisition about the world. In addition to esthetically enriching our visual experience, color vision has several more practical applications as well (for a review on the functions of color vision see Mollon, 1989). The most obvious one is that color gives information about vital things such as the state of ripeness of fruit, the suitability of flowers, as well as the presence of water (revealed by the color of the vegetation of a place). Furthermore, colors are also often used as sexual signals for reproduction, as well as indicators in estimating the health or emotional state of others. A major role of color vision, however, is to allow us to detect targets against dappled or variegated backgrounds, where lightness is varying randomly. Color is a linking feature of perceptual segregation serving the detection of targets, and thus the identification and categorisation of objects in the environment. But why should the brain bother to use chromatic input in analyzing shape, when much more detailed information is provided by the luminance system? Object segregation can also be performed by the latter, but color vision has the big advantage of being indifferent to local changes in illumination. Luminance and color edges usually occur together, but in the case of illumination edges, shadows, and highlights, the conservation of color supports the integrity of an object, making chromatic variations more reliable indicators of material boundaries. Thus, color vision helps segment a retinal image into perceived material and illumination components, which is critical for object perception (see Cavanagh, 1991; Shevell and Kingdom, 2008). 
Sensitivity to both luminance and chromatic contrast would be advantageous for an organism, providing redundant sources of information that would improve contour detection in noisy images. Furthermore, the information is not always redundant: a detailed study on the color-contrast and luminance-contrast statistics of natural images has shown that both variations occur equally often and are independent of each other (Hansen and Gegenfurtner, 2009).

Although the low-level characteristics of luminance-derived form vision are slightly better than those of chromaticity-derived form vision (see below), it has been demonstrated experimentally that the latter can sometimes do better than the former. For example, the strength and priority of color information in perceptual segregation is evident in a study in which color noise was shown to strongly interfere in an orientation-based texture-segregation task, rendering objects invisible to normal observers (Morgan et al., 1992). It is interesting that red-green color-blind dichromats escape the color camouflage and perform better than normal subjects in this task. Furthermore, it has been shown that even the S-cone signal alone can be used to detect chromatic boundaries in the absence of luminance contrast (Conway, 2014). It therefore seems that color information can support form perception in a way that can sometimes be superior to luminance information.

An important question that follows is whether these two different systems support form perception independently, or whether, and at what stage, they feed into a common mechanism. The fact that, in vision, we cannot concurrently entertain different perceptual organizations might indicate a common pattern-recognition system (Mollon, 1989). On the other hand, color and form are two different attributes of visual experience, and it would make sense if they were processed separately by independent, functionally specialized systems. Such a functional segregation has been supported by the neurobiological architecture of the visual system (see Zeki and Shipp, 1988; Zeki, 1993), as well as by psychophysical studies showing that different visual attributes, including color and form, are perceived independently and at different times (Moutoussis and Zeki, 1997a,b; Moutoussis, 2012). The question of separating chromatic from spatial vision is as old as Chisholm (1869) and will be the main topic of the present review.

### Early Separation of Brain Functions

Spectral variations in the environment are extracted by cone opponent mechanisms, which give rise to the chromatic system, whereas intensity variations are extracted by cone additive mechanisms, which give rise to the luminance system. The latter can use luminance contrast to support form perception (*form-from-luminance*). On the other hand, the variation in the wavelength composition of light reflected by the various parts of the visual field can serve two functions: the first is to be used in order to segregate and segment the external world into various objects (*form-from-color*), and the second is to assign to each of these objects a particular color (*color-from-color*). It therefore seems that we are equipped with two separate chromatic systems, one which is involved in the detection of edges and the analysis of spatial detail, and one which calculates the color of each object in the visual field (Mollon, 1989). It is obvious that the former system is able to support, and is directly involved with, form perception. However, the relationship between the second color system and the processing of form (whether luminance or chromaticity based) is also interesting, since they seem to be closely interlinked: a fundamental parameter of most color-computation algorithms is the spatial (and spectral) structure of the light entering the eye.
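As a rough illustration of the distinction between cone-additive and cone-opponent mechanisms, the sketch below computes a luminance-like signal and two opponent signals from cone excitations. The unweighted sums and differences, the function names, and the patch values are all illustrative assumptions rather than measured physiology; the opponent pairings simply follow the common L-M and S-(L+M) convention.

```python
def luminance(l, m, s):
    """Cone-additive signal (assumed here to be an unweighted L + M sum)."""
    return l + m

def red_green(l, m, s):
    """Cone-opponent signal contrasting long- and middle-wave cones."""
    return l - m

def blue_yellow(l, m, s):
    """Cone-opponent signal contrasting short-wave cones with L + M."""
    return s - (l + m)

def edge_signal(mechanism, patch_a, patch_b):
    """Absolute difference in a mechanism's output across a border."""
    return abs(mechanism(*patch_a) - mechanism(*patch_b))

# Hypothetical (L, M, S) cone excitations for two adjacent patches.
# Both have the same L + M, so the border is isoluminant in this toy model.
reddish = (0.7, 0.3, 0.2)
greenish = (0.3, 0.7, 0.2)

lum_edge = edge_signal(luminance, reddish, greenish)
rg_edge = edge_signal(red_green, reddish, greenish)
by_edge = edge_signal(blue_yellow, reddish, greenish)
print(f"luminance edge: {lum_edge:.2f}, red-green edge: {rg_edge:.2f}")
```

In this toy case only the red-green mechanism signals the border, which is the sense in which a chromatic system can support form-from-color when luminance contrast is absent.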

Land's Retinex Theory of Color Vision is one such example. In this algorithm, the relative amounts of long-, middle- and short-wave light in each area of the visual field (a multi-colored 'Mondrian' was used in his demonstrations) are compared to the relative amounts of long-, middle-, and short-wave light in the other areas of the visual field, in order to calculate three 'lightness-records' for each area and thus determine its color (Land, 1974). Each 'lightness-record' is a calculation of the relative amount of the particular wavelength reflected from this area with respect to the amount of the same wavelength reflected by other areas in the scene – a spatial comparison. Before any such computation can be executed, the visual field has to be first segmented into different areas. It therefore seems that form perception, or at least form computation, is a necessary prerequisite for the calculation of perceived color.
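The spatial comparison described above can be sketched in a few lines. This is a deliberately simplified stand-in, not Land's published procedure: it assumes the scene is already segmented into named areas, and it replaces Retinex's comparisons between areas with a single ratio of each area to the scene average in each waveband. The patch names, reflectances, and illuminants are hypothetical.

```python
from math import isclose

def lightness_records(scene):
    """Map each area to three ratios, one per waveband.

    scene: dict of area name -> (long, middle, short) energy reaching the eye.
    Each ratio is the area's energy in a band divided by the mean energy
    of that band over all areas (an assumed form of spatial comparison).
    """
    bands = range(3)
    n = len(scene)
    band_means = [sum(energy[b] for energy in scene.values()) / n for b in bands]
    return {
        area: tuple(energy[b] / band_means[b] for b in bands)
        for area, energy in scene.items()
    }

def illuminate(reflectances, illuminant):
    """Energy reaching the eye = surface reflectance x illuminant, per band."""
    return {
        area: tuple(r * e for r, e in zip(refl, illuminant))
        for area, refl in reflectances.items()
    }

# Hypothetical three-patch 'Mondrian': reflectance in long, middle, short bands.
mondrian = {
    "red_patch": (0.8, 0.2, 0.1),
    "green_patch": (0.2, 0.7, 0.2),
    "blue_patch": (0.1, 0.2, 0.9),
}

# The same surfaces under a neutral and a strongly reddish uniform illuminant.
neutral = lightness_records(illuminate(mondrian, (1.0, 1.0, 1.0)))
reddish = lightness_records(illuminate(mondrian, (2.0, 1.0, 0.5)))

for area in mondrian:
    same = all(isclose(a, b) for a, b in zip(neutral[area], reddish[area]))
    print(area, "records unchanged by illuminant:", same)
```

Because every area in a band is scaled by the same illuminant factor, the ratios cancel that factor, so the triplet assigned to each area stays the same under a uniform change of illumination. The sketch also makes the dependence on form explicit: the computation cannot start until the scene has been divided into areas.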

The brain is equipped with neurons that could carry out such calculations (see below). An important distinction would be to ask whether a neuron is color-selective, in which case participation in color-calculations is more likely, or simply color-sensitive (signaling pure color-contours irrespective of the actual color), in which case a contribution to form-from-color is more probable. The distinction can be pictured as follows: those neurons that are interested in the spatial arrangement of the color-pair, reminiscent of simple cells in terms of receptive field construction, vs. those that are not, resembling complex cells in terms of receptive field construction. The terms *color-signed* and *color-unsigned* have been previously used for these two types of color-processing mechanisms (Dobkins and Albright, 1994), and will also be used in this review. At initial stages, both classes (signed and unsigned) of cortical neurons likely receive inputs from just one of the two retinogeniculate color channels (parvo or konio). However, the distinction remains operative at subsequent stages, after cortical convergence of the parvo and konio channels, where unsigned neurons are capable of responding more universally to any chromatic contrast, whereas others are still selective for the spatial geometry of particular hues, i.e., retain the color-signed property – the signature for color-from-color processing.

The characteristics of color-related neurons are discussed in more detail in the section on electrophysiology. The bearing of color physiology upon color psychophysics (or vice versa) will be noted throughout the text, although this is far from cut-and-dried. As noted above, the guiding principle is that color-sensitive (unsigned) cells should support form-from-color, and color-selective (signed) cells support color-from-color. However, given the possibility that color-signed signals at one stage may be pooled to form color-unsigned signals at a higher stage, it is not possible to make the distinction unequivocally. For example, some neurons in area V1 reportedly combine color selectivity with sensitivity to luminance contrast, making the classification of these cells and their potential contribution to perception rather ambiguous. Thus, to be frank, the conceptual clarity offered by the use of the 'signed/unsigned' terminology is not intended to disguise the fact that the relationship between perception, psychophysics and hierarchical physiology remains a complicated (and unresolved) story.

#### Form can Influence Color Perception

Several psychophysical studies have clearly demonstrated an interaction between color and form perception, with the latter being able to influence the former. Color filling-in is one such phenomenon: in one of the oldest studies of this type, it was shown that, if retinal stabilization is used to render a disk-annulus contour invisible, the color of the annulus fills in the central disk (which in reality has a different color) and makes it also disappear (Krauskopf, 1963). The perception of edges is thus a determinant of the extent of color assignment, and it has been shown that even illusory contours can determine the shape of an area to be filled in by color: because the S-cone system has poor spatial resolution and is thus blind to edges (Mollon, 1995), yellow can be made to bleed inside a gray region of similar red-green excitation, until a luminance or illusory contour is met (Santana et al., 2011). Measurements of the time that it takes to fill in the color of a region argue in favor of independent determinations of the boundary of a chromatic region and of the color of that region (Santana et al., 2011). With transparent motion stimuli, color filling-in is determined by image segmentation and can occur simultaneously and independently at multiple different surfaces, even if these surfaces occupy the same retinotopic positions (Kanai et al., 2006).

It has also been shown that color induction, i.e., the effect of the color of the surround on the color appearance of a central target, is maximal at isoluminance, when there is no luminance contour between the center and the surround (Gordon and Shapley, 2006), suggesting that luminance can suppress color just as color can suppress luminance (Alpern, 1964). Furthermore, color constancy calculations can be influenced by the segmentation of an image in 3D space, as separate constancy computations seem to operate at separate depth planes (Werner, 2006). Another example of how form can determine the color of an object comes from the fact that the latter can be made to vary with the 3D perception of a surface: using goggles to change the appearance of a 3D corner between convex and concave will also change the color appearance of one side of this corner, depending on whether it is perceived as an inner or an outer surface (Bloj et al., 1999). Effects like these can be attributed to the fact that, in addition to on-line computations regarding the light composition reflected from various parts of the visual field, color calculations also take into account prior knowledge regarding factors such as the source and direction of the illumination. Bayesian inference, where the likelihood of a particular percept is not only determined by the current sensory data but also by the various priors of the system, has been extensively used in explaining color vision. A striking example of a cognitive influence in color perception is the demonstration that prior knowledge regarding the color of objects can make achromatic images appear colored (Hansen and Gegenfurtner, 2006a). In an elegant human fMRI study using pattern classification, the neural stamp of such priors was present as early as in area V1 (Bannert and Bartels, 2013).
It thus seems that the mutual influence between form and color extends over all levels of hierarchical processing in the visual system, possibly using forward as well as backward pathways (Zeki and Shipp, 1988; Shipp et al., 2009).
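The Bayesian account mentioned in this section can be made concrete with a toy calculation in which the same sensory evidence leads to different percepts depending on the prior. All numbers, the two candidate percepts, and the function name are hypothetical and chosen only to show the mechanics of Bayes' rule; they are not taken from the cited studies.

```python
def posterior(prior, likelihood):
    """Bayes' rule over a discrete set of candidate percepts."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}

# Hypothetical case: an achromatic image of an object that is usually yellow.
# The sensory data alone slightly favor the 'gray' interpretation.
likelihood = {"gray": 0.6, "yellowish": 0.4}    # P(data | percept)
flat_prior = {"gray": 0.5, "yellowish": 0.5}    # no knowledge about the object
object_prior = {"gray": 0.2, "yellowish": 0.8}  # object is typically yellow

without_knowledge = posterior(flat_prior, likelihood)
with_knowledge = posterior(object_prior, likelihood)

print("flat prior:  ", max(without_knowledge, key=without_knowledge.get))
print("object prior:", max(with_knowledge, key=with_knowledge.get))
```

With a flat prior the most probable percept follows the sensory data, whereas a strong object prior can tip the balance toward the remembered color, in the spirit of the memory-color demonstration cited above.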

### Color-Form Contingency and Double-Tuning

If color and form were processed in an independent manner, there should be no interaction between the two. More specifically, there should be a complete independence between the spatial characteristics of a stimulus, such as orientation or spatial frequency (SF), and its color. However, many psychophysical studies on illusions and perceptual effects resulting from *adaptation* (i.e., from changing the response of a system because of stimulation) reveal contingencies between these two visual attributes. The prevailing idea behind adaptation experiments is that neuronal populations selectively tuned to the adapting stimulus become fatigued after prolonged exposure to the latter, leading to a relatively higher sensitivity of opponent populations and thus to an imbalance of the system (see Kohn, 2007 as well as Thompson and Burr, 2009 for a review, including alternative explanations). Adaptation thus serves as the electrode of the psychophysicist in discovering neurophysiological properties of the brain: if a mechanism adapts, this is taken as evidence that it exists.

Probably the oldest and most widely known example of a contingent aftereffect is the *McCollough effect*, which is an orientation-specific color-aftereffect (see **Figure 1**): if one adapts simultaneously to two color-orientation pairs, the color of the color-aftereffect depends on the orientation of the test stimulus (McCollough, 1965). For example, if one takes two complementary colors such as red and green, and assigns them to a vertical and a horizontal grating respectively during an adaptation period, a vertical achromatic test stimulus will produce a color-aftereffect with a green tinge, and a horizontal achromatic test stimulus will produce a red-tinged aftereffect. Since the visual system has adapted equally to the two colors, the presence of a color-aftereffect suggests that color and orientation are encoded as a couple. The effect does not show interocular transfer, suggesting that it takes place early in vision – area V1 actually being the only candidate, since there is no orientation selectivity before and no monocularity after. The color specificity of the aftereffect suggests the involvement of striate neurons that are selective to both color and orientation. Interestingly, if the two adaptation pairs are presented at high alternation rates that make not only the color-orientation pairing but also the two colors themselves invisible, the aftereffect is still there, further suggesting that this early binding of color and orientation is also preconscious (Vul and MacLeod, 2006).

A similar but opposite contingency between color and form has been demonstrated using the *tilt-aftereffect*: double adaptation to two gratings of different colors, one tilted to the left and one tilted to the right, will make a vertical test grating appear tilted in the direction opposite to that of the adaptation grating of the same color (Held and Shattuck, 1971). The illusion is maximal at around 15° and then declines, in a way that permits one to calculate the width of the orientation tuning of the underlying mechanisms. It should be noted that, in both this and in the original McCollough studies, there was also a luminance contrast present – orientation was not defined purely by color. Thus, at the neuronal level, selectivity to color co-existed with selectivity to orientation which nevertheless was luminance-defined. However, it has also been shown that the normal tilt-aftereffect is equally strong using isoluminant stimuli, that there can be partial interaction between the luminance and the chromatic system in this effect, and that the effect is generally reduced as the difference between adaptation and test colors is increased, revealing in this way the tuning of orientation-specific mechanisms in color vision (Elsner, 1978; Lovegrove and Mapperson, 1981). Collectively, these results support the notion that both the luminance and the chromatic systems are equally efficient in signaling orientation.

In another detailed study of color contingencies in the tilt-aftereffect it was found that, after double adaptation to oppositely oriented colored gratings, the direction of the aftereffect depended on the position of the test in color-space – being maximal when test and adaptor were identical (Flanagan et al., 1990). The color space used is explained in **Figure 2** and is the one defined by Derrington et al. (1984), also known as the DKL color space, the cardinal directions of which accurately describe the color preferences of parvo and konio LGN neurons (magenta/cyan and purple/greenish-yellow) rather than the primary perceptual opponent colors (red/green and yellow/blue). Color gratings were formed by sinusoidal modulations along the cardinal axes, and drifted during adaptation to prevent the formation of static afterimages. Adaptation along an axis 45° to the cardinal ones also gave a partial aftereffect, in accordance with physiological studies that report a broader distribution of preferred color axes amongst early cortical neurons compared to the LGN (Lennie et al., 1990; Kiper et al., 1997). As is also the case with the McCollough effect, cortical mechanisms must be involved in the tilt-aftereffect, since there is no orientation selectivity at a subcortical level. Concerning the color selectivity, the stimulus used by Flanagan et al. (1990) should adapt both color-signed and color-unsigned oriented cells; either class could be capable of generating the observed contingent aftereffect, the implication being that orientation tuning is generated separately within the parvo and konio channels at a cortical level, prior to their convergence. However, when only one color-orientation pair was presented during adaptation, the peak magnitude of the tilt aftereffect was larger, and never fell to zero (across test gratings of varying chromaticity, including achromatic). This suggests the presence of two components for the aftereffect, a baseline non-selective one and one which is tuned to color. The latter can encode form in a similar and equally efficient way to the former, providing strong evidence for the ability of the color system to support form perception independently. The baseline non-selective one, on the other hand, provides evidence for a second stage of form analysis, able to pool across chromatic and luminance inputs. Thus, the results of Flanagan et al. (1990) suggest the existence of several, separate orientation-tuned cortical mechanisms, one of which is achromatic and at least two of which are chromatic, and thus support the idea of different populations of orientation-selective neurons maximally activated by each. The existence of integrative systems at a higher level is supported by further evidence that is presented below.

[**Figure 2** caption, truncated in the source: … actual hue) and the contrast (as a fraction of the maximal modulation along the cardinal axes) (Reprinted with permission from John Wiley & Sons Ltd; Derrington et al., 1984).]
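The idea of a modulation direction that lies between the cardinal axes, such as the 45° adaptation axis mentioned above, can be illustrated with a minimal sketch. It assumes that a direction in the isoluminant plane is specified by an azimuth angle measured from the first cardinal axis and by a contrast expressed as a fraction of the maximal modulation; the function name and the angle convention are hypothetical and are not taken from Flanagan et al. (1990) or Derrington et al. (1984).

```python
import math


def cardinal_components(azimuth_deg, contrast):
    """Split a modulation in a 2D color plane into its cardinal-axis parts.

    Assumption (illustrative only): 0 deg is the first cardinal axis and
    90 deg is the second; contrast is a fraction of maximal modulation.
    """
    theta = math.radians(azimuth_deg)
    # Projection of the modulation vector onto each cardinal axis.
    return contrast * math.cos(theta), contrast * math.sin(theta)


# A 45 deg direction modulates both cardinal axes by the same amount.
first_axis, second_axis = cardinal_components(45.0, 1.0)
```

Under this convention, a stimulus along an intermediate axis drives both cardinal mechanisms at once, which is one way to think about why such adaptation might yield a partial aftereffect.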

Mechanisms responsible for the tilt-aftereffect are probably also responsible for the *tilt-illusion*, in which a tilted annular grating induces an opposite tilt in the orientation of a central target grating (Gibson and Radner, 1937). Unlike early reports claiming the failure of this illusion at isoluminance (Livingstone and Hubel, 1987), the tilt-illusion can be supported by purely chromatic stimuli and maximizes when the test and inducer are of similar chromaticity (Clifford et al., 2003a). This selectivity for color implies the involvement of color-selective cells, whereas the persistence of the effect using different colors reveals the involvement of neurons that are color-sensitive (but not selective) as well. Furthermore, an interaction between luminance and pure color gratings was also reported in this study, suggesting orientation-specific and color-specific lateral inhibition between cells preferring the same orientation. This, in turn, implies a dual selectivity and a color-form interaction at an early stage of visual processing (Clifford et al., 2003a). In a further study from the same group, comparing the tilt-illusion with the tilt-aftereffect, it was confirmed that both are maximal when test and inducer are modulated along the same axes of color space, with this selectivity being slightly more pronounced in the tilt-illusion (Clifford et al., 2003b).

The chromatic characteristics and the hierarchical position (within the visual pathway) of mechanisms involved in the tilt illusion were investigated in a study presenting the target to one eye and the surround to the other, with the two being either of the same or of different colors: it was found that, whereas the binocular mechanism is largely color-invariant, the monocular mechanism is purely chromatic, implying the existence of coupled color-orientation neurons in early visual cortex (Forte and Clifford, 2005). A two-stage model was suggested for the illusion, in which binocular visual mechanisms code for form in a manner that is largely insensitive to chromatic signature, whereas color and orientation processing interact at the monocular stages of visual processing (Forte and Clifford, 2005). Such a model is also in harmony with the lack of interocular transfer in the McCollough effect (McCollough, 1965). The existence of a form system based on pure chromatic information is further supported by experiments showing a decrease in threshold contrasts for vertical and horizontal orientations compared to oblique ones (*oblique-effect*), using isoluminant stimuli (Reisbeck and Gegenfurtner, 1998).

Using the *size-aftereffect* (Blakemore and Sutton, 1969), a contingency between SF and color has also been reported (Hardy and De Valois, 2002). In this experiment, subjects were adapted to colored, luminance-varying gabors, one with a high-SF and one with a low-SF, simultaneously presented at different retinal locations (see Figure 1 of Hardy and De Valois, 2002). If a single color (e.g., red) was used during adaptation, the aftereffect was stronger for a test of the same color but still present for a test of a different color (e.g., green). In double-adaptation experiments using two adaptors of a different color, also having the opposite spatial arrangement with respect to which SF was presented where, the direction of the aftereffect was found to depend on the color of the test (Hardy and De Valois, 2002). It was thus possible to dissociate between the color-selective and the color-insensitive mechanisms of the aftereffect, showing that the former could account for 1/3 of the total effect and was highly selective for both orientation and eye of origin, suggesting an early cortical monocular mechanism and the involvement of color-signed neurons (as in the McCollough effect). The color-insensitive part of the effect showed interocular transfer, revealing a later (higher) mechanism, and was (surprisingly) not so sensitive to orientation.

### Spatial Properties of Color and Luminance Detectors

As mentioned before, cone outputs are combined at the postreceptor level in two different ways: one additive, giving rise to luminance signals with no information regarding the wavelength composition of light, and one subtractive, which preserves the latter and can thus be used for determining the color of objects. An important question for visual psychophysics is to examine the properties and functions of these two separate systems, as well as the relationship between them. According to the old 'coloring book' model, the role of the luminance system is to segment the visual input into different objects and shapes, and the role of the color system is simply to fill in the colors of these objects. An alternative view suggests that chromatic information can also be used to derive form perception, independently of and as efficiently as the luminance system. As we are not blind to stationary *isoluminant* (i.e., of the same luminance) figure-ground spatial arrangements, chromatic information seems capable of supporting the perception of form. The question is how well it can do this, and how its performance compares to that of the luminance system. For example, the spatial resolution of the latter is expected to be much better, since it is only limited by the spacing of cones (of any type). Studies comparing the performance of the two systems with respect to spatial vision, as well as the degree to which they are independent or interact with one another, have shed some light on these matters. Detection thresholds, subthreshold (i.e., with intensities below threshold) summation, adaptation-aftereffects and masking are the most common psychophysical tools that have been used in this quest.

One way to compare the properties of the color and luminance systems is to estimate their sensitivity by plotting *detection thresholds* as a function of *contrast*. In this way, the minimum contrast necessary in order to perceive the stimulus is estimated, and gives an idea of the sensitivity of the system to the particular attribute. There is a technical issue, however, when comparing the sensitivities of the color and luminance system, as the definition of contrast is not as straightforward in the former as it is in the latter. Luminance is what is called a *prothetic* continuum ('how much?'), and thus has a measurable intensity (total amount of energy) that can be used in contrast calculations (e.g., [max-min]/[max+min]). Color, on the other hand, is a *metathetic* continuum ('what kind?'), having no measurable intensity. Different studies have used different methods to define color contrast, usually in terms of graphics display capabilities, or with respect to an individual subject's absolute detection thresholds. Such arbitrary definitions are useful in calculating things such as the sensitivity of the chromatic system with respect to SF, but leave the validity of a direct comparison between color and luminance performance an open question.
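The luminance contrast formula quoted above, [max-min]/[max+min], is commonly called Michelson contrast. The following is a minimal sketch of that calculation; the function name and the example luminance values are illustrative and do not come from any of the cited studies.

```python
def michelson_contrast(l_max, l_min):
    """Return (max - min) / (max + min) for two luminance values.

    Both inputs are assumed to be non-negative and in the same units
    (for example cd/m^2); the values used below are made up.
    """
    if l_min < 0 or l_max < l_min:
        raise ValueError("expected 0 <= l_min <= l_max")
    total = l_max + l_min
    if total == 0:
        raise ValueError("contrast is undefined when both values are zero")
    return (l_max - l_min) / total


# A grating whose luminance swings between 25 and 75 units.
grating_contrast = michelson_contrast(75.0, 25.0)
```

No equally direct calculation is available for chromatic stimuli, which is why the studies discussed here fall back on display-based or threshold-based definitions of color contrast.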

Along these lines, it has been shown that pure-color form is lost at high frequencies (see Elsner, 1978) and that, in general, the luminance system is considered to have a higher spatial (and temporal) resolution than the chromatic one (Kelly, 1983). The contrast-sensitivity function for color has been found to decrease monotonically with frequency, whereas the luminance one has drops at both the high and the low spatial frequencies (De Valois and Switkes, 1983; Mullen, 1985). When comparing the sensitivity of the two systems with respect to SF and orientation, luminance mechanisms show slightly lower contrast-detection thresholds, compared to color mechanisms (Webster et al., 1990). Such a comparison is made by normalizing luminance and color thresholds in terms of equal multiples of their respective contrast detection thresholds. Two types of tasks, across variable contrasts, were used by Webster et al. (1990): a detection task, in which subjects had to decide in which of two time intervals the stimulus appeared, and a discrimination task, in which subjects had to decide which type of grating was presented. It was shown that observers could reliably discriminate the orientation and SF of chromatic gratings even at the limit of the detection thresholds. Furthermore, they could also discriminate between luminance and chromatic gratings, as well as between different types of chromatic gratings at their detection contrasts, suggesting that the mechanisms involved in the detection of these stimuli are selective for both spatial and chromatic information.

Another way to examine the independence between different detection mechanisms in the brain is adaptation. Using adaptation-stimuli that isolate the S-cones, threshold-elevations that were specific to both orientation and color have been reported (Stromeyer et al., 1980). Results from this study are a clear demonstration of color-selective spatial mechanisms, as threshold elevation was not observed for colors other than the test. The effect did not show interocular transfer and was eliminated if either the test color or the test orientation were different from those of the adapting stimulus, revealing a combined orientation-color signal at an early stage of processing. Such evidence for a monocular color-orientation signal has also been reported by other studies (see both above and below). With respect to SF, simultaneous but opposite size-aftereffects have been demonstrated after double adaptation to luminance and isoluminant gratings, suggesting that chrominance and luminance channels perform similar but independent SF analysis of the image (Favreau and Cavanagh, 1981). No interocular transfer was found for the isoluminant signal in this study, thus pointing toward the operation of early-stage mechanisms. Orientational anisotropy has been reported for the detection of chromatic gratings, also suggesting an explicit representation of the orientation of chromatic stimuli (Murasugi and Cavanagh, 1988). Furthermore, adapting to isoluminant gratings produced contrast-threshold elevations that were orientation and SF specific, just as is the case with adaptation to luminance gratings (Bradley et al., 1988). In this study, there was little cross-adaptation between luminance and color, pointing toward separate cell populations rather than a common form-system deriving input from both. Furthermore, the specificity with respect to the color of the stimulus might suggest the involvement of color-selective neurons.

The independence and interaction between different systems can also be revealed by the extent to which one can *mask* the other. In these types of experiments, the presence of a masking stimulus (usually referred to as *mask*) interferes with the detection of a target stimulus (usually referred to as *test*) presented closely in time. The mask can be presented slightly before, after, or at the same time as the test and usually results in increasing detection thresholds (although facilitation can occur with subthreshold masks). The idea is that if the mask has an effect on detection, then the two must be processed either by the same underlying neuronal mechanism or by different mechanisms that nevertheless interact with each other. If, on the other hand, the mask has no effect, it is assumed that the two are processed by independent mechanisms in the brain. For example, if masking effects are present only when test and mask have the same SF, an architecture characterized by separate spatial-frequency channels is implied.

Masking using luminance and isoluminant gratings has revealed a similar specificity to SF, suggesting that the color system, just like the luminance system (Campbell and Robson, 1968), also consists of bandpass (i.e., being of a specific frequency range) spatial filters (De Valois and Switkes, 1983). Within a system, spatial-frequency specificity was more pronounced for the luminance than for the color system, suggesting the existence of more broadly tuned spatial-frequency channels for the latter. Spatial-frequency specificity was even more pronounced when masking across systems, with the masking of color by luminance being the most stringent (identical spatial frequencies required). This asymmetry indicates the possibility of asymmetric lateral inhibitory interactions at the neuronal level, with chromatic information dominating luminance information (De Valois and Switkes, 1983). These results clearly demonstrate that the two systems do not show complete independence and are extended in a subsequent study from the same group (Switkes et al., 1988), in which the mask was presented at subthreshold contrasts (suprathreshold masks were used in De Valois and Switkes, 1983). It has been discussed earlier (see above) that, due to shadowing, chromatic contrast is more reliable than luminance contrast in figure-ground segregation, and thus such an asymmetric interaction could be beneficial for the organism. Yet another masking study, using isoluminant stimuli of various spatial frequencies, has also verified that the chromatic contrast sensitivity function is the upper envelope of a range of bandpass mechanisms (Losada and Mullen, 1994). Close spatial frequencies were again found to be more effective in both facilitation and masking than spatial frequencies further away from the test. Therefore, as is the case with the luminance system, the color system also consists of different spatial-frequency channels but with a slightly broader tuning than luminance.

The sensitivity of the visual system is not always measured with respect to the total absence of a stimulus as a reference point, but also in cases where an initial stimulus (*pedestal*) is present. A subthreshold pedestal can either lead to facilitation of target detection or interfere with the latter, as in masking. Experiments showing that a color pedestal does not add to a luminance pedestal in facilitating luminance detection but produces masking instead (and vice versa), also point toward the existence of separate color and luminance spatial-mechanisms (Mullen and Losada, 1994). In the same study, subjects showed a decrease in test threshold with added luminance contrast (luminance-luminance facilitation), despite the initial threshold elevation (color-luminance masking) produced by a fixed chromatic pedestal. Subthreshold facilitation is thus unaffected by the presence of a suprathreshold color contrast mask, i.e., the opposite effects of masking and facilitation can occur simultaneously. These results can be explained by the presence of separate color and luminance mechanisms, which nevertheless interact with each other (Mullen and Losada, 1994). By testing subthreshold summation between color and luminance gabors over a wide range of spatio-temporal frequencies, the same group has failed to find any linear summation between color and luminance (Mullen et al., 1997). In this study, detection of the stimulus was achieved when either mechanism reached its own threshold ('inclusive OR rule'), thus supporting the existence of two separate systems that contribute independently to detection using what is referred to as *probability summation* (but see also Gur and Akri, 1992). Similar results have been reported between all the possible combinations of the cardinal axes of the DKL color space (see **Figure 2**), normalized to detection threshold (Mullen and Sankeralli, 1999). 
In this study, subthreshold summation revealed a stochastic independence of the 'red–green,' 'blue–yellow,' and luminance post-receptoral mechanisms, whose joint presentation at near threshold contrast raises the likelihood of detection through probability summation. It must be noted, however, that such results do not guarantee that independence is also present at suprathreshold levels (i.e., above threshold).
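The 'inclusive OR rule' described above can be made concrete with a minimal sketch of probability summation. It assumes that each mechanism detects the stimulus independently with its own probability, and that the stimulus is seen if at least one mechanism detects it; the function name and the example probabilities are illustrative and are not data from Mullen et al. (1997) or Mullen and Sankeralli (1999).

```python
def probability_summation(p_detect):
    """Probability that at least one independent mechanism detects.

    p_detect: per-mechanism detection probabilities, each in [0, 1].
    Independence between mechanisms is the key assumption here.
    """
    p_all_miss = 1.0
    for p in p_detect:
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
        # All mechanisms must miss for the stimulus to go undetected.
        p_all_miss *= 1.0 - p
    return 1.0 - p_all_miss


# Two mechanisms that each detect on half of the trials.
combined = probability_summation([0.5, 0.5])
```

Under this rule, presenting color and luminance components together raises the detection rate without any pooling of their signals, which is what distinguishes probability summation from linear summation within a single mechanism.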

#### Global-Form Computations

The detection and discrimination of local contours is the first stage in any computations leading to the perception of form. In the experiments described in the previous section, local contours were defined by luminance or color or both, and sensitivity to and interaction between luminance and chromatic stimulus characteristics were examined. Global form perception, however, is not fully characterized by the sensitivity and tuning characteristics of the different systems in the orientation and SF domain. It also requires an account of how the information of local detectors is integrated across space, and so one can ask whether chromatic and luminance local information are equally effective at this second stage, and whether these two systems remain separate or whether there are mechanisms able to pool local signals coming from different sources. Due to the possibility of such a pooling mechanism, a performance minimum at isoluminance in spatial-integration tasks does not necessarily imply a superiority of the luminance system, since it might also be the result of an additive mechanism to which both systems contribute (Kingdom et al., 1992). Such performance minima at isoluminance have been widely used to demonstrate a superiority of the luminance system in particular perceptual domains (e.g., motion – see Cavanagh et al., 1984). However, in order to infer a superiority of the luminance system over the color one, performance based on isoluminant and isochromatic stimuli requires direct comparison, raising the problem of calibrating their relative contrast, as noted above. This section describes experiments which are carefully designed to address the question of whether color and luminance information are equally effective in spatial-integration, and whether or not they combine at the level of global-form perception.

In one such study, in which low SF orientation pop-out was used in order to segregate figure from ground, after normalizing by the threshold for individual-element detection, no difference in the detection threshold of the figure was found between color-defined and luminance-defined gratings (McIlhagga et al., 1990). The results imply that such an automatic, pre-attentive texture-segregation process can be accomplished using chromatic information alone, just as efficiently as when using luminance information. Similarly, isoluminant stimuli were found to be as effective as luminance ones in a spatial integration task, in which subjects had to judge the collinearity between 2 and 16 random elements (Kingdom et al., 1992). In yet another spatial integration task, it was shown that both luminance and chromatic stimuli employ probability summation of the orientation and contrast cues in order to detect a target within distractors – a process which also involves higher stages of global spatial processing (Reisbeck and Gegenfurtner, 1998). In this study, functions of orientation thresholds with respect to contrast were shown to have a similar shape for luminance and color, revealing similar orientation discrimination mechanisms (color being slightly more sensitive at low SF and vice versa at high SF). All these findings thus suggest that color contrast can be used as efficiently as luminance contrast in spatial integration tasks.

In another type of spatial task, in which the local orientation continuities of gabor elements are integrated into a global form pattern, the performance of the color system was found to be comparable to that of the luminance system, especially at high contrasts (McIlhagga and Mullen, 1996). When color-defined and luminance-defined gabors were combined, performance was much poorer compared to the homogeneous conditions, suggesting separate integration systems with a limited amount of interaction, rather than a single integration mechanism which is indifferent to the nature of the signal (McIlhagga and Mullen, 1996). In another contour integration task of this type, in which the element-to-element orientation curvature as well as the individual element contrast were the independent variables, performance was found to be similar if either luminance, or isoluminant 'red/green,' or 'blue/yellow' gabors were used as elements (Mullen et al., 2000). In this study, orientation discrimination was much weaker for the 'blue/yellow' stimuli. However, at threshold contrast for path detection all individual gabors were at suprathreshold contrasts, i.e., path detection performance was not limited by an inability to see the individual elements. It thus follows that the contrast limitation for this task must occur at a (possibly common) higher stage. Curvature, which was the other limiting factor for path detection in this study, was also found to affect the three types of stimuli in the same manner. So, although the optimal performances of the chromatic mechanisms fall slightly below that of the luminance mechanism, no evidence was found that they are deficient in contour integration. The authors also examined the possibility of a common integration process by using paths whose elements (gabors) alternated between two channels. 
This was found to impair performance, but not to the (very poor) level predicted by a model of performance using totally independent mechanisms (that would treat a 10-element contour as two separate five-element contours). Thus the spatial integration process responsible for path extraction was inferred to have some capacity to pool across the primary channels, but the nature of this process remained uncertain as it was not blind to the chromaticity of the gabor elements. Furthermore, it was also shown to be susceptible to variation in their phase, even with uniform chromaticity, suggesting some inherent involvement of color-signed mechanisms (Mullen et al., 2000).

In a subsequent study by the same group, the performance of the chromatic mechanisms was found to decline more steeply with increased element separation, suggesting that perhaps contour integration by this system relies more on short-range interactions compared to the more long-range interactions of the achromatic system (Beaudot and Mullen, 2003). In a different type of global-shape-discrimination study, in which radial-frequency was varied and subjects had to judge the circularity of a given pattern, the best performance was achieved with luminance stimuli and the worst with 'blue/yellow' (Mullen and Beaudot, 2002). Nevertheless, these differences were not very large and, at the highest contrast levels, chromatic shape discrimination could also reach hyperacuity performance, suggesting that color vision cannot be considered seriously deficient in global-form perception. Again, several different, color-orientation mechanisms seem to be involved, rather than simply a single chromatic and a single luminance channel.

A different spatial integration task, commonly used in the literature, is to measure the detection threshold of a *signal* which is embedded into *noise*, where signal and noise are sampled from separate Gaussian distributions across various directions in color space. It is essentially a figure-ground segregation task based on the idea that, if noise and signal are being processed by separate neuronal mechanisms, modulations of the former should not interfere with the detection of the latter. If, on the other hand, changes in the noise affect the sensitivity to the signal, then a common underlying processing mechanism is assumed – i.e., the largest interference should be caused by noise modulations along directions close to the modulation of the signal. In this way, independent higher-order detection mechanisms can be identified, in addition to the cardinal ones: if noise orthogonal to one direction has no effect, this can be modeled by a higher-order, cortical mechanism which is tuned along this direction (see Figure 3 of Hansen and Gegenfurtner, 2013).
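The logic of the noise-masking paradigm can be sketched with a toy model. It assumes, purely for illustration, a linear mechanism whose sensitivity falls off with the cosine of the angle between its preferred direction and the direction of the noise in a 2D color plane; this cosine tuning, the function name and the parameter names are assumptions of the sketch and are not the fitted models of the cited studies.

```python
import math


def effective_noise(noise_contrast, noise_dir_deg, mechanism_dir_deg):
    """Noise contrast seen by a linear mechanism tuned to one direction.

    Assumption (illustrative only): the mechanism weights any modulation
    by the cosine of the angle between that modulation and its own
    preferred direction, so orthogonal noise is invisible to it.
    """
    delta = math.radians(noise_dir_deg - mechanism_dir_deg)
    # The sign of the projection is irrelevant for masking strength.
    return noise_contrast * abs(math.cos(delta))


# Noise along the mechanism's own axis masks fully; orthogonal noise
# contributes (numerically almost) nothing.
same_axis = effective_noise(1.0, 0.0, 0.0)
orthogonal = effective_noise(1.0, 90.0, 0.0)
```

In this toy model, measuring how masking changes as the noise direction is rotated traces out the tuning of the mechanism that detects the signal, which is the reasoning the paradigm relies on.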

In a study measuring detection thresholds for vertical gratings embedded in spatiotemporal broadband noise, it was found that if signal and noise were modulated along orthogonal axes in a 2D color space (magenta-cyan and achromatic cardinal directions), the noise had no effect on detection (Gegenfurtner and Kiper, 1992). If, on the other hand, both noise and signal were modulated along the same axis, there was a linear relationship between noise and signal threshold. Furthermore, the slope of this relationship along the magenta-cyan and along the luminance axis was the same. These results suggest the existence of two equally efficient, independent mechanisms for the detection of chromatic and luminance signals respectively. Interestingly, other mechanisms tuned to non-cardinal directions that combine luminance and chromatic information have also been discovered, the chromatic tuning (preference and breadth of tuning) of which could be estimated by keeping the signal direction constant and changing the direction of the noise (Gegenfurtner and Kiper, 1992).

In another study along the same lines, minimal interference was found along cardinal directions, suggesting once more the existence of relatively independent luminance and chrominance channels, one in each cardinal direction, as well as another two channels in intermediate axes within the isoluminant plane (Li and Lennie, 1997). In yet another study of orientation discrimination involving external noise, performance was equivalent for color and luminance stimuli, demonstrating the ability of chromatic information to support the early stages of form vision, almost as effectively as luminance information (Beaudot and Mullen, 2005).

The question of the number of broadly-tuned independent chromatic mechanisms for form-segregation is of great interest, as it is clear that directions other than the cardinal ones exist in color space. In an attempt to investigate in detail the presence and number of such mechanisms, an image segmentation study used texture signal and noise that were varied across various directions in either the isoluminant or the L-M luminance plane, to show that masking is maximum when noise is in the same direction as the signal, and minimum when noise is in an orthogonal direction to the signal (Hansen and Gegenfurtner, 2006b). The tuning curves of detection threshold in color-space that resulted from these detailed measurements were best described by a chromatic-detection model of multiple (*N* = 16 gave the best fit) broadly-tuned, higher-order detection mechanisms. These results were verified by a subsequent study from the same group, in which higher-order color mechanisms were charted in a signal-within-noise setup, by keeping the noise constant and changing the direction of the signal (Hansen and Gegenfurtner, 2013).

Masking between color and luminance stimuli has also been used in higher-order form tasks, such as the detection of a Gabor-defined, orientation-modulation pattern (see Figure 1 of Pearson and Kingdom, 2002). The idea is that if a subthreshold mask facilitates the detection of the test then both test and mask are processed by the same mechanism, something which was experimentally observed in both crossed and uncrossed conditions. Thus, orientation-modulation pattern detection seems to be the function of a single mechanism pooling both luminance and color information. The authors propose the existence of different, independent color and luminance systems at the first stage of form-processing, and a common second stage system that pools color and luminance inputs (as proposed for contour integration in Glass-patterns – see below).

The potential for spatial integration of pure chromatic information with respect to global form perception has been also investigated using Glass-pattern stimuli (Glass, 1969). In these stimuli, a proportion of local elements (the signal) define a particular pattern (e.g., circularity) and the rest (the noise) are positioned randomly. Detection thresholds in such patterns were found to be highest when noise and signal had the same color, and to decline as the color difference between the two increased (Cardinal and Kiper, 2003). In this study, varying the distance in color space between isoluminant signal and noise in order to calculate tuning curves of threshold vs. color-difference, revealed the existence of several independent chromatic mechanisms, which are broadly tuned (i.e., sum their cone-inputs linearly). In another similar study (Wilson and Switkes, 2005), it was shown that chromatic information can be used to perceive form in Glass patterns, with detection thresholds similar to those of luminance (but slightly higher for the S-system). Additionally, by varying the color difference between the two elements forming a dot pair ('intradipole' variation) or between dot pairs ('interdipole' variation), the level of color-form integration could be measured at an early (local) or later (global) stage of processing respectively. Results showed that early-level integration is color-specific, i.e., the ability to extract the orientation of dot pairs in Glass patterns decreases with increasing chromaticity differences, whereas at a global (i.e., interdipole) level, the form system is able to integrate information from differing chromaticities, suggesting that later mechanisms responsible for pattern segregation are color-sensitive but not color-selective (Wilson and Switkes, 2005).

Glass-pattern experiments suggest that the local orientation of dot-pairs is calculated in the first stage, and in the second stage these local orientation signals are pooled over a large area of the visual field. By changing the color relationship between isoluminant dot pairs, it has been shown that first stage mechanisms are color selective and narrowly tuned in color space, i.e., do not combine their cone inputs linearly (Mandelli and Kiper, 2005). It thus seems that color selectivity is more pronounced at the early stages of form processing (Mandelli and Kiper, 2005), but still present at later stages as well (Cardinal and Kiper, 2003). Adaptation experiments using radial and concentric Glass-patterns have shown that adaptation is global-form specific, but transferable between color and luminance information, further suggesting a common form system combining both luminance and color information (Rentzeperis and Kiper, 2010). In this study, luminance adaptation was found to have a stronger effect in general, and there was also some color-specificity in color adaptation. However, results from other studies suggest that the global analysis stage in Glass pattern processing pools the signal across different color channels and thus has no color-selectivity (Clifford et al., 2004; Wilson and Switkes, 2005).
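The distinction between broad and narrow tuning in color space can be illustrated with a short sketch. A mechanism that sums its cone inputs linearly has cosine tuning around its preferred color direction; narrower tuning is modeled here, as an assumption made only for illustration, by raising a half-wave rectified cosine to a power. The function names and the choice of exponent are hypothetical.

```python
import numpy as np

def tuning(delta_deg, exponent=1.0):
    """Relative response to a color direction delta_deg away from the preferred one.

    exponent=1 corresponds to linear cone summation (cosine tuning);
    exponent>1 is an assumed nonlinearity that narrows the tuning.
    """
    c = np.cos(np.deg2rad(delta_deg))
    return np.clip(c, 0.0, None) ** exponent  # half-wave rectified

def half_width_at_half_height(exponent):
    """Angle (deg) from the preferred direction at which the response halves."""
    return np.rad2deg(np.arccos(0.5 ** (1.0 / exponent)))

broad = half_width_at_half_height(1.0)   # linear mechanism
narrow = half_width_at_half_height(2.0)  # squared-cosine mechanism
```

A linear mechanism therefore has a half-width at half-height of 60 degrees, and any mechanism measured to be clearly narrower than that cannot be combining its cone inputs linearly.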

### Color, Form, and the Brain

The properties of visual perception, which are revealed by behavioral studies, are the result of the functioning characteristics of the visual brain. Therefore, a link between the neuronal and the behavioral levels of description is the ultimate goal in cognitive neuroscience. With respect to the relationship between color and form in particular, psychophysical results suggest that anatomical, physiological and imaging studies should address the following questions: (1) can neurons selective for the color of a stimulus also be selective for the orientation and SF of the stimulus, or are these mutually exclusive properties?, (2) are spatial and chromatic properties processed in topographically distinct parts of the brain?, (3) is it possible to drive spatially-selective neurons using stimuli defined by chromatic-contrast alone?, (4) if they exist, are these neurons color-signed or color-unsigned?, (5) are the tuning properties of these neurons similar to the ones driven by luminance?, (6) do luminance and isoluminant stimuli activate the same or different form-related areas in the brain, and how does that change as one moves along the hierarchy of the visual system? In this section, studies addressing these questions will be reviewed, in order to see whether and to what extent the neurobiology of the visual system reflects the picture emerging from the psychophysical evidence. The prediction of the latter is probably for monocular, orientation selective units formed within each of the two cardinal chromatic dimensions at the initial stage, either color-signed or color-unsigned, and neurons retaining sensitivity to color but lacking selectivity being produced by pooling of these initial signals at subsequent stages of spatial integration. However, although the link between physiology and behavior is perhaps the ultimate goal in the brain/mind sciences, it is not always a straightforward matter.

Years of electrophysiological and anatomical studies of the visual system in primates have revealed a cortical architecture which is mainly characterized by functional specialization (see Zeki, 1993): different visual attributes such as color, motion, and form are processed by separate neuronal populations, in different parts of the brain. The idea of such an architecture was initiated by studies in prestriate cortex, pointing toward a specialization for motion in area V5 (Zeki, 1974) and color in area V4 (Zeki, 1973). Segregation is present in the very early visual areas V1/V2 as well, in which different parts of the information are processed by separate neuronal populations, within different anatomical compartments (see Livingstone and Hubel, 1984; Shipp and Zeki, 1985, 2002; Moutoussis and Zeki, 2002). For example, cytochrome oxidase 'blobs' in V1 and 'thin stripes' in V2 contain chromatically tuned neurons that are indifferent to orientation and could thus support color perception. Anatomical studies nicely complement the physiology, showing that blobs project to thin stripes and the latter to V4, perhaps forming the color-from-color system referred to earlier. Similarly, the form-from-luminance system could be reflected in the interstripe-interblob-V4 pathway, containing orientation-selective neurons that are indifferent to the color of the stimulus. Perhaps Stephen Grossberg's formulation is relevant here, regarding the pathway originating in blobs as a 'feature contrast system,' painting color into fields defined by the 'boundary contrast system' originating in the interblob pathway (Grossberg, 1994).

Several studies have challenged the robustness of such segregation, by demonstrating the presence of neurons with dual selectivities, or cells with selectivities which are different to the majority of the neurons in the particular anatomical region (e.g., Leventhal et al., 1995; Kiper et al., 1997; Friedman et al., 2003). In general, studies with extensive histological documentation of electrode tracks with respect to cytochrome stripes, report higher levels of stripe specialization than studies lacking such documentation (see Shipp and Zeki, 2002), and are further supported by optical imaging studies of V2 (Tootell et al., 2004; Chen et al., 2008; Lu and Roe, 2008; Lu et al., 2010; An et al., 2012). Hence, the overall picture of segregation retains credence, even if one chooses to take account of studies arguing for the opposite (see Gegenfurtner, 2003). Furthermore, the issue of V2 neurons with dual selectivities (reported by studies lacking any histological background) could be explained by the fact that such neurons are common in the feedback layers (that lack ascending output) but are less frequent in the ascending output layer of this area (Shipp et al., 2009).

Functional specialization is also in accord with psychophysical studies, showing that different visual attributes are being perceived at different times (Moutoussis and Zeki, 1997a,b), and is the general principle by which visual processing is implemented by the brain (for an alternative view see Lennie, 1998). It should be noted, however, that most of the conclusions regarding the functional architecture of the visual cortex are based on the assumption that individual neurons function as independent 'feature-detectors.' In this way, brain regions are thought to contain neurons with similar tuning characteristics, and therefore the function of the area as a whole is fully described by the function of individual neurons. In more recent years, the idea that information is encoded in the *pattern* of neuronal activation has prevailed, making the interpretation of individual-neuron selectivities less straightforward (see Rigotti et al., 2013 for one nice example).

With respect to color and form in particular, many electrophysiological studies have concentrated on showing that orientation selective cells are not selective for the color of the stimulus and vice versa (see above). However, most of these studies use luminance contrast to test for orientation selectivity and simply verify the fact that the form-from-luminance system is separate from the color-from-color system, which carries the information regarding the wavelength composition of the light. As discussed in previous sections, the latter is used to determine the color of the various objects, but could also be used to inform form perception. In searching for the correlates of such a chromaticity-based form system, one should look for spatially-selective information, using stimuli defined by color contrast alone. It is important to note that, although color-signed neurons are necessary for encoding the color of an object, form-from-color can be encoded by color-unsigned responses – i.e., responses to pure chromatic boundaries without any selectivity for the particular configuration of colors either side of a contour.

Along these lines, orientation-selective neurons responding to pure color stimuli have been reported in area V1, and these cells were also found to be tuned to various spatial frequencies (Thorell et al., 1984). Most of these neurons were color-tuned and responded to luminance modulation as well, although it is possible that multi-unit recording might have contaminated such early electrophysiological results. This is the first study to describe a color-unsigned, orientation specific response. These were complex cells that were found to respond to multiple colors (though not defined within DKL space). Color-signed simple cells were also reported. Although eye-preference was tested for each neuron, no further information is given regarding the monocularity or not of the various cell types. Compared to the luminance-cells, color-cells were found to be more broad-band, with a higher peak at low spatial frequencies but equally sensitive to luminance-cells at high frequencies. The presence of multiple orientation and SF channels for color information makes a Fourier-type analysis of spatial chromatic-information possible, similar to that proposed for luminance (Campbell and Robson, 1968). These results differ from the ones in the LGN, which is low pass for color and high pass for luminance, and where the P system has been shown to respond to isoluminant stimuli much better than the M system (Hubel and Livingstone, 1990). Pure-color responses in primary visual cortex were also reported in a subsequent study, in which the more strongly color-tuned responses were at the same time unselective for the orientation of the stimulus (Lennie et al., 1990). Color-signed responses were found in this study, but the responses of the majority of the complex cells were independent of the chromaticity of the stimulus used to measure them, as in Thorell et al. (1984). 
It thus seems that spatial-vision mechanisms that are sensitive but not selective to color are already present at the stage of V1. In yet another study in area V1 it was shown that color-selective cells as a population are tuned to lower spatial-frequencies than non-color cells (Leventhal et al., 1995). This study reports that most cells in layer 4Cβ also exhibit a significant degree of orientation bias. These cells are also monocular and parvo-driven (hence R/G channel selective), and thus provide a possible neuronal substrate for the signed, monocular oriented mechanism implied by psychophysics.

Color-selective cells responding to isoluminant stimuli have also been reported in area V2 (Kiper et al., 1997). For most of these cells, the addition of luminance contrast was found to increase the strength of the responses, but their orientation and spatial-frequency tuning was similar when using either luminance or chromatic stimuli. The authors found many more color-unsigned than color-signed cells. The former can be equated with the majority of units showing linear cone summation, and which were noted to respond equally well to light red/dark green gratings or dark red/light green. The minority units with nonlinear, narrower cone summation did not show this property, but were phase selective for the combination of the chromatic and achromatic components of the gratings (Kiper et al., 1997). In agreement with the potential of chromatic information to support form perception, cells responding to chromatic edges, with or without selectivity to contrast polarity, have been reported both in V1 and in V2 (Friedman et al., 2003). Cells responding to uniform color-fields (vs. color edges) were also reported, suggesting a role for these areas in the perception of the internal color of a surface (vs. contour signaling). The latter is in agreement with results from human fMRI experiments, showing that neon-color filling-in can selectively activate striate cortex (Sasaki and Watanabe, 2004). Finally, form-from-color can alter the responses of neurons in striate cortex in the same way that form-from-luminance (or texture, motion and depth) can: a robust contextual modulation in V1 responses by figure segregation has been reported, the latter being purely defined by color-contrast well outside the receptive field boundary and thus implying spatial integration and feedback from higher areas (Zipser et al., 1996).

In a detailed electrophysiological study using luminance and isoluminant stimuli equated for cone-contrast, many V1 neurons were characterized as 'color-luminance' because they responded almost equally well to both (Johnson et al., 2001). Luminance cells, as well as a minority of color-only neurons (which were lowpass and unoriented), were also reported. In this study, the color tuning of neurons along different directions of color space was not examined in detail. Instead of comparing between different colors, the characterization of individual neurons was based on the ratio between the maximum isoluminant and luminance responses. A 'color-cell,' for example, does not necessarily have a particular color preference (i.e., it could be color-unsigned) but rather responds more strongly to pure-color gratings than to luminance ones. More importantly, most 'color-luminance' cells were tuned for SF and orientation in an equally selective way for chromatic and luminance patterns, revealing a spatial selectivity for color boundaries in primary visual cortex and thus making them a good candidate as the neural correlate of the 'form-from-color' system. In a subsequent study by the same group, these 'color-luminance' cells were renamed as 'double-opponent' and were found to constitute 30% of the total V1 population (Johnson et al., 2008). They are color-signed, i.e., have a simple type of receptive field that retains color-sign, and take their name from the presence of both spatial and cone opponency (see **Figure 3**). Double-opponent neurons (whether orientation selective or not) could potentially signal the local light-composition comparisons (see Land's Retinex algorithm above), and thus be the building blocks of color constancy calculations (color-from-color). Orientation-selective double-opponent neurons that respond to pure chromatic patterns could also be part of the form-from-color system in striate cortex (Johnson et al., 2008; Conway, 2009).
Yet another 10% of the population was found to be single-opponent, non-selective for orientation, low-pass and not responding well to luminance, making them a good candidate for signaling the wavelength composition coming from a region. Finally, the remaining 60% of the V1 population was found to be non-opponent; these cells were also the most orientation selective, making them a good candidate for the form-from-luminance system.
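The ratio-based characterization described above can be sketched as a simple rule. The function name and the cutoff values `low` and `high` are assumptions made for illustration; the text states only that the classification was based on the ratio between the maximum isoluminant and luminance responses.

```python
def classify_cell(max_isoluminant_response, max_luminance_response,
                  low=0.5, high=2.0):
    """Label a neuron from the ratio of its best color and luminance responses.

    `low` and `high` are illustrative cutoffs, not values taken from the text.
    The label says nothing about which color the cell prefers, so a 'color'
    cell in this sense may still be color-unsigned.
    """
    if max_luminance_response <= 0:
        raise ValueError("luminance response must be positive")
    ratio = max_isoluminant_response / max_luminance_response
    if ratio >= high:
        return "color"
    if ratio <= low:
        return "luminance"
    return "color-luminance"

example_labels = [
    classify_cell(30.0, 10.0),  # responds mainly to isoluminant gratings
    classify_cell(10.0, 30.0),  # responds mainly to luminance gratings
    classify_cell(12.0, 10.0),  # responds about equally to both
]
```

Because the rule compares response magnitudes only, it separates sensitivity to color contrast from selectivity for a particular color, which is the distinction the paragraph draws.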

The prescription arising from the survey of the psychophysical literature is for color-tuned, oriented units to be monocular (or, at least, to display marked ocular dominance). Unfortunately, the studies described above provide scant information on this topic. Earlier electrophysiological and optical imaging studies do indeed indicate that cytochrome oxidase blobs represent monocular sites of color processing in primate striate cortex (Horton and Hubel, 1981; Livingstone and Hubel, 1984; Tootell et al., 1988; Ts'o and Gilbert, 1988; Ts'o et al., 1990; Landisman and Ts'o, 2002; Lu and Roe, 2008). But, problematically, their consensus is that the monocularly-driven color-processing cells lack selectivity for orientation, in apparent conflict with the behavioral results. This conflict may be eased by noting that the degree of orientation tuning observed can depend on the stimulus used: low frequency sinusoidal gratings characteristically elicit tighter tuning than rectangular bars (Leventhal et al., 1995). Indeed, most of the studies mentioned above that investigate the relationship between cytochrome oxidase blobs and orientation tuning have used single bars as stimuli, with the exception of Landisman and Ts'o (2002) and Lu and Roe (2008): the latter two studies have used gratings instead, and report a less strict separation between blobs and orientation tuning. Furthermore, a more recent study using (achromatic) sinusoidal gratings reported appreciable orientation tuning in blobs that was only marginally inferior to that observed in interblobs – an orientation bandwidth of 28.4° for blobs and 25.8° for interblobs (Economides et al., 2011). This tantalizing picture of the nature of early physiological mechanisms vis-à-vis psychophysical inference requires clarification; ideally perhaps, an analysis of the trial-by-trial psycho-physiological correlate in monkeys trained to report subjective phenomena induced by a color-contingent tilt-illusion paradigm.

Moving from the early visual areas to prestriate cortex and thus higher in the hierarchy of visual processing, the presence of color-selectivity in V4 has been a seminal finding toward the idea of functional specialization in the visual system (Zeki, 1973). Furthermore, some cells in this area seem to encode the perceived color rather than the local wavelength composition of the stimulus (Zeki, 1983; Kusunoki et al., 2006). The presence of a similar 'color-center' has also been reported in the human (Lueck et al., 1989; Bartels and Zeki, 2000) and, although many visual areas can decode the color of a stimulus using multivariate fMRI analysis, a color-space representing the perceptual organization of colors can only be found in V4 (Brouwer and Heeger, 2009). Anatomically, segregation and clustering of color-selective neurons has been reported in V4, similar to the cortical architecture found in areas V1 and V2 (Yoshioka and Dow, 1996). A picture of compartmentalization also emerges from a study combining monkey fMRI and electrophysiological recordings in V4 and more anterior areas in the inferotemporal cortex, revealing color-selective compartments named 'globs,' in which cells also show some shape selectivity but not as strong as in the 'interglobs' (Conway et al., 2007).

With respect to the color-form relationship, some neurons in area V4 have been reported to carry signals encoding feature contrast in either shape or color (Ogawa and Komatsu, 2004). In the same study in alert, task-trained subjects, the responses of these neurons could be modulated by top-down attention to a particular type of singleton. In another electrophysiological study, 22% of the V4 neurons showed the color-unsigned property of responding maximally at isoluminance without necessarily being color selective, but could also show form selectivity when tested with various different shapes (Bushnell et al., 2011). In a subsequent study from the same group, the shape selectivity of V4 neurons was found to be the same when tested with two different colors – confirming the presence, in this higher visual area, of mechanisms which are color-sensitive but not color selective (Bushnell and Pasupathy, 2012). There is thus a selectivity to form defined purely from color contrast in V4, irrespective of the particular color that is being used. In this study, 35 V4 neurons were tested with different colors and shapes and the selectivities for these two attributes were found to be independent from each other.

Similar results have been reported in area IT, where again the selectivity to color (tested with non-isoluminant stimuli) was found to be independent from the selectivity to form, with color selectivity remaining the same across different shapes (Komatsu and Ideura, 1993). Independence between color and shape preference has been also reported in a study in which IT neurons were tested using combinations of two shapes and two colors (McMahon and Olson, 2009). For most neurons in this study, responses were a linear sum of their color and shape preference, indicating independent coding of the two, with only a few neurons showing some interaction effects and thus being suitable for coding conjunctions. It is interesting to note that the conjunction-responses were no later than responses to single features, implying no extra time for feature binding (McMahon and Olson, 2009).
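What it means for a response to be "a linear sum" of color and shape preference can be sketched with a two-by-two additive model. The firing rates below are invented numbers used only to illustrate the computation, and the function name is hypothetical; nothing here reproduces the analysis of the cited study.

```python
import numpy as np

def additive_fit(responses):
    """Split a shape-by-color response table into additive and interaction parts.

    responses[i, j] is the mean response to shape i shown in color j.
    Returns the prediction of a purely additive (independent-coding) model
    and the residual interaction term.
    """
    r = np.asarray(responses, dtype=float)
    grand = r.mean()
    shape_effect = r.mean(axis=1, keepdims=True) - grand
    color_effect = r.mean(axis=0, keepdims=True) - grand
    predicted = grand + shape_effect + color_effect
    return predicted, r - predicted

# Invented example rates (spikes/s), rows = shapes, columns = colors.
independent_cell = [[30.0, 20.0], [15.0, 5.0]]  # shape and color effects simply add
conjunction_cell = [[40.0, 5.0], [5.0, 5.0]]    # responds to one combination only

_, interaction_independent = additive_fit(independent_cell)
_, interaction_conjunction = additive_fit(conjunction_cell)
```

A neuron whose interaction term is near zero codes the two attributes independently, whereas a large interaction term marks a cell that could signal a specific color-shape conjunction.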

A functional segregation between color, faces and form, with at least three representations of the visual field, has been revealed in area IT using fMRI both in human (see Bartels and Zeki, 2000) and monkeys (Lafer-Sousa and Conway, 2013). In the latter study, the more pronounced distinction was found between color and faces, with color regions weakly responding to non-face forms and with optimal form responses outside color regions. The authors suggest that the form-selectivity found in color regions could perhaps be attributed to the second, color-based form system (form-from-color). The argument is not very convincing, however, since what is meant by 'form-selectivity' is a weakly differentiated response between fruit, bodies and places. Furthermore, the form-from-color system is not necessarily color-selective and would thus not show up in an fMRI cognitive-subtraction type of experiment. It is more likely that these color regions in IT are part of the color-from-color system, initiated in the V1 blobs and proceeding via the V2 thin stripes and V4 globs (see above).

The neural basis of interactions between color and form has also been studied in humans, using fMRI. In one such study, subjects were adapted to either one of two different orientations defined by color (reddish/greenish) or luminance (four conditions in total), and the weakest rebound-response (i.e., the response following adaptation) in areas V1–V4 was found when the test was the same orientation and chromaticity (chromatic or achromatic) as the adaptor (Engel, 2005). Some transfer was observed between color stimuli of a different orientation, suggesting a pure color-adaptation, and not much transfer between luminance and color stimuli of the same orientation, suggesting separate populations and thus an independent chromatic input to form perception (Engel, 2005). Additionally, an ANOVA model was used in this study to show that joint adaptation effects can be greater than the sum of color-only and orientation-only effects, suggesting the existence of a neuronal population which is jointly selective for both color and orientation. It is not clear from the design of this experiment whether color-selective neuronal mechanisms are necessarily involved, as color-unsigned neurons could also account for the effect.

Although adaptation and rebound can be a valuable tool to overcome the limited spatial resolution of fMRI, caution should be taken before inferring selectivity in a particular area from such studies (for example, see Rentzeperis et al., 2012). Multivariate analysis, looking at activation patterns instead of individual voxel responses, is a preferable technique for solving such problems. Using a machine-learning type of analysis on human fMRI data, statistical learning algorithms trained to distinguish between different color-orientation conjunctions have revealed the presence of color-form binding in early visual cortex (Seymour et al., 2010). Supporting a segregation of function, separate sets of voxels were found to best support conjunctions, orientation information or color information, although some voxel groups could also discriminate between their non-preferred stimuli (Seymour et al., 2010). With respect to color processing *per se*, V4 was found to have the best performance in color discrimination and was the only area to encode the latter more efficiently than orientation. In general, however, some caution is also necessary when inferring neuronal mechanisms from fMRI data, especially in the case of multivariate analysis where the strategy that the classifier uses for classification is not always clear.

Regarding the ability of chromatic information to support form perception, another human fMRI decoding study has shown that areas V1–V3 can perform orientation discrimination using luminance, M-L cone, and S-cone stimuli (Sumner et al., 2008). Furthermore, since no transfer was found between conditions (e.g., training the algorithm with luminance and testing with M-L stimuli) this finding suggests the presence of at least three separate orientation-selective neuronal populations in each of these areas, responsive to particular directions of color space. Although it is not always wise to extrapolate fMRI data to the level of single cells, this color specificity suggests the involvement of color-selective neuronal populations in form processing. These results were confirmed by another human fMRI study, in which there was an orientation-specific response reduction to a target stimulus (when surround and target had the same orientation), both for luminance and isoluminant gratings in areas V1, V2, V3, V3A, and V4 (McDonald et al., 2010). A weak cross-effect reduction was found in some areas, suggesting once more the existence of separate luminance and chromatic orientation-selective neuronal populations, but also the presence of orientation mechanisms which are indifferent to the nature of the signal. Finally, prestriate areas (but not V1) have been shown to code for complex spatial structure (phase congruency) in a human fMRI study using isoluminant stimuli, suggesting again that pure chromatic information can be used in higher order form-vision calculations (Castaldi et al., 2013). Furthermore, by using even- and odd-symmetric isoluminant phase-congruent stimuli, this study implies the presence of color-signed mechanisms throughout visual cortex.
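The inference from "no transfer between conditions" to "separate populations" can be illustrated with a toy simulation. Everything in it is an assumption made for the sake of the sketch: the voxel patterns are synthetic, each condition is made to drive its own half of the voxels, and a nearest-centroid rule stands in for whatever classifier the cited studies actually used.

```python
import numpy as np

rng = np.random.default_rng(0)
N_VOXELS, N_TRIALS, AMPLITUDE = 200, 200, 0.3

def simulate(condition, orientation):
    """Synthetic trials: each condition drives its own half of the voxels."""
    half = N_VOXELS // 2
    voxels = slice(0, half) if condition == "luminance" else slice(half, N_VOXELS)
    sign = 1.0 if orientation == 0 else -1.0
    pattern = np.zeros(N_VOXELS)
    pattern[voxels] = sign * AMPLITUDE * np.tile([1.0, -1.0], half // 2)
    return pattern + rng.normal(size=(N_TRIALS, N_VOXELS))  # add unit-variance noise

def fit_centroids(condition):
    """Mean training pattern for each of the two orientations."""
    return np.stack([simulate(condition, o).mean(axis=0) for o in (0, 1)])

def accuracy(centroids, condition):
    """Fraction of fresh trials assigned to the nearest orientation centroid."""
    correct = 0
    for o in (0, 1):
        trials = simulate(condition, o)
        dists = np.linalg.norm(trials[:, None, :] - centroids[None, :, :], axis=2)
        correct += np.sum(dists.argmin(axis=1) == o)
    return correct / (2 * N_TRIALS)

centroids = fit_centroids("luminance")
within = accuracy(centroids, "luminance")  # train and test on the same condition
across = accuracy(centroids, "chromatic")  # train on luminance, test on chromatic
```

In this toy setting, orientation is decoded well within a condition but only at chance level across conditions, which is the signature taken in the text to indicate separate orientation-selective populations. If the two conditions shared voxels, the classifier would generalize across them.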

#### Conclusion

Information about the wavelength composition of light reflected by the various parts of the visual field can be used twofold: in order to compute color and attribute it to the various objects in the scene, and in order to detect chromatic boundaries and see shapes and forms. The latter can also be the result of processing luminance information, which is not necessarily redundant with that provided by the chromatic system. Furthermore, in addition to the ability of chromatic cues to support form perception, color *per se* can also influence the perception of form, at various stages of the processing hierarchy.

A contingency between form and color has been revealed from several psychophysical experiments, using adaptation and illusions. Such contingencies suggest that form information is coded independently by the color system, in a way that is equally effective to that of the luminance system. A coupling between color and form at early visual channels is further supported by the existence of dual-selective neurons at the early stages of visual processing, as well as by human fMRI data that reveal the presence of information regarding color-form conjunctions in early visual areas. Equality between form-from-luminance and form-from-color is evident in psychophysical studies comparing the spatial properties of color and luminance detectors, which reveal a similarity in the orientation and SF tuning characteristics of the two systems. The luminance system is more narrowly tuned and more sensitive at high SFs, whereas the color system performs better at low SFs and can in some cases dominate luminance information. Thus, psychophysical experiments investigating form detection at an early, local level, demonstrate the existence of equally efficient independent mechanisms for the detection of chromatic and luminance signals respectively, and the ability of chromatic information to support the early stages of form vision almost as effectively as luminance information. These separate mechanisms perform an independent early analysis of the spatial information of the image, but there also seems to be a certain level of interaction between them.

Overall, at a local level, the performance of the color system is not inferior to the performance of the luminance system and may sometimes be superior to it, making chromatic information just as valuable as luminance information in supporting form perception. A comparison between the efficiency of these different types of input has been also investigated in global form perception: at that stage, where the spatial integration of different local information takes place, the color system once more accomplishes global integration tasks in a way that is equally effective to that of luminance information. Interestingly, although some tasks reveal separate color and luminance integration systems with a limited amount of interaction between them, even at the global stage, others additionally suggest the existence of a single mechanism able to pool both types of information. Results from perceptual illusions also support this notion, showing independence between the different mechanisms at a monocular level but combination of color and luminance information at higher stages, after the input of the two eyes has been combined. It therefore seems that, at some point and depending on the task, the brain is able to bring together spatial information coming from different sources.

As far as the visual system is concerned, anatomical, electrophysiological, as well as imaging studies have shown that the mechanisms involved in perceiving color are separate and topographically segregated from those supporting the perception of form. Such architecture is in agreement with the general principle of a functional specialization in the visual brain. There are several studies supporting a functional segregation between color-from-color and form-from-luminance, although a possible segregation between the latter and form-from-color is not yet clear at the neurobiological level: some studies have even shown that both color-derived and luminance-derived form calculations can be performed by the same cells, in an equally effective way. More importantly though, our behavioral ability to perceive form defined by pure-color information is fully supported at the implementation level by the existence of spatial selectivity in neurons driven by isoluminant stimuli: cells responding to pure chromatic contrast are present in several visual areas and provide a good candidate for the color-derived form-vision.

In conclusion, the relationship between color and form seems to be one of love, support and dependence, but also of separation, independence and, in some cases, rivalry. Color information can be carried by neurons which are either color-signed, i.e., selective for a particular color, or unsigned, i.e., unselective for a particular color but responsive to pure color contrast. Electrophysiological studies have revealed the presence of both types of cells and results from both psychophysics and imaging suggest that both are necessary to explain various aspects of the form-color interaction.

#### Acknowledgments

This work was supported by the European Union (European Social Fund) and Greek national funds through the operational program "Education and Lifelong Learning" of the National Strategic Reference Framework, research program THALIS-UOA-COGMEK (project 892, MIS 375737). Publication fees were covered by the Department of History and Philosophy of Science, University of Athens. I am grateful to Professor Semir Zeki for accommodating me in his lab at University College London while I completed a large part of the present review. I am also grateful to the two colleagues who reviewed the present manuscript and offered numerous suggestions for its significant improvement.




**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Moutoussis. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

**REVIEW ARTICLE** published: 27 October 2014 doi: 10.3389/fpsyg.2014.00932

#### *Ilias Rentzeperis<sup>1,2</sup>\*, Andrey R. Nikolaev<sup>3</sup>, Daniel C. Kiper<sup>1</sup> and Cees van Leeuwen<sup>3</sup>*

<sup>1</sup> Institute of Neuroinformatics, University of Zürich and Swiss Federal Institute of Technology, Zürich, Switzerland

<sup>2</sup> Laboratory for Human Systems Neuroscience, RIKEN Brain Science Institute, Wako, Japan

<sup>3</sup> Laboratory for Perceptual Dynamics, University of Leuven, Leuven, Belgium

#### *Edited by:*

Galina Paramei, Liverpool Hope University, UK

#### *Reviewed by:*

Ruth Rosenholtz, Massachusetts Institute of Technology, USA

Konstantinos Moutoussis, National and Kapodistrian University of Athens, Greece

#### *\*Correspondence:*

Ilias Rentzeperis, Laboratory for Human Systems Neuroscience, RIKEN Brain Science Institute, Wako, Saitama 351-0198, Japan e-mail: ilias@brain.riken.jp

To what extent does the visual system process color and form separately? Proponents of the segregation view claim that distinct regions of the cortex are dedicated to each of these two dimensions separately. However, evidence is accumulating that color and form processing may, at least to some extent, be intertwined in the brain. In this perspective, we review psychophysical and neurophysiological studies on color and form perception and evaluate their results in light of recent developments in population coding.

**Keywords: color, form, segregation, integration, distributed processing, mixed selective cells, high dimensional code, complex selectivity**

#### **INTRODUCTION**

The seminal investigations of Hubel and Wiesel (1959) established that the receptive field properties of single neurons in V1 emerge from the integration of neurons in the previous processing stage. Since then, it is commonly believed that visual information is processed in a hierarchical fashion, consisting predominantly of feedforward feature integration and convergence. Distinct streams of processing are initially kept anatomically separate, only to come together in higher order areas by specialized cells (Barlow, 1972) or, alternatively, to be organized functionally, in time, through synchronized activity (Von Der Malsburg, 1994; Singer and Gray, 1995).

In this perspective, the earlier stimulus representations are characterized by segregation. The segregation hypothesis postulates that different attributes of visual stimulation are received and processed by distinct populations of neurons (Zeki, 1978; Hubel and Livingstone, 1987; Livingstone and Hubel, 1988) and contrasts with integrality, which permits populations of neurons to have mixed selectivities.

More recent studies have taken issue with the notion of segregation (see Lennie, 1998; Gegenfurtner and Kiper, 2003; Shapley and Hawken, 2011 for reviews) as the most viable option for understanding visual representation. It has been shown, for instance, that motion and disparity are encoded jointly in certain subpopulations of neurons (Roy et al., 1992; DeAngelis et al., 1998; Anzai et al., 2001; Pack et al., 2003; Grunewald and Skoumbourdis, 2004). Such results suggest that it may be time to reconsider the segregation model.

Here we will review the status of the evidence for segregation of color and form. This is of special interest because color and form are most likely processed along the ventral cortical pathway (Ungerleider and Mishkin, 1983) and yet, this pair has been regarded as extremely segregated, not only represented in separate neurons, but also in distinct brain regions. For this reason, segregation of color and form is more controversial than that of color or form versus motion. Dubner and Zeki (1971) first observed a region in the superior temporal sulcus of the macaque that was selective for direction of motion but unresponsive to color or orientation. Zeki (1973, 1974, 1977) showed evidence that areas in extrastriate cortex functionally differ from each other; he proposed that area V4 is specialized for color and V5 (MT) for motion. In later studies, Hubel and Livingstone (1987; Livingstone and Hubel, 1988) observed that color and form are processed in distinct regions within both the primary (V1) and prestriate (V2) cortex. Motion is considered to be processed along the dorsal stream. Lennie (1998) argued that motion could be a special case and "the separation of motion signals from signals about other dimensions of image variation means the analysis they subserve is self-contained." We will not contest this here.

We concentrate on the ventral stream, and investigate if any further subdivisions are necessary. We do not wish to claim, however, that visual object information is exclusively processed in the ventral stream. There is evidence that object representations exist in parallel in both dorsal and ventral streams (Konen and Kastner, 2008). The dorsal information pathway is thought to be involved in the encoding of spatial relationships among objects (Mishkin et al., 1983). It could thus be the case that dimensions related to spatial relationships among objects are anatomically separate from the ones that define an object.

In our present review of color and form segregation we argue, first, that the segregation view does not square well with behavioral findings, including those on attentional feature integration that have traditionally been interpreted in a segregation-friendly framework (Treisman and Gelade, 1980). Next, we re-examine some of the classical neurophysiological studies demonstrating segregated processing of color and form, along with more recent evidence that may undermine the necessity of segregation as the best possible explanation. In areas predominantly representing color or form, weak selectivities for the other feature also exist. The predominance of studies in single neurons has hitherto obscured the role of weak selectivities in distributed coding. Weak selectivities can have a strong collective effect in a neuronal population. Their presence, and generally that of mixed selectivities in single neurons, enables neuronal population codes with flexible and context sensitive feature representations, properties that have been shown to exist in early visual cortex. We discuss how neuronal population codes could be used in perceptual integration of color and form via feedforward as well as feedback and horizontal (recurrent) perceptual mechanisms.

#### **BEHAVIORAL STUDIES**

#### **FORM SENSITIVITY TO COLOR AND LUMINANCE SIGNALS**

According to the segregation framework, functions dedicated to color vision will be poor at form processing, and those engaged in form vision will operate almost exclusively on luminance signals. However, psychophysical studies have challenged such a dichotomy by showing comparable orientation discrimination thresholds for color and luminance stimuli (Webster et al., 1990; Reisbeck and Gegenfurtner, 1998; Beaudot and Mullen, 2005). Likewise, performance in contour integration was similar for local elements defined by either color or luminance (McIlhagga and Mullen, 1996; Rentzeperis and Kiper, 2010). Furthermore, recent rating experiments on the similarity of two bars varying in both orientation and color have been inconclusive on how separable color and orientation are (Bimler et al., 2013). This evidence suggests the possibility of color and orientation mechanisms interacting at an early stage of visual processing.

Two well-known visual illusions in the perception of orientation are the tilt aftereffect and the tilt illusion (Gibson and Radner, 1937). Livingstone and Hubel (1987b) reported that there is no tilt illusion effect at isoluminance, a finding that supported their proposition that color and form are processed separately. However, a later study showed the presence of large tilt illusions for isoluminant stimuli (Clifford et al., 2003b). Furthermore, Flanagan et al. (1990) showed that the tilt aftereffect can also be induced by isoluminant gratings. In showing that color-only channels are sensitive to these illusions, both studies (Flanagan et al., 1990; Clifford et al., 2003b) support the interaction of color and form early in processing.

Contrast sensitivity as a function of spatial frequency for both red-green and blue-yellow isoluminant gratings initially was shown to be low pass (Mullen, 1985), a finding that supported the view that color vision has poor spatial acuity (Livingstone and Hubel, 1988). However, later studies pointed out that the low pass contrast sensitivity function is the envelope of several band-pass spatial frequency filters (Bradley et al., 1988; Losada and Mullen, 1994). Mechanisms that have band-pass filters are suitable for the detection of edges or locally oriented elements that form global patterns. These results, therefore, indicate that color vision, like luminance vision, encodes the visual scene using band-pass filters.
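The idea that a low pass sensitivity function can be the envelope of several band-pass channels can be illustrated numerically. The following is a minimal sketch rather than a model fitted to any of the cited data: the log-Gaussian channel shape, the peak frequencies, the gains and the bandwidth are all illustrative assumptions.

```python
import math

def channel_sensitivity(freq, peak, gain, bandwidth_octaves=1.5):
    """Log-Gaussian band-pass channel (an assumed, illustrative shape)."""
    sigma = bandwidth_octaves / 2.355  # full width at half height -> sigma
    octaves_from_peak = math.log2(freq / peak)
    return gain * math.exp(-0.5 * (octaves_from_peak / sigma) ** 2)

# Hypothetical channels: (peak spatial frequency in cycles/deg, peak gain).
# The gains fall as peak frequency rises, which makes the envelope low pass
# over the sampled range even though every channel is band pass.
CHANNELS = [(0.25, 100.0), (0.5, 90.0), (1.0, 60.0), (2.0, 30.0), (4.0, 10.0)]

def envelope(freq):
    """Overall sensitivity, taken here as that of the most sensitive channel."""
    return max(channel_sensitivity(freq, peak, gain) for peak, gain in CHANNELS)

FREQS = [0.25, 0.5, 1.0, 2.0, 4.0]
ENVELOPE = [envelope(f) for f in FREQS]
for f, s in zip(FREQS, ENVELOPE):
    print(f"{f:5.2f} c/deg  envelope sensitivity = {s:6.1f}")
```

In this sketch each channel responds best at its own peak frequency and poorly on either side, yet the envelope decreases monotonically across the sampled frequencies, so a low pass measurement alone does not rule out band-pass mechanisms underneath.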

#### **COLOR SELECTIVITY OF LOCAL AND GLOBAL FORM PROCESSES: GLASS PATTERN STIMULI**

Psychophysical studies of (achromatic) form processing mechanisms have often used Glass patterns as stimuli (Glass, 1969; Glass and Switkes, 1976; De Valois and Switkes, 1980; Kovacs and Julesz, 1992; Dakin, 1997; Wilson et al., 1997; Wilson and Wilkinson, 1998; Dakin and Bex, 2001, 2002). Glass patterns are constructed from oriented dot pairs; depending on the orientation of the dot pairs different global forms can be perceived. Wilson and Wilkinson (1998) proposed a feedforward, hierarchical model of Glass pattern processing, in which the early stages (V1/V2) use oriented filters and rectification to process the local dot pairs and later stages (V4) pool and sum the output of previous stages to create the percept of global form. Accordingly, subsequent electrophysiological studies have indicated that V1 and V2 neurons respond to dot pairs irrespectively of global form (Smith et al., 2002, 2007).
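To make the construction of Glass patterns concrete, the sketch below generates dot pairs whose local orientation follows a global rule. It is only an illustration under assumed conventions: the function name `glass_pattern`, the square display spanning -1 to 1, and the parameter values are hypothetical and are not taken from any of the cited studies.

```python
import math
import random

def glass_pattern(n_pairs, separation, kind="concentric", seed=0):
    """Return dot pairs ((x1, y1), (x2, y2)) in a square spanning -1 to 1.

    Each pair is centered on a random location; the orientation of the pair
    depends on that location and determines the global form.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        angle_to_center = math.atan2(y, x)
        if kind == "concentric":
            theta = angle_to_center + math.pi / 2  # tangent to a circle
        elif kind == "radial":
            theta = angle_to_center                # along the radius
        elif kind == "random":
            theta = rng.uniform(0, math.pi)        # no global structure
        else:
            raise ValueError(f"unknown pattern kind: {kind}")
        dx = 0.5 * separation * math.cos(theta)
        dy = 0.5 * separation * math.sin(theta)
        pairs.append(((x - dx, y - dy), (x + dx, y + dy)))
    return pairs

pairs = glass_pattern(n_pairs=200, separation=0.05, kind="concentric")
print(len(pairs), "dot pairs generated")
```

Changing only `kind` leaves the local statistics (dot density, pair separation) unchanged while altering the global form, which is why such stimuli are useful for separating local from global processing stages.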

Colored Glass patterns are an excellent tool for studying the color selectivity of local and global form processes. Mandelli and Kiper (2005) measured detection thresholds for circular Glass patterns that consisted of dots isoluminant to the background, with different colors within each dot pair. When the difference in color between the two dots of a pair increased, observer sensitivity decreased. The results suggest that there are local processing mechanisms with narrow color tuning (color selectivity) that are also orientation selective. If not, we would expect observer sensitivity to stay the same irrespective of the presence of a color difference within dot pairs. The average tuning in color space of the local mechanism (the range of colors a local unit responds to) was consistent with the physiological observations that color selective cells in V1 and V2 are also orientation selective (Leventhal et al., 1995; Friedman et al., 2003). The result, therefore, is in accordance with the notion that early processing mechanisms show mixed selectivity.

To probe the color selectivity of global form mechanisms, Wilson and Switkes (2005) measured Glass pattern detection when the colors between dot pairs [but not within dot pairs as in the Mandelli and Kiper (2005) study] were varied. They found that the distance in color between the dot pairs did not affect observer sensitivity. This result suggests a color invariant global form mechanism. Adaptation studies with color and luminance Glass patterns confirmed this result and showed that global form mechanisms are invariant to luminance polarity as well (Rentzeperis and Kiper, 2010; Rentzeperis et al., 2012). In summary, the results on colored Glass patterns indicate that early form processes that code for local features are also selective for color; however, intermediate processes that pool and sum the local orientation cues are color invariant, in the sense that they can integrate oriented signals of any chromaticity.

#### **COLOR AND FORM ASYNCHRONY**

In a series of psychophysical studies, Moutoussis and Zeki (1997a,b) showed that different visual features presented at the same time may not be perceived as simultaneous. That these features are perceived at different times, the authors argued, indicates that they are processed separately. In one of these studies, participants were shown on one half of the screen a colored checkerboard pattern (the colored squares alternating from red to green) and on the other half grey bars (all alternating their tilt from left to right). Participants had to match the colors of the squares with the orientation of the bars that were presented at the same time (Moutoussis and Zeki, 1997b). Both color and orientation changes occurred at the same rate but their phase difference varied. For certain phase differences the color and orientation pairs perceived were different from the actual ones. The temporal mismatch indicated that color is perceived approximately 63 ms before orientation. Bartels and Zeki (1998) argued that this kind of perceptual asynchrony supports functionally distinct modules in the brain which are acting as autonomous perceptual units, each processing the stimulus in their own time frame (Bartels and Zeki, 1998; Zeki and Bartels, 1998). This claim, however, is at odds with electrophysiological measurements on the macaque, which have shown that the difference in visual response latencies between visual areas does not exceed 20 ms (Schmolesky et al., 1998). If different visual areas in the brain acted as independent functional and perceptual units we would expect the latency in neural response between different visual areas to match the time difference between color and orientation perception.
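The logic of the phase-matching paradigm can be sketched with a toy simulation. The code below assumes, purely for illustration, that both features alternate as square waves, that the perceptual lag of orientation relative to color is a fixed 63 ms, and that the alternation period is 500 ms; the period and the function name `red_left_fraction` are hypothetical and are not taken from the original study.

```python
COLOR_LEAD_MS = 63  # asynchrony reported in the text (color before orientation)

def red_left_fraction(physical_offset_ms, period_ms=500, lag_ms=COLOR_LEAD_MS):
    """Fraction of perceived-red time during which the tilt is perceived as left.

    physical_offset_ms > 0 means the orientation change is physically
    presented after the color change; negative values mean it comes first.
    """
    half = period_ms // 2
    red_and_left = 0
    for t in range(period_ms):  # 1 ms steps over one full cycle
        perceived_red = t % period_ms < half
        perceived_left = (t - lag_ms - physical_offset_ms) % period_ms < half
        red_and_left += perceived_red and perceived_left
    return red_and_left / half

for offset in (-63, 0, 63):
    print(offset, red_left_fraction(offset))
```

Under these assumptions the two features appear perfectly paired only when orientation is physically advanced by the size of the lag, which is the kind of shift in the matching function from which a perceptual asynchrony can be inferred.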

Holcombe and Cavanagh (2001) measured the temporal resolution of the perception of feature pairs when color and orientation were spatially separated and when they were spatially superimposed. In both conditions, color and orientation changes happened at the same time and participants had to match them. When color and orientation were spatially separated, participants reached 75% threshold accuracy in reporting the correct pairings only for presentation rates below 3 Hz. However, when color and orientation were spatially superimposed, participants reached the same performance for presentation rates that were more than six times faster. The latter frequency corresponds to ∼50 ms for feature binding. The authors concluded that color and form are processed in combination in early stages; when the two features are spatially separated they go through a binding process which has low temporal resolution.
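The relation between the presentation rates and the quoted binding time is simple arithmetic, shown below as a short sketch. It assumes that the relevant interval is the reciprocal of the presentation rate and takes 3 Hz and six times 3 Hz as nominal values, since the text gives only bounds.

```python
def period_ms(rate_hz):
    """Duration of one presentation interval, in milliseconds."""
    return 1000.0 / rate_hz

separated_rate_hz = 3.0                       # nominal bound quoted in the text
superimposed_rate_hz = 6 * separated_rate_hz  # "more than six times faster"

print(round(period_ms(separated_rate_hz), 1))     # interval for separated features
print(round(period_ms(superimposed_rate_hz), 1))  # interval for superimposed features
```

With these nominal values the superimposed condition corresponds to an interval of roughly 56 ms; because the actual rate was more than six times faster, the interval is somewhat shorter, in line with the approximate figure of 50 ms.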

In a subsequent study, Clifford et al. (2003a) used sinusoidal gratings oscillating in color and orientation at the same temporal frequency and for a range of phase differences. They found that for rapid presentation rates (10 Hz) both color and orientation were perceived at the same time. However, as the presentation rates decreased the asynchrony between color and orientation grew; for a presentation rate of 1 Hz, color perception preceded orientation perception by 50 ms. The authors proposed that the perceptual asynchrony observed for slow presentation rates could be attributed to a difference in adaptation between color and form processes, resulting in changes to their temporal response profiles. They suggested, however, that both color and orientation are processed by overlapping populations of neurons (since participants show high temporal precision) with each neuron in this population using multiplexed temporal codes for color and orientation. This interpretation is in line with electrophysiological measurements in monkeys indicating that separate temporal codes representing color and form are multiplexed in single neurons in areas V1, V2, and V4 (McClurkin and Optican, 1996; McClurkin et al., 1996). The data from these studies were in accordance with a model in which the response of a neuron to a colored form is the product of a response pattern encoding color and a response pattern encoding form added on top of the neuron's average response to all stimuli (McClurkin et al., 1996).
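The multiplicative model described above can be written down in a few lines. The sketch below is a schematic reading of that verbal description, not the authors' implementation: the temporal patterns are made-up numbers, the average response is treated as a constant for simplicity, and both function names are hypothetical.

```python
def multiplexed_response(mean_response, color_pattern, form_pattern):
    """Response time course = average response + color pattern x form pattern."""
    if len(color_pattern) != len(form_pattern):
        raise ValueError("temporal patterns must have the same length")
    return [mean_response + c * f for c, f in zip(color_pattern, form_pattern)]

def recover_color_pattern(response, mean_response, form_pattern):
    """Invert the model for the color pattern, given a known form pattern."""
    return [(r - mean_response) / f for r, f in zip(response, form_pattern)]

mean_response = 10.0            # hypothetical average response to all stimuli
color_pattern = [1.0, 2.0, 0.5]  # hypothetical temporal code for one color
form_pattern = [2.0, -1.0, 4.0]  # hypothetical temporal code for one form

response = multiplexed_response(mean_response, color_pattern, form_pattern)
print(response)
print(recover_color_pattern(response, mean_response, form_pattern))
```

The point of the sketch is that a single time course can carry both features: if one temporal pattern is known (and nonzero at every time point), the other can be read out from the same response.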

#### **FEATURE INTEGRATION THEORY AND VISUAL SEARCH**

The psychophysical literature suggests early integration of color and form information. This calls into question theories proposing that visual feature integration takes place in a late stage of processing. Among these theories, feature integration theory (FIT; Treisman and Gelade, 1980) has been the most influential. FIT claims that color and orientation are initially processed in parallel and preattentively. As a result, the detection time of a red target remains approximately constant irrespective of the number of green distracters in the visual scene. Note that the target has a basic feature that is not shared by the distracters. By contrast, the detection time of a horizontal red target amongst horizontal green or vertical red distracters increases as the number of distracters grows. Here, the target shares a basic feature with each of the distracters so only their combination is distinctive. Searching for targets defined by a combination of features is done serially, on an item-by-item basis (Treisman and Gelade, 1980). Perceptual integration, therefore, involves attention.
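The contrast between the two search modes can be expressed as a toy reaction-time model. The sketch assumes a serial, self-terminating search on target-present trials, in which on average half of the items (plus one half) are inspected before the target is found; the base time and the cost per item are hypothetical values chosen only for illustration.

```python
def feature_search_rt(set_size, base_ms=450.0):
    """Parallel (pop-out) search: detection time does not depend on set size."""
    return base_ms

def conjunction_search_rt(set_size, base_ms=450.0, ms_per_item=50.0):
    """Serial self-terminating search on target-present trials.

    With the target equally likely to be inspected at any position,
    the expected number of items inspected is (set_size + 1) / 2.
    """
    return base_ms + ms_per_item * (set_size + 1) / 2

for n in (1, 5, 15, 31):
    print(n, feature_search_rt(n), conjunction_search_rt(n))
```

In this sketch the feature-search function is flat while the conjunction-search function rises linearly with the number of items, which is the signature pattern that FIT was designed to explain.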

In line with FIT, several authors have proposed biologically plausible models of visual search in which visual stimuli are processed in parallel by feature maps, each covering the entire visual field and representing a single basic visual feature. Feature maps identify locations in the visual field where the feature they represent is different from its surrounding. All the feature maps then feed into a saliency map which codes for conspicuous locations irrespective of the visual feature that stands out (Koch and Ullman, 1985; Wolfe, 1994; Itti et al., 1998). The existence of a feature map for each feature does not necessarily imply an independent physiological locus for that map (Wolfe, 1994).
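The feature-map and saliency-map scheme described above can be sketched as follows. This is a deliberately simplified illustration, not the algorithm of any of the cited models: local contrast is assumed to be the absolute difference between a location and the mean of its immediate neighbours, the maps are combined by plain summation, and all names are hypothetical.

```python
def local_contrast(feature_map):
    """|value - mean of neighbouring values| at each location of a 2D map."""
    rows, cols = len(feature_map), len(feature_map[0])
    contrast = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            neighbours = [
                feature_map[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if neighbours:
                mean_neighbour = sum(neighbours) / len(neighbours)
                contrast[r][c] = abs(feature_map[r][c] - mean_neighbour)
    return contrast

def saliency_map(feature_maps):
    """Sum the local-contrast maps of all features, location by location."""
    contrasts = [local_contrast(m) for m in feature_maps]
    rows, cols = len(contrasts[0]), len(contrasts[0][0])
    return [[sum(cm[r][c] for cm in contrasts) for c in range(cols)]
            for r in range(rows)]

def most_salient_location(saliency):
    """Return the (row, column) of the largest saliency value."""
    locations = ((r, c) for r in range(len(saliency))
                 for c in range(len(saliency[0])))
    return max(locations, key=lambda rc: saliency[rc[0]][rc[1]])

# One odd-colored item in an otherwise uniform display (illustrative values).
color_map = [[0.0] * 5 for _ in range(5)]
color_map[2][3] = 1.0
orientation_map = [[0.0] * 5 for _ in range(5)]

saliency = saliency_map([color_map, orientation_map])
print(most_salient_location(saliency))
```

Note that the saliency map in this sketch carries no record of which feature produced the peak, matching the description of a map that codes conspicuous locations irrespective of the feature that stands out.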

The locus of the saliency map is not clear; based on neurophysiological or imaging data, several candidate regions have been proposed in the parietal cortex (Gottlieb et al., 1998; Geng and Mangun, 2009), V4 (Mazer and Gallant, 2003), FEF (Thompson and Bichot, 2005; Serences and Yantis, 2007), and superior colliculus (Kustov and Robinson, 1996). Li (1999, 2002) has proposed that V1 acts as a saliency map and that no separate layer of feature maps is needed; the receptive fields of the neurons that have the highest responses (regardless of the neurons' feature selectivity) indicate the salient location(s). Recent physiological evidence in humans is consistent with this proposal (Zhang et al., 2012).

A number of results, however, have contested the interpretations of FIT and related computational models. For instance, visual search for targets defined by a conjunction of motion and form features (McLeod et al., 1988) and for 3D shapes (Enns and Rensink, 1990) happens in parallel. Visual search for targets and distracters oriented differently can be serial for certain orientation combinations (Wolfe, 1994), even though neurons encode orientation in early visual areas. Finally, visual search that initially was serial for certain stimuli can become parallel with practice (Sireteanu and Rettenbach, 1995).

Why, if color and form are not segregated, does search for a unique feature appear parallel, while search for a conjunction of color and form appears serial? Recently, Rosenholtz et al. (2012) proposed a model that aims to explain these results. The model assumes that the visual system computes a set of summary statistics pooled over local regions that cover the whole visual field. The local regions grow linearly with eccentricity so as to represent the degraded resolution of the visual system for peripheral locations, attributed to the larger receptive fields in the periphery compared to the fovea. During a search task, the visual system has to discriminate the summary statistics of peripheral regions with distracters only from those containing the target. If peripheral vision can discriminate the target from the distracters, visual search will be parallel, because the subjects will have information that will guide their eyes to the target right away. If peripheral vision cannot discriminate the target from the distracters, visual search has to be serial because subjects will not have information on where to move their eyes to find the target. In the context of this model, feature binding is largely independent of top-down attention; search performance depends on the amount of information loss of the visual system, mainly in the periphery. Thus, the model could operate with either segregated or integrated processing of features in early visual cortex. Rosenholtz (2011) suggested that summary statistics may be computed in multiple color bands, possibly including correlations across bands. Computing summary statistics within a color band means computing responses of orientation-selective, band-pass filters within a color band, reminiscent of filters that are both orientation and color selective.
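The core idea of pooling summary statistics over regions that grow with eccentricity can be sketched as follows. This is a strongly reduced illustration and not the model of Rosenholtz et al. (2012): each item carries a single scalar feature value, the only statistics computed are the mean and variance, and the scaling factor of 0.5 between eccentricity and pooling radius is an assumed value.

```python
import math
from statistics import mean, pvariance

def pooling_radius(eccentricity_deg, scale=0.5):
    """Pooling regions grow linearly with eccentricity (scale is assumed)."""
    return scale * eccentricity_deg

def summary_statistics(items, center, fixation=(0.0, 0.0), scale=0.5):
    """Mean and variance of the feature values pooled around `center`.

    `items` is a list of (x, y, feature_value) tuples in degrees.
    Returns None if no item falls inside the pooling region.
    """
    radius = pooling_radius(math.dist(center, fixation), scale)
    pooled = [value for (x, y, value) in items
              if math.dist((x, y), center) <= radius]
    if not pooled:
        return None
    return mean(pooled), pvariance(pooled)

# Hypothetical display: distracters have value 0.0, the target has value 1.0.
ITEMS = [
    (7.0, 0.0, 0.0), (8.0, 1.0, 0.0), (9.0, 0.0, 0.0), (8.0, -1.0, 0.0),
    (-7.0, 0.0, 0.0), (-8.0, 1.0, 0.0), (-9.0, 0.0, 1.0), (-8.0, -1.0, 0.0),
]

distracter_stats = summary_statistics(ITEMS, center=(8.0, 0.0))
target_stats = summary_statistics(ITEMS, center=(-8.0, 0.0))
print(distracter_stats, target_stats)
```

In this sketch the patch containing the target yields different pooled statistics from the distracter-only patch, so the target's location would be available to guide an eye movement; when the statistics of the two kinds of patch are indistinguishable, search would have to proceed patch by patch.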

In sum, psychophysical evidence supports an integrated rather than a segregated view on color and form processing. Model studies show the viability of such integrated views, or are at least agnostic with respect to the controversy. In the following section we consider the classical neurophysiological studies in support of segregation of color and form in early visual cortex and contrast them with more recent findings that show significant intermixing of color and form in the same areas.

#### **SEGREGATED OR INTEGRATED SELECTIVITIES OF SINGLE NEURONS IN EARLY VISUAL CORTEX?**

Early, influential studies on cortical processing have shown evidence of spatially separate populations of neurons being sensitive to different features of a visual scene. For one, studies in V1 and V2 of the primate cortex have indicated regions with distinct anatomical characteristics. Staining for the mitochondrial enzyme cytochrome oxidase (CO) revealed alternating dark and light regions in layers 2/3 of V1, a result that indicates high and low concentrations of CO, respectively, in V1 (Humphrey and Hendrickson, 1980; Horton and Hubel, 1981). The darker stains in sections tangential to the cortical surface were coined *blobs*, in accordance with their three dimensional, oval shapes, and the lighter stained regions were called *interblobs*. V2 shows a different, but equally interesting pattern of patches when stained for CO. Instead of oval shapes, tangential sections show an alternation of dark stripe and light interstripe regions; the dark stripes are of two types: thick and thin ones (Livingstone and Hubel, 1982; Tootell et al., 1983). From

Livingstone and Hubel (1984, 1988) proposed that the anatomically segregated regions in early visual cortex have distinct functional properties. They suggested a link between the CO regions and the magnocellular (M) and parvocellular (P) retino-geniculo-cortical pathways. Whereas the M pathway projects from layer 4B in V1 to the thick stripes in V2 and is selective for depth and motion, the P pathway is subdivided into two streams: one passing through the *blobs* in V1 and the thin stripes in V2 that mediates color, and another one passing through the *interblobs* in V1 and the interstripes in V2 that mediates form (**Figure 1**). The authors concluded that double opponent cells (cells exhibiting both color and spatial opponency) in V1 *blobs* are not orientation selective and have low spatial acuity. Edges, they suggested, are signaled by cells in the *interblob* area in V1. While these cells are orientation selective, they are not color opponent; they can respond to a luminance or color edge regardless of its color but cannot code the color information of the edge (Hubel and Livingstone, 1987; Livingstone and Hubel, 1988). Additional physiological studies have supported the idea that within area V2, separate anatomical regions have distinct functional properties (Hubel and Livingstone, 1985, 1987; Shipp and Zeki, 1985; Tootell and Hamilton, 1989; Ts'o et al., 1990; Malach et al., 1994; Roe and Ts'o, 1995; Moutoussis and Zeki, 2002).

Since then, a number of electrophysiological studies have challenged this segregated view on V1 and V2. Lennie et al. (1990) measured the responses of cells in layers 2/3 of V1 and found that cells inside and outside *blobs* did not have different chromatic properties. Friedman et al. (2003) measured, in layers 2/3 of V1 and in V2, the selectivity of cells for color, orientation and border position in alert macaque monkeys. They found no correlation between any of the selectivities. Leventhal et al. (1995) recorded cells from layers 2/3 and 4 in V1 and also found no correlation between orientation and color selectivity. Clearly, based on the segregation view, a negative correlation would have been predicted. As in Lennie et al. (1990), no difference was observed in the response properties between cells outside and inside the V1 *blobs* (Leventhal et al., 1995). Using implanted 100-electrode arrays in V1, Economides et al. (2011) found very subtle differences in orientation tuning between neurons in *blobs* and *interblobs*; the mean orientation bandwidth of cells in *blobs* was 28.4° and in *interblobs* 25.8°. The most pronounced difference was in activity: *blob* cells had 49% higher firing rates than *interblob* cells.

A CO *blob* system has also been found in primates with no color vision (Condo and Casagrande, 1990). O'Keefe et al. (1998) measured the response of V1 neurons in the nocturnal, New World monkey (a species containing only a single cone type). They found no difference in orientation tuning, eye dominance, temporal frequency tuning and contrast response for neurons in *blobs* and *interblobs*. The repeating anatomical patterns found in the visual cortex and other parts of the brain, Purves et al. (1992) argued, do not reflect a fundamental functional principle but rather are byproducts of developmental requirements.

**FIGURE 1 | Schematic representation of an early segregation model of visual information pathways from the retina to V2.** Parasol cells in the retina are linked to the magnocellular pathway. They project to layers 1 and 2 of LGN, continue to layer 4Cα of V1, and then from layer 4B of V1 they project to the thick stripes of V2. This pathway conveys information about motion and stereo. Midget cells in the retina are part of the parvocellular pathway; they project to layers 3–6 of LGN and on to layer 4Cβ of V1. From then on they split into two streams. The stream that conveys information about color projects to the blobs in layers 2/3 of V1 and then to the thin stripes in V2. The stream that conveys information about form projects to the interblob area in layers 2/3 of V1, and then to the interstripes in V2 (drawn by Anastasia Lavdaniti; anastasialavdaniti@gmail.com).

Johnson et al. (2001) divided the neurons from which they recorded in V1 into three groups, depending on their sensitivity to color and spatial patterns of luminance. Most of the neurons strongly preferred luminance patterns compared to colored ones (60% luminance cells); fewer neurons showed strong color selectivity (11% color cells). Interestingly enough, a considerable percentage of neurons were selective to both luminance and color patterns (29% color-luminance cells). Color-luminance cells did not respond or responded poorly to patterns of low spatial frequency (<0.5 cycles per degree); instead they showed a band-pass tuning similar to luminance cells. Most color cells were low pass in their spatial frequency tuning. In a later study Johnson et al. (2004) concluded that color-luminance cells are double opponent. In contrast to the Hubel and Livingstone studies, Johnson et al. (2008) found that most double opponent color-luminance cells are also orientation selective.

In reviewing the functional segregation of early visual areas, Gegenfurtner (2003) collected results from six studies in which cells from the distinct CO compartments in V2 (thin stripes, thick stripes, interstripes) were examined (DeYoe and Van Essen, 1985; Peterhans and von der Heydt, 1993; Levitt et al., 1994; Roe and Ts'o, 1995; Gegenfurtner et al., 1996; Kiper et al., 1997). According to the segregation perspective, cells in the interstripes are selective to form and cells in the thin stripes are selective to color. The averages from these studies confirmed that cells in the thin stripes are most selective for color, cells in the thick stripes and interstripes are most selective for orientation and cells in the thick stripes are most selective for direction of movement (**Figure 2**). Nevertheless, cells in each compartment were selective for other features as well. The results show that to a considerable extent, the selectivities within both the interstripe and thin stripe regions are mixed, especially for color and form. Around 30% of cells in the interstripes are selective for color and around 40% of cells in the thin stripes are selective for form.

Several studies discussed in this section have questioned the hypothesis that neurons in different CO compartments process separate dimensions of the visual scene. If, on the one hand, there is some degree of anatomical and functional specialization in the brain, why are there mixed selectivities in the different CO compartments? And why are there neurons with more than one selectivity in the early visual cortex? If, on the other hand, there is no anatomical and functional segregation in the brain, why is there a bias for certain features in different CO compartments?

All of the above-mentioned electrophysiological studies analyzed neural activity as if each neuron acted as an independent computational unit, i.e., without considering the possible role of interactions between neurons. Individual neurons with broad selectivities to color or to orientation were categorized as non-selective to color or to form, respectively. Yet, perhaps, perceptually significant information does not arise at the single neuron level, but from a population of neurons. The combination of responses from a population of neurons may reveal robust decoding for conditions where individual neurons show broad selectivity. Neurons, as we discuss in this section, could have mixed selectivities with unequal tuning widths for different features; however, a population code consisting of inputs from such neurons may show sharp tuning for all features. In the next section we discuss studies that examine possible ways in which a population of neurons can encode information and which attributes of neurons make the encoding of information optimal.

#### **DISTRIBUTED PROCESSING**

Early influential studies on color and form processing promoted the view that perceptually significant information arises at the single neuron level (Zeki, 1978; Livingstone and Hubel, 1988). An analysis adhering to this view can overlook the possibility that weak selectivities at the single neuron level encode information at the population level. Several lines of evidence support the notion that the brain processes information by combining signals from neuronal populations. Firstly, repeated presentations of the same stimulus evoke considerably variable responses from a single neuron (Tolhurst et al., 1983; Vogels and Orban, 1990). If the activity of single neurons represented perceptually significant activity we would expect less variability in the response of a neuron after repeated presentations of the same stimulus. Secondly, single neurons in the visual cortex have weak correlations with behavioral decisions (Britten et al., 1996; Shadlen et al., 1996). Finally, the structural features of neurons suggest the formation of distributed circuits with long range connectivity (Alexander and van Leeuwen, 2010; Yuste, 2011).

In contrast to the single neuron viewpoint, the response of a single neuron is ambiguous by itself and can only provide sufficient information if considered in conjunction with the responses of the rest of the neurons forming a network. In line with this perspective, Lehky and Sejnowski (1988) showed that selectivity of single neural units could give misleading information on the function of a neural network. Population coding analysis examines how information is represented by the pattern of activity in a group of neurons. In an influential study on population coding, Georgopoulos et al. (1986) represented the activity of each neuron recorded in the arm area of the primate motor cortex as a vector pointing in a specific direction in 3-D space. The vector associated with each cell was weighted according to the activity of that cell, and then all the vectors were summed. The direction of the vector sum was in close approximation to the direction of the arm movement of the monkey despite the broad tuning of single cells.
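The population-vector computation can be illustrated with a minimal sketch. It is a simplified two-dimensional version (the original study used directions in 3-D space), and the cosine tuning curve, the baseline and gain values, the number of cells, and all function names are assumptions made for illustration, not details taken from Georgopoulos et al. (1986).

```python
import math

def cosine_tuned_rate(preferred, movement, baseline=10.0, gain=8.0):
    """Firing rate of a broadly tuned model cell (hypothetical baseline and gain)."""
    return baseline + gain * math.cos(movement - preferred)

def population_vector(preferred_dirs, rates):
    """Sum each cell's preferred-direction unit vector, weighted by its rate,
    and return the direction (in radians) of the resulting vector."""
    x = sum(r * math.cos(d) for d, r in zip(preferred_dirs, rates))
    y = sum(r * math.sin(d) for d, r in zip(preferred_dirs, rates))
    return math.atan2(y, x)

# Assumed population: 16 cells with evenly spaced preferred directions.
n_cells = 16
preferred = [2 * math.pi * i / n_cells for i in range(n_cells)]

movement = math.radians(70.0)  # direction to be decoded
rates = [cosine_tuned_rate(p, movement) for p in preferred]
decoded = population_vector(preferred, rates)

print(f"decoded direction: {math.degrees(decoded):.2f} deg")
```

Each model cell responds over a wide range of directions, yet with evenly spaced preferred directions the summed vector points along the movement direction.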

Wachtler et al. (2003) examined whether the activity of a population of neurons in macaque V1 can represent color perception. Population responses were expressed as vectors, with each element of the vectors representing the activity of a single neuron. The authors found that distributed neural response changes with different backgrounds corresponded with induction effects in color perception (shown in a follow-up experiment with human participants). An example of the authors' analysis is shown in **Figure 3**. In **Figure 3A**, color patches (c) and (b) are physically identical but appear different because they are displayed on different backgrounds. Furthermore, color patches (a) and (b) appear similar even though they are physically different. In **Figure 3B**, the pattern of responses of four neurons of patch (b) is more similar to that for patch (a) than for patch (c), indicating that the population response of neurons in V1 correlates with color perception. Note that decoding in this study is represented by a vector with the activities of all the neurons. In the Georgopoulos et al. (1986) study each neuron was represented by a position vector pointing at the preferred direction of that neuron; the decoded direction was given as the weighted average of all the vectors. Thus, perhaps, the rules for information processing from a population of neurons depend on the nature of the target feature.

Our understanding of how a population of neurons could represent information has been facilitated by studies that link machine learning principles with neural processing (Buonomano and Maass, 2009; Rigotti et al., 2013). In this framework, neurons that have small responses to a particular feature or to a combination of features can be crucial in the encoding of distributed information, whereas these weak selectivities would be hard to interpret in single neuron analysis. To date, this issue has predominantly been investigated in the prefrontal cortex. In the remainder of this section we focus on neurons in prefrontal cortex. We argue in the next section that if mixed selectivity is a property of neurons throughout the cortex, a simple assumption of non-linearity will enable us to explain the often conflicting results on the selectivity of neurons in early visual cortex discussed in the previous section.

In a recent study, Rigotti et al. (2013) analyzed activity of neural populations in prefrontal cortex (PFC) while monkeys performed a memory task. The authors showed that the dimensionality of the population code is higher when single neurons are tuned to a non-linear mixture of conditions compared to when they respond exclusively to one condition or a linear mixture of conditions. The concept of dimensionality generally refers to the minimum number of coordinates that are needed to fully specify all the points of a set of vectors. For example, two vectors that are linearly dependent (one is a multiple of the other) are one-dimensional (they lie on a line); if they are linearly independent they are two-dimensional (they lie in a plane). Higher dimensionality leads to a more versatile code since the number of possible classifications of a linear classifier between two conditions grows exponentially with dimensionality. This means that a population of neurons that represent information in a high dimensional space has the capacity to perform complex tasks. **Figure 4** shows neurons with different selectivities and their effect on the dimensionality of a neural population code. Neurons 1 and 2 show pure selectivity to feature a and b of some stimuli, respectively, neuron 3 shows linearly mixed selectivity to both features and neuron 4 is non-linearly selective to both features (**Figure 4A**). In **Figure 4B**, the representation of the stimuli by the pure and linearly mixed neurons is low dimensional (it is on a line). However, in **Figure 4C** we see that if we substitute one of the neurons with a non-linearly mixed one, then the representation will be on a higher dimensional space (on a plane).
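The effect of substituting a non-linearly mixed neuron can be checked numerically with a small sketch. The response functions, the three stimulus conditions, and the helper names below are hypothetical choices made to mimic the situation in **Figure 4**; they are not the functions used by Rigotti et al. (2013). Dimensionality is measured here as the dimension of the smallest flat (line, plane) containing the three response points.

```python
import math

def rank(rows, tol=1e-9):
    """Rank of a small matrix via Gaussian elimination with partial pivoting."""
    m = [list(r) for r in rows]
    n_rows, n_cols = len(m), len(m[0])
    found = 0  # number of pivots found so far
    for col in range(n_cols):
        if found == n_rows:
            break
        pivot = max(range(found, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            continue  # no usable pivot in this column
        m[found], m[pivot] = m[pivot], m[found]
        for r in range(found + 1, n_rows):
            factor = m[r][col] / m[found][col]
            m[r] = [x - factor * y for x, y in zip(m[r], m[found])]
        found += 1
    return found

def affine_dimension(points):
    """Dimension of the smallest flat containing all points (1 = line, 2 = plane)."""
    n = len(points)
    mean = [sum(p[i] for p in points) / n for i in range(len(points[0]))]
    centered = [[x - mu for x, mu in zip(p, mean)] for p in points]
    return rank(centered)

# Hypothetical response functions of features a and b (both in the range 0-1).
def pure_a(a, b):
    return a

def pure_b(a, b):
    return b

def linear_mixed(a, b):
    return 0.5 * a + 0.5 * b

def nonlinear_mixed(a, b):
    # A bump centered at (0.5, 0.5): not a linear combination of a and b.
    return math.exp(-((a - 0.5) ** 2 + (b - 0.5) ** 2) / 0.1)

# Three assumed stimulus conditions (combinations of a and b).
conditions = [(0.2, 0.2), (0.5, 0.5), (0.8, 0.8)]

with_linear = [[pure_a(a, b), pure_b(a, b), linear_mixed(a, b)] for a, b in conditions]
with_nonlinear = [[pure_a(a, b), pure_b(a, b), nonlinear_mixed(a, b)] for a, b in conditions]

dim_linear = affine_dimension(with_linear)
dim_nonlinear = affine_dimension(with_nonlinear)
print(dim_linear, dim_nonlinear)
```

With the pure and linearly mixed neurons the three response points stay on a line, so no plane can isolate the middle condition from the other two; replacing the linearly mixed neuron by the non-linearly mixed one lifts the points onto a plane.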

Can we lose information about the selectivity of a neural population by averaging its responses? To address this question, Rigotti et al. (2013) removed the classical selectivity of a population of neurons for a set of conditions and then tested whether the conditions could still be predicted from the response of the neurons. Classical selectivity refers to the average differences between conditions. To remove classical selectivity the authors added noise that eventually makes the average responses between conditions equal. The population responses could still predict the condition at an above-chance level. Thus, on the one hand, the average responses of a group of neurons showed no significant differences between conditions; on the other hand, population coding could successfully differentiate between them. This result indicates that comparison of the average responses between conditions is not sufficient for the characterization of neuron responses and that these neurons have non-linearly mixed selectivities.

As we discuss in the following section, neurons as early as in V1 show complex selectivity and thus use a neural code that is of higher dimension than initially thought. Therefore, it is plausible that these neurons are non-linearly mixed. Simply averaging a population of neurons can then hide some of their selectivities. This could explain the studies with conflicting results discussed in the previous section.

From perceptron theory, it is well-known what is required for non-linear combination of selectivities. A single layer neural network can only solve linearly separable problems, and thus map similar inputs to similar outputs. Such networks cannot solve for instance the exclusive OR (XOR) problem. This problem can only be solved with the addition of a layer in the network. Barak et al. (2013) showed in a theoretical paper that starting from the extreme case of totally segregated (or linearly mixed) representations, the dimensionality of the code can increase with an intermediate layer of randomly connected neurons. Thus even if pre-cortical neurons code for single features, it is feasible for neurons as early as in V1 to have non-linearly mixed selectivities, from connections either within V1 or from higher cortical areas.
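The XOR limitation, and the way an intermediate non-linear unit removes it, can be demonstrated with a short sketch. Note the substitution: Barak et al. (2013) considered randomly connected intermediate neurons, whereas this example uses a single hand-picked rectifying unit so that the outcome is reproducible. The function names, the hidden unit, and the epoch limit are all assumptions for illustration.

```python
def train_perceptron(samples, labels, max_epochs=1000):
    """Classic perceptron learning rule.
    Returns the weights (bias last) once an epoch has no errors, else None."""
    w = [0.0] * (len(samples[0]) + 1)
    for _ in range(max_epochs):
        errors = 0
        for x, y in zip(samples, labels):
            xb = list(x) + [1.0]  # append constant input for the bias
            pred = 1 if sum(wi * xi for wi, xi in zip(w, xb)) > 0 else 0
            if pred != y:
                errors += 1
                sign = 1.0 if y == 1 else -1.0
                w = [wi + sign * xi for wi, xi in zip(w, xb)]
        if errors == 0:
            return w
    return None

def hidden_unit(a, b):
    """Hand-picked non-linear intermediate unit (rectified sum), an assumption."""
    return max(0.0, a + b - 1.0)

inputs = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
xor_labels = [0, 1, 1, 0]

# Single layer on the raw inputs: XOR is not linearly separable.
raw_solution = train_perceptron(inputs, xor_labels)

# Same learning rule, but reading out from inputs plus the intermediate unit.
expanded = [(a, b, hidden_unit(a, b)) for a, b in inputs]
expanded_solution = train_perceptron(expanded, xor_labels)

print("raw inputs solved:", raw_solution is not None)
print("with intermediate unit solved:", expanded_solution is not None)
```

The learning rule is identical in both cases; only the representation it reads from differs. Adding the non-linear unit raises the dimensionality of the representation enough for a linear readout to separate the two classes.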

Can a neural population represent separated signals if these signals are intermixed at the single neuron level? Mante et al. (2013) recorded from the PFC of macaque monkeys while they performed a color or a motion discrimination task on the same stimuli. The authors found that the representation of the color and motion features, and of the choice the monkeys made were separable at the population level but intermixed at the single neuron level. Separation of function at the neural population level but not at the single neuron level in PFC has been shown in other studies as well (Sigala et al., 2008; Machens, 2010; Machens et al., 2010; Stokes et al., 2013). As discussed previously, analysis of response patterns in V1, V2, and V4 showed that neurons can have multiplexed but separable selectivities to color and form (McClurkin and Optican, 1996; McClurkin et al., 1996).

**FIGURE 4 | Dimensionality of neural representations (taken from Rigotti et al., 2013). (A)** Contour plots of the firing rate of four neurons (spikes/sec). Their firing rate is shown as a function of conditions a and b which vary from 0–1. Neurons 1 and 2 are pure selective: they respond only to condition a and b, respectively. Neuron 3 is linearly mixed selective: its response is a linear combination of its firing rate to single parameters. Neuron 4 is non-linearly mixed: its response cannot be expressed as a linear combination of its firing rate to single parameters. The circles indicate the responses of the neurons for three different combinations of a and b. **(B)** The space of activities of the pure and linearly mixed neurons. **(C)**, as in **(B)**, with the only difference being that the axis where the linearly mixed neuron's response was represented is replaced by the axis that represents the response of the non-linearly mixed neuron. The circles represent the response of the neurons for the same combinations of conditions a and b as in **(A)**. In **(B)** we see that the response of the neurons lie in low dimensional space (a line). This low dimensional space limits the possible input output relationships that a linear classifier can implement. For example a linear decoder (a two dimensional plane in this case) cannot separate the black dot from the green dots. In **(C)** where the activity of the non-linearly mixed neuron is represented, a plane not only can separate the black dot from the green dots, but it can also separate any possible combination of the three dots. This is because the activity of the neurons lies in a higher dimensional space (a plane).

In light of the results on high level cognitive areas can we make any inferences on neural representations in early visual cortex? A prominent feature of non-linearly selective PFC cells is the complexity of their selectivity. Note that if the neural representation is high dimensional, a linear decoder can implement many input-output combinations; an attribute that is necessary for a population of neurons that perform complex tasks. As discussed earlier (Rigotti et al., 2013), neural populations that can perform complex tasks are suggestive of neurons with non-linearly mixed selectivities. In light of a number of studies showing that neurons in V1 can also show complex response properties previously attributed to higher order areas, we discuss in the next section the possibility that non-linearly mixed neurons are pervasive in the cortex.

#### **COMPLEX SELECTIVITY IN EARLY VISUAL CORTEX**

Neurons in the early stages of visual processing respond to visual stimuli within a local region in space, the classical receptive field. However, the responses of neurons to stimuli within their classical receptive field do not fully encompass their properties. Stimuli outside the neurons' classical receptive field do not elicit a response, but can modulate the response of the neurons to stimuli within their receptive field (Gilbert et al., 1996; Angelucci and Bressloff, 2006). An example of this modulation is surround suppression where, after a certain stimulus diameter, as the size of a stimulus centered within the receptive field of a neuron increases, the rate of firing of the neuron decreases (Hubel and Wiesel, 1965; Blakemore and Tobin, 1972; Nelson and Frost, 1978; Knierim and Van Essen, 1992; DeAngelis et al., 1994; Levitt and Lund, 1997; Adesnik et al., 2012). The properties of neurons in higher cortical areas are much more complex than the well-established classical and extraclassical receptive field properties of neurons in early visual cortex. Other studies, however, have indicated even more complex properties of cells in early visual cortex that may suggest that these cortical areas are more than just a relay to higher visual areas. As discussed in the previous section, complex selectivity of a population of neurons is indicative of non-linearly mixed selectivities at the single neuron level. Thus, complex selectivity in early visual cortex could suggest populations of neurons that are responsive to several features.

Recent experiments confirm that activity in V1 can be driven or modulated by prior expectations. In an fMRI study, Kok et al. (2013) used a forward model to predict the direction of random dot motion patterns from activity in the early visual areas. Their results indicated that experimental priors can change the contents of the neural representation in early sensory cortex. Keller et al. (2012) showed that a subset of cells in the primary visual cortex of mice responded only when there was a mismatch between what the mouse was expecting to see and what it actually saw while it was running. Interestingly enough, the cells that showed the strongest responses could also encode the degree of mismatch between expectation and actual visual feedback. McManus et al. (2011) recorded from monkeys performing a contour detection task and found that V1 neurons were selective to complex forms and that this selectivity could be modulated by the monkeys' expectation of the form.

Evidence from electrophysiological studies in monkeys has indicated that attention enhances the response of neurons with receptive fields that are within the focus of attention in all of the cortical areas along the ventral stream, including V1 (Moran and Desimone, 1985; Spitzer et al., 1988; Chelazzi et al., 1993; Luck et al., 1997; McAdams and Maunsell, 1999). Furthermore, it has been shown that V1 neurons have complex perceptual grouping properties previously assigned to higher areas. Lamme (1995) found that V1 neurons play a critical role in figure-ground segregation since they show response enhancement for stimuli presented in the figure compared to stimuli presented in the ground area. In a binocular disparity study, Sugita (1999) showed that some orientation selective neurons in V1 had a diminished response when bars were occluded by a patch, but restored their response when the patch had crossed disparity and thus appeared to be in front of the bars. The studies by Lamme (1995) and Sugita (1999) along with the study by Wachtler et al. (2003) discussed in the previous section suggest that the modulation of neurons depends on global context.

The studies discussed in this section indicate that neurons in early visual cortex are highly context dependent. The proposition that there exists a context dependent population of neurons which at the same time processes segregated features of the visual scene seems contradictory to us. Furthermore, the complex properties of cells discussed here suggest that the neural population code in early visual cortex is high dimensional. High dimensional codes arise from non-linearly mixed selectivities at the single neuron level. Non-linearly mixed selectivities at the single neuron level may also explain the often conflicting neurophysiological results in the early visual cortex on color and form processing. Population average response can hide some of the selectivities of a neural population. However, as the study by Wachtler et al. (2003) showed, consideration of the pattern of responses from a population of neurons can reveal the complex behavior of the neurons.

The hierarchical model of vision represents V1 cells as localized spatial filters that extract low level visual features to transfer this information to the higher levels. However, V1 cells' complex perceptual grouping properties suggest strong influences from horizontal and feedback connections from higher visual areas. In the next sections we examine the feedforward and recurrent modes of processing and possible ways they can modulate color and form selectivity of early visual cortical areas.

#### **FEEDFORWARD vs. RECURRENT PROCESSING OF COLOR AND FORM**

Thorpe et al. (1996) have shown that in a categorization task of complex natural images, ERP activity starts to differentiate for different visual targets approximately 150 ms after stimulus onset. Furthermore, a simple weighted summation of spike counts on a population of neurons in IT taken between 100–150 ms after an object is presented can decode object identity even with moderate changes in the object's position, scale, illumination, pose and clutter (DiCarlo et al., 2012). These results along with other electrophysiological (Keysers et al., 2001) and psychophysical (Potter et al., 2014) studies suggest that processing in the visual system can be fast and use mainly if not exclusively feedforward circuits.

However, in the brain, we also find feedback connections from higher order areas to lower ones (Kennedy and Bullier, 1985; Shipp and Zeki, 1989; Felleman and Van Essen, 1991). Feedback connections enable the receptive field properties of neurons to change dynamically, in order to adapt to differences in behavioral state, contextual influences or expectations (Gilbert and Li, 2013). They can also contribute to the disambiguation of noisy scenes (DiCarlo et al., 2012).

As discussed earlier, there are cells as early as in V1 that code for both color and orientation (Leventhal et al., 1995; Friedman et al., 2003). A feedforward model similar to the one proposed by Hubel and Wiesel (1962) could explain this kind of tuning; a V1 cell has oriented, color selective regions in its receptive field because it receives synaptic input from center surround, color opponent LGN cells. This model is physiologically plausible. However, it is unlikely that all processing occurs this way.

Incremental grouping theory proposes a link between different perceptual grouping mechanisms and feedforward and recurrent processing (Roelfsema, 2006; Roelfsema and Houtkamp, 2011). It distinguishes perceptual grouping mediated by base and incremental grouping mechanisms. The base grouping mechanism groups feature conjunctions/objects that are coded by individual cells. The base grouping mechanism can code features of different complexities; it includes cells in V1 that code for both color and orientation and in medial temporal lobe that are selective to specific individuals (Kreiman et al., 2002; Quiroga et al., 2005). It is fast, feedforward and happens in parallel. However, it would lead to a combinatorial explosion if single neurons coded for every possible combination of objects/features. The incremental grouping mechanism is used for objects/features that are not coded by single specialized neurons. It increases the response of a population of neurons that encode the features to be grouped via feedback and horizontal connections. This process is slow since the spread of neural enhancement resulting in perceptual grouping happens gradually. There has been neurophysiological evidence in support of a distinction between base and incremental processing (Roelfsema et al., 1998, 2004; Pooresmaeili et al., 2010). A recent study showed that in macaque, V1 integration spreads out in an approximately 300 ms period from the focus of attention, following perceptual grouping criteria (Wannig et al., 2011). Thus depending on the situation at hand, visual processing can operate in two modes: the feedforward one which is specific, strong and fast and the feedback one which is diffuse, weak and slow.

In incremental grouping, color and orientation are jointly coded during feedforward processing. Recurrent processing, however, may dynamically change the weights of color and orientation selectivity in early visual areas. Neurons as early as in V1 can slowly enhance their response via recurrent influences to signal the association of features to an object (Roelfsema et al., 1998). This may depend on contextual influences, including task or prior expectations.

Predictive coding theory also suggests a mode of visual processing different from the segregated one. The theory states that perceptual, cognitive and action-oriented processing follow a single general strategy, which uses top-down predictions to minimize prediction errors (Clark, 2012). This approach suggests that neuronal selectivity to a feature is not an intrinsic property but the result of interactions across levels of a processing hierarchy (Friston, 2003). Sensory neurons, rather than features *per se*, encode an error signal, i.e., they feed forward to hierarchically higher areas the discrepancy between the actual input and the top-down expectation (Egner et al., 2010). According to the predictive coding model, predictions are relayed via feedback connections, whereas prediction errors are conveyed via feedforward connections (Rao and Ballard, 1999). Hosoya et al. (2005) showed that retinal ganglion cells' spatio-temporal receptive fields change dynamically with the visual scene; this result is in line with the view that the raw signal, carried by the receptors, is transformed as early as in the retina by the first interneurons which encode deviations from predicted temporal and spatial structures (Srinivasan et al., 1982). Recent fMRI studies have also shown evidence in support of predictive coding in the visual cortex (Summerfield et al., 2008; Kok et al., 2012). For instance, Kok et al. (2012) found that the amplitude of the fMRI signal in early visual cortex was smaller when the stimulus was expected; typically when we see something that we expect the prediction error encoded in the brain is smaller compared to when we see something unexpected. This mode of processing, however, appears to be at odds with several electrophysiological studies (see Koch and Poggio, 1999 for a commentary).

In the predictive coding framework, context determines whether sensory neurons perform segregated or integrated processing. For example, if the color of some colorful shape is unexpected the visual system generates the prediction error related to the color only. As a result the neural response will indicate color selectivity which is segregated from form. However, if both the color and shape are unexpected, the prediction error will have information about both features, and the neural response will reflect integrated processing.
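The color and shape example can be written as a toy calculation. This is only a sketch of the general idea (error equals input minus prediction, and the prediction is then adjusted to reduce the error); the feature values on an arbitrary 0-1 scale, the update rate, and the function names are hypothetical and do not correspond to any specific published model.

```python
def prediction_error(sensory_input, prediction):
    """Feedforward signal in this toy model: input minus top-down prediction."""
    return {f: sensory_input[f] - prediction[f] for f in sensory_input}

def update_prediction(prediction, error, rate=0.5):
    """Feedback step: shift the prediction toward the input (assumed rate)."""
    return {f: prediction[f] + rate * error[f] for f in prediction}

# Hypothetical feature values on an arbitrary 0-1 scale.
expected = {"color": 0.8, "form": 0.3}
color_surprise = {"color": 0.2, "form": 0.3}  # only the color is unexpected
both_surprise = {"color": 0.2, "form": 0.9}   # color and form are unexpected

err_color = prediction_error(color_surprise, expected)
err_both = prediction_error(both_surprise, expected)

# Repeated exposure: the total error shrinks as the prediction adapts.
prediction = dict(expected)
magnitudes = []
for _ in range(5):
    err = prediction_error(both_surprise, prediction)
    magnitudes.append(sum(abs(v) for v in err.values()))
    prediction = update_prediction(prediction, err)

print(err_color)
print(err_both)
print([round(m, 3) for m in magnitudes])
```

When only the color deviates from expectation, the error signal carries no form component and so looks segregated; when both deviate, the same units carry a joint color and form signal.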

#### **CONCLUSION**

Early studies on visual processing indicated that different regions in the brain show biases in their selectivity for color and form. These results suggest that color and form are processed by distinct modules in the visual cortex. All of these studies assumed in their analyses that each neuron is an independent computational unit; thus weak selectivities at the single neuron level were disregarded. Meanwhile, these results were found to be at odds with some psychophysical and electrophysiological observations which suggested integrated processing of color and form in early visual cortex.

Studies on higher cortical areas have shown that visual representations and complex task conditions are represented by the distributed activity of a population of neurons. Here selectivities to a feature that appear weak at the single neuron level may encode that same feature robustly at the population level. Populations of neurons that perform complex tasks in PFC were shown to have non-linearly mixed selectivities at the single neuron level.

Recent studies have shown that early visual areas are not just passive relays of local information, but rather complex processing stages that incorporate global context and prior information. This behavior arises from the flow of information from horizontal and feedback connections that can dynamically adapt the selectivities of single neurons to the situation at hand. The increasing evidence that the early visual areas show this kind of complex selectivity suggests that population codes operate in a high dimensional space; this property makes it likely that single neurons have non-linearly mixed selectivities. Examining these selectivities at the single neuron level can be misleading. Based on the above evidence, we argue that color and form features not only are continuously interacting in our visual experience, but are also integrated rather than segregated in the visual cortex.

#### **ACKNOWLEDGMENTS**

Andrey R. Nikolaev and Cees van Leeuwen were aided by an Odysseus grant from the Flemish Organization for Science (FWO). We thank Dr. Steeve Laquitaine, Dr. Justin L. Gardner and Dr. Anthony J. DeCostanzo for helpful discussions. We also thank the editor for her help and availability throughout the review process and the reviewers for their constructive criticism and helpful suggestions/comments on the manuscript.

#### **REFERENCES**


Koch, C., and Poggio, T. (1999). Predicting the visual world: silence is golden. *Nat. Neurosci.* 2, 9–10. doi: 10.1038/4511


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 14 June 2013; paper pending published: 08 July 2013; accepted: 05 August 2014; published online: 27 October 2014.*

*Citation: Rentzeperis I, Nikolaev AR, Kiper DC and van Leeuwen C (2014) Distributed processing of color and form in the visual cortex. Front. Psychol. 5:932. doi: 10.3389/fpsyg.2014.00932*

*This article was submitted to Perception Science, a section of the journal Frontiers in Psychology.*

*Copyright © 2014 Rentzeperis, Nikolaev, Kiper and van Leeuwen. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## A distributed code for color in natural scenes derived from center-surround filtered cone signals

#### *Christian J. Kellner 1,2\* and Thomas Wachtler 1,2,3*

*<sup>1</sup> Department of Biology II, Ludwig-Maximilians-Universität München, Martinsried, Germany*

*<sup>2</sup> Graduate School of Systemic Neurosciences, Ludwig-Maximilians-Universität München, Martinsried, Germany*

*<sup>3</sup> Bernstein Center for Computational Neuroscience, Ludwig-Maximilians-Universität München, Martinsried, Germany*

#### *Edited by:*

*Galina Paramei, Liverpool Hope University, UK*

#### *Reviewed by:*

*Sérgio M. C. Nascimento, University of Minho, Portugal C. Alejandro Párraga, Universitat Autònoma de Barcelona, Spain*

#### *\*Correspondence:*

*Christian J. Kellner, Department of Biology II, Computational Neuroscience, Ludwig-Maximilians-Universität München, 82152 Martinsried, Germany e-mail: kellner@bio.lmu.de*

In the retina of trichromatic primates, chromatic information is encoded in an opponent fashion and transmitted to the lateral geniculate nucleus (LGN) and visual cortex via parallel pathways. Chromatic selectivities of neurons in the LGN form two separate clusters, corresponding to two classes of cone opponency. In the visual cortex, however, the chromatic selectivities are more distributed, which is in accordance with a population code for color. Previous studies of cone signals in natural scenes typically found opponent codes with chromatic selectivities corresponding to two directions in color space. Here we investigated how the non-linear spatio-chromatic filtering in the retina influences the encoding of color signals. Cone signals were derived from hyper-spectral images of natural scenes and preprocessed by center-surround filtering and rectification, resulting in parallel ON and OFF channels. Independent Component Analysis (ICA) on these signals yielded a highly sparse code with basis functions that showed spatio-chromatic selectivities. In contrast to previous analyses of linear transformations of cone signals, chromatic selectivities were not restricted to two main chromatic axes, but were more continuously distributed in color space, similar to the population code of color in the early visual cortex. Our results indicate that spatio-chromatic processing in the retina leads to a more distributed and more efficient code for natural scenes.

**Keywords: color vision, visual cortex, sparse coding, natural image statistics, population code, efficient coding**

#### **1. INTRODUCTION**

In the retina of trichromatic primates, spatio-chromatic processing of signals from long (L), medium (M), and short (S) wavelength selective cones by ON and OFF bipolar and ganglion cells with center-surround receptive fields leads to parallel pathways that carry both spatial and chromatic information to the lateral geniculate nucleus (LGN) and visual cortex (Lee, 2011). Chromatic signals are carried in pathways encoding differences between S cone signals and the signals of L and M cones, or differences between L and M cone signals, respectively (Mollon, 1989; Dacey, 2000; Solomon and Lennie, 2007; Lee et al., 2010). While the first, phylogenetically older pathway has low spatial selectivity and is thought to be specifically concerned with color information, the second pathway carries both spatial and chromatic information (Boycott and Wässle, 1999; Martin et al., 2011). Midget retinal ganglion cells have spatially center-surround receptive fields and in the fovea achieve their color opponency by antagonistic processing of signals from a single cone in the center and several cones in the surround of their receptive field. There is evidence for functional cone-type specificity beyond that arising from a single-cone center, but different studies have arrived at different conclusions (Reid and Shapley, 1992, 2002; Lee et al., 1998; Martin et al., 2001; Buzás et al., 2006; Field et al., 2010; Crook et al., 2011; Martin et al., 2011; Lee et al., 2012), and the question of the degree to which cone-type specific wiring contributes to midget receptive fields remains open.

Corresponding to the parallel pathways in the retina, color selectivities in the LGN cluster around the two cardinal axes of cone opponency (Derrington et al., 1984). In visual cortex, however, the representation of color is different. Chromatic information is encoded cortically by both opponent and non-opponent neurons (Lennie et al., 1990; Wachtler et al., 2003). Moreover, the preferences of cortical color-selective neurons are not restricted to two main axes of opponency, but are more distributed (Lennie et al., 1990), indicating a population code for color (Wachtler et al., 2003). The transformation from coding along cone opponency axes to a distributed representation in the cortex is not well understood, but according to the theory of efficient coding (Barlow, 1961, 2001) one hypothesis would be that this code is in some sense more efficient.

Previous studies investigating efficient codes for color in natural scenes have used independent component analysis (ICA), a method for finding a linear transformation that makes the resulting outputs as statistically independent as possible (Jutten and Herault, 1991; Bell and Sejnowski, 1997). Analyses of chromatic natural images using ICA revealed that opponent codes are efficient for encoding natural color stimuli. Typically, these studies found two main types of chromatic selectivity (Hoyer and Hyvärinen, 2000; Wachtler et al., 2001; Doi et al., 2003), which qualitatively resembled the representation in retina and LGN more than the coding properties in the visual cortex. While the discrepancies can be explained in part by the stimuli used in the experiments to determine color tuning in visual cortex (Caywood et al., 2004), broadly distributed color selectivities have also been found with other types of stimuli (Wachtler et al., 2003).

A reason for the lack of insights about the nature of the distributed cortical representation of color from previous studies could be that the assumed model of a linear transformation of cone signals was not appropriate. Comparison of efficient codes found by methods like ICA with properties of neurons in the visual system requires that visual processing can be approximated as a linear transformation of the cone signals. However, the spatio-chromatic processing in the retina transforms the cone signals in fundamentally nonlinear ways and a linear model may not be adequate. Moreover, spatial center-surround filtering as observed in retinal neurons removes much of the spatial correlations between signals of neighboring cones (Wachtler et al., 2007), which may enhance the relative contribution of chromatic variation. It is further conceivable, given the limited number of fibers in the optic nerve as compared to both the number of receptors in the retina and the number of neurons in primary visual cortex, that retinal processing is subject to different constraints and coding objectives than the representation in the visual cortex (Lee et al., 2002). Under this assumption, retinal processing could be considered as a preprocessing step separate from cortical processing (Doi et al., 2003), and it would be appropriate to perform the analyses not on the cone signals, but on the output signals of the retina. Here we investigated the consequences of nonlinear spatio-chromatic filtering in the retina for the efficient coding of chromatic information in natural scenes. We modeled the center-surround processing in the retina to obtain estimates of the signals in the different parallel retinal pathways carrying chromatic information and analyzed these signals by performing ICA.

#### **2. MATERIALS AND METHODS**

#### **2.1. IMAGE BASIS**

As image basis for the main analysis we used eight images from the hyper-spectral image database by Párraga et al. (1998) that were previously used to study efficient codes for color (Wachtler et al., 2001; Lee et al., 2002). We used the same set of images in order to make our results directly comparable with these studies. The images were recorded outdoors around Bristol, UK, under stable weather and lighting conditions (Párraga et al., 1998). In additional analyses, we used other images from the Párraga et al. dataset as well as from other datasets (see below). Images were 256 × 256 pixels in size, with each pixel subtending 0.056 × 0.056 degree of visual angle. Pixels corresponded to radiance values in 31 wavebands between 400 nm and 700 nm. Radiance values were derived from the raw data with the code provided by Párraga et al. (1998). In all scenes a Kodak GrayCard reflectance standard was present; the corresponding picture areas were ignored during analysis.

To control for potential misalignment between the color planes in the hyper-spectral images due to the relatively long acquisition time (Párraga et al., 1998), we estimated the drift between image planes by 2-d cross-correlation. In most cases, the misalignment was zero, and non-zero misalignments appeared unsystematic, with a maximum shift of 2 pixels. Repeating the analysis with images where these shifts had been corrected did not alter the findings. As an additional control we used the four images of the hyper-spectral dataset of Foster and Nascimento (Nascimento et al., 2002; Foster et al., 2006) that showed natural scenes. The individual images were recorded within 15 s, which largely excluded any misalignment of wavelength planes. The results of this analysis, as well as those of an analysis with all the images in this dataset that were larger than 600 × 600 pixels, did not change any of the findings.
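The drift estimate described above can be illustrated with a minimal NumPy sketch. It assumes integer-pixel shifts and uses a circular, FFT-based cross-correlation; the function name `estimate_shift` is hypothetical, and the original analysis may have used a different cross-correlation routine.

```python
import numpy as np

def estimate_shift(ref, img):
    """Estimate the integer (dy, dx) displacement of img relative to ref."""
    ref = ref - ref.mean()
    img = img - img.mean()
    # Circular cross-correlation computed in the Fourier domain
    xcorr = np.fft.ifft2(np.fft.fft2(img) * np.conj(np.fft.fft2(ref))).real
    peak = np.unravel_index(np.argmax(xcorr), xcorr.shape)
    # Peak indices beyond half the image size correspond to negative shifts
    return tuple(int(p) if p <= n // 2 else int(p) - n
                 for p, n in zip(peak, xcorr.shape))

# Synthetic check: shift a random plane by a known amount
rng = np.random.default_rng(0)
plane = rng.random((32, 32))
shifted = np.roll(plane, (2, -1), axis=(0, 1))
shift = estimate_shift(plane, shifted)
```

Applied to pairs of waveband planes, a result of `(0, 0)` would indicate no measurable misalignment at pixel resolution.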

#### **2.2. IMAGE FILTERING**

To take into account the spatio-chromatic filtering by the retina, three main processing steps were modeled: (a) transduction of photons to neural signals by the photoreceptors, (b) center-surround integration of cone signals, and (c) splitting of the signals into ON and OFF pathways, mediated by bipolar cells with center-surround receptive fields. **Figure 1** provides an overview of the entire filtering process.

To obtain cone excitations from natural scenes, for each of the images we computed the dot products of the pixel spectra with the vectors of cone sensitivities, resulting in a 256 × 256 × 3 matrix. Human cone sensitivity estimates were taken from Stockman et al. (1993). The center-surround integration stage was modeled by convolution of the image with a Mexican hat-like kernel. We used an approximation of a Difference-of-Gaussians, consisting of a 3 × 3 pixel matrix with a value of 1 at the center, −0.15 at the center pixel of each edge, and −0.1 at the corner pixels. This filtering assumed that the total weights of center and surround are balanced, and that each pixel of each pixel plane represents a ganglion cell with a single cone in the center. We assumed that the surround consists of all three cone types exerting an equal contribution at each location (mixed L,M,S surround), but tested other configurations as well. ON and OFF signal channels were generated by half-wave rectification of the filter outputs and their sign-inverted counterparts, respectively. After the rectification procedure we log-transformed every channel to mimic a compressive response function. Since the rectification step introduced zero values into the data and the natural logarithm diverges at zero, we added a dynamic offset to each channel. The offset was chosen such that all channels had the same dynamic range.
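The filtering stages described above can be sketched as follows. This is an illustration under stated assumptions, not the original implementation: the mixed surround is taken as the per-pixel mean of the L, M, and S planes, borders are trimmed by one pixel, the six output planes are ordered ON (L, M, S) then OFF (L, M, S), and the dynamic offset is replaced by a hypothetical fixed constant `eps`. All function names are illustrative.

```python
import numpy as np

# Surround weights of the 3x3 kernel (center weight is 1); they sum to 1,
# so center and surround are balanced as stated in the text.
SURROUND = np.array([[0.10, 0.15, 0.10],
                     [0.15, 0.00, 0.15],
                     [0.10, 0.15, 0.10]])

def cone_excitations(cube, sensitivities):
    """Dot product of pixel spectra (H, W, B) with cone sensitivities (B, 3)."""
    return cube @ sensitivities

def local_sum(plane, kernel):
    """Weighted 3x3 neighborhood sum over the valid image region."""
    h, w = plane.shape
    out = np.zeros((h - 2, w - 2))
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * plane[dy:dy + h - 2, dx:dx + w - 2]
    return out

def retina_filter(lms, eps=1e-3):
    """Return log-compressed ON and OFF planes, shape (H-2, W-2, 6)."""
    mixed = lms.mean(axis=2)                  # assumption: equal L, M, S weights
    surround = local_sum(mixed, SURROUND)
    center = lms[1:-1, 1:-1, :]               # single-cone center per plane
    opponent = center - surround[..., None]
    on = np.maximum(opponent, 0.0)            # half-wave rectification
    off = np.maximum(-opponent, 0.0)          # rectified sign-inverted signal
    # eps is a hypothetical fixed offset standing in for the dynamic offset
    return np.log(np.concatenate([on, off], axis=2) + eps)

filtered = retina_filter(np.ones((8, 8, 3)))
```

For a spatially and chromatically uniform input, center and surround cancel, so every output value equals `log(eps)`.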

For the analysis, 7 × 7 patches were selected randomly from the prefiltered data. ON and OFF pixel planes for all 3 cone classes were interleaved at each pixel. The resulting dimensionality of a single input data sample was thus 7 × 7 × 3 × 2 = 294.
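A minimal sketch of the patch sampling, assuming the prefiltered data are stored as an array with the six ON and OFF planes along the last axis. The function name `sample_patches` and the channel ordering are hypothetical. Flattening in row-major order keeps the six channel values of each pixel adjacent, which corresponds to interleaving at each pixel.

```python
import numpy as np

def sample_patches(filtered, n_patches, size=7, rng=None):
    """Draw random size x size patches and flatten each to one vector."""
    rng = np.random.default_rng() if rng is None else rng
    h, w, _ = filtered.shape
    ys = rng.integers(0, h - size + 1, n_patches)
    xs = rng.integers(0, w - size + 1, n_patches)
    # ravel() in C order places all channels of one pixel next to each other
    return np.stack([filtered[y:y + size, x:x + size, :].ravel()
                     for y, x in zip(ys, xs)])

demo = np.random.default_rng(0).random((254, 254, 6))
patches = sample_patches(demo, 10, rng=np.random.default_rng(1))
```

Each row of `patches` has 7 × 7 × 3 × 2 = 294 entries. If the 3 × 3 filtering trims a one-pixel border (an assumption), a 256 × 256 image yields 254 − 7 + 1 = 248 patch positions per axis, i.e., 248² = 61504 possible patches per image.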

#### **2.3. ICA**

ICA was proposed as a solution to the blind source separation problem and has been applied in various studies (Bell and Sejnowski, 1997; Wachtler et al., 2001; Lewicki, 2002) to learn efficient codes for visual stimuli. The ICA model assumes a linear mixture of statistically independent sources *s* (also often called causes), which is observed via a number of sensors. If no additive sensor noise is assumed, the problem can be written as:

$$\mathbf{x} = \mathbf{A} \,\, \mathbf{s} \tag{1}$$

Note that neither the sources *s* nor the mixing matrix *A* is known. The goal of ICA is to recover the sources by adapting *A* such that the resulting signals are as statistically independent as possible.

Once *A* has been inferred, the sources can be recovered by solving for *s*:

$$\mathbf{s} = \mathbf{A}^{-1} \, \mathbf{x} = \mathbf{W} \, \mathbf{x} \tag{2}$$

The columns of *A* are usually called the basis functions and the rows of *W* are called the filters.
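Equations (1) and (2) can be illustrated with a small synthetic example. Here the mixing matrix is known by construction, which is an assumption made purely for illustration; in the actual analysis *A* is unknown and has to be learned. Sizes and variable names are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sources, n_samples = 4, 1000

S = rng.laplace(size=(n_sources, n_samples))   # sparse, independent sources
A = rng.normal(size=(n_sources, n_sources))    # columns: basis functions
X = A @ S                                      # Equation (1): x = A s

W = np.linalg.inv(A)                           # rows: filters
S_rec = W @ X                                  # Equation (2): s = W x
```

With the true *A*, the recovered sources match the original ones up to numerical precision. A learned *A* can recover the sources only up to permutation and scaling.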

Here we used the approach by Lee and Lewicki (2000) with the learning rule for *A* given by:

$$
\Delta \mathbf{A} \propto \mathbf{A} \mathbf{A}^T \frac{\partial}{\partial \mathbf{A}} \log p(\mathbf{x} \mid \mathbf{A}) = -\mathbf{A} (z(\mathbf{s}) \mathbf{s}^T - \mathbf{I}).\tag{3}
$$

The individual terms are the identity matrix *I*, the transpose of the sources *s*<sup>T</sup>, and *z*(*s*) = ∂ log *p*(*s*)/∂*s*. The prior source distributions were modeled using the exponential power distribution (also known as the generalized Gaussian or generalized Laplacian). The simple form is:

$$p(s_i) \propto e^{-\frac{1}{2}|s_i|^{q_i}} \tag{4}$$

The kurtosis can be controlled by varying *q<sub>i</sub>*, and thus platykurtic, leptokurtic, and Gaussian distributions can be modeled. We used *q<sub>i</sub>* = 2/(1 + β<sub>i</sub>) and estimated β<sub>i</sub> during learning. Therefore no additional assumptions about the exact distribution of the sources were made a priori. As β<sub>i</sub> becomes larger, the distribution becomes more leptokurtic and the resulting code more sparse, meaning that the source coefficients are mostly close to zero.
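The relation between β<sub>i</sub> and kurtosis can be checked numerically. The sketch below uses the standard closed-form moments of a density proportional to exp(−|*s*|<sup>*q*</sup>/2), for which the kurtosis is Γ(5/*q*)Γ(1/*q*)/Γ(3/*q*)². The function names are illustrative, and β > −1 is assumed.

```python
import math

def exponent_q(beta):
    """Exponent q = 2 / (1 + beta) of the exponential power density."""
    return 2.0 / (1.0 + beta)

def power_kurtosis(beta):
    """Kurtosis (fourth moment over squared variance) for a given beta."""
    q = exponent_q(beta)
    return math.gamma(5.0 / q) * math.gamma(1.0 / q) / math.gamma(3.0 / q) ** 2

k_gauss = power_kurtosis(0.0)      # q = 2, Gaussian
k_laplace = power_kurtosis(1.0)    # q = 1, Laplacian
k_platy = power_kurtosis(-0.5)     # q = 4, platykurtic
```

A value of β = 0 gives the Gaussian kurtosis of 3, β = 1 gives the Laplacian value of 6, and negative β gives values below 3.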

The mixing matrix *A* was estimated in 100,000 iterations. It was initialized with Gaussian distributed random values and all priors were set to Gaussian densities. After every 400 iterations new input data were sampled by drawing 5000 patches randomly from each of the eight pictures and β<sub>i</sub> was re-estimated. All samples were centered and rescaled to have zero mean and unit variance. The stepsize was adjusted at iteration points 1000, 5000, 10000, 30000, 70000, to 0.02, 0.01, 0.005, 0.002, 0.001 and 0.0001 respectively. In order to accelerate the learning process, the algorithm was ported to the CUDA parallel computing architecture and run on an NVIDIA Tesla M2090 graphics processor.

#### **2.4. ANALYSIS OF THE RESULTS**

#### *2.4.1. Reverse correlation - activation triggered averages*

After learning, the mixing matrix *A* and the unmixing matrix *W* were adapted to the preprocessed data. Due to the non-invertible nonlinear filtering, the result was not a simple linear unmixing of LMS signals. Therefore, we used a reverse-correlation approach to illustrate the resulting filters: Source activations for each postfiltered patch were used as weights for the corresponding original patch in LMS-color space. By averaging over all weighted original patches we computed the activation triggered average (ATA), i.e., the average patch that would elicit the maximal response for a single basis function. The details of this procedure are as follows:

When the individual filters *w<sub>r</sub>* (rows of *W*) are used to perform the unmixing of the data, each individual source coefficient *s<sub>r</sub>* is a direct measure of the response of the filter *w<sub>r</sub>* to a given data sample *x<sub>r</sub>*. In our case the data samples were the preprocessed patches *p*(*i*), where the vector *i* = (*x*, *y*, *e*) specifies the patch position (*x*, *y*) in the preprocessed image *e*. Using the transformation between the preprocessed patches *p<sub>k</sub>*(*i*) and the patches in LMS space *p̂*(*i*), we can then calculate the average original patch *ATA<sub>r</sub>* that the individual filters *w<sub>r</sub>* best respond to by using the source coefficient derived from *p*(*i*) as a weight for *p̂*(*i*) and then averaging over all available *p̂*(*i*).

We therefore generated all possible 7 × 7 patches from each of the eight preprocessed images used for analysis (*N* = 8 × 61504 = 492032). The source activations were then estimated using equation (2). To eliminate noisy contributions of source coefficients with a very low absolute activation, i.e., source activations around zero, we fitted the source activations with an exponential power distribution. When the mean of the fit was close to zero and the distribution was leptokurtic, we used the cumulative distribution function *F*(*x*) to discard 95% of all the source activations around the peak (see **Figure 2**). Thus for each basis function and each patch in every image we computed a weight α<sub>r</sub>(*i*):

$$\alpha_r(\mathbf{i}) = \begin{cases} 0 & \text{if } F(\mathbf{x}) > 0.025 \land F(\mathbf{x}) < 0.975\\ s_r(\mathbf{i}) & \text{otherwise} \end{cases} \tag{5}$$

**Figure 2 |** … coefficient values showing a typical leptokurtic distribution with a peak around zero and heavy tails. The fit of the exponential power distribution to the data (β: 1.312, μ: 0.001, σ: 0.753) is shown in red and the cumulative distribution function *F*(*x*) is plotted in blue. **(C)** RGB rendering of the original image. **(D)** Contributions of basis function 116 to the image; red indicates high positive, blue high negative, and white no contribution.

To calculate the patch in LMS-space that each basis function would maximally respond to (the ATA), we weighted each original patch *p̂*(*i*) with the corresponding weight α<sub>r</sub>(*i*) that was calculated earlier and averaged over the result:

$$ATA_r = \frac{1}{N} \sum_{k=1}^{N} \alpha_r(k) \,\hat{p}(k) \tag{6}$$

We verified the plausibility of this approach by comparing the *ATA<sub>r</sub>* with the basis functions for the analysis of LMS input without preprocessing as in Lee et al. (2002). ATAs tended to be slightly less saturated than the corresponding basis functions but otherwise they resembled each other in color preference and spatial structure almost completely.
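The weighting and averaging in Equations (5) and (6) can be sketched as follows. Note the substitution: instead of the cumulative distribution function of a fitted exponential power distribution, this sketch uses empirical quantiles of the source activations, and it omits the check that the fit is leptokurtic with a mean near zero. The function name and array layout are hypothetical.

```python
import numpy as np

def activation_triggered_average(s_r, lms_patches, lo=0.025, hi=0.975):
    """Weighted average of LMS patches for one basis function.

    s_r:          (N,) source activations of basis function r
    lms_patches:  (N, D) corresponding original patches in LMS space
    """
    # Empirical quantiles stand in for the fitted CDF F(x) (assumption)
    q_lo, q_hi = np.quantile(s_r, [lo, hi])
    # Equation (5): discard the central 95% of activations
    alpha = np.where((s_r > q_lo) & (s_r < q_hi), 0.0, s_r)
    # Equation (6): average of the weighted patches over all N samples
    return alpha @ lms_patches / len(s_r)

s_demo = np.array([-10.0, -1.0, 0.0, 0.5, 1.0, 10.0])
p_demo = np.arange(12, dtype=float).reshape(6, 2)
ata_demo = activation_triggered_average(s_demo, p_demo)
```

In the toy example only the two extreme activations receive non-zero weight, so the result is determined by the first and last patch.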

#### *2.4.2. Plotting of the results*

We displayed the *ATA<sub>r</sub>* as shown in **Figure 3** with the method used by Ruderman et al. (1998). The L, M, and S components of each pixel of the patches were first normalized to values between 0 and 1 and then plotted as red (R), green (G) and blue (B) values. This gives a pseudo-color representation of relative cone excitations that is qualitatively similar to a true-color rendering. Therefore spatial as well as chromatic structure can be observed. To further illustrate the chromatic properties, each pixel was plotted as a point in a cone-opponent color space (Wachtler et al., 2001), where *x* values were computed as *x* = *L* − *M*, *y* values as *y* = *S* − (*L* + *M*)/2, and *z* values as *z* = *L* + *M*. When the *x*, *y*, *z* values are converted to a spherical representation *r*, θ, φ, the azimuth angle φ is a direct measure of the hue of a given point, while the radial distance *r* indicates its chroma and the elevation θ specifies the luminance. For plotting, points were projected along the *z*-axis onto an isoluminant plane. Luminance information can still be inferred from the brightness of the individual points.
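The color space transformation can be sketched in a few lines. The axis definitions follow the text; the spherical convention used here (azimuth measured in the *x*-*y* plane from the *x*-axis, elevation measured from that plane toward *z*) is an assumption, as are the function names.

```python
import numpy as np

def lms_to_opponent(lms):
    """Map (..., 3) LMS values to cone-opponent coordinates (x, y, z)."""
    L, M, S = lms[..., 0], lms[..., 1], lms[..., 2]
    return np.stack([L - M, S - (L + M) / 2.0, L + M], axis=-1)

def to_spherical(xyz):
    """Return radius r, elevation theta, and azimuth phi (degrees)."""
    x, y, z = xyz[..., 0], xyz[..., 1], xyz[..., 2]
    r = np.sqrt(x**2 + y**2 + z**2)
    elevation = np.degrees(np.arctan2(z, np.hypot(x, y)))
    azimuth = np.degrees(np.arctan2(y, x)) % 360.0
    return r, elevation, azimuth

pixels = np.array([[1.0, 0.0, 0.5],    # x = 1, y = 0, z = 1
                   [0.0, 0.0, 1.0]])   # x = 0, y = 1, z = 0
r, theta, phi = to_spherical(lms_to_opponent(pixels))
```

A pure S-cone modulation (second pixel) lies at an azimuth of 90 degrees, consistent with the angles reported for this color space in the Results.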

#### *2.4.3. Directions in color space*

Once transferred to the cone-opponent color space, the chromatic characteristics of each patch could be quantitatively studied. To quantify the degree of opponency of individual patches, i.e., whether the pixel selectivities were roughly aligned in color space, we performed principal component analysis on the color space coordinates of all the pixels in each patch. The highest eigenvalue was used as an estimate of the strength of opponency, and the eigenvectors were used to estimate the directions of opponency. Additionally, the average color preference for a given patch was calculated by the center of mass of all points.
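A minimal sketch of this per-patch analysis, assuming that only the two chromatic coordinates (*x*, *y*) of each pixel are used and that directions are reported as axis angles between 0 and 180 degrees. The function name `opponency` is illustrative.

```python
import numpy as np

def opponency(points):
    """PCA of pixel chromaticities for one patch.

    points: (n_pixels, 2) chromatic coordinates (x, y)
    Returns the largest eigenvalue, the angle of the main axis in degrees,
    and the center of mass.
    """
    center = points.mean(axis=0)
    cov = np.cov(points - center, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    main = eigvecs[:, -1]                       # direction of largest variance
    angle = np.degrees(np.arctan2(main[1], main[0])) % 180.0
    return eigvals[-1], angle, center

t = np.linspace(-1.0, 1.0, 11)
strength, angle, center = opponency(np.stack([t, t], axis=1))
```

For pixels lying exactly on a line through the origin, all variance falls on the first component and the angle gives the direction of that line.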

To quantify how uniformly a set of directions *O* were spread out in color space we calculated the Kullback-Leibler-Divergence (*D<sub>KL</sub>*) from a uniform distribution *U* with the same number of directions as *O*:

$$D_U(O) = D_{KL}(U \,\|\, O) = \sum_i \ln\left(\frac{U(i)}{O(i)}\right) U(i)$$

The higher the value of this measure, the higher the divergence from uniformity. This measure can then be used to compare different sets of directions *O<sub>n</sub>* derived from different ATAs.
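One way to evaluate such a divergence numerically is to bin the directions into a histogram. The sketch below is a simplified stand-in under explicit assumptions: the number of bins (`n_bins`) and the small pseudo-count that prevents division by zero in empty bins are hypothetical choices, and the original analysis may have constructed *U* and *O* differently, so absolute values are not comparable to those reported in the Results.

```python
import numpy as np

def divergence_from_uniformity(angles_deg, n_bins=36, pseudo=1e-6):
    """KL divergence D_KL(U || O) of binned directions from uniformity."""
    angles = np.asarray(angles_deg, dtype=float) % 360.0
    counts, _ = np.histogram(angles, bins=n_bins, range=(0.0, 360.0))
    o = counts + pseudo                 # pseudo-count avoids log(U/0)
    o = o / o.sum()
    u = np.full(n_bins, 1.0 / n_bins)
    return float(np.sum(u * np.log(u / o)))

d_uniform = divergence_from_uniformity(np.arange(0, 360, 10) + 5)
d_clustered = divergence_from_uniformity(np.full(36, 90.0))
```

Evenly spread directions give a divergence near zero, whereas directions concentrated in a single bin give a large value.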

#### **2.5. SPARSENESS CHARACTERIZATION**

To quantify the sparseness of the resulting code and thus its efficiency, we used the criteria proposed by Willmore and Tolhurst (2001): the mean lifetime kurtosis *K<sub>L</sub>*, the population kurtosis *K<sub>P</sub>*, and the dispersal of the learned code. Both kurtosis values were computed via the standard kurtosis measure. The lifetime kurtosis *K<sub>L</sub>* of the response, i.e., the source activation of a single component, is a measure of how active this component is across all stimuli. The population kurtosis *K<sub>P</sub>* quantifies how many filters are active to encode a single stimulus. A high average *K<sub>P</sub>* value means that only a small number of available filters are active for any given input. The dispersal of the code is a measure of the contribution of each filter to the encoding of the data. It is based on measuring the variance of the response of a filter to the image data. For a given code, the standard deviation of the responses of each filter is estimated for each image, normalized to the highest value, and the filters are sorted according to their normalized standard deviation. In a compact code only a few filters encode the majority of the total variance of the data, so the relative standard deviation will be high (close to one) for only a few filters and close to zero for all others. In a more dispersed code, where all individual filters make higher contributions to the data, the relative standard deviation will be higher for all filters.
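The three sparseness measures can be sketched as follows. Two assumptions are made: kurtosis is computed as excess kurtosis (fourth standardized moment minus 3), and the dispersal is computed over the pooled data rather than per image. Function names and the array layout are illustrative.

```python
import numpy as np

def excess_kurtosis(a, axis):
    """Fourth standardized moment minus 3 along the given axis."""
    a = a - a.mean(axis=axis, keepdims=True)
    m2 = (a**2).mean(axis=axis)
    m4 = (a**4).mean(axis=axis)
    return m4 / m2**2 - 3.0

def sparseness(S):
    """S: (n_units, n_stimuli) matrix of source activations."""
    k_life = excess_kurtosis(S, axis=1).mean()   # per unit, across stimuli
    k_pop = excess_kurtosis(S, axis=0).mean()    # per stimulus, across units
    sd = np.sort(S.std(axis=1))[::-1]            # response spread per unit
    dispersal = sd / sd[0]                       # normalized to the maximum
    return k_life, k_pop, dispersal

rng = np.random.default_rng(0)
k_life, k_pop, dispersal = sparseness(rng.laplace(size=(50, 5000)))
```

For Laplacian-distributed activations the lifetime kurtosis is close to 3, the theoretical excess kurtosis of that distribution. A dispersal curve that stays near one for all units indicates a dispersed code; a curve that drops quickly indicates a compact code.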

#### **3. RESULTS**

#### **3.1. SPATIO-CHROMATIC STRUCTURE**

ATAs for all 294 basis functions are shown in **Figure 3**. ATAs are sorted according to the *L*<sup>2</sup> norm of the corresponding basis functions (see also **Figure 4**). The *L*<sup>2</sup> norm can be used as a measure of the relative contribution of each basis function to the data. The chromaticities of the individual pixels of each ATA plotted in the cone-opponent color space are shown in **Figure 5**. Visual inspection of **Figures 3**, **5** suggests that almost all ATAs can be divided into three major categories: homogeneous chromatic, color-opponent, and achromatic. Homogeneous chromatic ATAs have a large *L*<sup>2</sup> norm, no defined spatial structure, and are highly selective for one color. Most non-homogeneous but chromatic ATAs were color-opponent, i.e., the pixel chromaticities, when plotted in color space, were all clustered along a line and most often also crossed into opposing quadrants. Their spatial structure was both localized and oriented, i.e., they encoded chromatic edges (cf. Wachtler et al., 2001). A small number of non-homogeneous chromatic ATAs were less strongly opponent, with their pixel values more scattered in color space. There was no substantial correlation between the *L*<sup>2</sup> norm of the basis function and the degree of opponency (*r* = 0.1). The achromatic ATAs, encoding luminance edges, had mid- to low-range contributions. This is a notable difference to previous findings (Wachtler et al., 2001; Lee et al., 2002), where many achromatic basis functions with high *L*<sup>2</sup> norm were found (see below).

**Figure 4 |** … source coefficients for each basis function (sorted as in **A**). **(C)** Lifetime kurtosis vs. the *L*<sup>2</sup> norm.

#### **3.2. DISTRIBUTION OF COLOR PREFERENCES**

To illustrate the overall color preferences of ATAs, we computed the center of mass of all pixels for a single ATA. The resulting positions are plotted in **Figure 6A**. Additionally, the direction of largest variation around the center of mass position is shown in **Figure 6B**. The center of mass positions were all densely clustered around the origin, indicating relatively weakly pronounced selectivities, with the exception of the homogeneous ATAs, which were more eccentric. All points together formed a distribution that was strongly elongated along a certain direction in color space. This direction, as estimated by the first principal component of the distribution, had an angle of 101.6 degrees in this color space. This matches closely the perceptual "yellow"-"blue" line, approximated by the line between the loci of monochromatic blue light of 476 nm and monochromatic yellow light of 576 nm (Mollon, 2006), which also lies close to the line of natural daylight variation and has an angle of 98.5 degrees in this color space.

**Figure 7** shows the distribution of color preferences across directions in color space. In contrast to previous results obtained without spatio-chromatic preprocessing (Wachtler et al., 2001; Lee et al., 2002), color preferences were spread around the entire color space. However, the distribution was not uniform but showed several regions of higher density. One of these regions was around 90 degrees, with pixel chromaticities varying between light-blue and dark-yellow. This corresponds to a modulation of values along a plane defined by S-cone and luminance variation. Many ATAs with this chromatic signature had localized and oriented spatial features that qualitatively resembled the structure of the basis functions found for natural gray-scale images (Olshausen and Field, 1996; Bell and Sejnowski, 1997) and the achromatic basis functions for L-, M-, and S-cone activations (Wachtler et al., 2001; Lee et al., 2002). The second region with higher density was around 130 degrees, which corresponds to an opponency axis between orange and teal. These regions correspond to the two opponency axes found in previous studies (Wachtler et al., 2001; Lee et al., 2002). Another more densely covered region appeared in the first quadrant around 65–80 degrees, and the region around 10–30 degrees was the least densely covered area. Apart from these modulations in density, the directions of color-opponent axes were widely distributed in color space, with a divergence from uniformity *D<sub>U</sub>*(*O*) of 16.65, compared to the code obtained from pure LMS cone activation (Wachtler et al., 2001; Lee et al., 2002), which had a *D<sub>U</sub>*(*O*) value of 25.58.

To further analyze the chromatic properties we determined for each ATA the color tuning of the pixel with the maximal absolute value. This is comparable to estimating the color preference for small colored spots. By using this measure the directions of color preference were even more uniformly distributed in color space, with a *D<sub>U</sub>*(*O*) of 2.59 for the filtered data as compared to 21.7 for LMS data.

#### **3.3. CODING EFFICIENCY**

Coding efficiency was originally understood in terms of redundancy reduction. Under this assumption a code is efficient if it reduces the mutual information between components, i.e., the information encoded among a group of neurons would be reduced as much as possible. Another measure of coding efficiency especially when dealing with a large number of encoding neurons, such as in the cortex, is the sparseness of the code, i.e., how many neurons of all that are available are used to encode a specific stimulus.

A quantitative study of the redundancy reduction efficacies of different linear filtering algorithms was carried out by Eichhorn et al. (2009). Multi-information reduction, Average Log-Loss and rate-distortion curves were used as evaluation criteria for various algorithms like ICA and Principal Component Analysis (PCA), which were all compared to a random decorrelation method that served as baseline. We used the source code provided with the paper and adapted it to process our prefiltered and rectified cone signal data. Even though we kept the changes to a minimum in order to stay as close as possible to the original analysis, it was not possible to use the NPL entropy estimator for the filtered data due to numerical instabilities. The most likely reason is that the distribution of the data after our preprocessing does not fit the model assumptions of the NPL entropy estimator. Therefore, we used the Gaussian upper entropy bound (Bethge, 2006). Our results are thus not directly comparable to those in Eichhorn et al. (2009). Nevertheless, the absolute multi-information reduction with respect to the random decorrelation transform (RND) was one order of magnitude better for ICA than for PCA, namely −0.4640 ± 0.0058 bits/component (ICA) and −0.0460 ± 0.0013 bits/component (PCA). The relative reduction in multi-information (cf. Table 1 in Eichhorn et al., 2009) compared to RND was 0.42 ± 0.01 percent for ICA, and 0.04 ± 0.00 percent for PCA. The Average Log-Loss (ALL) is a measure of how well the density model of the specific transformation matches the actual data, and differences in ALL correspond to differences in coding cost (Eichhorn et al., 2009). The difference for PCA-RND was −0.0481 ± 0.001 bits/component, for SSD-RND (spherically symmetric density) it was −0.2429 ± 0.0001 bits/component, and for ICA it was −0.4206 ± 0.0036 bits/component. Thus, in our case ICA performed best, i.e., it had the smallest ALL value.

To estimate the sparseness of the learned representation we computed the lifetime kurtosis *K<sub>L</sub>* of individual units and the population kurtosis *K<sub>P</sub>* (cf. Willmore and Tolhurst, 2001). **Figure 8** shows histograms for source coefficients together with their estimated lifetime kurtosis. All source densities are highly sparse (i.e., leptokurtic), with a pronounced acute peak at zero and heavy tails. The mean lifetime kurtosis *K<sub>L</sub>* over all units was 10.29, which means individual units were silent for almost all inputs but very strongly activated for specific input features. The mean population kurtosis over all inputs was 12.67, i.e., only a small subset of available neurons were active for any given input. In addition to lifetime and population kurtosis, the dispersal is an important measurement for the sparseness of the code. This measure, based on the standard deviation of the responses, quantifies the relative coding contribution of each filter (derived from the basis function) to the image data. **Figure 9** shows the dispersal for ICA, and as a comparison for PCA, which is an example of a compact code. PCA was done on the same preprocessed cone activation data. For ICA the decrease in coding contribution is close to linear, while for PCA it is exponential. This confirms that PCA is a much more compact code, i.e., only a few filters are used to encode the majority of the data, while in the ICA case most of the filters have a high contribution. Overall, these results show that, in accordance with previous studies (Lee et al., 2002), the obtained representation for the preprocessed images was a highly disperse and sparse, i.e., statistically efficient, code, although sparseness itself was not an enforced constraint during learning.

#### **4. DISCUSSION**

We investigated the consequences of nonlinear spatio-chromatic filtering similar to the processing in the retina, including the splitting into parallel ON and OFF color-opponent channels, for the learning of efficient codes from responses to natural scenes. Compared to the results of previous studies where ICA was used to learn efficient codes directly from LMS cone activations of natural images (Wachtler et al., 2001; Lee et al., 2002), chromatic preferences obtained from opponent signals were more broadly distributed in color space. A continuous distribution is in better accordance with experimental data (Lennie et al., 1990; Wachtler et al., 2003) than the strong clustering into three chromatic types observed previously. Additionally, it is also in closer correspondence with precortical encoding of color. The filtering we applied mimics the effect of center-surround receptive fields of retinal bipolar and ganglion cells, which removes redundancy both in the spatial and the spectral domain. In previous studies, whitening had been applied in a linear preprocessing stage before ICA. However, to estimate the results, this pre-filtering had to be taken into account by adding a corresponding linear transformation. In our analysis, such a direct compensation would not have been possible because the preprocessing stage was a nonlinear transformation. To represent the resulting components, we therefore used a reverse correlation technique to obtain a single-stage linear transformation representing the effective linear component of the multi-stage nonlinear filtering. A further effect of the preprocessing was the representation of the signals in the higher-dimensional space of six rectified opponent channels. This representation may have facilitated the distinction of features (Schölkopf and Smola, 2002). Similarly, the parallel channels in the retina and LGN provide such a high-dimensional representation, which might be exploited by cortical learning mechanisms.

The set of natural images was chosen to be the same as in Wachtler et al. (2001) and Lee et al. (2002) to enable direct comparison of the results. These images were initially chosen to include a variety of scenes recorded outdoors and under different illuminations. To exclude the possibility that our results were an artifact of the specific choice of images, we repeated the analysis including all outdoor images contained in the Párraga et al. (1998) dataset. The result was again a broad distribution of chromatic preferences with a divergence from uniformity of 15.91, compared to a value of 28.64 obtained for these images without pre-filtering. In addition, we ran an analysis using a larger patch size of 10 × 10 pixels. This resulted in an even broader spread of selectivities, with a divergence from uniformity of 11.09 compared to 24.27 obtained without pre-filtering.

For the center-surround spatio-chromatic filtering we used filters with a surround composed of equal contributions of all cone types. Often, the center-surround processing in the retina is likened to a whitening stage that removes second-order dependencies (Doi et al., 2003). Whitening filters for LMS images typically also have a cone-type specific center-surround structure. We repeated the analysis using whitening filters for the preprocessing. The color preferences of the resulting ICA ATAs were strongly clustered around a single region in color space, which is not in line with the observed color preferences in the visual system.

Our spatio-chromatic prefiltering mimicked the opponency of small bistratified ganglion cells and of midget cells under the assumption of a cone-type unspecific wiring of the surround. However, the exact composition of the surround of retinal receptive fields is unclear (Reid and Shapley, 1992, 2002; Lee et al., 1998, 2012; Martin et al., 2001; Buzás et al., 2006; Field et al., 2010; Crook et al., 2011; Martin et al., 2011). We determined how different surround compositions affect the distributions of chromatic preferences and the sparseness of the coding. We repeated the analysis with the same parameters but with different surround structures in the filtering stage. Besides the unspecific, mixed LMS surround we also used an unspecific mixed LM surround, a cone-type specific surround, and intermediate (mixed but biased) models for the surround (cf. **Table 1**). In addition we also used spatio-chromatic decorrelation via Zero-phase Whitening (Bell and Sejnowski, 1997). Compared to whitening, plausible retinal filtering led to more uniform distributions. Among the considered variants of retinal filtering, an equally balanced mixed surround with contributions from all cone types resulted in the smallest deviation from uniformity, but the other surround structures yielded similar values (cf. **Table 1**). Our results therefore do not provide a strong indication in favor of any specific surround organization. This suggests that in the real visual system there might be a high variation in the surround composition, which could explain why experimental evidence on the specificity of the surround has so far not been conclusive.

**Table 1 | Kullback-Leibler divergence from uniformity, mean lifetime kurtosis** *K<sub>L</sub>* **and mean population kurtosis** *K<sub>P</sub>* **for different surround configurations and preprocessing methods.**

*Rows 1–5 show results when surround configurations were altered while holding all other parameters constant. Additionally the results are shown for when spatiochromatic filtering via zero-phase whitening (Bell and Sejnowski, 1997) was performed instead of center-surround filtering (row six). The last row shows the results obtained from non-preprocessed pure LMS signals as in Lee et al. (2002).*

In addition to having more distributed preferences, the learned code for preprocessed data had all attributes one would expect from a sparse code in the cortex. The lifetime sparseness of individual components was high, but lower than in the case of unfiltered LMS data (10.29 vs. 21.40). On the other hand, the population kurtosis was drastically increased (12.67 vs. 4.86), meaning that only a small subset of all available units were active at the same time. This fits very well with the vast increase in number of neurons from LGN to visual cortex, which is paralleled in our study by the increase in dimensionality. Moreover, the code revealed by our analysis was also highly dispersed, i.e., different subsets of units were active for different stimuli. This is in contrast to a compact code like PCA, where only a few components are active as well, but it is always the same components that take part in the coding. Such an unequal distribution of activity would seem biologically implausible because a majority of the neurons would be there without making substantial contributions to the encoding of the stimuli.
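The two sparseness measures discussed above can be sketched as follows. This is a minimal illustration that assumes excess kurtosis (fourth standardized moment minus 3) as the kurtosis measure and a response matrix with one row per unit and one column per stimulus; the function names and the synthetic data are hypothetical.

```python
import numpy as np

def excess_kurtosis(x, axis):
    """Fourth standardized moment minus 3 (zero for a Gaussian)."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean(axis=axis, keepdims=True)
    var = (centered ** 2).mean(axis=axis)
    return (centered ** 4).mean(axis=axis) / var ** 2 - 3.0

def sparseness(responses):
    """responses: array of shape (n_units, n_stimuli)."""
    lifetime = excess_kurtosis(responses, axis=1).mean()    # per unit, across stimuli
    population = excess_kurtosis(responses, axis=0).mean()  # per stimulus, across units
    return lifetime, population

# Synthetic responses: a Laplacian code is sparser than a Gaussian one
rng = np.random.default_rng(0)
gaussian = rng.normal(size=(200, 5000))
laplacian = rng.laplace(size=(200, 5000))
print(sparseness(gaussian))
print(sparseness(laplacian))
```

Lifetime kurtosis describes how rarely a single unit responds strongly across stimuli, whereas population kurtosis describes how few units respond strongly to a single stimulus.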

A substantial proportion of ATAs (14.3%) had chromatic selectivities that corresponded to variation between light blue and dark yellow. Moreover, the overall distribution of color preferences also varied along one main direction in color space. Both of these axes of variation were very close to the perceptual blue-yellow axis and the line of variation of natural daylight illuminations (Mollon,


2006), which constitutes the main chromatic variation of natural scenes (Webster and Mollon, 1997) and was found in previous ICA analyses (Wachtler et al., 2001). It is also reflected in the peak of the distribution of color preferences in primary visual cortex (Wachtler et al., 2003). Our results support the conclusion that the statistics of natural scenes are an important factor in shaping the processing mechanisms of the visual system.

#### **ACKNOWLEDGMENTS**

We thank Garrett Greene and Christian Leibig for fruitful discussions and critical comments on the manuscript, Delwen Franzen for proofreading, and the reviewers for constructive comments which helped to improve the manuscript.

Supported by the Munich Graduate School of Systemic Neurosciences (GSN) and the Bernstein Center for Computational Neuroscience Munich (BMBF grant 01GQ1004A).



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 19 June 2013; paper pending published: 19 July 2013; accepted: 04 September 2013; published online: 27 September 2013.*

*Citation: Kellner CJ and Wachtler T (2013) A distributed code for color in natural scenes derived from center-surround filtered cone signals. Front. Psychol. 4:661. doi: 10.3389/fpsyg.2013.00661*

*This article was submitted to Perception Science, a section of the journal Frontiers in Psychology.*

*Copyright © 2013 Kellner and Wachtler. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Processing bimodal stimuli: integrality/separability of color and orientation

#### *David L. Bimler<sup>1</sup>\*, Chingis A. Izmailov<sup>2†</sup> and Galina V. Paramei<sup>3</sup>*

*<sup>1</sup> School of Arts, Development and Health Education, Massey University, Palmerston North, New Zealand*

*<sup>2</sup> Department of Psychophysiology, Faculty of Psychology, Moscow Lomonosov State University, Moscow, Russia*

*<sup>3</sup> Department of Psychology, Liverpool Hope University, Liverpool, UK*

#### *Edited by:*

*Cees Van Leeuwen, Katholieke Universiteit Leuven, Belgium*

#### *Reviewed by:*

*Ilias Rentzeperis, RIKEN Brain Science Institute, Japan Ansgar R. Koene, University of Birmingham, UK*

#### *\*Correspondence:*

*David L. Bimler, School of Arts, Development and Health Education, Massey University College of Education, Centennial Drive, Private Bag 11-222, Palmerston North 4442, New Zealand e-mail: d.bimler@massey.ac.nz*

*†Chingis A. Izmailov is deceased.*

We examined how two distinct stimulus features, orientation and color, interact as contributions to global stimulus dissimilarity. Five subjects rated dissimilarity between pairs of bars (*N* = 30) varying in color (four cardinal hues, plus white) and orientation (six angles at 30° intervals). An exploratory analysis with individual-differences multidimensional scaling (MDS) resulted in a 5D solution, with two dimensions required to accommodate the circular sequence of the angular attribute, and red-green, blue-yellow and achromatic axes for the color attribute. Weights of the orientation subspace relative to the color subspace varied among the subjects, from a 0.32:0.61 ratio to 0.53:0.44, emphasis shifting between color and orientation. In addition to the Euclidean metric, we modeled the interaction of color and orientation using Minkowski power metrics across a range of Minkowski exponents *p*, including the city-block (*p* = 1), Euclidean (*p* = 2) and Dominance metric (*p* → ∞) as special cases. For averaged data, *p* ∼ 1.3 provided the best fit, i.e., intermediate between separable and integral features. For individual subjects, however, the metric exponent varied significantly from *p* = 0.7 to *p* = 3.1, indicating a subject-specific rule for combining color and orientation, as in Tversky and Gati's variable-weights model. No relationship was apparent between dimensional weights and individual *p* exponents. Factors affecting dimensional integrality are discussed, including possible underlying neural mechanisms where the interaction of the low-level vision attributes orientation and color might shift between uncorrelated (*p* = 1) or correlated (*p* ≥ 2) forms.

**Keywords: color, orientation, bimodal stimuli, feature integration, multidimensional scaling, Minkowski metric, integral dimensions, separable dimensions**

#### **INTRODUCTION**

Researchers in visual perception frequently ask observers whether two stimuli are different, or how different they are. Ecologically valid stimuli can vary along more than one attribute, or in more than one visual sub-modality: that is, their description requires more than one dimension. The total inter-stimulus dissimilarity is then an aggregate of differences across multiple attributes, and the research question becomes one of how these differences interact.

A research tradition beginning with Attneave (1950) has focused on the special case of "integral" dimensions, where the attributes on which the stimuli are parameterized can be replaced with oblique linear combinations, intrinsically as good as the original parameters, because the inter-stimulus dissimilarities remain the same. The classic examples of integral dimensions are lightness and saturation in color space (e.g., Hyman and Well, 1967; Ashby and Townsend, 1986; Burns and Shepp, 1988). In Garner's words (1974, 199), "[p]sychologically, if dimensions are integral, they are not really perceived as dimensions at all. Dimensions exist for the experimenter [. . . ] but these are constructs [. . . ] and do not reflect the immediate perceptual experience of the subject in such experiments . . . ." That is, integral dimensions form a seamless Gestalt.

A second possibility is that the dimensions do not interact, with dissimilarity perceived as simply a linear summation of the absolute differences on each of the dimensions in isolation. Such non-interacting attributes have been dubbed "separable" or "analyzable" (Attneave, 1950; Shepard, 1964).

The integral/separable distinction is of interest for visual psychophysics because feature interaction is ubiquitous by nature and not restricted to explicit judgments of dissimilarity, but bears upon other tasks, such as stimulus classification (Garner and Felfoldy, 1970; Ashby, 1988), classification errors (Attneave, 1950; Shepard and Chang, 1963), and visual search (Treisman and Gormican, 1988; Koene and Zhaoping, 2007). The implications for perceptual mechanisms will be discussed below. However, integral and analyzable dimensions are not the only alternatives. Numerous models of feature integration can be subsumed within a family of Minkowski metrics or "*Lp* norms," characterized by a parameter *p*, the Minkowski exponent (e.g., Shepard, 1987). Writing Δ*D<sub>n</sub>* for the difference between two stimuli along the *n*-th dimension:

$$\text{Dissimilarity} = \left[\sum_{n} \Delta D_{n}^{p}\right]^{1/p} \tag{1}$$

Separable dimensions correspond to *p* = 1, the city-block metric (Householder and Landahl, 1945). Higher values define perceptual models where some level of integration, suppression, or competition occurs among the attributes. Integral dimensions are formally described by *p* = 2, the familiar Euclidean metric (the *L*<sup>2</sup> norm). Here Equation 1 reduces to the Pythagorean formula for distance.

In the limiting case of *p* → ∞, dissimilarity is dictated by the *maximum* of the differences along the separate dimensions or attributes; that is, a dimension is suppressed and does not contribute to the dissimilarity if a larger difference exists along another dimension to dominate it. This has been dubbed the "supremum" or "dominance" metric (e.g., Hyman and Well, 1967). Note that although integer values of *p* receive most attention, the Minkowski framework remains valid for fractional values.
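The three special cases of Equation 1 can be made concrete with a short sketch. The function name and the example differences are hypothetical, and the per-dimension differences are taken as magnitudes.

```python
import math

def minkowski(diffs, p):
    """Combine per-dimension differences with Minkowski exponent p (Equation 1)."""
    diffs = [abs(d) for d in diffs]
    if math.isinf(p):
        return max(diffs)                # dominance metric: largest difference wins
    return sum(d ** p for d in diffs) ** (1.0 / p)

diffs = [3.0, 4.0]                       # hypothetical differences on two attributes
for p in (1, 2, math.inf):
    print(p, minkowski(diffs, p))
```

For these two differences the combined dissimilarity falls from their sum (*p* = 1) through the Pythagorean distance (*p* = 2) to the larger of the two (*p* → ∞), which is the sense in which larger exponents express increasing suppression of the smaller difference.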

Early examinations of the dissimilarities from composite changes used simple stimuli (circle-spoke structures) confined to a single location in the visual field (Shepard, 1964; Garner and Felfoldy, 1970). Judgments of more elaborated attribute combinations were subsequently considered. Griffin and Sepehri (2002) used pairwise comparisons of simple stimuli varying in texture and color, concluding that the interaction follows a Minkowski metric (though they did not specify the value of *p*). Izmailov and Edrenkin (2010) elicited dissimilarity ratings among bar stimuli combining orientation and luminance and found *p* in the range 1.8–2.0.

In contrast to the above simple-stimulus studies, To et al. (2008, 2010) collected dissimilarity judgments for pairs of natural visual scenes that had been subjected to one or two ecologically realistic manipulations (color, location, size, and/or blur). The image manipulations were typically distributed across the entire scene and could not be parameterized by or reduced to a change in a single localized "visual primitive." The authors' analysis led to the conclusion that dissimilarities between images separated by two manipulations were most concordant with those between single-manipulation pairs if the latter interacted according to a quasi-Dominance metric, with *p* = 2.84 (To et al., 2008) or *p* = 2.48 (To et al., 2010).

Apart from its psychophysical meaning, the *p* exponent may reflect the mechanisms of cortical processing accessed by a given task or index of dissimilarity. To et al. (2008, 2011) proposed that a large *p* can be expected when the variations along the dimensions are strongly correlated, exhibiting a high level of redundancy, so that a stimulus difference along one attribute is normally accompanied by a comparable difference on the others. In informational terms, such differences can be encoded efficiently if the neural nexus at which they converge allows the larger of the signals to dominate others that add little or no further information about dissimilarity.

Conversely, the additive combination expressed as the city-block metric is the most efficient way of encoding a combination of difference signals from attributes where the values are empirically uncorrelated. If visual primitives such as color, size, or orientation are processed independently and in parallel, then the dissimilarity when two such features vary might be a linear combination of the Δ*D<sub>n</sub>* in isolation, i.e., *p* near 1. Intermediate degrees of correlation require intermediate levels of non-linearity: competition among the attributes implies the kind of non-linear combinations characterized by *p* > 1 (Zhaoping and Snowden, 2006).

The present study further explores the interaction of visual attributes in bimodal stimuli and extends Izmailov and Edrenkin's (2010) path of research, with bar stimuli varying in color (rather than luminance) in addition to orientation. To determine *p* for this situation, we obtained the dissimilarity for each difference in orientation independent of color and each difference in color independent of orientation, and used these to predict the dissimilarity between pairs differing in both color *and* orientation, while varying *p* in the Minkowski metric (cf. Shepard and Cermak, 1973; To et al., 2008).

This approach postulates that color and orientation are separable (in the mathematical sense), i.e., that the difference between each same-color pair (differing only in orientation) is constant whatever the color, and conversely for each same-orientation pair. To test this postulate we applied multidimensional scaling (MDS) to the data. Unlike the process discussed so far, MDS begins with a matrix of inter-stimulus dissimilarities or "map distances" and reconstructs "map coordinates": *empirical* dimensional descriptions of the subjects' mental/perceptual representations of the stimuli. We ask whether a geometrical representation is adequate, treating it as Euclidean in nature (*p* = 2). It is tempting to seek the Minkowski metric for a given set of data by repeating MDS analysis for different *p*, and choosing the value that minimizes the mismatch between data and reconstructed distances. However, this strategy is known to be deceptive (Arnold, 1971; Shepard, 1974).

#### **MATERIALS AND METHODS**

#### **PARTICIPANTS**

Five participants (four females), aged 20–27 years old, were normal trichromats with normal or corrected-to-normal vision. They were all undergraduate Psychology students, familiar with the scaling procedure but naïve to the specific research area.

#### **STIMULI**

Stimuli were colored bars of different orientation presented on a CRT screen at 12 cd/m<sup>2</sup> against a darker (2 cd/m<sup>2</sup>) gray background (as illustrated in **Figure 1**). At the viewing distance of 100 cm, each bar subtended an angle of 8.6° lengthwise and 0.6° widthwise. The bars were presented in pairs, to the right and left of the central nominal fixation point; their *centers* were separated by 10.8°, the same for all pairs; that is, the bars can be imagined as rotating around these centers of gravity to generate the different orientations. Observation was binocular, without head fixation, in an otherwise-unlit room.

The bars took on six orientations, varying in 30° steps from the horizontal, and five different colors: red, yellow, green, blue, and white (**Table 1**). Thus, the two variables created 30 different bars. For convenience, these are labeled below as S<sup>*a*</sup><sub>*m*</sub>, where the subscript *m* identifies the orientation and the superscript index *a* identifies the color.

**FIGURE 1 | Example of two stimuli presented for dissimilarity judgment.**



#### **PROCEDURE**

Subjects were instructed to rate the total dissimilarity of each pair of bars on a scale of 1 (least) to 9 (most). No particular pair was provided to subjects as an example of the maximum value. Each pair of bars was shown twice to each subject, once in the form "*i:j*" and once as the mirror-image "*j:i*," providing (30 × 29 =) 870 pairs. These were presented in the course of three sessions for each subject. Each pair was presented for 1.5 s followed by a 0.5 s interval, during which the subject entered the rating using corresponding keys of a computer keyboard. The response was not recorded if it exceeded this interval, though for each individual subject the number of missing inputs was just one or two.

The square matrix of pairwise differences obtained from each participant consists of an upper and lower triangular half-matrix containing *i:j* and *j:i* pairs. The Pearson correlation coefficients *r* between these two values for each participant are shown in the diagonal elements of **Table 2** and indicate good intra-subject replicability. Inter-subject replicability, shown in the off-diagonal elements of **Table 2**, was fair. Here the two judgments from each subject for a given stimulus pair were averaged and compared with the mean from each other subject.
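The intra-subject replicability check described above can be sketched as follows, assuming a complete 30 × 30 rating matrix whose upper triangle holds the *i:j* ratings and whose lower triangle holds the *j:i* ratings. The function name and the synthetic ratings are hypothetical, and the one or two missing responses per subject are ignored here.

```python
import numpy as np

def half_matrix_correlation(ratings):
    """Pearson r between the i:j (upper) and j:i (lower) triangles of a rating matrix."""
    ratings = np.asarray(ratings, dtype=float)
    upper_idx = np.triu_indices_from(ratings, k=1)
    upper = ratings[upper_idx]
    lower = ratings.T[upper_idx]         # j:i rating matched to the same stimulus pair
    return float(np.corrcoef(upper, lower)[0, 1])

# Synthetic subject: a symmetric "true" dissimilarity plus independent rating noise
rng = np.random.default_rng(1)
n = 30
true = rng.uniform(1, 9, size=(n, n))
true = (true + true.T) / 2
noisy = np.clip(np.rint(true + rng.normal(scale=1.0, size=(n, n))), 1, 9)
print(half_matrix_correlation(noisy))
```

The same function applied to the averaged half-matrices of two different subjects would give an inter-subject coefficient of the kind reported in the off-diagonal cells.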

The 870 stimulus pairs can be classified into three classes, as shown in **Figure 2**:


**Table 2 | Pearson correlation coefficients between individual subjects' dissimilarity matrices and (on the diagonal) between individuals'** *ij* **and** *ji* **half-matrices.**


3. 600 *bimodal* pairs (S<sup>*a*</sup><sub>*m*</sub> : S<sup>*b*</sup><sub>*n*</sub>), differing in both orientation and color (for brevity we use "bimodal" in a broader sense than usual since sub-modalities of vision are involved rather than separate sensory modalities).
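The sizes of the three classes follow directly from the 5 × 6 stimulus design and can be verified by enumeration. The sketch below uses hypothetical labels for the colors and angles and counts ordered pairs, so that each pair appears in both its *i:j* and *j:i* form.

```python
from itertools import permutations, product

colors = ["red", "yellow", "green", "blue", "white"]
orientations = [0, 30, 60, 90, 120, 150]         # degrees from horizontal
stimuli = list(product(colors, orientations))    # 30 bars

counts = {"color-only": 0, "orientation-only": 0, "bimodal": 0}
for (c1, o1), (c2, o2) in permutations(stimuli, 2):   # ordered pairs i:j and j:i
    if o1 == o2:
        counts["color-only"] += 1        # same orientation, different color
    elif c1 == c2:
        counts["orientation-only"] += 1  # same color, different orientation
    else:
        counts["bimodal"] += 1           # differ on both attributes

print(counts)
# {'color-only': 120, 'orientation-only': 150, 'bimodal': 600}
```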

#### **RESULTS**

Each subject's dissimilarity judgments were analyzed in combination with other subjects, and then in isolation. The mean dissimilarity ratings for each subject, across all 870 pairs, were 5.83, 5.76, 5.84, 5.25, and 4.66. We examine the individual distributions of ratings below (**Figure 7**), noting for now that all subjects used the full range from 1 to 9. That is, all five subjects used the response scale in much the same manner.

#### **MULTIDIMENSIONAL SCALING (EUCLIDEAN METRIC)**

Analysis began with MDS, in which a Euclidean geometrical model is used to account for the data, representing each stimulus as a point in a low-dimensional space. In an iterative process, the locations of the 30 points are adjusted so that the distances among them reflect the dissimilarities among the corresponding stimuli as accurately as possible. Any mismatch between the data and reconstructed distances in a solution is measured by *stress*<sub>1</sub>, an index of badness-of-fit, which is progressively minimized by the MDS process (Kruskal, 1964). The dimensions of the solution can be interpreted as the variables that underlie the visual domain in question.

As noted above, we work with the assumption of Euclidean geometry, i.e., *p* = 2, secure in the knowledge that departures from this approximation have little effect on MDS solutions (Arabie, 1991). Attempting to accommodate the averaged data within 3D, 4D, 5D, and 6D models resulted in *stress*<sub>1</sub> values of 0.217, 0.167, 0.129, and 0.112, respectively. Standard rules for interpreting *stress*<sub>1</sub> (Kruskal, 1964) show the three-dimensional solution to be inadequate. Here we focus on the 5D solution, ignoring the 6D version which provides only a small improvement in goodness-of-fit.
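To make the badness-of-fit index concrete, the sketch below evaluates a stress-1 value for a given configuration. It is a simplified, metric variant that assumes the raw dissimilarities serve directly as target distances, whereas Kruskal's procedure first replaces them with monotonically transformed disparities. The function name and the three-point example are hypothetical.

```python
import numpy as np

def stress1(coords, dissim):
    """Stress-1 with the dissimilarities used directly as target distances."""
    coords = np.asarray(coords, dtype=float)
    dissim = np.asarray(dissim, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))     # inter-point Euclidean distances
    iu = np.triu_indices(len(coords), k=1)       # each pair counted once
    d, delta = dist[iu], dissim[iu]
    return float(np.sqrt(((d - delta) ** 2).sum() / (d ** 2).sum()))

coords = [[0.0, 0.0], [3.0, 0.0], [0.0, 4.0]]    # a 3-4-5 right triangle
perfect = [[0, 3, 4], [3, 0, 5], [4, 5, 0]]      # dissimilarities equal to distances
distorted = [[0, 3, 4], [3, 0, 7], [4, 7, 0]]    # one pair cannot be accommodated
print(stress1(coords, perfect))
print(stress1(coords, distorted))
```

Adding dimensions gives the points more freedom to reproduce the data, which is why the index can only decrease from the 3D to the 6D model.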

To rotate the optimized solution to non-arbitrary dimensions, we applied the "weighted Euclidean" or INDSCAL framework of individual variation (Wish and Carroll, 1974). This framework allows for the possibility that subjects vary in the relative salience or weight they place on one dimension or another: that is, an inter-stimulus difference along a given dimension may contribute more to perceived dissimilarity for one subject than another. Specifically, the model includes dimensional-weight parameters *w<sub>qd</sub>* (where the index *q* designates a subject, while *d* labels the dimensions), and finds their optimal values. If the coordinates of the *i*-th and *j*-th items in the model are written *x<sub>id</sub>* and *x<sub>jd</sub>* respectively, the parameters *w<sub>qd</sub>* modulate the perceived inter-item distances for that subject:

$$\text{distance}(i,j)^2 = \sum_{d} w_{qd}^2 (x_{id} - x_{jd})^2 \tag{2}$$

This weighting is equivalent to systematically altering the interpoint distances by stretching or compressing the consensus model along its dimensions for a better fit to each subject's data (which are kept separate in this analysis). The outcome is that the dimensions of the final solution (which would otherwise be arbitrary) correspond to modes of inter-subject variation within the data. To test whether noise alone could account for any differences in the subject-specific weight parameters, the *w<sub>qd</sub>* were replicated by repeating the INDSCAL analysis with each subject's *i:j* and *j:i* matrices treated separately.
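Equation 2 can be illustrated with a minimal sketch in which the same pair of points in the consensus space yields different distances for two subjects. The coordinates and weights are hypothetical values chosen for illustration, not fitted parameters.

```python
import numpy as np

def weighted_distance(x_i, x_j, w_q):
    """Subject-specific distance from Equation 2: sqrt(sum_d w_qd^2 (x_id - x_jd)^2)."""
    x_i, x_j, w_q = (np.asarray(v, dtype=float) for v in (x_i, x_j, w_q))
    return float(np.sqrt(np.sum(w_q ** 2 * (x_i - x_j) ** 2)))

# Hypothetical 5D coordinates: two orientation dimensions, then three color dimensions
x_i = [1.0, 0.0, 0.5, 0.0, 0.0]
x_j = [0.0, 1.0, -0.5, 0.0, 0.0]
orientation_biased = [0.6, 0.6, 0.2, 0.2, 0.2]   # hypothetical subject weights
color_biased = [0.2, 0.2, 0.6, 0.6, 0.6]
print(weighted_distance(x_i, x_j, orientation_biased))
print(weighted_distance(x_i, x_j, color_biased))
```

Because this pair differs mainly in the first two dimensions, the orientation-biased weights yield the larger distance.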

#### **ORIENTATION SUBSPACE**

**Figure 3** is a scatterplot in which the stimuli are located by their coordinates on the first two (rotated) dimensions, *D*1 and *D*2. These clearly accommodate the orientation parameter. Two dimensions are required rather than one because of that parameter's cyclic nature (for these symmetrical stimuli, θ + 180° is equivalent to θ), to give the parameter room to loop back on itself. This outcome is in accord with previous results when stimulus pairs of bars varied in orientation alone (Indow, 1988; Izmailov et al., 2004) or in orientation and luminance (Izmailov and Edrenkin, 2010).

Even so, the dimensions have separate physical meanings. *D*1 serves to separate horizontal from vertical stimuli, providing the dissimilarity between what have been called "the cardinal axes of the visual coordinate system" (Orban et al., 1984). *D*2 separates bars inclined right vs. left from the vertical direction. *D*1 disperses the stimuli more than *D*2, accounting for more variance in the MDS solution (27.7% compared to 23.1%). This is consistent with the "oblique effect" (cf. Orban et al., 1984), whereby spatial vision exhibits orientation anisotropy, so that two bars at right angles seem more dissimilar if they align with the cardinal axes than if they are diagonals.
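Why a cyclic attribute with a period of 180° needs two dimensions can be seen from an idealized embedding in which each orientation is mapped to a point on a circle by doubling its angle. This is only a geometric sketch under that assumption; the empirical configuration is not a perfect circle, since *D*1 disperses the stimuli more than *D*2.

```python
import math

def orientation_to_plane(theta_deg):
    """Map an orientation (period 180 degrees) to a point on the unit circle."""
    angle = math.radians(2 * theta_deg)  # doubling makes theta and theta + 180 coincide
    return (math.cos(angle), math.sin(angle))

for theta in (0, 30, 60, 90, 120, 150):
    d1, d2 = orientation_to_plane(theta)
    print(f"{theta:3d} deg -> D1 = {d1:+.2f}, D2 = {d2:+.2f}")
```

In this idealization the first coordinate contrasts horizontal with vertical bars, while the sign of the second coordinate distinguishes the two directions of tilt.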

We note also that the dissimilarities among orientations obtained by Izmailov et al. (2004) could best be explained by separate cardinal-axis and diagonal-axis contributions, combining in a Minkowski metric with *p* ∼ 1.75.

#### **COLOR SUBSPACE**

The remaining three dimensions capture the dissimilarities from differences in color. **Figure 4** projects the solution onto its 3rd vs. 4th dimensions and 3rd vs. 5th dimensions. In each panel, other dimensions (including *D*1 and *D*2) are orthogonal to the plane of the page. *D*3 and *D*4 can immediately be identified as "red-green" and "blue-yellow" opponent perceptual systems, respectively. Further, it appeared that a white and any chromatic bar are perceived as more dissimilar than the isoluminant plane can accommodate, requiring *D*5, an "achromatic" distinction, to capture this additional dissimilarity.

The crucial aspect of **Figures 3**, **4** is that subjects treat orientation and color as separate, decoupled attributes in their mental/perceptual models, with each attribute confined to its own subspace, orthogonal to the other subspace. That is, a given pair of orientations are perceived as equally dissimilar if the bars are (for instance) both blue or both yellow. Conversely, dissimilarities within the color subspace are the same whether two colors are presented as a pair of 30° bars or a pair of any other angle. This lack of coupling is a pre-requisite for applying Equation 1 in subsequent analysis.

#### **RELATIVE SALIENCE OF THE ORIENTATION AND COLOR SUBSPACES**

This rotated 5D solution provides a quantitative measure of the relative importance of the inter-color and inter-orientation differences for the subjects. Specifically, the combined axes of the "orientation subspace" (**Figure 3**) disperse the items marginally more than the combined axes of the "color subspace" (**Figure 4**), respectively accounting for 50.2 and 49.8% of variance within the MDS solution. Note that this is a combined outcome, with wide variations in the relative importance of orientation to individual subjects.

**Table 3 | Parameters of individual and mean data from the MDS and Minkowski-metric analyses.**

*5D solution for individual-differences MDS includes weights w<sub>q1</sub> ... w<sub>q5</sub> on the five dimensions (two orientation, three chromatic); total weight (w<sub>q1</sub><sup>2</sup> + w<sub>q2</sub><sup>2</sup>)<sup>0.5</sup> of orientation subspace; total weight (w<sub>q3</sub><sup>2</sup> + w<sub>q4</sub><sup>2</sup> + w<sub>q5</sub><sup>2</sup>)<sup>0.5</sup> of color subspace.*

#### **INDIVIDUAL DIFFERENCES IN DIMENSION WEIGHTS**

**Table 3** shows dimension weights from the individual-differences MDS analysis. In the orientation subspace, subjects differed in the weights they placed on *D*1 (cardinal axes) relative to those for *D*2 (diagonals). Greater variations appeared in the color subspace, particularly in the weight of *D*3, the red-green dimension, relative to the blue-yellow (*D*4) and achromatic (*D*5) dimensions. Notably, the combined weight of the orientation subspace relative to the color subspace (**Table 3**) again showed individual variations (**Figure 5**). These differences are replicated between *i:j* and *j:i* ratings. Subject #5, for instance, places relatively greater weight on orientation differences while Subjects #1 and #2 place more weight on color differences. Here the combined weights of the orientation and color subspaces are *w<sub>qO</sub>* = (*w<sub>q1</sub>*<sup>2</sup> + *w<sub>q2</sub>*<sup>2</sup>)<sup>0.5</sup> and *w<sub>qC</sub>* = (*w<sub>q3</sub>*<sup>2</sup> + *w<sub>q4</sub>*<sup>2</sup> + *w<sub>q5</sub>*<sup>2</sup>)<sup>0.5</sup>, respectively.

#### **DISSIMILARITY JUDGMENTS FOR COLOR AND FOR ORIENTATION**

Averaged across the subjects, the dissimilarities for orientation and for color are comparable in magnitude (**Figure 6**), with the mean rating across *color-only* pairs (3.94 ± 0.47) slightly greater than the mean across *orientation-only* pairs (3.43 ± 0.85). Recall that *orientation-only* pairs outnumber *color-only* pairs (150 vs. 120), and contribute more to variance; thus this is consistent with the earlier observation that the color subspace disperses the items slightly less than the orientation subspace.

**Table 3** indicates substantial inter-individual variation, however, with Subject #2 rating *color-only* pairs twice as dissimilar as *orientation-only* pairs, while for other subjects they are only half as dissimilar (also evident in the dimensional weights).

We mention this situation of similar magnitude for *color-only* and *orientation-only* dissimilarities because it provides greatest sensitivity to *p* in the comparison between predicted and actual dissimilarities (cf. To et al., 2008, 2010). If *p* > 1 (i.e., if there is some degree of non-linear competition between the single-attribute differences), and if either attribute is generally smaller than the other, it contributes disproportionately less to the combined dissimilarity.

The individual distributions of dissimilarity ratings tend to be double-peaked (**Figure 7**), with the dominant peak containing the 600 bimodal pairs, while 270 *color-only* and *orientation-only* pairs form a smaller bulge of lower values. The distinctness of the second peak relies upon the *color-only* and *orientation-only* pairs having comparable dissimilarities and overlapping distributions; it is thus least distinct for Subject #2.

#### **ESTIMATING MINKOWSKI PARAMETER** *p*

For each subject in turn, and for a given pair of colors (*a*:*b*), we obtain a mean dissimilarity Δ<sub>*ab*</sub> by averaging the ratings over the six appropriate *color-only* pairs (S<sup>*a*</sup><sub>*m*</sub> : S<sup>*b*</sup><sub>*m*</sub>) with 1 ≤ *m* ≤ 6, and over (S<sup>*b*</sup><sub>*m*</sub> : S<sup>*a*</sup><sub>*m*</sub>), i.e., the 12 presentations of that color combination as same-orientation bars (including the two presentations of a pair, left-right and right-left). By the same token, we obtain a mean dissimilarity Δ<sub>*mn*</sub> for each pair of orientations by averaging the (5 × 2) combinations of that orientation pair as same-color bars (S<sup>*a*</sup><sub>*m*</sub> : S<sup>*a*</sup><sub>*n*</sub>) and (S<sup>*a*</sup><sub>*n*</sub> : S<sup>*a*</sup><sub>*m*</sub>) where 1 ≤ *a* ≤ 5. Inserting these values into Equation 1, with a given *p*, provides predicted dissimilarities for the bimodal stimulus pairs. We vary *p* and compare the predictions against the reported values. Note that this comparison relies on the raw data and does *not* involve the inter-point distances obtained in the MDS analysis.
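The averaging and prediction steps described above can be sketched as follows. The data structure (a dictionary keyed by ordered stimulus pairs), the function names and the small synthetic design are assumptions made for illustration and do not reproduce the analysis code used in the study.

```python
from collections import defaultdict
from itertools import permutations, product
from statistics import mean

def single_attribute_means(ratings):
    """ratings: dict mapping ((color, angle), (color, angle)) -> rating, ordered pairs."""
    by_color, by_orientation = defaultdict(list), defaultdict(list)
    for ((c1, o1), (c2, o2)), r in ratings.items():
        if o1 == o2 and c1 != c2:
            by_color[frozenset((c1, c2))].append(r)        # color-only pair
        elif c1 == c2 and o1 != o2:
            by_orientation[frozenset((o1, o2))].append(r)  # orientation-only pair
    delta_ab = {k: mean(v) for k, v in by_color.items()}
    delta_mn = {k: mean(v) for k, v in by_orientation.items()}
    return delta_ab, delta_mn

def predict(ratings, p):
    """Predicted dissimilarity (Equation 1) for every bimodal pair in ratings."""
    delta_ab, delta_mn = single_attribute_means(ratings)
    predicted = {}
    for (s1, s2) in ratings:
        (c1, o1), (c2, o2) = s1, s2
        if c1 != c2 and o1 != o2:
            a = delta_ab[frozenset((c1, c2))]
            b = delta_mn[frozenset((o1, o2))]
            predicted[(s1, s2)] = (a ** p + b ** p) ** (1.0 / p)
    return predicted

# Synthetic ratings generated by a Euclidean rule (p = 2), for illustration only
stimuli = list(product(["red", "green"], [0, 30, 60]))
ratings = {}
for s1, s2 in permutations(stimuli, 2):
    color_diff = 4.0 if s1[0] != s2[0] else 0.0
    angle_diff = 3.0 if s1[1] != s2[1] else 0.0
    ratings[(s1, s2)] = (color_diff ** 2 + angle_diff ** 2) ** 0.5

predicted = predict(ratings, p=2.0)
print(max(abs(predicted[k] - ratings[k]) for k in predicted))
```

With ratings generated by a Euclidean rule, the prediction at *p* = 2 reproduces the bimodal ratings, whereas other exponents would leave a systematic discrepancy.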

Plotting the observed dissimilarity ratings (averaged over subjects) against the values predicted from Δ<sub>*ab*</sub> and Δ<sub>*mn*</sub> for three Minkowski metrics (*p* = 1, *p* → ∞, and *p* = 2) results in **Figures 8A–C**. Seemingly neither of the extreme metrics is ideal: the predicted dissimilarities for bimodal pairs are too large (*p* = 1; **Figure 8A**) or too small (*p* → ∞; **Figure 8B**), in both cases introducing a discontinuity into the plot. The Euclidean metric (*p* = 2; **Figure 8C**) provides a better solution.

In addition to the three Minkowski metrics named above, we explored the predictive power of intermediate metrics, varying exponent *p* between 0.7 and 3.0. Following Soto and Wasserman (2010) we use the root-mean square error (RMSE) to compare the predicted and actual observed dissimilarities, measuring the discontinuity in the predictions and how well they account for the observations. The RMSE as a function of *p* is plotted in **Figure 9**. A minimum of 0.47 is achieved at *p* = 1.30, compared to the values of 1.02 and 0.99 at *p* = 1 and 2, respectively. Note that RMSE is closely related to the summed residuals used by To et al. (2008, 2011) and the Pearson correlation *r* used by Shepard and Cermak (1973) (see also Dunn, 1983). Predicted dissimilarities for *p* = 1.3 are plotted against observations in **Figure 8D**.
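The search for the best-fitting exponent amounts to evaluating the RMSE on a grid of *p* values and taking the minimum. The sketch below does this for synthetic data generated with a known exponent; the grid step of 0.1, the noise level and the array names are assumptions for illustration only.

```python
import numpy as np

def rmse_for_p(delta_color, delta_orient, observed, p):
    """RMSE between Equation 1 predictions at exponent p and observed dissimilarities."""
    predicted = (delta_color ** p + delta_orient ** p) ** (1.0 / p)
    return float(np.sqrt(np.mean((predicted - observed) ** 2)))

# Synthetic bimodal pairs generated with a known exponent plus rating noise
rng = np.random.default_rng(0)
delta_color = rng.uniform(1.5, 4.5, size=300)
delta_orient = rng.uniform(1.5, 4.5, size=300)
true_p = 1.3
observed = (delta_color ** true_p + delta_orient ** true_p) ** (1.0 / true_p)
observed = observed + rng.normal(scale=0.3, size=300)

grid = np.round(np.arange(0.7, 3.01, 0.1), 2)    # candidate exponents 0.7 ... 3.0
errors = [rmse_for_p(delta_color, delta_orient, observed, p) for p in grid]
best_p = float(grid[int(np.argmin(errors))])
print(best_p, min(errors))
```

The error curve obtained in this way is the kind of function plotted against *p*, and its minimum gives the estimated exponent.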

An assumption in the argument is that the data are ratio-level, i.e., that each numerical rating is *proportional* to the perception of that dissimilarity. This is crucial for applying Equation 1 to the mean *orientation-only* and *color-only* ratings. The assumption cannot be tested directly. If, however, the subjects' rating responses are a non-linear function of perceptions, then a different *p* should be optimal for predicting the larger dissimilarities of the bimodal pairs generated by *orientation-only* and *color-only* dissimilarities that are both in the upper half of their distributions (i.e., Δ<sub>*mn*</sub> > 3.43, Δ<sub>*ab*</sub> > 3.94), where the perception/response curve presumably differs in slope. The optimal *p* should be different again when we take *orientation-only* and *color-only* ratings that are both in the *lower* half of their distributions and use them to predict the correspondingly smaller bimodal dissimilarities. **Figure 9** plots, as a function of *p*, the correlation between predicted and empirical dissimilarities for these two subsets. Clearly the same Minkowski metric of *p* ∼ 1.3 generates the dissimilarities for both subsets. Estimates of *p* could also be distorted if the dissimilarity ratings were interval-level, linear but including a non-zero constant; this possibility is harder to exclude.

Repeating this analysis for individual subjects (**Figure 10**) shows substantial variation in the relationship between RMSE and *p*. In particular, it reveals that Subjects #1, #2, and #5 are governed by similar functions. Their optimal Minkowski exponents (1.3, 1.8, and 1.1, respectively; **Table 3**) correspond to a combination rule in which orientation and color are neither integral nor wholly separable. For Subject #4, with optimal *p* = 0.7, the function clearly indicates that orientation and color were separable, and indeed synergistic, so that the dissimilarity between stimuli differing on both attributes is greater than the sum of each attribute's dissimilarity in isolation. Finally, the optimal *p* = 3.1 for Subject #3 points to a combination rule that is closer to the Dominance metric.
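To make these regimes concrete, the following worked example assumes the standard Minkowski combination of two unimodal dissimilarities, written here with the illustrative symbols $a$ and $b$, and takes the hypothetical case $a = b = 1$. It shows how the combined dissimilarity shrinks from super-additive to dominance-like as *p* grows.

```latex
\begin{aligned}
d_p(a,b) &= \left(a^{p} + b^{p}\right)^{1/p}, \qquad a = b = 1 \;\Rightarrow\; d_p = 2^{1/p} \\
p = 0.7 &: \quad d = 2^{1/0.7} \approx 2.69 \quad (> a + b = 2,\ \text{synergistic}) \\
p = 1   &: \quad d = 2 \quad (\text{city-block, additive}) \\
p = 1.3 &: \quad d = 2^{1/1.3} \approx 1.70 \\
p = 2   &: \quad d = \sqrt{2} \approx 1.41 \quad (\text{Euclidean}) \\
p = 3.1 &: \quad d = 2^{1/3.1} \approx 1.25 \\
p \to \infty &: \quad d \to \max(a,b) = 1 \quad (\text{Dominance})
\end{aligned}
```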

#### **DISCUSSION**

At first glance the task of rating dissimilarities seems arbitrary and artificial. However, the relevance of the combination function that governs the underlying parameters is not limited to this task: integral and separable dimensions contribute in different ways to stimulus classification (Garner, 1974), classification errors (Shepard and Chang, 1963; Shepard, 1964), visual search (Treisman and Gormican, 1988), visual popout (Koene and Zhaoping, 2007), and signal detection (Ashby and Townsend, 1986). Moreover, perceptual dissimilarities bear upon the survival-centered problem of deciding whether or not the consequences of one stimulus generalize to a second. Shepard (1964, 1987) argued on *a priori* grounds that if an organism's perceptual process is attuned to regularities in its environment, it should follow either the *p* = 1 or *p* = 2 metric when it combines multiple sources of dissimilarity, depending on assumptions about selective attention and the consequential neighborhoods of the stimuli.

Hyman and Wells (1968) considered other conditions conducive to a low *p*. If the stimuli are processed as symbolic or verbal codes then the city-block metric would be the natural rule for obtaining their dissimilarities, with no interaction between the separately-encoded components of these descriptions (in addition, the discrete nature of the parameters of variation can be emphasized by spatial separation of the corresponding attributes). Indeed, the simple reductionist stimuli of the present study varied along orthogonal, "nameable" parameters of orientation and color. They lend themselves to a "verbal response strategy" where the representation of each stimulus is simplified by reducing it to higher-order symbolic labels (e.g., "60° + red") and the parameters are processed as parallel verbal codes.

Tasks with a greater cognitive component can also shift integral dimensions to separable ones (Dunn, 1983; Foard and Nelson, 1984). Tversky and Gati (1982) went further, reporting a series of experiments where the dissimilarities could best be explained by a metric with *p* < 1, i.e., the attributes combined in a synergistic way (as with the present Subject #4).

Focusing like Shepard (1964) on cues and regularities in the visual environment, To et al. (2008, 2010) arrived at a different conclusion about dimensional integrality: the authors argue that changes in real-world scenes tend to be correlated (i.e., if one attribute of the scene has changed, it is likely that other attributes have changed also). Our perceptual mechanisms have the plasticity to recognize and exploit such correlations, creating the phenomenon of "cue recruitment" (Haijiang et al., 2006). The most efficient way of encoding such a change is the Dominance metric, in which the dissimilarity is determined by whichever attribute has changed most, suppressing other attributes since they provide little additional information. Indeed, dissimilarities between pictures of natural scenes were best fitted with Minkowski exponent *p* = 2.84 (To et al., 2008) or *p* = 2.48 (To et al., 2010), i.e., *p* > 2, indicating that an approximation to the Dominance metric was in place.

Further, a correlation between attributes is not the only condition that is conducive to a large value of *p*. Hyman and Wells (1967, 247) speculated that "speeding up the judgment process or otherwise overloading" the subject would increase *p* by causing competition and mutual masking among the dimensions. They wondered: "Does the apparent fit to the Euclidean metric in many judgment situations [i.e., *p* = 2 rather than *p* = 1 as might have been expected] indicate that [subject] is having trouble in extracting the information from both dimensions?" Complex differences in particular (as in To et al., 2008, 2010) might "saturate" the inter-stimulus dissimilarity. One complex scene manipulation—controlled by a single parameter, but changing multiple details of the scene—might leave the observer hard-pressed to attend to another simultaneous manipulation (thereby suppressing its contribution to the combined dissimilarity) simply by occupying the limited "bandwidth" of conscious comparisons. Foard and Nelson (1984) add stimulus duration and the task's nature to the factors affecting dimensional integrality.

We note in passing that the discriminative-limitation perspective predicts that *p* can be scale-dependent. Shepard's view (1987, Figure 4) of consequential neighborhoods makes the same prediction. For small enough differences between stimuli (or between stimulus and background), there is a threshold of discrimination where the detection of any change is limited by the specific sensory channel on which the difference is greatest (To et al., 2011). The contribution from any sub-threshold differences coded on other channels is small (in the case of probabilistic detection models) or zero. That is, *p* is large, approximating the Dominance metric as an asymptote. Thus, the neural channels that underlie some sensory domain can often be resolved with stimuli at the discrimination threshold, even if they merge in an isotropic continuum of integral dimensions at supra-threshold dissimilarities.

A tempting approach to the question is to apply MDS repeatedly with different Minkowski exponents *p*, choosing the *p* that minimizes badness-of-fit *stress*<sup>1</sup>. However, a confounding factor in calculations of *stress*<sup>1</sup> is that the constraints of geometrical embedding are imposed most stringently in Euclidean geometry (*p* = 2). This is why the algorithms function most smoothly in Euclidean space. As Arabie (1991) notes, MDS for *p* = 1 and *p* → ∞ turns a single *d*-dimensional optimization into a series of *d* one-dimensional optimizations (requiring a combinatorial attack rather than a simple steepest-descent algorithm), the problem persisting in milder form for any *p* ≠ 2. A related property of Minkowski metrics for *p* ≠ 2 is that small changes in the relative weighting or salience of the dimensions can produce abrupt, discontinuous changes in similarity or preference ranking (Shepard, 1964). Recent algorithms using Bayesian likelihood rather than *stress*<sup>1</sup> may finesse this problem (Okada and Shigemasu, 2010), but it is not clear how they apply to a "hybrid" geometry such as the present situation, in which *p* governs the combination of orientation and color, two internally-Euclidean subspaces.

One possibility is that the perception of dissimilarity emerges at an early stage of visual processing, from a neural locus where the signals of color and orientation are first combined, before attributes are subjected to parallel processing along separate pathways and eventually re-integrated (Cavina-Pratesi et al., 2010). "Bottom-up" models based on visual search data allow the combination of dissimilarity contributions to approximate the Dominance metric (Zhaoping and May, 2007), but do not *require* such behavior, for the models do not place tight bounds on *p* (see also Nothdurft, 2000). Koene and Zhaoping (2007) postulated a "saliency map" in primary visual cortex in which the contrast between some combination of features (e.g., color C1 + orientation O1) and a background combination (C2 + O2) follows the Dominance metric, modified by detectors tuned to color + orientation conjunctions. The greater the input from conjunction detectors (relative to single-feature detectors), the further the metric is shifted toward the city-block model. Lateral inhibition from task-irrelevant variations in the background pattern reduces the city-block contribution (Zhaoping and May, 2007) and allows the proposed saliency map to behave more in line with the Dominance metric. Lateral inhibition of this kind could be a factor in difference judgments of the complex natural scenes used by To et al. (2008, 2010).

#### **CONCLUSIONS**

MDS of dissimilarity ratings confirmed the expectation that orientation and color can be represented as separate subspaces, with *color-only* and *orientation-only* mean dissimilarities *ab* and *mn*. Following Shepard and Cermak (1973), we combined these to obtain *p* directly (**Figures 9**, **10**). The range of inter-individual variation of optimal exponents is substantial—between 0.7 and 3.1 (**Table 3**)—but comparable to ranges found in previous studies (cf. Dunn, 1983; Soto and Wasserman, 2010). Notably, the exponent is *p* < 2 for four of our five subjects, and for data averaged across subjects, so the orientation and color attributes had not become "integral," nor merged their separate natures within an isotropic continuum. These values also conflict, to an even greater degree, with the results of To et al. (2008, 2010) from more complex scenes and manipulations.

The same conclusion—that color and orientation are not integral—emerges from the individual variations found by MDS. Specifically, the weight placed on color as a contribution to dissimilarity *varies* across subjects relative to the contribution from orientation (**Table 3**), with corresponding variations in the magnitudes of *ab* and *mn*. There is no obvious relationship between these dimensional-salience parameters and the exponents *p*, nor is one to be expected. We note that for Subject #3, whose *p* > 2, the data showed lowest internal consistency (**Table 2**) and least compatibility with a geometrical model, i.e., highest *stress*<sup>1</sup> (**Table 3**).

The obtained values also rule out the possibility that dissimilarities for these stimuli were determined *purely* by high-level, top-down cognitive operations, since the top-down symbolic-label model predicts *p* = 1, i.e., an absence of non-additive interactions between the two attributes. In practice the contribution of each attribute to total dissimilarity is affected by the value of the other attribute. If, for instance, a stimulus pair is separated by a smaller difference between their colors than between their orientations, then increasing the color difference will yield a relatively small increase in dissimilarity.
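The last point can be made explicit with a short derivation. It is a sketch under the assumption that the combination rule has the standard Minkowski form; the symbols $d_C$ and $d_O$ (color-only and orientation-only dissimilarity) and the numerical values are illustrative, not taken from the data.

```latex
\begin{aligned}
d &= \left(d_C^{\,p} + d_O^{\,p}\right)^{1/p}
\quad\Longrightarrow\quad
\frac{\partial d}{\partial d_C} = \left(\frac{d_C}{d}\right)^{p-1} \\[4pt]
p = 1 &: \quad \frac{\partial d}{\partial d_C} = 1
\quad \text{(each attribute contributes independently of the other)} \\[4pt]
p = 1.3,\ d_C = 1,\ d_O = 3 &: \quad d \approx 3.54, \qquad
\frac{\partial d}{\partial d_C} \approx 0.68, \qquad
\frac{\partial d}{\partial d_O} \approx 0.95
\end{aligned}
```

For any *p* > 1 the smaller of the two differences has the smaller marginal effect, and the imbalance grows with *p* until, in the Dominance limit, the smaller difference contributes nothing.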

Possible artifacts were mentioned above that could increase *p* by encouraging mutual "masking" among the dimensions of variation. Of them, only the short time for responses applies (cf. Foard and Nelson, 1984): a change along either dimension is unlikely to saturate the capacity of visual processing, nor is there a background of task-irrelevant variations to inhibit the signal from feature-conjunction detectors in V1. Thus, it is unlikely that the subjects' actual values of *p* were much *lower* than these observed values.

It follows that the present results are not restricted to situations where the inter-stimulus variations involve clear-cut attributes, and a cognitive verbal-response strategy. We note also that Minkowski exponents *p* near to 1 have been reported even when the underlying parameters generating the stimuli are "relatively novel and difficult to verbalize—at least in any way that is general enough to extend beyond the immediate neighborhood of any one form" (Shepard and Cermak, 1973, 353).

The range of *p*-values across subjects is an interesting phenomenon in its own right, although it is an obstacle to drawing general, universally-applicable conclusions. One possible explanation is that a subject has access to several parallel strategies or processes, each comparing stimuli within a different Minkowski metric, with the judgment of dissimilarity being a combination of their outputs. Then the variations among subjects spring from weighting these outputs in different ratios. A possible role of top-down modulation in this weighting could be tested by manipulating the experimental instructions.

As noted earlier, Izmailov and Edrenkin (2010) reported dissimilarity data for 25 bar stimuli with five levels of orientation (0°, 30°, 60°, 90°, 120°) and of luminance (1, 2, 8, 32, and 64 cd/m²). We applied our analysis to their 50 orientation-only and 50 luminance-only pairs to predict the dissimilarities of bimodal pairs. The predictions were most accurate for *p* ∼ 1.9. That is, in comparison to the present study, orientation and *luminance* appeared close to being integral. The departure demonstrates that there is nothing about the present approach that *forces p* < 2 as an outcome. Without further investigation, the reason for the different behavior of luminance is not obvious.

#### **ACKNOWLEDGMENTS**

Chingis A. Izmailov, after a long illness, passed away in September 2011 (http://en.wikipedia.org/wiki/Chingis\_Izmailov) while the manuscript was being finalized. We dedicate this work to Chingis' memory. Chingis A. Izmailov was supported by the Russian Foundation for Basic Research (Grant No. 07-06-00109a) and the Russian State Research Foundation (Grant No. 07-06-00184a). Our thanks go to German Levit (Chingis A. Izmailov's undergraduate student) who assisted in data collection. Partial results of this study were presented by Galina V. Paramei at the *Vision Meeting* of *The Colour Group (Great Britain)*, London, UK, 6th January 2010. The authors are grateful to David Alleysson, Michelle To, and Li Zhaoping for valuable discussions and for helpful comments and references, and to the reviewers for constructive feedback.

#### **REFERENCES**

(2010). Separate channels for processing form, texture, and color: evidence from fMRI adaptation and visual object agnosia. *Cereb. Cortex* 20, 2319–2332. doi: 10.1093/cercor/bhp298

(Moscow: IPRAN), 390–408 (in Russian).


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 09 June 2013; accepted: 28 September 2013; published online: 17 October 2013.*

*Citation: Bimler DL, Izmailov CA and Paramei GV (2013) Processing bimodal stimuli: integrality/separability of color and orientation. Front. Psychol. 4:759. doi: 10.3389/fpsyg.2013.00759*

*This article was submitted to Perception Science, a section of the journal Frontiers in Psychology.*

*Copyright © 2013 Bimler, Izmailov and Paramei. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

### The microgenesis of the watercolor effect

#### *Adam Reeves<sup>1</sup>\*, Baingio Pinna<sup>2</sup> and Felix Roxas<sup>1</sup>*

*<sup>1</sup> Department of Psychology, Northeastern University, Boston, MA, USA*

*<sup>2</sup> Department of Architecture, Design and Planning, University of Sassari at Alghero, Sardinia, Italy*

#### *Edited by:*

*Galina Paramei, Liverpool Hope University, UK*

#### *Reviewed by:*

*Daw-An Wu, Caltech, USA Frederic Devinck, Université Rennes 2, France*

#### *\*Correspondence:*

*Adam Reeves, Department of Psychology, Northeastern University, 360 Huntingdon Ave., Boston, MA 02115, USA e-mail: reeves@neu.edu*

The "watercolor effect" is the wash of illusory color that fills in between two enclosing bichromatic contours. We studied the microgenesis of this illusion by varying the duration of the eliciting stimulus (a yellow/purple contour outlining the Mediterranean Sea) and by varying the duration of a blank interval from stimulus offset to an after-coming mask (the ISI). The illusory wash was rated in strength and also matched to a comparison disk of adjustable color but similar luminance. Results indicate that the watercolor effect grows rapidly as stimulus duration is increased to 100 ms and then grows much more slowly. Increasing the ISI beyond 10 ms had no effect, suggesting that the wash arises only during stimulation. Participants who recognized that the bounding contour depicted the Mediterranean reported twice as strong an illusory effect as those who did not, indicating that visual long-term memory modulates the watercolor effect despite the rapidity of its generation.

**Keywords: color, illusions, microgenesis, masking noise, matching**

#### **INTRODUCTION**

Outlining a shape with a single color on a uniform gray field or gray paper normally affords perception of the outline by itself, with no filling-in of the enclosed area. However, if the shape is drawn as a bichromatic contour, with, for example, yellow on the inside and purple on the outside, an illusory wash of pale yellow appears to fill in the enclosure—the "watercolor effect" of Pinna (Pinna, 1987, 2008; Pinna et al., 2003; Pinna and Reeves, 2006). Conversely, if the yellow is on the outside, the illusory wash spreads outwards from the contour to the edge of the display, the central region remaining gray. Both effects are quite salient with an outline of the Mediterranean Sea (Pinna, 1987, 2008; Spillmann et al., 2004; Pinna and Mariotti, 2005), and we employed this stimulus again in the current research. Illusory watercolor filling-in generally occurs only on the weaker side of the bichromatic contour, that is, the side with lower luminous contrast relative to the background (Devinck et al., 2005; Pinna, 2005). In our display, yellow has lower apparent contrast than purple when presented on a gray field.

The watercolor effect is compelling; observers are certain that the filled-in color they see is genuine, not an illusion (Pinna, 1987, 2008). The illusion has several important properties related to the filling-in process (Walls, 1954; Pinna et al., 2001), namely: it is independent of spatial extent, being the same for big as for small shapes; it is spatially uniform, being seen with equal strength across the entire enclosed region, not just at the edges; it is essentially independent of retinal eccentricity, not requiring that the contour be fixated; and it does not decrease over time. These properties agree with some of those deduced from filling-in between retinally-stabilized edges (Krauskopf, 1963) and for brightness perception (Paradiso and Nakayama, 1991), but not with those of some other classic "filling-in" phenomena, such as filling in across a scotoma, whose spatial extent is restricted. Although the wash appears desaturated, it creates a strong grouping, such that the filled-in area becomes figural. Indeed, the grouping due to the illusory wash is stronger than the Gestalt factors (Koffka, 1935) of proximity, similarity, and good continuation, when tested in a cue conflict paradigm (Pinna, 2010), and is stronger than the grouping produced by a matched real color (von der Heydt and Pierson, 2006). Curiously, the two colors in the bichromatic contour can be slightly separated and yet still induce a watercolor effect (Pinna et al., 2001; Pinna and Deiana, submitted). Neither of these two latter properties is representative of filling-in *per se*.

These properties of the watercolor effect are compatible with general principles underlying the distinction between the featural "FCS" system and the boundary "BCS" system advanced by Grossberg and Mingolla (1985). In that theory, spatially-localized BCS signals constrain the flow of visual features (colors, textures, and light/dark signals) which "fill-in" between the boundaries. The BCS/FCS signals are data-driven, arising in a bottom-up manner, but in our understanding, the featural landscape thus generated has the potential to resonate with top-down long-term visual memory to support recognition and to yield consciously perceivable visual objects. Spatially-uniform, eccentricity-independent illusory color washes between bounding contours can be explained in this fashion (Pinna and Grossberg, 2005). That a real (non-illusory) wash of equivalent saturation does not form groupings as strong as those of the watercolor stimulus (Pinna, 2008, 2010) suggests that such a resonance may be needed to sustain the watercolor effect, whereas in the case of a real wash, such a resonance is not as necessary since the supporting feature (color) is bottom-up driven.

In the present study we asked two questions about how the illusory effect is generated. Since the wash seems to appear along with the contours, we expected the time course of its generation, its so-called microgenesis (Werner, 1956), to be relatively fast. In the BCS/FCS theory, filling-in is initially determined by automatic, fast, low-level processes (Grossberg, 1997). Pinna et al. (2001) reported that the illusion was visible at the shortest presentation time permitted by their equipment, namely, 100 ms. Huang and Paradiso (2008) reported that filling-in within monkey V1 cells takes on average 80 ms and is complete by 200 ms, taking longer for longer distances. We note that the microgenesis of the wash, though fast, is not likely to be as fast as that of the bounding contours themselves, as the theory predicts, since BCS processing precedes FCS filling-in. Second, we asked if there exists a top-down, cognitive contribution to the illusion. A rapid, low-level, feed-forward visual filling-in process would not be influenced by knowledge of the shape of the bounding contour, but in the BCS/FCS system, top-down, memory-driven resonances could strengthen the boundary and increase the illusion. The version of the illusion we used was a sketch of the Mediterranean Sea, and we simply asked whether participants who recognized the Sea would have a stronger illusion than those who did not. (As the eliciting stimulus is constant, this method was preferred to the alternative of comparing well-known and nonsense objects, which confounds knowledge with physical shape differences.) Pinna (2012) has argued from a developmental perspective that visual object formation occurs according to the following sequence: contours, color, shading, and lighting. To the extent that this also applies to microgenesis, we expect to see the same sequence develop as stimulus duration is increased. The present research, by using the outline of the Mediterranean Sea, addresses the first of these only; other stimuli will be needed to study the development of shading and lighting.

The first experiment adopted a matching procedure to measure the extent of the watercolor effect in continuously-presented stimuli objectively, and also to determine the chromaticity of a standard or "anchor" stimulus used for a subjective rating procedure in the second experiment. This rating procedure allowed us to measure the time-course of development of the illusion with reasonable efficiency in both the second and third experiments (the latter involved a variant rating procedure); only the duration of the stimulus was varied. The fourth experiment employed backward masking to control stimulus persistence, analogous to the use of backward masking to control brightness filling-in by Paradiso and Nakayama (1991).

### **MATERIALS AND METHODS**

#### **PARTICIPANTS**

These were undergraduates of Northeastern University with normal vision, 20/20 or better visual acuity, and normal color vision as determined by Ishihara's (1965) pseudo-isochromatic plates, aged between 17 and 24 years old. The number of participants in experiments 1, 2, 3, and 4 was 17, 34, 12, and 30, respectively. All participants signed an informed consent that had been vetted by Northeastern's IRB. They received course credit, but were free to leave the experiment at any time, even before completion.

#### **STIMULI AND APPARATUS**

Two printed bichromatic outlines of the Mediterranean Sea and its surrounding countries, used in Pinna's (1987, 2008; see also Spillmann et al., 2004; Pinna and Mariotti, 2005) original study, were scanned as pic files and presented as 33 × 17 cm images on a calibrated Sony 19″ diagonal cathode-ray display monitor (see **Figure 1**, left panel). Viewed from 63 cm, the outlines subtended 27.7 × 15.1 degrees of visual angle and covered roughly the lower two-thirds of the monitor screen (whose width was 33 degrees, height 26 degrees). A filled disk of 4.7 degrees diameter served as a comparison (see **Figure 1**, left panel, upper right corner, where the disk is portrayed with a strong yellow). The center of the disk was 7.4 degrees from the top of the screen and 8.5 degrees from the right-hand edge. The disk was separated from the nearest point on the outline of the Sea by 2.2 degrees. The watercolor and comparison stimuli appeared on a uniform gray field of 116 cd/m² which covered the entire 19″ display. The bichromatic outline of the Sea was drawn with two flanking lines of yellow (which subtended 6 min arc) and purple (5 min arc). One figure had purple on the outside and yellow on the inside of the contour (Yellow In, or Yin), and the other figure had the reverse (Yellow Out, or Yout); **Figure 1** portrays the Yin stimulus.

The monitor was driven by a Cambridge Research Systems VSG-5 card programmed in Matlab V.6 and run under Windows XP. The VSG card provides accurate timing of display frames when run repeatedly in "movie" mode, as confirmed with a counter triggered by a photodiode: every 10 ms frame was timed correctly over a 20 min. calibration period. The nominal chromaticity of the gray field was (0.290, 0.300) in CIE *(x, y)* co-ordinates as recorded with a calibrated Cambridge Research Systems colorimeter. This gray point is very slightly bluer than most standard daylights—for example, Illuminant C is (0.310, 0.316). This gray was chosen as being red/green neutral and having no yellow in it, to ensure that any perceived yellow was illusory. The (x, y) chromaticities of the yellow and purple lines used to draw the stimuli were (0.480, 0.500) for yellow and (0.301, 0.120) for purple. Their luminances were 141 cd/m² and 53 cd/m² respectively. To make these measurements, the chromaticities were duplicated over areas large enough to fill the aperture of the colorimeter.

#### **EXPERIMENT 1: MATCHING**

Matching experiments were run to obtain an objective or "Type A" (Brindley, 1960) measure of the strength of the illusory wash with the Sea. Research using phenomenological ratings or other verbal descriptions ("Type B" measures) has established conditions in which the illusion is seen, but not its precise extent. Type A matches of real to illusory colors in a variant watercolor stimulus due to Pinna et al. (2003), namely, irregular outline squares with purple and orange bichromatic contours, produced watercolor effects of 3% of the distance to the inducing orange in a CIE uniform color space (von der Heydt and Pierson, 2006) and of 5.6% in a near replication (Devinck et al., 2005). Our participants similarly matched the color of the illusory area to that of a real comparison stimulus. We permitted almost continuous variation of the color coordinates of the matching stimulus, as did Devinck et al. (2005), rather than employing a discrete series of stimuli, as in von der Heydt and Pierson (2006), as the effect was expected to be small.

In an ideal type A match, the two areas, once matched, are indiscriminable. Type A matches have been critical to visual science since they imply identity of the signals from the two areas at all sites in the visual pathway subsequent to the first site at which identity occurs. Such matches have been invaluable in analyses of higher levels of the visual pathway (as for example in color induction) as well as in classical studies of receptor properties. Some researchers have qualified this picture, since in the case of asymmetrical matches (e.g., between lights viewed under different illuminants), subjects feel they should be able to improve on even their best matches (e.g., Brainard et al., 1997), indicating that the signals from the two areas are not always indiscriminable. In our experiments the scene illumination was constant and the illusory wash appeared as a surface color (the illusion being cognitively impenetrable), not as a trick of the lighting, so Type A matches were possible. If signals from matched real and illusory areas are indeed identical, the illusion should be cancelled by a real complementary stimulus of the same magnitude, as found by Devinck et al. (2005).

To make matches, the 17 participants had mouse control over the two chromatic co-ordinates (*x*, *y*) of a uniformly filled comparison disk, positioned outside the illusory area, of fixed size (3 degrees of visual angle) and fixed luminance, but variable hue and saturation. The disk was given an arbitrary chromaticity (0.2 < *x* < 0.4; 0.2 < *y* < 0.4) at the start of each trial<sup>1</sup>, and the participant was asked to adjust its color to match that of the wash. Small mouse movements initially made large differences in disk color, but the left mouse button could be pressed at any time to halve the step size, so the participant could narrow in on the best match. Best matches were signaled to the computer by a right button press, ending each trial. Note that the illusory stimulus remained present, and physically unchanged, throughout the matching procedure. Fortunately the illusion is stable and does not fade while the matches are being made in this painstaking manner.

There were four conditions. In the two *complementary* conditions, participants matched the comparison disk to the gray inside region when the stimulus was Yellow Out (Yout), or to the gray outside region when the stimulus was Yellow In (Yin). We anticipated that these matches would on average equal the chromaticity of the gray field, given that the bichromatic contour exerts an illusory effect only in the opposite direction. In the two *illusion-generating* conditions, participants matched the comparison disk to the gray inside region when the stimulus was Yin, or to the gray outside region when the stimulus was Yout.

Participants ran 3 to 5 blocks, each comprising 10 trials in each of the four conditions, for a total of 40 trials per block. Thus 30, 40, or 50 matches were made per participant per condition. The order of conditions in each block was random, except that 9 of the participants began with illusion matches and 8 with complementary matches. The participants who ran fewer than 5 blocks took extra time per trial, there being no explicit time pressure.

Although the matching instructions were understood and making matches to gray was found to be easy, making matches to the illusion was not easy. Participants reported that the color of the wash, though perceptually salient, was hard to capture, and the ultimate matches were often felt to be "as good as possible" rather than exact. In this respect the matches were not pure Type A, but like the asymmetrical matches of Brainard et al. (1997).

#### **RESULTS (MATCHING)**

Matches (**Table 1**) were transformed to (*u*- *, v*- ) space as this coordinate system more nearly approaches a uniform chromaticity scale than does (*x, y*) (Wyszecki and Stiles, 1982). Matches for the two *complementary* areas, where no illusion was anticipated, averaged (*u*- = 0*.*191, *v*- = 0*.*448) for the gray outside region of Yin and (0.194, 0.449) for the gray inside region of Yout (see the lower right corner of **Figure 2**). These values deviated slightly from the actual gray field (0.193, 0.449), but not significantly as the error bars overlapped the gray field coordinates.

Consistent with there being a color shift in the *illusion* areas, the chromaticity matched to the central gray region averaged (*u*′ = 0.189, *v*′ = 0.465) in Yin and (0.190, 0.454) in Yout. The illusion in Yin represents 13% of the distance in (*u*′, *v*′) space from the white point to a monochromatic yellow of wavelength 565 nm; the corresponding distance in Yout is 6%. These distances indicate fairly desaturated illusory colors, but as the threshold for discriminating white from yellow is 3.1% in (*u*′, *v*′) space (Wyszecki and Stiles, 1982), the illusory yellow in Yin would be visible to the average participant, being about four times threshold.


<sup>1</sup>For reference, *x* = *X*/*T* and *y* = *Y*/*T*, where *T* = *X* + *Y* + *Z* is the total light through the *X*, *Y*, and *Z* colorimetric filters. In the (*x*, *y*) system, the spectrum locus spans 0.04 < *x* < 0.72 and 0.01 < *y* < 0.82. CRT color displays cover about half of the total possible area, but this includes the watercolor illusion as this is desaturated and spans only the central region of the space. Transforming the (*x*, *y*) space into some other system is required to convey equal threshold intervals, relative receptor activations, or equal spacing of color appearances. At the prompting of a reviewer, we chose a uniform color space (*u*′, *v*′) so that distances in the space would be comparable in terms of threshold units, where *u*′ = 4*x*/*Q*, *v*′ = 9*y*/*Q*, and *Q* = 12*y* − 2*x* + 3. Data in (*u*′, *v*′) can be transformed back to (*X*, *Y*, *Z*) and thence to any other system, given that the luminance *Y* = 116 cd/m².
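The transformation in the footnote, and the route back to (*X*, *Y*, *Z*) that it mentions, can be sketched in a few lines of Python. The function names are ours, and the inverse formulas are not given in the text; they follow from solving the two footnote equations for *x* and *y*. This is an illustrative sketch, not the authors' analysis code.

```python
def xy_to_uv(x, y):
    """CIE (x, y) -> (u', v') using the footnote's formulas."""
    q = 12.0 * y - 2.0 * x + 3.0
    return 4.0 * x / q, 9.0 * y / q


def uv_to_xy(u, v):
    """Inverse transform, obtained by solving u' = 4x/Q and v' = 9y/Q for x and y."""
    d = 6.0 * u - 16.0 * v + 12.0
    return 9.0 * u / d, 4.0 * v / d


def uvY_to_XYZ(u, v, Y=116.0):
    """(u', v') plus luminance Y (116 cd/m^2 in the text) -> tristimulus (X, Y, Z)."""
    x, y = uv_to_xy(u, v)
    X = x * Y / y
    Z = (1.0 - x - y) * Y / y
    return X, Y, Z


# Example: the gray field reported in the Results, (u', v') = (0.193, 0.449).
x_gray, y_gray = uv_to_xy(0.193, 0.449)
print(f"gray field: x = {x_gray:.3f}, y = {y_gray:.3f}")
```

Because the forward and inverse transforms are exact algebraic inverses, a round trip through `uv_to_xy` and `xy_to_uv` should return the starting coordinates to within floating-point error.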

**FIGURE 2 | Upper panel:** Mean matches in Experiment 1 for Yin and Yout configurations. Matches were made to the illusory area and also to the complementary gray area as a control; the actual gray field is noted by a filled square for comparison. Error bars show ±1 standard deviation of trial-to-trial matches in the *u*′ and *v*′ directions, averaged over 17 participants. The spectrum locus lies outside this plot: the dominant wavelength corresponding to Yin, 565 nm, would be located at *u*′ = 0.177, *v*′ = 0.573. **Lower panel:** frequency polygons of mean matches from the 17 participants in Yin and Yout, plotted against distance in (*u*′, *v*′) space to the matched gray. Threshold for a visible change from gray to yellow is marked by the left-hand broken vertical arrow. The standard used in Experiment 2 is marked by the right-hand vertical arrow.

The illusion generated by the Yin version of the Sea, at 13%, was clearly stronger than the 5.6% generated by the variant watercolor illusion used by Devinck et al. (2005) and the 3% reported by von der Heydt and Pierson (2006). Such a large difference awaits explanation; part of the reason may be the recognizability of the Sea, to which we will turn in Experiment 2.
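The percentage measure used above can be illustrated with a short Python sketch. It assumes that the "white point" is the actual gray field (0.193, 0.449) and that the 565 nm reference is the locus point quoted in the Figure 2 caption; both choices, and the function name, are our assumptions rather than a statement of the authors' exact procedure.

```python
import math

GRAY = (0.193, 0.449)        # actual gray field, taken here as the white point (assumption)
LOCUS_565 = (0.177, 0.573)   # 565 nm point on the spectrum locus (Figure 2 caption)
THRESHOLD_PCT = 3.1          # white-to-yellow threshold quoted in the text


def percent_toward_locus(match, white=GRAY, locus=LOCUS_565):
    """Distance white->match as a percentage of the distance white->locus in (u', v')."""
    return 100.0 * math.dist(white, match) / math.dist(white, locus)


yin = percent_toward_locus((0.189, 0.465))    # mean illusion match, Yin
yout = percent_toward_locus((0.190, 0.454))   # mean illusion match, Yout
print(f"Yin:  {yin:.1f}% ({yin / THRESHOLD_PCT:.1f} x threshold)")
print(f"Yout: {yout:.1f}%")
```

With the rounded means quoted in the text, this simple ratio gives about 13% for Yin, in line with the reported figure, and about 5% for Yout, slightly below the reported 6%. The small discrepancy may reflect rounding or details of the original computation (for example, per-participant white points) that are not stated, so it should not be over-interpreted.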

Individual differences could be ascertained fairly reliably, based on the 30, 40, or 50 trials run in each condition. Participants were highly reliable in their settings, their standard deviations averaging just *u*′ = 0.0028, *v*′ = 0.0037 over participants in Yin (the error bars shown in **Figure 2**). Euclidean distances in (*u*′, *v*′) space from each participant's white point to his or her average illusory chromaticity in Yin varied widely, from 0.0077 to 0.032, across the 17 participants. It is likely that 15 of them experienced illusions, these distances being more than 3 standard deviations above threshold, but the two participants with the smallest distances, 0.0077 and 0.0087, may not have done so. Across participants, the magnitude of the illusion in Yin did not correlate significantly with that in Yout [*r* = 0.285, *t*(15) = 1.16, n.s.], suggesting independent processing of the different regions.
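The significance test for the Yin/Yout correlation can be checked with the standard conversion of a Pearson *r* to a *t* statistic with *n* − 2 degrees of freedom. The sketch below is ours; the critical value is the usual two-tailed tabled value for 15 degrees of freedom at α = 0.05.

```python
import math

T_CRIT_DF15 = 2.131  # two-tailed critical t at alpha = 0.05, df = 15 (standard table value)


def t_from_r(r, n):
    """t statistic and degrees of freedom for testing a Pearson correlation against zero."""
    df = n - 2
    return r * math.sqrt(df) / math.sqrt(1.0 - r * r), df


t, df = t_from_r(0.285, 17)
print(f"t({df}) = {t:.2f}, significant: {abs(t) > T_CRIT_DF15}")
```

The result, *t*(15) ≈ 1.15, agrees with the reported 1.16 to within rounding of *r* and falls well short of the critical value.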

The results of the matching experiment show that the Watercolor effect can be measured using a matching procedure, and indicate that the effect is reliably greater than threshold for nearly all participants. The effect is about twice as large in Yin as in Yout, consistent with the filling-in of an enclosed area being stronger than spreading outwards into an indefinite region. In addition, the neutrality of the complementary (non-illusory) areas was confirmed. The matches were of high quality, the standard deviations being low. However, some participants reported that the illusory area could still be discriminated from the comparison at the match point, so these were not all true Type A matches, either because the luminance of the matching stimulus could not be adjusted, or because the color of the illusory wash differs from that of the matching stimulus in some other, qualitative, manner.

#### **EXPERIMENT 2: RATING** *RE* **A PHYSICAL ANCHOR**

To obtain systematic data as a function of stimulus duration, which required many more data points than we obtained in matching, we used a faster rating procedure.

#### **PROCEDURE**

The same watercolor stimuli were used as in Experiment 1. The disk used for matching in Experiment 1 now provided a physical anchor or standard for the rating scale. The disk was a fixed pale yellow shown by the cross in **Figure 2** (at *u*′ = 0.190, *v*′ = 0.470). Participants were first shown a black and white paper version of the stimulus to test their recognition of the Mediterranean Sea from its outline. Some could not do so, not being of European origin, unlike Pinna's Italian observers. After the experiments were over, those who had not recognized the Sea were asked if they had done so during the experiment, to check for delayed recognition. This was the case for just four observers, whose data were folded in with those who initially recognized the Sea; this did not change the pattern of results. Participants now only rated the illusory regions, not the complementary areas, since participants in Experiment 1 had matched the complementary areas to gray. (As a check, though, every participant also rated the complementary areas once at the start of de-briefing, and they did rate them as 0, meaning gray.)

All participants were instructed to rate the strength of the color wash from 0, or none, to 4, the rating assigned to the standard, with 1 designated as a just-visible illusion, and 2 and 3 to be spaced as equally as possible between 1 and 4. The same standard was used in all conditions for all participants. Ratings 2 and 3 were not anchored but left up to the participants' understanding of "equally-spaced." (Here, strength implies the saturation and perhaps brightness of the yellow wash, as the illusion did not affect hue.) The standard could be requested at any time between trials, upon the participant's key press. After initial practice, such requests became infrequent as participants mostly relied on memory. In debriefing, participants reported understanding the rating task and being able to use the scale as requested. Although the standard was more saturated than the mean match made in Yin by every participant in Experiment 1 (see **Figure 2**, lower panel), nevertheless, due to variability over trials in illusion strength, participants with strong illusions might have wanted to use a rating of "5" on occasion. We had abandoned this rating level in pilot work as it was rarely used, but the highest mean ratings, those obtained at the longest durations, may yet have been somewhat truncated.

Participants viewed eight different durations of the stimulus, namely, 10, 20, 30, 100, 300, 1000, 3000, and 10000 ms. Each duration was presented four times in random order, for a total of 32 trials per participant per stimulus. After each stimulus presentation, participants gave their rating. Half the participants rated the Yellow In (Yin) stimulus first, and then the Yellow Out (Yout); the order was reversed for the other half. They rated the *illusory* areas, that is, they rated the central, bounded, region in Yin and the region outside the bounding contour in Yout. Each participant rated four trials of Yin and four trials of Yout, at each duration.
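The trial structure just described (eight durations, four repetitions each, random order, with stimulus order counterbalanced across participants) can be sketched as follows. The function names and the even/odd assignment rule are hypothetical; the text states only that half the participants rated Yin first.

```python
import random

DURATIONS_MS = [10, 20, 30, 100, 300, 1000, 3000, 10000]  # from the Procedure
REPEATS = 4                                               # presentations per duration


def make_schedule(seed=None):
    """Return one randomized list of 32 stimulus durations (ms) for a single stimulus."""
    rng = random.Random(seed)
    trials = DURATIONS_MS * REPEATS
    rng.shuffle(trials)
    return trials


def stimulus_order(participant_index):
    """Hypothetical counterbalancing rule: even indices rate Yin first, odd indices Yout first."""
    return ("Yin", "Yout") if participant_index % 2 == 0 else ("Yout", "Yin")


schedule = make_schedule(seed=1)
print(len(schedule), "trials; first five durations (ms):", schedule[:5])
```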

#### **RESULTS AND DISCUSSION (RATINGS; FIXED ANCHOR)**

Confirming Pinna (1987, 2008), not one of our 34 participants realized that the wash was illusory. (Many asked us to prove this was so: we complied, after debriefing, by blocking off sections of the bounding contour.) The illusion is cognitively impenetrable even when participants are rating or matching the illusory stimulus multiple times for an entire hour.

Participants were separated into those who recognized and those who did not recognize the Mediterranean Sea from its outline, as determined during de-briefing. As 16 recognized and 18 did not, the sizes of the two groups were comparable. (*En passant*, we note that the data of the participants who reported feeling bored by the experiment did not differ systematically from the rest, so results were collapsed over interest level.)

The four ratings of each condition and duration provided by each participant were averaged and taken as the input for all statistical calculations and graphs. There were large individual differences, whose causes are unknown: some participants needed just 3 frames (30 ms) to reach their maximum rating, while others needed 100 frames (1 s) in the "recognize" group, or as much as 500 frames (5 s) in the "unrecognized" group.

Mean ratings over participants are plotted in **Figure 3** with error bars based on standard errors across participants. It should be kept in mind that the error bars are small due to the large number of participants, not to lack of systematic individual differences. Despite individual differences, the results were consistent in that the illusion strength in Yin was always rated as greater than in Yout, and in every participant, the illusion strengthened with duration, so the mean data are in this sense representative.

**FIGURE 3 |** Mean ratings in Experiment 2 for Yin (top panel) and Yout (bottom panel), plotted against stimulus duration on a logarithmic axis. Data for participants who did and did not recognize the Sea are plotted separately. Error bars show ±1 standard error of each mean.

Plots of mean rating vs. stimulus duration in **Figure 3** show a progressive increase in illusion magnitude as stimulus duration was increased. Logarithmic time axes were used in graphs to avoid compressing the data at short durations; however, all reported growth rates are from regressions of mean ratings against linear time. The data appear to fall into two regimes, with a fast rise at durations below 0.3 s followed by a slow rise thereafter. That is, mean ratings in Yin (**Figure 3**, top panel) rose rapidly from 0.01 to 0.3 s, at 1.0 rating unit per tenth of a second (1.0/100 ms) for recognizers, and 0.77/100 ms for non-recognizers. In contrast, from 0.3 to 10 s, ratings rose by only 0.03/s for recognizers and by 0.06/s for non-recognizers (n.s.). Data for Yout (bottom panel) were similar, in that ratings also increased rapidly from 0.01 to 0.3 s, at 0.64/100 ms for recognizers, and 0.88/100 ms for non-recognizers (*r*² = 0.77 and 0.88, respectively), while from 0.3 to 10 s ratings increased at a rate of only 0.02/s for both participant groups (n.s.). It is also clear from the plots in **Figure 3** that ratings were higher for those who did recognize the Sea than those who did not, at each stimulus duration.
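To make the growth-rate measure concrete, the sketch below fits an ordinary least-squares line to mean ratings against linear time, separately for the fast (0.01 to 0.3 s) and slow (0.3 to 10 s) regimes. The ratings in the example are hypothetical placeholders chosen only to show the calculation; they are not the data of **Figure 3**, and the function name is ours.

```python
def linear_fit(times_s, ratings):
    """Ordinary least-squares fit of rating against time; returns (slope, intercept, r2)."""
    n = len(times_s)
    mean_t = sum(times_s) / n
    mean_r = sum(ratings) / n
    sxx = sum((t - mean_t) ** 2 for t in times_s)
    syy = sum((r - mean_r) ** 2 for r in ratings)
    sxy = sum((t - mean_t) * (r - mean_r) for t, r in zip(times_s, ratings))
    slope = sxy / sxx
    return slope, mean_r - slope * mean_t, sxy * sxy / (sxx * syy)


durations_s = [0.01, 0.02, 0.03, 0.1, 0.3, 1.0, 3.0, 10.0]
ratings = [0.6, 0.9, 1.1, 2.0, 3.0, 3.1, 3.2, 3.4]  # HYPOTHETICAL mean ratings

# Fast regime: durations up to 0.3 s; slow regime: 0.3 s onward.
fast_slope, _, fast_r2 = linear_fit(durations_s[:5], ratings[:5])
slow_slope, _, slow_r2 = linear_fit(durations_s[4:], ratings[4:])
print(f"fast: {fast_slope * 0.1:.2f} rating units per 100 ms (r2 = {fast_r2:.2f})")
print(f"slow: {slow_slope:.3f} rating units per s (r2 = {slow_r2:.2f})")
```

Multiplying the fast-regime slope (rating units per second) by 0.1 expresses it per 100 ms, the unit used in the text.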

Averaging ratings over both recognizers and non-recognizers, ratings for Yin were higher than those for Yout, consistent with the matching data from Experiment 1, as is made clear in **Figure 4**. The difference between the ratings in Yin and Yout increased slightly at longer stimulus durations, but it is notable that a difference is evident even with a stimulus duration of only one frame (0.01 s).

#### **EXPERIMENT 3: RELATIVE RATING**

In Experiment 3, 12 new participants were trained to use a relative rating scale, in which "4" represented the illusion when the Sea was continuously presented, both in Yin and, separately, in Yout. There was no fixed comparison disk. Other ratings were designated as before. In this manner we could compare the growth of Yin and Yout ratings over time, while normalizing out the differences between their absolute magnitudes that appeared in Experiment 2. Only those who recognized the Sea were run. The procedure was the same as in Experiment 2, except that the longest duration was 5 s.

#### **RESULTS AND DISCUSSION (RATINGS; RELATIVE TO 4.0)**

Mean ratings are plotted in **Figure 5**, again on a log time basis to avoid compressing the data at short stimulus durations. Linear regressions showed the typical fast growth over stimulus durations from 0.01 to 0.1 s, namely, 0.85/100 ms for Yout, and 1.05/100 ms for Yin (*r*² = 0.73 and 0.81, respectively). Growth was much slower from 0.3 to 5 s, at 0.11/s for Yin and 0.23/s for Yout. Growth patterns for Yin and Yout were generally similar. It is also clear that the illusion in Yin was stronger than in Yout, the average rating for Yin (2.78) exceeding that for Yout (2.00). The asymptotic rating, expected to be 4 at the longest duration for both conditions, only reached 3.5 for Yin and 3.0 for Yout at the longest duration used (5 s). In hindsight, we should have included an even longer duration to make quite certain that 4 was eventually reached, as per instructions.

**FIGURE 4 | Mean ratings in Experiment 2 for Yin and Yout, averaged over recognizers and non-recognizers, plotted against stimulus duration in log s.** Error bars show ±1 standard error of each mean.

To characterize the growth of the watercolor effect during the early period when the illusion is growing rapidly, we asked at what time does the effect reach half-way between the rating at 0.01 s and that at 0.1 s? Linear time axes were used to interpolate the half-way point. This calculation removes any effect of an overall difference in strength. **Table 2** shows the mean ratings at 0.1 and 0.01 s, the difference between them (the "Effect" column), the half-way rating magnitude (Rating at 0.1 s minus half the Effect), and the half-way times (the time to reach the half-way magnitude). The rows show these values for each configuration (Yin, Yout) and recognition (did, didn't), in Experiments 2 (Yin-2-did, Yin-2-didn't, etc.) and 3 (Yin-3-did, etc.).

In Experiment 2, those who recognized the Sea attained a faster growth in the illusion than those who did not: the mean half-way times were 0.052 s and 0.075 s respectively. Recognizers experienced both more illusion and a faster growth in the illusion than non-recognizers.

Averaged over recognition status, growth in Yin was slower than in Yout in Experiment 2 (the first 4 lines of **Table 2**): the half-way times are 0.063 s in Yin and 0.027 s in Yout. We asked whether this slower growth for Yin was also the case just for recognizers, as these were run in both Experiments 2 and 3. Their half-way times were indeed longer (0.052 s) in Yin than in Yout (0.025 s) in Experiment 2, but were the same in Yin (0.025 s) as in Yout (0.026 s) in Experiment 3 (the last two lines of **Table 2**). Thus these results are inconclusive and we can only conclude that the growth in illusion strength over the first 100 ms in Yin is either equal to, or slower than, that in Yout. A matching experiment, not dependent on the choice of rating scale anchor, may be needed to resolve this ambiguity.

**FIGURE 5 | Mean ratings in Experiment 3 for Yin and Yout, plotted against stimulus duration in log s.** Only recognizers were run. Error bars show ±1 standard error of each mean.

#### **Table 2 | Estimates of the time in secs to reach 50% of the illusion as rated at 0.1 s.**


#### **EXPERIMENT 4 (MASKING)**

Geometrical illusions such as the Ponzo and Zoellner illusions develop microgenetically, as demonstrated by Reynolds (1978), who followed a brief (50 ms) presentation of the illusion-generating pattern by a 200 ms patterned backward mask. At short inter-stimulus intervals (ISIs), subjects reported the stimulus as it truly was; only at an ISI of 100 ms did the illusions reach their full extent. Reynolds concluded that perception of the parts occurred before their interaction could create the illusion evident in the whole. Kurylo (1997) presented a columnar array of dots to study Gestalt grouping, which was followed by a patterned mask. He argued that the dots must first be located before they can be grouped, so that at longer array-to-mask intervals, only grouping would be interfered with (at shorter ones, locations might also be lost). On this basis he identified the "completion time" as that interval at which interference was first observed. Mean times to complete grouping were 88 ms by proximity and 119 ms by alignment. These times suggest a minimum for the watercolor effect since regions cannot be filled in until their boundaries are grouped into a unit.

We wondered whether backward masking could also be used to study the microgenesis of the illusory wash. Might the observers see a gray field for a period before the wash "filled in," or was the fill-in virtually instantaneous? We ran both recognizers and non-recognizers to discover whether any such effect might be modified by prior knowledge.

#### **PROCEDURE**

The stimulus duration was held constant at 50 ms, long enough for the illusion to have developed about half-way according to the earlier ratings. The 50 ms stimulus was followed, after a blank ISI consisting of the same gray 116 cd/m² field, by a colored, textured mask (**Figure 1**, right-hand panel). The gray field retained the same luminance during the ISI as before the stimulus and after the mask to avoid brightness transients (as might be created by a blank, i.e., dark, ISI). The mask was completely effective when presented simultaneously with the stimulus, rendering the stimulus invisible. Unlike Reynolds' line masks, which were chosen to disrupt edge interactions, our mask consisted of a texture of multiple colored dots from green through red, averaging to yellow. This mask was chosen to disrupt filling-in by color, rather than the processing of the bichromatic outline. The ISIs were 10, 20, 30, 50, 100, 150, 200, 300, and 500 ms, and were randomized over trials.

Participants again rated the intensity of the illusion on a four-point scale after each appearance of the stimulus. As in Experiment 3, ratings were relative; that is, the maximum rating of "4" was assigned to Yin and separately to Yout, when these stimuli were presented continuously. Half the participants were shown the Yin stimulus before the Yout one, and the remainder, the reverse. Each participant received 6 stimuli at each ISI for a total of 54 trials each.

#### **RESULTS AND DISCUSSION (MASKING)**

Of the 30 participants, 18 did, and 12 did not, recognize the Sea. For both groups, the illusion reached its asymptote when the mask was presented immediately following the stimulus (an ISI of just 10 ms), and did not increase during the next 500 ms. This is shown by the flat curves in **Figure 6** for the Yin configuration at the top and for Yout at the bottom; different symbols indicate recognition status. Regression analysis is hardly necessary but confirmed that ratings were essentially flat across ISI, with a mean change of −0.03/s (*r*² values were less than 0.19 for each of the 4 plots in **Figure 6**, n.s.). The similarity of the data of those who did, and did not, recognize the Sea is marked; clearly the time course of masking, as revealed by these relative ratings, was independent of recognition status.

The steadily increasing growth of the illusion when stimulus duration is increased, seen in Experiments 2 and 3, is in clear contrast to the flat curves obtained when the ISI was increased in Experiment 4. Providing more processing time for the illusion by increasing the ISI had a quite different outcome compared to increasing processing time by increasing duration. In principle, increasing the ISI provides an opportunity for developing the illusion, as in Reynolds' (1978) experiment, but this did not happen here. Rather, watercolor filling-in appears to happen simultaneously with the processing of the bichromatic contour, or perhaps just after but still during the 50 ms stimulus presentation. Additional post-exposure time has no effect.

**FIGURE 6 | Mean ratings in Experiment 4 (masking) of Yin (upper panel) and Yout (lower panel), for recognizers and non-recognizers, plotted against ISI in log s.** Ratings were relative, as in Experiment 3. Error bars show ±1 standard error of each mean.

#### **GENERAL DISCUSSION**

The main outcome of the current research is that the watercolor illusion grows rapidly during the first 100 ms, but only when the stimulus is physically present; there was no further growth during the blank interval before the onset of a color-texture mask. This provides evidence that the watercolor effect forms rapidly, but requires a bottom-up signal, consistent with our understanding of the BCS/FCS model (Pinna and Grossberg, 2005), as boundary contour system (BCS) formation, which is bottom-up, immediately precedes filling in by the feature contour system (FCS). However, two additional facts modify this picture. First, the illusion does grow, albeit much more slowly, if the exposure duration is increased from 0.1 to 10 s. (It is possible that this additional slow growth is related to monkey V1 "edge cells," which progressively fill in color from about 0.3 to 8 s: von der Heydt et al., 2003.) Secondly, those participants who did recognize the outline of the Sea experienced a faster growth to a higher level than those who did not recognize the Sea. One might have anticipated that the interaction with visual long-term memory that promoted the illusion in the recognizers would occur only after some delay, even in the BCS/FCS model. Yet the results show a clear advantage for recognizers even with 10 ms exposures, although the recognition effect does increase slightly with stimulus duration. Therefore the long-term memory interaction appears to be remarkably rapid. It is not known whether this interaction occurs afresh on each trial or reflects priming from previous trials, since only one watercolor stimulus (the Sea) was used throughout each experiment. Possibly primed signals are readied to guide microgenesis when stimuli are primed by an episodic form of long-term visual memory. A role for priming could be established by comparing repeated presentations of the same stimulus to presentations of novel stimuli on each trial.

A further outcome is that the watercolor effect can be measured quite precisely using a matching paradigm. This was not unexpected, as the illusory color is quite striking and does not fade with time. However, it is somewhat disturbing that previous measurements of the watercolor effect using matching to nonsense forms yielded a much smaller illusion, of 5.6% in (*u*′, *v*′) space (Devinck et al., 2005). One can postulate that the outline of the Sea was a nonsense stimulus for our non-recognizers. If so, recognition would have to more than triple the illusion from 5.6% (for non-recognizers) to 20% (for recognizers) to obtain our mean color shift of 13% for Yin in Experiment 1, given that half our participants recognized the Sea. Unfortunately we did not check for recognition in Experiment 1 and cannot ascertain if this is true, but the ratings do not suggest that recognition even doubles the effect, let alone triples it. Physical differences, such as in spatial area and contour complexity, may account for the remaining differences.
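The arithmetic behind the 20% figure can be made explicit. Assuming, as the argument does, that half the participants were recognizers and that non-recognizers showed the 5.6% shift of Devinck et al. (2005), and writing $s_R$ (our notation) for the recognizers' mean shift:

$$
\tfrac{1}{2}\,(5.6\%) + \tfrac{1}{2}\,s_R = 13\%
\;\Longrightarrow\;
s_R = 2\,(13\%) - 5.6\% = 20.4\% \approx 3.6 \times 5.6\% .
$$

That is, under these assumptions recognition would have to multiply the effect by roughly 3.6, which is the sense of "more than triple" in the argument above.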

The possibility of measuring the watercolor illusion accurately means that it can be compared with other more traditional chromatic illusions. An obvious example is chromatic induction (Shevell and Wei, 1998). Since light scatter is one component of chromatic induction (Walraven, 1973) but not of the watercolor effect, a direct comparison of magnitudes is risky. However, Shevell and Wei (1998) were able to identify a source of chromatic induction from a neural signal for contrast at the edge of the test field using a method that controls for stray light. Ware and Cowan (1982) measured the extent of chromatic induction when test and inducer stimuli were alternating thin strips, each 6 min arc wide, similar to the lines in our watercolor stimulus. With light scatter and chromatic aberrations controlled, they found that a yellow inducer shifted a white test field chromaticity toward blue by 22% and 15% for each of two observers, in (*u*′, *v*′) space. This is somewhat comparable to our 13% effect in Yin, but much larger than the 5.6% effect found with irregular outline squares by Devinck et al. (2005). Future measurements will be needed to compare chromatic induction directly to the watercolor effect, using the same participants, adaptation conditions, and test stimuli, to discover whether these different illusory effects share any mechanisms or properties.

#### **REFERENCES**


618–658. doi: 10.1037/0033-295X.104.3.618


*Sci. Vis.* 53, 741–743. doi: 10.1364/JOSA.53.000741


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 15 June 2013; paper pending published: 26 June 2013; accepted: 14 September 2013; published online: 17 October 2013.*

*Citation: Reeves A, Pinna B and Roxas F (2013) The microgenesis of the watercolor effect. Front. Psychol. 4:702. doi: 10.3389/fpsyg.2013.00702*

*This article was submitted to Perception Science, a section of the journal Frontiers in Psychology.*

*Copyright © 2013 Reeves, Pinna and Roxas. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Afterimage watercolors: an exploration of contour-based afterimage filling-in

#### *Simon J. Hazenberg \* and Rob van Lier*

*Donders Institute for Brain, Cognition and Behaviour, Radboud University Nijmegen, Nijmegen, Netherlands*

#### *Edited by:*

*Galina Paramei, Liverpool Hope University, UK*

*Reviewed by:*

*Daw-An Wu, Caltech, USA Baingio Pinna, University of Sassari, Italy*

#### *\*Correspondence:*

*Simon J. Hazenberg, Centre for Cognition, Donders Institute for Brain, Cognition and Behaviour, Radboud University Nijmegen, Montessorilaan 3, PO Box 9104, 6500 HE Nijmegen, Netherlands e-mail: s.j.hazenberg@donders.ru.nl*

We investigated filling-in of colored afterimages and compared them with filling-in of "real" colors in the watercolor illusion. We used shapes comprising two thin adjacent undulating outlines of which the inner or the outer outline was chromatic, while the other was achromatic. The outlines could be presented simultaneously, inducing the original watercolor effect, or in an alternating fashion, inducing colored afterimages of the chromatic outlines. In Experiment 1, using only alternating outlines, these afterimages triggered filling-in, revealing an "afterimage watercolor" effect. Depending on whether the inner or the outer outline was chromatic, filling-in of a complementary or a similarly colored afterimage was perceived. In Experiment 2, simultaneous and alternating presentations were compared. Additionally, gray and black achromatic contours were tested, the black contours having a higher luminance contrast with the background. Compared to "real" color filling-in, afterimage filling-in was more easily affected by different luminance settings. In particular, afterimage filling-in was diminished when high-contrast contours were used. In the discussion we use additional demonstrations in which we further explore the "watercolor afterimage." All in all, comparisons between both types of illusions show similarities and differences with regard to color filling-in. Caution, however, is warranted in attributing these effects to differences in underlying processing.

**Keywords: color, contours, afterimages, filling-in, illusion**

#### **INTRODUCTION**

Perception of color does not always reflect what is physically present. This is clearly demonstrated in the case of colored afterimages where, after adapting to a colored stimulus, a vivid afterimage of a complementary hue is seen when the stimulus is removed or when one shifts one's gaze to a blank wall. Although the neural locus of afterimages is still debated, data from a recent study indicate that colored afterimages arise from signals originating in the retina (Zaidi et al., 2012). It has been argued further that the signals coding for colored afterimages may be, like retinal signals coding for "real" colors, subject to various kinds of contextual modifications. In this study we focused on the influence of luminance contours on the perception of colors. Specifically, we investigated filling-in of colored afterimages between luminance contours and compared this with color filling-in that is induced by "real" colors in the well-known watercolor illusion.

Various studies have shown that the perception of colored afterimages can be modulated by luminance-defined contours. For instance, an afterimage appears much more salient and saturated when it is surrounded by a luminance contour (Daw, 1962). More recently, the dependence of afterimages on contours has been emphasized by showing that a colored afterimage spreads within a test outline and fills in regions that were not adapted to color (van Lier et al., 2009). They used adapting stimuli consisting of multiple colors and an achromatic inner region and showed that the location that was not adapted to color revealed an afterimage color that depended on the subsequently presented outline (see **Figure 1**). They additionally demonstrated that when afterimages of multiple colors are contained within a contour, the colors within that contour tend to mix (van Lier et al., 2009; Anstis et al., 2012a). The color of an afterimage not only depended on the color that was positioned inside a subsequent contour, but also on the color that was positioned outside the subsequent contour. The color inside the contour produced a complementary color, whereas the color outside the contour produced a color that was similar to the inducing color, although the latter effect was found to be less strong (van Lier et al., 2009). Two possible routes have been suggested previously to explain the latter colored afterimage. Along the first route, the outside color induces a contrasting color across the boundary into the interior of the figure during the adapting period, which then produces a complementary colored afterimage during the test phase. Along the second route, the outside color produces a complementary colored afterimage during the test phase, which then induces a contrasting color into the interior of the outline (Anstis et al., 1978).

Interactions between contours and "real" colors are known as well. Chromatic sensitivity seems to be enhanced when a colored region is bounded by luminance edges (Montag, 1997). Similarly, "real" color filling-in occurs in the Boynton illusion, in which color spreads out from a colored region to (nearly) isoluminant achromatic areas until it reaches luminance-defined contours or illusory contours (Feitosa-Santana et al., 2011). Furthermore, in the neon color illusion, color diffuses from colored parts of, for example, multiple concentric lines and is blocked by illusory contours (van Tuijl, 1975; Bressan et al., 1997; van Lier, 2002). The typical percept is of an illusory colored transparent disc floating in front of the inducing elements. Finally, Pinna et al. (2001) developed a clear demonstration that even a pair of juxtaposed thin colored undulating outlines induces color filling-in, i.e., the so-called watercolor illusion. **Figure 2A** shows a typical example of the watercolor illusion. By positioning the light orange outline on the inside and the dark purple outline on the outside of the star-like shape, the interior of the shape appears to be filled-in with a color that is similar to, but less saturated than, the color of the inner contour. When the colors of the outlines are reversed, color appears to spread outwards (**Figure 2B**). Color, then, spreads in the direction where the luminance contrast between the contour and the background is lowest. In addition, the strength of the effect depends on luminance contrast between the two outlines. When the outlines are isoluminant, color spreading is rather weak and appears to spread both inwards and outwards, but becomes more vivid when the luminance contrast between the outlines is enhanced (Devinck et al., 2005). Additionally, Cao et al. (2011) showed that there is an optimal contrast after which color spreading diminishes again.

In any case, it seems that the appearance of surface color strongly depends on edge information. This is in line with a study in which the activity of single neurons in V1 and V2 was recorded (Friedman et al., 2003). The authors showed that some cells that code for color are also orientation selective, indicating that the representations of form and color are tightly linked (Von der Heydt and Pierson, 2006). Activity in these cells might be responsible for filling-in of both afterimage colors and "real" colors. Similarly, activity of cells with receptive fields along the edges might also underlie perceived contrast induction of colors across edges. Indeed, in a version of the watercolor illusion in which the inner contour of a shape was achromatic (black) and the outside contour was colored, a complementary color was perceived in the interior of the shape (Pinna, 2006).

Given the similarity of the observed interactions between colors and luminance contours for both "real" colors and afterimage colors, it seems plausible that the observed effects tap into a common mechanism. In this study, we further explored this by comparing performance on two color judging experiments in which filling-in could be induced by afterimage colors or by "real" colors. Similar to the stimuli used to investigate the watercolor illusion, we only used thin outlines. In Experiment 1, we first investigated whether the afterimage filling-in effects described by van Lier et al. (2009) also occur when using thin colored outlines similar to the watercolor illusion. In Experiment 2, we used such stimulus configurations to compare filling-in of afterimage colors with filling-in in the watercolor illusion.

#### **EXPERIMENT 1**

In this experiment, a chromatic contour alternated over time with an achromatic contour. Thus, when the achromatic contour is presented, an afterimage of the previously presented chromatic contour should be perceived. The chromatic contour could be positioned inside or outside a subsequently presented achromatic contour. Participants had to judge the color of the interior of the achromatic outline. If afterimage colors of thin outlines are sufficient to induce color filling-in, the interior of the figures should reveal complementary color filling-in when the chromatic outline is placed inside the achromatic outline, whereas filling-in of a similar color as the chromatic outline should be induced when this outline is positioned outside the achromatic outline. Following van Lier et al. (2009) we further expected to find weaker color filling-in for outer chromatic contours as compared to inner chromatic contours.

#### **METHODS**

#### *Participants*

Twenty-one observers participated in Experiment 1 (aged 17–24, except for one participant aged 64; six males). All had normal or corrected-to-normal visual acuity. In addition, all participants had normal color vision as screened with the AO Hardy Rand and Rittler Pseudoisochromatic Plates (2nd edition). Participants received payment or course credits. All participants were naive to the purpose of the experiment.

#### *Stimuli*

Stimuli were presented on a gray background [CIE(x, y) = 0.3128, 0.3303] with a luminance of 73.87 cd/m². The stimuli consisted of closed, arbitrarily undulating contours. Three outlines or contours were created that differed in shape, size, and thickness (see **Figure 3** for examples of the contours). A second set of contours was created such that the shapes of these contours fit neatly within the shapes of the former set of contours. Thus, for each shape in **Figure 3**, we distinguished between an "outer" contour and an "inner" contour.

Both inner and outer contours could be used as adapting stimuli or as test stimuli. The contours of the chromatic adapting stimuli could be colored either green [CIE(x, y) = 0.3514, 0.4417; L = 65.69 cd/m²], orange [CIE(x, y) = 0.3774, 0.3694; L = 58.65 cd/m²], blue [CIE(x, y) = 0.2600, 0.3058; L = 52.90 cd/m²], or pink [CIE(x, y) = 0.3814, 0.2733; L = 29.87 cd/m²]. The colors were chosen such that two colors were approximately complementary to each other (e.g., orange-blue; pink-green). The interior of the adapting contour was gray having a luminance that was equal to the luminance of that contour. This was done as luminance borders between the inducing color and the interior area may block afterimage filling-in (van Lier et al., 2009). The contours of the achromatic test stimuli were gray and had a luminance of 55 cd/m². Note that all stimuli were darker than the background. Examples of stimuli in which the inner contour is the adapting chromatic contour are shown in **Movie 1** and examples of stimuli in which the outer contour is the adapting chromatic contour are shown in **Movie 2**.
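The statement that all stimuli were darker than the background can be checked directly from the reported luminances. The sketch below is only an illustration: the paper reports luminances, not contrasts, and the choice of Weber contrast as the measure is an assumption made here.

```python
# Luminances in cd/m^2, copied from the Stimuli description above.
BACKGROUND = 73.87
LUMINANCE = {
    "green": 65.69,
    "orange": 58.65,
    "blue": 52.90,
    "pink": 29.87,
    "gray test contour": 55.0,
}


def weber_contrast(stimulus, background):
    """Weber contrast (assumed measure): (L_stimulus - L_background) / L_background."""
    return (stimulus - background) / background


contrasts = {name: weber_contrast(lum, BACKGROUND) for name, lum in LUMINANCE.items()}

for name, value in contrasts.items():
    # A negative value means the stimulus is darker than the background.
    print(f"{name:18s} {value:+.3f}")
```

Every value is negative, and the pink contour has by far the largest luminance difference from the background.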

#### *Procedure*

The experiment was run on a PC and an 18-inch CRT monitor with a 120 Hz refresh rate. The monitor was calibrated using an X-Rite Color Monitor Optimizer. The experiment was designed and presented using Presentation (Version 14.8, Neurobehavioral Systems, Inc.).

Once written informed consent was given and the instructions were read, the experiment started. The sequence of events is shown in **Figure 4**. Each trial started with the presentation of a small fixation dot that was presented on the center of the screen for 1000 ms. Afterwards a chromatic adapting stimulus was presented for 1000 ms which was followed by an achromatic test stimulus that was presented for another 1000 ms. The adapting stimulus and test stimulus kept alternating until a response was made. Once a response was made, the next trial started automatically after 1000 ms.

We distinguished between two types of trials (see **Figure 4**). In the first type, the outer contour was the chromatic adapting stimulus and the inner contour was the achromatic test stimulus. In the second type, the inner contour was the chromatic adapting stimulus, while the outer contour was the achromatic test stimulus. Participants were instructed to judge whether the inside of the achromatic test stimulus appeared to be filled in with a color. In order to respond, five disks, four of which were colored, were shown at the bottom of the screen. The colors of the response disks were similar to the inducing colors. When participants perceived color filling-in, they had to choose which of the colored response disks best matched the perceived color. The fifth response disk comprised the same gray as the background and could be chosen whenever participants did not perceive any color filling-in. Participants were asked to observe at least three presentation cycles (chromatic contour, achromatic contour) before responding. Responses were given by pushing one of five buttons corresponding to the five disks.

There were 24 unique trials: stimulus shape (3 levels) × color (4 levels) × trial type (2 levels). Prior to the experiment, participants completed a small practice block consisting of three trials to get familiar with the task. The main experiment was administered in four blocks. In each block, each of the 24 trials was presented once in a randomized order. Participants controlled the time between the blocks and started the next block by pressing one of the five response buttons.
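The factorial structure of this design can be sketched in a few lines. This is an illustrative reconstruction, not the Presentation script used in the study; the shape labels, function names, and the seeded random generator are all hypothetical.

```python
import itertools
import random

SHAPES = ["shape_1", "shape_2", "shape_3"]            # hypothetical labels
COLORS = ["green", "orange", "blue", "pink"]
TRIAL_TYPES = ["inner_chromatic", "outer_chromatic"]  # which contour is colored
N_BLOCKS = 4


def unique_trials():
    """All shape x color x trial-type combinations (3 x 4 x 2 = 24)."""
    return list(itertools.product(SHAPES, COLORS, TRIAL_TYPES))


def build_session(seed=None):
    """Four blocks, each containing every unique trial once in random order."""
    rng = random.Random(seed)
    session = []
    for _ in range(N_BLOCKS):
        block = unique_trials()
        rng.shuffle(block)
        session.append(block)
    return session


session = build_session(seed=1)
print(len(unique_trials()), sum(len(block) for block in session))
```

With four blocks of 24 trials, each participant contributes 96 judgments, that is, four per unique trial.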

#### **RESULTS**

**FIGURE 4 | Sequence of events in Experiment 1. Upper row:** an example of a trial in which the outer contour is the chromatic stimulus and the inner contour is the achromatic test contour. **Lower row:** an example of a trial in which the inner contour is the chromatic stimulus and the outer contour is the achromatic test contour.

One participant was unable to perceive any afterimages and was not included in the analysis. In **Figure 5** the responses are plotted. The responses for the colors orange, blue, pink and green were assigned to coordinates (1, 0); (-1, 0); (0, 1); (0, -1), respectively. For example, a 100% response for the color orange is represented by plot coordinate (1, 0). Additionally, a no color response was assigned to coordinates (0, 0). **Figure 5** shows the coordinate plots for trials in which the inner contour was the chromatic contour and for trials in which the outer contour was the adapting contour.
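One way to read this coding scheme is that each response is mapped to its coordinate and the coordinates are averaged over responses. The averaging step is an assumption inferred from the "100% response" example; the function name and the response labels are hypothetical.

```python
# Coordinates assigned to each response category, as described in the text.
COORDS = {
    "orange": (1, 0),
    "blue": (-1, 0),
    "pink": (0, 1),
    "green": (0, -1),
    "none": (0, 0),
}


def plot_coordinate(responses):
    """Mean coordinate over a list of response labels (assumed aggregation)."""
    n = len(responses)
    x = sum(COORDS[r][0] for r in responses) / n
    y = sum(COORDS[r][1] for r in responses) / n
    return (x, y)


# Three orange responses and one "no color" response out of four trials.
print(plot_coordinate(["orange", "orange", "orange", "none"]))
```

Under this reading, responses of approximately complementary colors (orange and blue, or pink and green) pull the plotted point in opposite directions, so a point near the origin can reflect either "no color" responses or a balanced mix of opposite colors.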

To analyze the data, we calculated for each participant the proportions of "same color," "complementary color," and "no color" responses. The resulting proportions were transformed using the arcsine transformation in order to obtain a normal distribution of the data. All statistical tests were performed on these transformed proportions. Paired *t*-tests showed that when the inner contour was chromatic, the proportion of complementary color responses was larger than the proportion of same color responses [*t*(19) = 11.98, *p* < 0.001]. In contrast, when the outer contour was chromatic, there were more same color responses than complementary color responses [*t*(19) = 5.18, *p* < 0.001]. We also compared the proportions of no color responses to check whether the probability of perceiving any color filling-in differed between conditions. A paired *t*-test revealed significant results [*t*(19) = 2.41, *p* < 0.05], showing that participants were more likely to respond with no color when the outer contour was chromatic than when the inner contour was chromatic.
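The analysis steps described above can be sketched as follows. The exact form of the arcsine transformation is not stated in the text; the common variant arcsin(√p) is assumed here. The proportions are invented purely for illustration and are not the study's data.

```python
import math
import statistics


def arcsine_transform(p):
    """Arcsine square-root transform of a proportion (assumed variant)."""
    return math.asin(math.sqrt(p))


def paired_t(a, b):
    """Paired t statistic and degrees of freedom for two matched samples."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1


# Made-up proportions for five hypothetical observers (inner chromatic contour).
complementary = [0.80, 0.70, 0.90, 0.60, 0.75]
same = [0.10, 0.20, 0.05, 0.30, 0.15]

t_value, df = paired_t(
    [arcsine_transform(p) for p in complementary],
    [arcsine_transform(p) for p in same],
)
print(f"t({df}) = {t_value:.2f}")
```

With the 20 participants retained in Experiment 1, the same computation gives 19 degrees of freedom, matching the reported *t*(19) values.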

#### **DISCUSSION**

The results showed that afterimages of chromatic outlines filled in regions that were not adapted to color. The color of the filled-in region depended on the position of the chromatic contour in the adapting stimulus: when the chromatic contour was positioned at the inside, filling-in of the complementary color in the inner area was perceived; when the chromatic outline was positioned outside, filling-in of the same color in the inner area was perceived. Afterimage filling-in induced by the outer contour appeared less strong as compared to afterimage filling-in induced by the inner chromatic contour, although there appears to be some variability within the different colors (e.g., compare green inside and outside in **Figure 5** with the other colors).

The fact that thin colored outlines were sufficient to induce spreading within closed boundaries strengthened our initial idea that the filling-in mechanism triggered by afterimage colors proceeds in a similar fashion to the filling-in processes that are at work in the original watercolor illusion. The next experiment was set up to further test similarities between both types of filling-in.

#### **EXPERIMENT 2**

We used a color judging experiment similar to Experiment 1. In addition to presenting the chromatic and achromatic contours alternately, watercolor-like stimuli were created by presenting the contours simultaneously. We expected filling-in of different colors to depend on whether the contours alternated or whether they were presented simultaneously. For example, an inner chromatic contour should induce color spreading of a similar hue when the contours are presented simultaneously, but the same chromatic contour should induce complementary color filling-in when the contours alternate. Previous studies found that the strength of the watercolor illusion depends on the asymmetric luminance profiles of both contours (Pinna et al., 2001; Devinck et al., 2005). To test the effect of luminance contrast between chromatic and achromatic outlines, we used two kinds of achromatic contours. One was the same gray as in Experiment 1, while the other was much darker (i.e., black). We expected that when the contours were presented simultaneously, stronger color filling-in should be perceived when black achromatic contours were used as compared to gray achromatic contours. Note that in Experiment 1, the inner area was isoluminant to the chromatic contour to enhance filling-in. However, to allow a better comparison with the usual static watercolor illusion, in the current experiment the inner area has the same luminance as the surrounding background, which is different from the luminance of the chromatic contour.

#### **METHODS**

#### *Participants*

Twenty observers participated in Experiment 2 (aged 17–24; four male). All had normal or corrected to normal visual acuity. In addition, all participants had normal color vision as screened with the AO Hardy Rand and Rittler Pseudoisochromatic Plates (2nd edition). Participants received payment or course credits. All participants were naive to the purpose of the experiment.

#### *Stimuli*

The stimuli we used in this experiment were similar to those we used in Experiment 1, but with the following changes. Firstly, in addition to presenting the contours in an alternating fashion, the chromatic and achromatic contours were also presented simultaneously to induce the classic watercolor illusion. Examples of these stationary stimuli are shown in **Figure 6**. Secondly, the interior of the adapting contours was always of the same white as the background. In order to enhance the watercolor effect, stimuli were presented on a white background (100 cd/m²). Lastly, in addition to the gray contour that was used in Experiment 1, we also used black contours (0.39 cd/m²).

#### *Procedure*

For the alternating presentation condition, the same procedure as in Experiment 1 was followed. Similarly, in the simultaneous presentation condition, participants had to judge whether the interior of the stimulus appeared colored or not, using the same response categories as in Experiment 1. The stimuli remained on the screen until participants responded.

There were 96 unique trials: stimulus shape (3 levels) × color (4 levels) × trial type (2 levels; inner or outer contour was chromatic) × presentation type (2 levels; alternating or simultaneous) × achromatic contour (2 levels; black or gray). Each unique trial was presented twice. The experiment was administered in four blocks in which the variables presentation type and achromatic contour were blocked while the other trials were presented randomly. The presentation order of the blocks was counterbalanced across participants. In each block, each of the 24 trials was presented once in a randomized order. Participants controlled the time between the blocks and started the next block by pressing one of the five response buttons.
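The blocking described above can be sketched as follows for a single pass through the design. The source does not specify the counterbalancing scheme, so the rotation of block order by participant number is only one simple possibility; the shape labels, function name, and dictionary keys are hypothetical.

```python
import itertools
import random

SHAPES = ["shape_1", "shape_2", "shape_3"]            # hypothetical labels
COLORS = ["green", "orange", "blue", "pink"]
TRIAL_TYPES = ["inner_chromatic", "outer_chromatic"]
PRESENTATIONS = ["alternating", "simultaneous"]       # blocked factor
ACHROMATIC = ["gray", "black"]                        # blocked factor


def build_blocks(participant, seed=None):
    """One block per presentation x achromatic-contour combination."""
    rng = random.Random(seed)
    conditions = list(itertools.product(PRESENTATIONS, ACHROMATIC))
    # Assumed counterbalancing: rotate the block order by participant number.
    shift = participant % len(conditions)
    ordered = conditions[shift:] + conditions[:shift]
    blocks = []
    for presentation, achromatic in ordered:
        trials = list(itertools.product(SHAPES, COLORS, TRIAL_TYPES))
        rng.shuffle(trials)  # the remaining factors are randomized within a block
        blocks.append(
            {"presentation": presentation, "achromatic": achromatic, "trials": trials}
        )
    return blocks


blocks = build_blocks(participant=0, seed=0)
print(len(blocks), sum(len(block["trials"]) for block in blocks))
```

Four blocks of 24 trials cover the 3 × 4 × 2 × 2 × 2 = 96 unique combinations once.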

#### **RESULTS**

We have plotted the results in a similar fashion as the results of Experiment 1. First we consider the results of trials when the achromatic contours were gray (see **Figure 7**).

The results were analyzed according to the response categories specified in Experiment 1. As in Experiment 1, all analyses were performed on the arcsine transformation of the proportions. As can be seen in **Figure 7**, for the alternating presentation condition, when the outer contour was chromatic, the proportion of same color responses was greater than the proportion of complementary color responses [*t*(19) = 4.13, *p* < 0.01]. However, when the inner contour was chromatic, the same color responses and the complementary color responses did not appear to differ (*p* > 0.1). For the simultaneous presentation condition, the same color response prevailed over the complementary color response for both the inner and the outer chromatic contour [*t*(19) = 7.80, *p* < 0.001; *t*(19) = 3.53, *p* < 0.01; respectively]. To analyze whether the probability of perceiving any color filling-in differed between conditions, we also compared the no color responses. Paired *t*-tests showed no difference for the alternating presentation condition (*p* > 0.1). For the simultaneous presentation condition, participants were more likely to respond with no color when the outer contour was chromatic than when the inner contour was chromatic [*t*(19) = 3.89, *p* < 0.01].

Next we consider responses in the same conditions, but now for the stimuli with black achromatic contours (see **Figure 8**).

As can be seen in **Figure 8**, the same pattern as before was found. For the alternating presentation condition, when the outer contour was chromatic, the proportion of same color responses was greater than the proportion of complementary color responses [*t*(19) = 3.40, *p* < 0.01]. However, when the inner contour was chromatic, the same color responses and the complementary color responses did not differ (*p* > 0.1). For the simultaneous presentation condition, participants were more likely to respond with the same color response than with the complementary color response for both the inner and the outer chromatic contour [*t*(19) = 8.80, *p* < 0.001; *t*(19) = 2.48, *p* < 0.05; respectively]. Paired *t*-tests on the proportions of no color responses revealed no difference for the alternating presentation condition (*p* > 0.1). However, for the simultaneous presentation condition, participants were more likely to respond with no color when the outer contour was chromatic than when the inner contour was chromatic [*t*(19) = 3.43, *p* < 0.01].

For the alternating presentation condition, we noticed that when black achromatic contours were used, color filling-in appeared to be attenuated as compared to when the lighter gray achromatic contours were used. Paired *t*-tests confirmed this, showing that for both the inner [*t*(19) = 2.85, *p* < 0.05] and the outer chromatic contour [*t*(19) = 5.18, *p* < 0.001], the proportion of no color responses was greater when black contours were used than when gray contours were used. No such effect was found for the simultaneous presentation condition.

#### **DISCUSSION**

In this experiment we compared filling-in of afterimage colors with filling-in of "real" colors. We used chromatic and achromatic contours that could be presented simultaneously or alternately. When gray achromatic contours were used, for the alternating condition, an outer chromatic contour induced filling-in of a similar color whereas an inner chromatic contour hardly induced filling-in of the expected complementary color. In contrast, for the simultaneous presentation condition, the probability of perceiving color filling-in was greater for an inner chromatic contour as compared to an outer contour. Furthermore, filling-in induced by an inner or an outer chromatic contour was most likely to be similar to the color of the contour. Although we expected to find stronger color spreading for the simultaneous presentation condition when black achromatic contours were used, no such effect was found. Moreover, the use of black contours in the alternating presentation condition greatly diminished filling-in induced both by an inner and outer chromatic contour.

At first glance, the results from the alternating condition in the present experiment appear to be at odds with the results from Experiment 1. In the first experiment, an inner chromatic contour generally induced stronger afterimage filling-in as compared to an outer chromatic contour, while this did not reveal a significantly different effect in Experiment 2. A plausible cause for these apparently different results lies in the different background luminance. More specifically, in Experiment 1, the interior area of the adapting stimulus was always isoluminant with the chromatic contour, while in this experiment, the interior area was of a different luminance. As mentioned, this was done to allow a better comparison with the typical watercolor displays, but it also caused a luminance border between the inner chromatic contour and the interior area. This luminance border, which likely remained in the afterimage (with a contrast polarity in the opposite direction), apparently prevented the colored afterimage of the chromatic contour from spreading. In contrast, color filling-in by an outer chromatic contour should be influenced less by such an afterimage luminance border as this type of filling-in depends on color induction across luminance borders (Anstis et al., 1978). Indeed, comparing the plot in **Figure 5** with the left plot in **Figure 7**, it appears that color spreading induced by the inner chromatic contour was attenuated in Experiment 2 (running additional independent *t*-tests confirmed this observation [*t*(38) = 2.65, *p* < 0.05], while the strength of color spreading induced by the outer chromatic contour appears more or less the same across experiments [*p* > 0.1]). When black achromatic contours were used, the greater luminance contrast in the afterimage appeared to interfere with both effects.

Note that although afterimages of a luminance border between a contour and the adjacent area may weaken or even prevent the spreading of afterimage colors, similar luminance borders do not prevent color spreading of "real" colors in the current watercolor displays. In fact, when contours were presented simultaneously, we replicated previous findings on the watercolor illusion (Pinna et al., 2001; Devinck et al., 2005; Cao et al., 2011) and showed that an inner chromatic contour triggered same-color filling-in. Apparently, afterimage filling-in is more sensitive to luminance borders between the inducing contours and the adjacent area than filling-in triggered by "real" colors. Note that the current difference in strengths in fact deals with the induction of the filling-in, i.e., the flow from contour colors to the adjacent area. This is different from the earlier observation that, once afterimages are generated, their perceived strength strongly depends on the position of luminance contours (e.g., Daw, 1962; van Lier et al., 2009; Powell et al., 2012).

An additional difference between the current filling-in of afterimage colors and "real" colors appeared when filling-in was triggered by the outer chromatic contours. For afterimage filling-in, our results were in line with Experiment 1 and previous research (Anstis et al., 1978; van Lier et al., 2009). As it is likely that the effect depends on contrast induction (either of afterimage colors or of "real" colors), we also expected to find, like Pinna (2006), contrast induction when "real" colors were used. However, this is not what we found; not only was color filling-in triggered by the outer contours stronger for afterimage colors as compared to "real" colors, color filling-in triggered by "real" colors was, unexpectedly, more likely to be of the same color as the inducing contour (see **Figures 7**, **8**, right plots). We speculate that given the very weak color appearance for these configurations, color judgments were biased toward the color of the contour that was actually present in the display. The use of different stimulus configurations across studies may further offer an explanation for the diverging results. To illustrate this, consider the shapes in **Figure 9**. In the upper left and upper right, one of the shapes used in our experiment (**Figure 9A**) and an angular version of that shape (**Figure 9B**) are shown. Both shapes produce a rather weak, but approximately similar color appearance based on contrast induction (i.e., orangish filling-in) in the interior area, indicating that the type of contour (smoothly undulating or angular) does not seem to matter much. However, the effect can be enhanced by reducing the to-be-filled-in area by adding a gray contour inside the inner area (**Figure 9C**). The enclosed area between the gray contours now reveals a stronger orangish impression. In **Figure 9D** we have added an additional blue contour outside that area.
As a consequence the inner part of the figure has a bluish tint and the area between the gray contours appears even more orangish. In fact, the latter figure is similar to the examples provided by Pinna (2006). These informal observations illustrate that our stationary stimuli were perhaps not optimized for strong filling-in based on contrast induction. Note further that, contrary to shapes such as **Figure 9D** in which contrast induction can be compared with watercolor filling-in within the inner region, our stimuli were presented in isolation, which may have made the task of judging color filling-in of an already weak effect even more challenging.

#### **GENERAL DISCUSSION**

We showed that, similarly to the watercolor illusion, afterimages of thin colored outlines spread within a region bounded by luminance contours. Spreading induced by an afterimage of an inner chromatic contour appears stronger as compared to spreading induced by an afterimage of an outer contour. The probability of perceiving filling-in depended on whether it is induced by "real" colors or afterimage colors. For instance, it appears that spreading of afterimage colors can be more easily affected by changes in luminance settings as compared to spreading of "real" colors. In addition, afterimage colors were, in contrast to "real" colors, more likely to induce color contrast across boundaries.

Filling-in demonstrated in this study may be related to other filling-in phenomena that appear to depend on mechanisms that are related to boundary processing. For example, in Troxler fading, prolonged fixation causes stimuli in the periphery (Troxler, 1804), or, as has been shown more recently, even entire scenes (Simons et al., 2006) to disappear from view. As adaptation causes luminance boundaries to break down, color may spread beyond that boundary. Filling-in due to Troxler fading has been studied using "real" colors (Hamburger et al., 2005) and afterimage colors (Hamburger et al., 2012). These studies also found that different types of filling-in triggered by "real" colors and afterimage colors did not perfectly match. As has been mentioned in the introduction, another possibly related instance of filling-in occurs in the neon color illusion. Afterimages of neon color spreading have also been reported (Shimojo et al., 2001), which have been shown to be the result of adapting to the illusory filled-in surface. However, in our demonstrations the situation is different. As the chromatic contours in our stimuli are not likely to induce filling-in or contrast induction by themselves (a second contour is necessary), it is likely that for our stimuli, filling-in mechanisms act on the afterimage of the chromatic contours instead.

Several theories of form and surface perception account for filling-in phenomena (Komatsu, 2006). For example, Grossberg and Mingolla (1985) proposed that visual input is processed in two parallel systems: a boundary contour system and a feature contour system. Boundary and edge information is processed in the boundary contour system, while feature information such as color and brightness is processed in the feature contour system. A perception of a surface is formed when information from both systems is combined. Color and brightness information in the feature contour system spreads across surfaces and is bounded by edge information in the boundary contour system. The theory has been used to explain filling-in phenomena such as the neon color effect (Grossberg and Mingolla, 1985) and also the watercolor illusion (Pinna and Grossberg, 2005). In fact, the latter authors constructed a stimulus, the so-called two-dot limiting case, to explain both similarities and differences between the neon color illusion and the watercolor illusion.
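The core idea, that feature signals spread until they meet a boundary, can be illustrated with a toy one-dimensional diffusion. This is not the Grossberg and Mingolla model or its equations; it is a minimal sketch in which a hypothetical `boundary` list marks the gaps between neighboring cells that block spreading, and `rate` and `steps` are arbitrary illustrative parameters.

```python
def fill_in(feature, boundary, steps=500, rate=0.25):
    """Toy 1D filling-in: diffuse `feature` between neighboring cells.

    feature  -- initial signal per cell (e.g., a color signal near a contour)
    boundary -- boundary[i] is 1 if an edge blocks spreading between
                cell i and cell i + 1, else 0 (len(feature) - 1 entries)
    """
    signal = list(feature)
    for _ in range(steps):
        updated = list(signal)
        for i in range(len(signal) - 1):
            if boundary[i]:
                continue  # no exchange across a boundary
            flow = rate * (signal[i] - signal[i + 1])
            updated[i] -= flow
            updated[i + 1] += flow
        signal = updated
    return signal


# Nine cells; boundaries enclose cells 2..6; the signal starts in cell 3 only.
feature = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
boundary = [0, 1, 0, 0, 0, 0, 1, 0]
result = fill_in(feature, boundary)
print([round(value, 3) for value in result])
```

In this toy case the signal evens out across the enclosed cells and never reaches the cells outside the boundaries, which is the qualitative behavior the theory attributes to the interaction of the two systems.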

Recently, Francis (2010) used a model based on the boundary contour system and the feature contour system to simulate the perceived afterimages in van Lier et al. (2009). Initially, the results from the model were consistent with the idea that boundaries block color spreading. In a second study, however, some predictions of the model did not entirely match the experimental data (Kim and Francis, 2011). In particular, in one instance they used star-like stimuli similar to those used in van Lier et al. (2009) (see **Figure 1**), but varied the size of the test outline. Contrary to predictions of the model, their experimental results revealed that afterimage filling-in was less likely to be perceived when the test contour did not exactly match the edges of the inducing color. The results were explained by the fact that for a smaller test contour, the region that receives afterimage color signals is relatively smaller, while for a larger test contour, a larger region had to be filled in. Both of these factors may have diluted color spreading. An alternative interpretation is that the alignment of a test contour with the edges of the inducing colored region is important for filling-in to occur, because of repeated activation of orientation selective neurons that are also coding for color (Friedman et al., 2003). Kim and Francis (2011) additionally found that when the test contour was larger as compared to matching contours, the probability of perceiving filling-in of an unexpected color became higher, possibly due to the fact that larger test contours included afterimages from two complementary colors. However, when the test contour was smaller than the inducing colored region, the probability of perceiving no color filling-in became higher. In addition to their explanation, this result may have been caused by the fact that while one portion of the inducing colored region fell inside the test contour, another portion fell outside the test contour.
It is possible that the similarly colored afterimages induced by these outer colors negated the complementary colored afterimages induced by the inner colors.

To illustrate this, consider the examples in **Movie 3**. In addition to adapting figures with only one contour (on the right), we created adapting figures comprising two chromatic contours, one of which is positioned inside and the other outside the test contour (on the left). When both contours are of the same color, complementary colored filling-in induced by the inner contour and similarly colored filling-in induced by the outer contour appear to cancel each other out, so no color filling-in is perceived (compare top left with top right animation). When the color of the outer contour is changed to purple (bottom left), both inner and outer contour induce approximately the same purplish colored afterimage, leaving a stronger purplish impression as compared to when only an inner green contour is used (compare bottom left with bottom right). Another example is provided in **Movie 4**. Here the chromatic contour is sequentially followed by two achromatic contours, juxtaposed to the chromatic contour; one positioned inside the interior area, and one positioned outside the interior area. After viewing a few cycles (remain fixated on the central dot) one may see a color change with regard to the afterimage filling-in, corresponding with the position of the contour. These examples illustrate that both effects play an important role in afterimage filling-in and should be integrated in any model accounting for afterimage filling-in.

All in all, the current color filling-in triggered by alternating juxtaposed chromatic and achromatic contours largely reveals similar phenomenological impressions to the original watercolor illusion. Nevertheless, there are also different sensitivities for both types of filling-in. It should be noted, however, that phenomenologically different effects for "real" colors and afterimage colors do not necessarily point to fundamentally different processing mechanisms between these types of color. For example, it has been shown that with appropriate color settings (e.g., having "real" colors that are more comparable to the relatively weak and desaturated afterimage colors), both types of colors can be effectively blocked and gated by luminance contours (e.g., Anstis et al., 2012a,b). Further investigations (e.g., Powell et al., 2012) may clarify whether different sensitivities are merely a result of different stimulus parameters and color properties (like saturation) or whether they are caused by different underlying mechanisms. The current afterimage watercolors may provide a suitable entrance to further examine the conditions under which colors straddle the boundaries.

#### **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fpsyg.2013.00707/abstract

**Movie 1 | Examples of stimuli in which the inner contour is the chromatic adapting contour.** Observers tend to perceive color filling-in complementary to the color of the adapting contour. Note, the effect appears somewhat stronger after a couple of alternations (please set your video player to loop the movie and fixate on the small dot in the middle of the four stimuli).

**Movie 2 | Examples of stimuli in which the outer contour is the chromatic adapting contour.** Observers tend to perceive color filling-in similar to the color of the adapting contour.

**Movie 3 | Examples of stimuli with two chromatic adapting contours.** In the top left a stimulus is shown comprising both an inner and an outer green adapting contour. Compared with the figure in the top right, color spreading appears less strong, because the filling-in effects induced by the two contours appear to cancel each other out. In contrast, when the outer contour is changed to purple (lower left), both contours induce a similarly colored afterimage. The color should be similar to the color filling-in induced by the stimulus in the lower right.

**Movie 4 | Examples of chromatic contours that are followed subsequently by two different achromatic contours.** After a few cycles the color of the afterimages switches according to the achromatic contour.

#### **REFERENCES**


color from boundaries: a new 'watercolor' illusion. *Vision Res.* 41, 2669–2676. doi: 10.1016/S0042-6989(01)00105-5


441–445. doi: 10.1016/0001-6918(75)90042-6


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 31 May 2013; paper pending published: 24 June 2013; accepted: 16 September 2013; published online: 08 October 2013.*

*Citation: Hazenberg S J and van Lier R (2013) Afterimage watercolors: an exploration of contour-based afterimage filling-in. Front. Psychol. 4:707. doi: 10.3389/fpsyg.2013.00707*

*This article was submitted to Perception Science, a section of the journal Frontiers in Psychology.*

*Copyright © 2013 Hazenberg and van Lier. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Flexible color perception depending on the shape and positioning of achromatic contours

Mark Vergeer <sup>1</sup> \*, Stuart Anstis <sup>2</sup> and Rob van Lier <sup>3</sup>

*<sup>1</sup> Laboratory of Experimental Psychology, KU Leuven, Leuven, Belgium, <sup>2</sup> Department of Psychology, University of California, San Diego, San Diego, CA, USA, <sup>3</sup> Donders Institute for Brain, Cognition and Behaviour, Radboud University Nijmegen, Nijmegen, Netherlands*

In this study, we present several demonstrations of color averaging between luminance boundaries. In each of the demonstrations, different black outlines are superimposed on one and the same colored surface. Whereas perception without these outlines comprises a blurry colored gradient, superimposing the outlines leads to a much clearer binary color percept, with different colors perceived on each side of the boundary. These demonstrations show that the color of the perceived surfaces is flexible, depending on the exact shape of the outlines that define the surface, and that different positioning of the outlines can lead to different, distinct color percepts. We argue that the principle of color averaging described here is crucial for the brain in building a useful model of the distal world, in which differences within object surfaces are perceptually minimized, while differences between surfaces are perceptually enhanced.

#### Edited by:

*Galina Paramei, Liverpool Hope University, UK*

#### Reviewed by:

*David Bimler, Massey University, New Zealand Dingcai Cao, University of Illinois at Chicago, USA*

#### \*Correspondence:

*Mark Vergeer, Laboratory of Experimental Psychology, University of Leuven (KU Leuven), Tiensestraat 102 - box 3711, Leuven B-3000, Belgium mark.vergeer@ppw.kuleuven.be*

#### Specialty section:

*This article was submitted to Perception Science, a section of the journal Frontiers in Psychology*

Received: *10 December 2014* Accepted: *27 April 2015* Published: *18 May 2015*

#### Citation:

*Vergeer M, Anstis S and van Lier R (2015) Flexible color perception depending on the shape and positioning of achromatic contours. Front. Psychol. 6:620. doi: 10.3389/fpsyg.2015.00620*

Keywords: color, filling-in, illusions, shape, contours

#### Introduction

Luminance and color play a distinct role in vision and are processed in different subareas of the retino-geniculo-cortical pathway (De Valois et al., 1966; Wiesel and Hubel, 1966). Chromatic signals derive from differences in activity in S-, L-, and M-cones, while activity in the luminance channel is derived from additions of the signals of the different cones. The luminance signal is crucial in shape detection and segmentation, as object boundaries are generally characterized by abrupt luminance changes. Fast and accurate detection and segmentation of different animate and inanimate objects are crucial for a primate's survival. Therefore, such abrupt luminance contrasts are reinforced through lateral inhibition (Kuffler and Nicholls, 1976) by means of which the darker side of a boundary is perceptually darkened while the lighter side is further lightened. Subsequently, achromatic and chromatic signals interact to determine an object's shape and surface appearances (e.g., De Weert and Wade, 1988; Kanga and Shevell, 2008).
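The distinction between additive and differencing cone combinations can be sketched in a few lines. This is only an illustration: the unit weights, the omission of S-cones from the luminance sum, and the function name `opponent_channels` are assumptions made for the sketch, not values taken from the studies cited above.

```python
def opponent_channels(l_cone, m_cone, s_cone):
    """Return (luminance, red_green, blue_yellow) for three cone activities.

    Unit weights are an assumption chosen for readability; physiological
    weights differ.
    """
    luminance = l_cone + m_cone               # additive combination
    red_green = l_cone - m_cone               # difference between L and M
    blue_yellow = s_cone - (l_cone + m_cone)  # S against the sum of L and M
    return luminance, red_green, blue_yellow


# Equal L and M activity gives a luminance signal but no red-green signal.
lum, rg, by = opponent_channels(0.5, 0.5, 1.0)
```

With these inputs the luminance channel carries a signal while both chromatic channels are silent, which is the sense in which the two kinds of signal are computed differently from the same cones.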

Human acuity is much lower for color than for luminance (Wandell, 1995). For instance, you can split a picture into its luminance and color components, blur the color component, and then reunite the two components. The resulting picture looks virtually unchanged from the original. This shows that the human visual system has poor acuity (low bandwidth) for color (Livingstone, 2002). Television signals take advantage of this principle in encoding images by devoting less resolution to chromatic information than to luminance information (Winkler et al., 2001). To summarize, the visual system strongly relies on the luminance signal, more specifically on (sharp) luminance changes in the visual scene, to define the boundaries of objects and, hence, their shape.
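The split, blur, and reunite procedure described above can be sketched on a single row of pixels. The sketch assumes the Rec. 601 luma weights, a simple box blur, and a made-up two-color row; none of these choices come from the cited studies, and out-of-gamut values are not clipped.

```python
def luminance(rgb):
    """Rec. 601 luma of an (r, g, b) triple (weights are an assumption)."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def box_blur(values, radius):
    """Average each value with its neighbors within `radius` (edges shrink)."""
    n = len(values)
    out = []
    for i in range(n):
        window = values[max(0, i - radius):min(n, i + radius + 1)]
        out.append(sum(window) / len(window))
    return out


def blur_color_only(row, radius):
    """Blur the color-difference signals of a pixel row, keep luminance sharp."""
    ys = [luminance(p) for p in row]
    r_diff = box_blur([p[0] - y for p, y in zip(row, ys)], radius)
    b_diff = box_blur([p[2] - y for p, y in zip(row, ys)], radius)
    out = []
    for y, rd, bd in zip(ys, r_diff, b_diff):
        r = y + rd
        b = y + bd
        # Solve the luma equation for g so that luminance is unchanged.
        g = (y - 0.299 * r - 0.114 * b) / 0.587
        out.append((r, g, b))
    return out


# Hypothetical row: a reddish region next to a darker bluish region.
row = [(0.9, 0.3, 0.2)] * 8 + [(0.1, 0.2, 0.5)] * 8
result = blur_color_only(row, radius=3)
```

In `result` the luminance edge between pixels 7 and 8 is exactly as sharp as in `row`, while the color differences change gradually across it.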

A number of published studies highlight the interplay between color and luminance information in visual processing. The "watercolor illusion" (Pinna et al., 2001) is a striking demonstration that involves contour dependent color spreading. In their study, a wiggly black line and an additional juxtaposed yellow line enclose an area, causing the whole enclosed area to be perceptually tinged with an apparent pale yellowish tint. The afterimages of watercolor-like displays show similar spreading effects when the outlines are presented sequentially, in an alternating fashion (Hazenberg and Van Lier, 2013). Daw (1962) and Van Lier et al. (2009) have demonstrated the role of luminance contours in the appearance of colored afterimages. Colored afterimages are due to adaptation of retinal cones and they are especially vivid when contours, presented after the adapting image, coincide with the blurred edges of the afterimage (Daw, 1962). Van Lier et al. (2009) demonstrated that weak, blurry color signals from afterimages could spread within regions defined by strong luminance borders but could not cross over these borders. Thus, one and the same colored stimulus can induce multiple, differently colored afterimages, depending on the test contours presented after the colored image. Anstis et al. (2012a) went on to show that the color-contour interactions shown for afterimage colors also occur for "real" colors. They argued that for both types of stimulation the color signals spread by a process analogous to physical diffusion, until they encounter a strong contour such as a black line. In a similar vein, Anstis et al. (2012b) took two pictures, Gainsborough's "Blue Boy" and a pink nude called "La Source" by Ingres. They split each picture into its color and luminance components, and superimposed just the two color components. This gave a rather indistinguishable colored mess.
But when the greyscale luminance picture of the Blue Boy was superimposed, the mess looked like the original Blue Boy, and when the greyscale luminance picture of La Source was superimposed, the very same mess looked like the original La Source. Thus, the superimposed luminance contours modulated the perceived colors: the Gainsborough greyscale made the torso look blue, while the Ingres greyscale made the same torso look pink. With regard to the filling-in of afterimages, Francis (2010) and Kim and Francis (2011) explained such contour dependent filling-in effects with a model in which the contour forms a boundary that traps afterimage colors, and presumably real colors, as they spread across a surface. This model in turn draws upon the earlier theories of Grossberg and Mingolla (1985) and Grossberg (2002). These two latter papers also propose that color spreads spatially in a process akin to physical diffusion, until it encounters a luminance contour. The experiments of Kim and Francis (2011) also showed that their model could not fully account for the spreading of afterimage colors.
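The idea of color spreading "in a process akin to physical diffusion, until it encounters a luminance contour" can be illustrated with a one-dimensional toy simulation. This is not the model of Francis (2010) or of Grossberg and Mingolla (1985); it is a minimal sketch that assumes a single scalar color signal, an explicit diffusion step, and contours that block all flow. The names `diffuse` and `barriers` are hypothetical.

```python
def diffuse(signal, barriers, steps=2000, rate=0.25):
    """Diffuse a 1-D signal; no flow crosses a barrier.

    `barriers` is a set of indices k meaning a contour lies between
    positions k and k + 1.
    """
    values = list(signal)
    n = len(values)
    for _ in range(steps):
        new = list(values)
        for i in range(n):
            # Exchange with the left neighbor unless a contour separates them.
            if i > 0 and (i - 1) not in barriers:
                new[i] += rate * (values[i - 1] - values[i])
            # Exchange with the right neighbor unless a contour separates them.
            if i < n - 1 and i not in barriers:
                new[i] += rate * (values[i + 1] - values[i])
        values = new
    return values


# A smooth gradient from 0 to 1 with one contour in the middle.
signal = [i / 11 for i in range(12)]
filled = diffuse(signal, barriers={5})
unbounded = diffuse(signal, barriers=set())
```

With the contour in place, each side settles on its own mean (about 0.23 and 0.77), giving two uniform regions; without it, the whole row converges to 0.5. Moving the barrier changes which values are pooled, and hence the two resulting levels.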

We now present several new demonstrations that further highlight this kind of interplay between luminance and color in visual processing. These demonstrations show that color perception is flexible and depends on luminance based surface construction. We have constructed a novel stimulus by first composing three square-waved, concentric, circular gratings, each consisting of an opponent color pair. The three circular gratings differed 1/3 of a cycle in phase. Next, the three composed images were blurred, superimposed and averaged, with the colored disk in **Figure 1A** as a result. In fact, in each of the four panels in **Figure 1** the same colored disk is displayed, with circular outlines superimposed on the colored disk in **Figures 1B–D**. The perceptual color appearances are clearly different in all four panels, although, with the exception of the black outlines, the physical color is the same at each location of the disk for all four images. The color gradient visible in **Figure 1A**, is perceptually absent in the other panels, or at least highly reduced. Instead the presented colors seem to perceptually average, forming steps of more or less uniform color between the presented contours. Apparently, the colors seen depend upon the location of the black outline circles. Video 1 shows a dynamic version of this illusion, in which the contour changes position every second. This Video gives an even clearer impression of the different color percepts for the different color settings.
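The construction steps above (three square-wave gratings of opponent color pairs, shifted by one third of a cycle, blurred, and averaged) can be sketched for a single radial profile of the disk. The RGB values, the number of samples, the box blur, and its wrap-around edges are all assumptions made for illustration; the exact colors and blur used for **Figure 1** are not specified here.

```python
def square_wave(n, period, shift, color_a, color_b):
    """Square-wave profile alternating between two colors, shifted in phase."""
    half = period // 2
    return [color_a if (i + shift) % period < half else color_b
            for i in range(n)]


def circular_box_blur(profile, radius):
    """Box blur of an RGB profile with wrap-around edges (a simplification)."""
    n = len(profile)
    width = 2 * radius + 1
    out = []
    for i in range(n):
        acc = [0.0, 0.0, 0.0]
        for k in range(-radius, radius + 1):
            pixel = profile[(i + k) % n]
            for c in range(3):
                acc[c] += pixel[c]
        out.append(tuple(a / width for a in acc))
    return out


def average_profiles(profiles):
    """Pointwise mean of several RGB profiles of equal length."""
    count = len(profiles)
    return [tuple(sum(p[i][c] for p in profiles) / count for c in range(3))
            for i in range(len(profiles[0]))]


N, PERIOD = 360, 120
# Hypothetical opponent pairs: each pair sums to the same gray.
PAIRS = [((1.0, 0.3, 0.3), (0.3, 1.0, 1.0)),
         ((0.3, 1.0, 0.3), (1.0, 0.3, 1.0)),
         ((0.3, 0.3, 1.0), (1.0, 1.0, 0.3))]
gratings = [
    circular_box_blur(square_wave(N, PERIOD, k * PERIOD // 3, a, b), radius=15)
    for k, (a, b) in enumerate(PAIRS)
]
stimulus = average_profiles(gratings)
```

The resulting `stimulus` profile is a smooth, repeating color gradient, which is the one-dimensional analog of the disk before any outlines are added.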

The demonstrations in **Figure 2** provide a rather different example of color averaging. Again the underlying colors are presented in **Figure 2A**, and they are repeated throughout the whole figure. Only the superimposed black contours differ between panels. One may note the apparent color differences between the panels. Each panel leads to green and purple color appearances, but the shapes with homogeneous color appearances are perceptually different in each panel. It is the shape and the positioning of the outlines that determine the overall color appearance. Thus, one and the same green and purple colored grid (**Figure 2A**) can lead to a multitude of different percepts: green octagons and purple stars (**Figure 2B**), purple octagons and green stars (**Figure 2C**), green squares (**Figure 2D**), purple squares (**Figure 2E**), green diamonds (**Figure 2F**), purple diamonds (**Figure 2G**), small green disks (**Figure 2H**), small purple disks (**Figure 2I**), large green disks (**Figure 2J**), and large purple disks (**Figure 2K**). In Video 2, a dynamic version of **Figures 2B,C** is shown. In this dynamic version the contour gradually shifts between the 2 contour positions of **Figures 2B,C**. This Video provides an impression of the dynamic changes in color perception from one contour setting to the other.

FIGURE 1 | Flexible colors. All four panels contain a disk that evokes four distinct colored experiences. However, each panel has the same underlying blurry colored image. The only physical difference between the panels is the positioning of circular outlines on top of this colored image. In (A) no outlines are presented, leading to a continuous perceptual color change from the center to the periphery of the disk. The positioning of the contours is at a different phase for each of (B–D). Although the physical color changes are gradual and continuous in all panels, the color appearance seems rather homogeneous between each pair of neighboring outlines; the colors average between the outlines perceptually.

FIGURE 2 | Flexible color averaging between outlines. Again, the underlying colored image is the same for each panel. In fact, the total figure is one regular purple-green plaid (as seen in A) on which the horizontal and vertical panel dividers and the black outlines within each of (B–K) have been superimposed.

**Figure 3** shows how the stimuli in **Figure 2** were constructed. The construction starts with two images (**Figure 3A**), both comprising a grid of octagons. The two colors in these images were chosen from the so-called Teufel colors (Teufel and Wehrhahn, 2000). This is a set of sixteen colors that are approximately isoluminant, equally detectable and perceptually equidistant. The color pairs in the upper and lower image are orthogonal in color space. The two images are in spatial counter phase, both horizontally and vertically. In the next step (**Figure 3B**), the two images are superimposed and made semi-transparent, so that a mixture of both images can be perceived. Next (**Figure 3C**), the image is blurred with a Gaussian blur. **Figure 3D** shows the result of this process, which is the same colored plaid as displayed in **Figure 2A**. When luminance outlines are superimposed on the colored plaid, the colors appear to average between these outlines. As was mentioned above, the stimulus was constructed from grids of octagons. Therefore, superimposing octagon-shaped black outlines most likely leads to the most vivid color experience. However, when differently shaped outlines are superimposed on the colored image (see **Figures 2D–K**), the colors also average perceptually within the outlined areas.

In two experiments, to be discussed next, we have tested these effects of color spreading. In Experiment 1 colors were compared and in Experiment 2 colors were matched.

#### Experiment 1: A Comparison Task

#### Participants

Eight naïve observers (6 females), all with normal color vision were tested in this first experiment. The ethical committee of the Psychology Department of the University of Leuven approved both experiments.

#### Apparatus

The experiment was run on a 13 inch MacBook Air (mid 2011 edition), with a screen resolution of 1440 × 900, driven by a 1.7 GHz Intel Core i5 processor at 60 Hz. Images were displayed using PowerPoint for Mac 2011.

#### Stimulus Materials and Procedure

As stimulus material we used images based on **Figures 1** and **2**. In each trial, 2 images were presented side by side, always with the same underlying colors for the left and right side image. The overlying contours on the left and right side images could be positioned either similarly (same contour trials) or out of phase (different contour trials). In trials where the circular images from **Figure 1** were tested, each circle had a diameter of 10.8 arcdeg. The 2 images were presented on a homogeneous white background (L = 340 cd/m<sup>2</sup>). Black outlines had a width of 2.15 arcmin. The stimuli of **Figure 2** were presented in a square configuration (10.8 × 10.8 arcdeg) with 9.5 repeated cycles per image, both horizontally and vertically. The superimposed black outlines had a diameter of 0.86 arcmin. The colors in this image ranged from green (CIExy = 0.329, 0.412; L = 126 cd/m<sup>2</sup>) to purple (CIExy = 0.310, 0.306; L = 126 cd/m<sup>2</sup>). In the different contour trials, observers compared each combination of different contoured images of **Figure 1B** vs. **Figure 1C**, **Figure 1B** vs. **Figure 1D**, **Figure 1C** vs. **Figure 1D** and each contour version of **Figure 2** vs. its counter phase alternative (**Figure 2B** vs. **Figure 2C**, **Figure 2D** vs. **Figure 2E**, **Figure 2F** vs. **Figure 2G**, **Figure 2H** vs. **Figure 2I**, and **Figure 2J** vs. **Figure 2K**). All comparisons were randomized and repeated 4 times for each observer. The observer's viewing distance was approximately 40 cm and his/her task was to indicate whether the colors of the left display and the right display were the same or different. The observer reported their response verbally, by saying either "same" or "different" and the experimenter logged the answer. The observer then continued to the next trial by pressing the space bar. An example of a trial is displayed in **Figure 4**.
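For readers who want to relate the reported visual angles to physical sizes on the screen, the standard relation size = 2 d tan(θ/2) can be applied at the stated viewing distance of approximately 40 cm. The sketch below is a convenience calculation only; the function name is hypothetical and the results are approximate because the viewing distance itself was approximate.

```python
import math


def angle_to_size_cm(angle_deg, distance_cm):
    """Physical extent (cm) that subtends `angle_deg` at `distance_cm`."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)


VIEWING_DISTANCE_CM = 40.0  # "approximately 40 cm" in Experiment 1

# Disk diameter of 10.8 arcdeg and outline width of 2.15 arcmin.
disk_cm = angle_to_size_cm(10.8, VIEWING_DISTANCE_CM)
outline_mm = 10.0 * angle_to_size_cm(2.15 / 60.0, VIEWING_DISTANCE_CM)
```

Under these assumptions the disks were roughly 7.6 cm across and the outlines in the **Figure 1** trials about a quarter of a millimeter wide.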

#### Results

All observers were fully consistent in their choices. In all same contour trials, all eight observers indicated that the color appearance was the same (i.e., 100% "same" judgments). In all the different contour trials, the observers indicated the overall color appearance to be different (i.e., 100% "different" judgments). That is, a bit surprisingly perhaps, there was no variation at all. This result indicates the robustness of the effect of color averaging across observers<sup>1</sup>.

#### Experiment 2: A Color Matching Task

The aim of this experiment was to quantify the strength of the effects demonstrated in **Figure 1**.

#### Participants

Five observers (all males), all with normal color vision, two of whom are authors on this paper, participated in this experiment.

#### Apparatus

The same computer was used as in Experiment 1. Stimulus presentation, timing and keyboard responses were controlled with custom software programmed in Python 2.7 using the PsychoPy library.

#### Stimulus Materials and Procedure

We used a matching task for this quantification. In each trial of the experiment, one of the four images from **Figure 1** was presented (diameter 19.8 arcdeg), with a white dot superimposed at one of six possible locations. The dot locations were equidistant and the same for each of the four images (see **Figure 5** for the possible dot locations). The colors at the 6 possible dot locations were as follows: (1) CIExy = 0.364, 0.390, L = 96.1 cd/m<sup>2</sup>; (2) CIExy = 0.370, 0.373, L = 93.4 cd/m<sup>2</sup>; (3) CIExy = 0.325, 0.326, L = 97.7 cd/m<sup>2</sup>; (4) CIExy = 0.285, 0.303, L = 90.2 cd/m<sup>2</sup>; (5) CIExy = 0.275, 0.318, L = 97.4 cd/m<sup>2</sup>; (6) CIExy = 0.312, 0.371, L = 95.3 cd/m<sup>2</sup>. The task for the observer was to adjust the color of a matching gray disk with a diameter of 1.43 arcdeg (initial L = 87.8 cd/m<sup>2</sup>), presented below the image superimposed on a larger, constant gray disk with a diameter of 3.86 arcdeg (L = 87.8 cd/m<sup>2</sup>), until the color of the inner disk was perceptually similar to the color exactly surrounding the presented white dot. Observers adjusted the RGB values of the initially gray inner disk by pressing keyboard buttons. The observers were explicitly instructed to match the color directly surrounding the small white dot.

FIGURE 4 | Example of a trial. Two identical color images were presented on the left and the right side of the screen in each individual trial. The superimposed outlines were also identical, but could either be positioned in phase, or in counter phase (as in the example above). Observers had to indicate if the color appearance in both images was identical or not.
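The colors above are given as CIE xy chromaticity plus luminance (often written xyY). Converting them to CIE XYZ tristimulus values uses the standard relations X = xY/y and Z = (1 − x − y)Y/y. The sketch below applies this to the six reported dot locations; the function and variable names are hypothetical, and the conversion is offered as a convenience rather than as a step the authors describe.

```python
def xyY_to_XYZ(x, y, Y):
    """Convert CIE xy chromaticity and luminance Y to tristimulus XYZ."""
    X = x * Y / y
    Z = (1.0 - x - y) * Y / y
    return X, Y, Z


def XYZ_to_xy(X, Y, Z):
    """Recover the xy chromaticity from tristimulus values."""
    total = X + Y + Z
    return X / total, Y / total


# (x, y, luminance in cd/m^2) at the six dot locations, as reported above.
LOCATIONS = {
    1: (0.364, 0.390, 96.1),
    2: (0.370, 0.373, 93.4),
    3: (0.325, 0.326, 97.7),
    4: (0.285, 0.303, 90.2),
    5: (0.275, 0.318, 97.4),
    6: (0.312, 0.371, 95.3),
}

tristimulus = {k: xyY_to_XYZ(*v) for k, v in LOCATIONS.items()}
```

Tristimulus values are additive, so they are the appropriate representation if one wants to compute a physical average of the colors at several locations before converting back to chromaticity.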

#### Results

To provide more insight into the colors that the different observers perceived, **Figure 6** shows both the presented colors and the colors as the observers perceived them (note that variations between different color monitors mean that what the reader sees in **Figure 6** will only approximate the actual colors we used). The first colored row shows the presented color in the stimulus at the different test locations (1–6). Next, for each condition separately, first the relative positioning of the contours is shown, followed by the averaged perceived colors, as reported in the matching experiment. This figure shows the average response of five observers.

FIGURE 6 | Color matching results. For each of the images of Figure 5, the presented color, the positioning of the outlines (relative to the spot where the color was to be matched), and the mean matching result of five observers is indicated. The four panels below each other reflect the four images of Figure 5 from left to right. The small numbers on top of each panel indicate the respective matching location on the stimulus. For illustration purpose, the data are repeated four times (from left to right) to mimic the repetitive, alternating color percepts on the original images (note that colors seen on the reader's monitor may differ slightly from those in our actual experiments).

<sup>1</sup>This illusion was honored with the second prize at the Illusion of The Year Contest 2014 (http://illusionoftheyear.com/), which contained all demonstrations presented in this manuscript.

**Figure 7** presents the CIE xy values of the color settings made in this matching experiment. In each of the **Figures 7A–C** the matches on one of the contour settings are compared with the matches on the image without contours. These plots show that, irrespective of the exact positioning of the contours, presenting the contours brings the colors perceived within each cluster closer together in color space, relative to the colors perceived when no contours were presented. This effect is especially clear for contour settings one and two (**Figures 7A,B**), where the matches on the six tested locations can clearly be clustered into two groups of three data points (as indicated with the dashed colored ovals). Each of these clusters represents the matches on locations that lie between the same two outlines. For the third contour setting (**Figure 7C**), a similar effect occurs, though it is less pronounced. This can be explained by the larger differences between the perceived colors at these tested locations even without contours (as indicated by the black symbols in **Figures 7A–C**). The perceptual clusters as described above are absent for the condition where no outlines were presented, as indicated by the matches (i.e., the black symbols) being much more spread out over the CIE xy space. **Figure 7D** summarizes the average matches between each pair of contours, averaged over the three test locations between two outlines. The symmetrical pattern of these data reflects the repetitive gradient in the original colored image. Each contour setting leads to a color percept of two more or less opponent colors (at different sides of a contour), but these perceived near opponent color percepts are different for each contour setting.
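One simple way to express "closer together in color space" numerically is the mean distance of a set of chromaticities to their centroid. The sketch below defines such a measure. Because the observers' matches are shown only graphically, the sketch merely exercises the measure on the *presented* chromaticities from the Stimulus Materials section; grouping locations 1 to 3 and 4 to 6 is an assumption about which locations share a pair of outlines, and the measure itself is not one the authors report.

```python
import math


def centroid(points):
    """Mean position of a list of (x, y) chromaticities."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def dispersion(points):
    """Mean Euclidean distance of the points to their centroid."""
    cx, cy = centroid(points)
    return sum(math.hypot(p[0] - cx, p[1] - cy) for p in points) / len(points)


# Presented CIE xy chromaticities at dot locations 1 to 6.
PRESENTED = [(0.364, 0.390), (0.370, 0.373), (0.325, 0.326),
             (0.285, 0.303), (0.275, 0.318), (0.312, 0.371)]

overall = dispersion(PRESENTED)
within_groups = [dispersion(PRESENTED[:3]), dispersion(PRESENTED[3:])]
group_centroids = [centroid(PRESENTED[:3]), centroid(PRESENTED[3:])]
```

Applied to actual matching data, complete perceptual averaging between two outlines would drive the within-group dispersion toward zero while leaving the distance between the two group centroids intact.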

#### Discussion

In this paper we have demonstrated that a single colored stimulus can produce different percepts depending on the shapes and positions of superimposed thin black contours. The data suggest spatial averaging of the colors between contours. Several studies have demonstrated effects related to spatial averaging. We have already shown spatial spreading of afterimage colors (Van Lier et al., 2009) and also the averaging of "real" colors (Anstis et al., 2012a). The latter effect relates to the phenomenon of monocular rivalry (Breese, 1899) in which perception alternates between two orthogonally oriented spatially overlapping semitransparent near-isoluminant gratings. We previously showed that superimposing black outlines on such a stimulus coinciding with either horizontal or vertical color borders will bias perception toward perceiving a horizontal colored grating or a vertical colored grating, respectively (Anstis et al., 2012a). However, prolonged viewing can cause a perceptual switch to the color grating in the orthogonal orientation, most likely due to adaptation to the first perceived grating and its orientation. The relative instability of this effect could reinforce the argument that the effect is not solely due to low-level color-contour interactions, but is also related to the process of perceptual selection. In our current demonstrations the perceptual effects are more robust: prolonged fixation, controlled horizontal or vertical eye movement and changing the viewing distance do not seem to dramatically change the perceptual outcome. The presence of black contours and their shape and positioning determine color perception. Considering a specific contour arrangement, color perception is constant, not particularly susceptible to effects of adaptation or attention.

The findings we present here add to the demonstrations of contour-dependent and contour-enhanced color perception in the literature. It is likely that our demonstrations share underlying mechanisms with the watercolor illusion (Pinna et al., 2001; Hazenberg and Van Lier, 2013). In addition, we have recently reported demonstrations of color averaging between contours (Anstis et al., 2012a) that are related to the effects we report here. These effects seem to be consistent with what has been called isomorphic filling-in theory (see Von der Heydt et al., 2003), which relies on the idea that color spreads equally in all directions, except across contours. However, the use of relatively complex colored gradients in our demonstrations makes it difficult to relate our quantitative findings to any specific theory of filling-in and color-contour interactions.

In earlier work on color averaging we additionally showed the role of color contrast induction across contours for afterimages (Van Lier et al., 2009). Anstis et al. (1978) first showed the basic effect of color contrast induction. In our afterimage stimuli, the color of the filled-in surface is not solely determined by the afterimage that followed the color presented within contours but also by the afterimage of the color presented outside the outlined surface, which induces its opponent color on the other side of a luminance border through the process of contrast induction. We speculate that a similar color contrast mechanism may have affected the overall color appearances in our stimuli as well. Further research could test the extent to which the phenomena presented here rely on luminance-defined borders, and whether other types of contours, such as illusory contours or texture-defined contours, could induce similar effects.

Overall, the demonstrations that we have presented here show the versatility of color averaging. The neural mechanisms responsible for this process might not be flexible, but the outcome is flexible, since color perception can change in multiple directions depending on which achromatic information is combined with the presented colors and in which exact spatial configuration. The process of building a reliable, though comprehensible model of the distal world requires many complex neural computations. An important part of this process is object segmentation and surface definition. After surface boundaries are detected, color averaging within the surface could help to minimize perceptual differences within a surface, and thereby enhance relative perceptual differences between different objects and surfaces. In other words, perceptual color averaging could function as a tool to filter out irrelevant stimulus variability from the noisy visual input that generally faces us.

#### Acknowledgments

This research was funded by FWO Pegasus Marie Curie Fellowship 1212513N, by the Flemish government and the European Union, awarded to MV.

#### Supplementary Material

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fpsyg.2015.00620/abstract


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Vergeer, Anstis and van Lier. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

## The IAT shows no evidence for Kandinsky's color-shape associations

#### *Alexis D. J. Makin\* and Sophie M. Wuerger*

*Department of Experimental Psychology, University of Liverpool, Liverpool, UK*

#### *Edited by:*

*Galina Paramei, Liverpool Hope University, UK*

#### *Reviewed by:*

*David Bimler, Massey University, New Zealand Rob Van Lier, Donders Institute, Netherlands Johannes Zanker, University of London, UK*

#### *\*Correspondence:*

*Alexis D. J. Makin, Department of Experimental Psychology, University of Liverpool, Eleanor Rathbone Building, L69 7ZA Liverpool, UK e-mail: alexis.makin@liverpool.ac.uk*

In the early twentieth century, the Bauhaus revolutionized art and design by using simple colors and forms. Wassily Kandinsky was especially interested in the relationship of these two visual attributes and postulated a fundamental correspondence between color and form: yellow triangle, red square and blue circle. Subsequent empirical studies used preference judgments to test Kandinsky's original color-form combinations, usually yielding inconsistent results. We have set out to test the validity of these postulated associations by using the Implicit Association Test. Participants pressed one of two buttons on each trial. On some trials they classified shapes (e.g., circle or triangle). On interleaved trials they classified colors (e.g., blue or yellow). Response times should theoretically be faster when the button mapping follows Kandinsky's associations (for example, when the left key is used to report blue or circle and the right is used for yellow and triangle) than when the response mapping is the opposite of this (blue or triangle, yellow or circle). Our findings suggest that there is no implicit association between the original color-form combinations. Of the three combinations we tested, there was only a marginal effect in one case. It can be concluded that the IAT does not support Kandinsky's postulated color-form associations, and that these are probably not a universal property of the visual system.

**Keywords: shape, form, color, Kandinsky, implicit association test, bauhaus, synesthesia**

#### **INTRODUCTION**

In recent years, there has been a growing interest in the links between art and visual perception, with new specialized journals and conferences. It is recognized that artists might be experts at exploiting the visual system, with special sensitivity to its constraints and parameters, and that vision scientists might learn something by studying paintings in detail (Ramachandran and Hirstein, 1999; Zeki, 2002; Van de Cruys and Wagemans, 2011; Shimamura, 2012).

Wassily Kandinsky (1866–1944) was an influential Russian painter. As his career progressed, Kandinsky produced increasingly abstract images, and for a period from 1922–1933 he taught at the famous Bauhaus school in Germany, which celebrated simple colors and forms. Kandinsky was a theorist as well as an artist, and he derived profound, spiritual meaning from aesthetic experiences. One of Kandinsky's ideas was that there are certain fundamental associations between colors and shapes (Kandinsky, 1912): he proposed *Yellow-Triangle, Blue-Circle, and Red-Square*. These associations were formulated introspectively. However, he did conduct his own survey at the Bauhaus in 1923 by distributing questionnaires to his professorial colleagues and students, and found that many of his colleagues agreed with his associations; notable exceptions were his contemporaries, Klee and Schlemmer, who favored different form-color combinations (Duechting, 1996). In fact, Kandinsky had already embarked upon a similar attempt to identify color-form associations while still in Russia, with the aim of providing a scientific underpinning for his own intuitions (Poling, 1984).

Recently, the idea of systematic color-form associations has become known as *correspondence theory* (Jacobsen, 2002; Kharkhurin, 2012). It probably crystallized somewhat after Kandinsky's death, becoming more tightly associated with the Bauhaus through various historical accidents. For example, a famous poster for the Bauhaus exhibition in Stuttgart, 1968, designed by Herbert Bayer, showed the yellow triangle, red square, and blue circle (Jacobsen and Wolsdorff, 2007).

After Kandinsky's original survey in 1923, evidence for correspondence theory has been limited. Jacobsen (2002) administered a modified version of Kandinsky's questionnaire to a sample of non-artist university students, half of whom were asked to choose combinations of colors and shapes (mere correspondence), and half of whom were asked which combinations they found aesthetically pleasing (aesthetic correspondence). It was found that pragmatic associations influenced participants' choices. For example, they typically paired red with triangle because of the association with traffic signs, and yellow with circle because this looked like the sun. Moreover, these subjects actually disliked the combinations devised by Kandinsky compared to other options. More recently, Jacobsen and Wolsdorff (2007) found that a sample of art experts had their own color-form combination preferences, but these were again in disagreement with correspondence theory.

Most recently, Albertazzi et al. (2013) asked participants to choose the color that they felt naturally went with each of 12 shapes, including some 3D shapes. Participants were explicitly told to avoid associations from memory, of the type reported by Jacobsen (2002). If people have no systematic color-form associations, then all combinations should be chosen equally often. However, some combinations were chosen significantly more often than others. The results were partially in agreement with Kandinsky's correspondence theory (Yellow and Triangle, Red and Square), but there was little evidence for blue and circle associations, and several other associations, not discussed by Kandinsky, were also reported.

The topic has also been explored by Holmes and Zanker (2008), who used an innovative oculomotor evolutionary algorithm method. Virtual "genes" controlled stimulus characteristics such as color and shape. For example, for one member of the starting population, the color gene might be set to yellow, and the shape gene set to square, so the individual stimulus would be a yellow square. For another stimulus the color and shape genes would code a different combination. At the beginning of the experiment, the computer generated a "population" of many such stimuli, with randomly chosen virtual genes. On each trial, a subset of the starting population was shown on the screen, and the participants looked for the ones they liked most. Genes were assigned a fitness score based on feedback from an eye tracker. After a certain number of trials, the stimuli entered a competitive tournament, and only the stronger combinations were passed on to the next generation. Over the trials, stimuli evolved to become more like those that best attracted participants' gaze. However, while consistent preferences were found within each individual, no systematic color-form combinations emerged across observers.
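The evolutionary loop described above can be illustrated with a short Python sketch. This is not Holmes and Zanker's implementation: the eye-tracker feedback is replaced by a simulated observer with a hypothetical preferred stimulus, and the population size, binary tournament selection, crossover rule, and mutation rate are all assumptions chosen for illustration.

```python
import random
from collections import Counter

COLORS = ["red", "yellow", "blue"]
SHAPES = ["circle", "square", "triangle"]

def simulated_fitness(stimulus, preferred, rng):
    """Stand-in for gaze feedback (assumption): one point per gene that
    matches the simulated observer's preference, plus a little noise."""
    color, shape = stimulus
    matches = (color == preferred[0]) + (shape == preferred[1])
    return matches + 0.5 * rng.random()

def tournament_winner(scored, rng):
    """Pick two scored stimuli at random and return the fitter one."""
    a, b = rng.sample(scored, 2)
    return max(a, b)[1]

def evolve(preferred, generations=30, pop_size=40, mutation_rate=0.05, seed=0):
    rng = random.Random(seed)
    # Starting population with randomly chosen color and shape genes.
    population = [(rng.choice(COLORS), rng.choice(SHAPES)) for _ in range(pop_size)]
    for _ in range(generations):
        scored = [(simulated_fitness(s, preferred, rng), s) for s in population]
        next_generation = []
        for _ in range(pop_size):
            parent_1 = tournament_winner(scored, rng)
            parent_2 = tournament_winner(scored, rng)
            # Crossover: color gene from one parent, shape gene from the other.
            color, shape = parent_1[0], parent_2[1]
            # Occasional mutation keeps some variety in the population.
            if rng.random() < mutation_rate:
                color = rng.choice(COLORS)
            if rng.random() < mutation_rate:
                shape = rng.choice(SHAPES)
            next_generation.append((color, shape))
        population = next_generation
    return population

final_population = evolve(preferred=("yellow", "square"))
most_common_stimulus = Counter(final_population).most_common(1)[0][0]
```

With a consistent simulated preference, the population drifts toward the preferred color-shape pair, which mirrors the within-individual convergence reported in the study. Across observers with different preferences, the evolved populations would differ.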

Kandinsky may have had an intriguing condition known as synesthesia, where multimodal connections create idiosyncratic, additional perceptual experiences. For example, in color-grapheme synesthetes, who have been best studied, the letter E might be perceived as if it is always written in red ink. Synesthesia does not reflect trivial associations and beliefs: grapheme-color synesthetes show improved letter detection performance in visual search tasks, and, in some subjects, the color-sensitive brain area V4 is activated by presentation of black and white graphemes (Hubbard and Ramachandran, 2005). Many theorists cited above have suggested that Kandinsky's synesthesia may have inspired correspondence theory.

Given the research to date, it is entirely possible that correspondence theory says something about Kandinsky's personal artistic elaborations or the socially constructed aesthetics of the Bauhaus movement, but says nothing about the architecture of the average person's visual system. However, Hubbard and Ramachandran (2005) suggest that mechanisms present in synesthetes may be present to a *lesser degree* in non-synesthetes, and this could account for the "conceptual rightness" of certain, widely held multisensory mappings, such as the Bouba–Kiki effect, where jagged or bulbous visual shapes seem to "go" with certain sounds (in this case, "Bouba" with rounded shapes, "Kiki" with spiky shapes). Moreover, while synesthetic associations are strikingly idiosyncratic, most people sense the Bouba–Kiki associations. Correspondence theory could reflect Kandinsky's intuition into some of these sub-clinical, quasi-synesthetic pairings, which, like the Bouba–Kiki effect, are near-universal.

It is possible that these quasi-synesthetic associations might not be explicitly recognized, but affect perceptual judgments nevertheless. Following this reasoning, Kharkhurin (2012) evaluated correspondence theory with an implicit priming experiment. Participants were shown a colored screen for 1 s, then a shape, which they had to classify as quickly as possible as triangle, square or circle. The hypothesis was that people would classify the shape quicker on congruent trials, where the prime color was associated with the shape according to correspondence theory. In a second experiment, this was reversed, and shapes were used as a prime before color classification. Neither experiment found any facilitation of reaction time in the congruent trials, so this study failed to provide any empirical support for correspondence theory using implicit priming techniques.

Kharkhurin's (2012) work was based on a priming paradigm, which, although valid and potentially informative, may not have been optimized for detecting associations between two visual dimensions. In Kharkhurin's study, the primes were task-irrelevant, and could potentially be ignored completely, while the targets were classified very easily. Recent work on affective priming and symmetry has found this technique to be much weaker than different paradigms where participants are forced to classify all stimuli (Bertamini et al., 2013a). In the current work, we test correspondence theory using the Implicit Association Test (IAT, Greenwald et al., 1998; Nosek et al., 2007), which has been used in thousands of experiments, often revealing associations between dimensions which people are either explicitly unaware of, or even explicitly reject. Recently, the IAT has been used to answer various questions in empirical aesthetics (Gattol et al., 2011; Mastandrea et al., 2011; Makin et al., 2012a,b; Bertamini et al., 2013b). Importantly, the IAT procedure requires participants to classify all stimuli. Furthermore, it was specifically designed to probe associations between stimulus pairs, and it has been subjected to extensive methodological scrutiny (Nosek et al., 2007).

The best way to describe the IAT is through example. In one of the best-known IAT experiments, participants were given two buttons, which were used to classify four stimulus categories (Greenwald et al., 1998). On some trials they saw pictures of either flowers or insects, and had to press one button for flower and the other button for insect. On interleaved trials, they saw either positive words (e.g., LOVE) or negative words (e.g., HATE). They had to press one button for positive, and the other for negative. In *congruent blocks*, the same button was used to report a flower or positive word, and the other was used to report insect or negative word. In *incongruent blocks*, the response mapping was reversed (so one button was used to report flower and negative, the other was used to report insect and positive). Because people usually associate flowers with other positive things and insects with negative things, the task was much harder in the incongruent block, and reaction times were therefore longer. The existence of an RT difference between congruent and incongruent blocks can be taken as evidence for an implicit association between the stimulus pairs. In this example, the implicit associations measured by the IAT were in agreement with the explicitly held attitudes of the participants, who preferred flowers to insects.

In another experiment, Greenwald et al. (1998) used the IAT to reveal implicit racial prejudices in participants who were not overtly racist. White participants were quicker to respond in congruent blocks, where white faces and positive words were reported with the same key, and black faces and negative words were reported with the other key, than in incongruent blocks, where the response mapping was reversed (white and negative, black and positive). Given that the IAT is sensitive to hidden or unconscious associations, it might reveal color-form correspondences that are not explicitly acknowledged.

In this work we presented 36 participants with three separate IAT experiments in a within-subjects design (see Gattol et al., 2011 for another example of this "multidimensional" IAT approach). Each IAT experiment compared two colors and two shapes. In the congruent blocks, color-form mapping was in line with Kandinsky's correspondence theory (yellow-triangle, blue-circle, red-square). Consider IAT 1 as an example: in the congruent blocks participants had to press the left key for blue or circle, and the right key for yellow or triangle (both Kandinsky's correspondences). In the incongruent blocks, response mapping was the opposite of correspondence theory, so the left key would be used for blue or triangle and the right key would be used for yellow or circle (the opposite of Kandinsky's correspondences). **Tables 1** and **2** show the structure of IAT 1 in more detail. The three IAT experiments cover every combination of color and shape proposed by correspondence theory (**Table 3**).



**Table 2 | Order of blocks and response mappings for participants who did incongruent trials first (example from IAT 1).**


### **METHOD**

#### **PARTICIPANTS**

Thirty-six participants were involved (age 18–45, 11 male, 6 left-handed). Most were involved with undergraduate or postgraduate study at the University of Liverpool. All had normal or corrected-to-normal vision. The male participants were checked for color blindness using the 1966 Ishihara plates (Ishihara, 1917), and none were aware of the Kandinsky color-form correspondence theory (or were at least unaware of the exact combinations he proposed).

#### **APPARATUS AND STIMULI**

Stimuli were presented at ∼57 cm on a 1280 × 1024 pixel CRT monitor, with a refresh rate of 60 Hz. Stimuli were generated using open source PsychoPy software (Peirce, 2007). Participants entered responses using the left [A] and right [L] keys of a standard computer keyboard. The three shapes were white line drawings on a black background. The circle was 6.3° in diameter, the square was 6.3 × 6.3°, and the triangle was 6.3° tall and 6.3° wide, always presented with the same upwards orientation. Color patches were ∼20° wide (Gaussian-modulated full screen, **Figure 1**). Since there are no precise records of the colors used by Kandinsky, we used a set of primary colors (red, yellow, blue) close to unique hues as judged by a color-normal experienced observer (Wuerger et al., 2005; Wuerger, 2013). The CIE coordinates and the luminance of the colored Gaussian patches were as follows: yellow: *x* = 0.406, *y* = 0.512, luminance = 59 cd/m²; red: *x* = 0.628, *y* = 0.331, luminance = 14 cd/m²; blue: *x* = 0.152, *y* = 0.071, luminance = 8 cd/m². Example stimuli from IAT 1 are shown in **Figure 1**.
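For readers who want to relate the stimulus sizes in degrees to physical sizes on the screen, the standard visual-angle relation, size = 2 × distance × tan(angle / 2), can be sketched as follows. The function name is hypothetical, and the conversion to pixels is left out because the physical dimensions of the monitor are not reported.

```python
import math

def visual_angle_to_cm(angle_deg, distance_cm):
    """Physical extent (cm) that subtends angle_deg at a viewing distance of distance_cm."""
    return 2 * distance_cm * math.tan(math.radians(angle_deg) / 2)

# The shapes subtended 6.3 degrees at a viewing distance of about 57 cm.
shape_size_cm = visual_angle_to_cm(6.3, 57)

# At 57 cm, one degree of visual angle corresponds to roughly one centimeter,
# which is why this viewing distance is a common choice.
one_degree_cm = visual_angle_to_cm(1.0, 57)
```

Under these assumptions each shape was a little under 6.3 cm across on the screen.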

#### **PROCEDURE**

Each participant completed 3 IAT experiments, which took about 7 min each. The structure of a single IAT experiment was based on the recommendations of Nosek et al. (2007). There were 8 blocks in total. Half the participants did the congruent blocks first. For these participants, IAT 1 would have run as shown in **Table 1**. The first block was a training block, where participants discriminated shape only (e.g., left for circle, right for triangle). The second block was another training block, where participants discriminated color only (e.g., left button for blue, right for yellow). The third and fourth *congruent blocks* combined color and shape discrimination, and the response mapping fitted Kandinsky's theory (e.g., left for circle or blue, right for triangle or yellow). Next, two further training blocks were given, where participants had to relearn the key mapping for shapes (e.g., left button for triangle, right for circle). Finally, two incongruent blocks were presented, where the response mapping was the opposite of Kandinsky's correspondence theory (e.g., left for triangle or blue, right for circle or yellow). The other participants did the incongruent blocks first, and the training blocks were rearranged accordingly (**Table 2**). In each block, the trials were presented in a novel random order for each participant. The order in which participants did the three IAT experiments, and the number of participants doing congruent or incongruent blocks first, were counterbalanced.
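The eight-block sequence for IAT 1 can be written out as a small Python sketch. The congruent-first order follows the description above. The incongruent-first order is an assumption (the shape mappings are simply swapped), and trial counts are omitted because they are not stated in the text. The function name `iat1_blocks` is hypothetical.

```python
def iat1_blocks(congruent_first=True):
    """Return the 8 blocks of IAT 1 with the stimuli assigned to each key."""
    shape_kandinsky = {"left": ["circle"], "right": ["triangle"]}
    shape_reversed = {"left": ["triangle"], "right": ["circle"]}
    color = {"left": ["blue"], "right": ["yellow"]}  # color keys stay fixed

    if congruent_first:
        first, second = shape_kandinsky, shape_reversed
        labels = ("congruent", "incongruent")
    else:
        # Assumption: the training blocks are rearranged by swapping shape mappings.
        first, second = shape_reversed, shape_kandinsky
        labels = ("incongruent", "congruent")

    def combined(shape):
        # Combined blocks: each key reports one shape and one color.
        return {side: shape[side] + color[side] for side in ("left", "right")}

    plan = [
        ("training", first),             # block 1: shape only
        ("training", color),             # block 2: color only
        (labels[0], combined(first)),    # block 3
        (labels[0], combined(first)),    # block 4
        ("training", second),            # block 5: relearn shape keys
        ("training", second),            # block 6
        (labels[1], combined(second)),   # block 7
        (labels[1], combined(second)),   # block 8
    ]
    return [{"block": i, "type": kind, **keys}
            for i, (kind, keys) in enumerate(plan, start=1)]

blocks = iat1_blocks(congruent_first=True)
```

Only the shape mapping changes between the two halves of the experiment, so any reaction-time difference between congruent and incongruent blocks reflects the pairing of shape and color on a shared key.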

All blocks were preceded by appropriate instructions informing participants of the stimuli and response requirements. When stimuli were presented, cue words reminded participants which keys should be used to report their answers. For example, when a shape appeared in the first training block of IAT 1, the words "circle" and "triangle" appeared on the left and right sides of the screen. When participants pressed the wrong button, the word "Wrong" appeared centrally in red, and remained until the correct key was pressed.

#### **ANALYSIS**

Kandinsky's correspondence theory would be supported by faster reaction times in congruent blocks than incongruent blocks. We processed the data according to the recommendations of Nosek et al. (2007), which are widely used in the IAT literature (e.g., Gattol et al., 2011). Training blocks were excluded from analysis. Trials where participants pressed the wrong button were also excluded (6%). For each participant and experiment, the following data processing steps were taken.


The D score is the difference between incongruent and congruent blocks in standard deviation units. A positive value means the hypothesis was supported, and a negative value means that the participant associates stimuli in the opposite way to that predicted. We obtained three D scores from each participant: one from each IAT experiment in **Table 3**. We used one-sample *t*-tests to explore whether D scores across participants were significantly greater than zero. The variables in this analysis were normally distributed according to the Shapiro-Wilk test (*p* > 0.219).
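The idea behind the D score and the one-sample *t*-test can be sketched in Python. This is a simplified illustration, not the exact procedure of Nosek et al. (2007): it assumes a single D score computed from all correct trials pooled across the congruent and incongruent blocks, and it omits any latency trimming or per-block averaging. The function names and the example reaction times are hypothetical.

```python
import math
import statistics

def d_score(congruent_rts, incongruent_rts):
    """Difference between mean incongruent and congruent RT, divided by the
    standard deviation of all trials from both block types (simplified)."""
    pooled_sd = statistics.stdev(list(congruent_rts) + list(incongruent_rts))
    difference = statistics.mean(incongruent_rts) - statistics.mean(congruent_rts)
    return difference / pooled_sd

def one_sample_t(values, mu=0.0):
    """t statistic for testing whether the mean of values differs from mu
    (degrees of freedom = len(values) - 1)."""
    n = len(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    return (statistics.mean(values) - mu) / standard_error

# Hypothetical reaction times (ms) for one participant.
example_d = d_score([500, 600], [600, 700])

# Hypothetical D scores from three participants.
example_t = one_sample_t([1.0, 2.0, 3.0])
```

A positive D score indicates slower responses in the incongruent blocks, which is the direction predicted by correspondence theory.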

#### **RESULTS**

Participants completed three IAT experiments, each comparing a pair of Kandinsky's color-form correspondences. In the congruent blocks, response mapping was in accordance with the theory, while in the incongruent blocks, response mapping was the opposite of correspondence theory. Faster reaction times in the congruent blocks would support the theory. We found limited evidence for this in our IAT experiments. As shown in **Figure 2**, D scores were normally distributed around zero in IAT 1 [*t*(35) = −1.301, *p* = 0.202] and IAT 2 [*t*(35) < 1, N.S.]. However, there was a borderline effect in the expected direction in IAT 3 [*t*(35) = 2.020, *p* = 0.051].

Repeated measures ANOVA tested whether there was a difference in the D scores between the three IAT experiments, and whether this interacted with the between-subjects factors of experiment order (the order in which the three IAT experiments were presented) and block order (congruent block first or second). There was a borderline significant effect of IAT [*F*(2, 60) = 3.141, *p* = 0.050], because of the slightly larger D score in IAT 3, described above. There were no interactions involving experiment order [*F*(4, 60) < 1, N.S.], or block order [*F*(2, 60) = 1.744, *p* = 0.183].

There are various approaches to IAT analysis. Some researchers do not include the first blocks of congruent and incongruent trials (Blocks 3 and 7 in **Table 1**) and analyze D scores from the longer second blocks only (Blocks 4 and 8). With this approach, the effect in IAT 3 was again borderline significant [*t*(35) = 1.969, *p* = 0.057].

Another approach is to simply compare mean response time in congruent and incongruent blocks (without standardization to D scores). In our study, this approach found no effects in IAT 1 [609 vs. 588 ms, *t*(35) = −1.241, *p* = 0.223] or IAT 2 [630 vs. 601 ms, *t*(35) = −1.502, *p* = 0.142], but there was an effect in IAT 3 [575 ms in the congruent block vs. 621 ms in the incongruent block, *t*(35) = 2.498, *p* = 0.017].

We next analyzed response times with mixed ANOVA with two within-subject factors [experiment (IAT 1, IAT 2, or IAT 3) × block (congruent or incongruent)] and two between-subject factors (3 experiment order × 2 block order). Correspondence theory predicts a main effect of block, resulting from uniformly faster responses in the congruent blocks. There was no evidence for this [*F*(1, 30) < 1, N.S.]. Furthermore, there was no overall response time difference between the three IAT experiments [*F*(1.61, 48.26) < 1, N.S.]. There was, however, an experiment × congruence interaction [*F*(2, 60) = 5.517, *p* = 0.006]. This was because of the unique effect of congruence in IAT 3, mentioned above. There was also an IAT × experiment order interaction [*F*(3.22, 48.26) = 3.654, *p* = 0.017], partly because participants tended to take longer to produce responses in their first IAT than in the next two. There were no other effects or interactions [next largest *F*(1, 30) = 1.777, *p* = 0.193].

Finally, we predicted that participants should make fewer errors in the congruent blocks. There was no difference in error rates between congruent and incongruent blocks in IAT 1 [*t*(35) < 1, N.S.] or IAT 2 [*t*(35) < 1, N.S.]. However, responses were more accurate in the congruent blocks of IAT 3 [*t*(35) = 3.431, *p* = 0.002]. So, while there was some marginal support for Kandinsky's correspondence theory from IAT 3, we acknowledge that most evidence for this effect would disappear completely if we employed correction for multiple comparisons, so we do not consider it instructive without replication.
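The point about multiple comparisons can be made concrete with a Bonferroni correction, sketched below in Python. The *p*-values are those reported above for IAT 3. Treating each measure as a family of three tests (one per IAT experiment) is an assumption made for illustration, as is the choice of Bonferroni over other correction methods.

```python
def bonferroni_significant(p_values, n_tests, alpha=0.05):
    """Flag which p-values remain below the Bonferroni-corrected threshold."""
    threshold = alpha / n_tests
    return {name: p <= threshold for name, p in p_values.items()}

# p-values reported for IAT 3 under each analysis approach.
iat3_p_values = {
    "D score": 0.051,
    "D score (blocks 4 and 8 only)": 0.057,
    "mean response time": 0.017,
    "error rate": 0.002,
}

# Assumption: three tests per measure, one for each IAT experiment.
survives = bonferroni_significant(iat3_p_values, n_tests=3)
```

Under these assumptions the corrected threshold is about 0.0167, so only the error-rate difference remains significant, while the D score and response-time effects do not.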

#### **DISCUSSION**

Our series of three IAT experiments provided no support for Kandinsky's color-form correspondence theory. Participants completed three IAT experiments, each designed to compare one pair of color-shape associations with the opposite. In IATs 1 and 2, reaction times were comparable in congruent and incongruent blocks, suggesting no special associations between particular combinations. These results are thus in agreement with those of Kharkhurin (2012) and Holmes and Zanker (2008), who also found no evidence for correspondence theory using different implicit procedures. In IAT 3, there was a marginally significant effect in the expected direction, suggesting that participants associate triangle with yellow and square with red, as Kandinsky would have predicted. However, this effect was not strong, and would require replication.

Our IAT experiments did not test whether people have a *preference* for yellow triangles, red squares and blue circles over other color-shape pairings. However, we think these putative color-shape associations would be theoretically interesting even without aesthetic preferences. For one thing, a clear, positive result in our IAT experiments would suggest that Kandinsky had intuitively recognized an obscure and unexplained property of the visual system, which would require further research. Our null results have different implications: We conclude that Kandinsky's correspondence theory has little to say about the architecture of the typical visual system, and must have other origins. Leading thinkers in scientific aesthetics are probably justified when they claim that artists possess a unique kind of knowledge about human vision (e.g., Zeki, 2002; Van de Cruys and Wagemans, 2011); however, correspondence theory is probably *not* an example of this special insight.

It is very unlikely that the null results reported here can be attributed to low power. There were 36 participants, a much larger sample size than used in other IAT experiments in aesthetic science. For example, Makin et al. (2012a) used the IAT to detect an implicit preference for symmetrical over random patterns with 12 subjects, and replicated this in other experiments with just 6 subjects. The mean D scores in these experiments were around 0.5, whereas the highest D score here was 0.12. Further examples from diverse fields allow this effect size to be seen in context: Gattol et al. (2011) found D scores ranging around 0.2–0.4 for associations between different car brands and dimensions such as "aggressive-peacefulness" and "conventional-innovative," while Dasgupta et al. (2009) reported D scores of at least 0.5 in a series of IATs measuring implicit attitudes toward social in-groups and outgroups. These are all higher than the largest D score found in IAT3.

Moreover, we conducted three IAT experiments, and did not correct for multiple comparisons. This would increase the chances of a false positive, but we still only found a borderline effect in one of our three experiments. Finally, there was an asymmetry in the design of this experiment, which can be seen in **Figure 2**: Across the three IATs, a shape was twice as likely to share a button with the color proposed by correspondence theory as any other color (for example, a participant would be presented with two experiments where yellow and triangle were reported with the same button, but only a single experiment where blue and triangle or red and triangle shared a button). If anything, this design feature would produce false positives in favor of correspondence theory, which we did not find. These data, in conjunction with previous work, seem to rule out Kandinsky's correspondence theory in any strong form, so we think Kandinsky's theory is very unlikely to constitute an insight into the architecture of the average visual system.

Jacobsen (2002) reported different color-form associations to Kandinsky, which seemed to be based on pragmatic associations. For example, his participants paired yellow with circle, because the sun is a yellow circle. We did not find any further evidence for this with our IAT experiments. However, Jacobsen (2002) used a modified version of Kandinsky's questionnaire, which required people to choose their preferred color-shape combinations from the available options. Perhaps these pragmatic associations only exist when people are forced to make explicit, verbal judgments? Moreover, a very recent study using an explicit matching task (Albertazzi et al., 2013) found partial support for correspondence theory; observers had to choose their preferred color-form combination, similar to Kandinsky's questionnaire. The authors report a strong association between triangle and yellow and some evidence for red being associated with square, but there was no support for the blue-circle combination.

It seems likely that Kandinsky's correspondence theory was influenced by his theorizing, or perhaps indirectly by his synesthesia, and was then spread by subsequent Bauhaus literature and promotion. It seems unlikely that these color-shape associations reflect a common property of all human brains, akin to sub-threshold synesthesia, or that they are an aesthetic universal. Our null results are important because it is likely that people will continue seeking empirical evidence for Kandinsky's correspondence theory, given the current popularity of empirical aesthetics.

Despite our conclusive negative findings, many associations involving color exist in the human brain, some of which may have an ecological basis. For example, Palmer and Schloss (2010) describe evidence that color preferences arise from associations with pleasant and unpleasant objects (e.g., clear blue sky vs. brown feces or rotting food). More generally, it is well-known that statistical properties of the environment (such as a common motion direction for objects and sound sources) are reflected in the neural mechanisms that combine stimuli from different modalities (e.g., Meyer and Wuerger, 2001; Harrison et al., 2011), and that the typical color of objects complements perception, so that, for example, gray bananas still appear slightly yellow (Hansen et al., 2006). However, evidence for apparently meaningless, quasi-synesthetic, color-form associations of the type proposed by Kandinsky remains limited.

#### **ACKNOWLEDGMENTS**

This work was partially funded by the ESRC and by a Leverhulme Trust Early Career Fellowship awarded to Alexis D. J. Makin.




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 01 May 2013; paper pending published: 31 May 2013; accepted: 22 August 2013; published online: 11 September 2013.*

*Citation: Makin ADJ and Wuerger SM (2013) The IAT shows no evidence for Kandinsky's color-shape associations. Front. Psychol. 4:616. doi: 10.3389/fpsyg.2013.00616*

*This article was submitted to Perception Science, a section of the journal Frontiers in Psychology.*

*Copyright © 2013 Makin and Wuerger. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

### Investigating preferences for color-shape combinations with gaze driven optimization method based on evolutionary algorithms

#### *Tim Holmes <sup>1</sup> and Johannes M. Zanker <sup>2</sup> \**

*<sup>1</sup> Acuity Intelligence Ltd., Reading, UK*

*<sup>2</sup> Department of Psychology, Royal Holloway University of London, London, UK*

#### *Edited by:*

*Galina Paramei, Liverpool Hope University, UK*

#### *Reviewed by:*

*Sophie Wuerger, The University of Liverpool, UK Osvaldo D. A. Pos, Università degli Studi di Padova, Italy*

#### *\*Correspondence:*

*Johannes M. Zanker, Department of Psychology, Royal Holloway University of London, Room W 214, Egham, Surrey TW20 0EX, London, UK e-mail: j.zanker@rhul.ac.uk*

Studying aesthetic preference is notoriously difficult because it targets individual experience. Eye movements provide a rich source of behavioral measures that directly reflect subjective choice. To determine individual preferences for simple composition rules we here use fixation duration as the fitness measure in a Gaze Driven Evolutionary Algorithm (GDEA), which has been demonstrated as a tool to identify aesthetic preferences (Holmes and Zanker, 2012). In the present study, the GDEA was used to investigate the preferred combinations of color and shape which were promoted in the Bauhaus art school. We used the same three shapes (square, circle, triangle) used by Kandinsky (1923), with the three-color palette from the original experiment (A), an extended seven-color palette (B), and eight different shape orientations (C). Participants were instructed to look for their preferred circle, triangle or square in displays with eight stimuli of different shapes, colors and rotations, in an attempt to test for a strong preference for red squares, yellow triangles and blue circles in such an unbiased experimental design and with an extended set of possible combinations. We tested six participants extensively on the different conditions and found consistent preferences for color-shape combinations for individuals, but little evidence at the group level for clear color/shape preference consistent with Kandinsky's claims, apart from some weak link between yellow and triangles. Our findings suggest substantial inter-individual differences in the presence of stable individual associations of color and shapes, but also that these associations are robust within a single individual. These individual differences go some way toward challenging the claims of the universal preference for color/shape combinations proposed by Kandinsky, but also indicate that a much larger sample size would be needed to confidently reject that hypothesis.
Moreover, these experiments highlight the vast potential of the GDEA methodology in experimental aesthetics and beyond.

**Keywords: visual perception, aesthetics, shape, color, evolutionary algorithm, individual preference**

#### **INTRODUCTION**

For many centuries questions about the origin, rationale, universality, and biological foundations of aesthetic judgments have been a matter of speculation and debate. Since Fechner's inception of "psychophysics" as an exact method of studying the relationship between the physical world and mental experience (Fechner, 1876) such questions have been within the reach of experimental investigation. Berlyne's work (1971) provides a rich foundation to advance this experimental approach by applying recent technical and computational advances in psychophysics that allow us to collect and analyse large and complex data sets in an attempt to revisit some of the classic questions of aesthetics. In the present paper we wanted to test experimentally a well known composition rule about the association of color and shape, which had been advanced in the Bauhaus art school by one of its most prominent members, Wassily Kandinsky. His proposal that red squares, yellow triangles, and blue circles are the "most pleasing" became a prominent topic in teaching at the Bauhaus art school and has been tested empirically with mixed results (Makin and Wuerger, 2013).

The effects of color, shape, and space were one of the primary concerns of the Bauhaus art school (Wingler et al., 1969). As a painter and a teacher, Kandinsky was an influential member of the school. In addition to that, he was a self-professed synaesthete (Ione and Tyler, 2003; Kadosh and Henik, 2007), and had a profound interest in the combination of such features as color and shape in a single object. In addition to his associations between color and music, Kandinsky was convinced that there are universal harmonies between shape and color. In particular, he claimed that there were strong associations between the primary colors blue, red, and yellow and simple geometric shapes like circles, squares, and triangles (Jacobsen, 2002). He drew support for this claim from an empirical investigation of this association by presenting students with a questionnaire containing the outline of a circle, a square, and a triangle and asking them to color in the shapes using only red, blue, and yellow, with each color to be used once and only once (Kandinsky, 1923). The experiment suggested a group preference for blue circles, red squares, and yellow triangles (Jacobsen, 2002). The reliability of this study is contentious (Ball and Ruben, 2004), since Kandinsky's own students and colleagues were unlikely to provide an unbiased sample, and the precise distribution of the results is, unfortunately, undocumented. Additionally the strict color limitations and pre-printed questionnaire which fixed both the arrangement and orientation of the three shapes did not allow the associations to vary independently or for order and orientation effects to be explored. 
Unsurprisingly, subsequent attempts to validate the theory have yielded inconsistent results (Jacobsen, 2002; Leder et al., 2004; Albertazzi et al., 2013), perhaps because they have tended only to try to establish group level associations, and mostly within the strict confines of Kandinsky's original questionnaire without allowing participants to freely explore a broader color space [with the notable exception of Albertazzi et al. (2013)]. Moreover, several replications have suggested systematic differences between the color/shape associations at an individual level without devoting much effort to investigating the strength and consistency of color/shape correspondence on an individual level (Jacobsen, 2002).

In some of our recent work (Holmes and Zanker, 2008, 2012) we established a novel method to investigate individual choices in large search spaces, based on a gaze driven evolutionary algorithm (GDEA), which has the potential to address some of the issues that had been raised with Kandinsky's experiment. This method combines the power of evolutionary algorithms (EA), a computational optimization method for highly multi-dimensional search spaces, with the versatility of gaze tracking as a means to collect participants' responses unobtrusively from their natural behavior while evaluating the stimuli, without focusing the participant on the question being explored. The GDEA was used in our experiments to investigate color/shape association on both a group and individual level. Participants were presented with the same three shapes as used by Kandinsky, but with both the three-color palette offered in the original experiment and an extended palette of seven colors. In addition, the effects of shape orientation were explored. The limitation of singular color-shape combinations was overcome, as were possible position or sequencing effects.

EAs are optimization methods which have, at their core, the biological evolutionary principles of reproduction and selection applied to a large population of individuals over a number of generations. Individual members of a population represent possible solutions to a design problem and are described by genetic information using a chromosome, which can be thought of as a multi-dimensional vector. They reproduce by exchanging parts of this genetic code with that of another individual (crossover) to produce "offspring" with chromosomes that inherit information from both "parents." Random mutation is often applied to the genetic codes of the offspring to introduce new variants into the population. Survival of offspring, and the ability to pass genetic information on to a subsequent generation, is determined by an evaluation and selection process which ensures that the fittest members of the population are most likely to reproduce and send their genetic codes on to the next generation (Holland, 1975; Goldberg, 1989; Bentley, 1999). An "evolutionary run" comprises repeated iterations of this process, as shown in **Figure 1**, which at the end converges to a "final generation" in which the population represents the evolved state that is regarded as the best solution for a given design problem.

These concepts from biological evolution engendered several different types of EA which were developed independently (Eiben and Smith, 2003), and which incorporate different genetic representations, such as binary encoded genes in Genetic Algorithms (GA) (Holland, 1975; Goldberg, 1989), real-valued genes in Evolutionary Strategies (ES) (Rechenberg, 1973, as cited in Rechenberg, 1989; Schwefel, 1993), or logical expressions in Genetic Programming (GP) (Koza, 1992). Furthermore, they can rely to different extents on the variance from recombination and mutation (Spears, 2000). In the present context, we use a hybrid mechanism that uses integer values for genes (similar to ES) and uses crossover as well as mutation to boost genetic variability, which puts it into the category of a Genetic Algorithm. One of the crucial technical aspects of the GDEA method used here concerns the implementation of the selection. In the search for preferences, an active choice paradigm has been used previously (see Holmes, 2010), which requires the participants to press buttons or make verbal responses in order to indicate their choices. This leads to very time-consuming experiments, with corresponding challenges for maintaining attention, and to tendencies to think extensively about every decision, which could trigger criteria shifts about what is regarded as "best," "pleasing," or "beautiful."

**FIGURE 1 | The evolutionary cycle.** An initial population, usually randomly generated, is evolved over several generations using an iterative process comprising EVALUATION, SELECTION and REPRODUCTION stages to produce a "fitter" population, i.e., one in which the average fitness of the overall population has increased. EVALUATION is the means by which each individual in the population is attributed a fitness score, which in the GDEA is derived from gaze data. SELECTION is the means by which individuals are chosen to participate in the generation of "offspring" which will populate the next generation, and thus carry forward the genetic information contained in the chromosomes of both of the parents. REPRODUCTION is the means by which the genetic information from the "parents" is recombined to produce offspring, typically involving crossover (an exchange of genetic information from both parents, resulting in inheritance of properties from both parents) and mutation [random changes to the chromosome of the offspring resulting in novel properties not inherited from the parent(s)].

Therefore, the GDEA makes use of spontaneous eye movements (Holmes and Zanker, 2012), which result from an interaction between so-called "bottom-up" (e.g., saliency) and "top-down" (e.g., attractiveness) effects and can be used as a largely reflexive physiological marker for preference. Preferential selection based on eye movements has been used in a variety of contexts, in particular being at the core of the preferential looking paradigm (Teller, 1979; Dobson, 1983). The power of such methods has been supported by the gaze cascade model of decision making based on preference in visual displays (Shimojo et al., 2003; Glaholt and Reingold, 2009). The relationship between accumulated fixation time and task relevance, which is fundamental to such methods, was first identified by Yarbus (1967).

The purpose of the current work is to explore whether preferred associations of color and shape can be confirmed with newly developed experimental techniques that reduce the influence of potential bias in the task imposed on participants, and also allow us to test such basic aesthetic preferences in a less constrained set of stimuli. To this end we use a specific version of the GDEA that allows us to evolve preferred color-shape combinations for the three basic shapes used by Kandinsky in the absence of instructions that might lead the participants, with simultaneous presentation and comparison of design alternatives, with a larger color palette, and allowing shapes to be presented in noncanonical (i.e., not upright) orientations. If the associations persisted under all of these manipulations, this would lend support to any claim of a "general" composition rule that goes beyond the very small set of designs that would have been possible in Kandinsky's original experiment.

#### **METHODS**

The basic structure of an evolutionary algorithm is well defined through the fundamental steps of reproduction and variation combined with evaluation and selection (see **Figure 1**). However, the implementation of various processes, both in terms of logic and in terms of specific runtime parameters, varies considerably with each instantiation of an evolutionary algorithm. In the following section we describe the representation of features in the genetic code, and the selection, reproduction, and mutation processes used in this study, as well as the subjective fitness measure driving the selection process.

#### **GENETIC CODE TO REPRESENT STIMULI**

The phenotypes were defined as colored shapes which could be displayed in any of eight orientations. Each was represented using a single chromosome containing three genes, coding for shape, color, and orientation.

Integers were used to encode each gene with three, seven, or eight different values (alleles), respectively. Thus, the chromosome simply comprised three integers, giving a relatively small solution space of just 3 × 7 × 8 = 168 possible genotypes. It should be noted that due to the rotational symmetry properties of squares and circles, this leads to a considerably smaller number of phenotypes (47).
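The integer encoding can be illustrated with a short sketch. This is not the authors' Matlab implementation; it only assumes a 3 × 7 × 8 layout consistent with the 168 genotypes reported. The allele labels, the gene order, the 45° orientation step, and the helper name `decode` are hypothetical choices made for illustration.

```python
from itertools import product

# Illustrative allele labels; the source fixes only the number of alleles per gene.
SHAPES = ("circle", "triangle", "square")
COLORS = ("red", "orange", "yellow", "green", "cyan", "blue", "magenta")
N_ORIENTATIONS = 8  # assumed here to be equally spaced 45-degree steps


def decode(chromosome):
    """Map a (shape, color, orientation) integer triple to a readable phenotype."""
    shape_gene, color_gene, orientation_gene = chromosome
    return {
        "shape": SHAPES[shape_gene],
        "color": COLORS[color_gene],
        "orientation_deg": orientation_gene * 360 // N_ORIENTATIONS,
    }


# Enumerate every possible genotype in the solution space.
all_genotypes = list(
    product(range(len(SHAPES)), range(len(COLORS)), range(N_ORIENTATIONS))
)
print(len(all_genotypes))  # 168
print(decode((1, 2, 0)))
```

Several genotypes render identically (for example, every orientation of a circle), which is why the number of distinct phenotypes is smaller than the number of genotypes.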

#### **EVOLUTIONARY ALGORITHM**

When using binary genes in GA, mutation can be implemented using a random bit-flip, which rarely needs a constraint on the number of bit-flips that can occur in a single offspring. For features determined by several or multi-graded genes, such an implementation can result in mutated individuals which look very different from their parents, since the probability of any one bit (allele) in the chromosome being mutated is the same. This potentially disrupts the function of the mutation operator, which is primarily to perform a localized movement within the search space (Eiben and Smith, 2003). In the present experiment a form of geometric mutation (Moraglio and Togelius, 2007) was applied, which is a proximity weighted mutation operator. This means that the probability of any one allele mutating to a neighboring value is higher than the probability of it mutating to a distant one. In the present context a single mutation in any of the genes resulted in a change to one of its two nearest neighbors, each of which was equally likely, which would generate the smallest deviation within a feature, such as the most similar orientation or color of an object. The mutation for each given gene was determined independently by means of a "weighted coin toss" (i.e., true/false values are not equally likely), meaning that it was possible for multiple mutations to occur within the same chromosome, such as shape and color changing in a single evolution step.
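The nearest-neighbor mutation described above can be sketched as follows. This is a minimal illustration rather than the original code. It assumes that the weighted coin toss is applied per gene with the mutation rate of 0.1875 reported for this study, and that allele values wrap around (so the last allele neighbors the first); neither detail is stated in the text. The function name `mutate` is hypothetical.

```python
import random

ALLELE_COUNTS = (3, 7, 8)  # shape, color, orientation
MUTATION_RATE = 0.1875     # assumed here to apply independently to each gene


def mutate(chromosome, rng, rate=MUTATION_RATE):
    """Move each gene to one of its two nearest neighbors with probability `rate`."""
    mutated = []
    for gene, n_alleles in zip(chromosome, ALLELE_COUNTS):
        if rng.random() < rate:               # weighted coin toss for this gene
            step = rng.choice((-1, 1))        # both neighbors are equally likely
            gene = (gene + step) % n_alleles  # wrap-around is an assumption
        mutated.append(gene)
    return tuple(mutated)


rng = random.Random(1)
print(mutate((0, 3, 7), rng))
```

Because each gene is tested independently, one call can change several genes at once, matching the remark that shape and color may change in a single step.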

We used a tournament selection method, which has been suggested for situations where the fitness scores are noisy, as is typically the case in interactive evolutionary computation (Miller and Goldberg, 1995; Takagi, 2008) where humans provide the fitness scores rather than mathematical functions whose results are determined by the alleles in the chromosome. Tournament selection ensures that the fittest population members are most likely to be selected as parents in the reproduction process whilst still allowing weaker members to contribute to the variability of the population. In this case, for each offspring, tournaments of three randomly selected population members were created from which the two fittest members were chosen to act as parents.
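A tournament of this kind might look like the sketch below. It is an illustration under stated assumptions, not the authors' implementation: contestants are drawn without replacement, ties are broken by draw order, and the names `select_parents`, `population`, and `fitness` are hypothetical.

```python
import random


def select_parents(population, fitness, rng, tournament_size=3):
    """Draw `tournament_size` members at random and return the two fittest as parents."""
    contestants = rng.sample(range(len(population)), tournament_size)
    contestants.sort(key=lambda index: fitness[index], reverse=True)
    first, second = contestants[:2]
    return population[first], population[second]


population = [(0, 0, 0), (1, 2, 0), (2, 5, 4), (1, 1, 3)]
fitness = [0.10, 0.40, 0.05, 0.25]
print(select_parents(population, fitness, random.Random(0)))
```

Because only three members compete at a time, a weak individual can still become a parent whenever it is drawn alongside an even weaker one, which preserves variability in the population.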

A single evolutionary run comprised 10 generations, with two presentations of eight individuals for each generation. A steady state (fixed) population size of 36 members was used, which was sufficient due to the relatively small solution space. The initial population was generated using random genomes, and subsequent generations were evolved with a crossover rate of 0.75 and a mutation rate of 0.1875. Partial replacement, i.e., replacing a fraction of the genes from the parent population (75%), was used to preserve diversity across consecutive generations and to limit premature convergence. The mutation rate is relatively high for a GA, where rates of 0.05 are more common, but within the small solution space we did not want participants to lose interest as a result of seeing screen after screen of the same shape/color combinations.

#### **STIMULI**

Samples of eight individuals (colored shapes) were rendered and displayed, using a Cambridge Research Systems ViSaGe (Visual Stimulus Generator), on a 48 cm diameter CRT Monitor (Sony 17 Multiscan 17SF 1280 × 1024 pix) at a distance of 57 cm from the participant. All experimental software was written in Matlab R2007b, using a Windows XP system. In Kandinsky's original questionnaire, the area of the three shapes was not kept constant, but instead the height and width were. Because a constant area would result in triangles that were perceptually larger than the other shapes, we also used constant width and height (32 mm), resulting in areas of 805 mm<sup>2</sup> for a circle, 512 mm<sup>2</sup> for a triangle, and 1024 mm<sup>2</sup> for a square. We presented eight (genotypically) distinct individuals on each screen, but the same individual could be displayed more than once in conditions using more than one screen presentation for fitness evaluation. It should also be noted that due to the relatively small solution space, multiple individuals with the same phenotype frequently occurred in a single presentation. Individuals were displayed in a radial fashion, with their centers at 8° distance from the center of the screen. Thus, the stimuli were presented peripherally with respect to the central fixation cross-hair presented immediately prior to the stimuli. A white background was used and all individuals were rendered with a black outline (2 pixels wide), in order to approximate the white paper and black outlined shapes to be colored in from the original questionnaire used by Kandinsky (1923).
Luminance values for the seven different colors used to render the shapes on the screen were: red (RGB = 255|0|0, 18.0 cd/m<sup>2</sup>; CIE XYZ = 41.24, 21.26, 1.93), orange (RGB = 255|128|0, 47.4 cd/m<sup>2</sup>; CIE XYZ = 48.96, 36.70, 4.50), yellow (RGB = 255|255|0, 76.9 cd/m<sup>2</sup>; CIE XYZ = 77.00, 92.78, 13.85), green (RGB = 0|255|0, 58.9 cd/m<sup>2</sup>; CIE XYZ = 35.76, 71.52, 11.92), cyan (RGB = 0|255|255, 63.6 cd/m<sup>2</sup>; CIE XYZ = 53.81, 78.74, 106.97), blue (RGB = 0|0|255, 9.5 cd/m<sup>2</sup>; CIE XYZ = 18.05, 7.22, 95.05), and magenta (RGB = 255|0|255, 22.7 cd/m<sup>2</sup>; CIE XYZ = 59.29, 28.48, 96.98). The background (white) had RGB = 255|255|255, 86.3 cd/m<sup>2</sup>; CIE XYZ = 95.05, 100.00, 108.90.

Participants were presented with samples of eight colored shapes and instructed to look for the most aesthetically pleasing shape, the specific target, circle, square, or triangle being indicated via an onscreen instruction at the start of each 10-generation run. In all, eight conditions were presented, six of which are shown in **Figure 2**.



Two additional conditions, in which eight different rotations were used for single and mixed shapes, respectively, with the seven-color palette, were also tested in our experiments (as conditions D1 and D2), but the data are not shown in this paper because they added little to the observations made in the set presented here (for a full account, see Holmes, 2010). The experiment was run in two parts. Conditions A and B, in which the rotation gene was ignored during phenotype rendering, were completed by six participants with 12 evolutionary runs for each condition. Conditions C (and D), with the rotation gene activated, were completed by three participants with 12 evolutionary runs for each condition. An underlying population size of 36 individuals was used for all conditions.

#### **PROCEDURES**

Participants were presented with samples of eight colored shapes and instructed to look for the most aesthetically pleasing shape, the specific target, circle, square or triangle being indicated via an onscreen instruction at the start of each 10-generation run.

Participants were initially presented with a white screen with a central fixation cross-hair for 1000 ms. Samples of eight individuals from the population were then presented together for 1500 ms, after which a color noise mask (randomly generated 5 × 5 pixel blocks, 2.5 × 2.5 mm, of red, orange, yellow, green, cyan, blue, and magenta; see Holmes, 2010) was presented for 250 ms to neutralize retinal afterimages from the high contrast stimuli before the fixation cross was presented to start the next iteration.

#### **EYE-TRACKING AND FITNESS ESTIMATION**

A Cambridge Research Systems 50 Hz Video Eye-Tracker (CRS VET) was used with the CRS Matlab toolbox. Fixations were defined as periods of 100 ms or more during which the gaze location remained within a 2.5 × 2.5 mm window on the screen, i.e., within a 0.25° region of visual angle at a viewing distance of 57 cm.
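The conversion between on-screen size and visual angle quoted above can be checked with a few lines of arithmetic. The sketch uses the standard formula for the angle subtended by an object of a given size at a given distance; the function name `visual_angle_deg` is an illustrative choice.

```python
import math


def visual_angle_deg(size_mm, distance_mm):
    """Visual angle (degrees) subtended by an object of `size_mm` viewed from `distance_mm`."""
    return math.degrees(2 * math.atan(size_mm / (2 * distance_mm)))


# Fixation window of 2.5 mm viewed from 57 cm (570 mm).
window = visual_angle_deg(2.5, 570)
print(round(window, 3))  # 0.251
```

At a viewing distance of 57 cm, 1 cm on the screen subtends very nearly 1°, which is why this distance is a common choice in vision experiments.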

After presentation of each screen, the gaze data was analyzed as follows, in order to derive an eye tracking signature that was simply cumulative fixation time. (i) Any positional information which did not fulfill the criteria for a fixation was removed as well as any fixation which lay outside of a non-overlapping area of interest extending 2.5 mm beyond the perimeter of each individual rectangle (phenotype). (ii) The total amount of time spent fixating in each zone was then calculated and divided by the screen presentation time to give a fitness score for each phenotype in the range 0.0–1.0. (iii) In the cases where an individual was presented on multiple screens in a single generation, the fitness scores for each presentation were averaged to produce a single fitness score for the phenotype. It is important to note that all fixations for the zone enclosing the entire phenotype contribute to the fitness of that phenotype; fixations on individual features within the phenotype were not distinguished meaning that the fitness is truly based on the interaction of genes resulting in the phenotype and not the individual genes themselves, as is typical for most EA (Takagi, 2001).
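Steps (i) to (iii) can be sketched as below. This is a simplified illustration, not the original analysis code: fixations are assumed to have been detected already and are given as hypothetical `(x_mm, y_mm, duration_ms)` tuples, zones are axis-aligned rectangles, and the names `screen_fitness` and `average_fitness` are invented for this example.

```python
PRESENTATION_MS = 1500  # screen presentation time reported in the procedure


def screen_fitness(fixations, zones, presentation_ms=PRESENTATION_MS):
    """Return cumulative fixation time per zone as a fraction of presentation time.

    fixations: list of (x_mm, y_mm, duration_ms) tuples
    zones: dict mapping a stimulus id to (x_min, y_min, x_max, y_max) in mm
    """
    totals = {zone_id: 0.0 for zone_id in zones}
    for x, y, duration in fixations:
        for zone_id, (x0, y0, x1, y1) in zones.items():
            if x0 <= x <= x1 and y0 <= y <= y1:
                totals[zone_id] += duration
                break  # zones do not overlap, so at most one can match
    # Fixations outside every zone are simply ignored (step i).
    return {zone_id: t / presentation_ms for zone_id, t in totals.items()}


def average_fitness(scores):
    """Average the scores of one individual shown on several screens (step iii)."""
    return sum(scores) / len(scores)


zones = {"a": (0, 0, 37, 37), "b": (90, 90, 127, 127)}
fixations = [(10, 10, 300), (10, 12, 450), (100, 100, 200), (500, 500, 150)]
scores = screen_fitness(fixations, zones)
print(scores["a"])  # 0.5
```

The 37 mm zone width in the example corresponds to a 32 mm stimulus plus a 2.5 mm margin on each side.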

Because the number of individuals presented to the participant was smaller than the size of the population, the fitness of individuals in the population that had not been presented for evaluation was estimated prior to the reproduction step, using the following procedure. For any one generation, let *x<sub>i</sub>* represent the *i*-th population member, with an associated fitness *f*(*x<sub>i</sub>*) defined as the average amount of time spent fixating on that individual, normalized across the presented individuals. Then

$$f(x_i) = \frac{T(x_i)/N(x_i)}{\sum_{j} T(x_j)/N(x_j)}$$

where *T*(*x<sub>i</sub>*) is the total amount of time spent fixating on the *i*-th population member, *N*(*x<sub>i</sub>*) is the total number of presentations of the *i*-th population member, and the sum in the denominator runs over the presented population members.
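The normalization in the fitness formula can be illustrated numerically. This is a sketch under stated assumptions: the sum is taken over presented individuals only (un-presented individuals have no presentations to average over), and the zero-denominator fallback is an addition for robustness that the text does not discuss. The function and variable names are hypothetical.

```python
def normalized_fitness(total_time, n_presentations):
    """Mean fixation time per presentation, normalized to sum to 1 over presented individuals."""
    mean_time = {key: total_time[key] / n_presentations[key] for key in total_time}
    denominator = sum(mean_time.values())
    if denominator == 0:
        # Assumption: if nothing was fixated at all, assign zero fitness everywhere.
        return {key: 0.0 for key in mean_time}
    return {key: value / denominator for key, value in mean_time.items()}


total_time = {"a": 600, "b": 300, "c": 100}   # ms of fixation, summed over screens
n_presentations = {"a": 2, "b": 1, "c": 1}    # number of screens each appeared on
fitness = normalized_fitness(total_time, n_presentations)
print(fitness)
```

Here individuals "a" and "b" receive equal fitness (3/7 each) because they attracted the same mean fixation time per presentation, even though "a" accumulated more time in total.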

Once all samples had been presented for one generation, the fitness of each un-presented individual was estimated. Let *H*(*x<sub>i</sub>*, *x<sub>k</sub>*) be the "Euclidean distance" between the *i*-th (un-presented) individual and the *k*-th (presented) individual. This is calculated for *x<sub>i</sub>* for all presented *x<sub>k</sub>*, and the method of least squares is then used to estimate the equation of the line which best describes the points [*H*(*x<sub>i</sub>*, *x<sub>k</sub>*), *f*(*x<sub>k</sub>*)], which can then be used to estimate the fitness of the un-presented individual as follows:

$$\hat{f}(x_i) = \alpha H(x_i, x_i) + \beta$$

Since the Euclidean distance of an individual from itself is always zero, β gives the estimated fitness of the un-presented *i*-th population member. This process was repeated for all un-presented population members, *i*, before the selection and reproduction stages of the algorithm.
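The estimation step amounts to fitting a straight line to fitness as a function of genotype distance and reading off its intercept. The sketch below is one possible reading, not the authors' code. It assumes that the distance is computed directly on the integer gene values without wrap-around, and it falls back to the mean fitness when all presented individuals are equally distant (a case the text does not address). All names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two chromosomes treated as integer vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def estimate_fitness(target, presented):
    """Estimate the fitness of an un-presented chromosome.

    presented: list of (chromosome, fitness) pairs for individuals that were shown.
    Returns the intercept (beta) of the least-squares line fitness = alpha * H + beta.
    """
    distances = [euclidean(target, chromosome) for chromosome, _ in presented]
    scores = [score for _, score in presented]
    n = len(presented)
    mean_h = sum(distances) / n
    mean_f = sum(scores) / n
    spread = sum((h - mean_h) ** 2 for h in distances)
    if spread == 0:
        return mean_f  # assumption: the slope is undefined, so use the mean fitness
    alpha = sum((h - mean_h) * (f - mean_f) for h, f in zip(distances, scores)) / spread
    return mean_f - alpha * mean_h  # beta, the predicted fitness at distance zero


presented = [((1, 0, 0), 0.3), ((2, 0, 0), 0.2), ((3, 0, 0), 0.1)]
print(estimate_fitness((0, 0, 0), presented))
```

In this example fitness falls by 0.1 per unit of distance, so extrapolating back to distance zero gives an estimate of about 0.4. Note that an extrapolated intercept is not guaranteed to lie within the range 0.0 to 1.0; how such values were handled is not described in the text.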

#### **PARTICIPANTS**

Six participants were recruited from the Psychology Department of Royal Holloway University of London, and received no payment for completion of the experiment. The experiments comply with general ethical procedures, and had been approved by the local ethics committee. All participants confirmed that they had normal or corrected-to-normal vision and no color deficiency, and provided written consent.

#### **RESULTS**

#### **GROUP RESULTS**

Similar to our previous work on aesthetic preferences, such as our study on the Golden Ratio (Holmes and Zanker, 2008), choices of individual participants could vary considerably, suggesting that a much larger sample would be needed to collect evidence about the existence of systematic effects and to discern their significance with sufficient statistical power. However, the results even from the small group of participants tested in the current experiment do suggest some degree of individual color/shape correspondence. Throughout this section, stacked bar charts are used to illustrate the average proportions of particular colors, for a given shape and orientation and at any given generation in the evolutionary run, which are found as phenotypes in the stimulus population. In the presence of any selection pressure that would favor a particular design, these proportions would deviate from those of randomly populated samples of feature combinations. For example, in the single shape, single orientation, three-color condition (A1) with a population size of 36 individuals, the solution space has size three (colors) for each given shape, which should lead to 1/3 (33%) of observations, or 36/3 = 12 individuals, for each phenotype (here color) in an unevolved population (such as the initial, randomly selected, generation). In the course of evolutionary change, these proportions would change, and the population would contain a larger proportion of the "preferred" phenotype, whilst the proportion of other phenotypes would decrease correspondingly.

The three upper panels illustrating condition A1 in **Figure 3** show the proportion of red, blue, and yellow individuals in a population that only contains circles, triangles, or squares, respectively. These are shown for the initial (randomly selected) generation 1, and generations 2, 4, 6, 8, and 10, which were evolved using the methods described in section Methods. As expected, in the first generation for each shape the populations appear to be well balanced with about equal proportions of all three colors. There is a slight trend for yellow circles to increase in proportion and for blue circles to decrease, and a much more prominent one for yellow triangles to increase at the expense of red, whilst for squares there is no consistent trend. The three lower panels illustrating condition A2 in **Figure 3** show the proportion of red, blue, and yellow individuals of one particular shape in a population that contains mixtures of circles, triangles, and squares that can take any of the three colors, for generations 1, 2, 4, 6, 8, and 10, as in A1. The first observation to make for this condition is that the initial generation not only shows equal proportions of all three colors (as in A1) but additionally is reduced to a cumulative proportion for all colors of ~33% for each of the three shapes, again as expected, because the rest of the randomly generated population should be made up of the two other shapes. The overall proportion of the selected shape grows from this level to 70–80% in generation 10 for each of the shapes, as a result of participants complying with the task to choose their preferred color combinations for this particular shape, at the expense of reducing the proportion of alternative shapes of any color. It is important to note that even at the end of the evolutionary run the population does not reach a 100% level of the selected shape because the genes for other shapes are not completely eliminated from the population, although it is clear that other shapes are substantially reduced (i.e., the participants complied with the task, looking for the right shape). Within each of the shape panels for condition A2, we find similar but not identical trends to those observed in condition A1: a substantial growth of yellow and red for circles, a considerable increase in yellow for triangles, and a substantial increase of blue and modest increase of red in squares.

**FIGURE 3 | Evolution of populations with three shapes × three colors. (A1)** three shapes are tested in separate populations containing a single shape each (i.e., participants are asked to select the preferred individual from three color-shape combinations with identical shape). **(A2)** three shapes are tested in mixed populations containing all shapes (i.e., participants select the preferred color-shape combination of a given shape in the presence of both other shapes, combined with the same three colors: nine color-shape combinations). Each panel shows for one particular shape (circle, triangle, square) the development of color proportions of red, yellow, and blue individuals (shown as stacked bars) in an evolutionary run, displayed for generations 1, 2, 4, 6, 8, and 10 (abscissa). Because the mixed shape population in **(A2)** contains other shapes as well, the proportions for each individual shape do not add up to 100%. Averages from *n* = 6 participants.

In general, it is interesting to observe that the presentation of mixed shapes results in clearer and more distinct preferences for each of the three shapes (red circles, yellow triangles, and blue squares) than was the case when the shapes were presented in isolation, suggesting that the mixed condition perhaps best replicates Kandinsky's experiment, since participants seem to have associated a single color with each shape. Two additional experiments with different search spaces, in which each shape was either presented in isolation or in a mixture of all three shapes throughout the generations, were designed to find out whether such a trend can be generalized across a range of conditions.

So far, our experiments were restricted to the three "primary" colors used by Kandinsky in his work (red, yellow, and blue), which does not speak to any preference outside this very restricted range of choices, or to the presence of other potential candidates. The strength of our current experimental method is its ability to probe large "solution spaces," i.e., to find the preferred coloration from a larger color palette. The results of two tests, during one of which each shape was presented in isolation, and the other containing a mixture of all three shapes throughout the generations, are summarized in panels B1 and B2, respectively, of **Figure 4**.

The first impression of the overall figure, being very colorful, seems to suggest that there is not a dominating subset of preferred colors, but a rather mixed set of choices that can develop a preference for certain combinations over time. For instance, in the left two panels, for circles, one can see a weak preference for red developing that mirrors that observed in **Figure 3** for the minimal color palette, but there is no similarity for yellow preference. It should be noted, however, that a set of colors defined by yellow and its two neighbors (orange and green) occupies more than half of the final population and that the two colors furthest away from red and yellow (blue and cyan) end up with the smallest probabilities in the single shape condition, whereas in the mixed shape condition a strong dominance of a yellow-orange-red group develops. Similarly, the final population of triangles is dominated by yellow and its two neighbors, orange and green, in both conditions, and squares lead to a cyan-blue-purple dominance that in a wider sense reflects the preferences in the restricted color palette. In conclusion, some level of color preference does develop in both experiments (A and B), but the preference seems not to be narrowly restricted to a particular shade of color (which could relate to Kandinsky's initial thoughts about color and angles, see Kandinsky, 1926), but rather to be linked to a broad range of similar colors on the color circle.

**FIGURE 4 | Evolution of populations with three shapes × seven colors. (B1)** three shapes are tested in separate populations containing a single shape each (i.e., participants are asked to select the preferred individual from seven color-shape combinations with identical shape). **(B2)** three shapes are tested in mixed populations containing all shapes (i.e., participants select the preferred color-shape combination of a given shape in the presence of both other shapes, combined with the same set of seven colors: 21 color-shape combinations). Each panel shows for one particular shape (circle, triangle, square) the development of color proportions of red, orange, yellow, green, cyan, blue, and magenta individuals (shown as stacked bars) in an evolutionary run, displayed for generations 1, 2, 4, 6, 8, and 10 (abscissa). Averages from *n* = 6 participants.

Whereas experiment B1/B2 allowed us to expand the observations of preferred colors beyond the restricted set of three primary colors used by Kandinsky, we can also ask a corresponding question about the constraints on the shapes that have been associated with colors. Still keeping the limited set of three very basic shapes used by Kandinsky, we made an initial step of expanding the search space in this respect by a further set of evolutionary runs (with three participants) allowing different orientations of these shapes with respect to the vertical. For instance, the apex of the triangle could be rotated such that it points in eight different directions (cf. **Figure 2**), thus increasing the size of the solution space considerably for this configuration. Because corresponding rotations only lead to two distinct appearances of squares (orthogonal and oblique squares, respectively) and identical phenotypes for circles, due to their specific properties with respect to rotation symmetry, we focus here on the results for triangles (which in our preceding experiments also showed the strongest baseline color-shape association). The results of two tests for this shape, during one of which each shape was presented in isolation, and the other containing a mixture of all three shapes throughout the generations, are summarized in panels C1 and C2, respectively, of **Figure 5**.

The two panels on the left side of **Figure 5** illustrate the time course of evolution in the presence of an orientation gene, showing the average proportions of the three colors in triangles, irrespective of the orientation of the shape chosen by the participants. The striking similarity of these two diagrams with those shown in **Figure 3** for triangles (the middle panels in A1 and A2) clearly demonstrates that the additional freedom to select an orientation did not change the color preferences associated with the three basic shapes. The two panels on the right side of **Figure 5** show the proportions of colors associated with each triangle orientation in the final generation of the evolutionary runs for each of the eight orientations separately. Because the overall number of individuals was split into eight subsets for this analysis (note the different scaling on the ordinates), the data are much noisier than those in the previous figures. There might be a weak tendency of overall preference for upright orientations (compare stacked bar sizes around 0° with those around 180°), and as a result of this the preference for yellow seems to be more pronounced in these orientations. However, there is no clear and systematic association of any particular color with any particular orientation of the triangles, suggesting that the preferred association of yellow with triangles is a genuine property of the shape rather than being related to its particular orientation.

**FIGURE 5 | [...] colors × eight orientations. (C1)** shapes are tested in separate populations containing a single shape each, data shown for triangle (i.e., participants are asked to select the preferred triangle from 24 color-orientation combinations). **(C2)** all three shapes are tested in mixed populations containing all shapes (i.e., participants select the preferred color-shape combination of a given shape presented at any one of eight orientations in the presence of both other shapes, combined with the [...] development of color proportions of red, yellow, and blue individuals (shown as stacked bars), averaged for all orientations, in an evolutionary run, displayed for generations 1, 2, 4, 6, 8, and 10 (abscissa). The panels on the right side show the proportions of red, yellow, and blue individuals in the final (10th) generation in an evolutionary run, for each of eight stimulus orientations (abscissa). Averages from *n* = 3 participants.

#### **VARIABILITY BETWEEN SINGLE PARTICIPANTS**

It is an important question whether the existence or absence of preferences for particular shape-color combinations, and the strength of any preference, in the color proportions at group level shown in section Group Results, is the result of individual variations (i.e., different but pronounced color preferences for different participants) or a property of the association between color and shape itself (i.e., variability of individual decisions, or gene frequencies within the same participant). It is important to keep in mind that an inherent, and crucial, feature of the genetic algorithm itself is to sustain variability in the population by selection and mutation. Whilst this means that we have to expect genetic diversity in any population generated by an individual participant in our experiments, inter-individual differences in color proportions would still be reflected by convergence toward different color proportions for different individuals. This question will be followed up by looking at some individual data from the same experiments in more detail. In **Figure 6** we show for each of the three shapes (shown in separate panels) the proportions of colors in condition A2 (three colors, mixed shapes) in the final population, separately for each of the six participants (columns with labels), together with the group averages (column seven at the right of each panel).
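To make concrete how selection and mutation can jointly drive convergence while keeping some diversity in the population, the following is a minimal sketch and not the implementation used in these experiments. The function names, the population size of 60, the mutation rate, the uniform crossover, and the simulated observer (who simply looks longer at yellow items) are all assumptions made for illustration.

```python
import random
from collections import Counter

COLORS = ("red", "yellow", "blue")
SHAPES = ("circle", "triangle", "square")


def next_generation(population, fitness, rng, mutation_rate=0.1):
    """Breed a new population of (color, shape) chromosomes.

    Parents are drawn with probability proportional to fitness (selection);
    each gene of a child may then be replaced at random (mutation).
    """
    # Small floor so that unfixated individuals can still be drawn occasionally.
    weights = [f + 1e-6 for f in fitness]
    children = []
    for _ in range(len(population)):
        mother, father = rng.choices(population, weights=weights, k=2)
        # Uniform crossover: each gene is inherited from either parent.
        color = rng.choice((mother[0], father[0]))
        shape = rng.choice((mother[1], father[1]))
        if rng.random() < mutation_rate:
            color = rng.choice(COLORS)
        if rng.random() < mutation_rate:
            shape = rng.choice(SHAPES)
        children.append((color, shape))
    return children


def color_proportions(population):
    """Fraction of the population carrying each color."""
    counts = Counter(color for color, _ in population)
    return {c: counts[c] / len(population) for c in COLORS}


rng = random.Random(0)
population = [(rng.choice(COLORS), rng.choice(SHAPES)) for _ in range(60)]
for _ in range(10):
    # Hypothetical observer: looks longer at yellow items than at others.
    fitness = [0.8 if color == "yellow" else 0.2 for color, _ in population]
    population = next_generation(population, fitness, rng)
print(color_proportions(population))
```

In a sketch like this, two simulated observers with different looking biases would converge toward different color proportions, whereas mutation keeps a residue of non-preferred colors in every final population.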

The amount of inter-individual variability is immediately apparent when looking at any of the data panels: for instance, for circles, one participant expresses a strong preference for yellow, two others for red, and the three remaining participants end up with a range of colors in their final populations. There is also considerable variation in the proportion of other shapes remaining in the final population, as expressed by the overall height of the stacked bar: in participant P6, more than 60% of the final population is made up of triangles and squares, despite the instruction to participants to look for their preferred circles. Comparing the results from different shapes, we find participants with distinct color-shape associations (such as P5 with red-yellow-blue dominance for circle-triangle-square), others with preferred colors (such as P3 with red-yellow-red, or P1 with yellow-yellow-blue), and others with very few distinct preferences (such as P4).

Observing in **Figure 6** distinct, but not identical, preferences for particular color-shape associations in some participants raises another interesting question, about the internal consistency of variations: is an individual preference a reproducible association that would be conserved over extended periods of time, or just a spontaneous preference arbitrarily expressed at the moment in time when this experiment was carried out? A first answer to this question arises from the comparison of experimental data of participants across different experiments that could have been separated by many weeks. Three of our participants were tested in all of the experimental conditions reported here, and therefore offer a good data set for such a comparison, which we restrict to the conditions involving the combination of three colors and three shapes that are straightforward to compare. In **Figure 7** we show for each of the three shapes (shown in separate panels) the normalized proportions of colors in conditions A1, A2 (individual, mixed shapes, upright orientation) and C1, C2 (individual, mixed shapes, pooled across eight directions) in the final population, separately for the three participants (columns for four conditions shown in a block for each participant).

A general inspection of **Figure 7** gives a clear impression of color patterns in the bars, which vary considerably between different panels (i.e., shapes) and between different participants (blocks within each panel), but exhibit an impressive resemblance with each other for a given participant across conditions (within a block of bars). For example, for squares (right panel of **Figure 7**) in each of the respective bars there is a clear dominance of blue for participant P1, of yellow for participant P4, and of red for participant P3. This suggests a considerable persistence of individual color preferences for shapes (i.e., low intra-individual variation) in the presence of substantial differences of color preferences between individuals (i.e., high inter-individual variation).

#### **DISCUSSION**

GAs are commonly used in engineering applications as an optimization tool and have been explored in our previous work as a powerful method for studying human decision making, by using the subjective responses of human observers to visual stimuli to evolve a preferred stimulus. In particular, in our initial work (Holmes and Zanker, 2008) conventional key press responses for the selection of the preferred designs, in that case aspect ratios of simple rectangles, were replaced by selection based on the amount of time spent fixating the individual rectangles, which was recorded using an eye-tracker. This method was expanded by subsequent experiments (Holmes and Zanker, 2012) which developed an oculo-motor signature based on several aspects of eye movements toward and between targets which provide a more specific reflection of choice, and was tested with a wider range of stimuli rich in a number of image attributes, supporting optimization across much larger solution spaces. An additional advantage, which is crucial for the current study, arises from the possibility to investigate choice preferences at the individual level rather than being restricted to choice frequencies determined for groups of observers.

**FIGURE 7 | Color proportions in final populations for three participants in four different conditions (as shown as averages in A1, A2 and C1, C2 in Figures 3, 5).** Each panel shows for one particular shape (circle, triangle, square) the proportions of red, yellow, and blue population members (shown as stacked bars, ordinate; to facilitate the comparison across conditions all data are shown here normalized as percentages of a particular shape in each of the three colors) in the final generation of the evolutionary runs, displayed for each of the three participants (P1, P4, P3) as a block of four conditions (abscissa).

The present work uses this technique to generate profiles of individual and group preferences and confirms that the GDEA methodology seems sound when extending beyond single gene and monochromatic phenotypes, which is clearly an important step when using it to evaluate questions of aesthetics. It is clear from **Figures 3**–**5** that the evolutionary algorithm develops smoothly and consistently from random choices (i.e., balanced proportions of colors) in generation one through consecutive generations to characteristic preferences expressed in the final generation. Our conclusions are therefore focused on the associations between colors and shapes expressed in the final populations, which are the result of testing and retesting the preferential looking that eliminates the effects from onscreen position or other concurrently presented color/shape combinations. It should be emphasized again that these experiments show that the GDEA in a multi-dimensional solution space has the potential to rapidly identify robust individual aesthetic preferences. This method of evaluating several stimuli in a single presentation exploits the ability of participants to perform multi-dimensional comparisons quickly, which has a direct impact on their eye movements, allowing the relative fitness of multiple stimuli to be evaluated simultaneously. A typical evolutionary run with 10 generations and 2 stimulus presentations per generation would require 20 presentations. In the current experiment this was sufficient to explore a "solution space" representing between 9 (3 colors, 3 shapes) and 168 (7 colors, 3 shapes, 8 orientations) different stimulus configurations. When testing the same set of stimuli in a two-alternative forced choice (2AFC) experiment with 10 repetitions (to get a preference measure), the number of required stimulus presentations grows approximately with the square of the number of stimuli to be compared with each other, from 9 × 8 × 10 = 720 to 168 × 167 × 10 = 280,560, illustrating how "economical" an EA can be when exploring large solution spaces (cf. Holmes and Zanker, 2008). The Building Block Hypothesis (Goldberg and Holland, 1988) goes some way to explaining how an EA performs multidimensional searches so efficiently: it effectively tests interactions across all the dimensions simultaneously with each phenotype, unlike preferential looking paradigms, including 2AFC methods, which typically limit the number of dimensions being varied between each pair of stimuli to one. This results in a localisation of the region in the solution space that attracts strongest interest much faster than would be possible using manual selection or sequential presentation in a 2AFC paradigm combined with a more conventional means of varying the stimuli such as the interleaved staircase (Cornsweet, 1962).
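The presentation counts quoted above can be reproduced with a few lines of arithmetic. The sketch below only restates the figures given in the text; the function names are hypothetical, and the pairwise count follows the n × (n − 1) × repetitions form used in the text rather than any particular experimental design.

```python
def ea_presentations(generations: int, presentations_per_generation: int) -> int:
    """Total displays shown in one evolutionary run."""
    return generations * presentations_per_generation


def pairwise_presentations(n_stimuli: int, repetitions: int) -> int:
    """Displays needed when every ordered pair of stimuli is shown `repetitions` times.

    This mirrors the n * (n - 1) * repetitions count quoted in the text.
    """
    return n_stimuli * (n_stimuli - 1) * repetitions


small = 3 * 3        # 3 colors x 3 shapes
large = 7 * 3 * 8    # 7 colors x 3 shapes x 8 orientations

print(ea_presentations(10, 2))            # 20
print(pairwise_presentations(small, 10))  # 720
print(pairwise_presentations(large, 10))  # 280560
```

The evolutionary run stays at 20 presentations regardless of the size of the solution space, whereas the pairwise count grows roughly quadratically with the number of stimulus configurations.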

The key benefits of GDEA (the fast sampling of a huge stimulus space, and the flexibility in testing participants without the need for them to know or understand the experimental question) make this an attractive methodology to be explored in many different contexts, such as testing human infants or even animal studies, where description of a task to the participant is simply not possible. Furthermore, it clearly lends itself to be applied to the experimental investigation of aesthetics, which has been targeted in the present study with a well-known question from the Bauhaus arts school. Because verbal instructions are needed to accompany the GDEA or in *post-hoc* controls to differentiate between aesthetic choices and unspecific responses, for instance based on stimulus saliency, the method is obviously restricted to investigating preference as such in infant or animal studies.

Fechner's (1860) methods of choice, use and production have been used with mixed results to study the question of preferred associations between shapes and colors. Kandinsky's (1923) own attempt used the method of production by asking students to color in three particular shapes. A more recent attempt to reproduce his results with a wider range of shapes and colors (Albertazzi et al., 2013) used the method of choice (task: "choose a color from the circle that you see as the one most naturally related to the shape") and led to some distinct preferences that, however, were only partially consistent with Kandinsky's claims. Another study, using an "Implicit Association Test," which can also be categorized as a method of choice, did not produce any significant preferences (Makin and Wuerger, 2013). As we know from another experimental study of aesthetic preference, investigating bias toward the golden ratio, different results can arise from using methods of choice or production, respectively (Green, 1995). Most importantly, the choice method seems to be particularly vulnerable to the range of choices presented, with a tendency to cluster around the center of the range. The GDEA represents a new methodology for exploring such questions, as it combines the methods of choice and production: phenotypes are produced from other phenotypes which have been previously selected by participants' choices. By removing any high-level demand for making an aesthetic decision or carefully considered action, it opens an opportunity to immediately access observers' preferences. In particular, the chances of a participant encountering their preferred phenotype are often relatively small and so it must be produced through the recombination of other members of the population. Most interestingly, the only strong association observed in our experiments is a preference for yellow triangles, which resonates with the suggestion by Albertazzi et al. (2013) that the "warmth" and degree of "natural lightness" of hues are related to particular shapes.

In our previous work, we always maintained a one-to-one relationship between the phenotype (the visual stimulus) and the chromosome (the genetic representation of the stimulus): each chromosome exactly defined a unique phenotype and each phenotype could only be represented by one chromosome. In conditions C1/C2 of the present set of experiments, as well as conditions D1/D2, which are described in detail in Holmes (2010), we attempted to introduce a rotation gene which changed this genotype-phenotype relationship to become one-to-many (one phenotype represented by many chromosomes) because of the rotation symmetry properties of some shapes. For example, the rotation gene could contain any value and it had no effect on the phenotype for circles. In fact, the difference in perceivable changes as a result of genetic changes is an important consideration when using the GDEA. By introducing inequalities in the number of distinct genetic representations for a single phenotype, biases are introduced in the fitness estimation unless steps are taken to recognize that different genetic representations are effectively identical from the participant's perspective. This is because the looking preferences of the participant in a one-to-one mapping can be directly attributed to the individual chromosome, whereas in a many-to-one mapping, the fitness scores need somehow to be combined from similarly appearing phenotypes and attributed to all related chromosomes. One option here would be to introduce some kind of template match or image recognition component to the algorithm as part of the fitness estimation process. An alternative approach would be to represent the features which introduce the one-to-many relationships within the chromosome, rotational symmetry in this case. This way the relationship between fitness and rotational symmetry could be explored independently of the shape it is associated with.
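One way of combining fitness scores across chromosomes that look identical to the participant can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the experiments: orientations are assumed to be sampled in 45° steps, circles are treated as rotation-invariant, squares as repeating every 90°, triangles as distinct at all eight orientations, and averaging is only one of several conceivable pooling rules. The function names `phenotype` and `pooled_fitness` are hypothetical.

```python
from collections import defaultdict

# Assumed rotational periods in degrees for orientations sampled in 45-degree
# steps: a square looks the same every 90 degrees; the triangle is treated as
# distinct at all eight orientations, as in the text.
PERIOD_DEG = {"square": 90, "triangle": 360}


def phenotype(chromosome):
    """Map a (color, shape, angle_deg) chromosome to what the observer sees."""
    color, shape, angle = chromosome
    if shape == "circle":
        return (color, shape, 0)  # rotation is invisible for a circle
    return (color, shape, angle % PERIOD_DEG[shape])


def pooled_fitness(chromosomes, scores):
    """Average raw scores over chromosomes that share a phenotype."""
    groups = defaultdict(list)
    for chrom, score in zip(chromosomes, scores):
        groups[phenotype(chrom)].append(score)
    return [
        sum(groups[phenotype(c)]) / len(groups[phenotype(c)]) for c in chromosomes
    ]


chromosomes = [
    ("red", "circle", 0),
    ("red", "circle", 135),
    ("blue", "square", 45),
    ("blue", "square", 315),
    ("yellow", "triangle", 0),
    ("yellow", "triangle", 180),
]
raw = [0.10, 0.30, 0.20, 0.40, 0.50, 0.05]  # hypothetical gaze-based scores
print(pooled_fitness(chromosomes, raw))
```

With this mapping the two red circles and the two oblique blue squares each share one pooled score, while the two triangle orientations keep their own scores.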

It is important to be aware of the potential for a particular task specification to affect the behavior of the participant. In our current experiments participants were asked to look for their favorite circle, square or triangle. The explicit statement of the shape to be searched for was simply to ensure an equal number of trials were completed by each participant in the mixed shape conditions. However, asking participants to search for their favorite triangle, for example, in the mixed shape conditions b and d, immediately informed them that the other shapes were not of interest, potentially causing them to be regarded as distracters rather than be evaluated as other shapes which might be more appealing in the color currently being associated with a triangle, for example yellow. This is an example of where the use of a specific task instruction for experimental control potentially diminishes the power of the GDEA approach to experimental aesthetics which samples the entire solution space allowing the participants to freely explore the different phenotypes resulting from different gene interactions. Here the exploration was directed using a question of shape preference, but could equally have been performed using a color preference task in which participants were instructed to look for their favorite red shape, for example, and thus selecting the shape which looks best in red. The mixed color conditions would result in highly salient distracters which could easily be eliminated, suggesting a better task for exploring aesthetic preference would have been simply to look for the favorite shape (unspecified) in all conditions and use non-parametric methods to analyse the data. This relationship between the task and the genome is important particularly when the task becomes one of free-viewing since the genome must not re-introduce the biases removed by the unguided task.

In the current experiments, we simply used the ratio of cumulative fixation time (gaze positions that remained for more than 100 ms within a given region with less than ±0.25° of movement) to presentation time to determine the fitness score for each individual target in a given display. Such a method is susceptible to salience effects, attracting gaze toward the most obviously visible object in a scene, rather than being especially sensitive to objects that are associated with evaluation behavior (which, for instance, could lead to re-visits of individual items). In subsequent experiments (Holmes and Zanker, 2012) a more specific oculo-motor signature has been further developed to look at the full time-course of fixations. However, the results in the current study suggest that salience alone was not driving the responses: if that were the case, yellow would always have driven preference, which raises the question of why this preference is so much stronger for the triangle than for the other shapes. Therefore, our experiments provide evidence of individual color-shape correspondence underlying the preferences observed here, rather than simply being driven by salience, color, or luminance effects.
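The fitness measure described above (cumulative fixation time divided by presentation time) can be sketched as follows. The data layout is an assumption: each gaze sample is taken to be a tuple of time in ms, horizontal and vertical position in degrees, and the name of the target region under the gaze point. Measuring drift relative to the first sample of a run is also an assumption, as are the function names; the 100 ms and ±0.25° thresholds are the values given in the text.

```python
def fixation_durations(samples, max_drift_deg=0.25, min_duration_ms=100.0):
    """Sum qualifying fixation time per region.

    `samples` is a time-ordered list of (t_ms, x_deg, y_deg, region) tuples,
    where `region` names the target under the gaze point (or None).
    A fixation is a run of consecutive samples in one region that stays within
    +/- max_drift_deg of the run's first sample and lasts longer than
    min_duration_ms.
    """
    totals = {}
    start = 0
    for i in range(1, len(samples) + 1):
        run_ended = i == len(samples)
        if not run_ended:
            _, x0, y0, r0 = samples[start]
            _, x, y, r = samples[i]
            run_ended = (
                r != r0
                or abs(x - x0) > max_drift_deg
                or abs(y - y0) > max_drift_deg
            )
        if run_ended:
            t0, _, _, r0 = samples[start]
            duration = samples[i - 1][0] - t0
            if r0 is not None and duration > min_duration_ms:
                totals[r0] = totals.get(r0, 0.0) + duration
            start = i
    return totals


def fitness_scores(samples, presentation_ms):
    """Fitness per target: cumulative fixation time over presentation time."""
    return {
        region: total / presentation_ms
        for region, total in fixation_durations(samples).items()
    }


# Synthetic gaze trace sampled every 20 ms (hypothetical values).
samples = (
    [(t, 1.0, 1.0, "A") for t in range(0, 201, 20)]      # 200 ms on target A
    + [(t, 5.0, 1.0, "B") for t in range(220, 281, 20)]  # 60 ms glance at B
    + [(t, 1.1, 1.0, "A") for t in range(300, 601, 20)]  # 300 ms back on A
)
print(fitness_scores(samples, presentation_ms=1000.0))
```

In this trace the brief glance at target B falls below the 100 ms criterion and contributes nothing, so only target A receives a fitness score.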

Notwithstanding such limitations, the current work has shown that there is a strong correlation between the eye-movements made during decision making, and the decisions themselves. Additional work (Holmes and Zanker, 2012) using the developed oculo-motor signature that included some tests of color-shape associations (which, however, was only exploratory and should be corroborated with a larger study and the full signature), corroborates our results based on correlations between eye-movements and conscious preference. The ability of the developed subjective fitness function to predict preference needs to be established using a new set of stimuli in experiments without the explicit preference task, in order to validate whether it can be generalized to other aesthetic evaluations, which are not guided by direct instructions to participants. Various steps toward this goal by using the extended GDEA and novel stimuli have been reported elsewhere (Holmes et al., 2010; Holmes and Zanker, 2011).

#### **CONCLUSIONS**

Taken together and keeping the limitations of the current data in mind, our present results support a view that there is a certain degree of correspondence between color and shape in all participants, and that particular preferences are reproducible for individuals. Whilst individual combinations are not necessarily consistent with Kandinsky's reported correspondences, our findings do suggest that aesthetic preference for more complex color-shape combinations as seen in art, design, and packaging might be influenced by such associations. Extending the color palette disrupts the robust preference for yellow triangles observed in the three color condition, suggesting that to some extent the constraints of Kandinsky's original experiment biased his results, as his students were not allowed the freedom to repeat the use of a single color for multiple shapes, or use additional colors, although our small sample size does not allow us to draw a conclusion with any degree of confidence. Thus, the interpretation that, for example, triangles are most preferred when they are yellow should be treated with caution. Kandinsky believed that preferences such as those for the yellow triangle were related to the characteristics of the angles that define the shapes, and at the same time could have resulted from religious iconography and were directly related to its pointing upward to the Sun and God (Kandinsky, 1923, 1926). Interestingly, changing the orientation of the triangle seemed to have no clear effect on its associated color in the single shape condition, suggesting that this preference, which also shows inter-individual differences, may have its roots elsewhere. The results do suggest that correspondence may exist between a shape and a range of colors which may well have its roots in semantic associations with the geometric shapes (Leborg, 2006).

#### **ACKNOWLEDGMENTS**

This work was supported by an EPSRC Case studentship in collaboration with P&G. The authors thank their participants for their patience, a number of colleagues for helpful comments, Szonya Durant for her insightful comments on the slowly evolving manuscript, and the enthusiasm and insights of everyone at VSS, ECVP, and ECEM where this and related work was presented.

#### **REFERENCES**


Kandinsky, W. (1923). "Farbkurs und Seminar," in *Kandinsky: Complete Writing on Art*, eds K. C. Lindsay and P. Vergo (New York, NY: Da Capo Press), 501–504.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 26 September 2013; accepted: 22 November 2013; published online: 13 December 2013.*

*Citation: Holmes T and Zanker JM (2013) Investigating preferences for color-shape combinations with gaze driven optimization method based on evolutionary algorithms. Front. Psychol. 4:926. doi: 10.3389/fpsyg.2013.00926*

*This article was submitted to Perception Science, a section of the journal Frontiers in Psychology.*

*Copyright © 2013 Holmes and Zanker. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## The effect of background and illumination on color identification of real, 3D objects

#### *Sarah R. Allred\* and Maria Olkkonen*

*COVI Research Lab, Department of Psychology, Rutgers – The State University of New Jersey, Camden, NJ, USA*

#### *Edited by:*

*Galina Paramei, Liverpool Hope University, UK*

#### *Reviewed by:*

*Alessandro Rizzi, Università degli Studi di Milano, Italy Sarah Elliott, University of Chicago, USA*

#### *\*Correspondence:*

*Sarah R. Allred, Department of Psychology, Rutgers – The State University of New Jersey, 311 N. Fifth Street, Room 301, ATG Camden, NJ 08102, USA e-mail: srallred@camden.rutgers.edu*

For the surface reflectance of an object to be a useful cue to object identity, judgments of its color should remain stable across changes in the object's environment. In 2D scenes, there is general consensus that color judgments are much more stable across illumination changes than background changes. Here we investigate whether these findings generalize to real 3D objects. Observers made color matches to cubes as we independently varied both the illumination impinging on the cube and the 3D background of the cube. As in 2D scenes, we found relatively high but imperfect stability of color judgments under an illuminant shift. In contrast to 2D scenes, we found that background had little effect on average color judgments. In addition, variability of color judgments was increased by an illuminant shift and decreased by embedding the cube within a background. Taken together, these results suggest that in real 3D scenes with ample cues to object segregation, the addition of a background may improve stability of color identification.

**Keywords: color appearance, constancy, illumination, background, object, 3D**

#### **1. INTRODUCTION**

For the surface reflectance of an object to be a useful cue to object identity, judgments of its color should remain relatively stable across changes in the object's environment. This stability is known as color constancy. Achieving color constancy between scenes poses a difficult problem for the visual system because the sensory signal that reaches the eye from a scene confounds the surface reflectance of objects within the scene and the illumination impinging on the scene. For example, imagine moving a coffee mug from the kitchen counter to a patio table outside. Both the illumination and the sensory signal reaching the eye from the mug and the area surrounding it will change. The reflectance properties of the mug have not changed, but the reflectance properties of its surrounding surfaces have. The challenge for the visual system is to correctly parse the changing sensory signal in a fashion that supports color identification.

A complete theory of color vision would characterize behavior in real-world color tasks for objects in scenes where both the illumination and surrounding objects change. We are still far from this goal, both because the characterization of such realistic stimuli is currently a computationally intractable problem and because typical laboratory tasks diverge from real-world tasks in a number of ways (see Brainard and Radonjic, in press, for discussion).

There are several approaches available as we seek to move toward a more complete theory of color vision. One general approach is to simplify from the complexity of realistic stimuli and tasks to more carefully controlled tasks and stimuli, with the goal of uncovering principles that govern the relationship between stimuli, task, and color judgments. The hope is that such principles will generalize well to more complex tasks and stimuli. Experiments in this vein have achieved success in demonstrating relationships between early physiological mechanisms and judgments of color appearance (Werner and Walraven, 1982; Webster and Mollon, 1994; Engel and Furmanski, 2001) and have guided the development of computational models that can predict color appearance judgments (McCann et al., 1976; McCann, 2004). However, an important question is whether such principles in fact generalize to color judgments of more realistic stimuli and in more ecologically relevant tasks. Indeed, recent work highlights the difficulty of linking early physiological mechanisms to the later cortical mechanisms that presumably underlie functional color judgments in complex scenes (Gegenfurtner, 2003; Solomon and Lennie, 2007; Witzel and Gegenfurtner, 2013). Thus, a complementary class of experimental approach is to measure color judgments that employ more realistic tasks and stimuli. Because the critical variables underlying perception of such realistic stimuli are not yet amenable to a clear computational characterization, this approach has the disadvantage that the data are not obviously applicable to known physiological mechanisms and models. However, such experiments can provide important guidance about ecologically relevant variables as we develop increasingly complex models of human color vision.

Here we take the second approach with the goal of measuring color judgments in 3D scenes with a real-world color task. In the remainder of the introduction we outline the principles that might be expected to generalize from simpler scenes to govern such color judgments.

In many cases it is now possible to successfully predict color judgments of a uniformly illuminated flat test stimulus surrounded by other flat stimuli. For example, one can start with the responses of cones in the retina, and compute color estimates explicitly using computations grounded in the opponent chromatic and luminance responses of cells early in the visual system (Land and McCann, 1971; McCann et al., 1976; McCann, 1992a; Zaidi et al., 1992; Nayatani, 1997). Although a vigorous debate continues about the exact mapping between local contrast and color appearance (McCann, 1992b; Singer and D'Zmura, 1994; Brown and MacLeod, 1997; Zaidi et al., 1997; Blakeslee and McCourt, 2001; Rudd and Zemach, 2004; McCann, 2006; Ekroll and Faul, 2012), local contrast in some form is central to many theories. Such local contrast mechanisms in principle support color constancy under illumination shifts, but yield poor color constancy under background shifts. Consistent with this, a large body of work suggests that color constancy in 2D scenes is relatively high under illuminant shifts (Smithson, 2005; Shevell and Kingdom, 2008; Foster, 2011; Brainard and Radonjic, in press) but relatively poor under background shifts (McCann, 1992b; Kraft et al., 2002; Werner, 2006).
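Why a local contrast code is stable under an illuminant shift but not under a background shift can be illustrated with a deliberately simplified worked example. It assumes that the signal in each cone class is the product of surface reflectance and illuminant, and that local contrast is the per-cone ratio of test to surround; neither assumption corresponds to any specific model cited above, and all numerical values and names are hypothetical.

```python
def cone_signal(reflectance, illuminant):
    """Light reaching the eye per cone class: reflectance times illuminant."""
    return tuple(r * e for r, e in zip(reflectance, illuminant))


def local_contrast(test_signal, surround_signal):
    """Ratio of test to surround signal, computed per cone class."""
    return tuple(t / s for t, s in zip(test_signal, surround_signal))


# Hypothetical (L, M, S) values chosen only for illustration.
test = (0.40, 0.30, 0.20)
gray_surround = (0.50, 0.50, 0.50)
red_surround = (0.60, 0.30, 0.20)
daylight = (1.0, 1.0, 1.0)
tungsten = (1.4, 1.0, 0.5)

baseline = local_contrast(
    cone_signal(test, daylight), cone_signal(gray_surround, daylight)
)
illuminant_shift = local_contrast(
    cone_signal(test, tungsten), cone_signal(gray_surround, tungsten)
)
background_shift = local_contrast(
    cone_signal(test, daylight), cone_signal(red_surround, daylight)
)

for name, value in [
    ("baseline", baseline),
    ("illuminant shift", illuminant_shift),
    ("background shift", background_shift),
]:
    print(name, tuple(round(v, 3) for v in value))
```

Because the illuminant multiplies test and surround alike, it cancels in the ratio and the contrast is unchanged, whereas replacing the surround changes the ratio even though the test surface is physically the same.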

As we move from 2D scenes with uniform illumination to 3D scenes with non-uniform illumination, an important question is whether these consistent findings of high constancy under illumination shifts and poor constancy under background shifts will generalize. There are at least two reasons to be cautious about such generalizations.

First, as scenes become more complex, the local contrast relationships between object and background likewise become more complex. For example, the light reaching the eye from an object of one surface reflectance may vary because object pose with respect to the illuminant introduces illumination gradients or shadows, because of variation in the illumination itself, because of the texture of the object, or because of specular highlights (Brainard and Radonjic, in press).

Thus far, the empirical research is mixed. In support of generalization, some recent work suggests that, as in 2D scenes, observers adjust color matches to compensate partially for illumination gradients (Boyaci et al., 2003, 2004; Ripamonti et al., 2004; Allred and Brainard, 2009; Xiao et al., 2012). Also as in 2D scenes, constancy is less stable when the surfaces surrounding an object change than when the illuminant changes, and constancy is particularly poor when both are manipulated together (Delahunt and Brainard, 2004; Allred and Brainard, 2009). Similarly, Kraft et al. (2002) found that reducing cues to depth in a real scene had little effect on color constancy, suggesting that at least in some cases, depth is not a critical variable.

In contrast, other research suggests a more complicated picture. For example, Xiao et al. (2012) reported interactions between illuminant cues and object form, and perceived color can be strongly influenced by the perceived shape of a test stimulus (Adelson, 1993; Bloj et al., 1999) or the region of the scene with which a test stimulus is perceptually grouped (Gilchrist, 1977; Schirillo and Shevell, 2000). And it is clear that the geometric structure of a scene can exert effects on color judgments beyond those that can be explained by local contrast. For example, Radonjić and Gilchrist (2013) found that perceived depth modulates perceived lightness even when local luminance ratios remain constant, and Werner (2006) demonstrated that the addition of depth cues alone improves color constancy.

A second reason to be cautious about generalization is that the task facing the observer may also be complicated by increasing scene complexity. For example, consider again the mug moved from inside to outside. An observer might notice subtle differences in the appearance of the mug—one surface might appear shadowed, for example—while simultaneously recognizing that the reflectance properties of the mug itself are uniform and unchanged from indoors. Although observers may make distinct appearance and reflectance judgments in 2D scenes (Arend and Reeves, 1986; Troost and de Weert, 1991; Arend and Spehar, 1993a,b; Bäuml, 1999; Blakeslee and McCourt, 2001), the greater physical complexity of 3D scenes may exacerbate those distinctions. Many previous studies in 2D scenes either explicitly or putatively rely on proximal or appearance judgments (Brainard and Radonjic, in press). However, many real-world color tasks require us to identify objects between scenes rather than make exact appearance matches (Zaidi, 1998; Abrams et al., 2007). For example, when picking out a thread at the store to match a button at home, we seek to match the reflectance properties of thread and button, not the color appearance between home and store illumination. Thus, to the extent that observers make reflectance rather than appearance judgments in 3D scenes, results from 2D scenes may fail to generalize. We do note that the literature surrounding appearance and reflectance instructional effects is somewhat murky (Brainard et al., 1997; Blakeslee and McCourt, 2001; Ripamonti et al., 2004; Allred and Brainard, 2009; Allred, 2012; Brainard and Radonjic, in press), and we return to this topic in the discussion.

To summarize, here we measured color identification of real 3D objects. Observers made color matches for real cubes presented in an unevenly illuminated three-dimensional scene in which we independently manipulated both the illumination impinging on the scene and a three-dimensional background in which the cube was embedded. To examine real-world task constraints, observers matched the reflectance of the object.

#### **2. MATERIALS AND METHODS**

Observers were 122 college students who participated for course credit. All experimental procedures were approved by the Rutgers IRB (Protocol #E10-410) and written informed consent was acquired from all observers. Observers had normal or corrected-to-normal visual acuity and normal color vision as assessed by the Ishihara Color plates. Observers entered a room and viewed two adjacent 4- × 4- × 4 gray flat matte booths. Illumination in the room was provided separately for each booth (chromaticity in CIE uvY space; Booth A: *u* = 0.27, *v* = 0.53, CCT ∼2600 K; Booth B: *u* = 0.22, *v* = 0.50, CCT ∼4000 K).

Observers sat in a rolling chair and were free to move positions. Mounted 4.5- from the front of each booth was a book of 1022 commercial paint chips (Sherwin-Williams, 2010) which served as a matching palette (**Figure 2**). The palette mount allowed observers to rotate individual palette strips into the booth, but a stopper prevented the palette strips from rotating out of the booth illuminant. Each palette strip contained either 7 or 8 paint chips. Experimenters monitored observers to make sure that they did not climb into the booths or move the cubes. Observers were instructed in each condition to choose the paint chip that matched the paint of the cube under study, and observers were instructed to make their final chip selection when the palette strip was aligned with the stopper (see **Figure 2**). The instructions were intended to evoke reflectance rather than appearance matches. Sixteen 3-- × 3-- × 3-cubes (subtending 4.5°–6.5° at usual viewing distances), painted with different colors of flat matte paint chosen to approximately span color space (see **Figure 1**), served as stimuli.

Observers made color matches by inspecting the paint palette and writing the number corresponding to the paint chip that best matched the paint on the cube. In the *baseline* condition, which served as the comparison for all other conditions, observers viewed the cubes and matching chips in the same booth (**Figure 2**, Trial 1, right cubes). In the *illumination* condition, observers looked between booths while viewing cubes in one booth and matching chips in the other booth (**Figure 2**, Trial 2, right cubes). The *background* condition differed from the *baseline* condition in that the cube was embedded in a three-dimensional background (**Figure 2**, Trial 1, left cubes). The *joint* condition combined manipulations, so that cubes were embedded in the background in one booth and the matching chips were viewed in the other booth (**Figure 2**, Trial 2, left cubes).

Each observer performed two trials (see **Table 1**), one each on two different days. On each trial, observers viewed four different cubes, two in each booth. One cube in each booth was embedded in a background (see **Figure 2**). This yielded eight color matches per observer, two in each condition. Thus, each observer made color matches for 8 of the 16 cubes. On Trial 1, color matches were made from the palette mounted in the booth in which the cube was viewed (baseline and background conditions) and on Trial 2, color matches were made from the palette in the opposite booth (illumination and joint conditions, see **Table 1**). To prevent order effects, we counterbalanced between observers to achieve color matches for each cube in each of the four conditions; thus, observers never viewed an individual cube in more than one condition. We did not counterbalance the booth in which cubes were seen; thus, in the illumination condition, half the cubes were viewed in Booth A and matched in Booth B, and the other half were viewed in Booth B and matched in Booth A (see **Table 2**). There were a total of eight different backgrounds. Each cube was seen with only one background. Background paints were chosen by eye to be approximately color-opponent while remaining in a different color category from any other stimulus (cube or background) present within a particular trial. The category restriction sometimes resulted in non-opponent color pairings. The color categories and chromaticities for each cube and its background are enumerated in **Table 2**, and illustrated in **Figure 3**. Implications of cube/background pairings are addressed in the discussion.

**FIGURE 1 | Chromaticity in CIE uv coordinates of the 16 cube stimuli (colored squares), 1022 paint chips (black dots), and illumination for Booth A (black x) and Booth B (black +).** Luminance information was discarded. Plotted measurements were made in Booth A. The square's color is an approximation of the cube's apparent color.

Color specifications were made using a Spectrascan PR-655 spectral radiometer (Photo Research Inc., Chatsworth, CA). Conversions between color spaces (wavelength to CIE uvY) were made using standard equations implemented in Matlab's Psychophysics Toolbox (Brainard, 1997). The white point was taken as the illuminant, measured with a reflectance standard (PhotoResearch, Inc. RS-2, MgO standard). In all analyses, we discard luminance and use only chromaticity values.
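As an illustration of the last step of such a conversion, the sketch below maps CIE XYZ tristimulus values to chromaticity coordinates. It assumes the CIE 1976 u′v′ definition, which the text does not state explicitly, and it starts from XYZ rather than from measured spectra. The function name `xyz_to_uv` is hypothetical and is not part of the Psychophysics Toolbox.

```python
def xyz_to_uv(X, Y, Z):
    """Chromaticity from tristimulus values.

    Assumption: CIE 1976 u'v' formulas; the paper only says "CIE uvY".
    """
    denom = X + 15.0 * Y + 3.0 * Z
    if denom == 0.0:
        raise ValueError("chromaticity is undefined when X = Y = Z = 0")
    u = 4.0 * X / denom
    v = 9.0 * Y / denom
    return u, v


# Equal-energy white (X = Y = Z) maps to u' = 4/19 and v' = 9/19.
u_white, v_white = xyz_to_uv(1.0, 1.0, 1.0)
```

Luminance (Y) is simply not returned here, which mirrors the choice to discard luminance and analyze chromaticity only.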

To specify the cube chromaticity, we measured each cube in the location where it was experimentally presented. The radiometer was positioned to approximate the average observers' eye point; however, there is considerable variability in this eye point since observers ranged in height and were free to move outside the booths. For each cube, measurements were from the top surface of the cube, in the corner closest to the observer. Repeat measurements were taken over the course of the experiment and showed very small deviations in chromaticity and somewhat larger variations in luminance. Chromaticity measures of the background were made on the bottom surface of the background closest to the observer, nearly below the location of the cube measurement. Radiometer measurements for each paint chip in each booth were made near the center of the paint chip. Although each cube and each background were painted uniformly, the 3D structure of the scene elicited considerable variations in luminance across each surface. This variation is seen easily in **Figure 2**. In this experiment, we made no attempt to control for or manipulate luminance. Radiometer measurements confirmed that chromaticity was relatively stable across surfaces.

#### **2.1. DATA ANALYSIS**

We discarded data from 11 of 122 observers for failure to understand the task as indicated by not recording a response for more than half the cubes, or for systematically recording cube color in the wrong location. From the remaining 888 trials (111 observers × 8 cubes), we discarded a further 104 trials for the following reasons: indecipherable or non-existent card notation (82/888 trials, 9%), missing radiometer data (7/888 trials, <1%) or color match of a clearly different, non-adjacent color category (15/888, 2%). To determine which matches fell into the last group, two lab members independently examined each color match and rated it as either within normal limits or of a clearly different category. Lab members were provided a list of matches for each cube but were not informed about the condition in which the match was chosen. Only matches judged as the wrong category by both lab members were discarded. In most cases (12/15), the discarded match seemed to match another cube on that trial, and thus probably reflects an observer recording the paint chip in the wrong location. For example, observers recorded a pink match for the blue cube and a blue match for the pink cube. Overall, similar numbers of color matches were discarded in each condition: baseline (21), illumination (30), background (26), joint (27).

**FIGURE 2 | Photograph of experimental setup for one example trial.** On each trial, observers viewed four cubes, two cubes each in Booth A (left images) and Booth B (right images) that were separately illuminated. On each trial, one cube in each booth was embedded in a 3D background (for this condition, left cubes in each image). The matching palette (booklet in the front of each booth) contained 1022 paint chips. The palette in each booth rotated freely on a long screw mounted into the palette, and the wooden stopper prevented observers from pulling palette strips out of the booth. Observers were permitted to flip freely through the book, but were instructed to choose a match only when the palette strip was aligned with the stopper. On Trial 1 (baseline and background conditions) observers chose color matches from the palette mounted in the same booth as the cubes. To illustrate this, the palette is open to the green section (Trial 1, Booth A) and the purple section (Trial 1, Booth B). On Trial 2 (illumination and joint conditions) observers chose color matches for a cube from the palette mounted in the other booth. As illustrated, the color match for the green cube (Booth A) was selected from the palette in Booth B, and the color match for the purple cube (Booth B) was selected from the palette in Booth A.

#### **Table 1 | Trial description.**

*On each trial, observers viewed four cubes in different locations (left column). The cubes are placed in these four locations (from left to right) in Figure 2. The second column indicates the condition of each cube. The third column indicates the booth where the cube was viewed (See) and the booth in which the color match was chosen from the paint palette (Match). Each cube was viewed in only one location. To achieve color matches for each cube in each condition between observers, we counterbalanced the location of the background and the trial (1 or 2) in which the cube was presented.*

In all cases where significance levels are reported for a family of statistical tests, we report the *p*-value without the Bonferroni correction. We do so because the assumptions underlying the uncorrected *p*-value are relatively transparent, whereas the criteria for including a test within a specific family are not always clear.

#### *2.1.1. Color constancy index*

Many different metrics are used to describe color constancy (e.g., Foster, 2011). We described color matches across an illuminant shift by computing a color constancy index based on a modified Brunswik ratio (*mBR*). This index describes the extent to which observers alter the chromaticity of color matches in the direction expected by color constancy. Values near 1 indicate high color constancy, such that observers selected a paint chip with chromaticity equal to that of the cube measured under the matching illuminant. Values closer to 0 indicate failure to compensate for the illuminant shift, and values greater than 1 indicate overcompensation for the illuminant shift. We calculated the *mBR* as follows:

$$mBR = \left(\frac{\operatorname{proj}_{\overrightarrow{phys}}\,\overrightarrow{perc}}{\lVert \overrightarrow{phys} \rVert}\right) \tag{1}$$

where *perc* is the perceptual shift caused by the illuminant shift, taken as the average color match in the illumination condition.


**Table 2 | Description of 16 cube stimuli (left half) and backgrounds (right half).**

*Names refer to the apparent color of the cubes and backgrounds. Each row gives one cube/background pairing. Cubes were always paired with the same background. Radiometer measurements are under Booth A illumination. Booth indicates where the cube was seen (see Table 1).*

In this index, *phys* is the chromaticity of the illuminant shift. Calculating *phys* is non-trivial; the illumination impinging on the cube in both booths is non-uniform, both because of the location of the illuminant and the 3D structure of the cubes. This is seen clearly in **Figure 2**, where the top of the cube reflects more light than the sides. Because *phys* varies across the booth, and because we had no way of knowing which portions of the cube the observers utilized for their matches, we calculated *phys* as follows: First, we made the assumption that the area of the booth observers utilize in making color matches is independent of condition. If this assumption holds, then a perfectly color constant observer would pick the same palette chip as a match in the illumination and joint conditions as in the baseline condition. We took the palette chips chosen in the baseline condition, measured their chromaticity in the illumination condition, and took this as *phys*. Both *perc* and *phys* require a reference chromaticity. For the reasons just described, the reference was defined as the chromaticity of the average match in the baseline condition, rather than the chromaticity of the cube measured under the baseline illuminant. Thus, the constancy indices as calculated here are best described as relative constancy indices: the *mBR* measures the concordance between color matches in the baseline condition and color matches in each experimental condition, rather than the concordance between color match chromaticity and cube chromaticity.
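A minimal sketch of Equation 1 under the definitions above. It assumes that chromaticities are (u, v) pairs, that *proj* denotes the scalar projection of *perc* onto *phys*, and that both vectors are taken relative to the average baseline match. The function name and all numerical values are illustrative and are not data from the experiment.

```python
def modified_brunswik_ratio(reference, illum_match, constancy_pred):
    """Relative constancy index for one cube.

    reference      -- average baseline match chromaticity (u, v)
    illum_match    -- average match under the illuminant shift (u, v)
    constancy_pred -- chromaticity of a perfectly constant match (u, v)
    """
    # perc: shift actually made by observers, relative to the reference.
    perc = (illum_match[0] - reference[0], illum_match[1] - reference[1])
    # phys: shift a perfectly color constant observer would make.
    phys = (constancy_pred[0] - reference[0], constancy_pred[1] - reference[1])
    norm_sq = phys[0] ** 2 + phys[1] ** 2
    if norm_sq == 0.0:
        raise ValueError("phys has zero length; the index is undefined")
    # Scalar projection of perc onto phys, divided by the length of phys.
    return (perc[0] * phys[0] + perc[1] * phys[1]) / norm_sq


# Made-up chromaticities for illustration only.
reference = (0.20, 0.50)
constancy_pred = (0.24, 0.50)
illum_match = (0.235, 0.51)
index = modified_brunswik_ratio(reference, illum_match, constancy_pred)
# index is approximately 0.875: most, but not all, of the shift is made.
```

Because only the component of *perc* along *phys* enters the ratio, a match displaced orthogonally to the illuminant shift changes the error but not this index.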

#### *2.1.2. Error index*

To compare directly constancy in the illumination and background conditions, it would be useful to have a measure of constancy in the background condition. However, such a constancy index requires a definition of what constitutes a failure of constancy. In simple 2D scenes, one can estimate these failures using algorithms that equate cone contrasts between the baseline and background conditions. It is less obvious how such algorithms should be applied to our 3D stimuli, both because the cone contrast between cube and background varies substantially with scene location, and because we lack an empirical characterization of which parts of a 3D scene should be incorporated.

Thus, to avoid subscribing to a particular theoretical approach, we chose a relatively atheoretic error index (*eI*) to compare matches in the baseline and experimental conditions. To compute the *eI*, we took the distance in color space between the average color match and the color constant match, as described above. We defined the *eI* in the baseline condition as the split-half error, calculated by randomly dividing the baseline data into two groups and computing the distance between the average color match in each of the two groups.
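The two quantities just described can be sketched as follows, assuming matches are stored as (u, v) chromaticity pairs and that distance means Euclidean distance in the chromaticity plane. The function names are hypothetical, and the chromaticity of the color constant match is passed in as an input, since its derivation depends on the condition.

```python
import math
import random


def mean_uv(matches):
    """Mean chromaticity of a list of (u, v) matches."""
    n = len(matches)
    return (sum(u for u, _ in matches) / n, sum(v for _, v in matches) / n)


def error_index(matches, constant_match):
    """Distance between the average match and the color constant match."""
    return math.dist(mean_uv(matches), constant_match)


def split_half_error(baseline_matches, rng):
    """Distance between the average matches of two random halves."""
    if len(baseline_matches) < 2:
        raise ValueError("need at least two baseline matches")
    shuffled = list(baseline_matches)
    rng.shuffle(shuffled)  # random assignment of matches to halves
    half = len(shuffled) // 2
    return math.dist(mean_uv(shuffled[:half]), mean_uv(shuffled[half:]))


# Made-up values for illustration only.
example_error = error_index([(0.20, 0.50), (0.22, 0.50)], (0.21, 0.53))
example_split = split_half_error([(0.0, 0.0), (1.0, 0.0)], random.Random(0))
```

A single random split is shown; whether the split was repeated and averaged is not stated in the text.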

#### *2.1.3. Central tendency*

We characterized the average color match of the data in each condition in two ways: First, after discarding luminance information, we took the mean u and v chromaticities across all matches in a condition as the average color match. Second, we determined the ellipse that best fit the color matches in a least-squares sense (Fitzgibbon et al., 1999), and used the center of the ellipse as a measure of the average color match. The pattern of results is qualitatively the same with both measures of central tendency. Here we report the mean chromaticity as the average color match.

#### **3. RESULTS**

The main goal of this paper is to investigate the effect of illumination and background shifts on color matches. To that end, we first show color matches for all observers for representative individual cubes, and then turn to quantitative comparison across all cubes.

#### **3.1. INDIVIDUAL CUBES**

Color matches for all observers and all conditions for four of the sixteen cubes are shown in **Figure 4**. From these plots, several salient points can be made.

First, in the baseline condition, observers chose many different paint chips (unfilled blue diamonds, **Figure 4**). This range of color matches in the baseline condition was a common feature across all cubes (median number of paint chips chosen in baseline condition = 7, min = 4, max = 10; median number of observers per cube = 12). The trend of variability in baseline color matches is reassuring. The basic task employed here, choosing a flat paint chip from a commercial palette book to match a three-dimensional cube located at a distance from the palette, is somewhat non-traditional. Thus, the baseline data provide a useful sanity check: the paint palette was sufficiently discretized to provide a reasonable estimate of between-observers variability in color perception.

Although observers chose many paint chips for each cube, the region of color space spanned by the individual matches varies between cubes. For example, the paint chips chosen for plum in the baseline condition span a larger region of color space than do the chips chosen for aqua, yellow and doeskin. These differences could reflect true differences in color perception between cubes, or they could reflect the non-uniformity of the paint chips in color space seen in **Figure 1**. For the moment, we do not attempt to disentangle inherent inhomogeneities in color perception between cubes from palette inhomogeneities; rather, we seek in the subsequent analyses to ask how background and illumination affect color matches for a given cube.

Second, for each cube shown, observers exhibited relatively high but imperfect degrees of color constancy under a change in illumination. If observers were perfectly color constant, that is, if the observers chose the same paint chips under the illuminant shift as they did in the baseline condition, then individual data points (brown squares) would cluster near the constancy prediction (see Materials and Methods) indicated by the black crosses (**Figure 4**). If, on the other hand, observers matched the sensory signal reaching the eye in the baseline condition and failed to account for the change in illumination, the brown squares would overlap the color matches in the baseline condition (blue diamonds). Most of the brown squares are shifted toward the black crosses, but not identical to them, indicating that observers showed high but imperfect color constancy. Again, observers showed considerable variability in the number of distinct paint chips chosen (median number of paint chips chosen = 8, min = 5, max = 12).

Third, embedding the cubes in a background had little effect on color matches (magenta circles in **Figure 4**), in contrast to the relatively large effect elicited by a change in the illumination. In most panels, the matches made when the cube was embedded in a background were nearly identical to the blue diamonds of matches made to the cubes in the baseline condition.

Fourth, combining the addition of the surround with an illumination shift seems to have an effect similar to that of the illumination shift alone (green triangles similar to brown squares in **Figure 4**).

Lastly, inspection of the four panels reveals considerable variability in the extent of color space spanned by individual matches in a given condition. For example, the region of chromaticity space spanned by paint chip choices for the plum cube in each condition seems larger than for the yellow cube. Additionally, for each cube, the region of chromaticity space spanned by the brown and green symbols (illumination and joint conditions) seems larger than the region of chromaticity space spanned by the blue and magenta symbols (baseline and background conditions).

In the remainder of the paper, we quantify the extent to which the effects of experimental condition on both average color constancy and variability noted in the individual panels in **Figure 4** are consistent in the entire dataset.

#### **3.2. AVERAGE COLOR CONSTANCY**

As with the data for the individual cubes (**Figure 4**), average color constancy across all cubes under an illumination shift, shown in **Figure 5**, was generally high but imperfect. To quantify the degree of constancy, it is standard to compute a color constancy index. Such indices seek to frame the data with respect to their position between the *constancy* and *no-constancy* predictions, where 1 indicates perfect constancy, 0 indicates a complete failure of constancy, and indices greater than 1 indicate that observers overcompensated for the illuminant shift. From the constancy predictions (illustrated for the four cubes in **Figure 4**), we computed such an index (see Materials and Methods). Briefly, the color constancy prediction was derived using the assumption that color constant observers would choose the same paint chips in the baseline condition as in the illumination condition; that is, their matches would reflect consistency in surface reflectance, rather than chromaticity.

**FIGURE 4 | Color matches in all four conditions (baseline, blue diamonds; background, magenta circles; illumination, brown squares; joint, green triangles) for four of the sixteen cubes.** Each unfilled data point represents one paint chip chosen; the size of the data point represents the number of observers who chose that chip. Filled data points represent average color matches, and black solid crosses show constancy predictions. The x- and y-axis ranges are consistent between plots, though the starting point shifts to accommodate the relevant chromaticity range.

Consistent with other color constancy studies of illumination changes in relatively realistic scenes, color constancy indices were quite high, averaging 0.88 ± 0.03. Constancy indices are displayed for all cubes in **Figure 5A**. Although average constancy indices were relatively high, there was substantial variability between cubes, reflected in the varying bar heights in **Figure 5A**. Indices ranged from 0.61 (orange) to 1.04 (brown). Within a cube, indices were relatively consistent between observers, with the standard error averaging about 6% of the constancy index.

How does embedding a cube in the background affect color matches? The individual data suggest that the effect of background is small. To compare illumination and background conditions, it would be useful to calculate a background constancy index that frames the data between the constancy and no-constancy predictions. However, since we lack a complete characterization of both the theory and low-level computations involved in color constancy in three-dimensional scenes, it is not obvious how to compute the no-constancy prediction for the background condition. In scenes that consist of uniformly illuminated flat stimuli embedded in backgrounds, a simplifying assumption that is based on early processing in the visual system is that a matching surface will appear the same as a study surface when the cone-excitation ratio between the match and its surround equals the cone-excitation ratio between the study surface and its background. Although we computed such local-contrast predictions (not shown), their dependence on luminance meant that the no-constancy match varied substantially depending on which radiometer measurements were utilized.

To avoid potentially spurious relationships that might either hide or exaggerate the effect of the background, we compared illumination and background matches to baseline matches using a less theoretically motivated error index (*eI*). We defined the *eI* as the distance in color space between the chromaticity of the average match and the constancy prediction. Unlike a constancy index, the eI compares the magnitude of experimental effects and is agnostic about cause or directionality of effects. Such an index is particularly useful in the background condition, where a color constancy index may be influenced heavily by theoretical assumptions and there is less consensus about the size or direction of expected effects.

For a majority of cubes, errors in the background condition were smaller than errors in the illumination condition, as evidenced by the majority of points being below the identity line in **Figure 6B**. Aggregated across cubes, this difference was significant (paired two-tailed *t*-test, *p* < 0.05; second and third bars, **Figure 6A**). To provide context for the size of these errors, we compared them to a split-half baseline error (first bar, **Figure 6A**). Although illumination errors were significantly different from baseline (paired two-tailed *t*-test, *p* < 0.05), errors elicited by the addition of a background were no different from baseline errors (paired two-tailed *t*-test, *p* = 0.43). Background errors were therefore comparable in size to the variability within the baseline data. Thus, in contrast to the robust phenomenon of color induction in flat stimuli with uniform surrounds, embedding a cube in a background has little effect on color judgments.

Next we asked how the effects of background and illumination combine. Real world color constancy tasks often involve both changes in surrounding surfaces and changes in the illumination, and previous research has suggested that constancy is particularly poor when both changes are made simultaneously (Delahunt and Brainard, 2004). Although we found little effect of embedding cubes in backgrounds without an illuminant shift, it remains possible that there is an interaction between background and illumination.

However, we found that constancy indices were no different in the joint condition than in the illumination condition (two-tailed, paired *t*-test, *p* = 0.57), as demonstrated in **Figure 5B**, where color constancy indices remained close to the diagonal. The variability of color constancy between cubes in the joint condition was similar to that in the illumination condition (range with background 0.75–0.99; range without background 0.61–1.04) and marginally correlated (*r* = 0.45, *p* = 0.082) between conditions. Similarly, error indices in the joint condition were no different than in the illumination alone condition (two-tailed paired *t*-test, *p* = 0.82, third and fourth bars in **Figure 6A**). Further consistent with the idea that background elicits no more errors than the baseline condition and the illumination shift elicits the same pattern of errors with or without a background, a Three-Way ANOVA showed no main effect of background, a main effect of illumination and no interaction between illumination and background (**Table 3**).

… cubes. In **(B)**, symbol color approximates apparent cube color, and black diagonal line is the identity.

… distance in color space between the chromaticity of the color constancy prediction and the actual average chromaticity of paint chips

#### **3.3. VARIABILITY IN COLOR MATCHES**

*The table shows the results of a Three-Way ANOVA on the errors for color matches averaged across observers for each cube in each experimental condition. Cube was coded as a random effects variable, while surround and illumination were coded as fixed effects. The ANOVA modeled all main effects and the surround by illumination interaction.*

In addition to average color matches, we also investigated the effect of background and illumination on the variability of color matches. In all conditions, observers chose a variety of paint chips as color matches (see **Figure 4**). For each experimental condition, we defined variability as the distance between each color match and the average color match in that condition. Thus, cubes with matches that span a larger region of color space have higher variability.
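The variability measure defined above can be sketched as the mean Euclidean distance between each match and the condition's average match. This assumes matches are (u, v) chromaticity pairs and that the per-match distances are averaged to give one value per cube and condition; the function name and example values are hypothetical.

```python
import math


def variability(matches):
    """Mean distance of each (u, v) match from the mean match."""
    n = len(matches)
    center = (sum(u for u, _ in matches) / n, sum(v for _, v in matches) / n)
    return sum(math.dist(m, center) for m in matches) / n


# Two made-up matches 0.04 apart in u: each lies about 0.02 from the mean.
spread = variability([(0.20, 0.50), (0.24, 0.50)])
```

Repeated choices of the same paint chip contribute repeated points, so a cube whose observers agree on one chip has variability near zero.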

We compared variability in the baseline condition to variability in each experimental condition (**Figure 7**). If the basic processes underlying color matching are not altered by either the illumination shift or the addition of a background, then the data should fall along the identity line. However, we found that variability in the illumination condition was significantly greater than in the baseline condition (brown squares above the line in **Figure 7A**, two-tailed paired *t*-test, *p* < 0.005). In contrast, adding a background significantly decreased the between-observers variability in color matches (magenta circles below the line in **Figure 7A**; two-tailed, paired *t*-test, *p* < 0.05).

As with average color matches, cubes within an experimental condition elicited a wide range of variability in color matches. Given the non-uniformity of the palette chip chromaticities in color space, we cannot distinguish whether variability between cubes within a given condition results from increased perceptual variability for that particular color or the non-uniformity with which paint chips sample color space. However, within-cube variability was highly correlated between experimental conditions (baseline-illumination: *r* = 0.73, *p* < 0.005; baseline-background: *r* = 0.65, *p* < 0.01), as it was with color constancy indices, indicating that this variability is related to some property of the cube or palette itself and is not an artifact of differences between observers.

#### **3.4. RELATIONSHIP BETWEEN AVERAGE COLOR MATCHES AND VARIABILITY**

Here we have separately analyzed average color matches and variability of color matches, but it is possible that both judgments arise from a common representation. We investigated their independence by plotting variability of matches as a function of color constancy in **Figure 8**. If, for example, increased variability necessarily led to decreased constancy, we would expect a negative correlation. If, on the other hand, color constancy and variability were unrelated, we would expect no correlation. There was no significant correlation between variability of color matches and degree of color constancy in either the illumination (*r* = −0.10, *p* = 0.71) or joint condition (*r* = −0.03, *p* = 0.91).

… labeled experimental condition averaged across cubes. Error bars are s.e.m. across cubes (*n* = 16). In both **(A)** and **(B)** variability was defined as the average distance in color space between each observer's match and the average chromaticity of matches in that condition.

A related question is whether the palette non-uniformity is related to within-condition constancy or variability. The between-condition experimental effects are unlikely to be artifacts of palette non-uniformity; for example, the palette discretization is the same for the orange cube in the baseline condition and in the illumination condition. However, of interest is whether the degree of constancy or variability within a condition is predicted by palette density. For example, can palette density account for the relatively high color constancy and low variability of doeskin compared to plum in the illumination condition (**Figure 4**)?

This relationship is examined in **Figure 9**, where we plot color constancy (blue squares) and variability (red circles) as a function of number of palette chips in the cube region. Cube region was calculated as a circle with its center defined by the average color match in the baseline condition and its radius defined as the average variability of color matches in the baseline condition. The number of palette chips will clearly increase with cube region, and this increase may also be non-uniform. We confirmed that a wide range of radius values yielded the same pattern of results. To aid in visualization, both constancy indices and variability values were normalized to their respective maxima, but statistical tests were completed on the non-normalized data.
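The palette density measure can be sketched as follows. This is only an illustration under stated assumptions: chromaticities are treated as 2D points with Euclidean distance, and the names `cube_region`, `palette_density`, and the `scale` parameter (used to vary the radius, as in the robustness check mentioned above) are hypothetical.

```python
import math

def cube_region(baseline_matches, scale=1.0):
    """Return (center, radius) of the circular cube region.

    center: average color match in the baseline condition.
    radius: average distance of baseline matches from that center,
            multiplied by `scale` (an assumed knob for testing other radii).
    """
    n = len(baseline_matches)
    cx = sum(m[0] for m in baseline_matches) / n
    cy = sum(m[1] for m in baseline_matches) / n
    radius = sum(math.hypot(x - cx, y - cy) for x, y in baseline_matches) / n
    return (cx, cy), scale * radius

def palette_density(center, radius, palette):
    """Count palette chips whose chromaticity falls inside the cube region."""
    cx, cy = center
    return sum(1 for x, y in palette if math.hypot(x - cx, y - cy) <= radius)
```

Because the radius is tied to baseline variability, a cube with more variable baseline matches gets a larger region, which is why the chip count can grow with region size as noted in the text.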

Color constancy under a change in illumination was unrelated to palette density, either in the joint condition (filled blue squares, *p* = 0.84) or in the illumination condition (unfilled blue squares, *p* = 0.19). In contrast, there was a negative correlation between variability of color matches and palette density (red symbols), although this correlation was stronger in the joint condition (filled red circles, *r* = −0.59, *p* < 0.05) than the illumination condition (unfilled red circles, *r* = −0.48, *p* = 0.06). Thus, in regions of color space with fewer possible matches, observers chose matches that spanned a wider range of color space.

**FIGURE 9 | Relationship between palette density (x-axis) and color constancy (blue symbols, y-axis) and variability (red symbols, y-axis) in the illumination (unfilled symbols) and joint (filled symbols) conditions.** To aid in visualization, color constancy indices (blue symbols) and variability values (red symbols) were normalized to their respective maxima. All statistical tests in text were performed on non-normalized values. Lines are best fit to data collapsed across experimental condition.

#### **4. DISCUSSION**

The main goal of this paper was to compare the effects of background and illumination on color matches in real objects to the large body of data on background and illumination effects in more simplified scenes. We found that the effects of illumination on average color matches generalized well from 2D to 3D, while the effect of background did not. In addition, both manipulations affected variability of color matches. Below, we discuss both findings as well as the implications of the specific task we employed.

#### **4.1. AVERAGE COLOR MATCHES**

We found that color constancy across a change in the illumination was very good (**Figure 5**), with an average color constancy index of about 90%, although the degree of constancy varied with cube. This is consistent with previous results in both 2D and 3D scenes with ample cues to the illuminant (Shevell and Kingdom, 2008; Brainard and Radonjic, in press).

That surfaces surrounding a colored surface or test patch affect its appearance is a well-known phenomenon: simultaneous contrast or color induction has been widely reported in a variety of different stimulus configurations (Shevell, 1978; Ware and Cowan, 1982; Chichilnisky and Wandell, 1995; Rinner and Gegenfurtner, 2002; Hurlbert and Wolf, 2004). Explanations of such background effects typically invoke some form of local contrast-coding, such as von Kries adaptation (Brainard et al., 1993). An implicit assumption is that a color constant visual system ought to attribute changes in the background color signal to a change in illumination, rather than a change in background reflectance. In simulated or simplified scenes, such as the classical patch/surround display, color signal changes are ambiguous. However, in real scenes where such color signal changes are in fact due to reflectance changes, local contrast is not a valid cue to surface reflectance; in such circumstances, constancy indices are typically much lower, around 20% (Delahunt and Brainard, 2004; Allred and Brainard, 2009; Brainard and Radonjic, in press). An important question is whether such background effects, endemic in 2D scenes or for flat test stimuli and backgrounds embedded in 3D scenes, also exist for 3D objects and backgrounds.

The extent to which scene geometry affects color constancy is currently a topic of active research (Boyaci et al., 2003, 2004; Bloj et al., 2004; Delahunt and Brainard, 2004; Ripamonti et al., 2004; Boyaci et al., 2006; Allred and Brainard, 2009; Xiao et al., 2012). Though early work focused on the importance of color contrast in determining color perception, more recent research has emphasized the importance of scene geometry. For example, Gilchrist demonstrated that the apparent lightness of a constant luminance test patch is influenced heavily by the depth and associated illumination with which it is grouped (Gilchrist, 1977).

The principle governing such reasoning is that the visual system segregates the scene into objects and regions of illumination/frameworks, and then applies color or lightness-mapping rules within each framework (for different theoretical implementations of these general principles, see Adelson, 2000; Gilchrist et al., 1999). Perceptual organization is thus of crucial importance: in this view, failures of constancy in classic simultaneous contrast illusions result from the failure of the visual system to segregate a test object from its surround or the incorrect assignment of a test stimulus to the appropriate region of illumination. From the anchoring/framework perspective, then, we might expect that any cues that increase the accuracy of object segregation or illuminant estimation would increase color constancy.

In contrast with the large body of research on flat, matte stimulus collections, we found that embedding a test cube in a 3D background had little effect on average color matches: errors in the background condition were similar to the split-half error in the baseline condition and background errors were significantly smaller than illumination errors (**Figure 6**). Further, in contrast to previous research (Delahunt and Brainard, 2004; Allred and Brainard, 2009), we also found that adding a background change to an illuminant shift (joint condition) did not substantially reduce color constancy indices (**Figure 6**). Thus, our data are consistent with the principles of anchoring or framework theories which postulate that local contrast cues can be silenced when the visual system is provided with sufficient evidence for perceptual segregation and illuminant estimation.

#### **4.2. VARIABILITY OF COLOR MATCHES**

Generally, scene complexity is thought to improve color constancy (Shevell and Kingdom, 2008), although there are notable exceptions (see Foster, 2011, for discussion). Under one view of color constancy, scene complexity is postulated to do so by increasing the accuracy of the illuminant representation. Under this view, the visual system arrives at a reflectance estimate by combining a variable estimate of the illuminant (either implicitly or explicitly) with the incoming sensory signal (see Brainard and Maloney, 2011, for review). In such a view, failures of constancy are interpreted as mis-estimations of the illuminant. Although color constancy research typically focuses on the extent of average mis-estimation under the rubric of color constancy, it may be that an illumination shift also alters the overall uncertainty in the illuminant representation, and this could manifest itself as increased variability in color matches as well as the more traditionally reported decreased constancy. Although past research has generally focused on average constancy, a growing body of research seeks to understand the relationship between variability of responses and average responses in both color (Rinner and Gegenfurtner, 2000; Hillis and Brainard, 2005; Abrams et al., 2007; Hillis and Brainard, 2007a,b) and other visual domains (Weiss et al., 2002; Stocker and Simoncelli, 2006).

Two features of our data are consistent with this view. First, we found that the variability of color matches increased in the illumination condition (**Figure 7**). If illuminant estimation is indeed involved in achieving constancy in this task, then the illumination condition required observers to estimate the illuminant in both booths; this presumably increased uncertainty compared to the baseline condition. We also found an increase in matching errors in this condition (**Figure 6**).

Second, embedding a cube in the background decreased variability compared to the baseline condition (**Figure 7**). To understand this, consider that overall errors in this condition were relatively low, similar to split-half errors in the baseline condition (**Figure 6**). This suggests that the 3D cues present in the scene, cube, and background allowed the visual system to successfully segregate the background from the cube. If this is the case, then the background could be thought of as another nearby object in the scene that allows a second estimate of the same illuminant, thereby reducing the overall uncertainty in the illuminant estimation within the booth and the subsequent variability in the color matches. This view is further supported by noting that variability in the joint condition, where the background is added to the illumination shift, is less than in the illumination alone condition (**Figure 7**).

Interestingly, although conditions with higher constancy overall also tend to have less variability, we failed to find any within-condition correlations between color constancy for individual cubes and variability of color matches for that cube.

Although we have cast our interpretation of average constancy and variability in terms of illuminant estimation, we note that the available evidence suggests that observers do not explicitly represent the illuminant (Rutherford and Brainard, 2002; Amano et al., 2006; Granzier et al., 2009). Despite this, the language of illuminant estimation implicit in discussions of perceptual segregation may be functionally useful. Still, we note that there are alternative interpretations of our data for those reluctant to view perceptual segregation as either critically important or theoretically useful. For example, our scenes are relatively rich scenes with non-uniform illumination; thus, the local contrast relationships are more complex than they are in uniformly illuminated, 2D scenes. Previous work has suggested that with such information, low-dimensional linear models are in theory able to unambiguously recover both surface reflectance and illumination without resorting to higher level perceptual segmentation (D'Zmura and Iverson, 1994). However, such low-dimensional models have not yet been able to successfully predict human color judgments (see Foster, 2011, for discussion).

#### **4.3. TASK**

The discretized matching task employed here is very different from many other color constancy tasks. Many studies employ asymmetric matching, where observers adjust a matching stimulus under a test illuminant until it appears to match a standard under a standard illuminant (Kuriki and Uchikawa, 1996; Brainard et al., 1997; Faul et al., 2008; Kulikowski et al., 2012), or achromatic adjustment, where observers adjust the stimulus until it appears gray (Brainard, 1998; Boyaci et al., 2004; Hansen et al., 2006). Although some studies have employed discretized palettes, they typically use Munsell chips or papers (McCann, 2004; Olkkonen et al., 2010; Allred et al., 2012) or NCS papers (Hedrich et al., 2009). Such color spaces and palettes are used because they are thought to uniformly sample perceptual color space, and thus avoid potential artifacts due to uneven stimulus sampling.

Such palettes and tasks have proved fruitful in explaining laboratory color matching. However, palettes encountered in the real world, such as thread, fabric, or paints, are unlikely to uniformly sample color space. In addition, typical laboratory tasks often involve appearance matches, and there is considerable debate both about whether and when such matches may differ from reflectance matches (Troost and de Weert, 1991; Bäuml, 1999; Ripamonti et al., 2004; Brainard and Radonjic, in press). We chose here to focus on reflectance matches because they arguably underlie many behaviorally important tasks (Zaidi et al., 1992; Allred, 2012; Brainard and Radonjic, in press), but we acknowledge that others may have a different perspective on the functionality of appearance judgments.

With respect to these concerns, two of our findings are particularly relevant to the task demands. First, color constancy indices in the illumination condition were very similar to those reported in a variety of other studies employing relatively realistic stimuli, but using different tasks. Our observers were instructed to make a reflectance match; the nature of the task also supports a reflectance identification strategy. Furthermore, the lack of correlation between palette density and color constancy suggests that, at least for average constancy indices, the palette choice is not critical. Together, these findings provide support for the common assumption that the results from asymmetric matching and achromatic adjustment tasks in simulated scenes will generalize to more complex scenes and more realistic tasks.

Although the concordance between our findings and previous studies is encouraging, we recognize that several complications may arise from using a non-standard color task and palette. First, if the color palette is insufficiently discretized, then constancy indices could be artificially inflated. However, observers chose many different palette chips. On average there were 7.7 chips chosen for the 11.3 observers per cube. The raw number of paint chips chosen per cube was much higher than in some other studies using discretized chips (Hedrich et al., 2009), indicating that insufficient discretization is unlikely to be a confound.

Second, non-uniformity of the matching palette makes it difficult to compare variability of color matches between cubes. For example, the greater variability in color matches for red than green (see **Figure 7**) could result either from more perceptual variability or from a less densely sampled palette. Indeed, we reported a negative correlation between palette density and variability of color matches (**Figure 9**). Since there was no correlation between palette density and average color constancy (**Figure 9**), palette non-uniformity is less likely to affect the interpretation of constancy for individual cubes.

#### **5. CONCLUSIONS**

As noted in the introduction, there are two broad classes of approach as we seek to move from relatively simple, parametrically manipulated stimuli and tasks to the full complexity of realistic scenes. One approach takes incremental steps, predicting and then testing the effect of manipulating one particular stimulus aspect such as object slant (Bloj et al., 2004) or cues to depth (Werner, 2006). Here we took the complementary approach of utilizing as realistic a scene and task as possible. We do not view our data as endorsing a specific theoretical view or mechanistic model of constancy; rather, we have the much more modest goal of providing some empirical constraints as we elaborate further theories of color vision. Our results suggest that average color constancy across illumination should remain high but variability should increase. Furthermore, the addition of a background either with or without an illumination change should introduce relatively few errors in average matches and should decrease matching variability. However, there are several limitations to our approach that caution against over-generalization.

First, although our stimuli and matching palette were real, relatively rich scenes, many real world scenes contain variables that our scenes did not. For example, real objects may not be uniformly colored, or they may contain textures or specular highlights that provide additional information to the visual system. Second, although the illumination varied within booths, real scenes may have both abrupt and gradual illumination changes, and may vary over many orders of magnitude greater than ours (Xiao et al., 2012). Third, although observers performed an identification task with a real matching palette, the matching palette was not 3D. In some real-world identification tasks, observers have additional cues such as shape that combine with color to guide behavior. Fourth, we note that although we chose a wide variety of cube and background colors (**Figures 1**, **3**), we did not parametrically manipulate either. As noted previously, there is a complex and sometimes contradictory literature surrounding the magnitude and direction of expected simultaneous contrast or color induction effects (see Ekroll and Faul, 2012, for discussion). Although in aggregate we found no effect of background, certain cube/background pairs (e.g., dark green) had higher error indices, and it remains possible that there is a subset of stimuli where backgrounds would have a larger effect. Lastly, we focused solely on reflectance judgments, and the distinction between appearance and reflectance judgments may be of particular importance in scenes like those used here. For example, it is clear from visual inspection of the cubes that each face of the cube appears different in some way, even though it is also easy to see that the cube is uniformly painted.

Taken together, these points suggest caution against overgeneralization of our results. An important avenue for future research is to determine the relative importance of each of these factors in constraining our ability to generalize from color matching in simplified laboratory tasks to the color tasks faced by individuals in everyday experience.

#### **ACKNOWLEDGMENTS**

The authors would like to acknowledge Jeremy Bell, Patrice McCarthy, and Jessica McLaren for their help in collecting data.

#### **FUNDING**

This research was funded by NSF-BCS 0954749 to Sarah Allred.

#### **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 13 June 2013; paper pending published: 06 July 2013; accepted: 15 October 2013; published online: 11 November 2013.*

*Citation: Allred SR and Olkkonen M (2013) The effect of background and illumination on color identification of real, 3D objects. Front. Psychol. 4:821. doi: 10.3389/fpsyg.2013.00821*

*This article was submitted to Perception Science, a section of the journal Frontiers in Psychology.*

*Copyright © 2013 Allred and Olkkonen. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Reflectance, illumination, and appearance in color constancy

#### *John J. McCann<sup>1</sup>\*, Carinna Parraman<sup>2</sup> and Alessandro Rizzi<sup>3</sup>*

*<sup>1</sup> McCann Imaging, Arlington, MA, USA*

*<sup>2</sup> Centre for Fine Print Research, University of the West of England, Bristol, UK*

*<sup>3</sup> Dipartimento di Informatica, Università degli Studi di Milano, Milano, Italy*

#### *Edited by:*

*Galina Paramei, Liverpool Hope University, UK*

*Reviewed by:*

*Maria Olkkonen, University of Pennsylvania, USA Ana Radonjic, University of Pennsylvania, USA*

#### *\*Correspondence:*

*John J. McCann, McCann Imaging, 30 Spy Pond Parkway, Arlington, MA 02474, USA e-mail: mccanns@tiac.net*

We studied color constancy using a pair of identical 3-D Color Mondrian displays. We viewed one 3-D Mondrian in nearly uniform illumination, and the other in directional, nonuniform illumination. We used the three dimensional structures to modulate the light falling on the painted surfaces. The 3-D structures in the displays were a matching set of wooden blocks. Across Mondrian displays, each corresponding facet had the same paint on its surface. We used only 6 chromatic, and 5 achromatic paints applied to 104 block facets. The 3-D blocks add shadows and multiple reflections not found in flat Mondrians. Both 3-D Mondrians were viewed simultaneously, side-by-side. We used two techniques to measure correlation of appearance with surface reflectance. First, observers made magnitude estimates of changes in the appearances of identical reflectances. Second, an author painted a watercolor of the 3-D Mondrians. The watercolor's reflectances quantified the changes in appearances. While constancy generalizations about illumination and reflectance hold for flat Mondrians, they do not for 3-D Mondrians. A constant paint does not exhibit perfect color constancy, but rather shows significant shifts in lightness, hue and chroma in response to the structure in the nonuniform illumination. Color appearance depends on the spatial information in both the illumination and the reflectances of objects. The spatial information of the quanta catch from the array of retinal receptors generates sensations that have variable correlation with surface reflectance. Models of appearance in humans need to calculate the departures from perfect constancy measured here. This article provides a dataset of measurements of color appearances for computational models of sensation.

**Keywords: color constancy, measured appearance, sensations, high-dynamic-range (HDR) scenes, discounting illumination, 3-D test targets**

#### **INTRODUCTION**

Colorimetry and traditional color photography have fixed responses to spectral light. For each local area, the quanta catch of the light sensors determines the response of the system. For colorimetry, the quanta catch determines the match; and for silver-halide photography the quanta catch determines the optical density of the image. Their color processing has no color constancy. Humans sense their visual environment in real complex scenes. Humans have color constancy, such that appearance is largely indifferent to illumination. Further, they are insensitive to the changes in scene radiances due to shadows. This indifference in complex scenes is the result of spatial image processing. For centuries the best scene capture and rendering has been by artists who simply painted appearance. By simultaneously rendering the entire scene's spatial content in paint on a flat plane they recorded the visual equivalent of the real scene's greater range of light in nonuniform illumination.

In the late 19th century, discussions of constancy began with the study of the appearances of objects in different illuminations. In 1872 Hering wrote: "The approximate constancy of the colors of seen objects, in spite of large quantitative or qualitative changes of the general illumination of the visual field, is one of the most noteworthy and most important facts in the field of physiological optics. Without this approximate constancy, a piece of chalk on a cloudy day would manifest the same color as a piece of coal does on a sunny day, and in the course of a single day it would have to assume all possible colors that lie between black and white." (Hering, 1872/1905).

At least four different kinds of color constancy are studied today. Although these disciplines have common roots, they have grown apart in their basic assumptions, terminology, and goals for successful implementation. These disciplines ask observers distinctly different questions and get answers that superficially seem to be contradictory. The Optical Society of America used a pair of definitions for sensation and perception that followed the ideas of the Scottish philosopher Thomas Reid. Sensation is a "Mode of mental functioning that is directly associated with the stimulation of the organism" (OSA Committee on Colorimetry, 1953). Perception is more complex, and involves past experience. Perception includes recognition of the object. It is helpful to compare and contrast these terms in a single image to establish our vocabulary as we progress from 18th century philosophy to 21st century image processing. **Figure 1** is a photograph of a raft (a swimming float) in the middle of a lake (McCann and Houston, 1983a; McCann, 2000a). The photograph was taken in early morning. The sunlight fell on one face of the raft, while the skylight illuminated the other face. The sunlit side reflected about 10 times more 3000°K light than the 20,000°K skylight side. The two faces had very different radiances, and hence very different colorimetric *X*, *Y*, *Z* values.

For sensation measurements, observers can select the colors they see from a lexicon of color samples, such as the Munsell Book, or the catalog of paint samples from a paint store. The question for observers is to find the paint sample that a fine-arts painter would use to make a realistic rendition of the scene. Observers say that a bright white with a touch of yellow looked like the sunlit side, and a light gray with a touch of blue looked like the sky-lit side. The answer to the sensation question was that the two faces were similar, but different.

**FIGURE 1 | Photograph of a swimming raft with sunlight illumination on the right and skylight illumination on the left.** Observers report sensations that are lighter and more yellow in the sun; and darker and more blue in skylight. Observers also report that the two sides of the float are perceived to have the same white paint, despite their different appearances.

For perceptions, observers can select the colors from the same catalog of paint samples, but with a different question. The perception question was to find the paint sample that a house painter would use to repaint the raft using the same paint. For this question, observers selected white paint. They recognized that the paint on both sides of the raft is white with different illuminations. The surface perception question renders the two faces identical.

In summary, the raft faces are very different, or similar, or identical depending on whether the experimenter is measuring colorimetry, or sensation, or perception. We need completely different kinds of image processing algorithms in order to model the three different answers to these three questions. Colorimetry models predict receptor responses; sensation models predict the color appearances; and perception models predict the observer's recognition of the object's surface. Subsequent experiments asked the same question, using a slightly different vocabulary (Arend and Goldstein, 1987). They found the same result. Namely, observers' responses depended on the observers' task.

#### **COLOR CONSTANCY MODELS**

Human color constancy involves the spatial content of the scene. As we will observe in this paper, it depends on the reflectances of objects, the spectral content and spatial distribution of the illumination, and the arrangement of the scene. There are a number of models of color constancy used to predict colors from the array of radiances coming to the eye, or the camera. They not only use a variety of image processing assumptions, they have different sets of required information, and different goals for the model to calculate. **Table 1** lists the names of four types of models, their goals (result of the calculation), their required information (inputs to calculation), mechanisms, their dependence on surface reflectance and references (**Table 1**-row 1).

#### *Retinex*

Land's Color Mondrian experiment (Land, 1964; Land and McCann, 1971) used a flat array of matte colored papers. He varied the amounts of uniform R, G, and B illumination over the entire array of more than 100 papers. He measured the light coming from a paper, then moved to a second paper and changed the illumination so that the second paper sent the same local stimulus to the eye. This experiment demonstrated that identical retinal stimuli can generate all colors. A red paper still looked red when its illumination was altered so that it was the same light stimulus as a green paper. The quanta catch of the retina at a pixel does not correlate with appearance. Color constancy measurements showed that color appearance correlates with the scaled integrated reflectance of the paper in Land's Color Mondrian (McCann et al., 1976). This good correlation uses Scaled Integrated Reflectance, not the usual spectral surface reflectance curves measured with a narrowband spectral radiometer. This integrated reflectance has L, M, S values that are the product of the spectral surface's reflectance, its irradiance, and the L, M, S retinal cone sensitivity functions. The L reflectance is the L cone response to the surface divided by the L cone response to an adjacent white paper in the same illumination. The scaling is done by the CIE L\* cube root function, which approximates a correction for scatter in the eye at lower reflectances (McCann and Rizzi, 2012, ch. 14, 18).
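The scaled integrated reflectance described above can be illustrated with a minimal sketch. It assumes that spectra are sampled at a common set of wavelengths and that integration is a plain sum over samples; the function names are hypothetical, and the scaling step uses the standard CIE 1976 L\* formula rather than any procedure specific to the cited papers.

```python
def cone_response(reflectance, irradiance, sensitivity):
    """Sum over wavelength samples of reflectance x irradiance x cone sensitivity."""
    return sum(r * e * s for r, e, s in zip(reflectance, irradiance, sensitivity))

def integrated_reflectance(surface, white, irradiance, sensitivity):
    """Cone response to the surface divided by the response to a white paper
    under the same illumination (one value per cone type)."""
    return (cone_response(surface, irradiance, sensitivity)
            / cone_response(white, irradiance, sensitivity))

def cie_lstar(ratio):
    """CIE 1976 L* of a relative value in [0, 1] (cube root above a small threshold)."""
    delta = 6 / 29
    if ratio > delta ** 3:
        f = ratio ** (1 / 3)
    else:
        f = ratio / (3 * delta ** 2) + 4 / 29
    return 116 * f - 16

# Hypothetical three-sample spectra, for illustration only.
surface = [0.2, 0.4, 0.6]
white = [1.0, 1.0, 1.0]
irradiance = [1.0, 2.0, 1.0]
l_sensitivity = [0.1, 0.5, 1.0]
scaled_l = cie_lstar(integrated_reflectance(surface, white, irradiance, l_sensitivity))
```

Because the white paper shares the surface's illumination, multiplying the irradiance by a constant leaves the ratio unchanged. A change in the spectral shape of the irradiance does alter it, because the broad cone sensitivities weight each wavelength differently.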

McCann et al. (1976) calculated the paper's appearance using spatial comparisons. Further, cone sensitivity functions have considerable overlap. The L cones respond to middle-wave light, etc. The observed colors showed that the spatial comparison model predicted observer matches. The measured discrepancies from perfect constancy were predicted by crosstalk between the cone sensitivity spectra (McCann et al., 1976; McCann, 2004c, 2005a).

Land's Retinex model requires, as input, the spectral radiances at each pixel in the field of view. Its goal is to calculate the appearance of all color sensations in the scene. It builds color appearances out of spatial comparisons. Land said "... the function of retinex theory is to tell how the eye can ascertain reflectance in a field in which the illumination is unknowable and the reflectance is unknown." (Land and McCann, 1971). Later Retinex papers restated the language using edges and gradients, instead of illumination and reflectance. This was a result of studies of real life scenes in which gradients in reflectance are difficult to see, and shadows with abrupt edges in illumination are highly visible (McCann, 1999b, 2004b).

Retinex, and other related models of vision, calculate sensations (McCann and Rizzi, 2012, p. 283–371). The correlation between surface reflectance and sensation depends on the scene's spatial content (**Table 1**-row 2).

#### *CIELAB and CIECAM*

Helmholtz proposed the idea that humans discount illumination (von Helmholtz, 1866/1962), so that appearances correlated with recognizing the object, namely its reflectance. This principle is incorporated in pixel-based color appearance models such as 1976 CIELAB and 2002 CIECAM (CIE, 1978, 2004). These models use physical measurements of the illumination to normalize radiances from objects and remove the radiance information contained in the illumination. These models cannot predict color appearance without measurements of illumination at the pixel of interest as input. CIECAM requires that the user assign scene-dependent coefficients c (viewing condition parameter), Nc (chromatic surround induction factor), and F (surround parameter). These parameters have to be set by inspecting the scene (Moroney et al., 2002; Hunt, 2004). They are not calculated from the array of scene radiances (**Table 1**-row 3).

CIELAB and CIECAM use a pixel's scene radiance and that pixel's illumination. If two pixels from different parts of a scene have the same reflectance, but different illumination, then CIELAB and CIECAM predict identical outputs. These models predict that sensations always correlate with surface reflectances. CIELAB and CIECAM transform the color space of the scene radiances, but equal reflectances always generate equal sensations. There is no mechanism to introduce spatial variations caused by scene content.
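The pixel-based normalization can be made concrete with a short sketch of the standard CIE 1976 L\*a\*b\* formulas. The simplification that a surface scales each tristimulus value of its illuminant by a fixed factor, and the numerical values for the two illuminants, are assumptions made only for illustration.

```python
def _f(t):
    """CIELAB companding function: cube root with a linear segment near zero."""
    delta = 6 / 29
    return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

def xyz_to_lab(xyz, white):
    """CIE 1976 L*a*b* of tristimulus values `xyz` relative to the white point `white`."""
    fx, fy, fz = (_f(c / w) for c, w in zip(xyz, white))
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))

# Hypothetical per-channel reflectance factors of one paint.
paint = (0.30, 0.40, 0.20)

# Two hypothetical illuminants: a bright yellowish one and a dim bluish one.
sun_white = (100.0, 90.0, 60.0)
sky_white = (8.0, 10.0, 14.0)

# Light reaching the eye from the same paint under each illuminant.
xyz_sun = tuple(r * w for r, w in zip(paint, sun_white))
xyz_sky = tuple(r * w for r, w in zip(paint, sky_white))

lab_sun = xyz_to_lab(xyz_sun, sun_white)
lab_sky = xyz_to_lab(xyz_sky, sky_white)
```

Here `lab_sun` and `lab_sky` agree to within floating-point rounding although `xyz_sun` and `xyz_sky` differ by roughly an order of magnitude: dividing by the local white point removes the illumination, so the output depends only on the paint.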

#### *Computer vision*

Computer Vision Color Constancy algorithms work to remove the illumination measurement limitation found in CIE colorimetric standards by calculating illumination from scene data. The image processing community has adopted this approach to derive the illumination from the array of all radiances coming to the camera. Since estimating the illuminant from the pixel array is a multidimensional ill-posed problem, computer vision models need to apply some constraints on the scene. These constraints can concern the spectral content or geometry of the illuminant, the statistics of reflectances, etc. For example, one of the assumptions used in many Gray-World algorithms is that scenes have a constant average reflectance (Buchsbaum, 1980; Funt and Drew, 1988). If true, then Gray-World algorithms can use the average radiance of all pixels to measure the spectral distribution of the illuminant.

As long as the illumination is constant for all pixels in the scene, each pixel's radiance divided by the calculated illumination will equal that pixel's reflectance. Computer-vision models measure success by how well they can calculate an object's reflectance in different spectral illuminants. In order to use these models in a discussion about human vision, we need to perform a separate psychophysical experiment to test whether appearances correlate with reflectance for the image in question. One should not use such models for vision in situations where appearance deviates from reflectance. These models often assume perfect color constancy, which is quite different from the approximate constancy found in humans. This field has been studied by Horn (1974), Buchsbaum (1980), Marr (1982), Funt and Drew (1988), Richards (1988), D'Zmura and Iverson (1993a,b), Sinha and Adelson (1993), Adelson and Pentland (1996), Finlayson et al. (1997), Purves and Lotto (2003), Zickler et al. (2008), Foster et al. (2009), Gevers et al. (2012) (**Table 1**-row 4).
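A minimal sketch of the Gray-World reasoning follows. It is not any particular published algorithm: the function name, the assumed mean reflectance of 0.5, and the three-pixel scene are all hypothetical. The scene is constructed so that the Gray-World assumption holds exactly and the illumination is uniform, which are the two conditions the text names.

```python
def gray_world(radiances, mean_reflectance=0.5):
    """Estimate the illuminant and per-pixel reflectances from channel radiances.

    radiances: list of per-pixel tuples, one value per spectral channel.
    mean_reflectance: assumed scene-average reflectance (the Gray-World constraint).
    """
    n = len(radiances)
    channels = range(len(radiances[0]))
    # Average radiance per channel, rescaled by the assumed average reflectance.
    illuminant = [
        sum(pixel[c] for pixel in radiances) / n / mean_reflectance
        for c in channels
    ]
    # With uniform illumination, radiance / illuminant recovers reflectance.
    reflectances = [
        tuple(pixel[c] / illuminant[c] for c in channels) for pixel in radiances
    ]
    return illuminant, reflectances


# Hypothetical scene: reflectances average 0.5 in every channel.
true_reflectances = [(0.2, 0.5, 0.9), (0.8, 0.5, 0.1), (0.5, 0.5, 0.5)]
true_illuminant = (2.0, 1.0, 0.5)  # hypothetical uniform illuminant
radiances = [
    tuple(r * e for r, e in zip(pixel, true_illuminant))
    for pixel in true_reflectances
]

illuminant, reflectances = gray_world(radiances)
print(illuminant)
print(reflectances)
```

If the scene average departs from the assumed value, or the illumination varies across pixels, the recovered reflectances are biased accordingly.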

Summaries of Computer Vision Color Constancy are found in Ebner (2007) and Gevers et al. (2012). Many of these studies use shared datasets to optimize their algorithms. Instead of each experiment devoting the authors' resources to making complete sets of measurements of each phenomenon, computer vision research often collaborates by the use of shared data. Examples of datasets of images provided for other authors to test their algorithms are found in Grosse et al. (2009) and Gevers et al. (2012).

Color Constancy in Computer Vision searches for the object's intrinsic surface properties. That definition sets the algorithm's goal as finding surface reflectance. That goal implies the accurate calculation of illumination from the array of scene radiances.

#### *Surface perception*

Surface Perception algorithms study and model the observer's ability to recognize the surface of objects. Following Hering's concern that chalk should not be mistaken for coal, the objective is to predict human response to questions about recognizing an object's surface. Here the subjects are asked the house painter's question: what paint is on the surface? Techniques include the analysis of cues from specular reflections and application of Bayesian inferences. This field has been studied by Helson (1947, 1964), Lee (1986), Yang and Maloney (2001), Bloj et al. (2004), Brainard and Maloney (2004), Ripamonti et al. (2004), Smithson and Zaidi (2004), Brainard et al. (2006), Gilchrist (2006), Foster et al. (2009), Kingdom (2011).

Helson (1947, 1964) believed that the complex visual image generated a "pooled effect of all stimuli," to which the organism was "attuned or adapted." Helson's level of reference is centrally stored and used as the reference for all judgments. Many elements of the visual environment are suggested to play a role in such a global normalization factor, such as visual pigment adaptation, the history of reflectances in the field of view, and temporal distribution of cues (Smithson and Zaidi, 2004). See Brainard and Maloney (2004) for a summary (**Table 1**-row 4).

All four types of algorithms listed in **Table 1** do well with their predictions in the flat uniformly illuminated Color Mondrian. As long as the illumination is uniform, the sensation predictions of Retinex and CIECAM models are similar. Further, sensations correlate with reflectance in uniform illumination (McCann et al., 1976). That has led some authors to suggest that Computer Vision algorithms can be used in modeling vision (Ebner, 2007). All four of the distinct Color Constancies listed in **Table 1** involve the interpretation of scene radiances. Beyond that common variable, they differ in their use of reflectance, illumination, scene content, sensations, and perceptions.

The experiments in this paper introduce a different set of requirements for color appearance models. Here, we use a restricted set of reflectances and highly variable illumination. By varying the spatial structure in the illumination we have more realistic stimuli representing complex scenes, and we greatly increase the dynamic range of the scene. We have more information to help sort out the importance of radiance, reflectance and illumination, as well as scene content, including edges and gradients, in modeling human vision. By studying the effects of spatial structure in illumination we will attempt to compare and contrast the roles of reflectance and illumination in these four types of color constancy.

#### **DATASET OF SENSATIONS—DEPARTURES FROM PERFECT CONSTANCY**

These experiments measure sensations. They ask the "fine arts painter" question. What is the appearance of the surface?

This article describes experiments that measure the departures from perfect constancy in complex scenes. If human color sensation constancy were perfect then the same surface paint would generate identical sensations in all illuminations and in all scenes. Human constancy is rarely perfect. It is observed only when the retinal quanta catches are constant in surrounding scenes that are identical. What is remarkable about human vision is how small the departures are in most scenes. We can measure these departures from perfect constancy to test computational models of sensations. In other words, departures from perfect constancy are the signature of the underlying mechanisms.

The much earlier paper by McCann, McKee, and Taylor (McCann et al., 1976): (1) measured the sensations in flat 2-D Mondrians. They modeled observer results with: (2) measured scene radiances; (3) calculated cone responses; and (4) spatial algorithm calculations of color sensations. They successfully modeled the observer results. They found that appearances in *that* Mondrian correlated with the spectral measurements of reflectance using spatial comparisons (edge ratios) of cone responses.

We are in the process of performing the same steps here. However, our scenes are much more complicated. This paper intends to perform only the first step by collecting the dataset of departures. Other more complex steps will follow. This paper describes the measurement of a dataset of departures from perfect constancy in 3-D Mondrians. Here, the three dimensions of the target are used to modulate illumination. We are not studying the perceptual effects of depth perception. Here, the objects modify the illumination by introducing gradients and edges.

The process for calculating the cone quanta catch generated by these 3-D Mondrians is beyond the scope of this paper. Both cameras and human intraocular glare introduce different major spatial transformations of scene radiances measured by a telephotometer. You cannot use camera data, even RAW camera data, as an input to computational models of vision. Detailed calibrations are needed to prove that a particular digital camera image is an accurate record of the scene (McCann et al., 2013). As well, we need to use the CIE model of intraocular glare (Vos and van den Berg, 1999) to calculate the retinal image. Glare is a variable addition to scene radiances depending on scene content. Radiance measurements of the scene do not represent retinal stimuli, particularly in high-dynamic-range scenes (McCann and Rizzi, 2012, p. 113–171). As well, the implementation of spatial algorithms (McCann and Rizzi, 2012, p. 283–371) to calculate predictions of sensations from the retinal image is beyond the scope of this paper.

#### **MATERIALS AND METHODS**

This article studies human color appearances of surface reflectances in real complex scenes. These scenes are made up of two copies of the same surfaces (wooden blocks with matte paints). There are only 11 paints (R, Y, G, C, B, M, W, G7.5, G6, G4, K). **Figure 2** (left) shows a circular test target with 11 painted sections. **Figure 2** (right) lists the Munsell chip closest to each paint, evaluated in daylight. The colors were selected among matte surface paints. The five grays were selected to include the maximum and minimum reflectances, two paints near middle gray, and a light gray. The six colors were selected to be red, neither orange red nor purple red; yellow, neither warm nor cool yellow; ... etc. They have high chroma, but not maximal chroma.

We worked with these 11 painted surfaces to construct two parts of a 3-D complex scene.


Both 3-D Mondrians were made of two sets of identical wooden blocks. They used the same paint on each corresponding facet. Both the LDR and the HDR parts of the scene were viewed in the same room at the same time. Ideally the LDR illumination would be perfectly uniform. That would restrict the range of scene radiances to the range of surface reflectances. While this is possible with flat Mondrians, measurements of surfaces in our LDR illumination cube showed a small range of nonuniformity.

HDR scenes are generated by directional light and the presence of light emitters. We use the terms LDR and HDR as labels of our experimental illumination, and they should not be confused with tone-mapping algorithms in digital photography.

By varying the illumination on constant surfaces we can measure the extent of color constancy of sensations. Does appearance correlate with the object's physical reflectance, or scaled integrated reflectance? How does appearance change with different illuminations? Does the spatial content of the illumination play a role in appearance?

All color appearance measurements were made on the combined LDR/HDR display described in Section LDR/HDR Display. The first measurement set (Section Magnitude Estimation Color Appearance in the Munsell Book) was made using observer magnitude estimates of the changes in appearance with reference to distances in the Munsell Book. In the second measurement set (Section Artist's Rendering of Scene Appearances), an artist recorded the color appearances of all the blocks in a watercolor painting. We then measured the visible reflectance spectrum of each area's color match. By painting the entire scene we measured the appearance of a facet in the surround equivalent to that in the scene.

#### **LDR/HDR DISPLAY**

We made a pair of photographs of the two parts of the scene. **Figure 3** (left) shows the LDR part. The blocks were inside an illumination cube with a white floor, translucent top and sides, and a black background. We directed eight tungsten-halide spotlights on the sides and top of the illumination cube. The combination of multiple lamps with identical emission spectra, light-scattering cloth of the illumination cube, and highly reflective walls made the illumination nearly uniform. Departures from perfect uniformity came from shadows cast by the 3-D objects, and the open front of the cube for viewing.

**FIGURE 3 | Photographs of the Low-Dynamic-Range (LDR) part of the scene on left, and High-Dynamic-Range (HDR) on right.**

**Figure 3** (right) is a photograph of the HDR 3-D Color Mondrian illuminated by two different lights. One was a 150 W tungsten spotlight on the right side of the HDR Mondrian at the same elevation. It was placed 2 m from the center of the target. The second light was an array of WLEDs assembled in a flashlight. It stood vertically and was located 20 cm from the display on the left. Although both illuminants are "white lights," they have different emission spectra. The placement of these lamps produced highly nonuniform illumination and increased the dynamic range of the scene (McCann et al., 2009a,b). The overhead lights in the room were shut off, but the display provided sufficient working illumination.

In the HDR 3-D Mondrian, the black back wall had a 10 cm circular hole cut in it. Behind the hole was a small chamber with a second black wall 10 cm behind the first. We placed the flat circular test target on the back of this chamber. It was placed so that none of the direct light from either lamp fell on the circular target. That target was illuminated by light reflected by the black walls of the chamber. The target in the chamber had much less illumination than the same paints on the wooden blocks. The target in the chamber significantly increased the range of the nonuniform display. Nevertheless, observers had no difficulty seeing the darker circular target.

**Figure 4** identifies the 104 painted facets measured in these experiments. This information is needed to identify the individual areas in the LDR and HDR displays. The highest luminance was 273 cd/m<sup>2</sup>. It was the white paint facet #60 in the HDR portion. The lowest HDR facet was 0.73 cd/m<sup>2</sup> (black paint, facet #94), giving a range of 377:1.

In the LDR portion the highest luminance was 248 cd/m<sup>2</sup> (white paint facet #9) and the lowest was 3.4 cd/m<sup>2</sup> (black facet #21), giving a range of 74:1.

The range of luminances for white and black paint in uniform illumination is 17:1. We measured the radiances of each facet in both LDR and HDR parts of the target [Appendix 2 (Data Sheet)—data normalized and scaled].

#### **MAGNITUDE ESTIMATION COLOR APPEARANCE IN THE MUNSELL BOOK**

We asked 10 observers to measure the color appearances of identical painted surfaces. The average age of observers was 32; there were 6 males and 4 females. All observers reported that they had their color vision tested. They all had normal color vision. The experimental design was reviewed by the Ethical Committee of Università degli Studi di Milano.

Before the start of the experiment, we informed observers that they were comparing the appearance of wooden facets with identical painted surfaces. Each observer was given two documents. One was a data sheet to be used to record their responses. For example, the HDR part of the sheet had a color photograph of the HDR scene with five arrows pointing to five facets with R paint, labeled R1–R5. Adjacent to each arrow, there was a location for the observer to report the Hue, Lightness, and Chroma changes in appearance from "Ground Truth" of that red paint facet. The four-page data sheet had 9 LDR photographs, each with specific arrows identifying the facets to be evaluated for that paint. It also had 9 HDR photographs with arrows identifying the same facets. The four-page form identified a selection of 37 areas in both the LDR and HDR parts. The 37 LDR and HDR facets were chosen to represent examples of nine paints. Some were chosen to document the changes in appearance in the LDR part, and others for changes in appearance in the HDR part. Nine areas in the box behind the HDR circular hole were included.

The second document handed to each observer was a copy of **Figure 5**. It provided written guidance on their magnitude estimates. The observers were shown a pair of painted circular test targets (**Figure 2** left) placed on the floor of each display, in uniform light. This circular array of the paints was defined to be the appearance of ground truth. They were told that all the flat surfaces had the same paints as those on the "ground truth" targets. We explained that we were asking about what the area looked like: its appearance, its sensation (OSA Committee on Colorimetry, 1953).

If the observer reported that a facet appeared the same as the red paint applied to the "ground truth" test target, we called that an example of perfect color constancy. In other words, the human did a perfect job of ignoring the illuminant. When a facet appeared different from the paint in the "ground truth" target, we asked the observer to estimate the change in sensation using guidelines reproduced in **Figure 5**.

Observers were asked: "Do the selected facets have the same appearance as ground truth?" If not, they described the direction and magnitude of the change in appearance using the following procedure. Observers estimated hue changes starting from each of the six patches of colors labeled R, Y, G, C, B, and M. The written instructions stated: "If the facet changes hue, estimate how much it moved toward another color?" They considered changes in the hue as a percentage between one hue (e.g., R), and another hue (e.g., Y). For example, 50%Y indicates a hue shift to a color halfway between R [Munsell 2.5R] and Y [Munsell 2.5Y] (**Figure 5** left). 50%Y equals Munsell 2.5YR. 100%Y meant a total shift of hue to Y.
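The percentage hue estimates can be read as linear interpolation on the 100-step Munsell hue circle (ten hue families of ten steps each). The sketch below illustrates that reading; the function names are hypothetical, and taking the shorter way around the circle is an assumption of this sketch rather than something stated in the instructions to observers.

```python
# Standard order of the ten Munsell hue families around the hue circle.
FAMILIES = ["R", "YR", "Y", "GY", "G", "BG", "B", "PB", "P", "RP"]


def hue_to_number(hue):
    """Convert a Munsell hue such as '2.5YR' to a position on the 0-100 circle."""
    i = 0
    while hue[i].isdigit() or hue[i] == ".":
        i += 1
    step, family = float(hue[:i]), hue[i:]
    return (FAMILIES.index(family) * 10 + step) % 100


def number_to_hue(number):
    """Convert a position on the 0-100 circle back to Munsell hue notation."""
    number %= 100
    index, step = divmod(number, 10)
    if step == 0:  # a family boundary is written as step 10 of the previous family
        index, step = index - 1, 10.0
    return f"{step:g}{FAMILIES[int(index) % 10]}"


def shift_hue(start, target, percent):
    """Move `percent` of the way from `start` toward `target` (shorter arc)."""
    a, b = hue_to_number(start), hue_to_number(target)
    delta = (b - a + 50) % 100 - 50  # signed difference in hue steps
    return number_to_hue(a + delta * percent / 100)


print(shift_hue("2.5R", "2.5Y", 50))   # the "50%Y" example from the text
print(shift_hue("2.5R", "2.5Y", 100))  # a total shift of hue to Y
```

With the endpoints quoted in the text, a 50% shift lands on 2.5YR and a 100% shift lands on 2.5Y.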

Observers estimated lightness differences on a Munsell-like scale indicating either increments or decrements, for the apparent lightness value (**Figure 5**, center). They were given the Munsell Lightness Values of *W* = 10; G7.5 = 7.5; G6 = 6; G4 = 4; *K* = 1. The instructions said: "If the facet looks the same lightness as the standard area G6, write down G6. If the facet looks lighter, or darker, estimate how much using the ground truth lightness values." They were asked to estimate the apparent lightness of each area.


Observers estimated chroma by assigning paint sample data relative to ground truth defined as 100% (**Figure 5** right). If the sample had the same chroma as ground truth then they gave the value 100%. Zero % was assigned to achromatic appearances. In case the target patch appeared more saturated than ground truth, estimates could be greater than 100% (Parraman et al., 2009).

To relate ground truth to the Munsell Book Notation, we matched the 11 painted ground truth samples by placing Munsell chips on top of the paint samples in daylight. The Munsell notations of the 11 paints are listed in **Figure 2**.

Observers reported the direction and magnitude of changes in appearance from ground truth. We used linear scaling to calculate the Munsell designation of the matching Munsell chip. We used the distance in the Munsell Book, as described in the MLAB color space (Marcu, 1998; McCann, 1999a), as the measure of change in appearance. We assumed that the Munsell Book of Color is equally spaced in color. MLAB converts the Munsell designations to a format similar to CIELAB, but avoids its large departures from uniform color spacing (McCann, 1999a). When the observer reports no change in appearance from illumination, the MLAB distance is zero. A change as large as white to black (Munsell 10/ to Munsell 1/) is an MLAB distance of 90.

The results, presented in Appendix (Data Sheet) 1, are the average ± standard error of the mean of 10 observers' estimates of the selected areas in the pair of LDR and HDR 3-D Mondrians. We converted the observer magnitude estimates to an observed Munsell chip designation, and then to MLAB Color space. Munsell Lightness Values vary from just under 10 to 1, whereas L\* in L\*a\*b\* varies from 100 to 0, so we multiplied estimated Munsell Lightness Values by 10.

$$ML = 10 \ast \left( \text{Munsell Lightness Value} \right) \tag{1}$$

$$Mb = 5 \ast \left( \text{Munsell Chroma} \ast \sin \left( \text{Hue Angle} \ast PI / 180 \right) \right) \tag{2}$$

$$Ma = \left( (5 \ast \text{Munsell Chroma})^2 - Mb^2 \right)^{0.5} \tag{3}$$

We multiplied Munsell Chroma by 5 and by the sine of the hue angle to calculate *Mb*. Ma is the third side of the triangle in the Chroma plane for that Lightness (McCann, 1999b). We averaged the 10 observed *ML*, *Ma*, *Mb* values for each chip. This representation of the data allows us to calculate distance in the uniform Munsell Space.
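The conversion can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the function name is hypothetical, *Ma* is taken as the remaining side of the right triangle whose hypotenuse is 5 × Chroma (the geometric description given in the text, which yields a magnitude only), and the hue-angle convention is not specified in this section, so the angle in the example is a hypothetical value chosen to land near the MLAB values reported in the Results for the red ground-truth paint (2.5R 4/14).

```python
import math


def munsell_to_mlab(value, chroma, hue_angle_deg):
    """Convert Munsell Value, Chroma, and a hue angle (degrees) to (ML, Ma, Mb)."""
    ml = 10 * value                                        # Equation (1)
    radius = 5 * chroma                                    # chroma scaled by 5
    mb = radius * math.sin(math.radians(hue_angle_deg))    # Equation (2)
    # Remaining side of the right triangle with hypotenuse `radius` (assumption);
    # max() guards against tiny negative values from floating-point rounding.
    ma = math.sqrt(max(radius ** 2 - mb ** 2, 0.0))
    return ml, ma, mb


# Hypothetical hue angle for illustration only.
ml, ma, mb = munsell_to_mlab(value=4, chroma=14, hue_angle_deg=5.74)
print(round(ml), round(ma), round(mb))
```

Because the square root discards the sign of *Ma*, a full implementation would need a sign convention for hues on the opposite side of the chroma plane; that convention is left open here.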

In McCann et al. (1976) observers matched rectangles in flat displays in uniform illumination to the Munsell Book. There the departures from perfect constancy were small. We used matches to the Munsell Book for greater accuracy. The average standard deviation for a match was close to ± one Munsell chip for this technique. In preparing these 3-D Mondrians, we observed how large the departures were. Some of them were as large as 60% of the range between white and black. Matching to Munsell chips is much slower, and more difficult with nonuniform illumination. We chose to use magnitude estimation techniques (Stevens, 1975; Bodmann et al., 1979) because they are more efficient. Although magnitude estimation increases the variance of measurements, the mean data is reliable and repeatable. Observers estimated the linear change in appearance in the uniformly spaced Munsell Book. Wyszecki and Stiles (1982, Appendix) provide a MacAdam table of *Y*, *x*, *y* values for Munsell Designations that extend to the spectrum locus. Thus, magnitude estimates can extend beyond the limitations of Munsell's samples. We chose to use magnitude estimation so we could increase the number of observers.

#### **ARTIST'S RENDERING OF SCENE APPEARANCES**

After observers finished the Magnitude Estimations of Munsell designations, we left the pair of 3-D Mondrians in place. One of the authors (Carinna Parraman) painted a watercolor rendition of both 3-D Mondrians on paper (Parraman et al., 2010). The painting took a long time because the aim was a reproduction as close as possible to the appearances in both displays. Painters usually apply a particular "aesthetic rendering" that is part of their personal style. In this case the painter worked to present on paper the most accurate reproduction of appearances possible. As with the magnitude estimation measurements, both LDR and HDR were viewed and painted together in the same room at the same time.

We made reflectance measurements of the watercolor with a Spectrolino® meter in the center of the areas identified in **Figure 4**. We measured the reflectance spectra of both LDR and HDR watercolor paintings at each of the 104 facets. The meter reads 36 spectral bands, 10 nm apart over the range of 380–730 nm. We calibrated the meter using a standard reflectance tile.

We considered how to represent these reflectance measurements taking into account human vision. Analysis of percent reflectance overemphasizes the high-reflectance readings, while analysis using log reflectance (optical density) overemphasizes the low-reflectance values. Experiments that measure equal changes in appearance show that the cube root of reflectance is a good approximation of equal visual weighting (Wyszecki and Stiles, 1982). This nonlinear cube root transformation of reflectance has been shown to correlate with intraocular scatter (Stiehl et al., 1983; McCann and Rizzi, 2008). Thus, the cube root of scene luminance converts it to an approximation of log retinal luminance (McCann and Rizzi, 2009). Studies by Indow (1980) and D'Andrade and Romney (2003) used the L\* transform as the first step in their studies of how cones, opponent processes, and lateral geniculate cells generate the perceptually uniform Munsell Color Space. We used the L\* function [Equation (4)] to scale Spectrolino reflectance values for each waveband.

$$\mathrm{L}^{*}_{\lambda} = 116 \ast (\mathrm{reflectance}_{\lambda})^{1/3} - 16 \tag{4}$$

Appendix (Data Sheet) 2 lists the scaled XYZ transformations of the reflectances of the 11 ground truth paint samples, the radiances of the LDR and HDR facets, and the reflectances of the LDR and HDR watercolor paints. The middle of Appendix (Data Sheet) 2 lists the normalized radiance measurements made with a Konica Minolta CS100 colorimetric telephotometer. We measured (*Y*, *x*, *y*) for each block facet. They were converted to *X*, *Y*, *Z*, averaged, and normalized in LDR by the White paint Area 9 measurements (*X* = 284.7, *Y* = 247.5 cd/m<sup>2</sup>, *Z* = 62.8) and in HDR by the White paint Area 60 measurements (*X* = 314.2, *Y* = 273 cd/m<sup>2</sup>, *Z* = 88.5). These normalized values were scaled by L\* [Equation (4)].
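The chain from telephotometer readings to scaled values can be sketched as follows. The function names are hypothetical and the averaging over repeated readings is omitted; the *Y*, *x*, *y* to *X*, *Y*, *Z* step uses the standard chromaticity relations, the white point is the LDR Area 9 measurement quoted in the text, and the scaling is the cube-root form of Equation (4).

```python
def yxy_to_xyz(Y, x, y):
    """Standard conversion from luminance and chromaticity (Y, x, y) to X, Y, Z."""
    X = x * Y / y
    Z = (1 - x - y) * Y / y
    return X, Y, Z


def lstar_scale(ratio):
    """Cube-root scaling of Equation (4), applied to a normalized value."""
    return 116 * ratio ** (1 / 3) - 16


def scaled_xyz(Y, x, y, white):
    """Normalize each tristimulus value by the white facet, then scale it."""
    return tuple(lstar_scale(v / w) for v, w in zip(yxy_to_xyz(Y, x, y), white))


LDR_WHITE = (284.7, 247.5, 62.8)  # White paint Area 9, as quoted in the text

# Chromaticity of the white facet, derived from its own X, Y, Z.
total = sum(LDR_WHITE)
x_w, y_w = LDR_WHITE[0] / total, LDR_WHITE[1] / total

# The white facet itself scales to about 100 in every channel.
print([round(v, 1) for v in scaled_xyz(LDR_WHITE[1], x_w, y_w, LDR_WHITE)])

# Hypothetical facet: same chromaticity, half the luminance of the white facet.
print([round(v, 1) for v in scaled_xyz(LDR_WHITE[1] / 2, x_w, y_w, LDR_WHITE)])
```

Note that the pure cube-root form gives −16 for a zero input; unlike the full CIE definition, it has no linear segment near black.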

The color space used in the watercolor measurements describes the painted matches in the framework of retinal responses. It is the color space used in Colorimetry to represent the first step in color vision. The color space used in the Magnitude Estimation measurements is the end of the color process, namely the uniform color space of Munsell. In a uniform color space the retinal responses have been transformed by opponent-color processes to significantly expand chroma, and to counteract the effects of cone crosstalk.

#### **RESULTS**

The goal of these experiments is to evaluate how complex, nonuniform illumination affects color constancy. In the 3-D Mondrians, the objects in the scene modulate the illumination. It is a departure from our previous experimental design using uniform illumination that varies only in its spectra (McCann et al., 1976). We used two different techniques with different observers and different skills. The magnitude estimate experiment used the average of 10 observers to assess the changes in Munsell Color space using distance from ground truth and the direction of the departure from constancy in color space (Section Magnitude Estimates Results). The artist rendering of appearance in the watercolor makes a different comparison in a different color space. It measured the reflectance spectra of a matching image (Section Artist's Watercolor Appearances Results).

#### **MAGNITUDE ESTIMATES RESULTS**

We measured the departure in sensation from constancy in Munsell Space by calculating the observers' average *ML*, *Ma*, *Mb* magnitude estimate for each color paint. We used two circular targets, one in front of each part of the display, as the ground truth starting point. For example, R matches Munsell chip 2.5R 4/14. We converted this Munsell designation to MLAB values (*ML* = 40, *Ma* = 70, *Mb* = 7). The red paint in the circular target on the back wall [Area 97] (see **Figure 2**, left) had an average observer appearance estimate of *ML* = 44.5, *Ma* = 66.5, *Mb* = 5.8 for LDR, and *ML* = 27.1, *Ma* = 59.4, *Mb* = 4.3 for HDR. We calculated the distance between the average of observed sensations and ground truth as the square root of the sum of the squares of the average Δ*ML*, Δ*Ma*, and Δ*Mb* differences (**Table 2**).

**Table 2 | For facet 97, the list of the** *ML***,** *Ma***,** *Mb* **values of ground truth (top row); LDR, and HDR average magnitude estimates; their differences and the distances between them in Munsell Space and the direction of these departures in** *ML* **vs.** *Ma***, and** *Mb* **vs. Ma planes.**


The distance between LDR and HDR average magnitude estimates was 18.9 MLAB units, or the equivalent of 20% of the distance between white and black in the uniform Munsell Color Space. The LDR appearance of facet 97 was 5.7 units away from ground truth moving in the direction of 125° in the *ML* vs. *Ma* plane; and moving in the direction of 206° from ground truth in the *Mb* vs. *Ma* plane.

The HDR appearance of facet 97 was 16.7 units away from ground truth moving in the direction of 232° in the *ML* vs. *Ma* plane; and moving in the direction of 196° in the *Mb* vs. *Ma* plane. We asked the observers to evaluate 5 facets with red paint in HDR. These results are listed in the top section of Appendix (Data Sheet) 1 dataset.
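The distance calculation for facet 97 can be sketched with the averages quoted in the text. The function name is hypothetical. Because the quoted averages are rounded to one decimal, the results should be expected to agree with the reported distances only to within a few tenths of an MLAB unit.

```python
import math

# (ML, Ma, Mb) values for facet 97 as quoted in the text.
GROUND_TRUTH = (40.0, 70.0, 7.0)
LDR = (44.5, 66.5, 5.8)
HDR = (27.1, 59.4, 4.3)

WHITE_TO_BLACK = 90.0  # MLAB distance from Munsell 10/ to Munsell 1/


def mlab_distance(a, b):
    """Square root of the sum of squared differences in ML, Ma, and Mb."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


ldr_from_truth = mlab_distance(LDR, GROUND_TRUTH)
hdr_from_truth = mlab_distance(HDR, GROUND_TRUTH)
ldr_to_hdr = mlab_distance(LDR, HDR)

print(f"LDR from ground truth: {ldr_from_truth:.1f}")
print(f"HDR from ground truth: {hdr_from_truth:.1f}")
print(f"LDR to HDR: {ldr_to_hdr:.1f}")
print(f"LDR to HDR as a fraction of white-to-black: {ldr_to_hdr / WHITE_TO_BLACK:.0%}")
```

The direction angles are not computed here, because the axis convention used for them is not spelled out in this section.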

Appendix (Data Sheet) 1 lists the data described above for all color samples reported by observers. It includes multiple areas with the same painted surfaces. **Figure 6** plots the MLAB distances from ground truth for the six chromatic paints. In general, these distances are greater in HDR illumination. However, for each color there is at least one sample that changes appearances more in the LDR than in the HDR illumination.

The lightness (*ML*) and hue/chroma plane (*Ma*, *Mb*) values for nine paints and the average magnitude estimates for 37 selected areas in both LDR and HDR are listed in Appendix (Data Sheet) 1. For each area, we list the average *ML*, *Ma*, *Mb* values; the change in appearance from ground truth as Δ*ML*, Δ*Ma*, Δ*Mb*; and the MLAB distance in the Munsell Book. Appendix (Data Sheet) 1 also lists the ranges for each paint sample and the angles of departures from constancy. The following detailed results show that observers reported larger departures from ground truth in the HDR than in the LDR scenes. We analyze the results for each of the nine paints.

For the red paint the LDR ranges were Δ*ML* = 9, Δ*Ma* = 4, Δ*Mb* = 3; the HDR ranges were Δ*ML* = 25, Δ*Ma* = 26, Δ*Mb* = 24. This pattern held for all the paints. For the five red samples, the individual distances in Munsell MLAB space were LDR = 6, 13, 13, 12, 7 and HDR = 17, 25, 13, 4, 29. This illustrates an important point. In the LDR scene the changes in appearance were smaller in nearly uniform illumination. In the HDR scene the changes in appearance were larger, but there were individual areas that showed little or no change from ground truth. The changes in appearance in the LDR were larger in lightness than in hue/chroma. The changes in HDR were found in both lightness and hue/chroma.

Area 11 in LDR is *ML* = 43, *Ma* = 64, *Mb* = 3. This is 3 units lighter, 5 units less red, and 4 units bluer than ground truth. In HDR illumination Area 11 is a sample of red paint that is close to the LED illumination on the right. It has more short-wave light than the tungsten lamp on that side of the Mondrian. Area 11 in HDR is *ML* = 55, *Ma* = 62, *Mb* = −15; that is 15 units lighter, 8 units less red and 22 units more blue than ground truth. For this facet the departure from constancy is larger in hue/chroma than in lightness.
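The red-paint distances quoted above can be summarized in a short sketch. It assumes that the two lists are given in the same facet order, which the text does not state explicitly; the variable names are hypothetical.

```python
# MLAB distances from ground truth for the five red samples, as quoted in the text.
ldr = [6, 13, 13, 12, 7]
hdr = [17, 25, 13, 4, 29]

mean_ldr = sum(ldr) / len(ldr)
mean_hdr = sum(hdr) / len(hdr)

# Samples (numbered from 1) whose departure is larger in LDR than in HDR,
# assuming both lists follow the same facet order.
larger_in_ldr = [i for i, (l, h) in enumerate(zip(ldr, hdr), start=1) if l > h]

print(f"mean LDR distance: {mean_ldr:.1f}")
print(f"mean HDR distance: {mean_hdr:.1f}")
print(f"samples with a larger departure in LDR: {larger_in_ldr}")
```

Under that ordering assumption, the HDR mean is the larger of the two, while one red sample still departs more from ground truth in LDR than in HDR.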

The yellow paint samples in Appendix (Data Sheet) 1 show that Areas 68 and 74 appear darker and have less hue/chroma in the LDR scene (distance = 19, 21). In the HDR scene Areas 68 and 74 are both lighter (distance = 14). Area 100 is 24 units darker in HDR, while it appears the same as ground truth in LDR.

The green samples in the LDR scene show that Area 65 is 20 units lighter, and only 5 units lighter in HDR. In LDR, Areas 50 and 51 are both darker, and appear redder and less yellow (distance = 22, 25). In HDR, Area 50 is 20 units lighter, while Area 51 is 40 units darker and 25 units bluer. Area 103 is 18 units darker and 16 units bluer in HDR.

In cyan, Area 102 is 20 units darker in HDR. Area 73 is very close to ground truth in both illuminations. Area 53 is about 12 units lighter in both LDR and HDR. Area 45 is darker by 20 units in LDR; it is 15 units lighter and 16 units bluer in HDR.

In LDR all blue areas were within 10 units of ground truth. In HDR Areas 99 and 47 were darker and bluer (distance = 30, 35).

For the blocks with white paint, the shadows in the LDR caused a drift in lightness of 30 units. In HDR Area 81 showed a distance of 3 units. The same tall thin white facet makes up areas 81, 83, 84, and 85. Area 83 was darker and slightly bluer (distance = 31). Area 84 had light reflected from an adjacent magenta facet. It was darker and more magenta (distance = 42). Area 85 had light reflected from an adjacent yellow facet. It was more yellow (distance = 39).

The magnitude estimate observer data shows that, in general, the color estimates in LDR are closer to ground truth than HDR. Nevertheless, there are areas in the HDR scene that show very small departures from ground truth standard colors. The change in appearance of individual areas depends on the illumination and the other areas in the scene. The sources of illumination, their spatial distributions, and inter-reflections of light from one facet to another, all play a part in generating appearance. One cannot generalize that the surface property (physical reflectance) correlates with the individual facet's sensation. The illumination falling on each individual facet has introduced a considerable variety of changes in sensation. The local spatial properties of illumination (edges and gradients) show significant influence on the hue, lightness and chroma of observed appearances. These measurements provide an extensive dataset for future work in modeling mechanisms that can calculate color sensations, and the variability of color constancy in these 3-D Mondrians. These targets introduce spatial structure in the illumination, and we found greater departures from constancy with increase in illumination structure.

#### **ARTIST'S WATERCOLOR APPEARANCES RESULTS**

**Figure 7** is a photograph of the watercolor painting of the combined LDR/HDR scene. We made reflectance measurements with a Spectrolino® meter in the center of 104 areas. If the same paint in the scene had the same color appearance to the artist, then all the watercolor painting's reflectance plots for these surfaces should superimpose. They do not. The artist selected many different spectra to match the same paint on a number of blocks (**Figures 8**, **9**). Overall, the artist selected a narrower range of watercolor reflectances to reproduce the LDR scene. Many more paint colors are needed to reproduce the HDR scene. Nevertheless, some block facets appeared the same as ground truth, while others showed large departures in their reproduction spectra.

#### *Chromatic watercolor reflectances*

We plotted the watercolor spectra for all reproductions of the red painted blocks in both LDR and HDR scenes (**Figure 8**, top row left). In the LDR reproduction, all but one of the facets had very similar measured reflectances. This showed that appearances correlated well with the objects' reflectance, with one exception. In the HDR reproduction the painting had a wide variety of measured reflectances, showing that the nonuniform illumination had considerable influence, limiting color constancy.

For the green painted blocks we see a wide variety of reproduction spectra in both LDR and HDR paintings. The blue painted blocks had very similar HDR reflectances for all but one of the facets. The LDR reproduction had more variability than the HDR painting. The cyan and magenta reproductions of the HDR scene showed greater variability in lightness of similar spectra. The yellow reproductions of both scenes showed variability in lightness and spectra.

It is important to study the photographs in **Figure 3** and the paintings in **Figure 7** to see that these results have more to do with the position of the blocks and their illumination, than with the blocks' paint color. The differences in appearance from ground truth correlate with the spatial structure of the illumination.

#### *Achromatic watercolor reflectances*

**Figure 9** compares the LDR and HDR painting reflectances for the five achromatic value blocks. All departures from a flat spectrum in **Figure 9** are examples of hue/chroma introduced by illumination. The white surfaces show considerable variation in lightness and in hue/chroma.

Appendix (Data Sheet) 2 lists the five triplets of radiometric (*X*, *Y*, *Z*) data from these experiments: reflectances of the paints on the blocks; radiances from both the LDR and HDR scenes; and the reflectances of both LDR and HDR watercolor renderings. For the Spectrolino measurements we integrated the reflectance spectra with CIE fundamentals. Then, these values were scaled by L∗ [Equation (4)] to approximate the stimulus on the retina. The left triplet of Appendix (Data Sheet) 2 lists the Area Identification Number (**Figure 3**), the paint, and the L∗(*X*), L∗(*Y*), L∗(*Z*) values for the paint on the blocks. The right pair of triplets lists the corresponding values for the LDR and HDR watercolor painting. The middle pair of triplets in Appendix (Data Sheet) 2 lists the normalized radiance measurements made with a Konica Minolta CS100 colorimetric telephotometer. We made two measurements (*Y*, *x*, *y*) for each block facet. They were converted to *X*, *Y*, *Z*, averaged, and normalized in LDR by the White paint Area 9 measurements (*X* = 284.7, *Y* = 247.5 cd/m², *Z* = 62.8), and in HDR by the White paint Area 60 measurements (*X* = 314.2, *Y* = 273 cd/m², *Z* = 88.5). These normalized values were scaled by L∗ [Equation (4)].
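The telephotometer processing chain described above (convert *Y*, *x*, *y* to *X*, *Y*, *Z*; average; normalize by the white measurement; scale by L∗) can be sketched as follows. Equation (4) is not reproduced in this section, so the sketch assumes it is the standard CIE L∗ function; the function names are illustrative and are not taken from the authors' software.

```python
def yxy_to_xyz(Y, x, y):
    """Convert a CIE (Y, x, y) reading to tristimulus (X, Y, Z)."""
    X = x * Y / y
    Z = (1.0 - x - y) * Y / y
    return X, Y, Z


def cie_lstar(t):
    """Standard CIE L* of a normalized value t (1.0 = white).

    Assumption: this stands in for the paper's Equation (4).
    """
    if t > (6.0 / 29.0) ** 3:
        return 116.0 * t ** (1.0 / 3.0) - 16.0
    return (29.0 / 3.0) ** 3 * t  # linear segment near black


def scaled_triplet(readings, white_xyz):
    """Average the (Y, x, y) readings of one facet, normalize each of
    X, Y, Z by the white measurement, then scale each by L*."""
    xyz = [yxy_to_xyz(*r) for r in readings]
    mean = [sum(v[i] for v in xyz) / len(xyz) for i in range(3)]
    return tuple(cie_lstar(m / w) for m, w in zip(mean, white_xyz))


# White paint Area 9 (LDR) values quoted in the text.
LDR_WHITE = (284.7, 247.5, 62.8)

# Sanity check: the white itself should map to L* = 100 in all three channels.
total = sum(LDR_WHITE)
white_x, white_y = LDR_WHITE[0] / total, LDR_WHITE[1] / total
print(scaled_triplet([(247.5, white_x, white_y)] * 2, LDR_WHITE))
```

Normalizing each channel by the white before the cube-root scaling is what makes the three outputs comparable across the LDR and HDR scenes, which have different white measurements.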

#### **EXAMPLES OF DEPARTURES FROM CONSTANCY CAUSED BY ILLUMINATION**

On the right side of the HDR Mondrian, there is a tall thin white surface. The block's white paint has uniform reflectance values [L∗(*X*) = 93, L∗(*Y*) = 93, L∗(*Z*) = 92] from the facet's top to bottom [Appendix (Data Sheet) 2]. Its illumination is variable because there are different reflections from adjacent blocks. These reflections, from a chromatic block onto an achromatic one, add illumination structure (edges and gradients) to the reflectance structure. The white surface reflectance takes on chromatic appearances, as shown in the spectra in **Figure 9** (top row, left) and the photographs in **Figure 10**.

**Figure 10** shows photographic sections of the display, from the LDR and HDR parts. The LDR (left) appearances show light-middle-gray, and dark-middle-gray shadows. The HDR (right) appearances of the single white paint surface show four different colors. The painting shows: white at the top, a blue-gray shadow below it, a pinker reflection and a yellow reflection below that. Shadows and multiple reflections show larger changes in appearance caused by different illumination.

The photographs of the LDR/HDR scene (**Figure 3**) and the watercolor painting show that the white block in LDR has achromatic shadows. The measurements of watercolor reflectances (**Figure 10** top row, left) show that the painter used darker paints to report the darker shadows in LDR. In HDR the painter selected different hues because the white paint on the block was illuminated with a variety of chromatic illuminations.

The measurements of radiances from the white block [**Figure 10**, middle; Appendix (Data Sheet) 2] show that the shadow spatial structure had achromatic variations in LDR. In HDR, the meter recorded chromatic structure in illumination: a chromatic shift from the two light sources (Area 83) and from multiple reflections (Areas 84, 85). These radiance measurements document the spatial structure in the illumination falling on the uniform reflectance block.

The magnitude estimates of sensations (**Figure 10**, bottom row) show that the LDR illumination caused changes in lightness, while the HDR illumination caused changes in lightness, chroma and hue. Both measurement techniques, watercolor painting and magnitude estimates, show similar results. The changes in illumination falling on this single white block caused relatively sharp edges in light coming to the eye. These retinal images caused observers to report changes in lightness, hue, and chroma. These data support the observation that the changes in appearance of this white block correlate with the spatial structure in the illumination.

photometer readings from the blocks [L∗(*X*), L∗(*Y*), L∗(*Z*)]; and the bottom section shows average observer magnitude estimates [*ML*, *Ma*, *Mb*].

In **Figure 11**, there is another example of how the illumination structure plays an important role in these color constancy experiments. **Figure 11** shows central Mondrian areas surrounding a dark gray and black block (Areas 36 and 38). The captured appearances of the LDR and HDR watercolor renderings have different values from the same paint on that block; Areas 36 and 38 have the same dark gray (G4) paint. They both have reflectance CIE L∗(*Y*) values of 41.4. These constant surface reflectances have different appearances in the LDR and HDR portions of the watercolor. In LDR, Area 36 (the top) is lighter [L∗(*Y*) = 49] than the side [L∗(*Y*) = 30]. In HDR, the top is darker [39] than the side [59].

In the HDR, Area 38 is the lightest of the block's three faces (36, 37, 38), while it is nearly the darkest in the LDR. These changes in appearance correlate with the changes in edges caused by the different illuminations. The bottom row of **Figure 11** shows the telephotometer scaled luminances L∗(*Y*). LDR Area 36 (top) has higher luminance [34] than the side [18]. In HDR, the top has lower luminance [17] than the side [37]. The dark gray facets in **Figure 11** illustrate that edges caused by illumination cause substantial change in the appearance of surfaces with identical reflectances.

**FIGURE 11 | Measurements of gray, white and green blocks in the center of the 3-D Mondrians.** The top row shows the sketch with Area IDs; the paint used on the blocks; the LDR; and HDR watercolor painting. The middle row shows the Spectrolino® watercolor L∗(*Y*) values for these block facets. The bottom row shows the telephotometer L∗(*Y*) values for these block facets.

Facet 63 is another example. It has white paint (L<sup>∗</sup> = 93). In the watercolors it was matched by L<sup>∗</sup> = 91 (LDR) and L<sup>∗</sup> = 41 (HDR).

Facets 50, 51, and 65 all have green paint (L<sup>∗</sup> = 52). The LDR watercolor matches are all equal (L<sup>∗</sup> = 49). The HDR matches are different (L<sup>∗</sup> = 58, 34, 53).

The directions of the changes in appearance are consistent with the directions of changes in illumination on the blocks. Edges in illumination cause substantial changes in appearance. The measurements do not show correlation of appearance with the luminance of a local region; rather, they demonstrate that change in appearance (sensation) correlates with change in luminance across edges in illumination.
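As a small worked check of this claim, the sketch below takes the L∗(*Y*) values quoted in the text for the top and side facets of the dark gray (G4) block and compares the sign of the step across the edge in the watercolor (appearance) with the sign of the step in the telephotometer data (luminance). The variable and function names are illustrative only.

```python
# L*(Y) values quoted in the text for the dark gray (G4) block.
# Both facets carry the same paint, L*(Y) = 41.4, so the reflectance step is zero.
watercolor = {"LDR": {"top": 49, "side": 30}, "HDR": {"top": 39, "side": 59}}
telephotometer = {"LDR": {"top": 34, "side": 18}, "HDR": {"top": 17, "side": 37}}


def edge_step(values):
    """Signed top-minus-side difference across the edge between the two facets."""
    return values["top"] - values["side"]


def same_direction(a, b):
    """True when both steps have the same sign."""
    return (a > 0) == (b > 0)


for scene in ("LDR", "HDR"):
    appearance = edge_step(watercolor[scene])
    luminance = edge_step(telephotometer[scene])
    print(scene, appearance, luminance, same_direction(appearance, luminance))
# LDR 19 16 True
# HDR -20 -20 True
```

In both scenes the appearance step has the same sign as the luminance step, and the sign reverses between LDR and HDR even though the paint on the two facets is identical.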

Both magnitude estimates and the artist's rendering give very similar results. Both sets of measurements show that appearance depends on the spatial properties of illumination, as well as reflectance. Edges in illumination cause large changes in appearance, as do edges in reflectance. The magnitude estimates analyze the results in a uniform color space. By definition, distance in this space represents the size of the change in appearance for all hues, lightnesses and chromas. Here we have averaged the estimates of 10 observers for 37 areas in both LDR and HDR. In the second experiment, we analyzed the watercolor painting data for 104 facets for a single observer in a modified colorimetric space. We integrated full spectral data under the color matching functions and scaled them by Equation (4). This color space calculates the retinal spectral response (*X*, *Y*, *Z*) with an approximate correction for intraocular scatter (L∗) to analyze the retinal response. Both experiments give similar results, but in different color spaces. Further, there are limitations imposed by the gamut of possible colors in the watercolor paints that do not limit the magnitude estimate experiments. The most important comparison is the effect of illumination (LDR vs. HDR) on appearance. Differences in color spaces and small differences caused by experimental techniques are of secondary importance.

**Figure 12** plots the distribution of distances between ground truth and observed color for the measurements of appearances (sensations) using the magnitude estimates and the watercolor reflectances. The left graph binned the 37 magnitude estimates of MLAB distances into 9 groups 5.8 units wide. The average LDR magnitude estimate distance from ground truth was 12 ± 8 with a maximum distance of 30 and a minimum of 3. The average HDR magnitude estimate distance from ground truth was 18 ± 13 with a maximum distance of 52 and a minimum of 3. The population of LDR distances is greatest close to zero, decreasing with distance. There are no LDR distances near the maximum. The HDR has fewer near zero, with the highest population in the middle of the range. LDR and HDR have different distance distributions.
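The binning described for the left graph (9 groups, each 5.8 units wide, spanning 0 to 52.2) can be sketched as below. The distances in the example are made-up placeholders chosen only to exercise the code; they are not the values in Appendix (Data Sheet) 1, and the function name is hypothetical.

```python
def bin_distances(distances, n_bins=9, width=5.8):
    """Count distances in n_bins equal-width bins starting at zero.

    A value at or beyond the upper edge is clamped into the last bin.
    """
    counts = [0] * n_bins
    for d in distances:
        index = min(int(d // width), n_bins - 1)
        counts[index] += 1
    return counts


# Placeholder distances (NOT the paper's data).
example = [3, 5, 8, 12, 20, 30]

counts = bin_distances(example)
print(counts)                       # [2, 1, 1, 1, 0, 1, 0, 0, 0]
print(sum(example) / len(example))  # mean distance: 13.0
print(min(example), max(example))   # 3 30
```

The same function with `width=15` reproduces the bin layout given for the watercolor reflectance distances in the right graph.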

The right graph (**Figure 12**) binned the 104 watercolor reflectance distances [in L∗(*X*), L∗(*Y*), L∗(*Z*) space] into 9 groups 15 units wide. The average LDR watercolor reflectance distance from ground truth was 27 ± 22 with a maximum distance of 96 and a minimum of 3. The average HDR watercolor reflectance distance from ground truth was 42 ± 30 with a maximum distance of 130 and a minimum of 1. Again, the population of LDR distances is greatest close to zero, decreasing with distance. There are no LDR distances in the maximum bin. The HDR has fewer near zero, with the highest population in the middle of the range. The magnitude estimates and watercolor reflectances show similar departures from ground truth.

#### **DISCUSSION**

This paper studies a very simple question. Can illumination change the appearances of paints with the same physical surface reflectance? We are asking this question using a complicated scene with nonuniform illumination falling on 3-D objects; in other words, real scenes, not experimental abstractions. Here we used the placement of objects in the scene to modulate the illumination. We found a complicated answer. Our results do not support a universal generalization, such as "human vision makes constant surface reflectances appear constant." Rather, we found a wide range of distinct, individual observations. In the LDR, illumination changes appearance some of the time. In the HDR, illumination changes appearance most of the time. Appearance sensations depend on the objects in the scene, their placement, and the spatial structure in the illumination.

Another simple question is whether observer data supports the "discounted illumination" hypothesis. Hering observed that constancy was approximate. The signature of the departures from perfect constancy provides important information about how human vision achieves constancy. The experiments here study how illumination alters the spatial information from the scene. Observer data correlated with spatial structure in the illumination (edges and gradients).

Previous experiments have measured the high correlation between colors in complex scenes and the reflectances of the objects' surfaces (McCann et al., 1976). This good correlation uses Scaled Integrated Reflectance, not the usual spectral surface reflectance curves measured with a narrowband spectral radiometer. This integrated reflectance has L, M, S values that are the product of the surface's spectral reflectance and the L, M, S retinal cone sensitivity functions. [The L reflectance is the L cone response to the surface divided by the L cone response to white paper in the same illumination. The scaling is done by the CIE L∗ cube root function, which approximates a correction for scatter in the eye at lower reflectances (McCann and Rizzi, 2012, ch. 14, 18).] The measured departures from perfect constancy in flat displays in uniform illumination are small, but provide important information about the underlying color constancy mechanism. Further experiments, with many different narrowband spectral illuminants in uniform illumination, showed that changes in color appearances are controlled by cone crosstalk, and are inconsistent with cone adaptation theories of constancy, as shown by McCann (2004c, 2005a). These results extended the report by McCann, McKee, and Taylor that the departures from constancy in uniform illumination were caused by the integrals of reflectance, narrowband illumination and very-broad cone sensitivities. The observed correlation with scaled integrated reflectance was possible because there was no spatial structure in the illumination.
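A minimal numerical sketch of Scaled Integrated Reflectance for a single cone class follows, under stated assumptions: the sensitivity curve, the two narrowband illuminants, and the two surfaces are hypothetical Gaussian and ramp placeholders (not real cone fundamentals or measured spectra), white paper is idealized as a reflectance of 1.0 at every wavelength, and the scaling uses the standard CIE L∗ function.

```python
import math


def gaussian(wavelength, center, width):
    return math.exp(-0.5 * ((wavelength - center) / width) ** 2)


def cie_lstar(t):
    """Standard CIE L* of a normalized value t (1.0 = white)."""
    if t > (6.0 / 29.0) ** 3:
        return 116.0 * t ** (1.0 / 3.0) - 16.0
    return (29.0 / 3.0) ** 3 * t


def integrated_reflectance(reflectance, illuminant, sensitivity):
    """Cone response to the surface divided by the cone response to an
    idealized white (reflectance 1.0) in the same illumination."""
    surface = sum(r * e * s for r, e, s in zip(reflectance, illuminant, sensitivity))
    white = sum(e * s for e, s in zip(illuminant, sensitivity))
    return surface / white


WAVELENGTHS = list(range(400, 701, 10))  # nm
# Placeholder broad sensitivity; NOT a real cone fundamental.
BROAD_SENSITIVITY = [gaussian(w, 565, 50) for w in WAVELENGTHS]
# Two hypothetical narrowband illuminants.
SHORT_BAND = [gaussian(w, 450, 10) for w in WAVELENGTHS]
LONG_BAND = [gaussian(w, 630, 10) for w in WAVELENGTHS]
# Hypothetical surfaces: spectrally flat gray, and a ramp rising with wavelength.
FLAT_GRAY = [0.5] * len(WAVELENGTHS)
RAMP = [0.2 + 0.6 * (w - 400) / 300 for w in WAVELENGTHS]

for name, surface in (("flat gray", FLAT_GRAY), ("ramp", RAMP)):
    for label, light in (("short band", SHORT_BAND), ("long band", LONG_BAND)):
        r = integrated_reflectance(surface, light, BROAD_SENSITIVITY)
        print(f"{name} under {label}: integrated = {r:.3f}, scaled = {cie_lstar(r):.1f}")
```

The spectrally flat surface gives the same integrated reflectance under both illuminants, while the spectrally selective ramp does not. This illustrates why the integrals of reflectance, narrowband illumination, and broad sensitivities can produce departures from constancy even in spatially uniform illumination.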

The illumination in the 3-D Mondrian experiments was modulated by the objects in the scene. The departures from perfect constancy measured here in 3-D Mondrians are larger than in uniform illumination. Further, they do not show dependence on the surface reflectances of the paints. We see this in the variability of the distances between appearance and ground truth, and the directions of the color changes in Munsell Space. Each identically colored surface exhibits highly variable changes in appearance. Appearances show dependence on the spatial content of the illumination, as shown in the individual areas described in **Figures 10**, **11**. We also see this in the artist's paint selection used to match appearances in the watercolor. There is great variability in the size and direction of the departures from constancy that correspond to the information about individual areas recorded in the dataset in Appendixes (Data Sheets) 1 and 2.

We found no evidence that visual sensations are the result of illumination detection followed by discounting the illumination. As shown in **Figures 10**, **11**, the changes in appearance of identical reflectances correlate with the departures from uniform illumination. These results, and many other examples documented in this dataset, show that spatial structure in illuminations influences color constancy sensations. Studies comparing the spatial properties of illumination and reflectance show that human form vision processes the spatial content of the illumination the same way it processes the spatial content of the reflectance of objects (McCann, 2000b).

#### **COLOR CONSTANCY MODELS**

As one inspects the color appearances in the LDR and HDR Mondrians one looks for a physical correlate in the scene for color appearances. That correlate is not the XYZ values of a single pixel. The correlate is the spatial relationship of XYZ values with all the other pixels in the rest of the scene. Shadowed regions of the same reflectance paint can have edges created by the illumination. The appearances observed here are consistent with a model that builds colors from image structure.

Spatial comparison algorithms, such as Retinex, use the quanta catch of the cones from the entire field of view as input. Their goal is to calculate the sensations of all areas in the scene. They do this by building the image up from spatial comparisons using the entire image. The model output is equal to the scene's surface reflectance in uniform illumination in flat Mondrians (McCann et al., 1976). It is possible to calculate reflectances using spatial comparisons without ever finding the illumination. Spatial models using 3-D Mondrians will not calculate the paints' reflectances. Instead, we will get a rendition of the scene that treats edges in illumination the same as edges in reflectance. The Retinex spatial model (Frankle and McCann, 1983) shows correlation with reflectance sometimes (in flat Mondrians), but not all the time (in 3-D Mondrians). A number of computational variations of Retinex spatial processing have been proposed (Frankle and McCann, 1983; Jobson et al., 1997; Marini et al., 1999; McCann, 1999b, 2004a,b, 2005b; Rizzi et al., 2003, 2004; Sobol, 2004; Provenzi et al., 2007, 2008; Bertalmío et al., 2009; Kolås et al., 2011; see McCann and Rizzi, 2012, p. 285–371 for a review).
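The behavior described in this paragraph can be illustrated with a deliberately crude stand-in for a spatial comparison model. The sketch below is not the Retinex algorithm of the cited papers; it only shares the one property the argument relies on: it compares radiances across the whole scene (here, by normalizing to the scene maximum) and never estimates the illumination. All reflectance and illumination values are hypothetical.

```python
def normalize_to_scene_max(radiances):
    """Express every radiance relative to the largest radiance in the scene."""
    peak = max(radiances)
    return [r / peak for r in radiances]


def scene_radiances(reflectances, illumination):
    """Radiance reaching the eye from each patch: reflectance times illumination."""
    return [r * e for r, e in zip(reflectances, illumination)]


# Hypothetical row of four patches; patches 1 and 2 carry the same paint.
reflectances = [0.90, 0.40, 0.40, 0.10]

uniform = [1.0, 1.0, 1.0, 1.0]    # flat Mondrian, uniform illumination
shadowed = [1.0, 1.0, 0.25, 1.0]  # 3-D case: patch 2 lies in a shadow

flat_output = normalize_to_scene_max(scene_radiances(reflectances, uniform))
structured_output = normalize_to_scene_max(scene_radiances(reflectances, shadowed))

print([round(v, 3) for v in flat_output])        # [1.0, 0.444, 0.444, 0.111]
print([round(v, 3) for v in structured_output])  # [1.0, 0.444, 0.111, 0.111]
```

In uniform illumination the output equals reflectance relative to the lightest patch, so the two identical paints get identical values. With the shadow, the same two paints get different outputs, because the edge created by the illumination is treated exactly like an edge in reflectance.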

CIELAB/CIECAM models calculate sensations. They measure the *X*, *Y*, *Z* reflectances of individual pixels and transform them into a new color space. The model uses only two radiance measurements of a single pixel: the radiance coming to the eye, and the illumination falling on that pixel. The ratio of radiance to illumination gives the pixel's reflectance, independent of the content of the rest of the scene. These equations transform the position in color space of the object's reflectance. There is nothing in the calculation that can generate different outputs from identical reflectance inputs. These models predict the same color appearance for all blocks with the same physical reflectance. While useful in analyzing appearances of flat scenes, such as printed test targets, they do not predict appearances with shadows and multiple reflections.
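The single-pixel structure of this calculation can be sketched for the lightness channel alone. This is a simplification under stated assumptions: it uses only the standard CIE L∗ function, omits the chromatic and adaptation stages of the full CIELAB/CIECAM models, and takes a hypothetical reflectance of 0.12 and hypothetical illumination levels.

```python
def cie_lstar(t):
    """Standard CIE L* of a normalized value t (1.0 = white)."""
    if t > (6.0 / 29.0) ** 3:
        return 116.0 * t ** (1.0 / 3.0) - 16.0
    return (29.0 / 3.0) ** 3 * t


def pixel_model_lightness(radiance, illumination):
    """Lightness predicted from one pixel's radiance and the illumination
    falling on that pixel; no other part of the scene enters."""
    reflectance = radiance / illumination
    return cie_lstar(reflectance)


paint_reflectance = 0.12  # hypothetical dark gray paint

# The same paint in full light and in a shadow (hypothetical levels).
for illumination in (100.0, 25.0):
    radiance = paint_reflectance * illumination
    print(illumination, round(pixel_model_lightness(radiance, illumination), 2))
```

Both lines print the same lightness, because dividing by the local illumination removes the shadow before the transform is applied. A model of this form therefore cannot reproduce the different appearances reported for identical paints in the 3-D Mondrians.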

Computer Vision (Computational Color Constancy) has the goal of calculating the object's reflectance, namely the object's intrinsic property. The question here is whether such material recognition models have relevance to human vision. If a computer vision algorithm correctly calculated cone reflectances of flat Mondrians, then one might argue that such processes could happen in human vision (Ebner, 2007). However, the 3-D Mondrians, and other experiments, show that illumination affects the observers' responses (Rutherford and Brainard, 2002; Yang and Shevell, 2003). If that same Computer Vision algorithm correctly calculated 3-D Mondrian reflectances, then these calculations would not model their appearances. Computer Vision is a distinct discipline from human vision, with very different objectives. These algorithms are not applicable to human vision.

Surface Perception has the goal of calculating an observer's perceived recognition of a surface's reflectance. In many perception experiments subjects report on the observed properties of objects. The two sides of the lake raft in **Figure 1** have different appearances (sensations). Nevertheless, observers recognize that these different appearances are part of the same object in different illumination. Our dataset reports the sensations of constant reflectances in structured illumination. It is not useful in evaluating models that calculate the perception of objects. The observer task was different and the data are not useful in modeling cognition.

#### **REAL PAINTS AND LIGHTS**

In the careful analysis of reflectance and illumination, with its extended dynamic range, there is no room for errors and artifacts introduced by image capture and display technologies. In 1975 we began to study human vision using computer controlled complex image-displays (Frankle and McCann, 1983; McCann and Houston, 1983a). Since then, we have been aware of the need for extensive calibration of electronic imaging devices (McCann and Houston, 1983b). For the experiments in this paper, we chose to fabricate our test scene with real objects painted with exactly the same paints. We chose to use real light sources. We measured the reflectance of each paint, the *Y*, *x*, *y* of the light coming from the surface, and the full spectra of the paints in the watercolor.

HDR reproduction techniques are widely used today. They include a variety of approaches to render the appearance of HDR scenes in LDR Media (Frankle and McCann, 1983; McCann, 1988, 2004a,b, 2007). Multiple exposure techniques are used to improve photographic reproductions (Debevec and Malik, 1997; Reinhard et al., 2006). Nevertheless, multiple exposures do not record accurate scene radiances. Rather they record the sum of the scene radiance and the undesired veiling glare from the camera and its optics. Glare is image dependent, and cannot be corrected by calibration (McCann and Rizzi, 2007; Rizzi and McCann, 2009). Scene information and glare cannot be separated without independent radiance measurements of the scene.

Similarly, there are great difficulties in error-free rendering of the information stored in computer memory on a print or display device. Extensive calibrations of all image areas throughout the full 3-D color space are needed to avoid hardware limitations. The hardware systems that convert digits to light have many operations that alter the light coming to the eye from the expected value to a different device-dependent value. The digital image stored in computer memory is continuously sent, via a graphics card, to the display pixel (refreshed at the rate specified by the hardware). The physical characteristics of the display (spectral emission, number and size of pixels); the time budget (refresh rate and response times); the image processing in the graphics card; and the circuitry in the display all influence the display's light output at each pixel. The amount of light output does not always correspond to image digits in computer memory. A good example is that the EMF of the display signals in the screen wiring introduces image-dependent color shifts (Feng and Daly, 2005). Hardware systems introduce image-dependent transformations of the input signals that on average improve the display's appearance (Feng and Yoshida, 2008). HDR displays with two active light modulators introduce even more complexity with high-resolution LCDs, and low-resolution LEDs. The system integrates the two images with complex, proprietary, spatial filtering of the image data (Seetzen et al., 2004). It is not a simple matter to verify the accuracy of a display over its entire light-emitting surface, for all light levels, for its entire 3-D 24-bit color space. The combination of reflectances (range = 100:1) and illuminations (range = 100:1) requires great precision over a range of 10,000:1. Rather than calculate the combined effects of reflectances and illumination for an image-dependent display device, and verify its accuracy with calibration measurements, we chose to use real lights and paints for this analysis.

#### **DATASET APPLICATIONS**

We made RAW format digital photographs of the LDR and HDR parts of the display using a Leaf Aptus digital sensor in a Mamiya body camera. We used multiple exposures to verify the camera's range of linear response. Using the KM spotmeter readings we calibrated the linear portion of RAW camera digits to convert to XYZ data. The next steps convert XYZ to cone response and then use the human glare spread function (Vos and van den Berg, 1999) to calculate the cone quanta catch of the retinal image that includes the veiling glare of intraocular scatter. Calibrated digital images of the arrays of scene radiance and cone quanta catches will be added to the dataset reported here. These images can be used as the calibrated input to models of color appearance and object intrinsic properties. The details of image calibration and model analysis are beyond the scope of this paper.
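The first calibration step, relating the linear portion of the RAW camera digits to spot-meter luminances, might look like the sketch below. The paper does not give its calibration procedure, so the least-squares fit through the origin, the function name, the digit limits, and every number here are assumptions made purely for illustration.

```python
def fit_linear_gain(digits, luminances, low, high):
    """Least-squares gain (luminance per digit) through the origin,
    using only samples whose digits fall inside [low, high]."""
    pairs = [(d, y) for d, y in zip(digits, luminances) if low <= d <= high]
    numerator = sum(d * y for d, y in pairs)
    denominator = sum(d * d for d, _ in pairs)
    return numerator / denominator


# Hypothetical RAW digits and spot-meter luminances (cd/m2) for five patches.
# The first and last samples are placed outside the assumed linear range.
raw_digits = [50, 1000, 2000, 4000, 16000]
spot_luminances = [1.0, 5.0, 10.0, 20.0, 60.0]

gain = fit_linear_gain(raw_digits, spot_luminances, low=500, high=8000)
print(gain)  # luminance per RAW digit within the assumed linear range
```

Restricting the fit to the range verified as linear keeps noise near black and clipping near saturation from biasing the gain. The later steps (conversion to cone responses and convolution with the glare spread function) are not sketched here.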

#### **CONCLUSIONS**

We measured the sensation appearances of two identical arrays of 3-D objects in nearly uniform (LDR) and nonuniform (HDR) illumination. They were viewed in the same room at the same time. All flat facets were painted with one of 11 paints. We used two different techniques to measure the appearances of these constant reflectance paints. In the first, observers made magnitude estimates of changes in Munsell notation; in the second we measured the reflectance spectra of an artist's watercolor rendition of both scenes. Departures from perfect color constancy are the signature of the underlying mechanism. Both magnitude estimates and watercolor reflectances showed that departures depended on the spatial structure measured in the illumination. The dataset reported here provides measurements of radiances and sensations in complex scenes for future analysis by computational models of appearance. If a computer algorithm discounted the illumination, and succeeded in accurately calculating an object's reflectance, then that algorithm would not predict observed sensations in real-life scenes with complex nonuniform illumination.

#### **ACKNOWLEDGMENTS**

We would like to thank all the participants of CREATE, European Union, Framework 6 Marie Curie Conferences and Training Courses (SCF); the staff at the Center for Fine Print Research, University of the West of England, Bristol; and Alison Davis, Vassilios Vonikakis, and Mary McCann for their assistance. We also want to thank the referees who engaged in thoughtful and comprehensive discourses about the work. Their efforts significantly improved the paper.

#### **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fpsyg.2014.00005/abstract

#### **REFERENCES**


Land, E. H. (1964). The retinex. *Am. Sci*. 52, 247–64.


Richards, W. (ed.). (1988). *Natural Computation*. Cambridge: MIT Press.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 31 May 2013; paper pending published: 18 June 2013; accepted: 04 January 2014; published online: 24 January 2014.*

*Citation: McCann JJ, Parraman C and Rizzi A (2014) Reflectance, illumination, and appearance in color constancy. Front. Psychol. 5:5. doi: 10.3389/fpsyg.2014.00005*

*This article was submitted to Perception Science, a section of the journal Frontiers in Psychology.*

*Copyright © 2014 McCann, Parraman and Rizzi. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*