**PERCEPTION OF VISUAL ADVERTISING IN DIFFERENT MEDIA: FROM ATTENTION TO DISTRACTION, PERSUASION, PREFERENCE AND MEMORY**

**Topic Editors Jaana Simola, Jukka Hyönä and Jarmo Kuisma**

PSYCHOLOGY

#### *FRONTIERS COPYRIGHT STATEMENT*

© Copyright 2007-2015 Frontiers Media SA. All rights reserved.

All content included on this site, such as text, graphics, logos, button icons, images, video/audio clips, downloads, data compilations and software, is the property of or is licensed to Frontiers Media SA ("Frontiers") or its licensees and/or subcontractors. The copyright in the text of individual articles is the property of their respective authors, subject to a license granted to Frontiers.

The compilation of articles constituting this e-book, wherever published, as well as the compilation of all other content on this site, is the exclusive property of Frontiers. For the conditions for downloading and copying of e-books from Frontiers' website, please see the Terms for Website Use. If purchasing Frontiers e-books from other websites or sources, the conditions of the website concerned apply.

Images and graphics not forming part of user-contributed materials may not be downloaded or copied without permission.

Individual articles may be downloaded and reproduced in accordance with the principles of the CC-BY licence subject to any copyright or other notices. They may not be re-sold as an e-book.

As author or other contributor you grant a CC-BY licence to others to reproduce your articles, including any graphics and third-party materials supplied by you, in accordance with the Conditions for Website Use and subject to any copyright notices which you include in connection with your articles and materials.

All copyright, and all rights therein, are protected by national and international copyright laws.

The above represents a summary only. For the full conditions see the Conditions for Authors and the Conditions for Website Use.

**ISSN** 1664-8714 **ISBN** 978-2-88919-416-2 **DOI** 10.3389/978-2-88919-416-2


## **PERCEPTION OF VISUAL ADVERTISING IN DIFFERENT MEDIA: FROM ATTENTION TO DISTRACTION, PERSUASION, PREFERENCE AND MEMORY**

Topic Editors: **Jaana Simola,** University of Helsinki, Finland **Jukka Hyönä,** University of Turku, Finland **Jarmo Kuisma,** Aalto University School of Business, Finland

Orange Bull I. Tempera on Canvas, 2007. With permission of Marjatta Tapiola, one of the leading painters of Finnish modern art.

This Research Topic aims to showcase the state of the art in visual advertising research. Although visual processes are a central component of consumer behavior, they have been largely neglected in models explaining consumer perception of advertising. Rather than being the mere input into the cognitive or affective systems, the visual processes both voluntarily and involuntarily affect the amount and quality of information that is passed into further mental processing. Moreover, advertisements provide a well-designed, rich and stimulating environment to study visual processes in real-life conditions.

Consumers encounter thousands of advertisement messages per day. Previous research on visual perception of advertising mostly considers print advertising. However, advertising messages increasingly appear in a variety of formats and in different media. Part of these messages are still conveyed through traditional media, such as newspapers, magazines, television, as well as outdoor and supermarket advertising. In addition, the amount and diversity of visual marketing stimuli is rapidly growing in terms of different advertising formats appearing in online and social media, smartphones and tablets. This challenges the marketing professionals and academics to better understand the impact of marketing on consumers. At the same time, the technical development of the research methods allows better opportunities to investigate advertising perception in different environments.

Traditionally, papers investigating the psychological processes underlying advertising perception are published in journals spread across different disciplines, such as marketing, applied psychology and human-computer interaction. With this Research Topic, we aim to create a forum in which experts in different fields define the state of the art and future directions of the research on the visual aspects of marketing. We include reviews and original research papers involving both empirical and theoretical studies on visual perception of advertising across different media.



#### *Jaana Simola<sup>1</sup>\*, Jukka Hyönä<sup>2</sup> and Jarmo Kuisma<sup>3</sup>*

*<sup>1</sup> Cognitive Science, Cognitive Brain Research Unit, University of Helsinki, Helsinki, Finland*

*<sup>2</sup> Department of Psychology, University of Turku, Turku, Finland*

*<sup>3</sup> Department of Marketing, Aalto University School of Business, Helsinki, Finland*

*\*Correspondence: jaana.simola@helsinki.fi*

#### *Edited and reviewed by:*

*Lorenza S. Colzato, Leiden University, Netherlands*

**Keywords: advertising, eye movements, attention, memory, ad format, animation, internet, media**

Our everyday visual environment is cluttered with advertisements. We come across them in newspapers and magazines, on television, and on the Internet. They can be static, as in print advertisements, or dynamic, as is often the case with TV and Internet ads. The advertising messages are transmitted into the cognitive and affective systems via visual processes. Rather than being a mere input device, the visual processes both voluntarily and involuntarily control the amount and quality of information that is passed onto further mental processing. The effectiveness of advertising therefore critically depends on its ability to attract visual attention. The advertisements should also retain attention long enough to allow sufficient encoding of information into long-term memory.

This Research Topic reviews and further explores processing of visual advertising. We draw together state-of-the-art research on how different ad properties, such as image statistics, surface size of the ads, presence of human faces or animation, and distance of the ads from the editorial content, affect ad processing. Furthermore, new measures are proposed to evaluate the capacity of an advertisement to guide observers' gaze and to evaluate the global scanning behavior related to the ads. The Research Topic also presents novel hypotheses and results concerning the impact of narcissism and alcohol intoxication on ad processing. Developmental aspects are taken into account by investigating how children interact with online advertisements. The contributions also successfully cover the diversity of media where visual marketing stimuli appear. The studies review and investigate processing of image and text content of print advertisements as well as advertisements in dynamic media, such as on the Internet or in a digital game environment.

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 26 September 2014; accepted: 06 October 2014; published online: 28 October 2014.*

*Citation: Simola J, Hyönä J and Kuisma J (2014) Perception of visual advertising in different media: from attention to distraction, persuasion, preference and memory. Front. Psychol. 5:1208. doi: 10.3389/fpsyg.2014.01208*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology.*

*Copyright © 2014 Simola, Hyönä and Kuisma. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

**REVIEW ARTICLE** published: 17 March 2014 doi: 10.3389/fpsyg.2014.00210

#### *Emily Higgins\*, Mallorie Leinenger and Keith Rayner*

*Department of Psychology, University of California, San Diego, CA, USA*

#### *Edited by:*

*Jukka Hyönä, University of Turku, Finland*

#### *Reviewed by:*

*Agnieszka Konopka, Max Planck Institute for Psycholinguistics, Netherlands; Stevan Adam Brasel, Boston College, USA*

#### *\*Correspondence:*

*Emily Higgins, Department of Psychology, 0109, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA e-mail: ehiggins@ucsd.edu*

In this selective review, we examine key findings on eye movements when viewing advertisements. We begin with a brief, general introduction to the properties and neural underpinnings of saccadic eye movements. Next, we provide an overview of eye movement behavior during reading, scene perception, and visual search, since each of these activities is, at various times, involved in viewing ads. We then review the literature on eye movements when viewing print ads and warning labels (of the kind that appear on alcohol and tobacco ads), before turning to a consideration of advertisements in dynamic media (television and the Internet). Finally, we propose topics and methodological approaches that may prove to be useful in future research.

**Keywords: advertising, eye movements, visual attention, saccades, marketing**

Eye movements are of interest, with respect to viewing advertisements and more generally, because they provide fine-grained information about patterns of visual attention. Because we cannot process detailed information far beyond the *fovea*, the central region of the retina spanning about 2° of visual angle, we must move our eyes from one location to the next, sequentially *fixating* (or looking directly at) areas of interest (Rayner, 1998, 2009). Saccade targets are determined, in large part, by our immediate cognitive or perceptual requirements. Eye movements are thus an important way in which we exercise active selection over our complex visual environments (Findlay and Gilchrist, 2003). By inspecting the eye movement record we can, consequently, make inferences about how viewers selectively attend to the visual world, whether they are reading, viewing natural scenes, searching for a target item, or, as is of primary concern here, viewing advertisements.

It is important to note, at this point, that *eye position* and the *locus of visual attention* are not precisely identical concepts, since it is possible to disengage attention from the current point of fixation (Posner, 1980). Indeed, our attention generally shifts to the *next* location we will fixate shortly before we actually move our eyes (Rayner et al., 1978; Kowler et al., 1995; Deubel and Schneider, 1996). However, attention and eye movements are typically quite closely coupled (and, when they do become separated, it is generally in the systematic manner just described, so that the eyes will soon "catch up" with the focus of attention). Therefore, fixation distributions provide detailed information about which regions of a display most effectively capture visual attention. Furthermore, the duration spent fixating each location provides information about the amount of cognitive and perceptual processing devoted to that region (Rayner, 1998, 2009).
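To make the two measures just mentioned concrete, the following minimal Python sketch tallies the number of fixations and the total fixation time falling within each region of a display. The region names, pixel coordinates, and fixation values are hypothetical and serve only as an illustration; they are not taken from any study reviewed here.

```python
from collections import defaultdict

# Hypothetical regions of an ad, as (left, top, right, bottom) in pixels.
REGIONS = {
    "headline": (0, 0, 800, 100),
    "pictorial": (0, 100, 800, 500),
    "brand": (600, 500, 800, 600),
}

def region_for(x, y, regions=REGIONS):
    """Return the name of the first region containing the point, or None."""
    for name, (left, top, right, bottom) in regions.items():
        if left <= x < right and top <= y < bottom:
            return name
    return None

def summarize(fixations, regions=REGIONS):
    """Aggregate fixations, given as (x, y, duration_ms) tuples, by region."""
    stats = defaultdict(lambda: {"count": 0, "dwell_ms": 0})
    for x, y, duration_ms in fixations:
        name = region_for(x, y, regions)
        if name is not None:  # fixations outside every region are ignored
            stats[name]["count"] += 1
            stats[name]["dwell_ms"] += duration_ms
    return dict(stats)

# Hypothetical fixation record for one viewer.
fixations = [(400, 50, 220), (300, 300, 310), (350, 320, 280), (700, 550, 190)]
summary = summarize(fixations)
```

In this sketch the fixation count stands in for how strongly a region captures attention, and the summed duration for the amount of processing devoted to it.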

Research on eye movements and advertisements can provide general theoretical insights (Rayner et al., 2001; Wedel and Pieters, 2008b). For instance, the domain is well-suited for investigating the relationships between eye movements and higher-level phenomena, such as memory and preference. Furthermore, work in this area can shed light on how we integrate text and images as we inspect our visual environments, as ads are often complex stimuli, composed of both elements. As Buswell (1935) noted in his classic study of eye movements and scene perception, this research may also be useful from an applied perspective (see Duchowski, 2002 for a general review of applied eye movement research).

There are several reasons why eye tracking may be useful to those who design advertisements or public policy notices such as warnings on alcohol and tobacco products. First, eye movements can provide insight into the fast and detailed dynamics of visual attention that may simply not be available for introspection or verbal report (Pieters and Wedel, 2008). Second, eye tracking can be done in real time during ad viewing without interfering with ongoing processing (Russo, 1978; Wedel and Pieters, 2008a; Glaholt and Reingold, 2011). Third, the technique seems less prone to biasing subsequent responses of interest (e.g., choice of product or brand memory) than verbal protocols. Fourth, eye tracking can provide an efficient means of pinpointing which specific characteristics of an ad contribute to its success or failure in holding viewers' attention or driving consumer choices<sup>1</sup>. Of course, the technique is limited with regard to the kinds of information it can provide: if a researcher or advertiser were primarily interested in viewers' conscious, emotional reactions to a given image, for example, soliciting verbal responses would be preferred. Used in conjunction with other approaches, however, including interviewing subjects, testing their memory for products or brands, and tracking their selections, the technique can contribute substantially to applied research on advertisements (Treistman and Gregg, 1979).

<sup>1</sup>Suppose, for instance, that two draft versions of an ad were created and that one was consistently viewed for longer than the other. If the ads differed in several respects (pictorial, headline, etc.), an eye tracking experiment could efficiently reveal which element of the favored (or, at least, longer-viewed) ad was driving the effect. This information could then be used to inform the creation of new ads. As another example, suppose that behavioral experiments revealed that the inclusion of a particular new element in an ad – a line of text, for example, or a "packshot" showing the product – failed to increase memory and preference for the brand or product in question. Eye tracking could reveal whether the element was viewed (but, presumably, deemed unpersuasive) or simply never fixated. This, in turn, could provide useful clues about how the element should be revised, e.g., by changing its message or simply making it more visually salient (Lohse, 1997).

We begin by providing some background information on the basic properties of eye movements as well as their characteristics in reading, scene perception, and visual search. These topics are relevant because ads often consist of both text and scene-like information, and may also include a search component (if, for example, one is searching in a supermarket circular for a particular product of interest). Next, we will provide a more specific review of key findings concerning eye movements when viewing advertisements, including print ads, warning labels, and ads appearing on television (TV) and on the Internet<sup>2</sup>. Finally, we outline some topics that have, up to this point, remained relatively unexplored, as well as methodological approaches that may prove useful in future research.

#### **BACKGROUND INFORMATION ON EYE MOVEMENTS**

#### **BASIC CHARACTERISTICS**

While we can produce several different types of eye movements (see Rayner, 1998 for a review), only *saccades* are covered here, since they are most critical for the research reviewed. Saccades are fast, darting movements that we perform about three times each second (Schiller, 1998). They are interleaved with brief periods of relative stability, known as *fixations*, which last on average about 200–300 ms, depending on the task and the individual (Rayner, 1998, 2009). Saccades can reach velocities as high as 500° of visual angle per second. While their duration is dependent on the distance covered and varies as a function of task, they generally last about 20–50 ms. During these movements, effective visual processing is largely suppressed (Matin, 1974; Campbell and Wurtz, 1978), such that useful visual information can only be gathered during the intervening fixations.
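Because saccades are so much faster than the small movements that occur during fixations, the two can be separated in a raw gaze recording by thresholding sample-to-sample velocity. The Python sketch below illustrates this idea with a simple velocity-threshold rule. The approach is a commonly used one but is not described in the work reviewed here, and the 30°/s threshold, the 1000 Hz sampling rate, and the synthetic one-dimensional trace are all illustrative assumptions.

```python
def classify_samples(positions_deg, sample_rate_hz, velocity_threshold=30.0):
    """Label each inter-sample interval as 'saccade' or 'fixation'.

    positions_deg: gaze positions along one axis, in degrees of visual angle.
    velocity_threshold: in deg/s; an illustrative value, not one from the review.
    """
    dt = 1.0 / sample_rate_hz
    labels = []
    for prev, curr in zip(positions_deg, positions_deg[1:]):
        velocity = abs(curr - prev) / dt  # deg/s
        labels.append("saccade" if velocity > velocity_threshold else "fixation")
    return labels

def group_events(labels, sample_rate_hz):
    """Collapse runs of identical labels into (label, duration_ms) events."""
    runs = []
    for label in labels:
        if runs and runs[-1][0] == label:
            runs[-1][1] += 1
        else:
            runs.append([label, 1])
    ms_per_sample = 1000.0 / sample_rate_hz
    return [(label, n * ms_per_sample) for label, n in runs]

# Synthetic 1000 Hz trace: steady gaze at 0 deg, a 6 deg saccade lasting
# 30 ms (200 deg/s), then steady gaze at 6 deg.
trace = [0.0] * 250 + [0.2 * i for i in range(1, 31)] + [6.0] * 250
events = group_events(classify_samples(trace, 1000), 1000)
```

For this synthetic trace the procedure yields a fixation, a 30 ms saccade, and a second fixation. Real recordings are noisier and two-dimensional, so practical implementations usually add smoothing and minimum-duration criteria.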

Saccades are executed, as was noted above, in order to bring the fovea, the central 2° of the visual field with high acuity and good color vision, into alignment with the region we wish to process. The region surrounding the fovea and extending up to 5° of visual angle from fixation is known as the *parafovea*, while the region that lies beyond the parafovea is known as the *periphery* (note, however, that acuity drops off in a continuous fashion with increasing distance from the fovea, so that no sharp distinction should be drawn between the parafovea and periphery; Liversedge and Findlay, 2000). Although we make use of the lower resolution, parafoveal and peripheral information (e.g., to begin to process an upcoming word when reading or to decide where to move the eye next), for most tasks requiring the rapid processing of detail, foveal processing is necessary (Rayner, 1998, 2009).
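Since these regions are defined in degrees of visual angle, it can help to see how physical size and viewing distance translate into angle. The following Python sketch uses standard trigonometry for the conversion and then labels a location by its angular distance from fixation. The function names are hypothetical, and the hard cut-offs (1° and 5° from fixation) are a simplification of the boundaries given above, since acuity in fact declines continuously.

```python
import math

def visual_angle_deg(size_cm, distance_cm):
    """Visual angle subtended by an object of a given size centered on the line of sight."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))

def eccentricity_deg(offset_cm, distance_cm):
    """Angular distance from fixation of a point offset_cm away on the display."""
    return math.degrees(math.atan(offset_cm / distance_cm))

def visual_field_region(ecc_deg):
    """Classify a location by eccentricity, using simplified hard boundaries.

    The fovea covers the central 2 deg (within 1 deg of fixation) and the
    parafovea extends to 5 deg from fixation; everything beyond is periphery.
    """
    if ecc_deg <= 1.0:
        return "fovea"
    if ecc_deg <= 5.0:
        return "parafovea"
    return "periphery"

# At an assumed viewing distance of 60 cm, how wide is the central 2 deg?
foveal_width_cm = 2 * 60 * math.tan(math.radians(1.0))
```

At a viewing distance of 60 cm (an assumed value), the central 2° covers only about 2 cm of the display, which illustrates why frequent saccades are needed to inspect an ad in detail.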

#### **NEURAL BASIS OF SACCADE TARGETING**

The neural underpinnings of saccade targeting span multiple cortical and sub-cortical structures involved in attention, visual processing, and motor planning. We present a brief overview of some of the important aspects of this system here (for reviews, see Gaymard et al., 1998; Schiller, 1998; Liversedge and Findlay, 2000; Pierrot-Deseilligny et al., 2004; Schall and Cohen, 2011).

A saccade occurs when the extraocular muscles, arranged in three opposing pairs around the eye, are appropriately stimulated by premotor structures in the brainstem. Regions of the superior colliculus (SC), located in the midbrain, are critical for controlling these saccades. One population of cells in the SC fires continually during fixation, ceasing to fire just before a saccade is executed and remaining inactive for much of the duration of the saccade. Another population of cells forms a map of the visual field. The level of neural activity at different locations in the map appears to code for the importance of the corresponding locations in the visual scene. Thus, this population of cells is sometimes referred to as a *salience map*, with areas of high activity (or "peaks") marking important positions that serve as candidate targets for the upcoming saccade (Findlay and Gilchrist, 2005).

Similar maps appear to exist in other, cortical areas of the brain that project to the SC, though they are sometimes known as *priority maps* in these higher areas (Schütz et al., 2011). Maps in a region of the frontal cortex known as the frontal eye fields (FEF) may be important for directing endogenous, or top-down, saccades – i.e., saccades based largely on the goals of the viewer<sup>3</sup>. In contrast, the parietal eye fields (PEF) in the parietal lobe appear to be particularly important for coding exogenous, reflexive, or bottom-up saccades, of the kind that might occur, for example, following the sudden onset of a stimulus. Other frontal regions may be involved in suppressing such saccades, however, when executing them would be undesirable for present purposes (Pierrot-Deseilligny et al., 2004).

Notably, when mild stimulation, insufficient to trigger a saccade, is applied to the SC or FEF, this leads to superior visual processing at the corresponding locations in the scene (see Noudoost et al., 2010 for a summary), indicating overlap between the visual attention system and the oculomotor system (see Desimone and Duncan, 1995 for a review of visual attention in the brain).

While the basic principles of the oculomotor system hold true across tasks, it is important to note that eye movement measures in one task (e.g., reading) can differ substantially from those in other tasks (e.g., scene perception). This likely follows from differences both in the physical stimuli involved and in the nature of the viewers' goals and cognitive processing across these different activities. Therefore, we outline the basic characteristics of eye movements during reading, scene perception, and visual search below.

<sup>2</sup>Please note that some important topics concerning eye movements and marketing lie beyond the purview of this article. For example, we do not cover point-of-purchase marketing here (e.g., consumer responses to supermarket shelf displays). However, this is an active area of research (see Wedel and Pieters, 2008a; Glaholt and Reingold, 2011; Orquin and Mueller Loose, 2013 for relevant reviews). The topic of roadside advertising and potential attendant distraction, while clearly a matter of great importance, is also beyond the scope of the present article.

<sup>3</sup>Many complexities of the system are necessarily omitted from this short review. For example, the FEF also have direct projections to the premotor areas of the brainstem that are not relayed through the SC (Gaymard et al., 1998).

#### **READING**

When reading, fixations tend to be on the order of 225–250 ms. Average saccade length is seven to nine letters in alphabetic languages (Rayner, 1998, 2009). For speakers of English, and other languages written from left to right, most eye movements proceed in that direction, with *regressions* (i.e., saccades that move backward in the text) representing 10–15% of eye movements. Readers only fixate about 70% of the words in the text, *skipping* the other 30%.
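The regression and skipping proportions described above can be computed directly from the ordered list of words a reader fixated. The Python sketch below shows one simple way of doing so. The scanpath is hypothetical, and the definitions are simplified assumptions: a regression is counted at the word level, and a word counts as skipped if it was never fixated at all.

```python
def reading_measures(fixated_words, n_words):
    """Compute simple reading measures from word indices in fixation order.

    Simplifying assumptions: a regression is any saccade landing on an
    earlier word, and a word is 'skipped' if it is never fixated.
    """
    saccades = list(zip(fixated_words, fixated_words[1:]))
    regressions = sum(1 for origin, landing in saccades if landing < origin)
    words_fixated = len(set(fixated_words))
    return {
        "regression_rate": regressions / len(saccades) if saccades else 0.0,
        "skipping_rate": 1 - words_fixated / n_words,
    }

# Hypothetical scanpath over a 10-word sentence (word indices 0 to 9).
scanpath = [0, 1, 3, 4, 3, 5, 7, 8, 9]
measures = reading_measures(scanpath, 10)
```

In this hypothetical scanpath, one of the eight saccades is a regression and two of the ten words are never fixated. Published studies typically use finer-grained definitions, for example distinguishing first-pass skipping from words that are never fixated.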

Eye movements during reading provide an online index of the cognitive processes underlying language comprehension: in fact, how long the eyes remain fixated on a given word largely depends on how easy or difficult it is to process. Lexical variables such as word frequency and predictability have strong influences on fixation durations (for reviews, see Rayner, 1998, 2009), as does reading skill (Ashby et al., 2005) as well as typographical factors such as font difficulty (Rayner et al., 2006; Slattery and Rayner, 2010).

Though a large amount of text falls on the visual field during reading, readers are only able to obtain useful letter information from approximately 18–20 character spaces around fixation, and they do not use information from lines above or below the currently fixated line (Inhoff and Briihl, 1991; Inhoff and Topolski, 1992; Pollatsek et al., 1993). This limited area of effective processing, known as the *perceptual span*, is asymmetrical in the direction of upcoming text (and attention), such that, for readers of English, it extends about three to four character spaces to the left of fixation (McConkie and Rayner, 1976; Rayner et al., 1980) and 14–15 characters to the right of fixation (McConkie and Rayner, 1975; Rayner and Bertera, 1979).
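The asymmetry of the perceptual span can be illustrated by extracting the characters that would fall inside it for a given fixation position. The following Python sketch is only a rough illustration: the function name is hypothetical, the defaults take the upper ends of the ranges reported above for readers of English, and a hard-edged window ignores the gradual decline in the information available toward the edges of the span.

```python
def perceptual_span(text, fixation_index, left=4, right=15):
    """Return the characters assumed to fall within the perceptual span.

    left and right are counts of character spaces on either side of the
    fixated character; the defaults are illustrative values for English.
    """
    start = max(0, fixation_index - left)
    end = min(len(text), fixation_index + right + 1)
    return text[start:end]

line = "Viewers obtain the gist of print advertisements very quickly"
# Fixating the 'g' of "gist" (character index 19).
window = perceptual_span(line, 19)
```

With the reader fixating the first letter of "gist", the window covers "the gist of print ad": four characters to the left of fixation and fifteen to the right.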

While fixation location and visual attention coincide when we are processing a fixated word, they may become decoupled when processing of that word is complete. While the eyes remain fixated on the current word, attention can nonetheless shift to the upcoming word (located parafoveally, but within the perceptual span) so that processing of this parafoveal word can begin. This preprocessing prior to actual fixation will facilitate foveal processing following a saccade to that word, giving rise to a *preview benefit*. Preview benefit is measured using a gaze-contingent boundary paradigm (Rayner, 1975), in which an initial preview of a target word is replaced with the word itself when the subject's eyes cross an invisible boundary during the saccade to the target (note that, because the display change occurs during the saccade, when vision is largely suppressed, subjects generally fail to notice it; Slattery et al., 2011). The preview may be identical to the target or may be a non-identical letter string. During reading, this preview benefit, defined as the reduction in foveal viewing time of the target following an identical vs. a non-identical preview, is about 30–50 ms (for reviews, see Rayner, 1998, 2009; Schotter et al., 2012).
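Given the definition above, the preview benefit is simply a difference between two mean viewing times. The minimal Python sketch below computes it from hypothetical viewing times on the target word; the values are invented for illustration and are not data from any of the studies cited.

```python
from statistics import mean

def preview_benefit(identical_ms, nonidentical_ms):
    """Mean target viewing time after non-identical previews minus that after identical previews."""
    return mean(nonidentical_ms) - mean(identical_ms)

# Hypothetical target-word viewing times (ms) in the two preview conditions.
identical = [240, 255, 230, 250]
nonidentical = [280, 295, 270, 290]
benefit = preview_benefit(identical, nonidentical)
```

For these invented values the benefit is 40 ms, which was chosen to fall within the 30–50 ms range reported for reading.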

#### **SCENE PERCEPTION**

During scene perception, viewers make both longer fixations and longer saccades than when reading text. Fixations last, on average, about 300 ms, while saccades span approximately 4–5° of visual angle (though both figures vary depending on the specific features of the scene as well as the task at hand). Furthermore, the perceptual span in scene viewing is substantially larger than in reading, though its precise extent is not as well understood as it is in reading (Rayner and Castelhano, 2008; Rayner, 2009). In addition, just as in reading, viewers obtain a preview benefit during scene perception (Pollatsek et al., 1984, 1990; Henderson et al., 1987, 1989; Henderson, 1992; Henderson and Siefert, 1999, 2001). The magnitude of this benefit appears to be on the order of 100 ms (Rayner, 1998, 2009).

Within our very first fixation on a scene we are, rather impressively, able to extract its global meaning or *gist*, distinguishing, for example, an indoor from an outdoor scene or a forest from a mountain landscape (Henderson, 2003; see Oliva, 2005 for a review of gist processing). This first glimpse is thought to orient the viewer and provide some guidance about subsequent eye movements (Rayner, 2009). When viewers do go on to inspect the rest of the scene, they do not fixate all regions with equal probability. Rather, they tend to selectively view those elements that are particularly meaningful or relevant. For instance, viewers inspecting a scene of two figures walking in a garden would devote a great many more fixations to the people's faces than to a nearby patch of plain grass (see Buswell, 1935 for a classic demonstration of this effect). In addition, if a region is visually distinctive or *salient* – for example, if it is of higher or lower intensity than its immediate surroundings – it will tend to draw a disproportionate number of fixations (Parkhurst and Niebur, 2003).

The goals of the viewer also affect eye movements during scene perception. Yarbus (1967), for instance, found that viewers inspected a single painting, Repin's *The Unexpected Visitor*, quite differently depending on their instructions. In the painting, a man (the "visitor") enters a domestic scene. When viewers were asked to decide how long the visitor had been away, for instance, fixations seemed to cluster mainly on the faces of the individuals in the room. When asked to determine the economic circumstances of the family depicted, however, viewers' fixations appeared more widely dispersed, landing more upon objects in the room (such as pieces of furniture or clothing) that might provide information about prosperity than in the former condition.

Finally, one striking finding regarding scene perception is that, despite the common intuition that we monitor our visual environments quite closely (Levin et al., 2000), research indicates that we may miss even rather dramatic changes provided that they happen during a saccade or other visual disruption. Grimes (1996; see also McConkie and Currie, 1996), for example, investigated subjects' sensitivity to dramatic changes in natural scenes introduced during saccadic eye movements. Even with prior warning that such changes might occur, subjects' ability to detect them was surprisingly limited. For example, when a flock of birds in one scene dwindled in number by about a third during an eye movement, subjects reported noticing something odd only about 10% of the time. Importantly, however, if the changing object is pre-cued (Rensink et al., 1997) or lies near the target of the critical saccade (i.e., the saccade during which the change occurs), change detection rates improve (Henderson and Hollingworth, 1999). These findings highlight the critical role of attention in determining how we perceive our visual environments.

#### **VISUAL SEARCH**

Visual search is an important part of many everyday activities. We perform such searches, for example, when looking for tea at the grocery store or trying to find our keys on the way to work each morning. The basic parameters of fixations and saccades during visual search are quite variable. Overall, average fixation times are reported to be between 180 and 275 ms, while average saccade size tends to be intermediate between that of reading and that of scene perception, but can vary widely (Rayner and Castelhano, 2008). Such variability is perhaps to be expected since, as will be seen below, eye movement patterns during search exhibit a remarkable flexibility and sensitivity to the specific demands of the moment.

When we search for an item of interest, both bottom-up (or stimulus-driven) and top-down (goal-driven) factors guide our eye movements. Bottom-up guidance is evident when eye movements are drawn to a region that stands apart from its surroundings, irrespective of the qualities of the search target (see Itti and Koch, 2001 for a review of models that emphasize bottom-up effects on attention and eye movements). An item that stands out in a highly salient manner from all surrounding objects (e.g., a single tilted line amid a field of vertical lines) is said to "pop out" (Wolfe, 1994).

Top-down guidance is driven by the properties of the target and their relationship with various elements of the scene. For instance, if we are searching for a bright yellow car in a crowded parking lot, similarly bright cars will preferentially attract our eye movements (Pomplun, 2006). When we perform *conjunctive* visual search, i.e., search for a target that is defined by a pair of properties (e.g., being both round and red), fixations cluster preferentially on items belonging to the less frequent property in the display (Shen et al., 2003). This illustrates the remarkable sensitivity of our eye movement system to the relative informativeness of different stimulus features during search.

Top-down search also operates when our high-level expectations about where a target object is likely to reside affect search behavior. For instance, when searching for a computer monitor in an office scene, eye movements will cluster on the desk, rather than along the floorboards (Neider and Zelinsky, 2006). In general, recent research suggests that, while bottom-up guidance plays a role in search, top-down guidance may be dominant during real-world search for meaningful objects (e.g., Chen and Zelinsky, 2006; Pomplun, 2006; Henderson et al., 2007; Peters and Itti, 2007).

#### **VIEWING ADVERTISEMENTS**

We now turn to examine research more specifically focused on eye movements when viewing advertisements. We discuss print advertisements, warning labels, and dynamic media (TV and the Internet) in turn.

#### **PRINT ADVERTISING**

Viewers obtain the gist of print advertisements very quickly, reliably discriminating them from editorial content – and, under some conditions, even identifying the advertised product – after exposures of only 100 ms (Pieters and Wedel, 2012). In this section, we examine some of the factors that guide attention after this initial glimpse, as viewers begin to actively explore advertisements by shifting their gaze from one location to the next within the display. We begin by considering the composition of ads, including basic visual properties (e.g., color and size) as well as higher-level, semantic cues. Next, we review effects of ad originality (or creativity) as well as repetition. We then consider how viewers' goals or tasks affect viewing behavior before turning, finally, to briefly review findings concerning the integration of text and picture processing when viewing print advertisements. At several points throughout the review, the relationship between eye movements and higher-level phenomena such as memory will also be discussed.

#### *Ad composition*

In this section, we review critical findings on the relationship between the composition of print ads and eye movement measures. We begin by examining possible effects of basic, visual characteristics and then proceed to a consideration of higher-level, semantic aspects of advertisements.

Lohse (1997) tracked subjects' eye movements as they viewed yellow page advertisements and selected products from various categories as if for purchase. Viewers were more likely to look at large ads than small ads (see also Pieters et al., 2007), though small display ads received more fixations per unit area than large display ads (see Peschel and Orquin, 2013 for a review of surface size effects on visual attention). Viewers were also more likely to fixate on color than black and white ads, and looked at color ads sooner (i.e., nearer the beginning of the fixation sequence) and for a longer duration. In addition, they spent marginally more time viewing ads that contained pictures than those that did not. The location of the ad was also important, such that ads near the end of the page were often skipped. Products that were subsequently selected also received considerably more visual attention than did those that were not. Lohse and Wu (2001) conducted a similar study, this time presenting a directory in Mandarin to Chinese subjects and replicated the main findings of the original study, suggesting that these effects are not culturally specific.

Other research has examined possible effects of the size of particular elements of advertisements, such as the text or picture, on patterns of visual attention. When ads were presented as part of a competitive visual array (as in a supermarket circular), Pieters et al. (2007) found that ads with larger pictures, but not larger text elements, were more likely to be fixated and were viewed for longer. In contrast, Pieters and Wedel (2004) found that when subjects inspected solitary advertisements in magazines, ads with larger text elements, but not larger pictures, were more likely to be fixated and viewed for longer. (The *presence* of a picture, however, independent of its size, did appear to attract attention under these conditions.) Comparing these findings may suggest that sufficient picture size is particularly important for capturing and holding attention in competitive visual environments, while a sufficient amount of text may be especially important when ads are presented alone. However, the results were obtained in separate studies using stimuli that differed in several respects (e.g., types of product advertised, the range of text and picture sizes), so no strong claim to that effect can yet be made.

Interesting findings have also been reported regarding brand elements (e.g., logos) of advertisements in particular. While intuition might suggest that viewers will be repelled by them, since they serve as a salient reminder that the stimulus is an ad rather than a piece of editorial content, some eye movement data suggest otherwise. First, Wedel and Pieters (2000) found that, among all ad elements, the brand received most fixations per unit of surface area (but see Ryu et al., 2009). Second, each fixation on the brand element predicted a greater improvement in performance on a subsequent recall test than did each fixation on the text or pictorial<sup>4</sup>. Third, increasing the size of the brand element did not reduce overall viewing times on ads, as one might expect on the theory that salient brand elements reduce attention to advertisements (Pieters and Wedel, 2004). However, as will be noted below, the sustained presence of a central brand element in TV commercials is associated with ad skipping (Teixeira et al., 2010).

Visual competition or clutter, an issue of considerable importance in many visually complex contemporary environments, has also been examined. Pieters et al. (2010) found that high levels of visual feature complexity in advertisements were associated with reduced viewing of the brand element. Visual competition is also a concern when designing "feature advertisements" (such as supermarket circulars), wherein multiple ads are displayed simultaneously and must compete for viewers' attention. Janiszewski (1998) found that items subject to greater visual competition by surrounding objects were viewed for less time and, in a separate experiment, remembered less well than items subject to less competition.

Janiszewski (1998) also proposed that the layout of feature advertisements could be optimized (from the perspective of the advertiser), without removing any items, in order to minimize visual clutter and maximize overall viewing time. Pieters et al. (2007) extended this line of inquiry, developing a model to minimize visual competition (based on the Attention Engagement Theory; see Duncan and Humphreys, 1989, 1992). This optimized layout led to an increase in overall viewing time of the entire ad array when compared with the existing layout. Average time spent viewing a particular feature ad, given that it was fixated, was also higher in the optimized layout, though average probability of fixating an ad within the array declined. Furthermore, Zhang et al. (2009) developed a Bayesian model that, they argue, suggests that the layout of feature advertisements can affect sales and that this effect is mediated by visual attention on ads. However, confounds are, of course, a concern in correlational research of this kind (though Zhang et al., 2009 adopted a statistical approach designed to circumvent several concerns of this nature).

Simola et al. (2013) examined both the semantic and the spatial relationships between ads and editorial material. They found that when the semantic content of ads was congruent with the text – for instance, a beer ad accompanying an article about beer – these ads were (at least when presented on the right) remembered better than were incongruent ads. Interestingly, however, incongruent ads received more visual attention (also when presented on the right) than did congruent ads (but see Hervet et al., 2011, discussed below). This difference only appeared in "second-pass" viewing of the ad (that is, on a return to the ad after having left it), suggesting that an initial fixation on the ad was required before effects of semantic congruency could influence eye movements. Simola et al. also found that ads received more visual attention and were recognized better when placed to the right of the editorial content.

Social cues contained within advertisements have also been examined. Hutton and Nolte (2011) recently demonstrated, for instance, that when a model in an advertisement looks at the product on display, rather than looking forward toward the viewer, subjects spend longer inspecting the product, the brand logo, and the advertisement as a whole.

Classic research has also found that the presence of a human form may affect viewing behavior (Nixon, 1925; see also Kroeber-Riel, 1979, citing a study by Witt, 1977, concerning the level of undress exhibited by a figure in an advertisement). Research in scene perception indicates, however, that when attempting to discover effects of high-level, semantic aspects of a stimulus, it is important to control for possible differences in low-level visual salience (see Rayner, 1998 for a discussion of such considerations). Future research could build upon these early studies, then, by determining and attempting to control for differences in low-level visual salience across ads, thus allowing us to draw stronger inferences about the possible role of these higher-level, semantic factors.

#### *Originality*

When ads are particularly creative or original, how do viewers respond? Radach et al. (2003) compared viewing behavior, affective responses, and memory for "implicit" and "explicit" ads. The explicit ads featured text and images that were related to one another and to the product being advertised in a fairly straightforward manner while, in the implicit ads, these relationships were more creative and less direct. The implicit ads were viewed for longer than their explicit counterparts and, while mean fixation duration and saccade amplitudes did not differ across ad types, the implicit ads received significantly more fixations than did the explicit ads. Subjects also liked the implicit ads better than the explicit ones and rated them to be more interesting than their explicit counterparts<sup>5</sup>. Overall, memory for the implicit and explicit ads was similar, but a detailed analysis suggested that there might have been a slight advantage for the implicit ads in some conditions (see also Pieters et al., 1999b).

<sup>4</sup>However, it should be noted that the particular nature of the memory test used here, in which subjects had to identify the advertised brand based on a pixelated version of the ad, seems likely to confer a relative advantage on the brand element when compared with other components. Note, for instance, that the body text was not easily resolvable from the pixelated version of the ad. Thus, further examinations should attempt to replicate this result using different types of recall tests.

<sup>5</sup>There is a typo in Table 6 of the chapter by Radach et al. (2003) suggesting that, in Experiment 2, the explicit ads were liked better and rated as more interesting. However, the main body of the text (with which the table conflicts) is correct in claiming that in both Experiment 1 and Experiment 2 the implicit ads were liked better and rated as more interesting (R. Radach, personal communication, October 17, 2013).

However, Pieters et al. (2002) pointed out that while consumers like original ads and view them for longer periods overall, they may attend selectively to the particularly creative or artistic aspects of the advertisements, potentially at the expense of the brand or product advertised. Thus, while such creative ads may please the viewer, they may not serve the interests of the advertiser if, indeed, they direct attention away from the advertised brand. Pieters et al. conducted an experiment that partially addressed this question by comparing viewers' fixations on the brand elements (such as the logo) of original or creative ads with more typical ads. Brand elements in the creative ads tended to receive *more*, not fewer, fixations than those of their typical counterparts, suggesting that creative ads may not, in fact, divert attention from the advertised brand, but rather may serve to increase it.

#### *Repetition*

Another potentially important factor in real-world ad viewing is that a viewer may well be exposed to a particular ad repeatedly (if, for instance, it runs in multiple magazines). Pieters et al. (1996) addressed this topic, finding that when subjects were exposed to an ad three times over the course of an experimental session, viewing time decreased with additional exposures (see also Pieters et al., 1999a). More elements of the ad were also skipped in the third than in the first viewing. Furthermore, an effect of subject motivation on viewing time (to be described below) disappeared by the third exposure. Pieters et al. (1999a) maintained, however, that the probabilities of moving from each ad element (e.g., the headline) to each other element (e.g., the pictorial) on the next fixation remained stable over repeated exposures (see also Rosbergen et al., 1997b). It is not yet clear, however, how well each of these findings will generalize to (arguably more naturalistic) conditions in which exposures to the ad are spaced out over longer intervals.

Finally, Pieters et al. (2002; see also Pieters et al., 1999b) investigated the eye movement patterns associated with ads of varying prior familiarity. Ads rated as being more familiar (by trained raters not participating in the eye movement study) were fixated less frequently than were less familiar ads. The effect seemed mainly to be driven by a decline in fixation frequency on the text with increasing ad familiarity. However, if an ad was particularly original or creative, this ameliorated negative effects of familiarity.

#### *Goals*

As was discussed above, top-down factors concerning the viewer's goal have long been known to affect eye movement behavior during scene perception and other visual activities. More recent research has also examined effects of goal or task when subjects view advertisements and has demonstrated that these factors can have a profound effect on viewing behavior.

Perhaps unsurprisingly, when subjects control viewing time, they inspect ads for longer when given instructions that encourage deeper processing. An important implication of this general finding (to be discussed in more detail below) is that viewing behavior during laboratory tasks that promote deep engagement with advertisements is likely to differ substantially from real-world ad viewing, which is often quite cursory (Wedel and Pieters, 2000; Pieters and Wedel, 2004, 2007, 2008).

Pieters et al. (1996) compared behavior in a "high motivation" condition, in which subjects were instructed to view ads carefully and told they would later be allowed to select one of the advertised products, to that in a "low motivation" condition, in which subjects were simply told to evaluate the "draft versions" of the ads (see also Pieters et al., 1999a, Study 2). In early exposures to the ad, highly motivated subjects viewed ads for substantially longer, although, as was noted above, this difference disappeared by the third exposure. Similarly, Rayner et al. (2001) compared viewers' responses to "critical" ads, those featuring a product to be evaluated as if for purchase, and "non-critical" ads, featuring products from another category. Critical ads were fixated more and viewed for significantly longer than were non-critical ads. Critical ads were also missed less, in a subsequent recognition memory test, than were non-critical ads (though no such advantage for critical items appeared in a free recall test). In addition, Radach et al. (2003) found that when subjects were asked to decide how much they liked an ad, they viewed it for substantially longer than when they were asked to paraphrase the message of the ad. Subtle differences in task, however, may not be sufficient to drive this effect, as Rayner et al. (2008) found no significant differences in total ad viewing time when subjects were instructed to evaluate an ad for its effectiveness or decide how much they liked it.

The total time spent viewing an ad (presented in isolation) can, of course, be measured perfectly well without eye tracking. However, eye movement data can also reveal more fine-grained differences across tasks. In particular, some eye tracking research suggests that viewers' goals affect the proportion of time they allocate to different ad elements, such that tasks that require considering the brand or product advertised in a fairly deep manner may favor the text, while tasks that encourage more shallow processing, or making judgments about the quality of the ad itself, may favor picture viewing.

First, Radach et al. (2003) found that when subjects were asked to evaluate an advertisement, they viewed the picture longer than the other components and subsequently recalled more information about the picture. When subjects were asked to paraphrase the message of an ad, however, viewing time on the picture substantially declined. In addition, Pieters et al. (1996) found interesting differences in text and picture viewing between high and low motivation conditions. However, the effects were only significant in the second of three presentations of the ad, so they should perhaps be viewed as tentative at this time. In the second exposure to an advertisement, low motivation subjects spent a greater proportion of time viewing pictures than did those in the high motivation group. Conversely, high motivation subjects spent a greater proportion of time viewing the text than low motivation subjects.

Pieters and Wedel (2007; see also Wedel et al., 2008 for further analyses of these data) also found that body text and picture viewing were affected differently by task. Subjects spent most time viewing the text in a task that required subjects to learn about the advertised brand. In contrast, viewers' eye movements were drawn preferentially to the picture in conditions that required subjects to memorize the ad or view it freely as they would at home.

Comparing the findings of Rayner et al. (2001), in which subjects were instructed to consider one of the types of advertised products for purchase, and Rayner et al. (2008), wherein subjects made judgments about the ads themselves (whether they liked them and how effective they were) also suggests that different goals may affect text and picture viewing patterns differently. In Rayner et al. (2001), text elements were viewed for a great deal longer than the pictures, while in the latter study, the pictures were viewed longer than the text (though the effect failed to reach statistical significance in an analysis that controlled for differences in surface area across elements). Furthermore, early looks tended to be drawn toward text in the 2001 study (on average, the text was reached by the third fixation) but toward the picture in the 2008 study.

Rayner et al. (2008) compared data obtained in the two experiments, considering only the subset of stimuli that were used in both. Based upon this analysis, they suggested that differences in subject instructions did likely contribute, to some extent, to the differences in viewing behavior across studies. This interpretation should not be viewed as conclusive, however, since the data compared were collected in separate experiments. It should also be noted that, when text and picture viewing for critical and non-critical ads were compared within the Rayner et al. (2001) study, no clear interaction of the expected type (i.e., showing a text advantage for critical ads and a picture advantage for non-critical ads) emerged<sup>6</sup>.

Rosbergen et al. (1997a) obtained related results using latent class analysis to segment viewers into three distinct groups. While task was not manipulated in this study, subjects' attitudes about the advertised products were recorded and compared with the eye movement data. The picture (as well as the headline) was favored by the subject group who spent the least time viewing the ad overall and deemed the advertised product to be particularly low in risk (i.e., they thought that choosing incorrectly would not be a costly error; Jain and Srinivasan, 1990, as cited in Bearden and Netemeyer, 1999). The only group to spend a substantial portion of the time viewing the body text was that which spent the most time viewing the ad overall, perhaps indexing deeper consideration of the advertised product. Additionally, subjects in this group viewed the product as more risky than did those in the other groups. Overall, then, the evidence suggests that deep engagement with the product advertised (and its attendant risks) may bias subjects toward the text, while more casual viewing, or evaluation of the advertisement itself, may bias viewers toward the picture.

#### *Integrating text and picture viewing*

We now consider research on how viewers integrate text and picture elements while inspecting print ads. Rayner et al. (2001) found that average fixation duration when viewing the picture in an ad (about 266 ms) was significantly longer than when viewing the text (about 226 ms). Viewers also made longer saccades on average (about 4.5° of visual angle) when examining a picture than when reading the text (about 3.1°). These findings were replicated in Rayner et al. (2008) and are also quite consistent with the

Rayner et al. (2001, 2008) also found that viewers generally did not quickly alternate between fixating the text and the picture but rather tended to remain on one component or the other for several fixations in a row. More specifically, given that a fixation was on the picture, the next fixation would also be on the picture about 78% of the time; if a fixation was on the text, the following fixation would remain on the text about 77% of the time (Rayner et al., 2008). Pieters et al. (1999a) reported similar findings.
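The transition probabilities described above can be illustrated with a minimal sketch. It assumes that each fixation has already been assigned to an ad element; the function name `transition_probabilities` and the fixation sequence are hypothetical and are not taken from any of the studies reviewed.

```python
from collections import Counter


def transition_probabilities(labels):
    """Estimate first-order transition probabilities between ad elements.

    labels: ordered list of the ad element hit by each fixation.
    Returns {(source, destination): probability}, where the probabilities
    for each source element sum to 1.
    """
    # Count each pair of consecutive fixations (source -> destination).
    pair_counts = Counter(zip(labels, labels[1:]))
    # Count how often each element serves as a source (all but the last fixation).
    source_counts = Counter(labels[:-1])
    return {
        (src, dst): count / source_counts[src]
        for (src, dst), count in pair_counts.items()
    }


# Hypothetical fixation sequence on one ad (illustrative data only).
sequence = ["picture", "picture", "picture", "text", "text",
            "text", "text", "picture", "picture"]
probs = transition_probabilities(sequence)
for (src, dst), p in sorted(probs.items()):
    print(f"{src} -> {dst}: {p:.2f}")
```

A high probability on the picture-to-picture and text-to-text entries corresponds to the pattern of remaining on one component for several fixations in a row.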

However, Radach et al. (2003) reported (somewhat informally) that viewers tended to look back and forth fairly frequently between different elements of the ad, including the text and the picture. They suggested that this may have been due to the relatively high demands placed on subjects in their study. Indeed, as we have seen, the goal of the viewer can substantially affect viewing behavior. However, another possibility is that the nature of the stimuli, and in particular the text used within the ads, may have differed across experiments. In particular, many of the ads used by Rayner et al. contained somewhat lengthy passages of "body text." If the stimuli used by Radach et al. (2003) contained shorter snippets of text (in the form of headlines or brief slogans), one might imagine that this could lead to more alternating between text and pictures if readers adopted a "sampling" approach rather than a reading approach toward the text. This idea is, of course, purely speculative, but it could be tested experimentally in future research.

In summary, then, a number of factors appear to guide eye movements when viewing print advertisements. These include size, color (Lohse, 1997; Lohse and Wu, 2001), and visual clutter (Janiszewski, 1998), as well as higher-level social cues, such as the direction of a model's gaze (Hutton and Nolte, 2011). Creative or original ads are also fixated more than typical ads, and are liked better, and deemed more interesting (e.g., Radach et al., 2003). Repeated exposures to a given ad reduce viewing times, at least when these exposures occur in short succession (Pieters et al., 1996). However, the transition matrices between ad elements, indexing the probability of making a saccade from one element to another, remain fairly stable across multiple viewings (Pieters et al., 1999a). In addition, the beneficial effects of a particularly creative ad may ameliorate the negative influences of repetition (Pieters et al., 2002). The goal or task of the viewer also strongly influences how long we view ads (e.g., Rayner et al., 2001) and may, furthermore, change the proportion of time spent viewing specific ad elements (such as the text vs. the picture). Research on eye movements when viewing text and pictures in ads mirrors the broader eye movement literature in that both fixations and saccades are longer when viewing pictures than when reading text (Rayner et al., 2001). Somewhat mixed findings have emerged on the question of whether viewers tend to skip back and forth between text and pictures or remain on one element for a more extended period (compare Radach et al., 2003 with Rayner et al., 2001, 2008). However, two possible explanations for these discrepancies have been proposed (one concerning differences in task and the other concerning differences in stimuli), and future research may resolve this question. Finally, in some of the studies reviewed, eye movement measures were correlated with subsequent measures of memory for the advertised product or brand. In the upcoming sections of the article, reviewing eye movements when viewing warning labels as well as ads presented on TV or the Internet, we will continue to explore issues of eye guidance, as well as the relationship between eye movements and higher-level phenomena such as memory.

<sup>6</sup>More specifically, the text was viewed longer and more often than the picture in this study for both critical and non-critical ads. For one of the ad types only (depicting cars), however, the text advantage was greater when those ads were critical than when they were not.

#### **WARNING LABELS**

When studying how viewers inspect advertisements, we are often interested in what elements of an ad capture and hold viewers' attention. While most information (pictorial or textual) is redundant in its attempt to persuade consumers and provide them with a favorable impression of the advertised product or brand, there is one clear-cut exception. The inclusion of health warnings on alcohol and tobacco advertisements represents a clear case in which the information gleaned from viewing the advertisement varies as a function of which regions are viewed.

Across several studies investigating the viewing of alcohol and tobacco warning labels, the general finding is that these labels are often not viewed at all, and when they are viewed, it is for a very small percentage of the overall ad viewing time (e.g., Fischer et al., 1989; Fox et al., 1998; Thomsen and Fulton, 2007). Because, in the United States, these warnings are usually small in relation to the overall advertisement (taking up, for example, only 3.2% of the ad in a sample used by Fischer et al., 1989), entirely text-based, and black and white, they are unlikely to capture and hold viewers' attention. Multiple lines of research have therefore investigated the viewing time and recall of warning labels in existing advertisements and compared them with those in which the salience of the warnings has been manipulated.

In one of the first such studies, Fischer et al. (1989) recorded the eye movements of adolescents viewing real cigarette and alcohol advertisements. They found that on 43.6% of trials, subjects never directly fixated the warning, and that on 19.8% of trials subjects looked at, but did not read, the warning<sup>7</sup>. On average, subjects looked at the warning labels for only 750 ms, which corresponded to 8% of the total ad viewing time, and this time was unaffected by differences in content, position, or shape (though the stimulus set was small – only five advertisements were tested). Additionally, they found that performance in a subsequent masked recall test of warning label content (where subjects were shown the original ad with the warning label and other areas masked and asked to recall the content) was positively correlated with both mean looking and reading time.
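The distinction between looking and reading in this study rests on a duration criterion described in the accompanying footnote: fixations of 100 ms or more count toward reading time, while shorter fixations count only toward looking time. A minimal sketch of that rule follows. The function names and the sample durations are hypothetical, and treating looking time as the sum of all fixation durations on the warning is an assumption of this sketch.

```python
READING_THRESHOLD_MS = 100  # criterion reported for Fischer et al. (1989)


def looking_and_reading_time(durations_ms, threshold_ms=READING_THRESHOLD_MS):
    """Return (looking_time, reading_time) in ms for fixations on a warning.

    Looking time sums every fixation (an assumption of this sketch);
    reading time sums only fixations at or above the threshold.
    """
    looking = sum(durations_ms)
    reading = sum(d for d in durations_ms if d >= threshold_ms)
    return looking, reading


def classify_trial(durations_ms, threshold_ms=READING_THRESHOLD_MS):
    """Label a trial by how the warning was treated under the duration rule."""
    if not durations_ms:
        return "not fixated"
    _, reading = looking_and_reading_time(durations_ms, threshold_ms)
    return "read" if reading > 0 else "looked but did not read"


# Hypothetical fixation durations (ms) on a warning label in three trials.
print(looking_and_reading_time([80, 120, 250, 60]))
print(classify_trial([]), "|", classify_trial([80, 90]), "|", classify_trial([80, 120]))
```

Because the rule depends only on fixation duration, it cannot distinguish a long fixation that happens to land on the text from genuine reading, which is the limitation the footnote points out.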

To investigate the effects of various cues on attentional capture and ease of identification, Laughery and Young (1991) manipulated the saliency of warning labels by including pictorials, icons, colors, borders, or combinations of these four cues, and measured the time it took subjects to locate the warning label (i.e., the time from image onset to the first fixation on the warning label), as well as the time it took them to determine that the information was a warning (measured by the time from first fixation on the label until a button was pressed). Time to locate the warning was numerically shorter when any of the saliency manipulations were included, and significantly shorter when the pictorial cue, the color cue, or all four cues combined were included. Similarly, the time to determine that the label was a warning was significantly shorter when a pictorial was included, either alone or in combination with other cues. However, since the subject's goal was to determine whether or not a warning was present in each advertisement, the procedure was, in fact, a visual search task. Thus, it is unclear whether the results would generalize to a more naturalistic, passive viewing of advertisements.
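The two latency measures defined in this study can be expressed as simple differences between event times. The sketch below is illustrative only: the function `warning_latencies`, the data layout (a list of fixation onset times paired with a flag for whether the fixation fell on the warning), and the sample values are all hypothetical.

```python
def warning_latencies(onset_ms, fixations, button_press_ms):
    """Compute (time_to_locate, time_to_identify) in ms for one trial.

    onset_ms: time at which the advertisement appeared.
    fixations: ordered list of (start_ms, on_warning) tuples.
    button_press_ms: time at which the subject pressed the response button.
    Returns None if the warning was never fixated.
    """
    # Start time of the first fixation that landed on the warning label.
    first_hit = next((start for start, on_warning in fixations if on_warning), None)
    if first_hit is None:
        return None
    time_to_locate = first_hit - onset_ms            # image onset -> first fixation
    time_to_identify = button_press_ms - first_hit   # first fixation -> button press
    return time_to_locate, time_to_identify


# Hypothetical trial: the warning is first fixated 620 ms after image onset.
trial = [(50, False), (300, False), (620, True), (900, True)]
print(warning_latencies(0, trial, 1400))
```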

To answer this question, Krugman et al. (1994) compared the eye movements of subjects viewing ads with standard, federally mandated cigarette warnings to novel warnings, which were the same size and shape, but could differ in text, color, graphics, and print type. To keep ecological validity high, the subjects were asked to view the advertisements as they would in a magazine. Novel warnings attracted more attention (i.e., were fixated by more subjects) and attracted attention sooner (i.e., were fixated more rapidly) than the standard warnings. Additionally, Krugman et al. (1994) found that the time spent viewing the warning was positively correlated with masked recall performance for content of the new ads (note that they did not measure masked recall of the standard ads because of subject familiarity).

More recently, Thomsen and Fulton (2007) examined the eye movements of adolescents viewing alcohol ads with moderation messages (e.g., "drink responsibly"). They found that, on average, subjects only fixated the moderation message for 350 ms, which corresponded to 7% of the total viewing time, and that in 75% of the ads with small moderation messages, that message was the least fixated area of the advertisement. However, when the moderation message was a central theme, subjects viewed the message significantly longer (on average 710 ms, compared to 170 ms when the message was not a central theme). In general, recall for even general concepts of the moderation messages was poor even among subjects who fixated them, but, as in the studies by Fischer et al. (1989) and Krugman et al. (1994), there was a positive correlation between fixation time and masked recall performance.

Finally, Peterson et al. (2010) found that American adolescents viewed Canadian-style cigarette warnings, containing graphic images (e.g., of diseased tissue) and novel text warnings, for about 2.5 times as long as traditional, American warnings (including only text delivering the Surgeon General's warning). Subjects also recalled the graphic messages more accurately in a subsequent memory test. Strasser et al. (2012) observed similar responses to graphic warnings on tobacco products among adult, American smokers.

Overall, then, the data seem quite clear that small, text-based warnings on advertisements receive little visual attention and are poorly recalled. However, by manipulating the salience (and the novelty) of such ads by, e.g., adding graphic images, attention and memory may be improved<sup>8</sup>.

<sup>7</sup>Reading time was calculated as the sum of all fixations with durations of 100 ms or more, not by a qualitative assessment of the eye movement patterns in relation to the text. Individual fixations shorter than 100 ms were counted in looking time, but not reading time. If a subject made no fixations over 100 ms in duration, they were deemed not to have read. A more detailed investigation of the eye movement data was not included.

<sup>8</sup>For an additional example of research using eye tracking to examine the effectiveness of public health messages, see O'Malley et al. (2012), which concerns visual attention when viewing osteoporosis prevention ads.

#### **DYNAMIC MEDIA**

Recent research has expanded beyond the realm of print advertising to examine eye movements when viewing ads presented via dynamic media, including websites and TV. While print advertisements can only use static cues, websites and TV also afford advertisers the opportunity to use sound and motion to guide viewers' attention. Research that specifically examines viewers' responses to dynamic media is essential for developing a complete understanding of the effects of sound and motion on attentional capture, memory, and preference. Several important findings regarding eye movements when viewing dynamic media are reviewed below.

#### *Television advertisements*

While research using eye tracking to examine the effectiveness of TV ads in capturing visual attention and affecting recall is relatively limited at this time, several interesting and potentially useful findings have nonetheless emerged from this literature (see also Wedel and Pieters, 2008a for a review).

First, in one early line of research, d'Ydewalle and colleagues (d'Ydewalle et al., 1988; d'Ydewalle and Tamsin, 1993) measured attention to and subsequent memory for advertisements appearing on billboards at a soccer field during a televised game. In both studies, subjects viewing the game on video spent less than 4% of the total time fixating the billboards. Perhaps unsurprisingly, given how little time was spent inspecting the ads, d'Ydewalle and Tamsin (1993) found that subjects recalled on average only 1.2 brands out of the 42 that were presented and were at chance for brand recognition. Thus, TV ads that are embedded within the primary content of a sporting event may not attract substantial visual attention or lead to strong memory representations of the advertised brand.

Other research has analyzed visual attention to more standard TV ads, typically presented during commercial breaks and interspersed with the primary content. Brasel and Gips (2008b) compared viewing behavior for TV shows and commercials. They found, first, that viewers exhibited a strong tendency to fixate near the center of the screen when viewing both kinds of content. They also conducted a frame-by-frame analysis of variability in fixation locations across subjects and found that variability was higher when viewing commercials than when viewing the primary program. Furthermore, variability of fixation locations was particularly high when the commercials contained brand elements. Finally, familiarity with a given commercial (manipulated by presenting it several times over the course of an experimental session) was also linked with increased variability of fixation locations. Brasel and Gips speculated that lack of engagement with the ad, driven by repeated presentations, could perhaps explain the tendency for subjects' eyes to wander more widely in later exposures to the ad.

Two studies by Teixeira and colleagues also examined variability in fixation locations across subjects, this time in connection with ad avoidance. Critically, if viewers do not wish to view TV ads (and video-based ads more broadly), they are often able to avoid them entirely, by muting them, temporarily turning off the device, or even blocking or skipping the commercials. The topic of ad avoidance is, consequently, an important one in the domain of TV advertising. Teixeira et al. (2010) found that higher variability in fixation locations across subjects predicted greater ad skipping. They suggested that high variability may indicate a failure, on the part of the advertiser, to sufficiently shape viewers' engagement with the advertisement and guide attention to key aspects of the scene from one moment to the next. In addition, they found that the sustained presence of a central brand element on the screen predicted ad skipping<sup>9</sup>. However, brand "pulsing," a strategy wherein the brand is shown for the same duration overall, but for shorter intervals each time, was found to ameliorate this effect. To explain this finding, Teixeira et al. speculated that pulsing, unlike the sustained, central presence of the brand, may leave the narrative of the commercial relatively intact, thus supporting effective guidance of viewers' visual attention and preventing ad skipping.

Building on these findings, Teixeira et al. (2012) examined the relationships among emotion, as measured by viewers' facial expressions, variability in fixation locations, and commercial skipping<sup>10</sup>. They found that measures of apparent joy and surprise were linked with reduced variability in fixation locations across subjects. These emotions, in addition, were found to reduce ad skipping, both via a direct route (when controlling for fixation concentration effects) and via an indirect route, by concentrating fixation locations across viewers.

Quite recently, Brasel and Gips (2013) investigated the effect of subtitles on visual attention to and memory for ads. They found that same-language subtitles attracted visual attention, as subjects spent a greater percentage of frames looking at the subtitle region when subtitles were present than when they were absent. In addition, same-language subtitles also improved recall for the brand and for verbal information that was presented redundantly (i.e., both vocally and within the subtitles). Subtitles did not improve all aspects of memory, however: indeed, they decreased recall of information presented only visually, leading to reduced memory for brands that were not verbally named (and were therefore not included in the subtitles). The eye-tracking data and the memory data were collected from different subject groups, however, so it is not possible to correlate a given subject's fixations on subtitles with subsequent recall performance.

Finally, Janiszewski and Warlop (1993) found evidence that attention to ads may be improved via a conditioning procedure. In the study, TV commercials were always presented in a specific order such that a conditioned stimulus (clip of the soda being advertised) always preceded an unconditioned stimulus (a clip of an enjoyable activity). This conditioning procedure led to increased (and more rapid) attention to the conditioned brand during subsequent exposure, suggesting that associative learning about a given brand can enhance attention to that brand.

In summary, research on TV ad viewing suggests, first, that embedded advertisements, in the form of billboards appearing during sporting events, may not be effective in capturing visual attention or influencing subsequent memory (d'Ydewalle et al., 1988; d'Ydewalle and Tamsin, 1993). When considering more traditional TV commercials, in which ads are interleaved with the primary content during commercial breaks, ad skipping is a central concern. Interestingly, when fixation locations are quite variable across subjects, more frequent ad skipping occurs (Teixeira et al., 2010), perhaps suggesting a lack of engagement with the narrative of the ad. Measures of joy and surprise are linked with more homogeneous viewing behavior across subjects and reduced ad skipping (Teixeira et al., 2012). In contrast, repeated exposures to an ad lead to increased variability in fixation locations across subjects (Brasel and Gips, 2008b). Including subtitles with TV ads is also associated with improved memory for certain kinds of information presented in the ads (Brasel and Gips, 2013). Finally, conditioning procedures can increase attention to brand elements in TV commercials (Janiszewski and Warlop, 1993).

<sup>9</sup>See Brasel and Gips (2008a), however, for results suggesting that a central brand element may be beneficial for memory for brands viewed in fast-forwarded commercials.

<sup>10</sup>The ads tested in this study were, in fact, Internet ads. However, they are included in this section because they represent video-based ads and are similar in form to television advertisements.

#### *Internet advertisements*

As in TV advertising, ad avoidance is a topic of considerable interest in the domain of Internet advertising. Unlike most TV ads, banner and "skyscraper" ads (i.e., vertical banners) that appear on websites must often compete directly with surrounding editorial content for visual attention (see Drèze and Zufryden, 2000). As will be discussed below, viewers are thought to routinely avoid such ads when viewing websites, a phenomenon known as "banner blindness" (Benway, 1998, 1999; see also Owens et al., 2011 for similar findings regarding text ads). Several lines of research have manipulated the location, animation, onset, and relevance of Internet ads, simultaneously recording viewers' eye movements to determine when the ads capture visual attention and when "banner blindness" takes place.

In one early study of eye movements during Internet search, Drèze and Hussherr (2003) found that subjects searching web sites fixated just under half of the banner ads presented. Since the probability of fixation was less than one would predict on the basis of ad size and location alone, Drèze and Hussherr concluded that viewers were able to identify banner ads in the visual periphery and, subsequently, intentionally avoid fixating them. Additionally, only 46.9% of subjects remembered seeing any banner ads during the experiment, and a recognition memory test revealed that subjects could not accurately discriminate ads that had been present on the website from foils that had never appeared.

Since certain Internet ad locations are consistent and thus predictable, however, users may not need to identify ads in the periphery in order to avoid them, but rather may be able to learn where they tend to appear and simply avoid fixating those locations. Lapa (2007) provided evidence that viewers do, in fact, learn the locations of banner ads over time and sometimes use this information to avoid fixating them. However, Burke et al. (2005) found that even when ad locations were not predictable, subjects only fixated the banners in 11.7% of trials<sup>11</sup>. This suggests, as Drèze and Hussherr (2003) proposed, that subjects are, indeed, also able to recognize banner ads in peripheral vision and avoid fixating them.

While it appears that Internet ads may receive little attention in general, certain factors may be manipulated with the aim of attracting or holding viewers' attention: these include location, animation, onset, and relevance. Kuisma et al. (2010) manipulated both ad location (horizontal, banner ads on the top of the display vs. vertical, "skyscraper" ads on the right side of the display) and animation (both static, both animated, or one of each). There was a main effect of ad location, such that more fixations landed on the skyscraper ad on the right side of the display than on the banner ads along the top. Animation was also found to increase fixations on skyscraper ads and decrease fixations on banner ads. Furthermore, including multiple animated advertisements resulted in fewer fixations on the ads than including only a single animated ad. Somewhat surprisingly, recognition memory results did not mirror the eye movement data. Rather, animation increased recognition memory for banner ads, but had no effect on the recognition memory for skyscraper ads. Findings on the relationships among memory, animation, and visual attention to Internet ads become even less clear when we consider the results of Burke et al. (2005), who found that memory (though very poor overall) was better for static banner ads than animated ones.

In a study similar to that of Kuisma et al. (2010), Simola et al. (2011) also manipulated both location (banner, skyscraper) and animation (both static, both animated, one of each), but additionally included different ad onset delays from 0 to 12 s. Consistent with the findings of Kuisma et al. (2010), they reported that animation increased attention to the skyscraper ads to the right of the text (especially when one ad was animated and the other remained static), and that the skyscraper ad was fixated more often and for longer than was the banner ad above the text. They also found that abrupt onset captured attention, as ads that appeared abruptly were fixated more often, though this effect was modulated by ad location, with skyscraper ads in close proximity to the text capturing attention more immediately, and banners located in the periphery capturing attention less quickly (see also Day et al., 2006 for evidence that even without capturing overt attention, ads flashing in the periphery can increase arousal and result in more efficient primary task performance).

Extending these findings, Simola et al. (2011) varied the task (reading for comprehension vs. browsing according to subjects' own interests) and found that subjects were more likely to view the ads and looked at them for longer during browsing than during reading for comprehension, thus providing evidence that a user's goals can exert "a strong top-down influence on attentional allocation" (p. 189) during online processing of information and ads. Additionally, during browsing, they found a correlation between ad onset and first fixation time for ads at both locations. However, in the reading task, there was only a correlation for the ad to the right of the text (which was in close proximity to the ends of the lines of text) and not to the peripheral banner ad, suggesting that users can selectively allocate attention to the task-relevant portions of the screen. Critically, in both tasks, self-reports of attention were correlated with actual eye movement data, such that participants who reported attention to ads also looked at the ads more often and for longer periods of time. This led Simola et al. to suggest that attentional capture by ads is related to overt rather than covert attention, a conclusion that seemingly runs counter to the studies suggesting that ads are recognized peripherally via covert attention (e.g., Drèze and Hussherr, 2003; Burke et al., 2005; Day et al., 2006).

<sup>11</sup>This 11.7% estimate is an upper bound since in 70% of these trials, the ad was fixated following the first eye movement and in 54% of this subset, the ad actually appeared in the location of the first fixation after the eyes had already moved.

Hamborg et al. (2012) examined the time course of attention to banner ads when subjects were given a primary task requiring that they extract information from an accompanying article. Significantly more subjects looked at a continuously animated than a static banner ad, in seeming contrast to some of the findings described above. Interestingly, these banner ads also attracted most fixations near the beginning or end of the primary task, suggesting that bottom-up salience may be more likely to interfere with top-down processing during these early and late periods of information search (see also Wang and Day, 2007). More details about the animated ads than the static ads were also recalled in a subsequent memory test.

Finally, some research has manipulated relevance of the ad to the subject's task as well as the relationship between the ad and the editorial content. Lapa (2007) manipulated ad relevance by including ads that were either related or unrelated to the subject's search task. He found that relevance did not influence ad viewing time, suggesting that users may assume banner ads to be irrelevant to their goals and the primary content. Relatedly, Hervet et al. (2011) found that congruency between text ads and surrounding web page content did not influence fixation probability or total viewing time on the ads, though congruent ads were remembered better than incongruent ones<sup>12</sup>.

In summary, viewers may tend to avoid fixating advertisements on websites, both by identifying them peripherally (Drèze and Hussherr, 2003; Burke et al., 2005) and by learning the locations in which they are likely to appear (Lapa, 2007). Some evidence also suggests that skyscraper ads, presented to the right of the primary content, are more likely to be fixated across a variety of tasks than are banner ads, presented on top of the primary text (Kuisma et al., 2010; Simola et al., 2011). Furthermore, the likelihood of fixating such skyscraper ads may be increased if they are animated (Kuisma et al., 2010; Simola et al., 2011) or appear suddenly (Simola et al., 2011). Effects of animating banner ads, however, are somewhat less clear (compare Hamborg et al., 2012 with Simola et al., 2011 and Kuisma et al., 2010). A mixed pattern of findings has also been reported concerning the relations among memory, animation, and eye movements when viewing Internet ads. In general, however, the data indicate that memory for Internet ads is rather poor (Drèze and Hussherr, 2003; Burke et al., 2005). Neither the relevance of Internet ads (Lapa, 2007) nor their relationship with surrounding content (Hervet et al., 2011) appears to affect ad viewing, suggesting that users may assume that such ads will be irrelevant to their primary goals. Finally, some evidence suggests that when viewers are engaged in a primary task, they are more likely to view banner ads near the beginning or the end of this task, when they may be more susceptible to bottom-up influences on oculomotor behavior (Wang and Day, 2007; Hamborg et al., 2012).

#### **CONCLUSION**

In this article, we reviewed critical findings on eye movements when viewing advertisements, including in print, on TV, and on websites. A number of factors were found to guide eye movements when viewing print ads, ranging from basic visual properties of advertisements (e.g., size and color), to social cues (e.g., the direction of a model's gaze), to the goals of the viewer. The literature regarding warning labels on tobacco and alcohol ads revealed that the plain, black-and-white text warnings currently used in the United States draw little visual attention and are often forgotten. However, manipulating the visual salience (and novelty) of these warnings – by, for example, including graphic images – improved both visual attention to and memory for such warnings. Research on ads in dynamic media has also produced several noteworthy findings, revealing, for example, that subjects appear to avoid viewing banner ads in some cases, using both peripheral processing and canonical ad locations as cues. Across multiple domains, eye movement measures were often (though not always) found to predict subsequent memory for the advertised product, warning, or brand.

Although a substantial body of research has now been produced on eye movements while viewing advertisements, several avenues remain largely unexplored. First, relatively little is known about the guidance of eye movements when viewing dynamic, video-based ads (but see Itti, 2005 for a model of bottom-up effects on dynamic scene viewing). We expect that this will be an important area for future research to examine in greater depth. The relationships among eye movements, memory, and preference are also ripe for further investigation. The potentially complex causal relationships among these variables are of considerable theoretical interest<sup>13</sup>. Such research could also be quite useful from an applied perspective. For example, it would be helpful to determine how or whether specific eye movement measures might predict memory for a brand or product over an extended period of time (e.g., multiple days or weeks). As was noted above, tracking eye movements seems less likely to bias subsequent measures (such as product recall) than does soliciting verbal responses from subjects. Therefore, if eye movements are indeed a robust predictor of brand memory over some duration, this may be very helpful to applied researchers.

Several methodological approaches may also prove useful. First, though the point may seem rather a minor one, we strongly believe that settling on a common, codified set of terms to refer to the same, underlying eye movement measures (e.g., the total duration spent viewing a given element within a trial) will enable findings to be shared and compared much more efficiently across laboratories in the upcoming years. At present, the terminology used for such measures appears to be somewhat variable in the advertising literature.

Second, the gaze-contingent display change paradigm (McConkie and Rayner, 1975; Rayner, 1975) may prove useful in future research. As was noted above, this approach, which consists in dynamically updating the display based on the eye movements of the viewer, has been quite useful in research on reading, visual decision-making, etc., allowing us to investigate topics such as parafoveal preview and the perceptual span in detail. The technique is useful because it affords experimenters precise control over subjects' visual input, based on current eye position, while allowing subjects to inspect the scene freely. Gaze-contingent designs could, we believe, take on an important role in upcoming research on eye movements when viewing advertisements.

<sup>12</sup>As discussed above, however, Simola et al. (2013) found that newspaper ads that were semantically incongruent with primary content received more attention than those that were semantically congruent in second-pass viewing.

<sup>13</sup>Note that related issues have been addressed in some detail in the visual decision-making literature (see, e.g., Glaholt and Reingold, 2011).

Third, and most broadly, further controlled, experimental designs could be used in future research to complement some existing correlational findings. A number of important studies in the field have used an approach that is at least partly correlational, presenting viewers with an assortment of real advertisements that vary naturally along dimensions of interest (e.g., the size of each ad element) and then measuring associated eye movement variables. This approach has advantages: notably, ecological validity is high. However, confounds are also a risk in such studies<sup>14</sup>. Therefore, it would be useful to determine if experimental studies, requiring systematic manipulation of independent variables of interest, will produce consistent results.

Finally, compared with the literature concerning eye movements in reading, scene perception, and visual search, it seems that research on eye movements while looking at advertisements is in its infancy. Consequently, a large number of interesting and useful avenues of research (of which only a few are mentioned above) remain available for future researchers to explore.

#### **ACKNOWLEDGMENTS**

We would like to thank Agnieszka Konopka and Stevan Adam Brasel for helpful comments on a previous draft of this article. This submission was partially supported by the University of California at San Diego Open Access Fund.

#### **REFERENCES**


<sup>14</sup>For example, suppose that brands that sold particularly intriguing products also tended, on average, to use large pictures in their advertisements. If we found longer gaze durations associated with larger pictures, then, it may be attributable to underlying characteristics of the product rather than the size of the picture.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 15 November 2013; paper pending published: 23 December 2013; accepted: 24 February 2014; published online: 17 March 2014.*

*Citation: Higgins E, Leinenger M and Rayner K (2014) Eye movements when viewing advertisements. Front. Psychol. 5:210. doi: 10.3389/fpsyg.2014.00210*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology.*

*Copyright © 2014 Higgins, Leinenger and Rayner. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## A review of the findings and theories on surface size effects on visual attention

#### *Anne O. Peschel\* and Jacob L. Orquin*

*MAPP Centre for Research on Customer Relations in the Food Sector, Department of Business Administration, Aarhus University, Aarhus, Denmark*

#### *Edited by:*

*Jaana Simola, University of Helsinki, Finland*

#### *Reviewed by:*

*Myriam Chanceaux, Centre National de la Recherche Scientifique, Grenoble University, France Thierry Baccino, University of Paris 8, France*

#### *\*Correspondence:*

*Anne O. Peschel, MAPP Centre for Research on Customer Relations in the Food Sector, Department of Business Administration, Aarhus University, Bartholins Allé 10, 8000 Aarhus C, Denmark e-mail: annpe@asb.dk*

That surface size has an impact on attention has been well-known in advertising research for almost a century; however, theoretical accounts of this effect have been sparse. To address this issue, we review studies on surface size effects on eye movements in this paper. While most studies find that large objects are more likely to be fixated, receive more fixations, and are fixated faster than small objects, a comprehensive explanation of this effect is still lacking. To bridge the theoretical gap, we relate the findings from this review to three theories of surface size effects suggested in the literature: a linear model based on the assumption of random fixations (Lohse, 1997), a theory of surface size as visual saliency (Pieters et al., 2007), and a theory based on competition for attention (CA; Janiszewski, 1998). We furthermore suggest a fourth model – demand for attention – which we derive from the theory of CA by revising the underlying model assumptions. In order to test the models against each other, we reanalyze data from an eye tracking study investigating surface size and saliency effects on attention. The reanalysis revealed little support for the first three theories while the demand for attention model showed a much better alignment with the data. We conclude that surface size effects may best be explained as an increase in object signal strength which depends on object size, number of objects in the visual scene, and object distance to the center of the scene. Our findings suggest that advertisers should take into account how objects in the visual scene interact in order to optimize attention to, for instance, brands and logos.

**Keywords: advertising, eye movements, surface size, visual attention, saliency**

"fpsyg-04-00902" — 2013/12/5 — 17:18 — page 1 — #1

#### **INTRODUCTION**

For an advertisement to be effective, the advertised information must capture consumers' attention, ideally quickly and reliably. To this end, advertisers often enlarge important objects or messages, for instance, by increasing the size of magazine ads and billboards or the size of important elements within the ad. A large object in an advertisement is more likely to attract attention than a small one (Wedel and Pieters, 2007), and using the optimal size of ad elements can furthermore lead to downstream effects such as increased sales (Zhang et al., 2009). While this might be a satisfactory conclusion from a business perspective, there have been only a few theoretical attempts to explain why surface size increments affect attention. An improved understanding of how surface size affects attention would contribute to research on visual perception of advertising as well as to practice by allowing a more systematic approach to the study and application of surface size effects.

Our paper aims to bridge this research gap by first summarizing the results of a literature review including studies on surface size effects on eye movements. In a second step we relate the findings to three theories suggested in the literature: a linear model based on the assumption of random fixations (Lohse, 1997), a theory of surface size as a consequence of visual saliency (Pieters et al., 2007), and a theory of surface size based on competition for attention (CA; Janiszewski, 1998). As a final step we propose a fourth model of surface size effects called "demand for attention" based on a revision of the theory of CA. This model improves the model of CA by adjusting the underlying assumptions based on previous research. In the last section we evaluate the models by reanalyzing a large eye tracking dataset from a study on consumer decisions.

The effect of surface size on attention has previously been reviewed by Wedel and Pieters (2006) as well as Orquin and Mueller Loose (2013). Neither of these reviews addressed the theoretical underpinnings of surface size effects.

#### **LITERATURE SEARCH AND FINDINGS**

The Web of Science, Scopus, PsycINFO, and Google Scholar databases were searched using a combination of keywords related to surface size and visual attention. Additional searching was carried out using literature lists and through contact with authors. The literature search revealed 19 studies fulfilling the inclusion criteria of reporting surface size effects on visual attention operationalized by eye movement measures. Most of the studies were conducted using print advertisements, focusing on brand, pictorial, and text elements (Lohse, 1997; Rosbergen et al., 1997; Janiszewski, 1998; Wedel and Pieters, 2000; Rayner et al., 2001, 2008; Pieters et al., 2002, 2007, 2010; Pieters and Wedel, 2004, 2007; Ryu et al., 2009; Zhang et al., 2009; Boerman et al., 2011). Participants were asked to leaf through magazines while their fixations on the target advertisement were recorded by eye tracking equipment.

The most commonly reported fixation measures were fixation likelihood (FL), fixation count (FC), total fixation duration (TFD), and time to first fixation (TTF). FC and TFD are closely related variables indicating the number of fixations on a stimulus and the total duration of all fixations on the stimulus. FL is the estimated probability that an area will capture attention while attention capture itself is a binary variable indicating whether the stimulus was fixated or not. TTF indicates how fast an area is fixated after stimulus onset. An increase in the first three variables is often referred to as an increase in attention while the opposite is true for TTF (for an overview of fixation measures, see Holmqvist et al., 2011).
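As an illustration of how these four measures relate to one another, the following minimal Python sketch derives them from raw fixation records. It is not taken from any of the reviewed studies: the `Fixation` and `AOI` types, the rectangular area of interest, and the function names are hypothetical, and FL is estimated here simply as the proportion of trials in which the area was fixated at least once.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass
class Fixation:
    onset_ms: float     # time since stimulus onset (assumed convention)
    duration_ms: float
    x: float
    y: float


@dataclass
class AOI:
    """Hypothetical rectangular area of interest, e.g. a brand element."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, fix: Fixation) -> bool:
        return self.left <= fix.x <= self.right and self.top <= fix.y <= self.bottom


def aoi_measures(fixations: List[Fixation], aoi: AOI) -> Dict[str, Union[bool, int, float, None]]:
    """Per-trial measures for one AOI: attention capture, FC, TFD, and TTF."""
    hits = [f for f in fixations if aoi.contains(f)]
    ttf: Optional[float] = min((f.onset_ms for f in hits), default=None)
    return {
        "fixated": len(hits) > 0,                   # binary attention capture
        "FC": len(hits),                            # fixation count
        "TFD": sum(f.duration_ms for f in hits),    # total fixation duration
        "TTF": ttf,                                 # None if never fixated
    }


def fixation_likelihood(trials: List[List[Fixation]], aoi: AOI) -> float:
    """Empirical FL: share of trials in which the AOI was fixated at least once."""
    captured = [aoi_measures(trial, aoi)["fixated"] for trial in trials]
    return sum(captured) / len(captured)
```

Note that TTF is left undefined (`None`) for trials in which the area is never fixated; how such trials are handled in practice is an analysis decision that this sketch does not settle.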

The findings from the identified studies were grouped according to the dependent variable in question (see **Table 1**). The most commonly reported dependent variables were TFD, followed by FC and FL while TTF was rarely reported. Overall, the studies show that increasing the surface size of an element significantly increases FC, FL, and TFD toward the enlarged object. The studies reporting TTF also suggest that large objects are fixated faster than small objects. The results reveal a strong and robust effect of surface size on attention, i.e., large objects receive more attention, are more efficient in capturing attention, and do so faster than small objects.

Although the main effects of surface size on attention are relatively clear, the findings on interaction effects with stimulus class are mixed. Several studies have found that the effect of surface size depends on the class of stimulus, such as the brand, pictorial, or text element in an advertisement (Rosbergen et al., 1997; Goldberg et al., 1999; Pieters and Wedel, 2004; Pieters et al., 2007, 2010; Chandon et al., 2009). However, the findings on interaction effects do not reveal a consistent pattern across studies. Pieters and Wedel (2004), for example, found a significant effect of text element surface size on FL and TFD in magazine advertisements but no effect of surface size for brand or pictorial. Contrary to this, in a study on feature advertisements, the effects were reversed with only the text element being non-significant (Pieters et al., 2007). A further contradiction was reported by Pieters et al. (2010) in a study on magazine ads, where again the TFD toward text elements was unaffected by increases in the surface size. Rosbergen et al. (1997) and Goldberg et al. (1999) only found significant effects of surface size on TFD for specific consumer segments. These authors also found that surface size effects on TTF differed between stimulus classes.

Three studies analyzed how increasing the surface size of one element affects attention to other elements, i.e., how elements compete for attention (Goldberg et al., 1999; Pieters and Wedel, 2004; Boerman et al., 2011). These studies found significant negative effects of increasing the surface size of one element on FL, fixation duration, and TTF toward other elements. However, this competition for attention (CA) effect was not consistent across studies. Boerman et al. (2011) found that increasing the size of text elements significantly decreases FL toward pictorial elements. Pieters and Wedel (2004), on the other hand, found that increasing the size of text elements significantly decreases the TFD toward brand elements but not toward pictorial elements. Furthermore, the study by Goldberg et al. (1999) showed that other distracting elements, such as anchor lines, decreased the time to first fixate on the enlarged version of the target stimulus.

#### **Table 1 | Overview of the existing literature and findings.**

+*, significant positive effect;* −*, significant negative effect; ø, no significant effect.*

This brief overview shows that despite a strong and robust effect of surface size on attention, a deeper inspection of the findings produces a mixed picture. Understanding the mechanisms behind the effect of surface size on attention could contribute to an explanation of these apparent inconsistencies. In the following section we introduce four theories on surface size effects on attention. We review the assumptions and predictions of the theories and relate them to the findings from the literature review. As a final evaluation of the theories, we reanalyze an eye tracking data set thereby providing a direct comparison of how well the models describe eye movements in a naturalistic consumer choice situation.

#### **THEORIES OF SURFACE SIZE EFFECTS ON ATTENTION**

#### **RANDOM FIXATIONS**

The first and simplest theory of surface size effects proposed by Lohse (1997) states that the size of an object determines the number of random fixations landing on the object. According to this theory, an object covering 1% of the total surface will receive 1% of fixations; an object covering 2% will receive 2% of fixations, etc. Object surface size increments therefore influence attention metrics linearly resulting in a decrease in TTF, and increase in FC, FL, and TFD.
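The proportionality claim can be written down in a few lines. The following is a minimal sketch, assuming each object is described only by its surface area; the function name `expected_fixation_shares` and the example layout are hypothetical and not taken from Lohse (1997).

```python
def expected_fixation_shares(areas):
    """Random-fixation prediction: an object's share of fixations
    equals its share of the total surface."""
    total = sum(areas)
    if total <= 0:
        raise ValueError("total area must be positive")
    return [area / total for area in areas]


# Hypothetical layout: two small objects (1 and 2 area units) on a
# background of 97 units, so the objects cover 1% and 2% of the surface.
shares = expected_fixation_shares([1.0, 2.0, 97.0])
print(shares)
```

Under this reading, doubling an object's share of the surface doubles its predicted share of fixations, which is the linearity that the evidence reviewed below speaks against.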

The simplicity of the theory is appealing but it makes a strong assumption about a random distribution of fixations. Although some eye movement theories assume a stochastic distribution of attention (Krajbich et al., 2010), it is unlikely that fixations are randomly distributed across visual scenes. In fact, one could argue that fixations are exactly the opposite of random, i.e., each fixation is computed to maximize information acquisition (Hayhoe and Rothkopf, 2011) which speaks against the fundamental assumptions of this theory.

The reviewed literature suggests that surface size does not influence attention linearly but rather logarithmically. In an early study conducted by Nixon (1924), 30 participants were observed while looking at magazine advertisements which either occupied a whole or half a page. The ratio of TFD of the full relative to the half page advertisements was estimated at 1 to 0.74. This indicates greater attention toward the full page advertisement, however, not in the linear manner as described above, which would predict that reducing object size by 50% would result in a loss of 50% of fixations.
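As a worked restatement of these numbers (not an additional result), the linear prediction and the reported ratio can be set side by side, writing TFD<sub>half</sub> and TFD<sub>full</sub> for the total fixation durations on half and full page advertisements:

$$
\left.\frac{\text{TFD}_{\text{half}}}{\text{TFD}_{\text{full}}}\right|_{\text{linear}} = 0.50, \qquad \left.\frac{\text{TFD}_{\text{half}}}{\text{TFD}_{\text{full}}}\right|_{\text{observed}} = 0.74, \qquad \frac{0.74}{0.50} = 1.48
$$

On this reading, the half page advertisement retained roughly 1.48 times as much fixation time as a strictly proportional account would allow.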

Lohse (1997) conducted a study with 32 participants looking at yellow page advertisements in which participants were asked to find three businesses in each of eight categories. The study showed that the number of fixations per advertisement increased with the size of the advertisement; small advertisements, however, maximized the number of fixations per square inch. In other words, small advertisements gained attention more efficiently than large ones, which again contradicts the assumption of a random distribution of fixations.

At a simulated supermarket shelf, Chandon et al. (2009) examined the effect of number of product facings on attention. The number of product facings is an indicator of how large an area of the shelf is occupied by a particular brand, i.e., the surface size for that brand. The study revealed a significant positive effect of number of product facings on FC and FL for an increase from four to eight product facings. Adding four more facings resulted in a marginal but significant increase in both attention measures.

Finally, in a study comparing 198 different products Orquin et al. (2012) examined the effect of surface size of individual product packaging elements on attention. The analysis revealed a better model fit when the surface size variable had been log transformed indicating non-linear effects of size on TFD.

The reviewed studies show a very high agreement on non-linear effects of surface size on attention. All studies point to a logarithmic effect of size with the higher gains occurring for small objects and a diminishing marginal effect for large ones. We find no evidence supporting the prediction that surface size has a linear effect on attention.

#### **SIZE AS SALIENCY**


Another interpretation of surface size effects is that size increments lead to higher visual saliency and therefore affect attention through saliency (Pieters et al., 2007). Visual saliency is often manipulated through stimulus contrast or luminance (e.g., Foulsham and Underwood, 2009; Nordfang et al., 2013) but more advanced computational models integrate several feature dimensions to compute stimulus pop-out. These dimensions typically include color, orientation, and contrast. The computational models integrate these feature layers into a saliency map predicting the relative saliency of each pixel in the image analyzed (Itti and Koch, 2001). If size effects are a function of visual saliency, as suggested, it then follows that size effects share psychometric properties with visual saliency such as shorter TTF and higher FL for more salient objects (Itti and Koch, 2001). Another property of saliency is that the effect is easily disrupted by task instructions (Einhäuser et al., 2008), semantic or contextual cues about a visual scene, feature-based attention, object representations, and rewards for task performance (Kowler, 2011). According to the size as saliency theory, the same should be true for surface size effects. Another important consequence of the theory is that if size is a function of saliency, then the effect size of surface size should be in the same range as the effect size of saliency, but never exceeding it. In line with this, when controlling for visual saliency, size effects should be minimal or non-existent.

Orquin et al. (2012) studied the effect of size and saliency of product packaging elements on FL. The Spearman's rho correlation between the size and saliency of each packaging element was considerable (ρ<sub>s</sub> = 0.434). The correlation coefficient therefore suggests that at least some of the effect of size on attention is due to increments in saliency.

Pieters and Wedel (2007) studied gaze duration to brand, pictorial, headline, and body text elements of advertisements under various viewing goals (ad memorization, ad appreciation, brand learning, or brand evaluation). The study revealed that surface size had a significant positive effect on attention independent of processing goals, indicating that, unlike saliency, surface size effects were not disrupted or affected by task instructions.

Regarding the statement that surface size effects cannot exceed saliency effects, several studies suggest otherwise. In the previously described study by Lohse (1997), advertisements were distinguished by being black and white or containing some red color. In a layout of black and white postings, the red advertisement is considered more salient than the others due to increased contrast. Accordingly, these advertisements received more attention in terms of increased FL and TFD and decreased TTF. However, size effects were stronger than saliency effects for all three measures.

In the Pieters et al. (2007) study saliency was operationalized by assessing target distinctiveness and distractor heterogeneity. An element with high target distinctiveness stands out relative to the distractor item due to its size and orientation. High distractor heterogeneity is characterized by distractors of different size, shape, and orientation. Both high target distinctiveness and low distractor heterogeneity result in higher target saliency. The results showed a significant influence of both target distinctiveness and distractor heterogeneity on FL and a significant effect of target distinctiveness on TFD. Yet, as observed before, the effect of surface size was much stronger than the target distinctiveness and distractor heterogeneity measures.

Brand identifiability served as a measure of saliency in the study by Pieters et al. (2010). This measure incorporated different features such as contrast and heterogeneity of background elements. Brand identifiability had no significant effect on attention toward advertisement elements or toward the advertisement as a whole. Contrary to this, significant surface size effects were observed for individual elements.

Similar results were obtained by Peschel et al. (2013) where size and saliency were manipulated in an orthogonal design. Saliency did not affect attention significantly; size on the other hand influenced FL significantly. Since the study manipulated visual saliency and surface size in an orthogonal design, we can conclude that surface size had a significant effect independent of the level of saliency. In Orquin et al. (2012) both size and saliency showed a significant effect on FL but the influence of surface size on attention was twice as strong as that of saliency.

Overall, the theory of size as saliency can only be accepted in part. There are sound theoretical reasons and empirical evidence that support the assumption of a substantial correlation between size and saliency. However, the effects of surface size on attention cannot be subsumed under visual saliency, as the effect is both independent of and stronger than that of visual saliency.

#### **COMPETITION FOR ATTENTION**

The theory of surface size referred to here as "competition for attention" is derived from work by Janiszewski (1998). It is based on the assumption that object-based signal strength measured by visual receptor cones deteriorates as a function of distance from the focal point. This is because fewer receptor cones are located outside the fovea, onto which the center of the visual image is projected. Peripheral objects are therefore projected onto an area with fewer receptors, resulting in a weaker signal strength. However, increasing the size of peripheral objects leads to projection on a larger area of receptors, resulting in measurably stronger signal strength. The theory further assumes that objects in a visual scene compete for attention based on their attentional demand. Attentional demand refers to an object's strength to attract attention based on its size and its distance to the object which is currently fixated. The theory further states that the CA, when fixating one object, is equal to the sum of attentional demand from surrounding objects. Objects with a low surrounding CA will attract more fixations. Janiszewski (1998) calculated an object *i*'s CA as the sum of the size to distance ratio of all surrounding objects *j*, which we will refer to as CAJA<sub>*i*</sub>:

$$
\text{CAJA}_{i} = \sum_{1}^{j} \frac{S_{j}}{D_{i-j}}. \tag{1}
$$

Here *S<sub>j</sub>* refers to the size (in degrees) of the surrounding objects in question and *D<sub>i−j</sub>* is the distance (in degrees) from object *j* to object *i*. In his first study, Janiszewski (1998) found a significant negative correlation of CAJA<sub>*i*</sub> with average gaze time (*r* = −0.46), indicating that an object faced with a lot of CA attracted fewer fixations. In an additional study he found that incorporating CAJA<sub>*i*</sub> in a regression model next to size as a factor explained significantly more of the observed variance.
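Equation 1 can be sketched as follows. This is an illustration only: it assumes object positions are given as two-dimensional coordinates in degrees of visual angle and that distance is Euclidean, neither of which is specified in the text. The function name `caja` and the example values are hypothetical.

```python
import math


def caja(index, sizes, positions):
    """Eq. 1: sum of the size-to-distance ratios of all objects j
    surrounding object i (sizes and distances in degrees)."""
    xi, yi = positions[index]
    total = 0.0
    for j, (size, (xj, yj)) in enumerate(zip(sizes, positions)):
        if j == index:
            continue  # an object does not compete with itself
        distance = math.hypot(xj - xi, yj - yi)
        total += size / distance
    return total


# Hypothetical scene: object 0 with two neighbours.
sizes = [2.0, 2.0, 4.0]
positions = [(0.0, 0.0), (4.0, 0.0), (0.0, 8.0)]
print(caja(0, sizes, positions))  # 2/4 + 4/8 = 1.0
```

Two objects at the same position would make the distance zero and raise a `ZeroDivisionError`, so the sketch presupposes spatially separated objects.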

While deriving his model, Janiszewski (1998) pointed out that the attentional demand of any object is equal to its size discounted for loss of acuity. We incorporate this idea into his model below. The loss of acuity is a consequence of visual acuity diminishing with retinal eccentricity, and it has been shown that in order to maintain visual acuity an object must increase by 0.2° in size for each degree of retinal eccentricity (Anstis, 1974). In other words, to maintain acuity as an object recedes from the currently fixated location, it must grow by 0.2° per degree. If, on the other hand, it maintains its current size, it follows that acuity is reduced by a factor of 1/1.2, i.e., to 83.33% of its previous value, for each degree of retinal eccentricity (Anstis, 1974). The acuity loss function is illustrated in **Figure 1**. It documents the loss of acuity in percent for each degree increase of retinal eccentricity.

To compute the acuity of an object of size *S* and distance *D* from a current fixation, the following formula can be applied:

$$\text{Acuity}(S, D) = S \ast 0.833^{D}. \tag{2}$$

Incorporating visual acuity loss in the computation of CA for object *i* (CA*i*) with *j* surrounding objects results in the following formula:

$$\text{CA}_{i} = \sum_{1}^{j} S_{j} \ast 0.833^{D_{i-j}}. \tag{3}$$

Here each object *j* is discounted for acuity loss seen from the fixation position of object *i*. The idea is that when fixating an object *i*, all surrounding objects are competing to draw attention away from the object. Depending on their distance and size, the surrounding objects will impose an either greater or weaker CA.
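The acuity discount (Eq. 2) and the resulting competition measure (Eq. 3) can be sketched together. As before, the two-dimensional coordinates, the Euclidean distance, and the names `acuity` and `competition` are assumptions made for illustration; only the constant 0.833 and the form of the equations come from the text.

```python
import math

RETENTION = 0.833  # acuity retained per degree of eccentricity (Eq. 2)


def acuity(size, distance):
    """Eq. 2: object size discounted for acuity loss at a given distance."""
    return size * RETENTION ** distance


def competition(index, sizes, positions):
    """Eq. 3: summed acuity-discounted size of all objects j, as seen
    from the fixation position of object i."""
    xi, yi = positions[index]
    total = 0.0
    for j, (size, (xj, yj)) in enumerate(zip(sizes, positions)):
        if j == index:
            continue  # only surrounding objects impose competition
        total += acuity(size, math.hypot(xj - xi, yj - yi))
    return total


# Hypothetical scene: object 0 with neighbours 4 and 8 degrees away.
sizes = [2.0, 2.0, 4.0]
positions = [(0.0, 0.0), (4.0, 0.0), (0.0, 8.0)]
print(competition(0, sizes, positions))
```

Because the discount is exponential in distance, the far neighbour contributes less competition than the near one in this example even though it is twice as large.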

Since we are interested in explaining size effects on attention and not the competition for attention caused by these effects, we refine the model further. Considering the overall CA in the visual scene, the relative CA imposed on any object can be computed as the proportion of CA for that object (CA<sub>*i*</sub>) relative to the overall sum of CA (∑<sub>*j*</sub><sup>*N*</sup> CA<sub>*j*</sub>) in the visual scene. To further derive the relative measure of attention directed to an object *i* (*A<sub>i</sub>*), the proportion of the inverse of CA serves as our model of demand for attention based on CA:

$$A_{i} = \frac{1/\text{CA}_{i}}{\sum_{j}^{N} 1/\text{CA}_{j}}. \tag{4}$$

This model predicts that, everything else being equal, the more objects there are in a visual scene, the less attention will be directed to any one object. In addition, it can be derived that the effect of increasing object size on attention is stronger for smaller set sizes, i.e., visual scenes containing fewer objects. If all objects have the same size and distance to each other, each object will receive a 1/*N* share of attention. However, with increasing size and various degrees of dispersion, differences in relative attention devoted to central or peripheral objects can be observed, as illustrated in **Figure 2**. The *x*-axis describes size increments of a peripheral or central object by a factor of 10. In the low dispersion condition, peripheral objects are defined to be equally dispersed around the center with a distance factor of 5. In the high dispersion condition, the peripheral object is located twice as far from the center. According to the model, an object which is located on the periphery relative to the other objects in the visual scene should initially receive more visual attention due to less CA. However, its gains from size increments are lower than those of an object located closer to the center. As opposed to the two theories reviewed above, the theory of competition for attention predicts that increasing object size has a diminishing marginal effect on attention.
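A small numerical sketch of Eq. 4 makes the 1/*N* case concrete. It assumes two-dimensional positions in degrees, Euclidean distances, and at least two objects; the names `competition` and `relative_attention` and the square layout are hypothetical.

```python
import math

RETENTION = 0.833  # acuity retained per degree of eccentricity


def competition(index, sizes, positions):
    """Eq. 3: acuity-discounted size of all surrounding objects."""
    xi, yi = positions[index]
    return sum(
        size * RETENTION ** math.hypot(xj - xi, yj - yi)
        for j, (size, (xj, yj)) in enumerate(zip(sizes, positions))
        if j != index
    )


def relative_attention(sizes, positions):
    """Eq. 4: each object's inverse CA as a proportion of the summed
    inverse CA of all objects in the scene (requires N >= 2)."""
    inverse_ca = [1.0 / competition(i, sizes, positions)
                  for i in range(len(sizes))]
    total = sum(inverse_ca)
    return [value / total for value in inverse_ca]


# Hypothetical symmetric scene: four equal objects on the corners of a square.
square = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
equal = relative_attention([1.0, 1.0, 1.0, 1.0], square)
enlarged = relative_attention([2.0, 1.0, 1.0, 1.0], square)
print(equal)
print(enlarged)
```

In the symmetric scene every object faces the same competition, so each share is about 1/4. Enlarging the first object leaves its own CA unchanged but raises the CA faced by the other three, which is how the model translates a size increment into a larger relative share.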

Evidence in favor of the theory of CA is scarce as the identified papers do not provide information about set size or position of the stimuli. One prediction, however, is confirmed, as diminishing marginal effects of surface size on attention have been extensively covered in the previous section. To further assess the robustness of the model on a theoretical level, we proceed to discuss the assumptions in more detail. The theory is based on three main assumptions. First, in order to compute CA it is assumed that all objects are fixated; recall that CA is the sum of demand for attention of all surrounding objects. Second, the theory assumes that there is no effect of object centrality in the visual scene, i.e., objects that are positioned more centrally in the visual scene are not fixated more often than peripheral ones. The third and most important assumption is that an object's signal strength is a function of its visual acuity.

Regarding the assumption of all objects being fixated, it is clear that this can only be the case under certain conditions such as when there is a limited number of objects in the visual scene. It has, for instance, been demonstrated that increasing set size leads to non-attendance, i.e., objects not being fixated, and that increasing the set size further leads to increasing non-attendance (Orquin and Mueller Loose, 2013). For a general theory of surface size it is therefore not appropriate to assume that all objects are fixated. The second assumption that there is no effect of centrality is also difficult to sustain as a general principle as it has been shown repeatedly that participants tend to gaze at the middle of a visual scene (Vincent et al., 2009; Tatler et al., 2011).

The third assumption about object signal strength as a function of visual acuity is more difficult to evaluate. To understand the claim about object-based signal strength, note should be made of the fact that objects have been shown to predict attention better than, for instance, visual saliency (Scholl, 2001; Einhäuser et al., 2008). A number of computational models have been developed to explain object-based attention (Walther and Koch, 2006; Orabona et al., 2008). The tenet of these models is that visual selection occurs after the perception of an object (Rensink, 2000) or even after categorization of the object (Bundesen, 1998). These findings suggest a strong effect of object-based attention in the sense that identifying or recognizing something as an object increases the likelihood of fixating it.

If increasing object surface size augments object identification, it should also have an effect on attention. It has already been shown that size is a strong predictor of visual acuity, which determines how well a stimulus is detected and identified (Anstis, 1974; Strasburger and Rentschler, 1996; Kondo et al., 2008). However, most of the studies on object identification in eccentricity looked at isolated objects, which is not ecologically valid, as visual scenes in general contain several objects. In addition, the number of objects in a scene was shown to create visual clutter (Rosenholtz et al., 2007) or visual crowding (Levi, 2008; Whitney and Levi, 2011), which prevents the identification of target objects. Nevertheless, it has been argued that crowding is independent of object size (van den Berg et al., 2007) which would mean that clutter and crowding can be ignored when trying to describe effects of object size.

However, a few studies indicate that crowding is not size independent but that differences in size and shape between target and flankers diminish crowding effects (Treisman and Gelade, 1980; Nazir, 1992; Kooi et al., 1994; Levi and Carney, 2009). Results of these studies showed increased target identification when flankers were larger than the target, i.e., small object size led to increased identification when the target was increasingly distinct from the flankers. This suggests that surface size can play a larger role than merely determining acuity. Whether flanker effects of this type occur in natural vision is difficult to say, as most experiments on crowding were conducted under highly controlled laboratory conditions. What seems clear, however, is that surface size does affect an object's signal strength and that this signal is furthermore dependent on the distance of the object to the location currently fixated. This speaks in favor of the third assumption stating that an object's signal strength is a function of its visual acuity.

#### **DEMAND FOR ATTENTION**

Taking the above considerations into account, we propose an alternative model to the theory of CA (Janiszewski, 1998). The assumption concerning object-based attention was found plausible as described above. However, the computation of CA<sub>*i*</sub> (Eq. 3) faces two theoretical problems as it is based on the assumption of all objects being fixated as well as the assumption that object centrality plays no role in attention. To address these challenges we propose a revision of the model of CA. Our model is based on the demand for attention of an object *i* (DA<sub>*i*</sub>) based on its signal strength relative to all other objects in the visual scene. To address the centrality issue we propose that demand for attention is computed as an object's visual acuity (Eq. 2) as seen from the center of the visual scene (*D<sub>c</sub>*). In order to assess an object's relative demand for attention, thus incorporating CA in this model, the demand for one object is divided by the total demand for attention in the visual scene:

$$\text{DA}_{i} = \frac{S_{i} \ast 0.833^{D_{i}}}{\sum_{1}^{j} S_{j} \ast 0.833^{D_{j}}}. \tag{5}$$


The model predicts that objects with a higher relative demand for attention are fixated earlier and with a higher likelihood. If all objects have the same signal strength, each object will receive 1/*N* amounts of relative attention. This means that, holding signal strength constant, increasing the number of objects in the visual scene will reduce the relative amount of attention per object as a monotonic function of the set size. Plotting the effects of size increments on attention shows that central objects demand most attention initially and also gain more from size increments than peripheral objects as illustrated in **Figure 3**. Again the *x*-axis describes size increments of a peripheral or central object by a factor of 10. The peripheral object in focus is located away from the center by a distance factor of 10. Similar to the model of CA, the model predicts diminishing marginal effects of surface size on attention.
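The demand for attention model can be sketched in the same way. The sketch assumes that each object's distance is measured from the center of the scene and that the denominator sums every object's own acuity-discounted size, which is how we read "total demand for attention in the visual scene". The function name `demand_for_attention` and the example values are hypothetical.

```python
RETENTION = 0.833  # acuity retained per degree of eccentricity


def demand_for_attention(sizes, center_distances):
    """Relative demand for attention: each object's acuity-discounted
    size (seen from the scene center) divided by the scene total."""
    signal = [size * RETENTION ** distance
              for size, distance in zip(sizes, center_distances)]
    total = sum(signal)
    return [value / total for value in signal]


# Four equal objects at equal distance from the center: equal shares.
equal = demand_for_attention([1.0] * 4, [5.0] * 4)

# Hypothetical scene with one central and one peripheral object of equal size.
central_first = demand_for_attention([1.0, 1.0], [0.0, 10.0])

# Doubling the size of the peripheral object raises its share.
peripheral_doubled = demand_for_attention([1.0, 2.0], [0.0, 10.0])

print(equal, central_first, peripheral_doubled)
```

With equal signal strength every object receives a 1/*N* share, and an object closer to the center demands more attention than an equally sized peripheral one, in line with the predictions stated above.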

#### **EMPIRICAL EVALUATION OF THE THEORIES**

In order to evaluate the theoretical models derived above, we reanalyzed data from Orquin et al. (2012). FL, TFD, and TTF from 123 Danish consumers were analyzed. The total stimulus sample contained 198 product images from four food product categories (yogurt, milk, cheese, and butter). For each product, seven areas of interest were defined (brand, category, fat percentage, organic label, keyhole label, GDA label, pictorial). Each product had on average six areas of interest. Surface size, saliency, and position on the packaging were determined for each element. The saliency measure was obtained using a saliency algorithm developed by Itti and Koch (2001). The experimental stimuli were existing market products which might restrict the co-occurrence of some features. The data was collected using a Tobii 2150 eye tracker and each product image was displayed on the screen approximating the natural size of the product. The participants were asked to choose a product from a choice set of four products. Each product was viewed separately and for as long as the participants needed and the decision was made only after having viewed all four products. Although product packaging is different from advertising, the cognitive processes that guide eye movements should be comparable. We therefore argue that it is reasonable to transfer findings from product packaging to other areas such as advertisement research.

**FIGURE 3 | Effect of size increments on relative attentional demand for a centrally and peripherally located object.**

According to the formulas introduced above, CA as used by Janiszewski (1998) (CAJA), relative attention based on CA including visual acuity loss (*A<sub>i</sub>*), demand for attention excluding visual acuity loss (DA<sub>no\_acuity</sub>), and demand for attention (DA<sub>*i*</sub>) were derived. Based on that, Spearman's rho pairwise correlations of CAJA, *A<sub>i</sub>*, DA<sub>no\_acuity</sub>, and DA<sub>*i*</sub> with aggregated attention measures (FC, FL, TFD, and TTF) were computed to identify how strongly each model is associated with the empirical data. We chose correlations as a measure of association between the two variables because they are straightforward to interpret and constitute an appropriate measure of effect size. In order to compare the performance of the models with other measures, size, distance to the center, and saliency values were also correlated with the fixation data.
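For readers unfamiliar with the statistic, Spearman's rho is the Pearson correlation computed on ranks. The following self-contained sketch is not the analysis code used in the study; the function names and the example values are hypothetical and serve only to show the computation, with tied values receiving their average rank.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[pos]]):
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation of the rank-transformed variables."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (sd_x * sd_y)


# Hypothetical values: a model score per area of interest and the
# fixation likelihood observed for the same areas.
model_score = [0.05, 0.10, 0.20, 0.25, 0.40]
fixation_likelihood = [0.10, 0.30, 0.25, 0.60, 0.90]
print(spearman_rho(model_score, fixation_likelihood))
```

Because only ranks enter the computation, the measure is insensitive to monotonic transformations such as taking the logarithm of surface size.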

The results displayed in **Table 2** demonstrate that demand for attention showed the strongest relationship with all fixation measures. All correlations were between 0.5 and 0.6, indicating a strong relationship between the measure of demand for attention and fixation data. Correlations of size were similar to those of demand for attention, yet slightly less pronounced, suggesting that our model of demand for attention contributes to explaining size effects on attention. Distance to center showed significant correlations with all attention measures as well, although these were less pronounced than for size and demand for attention. To assess the contribution of visual acuity loss, we also calculated demand for attention without acuity loss (DA<sub>no\_acuity</sub>). This measure still performed better than the other CA measures but resulted in considerably weaker correlations than DA<sub>*i*</sub>. This means that adding distance to the center and visual acuity loss to the model contributes to the explanation of size effects.

Saliency was considerably less correlated with attention data but indicated that more salient objects receive more attention in terms of all fixation measures. *A<sub>i</sub>* was weakly but significantly correlated with all fixation measures and all correlations point in the expected direction. Nevertheless, this model did not prove useful when trying to explain size effects on attention, because size as a measure on its own was more strongly correlated with fixation data. The model introduced by Janiszewski (1998) was even more weakly correlated with the empirical data. In addition, the correlation between CA and FL pointed in the wrong direction and the correlation with TTF was not significant. On top of that, we could not find a correlation of CA and TFD as strong as reported in Janiszewski's (1998) study.

*All Spearman's Rho (ρ<sub>s</sub>) correlations are significant at the p < 0.01 level unless labeled n.s.*

The results of this section suggest that the models of CA did not contribute to explaining size effects on attention. In contrast, the model of DA<sub>*i*</sub> was closest to our empirical fixation data. The finding supports the assumption of a central fixation bias and the necessity to account for visual acuity loss when analyzing surface size effects on visual attention.

#### **DISCUSSION**

Overall, the reviewed studies showed that surface size had a significant positive effect on FC, FL, and TFD; however, the magnitude of the effect differed between objects and stimulus contexts. In order to get a more profound understanding of size effects and to classify their impact on attention, four theories of surface size effects on attention were evaluated in this paper. The first theory, explaining surface size effects with a linear increase in attention due to random fixations, could be rejected based on ample evidence showing that surface size effects on attention are logarithmic (Lohse, 1997). Small objects gained more from size increases than large objects (Chandon et al., 2009), suggesting that size increments were limited in their capacity to increase attention. All in all, these findings indicate that size increments do not work simply by increasing the probability that fixations randomly land on a larger area. Since small elements gain more from size increments, a logarithmic distribution of gains in attention is reasonable. Consequently, size effects do not influence attention linearly based on random fixations but must be explainable with more targeted information acquisition.

Size as a function of saliency comes closer to explaining the effect of increased attention based on size increments. Objects that are salient attract attention and suggest a greater interest to the observer in free viewing tasks (Parkhurst et al., 2002). However, this effect diminishes when applied to real-world search tasks (Henderson et al., 2007). Since most of the reviewed studies were conducted under free viewing conditions, it seems reasonable to expect an effect of visual saliency on attention. Based on the theory presented, surface size is seen as a dimension of saliency (Pieters et al., 2007). This would mean that an object's size increments affect attention because the object becomes more salient but not due to the size increment in itself. Indeed, a correlation exists between size and saliency. Nonetheless, the results presented show that size cannot be seen as a property of saliency but that it is a measure on its own showing consistently stronger effect sizes than saliency. Interestingly though, the correlation of saliency with fixation data was stronger than that of CA. This confirms the potential of saliency to attract attention, yet to a lesser extent than size. This is supported by the significant interaction effect of size and saliency found in Orquin et al. (2012). It is reasonable to assume that a salient target will gain more attention from size increments than a non-salient one; however, the driver for this effect remains surface size in itself. The theory of size as a function of saliency therefore is not supported by our findings.

The models of CA incorporate set size and distance features of surrounding objects in order to explain the effect of size increments on attention. Janiszewski's (1998) model predicts that an object that is faced with a high degree of CA will receive less attention. Our data revealed weak or non-significant relations, some of them going in the opposite direction to that predicted by the model. The data clearly show that other models are better suited to account for the observations.

The model of attentional demand (*A<sub>i</sub>*) refines the model of CA as it accounts for object signal strength as a function of visual acuity loss relative to the fixated object. In addition, this model delivers clear predictions as to the measure of attention that each object will receive relative to all other objects in the visual scene. This also means that the more objects there are in a visual scene, the less attention each object receives. Furthermore, an increase in size of the fixated object will result in increased attention for this object relative to the competition for attention imposed by all other objects. As described above, the increase in attention is predicted to be shaped as a logarithmic function. The correlations with our attention data were significant and pointed in the expected direction. However, size as a measure on its own was more strongly correlated with attention data. Consequently, we find no evidence that this model explains size effects on attention.

As discussed before, the underlying assumptions of the models of CA were not robust when related to findings from the literature. A closer relationship with the data was expected when the assumption of equal distribution of attention in the visual scene was replaced by the assumption of a central fixation bias. The model of demand for attention (DA*i*) is much simpler than the models of CA as it only consists of the target object's size and distance to the center relative to the sum of demand for attention from the other objects in the visual scene. Increasing the target object's size was predicted to result in a logarithmic increase of attention toward that object. Objects which were closer to the center would gain more from size increments than others. The correlations between our fixation data and demand for attention were substantially stronger than those of all other measures. Consequently, the model of demand for attention performed better than pure surface size measures. This supports the idea that size effects depend on other factors as well, such as set size and position. Our model of demand for attention predicts that fixations are equally distributed when objects are equal in size and distance. However, increasing the size of one object, leaving distances equal, should result in more fixations for the larger object. This prediction is in line with previously mentioned findings, namely that crowding effects are dependent on size effects (Treisman and Gelade, 1980; Nazir, 1992; Kooi et al., 1994; Levi and Carney, 2009). In support of our model, demand for attention without visual acuity loss (DAno\_acuity) performed better than the other CA measures, saliency and distance to the center. Still, this measure was considerably more weakly correlated with fixation measures than DA*i* and surface size. A potential explanation for the robust performance of DAno\_acuity could be the strong effect of the central bias assumption since it is the major difference between demand for attention and the CA models. If this is true, then measuring distance from the target object to the center and to all other objects is one of the major features that need to be taken into account when explaining size effects.

Based on our findings, size effects on attention can be explained by an object's signal strength, which is a function of visual acuity loss and distance to the center. Increased signal strength serves as a proxy for visual attention. When observing a visual scene, the center will be the focal point of attention. Surrounding objects compete for attention with less signal strength the further they are away from the center. Increasing object size according to the acuity loss function, however, will compensate for the distance to the center and enhance visual perception of peripheral stimuli. The theoretical model assumes that shifting the position of a stimulus closer to the center results in an increase in signal strength but to a lesser degree than increasing size. Transferring these results to visual advertising research, an improved layout design might be achieved by systematically organizing the important information based on the signal strength of each element and taking into account that size and position of all advertisement elements influence each other in terms of how much attention each object will gain. Increasing the size of one object increases its signal strength but imposes CA on other elements. Being located as close to the center as possible enhances signal strength but is also dependent on the location and size of other objects in an advertisement. The model of demand for attention, which we suggest, could serve as a systematic approximation to optimize advertising layout in practice; this might improve sales through more efficient attention allocation.

Implementing object signal strength as a function of visual acuity loss and distance to the center is a refinement of the model of CA (Janiszewski, 1998) which contributes to an understanding of surface size effects on attention. Nevertheless, the magnitude of correlations showed that our model can be further improved by other factors that contribute to explaining size effects on attention. Overall, it can be concluded that size effects on attention depend on their surroundings and can be more effectively predicted when visual acuity loss and distance to the center are accounted for in the model.

Future research should attempt to improve the integration of set size in the model of demand for attention. The current model accounts for set size by incorporating the total demand for attention in the estimation of relative attention to each individual object. However, the decisive factor for relative attention allocated to each object depends on the signal strength of the object. This means that in a visual scene with high heterogeneity in demand for attention, e.g., the target object has high demand for attention and the distractor objects have low demand for attention, the number of surrounding objects should have little or no effect on the amount of relative attention to the target. The question is whether this assumption is realistic or whether there is a minimum influence of the set size on demand for attention.

#### **AUTHOR CONTRIBUTIONS**

The authors contributed in equal shares to this article.

#### **REFERENCES**

Anstis, S. M. (1974). A chart demonstrating variations in acuity with retinal position. *Vision Res.* 14, 589–592. doi: 10.1016/0042-6989(74)90049-2

Boerman, S., Smit, E., and Meurs, L. (2011). "Attention battle; the abilities of brand, visual, and text characteristics of the ad to draw attention versus the diverting power of the direct magazine context," in *Advances in Advertising Research*, Vol. 2, ed. S. Okazaki (Wiesbaden: Gabler Verlag), 295–310.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 01 October 2013; accepted: 14 November 2013; published online: 09 December 2013.*

*Citation: Peschel AO and Orquin JL (2013) A review of the findings and theories on surface size effects on visual attention. Front. Psychol. 4:902. doi: 10.3389/fpsyg.2013.00902*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology.*

*Copyright © 2013 Peschel and Orquin. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Statistical image properties of print advertisements, visual artworks and images of architecture

#### *Julia Braun<sup>1</sup>, Seyed A. Amirshahi<sup>1,2</sup>, Joachim Denzler<sup>2</sup> and Christoph Redies<sup>1</sup>\**

*<sup>1</sup> Experimental Aesthetics Group, Institute of Anatomy I, University of Jena School of Medicine, Jena University Hospital, Jena, Germany <sup>2</sup> Computer Vision Group, Friedrich Schiller University, Jena, Germany*

#### *Edited by:*

*Jaana Simola, University of Helsinki, Finland*

#### *Reviewed by:*

*Britt Anderson, Brown University, USA; Branka Spehar, University of New South Wales, Australia*

#### *\*Correspondence:*

*Christoph Redies, Experimental Aesthetics Group, Institute of Anatomy I, Jena University Hospital, Teichgraben 7, D-07740 Jena, Germany e-mail: christoph.redies@med.uni-jena.de*

Most visual advertisements are designed to attract attention, often by inducing a pleasant impression in human observers. Accordingly, results from brain imaging studies show that advertisements can activate the brain's reward circuitry, which is also involved in the perception of other visually pleasing images, such as artworks. At the image level, large subsets of artworks are characterized by specific statistical image properties, such as a high self-similarity and intermediate complexity. Moreover, some image properties are distributed uniformly across orientations in the artworks (low anisotropy). In the present study, we asked whether images of advertisements share these properties. To answer this question, subsets of different types of advertisements (single-product print advertisements, supermarket and department store leaflets, magazine covers and show windows) were analyzed using computer vision algorithms and compared to other types of images (photographs of simple objects, faces, large-vista natural scenes and branches). We show that, on average, images of advertisements and artworks share a similar degree of complexity (fractal dimension) and self-similarity, as well as similarities in the Fourier spectrum. However, images of advertisements are more anisotropic than artworks. Values for single-product advertisements resemble each other, independent of the type of product promoted (cars, cosmetics, fashion or other products). For comparison, we studied images of architecture as another type of visually pleasing stimuli and obtained comparable results. These findings support the general idea that, on average, man-made visually pleasing images are characterized by specific patterns of higher-order (global) image properties that distinguish them from other types of images. Whether these properties are necessary or sufficient to induce aesthetic perception and how they correlate with brain activation upon viewing advertisements remains to be investigated.

**Keywords: experimental aesthetics, digital image analysis, self-similarity, complexity, anisotropy, fractal dimension, Fourier spectrum, Pyramid of Histograms of Oriented Gradients (PHOG)**

#### **INTRODUCTION**

Neuroeconomics and neuroaesthetics are two areas of experimental aesthetics, which study responses of the human brain to advertisements and beautiful images or objects, respectively. Both types of visual stimuli can induce the experience of pleasantness in human observers. At least in part, they also activate similar regions of the brain's self-reflective and reward circuitries, for example, the medial orbitofrontal cortex, the ventromedial prefrontal cortex, the ventral pallidum and the ventral striatum (Erk et al., 2002; O'Doherty et al., 2003; Cela-Conde et al., 2004; Kawabata and Zeki, 2004; Vartanian and Goel, 2004; Jacobsen et al., 2006; Schaefer et al., 2006; Simmons et al., 2013). Other brain systems that are associated with the perception of visually pleasing stimuli, such as artworks, are involved also in moral judgment (Zaidel and Nadal, 2011; Avram et al., 2013) or belong to the default mode network (Vessel et al., 2012).

Along another line of research in experimental aesthetics, the computational approach aims to identify the statistical properties of visually pleasing images and to relate them to visual perception (Hoenig, 2005; Datta et al., 2006; Li and Chen, 2009; Graham and Redies, 2010; Amirshahi et al., 2012, 2013). For example, recent studies revealed that subsets of artworks possess a scale-invariant Fourier power spectrum (Graham and Field, 2007; Redies et al., 2007a,b; Alvarez-Ramirez et al., 2008). Images of natural scenes also show this property (Field, 1987; Burton and Moorhead, 1987; Field and Brady, 1997; Simoncelli, 2003). Interestingly, the human visual system is adapted to process natural scene statistics efficiently (Parraga et al., 2000; Olshausen and Field, 2004). Taylor and colleagues demonstrated fractal-like structure in both natural scenes and abstract expressionist paintings (Taylor, 2002; Taylor et al., 2011). Based on these similarities, it has been speculated that visually pleasing images follow universal regularities so that they can be processed efficiently by the human visual system (Zeki, 1999; Redies, 2007). In a similar vein, it was proposed that stimuli that can be processed fluently are more aesthetic in general (Reber et al., 2004). In the computational approach, a particular focus was placed on visual artworks and photographs (Datta et al., 2006; Li and Chen, 2009; Amirshahi et al., 2012; Redies et al., 2012), but other types of visually pleasing images, such as graphic novels (Koch et al., 2010) and aesthetic writings (Melmer et al., 2013) have also been studied.

Advertisements are another type of man-made image that attracts human attention, often by using pleasant visual stimuli. Many psychological studies have investigated what contents are suited for advertisements in order to evoke a pleasant feeling in the observer, and there are elaborate practical instructions on how to produce an appealing visual layout for print advertisements (Assael et al., 1967; Edell and Staelin, 1983; Finn, 1988; Bushko and Stansfield, 1997). Basic features of print advertisements, such as color, size, and spacing of dominant pictorial and text elements, have been examined (Assael et al., 1967), also in cross-cultural studies (Cutler and Javalgi, 1992). However, to the best of our knowledge, there are no studies to date on higher-order global statistical image properties of print advertisements that may possibly relate to aesthetic perception.

Another source of man-made, visually pleasing stimuli is architecture. The biophilia concept of architecture conjectures that urban planning and architectural design should be based on fractal (self-similar) geometry (Joye, 2007, 2011; Taylor and Sprott, 2008). This theory is based on the observation that humans prefer fractal geometry in their environment (Hagerhall et al., 2004; Taylor et al., 2005), possibly because the natural surroundings of our ancestors had fractal characteristics ("savannah hypothesis"; Orians, 1986; Forsythe et al., 2011). However, Torralba and Oliva (2003) studied simple image statistics, such as Fourier spectral signatures in images of street scenes and buildings, and showed that cardinal (horizontal and vertical) orientations are more prevalent in these images than in images of natural scenes.

In the present work, we investigate higher-order image properties that have been studied in visually pleasing stimuli before. On the one hand, we used a modern computational method that was developed for object recognition and categorization, the Pyramid of Histograms of Oriented Gradients (PHOG) method (Dalal and Triggs, 2005; Bosch et al., 2007). With this method, the following measures were calculated: self-similarity, complexity, anisotropy and the Birkhoff-like measure.


On the other hand, the above measures were compared to the following features that have been calculated for visually pleasing images before, also by other groups (for references, see above): the slope of the radially averaged Fourier power spectrum and the fractal dimension.


Using these six measures, we compared images of advertisements and architecture with various previously studied image categories, including colored artworks of Western provenance, and asked the following questions:


#### **MATERIALS AND METHODS**

#### **IMAGE DATASETS**

We investigated 15 different categories of images, focusing on advertisements, artworks and architecture. For comparison, datasets of images that were studied before, including faces, simple objects, and natural scenes and patterns, were analyzed (control images). Each dataset consisted of about 200 color images.

The images of artworks and advertisements (except for show windows) were scanned from high-quality art books and advertisement brochures or magazines, respectively. For scanning, a calibrated digital scanner (Perfection 3200 Photo, Epson, Owa, Japan) was used, as described before (Redies et al., 2012). Images of the other datasets and show windows were taken with a digital camera (EOS 500D with EF-S 15-85 mm f/3.5-5.6 IS USM lens; Canon, Tokyo, Japan) by the authors (Julia Braun and Christoph Redies).

#### *Artworks*

For this category, we selected 197 colored (mostly oil) paintings from a previously analyzed dataset (Redies et al., 2012). The images were selected so that they represented a wide variety of subject matters (21 paintings of architecture, 52 portraits, 23 natural scenes, 60 abstract paintings and 41 other subject matters). The following art periods were covered: Renaissance (20 paintings from 18 artists), Baroque (20 paintings from 18 artists), Romanticism (12 paintings from 9 artists), Realism (20 paintings from 11 artists), Impressionism (20 paintings from 18 artists), Art Nouveau (5 paintings from 3 artists), Expressionism (20 paintings from 9 artists), Fauvism (7 paintings from 4 artists), Cubism (15 paintings from 7 artists), Surrealism (12 paintings from 4 artists), Suprematism (10 paintings from 6 artists), and abstract paintings (36 paintings from 25 artists, including 16 abstract expressionist paintings from 14 artists). Examples are shown in **Figure 1A**. Images were scanned at a size higher than about 4 million pixels. In the statistical measures analyzed, the sample of 197 paintings did not differ from the previously published dataset (Redies et al., 2012).

#### *Advertisements*

First, single pages that represented advertisements of one product were scanned from current magazines that were purchased in two newspaper kiosks. An effort was made to include as many different categories of magazines as were available in the shops. The group of single-product advertisements was further divided into advertisements for cars and automobile accessories (200 images; **Figure 1B**), fashion (mostly for women, 200 images; **Figure 1C**), cosmetics (198 images; **Figure 1D**) and other products (204 images; **Figure 1E**). Brochures obtained from local car sellers complemented the car image category. Second, other types of advertisements were scanned. They included covers of various women's and TV magazines (196 images; **Figure 1F**) and leaflets from supermarkets and department stores with advertisements for grocery, furniture, hardware and other stores (212 images; **Figure 1G**). Members of the laboratory contributed this material. The cover page, the middle sheet and the rear page of the leaflets were used and typically contained advertisements for several products on each page. In addition, one of the authors (Julia Braun) took 85 photographs of show windows of fashion shops in Jena and Berlin, Germany, with a digital camera, as described above (**Figure 1H**). Photographs were taken during two walks in major shopping districts. All store windows encountered were photographed, except for those with strong light reflections. The photographs were cropped so that the height of the window fitted the height of the image.

For a general comparison of advertisement images with the other categories of images, we created a dataset that consisted of 30 images randomly selected from each of the seven advertisement subsets described above (210 images in total).
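The construction of this mixed dataset can be illustrated with a minimal sketch. The subset sizes are those reported above; the file names, the function name `build_mixed_ad_dataset` and the fixed random seed are hypothetical and serve only to make the example reproducible. The sketch assumes sampling without replacement, which the text does not state explicitly.

```python
import random


def build_mixed_ad_dataset(subsets, per_subset=30, seed=0):
    """Draw `per_subset` images at random, without replacement, from every subset."""
    rng = random.Random(seed)  # fixed seed only so that the sketch is reproducible
    mixed = []
    for name in sorted(subsets):
        mixed.extend(rng.sample(subsets[name], per_subset))
    return mixed


# Subset sizes as reported in the text; the file names are placeholders.
sizes = {"cars": 200, "fashion": 200, "cosmetics": 198, "others": 204,
         "magazine_covers": 196, "leaflets": 212, "show_windows": 85}
subsets = {name: [f"{name}_{i:03d}.png" for i in range(n)]
           for name, n in sizes.items()}

mixed = build_mixed_ad_dataset(subsets)
print(len(mixed))  # seven subsets with 30 images each
```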

#### *Architecture*

Photographs of architecture in Austria, Germany and Spain were obtained for three different ranges of distance to the photographed objects. First, 200 photographs of urban scenes, which represent street views or a group of buildings, sometimes with horizon, were taken (**Figure 2A**). Second, 200 entire buildings were photographed (**Figure 2B**). Third, 3–4 floors of 175 facades were photographed; an attempt was made not to include cars or people in front of the ground floor (**Figure 2C**).

For comparison with other types of images, we included previously analyzed datasets (control images; Redies et al., 2012) in the present study. These image datasets are available on the following webpage: www.inf-cv.uni-jena.de/en/aesthetics.

#### *Simple objects*

This dataset included 200 photographs of ordinary household and laboratory equipment (**Figure 2D**).

#### *Face images*

This dataset comprised 200 face photographs of about 100 persons of both genders who were either smiling (72 images) or showed a neutral facial expression (123 images). These photographs were randomly selected from the AR face database (Martinez and Benavente, 1998). Similar photographs are shown in **Figure 2E**.

#### *Natural scenes and patterns*

Images of artworks share statistical similarities with images of natural scenes and patterns, in particular large-vista natural scenes (Redies et al., 2007a,b), also in the PHOG analysis (Redies et al., 2012). For comparison, the following datasets were used in the present study: 200 images of large-vista natural scenes of different landscapes (**Figure 2F**), including the horizon, and 200 images of branches that were taken in winter without foliage (**Figure 2G**).

#### **IMAGE CALCULATIONS**

For each image, values for self-similarity, complexity and anisotropy were obtained with the PHOG method, as described before (Amirshahi et al., 2012; Redies et al., 2012). Because halftone dots were visible in a small number of the scanned artworks, image size was reduced to 100,000 pixels by bicubic interpolation. This reduction was carried out for all image categories because the measured values depend on the image size (Redies and Groß, 2013). Color images were transformed into the Lab color space. The general procedure to calculate the PHOG measures is described in the Appendix.
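As an illustration of the size reduction, the following sketch computes target dimensions for a given pixel budget. It assumes that the aspect ratio is preserved, which is not stated explicitly for this step, and the function name `target_dimensions` is hypothetical. The interpolation itself (bicubic in the study) would be carried out by an image library and is not shown.

```python
import math


def target_dimensions(width, height, n_pixels=100_000):
    """Return (new_width, new_height) with about n_pixels pixels and the same aspect ratio."""
    scale = math.sqrt(n_pixels / (width * height))
    return max(1, round(width * scale)), max(1, round(height * scale))


# Example: a 3000 x 2000 scan is reduced to roughly 100,000 pixels.
new_w, new_h = target_dimensions(3000, 2000)
print(new_w, new_h, new_w * new_h)
```

Because the dimensions are rounded to whole pixels, the resulting pixel count only approximates the budget.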

Three different possibilities to calculate self-similarity were compared (**Figure 3**). The histograms of each section can be compared to the histograms (i) of the parent section at the previous level (parent approach; **Figure 3B**), (ii) of all the adjacent (neighboring) sections at the same level (neighbor approach; **Figure 3C**), (iii) of the entire image at level 0 (ground approach; **Figure 3D**). Self-similarity values obtained for the three levels of the pyramid (**Figure 3A**) were averaged, with each level carrying the same weight.
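The three approaches can be sketched as follows. This is an illustration under stated assumptions, not the implementation used in the study: the histogram intersection used to compare two histograms, the 8-neighbourhood used for the neighbor approach, and the plain mean within each level are all assumptions (the procedure actually used is described in the Appendix). Only the equal weighting of levels is taken from the text. The names `hik`, `reference_histograms` and `self_similarity` are hypothetical, and `pyramid[level][(row, col)]` is assumed to hold the normalized HOG histogram of a section.

```python
def hik(h1, h2):
    """Histogram intersection of two histograms that each sum to 1 (assumed measure)."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def reference_histograms(pyramid, level, row, col, approach):
    """Histograms that the section (row, col) at `level` is compared with."""
    if approach == "parent":      # enclosing section one level up
        return [pyramid[level - 1][(row // 2, col // 2)]]
    if approach == "ground":      # entire image at level 0
        return [pyramid[0][(0, 0)]]
    if approach == "neighbor":    # adjacent sections, same level (8-neighbourhood assumed)
        n = 2 ** level
        return [pyramid[level][(row + dr, col + dc)]
                for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0) and 0 <= row + dr < n and 0 <= col + dc < n]
    raise ValueError(f"unknown approach: {approach}")


def self_similarity(pyramid, approach="ground"):
    """Mean similarity per level (assumed), then equal-weight mean over levels 1 and up."""
    level_means = []
    for level in range(1, len(pyramid)):
        scores = [hik(hist, ref)
                  for (row, col), hist in pyramid[level].items()
                  for ref in reference_histograms(pyramid, level, row, col, approach)]
        level_means.append(sum(scores) / len(scores))
    return sum(level_means) / len(level_means)


# Toy pyramid with levels 0 and 1 only and 2-bin histograms:
# the top half of the image has one orientation, the bottom half the other.
toy = [
    {(0, 0): [0.5, 0.5]},
    {(0, 0): [1.0, 0.0], (0, 1): [1.0, 0.0], (1, 0): [0.0, 1.0], (1, 1): [0.0, 1.0]},
]
for approach in ("parent", "neighbor", "ground"):
    print(approach, round(self_similarity(toy, approach), 3))
```

In this toy case the parent and ground approaches coincide (0.5) because level 1 is the only level above the ground level, while the neighbor approach yields a lower value (0.333), since each section has only one identical neighbor out of three.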

The slope of log-log plots of the radially averaged Fourier power spectrum was determined as described for natural scenes and artworks before (Burton and Moorhead, 1987; Field, 1987; Graham and Field, 2007; Redies et al., 2007a,b). In brief, images were padded to square ones by adding a uniform border with a gray level that was equal to the mean gray level of the image. All images were reduced to a size of 1024 × 1024 pixels by bicubic interpolation and isotropic scaling. A discrete Fourier transform (2d Fast Fourier Transform) was calculated to obtain the 2d power spectrum, which was then transformed into a 1d spectrum by rotational averaging for each frequency. In the log-log plane, power was plotted as a function of spatial frequency. To measure the slope of the resulting frequency spectrum, a least-squares fit of a line was performed in the range of 10–256 cycles/image and the slope of the line was determined.
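Two steps of this procedure, the padding to a square and the line fit in the log-log plane, can be sketched in plain Python. The sketch assumes that the border is distributed evenly around the image and that the radially averaged 1d power spectrum has already been computed elsewhere (the 2d FFT and the rotational averaging are not shown). The function names `pad_to_square` and `loglog_slope` are hypothetical.

```python
import math


def pad_to_square(img):
    """Pad a grayscale image (list of rows) to a square using its mean gray level."""
    h, w = len(img), len(img[0])
    mean = sum(map(sum, img)) / (h * w)
    side = max(h, w)
    top, left = (side - h) // 2, (side - w) // 2  # centred placement is an assumption
    out = [[mean] * side for _ in range(side)]
    for r in range(h):
        out[top + r][left:left + w] = img[r]
    return out


def loglog_slope(freqs, power, f_min=10, f_max=256):
    """Least-squares slope of log10(power) vs. log10(frequency) within [f_min, f_max]."""
    pts = [(math.log10(f), math.log10(p))
           for f, p in zip(freqs, power) if f_min <= f <= f_max and p > 0]
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    return sxy / sxx


# Synthetic 1d spectrum that falls off as f^-2.5 (cycles/image from 1 to 512).
freqs = list(range(1, 513))
power = [f ** -2.5 for f in freqs]
slope = loglog_slope(freqs, power)
print(round(slope, 3))
```

For the synthetic spectrum the fitted slope recovers the exponent of the power law, which is the quantity reported as the Fourier slope.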

The fractal dimension can be seen as an indicator for the complexity of a pattern: A high fractal dimension indicates high complexity, while a low fractal dimension indicates low complexity (Mureika and Taylor, 2013). The fractal dimension was estimated with the box-counting method, as described by Hagerhall et al. (2004). Because the box-counting method requires binarized images, we applied the Canny edge filter (Canny, 1986) to each image. Next, each image was covered by a mesh of "boxes" that represented equally sized squares. This procedure was repeated for decreasing box sizes $\varepsilon$, which results in an increasingly fine mesh. Let $N(\varepsilon)$ be the number of boxes that are occupied by the pattern in relation to a specific box size $\varepsilon$. According to the power law relation $N(\varepsilon) \sim \varepsilon^{-D}$, the box-counting dimension $D$ can be estimated by fitting a line to the plot of $\log N(\varepsilon)$ vs. $\log(1/\varepsilon)$ and measuring the slope of this line. All calculations were performed with the MATLAB program.
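The box-counting estimate can be sketched as follows. The sketch assumes that edge detection has already been performed and that the binarized image is given as a set of (row, column) coordinates of edge pixels; the box sizes and the function names `box_count` and `box_counting_dimension` are hypothetical choices for illustration.

```python
import math


def box_count(pixels, size):
    """Number of size x size boxes that contain at least one occupied pixel."""
    return len({(r // size, c // size) for r, c in pixels})


def box_counting_dimension(pixels, box_sizes):
    """Slope of log N(eps) versus log(1/eps), fitted by least squares."""
    xs = [math.log(1.0 / s) for s in box_sizes]
    ys = [math.log(box_count(pixels, s)) for s in box_sizes]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Two sanity checks on a 64 x 64 grid: a straight line and a filled square.
box_sizes = [1, 2, 4, 8, 16, 32]
line = {(0, c) for c in range(64)}
square = {(r, c) for r in range(64) for c in range(64)}
print(round(box_counting_dimension(line, box_sizes), 3))
print(round(box_counting_dimension(square, box_sizes), 3))
```

The line and the filled square give dimensions of 1 and 2, respectively; edge maps of real images are expected to fall between these two limits.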

#### **STATISTICAL ANALYSIS**

For the statistical verification, we carried out the non-parametric Wilks-Lambda multivariate analysis of variance test on all 15 image categories with all the six measurements, followed by the Tukey *post-hoc* test for individual comparisons for all pairs of categories.

#### **RESULTS**

We measured statistical properties in 15 different categories of images, with a particular focus on advertisements, artworks and architecture. Six features that were previously studied in visually pleasing images were calculated (self-similarity, complexity, anisotropy, the Birkhoff-like measure, the slope of the 1d Fourier spectrum, and the fractal dimension; see Introduction). Statistical testing showed overall differences between all 15 groups and all six measures (*p* < 0.001).

Median values of the measured properties for all 15 image categories are provided in **Tables 1**–**5** and are summarized in **Figure 4**. **Figures 5**, **7A** compare the results for advertisements with five datasets of image categories that were analyzed previously by our group (artworks and photographs of branches, simple objects, natural scenes and passport-type face photographs; Redies et al., 2012). Results show that images of advertisements can be characterized by a specific combination of the measured properties on average, as discussed in more detail below (section Comparison of Advertisements to Other Image Categories). The other image categories can also be characterized by specific combinations of the six measures (Redies et al., 2012).

In the following sections, we will first evaluate the three different methods to calculate self-similarity by the PHOG method. Second, we will compare results for advertisements with the previously studied image categories. This comparison was of interest in particular with respect to man-made images vs. natural images and for images that differ in the degree of their aesthetic appeal. Third, we will compare results obtained for the different types of advertisements. Fourth, results for images of architecture will be described.

**Figure 3 |** The construction of the pyramid for the calculation of the histogram of oriented gradients (HOG) features is shown in **(A)**. The numbers in **(A)** indicate the levels of the pyramid. To determine self-similarity, a section at a given level of the pyramid (orange) can be compared to the parent section (yellow in **B**), to the neighboring sections (yellow in **C**) or to the entire image at the ground level (yellow in **D**).

#### **Table 1 | Comparison of different approaches to calculate self-similarity.**

*\*Significantly different from advertisement (p < 0.001).*

*Values represent median (mean ± SD), calculated on the basis of the ground approach.*

*<sup>a,b,c</sup>Significantly different from advertisement (<sup>a</sup>p < 0.05; <sup>b</sup>p < 0.01; <sup>c</sup>p < 0.001).*

*<sup>d</sup>Significantly different from natural scenes (p < 0.001).*

*<sup>e</sup>Images were randomly sampled from all seven advertisement subcategories.*

*Values represent median (mean ± SD).*

*<sup>a,b,c</sup>Significantly different from advertisement (<sup>a</sup>p < 0.05; <sup>b</sup>p < 0.01; <sup>c</sup>p < 0.001).*

*<sup>d,e</sup>Significantly different from natural scenes (<sup>d</sup>p < 0.05; <sup>e</sup>p < 0.001).*

#### **METHODOLOGICAL CONSIDERATIONS**

The values that are derived from the PHOG method critically depend on several parameters, for example, the resolution of the images and the level of the image pyramid, on which the analysis was performed (Amirshahi et al., 2012). For this reason, we carried out PHOG calculations for all images at the same resolution (100,000 pixels) and at the same level (level 3).

To calculate self-similarity, three different approaches were considered in the present work (see Materials and Methods, Section Image Calculations; **Table 1**). In two earlier studies from our group, self-similarity was calculated based on the parent approach (**Figure 3B**; Amirshahi et al., 2012; Redies et al., 2012). The average self-similarity for advertisements assumes intermediate values (0.68; **Figure 5A**). Values for artworks (0.74), natural scenes (0.73) and branches (0.83) are higher (*p* < 0.001) whereas values for faces (0.56) and simple objects (0.62) are lower (*p* < 0.001). These values are similar to those from our earlier study (Redies et al., 2012).

For the second method of calculating self-similarity (neighbor approach; **Figure 3C**), overall results follow the same pattern, but self-similarity values were generally lower than for the parent approach (**Figures 5A,B**, **Table 1**). This decrease is especially prominent for images of advertisements (0.37), artworks (0.50), faces (0.26) and simple objects (0.28). The decrease of average values for natural scenes (0.65) and branches (0.69) is smaller.

Results for the ground approach (**Figures 3D**, **5C**, **Table 1**) are intermediate between those of the parent approach and the neighbor approach. Again, the overall pattern of differences is similar compared to the two other approaches.

In conclusion, these results demonstrate that the different PHOG-derived approaches to calculate self-similarity are relatively robust with regard to relative differences between image categories, although absolute values differ. In the following comparison, we chose the ground approach.

#### **COMPARISON OF ADVERTISEMENTS TO OTHER IMAGE CATEGORIES**

The scatter plots shown in **Figures 5C,D**, **7A** compare the results for advertisements and the other image categories. The plots of self-similarity vs. complexity (**Figure 5C**), of anisotropy vs. the Birkhoff-like measure (**Figure 5D**) and of Fourier slope vs. fractal dimension (**Figure 7A**) reveal distinct but partially overlapping clusters for each image category.

For advertisements, self-similarity values (0.62; **Figure 4C**, **Table 2**) differ significantly from artworks (0.68, *p* < 0.001), simple objects (0.53, *p* < 0.001), faces (0.43, *p* < 0.001) and branches (0.78, *p* < 0.001) but not from natural scenes (0.64). Complexity values (**Figure 4C**) obtained for advertisements (9.00) are higher than values for artworks (7.23, *p* < 0.05), simple objects (6.18; *p* < 0.001), and faces (3.99; *p* < 0.001), but much lower than for branches (28.25; *p* < 0.001). The complexity of natural scenes (10.63; **Table 2**) is similar to that of advertisements. This pattern of differences is similar to the previously published results (Redies et al., 2012) although absolute values differ because the resolution of the images used (1 million pixels vs. 100,000 pixels) and the approaches to calculate self-similarity differed (see above).

*Values represent median (mean ± SD), calculated on the basis of the ground method.*

*<sup>a,b,c</sup>Significantly different from artworks (<sup>a</sup>p < 0.05; <sup>b</sup>p < 0.01; <sup>c</sup>p < 0.001; Table 2).*

*<sup>d,e,f</sup>Significantly different from natural scenes (<sup>d</sup>p < 0.05; <sup>e</sup>p < 0.01; <sup>f</sup>p < 0.001; Table 2).*

*Values represent median (mean ± SD).*

*<sup>a</sup>Significantly different from artworks (p < 0.001; Table 3).*

*<sup>b,c</sup>Significantly different from natural scenes (<sup>b</sup>p < 0.05; <sup>c</sup>p < 0.001; Table 3).*

We obtained distinct clusters for each image category in the scatter plot of anisotropy vs. the composite Birkhoff-like measure (**Figure 5D**, **Table 2**). For example, compared to advertisements (0.70 × 10<sup>−3</sup>), anisotropy is lower for artworks (0.49 × 10<sup>−3</sup>, *p* < 0.001), natural scenes (0.42 × 10<sup>−3</sup>, *p* < 0.001) and branches (0.33 × 10<sup>−3</sup>, *p* < 0.001), whereas it is higher for simple objects (0.93 × 10<sup>−3</sup>, *p* < 0.001) and faces (0.86 × 10<sup>−3</sup>, *p* < 0.001). For the Birkhoff-like measure, values are similar for advertisements and natural scenes (**Figure 5D**, **Table 2**). The comparison to all other image categories yields significantly different values.

The Fourier slope values of images of advertisements (−2.57) and architecture (−2.45 to −2.53) are similar (**Table 3**). Artworks have slope values (−2.77) lower than advertisements (−2.57; *p <* 0*.*001). Images of natural scenes and branches have slope values close to −2 (**Figure 7A**) while face images have much lower values than advertisements (−3.51; *p <* 0*.*001; **Figure 7A**). For the fractal dimension, as expected, the overall pattern of values is similar to that of complexity (**Figures 4C,D**).

#### **COMPARISON OF DIFFERENT ADVERTISEMENT CATEGORIES**

**Figures 6**, **7B–D** illustrate the results for single-product advertisements (**Figures 6A,B**, **7B**) and the other types of advertisement (magazine covers, supermarket and department store leaflets, and show windows; **Figures 6C,D**, **7C**). Detailed results are listed in **Tables 4**, **5** and compared to the other image categories in **Figure 4**.

Compared to the image categories described in the previous section, the dot clusters of the individual advertisement categories overlap to a much larger degree with each other and also with the artworks cluster (**Figures 6A–D**, **7B,C**). Single-product advertisements (cars, fashion, cosmetics and others) tend to have lower complexity, fractal dimension, and self-similarity than the other types of advertisements whereas anisotropy and the Fourier slope tend to be more variable in general (**Figure 4**, **Tables 4**, **5**). Compared to artworks, single-product advertisements tend to be less self-similar (*p <* 0*.*001) and more anisotropic (*p <* 0*.*001) but they do not differ in average complexity (**Figures 6A,B**, **Table 4**). Values for the Birkhoff-like measure are lower than for artworks (*p <* 0*.*001; **Figure 6B**). The Fourier slope values for single-product advertisements are higher than for artworks (*p <* 0*.*001), except for fashion images. The fractal dimension is smaller for fashion (1.44, *p <* 0*.*001) and cosmetics (1.42, *p <* 0*.*001) than for artworks (1.49) (**Figure 7B**). The other types of advertisement are more complex (*p <* 0*.*05 to *p <* 0*.*001), more anisotropic (*p <* 0*.*001) and have a lower Birkhoff-like measure and a higher fractal dimension than artworks (*p <* 0*.*001; **Figures 6C,D**, **7C**, **Tables 4**, **5**). Self-similarity for magazine covers is as high as for artworks, higher for leaflets (*p <* 0*.*001) and lower for show windows (*p <* 0*.*05; **Figures 4**, **6C**).

#### **IMAGES OF ARCHITECTURE**

The images of architecture (urban scenes, buildings, and facades) are more complex than artworks and single-product advertisements (*p <* 0*.*001; **Figures 4C**, **6E**; **Table 2**) but tend to be similar to the other types of advertisements. A similar trend is observed for the fractal dimension (**Figures 4D**, **7D**, **Tables 3**, **5**). The degree of anisotropy resembles that of single-product advertisements and show windows but is higher than for artworks (*p <* 0*.*001; **Figure 6F**, **Table 2**). Self-similarity for buildings is as high as for the combined dataset of advertisements; it is higher for facades (*p <* 0*.*001) and lower for urban scenes (*p <* 0*.*001; **Figure 6E**, **Table 2**). The Birkhoff-like measure is lower for architecture than for artworks (**Figure 6F**; *p <* 0*.*001). The Fourier slope for images of architecture is similar to that of single-product advertisements (**Figure 4D**, **Tables 3**, **5**) and artworks (**Figure 7D**).

#### **DISCUSSION**

In this work, we studied statistical image properties in images of advertisements and architecture, and compared them to results of other man-made, visually pleasing images, such as artworks (Graham and Redies, 2010; Redies et al., 2012; Amirshahi et al., 2013; Melmer et al., 2013). Given the similarities in brain responses to these different types of rewarding stimuli (see Introduction), we speculated that the images might also share structural features at the stimulus level. This notion was challenged by measuring image features that have been studied in visually pleasing stimuli before (self-similarity, complexity, anisotropy, slope of the radially averaged Fourier spectrum, and fractal dimension; see Introduction). A particular focus of our study was the question of whether the images of advertisements differ from the other image categories.

In the following paragraphs, we will first point out similarities and differences between advertisements and the other image categories, such as artworks and large-vista natural scenes. Second, we will address the question of whether the measured values relate to the content and layout of different types of advertisements. Third, we will compare images of architecture to the other types of visual stimuli.

#### **DIFFERENCES BETWEEN IMAGES OF ADVERTISEMENTS AND OTHER IMAGE CATEGORIES**

In an earlier study, we showed that the PHOG measures (self-similarity, complexity and anisotropy) allow distinguishing artworks from many other image categories on average (Amirshahi et al., 2012; Redies et al., 2012). In the present work, we add images of advertisements to this comparison (**Figure 5**). In addition, we compare the results with previously obtained measurements of the Fourier slope and the fractal dimension (**Figure 7**).

Our results indicate that, like the other image categories, advertisements can be characterized by a specific combination of these measures (**Figures 4**, **5C,D**, **7A**, **Tables 2**, **3**). The results for the combined dataset of advertisements largely resemble those of artworks and natural scenes, although some differences were observed. All three image categories have relatively high values for self-similarity and intermediate values for complexity and the fractal dimension, compared to, for example, photographs of faces and branches. This finding supports the notion that subsets of visually pleasing images share specific statistical properties in general. However, we note that the different measures are not independent. For all images analyzed together, **Table 6** provides the Spearman correlation coefficients for the measures. As expected, the strongest correlation is found between complexity and the fractal dimension. Other correlations, e.g., between complexity and self-similarity, are also relatively strong. The precise relation between the different measures remains to be determined.

An intermediate level of complexity has been previously linked to aesthetic perception. Berlyne (1974) described an inverted U-shaped dependence of the hedonic value of aesthetic stimuli on complexity (or information content). He postulated that this curve could be explained by the different degrees of activation of two antagonistic systems in the human brain, a reward system and an aversion system. For visually pleasing stimuli, the dependence of beauty on complexity is not straightforward, partially because physical measures of complexity differ between studies (Forsythe et al., 2011) and there is a general lack of well-controlled studies that manipulate complexity in such images, with few exceptions (Jacobsen and Hofel, 2002; Taylor et al., 2005; Forsythe et al., 2011). Nevertheless, the present results support the notion that the different categories of visually pleasing images have an intermediate degree of complexity in general.

Furthermore, the present results indicate that the PHOG-derived self-similarity measure used by us attains a similar degree in advertisements, artworks and natural scenes. The Fourier slope is not highly correlated with the self-similarity measure (**Table 6**). The Fourier slope of subsets of monochrome artworks and monochrome images of natural scenes share a slope value of around −2 (Graham and Field, 2007; Redies et al., 2007a,b; Alvarez-Ramirez et al., 2008), but the slope value of colored artworks converted to grayscale images is much lower (−2.8 in the present study; −2.9 for colored art portraits in the study by Redies et al., 2007b). However, monochrome artworks are not equivalent to grayscale versions of colored paintings because color plays a pivotal role in aesthetic appreciation (Palmer et al., 2013). Consequently, conversion of colored artworks to grayscale versions may destroy their aesthetic appeal.

Median anisotropy is higher for advertisements than for aesthetic artworks and natural scenes. This finding may relate to the presence of vertical text and image divisions along cardinal (horizontal and vertical) orientations in advertisements. Strikingly, all image categories studied are significantly more anisotropic on average than artworks, except for the natural categories (natural scenes and branches), although there is some degree of overlap (**Figures 5C,D**). A relatively low degree of anisotropy of artworks has been observed before in other studies (Koch et al., 2010; Redies et al., 2012; Melmer et al., 2013). This result is remarkable because, conceivably, artists can also produce highly anisotropic images. Note that specific natural patterns, such as lichen growth patterns and branches, can have even lower anisotropy values (Redies et al., 2012). To what extent the responses of the visual system to isotropic and anisotropic visual stimuli relate to aesthetic perception is unclear at present.

**Table 6 | Spearman correlation coefficients** *r* **for the measures studied (***p <* **0***.***001 for all comparisons).**

The separation of the different image categories is especially clear in the scatter plots of anisotropy vs. the Birkhoff-like measure, which we defined as self-similarity divided by complexity (**Figure 5D**). Each image category, including advertisements and artworks, is characterized by a specific pattern of values defined by the three measures. Whether this pattern is required or sufficient for aesthetic perception remains to be studied.

#### **DEPENDENCE OF AESTHETIC MEASURES ON ADVERTISEMENT CONTENT**

Within the seven subcategories of advertisements, differences were observed, some of which were anticipated. For example, single-product advertisements are generally less complex than leaflets that promote multiple products on each page. Magazine covers, which contain a relatively large amount of printed text, are also more complex. It is likely that the high complexity of leaflets and magazine covers relates to the fact that they display multiple visual elements that can each attract attention separately (e.g., headings, text banners and price information) whereas single-product advertisements are the result of a more integrated visual composition that encompasses the entire image. As a consequence, in single-product advertisements, the appeal of the product may be carried by the global appearance of the entire image and not by its parts, such as in leaflets. In this respect, single-product advertisements may resemble artworks. For this reason, it is perhaps not surprising that the two types of images have similar statistical image properties.

Like complexity, self-similarity differs significantly between the various types of advertisements. Nevertheless, single-product advertisements and show windows exhibit a similar degree of self-similarity in general, when compared to the other image categories. Also, differences in anisotropy between single-product advertisements are relatively small (**Figure 4**, **Table 4**).

In conclusion, for the single-product advertisements studied, image properties are similar irrespective of the type of product shown. Whether and by what perceptual mechanisms these properties lead to a higher efficiency in promoting products remains to be studied.

#### **IMAGES OF ARCHITECTURE AND THEIR "BIOPHILIC" STRUCTURE**

As expected, self-similarity of buildings and facades is relatively high (**Figure 4**), possibly because they are composed of repetitive visual elements, such as windows and architectural ornaments. In contrast, urban scenes contain elements of more diverse forms (e.g., cars, trees, street surfaces and buildings) and are less self-similar. The complexity of architectural images is higher than in artworks in general. Compared to natural scenes, images of buildings and facades tend to be more complex while urban scenes share a similar degree of complexity. The relatively high degree of anisotropy in architectural images was anticipated because cardinal orientations are prominent in most types of architecture and urban scenes, as demonstrated before (Oliva and Torralba, 2006).

Natural environments are thought to be particularly rich in stress-reducing, restorative elements (Kaplan, 1995). It has been proposed that humans possess a visual preference for natural, fractal-like patterns in architecture and urban scenes ("biophilia" hypothesis; see Introduction). The overall similarities between architectural images and natural scenes in self-similarity and complexity support this idea. However, anisotropy in our set of architectural images is much higher than in natural scenes and natural patterns, such as branches. Most likely, anisotropy is lower in specific types of architecture that prominently feature oblique orientations, for example the buildings by Antoni Gaudí or Friedensreich Hundertwasser. Other styles of architecture, for example the Bauhaus style, are characterized by a conspicuous lack of oblique orientations (Salingaros, 1999), leading to high anisotropy. The role of anisotropy in architectural aesthetics therefore remains unclear.

The present study demonstrates that images of advertisements are characterized by a specific combination of higher-order image properties (high self-similarity, intermediate complexity and intermediate anisotropy) on average. For single-product advertisements, these properties do not depend on the type of product promoted. We hypothesize that the processing of such higher-order image features can be fast and may be mediated at a lower level of the visual system, similar to gist perception of scenes (Torralba and Oliva, 2003; Oliva and Torralba, 2006), possibly occurring even before the content of the advertisements (e.g., the depicted object, brand name etc.) is recognized. The degree of self-similarity and complexity in advertisements is close (but is not identical) to that of artworks and natural scenes. It has been shown that higher-order image properties similar to natural scenes allow an efficient processing in the visual system (Parraga et al., 2000; Simoncelli and Olshausen, 2001). This idea has been extended to visual artworks (Redies, 2007; Graham and Redies, 2010). Here, we speculate that the idea may also apply to print advertising, at least to some degree. Possibly, specific higher-order image properties enhance the visual effectiveness of advertisements. At higher levels of (cognitive) processing, the effectiveness of advertisements also depends on other factors, such as the psychological condition of the observer and the real or stimulated demand for the product. Interestingly, images of architecture share statistical properties with advertisements to a large extent. It remains to be investigated whether any of these image properties (or a combination thereof) plays a causative role in judging visual stimuli as perceptually pleasing.

#### **ACKNOWLEDGMENTS**

The authors thank Dr. Schlattmann for statistical advice and members of the Denzler and Redies groups for constructive suggestions, discussion and comments on the manuscript.

#### **REFERENCES**


Birkhoff, G. D. (1933). *Aesthetic Measure.* Cambridge: Harvard University Press.

Bosch, A., Zisserman, A., and Munoz, X. (2007). "Representing shape with a spatial pyramid kernel," in *Proceedings of the 6th ACM International Conference on Image and Video Retrieval* (New York, NY: Association for Computing Machinery), 401–408. doi: 10.1145/1282280.1282340


categories of photographs. *PLoS ONE* 5:e12268. doi: 10.1371/journal.pone.0012268


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 28 June 2013; accepted: 13 October 2013; published online: 05 November 2013.*

*Citation: Braun J, Amirshahi SA, Denzler J and Redies C (2013) Statistical image properties of print advertisements, visual artworks and images of architecture. Front. Psychol. 4:808. doi: 10.3389/fpsyg.2013.00808*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology.*

*Copyright © 2013 Braun, Amirshahi, Denzler and Redies. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

### **APPENDIX A**

This appendix gives an overview of the method that was used to calculate values for self-similarity, complexity and anisotropy (Amirshahi et al., 2012; Redies et al., 2012). The method is based on the Pyramid Histogram of Oriented Gradients (PHOG) approach. PHOG descriptors are spatial shape descriptors that were originally introduced by Bosch et al. (2007) for image classification. They are global feature vectors based on a pyramidal subdivision of an image into sub-images, for which Histograms of Oriented Gradient (HOG; Dalal and Triggs, 2005) are computed. In computer vision, such a data structure is called a *quadtree*.

In this approach, the following steps are taken to calculate a gradient image *G* (shown in **Figure A1**) for a given color image *I*.


selected and placed in a new image, *G*. From here on, we will refer to *G* as the gradient image. The following equation represents this approach in another way:

$$G(x, y) = \max\left\{ \left\| I_L(x, y) \right\|, \left\| I_a(x, y) \right\|, \left\| I_b(x, y) \right\| \right\}$$

As an example, **Figure A2C** shows the gradient image *G* of the photograph displayed in **Figure A2A**.
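The pixel-wise maximum in the equation above can be illustrated with a minimal Python sketch. It assumes that the gradient magnitudes of the three channels have already been computed and are available as equally sized nested lists; the function name `gradient_image` and the input layout are illustrative assumptions and are not taken from the published method.

```python
def gradient_image(grad_l, grad_a, grad_b):
    """Pixel-wise maximum of three per-channel gradient-magnitude images.

    Each argument is a 2D nested list (rows of numbers) of identical shape.
    The channel names follow the subscripts L, a and b of the equation.
    """
    return [
        [max(l, a, b) for l, a, b in zip(row_l, row_a, row_b)]
        for row_l, row_a, row_b in zip(grad_l, grad_a, grad_b)
    ]


# Hypothetical 1 x 2 example: the strongest channel wins at each pixel.
G = gradient_image([[1, 5]], [[2, 0]], [[0, 3]])
print(G)  # [[2, 5]]
```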

Next, the HOG features are calculated (Dalal and Triggs, 2005). HOG is based on the orientation of gradients in an image. Using image *G*, we separate the orientations of the gradients (**Figure A2D**) into *n* bins resulting in a HOG feature

$$h = (h_1, h_2, \dots, h_n)$$

of size *n*. Although the value for *n* and the range of orientations could be any arbitrary number, using 8 or 16 bins with 180 or 360 degrees is common. In the present study, we used 16 bins covering 360 degrees (**Figure A2E**). To obtain the HOGs, the strength of all gradients is calculated for each bin. As the last step in the HOG calculation, the histogram values are normalized so that the sum of the values for all 16 bins is one.
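The binning and normalization described above can be sketched as follows. This is a simplified illustration and not the authors' implementation: it assumes that every pixel of *G* contributes one (strength, orientation) pair with the orientation given in degrees, and the function name `hog_feature` is hypothetical.

```python
def hog_feature(gradients, n_bins=16, angle_range=360.0):
    """Histogram of oriented gradients, normalized to sum to one.

    gradients: iterable of (strength, orientation_in_degrees) pairs.
    """
    bin_width = angle_range / n_bins
    hist = [0.0] * n_bins
    for strength, angle in gradients:
        index = int((angle % angle_range) // bin_width)
        index = min(index, n_bins - 1)  # guard against rounding at the upper edge
        hist[index] += strength         # accumulate gradient strength per bin
    total = sum(hist)
    if total == 0:
        return hist                     # uniform image: no gradients at all
    return [value / total for value in hist]


# Two hypothetical gradients: strength 1 at 0 degrees, strength 3 at 90 degrees.
h = hog_feature([(1.0, 0.0), (3.0, 90.0)])
print(h[0], h[4])  # 0.25 0.75
```

With 16 bins covering 360 degrees, each bin spans 22.5 degrees, so an orientation of 90 degrees falls into the bin with index 4.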

The self-similarity measure is obtained by calculating the HOG features for each sub-image of the quadtree (**Figure A2B**) as follows:


The results of all HOG features at the individual levels of the quadtree (steps 1–4) are concatenated into a feature vector, which is the PHOG representation of the image. An example of the underlying histograms is depicted in **Figure A2E**.

To calculate the degree of self-similarity of an image, the Histogram Intersection Kernel (HIK; Barla et al., 2002) is used

$$\text{HIK}(h, h') = \sum_{i=1}^{n} \min\left(h_i, h'_i\right)$$

to determine how similar two HOG features are. In the above equation, *h* and *h′* represent the HOG vectors of two sub-sections of the image, and *h<sub>i</sub>* represents the value in the *i*th bin of *h*. The range of HIK is between 0 and 1. As can be seen from the equation, the HIK function yields a value of one for two matching HOG features. A value of zero is reached if, for every bin, *h<sub>i</sub>* or *h′<sub>i</sub>* is zero. The self-similarity for image *I* at any level *L* is calculated by

$$m_{SeSi}(I, L) = \text{median}\left(\text{HIK}\left(h(S), h(N(S))\right)\right).$$

In this equation, *N(S)* corresponds to the node (section) in the quadtree to which sub-section *S* is compared (parent section, ground section or neighboring sections; see **Figure 3**), and *h* represents the HOG vectors. Sample values for the self-similarity measure are given in the main text. By selecting the median value among the different values, we avoid taking the overshoots into account.
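As a minimal sketch of the two formulas above, the following Python code computes the Histogram Intersection Kernel for two normalized HOG vectors and takes the median over a set of comparisons. The pairing of each sub-section with its comparison section is assumed to be given; the function names and the example histograms are hypothetical.

```python
from statistics import median


def hik(h, h_prime):
    """Histogram Intersection Kernel of two normalized HOG vectors."""
    return sum(min(a, b) for a, b in zip(h, h_prime))


def self_similarity(sub_hogs, reference_hogs):
    """Median HIK over pairs of (sub-section HOG, comparison-section HOG)."""
    return median(hik(h, ref) for h, ref in zip(sub_hogs, reference_hogs))


# Hypothetical 4-bin histograms for three sub-sections and one shared reference.
reference = [0.25, 0.25, 0.25, 0.25]
subs = [
    [0.25, 0.25, 0.25, 0.25],  # identical to the reference: HIK = 1.0
    [0.5, 0.5, 0.0, 0.0],      # half of the mass overlaps: HIK = 0.5
    [1.0, 0.0, 0.0, 0.0],      # little overlap: HIK = 0.25
]
print(self_similarity(subs, [reference] * 3))  # 0.5
```

Because the median is used, a single sub-section with an unusually high or low HIK value does not change the result.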

The mean of all gradient strengths in the gradient image *G* serves us as a measure of the complexity of the image in the present study. A uniform original image with small changes in pixel values would result in a gradient image of low mean values (low complexity) while an image with large changes would result in a gradient image of high mean values (high complexity). To calculate complexity for image *I*,

$$m_{Co}(I) = \frac{1}{N \cdot M} \sum_{(x, y)} G(x, y)$$

the mean value over the gradient image *G* is calculated. In this equation, *N* and *M* are the height and width of image *I*, respectively.
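The complexity measure is simply the mean of the gradient image, which the following short Python sketch illustrates. The gradient image is assumed to be a non-empty nested list of gradient strengths; the function name `complexity` and the example values are illustrative only.

```python
def complexity(gradient_image):
    """Mean gradient strength over an N x M gradient image (nested list)."""
    n_rows = len(gradient_image)       # N, image height
    n_cols = len(gradient_image[0])    # M, image width
    total = sum(sum(row) for row in gradient_image)
    return total / (n_rows * n_cols)


# A uniform image has no gradients, hence zero complexity.
print(complexity([[0, 0], [0, 0]]))  # 0.0
# Larger changes in pixel values give larger mean gradient strength.
print(complexity([[2, 4], [6, 8]]))  # 5.0
```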

The HOG approach also allows deriving a measure for how different the strengths of the gradients are across orientations in an image (anisotropy). Low anisotropy means that the strengths of the orientations are uniform across orientations and high anisotropy means that orientations differ in their overall prominence. For example, in the photograph shown in **Figure A2A**, anisotropy is high because vertical orientations (0 and 180°) are more prominent than the other orientations. As a measure of anisotropy, we calculate

$$m_{AnI}(I,L) = \sigma(H(L)),$$

over the HOG feature entries for image *I* at the last level (*L* = 3 in our calculation). In this equation, *H(L)* corresponds to the HOG feature at level *L* and σ is the standard deviation,

$$\sigma(H(L)) = \sqrt{\frac{1}{m} \sum_{i=1}^{m} \left( h_i - \mu_{H(L)} \right)^2}.$$

In this equation, *μ<sub>H(L)</sub>* corresponds to the mean value of all the bins at level *L*, while *m* represents the number of bins at level *L*. If each section of the image is divided into *sec* new subsections,

$$m = (\text{sec})^{l}$$

at level *l*. An image with a high degree of anisotropy will result in values that change a lot over the feature entries, while an image with a low degree of anisotropy will result in feature values that are approximately equal.
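A minimal Python sketch of the anisotropy measure is given below. It computes the standard deviation in its population form (dividing by *m*) over a flat list of HOG entries; the function name `anisotropy` and the four-bin example histograms are hypothetical and much shorter than the feature vectors used in the study.

```python
import math


def anisotropy(hog_entries):
    """Standard deviation (population form) of the HOG feature entries."""
    m = len(hog_entries)
    mean = sum(hog_entries) / m
    variance = sum((h - mean) ** 2 for h in hog_entries) / m
    return math.sqrt(variance)


# Equal strength in every orientation bin: no anisotropy.
print(anisotropy([0.25, 0.25, 0.25, 0.25]))       # 0.0
# All strength in a single orientation bin: high anisotropy.
print(round(anisotropy([1.0, 0.0, 0.0, 0.0]), 4))  # 0.433
```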

According to Birkhoff (1933), the aesthetic value depends on the ratio of order and complexity. Following this idea, we substituted order by self-similarity to obtain a Birkhoff-like measure, as described in Redies et al. (2012). This measure was calculated for level 3 based on the ground approach.

## Scan path entropy and arrow plots: capturing scanning behavior of multiple observers

#### *Ignace Hooge\* and Guido Camps*

*Department of Experimental Psychology, Faculty of Social Sciences, Helmholtz Institute, Utrecht University, Utrecht, Netherlands*

#### *Edited by:*

*Jaana Simola, University of Helsinki, Finland*

#### *Reviewed by:*

*Sebastiaan Mathôt, Aix-Marseille Université, France Kenneth Holmqvist, Lund University, Sweden*

#### *\*Correspondence:*

*Ignace Hooge, Experimental Psychology and Helmholtz Institute, Utrecht University, Heidelberglaan 1, 3584 CS Utrecht, Netherlands e-mail: i.hooge@uu.nl*

Designers of visual communication material want their material to attract and retain attention. In marketing research, heat maps, dwell time, and time to AOI first hit are often used as evaluation parameters. Here we present two additional measures (1) "scan path entropy" to quantify gaze guidance and (2) the "arrow plot" to visualize the average scan path. Both are based on string representations of scan paths. The latter also incorporates transition matrices and time required for 50% of the observers to first hit AOIs (T50). The new measures were tested in an eye tracking study (48 observers, 39 advertisements). Scan path entropy is a sensible measure for gaze guidance and the new visualization method reveals aspects of the average scan path and gives a better indication in what order global scanning takes place.

**Keywords: scan-path, entropy, eye-tracking, gaze-guidance, data visualization**

#### **INTRODUCTION**

Marketing research companies use eye trackers to give layout advice to enhance advertisements (Pieters and Wedel, 2004; Pieters et al., 2007). This can be done, for example, by comparing gaze behavior on (slightly) different designs. A number of observers look at different designs, and based on how fast a certain element [e.g., brand label, visual or text message (Pieters and Wedel, 2004)] is fixated for the first time, the best design can be chosen. Based on the obtained eye movements advice can be given to slightly change the design. Examples of such advice are making a logo more conspicuous or removing a distracter that hampers message transfer. Heat maps (also referred to as attention maps) are often used to visualize the eye movement data in this evaluation process. Despite the benefits of heat maps to visualize eye movements, it is desirable to have a method to show more of the average scanning behavior of the whole population of observers. Is it possible to visualize the average scan path? This seems a simple question, but scan paths have both temporal and spatial properties and may differ a lot between individual observers. We are convinced that layout advice would gain from more sophisticated eye tracking methods than those that are available nowadays. In this study we suggest two quantitative measures combined with a visualization method to reveal properties of average scanning behavior. The visualization method is based on transition matrices (Goldberg and Kotval, 1999) and a visualization method proposed by Lessing and Linge (2002). The timing measure is based on T50, a reaction time measure that takes into account that areas of interest (from now on referred to as AOI) are not always fixated by all observers in the test (Montfoort et al., 2007). The measure for the spatial aspects of scanning, scan path entropy, is new.

The goal of visual communication material (e.g., ads, road signs, warnings) is to transfer a message effectively. "Message" should be interpreted here in its broadest sense; it may be a literal text message like "Drink Coke," but it may also be the message that parking on the left side of the road is prohibited. The first requisite for visual message transfer is that people perceive the message. Perception may be a problem because the resolution of the retina decreases with eccentricity. A visual stimulus in the periphery may be too small to be resolved by the peripheral retina. There is a second problem for perception of elements in the periphery. Especially in cluttered visual scenes crowding may hamper perception. Crowding is the phenomenon that elements that look like a target element (a brand logo) laterally mask the target (Bouma, 1970; Toet and Levi, 1992; Vlaskamp and Hooge, 2006). The terminology may differ but in the marketing literature effects of visual context have been investigated (Pieters et al., 2007). The negative effects on perception of both the lower resolution of the peripheral retina and crowding can be reduced by making saccadic eye movements. By means of a saccade interesting elements are projected onto the fovea (the most sensitive area of the retina), to make sampling at the highest resolution possible. Another important factor in collecting visual information is time. Fixation time (time between two succeeding saccades) has to be long enough to enable encoding of the visual information around the fixation point. If fixation time is too short to allow for visual encoding, observers may re-fixate (Hooge and Erkelens, 1996; Hooge et al., 2007). There are of course many other factors that play an important role in message transfer (memory, state of mind, language, culture etc.). However, in the chain of collecting visual information, fixation is the first step.
Without fixation, in most cases perception is impossible or hampered and the succeeding processes have no chance to succeed.

In addition, in the real world message transfer suffers from further factors. Concurrent messages compete for attention and exposure time is often limited. Imagine that you drive your car at 120 km/h and a company tries to reach you by means of a billboard standing along the road among other billboards. Similarly, static messages in television commercials are presented for a limited amount of time. These constraints set the prerequisites for ads; the more effective ad is the one that can transfer its message more quickly.

Eye tracking is a logical step in testing and improving visual material because fixations play an important role in gathering visual information. The usual way to investigate gaze behavior is by means of AOIs. Usually AOIs are drawn by hand and commercial packages (Tobii Studio, BeGaze from SMI and Data viewer from EyeLink) have the ability to compute most AOI measures such as dwell time, transition rate and time to first hit (Holmqvist et al. chapter 6). Dwell time and total dwell time are good measures for attention retention capacity; time to first hit is useful to estimate attention-capturing capacity of a visual element.

In manmade images such as paintings, drawings and ads the elements are often organized spatially. For example, ads are designed according to a traditional composition with the brand logo in the bottom right corner acting like a metaphorical sender. One may expect that traditional composition may help to increase the number of fixations at the logo. However, we do not know whether this is true. In other cases designers may have a hypothesis or an intention with their design to guide gaze to a certain location. A good example of clever design can be found in the work of Oliviero Toscani who designed many controversial Benetton ads. The "Food for Life" ad of 1997 and the "White, Black and Yellow" ad seem to have gaze-guiding properties. Humans seem to be good eye trackers themselves; it is well known that gaze direction of one person is a strong cue for another person to direct attention (and gaze) to the gazed-at location (Frischen et al., 2007). There are also attempts to actively guide gaze to certain locations by subtle visual cues (Barth et al., 2006a,b; McNamara et al., 2008). Magicians point with their hands and use their gaze and beautiful assistants to distract the audience's gaze from the actual locations where they do the trick. Here we will refer to gaze guidance if a visual stimulus has the (implicit and sometimes unintentional) capability to bias gaze systematically in a certain direction to increase the number of observers fixating a specific element such as the brand logo or to decrease time to brand logo fixation by guiding the eyes directly to it. Here we hypothesize that visual stimuli with good gaze guiding capacities produce similar scan paths in different observers.

Gaze guidance in relation to whole scan paths is a new topic; however, there is a long history of scan path research and this field is still very active (Noton and Stark, 1971; Brandt and Stark, 1997; Goldberg and Kotval, 1999; Cristino et al., 2010; Jarodzka et al., 2010; Dewhurst et al., 2012; Mathot et al., 2012). This literature focuses on scan paths of individuals and methods to compare individual scan paths. In contrast, we are interested in describing and visualizing scan paths of a population of observers to investigate gaze guidance in visual stimuli. Such a description will be useful in marketing research and the investigation of art, but also in the psychology of scene perception and visual search. What are the minimal requirements for such a description? The measures and visualizations should be capable of capturing temporal order and spatial layout properties of scan paths of a population of observers.

As stated before, scan paths of individual observers should resemble each other more if gaze guidance is present and effective. A first step in investigating the effectiveness of gaze guidance is to study all scan paths of a group of observers viewing the same visual stimulus. We coded and subsequently counted the scan paths to produce a scan path histogram. Two extremes are possible: at one extreme all observers produce different scan paths; at the other, all observers follow the same scan path, which can be seen as ordered group behavior. Information theory has a measure to describe the information in a variable in terms of ordering. This measure is Shannon entropy and it is defined as:

$$H(X) = -\sum\_{i=1}^{n} p\left(x\_{i}\right) \log\_{2} p\left(x\_{i}\right) \tag{1}$$

where *H*(*X*) is the entropy in bits and *p*(*x<sub>i</sub>*) is the proportion of measurement *x<sub>i</sub>*. The idea behind entropy is as follows. If we throw a die, it has 6 possible outcomes (*x* = 1, *x* = 2, *x* = 3, *x* = 4, *x* = 5, and *x* = 6) and the probability of each of these outcomes is 1/6. We can apply the entropy formula (which means adding up the information values, weighted by their probability of occurrence), resulting in *H*(*X*) = 2.5850 bits. Imagine that the die is loaded [the new manipulated probabilities are: *P*(*x* = 1) = 0.1, *P*(*x* = 2) = 0.1, *P*(*x* = 3) = 0.1, *P*(*x* = 4) = 0.1, *P*(*x* = 5) = 0.1 and *P*(*x* = 6) = 0.5]. Now entropy becomes lower: formula (1) gives *H*(*X*) = 2.161 bits. A loaded die is a metaphor for a visual stimulus biasing scan behavior. Biased scan behavior results in lower scan path entropy. In this study we measure eye movements in different advertisements and investigate whether scan path entropy is a sensible measure. If we succeed, we will have a measure (one number) to describe the scanning behavior of a group of observers. In other words, scan path entropy can be used to quantify the gaze guiding properties of a visual stimulus.
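The die example and the step from a scan path histogram to a single entropy value can be checked with a few lines of Python. This is a minimal sketch, not the authors' implementation: the function names and the coded scan paths (strings of AOI labels, one per observer) are hypothetical and serve only as illustration.

```python
import math
from collections import Counter


def shannon_entropy(probabilities):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def scan_path_entropy(scan_paths):
    """Entropy of the scan path histogram of a group of observers."""
    counts = Counter(scan_paths)          # histogram: scan path -> number of observers
    total = sum(counts.values())
    return shannon_entropy(c / total for c in counts.values())


fair_die = [1 / 6] * 6
loaded_die = [0.1] * 5 + [0.5]
print(round(shannon_entropy(fair_die), 4))    # 2.585
print(round(shannon_entropy(loaded_die), 4))  # 2.161

# Hypothetical coded scan paths (assumption): one string of AOI labels per observer.
guided = ["ABT", "ABT", "ABT", "ACT"]      # most observers share one path
unguided = ["ABT", "ACT", "BAT", "CBT"]    # every observer follows a different path
print(scan_path_entropy(guided) < scan_path_entropy(unguided))  # True
```

As with the loaded die, the group in which one path dominates yields the lower entropy.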

Temporal aspects of scanning are at least as interesting as spatial ones. Even if different observers follow similar scan paths, their behavior may differ considerably because some people fixate for a long time, whereas others have a much higher saccade rate. For example, people are known to fixate longer with increasing age (Spooner et al., 1980; Abel et al., 1983). To determine the attention attraction power of an area of interest (AOI) we could simply compute the average time to first AOI hit. However, the average time to first hit does not provide this information in certain situations, and that needs some explanation. Imagine we engage 100 observers in a fictitious experiment with one visual stimulus containing two AOIs. The observers are asked to watch the stimulus for 1.5 s. The data reveal that 45 unique observers fixated AOI nr 1, for the first time after on average 521 ms. AOI nr 2 was fixated by 21 unique observers and their average time to first AOI hit was 309 ms. Based on these two measures we cannot decide which of the two AOIs has the highest attention attraction power. AOI nr 1 attracts more observers than AOI nr 2, but needs more time to achieve that. AOI nr 2 receives its fixations quickly, but attracts fewer observers. This problem can be solved with a measure presented in Montfoort et al. (2007). They used a measure called T<sub>0.5</sub> instead of averaged reaction time (RT) to enable comparison of reaction times produced by two groups (here we refer to T50 instead of T<sub>0.5</sub>). In their experiment one group had high accuracy and long reaction times; the other group had low accuracy and shorter reaction times.

Now we have measures for gaze guidance and attention attraction capacities. As the saying goes, "a picture is worth a thousand words", so we also need a visualization, and one that is more sophisticated than an attention map. A good visualization may help to explain test results to both customers of marketing research companies and designers in a more intuitive manner. The attention map is often used to illustrate the gaze behavior of a group of observers. However, attention maps have many disadvantages. One disadvantage is that attention maps lack temporal ordering (see Holmqvist et al., 2011, chapter 7 for a critical view on attention maps). Here we would like to suggest a more sophisticated visualization based on the work of Lessing and Linge (2002). Lessing and Linge visualized a transition matrix (Stark and Ellis, 1981; Goldberg and Kotval, 1999) by means of arrows and numbers representing transitions from one AOI to another AOI and plotted those superimposed on the visual stimulus. They write "When the analysis is first presented there are no arrows, as arrows from every object to every other object can clutter the screen." Their solution to this problem is elegant: they produce a figure called a "Stand-alone diagram", with AOIs represented by circles and with arrows and numbers between the AOIs representing transition probabilities. However, we still prefer the original visualization because it includes the visual stimulus. Therefore we stick to the original Lessing and Linge (2002) approach, but modify it and combine it with the timing measure from the previous section to add temporal order information. The modification consists of splitting up the visualization into two parts: one component describing eye movement traffic density and the other describing net eye movement direction.

In summary, to facilitate evaluation of the eye movements of a group of observers while viewing a visual stimulus we


To test the quality of our new measures, scan path entropy and T50, we engaged 54 subjects in an experiment with 70 different ads downloaded from the Internet and compared scan path entropy and T50. To be sure that the results are not caused by a specific method of AOI production, we produced our AOIs in two ways: (1) gridded AOIs, in which each ad is divided into 12 equally sized rectangles, and (2) hand drawn AOIs of fixation clusters based on the heat map. We expect to find a high correlation between scan path entropy and T50 irrespective of AOI production method. Of course a validation against another method would be preferable, but we have no knowledge of another method to quantify gaze guidance. We believe that the least we can do is to show that we can deliver reasonable results based on data acquired with a 7-year-old low frequency eye tracker and two very different methods of AOI production.

### **METHODS**

#### **SCAN PATH ENTROPY**

In the following recipe we present a method to compute scan path entropy. Before we start we make some choices. The method presented here aims at measuring gaze guidance to the brand logo (we could have made another choice here). We know that many ads are designed for other purposes than only fixating the brand logo as fast as possible; therefore we expect some of the ads in the present test to have little or no gaze guiding capacity to the brand logo. In the following recipe we compute the entropy from scan paths that end on the brand logo. This recipe can be applied to other AOIs than the brand logo and it can also be used without a target AOI. We will touch on this issue in the discussion.


#### **T<sub>50</sub>**

To measure the attention drawing power of an AOI one needs a measure that takes into account both the number of first AOI hits and the speed at which the attraction occurs. Such a measure is T50. T50 is extracted from the cumulative reaction time distribution.

(1) Construct the cumulative reaction time distribution. Make a list of the points in time of first AOI hits (Holmqvist et al., 2011, page 189). In this list there is a point in time for each observer fixating the AOI at least once. The number of points may be lower than the number of observers in the test, because some observers miss or skip the brand logo. If there is more than one AOI in the visual stimulus, make an "AOI first hit" list for each AOI.


The maximum proportion of observers fixating the target AOI was 0.82. We refer to this number as fixation score or *P*max.

$$F(t; \alpha, \beta, \lambda, k) = \alpha \left[ 1 - e^{-\left(\frac{t-\beta}{\lambda}\right)^{k}} \right] \tag{2}$$

T50 can then be determined by taking the inverse of this function and evaluating it at 0.5.
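The inversion can be written out explicitly. The sketch below is illustrative only: it assumes that the parameters of Equation (2) have already been obtained by some curve fit (the fitting step is not shown), the parameter values are hypothetical, and the function names are ours. Solving Equation (2) for *F* = 0.5 gives *t* = *β* + *λ*[−ln(1 − 0.5/*α*)]<sup>1/*k*</sup>, which only exists when the asymptote *α* exceeds 0.5.

```python
import math


def cumulative_first_hits(first_hit_times, n_observers):
    """Return (time, proportion) points of the cumulative first-hit distribution.

    first_hit_times holds one entry per observer who fixated the AOI at least once;
    dividing by the size of the whole group lets the curve saturate below 1.
    """
    times = sorted(first_hit_times)
    return [(t, (i + 1) / n_observers) for i, t in enumerate(times)]


def weibull_cdf(t, alpha, beta, lam, k):
    """Equation (2): scaled and shifted Weibull cumulative distribution."""
    if t <= beta:
        return 0.0
    return alpha * (1.0 - math.exp(-(((t - beta) / lam) ** k)))


def t50(alpha, beta, lam, k, level=0.5):
    """Invert Equation (2) at F = level; infinite if the curve never reaches it."""
    if alpha <= level:
        return math.inf
    return beta + lam * (-math.log(1.0 - level / alpha)) ** (1.0 / k)


# Hypothetical fitted parameters (assumption); times are in seconds.
params = dict(alpha=0.82, beta=0.2, lam=1.5, k=1.3)
t = t50(**params)
print(round(weibull_cdf(t, **params), 6))  # 0.5
```

Returning infinity for *α* ≤ 0.5 is a design choice of this sketch: it marks AOIs that never attract half of the observers instead of raising an error.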

#### **VISUALISED TRANSITION MATRICES: ARROW PLOT**

For the visualization of the eye movements we use the transition matrix. Transition matrix cells contain frequencies of direct transitions between AOIs. A transition is a saccade from one AOI to another one (Holmqvist et al., 2011, pages 190–191). For this study we constructed transition matrices with self-written MATLAB software, but there is also commercial software available for producing transition matrices (BeGaze from SMI and Noldus Observer, which works with Tobii). To explain how we visualize transition matrices, we use an example with three AOIs (**Figure 2**). A transition matrix describing all transitions between these three AOIs has nine cells (3 × 3). The cells on the diagonal are empty (a transition goes from one AOI to another). The cell in the first row and third column describes the number of transitions from AOI nr 1 to AOI nr 3. The cell in the third row and second column describes the number of transitions from AOI nr 3 to AOI nr 2. We decided to visualize transitions with two figures to avoid too much clutter in the resulting image. In one panel, we visualize the total number of transitions between AOIs with arrows with two arrowheads. In the other panel we visualize the net number of transitions between AOIs. The net number of transitions has a direction, which is indicated with an arrow with one arrowhead. For example, if there are 5 transitions from AOI nr 3 to AOI nr 1 and 4 transitions from AOI nr 1 to AOI nr 3, the net number of transitions from AOI nr 3 to AOI nr 1 is 1 (**Figure 2**). The net number of transitions between AOIs is calculated as follows

$$A\_{\text{net}} = A - A^T \tag{3}$$

where *A* is the transition matrix and *A*<sup>T</sup> is its transpose (its mirror image across the diagonal). The total number of transitions is

$$A\_{\text{total}} = A + A^T \tag{4}$$

**FIGURE 2 | Left panel.** Transition matrix: The diagonal contains zeros because by definition a transition is a saccade from one AOI to another. In this example there are 5 transitions from AOI nr 3 to AOI nr 1. **Right panel.** The black cells in *A*<sub>net</sub> describe the net transitions. The positive number in the third row, second column of *A*<sub>net</sub> denotes 2 transitions from AOI nr 3 to AOI nr 2. The negative number in the second row, first column denotes one transition from AOI nr 1 to AOI nr 2 (a negative number reverses the direction). The black cells in *A*<sub>total</sub> describe the total number of transitions between three AOIs.

Since we added and subtracted *A* and *A*<sup>T</sup>, the cells below and above the diagonal carry the same information. We are only interested in the number of transitions between each AOI pair and therefore we only need the numbers in the cells of the bottom left corner (**Figure 2**, the black cells in *A*<sub>net</sub> and *A*<sub>total</sub>). If the observers make equal numbers of transitions in both directions between the AOIs, we expect no transitions in the net transition matrix. In contrast, we expect the net transition matrix obtained from saccades made in a stimulus with strong gaze guidance to yield many transitions. Therefore the relative number of transitions in the net transition matrix is reported with the figure. This number is calculated as follows

$$F = \frac{\sum A\_{\text{net}}}{\sum A\_{\text{total}}} \tag{5}$$

To produce transition figures with arrows between the AOIs, we transformed the numbers in both the total and the net transition matrix to relative numbers. The width of the arrows is scaled with the maximum relative number in the matrix.
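Equations (3) to (5) can be sketched in plain Python as follows. This is an illustrative sketch rather than the authors' MATLAB code. Only the values 5 (AOI nr 3 to AOI nr 1) and 4 (AOI nr 1 to AOI nr 3) come from the text; the other matrix entries are hypothetical and were chosen so that the net values match those described for Figure 2. The sketch also assumes that the sums in Equation (5) run over the cells below the diagonal and use absolute net values, because the signed entries of *A*<sub>net</sub> would otherwise partly cancel.

```python
def transpose(m):
    """Mirror a square matrix across its diagonal."""
    return [list(row) for row in zip(*m)]


def net_and_total(a):
    """Equations (3) and (4): A_net = A - A^T and A_total = A + A^T."""
    at = transpose(a)
    n = len(a)
    net = [[a[i][j] - at[i][j] for j in range(n)] for i in range(n)]
    total = [[a[i][j] + at[i][j] for j in range(n)] for i in range(n)]
    return net, total


def net_fraction(a):
    """Equation (5), assuming lower-triangle sums and absolute net values."""
    net, total = net_and_total(a)
    n = len(a)
    lower = [(i, j) for i in range(n) for j in range(i)]  # cells below the diagonal
    net_sum = sum(abs(net[i][j]) for i, j in lower)
    total_sum = sum(total[i][j] for i, j in lower)
    return net_sum / total_sum if total_sum else 0.0


# Rows are "from" AOIs, columns are "to" AOIs; entries other than 5 and 4 are hypothetical.
A = [[0, 3, 4],
     [2, 0, 4],
     [5, 6, 0]]
net, total = net_and_total(A)
print(net[2][0], net[2][1], net[1][0])  # 1 2 -1
print(round(net_fraction(A), 3))        # 0.167
```

A value near 0 indicates that transitions between AOIs are balanced in both directions; a value near 1 indicates that almost all traffic flows one way.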

### **EXPERIMENT + METHODS**

#### **AREA OF INTEREST PRODUCTION**

Eye movements were analyzed with 2 different sets of AOIs. In the clustered condition, AOIs were hand drawn in Adobe Photoshop and based on the heat map. This is a data driven approach and all clusters of fixation received an AOI. In the gridded condition, the stimulus area was divided in 12 equal rectangles. Horizontally oriented displays were divided in a 3 × 4 grid and vertically oriented stimuli were divided in a 4 × 3 grid. Examples of clustered and gridded AOIs can be found in **Figure 3**.
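Assigning a fixation to a gridded AOI amounts to integer division of its coordinates. The sketch below is only an illustration under stated assumptions: it reads "3 × 4" as 3 rows by 4 columns, it assumes the ad fills the whole 1280 × 1024 screen (an ad shown with black borders would need its own offset and size), and the function name and row-major numbering are ours.

```python
def grid_aoi(x, y, width, height, rows, cols):
    """Return the 0-based, row-major index of the grid cell containing (x, y).

    Returns None when the point falls outside the stimulus area.
    """
    if not (0 <= x < width and 0 <= y < height):
        return None
    col = int(x * cols / width)
    row = int(y * rows / height)
    return row * cols + col


# Horizontally oriented stimulus: 3 rows x 4 columns (assumed reading of "3 x 4").
print(grid_aoi(0, 0, 1280, 1024, 3, 4))        # 0
print(grid_aoi(1279, 1023, 1280, 1024, 3, 4))  # 11
print(grid_aoi(700, 500, 1280, 1024, 3, 4))    # 6
```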

#### **OBSERVERS AND PROCEDURE**

For this study we recorded the eye movements of 54 observers (27 males, 27 females; age range 18 to 55). Observers were positioned in front of a Tobii 1750 Eye Tracker and were individually calibrated with a nine-point calibration. They were instructed to browse the ads as they would if they stumbled upon them in a magazine (this instruction is comparable to that of Pieters and Wedel, 2004). By pressing the left arrow button they could go on to the next ad. Between stimulus presentations a black screen was presented.

#### **MATERIAL**

For the study we used unmodified advertisements downloaded from the Internet. The criteria for an ad to be included in the study were that it was available in a sufficiently high resolution, and that the language used in the ad was either Dutch or English or a mixture of the two. We chose 70 different digital ads that were all scaled to the maximum size that could be presented on the screen of the Tobii 1750 (1280 pixels horizontally, 1024 pixels vertically). Depending on the orientation of the original ad, the rest of the screen was filled by a black background. Not all layouts of the chosen ads were suitable for the analysis done in this study. We included 39 of the 70 ads in the analysis with both the gridded AOIs and the clustered AOIs. These 39 ads met the following criteria.


#### **FIXATION DETECTION**

**FIGURE 3 | Two methods for AOI production used in this study.** The **left panel** contains the gridded AOIs. The **right panel** contains the hand drawn AOIs. These AOIs are based on the heat map **(middle panel)** and aim at capturing fixation clusters. The target AOI (the brand logo) is coded with "*t*". This manuscript is not about AOI production. We used these two extreme methods to show the robustness of our measures, not to show any preference for one of the two methods.

We performed fixation detection with a computer program that marked fixations using an adaptive velocity threshold method. First, velocities were obtained by fitting a parabola through three subsequent data points of the position signal. We used the derivative of this parabola to estimate the velocity at the second (center) data point. This procedure was repeated for all data points (except the first and the last point). In the present analysis, everything that is not a saccade is called a fixation. To remove the saccades from the signal we calculated the average and standard deviation of the absolute velocity signal (composed from the vertical and horizontal velocity signals). All data points having absolute velocities higher than the average velocity plus 3 times the standard deviation were removed. This procedure was repeated until the velocity threshold converged to a constant value or the number of repetitions reached 50. Then we removed fixations having durations shorter than 60 ms from the analysis. We also removed saccades smaller than 1.0°. When a saccade was removed, the preceding and succeeding fixations were merged. This is a velocity based fixation detection method suitable for data from low frequency eye trackers such as Tobii 1750 (50 Hz), Tobii T60 (60 Hz), EasyGaze (52 Hz max) and SMI red (60–120 Hz). One may avoid fixation detection by directly calculating the dwell time from raw data in combination with the AOIs, at the cost of overestimating dwell times because parts of the saccade time are counted as dwell.
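The adaptive velocity threshold described above can be sketched as follows. This is a minimal illustration, not the program used in the study. It assumes uniformly sampled data, in which case the derivative of a parabola through three successive samples, evaluated at the center sample, equals the central difference. The use of the population standard deviation and the convergence tolerance are our assumptions, and the later steps (removing short fixations and small saccades) are omitted.

```python
import math
import statistics


def absolute_velocity(xs, ys, dt):
    """Absolute velocity at every interior sample of a uniformly sampled signal."""
    velocities = []
    for i in range(1, len(xs) - 1):
        vx = (xs[i + 1] - xs[i - 1]) / (2 * dt)  # parabola derivative at the center point
        vy = (ys[i + 1] - ys[i - 1]) / (2 * dt)
        velocities.append(math.hypot(vx, vy))
    return velocities


def adaptive_threshold(velocities, n_sd=3.0, max_iter=50, tol=1e-6):
    """Iterate mean + n_sd * SD over the remaining samples until it stops changing."""
    kept = list(velocities)
    threshold = math.inf
    for _ in range(max_iter):
        new_threshold = statistics.mean(kept) + n_sd * statistics.pstdev(kept)
        if abs(new_threshold - threshold) < tol:
            break
        threshold = new_threshold
        kept = [v for v in kept if v <= threshold]  # drop putative saccade samples
    return threshold


# Hypothetical velocity signal (deg/s): slow fixation samples plus a few fast saccade samples.
velocities = [10.0] * 95 + [300.0] * 5
threshold = adaptive_threshold(velocities)
saccade_samples = [v for v in velocities if v > threshold]
print(threshold, len(saccade_samples))  # 10.0 5
```

In this noise-free toy signal the threshold collapses onto the fixation velocity itself; with real data the fixation samples have spread, so the threshold settles above them.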

### **RESULTS**

#### **DATA QUALITY AND DATA EXCLUSION**

We measured data quality in each trial by determining the RMS noise (Holmqvist et al., 2011, page 35) in the horizontal and vertical eye movement signal during fixation (meaning the saccades were removed from this signal) and the saccade detection velocity threshold. RMS noise is a measure of the variable error and its causes may vary from physiology to operator quality (Holmqvist et al., 2012; Nyström et al., 2013). The average RMS noise level over all trials and observers measured 0.28° for the x-component and 0.33° for the y-component; the average saccade detection velocity threshold converged to 24.5°/s. We combined horizontal and vertical RMS noise into one number by adding up the horizontal and vertical components using the theorem of Pythagoras. From these values we constructed one histogram of RMS values per observer (each trial delivers one value). From the original data set of 54 observers we removed 6 observers due to high values for mean combined (horizontal and vertical) RMS noise (>30 pixels, approximately 0.8°).

#### **T<sub>50</sub>**

To estimate the attention drawing power of brand logos we determined T50 of the brand logo in 39 different ads. To rule out the possibility that our results are due to the choice of AOIs, we chose to make our AOIs in two very different ways. We refer to the two methods as gridded and clustered AOIs. **Figure 4** shows values for T50. Each dot in the figure represents data for one advertisement; the *x*-value is the T50 obtained from the clustered AOIs and the *y*-value is the T50 obtained from the gridded AOIs. T50 ranges from 0.42 to 7.8 s for the clustered AOIs and from 0.52 to 4.66 s for the gridded AOIs. T50 reaction times are shorter for gridded AOIs than for clustered AOIs. This is probably due to the larger size of the gridded AOIs. We find a high correlation (*r* = 0.84) between the T50 values determined with these two very different methods.

#### **BRAND LOGO FIXATION**

Analogous to accuracy in a visual search task, we determine *P*<sub>max</sub> or fixation score in this free viewing task (**Figure 1**; *P*<sub>max</sub> = 0.8 means that 80% of the observers fixated the brand logo at least once). The fixation score of the brand logo AOI is high, irrespective of AOI production method. Most ads have a fixation score higher than 0.8 and the correlation between the scores obtained with the two AOI production methods is 0.95 (see **Figure 5**).

**FIGURE 4 | T50 determined from gridded AOIs vs. T50 determined from clustered AOIs.** Each data point represents data from 48 observers in one ad. T50 represents the time that is required for an AOI to attract fixations from the first unique 50% of the observers in the population.

#### **ENTROPY**

We stated before that if the scan path entropy measure is sensitive, it may be a useful measure. Entropy obtained from the clustered AOIs ranges from 0.78 to 5.11 and entropy from the gridded AOIs ranges from 2.99 to 5.44. We stated that we expect a correlation between T50 and scan path entropy. **Figure 6** shows T50 vs. scan path entropy.

Is the scan path entropy high or low? The best way to answer that question is to compare the scan path entropy to the minimum and maximum entropy possible. Minimal entropy would be reached if all subjects follow the same scan path. If this is not the case the number of scan paths is always higher than one. Determining the maximum entropy is difficult. Here we compared the obtained entropy with the maximum entropy for the same number of paths (**Figure 7**, dashed line). For the highest number of paths, entropy almost equals the maximum entropy (for both AOI production methods). For the lower number of paths, entropy is lower than the maximum entropy for that number of paths (gridded AOIs). With clustered AOIs, for the lowest number of paths entropy almost equals the maximum entropy. The explanation is not complicated. Ads without gaze guidance produce many different paths without preference for any of the paths. This results in maximal entropy. Ads with gaze guidance produce fewer paths than possible and some of these paths are favored, resulting in lower entropy. For low numbers of paths (around 1 and 3) the difference between minimal and maximal entropy is quite small.

**FIGURE 7 | Entropy as function of the number of scan paths.** A single dot represents data from an individual advertisement. The line denotes the maximum entropy given the number of paths, calculated from a flat distribution of scan paths. In the gridded condition, for a high number of paths the measured entropy approaches the maximum entropy for that specific number of paths. In the clustered condition, for both high and low numbers of paths the measured entropy approaches the maximum entropy for that specific number of paths.
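The maximum-entropy reference follows directly from the definition of Shannon entropy in Equation (1). If *N* distinct scan paths occur equally often (a flat distribution), each path has proportion 1/*N*, so that

$$H\_{\max}(N) = -\sum\_{i=1}^{N} \frac{1}{N} \log\_{2} \frac{1}{N} = \log\_{2} N$$

For example, a single shared path gives 0 bits and three equally frequent paths give at most log<sub>2</sub> 3 ≈ 1.585 bits, which illustrates why the gap between minimal and maximal entropy is small when the number of paths is low.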

#### **VISUALIZATION**

**Figures 8** and **9** show two examples of scan path visualization. **Figure 8** shows an example with a low number of AOIs. It was already clear in the visualizations of Lessing and Linge (2002) that too many elements cause clutter and thus make the figure less usable as a visual interface. One way to avoid clutter is splitting up the visualization into two panels, namely the total and the net transitions. We scale the width of the arrows to code for the relative number of transitions (no numbers required, again less clutter). **Figure 8C** shows an arrow plot depicting the total relative number of transitions. It also shows T50 values. From this figure we can see that scanning started in the center. **Figure 8C** shows many transitions between the text and the visual (the gun). **Figure 8D** shows the net transitions and it contains only 8% of the transitions. This is an indication of low gaze guidance capacity. With this knowledge we can look at panel **(B)**, showing the cumulative curves and T50 for all AOIs. Here we can see that after quickly fixating both the gun and the text AOIs, it takes a long time before the observers gaze at the bullets and the brand logo. The T50 values for the bullets (3.31 s) and the brand logo (3.27 s) suggest that these two AOIs compete for attention. If we had to advise the designer, we would ask to make the bullets a less attractive target or make the brand logo stand out better, facilitating quicker brand logo fixation. A re-test could be used to validate the changes applied to the design. **Figure 9** clearly shows the limitation of our visualization method. Too many AOIs make the figure hard to interpret. However, a global scan pattern can still be extracted from this figure. Especially in the top of **Figure 9C** (the net transitions), reading from left to right is clearly visible.

### **DISCUSSION**

#### **DESIGNING A TEST**

What to measure and how to test eye movement behavior with the new quantitative measures? In case of ad enhancement we suggest to build a database from quantitative eye tracking measures (including T50, entropy and (visualized) transition matrices). New ads should be measured each week with a large number of observers (±50–100). However, such an approach is too expensive for most usability and marketing research companies. Interpreting quantitative measures is also possible based on a simple differential measurement instead of a large database. This is of course much cheaper and easier to carry out.

Imagine a client who provides a researcher with an ad and asks whether the design is good enough (it may sound strange, but this is usual practice). The researcher should be provided with (slightly) different versions of the ad, to be able to carry out a differential measurement that allows the quantitative eye tracking measures to be interpreted. Then T50 and entropy could be used to choose the best design. In addition, if there is the possibility to interview the designer or client, the researcher should ask for the design goals, hypotheses, discussions about and ideas behind the design. Comparing the ideas and goals of designers with actual saccadic scanning behavior may help to improve the design and evoke new ideas; the arrow plots can be helpful in this process. For example, there is still a strong belief in the field that observers scan ads in a Z-pattern. The idea behind Z-scanning is that people scan from left to right and from top to bottom (in a reading-like way). In this study we saw many different scan patterns (including, but certainly not exclusively, Z-scanning). As stated before, when different versions of the ad compete for publication the shortest brand logo T50 may be used as criterion. However, depending on the goal of the ad, a mix of T50 values (for attention drawing capacity) combined with total dwell times (for attention retaining capacity) may be more suitable for choosing the best ad. We restricted ourselves in this study to T50 measures for the brand logo for reasons of simplicity.

**FIGURE 8 |** Panel **(A)** Hand drawn AOIs. The shape and size of the AOIs are not very critical in sparse displays. Panel **(B)** Cumulative plots for the 4 AOIs of panel **(A)**. Cumulative curves for AOI-Text and AOI-Gun are steep with high fixation scores (100 and 98%) and short T50 (0 s and 0.48 s). Panel **(C)** shows a visualization of the total transition matrix. The orange labels denote T50. The gun is fixated first and many transitions go to the text. Later there are transitions to the bullets (left) and the brand logo (right). Panel **(D)** shows a visualization of the net transition matrix. Only 8% of the transitions are in this figure, indicating that gaze guidance is not very strong in this stimulus.

If the designer provides only one ad, the researcher could add ads from competing companies or ads having a similar composition to the eye tracking test. Doing research without direct competitors from the same designer or company is of course more difficult and less effective. One can always give a qualitative description of scan patterns as in **Figure 8**. Differential measurement within one ad is also possible. An example of this is provided in the Results section on *Visualization*.

#### **DOING STATISTICS WITH T<sub>50</sub> AND ENTROPY**

In the present article we did not yet use statistics. Imagine that there is doubt about the sensitivity of T50 or scan path entropy in a situation where two visual stimuli produce almost identical values. To solve this problem, we advise using a bootstrapping method (Efron, 1979). Bootstrapping is a resampling method that computes an estimator (such as T50) on a subpopulation of responses drawn from the whole population. In bootstrapping, drawing a subpopulation is repeated a large number of times (in the range of 5000–10,000 times). The resulting distribution of estimates indicates the variability of the measure, which can be used to judge whether two values really differ.
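A bootstrap for T50 might look like the sketch below. Several choices here are assumptions made for illustration and are not taken from the study: observers are resampled with replacement at the original group size, T50 is read directly from the empirical cumulative distribution instead of a fitted curve, observers who never fixated the AOI are kept as `None` so that the fixation score is resampled too, and a percentile interval summarizes the result. The first-hit times are hypothetical.

```python
import math
import random


def empirical_t50(first_hits, level=0.5):
    """Time at which `level` of ALL observers have hit the AOI; inf if never reached.

    first_hits has one entry per observer: a time in seconds, or None for no hit.
    """
    times = sorted(t for t in first_hits if t is not None)
    needed = math.ceil(level * len(first_hits))
    return times[needed - 1] if 0 < needed <= len(times) else math.inf


def bootstrap_t50(first_hits, n_boot=5000, ci=0.95, seed=0):
    """Percentile confidence interval of T50 from resampling observers with replacement."""
    rng = random.Random(seed)
    n = len(first_hits)
    estimates = sorted(
        empirical_t50([first_hits[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    lower = estimates[int((1 - ci) / 2 * (n_boot - 1))]
    upper = estimates[int((1 + ci) / 2 * (n_boot - 1))]
    return lower, upper


# Hypothetical first-hit times (s) for 10 observers; two never fixated the AOI.
hits = [0.4, 0.6, 0.7, 0.9, 1.1, 1.3, 1.8, 2.5, None, None]
print(empirical_t50(hits))  # 1.1
print(bootstrap_t50(hits, n_boot=2000))
```

If the intervals obtained for two stimuli overlap strongly, the difference between their T50 values should be interpreted with caution. The same resampling scheme could be applied to scan path entropy by swapping the estimator.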

#### **RELATION BETWEEN CHOICE OF AOIs, T<sub>50</sub> AND ENTROPY**

How sensitive are T50 and scan path entropy to the choice of AOI? In this study we used two completely different methods to produce AOIs and we found correlated but different values for T50 and scan path entropy (**Figures 4, 6** and **7**). How T50 and scan path entropy are related to the AOIs may depend on the nature of the visual stimulus. If the visual stimulus is sparse, it is recommended to make the AOIs as large as possible. That is possible because in such stimuli there is not much crowding and the conspicuity areas of visual elements are large (Engel, 1971; Toet and Levi, 1992). Large conspicuity areas imply that objects are visible at larger eccentricities (or larger distances from the gaze point), allowing observers to overview larger areas around the gaze point.

In dense stimuli, however, researchers may make many choices during production of AOIs that can affect AOI measures. An example can be found in **Figure 8C**, where two fixation clusters can be found in the gun AOI. The previous makes clear that comparison of AOI measures (such as T50, scan path entropy and total dwell time) between different studies may be impossible without taking into account the nature of the AOIs. Another problem is that in dense visual stimuli the number of AOIs may become too large to produce both entropy values that are interpretable and arrow plots that are informative. Here are some practical suggestions


Applying one of the points above may be helpful in doing effective research, however any of the choices should be reported and motivated.

#### **LIMITATIONS OF THE ARROW PLOT**

The arrow plot is beautiful and appealing but **Figure 9** shows clearly that if the number of AOIs exceeds a certain number (4 or 5), the arrow plot becomes cluttered and hard to interpret. Both restrictions to the data set and some modifications to the arrow plot would be helpful to avoid too much cluttering and too many arrows.


Other modifications to the arrow plot are possible. In the present figures, the width of the arrows codes the relative number of transitions. The width of the arrows in the net and total transition arrow plots is determined independently. The two can be coupled together or made to represent absolute numbers of transitions. Another possibility is to measure eye movements in two different visual stimuli and color-code the arrows if they represent a significantly higher number of transitions in one of the two visual stimuli.

#### **ENTROPY IS A USEFUL MEASURE IN PRACTICAL SITUATIONS**

Why use entropy as a measure to qualify ads? There is a correlation between scan path entropy and T50, but T50 is a direct measure and scan path entropy is an indirect one. If one is only interested in direct performance measures such as total dwell time and T50, there is no need to determine entropy. However, if advertisement enhancement is of interest, both entropy (or the underlying histogram from which entropy is calculated) and the arrow plot may be handy tools. They may provide insight into the cause of scan patterns that produce long or short T50s. Consider the following: the attention attracting power of brand logos can be increased easily by


to make them more conspicuous (Toet and Levi, 1992; Kooi et al., 1994). However, designers may prefer not to increase conspicuity, for aesthetic reasons or because they have to stick to a corporate standard design. In that case increasing gaze guidance capacity is an alternative strategy to make observers look at a specific element.

There is another reason to be interested in scan path entropy. Many advertisements contain a story that is important for message transfer. It is assumed that fixation order is important for the observer to understand the story (whether that is really the case is an interesting question too). Scan path histograms, scan path entropy and arrow plots may provide the necessary information to investigate this.

### **CONCLUSION**

We suggest a new visualization method, "the arrow plot", in combination with two quantitative measures, T50 and scan path entropy. These methods were applied to 39 ads and we showed with two methods of AOI production that T50 and scan path entropy are robust measures. The arrow plot reveals aspects of average scanning behavior that are hidden in attention maps. We discussed the pros and cons and suggested ways to adapt the new measures and visualizations to specific research questions. Our new methods will be applicable to the field of art and eye movements, the field of psychology (free viewing and visual search) and the fields of ergonomics and usability.

#### **ACKNOWLEDGMENTS**

We would like to thank Roy Hessels for careful reading of the manuscript.

#### **REFERENCES**


*2010 Symposium on Eye Tracking Research and Applications* (New York, NY: ACM).


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 30 September 2013; accepted: 13 December 2013; published online: 24 December 2013.*

*Citation: Hooge I and Camps G (2013) Scan path entropy and arrow plots: capturing scanning behavior of multiple observers. Front. Psychol. 4:996. doi: 10.3389/fpsyg.2013.00996*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology.*

*Copyright © 2013 Hooge and Camps. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Recall and recognition of in-game advertising: the role of game control

### *Laura Herrewijn\* and Karolien Poels*

*Department of Communication Studies, University of Antwerp, Antwerp, Belgium*

#### *Edited by:*

*Jaana Simola, University of Helsinki, Finland*

#### *Reviewed by:*

*Sven-Erik Fernaeus, Karolinska Institutet, Sweden*

*Stephen Doherty, Dublin City University, Ireland*

#### *\*Correspondence:*

*Laura Herrewijn, Department of Communication Studies, University of Antwerp, Sint-Jacobstraat 2, 2000 Antwerp, Belgium e-mail: laura.herrewijn@uantwerpen.be*

Digital gaming has become one of the largest entertainment sectors worldwide, increasingly turning the medium into a promising vehicle for advertisers. As a result, the inclusion of advertising messages in digital games, or in-game advertising (IGA), is expected to grow steadily over the coming years. However, much work is still needed to maximize the effectiveness of IGA. The aim of the study was to contribute to IGA effectiveness research by analyzing the impact of two factors on the processing of IGA in terms of brand awareness. The primary objective was to investigate the effect of a person's sense of involvement related to the control and movement mechanisms in a game (i.e., kinesthetic involvement). A within-subjects experiment was conducted in which control over a racing game was varied by manipulating game controller type, resulting in two experimental conditions (symbolic versus mimetic controller). Results show that the variation in game controller has a significant effect on the recall and recognition of the brands integrated into the game, and that this effect can be partially attributed to players' perceived control over the game: when a game is easier to control, the control mechanisms require less conscious attention, freeing attentional resources that can subsequently be spent on other elements of the game such as IGA. A second factor that was taken into account in the study was brand prominence. The influence of both the size and spatial position of in-game advertisements was examined. Findings demonstrate that there are significant differences in effectiveness between different types of placements. Spatial position seems to be the most important placement characteristic, with central brand placements obtaining the highest recall and recognition scores. The effect of ad size is much smaller, with the effectiveness of the large placements not differing significantly from the effectiveness of their smaller counterparts.

**Keywords: in-game advertising, game control, player involvement, brand prominence, brand awareness**

#### **INTRODUCTION**

No other entertainment sector has experienced the same explosive growth as the digital game industry. A report from DFC Intelligence forecasts that the global market for digital games is expected to grow from \$63 billion in 2012, to \$78 billion in 2017 (DFC Intelligence, 2013), making it one of the largest entertainment sectors worldwide. Moreover, according to the Entertainment Software Association, 58% of U.S. citizens play digital games; 45% of all game players are women; and the average game player is 30 years old (32% of all game players are younger than 18 years, 32% are between 18 and 35 years, and 36% are over 36 years) and has been playing games for 13 years (Entertainment Software Association, 2013). These figures reflect the increasing popularity of digital games, showing that there are millions of people from all sociodemographic groups who increasingly enjoy playing digital games in their spare time. Digital games have thus surpassed their status as being a predominantly male pastime, and have grown into a mainstream entertainment medium that touches every segment of the population.

Consequently, the advertising industry has taken an interest in digital games. The appearance of advertising inside digital games goes as far back as the early 1970s, when the computer game *Lunar Lander*<sup>1</sup> included a McDonald's restaurant as a hidden feature or easter egg in its gameplay. The goal of *Lunar Lander* was to land a lunar module on the moon. If the player landed on exactly the right spot, the McDonald's restaurant would appear and the astronaut would order a Big Mac hamburger to go. Crashing into the restaurant, however, destroyed it permanently and the game would display a message scolding the player for destroying the only McDonald's on the moon (Vedrashko, 2006; Skalski et al., 2010). In this early example of advertising inside a digital game, the brand was integrated because of its humorous rather than commercial value. Advertisers began showing explicit interest in digital games in the early 1980s, though, and from the 1990s on, they began to see digital games as an appropriate and viable medium for incorporating their advertisements and reaching their target markets (Schneider and Cornwell, 2005; Vedrashko, 2006; Mau et al., 2008; Mackay et al., 2009; Nicovich, 2010; Skalski et al., 2010). This interest in the use of digital games as a medium for the delivery of advertisements has been increasing ever since, and is predicted to keep growing steadily over the next several years. On a global basis, advertising related to digital games is expected to reach \$7.2 billion by 2016, up from \$3.1 billion in 2010 (DFC Intelligence, 2011). This includes *in-game advertising* (IGA) or the incorporation of advertising messages into existing digital games, a practice similar to product placement in movies or television shows (Nelson et al., 2006; Bogost, 2007; Interactive Advertising Bureau, 2010). IGA comes in many different formats, such as the inclusion of real-world analogs (e.g., billboards, poster ads, radio spots and television commercials), product placements, branded music, and branded characters in digital games. These formats have been maturing throughout the years, advancing from very static toward more dynamic types of IGA (Schneider and Cornwell, 2005; Vedrashko, 2006; Bogost, 2007; Bardzell et al., 2008). This means that, due to the online capabilities of modern digital games, advertisements can now be dynamically delivered and updated in-game based on multiple criteria, such as players' demographic, regional, and gamer profiles, time of day, and so on (Schneider and Cornwell, 2005; Vedrashko, 2006; Bogost, 2007; Bardzell et al., 2008). Apart from reaching an ever-growing, diverse audience and the possibility to dynamically place, track, and alter ad units in games, the appeal of IGA also lies in the long shelf-life and replay value of games (the average game is played for up to 30 h), and the fact that integrating ads into digital games can provide brands with the opportunity to become an integral part of the digital game experience, reaching out to players in a highly vivid, interactive and immersive entertainment environment (Nelson, 2002, 2005; Schneider and Cornwell, 2005; Mackay et al., 2009; Nicovich, 2010). Moreover, for game publishers and developers, the integration of advertising is an interesting means to subsidize the rising development and marketing costs of their games without having to increase the retail price, which also benefits the gamer as end user (Chambers, 2005).

<sup>1</sup>Digital Equipment Corporation (1973).

However, much work is still needed to maximize the effectiveness of IGA. Prior research has shown that IGA effectiveness often depends on a multitude of context-related factors, such as the type of brand or advertisement that is integrated, the prominence of the brand placement, the amount of congruence between game and product, the situational circumstances in which the ad is encountered, the emotions and experiences of the player during the encounter, etcetera (e.g., Nelson, 2002, 2005; Grigorovici and Constantin, 2004; Schneider and Cornwell, 2005; Mau et al., 2008; Mackay et al., 2009; Lewis and Porter, 2010). The aim of the current study is to further analyze the impact of this contextual component in an *experimental setting*, in order to come to a better understanding of the issues and mechanisms that are critical to the effective use of IGA.

As the starting point of our study, we take the *limited capacity model of motivated mediated message processing* [*LC4MP* (Lang, 2009)]. This model states that a person's total attentional capacity (and thus his ability to cognitively process information) is limited. This has important implications for the effectiveness of IGA. Digital games are considered to be highly interactive and involving, with a multitude of tasks and stimuli vying for attention at the same time. Getting a brand noticed and remembered in such an involving game context is not self-evident, since people allocate their attentional resources to those aspects of an activity that are most relevant to them at a particular time. In a digital game context, this means that people will focus their attention primarily on the most essential tasks at hand, i.e., tasks and information that are central to furthering their progress in-game, while leaving fewer mental resources for the processing of secondary information such as advertisements that are embedded into the game (Grigorovici and Constantin, 2004; Lee and Faber, 2007; Lang, 2009). Keeping this in mind, there are several contextual factors that are important to consider when studying the effectiveness of IGA and the player's ability to cognitively process these advertising messages. The current study specifically examines the impact of two of these factors on the processing of IGA in an experimental setting.

First of all, we look at the effect of a player's sense of *involvement* in the game on the way he is able to process IGA. Games are an interactive, vivid, engaging, immersive, and complex cultural form that requires an active audience (Nelson, 2005) and is able to induce a wide variety of emotions and experiences (Poels et al., 2012). These medium-specific characteristics are considered responsible for an enhanced level of audience (i.e., player) involvement (Vorderer, 2000). Prior research has already shown that player involvement is a relevant factor to consider in an IGA context: different levels of player involvement affect the way players process IGA in terms of brand awareness (i.e., brand recall and brand recognition; e.g., Grigorovici and Constantin, 2004; Lee and Faber, 2007).

However, involvement is a multidimensional construct and in the specific context of digital games, it is understood as a combination of six primary sources of engagement, namely: control and movement in the game environment (kinesthetic involvement); the exploration, navigation, and learning of the game's spatial domain (spatial involvement); players' awareness of and interaction with other agents in the game environment (shared involvement); the emotions that are generated during gameplay (affective involvement); story elements that have been written into a game, and those that emerge from the player's interaction with the game (narrative involvement); and the pursuit of goals and the decision-making and reward systems integrated in a game (ludic involvement). These six dimensions occur with varying degrees of intensity and with frequent, fluid shifts in attention (Calleja, 2011).

However, this multidimensional nature of player involvement has not been taken into account in IGA effectiveness research before, implying that the results of prior studies reveal only part of the picture. Therefore, this study scrutinizes the effects of one dimension of player involvement, namely *kinesthetic involvement* or the player's involvement related to the modes of control and movement in a game (Calleja, 2011), in order to analyze how attention toward and involvement with this specific component of the game influence how people process IGA. Player control and in-game movement are a central part of the digital gaming experience, creating a direct link between the player and his avatar in the game world that contributes to the interactive nature of games. There are many different forms of game control, and the amount of freedom that is allowed and the nature and difficulty of the controls have a great impact on the player's sense of involvement in the game environment (Calleja, 2011). The dimension requires more conscious attention when the player is still learning to use the game controls, or when a situation demands a complex and very challenging sequence of actions (Calleja, 2011). The main objective of this study is to examine the impact of kinesthetic involvement on the processing of IGA in an experimental context by manipulating the *player's control over the game world*.

Secondly, we investigate the impact of an additional contextual factor that might alter the processing of advertising inside a digital game environment, namely the *prominence of the brand placement*. Brand prominence is mostly defined as a factor that depends on placement characteristics such as ad size, color, attractiveness, and spatial position. Several IGA studies have already looked at the effect of these placement characteristics, showing that prominent brand placements are generally better in capturing the player's attention, resulting in a positive effect on brand awareness (e.g., Grigorovici and Constantin, 2004; Schneider and Cornwell, 2005; Acar, 2007; Lee and Faber, 2007; Bardzell et al., 2008; Jeong and Biocca, 2012). However, these studies have focused on the impact of only one placement characteristic (i.e., ad size or spatial position) or on the influence of all characteristics at the same time. In the current study, we will elaborate on the effect of brand prominence by examining how both *ad size* and *spatial position* relate to people's response to the brand placements, in different combinations.

#### **STUDY SET UP AND HYPOTHESES**

#### **KINESTHETIC INVOLVEMENT**

When playing a digital game, players have the opportunity to influence – to varying degrees – what happens in the game environment. The kinesthetic dimension of player involvement deals specifically with this exertion of agency, manifesting itself in the form of avatar control and the sensation of movement this can produce (Calleja, 2011). Consequently, kinesthetic involvement is closely connected to the modes of game control that are possible. Digital games can be controlled with a wide range of different input devices or game controllers that have progressed considerably over time. These modes of game control range from the more traditional, symbolic game controllers to the relatively new, motion-based symbiotic, and mimetic game controllers (Calleja, 2011; Skalski et al., 2011).

On one end of the spectrum, there is the symbolic control of controller buttons, keys, and thumb sticks, as used in the traditional keyboard and mouse combo and gamepads (e.g., traditional *Microsoft Xbox 360* and *Sony PlayStation 3* gamepad controllers). In the case of symbolic control, there is no direct, mimetic relationship between the actual movement that is performed by the player and the corresponding movement in-game, executed by the avatar. Actions like running and jumping are not controlled through real life movements; players simply press symbolic buttons that they know to correspond with but are not strongly related to the actions in-game (Calleja, 2011; Skalski et al., 2011).

On the other end of the spectrum, there is symbiotic control, in which the player's physical movements in real life are detected and mapped onto the avatar and have a close relationship with the virtual response of the avatar in the game world (Calleja, 2011; Skalski et al., 2011; Vanden Abeele, 2011). The best example of symbiotic control is the relatively new and popular *Microsoft Kinect* interface, which can be used with *Microsoft Xbox 360* consoles and *Microsoft Windows* PCs. This type of control is substantially different from the pressing of symbolic buttons, since players have to physically move themselves in order to cause the appropriate action in the game (Calleja, 2011). *Kinect* utilizes a camera that is attached to the television or PC and maps player movement directly onto the avatar. If people are playing a fighting game, for instance, they will have to use their entire body to punch and kick their foes (Calleja, 2011).

Finally, a milder version of symbiotic control is mimetic control, which constitutes a partial mapping of the player's actions onto the avatar. Well-known and popular examples of mimetic controllers include the *Nintendo Wii Remote* and the *Sony PlayStation Move* motion controller. Only the movement of these motion-sensing controllers is registered, so players have to swing the controller in the wanted direction and with the wanted intensity, triggering a similar response in-game (e.g., swinging the controller as a baseball bat or pointing it as a gun). Another form of mimetic control can be found in controllers that replicate part of a machine, tool, vehicle, or instrument (e.g., steering wheels, light guns, musical instruments; Calleja, 2011; Skalski et al., 2011).

Apart from these different manifestations of game control, a player's sense of kinesthetic involvement is highly dependent on the perceived difficulty of the controls. When people start playing a game, they go through an entire process that ranges from the learning of the game controls to the automation of control and movement (Calleja, 2011). When a player is not yet sure which key corresponds to a certain action, he needs to direct more of his attention to the key presses, learning by trial and error and sometimes even needing to consult a guide. After a while, however, the controls will be learned and practiced to such a degree that the player will be able to press the keys automatically, resulting in the on-screen movement feeling unmediated and the player being able to direct his attentional resources to other aspects of the game (Calleja, 2011). This is especially relevant for the practice of IGA: as players increasingly learn controls, they can devote more of their attention to the exploration of their surroundings, including the advertising-related elements they feature.

In summary, kinesthetic involvement is a crucial part of the gaming experience, as most other aspects of involvement in games are dependent on developing at least a basic fluency of movement in the environment. Therefore, in order to be able to elaborate on the impact of player involvement on the effectiveness of advertising featured in a digital game, we decided to conduct an experimental study in which we were primarily interested in the effect of kinesthetic involvement. In this experiment, we varied players' control over the game by manipulating the *type of game controller* with which the game was played as a *within-subjects factor*, resulting in two experimental conditions. We chose to work with a racing game in the experiment, since racing games are very performative games in which the player has to constantly execute kinesthetic actions, manipulating the controller while following the visual cues shown on the screen (Apperley, 2006). In one condition, respondents played the racing game with a symbolic game controller (i.e., a traditional gamepad controller), while in the other condition, participants played the racing game with a mimetic controller (i.e., a motion-based racing wheel controller). The choice of a *traditional, symbolic controller versus a motion-based mimetic one* was made because prior research studying player experience had already shown that these types of controllers lead to significant differences in player involvement, especially affecting a person's sense of kinesthetic involvement (e.g., Johnson et al., 2002; Limperos et al., 2009; Skalski et al., 2011; Vanden Abeele, 2011). Moreover, they can also influence the way in which people process IGA, as shown by Dardis et al. (2012). In what follows, we will give an overview of relevant literature concerning the effects of game controller type on kinesthetic involvement on the one hand and the processing of IGA in terms of brand awareness on the other hand, and formulate hypotheses accordingly. Moreover, we will discuss the mechanisms that possibly underlie the effects of game controller on IGA memory and make a case for the mediating role of kinesthetic involvement.

#### *The impact of game controller on kinesthetic involvement*

First of all, studies looking at the impact of game controller on the player experience show that playing games with a motion-based, mimetic game controller augments players' perceived controller naturalness (Skalski et al., 2011; Vanden Abeele, 2011). Since these game controllers exploit a direct relation between the physical actions of the gamer and the in-game actions of the avatar, they are perceived as being more predictable, logical, intuitive, and natural (Skalski et al., 2011; Vanden Abeele, 2011). However, intuitiveness and control are two different things. Although it is often believed that motion-based play is easier than playing with a traditional, symbolic game controller, research suggests that the opposite is often the case (Johnson et al., 2002; Vanden Abeele, 2011). Although physical controllers allow for more intuitive and natural game controls, they still suffer from a lack of precision and responsiveness, making it harder for players to control their actions and movements in the game environment (Johnson et al., 2002; Vanden Abeele, 2011). Studies by Vanden Abeele (2011) and McMahan et al. (2010) also find that motion-based controllers decrease not only perceived control but also actual control, resulting in lower game performance (e.g., lower game scores, slower game completion times).

Based on Calleja's (2011) description of the kinesthetic involvement dimension and the results of the studies mentioned above, we deem kinesthetic involvement to be influenced by and thus to consist of two sub dimensions. The first sub dimension concerns the player's *control* over his actions and movements in the game world and is highly dependent on the perceived difficulty of the game controls, while the second sub dimension is related to the perceived *naturalness* of the game controller. We expect that our manipulation of game controller type (traditional symbolic controller versus motion-based mimetic controller) will affect both sub dimensions; we predict that the mimetic controller will be perceived as more intuitive and natural compared to the symbolic controller, but that it will also be less precise and responsive, making it harder for players to control the game. As such, we propose the following hypotheses:


Moreover, we will also look at the impact of game controller type on the players' actual game performance, expecting the following:

H3: *People will have a higher game performance when playing with the symbolic controller compared to playing with the mimetic controller.*

Finally, we will examine the influence of game controller type on kinesthetic involvement in general. Playing with the mimetic controller is suspected to lead to higher levels of perceived controller naturalness, while the symbolic controller is hypothesized to be easier to command, leading to increased responsiveness and control over the game world. We expect that when considering kinesthetic involvement in its entirety, control will carry more weight than naturalness, since it puts a greater strain on the player's attention. Consequently, we formulate the following hypothesis:

H4: *People who play the game with the symbolic controller will experience higher levels of kinesthetic involvement than those who play the game with the mimetic controller.*

#### *The impact of game controller on brand awareness*

If the manipulation of game controller is indeed able to cause significant variances in kinesthetic involvement, we are interested in analyzing whether it also affects the way people process the advertisements integrated into the game environment.

Concerning the influence of game controller on people's awareness of the brands integrated into the game, we start from the assumptions of the LC4MP (Lang, 2009). The LC4MP states that a person's ability to process information is limited, with people only having access to a limited pool of cognitive resources at a particular time (Lang, 2009). In the context of our study, this model has important implications for the processing of IGA. Digital games are highly interactive and involving media that bombard the player with a continuous stream of sensory (i.e., audiovisual, tactile) information. Getting an advertisement noticed and remembered in such an involving game environment is not self-evident (Grigorovici and Constantin, 2004; Lee and Faber, 2007). People allocate their attentional resources to those aspects of a task or activity that are most relevant to them at a particular time (i.e., the primary task). In a digital game context, the primary task consists of actually playing the game; the player tries to process, remember and act on the information that is most essential for his progression in the game. Since people will focus their attention primarily on playing the game, this leaves fewer mental resources available for secondary tasks such as the processing of advertisements that are embedded into the game (Grigorovici and Constantin, 2004; Lee and Faber, 2007; Lang, 2009). The LC4MP (Lang, 2009) thus forms the basis for our hypotheses.

The influence of game controller on the processing of IGA has been examined in one previous study. This study by Dardis et al. (2012) showed that variations in game controller can indeed affect memory for the ads featured in a game; playing a racing game with a symbolic controller (i.e., *Xbox 360* gamepad controller) resulted in increased ad recall rates compared to playing with a mimetic controller (i.e., *Xbox 360* racing wheel with gas and brake pedals). They, too, explain these findings by referring to the LC4MP (Lang, 2009), suggesting that players are more familiar with the traditional symbolic controllers than with the newer mimetic controllers, so that the symbolic controllers require fewer attentional resources. However, they do not explicitly measure players' familiarity or expertise with the game controllers, leaving the mechanisms that underlie the effect of game controller open for discussion.

*Kinesthetic involvement as mediator.* The aim of our study, then, is to further investigate and verify the effect of game controller on the processing of IGA (i.e., brand awareness: brand recall and brand recognition), and to determine whether or not this effect can be explained by variations in kinesthetic involvement.

Previous research has already established that player involvement is a relevant factor to consider when studying IGA, showing that fluctuations in a person's general absorption or involvement in a digital game can alter the processing of IGA (e.g., Grigorovici and Constantin, 2004; Lee and Faber, 2007). Grigorovici and Constantin (2004), for instance, investigated the influence of involvement on the awareness of IGA in a virtual environment. Their results show that the more involving a virtual environment is, the lower people's brand recall and recognition. Lee and Faber (2007) looked at the effects of involvement on brand memory while playing an online racing game, and also found that it limited players' awareness of the brands integrated into the game. Both studies explain these effects on brand awareness by referring to the LC4MP (Lang, 2009), arguing that highly involving environments put an increased strain on people's cognitive resources, resulting in people devoting their attention more to playing the game and less to the processing of IGA (Grigorovici and Constantin, 2004; Lee and Faber, 2007; Lang, 2009).

The findings of these studies – looking at both the impact of game controller (Dardis et al., 2012) and player involvement (Grigorovici and Constantin, 2004; Lee and Faber, 2007) on IGA processing – underline the importance of considering the multidimensional nature of player involvement. Following the reasoning of Dardis et al. (2012), we suspect that in the specific case of kinesthetic involvement, results might differ from studies looking at general involvement, although we expect them to still be in line with the reasoning of the LC4MP (Lang, 2009). As we mentioned before, kinesthetic involvement can range from the learning of new controls, to the automation of control and movement in a game. As a player becomes more practiced and familiar with the game controls, he will be able to press the buttons automatically, without paying conscious attention to them. His attentional resources can therefore be directed to other aspects of the game, such as the advertisements integrated into the game environment (Calleja, 2011). Since we hypothesized that playing with a symbolic controller will be easier, increasing the player's control over the game and the sensation of movement it produces, we propose the following hypothesis:

H5: *People will experience higher levels of brand awareness (recall, recognition) when playing with the symbolic controller compared to playing with the mimetic controller.*

Finally, if our results show that the manipulation of game controller indeed has a significant influence on participants' awareness of the brands encountered in-game, we expect that this effect will be mediated by their sense of kinesthetic involvement.

H6: *The sub dimensions of kinesthetic involvement will mediate the relationship between type of game controller (symbolic controller versus mimetic controller) and brand awareness (recall, recognition).*
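The mediation claim in H6 can be made concrete with the generic product-of-coefficients formulation of a simple mediation model. This is only an illustrative sketch and not necessarily the analysis used in the study: the symbols are assumptions introduced here, with $X$ denoting controller type (coded 0/1), $M$ one sub dimension of kinesthetic involvement, and $Y$ a brand awareness score, and the formulation ignores the repeated-measures structure of a within-subjects design.

$$
\begin{aligned}
M &= i_M + aX + \varepsilon_M \\
Y &= i_Y + c'X + bM + \varepsilon_Y \\
c &= c' + ab
\end{aligned}
$$

Here $c$ is the total effect of controller type on brand awareness, $c'$ the direct effect that remains after accounting for the mediator, and $ab$ the indirect effect carried through kinesthetic involvement. Under this reading, H6 would correspond to an indirect effect $ab$ that differs from zero.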

#### **BRAND PROMINENCE**

Further, we want to analyze the influence of an additional factor that might affect the impact of IGA: the prominence of the brand placement. Brand prominence depends on placement characteristics such as ad size, color, attractiveness, and spatial position. These characteristics are of considerable importance in an advertising context. Advertising studies investigating effects in traditional media (e.g., television, print) have demonstrated that the placement of a brand in a prominent way generally has a positive effect on brand memory, since a prominent ad attracts more attention and is more deeply processed resulting in increased awareness (Law and Braun, 2000; Van Reijmersdal, 2009). In an IGA context, several studies have already looked at the effect of brand prominence on brand awareness, although they mostly focused on only one placement characteristic (e.g., Grigorovici and Constantin, 2004; Acar, 2007; Lee and Faber, 2007; Bardzell et al., 2008; Jeong and Biocca, 2012) or on all characteristics at the same time (e.g., Schneider and Cornwell, 2005). Their findings reveal that prominent placements (e.g., large versus small placements, central versus peripheral placements) indeed lead to higher levels of recall and recognition compared to more subtle placements.

In the current study, we aim to elaborate on the effect of brand prominence by examining how different combinations of ad size and spatial position affect people's response to the brands featured in IGA in terms of recall and recognition. Therefore, we manipulated the size (small versus large) and spatial position (peripheral versus central) of the in-game ads integrated into the experimental game, resulting in four different placement types (large-central, small-central, large-peripheral, small-peripheral). We expect that highly prominent brand placements will lead to higher brand awareness rates compared to more subtle placements. In our experiment, the large-central brands can be considered to be the most prominent, while small-peripheral placements are the most subtle.

H7: *Large-central brand placements will obtain higher levels of brand awareness (recall, recognition) compared to small-peripheral placements.*

However, our results will have to point out which placement characteristics (ad size or spatial position) prove to be the most important in light of IGA effectiveness, leading us to formulate the following research question:

RQ1: *Which combinations of placement characteristics (size x spatial position) are the most effective in terms of brand awareness (recall, recognition)?*

Finally, we also want to check whether the effects of our manipulations of game controller and brand prominence interact with each other:

RQ2: *Do the effects of game controller (symbolic controller versus mimetic controller) and brand prominence (size x spatial position) on brand awareness (recall, recognition) interact with each other?*

#### **METHOD**

#### **EXPERIMENTAL DESIGN**

To test the impact of kinesthetic involvement on the effectiveness of IGA, we conducted an *experiment* with a *within-subjects design*. During this experiment, participants played a *Sony PlayStation 3* kart racing game containing several in-game advertisements. In order to vary kinesthetic involvement, we manipulated the players' control over the game between two conditions by letting them play the game twice, with two *different types of game controllers*. In one condition, participants played the game with the *traditional PlayStation 3 controller* (i.e., a gamepad or symbolic game controller), while in the other condition, they played the game with the *PlayStation Move racing wheel* (i.e., a mimetic game controller that combines the motion-sensing abilities of the *PlayStation Move* controller with a steering wheel). The order in which participants played with the two different controllers was counterbalanced to avoid order effects.

Additionally, in order to elaborate on the influence of *brand prominence*, we manipulated the *size* (small versus large) and *spatial position* (peripheral versus central) of the in-game ads integrated into the game, combining both characteristics into four different placement types (large-central, small-central, large-peripheral, small-peripheral).
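The design described above can be sketched in a few lines: crossing the two ad sizes with the two spatial positions yields the four placement types, and the two controller orders are alternated across participants. The alternation rule in `controller_order` is an assumption made for illustration; the text only states that the order was counterbalanced, not how participants were assigned to an order.

```python
from itertools import permutations, product

SIZES = ("large", "small")
POSITIONS = ("central", "peripheral")
CONTROLLERS = ("symbolic", "mimetic")


def placement_types():
    """Cross ad size with spatial position to get the four placement types."""
    return [f"{size}-{position}" for size, position in product(SIZES, POSITIONS)]


def controller_order(participant_index):
    """Alternate the two possible controller orders (assumed assignment scheme)."""
    orders = list(permutations(CONTROLLERS))
    return orders[participant_index % len(orders)]


print(placement_types())
print(controller_order(0), controller_order(1))
```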

#### **PARTICIPANTS**

Thirty-one people (24 male, seven female), 18 to 30 years of age (*M* = 22.6, *SD* = 2.9), participated in the experiment. Although people were only required to have basic experience with games in order to be able to participate, the majority of our sample can be considered experienced gamers. All participants had been playing digital games for 6 years or more (6 to 8 years: 12.9%, 9 years or more: 87.1%), and most of them played games on a weekly or daily basis (a few times a year: 12.9%, monthly: 9.7%, weekly: 48.4%, daily: 29.0%).

#### **EXPERIMENTAL GAME AND IN-GAME ADVERTISEMENTS**

The game *LittleBigPlanet Karting*<sup>2</sup> was used in the experiment. *LittleBigPlanet Karting* is a *Sony PlayStation 3*-exclusive kart racing game in which players race against computer opponents or other players in a go-kart on a variety of tracks. Throughout the race, players can pick up both offensive and defensive weapons to either attack or protect themselves from their opponents. Moreover, the game focuses heavily on user-generated content, enabling us to create our own levels using the official editor of the game. We made two levels for use in the experiment: one level for each condition. Since we wanted to analyze the effect of controller type on the effectiveness of IGA, these two levels looked exactly the same except for the integration of the advertising in the game environment. The race track that we created was set in a village environment and players had to complete five racing laps, competing against seven computer opponents. Players did not have a time limit, but since it was a racing game their goal was to finish the game level in the best time and placing possible.

We chose to incorporate billboard advertisements inside our levels. Billboard ads are one of the most common forms of advertising in racing games and the experimental setup thus resembled the real-life practice of IGA (Nelson, 2005; Skalski et al., 2010). As we wanted to investigate the impact of ads with varying sizes (large versus small) and spatial positions (central versus peripheral), we combined both ad characteristics into four different placements: large-central, small-central, large-peripheral, and small-peripheral. The peripheral billboards were placed on the side of the road, while the central billboards were attached to bridges the player had to drive underneath, featuring the placements in the center of the screen. The logos of the brands featured on the large billboards were exactly the same size, as were the logos of the brands on the small billboards. Each racing lap featured these four placements, meaning that players encountered each placement five times.

The brands that were featured on the in-game billboards were popular and well-known soda and candy brands. We chose to work with existing and well-known brands in order to create a realistic and plausible IGA scenario. As already mentioned, we created two game levels for use in the experiment. These two levels looked exactly the same, apart from the integration of IGA. Each level incorporated four different brands that were found to be similar in familiarity and attitude based on the results of a *pretest* with 43 people (32 male, 11 female, *M* = 22.2, *SD* = 4.0), in order to be able to distinguish the impact of the manipulation of game controller on IGA effectiveness (see **Table 1**). Again, the order in which the participants played the levels with the different types of game controllers was counterbalanced to avoid order effects. By making use of the *PlayStation Eye* camera, we transferred the real logos of these brands onto the billboards that were placed into the game (see **Figure 1**).

#### **PROCEDURE**

The experiment took place in a lab room at the University of Antwerp (Belgium). In this game lab we had a *Sony PlayStation 3* console at our disposal, connected to a large television screen. During the experiment, participants first played the official tutorial of the game with one controller [either the symbolic controller (i.e., traditional *PlayStation 3* controller) or the mimetic controller (i.e., *PlayStation Move* racing wheel)], explaining how to play the game (e.g., steer, pick up weapons). Afterward, they played the experimental game level containing IGA. When they finished playing, we wrote down their game score and game completion time and they had to fill out the first part of a self-report questionnaire, asking them about their player involvement while playing the game with the specific game controller. When they completed this questionnaire, the second part of the experiment started, and they had to repeat the whole process with the other game controller: play tutorial – play experimental level – fill out player involvement questionnaire. The order in which participants played these experimental game levels with the two different controllers was counterbalanced to avoid order effects. Finally, after playing the game twice and filling out the player involvement questionnaires, respondents were asked to fill out a questionnaire exploring the effectiveness of in-game ads, and their socio-demographic and gamer characteristics. Participation in the experiment took approximately 45 min.

<sup>2</sup>Sony Computer Entertainment Europe (2012).

#### **Table 1 | Overview of the brands featured in the two experimental levels.**

*Note: Brand attitude was measured on a scale from 0 (very negative) to 7 (very positive).*

*Brand familiarity indicates the percentage of the respondents that were familiar with the brand.*

*The differences in familiarity [F(7,294) = 1.103, NS] and attitude [F(7,259) = 1.984, NS] between the brands were non-significant.*

**FIGURE 1 | Pictures of advertisements (i.e., billboards) in the experimental game: large-peripheral placement (left), small-central placement (right).**

#### **MEASURES**

#### *Game performance*

We measured participants' *game scores* with each game controller (i.e., the place they finished in the race), as well as the *time* it took them to complete the racing level.

#### *Self-report measures*

*Kinesthetic involvement. Kinesthetic involvement* was measured by 14 items (*Cronbach's* α = 0.933), covering the player's involvement generated by both the controllability and naturalness of the game controllers. Twelve items gauged the players' perceived control over their actions and movements of the avatar in-game, and to which extent they found the controls easy to use (e.g., "I felt that my avatar was responsive to actions that I initiated," "I felt proficient in moving around in the game environment," "The game controls were easy to pick up"), while the two other items measured the perceived naturalness and intuitiveness of the game controllers (e.g., "The actions necessary for controlling the game were very close to that in the real world"). Agreement with these items was measured on a five-point intensity scale ranging from "not at all" (0) to "extremely" (4). The utilized scales were based on Witmer and Singer's (1998) presence questionnaire, Jennett et al.'s (2008) immersion scale, and Vanden Abeele's (2011) perceived control and perceived controller naturalness scales.
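For readers unfamiliar with the reliability coefficient reported above, the following is a minimal sketch of how Cronbach's α is computed from item scores, using the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of the summed scale). The toy scores are invented for illustration and are not the study's data.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """item_scores: one list per item, each holding one score per respondent."""
    k = len(item_scores)
    if k < 2:
        raise ValueError("need at least two items")
    # Sum the items per respondent to obtain the total scale score.
    totals = [sum(scores) for scores in zip(*item_scores)]
    item_variance_sum = sum(variance(scores) for scores in item_scores)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


# Hypothetical answers of five respondents to three items on a 0-4 scale.
toy_items = [
    [0, 1, 2, 3, 4],
    [1, 1, 2, 4, 4],
    [0, 2, 2, 3, 3],
]
print(round(cronbach_alpha(toy_items), 3))
```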

A principal component analysis (PCA) was conducted on these 14 items with oblique rotation (direct oblimin). The Kaiser–Meyer–Olkin measure verified the sampling adequacy for the analysis (KMO = 0.932). Bartlett's test of sphericity, χ<sup>2</sup>(91) = 938.563, *p* < 0.001, indicated that correlations between items were sufficiently large for PCA. Results demonstrate that two components had eigenvalues that were greater than 1 and in combination explained 78.343% of the variance. These two components were in line with our expectations and revealed the *control* versus *controller naturalness* sub dimensions of kinesthetic involvement. **Table 2** shows the factor loadings after rotation. Next, we averaged the scores of the items loading on these two factors, leading to two new variables [*control* (*Cronbach's* α = 0.97) and *controller naturalness* (*Cronbach's* α = 0.89)].

#### **Table 2 | Summary of the principal component analysis results of kinesthetic involvement.**

*Note: Factor loadings of less than 0.200 have been omitted.*

*In-game advertising effectiveness. Brand awareness* was measured on three levels. First of all, participants were asked to spontaneously recall which brands they remembered encountering in the digital game (i.e., free recall). Subsequently, participants were presented with a list of brand names (i.e., brand name recognition), and a list of brand logos (i.e., brand logo recognition). In each case, participants had to indicate which brand names and brand logos they remembered seeing in-game. For each recognition measure, the four correct options were included, as well as eight filler items and an "I don't know" option. The data that originated from these measures were combined into brand awareness variables that indicate how many brands (names, logos) each participant correctly recalled or recognized (*brand recall*, *brand name recognition*, *brand logo recognition*).
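The scoring of the awareness variables can be illustrated with a short sketch: each score counts how many of the four brands shown in a level were named or ticked by the participant, giving a value from 0 to 4. The brand names below are placeholders, and ignoring wrongly selected filler items is an assumption; the text does not say how such selections were treated.

```python
def awareness_score(selected, targets):
    """Count how many of the brands actually shown in the level were selected."""
    return len(set(selected) & set(targets))


# Hypothetical placeholder names, not the brands used in the study.
targets = {"BrandA", "BrandB", "BrandC", "BrandD"}
fillers = {f"Filler{i}" for i in range(1, 9)}  # eight filler items per list

response = ["BrandA", "BrandC", "Filler2"]
print(awareness_score(response, targets))  # 2
```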

We measured IGA effectiveness in terms of *brand evaluation* as well, i.e., brand attitude and purchase intention. *Brand attitudes* of the integrated brands were measured by the mean of three seven-point scales anchored by the adjectives "good (0) – bad (6)", "like very much (0) – dislike very much (6)," and "pleasant (0) – unpleasant (6)" (*Cronbach's* α values range from 0.95 to 0.98). *Purchase intentions* of the brands were measured by using a four-point scale going from "not at all likely to buy" (1) to "very likely to buy" (4). However, we did not expect to find an effect of our manipulations on brand attitude or purchase intention, since participants only played the game for a very short while. Moreover, the game featured well-known, popular brands in order to create a realistic encounter with IGA. We thought it unlikely that the limited exposure to IGA (due to the short playing duration) would lead to significant changes in the evaluations of these established brands. Our findings show that this is indeed the case: the manipulations did not lead to significant changes in attitudes or the intention to buy the products. Therefore, we do not include the results concerning brand evaluation in the current paper and focus mainly on IGA effectiveness in terms of brand awareness.

*Background information.* Finally, participants were asked about their *socio-demographic characteristics* (e.g., gender, age) and their *gamer profile* [e.g., game experience, frequency, and familiarity with the two game controllers that were used (i.e., traditional *PlayStation 3* gamepad controller and *PlayStation Move* racing wheel controller)]. These variables were tested for their potential moderating effects, but were not found to be significant moderators.

#### **RESULTS**

#### **THE IMPACT OF GAME CONTROLLER**

As mentioned before, we conducted a within-subjects experiment in which we manipulated the type of game controller that the players used to command the game, resulting in two conditions [symbolic controller (i.e., traditional *PlayStation 3* controller) versus mimetic controller (i.e., *PlayStation Move* racing wheel)]. In order to test our hypotheses in this within-subjects context, we performed one-way repeated measures analyses of variance (ANOVAs) to examine the impact of game controller on (1) kinesthetic involvement and game performance and (2) the processing of IGA in terms of brand awareness (i.e., brand recall, brand name recognition, brand logo recognition).

First of all, we looked at the effect of the manipulation of game controller on the players' sense of kinesthetic involvement. We have defined *kinesthetic involvement* as consisting of two sub dimensions: *control* and *controller naturalness*. The existence of these two sub dimensions was confirmed by the principal component analysis we performed (see **Table 2**). One-way repeated measures ANOVAs show that there are significant differences in both of these sub dimensions of kinesthetic involvement between experimental conditions (see **Table 3**). Playing the game with the symbolic controller led to more control over actions and movements in-game [*F*(1,30) = 90.816, *p* < 0.001], but playing the game with the mimetic controller was perceived as more natural [*F*(1,30) = 26.774, *p* < 0.001]. These results are therefore in line with *hypotheses 1* and *2*.

Moreover, when looking at the impact of game controller on game performance (i.e., *game scores* and *game completion time*) we see that the traditional game controller does not only lead to higher perceived control, but also to higher actual control. Results of one-way repeated measures ANOVAs demonstrate that playing with the mimetic controller has a detrimental effect on players' game score or the place in which they finished [*F*(1,30) = 34.634, *p* < 0.001; *M*symbolic controller = 2.032, *SD* = 2.213, *M*mimetic controller = 5.258, *SD* = 2.932] and on the time (measured in seconds) in which players finished the race [*F*(1,30) = 92.540, *p* < 0.001; *M*symbolic controller = 330.516, *SD* = 20.805, *M*mimetic controller = 366.065, *SD* = 18.417]. This result is thus in line with the expectations formulated in *hypothesis 3*.

Finally, when taking into account *kinesthetic involvement in its entirety*, we see that control outweighs naturalness (see **Table 3**): playing with the symbolic controller results in higher levels of kinesthetic involvement compared to playing with the mimetic controller [*F*(1,30) = 33.757, *p* < 0.001], providing support for *hypothesis 4*.

Next, we looked at the impact of game controller manipulation on the effectiveness of IGA in terms of *brand awareness* (i.e., *brand recall, brand name recognition,* and *brand logo recognition*). The findings of one-way repeated measures ANOVAs point out that there are significant variations in the recall and recognition rates of the brands between conditions: playing the game with the symbolic controller results in significantly higher levels of brand recall [*F*(1,30) = 13.304, *p* < 0.001], brand name recognition [*F*(1,30) = 40.208, *p* < 0.001] and brand logo recognition [*F*(1,30) = 38.958, *p* < 0.001] compared to playing with the mimetic controller (see **Table 4**). These results support *hypothesis 5*.
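The *F*(1,30) statistics in this subsection come from repeated measures ANOVAs with only two conditions. In that special case the test is equivalent to a paired *t*-test, with *F* = *t*<sup>2</sup> and degrees of freedom (1, *n* − 1), which gives (1, 30) for 31 participants. The sketch below illustrates this equivalence; the scores are hypothetical and the function name `paired_f` is introduced here for illustration only.

```python
from math import sqrt
from statistics import mean, stdev


def paired_f(condition_a, condition_b):
    """F statistic and degrees of freedom for a two-level repeated measures factor."""
    diffs = [a - b for a, b in zip(condition_a, condition_b)]
    n = len(diffs)
    # Paired t statistic: mean difference divided by its standard error.
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t ** 2, (1, n - 1)


# Hypothetical recognition counts (0-4) for five players, not the study's data.
symbolic = [3, 2, 4, 3, 2]
mimetic = [1, 1, 2, 2, 1]
f_value, df = paired_f(symbolic, mimetic)
print(round(f_value, 3), df)
```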

#### **THE IMPACT OF KINESTHETIC INVOLVEMENT**

The results thus show that the manipulation of game controller has a significant influence on (1) players' sense of involvement with the kinesthetic properties of the game and (2) the effectiveness of IGA in terms of brand awareness.

In order to check whether kinesthetic involvement mediates the impact of game controller on brand awareness, the direct effect of kinesthetic involvement on brand awareness was subsequently examined. Linear mixed model analyses on the brand awareness variables with the sub dimensions of kinesthetic involvement as repeated measures factors reveal that *brand recall* is not significantly affected by either of the sub dimensions. Both of the recognition measures, however, are significantly influenced by control (effect on *brand name recognition*: *F*(33,28.000) = 2.507, *p* = 0.008; *brand logo recognition*: *F*(33,28.000) = 3.002, *p* = 0.002).

Next, we performed linear mixed model analyses on the brand awareness variables with game controller type as factor and the control sub dimension of kinesthetic involvement as a repeated measures covariate (i.e., mediation analyses). Regarding *brand name recognition*, we see that the effect of game controller is weakened [*F*(1,59.000) = 4.096, *p* = 0.048] when control is included, although its mediating effect is not able to reach significance [*F*(1,59.000) = 3.145, *NS*]. When looking at the effect on *brand logo recognition*, the impact of game controller is diminished to the point of non-significance [*F*(1,59.000) = 2.966, *NS*], with control [*F*(1,59.000) = 5.174, *p* = 0.027] serving as a significant mediator. These results partially support *hypothesis 6*, showing that players' perceived control over their actions and movements in-game underlies the impact of game controller on the awareness of IGA.

**Table 3 | The impact of game controller on kinesthetic involvement.**

*Note: \*p* < *0.001.*

*The sub dimensions of kinesthetic involvement were measured by using five-point intensity scales ranging from 0 to 4.*

*Note: \*p* < *0.001.*

*The brand awareness variables indicate how many brands each participant correctly recalled or recognized (0–4) in each experimental condition.*

#### **THE IMPACT OF BRAND PROMINENCE**

In order to analyze the influence of brand prominence, we included the placement characteristics of *ad size* (large versus small) and *spatial position* (central versus peripheral) as an additional within-subjects factor in our experiment, resulting in four different placements (large-central, small-central, large-peripheral, small-peripheral). Since we created two levels for use in the experiment, each with placements for four different brands, we first of all checked whether the matched brands (e.g., large-central brand in level 1 versus large-central brand in level 2) differed significantly from each other in terms of brand awareness. One-way repeated measures ANOVAs on the awareness variables of the matched brands show that the differences were non-significant for all placement types, allowing us to combine the awareness scores of the brands into average brand awareness variables per placement.

To test the effect of brand prominence on brand awareness, we performed one-way repeated measures ANOVAs on the brand awareness rates of these four types of placements. Concerning *brand recall*, results reveal that there are significant differences between the different placements [*F*(3,90) = 3.425, *p* = 0.021]. Large-central placements (*M* = 0.548, *SD* = 0.568) obtain the highest recall rates, followed by small-central placements (*M* = 0.387, *SD* = 0.615), large-peripheral placements (*M* = 0.258, *SD* = 0.445), and lastly, small-peripheral placements (*M* = 0.161, *SD* = 0.374). Bonferroni *post hoc* tests demonstrate that the significant differences are situated between the large-central and small-peripheral placements (*p* = 0.003).
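As a rough illustration of the test used here, the sketch below computes the *F* statistic of a one-way repeated measures ANOVA by partitioning the total sum of squares into condition, participant, and error components. With four placement types and 31 participants, the degrees of freedom are (4 − 1, (4 − 1)(31 − 1)) = (3, 90), matching the values reported. The toy data are invented, and the sketch does not check or correct for sphericity.

```python
def rm_anova_f(scores):
    """scores: one row per participant, one column per condition."""
    n, k = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n * k)
    cond_means = [sum(col) / n for col in zip(*scores)]
    subj_means = [sum(row) / k for row in scores]
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    # Error is what remains after removing condition and participant effects.
    ss_error = ss_total - ss_cond - ss_subj
    df_cond, df_error = k - 1, (k - 1) * (n - 1)
    f_value = (ss_cond / df_cond) / (ss_error / df_error)
    return f_value, (df_cond, df_error)


# Hypothetical scores for three participants under three conditions.
toy_scores = [[1, 2, 3], [2, 3, 5], [3, 5, 6]]
print(rm_anova_f(toy_scores))
```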

Regarding *brand name recognition*, results are similar. The different placement types differ significantly in their effect on brand name recognition [*F*(3,90) = 7.035, *p* < 0.001], with large-central placements (*M* = 0.871, *SD* = 0.562) having the greatest influence, followed by small-central (*M* = 0.645, *SD* = 0.709), large-peripheral (*M* = 0.452, *SD* = 0.568), and small-peripheral placements (*M* = 0.258, *SD* = 0.445). Bonferroni *post hoc* tests reveal that the large-central placements are again significantly different from the small-peripheral placements (*p* < 0.001).

Finally, when looking at *brand logo recognition*, results show that the different placement types also vary significantly in their effect [*F*(3,90) = 7.520, *p* < 0.001], with large-central placements (*M* = 1.032, *SD* = 0.547) having the greatest impact, followed by the small-central (*M* = 0.807, *SD* = 0.703), large-peripheral (*M* = 0.548, *SD* = 0.624) and the small-peripheral placements (*M* = 0.387, *SD* = 0.495). Bonferroni *post hoc* tests demonstrate that the large-central placements vary significantly from the large-peripheral (*p* = 0.030) and small-peripheral placements (*p* < 0.001).
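The Bonferroni procedure behind these *post hoc* tests can be sketched as follows. Four placement types give six pairwise comparisons, and one common way to apply the correction is to multiply each raw *p* value by the number of comparisons, capped at 1. Whether the *p* values reported above are raw or adjusted is not stated, so the sketch makes no claim about them.

```python
from itertools import combinations

PLACEMENTS = ["large-central", "small-central", "large-peripheral", "small-peripheral"]


def bonferroni(p_value, n_comparisons):
    """Bonferroni-adjusted p value, capped at 1."""
    return min(1.0, p_value * n_comparisons)


pairs = list(combinations(PLACEMENTS, 2))
print(len(pairs))  # 6 pairwise comparisons among four placement types
```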

Based on these results, we can accept *hypothesis 7*: the most prominent (i.e., large-central) placements obtain significantly higher rates of awareness compared to the most subtle (i.e., small-peripheral) placements. Moreover, we can answer *research question 1*: when looking at the effectiveness of different types of IGA placements in terms of ad size and spatial position, results indicate that especially spatial position is of importance, with the central placements obtaining the highest recall and recognition scores. The effect of ad size is much smaller; large placements are not able to lead to significant differences in brand awareness compared to their smaller counterparts.

Lastly, in order to be able to answer *research question 2*, we checked for interaction effects of our manipulations on the awareness of IGA by conducting two-way repeated measures ANOVAs and including game controller type and brand prominence as within-subject factors. Our results show a significant interaction effect of game controller and brand prominence on *brand name recognition* [*F*(3,90) = 5.016, *p* = 0.003]. When looking at the results in greater detail, we see that game controller significantly affects the brand name recognition of the large-central [*F*(1,30) = 37.345, *p* < 0.001] and small-central placements [*F*(1,30) = 6.234, *p* = 0.018] while the recognition rates of the peripheral placements are not affected. Moreover, we observe significant changes in brand name recognition between the different placement types when people are playing with the symbolic controller [*F*(1,30) = 9.158, *p* < 0.001], but not with the mimetic controller (see **Figure 2**).

The results concerning *brand recall* show a similar trend: game controller significantly affects the recall rates of the large-central [*F*(1,30) = 10.552, *p* = 0.003] and small-central placements [*F*(1,30) = 5.094, *p* = 0.031], while the recall rates of the peripheral placements are not affected. We also observe significant changes in recall between the different placement types when playing with the symbolic controller [*F*(1,30) = 4.288, *p* = 0.007], but not with the mimetic controller (see **Figure 3**). However, this interaction effect of game controller and brand prominence is not able to reach significance [*F*(3,90) = 2.553, *NS*].

Finally, results show that the interaction effect of game controller and brand prominence on *brand logo recognition* is also not significant [*F*(3,90) = 1.842, *NS*]. Here, the manipulation of game controller significantly affects all placements [large-central [*F*(1,30) = 18.028, *p* < 0.001], small-central [*F*(1,30) = 6.328, *p* = 0.017], and large-peripheral placements [*F*(1,30) = 4.153, *p* = 0.050]] except for the small-peripheral ones. The brand logo recognition rates of the different placement types also significantly differ when playing with the symbolic controller [*F*(1,30) = 6.857, *p* < 0.001], although they still do not when playing with the mimetic controller (see **Figure 4**).

#### **DISCUSSION**

The aim of the study was to contribute to research on the effectiveness of IGA by analyzing the impact of two contextual factors on the processing of IGA, namely a person's sense of *involvement* in a game, and the *prominence* of the advertisements that are integrated into the game.

Prior research had already established that *player involvement* is a relevant factor to consider in an IGA context, showing that different levels of a player's general involvement with a game affect the way they process IGA in terms of brand awareness (e.g., Grigorovici and Constantin, 2004; Lee and Faber, 2007). However, involvement is a multidimensional construct and in the specific context of digital games, it is understood as a combination of six dimensions that are able to capture the player's attention (Calleja, 2011). The study scrutinizes the effects of one of these dimensions, namely *kinesthetic involvement* or the player's involvement related to the modes of control and movement in a game (Calleja, 2011).

In order to test the specific effect of kinesthetic involvement on the processing of IGA, we conducted a *within-subjects experiment* in which we manipulated the type of *game controller* that was used to play the game between two conditions [symbolic controller (i.e., traditional *PlayStation 3* gamepad controller) versus mimetic controller (i.e., motion-based *PlayStation Move* racing wheel)]. Results show that this manipulation of game controller had a significant impact on the players' sense of involvement with the kinesthetic properties of the game. The controls of the traditional symbolic controller were easier to learn and handle, allowing the players more precise control over their movements and actions in the game world, while the controls of the motion-based mimetic controller were perceived to be more natural and intuitive. These results replicate the findings of previous studies looking at the effects of game controller type on player experience and performance (Johnson et al., 2002; McMahan et al., 2010; Skalski et al., 2011; Vanden Abeele, 2011). When looking at kinesthetic involvement in its entirety, we see that control outweighs naturalness: playing the game with the symbolic controller led to higher levels of kinesthetic involvement compared to playing the game with the mimetic controller.

Moreover, the variation in game controller significantly influenced the processing of the advertisements embedded in the game: when participants played the game with the symbolic controller, they were able to recall and recognize significantly more brands. This finding is in line with the results from Dardis et al. (2012), who also found that playing with a symbolic (versus mimetic) game controller led to increased ad recall scores.

We subsequently looked at the impact of the sub dimensions of kinesthetic involvement on brand awareness. Results show that brand recall was not significantly influenced by kinesthetic involvement. However, both of the brand recognition measures were significantly affected by a person's perceived control over his actions and movements in the game. Mediation analyses further showed that the impact of game controller on brand logo recognition was fully mediated by the player's perceived control. This finding suggests that when a game is easier to control, the process going from the learning of the controls to the automation of movement in-game will happen more quickly, freeing attentional resources, and leading to people being able to pay more attention to secondary elements of the game such as IGA. These results are therefore in line with the LC4MP (Lang, 2009).

Apart from the influence of game controller type and kinesthetic involvement, we additionally checked the impact of *brand prominence* in an IGA context. Brand prominence is a factor that depends on several placement characteristics such as ad size, color, attractiveness, and spatial position. Several studies already looked at the effect of these placement characteristics, showing they can have a major impact on the awareness of IGA (e.g., Grigorovici and Constantin, 2004; Schneider and Cornwell, 2005; Acar, 2007; Lee and Faber, 2007; Bardzell et al., 2008; Jeong and Biocca, 2012). These studies mostly analyzed the impact of only one characteristic though (i.e., either ad size or spatial position). The current study investigates the effects of brand prominence by examining how both *ad size* and *spatial position* relate to people's response to the brand placements, manipulating and combining both characteristics into four different placement types (small-peripheral, small-central, large-peripheral, large-central).

Results demonstrate that there are indeed significant changes in effectiveness between different types of placements, with large-central placements obtaining the highest awareness rates, followed by small-central, large-peripheral, and lastly, small-peripheral placements. Spatial position is the most important placement characteristic, with the central placements obtaining the highest brand recall and brand recognition scores. The effect of ad size is much smaller; large placements are not able to lead to significant differences in brand awareness compared to their smaller counterparts.

Finally, we looked for interaction effects of both game controller and brand prominence on brand awareness. Results indicate that for brand name recognition, the effects of our two manipulations indeed interacted with each other, showing that game controller mainly affects the central placements, while the peripheral placements are not influenced. Moreover, there are significant changes in awareness rates between the different placement types in the symbolic controller condition, but not in the mimetic controller condition. We thus observe floor effects for both playing with the mimetic controller and the awareness rates of the peripheral placements. Since playing the game with the mimetic controller proves to be more difficult, the controls of the game take up the majority of the player's attentional resources, resulting in all brand placements receiving low attention. Moreover, it seems that it is indeed harder for the peripheral (versus central) placements to attract and keep the player's attention, leading to their awareness rates remaining low in either condition.

In summary, although the sample of our within-subjects experiment was relatively small (*N* = 31), results already show that kinesthetic involvement is a relevant factor to consider when studying or planning to integrate advertising inside digital games. The findings demonstrate that the nature of a game controller can have a significant effect on the processing of IGA, and that this effect can be partially attributed to players' perceived control over their actions and movements in-game. However, it would be interesting for future research to examine the impact of kinesthetic involvement even further in experimental studies containing a larger pool of participants with different profiles (e.g., differing levels of prior game expertise), looking at other aspects of kinesthetic involvement, and employing different kinds of games.

For instance, a person's sense of kinesthetic involvement might also be dependent on his game expertise: experienced players will often learn the controls of a game more quickly (Calleja, 2011), resulting in a faster automation of control. Since our study mostly included experienced gamers, we did not find a moderating effect of this characteristic. However, it might be interesting for future research to take a closer look at the effect of gamer characteristics on the effectiveness of IGA in general and in combination with the impact of kinesthetic involvement in particular.

Moreover, kinesthetic involvement is not only dependent on the type of game controller that is used to play a game; it also relies on the different modes of in-game control that are possible. In some game environments, in-game control can be brought back to the control over a single entity or avatar, which can be interacted with either from a third-person perspective (giving the player a sense of distance) or from a first-person perspective (giving the player a view of the game world through the eyes of the avatar; Calleja, 2011). In other games, players have control over a number of game-pieces or miniatures, either individually or simultaneously, controlling the miniature world by taking on the role of an external, god-like controller (Calleja, 2011). Further, it is important to mention that some types of games are far more focused on the kinesthetic aspect than others. Games involving intense, fast-paced kinesthetic actions such as racing and shooter games, where the player has to constantly manipulate the controller while following the visual cues shown on the screen, often require extreme levels of attentional resources (Apperley, 2006). In other types of games or game genres, the focus may not lie on fast-paced kinesthetic action but on other components of the game (e.g., puzzle games, strategy games). As such, our results may not apply to all game genres and situations. It is therefore advisable for advertisers to carefully select the type of game in which they want to embed their advertisements (i.e., game genre, game console, game controller).

Finally, the results indicate the relevance of brand prominence, showing that spatial position is a more important variable to consider than ad size. Strategically placing advertisements in the center of the player's viewpoint may prove to be far more effective than randomly placing large advertisements inside a game environment.

#### **REFERENCES**




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 31 October 2013; accepted: 30 December 2013; published online: 24 January 2014.*

*Citation: Herrewijn L and Poels K (2014) Recall and recognition of in-game advertising: the role of game control. Front. Psychol. 4:1023. doi: 10.3389/fpsyg.2013.01023*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology. Copyright © 2014 Herrewijn and Poels. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*


## Advert saliency distracts children's visual attention during task-oriented internet use

#### *Nils Holmberg<sup>1,2</sup>\*, Helena Sandberg<sup>1</sup> and Kenneth Holmqvist<sup>2</sup>*

*<sup>1</sup> Department of Communication and Media, Lund University, Lund, Sweden*

*<sup>2</sup> Lund University Humanities Lab, Lund University, Lund, Sweden*

#### *Edited by:*

*Jaana Simola, University of Helsinki, Finland*

#### *Reviewed by:*

*Tom Foulsham, University of Essex, UK*

*Ignace Hooge, Utrecht University, Netherlands*

#### *\*Correspondence:*

*Nils Holmberg, Department of Communication and Media, Lund University, Centre for Languages and Literature, Helgonabacken 12, PO Box 201, 221 00 Lund, Sweden e-mail: nils.holmberg@kom.lu.se*

The general research question of the present study was to assess the impact of visually salient online adverts on children's task-oriented internet use. In order to answer this question, an experimental study was constructed in which 9- and 12-year-old Swedish children were asked to solve a number of tasks while interacting with a mockup website. In each trial, web adverts in several saliency conditions were presented. By measuring both children's task accuracy and the visual processing involved in solving these tasks, this study allows us to infer how two types of visual saliency affect children's attentional behavior, and whether such behavioral effects also impact their task performance. Analyses show that low-level visual features and task relevance in online adverts have different effects on performance measures and process measures respectively. Whereas task performance is stable with regard to several advert saliency conditions, a marked effect is seen on children's gaze behavior. On the other hand, task performance is shown to be more sensitive to individual differences such as age, gender and level of gaze control. The results provide evidence about cognitive and behavioral distraction effects in children's task-oriented internet use caused by visual saliency in online adverts. The experiment suggests that children to some extent are able to compensate for behavioral effects caused by distracting visual stimuli when solving prospective memory tasks. Suggestions are given for further research into the interdisciplinary area between media research and cognitive science.

**Keywords: online advertising, children, internet use, distraction, visual saliency, visual attention**

#### **1. INTRODUCTION**

Children's internet use is known to vary a lot between countries (Holloway et al., 2013). In Sweden, children attending primary school and middle school come into contact with the internet in a wide variety of everyday situations, ranging from information search in connection to school projects, to instant messaging on mobile phones during leisure activities. Current media research indicates that children spend an increasing amount of time connected to the internet, and that "tweens" aged between 9 and 12 spend about 1–2 h online a day on average. Notably, there is a steep increase in online activities between these two age groups, and time spent online per day is more than doubled over this age interval (Nordicom, 2013). Typical online activities among 9-year-old children are playing games and watching video clips. These activities are also found among 12-year-olds, but in addition there is a pronounced increase in time spent on social networking websites (Findahl, 2012).

Online advertising seems to quickly become a natural and persistent part of children's overall online experience. Interview studies of 9-year-old children show that, while online advertising is generally perceived as disturbing and confusing, these adverts are also largely tolerated and even sometimes consumed as entertainment (Martinez et al., 2013). However, at the same time these children often show a naive conception about the commercial and persuasive intent of online advertising, as well as a limited understanding of how online advertising uses visual cues to capture attention (Buijzen et al., 2010). Whereas older children naturally start to develop the necessary attentional mechanisms needed to shield off impinging visual stimuli from online adverts, these neural structures are still developing in younger children, causing them to react to salient adverts on an involuntary level (Kramer et al., 2005). Involuntary exposure to online advertising might be a problem during free entertainment-based web surfing (depending on whether adverts are age-appropriate, distinct, recognizable etc.), but such "forced exposure" could become much more problematic in the case of task-oriented internet use. In the latter scenario, online adverts could introduce a disruptive element in situations where children are trying to pursue goal-directed activities online. Should this be the case, it becomes urgent to safeguard young children's rights to equal opportunities online (Holloway et al., 2013).

In order to investigate younger children's cognitive sensitivity to online advertising, we believe it is crucial to take into account both individual factors such as age, gender and cognitive development, as well as the visual properties of online adverts. In this study we used the anti-saccade task to determine children's individual level of oculomotor control. This paradigm directly measures participants' voluntary control of their eye movements, and has been shown to correlate with cognitive functions such as executive control, working memory capacity and visual distractibility (Munoz and Everling, 2004; Kramer et al., 2005; Zanelli et al., 2005; Hutton and Ettinger, 2006). Several studies have also shown that this capacity to inhibit stimulus-driven, reflexive eye movements undergoes significant development throughout childhood, which is linked to increased development of the frontal lobe (Klein and Foerster, 2001; Eenshuistra et al., 2004). To our knowledge, there have been no attempts to measure children's visual distractibility in relation to their advertising exposure. In the present study, the motivation for measuring oculomotor control in two age-groups was that we intended to differentiate the effects of prefrontal control from other age-related effects, and that we wanted to find out if better gaze control is related to less advert distraction. The motivation for selecting age groups at 9 and 12 years was that previous research has shown that these ages represent clear developmental and cognitive stages in children's understanding of persuasive advertising content (Buijzen et al., 2010).

In a recent study, we analyzed low-level saliency features in internet adverts with regard to children's visual attention (Holmberg et al., 2013). In this study, a group of 9-year-old children were allowed to surf freely on their favorite websites, while eye movement data were collected along with real-time screen recordings of the web page stimuli. These screen recordings were used to quantify low-level saliency aspects in all adverts that the children encountered. Key findings of this exploratory study were: (1) that low-level saliency features such as motion (pixel change), luminance and edge density in online adverts had a positive correlation with children's visual attention, and (2) that children with low individual level of gaze control had an increased sensitivity to these saliency features. Other studies have focused on stimulus onset as a key component in low-level visual saliency. This research has shown that abrupt onset of visual stimuli has a powerful effect on attentional capture (Ludwig et al., 2008), and that such low-level factors can impair task performance by distracting attention and increasing cognitive load (Lavie, 2005). There is also some evidence of "high-level saliency features" and their effects on attention (Findlay and Walker, 1999). High-level saliency refers to visual features that become relevant depending on the subject's particular cognitive task (Malcolm and Henderson, 2010), and in order to avoid confusion we will refer to this kind of saliency as "task relevance." Currently, there is a fairly strong consensus that, while low-level visual features such as abrupt onset can account for some portion of people's eye movement behavior, task relevance is more powerful in explaining visual attention allocation (Foulsham and Underwood, 2008; Tatler et al., 2011).

The general research question of the present study was to assess the impact of internet adverts on children's internet use. Since internet use is a fairly broad concept including several types of interaction, we decided to focus further on one particular type of use case in which children interact with web pages in order to solve a predefined task involving memory and judgement. This type of web interaction should have a high intrinsic value to children, and should consequently be facilitated rather than hindered by the commercial online environments that children encounter. We reasoned that distraction caused by internet adverts would affect both the gaze behavior involved in the process of solving the tasks, as well as the task performance. In order to create experimental manipulations of the internet adverts, we utilized two aspects of visual saliency that are well-known in vision research: low-level visual features (Itti and Koch, 2000; Peters et al., 2005) and content relevance (Henderson, 2003). Low-level visual features were manipulated by varying the advert onset speed, and this feature was expected to distract the visual processing by attracting visual attention to the adverts (measured as saccades to ads). Advert relevance was manipulated by varying the level of task relevance in the advert content, and was expected to cause visual distraction by retaining attention on the adverts (measured as dwell time on ads).

Users' visual interaction with web pages and other interfaces will differ widely depending on the particular task that the interaction process is intended to solve (e.g., Yarbus, 1967; Cowen et al., 2002). The more viewers are allowed to decide their own subjective goals during visual interaction, the more these viewing patterns will vary between individuals (so-called free viewing conditions, e.g., Jansen et al., 2009). By contrast, if viewers are presented with a distinct and uniform task, viewing patterns will generally show much more similarities. In behavioral and psychological research the latter case usually means that it becomes easier to detect weak behavioral signals among random noise. It has been shown that in the absence of a task, viewing patterns become more influenced by low-level saliency features, and conversely, if a task is present, viewing patterns become more concentrated on task-relevant visual features (Hooge et al., 2005). An additional benefit from using a task-oriented experimental paradigm is that the visual interaction process can be evaluated in terms of performance, where some interaction strategies can be linked to better outcomes. In a free viewing task such evaluation of the interaction process is much more difficult. Finally, a task-oriented paradigm is also sensitive to the subject's individual level of expertise, and thus it is possible to isolate and estimate the positive effect of task expertise on solving a particular task (Jarodzka et al., 2010).

By constructing an experimental website that repeatedly presented children with a series of similar tasks, the present study has sought to benefit from all the positive aspects of task-oriented study designs previously mentioned. Thus we expected to find a high overall attentional focus on task relevant elements on the web pages, as well as a difference between younger and older children (caused by a higher level of internet expertise in the latter group). But more importantly, by using a task-oriented paradigm we expected to find differences in task performance depending on the advert saliency manipulations presented in each trial. Task performance is measured both through *performance measures* involving the accuracy and duration of each task response, and through several *distraction measures* describing the children's visual interaction with the web pages. Thus, our experiment allowed us to test the effects of advert saliency conditions on performance measures and on distraction measures respectively, and it also allowed us to explore possible links between these two kinds of measures. This is crucial since advert saliency might affect these measures differently, and in that case it is important to be able to capture these differential effects to get a correct understanding of the effects of advertising saliency on children's task-oriented internet use.

Two types of performance measures were assumed to be sensitive to advert saliency manipulations: *trial accuracy* and *trial duration*. We reasoned that the advert saliency conditions would distract the children and cause a lower ability to solve the tasks correctly (trial accuracy) and efficiently (trial duration). For the sake of simplicity, high trial accuracy and low trial duration are grouped together as high task performance in the hypotheses. We hypothesized directional effects on task performance caused by the following experimental factors:


Two types of distraction measures were constructed in order to capture effects of the advert saliency manipulations: *saccades to ads* and *dwell time on ads*. Saccades to ads measures the number of times visual attention has been shifted toward experimental adverts instead of objects relevant for solving the tasks. We reasoned that this measure would capture one important aspect of distraction: (1) the attention attracting power of ads. Dwell time on ads measures the actual amount of time spent on experimental adverts instead of other objects that are critical for solving the tasks, and we reasoned that this measure would capture a second crucial aspect of distraction: (2) the attention retaining power of ads. For simplicity, these two aspects of distraction are grouped together in the hypotheses. We hypothesized directional effects on distraction measures caused by the following experimental factors:


As can be deduced from the hypotheses listed above, the current study contains both correlational hypotheses (H1a and H2a) as well as more causal hypotheses associated with experimental manipulations (H1b, H1c, H2b, and H2c). This structure will also be reflected when presenting and discussing the results of the study.

#### **2. MATERIALS AND METHODS**

#### **2.1. PARTICIPANTS AND APPARATUS**

The participants were selected from two age groups, 9-year-olds (*n* = 19) and 12-year-olds (*n* = 26), and were recruited from an elementary school in the south of Sweden. The distribution was fairly equal between girls (*n* = 23) and boys (*n* = 22). Only children that were given parental consent participated in the study (*n* = 45). The data recording equipment consisted of an SMI RED 250 eye-tracking camera and a laptop computer (Intel Core i7 2.67 GHz CPU, 2.98 GB RAM). The laptop was used both for stimulus presentation and eye movement recordings, and was connected to the Internet through a wireless 4G router. Visual stimuli were presented on a 1680 × 1050 LCD monitor. The interactive web tasks were presented using the standard Internet Explorer 8 web browser. Eye-tracking data were recorded at 250 Hz using the SMI iViewX 2.7 software during all experimental modules.

#### **2.2. EXPERIMENTAL DESIGN AND MATERIALS**

A pre-test was administered to all children in the form of an anti-saccade test. After 4 practice trials, a series of 32 anti-saccade trials were presented to each participant. In each trial, a central fixation cross was replaced by a peripheral target, and participants were instructed to look in the opposite direction relative to the target location. The stimulus parameters of the anti-saccade test were chosen in accordance with recently suggested standards (Antoniades et al., 2013). Thus, the test was designed with no temporal gap between central fixation offset and target onset. The central fixation foreperiod was set to a duration of 1500–2000 ms, and the target duration was set to 1000 ms. After target offset, a blank screen was presented for 500 ms. Targets were presented in four randomized locations (top, bottom, left, and right) with an amplitude of ca 10° from the central fixation cross. To reduce fatigue in the children, the stimuli were constructed with a dark background.

Children in both age groups performed the exact same experiment, which consisted of 36 trials. Each trial consisted of a web-based visual search task, in which the participant was instructed to memorize a single image presented on an initial web page, and then proceed to a second web page to select the most similar image in an array of 12 images. On the second page, 3 similar but unique target images were presented, along with 9 unique distractor images with lower similarity (**Figure 1**). The experimental images were created by splitting an animated open source movie (Big Buck Bunny, © 2008, Blender Foundation) into separate frames, and image similarities were determined using the OpenCV histogram comparison algorithm (Bradski, 2000). Target images were selected from a high correlation coefficient interval (0.95 ≥ *r* ≥ 0.65), while distractor images were selected using a lower threshold (*r* ≤ 0.10). An important implication of this image similarity approach was that target images were never identical to the initial image, which added a cognitive component, since finding an optimal solution required the participants to perform an image similarity judgement. The initial image memorization phase was self-paced, while the second image selection phase introduced a 7000 ms delay before the web page allowed the participant to select an image and thus move to the next web task.
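The selection rule described above can be illustrated with a minimal sketch. It assumes that the histogram comparison was the correlation method (a Pearson correlation between histogram bins); the grayscale binning, the function names, and the label "unused" for values between the two thresholds are illustrative assumptions, not the authors' implementation.

```python
from math import sqrt

def histogram(pixels, bins=8, max_value=256):
    """Count grayscale pixel values (0..max_value-1) into equal-width bins."""
    counts = [0] * bins
    width = max_value / bins
    for p in pixels:
        counts[min(int(p / width), bins - 1)] += 1
    return counts

def histogram_correlation(h1, h2):
    """Pearson correlation between two histograms of equal length."""
    n = len(h1)
    m1, m2 = sum(h1) / n, sum(h2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(h1, h2))
    var1 = sum((a - m1) ** 2 for a in h1)
    var2 = sum((b - m2) ** 2 for b in h2)
    if var1 == 0 or var2 == 0:
        return 0.0  # a flat histogram has no defined correlation
    return cov / sqrt(var1 * var2)

def classify_candidate(r):
    """Apply the thresholds reported in the text to a correlation r."""
    if 0.65 <= r <= 0.95:
        return "target"
    if r <= 0.10:
        return "distractor"
    return "unused"  # hypothetical label: neither interval applies
```

Under this reading, a frame that correlates too strongly with the initial image (*r* > 0.95) is excluded just like one of intermediate similarity, which is what keeps targets from being identical to the memorized image.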

On the second image selection web page, an online advert was presented according to 9 saliency conditions. The low-level saliency conditions were operationalized as two levels of advert onset speed, which were implemented as animated GIF images. Each GIF animation consisted of a number of transitional frames between the advert image and a blank white image, and was presented at a frame rate of 10 fps. Smooth advert onset was created using 50 transitional frames and a 1000 ms pause, while abrupt onset speed was created using 2 transitional frames and a 3000 ms pause. The GIF animations were then looped in order to present the low-level saliency conditions continuously on the web pages during each trial. The onset speed manipulation gave the visual impression that the adverts disappeared and then reappeared softly or abruptly on the web pages (Supplementary Material). The onset speed factor also included a control condition, consisting of the static advert images. These low-level saliency conditions were then combined with three types of task relevance including a control condition, producing a total of 3 × 3 advert saliency conditions. The task relevance conditions were operationalized as two levels of task relevant pictorial content in adverts. Adverts in the low task relevance condition depicted system dialog windows and website login windows, while adverts in the high task relevance condition depicted mockup adverts that closely resembled the target pictures in tasks (**Figure 2**). The task relevance factor also included a control condition depicting irrelevant inanimate objects.

**FIGURE 1 | […] solve the online tasks.** The image array contains 3 target images and 9 distractor images in randomized positions. An experimental advert in the high task relevance condition is presented in the […] make the web pages more realistic, and was kept constant in all trials. The participant's eye movements during the trial are superimposed on the web page image.

The advert conditions were presented in randomized order during the web-based tasks. Each advert saliency condition was repeated four times, and advert positions were randomized between the four corners of the web page (top-left, top-right, bottom-left, and bottom-right). These positions were assumed to emulate typical advertising positions on real web pages. All adverts except those in the high task relevance condition were based on naturally occurring adverts found on websites frequently used by children in the current sample.
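The 3 × 3 design with four repetitions yields the 36 trials mentioned earlier. The following sketch shows one way such a trial list could be generated. The level names are illustrative, and the assumption that each condition appears once in every corner is ours; the text only states that positions were randomized between the four corners.

```python
import random
from itertools import product

ONSET_LEVELS = ["static", "smooth", "abrupt"]      # "static" is the control level
RELEVANCE_LEVELS = ["irrelevant", "low", "high"]   # "irrelevant" is the control level
POSITIONS = ["top-left", "top-right", "bottom-left", "bottom-right"]

def build_trials(seed=None):
    """Return 36 trials: 9 saliency conditions x 4 repetitions, in random order."""
    rng = random.Random(seed)
    trials = []
    for onset, relevance in product(ONSET_LEVELS, RELEVANCE_LEVELS):
        # Assumption: the four repetitions of a condition cover the four corners.
        corners = POSITIONS[:]
        rng.shuffle(corners)
        for corner in corners:
            trials.append({"onset": onset, "relevance": relevance, "position": corner})
    rng.shuffle(trials)  # randomize presentation order across conditions
    return trials
```

Passing a fixed `seed` makes the order reproducible, which is convenient when the same sequence has to be regenerated for analysis.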

#### **2.3. PROCEDURE**

Each child was first greeted and presented with a verbal outline of the web-based tasks to be performed. Careful consideration was taken to ensure that the children were kept naive about the focus on online adverts and the exact nature of our data recordings. Prior to all eye-tracking recordings the participants were calibrated using a 5-point calibration method available in the SMI iViewX software. Calibrations were done at an eye-monitor distance of ca 700 mm, and were repeated until the horizontal and vertical deviation was below 1° of visual angle. After the first calibration, each participant underwent a 9-plate Ishihara color vision test presented on screen (Hardy et al., 1945). Results of this test indicated that all participants had full color vision. After another calibration, an anti-saccade test was performed containing 4 initial test trials and 32 actual trials. A third calibration was then undertaken before a web browser loaded and presented the instructions for the web-based tasks. First, the participants were instructed on how to solve the tasks through a detailed verbal walk-through of two test trials. The participants were instructed to memorize an initial image for each task and then try to find and click on the most similar image on a second web page. No information was given about the number of target and distractor images. Instructions were given to complete each task as accurately as possible, rather than as quickly as possible. The participants were not given any information about the advert content accompanying each task, and thus they were not instructed to avoid any adverts. All participants received a movie ticket as reward for active participation in the study. When the data collection phase was finished, meetings were arranged with the children in order to inform them about the true purpose and methods of the experiment.
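To give a sense of scale for the calibration criterion above, visual angles can be converted to on-screen distances. The sketch below assumes a flat screen viewed head-on at the stated distance of ca 700 mm, and it assumes that the anti-saccade targets were viewed at the same distance; the function names are illustrative.

```python
from math import radians, tan

def extent_mm(angle_deg, distance_mm):
    """On-screen size that subtends angle_deg when centred on the line of sight."""
    return 2 * distance_mm * tan(radians(angle_deg) / 2)

def eccentricity_mm(angle_deg, distance_mm):
    """On-screen offset from the fixation point for a target at angle_deg."""
    return distance_mm * tan(radians(angle_deg))

# A 1 degree deviation at 700 mm corresponds to roughly 12 mm on the screen,
# and a 10 degree target amplitude to roughly 123 mm from the fixation cross.
calibration_limit = extent_mm(1.0, 700.0)
target_offset = eccentricity_mm(10.0, 700.0)
```

Converting these distances to pixels would additionally require the physical dimensions of the monitor, which are not reported here.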

#### **2.4. DATA ANALYSIS**

The overall quality of the eye-tracking data was calculated as the average deviation between the calibrated point of regard (POR) and 4 validation points. The average horizontal and vertical deviation was 0.75° and 0.92° respectively. The amount of missing samples (including blinks) in the anti-saccade data was 12.6%. These quality measures were only calculated for the anti-saccade dataset, but they should generalize to the dataset for internet use as well, since the exact same calibration procedure was applied in both cases.

Eye movement data from the anti-saccade test were analyzed by using the Engbert and Kliegl algorithm in order to detect the first saccade in each trial (Engbert and Kliegl, 2003). A minimum saccade duration of 32 ms was provided as a parameter for the detection algorithm. The first saccades were then analyzed for latency, peak velocity and direction relative to target position using a second algorithm (Ahlström et al., 2013). Saccade latency was calculated using a minimum latency parameter of 0.08 ms, peak velocity was calculated using a maximum saccade velocity parameter of 1000°/s. Anti-saccades were categorized binomially as correct if they were terminated within a 45° angle in the opposite direction of the target location. Only the total proportion of correct anti-saccades for each participant was used for further analysis, as this construct was considered to be the most valid measure of gaze control.
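The direction criterion and the per-participant score can be sketched as follows. This is a simplified illustration, not the algorithm cited above: it reads "within a 45° angle" as a tolerance of ±45° around the direction opposite the target, uses screen coordinates with *y* growing downward, and all names are hypothetical.

```python
from math import atan2, degrees

# Unit vectors for the four target locations (screen y grows downward).
TARGET_DIRECTIONS = {"left": (-1, 0), "right": (1, 0), "top": (0, -1), "bottom": (0, 1)}

def angular_difference(v1, v2):
    """Smallest absolute angle in degrees between two 2-D vectors."""
    a1 = degrees(atan2(v1[1], v1[0]))
    a2 = degrees(atan2(v2[1], v2[0]))
    diff = abs(a1 - a2) % 360
    return min(diff, 360 - diff)

def is_correct_antisaccade(start, end, target, tolerance_deg=45.0):
    """True if the saccade from start to end points away from the target."""
    tx, ty = TARGET_DIRECTIONS[target]
    opposite = (-tx, -ty)
    saccade = (end[0] - start[0], end[1] - start[1])
    if saccade == (0, 0):
        return False  # no displacement, so no direction to evaluate
    return angular_difference(saccade, opposite) <= tolerance_deg

def proportion_correct(trials):
    """Share of correct anti-saccades among (start, end, target) tuples."""
    if not trials:
        return 0.0
    correct = sum(is_correct_antisaccade(s, e, t) for s, e, t in trials)
    return correct / len(trials)
```

The value returned by `proportion_correct` corresponds to the per-participant gaze control score that enters the later analyses.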

Behavioral data from the children's task-oriented internet use were analyzed in two major steps. In the first step, the two performance measures were analyzed. Trial accuracy was determined by analyzing mouse click responses recorded by SMI Experiment Center 3.2 and encoding these responses as a binomial variable depending on whether the tasks had been solved correctly by clicking on one of the target images. The second product measure, trial duration, was also analyzed in this step by recording the time difference in milliseconds between trial onset (when the task web pages were loaded) and the participants' response (when the mouse click was used to solve the task). Since the SMI software logged the timing of these events, we could control for variable network latencies in the web-based stimulus presentation.

In the second step, the eye movement data from each trial were extracted and eye movement events such as fixations, saccades and blinks were detected using the SMI BeGaze 3.2 software. These event-detected eye movement data were then used to calculate the two distraction measures in relation to the area of interest (AOI) corresponding to the adverts. Dwell time on ads was calculated by adding all fixation durations on the experimental ads for each trial. This AOI-based measure is better known as *total dwell time* in the eye-tracking literature (Holmqvist et al., 2011). The function of this measure is often to provide a close approximation of the total amount of visual attention devoted to a specific region in the visual field. Saccades to ads were calculated by counting the number of saccades that originated outside the pixel coordinates of the experimental advert AOIs, and terminated inside this same region (a variation of the more common *number of saccades* and *number of transitions* measures) (Holmqvist et al., 2011). The names of the distraction measures were chosen in order to clearly contrast the functional difference between fixations and saccades. Thus, saccades to ads were assumed to measure the ads' attention attracting power, while dwell time on ads was assumed to measure their attention retaining power (Born and Kerzel, 2008).
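The two distraction measures can be expressed compactly once fixations and saccades have been detected. The sketch below assumes a rectangular AOI given as pixel bounds and a simple dictionary layout for events; both are illustrative assumptions and do not reflect the export format of the SMI software.

```python
def in_aoi(point, aoi):
    """True if an (x, y) point lies inside a (left, top, right, bottom) rectangle."""
    x, y = point
    left, top, right, bottom = aoi
    return left <= x <= right and top <= y <= bottom

def dwell_time_on_ad(fixations, aoi):
    """Sum the durations (ms) of all fixations located inside the advert AOI."""
    return sum(f["duration_ms"] for f in fixations if in_aoi((f["x"], f["y"]), aoi))

def saccades_to_ad(saccades, aoi):
    """Count saccades that start outside the advert AOI and land inside it."""
    return sum(
        1 for s in saccades
        if not in_aoi(s["start"], aoi) and in_aoi(s["end"], aoi)
    )
```

Saccades that both start and end inside the AOI are deliberately not counted, so that the second measure reflects entries into the advert region rather than movements within it.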

#### **3. RESULTS**

#### **3.1. CHILDREN'S GAZE CONTROL**

The children's individual level of gaze control was measured with an anti-saccade test in the beginning of the experiment. The proportion of trials containing valid eye movement data was high (92.2%). However, the proportion of correct responses was low, indicating that the children had difficulties inhibiting saccades toward the distractor, and saccading in the opposite direction at target onset. The average proportion of correct saccades was 0.23 for 9-year-olds and 0.45 for 12-year-olds. The overall proportion of correct anti-saccades was 0.36, which is considerably lower than what would be expected in an adult population in a similar task. Success rates around 80% have been reported for adults in recent large-scale studies (Hutton and Ettinger, 2006). Conversely, saccade latencies in the child sample were longer than what would typically be expected among adults. In correct anti-saccade trials, the average saccade latency was 409 ms among 9-year-olds and 325 ms among 12-year-olds (overall 344 ms), while the same latency measure among adults typically lies around 200 ms (Holmqvist et al., 2011). Although we report saccade latency in the current study, only the individual proportion of correct anti-saccades was used as an independent variable in the statistical analyses. The reason for this is that the latter measure seems to have a higher validity with regard to children's gaze control.

#### **3.2. PERFORMANCE AND DISTRACTION MEASURES**

Task performance was measured as two product measures, trial accuracy (whether the task was answered correctly or not) and trial duration (the time taken to provide a solution to the tasks). All 36 trials contained valid performance data for all 45 participants. The overall trial accuracy was high (95.9% correct), but there were significant differences between 9-year-olds (93.4%) and 12-year-olds (97.6%), as well as between boys (95.1%) and girls (96.6%). Looking at trial duration, the average time to complete a task was about 11.6 s (11558 ms). There was no significant difference in trial duration depending on age, but girls were about one second faster than boys on average. Trial number had a significant negative effect on trial duration, but no effect on trial accuracy, meaning that the children became faster at solving tasks toward the end of the experiment. **Figure 3** shows the effect of children's age and gender on task performance.

The distraction measures used in this study were dwell time on ads (attention retention) and saccades to ads (attention attraction). All 36 trials contained valid eye movement data for all 45 participants. The average fixation time on ads was just over half a second (654 ms), with no significant differences depending on age or gender. However, there was a significant negative effect of trial number, meaning that children tended to spend less time on experimental adverts toward the end of the experiment. The average number of saccades to ads was just over one saccade (1.13), and there were no significant differences depending on children's age or gender. As in the case of the previous distraction measure, there was a significant negative effect of trial number, which would indicate that the children became less prone to behavioral distractions over the course of the experiment (as well as more proficient in solving the tasks). **Figure 4** shows the effect of advert saliency conditions on task distraction.

#### **3.3. EFFECTS ON TASK PERFORMANCE MEASURES**

We hypothesized that children's task performance would depend on individual factors as well as advert saliency conditions. More specifically, our hypotheses were that trial accuracy and trial duration could be described as a function of subject age and gaze control (H1a), level of advert onset speed (H1b), and level of advert task relevance (H1c). To test these hypotheses, the dataset was analyzed using linear mixed models in which the unique identifier of the experimental adverts was treated as a random factor (using the lme4 package in R). Subject was not entered as a random factor, since the gaze control variable also contained values that were unique for each participant. Fitting the data to these multi-level models provided partial support for our hypotheses regarding trial accuracy, but only weak support regarding trial duration. In the case of trial accuracy, all individual factors proved to have significant effects. Thus, older children as well as children with better gaze control were able to solve the tasks significantly more accurately, which gives support for hypothesis H1a. In the case of trial duration, the only significant effect was associated with male gender. Thus, boys generally required more time to solve the tasks. The advert saliency conditions did not seem to have a negative impact on trial accuracy or trial duration, and thus hypotheses H1b and H1c failed to gain support. Taken together, the evidence suggests that individual factors had an effect on one aspect of task performance (trial accuracy), while neither advert onset speed nor advert task relevance had any significant impact on task performance.
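The structure of these models can be sketched as follows. This is an illustrative specification under stated assumptions, not the exact model that was fitted: it assumes a random intercept for each advert, dummy-coded saliency conditions contrasted against the control condition, and a Gaussian error term. The link function used for the binary accuracy outcome and the coding of the predictors are not reported in this section. For an outcome $y_{cj}$ from child $c$ on the trial containing advert $j$:

```latex
y_{cj} = \beta_0
       + \beta_{\text{age}}\,\mathrm{age}_c
       + \beta_{\text{gender}}\,\mathrm{gender}_c
       + \beta_{\text{gaze}}\,\mathrm{gaze}_c
       + \boldsymbol{\beta}_{s}^{\top}\mathbf{s}_{cj}
       + \boldsymbol{\beta}_{z}^{\top}\mathbf{z}_{cj}
       + u_j + \varepsilon_{cj},
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2),
\quad \varepsilon_{cj} \sim \mathcal{N}(0, \sigma^2)
```

Here $\mathbf{s}_{cj}$ holds the indicators for the advert saliency conditions (onset speed and task relevance levels), $\mathbf{z}_{cj}$ holds control variables such as advert position and trial number, and $u_j$ is the advert-specific random intercept. The symbols are introduced for this sketch only.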

**Table 1** shows how advert saliency conditions and individual factors affected children's task performance. Task performance was divided into trial accuracy and trial duration, and the same independent variables were then used to model effects on both these performance measures. Tables for these performance measures are shown side by side. The coefficients and *p*-values for each independent variable are shown in the order they were entered. The advert saliency conditions consisted of three levels, and the effects of these conditions were tested against the control condition in the intercept. The level of multicollinearity between independent variables was low. In order to describe the model fit of the independent variables, the deviance of the proposed models was compared to the deviance of unconditional null models which included only the intercept and the random factor as independent variables. The proposed models and their corresponding null models were compared using chi-square tests, which showed that the independent variables contributed significantly to explaining the observed variance in trial accuracy and trial duration. Since the proposed models were used for hypothesis testing rather than modeling the best combination of predictors, no further attempts were made to optimize the models by excluding non-significant independent variables.
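The deviance comparison described above corresponds to a standard likelihood-ratio test. As a sketch, writing $D = -2\ell$ for the deviance of a model with maximized log-likelihood $\ell$, and $p$ for its number of estimated parameters (notation introduced here for illustration):

```latex
\chi^2 = D_{\text{null}} - D_{\text{model}}
       = -2\left(\ell_{\text{null}} - \ell_{\text{model}}\right),
\qquad \mathrm{df} = p_{\text{model}} - p_{\text{null}}
```

Under the null hypothesis that the added independent variables have no effect, this statistic is asymptotically chi-square distributed with $\mathrm{df}$ degrees of freedom. For mixed models that differ in their fixed effects, such comparisons are usually based on maximum likelihood fits rather than restricted maximum likelihood fits. Whether this was done explicitly here is not stated.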

#### **3.4. EFFECTS ON TASK DISTRACTION MEASURES**

We also hypothesized that individual factors and advert saliency conditions would have distractive effects on children's gaze behavior while processing the tasks. The distraction measures that we analyzed in this study were: (1) dwell time on ads, and (2) saccades to ads, and we hypothesized that these measures would be sensitive to subject age and gaze control (H2a), level of advert onset speed (H2b), and level of advert task relevance (H2c). To test these hypotheses, additional linear mixed models were constructed using the lme4 package in R, in which adverts were treated as a random factor. As in the previous models, subject was not entered as a random factor, since the gaze control variable also contained values that were unique for each participant. Fitting the data to these multi-level models provided strong evidence for our hypotheses concerning both task distraction measures. Advert onset speed and advert task relevance were associated with increases in both dwell time on ads and saccades to ads. Thus, higher levels of advert saliency caused increased attentional retention as well as increased attention attraction in children, which provides support for H2b and H2c. Overall, there was a significant decrease in both distraction measures among children with better gaze control, which gives partial support for H2a. Contrary to H2a, the results for dwell time on ads show that older children spent significantly more time on experimental ads than younger children, but no such effect was detected in saccades to ads. Children's gender did not have any significant effects on distraction measures. Taken together, this evidence suggests that advert saliency conditions had a stronger effect on task distraction measures than individual factors, but better gaze control in children was associated with less distraction.

**Table 2** shows how advert saliency conditions and individual factors affected children's task distraction. Task distraction was divided into dwell time on ads and saccades to ads, and the same independent variables were then used to model effects on both these distraction measures. Tables for these distraction measures are shown side by side. The coefficients and *p*-values for each independent variable are shown in the order they were entered. The level of multicollinearity between independent variables was low. In order to describe the model fit of the independent variables, the deviance of the proposed models was compared to the deviance of unconditional null models in which all independent variables were excluded except the random factor. The proposed models and their corresponding null models were compared using chi-square tests, which showed that the independent variables contributed significantly to explaining the observed variance in both distraction measures. Since the proposed models were used for hypothesis testing rather than modeling the best combination of predictors, no further attempts were made to optimize the models by excluding non-significant independent variables.

#### **Table 1 | Effects of independent variables on task performance measures (trial accuracy and trial duration).**

*Significant effects are emphasized with bold face. Child gaze control values have been centered.*

#### **Table 2 | Effects of independent variables on task distraction measures (dwell time on ads and saccades to ads).**

*Significant effects are emphasized with bold face. Child gaze control values have been centered.*

#### **4. DISCUSSION**

We have tested the effects of advert saliency conditions on children's internet use while controlling for individual factors. The reported effects are a result of fitting observational data to the statistical model specified by our hypotheses. The main findings on children's task-oriented internet use are as follows: (1) Individual factors such as age, gender and level of gaze control have clear effects on both performance measures as well as distraction measures associated with solving the tasks; (2) Advert onset speed and advert task relevance only have a marginal effect on task performance, but have a clear effect on task distraction. A possible interpretation of these results is that children between 9 and 12 years of age are sensitive to advert saliency conditions on a behavioral level, but are still able to compensate for (or cope with) this distraction on a higher cognitive level, and consistently produce accurate responses during task-oriented internet use.

#### **4.1. INDIVIDUAL FACTORS AND TASK-ORIENTED INTERNET USE**

When focusing on task-oriented internet use in relation to individual differences, a general pattern emerges revealing that individual factors tend to have a more profound impact on performance measures such as trial accuracy and trial duration (supporting H1a), than on distraction measures such as dwell time on ads and saccades to ads (disproving H2a). This difference is seen most clearly when looking at the gender variable, which shows that male gender affects both trial accuracy and trial duration negatively, whereas gender does not have any significant effects on distraction measures. In other words, boys had more difficulty solving the tasks than girls, and boys also needed more time to complete the tasks. However, in terms of distraction measures, boys and girls showed no differences. Looking at the age factor, the results give partial support for our hypotheses in that older children were associated with higher scores on trial accuracy (supporting H1a), but contrary to our expectations, older children were also associated with a significant increase in fixations on adverts (disproving H2a). Thus, older children unsurprisingly performed better than younger children on task accuracy, but children in the older age group also spent more time looking at the adverts. A possible interpretation of this pattern would be that older children have developed a better working memory, enabling them to engage in longer "detours" of attentional distraction, while still keeping track of the task at hand and producing accurate answers.

Still looking at individual factors, the strongest predictor of task performance and task distraction was not age or gender, but gaze control. In this study, gaze control was measured as children's ability to inhibit reflexive eye movements in an anti-saccade task. High scores on gaze control were clearly associated with higher scores on task accuracy (supporting H1a) and lower scores on both distraction measures (supporting H2a). The implication of these results is that children with better gaze control are more able to focus on the actual web-based task at hand while avoiding being distracted by salient internet adverts in the periphery. This interpretation fits nicely with other psychological research that has found strong positive correlations between gaze control and cognitive functions, e.g., working memory (Eenshuistra et al., 2004). In line with the experimental design of the current study, we have chosen to investigate the age, gender and gaze control factors independently with regard to the dependent measures. The evidence suggests that children's individual level of gaze control plays an important role as a predictor of task performance and advert distraction. However, these results open up other interesting lines of research in which the combined effects of these individual factors could be studied more carefully. Such a research direction could allow us to pinpoint various sub-groups among children that are particularly sensitive to advert saliency. For example, gaze control might develop differently in boys and girls, and by examining an interaction between age and gender with regard to gaze control, vulnerable sub-groups might be identified. Also, the contribution of motivational factors to task performance should be addressed in future research.

#### **4.2. ADVERT SALIENCY AND TASK-ORIENTED INTERNET USE**

Contrary to our expectations, advert saliency conditions did not appear to affect performance measures in this study (disproving H1b and H1c). However, advert position, which was entered as a control variable, had a significantly positive effect on trial accuracy. This positive effect on task performance was associated with the bottom-right advert position, which was compared to the top-left position. The implication is that adverts that were placed in the top-left corner of the web page were associated with significantly more errors in trial accuracy, irrespective of advert saliency condition. This result applies directly to previous research on advertising effects where advert position has generated inconsistent results, mostly because this property has been difficult to control for (Gidlöf et al., 2012). The present study can therefore conclude that adverts placed in the top-left corner of the web page are associated with a strong detrimental effect on task performance and a strong distractive effect in terms of total dwell time on adverts. A plausible explanation as to why the top-left advert position has a detrimental effect on web page interaction could be that this position tends to coincide with the typical starting position when reading text or when initiating a visual search task (Zelinsky, 1996).

In accordance with the expectations of this study, the advert saliency conditions proved to have strong effects on distraction measures (supporting H2b and H2c). Focusing first on advert onset speed and low-level visual features (H2b), our evidence suggests that abrupt onset speed in adverts caused a significant increase in dwell time on ads, as well as significantly more saccades to ads. That is, abrupt and dynamic visual features of internet adverts affect children's task-oriented internet use by causing distraction from the task, both in terms of attention retention and in terms of attention attraction. Slightly curiously, when investigating the effects of smooth advert onset speed the results tend to run counter to the hypothesized scenario. Thus, this dynamic visual feature is actually associated with less dwell time on ads and significantly fewer saccades to ads compared to the static control condition. Lower scores on these measures would mean that smooth advert onset speed causes reduced task distraction. To find these diametrically opposed behavioral effects caused by different levels of the same saliency factor (advert onset speed) is puzzling, and consequently this effect should be investigated further. One interpretation would be that smooth advert onset speed allows children to identify and avoid advert content through peripheral vision, while abrupt onset speed exerts a more coercive stimulus on the visual system, causing children to saccade toward the ads and also to fixate on the ads for an extended period of time. Recapitulating the arguments presented in the introduction, abrupt onset speed in adverts could represent a concrete example of "forced exposure" reported by previous internet advertising research done with children. Since advert onset speed is an objective and quantifiable visual feature, it seems to be an advertising property that could effectively be regulated in order to facilitate children's interaction with websites.

Changing focus to advert task relevance (H2c), the results show that this type of saliency causes more consistent and unequivocal effects on the distraction measures utilized in this study. Thus, both levels of advert task relevance were associated with significantly more dwell time on ads as well as significantly more saccades to ads. That is, pictorial advert content that is relevant to the task in some sense affects children's task-oriented internet use by causing distractions from the task, both in terms of attention retention and in terms of attention attraction. Compared to the onset speed factor, advert task relevance thus appears to have a more powerful and detrimental effect on children's visual interaction with the web pages. These results are consistent with previous research on visual saliency, which has argued that task relevance has a more profound effect on gaze behavior and attention (Findlay and Walker, 1999). The downside of investigating task relevance is that these visual properties are considerably more difficult to define compared to low-level visual features. In the current study, low task relevance was operationalized as pictorial content that consisted of fake dialog windows, while high task relevance consisted of pictorial content that was similar to the target images that the children were solving the tasks for. The effects of these advert conditions were compared to pictorial content in a supposedly task-irrelevant control condition. The problem with fake dialog windows and irrelevant advert content is that the subjective relevance of these types of pictorial content is difficult to control for, since to some extent it depends on individual interests. Notwithstanding, the high task relevance condition had a more objective implementation in this experiment, and since this condition caused the strongest behavioral effects, we argue that we have successfully managed to document the effects of task relevance in internet adverts.
Because of the difficulties in defining task relevance, it would probably be harder to regulate this aspect of internet adverts in order to facilitate children's internet use. The effects of task-relevant advert content could motivate further research into so-called behavioral targeting in advertising, in which web interaction metrics are collected in order to serve up more relevant adverts to users.

From a research perspective, it would be fruitful to develop the experimental paradigm used in the current study to include tests of other types of visual saliency in internet ads. We think that our study design offers a robust combination of ecological validity and experimental control that is well-suited for obtaining reliable and valid behavioral data on children's task-oriented internet use. Consequently, this design could easily be extended to investigate other eye movement measures such as fixation durations and blink rate. Developing our research in this direction would allow us to address questions about cognitive load in children when engaged in task-oriented internet use in commercial online environments. A limitation of the present study was the high overall scores on trial accuracy, which could indicate a ceiling effect on this measure. If the web-based tasks were too easy to solve accurately, then there would be less need for the children to compensate for attentional distractions in order to achieve high task accuracy. In order to address the cognitive relationship between task distraction and task performance more thoroughly, we recommend increasing the task difficulty or limiting the trial duration.

From a policy and regulation perspective on online advertising, it is important to take these new findings into consideration when discussing possible restrictions of ads directed to children. Children in different age groups have repeatedly given verbal statements of how annoying, disturbing, and irritating online advertising is during their daily internet activities (Sandberg et al., 2011). Our study has provided empirical evidence that demonstrates how children's task-oriented internet use is disturbed by advertising saliency factors as well as advert positions. However, the children seem to cope with this distraction by adjusting their responses to accommodate coercive advertising features, and thus manage to compensate to some degree for these visual demands when involved in task-oriented internet use. In other words, children's task performance is adequate, but their experience of online advertising might be strenuous, especially for children who suffer from poor gaze control.

#### **ACKNOWLEDGMENTS**

Thanks to the participants of the eye-tracking seminar at Lund University for providing helpful critique of the manuscript. Thanks to Joost van de Weijer at the Department of Linguistics, Lund University, for helping out with the statistics.

#### **FUNDING**

This research was supported by the Crafoord Foundation (grant 20100899) and the Swedish Research Council (grant 421-2010-1982).

#### **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/Journal/10.3389/fpsyg.2014.00051/abstract

#### **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 01 November 2013; accepted: 15 January 2014; published online: 12 February 2014.*

*Citation: Holmberg N, Sandberg H and Holmqvist K (2014) Advert saliency distracts children's visual attention during task-oriented internet use. Front. Psychol. 5:51. doi: 10.3389/fpsyg.2014.00051*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology.*

*Copyright © 2014 Holmberg, Sandberg and Holmqvist. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## The influence of banner advertisements on attention and memory: human faces with averted gaze can enhance advertising effectiveness

#### *Pitch Sajjacholapunt <sup>1</sup> and Linden J. Ball <sup>2</sup> \**

*<sup>1</sup> Department of Psychology, Lancaster University, Lancaster, UK*

*<sup>2</sup> School of Psychology, University of Central Lancashire, Preston, UK*

#### *Edited by:*

*Jaana Simola, University of Helsinki, Finland*

#### *Reviewed by:*

*Pia Knoeferle, Bielefeld University, Germany Jaana Simola, University of Helsinki, Finland*

#### *\*Correspondence:*

*Linden J. Ball, School of Psychology, University of Central Lancashire, Darwin Building, Preston PR2 1HE, UK*

*e-mail: lball@uclan.ac.uk*

Research suggests that banner advertisements used in online marketing are often overlooked, especially when positioned horizontally on webpages. Such inattention invariably gives rise to an inability to remember advertising brands and messages, undermining the effectiveness of this marketing method. Recent interest has focused on whether human faces within banner advertisements can increase attention to the information they contain, since the gaze cues conveyed by faces can influence where observers look. We report an experiment that investigated the efficacy of faces located in banner advertisements to enhance the attentional processing and memorability of banner contents. We tracked participants' eye movements when they examined webpages containing either bottom-right vertical banners or bottom-center horizontal banners. We also manipulated facial information such that banners either contained no face, a face with mutual gaze or a face with averted gaze. We additionally assessed people's memories for brands and advertising messages. Results indicated that relative to other conditions, the condition involving faces with averted gaze increased attention to the banner overall, as well as to the advertising text and product. Memorability of the brand and advertising message was also enhanced. Conversely, in the condition involving faces with mutual gaze, the focus of attention was localized more on the face region rather than on the text or product, weakening any memory benefits for the brand and advertising message. This detrimental impact of mutual gaze on attention to advertised products was especially marked for vertical banners. These results demonstrate that the inclusion of human faces with averted gaze in banner advertisements provides a promising means for marketers to increase the attention paid to such adverts, thereby enhancing memory for advertising information.

**Keywords: online banner advertising, human faces, eye tracking, gaze cues, averted gaze, mutual gaze, memory, advertising effectiveness**

#### **INTRODUCTION**

The Internet World Statistic Report (2012) indicated that nearly 2 billion people were using the Internet in 2011, compared to 360 million in 2000. This major growth in Internet usage has been paralleled by an exponential increase in online advertising, with investment reaching an estimated 31 billion dollars in 2011, surpassing that of advertising via cable and broadcast television (Internet Advertising Revenue Report, 2012). Website advertisements include pop-ups, videos and on-site sponsorship (Schumann and Thorson, 2007), but it is the simple banner advertisement that appears to be the most enduring format subsequent to its initial appearance in 1994 (Cho, 2003). Banner advertisements arise in various rectangular-shaped graphics, including skyscrapers (120 × 600 pixels), squares (250 × 250 pixels), large rectangles (336 × 280 pixels) and vertical rectangles (240 × 400 pixels). Color, animation, and interactivity are often included in the advertisement in an attempt to capture attention, with the interactivity element also providing a way to track user interest (Zeff and Aronson, 1999). Ultimately, the popularity of banner advertisements appears to derive from their considerable flexibility and targetability as devices for marketing products and brands.

Despite the dominance of banner advertisements in Internet advertising, their effectiveness remains debatable. Benway and Lane (1998) demonstrated that web users tend to avoid looking at such advertisements even when they are designed to be attention-grabbing—a phenomenon referred to as "banner blindness." More recent research has emphasized the importance of quantifying the effectiveness of banner advertisements: (1) by using metrics derived from eye-movement tracking, which can indicate overt attentional shifts to such advertisements; and (2) through tests of people's memory for banner contents. Using such measures, Drèze and Hussherr (2003) showed that web users fixate more on banner advertisements that are relevant to their goal-directed searches, which also leads to an increase in the memorability of those advertisements. They also found that banner advertisements were more effective when placed vertically to the left or right of webpages, as opposed to horizontally at the top or bottom of webpages.

Drèze and Hussherr's (2003) results also resonate with recent findings from a series of eye tracking experiments reported by Simola et al. (2011), which examined the attentional impact of salient advertisements placed simultaneously at two locations on authentic webpages: above a central portion of text and to the right of the text. Results indicated that banner advertisements attracted overt attention (as indexed by eye fixations on the advertisement) and that such attentional capture was especially marked when the advertisements were vertical and to the right of the webpage text. Simola et al. propose that this effect is a likely consequence of Western readers having a perceptual span that is highly biased toward the right of fixation (i.e., around 15 letters) rather than to the left of fixation (only around 3–4 letters; see Rayner, 1998). Simola et al. (2011) also found that right-located vertical banners were particularly attention demanding either when they contained animated features that contrasted with static horizontal advertisements appearing simultaneously at the top of the page or when they appeared abruptly after a random time interval.

The findings arising from the research of Drèze and Hussherr (2003) and Simola et al. (2011) pose a problem for investment in online advertising given that horizontal banners are far more prevalent on websites than vertical banners (Hussain et al., 2010), whereas it is vertical banners located to the right of webpage text that appear to have a greater capacity to capture attention (see Simola et al., 2013, for similar evidence from a study of attention and memory for newspaper advertisements). It is also interesting to reflect on the issue of banner location in the light of Nielsen's (2006) research, which has shown that web users normally extract information from webpages in an F-shaped pattern: they start off looking at page elements from the top left to the top right, they then read down the page slightly, again from left to right, and finally continue to fixate downward on the left side of the page. This F-shaped reading pattern would suggest that it should be components that are placed at the bottom center and the bottom right of a webpage that are most likely to be *overlooked* (see also Djamasbi et al., 2010). However, Simola et al.'s (2011) findings raise the possibility that attentional capture to banner advertisements may be effective even for vertical advertisements located to the bottom-right of pages in cases where such advertisements are co-located alongside webpage text. Admittedly, this proposal has not yet received empirical support since Simola et al.'s experiments only involved vertical banners that extended well above the half-way point of webpages. The present research therefore aimed to address the banner location issue by manipulating the position of banners on webpages such that they appeared either at the bottom-right of presented webpages in a position adjacent to the centrally-located text or at the bottom-center of webpages.
According to Nielsen's (2006) F-shaped reading pattern it would be expected that both of these banner locations would be equally poor for attention capture. In contrast, Simola et al.'s (2011) findings lead to the prediction that the vertical banners (located bottom-right) should be associated with increased attentional capture relative to the horizontal banners (located bottom-center).

Clearly, any on-line advertisements that fail to capture or hold a viewer's attention will generally be ineffective in instilling product knowledge or brand awareness (Keller and Lehmann, 2006; Maughan et al., 2007). This is why advertisers have become increasingly interested in ways to augment the attention-grabbing capacity of on-line advertisements using techniques such as animation or their abrupt appearance, which can drive attention in a "bottom-up," data-driven manner. However, there is also evidence that web users are able to exercise strategic, top-down control of attention such that they can override bottom-up attentional capture arising from salient low-level information such as motion (e.g., see Burke et al., 2005). In addition, there is evidence that having to exercise such top-down control leads web users to report negatively about their website experience, claiming higher perceived workload and a greater sense of irritation and distraction (e.g., Zhang, 2000; Gao et al., 2004; Burke et al., 2005). The negative effects of animated advertisements on the experience of web users mean that advertisers are continually examining new and more subtle ways to design banner advertisements that may have a facilitatory impact on people's attention allocation and memory without being annoying. One factor that Wedel and Pieters (2007) suggest needs far greater research in online advertising contexts is the role of the human face, which may be able to draw a viewer's attention to banner advertisements and the content therein. The present research aimed to investigate the capacity of human faces to capture attention to banner advertisements and thereby to facilitate enhanced memory for banner contents. We examined this issue in conjunction with assessing the banner location factor that we have already discussed.

Returning to the potential role of facial images in cueing attention we note that faces are considered to be uniquely potent stimuli for attracting visual attention owing to their social importance for understanding others' characteristics, personalities, intentions, and emotions (e.g., Emery, 2000; Vuilleumier and Schwartz, 2001). Evidence from neuroimaging (e.g., Kanwisher et al., 1997) indicates that face perception is underpinned by specialized neural systems (Tsao and Livingstone, 2008), whilst behavioral data show that when faces are presented in a visual scene along with other stimuli they capture a viewer's attention more readily than do the other objects (Vuilleumier, 2000; Ro et al., 2001). Indeed, Langton et al. (2008) found that when participants were asked to search for a target object (images of butterflies) in the presence of an irrelevant image of a human face they found the face distracting. These findings suggest that faces might well serve as a powerful means for attracting and holding a viewer's attention in an online advertising context.

Eye-movement studies of face processing (e.g., Althoff and Cohen, 1999) have clarified that people spend more time viewing internal features of faces (i.e., the eyes, nose, and mouth) than external features (i.e., hair, ears, and face contours). Indeed, many studies have shown that the eyes are the most attended facial region and are the most valuable source of information for social communication (e.g., through the portrayal of emotion and thought) and for directing the attention of others (e.g., Henderson et al., 2005; Frischen et al., 2007; Itier and Batty, 2009). If the position of the dark iris is observed to be in the middle of the white sclera, then people perceive this gaze as looking straight at them (i.e., "direct" or "mutual" gaze). In contrast, if the dark iris is situated to the left or the right of the sclera, thereby creating a large visible area of white, then the observer perceives this as "averted" gaze (Itier and Batty, 2009).

Research has indicated that mutual gaze is more efficient in capturing attention than averted gaze (e.g., Senju et al., 2005; Conty et al., 2006; Frischen et al., 2007), whilst other studies have shown that averted gaze conveys to an observer that the person being observed (we subsequently use the term "model") is paying attention to a particular object or location that follows their direction of gaze (Baron-Cohen, 1995; Emery, 2000). Such averted gaze can thereby have an impact on orienting the focus of an observer's attention such that both the model and the observer pay attention to the same location or object and engage in "joint attention" (Itier and Batty, 2009). Evidence also supports the view that the gaze cueing that leads to joint attention arises rapidly and reflexively (e.g., Friesen and Kingstone, 1998; Hood et al., 1998). Although many studies have addressed the reflexive, orienting effects of averted gaze cueing in naturalistic situations with actual people present, there are also numerous studies that have used photographic depictions of a human face presented centrally to an observer (Driver et al., 1999; Langton and Bruce, 1999; Vuilleumier, 2002; Mansfield et al., 2003). This research has again shown reflexive shifts of attention with photographic images depicting averted gaze (see also Ricciardelli et al., 2002; Mansfield et al., 2003), confirming that photographic images of faces can drive a stimulus-driven orienting response toward a gazed-at location that cannot be suppressed.

The aforementioned findings suggest that placing an image of a face within a banner advertisement with the face depicting averted gaze might serve as an effective trigger for capturing and orienting a viewer's attention toward advertised information, despite the presence of other stimuli on the webpage. To examine this possibility the experiment we report below manipulated the presence vs. absence of faces within banner advertisements and also examined the issue of whether mutual gaze vs. averted gaze might differentially impact the level of attention to advertised information. Based on existing evidence we would predict that faces involving mutual gaze would lead viewers to pay more attention to the model's face itself rather than to the text or products in the advertisements. In contrast, faces involving averted gaze would be predicted to orient reflexively the focus of a viewer's attention to advertised texts and products embedded within the advertisements.

Very little research appears to have been undertaken to explore the power of a model's gaze cues to influence people's attention toward print and online advertisements. One relevant study is reported by Straub (2008), who used eye-tracking and the presentation of a female face on a computer screen to examine the effect of gaze cues (mutual gaze vs. averted gaze) on attention to a shampoo advertisement. The results suggested that when the eyes of the model were looking at the advertised text and product (averted gaze), then participants were likely to look at the internal features on the face (e.g., eyes and nose) and then to fixate intensively on the advertised text and products. Conversely, when the eyes of the same model were looking straight ahead at the viewer (mutual gaze), participants were prone to fixate intensely on the face while spending less time on the advertised text and products. The results of this study support the prediction that the perception of different gaze direction can affect the gaze patterns of viewers looking at advertisements on the screen, but it remains unknown what the impact might be of gaze cues on banner advertisements located on webpages involving realistic online content.

An attendant issue that has not been examined concerns the effects of human faces with gaze cues on people's memory for advertisements. Although it is known that information that is fixated for longer tends to be better remembered (Irwin and Zelinsky, 2002), it is nevertheless important to generalize this finding to the context of banner advertising. As such, the present research addressed not only the influence of gaze cues on attention to banner advertisements but also the effectiveness of these gaze cues on memory for advertising content. Many eye-tracking studies have shown that memory for advertised text or brands contained in banner advertisements is poor, even when the banner advertisement has been fixated, although there is also evidence that memory for banner contents is positively correlated with the overall time that people attend to the advertisement (e.g., Drèze and Hussherr, 2003; Burke et al., 2005).

It should be noted, however, that most studies of memory for banner advertisements have relied on explicit memory tests such as recognition and recall (Bayles, 2000; Drèze and Hussherr, 2003; Burke et al., 2005; Calisir and Karaali, 2008; Chatterjee, 2008). Explicit testing is limited in what it can tell us, not least because information that is presented in banner advertisements but seemingly ignored may still be processed to some extent, such that retained information may be detectable using implicit measures even when it is not revealed using explicit measures (Heath and Nairn, 2005; Yoo, 2007, 2008). Indeed, several studies have demonstrated implicit memory for advertising content in the absence of explicit recall or recognition, as evidenced through priming effects arising in indirect memory testing (e.g., Petre, 2005; Yoo, 2007). In the present experiment we deployed both explicit and implicit memory tests to measure the memorability of banner advertisements so as to counter any shortcomings arising from an exclusive reliance on traditional, explicit testing methods.

Based on the empirical research and theoretical perspectives reviewed above, four predictions were formulated in relation to our reported experiment:


It should be noted that no predictions were made relating to possible interactive effects on attention and memory arising from the combined influences of the banner type (vertical vs. horizontal) and face condition manipulations. We had no a priori reasons to motivate specific hypotheses regarding the likely presence of moderator effects given the limited existing research that has been pursued on these factors to date.

#### **MATERIALS AND METHODS**

#### **PARTICIPANTS**

The study involved 72 participants (36 male, 36 female) aged between 18 and 32 years (*M* = 22.9 years, *SD* = 1.53 years). Participants were undergraduate and postgraduate students at Lancaster University, UK, studying in a range of disciplines. Each participant received £8 for taking part and all had extensive experience of using the Internet for a period of at least 8 months prior to the study.

#### **DESIGN**

The experiment involved a 2 × 3 mixed within-between participants design. The within-participants factor was the banner type on the webpage, with two levels: vertical banner (located at the bottom-right) vs. horizontal banner (located at the bottom-center). The between-participants factor, referred to as face condition, had three levels: banner advertisements without a face (no face); banner advertisements containing a face with mutual gaze looking directly at the observer (mutual gaze); and banner advertisements containing a face with averted gaze looking at the advertised text and product (averted gaze).

The dependent variables in the eye-tracking part of the experiment were the average fixation duration and the total dwell time within three regions of interest (ROIs) located within the banner advertisements: faces (where these were present), advertised text and product. Note that total dwell time is the sum of all fixation durations within a particular ROI. Research has suggested that longer average fixation durations and longer total dwell times are both indicative of information being more attention demanding and engaging (Rayner, 1998; Poole and Ball, 2006; Holmqvist et al., 2011). For the memory phase of the study the dependent variable for the explicit memory test was the recognition score for presented brand names, whereas for the implicit memory test it was the word fragment completion score for aspects of the advertising text. Participants were randomly assigned to one of the three face conditions, with an equal number of participants and an equal gender split in each condition.
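The two eye-tracking measures defined above can be illustrated with a minimal sketch. It assumes that fixations have already been assigned to ROIs and are available as (ROI label, duration in milliseconds) pairs; the function name `roi_metrics` and the sample values are hypothetical and do not reproduce the software or data used in the experiment.

```python
from collections import defaultdict


def roi_metrics(fixations):
    """Summarize fixations per region of interest (ROI).

    `fixations` is an iterable of (roi_label, duration_ms) pairs.
    Total dwell time is the sum of all fixation durations in an ROI;
    mean fixation duration is that sum divided by the fixation count.
    """
    durations = defaultdict(list)
    for roi, duration_ms in fixations:
        durations[roi].append(duration_ms)
    return {
        roi: {
            "total_dwell_ms": sum(values),
            "mean_fixation_ms": sum(values) / len(values),
        }
        for roi, values in durations.items()
    }


# Hypothetical fixation record for one banner (illustrative values only).
sample = [("face", 220), ("text", 180), ("face", 300), ("product", 250), ("text", 140)]
metrics = roi_metrics(sample)
```

An ROI that receives no fixations does not appear in the returned dictionary, so an analysis that treats unfixated ROIs as zero dwell time would need to add those entries explicitly.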

#### **EQUIPMENT**

An ASL eye-tracker was used to record participants' eye movements whilst they performed a goal-directed browsing task. An infrared camera mounted below the computer screen was used to capture eye-movement data by recording the reflections from a participant's retina and cornea that arose from light being projected at the eyes from an infrared LED. These reflections were used to calibrate gaze positions on the screen (Duchowski, 2003; Poole and Ball, 2006). The experiment was controlled via a desktop computer.

#### **FABRICATED WEBPAGES**

Five thematically-linked webpages were designed that provided authentic factual information about healthy eating and the benefits of different vitamins and minerals (i.e., vitamin E, vitamin C, calcium, iodine, and zinc). One of the created webpages (i.e., concerning the mineral iodine) was always used as a "familiarization" trial so as to acquaint participants with the general style and information content of the webpages used in the experiment. The remaining four webpages were used as "target" trials, with two webpages presenting a vertical banner advertisement and two presenting a horizontal banner advertisement to each participant. The order of presentation of the target webpages was controlled in the manner explained in the Procedure section below. The fabricated webpages were realistic and in alignment with typical webpages that are found during everyday information searches on the Internet. Page headings, navigation bars, search boxes, and graphics were all located at conventional positions. Examples of two such pages are presented in **Figures 1**, **2**. Note that **Figure 1** depicts a vertical banner advertisement, whilst **Figure 2** depicts a horizontal banner advertisement. All presented information on the webpages relating to vitamins and minerals was gender-neutral and was easy to understand. The information concerned good sources of particular vitamins and minerals, quantities needed for health benefits, side effects from excessive intake, Department of Health advice, useful links, and top tips.

#### **BANNER ADVERTISEMENTS**

Vertical banner advertisements (226 × 246 pixels) were created for two fictitious products (i.e., "Redden" hair shine and "Aqua" mineral cleansing foam). Horizontal banner advertisements (606 × 96 pixels) were created for two other fictitious products (i.e., "Cutie-kids" clothing for children and "Orchid Thai" restaurant cuisine). All banner advertisements also included a small amount of product-specific text. Three versions of each banner advertisement were designed, one that did not include a face, one that included a face with the model's eyes looking straight ahead at the observer (mutual gaze), and one that included a face with the model's eyes averted toward the advertised text and product. **Figures 3**–**5** show an example of a vertical banner advertisement for "Redden" hair-shine in each of the three conditions: no face (**Figure 3**), mutual gaze (**Figure 4**), and averted gaze (**Figure 5**). **Figures 6**–**8** show an example of a horizontal banner advertisement for "Orchid Thai" restaurant cuisine across the same three face conditions.

To minimize the confounding effect of brand familiarity on attention and memory (e.g., Dahlén, 2001), all brands that we used were fictitious. In addition, we ensured that the product information in all banners was semantically incongruent with the information content of the webpage that they appeared on. We note that some research has shown that congruent advertisements increase attention to the advertising information and its subsequent memorability (e.g., Finlay et al., 2005; Hervet et al., 2011), whereas other research has revealed the opposite effect, whereby incongruent advertisements increase attentional capture and improve memory for advertising content (e.g., Dahlén et al., 2005). A recent eye-tracking study by Simola et al. (2013) that examined semantic incongruency in the context of newspaper advertisements revealed that incongruency increased attention to advertisements whereas congruency improved advert recognition. The key upshot of these conflicting findings is that it is vital to control for advertisement congruency/incongruency effects in a study such as the present one by standardizing the relationship between advertisements and webpage information. As noted, we achieved this by ensuring that all banner advertisements were semantically incongruent.

**FIGURE 1 | An example webpage used in the experiment that presented information about zinc along with useful links and a vertical banner advertisement (located bottom-right).**

**FIGURE 2 | An example webpage used in the experiment that presented information about Vitamin E along with useful links and a horizontal banner advertisement (located bottom-center).**

**FIGURE 3 | Fabricated vertical banner advertisement for "Redden" in the no face condition.**

#### **MEMORY TESTS**

To develop an explicit recognition test for the banner advertisements, two false lures were created for each advertisement by changing only the brand name. For example, the correct brand name for the "Orchid Thai" advertisement (see **Figures 6**–**8**) was replaced with either "Mung Mee" or "Oriental Cuisine." In this way the other graphical and textual aspects of the advert were controlled so as to be consistent across the distractor items.

**FIGURE 5 | Fabricated vertical banner advertisement for "Redden" in the averted gaze condition.**

To assess people's implicit memory for the advertising message within each banner advertisement we developed a word fragment completion test in which participants were asked to complete fragments in which some consonants and vowels were missing (Fennis and Stroebe, 2010). To develop a list of word fragments we first constructed a pool of 35 words, with 20 being "target" words derived from the banner advertisements and 15 being "distractor" words selected from Tulving et al.'s (1982) study examining priming in word recognition. Having fragmented these words we then asked 30 students at Lancaster University (age range = 18–32 years, *M* = 22.6 years, *SD* = 1.06 years) to complete the fragments to make real words. Of the 35 words tested only 12 target words and 8 distractor words showed a correct completion rate of between 15 and 46%, which is a standard criterion for acceptability to avoid floor and ceiling effects (Tulving et al., 1982; Yoo, 2008). As examples, we note that the target words (and fragmented versions) for the banner advertisement relating to "Orchid Thai" restaurant cuisine, as shown in **Figures 6**–**8**, included: orchid (O\_C\_ \_D), restaurant (R\_ \_ \_ \_ UR\_ \_T), and Lancaster (L\_N\_A\_ \_ \_R). Examples of distractor items (and fragmented versions) included: mystery (\_YS\_ \_RY), horizon (HO\_ \_ \_ON), approval (APP\_ \_ \_AL), and chimney (\_ \_IMN\_Y).
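The screening criterion described above can be expressed as a simple filter. The following sketch assumes that the 15–46% band is inclusive at both ends (the text does not say) and uses hypothetical pilot completion rates; the function name `select_fragments` is illustrative only.

```python
def select_fragments(completion_rates, lower=15.0, upper=46.0):
    """Keep words whose pilot completion rate (in %) lies within the band.

    Assumption: both bounds are inclusive. Words completed too rarely
    (floor) or too often (ceiling) are discarded.
    """
    return [
        word
        for word, rate in completion_rates.items()
        if lower <= rate <= upper
    ]


# Hypothetical pilot completion rates (%), not the values from the study.
pilot = {"orchid": 20.0, "restaurant": 50.0, "mystery": 10.0, "horizon": 46.0}
retained = select_fragments(pilot)
```

With these illustrative rates, "restaurant" would be rejected as too easy and "mystery" as too hard, leaving the other two words for the final test.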

The mean completion rates of the final 12 target words and 8 distractor words were 21.66% (*M* = 3.23, *SD* = 2.38) and 17.99% (*M* = 2.60, *SD* = 1.83), respectively. A paired-sample *t*-test demonstrated that there was no significant difference in word completion rate between the target and distractor words, *t*(30) = 1.596, *p* = 0.121. Accordingly, these words were deemed to be suitable for use in the main experiment. We note that our word selection process meant that some banner advertisements were less well represented than others in the final implicit memory test used in our experiment. Indeed, the final test involved twice as many word fragments derived from the horizontal advertisements as from the vertical advertisements. To deal with this issue in our analysis of the implicit memory data (see below) we therefore derived "percentage correct" word fragment completion scores for items derived from vertical banners vs. horizontal banners, which standardized the scoring.

**FIGURE 8 | Fabricated horizontal banner advertisement for "Orchid Thai" in the averted gaze condition.**

#### **PROCEDURE**

The experiment was run in a small, quiet eye-tracking laboratory. Participants were briefed and asked to sign a consent form prior to the study and were then asked to read information about a fictitious individual's symptoms of feeling unwell, as follows: "My name is Andy and I always get colds. I walk and move very slowly because my knees hurt. I cannot remember things well. Also, my skin is very dry, which makes me feel itchy and I easily get wounds." Participants were subsequently instructed to browse through the five presented webpages so as to advise on the choice of vitamins and minerals suitable to relieve Andy's symptoms. This ensured that participants were provided with personal, goal-directed task instructions aimed at ensuring their focus on reading for comprehension. After this introductory session, but prior to browsing the webpages, participants were asked to sit about 50 cm from the computer screen and to undertake an eye-movement calibration procedure. This involved them fixating on nine small black crosses located in a 3 × 3 grid on the computer screen, without moving their head or body.

After calibration, participants were exposed to the initial familiarization webpage containing information about the mineral iodine in the absence of a banner advertisement. This trial aimed to acquaint participants with the style and content of the webpages, although it should be noted that participants were unaware that this trial served a purely practice function. Immediately after the familiarization trial participants were exposed to the four target webpages that formed the experimental trials, with each webpage presenting further information about vitamins and minerals in addition to either a vertical or horizontal banner advertisement. The order of these four experimental trials was counterbalanced such that there were 24 different orders per condition (i.e., 4!). This meant that each of the 24 participants within a condition received the target webpages in a unique order.
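The counterbalancing scheme can be sketched as follows. The four target page topics are taken from the Fabricated Webpages section; the mapping of participant index to order is an assumption made for illustration, since the text does not state how orders were allocated to individuals.

```python
from itertools import permutations

# The four target webpages (iodine served as the familiarization page).
TARGET_PAGES = ("vitamin E", "vitamin C", "calcium", "zinc")

# All 4! = 24 possible presentation orders.
orders = list(permutations(TARGET_PAGES))

# Hypothetical allocation: participant i within a face condition receives
# order i, so that each of the 24 participants sees a unique order.
assignment = {participant: orders[participant] for participant in range(len(orders))}
```

Repeating the same allocation within each of the three face conditions would cover all 72 participants while using every order exactly once per condition.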

When the participant had finished reading the information about vitamins and minerals on a webpage they could move on to read the next webpage by clicking on the left button of the mouse. Throughout the webpage browsing task the eye tracker measured gaze behavior in relation to designated ROIs within the banner advertisement on each webpage, namely, the faces (where present), the brand name and associated text, and the product itself. Immediately after the browsing task each participant's implicit memory was tested using the word fragment completion test. This involved presenting participants with a sheet of paper that provided the list of incomplete words and asking them to complete them as best they could within 6 min. Participants then completed the recognition task, in which they were presented with two distractor advertisements and one target advertisement for each banner advertisement presented previously (the presentation order mapped onto the counterbalanced order in which banners had appeared during the browsing trials). The distractor advertisements were created by changing only the brand names from the target banner advertisements. Following the memory tests participants were asked to present a verbal account of how they would advise Andy in terms of his vitamin and mineral intake to improve his well-being.

#### **RESULTS**

#### **EYE-TRACKING DATA**

The mean fixation duration data and the mean dwell time data were examined for skew and deviations from a normal distribution. It was found that all conditions showed a degree of positive skew (as is typical with time-based data), although in all cases but one the positive skew values were modest and less than +2.5, which is typically viewed as an acceptable threshold for conducting parametric data analyses (e.g., see Tabachnick and Fidell, 1996). These violations of normality were confirmed through the application of the Kolmogorov-Smirnov test, which indicated that around 50% of the conditions involved data distributions that deviated significantly from normality.

Our approach to dealing with these modest violations from parametric testing assumptions was to pursue logarithmic transformations of our time-based data subsequent to the addition of a constant of 1.0 (to handle scores at or close to zero). This method was successful in reducing positive skew, normalizing the data and stabilizing variances. We next pursued equivalent parametric tests using both the transformed and the untransformed data. These separate analyses produced a very similar pattern of significant and non-significant effects, with similar effect magnitudes, although the transformed data typically yielded results with larger effect sizes. In the sub-sections below we limit our presentation of statistical findings to the outcomes of inferential tests undertaken on the *transformed* data. For ease of interpretation, however, we present graphical depictions of the untransformed time-based data in natural units (milliseconds).
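The transformation described above can be illustrated with a short sketch. It assumes a natural logarithm (the base is not stated in the text) and uses a simple moment-based skewness estimate; the dwell times are hypothetical values chosen to show a long right tail.

```python
import math


def log_transform(values, constant=1.0):
    """Return log(value + constant); the constant keeps zero scores defined."""
    return [math.log(value + constant) for value in values]


def skewness(values):
    """Moment-based sample skewness: m3 / m2 ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((value - mean) ** 2 for value in values) / n
    m3 = sum((value - mean) ** 3 for value in values) / n
    return m3 / m2 ** 1.5


# Hypothetical dwell times in milliseconds with one extreme value.
dwell_ms = [40.0, 120.0, 150.0, 200.0, 260.0, 400.0, 2500.0]
transformed = log_transform(dwell_ms)
```

For these illustrative values the skewness of `transformed` is lower than that of `dwell_ms`, which is the effect the transformation is intended to have on positively skewed time data.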

We finally note that although we conducted a full set of inferential analyses for the mean fixation duration data *and* for the mean dwell time data, it was observed that both types of data produced near identical patterns of statistically significant effects. In order to limit the length of this article and provide a more focused narrative we only report below the results of the analyses undertaken on the mean dwell time data.

#### *Mean dwell time on banner advertisements*

The first analysis of mean dwell time data examined the predictions that: (1) vertical banners (located bottom-right) give rise to increased attention to the whole advertisement relative to horizontal banners (located bottom-center); and (2) banner advertisements containing a face give rise to increased attention to the whole advertisement relative to banner advertisements where a face is absent. To test these predictions a 2 × 3 mixed factorial ANOVA was conducted on the log-transformed mean dwell time arising across the full extent of banner advertisements (vertical vs. horizontal) as a function of face condition (see **Figure 9** for the natural data).

The analysis showed no significant main effect of banner type, *F* < 1, but did reveal a significant main effect of face condition, *F*(2, 69) = 12.86, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.27. *Post-hoc* Bonferroni tests (alpha level = 0.05) showed that the mean dwell times in the averted gaze condition and the mutual gaze condition were both significantly higher than in the no face condition, but there was no significant difference between the mutual gaze and averted gaze conditions. There was also a significant banner type × face condition interaction, *F*(2, 69) = 4.01, *p* = 0.02, η<sub>p</sub><sup>2</sup> = 0.10, with vertical banners attracting more attention than horizontal banners in the mutual gaze condition, with the reverse being the case in the no face condition and with no difference in the averted gaze condition. Overall these findings do not provide any clear-cut support for increased attention arising for vertical banners over horizontal ones, but they do support the prediction that banner advertisements containing a face with either averted gaze or mutual gaze can increase attentional capture relative to banner advertisements where a face is absent, at least when attentional capture is measured in terms of mean dwell time.
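As an aid to reading the effect sizes reported in this section, partial eta squared can be recovered from an *F* ratio and its degrees of freedom, because η<sub>p</sub><sup>2</sup> = SS<sub>effect</sub> / (SS<sub>effect</sub> + SS<sub>error</sub>) = (*F* × df<sub>effect</sub>) / (*F* × df<sub>effect</sub> + df<sub>error</sub>). The sketch below applies this identity to the two significant effects above; the function name is illustrative, and small discrepancies from published values can arise because the reported *F* ratios are themselves rounded.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    numerator = f_value * df_effect
    return numerator / (numerator + df_error)


# Main effect of face condition and the banner type x face condition
# interaction, using the F ratios and degrees of freedom reported above.
face_effect = round(partial_eta_squared(12.86, 2, 69), 2)
interaction = round(partial_eta_squared(4.01, 2, 69), 2)
```

Both rounded values agree with the reported effect sizes of 0.27 and 0.10.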

#### *Mean dwell time on the region of interest relating to the face*

Our next analysis of the mean dwell time data focused on the prediction that banner advertisements containing a face with mutual gaze give rise to increased attention to the face ROI compared to banner advertisements containing a face with averted gaze. To examine this prediction we conducted a 2 × 2 mixed factorial ANOVA with a within-participant factor of banner type (vertical vs. horizontal) and a between-participants factor of face condition (mutual gaze vs. averted gaze). Note that the reason for conducting a 2 × 2 ANOVA for this analysis compared to the 2 × 3 ANOVA in the previous analysis was simply a consequence of the fact that a face ROI did not exist in the banners that were employed in the no face condition. Such banners did not therefore include an ROI that could act as a meaningful comparison region to the face ROI that existed in the mutual gaze and averted gaze conditions.

The dependent variable in this analysis was the log-transformed mean dwell time on the face ROI (see **Figure 10** for natural data). The main effect of banner type was not significant, *F*(1, 46) = 2.57, *p* = 0.12, η<sub>p</sub><sup>2</sup> = 0.05. There was, however, a significant main effect of face condition, *F*(1, 46) = 11.19, *p* = 0.002, η<sub>p</sub><sup>2</sup> = 0.20, with the mutual gaze condition promoting increased mean dwell time on faces relative to the averted gaze condition. There was also a significant banner type × face condition interaction, *F*(1, 46) = 7.43, *p* = 0.009, η<sub>p</sub><sup>2</sup> = 0.14. **Figure 10** shows that for the horizontal banners the mean dwell time on faces was similar whether the faces involved mutual or averted gaze. In contrast, for vertical banners the mean dwell time on faces was longer in the mutual gaze condition than the averted gaze condition, suggesting that vertical banners are more sensitive to the facial gaze manipulation than horizontal banners.

#### *Mean dwell time on the regions of interest relating to the advertising text and product*

We next assessed the prediction that banner advertisements containing a face with averted gaze give rise to increased attention to the advertising text and the product compared to banner advertisements either containing a face with mutual gaze or no face. Our first analysis involved undertaking a 2 × 3 mixed factorial ANOVA to examine the log-transformed mean dwell time on *each word of advertising text* within vertical vs. horizontal banners across all three face conditions (see **Figure 11** for natural data). We analyzed mean dwell time per word in order to control for the fact that there was twice as much text present in the horizontal banner advertisements (*M* = 11 words) as in the vertical banner advertisements (*M* = 5.5 words). To derive an approximation of a participant's mean dwell time per word for a particular banner we took their overall dwell time on the text ROI and divided this by the number of words within the ROI (see Ball et al., 2005, for another application of this "dwell time per word" standardization procedure).
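The "dwell time per word" standardization amounts to a single division, sketched below. The function name and the example dwell times and word counts are hypothetical; only the procedure of dividing text-ROI dwell time by the number of words in the ROI comes from the text.

```python
def dwell_per_word(total_dwell_ms, n_words):
    """Approximate dwell time per word: text-ROI dwell time / word count."""
    if n_words <= 0:
        raise ValueError("n_words must be a positive word count")
    return total_dwell_ms / n_words


# Hypothetical example: a longer horizontal text and a shorter vertical text
# that attract the same attention per word despite different totals.
horizontal = dwell_per_word(1100.0, 11)
vertical = dwell_per_word(600.0, 6)
```

Here the two banners differ in total dwell time (1100 ms vs. 600 ms) yet yield the same per-word value, which is the kind of text-length confound the standardization is meant to remove.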

The analysis of the resulting text-oriented dwell time data showed that there was no main effect of banner type, *F* < 1, but there was a significant main effect of face condition, *F*(2, 69) = 12.24, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.26. A series of *post-hoc* Bonferroni tests (alpha level = 0.05) were pursued to follow up the main effect of face condition. These tests revealed that the mean dwell time per word in the no face condition was not significantly different to that in the mutual gaze condition. However, both the no face condition and the mutual gaze condition had significantly lower mean dwell times per word than the averted gaze condition. The ANOVA also revealed a significant banner type × face condition interaction, *F*(2, 69) = 5.38, *p* = 0.007, η<sub>p</sub><sup>2</sup> = 0.14. The data depicted in **Figure 11** suggest that in the no face condition there was increased dwell time on the text of the horizontal banners compared to the vertical banners, whilst the pattern reversed in the averted gaze condition. In the mutual gaze condition there was little difference between horizontal and vertical banners in terms of dwell time on the advertising text.

The second ANOVA that we conducted examined log-transformed mean dwell time on the *product* ROI for vertical vs. horizontal banners across all three face conditions (see **Figure 12** for natural data). This analysis revealed that there was a significant main effect of banner type, *F*(1, 69) = 18.32, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.21, with participants' mean dwell times on the product in the vertical banners being longer than in the horizontal banners. There was also a significant main effect of face condition, *F*(2, 69) = 6.31, *p* = 0.003, η<sub>p</sub><sup>2</sup> = 0.16. A series of *post-hoc* Bonferroni tests (alpha level = 0.05) were pursued to follow up the main effect of face condition. It was observed that the mean dwell time on the product region of banner advertisements in the no face condition was not significantly different to the mutual gaze condition. There was also no significant dwell time difference on products in the mutual gaze vs. averted gaze conditions. However, the no face condition had a significantly lower mean dwell time on the product compared to the averted gaze condition. The ANOVA also gave rise to a significant banner type × face condition interaction, *F*(2, 69) = 3.20, *p* = 0.047, η<sub>p</sub><sup>2</sup> = 0.09. This interaction effect appears to be caused by the incrementally increasing dwell time on product information in vertical banners that arises across the no face condition, followed by the mutual gaze condition, followed by the averted gaze condition. The horizontal banners show no such effect across face conditions.

**FIGURE 11 | Mean dwell time per word (milliseconds) within the advertising text ROI for vertical vs. horizontal banner advertisements as a function of face condition (error bars depict the standard error of the mean).**

**condition (error bars depict the standard error of the mean).**

#### **MEMORY DATA**

#### *Explicit memory for brand information*

We predicted that banner advertisements containing a face with averted gaze would show increased explicit memory for brand information compared to banner advertisements either containing a face with mutual gaze or no face. To examine this prediction we conducted a 2 × 3 mixed factorial ANOVA on correct recognition scores for brands for vertical vs. horizontal banners across all three face conditions (no face vs. mutual gaze vs. averted gaze). This ANOVA revealed no main effect of banner type, $F < 1$, but it did reveal a significant main effect of face condition, $F(2, 69) = 19.43$, $p < 0.001$, $\eta_p^2 = 0.36$. The banner type × face condition interaction was not reliable, $F < 1$. The main effect of face condition was explored using *post-hoc* Bonferroni tests (alpha level = 0.05) and showed that the mean recognition score for brand names in the no face condition (0.54 items) was significantly lower than for products in the mutual gaze condition (1.06 items) and the averted gaze condition (1.50 items). The better recognition performance in the averted gaze condition compared to the mutual gaze condition was also shown to be statistically reliable. These results provide good support for the predicted increase in the recognition of product names in the averted gaze condition relative to the other conditions.
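As a reading aid, the partial eta squared values reported alongside each *F* ratio can be approximately recovered from the *F* value and its degrees of freedom using the standard identity $\eta_p^2 = (F \cdot df_{effect}) / (F \cdot df_{effect} + df_{error})$. The sketch below applies this identity to the face condition effect reported above; the function name is hypothetical, and because the published *F* values are rounded, recomputed effect sizes may differ from the reported ones in the last digit.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F ratio and its degrees of freedom."""
    numerator = f_value * df_effect
    return numerator / (numerator + df_error)


# Main effect of face condition on brand recognition: F(2, 69) = 19.43
eta = partial_eta_squared(19.43, 2, 69)
print(round(eta, 2))  # 0.36
```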

#### *Implicit memory for the advertising text*

We predicted that banner advertisements containing a face with averted gaze would show increased implicit memory for the advertising text compared to banner advertisements either containing a face with mutual gaze or no face. To test this prediction we conducted a 2 × 3 mixed factorial Analysis of Covariance (ANCOVA) on participants' implicit memory scores (i.e., their percentage correct word fragment completions for word items contained in vertical banners vs. word items contained in horizontal banners) across all three face conditions (no face vs. mutual gaze vs. averted gaze). This analysis included participants' correct *distractor item* word fragment completions as a covariate, since performance in relation to such distractor items that have not been encountered in the context of the experiment can be viewed as a good measure of a participant's baseline word fragment completion ability (see Ball et al., 2010).

This ANCOVA analysis of the percentage of correct word fragment completions (**Figure 13**) revealed no main effect of banner type, $F(1, 68) = 2.71$, $p = 0.01$, $\eta_p^2 = 0.04$, but it did give rise to a significant main effect of face condition, $F(2, 68) = 14.84$, $p < 0.001$, $\eta_p^2 = 0.30$. The banner type × face condition interaction was also reliable, $F(2, 68) = 7.72$, $p = 0.001$, $\eta_p^2 = 0.19$. The main effect of face condition was explored further using *post-hoc* Bonferroni tests (alpha level = 0.05), which indicated that the mean percentage target completion score in the mutual gaze condition ($M = 47.79\%$) was not significantly different to the no face condition ($M = 41.54\%$). However, the averted gaze condition was associated with a significantly higher mean target completion score ($M = 64.48\%$) relative to each of the other two conditions. These results indicate that, as predicted, banner advertisements containing faces with averted gaze looking at advertised texts have a greater ability to improve participants' implicit memory performance for advertising contents than do banner advertisements containing faces with mutual gaze cues or advertisements with no faces.

The significant interaction between banner type and face condition appears to be caused by the more marked improvement in implicit memory scores for the vertical banners compared to the horizontal banners that arises across the no face condition, followed by the mutual gaze condition, followed by the averted gaze condition (see **Figure 13**). Again, this finding supports the eye tracking results reported above, which revealed that the vertical banners in our study were more sensitive to the facial gaze manipulation than were the horizontal banners. It appears, moreover, that such increased sensitivity to advertising contents is consequential, giving rise to enhanced implicit memory performance for the advertising text in the banners that were attended to more assiduously, most notably the vertical banners containing faces with averted gaze.

### **DISCUSSION**

The present study used a combination of eye-tracking analysis and explicit and implicit memory testing to determine the effects on attention to (and memory for) banner advertisements arising from two factors: (1) the location of banner advertisements on webpages (i.e., vertical banners positioned bottom-right vs. horizontal banners positioned bottom-center); and (2) the presence of facial images within banner advertisements (i.e., no face vs. face with mutual gaze vs. face with averted gaze).

In relation to the issue of the location of banner advertisements, it is noteworthy that some prior studies have suggested that the bottom-right and the bottom-center areas of webpages may typically be overlooked by web users (e.g., Nielsen, 2006; Djamasbi et al., 2010). However, more recent experimental research by Simola et al. (2011) has indicated that heightened levels of attentional capture can arise for vertical banner advertisements that are co-located alongside webpage text. The bottom-right located vertical banners that we examined in our experiment have not previously been examined in terms of attention and memory, but based on Simola et al.'s evidence we predicted that these would be associated with increased attentional capture and improved memory relative to the bottom-center located horizontal banner advertisements.

In relation to the issue of face presence within banner advertisements, the critical comparison in our study was between performance arising in a control condition in which the presented advertisement did not include a face vs. performance in experimental conditions in which the advertisement contained a face with mutual gaze (directed at the viewer) or a face with averted gaze (directed at the product and advertising text). Based on extant evidence (e.g., Straub, 2008), it was predicted that faces with mutual gaze would result in people paying more attention to the model's face rather than to the text or products in the advertisements. In contrast, faces with averted gaze were predicted to orient the focus of the viewer's attention automatically to the text or products embedded in the advertisements. Such increased attention to advertising information was also predicted to impact on the viewer's memory for brands and advertising messages. In our study we not only used an explicit recognition test to measure memory for brands, but also an implicit word fragment completion test to assess more subtle, priming-based evidence for the memorability of advertising information.

#### **THE ROLE OF BANNER LOCATION IN CUEING ATTENTION TO AND MEMORY FOR BANNER ADVERTISEMENTS**

The primary measure that we used to determine the attention that a participant paid to a banner advertisement was their dwell time on the banner, which could also be broken down further into the component dwell times on specific ROIs, including the face, text, and product. Based on Simola et al.'s (2011) research, we predicted that vertical banners (located bottom-right) would promote increased attention to the whole advertisement relative to horizontal banners (located bottom-center) by virtue of being co-located to the right of the text on the webpage. Contrary to this prediction our analyses indicated no main effect of banner type on the overall dwell time measure. The banner type factor was, however, found to interact with face condition, with vertical banners attracting more attention than horizontal banners in the mutual gaze condition, with the reverse being the case in the no face condition—and with no difference in the averted gaze condition. The fact that the overall attention-attracting capacity of banners is moderated by face condition affirmed the need to pursue more detailed dwell time analyses (discussed in the next sub-section) that focused on the way in which people's attention is distributed across specific ROIs within vertical and horizontal banners.

Our analyses also aimed to determine any influence of banner location on memory for banner contents. Our recognition measure of *explicit* memory for brand information showed neither a main effect of banner type nor an interaction between banner type and face condition. The absence of a banner type effect on recognition memory is unsurprising given the lack of any influence of this factor on the overall dwell time measure, as noted above. Our examination of the *implicit* memory measure (percentage correct word fragment completions for words that had appeared in vertical vs. horizontal banners) also revealed the absence of a main effect of banner type, supporting the explicit memory findings. The analysis did indicate, however, that the banner type factor interacted with face condition, with more manifest improvement in implicit memory scores for the vertical banners compared to the horizontal banners across the no face condition, mutual gaze condition and averted gaze condition, respectively. We suggest that these observations support the dwell time findings, which revealed that the vertical banners we used were more sensitive than the horizontal banners to facial gaze manipulation (see below for further discussion).

#### **THE ROLE OF FACES IN CUEING ATTENTION TO BANNER ADVERTISEMENTS**

The overall dwell time data confirmed that the participants exposed to banners containing faces showed increased attention to the banner relative to banners where a face was absent. Having established the potency of faces to attract attention to banner advertisements, our next series of analyses unpacked the effect of mutual gaze vs. averted gaze on viewers' attention to ROIs within the banner, such as the face itself and the text and product information.

In terms of the face ROI, we found that the mutual gaze condition led to substantially longer dwell times on the face itself compared to the averted gaze condition. This finding supports previous evidence demonstrating that mutual gaze has a unique capacity to capture a viewer's attention, leading to adverse consequences in terms of performance on a primary visual search task relating to the identification of a non-facial stimulus within the search array (e.g., Senju et al., 2005; Conty et al., 2006; Frischen et al., 2007). This analysis also revealed a significant banner type × face condition interaction, with the evidence indicating that vertical banners have greater potency than horizontal banners to attract increased attention to faces that involve mutual gaze as opposed to averted gaze. This increased attentional sensitivity to the specific contents of vertical banners provides some support for predictions that derived from Simola et al.'s (2011) research.

In terms of the text and product ROIs, our analyses gave rise to some further striking findings. In particular, it was evident that the averted gaze condition promoted significantly enhanced engagement with the advertising text (measured in terms of dwell time per word) and product information compared to either the mutual gaze condition or the no face control condition. This observation supports a key prediction concerning the power of averted gaze cues to orient attention by producing a reflexive shift in viewers' attention toward a specific item located in the direction of the gaze (cf. Ricciardelli et al., 2002; Mansfield et al., 2003). In an online advertising context it seems that once attention has been attracted toward text and product information as a result of a model's gaze cues then this can augment the possibility of the viewer actually engaging further in understanding (and potentially assimilating) the advertised brand and messages. The analysis of mean dwell time per word in relation to the advertising text also demonstrated an interaction between banner type and face condition. In particular, it was evident that in the no face condition there was increased dwell time on the text of the horizontal banners compared to the vertical banners, whilst the pattern reversed in the averted gaze condition.

In relation to the analysis that examined dwell times on the advertised product we observed a significant main effect of banner type, with mean dwell times on the product in the vertical banners being longer than in the horizontal banners. The increased attentional capture by the product contents of vertical banners relative to horizontal banners provides some further support for predictions that derived from Simola et al.'s (2011) research. The analysis of mean dwell times on the product information also gave rise to an interaction between banner type and face condition, whereby vertical banners located to the bottom-right of pages were differentially sensitive to the face condition manipulation relative to horizontal banners (located bottom-center), which showed little sensitivity to the face condition manipulation. The shortest dwell time on products in vertical banners arose in the no face condition, whilst the highest dwell time on products in vertical banners arose in the averted gaze condition (i.e., the natural data showed a near 5-fold increase in product dwell time). The vertical banners with mutual gaze were intermediate in terms of the dwell time on products. The horizontal banners showed no such effect across face conditions, receiving a low dwell time on product information in all conditions.

This latter evidence again supports the notion that vertical banners located to the right of webpage text are rather different in their attention-attracting capacity compared to horizontal banners that are located below webpage text (cf. Simola et al., 2011). Although, as noted, there was no evidence in our dataset that vertical banners attracted more *overall* attention than horizontal banners, it nevertheless appears that the pattern of attentional capture to vertical banners is highly sensitive to the facial cues that are present. In other words, it seems that once a person's attention had been gained by vertical banners then the subsequent distribution of attention is very much under the control of the embedded eye-gaze cues.

#### **THE ROLE OF FACES IN ENHANCING MEMORY FOR BANNER ADVERTISEMENTS**

The use of an explicit brand-recognition test revealed, as predicted, that participants were better at recognizing brand names that had been embedded in banner advertisements receiving the most attention on the relevant text and product information, that is, brand names in the averted gaze condition. To corroborate these relationships we report here the results of: (1) a correlation analysis examining the association between the mean dwell time on banner *text* and a participant's total brand recognition score; and (2) a correlation analysis examining the association between the mean dwell time on the *product* within the banner and a participant's total brand recognition score. The respective correlations were significant and indicated the existence of the predicted positive association ($r = 0.272$, $p = 0.021$; $r = 0.400$, $p = 0.001$; both tests two-tailed). These findings are consistent with a range of evidence regarding attention and memory, suggesting that items that are attended to (as determined by eye-tracking data) are subsequently remembered (e.g., Irwin and Zelinsky, 2002), and that the longer the time spent viewing an item, then the greater the ability to remember it (e.g., Zelinsky and Loschky, 2005).
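The significance of a Pearson correlation is conventionally assessed by converting *r* to a *t* statistic with $n - 2$ degrees of freedom, $t = r\sqrt{n - 2} / \sqrt{1 - r^2}$. The sketch below applies this conversion to the two correlations reported above. The sample size of 72 is an assumption inferred from the error degrees of freedom (69) in the 2 × 3 mixed ANOVAs, not a figure stated in this section, and the function name is hypothetical.

```python
import math


def correlation_t(r, n):
    """t statistic for testing a Pearson r against zero (n - 2 df)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


N_ASSUMED = 72  # assumption: inferred from the ANOVA error df, not stated here

t_text = correlation_t(0.272, N_ASSUMED)     # dwell time on banner text
t_product = correlation_t(0.400, N_ASSUMED)  # dwell time on the product
print(round(t_text, 2), round(t_product, 2))
```

Under this assumed sample size both *t* values exceed the two-tailed 5% critical value for 70 degrees of freedom (roughly 1.99), which is in line with the significant correlations reported.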

The use of an implicit memory test (i.e., indirect priming in a word fragment completion task) likewise supported the prediction that participants would be better at showing retention of aspects of the advertising message that had been embedded in banner advertisements receiving the most attention on the advertising text and product information, that is, the text and products in the averted gaze condition. To corroborate these relationships (as in the case of the explicit memory test noted above) we report here the results of: (1) a correlation analysis examining the association between the mean dwell time on banner *text* and a participant's percentage correct word fragment completion score; and (2) a correlation analysis examining the association between the mean dwell time on the banner *product* and a participant's percentage correct word fragment completion score. The respective correlations were positive and significant and supported the presence of the predicted association ($r = 0.371$, $p = 0.001$; $r = 0.376$, $p = 0.002$; both tests two-tailed). We note that Yoo (2008) reported a similar implicit memory effect for banner advertisements, suggesting that an increase in attention in terms of consciously processing web advertisements could enhance implicit memory performance in terms of remembering the advertising words embedded in those advertisements. In sum, our findings suggest that successful implicit memory performance in remembering advertising messages in banner advertisements is critically related to the high level of attention being paid to those messages.

#### **IMPLICATIONS AND FUTURE WORK**

The present findings go a step beyond previous research on banner advertising by providing a demonstration that embedding faces with averted gaze within online banner advertisements can not only capture web users' attention while they are searching for information, but can also specifically increase their attention to the advertising message as well as brand details and product information. Furthermore, this research reveals that this increased attention to advertising information is consequential, that is, it leads to the enhanced assimilation of such information, promoting a significantly increased ability to remember the content of advertisements, as determined by means of explicit and implicit memory testing. These attentional and memorial effects can occur whether faces with averted gaze are placed in vertical banners located at the bottom-right of webpages or horizontal banners located at the bottom-center of webpages.

Our findings suggest that it is possible for advertisers to design graphical banner advertisements with embedded faces in ways that have considerable potency to orient a viewer's eye gaze toward key contents within the banner that pertain to advertising messages, brand information and product details. Vertical banners located at the bottom-right of webpages show particular sensitivity to facial manipulations involving gaze cues, with averted gaze leading, for example, to increased attention on product information as opposed to the face itself. Taken from a different perspective, the present results also attest to what are problematic designs to use for banner advertisements. For example, banners containing faces depicting mutual gaze are likely to invoke increased attention to the face at the expense of the viewer attending to advertising information. Particularly problematic are faces with mutual gaze within vertical banners located at the bottom-right of webpages, which engender especially intensive user engagement with the face itself rather than with advertising information.

As with many studies of online advertising using realistic stimuli there are limitations inherent in the present experiment that may provide an impetus for future research. First, although our use of fabricated banner advertisements with fictitious brand names reduced the potentially confounding effects of extraneous variables, such as product familiarity, we recognize that our findings may not necessarily generalize to real banner advertisements embedded in genuine website pages. It is, therefore, important for future research to explore how facial images with gaze cues have an effect on attention and memory in professional banner advertisements embedded in Internet pages so as to increase the external validity of our findings.

Second, the sizes of the vertical banners (226 × 246 pixels) and the horizontal banners (606 × 96 pixels) that we used were based on how well they fitted in the bottom-center location (in the horizontal axis) and the bottom-right location (in the vertical axis) on our constructed webpages. Although the use of these specific banner sizes allowed for an effective manipulation of banners within a naturalistic browsing context, it would be valuable for future research to examine more fully and systematically the impact of manipulating banner size across a range of dimensions. In this respect a follow-on study would also be worth conducting that carefully standardized the size (and also the content) of the vertical and horizontal banners so as to eliminate any potential confounds arising from a lack of control in these respects. We are especially conscious of the potential problems with data interpretation that can derive from a failure to control adequately for banner content across vertical vs. horizontal banner types. We know from previous research, for example, that low-level visual features are highly influential in directing attention in a bottom-up manner toward salient, localized areas of scenes and images (Theeuwes, 1994; Rayner, 1998). Indeed, contemporary models of visual attention typically include the concept of a "saliency map," which is a theoretical construct that functions to integrate information across different low-level features within a scene (e.g., color, intensity, orientation) to form a unitary map that encodes the visual saliency of those features (e.g., see Itti and Koch, 2000). The "maximum" of the saliency map corresponds to the most salient location within the image or scene, which is believed to be the location that is most likely to attract visual attention (see Simola et al., for a relevant discussion of these concepts in an advertising context).

In these latter respects we concede that in our study there remained a possibility that salient low-level features within the advertisements that we used might have inadvertently been confounded with our banner type manipulation, despite our careful attempt to standardize banners in terms of information content relating to faces, products and text. Follow-up research could control for this issue more effectively by first applying a saliency algorithm to different advertisements so as to check their comparability in terms of their inherent featural saliency. Alternatively, an experimental design in which advertising content was systematically rotated across banner types and locations would also be a good way to help control for any influence of bottom-up feature salience on attention.
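The saliency check suggested above can be sketched in a highly simplified form. The code below is not the Itti and Koch (2000) model; it assumes that per-feature maps (e.g., color, intensity, orientation) have already been computed for an advertisement, rescales each to a common range, averages them into a single map, and reports the location of the maximum. All function names and the small feature maps are hypothetical.

```python
def normalize(feature_map):
    """Rescale a 2-D list of numbers to the range [0, 1]."""
    flat = [v for row in feature_map for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # a flat map would otherwise divide by zero
    return [[(v - lo) / span for v in row] for row in feature_map]


def saliency_map(feature_maps):
    """Average the normalized feature maps cell by cell into one map."""
    normed = [normalize(m) for m in feature_maps]
    rows, cols = len(normed[0]), len(normed[0][0])
    return [[sum(m[r][c] for m in normed) / len(normed) for c in range(cols)]
            for r in range(rows)]


def most_salient_location(smap):
    """Return the (row, col) position of the map's maximum."""
    cells = ((r, c) for r in range(len(smap)) for c in range(len(smap[0])))
    return max(cells, key=lambda rc: smap[rc[0]][rc[1]])


# Hypothetical 2 x 3 feature maps for one advertisement (illustrative only)
color = [[0, 2, 10], [1, 0, 3]]
intensity = [[5, 5, 9], [5, 6, 5]]
orientation = [[0, 1, 1], [0, 0, 1]]

smap = saliency_map([color, intensity, orientation])
peak = most_salient_location(smap)
print(peak)
```

Comparing the peak location and peak value across vertical and horizontal banners would give a rough indication of whether low-level salience differs between banner types before any eye-tracking data are collected.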

Third, the present study did not investigate data relating to the number of fixations on ROIs within banner advertisements, yet previous research has suggested that a large number of fixation counts on a particular area is indicative of the informativeness and importance of that area for viewers (e.g., Bojko, 2005). Hence, it would be useful for future research to analyse fixation counts on ROIs so as to inform an understanding of how facial images and gaze cues within banner advertisements impact the perceived importance of advertising information. In addition, the analysis of data relating to the duration of first fixations (e.g., Henderson and Hollingworth, 1999) might be useful so as to obtain further insights regarding the attention-grabbing capacity of faces depicting mutual or averted gaze cues in relation to different ROIs.
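To make the relationship between these measures concrete, the following sketch derives dwell time, fixation count and first fixation duration for each ROI from a single sequence of fixations. It assumes that dwell time is the sum of fixation durations within an ROI and that fixations arrive in temporal order; the function name, data layout and fixation values are hypothetical.

```python
from collections import defaultdict


def roi_measures(fixations):
    """Summarize (roi, duration_ms) fixations, given in temporal order."""
    summary = defaultdict(
        lambda: {"dwell_ms": 0, "fixation_count": 0, "first_fixation_ms": None}
    )
    for roi, duration_ms in fixations:
        entry = summary[roi]
        entry["dwell_ms"] += duration_ms       # total time spent in the ROI
        entry["fixation_count"] += 1           # number of fixations in the ROI
        if entry["first_fixation_ms"] is None:
            entry["first_fixation_ms"] = duration_ms  # earliest fixation only
    return dict(summary)


# Hypothetical fixation sequence for one banner (not data from the study)
fixations = [("face", 220), ("text", 180), ("text", 240),
             ("product", 300), ("face", 150)]
measures = roi_measures(fixations)
for roi, values in measures.items():
    print(roi, values)
```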

Finally, it is important for future research to investigate the effects of banner advertisements containing facial images on attention and memory in terms of sex differences. This is because research by Bayliss et al. (2005) has revealed intriguing evidence that females have a greater ability to encode gaze direction than males, such that the unique capacity that gaze cues have for the reflexive orientation of attention may be more pronounced for female than male viewers of web-based advertising information.

#### **CONCLUDING REMARKS**

The present findings demonstrate how facial images with averted gaze that are embedded within online banner advertisements provide powerful orienting cues that can increase web users' attention to advertising information that is incidental to their current, goal-directed search task. Importantly, this increased attentional engagement with advertising information manifests itself in an enhanced ability to remember advertising contents such as brand information and words linked to advertising messages. The study also demonstrates that the converse results arise when banner advertisements include embedded faces with mutual gaze. In this latter case although web users are attracted to attend to the banner advertisement, they engage disproportionately with the face itself at the expense of attending to advertising information, which generally limits any memory benefits that arise for brand information or advertising details. This detrimental impact of mutual gaze on attention to advertising products is particularly marked for vertical banners located at the bottom-right of webpages, whereas averted gaze cues in such banners have a positive impact on attention to product information. Our findings give good grounds for suggesting that advertisers could capitalize on the inclusion of averted gaze cues within online advertisements so as to enhance people's engagement with (and memory for) advertising messages, brand information and product details.

#### **ACKNOWLEDGMENTS**

We are grateful to Lucy Atherton, Diana Mazgutova, and Gareth McCray, who all contributed to the success of the reported research and the writing of this article. We note our particular appreciation to Dave Gaskell, for his help with the eye-tracking technology that was central to our reported experiment. We are also grateful to the two expert reviewers who commented on previous versions of this article. Their constructive feedback has greatly enhanced our reporting of this research.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 30 September 2013; accepted: 11 February 2014; published online: 04 March 2014.*

*Citation: Sajjacholapunt P and Ball LJ (2014) The influence of banner advertisements on attention and memory: human faces with averted gaze can enhance advertising effectiveness. Front. Psychol. 5:166. doi: 10.3389/fpsyg.2014.00166*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology.*

*Copyright © 2014 Sajjacholapunt and Ball. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Online advertisement: how are visual strategies affected by the distance and the animation of banners?

### *Léa Pasqualotti and Thierry Baccino\**

*Laboratory of Human and Artificial Cognition, Department of Psychology, CHArt/LUTIN, University of Paris VIII, Saint-Denis, France*

#### *Edited by:*

*Jaana Simola, University of Helsinki, Finland*

#### *Reviewed by:*

*Jukka Hyönä, University of Turku, Finland*

*David Lane, Rice University, USA*

*\*Correspondence: Thierry Baccino, Université Paris 8, 2 Rue de la liberté, 93526 Saint-Denis, France; e-mail: baccno@free*

Most studies of online advertisements have indicated that they have a negative impact on users' cognitive processes, especially when they include colorful or animated banners and when they are close to the text to be read. In the present study we assessed the effects of two advertisement features, distance from the text and animation, on visual strategies during a word-search task and a reading-for-comprehension task using Web-like pages. We hypothesized that the closer the advertisement was to the target text, the more cognitive processing difficulties it would cause. We also hypothesized that (1) animated banners would be more disruptive than static advertisements and (2) banners would have more effect on word-search performance than reading-for-comprehension performance. We used an automatic classifier to assess variations in use of *Scanning* and *Reading* visual strategies during task performance. The results showed that the effect of dynamic and static advertisements on visual strategies varies according to the task. Fixation duration indicated that the closest advertisements slowed down information processing but there was no difference between the intermediate (40 pixel) and far (80 pixel) distance conditions. Our findings suggest that advertisements have a negative impact on users' performance mostly when a lot of cognitive resources are required, as in reading for comprehension.

#### **Keywords: visual strategies, web pages, advertising, banner blindness**

#### **INTRODUCTION**

Because the economic model of the Internet is based on advertisements, advertisers attempt to grab users' attention by any means possible. Even during an activity such as reading, advertisements can disrupt the attention of readers, making text comprehension more difficult (Baccino, 2004). However, attention is a highly labile capacity and reports of attentional disturbance from online advertisements have led to extensive research on the influence of banners on Internet users (Diaper and Waelend, 2000; Burke et al., 2005; Pagendarm and Schaumburg, 2006; Zhang, 2006; Simola et al., 2011). Studies can be classified according to whether they focused on the level of control of attentional processes (Theeuwes, 1994; Theeuwes and Burger, 1998; Drèze and Hussherr, 2003; Stenfors et al., 2003) or the distinction between overt and covert attention (Burke et al., 2005; Diaper and Waelend, 2000; Simola et al., 2011).

#### **SHIFTS OF ATTENTION**

Studies of online advertisement have retained the classical distinction between automatic and controlled attentional processes (Theeuwes, 1994; Theeuwes and Burger, 1998; Drèze and Hussherr, 2003; Stenfors et al., 2003; Simola et al., 2011). From a bottom–up perspective, involuntary shifts of attention are guided by salient elements of the on-screen display (Itti and Koch, 2000). Controlled shifts of attention to particular elements of the interface are determined by the goal: top–down processing. Nevertheless, research has provided evidence for a two-component model of attentional shifting: a fast bottom–up process and a slower top–down mechanism (Braun and Sagi, 1990; Hikosaka et al., 1996; Braun, 1998; Braun and Julesz, 1998; Itti and Koch, 2000). This research has also distinguished between overt attention and covert attention. Overt attentional shifts are manifested as an eye movement toward the element which has grabbed the individual's attention. Covert attentional shifts do not involve eye movement. Previous work has provided mixed results on overt and covert shifts of attention. For example, Burke et al. (2005) suggested that online advertisements affect users' performance even when users show no eye movements or overt attention toward them, whereas Simola et al. (2011) obtained data indicating that users directly fixated advertising banners, particularly those on the right-hand side of Web pages.

#### **IMPACT OF ADVERTISEMENTS ON ATTENTION**

According to Kahneman's theory of attention (1973), sharing capacity is reduced when one of two competing tasks is highly demanding. Based on this theory, Simola et al. (2011) suggested that advertisements act as distractors: covertly attending to advertisements decreases the cognitive resources assigned to the main task. Whether attention to advertisements is overt or covert, controlled or automatic, there is a consensus that online advertisements affect users' performance. Recent results (Simola et al., 2011) suggested that users paid overt attention to banners, particularly when they were located on the right-hand side of a Web page. These authors specified that the most distracting Web page configuration was characterized by a static banner at the top of the page and an animated banner on the right-hand side. Previous work also highlighted the impact of the size of advertisements or advertisement elements on attentional shifts, showing that larger advertisements attract more fixations (Lohse, 1997; Wedel and Pieters, 2007). Other studies have linked larger surface size to higher visual saliency (Pieters et al., 2007; Orquin et al., 2012). Previous research has also consistently found that the impact of online advertisement varies according to the task: tasks which require higher level cognitive resources and deeper information processing suffer less interference from advertisements (Diaper and Waelend, 2000; Pagendarm and Schaumburg, 2006; Simola et al., 2011). Additionally, Wang and Day (2007) reported that the level of attention paid to an online advertisement varies according to the stage of the task; they found that users were more sensitive to banners at the beginning and the end of an information search task.

#### **BANNER BLINDNESS**

Not all research has confirmed the hypothesis that online advertisements affect users' performance; some studies found that some users ignore the banners (Benway and Lane, 1998; Drèze and Hussherr, 2003; Stenfors et al., 2003). This capacity to actively ignore advertisements, which are typically salient elements of a visual display, is called "banner blindness" and was first reported by Benway and Lane (1998). These authors investigated how users browsed through a corporate Intranet to find a link to Internet courses. They reported that even large, colorful or dynamic banners which may contain information relevant to the task can be ignored. However, Benway and Lane (1998) did not use actual advertising banners but links styled as banner advertisements. Previous studies have also indicated that the position of advertisements affects the strength of banner blindness (Burke et al., 2005; Cooke, 2008; Owens et al., 2011). Cooke (2008) and Owens et al. (2011) obtained similar results which indicated that users actively ignored the right-hand side of Web pages when they expected to find an advertisement there. It was suggested that users may anticipate the position of the banners and may respond by focusing on the top of the page (Burke et al., 2005). Owens et al. (2011) also suggested that users tend to actively ignore areas of the interface where advertisements are usually located. However, Theeuwes and Burger (1998) reported that the banner blindness phenomenon only occurs when users are aware of the distractor and its features. These authors also stated that the phenomenon disappears when the distractor varies randomly from one trial to another.

#### **OBJECTIVES OF THE PRESENT STUDY**

The objective of the present study was to assess the impact of banners on two types of visual strategy used for visual inspection: reading and scanning. We investigated the impact of distance from the target material and animation of advertising banners on visual strategies. We investigated two specific questions. (1) In which conditions do we observe a banner blindness phenomenon? (2) How do visual strategies vary with distance from the target and animation of banners? We recorded participants' eye movements while they performed two different reading activities. Participants performed trials of a word-search task and a reading-for-comprehension task in random order. The goal of the word-search task was to find a specific target word in a Web page. The reading-for-comprehension task required participants to scan or read the Web page attentively in order to summarize the topic afterwards. It was hypothesized that the closer the advertisement was to the target, the more difficulty participants would have with task processing. We also hypothesized that animated banners would be more disruptive than static advertisements. We predicted that participants would be disturbed by advertisements while performing the word-search task; because the reading-for-comprehension task required more cognitive resources, we predicted that participants would apply strategies to ignore the banners and would not be distracted by them. Previous studies have shown that readers can switch between different cognitive states whilst performing a reading activity, for instance shifting between scanning and reading (Carver, 1990; Simola et al., 2008; Cole et al., 2011). These different cognitive states can be identified by specific eye movement patterns (Lemaire et al., 2011).

Our study attempted to classify visual strategies automatically. The classification data were used to explore how the effect of advertising banners on visual strategies varies according to the depth of processing required by the target task and how advertisements generate task-switching. From a theoretical standpoint, the present study potentially provides new perspectives on theories of online advertising and attention and on the methodologies used to investigate online attention. From a practical standpoint, information on the effects of advertising banners could guide Web designers, developers and advertisers in their choice of banner distance and animation.

#### **EXPERIMENT**

#### **PARTICIPANTS**

The required sample size for *F*-tests (repeated measures ANOVA, within-subjects factors) was estimated by a power analysis (G\*Power 3.1.7) (Faul et al., 2007). The results showed that with 12 experimental conditions (see below for the Design) and 24 trials, 12 participants would be required to achieve a significance level of *p* = 0.05 (power = 0.95; effect size = 0.25). Twenty-four participants (12 females, 12 males, all right-handed) were tested. The participants were students at the University of Paris VIII and the Ecole Pratique des Hautes Etudes (EPHE). Their mean age was 30 years; the range was from 21 to 38 years. All participants were native French speakers and reported normal or corrected-to-normal vision. They were not aware of the purpose of the study. The students did not receive any reward for their participation.

#### **APPARATUS**

Eye movements were recorded using an infrared video eye-tracking device (SMI RED500, SensoMotoric Instruments, Teltow, Germany) sampling pupil and corneal reflection at 500 Hz. The screen coordinates of the left eye were sampled. The system has a spatial tracking accuracy of approximately 0.5° of visual angle. The calibration was run on 9 points to optimize spatial tracking accuracy. Drift was corrected once during the experiment, after 12 trials. Data were recorded with Experiment Center software (SMI, Teltow, Germany) and processed with BeGaze software.

The participants were seated on a chair at a fixed distance of approximately 57 cm away from the monitor and the eye-tracker. A chin-rest was used to minimize head movements during the recording. Participants were given the opportunity to adjust the seat and chin-rest to the most comfortable position. The stimuli were presented on a 24 Dell 2007 FP LCD flat screen with a 60 Hz refresh rate. The screen resolution was 1280 × 1024 pixels. With this screen resolution and the given distance from the screen, 1° of visual angle encompassed 2.3 letters on average.

#### **STIMULI**

#### *Texts*

Fifty texts from six domains (France, World, Science, Technology, Sport, and Culture) were extracted from newspaper websites. The length of the texts was controlled by the number of words (*M* = 168.63; *SD* = 4.85) and the number of lines (*M* = 12.25; *SD* = 0.59). The 50 texts were pretested to ensure that the texts used in the main task all had a similar level of difficulty. Eight students from the University of Paris VIII participated in the pretest. The relative difficulty of each text was evaluated with 3 subjective questions and 3 inferential questions. For the subjective questions, participants rated the text difficulty using a five-point Likert scale (from "1", very difficult, to "5", very easy). The inferential questions were true-false questions and a correct response required use of information from the texts and participants' general knowledge. Texts were excluded if an error was made on the inferential questions and if the mean rating was ≤2 on the Likert scale. Four training texts and 24 experimental texts were selected and integrated into Web-like pages that we created. The average estimated difficulty of the 28 texts was about 3.92 (*SD* = 0.59).

#### *Web pages*

We designed 28 Web-like pages structured as follows (see **Figure 1**): a horizontal main menu on top of the page, a vertical menu on the left-hand side and a central text. An advertising banner was positioned on the right-hand side of the 24 experimental pages. There were 3 possible distances (in pixels; px) between the text and the banner: 0 px (*near*), 40 px (*intermediate*), and 80 px (*far*). The web pages were stored on a server using FileZilla Client freeware and displayed using the Internet Explorer 9 browser.

#### *Banners*

Forty-two vertical advertising banners were selected from various websites. In order to control the impact of surface size on attention, all banners used the same 120 × 600 px format (Peschel and Orquin, 2013). The visual salience of the banners was also controlled using the Itti and Koch algorithm (Itti and Koch, 2000). Twenty-four banners with similar salience maps were chosen and integrated into the Web pages. The salience maps were compared pairwise in terms of the Area Under Curve (AUC) for each banner (Le Meur and Baccino, 2012); the average correlation was highly significant (*r* = 0.81, *p* < 0.001). Dynamic and static versions of each banner were available.

#### *Target words for the word-search task*

A single target word per text was selected for the word-search task. Only nouns were chosen. The target was randomly chosen from the beginning, the middle or the end of the text contained in the Web pages. The horizontal position of the target words also varied: they were chosen from the beginning, the middle or the end of the lines. The selected word only appeared once in the text. The target words were 5–8 letters long; this length was selected so that the length of the target words would be close to the mean length of French words. We computed the frequency of the targets using a corpus of French texts (New et al., 2001). The average frequency<sup>1</sup> was about 81.22 per million (*SD* = 60.68). When displayed on the screen, the target words were 1.2–1.8 cm long, that is, they subtended 1.2–1.8° of visual angle. SMI Experiment Center software allows the user to specify the triggers which advance the task from one trial to another; Areas of Interest (AOIs) can be used as triggers. We defined the target words as AOI triggers to ensure that participants always located the targets and completed the task. We defined a 1000 ms threshold for the time clock of the trigger AOIs, i.e., the participants had to fixate the target words for 1000 ms to access the next trial.

#### *Post-test questionnaire*

A post-test questionnaire was developed to assess how participants felt affected by the banners. The questionnaire consisted of 14 statements which were evaluated using a five-point Likert scale. Participants completed the questionnaire at the end of the experiment. Half the questions investigated whether participants had paid attention to the advertisements. The other questions evaluated whether participants felt distracted by the advertisements while they were performing the tasks.

<sup>1</sup>Frequency per million of the lemma in the corpus of books: it corresponds to the sum of the frequencies of the inflected forms of each lemma in the corpus Frantext, normalized by dividing by 14.8 (the original corpus contains 14.7 million occurrences).

#### **CLASSIFIER ALGORITHM**

In previous work we developed an algorithm (in PERL) for categorizing fixations in terms of function: *Scanning* or *Reading*. The algorithm accuracy has been evaluated using a classifier technique (Naive Bayes) showing a cross-validation accuracy of 57% for predicting "reading fixations" and 79% for scanning fixations. The classification of a fixation is a function of the orientation *O*, horizontality *H* and the size of the saccade *S* that produced the current fixation (Equation 1).

$$\text{Class}(\text{fixation}) = f\left(O(\text{fixation}),\ H(\text{fixation}),\ S(\text{fixation})\right) \tag{1}$$

The orientation *O* is obtained by computing the difference between the x-coordinates of the fixation *f* and the previous fixation (Equation 2). A positive result corresponds to a forward saccade and a negative result corresponds to a backward saccade.

$$O(f) = x_f - x_{f-1} \tag{2}$$

The horizontality *H* is the absolute value of the difference between the y-coordinates of the fixation and the previous fixation (Equation 3). We defined a threshold for the horizontality of a saccade in terms of the height of the white space between two lines of characters: if a saccade was confined within a 45 px vertical gap it was classified as a horizontal saccade.

$$H(f) = |y_f - y_{f-1}| \tag{3}$$

The size *S* is the Euclidean distance between the fixation and the previous one (Equation 4). Saccades were classified as short or long by reference to a threshold specified in terms of perceptual span, which extends about 12–15 characters to the right side of the fixation point and about 4 characters to the left side (McConkie and Rayner, 1975, 1976), i.e., 4° of visual angle.

$$S(f) = \sqrt{(x_f - x_{f-1})^2 + (y_f - y_{f-1})^2} \tag{4}$$

Although the most important part of visual information is processed within the foveal region, during reading information is also extracted from the parafoveal region. This corresponds to a perceptual span which is about 4° of visual angle. At a distance of 57 cm from the screen, 1° of visual angle corresponds to 1 cm. With a screen resolution of 1280 × 1024 px, 4° of visual angle is about 107 px. We rounded this figure down and classified saccades less than 100 px long as short. A saccade was classified as long if it was over 600 px long; this was half the width of the space covered by the text.
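As a hedged illustration of the conversion described above, the following sketch computes the on-screen extent of a visual angle with the standard relation size = 2 · d · tan(θ/2). The pixel density is not reported in the text; the value used here is an assumption back-derived from the statement that 4° of visual angle is about 107 px, and the function names are illustrative only.

```python
import math

VIEWING_DISTANCE_CM = 57.0   # viewing distance reported in the Apparatus section
PX_PER_CM = 107 / 4.0        # ASSUMPTION: back-derived from "4 deg is about 107 px"


def angle_to_cm(angle_deg, distance_cm=VIEWING_DISTANCE_CM):
    """On-screen extent (cm) subtended by a visual angle at a given distance."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)


def angle_to_px(angle_deg, distance_cm=VIEWING_DISTANCE_CM, px_per_cm=PX_PER_CM):
    """Same extent expressed in pixels, under the assumed pixel density."""
    return angle_to_cm(angle_deg, distance_cm) * px_per_cm
```

Under these assumptions, `angle_to_cm(1.0)` is within a hundredth of a centimeter of 1 cm, which is why a 57 cm viewing distance is convenient: degrees and centimeters on the screen are nearly interchangeable for small angles.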

Considering a fixation *n*, the algorithm treats it as a reading fixation in three different cases:

• fixation *n* is preceded by a short, horizontal forward saccade, i.e., oriented to the right in French, which is a left-to-right language (Equation 5). This is a normal reading saccade.
• fixation *n* is preceded by a long backward saccade that stays within the 45 px vertical threshold and spans more than 50% of the text width (Equation 6).
• fixation *n* is preceded by a short, horizontal backward saccade and fixation *n-1* was itself classified as *reading* (Equation 8).

All other fixations were classified as *scanning* (Equation 7). A fixation *n* resulting from a short, horizontal backward saccade preceded by a *scanning fixation n-1* is classified as *scanning* (Equation 8).

$$f\left(O(n),\ H(n),\ S(n)\right) = \text{"Reading" if } O(n) > 0 \text{ and } H(n) < 45 \text{ and } S(n) < 100 \tag{5}$$

$$f\left(O(n),\ H(n),\ S(n)\right) = \text{"Reading" if } O(n) < 0 \text{ and } H(n) < 45 \text{ and } S(n) > 50\% \text{ of text width} \tag{6}$$

$$f\left(O(n),\ H(n),\ S(n)\right) = \text{"Scanning" otherwise} \tag{7}$$

$$f\left(O(n),\ H(n),\ S(n)\right) = f\left(O(n-1),\ H(n-1),\ S(n-1)\right) \text{ if } O(n) < 0 \text{ and } H(n) < 45 \text{ and } S(n) < 100 \tag{8}$$

This method does not depend on the content of the page, but only on the shape of the scanpath, which is an advantage with noisy data. The method used to record eye movements of subjects reading multi-line texts produces rather noisy data, which do not allow the position of the eye to be determined with precision. Working from the shape of the scanpath instead of the content fixated is therefore necessary. Holmqvist et al. (2003) applied a similar method to identify scanning and reading fixations recorded when readers covered newspapers and net papers. They analyzed fixation data above 100 ms through a custom-made reading filter. Reading fixations were filtered if they were (1) before, between or after two successive forward saccades and (2) before and after return sweeps. Correction and backward saccades were not recognized. The fixations that were not filtered were labeled as scanning. Unlike the filter of Holmqvist et al. (2003), our classifier algorithm classifies fixations that occur after backward saccades.
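The rules in Equations 5–8 can be sketched in a few lines of Python. This is not the original PERL implementation, only a minimal reading of the equations: the thresholds (45 px, 100 px, 600 px) are taken from the text, while the function name, the input format (a list of (x, y) pixel coordinates) and the treatment of the first fixation are assumptions.

```python
import math

H_MAX_PX = 45        # horizontality threshold from the text
SHORT_MAX_PX = 100   # short-saccade threshold from the text
LONG_MIN_PX = 600    # long-saccade threshold (reported as half the text width)


def classify_fixations(points):
    """Label each fixation 'Reading' or 'Scanning' from (x, y) pixel coordinates.

    ASSUMPTION: the first fixation has no preceding saccade and is labeled
    'Scanning'; the source does not say how it is handled.
    """
    labels = []
    for n, (x, y) in enumerate(points):
        if n == 0:
            labels.append("Scanning")
            continue
        prev_x, prev_y = points[n - 1]
        o = x - prev_x                           # orientation, Equation 2
        h = abs(y - prev_y)                      # horizontality, Equation 3
        s = math.hypot(x - prev_x, y - prev_y)   # size, Equation 4
        horizontal = h < H_MAX_PX
        if o > 0 and horizontal and s < SHORT_MAX_PX:
            labels.append("Reading")       # Equation 5: short forward saccade
        elif o < 0 and horizontal and s > LONG_MIN_PX:
            labels.append("Reading")       # Equation 6: long backward saccade
        elif o < 0 and horizontal and s < SHORT_MAX_PX:
            labels.append(labels[-1])      # Equation 8: inherit previous label
        else:
            labels.append("Scanning")      # Equation 7: everything else
    return labels
```

For a hypothetical scanpath such as `[(100, 100), (160, 102), (220, 101), (190, 100), (900, 100), (200, 130), (600, 400)]`, the short backward step to (190, 100) inherits the *Reading* label of the fixation before it, whereas the long forward jump to (900, 100) falls through to *Scanning*.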

#### **DESIGN**

The experiment used a full within-subjects design with 2 tasks (word-search, reading-for-comprehension), 2 banner animations (dynamic, static), and 3 positions (near, 0 px; intermediate, 40 px; far, 80 px) as experimental variables. These variables were counterbalanced in a Latin square design to produce 12 lists of stimuli and avoid any biases. Two trials per condition were assigned to the participants. In each list, 12 Web pages were assigned to the word-search task and 12 other Web pages to the reading-for-comprehension task. Web pages contained either a dynamic or a static advertisement positioned at 0, 40, or 80 px from the text. The 24 participants were randomly assigned to the 12 lists composed of 28 Web pages (4 training Web pages, 24 experimental Web pages).
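To make the counterbalancing concrete, the sketch below enumerates the 12 conditions and builds 12 stimulus lists by cyclic rotation. The source does not specify which Latin square was used, so the cyclic scheme, the page indexing and the names are assumptions chosen only to reproduce the stated properties (two trials per condition per list, 12 pages per task).

```python
from itertools import product

TASKS = ("word-search", "reading-for-comprehension")
ANIMATIONS = ("dynamic", "static")
DISTANCES_PX = (0, 40, 80)

# 2 tasks x 2 animations x 3 distances = 12 experimental conditions
CONDITIONS = list(product(TASKS, ANIMATIONS, DISTANCES_PX))


def build_lists(n_pages=24):
    """Return 12 lists; list[shift][page] is the condition shown on that page.

    ASSUMPTION: a simple cyclic Latin square (rotation by one condition per
    list); the square actually used in the study is not reported.
    """
    k = len(CONDITIONS)
    return [
        [CONDITIONS[(page + shift) % k] for page in range(n_pages)]
        for shift in range(k)
    ]
```

With this rotation every condition occurs exactly twice in each list, and every page appears in each of the 12 conditions exactly once across lists.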

#### **EXPERIMENTAL PROCEDURE**

The experiment was run individually in an isolated and quiet workspace. First of all the participants read the instructions on the screen which described the reading tasks they were to perform: searching for a particular word in a text (*word-search task*) and reading the text carefully in order to provide a brief summary of the topic afterwards (*reading-for-comprehension task*). Then the experimenter asked the participants if they understood the instructions. The instructions for both tasks were developed in line with Carver's (1990) methodological recommendations. After calibration, the subjects performed the 4 training trials (2 trials for each task) and at the end the experimenter checked again that participants had understood the instructions. When everything had been checked, participants performed the 24 test trials. The two tasks were presented randomly across the trials.

On each trial of the word-search task a target word was displayed on the screen. Participants were instructed to memorize the word, then press the space bar and fixate a cross appearing for 1 s in the top center of the screen. The Web page was then displayed and the participant had to find the target word as quickly as possible. Once the target had been located participants had to fixate it for 1000 ms to trigger the end of that trial and start the next one. On each trial of the reading-for-comprehension task an instruction to read the text carefully was presented on the screen. Participants were then asked to press the space bar and to fixate a cross appearing for 1 s in the top center of the screen. This caused the Web page to be displayed and the participants could read the text. To complete the task participants had to close the browser and provide a brief written summary of the topic of the text in a dedicated area. The next trial started after their answer had been recorded.

The experimenter stayed with the participants throughout the training and experimental sessions to monitor the eye-tracking system. At the end, the participants answered a post-test questionnaire to assess their perception of the banners after which the experimenter explained the aim of the study to the participants and answered any questions.

#### **DATA ANALYSES**

Analyses of variance for repeated measures (rm ANOVA) were conducted on 5 dependent variables: fixation duration, number of fixations, first-fixation duration, gaze duration and saccade amplitude, with a fixed significance threshold of *p* < 0.05. First-fixation duration was defined as the mean duration of the first 5 fixations. The objective was to investigate where subjects fixated when the webpage was first displayed and how the durations of these early fixations differed from those of the rest of the fixations. Gaze duration was the sum of fixation durations for an AOI (the text or the banner). The objective was to examine the total processing time for all the elements of the webpage. All analyses were corrected using Bonferroni *post-hoc* tests. A low cut-off of 100 ms and a high cut-off of 500 ms were used for filtering fixations; these cut-offs corresponded to 2 SD above and below the average (i.e., 3.7% of fixations were excluded as outliers). Outlanding saccades, i.e., saccades that landed outside the screen, were excluded from the analyses (0.24% of all saccades). After filtering the eye movement data the results of a Kolmogorov-Smirnov and

We defined 2 AOIs, one on the central text and one on the banner. The size of the AOI on the text was 430 × 565 px. The size of the advertisement AOI was the area of the banner (120 × 600 px).
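The filtering and gaze-duration steps described above can be sketched as follows. The cut-offs and AOI sizes come from the text; the AOI screen positions, the inclusive treatment of the cut-offs, the (x, y, duration) tuple format and the function names are assumptions made for illustration.

```python
MIN_FIX_MS = 100   # reported low cut-off
MAX_FIX_MS = 500   # reported high cut-off

# ASSUMPTION: AOI origins (left, top) are hypothetical; only sizes are reported.
AOIS = {
    "text": (300, 200, 430, 565),     # left, top, width, height in px
    "banner": (730, 200, 120, 600),
}


def filter_fixations(fixations):
    """Keep (x, y, duration_ms) fixations within the cut-offs (inclusive here)."""
    return [f for f in fixations if MIN_FIX_MS <= f[2] <= MAX_FIX_MS]


def aoi_of(x, y, aois=AOIS):
    """Return the name of the AOI containing (x, y), or None."""
    for name, (left, top, width, height) in aois.items():
        if left <= x < left + width and top <= y < top + height:
            return name
    return None


def gaze_durations(fixations, aois=AOIS):
    """Sum fixation durations (ms) per AOI after filtering."""
    totals = {name: 0 for name in aois}
    for x, y, duration in filter_fixations(fixations):
        name = aoi_of(x, y, aois)
        if name is not None:
            totals[name] += duration
    return totals
```

Fixations that fall outside both AOIs (for example on the menus) are simply not counted toward either total in this sketch.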

#### **RESULTS**

**Table 1** summarizes the means and standard deviations for fixation durations, number of fixations and saccade amplitudes for all the experimental conditions (see also **Figure 2**).

#### **BEHAVIORAL DATA: EYE MOVEMENTS**

#### *Variation of eye data during reading activities*

The following results consider all the eye movement data together regardless of where they terminated on the Web page (main text vs. banner). We hypothesized that overall eye movement data would be affected by banner animation. There were no significant main effects of the variables *Task*, *Animation*, and *Distance* on the eye movement metrics, all *F*s < 1. However, there was a significant interaction between *Task* and *Animation* for fixation duration [*F*(1, 23) = 9.36, *p* < 0.010; η<sup>2</sup> = 0.29, α = 0.05], number of fixations [*F*(1, 23) = 41.76, *p* < 0.001; η<sup>2</sup> = 0.64, α = 0.05], and saccade amplitude [*F*(1, 23) = 7.76, *p* < 0.025; η<sup>2</sup> = 0.25, α = 0.05]. During the word-search task, participants made more fixations (see **Figure 3**) [*F*(1, 23) = 37.98, *p* = 0.008] and fixations were longer [*F*(1, 23) = 6.48, *p* = 0.024] if the banner was dynamic. Although the effect did not reach significance, dynamic banners also tended to generate shorter saccades [*F*(1, 23) = 2.70, *p* = 0.114]. The opposite pattern of results was found for the reading-for-comprehension task. When the banners were static, Web pages received more fixations [*F*(1, 23) = 11.71, *p* = 0.001] of longer duration [*F*(1, 23) = 4.94, *p* = 0.027] and saccades were shorter [*F*(1, 23) = 9.11, *p* = 0.020].

We did not obtain any significant results for either first-fixation durations or gaze durations, *p* = *ns*.

An ANOVA for *Trial Durations* (i.e., mean time required to complete the task) was also carried out to estimate readers' efficiency. Again the analyses revealed only one significant result: an interaction between *Task* and *Animation*, *F*(1, 23) = 21.99, *p* < 0.001; η<sup>2</sup> = 0.49, α = 0.05. During the word-search task, participants took longer to complete the task when the Web pages contained a dynamic banner, *F*(1, 23) = 9.17, *p* = 0.017. The opposite result was found for the reading-for-comprehension task: completion times were longer when the banner was static, *F*(1, 23) = 12.00, *p* = 0.019.

#### *Advertisement and Areas of Interest (AOIs)*

In order to investigate the banner blindness effect, an ANOVA was carried out for all eye movement data from two AOIs: one defined on the banner and another one on the central text. We defined another factor *Zone* to investigate differences between the two AOIs. We weighted the fixation durations and the number of fixations according to the size of the AOIs. The results showed that the central text received significantly longer fixations than the banner, *F*(1, 23) = 176.61, *p* < 0.001; η<sup>2</sup> = 0.88, α = 0.05. It also attracted significantly more fixations, *F*(1, 23) = 324.89, *p* < 0.001; η<sup>2</sup> = 0.93, α = 0.05. The position of the banner also affected fixation duration, *F*(2, 46) = 3.39, *p* < 0.050; η<sup>2</sup> = 0.13, α = 0.05. Banners near the central text received longer fixations than banners at an intermediate distance or far from the text, *F*(1, 23) = 5.14, *p* = 0.000. There was no significant difference between intermediate and far banners, *p* = *ns*. There was a significant interaction between *Zone* and *Distance* on fixation durations, *F*(2, 46) = 4.34, *p* < 0.025; η<sup>2</sup> = 0.16, α = 0.05. Fixation durations for the central text did not vary according to the distance of the banner from the text, all *F*s < 1. However, banners near the central text received longer fixations than banners at an intermediate distance or far from the text, *F*(1, 23) = 4.78, *p* = 0.020 (see **Figure 4**). The results also indicated a three-way interaction between *Zone*, *Task*, and *Animation*, *F*(1, 23) = 40.20, *p* < 0.001; η<sup>2</sup> = 0.64, α = 0.05. However, the number of fixations only varied on the central text: during the word-search task the text received more fixations when the advertisement was dynamic, *F*(1, 23) = 11.08, *p* = 0.000, but during the reading-for-comprehension task the text received more fixations when the banner was static, *F*(1, 23) = 35.40, *p* = 0.000.

**Table 1 | Average fixation durations (ms), average number of fixations and average saccade amplitude (degrees of visual angle) by task, animation, and banner location for all participants.**

*Standard deviations are given in brackets.*

#### *Post-test questionnaire and visual strategies*

The post-test questionnaire was used to investigate participants' subjective perception of shifts of attention toward the banners and how they thought they had been affected by the banners. The higher the score, the more attention grabbed and the more distraction felt. Of the 24 participants, 13 (54%) reported that they did not pay attention to the banner and were not affected by the banners (*M* = 1.4; *SD* = 0.46) (see **Figure 5**).

ANOVAs were carried out for the eye movement data from the 13 participants who reported that they had not paid any attention to the banners, to investigate possible automatic and unconscious shifts of attention (see **Table 2**). These participants covered the Web pages with more fixations when they contained a dynamic banner, *F*(1, 12) = 7.32, *p* < 0.025; η<sup>2</sup> = 0.38, α = 0.05. A significant interaction between *Task* and *Animation* [*F*(1, 12) = 14.30, *p* < 0.010; η<sup>2</sup> = 0.54, α = 0.05] indicated that dynamic banners only affected the number of fixations during the word-search task, *F*(1, 12) = 20.05, *p* = 0.005. There was also an interaction between the *Animation* and *Distance* of the banners for fixation durations during the reading-for-comprehension task, *F*(1, 12) = 3.56, *p* < 0.050; η<sup>2</sup> = 0.23, α = 0.05. The far (80 px) banners generated longer fixations when they were static, *F*(1, 12) = 5.40, *p* = 0.036.

We did not obtain any significant results for the saccade amplitudes of the 13 participants, *p* = *ns*.

#### **CLASSIFIER ALGORITHM: PROPORTIONS OF** *SCANNING* **AND** *READING*

An algorithm was developed to explore how visual strategies vary according to the text layout, how advertising banners affect visual strategies according to processing depth, and how advertisements generate task-switching. This algorithm classified all fixations as *Scanning* or *Reading* according to the saccade that preceded the fixation. Logically, more scanning fixations should be found in the search task and more reading fixations in the reading-for-comprehension task. The classifier was applied to fixation data (i.e., after event detection had been applied to the eye samples to detect fixations and saccades). Many event-detection algorithms have been used in the eye-tracking literature; one that might also improve the accuracy of our algorithm was developed by Nyström and Holmqvist (2010) for fixation, saccade and glissade detection. It seems fairly robust, addresses problems which affected other event-detection algorithms, and would be of interest in this context.

We computed a scanning rate for each condition and each participant (**Table 3**). The results suggested that participants switched between the strategies during both tasks. About half the fixations during the reading-for-comprehension task were classified as *Scanning* (*M* = 50.67, *SD* = 8.98). The proportion of fixations classified as scanning was slightly larger for the word-search task (*M* = 52.25, *SD* = 8.45). There was considerable variability in strategy between the participants across conditions (*M* = 51.5, *SD* = 11.34). For example, Participant 17 used a scanning strategy much more than Participants 4 and 15 (64 vs. 36% averaged across conditions).
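As a minimal sketch of how such a scanning rate could be computed from classified fixations, the function below groups labels by participant and condition and returns the percentage labeled *Scanning*. The record format and names are assumptions; the source does not describe the data structures used.

```python
from collections import defaultdict


def scanning_rates(records):
    """Percentage of 'Scanning' fixations per (participant, condition).

    records: iterable of (participant, condition, label) tuples, where label
    is 'Scanning' or 'Reading' (ASSUMED format, for illustration only).
    """
    counts = defaultdict(lambda: [0, 0])   # [scanning count, total count]
    for participant, condition, label in records:
        cell = counts[(participant, condition)]
        cell[0] += label == "Scanning"
        cell[1] += 1
    return {key: 100.0 * scanning / total
            for key, (scanning, total) in counts.items()}
```

Averaging the returned values over conditions for one participant would give a per-participant figure comparable in form to the percentages quoted above.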

We carried out a Friedman ANOVA on the proportion of fixations assigned to each strategy. Proportions of scanning were compared by *Task* and *Animation*. Significant differences were found in use of the scanning strategy across the conditions, χ<sup>2</sup>(11, 23) = 30.00, *p* < 0.010. For both the word-search and reading-for-comprehension tasks, Tukey's HSD *post-hoc* test revealed that participants used a *Scanning* strategy significantly more when the banner was static (*p* < 0.050). When the advertising banner was static participants switched to a *Scanning* strategy more often during the reading-for-comprehension task than during the word-search task (*p* < 0.010).
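For readers unfamiliar with the Friedman test, the statistic can be sketched in plain Python: each participant's values are ranked across the *k* conditions, the ranks are summed per condition, and the statistic is 12 / (n k (k + 1)) · Σ R<sub>j</sub><sup>2</sup> − 3 n (k + 1), with k − 1 degrees of freedom (11 for 12 conditions). This is the textbook formula without tie correction, not necessarily the exact computation performed by the statistics software used in the study; the function names are illustrative.

```python
def average_ranks(values):
    """Rank values from 1..k, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for idx in order[i:j + 1]:
            ranks[idx] = mean_rank
        i = j + 1
    return ranks


def friedman_chi_square(table):
    """Friedman statistic for a participants x conditions table of scores.

    ASSUMPTION: no tie correction is applied.
    """
    n = len(table)        # number of participants (rows)
    k = len(table[0])     # number of conditions (columns)
    rank_sums = [0.0] * k
    for row in table:
        for j, rank in enumerate(average_ranks(row)):
            rank_sums[j] += rank
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
```

Each row of `table` would hold one participant's scanning proportions in the 12 conditions.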

#### **DISCUSSION**

The impact of online advertisement has been the topic of research for many years. The theoretical debate has contrasted top–down and bottom–up processing (Theeuwes, 1994; Theeuwes and Burger, 1998; Drèze and Hussherr, 2003; Stenfors et al., 2003; Simola et al., 2011) and overt and covert shifts of attention (Benway and Lane, 1998; Itti and Koch, 2000). In the present work we investigated how the animation and placement characteristics of advertising banners affected readers' eye movements, and thus their cognitive states, during two different reading activities. Previous studies of visual processing activities using statistical models suggested that eye movements reflect readers' cognitive states (Carver, 1990; Rayner and Pollatsek, 1992; Rayner, 1995, 1998; Simola et al., 2008; Cole et al., 2011; Lemaire et al., 2011; Henderson et al., 2013). We predicted that the closer the advertisement was to the text, the more difficulty participants would have with task processing. We also hypothesized that animated banners would be more distracting than static advertisements. We predicted that the banners would have a stronger effect during the word-search task, but that participants would experience "banner blindness" during the reading-for-comprehension task. We recorded the eye movements of participants performing both word-search and reading-for-comprehension tasks and investigated transitions between visual strategies with the help of a classifier algorithm that differentiates *scanning* fixations from *reading* fixations.

The results revealed that readers' eye movements were affected differently by the characteristics of the advertising banners during the word-search and reading-for-comprehension tasks. When participants were performing the word-search task, the eye movement data showed shorter fixation durations, fewer fixations, shorter saccades and less efficiency when the banners were dynamic rather than static. During the reading-for-comprehension task, performance was worse when the banners were static. On both the word-search and reading-for-comprehension tasks, the variations in the number of fixations only applied to the central text. The results also indicated that the central text received longer fixations than the banner and that variations in fixation durations for the banner only occurred when it was near the central text. Although 54% of the participants reported that they had not paid attention to the banners, the results showed they were affected by dynamic banners during the word-search task and by the distant (80 px) static banners when performing the reading-for-comprehension task. The results of the strategy classification algorithm suggested that when readers were performing the word-search task they switched from a scanning strategy to a reading strategy more often if the banner was dynamic, whereas when they were performing the reading-for-comprehension task they switched from a reading strategy to a scanning strategy more often if the banner was static.

These results have implications for understanding how online advertising banners grab users' attention. They strongly suggest that advertisements affected users in a bottom–up manner. The banners, as salient elements of the Web pages, automatically generated shifts of attention toward them. Although in the current study most of the attentional shifts were covert, these data also provide evidence supporting overt attention theories (Simola et al., 2011). Shifts of attention toward the advertisements were sometimes accompanied by an eye movement. Our comparative analysis of the use of scanning and reading strategies is consistent with previous work suggesting that advertisements have a negative impact on users' performance (Diaper and Waelend, 2000; Burke et al., 2005; Zhang, 2006). Whilst performing the word-search task, participants appear to have slowed down their reading rate more often when the banner was dynamic. During the reading-for-comprehension task, readers seemed to experience more difficulty maintaining a consistent reading rate and switched to a scanning strategy more often when the banner was static. Nevertheless, we expected both static and dynamic banners to affect users more during the word-search task than during the reading-for-comprehension task. Our results showing that dynamic banners had a greater impact on the word-search task than static banners are consistent with previous findings (Simola et al., 2011). However, nothing in the literature explains the interaction between task type and animation. One possible explanation is that the reading-for-comprehension task was highly demanding, leaving fewer attentional resources available for organizing the sharing of attentional capacity between task processing and banner processing. Participants may also have actively used strategies to ignore the banners. The higher salience of the animated advertisements may have made them easier to ignore. Contrary to previous research which suggested that online advertisements have more impact during tasks requiring low-level information processing, such as the word-search task (Burke et al., 2005; Pagendarm and Schaumburg, 2006; Simola et al., 2011), we found that advertisements affected performance on both tasks. Participants were more affected by dynamic advertisements whilst performing the word-search task, but more disrupted by static advertisements whilst reading for comprehension.

**Table 2 | Average fixation durations (ms), average number of fixations and average saccade amplitude (degrees of visual angle) by task, animation and banner location for the 13 participants who reported that they were not affected by the advertisements.**

*Standard deviations are given in brackets.*

**Table 3 | Proportion of** *Scanning* **fixations (as %) for each participant averaged over conditions according to the classifier algorithm.**

Our data also suggest that readers were not completely able to ignore the advertisements, although banners were generally not fixated directly in either task. The number of fixations on the text varied with task and animation. Shifts of attention toward the banner were mostly covert. However, fixation durations on advertisements may imply that participants sometimes glanced briefly at the banners. It is possible that participants used banner blindness strategies when the banners were distant from the central text, but without complete success. The data from participants who claimed that they were not affected by the banners are consistent with findings from Theeuwes and Burger (1998). These authors suggested that banner blindness only occurs when users are aware of the distractors and their features, and when distractors do not vary randomly during the task. In this study advertisements varied unpredictably and participants were not warned about them, which may explain why all the participants were disturbed by them. All our participants were expert Internet users, so our findings provide support for Zhang's (2006) assertion that users cannot habituate to online advertisements.

From a practical standpoint, the current work has implications for the design of Web interfaces and could guide Web developers and advertisers in their choice of advertising banners. Banners which are well separated from target material would be preferred by Web developers seeking to limit the impact of advertisements on users and to offer more user-friendly Web interfaces. Whenever possible (depending on the device size), advertisers might prefer to display ads closer to the main content of the Web pages, as closer banners attracted longer fixation durations. However, it should be noted that the number of fixations did not vary with distance from the target text. The decision about use of animation might depend on the aim of the Web developer or advertiser; it could also depend on the task for which the Web interface was designed. For tasks which require only low-level cognitive processing, static advertisements might be preferred by Web designers, although advertisers would choose dynamic banners. The opposite pattern of preferences would probably apply to tasks requiring greater depth of processing. The present study has demonstrated that eye movements and visual strategies are affected by online advertisements, underlining that users' cognitive states are also affected by advertisements. The choice of the type of online advertisement depends on the objective. In future work, it would be interesting to replace the right-hand side banners with another type of advertisement such as pop-ups.

#### **ACKNOWLEDGMENTS**

The authors would like to express their thanks to the students of the University of Paris VIII and the EPHE who agreed to participate in this study and the two anonymous reviewers who helped us to improve the manuscript.

#### **REFERENCES**


McConkie, G. W., and Rayner, K. (1976). Asymmetry of the perceptual span in reading. *Bull. Psychon. Soc.* 8, 365–368. doi: 10.3758/BF03335168


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 04 November 2013; accepted: 24 February 2014; published online: 17 March 2014.*

*Citation: Pasqualotti L and Baccino T (2014) Online advertisement: how are visual strategies affected by the distance and the animation of banners? Front. Psychol. 5:211. doi: 10.3389/fpsyg.2014.00211*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology.*

*Copyright © 2014 Pasqualotti and Baccino. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Attention to advertising and memory for brands under alcohol intoxication

#### *Jacob L. Orquin<sup>1</sup>\*, Heine B. Jeppesen<sup>1</sup>, Joachim Scholderer<sup>1</sup> and Curtis Haugtvedt<sup>2</sup>*

*<sup>1</sup> Department of Business Administration – MAPP, Aarhus University, Aarhus C, Denmark*

*<sup>2</sup> Fisher College of Business, Ohio State University, Columbus, OH, USA*

#### *Edited by:*

*Jaana Simola, University of Helsinki, Finland*

#### *Reviewed by:*

*Susanne Kristen, Ludwig Maximilian University of Munich, Germany Emily Catherine Higgins, University of California, San Diego, USA*

#### *\*Correspondence:*

*Jacob L. Orquin, Department of Business Administration – MAPP, Aarhus University, Bartholins Allé 10, 8000 Aarhus C, Denmark e-mail: jalo@asb.dk*

In an attempt to discover new possibilities for advertising in uncluttered environments, marketers have recently begun using ambient advertising in, for instance, bars and pubs. However, advertising in such licensed premises has to deal with the fact that many consumers are under the influence of alcohol while viewing the ad. This paper examines the effect of alcohol intoxication on attention to and memory for advertisements in two experiments. Study 1 used a forced exposure manipulation and revealed increased attention to logos under alcohol intoxication, consistent with the psychopharmacological prediction that alcohol intoxication narrows attention to the more salient features in the visual environment. Study 2 used a voluntary exposure manipulation in which ads were embedded in a magazine. The experiment revealed that alcohol intoxication reduces voluntary attention to ads and leads to a significant reduction in memory for the viewed ads. In popular terms, consuming one or two beers reduces brand recall from 40 to 36%, while being heavily intoxicated further reduces brand recall to 17%.

**Keywords: advertising, eye movements, alcohol intoxication, memory, brand recall**

#### **INTRODUCTION**

Our world is cluttered with visual information. Since our attention capacity is limited, we can only process objects we encounter during a day to varying degrees of depth – sometimes processing extensively and other times at a very superficial level. A working assumption of most marketers is that more extensive processing should result in more positive outcomes for the advertised object or issue. Perhaps as a consequence of increased advertising clutter (Pieters et al., 2007) or consumer antagonism to traditional advertising (Jensen et al., 2014) a variety of nontraditional advertising strategies have been suggested as ways to get consumers to pay attention to and process advertisements more extensively. Many strategies have involved attempts to increase the salience of the advertising stimulus by placing it in uncluttered environments. One such strategy is to advertise in licensed premises, using different types of media such as restroom advertisements, promotional beer mats, and pub TV systems. The low density of advertisements in licensed premises could suggest that such a strategy might indeed be successful (Pieters et al., 2007). However, proponents of this strategy have to contend with the fact that customers in licensed premises consume alcohol. Research on the psychopharmacological effects of alcohol has demonstrated serious impairments at perceptual and post-perceptual stages of information processing, including impairments of attention functions such as object recognition (Maylor et al., 1987), allocation of resources to stimulus analysis and response selection (Pickworth et al., 1997; De Cesarei et al., 2006) and conceptual processing functions such as encoding and elaboration (Hashtroudi et al., 1983; Saults et al., 2007; Söderlund et al., 2007). In general, these impairments result in (a) a narrowing of visual attention to the most salient features in a complex stimulus (b) shallow processing of conceptual information and (c) memory loss.

In the context of advertising exposure under the influence of alcohol, it can therefore be expected that ad elements which are predominantly processed by perceptual mechanisms, such as logos and images, will have a selective advantage over ad elements that are predominantly processed by conceptual mechanisms, such as headlines and text blocks. It furthermore seems plausible that alcohol intoxication will have a detrimental effect on brand recall, although the strength of such an effect is probably moderated by attention to important ad elements like the logo. To the best of our knowledge, no research on the influence of alcohol consumption on reactions to advertising has been reported. The current research is an exploration of how various levels of alcohol might influence perception of advertising messages. While the current research focuses on traditional product advertising, the procedures and results of this research may have implications for the way individuals react to various public and personal safety messages under varying degrees of alcohol consumption.

#### **STUDY 1**

Study 1 addressed the question of how alcohol intoxication affects visual attention to ad elements. As suggested above, psychopharmacological effects of alcohol intoxication, such as a narrowing of attention to salient stimuli, might carry over to advertising perception, such that intoxicated consumers focus more on perceptual ad elements like the logo or the image. However, the degree of attention to perceptual versus conceptual ad elements is likely to depend on the balance between perceptual and conceptual elements in the visual scene (Wedel and Pieters, 2007; Orquin et al., 2013a; Peschel and Orquin, 2013). To test whether the effect of alcohol on attention to advertising depends on the balance between perceptual and conceptual elements, we conducted an eye tracking experiment manipulating the perceptual and conceptual load of advertisements. Eye tracking provides an objective measure of eye movements, which are a reliable indicator of overt visual attention (Orquin and Mueller Loose, 2013).

#### **METHODS**

#### *Participants*

Thirty-six undergraduate and graduate business students with specializations other than marketing or corporate communications were recruited on campus for participation in the study. Their mean age was 23.87 years (SD = 1.83); 36% were female.

#### *Experimental design*

Two factors of the advertising stimuli were varied: (1) brand (12 levels, representing consumer goods, services, and corporate brands) and (2) perceptual and conceptual load (three levels: high perceptual load with a dominance of pictorial elements, high perceptual and conceptual load with a balance between pictorial and text elements, and high conceptual load with a dominance of text elements). Pretesting was used to determine these levels. The two factors were completely crossed in the master design, resulting in 36 stimuli. The design was then blocked in such a way that each participant was exposed to all twelve levels of the first factor, brand, and to equal proportions of the levels of the second factor, perceptual and conceptual load.
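One way to obtain a blocking with these properties is to rotate the load level across blocks for each brand. The sketch below is an assumption about how such a design could be built, not a description of the assignment actually used; the labels, the constant names and the rotation rule are illustrative.

```python
from collections import Counter
from typing import Dict, List, Tuple

LOADS = ["perceptual", "mixed", "conceptual"]  # illustrative labels
N_BRANDS = 12


def build_blocks() -> Dict[int, List[Tuple[int, str]]]:
    """Assign each (brand, load) stimulus to exactly one of three blocks."""
    blocks: Dict[int, List[Tuple[int, str]]] = {b: [] for b in range(len(LOADS))}
    for brand in range(N_BRANDS):
        for b in blocks:
            # Rotate the load level across blocks so every brand/load pair is used once.
            load = LOADS[(brand + b) % len(LOADS)]
            blocks[b].append((brand, load))
    return blocks


blocks = build_blocks()
for b, stimuli in blocks.items():
    # Each block holds all 12 brands and 4 ads at each load level.
    print(b, len(stimuli), Counter(load for _, load in stimuli))
```

With this rotation every participant sees each brand once, four ads at each load level, and the three blocks together cover all 36 stimuli of the fully crossed design.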

#### *Materials and measures*

The 36 experimental ads were developed using graphic design software and were all based on existing market stimuli. All ads used color and were displayed in a similar size on a 21-inch color screen.

For ethical reasons we decided not to manipulate the blood alcohol concentration (BAC) of our participants. Instead, we measured the naturally occurring BAC levels of sober and intoxicated participants using a digital breathalyzer. BAC levels ranged between 0% (sober) and 0.164% (heavily intoxicated), with a mean level of 0.056% (SD = 0.054).

Measures of visual attention were obtained by means of eye tracking (Tobii 2150, frame rate: 50 frames per second). Three measures of eye movements were extracted from the eye tracker logs for each major ad element (headline, logo, image, text) in each ad: time to first fixation on the ad element, number of fixations before the first fixation on the element occurred (both measured from stimulus onset), and total fixation time to each ad element.
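The three measures can be made concrete with a small sketch that derives them from an ordered list of fixations for one ad. The data layout, class names and example values are hypothetical; the actual log format of the eye tracker is not described in the text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Fixation:
    onset_ms: float      # time since stimulus onset
    duration_ms: float
    element: str         # "headline", "logo", "image" or "text"


@dataclass
class ElementMetrics:
    time_to_first_fixation_ms: Optional[float]  # None if the element was never fixated
    fixations_before: Optional[int]             # None if the element was never fixated
    total_fixation_time_ms: float


def element_metrics(fixations: List[Fixation], target: str) -> ElementMetrics:
    """Compute the three measures for one ad element in one trial."""
    ordered = sorted(fixations, key=lambda f: f.onset_ms)
    first_index = next((i for i, f in enumerate(ordered) if f.element == target), None)
    total = sum(f.duration_ms for f in ordered if f.element == target)
    if first_index is None:
        return ElementMetrics(None, None, 0.0)
    return ElementMetrics(ordered[first_index].onset_ms, first_index, total)


# Made-up trial: image, headline, logo, text, then a second look at the logo.
trial = [
    Fixation(0.0, 200.0, "image"),
    Fixation(250.0, 300.0, "headline"),
    Fixation(600.0, 150.0, "logo"),
    Fixation(800.0, 400.0, "text"),
    Fixation(1250.0, 100.0, "logo"),
]
print(element_metrics(trial, "logo"))
```

Note that time to first fixation depends on how long the earlier fixations lasted, whereas fixations before only counts them, so the two measures can diverge.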

#### *Procedure*

All participants were recruited in the university student club in the late afternoon and evening hours and accompanied to the lab facilities by the experimenter. Before the experiment started, each participant's BAC was tested using a digital breathalyzer. Participants were positioned in front of the eye tracker and after calibration and a series of training stimuli each participant was randomly assigned to a block of 12 advertising stimuli. Each stimulus was presented for 10 sec thus creating a competition for attention among the ad elements (Orquin and Scholderer, 2011). After the experiment participants were thanked and accompanied back to the student club.

#### **RESULTS**

To test the hypothesis that alcohol intoxication influences the salience of logos, we analyzed the effect of BAC on eye movements to the four major ad elements (headline, logo, image, text) by means of Cox regression. The models were specified in such a way that the effects of BAC were estimated separately within levels of perceptual and conceptual load, controlling for brand and stratified by participant. Instances where a participant had not fixated on an ad element were defined as censored events. Likelihood ratio tests of the significance of the alcohol effect are reported in **Table 1**.

**Table 1 | Effect of blood alcohol concentration on the visual salience of ad elements (headline, logo, image, text) under different levels of perceptual and conceptual load.**


The analysis revealed that increased alcohol intoxication led to a significant amplification of the salience of the logo. A general reduction was observed in the number of fixations to other ad elements that occurred before participants fixated on the logo for the first time. Cumulative probabilities for the first fixation on the logo are plotted in **Figure 1**.

The effect of alcohol intoxication on salience of the logo was not moderated by the degree of perceptual and conceptual load in the advertisements. The amplification of salience also occurred in the high perceptual load condition, ruling out the alternative explanation that the effect could have been an artifact, caused by a reduction in conceptual information processing.

This is not to say that impairments of conceptual processing did not occur. In the high conceptual load condition, alcohol intoxication led to significant increases in the *time* before the first fixations on headline and text block occurred (see **Table 1**), suggesting a slowing-down of the conceptual information processing mechanisms. Furthermore, additional analyses revealed that alcohol intoxication led to significant decreases in the accumulated number of times participants fixated on the headline (Wald χ<sup>2</sup>[1] = 9.313, *p* = 0.002) and text block (Wald χ<sup>2</sup>[1] = 11.094, *p* = 0.001). No such effects were found for the visual elements, logo and image.

**Table 2** provides an overview of the effect of alcohol intoxication on eye movements to ad elements for three levels of alcohol intoxication: sober (BAC = 0%), intoxicated (BAC > 0% and ≤ 0.02%), and heavily intoxicated (BAC > 0.02%). The table contains three metrics. The first, *fixation likelihood*, is the probability that a participant fixates an ad element. The second, *total fixation time*, is the average time participants spent viewing an ad element, taking into account both the number and duration of fixations. Total fixation time is not conditional on participants having fixated the ad element: trials in which a participant did not fixate the element are counted as zero total fixation time. The third metric, *fixations before*, is the average number of fixations that participants made on other ad elements before fixating the target element.
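The grouping and the three summary metrics can be sketched as follows. The BAC cutoffs are those given in the text; everything else is an assumption made for illustration, in particular the trial layout and the choice to average *fixations before* only over trials in which the element was fixated, which the text does not specify.

```python
from typing import List, Optional, Tuple


def bac_group(bac_percent: float) -> str:
    """Map a breathalyzer reading (in % BAC) to the three groups used in the text."""
    if bac_percent < 0:
        raise ValueError("BAC cannot be negative")
    if bac_percent == 0:
        return "sober"
    if bac_percent <= 0.02:
        return "intoxicated"
    return "heavily intoxicated"


# One trial = (total fixation time in ms, fixations before, or None if never fixated).
Trial = Tuple[float, Optional[int]]


def summarise(trials: List[Trial]) -> Tuple[float, float, Optional[float]]:
    """Return (fixation likelihood, mean total fixation time, mean fixations before)."""
    if not trials:
        raise ValueError("no trials to summarise")
    fixated = [t for t in trials if t[1] is not None]
    likelihood = len(fixated) / len(trials)
    # Unconditional mean: non-fixated trials contribute zero viewing time.
    mean_total = sum(t[0] for t in trials) / len(trials)
    # Assumption: fixations before is averaged over fixated trials only.
    mean_before = sum(t[1] for t in fixated) / len(fixated) if fixated else None
    return likelihood, mean_total, mean_before


# Made-up trials for one ad element: two fixated, two not fixated.
example_trials = [(500.0, 2), (0.0, None), (300.0, 4), (0.0, None)]
print(bac_group(0.056), summarise(example_trials))
```

Because non-fixated trials count as zero, total fixation time reflects both how often an element is looked at and for how long, so it will fall whenever fixation likelihood falls.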

#### **DISCUSSION**

The aim of the first study was to assess how visual attention to advertisements may be affected by alcohol intoxication. Based on established psychopharmacological findings, we hypothesized that the salience of the perceptual elements in complex advertisements would be selectively increased under conditions of alcohol intoxication, whereas the processing of conceptual information would be impaired. The results support our hypothesis, but in a more specific manner than originally expected: the selective increase in visual salience was only observed for logos (either brand or corporate) but not for other pictorial elements such as representations of products or human models. Additionally, we found the increased salience of logos was reflected in fixations before but not in time to first fixation. The only difference between the two metrics is that time to first fixation takes the duration of individual fixations into account. Finding an effect for fixations before but not for time to first fixation therefore suggests that, although intoxicated participants made fewer fixations before fixating the logo, their fixations were longer than those of sober participants. This interpretation seems consistent with the general finding that alcohol intoxication slows down cognitive processing.

**Table 2 | Fixation likelihood, total fixation time and fixations before to the four ad elements according to three levels of intoxication: sober (BAC = 0%), intoxicated (BAC ≤ 0.02%), heavily intoxicated (BAC ≤ 0.169%).**

The results suggest that "reminder" advertisements, primarily intended to increase the accessibility of the brand in the mind of the customer, will be effective in environments that involve the consumption of alcohol. Advertisements that intend to persuade, on the other hand, are likely to suffer.

Although the results confirm and extend psychopharmacological findings in the area of advertisement perception the interpretation may be limited due to the use of forced exposure to advertising stimuli. In Study 2 we address this issue by employing a voluntary exposure paradigm in which participants voluntarily fixate the advertising stimuli.

#### **STUDY 2**

Study 2 examined the effect of alcohol intoxication on the distribution of attention to ads and ad elements as well as brand recall in a voluntary ad exposure paradigm. Whereas Study 1 used a forced exposure paradigm, Study 2 employed a procedure that more realistically simulated voluntary attention to ads in real life situations. The ads were embedded in a consumer magazine consistent with previous eye tracking research on voluntary attention to ads (Pieters et al., 2002, 2007; Pieters and Wedel, 2004). Such a procedure minimizes demand characteristics and furthermore allows assessment of whether alcohol intoxication has an influence on overall attention to ads. Study 2 used the same experimental design as Study 1 except that all experimental stimuli were embedded in a magazine.

#### **METHODS**

#### *Participants*

Thirty-six undergraduate and graduate business students with specializations other than marketing or corporate communications were recruited on campus for participation in the experiment. The mean age was 22.97 years (SD = 1.73); 44.5% were female.

#### *Experimental design*

The experimental design was identical to that in Study 1 manipulating the conceptual and perceptual load of ads for 12 different brands. The 36 experimental ads were blocked in three groups and inserted in a consumer magazine.

#### *Materials and measures*

The experimental stimuli consisted of 36 ads identical to those in Study 1. The ads were embedded in a consumer magazine with an even distribution of ad compositions and in a fixed order, resulting in three different versions of the magazine. Each ad occupied an entire page in the magazine. As in Study 1, measures of BAC were obtained using a digital breathalyzer. BAC levels ranged from 0 to 0.169% (heavily intoxicated) with a mean level of 0.055% (SD = 0.053). Eye movements were recorded on the same eye tracker as in Study 1 and identical eye movement metrics were extracted from the log. Additionally, measures of brand recall were obtained from participants using a cued recall procedure. Brand recall was measured one day after the laboratory test.

#### *Procedure*

All participants were recruited in the school's student club in the late afternoon and evening hours. Participants were accompanied to the lab facilities by the experimenter. Before the experiment started, each participant's BAC was measured using a digital breathalyzer. Participants were positioned in front of the eye tracker and after the individual calibration of the eye tracker, the participants were randomly assigned to one of the three experimental blocks. Participants were informed that they could browse through the magazine at their own pace and the test ended when the participants reached the final page in the magazine. The magazine was presented one page at a time with each ad occupying an entire page. Participants were not informed about the purpose of the experiment. After the experiment participants were thanked and accompanied back to the student club. The day after the eye tracking study each participant received a questionnaire measuring cued brand recall. Sixty-four percent of the participants replied to the brand recall questionnaire.

#### **RESULTS**

The first step in the analysis addressed the question of whether alcohol intoxication had any effect on advertising attention capture (whether the ad or ad element was fixated or not). The analysis was carried out by means of a generalized estimating equation with a logit link function and a binomial response distribution using attention capture as dependent variable and ad element (logo, headline, text, image, entire ad), ad version (perceptual load, conceptual load, mixed conceptual and perceptual load), and BAC as independent variables in a full factorial design. The results are reported in **Table 3** below.

In order to interpret the results we extracted descriptive statistics for the effect of alcohol intoxication on attention capture to ad elements. Similar to Study 1, participants were grouped into three levels of alcohol intoxication: sober (BAC = 0%), intoxicated (BAC > 0% and ≤0.02%), and heavily intoxicated (BAC > 0.02%). The descriptive statistics are shown in **Table 4**.

It is clear from **Table 4** that alcohol intoxication has a negative effect on attention capture for all ad elements including the ad itself. The decrement in attention capture is particularly strong for logos, which is surprising given the results of Study 1.

In the second step of the analysis we examined the effect of alcohol intoxication on fixation count to ad elements. Fixation count is the number of times the participant fixates on a stimulus and can be used as an indicator for the strength of interest in a stimulus or as an indicator of confusion. The analysis was carried out by means of a linear mixed model using fixation count as dependent variable and ad element, ad version, brand, and BAC as independent variables. The analysis revealed a significant effect of ad version, *F*(2,875.55) = 5.42, *p* < 0.01, a significant effect of brand, *F*(11,875.73) = 3.44, *p* < 0.01, a significant effect of ad element, *F*(3,880.09) = 31.08, *p* < 0.01, and a significant interaction effect between BAC and ad element, *F*(3,881.79) = 2.88, *p* < 0.05. The interaction effect between BAC and ad element is illustrated in **Figure 2**.

It is clear from **Figure 2** that alcohol intoxication does not have any effects on fixation count except for the text element for which fixation count increases as a function of BAC. However, the effect of alcohol intoxication on fixation count to the text element does not necessarily mean that intoxicated participants read more of the ad copy relative to sober participants. An alternative and perhaps more plausible interpretation is that alcohol intoxication has a detrimental effect on reading abilities which necessitates more fixations to process the same amount of text.

In the last step of the analysis we examined the effect of alcohol intoxication on brand recall. The analysis was carried out by means of a generalized estimating equation with a logit link function and a binomial response distribution using ad recall as dependent variable and BAC and ad attention capture as independent variables. The results are reported in **Table 5**.

The analysis revealed that alcohol intoxication has a significant negative effect on brand recall (Model 1), although the effect diminishes when controlling for ad attention capture (Model 2). Having shown in step 1 of the analysis that alcohol intoxication has a significant negative effect on attention capture for the entire ad, it should be clear that the negative effect of alcohol intoxication on brand recall is partially mediated by attention capture. In other words, alcohol intoxication significantly diminishes brand recall, but only part of this effect is due to memory loss; another part is due to reduced ad attention capture.



**Table 4 | Fixation likelihood, total fixation time and fixations before to the four ad elements and the entire ad according to three levels of intoxication: sober (BAC = 0%), intoxicated (BAC > 0% and ≤ 0.02%), heavily intoxicated (BAC > 0.02%).**


In more popular terms consuming one or two beers diminishes the probability of brand recall from 40 to 36% while being heavily intoxicated further diminishes the probability of brand recall to 17%.

#### **DISCUSSION**

Study 2 examined the effects of alcohol intoxication on attention to advertising under a voluntary exposure paradigm which, compared to Study 1, more realistically simulates real world ad exposure. The results suggest that alcohol intoxication changes attention to ads in several ways. First of all, alcohol intoxication lowers the likelihood of participants fixating on the ad and particularly the logo. This result is in stark contrast to Study 1, which showed an increase in the salience of logos as a result of alcohol intoxication. Since the experimental stimuli were identical for the two studies, this can only mean that alcohol intoxication leads to remarkably different effects on attention under a voluntary versus a forced exposure paradigm. Another interesting finding in Study 2 was that text elements received more fixations under alcohol intoxication. This, however, does not mean that intoxicated participants read more of the text than sober participants. More likely, intoxicated participants need additional fixations to process the same amount of text due to impairments in conceptual processing. Most importantly, Study 2 showed that alcohol intoxication has a strong negative effect on brand recall even when controlling for ad attention capture. The analysis also revealed that the detrimental effect of alcohol intoxication on brand recall is partially mediated by the diminished attention capture.

**Table 5 | Effect of alcohol intoxication on brand recall controlling for ad attention capture.**

#### **GENERAL DISCUSSION**

#### **SUMMARY OF FINDINGS**

Study 1 found that alcohol intoxication leads to a significant increase in the visual salience of logos compared to other ad elements. The result is in line with psychopharmacological findings on the effect of alcohol intoxication on cognitive processing. The increased salience of logos occurred regardless of the ad composition and did not affect other perceptual ad elements like pictorial representations. However, these results were obtained in a forced exposure paradigm and we decided to conduct a second experiment to assess the effect of alcohol intoxication in a more realistic voluntary exposure paradigm.

Study 2 examined attention to ads in a voluntary exposure paradigm and found that alcohol intoxication had a negative effect on fixation likelihood for the ad as a whole and for each individual ad element. While diminishing the overall attention capture for all ad elements, alcohol intoxication increased the number of fixations to the text element, which could suggest that intoxicated participants needed more fixations in order to process the text. Furthermore, alcohol intoxication had a strong negative impact on subsequent brand recall: a high degree of alcohol intoxication diminishes brand recall by more than 50%.

Interestingly, the two studies demonstrate that alcohol intoxication can lead to dramatically different outcomes depending on whether ad exposure is forced or voluntary. In the forced exposure condition, alcohol intoxication increases the salience of the logos, which is generally beneficial for the advertiser. However, in the voluntary exposure condition, alcohol intoxication leads to a significant decrease in the overall attention to ad elements, which is detrimental to the effectiveness of advertising.

#### **MANAGERIAL AND POLICY IMPLICATIONS**

From a managerial perspective, our results lead us to the conclusion that advertising on licensed premises should restrict itself to the use of reminder ads intended to increase the accessibility of the brand or product. The main reason for this suggestion is that alcohol intoxication impairs conceptual processing of ads, which limits the probability of persuasion through central route argumentation (Petty and Cacioppo, 1986). It also appears that alcohol intoxication increases the visual salience of logos, but only under forced exposure. Under a voluntary exposure paradigm with many visual distractors (in this case magazine articles), alcohol intoxication actually diminishes the likelihood of fixating the ad and the logo. One consideration for advertising on licensed premises would therefore be the degree to which one can control distractors in the environment. One clever strategy which has become popular in many bars is placing ads directly above urinals which, one could argue, is as close as one can get to forced exposure.

Another consideration is that the advertised product should be for immediate consumption, since brand recall will diminish considerably as a function of alcohol intoxication. Using advertising on licensed premises for consumer learning of, for instance, new products would therefore have to take into account the extra expenditure needed to reach the same degree of consumer learning.

From an ethical perspective, the present research solves one issue but raises another. Importantly, there were no indications that alcohol intoxication led to extra influences of advertising on consumers. On the contrary, alcohol intoxication was found to impair conceptual processing of ads as well as recall for the advertised brands, which necessarily lowers the effects of persuasion attempts. On the other hand, it was demonstrated that alcohol intoxication under some conditions increases the visual salience of logos, which could be used for increasing the accessibility of products for immediate consumption. This could be problematic if advertising on licensed premises for products like alcohol or cigarettes leads to an increased consumption of these products, but the enhanced impact idea could also be used for advertising of cab services or protection against sexually transmitted diseases. The issue is particularly important since other studies have suggested that intoxicated people respond more strongly than sober people both to irresponsible short-term incentives and to more prudent long-term goals (MacDonald et al., 2000).

#### **LIMITATIONS AND FUTURE RESEARCH**

One of the main limitations to our studies is the fact that all data collection took place in a lab environment. This naturally limits the external validity of the results and an important step for future research would therefore be to study attention to ads in more natural environments.

Another important limitation in both studies stems from the decision to measure rather than manipulate the BAC. This decision was based on ethical considerations. However, in choosing this approach we had to contend with the fact that participants were not randomly assigned to experimental conditions. It is easy to imagine that some participants are more likely to engage in alcohol consumption and that this tendency could be correlated with other traits that could have influenced the experimental results. Furthermore, because participants were recruited in a student club, it was impossible to control for exposure to nicotine, which has been shown to influence attention (Bekker et al., 2005). Future research should ideally take these issues into consideration by designing experiments that aim for high external validity, using methods such as mobile eye tracking, while avoiding threats to internal validity such as non-random assignment of participants to experimental conditions and lack of control over exposure to other stimulants.

#### **AUTHOR NOTE**

Parts of this article are taken from abstracts presented at the 39th EMAC Conference (Jeppesen and Scholderer, 2010) and the APA Convention (Orquin et al., 2013b).

#### **REFERENCES**


Wedel, M., and Pieters, R. (2007). A review of eye-tracking research in marketing. *Rev. Mark. Res.* 4, 123–147.

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 11 November 2013; accepted: 24 February 2014; published online: 25 March 2014.*

*Citation: Orquin JL, Jeppesen HB, Scholderer J and Haugtvedt C (2014) Attention to advertising and memory for brands under alcohol intoxication. Front. Psychol. 5:212. doi: 10.3389/fpsyg.2014.00212*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology. Copyright © 2014 Orquin, Jeppesen, Scholderer and Haugtvedt. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Narcissism and consumer behavior: a review and preliminary findings

#### *Sylwia Z. Cisek\*, Constantine Sedikides, Claire M. Hart, Hayward J. Godwin, Valerie Benson and Simon P. Liversedge*

*Centre for Research on Self and Identity, School of Psychology, University of Southampton, Southampton, UK*

#### *Edited by:*

*Jaana Simola, University of Helsinki, Finland*

#### *Reviewed by:*

*Ramesh Kumar Mishra, University of Hyderabad, India*

*Ilona Mikkonen, Aalto University School of Business, Finland*

#### *\*Correspondence:*

*Sylwia Z. Cisek, Centre for Research on Self and Identity, School of Psychology, University of Southampton, Southampton SO17 1SW, UK e-mail: s.z.cisek@soton.ac.uk*

We review the literature on the relation between narcissism and consumer behavior. Consumer behavior is sometimes guided by self-related motives (e.g., self-enhancement) rather than by rational economic considerations. Narcissism is a case in point. This personality trait reflects a self-centered, self-aggrandizing, dominant, and manipulative orientation. Narcissists are characterized by exhibitionism and vanity, and they see themselves as superior and entitled. To validate their grandiose self-image, narcissists purchase high-prestige products (i.e., luxurious, exclusive, flashy), show greater interest in the symbolic than utilitarian value of products, and distinguish themselves positively from others via their materialistic possessions. Our review lays the foundation for a novel methodological approach in which we explore how narcissism influences eye movement behavior during consumer decision-making. We conclude with a description of our experimental paradigm and report preliminary results. Our findings will provide insight into the mechanisms underlying narcissists' conspicuous purchases. They will also likely have implications for theories of personality, consumer behavior, marketing, advertising, and visual cognition.

**Keywords: narcissism, consumerism, eye movement, self-motives, symbolic goods**

"Some people think luxury is the opposite of poverty. It is not. It is the opposite of vulgarity."

— Coco Chanel

#### **INTRODUCTION**

The literature indicates that consumer behavior cannot be adequately described by rational models or bounded rationality models (Arnould and Thompson, 2005; Strack et al., 2006). We advocate a complementary approach based on the influence of self-relevant considerations on purchasing decisions. We focus on the construct of narcissism (a self-aggrandising and manipulative interpersonal orientation) and show that narcissism is related to materialism. We further argue that narcissists engage in conspicuous consumption, at least in part, as an attempt to address underlying insecurities. Narcissists' proclivity for products with symbolic value (e.g., luxurious, flashy) over products with instrumental value (e.g., practical, common-looking) may soothe such insecurities. We describe a novel line of research that addresses these issues and delineate plans to investigate them via eye-tracking methodology. Our plans will allow us to determine whether there are individual differences in the information processing behavior of narcissists relative to non-narcissists when choosing a potential product to purchase.

#### **CONSUMER DECISION-MAKING AND SELF-RELATED MOTIVES**

As long ago as 1890, William James stated that material possessions play an important role in defining the self, and may even become an extension of one's self. If this is so, we would expect self-relevant considerations (e.g., self-enhancement, self-esteem, self-consistency) to influence the acquisition of material goods. Traditional models of consumer behavior have paid mostly lip service to such motives. These models focused to a large degree on rational economic behavior, where consumers were depicted as engaging in deliberate cost-benefit analysis before deciding to purchase goods (Deaton and Muellbauer, 1980; see also Zinkhan, 1992). Consumers, according to these models, are driven mostly by utilitarian motives. As such, consumers purchase products that deliver what they promise to do (i.e., functionality), are easy to use (i.e., practicality), and are relatively inexpensive (i.e., affordability). In other words, consumers strive for "best value for money." If the choices they make are not always optimal, bounded rationality theory could come to the rescue. According to this theory, situational constraints (e.g., time pressure, restricted access to information, overabundance of information, and susceptibility to cognitive biases) limit the rationality of consumer decision making (see for example Strack et al., 2006). Nevertheless, from this theoretical standpoint, the decision to buy or not to buy is still considered a consequence of utilitarian concerns or economic considerations, even if such a decision is impaired to some degree by inaccuracy or error.

Traditional models of consumer behavior have been challenged and have receded to the background, as they were unable to adequately describe purchasing decisions and were largely unsupported by data. More recent research on consumer behavior has yielded evidence consistent with James' (1890) original proposals. This research has demonstrated that consumer attitudes towards products and consumer shopping behavior are not only determined by the products' functional features but also, and perhaps even more so, by their symbolic features (Sirgy, 1982; Belk, 1988; Kressman et al., 2006; Sedikides et al., 2007). People do purchase products for pragmatic reasons, that is, merely as a way to cope more effectively with the necessities of everyday life; however, people also purchase products as means to define and express themselves (Belk, 1985; Dittmar, 1992; Aaker, 1996), to regulate their moods or emotions (Dittmar and Drury, 2000; Dittmar, 2011), to bolster their self-esteem or gain social status (Banerjee and Duflo, 2007; Sedikides et al., 2007), and to fulfil their needs for self-efficacy or mastery (Dittmar, 2011). Thus, self-oriented considerations often underlie consumer decisions and behavior.

This self-oriented perspective on consumer behavior is gaining momentum. Dunning (2007a) argued that consumers are not merely rational or "cold" decision-makers, who evaluate and choose consumer alternatives according to standard economic principles. Rather, consumers are motivated agents, driven to make decisions consistent with their "sacrosanct beliefs," self-images, and self-motives. Consumers make purchases that allow them to view themselves (and be viewed by others) as competent, endearing, and honorable. This notion helps to explain many purchase decisions that would be difficult to comprehend from the point of view of traditional models of consumer behavior. Several theoretical formulations, backed by empirical evidence, concur with such a notion.

Self-congruity theory states that various products and brands are perceived as having certain "personality" traits that in turn reflect the traits of their users. According to this theory, consumers making their buying decisions attempt to choose brands that match their own self-image, be it actual or ideal. Self-congruity is guided mostly by two self-related motives: self-consistency and self-esteem (Sirgy, 1982; Malhotra, 1988; Kressman et al., 2006). The self-consistency motive drives individuals toward purchasing products that match the way in which they perceive themselves, that is, products that fit with their preferences, lifestyles, and dispositions. Buying products characterized by features that closely match consumers' desired identities reduces discrepancies between their ideal and actual self and thus boosts their self-esteem.

Complementing self-congruity theory, the purchasing and using of material goods is regarded by symbolic self-completion theory (Wicklund and Gollwitzer, 1982) as a compensation strategy. This theory postulates that, whenever self-discrepancies between the actual and ideal self are detected, or one's self-image is threatened, compensation motivation becomes activated. This motivation may lead to the acquisition of material possessions, which will either cover (at least momentarily) the detected self-inadequacies or soothe the acute self-threat. For example, people who aspire to master an activity (e.g., playing golf), but are not yet good at it, may purchase expensive equipment and gadgets related to this activity (e.g., golf clubs or clothing) in order to compensate for their undeveloped skills, and thereby promote a more favorable impression of themselves to others and perhaps to themselves.

Symbolic self-completion theory further proposes that individuals are more likely to become materialistic, if their self-concepts are uncertain or threatened (Wicklund and Gollwitzer, 1982). Materialism is defined as "the importance a consumer attaches to worldly possessions" (Belk, 1984, p. 291). The construct has been operationalized as holding the belief that acquiring conspicuous goods is a major route to success, esteem, and happiness (Richins and Dawson, 1992). The implication is that consumers chase prosperity and obtain material goods in an attempt to substantiate their self-definitions. The literature is consistent with this idea. Insecurity experienced as a result of death cognitions (i.e., fear of death; Kasser and Sheldon, 2000) or simply in dreams (Kasser and Kasser, 2001) is positively associated with materialistic attitudes. In addition, mere exposure to self-threatening words (e.g., doubtful, incompetent) activates materialistic attitudes (Chang and Arkin, 2002). Materialism has also been linked with lower self-esteem (Richins and Dawson, 1992; Kasser, 2002), and experimental inductions of low self-esteem or insecurity lead to a materialistic orientation (Braun and Wicklund, 1989; Chaplin and John, 2007). Loneliness also contributes to materialism. In a longitudinal study, loneliness was associated with an increase in materialism over time and, simultaneously, materialism was associated with an increase in loneliness over time (Pieters, 2013). Finally, materialism is associated with negative emotions, such as sadness: individuals who focus on their sadness spend higher amounts of money for the acquisition of material possessions (Cryder et al., 2008).

People may strategically use materialistic possessions to compensate for self-image impairment. In a study by Dong et al. (2013), participants used consumer goods to reinstate their positive identities or at least "hide their faces" so as to minimize negative emotions following embarrassing incidents. Participants were engaged symbolically in these coping tactics by choosing products that either literally covered their faces (e.g., sunglasses) or products that improved and restored their appearance (e.g., cosmetics). Furthermore, the tactic that participants employed had meaningful psychological consequences for their future behavior. Symbolically repairing one's face by choosing adequate products diminished their aversive emotions (e.g., embarrassment) and restored their willingness to interact with others, whereas symbolically hiding their face had little impact on their emotions and engagement in social behavior. Similarly, Sivanathan and Pettit (2010) demonstrated that individuals resort to high-status and conspicuous goods in an attempt to restore their threatened self-image, thus treating consumption as an indirect source of self-affirmation (Sherman and Cohen, 2006). These studies establish that individuals can strategically use material goods in the service of self-related considerations or motives.

The relation between the self and materialism may have become more prevalent in contemporary life with the rise and dominance of globalization (Dunning, 2007b). Western and Eastern societies alike value money, expensive acquisitions, and attractive features, and consider them valid indicators of success and happiness. Affluent life styles are associated with autonomy, control, high achievement, and desirable states of wellbeing. Furthermore, various products and services are commonly advertised as necessary bridges toward the positive self and as a *sine qua non* for blissful existence, free from self-worth anxieties and worries (Dittmar, 2011). In all, the influence of self-related motives on marketing decisions is probably on the rise. This said, of course, the theoretical and empirical landscape is bound to be nuanced. For example, the strength of the influence of self-related motives on consumer behavior is moderated by various personality traits, such as level and contingency of self-esteem or self-concept clarity (Dunning, 2007a,b). We propose yet another moderator: narcissism (Sedikides et al., 2007).

#### **NARCISSISM DEFINED**

We view narcissism (subclinical narcissism, to be exact) as a normally distributed personality trait. Narcissism is typically operationalized with the Narcissistic Personality Inventory (NPI; Raskin and Terry, 1988). For convenience, we will refer to persons scoring high on the NPI as narcissists and to those scoring low on the NPI as non-narcissists.

We define narcissism as an agentic, egocentric, self-aggrandizing, dominant, and manipulative orientation (Emmons, 1987; Sedikides et al., 2004). Narcissists have highly inflated and unrealistically positive self-views and feel entitled. They also lack regard for others, showing a diminished interest in affiliation, communal values, and pro-social behavior (Campbell and Foster, 2007; Cisek et al., 2008). Indeed, narcissism is positively associated with antagonism, aggression, and hostility towards others (especially outperforming or critical others), and is negatively associated with agreeableness, empathy, and intimacy (Sedikides et al., 2002; Morf et al., 2011; Hepper et al., in press).

Narcissists are addicted to self-esteem (Baumeister and Vohs, 2001) and to striving towards self-enhancement (Sedikides and Gregg, 2001). They manifest exhibitionism, vanity, and a relentless need to validate their overly favorable self-beliefs in the presence of others (Wallace and Baumeister, 2002). They are also status-oriented and power-driven (Bradlee and Emmons, 1992; Horton and Sedikides, 2009). In order to maintain their excessively positive self-views, narcissists rely on several self-regulatory strategies. They engage in grandiose self-displays (e.g., boasting), flaunt their material possessions, and associate with high-status others (Buss and Chiodo, 1991; Campbell, 1999).

#### **NARCISSISM, MATERIALISM, AND CONSPICUOUS CONSUMPTION**

Narcissists display their affluence as a self-presentational tactic. After all, carefully chosen materialistic possessions can symbolize an individual's traits, skills, preferences, values, and personal goals, thus differentiating them from others and portraying them as unique and special. Material possessions constitute a rich source of information about others' identity (Burroughs et al., 1991), and may successfully (and to some degree accurately) express one's actual and ideal selves. This is so, because material goods convey clues that can be used to make inferences about the owners of these goods. For example, affluent individuals are judged as capable (e.g., intelligent, competent) and sophisticated (e.g., cultured, knowledgeable; Christopher and Schlenker, 2000). Such attributes are highly valued by narcissists, given their agentic inclinations (Sedikides et al., 2002). Affluent individuals are at the same time perceived as relatively inconsiderate (e.g., unkind, unhelpful, unlikable; Christopher and Schlenker, 2000), but this is a price that narcissists are willing to pay given their lack of communal proclivities (Sedikides et al., 2002). Narcissists' self-worth is contingent on the admiration that they receive from others rather than on building long-lasting relational bonds or on gaining genuine social approval (i.e., respect). Consistent with this reasoning, Lee et al. (2013) demonstrated that narcissists' consumer decisions are guided by their need to distinguish themselves positively from others. They do so by purchasing goods that are scarce, unique, exclusive, and customizable. They perceive acquisition of such goods as an opportunity to validate, sustain, and elevate their positive self-image.

Taking all this into account, it is perhaps unsurprising that narcissism is associated with materialism (Sedikides et al., 2011) and proneness to compulsive buying (Rose, 2007). Narcissists openly report more interest in pursuing wealth and social status than in pursuing affiliation and communal endeavors (Kasser and Ryan, 1996). They desire material possessions (Cohen and Cohen, 1996) and have high economic aspirations (Roberts and Robins, 2000), prioritizing financial goals, such as attaining a prestigious and well-paid job or securing high standards of living, over social goals, such as helping or teaching others. Similarly to materialistic individuals, they are prone to purchasing high-status and expensive products, which are likely to signal status and sophistication (Richins, 1994). Thus, many features of narcissism mirror materialism, suggesting that both narcissists and materialists engage in conspicuous consumption in an effort to boost their status, self-protect, or derive self-esteem from the responses of admiring others.

We have reviewed literature showing that narcissism is linked with consumer behavior. Conversely, it has been suggested that the emphasis on consumerism prevalent in contemporary society sparks increases in narcissism. Lasch (1991) labeled contemporary society as the "culture of narcissism." He argued that the culture is characterized by an "entitlement mentality" that comprises an unjustified sense of privilege. This, in turn, pushes consumers further into a vicious circle of excessive consumption that feeds their self-images. Indeed, there is evidence that levels of narcissism are on the rise both in Western culture (Twenge et al., 2008) and in Eastern (i.e., Chinese) culture (Cai et al., 2012).

As narcissists are predominantly concerned with agentic (rather than communal) traits and goals, their materialism may further strengthen this orientation. Focusing on money enhances individuals' self-sufficiency, but diminishes their communal motives and pro-sociality. In particular, participants primed with money (vs. those in a control condition) are less likely to ask for help and to help others, prefer to work and play alone, and keep a larger physical distance between themselves and others (Vohs et al., 2006). Also, cues that prompt materialism inhibit social values and increase competitiveness (Bauer et al., 2012). Moreover, individuals preoccupied with money are egocentric (Belk, 1985) and often feel alienated and disconnected from others (Kasser, 2002; Pieters, 2013). In summary, materialism may exacerbate the signature narcissistic characteristics: self-sufficiency or autonomy, egocentricity, competitiveness, unwillingness to help, and poor interpersonal relationships.

#### **COMPENSATING FOR EGO FRAGILITY**

One reason why individuals embrace materialism and engage in consumption of prestigious brands may have to do with their attempts to override their inadequacy and to compensate for doubt surrounding their self-worth (Wicklund and Gollwitzer, 1982; Chang and Arkin, 2002). Narcissists routinely present themselves as successful and confident, if not arrogant. However, this direct and bold approach may be partially a disguise for their brittle, fragile selves (Gregg and Sedikides, 2010; Cheng et al., 2013). This inner fragility is often attributed to inadequate parental practices such as excessive adoration or copious neglect (Kernberg, 1975; Kohut, 1976; Horton et al., 2006; Otway and Vignoles, 2006).

There is empirical support for these theoretical proposals. First, narcissists' affect and level of self-esteem fluctuate more than those of non-narcissists (Bogart et al., 2004; Zuckerman and O'Loughlin, 2009; Zeigler-Hill et al., 2010). Secondly, narcissists, in comparison to non-narcissists, have lower implicit self-esteem (Jordan et al., 2003; Zeigler-Hill, 2006; Gregg and Sedikides, 2010). Thus, narcissists' consumer behavior may be seen as a coping strategy to compensate for their self-doubts and insecurities. Through their consumer behavior, they project the positive attributes of their purchases onto their private and public self-image. In particular, they show a distinct consumer behavior pattern, where they accumulate and display material possessions (e.g., flashy, highly fashionable clothes, top-range cars, expensive watches) that bear prestigious labels (Sedikides et al., 2011). On the basis of flimsy evidence, then, we would surmise that Coco Chanel was endorsing a narcissistic consumer behavior pattern in our opening quote.

If the main force guiding narcissists' conspicuous consumption is rooted in their self-related motives, and specifically serves to fulfill their self-presentational goals, narcissists should be focused on symbolic rather than on instrumental aspects of acquired or potential purchases. Sedikides et al. (2007) hypothesized that narcissists should demonstrate this preference in their consumer choices by selecting ostentatious, fashionable, and flashy versions of a given product over affordable, practical, and modest-looking versions. To test this hypothesis, Cisek et al. (2011) conducted a vignette study in which participants were presented with pictorial and descriptive examples of two alternatives for seven different products (e.g., mobile phones, MP3 players, laptops). One example represented a symbolic choice reflecting superior attractiveness but inferior practicality. The other example represented an instrumental choice reflecting superior practicality but inferior attractiveness. Participants were instructed to indicate their consumer choice for each of these products. The obtained results confirmed that narcissism significantly and positively predicted the number of symbolic products chosen. Thus, narcissists were more concerned with the products' expressive properties than with their practical attributes. They selected new and impressive-looking items over reliable and practical ones. Additionally, mediational analyses revealed that this orientation toward symbolic products was accounted for by narcissists' materialism and self-esteem.

However, although these findings contribute to the emerging literature, a number of important questions remain unanswered. For example, do narcissists employ deliberate thinking in choosing symbolic products in spite of their practical shortcomings? That is, do narcissists make well-informed decisions, consciously processing all available information and sacrificing instrumentalism for the sake of the expressive properties of their choices? Alternatively, do narcissists make rapid decisions, choosing a product that they believe is better on most of the dimensions? This second possibility could be linked to inadequate information processing and could be reflected in an over-reliance upon pictorial clues, which in turn could be related to narcissists' high levels of impulsivity. It has been documented that impulsive buying is related to materialism (Yurchisin and Johnson, 2004). Evidence also points to a positive relation between narcissism and impulsivity (Vazire and Funder, 2006), although in our own research impulsivity was not found to mediate the link between narcissism and preference for symbolic products (Cisek et al., 2011).

To shed additional light on how information is processed during consumer decision-making, we will measure the amount of time that narcissists and non-narcissists spend looking at the symbolic and instrumental pictures and at written descriptions of potential purchases. We will use objective measures of attention, that is, eye movement data collected via an eye tracker whilst participants complete the task.

#### **EYE MOVEMENT RESEARCH IN ADVERTISEMENT AND CONSUMER BEHAVIOR**

Eye tracking methodology provides an objective record of participants' eye movements as information is processed on-line. Eye movements are one of the most frequent of all human movements (Bridgeman, 1992): on average, individuals make 3–4 eye movements, known as saccades, per second. These movements are essential to relocate the high acuity area of the retina, the fovea, such that light from new locations in the visual field falls upon it and new objects or areas of objects can be inspected in detail. Hence, eye movements are necessary for efficient operation of the visual system, and there is a close relation between eye movements and ongoing information processing. Eye movement behavior has also been utilized to explore visual cognition in a wide variety of tasks and environments, including mental imagery, memory, language comprehension, reading, reasoning and decision making (Rayner, 1998, 2009a,b; Liversedge and Findlay, 2000). Eye movements can also provide insight into psychiatry, neuroscience, ergonomics, and marketing. In recent years, the increased sophistication and accessibility of eye tracking technologies have led to a substantial number and broad variety of studies that employ eye tracking methods. Applied research has included package design, website design, advertising, computer game design, automotive engineering, evaluating skills of radar operators, pilots, security officers at airports, and many others (Richardson and Spivey, 2004). There is a growing body of research using eye tracking methods to analyze marketing decisions and advertisement perception (Wedel and Pieters, 2000; Rayner et al., 2001, 2008; Radach et al., 2003).

Eye movements are driven both by the characteristics of the task and by the properties of visual stimuli. For example, a classic study by Yarbus (1967) showed that the nature of the task influences individuals' eye movements such that different patterns of saccades and fixations (where the eyes remain stable to process information) are observed when the same picture or scene is viewed under different inspection instructions (see also Hayhoe and Ballard, 2005; Underwood and Foulsham, 2006). Also, participants' fixations center on informative or interesting areas of the image, whereas blank or uniform regions are often left uninspected (Wooding, 2002; Richardson and Spivey, 2004). At the same time, perception is influenced by memories, beliefs, attitudes, emotions, expectations, and other forms of top-down knowledge that viewers "bring" to the given image. However, advertisement research has addressed not only the visual patterns of examining pictorial information, but also the reading of text that often accompanies and forms an important part of such adverts. How long an observer spends initially fixating an area or object (gaze duration), how quickly a saccade is initiated to a stimulus (saccade latency), and the occurrence of revisits or regressions (saccades that return to previously fixated areas) can be analyzed to provide understanding of the online cognitive processing of pictorial information and text reading (Just and Carpenter, 1980; see also Rayner, 1998). In summary, eye movement tracking can provide data that will help to identify factors that determine the amount of attention allocated to different types and elements of adverts. This is critical especially in light of research showing that gaze durations to different advertisements correlate positively with subsequent product choices (Treistman and Gregg, 1979; Lohse, 1997). Put simply, viewers tend to spend more total time fixating the product that they go on to choose.

Surprisingly, although there is a substantial body of research on eye movements during reading processes and during examination of a scene or an image, there are relatively few data referring to integration of text and pictorial information, and these data are not always conclusive or free from ambiguity. Rayner et al. (2001) showed that advertisement viewers very quickly fixate the text of an advertisement. They also spend more time reading the text of the advertisement than inspecting pictorial information, especially if the ads are relevant to their goals (e.g., buying a given product). However, Radach et al. (2003) reported different findings. They observed that participants made repeated fixations back and forth between the different elements of an advertisement (that is, headline, text, and picture) during inspection. More recently, and in contrast to the study from 2001, Rayner et al. (2008) demonstrated that participants who were asked to assess the quality of advertisements spent more time analyzing pictorial parts of the adverts than the accompanying text, and fixated the images first within a trial, and only after a relatively long time inspecting these did they move their eyes to the linguistic information provided. This difference in visual examination patterns has been accounted for by differences in the nature of the task (that is, the active goal of the viewer) and the type of the advertisements. Rayner et al. (2008) argued that participants in the first study were instructed to make a consumer decision, that is, to decide whether to buy a particular product, and in that case they paid more attention to linguistic than pictorial descriptions of the products. Participants in the two other studies (Radach et al., 2003; Rayner et al., 2008) were asked to assess the attractiveness or effectiveness of advertisements, and here they generally showed an inverse pattern of visual investigation.

However, although the given instructions influenced how long participants fixated different aspects of the advertisements, the advertisements themselves also influenced the patterns of eye movements. These results render conclusions less firm and in need of further investigation.

#### **PROPOSED EYE MOVEMENT STUDY TO EXAMINE NARCISSISTIC CONSUMER CHOICES**

To better understand information processing and the influence of self-related motives on consumer decisions, we will examine the amount of time that narcissists and non-narcissists spend looking at various forms of information related to their potential purchases. Our study aims to investigate consumer choices made by narcissists and non-narcissists and whether possible differences in such choices are accompanied by differences in online cognitive processing of the information, as indexed by eye movement patterns. Our primary goal is to find out whether narcissists (in contrast to non-narcissists) spend more time fixating pictorial rather than linguistic information about available product alternatives, and whether they spend more time fixating symbolic rather than utilitarian products. Participants in this study will be asked to make consumer decisions as if they intended to buy given products.

Participants will be undergraduate students of the University of Southampton. At the beginning of each academic year, Psychology undergraduate students complete a range of personality questionnaires, including measures of narcissism (NPI; Raskin and Terry, 1988) as well as self-esteem (Rosenberg Self-Esteem Scale; Rosenberg, 1965), and materialism (Richins and Dawson, 1992).

Participants scoring either in the top third or bottom third of the NPI will be invited to take part in the study. Firstly, they will be requested to complete several extra personality scales, such as the Pathological Narcissism Inventory (Pincus et al., 2009), and Hedonistic Consumption Scale and Impulsive Trait Scale (Hausman, 2000). This will allow us to control for extraneous variables when analyzing narcissistic patterns of consumer behavior.
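The selection of the top and bottom thirds of the NPI distribution can be illustrated with a minimal sketch. The function name, the participant identifiers, and the treatment of scores that fall exactly on a cut point are assumptions made for illustration; the text does not specify how the cut points will be computed.

```python
import statistics


def select_extreme_thirds(npi_scores):
    """Return (low_ids, high_ids) for the bottom and top third of NPI scores.

    npi_scores maps a (hypothetical) participant id to an NPI total score.
    Assumption: cut points come from statistics.quantiles with n=3, and
    scores equal to a cut point are included in the extreme group.
    """
    # Two cut points that divide the score distribution into thirds
    low_cut, high_cut = statistics.quantiles(list(npi_scores.values()), n=3)
    low_ids = sorted(pid for pid, s in npi_scores.items() if s <= low_cut)
    high_ids = sorted(pid for pid, s in npi_scores.items() if s >= high_cut)
    return low_ids, high_ids


# Illustrative scores only, not data from the study
example_scores = {f"p{i}": i for i in range(1, 10)}
low_group, high_group = select_extreme_thirds(example_scores)
```

Participants in the middle third are simply left out of both groups, which matches the extreme-groups recruitment described above.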

Subsequently, participants will take part in an eye movement experiment. A computer screen will display the stimulus material. On each trial, two examples of a given product will be presented; one symbolic version and one instrumental version of the same product (with each product featuring a pictorial and linguistic description). By "symbolic," we mean visually attractive, flashy, and fashionable items with inferior practicality. Instrumental products will be described as practical and economical, but will be much less visually appealing. (The differences in perceptions of visual attractiveness as well as practicality and usefulness of the two types of products have been established in extensive pilot-testing.) There will be seventeen trials in total, each depicting a different product, such as a sofa, toaster, laptop, watch, coffee machine. Three of the seventeen products (digital book reader, roller-case, and cross-trainer) will have one alternative that is superior on both dimensions: attractiveness and practicality. These products will be introduced to disturb the pattern of one product always being less attractive but more practical than the other one, and to minimize the chance of participants guessing the purpose of the experiment. The arrangement of the examples on the trial display screen will vary such that the pictures will either appear above or below the written descriptions, and the symbolic and instrumental products will appear either on the left or the right of the screen (counterbalanced). The order of products will be randomized. Participants will be instructed to look at the pictures, read the descriptions carefully, and choose which product they would most likely buy. The eye tracker will record every eye movement and fixation made during inspection for each trial. Analyses of the eye movements will allow us to determine which elements (picture/description of symbolic/instrumental product) are fixated initially, providing an index of which type of element is of the highest relevance to the participants completing the task. Furthermore, the eye movement analyses will also enable us to explore the amount of time spent fixating, the number of fixations, and the fixation durations upon the different elements within the display, thereby allowing us to investigate the depth, extent, and time course of information processing on the different elements as a function of narcissism. We will use regression analyses of different behavioral and eye movement measures in order to test whether narcissism predicts a stronger preference for symbolic products.
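One way to build a randomized trial order with counterbalanced layouts is sketched below. This is an illustration under stated assumptions, not the authors' implementation: the function name, the dictionary keys, and the choice to cycle through the four layout combinations within a single participant are all hypothetical. Only products named in the text are used in the example.

```python
import itertools
import random


def build_trials(products, seed=None):
    """Return a randomized trial list with layouts spread evenly across trials.

    Each trial records whether the picture sits above or below the text and
    whether the symbolic alternative is on the left or right of the screen.
    """
    rng = random.Random(seed)
    # The four layout combinations: picture position x side of symbolic product
    layouts = list(itertools.product(("above", "below"), ("left", "right")))
    rng.shuffle(layouts)
    order = list(products)
    rng.shuffle(order)  # randomize the order of products
    trials = []
    # Cycling through the layouts keeps their counts as equal as possible
    for product, (picture_pos, symbolic_side) in zip(order, itertools.cycle(layouts)):
        trials.append({
            "product": product,
            "picture_position": picture_pos,
            "symbolic_side": symbolic_side,
        })
    return trials


named_products = ["sofa", "toaster", "laptop", "watch", "coffee machine",
                  "digital book reader", "roller-case", "cross-trainer"]
trial_list = build_trials(named_products, seed=1)
```

With seventeen products the four layouts cannot occur equally often, so one layout would appear once more than the others under this scheme.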

We hypothesize that narcissists will demonstrate a predilection for symbolic products: they will make more symbolic choices than non-narcissists. Given the relevance of symbolic products for narcissistic self-esteem, we also hypothesize that narcissists will fixate such products for longer, and that generally they will fixate images for longer than the text. Given that participants will be instructed to make their consumer decisions as if they were actually to buy the products, in line with results of Rayner et al. (2001), we expect that non-narcissists will spend longer fixating text than pictorial information.

#### **PILOT STUDY**

We conducted a short pilot study following the procedure described above but without measuring any individual differences, as the goal of this study was to validate the paradigm for use with narcissists at a future date. Seven individuals (students or visitors from the University of Southampton, six females and one male) took part as volunteers without compensation. Due to an intermittent hardware error, the experiment terminated before all trials had been completed for two participants, and it "looped" resulting in additional trials for one additional participant. Given that this was a pilot study, and that these participants' data patterned in a similar manner to the data of the remaining participants, we included data from all seven participants in the analyses we report below.

Analyses included total fixation time (separately for the utilitarian and symbolic alternatives), the number and average duration of fixations, and the probability of the first fixation falling in a given region.
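As a rough illustration of how three of these measures could be derived from a fixation report, consider the sketch below. The record format (trial id, region label, duration in milliseconds), the region names, and the function name are assumptions for illustration; actual eye tracker output and the authors' analysis pipeline may differ.

```python
from collections import defaultdict


def summarize_fixations(fixations):
    """Summarize fixations per region.

    fixations: list of (trial_id, region, duration_ms) tuples in the order
    in which the fixations occurred (hypothetical record format).
    Returns, per region, the total fixation time, the number of fixations,
    and the proportion of trials whose first fixation fell in that region.
    """
    total_time = defaultdict(float)
    n_fixations = defaultdict(int)
    first_region = {}
    for trial_id, region, duration_ms in fixations:
        total_time[region] += duration_ms
        n_fixations[region] += 1
        # setdefault keeps only the first region seen for each trial
        first_region.setdefault(trial_id, region)
    n_trials = len(first_region)
    summary = {}
    for region in total_time:
        firsts = sum(1 for r in first_region.values() if r == region)
        summary[region] = {
            "total_time": total_time[region],
            "n_fixations": n_fixations[region],
            "p_first": firsts / n_trials,
        }
    return summary


# Illustrative records only, not data from the pilot study
example = [
    (1, "symbolic_picture", 200), (1, "symbolic_text", 180),
    (1, "symbolic_text", 220), (2, "utilitarian_text", 210),
    (2, "symbolic_picture", 250),
]
result = summarize_fixations(example)
```

Regions that are never fixated do not appear in the output, so a full analysis would need to fill in zeros for them before averaging.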

The pattern of results replicates that obtained by Rayner et al. (2001) (see **Table 1**). A 2 (type of information: text vs. picture) × 2 (type of product: symbolic vs. utilitarian) repeated-measures ANOVA revealed a statistically significant effect of type of information for the total fixation time, *F*(1,6) = 13.44, *p* = 0.01, η<sub>p</sub><sup>2</sup> = 0.69. Participants spent more time fixating on the text (*M* = 5258.24, SE = 1210.91) than on the pictorial representation of products (*M* = 1109.07, SE = 238.40). The main effect of type of product and the interaction were not significant, *F*(1,6) = 2.61, *p* = 0.16, and *F*(1,6) = 0.89, *p* = 0.38, respectively.
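The reported effect size can be cross-checked against the test statistic using the standard identity that partial eta squared equals (F × df effect) / (F × df effect + df error). The short sketch below applies it to the main effect of type of information; the function name is ours and is used only for illustration.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Main effect of type of information on total fixation time: F(1,6) = 13.44
eta_p2 = partial_eta_squared(13.44, 1, 6)
print(round(eta_p2, 2))  # 0.69
```

The recovered value agrees with the reported effect size of 0.69.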

In addition, and consistent with the above result patterns, we obtained a significant main effect of type of information for the number of fixations, *F*(1,6) = 16.15, *p* = 0.007, η<sub>p</sub><sup>2</sup> = 0.73. Participants fixated on the text more often (*M* = 26.54, SE = 5.68) than on the pictures (*M* = 4.64, SE = 0.76). Again, the main effect of type of product and the interaction were not significant, *F*(1,6) = 3.25, *p* = 0.12, and *F*(1,6) = 0.12, *p* = 0.74, respectively.

The instruction delivered to participants was very similar to that given to participants in the study by Rayner et al. (2001), and this instructional similarity for the current pilot study may account for the replication. Participants who took the perspective of a potential buyer tried to gain useful information about the characteristics of different products, and thus, focused their attention on the linguistic description of the products to a greater extent than on their pictorial features. As images convey mostly evidence of symbolic values, whilst text provides additional information about practical aspects of a given product, this asymmetry in visual investigation may also suggest that people in general seek information about utilitarian properties when considering buying a product. Furthermore, before making a consumer decision, participants investigated both alternatives of a given product (symbolic and utilitarian) to the same extent. Such a pattern of behavior may be useful in situations when a potential buyer has to decide which product to purchase and has no initial knowledge about available alternatives. After all, to determine if products are or are not practical and useful, a consumer must first collect and process information about these products.



*We first computed the mean fixation duration for each trial, and then the mean across trials. For this reason, the mean fixation duration we report is not simply the mean total fixation time divided by the mean number of fixations. This approach ensures that the mean value for each trial contributes equivalently to the overall mean value.*
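A small sketch makes the distinction in this note concrete. The numbers are invented for illustration, and the function names are hypothetical; only the order of averaging follows the note.

```python
from statistics import mean


def mean_of_trial_means(trials):
    """Average fixation duration within each trial, then across trials."""
    per_trial = [mean(durations) for durations in trials if durations]
    return mean(per_trial)


def pooled_ratio(trials):
    """Total fixation time divided by the total number of fixations."""
    total_time = sum(sum(durations) for durations in trials)
    n_fixations = sum(len(durations) for durations in trials)
    return total_time / n_fixations


# Two invented trials: one long fixation vs. three short ones (durations in ms)
trials = [[300], [100, 100, 100]]
by_trial = mean_of_trial_means(trials)   # (300 + 100) / 2
pooled = pooled_ratio(trials)            # 600 / 4
```

Here the per-trial approach gives 200 ms, whereas the pooled ratio gives 150 ms, because the trial with more fixations carries more weight when fixations are pooled.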

At the same time, participants made significantly longer fixations on pictures (*M* = 232.96, SE = 17.46) than on text (*M* = 198.84, SE = 10.87), type of information *F*(1,6) = 14.61, *p* = 0.009, η<sub>p</sub><sup>2</sup> = 0.71. This finding is typical and has also been reported in previous studies (Rayner et al., 2001, 2008). The type of product and the interaction remained non-significant, *F*(1,6) = 2.23, *p* = 0.19, and *F*(1,6) = 2.65, *p* = 0.16, respectively.

We obtained no significant effects for type of information, type of product, or the interaction on the probability of first fixating a region (*F*s < 1.5, *p*s > 0.26). Thus, participants first fixated on each of the different areas of the stimuli with equal probability.

Overall, the results of the pilot study replicate the earlier findings of Rayner et al. (2001), reinforcing the idea that task instruction can drive inspection behavior for product advertisements. The pilot data have served to validate our paradigm for future investigation on how such inspection patterns are modulated by high levels of narcissism. Based on the current data, our paradigm allows us to index online decision-making behavior, and provides an insight into the interplay between human visual sampling behavior and the moment-to-moment cognitive processes in which individuals engage when they make consumer decisions. Specifically, we can evaluate the relative importance of pictorial versus linguistic information when such decisions are made and record which aspects of available information are salient in capturing and holding a consumer's attention. We hope that our future studies will provide a clear demonstration of the value of eye movement technology in the investigation of the influence of personality traits (i.e., narcissism) on consumer behavior.

#### **ACKNOWLEDGMENT**

Preparation of this article, as well as the reported research, was supported by The Leverhulme Trust grant F/00 180/AM.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 31 October 2013; accepted: 01 March 2014; published online: 21 March 2014. Citation: Cisek SZ, Sedikides C, Hart CM, Godwin HJ, Benson V and Liversedge SP (2014) Narcissism and consumer behavior: a review and preliminary findings. Front. Psychol. 5:232. doi: 10.3389/fpsyg.2014.00232*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology. Copyright © 2014 Cisek, Sedikides, Hart, Godwin, Benson and Liversedge. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*