PowerPoint® Presentation Flaws and Failures: A Psychological Analysis

Kosslyn, Stephen M.; Kievit, Rogier A.; Russell, Alexandra G.; Shephard, Jennifer M.

doi:10.3389/fpsyg.2012.00230

ORIGINAL RESEARCH article

Front. Psychol., 17 July 2012

Sec. Educational Psychology

Volume 3 - 2012 | https://doi.org/10.3389/fpsyg.2012.00230

Stephen M. Kosslyn¹*

Rogier A. Kievit²

Alexandra G. Russell³

Jennifer M. Shephard⁴

¹Center for Advanced Study in the Behavioral Sciences, Stanford University, Stanford, CA, USA
²Department of Psychology, University of Amsterdam, Amsterdam, Netherlands
³Department of Psychology, Stanford University, Stanford, CA, USA
⁴Division of Social Science, Harvard University, Cambridge, MA, USA

Electronic slideshow presentations are often faulted anecdotally, but little empirical work has documented their faults. In Study 1 we found that eight psychological principles are often violated in PowerPoint® slideshows, and are violated to similar extents across different fields – for example, academic research slideshows generally were no better or worse than business slideshows. In Study 2 we found that respondents reported having noticed, and having been annoyed by, specific problems in presentations arising from violations of particular psychological principles. Finally, in Study 3 we showed that observers are not highly accurate in recognizing when particular slides violated a specific psychological rule. Furthermore, even when they correctly identified the violation, they often could not explain the nature of the problem. In sum, the psychological foundations for effective slideshow presentation design are neither obvious nor necessarily intuitive, and presentation designers in all fields, from education to business to government, could benefit from explicit instruction in relevant aspects of psychology.

Introduction

PowerPoint®, Keynote®, and their electronic slideshow siblings have become pervasive means of communication. Indeed, in 2001, Microsoft estimated that an average of 30 million PowerPoint® presentations were given each day – and we can only imagine what that number is today (Parker, 2001). Presentations using such computer programs are commonplace in academia, business, government, the military, and even K-12 schools. PowerPoint® has been used to teach subjects as varied as the three-dimensional structure of the larynx (Hu et al., 2009), prosthetics and orthotics (Wong et al., 2004), how to perform a testicular exam (Taylor et al., 2004), developmental psychology (Susskind, 2008), and physics (Gunel et al., 2006).

Observers have offered varied opinions about PowerPoint® and the value of presentations that utilize it (e.g., for overviews, see Farkas, 2006; Stoner, 2007), and considerable empirical research has been conducted on the use of the medium (e.g., for reviews, see Susskind, 2005, 2008; Levasseur and Sawyer, 2006; Savoy et al., 2009). One line of this research has examined whether using this medium improves learning or comprehension. For example, Gunel et al. (2006) found that students learned more physics from PowerPoint® presentations than from chapter summaries, whereas other researchers have reported that students remember about the same amount of material following PowerPoint® presentations as they do following other media (such as overheads and use of the blackboard; e.g., Szabo and Hastings, 2000; Beets and Lobingier, 2001; Campbell et al., 2004; Apperson et al., 2006; Experiments 1 and 3; Susskind, 2005, 2008). Indeed, some studies find that PowerPoint® actually impairs learning. For example, Savoy et al. (2009) showed that students recalled more verbal information from a traditional lecture than they did when that information was presented verbally during a PowerPoint®-based lecture.

Researchers have also examined whether students prefer PowerPoint® presentations to other types of presentations. The results of such studies generally indicate that students do like PowerPoint® presentations (Susskind, 2008; Savoy et al., 2009). Apperson et al. (2008) asked psychology students what in particular they liked about PowerPoint® presentations, and found that they like to see lists built up one item at a time and that they also like outlines of key phrases, graphics, relevant sounds, and colored backgrounds. However, it is worth noting that Mahar et al. (2009) compared memory retention after an entire list had been presented simultaneously to retention when each item on the list had been presented individually, and found better memory (but only for two of the nine items on their test) in the simultaneous condition.

Even this brief overview of the literature reveals that the results are mixed. One reason for the varied results may lie in differences in the quality of the presentations: the medium is not the entire message; any medium can be used effectively or ineffectively. In our view, the key is not whether or not PowerPoint® is used, but rather how it is used. Nobody should be surprised if a PowerPoint® lecture utilizing poorly designed slides or presented by an inadequately prepared speaker is not effective. In this article, we use psychological principles to develop objective guidelines for slideshow design, slide preparation, and presentation execution, and use these guidelines to assess flaws in PowerPoint® slides and presentations.

In the three studies reported here, we empirically extend ideas presented in Kosslyn (2007, 2010) and analyze how well PowerPoint® slideshows, slides, and presentations respect principles of human perception, memory, and comprehension. Specifically, we hypothesize that the psychological principles presented in the following section are violated in PowerPoint® slideshows across different fields (Study 1), that some types of presentation flaws will be noticeable and frustrating to audience members (Study 2), and that observers will have difficulties in identifying violations in the case of graphical displays in individual slides (Study 3).

Eight Cognitive Communication Principles

Levasseur and Sawyer (2006) note in their extensive review that remarkably little research provides direct guidelines for designing presentations (see also Suchoff, 2006). However, much research on basic psychological processes has been conducted that can be used to formulate such guidelines. This observation has been appreciated by many researchers, who have understood the fundamental point that effective displays must play to the strengths of human information processing and must avoid relying on the weaknesses of such processing (e.g., Aspillaga, 1996; Helander et al., 1997; Mejdal et al., 2001; Vekiri, 2002; Watzman, 2003; Stanney et al., 2004).

We acknowledge that the following formulation is but one of several possible ways to organize these guidelines within the conventional view of human information processing, but this one (from Kosslyn, 2007) appears to capture the major points of agreement among researchers. Moreover, this set of principles is a convenient way to organize the specific rules we examined, and it is these rules that identify the specific aspects of psychological processing we investigated. Thus, in addition to grouping the data according to the principles, we provide the data for each individual rule – which also allows readers to consider the results for those rules of particular interest.

In formulating our eight Cognitive Communication Principles, we began with a task analysis; that is, we considered what the viewer must do to understand a presentation. This analysis rests on the now-standard view that perception and comprehension can be decomposed into distinct classes of information processing operations (e.g., see Reisberg, 2006; Smith and Kosslyn, 2006), which underlie encoding, working memory, and accessing long-term memory. We focus here primarily on the visual modality because the visual aspects of the slide design are a major factor in PowerPoint® presentations (although the presenter’s speaking style and structuring of the presentation are also important, as we address in Study 2). In what follows, we first present the basic findings about information processing that led us to formulate a specific principle, and then explain the principle and how we operationalized it. Crucially, because all of these principles stem from known bottlenecks in information processing, slideshows, slides, and presentations that violate them will tax human information processing.

In what follows, although we organize the information processing operations into three categories (encoding, working memory, and accessing long-term memory), we do not mean to imply that these processes work in lockstep sequence. Rather, by focusing on the operations, we are led to define the bottlenecks that can disrupt presentations.

Encoding Processes

When viewing a sequence of slides, audience members first must encode what they see; if they fail to encode material, it may as well not exist. We formulated three principles based on essential facts about encoding.

Discriminability

The initial step of encoding requires noticing the to-be-encoded material, which requires patterns to be clearly different from the background and from other patterns. This leads us to formulate the Principle of Discriminability: two properties (such as two colors, degrees of gray, or sizes) cannot convey different information unless they differ by a large enough proportion to be easily distinguished (Woodson et al., 1992). This principle underlies camouflage, which occurs when two properties are so similar that they are not distinguishable – and there’s no place for camouflage in a presentation. In addition, we humans are not very good at discriminating among differences in the sizes of areas: we tend to underestimate areas, and do so to an increasingly large degree as the area increases (Stevens, 1975). The Principle of Discriminability has the following corollaries:

(1) Unless typefaces are large enough, letters cannot easily be distinguished from each other.

(2) If the color of text or graphics is not clearly distinct from the color of the background on which they appear, they cannot be readily distinguished.

(3) Typefaces in which letters are similar (because they are visually complex, all upper case, all italics, or all bold) cannot be easily read.

(4) Because patterns that are registered by separate “spatial frequency channels” are easily distinguished (Stromeyer and Klein, 1974; De Valois and De Valois, 1990), viewers can easily distinguish points that are about twice as thick as the lines that connect them and can easily distinguish texture patterns in which elements (e.g., stripes) vary by at least 2:1 (which will also avoid distracting “visual beats”).

(5) Similarly, because orientations that differ by at least 30° are processed by different “orientation channels” (Thomas and Gille, 1979; De Valois and De Valois, 1990), viewers can easily distinguish orientations of the lines in different regions when they vary by at least 30°.

(6) Finally, because of the way that different wavelengths of light focus on the retina, we have difficulty bringing cobalt blue (which is a mixture of red and blue) into focus, and we have difficulty focusing on red and blue at the same time (combinations of these colors tend to “shimmer”); thus text or other fine lines are difficult to distinguish if they are rendered in deep blues, and boundaries defined by adjacent red and blue regions are distracting.
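
Corollaries (4) and (5) state numeric thresholds (a ratio of at least 2:1 and an orientation difference of at least 30°), so they can be expressed as simple checks. The following Python sketch is illustrative only and is not part of the coding procedure used in the studies; the function names are hypothetical, and treating line orientations as equivalent modulo 180° is our assumption rather than something the cited work specifies.

```python
def ratio_at_least_2_to_1(size_a, size_b):
    """Corollary (4): two thicknesses or texture spacings differ by at least 2:1."""
    small, large = sorted((size_a, size_b))
    return small > 0 and large / small >= 2.0


def orientation_difference(deg_a, deg_b):
    """Smallest angle between two line orientations, in degrees.

    Assumption: an undirected line at 0 degrees and one at 180 degrees
    have the same orientation, so differences are taken modulo 180.
    """
    diff = abs(deg_a - deg_b) % 180.0
    return min(diff, 180.0 - diff)


def orientations_distinct(deg_a, deg_b, min_diff=30.0):
    """Corollary (5): orientations differ by at least 30 degrees."""
    return orientation_difference(deg_a, deg_b) >= min_diff
```

For instance, under these assumptions hatching at 10° and 170° differs by only 20° and so would not count as easily distinguishable, whereas hatching at 0° and 45° would.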

Perceptual organization

Once material is noticed, processes that underlie figure/ground segregation organize the input into perceptual units that often correspond to objects (including words and graphics). This observation leads us to formulate the Principle of Perceptual Organization: people automatically organize elements into groups, which they then attend to and remember (Larkin and Simon, 1987; Gropper, 1991; Aspillaga, 1996; Helander et al., 1997; Vekiri, 2002). Such groups emerge in numerous ways (Palmer, 1992). Classical grouping “Laws” specify the perceptual conditions that lead viewers to see elements as a single group. For example, we tend to group together elements that are nearby (the so-called “Law of Proximity”). That is, we see “xxx xxx” as two groups of three each, not six separate x’s. Similarly, we tend to group together elements that appear similar (the so-called “Law of Similarity”). For instance, we see “///\\\” as two groups, not six separate lines.

Grouping also can be imposed by lines, such as those used in complex tables to define specific regions, which makes the tables easier to read. In addition, some visual dimensions are automatically grouped; these dimensions are “integral,” and cannot easily be considered independently. For example, height and width are not easily seen as distinct, but rather are organized into a single shape – and viewers register the overall area, not the two dimensions separately (Macmillan and Ornstein, 1998). Other integral dimensions are hue, lightness, and saturation, which cannot easily be attended to independently; rather, our visual systems tend to group them into a single color – and thus these different dimensions of color cannot easily be used to convey separate measurements. The Principle of Perceptual Organization has the following corollaries:

(1) Labels initially are seen as applying to the closest graphic element.

(2) Common color organizes parts of a display into a group, even if they are separated in space.

(3) Common patterns or orderings organize parts of a display into a group, even if they are separated in space. For example, when the elements in a key (e.g., small squares of different colors, each associated with a label) are ordered the same way as the corresponding parts of the display (e.g., bars in a bar graph), the elements in the key will be grouped appropriately with the corresponding parts of the display.

(4) Because shape and location largely are processed separately in the visual system (Ungerleider and Mishkin, 1982), an element in one location will not always be easily grouped with elements in other locations. Hence, it is useful to construct a display so that separated-but-related elements are explicitly grouped (e.g., by using inner grid lines to group the tops of bars in a bar graph with locations along the Y axis at the left of the graph, if specific point values are to be read).

(5) Comprehension is impaired by spurious groups, which form accidentally because of the grouping laws (e.g., those that can form when pairs of measurements for two or more categories are plotted in the same scatter plot, or when a banner at the top of a slide groups with specific objects in the display because of proximity or similarity).

Salience

Not all perceptual units can be processed simultaneously, if only because they are in different locations and the visual acuity of the eye is greatest at the fovea (and hence whatever people fixate their gazes on will be encoded with higher resolution). Thus, some units are encoded in detail before others. Attention selects which patterns will be processed in detail, and attention is automatically drawn to what is different; we immediately notice the nail that sticks above the floorboards or the one red light in a group of green lights. The Principle of Salience states: attention is drawn to large perceptible differences. In fact, a part of the brain (the superior colliculus) operates as an attentional reflex, automatically drawing our visual attention to large differences among stimuli (Posner, 1980; Glimcher and Sparks, 1992; Krauzlis et al., 2004). The Principle of Salience has the following corollaries:

(1) Because movement is a particularly salient cue (Buchel et al., 1998), animation captures and directs attention.

(2) Text with a distinct format (color, size, or typeface) draws attention.

(3) Visual disparities draw attention. For example, we notice a wedge exploded from a pie chart because the boundary has been disrupted; however, if more than one or two wedges are exploded, the boundary becomes so disrupted that the separated pieces are no longer salient.

(4) If salience is aligned with importance (Helander et al., 1997), the more important aspects of the slide (e.g., the title or topic sentence) or of an illustration (graph, diagram, demo) draw the audience members’ attention – which will also enhance later memory for those aspects (Schmidt, 1991).

(5) A special case of salience arises with colors: because of a quirk in how the corneas diffuse light and project it onto the retinas, “warm” colors (i.e., those with relatively long wavelengths, such as red) seem closer to the viewer than do “cool” colors (i.e., those with relatively short wavelengths, such as blue; see Held, 1980; Allen and Rubin, 1981; Travis, 1991). Thus, if warmer colors are used for lines that pass beneath other lines, viewers will experience an annoying illusion in which material at the back appears to be struggling to move forward.

Working Memory

After visual patterns are encoded, they must be integrated. In a typical presentation, material must be integrated in working memory over the course of multiple slides, each of which may require multiple eye fixations. This integrating process is a prelude to fully comprehending both individual displays and the entire presentation. The high demands on working memory lead to the following two specific principles.

Limited capacity

One of the key facts about working memory is that it has a very limited capacity (e.g., Baddeley, 2007). The Principle of Limited Capacity states that people have a limited capacity to retain and to process information and will not understand a message if too much information must be retained or processed (Smith and Mosier, 1986; Shneiderman, 1992; Helander et al., 1997; Lund, 1997; Sweller et al., 1998). The Principle of Limited Capacity has the following corollaries:

(1) The amount of information that people can retain in working memory is defined in terms of psychological units, such as the perceptual groups produced by the classical perceptual grouping laws. We can hold in working memory only about four such units (Cowan, 2001). However, each of these units in turn can comprise four units – and thus hierarchical organization can enhance our ability to hold information in mind.

(2) In general, humans can track the movement of only about four units at the same time (Intriligator and Cavanagh, 2001 – although under special circumstances, more can be tracked; Alvarez and Franconeri, 2007).

(3) Eliminating the need to search for labels – for instance by directly labeling items in a display rather than using a key – reduces processing load (cf. Sweller, 1999).

(4) Audience members need time to process the information that is presented.

(5) Conversely, if slides fade in or fade out very slowly, the audience members may organize them incorrectly and then have to break their initial organization – which also requires effort (Potter, 1966).

Informative change

Because working memory has limited capacity, extraneous information can easily overwhelm it at the expense of relevant information. But, by the same token, visual or auditory cues can be used to help organize the information. According to the Principle of Informative Change, people expect changes in perceptual properties to carry information, and expect every necessary piece of information to be conveyed by such a perceptible change. Indeed, the very concept of “information” has been defined in terms of change: only when there is a change is information conveyed (Shannon, 1948). The Principle of Informative Change has the following corollaries:

(1) Audience members assume that words, graphics, or other changes in appearance convey new information (Smith and Mosier, 1986; Shneiderman, 1992; Aspillaga, 1996; Helander et al., 1997; Lund, 1997). For example, audience members assume that new information is being conveyed by changes in the appearance of the background, bullet points, or color or typeface of text. Random or arbitrary changes in appearance, in transitions between slides, or in terminology (“fowl” versus “bird”) can lead the audience astray.

(2) Clearly marking the beginnings and ends of sections of a presentation (for instance by presenting a title or concluding slide with a distinct format, typeface, or background) helps audience members follow the presentation.

Accessing Long-Term Memory

Our task analysis leads us to posit a third class of relevant processes. Encoding information and integrating it appropriately would be useless if the meaning of the material were not extracted. In order to ascribe meaning to stimuli, one must compare them with material previously stored in long-term memory; it is only by retrieving associated information that we comprehend the import of what we see. The following principles address factors that affect the ease of accessing long-term memory and activating the relevant stored information.

Appropriate knowledge

Meaning can be ascribed to a pattern only if the person has the requisite information already stored in long-term memory; to reach an audience, the presenter must make contact with what the audience members already know (Osman and Hannafin, 1992; Shneiderman, 1992; Fleming and Levie, 1993; Lund, 1997; Schwartz et al., 1998). The Principle of Appropriate Knowledge states: communication requires prior knowledge of relevant concepts, jargon, and symbols. If the presenter relies on novel concepts, jargon, or symbols, the audience members will fail to understand. The Principle of Appropriate Knowledge has the following corollaries:

(1) Unfamiliar (for that audience) concepts, conventions, formats, terminology, and symbols may not be understood. Moreover, when they are understood, they will likely require effortful processing (which will be accomplished only if the audience is highly motivated).

(2) If unfamiliar (for that audience) concepts, conventions, formats, terminology, and symbols are absolutely necessary to convey the message, they must be explicitly introduced and explained – and such explanations are more effective if they draw on information that is familiar to the audience.

Compatibility

The meaning of a stimulus will be difficult to extract if the interpretation of its surface properties (such as the size or color) is inconsistent with its symbolic meaning. The Principle of Compatibility states that a message is easiest to understand if its form is compatible with its meaning (Vessey, 1991; Woodson et al., 1992; Vekiri, 2002; Speier, 2006). This principle is perhaps most evident when it is breached, as demonstrated by the classic “Stroop effect” (MacLeod, 1991). In the Stroop effect, people have more difficulty naming the color of the ink used to print words when the words name a different color from the ink (e.g., blue ink used to print the word “red”) than when words name the same color (e.g., blue ink used to print the word “blue”). We register both the surface properties (e.g., the color of the ink) and the meaning. This principle applies across perceptual modalities, and thus comprehension generally is best when audio and visual contents coordinate with text and the overall message that is being conveyed; inappropriate sounds and visuals will interfere with comprehension. The Principle of Compatibility has the following corollaries:

(1) Because hue is a metathetic (values on metathetic dimensions are arranged qualitatively) variable, variations in hue are not seen automatically as signaling different amounts of a quantity; in contrast, because saturation and intensity (or brightness) are prothetic variables (values on prothetic dimensions are arranged quantitatively), variations along these dimensions do line up with variations in quantities (Stevens, 1975).

(2) Animation interferes with comprehension if it does not fit the natural movements of the object (e.g., a picture of a car should not drop down from the top), and sounds and slide backgrounds will interfere if they are not appropriate for the topic (a floral background, or the sounds of birds chirping, are not compatible with a presentation about carbon reservoirs in the ocean).

(3) Viewers most easily interpret icons that depict the typical examples of represented items. For example, a picture of a duck effectively illustrates “water fowl” but not “pet bird,” and vice versa for a picture of a canary (Rosch et al., 1976).

(4) An old-fashioned-looking typeface would send a conflicting message if used in a written description of a high-tech device.

(5) Because they make different sorts of information explicit, different sorts of graphics are appropriate for making different points (Tversky et al., 2002). Line graphs (rather than bar or mixed graphs) illustrate trends effectively because the continuous variation in the height of a line in a graph directly indicates the continuous variations of a measurement. Similarly, crossing lines in a graph indicate interactions more effectively than sets of bars. Bar graphs illustrate specific values (not trends) effectively because the discrete heights of the bars directly indicate specific measurements. Maps illustrate complex information about geographic territories or show alternate routes to a destination. Charts effectively illustrate organizational structure, a sequence of steps, or processes over time (“flow charts”).

Relevance

Finally, the message must be calibrated so that neither too much nor too little information is presented for a specific audience (Grice, 1975). (Note: the Principle of Informative Change states that, given a specific message, the requisite information must be provided – the present point is about providing an appropriate message in the first place.) It is clear that not providing enough information is a problem, but perhaps it is less immediately evident that providing too much information (such as extraneous graphics, text, or audio) is also a problem. Presenting too much information is a problem in part because this forces viewers to search for the relevant information, which requires effort (e.g., Wolfe, 1998). The Principle of Relevance states that communication is most effective when neither too much nor too little information is presented (Smith and Mosier, 1986; Woodson et al., 1992; Vekiri, 2002; Bartsch and Cobern, 2003). This principle has the following corollaries:

(1) To decide what is too much or too little, one must know about the nature of the message: depending on what the intended point is, specific information can be necessary or extraneous.

(2) When attempting to understand information, people (largely unconsciously) organize it into a narrative (cf. Wagoner, 2008; Karns et al., 2009). Defining the topic and presenting a roadmap at the outset (in an outline or other overview) facilitates this process.

(3) Graphics (photos, drawings, graphs, diagrams), audio, and video can provide detail to illustrate the relevant concepts clearly. However, pictures are often ambiguous (Wittgenstein, 1953/2001), and when they are ambiguous labels can clarify them.

In the following study we use these Cognitive Communication Principles, and the specific rules that grow out of them, to evaluate a wide variety of slideshows and slides. We ask the following questions: first, are violations of the principles common? Given that many corollaries of these principles are not intuitively obvious, we hypothesize that we will find many violations. Second, are the violations equally common in different fields? Because human beings are preparing the slides in all cases, we have no grounds for hypothesizing differences in the frequency of violations in the different fields.

Study 1

Levasseur and Sawyer (2006) noted that there is remarkably little research on how slides should be designed so that they function effectively. One possible reason for this dearth is that many may feel that “good design” is intuitively clear to most people, and hence there is no reason to study it. Consistent with this conjecture, some have claimed that the sorts of psychological principles just discussed are obvious, and rarely if ever would be violated in a presentation. To evaluate this supposition, we examined a sample of PowerPoint® slideshows. Specifically, we used a stratified sampling procedure to examine a selection of slideshows in five categories: Research (academic), Education, Government, Business, and Miscellaneous. We formulated 137 ways (each specified in a rule) in which the eight principles just summarized could be violated (see Table 1).

Table 1. List of rules for each principle and the proportion of presentations violating each rule (according to strict and liberal criteria) for Study 1.

Materials and Methods

For all searching and coding, Mac OS X and Microsoft Excel were used. We used the Google search engine for the sampling procedure.

We first acquired a random, stratified sample of PowerPoint® slideshows from the web. Following this, we asked two judges to score each PowerPoint® slideshow independently, using a checklist (see Table 1) to search for violations of specific rules of effective PowerPoint® communication; these rules were special cases of the eight general principles discussed in the Introduction. As noted below, two additional judges evaluated the slideshows using slightly different criteria.

Search procedure

Before we present the results of our sampling, we need to explain how we produced the sample. To ensure that there was no potential for bias (that is, sampling non-representative slideshows), we randomly selected slideshows within each category. To this end, we implemented a two-step procedure. We first constructed a list of keywords for each of four categories – Research (academic), Education, Government, and Business – by searching an electronic resource (e.g., EbscoHost) for the category name and then noting the keywords of the first few academic articles to appear in the search results. We then randomly paired each keyword with two numbers, which ranged from 1 to 10. The first number designated the page, the second the position of the item within a page. For example, a keyword, page, and position combination could have been “marketing, 6, 4.”

We then performed each search by entering the keyword followed by a space and then “.ppt” (the common abbreviation for PowerPoint® slideshows). For example, if the keyword and page combination was “marketing, 6, 4,” the search term would have been “marketing.ppt,” which would yield a series of Google hits from which we would select the fourth hit on the sixth page. In cases where the specified hit was not a slideshow, the next slideshow down would be selected, and if that entry was not a slideshow, we looked at the subsequent one, repeating this process until we found a slideshow. The minimum required length for slideshows was 15 slides, and the maximum allowed length was 100 slides; if slideshows contained fewer than 15 or more than 100 slides, we again selected the next slideshow in the list until we found a slideshow of suitable length.
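
The selection procedure just described can be summarized as a short Python sketch. This is an illustration under stated assumptions, not the procedure as actually carried out (the searches were performed by hand in Google): the function names are hypothetical, each hit is represented by a made-up dictionary with the keys "is_slideshow" and "n_slides", the query is built as the keyword, a space, and ".ppt" per the description above, and we assume that "the next slideshow down" may continue onto the following results page.

```python
import random


def draw_search_plan(keywords, seed=None):
    """Pair each keyword with a random page and position, both from 1 to 10."""
    rng = random.Random(seed)
    return [(kw, rng.randint(1, 10), rng.randint(1, 10)) for kw in keywords]


def build_query(keyword):
    """Keyword followed by a space and then '.ppt'."""
    return f"{keyword} .ppt"


def select_slideshow(hits_by_page, page, position, min_slides=15, max_slides=100):
    """Return the first suitable slideshow at or after (page, position).

    hits_by_page is a list of result pages, each a list of hypothetical hit
    dictionaries. Hits that are not slideshows, or that have fewer than
    min_slides or more than max_slides slides, are skipped.
    """
    # Hits from the chosen page onward (assumption: the search may spill
    # over to later pages), starting at the chosen position on that page.
    remaining = [hit for pg in hits_by_page[page - 1:] for hit in pg]
    for hit in remaining[position - 1:]:
        if hit["is_slideshow"] and min_slides <= hit["n_slides"] <= max_slides:
            return hit
    return None  # no suitable slideshow was found in the available results
```

With the combination "marketing, 6, 4", select_slideshow would begin at the fourth hit on the sixth page and move down the list until it reached a slideshow of 15 to 100 slides.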

In total, we selected 141 slideshows using this method, one of which was excluded because it clearly bridged categories, leaving us with a total of 140. We then classified each of these slideshows into one of five categories (in addition to Research, Education, Government, and Business, we defined a fifth category, Miscellaneous, for slideshows that did not fit clearly into the other four). Because we selected search terms that were targeted for a specific category, we were likely to find slideshows in that category – but our search method did not guarantee this result, and thus we double-checked the appropriate category after each slideshow was retrieved. Our search methods were designed to have a high probability of sampling from the different categories, and we had hoped to sample about the same number of slideshows from each category. However, our criteria were not perfect, and thus the numbers in each category are not precisely the same. Ultimately each category contained between 20 and 40 slideshows, specifically: Research: 27, Education: 38, Government: 20, Business: 32, and Miscellaneous: 23.

We note that we did not search for “.pptx” or “.key” or for other extensions. At the time the study was begun (2008), .pptx had not been available very long – and hence we worried about possible biases of sampling works from “early adopters.” In addition, the vast majority of slideshows are created with PowerPoint^®, and we worried that Keynote^® users might also represent a biased sample. Thus, we can only be confident that our results generalize to PowerPoint^® slideshows per se. In addition, because we sampled from the web, our results can only generalize to other slideshows of the sort that are posted on the web (however, for our purposes, this is sufficient – if anything, these slideshows may be better than those not deemed worthy of being posted).

Scoring and coding

Four judges scored the 140 slideshows for violations of the 137 rules. Two of the judges were authors, and two were college-educated paid research assistants in the laboratory; at the time of coding, none of the judges knew whether or not they would subsequently do enough work on this project to be included as authors, and none were biased to find flaws (let alone flaws of specific types). Each rule was worded as a statement that, if it described a slide or slideshow as a whole, revealed a violation of that rule.

Two of the judges began by independently scoring 10 slideshows, and then comparing their scores. Any disparities were discussed, and in five cases the wording of rules was modified to be more specific. Following this initial training and calibration procedure, each judge independently scored all of the slideshows. The judges then discussed any disparities in classification or scoring, and reached a consensus.

The initial two judges interpreted each rule literally (“strictly”). For example, if the rule stated that no more than two lines should be included in a single bullet point, then a bullet point that had two lines and a single word in a third line would be classified as violating the rule. We adopted this procedure because we wanted to be sure that the rules could be easily interpreted. Nevertheless, we were concerned that we would inflate the number of violations by adhering rigidly to the rules; many of them were intended to be heuristics. For example, the prohibition against more than two lines per bullet point is based on the idea that no more than four concepts should be held in working memory at the same time, and in general two lines of text convey about four concepts.

Thus, following the initial scoring, two additional judges considered each of the violations identified by the first two judges. They decided, independently, whether each violation was important. In this case, “important” was defined as “likely to disrupt the comprehension or memory of the material.” If not, then they rejected that initial classification of a violation. By using both the strict and liberal scoring methods, we thereby defined a range in which we evaluated the slideshows. Table 1 shows the proportion of slideshows that violated each of the 137 rules, sorted by principle, scored separately for the strict and liberal criteria. Finally, two authors independently evaluated the rules and – using the summary provided in the introduction – indicated which principle applied to each rule. When they disagreed (for 15% of the rules), a discussion resolved the classification. In the process of this discussion, it became clear that 12% of the rules were “over-determined”: rather than following primarily from a single principle, they followed from two or more principles. Thus, we created a ninth level for the “principle” variable, which included all the rules for which more than one principle applied.

Results

The dependent variable was whether or not a slideshow violated a principle. We took a “bad-apple-spoils-the-barrel” approach: if even a single slide contained material that violated a rule, the slideshow was scored as having violated the rule. And if a slideshow violated one or more rules within a specific principle, the score for that principle would be “1;” if none of the rules were violated for a principle, the score for it would be “0.” We present first the results from the strict, initial scoring, and after this the results from the more liberal scoring.
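The “bad-apple-spoils-the-barrel” scoring can be expressed as a short sketch. The data layout here (one set of violated rule identifiers per slide, plus a mapping from rule to principle) is an assumption made for illustration, as are the function and variable names; the actual scoring was done by human judges.

```python
def score_slideshow(slide_violations, rule_to_principle):
    """Return a 0/1 score per principle for one slideshow.

    slide_violations: list of sets, one per slide, holding the ids of the
        rules that slide violates (hypothetical layout).
    rule_to_principle: dict mapping each rule id to its principle.
    """
    # A rule counts as violated if even a single slide violates it.
    violated_rules = set().union(*slide_violations)

    # A principle scores 1 if one or more of its rules is violated, else 0.
    scores = {principle: 0 for principle in set(rule_to_principle.values())}
    for rule in violated_rules:
        scores[rule_to_principle[rule]] = 1
    return scores
```

Summing the resulting scores over principles gives the number of principles a slideshow violated.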

Strict scoring

We began by assessing inter-rater reliability. The overall inter-rater agreement was 0.88, which indicates the proportion of times that the raters either both coded “1,” indicating a violation, or both coded “0,” indicating no violation. We then sought to answer three questions about the data.
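The agreement statistic described here is a simple proportion of matching codes, which might be computed as in the following sketch (the function name and the example data are hypothetical). Note that this is raw agreement; it is not corrected for chance agreement in the way that a statistic such as Cohen's kappa is.

```python
def proportion_agreement(rater_a, rater_b):
    """Proportion of items on which two raters gave the same 0/1 code."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Ratings must be non-empty and of equal length.")
    # Count both kinds of agreement: both coded 1, or both coded 0.
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


# Hypothetical codes from two raters on four items; they agree on three.
print(proportion_agreement([1, 0, 1, 1], [1, 0, 0, 1]))
```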

First, we asked whether the violations of the principles differed for the different categories (see Table 2). To answer this question, we conducted a repeated-measures ANOVA, with the nine levels of the Principles (the eight principles plus an “over-determined” class) as a within-participants variable and Category as a between-participants variable. The results showed that, in general, there were no overall differences in violations for the different categories, F(4, 135) = 2.24, p > 0.05, $\eta_{p}^{2} = 0.06$.
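As a reading aid, the degrees of freedom and effect sizes reported in this section can be checked with a short sketch. It relies on the standard identity between partial eta squared and the F ratio, and on the usual degrees of freedom for a mixed design without any sphericity correction; whether such a correction was applied is not stated in the text, so that part is an assumption. The function names are hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its dfs."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


def mixed_design_dfs(n_units, n_groups, n_levels):
    """Degrees of freedom for a between x within design (no correction)."""
    df_between = n_groups - 1
    df_between_error = n_units - n_groups
    df_within = n_levels - 1
    return {
        "between": (df_between, df_between_error),
        "within": (df_within, df_between_error * df_within),
        "interaction": (df_between * df_within, df_between_error * df_within),
    }


# 140 slideshows, 5 categories, 9 levels of the Principles variable.
print(mixed_design_dfs(140, 5, 9))
print(round(partial_eta_squared(2.24, 4, 135), 2))
```

With 140 slideshows, five categories, and nine levels, the sketch yields degrees of freedom of (4, 135), (8, 1080), and (32, 1080), matching those of the F tests reported here.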

Table 2. Proportion of violations for each principle in each category for Study 1.

Second, we asked whether some principles were violated more frequently than others. As illustrated in Figure 1, in the same two-way ANOVA just noted, we found that some principles were violated more often than others, F(8, 1080) = 87.94, p < 0.01, $\eta_{p}^{2} = 0.39$. A Bonferroni-corrected post hoc test with an alpha of 0.05 revealed that out of the 36 possible comparisons, only the following eight were not significant: the difference between Appropriate Knowledge and Salience, Compatibility and Over-determined, Compatibility and Perceptual Organization, Discriminability and Limited Capacity, Informative Change and Relevance, Informative Change and Salience, Over-determined and Perceptual Organization, and Relevance and Salience. All other comparisons were significant.
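The count of 36 comparisons follows from taking every pair among the nine levels of the Principles variable. The sketch below enumerates these pairs and computes a per-comparison threshold under the common form of the Bonferroni correction, in which the family-wise alpha is divided by the number of comparisons. That the correction was applied in exactly this form is an assumption; the single-letter labels are the abbreviations used in Figure 1.

```python
from itertools import combinations

# Abbreviations of the nine levels, as used in Figure 1.
LEVELS = ["A", "C", "D", "I", "L", "P", "R", "S", "O"]


def bonferroni_threshold(family_alpha, n_comparisons):
    """Per-comparison alpha under a simple Bonferroni correction."""
    return family_alpha / n_comparisons


pairs = list(combinations(LEVELS, 2))
threshold = bonferroni_threshold(0.05, len(pairs))

print(len(pairs))      # number of pairwise comparisons
print(threshold)       # per-comparison alpha (assumed form of the correction)
print(len(pairs) - 8)  # comparisons reported as significant
```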

Figure 1. The percentage of presentations that had violations, scored according to strict (dark bars) and liberal (light bars) criteria, in Study 1. A, appropriate knowledge; C, compatibility; D, discriminability; I, informative change; L, limited capacity; P, perceptual organization; R, relevance; S, salience; O, over-determined. Error bars illustrate the standard error of the mean.

In addition, the results of this analysis showed there was an interaction between Principles and Category, F(32, 1080) = 1.91, p < 0.05, $\eta_{p}^{2} = 0.054$. Specifically, a Bonferroni-corrected post hoc procedure documented that the difference in violations between the principle of Appropriate Knowledge and the principle of Compatibility was significantly different between the categories of Research and Government, F(1, 45) = 10.19, p < 0.01, $\eta_{p}^{2} = 0.19$.

In general, the following aspects of these results are worth noting: the principles of Discriminability and Limited Capacity were violated in every single slideshow, and the principle of Informative Change was violated in 93% of the slideshows. In contrast, the principle of Compatibility was violated in only 31% of the slideshows.

Third, we examined which rules were violated most often. In particular, we found that the following five rules were violated most often: (1) Bulleted items are not presented individually, growing the list from the top to the bottom (96% of the slideshows); (2) More than two lines are used per bulleted sentence (91%); (3) More than four bulleted items appear in a single list (91%); (4) Hierarchical organization of lists is not used, with no more than four items at each level (86%); (5) All uppercase, all italics, or all bold typefaces are used (81%). To examine this result more carefully, we examined whether the five worst rules were violated equally often for the different categories of slideshows. The results showed no interaction between Category and the violation of the five worst rules, F(16, 540) = 1.51, p > 0.05, $\eta_{p}^{2} = 0.04$. See Table 3 for a more detailed breakdown of the worst rules per category.

Table 3. Most frequently violated rules per category (strict scoring) for Study 1.

Liberal scoring

We again assessed inter-rater reliability, now based on the judgments with the more liberal criteria. The proportion agreement between the two raters for all principles and scales combined was 0.98. The shift from strict to liberal criteria did not change the pattern of the results. Using the new criteria resulted in our eliminating only 4% of the violations that were identified using the strict scoring procedure.

Considering again our three questions about the data: first, as with the strict coding, we conducted a repeated-measures ANOVA, with the nine levels of the Principles as a within-participant variable and Category as a between-participant variable. The results showed that, in general, there were no differences in violations between the categories, F(4, 135) = 1.93, p > 0.1, $\eta_{p}^{2} = 0.05$.

Second, we again asked whether some principles were violated more often than others. As illustrated in Figure 1, we found that some principles were violated more often than others, F(8, 1080) = 79.52, p < 0.05, $\eta_{p}^{2} = 0.37$. A Bonferroni-corrected post hoc test with an alpha of 0.05 showed that out of the 36 possible comparisons, only the following eight were not significant: the difference between Appropriate Knowledge and Salience, Compatibility and Over-determined, Compatibility and Perceptual Organization, Discriminability and Limited Capacity, Informative Change and Relevance, Informative Change and Salience, Over-determined and Perceptual Organization, and Relevance and Salience. All other comparisons were significant.

As before, the principles of Discriminability (98%), Limited Capacity (100%), and Informative Change (86%) each were violated in the vast majority of slideshows. Also as before, the principle of Compatibility was violated least frequently, in “only” about a quarter of the slideshows (27%) in this case. The results of the analysis also showed that there was no interaction between Category and Principle, F(32, 1080) = 1.44, p > 0.05, $\eta_{p}^{2} = 0.04$.

Third, we again examined which rules were violated most often. In this analysis, we found that the following five rules were violated most often: (1) Bulleted items are not presented individually, growing the list from the top to the bottom (94% of the slideshows); (2) More than four bulleted items appear in a single list (88%); (3) More than two lines are used per bulleted sentence (87%); (4) Hierarchical organization of lists is not used, with no more than four items at each level (84%); (5) Words are not large enough (i.e., greater than 20 point) to be easily seen (65%). The first four rules are the same as the first four identified in the previous analysis, but now the rule “All uppercase, all italics, or all bold typefaces are used” was replaced by another rule that also stemmed from the principle of Discriminability, namely the rule regarding size. To examine this result more carefully, we examined whether there was an interaction between the five worst rules and categories, and found such an interaction, F(16, 540) = 2.35, p < 0.05, $\eta_{p}^{2} = 0.07$. Specifically, a Bonferroni-corrected post hoc procedure showed that the difference in violations between the rule “Words are not large enough (i.e., greater than 20 point) to be easily seen” and the rule “More than two lines are used per bulleted sentence” was significantly different between the categories of Business and Government, F(1, 50) = 10.99, p < 0.01, $\eta_{p}^{2} = 0.18$.

Table 2 provides a summary of the violations per category per principle, using both the liberal and the strict scoring method.

Conclusion

As is clear, by either the strict or liberal scoring criteria, many slideshows are flawed. Indeed, not a single slideshow was scored – according to either set of criteria – as having no flaws. Using the strict criteria, each slideshow violated on average 6.23 of the nine classes of principles, which shrank slightly to 5.88 using the liberal criteria. Considering just the eight individual principles, excluding the “over-determined” group of rules, the average number of principles violated was 6.17 for the strict coding and 5.86 for the liberal coding. The three most-violated principles were Discriminability (because material was too similar to be easily distinguished), Limited Capacity (because too much information was presented), and Informative Change (because changes in how information was presented did not actually reflect changes in the information being conveyed). Even the least frequently violated principle, Compatibility, was still violated in at least a quarter of the slideshows.

In addition, the fact that violations were comparable across the different categories is of interest. We examined the different categories partly in an effort to consider the possible influence of different topics. Although slideshows of different topics (or areas) may differ in their average complexity, the results do not suggest that violations of specific principles generally vary for different topics. That said, we did find that some specific rules were violated more frequently in some categories than others, which could reflect differences in the complexities of the topics.

In sum, the present results suggest that the psychological principles either are not obvious or are obvious but often ignored.

Study 2

The analysis of PowerPoint^® slideshows in Study 1 revealed that slideshows themselves typically are flawed in multiple ways. We wanted to know whether viewers are sensitive to presentation flaws more generally, and thus we conducted a survey asking participants to report approximately how many electronic slideshow presentations they had seen in the past year, and then to rate various aspects of those presentations in terms of their quality.