Edited by: Joel Pearson, The University of New South Wales, Australia
Reviewed by: Holly Bridge, University of Oxford, UK; Joel Pearson, The University of New South Wales, Australia
*Correspondence: Freya Bailes, Department of Drama and Music, University of Hull, Cottingham Road, Hull HU6 7RX, UK. e-mail:
This article was submitted to Frontiers in Perception Science, a specialty of Frontiers in Psychology.
This is an open-access article distributed under the terms of the
Musicians imagine music during mental rehearsal, when reading from a score, and while composing. An important characteristic of music is its temporality. Among the parameters that vary through time is sound intensity, perceived as patterns of loudness. Studies of mental imagery for melodies (i.e., pitch and rhythm) show interference from concurrent musical pitch and verbal tasks, but how we represent musical changes in loudness is unclear. Theories suggest that our perceptions of loudness change relate to our perceptions of force or effort, implying a motor representation. An experiment was conducted to investigate the modalities that contribute to imagery for loudness change. Musicians performed a within-subjects loudness change recall task, comprising 48 trials. First, participants heard a musical scale played with varying patterns of loudness, which they were asked to remember. There followed either an empty interval of 8 s (nil distractor control) or the presentation of a series of four sine tones, four visual letters, or three conductor gestures, also to be remembered. Participants then saw an unfolding score of the notes of the scale, during which they were to imagine the corresponding scale while adjusting a slider to indicate the imagined changes in loudness. Finally, participants performed a recognition task on the tone, letter, or gesture sequence. Based on the motor hypothesis, we predicted that observing and remembering conductor gestures would impair loudness change scale recall, while observing and remembering tone or letter string stimuli would not. Results support this prediction, with loudness change recalled less accurately in the gestures condition than in the control condition. An effect of musical training suggests that auditory and motor imagery ability may be closely related to domain expertise.
Musicians imagine music during mental rehearsal (Holmes,
Mental representations of pitch and melody have been shown to involve auditory (Deutsch,
The role of motor representations in auditory imagery is critical to distinctions that have been made between an “inner ear” (acoustic imagery) and an “inner voice” (subvocal rehearsal) (Smith et al.,
In the current study, an experiment was conducted to investigate the modalities that contribute to imagery for loudness change. Investigations of visual imagery and working memory have used an interference paradigm as the means to disrupt different types of processing. For example, in a study of movement imagery in rock climbing, Smyth and Waller (
One challenge presented by ubiquitous real-world stimuli such as music is that they vary over time. Prior studies of mental imagery have investigated more static material such as pictures, objects, or alphanumeric characters. Thus, there is a need to evaluate contemporary accounts of imagery in the context of sequential, temporally structured, and time-varying material. In turn, this requires the development of new methods for stimulus presentation, online generative responding, and analysis of the resulting production (time-series) data. In short, investigating imagery in music demands a method of responding that captures its temporal unfolding, and this may be best achieved by way of a production (rather than recognition) task. Accordingly, we used a continuous response paradigm that encouraged participants to imagine loudness change stimuli. Participants moved a volume slider to indicate increases and decreases in the “loudness” of the imagined stimuli. The advantages of such an approach are twofold. First, enacting the response in time is more likely to recruit a mental image of the stimulus than performing a stimulus recognition task. Second, movement is integral to this response mode, respecting our hypothesized link between representations of intensity and motor effort.
The principal hypothesis was that imagining changes in loudness would be disrupted by concurrently remembering movement sequences (presented visually). However, the loudness change stimuli in the current experiment comprised loudness changes in the sounding of ascending and descending scales, and pitch (the patterns of note ascent and descent) is integral to a representation of such stimuli, so it was possible that tone sequences would also impair mental imagery. Finally, if participants chose a strategy of labeling increases and decreases of intensity as “up” and “down” respectively in order to remember the loudness change scale stimuli, then a concurrent verbal task of remembering letters could be expected to interfere with the task of recreating the loudness change stimuli, perhaps suggestive of a verbal representation rather than a mental image. In line with previous research (e.g., Williamson et al.,
In experiments on working memory for actions that use an interference paradigm (e.g., Smyth and Pendleton,
Conductor gestures were used as visuo-spatial distractor stimuli because they are relevant to the communication and understanding of musical intensity, and because they represent a motor sequence. Action-observation theories would suggest that observing a sequence of conductor gestures necessarily activates motor representations. Simulation theory (see Berthoz,
It was also important in the current study to determine separately whether these auditory, verbal, and motor distractor tasks would affect imagery for
A within-subjects design comprised two different imagery tasks (melodies, loudness change scales), each with four different distractor conditions (control, letter sequence, tone sequence, movement sequence), generating eight different experiment conditions.
Participants (
For the melody imagery task, 28 melodies were selected from the Australian Music Examinations Board (AMEB) aural test syllabus for grades 2–3 (AMEB,
Half of the melodies were altered to produce the “different” test stimuli, while half were unaltered for the “same” test stimuli. Three types of alteration were made: (1) the order of two consecutive pitches was reversed, as in the “Exchange” comparison of Mikumo (
For the loudness change imagery task, 16 different loudness change patterns comprising sequences of crescendi and decrescendi were produced. Eight loudness patterns were superimposed on an ascending/descending (in pitch) one-octave major scale, while eight were superimposed on a descending/ascending (in pitch) scale. The audio files were generated and recorded through the Disklavier, controlled by Max/MSP. Each note in the scale lasted 500 ms, so that every scale spanned 8 s. Half of each scale type (e.g., ascending/descending) began with a crescendo, while the other half began with a decrescendo. Each stimulus comprised between two and four loudness changes (crescendo or decrescendo), each lasting between three and eight notes. Loudness changes were implemented by manipulating the MIDI velocity value that Max/MSP sent to the Disklavier for each note. The minimum and maximum note velocities were the same for all loudness changes (MIDI note velocities ranged from 20 to 60). No more than two consecutive notes shared the same note velocity.
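The velocity scheme described above can be sketched as a piecewise-linear ramp between the fixed minimum and maximum velocities. This is a minimal sketch, assuming linear interpolation in MIDI velocity between hand-chosen turning points; the section does not state how intermediate velocities were chosen, and the function names are hypothetical rather than part of the authors' Max/MSP patch. Sending the values to the Disklavier is not shown.

```python
def velocity_profile(turning_points, start_with_crescendo=True, v_min=20, v_max=60):
    """Return one MIDI velocity per note for a crescendo/decrescendo pattern.

    turning_points: strictly increasing note indices at which the loudness
    direction reverses, starting with 0 and ending with the last note index.
    Assumption (hypothetical scheme): velocity is interpolated linearly
    between v_min and v_max within each loudness change.
    """
    if len(turning_points) < 2 or turning_points[0] != 0:
        raise ValueError("turning_points must start at 0 and hold at least two indices")
    if any(b <= a for a, b in zip(turning_points, turning_points[1:])):
        raise ValueError("turning_points must be strictly increasing")
    # Target level at each turning point alternates between the extremes:
    # min, max, min, ... for a crescendo start; max, min, max, ... otherwise.
    levels = []
    for k in range(len(turning_points)):
        at_min = (k % 2 == 0) == start_with_crescendo
        levels.append(v_min if at_min else v_max)
    velocities = []
    for seg in range(len(turning_points) - 1):
        a, b = turning_points[seg], turning_points[seg + 1]
        va, vb = levels[seg], levels[seg + 1]
        # Notes a .. b-1 belong to this ramp; note b starts the next one.
        for i in range(a, b):
            velocities.append(round(va + (vb - va) * (i - a) / (b - a)))
    velocities.append(levels[-1])  # the final note sits at the last extreme
    return velocities


def longest_repeat_run(values):
    """Length of the longest run of identical consecutive values."""
    longest = run = 1
    for prev, cur in zip(values, values[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    return longest


# 16 notes with three loudness changes (crescendo, decrescendo, crescendo).
profile = velocity_profile([0, 5, 10, 15])
```

With turning points at notes 0, 5, 10, and 15, each loudness change spans six notes, and the 16 notes at 500 ms each fill the 8-s scale; `longest_repeat_run` checks the "no more than two equal consecutive velocities" constraint.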
For use in the test phase, visual scores of the scales, without loudness change markings, were written in Sibelius. PowerPoint and the screen capture software Capture Me were then used to record videos of each scale being gradually revealed at the rate of one note per 500 ms.
The same total set of six letters as used by Williamson et al. (
Four-tone sequences were generated in Audacity. Pure sine tones were used, each 2 s long and presented sequentially with no gap between them. For each trial, the four tones were selected from outside the key
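A tone distractor of this kind can be sketched with the standard library alone. This sketch assumes equal-tempered tuning (A4 = 440 Hz), a major key for the scale (C major here, purely as an example), and a two-octave MIDI range from which out-of-key notes are drawn; none of these specifics is given in this section, and the helper names are hypothetical. No onset or offset ramps are applied, so a practical version might add short fades to avoid clicks at tone boundaries.

```python
import math
import random

MAJOR_STEPS = (0, 2, 4, 5, 7, 9, 11)  # semitone offsets of a major scale from its tonic


def midi_to_hz(note):
    """Equal-tempered frequency, with A4 (MIDI 69) = 440 Hz."""
    return 440.0 * 2 ** ((note - 69) / 12)


def out_of_key_notes(tonic_pc, low=60, high=84):
    """MIDI notes in [low, high] whose pitch class is not in the tonic's major scale.

    The range 60-84 is an illustrative assumption, not a value from the study.
    """
    in_key = {(tonic_pc + s) % 12 for s in MAJOR_STEPS}
    return [n for n in range(low, high + 1) if n % 12 not in in_key]


def sine_sequence(freqs, dur_s=2.0, sr=44100, amp=0.5):
    """Concatenate pure sine tones of equal duration with no gap between them."""
    n = int(round(dur_s * sr))
    samples = []
    for f in freqs:
        samples.extend(amp * math.sin(2 * math.pi * f * i / sr) for i in range(n))
    return samples


rng = random.Random(1)
notes = rng.sample(out_of_key_notes(tonic_pc=0), 4)  # tonic 0 = C (hypothetical key)
tones = sine_sequence([midi_to_hz(m) for m in notes])  # 4 x 2 s = 8 s of audio
```

Four 2-s tones give an 8-s sequence, the same length as the nil distractor interval.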
A set of 10 clips of musical conducting were selected from the videos provided in “Expressive Conducting” (Wiens,
The experiment was run from a MacBook (OS X 10.5.8). Stimuli were presented and data were collected using a custom-made patch in Max/MSP. Participants wore Sennheiser HD 650 headphones, and data in the loudness change imagery task were collected by means of an I-CubeX push v1.1 slider facing away from the participant at a slight upwards incline (Figure
The study was approved by the Human Research Ethics Committee of the University of Western Sydney. Written informed consent to participate in the study was first sought, and general instructions about the format of the experiment were provided. Participants began by filling out the OMSI questionnaire. Trials for each of the eight experiment conditions (two imagery tasks × four distractor tasks) were blocked, and instructions specific to each condition were provided at the start of its block. Participants then performed a practice trial for the block and were given an opportunity to ask the experimenter any questions before proceeding to the experimental trials. The eight blocks were presented in random order. Each of the 24 melody stimuli was presented once, without repetition, across the melody imagery blocks. Each of the 14 loudness change scale stimuli was presented once or twice across the loudness change imagery blocks (the Max/MSP program randomly selected all 14 stimuli without replacement, then began the process again until 10 of them had been presented a second time).
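The selection rule for the loudness change stimuli (all 14 drawn without replacement, then a fresh draw until 10 more had been presented) yields 24 trials, matching four blocks of six. A minimal sketch follows; dealing the draw into blocks in order, and the function and variable names, are assumptions for illustration rather than details of the authors' Max/MSP program.

```python
import random


def loudness_schedule(n_stimuli=14, n_trials=24, rng=None):
    """Draw every stimulus once without replacement, then start a fresh
    without-replacement draw, stopping once n_trials have been scheduled."""
    if rng is None:
        rng = random.Random()
    order = []
    while len(order) < n_trials:
        batch = rng.sample(range(n_stimuli), n_stimuli)  # fresh shuffle of all stimuli
        order.extend(batch[: n_trials - len(order)])
    return order


rng = random.Random(0)
schedule = loudness_schedule(rng=rng)
# Assumption: trials are dealt into the four loudness blocks in draw order.
loudness_blocks = [schedule[i:i + 6] for i in range(0, len(schedule), 6)]
melody_order = rng.sample(range(24), 24)  # each of the 24 melodies presented once
# Eight blocks: two imagery tasks x four distractor conditions, in random order.
conditions = [(task, d) for task in ("melody", "loudness")
              for d in ("control", "letter", "tone", "movement")]
block_order = rng.sample(conditions, len(conditions))
```

Under this rule, four stimuli appear once and ten appear twice, so every participant still hears each of the 14 loudness change scales at least once.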
In a melody trial, a melody was sounded, followed by presentation of the distractor stimulus (letter sequence, tone sequence, movement sequence, or control period of 8 s). Immediately after the distractor stimulus presentation, a visual score of the melody appeared on screen, and participants indicated as quickly as possible whether the score was the same as or different from the melody that they had heard, by comparing their mental image of the melody with the score. “Same” and “Different” buttons appeared next to each other on the screen, and participants used a mouse to indicate their response. Following the melody test, the trial ended for the control condition; otherwise a distractor recognition test appeared, in which participants were presented with a distractor test sequence (letter, tone, or movement sequence) and used the same buttons to indicate whether it was the same as or different from the distractor stimulus that had originally been presented. Figure
In a loudness change scale trial, a scale modulated in acoustic intensity (loudness change) was sounded, followed by presentation of the distractor stimulus (letter sequence, tone sequence, movement sequence, or control period of 8 s). Immediately after the distractor stimulus presentation, an unfolding visual score of the notes of the scale was presented on the screen, and participants used a volume slider to indicate their mental image of the loudness change profile of the scale that they had heard. Notes appeared on the score at the same pace as they had been sounded at the start of the trial (i.e., one note per 500 ms), and participants were instructed to match the timing of their slider adjustments to the timing of the unfolding visual score. To ensure that slider movements began from the appropriate imagined loudness level at the start of the scale, a 2-s orientation period was provided, visually marked by a yellow circle on the screen. During this time, participants were to move the slider to the level that they thought best represented the opening loudness of the scale, before going on to indicate the loudness changes corresponding to the visually unfolding scale
Each block comprised a practice trial followed by six experimental trials. The experiment lasted approximately 45 min.
Participant responses for the loudness change scale recall task comprised the series of slider values produced by each participant on each trial. Figure
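Dynamic time warping (DTW) distances between each recalled slider trace and its target loudness profile served as the index of loudness change imagery, with lower distances meaning a closer match. The classic DTW recurrence is sketched below under stated assumptions: both series are z-scored so that slider units and MIDI velocities become comparable, the local cost is an absolute difference, and no warping window is imposed. This section does not report the authors' preprocessing, resampling, or DTW variant, so this illustrates the technique rather than reproducing their pipeline.

```python
import math


def zscore(values):
    """Rescale a series to zero mean and unit (population) standard deviation."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / sd for v in values] if sd > 0 else [0.0] * len(values)


def dtw_distance(x, y):
    """Classic DTW distance with absolute-difference local cost.

    The series may differ in length (e.g. many slider samples vs. 16 notes).
    """
    n, m = len(x), len(y)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # x advances, y repeats
                                 cost[i][j - 1],      # y advances, x repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]
```

Because warping absorbs small timing offsets, a response that lags the target by a note scores far better than one that reverses the loudness contour, which suits a task in which participants paced their slider movements against an unfolding score.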
Accuracy in the melody recognition task was calculated as the proportion of correct responses (i.e., correct identification of a same or different stimulus) out of all responses given per condition. Four participants performed only at chance level in the nil distractor (control) condition, and consequently they were excluded from analyses of the melody task.
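The accuracy measure, and one way of checking whether a participant is above chance, can be sketched as follows. This section does not state how "at chance" was assessed, so the exact one-sided binomial test against p = .5 (two response options, Same/Different) is an assumption, as are the function names.

```python
from math import comb


def accuracy(responses, answers):
    """Proportion of same/different judgments that match the correct answer."""
    return sum(r == a for r, a in zip(responses, answers)) / len(answers)


def binom_p_at_least(k, n, p=0.5):
    """One-sided exact binomial p-value: P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))
```

With six trials in a block, 6/6 correct gives p = 1/64 ≈ .016 and 5/6 gives p = 7/64 ≈ .109, so a single block offers little resolution for a per-participant chance test.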
Multi-level linear modeling (the lme4 package in the statistical program R) was used to determine how well distractor condition predicted the scores. One advantage of this approach over ANOVA is that random effects can be modeled, so that different intercepts and slopes for individuals and block order can be included, thus controlling for intersubject variability and order effects. Models were developed stepwise, using interference condition as a predictor and testing the impact of individuals, block order, OMSI, and years of musical training as random effects. Model selection used the Bayesian Information Criterion (BIC) to identify the most parsimonious fit. Confidence intervals (CI) were calculated as Highest Posterior Density (HPD) estimates obtained by Markov chain Monte Carlo (MCMC) sampling.
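Two of the quantities named above can be made concrete. BIC trades off fit against complexity as k ln(n) − 2 ln(L), where lower is better, and an HPD interval is the narrowest interval containing a given share of the posterior draws. The sketch below is not lme4's code: it assumes the maximized log-likelihoods and the MCMC draws are already available, the function names are hypothetical, and the HPD search as written is appropriate for unimodal posteriors.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: k * ln(n) - 2 * ln(L). Lower is better."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def hpd_interval(samples, mass=0.95):
    """Shortest interval containing `mass` of the posterior draws (unimodal case)."""
    xs = sorted(samples)
    n = len(xs)
    # Number of draws the interval must cover; rounding guards against float noise.
    k = max(1, math.ceil(round(mass * n, 9)))
    widths = [xs[i + k - 1] - xs[i] for i in range(n - k + 1)]
    best = min(range(len(widths)), key=lambda j: widths[j])
    return xs[best], xs[best + k - 1]
```

In a stepwise comparison, a richer model is kept only if its gain in log-likelihood outweighs the ln(n) penalty paid for each added parameter.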
In the best fit multi-level linear model, recall of loudness change after the movement sequence distractor was significantly worse than recall under the nil distractor condition (β = 1.04,
The optimized model also included a random intercept for each individual participant (SD = 2.39), for years of musical training (SD = 0.89), and for block order (SD = 0.67).
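Read together, the fixed and random effects reported above correspond to a random-intercept model of roughly the following form. This is a sketch inferred from the description, assuming the nil distractor condition as the reference level, Gaussian random intercepts (the standard assumption for a linear mixed model), and years of training treated as a grouping factor; the symbols are introduced here for illustration.

$$
d_t = \beta_0 + \beta_{\mathrm{L}}\,\mathrm{Letter}_t + \beta_{\mathrm{T}}\,\mathrm{Tone}_t + \beta_{\mathrm{M}}\,\mathrm{Move}_t + u_{p(t)} + v_{y(t)} + w_{o(t)} + \varepsilon_t
$$

$$
u_p \sim \mathcal{N}(0,\sigma_u^2), \quad v_y \sim \mathcal{N}(0,\sigma_v^2), \quad w_o \sim \mathcal{N}(0,\sigma_w^2), \quad \varepsilon_t \sim \mathcal{N}(0,\sigma^2)
$$

Here $d_t$ is the DTW distance on trial $t$; $\mathrm{Letter}_t$, $\mathrm{Tone}_t$, and $\mathrm{Move}_t$ are 0/1 indicators of the distractor condition; and $p(t)$, $y(t)$, and $o(t)$ index the participant, years of musical training, and block order for that trial. The reported estimates are $\hat\beta_{\mathrm{M}} = 1.04$, $\hat\sigma_u = 2.39$, $\hat\sigma_v = 0.89$, and $\hat\sigma_w = 0.67$. A positive $\beta$ means a larger distance, that is, less accurate recall.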
In the model of accuracy in the melody recognition task, performance was not significantly different from the nil distractor condition under tone sequence distraction (β = 0.08,
As in the model of DTW distances as an index of imagery for loudness change, the optimized model of accuracy in recognizing melodies included a random intercept for each individual participant (SD = 0.10). Here no significant contribution of years of musical training or block order was found.
Melody recognition in the nil distractor/control condition was significantly better than chance [
It was also of interest to compare memory for the distractor stimuli following performance in each of the quite different loudness change scale recall and melody recognition tasks. Table
Imagery task | Letter | Tone | Movement |
---|---|---|---|
Loudness change scale recall | 0.87 (0.1) | 0.76 (0.2) | 0.55 (0.2) |
Melody recognition | 0.86 (0.1) | 0.74 (0.2) | 0.68 (0.2) |
During the loudness change scale task, correct recall of the distractor sequences differed significantly by distractor type, as assessed by a repeated-measures ANOVA of accuracy in the distractor task,
The correct recall of distractor sequences in the melody recognition task also differed significantly by distractor type,
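The repeated-measures ANOVA on distractor accuracy partitions variance into distractor type, participants, and residual error. The sketch below computes the one-way F ratio from a participants-by-conditions table. It assumes complete data and applies no sphericity correction (such as Greenhouse-Geisser), since this section does not say whether one was used; the function name is hypothetical. With three distractor types and n participants, the degrees of freedom are 2 and 2(n − 1).

```python
def rm_anova_oneway(data):
    """One-way repeated-measures ANOVA.

    data: one row per participant, each row holding that participant's score
    in every condition (same condition order in every row, no missing cells).
    Returns (F, df_conditions, df_error).
    """
    n = len(data)     # participants
    k = len(data[0])  # conditions (e.g. letter, tone, movement)
    grand = sum(sum(row) for row in data) / (n * k)
    cond_means = [sum(row[j] for row in data) / n for j in range(k)]
    subj_means = [sum(row) / k for row in data]
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    # Residual = condition x participant interaction once both main effects are removed.
    ss_error = ss_total - ss_cond - ss_subj
    df_cond, df_error = k - 1, (k - 1) * (n - 1)
    f_ratio = (ss_cond / df_cond) / (ss_error / df_error)
    return f_ratio, df_cond, df_error
```

A p value can then be read from the F distribution, for example with `scipy.stats.f.sf(F, df_cond, df_error)` if SciPy is available.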
No relationship was found between OMSI and score when modeling performance in either the melody recognition or the loudness change scale reproduction task. The OMSI is designed to categorize participants as more (>500) or less (<500) musically sophisticated. Comparing the imagery scores of participants categorized in this way confirmed the linear modeling result: there were no significant differences between these groups on either imagery task. However, years of musical training contributed to the model of accuracy in loudness change scale reproduction. Furthermore, a positive correlation of years of musical training with accuracy on the melody imagery task was found
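The two individual-difference analyses above amount to a threshold split on OMSI scores and a correlation between years of training and accuracy. A minimal sketch follows, assuming Pearson's r (the section does not name the coefficient). It also assumes that a score of exactly 500, whose handling is not stated, is left out of both groups. The helper names are hypothetical.

```python
import math


def omsi_groups(omsi_scores, threshold=500):
    """Split participant indices into more (> threshold) and less (< threshold)
    musically sophisticated groups. Scores exactly at the threshold are left
    out of both groups (an assumption; the source gives only > and <)."""
    more = [i for i, s in enumerate(omsi_scores) if s > threshold]
    less = [i for i, s in enumerate(omsi_scores) if s < threshold]
    return more, less


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

On Python 3.10 or later, `statistics.correlation` computes the same Pearson coefficient.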
This experiment aimed to determine the disruptive effects of rehearsing letter, tone, and movement sequences on mental imagery for changes in loudness. As predicted, rehearsing a movement sequence in mind significantly impaired the recall of loudness change scales. Rehearsing tone sequences did not, though rehearsing letter sequences, a task that could have involved subvocal motor rehearsal, came close to producing a significant impairment. Analyses of how well participants remembered the different distractor stimuli revealed that movement sequence recognition was consistently weaker than recognition of the other distractor sequences. Equating the difficulty of tasks used in working memory experiments is a vexed issue that receives relatively little discussion. While recognizing conductor gestures might be regarded as more difficult, it is just as likely that the gesture stimuli and task were simply less familiar than the letter and tone tasks. Familiarity refers to having knowledge of the material in long-term memory. Thus, future studies could strengthen task comparability by familiarizing participants with novel material such as gestures within the experiment. Alternatively, unfamiliar words and tones could be used to make the verbal and auditory distractors more comparable to the novel conducting gestures.
It seems likely that retaining the movement sequences presented a substantial cognitive load during the performance of any concurrent memory task. However, memory for melodies was only marginally impaired by the movement distractor task, and so its impact primarily concerned the specific task of reproducing imagined changes in loudness. While a motor response was required to reproduce these imagined loudness changes, evidence from a separate experiment suggests that mentally rehearsing the movement sequences does not impair use of the slider
Contrary to expectations, the accurate imagining of melodies was not significantly impaired by the concurrent rehearsal of tone sequences. Perhaps participants ignored the intervening tone sequence, choosing to prioritize mental rehearsal of the melodies. Such a strategy would have been associated with poor performance on the subsequent tone recognition task, yet accuracy was better than chance [
The retention of letter sequences was significantly higher than the retention of other distractor sequences during both the loudness change scale and melody tasks. Yet this superior letter recognition did not come at the price of memory for the loudness change scales or melodies. The letters were presented visually, but they were selected in the knowledge that they might be encoded phonologically and rehearsed as an acoustic image or by subvocal rehearsal. An absence of interference from the letters in the melody recognition task might suggest a visual rehearsal strategy, and the lack of significant interference in the loudness change scale recall task might point to a similar approach. This has an interesting corollary: if letter sequences were rehearsed visually, the successful imagining of loudness change scales must have been achieved as a motor or auditory image, and not as a visual image of crescendo and decrescendo markings.
The finding that imagery for musical changes in loudness is disrupted by the concurrent rehearsal of a movement sequence goes some way to answering the question of whether patterns of musical loudness are best described as a verbal, auditory, or motor representation. Vision must be added to the list of potential modalities, given that the conductor gestures were presented visually and that participants might have been translating the changes in loudness that they heard into a visual code of “hairpins” (score markings that indicate crescendi and decrescendi). The absence of an impairment from rehearsing tones does not seem to suggest an exclusively auditory image. Similarly, the lack of a statistically significant impairment from rehearsing a letter sequence does not point to a uniquely verbal labeling of loudness change information, such as “up” and “down.” The most likely scenario is that a balance of representation modalities was involved. Such a view is consistent with current accounts of working memory that emphasize interference from process rather than content; these accounts recognize the influence of task demands, task-relevant and task-irrelevant information, instructions, and context on performance (e.g., Marsh et al.,
Working memory involves simultaneous short-term storage and processing of information (Oberauer,
Participants in this study were able to read music, suggesting at least a minimal amount of musical training. Not only has a link been established between auditory imagery abilities and musical training (Aleman et al.,
Audio-motor coupling has been argued to be strong for musicians, and Baumann et al. (
In conclusion, we have presented behavioral evidence for motor processing in the imagining of musical changes in loudness. Although concurrent verbal and auditory distractor tasks did not significantly impair participants’ ability to imagine loudness change stimuli, we should not conclude that a uniquely motor representation drives imagery for musical loudness change. These verbal and auditory distractor tasks also failed to impair performance on a melody imagery task, in spite of previous research suggesting that melodic material is rehearsed as an auditory image. Future work is needed to determine why rehearsing the tone and letter sequences employed in the current study did not impair auditory and verbal imagery for musical stimuli as expected. Individual differences in imagery ability were evident, and it remains to be seen whether these differences are reflected in the processing modalities preferred when imagining musical stimuli. Our ongoing research is studying the use of mental imagery for loudness change by expert musicians in performance. The current experiment has provided an effective interference task for loudness change imagery in the guise of conductor gestures, allowing us to examine the strategies used by musicians when they cannot consciously plan (imagine) their expressive intentions.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Our thanks to Stephen Fazio for assistance with editing video clips, to Yvonne Leung for data collection, and to members of the Amateur Chamber Music Society for their generous help collecting data.
1A musical key is described by the scale to which most of the notes in a piece of music conform. For example, if most of the notes in a piece are in the scale of C major, with important pitches such as “C” (tonic) or “G” (dominant) occurring particularly often, its key is probably C major.
2Since moving a slider is a motor task, it was important to ensure that any impaired performance associated with movement sequence distractor conditions could not be attributable to physical motor production demands. A separate experiment required 12 participants (six from the current experiment and six new participants) to use the slider to indicate the loudness changes that they were hearing in the moment. Since this perceptual task did not require participants to rehearse or recall loudness change, we did not expect any interference from concurrent distractor conditions. Indeed, the participants’ accuracy in marking loudness changes under letter and movement sequence conditions was not significantly different (