Edited by: Javier Jaen, Universitat Politecnica de Valencia, Spain
Reviewed by: João Dinis Freitas, Microsoft Language Development Center, Portugal; Rubén San Segundo Hernández, Universidad Politécnica de Madrid, Spain
Specialty section: This article was submitted to Human-Media Interaction, a section of the journal Frontiers in ICT
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
We aimed to elucidate how our domain-general cuing algorithm improved multitasking performance and changed behavioral strategies in human operators.
Though many gaze-control systems have been designed, previous real-time gaze-aware assistance systems were not both successful and domain-general. It is largely unknown what constitutes optimal search efficiency using the eyes, or ideal control using the mouse. It is unclear what the best coordinating strategies are between these two modalities. Our previously developed closed-loop multitasking aid drastically improved multitasking performance, though the behavioral mechanisms through which it acted were unknown.
We performed in-depth analyses and generated novel eye tracking and mouse movement measures, to explore the complex effects of our helpful system on gaze and motor behavior.
Our overlay cuing algorithm improved control efficiency and reduced well-known biases in search patterns. This system also reduced micromanaging behavior, with humans rationally relying more on imperfect automation in experimental assistance cue conditions. We showed that mouse and gaze were more independently specialized in the helpful cuing condition than in control conditions. Specifically, with our aid, the gaze performed more global movement, and the mouse performed more local clustered movement. Further, the gaze shifted toward search over processing with the helpful cuing system. We also illustrated a relationship between the mouse and the gaze, such that in these studies, “the hand was quicker than the eye.”
Overall, results suggested that our cuing system improved performance and reduced short-term working memory load on humans by delegating it to the computer in real time. Further, it reduced the number of required repeated decisions by an estimated one per second. It also enabled the gaze to specialize for improved visual search behavior, and the mouse to specialize for improved control.
The vast majority of people are poor multitaskers (Watson and Strayer,
Tracking a participant’s eye movements while multitasking is an especially good way to glean optimal cognitive strategies. Much work has shown that eye tracking to determine point of gaze can reliably convey the location at which humans’ visual attention is currently directed (Just and Carpenter,
Multitasking principles also apply when managing multiple items in working memory (Heathcote et al.,
Though many paradigms have been developed to study multitasking using eye tracking, most traditional applications of eye tracking are not used in real time, but instead to augment training, or simply to observe optimal strategies. For an example of training, post-experiment analysis of gaze data can be used to determine an attention strategy of the best-performing participants or groups. Then, these higher-performing strategies can be taught during training sessions at a later date (Rosch and Vogel-Walcutt,
Real-time reminders for tasks can improve user performance (Moray,
The structure of our paper is as follows: the Materials and Methods section details the design of our cuing system, our previous evaluation of its basic effectiveness in improving multitasking performance, the participant details, technical implementation, and statistical procedures. All of our measures (except one) were novel and custom to this task. The analysis algorithms are therefore not included in the Methods section but are instead interleaved with the Results below, both for readability and because these analysis methods are new contributions in themselves. The Discussion then places the findings in the context of the engineering psychology literature and provides an in-depth review of related eye tracking work. We end the paper with brief conclusions and suggestions for wider application.
In a previous paper, we demonstrated that cuing participants with the inverse of eye-gaze recency (the most neglected task at the moment) drastically improved users’ performance (Taylor et al.,
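Although the full implementation is described in that paper, a minimal sketch of this inverse-recency (most-neglected-panel) cuing logic, in the spirit of the Python/PyGame implementation described below, is as follows; the panel layout and helper names (e.g., `panel_containing`) are illustrative assumptions rather than the published code:

```python
import time


class NeglectCuer:
    """Cue the map panel that has gone longest without being looked at."""

    def __init__(self, panel_rects):
        # panel_rects: {panel_id: (x, y, w, h)} screen rectangles (illustrative layout)
        self.panel_rects = panel_rects
        now = time.time()
        self.last_gazed = {pid: now for pid in panel_rects}

    def panel_containing(self, gx, gy):
        # Return the panel whose rectangle contains the current point of gaze, if any.
        for pid, (x, y, w, h) in self.panel_rects.items():
            if x <= gx < x + w and y <= gy < y + h:
                return pid
        return None

    def update(self, gx, gy):
        # Called once per gaze sample: refresh the recency of the fixated panel,
        # then return the most neglected panel (oldest gaze timestamp) to be framed.
        pid = self.panel_containing(gx, gy)
        if pid is not None:
            self.last_gazed[pid] = time.time()
        return min(self.last_gazed, key=self.last_gazed.get)
```

On each iteration of the game loop, the returned panel would receive the Helpful frame cue; in the Random control condition the framed panel would instead be drawn at random, and in the Off condition no frame would be drawn.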
To evaluate this system, participants played a multi-agent game (Ember’s game): they managed multiple simulated robotic firefighters simultaneously to save rescue victims. The semi-random automated movement of the robot would eventually rescue some targets, though occasional human intervention could speed rescue times. Each participant session had 7 experimental blocks, and the number of robots they managed increased from 4 to 10 across the 7 blocks, e.g., they managed 4 robots in Block-1, 5 robots in Block-2, and 10 robots in Block-7. Each robot moved in a separate map panel task. An image of the game and eye tracking algorithm design was included in Figure
This study adopted a between-subjects design with one independent variable: the type of frame cuing each of multiple simultaneous map panel tasks. Participants were randomly assigned to one of three conditions (one test and two control), determined by three frame cue types:
In our previous study, we analyzed data collected from participants’ eye movements, mouse movements, and task performance (score). The system displayed large improvements in performance in the On (Helpful frame cue) condition over both controls. In addition, the Helpful frame cue group demonstrated faster reaction times and showed reduced pupil dilation as a proxy for reduced cognitive load (Taylor et al.,
A total of 44 human subjects participated in Ember’s game. All procedures complied with departmental and university guidelines for research with human participants and were approved by the university institutional review board, and participants provided written informed consent. Participants were recruited from the university population at large and were compensated for their time with $5 USD. Data were not excluded based on behavioral task performance in order to obtain a generalizable sample of individual variation on performance of the task while avoiding a restriction of range (Myers et al.,
We used a desk-mounted GazePoint GP3 eye tracker positioned directly under the computer monitor to pinpoint the users’ point of gaze, i.e., the point on the screen the user was fixating. All experimental presentation procedures, data collection, and analyses were fully automated. Python and PyGame were used to program the experiment, and interfaced with the eye tracker’s open standard API via TCP/IP, generously provided by GazePoint (
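As a hedged illustration of how such an interface can be driven, the sketch below streams point-of-gaze samples from a GazePoint tracker over its Open Gaze API (TCP on port 4242, XML-like records with FPOGX/FPOGY fields); this is not our experiment code, and the exact commands and field names should be checked against the vendor documentation:

```python
import re
import socket


def gaze_samples(host="127.0.0.1", port=4242):
    """Yield (x, y) point-of-gaze samples, normalized to screen size, from a
    GazePoint tracker speaking the Open Gaze API over TCP/IP (assumed defaults)."""
    sock = socket.create_connection((host, port))
    # Request streaming of fixation point-of-gaze data records.
    sock.sendall(b'<SET ID="ENABLE_SEND_POG_FIX" STATE="1" />\r\n')
    sock.sendall(b'<SET ID="ENABLE_SEND_DATA" STATE="1" />\r\n')
    x_pat = re.compile(r'FPOGX="([^"]+)"')
    y_pat = re.compile(r'FPOGY="([^"]+)"')
    buffer = ""
    while True:
        buffer += sock.recv(4096).decode("ascii", errors="ignore")
        *records, buffer = buffer.split("\r\n")
        for record in records:
            mx, my = x_pat.search(record), y_pat.search(record)
            if mx and my:
                yield float(mx.group(1)), float(my.group(1))
```

Each yielded sample could then be scaled to screen pixels and passed to the cuing logic sketched earlier.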
Most statistics were displayed within figures themselves, either (1) as SEM bars, which in our experiment conservatively indicate statistically significant differences between groups by approximating
Measure | Fig. | 4 maps: On–Off | 4 maps: On–Ra | 5 maps: On–Off | 5 maps: On–Ra | 6 maps: On–Off | 6 maps: On–Ra | 7 maps: On–Off | 7 maps: On–Ra | 8 maps: On–Off | 8 maps: On–Ra | 9 maps: On–Off | 9 maps: On–Ra | 10 maps: On–Off | 10 maps: On–Ra
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
Target paths primary | 3A | 0.43 | 0.47 | 0.07 | 0.4 | 0.37 | 0.8 | 0.59 | 0.96 | 0.94 | 1.14 | 1.01 | 1.3 | 0.82 | 1.13 |
Num. paths total | 3B | 0.02 | 0.24 | 0.02 | 0.14 | 0.26 | 0.01 | 0.13 | 0.17 | 0.14 | 0.19 | 0.2 | 0.1 | 0.03 | 0.31 |
Num. of clicks | 3C | 0.06 | 0.35 | 0.04 | 0.26 | 0.31 | 0.26 | 0.06 | 0.59 | 0.05 | 0.63 | 0.04 | 0.59 | 0.02 | 0.87 |
Micromanage coefficient | 4A | 0.09 | 0.27 | 0.16 | 0.27 | 0.16 | 0.58 | 0.17 | 0.58 | 0.6 | 0.86 | 0.58 | 0.89 | 0.55 | 0.84 |
Path length mean | 4B | 0.17 | 0.42 | 0.14 | 0.49 | 0.56 | 0.95 | 0.83 | 1.02 | 1.05 | 1.03 | 0.95 | 1.0 | 0.83 | 0.9 |
Panel gaze switches | 5A | 0.54 | 0.31 | 0.68 | 0.65 | 0.7 | 0.74 | 0.31 | 0.39 | 0.34 | 0.39 | 0.5 | 0.26 | 0.3 | 0.3 |
Panel mouse switches | 5B | 1.06 | 1.11 | 0.48 | 0.99 | 0.36 | 0.68 | 0.57 | 0.83 | 0.84 | 1.09 | 0.48 | 0.8 | 0.59 | 1.3 |
Mouse mileage | 6A | 0.65 | 1.24 | 0.63 | 1.43 | 0.5 | 1.5 | 0.66 | 1.53 | 0.91 | 1.56 | 0.54 | 1.05 | 0.61 | 1.31 |
Mouse clustering | 6B | 0.56 | 0.84 | 0.17 | 0.86 | 0.18 | 0.79 | 0.37 | 1.08 | 0.4 | 1.25 | 0.31 | 0.87 | 0.31 | 0.91 |
Fixation to saccade | 7A | 0.41 | 0.48 | 0.25 | 0.44 | 0.32 | 0.68 | 0.22 | 0.58 | 0.28 | 0.63 | 0.57 | 0.35 | 0.54 | 0.74 |
Mouse-gaze separation | 7B | 0.39 | 0.17 | 0.41 | 0.39 | 0.44 | 0.29 | 0.15 | 0.25 | 0.48 | 0.26 | 0.4 | 0.01 | 0.68 | 0.39 |
Trans: Diag-Horiz, mouse | 9 | NA | NA | NA | NA | 0.02 | 0.31 | 0.48 | 0.7 | 0.31 | 0.57 | 0.46 | 0.42 | 0.5 | 0.83 |
The
The low number of tests within the proposed statistical families, the presence of consistent global trends, and the guidelines cited below all argue against correcting these values for multiple comparisons (Rothman,
Automated data processing and plotting were programmed in the R-project statistical environment (R Core Team,
To determine whether participants actually used the assistive frame cue, we employed two overlapping measures to see how quickly participants responded to frame cues, one with the eyes, and another with the mouse: (1) For the eyes, we calculated the mean duration of the most neglected frame cue, for each of the 7 blocks (i.e., 4, 5, … , 10 map task panels) in the treatment group, only in the Helpful “On” condition. This duration was defined as an interval from when the most neglected cue frame started to highlight a map panel task to when that frame disappeared from that panel (Figure
The duration from cue onset until the gaze or click interaction with that map task decreased as the number of maps increased (Figures
The decreasing durations for the Helpful frame cue condition suggested that the users were consistently looking at, and taking action (clicking) on, the highlighted map panel tasks, confirming usage of the Helpful frame cue. This decrease in time means that participants utilized the cuing system to a greater degree over time, up to an observed ceiling of compliance. The increase in duration with greater numbers of map tasks in the Random condition suggested that participants were choosing which map panel task to look at independently of the frame highlighting. This was as expected, since the participants in the Random condition were instructed that the frames were irrelevant to game-play.
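As an illustration of the latency computation, the two cue-response durations could be derived from event logs along the following lines; the log structure and field names here are assumptions for exposition, not the published analysis code:

```python
def cue_response_latencies(cue_onsets, gaze_events, click_events):
    """For each Helpful-frame onset, compute how long it took the gaze, and
    separately the mouse (first click), to reach the cued map panel.

    cue_onsets:   time-ordered list of (onset_time, panel_id) frame-cue onsets
    gaze_events:  time-ordered list of (time, panel_id) gaze samples mapped to panels
    click_events: time-ordered list of (time, panel_id) mouse clicks mapped to panels
    """
    gaze_latencies, click_latencies = [], []
    for onset, panel in cue_onsets:
        gaze_hit = next((t for t, p in gaze_events if t >= onset and p == panel), None)
        click_hit = next((t for t, p in click_events if t >= onset and p == panel), None)
        if gaze_hit is not None:
            gaze_latencies.append(gaze_hit - onset)
        if click_hit is not None:
            click_latencies.append(click_hit - onset)
    return gaze_latencies, click_latencies
```

Averaging each list within a block gives per-block durations of the kind plotted in the corresponding figures.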
Multiple measures of mouse and click efficiency were calculated to determine the strategies employed by users in the better-performing Helpful frame cue condition: (1) The total number of paths users set to send the robot to primary targets was calculated; (2) the total number of paths was calculated as a baseline reference; (3) the total number of clicks was calculated as another baseline reference.
The number of paths the users set to send the robot directly to primary targets was higher (better) with Helpful frames compared to both controls, while the Random condition had the lowest (worst) values (Figure
Use of the eye tracking cuing system improved the strategic efficiency of users' mouse interaction with the task. Since relevant clicks appeared to increase in number relative to irrelevant clicks in the Helpful condition, these measures motivated the next analysis, which explored micromanaging.
When performing the task, a user could either sequentially specify many intermediate path goals along the way to a target, or send the semi-automated robot directly toward a primary rescue target with a single click at its ultimate goal. These two strategies illustrate the range of behaviors for “micromanaging” the robot’s location and path. Two measures of micromanaging were defined. (1) The ratio of on-target path-goal clicks (clicks directly on the target) versus non-target (intermediate location) path-goal clicks was computed. This proportion formed our micromanaging coefficient, with lower numbers indicating less micromanaging (Figure
The Helpful frame cue condition demonstrated less micromanaging behavior than the Off frames condition, which in turn demonstrated less than the Random condition (Figure
These results highlighted the importance of utilizing automated systems to their fullest extent, often even when such automation is incomplete and error prone. Some robots heading directly to targets likely took inefficient paths, whereas other robots were not directed to targets at all, the latter being a much more important task to satisfy. This cuing system reduced such irrational perseverance and micromanaging behavior, and improved reliance on the semi-autonomous robots. Participants in the Helpful condition appeared to rely more on the robot’s sub-optimal automation.
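For concreteness, the micromanaging coefficient could be computed from path-goal clicks roughly as follows; here it is written as the share of intermediate (non-target) path goals, so that lower values indicate less micromanaging as described above, and the on-target radius is an illustrative assumption:

```python
import math


def micromanage_coefficient(path_goals, targets, on_target_radius=10.0):
    """Share of path-goal clicks that were intermediate waypoints rather than
    clicks directly on a rescue target; lower values mean less micromanaging.

    path_goals: list of (x, y) click positions that set a robot path goal
    targets:    list of (x, y) rescue-target positions in the same panel
    on_target_radius: pixel distance treated as 'directly on target' (assumed)
    """
    def on_target(gx, gy):
        return any(math.hypot(gx - tx, gy - ty) <= on_target_radius
                   for tx, ty in targets)

    if not path_goals:
        return 0.0
    intermediate = sum(1 for gx, gy in path_goals if not on_target(gx, gy))
    return intermediate / len(path_goals)
```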
To ascertain the types of movement strategies the mouse and gaze were performing across conditions, the global and local dynamics of gaze and mouse movements were explored. Multiple measures were calculated: (1) For both the mouse and the gaze, the number of times the mouse or gaze switched between map panel tasks in the array of 4–10 panels was calculated for each condition. The larger this number, the greater the global movement and large-scale task switching. (2) A related measure was generated by taking the mean time that the mouse or gaze spent within a single map panel task before leaving it and moving to the next map panel task, averaged across all map panel tasks in a single block.
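Both measures can be sketched in a few lines; the sketch below assumes a time-ordered stream of samples already mapped to panel identifiers (with `None` for positions between panels), which is an assumption about the preprocessing rather than the published code:

```python
def panel_switches_and_dwell(samples):
    """Count macro-level panel-to-panel switches and the mean dwell per visit.

    samples: time-ordered list of (time, panel_id) for either gaze or mouse;
             panel_id is None when the sample falls between panels (skipped here).
    """
    switches = 0
    dwell_durations = []
    current_panel, entered_at = None, None
    for t, panel in samples:
        if panel is None or panel == current_panel:
            continue
        if current_panel is not None:
            switches += 1                           # a macro-level panel switch
            dwell_durations.append(t - entered_at)  # time spent in the previous panel
        current_panel, entered_at = panel, t
    mean_dwell = sum(dwell_durations) / len(dwell_durations) if dwell_durations else 0.0
    return switches, mean_dwell
```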
The Helpful frames increased the number of macro-level map panel task switches for gaze relative to the two control conditions (Figure
These results suggested that gaze movements were more distributed, long range, or global, while the mouse made fewer macro-level map panel task transitions, perhaps for more efficient task switching. These results inspired the next measure, extending the investigation of mouse movements.
To further characterize the global–local properties of mouse movements, these were analyzed using two measures: (1) The total cumulative distance covered by the mouse traversing the computer monitor over the course of a block was computed for each condition and block. (2) A novel measure of mouse clustering was defined. Mouse movements were classified into clusters by setting an upper limit (2 cm) on the Euclidean distance between a minimum number of sequential adjacent mouse coordinates, which spanned roughly 100 ms. We then quantified the proportion of mouse movements not in clusters (large movements) versus within clusters (small adjacent movements). The proportion of adjacent long movements (>2 cm) versus shorter movements estimated the degree to which mouse movements were clustered. Larger values indicate a greater proportion of long movements, and smaller values indicate greater clustering (smaller movements).
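A simplified sketch of these two measures follows; it collapses the clustering criterion to a per-sample distance threshold (omitting the minimum run-length over ~100 ms used in the full measure) and assumes a pixels-per-centimeter conversion factor:

```python
import math


def mouse_mileage_and_long_moves(positions, long_move_cm=2.0, px_per_cm=38.0):
    """Total mouse travel distance and the proportion of adjacent moves longer
    than ~2 cm (larger proportions indicate less clustered movement).

    positions: time-ordered list of (x, y) mouse samples in pixels
    px_per_cm: assumed screen density used to convert the 2 cm threshold
    """
    threshold_px = long_move_cm * px_per_cm
    total_px, long_moves, steps = 0.0, 0, 0
    for (x0, y0), (x1, y1) in zip(positions, positions[1:]):
        step = math.hypot(x1 - x0, y1 - y0)
        total_px += step
        steps += 1
        if step > threshold_px:
            long_moves += 1
    proportion_long = long_moves / steps if steps else 0.0
    return total_px, proportion_long
```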
Surprisingly, despite the above discussed decrease in global mouse movement (number of panel mouse switches), an increase in mouse mileage was observed in the Helpful experimental condition, indicating more local movement relative to global movement with the mouse (Figure
The opposite pattern observed previously between the gaze and mouse measures of map panel task switches agreed with the current result of higher mouse mileage alongside fewer global mouse switches. Together, these indicated a functional specialization. Specifically, the mouse and gaze could work more independently; i.e., the gaze could search broadly, and the mouse could move in a more clustered, local manner. In conclusion, with Helpful frames compared to the other conditions, the eye-gaze appeared to specialize for global search, and the mouse for local movement.
Much previous work defines the basic terminology for eye-gaze patterns (Goldberg and Kotval,
To further explore search behavior, two measures were computed: (1) We estimated the fixation-to-saccade ratio. Fixations were gazes lasting >120 ms within roughly 1 cm, while any shorter-duration movements were classified as saccades. (2) To explore the temporal relationship between the mouse and gaze during search, we calculated the mouse-gaze separation duration, a measure of how long the mouse and gaze were separated before reuniting. Specifically, when the mouse and gaze fell on the same square centimeter of the screen, they were classified as overlapping. The mean duration between these overlap events (i.e., the mean duration of the time segments during which the mouse and gaze were separate) was computed and plotted (Figure
In the Helpful frames condition, the fixation-to-saccade ratio was lower, with a smaller proportion of fixations or greater proportion of saccades (Figure
More short saccades and fewer long fixations would classically be interpreted to mean that participants searched more and processed less, respectively. These results indicated that the eye gaze may have specialized for search behavior (saccades) over processing (fixations), which could then be more efficient and directed. This was congruent with the above finding of greater global movement of the eyes with Helpful frames, and it further supports the notion that the mouse and gaze were functionally and spatially independent to a greater degree, i.e., functionally specialized, in the Helpful frames condition.
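For reference, a dispersion-style version of the fixation/saccade classification described above might look as follows; the pixels-per-centimeter conversion and the greedy windowing are assumptions made for illustration:

```python
import math


def fixation_to_saccade_ratio(samples, min_fix_ms=120, dispersion_px=38.0):
    """Classify gaze samples into fixations (>120 ms within ~1 cm) versus
    shorter, saccade-like movements, and return the fixation-to-saccade ratio.

    samples: time-ordered list of (time_ms, x, y) gaze samples in pixels
    dispersion_px: ~1 cm limit, converted with an assumed pixels-per-cm factor
    """
    fixations, saccades = 0, 0
    i = 0
    while i < len(samples) - 1:
        t0, x0, y0 = samples[i]
        j = i + 1
        # Grow the window while the gaze stays within the dispersion radius.
        while j < len(samples) and math.hypot(samples[j][1] - x0,
                                              samples[j][2] - y0) <= dispersion_px:
            j += 1
        if samples[j - 1][0] - t0 >= min_fix_ms:
            fixations += 1
        else:
            saccades += 1
        i = j
    return fixations / saccades if saccades else float("inf")
```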
To further elucidate mouse-gaze relationships, three more novel measures were defined: (1) Mouse-gaze distance around click. It was predicted that the mouse and gaze would be closer in 2D space around the time of a click, relative to when clicks were not occurring. To test this prediction, the distance between the mouse and gaze was plotted as a function of time, time-locked to the click, averaged across all clicks and across all blocks (Figure
The mouse and gaze appeared distant from each other, except when they came close immediately after a click (Figure
The mouse-gaze-click analysis further extended the earlier observation that the mouse and gaze may have been functionally specialized to a greater degree in the Helpful frames “On” condition. An equivalence of gaze-click-click and gaze-mouse-click plots indicated that the mouse stills, slowing its movement just before the click to a greater degree than the gaze, which continues to move up until the click. The click analyses and the time-stepped correlation both indicate, but do not necessitate, the conclusion that the mouse preceded the gaze. These two findings supported the adage that “the hand is quicker than the eye,” and previous studies (Land et al.,
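The click-time-locked averaging behind the mouse-gaze distance measure can be sketched as follows; the common sample clock, window size, and step are illustrative assumptions:

```python
import math


def mean_distance_around_clicks(mouse, gaze, click_times, window_ms=1000, step_ms=50):
    """Average mouse-gaze distance as a function of time relative to each click.

    mouse, gaze: dicts mapping sample time (ms) to (x, y), assumed resampled onto
                 a common clock aligned with the click timestamps
    click_times: list of click timestamps (ms)
    Returns {offset_ms: mean_distance_px} over a symmetric window around clicks.
    """
    offsets = range(-window_ms, window_ms + 1, step_ms)
    sums = {off: 0.0 for off in offsets}
    counts = {off: 0 for off in offsets}
    for click in click_times:
        for off in offsets:
            t = click + off
            if t in mouse and t in gaze:
                mx, my = mouse[t]
                gx, gy = gaze[t]
                sums[off] += math.hypot(mx - gx, my - gy)
                counts[off] += 1
    return {off: sums[off] / counts[off] for off in offsets if counts[off]}
```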
Much previous work has demonstrated that human observers have a bias toward horizontal detection and transition over a diagonal, even when diagonal transitions are optimal (Megaw and Richardson,
Participants transitioned more frequently between map panel tasks diagonally, relative to horizontally, in the Helpful frames condition compared to controls (Figure
In addition to our previous demonstration of reducing global bias (Taylor et al.,
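The horizontal/diagonal transition classification can be sketched by mapping each panel to grid coordinates; the row-major numbering and grid width here are assumptions about the layout rather than the published analysis:

```python
def transition_counts(panel_sequence, grid_cols):
    """Count horizontal, vertical, and diagonal transitions between map panels,
    assuming panels are numbered row-major on a grid with `grid_cols` columns.

    panel_sequence: time-ordered list of panel indices visited by gaze or mouse
    """
    counts = {"horizontal": 0, "vertical": 0, "diagonal": 0}
    for a, b in zip(panel_sequence, panel_sequence[1:]):
        if a == b:
            continue
        row_a, col_a = divmod(a, grid_cols)
        row_b, col_b = divmod(b, grid_cols)
        if row_a == row_b:
            counts["horizontal"] += 1
        elif col_a == col_b:
            counts["vertical"] += 1
        else:
            counts["diagonal"] += 1
    return counts
```

A diagonal-versus-horizontal comparison of the kind reported above would then follow directly from these counts.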
To measure effect sizes, Cohen’s
We confirmed that there were no incidental differences between subject groups in each condition for measured features known to influence experimental performance. To do so, we tested the null hypothesis that each group had the same population mean using ANOVA for the following measures: (A) hours of sleep in the previous week did not differ (
Dividing attention over multiple tasks or entities is notoriously problematic for most humans (Watson and Strayer,
In supervisory control sampling tasks, where operators perform something like instrument scanning or sampling (Moray,
The costs for task switching are many: the rapid decision making required, the code-switching needed to switch between tasks, and the observed perseverance bias for continuing tasks longer than ideal (Jersild,
Irrational perseverance is also illustrated by a past study showing that when noticing a problem with one element, operators failed to continue monitoring all tasks well, and did not move on effectively (Moray and Rotenberg,
To ameliorate task switching costs, visual external cues for task switching may assist operators (Allport et al.,
Our studies took advantage of these phenomena to optimize the primary task, when a secondary task is also helpful to perform, but can be performed by the computer. Our frame cue aid may allow for more full focus on a single map, with more effective and rapid task switching between maps.
With multiple entities to track and evaluate, past studies showed that the human user is often limited by functional working memory load. Different working memory resources are thought to exist for different modalities or mental processes; for example, visual working memory has been proposed to be stored in what is called the visuospatial sketchpad (Salway and Logie,
With industrial relevance, vigilance tasks are hypothesized to produce continuous loads on working memory (Parasuraman,
To perform optimally, the user must, among other things, remember which location was monitored last, since the longer the time that has elapsed since a check, the greater the probability of a situation requiring human assistance; this, however, is also a spatial working memory task. With high numbers of tasks, working memory load increases, as measured by the blunting of the typical assistance provided by peripheral preview (Tulga and Sheridan,
Our algorithm is the first to aid multitasking in a domain-general manner. It is likely that one mechanism of action was the delegation of short-term working memory to the computer in real time, freeing cognitive resources from a secondary task so that they could be invested in the primary task.
Extensive research has been performed under DARPA initiatives to further real-time assistance for technical scenarios (St. John et al.,
Similarly, human–robot interaction must be optimized for human performance; such studies take the form of designing command interfaces, optimizing naturalness, gesture-based communication, hardware design, and social considerations (Waldherr et al.,
Though simple cuing can assist users in knowing what to attend to in multitasking scenarios (Seidlits et al.,
Gaze location can be used to modify a display or physical device in real time, depending on either pupil size or the location of gaze itself; this work is summarized in the following five parts: (1) Augmented cognition, (2) Contingency, (3) Gaze-control, (4) Gaze-aware systems, and (5) Previous limitations.
The DARPA augmented cognition initiative has considered the use of gaze tracking as a potentially important tool for assisting users in military scenarios (Crosby et al.,
Contingent (interactive) eye tracking is defined as modifying a display or process in real time based on gaze location, and has previously been used in lab experiments, though often as an impediment rather than as a means of optimizing performance. Early in the development of the technology, the fields of linguistics and reading employed paradigms like the moving window paradigm (Reder,
A very large body of work exists on gaze-based paradigms intended to control computerized systems, wheelchairs, or other robots with the eyes. These have been pioneered both inside academia and out, for groups of individuals with diseases like amyotrophic lateral sclerosis (ALS), a motor-neuron disease paralyzing the body while leaving eye movements intact. These studies most often involve the movement of a cursor, wheelchair, or accessories via the point of gaze, while defining a variety of mouse-click paradigms, including blinks and others, as well as the use of gaze location to control computer graphical menus, zooming of windows, or context-sensitive presentation of information (Jacob,
Many of these gaze-control paradigms are very beneficial, and some include gaze-aware features, though we would like to distinguish purely gaze-aware systems from pure control systems, or control systems with some additional assistive function. The experiments presented here demonstrate a gaze-aware system that can improve performance without direct input and may assist operators in a variety of scenarios, both control and non-control. We now progress to further discussion of gaze-aware components.
Most gaze-display paradigms have some control component, though some displays are purely gaze-aware but not intended for user assistance: for example, video compression is used in gaze contingent displays (GCDs) to maintain image resolution while compressing the periphery and to optimize computational resources (Reingold et al.,
Most of the currently existing gaze-aware systems or AUIs actually also include an active control component, for example, in the selection of which window to interact with, or which menu item to enlarge. Many of these attentive or gaze-aware interfaces are also quite domain-specific, with examples including reading, menu selection, scrolling, or information presentation (Bolt,
In robot interaction, most real-time eye interfaces have been designed to mimic gaze for social or communicative reasons (Staudte and Crocker,
A common theme in gaze-aware interfaces is to attempt to predict the users’ preferences or gaze location in domain-specific scenarios, such as map scanning, reading, eye-typing, or entertainment media (Goldberg and Schryver,
Though some have attempted domain-general systems, such attempts have interfered with the user by obscuring the display inflexibly (e.g., making everything looked at entirely opaque), or only apply to a specific subset of behavior, such as within some types of search (Pavel et al.,
Given the seeming obviousness of our solution, it is notable that, despite the need for and benefit of such a situation-blind multitasking aid, no others exist to date. The benefits were non-trivial, with very large statistical effect sizes and potential for wide generalization. Previous eye trackers cost upwards of $20,000 USD, which often made this an infeasible solution. Recently, however, eye trackers have come down in price; the eye tracker used in this study was produced on a 3D printer and offers an open API, and many others have now published designs for open and inexpensive hardware (Babcock and Pelz,
In part, previous applications attempted to predict human intentions, to better provide the human with what they wanted. For general benefits, it is more efficient to eliminate prediction and instead define a domain-general probability utility over the visual display, such that the display is transparent, non-interfering, and assistive. Our application may function by delegating short-term memory load to the computer and reducing the number of repeated decisions the user needs to make. Highlighting the gaze history on multiple tracking tasks could theoretically improve performance on many tasks where the probability of task relevance relates to the delay since gaze, even in complex ways. The design of the assistive algorithms tested here demonstrated clearly greater application-independence than any previous AUIs or gaze-aware interfaces. Rather than enabling novel control means, these experiments demonstrated an overlay procedure to transparently accelerate normal interaction or pre-existing control, likely generalizing to a wide variety of multi-agent applications.
PT designed experiment, with assistance from NB, HS, and ZH. NB programmed experiment, with contributions and edits by PT. PT and ZH ran human subjects. ZH and PT programmed data analysis. PT wrote manuscript, with contributions from HS, ZH, and NB. HS supervised all research.
Source code for the experimental game will be provided upon request, under the open GPLv3 license.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
We thank the Office of Naval Research, award: N00014-09-1-0069.