# MODULARITY IN MOTOR CONTROL: FROM MUSCLE SYNERGIES TO COGNITIVE ACTION REPRESENTATION

EDITED BY: Andrea d'Avella, Martin Giese, Yuri P. Ivanenko, Thomas Schack and Tamar Flash
PUBLISHED IN: Frontiers in Computational Neuroscience

### *Frontiers Copyright Statement*

*© Copyright 2007-2016 Frontiers Media SA. All rights reserved. All content included on this site, such as text, graphics, logos, button icons, images, video/audio clips, downloads, data compilations and software, is the property of or is licensed to Frontiers Media SA ("Frontiers") or its licensees and/or subcontractors. The copyright in the text of individual articles is the property of their respective authors, subject to a license granted to Frontiers.*

*The compilation of articles constituting this e-book, wherever published, as well as the compilation of all other content on this site, is the exclusive property of Frontiers. For the conditions for downloading and copying of e-books from Frontiers' website, please see the Terms for Website Use. If purchasing Frontiers e-books from other websites or sources, the conditions of the website concerned apply.*

*Images and graphics not forming part of user-contributed materials may not be downloaded or copied without permission.*

*Individual articles may be downloaded and reproduced in accordance with the principles of the CC-BY licence subject to any copyright or other notices. They may not be re-sold as an e-book.*

*As author or other contributor you grant a CC-BY licence to others to reproduce your articles, including any graphics and third-party materials supplied by you, in accordance with the Conditions for Website Use and subject to any copyright notices which you include in connection with your articles and materials.*

> *All copyright, and all rights therein, are protected by national and international copyright laws.*

*The above represents a summary only. For the full conditions see the Conditions for Authors and the Conditions for Website Use.*

ISSN 1664-8714
ISBN 978-2-88919-805-4
DOI 10.3389/978-2-88919-805-4

# About Frontiers

Frontiers is more than just an open-access publisher of scholarly articles: it is a pioneering approach to the world of academia, radically improving the way scholarly research is managed. The grand vision of Frontiers is a world where all people have an equal opportunity to seek, share and generate knowledge. Frontiers provides immediate and permanent online open access to all its publications, but this alone is not enough to realize our grand goals.

# Frontiers Journal Series

The Frontiers Journal Series is a multi-tier and interdisciplinary set of open-access, online journals, promising a paradigm shift from the current review, selection and dissemination processes in academic publishing. All Frontiers journals are driven by researchers for researchers; therefore, they constitute a service to the scholarly community. At the same time, the Frontiers Journal Series operates on a revolutionary invention, the tiered publishing system, initially addressing specific communities of scholars, and gradually climbing up to broader public understanding, thus serving the interests of the lay society, too.

# Dedication to Quality

Each Frontiers article is a landmark of the highest quality, thanks to genuinely collaborative interactions between authors and review editors, who include some of the world's best academicians. Research must be certified by peers before entering a stream of knowledge that may eventually reach the public - and shape society; therefore, Frontiers only applies the most rigorous and unbiased reviews.

Frontiers revolutionizes research publishing by freely delivering the most outstanding research, evaluated with no bias from both the academic and social point of view. By applying the most advanced information technologies, Frontiers is catapulting scholarly publishing into a new generation.

# What are Frontiers Research Topics?

Frontiers Research Topics are very popular trademarks of the Frontiers Journals Series: they are collections of at least ten articles, all centered on a particular subject. With their unique mix of varied contributions from Original Research to Review Articles, Frontiers Research Topics unify the most influential researchers, the latest key findings and historical advances in a hot research area! Find out more on how to host your own Frontiers Research Topic or contribute to one as an author by contacting the Frontiers Editorial Office: researchtopics@frontiersin.org

# **MODULARITY IN MOTOR CONTROL: FROM MUSCLE SYNERGIES TO COGNITIVE ACTION REPRESENTATION**

Topic Editors:

**Andrea d'Avella,** University of Messina and IRCCS Fondazione Santa Lucia, Italy
**Martin Giese,** University Clinic Tuebingen and Hertie Institute Tuebingen, Germany
**Yuri P. Ivanenko,** IRCCS Fondazione Santa Lucia, Italy
**Thomas Schack,** Bielefeld University, Germany
**Tamar Flash,** Weizmann Institute, Israel

Modules are like bricks in a construction game: they represent building blocks that capture key organizational principles underlying versatility and adaptability in motor control. The many articles contributing to this Research Topic address modularity at different levels and demonstrate the impressive breadth of research currently being undertaken on this topic. Image by Yuri Ivanenko and Andrea d'Avella

Mastering a rich repertoire of motor behaviors, as humans and other animals do, is a surprising and still poorly understood outcome of evolution, development, and learning. Many degrees of freedom, non-linear dynamics, and sensory delays provide formidable challenges for controlling even simple actions. Modularity as a functional element, both structural and computational, of a control architecture might be the key organizational principle that the central nervous system employs for achieving versatility and adaptability in motor control. Recent investigations of muscle synergies, motor primitives, compositionality, basic action concepts, and related work in machine learning have contributed, at different levels, to advancing our understanding of the modular architecture underlying rich motor behaviors.

However, the existence and nature of the modules in the control architecture are far from settled. For instance, regularity and low-dimensionality in the motor output are often taken as an indication of modularity, but could they simply be a byproduct of optimization and task constraints? Moreover, what are the relationships between modules at different levels, such as muscle synergies, kinematic invariants, and basic action concepts?

One important reason for the new interest in understanding modularity in motor control from different viewpoints is the impressive development in cognitive robotics. In comparison to animals and humans, the motor skills of today's best robots are limited and inflexible. However, robot technology is maturing to the point at which it can start approximating a reasonable spectrum of isolated perceptual, cognitive, and motor capabilities. These advances allow researchers to explore how these motor, sensory and cognitive functions might be integrated into meaningful architectures and to test their functional limits. Such systems provide a new test bed to explore different concepts of modularity and to address the interaction between motor and cognitive processes experimentally.

Thus, the goal of this Research Topic is to review, compare, and debate theoretical and experimental investigations of the modular organization of the motor control system at different levels. By bringing together researchers seeking to understand the building blocks for coordinating many muscles, for planning endpoint and joint trajectories, and for representing motor and behavioral actions in memory, we aim at promoting new interactions between often disconnected research areas and approaches and at providing a broad perspective on the idea of modularity in motor control. We welcome original research, methodological, theoretical, review, and perspective contributions from behavioral, system, and computational motor neuroscience research, cognitive psychology, and cognitive robotics.

**Citation:** d'Avella, A., Giese, M., Ivanenko, Y. P., Schack, T., Flash, T., eds. (2016). Modularity in Motor Control: From Muscle Synergies to Cognitive Action Representation. Lausanne: Frontiers Media. doi: 10.3389/978-2-88919-805-4

# Table of Contents


Y. P. Ivanenko, G. Cappellini, I. A. Solopova, A. A. Grishin, M. J. MacLellan,

R. E. Poppele and F. Lacquaniti

# **Section 2: Muscle synergies**


Katherine M. Steele, Matthew C. Tresch and Eric J. Perreault

*141 Quantitative evaluation of muscle synergy models: a single-trial task decoding approach*

Ioannis Delis, Bastien Berret, Thierry Pozzo and Stefano Panzeri


Enrico Chiovetto, Bastien Berret, Ioannis Delis, Stefano Panzeri and Thierry Pozzo

*190 Effort minimization and synergistic muscle recruitment for three-dimensional force generation*

Daniele Borzelli, Denise J. Berger, Dinesh K. Pai and Andrea d'Avella


Cristiano De Marchis, Maurizio Schmid, Daniele Bibbo, Anna Margherita Castronovo, Tommaso D'Alessio and Silvia Conforto

*325 Distinguishing synchronous and time-varying synergies using point process interval statistics: motor primitives in frog and rat*

Corey B. Hart and Simon F. Giszter

# **Section 3: Motor primitives at the kinematic level**


*380 From ear to hand: the role of the auditory-motor loop in pointing to an auditory source*

Eric O. Boyer, Bénédicte M. Babayan, Frédéric Bevilacqua, Markus Noisternig, Olivier Warusfel, Agnes Roby-Brami, Sylvain Hanneton and Isabelle Viaud-Delmon

*389 Spatio-temporal analysis reveals active control of both task-relevant and task-irrelevant variables*

Kornelius Rácz and Francisco J. Valero-Cuevas

*406 Dynamic primitives in the control of locomotion*

Neville Hogan and Dagmar Sternad

# **Section 4: Neural substrates**


Simon A. Overduin, Andrea d'Avella, Jose M. Carmena and Emilio Bizzi


Olesya A. Mokienko, Alexander V. Chervyakov, Sofia N. Kulikova, Pavel D. Bobrov, Liudmila A. Chernikova, Alexander A. Frolov and Mikhail A. Piradov

# **Section 5: Models**


Cristiano Alessandro, Juan Pablo Carbajal and Andrea d'Avella

*559 Learned graphical models for probabilistic planning provide a new class of movement primitives*

Elmar A. Rückert, Gerhard Neumann, Marc Toussaint and Wolfgang Maass


*613 Synergetic motor control paradigm for optimizing energy efficiency of multijoint reaching via tacit learning*

Mitsuhiro Hayashibe and Shingo Shimoda


# **Section 6: Robotics**

# *690 Learning modular policies for robotics*

Gerhard Neumann, Christian Daniel, Alexandros Paraschos, Andras Kupcsik and Jan Peters

*703 MACOP modular architecture with control primitives*

Tim Waegeman, Michiel Hermans and Benjamin Schrauwen

*716 A soft body as a reservoir: case studies in a dynamic model of octopus-inspired soft robotic arm*

Kohei Nakajima, Helmut Hauser, Rongjie Kang, Emanuele Guglielmino, Darwin G. Caldwell and Rolf Pfeifer

*735 Kinematic primitives for walking and trotting gaits of a quadruped robot with compliant legs*

Alexander T. Spröwitz, Mostafa Ajallooeian, Alexandre Tuleu and Auke Jan Ijspeert

# **Section 7: Intermittent control**


# **Section 8: Action representation**

*779 From action representation to action execution: exploring the links between cognitive and biomechanical levels of motor control*

William M. Land, Dima Volchenkov, Bettina E. Bläsing and Thomas Schack

# Editorial: Modularity in motor control: from muscle synergies to cognitive action representation

Andrea d'Avella<sup>1, 2</sup>\*, Martin Giese<sup>3</sup>, Yuri P. Ivanenko<sup>2</sup>\*, Thomas Schack<sup>4</sup> and Tamar Flash<sup>5</sup>

*<sup>1</sup> Department of Biomedical Sciences and Morphological and Functional Images, University of Messina, Messina, Italy, <sup>2</sup> Laboratory of Neuromotor Physiology, Santa Lucia Foundation, Rome, Italy, <sup>3</sup> Section for Computational Sensomotorics, Department of Cognitive Neurology, Hertie Institute for Clinical Brain Research and Center for Integrative Neuroscience, University Clinic Tuebingen, Tuebingen, Germany, <sup>4</sup> Research Group Neurocognition and Action-Biomechanics and Cognitive Interaction Technology-Center of Excellence, Bielefeld University, Bielefeld, Germany, <sup>5</sup> Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot, Israel*

Keywords: modularity, motor control, muscle synergies, motor primitives, compositionality, action representation, robotics

Mastering a rich repertoire of motor behaviors, as humans and other animals do, is a surprising and still poorly understood outcome of evolution, development, and learning. Many degrees of freedom, non-linear dynamics, and sensory delays provide formidable challenges for controlling even simple actions. Modularity as a functional element, both structural and computational, of a control architecture might be the key organizational principle that the central nervous system employs for achieving versatility and adaptability in motor control. Recent investigations of muscle synergies, motor primitives, compositionality, basic action concepts, and related work in machine learning have contributed, at different levels, to advancing our understanding of the modular architecture underlying rich motor behaviors.

However, the existence and nature of the modules comprising the control architecture are far from settled. For instance, regularity and low-dimensionality of the motor output are often taken as an indication of modularity, but they could simply be a byproduct of optimization and task constraints. Moreover, what are the relationships between modules at different levels, such as muscle synergies, kinematic invariants, and basic action concepts?

One important reason for the new interest in understanding modularity in motor control from different perspectives is the impressive development in cognitive robotics. In comparison to animals and humans, the motor skills of today's best robots are limited and inflexible. However, robot technology is maturing to the point at which it can start approximating a reasonable spectrum of different perceptual, cognitive, and motor capabilities. These advances allow researchers to explore how these motor, sensory, and cognitive functions might be integrated into meaningful architectures and to test their functional limits. Such systems provide a new test bed to explore different concepts of modularity and to experimentally investigate possible interactions between motor and cognitive processes.

Thus, the goal of this Research Topic is to review, compare, and debate theoretical and experimental studies of the modular organization of the motor control system at different levels. By bringing together researchers seeking to understand the building blocks for coordinating many muscles, planning endpoint and joint trajectories, and representing motor and behavioral actions in memory, we aim at promoting new interactions between often disconnected research areas and approaches and at providing a broad perspective on the notion of modularity in motor control.

### Edited by:

*Si Wu, Beijing Normal University, China*

Reviewed by: *Malte J. Rasch, Beijing Normal University, China*

### \*Correspondence:

*Andrea d'Avella and Yuri P. Ivanenko a.davella@hsantalucia.it; y.ivanenko@hsantalucia.it*

Received: *11 September 2015*
Accepted: *22 September 2015*
Published: *09 October 2015*

### Citation:

*d'Avella A, Giese M, Ivanenko YP, Schack T and Flash T (2015) Editorial: Modularity in motor control: from muscle synergies to cognitive action representation. Front. Comput. Neurosci. 9:126. doi: 10.3389/fncom.2015.00126*

# Reviews and Perspectives

A number of review articles present and discuss available evidence, conceptual frameworks, and fundamental questions concerning modularity in motor control. These cover a range of issues such as the effective dimensionality, movement invariants, neural underpinnings, evolution, motor learning, and recovery of motor function.

Lacquaniti et al. (2013) provide a comprehensive review of evolutionary and developmental modules. These authors focus on the modular control of locomotion to argue that the building blocks used to construct different locomotor behaviors are similar across several animal species, presumably related to ancestral neural networks of command. The authors present evidence that modular units of development are highly preserved and recombined during evolution.

In a thought-provoking review article, Duysens et al. (2013) argue that there is a large overlap between the notion of modules and the older concept of reflexes. They reason that facilitation of the flexor synergy at the end of the stance phase is linked to the activation of the circuitry responsible for the generation of locomotor patterns (the "central pattern generator," CPG). More specifically, they suggest that the responses in that period relate to the activation of a flexor burst generator, a structure that forms the core of a new asymmetric model of the CPG. Beloozerova et al. (2013) review data on the differential controls for the shoulder, elbow, and wrist exerted by populations of neurons in the thalamo-cortical network. This is one manifestation of a modular organization of the control of locomotion. The authors hypothesize that it contributes to effective control of a global limb parameter, the length of the stride, which results in greatly reduced variability of paw placement during accurate stepping.

Santello et al. (2013) propose a theoretical framework to reconcile important and still debated concepts such as the definitions of "fixed" vs. "flexible" synergies and the mechanisms underlying the combination of synergies for hand control. d'Avella and Lacquaniti (2013) review recent results from the analysis of reaching muscle patterns supporting a control strategy consisting of the sequencing of time-varying muscle synergies. Alessandro et al. (2013b) review work on muscle synergies in neuroscience and control engineering and provide an overview of the methods that have been employed to test the validity of the control scheme. Specifically, the authors suggest that, to assess the functional role of muscle synergies, synergy extraction methods should explicitly take into account task execution variables. Bizzi and Cheung (2013) address two critical questions: whether muscle synergies are explicitly encoded in the nervous system, and how muscle synergies simplify movement production and motor learning.

Another important field of research is the outcome of interventions in neurological disorders with motor deficits. Uncovering a common underlying neural framework for the modular control of movements and its dysfunction represents an interesting avenue for future work. Casadio et al. (2013) review the state of the art of computational models for neuromotor recovery from stroke through exercise and their implications for treatment. The review specifically covers models of recovery at the central, functional, and muscle synergy levels. Ivanenko et al. (2013) review various examples of the adaptation of locomotor patterns in patients and discuss the findings in the general context of compensatory gait mechanisms, spatiotemporal architecture, and modularity of the locomotor program. Such investigations may have important implications for the construction of gait rehabilitation technology. Further research needs to clarify whether plasticity in muscle patterns originates from the sharing of common modules or from the creation of new muscle synergies, and whether rehabilitation programs may benefit from revitalizing the modules underlying motor behaviors.

# Muscle Synergies

Amongst the original research articles, a large group of contributions is dedicated to the modular organization of multi-muscle activity across different motor tasks. It has been hypothesized that the nervous system simplifies muscle control through modularity, using neural patterns to activate muscles in groups called synergies.
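The synergy hypothesis above has a concrete computational form: the EMG of many muscles is modeled as a non-negative combination of a few synergies, typically recovered by non-negative matrix factorization. The following sketch illustrates the idea on synthetic data; the dimensions, random data, and multiplicative update rule (Lee–Seung) are illustrative assumptions, not any specific study's pipeline.

```python
import numpy as np

# Model: EMG ≈ W @ C, where W (muscles x synergies) holds the muscle
# weightings of each synergy and C (synergies x time) their activations.

rng = np.random.default_rng(0)
M, T, N = 12, 200, 3                  # muscles, time samples, synergies

W_true = rng.random((M, N))           # ground-truth synergy weights
C_true = rng.random((N, T))           # ground-truth activations
emg = W_true @ C_true                 # synthetic EMG envelopes

def extract_synergies(V, n_syn, n_iter=500, eps=1e-9):
    """Factorize V (muscles x time) into non-negative W and C."""
    m, t = V.shape
    W = rng.random((m, n_syn)) + eps
    C = rng.random((n_syn, t)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep every entry non-negative.
        C *= (W.T @ V) / (W.T @ W @ C + eps)
        W *= (V @ C.T) / (W @ C @ C.T + eps)
    return W, C

W, C = extract_synergies(emg, N)
r2 = 1 - np.sum((emg - W @ C) ** 2) / np.sum((emg - emg.mean()) ** 2)
print(f"variance accounted for: {r2:.3f}")
```

The variance-accounted-for criterion printed at the end is the usual yardstick for choosing the number of synergies in the studies discussed below.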

One example of ongoing debate concerns the critical aspects and organization of muscle synergies. de Rugy et al. (2013) argue that the usefulness of muscle synergies as a control principle should be evaluated in terms of the errors produced and, using data from a force-aiming task in two dimensions, illustrate through simulation how synergy decomposition inevitably introduces substantial task-space errors. They also show that the number of synergies required to approximate the optimal muscle pattern for an arbitrary biomechanical system increases with task-space dimensionality, which indicates that the capacity of synergy decomposition to explain behavior depends critically on the scope of the original database. Steele et al. (2013) present evidence that the number and choice of muscles impact the results of muscle synergy analyses; researchers should thus be cautious in evaluating muscle synergies when EMG is measured from a small subset of muscles.

Delis et al. (2013a,b) stress the effectiveness of the decoding metric in systematically assessing muscle synergy decompositions in task space and the functional role of trial-to-trial correlations between synergy activations. The results of Chiovetto et al. (2013) support the notion that each EMG decomposition provides a set of well-interpretable muscle synergies, identifying a reduction of dimensionality in different aspects of the movements. Borzelli et al. (2013) test whether the CNS generates forces by minimum-effort recruitment of either individual muscles or muscle synergies during the generation of isometric forces at the hand; the minimum-effort recruitment of synergies predicts the observed muscle patterns better than the minimum-effort recruitment of individual muscles. Russo et al. (2014) compare the torques acting at four arm joints during fast reaching movements in different directions and show that muscle pattern dimensionalities are higher than torque dimensionalities. They argue that this is necessary to overcome the non-linearities of the musculoskeletal system and to flexibly generate endpoint trajectories with simple kinematic features using a limited number of building blocks.
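The minimum-effort comparison above rests on a simple computation that can be sketched numerically: given a linear map from activations to endpoint force, the effort-minimal activation for a target force is the pseudoinverse solution, computed either over individual muscles or over a fixed set of synergies. The pulling directions and synergy weights below are illustrative assumptions, and the non-negativity constraint on real muscle activity is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(1)
n_muscles = 10
angles = np.linspace(0, 2 * np.pi, n_muscles, endpoint=False)
H = np.vstack([np.cos(angles), np.sin(angles)])   # 2 x muscles: activation -> planar force

W = rng.random((n_muscles, 4))                    # hypothetical synergy weights
target = np.array([1.0, 0.5])                     # desired endpoint force

# Minimum-norm (minimum-effort) solutions via the pseudoinverse.
a_muscle = np.linalg.pinv(H) @ target             # recruit muscles individually
c_syn = np.linalg.pinv(H @ W) @ target            # recruit synergies
a_syn = W @ c_syn                                 # muscle pattern implied by synergies

print("force error (muscles):  ", np.linalg.norm(H @ a_muscle - target))
print("force error (synergies):", np.linalg.norm(H @ a_syn - target))
print("effort, muscles vs synergies:", np.sum(a_muscle**2), np.sum(a_syn**2))
```

Both recruitment schemes reproduce the target force exactly here; constraining activations to the synergy span can only raise the muscle-space effort, which is what makes the comparison of predicted patterns against observed EMG informative.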

In the context of direction-specific recruitment of muscle synergies, Gentner et al. (2013) investigate adaptation to a visuomotor rotation of a virtual target displacement and show that the structure of muscle synergies is preserved, suggesting that changes in muscle patterns are obtained by rotating the directional tuning of the synergy recruitment. Bengoetxea et al. (2014a,b) employ a dynamic recurrent neural network (DRNN) and principal component analysis of EMG activity during discrete and rhythmic arm movements. The authors discuss consistent patterns of muscle groupings in the context of their functional organization for controlling orthogonal movement directions. Berger and d'Avella (2014) record EMG activity and isometric hand forces during a force-aiming task in a virtual environment. In contrast to de Rugy et al. (2013), they show that muscle synergies can be used to generate target forces in multiple directions with the same accuracy achieved using individual muscles. Strikingly, human subjects are able to perform the task immediately after switching from force control to EMG control and synergy control, suggesting that muscle synergies provide an effective strategy for motor coordination.

Whether muscle synergies are shared across tasks or are task-specific is another debated aspect of modularity. Chvatal and Ting (2013) compare muscle synergies during multidirectional support-surface perturbations during standing and walking, as well as during unperturbed walking. They find both shared and task-specific muscle synergies, suggesting that differences in muscle synergies across conditions reflect differences in the biomechanical demands of the tasks and that muscle synergies may define a repertoire of biomechanical subtasks recruited according to task-level goals. Frere and Hug (2012) demonstrate that muscle synergies are consistent across experienced gymnasts, even during a skilled motor task that requires learning. De Marchis et al. (2013) investigate muscle synergies during pedaling in humans. Additional modules are identified when visual feedback about mechanical effectiveness is available, and the structure of the identified modules is found to be similar to that extracted in other studies of human walking, confirming the existence of shared and task-specific muscle synergies. Finally, Hart and Giszter (2013) present a method that uses point process statistics to discriminate the forms of synergies in motor pattern data. According to this method, frog and rat EMG data are most consistent with synchronous synergy models, supporting separate control of the rhythm and pattern of motor primitives.

# Motor Primitives at the Kinematic Level

A number of contributions aim at understanding motor primitives at the kinematic level. Zelman et al. (2013) explore whether different octopus arm movements are built up of elementary kinematic units by decomposing surfaces representing the curvature and torsion values of the paths of points along the arm into weighted combinations of 2D Gaussian functions, considered as motion primitives at the kinematic level of octopus arm movements. Endres et al. (2013a) investigate the endpoint trajectories of human movements (sign language) that are characterized by power laws linking velocity and curvature. The parameters of these power laws are exploited for the unsupervised segmentation of actions into movement primitives. Sternad et al. (2013) propose that the control of sensorimotor behavior may utilize dynamic primitives. Their results clearly indicate a gradual transition between discrete and rhythmic arm movements, supporting the proposal that representation is based on primitives rather than on veridical internal models. Boyer et al. (2013) investigate interactions between the auditory and motor systems to uncover the modular neural processes involved in the multisensory and motor representations of targets in goal-directed movements and the corresponding reference frames for each sensory modality. Rácz and Valero-Cuevas (2013) suggest that the similar nature of control actions across time scales in both task-relevant and task-irrelevant spaces points to a level of modularity not previously recognized in motor tasks. Hogan and Sternad (2013) propose that the spectacular performance of a wide range of upper- and lower-limb behaviors arises from encoding motor commands in terms of three classes of dynamic primitives: submovements, oscillations, and mechanical impedances. They present methods for addressing the challenges posed by the experimental identification of these dynamic primitives and consider the implications of this theoretical framework for locomotor rehabilitation.
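The power laws linking velocity and curvature mentioned above can be illustrated with a minimal numerical sketch: a planar path is traversed at the speed prescribed by the two-thirds power law (in its tangential-velocity form, v = g·κ^(-1/3)), and the exponent is then recovered by log-log regression, the fitting step that underlies segmentation approaches of this kind. The elliptical path and gain are illustrative choices.

```python
import numpy as np

theta = np.linspace(0, 2 * np.pi, 2000, endpoint=False)
a, b, g, beta = 2.0, 1.0, 1.0, 1.0 / 3.0

# Analytic curvature of the ellipse (a*cos(theta), b*sin(theta)).
kappa = a * b / (a**2 * np.sin(theta)**2 + b**2 * np.cos(theta)**2) ** 1.5

# Speed prescribed by the power law v = g * kappa**(-beta).
v = g * kappa ** (-beta)

# Recover the exponent as the slope of log v against log kappa.
slope, intercept = np.polyfit(np.log(kappa), np.log(v), 1)
print(f"fitted exponent: {-slope:.3f}")   # close to 1/3 by construction
```

On recorded trajectories the same regression is run on empirically estimated speed and curvature, and the fitted exponent and gain become the parameters used for segmentation.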

# Neural Substrates

Another exciting area explored in this Research Topic is the potential neural substrates for modularity in motor control and action representation. Takei and Seki (2013) discuss the synaptic and functional linkage between spinal interneurons and the organization of hand-muscle synergies. Abeles et al. (2013) discuss the compositional structure of hand movements by analyzing and modeling neural and behavioral data obtained from experiments in which monkeys performed scribbling movements. A classification of the neural data employing a hidden Markov model shows a coincidence of the neural states with the behavioral categories of movement segmentations that are primarily parabolic in shape. Overduin et al. (2014) investigate whether muscle synergies evoked by intracortical microstimulation (ICMS) in rhesus macaques are similarly encoded by nearby motor cortical units during object reach, grasp, and carry movements. They find that the synergy most strongly evoked at an ICMS site matches the synergy most strongly encoded by proximal units more often than expected by chance. The results suggest a common neural substrate for microstimulation-evoked motor responses and for the generation of muscle patterns during natural behaviors. Krouchev and Drew (2013) describe a modular organization of the locomotor step cycle in the cat in which a number of sparse synergies are activated sequentially during unobstructed locomotion and during voluntary gait modifications. The authors argue that the changes in phase and magnitude of a finite number of muscle synergies could be produced by changes in the activity of neurons in the motor cortex. Mokienko et al. (2013) study motor imagery of grasping movements and the corresponding neural underpinnings in human subjects trained with a brain-computer interface.

# Models

A number of modeling papers address different aspects of modularity. In a multi-directional reaching task simulated with a musculoskeletal model of the human arm, Rückert and d'Avella (2013) propose a movement primitive representation that employs parametrized basis functions, which combines the benefits of muscle synergies and dynamic movement primitives, and show how movement primitives can be used to learn appropriate muscle excitation patterns and to generalize effectively to new reaching skills. Sartori et al. (2013) use Gaussian-shaped impulsive excitation curves, or primitives, as the input drive for large musculoskeletal models across different human locomotion tasks. Alessandro et al. (2013a) examine the feasibility of controlling non-linear dynamical systems by linear combinations of a small set of torque profiles, or motor synergies, and suggest that, in order to realize an effective and low-dimensional controller, synergies should embed features of both the desired tasks and the system dynamics.
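The basis-function representation of movement primitives used in several of these models can be sketched in a few lines: a smooth excitation profile is approximated as a weighted sum of Gaussian basis functions, with the weights obtained by linear least squares. The target profile, number of bases, and widths below are illustrative assumptions, not parameters from any of the papers.

```python
import numpy as np

t = np.linspace(0.0, 1.0, 200)
target = np.sin(np.pi * t) ** 2          # stand-in for a muscle excitation pattern

n_basis, width = 8, 0.05
centers = np.linspace(0.0, 1.0, n_basis)

# Design matrix: one Gaussian bump per column, evaluated at every time sample.
Phi = np.exp(-(t[:, None] - centers[None, :]) ** 2 / (2 * width))

# Fit the basis weights by linear least squares and reconstruct the profile.
w, *_ = np.linalg.lstsq(Phi, target, rcond=None)
approx = Phi @ w

rmse = np.sqrt(np.mean((target - approx) ** 2))
print(f"RMSE of {n_basis}-basis reconstruction: {rmse:.4f}")
```

Because the profile is linear in the weights, learning reduces to adjusting a small weight vector rather than the full time course, which is the computational appeal of such representations.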

Significant progress has been made with respect to some fundamental questions concerning the optimization of control architectures and motor learning. Rückert et al. (2012) propose a movement primitive representation based on probabilistic inference in learned graphical models, with properties that comply with salient features of biological movement control. In simulations of a complex 4-link balancing task, they show that movement primitives facilitate learning and lead to better generalization. Endres et al. (2013b) address the selection of the parameters, or of the type, of movement primitive models and propose an approach based on a Laplace approximation to the posterior distribution of the parameters of a given blind source separation model. They validate the approach on simulated data and on human gait data, finding that an anechoic mixture model with a temporal smoothness constraint on the sources can best account for the data. Kuppuswamy and Harris (2014) investigate whether muscle synergies can reduce the state-space dimensionality while maintaining task control. Based on the observation that constraining the control input to a weighted combination of temporal muscle synergies also constrains the dynamic behavior of a system in a trajectory-specific manner, they show that smooth straight-line Cartesian trajectories with bell-shaped velocity profiles emerge as the optima for the reaching task and that trajectory- and synergy-specific dimensionality reduction results from muscle synergy control. Hayashibe and Shimoda (2014) aim at identifying a modular control architecture that realizes adaptability and optimality without prior knowledge of the system dynamics. They propose a novel motor control paradigm based on tacit learning with task-space feedback. The proposed paradigm can optimize solutions for reaching with a three-joint, planar biomechanical model, acquiring motor synergies and finding energy-efficient solutions for different load conditions.

A few contributions further examine the use of neural networks. Schilling et al. (2013) demonstrate a solution for the selection and sequencing of the different (attractor) states required to control different behaviors of a hexapod walker, such as forward walking at different speeds, backward walking, and negotiation of tight curves. The proposed control architecture, a recurrent neural network, is characterized by different types of modules arranged in layers and columns, and can also be considered a holistic system showing emergent properties that cannot be attributed to any specific module. Hoellinger et al. (2013) describe the use of a dynamic recurrent neural network (DRNN), mimicking the natural oscillatory behavior of human locomotion, for reproducing the planar covariation rule in both legs at different walking speeds. This emergent property of the artificial neural network resonates with recent advances in the neurophysiology of the inhibitory neurons involved in central nervous system oscillatory activities. The main message of this study is that this type of DRNN may offer a useful model of physiological central pattern generators, both for gaining insight in basic research and for developing clinical applications.

Ehrenfeld et al. (2013) address, from a modeling perspective, the question of how the brain maintains a probabilistic body state estimate over time. Their results show that the neural estimates can detect and decrease the impact of false sensory information, propagate conflicting information across modules, and improve overall estimation accuracy through additional module interactions. Finally, Tagliabue and McIntyre (2014) review different formulations of concurrent models for sensory integration and propose a modular approach in which the overall behavior is built by computing multiple concurrent comparisons carried out simultaneously in a number of different reference frames.

# Robotics

Findings in biological research concerning a modular control hierarchy, which combines movement/motor primitives into complex and natural movements, inspire engineers in the quest for adaptive and skillful robot control. Neumann et al. (2014) present a unified approach for learning a modular control architecture, introducing new policy search algorithms that are based on information-theoretic principles and are able to learn to select, adapt, and sequence the building blocks to compose more complex behaviors. The authors summarize their experiments on learning modular control architectures in simulation and with real robots. Waegeman et al. (2013) propose a modular architecture with control primitives (MACOP) that uses a set of controllers, each of which becomes specialized in a subregion of its joint and task space. The authors evaluate MACOP on a numerical model of a robot arm by training it to generate desired trajectories, and show how MACOP compensates for the dynamic effects caused by a fixed control rate and the inertia of the robot. Nakajima et al. (2013) explore the idea that control, which is conventionally thought to be handled by the brain or a controller, can partially be outsourced to the physical body and the interaction with the environment. Using a soft robotic arm inspired by the octopus, they show in a number of experiments how control is partially incorporated into the physical arm's dynamics and how the arm's dynamics can be exploited to approximate non-linear dynamical systems. Spröwitz et al. (2014) implement kinematic primitives for walking and trotting gaits of a quadruped robot and show that a very low complexity of modular, rhythmic, feed-forward motor control is sufficient for level-ground locomotion in combination with passive compliant legged hardware.

# Intermittent Control

Evidence for intermittency in human motor control has been repeatedly observed in the neural control of movement literature, and it is discussed in this Research Topic in the context of the modular organization of the motor control system. Karniel (2013) focuses on an area in which intermittent control has not yet been thoroughly considered: the structure of muscle synergies. He presents the minimum transition hypothesis and its predictions for synergy structure. D'Andola et al. (2013) demonstrate that the control of interceptive movements (catching a flying ball) relies on a combination of reactive and predictive processes through the intermittent recruitment of time-varying muscle synergies. van de Kamp et al. (2013) explore modular organization in whole-body control architecture within the intermittent control paradigm, with an intermittent interval of around 0.5 s. The authors suggest that parallel sensory input converges to a serial, single-channel process involving planning, selection, and temporal inhibition of alternative responses prior to low-dimensional motor output, and that this process may underlie the flexibility of human control. Such studies may have important implications for the design of brain-machine interfaces and human-robot interaction.
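The flavor of intermittent control with an update interval of about 0.5 s can be conveyed by a toy simulation in which an unstable first-order load is stabilized by a proportional command that is refreshed only intermittently (sample-and-hold) rather than continuously. The plant and gain values below are illustrative assumptions, not taken from the cited studies.

```python
# Intermittent (sample-and-hold) control of an unstable load x' = a*x + u:
# the command u is recomputed only every 0.5 s, yet the state still decays.
a = 2.0          # unstable plant pole
gain = 4.0       # proportional feedback gain
dt = 0.001       # Euler integration step (s)
interval = 0.5   # intermittent update interval (s)

x, u, t = 1.0, 0.0, 0.0
next_update = 0.0
trace = []
while t < 5.0:
    if t >= next_update:          # refresh the command intermittently
        u = -gain * x
        next_update += interval
    x += dt * (a * x + u)         # plant evolves continuously
    t += dt
    trace.append(abs(x))

print(trace[-1])  # the state decays despite the 0.5 s control gaps
```

With these values the held command over-corrects within each interval, so the state alternates in sign while shrinking by a roughly constant factor per interval; the sketch shows only that discrete, widely spaced command updates can be sufficient for stabilization.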

# Action Representation

The final theme we have identified in the contributions centers on the modular organization of, and the interaction between, motor and cognitive processes. Land et al. (2013) explore the links between cognitive and biomechanical levels of motor control in order to understand the extent to which the output at the kinematic level is governed by representations at the cognitive level. The authors apply a new spatio-temporal decomposition method for assessing the memory structures underlying complex actions, to investigate the overlap between the structure of motor representations in memory and their corresponding kinematic structures.

Taken together, this Research Topic demonstrates the impressive breadth of research currently being undertaken on modularity in motor control.

# Acknowledgments

Supported by the EU Seventh Framework Programme (FP7-ICT No 248311 AMARSi).


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 d'Avella, Giese, Ivanenko, Schack and Flash. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Evolutionary and developmental modules

#### **Francesco Lacquaniti<sup>1,2,3</sup>\*, Yuri P. Ivanenko<sup>3</sup>, Andrea d'Avella<sup>3</sup>, Karl E. Zelik<sup>3</sup> and Myrka Zago<sup>3</sup>**

<sup>1</sup> Department of Systems Medicine, University of Rome Tor Vergata, Rome, Italy

<sup>2</sup> Centre of Space Bio-Medicine, University of Rome Tor Vergata, Rome, Italy

<sup>3</sup> Laboratory of Neuromotor Physiology, IRCCS Santa Lucia Foundation, Rome, Italy

### **Edited by:**

Tamar Flash, Weizmann Institute, Israel

### **Reviewed by:**

Dougal Tervo, Janelia Farm Research Campus, Howard Hughes Medical Institute, USA

Vincent C. K. Cheung, Massachusetts Institute of Technology, USA

### **\*Correspondence:**

Francesco Lacquaniti, Laboratory of Neuromotor Physiology, IRCCS Santa Lucia Foundation, Via Ardeatina 306, 00179 Rome, Italy. e-mail: lacquaniti@caspur.it

The identification of biological modules at the systems level often follows top-down decomposition of a task goal, or bottom-up decomposition of multidimensional data arrays into basic elements or patterns representing shared features. These approaches have traditionally been applied to mature, fully developed systems. Here we review some results from two other perspectives on modularity, namely the developmental and evolutionary perspectives. There is growing evidence that modular units of development were highly preserved and recombined during evolution. We first consider a few examples of modules well identifiable from morphology. Next we consider the more difficult issue of identifying functional developmental modules. We dwell especially on the modular control of locomotion to argue that the building blocks used to construct different locomotor behaviors are similar across several animal species, presumably related to ancestral neural networks of command. A recurrent theme from comparative studies is that the developmental addition of new premotor modules underlies the postnatal acquisition and refinement of several different motor behaviors in vertebrates.

**Keywords: activation pattern, CPG, gene expression, interneurons, locomotion**

# **INTRODUCTION**

Modules are easily defined in human-engineered product design. The final product is split up into components (modules) that are designed, manufactured, and tested independently. The modules can be reused in different systems that perform different tasks. Modular industrial design tends to reduce manufacturing complexity and costs. Parsimony and the reduction of complexity and of evolutionary costs presumably also underlie modular solutions in biological systems, but biological modules are not always easily identifiable. We will return later to the issue of the cost factors possibly underlying the evolution of modules. Here we first address the issue of the identification of biological modules.

According to Fodor (1983), modular systems are decomposable into a set of independent processes. However, the degree of independence among the basic components is not fixed. According to Simon (1969), only in fully decomposable systems are the interactions among components negligible in comparison with those within components. Nearly decomposable systems are those in which the interactions among components are weak but not negligible. Finally, un-decomposable systems are those in which the interactions among components are as strong as those within components.

A biological module could be defined on the basis of structural (morphological), functional, or developmental elements. A structural module is defined by a set of spatially defined, interconnected elements. A functional module is a discrete entity whose function is separable from that of other modules (Hartwell et al., 1999). A developmental module is a component of a developing organism (e.g., an embryo) that is semi-autonomous relative to pattern formation and differentiation, or relative to a signaling cascade (Schlosser and Wagner, 2004).

Spatially bounded structures (e.g., the ribosomes) are the most clearly defined modules at a subcellular level. At a functional level, the ensemble of cellular proteins (the proteome) can be partitioned into a limited set of complexes, which combine differentially to support diverse functions (Ravasz et al., 2002; Gavin et al., 2006). The identification of biological modules at the systems level is even more challenging. Top-down decomposition may help in recognizing a modular structure. A complex event (such as the goal of a motor task) can be detailed by decomposing it into several more basic informational events. If the system being analyzed is structured as a hierarchy, decomposition can be performed recursively (iteratively), each stage leading to finer and finer basic components (Palmer and Kimchi, 1986). Successive decompositions remove some of the complexity that is inherent in higher levels of description, but they raise the issue of when to stop. Ideally, one would like to stop when a stage of primitive informational events has been reached. One could take as primitive some computationally plausible set of operations that is sufficient to perform the task. This approach has been used in the study of vision, for instance (Palmer, 1999). Thus, the task of visual object identification is decomposed into the distinct operations involved in image-based, surface-based, object-based, and category-based processing.

Modules may also be identified using bottom-up decomposition instead of top-down decomposition. Starting from a multidimensional data array (e.g., electromyographic, EMG, activity), we aim at identifying basic components that represent common features shared across the original data (for instance, the output variables in a motor task; Lacquaniti et al., 2012b; d'Avella and Lacquaniti, 2013). Here the basic components represent building blocks that are used to construct a given behavior; therefore, they can be considered as primitives in a computational sense (Flash and Hochner, 2005; Bizzi et al., 2008; Giszter et al., 2010).
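As a concrete sketch of such bottom-up decomposition, the following toy example factors a synthetic nonnegative "EMG" matrix into a few basic components using multiplicative updates in the style of non-negative matrix factorization, one of the algorithms commonly used for synergy extraction. The data, dimensions, and parameter values are all illustrative assumptions, not drawn from the cited studies.

```python
import math
import random

random.seed(0)

M, T, K = 8, 60, 2  # muscles, time samples, number of extracted components

# Synthetic nonnegative "EMG": mixture of two underlying temporal patterns.
true_W = [[random.random() for _ in range(K)] for _ in range(M)]
true_H = [[abs(math.sin(0.2 * t + 1.5 * k)) for t in range(T)] for k in range(K)]
V = [[sum(true_W[m][k] * true_H[k][t] for k in range(K)) + 0.01 * random.random()
      for t in range(T)] for m in range(M)]

def reconstruct(W, H):
    return [[sum(W[m][k] * H[k][t] for k in range(K)) for t in range(T)]
            for m in range(M)]

def sq_error(A, B):
    return sum((A[m][t] - B[m][t]) ** 2 for m in range(M) for t in range(T))

# Random nonnegative initialization of the factors.
W = [[random.random() for _ in range(K)] for _ in range(M)]
H = [[random.random() for _ in range(T)] for _ in range(K)]
err0 = sq_error(V, reconstruct(W, H))

eps = 1e-9
for _ in range(200):
    R = reconstruct(W, H)
    # Multiplicative update for H (keeps entries nonnegative).
    for k in range(K):
        for t in range(T):
            num = sum(W[m][k] * V[m][t] for m in range(M))
            den = sum(W[m][k] * R[m][t] for m in range(M)) + eps
            H[k][t] *= num / den
    R = reconstruct(W, H)
    # Multiplicative update for W.
    for m in range(M):
        for k in range(K):
            num = sum(V[m][t] * H[k][t] for t in range(T))
            den = sum(R[m][t] * H[k][t] for t in range(T)) + eps
            W[m][k] *= num / den

err = sq_error(V, reconstruct(W, H))
print(err0, err)  # the reconstruction error drops sharply
```

Applied to real recordings, the number of components is of course unknown; it is typically chosen by the variance accounted for, by cross-validation, or, as in Endres et al. (2013b), by Bayesian model comparison.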

Top-down and bottom-up approaches can be merged using hierarchical multiple layers of representation. For example, hand action representations have been modeled as involving a temporal postural synergy (one kind of motor module) only if the ancestors of the synergy in the tree-structured organization are themselves involved in the action representation (Tessitore et al., 2013). The advantage of combining the two approaches lies in the possibility of validating an *a priori* hypothesis about the structure of the task, implied in the top-down decomposition, with the *post hoc* identification of basic components derived from the bottom-up decomposition.

# **EVOLUTIONARY AND DEVELOPMENTAL PERSPECTIVES**

The approaches described above have been applied primarily to mature, fully developed systems. A different perspective on modularity is provided by comparative developmental and evolutionary studies (e.g., Gilbert et al., 1996; Wagner and Altenberg, 1996; Winther, 2001; Schlosser and Wagner, 2004; Flash and Hochner, 2005; Cheung, 2007; Gerhart and Kirschner, 2007; Raff, 2007; Giszter et al., 2010). Evolutionary developmental biology (often referred to as Evo-Devo) aims at understanding how evolutionary trajectories (phylogeny) are constrained by developmental rules (ontogeny), and how developmental rules themselves evolve. Thus, processes evolve to produce new patterns of development, new gene regulation, new morphologies, and new behaviors (Raff, 2007).

There is now much evidence that modular units of development were highly preserved and recombined during evolution. It has been argued that biological modularity is the result of evolution and facilitates evolution (Schlosser and Wagner, 2004). Evolvability, that is, the capacity of a system for adaptive evolution, is positively affected by modularity when the developmental modules of an organism match the modularity of specific adaptive functions (Kirschner and Gerhart, 1998; Schlosser and Wagner, 2004).

There are several applications of the concept of modularity from an evolutionary developmental perspective. We first consider a few examples of anatomical modules, because these are relatively well identifiable and provide a model for the more difficult identification of modules in motor control. During development, anatomical modules form integrated series of relatively distinct, autonomous components that help partition different parts of the embryo, reducing the effects of any change on the organism as a whole (Reno et al., 2008). The developmental modules have a genetically discrete organization, resulting in identifiable domains within the developing organism, and these modules undergo evolution (Raff, 2007). The most basic modular gene expression networks provide the crucial regulation of body-plan development at the phylum level of the animal kingdom (Raff, 2007).

A striking example of genetically controlled modularity was provided by Halder et al. (1995), who obtained ectopic eyes on the wings, legs, and antennae of *Drosophila* by misexpression of the "eyeless" cDNA. (The products of the "eyeless" and "twin of eyeless" genes are homologs of the vertebrate Pax6.) The ectopic eyes consisted of complete eye structures and were responsive to light, indicating that "eyeless" is the master control gene for the genesis of the fly eye.

Studies of fossils of Placodermi (prehistoric fishes) showed separate evolution of jaws and teeth, through distinct developmental modules, topographically related but each independently genetically regulated (Smith and Johanson, 2003; Rücklin et al., 2012). Comparative embryological studies showed that the transcription factor Satb2 specifies a developmental module within the mandibular jaw (Fish et al., 2011). Satb2 is expressed in the mesenchyme of the jaw primordia that gives rise to distal elements of both the upper and lower jaws.

The development of the limbs and their evolutionary adaptation to locomotion and other functions provide another important example of modularity. Limb development begins in the embryological limb field, followed by the ectoderm bulging out as the limb bud. Certain homeobox (Hox) genes act as selector genes with spatially regulated expression patterns that specify differential growth in distinct modules. Hox genes are widely shared across the animal kingdom, from insects to reptiles and mammals. In general, these genes are linearly ordered in a sequence that maps onto the spatial order and timing of development of different body regions. Duplication of Hox genes can produce new body segments, and this mechanism probably played an important role in the evolution of segmented animals. In particular, HoxA and HoxD genes specify segment identity along the limb axis.

It is well known that tetrapod limbs evolved from fish fins. Teleosts are ray-finned fishes that diverged from the lobe-finned ancestors of tetrapods more than 400 million years ago. The fin buds of the living zebrafish (a teleost) exhibit the early phases of Hoxd13 expression that are observed in tetrapod limb buds, despite the fact that the appendages of zebrafish have no skeletal homologs in tetrapods (Raff, 2007). The zebrafish shows walking-like movements of the pectoral fins during slow swimming (Thorsen et al., 2004). The late-phase HoxD expression pattern was present in primitive bony fish but was lost during evolution, together with the posterior portion of the ancestral fin, in teleosts. It was retained, however, in sarcopterygians like *Tiktaalik*, and co-opted into the tetrapod limb (Raff, 2007). *Tiktaalik* was an extinct fish (late Devonian) with limb-like fins that probably allowed it to "walk" on land.

Hoxa and Hoxd gene expression patterns also demarcate the boundaries of several developmental domains in the limbs of birds and mammals. Thus, the five HoxD genes expressed in distal limb buds regulate the formation of the five digits in living tetrapods. In particular, anthropoids (humans, apes, New World, and Old World monkeys) exhibit differential adaptations of forearm and digital skeletal proportions to specific locomotor modes. It has been argued that Hox-defined developmental modules may have served as evolutionary modules during hand evolution in anthropoids (Reno et al., 2008). On the basis of forearm and digital morphometric data in several anthropoid species, and of correlations with the spatial patterns of Hoxd expression territories during bone growth, Reno et al. (2008) postulated the existence of at least two developmentally independent growth modules: (1) a posterior digit module, which includes the distal radius, posterior digit metacarpals, and phalanges of the long digits, and (2) a distal thumb module. The growth of the posterior digit module is possibly regulated by Hoxd11, and that of the thumb module by Hoxa13 and Hoxd13. Thus, the posterior digit elongation observed in atelines and colobines (New World and Old World monkeys, respectively) could have been achieved through an up-regulation of Hoxd11 expression (Reno et al., 2008). An elongated thumb, on the other hand, may interfere with brachiation; thus, thumb reduction in atelines and colobines may have been an adaptation to brachiation, resulting from the modulation of Hoxa13/Hoxd13 expression. The radius and posterior digits in humans are short relative to those of other higher primates, whereas the thumb phalanges are very long. This pattern could be achieved by the combination of up-regulation of the targets of Hoxd13 and/or Hoxa13 expression and down-regulation of Hoxd11 targets (Reno et al., 2008).

### **SPINAL CIRCUITS FOR THE CONTROL OF LOCOMOTION**

The identification of functional modules in motor control is more difficult than that of anatomical modules, but similar principles of evolutionary conservation probably apply. In particular, there is growing evidence that the building blocks used to construct locomotion are similar across several animal species, presumably related to ancestral neural networks of command. In all vertebrates, spinal neuronal networks termed Central Pattern Generators (CPGs) generate the basic rhythms and patterns of motor neuron (MN) activation for locomotion (Grillner, 2006). CPGs are controlled by descending supraspinal inputs (including locomotor command regions in the brainstem), as well as by sensory inputs. Comparative studies in vertebrates using genetic and electrophysiological tools have consistently shown that there are several common principles in the organization and regulation of CPGs (Goulding, 2009; Kiehn, 2011). In particular, the core premotor components of locomotor circuitry mainly derive from a set of embryonic interneurons that are remarkably conserved across different species (Goulding, 2009). Grillner (2011) suggested that the neural control system for locomotion can be traced back to the lamprey, a jawless fish-like vertebrate that appeared about 560 million years ago, before any legged animal had evolved. Notably, not only the spinal CPG modules but also some supraspinal centers that contribute importantly to locomotor control, such as the basal ganglia, have been conserved throughout vertebrate phylogeny starting with the lamprey (Grillner et al., 2013).

In addition to the segmentally organized MNs that innervate adjacent myotomes, there are a few basic classes of neurons in the vertebrate locomotor CPG with established homologies across different aquatic (lamprey, *Xenopus*, zebrafish) and terrestrial (mouse) species (Grillner, 2006; Goulding, 2009; Kiehn, 2011). (1) Glycinergic inhibitory commissural interneurons project to the opposite side of the spinal cord, and provide the mid-cycle inhibition underlying left-right alternation: the muscles on each side of the body must contract out of phase with those on the opposite side. These interneurons are termed inhibitory CINs in the lamprey and *Xenopus*, CoSA neurons in the zebrafish, and V0<sup>D</sup> neurons in the mouse. (2) Ipsilaterally projecting inhibitory interneurons provide inhibition to MNs and to commissural interneurons, and can regulate the speed of locomotion. They are termed L-interneurons in the lamprey, CiA neurons in the zebrafish, and V1 neurons in the mouse. The V1 class includes Renshaw cells and Ia-inhibitory interneurons. (3) Glutamatergic excitatory interneurons project to the other CPG cell types. A number of these interneurons provide rhythmic drive to MNs and other CPG neurons. They are termed EIN in the lamprey, CiD neurons in the zebrafish, and V2a neurons in the mouse.

The close correspondence of several classes of spinal CPG neurons across aquatic and terrestrial vertebrates suggests that the corresponding neuronal modules may have been evolutionarily conserved between the swimming and walking CPGs. This close phylogenetic relationship is especially evident in the embryonic spinal cord (Goulding, 2009; Kiehn, 2011). Nevertheless, the full neural circuitry required for legged locomotion is more complex than that required for swimming, presumably in relation to the different biomechanics of these modes of locomotion (Goulding, 2009). Swimming movements of aquatic vertebrates differ considerably from the limbed locomotion of terrestrial vertebrates. Whereas fishes mainly use side-to-side flexion of the torso to propel themselves through water, most extant tetrapods mainly use their limbs for propulsion, with trunk movements playing a subsidiary role in the potentiation of limb movements. In addition, legged locomotion on land places unique demands related to weight-loading and to postural and limb control on uneven terrain.

We still do not know how the swimming CPG was modified during evolution to sustain legged locomotion. The transition may have been gradual, since amphibians and reptiles show oscillatory movements of the axial body that are tightly coupled to limb movements. Limb movements may have resulted from a reconfiguration of the swimming CPG at limb metamers, or from the addition of specialized modules controlling limb flexor–extensor muscles (Goulding, 2009). Some animal species can sustain swimming and walking at different developmental stages; for instance, amphibians shift from one locomotor mode to the other during metamorphosis from larva to adult (Combes et al., 2004). Also, as noted above, the zebrafish exhibits walking-like movements of the pectoral fins during slow swimming (Thorsen et al., 2004), indicating that some neural substrates for a walking mode are already present in teleosts (Goulding, 2009).

While most of the spinal cord is involved in the control of swimming in aquatic species (with a rostro-caudally traveling wave, phase-shifted in adjacent axial myotomes), CPGs at cervical and lumbo-sacral levels control the forelimbs and hindlimbs, respectively, of walking mammals. In particular, the isolated lumbar and sacral spinal cord is able to generate quasi-normal walking of the hindlimbs. According to one view, the CPG network for each limb may include multiple inter-connected modules controlling the movement of each joint, with coupling of CPG activity across limb joints. The ability to generate rhythmic motor output is not evenly distributed in the lumbo-sacral cord; rather, there is a rostro-caudal excitability gradient (Deliagina et al., 1983; Cazalets and Bertrand, 2000; Kiehn, 2006). Indeed, isolated rostral segments (L1–L3 in rodents, L3–L5 in cats, D7–D10 in turtles) exhibit a stronger rhythmic drive than isolated caudal segments (L4–L6, L6–S1, and S1–S2, respectively). The stronger rhythmic drive of the rostral cord (which contains hip MNs) suggests that these segments act as leading oscillators, entraining more caudal and less excitable oscillators (perhaps those controlling the knee and ankle joints). Motor bursts propagate rostrally and caudally
from the lumbar region to farther segments (Falgairolle and Cazalets, 2007). In addition to autonomous signals generated within the spinal networks, sensory signals from joint, muscle, and skin receptors play a major role in shaping locomotion. In particular, sensory signals are involved in the onset and on-line adjustment of the locomotor rhythm, they can affect the amplitude and phase of the activity profiles in motor output, and their central effects are gated temporally with the result that reflexive contributions become appropriate to the specific phase of the step cycle (Pearson, 2000).
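The idea of a rostro-caudal excitability gradient, with rostral segments acting as leading oscillators that entrain more caudal, less excitable ones, can be sketched with a chain of coupled phase oscillators. All parameters below are illustrative assumptions, not fitted to any species.

```python
import math

N = 6                      # segmental oscillators, ordered rostral -> caudal
dt, steps = 0.001, 30000   # Euler integration settings
# Rostro-caudal gradient: the rostral oscillator is intrinsically fastest.
omega = [2 * math.pi * (1.0 - 0.05 * i) for i in range(N)]
coupling = 8.0             # nearest-neighbour phase coupling strength

phase = [0.0] * N
for _ in range(steps):
    new = []
    for i in range(N):
        dphi = omega[i]
        if i > 0:                     # pull toward the rostral neighbour
            dphi += coupling * math.sin(phase[i - 1] - phase[i])
        if i < N - 1:                 # pull toward the caudal neighbour
            dphi += coupling * math.sin(phase[i + 1] - phase[i])
        new.append(phase[i] + dt * dphi)
    phase = new

# After the transient, adjacent segments lock with a constant positive lag:
# the rostral segment leads, i.e., a wave travels from rostral to caudal.
lags = [(phase[i] - phase[i + 1]) % (2 * math.pi) for i in range(N - 1)]
print(lags)
```

The sketch illustrates only the entrainment mechanism: once the coupling exceeds the intrinsic frequency differences, the whole chain locks to a common frequency set largely by the faster rostral segments, with a stable rostro-caudal phase lag analogous to the traveling wave of motor bursts.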

## **CONTROL MODULES FOR LOCOMOTION**

Although the evidence reviewed above suggests the existence of modularly organized circuits in spinal CPGs, their exact functional structure remains unclear. Modularity might be organized at a segmental level, involving the control of single joints (in limbed animals) or single axial metamers (in swimming animals). While this scheme appears compatible with the organization of swimming CPGs (as in the lamprey; Grillner, 2006), it appears unlikely in mammals. Thus, in both cats and humans, single-joint movements are controlled by several distinct spinal segments, because the muscles involved in flexing and extending a joint are innervated from several different segments, even far apart. Alternatively, there could exist distinct flexor and extensor modules spanning multiple joints (or metamers), distinct from the commissural modules (including commissural interneurons) involved in left-right alternation. However, in both cats and humans, flexor and extensor muscles can be co-activated in a given phase of the gait cycle, and it is not evident whether this co-activation reflects shared or independent drives directed to flexors and extensors. Still another possibility is that there exist distinct rhythm-generation and pattern-generation modules (McCrea and Rybak, 2008). This reflects the idea of a multilayered organization of CPGs. Neurons in the rhythm-generating module would be two or more synapses upstream from MNs and project to pattern-generating neurons; the latter project monosynaptically to MNs. The main evidence for separate control of rhythm and pattern is the observation that CPGs can maintain the period and phase of locomotor oscillations both during spontaneous deletions of MN activity and during sensory stimulation affecting MN activity (McCrea and Rybak, 2008).

# **DEVELOPMENT OF LOCOMOTION**

Further cues about control modules are provided by developmental studies of locomotor patterns (Lacquaniti et al., 2012a). It has recently been shown that the primitive stepping patterns exhibited by human babies are retained and tuned, while new patterns are added during development (Dominici et al., 2011). Stepping can be elicited in newborns supported under the arms in an upright, slightly forward-tilted posture, after the soles of the feet contact the ground. Reflex stepping has also been reported in premature infants at 30+ weeks post-conception (Allen and Capute, 1986) and in anencephalic newborns (Peiper, 1961). This is consistent with a predominant role of spinal and brainstem mechanisms at a time when cerebral connections to the spinal cord are still immature. The neural patterns of muscle control have been studied by factorization of EMG activity into basic components (Dominici et al., 2011). In human newborns, two patterns were sinusoidally
modulated over the step cycle: one pattern was timed around the body-support phase of stance, while the other was timed during swing. Toddlers (∼1-year-old) at their first independent steps showed the same two patterns as the newborn, plus two new patterns timed at touch-down and lift-off, probably contributing the shear forces necessary to decelerate and accelerate the body, respectively. In preschoolers (2–4 years), all four patterns showed transitional shapes: the older the child, the closer the waveforms to the adult ones.

The development of adult gait from infant stepping depends on a progressive integration of supraspinal, intraspinal, and sensory control (Yang et al., 1998). In particular, the lack of muscle patterns around foot contact in the neonate might depend on immature sensory and/or descending modulation of stepping. Without sensory modulation (as in fictive locomotion), the spinal circuitry of animals also generates sinusoidal-like patterns (Falgairolle and Cazalets, 2007), similar to those observed in the human neonate. The addition of basic patterns in the first months of life implies a functional reorganization of inter-neuronal connectivity, the appearance of additional functional layers in the CPGs, and/or more powerful descending and sensory influences on CPGs.

Locomotor-like oscillatory activity can be recorded from the lumbar and sacral ventral roots of the isolated spinal cord of neonatal rats bathed with dopamine plus NMDA or serotonin (Falgairolle and Cazalets, 2007). Factorization of the electroneurograms associated with this fictive walking showed two patterns essentially identical to those of human newborns (Dominici et al., 2011). Factorization of the EMG of adult rats, cats, macaques, and guinea fowls showed four patterns, closely resembling those found in human toddlers (Dominici et al., 2011). However, with development, the motor patterns may become tuned to the specific biomechanical requirements of each animal species. Thus, brief, pulsatile activations timed at the apex of limb oscillations are specific to human adult locomotion, perhaps in relation to our unique erect bipedal locomotion on extended legs with heel contact well ahead of the body (Dominici et al., 2011).

The topographical maturation of intraspinal connections in human babies is also reminiscent of the organization observed in other animals. In rodents, turtles, and cats, a rostro-caudal excitability gradient has been described in the lumbosacral CPGs (Kiehn, 2006). Human newborns likewise exhibit a higher activation of lumbar vs. sacral segments (Ivanenko et al., 2013). With development, the lumbar and sacral loci of activation become more dissociated, with shorter activation times (Ivanenko et al., 2013); the upper lumbar CPG may remain a major pacemaker in human adults (Gerasimenko et al., 2010), whereas the sacral CPG could play a subordinate role in adaptation to specific foot-support interactions (Selionov et al., 2009). Overall, these behavioral results are consistent with the genetic and electrophysiological studies reviewed above, which demonstrate that, despite species-specific features, there are several common principles in the neural organization and regulation of CPGs.

### **CONCLUSION**

The goal of identifying modules based on independence of function has been reached only in selected cases so far. In particular, experimental attempts to associate specific functions to individual modules of locomotor control have mainly involved simple correlations between temporal changes in biomechanical parameters and parallel changes in muscle activity (Lacquaniti et al., 2012b). Computer simulations have also been used to correlate biomechanics and muscle activity (Neptune et al., 2009). The evolutionary developmental perspective may provide an alternative and fruitful approach by considering how developmental units match specific adaptive functions in different organisms (Kirschner and Gerhart, 1998; Schlosser and Wagner, 2004).

Although the evolutionary developmental approach is still in its infancy in the field of motor control, some general principles have already emerged. For example, the developmental addition of new premotor modules appears to underlie the postnatal acquisition and refinement of several different motor behaviors in vertebrates. We reviewed how new control patterns are added to the primitive ones during the development of human stepping (Dominici et al., 2011). Similar principles have been uncovered in the development of song. Thus, juvenile zebra finches initially produce babbling-like sub-songs, and start generating more mature songs with recognizable phrases around the seventh week after hatching (Aronov et al., 2008). This transition follows the formation of synaptic connections between the song premotor nucleus HVC and the motor nucleus RA, suggesting that more mature songs depend on the developmental addition of the HVC module to the premotor pathway (Aronov et al., 2008). Another example is provided by the development of murine vibrissal circuitry (Takatoh et al., 2013). Mice start employing rhythmic sweeping of their vibrissae for exploration around 2 weeks after birth. At about the same time, new sets of bilateral excitatory inputs are added to vibrissa facial MNs from neurons in the lateral paragigantocellularis nucleus (Takatoh et al., 2013). Moreover, descending axons from the motor cortex directly innervate these premotor neurons.

In general, modules can be conserved in their basic structural *bauplan,* but can lead to divergent morphologies and functions. Like a child with Meccano® or Lego® parts, Nature has constructed a multitude of different forms and behaviors starting from a basic set of components over millions of years of evolution. Different animal species have gross morpho-functional differences, but they often use surprisingly similar organizational modules. A striking example is provided by the comparison of the reaching strategies used by man and octopus, an invertebrate. Despite the evolutionary gap and morphological differences, humans and octopuses evolved similar strategies to reach for a target (Sumbre et al., 2006). Thus, arm extension in the octopus is controlled by basic muscle synergies involving the activation of all arm muscles (Sumbre et al., 2006), similar to the synergy control of human reaching (d'Avella and Lacquaniti, 2013). Moreover, octopus arms generate a quasi-articulated structure based on three dynamic joints with a tight temporal co-variance of joint motions (Sumbre et al., 2006). This kinematic invariance is closely reminiscent of the joint motion co-variance of human reaching movements (Soechting and Lacquaniti, 1981).

To make a simplistic metaphor, the flexible use and combination of similar basic components to produce widely different behaviors could be equated to the different expression of similar genes resulting in widely different phenotypes and behaviors in different animal species. Novel evolutionary characteristics might have emerged from changes in the expression of phylogenetically conserved basic control patterns, rather than from the expression of totally new patterns. Here, we argued that features that are conserved across species might be modules that are recombined during evolution for the emergence of new phenotypes. However, an alternative possibility that we currently cannot rule out for complex organizations at the systemic level, such as motor patterns, is that their conservation simply reflects convergent evolution, that is, similar environmental pressures (natural selection).

### **ACKNOWLEDGMENTS**

Our work was supported by the Italian Ministry of Health, Italian Ministry of University and Research (PRIN grant), Italian Space Agency (DCMC and CRUSOE grants), and European Union FP7-ICT program (AMARSi grant #248311).



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 13 March 2013; accepted: 30 April 2013; published online: 17 May 2013.*

*Citation: Lacquaniti F, Ivanenko YP, d'Avella A, Zelik KE and Zago M (2013) Evolutionary and developmental modules. Front. Comput. Neurosci. 7:61. doi: 10.3389/fncom.2013.00061*

*Copyright © 2013 Lacquaniti, Ivanenko, d'Avella, Zelik and Zago. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

# The flexion synergy, mother of all synergies and father of new models of gait

# *Jacques Duysens<sup>1,2</sup>\*, Friedl De Groote<sup>3</sup> and Ilse Jonkers<sup>1</sup>*

*<sup>1</sup> Department of Kinesiology, KU Leuven, Heverlee, Belgium*

*<sup>2</sup> Department of Research, Sint Maartenskliniek, Nijmegen, Netherlands*

*<sup>3</sup> Department of Mechanical Engineering, KU Leuven, Heverlee, Belgium*

### *Edited by:*

*Yuri P. Ivanenko, IRCCS Fondazione Santa Lucia, Italy*

### *Reviewed by:*

*Yuri P. Ivanenko, IRCCS Fondazione Santa Lucia, Italy*

*Francesca Sylos Labini, IRCCS Santa Lucia Foundation, Italy*

### *\*Correspondence:*

*Jacques Duysens, Department of Kinesiology, Tervuursevest 101 - bus 01500, BE-3001 Heverlee, Belgium. e-mail: jacques.duysens@faber.kuleuven.be*

Recently there has been a growing interest in the modular organization of leg movements, in particular those related to locomotion. One of the basic modules involves the flexion of the leg during swing, and it has been shown that this module is already present in neonates (Dominici et al., 2011). In this paper, we examine how these findings build upon the original work by Sherrington, who proposed that the flexor reflex is the basic building block of flexion during the swing phase. Similarly, the relation between the flexor reflex and the withdrawal reflex modules of Schouenborg and Weng (1994) will be discussed. It will be argued that there is a large overlap between these notions of modules and the older concepts of reflexes. In addition, it will be shown that there is great flexibility in the expression of some of these modules during gait, thereby allowing for a phase-dependent modulation of the appropriate responses. In particular, the end of the stance phase is a period when the flexor synergy is facilitated. It is proposed that this is linked to the activation of the circuitry that is responsible for the generation of locomotor patterns (the "central pattern generator," CPG). More specifically, it is suggested that the responses in that period relate to the activation of a flexor burst generator. The latter structure forms the core of a new asymmetric model of the CPG. This activation is controlled by afferent input (facilitation by a broad range of afferents, suppression by load afferent input). Meanwhile, many of these physiological features have found their way into the control of highly flexible bipedal walking robots.

**Keywords: flexion reflex, local sign, reflex modules, synergy, central pattern generator, gait, forward model**

# **INTRODUCTION**

One of the first authors to point out the modular organization of the motor control system was Sherrington (1910a,b). Reading his work, it is clear that for him the flexor reflex was the mother of all modules and synergies (in the broad sense, not in the sense of the mathematically defined synergies of recent work). He proposed that the flexor reflex is a basic building block of the central nervous system and that "the flexion reflex is in reality the reflex stepping of the limb" (pp. 69 in Sherrington, 1910b). In his view, stepping was basically a series of flexion reflexes, with extension occurring merely as the "rebound" following the flexion. The extension during the stance phase of gait could be provided by some type of "extensor thrust," evoked by "the weight of the animal applied through the foot against the ground" (pp. 78 in Sherrington, 1910b). In the absence of support (air stepping), the rhythmic activity continues, which for Sherrington was an argument for stating that "the extensor thrust cannot therefore be an indispensable factor in the reflex step" (pp. 79 in Sherrington, 1910b). This idea of a basic asymmetry in the control of locomotion has since lost ground, mostly because of the powerful impact of the symmetrical "half-center" model of central pattern generation for locomotion, described by Brown (1914), in which one half of the center induces activity in flexors and the other in extensors. The first ideas in that direction were actually presented by Sherrington himself on the basis of work by Brown (1911, 1912). They were based on experiments showing that cats with a transected spinal cord and cut dorsal roots still showed rhythmic alternating contractions in ankle flexors and extensors. However, Sherrington did not necessarily propose a symmetrical organization. 
Instead, he and Brown originally proposed that gait was the result of a balance "between equal and opposite states of excitation" in flexors and extensors, while being well aware that the origin of these states could be quite different. The latter notion seems to have been lost in later years.
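The half-center idea can be made concrete with a minimal simulation. The Python sketch below uses a Matsuoka-style oscillator (two units with mutual inhibition and slow self-adaptation; all parameter values are illustrative assumptions, not fitted to any preparation) to show that such a circuit produces alternating flexor and extensor bursts without any phasic afferent input, as in the deafferented cat:

```python
import numpy as np

# Matsuoka-style half-center: two units with mutual inhibition and
# self-adaptation ("fatigue"); parameter values are illustrative assumptions.
tau, T, beta, w, s = 0.1, 0.2, 2.5, 2.5, 1.0
dt, steps = 0.001, 20000

x = np.array([0.1, 0.0])   # membrane states (small asymmetry to break the tie)
v = np.zeros(2)            # adaptation states
y_hist = np.zeros((steps, 2))
for i in range(steps):
    y = np.maximum(x, 0.0)                                # rectified firing rates
    x = x + dt / tau * (-x - beta * v - w * y[::-1] + s)  # each unit inhibits the other
    v = v + dt / T * (-v + y)                             # slow fatigue of the active unit
    y_hist[i] = y

# Anti-phase alternation: the two outputs should be negatively correlated.
corr = np.corrcoef(y_hist[-10000:, 0], y_hist[-10000:, 1])[0, 1]
print(f"flexor-extensor output correlation: {corr:.2f}")
```

The fatigue term is what releases the suppressed half: the active unit's adaptation builds up until its output sags and the other unit escapes from inhibition, reproducing the alternation Brown attributed to a purely central mechanism.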

In addition, the discussion about credit for the original ideas on a central basis for locomotion has been considerably simplified in many accounts (as explained elegantly in a review by Stuart and Hultborn, 2008). Both Sherrington (1910a) and Philippson (1905) indeed emphasized the idea that during gait each phase automatically induces the next one (a reflex chain), but this does not mean that these authors excluded a central origin for the rhythmic activity (for details see also Clarac, 2008). In particular, Philippson believed that the spinal control was due to a combination of central and reflex mechanisms (Clarac, 2008). Hence it is a gross simplification to see this part of the history as a "victory" of Brown over his competitors (Sherrington and Philippson). Brown should be credited for having provided compelling evidence for the central spinal origin of locomotor activity, while Sherrington and Philippson should be remembered for their important insights into the importance of afferent input for the control of gait.

The "half-center" model has helped us greatly in appreciating the spinal origin of the central pattern generator (CPG) for locomotion, but it may have led to the simplifying idea of symmetry within the CPG (Jankowska et al., 1967a,b; Lundberg, 1981; Lafreniere-Roula and McCrea, 2005). In that way it has steered our thinking away from the notion of a basic asymmetry in the neural organization of locomotion.

Nevertheless, there have been attempts to remediate these shortcomings and to explain gait in terms of asymmetric models (for review see Guertin, 2009). For example, Pearson and Duysens (1976) introduced a swing generator model, based on work on cats and cockroaches (**Figure 1**).

This model again assigns the flexor synergy (defined as the synchronous activation of flexors) a central place ("swing generator"). In contrast, the activation of extensors is thought to rely more on feedback systems, notably from load receptors (Duysens and Pearson, 1980; Dietz and Duysens, 2000; Duysens et al., 2000; Pearson, 2004). For the flexor synergy, there is little doubt that part of the spinal CPG for locomotion is involved, even in humans (Duysens and Van de Crommert, 1998). However, for the extensor synergy, the requirements are very different. The extensor synergy is called upon by the loading of the limb. Pressure on the foot sole can simulate this loading and results in an "extensor thrust" (Sherrington, 1906). Hence, it is basically a peripherally driven synergy, not a centrally triggered one. In terms of sensory feedback, the organization of gait is basically asymmetrical, since interaction with the environment is much more intense during the stance phase (Duysens, 2006). In contrast, for the swing phase, there is only the need for a trigger (in this case limb unloading and hip extension). Sherrington himself already recognized that the flexor synergy was greatly facilitated by hip extension (see pp. 81 in Sherrington, 1910b). For him, the extension phase followed automatically after the flexion phase, which was the only phase that needed to be centrally triggered. Earlier cat work confirmed that the transition to the stance phase is indeed facilitated at the end of flexor activity (Duysens, 1977).

Recent data have provided support for such asymmetric models. Thanks to the insights from recent use of genetic manipulations of CPG neurons, it is now widely accepted that the core premotor components of locomotor circuitry are common and derive from a set of embryonic interneurons that are remarkably conserved across different species (e.g., Goulding, 2009). In particular, it is of interest to consider the organization of "swimming" CPGs since they are the evolutionary basis for the "walking" CPGs. In this respect, it should be emphasized that these models of the swimming CPG are highly asymmetric. In the lamprey, for example, there are four functional classes of neurons in the swimming CPG. One of these four consists of excitatory glutamatergic neurons (EINs), projecting to all three other CPG neuron cell types. These cells provide rhythmic drive to other CPG neurons during swimming.

In mammalian systems the idea of an asymmetric CPG is also taken seriously (Brownstone and Wilson, 2008; Zhong et al., 2012). Some of the evidence relies on the observation that rhythmic bursts of activity (in muscles or in nerves to leg muscles) are sometimes skipped during periods of real or fictive locomotion (so-called "spontaneous deletions"). They often occur in reduced preparations of cats (Duysens, 1977, 2006) or rats (Zhong et al., 2012). Such deletions are hard to explain on the basis of a simple half-center model (McCrea and Rybak, 2007, 2008). One typical feature is that these deletions are highly asymmetric: flexor deletions are accompanied by sustained ipsilateral extensor activity, whereas rhythmic flexor bursting is not altered during extensor deletions. Such results are best explained by a rhythm generator that provides direct input to a "swing" or "flexor burst" generator but not to the extensor part of the CPG (Rybak et al., 2006a,b; Zhong et al., 2012). Hence, it is basically similar to the model proposed originally by Pearson and Duysens (1976), except that the swing generator is split into a rhythm generator and a flexor center.

**FIGURE 1 |** Swing generator model, adapted from Pearson and Duysens (1976). This model could underlie a number of locomotor behaviors, as long as they include a flexor and an extensor phase. In humans, the question has been raised whether one should not assume that there are separate spinal CPGs for different types of gait, such as for forward and backward gait (Jansen et al., 2012) or for walking. The evidence is in favor of the idea that the same CPGs can be utilized for different locomotor behaviors but that different supraspinal descending systems facilitate the reconfiguration of the spinal CPGs. This is in line with recent work on animal species where it is possible to record from individual neurons within CPGs (see "Discussion" in the papers mentioned above).
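The asymmetry argued for here can be caricatured in a few lines of Python: a rhythm generator drives only a flexor burst generator, while the extensor side is active by default whenever flexion is off. This is a deliberately minimal logical sketch (hypothetical timing values), not a biophysical model; it merely shows why a flexor deletion leaves sustained extensor activity without resetting the flexor rhythm:

```python
import numpy as np

cycle = 100                      # time steps per step cycle (arbitrary units)
n_cycles = 6
t = np.arange(cycle * n_cycles)

# Rhythm generator: drives ONLY the flexor burst generator.
rhythm = (t % cycle) < 30        # flexor burst in the first 30% of each cycle

# A "spontaneous deletion": the flexor burst generator skips cycle 3.
deleted = (t // cycle) == 3
flexor = rhythm & ~deleted

# Extensor activity is the default state whenever flexion is off
# (half-center-like inhibition from the flexor side only).
extensor = ~flexor

# During the deleted cycle the extensor is active throughout ("sustained
# ipsilateral extensor activity"), yet the flexor rhythm resumes on time.
assert extensor[(t // cycle) == 3].all()
assert flexor[(t // cycle) == 4].sum() == 30
print("flexor deletion leaves sustained extensor activity; rhythm resumes on time")
```

In a symmetric half-center, by contrast, silencing one side would perturb the other side's timing, which is not what the deletion data show.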

Some earlier cat modeling work had pointed toward asymmetry as well. For example, the model proposed by Prochazka and Yakovenko (2007a) seems symmetrical at first sight, but it already contains important elements of asymmetry. In particular, it is argued that interneurons in the extensor timing element may receive fewer inputs generating persistent inward currents; therefore, as a network, "they are not only set to have longer half-cycle durations, but also to be more sensitive to synaptic commands." Interestingly, the model was only stable for extensor-dominant phase-duration characteristics (where extension durations vary more than flexion durations; see also Prochazka and Yakovenko, 2007a,b). This pattern is seen in the normal cat. The inverse (flexor-dominant) pattern was not reproduced by the model, although it has been observed experimentally, but only in fictive locomotion (rhythmic output of the spinal cord in paralyzed cat preparations). This is an important point, as the existence of both flexor- and extensor-dominated patterns has often been invoked to support the notion of a symmetrical CPG (McCrea and Rybak, 2008). However, one may wonder whether the flexor-dominated pattern is not simply an artifact of the preparation used, since the output observed comes from CPGs without interaction with afferent input. As pointed out above, afferent input is crucial for the automated phase transitions. Experimental work on cats has clearly established that peripheral input from the paw (as occurs during touchdown) is very potent in terminating the flexor phase and initiating the extensor phase (Duysens, 1977). Hence, in the absence of such feedback, it is not surprising to see flexor phases of abnormally long duration.
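The "extensor-dominant phase-duration characteristic" mentioned here is simply the observation that, as the step cycle lengthens, almost all of the extra time goes into extension while flexion (swing) duration stays nearly constant. A toy numerical illustration in Python (all durations are assumed values for illustration only):

```python
import numpy as np

# Assumed step-cycle durations (s) and a near-constant flexion (swing) phase,
# mimicking the extensor-dominant pattern of normal cat locomotion.
cycle = np.linspace(0.6, 1.4, 9)             # fast -> slow cadence
flexion = 0.25 + 0.02 * (cycle - 1.0)        # flexion duration barely changes
extension = cycle - flexion                  # extension absorbs the rest

# Regression slopes of phase duration vs. cycle duration.
slope_flex = np.polyfit(cycle, flexion, 1)[0]
slope_ext = np.polyfit(cycle, extension, 1)[0]
print(f"flexion slope: {slope_flex:.2f}, extension slope: {slope_ext:.2f}")
```

A flexor-dominant preparation would show the opposite: a steep flexion slope and a flat extension slope.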

This feature was not always recognized, and perhaps for this reason the concept of an asymmetric model first met some resistance (McCrea and Rybak, 2007, 2008). However, in the light of more recent data (Brownstone and Wilson, 2008; Zhong et al., 2012), the idea of an asymmetric pattern generator has reemerged, and it is therefore worthwhile to reexamine the presumed basis of the swing generator, namely the flexor synergy, its adaptations (for example to stimulation of different skin areas on the leg, a phenomenon referred to as "local sign" in physiology), and its integration in the process of locomotion.

# **THE TASK TO WITHDRAW AND THE CORRESPONDING FLEXOR SYNERGIES IN THE SPINAL CORD: A DEFENSE IN FAVOR OF THE "LOCAL SIGN"**

For the flexor reflex, it is clear that the synergy (as described by Sherrington) corresponds very well to the task of withdrawal. This protective reflex is so important that it is present at birth and can be elicited by almost any type of stimulus to the foot. In neonates, flexion reflex responses to innocuous stimulation are already present (Andrews and Fitzgerald, 1999). How do these responses compare to the more recently described synergies (or "components")? In the adult, the flexor reflex cannot simply be related to just one component, although factor 5, as described by Ivanenko et al. (2004), or P3, as described by Dominici et al. (2011), are close candidates. For example, factor 5 of Ivanenko et al. (2004) relies on strong activations of sartorius and tibialis anterior during the middle of the swing phase.

In neonates, gait is explained (up to 89%) by just two patterns, one of which peaks at about 75% of the step cycle, hence in the swing phase. This "swing" pattern persists in adults and is seen in a wide variety of species. When one considers the large contribution of flexors to these basic patterns, it is tempting to relate these components to the flexor synergy as described by Sherrington (1910b). Furthermore, the appearance of these components in swing is nicely in line with Sherrington's proposal of a common neural basis for the flexor reflex and the flexion phase of stepping. Further experimental evidence for such common use of neural circuitry has been obtained in animal studies. For example, in the turtle, Berkowitz has described interneurons that are active in both types of activity (flexion phase and flexor reflex; Berkowitz, 2007, 2010). In addition, in the same species it was shown that the same interneurons can often be involved in various types of rhythmic behavior (swimming, scratching), thereby supporting the idea that basic synergies can be used in various behaviors (Berkowitz and Hao, 2011; see also Grillner, 1985).

During maturation in humans, the threshold for the reflex increases and biceps femoris responses come to dominate (Andrews and Fitzgerald, 1999). Furthermore, the pattern of the flexor reflexes changes. The recruitment of specific flexor muscles depends increasingly on the area of skin stimulated ("local sign"), thereby allowing a more efficient withdrawal when stimuli are applied at various distinct locations on the limb. This has led several authors to propose the existence of a variety of reflex modules, both in humans (Andersen et al., 1999, 2001; Sonnenborg et al., 2000, 2001) and in animals (Schouenborg and Weng, 1994; Tresch et al., 1999).

There is no doubt that these new experiments have provided a wealth of very precise data, but the question still arises whether they have fundamentally altered our way of thinking. The idea of a "local sign" goes back to the early days of reflex physiology. Creed and Sherrington (1926) stated (pp. 265): "The term flexion-reflex . . . denotes strictly speaking a group of reflexes, all more or less alike . . . yet from one afferent to another differing in detailed distribution of the motor units employed, while yet always conforming to the general type flexion-reflex." This last point is especially important, as it was originally proposed that there remains a basic synergy ("flexion-reflex") underlying all these different variations. The data of Creed and Sherrington (1926) showed that, despite variations in some of the distal flexors, the hip and knee flexors always participated in the various reflexes (see their table on pp. 260). More recent work supports this, both in the frog (Tresch et al., 1999) and in the rat (Schouenborg and Kalliomaki, 1990). However, this common element is often not emphasized, and the impression may arise that the recently defined "modules" and "synergies" are independent entities. This certainly differs from the view of Creed and Sherrington (1926), who viewed the different versions of the flexor reflexes as expressions or adaptations of one and the same basic flexor synergy. Hence the basic issue is whether the recently described reflex modules are also mostly variations of a basic synergy (the flexion reflex) or whether they really constitute separate entities.

In our opinion, there is no convincing evidence for the latter, at least when one considers the literature on withdrawal reflexes. Local cutaneous reflexes do exist, but they differ from withdrawal reflexes. For example, selective activation of extensor muscles such as the gastrocnemii was observed when stimulating the skin that covers these muscles (Hagbarth, 1960). For withdrawal reflexes, in contrast, there are no data providing convincing evidence for separate neural pathways for distinct types of flexor reflexes. They mainly show modifications of a basic flexor pattern. These modifications are likely due to changes in activity in spinal dorsal horn cells (Schouenborg et al., 1995). During development, the withdrawal reflexes are "fine-tuned" by the spontaneous movements of the individual, but the resulting reflexes always have a component of hip and/or knee flexion (Holmberg and Schouenborg, 1996). Hence these studies provide a substantial contribution to our knowledge of the "local sign," but they do not show a conceptual deviation from the notion of "local sign" as originally defined. In addition, these studies underline the plasticity of reflexes. Synergies, as defined more recently in mathematical terms, do not fully overlap with these reflexes. Nevertheless, several authors have emphasized that these muscle synergies and modules may also be highly plastic and basically represent solutions for specific tasks at a given time (Latash, 1999; Ivanenko et al., 2013).

# **PATHOLOGY**

The data provided by pathology further support the notion of variations in flexor reflexes rather than a set of separate modules. As one might expect, the adjustments and fine-tuning of the flexor reflex relate to input from descending pathways. Hence, when a lesion occurs in these pathways, one should see a reversal to the more primitive state. This is exactly what happens.

In spinal cord injury (SCI) there is a loss of "local sign" and a return to the simpler forms of flexor reflexes (Schmit et al., 2003: "an invariant flexion response pattern was produced regardless of stimulus location"). In addition, in these patients there is a link between a normal flexor reflex and the ability to recover gait (Dietz et al., 2009). After some 6–12 months, this ability deteriorates as the early flexor reflex (latency 60–120 ms) decreases over time. Again, this illustrates the importance of the flexor reflex circuitry for the generation of gait. In stroke, a similar return to a more primitive synergy occurs after the insult, and this phenomenon is known as the Babinski sign (Babinski, 1922). Stimulation of the sole of the foot induces dorsiflexion of the big toe by activating the extensor hallucis longus muscle, a flexor in the physiological sense. Babinski pointed out that this reflex is part of the flexion synergy of the lower limb, and in fact clinicians are still advised today to watch for flexion of the whole limb as an obligatory concomitant of the reflex (Van Gijn, 1978; Kumar, 2003). The whole reflex is a return to the condition of the neonate, in whom a dorsiflexion Babinski is normally present, usually in conjunction with a brisk flexion of the whole limb. Interestingly, in the neonate it is important not to stimulate too gently, because otherwise a grasp reflex occurs. This shows that what we know as the "normal plantar reaction" (plantar flexion of the toes) may actually be a superposition of two reflexes, with the grasp reflex dominating the flexor reflex. This makes sense in an evolutionary context since, for example, grasping tree branches might have been more important than a "blind" withdrawal defense toward any type of stimulus. 
In complete SCI subjects, the occurrence of the Babinski sign has been described as well, although it can be absent in some patients due to associated peripheral nerve damage (Petersen et al., 2010).

# **CAN THE REAL FLEXOR REFLEX PLEASE STAND UP!**

One problem in this research field is the confusion surrounding flexor reflex terminology. In humans, most studies do not use purely nociceptive stimuli such as heat. Instead, electrical stimulation is used. However, it is impossible to activate nociceptive afferents in any nerve without coactivating large myelinated fibers. Therefore, in humans, the response to high-intensity electrical stimuli typically has two components, an early one (60–120 ms) and a late one (120–200 ms; Shahani and Young, 1971). The difficulty is to decide which afferents are responsible for a given component. If only high-intensity stimuli are used, one is easily misled into thinking that the early response is a nociceptive one, while in fact it can often be elicited by low-intensity stimuli as well. The problem is aggravated by differences in definition. Hugon (1973) defined the early response (RII) as having a latency of 40–60 ms and the late response (RIII) as having a latency of 85–120 ms (for review see Sandrini et al., 2005). Hence, RIII is really the equivalent of the "early" flexor reflex. In normal control subjects walking on a treadmill, one can easily evoke RIII responses in a variety of muscles with stimuli just above perception threshold (Duysens et al., 1990). Nevertheless, some authors label this component "the flexor reflex," and in fact it has even been claimed to be useful as an index of pain (Willer, 1977).

Part of the problem is that some of these reflexes are also task-dependent, needing stronger stimulation under unfavorable conditions. For example, the RIII component can be elicited very easily by stimulation of non-nociceptive low-threshold afferents during gait, while the same responses may be small or absent in subjects at rest (Duysens et al., 1993; Komiyama et al., 2000). During gait, the RIII responses are especially prominent in muscles such as biceps femoris and tibialis anterior, both in intact cats (Duysens and Loeb, 1980) and in intact humans (Duysens et al., 1990; Yang and Stein, 1990; Zehr et al., 1997). When cutaneous stimuli are given at the ankle just prior to the onset of the swing phase, they elicit responses in these flexors, just as one would expect from Sherrington's work (see Duysens et al., 2004). However, at the end of swing the same stimuli elicit facilitatory responses in extensor muscles (Duysens et al., 1990) while suppressing flexor muscles (Duysens et al., 1990; Yang and Stein, 1990). This has been termed "reflex reversal" (in analogy with the use of this term in the cat literature; Forssberg et al., 1975; Duysens and Pearson, 1976). In later work it was shown that such reversals of EMG responses resulted in reversals of behavioral responses (flexion, extension) as well (Duysens et al., 1992; Zehr et al., 1997). Furthermore, the responses depended heavily on "local sign" (Van Wezel et al., 1997; Zehr et al., 1997, 1998; Nakajima et al., 2006).

These examples show that synergies are extremely flexible and that their expression depends strongly on the task and on the phase of the movement ("phase-dependent modulation"). A given stimulus does not always elicit the same responses in the same muscles. One way to interpret this type of result is to assume that a given afferent input (or descending command) is translated in the spinal cord into responses that are appropriate for the state of the interneurons associated with a given phase of the movement (Drew, 1991). This view differs from the contention that reflexes or synergies are fixed building blocks. Instead, it opens the way to the idea that they are highly adaptable entities depending on the constraints of the environment and the state of the central nervous system ("time-varying muscle synergies," d'Avella et al., 2003; Ivanenko et al., 2006b).
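Phase-dependent modulation can be summarized as a gating scheme: the same afferent volley is mapped onto different muscles, with different signs, depending on the phase of the step cycle. The Python sketch below uses hypothetical gain values to encode the pattern described above (flexor facilitation around the onset of swing; at the end of swing, facilitation of extensors with suppression of flexors):

```python
# Phase-dependent reflex gains (illustrative values only): the same cutaneous
# stimulus is mapped to different responses depending on the gait phase.
# Positive = facilitation, negative = suppression, per muscle group.
REFLEX_GAINS = {
    "late_stance":  {"flexors": +1.0, "extensors": 0.0},   # flexor synergy facilitated
    "early_swing":  {"flexors": +0.8, "extensors": 0.0},
    "end_of_swing": {"flexors": -0.5, "extensors": +1.0},  # "reflex reversal"
}

def reflex_response(phase: str, stimulus: float) -> dict:
    """Scale a stimulus by the phase-specific gain of each muscle group."""
    gains = REFLEX_GAINS[phase]
    return {muscle: g * stimulus for muscle, g in gains.items()}

r_swing = reflex_response("early_swing", 1.0)
r_end = reflex_response("end_of_swing", 1.0)
print(r_swing, r_end)
```

The essential point captured here is that the gating lives in the spinal interneuronal state, not in the stimulus: the input is identical in every phase, yet the sign of the flexor response reverses between early swing and end of swing.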

An important unresolved issue concerns the pathways of the flexor reflexes or synergies. Since both components of the flexor reflex persist in patients with a complete spinal cord lesion, it is evident that the minimal responsible pathways can go through the spinal cord (Shahani and Young, 1971). In fact, in the recent literature the first component is often simply labeled a "spinal reflex" (Dietz et al., 2009; Bolliger et al., 2010; Dietz, 2010; Hubli et al., 2011, 2012). While this is entirely appropriate for SCI patients, it is questionable whether the term remains valid when intact humans are tested, since responses with similar latencies have been related to circuits both through the brainstem (spinobulbospinal "SBS" reflexes; Shimamura et al., 1980) and through the cortex (Christensen et al., 1999). Hence, responses at a given latency can arise from very different sources.

# **ACTING AGAINST GRAVITY: EXTENSOR SYNERGIES IN THE SPINAL CORD**

In the interaction with the environment, one of the most crucial forces to deal with is gravity. This even applies to the flexor reflex. Indeed, it is often overlooked that the flexor reflex involves not only the activation of flexor muscles but also the suppression of extensor activity. This could be particularly important in situations where the limb is loaded, for example during the stance phase of gait. In such cases it is crucial that contact with a nociceptive stimulus (a sharp object) can induce fast unloading of the limb (Santos and Liu, 2007). However, in most cases with a normal ground surface there is no need for unloading; instead, additional extensor activity needs to be recruited as soon as the limb is loaded (early stance). In the latter case the flexor synergy needs to be suppressed. Work on cats has revealed that this is achieved through the activation of load receptors in the extensor muscles (Duysens and Pearson, 1980; Whelan, 1996; Duysens et al., 2000). Models that allow reinforcing feedback from extensors during the stance phase of gait have successfully simulated cat gait (Prochazka et al., 1997). In humans, the role of load feedback in shaping the extensor output during gait has been recognized as well (Dietz and Duysens, 2000). Under conditions of simulated reduced gravity, even minimal contact forces and a very limited amount of loading during the stance phase have profound effects, completely restoring the normal limb trajectory (Ivanenko et al., 2002).

In recent work, the activation of various extensors in the stance phase has been identified as a synergy, based on a mathematical decomposition of the EMG data (factors 1 and 2 in Ivanenko et al., 2004; see also Ivanenko et al., 2006a,b, 2007, 2008). Consistent with the idea that combinations of synergies simplify motor control, combining these patterns with other synergies yields the full process of walking (d'Avella et al., 2003; Lacquaniti et al., 2012). During maturation there is a gradual transition from a two-synergy control of gait (flexor-extensor, in neonates) to four synergies in toddlers (Dominici et al., 2011). This is consistent with the idea that additional tasks (such as equilibrium control) are achieved by the addition of extra synergies. In this context, it is of interest that the synergy approach has also been applied successfully in studies on balance and posture (Ting and Macpherson, 2005; Torres-Oviedo et al., 2006). For gait, these synergies are well-established (Ivanenko et al., 2005) and have been shown to be robust in a wide variety of gait conditions (Ivanenko et al., 2004, 2006a,b, 2007). In fact, it is now possible to use these synergies to model human gait (see below), and in the future these notions may enter the field of robotics, given the increasing interest in incorporating physiological features into the design of walking robots (Klein and Lewis, 2012).
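Decompositions of this kind are commonly computed with non-negative matrix factorization (NMF), which approximates the rectified EMG matrix as a product of non-negative synergy weights and activation profiles. The following is a minimal sketch on synthetic data, not the specific pipeline of the studies cited above; the matrix sizes, the number of modules, and the basic multiplicative-update routine are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "EMG" matrix: 8 muscles x 200 time samples, built from
# 4 hypothetical non-negative synergies (W_true) and activations (H_true).
W_true = rng.random((8, 4))
H_true = rng.random((4, 200))
V = W_true @ H_true

def nmf(V, k, iters=500, eps=1e-9):
    """Basic multiplicative-update NMF (Lee-Seung style): V ~ W @ H."""
    n, m = V.shape
    local_rng = np.random.default_rng(1)
    W = local_rng.random((n, k)) + eps
    H = local_rng.random((k, m)) + eps
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

W, H = nmf(V, k=4)
residual = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative reconstruction error with 4 modules: {residual:.4f}")
```

In practice the number of modules is chosen by inspecting how the explained variance grows with k, which is how the small synergy counts reported above (two in neonates, four in toddlers) are identified.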

# **INTRODUCING SYNERGIES IN MODELS OF HUMAN GAIT**

The question arises whether synergies can help to achieve a closer correspondence between calculated and experimentally measured muscle activity in models of human gait. Although it is recognized that the muscle activity patterns underlying gait originate from a highly flexible modular system, this is largely ignored in simulation frameworks aiming to causally relate muscle action to gait kinematics and kinetics. Due to the redundancy of the musculoskeletal system, a single motion can be obtained by different muscle coordination strategies. Typically, a performance criterion is optimized to predict the muscle coordination strategy underlying a given motion. Static optimization algorithms minimize muscle activity while imposing that the corresponding muscle forces produce the net joint torques calculated using inverse dynamics (Anderson and Pandy, 2001). Although such optimization approaches predict some basic features seen in the muscles' EMG, other features are not well-predicted. Hence, in addition to biomechanical constraints, it is important to take the principles of neural control into account when estimating muscle activations (Ting et al., 2012). Recently, simulated gait motions based on modular activation patterns were successfully produced (Neptune et al., 2009; Allen and Neptune, 2012; Sartori et al., 2012). Neptune et al. (2009) used five muscle activation modules identified from EMG, assigning each muscle to one module. They then used an optimization approach to find the magnitude and timing of the activation patterns that minimized the tracking error in a forward simulation of gait. They found that their five-module framework can successfully simulate 2D walking, but that it does not provide all the control needed for 3D walking. This additional control is important since it may underlie the transition from neonate to adult walking (Dominici et al., 2011; see above).
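The core of static optimization can be illustrated at a single time instant: find activations that reproduce the net joint torques from inverse dynamics while minimizing effort. The sketch below uses the minimum-norm least-squares solution for the sum-of-squared-activations criterion; the moment arms, maximal forces, and torque values are invented for illustration and are not taken from any real musculoskeletal model.

```python
import numpy as np

# Hypothetical system: 4 muscles acting on 2 joints.
R = np.array([[0.04, -0.03, 0.05, -0.02],   # moment arms (m), joint 1
              [0.00,  0.02, -0.04, 0.03]])  # moment arms (m), joint 2
F_max = np.array([1500.0, 2000.0, 1200.0, 900.0])  # max muscle forces (N)
tau = np.array([30.0, -10.0])  # net joint torques from inverse dynamics (N*m)

# Torque produced by activations a: tau = R @ (F_max * a) = M @ a
M = R * F_max

# Minimum-norm solution of M @ a = tau, i.e. the minimizer of sum(a**2)
# subject to the torque constraint. A full implementation would also
# enforce 0 <= a <= 1 (e.g. with a constrained solver such as
# scipy.optimize.minimize), which is omitted here for brevity.
a = np.linalg.pinv(M) @ tau
print("activations:", np.round(a, 3))
print("torque residual:", np.linalg.norm(M @ a - tau))
```

The redundancy mentioned in the text shows up directly here: M has more columns (muscles) than rows (joints), so infinitely many activation vectors satisfy the torque constraint, and the performance criterion selects one of them.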

Alternatively, and in contrast to Neptune et al. (2009), an inverse approach can be used that allows each module to contribute to the activation pattern of all muscles, as described below. Ivanenko et al. (2006a) showed that during gait five Gaussian components $G_k(t)$ with a standard deviation of 6% of the gait cycle duration and appropriate timing account for 90% of the EMG variation. This representation was used to model the muscle activation patterns underlying locomotion, with each individual muscle activation pattern $a_m(t)$ described as a weighted sum of Gaussian components:

$$a_m(t) = \sum_k w_{mk} \, G_k(t),$$

with $w_{mk}$ the weight of muscle $m$ for component $k$. Combining this description of the muscle activation patterns with a static optimization approach allows the muscle activations underlying a previously measured gait motion to be calculated. In this approach, the timing of the Gaussian components and the muscle-specific weights of these components were determined using an optimization procedure that minimized the sum of squared muscle activations, with a penalty term imposing that the corresponding muscle forces produce the net joint torques. The resulting activation patterns were compared to the solution of a "classic" static optimization approach without any constraints on the activation pattern and to the measured EMG patterns (**Figure 2**). The experimental protocol, data processing including inverse dynamics, and the static optimization approach are described in De Groote et al. (2012).
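The weighted-sum-of-Gaussians representation is straightforward to evaluate numerically. The sketch below builds the five components with the 6% standard deviation described above and the optimized timings reported in the next paragraph (18, 42, 55, 69, and 100% of the gait cycle); the weight vector for the single example muscle is invented for illustration.

```python
import numpy as np

t = np.linspace(0.0, 1.0, 101)          # gait cycle, 0..100%
timings = np.array([0.18, 0.42, 0.55, 0.69, 1.00])  # component centers
sd = 0.06                               # SD = 6% of the cycle duration

# G[k, i] = Gaussian component k evaluated at time sample i
G = np.exp(-((t[None, :] - timings[:, None]) ** 2) / (2 * sd ** 2))

# Hypothetical weights w_mk for one muscle (these would normally come
# from the optimization described in the text).
w = np.array([0.1, 0.8, 0.2, 0.0, 0.5])

a = w @ G                               # a_m(t) = sum_k w_mk * G_k(t)
print("peak activation at", t[np.argmax(a)] * 100, "% of the cycle")
```

Because each muscle's pattern is just a weight vector over shared components, the inverse problem reduces to finding one timing vector plus a small weight matrix, far fewer unknowns than one free activation value per muscle per time sample.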

The optimized timings of the Gaussian components are 18, 42, 55, 69, and 100% of the gait cycle. The differences between the calculated timings and the timings proposed by Ivanenko et al. are 8, 3, 0, 6, and 5%, respectively. The key features of the EMG are well-predicted by the modules-based activations. Although the correspondence with the inverse dynamics joint torques is higher when the activation patterns are not constrained to a weighted sum of Gaussians, the modules-based activations better predict the measured EMG of biceps femoris, gastrocnemius, and tibialis anterior. For other muscles, such as soleus, gluteus medius, semimembranosus, and vastus lateralis, there are still differences in timing. Based on preliminary results we feel that this can be improved by adding positive force feedback to the simulation. Finally, for M. rectus femoris (RF) the fit is poor. The weak correspondence between measured EMG and calculated activations for RF is seen in both activation patterns and may be related to the notorious problem of cross-talk in surface EMG for this muscle (Nene et al., 1999, 2004). In fact, it has been recognized that cross-talk can affect synergies as well, but only to the degree that weighting coefficients are altered (Ivanenko et al., 2004). Therefore, some authors have insisted on using fine-wire EMG recordings (Ivanenko et al., 2004). Another reason for the difficulty of modeling RF is that the activity of this muscle presumably depends heavily on afferent input and reflexes. For example, in cats the activity in RF differed between fictive locomotion (i.e., in the absence of reflexes) and normal forward level walking, indicating that afferent input helps to shape the activity profile of this muscle during locomotor activity (Markin et al., 2012).

**FIGURE 2 | Comparison of calculated activations and measured EMG for eight superficial muscles.** Activations underlying an experimentally measured gait motion were calculated using static optimization without any constraints on the activation pattern (dashed black) and by modeling the activation patterns as a weighted sum of Gaussian modules (solid black). EMG was measured for eight superficial muscles using surface electrodes. The EMG (solid gray, with standard deviation indicated by the gray band) is scaled to the maximal modules-based activation. For more details on the experimental protocol and data processing see De Groote et al. (2012).

# **CONCLUSIONS**

It is clear that the synergy approach is very fruitful and that it can improve our understanding of human gait and its models, including new asymmetrical models of the CPG. Furthermore, it can help provide the basis for new neuro-computational approaches, as was shown here, as a proof of principle, for the inverse dynamic calculation of muscle activations. Nevertheless, as concerns the popular notion of independent modules, a word of caution is in order, since it is not fully appropriate to depict the modular organization as a replacement of some of the older theories ("local sign"), such as those put forward by the early physiologists (Creed and Sherrington, 1926).

# **ACKNOWLEDGMENTS**

Jacques Duysens was supported by KU Leuven's "Bijzonder Onderzoeksfonds" (OT/08/034) and by the Research Foundation-Flanders (FWO grant G.0901.11). All authors received support from KU Leuven's Interdisciplinary Research Program (IDO/07/012).












**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 21 December 2012; accepted: 20 February 2013; published online: 13 March 2013.*

*Citation: Duysens J, De Groote F and Jonkers I (2013) The flexion synergy, mother of all synergies and father of new models of gait. Front. Comput. Neurosci. 7:14. doi: 10.3389/fncom.2013.00014*

*Copyright © 2013 Duysens, De Groote and Jonkers. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

# Distinct thalamo-cortical controls for shoulder, elbow, and wrist during locomotion

# **Irina N. Beloozerova\*, Erik E. Stout and Mikhail G. Sirota**

Division of Neurobiology, Barrow Neurological Institute, St. Joseph's Hospital and Medical Center, Phoenix, AZ, USA

### **Edited by:**

Yuri P. Ivanenko, IRCCS Fondazione Santa Lucia, Italy

### **Reviewed by:**

Sergiy Yakovenko, West Virginia University, USA Gianfranco Bosco, University of Rome Tor Vergata, Italy

### **\*Correspondence:**

Irina N. Beloozerova, Barrow Neurological Institute, St. Joseph's Hospital and Medical Center, 350 West Thomas Road, Phoenix, AZ 85013, USA. e-mail: ibelooz@chw.edu

Recent data from this laboratory on the differential controls for the shoulder, elbow, and wrist exerted by the thalamo-cortical network during locomotion are presented, based on experiments involving chronically instrumented cats walking on a flat surface and along a horizontal ladder. The activity of the following three groups of neurons is characterized: (1) neurons of the motor cortex that project to the pyramidal tract (PTNs), (2) neurons of the ventrolateral thalamus (VL), many identified as projecting to the motor cortex (thalamo-cortical neurons, TCs), and (3) neurons of the reticular nucleus of the thalamus (RE), which inhibit TCs. Neurons were grouped according to their receptive field into shoulder-, elbow-, and wrist/paw-related categories. During simple locomotion, shoulder-related PTNs were most active in late stance and early swing, and on the ladder they often increased their activity and stride-related modulation while reducing their discharge duration. Elbow-related PTNs were most active during late swing/early stance and typically remained similar on the ladder. Wrist-related PTNs were most active during swing, and on the ladder they often decreased their activity and increased their modulation while reducing their discharge duration. In the VL, shoulder-related neurons were more active during the transition from swing to stance. Elbow-related cells tended to be more active during the transition from stance to swing, and on the ladder they often decreased their activity and increased their modulation. Wrist-related neurons were more active throughout the stance phase. In the RE, shoulder-related cells had low discharge rates and depths of modulation, with long periods of activity distributed evenly across the cycle. In sharp contrast, wrist/paw-related cells discharged synchronously during the end of stance and swing, with short periods of high activity, high modulation, and frequent sleep-type bursting. We conclude that the thalamo-cortical network processes information related to different segments of the forelimb differently and exerts distinct controls over the shoulder, elbow, and wrist during locomotion.

**Keywords: cat, motor cortex, thalamus, PTN, ventro-lateral thalamus, reticular nucleus of thalamus, accuracy, walking**

# **INTRODUCTION**

Locomotion is one of the most essential and frequently used behaviors. The neural mechanisms that determine the timing and pattern of muscle activity and the coordination of limb movements during locomotion reside in the spinal cord (Shik and Orlovsky, 1976; Grillner and Zangger, 1979; Forssberg et al., 1980a,b). These spinal mechanisms can produce locomotor movements with different rhythms and intensities to adapt to different speeds, different inclines of the support surface, etc. The real environment, however, consists of irregular terrain full of obstacles. Navigating such environments requires land-living animals to control the transfer and placement of their feet accurately. The spinal mechanisms, however, lack information about objects in the outside world that are at a distance. It is the motor centers of the brain that adapt locomotion to the peculiarities of the environment, and the motor thalamo-cortical network plays a central role in this adaptation.

In this review we present our recent findings on the differential activities of the shoulder-, elbow-, and wrist-related populations of neurons in the thalamo-cortical network during simple locomotion on a flat surface and during accurate target stepping along a complex terrain.

Results of a number of biomechanics studies suggest that different segments of the limb are controlled differently. Indeed, limb segments differ in mechanical characteristics, such as dimensions and weight, and differ in their role during movements. Whereas displacements of proximal segments greatly affect the kinematics and kinetics of more distal segments, the influence of a distal segment's movement on the mechanical characteristics of proximal segments is much smaller. When Galloway and Koshland (2002) studied point-to-point whole-arm movements in humans, they found that movement dynamics differed greatly between the joints. A number of other studies have reported similar data (reviewed in Dounskaia, 2005, 2010). For locomotion, it was shown that the hip angle is an important factor in determining the initiation of the stance-swing phase transition, while the angles of distal joints have no effect (Grillner and Rossignol, 1978). In a recent study we found that when stepping has to be accurate during walking along a horizontal ladder, movements in different joints adapt differently to the accuracy demands (Beloozerova et al., 2010). Based on biomechanical evidence, a "leading joint hypothesis" has been advanced, proposing that the joints of a limb play roles in movement production according to their mechanical subordination in the joint linkage (Dounskaia, 2005).

Several lines of evidence suggest that the neuronal mechanisms underlying the differences in controls for different forelimb segments are also different. For example, it is well-known that lesions to the pyramidal tract in primates have devastating effects on the fine movements of the fingers and wrist, while the disturbances to movements in the proximal joints are much less severe (e.g., Lawrence and Kuypers, 1968). For a reach and prehension task, it was shown that postspike effects of motor cortex pyramidal tract projecting neurons (PTNs) are both more numerous and more prominent on distal muscles as compared to proximal ones (McKiernan et al., 1998). Furthermore, in a study of the postnatal development of the forelimb representation in the motor cortex of the cat, Chakrabarty and Martin (2000) found that the motor map develops in a proximal-to-distal sequence, with shoulder and elbow controls developing earlier than wrist and digit controls. Developmental differences in the controls for different forelimb joints have been reported in humans as well (e.g., Konczak and Dichgans, 1997). Differences have also been reported at the single-neuron level. While it has been found that nearly all neurons in the shoulder/elbow area of the motor cortex modulate their activity during reaching in accordance with the posture of the arm (Scott and Kalaska, 1997), the activity of only a fraction of neurons in the hand area is wrist posture-related (Kakei et al., 2003). However, the neuronal mechanisms underlying differences in controls for different limb segments had never been explicitly studied until recently. Here we present our data on the differential controls for the shoulder, elbow, and wrist that are used by populations of neurons in the thalamo-cortical network.

All our experiments were conducted in chronically instrumented cats walking on a flat surface and along a horizontal ladder (**Figure 1**). Neurons in the motor cortex (MC), all of which were identified as PTNs; neurons in the motor thalamus, most of which were identified as thalamo-cortical projection neurons (TCs) of the ventrolateral nucleus of the thalamus (VL); and inhibitory interneurons of the motor compartment of the reticular nucleus of the thalamus (RE) were recorded (**Figure 3**). Neurons recorded within each of the MC, VL, and RE were grouped according to the location of their receptive field into shoulder-, elbow-, and wrist/paw-related subpopulations. The discharges of these subpopulations within each of the motor centers were compared across the step cycle of simple and ladder locomotion, and between the centers. Significant differences were found both between the neuronal groups within each of the motor centers and between the centers.

Original data on the biomechanics of ladder locomotion were published in Beloozerova et al. (2010); on the activity of the MC, in Stout and Beloozerova (2012); on the activity of the VL, in Marlinski et al. (2012a); and on the activity of the RE, in Marlinski et al. (2012b). Data on biomechanics and the activity of the MC, VL, and RE were all obtained in identical experiments, although conducted on different sets of cats. Methods of data collection and spike train analysis have been described earlier (Beloozerova and Sirota, 1993a; Prilutsky et al., 2005; Beloozerova et al., 2010; Marlinski et al., 2012a,b; Stout and Beloozerova, 2012) and will be briefly outlined below when necessary. All experiments were conducted in accordance with NIH guidelines and with the approval of the Barrow Neurological Institute Animal Care and Use Committee.

**FIGURE 1 | Locomotion tasks**. **(A)** Cats walked in an experimental box that was divided into two corridors. In one of the corridors the floor was flat, while the other corridor contained a horizontal ladder. White circles on the crosspieces of the ladder schematically show the placements of the cat's forelimb paws. This schematic drawing is not to scale. **(B)** A typical distribution of right forelimb paw prints recorded from one cat during 10 walking passages through each corridor: on a flat surface (simple locomotion) and along the ladder with crosspieces 5 cm wide (complex locomotion). View from above. The direction of the cat's progression is shown by the arrow at the top. For simple locomotion, paw prints are adjusted to start in the same position. During the ladder task, the first paw placement was between the crosspieces. Ellipses enclose the approximate areas in which 95% of paw prints were found. (Adapted with modifications from Beloozerova et al., 2010).

# **LOCOMOTION TASKS**

Two locomotion tasks were used: (1) simple locomotion on a flat surface, and (2) accurate stepping on the crosspieces of a horizontal ladder (**Figure 1A**). A box 2.5 m long and 0.6 m wide served as an experimental chamber. It had two corridors. In one of the corridors, the floor was flat, while the other corridor contained a horizontal ladder. The crosspieces of the horizontal ladder were flat and 5 cm wide, so that cats had full paw support on the crosspieces. Crosspieces were spaced 25 cm apart, that is, at half of the mean stride length observed in the chamber during locomotion on flat floor (Beloozerova and Sirota, 1993a; Beloozerova et al., 2010). Cats were continuously walking around the chamber, sequentially passing through both corridors, briefly stopping after each round in one of the corners for a food reward.

In our studies we have used a comparison between "non-accurate" locomotion on the flat surface and "accurate" stepping on the crosspieces of a horizontal ladder as a tool to reveal the portion of neuronal activity that represents control signals for accurate foot placement during locomotion. It has been demonstrated in several studies that simple locomotion does not require vision and can be successfully performed after the MC has been ablated or inactivated, while locomotion that requires accurate foot placement on complex surfaces, including a horizontal ladder, depends on vision (Sherk and Fowler, 2001; Beloozerova and Sirota, 2003; Marigold and Patla, 2008) and on the activity of the MC and VL (Trendelenburg, 1911; Liddell and Phillips, 1944; Chambers and Liu, 1957; Beloozerova and Sirota, 1993a, 1998; Metz and Whishaw, 2002; Friel et al., 2007).

Our detailed examination of biomechanics (229 full-body biomechanical variables were tested) has shown only limited differences between the tasks, apart from paw placement. The variability of paw placement is dramatically smaller during ladder locomotion, where, in the direction of progression, it is 5 mm, than during simple unconstrained walking, where it is 70 mm (**Figure 1B**; Beloozerova et al., 2010). In addition, on the ladder the angles at the distal metacarpophalangeal and metatarsophalangeal joints are slightly different, the wrist is more plantarflexed during swing, and its plantar flexion moment during most of stance is lower than during simple locomotion (**Figure 2**). In contrast to the distal joints, there is no significant difference in the values of the proximal joint angles or moments between simple and ladder locomotion (**Figure 2**). On the ladder, cats tilt their neck and head more toward the ground, and the vertical positions of the general center of mass and of the centers of mass of the neck/head and trunk segments are lower by ∼1–2 cm during ladder as compared to simple locomotion. Out of the 229 variables tested, however, there is little else that differs between simple and ladder locomotion. In particular, the horizontal and vertical displacements of limb segments do not differ significantly between the tasks during most of the step cycle, and the time histories of paw horizontal velocity are symmetric and smooth; there is no statistical difference in paw velocities between simple and ladder locomotion.

# **THE THALAMO-CORTICAL NETWORK FOR LOCOMOTION**

In this review we will summarize the activities of the three chief elements of the thalamo-cortical network for locomotion (**Figure 3**). We will first compare and contrast the activities of shoulder-, elbow-, and wrist/paw-related neurons of the motor cortex (MC, red plate). All of these neurons were identified as PTNs (red arrow). We will then describe the activity of shoulder-, elbow-, and wrist/paw-related neurons of the ventrolateral nucleus of the thalamus, a part of the "motor thalamus" (VL, blue circle). The VL receives its major input from the interposed and lateral nuclei of the cerebellum (purple arrow), and also receives input from the spinal cord (green arrow). The VL forms the main subcortical input to the MC. Most neurons whose activities are summarized here were identified as thalamo-cortical projection neurons (TCs, blue arrow). TCs synapse on both PTNs and interneurons of the MC (Jones, 2007). Finally, we will consider shoulder-, elbow-, and wrist/paw-related neurons of the motor compartment of the reticular nucleus of the thalamus (RE, gray plate). The RE is a collection of inhibitory neurons that receive inputs from TCs as well as from the cortico-thalamic neurons (CTs) of motor cortical layer VI (orange arrow). The RE projects back to the VL, inhibiting it. The RE neurons whose activities are described here received inputs from both the MC and the VL. We will not discuss the activity of the CTs of cortical layer VI because, in the MC, they lack somatosensory receptive fields (Sirota et al., 2005) and thus cannot be grouped into shoulder-, elbow-, and wrist/paw-related categories.

**FIGURE 2 |** … were similar across the two tasks and for clarity are shown only for simple locomotion. Symbol \* indicates significant (p < 0.05, post hoc t-test) … end of the segment to the proximal one. (Adapted with modifications from Beloozerova et al., 2010).

**FIGURE 3 |** … arrow shows inhibitory neurons and connection.

In our studies, the "relation" of a neuron to control of the shoulder, elbow, or wrist/paw was inferred solely from the receptive field of the neuron. For PTNs, evidence exists that there is a substantial correspondence between the part of the limb from which a PTN receives somatosensory information and the part whose spinal networks it influences (Asanuma et al., 1968; Sakata and Miyamoto, 1968; Rosen and Asanuma, 1972; Murphy et al., 1975). In particular, it was shown that micro-stimulation in the forelimb region of the MC typically produces contraction in single muscles or in small groups of muscles in the area that composes the receptive field at the stimulation site (Asanuma et al., 1968; Sakata and Miyamoto, 1968; Rosen and Asanuma, 1972; Murphy et al., 1975; Armstrong and Drew, 1985a) and affects the monosynaptic reflexes of only one or two muscles (Asanuma and Sakata, 1967). Even when series of pulses of 20 µA were used in locomoting subjects, micro-stimulation of a quarter of the sites within the forelimb motor cortex still affected only one or two muscles (Armstrong and Drew, 1985b). Experiments that used spike-triggered averaging of EMGs in primates showed that although many PTNs excite several motoneuron pools, including those related to muscles on two different segments of the limb or occasionally even across the entire forelimb, approximately half of PTNs influence motoneuron pools that only innervate muscles on one segment of the limb (Buys et al., 1986; McKiernan et al., 1998). For VL and RE neurons no analogous data exist, primarily because they are quite remote from the muscles. However, the grouping into shoulder-, elbow-, and wrist/paw-related categories was applied similarly throughout all elements of the thalamo-cortical network for locomotion. We acknowledge that, at present, it is unknown exactly how cells with different receptive fields in the VL, MC, and RE are connected with each other.

Somatosensory receptive field testing and classification were performed as follows. The receptive fields of neurons were examined in animals sitting on a comfortable pad with their head restrained. Stimulation was produced by palpation of muscle bellies and tendons, and by passive movements of joints. In this review, only neurons with the following somatosensory receptive fields are discussed. (1) The shoulder-related group included neurons responsive only to passive movements in the shoulder joint and/or palpation of upper back, chest, or lower neck muscles. (2) The elbow-related group included neurons responsive only to passive movements in the elbow joint and/or palpation of upper arm muscles. (3) The wrist-related group included neurons responsive only to passive movements in the wrist joint, and/or palpation of distal arm muscles, and/or stimulation of the palm or back of the paw. Neurons responsive to movements of the toes or claws, neurons with receptive fields spanning more than one forelimb segment, and neurons without receptive fields were not included.

# **CHARACTERISTICS OF NEURONS INCLUDED IN THIS REVIEW PTNs OF THE MC**

The activity of 115 PTNs was recorded in eight cats. The vast majority of neurons were sampled from the region of the MC rostral to the cruciate sulcus. In **Figure 4A**, circles overlaying the cortex schematically show microelectrode entry points into the cortex for tracks in which PTNs with different receptive fields were recorded during locomotion. Receptive fields of all these PTNs were located on the contralateral forelimb and were excitatory. Forty-five PTNs were shoulder-related, 30 were elbow-related, and 40 PTNs were wrist-related. There was extensive spatial overlap between PTN groups.

In their somatosensory responses, most PTNs had some directional preference. Among shoulder-related PTNs, 33% were preferentially responsive to flexion, while 20% were preferentially responsive to extension. The other 43% were responsive to abduction or adduction of the joint, or to palpation of the muscles on the back or chest. Among elbow-receptive PTNs, 37% were preferentially receptive to flexion, and 60% were preferentially receptive to extension. Finally, among wrist-receptive PTNs, 42.5% were receptive to plantar (ventral) flexion of the wrist, while 32.5% were receptive to its dorsal flexion. The remaining 25% of the wrist-related PTNs were receptive to palpation of muscles on the forearm or paw.

To determine whether an MC neuron projected through the pyramidal tract, the test for collision of spikes was applied (Bishop et al., 1962; Fuller and Schlag, 1976); it is illustrated in **Figures 4B,C**. The latencies of antidromic responses of different PTNs to pyramidal tract stimulation varied in the range of 0.4–5.0 ms. Estimated conduction velocities were between 5 and 80 m/s. In the shoulder-, elbow-, wrist-related, and non-responsive PTN groups, the proportions of fast- and slow-conducting neurons were similar.
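The timing logic of the collision test, and the conduction-velocity estimate derived from the antidromic latency, can be captured in a short numerical sketch. The snippet below is illustrative only; the 40 mm conduction distance is a hypothetical value chosen for the example, not a measurement from these experiments (the delay and latency values match Figure 4C):

```python
# Sketch of the collision-test logic used to identify antidromically
# activated PTNs, plus a conduction-velocity estimate.

def conduction_velocity_m_per_s(distance_mm: float, latency_ms: float) -> float:
    """Conduction velocity from path length and antidromic latency.
    mm per ms is numerically equal to m per s."""
    return distance_mm / latency_ms

def collision_expected(delay_ms: float, antidromic_latency_ms: float) -> bool:
    """A stimulus delivered sooner after a spontaneous spike than the
    antidromic latency meets that spike en route, so no antidromic
    response reaches the soma (collision/nullification)."""
    return delay_ms < antidromic_latency_ms

# With a 1 ms antidromic latency, as in Figure 4C:
print(collision_expected(0.7, 1.0))  # True: 0.7 ms delay -> spikes collide
print(collision_expected(3.0, 1.0))  # False: antidromic spike is recorded

# A hypothetical 40 mm path with a 0.5 ms latency gives 80 m/s,
# at the upper end of the 5-80 m/s range reported.
print(conduction_velocity_m_per_s(40.0, 0.5))  # 80.0
```

The criterion simply compares the spike-to-stimulus delay against the antidromic latency, which is the essence of the test shown in Figures 4B,C.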

## **VL NEURONS, INCLUDING TCs**

The activity of 97 VL neurons, including 53 TCs, was recorded in three cats. Neurons were sampled starting at the most rostral aspect of the VL, where it borders the ventral anterior nucleus of the thalamus (VA) at the level of the caudal putamen (**Figures 5A,B**), and were found caudally up to the level of the rostral aspect of the lateral geniculate body (**Figure 5C**). In two of the cats, retrograde tracers were injected in the area of recording to determine the afferent connections of these areas (WGA-HRP in cat 1, red fluorescent beads in cat 2). In both cats, numerous labeled neurons were found in the lateral and interposed cerebellar nuclei on the contralateral side, and in cat 1, where recording included the VL-VA border zone, labeled neurons were also found in the lateral half of the ipsilateral entopeduncular nucleus. The receptive fields of all recorded VL neurons were on the contralateral forelimb and were excitatory.

**FIGURE 4 | Location of MC neurons and identification of PTNs**. **(A)** Area of recording in the forelimb representation of the left motor cortex. Microelectrode entry points into the cortex are combined from eight cats and shown by circles on the photograph of the cortex of one cat. Tracks where PTNs with shoulder-related, elbow-related, and wrist-related receptive fields were recorded are shown by purple, yellow, and red circles, respectively. **(B)** Reference electrolytic lesion in the left pyramidal tract. Gliosis surrounding the electrode track and the reference lesion mark are indicated by arrows. Abbreviations: LM, lemniscus medialis; NR, nucleus raphes; PT, pyramidal tract. Frontal 50 µm thick section, cresyl violet stain. **(C)** A collision test determined whether a PTN response was antidromic. Top trace: the PTN spontaneously discharges (arrowhead 1), and the pyramidal tract is stimulated 3 ms later (arrowhead 2). The PTN responds with a latency of 1 ms (arrowhead 3). Bottom trace: the PTN spontaneously discharges (arrowhead 1) and the pyramidal tract is stimulated 0.7 ms later (arrowhead 2). The PTN does not respond (arrowhead 3) because at 0.7 ms its spontaneous spike was still en route to the site of stimulation in the pyramidal tract, and thus collision/nullification of the spontaneous and evoked spikes occurred. (Adapted with modifications from Stout and Beloozerova, 2012).

Fifty-one cells, including 34 TCs, responded to passive movements of the shoulder joint and/or palpation of muscles on the back or neck. Slightly more than half of these cells showed a directional preference to shoulder movement, responding better either to flexion or to extension and/or abduction of the joint. Thirty neurons, including 17 TCs, responded to movements in the elbow joint. Almost all of these neurons had a directional preference: half of them responded to flexion and the other half to extension of the elbow. Sixteen cells, including two TCs, had receptive fields on the paw or wrist. Typically, these neurons responded to pressure on the paw or to ventral flexion of the wrist. In **Figure 5D**, shapes of different colors show the estimated locations of all recorded neurons. According to the most often used atlases of the cat diencephalon (Reinoso-Suarez, 1961; Snider and Niemer, 1961; Berman and Jones, 1982), our recordings included the entire rostro-caudal and most of the dorso-ventral extent of the VL. In addition, based on an assessment of the receptive fields of the neurons, we concluded that we had also covered most of the medio-lateral extent of the forelimb representation in the VL. Neurons that responded to stimulation of different parts of the forelimb were distributed randomly in the VL: there were no clear clusters of shoulder-, elbow-, or wrist/paw-related cells.

To determine whether a neuron projected to the MC, stimulating electrodes were placed in layer VI of area 4γ in the distal forelimb representation (paw, MCd) and in the proximal forelimb representation (elbow, shoulder; MCp; **Figure 5E**), and the test for collision of spikes was applied (**Figure 5F**; Bishop et al., 1962; Fuller and Schlag, 1976). Thalamo-cortical projection cells (TCs) were distributed fairly evenly throughout the area of recording (**Figure 5D**). Most TC neurons responded either to stimulation of MCd or of MCp, and only a few responded to stimulation of both sites. Interestingly, the vast majority (72%) of neurons projecting to MCd had receptive fields on proximal parts of the forelimb, the shoulder or elbow, and only 9% had receptive fields on the wrist or paw. Neurons projecting to MCp had various receptive fields. Latencies of antidromic responses of different TCs varied in the range of 0.5–5.5 ms. Estimated conduction velocities ranged from 5 to 70 m/s.

## **RE NEURONS**

Forty-six RE neurons with receptive fields on the contralateral forelimb were recorded from two cats. In **Figure 6**, the recording sites, combined from both cats, are shown on frontal sections of the thalamus. Cells were collected from the rostro-lateral compartment of the RE at approximate coordinates A 11.75–12.5, L 5.5–7.0, and V 1.0–4.0. The RE was identified by the neurons' characteristic bursts of spikes during sleep (**Figures 6E–H**). Within these bursts, the discharge frequency first ramps up and then winds down (**Figure 6H**). The motor compartment of the RE was identified by orthodromic responses of the neurons to electrical stimulation of the MC and VL. The overwhelming majority of cells responded vigorously to both stimulations (**Figure 6I**). A single shock applied to the cortex or VL evoked a sequence of several spikes with interspike intervals of 2–6 ms. Latencies to the first spike were in the range of 1–8 ms, similar for both the cortex and the VL. This short-latency response was followed by a 120–150 ms period of silence, after which another barrage of high-frequency discharge occurred.
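The accelerating-then-decelerating intra-burst profile that identifies RE neurons can be expressed as a simple check on a burst's interspike intervals (ISIs). The sketch below is a simplified illustration of this criterion, not the classification procedure actually used in the recordings; the tolerance parameter is our own addition, to accommodate the run of near-equal intervals in the middle of a burst:

```python
def is_re_type_burst(isis_ms, tol_ms=0.2):
    """Check whether a burst's interspike intervals (ms) first shorten
    (discharge frequency ramps up) and then lengthen (frequency winds
    down), the signature profile of RE neurons during sleep."""
    if len(isis_ms) < 3:
        return False
    trough = isis_ms.index(min(isis_ms))  # shortest interval = peak frequency
    # Intervals must be (near-)monotonically decreasing up to the trough...
    accelerating = all(b <= a + tol_ms
                       for a, b in zip(isis_ms[:trough], isis_ms[1:trough + 1]))
    # ...and (near-)monotonically increasing after it.
    decelerating = all(b >= a - tol_ms
                       for a, b in zip(isis_ms[trough:], isis_ms[trough + 1:]))
    return 0 < trough < len(isis_ms) - 1 and accelerating and decelerating

# A burst whose first interval is longest, middle intervals are short and
# similar, and last intervals grow progressively longer:
print(is_re_type_burst([6.0, 4.0, 3.0, 3.1, 3.0, 4.5, 7.0]))  # True
# A regular, tonic spike train does not qualify:
print(is_re_type_burst([4.0, 4.0, 4.0, 4.0]))  # False
```

The check mirrors the description of Figure 6H: a longest first interval, progressively shorter intervals toward the middle of the burst, then progressively longer ones at its end.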

In one of the cats, red fluorescent beads were injected into the rostro-lateral part of the explored RE area to reveal the areas of thalamus and cortex that projected to these neurons. In **Figure 6A**, an arrow points to the site of injection, and **Figures 6L–O** show locations of neurons retrogradely labeled in the VL. Labeled neurons extended rostro-caudally from A11 to A9, medio-laterally from 3.5 to 5.5, and vertically from 0.5 to 3.0; in addition, labeled neurons were found in a part of the somatosensory ventral posterolateral nucleus (VPL) adjacent to the VL.

Receptive fields of all RE neurons were excitatory. Nineteen cells (41%) were activated by passive movements of the shoulder and/or palpation of muscles on the upper back. Nearly all of these cells had a directional preference to shoulder movement, and either responded better to flexion or adduction (13/19) or to extension or abduction of the joint (6/19). Eighteen neurons (39%) had receptive fields on the paw or wrist or responded to passive movements of the wrist, typically in only one direction. The number of neurons responding to passive movements of the elbow was relatively small (20%, 9/46), and all responses were to extension rather than flexion. In **Figures 6A–D**, cells with different receptive fields are depicted with different shapes. There was coarse dorso-ventral topography: cells with receptive fields involving the shoulder were located dorsal to neurons whose receptive fields involved the wrist/paw.

**FIGURE 5 | Location of VL neurons and identification of TCs**. **(A)** The recording site in cat A is shown on a photomicrograph of a parasagittal section of the thalamus. It was located in the rostral VL. The arrow points to the electrolytic lesion mark and the darkened area of tissue filled with WGA-HRP. The site is ∼2 mm caudal to the nucleus caudatus (NC) of the basal ganglia. **(B)** The recording site in cat B is shown on a photomicrograph of a coronal section of the thalamus. It was positioned in the middle of the VL. The arrow points to the electrolytic lesion mark and the darkened area where fluorescent beads were deposited. The caudal part of the putamen (PU), a landmark for the anterior-posterior position of the section, is seen laterally. **(C)** The recording site in cat C is shown on a photomicrograph of a coronal section of the thalamus. It was positioned in the caudal VL. The arrows point to a track from a reference electrode. The most rostral aspect of the lateral geniculate body (LG), a landmark for the anterior-posterior position of the section, is visible laterally. **(A–C)** 50 µm thick sections, cresyl violet stain. **(D)** A photograph of the dorsal surface of the left frontal cortex of cat B. Entrance points of stimulation electrodes into the precruciate sulcus are schematically shown by black dots. Electrodes were placed in the paw (the motor cortex distal forelimb representation, MCd) and in the elbow and shoulder representations (the motor cortex proximal forelimb representation, MCp), as determined by multiunit recording and micro-stimulation procedures. Cru, cruciate sulcus; Pcd, post-cruciate dimple; mAns, medial ansate sulcus. **(E)** A collision test determined whether a neuron's response was antidromic. Stimulation of the MC evoked a spike in the neuron with a latency of 0.8 ms. To determine whether this spike was elicited antidromically, on the next trial a spontaneous spike of the neuron was used to trigger MC stimulation with a 0.4 ms delay. Stimulation delivered with a delay smaller than the time needed for a spontaneous spike to reach the site of stimulation (approximately equal to the latency of an antidromic spike) was not followed by a response. This indicated a collision of ortho- and antidromically conducted spikes and confirmed the antidromic nature of the evoked spike. **(F)** A reconstruction of the positions of individual neurons recorded during locomotion in cats A, B, and C. Purple squares show neurons with somatosensory receptive fields on the shoulder, responding to passive movements in the shoulder joint and/or palpation of muscles on the back or neck; yellow diamonds show cells that were activated by movements in the elbow; red triangles represent neurons with receptive fields on the wrist or paw. Filled symbols represent neurons with axonal projections to the MC (thalamo-cortical neurons, TCs); open symbols represent neurons whose projections were not identified. Abbreviations: AV, nucleus anterio-ventralis thalami; CI, capsula interna; CL, nucleus centralis lateralis; CLA, claustrum; EPN, nucleus entopeduncularis; LA, nucleus lateralis anterior; LG, lateral geniculate nucleus; LME, lamina medullaris externa thalami; LP, nucleus lateralis posterior; NC, nucleus caudatus; OT, optic tract; PC, pedunculus cerebri; PU, putamen; RE, nucleus reticularis thalami; SUB, nucleus subthalamicus; VA, nucleus ventralis anterior; VL, nucleus ventralis lateralis; VM, nucleus medialis; VPL, nucleus ventralis postero-lateralis; VPM, nucleus ventralis postero-medialis. (Adapted with modifications from Marlinski et al., 2012a).

# **EXAMPLES OF LOCOMOTION-RELATED ACTIVITY OF NEURONS ACROSS THE THREE MAIN ELEMENTS OF THE THALAMO-CORTICAL NETWORK FOR LOCOMOTION**

Analysis of spike trains was performed as follows. The onset of the swing phase was taken as the beginning of the step cycle. The duration of each step cycle was divided into 20 equal bins, and a phase histogram of the spike activity of the neuron across the cycle was generated. The coefficient of stride-related frequency modulation, the "depth" of modulation dM, which characterizes the fluctuation in the probability of spike occurrence, was calculated as dM = (*N*max − *N*min)/*N* × 100%, where *N*max and *N*min are the numbers of spikes in the maximal and minimal histogram bins, and *N* is the total number of spikes in the histogram. Neurons with dM > 4% were judged to be stride-related, based on an analysis of fluctuations in the activity of neurons in the resting animal (Marlinski et al., 2012a). In stride-related neurons, the portion of the cycle in which the activity level exceeded 25% of the difference between the maximal and minimal frequencies in the histogram was defined as the "period of elevated firing," or PEF. In neurons with a single PEF, the "preferred phase" of discharge was calculated using circular statistics (Batschelet, 1981; Drew and Doucet, 1991; Fischer, 1993; see also Beloozerova et al., 2003a; Sirota et al., 2005).

**FIGURE 6 | Location and identification of RE neurons**. **(A–D)** Location of RE neurons recorded during locomotion. Estimated locations of neurons are combined from two cats and are shown by various symbols on frontal sections of the thalamus of one of them: purple squares show neurons with somatosensory receptive fields on the shoulder, responding to passive movements in the shoulder joint and/or palpation of muscles on the back or neck; yellow diamonds show cells that were activated by movements in the elbow; red triangles represent neurons with receptive fields on the wrist or paw. In **(A)**, an arrowhead points to a reference electrolytic lesion and an arrow indicates the site of injection of red fluorescent beads. A close-up of the injection site is shown in the insert. Abbreviations: AM, nucleus anterio-medialis; AV, nucleus anterio-ventralis thalami; CI, capsula interna; DH, dorsal hypothalamus; EPN, nucleus entopeduncularis; MV, nucleus medio-ventralis; NC, nucleus caudatus; RE, nucleus reticularis thalami; VA, nucleus ventralis anterior. Frontal 50 µm thick sections, cresyl violet stain. **(E–H)** Identification of RE neurons by the characteristic profile of their bursts during sleep. **(E)** Cat sleeping with its head restrained. **(F,G)** An example of the activity of a RE neuron while the cat is awake and asleep. At the beginning of the record, desynchronized activity in the EEG indicates that the cat was awake, and the neuron was discharging fairly regularly. The arrow points to the beginning of "spindle waves" in the EEG, a sign of the onset of slow-wave sleep. Shortly thereafter, very high-frequency irregular bursts separated by long periods of inactivity replaced the regular discharge of the neuron. **(H)** Close-up on a burst. The first interspike interval in this burst was longer than the second one, and the second interval was longer than the third. Several subsequent interspike intervals were of approximately similar duration, while the last ones were progressively longer. The lower trace shows the change of discharge frequency within the burst. Such a burst, with firing rate ramping up and then winding down, identifies this neuron as belonging to the RE. **(I)** Identification of the motor compartment of the RE by responses of neurons to electrical stimulation of the VL (upper trace) and MC (lower trace). In response to either stimulation, the cell generates a short-latency burst followed by a period of silence and then by another burst. **(J)** Locomotion-related activity of a representative neuron with a shoulder-related receptive field. The activity of this neuron is modulated with respect to strides but does not contain any "sleep-type" bursts. **(K)** Accelerating-decelerating frequency "sleep-type" bursting during locomotion in a wrist/paw-related neuron. A burst is shown in the insert at a fast time scale. Such bursts often appeared at the beginning of the locomotion-related activation of this neuron. **(L–O)** Thalamic projections to the area of recording in the RE. Neurons in the VL and the VL/VPL border zone, retrogradely labeled in the cat in which red fluorescent beads were injected into the rostro-lateral part of the explored RE area, are shown on photomicrographs of frontal sections of the left thalamus ipsilateral to the injection site. Each circle represents one labeled neuron. Abbreviations: CL, nucleus centralis lateralis; LA, nucleus lateralis anterior; LG, lateral geniculate nucleus; LP, nucleus lateralis posterior; OT, optic tract; PC, pedunculus cerebri; VL, nucleus ventralis lateralis; VM, nucleus medialis; VPL, nucleus ventralis postero-lateralis; VPM, nucleus ventralis postero-medialis; other abbreviations are as in **Figure 5**. (Adapted with modifications from Marlinski et al., 2012b).
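The spike-train analysis described above (20-bin phase histogram, depth of modulation dM, PEF, and preferred phase) can be sketched in a few lines. Function and variable names below are ours, and the preferred phase is computed as a plain circular mean of spike phases, a simplification of the cited circular-statistics procedures:

```python
import numpy as np

def stride_modulation(spike_phases, n_bins=20):
    """Stride-phase analysis of a neuron's spike train.

    spike_phases: phase (0..1) of each spike within its normalized step
    cycle, pooled over strides. Returns (dM, pef_mask, preferred_phase).
    """
    counts, _ = np.histogram(spike_phases, bins=n_bins, range=(0.0, 1.0))
    n_total = counts.sum()
    # Depth of modulation: dM = (Nmax - Nmin) / N * 100%
    dM = (counts.max() - counts.min()) / n_total * 100.0
    # PEF: bins whose activity exceeds the minimum by more than 25% of
    # the max-min difference (one reading of the published criterion)
    threshold = counts.min() + 0.25 * (counts.max() - counts.min())
    pef_mask = counts > threshold
    # Preferred phase: circular mean of spike phases (single-PEF neurons)
    angles = 2.0 * np.pi * np.asarray(spike_phases)
    mean_angle = np.arctan2(np.sin(angles).mean(), np.cos(angles).mean())
    preferred_phase = (mean_angle / (2.0 * np.pi)) % 1.0
    return dM, pef_mask, preferred_phase

# Spikes concentrated near phase 0.25 over many strides:
dM, pef, phase = stride_modulation([0.2] * 10 + [0.3] * 10)
# dM = 50.0; two PEF bins; preferred phase ~ 0.25
```

A neuron would then be judged stride-related when `dM > 4`, matching the criterion stated above.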

An example of the activity of a PTN during simple and ladder locomotion is shown in **Figures 7A–E**. At rest, this PTN was activated by passive adduction of the shoulder. The PTN was rather steadily active during standing. When locomotion began, its overall activity decreased but became modulated with respect to the stride: it was greater during the stance phase of the stride and smaller during swing. Rasters in **Figures 7B,D** show that the activity of the PTN was very consistent across strides. The activity is summed in **Figures 7C,E**, showing histograms of PTN firing rate across the step cycle during simple (**Figure 7C**) and ladder (**Figure 7E**) locomotion. The PEF is indicated by a black horizontal bar, and the preferred phase is shown by an open circle. Note that during ladder locomotion, the discharge of the neuron during the stance phase was much higher than during simple locomotion, while remaining low during the swing phase. Thus, the magnitude of frequency modulation, dM, was larger during ladder locomotion. In addition, the duration of the period of elevated firing, the PEF, was shorter by 20% of the cycle.

An example of the activity of a TC neuron is shown in **Figures 7F–J**. At rest, this neuron was activated by palpation of muscles around the shoulder. During simple locomotion, the neuron discharged throughout all phases of the stride except for the middle of stance, when it was practically silent (**Figures 7F–H**). This pattern of activity was very consistent across many strides (**Figure 7G**). The discharge within the PEF varied in intensity, however, forming three small sub-peaks; the maximum discharge rate was 80 spikes/s. During ladder locomotion, rather than discharging throughout most of the stride cycle, the neuron was active almost exclusively around the swing-stance transition (**Figures 7F,I,J**), but peaked near the same preferred phase as during simple locomotion. Its firing rate reached 118 spikes/s, significantly higher than during simple locomotion (*p* < 0.05, *t*-test), whereas the activity in the trough during stance remained low. Consequently, the magnitude of modulation was larger during ladder than simple locomotion. The duration of the PEF shortened by one half.

An example of the activity of a RE neuron is shown in **Figures 7K–O**. At rest, this neuron responded to passive flexion and extension of the shoulder. During locomotion, it was highly active during the end of swing and the beginning of stance, and less active at the end of stance and the beginning of swing. This pattern of activity was consistent across many strides (**Figure 7L**). The maximum discharge rate of the neuron was 102 spikes/s (**Figure 7M**). During ladder locomotion, the discharge of the neuron decreased during the first half of swing and increased during the second half of swing, reaching 123 spikes/s. As a result, similarly to both the PTN and VL neurons, the magnitude of modulation of the RE neuron's discharge was larger during ladder than simple locomotion, and the PEF was shorter.

We want to note that for none of the MC, VL, or RE is there a single "typical" neuron with respect to activity during locomotion. Instead, each of these motor centers contains a variety of neurons that differ in the phases of their discharges during the stride, in the number of PEFs they produce per cycle, in the manner in which they respond to the accuracy demand imposed by the ladder, and in other parameters. We did our best to describe these different cell types in our original research reports (Marlinski et al., 2012a,b; Stout and Beloozerova, 2012). In **Figure 7** we show neurons with shoulder-related receptive fields that belong to the most populous group of cells: those that discharge a single PEF per cycle and respond to the accuracy demand on stepping by increasing the magnitude of their stride-related modulation and by shortening the PEF.

For populations of shoulder-, elbow-, and wrist/paw-related neurons, we will first overview their activities during simple unconstrained locomotion and then consider their discharges during accurate stepping along the horizontal ladder.

# **SIMPLE LOCOMOTION: SETTING DISTINCT FRAMES FOR THE SHOULDER, ELBOW, AND WRIST/PAW CONTROLS**

## **PTN ACTIVITY**

During simple locomotion, shoulder- and wrist-related PTNs were more active than elbow-related PTNs (18.9 ± 1.3 vs. 13.8 ± 1.7 spikes/s; *t*-test, *p* < 0.05). In 97% of all cells, the discharge rate was modulated with respect to the stride: it was greater in one phase of the stride and smaller in another. Most PTNs (79%) had one PEF per stride, while 21% had two PEFs. The proportion of two-PEF cells was similar between groups of PTNs with different somatosensory receptive fields, and one- and two-PEF neurons will be considered jointly in this review. The depth of modulation was similar between PTN groups (10.2 ± 0.4%), as was the duration of the PEF (55–60% of the cycle). PEFs of individual PTNs of all groups were distributed across the step cycle. However, this distribution differed between groups (**Figure 8**, two left columns). Shoulder-related PTNs were most often active during late stance and early swing (**Figures 8A1**,**3**), and their discharge rate was highest during the stance-to-swing transition, at 21.8 ± 2.0 spikes/s (here and below: mean ± SEM), while the firing rate during the opposite phase was 8.4 spikes/s lower (*p* < 0.05, *t*-test; **Figures 8A2**,**4**). Elbow-related PTNs were largely active in anti-phase with shoulder-related cells (**Figures 8B1**,**3**), discharging during late swing and early stance at 17.4 ± 2.4 spikes/s while giving only 10.6 ± 2.1 spikes/s during the opposite phase (**Figures 8B2**,**4**). In contrast to both of these groups, PEFs of wrist-related neurons were distributed fairly evenly throughout the step cycle (**Figures 8C1**,**3**), and their population's average discharge rate fluctuated only slightly around 20 spikes/s (**Figures 8C2**,**4**).

## **VL NEURON ACTIVITY**

During simple locomotion, the activity of shoulder-, elbow-, and wrist-related VL neurons was similar, averaging 23.8 ± 1.4 spikes/s, ∼5 spikes/s higher than the average activity of the most active PTN populations (*t*-test, *p* = 0.01). The activity of 85.5% of neurons, including 87% of TCs, was modulated in the rhythm of strides. Similarly to PTNs, two basic patterns of modulation were seen: one or two PEFs. The one-PEF pattern was the most common (67% of neurons, including 63% of TCs). Two PEFs were observed in 31% of cells, including 35% of TCs. The proportion of one- and two-PEF cells was similar between groups of VL neurons, and one- and two-PEF cells will be considered jointly below. In the shoulder-related group, the depth of modulation was higher, at 9.3 ± 0.6%, than in either elbow- or wrist/paw-related cells (7.3 ± 0.5%; *p* = 0.02, *t*-test), and the duration of the PEF was shorter (58 ± 3% vs. 65 ± 3% of the cycle; *p* = 0.04, *t*-test). PEFs of individual cells of all groups were distributed across the step cycle. However, as with PTNs, this distribution differed between neuronal groups with different receptive fields (**Figure 8**, two middle columns).

PEFs of shoulder-related neurons were fairly evenly distributed across the step cycle (**Figures 8D1**,**3**); however, neurons with PEFs during the end of swing/beginning of stance were more active than other cells (**Figure 8D2**), and the mean discharge rate of the shoulder-related population was higher during this period, at 27.7 ± 4.0 spikes/s, while the firing rate during mid-stance was 11.2 spikes/s lower (*p* = 0.04, *t*-test; **Figure 8D4**). In contrast, cells of both the elbow- and wrist/paw-related groups were most often active during the stance phase (**Figures 8E1**,**3**,**F1**,**3**). However, while elbow-related neurons attained their maximal population discharge rate only at the end of stance and during the stance-to-swing transition (**Figure 8E4**), the mean discharge rate of the wrist/paw-related group was at its highest at the beginning of the stance phase (**Figure 8F4**). Strikingly, each of the VL groups was active largely in anti-phase with its MC counterpart.

## **RE NEURON ACTIVITY**

During simple locomotion, wrist-related RE neurons were more active than either shoulder- or elbow-related cells (31.4 ± 3.0 vs. 22.6 ± 3.1 spikes/s; *p* < 0.05, *t*-test). The discharge of 96% of all RE neurons was modulated with respect to the stride. Most neurons (74%) had one PEF per step cycle, and 26% had two. Between groups of cells with different somatosensory receptive fields, the proportions of neurons with one and two PEFs were similar. The activity of neurons with receptive fields on the wrist/paw was more modulated than that of either the shoulder- or elbow-related group (12.5 ± 1.1 vs. 8.0 ± 0.6 or 8.4 ± 0.9%; *p* < 0.01, *t*-test), and their PEFs were shorter (54 vs. 66% of the step cycle; *p* = 0.036, *t*-test). As in the PTNs and VL neurons, there was a prominent difference between the phase positions of PEFs of RE cells with different receptive fields. PEFs of wrist/paw-related cells promptly terminated at the end of the swing phase and did not restart before the middle of stance (**Figures 8I1**,**3**). In contrast, PEFs of shoulder- and elbow-related neurons were distributed more evenly across the cycle (**Figures 8G1**,**3**,**H1**,**3**). Both the wrist/paw- and shoulder-related neurons attained their highest discharge rates during swing and lowest during stance, but the wrist/paw-related population was almost twice as active at its peak as the shoulder-related one (42 and 24 spikes/s, respectively). Overall, RE elbow- and wrist/paw-related neurons were active more or less in anti-phase with their counterparts in the VL, while shoulder-related cells were mostly active in-phase.

### **FIGURE 7 | Continued**

step cycles **(B,G,L)** and as histograms **(C,H,M)**. In the rasters, the duration of step cycles is normalized to 100%, and the rasters are rank-ordered according to the duration of the swing phase. The beginning of the stance phase in each stride is indicated by an open triangle. In the histograms, the horizontal interrupted line shows the level of activity during standing. The horizontal black bar shows the period of elevated firing (PEF), and the circle indicates the preferred phase. **(D,E,I,J,N,O)** Activities of the same neurons during ladder locomotion are presented as rasters **(D,I,N)** and as histograms **(E,J,O)**. (Examples of the activity of MC, VL, and RE neurons are adapted with modifications from Beloozerova et al., 2010; Marlinski et al., 2012a,b, respectively).

**FIGURE 8 | Activities of the shoulder-, elbow-, and wrist/paw-related cells in the thalamo-cortical network during simple locomotion**. **(A,D,G)** Activity of neurons responsive to movements in the shoulder joint and/or palpation of back, chest, or neck muscles in the MC **(A)**, VL **(D)**, and RE **(G)**. **(A1,D1,G1)** Phase distribution of PEFs. **(A2,D2,G2)** Corresponding phase distribution of discharge frequencies. The average discharge frequency in each 1/20th portion of the cycle is color-coded according to the scale shown at the bottom. **(A3,D3,G3)** Proportion of active neurons (neurons in their PEFs) in different phases of the step cycle. **(A4,D4,G4)** The mean discharge rate. Thin lines show SEM. Vertical interrupted lines denote the end of swing and beginning of stance. **(B,E,H)** Activity of neurons responsive to passive movement of the elbow joint in the MC **(B)**, VL **(E)**, and RE **(H)**. **(C,F,I)** Activity of neurons responsive to stimulation of the paw or movement in the wrist joint in the MC **(C)**, VL **(F)**, and RE **(I)**. (Data on the activity of PTNs, VL, and RE neurons are adapted with modifications from Stout and Beloozerova, 2012; Marlinski et al., 2012a,b, respectively).
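The raster construction described in the Figure 7 caption, with each step cycle normalized to 100% of its duration and rows rank-ordered by swing duration, can be sketched as follows. The data layout and field names are illustrative assumptions, not the original analysis code:

```python
import numpy as np

def normalized_rasters(strides):
    """Build rank-ordered raster rows as described for Figure 7.

    strides: list of dicts, each with 'spikes' (spike times, s),
    'cycle_start', 'cycle_end', and 'swing_end' (stance onset) for one
    step cycle. Returns rows of spike phases (0..1), one row per stride,
    ordered by the fraction of the cycle occupied by swing.
    """
    rows = []
    for s in strides:
        dur = s['cycle_end'] - s['cycle_start']
        # Normalize each cycle's duration to 100% (here: phases 0..1)
        phases = (np.asarray(s['spikes']) - s['cycle_start']) / dur
        swing_frac = (s['swing_end'] - s['cycle_start']) / dur
        rows.append((swing_frac, phases))
    # Rank-order the raster rows by the duration of the swing phase
    rows.sort(key=lambda r: r[0])
    return [phases for _, phases in rows]
```

With phases expressed on a common 0..1 axis, per-stride spike trains can be binned directly into the 20-bin phase histograms used for the dM and PEF measures.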

In addition to differences in their discharge rates and phase preferences, wrist/paw- and shoulder-related cells differed sharply in their inclination to produce sleep-type bursts of spikes during locomotion (**Figures 6J,K**). The activity of the shoulder-related neuron shown in **Figure 6J** was modulated with respect to the step cycle but otherwise was rather regular. This firing behavior contrasted sharply with that of the wrist/paw-related neuron shown in **Figure 6K**. The activity of this neuron was also modulated in relation to the step cycle; however, after a period of silence during stance, it discharged dense bursts of spikes in which the spike frequency first increased and then decreased. The insert in **Figure 6K** shows a burst at sufficient temporal resolution to illustrate that its structure during locomotion was similar to the signature RE-type bursts during sleep (**Figure 6H**). All but one shoulder-related cell had relatively regular firing behavior during locomotion, similar to that of the neuron shown in **Figure 6J**. In contrast, a significant portion of wrist/paw-related cells (39%, 7/18) discharged sleep-type bursts during walking, similar to those shown in **Figure 6K**.

### **GENESIS OF LOCOMOTION-RELATED ACTIVITY IN THE MC, VL, AND RE DURING SIMPLE LOCOMOTION**

We have shown that in all three key centers of the thalamo-cortical network for locomotion, the MC, VL, and RE, neurons responsive to stimulation of different forelimb joints are active differently during simple locomotion. While it might be tempting to suggest that these differences are due to differences in the neurons' somatosensory receptive field characteristics, at least for PTNs somatosensory information seems not to play a leading role in determining their locomotion-related discharges. Indeed, PTNs with similar receptive fields often discharge during quite different phases of the locomotion cycle (Armstrong and Drew, 1984b). It has been shown that the locomotion-related responses of MC neurons are only slightly affected by changes in the vigor of movements during up- and downslope walking, weight bearing, or alterations in speed (Armstrong and Drew, 1984a; Beloozerova and Sirota, 1993b), changes that most certainly cause significant changes to proprioceptive afferentation. With regard to cutaneous input, Armstrong and Drew (1984b) demonstrated that MC neurons with cutaneous receptive fields, including on the forefoot, still discharge rhythmically during locomotion with similar phasing relative to the step cycle when their response to mechanical stimulation of the receptive field is temporarily reduced or abolished by local anesthesia of the skin. In our recent study, we found that the great majority of PTNs with direction-specific receptive fields did not show any particular preference to discharge in-phase with stimulation of their receptive field during locomotion (Stout and Beloozerova, 2012). Similarly poor relationships between the phasing of task-related discharges and the directional specificity of PTN resting receptive fields were reported in previous studies from our and other laboratories (Armstrong and Drew, 1984b; Drew, 1993; Beloozerova et al., 2003b, 2005; Karayannidou et al., 2008).

For VL and RE neurons, the above experiments have not been conducted; however, one can argue that discharges of neurons in these thalamic nuclei during simple locomotion are likewise, at the very least, not entirely driven by stimulation of somatosensory receptive fields. In our studies we did not find any simple correlation between neuronal responses to somatosensory stimulation in the quiescent animal and preferred phases of VL neurons activity during locomotion (Marlinski et al., 2012a). In decerebrated cats, it was found that the cerebellum plays the pivotal role in driving locomotion-related discharges in the neurons of subcortical motor centers, including neurons of the red nucleus, vestibular nuclei, and the neurons of the reticular formation giving rise to the reticulo-spinal tract (Orlovsky, 1970, 1972a,b; reviewed in Arshavsky et al., 1986; Orlovsky et al., 1999). For these centers, the role of direct afferentation from the spinal cord for periodic modulation of activity during locomotion is minimal, because in the majority of their neurons, locomotion-related modulation disappears after removal of the cerebellum in the decerebrate preparation. It can be expected that the VL, as a subcortical motor nucleus receiving direct connections from the cerebellum, does not differ in this respect from the brainstem motor centers. It is important to stress that the locomotion-related output of the cerebellum during simple locomotion is almost exclusively formed on the basis of information that is obtained from the spinal locomotor CPG (rev. in Arshavsky et al., 1986 and Orlovsky et al., 1999). The VL receives this information. 
All deep cerebellar nuclei project to the area of the VL that we explored (Rinvik and Grofová, 1974; Rispal-Padel and Grangetto, 1977; Angaut, 1979; Nakano et al., 1980; Ilinsky and Kultas-Ilinsky, 1984; Evrard and Craig, 2008; Marlinski et al., 2012a), and it was shown that all these nuclei house neurons whose activity is strongly step-related during locomotion, with characteristics that are very suitable for driving locomotion-related activity in the VL (Orlovsky, 1972c; Armstrong and Edgley, 1984, 1988; Beloozerova and Sirota, 1998; Nilaweera and Beloozerova, 2009).

Signals to the motor compartment of the RE come from collaterals of VL TCs and collaterals of MC cortico-thalamic (CT) neurons of layer VI (**Figure 3**; rev. in Jones, 2007). A comparison of locomotion-related discharges in these two regions (Sirota et al., 2005 for CTs; Marlinski et al., 2012a for TCs) shows that the activity of the RE is very similar to that of the VL and appears to be predominantly driven by it (see Marlinski et al., 2012b for a detailed discussion). Therefore, one can conclude that if during simple locomotion VL neurons are, at least to a significant extent, driven by the spinal locomotor CPG, so too are the neurons of the RE.

If the activity of MC, VL, and RE neurons is influenced by signals from the spinal locomotor CPG, then this influence is quite different for neurons associated with different joints of the forelimb (**Figure 8**), as we found that these cells tend to discharge differently during simple locomotion. Namely, for VL neurons, which are the "entry" elements of the network (**Figure 3**), the influence from the CPG onto the shoulder- and wrist/paw-related groups is maximal during the swing-to-stance transition, and onto the elbow-related group during the opposite phase. For RE neurons, which form a feedback inhibition loop for the VL, the influence from the CPG, although arriving in a similar phase of the stride, greatly differs in magnitude between the wrist/paw-related group and the shoulder- and elbow-related groups. For PTNs, which are the output elements of the network, the influence from the CPG is maximal during the stance-to-swing transition for the shoulder-related group, during the opposite phase for the elbow-related group, and roughly even throughout the step cycle for the wrist-related group.

# **FUNCTION OF LOCOMOTION-RELATED ACTIVITY IN THE MC, VL, AND RE DURING SIMPLE LOCOMOTION**

Many studies have demonstrated that the MC does not exert decisive control over simple locomotion. Analogous data were also reported for the VL (Fabre and Buser, 1979; Beloozerova and Sirota, 1993a). In an earlier publication we suggested that the stride-related modulation of activity that MC neurons exhibit during simple locomotion has an informational character, allowing these neurons, if a need arises, to influence the spinal locomotor mechanism for the correction of movements without disturbing the overall stepping rhythm (Beloozerova and Sirota, 1993a). We later extended this hypothesis to both the VL and RE (Marlinski et al., 2012b).

It is important to understand how the setting of permissible "windows of influence" takes place. Locomotion-related modulation of PTNs appears to be primarily caused by the activity of the VL, the main subcortical input to the MC. The general importance of this input for MC activity is well known (Massion, 1976; Fabre-Thorpe and Levesque, 1991; Shinoda et al., 1993; Horne and Butler, 1995; Steriade, 1995; Destexhe and Sejnowski, 2001); however, the contribution of the VL to the transmission of locomotion-related signals had not been researched before. We found that the discharges of 92% of VL neurons are modulated in the rhythm of strides, with cells expressing one- and two-PEF patterns in proportions close to those seen in PTNs (Armstrong and Drew, 1984a; Beloozerova and Sirota, 1993a; Drew, 1993; Stout and Beloozerova, 2012). Thus, TCs can contribute to the activity in the MC during locomotion. However, across the four major characteristics of locomotion-related activity (mean discharge frequency, depth of frequency modulation, duration of activity bursts, and their stride phase distribution), there are two notable differences between the activity of VL neurons and that of PTNs. The average depth of modulation is lower in the VL: 7.3–9.3 ± 0.5% vs. 10.2 ± 0.4% (*p* < 0.05, *t*-test), and the discharge within the activity bursts is typically more variable (Marlinski et al., 2012a). That is, stride-related responses of VL neurons are less phase-specific than those of PTNs. This agrees with previous findings of a weaker directional specificity of VL neuron discharges during arm and wrist movements as compared to that of neurons in the motor cortex (Strick, 1976; Kurata, 2005), as well as with the well-known fact that, in the visual system, the responses of neurons in the lateral geniculate nucleus are less specific to visual stimuli than those of cells in the visual cortex (e.g., Tsao and Livingstone, 2008).
This means that even during simple locomotion, the MC integrates its own information processing into the signals received from the VL and likely takes into account other, predominantly cortical, inputs.
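As a concrete illustration of the "depth of modulation" statistic compared above, the sketch below computes it from a step-cycle phase histogram of spike times. The formula is an assumption chosen only for illustration (peak minus trough bin count as a percentage of all spikes, over 20 phase bins); the original studies may define and test the statistic differently.

```python
import numpy as np

def depth_of_modulation(spike_phases, n_bins=20):
    """Depth of stride-related frequency modulation, in percent.

    spike_phases: spike times normalized to the step cycle (0..1).
    Uses one plausible definition: the difference between the largest
    and smallest phase-bin counts as a percentage of all spikes.
    (Hypothetical formula for illustration only.)
    """
    counts, _ = np.histogram(spike_phases, bins=n_bins, range=(0.0, 1.0))
    return 100.0 * (counts.max() - counts.min()) / counts.sum()
```

Under this definition, a neuron firing uniformly across the cycle scores 0%, while one firing only in a narrow phase window scores close to 100%.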

In addition to the general differences between VL and PTN activities during locomotion, each of the shoulder-, elbow-, and wrist-related VL groups discharges in anti-phase with its respective PTN counterpart much of the time (**Figures 8** and **11**). There are several possible reasons for this. First, it is possible that TCs direct their main output not to PTNs with similar receptive fields but to those with dissimilar ones. Using electrical stimulation of the MC, we found that the vast majority (72%) of TCs projecting to the distal forelimb representation in the MC had receptive fields on proximal parts of the forelimb. Correspondingly, shoulder-related TC neuron activity is roughly in-phase with that of wrist/paw-related PTNs (**Figures 8C4,D4**). Although we did not find any other statistically significant crossed projections, elbow-related TC activity was in-phase with that of shoulder-related PTNs, and wrist-related TCs as a group were active roughly in-phase with elbow-related PTNs.

A second explanation for the generally antiphasic activity of VL and PTN subpopulations with similar receptive fields is that, in analogy with the somatosensory cortex, where TCs powerfully excite inhibitory interneurons (Swadlow, 2002), PTNs may receive their main input from TCs not directly but via an inhibitory cortical network. This is quite plausible because putative inhibitory interneurons with suitable locomotion-related properties have been seen in the MC (Beloozerova et al., 2003a,c, rabbit, cat; Murray and Keller, 2011, rat). GABAergic inhibitory interneurons are thought to be involved in regulating both the spatial and temporal response properties of cortical neurons (Sillito, 1975; Hicks and Dykes, 1983; Dykes et al., 1984), and it has been demonstrated that they participate importantly in motor-related responses of PTNs, as reduction of cortical GABA<sub>A</sub> inhibition enhances PTN activity during voluntary movements (Matsumura et al., 1992) and postural corrections (Tamarova et al., 2007).

Finally, since among both VL and PTN subpopulations there are neurons that are active in any phase of the stride, it is possible that, although the gross population activities of the VL and PTNs are in anti-phase, individual TC neurons influence those PTNs with which they are active in-phase. This would imply that VL neurons active during different phases of the stride have different divergence/convergence ratios onto different PTNs. For example, in the wrist/paw domain, the few TC neurons active during swing may diverge and powerfully drive many PTNs, while the many TCs that are active during stance converge upon similar overall numbers of PTNs but drive them less powerfully (**Figures 8C3,C4,F3,F4**). These possibilities for the fine organization of the TC-to-PTN projection can and should be tested experimentally.

The activity of the VL is shaped by the operation of the inhibitory feedback through the RE (**Figure 3**). While a wealth of information is available on the properties of RE neurons in brain slices, in anesthetized animals, and during sleep (Steriade et al., 1990; McCormick and Bal, 1997; Funke and Eysel, 1998; McCormick and Contreras, 2001; Hartings et al., 2003; Lam and Sherman, 2005, 2007, 2011; Cotillon-Williams et al., 2008; Sillito and Jones, 2008), the involvement of the RE in the production of movements had not been researched until recently (Marlinski et al., 2012b). In our studies we found that the activity of 90% of RE neurons is step phase-related during locomotion. The fact that the activity of the RE, at both the individual and population level, changes with the phase of the stride indicates that during different stride phases RE neurons exert different influences upon the VL. The activity of all RE subpopulations is more intense during late stance and swing as compared to early stance (**Figures 8** and **11**). This means that their target VL neurons are most inhibited during late stance and swing, allowing only the strongest ascending signals to pass through and reach the MC during these periods. Such a blockade of thalamic transmission permits other inputs to the MC to make a greater contribution to the formation of the cortical output during late stance and throughout the swing phase. In contrast, during early stance, when the activity of RE neurons is lowest and their target VL cells are thus disinhibited, more ascending information passes through the thalamus to the MC, allowing the thalamus to provide a larger contribution to the cortical output during this period.

RE neurons with receptive fields on different segments of the forelimb, and thus likely related to the control of those segments, act differently during locomotion. Wrist/paw-related neurons, which are located ventrally in the nucleus, greatly exceed both shoulder- and elbow-related cells in the magnitude of their population activity modulation (**Figures 8** and **11**). They also have the highest discharge rates and the greatest depths of frequency modulation in the discharges of individual neurons, and are prone to high-frequency bursting. The shoulder-related cells, which are located dorsally in the nucleus, have the lowest discharge rates and depths of modulation and rarely if ever burst. Thus, VL-to-MC signal transmission is most heavily influenced by the RE in the distal limb domain and least influenced in the proximal limb domain.

# **LADDER LOCOMOTION: EXERTING DIFFERENTIAL CONTROLS OVER SHOULDER, ELBOW, AND WRIST/PAW FOR ACHIEVEMENT OF ACCURATE STEPPING**

The ladder adds accuracy requirements to the locomotion task. On the ladder, cats are forced to constrain their paw placement to the raised crosspieces. They step accurately on their tops, showing much less spatial variability in foot placement as compared to simple locomotion (Beloozerova et al., 2010; **Figure 1B**). It has been demonstrated that successful walking with accurate stepping requires visual control (Sherk and Fowler, 2001; Beloozerova and Sirota, 2003; Marigold and Patla, 2008) as well as the activity of the MC and VL (Trendelenburg, 1911; Liddell and Phillips, 1944; Chambers and Liu, 1957; Beloozerova and Sirota, 1988, 1993a, 1998; Metz and Whishaw, 2002; Friel et al., 2007). In our experiments, all neurons that were tested during walking on the flat surface were also tested during locomotion along the ladder.

### **PTN ACTIVITY**

Upon transition from simple to ladder locomotion, 97% of PTNs changed at least one characteristic of their activity, and 76% changed two or more. During ladder locomotion, high proportions of PTNs in all somatosensory response groups, 27–42% depending on the group, increased their average discharge rate as compared to simple walking, on average by 99 ± 74%. Overall, fewer cells decreased their activity. The wrist- and elbow-related groups differed sharply, however: the wrist-related group had a fair number of cells with diminishing activity (40%), while the elbow-related group had only a few (15%). As a result, the average discharge rate of the elbow-related group increased and became similar to that of shoulder- and wrist/paw-related PTNs. The average rate for all PTNs was 19.3 ± 1.2 spikes/s.

The activity of all but three PTNs was stride-related during ladder locomotion. The average depth of modulation was 11.4 ± 0.4%. The same two patterns of modulation were observed in proportions similar to those seen during simple locomotion. Half of shoulder- and wrist-related PTNs increased the depth of modulation, on average by 62 ± 44% (**Figure 9A**). To do this, wrist/paw-related PTNs most commonly decreased their discharge rate during the inter-PEF interval, while shoulder-related neurons could either increase it within the PEF or decrease it in-between the PEFs (**Figures 9B,C**). Decreases of modulation also occurred in these neurons, but only half as frequently. In contrast, a typical response of elbow-related PTNs to the ladder task was a decrease of modulation depth (**Figure 9A**), typically achieved by a decrease in the firing rate during the PEF (**Figure 9D**). About one third of shoulder- and wrist-related PTNs decreased the duration of their PEF, on average by ∼40%, but typically kept the same number of PEFs. In contrast, the elbow-related neurons typically did not change the PEF's duration, but tended to change the number of PEFs by either increasing or decreasing it.
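The notions of a PEF's duration and of one- vs. two-PEF discharge patterns used throughout this section can be made concrete with a small sketch. It identifies PEFs in a binned step-cycle firing-rate profile as circular runs of bins above the cycle-mean rate; this threshold rule (and the function name `find_pefs`) is an illustrative assumption, not the statistical criterion used in the original studies.

```python
import numpy as np

def find_pefs(rate_per_bin):
    """Find periods of elevated firing (PEFs) in a step-cycle histogram.

    A PEF is taken here as a run of consecutive phase bins, treated
    circularly since the cycle wraps around, whose firing rate exceeds
    the cycle-mean rate (an assumed criterion, for illustration).
    Returns a list of (start_bin, end_bin_inclusive) tuples.
    """
    r = np.asarray(rate_per_bin, dtype=float)
    above = r > r.mean()
    n = len(r)
    i = 0
    # If a run of elevated bins wraps across bin 0, start scanning after
    # it so the wrapped run is reported once, from its true starting bin.
    while i < n and above[i] and above[i - 1]:
        i += 1
    pefs, start = [], None
    for k in range(i, i + n):
        j = k % n
        if above[j] and start is None:
            start = j
        elif not above[j] and start is not None:
            pefs.append((start, (j - 1) % n))
            start = None
    if start is not None:
        pefs.append((start, (i - 1) % n))
    return pefs
```

Counting the returned tuples distinguishes one-PEF from two-PEF patterns, and each tuple's span of bins gives that PEF's duration as a fraction of the cycle.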

A number of PTNs, especially within the elbow-related group, changed their preferred phases of activity by discharging either earlier or later in the cycle. However, the phasing preferences of the entire shoulder- and elbow-related subpopulations during ladder locomotion remained largely similar to those during simple locomotion (**Figures 8** and **10**). In shoulder-related PTNs the mean discharge rate during the stance-to-swing transition slightly rose to 24.4 ± 2.9 spikes/s; however, the activity during the opposite phase also rose, reaching 16.1 ± 2.4 spikes/s. Elbow-related PTNs still had a tendency to discharge more intensively during the swing-to-stance transition (**Figures 8** and **10**). In stark contrast to those groups, wrist-related PTNs developed a strong phase preference. While during simple locomotion this group showed only a subtle tendency to discharge more intensively during swing, during ladder locomotion this preference became pronounced (**Figures 8** and **10**): the discharge during swing was now slightly higher and, in addition, the discharge rate during stance substantially decreased. As a result, the difference in the discharge rate of wrist-related PTNs between swing and stance was 14.6 spikes/s during ladder locomotion.

### **VL NEURON ACTIVITY**

Upon transition from simple to ladder locomotion, 79% of VL neurons changed at least one characteristic of their activity. One third of cells changed their discharge rate, either increasing or decreasing it, by 51 ± 7% on average. While the average discharge rate of shoulder-, elbow-, and wrist-related neurons remained similar to that during simple locomotion (23–25.5 spikes/s), the elbow-related VL group differed from both other groups in that it had significantly more neurons whose activity diminished upon transition from simple to ladder locomotion (*p* = 0.01, χ² test). This change in the activity of VL elbow-related neurons directly opposed that of elbow-related PTNs.
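The group comparison above relies on a χ² test of proportions. A minimal sketch of such a test for a 2×2 table is below; the counts in the usage example are invented, the original study's exact table is not given here, and no continuity correction is applied.

```python
from math import erfc, sqrt

def chi2_2x2(a, b, c, d):
    """Pearson chi-square test for the 2x2 table [[a, b], [c, d]].

    a, b: cells of the first row (e.g., diminished / not-diminished
    counts in one neuron group); c, d: the second row. Returns the
    test statistic and the p-value for 1 degree of freedom.
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 df, chi-square is the square of a standard normal, so the
    # upper-tail probability is erfc(sqrt(stat / 2)).
    return stat, erfc(sqrt(stat / 2))

# Invented example: 12 of 30 neurons diminished in one group vs. 5 of 40 in another.
stat, p = chi2_2x2(12, 18, 5, 35)
```

With the invented counts the statistic exceeds the 3.84 critical value for *p* = 0.05, so the difference in proportions would be called significant.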

**FIGURE 9 |** … **transition from simple to ladder locomotion**. **(A)** Comparison of the depth of modulation in the activity of individual MC, VL, and RE neurons. The abscissa and ordinate of each point show the depth of modulation of a neuron during simple and ladder locomotion, respectively. Neurons whose depths of modulation were statistically significantly different between the two tasks are shown with filled diamonds; the others are shown with open diamonds. **(B–E)** Typical changes in the depth of modulation upon transition from simple to ladder locomotion in PTNs. … locomotion, and the bar histograms show activity of the same PTNs during ladder locomotion. Bar graphs beneath the histograms show the proportion of PTNs from each group exhibiting that type of modulation change. **(B)** Increase in the depth of modulation by an additive mechanism. **(C)** Increase in the depth of modulation by a subtractive mechanism. **(D)** Decrease in the depth of modulation by a subtractive mechanism. **(E)** Decrease in the depth of modulation by an additive mechanism. (Adapted with modifications from Stout and Beloozerova, 2012.)

The activity of 92% of all VL neurons was step-related during ladder locomotion, with eight neurons becoming step-cycle modulated only during this complex task. The average depth of modulation was 9.1 ± 0.4%. The same two patterns of discharge modulation as during simple locomotion were expressed: the one-PEF (63% of neurons) and the two-PEF (34% of neurons) patterns. In the shoulder-related group, 32% of cells increased and 10% decreased the depth of modulation, but the average depth of modulation for this subpopulation did not significantly change. In contrast, half of the elbow-related cells increased the depth of modulation, on average by 60 ± 7% (**Figure 9A**), and as a result, the depth of modulation of the elbow-related group increased to 9.5 ± 0.6%. In the wrist-related group, only 15% of cells increased modulation and 15% decreased it, and the average modulation of wrist/paw-related cells remained low. The duration of the PEF was similar across the three VL neuronal groups, averaging 61 ± 1.5% of the cycle; however, in about one third of cells the number of PEFs per cycle changed. Elbow-related neurons differed from both other groups by almost always increasing the number of PEFs on the ladder from one to two, while shoulder- and wrist/paw-related cells more often decreased it from two to one. In approximately one quarter of neurons that were modulated with one PEF during both locomotion tasks, regardless of their receptive field, the preferred phase of the activity on the ladder was different from that during simple locomotion.

**FIGURE 10 | Activities of the shoulder-, elbow-, and wrist/paw-related cells in the thalamo-cortical network during ladder locomotion**. **(A,D,G)** Activity of neurons responsive to movements in the shoulder joint, and/or palpation of back, chest, or neck muscles in the MC **(A)**, VL **(D)**, and RE **(G)**. **(A1,D1,G1)** Phase distribution of PEFs. **(A2,D2,G2)** Corresponding phase distribution of discharge frequencies. The average discharge frequency in each 1/20th portion of the cycle is color-coded according to the scale shown at the bottom. **(A3,D3,G3)** Proportion of active neurons (neurons in their PEFs) in different phases of the step cycle. **(A4,D4,G4)** The mean discharge rate. Thin lines show SEM. Vertical interrupted lines denote the end of swing and the beginning of stance. **(B,E,H)** Activity of neurons responsive to passive movement of the elbow joint in the MC **(B)**, VL **(E)**, and RE **(H)**. **(C,F,I)** Activity of neurons responsive to stimulation of the paw or movement in the wrist joint in the MC **(C)**, VL **(F)**, and RE **(I)**. (Data on the activity of PTNs, VL, and RE neurons are adapted with modifications from Stout and Beloozerova, 2012; Marlinski et al., 2012a,b, respectively.)

**FIGURE 11 |** … the activities of the MC and VL are in anti-phase. (Data on the activity of PTNs, VL, and RE neurons are adapted with modifications from Stout and Beloozerova, 2012; Marlinski et al., 2012a,b, respectively.)

Ventrolateral thalamus neurons with receptive fields involving different joints tended to have their PEFs in different phases of the step cycle (**Figure 10**, two middle columns). Despite changes in the preferred phases of activity of individual neurons, population activity distributions were generally similar to those seen during simple locomotion. Shoulder-related neurons were more active during the swing-to-stance transition, and the mean discharge rate of the stride-related population was higher during this period, at 27.0 ± 3.3 spikes/s, while the firing rate during mid-stance was 10 spikes/s less (**Figure 10D4**). Elbow-related neurons tended to be more active in the opposite phase, reaching a maximum of 30 ± 5.0 spikes/s during late stance and early swing (**Figure 10E4**). Wrist-related neurons were more active throughout stance, at 25–30 spikes/s, while discharging 10–15 spikes/s less during mid-swing (**Figure 10F4**).

### **RE NEURON ACTIVITY**

Upon transition from simple to ladder locomotion, 75% of RE neurons changed at least one characteristic of their activity (**Figure 9A**). During ladder locomotion, wrist-related RE neurons still tended to be more active than either shoulder- or elbow-related cells (29 ± 3.4 vs. 24.5 ± 3.0 spikes/s). The discharge of 91% of all RE cells was modulated with respect to the stride, and, as with the MC and VL neurons, the same two patterns of modulation were observed in proportions similar to those seen during simple locomotion.

There were substantial differences in activity between neurons with different receptive fields (**Figure 10**). As with the VL populations, distributions were generally similar to those seen during simple locomotion. PEFs of shoulder-related cells were distributed rather evenly across the cycle (**Figures 10G1**–**4**), and their average discharge rate was relatively low (23 ± 3.3 spikes/s). They also had a low average depth of modulation (8 ± 1%) and long PEFs (70 ± 3% of the cycle). In contrast, wrist/paw-related cells discharged most intensively during swing and the end of stance, generally sparing the first half of stance (**Figures 10I1**–**4**). They also tended to be more active (29 ± 3.4 spikes/s), were much more deeply modulated (12.4 ± 1.2%), and exhibited shorter PEFs than neurons of any other group (55 ± 4.5% of the cycle). In addition, wrist/paw- and shoulder-related cells still differed dramatically in the production of sleep-type spike bursts. The most frequently bursting wrist/paw-related cell generated a burst nearly every third stride, while shoulder- and elbow-related cells generated very few if any. Three wrist-related neurons had a significantly higher probability of discharging a sleep-like burst during ladder than during simple locomotion (*p* = 0.001, *t*-test). The activity characteristics of elbow-related neurons were intermediate between those of shoulder- and wrist/paw-related cells (**Figures 10H1**–**4**).

### **DISTINCT MC CONTROLS FOR SHOULDER, ELBOW, AND WRIST DURING COMPLEX LOCOMOTION**

It is clear that the MC plays a critical role in the control of accurate stepping, as precise positioning of the limbs is nearly impossible after destruction of the MC or even its short-lasting inactivation (Trendelenburg, 1911; Liddell and Phillips, 1944; Chambers and Liu, 1957; Beloozerova and Sirota, 1988, 1993a; Metz and Whishaw, 2002; Friel et al., 2007). In cats walking on a treadmill belt, it was shown that the activity of many neurons in the MC changes periodically according to the step cycle, and significantly increases during unexpected perturbations and voluntary gait modifications (Armstrong and Drew, 1984a; Drew, 1993; Widajewicz et al., 1995; Drew et al., 1996). In our earlier work, we found that when paw positioning on the surface was restricted such that visually guided adaptation of gait was required to place the paws accurately, the activity of 60–70% of the neurons in the MC, depending on the task, changed dramatically as compared to walking on the flat surface, and the changes in neuronal activity grew as the requirements for accurate foot placement became increasingly demanding (Beloozerova and Sirota, 1993a). Later, we additionally found that, as the accuracy demand on stepping progressively increases, many neurons in the MC progressively refine their discharge timing, producing activity more precisely in a specific and restricted phase of the stride (Beloozerova et al., 2010).

Several lines of evidence indicate that the differences in MC activity during simple and ladder locomotion reflect different modes of cortical descending control during these tasks, not a difference in afferent signals. First, as discussed above, afferent signals appear to play little role in driving locomotion-related responses in MC neurons (Armstrong and Drew, 1984a,b; Beloozerova and Sirota, 1993a,b; Stout and Beloozerova, 2012). Second, in our recent study we examined 229 full-body biomechanical variables of cats walking on the flat surface and along a horizontal ladder with flat rungs placed at a distance convenient for the cat (Beloozerova et al., 2010). We found that on such a ladder cats step on the support surface with much less spatial variability (**Figure 1B**), but the overwhelming majority of other biomechanical variables do not differ between the tasks. This suggests that the afferentation received by the MC during simple and ladder locomotion may be very similar. While it has been shown that the level of fusimotor activity is often higher during difficult motor tasks, especially those that are novel, strenuous, or associated with a high degree of uncertainty (Prochazka et al., 1988; Hulliger et al., 1989), our ladder locomotion task was well practiced, entirely predictable, and, judging from the levels of EMG activity (Beloozerova et al., 2010), not at all strenuous. Thus, it does not seem likely that a difference in proprioceptive afferentation between simple and ladder locomotion can be responsible for the entire volume and spectrum of discharge differences of MC, VL, and RE neurons during these two tasks. Nevertheless, in the majority of these neurons, average and peak discharge rates, depths of stride-related frequency modulation, and durations of PEFs are very different during ladder locomotion as compared to simple walking (**Figure 9**).
We suggest that during ladder locomotion MC activity reflects processes that are involved in integration of visual information with ongoing locomotion and represents cortical commands that control stride length. These controls are different for different joints of the forelimb.

Shoulder-related PTNs often increase their discharge rate and depth of modulation while reducing their discharge duration. They typically do not change their preferred phase, but as a group become more active at the end of stance (**Figures 10** and **11**). Such activity modifications are consistent with the hypothesis that during precise stepping shoulder-related PTNs have a significant role in the planning of limb transfer, which is believed to occur before the end of the stance phase (Laurent and Thomson, 1988; Hollands and Marple-Horvat, 1996), as well as in the initial phases of limb transfer, when adjustment of the foot trajectory is still possible (Reynolds and Day, 2005; Marigold et al., 2006). In addition, during the second half of stance, accurate paw placement of the opposing limb takes place, and precise posture maintenance by the supporting limb is important for balance. This could be another reason for shoulder-related PTNs, specifically those related to shoulder extension, to increase their activity and modulation during stance.

Wrist-related PTNs, whose activity was fairly evenly distributed throughout the cycle during simple locomotion, as a group became strongly modulated, exhibiting a prominent activity peak during swing (**Figures 10** and **11**). In contrast to shoulder-related PTNs, individual wrist-related PTNs often decreased their discharge rate while also increasing their depth of modulation and reducing their discharge duration. Such activity modifications are consistent with the hypothesis that wrist-related PTNs, specifically those related to plantar (ventral) flexion of the wrist, are involved in distal limb transfer during accurate target stepping by ensuring greater plantar flexion of the wrist during the swing phase of ladder locomotion (**Figure 2**). It is well known that activation of the MC results in contraction of more flexor than extensor muscles, and this rule holds during locomotion (Armstrong and Drew, 1985a).

Although both shoulder- and wrist-related PTNs often increase modulation during ladder locomotion as compared to simple walking, they generally do so using different mechanisms (Stout and Beloozerova, 2012). Shoulder-related PTNs often achieve an increase in modulation by increasing their peak discharge rate. This is likely to result in a more intensive signal to the spinal network, often along with more specific timing of the discharge. Wrist-related PTNs achieve increases in modulation chiefly by decreasing their firing outside of the PEF, thus increasing the salience of the signal without making it more intense. This modification could specifically improve the temporal precision of the controls for limb transfer during an accurate stepping task.
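The two mechanisms contrasted above can be sketched numerically. The contrast-style definition of modulation and all firing-rate numbers below are invented for illustration; the point is only that raising the in-PEF rate ("additive") and suppressing the out-of-PEF rate ("subtractive") both deepen modulation, while only the first makes the overall signal more intense.

```python
def modulation(rate_in_pef, rate_out_pef):
    """Peak-vs-trough contrast of the firing rate, in percent.
    (A plausible stand-in for depth of modulation, for illustration only.)
    """
    return 100.0 * (rate_in_pef - rate_out_pef) / (rate_in_pef + rate_out_pef)

base = modulation(20.0, 15.0)        # simple walking (invented rates)
additive = modulation(30.0, 15.0)    # shoulder-like: raise firing inside the PEF
subtractive = modulation(20.0, 5.0)  # wrist-like: suppress firing between PEFs
```

Both manipulations raise the contrast relative to baseline, but only the additive one raises the peak rate sent downstream; the subtractive one sharpens the signal at an unchanged intensity.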

In contrast to shoulder- and wrist-related PTNs, upon transition from simple to ladder locomotion, elbow-related PTNs do not often increase their depth of modulation or discharge duration, but often increase their discharge rate and change their preferred phase. Their group activity becomes evenly distributed throughout the cycle during complex locomotion (**Figures 10** and **11**). The change in the preferred phase and in the number of PEFs might reflect incorporation of visual information about the location of the crosspieces into the CPG activity-based locomotor pattern, serving to "tweak" the limb into place to secure accurate stepping. The generally elevated activity of the elbow-related group is likely to enhance the efficacy of its influence during the complex locomotion task.

An effective way for PTNs to differentially influence different segments of the forelimb during locomotion is to individually influence the respective locomotion pattern formation networks in the spinal cord (McCrea and Rybak, 2008) by modulating the amplitude and potentially the timing of their output. Indeed, Asante and Martin (2010) recently found that in the mouse spinal cord, projections from shoulder-, elbow-, and wrist-related areas in the MC primarily contact those spinal premotor circuits that connect to shoulder-, elbow-, and wrist-related motoneuron pools, respectively. Based on the results of experiments with micro-stimulation of the MC, analogous mechanisms for the control of limb segments have been previously suggested by Drew (1991) for the forelimb and by Bretzner and Drew (2005) for the hind limb of the cat.

# **SIGNALS FROM THE VL-TO-MC DURING ACCURATE STEPPING CONTAIN INTEGRATED VISUO-MOTOR INFORMATION FOR FOOT PLACEMENT, DIFFERENTIATED BY FORELIMB JOINT**

How are motor cortical controls for the shoulder, elbow, and wrist formed? The main subcortical input to the MC comes from the VL. The VL obtains locomotor CPG-generated information from the cerebellum, receives direct input from the spinal cord, and also receives visual information from the cerebellum and probably from the cortex. We found that during locomotion VL neurons discharge in a manner well suited to contribute to the additional modulation of MC activity that occurs during locomotion over complex terrain. Namely, the activity of VL neurons with one PEF is modulated more strongly on the ladder than during simple locomotion, the overwhelming majority of individual VL neurons change their discharges upon transition from simple to ladder locomotion, and the dominant change, similar to that in the MC, is an increase in the depth and temporal precision of the modulation.

What is the content of information conveyed by the VL to the motor cortex during ladder locomotion? Considering the rather similar limb motor patterns (Beloozerova et al., 2010) but dramatically different gaze behaviors (Rivers et al., 2009, 2010) in the two locomotion tasks, we suggest that at least a part of the differences in discharges of VL neurons during simple and ladder locomotion reflects differences in the processing of visual information during these two tasks, as well as the changes in motor commands made on the basis of visual information. During locomotion in complex environments, visual information about the position of the stepping target is first processed through visual networks and then at some point is incorporated into the basic locomotion rhythm in order to guide the limb. From this point on it becomes integrated "visuo-motor" information that, in the afferent sense, is "(processed) visual information," while in the efferent sense it is a "limb control signal" reflecting preparation of the movement. It has been suggested that visual information about the environment is integrated with movement-related information in the cerebellum, and then funneled to the motor cortex via the VL for the control of limb movements (Glickstein and Gibson, 1976; Stein and Glickstein, 1992; Glickstein, 2000). Our data indicate, however, that the VL is more than a simple relay for signals passing to the MC during ladder locomotion. Many VL neurons discharge in

### **REFERENCES**


other motor cortical neurones during locomotion in the cat. *J. Physiol.* 346, 471–495.

Armstrong, D. M., and Drew, T. (1984b). Locomotor-related neuronal discharges in cat motor cortex compared with peripheral receptive fields and evoked movements. *J. Physiol.* 346, 497–517.

different phases of the cycle during simple and ladder locomotion. This shows that information related to the complex environment changes the basic locomotion-related discharge pattern of VL neurons. In our original research report we have described five major modes of this integration (Marlinski et al., 2012a).

# **THE RE DIFFERENTIALLY GATES TC SIGNALS DEPENDING ON LOCOMOTION TASK**

Two thirds of RE neurons change at least one aspect of their activity upon the transition from simple to ladder locomotion. This indicates that the participation of the RE in shaping VL signals going to the MC depends on the task. The mean and peak activities of 33–37% of RE neurons during ladder locomotion differ from those during simple walking, signifying differences in the intensity of regulation of VL-to-MC transmission between the two tasks. Differences in the depth of modulation in 40% of RE neurons indicate differences in the salience of the RE-to-VL influence. Differences in the preferred phase, the duration of PEFs, and/or the number of PEFs, which are often seen in RE neurons between the two tasks, indicate differences in the timing of RE influences on thalamo-cortical signal transmission.

# **CONCLUSION**

In this review, we have presented the results of a series of studies that examined the differences in the activities of shoulder-, elbow-, and wrist/paw-related neurons in the thalamo-cortical network for locomotion. Substantial differences were found both between the subpopulations of neurons with different receptive fields within each of the MC, VL, and RE, and between neurons with similar receptive fields residing in different motor centers. We conclude that the thalamo-cortical network for locomotion processes information related to different segments of the forelimb differently and exerts distinct controls over the shoulder, elbow, and wrist. We hypothesize that this contributes to the effective control of a global limb parameter, the length of the stride, which results in a great reduction in the variability of paw placement during accurate stepping. This is one manifestation of the modular organization of the control of locomotion. The efficacy and contribution of synaptic connections between neurons with similar and dissimilar receptive fields at different sites in the thalamus and cortex need to be determined, however, to further reveal the operation of the thalamo-cortical neuronal network during locomotion.

### **ACKNOWLEDGMENTS**

The authors are grateful to Dr. Boris I. Prilutsky for a fruitful collaboration on the analysis of the biomechanics of cat locomotion, for many insightful discussions, and for a critical review of the manuscript. The research was supported by National Institute of Neurological Disorders and Stroke grants R01 NS-39340 and R01 NS-058659 to Irina N. Beloozerova.



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 23 December 2012; accepted: 30 April 2013; published online: 21 May 2013.*

*Citation: Beloozerova IN, Stout EE and Sirota MG (2013) Distinct thalamo-cortical controls for shoulder, elbow, and wrist during locomotion. Front. Comput. Neurosci. 7:62. doi: 10.3389/fncom.2013.00062*

*Copyright © 2013 Beloozerova, Stout and Sirota. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

# Neural bases of hand synergies

#### *Marco Santello1 \*†, Gabriel Baud-Bovy2,3† and Henrik Jörntell 4†*

*<sup>1</sup> Neural Control of Movement Laboratory, School of Biological and Health Systems Engineering, Arizona State University, Tempe, AZ, USA*

*<sup>2</sup> Department of Robotics, Brain and Cognitive Sciences, Istituto Italiano di Tecnologia, Genova, Italy*

*<sup>3</sup> Faculty of Psychology, Vita-Salute San Raffaele University, Milan, Italy*

*<sup>4</sup> Neural Basis of Sensorimotor Control, Department of Experimental Medical Science, Lund University, Lund, Sweden*

### *Edited by:*

*Andrea D'Avella, IRCCS Fondazione Santa Lucia, Italy*

### *Reviewed by:*

*Kazuhiko Seki, National Institute of Neuroscience, Japan*

*Simon Overduin, University of California, Berkeley, USA*

### *\*Correspondence:*

*Marco Santello, Neural Control of Movement Laboratory, School of Biological and Health Systems Engineering, Arizona State University, 501 East Tyler Mall, ECG Building, Suite 346, Tempe, AZ 85287-9709, USA. e-mail: marco.santello@asu.edu*

*†These authors have contributed equally to this work.*

# **INTRODUCTION**

The structure of the hand, with its intricacy of bones, muscles, tendons, blood vessels, and nerves, is a marvel of evolution yet unsurpassed by any artificial hand. At the functional level, the hand is also a marvel of dexterity and versatility that combines a rich sensory endowment with strength: it holds the scalpel of the neurosurgeon, the pen of the scribe, the brush of the aquarellist as well as the hammer of the blacksmith or the sword of the warrior. One sign of the importance of the hand for humans is its remarkable evolution, with the development of the opposable thumb, which underlies all skilled procedures of which the hand is capable (Napier, 1956, 1980), and, at the neuronal level, of the corticospinal tract, which allows the brain to control the hand in a much more direct way than in other species. The role that the hand plays in almost all our activities and its adaptability to a wide range of behavioral contexts have led some to see the hand as a simple executant of the commands coming from the higher centers in the brain.

Whereas the motor apparatus of the hand offers a tremendous movement range and adaptability (or, in more technical terms, features a very high number of *degrees of freedom,* DoF), this feature also makes the human hand exceedingly difficult to control—as underscored by the challenges faced by robotics and neuroprosthetics in controlling the latest generation of anthropomorphic hands. The biomechanical and functional characteristics of the hand make it a remarkable model to investigate how the brain controls the many DoFs of the body, one of the fundamental problems of motor control (e.g., Bernstein, 1967; Turvey, 2007; Latash, 2008). As pointed out by many investigators, having a large number of DoFs allows one to perform a task in

The human hand has so many degrees of freedom that it may seem impossible to control. A potential solution to this problem is "synergy control," which combines dimensionality reduction with great flexibility. Given its applicability to a wide range of tasks, this has become a very popular concept. In this review, we describe the evolution of the modern concept using studies of kinematic and force synergies in human hand control, the neurophysiology of cortical and spinal neurons, and the electromyographic (EMG) activity of hand muscles. We go beyond the often purely descriptive usage of synergy by reviewing the organization of the underlying neuronal circuitry in order to propose mechanistic explanations for various observed synergy phenomena. Finally, we propose a theoretical framework to reconcile important and still debated concepts such as the definitions of "fixed" vs. "flexible" synergies and the mechanisms underlying the combination of synergies for hand control.

**Keywords: degrees of freedom, premotor neurons, manipulation, motor cortex**

a wide variety of ways, which presents considerable advantages in terms of flexibility and adaptability. For example, one might grasp an object or distribute the forces holding the object differently depending on specific and possibly contingent events such as holding two cups at once, or grasping them using a broken finger. However, the flexibility afforded by the many DoFs of the hand, which allows it to be used in a wide variety of situations, comes also with the price that the central nervous system (CNS) must control a system that is in general vastly more complex than necessary to execute any particular task. This control problem becomes even more daunting if one considers the entire time course of an action. A simple count of the number of mechanical DoFs like the number of joints or muscles in the hand vastly underestimates the complexity of the problem that the motor system must solve since it amounts to counting the number of parameters necessary to specify the hand state at a precise moment in time. Most actions are dynamic and require a constant and mutual adjustment of all elements of the system. For example, reaching for an object requires transporting the hand while orienting and pre-shaping it, as well as making postural adjustments to keep the body's center of gravity within the base of support defined by the positions of the feet. The number of redundant DoFs increases almost without limits if we consider the temporal evolution of the system rather than a static snapshot of it.

The concepts of synergy and "synergy control" have attracted considerable interest in motor control neuroscience in recent years as a possible solution to this problem. According to Turvey (2007), a synergy is "a collection of relatively independent degrees of freedom that behave as a single functional unit – meaning that the internal degrees of freedom take care of themselves, adjusting to their mutual fluctuations and to the fluctuations of the external force field, and do so in a way that preserves the function integrity of the collection" (p. 659). In other words, a synergy is a functional property of a multi-element system performing an action, whereby many elements of the system are or can be constrained to act as a unit through a few coordination patterns to execute a task. In principle, "synergy control" could combine dimensionality reduction with great flexibility.

The main goal of this review is to illustrate how various analyses of hand motor control point to the fact that the control problem of apparently complex manual tasks is solved by extensive dimensionality reduction and that the number of effective DoFs present in a task might actually be quite small. We will start out by describing the peripheral apparatus and the biomechanical characteristics of the hand that are more relevant to this review. We then review work on kinematic synergies, i.e., tasks involving actual hand movements, and proceed with force synergies, i.e., tasks involving different kinds of grasping and the force equilibrium maintained through the fingertips. In the last section we review evidence for synergy control embedded in the "infrastructure" of the nervous system, i.e., the neuronal circuitry involved in hand movement control. Finally, we conclude with open questions and directions for future research.

# **BIOMECHANICAL CONSTRAINTS**

The hand is first and foremost a complex mechanical structure that includes 27 bones actuated by 18 intrinsic and 18 extrinsic muscles by means of a complex web of tendons. A first measure of the complexity of the hand is the number of its joints. A simple kinematic model of the hand typically consists of four DoFs for each finger, four or five DoFs for the thumb, plus one DoF at the radio-ulnar joint and two DoFs at the wrist, yielding a total of 23 or 24 DoFs. A more detailed kinematic model would include additional parameters to describe the arching of the palm that occurs when the thumb tip approaches the tip of the ring or little finger. A complete biomechanical model of the hand would also include the 36 muscles acting on the thumb and fingers and the complex web of tendons that actuate the hand, yielding a total of at least 60 DoFs without even taking into account muscles acting on the wrist and radio-ulnar joints or additional DoFs associated with contact forces. Therefore, even for seemingly trivial motor tasks such as pushing a key with a finger, a complete biomechanical account of the action would require a description of how the combined activation of many muscles contributes to generating the desired motions and contact forces.
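The joint count in the simple kinematic model above can be tallied directly. The per-joint breakdown in the comments below is an illustrative reading of that model, not an anatomical reference:

```python
# Tally of the kinematic DoFs in the simple hand model described in the text.
# The per-joint breakdown is an illustrative assumption.
FINGERS = 4          # index, middle, ring, little
DOF_PER_FINGER = 4   # e.g., 2 at the metacarpal-phalangeal joint, 1 PIP, 1 DIP
DOF_THUMB = 5        # the text allows 4 or 5; the upper bound is used here
DOF_RADIOULNAR = 1   # forearm pronation/supination
DOF_WRIST = 2        # flexion/extension + radial/ulnar deviation

total = FINGERS * DOF_PER_FINGER + DOF_THUMB + DOF_RADIOULNAR + DOF_WRIST
print(total)  # 24 (23 with a 4-DoF thumb)
```

With a 4-DoF thumb the same tally gives 23, matching the 23–24 DoF range quoted above.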

There are several reasons why knowledge of hand biomechanics is important for understanding the neural control of the hand. First, biomechanical constraints might limit the actions of the hand; motor commands must therefore adapt to these constraints, find ways around them, or exploit them. For example, we cannot stick a finger in an S-shaped tube because of the impossibility of folding the two inter-phalangeal joints in different directions. Another constraint is that hand dimensions and digit lengths limit the size of an object that can be grasped with one hand. Second, biomechanical constraints can, in principle, reduce the number of independent DoFs. For example, extrinsic finger flexor and extensor muscles have tendons that span several joints of each finger (see **Figure 1**). Therefore, assuming a perfectly focal activation of a single muscle compartment (e.g., the index finger compartment of *m. flexor digitorum profundus*), contraction of that muscle would cause flexion at several joints. This partly accounts for the difficulty encountered when trying to move only the distal phalange of a finger without also moving the more proximal joints. It should also be noted that the biomechanical architecture constraining coupled actions at multiple joints *within* a digit also characterizes coupled actions *across* digits. Specifically, interconnections between the tendons of hand muscles and long tendons spanning all finger joints limit the extent to which focal activation of hand muscles innervating a finger can isolate torque generation at one joint or at a given finger from torque generation in adjacent fingers [see Schieber and Santello (2004), for a review of peripheral constraints on hand function]. Third, what might appear to be a simple movement, e.g., flexion of one finger around one joint, might require the coordinated action of several hand muscles (Reilly and Schieber, 2003; Schieber and Santello, 2004).
Fourth, and most importantly, biomechanics is crucial to understanding the mechanical functions of a muscle and how motor commands adapt to task conditions. For example, finger posture has been shown to significantly affect the mapping between muscle activation and isometric joint torque production (Kamper et al., 2006).

This brief description of the hand anatomy should make clear that the brain does not directly control the movement of the individual muscles or joints of the hand. Therefore, the biomechanical structure of the hand defines a complex mapping between the motor commands and the observable motions or contact forces at the digits.

# **KINEMATIC SYNERGIES**

In the past 20 years, hand kinematics has been studied in a variety of tasks. Despite significant differences in the requirements of these tasks, all of these studies share a main observation: simultaneous motion of multiple digits occurs in a consistent fashion, even when the task may require a fairly high degree of movement individuation such as grasping a small object or typing.

### **COVARIATION OF FINGER MOVEMENTS**

The word "synergy" has been used in several contexts when describing thumb and finger movements, often referring to qualitative observations of movement patterns that are characterized by simultaneous motion of the fingers in the early stages of development (palmar grasp reflex) and for the purpose of classifying hand movements (Elliott and Connolly, 1984) or grasp postures along a functional gradient (Napier, 1956, 1980; Cutkosky, 1989). The first attempts at quantifying the kinematics of hand synergies focused on the spatial and temporal coordination of digit movements in serial tasks requiring isolated motion of one digit, i.e., typing (Fish and Soechting, 1992; Flanders and Soechting, 1992; Angelaki and Soechting, 1993; Gordon et al., 1994; Gordon and Soechting, 1995; Soechting and Flanders, 1997), and later extended to movement sequences involving motion of one or more digits [in piano playing and finger spelling (Engel et al., 1997; Jerde et al., 2003a,b)]. These studies found that when subjects typed a single letter with a finger ("focal movement"), motion that was not necessary to complete the task ("corollary motion") also occurred at other fingers in a subject-dependent but stereotypical fashion. These studies also reported that the degree of correlation of movement across pairs of digits—stronger for adjacent than non-adjacent digits and higher when neither of a pair of fingers is used to strike the key—was not obligatory, hence not uniquely due to biomechanical constraints (Fish and Soechting, 1992; Angelaki and Soechting, 1993; Soechting and Flanders, 1997). These early observations led to the suggestion that "*...* synergistic finger motions would simplify the problem of controlling the large number of DoFs inherent in motion of all of the fingers of the hand *...*" (Fish and Soechting, 1992). 
A subsequent typing study further revealed that only a few principal components (PCs) could characterize motion of a subset (17) of all kinematic DoFs of the hand, thus implying "*...* a reduction in the number of dofs independently controlled by the nervous system" (Soechting and Flanders, 1997) stemming from musculoskeletal and neural constraints. The consistency with which these constraints operate significantly facilitates hand shape recognition in finger spelling due to "leakage" of information across finger joint angles (Jerde et al., 2003a). Similarly to what was found for typing movements, however, covariations in finger joint excursions exhibit some degree of task-dependency as indicated by the sensitivity of hand shape to the preceding or following letter (Jerde et al., 2003b). A recent application of PC analysis has been introduced to study sensorimotor transformations by quantifying humans' ability to perceive and reproduce hand postures with the contralateral hand as a function of grasping force and forearm orientation relative to gravity (Pesyna et al., 2011).

# **DIMENSIONALITY REDUCTION IN THE ANALYSIS OF FINGER MOVEMENTS DURING REACH-TO-GRASP**

Research on synergy control has been devoted to developing analytical techniques to reveal if and how a reduction in the dimensionality of the hand control space is attained. In this section we discuss the reduction in dimensionality of hand kinematics defined as joint angles observed during grasping and reach-to-grasp movements.

The concept of synergies, often quantified and defined through dimensionality reduction techniques [principal components analysis, PCA; singular value decomposition, SVD; nonnegative matrix factorization, for review see Tresch et al. (2006)], has also been invoked when describing systematic covariations of joint angular excursions of hand postures used for grasping. The first description of hand postural synergies for grasping movements (Santello et al., 1998) was based on grasping a large number of imagined objects with different sizes and shapes. This design was motivated by the fact that hand shape at contact with an object results from central planning as well as the mechanical interaction of the hand with the object. By asking subjects to shape the hand to imagined objects, one could examine the central planning of hand posture as a function of object shape. Subsequent work on grasping real objects (Santello et al., 2002) confirmed the main observations made on hand posture used for grasping imagined objects by revealing, as expected, a larger number of PCs when physical contact was allowed.

Similar to the results of PCA of typing movements (Soechting and Flanders, 1997), the main finding of the study by Santello et al. (1998) was that a few linear combinations of the 15 DoFs that were measured could account for most of the variance in the original set of hand postures. The lower order components (PC1-3) described more "basic" patterns of finger motion, e.g., hand opening/closing caused by motion at all metacarpal-phalangeal or proximal-interphalangeal joints. Interestingly, however, hand shapes reconstructed by adding higher order PCs (up to the fifth or sixth) provided additional information about the object. These observations led to the suggestion that the control of hand posture might be implemented by a combination of postural synergies ranging from those responsible for the general shape of the hand (lower PCs) to those responsible for subtler kinematic adjustments (Santello et al., 1998). A similar framework was also proposed when interpreting the results of an earlier study on matching haptically or visually perceived object size (Santello and Soechting, 1997). As noted by the authors, PCs do not need to have any physical significance, and therefore they cannot be used to infer the relative contribution of peripheral constraints vs. central commands in generating coupled motion of the digits. However, transcranial magnetic stimulation (TMS) of primary motor cortex can elicit synergistic finger movement patterns similar to those found when the same subjects grasped imagined objects (Gentner and Classen, 2006), suggesting a modular organization of cortical representations of hand muscles. Nevertheless, and as noted above for finger movement patterns in typing, covariation patterns of finger motion are not obligatory.
This implies that hand synergies revealed by the above studies are not a mere byproduct of anatomical factors such as multi-tendoned and multi-joint muscles, and therefore that the CNS retains the ability to partially override anatomical constraints in a task-dependent fashion. Specifically, finer modulation of neuromuscular activity might be required to overcome peripheral coupling when the task requires independent finger actions (Santello et al., 1998) or to exploit it when multi-digit actions require all digits to act as a unit (Schieber and Santello, 2004). These considerations also explain why manipulations characterized by different requirements in the spatio-temporal coordination of the digits can elicit different numbers and patterns of PCs (Todorov and Ghahramani, 2004). Similarly, task-dependencies in finger joint angle covariations have also been reported when comparing whole-hand grasping of one object vs. individuated finger movements (Braido and Zhang, 2004). However, the relatively large number of PCs associated with haptic exploratory procedures can be used to reconstruct grasp postures, indicating some commonalities of digit movement coordination patterns across these tasks (Thakur et al., 2008).
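The flavor of these PCA analyses can be illustrated with a minimal sketch on synthetic data: if a set of "hand postures" over 15 joint angles is generated from only three underlying coordination patterns plus noise, the first few principal components recover most of the variance. All names and numbers below are illustrative and are not data from the cited studies:

```python
import numpy as np

# Synthetic "grasp postures": 100 samples of 15 joint angles generated
# from 3 latent coordination patterns plus measurement noise.
rng = np.random.default_rng(0)
n_postures, n_joints, n_latent = 100, 15, 3

synergies = rng.normal(size=(n_latent, n_joints))      # latent coordination patterns
activations = rng.normal(size=(n_postures, n_latent))  # per-posture weights
angles = activations @ synergies + 0.1 * rng.normal(size=(n_postures, n_joints))

# PCA via SVD of the mean-centered data matrix.
X = angles - angles.mean(axis=0)
_, s, _ = np.linalg.svd(X, full_matrices=False)
var_explained = s**2 / np.sum(s**2)

# The first 3 PCs capture nearly all of the variance, mirroring the finding
# that a few components account for most hand-posture variability.
print(np.cumsum(var_explained)[:4])
```

As in the studies reviewed above, the recovered components are defined only up to linear mixing, which is one reason PCs need not have any direct physical interpretation.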

Hand kinematics and coordination patterns of multi-digit motion, although not always quantified or defined as synergies in a lower dimensional space, have also been examined in terms of the temporal evolution of hand shape during reach-to-grasp as a function of object geometry (Santello and Soechting, 1998), sudden changes of either object shape (Ansuini et al., 2008), gravitational conditions (Micera et al., 2002), task in healthy subjects (Mason et al., 2001; Ansuini et al., 2006, 2008) and neurologically impaired individuals (Schettino et al., 2004, 2006, 2009; Ansuini et al., 2010), and sensory feedback in humans (Santello et al., 2002; Schettino et al., 2003; Winges et al., 2003) and nonhuman primates (Mason et al., 2004; Theverapperuma et al., 2006). These studies have characterized tendencies in task and/or sensory modality-specific kinematic coordination patterns. For example, it has been shown that online visual feedback of the hand and/or object are not necessary for whole-hand shaping to object geometry (Santello et al., 2002; Mason et al., 2004) but it might nevertheless be used to modulate finger motion in the late portion of the reach (Schettino et al., 2003). Another important observation is that the temporal evolution of hand posture to object shape in monkeys occurs through a stereotypical temporal coordination of finger motion superimposed on a movement amplitude scaling (Theverapperuma et al., 2006). More recently, dimensionality reduction in hand control has been described during bimanual grasping (Vinjamuri et al., 2008) and while learning cursor control through finger movements (Vinjamuri et al., 2009). These authors have also used PCA and SVD analyses to characterize digit joint velocities of grasping movements performed at natural (Vinjamuri et al., 2007) and fast speed (Vinjamuri et al., 2010a,b).

In summary, all these studies of hand kinematics point to the same, main conclusion: the dimensionality of the kinematic space of a large repertoire of hand behaviors is significantly smaller than that defined by the number of the hand's mechanical DoFs.

# **FORCE SYNERGIES**

Grasping can be defined as holding an object stationary within the hand<sup>1</sup>. Precision grasps predominantly involve the fingertips while power grasps usually involve contact with more extended parts of the hand such as the palm. Grasp planning requires that one selects a grasp (e.g., whether to use a precision or power grasp) as well as the position of the contact points on the object for a particular grasp. In precision grasps, the position of the fingertips on the object plays an important role in determining the net force/torque that can be produced. It should be noted that some choices might be incompatible with the task constraints. For example, it might be impossible to use a pinch grasp to lift a very slippery object. Alternatively, multiple grasps may be compatible with task demands. Finally, some choices might lead to desirable properties of the grasp. For example, force closure ensures that it is possible to produce an arbitrary net force and/or torque while form closure ensures a stable grasp in the absence of friction forces (Bicchi et al., 2011).

Once a contact is established, a precise control of the contact forces with the object and/or net force is necessary in order to be able to perform any skilled manipulation. For example, when holding an object immobile in the air, the net force must balance exactly the gravitational force. Similarly, the use of most tools requires a precise control of interaction forces. While a complete mechanical analysis of the finger forces during this phase is outside the scope of this review (see **Inset 1**), task and frictional constraints do not, in general, fully specify the contact forces. In other words, from a control point of view, there is usually an infinite number of ways of setting the contact forces that are compatible with all constraints (Murray et al., 1994; Li et al., 1998; Zatsiorsky et al., 2003a).

The extent to which the CNS can control finger forces independently has been studied in multi-finger pressing tasks where one must produce force by pressing on a surface with one or more fingers. In these studies, a sensor measured the force produced by each finger while subjects were instructed to produce certain levels of force. A common observation is that all fingers produced forces even when the participant was instructed to apply a force with only one finger (Li et al., 1998; Reilly and Hammond, 2000, 2004; Zatsiorsky et al., 2000). This phenomenon, called *enslaving*, is in agreement with the observation that it is difficult to move one finger without moving the others [Kilbreath and Gandevia, 1994; Hager-Ross and Schieber, 2000; Lang and Schieber, 2003; Kim et al., 2008; see above *Kinematic Synergies* section].

In tasks involving grasping, lifting, and holding an object against gravity with five digits, the variations of the contact forces while holding an object have also been analyzed to understand the spatial and temporal coordination of multi-digit forces. For example, Santello and Soechting (2000) described consistent in-phase relations between pairs of digit normal forces. These coordination patterns, also found regardless of hand dominance or object property predictability (Rearick and Santello, 2002), could theoretically be dismissed as a by-product of biomechanical factors constraining the temporal relations among digit forces. However, a subsequent study revealed these patterns to be task-dependent as they disappear when the same forces are applied to the object resting on a table as opposed to being held against gravity (Rearick et al., 2003). These findings suggest an active, neural component compensating for temporal fluctuations in multi-digit forces to satisfy the above-described constraints of mechanical equilibrium.

<sup>1</sup>In-hand manipulation of objects is not reviewed here as it has not been studied to the same extent.

### **Inset 1 | Task constraints and degrees of freedom in the control of contact forces.**

Force synergies reflect coordination patterns at the level of the contact forces. This inset gives an overview of the contact forces involved in precision grasps in order to clarify the parameters that need to be controlled and the task constraints that act at this level. Contacts in precision grasps involve small areas of the hand such as the fingertips and can be modeled as a force applied at a point on the object together with a moment along the normal of the contact surface [the so-called *soft finger model*, (Murray et al., 1994)]. According to this model, four parameters are needed to describe each contact. More complex models accounting for the geometry and/or compliance of the finger pads have been developed in robotics but are outside the scope of this article as they are rarely used to analyze human grasps.

When grasping and manipulating an object, *task constraints* define the net force and moment. For example, when holding an object immobile in the air, the net force must oppose the gravitational force and the net moment must be 0. To satisfy these constraints, the contact forces must satisfy *equilibrium equations* that relate the finger forces to the net force and moment. In addition, each contact force must also satisfy a *frictional constraint* which specifies that the normal force (i.e., the force component perpendicular to the contact surface) must be larger than the tangential force (i.e., the force component parallel to the contact surface) divided by the coefficient of friction of the contact surface to avoid a slip of the finger. Geometrically, this constraint states that the contact force must lie in the so-called friction cone, the aperture of which depends on the coefficient of friction of the surface (see **Figure 2**). Often, this constraint implies that one must squeeze the object with more force to increase the normal forces when the load (i.e., the tangential force) increases.
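The frictional constraint can be written as a one-line test: a contact force lies inside the friction cone when its tangential magnitude does not exceed the normal component times the coefficient of friction. The following sketch checks this condition numerically; the function name and force values are illustrative:

```python
import numpy as np

def in_friction_cone(force, normal, mu):
    """Return True if a 3D contact force lies inside the friction cone.

    force: contact force vector; normal: inward surface normal; mu: friction coefficient.
    """
    normal = np.asarray(normal, dtype=float)
    normal /= np.linalg.norm(normal)
    f_n = float(np.dot(force, normal))            # normal component
    f_t = float(np.linalg.norm(force - f_n * normal))  # tangential component
    return f_n > 0 and f_t <= mu * f_n

# A 2 N tangential load with a 4 N squeeze (mu = 0.6): 2 <= 0.6 * 4, no slip.
print(in_friction_cone(np.array([4.0, 0.0, 2.0]), [1, 0, 0], mu=0.6))  # True
# The same squeeze with a 3 N load: 3 > 2.4, the finger would slip.
print(in_friction_cone(np.array([4.0, 0.0, 3.0]), [1, 0, 0], mu=0.6))  # False
```

The second call illustrates the remark above: when the load increases, one must squeeze harder (increase the normal force) to remain inside the cone.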

In the analysis of contact forces, the complexity of the control task is often defined in terms of the number of redundant DoFs, i.e., the number of DoFs that are not constrained by the task. In general, the number of redundant DoFs increases with the number of digits involved in the grasp. For example, in precision grasps where the structure of the hand allows control of the direction and magnitude of the force at each fingertip independently, there are at least 3 redundant DoFs in a tripod grasp (left panel of **Figure 2**) and 9 redundant DoFs in a five-digit grasp. This high degree of redundancy raises the question of how the brain selects one solution among all possible ones (Bernstein, 1967).
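The redundancy counts above follow from simple arithmetic: if each fingertip contributes three independently controlled force components, the six equilibrium constraints (net 3D force and net 3D moment) leave `3n − 6` free parameters for an `n`-digit grasp. A sketch under that simplifying assumption (the function name is illustrative):

```python
# Redundant DoFs of a precision grasp under the simplifying assumption that
# each digit contributes 3 independently controlled force components and the
# held object imposes 6 equilibrium constraints (net force + net moment).
def redundant_dofs(n_digits, dofs_per_contact=3, n_constraints=6):
    return n_digits * dofs_per_contact - n_constraints

print(redundant_dofs(3))  # tripod grasp: 3 redundant DoFs
print(redundant_dofs(5))  # five-digit grasp: 9 redundant DoFs
```

The text's "at least" qualifier reflects that richer contact models (e.g., the soft finger model with 4 parameters per contact) would only increase these counts.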

Counting the number of degrees of freedom is not without difficulties, however (see also Introduction). In the analysis of force synergies, the number of DoFs of the grasp is often defined as the total number of parameters needed to specify the contact forces. However, this definition assumes that all parameters can be controlled independently, which is not always the case. For example, when a finger makes two contacts with an object, it is impossible to control the direction and magnitude of the two contact forces independently. Similarly, when holding a pen, the structure of the hand does not afford control of the contact force between the pen and the palm independently of the contact force between the fingertips and the pen (see right panel of **Figure 2**). In these cases, a more complex analysis of the structure of the hand is necessary to identify the effective number of DoFs.

A striking observation in force production and grasping tasks is their high inter-trial variability. For example, in multi-finger force production tasks where the total force is constrained, the normal forces produced by each finger vary much more than the total force (Latash et al., 2001). In such tasks, the fingers clearly act synergistically to produce a stable performance.

### **HIERARCHICAL CONTROL OF CONTACT FORCES**

To explain force synergies observed in grasping, a general idea is that contact forces are controlled hierarchically. In other words, higher-level constraints might impose coordination patterns between the contact forces.

Latash and collaborators have proposed that the variables controlled by the CNS are not the individual finger forces but the *force modes*, i.e., force patterns distributed across fingers reflecting the enslaving phenomenon in force or grasping tasks (Danion et al., 2003). Their analysis focused on the normal component of the contact forces and they assumed the existence of five force modes, one per finger. The instruction of pressing against a surface with only one finger would presumably activate one mode that primarily controls the instructed finger. Different force patterns required in other tasks would be obtained by activating simultaneously two or more force modes. Follow-up studies extended this idea to various grasping tasks (Zatsiorsky et al., 2003b). Conceptually, the force modes correspond to a re-parameterization of the finger force parameters but it is not clear whether they actually reduce the number of parameters that the CNS needs to control. Moreover, a comparison between the results obtained from the same individuals in pressing and grasping tasks showed that force modes could be both task-specific (Olafsdottir et al., 2005) and subject-dependent (Gao et al., 2003). This flexibility, however, raises the question of whether controlling an alternative set of variables (the force modes) to control contact forces is actually simpler than controlling contact forces directly. Therefore, one might as well look for multi-digit synergies not at the level of force modes, but at the level of the components of the contact forces themselves [(Latash, 2008), p. 207].
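A toy reconstruction may help fix ideas; the mode matrix below is invented for illustration and is not fitted to any of the data sets cited above. Each column is one hypothetical force mode (the pattern of normal forces across four fingers produced by the instruction to press with one finger), with off-diagonal entries modeling enslaving:

```python
import numpy as np

# Hypothetical force-mode matrix: column j is the across-finger force
# pattern evoked by the instruction to press with finger j. Off-diagonal
# entries model enslaving: non-instructed fingers produce some force too.
M = np.array([
    [1.0, 0.2, 0.1, 0.1],   # index
    [0.2, 1.0, 0.2, 0.1],   # middle
    [0.1, 0.2, 1.0, 0.2],   # ring
    [0.1, 0.1, 0.2, 1.0],   # little
])

# Under the force-mode hypothesis, a task-level command activates modes,
# not fingers; the observed finger forces are the superposition of the
# activated modes.
mode_activation = np.array([2.0, 1.0, 0.0, 0.0])  # mostly index + middle
finger_forces = M @ mode_activation               # [2.2, 1.4, 0.4, 0.3]
```

The example also illustrates the conceptual point made above: the modes re-parameterize the finger forces (four modes for four fingers) without by themselves reducing the number of variables to be controlled.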

The analysis of the variability in motor tasks by the same group of researchers led them to formulate the *uncontrolled manifold (UCM) hypothesis,* which they have used to discover and quantify synergies (Scholz and Schoner, 1999; Scholz et al., 2003; Kang et al., 2004). The hierarchical nature of the UCM hypothesis is reflected in the distinction between *elemental* and *performance* variables. Elemental variables are loosely defined as related to the parts of the system (e.g., components of the contact forces) while performance variables are related to the task (e.g., production of a given total force across multiple digits). This classification is central to establishing another distinction between the so-called bad variance (VB), i.e., variability of the elemental variables that would lead to a loss of precision in the task, and good variance (VG), i.e., variability of elemental variables that would compensate each other and thus not affect the performance. A large ratio VG/VB would be the sign of a strong synergy while a smaller ratio would correspond to a weak synergy.
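The VG/VB decomposition can be sketched for the simplest case, a two-finger total-force task (the simulated numbers below are our own illustration). Trial-to-trial variability along the direction (1, -1)/sqrt(2) leaves the total force unchanged (good variance), whereas variability along (1, 1)/sqrt(2) changes it (bad variance):

```python
import numpy as np

# Simulated trials with strong negative covariation between the two
# fingers, so that the total force F = f1 + f2 is much more stable
# than either individual force (the signature of a strong synergy).
rng = np.random.default_rng(0)
n_trials = 200
shared = rng.normal(0.0, 1.0, n_trials)            # negatively covarying part
f1 = 5.0 + shared + rng.normal(0.0, 0.2, n_trials)
f2 = 5.0 - shared + rng.normal(0.0, 0.2, n_trials)

data = np.column_stack([f1, f2])
data = data - data.mean(axis=0)                    # center across trials
ucm_dir = np.array([1.0, -1.0]) / np.sqrt(2.0)     # task-irrelevant direction
orth_dir = np.array([1.0, 1.0]) / np.sqrt(2.0)     # task-relevant direction

vg = np.var(data @ ucm_dir)    # variance within the UCM (does not affect F)
vb = np.var(data @ orth_dir)   # variance orthogonal to the UCM (affects F)
synergy_index = vg / vb        # >> 1 signals a strong force synergy
```

In the full UCM method the elemental space is higher-dimensional and the UCM is derived from a model of the performance variable, but the logic of projecting trial-to-trial variance onto task-irrelevant and task-relevant subspaces is the same.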

Another proposal to explain the observed coupling between finger forces is the *virtual finger (VF) hypothesis*. This hypothesis proposes that the brain controls the position and force of one or more VFs at a higher level of the control hierarchy (Iberall et al., 1986; Baud-Bovy and Soechting, 2001; Shim et al., 2003; Zatsiorsky et al., 2003b; Smith and Soechting, 2005). At a lower level, the forces produced by the physical fingers would be coordinated to match the constraints induced by the VF(s). Unlike the force mode hypothesis, the VF hypothesis does not determine a fixed pattern of covariation between the contact forces because the constraints induced by the VF(s) still leave redundant DoFs in the grasp. For example, the VF hypothesis in the tripod grasp can account for the coupling between contact forces of the index and middle fingers, but does not fully specify their directions, which might be optimized to increase the stability of the grasp as a function of object shape (Baud-Bovy and Soechting, 2001).

One issue with hierarchical control schemes is that higher-level constraints do not in general specify all parameters at the lower level, which raises the question of how the remaining parameters are selected (see **inset 1**). One partial answer is that these parameters might be selected so as to maximize the stability and efficiency of the grasp. More generally, the core idea of optimal control is that behavior reflects the optimum of some cost function (Todorov and Jordan, 2002; Todorov, 2004). In the context of grasping, this general framework has been used to explain how both contact positions (Friedman and Flash, 2007; Lukos et al., 2007, 2008; Ciocarlie et al., 2009; Fu et al., 2010, 2011; Craje et al., 2011; Gilster et al., 2012) and contact forces (Hershkovitz et al., 1997; Chalfoun et al., 2004; Pataky et al., 2004; Baud-Bovy et al., 2005; Niu et al., 2009; Terekhov et al., 2010) are selected. However, a problem with this framework is that it does not explain how the optimal solution might actually be computed in a biologically plausible manner.
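As a deliberately simplified sketch of how a cost function can resolve the redundancy (not the model of any cited study): among all finger forces satisfying a linear equilibrium constraint, the one minimizing a squared-force "effort" cost is given in closed form by the Moore-Penrose pseudoinverse:

```python
import numpy as np

# Three fingers pressing along the same axis must jointly produce 6 N.
# The equilibrium constraint is A f = b; any f on this constraint
# surface is feasible, and the minimum of ||f||^2 subject to it is the
# minimum-norm solution f = pinv(A) @ b.
A = np.array([[1.0, 1.0, 1.0]])   # equilibrium: f1 + f2 + f3 = 6
b = np.array([6.0])

f = np.linalg.pinv(A) @ b         # load shared equally: [2., 2., 2.]
```

With this particular cost the load is shared equally across fingers; other costs (e.g., weighting fingers by strength, or penalizing deviation from the friction cone) would select different, but equally determinate, solutions.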

Another possibility might be that the CNS does not control all parameters of the grasp (e.g., contact forces, finger positions, net force, and torque). Actually, this might be excessively difficult to do because it would require the CNS to have an inverse model of the complex biomechanical structure of the hand in order to activate the muscles to obtain a very specific level of force at each contact point to satisfy the equilibrium equations. Instead, the CNS might rely on the compliance of the fingers to balance the contact and external forces. In this case, the coordination patterns observed between the contact forces could reflect both the passive properties of the hand-holding-an-object system and the active contribution of the CNS (Ostry and Feldman, 2003; Winges et al., 2007; Gabiccini et al., 2011). Recent work has been conducted to re-interpret multi-digit synergies in the framework of the equilibrium point hypothesis (Pilon et al., 2007; Latash et al., 2010).

# **MUSCLE SYNERGIES**

The hypothesis that movements are generated by combining the activity of small groups of muscles has been extensively studied since the early observations of Sherrington (1910) on the wide range of reflex-mediated movement patterns in the cat. These combinations are generally referred to as "muscle synergies" [for recent reviews, see Ting and McKay (2007), Tresch and Jarc (2009)]. Whereas the number of muscles acting around one or more joints is finite, the number of muscle synergies can theoretically be very large when considering modulation of the timing at which muscles can be activated relative to each other and/or the number of motor units that can be recruited in a given muscle. The timing of muscle and motor unit activation is best measured by direct recordings of their activity through electromyographic (EMG) recordings.

# **EMG AND THE STUDY OF MUSCLE SYNERGIES IN THE HAND**

EMG has been extensively used as a tool to study the spatial and temporal coordination of multiple muscles. The characteristics of the EMG signals recorded from active muscles, e.g., their magnitude, frequency content, and/or timing, all reflect the net output of the interactions of neural inputs to the spinal alpha-motoneurons (alpha-MNs). Therefore, EMG signals recorded from muscle fibers innervated by the motor nuclei of hand muscles have been studied to determine the organization and plasticity of the neural drive responsible for coordinating multiple hand muscles during individual finger movements or multi-digit movements, e.g., grasping and manipulation.

The divergence of inputs to motor units of hand muscles has been quantified by measuring temporal and/or frequency correlations in EMG signals (motor-unit synchrony and coherence, respectively) at the single unit level or as multi-unit interference EMG [for review see Santello (in press)], as well as by measuring correlations in the magnitude of EMG signals across muscles acting on a single digit (e.g., Valero-Cuevas et al., 1998; Valero-Cuevas, 2000), on two digits during force production (Maier and Hepp-Reymond, 1995), and on three digits during object hold (Danna-Dos Santos et al., 2010; Poston et al., 2010). The existence of these correlations is interpreted as denoting common inputs to hand motor nuclei, and could therefore be considered a manifestation of the "building blocks" of muscle synergies. Correlations in the EMG amplitude of hand muscles, quantified through principal component (PC) analysis, have also been described for whole-hand grasping (Weiss and Flanders, 2004) as well as for tasks that do not involve contact forces, e.g., finger spelling (Weiss and Flanders, 2004; Klein Breteler et al., 2007). In non-human primate models, covariation in the EMG amplitude of hand muscles has been described when reaching to grasp objects with different shapes (Brochier et al., 2004). A recent study provided further evidence for muscle synergies in a non-human primate model by revealing the existence of a small number of EMG synergies that could capture the variance of EMG activity patterns elicited by grasping objects of variable shapes and sizes (Overduin et al., 2008). A follow-up study further revealed that similar postures could be elicited by electrical microstimulation of motor cortical areas (Overduin et al., 2012).

Human studies using analyses of EMGs from all muscles inserting on the index finger during force production in different directions revealed that the variance-per-dimension (one for each of the 7 muscles) was smaller in the task-relevant subspace than in the task-irrelevant subspace (Valero-Cuevas et al., 2009). This finding supports the "principle of minimum intervention" of optimal sensorimotor control (Todorov and Jordan, 2002; Todorov, 2004) and is hence compatible with the earlier framework of synergies proposed by Bernstein (1967) (see also the above-discussed uncontrolled manifold hypothesis). However, the non-negligible variance in all seven dimensions was also interpreted as evidence against the framework of muscle activation as resulting from the combination of a small set of synergies. A recent study further revealed that the EMG amplitude of intrinsic muscles is modulated in tandem with that of extrinsic muscles when holding an object at different wrist angles, even though wrist posture changes the length and moment arm of extrinsic muscles only (Johnston et al., 2010a). The mechanisms underlying these covariations and the cost function(s) minimized through these coordination patterns remain to be understood. Nevertheless, these findings indicate that the CNS consistently exploits a small set of solutions (motor commands) among many equally valid ones. The question of the extent to which the above-described covariation patterns of EMG amplitudes can be flexibly modulated to task conditions or, conversely, reflect relatively rigid neural constraints is addressed in the Discussion.

### **INTERPRETATION OF SYNERGIES IN EMG RECORDINGS**

When examining the activity of concurrently active motor units, the nature of these common inputs can be further distinguished depending on the lags between action potentials, near-synchronous discharges being indicative of shared inputs from branched axons of single last-order neurons (short-term synchrony) and longer lags denoting synchrony of separate presynaptic inputs to the alpha-MNs (Kirkwood, 1979). Inferences about the mechanisms responsible for coherence between spike trains of motor units (EMG-EMG coherence) can be made based on the range of frequencies across which significant correlations occur [i.e., periodic or non-periodic inputs (Halliday and Rosenberg, 2000; Taylor and Enoka, 2004)], coherence strength (e.g., Johnston et al., 2005, 2010b; Winges et al., 2006), as well as by quantifying the relation between motor-unit synchrony and coherence (e.g., Semmler et al., 2003; Johnston et al., 2005) [for review, see Grosse et al. (2002)]. Motor-unit synchrony has been reported to occur both within and across muscle compartments of finger flexor muscles (e.g., Hockensmith et al., 2005; Winges et al., 2008) and extensor muscles (Keen and Fuglevand, 2004). Furthermore, in static grasp tasks involving multi-digit force coordination, e.g., object hold, it has been found that common neural input is heterogeneously distributed across hand muscles. Specifically, extrinsic hand muscles (e.g., the long flexors of the fingers) tend to receive stronger common input than intrinsic hand muscles (Winges and Santello, 2004; Johnston et al., 2005; Danna-Dos Santos et al., 2010; Poston et al., 2010).

It should be noted that alpha-MNs are very large neurons, receiving thousands of synaptic inputs on their dendritic tree. Individual synaptic inputs generally have an amplitude of 0.1 mV or less (Asanuma et al., 1979), meaning that the effect of inputs from a single afferent neuron is normally negligible. It follows that synergistic muscle activation requires that the corresponding alpha-MNs share direct inputs from corticospinal tract neurons as well as indirect inputs through a relatively large number of spinal premotor neurons and that these inputs are activated in a concerted fashion (**Figure 3**). Due to the huge number of synaptic inputs to alpha-MNs and the stochastic properties of the spike firing in alpha-MNs (Moritz et al., 2005), it is difficult to establish the exact mechanisms underlying the moderate strength of synchrony across motor units of different hand muscles, or the difference in the extent of across-motor unit synchrony between long finger flexors and intrinsic muscles such as first dorsal interosseus and first palmar interosseus (Winges et al., 2008). Specifically, EMG recordings can only capture the *net* effect of excitatory and inhibitory inputs, while precluding a clear distinction between these two types of inputs. However, interestingly, synchronized discharge in alpha-MNs, at frequencies which are not present in the excitatory input signal to these neurons, may be generated for example by Renshaw inhibitory interneurons (Williams and Baker, 2009) or possibly other types of spinal inhibitory interneurons driven by Ia afferents (Jankowska, 1992).

# **NEURAL SYNERGIES**

In the previous sections, we have presented numerous behavioral and physiological observations indicating that the CNS operates in terms of synergy control with respect to the coordination of multi-digit movements and forces. Do the properties of the underlying neuronal circuitry support this notion? Whereas it is indisputable that activation of a given hand muscle generates torque across more than one joint through multi-joint tendons and passive linkages, it is less well-appreciated that in terms of neural control, activation of hand muscle synergies essentially seems to be an inevitable consequence of the known neural connectivity patterns. As we will present below, given that the "infrastructure" of the neuronal connectivity defines the lower bound of what can be achieved in terms of individuated muscle control, it is very difficult to see how the brain could possibly control the hand on a muscle-per-muscle basis.

# **ARRANGEMENT OF HAND MOTOR NUCLEI IN THE SPINAL CORD vs. NEURONAL CONNECTIVITY**

An important indicator of the nature of the neural control of the hand is the anatomical distribution of the motor nuclei innervating hand muscles. In a system in which there is a point-to-point innervation of single muscles, one would expect that the motor nuclei of individual muscles would lie separated in the nervous tissue, similar to the oculomotor nuclei in the brain stem.

Conversely, in a system in which multiple muscles as a rule are controlled as synergies, the motor nuclei would be expected to lie closely spaced, perhaps even partly overlapping. A relatively complete study of the distribution of hand muscle motor nuclei exists only for the cat (Fritz et al., 1986a,b) but the general findings seem applicable also to the monkey (Jenny and Inukai, 1983). These studies clearly show that the motor nuclei of individual hand muscles are extremely narrow in the transverse plane (i.e., in the cross-section plane of the spinal cord where the motor nuclei are only 2–4 alpha-MNs across) and that the alpha-MNs of different hand muscles are densely packed with the dendrites of the neurons partly extending into the adjacent nuclei containing the alpha-MNs of other muscles. In the longitudinal plane, these motor nuclei form extremely elongated structures, with a substantial overlap in the rostrocaudal extent of the different motor nuclei (Fritz et al., 1986a,b). Considering the wide distribution of the termination territories of single premotor axons (corticospinal and spinal interneurons) in both cat and monkey (Shinoda et al., 1979, 1981; Jankowska, 1992) in the transverse plane, the anatomical structure of the neuronal network seems to be a construct that would promote synergy control.

# **DIVERGENT CONNECTIVITY IN MOTOR PATHWAYS SUPPORTS SYNERGY CONTROL**

In addition to having a divergent innervation targeting numerous alpha-MN pools (Shinoda et al., 1981), the majority of the corticospinal terminations in primates are made in laminae VI–VIII (Bortoff and Strick, 1993), where they may be expected to primarily target spinal premotor interneurons. Furthermore, the spinal interneurons of both cat and monkey branch in their innervation of the alpha-MN pools (Jankowska, 1992; Perlmutter et al., 1998; Takei and Seki, 2008, 2010) and hence may add an additional divergence factor on top of the divergent corticospinal terminations (Cheney et al., 1985). Although direct corticomotoneuronal connections have been the focus in the literature of hand muscle control, it is clear that indirect effects, presumably by way of the spinal interneurons, dominate the muscle activation from a single motor cortex cell (Schieber and Rivlis, 2007). In primates, although monosynaptic corticomotoneuronal effects are evident in alpha-MNs, indirect effects mediated via the spinal interneurons can be more powerful (Isa et al., 2006, 2007; Alstermark et al., 2007, 2011). Furthermore, the neurons contacting individual hand muscles have a wide distribution across the motor cortex, which heavily overlaps the distributions of neurons controlling other hand muscles (Rathelot and Strick, 2006).

# **SPINAL PREMOTOR MACHINERY—EMBEDDED MOTOR FUNCTIONS AND SYNERGY CONTROL**

Much of the literature on the neural control of hand movements typically takes its origin in the motor cortex. However, the field of motor cortex research, which has focused on finding correlations between motor cortex neuron discharge and physical parameters of the controlled movement, has failed to reach a consensus on what the functions of the individual neurons are. A possible caveat, which may explain why no consistent picture has emerged, is that the neural code in the motor cortex may not represent any physical parameter directly, but may rather be a compound code to be deciphered by the downstream neurons in the target region of the spinal cord.

Even for the simplest hand movements, neurons distributed over a large part of the motor cortex are activated (see for example, Schieber and Hibbard, 1993; Georgopoulos et al., 2007) and a single neuron is typically found to be engaged in a wide variety of movement types (Schieber and Hibbard, 1993; Poliakov and Schieber, 1998, 1999; Schieber and Rivlis, 2007). One interpretation of these findings is that in order to generate a movement, the motor cortex needs to control a large proportion of the spinal interneuron pool, which determines the excitability level of most or all of the alpha-MNs innervating the muscles within the arm and hand. Since spinal premotor neurons are strongly driven by peripheral feedback from Ia muscle afferents, Ib tendon organ afferents, and skin sensor afferents (Jankowska, 1992), they cannot be left uncontrolled during any movement or any phase of the movement, because the outcome in terms of alpha-MN activation would then be unpredictable. By allowing high excitability in *some* interneurons, these types of sensory feedback can instead be utilized by the CNS to become an integral part of the motor command, since the feedback during a given type of movement will become quite predictable as a specific movement pattern is established.

Interestingly, spinal premotor neurons, in addition to innervating alpha-MN nuclei, as a rule have a recurrent axon collateral that either goes all the way up to the lateral reticular nucleus for transmission to the cerebellum as mossy fibers (Clendenin et al., 1974a,b; Alstermark et al., 1981; Ekerot, 1990) [including spinal neurons below the C3–C4 segments (Ekerot, 1990)], or projects to a local projection neuron whose ascending axons issue mossy fibers directly to the cerebellum (Oscarsson, 1973; Mrowczynski et al., 2001). This gives the cerebellum direct information about the involvement of the spinal premotor pool. Since it is likely that these premotor neurons are involved in the selection of local motor programs (Grillner, 2003), and thereby synergies, the cerebellum will also be informed about the synergies employed. Whether this implies that the cerebellum has a role in the learning of muscle synergies is an interesting issue for future studies.

Detailed functional analysis indicates that the spinal cord is to be regarded as a full motor system in its own right (Raphael et al., 2010; Arber, 2012) and its circuitry can play a pivotal role in hand dexterity (Kinoshita et al., 2012). By using the spinal interneuron system, descending motor commands can act on a complex neuronal machinery that comes with a number of features that can facilitate motor control of the complex structure of the extremities. In line with this view, spinal interneurons are activated in advance of onset of movement and alpha-MN activation (Maier et al., 1998; Perlmutter et al., 1998; Prut and Fetz, 1999; Fetz et al., 2002), which can be interpreted as a step to set up the dynamics of this circuitry in advance of the start of the movement. By using the dynamics of the subcortical circuitry the motor cortex could be relieved of solving control issues at a high level of detail. An essential part of this semi-automatic control system is the peripheral feedback from the various sensors of the muscles, joints, and skin. However, although the spinal cord neuronal circuitry has been extensively studied (Jankowska, 1992; Kitazawa et al., 1993; Hultborn, 2001), the complexity of this network has so far prevented us from obtaining a detailed picture of its complete structure in terms of connectivity and function. This remains an important outstanding question for future research on brain synergy control.

# **DISCUSSION**

The many DoFs of the hand combined with its dexterity and versatility would suggest that the hand must be exceedingly complex to control. On the other hand, as seen in the previous sections, the analysis of the movement of the hand and contact forces has revealed that the DoFs of the hand are typically correlated at every level of description. It should be noted that it is often possible to reduce the observed behavior to the combination of a few basic patterns (see Kinematic and Force Synergies sections). Similarly, the analysis of muscular activities has shown evidence of a common drive between motor units of different muscles. Altogether, these observations indicate that synergy control is a pervasive element of hand function.

However, it should be noted that, in the literature, synergies have been defined in ways that reflect the level at which analysis is performed, even though all definitions point to a reduction in the dimensionality of the control space. Specifically, at the *kinematic level* synergies have been defined as covariation patterns among digit joints during reach-to-grasp and manipulation tasks. At the *kinetic level*, synergies denote coordination patterns among digit forces that are thought to minimize given cost functions. At the *neural level*, synergies consist of common divergent inputs to multiple neurons.

In this section, we propose a control scheme in which the spinal circuitry would play a central role in explaining the observed coordination patterns. This proposed scheme could also address a long-standing question in biology, namely whether synergies are flexible or fixed. Even during simple hand movements, cortical activation involves large parts of the hand motor areas in the primary motor cortex, which via divergent direct and indirect connections would result in excitatory drive to a large proportion, most likely all, of the alpha-MNs innervating all the muscles of the lower arm. However, for any given alpha-MN, the output activity depends on the balance of excitatory and inhibitory inputs from the premotor neurons. As a consequence, inhibitory spinal premotor neurons, the only element providing synaptic inhibition to alpha-MNs, may play a key role in the actual selection of the muscles that will *not* be activated.

Overall, the pool of premotor neurons seems to include all the necessary circuitry to function as a dynamical system endowed with one or, possibly, several stable states corresponding to specific patterns of muscle activation or synergies. **Figure 4** provides a schematic representation of this theoretical framework, where the descending motor commands would control the dynamics of the system by controlling the shape of its potential function. We do not argue that the dynamics of the pool of premotor neurons can always be adequately represented by a potential function but this example is used here because it is sufficient to describe the concept of synergies as we conceive it. According to this view, a synergy corresponds to a pattern of muscle activation (**Figure 4A**). The number of premotor neurons and motor nuclei involved in a synergy might vary (**Figure 4B**). For example, the coordination of contact forces between five digits might require that a larger number of motor nuclei are coordinated together. This view is also compatible with the concept of motor primitives, which could correspond to a set of synergies that are simultaneously activated to control the hand. In this case, separate sub-pools of premotor neurons would form distinct dynamical systems that could each determine a specific muscle activation pattern or motor primitive (**Figure 4C**). The convergent input of these pools of excitatory and inhibitory premotor neurons
would summate in the alpha-MNs of the different muscles, hence determining the pattern of muscle activation (see also **Figure 3**). This scheme might allow the CNS to control hand muscle activation by combining various activation patterns (see the above-discussed force mode hypothesis and kinematic synergies).

According to this view, the degree to which a specific synergy (under a given task condition) is stable (or the degree to which it will appear fixed) may reflect the degree to which this synergy is learned in the circuitry. In the absence of well-defined synergies, the dynamics of the system formed by the pool of premotor neurons would correspond to a relatively flat potential function with multiple local minima. Sensory feedback or noise could disruptively affect the state of the premotor neurons and spinal motor nuclei, which would then have to be precisely controlled from higher centers. Another possibility in novel situations with no pre-learned movement patterns is that subjects may resort to using and combining pre-existing synergies. In this case, poorly learned tasks may therefore be associated with a larger degree of variability in synergy selection, but this would not necessarily imply that the processing at the premotor neuron level itself is in an unstable state.


In contrast, stable synergies would correspond to deeper valleys of the potential field (see **Figure 4D**). By definition, these deeper and possibly wider valleys make it less likely that the system will exit from this stable state. Interestingly, the degree of flexibility of a synergy might be related to the width of the valley. A narrow valley would result in a rigid coordination pattern, less susceptible to disruption or noise, and would thereby appear fixed. In contrast, the state of the system in a wider valley, and therefore the coordination pattern of the corresponding synergy, might fluctuate as a result of sensory feedback. In particular, a wide-valley synergy might accommodate in a flexible manner perturbations or unexpected events, such as grasping a bigger-than-expected object or an object slip during manipulation.
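The valley-width idea can be illustrated with a deliberately simple one-dimensional system (our own toy construction, not a model of actual spinal circuitry): the state relaxes toward the bottom of a quadratic valley whose stiffness sets its width, and a constant perturbation displaces the state far more in a wide valley than in a narrow one.

```python
# Toy dynamics x' = -dV/dx for a single quadratic valley
# V(x) = (k/2) (x - m)^2, where k is the valley "stiffness":
# large k gives a narrow valley (a rigid, fixed-looking synergy),
# small k a wide valley (a flexible synergy).

def relax(x0, minimum, k, steps=2000, dt=0.01):
    """Euler integration of gradient descent on the potential."""
    x = x0
    for _ in range(steps):
        x += -k * (x - minimum) * dt
    return x

# Both states converge to the valley bottom (the synergy's pattern)...
narrow = relax(x0=1.0, minimum=0.0, k=10.0)
wide = relax(x0=1.0, minimum=0.0, k=0.5)

# ...but a constant perturbation p (e.g., sensory feedback) shifts the
# equilibrium of x' = -k (x - m) + p to m + p/k: the coordination
# pattern barely moves in the narrow valley and accommodates the
# perturbation in the wide one.
p = 0.5
shift_narrow = p / 10.0   # 0.05
shift_wide = p / 0.5      # 1.0
```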

If the spinal interneuron pool is involved in the synergy formation mechanism, the types and patterns of sensory feedback may strongly influence the synergy. Spinal premotor neurons in many cases receive monosynaptic feedback from peripheral sensors and hence provide a very fast pathway for sensory feedback to affect the selection of activated alpha-MNs with short temporal delays. The strong drive provided by sensory feedback cannot be ignored by the higher centers that control the dynamics of the spinal circuitry underlying the formation of synergies. It is conceivable that the higher centers might set up the dynamics of this system in such a manner as to switch the motor output or the coordination pattern when some external event happens. For example, such mechanisms could be used to control the various phases of a lift in an automated manner (Johansson and Flanagan, 2009).

According to the ideas presented in this discussion, the direct corticomotoneuronal connections provided by some of the layer V neurons (i.e., the monosynaptic connections from the motor cortex to the alpha-MNs) would have almost no part in the formation of hand synergies because these connections do not directly influence the circuitry formed by the spinal premotor neurons. They also lack the inhibitory component which is necessary to prevent their widely divergent excitatory connections from resulting in the inadvertent activation of all hand muscles. This view, however, does not negate that these well-documented connections play an important role in the control of the hand. This role might include, for example, direct control of hand movement when hand synergies are absent or not activated. These connections could also allow the cortex to control or modulate the effect of hand synergies by applying a bias to specific finger muscles when necessary. However, due to the divergence also of the direct corticomotoneuronal connections, this role, too, may be exerted in a synergistic fashion (Shinoda et al., 1979, 1981; Fetz and Cheney, 1980; Cheney and Fetz, 1985; Cheney et al., 1985; Buys et al., 1986; Lemon and Mantel, 1989; Bennett and Lemon, 1994; McKiernan et al., 1998; Rathelot and Strick, 2006, 2009). How cortical and spinal circuits interact in selecting and shaping hand synergies remains a major question in motor neuroscience. Nevertheless, the proposed framework capitalizes on recent developments in our understanding of the spinal machinery, suggesting that the spinal circuits could explain many experimental observations about synergies as revealed by studies of hand kinematics, kinetics, and EMG. Note that our framework incorporates the variable time shifts of individual synergies that are central to the concept of "time-varying" synergies [for review see Bizzi et al. (2008)].
Specifically, in our proposal, shifts in the temporal relation among synergies would result from time-varying interactions between cortical inputs to pre-motor neurons and afferent signals from the periphery.

If one accepts this view of synergy control, a number of interesting consequences unfold.


First, for a synergy which is often used, the premotor circuitry could adapt in a way that allows a fairly wide range of spatiotemporal patterns in the motor commands to still produce the same synergy. Consistent with this notion is the finding that cortical microstimulation at various sites in the motor cortex of the monkey, which would be expected to evoke a large variety of spatiotemporal patterns in the corticospinal tract, evokes just a handful of synergies (Overduin et al., 2012). In other words, in the framework that we propose, the brain might not have to control the state of the premotor and motoneurons with the same precision as it would in the absence of synergies. The synergies, as already postulated by Bernstein, would greatly simplify the performance of a given task by the CNS while minimizing the effects of noise from higher centers on motor output.

Second, with respect to the long-standing question about the degree of flexibility of synergies, the flexibility of a synergy would be related to the shape of the potential field of the dynamical system, which suggests that a whole range of possibilities exists between a deep valley (a fixed synergy) and a flat potential, which would correspond to the absence of any synergy. This scheme also allows for the possibility that synergies could be fixed along some directions and flexible along others (for example, see the uncontrolled manifold hypothesis).
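The relation between potential-field shape and synergy rigidity can be illustrated with a toy dynamical system. This is only a conceptual sketch under assumed dynamics (the potential, gain, and integration scheme are all made up), not a model of actual spinal circuitry:

```python
import numpy as np

def relax(x0, k, steps=2000, dt=0.01):
    """Euler integration of x' = -grad U with U(x, y) = 0.5 * k * y**2.

    The valley floor (y = 0) stands in for a fixed synergy: a deep valley
    (large k) pins the state to the synergy, while a flat potential (k = 0)
    corresponds to the absence of any synergy constraint.
    """
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        grad = np.array([0.0, k * x[1]])  # no gradient along the valley
        x -= dt * grad
    return x

deep = relax([1.0, 1.0], k=5.0)  # y is driven to the valley floor
flat = relax([1.0, 1.0], k=0.0)  # y stays wherever it started
```

Intermediate values of `k`, or a `k` that differs across directions, would correspond to synergies that are fixed along some dimensions and flexible along others.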

Admittedly, this view is still more of a framework than a detailed model. Nonetheless, it clarifies the respective roles that the spinal circuitry, motor commands, sensory feedback, and learning could play in the formation of synergies. This theoretical framework remains to be thoroughly tested in experiments. In particular, many of the proposed circuit mechanisms for synergy control remain almost completely unexplored. Altogether, these considerations point to the need for complementary behavioral, neurophysiological, and modeling studies designed to more conclusively demonstrate the interplay among the above factors underlying the modulation of synergies.

### **ACKNOWLEDGMENTS**

This research was partially supported by National Institute of Arthritis and Musculoskeletal and Skin Diseases (NIAMS) Grant 2R01 AR-47301 and the European Commission with the Collaborative Project no. 248587, "THE Hand Embodied" within the FP7-ICT-2009-4-2-1 program "Cognitive Systems and Robotics." The authors thank Dr. Antonio Bicchi and Dr. Stephen Helms Tillery for insightful comments.





**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 18 December 2012; accepted: 13 March 2013; published online: 08 April 2013.*

*Citation: Santello M, Baud-Bovy G and Jörntell H (2013) Neural bases of hand synergies. Front. Comput. Neurosci. 7:23. doi: 10.3389/fncom.2013.00023*

*Copyright © 2013 Santello, Baud-Bovy and Jörntell. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

# Control of reaching movements by muscle synergy combinations

#### *Andrea d'Avella<sup>1</sup>\* and Francesco Lacquaniti <sup>1,2,3</sup>*

*<sup>1</sup> Laboratory of Neuromotor Physiology, Santa Lucia Foundation, Rome, Italy*

*<sup>2</sup> Center of Space Biomedicine, University of Rome "Tor Vergata", Rome, Italy*

*<sup>3</sup> Department of Systems Medicine, University of Rome "Tor Vergata", Rome, Italy*

### *Edited by:*

*Martin Giese, University Clinic Tuebingen, Germany*

### *Reviewed by:*

*Florentin Wörgötter, University Goettingen, Germany*

*Maurizio Mattia, Istituto Superiore di Sanità, Italy*

### *\*Correspondence:*

*Andrea d'Avella, Laboratory of Neuromotor Physiology, Santa Lucia Foundation, Via Ardeatina 306, 00179 Rome, Italy. e-mail: a.davella@hsantalucia.it*

Controlling the movement of the arm to achieve a goal, such as reaching for an object, is challenging because it requires coordinating many muscles acting on many joints. The central nervous system (CNS) might simplify the control of reaching by directly mapping initial states and goals into muscle activations through the combination of muscle synergies, coordinated recruitment of groups of muscles with specific activation profiles. Here we review recent results from the analysis of reaching muscle patterns supporting such a control strategy. Muscle patterns for point-to-point movements can be reconstructed by the combination of a small number of time-varying muscle synergies, modulated in amplitude and timing according to movement directions and speeds. Moreover, the modulation and superposition of the synergies identified from point-to-point movements captures the muscle patterns underlying multi-phasic movements, such as reaching through a via-point or to a target whose location changes after movement initiation. Thus, the sequencing of time-varying muscle synergies might implement an intermittent controller which would allow the construction of complex movements from simple building blocks.

**Keywords: muscle synergies, reaching movements, human, motor control, intermittency, EMG**

# **INTRODUCTION**

We perform reaching movements frequently and effortlessly, for example when eating food or using a tool. Reaching is a prototypical goal directed behavior and, as such, has been investigated extensively in human and non-human primates. Kinematic and kinetic analyses of reaching have revealed invariant features suggesting that the central nervous system (CNS) relies on simple rules for movement planning and execution. For point-to-point movements, hand paths are often roughly straight and tangential velocity is "bell-shaped" (Morasso, 1981). Moreover, paths do not change much with speed (Soechting and Lacquaniti, 1981) or load (Lacquaniti et al., 1982; Atkeson and Hollerbach, 1985). Tangential velocity profiles have the same shape when normalized for speed and distance. Moreover, shoulder and elbow motions can be quasi-linearly related to each other (Soechting and Lacquaniti, 1981; Lacquaniti et al., 1986), as are the corresponding dynamic muscle torques, i.e., the net muscle torque minus the torque required to counteract gravity (Gottlieb et al., 1997).

In contrast, the analysis of the electromyographic (EMG) activity recorded from many muscles acting on the shoulder and elbow joints has revealed complex dependencies of the shape and timing of the EMG waveforms on the movement direction and speed. For reaching in vertical planes, the EMG waveforms are constructed by combining components related to both dynamic and gravitational torques (Flanders, 1991). The waveform components responsible for the dynamic torques (phasic activations) have an intensity and timing that change with the movement direction in a complex manner: each muscle has a distinct spatial and temporal pattern, with a recruitment intensity which is maximal in multiple directions and a recruitment timing that changes gradually across directions (Flanders et al., 1994, 1996). Thus, there is an apparent discrepancy between the kinematic/kinetic regularities of reaching movements and the variability/complexity of the muscle patterns underlying their control.

The control of reaching movements requires a sensorimotor transformation of visual and proprioceptive information about the target and the initial state of the arm into the coordinated activation of many muscles acting on several joints. Because the dynamic relationships between muscle activation and joint torques and between joint torques and joint motions are complex and non-linear, the control of reaching would seem to be a challenging task for the CNS (Bernstein, 1967). In robotics, if the geometrical and inertial characteristics of the arm are known or can be estimated precisely, inverse kinematics and inverse dynamics can be used to compute the joint angle trajectories and joint torque commands necessary to follow a desired end-effector trajectory. Moreover, if fast sensing and actuation are available, a desired joint angle trajectory can be executed using feedback control. However, it is unlikely that the CNS performs inverse dynamics computations explicitly. Moreover, the CNS has to cope with substantial sensorimotor delays which often make feedback control insufficient. One possibility that has gained increasing support in recent years is that the CNS simplifies the control of goal directed movements by implementing a direct mapping from the initial state of the arm and the goal into appropriate muscle activity patterns through the combination of a few muscle synergies, that is, coordinated recruitments of groups of muscles (Bizzi et al., 2002, 2008; Tresch et al., 2002; Giszter et al., 2007; Ting and McKay, 2007; d'Avella and Pai, 2010; Lacquaniti et al., 2012). Thus, muscle synergies are thought to be stable and reproducible modules organized by the CNS to take the role of "basis functions."
Support for a modular control architecture has been provided in frogs (Tresch et al., 1999; Saltiel et al., 2001; d'Avella et al., 2003; Hart and Giszter, 2004, 2010; Cheung et al., 2005; d'Avella and Bizzi, 2005), cats (Ting and Macpherson, 2005; Torres-Oviedo et al., 2006), monkeys (Overduin et al., 2008, 2012), and humans (Krishnamoorthy et al., 2003; Ivanenko et al., 2004, 2005; d'Avella et al., 2006, 2008, 2011; Torres-Oviedo and Ting, 2007) by identifying a small number of muscle synergies whose combinations explain a large fraction of the variation in the muscle patterns.

Here we first review two notions of muscle synergies commonly used to model the modular organization of muscle patterns, that is, time-invariant and time-varying muscle synergies. We then review recent results from the analysis of muscle patterns recorded during reaching movements in humans indicating that the modulation and superposition of time-varying muscle synergies is a key mechanism for the control of reaching. Time-varying muscle synergies capture spatiotemporal features in the reaching muscle patterns and provide a parsimonious description of the changes of the muscle patterns across conditions, reconciling the apparent discrepancy between kinematic and kinetic regularities and muscle pattern complexity. Moreover, the superposition and sequencing of time-varying muscle synergies may underlie the intermittent control of complex, multi-phasic arm movements.

# **MUSCLE SYNERGIES**

Muscle synergies are building blocks that can be used to control a task in different conditions by selecting a small number of parameters. Synergies are building blocks because they capture a set of features in the muscle patterns that can be reused across movement conditions. In the spatial domain, i.e., across muscles, a muscle synergy captures a specific relationship in the strength of activation of a group of muscles. In the temporal domain, a synergy may capture time-invariant or time-varying relationships among muscles. Considering *D* muscles, a *time-invariant* synergy can be expressed as a *D*-dimensional vector **w** of weighting coefficients specifying the relative activation level of the muscles (**Figure 1A**). Then, a set of *N* synergies, **w***<sup>i</sup>* with *i* = 1*, . . . , N*, can be linearly combined to generate distinct muscle patterns (**Figure 1B**):

$$\mathbf{m}(t) = \sum_{i=1}^{N} c_i(t)\,\mathbf{w}_i \tag{1}$$

where **m**(*t*) is a *D*-dimensional vector that specifies the activation of each muscle at time *t* and *ci(t)* is the time-varying combination coefficient for the *i*-th synergy. Across movement conditions, either the synergies **w***<sup>i</sup>* or the activation coefficients *ci(t)*, also referred to as temporal components (Ivanenko et al., 2004), may be invariant. A *time-varying synergy*, in contrast, is composed of a collection of muscle waveforms that can be expressed as a time-varying vector **w**(*t*) (**Figure 1C**). In this case, the time dependence of the muscle activations is captured by the temporal structure of the synergies and by their onset times (*ti*), and Equation 1 can be written as (**Figure 1D**) (d'Avella et al., 2003):

$$\mathbf{m}(t) = \sum_{i=1}^{N} c_i\,\mathbf{w}_i(t - t_i) \tag{2}$$

The combination of time-varying synergies can be seen as a special case of an anechoic mixture model (Omlor and Giese, 2011). Thus, time-varying synergies provide a parsimonious representation of the motor output because, once the synergies are given, a few scalar amplitude and onset coefficients are sufficient to specify the entire spatiotemporal structure of the muscle pattern. In contrast, with time-invariant synergies the full time-series of combination coefficients must be specified. When both types of synergies are extracted from the same data, the spatial organization of the time-varying synergies, given by the synergy waveforms averaged across time, closely matches the time-invariant synergies (d'Avella and Bizzi, 2005). However, a larger number of time-invariant synergies is required to capture invariant asynchronous activations across muscles (d'Avella et al., 2006).
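Both combination schemes can be sketched numerically. The following is a minimal illustration of Equations 1 and 2 with made-up dimensions and random synergies, not data from the reviewed studies:

```python
import numpy as np

rng = np.random.default_rng(0)
D, N, T = 5, 2, 100        # muscles, synergies, time samples (all made up)

# Equation 1 -- time-invariant synergies: m(t) = sum_i c_i(t) w_i
W = rng.random((N, D))     # each row is one synergy vector w_i
C = rng.random((N, T))     # time-varying combination coefficients c_i(t)
M_inv = C.T @ W            # (T, D): one muscle activation vector per sample

# Equation 2 -- time-varying synergies: m(t) = sum_i c_i w_i(t - t_i)
L = 30                     # synergy duration in samples
Wt = rng.random((N, L, D)) # w_i(t): one waveform per muscle per synergy
c = np.array([1.0, 0.5])   # a single amplitude coefficient per synergy
t0 = np.array([10, 40])    # a single onset delay per synergy
M_var = np.zeros((T, D))
for i in range(N):
    M_var[t0[i]:t0[i] + L] += c[i] * Wt[i]  # scale, shift, and sum
```

The contrast in parsimony is visible in the parameter counts: Equation 1 needs the full `C` matrix (`N × T` numbers) to specify a pattern, whereas Equation 2 needs only one amplitude and one delay per synergy.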

### **SYNERGIES FOR FAST REACHING MOVEMENTS**

The analysis of the muscle patterns for fast reaching movements in 3D revealed that the complex dependence of the muscle activation waveforms on movement direction results from the combination of four to five time-varying synergies (d'Avella et al., 2006). Muscle synergies were identified from the phasic muscle activation waveforms recorded from up to 19 shoulder and arm muscles during fast point-to-point movements between a central location and eight peripheral targets in both a frontal and a sagittal plane. Phasic waveforms are the components of the EMG signal related to accelerating and decelerating the arm and were computed by subtracting the tonic components responsible for balancing gravitational forces and maintaining postural stability. For each subject, an iterative optimization algorithm was used to extract sets of synergies with an increasing number of elements which minimized the average muscle pattern reconstruction error across multiple directions (d'Avella and Tresch, 2002; d'Avella et al., 2003). The number of synergies was determined, as a compromise between model parsimony and reconstruction accuracy, by observing the relationship between the amount of data variation explained by the model (R²) and the number of synergies. The optimal number of synergies was selected as the number at which the R² curve had a change in slope, suggesting that additional synergies only captured small residual amounts of variation attributable to noise.
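The R² elbow criterion can be reproduced on synthetic data. Note that the studies reviewed here used a dedicated algorithm for time-varying synergies; the sketch below substitutes plain non-negative matrix factorization (time-invariant synergies) purely to illustrate the model-order selection logic, with all data made up:

```python
import numpy as np

def nmf(M, n_syn, iters=500, seed=0):
    """Plain multiplicative-update NMF: M (T x D) ~ C @ W, all non-negative."""
    rng = np.random.default_rng(seed)
    T, D = M.shape
    C = rng.random((T, n_syn)) + 0.1
    W = rng.random((n_syn, D)) + 0.1
    for _ in range(iters):
        C *= (M @ W.T) / (C @ W @ W.T + 1e-9)
        W *= (C.T @ M) / (C.T @ C @ W + 1e-9)
    return C, W

def r_squared(M, C, W):
    """Fraction of total data variation explained by the reconstruction."""
    return 1.0 - ((M - C @ W) ** 2).sum() / ((M - M.mean()) ** 2).sum()

# synthetic patterns generated by 3 "true" synergies plus a little noise
rng = np.random.default_rng(1)
M = rng.random((200, 3)) @ rng.random((3, 10)) + 0.01 * rng.random((200, 10))

r2_curve = [r_squared(M, *nmf(M, n)) for n in range(1, 6)]
# the curve rises steeply up to n = 3 and flattens beyond (the "elbow")
```

Picking the number of synergies at the change in slope of `r2_curve` recovers the generative dimensionality here, mirroring the selection procedure described above.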

Five synergies extracted in one subject (**Figure 2A**) illustrate the typical basic features. Each synergy recruits a specific subset of muscles with a similar biomechanical action (e.g., elbow flexors in the first synergy, elbow extensors in the second synergy), but each synergy also involves muscles acting on multiple joints (e.g., brachioradialis and trapezius superior in the first synergy), and the same muscle is often recruited by multiple synergies (e.g., medial deltoid in the third, fourth, and fifth synergies). The synergy waveforms show synchronous bursts of activation in many muscles as well as bi-phasic bursts (e.g., lateral head of triceps in the second synergy) and asynchronous bursts (e.g., long head of biceps in the second synergy). Some muscle waveforms have negative components, indicating an inhibitory drive that reduces the activation of that muscle due to excitatory drive from other synergies or tonic components.

**FIGURE 1 |** […] different lengths. **(B)** A time-varying muscle pattern [**m***(t)*] is generated by combining the synergies with time-varying scaling coefficients [*ci(t)*]. Different patterns can be obtained by changing the scaling coefficient waveforms. **(C)** Each one of the two time-varying synergies illustrated is […] multiplying all waveforms of each synergy by a single scaling coefficient (*ci*), shifting them in time by a single delay (*ti*), and summing them together. Different patterns are obtained by changing two scaling coefficients and two delays.

The reconstruction of the muscle patterns by synergy combination for movements in different directions is achieved by recruiting the synergies with different amplitudes and at different times (**Figure 2B**). For example, the muscle patterns for a forward movement (*first column*) are generated by recruiting the second plus the third synergy, and the fourth synergy with smaller amplitude and later in time. The second and the third synergies are also recruited in a downward movement (*fourth column*), but with a different balance of activation and a different relative timing. Thus, the different muscle patterns underlying reaching movements with different kinematics are captured by selecting a small number of parameters.

Plotting the dependence of the synergy amplitude coefficients on the movement direction in a polar plot (**Figure 2C**) clearly shows that synergy recruitment depends on movement direction (directional tuning) and that each synergy has a specific direction of maximal activation (preferred direction). In contrast to the dependence of individual muscles (Flanders et al., 1996), in most cases the synergy coefficients have a single peak and, remarkably, the directional tuning is well characterized by a simple cosine tuning (d'Avella et al., 2006). Cosine tuning is characteristic of neural activity in the motor system (Georgopoulos et al., 1982; Caminiti et al., 1991) and represents an optimal encoding of motor commands in terms of accuracy in the presence of noise (Todorov, 2002) and minimum effort (Fagg et al., 2002). Thus, the observed cosine tuning of the synergy amplitude coefficients supports the role of muscle synergies as a mechanism for implementing a simple, direct mapping of movement goals into motor commands and suggests that their recruitment may be encoded in motor cortical areas (Overduin et al., 2012).

**FIGURE 2 | Muscle synergies for fast reaching movements. (A)** A set of five time-varying synergies, identified from the muscle patterns recorded during point-to-point movements between one central location and 8 peripheral locations in the frontal and sagittal planes with a movement duration below 400 ms. **(B)** The activation waveforms of 17 shoulder and arm muscles for four movement conditions (*columns*) are reconstructed by activating the five synergies with different amplitudes and at different times and then by combining, muscle by muscle, the amplitude-scaled and time-shifted muscle activation waveforms of each synergy. At the *top* of the panel the gray areas represent the averaged EMG activity and the solid black lines the synergy reconstruction. At the *bottom* of the panel, the amplitude scaling coefficient c*<sup>i</sup>* of each synergy and movement condition is represented by the height of a rectangle, and the onset latency *ti* and the duration of the synergy are indicated by the horizontal position of the rectangle. The profile within each rectangle represents the mean muscle waveform of each synergy, i.e., they are scaled versions of the waveforms shown below each synergy at the *bottom* of panel **A**. **(C)** The amplitude coefficients (c*i*) for all five synergies (*color coded*) across all eight movement directions in the frontal (*top*) and sagittal (*bottom*) planes are shown in a polar plot. Thus, for each movement direction, the amplitude coefficient is indicated by the distance from the origin of a colored marker in the corresponding direction. Such polar plots clearly show that the amplitude coefficients are modulated by movement direction (directional tuning) and that each synergy has a specific preferred direction (direction of maximal activation). In most cases the directional tuning is well captured by a cosine function (corresponding to a circle in the polar plot). Adapted from (d'Avella et al., 2006) © 2006 by the Society for Neuroscience, with permission.

# **MODULATION OF PHASIC AND TONIC SYNERGIES WITH MOVEMENT DIRECTION AND SPEED**

If movement direction can be controlled by modulating the recruitment of a few time-varying muscle synergies according to a cosine directional tuning, is movement speed also related to synergy recruitment in a simple way? The invariances observed in the arm kinematics and present in the equations of motion for an articulated arm suggest that a simple scaling rule might be used to control speed. Reaching movements between two given locations are executed at different speeds along an invariant path (Soechting and Lacquaniti, 1981) by scaling the entire motion in time (Atkeson and Hollerbach, 1985). Moreover, the arm motion equations have the property that a solution is invariant for changes in speed (i.e., the resulting joint motion follows the same trajectory with a different time scale) if the dynamic component of the torque profiles is scaled as the inverse of the square of the time scale (Hollerbach and Flash, 1982; Atkeson and Hollerbach, 1985). Thus, the CNS might control the speed of a reaching movement between two locations simply by scaling synergy activation according to movement duration. Such a scaling rule would have to be captured by a close-to-quadratic function of the inverse of movement duration (notice, however, that joint torque is related non-linearly to muscle activation).
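The inverse-square rule can be checked numerically on a time-rescaled trajectory. The sketch below uses a minimum-jerk profile (a standard kinematic model, here for a hypothetical 0.3 m reach) and shows that doubling the movement duration quarters the peak acceleration, and hence the dynamic torque for a purely inertial load:

```python
import numpy as np

def min_jerk(T, n=1001, dist=0.3):
    """Minimum-jerk position and acceleration for a dist-m reach of duration T."""
    t = np.linspace(0.0, T, n)
    tau = t / T                                    # normalized time in [0, 1]
    pos = dist * (10 * tau**3 - 15 * tau**4 + 6 * tau**5)
    acc = np.gradient(np.gradient(pos, t), t)      # second finite difference
    return pos, acc

_, acc_fast = min_jerk(T=0.4)
_, acc_slow = min_jerk(T=0.8)
ratio = acc_fast.max() / acc_slow.max()  # doubling duration quarters peak acceleration
```

Because both movements follow the same path in normalized time, the acceleration (and inertial torque) profiles differ exactly by the factor (0.8/0.4)² = 4, as the scaling argument predicts.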

The analysis of the muscle patterns for reaching in different directions and at different speeds supports the notion of a simple scaling rule for speed control (d'Avella et al., 2008). The patterns recorded during point-to-point movements in eight different directions on the frontal plane with five different movement durations, after scaling in time to equal movement duration, were reconstructed by the combination of three phasic and three tonic time-varying muscle synergies. Phasic synergies, similar in structure to the synergies identified from the phasic patterns of fast reaching movements alone and with a similar directional modulation of amplitude and timing coefficients, were also scaled in amplitude by movement speed. Each synergy's amplitude coefficient for movements in its preferred direction scaled with the maximum speed of the movement according to a power law with an exponent close to two (range over all synergies of five subjects: 1.4–2.7, median 2.0), i.e., approximately in accordance with the torque scaling law. In contrast, tonic synergies, extracted from the muscle patterns without any time-shifts, showed directional modulation in their amplitude coefficients but either non-significant or weak speed dependence (exponent range: 0.1–0.6, median 0.3). Thus, the modulation of a small number of time-varying muscle synergies underlies the control of both the direction and the speed of point-to-point reaching movements.
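Both scaling rules (cosine tuning of amplitude with direction, and a near-quadratic power law with peak speed) amount to simple regressions on the synergy coefficients. A hedged sketch on synthetic coefficients, with all tuning parameters, speeds, and noise levels made up:

```python
import numpy as np

rng = np.random.default_rng(4)

# Directional tuning: fit c(theta) = b0 + b1*cos(theta) + b2*sin(theta)
theta = np.arange(8) * np.pi / 4                 # eight movement directions
pd_true, base, depth = np.pi / 3, 1.0, 0.8       # assumed tuning parameters
c_dir = base + depth * np.cos(theta - pd_true)   # noiseless cosine tuning
X = np.column_stack([np.ones_like(theta), np.cos(theta), np.sin(theta)])
b0, b1, b2 = np.linalg.lstsq(X, c_dir, rcond=None)[0]
pd_hat = np.arctan2(b2, b1)                      # recovered preferred direction

# Speed scaling: fit the power law c = a * v_max**b in log-log coordinates
v_max = np.array([0.5, 0.8, 1.2, 1.8, 2.5])      # made-up peak speeds (m/s)
c_speed = 0.7 * v_max**2 * rng.lognormal(0.0, 0.05, v_max.size)
b_exp, log_a = np.polyfit(np.log(v_max), np.log(c_speed), 1)
```

The cosine fit works because `cos(theta - pd) = cos(pd)cos(theta) + sin(pd)sin(theta)`, so the preferred direction falls out of the two linear weights; the log-log fit recovers an exponent near the generative value of 2, the regime reported for phasic synergies.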

# **SUPERPOSITION AND MODULATION OF SYNERGIES FOR MULTI-PHASIC MOVEMENTS**

When reaching a set of different targets in sequence or a target whose location changes after movement initiation, movement kinematics may be complex, with curved paths and multiple peaks in the tangential velocity. At the kinematic level, such multi-phasic movements can be decomposed into a sequence of superimposed sub-movements, each with the same features as point-to-point movements (Flash and Henis, 1991). As the muscle patterns for point-to-point movements are captured by the combination of a few time-varying muscle synergies, are multi-phasic movements constructed by a sequence of the same point-to-point synergies? Even if superposition holds at the kinematic level, because of the non-linear dependence of the muscle forces and torques on the arm posture, one would not expect a simple superposition of muscle patterns and muscle synergies to hold. However, synergies may provide a simple mechanism for generating the muscle patterns underlying a multi-phasic movement by adjusting a small number of control parameters. To test this hypothesis, the muscle patterns recorded during reaching through a via-point (d'Avella et al., 2006) and to a target changing location after movement initiation (d'Avella et al., 2011) were analyzed using time-varying muscle synergies identified in point-to-point reaching. Indeed, the model of Equation 2 can be extended to allow the same synergy to be recruited at multiple, different times. When multiple instances of point-to-point synergies were fit to multi-phasic muscle patterns, they reconstructed the muscle patterns with a level of accuracy comparable to that of the point-to-point patterns. However, the recruitment of the synergies, especially those underlying the second phase of the via-point or target-change movements, was adjusted with respect to their recruitment in the corresponding point-to-point movement.
Indeed, the simple superposition of two appropriately aligned point-to-point patterns could not reconstruct the multi-phasic patterns with the same accuracy as the synergies. Thus, complex arm movements involving multiple phases appear to be constructed by the modulation and superposition of the same building blocks used for simple point-to-point reaching movements. As time-varying muscle synergies represent an invariant spatiotemporal component of a muscle pattern with a specific duration, the superposition of a set of synergies recruited at different times may be implemented by an intermittent controller.
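The multi-instance extension of Equation 2 can be sketched directly: each synergy may appear in the pattern several times, each instance with its own amplitude and onset. All dimensions and numbers below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(3)
D, N, L, T = 5, 2, 30, 160
Wt = rng.random((N, L, D))   # point-to-point time-varying synergies (made up)

def compose(instances, T=T):
    """Extended Equation 2: each synergy may be recruited several times.

    `instances` lists (synergy index, amplitude, onset) triples, so a
    via-point movement is built from repeated recruitments of the same
    building blocks, each with its own amplitude and timing.
    """
    M = np.zeros((T, D))
    for i, c, t0 in instances:
        M[t0:t0 + L] += c * Wt[i]
    return M

# first phase toward the via-point, then the same synergies re-recruited
M = compose([(0, 1.0, 10), (1, 0.6, 25), (0, 0.8, 80), (1, 0.9, 95)])
```

Fitting such a model amounts to estimating one amplitude and one onset per instance, which is the small set of control parameters that the reviewed analyses adjusted between point-to-point and multi-phasic movements.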

# **MUSCLE SYNERGIES AND INTERMITTENT CONTROL**

Sensory feedback is crucial for the control of accurate reaching movements, and an internal model of the dynamics of the musculoskeletal system can be exploited to construct an optimal feedback controller (Todorov and Jordan, 2002). However, it might be challenging for the CNS to acquire such a model explicitly and to perform the necessary computations. In contrast, an internal model sufficient for constructing an open-loop controller may be acquired implicitly as a mapping from goals and initial states into motor commands, and feedback might be used for on-line adjustments and trial-to-trial adaptation. Muscle synergies may then provide the basis functions that allow acquiring and using such a mapping quickly and efficiently by reducing the number of parameters to be adjusted, stored, and retrieved. An open-loop controller is used before feedback can be processed (Woodworth, 1899; Keele and Posner, 1968), e.g., in the initial phase of a movement or for brief movements. However, because of noise and inaccuracy in the model, feedback-driven corrections are required for accuracy. While it is often assumed that such corrections are performed continuously, sensory feedback might also be used intermittently to trigger discrete, open-loop corrections (Doeringer and Hogan, 1998; Gawthrop et al., 2011; Loram et al., 2011). In a synergistic controller, such intermittent corrections may be simply implemented by re-using the mapping of goals and states into synergy recruitment coefficients. Sensory feedback may be processed continuously to update an estimate of the current state and goal, necessary to prepare the synergy coefficients for the appropriate correction. In addition, sensory feedback may be used to construct an error signal which, possibly through a threshold process, triggers a correction by recruiting a set of time-varying synergies.
As each synergy has a given duration, different synergies or multiple instances of the same synergy may partially overlap and generate a smooth movement that may appear to be continuously controlled. The fact that the same set of muscle synergies observed in fast point-to-point reaching movements also appear to be recruited in via-point and target-change movements, as reviewed above, supports the notion of a synergy-based intermittent controller.
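The proposed scheme can be caricatured in a few lines. This is a toy one-dimensional controller whose threshold, gains, and noise level are arbitrary assumptions, meant only to show how threshold-triggered discrete corrections coexist with an open-loop drive:

```python
import numpy as np

def intermittent_reach(target, noise, thresh=0.05, dt=0.01, T=200, seed=0):
    """Toy 1-D intermittent controller: a slow open-loop drive toward the
    target plus discrete corrections fired only when the estimated error
    exceeds a threshold (a stand-in for recruiting a corrective synergy)."""
    rng = np.random.default_rng(seed)
    x, n_corrections = 0.0, 0
    for _ in range(T):
        x += dt * (target - x)              # open-loop drive
        x += noise * rng.standard_normal()  # execution noise
        if abs(target - x) > thresh:        # error estimate crosses threshold
            x += 0.5 * (target - x)         # one discrete, open-loop correction
            n_corrections += 1
    return x, n_corrections

x_end, n = intermittent_reach(target=1.0, noise=0.001)
```

A handful of early corrections brings the state near the target, after which the sub-threshold error triggers none; with overlapping, finite-duration corrections the resulting trajectory can appear smoothly and continuously controlled, as argued above.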

# **CONCLUSIONS**

Reaching muscle patterns are reconstructed by the combination of a few time-varying muscle synergies. The complex changes of the activation waveforms of individual muscles across movement directions and speeds are captured by the modulation in amplitude and timing of these synergies according to simple rules, such as cosine amplitude tuning for direction and time scaling for speed. Multi-phasic reaching movements, such as reaching through a via-point or toward a target whose location changes after movement initiation, appear to be generated by sequencing and superimposing the same small set of muscle synergies identified in point-to-point movements. Thus, the regularities observed in the muscle patterns across movement conditions suggest that muscle synergies are building blocks used by the CNS to control goal directed movements. However, regularities may also derive from optimization or task constraints (Todorov and Jordan, 2002; Tresch and Jarc, 2009; Kutch and Valero-Cuevas, 2012). Direct support for muscle synergies as centrally organized building blocks would come either from identifying their neural substrates or from testing the prediction that motor adaptation should be more difficult if it cannot be achieved by recombining existing synergies (d'Avella and Pai, 2010). Recent results in frogs (Hart and Giszter, 2010) and monkeys (Overduin et al., 2012) support a neural organization of muscle synergies at both the spinal and cortical levels. Future investigations of adaptation after novel perturbations of the musculoskeletal system, either compatible or incompatible with the synergies, will help to clarify whether muscle synergies are merely low-dimensional approximations of the muscle patterns or building blocks organized by the CNS.

# **ACKNOWLEDGMENTS**

Supported by the Italian Ministry of Health, the Italian Space Agency (DCMC and CRUSOE), and the EU Seventh Framework Programme (FP7-ICT No 248311 AMARSi).




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 15 January 2013; paper pending published: 25 February 2013; accepted: 03 April 2013; published online: 19 April 2013.*

*Citation: d'Avella A and Lacquaniti F (2013) Control of reaching movements by muscle synergy combinations. Front. Comput. Neurosci. 7:42. doi: 10.3389/fncom.2013.00042*

*Copyright © 2013 d'Avella and Lacquaniti. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

# Muscle synergies in neuroscience and robotics: from input-space to task-space perspectives

#### *Cristiano Alessandro<sup>1</sup>\*, Ioannis Delis<sup>2,3,4</sup>, Francesco Nori<sup>2</sup>, Stefano Panzeri<sup>4,5</sup> and Bastien Berret<sup>6</sup>*

*<sup>1</sup> Artificial Intelligence Laboratory, Department of Informatics, University of Zurich, Zurich, Switzerland*

*<sup>2</sup> RBCS, Italian Institute of Technology, Genoa*

*<sup>3</sup> Communication, Computer and System Sciences Department, University of Genoa, Genoa, Italy*

*<sup>4</sup> Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, UK*

*<sup>5</sup> Center for Neuroscience and Cognitive Systems @UniTn, Istituto Italiano di Tecnologia, Rovereto (TN), Italy*

*<sup>6</sup> UR CIAMS, EA 4532 – Motor Control and Perception Team, Université Paris-Sud 11, Orsay, France*

### *Edited by:*

*Martin Giese, University Clinic Tuebingen/Hertie Institute, Germany*

### *Reviewed by:*

*Vladimir Brezina, Mount Sinai School of Medicine, USA Vincent C. K. Cheung, MIT, USA*

### *\*Correspondence:*

*Cristiano Alessandro, Artificial Intelligence Laboratory, Department of Informatics, University of Zuerich, Andreasstrasse 15, CH-8050 Zuerich, Switzerland. e-mail: alessandro@ifi.uzh.ch*

In this paper we review the works related to muscle synergies that have been carried out in neuroscience and control engineering. In particular, we refer to the hypothesis that the central nervous system (CNS) generates desired muscle contractions by combining a small number of predefined modules, called muscle synergies. We provide an overview of the methods that have been employed to test the validity of this scheme, and we show how the concept of muscle synergy has been generalized for the control of artificial agents. The comparison between these two lines of research, in particular their different goals and approaches, is instrumental in explaining the computational implications of the hypothesized modular organization. Moreover, it clarifies the importance of assessing the functional role of muscle synergies: although these basic modules are defined at the level of muscle activations (input-space), they should result in the effective accomplishment of the desired task. This requirement is not always explicitly considered in experimental neuroscience, as muscle synergies are often estimated solely by analyzing recorded muscle activities. We suggest that synergy extraction methods should explicitly take into account task execution variables, thus moving from a perspective purely based on input-space to one grounded on task-space as well.

**Keywords: muscle synergies, modularity, task-space, dimensionality reduction, motor control, robotics, review**

# **1. INTRODUCTION**

One of the fundamental questions in motor control concerns the mechanisms that underlie muscle contractions during the execution of movements. The complexity of the musculoskeletal apparatus as well as its dynamical properties allow biological systems to perform a wide variety of motor tasks (Bizzi et al., 1992); on the other hand, such complexity has to be mastered by efficient strategies implemented in the central nervous system (CNS). How does the CNS "choose" among the infinity of solutions of a given motor task (i.e., the Bernstein problem; Bernstein, 1967)? How are motor intentions translated into muscle activations? How can biological systems learn and plan movements so rapidly? A prominent hypothesis suggests that motor circuitries are organized in a modular fashion, so that muscle activations can be realized by flexibly combining such modules. Modularity has been observed in various forms such as kinematic strokes, spinal force fields, and muscle synergies (Flash and Hochner, 2005); this paper provides an overview of the findings related to the so-called muscle synergies, as well as the application of this concept in robotics and character animation.

Muscle synergies are defined as coordinated activations of a group of muscles<sup>1</sup>. It has been suggested that the CNS encodes a set of synergies and combines them in a task-dependent fashion in order to generate the muscle contractions that lead to the desired movement (muscle synergy hypothesis). Evidence for this organization relies on the spatio-temporal regularities observed in the EMG (electromyography) activities of several species (Tresch et al., 2002; Bizzi et al., 2008). Since in many cases these regularities appear to be very similar across subjects and motor tasks (i.e., robustness of muscle synergies), scientists have proposed that they might reflect a modular organization of the underlying neural circuitries. Assuming that muscle activations represent the control input to the musculoskeletal system, in this context muscle synergies are implicitly defined as input-space generators (i.e., components that are able to generate the necessary input signals).

<sup>1</sup>The term *synergy* has also been used in the context of another motor control hypothesis, the uncontrolled manifold hypothesis (UMH) (Latash, 2010; Latash et al., 2010). In that context, the term refers to "a neural organization of a set of elemental variables (e.g., muscle contractions) with the purpose to ensure certain stability properties of a performance variable produced by the whole set (e.g., desired joint configuration)" (Latash et al., 2008). These studies are outside the scope of this paper; however, we will discuss the concept of M-modes, which was introduced in the UMH but is very similar to the definition of synergies we adopt in this manuscript.

From a computational point of view, a modular organization based on muscle synergies is very attractive. The activation of many muscles is hypothetically implemented by modulating the contributions of a small set of predefined muscle synergies. Such a dimensionality reduction may simplify motor control and learning, and it may contribute to the adaptability observed in biological systems (Mussa-Ivaldi and Bizzi, 2000). This observation has recently motivated roboticists and control engineers to develop control strategies based on the same concept: the combination of a small number of predefined actuations. In addition to the possible dimensionality reduction, the modularity of such a scheme has the advantage that improved performance may be achieved incrementally by introducing additional synergies into the controller. The price to be paid is the restriction of the possible actuations to those that can be obtained by combining the synergies (i.e., the synergies' span set). This also implies a reduction of the possible movements that the controlled system can perform.

In the two fields of neuroscience and control engineering, research on muscle synergies is characterized by radically different goals and approaches (see **Figure 1**). In the context of controlling artificial systems, the main goal is the synthesis of a small set of synergies that instantiates an effective control strategy. The obtained controller, as such, is mainly evaluated in relation to task accomplishment; in particular, it should be able to generate a set of feasible actuations that allows the agent to perform a wide variety of tasks. In neuroscience, on the other hand, the main goal is to validate or falsify the muscle synergy hypothesis. The typical approach consists in analyzing a dataset of recorded muscle activities, and in verifying whether such a dataset is compatible with the proposed modular decomposition; the hypothetical synergies are inferred by applying a decomposition algorithm to the dataset of EMG signals. Unlike in control engineering, the major focus of this line of research resides at the motor level (i.e., the input-space of muscle activations); the evaluation of the hypothesized modular organization at the level of the task is not always considered and, from our point of view, deserves more attention. Does the set of identified muscle synergies actually lead to the task performance observed experimentally? Does it generate feasible actuations? These issues have been investigated *a-posteriori* using realistic models of the musculoskeletal systems of different species (Berniker et al., 2009; Neptune et al., 2009; McKay and Ting, 2012). Additionally, novel methodologies to deal with these challenges are starting to emerge in experimental neuroscience as well (Chvatal et al., 2011; Delis et al., 2013). We believe that a paradigm shift from an input-space to a task-space identification of muscle synergies, which seems to be already in progress, may contribute to a better understanding of the hypothetical modularity of the CNS, and of its relationship to human learning and control. In particular, in this review we argue that task-space constraints could be directly integrated into the decomposition algorithm used to extract the synergies.

This paper reviews the studies that investigate the muscle synergy hypothesis, as well as the methods to control artificial systems that have been developed taking inspiration from this hypothesis. The organization of the paper follows the rationale developed so far. Initially, in section 2, we provide a mathematical formulation of the concept of muscle synergies, we detail different synergy models (proposed as the mechanism to generate muscle contractions), and we analyze their computational implications. In section 3 we discuss the works that evaluate the muscle synergy hypothesis solely in the space of input signals, and those that seek more direct neural evidence. Then, in section 4, we present the studies that evaluate synergies also at the task-level; this section covers robotics and character animation, as well as neuroscience. Finally, in section 5 we offer further discussions and concluding remarks.

# **2. MODELS OF MUSCLE SYNERGY**

The concept of muscle synergy has been formalized in a variety of mathematical models. We will present these models in the context of controlling a generic dynamical system. This formulation is sufficiently generic to represent both the control of the musculoskeletal system and the control of an artificial agent. Furthermore, it is useful to explain the computational implications of the various synergy models, and to clarify the difference between input-space and task-space evaluation of a set of synergies.

The generic dynamical system we employ can be represented as follows:

$$
\dot{\mathbf{x}}(t) = f(\mathbf{x}(t), t) + \mathbf{g}(\mathbf{x}(t), t)\mathbf{u}(t),
$$

where *t* represents time, **x***(t)* ∈ R*<sup>n</sup>* is the system state variable at time *t* (e.g., angular positions and velocities of the joints), and **u***(t)* ∈ R*<sup>m</sup>* is the system input at time *t* (e.g., muscle activations or joint torques). Within this framework, the variable to be controlled is denoted as **y***(t)* ∈ R*<sup>p</sup>*, and it is a generic function of the system state: **y***(t)* = *h(***x***(t))*. The task is defined in terms of a set of constraints applied on the time evolution of this variable. Typical examples of tasks include reaching (**y***(tf)* = **y***<sup>d</sup>*, where *tf* is the desired reaching time) and tracking (**y***(t)* = **y***d(t)*∀*t*, where **y***d(*·*)* is the desired trajectory to be tracked). We refer to the task-space as the space where the task **y***<sup>d</sup>* is defined; similarly, the input-space is the space of the input signals **u***(*·*)*. The relation between these two spaces is given by the dynamics of the system. It is now clear that a given control input should always be evaluated in relation to the error between the corresponding evolution of the controlled variable and the desired task; in other words, it should always be evaluated in task-space.
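As a concrete, deliberately toy illustration of this formulation, the sketch below integrates a system of this form and evaluates an input signal by its task-space error. The specific dynamics (a damped point mass), the input, and the reaching target are our own illustrative assumptions, not a model taken from the literature.

```python
import numpy as np

# Toy instance of  x'(t) = f(x, t) + g(x, t) u(t)  with y = h(x).
# f, g, h below are illustrative assumptions (a damped point mass).

def f(x, t):                       # intrinsic dynamics: velocity and damping
    pos, vel = x
    return np.array([vel, -0.5 * vel])

def g(x, t):                       # input matrix: the input is a force
    return np.array([[0.0], [1.0]])

def h(x):                          # task variable: position only
    return x[:1]

def simulate(u, x0, t_f, dt=1e-3):
    """Euler-integrate the system under input u(t); return the final state."""
    x, t = np.asarray(x0, float), 0.0
    while t < t_f:
        x = x + dt * (f(x, t) + g(x, t) @ u(t))
        t += dt
    return x

# Reaching task: y(t_f) should equal y_d; the input is judged in task-space.
y_d = np.array([1.0])
u = lambda t: np.array([1.0 if t < 0.5 else 0.0])   # an open-loop guess
x_f = simulate(u, x0=[0.0, 0.0], t_f=2.0)
task_error = np.linalg.norm(h(x_f) - y_d)           # task-space evaluation
```

The point of the sketch is the last line: the quality of **u***(*·*)* is measured by the distance between *h(***x***(t<sub>f</sub>))* and the goal in task-space, not by any property of the input signal itself.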

Classically, control inputs **u***(*·*)* belong to the infinite-dimensional space of continuous functions. Under this assumption a number of interesting control properties (e.g., controllability and observability) can be proven. The idea behind modular control is to significantly restrict the control input-space by constraining **u***(*·*)* to be a combination of modules, or muscle synergies. The various muscle synergy models can be distinguished based on the mathematical formalization of this combination, and they are described in the following (see **Figure 2** for a schematic representation). An empirical comparison of these models is proposed by Chiovetto et al. (2013).

### **2.1. TEMPORAL AND SYNCHRONOUS SYNERGIES**

In these models, the control input is defined as a linear combination of *k* vectors **w** ∈ R*m*, with 1-dimensional time-varying coefficients *a(t)* : R<sup>+</sup> → R (**Figure 2A**):

$$\mathbf{u}(t) = \sum\_{j=1}^{k} a\_j(t)\mathbf{w}\_j. \tag{1}$$

Each vector **w***<sup>j</sup>* specifies a balance between the input variables (e.g., a balance between muscle activations), and its coefficient *aj(t)* determines its temporal evolution. In the *temporal synergy model*, the coefficients {*aj(t)*} serve as the task-independent predefined modules, and the vectors {**w***j*} represent the new (task-dependent) control input. As a result, this model reduces the control space to *k* × *m* dimensions; i.e., the *k* *m*-dimensional vectors **w***<sup>j</sup>* have to be appropriately specified to fulfill the desired task **y***d*. Synergies are thus interpreted as temporal patterns that are recruited selectively by different muscles. In the literature, temporal synergies are also referred to as temporally fixed muscle synergies. An important special case, the *premotor drive model*, is obtained by defining the temporal coefficients as *aj(t)* = *Aj*φ*(t* − τ*j)*. In this case, the temporal coefficients are all determined by a common function φ*(t)*, called the premotor drive or burst pulse, that can be modulated in amplitude and shifted in time. In contrast, the *synchronous synergy model* defines the task-independent synergies as the vectors **w***j*. The new control input {*aj(t)*} belongs to the infinite-dimensional space of one-dimensional real functions. Therefore this model, unlike the previous one, provides a dimensionality reduction only if the number of synergies is lower than the number of input variables, i.e., *k < m*. Synchronous synergies are co-varying groups of muscles, and are also called time-invariant synergies, spatially fixed muscle synergies, or muscle modes.
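The distinction between the two readings of Equation (1) can be made concrete with a short numerical sketch; the synergy vectors and coefficient waveforms below are invented for illustration.

```python
import numpy as np

# Sketch of Eq. 1:  u(t) = sum_j a_j(t) w_j,  m muscles, k synergies.
# The balance vectors W and the Gaussian bursts are made-up values.
m, k, T = 5, 2, 100
t = np.linspace(0.0, 1.0, T)

W = np.array([[1.0, 0.5, 0.0, 0.2, 0.0],   # w_1: balance across 5 muscles
              [0.0, 0.3, 1.0, 0.0, 0.6]])  # w_2

A = np.vstack([np.exp(-((t - 0.3) / 0.1) ** 2),   # a_1(t): early burst
               np.exp(-((t - 0.7) / 0.1) ** 2)])  # a_2(t): late burst

U = A.T @ W   # (T, m): each row is the muscle activation vector u(t)

# Temporal model:    the waveforms a_j(t) are the fixed modules; the
#                    k*m numbers in W are the task-dependent input.
# Synchronous model: W holds the fixed modules; the waveforms a_j(t)
#                    are the task-dependent input.
```

The two models share the same factorization; they differ only in which factor is held fixed across tasks.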

**FIGURE 2 | Different models of muscle synergies.** The temporal and the synchronous models explain motor signals as linear combinations of muscle balance vectors (spatial patterns), with 1-dimensional time-varying coefficients **(A)**. In the temporal model, these coefficients serve as task-independent predefined modules, and the spatial patterns as the new (task-dependent) control input. In the synchronous model, on the other hand, the control input is represented by the temporal patterns, while the spatial patterns act as predefined modules. Finally, time-varying synergies are spatio-temporal predefined motor patterns, which can be scaled in amplitude and shifted in time by the new input coefficients **(B)**.

### **2.2. TIME-VARYING SYNERGIES**

This model defines the control input as the superposition of *k* task-independent vector-valued functions **w***(t)* : R<sup>+</sup> → R*<sup>m</sup>* (**Figure 2B**):

$$\mathbf{u}(t) = \sum\_{j=1}^{k} a\_{j} \mathbf{w}\_{j}(t - \tau\_{j}). \tag{2}$$

Each synergy **w***<sup>j</sup>* can be scaled in amplitude and shifted in time by means of the coefficients *aj,* τ*<sup>j</sup>* ∈ R. These coefficients represent the new control input, and have to be chosen in order to accomplish the task **y***d*. As a result, the new input-space is reduced to a 2 × *k* dimensional space. Neuroscientifically, these synergies are genuine spatiotemporal muscle patterns that do not impose any explicit separation between spatial and temporal structure. As such, according to this model, muscles within the same time-varying synergy do not necessarily co-vary.
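A sketch of Equation (2) with invented patterns shows the mechanics of the amplitude scaling and time shifting; the random synergy shapes and the zero-padding outside each synergy's support are our own implementation choices.

```python
import numpy as np

# Sketch of Eq. 2:  u(t) = sum_j a_j w_j(t - tau_j),  where each w_j
# is a whole spatiotemporal pattern. Shapes are purely illustrative.
m, k, T = 5, 2, 100
t = np.linspace(0.0, 1.0, T)
dt = t[1] - t[0]

rng = np.random.default_rng(0)
envelope = np.exp(-((t - 0.2) / 0.1) ** 2)           # localize each pattern
W = rng.random((k, T, m)) * envelope[None, :, None]  # (k, T, m)

def combine(W, a, tau, dt):
    """Superimpose amplitude-scaled, time-shifted synergies."""
    k, T, m = W.shape
    u = np.zeros((T, m))
    for j in range(k):
        s = int(round(tau[j] / dt))      # shift expressed in samples
        u[s:] += a[j] * W[j, :T - s]     # zero-padded outside the support
    return u

# Only 2*k numbers (here 4) specify the entire muscle pattern.
U = combine(W, a=[1.0, 0.5], tau=[0.0, 0.3], dt=dt)
```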

# **3. SYNERGIES AS INPUT-SPACE GENERATORS**

As discussed above, muscle synergies can be considered as input-space generators. Whether or not these generators are implemented in the CNS, and how they are eventually coordinated through the sensorimotor loops, is a major line of research in motor neuroscience. To tackle this question, scientists have employed two main approaches. One is based solely on the analysis of EMG signals, and can therefore provide only indirect evidence of a modular neural organization. The other aims at locating the areas of the CNS where muscle synergies might be implemented, thereby providing direct evidence. These methodologies, as well as the obtained results, are discussed in the following.

## **3.1. INDIRECT EMG-BASED EVIDENCE**

The classical approach to evaluating the muscle synergy hypothesis consists in searching for spatio-temporal regularities (i.e., synergies) in a dataset of muscle activities (**Figure 3**, continuous green arrows). Such a dataset is obtained by recording the EMG signals from a group of subjects/animals performing some prescribed motor tasks. As such, this methodology is mainly based on considerations grounded at the input level. The possibility of discriminating the various task instances from motor signals represents the only (*a-posteriori*) task-related verification of the identified synergies (see **Figure 1**).

Linear dimensionality reduction algorithms are employed to identify a small set of components (i.e., synergies) that approximate the EMG dataset according to the chosen synergy model (see section 2). The number of synergies to be extracted has to be specified *a-priori* by the experimenter, as it constitutes an input parameter of the decomposition algorithm. The choice of the decomposition algorithm depends on the assumptions made about the nature of the hypothetical muscle synergies (e.g., non-negativity, orthogonality, statistical independence, etc.) (Ting and Chvatal, 2010). Principal component analysis (PCA) (Mardia et al., 1980) looks for orthogonal synergies that account for as much of the variability in the data as possible. Similarly, factor analysis (FA) (Darlington, 1968) seeks the smallest set of synergies that can account for the common variance (correlation) of a set of muscles. Independent component analysis (ICA) (Bell and Sejnowski, 1995) maximizes the statistical independence of the extracted components, thus assuming that synergies represent independent information sources. Non-negative matrix factorization (NMF) (Lee and Seung, 1999) forces the extracted synergies and their activation coefficients to be non-negative; this constraint reflects the non-negativity of neural and muscle activations ("pull-only" behavior). Additionally, NMF does not assume that the generators are statistically independent, and is thus more compatible with the observation that activations of multiple synergies are correlated (Saltiel et al., 2001). Finally, the extraction of time-varying synergies is performed by an *ad-hoc* NMF-based algorithm that allows the components to be shifted in time (d'Avella and Tresch, 2002).
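As an illustration of such a decomposition, the sketch below applies scikit-learn's NMF implementation to a synthetic "EMG" matrix built from two known synchronous synergies; the data, dimensions, and noise level are invented for the example.

```python
import numpy as np
from sklearn.decomposition import NMF

# Synchronous-synergy extraction via NMF: factorize the EMG matrix
# (time samples x muscles) as E ~ A @ W, with non-negative activation
# coefficients A and synergy vectors W. Data here are synthetic,
# built from two known synergies plus a little noise.
rng = np.random.default_rng(1)
T, m, k = 200, 8, 2
W_true = rng.random((k, m))                       # ground-truth synergies
A_true = rng.random((T, k))                       # ground-truth activations
E = A_true @ W_true + 0.01 * rng.random((T, m))   # noisy synthetic "EMG"

model = NMF(n_components=k, init="random", random_state=0, max_iter=1000)
A = model.fit_transform(E)     # activation coefficients, shape (T, k)
W = model.components_          # extracted synergy vectors, shape (k, m)

# Uncentered VAF of the reconstruction (conventions vary across studies).
vaf = 1.0 - np.sum((E - A @ W) ** 2) / np.sum(E ** 2)
```

Because the synthetic data are truly low-dimensional, the factorization recovers them almost exactly; with real EMG the fit is never perfect, which is what makes reconstruction-quality criteria such as the VAF necessary.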

**FIGURE 3 | Procedures for the identification and testing of muscle synergies.** In experimental neuroscience (green arrows), a group of subjects initially performs the tasks prescribed by the experimenter **(A)**. The EMG signals acquired during the experiments **(B)** are then analyzed, and a dimensionality reduction algorithm is applied to obtain the synergies **(C)**. Very often such synergies are not evaluated at the task-level (dashed arrow); therefore, there is no guarantee that they lead to the observed task performance. In robotics (red arrows), synergies are synthesized **(C)** based on the requirements of the desired class of tasks **(A)**. They are then appropriately combined to generate the motor signals **(B)** that solve a specific task instance. The quality of the synthesized synergies is finally tested in terms of the obtained task performance **(A)**. Without loss of generality, the figure presents the time-varying synergy model; however, the description holds for all the models.

To assess the quality of the extracted synergies, the so-called VAF (Variance Accounted For) metric is typically used (see **Figure 1**). VAF quantifies the percentage of variability in the EMG dataset that is accounted for by the extracted synergies. High values of VAF indicate good reconstruction of the recorded EMGs, which lends credit to the extracted synergy set; low VAF values cast doubt on the extracted synergies, indicating that they do not explain a large part of the EMG variance. This metric is also used for determining the dimensionality of the synergy space. The criteria used for this purpose rely on the assumption that most of the EMG variability is attributable to task-dependent muscle activations, whereas a small portion is due to several sources of noise. Under this assumption, the number of synergies is defined either by the point where the VAF-graph (i.e., the curve that describes the trend of the VAF as a function of the number of synergies, which increases monotonically) reaches a threshold level (e.g., 90%) (Torres-Oviedo et al., 2006), or by its flattening point, i.e., the point where a drastic decrease of slope is observed. Such an "elbow" is in fact interpreted as the point that separates "structured" and noise-dependent variability, and it can therefore be used to define the minimum number of synergies that capture the task-related features (d'Avella et al., 2006; Tresch et al., 2006). Besides the VAF metric, other metrics [e.g., log-likelihood (Tresch et al., 2006)] have been proposed to evaluate the effectiveness of extracted synergies (still in input-space); a thorough discussion of these metrics is beyond the scope of the present review. As depicted in **Figure 1**, this indirect methodology is mainly restricted to the analysis of input-level data. A complementary metric based on single-trial task-decoding techniques has been proposed by Delis et al. (2013).
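The threshold criterion can be sketched as follows; the synthetic data (true rank 3), the 90% threshold, and the uncentered VAF definition are all illustrative choices, since conventions vary across studies.

```python
import numpy as np
from sklearn.decomposition import NMF

# Choosing the number of synergies from the VAF-graph: extract 1..5
# synergies and pick the smallest k whose VAF crosses the threshold.
rng = np.random.default_rng(2)
T, m = 300, 10
E = rng.random((T, 3)) @ rng.random((3, m)) + 0.02 * rng.random((T, m))

def vaf(E, k):
    """Uncentered VAF of a k-synergy NMF reconstruction of E."""
    model = NMF(n_components=k, init="random", random_state=0,
                max_iter=1000)
    A = model.fit_transform(E)
    sse = np.sum((E - A @ model.components_) ** 2)
    return 1.0 - sse / np.sum(E ** 2)

curve = [vaf(E, k) for k in range(1, 6)]                # the VAF-graph
n_syn = next((k for k, v in enumerate(curve, start=1)   # threshold rule
              if v >= 0.90), len(curve))
```

An elbow criterion would instead search `curve` for the point after which its slope drops sharply, rather than comparing against a fixed threshold.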

A significant number of experiments have been conducted in frogs, cats, primates, and humans in order to test the validity of the above-mentioned synergy models and, by extension, of the muscle synergy hypothesis itself. A pioneering study showed that a small set of synchronous muscle synergies could generate a large number of reflexive motor patterns produced by cutaneous stimulations of the frog hindlimb (Tresch et al., 1999). This study also demonstrated that microstimulation of the spinal cord produced muscle synergies very similar to those generated by the freely moving animal. Qualitatively similar synergies were also found by intraspinal microstimulation (Saltiel et al., 2001). The above analysis was then extended in order to identify spatiotemporal patterns of muscle activities (i.e., time-varying muscle synergies) (d'Avella et al., 2003). A few time-varying synergies were shown to underlie the muscle patterns required for the frog to kick in different directions, and their recruitment was directly related to movement kinematics. These findings were further generalized to a wide variety of natural frog motor behaviors such as jumping, swimming, and walking; evidence for both synchronous and time-varying synergies was reported (d'Avella and Bizzi, 2005). Additionally, this study revealed that some synergies are shared across motor behaviors, while others are behavior specific.

The synergy models described in section 2 do not include sensory feedback, however, the original experiments on animals involved sensory-triggered reflexive movements. In fact, only a few studies have systematically investigated the influence of sensory feedback in the muscle synergy organization. Cheung et al. (2005) analyzed the EMG signals collected from the bullfrog during locomotor behaviors before and after having interrupted its sensory pathways (i.e., deafferentation). Their findings support the existence of centrally organized synchronous muscle synergies that are modulated by sensory inflow. Further support was provided by showing that an appropriate modulation of the synergy activations could explain immediate motor adjustments, and that these synergies were robust across different dynamic conditions (Cheung et al., 2009a). A discussion on the role of sensory feedback is provided in section 5.

A number of studies have examined the generalization of the above results to other species. In primates, Overduin et al. (2008) found that three time-varying synergies described a large repertoire of grasping tasks. The shape and size of the grasped objects were shown to modulate the recruitment strength as well as the timing of each synergy. Thus, this study showed that time-varying synergies can account for salient task differences, and that their activations can be tuned to adapt to novel behavioral contexts. Along the same lines, Brochier et al. (2004) provided further support for such a robust and distinctive synergistic organization of primates' muscle patterns during grasping. Analysis of single-trial EMG signals demonstrated that the time-varying activation of three synchronous synergies was reproducible across repetitions of the same grasping task and allowed unequivocal identification of the object grasped in each single trial. In cats, Ting's group showed that muscle synergies could be mapped onto the control of task-level variables; such experiments will be detailed in section 4.2.

The framework of muscle synergies has also been successful in characterizing the spatio-temporal organization of muscle contractions during human reaching tasks. Muscle patterns observed during movements in different directions (d'Avella et al., 2006) and at different speeds (d'Avella et al., 2008) were accurately reconstructed by appropriate linear combinations of synergies, which appeared very similar across subjects. The synergies extracted from muscle activities during unloaded reaching (i.e., subjects did not hold any load in their hands) also accounted for the EMG signals obtained under loaded conditions. The recruitment of the individual synergies, as well as their onset times, was consistently modulated with movement direction, and did not change substantially with movement speed. This observation was further confirmed by Muceli et al. (2010); in this study a small set of specialized synchronous synergies was able to explain a large set of multi-joint movements in various directions. Finally, visually guided online corrections during center-out reaching were tested recently. The synergistic strategy was shown to be robust and more effective in explaining the corrective muscle patterns than the individual muscle activities (d'Avella et al., 2011). Furthermore, it was shown that to correct ongoing reaching movements, the CNS may either modulate existing synergies (d'Avella et al., 2011) or reprogram new ones (Fautrelle et al., 2010).

Roh et al. (2012) showed that an appropriate set of synergies could reconstruct the average patterns of muscle activation observed during isometric force production in humans. The EMG signals were obtained for different force magnitudes, directions, and initial postures. The extracted synergies were very similar across conditions, and they were able to explain the corresponding datasets. Each synergy seemed to underlie a specific force direction, while its activation coefficient appeared correlated with the force magnitude. In another series of experiments, a small set of synchronous synergies was able to explain static hand postures and discriminate the shapes of grasped objects (Weiss and Flanders, 2004). Moreover, a few time-varying synergies succeeded in revealing the spatiotemporal patterns of muscle activity during hand shape transitions, as in fingerspelling (Klein Breteler et al., 2007).

A relevant series of experiments showed that muscle activations involved in human postural control can be explained in terms of combinations of muscle synergies. A set of synchronous muscle synergies was able to explain muscle activations involved in postural stabilization; the EMG variation observed among trials and perturbation directions was accounted for by appropriate modulations of the synergy activation coefficients (Torres-Oviedo and Ting, 2007). In order to verify that the extracted synergies did not depend only on the specific biomechanical context, in a new experiment a set of subjects was asked to react to support perturbations from different postural configurations (Torres-Oviedo and Ting, 2010). The extracted synergies were very similar across the different conditions; however, in some cases task-specific muscle synergies needed to be added to the original synergy set to obtain a satisfactory EMG reconstruction. As the various postures lead to different patterns of sensory inflow, these results rule out the possibility that the observed synergies are determined only by specific patterns of sensory stimulation. On the contrary, they support the hypothesis that different postural muscle responses are generated by task-related modulations of the synergy activation levels. Such a hypothesis found support in the experiments performed by Safavynia and Ting (2012), where the temporal recruitment of the identified synchronous muscle synergies was explained by a mathematical model that explicitly takes into account the kinematics of the subject's center of mass (CoM). The authors then concluded that synchronous muscle synergies are recruited according to an estimate of task-related variables. The same model was previously used to fit the activations of each muscle independently during the same postural perturbation tasks (Welch and Ting, 2007).

Also related to postural control, Krishnamoorthy and colleagues analyzed the muscle activations that underlie shifts of the center of pressure (COP) of standing subjects (Krishnamoorthy et al., 2003a,b). In this experiment three "muscle modes," extracted by means of PCA, explained most of the variability of the integrated EMG signals. Such components are equivalent to synchronous muscle synergies as defined in section 2, and they are characterized by the authors as the independent elemental variables that are controlled synergistically (in the sense of the UMH) by the CNS to stabilize the COP. Specifically, the model assumes that the location of the COP is modified by linear combinations of the M-modes, and their mixing coefficients represent the independent variables controlled by the CNS. Perreault et al. (2008) examined the organization of reflexes involved in postural stabilization in both stiff and compliant environments; although reflexive responses are modulated by the direction of perturbation, they showed that the synchronous muscle synergies appear very similar across conditions.

Another scenario that provides evidence for the muscle synergy hypothesis is human locomotion (Ivanenko et al., 2006a; Lacquaniti et al., 2012b). Ivanenko et al. (2004) showed that five temporal synergies could reconstruct the muscle activity involved in locomotion tasks. These patterns are robust across walking speeds and gravitational loads, and they relate to foot kinematics (Ivanenko et al., 2003). Additionally, the same temporal synergies (accompanied by additional ones) were observed during the coordination of locomotion with additional voluntary movements (Ivanenko et al., 2005). Similar results have been reported for other locomotor behaviors such as running (Cappellini et al., 2006) and pedaling (Hug et al., 2011).

Finally, some experiments have investigated how the hypothetical synergy organization of the CNS evolves during ontogenetic development (Lacquaniti et al., 2012a). Dominici et al. (2011) observed that the two temporal synergies identified in stepping neonates are retained through development, and are augmented by two new patterns first revealed in toddlers. The final set of synergies was observed in several animal species, consistent with the hypothesis that, despite substantial phylogenetic distances and morphological differences, locomotion is built from common temporal synergies. This conclusion was also supported by the comparison of temporal synergies extracted from young and elderly people, which revealed no significant effect of aging on synergy composition and activation (Monaco et al., 2010).

# **3.2. DIRECT NEURAL EVIDENCE**

The studies presented so far support the existence of synergistic muscle activations during the sensorimotor control of movements. However, these methods are indirect, in the sense that the presence of synergistic structures within the CNS can only be inferred. What remains to be tested is whether the uncovered muscle organization is neurally implemented in the CNS and, if so, in which areas. Alternatively, one could argue that the extracted synergies represent a phenomenological output of the motor coordination required for movement execution. For instance, Kutch and Valero-Cuevas (2012) recently designed careful experiments and simulations to show that muscle synergies can be observed even if the nervous system does not control muscles in groups. The authors demonstrated that muscle synergies, as detected via dimensionality reduction methods (see section 3.1), may originate from biomechanical couplings and/or from constraints of the task. Similar conclusions had already been reached by Valero-Cuevas et al. (2009), who showed that the observed within-trial variability of EMG data underlying the production of fingertip forces was incompatible with the (unique) associated muscle synergy that would have been extracted. Although these findings do not directly falsify the muscle synergy hypothesis, they cast at least some doubt on a purely neural origin of modularity.

This underlines the need for a more critical assessment of the validity of the muscle synergy hypothesis. In this direction, a number of recent studies sought evidence for a neural implementation of muscle synergies, and examined which regions of the CNS may express synergies and their activations. This question has been addressed by attempting to relate neural activity to simultaneously recorded muscle activity during the performance of different motor tasks. Using such an approach, Holdefer and Miller (2002) provided direct support for the existence of neural substrates of muscle synergies in the monkey primary motor cortex. In particular, they studied the activity of neurons and muscles during the execution of a variety of reaching and pointing movements, and found that the discharge of individual neurons represents the activation of functional groups of muscles. In addition, Hart and Giszter (2010) showed that some interneurons of the frog spinal cord were better correlated with temporal synergies than with individual muscles. They therefore suggested that these neural populations constitute a neural basis for synergistic muscle activations (Delis et al., 2010). Another study demonstrated that the sequential activation of populations of neurons in the cat motor cortex initiates and sequentially modifies the activity of a small number of functionally distinct groups of synergistic muscles (Yakovenko et al., 2010). Similarly, Overduin et al. (2012) showed that microstimulation of specific regions of the motor cortex of two rhesus macaques produced well-defined spatial patterns of muscle activation. These synchronous synergies were very similar to those extracted from the same animals during natural reaching and grasping behaviors. 
Extending this research line to motor learning, Kargo and Nitz (2003) showed that early skill learning is expressed through the selection and tuning of primary motor cortex firing rates, which specify temporal patterns of synergistic muscle contractions in the frog's limb. Finally, Roh et al. (2011) analyzed the muscle patterns of the frog before and after transection at different levels of the neuraxis: brain stem, medulla, and spinal cord. They found that the medulla and spinal cord are sufficient for the expression of most (but not all) muscle synergies, which are likely activated by descending commands from supraspinal areas. Similarly, Hart and Giszter (2004) examined the compositionality of temporal synergies in decerebrated and spinalized frogs. Their results indicated that in both cases temporal synergies consisted of pulsed or burst-like activations of groups of muscles. They also showed that brainstem frogs had more focused muscle groups and exhibited richer behaviors than their spinalized equivalents.

In humans, the main approach to locating hypothetical muscle synergies has been the analysis of brain-damaged patients. Comparing the synergies extracted from healthy and brain-damaged subjects can provide hints about the neural centers involved in the synergistic control of muscles. In this vein, examining motor tasks involving arm and hand movements, Cheung et al. (2009b) showed that the synchronous synergies extracted from the arm affected by a stroke were strikingly similar to those extracted from the unaffected arm, concluding that muscle synergies were located in regions of the CNS that were not damaged. In a second study involving subjects with more severe motor impairment (Cheung et al., 2012), they found that synchronous synergies may be modified according to three distinct patterns (preservation, merging, and fractionation of muscle synergies), reflecting the multiple neural responses that occur after cortical damage. These patterns varied as a function of both the severity of functional impairment and the time elapsed since stroke onset. Similarly, Roh et al. (2013) found systematic alterations of the upper limb synergies involved in isometric force production in stroke patients with severe motor impairment; however, these alterations did not involve merging or fractionation of normal synergies. Clark et al. (2010) investigated the modular organization of locomotion in stroke patients. They found a coordination pattern consisting of fewer synchronous synergies than in healthy subjects. These synergies resulted from the merging of the synergies observed in healthy subjects, suggesting reduced independence of the neural control signals. In contrast, Gizzi et al. (2011) demonstrated that the temporal waveforms of the synergy activation signals, but not the synchronous synergies, were preserved after stroke.

Finally, a different but noteworthy approach has been the attempt to map the activity of leg muscles onto the alpha-motoneuron pools along the rostrocaudal axis of the spinal cord during human locomotion (Ivanenko et al., 2006b, 2008). Using this procedure, the authors could infer the temporal and spatial spinal motor output for all the muscles of the legs during a variety of human walking conditions, and relate it to the control of task-relevant variables such as center-of-mass displacements. Overall, their findings support the existence of spinal circuitry that implements temporal synergies. The strength of this approach resides in the explicit use of anatomical and clinical charts that document the innervation of the lower limb muscles from the lumbosacral enlargement (Cappellini et al., 2010).

# **4. SYNERGIES FROM THE PERSPECTIVE OF THE TASK-SPACE**

# **4.1. FROM INPUT-SPACE TO TASK-SPACE: GENERAL RATIONALE**

The methodology presented in section 3.1 has undeniably led to many crucial insights; however, it does not guarantee that the extracted synergies account for the observed task performance. VAF-like metrics only measure the capability of the synergies to reconstruct (fit) the dataset of recorded "input signals" (i.e., EMG data). Moreover, in some studies such signals are averaged across movement repetitions. In this case, the VAF constitutes an average indicator, and it does not quantify the capability of the synergies to reconstruct each individual trial (Ranganathan and Krishnan, 2012). Since the musculoskeletal apparatus is a non-linear system, these approximations of the recorded muscle activities may not lead to the observed task performance (Broer and Takens, 2011; section 1.1), a condition that would undermine the validity of the hypothesized modular control structure. On a similar note, the extracted synergies might generate unfeasible joint torques. Finally, even if the dataset of muscle activity is very well approximated, additional muscles that are not recorded during the experiment might play a crucial role in the generation of the movement. These issues emerge because the dynamics of the musculoskeletal system (i.e., its input–output relation) is not directly taken into account by the synergy decomposition algorithms.
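
The gap between a dataset-level VAF and per-trial VAF is easy to demonstrate on synthetic data: below, a dataset that is rank-2 except for one outlier trial yields an excellent global VAF while the outlier itself is reconstructed noticeably worse. All numbers are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical EMG dataset: 20 trials x 8 muscles, mostly rank-2, with
# one trial that the 2-synergy model cannot fully capture.
W = rng.random((2, 8))
H = rng.random((20, 2))
emg = H @ W
emg[7] += rng.random(8)  # trial 7 deviates from the 2-synergy subspace

# Least-squares reconstruction with the 2 generative synergies.
coef, *_ = np.linalg.lstsq(W.T, emg.T, rcond=None)
recon = (W.T @ coef).T

def vaf(data, rec):
    """Uncentred variance-accounted-for, as commonly used for synergies."""
    return 1.0 - np.sum((data - rec) ** 2) / np.sum(data ** 2)

global_vaf = vaf(emg, recon)
trial_vaf = np.array([vaf(emg[i], recon[i]) for i in range(len(emg))])

# The dataset-level VAF looks excellent, yet the outlier trial is fit
# worse: averaging across the dataset can mask per-trial failures.
print(round(global_vaf, 3), round(trial_vaf.min(), 3))
```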

In this section we review the works that attempt to relate muscle synergies to performance variables defined in task-space. Initially, we present the concepts of functional synergies and spinal force fields. The former constitutes a valid strategy to include task variables in the classical EMG-based analysis; the latter provides task-based evidence for neurally implemented muscle synergies. Then, we discuss some studies that, in the context of biomechanics, employ plausible musculoskeletal models to test the movements obtained from experimentally extracted muscle synergies. Finally, we shift our attention to robotics and character animation. In these fields, the main challenge is the synthesis of a small set of synergies that reduces the dimensionality of control and, at the same time, spans a subspace of actuations that allows the agent to perform a wide variety of tasks (**Figure 3**, red arrows). Ideally, the synthesized synergies should preserve controllability and reachability of the system. Loosely speaking, this means that any desired system state can be reached by an appropriate control input (i.e., combination of synergies) in a finite amount of time. At the motor level, it is important that the synergies generate feasible actuations; additional properties, such as the generation of optimal control signals, may also be desirable (see **Figure 1**).

# **4.2. FUNCTIONAL MUSCLE SYNERGIES AND SPINAL FORCE FIELDS**

In most of the works presented so far, the functional role of muscle synergies is estimated *a posteriori* by analyzing the dependence of the recruitment coefficients (i.e., gain and/or onset time) on the task conditions (e.g., reaching direction, force magnitude and direction, perturbation direction). Typically, each muscle synergy is assumed to underlie the task-level functionality observed in conjunction with the higher values of its activation coefficient. As an example, the analysis of directional tuning curves showed that some of the synergies were directly related to reaching in specific directions (d'Avella et al., 2008). A different approach is taken by a pool of studies that define the concept of functional synergies, i.e., components, typically extracted by means of NMF, of a dataset containing both EMG signals and measurements of defined task-related variables. As a result, each component consists of two elements: a balance of muscle contractions (i.e., a synchronous muscle synergy), and the evolution of the task-related variables induced by such a muscle synergy (a task-related vector). In our view, the concept of functional synergies provides a way to tackle the drawbacks of input-based extraction algorithms: if a set of functional muscle synergies extracted from a training set is able to reconstruct both the EMG and, more importantly, the task-related signals observed in another set of data (testing set), then it is more likely that combinations of such muscle synergies will generate the appropriate control signals to perform the task successfully.
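
A minimal sketch of the idea, assuming synthetic data: EMG channels and task-related variables (rescaled to be non-negative) are stacked into one matrix and decomposed with a plain multiplicative-update NMF, so that each extracted component splits into a muscle balance and its task-related vector. The dimensions and the NMF implementation are illustrative choices, not those of the cited studies.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical data: 30 perturbation trials, 10 muscles + 3 task variables
# (e.g., ground-reaction-force components), all rescaled to be non-negative.
W_true = rng.random((3, 13))            # 3 generative functional synergies
C_true = rng.random((30, 3))
X = C_true @ W_true                     # stacked [EMG | task] data matrix

def nmf(X, k, iters=500, seed=0):
    """Plain multiplicative-update NMF (Lee & Seung): X ~ C @ W."""
    r = np.random.default_rng(seed)
    C = r.random((X.shape[0], k)) + 1e-3
    W = r.random((k, X.shape[1])) + 1e-3
    for _ in range(iters):
        W *= (C.T @ X) / (C.T @ C @ W + 1e-12)
        C *= (X @ W.T) / (C @ W @ W.T + 1e-12)
    return C, W

C, W = nmf(X, k=3)
vaf = 1 - np.sum((X - C @ W) ** 2) / np.sum(X ** 2)

# Each row of W splits into a muscle balance and its task-related vector.
muscle_part, task_part = W[:, :10], W[:, 10:]
print(round(vaf, 4))
```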

Functional muscle synergies were analyzed in the context of postural tasks in experiments with humans (Chvatal et al., 2011) and cats (Ting and Macpherson, 2005; Torres-Oviedo et al., 2006). The task-related variables were defined as the forces measured under the feet of the subject, who reacted to unexpected motions of the support surface. The experiments showed that each subject exhibited the same functional synergies for both stepping and non-stepping responses to perturbations (Chvatal et al., 2011), suggesting that a common pool of muscle synergies, with specific biomechanical functionalities, can be used by the CNS to drive the motion of the CoM independently of the subject's behavioral response. The functional synergies extracted from the non-stepping data were able to reconstruct the EMG signals, the CoM acceleration, and the direction (though not the magnitude) of the forces recorded during stepping responses; however, an additional stepping-specific muscle synergy was needed to improve the quality of the EMG reconstruction. The generality and robustness of functional synergies were also analyzed in postural experiments with cats (Torres-Oviedo et al., 2006). In this study, a group of cats experienced both translations and rotations of the support surface. Functional muscle synergies were extracted from a dataset containing EMG signals and ground forces observed for different postural configurations (i.e., distances between the anterior and posterior legs). The functional synergies extracted during surface translations for the most natural posture were able to reconstruct the data observed in all the other conditions (i.e., different postural configurations and surface rotations). Moreover, the functional synergies appeared very similar across subjects. These results suggested that each muscle synergy implements a specific biomechanical functionality (Ting and Macpherson, 2005), which is general across tasks and robust across subjects.

The methodology proposed by Ting and colleagues is undoubtedly a valuable attempt to identify muscle synergies that are directly related to task execution; however, it presents some limitations. First, NMF extracts non-negative components and coefficients; while this constraint is well justified at the muscle activation level (see section 3.1), task variables may exhibit negative values. Second, and more importantly, this decomposition procedure assumes not only linear superposition at the task level, but also that the EMG signals and the task variables are generated with the same mixing coefficients. Although it is possible to obtain a good fit of a given dataset, due to the non-linearity of the musculoskeletal system this assumption does not hold in general.

A radically different approach to investigating the modularity of motor circuitries consists in analyzing the so-called spinal force fields. This method is grounded in the seminal discovery that electrical stimulation of individual regions of the frog's spinal cord produces characteristic isometric endpoint forces that depend on the posture of the limb; the direction of the force vectors within each of these fields is invariant over time, while their magnitudes follow a specific time evolution. Additionally, each of these force fields features a specific point of convergence. Structures with these characteristics can be generated by groups of coactive and linearly covarying muscles (Giszter et al., 1993; Mussa-Ivaldi et al., 1994). In particular, only a small subset of all the possible muscle combinations leads to robust and convergent force fields (Loeb et al., 2000). Therefore, the observation of such characteristics in an experimentally measured force field can be regarded as indirect evidence for spinally implemented temporal muscle synergies (see section 2). Kargo and Giszter (2000b) showed that rapid corrections of movements in wiping frogs can be explained as linear combinations of spinal force fields. Additional evidence was obtained by examining the force fields generated by frogs (Giszter and Kargo, 2000) and turtles (Stein, 2008) that exhibited deletions of motor patterns. Another way to investigate the nature of spinal circuits is the analysis of feedback mechanisms in relation to force fields. Different external excitations of the frog's muscle spindles during wiping reflexes led to force fields that were structurally invariant across time. Furthermore, the bursts of muscle activity underlying the wiping behavior and the balance of activations across muscles were not altered by spindle feedback. Instead, feedback regulated the amplitude and timing of each single burst. 
Since these variables did not covary across the pulses, the authors concluded that individual premotor drive pulses, rather than time-varying synergies, are the units of spinal activity (Kargo and Giszter, 2008). Such a hypothetical neural organization is compatible with the synergy scheme proposed by Drew et al. (2008) and Krouchev et al. (2006) for locomotor behaviors. These schemes allow a sequential activation of coordinated groups of muscles, a mechanism that can be implemented in the premotor drive model by modulating the onset times of the bursts. Spinal force fields are effectively task-level representations of hypothetical neural modules; however, this methodology does not provide any estimate of what the corresponding muscle synergies may look like. Moreover, the relation between linear combinations of muscle synergies and linear combinations of force fields is far from trivial.
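
The superposition property that makes force fields attractive as modules can be illustrated with the simplest possible model: if each basis field is a linear spring pulling the endpoint toward its own equilibrium, any non-negative combination is again convergent, with an equilibrium at the stiffness-weighted average of the basis equilibria. The stiffnesses and equilibrium points below are arbitrary.

```python
import numpy as np

# Two hypothetical convergent force fields, each modelled as a linear
# spring pulling the endpoint toward its own equilibrium point.
K1, K2 = 2.0, 1.0                      # field stiffnesses
eq1 = np.array([0.0, 1.0])             # equilibrium of field 1
eq2 = np.array([1.0, 0.0])             # equilibrium of field 2

def field(x, c1, c2):
    """Linear combination of the two basis fields at endpoint position x."""
    return c1 * K1 * (eq1 - x) + c2 * K2 * (eq2 - x)

# For spring-like fields the combined field is again convergent, with an
# equilibrium at the stiffness-weighted average of the basis equilibria.
c1, c2 = 0.5, 1.5
eq_combined = (c1 * K1 * eq1 + c2 * K2 * eq2) / (c1 * K1 + c2 * K2)
print(np.round(eq_combined, 3), np.round(field(eq_combined, c1, c2), 6))
```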

# **4.3. NEUROMECHANICAL MODELING**

Although many studies in experimental motor control support the hypothesis of muscle synergies, it is hard to test whether the proposed control model can effectively lead to the task performance observed experimentally and generalize to other tasks. This issue can be tackled computationally by employing biologically plausible models of the musculoskeletal apparatus.

A pool of studies investigates whether a modular organization such as the synchronous synergy model can explain a complex task like human walking (Neptune et al., 2009; McGowan et al., 2010; Allen and Neptune, 2012). A set of synergies is identified from a dataset of recorded EMG signals by means of NMF. Such "modules" are then used to generate the muscle control inputs to a musculoskeletal model of the human legs. Using these synergies as an initial guess, a numerical procedure optimizes the relative level of muscle activation within each module and the time course of the weighting coefficients; the objective is to minimize the difference between the results of the forward simulation and the values of the task variables measured experimentally. The walking kinematics and the ground reaction forces are well reproduced by five modules when the motion is constrained to 2D (Neptune et al., 2009), and by six modules for 3D walking (Allen and Neptune, 2012). Additional simulations reveal that the muscle groups identified during normal walking are able to emulate walking tasks with very different mechanical demands (i.e., changes in the mass and weight of the models) (McGowan et al., 2010). These results agree with the theoretical considerations formulated by Nori et al. (2008). Finally, this research shows that each module is associated with a specific biomechanical functionality (e.g., body support, forward propulsion, leg swing, and balance).
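
The module-based generation of muscle inputs in these simulations can be sketched as m(t) = c(t) W: fixed muscle balances (modules) mixed by time-varying weighting coefficients. The Gaussian time courses, module count, and muscle count below are illustrative stand-ins for the optimized quantities of the cited studies.

```python
import numpy as np

# Hypothetical 2D-walking setup: 5 modules over 16 leg muscles,
# activated by gait-phase-dependent weighting coefficients.
rng = np.random.default_rng(3)
n_modules, n_muscles, n_samples = 5, 16, 100
modules = rng.random((n_modules, n_muscles))       # fixed muscle balances

t = np.linspace(0, 1, n_samples)                   # one gait cycle (0-100%)
# Each module's time course: a Gaussian bump centred at a different phase.
centres = np.linspace(0.1, 0.9, n_modules)
coeffs = np.exp(-((t[:, None] - centres) ** 2) / (2 * 0.05 ** 2))

# Muscle excitations fed to the musculoskeletal model: m(t) = c(t) @ W.
excitations = coeffs @ modules
excitations = np.clip(excitations, 0.0, 1.0)       # physiological bounds

print(excitations.shape)   # (100, 16): time samples x muscles
```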

Related results are presented by McKay and Ting (2008, 2012). The goal of these studies is to predict the patterns of muscle activity and the ground reaction forces observed experimentally in unrestrained balance tasks with cats (Torres-Oviedo et al., 2006). Muscle contractions are computed for an anatomically realistic musculoskeletal model of the cat; the optimization procedure constrains task-related variables (i.e., center of mass) to match the experimental results. Although many different cost functions are tested, the best predictions are achieved by minimizing control effort (i.e., total squared muscle activation). Predictions improve if muscle contractions are constrained to linear combinations of the experimentally derived synergies (Torres-Oviedo et al., 2006); however, the overall control effort increases, and the range of admissible ground forces is substantially reduced. Furthermore, these studies validate the assumption made by Torres-Oviedo et al. (2006) that the ground reaction forces associated with each synergy rotate as a function of the limb axis. These results suggest that muscle synergies are feasible physiological mechanisms for the implementation of near-optimal or "good-enough" motor behaviors (de Rugy et al., 2012).

Kargo et al. (2010) employed a biomechanical model of the frog hindlimb to test whether the premotor drive model could account for the wiping behavior observed experimentally (Kargo and Giszter, 2008). The parameters of the premotor drive model (i.e., muscle groups, pulse time course, and amplitude and phasing of the single synergies) are initially identified so as to reproduce experimental isometric forces and free limb movement kinematics. As expected, starting from different limb postures the derived feedforward control fails to drive the simulated limb toward the target. However, as shown by Kargo and Giszter (2008), appropriate feedback modulations of the amplitude and phase shift of the drive bursts, together with an adjustment of the muscle balance based on the initial configuration of the limb, are enough to generate successful muscle activations. Furthermore, the limb trajectories obtained with and without feedback are very similar to those observed in intact and deafferented (Kargo and Giszter, 2000a) frogs, respectively. These results support the model of premotor drives, in which feedback mechanisms preserve the duration of the pulses.

Berniker (2005) analyzed the muscle synergy control scheme mathematically and proposed a principle for its formation (Berniker et al., 2009). A linear reduced-dimensional dynamical model that preserves (to the best extent possible) the natural dynamics of the original system is first computed. Synergies are defined as the minimal set of input vectors that influence the output of the reduced-order model (Berniker, 2005), and that minimally restrict the commands (and the resulting responses) useful to solve the desired tasks (Berniker et al., 2009). In practice, this set is found by optimizing the synergy matrix over a representative dataset of desired sensory-motor signals. This method was able to synthesize a set of synergies for the model of the frog hindlimb that were very similar to those observed experimentally (Cheung et al., 2005). Furthermore, the synergy-based controller produced muscle activations and kinematic trajectories that were comparable to those obtained with the best-case controller (which can activate each muscle independently).
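
A loose, toy rendition of the two-step idea (reduce the dynamics first, then pick input vectors for the reduced model) on a random linear plant; the snapshot-SVD reduction and the choice of synergies as the dominant input directions of the reduced model are simplifying assumptions for illustration, not Berniker's actual procedure.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical linear plant x' = A x + B u with 10 states and 6 "muscles".
n, m, k = 10, 6, 2
A = -np.eye(n) + 0.1 * rng.standard_normal((n, n))
B = rng.random((n, m))

# Collect state snapshots under random inputs and reduce with an SVD
# (proper orthogonal decomposition).
dt, x, snaps = 0.01, np.zeros(n), []
for _ in range(500):
    x = x + dt * (A @ x + B @ rng.random(m))
    snaps.append(x.copy())
U, s, _ = np.linalg.svd(np.array(snaps).T, full_matrices=False)
P = U[:, :k]                          # projection onto k dominant modes

A_red, B_red = P.T @ A @ P, P.T @ B
# "Synergies": the input directions that most strongly drive the reduced
# model, taken here as the leading right singular vectors of B_red.
_, _, Vt = np.linalg.svd(B_red)
synergies = Vt[:k]                    # k synergy vectors over the 6 muscles
print(synergies.shape)
```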

# **4.4. ROBOTICS AND CHARACTER ANIMATION**

In the context of robotics and character animation, the concept of muscle synergies is appealing as it provides a strategy to reduce the number of variables to be controlled (synchronous synergy model) or, more generally, the dimensionality of the control signals (time-varying synergy model). Animated characters are embedded in physical environments (i.e., governed by the laws of physics); thus, the associated control problem is entirely equivalent to the control of a musculoskeletal model or of a humanoid robot. In this section we present the works that have been carried out in these fields of research.

The work proposed by Mussa-Ivaldi (1997) is one of the first attempts to develop a controller based on the modularity observed in biological systems (Mussa-Ivaldi and Giszter, 1992). The idea is that the motion of a kinematic chain can be determined by a force field applied to its end effector. Inspired by the experiments performed by Giszter et al. (1993), such a force field results from the linear combination of basis fields, each characterized by a single equilibrium point in operational space. Results show that, for a simulated planar kinematic chain, an appropriate choice of the basis-field coefficients can produce a wide variety of end-effector trajectories. Similarly, Matarić et al. (1999) used force fields to drive joint torque controllers on a rigid-body animated character (Matarić et al., 1998a,b).

Although the concept of spinal force fields is very similar, Mussa-Ivaldi's work does not directly use the notion of synergy as defined in section 2. A step forward is taken by Nori and Frezza, who propose a mathematical formulation for a set of actuations (i.e., synergies) that comply with the hypothesized properties of spinal force fields (Mussa-Ivaldi and Bizzi, 2000). The mathematical description of the synergies is derived from the closed-form solution of an optimal control problem. Additionally, a feedback controller ensures that the system follows the desired trajectory toward the synergy equilibrium position. It is proved that the proposed formulation guarantees system controllability<sup>2</sup>. The synthesized synergies are successfully tested on a simulated two-degrees-of-freedom (dof) planar kinematic chain (Nori, 2005; Nori and Frezza, 2005).

The idea that each synergy solves a well-defined control problem [e.g., leading the system to a specific equilibrium position (Nori and Frezza, 2005)] appears in several other studies (Chhabra and Jacobs, 2008; Todorov, 2009; Alessandro and Nori, 2012). Chhabra and Jacobs (2008) propose a method called greedy additive regression (GAR). A library of task-specific actuations (synergies) is kept in memory. When a new task has to be performed, a suitable actuation is first sought in the linear span of these synergies. If the lowest achievable task error is above a certain threshold, the task is solved via traditional methods (e.g., feedback error learning), and the obtained actuation is added to the library. If the library already contains the maximum number of synergies allowed, the least-used one is removed. The obtained results suggest that the synergies synthesized via GAR outperform primitives based on PCA if the dynamical system is non-linear (planar kinematic chain), while there is no statistical difference for linear systems. However, no theoretical explanation is provided.
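
The GAR bookkeeping can be sketched as follows; the "traditional solver" is stood in for by simply returning the exact target actuation, and all dimensions and thresholds are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(5)

class GARLibrary:
    """Toy sketch of greedy additive regression: actuations live in R^8,
    the library holds at most `max_size` synergies, and a stand-in
    'traditional solver' simply returns the exact target actuation."""

    def __init__(self, max_size=4, threshold=1e-3):
        self.synergies, self.uses = [], []
        self.max_size, self.threshold = max_size, threshold

    def solve(self, target):
        if self.synergies:
            S = np.array(self.synergies).T             # 8 x n_synergies
            c, *_ = np.linalg.lstsq(S, target, rcond=None)
            if np.linalg.norm(S @ c - target) < self.threshold:
                for i in np.flatnonzero(np.abs(c) > 1e-9):
                    self.uses[i] += 1                  # track synergy usage
                return S @ c
        # Span insufficient: fall back to the (stand-in) traditional
        # solver and store the new actuation as a synergy.
        if len(self.synergies) == self.max_size:
            worst = int(np.argmin(self.uses))          # evict least used
            del self.synergies[worst], self.uses[worst]
        self.synergies.append(target.copy())
        self.uses.append(1)
        return target

lib = GARLibrary()
a, b = rng.random(8), rng.random(8)
lib.solve(a)                          # first task: library grows
lib.solve(b)                          # second independent task: grows again
out = lib.solve(0.3 * a + 0.7 * b)    # already in the span: no growth
print(len(lib.synergies))
```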

In the same vein, Todorov (2009) proved that, for a certain class of stochastic optimal control problems, an appropriate change of variable in the Bellman equation makes it possible to obtain the optimal control policy as a linear combination of primitives. These primitives are, in turn, solutions to other optimal control problems. This method has recently been tested in the context of character animation (da Silva et al., 2009). It is important to clarify that this theory provides a theoretical grounding for the compositionality of optimal control laws but, like GAR, it does not provide a method to compute the primitives themselves. In fact, although new efficient methods have been proposed recently, solving an optimal control problem remains computationally intensive, and it might be unfeasible for systems with a large number of dof.
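
In outline, the construction rests on the desirability change of variable; the following summary (with the temperature-like scale factor absorbed into the costs) is a compact restatement of the published result, not a derivation:

```latex
% Desirability change of variable: the exponentiated negative value function
z(\mathbf{x}) = e^{-v(\mathbf{x})}
% turns the Bellman equation of a linearly-solvable MDP into a linear relation
z(\mathbf{x}) = e^{-q(\mathbf{x})}\,
    \mathbb{E}_{\mathbf{x}' \sim p(\cdot \mid \mathbf{x})}\big[z(\mathbf{x}')\big],
% where q is the state cost and p the passive dynamics. For component tasks
% sharing q and p and differing only in terminal costs g_k, a composite task
% whose terminal cost satisfies
e^{-g(\mathbf{x})} = \sum_k w_k \, e^{-g_k(\mathbf{x})}
% inherits, by linearity, the composite desirability (hence optimal policy)
z(\mathbf{x}) = \sum_k w_k \, z_k(\mathbf{x}).
```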

Another mathematical framework that has recently been developed in the context of character animation is based on the optimal anechoic mixture decomposition model, which is mathematically equivalent to the time-varying synergy decomposition. Specifically, complex kinematic animations are obtained by mixing primitive source signals that are learned from motion-capture data (Mezger et al., 2005; Park et al., 2008a,b; Giese et al., 2009). Within this framework a number of interesting results have been achieved, including a mathematical proof of stability properties for groups of characters that interact in various ways (Mukovskiy et al., 2011).

The procedure presented by Alessandro et al. (2012) is grounded in a method to solve generalized reaching tasks called dynamic response decomposition (DRD). In this context, a task is defined as a list of constraints on the values of the state variables at given points in time. Initially, a state-space solution is computed by interpolating these constraints by means of a set of dynamic responses (i.e., evolutions of the state variables); then, inverse dynamics is used to obtain the corresponding actuations. Based on this technique, the following two-phase procedure allows the synthesis of a set of synergies. An extensive collection of generic actuations is used to generate the system's dynamic responses (exploration phase); in a second stage (reduction phase), these responses are used to interpolate a small set of tasks. The corresponding actuations proved to be effective synergies for additional reaching tasks on a simulated planar kinematic chain. Like GAR, this procedure generates synergies in the form of feedforward controllers, and it allows the library of synergies to be modified incrementally. However, DRD provides a computationally fast method to solve the task. This technique has proved its efficacy empirically, but a solid theoretical grounding is still lacking.

Most of the methods presented so far require an accurate analytical model of the system dynamics. Such a model is not always available, and for certain robots it might be difficult to identify. Todorov and Ghahramani (2003) propose a method to synthesize synergies by means of unsupervised learning. Their work emphasizes the role of muscle synergies in a hypothetical hierarchical control scheme similar to the one proposed by Safavynia and Ting (2012): receptive fields translate sensory signals into internal variables, and muscle synergies translate high-level control signals applied to these variables into actual muscle contractions. From this perspective, receptive fields along with motor primitives must form an inverse model of the sensory-motor system. This mapping is learned by fitting a probabilistic model to a dataset of sensory-motor signals generated by actuating the robot with random pulses. The use of the learned synergies as low-level controllers substantially reduces the time needed to learn a desired policy; however, their capability to generalize to additional control laws is not explicitly tested.

<sup>2</sup>In control theory, a system is said to be controllable if an external input can move the system from any initial state to any final state in a finite time interval.

Alessandro and Nori (2012) define synergies as parameterized functions of time that serve as feedforward controllers. The identification procedure consists of finding the values of the parameters such that appropriate linear combinations of the resulting synergies drive the dynamical system along a set of desired trajectories (training set). The identified synergies are then tested for generalization; the idea is to evaluate to what extent they can generate actuations that drive the system along a new group of trajectories (testing set). This procedure has been evaluated successfully in simulation and does not require the analytical form of the system dynamics. However, it is computationally very intensive as it involves heavy optimizations. In essence, this work proposes a new formal definition of the concept of muscle synergies: elementary controls that are evaluated in terms of task performance (i.e., tracking error), rather than in terms of approximation of the input-space.

Thomas and Barto (2012) formulate the problem of primitive (i.e., synergy) discovery within the framework of reinforcement learning. In this case, the problem that the agent has to solve is a Markov decision process (MDP), and each primitive is a parameterized feedback control policy. The idea is to identify the optimal parameters that maximize the expected reward for a given task when the control is restricted to linear combinations of the learned primitives. This method is tested on a simulated planar kinematic chain actuated with artificial muscles. Primitives are identified on reaching tasks, and they are successfully tested in a scenario that involves reaching while avoiding obstacles. This work clearly shows the advantage of a synergy-based framework in terms of the learning speed of novel control policies. This method is in essence similar to the one proposed by Alessandro et al. (2012); however, it identifies complete feedback control policies rather than single feedforward synergies.
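
The core loop (restrict control to a combination of feedback primitives, then search the mixing weights against a task cost) can be caricatured on a 1D point mass; the two hand-built primitives and the crude random-search optimizer below are illustrative stand-ins for the policy-gradient machinery of the cited work.

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy primitive-combination policy search: two feedback primitives
# (spring-like and damper-like) on a 1D point mass, with the mixing
# weights optimized by random search to bring the mass to q = 1.
def rollout(theta, steps=300, dt=0.01):
    q, v, cost = 0.0, 0.0, 0.0
    for _ in range(steps):
        u = theta[0] * (1.0 - q) + theta[1] * (-v)   # primitive combination
        v += dt * u
        q += dt * v
        cost += dt * ((q - 1.0) ** 2 + 1e-3 * u ** 2)
    return cost

best_theta, best_cost = np.zeros(2), rollout(np.zeros(2))
for _ in range(200):                       # crude random search
    cand = best_theta + rng.standard_normal(2)
    c = rollout(cand)
    if c < best_cost:
        best_theta, best_cost = cand, c

print(best_cost < rollout(np.zeros(2)))
```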

The time-varying synergy model greatly reduces the dimensionality of the problem by encoding actuations with synergy coefficients; at the same time, however, it introduces a complication. As the new input variables are piecewise constant, it is difficult (although possible) to implement feedback loops. The synchronous model ameliorates this problem and, to some extent, allows adapting traditional control strategies to the new reduced-dimensional control input.

Some researchers employ the synchronous synergy model to control the tendon-driven robotic ACT hand (Deshpande et al., 2013) in a reduced-dimensional space (Rombokas et al., 2011; Zhang et al., 2011; Malhotra et al., 2012). Similarly to Todorov and Ghahramani (2003), dimensionality reduction is applied both in the sensory space and in the actuation space. The "observation synergies" transform sensory readings (tendon lengths) into a lower-dimensional variable; the "control synergies" translate synergy-coefficients (as defined in section 2) into motor signals. Model adaptive control and PIDs are applied to the reduced-dimensional input, and allow the robotic hand to perform tasks like writing (Rombokas et al., 2011; Malhotra et al., 2012) and playing the piano (Zhang et al., 2011). The synergy matrices (observation and control) are computed by applying PCA and NMF to a dataset of tendon lengths obtained as a result of defined hand motions. It is noteworthy that the more similar these motions are to the ones required to solve the task, the better the quality of the obtained synergy-based controller. This is clearly not surprising, but it highlights the importance of task-related variables in the formation of muscle synergies (Todorov et al., 2005).
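
The "observation synergies" step can be illustrated with PCA on a synthetic stand-in for the tendon-length dataset (the dimensions, latent structure, and noise level below are assumptions, not taken from the ACT-hand experiments):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical dataset: 200 hand postures x 8 tendon lengths, generated
# from 2 latent coordination patterns plus small sensor noise.
latent = rng.standard_normal((200, 2))
mixing = rng.standard_normal((2, 8))
tendon_lengths = latent @ mixing + 0.01 * rng.standard_normal((200, 8))

# "Observation synergies" via PCA: principal directions of the data
X = tendon_lengths - tendon_lengths.mean(axis=0)
_, s, Vt = np.linalg.svd(X, full_matrices=False)
obs_synergies = Vt[:2]                     # (2, 8) reduced sensory basis

coeffs = X @ obs_synergies.T               # low-dimensional sensory variable
reconstruction = coeffs @ obs_synergies    # back to tendon space
explained = 1 - np.sum((X - reconstruction) ** 2) / np.sum(X ** 2)
print(round(explained, 3))
```

A feedback controller would then operate on `coeffs` instead of on the eight raw tendon readings, which is the dimensionality reduction the cited works exploit.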

Marques et al. (2012) identify synchronous synergies by means of an unsupervised Hebbian-like algorithm that captures the correlations between motor signals and sensory readings. Each synergy thus summarizes the levels of correlation between each motor and one of the sensors. The time modulation of each synergy to solve a given task is then obtained by means of a supervised learning procedure that aims at reducing the task error. Unlike many other works in robotics, the exploratory strategy proposed to generate the dataset of sensory-motor data does not exploit any prior information about the desired motor tasks; therefore, muscle synergies are implicitly interpreted as patterns of motor coordination that solely reflect the biomechanical constraints of the robot. This method has been tested on a single-joint tendon-driven robot.

In the context of robotic hands, many researchers have adopted the idea of postural synergies, or eigengrasps. This concept derives from the observation that the variability of finger postures during human grasps can be explained by a few principal components (Santello et al., 1998), i.e., eigengrasps. Similarly, constraining the finger-joint positions of a robotic hand such that the useful grasping postures can be obtained by superposing a small number of components would result in a substantial simplification of the grasping problem. Ciocarlie and Allen (2009) derived a theoretical formulation of the problem of stable grasping in the low-dimensional space of the postural synergies; this formulation was further improved by Gabiccini et al. (2011) for compliant grasps. These studies are further analyzed and discussed by Bicchi et al. (2011), who presented them from the point of view of modeling the process of grasping and active touch. Finally, Brown and Asada (2007) proposed a direct mechanical implementation of the eigengrasps. In all these works, the quantitative details of the postural synergies are taken from human experiments and adapted to the robot's mechanical structure; the problem of finding a set of synergies optimized for a given robotic hand is left as future research.

Reduced dimensionality based on postural synergies is also explored by Hauser et al. (2011) for the task of balancing a humanoid robot. The authors propose a mathematical formulation, as well as a method to construct kinematic synergies (i.e., predefined relations between joint positions) that are directly linked to task variables (e.g., for balance control, the center of pressure). Additionally, the synergies are constructed in such a way that the mapping from synergy coefficients to task variables is linear (similar to the work proposed by Nori and Frezza (2005), but in kinematic space). This makes it possible to use a simple proportional-integral-derivative (PID) controller on the synergy coefficients to control the center of pressure of the robot, as long as the movements are slow enough to neglect dynamic disturbances. The proposed method is demonstrated both in simulation and on a real humanoid.
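
The control principle (a PID acting on synergy coefficients through a linear synergy-to-task map, under a quasi-static assumption) can be sketched as follows; the map `A`, the gains, and the scalar center-of-pressure variable are hypothetical toy values, not taken from Hauser et al. (2011):

```python
import numpy as np

# Hypothetical linear map from two kinematic-synergy coefficients to the
# task variable (center of pressure, treated here as a scalar).
A = np.array([0.6, -0.3])
target_cop = 0.2

coeffs = np.zeros(2)
integral, prev_err = 0.0, 0.0
kp, ki, kd, dt = 2.0, 0.5, 0.1, 0.01

for _ in range(2000):
    cop = A @ coeffs             # quasi-static: slow movements, no dynamics
    err = target_cop - cop
    integral += err * dt
    deriv = (err - prev_err) / dt
    prev_err = err
    # The PID output moves the synergy coefficients along the direction
    # that changes the task variable (the gradient of cop w.r.t. coeffs).
    coeffs += dt * (kp * err + ki * integral + kd * deriv) * A

print(round(float(A @ coeffs), 3))
```

Because the synergy-to-task map is linear, the closed loop reduces to a scalar PID problem regardless of how many joints each synergy coordinates, which is precisely the simplification the construction is designed to provide.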

As a final note, it is important to mention that the concept of modularity has been employed in robot control in many other ways. In most of these works, modules are defined as kinematics-based controllers that are combined sequentially to obtain complex joint trajectories (Khansari-Zadeh and Billard, 2011; Ijspeert et al., 2013). In this regard, these works are more closely related to the concept of a kinematic stroke than to muscle synergies (Pollick et al., 2009). They are outside the scope of this paper, as we focus on controllers that, in accordance with the models of muscle synergies, are based on (parallel) superpositions of primitives in input-space.

# **5. CONCLUSIONS AND PERSPECTIVES**

The hypothesis of muscle synergies, which proposes a modular organization of the neural circuitry involved in muscle coordination, has proved very difficult to validate or falsify (Tresch and Jarc, 2009). As discussed in section 3, a substantial body of evidence in favor of this hypothesis comes from the observation that the main components of EMG recordings are robust across behaviors, biomechanical contexts, and individuals. In addition, the successful control of artificial agents confirms the computational feasibility of the hypothesized synergy-based controller (section 4). However, there also exist experiments that, for the case of the human hand, seem to disprove the hypothesis of muscle synergies (Kutch et al., 2008; Valero-Cuevas et al., 2009). As a matter of fact, there is no real consensus yet on whether muscle synergies effectively represent a modular organization of the CNS, or whether they merely result from the methodology employed during the experiments.

The works that are based on the control of artificial agents (e.g., musculoskeletal models, robots, and animated characters) clarify the importance of evaluating synergies in task-space. In this context, the idea is to synthesize a set of synergies that guarantees the accomplishment of the desired tasks (**Figure 3**, red arrows). In contrast, the main focus of experimental motor control has been to identify the synergies that best reconstruct the recorded EMG dataset (**Figure 3**, continuous green arrows), and to understand their neural substrate. This approach implicitly assumes that a well-reconstructed input signal leads to the observed task performance. Given the non-linear dynamics of the musculoskeletal system, this assumption might not hold. For this reason, in our view the hypothesis of muscle synergies should be tested by validating an input–output model (i.e., from muscle activations to task-variables), rather than by fitting a model of the input data alone (**Figure 3**, dashed green arrow). In fact, we could speculate that muscle synergies encode a form of body schema (Hoffmann et al., 2010) that allows translating intentions into motor plans (i.e., the inverse dynamic model of the musculoskeletal system) (Torres-Oviedo and Ting, 2010).

The concept of functional synergies represents a first attempt to relate muscle synergies to task variables. However, as discussed in section 4.2, EMG and task-level components are assumed to be activated by the same coefficients. This assumption cannot hold in general because the musculoskeletal system is non-linear; rather, input-space and task-space coefficients should be related by a non-linear mapping (as described by Alessandro et al., 2012). To address this issue, one should go beyond the use of NMF, and develop novel techniques that do not impose a linear mapping between the two sets of coefficients. Additionally, one could try to reconstruct the task-variables with more general non-linear methods instead of imposing a linear combination at the task level as well. In the same spirit as the procedure used so far, such a technique should optimize the reconstruction error of the EMG signals while constraining the fit of the task-variables to be good. In any case, the generality of the extracted functional synergies should be tested. To the best of our knowledge, the model of functional synergies has never been used as a predictive framework. It would be extremely interesting to evaluate the extent to which functional synergies identified during the execution of a certain set of tasks are able to predict the muscle activations observed during the execution of another task that involves the same task variables. If such a prediction were unsuccessful, the experimenter could conclude that the identified muscle synergies do not really encode the hypothesized biomechanical functionalities, or that the same functionalities might be encoded by different synergies. In general, the model of muscle synergies has very seldom been used to make predictions.

An alternative strategy to verify the relationship between muscle synergies and task execution (**Figure 3**, dashed green arrow) is to evaluate whether they can account for task-related variations of single movement executions (Delis et al., 2013). In practice, one might assess the capability of these synergies to decode each repetition of different motor tasks. In other words, one should be able to classify the motor tasks from the activation coefficients of the extracted synergies. If the decoding capability is satisfactory, one might conclude that the synergies constitute not only a low-dimensional, but also a functional, representation of the motor commands. This idea might be used to develop novel extraction algorithms that include task-decoding objectives directly in the optimization procedure. The identified synergies would then maximize not only the reconstruction of the original motor patterns, but also the capability of disambiguating task-relevant trial-to-trial variations. Unlike the dimensionality reduction methods used so far, this approach would rely on supervised learning techniques to exploit information about the task. Possible alternatives to standard extraction algorithms include energy-constrained discriminant analysis (Philips et al., 2009), discriminant NMF (Buciu and Pitas, 2004), and hybrid discriminant analysis (Yu et al., 2007).
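
The decoding idea can be sketched on synthetic single-trial data. Everything here is an illustrative assumption (the synergy matrix, the two tasks, the noise levels, and the nearest-centroid rule standing in for a proper cross-validated classifier):

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(2)

# Two hypothetical muscle synergies over 6 muscles (illustrative values).
W = np.array([[1.0, 0.8, 0.2, 0.0, 0.1, 0.0],
              [0.0, 0.1, 0.3, 0.9, 0.7, 1.0]]).T   # (6 muscles, 2 synergies)

# Single-trial muscle patterns for two tasks that differ only in how
# strongly each synergy is recruited.
def make_trials(c, n=30):
    C = c + 0.1 * np.abs(rng.standard_normal((n, 2)))
    return np.clip(C @ W.T + 0.05 * rng.standard_normal((n, 6)), 0, None)

trials = np.vstack([make_trials(np.array([1.0, 0.2])),   # task A
                    make_trials(np.array([0.2, 1.0]))])  # task B
labels = np.array([0] * 30 + [1] * 30)

# Recover the non-negative activation coefficients of each trial, then
# decode the task with a nearest-centroid rule in coefficient space.
coeffs = np.array([nnls(W, m)[0] for m in trials])
centroids = np.array([coeffs[labels == k].mean(axis=0) for k in (0, 1)])
pred = np.argmin(np.linalg.norm(coeffs[:, None] - centroids, axis=2), axis=1)
accuracy = (pred == labels).mean()
print(accuracy)
```

High decoding accuracy in coefficient space is the signature, under this toy model, that the synergy representation preserves the task-relevant trial-to-trial variations.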

The use of single-trial analysis, like the decoding strategy proposed above, may be useful for addressing some open problems that are relevant to this review. First, the development of such techniques may be useful to identify muscle activation components of relatively low amplitude that reflect unique information about the task (Quiroga and Panzeri, 2009); such components would be completely lost if an average across several trials is performed prior to the analysis. Second, such single-trial analysis techniques may be used to investigate the existence of trial-to-trial correlations across synergy activations, and to evaluate their functional role in controlling and performing task-related movement (Golledge et al., 2003; Schneidman et al., 2003). Finally, approaches based on single-trial analysis of neural activity could also be instrumental in clarifying the existence of a neural basis for the muscle synergies (Hart and Giszter, 2004, 2010; Nazarpour et al., 2012; Ranganathan and Krishnan, 2012). For example, they could in principle be applied to decode the task from single-trial neural population patterns that regulate the activation of synergies, and also to determine which patterns encode task differences, and which carry additional or independent information to that carried by other patterns (Delis et al., 2010).

Finally, an important aspect worth discussing is the role of feedback loops. In the case of synchronous synergies, the time course of the mixing coefficients can be adjusted on-line by means of appropriate feedback controllers; this is the reason for the popularity of this model in the context of robotics. On the contrary, the models of temporal and time-varying synergies, in which the actuation time courses are directly embedded in the synergies themselves, naturally represent feedforward controllers. As a result, the evolution of the task-variables intimately depends on the initial condition of the dynamical system. Alternatively, these synergies might be defined as functions of both time and state variables; such an approach would characterize temporal and time-varying synergies as generators of complete control policies (Nori and Frezza, 2005; Todorov, 2009; Thomas and Barto, 2012).

In conclusion, we believe that the evidence reviewed here provides support for the existence of muscle synergies. However, many issues are still unresolved. A deeper investigation of the relationship between synergies and task variables might help to address some of the open questions. In general, a closer coordination between experimental and computational research might lead to a more objective assessment of the muscle synergy hypothesis in task-space, and a better understanding of the modularity of the CNS.

# **ACKNOWLEDGMENTS**

This research was supported by the EU project RobotDoC under 235065 from the 7th Framework Programme (Marie Curie Action ITN) and the EU project AMARSi (ICT-248311).



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 21 December 2012; paper pending published: 21 February 2013; accepted: 03 April 2013; published online: 19 April 2013.*

*Citation: Alessandro C, Delis I, Nori F, Panzeri S and Berret B (2013) Muscle synergies in neuroscience and robotics: from input-space to task-space perspectives. Front. Comput. Neurosci. 7:43. doi: 10.3389/fncom.2013.00043*

*Copyright © 2013 Alessandro, Delis, Nori, Panzeri and Berret. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

# The neural origin of muscle synergies

# *Emilio Bizzi\* and Vincent C. K. Cheung\**

Department of Brain and Cognitive Sciences, McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA

### *Edited by:*

Martin Giese, University Clinic Tuebingen / Hertie Institute, Germany

### *Reviewed by:*

Florentin Wörgötter, University Goettingen, Germany

Martin Giese, University Clinic Tübingen, Germany

### *\*Correspondence:*

Emilio Bizzi and Vincent C. K. Cheung, Department of Brain and Cognitive Sciences, McGovern Institute for Brain Research, Massachusetts Institute of Technology, 46-6189, 77 Massachusetts Avenue, Cambridge, MA 02139, USA. e-mail: ebizzi@mit.edu; ckcheung@mit.edu

Muscle synergies are neural coordinative structures that function to alleviate the computational burden associated with the control of movement and posture. In this commentary, we address two critical questions: the explicit encoding of muscle synergies in the nervous system, and how muscle synergies simplify movement production. We argue that shared and task-specific muscle synergies are neurophysiological entities whose combination, orchestrated by the motor cortical areas and the afferent systems, facilitates motor control and motor learning.

**Keywords: motor primitive, spinal interneuron, motor module, non-negative matrix factorization, motor cortex**

When the central nervous system (CNS) generates voluntary movement, many muscles, each comprising thousands of motor units, are simultaneously activated and coordinated. Computationally, this is a daunting task, and investigators since Bernstein (1967) have strived to understand whether and how the CNS's burden is reduced to a much smaller set of variables. In the last few years, we and our collaborators have searched for physiological evidence of simplifying strategies by exploring whether the motor system makes use of low-level discrete elements, or motor modules, to construct a large set of movements. In this brief communication, we argue that there is convincing evidence that the discrete elements for such simplification are muscle synergies, neurophysiological entities whose combination is orchestrated by the motor cortical areas and the afferent systems.

# **EXPLICIT ENCODING OF MUSCLE SYNERGIES IN THE NERVOUS SYSTEM**

The core argument for the neural origin of motor modules rests on studies of the spinal cord in several vertebrate species, conducted using a variety of techniques such as microstimulation (Bizzi et al., 1991; Giszter et al., 1993), *N*-methyl-D-aspartate (NMDA) iontophoresis (Saltiel et al., 2001, 2005), and cutaneous stimulation (Tresch et al., 1999). With these approaches, we and others were able to provide the experimental basis for a modular organization of the spinal cord circuitry in the frog (with the studies cited above), rat (Tresch and Bizzi, 1999), and cat (Lemay and Grill, 2004). A *spinal module* is a functional unit of spinal interneurons that generates a specific motor output by imposing a specific pattern of muscle activation. The output of an activated module can be characterized as a force field, or the collection of isometric muscle forces generated at the limb's endpoint over different locations of the workspace. In the spinal frog and rat, different groups of

force fields were activated as the stimulating electrode was moved to different loci of the lumbar spinal cord in the rostro-caudal and medio-lateral directions. Following the initial description of the force field, Mussa-Ivaldi and others found that co-stimulation of two spinal sites led to vector summation of the forces generated at each site separately. When the patterns of forces resulting from co-stimulation were compared with those computed by linear summation of the two individual fields, the co-stimulation fields and the summation fields were found to be equivalent in 83% of the cases (Mussa-Ivaldi et al., 1994). Similar results were also obtained by Hogan and colleagues (Lemay et al., 2001), Tresch and Bizzi (1999), and Kargo and Giszter (2000). Vector summation of force fields led to the hypothesis that generation of movement and posture may be based on the combination of a few discrete motor primitives. A subsequent simulation study showed that the combinations of frog hind-limb muscles that produced stable force fields were similar to the muscle groups observed to be co-activated in spinal microstimulation studies (Loeb et al., 2000). These results together argue that the experimentally derived force fields are generated by discrete groups of muscles activated as individual units, or *muscle synergies*, from whose linear combination a vast repertoire of movement and posture could be generated.
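
The vector-summation comparison can be illustrated numerically. The force fields below are synthetic stand-ins, and cosine similarity is used here as a simple proxy for the equivalence criterion applied in the cited experiments:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical force fields sampled at 25 workspace locations (2-D
# forces at the limb endpoint), one per stimulated spinal site.
field_a = rng.standard_normal((25, 2))
field_b = rng.standard_normal((25, 2))

# Co-stimulation field: linear summation plus small measurement noise.
co_stim = field_a + field_b + 0.05 * rng.standard_normal((25, 2))

def similarity(f, g):
    """Cosine similarity between two flattened force fields."""
    f, g = f.ravel(), g.ravel()
    return float(f @ g / (np.linalg.norm(f) * np.linalg.norm(g)))

sim = similarity(co_stim, field_a + field_b)
print(round(sim, 3))
```

A similarity close to 1 between the measured co-stimulation field and the computed sum is what supports the superposition interpretation.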

Most voluntary movements are the result of the simultaneous activation of a few muscle synergies via descending or afferent pathways, which produces a complex electromyographic (EMG) pattern in the limb's muscles. To retrieve the structures of muscle synergies from the variability of muscle activations, we and others have utilized a computational procedure, the non-negative matrix factorization (NMF) algorithm, originally proposed by Lee and Seung (1999, 2001). The synergies identified by NMF are time-invariant non-negative vectors whose linear combination is found, through an iterative update rule, to minimize the error of EMG reconstruction, with the additional assumption that this error follows a Gaussian distribution (Cheung and Tresch, 2005). The extracted synergies thus reflect spatially fixed regularities (Kargo and Giszter, 2008; Safavynia and Ting, 2012) embedded within diverse muscle patterns. In addition to NMF, there exist other linear factorization algorithms, such as independent component analysis (Bell and Sejnowski, 1995) and independent factor analysis (Attias, 1999). For all of these algorithms, because each synergy can have components across any subset of muscles, any one muscle may belong to multiple synergies; this aspect makes the extracted synergies different from other formulations in which each muscle belongs to a single synergy. As Tresch et al. (2006) have shown, most of the algorithms, with the exception of principal component analysis (Jolliffe, 2002), perform comparably on simulated and experimental data sets. This observation suggests that the extracted muscle synergies are likely not an artifact contingent upon the particular assumptions employed by the algorithm for separating the activations of the synergies, but reflect basic aspects of muscle activations.
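
The iterative update rule for the Gaussian (squared-error) cost is the classic Lee–Seung multiplicative scheme, which can be written in a few lines; the synthetic "EMG" matrix below (8 muscles, 300 samples, exactly rank 3) is an assumption made so that the factorization can be checked against a known ground truth:

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic EMG: 8 muscles x 300 samples generated from 3 synergies.
W_true = np.abs(rng.standard_normal((8, 3)))
C_true = np.abs(rng.standard_normal((3, 300)))
emg = W_true @ C_true

def nmf(V, k, iters=500, eps=1e-9):
    """Lee-Seung multiplicative updates for the squared-error cost."""
    W = np.abs(rng.standard_normal((V.shape[0], k)))   # synergies
    H = np.abs(rng.standard_normal((k, V.shape[1])))   # activations
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

W, H = nmf(emg, 3)
rel_err = np.linalg.norm(emg - W @ H) / np.linalg.norm(emg)
print(round(rel_err, 4))
```

The multiplicative form keeps both factors non-negative at every step, which is what distinguishes NMF from unconstrained factorizations such as PCA.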

Recent electrophysiological experiments in lower vertebrates, the cat, and the monkey have provided evidence that the temporal activations of muscle synergies identified by computational algorithms are expressions of neural activities. In the frog, discharging neurons in the intermediate zone of the spinal cord were found to be significantly related to activations of muscle synergies rather than to activities of individual muscles (Hart and Giszter, 2010). In the cat, activities of distinct groups of neurons in the forelimb motor cortex recorded during reaching coincided with activations of muscle synergies identified by a cluster analysis (Yakovenko et al., 2011). In the monkey, intra-cortical microstimulation of an area of the motor cortex with descending connections to the spinal interneurons evoked EMG patterns decomposable into muscle synergies that, remarkably, matched the ones observed during natural reach-and-grasp behaviors (Overduin et al., 2012). This finding directly supports the idea that the expression of voluntary movement relies on a complex pathway connecting the motor cortex and the spinal interneurons; through this circuitry, the cortex selects and combines the appropriate spinal interneuronal modules, and supplies the modules with temporal patterns of activation appropriate for the behavior being executed.

There is emerging evidence suggesting that the above conclusions may also underlie the production of voluntary movement in humans. In a group of mildly to moderately impaired stroke survivors with lesions in the motor cortical areas, we observed that the muscle synergies extracted from the stroke-affected arm were similar to those of the unaffected arm despite marked differences in motor performance between the arms (Cheung et al., 2009b). This observation is compatible with the proposal that muscle synergies are structured in the brain stem or spinal cord, and that after a stroke, altered descending commands from the supraspinal areas generate abnormal motor behavior through faulty activation of the spinal modules. Similar results for the lower limb have also been presented (Clark et al., 2010). It is of course entirely possible that in humans who have developed highly skilled movements, like pianists (Gentner et al., 2010) or professional athletes (Frère and Hug, 2012), the motor cortex may also encode muscle synergies (Gentner and Classen, 2006; Rathelot and Strick, 2006). How the CNS manages to involve both the cortex and the spinal interneurons from the earliest stage of movement preparation (Fetz et al., 2002), and to integrate sensory information that contains crucial postural information related to the initial limb position during motor planning, are questions that deserve to be systematically explored.

# **DO MUSCLE SYNERGIES HAVE A NON-NEURAL ORIGIN?**

Kutch and Valero-Cuevas (2012) have recently proposed that the muscle synergies extracted from EMGs using factorization algorithms could have a non-neural origin. Through cadaveric experiments and computational models, these authors showed that constraints arising from the selected task and/or limb biomechanics could produce apparent couplings among muscles even when each muscle in the model is assumed to be independently controlled. This point of view, which emphasizes non-neural constraints, represents an important contribution to the ongoing debate on the provenance of the previously observed low dimensionality of muscle activations (Tresch and Jarc, 2009), and offers important complementary insights. There are, however, a number of developmental and clinical studies that place the view of Kutch and Valero-Cuevas in a different light.

For instance, in a recent developmental study on human locomotor primitives, Lacquaniti and his colleagues (Dominici et al., 2011; Lacquaniti et al., 2012) demonstrated that the development of motor patterns from the neonatal to the toddler stages is primarily a result of the addition of new patterns to the few basic patterns present at birth. The precise temporal activations of all primitives are shaped gradually, over many years, as the individual grows from being a toddler to a preschooler, and finally to an adult. This progressive addition and fine-tuning of motor primitives could reflect how an infant, on his or her way to bipedal locomotion, "learns" new muscle synergies, presumably through mechanisms of associative learning and/or supervised learning, or one analogous to the mechanism responsible for the formation of ocular dominance columns in the developing visual cortex (Hubel and Wiesel, 1959). While the precise role of genetic control in motor development remains to be established, it is conceivable that sensory feedback from muscles and tendons triggers adaptive changes in the spinal interneuronal circuitry to tune or create modules specifically tailored to the limb biomechanics of the individual, and informs other areas of the CNS of these modifications. At the termination of these developmental processes, the biomechanical properties of the limb are fully incorporated into the architecture of the motor modules, thus resulting in a match between the plant and its neural controllers that allows high-caliber motor performance. Since the limb biomechanics of different individuals are at least slightly different, it is not surprising that the precise structures of some muscle synergies are subject-specific (Torres-Oviedo and Ting, 2010).
Thus, our argument that muscle synergies could have a neural origin is not incompatible with the idea that the precise structuring of each muscle synergy incorporates knowledge of both the musculo-skeletal dynamics (Berniker et al., 2009) and other biomechanical properties of the limb.

Ideally, a strong case supporting the neural origin of the muscle synergies extracted from the EMGs should come from a comparison between the number of experimentally derived synergies and the dimensionality of the space of all muscle patterns suitable for the selected tasks. In a model of the human leg comprising 14 muscles, Kutch and Valero-Cuevas (2012) (their Figure 6C) showed that the set of all EMG patterns compatible with isometric production of endpoint forces at 16 different directions, over three force magnitudes, defined an approximately seven-dimensional subspace (with 80% of the data variance explained). On the other hand, the number of leg muscle synergies for human locomotion, including running and walking at different speeds, was estimated to be between four and five (Ivanenko et al., 2004; Cappellini et al., 2006; Dominici et al., 2011). We do not know whether, for the human leg, isometric production of endpoint forces and locomotion define spaces of motor patterns with similar dimensionalities. But the fact that the dimensionality of the observed EMGs (4–5) was lower than that expected from the constraint of a related task (7) is not inconsistent with the notion that structures such as muscle synergies of neural origin exist to constrain the possibilities of motor output (Andrea d'Avella, personal communication).
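
The "dimensionality at 80% variance explained" criterion used in such comparisons can be sketched as a cumulative-variance count over principal components; the 14-muscle dataset below is synthetic, with an assumed 5-dimensional latent structure, and is not derived from the Kutch and Valero-Cuevas model:

```python
import numpy as np

rng = np.random.default_rng(5)

# Synthetic "EMG pattern" dataset: 500 patterns x 14 muscles with a
# dominant low-dimensional structure (illustrative assumption).
latent = rng.standard_normal((500, 5))
data = latent @ rng.standard_normal((5, 14)) + 0.3 * rng.standard_normal((500, 14))

X = data - data.mean(axis=0)
_, s, _ = np.linalg.svd(X, full_matrices=False)
var_ratio = np.cumsum(s ** 2) / np.sum(s ** 2)

# Number of principal components needed to explain 80% of the variance.
dims_for_80 = int(np.searchsorted(var_ratio, 0.80) + 1)
print(dims_for_80)
```

Comparing such a count computed on task-constrained patterns with the number of extracted synergies is the kind of dimensionality comparison the paragraph above calls for.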

There is the additional possibility that muscle synergies extracted from EMG data sets reflect regularities in the reflex and other feedback-related activities of multiple muscles arising from fixed patterns of musculo-tendon length changes that are dictated by how the muscles are arranged around the joints (Kutch and Valero-Cuevas, 2012; their Figure 1B). This is certainly an important biomechanical constraint that can give rise to apparent muscle couplings. However, in the frog, the structures of both the spinal force fields (Loeb et al., 1993) and locomotor muscle synergies (Cheung et al., 2005) persisted even after complete hind-limb deafferentation.

Another related set of findings has come from studies of muscle synergies in human stroke patients. Briefly, using the NMF algorithm and a non-negative least-squares technique, we characterized post-stroke alterations of muscle synergies in the stroke-affected arm as reflecting either a merging or a fractionation of the unaffected-arm muscle synergies (Cheung et al., 2012). Remarkably, while the extent of synergy merging correlated with the severity of motor impairment (which reflects the extent of motor cortical damage), the degree of synergy fractionation varied with the temporal distance from stroke onset (which reflects how long the motor system had been influenced by post-stroke plasticity). Given that these two patterns of synergy change correlated with variables related to the state of the nervous system, and that the biomechanical structures of the stroke-affected and unaffected arms are expected to be similar, it is likely that neural constraint is a major contributor to the structures of the observed muscle synergies in the affected arm. Alterations and merging of both upper- and lower-limb muscle synergies in stroke survivors have similarly been reported in several other recent studies (Clark et al., 2010; Gizzi et al., 2011; Roh et al., 2013).

# **DO MUSCLE SYNERGIES SIMPLIFY MOVEMENT PRODUCTION BY DECREASING THE NUMBER OF DEGREES OF FREEDOM?**

Muscle synergies may be conceived as representing elementary building blocks whose superposition allows the expression of a vast number of movements and postures. Similar concepts have been advanced by a number of laboratories with a variety of species ranging from *Aplysia* (Jing et al., 2004), to the frog (d'Avella et al., 2003; Hart and Giszter, 2004), rat (Tresch and Bizzi, 1999), cat (Ting and Macpherson, 2005; Ethier et al., 2006; Krouchev et al., 2006), monkey (Overduin et al., 2008), and humans (Krishnamoorthy et al., 2003; Torres-Oviedo and Ting, 2007; d'Avella et al., 2008; Monaco et al., 2010; Muceli et al., 2010). Taken together, these results indicate that *for each single task*, a reduction of the number of degrees of freedom relative to the number of muscles is a way to simplify the control of movement. In the frog, we and others have studied the activation patterns of all major hind-limb muscles collected during diverse natural motor behaviors, including jumping, in- and out-of-phase swimming, walking, kicking, and wiping. As shown by d'Avella and Bizzi (2005), each motor behavior results from a combination of both synergies shared between behaviors, and synergies specific to each or a few behaviors. While we do not know the maximum number of in-born and learned motor tasks each species may produce, it is conceivable that in any individual, the numbers of *all* task-specific and shared synergies combined may exceed the number of relevant muscles, in which case the EMGs recorded over all possible behaviors are not expected to exhibit a low dimensionality. This theoretical possibility raises the question of how muscle synergies of neural origin "simplify" movement control. We think muscle synergies simplify the production of posture and movement in the following senses.
First, for tasks that can be executed by many possible trajectories or muscle activation patterns, a set of pre-existing muscle synergies can serve as a preferred channel through which the motor commands are specified. Muscle synergies thus effectively remove any musculoskeletal redundancy at the levels of posture (Santello et al., 1998; Weiss and Flanders, 2004; Bicchi et al., 2011), kinematics (Flash and Hochner, 2005), and muscle activation, by constraining how the muscles can be activated (Bernstein, 1967; Full and Koditschek, 1999; McKay and Ting, 2008).

Second, *for a single given task*, the total number of shared and task-specific muscle synergies needed for its execution is still expected to be smaller than the total number of muscles. The set of synergies thus reduces the volume of the space of possible motor commands that the CNS needs to search through by defining a subspace of a lower dimensionality. This is equivalent to a previous suggestion that preformed neural coordinative structures, such as muscle synergies, function to automatically eliminate muscle patterns that lead to uncoordinated or inappropriate movements (Tuller et al., 1982; Turvey et al., 1982). Such a reduction in search-space volume allows efficient transformation between task-level variables and muscle activations (Ting et al., 2012). This advantage conferred by a synergy-based control scheme may be particularly important for a task for which only a very small set of motor patterns is compatible with fulfilling the task requirements. For such a task, given the very large volume of the high-dimensional muscle-activation space defined by the many muscles of the limb, without any neural coordinative structures in place it would be very difficult for the motor system to discover, every time, a very small subspace of suitable motor patterns starting from any initial point in the space. The muscle synergies required could be a mixture of shared and task-specific muscle synergies
either acquired through motor learning, or inherited over the course of evolution of the species (Giszter et al., 2007). Generating motor outputs by activating these synergies ensures an efficient and robust execution of a difficult task.

Third, it has been shown that the activation of each muscle synergy can accomplish a certain kinematic (d'Avella et al., 2003) or biomechanical (Ting and Macpherson, 2005) goal which may or may not be shared between behaviors. The set of all muscle synergies may then be viewed as a compendium of coordinative patterns for executing different functions that the motor system can exploit either when executing a learned task under a different dynamic environment, or when learning a new task. In broader terms, muscle synergies may be essential components in the architecture of the motor system that allow generalization to occur (Poggio and Bizzi, 2004). Consistent with this interpretation, muscle synergies were observed to be robust across very different biomechanical or behavioral contexts (Cheung et al., 2009a; Torres-Oviedo and Ting, 2010; Chvatal et al., 2011). Also, human subjects were able to adapt much faster to a perturbation if the compensatory motor patterns required could be generated simply by tuning the activations of the existing muscle synergies (d'Avella and Pai, 2010). It remains to be seen to what extent difficult and unusual movements are also executed by recruiting the synergies utilized in well-practiced behaviors.

# **CAN MOTOR OUTPUTS BE GENERATED ONLY BY COMBINING MUSCLE SYNERGIES?**

We have reviewed above experimental evidence that supports the neural origin of muscle synergies, and argued how the combination of both shared and task-specific synergies could facilitate motor control and motor learning. We do not claim that neurally based muscle synergies are the only structures that can give rise to the muscle couplings observed in experiments: feedback-related activities arising from limb biomechanics, for example, could lead to an observed coupling (Kutch and Valero-Cuevas, 2012). Nor do we claim that motor outputs can be generated only by combining a handful of spatially fixed muscle synergies. The monosynaptic stretch reflex, for instance, clearly contributes to the activity of each individual muscle. Also, at least in humans, with sufficient training even individual motor units of a single muscle can be voluntarily controlled (Basmajian, 1963). These and other additional mechanisms of motor-output generation further augment the flexibility of the motor system, and could conceivably play a role during the acquisition of motor skills (Kargo and Nitz, 2003).

One fruitful direction of future research is to determine precisely how the CNS integrates non-synergy-based mechanisms with the existing muscle synergies for the execution of a wide range of movements. In higher primates and humans, there are two subdivisions of the primary motor cortex: a rostral, phylogenetically older region that contains descending efferents destined for the spinal interneurons, and a caudal, phylogenetically newer region that contains cortico-motoneuronal (CM) cells with monosynaptic connections to the motoneurons of individual shoulder, elbow, and finger muscles (Rathelot and Strick, 2009). It is plausible that while the "old" motor cortex contributes to motor output by providing activation drives for the spinal modules, the "new" motor cortex further sculpts the activations of specific muscles by bypassing the spinal mechanisms through the CM cells. Controlling movement by combining muscle synergies and other proposals based on independently controlled muscles (Kutch et al., 2008; Valero-Cuevas et al., 2009) are not necessarily mutually exclusive.

# **ACKNOWLEDGMENTS**

We thank Robert Ajemian and Vittorio Caggiano for reading versions of this manuscript. Supported by NIH grants NS44393 and RC1-NS068103-01 to Emilio Bizzi.

# **REFERENCES**

Cheung, V. C. K., Piron, L., Agostini, M., Silvoni, S., Turolla, A., and Bizzi, E. (2009b). Stability of muscle synergies for voluntary actions after cortical stroke in humans. *Proc. Natl. Acad. Sci. U.S.A.* 106, 19563–19568.


Clark, D. J., Ting, L. H., Zajac, F. E., Neptune, R. R., and Kautz, S. A. (2010). Merging of healthy motor modules predicts reduced locomotor performance and muscle coordination complexity post-stroke. *J. Neurophysiol.* 103, 844–857.



Dominici, N., Ivanenko, Y. P., Cappellini, G., d'Avella, A., Mondì, V., Cicchese, M., et al. (2011). Locomotor primitives in newborn babies and their development. *Science* 334, 997–999.




Turvey, M. T., Fitch, H. L., and Tuller, B. (1982). "The problems of degrees of freedom and context-conditioned variability," in *Human Motor Behavior: An Introduction*, ed. J. A. S. Kelso (Hillsdale: Lawrence Erlbaum Associates), 239–252.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 20 February 2013; accepted: 11 April 2013; published online: 29 April 2013*

*Citation: Bizzi E and Cheung VCK (2013) The neural origin of muscle synergies. Front. Comput. Neurosci. 7:51. doi: 10.3389/fncom.2013.00051*

*Copyright © 2013 Bizzi and Cheung. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*


# Neuromotor recovery from stroke: computational models at central, functional, and muscle synergy level

# *Maura Casadio, Irene Tamagnone, Susanna Summa and Vittorio Sanguineti\**

*Department of Informatics, Bioengineering, Robotics and Systems Engineering, Neuroengineering and Neurorobotics Lab (NeuroLAB), University of Genoa, Genoa, Italy*

### *Edited by:*

*Andrea D'Avella, IRCCS Fondazione Santa Lucia, Italy*

### *Reviewed by:*

*Gianluigi Mongillo, Paris Descartes University, France David Reinkensmeyer, University of California at Irvine, USA*

### *\*Correspondence:*

*Vittorio Sanguineti, Department of Informatics, Bioengineering, Robotics and Systems Engineering, University of Genoa, Via all'Opera Pia 13, 16145 Genoa, Italy*

*e-mail: vittorio.sanguineti@unige.it*

Computational models of neuromotor recovery after a stroke might help to unveil the underlying physiological mechanisms and might suggest how to make recovery faster and more effective. At least in principle, these models could serve: (i) to provide testable hypotheses on the nature of recovery; (ii) to predict the recovery of individual patients; (iii) to design patient-specific "optimal" therapy, by setting the treatment variables to maximize the amount of recovery or to achieve better generalization of the learned abilities across different tasks. Here we review the state of the art of computational models for neuromotor recovery through exercise, and their implications for treatment. We show that to properly account for the computational mechanisms of neuromotor recovery, multiple levels of description need to be taken into account. The review specifically covers models of recovery at the central, functional, and muscle synergy levels.

**Keywords: functional recovery, cortical reorganization, motor skill learning, compensation, robot, slacking, muscle synergy**

# **INTRODUCTION**

In the nervous system, a cerebro-vascular accident (stroke) elicits a complex series of reorganization processes at the molecular, cellular, neural population, behavioral (sensorimotor and cognitive), and social interaction levels, with temporal scales that range from hours, to months, to years (Schaechter, 2004; Barbay et al., 2006; Nudo, 2006, 2007). Alterations occur well beyond the actual lesion, including a low-activity "penumbra" region in the surrounding areas and an inter-hemispheric imbalance due to decreased activity on the ipsilesional side (Hummel and Cohen, 2006).

Animal models and human studies suggest that functional recovery is mediated by use-dependent reorganization of the preserved neural circuitry. A key to neuromotor recovery, and the basis of neuro-rehabilitation interventions, is movement associated with a task (Nudo, 2006, 2007) and with volitional effort (Blennerhassett and Dite, 2004; Higgins et al., 2006; Timmermans et al., 2010). This process produces alterations in neuronal excitability (Ward and Cohen, 2004), leading to changes in neural circuitry, with a process resembling that occurring in the developing brain. Redundancy in the musculoskeletal system plays a key role in neuromotor recovery. It has long been suggested (Bernstein, 1967) that the nervous system has a modular control structure to deal with redundancy. According to this view, the nervous system adaptively controls combinations of motor primitives that are the "building blocks" of movement organization. The pressure toward re-gaining functional independence may lead to the development of compensatory strategies that, even when adequate for carrying out activities of daily living (ADLs), may be stereotypical or energetically inefficient, so that they may ultimately prevent true recovery (Levin, 1996b; Cirstea and Levin, 2000). For instance, excessive use of the non-paretic limb can have a negative influence on the process of cortical reorganization (Avanzino et al., 2011) by further reinforcing the imbalance between the two hemispheres. Models of neuromotor recovery that explicitly take modularity into account might be the most appropriate level of description for these phenomena.

In summary, neuromotor recovery through exercise is the end result of a complex interplay between activity-dependent reorganization of the brain areas close to the lesion, the recruitment of new neural pathways and the development of novel motor strategies.

A deeper understanding of the functional and physiological mechanisms underlying recovery would have a strong impact on approaches to neuromotor rehabilitation. Computational motor control and, more generally, computational models may greatly contribute to this understanding (Huang and Krakauer, 2009). Even more importantly, models may be directly incorporated into technological solutions, and can constitute the basis for personalized therapy. In fact, Marchal-Crespo and Reinkensmeyer (2009) pointed out that there is a specific need for "improved models of human motor recovery to provide a more rational framework for designing robotic therapy control strategies." However, while musculoskeletal models have a long history in the personalization of treatment of movement disorders (Fregly et al., 2012), computational models of neuromotor recovery through exercise are still in their infancy.

Here, we review the state of the art of computational models for neuromotor recovery and their implications for treatment. We then suggest directions for future research.

# **MODELS OF NEUROMOTOR RECOVERY**

There have been several attempts to model the time course of recovery, either when it is spontaneous, or when it is facilitated by some form of treatment, e.g., electrical stimulation or assistance by a robot. Here we specifically focus on models of activity-dependent recovery. Models of recovery may focus on different levels of description, ranging from cortical or subcortical lesions, to muscle control, to functional behavior in the context of a specific task.

Models of neuromotor recovery at the level of cortical circuitry (Goodall et al., 1997; Reinkensmeyer et al., 2003; Butz et al., 2009) address how focal cortical lesions elicit neural reorganization phenomena, and the way these lesions affect motor behavior.

Other models address the "functional" level of description, related to the ability to complete a specific task and how this ability changes over time. For instance, Colombo et al. (2008, 2010) describe the temporal evolution of performance over training time. Only a few models focus on how physical interaction affects voluntary control (Emken et al., 2007), and how such voluntary control changes with exercise (Casadio and Sanguineti, 2012).

Models of muscle control focus on characterizing the impairment in individual subjects in terms of altered muscle synergies (Cheung et al., 2012).

Only a few models encompass multiple levels of description. Han et al. (2008), Reinkensmeyer et al. (2012a) and Takiyama and Okada (2012) address the mechanisms of cortical reorganization in the context of voluntary motor activity, in a skill learning scenario. The emphasis here is on how voluntary movements promote recovery through cortical and subcortical reorganization.

In the following sections we will review a number of computational models of neuromotor recovery—respectively, at central, functional and muscle level—that have recently appeared in the literature. For each model we provide a general description; we then discuss their main findings or predictions, their limitations and the implications for rehabilitation.

# **MODELS OF FOCAL CORTICAL LESIONS AND ACTIVITY-DEPENDENT REORGANIZATION**

Several models of neuromotor recovery explicitly focus on the mechanisms of cortical reorganization following a focal lesion (models at central level).

In the work of Goodall et al. (1997), an existing computational model of the sensorimotor control of arm movements (Chen and Reggia, 1996), incorporating a model of both the somatosensory and the motor cortex, was used to investigate the reorganization processes that occur immediately after a focal cortical lesion; see **Figure 1**. The model assumes "Mexican-hat" lateral cortical connections, competitive activation dynamics, and a Hebbian plasticity mechanism for incoming cortical connections. Lesions were modeled by setting the activation levels of selected units to zero, and by eliminating connections to and from those units.
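A toy version of this class of model, with Mexican-hat lateral connectivity, Hebbian plasticity of afferent weights, and a lesion simulated by silencing units, might look as follows; all sizes and parameters are illustrative, not those of Goodall et al. (1997):

```python
# Toy 1-D cortical sheet: Mexican-hat lateral connections, Hebbian input
# plasticity, focal lesion = silenced units (parameters are illustrative).
import numpy as np

rng = np.random.default_rng(2)
n, n_inputs = 64, 32                       # cortical units (ring) and afferents
idx = np.arange(n)
dist = np.abs(idx[:, None] - idx[None, :])
dist = np.minimum(dist, n - dist)          # wrap-around (ring) topology
# "Mexican hat": short-range excitation, longer-range inhibition.
lateral = 1.5 * np.exp(-dist**2 / 8.0) - 0.5 * np.exp(-dist**2 / 50.0)

W_in = rng.random((n, n_inputs))           # plastic afferent weights
alive = np.ones(n, dtype=bool)
alive[20:28] = False                       # focal lesion: units silenced

def step(x, eta=0.01):
    a = np.maximum(W_in @ x, 0.0)                  # afferent drive
    a = np.maximum(a + lateral @ a, 0.0) * alive   # one pass of lateral competition
    W_in[alive] += eta * np.outer(a, x)[alive]     # Hebbian update, surviving units
    W_in[alive] /= np.linalg.norm(W_in[alive], axis=1, keepdims=True)  # normalize
    return a

for _ in range(200):
    act = step(np.maximum(rng.normal(size=n_inputs), 0.0))  # random afferent input
```

Because the lesioned units neither activate nor update their weights, reorganization is confined to the surviving cortex, as in the original model.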

The main prediction of this model is a two-phase reorganization process. Immediately after the lesion, lower activity is observed in the areas surrounding the lesion. A second phase is characterized by a gradual increase of the size of this area and by a general reorganization of the intact cortical regions. Both effects are mediated by activity-dependent synaptic changes. The low-activity peri-lesional area and its expansion over time are due to lack of activation. The model also predicts that a small uniform excitatory peri-lesional input may favor the participation of this area in the reorganization process.

The same group also studied the short- and long-term changes in lateralization that occur after a focal lesion, and investigated the possible contribution to recovery of the intact hemisphere and of inter-hemispheric communication (Reggia et al., 2000; Shkuro et al., 2000).

Here cortical reorganization is modeled in terms of synaptic changes (self-organization) of a topological map. In contrast, Butz et al. (2009) specifically addressed the mechanisms of activity-dependent synaptic rewiring that occur immediately after a focal cortical lesion. Synapse formation is accounted for by models of axonal and dendritic elements. On the pre-synaptic side, activity is assumed to promote axonal outgrowth. On the post-synaptic side, each neuron is assumed to change its input connectivity in a homeostatic manner, with the goal of keeping the firing probability within a specified range. In this framework, rewiring after a lesion can be seen as a form of compensation, driven by the need to regain a stable (homeostatic) regime.
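The homeostatic logic can be illustrated with a deliberately simplified sketch in which each neuron adds input synapses when its activity falls below a target range and prunes them when it rises above; the saturating activity function and the target range are assumptions for illustration, not the model's actual equations:

```python
# Homeostatic rewiring caricature: synapse counts are adjusted until each
# neuron's activity sits inside a target range (all constants illustrative).
import numpy as np

rng = np.random.default_rng(3)
n = 50
target_lo, target_hi = 0.4, 0.6
n_syn = rng.integers(5, 15, size=n)          # current input-synapse counts

def activity(n_syn):
    # Toy assumption: firing probability saturates with synapse count.
    return 1.0 - np.exp(-0.1 * n_syn)

for _ in range(100):
    a = activity(n_syn)
    n_syn = n_syn + (a < target_lo).astype(int) - (a > target_hi).astype(int)
    n_syn = np.clip(n_syn, 0, None)

a_final = activity(n_syn)
print(n_syn.min(), n_syn.max())              # counts settle in the homeostatic range
```

A population already at its set point makes no further changes, which is the intuition behind the model's prediction that mature, homeostatically stable networks compensate less readily.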

The main prediction of this model is that neural populations that are already in homeostatic conditions (e.g., in adult individuals) are much less likely to compensate for lesions than networks that are still under development.

An additional prediction is that external stimulation may promote axonal outgrowth, thus accelerating rewiring. However, prolonged stimulation may induce a saturation effect, and hence be detrimental to recovery, or at least less effective than paused stimulation. In terms of rehabilitation, these findings suggest that training with pauses in between may be more effective than continuous intensive training without pauses.

Varier et al. (2011) used the model proposed by Reggia and colleagues (Chen and Reggia, 1996) to examine the effects of focal and distributed lesions at various stages of development. In partial contrast with Butz et al. (2009), this model predicts that mature systems are relatively more robust to lesions than systems that are still under development. This apparent discrepancy may be a consequence of the different assumptions about structural plasticity: highly modifiable (Butz et al., 2009) vs. hardwired (Chen and Reggia, 1996).

Reinkensmeyer et al. (2003) developed a model of cortical damage and its consequences on arm reaching movements. Unlike the previous approaches, this model does not address intracortical connectivity and its reorganization. Based on experiments on non-human primates (Georgopoulos et al., 1982), neurons in the motor cortex are assumed to collectively encode the initial direction of the movement (population vector coding). Specifically, each neuron's firing rate is assumed to be a function (truncated cosine) of the difference between the actual direction and the "preferred direction" for that neuron plus a noise term, whose standard deviation is proportional to the deterministic part of the cell response (signal-dependent noise). The overall encoded direction is the sum of the preferred directions of each individual neuron, weighted by their activity levels.

Cortical lesions were simulated by eliminating part of the neurons (cell death)—hence resulting in under-represented or non-represented preferred directions. Movement performance was measured in terms of the discrepancy between intended and encoded movement direction. They specifically looked at the variability of directional error within the same intended direction and across directions, and how these quantities are affected by cell death. They found that both types of error are inversely correlated with the fraction of surviving cells. In a number of experimental studies with stroke survivors, they found that the same indicators exhibit similar relationships with the subjects' clinical impairment score (Kamper et al., 2002; Reinkensmeyer et al., 2002; Takahashi and Reinkensmeyer, 2003).
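The encoding and lesion scheme described above lends itself to a compact simulation: truncated-cosine tuning with signal-dependent noise, a population-vector readout, and a lesion that removes the cells with a given range of preferred directions. Cell counts, the noise level, and the lesion pattern below are illustrative, not the study's values:

```python
# Population-vector model with cell death (illustrative parameters).
import numpy as np

rng = np.random.default_rng(4)
n_cells = 500
pref = rng.uniform(0, 2 * np.pi, n_cells)        # preferred directions

def decode(theta, alive, noise=0.2):
    rate = np.maximum(np.cos(pref - theta), 0.0)          # truncated cosine tuning
    rate += noise * rate * rng.normal(size=n_cells)       # signal-dependent noise
    rate *= alive
    # Population vector: preferred directions weighted by firing rate.
    x, y = np.sum(rate * np.cos(pref)), np.sum(rate * np.sin(pref))
    return np.arctan2(y, x)

def mean_error(alive, reps=20):
    errs = []
    for theta in np.linspace(0, 2 * np.pi, 16, endpoint=False):
        for _ in range(reps):
            e = decode(theta, alive) - theta
            errs.append(abs(np.angle(np.exp(1j * e))))    # wrap error to [-pi, pi]
    return float(np.mean(errs))

intact = np.ones(n_cells)
lesioned = (pref > np.pi).astype(float)          # cell death: half the map lost
err_intact, err_lesioned = mean_error(intact), mean_error(lesioned)
print(err_intact, err_lesioned)                  # error grows with cell death
```

With under-represented preferred directions, decoded movements are systematically biased toward the surviving part of the map, reproducing the qualitative link between the fraction of surviving cells and directional error.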

This study addresses the problem of how cortical damage results in impaired movements. As such, it is a model of impairment, which does not explicitly address the mechanisms of recovery.

# **MODELS OF USE-DEPENDENT RECOVERY**

Other models focus on cortical reorganization in the context of a specific motor task. Han et al. (2008) look at how lesions in cortical motor areas affect the mechanisms of arm selection to achieve a goal (reaching a target), and how impairment evolves through spontaneous arm use. The model accounts for motor cortical dynamics (both hemispheres) and action selection; see **Figure 2**. The cortex is modeled as in Reinkensmeyer et al. (2003), with the addition of a Hebbian learning mechanism to account for cortical reorganization. In addition, the process of deciding which arm to use is modeled as a form of reinforcement learning.

More specifically, the preferred directions for each neuron are assumed to adapt as a function of activity. Adaptation has two aims: (i) shifting the actual encoded direction closer to the desired direction (supervised component) and (ii) shifting the preferred directions of the individual neurons toward the desired direction (self-organizing component).

The action selection module accounts for the process of selecting the hand that will actually make the movement. A model of the action-value mapping, based on radial basis functions, generates the expected reward as a function of the direction of the actual movement. The hand whose expected reward is maximal is selected to execute the movement. This module is driven by a reinforcement learning mechanism. After every movement a reward signal is provided, defined as the sum of two terms, respectively reflecting (i) how close the cortex's encoded direction is to the desired movement direction, and (ii) the fact that the left hand is more likely chosen for leftward movements whereas the right hand is more likely selected for rightward movements. The action-value model is updated to minimize the difference between actual and expected reward.
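A drastically reduced sketch of the action-selection component follows: one radial-basis action-value function per hand, greedy selection with occasional exploration, and a delta-rule update on the reward prediction error. The reward terms and all constants are placeholders, not the paper's actual formulation:

```python
# Minimal arm-selection scheme: per-hand RBF action values over movement
# direction, delta-rule learning (constants and reward terms illustrative).
import numpy as np

rng = np.random.default_rng(5)
centers = np.linspace(0, 2 * np.pi, 12, endpoint=False)     # RBF centers

def phi(theta):
    d = np.angle(np.exp(1j * (theta - centers)))            # circular distance
    return np.exp(-d**2 / 0.5)

w = {"left": np.zeros(12), "right": np.zeros(12)}           # action-value weights

def reward(hand, theta):
    # Toy reward: accuracy term plus a lateral-preference term.
    side = np.cos(theta) if hand == "right" else -np.cos(theta)
    return 1.0 + 0.5 * side

for trial in range(2000):
    theta = rng.uniform(0, 2 * np.pi)
    f = phi(theta)
    q = {h: w[h] @ f for h in w}                            # expected rewards
    # Choose the hand with the higher expected reward (epsilon-greedy).
    hand = max(q, key=q.get) if rng.random() > 0.1 else rng.choice(list(q))
    r = reward(hand, theta)
    w[hand] += 0.05 * (r - q[hand]) * f                     # delta rule

# After learning, rightward targets (theta near 0) favor the right hand.
q0 = {h: w[h] @ phi(0.0) for h in w}
print(q0["right"] > q0["left"])
```

The same machinery, coupled to a lesioned cortex model, produces the non-use spiral described next: lower expected reward for the impaired side makes its selection, and hence its retraining, progressively less likely.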

As in Reinkensmeyer et al. (2003), the effect of a stroke was modeled by eliminating part of the neurons within one hemisphere's motor cortex. As a result, the impaired side is initially unlikely to be selected for movements on that side, and lack of use makes its selection even less likely. In contrast, its forced use induces reorganization, so that the intact portion of that hemisphere gradually shifts its preferred directions toward those that were once covered by the impaired portion of the cortex. In summary, the model addresses the mechanisms of interaction between activity-driven cortical reorganization and functional compensation, i.e., the change in the motor strategy (in this case, from the impaired to intact arm) that is driven by the need to preserve functional performance (e.g., a high reward).

Takiyama and Okada (2012) used a similar model, with emphasis on bimanual training. Their main prediction is that bimanual training facilitates the adaptation of the preferred directions of the unimpaired neurons in the ipsilesional cortical hemisphere.

Han et al.'s (2008) model predicts that recovery will self-sustain if the amount of spontaneous use of the impaired arm reaches a certain threshold. If this is not the case, the impaired arm will be less likely to be selected, and recovery (if any) will gradually wash out. The model makes an important qualitative prediction: an activity threshold is a necessary condition for recovery to self-sustain. This can help explain the mechanisms of action of rehabilitation strategies that rely on forced use of the impaired arm. As a matter of fact, observations from a rehabilitation trial based on constraint-induced movement therapy (CIMT) were found to be consistent with the "threshold" notion (Schweighofer et al., 2009). Aiming to achieve an "activity threshold", rather than providing a fixed amount of training, can be seen as a form of personalization of the therapy. Studies in this direction are currently under way, one first step being to quantify the amount of arm non-use in activities of daily living (Han et al., 2013).

More generally, the model is important because it is a first attempt to address the interplay between cortical reorganization and the development of compensatory strategies.

Although with different emphases, all the above approaches focus on neural mechanisms of use-dependent reorganization. Impairment is only modeled qualitatively, and as such these models cannot be related to the behavior of a specific subject except in qualitative terms.

Two related studies (Reinkensmeyer et al., 2009b, 2012a) focus on how reorganization of the preserved cortico-spinal (CS) pathways and the recruitment of new ones underlie the recovery of the ability to generate force. The proposed model—inspired by single-cell recordings of the neural correlates of wrist force generation in primates—specifically addresses the "residual capacity" phenomenon, i.e., the empirical evidence that additional motor training may still improve the movement capacities even years after a stroke.

Specifically, the model is based on the notion that experience of movement "practice" induces a re-optimization in the recruitment of the intact CS connections to motor-neurons (MNs). A pool of CS cells (see **Figure 3**) is assumed to receive a movement command as input. The activity of CS cells summates at the level of the MN pools in the spinal cord. Wrist force (aimed toward either flexion or extension) is determined by the difference between the activities of the "flexor" and "extensor" MN pools. This can be considered as the simplest muscle synergy. The synaptic weights of the CS-MN connections are assumed to
be fixed, whereas the weights of the input connections to the CS cells are learned through a reinforcement mechanism, in which the experienced attempt to move the limb represents the reward signal that guides the refinement of activation.
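The reward-driven re-optimization can be caricatured as a stochastic search over the input weights, with fixed CS-to-MN connections and force given by the flexor-extensor difference. The bounded-drive normalization, the acceptance rule, and all sizes below are assumptions for illustration, not the published model:

```python
# Caricature of CS-weight re-optimization: fixed CS->MN weights, plastic
# input->CS weights trained by "keep changes that increase the produced
# force" (all sizes, noise levels, and the drive bound are illustrative).
import numpy as np

rng = np.random.default_rng(6)
n_cs = 100
W_csmn = rng.normal(size=(2, n_cs))       # fixed CS->MN weights (flexor, extensor)

def force(w_in, alive):
    cs = np.maximum(w_in, 0.0) * alive    # rectified CS-cell activity
    mn = W_csmn @ cs                      # summation at the two MN pools
    return mn[0] - mn[1]                  # flexion minus extension force

def train(alive, n_trials=500):
    w = rng.normal(scale=0.1, size=n_cs)
    best = force(w, alive)
    for _ in range(n_trials):
        cand = w + rng.normal(scale=0.05, size=n_cs)    # exploratory perturbation
        cand *= 3.0 / max(3.0, np.linalg.norm(cand))    # bounded descending drive
        f = force(cand, alive)
        if f > best:                                    # reward: a stronger attempt
            w, best = cand, f
    return best

intact = np.ones(n_cs)
lesioned = intact.copy(); lesioned[:60] = 0.0           # stroke: 60% of CS cells lost
f_intact, f_lesioned = train(intact), train(lesioned)
print(f_intact, f_lesioned)   # in the model, the surviving population bounds force
```

Under the bounded drive, the achievable force is limited by how many useful CS cells survive, mirroring the model's prediction that the residual CS population determines maximum strength.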

Furthermore, the model assumes that if CS connections originating from the primary motor cortex (M1) are lacking, CS connections from the supplementary motor area (SMA) may be recruited as well.

Model simulations predict that the size of the residual CS population determines the maximum strength an individual stroke patient can achieve. This is consistent with observations based on Transcranial Magnetic Stimulation (TMS) and Magnetic Resonance-Diffusion Tensor Imaging (MR-DTI) (Stinear et al., 2007) suggesting a strong correlation between white matter integrity and maximum grip force. In addition, the model predicts that the same dose of exercise is more effective when administered to sub-acute subjects (as compared to chronic), and to less impaired subjects (as compared to more severely impaired ones). Furthermore, severe M1 lesions are predicted to induce an increase in SMA activity. These predictions have been confirmed experimentally (Feydy et al., 2002; Ward et al., 2007).

This model has not been directly used in rehabilitation, but provides some hints on how to make rehabilitation more effective. The simulations show that recovery would be facilitated if the noise level were decreased, or if the noise were signal-dependent. Another prediction is that inhibition of the stronger connections, e.g., the residual connections originating from M1, would facilitate the recruitment of alternative pathways. The latter prediction is somewhat consistent with, and actually provides a possible interpretation for, the empirical observation (Bolognini et al., 2009) that repetitive TMS of the intact cortical hemisphere facilitates recovery.

Like Han et al. (2008), this model only makes qualitative predictions and cannot describe the behavior of one specific subject. Further, apart from the M1-SMA shift of activity, the model does not directly address the compensation issue and does not provide new insight into the interplay between neural reorganization and task re-learning.

# **TEMPORAL EVOLUTION OF PERFORMANCE OVER TRAINING TIME**

Variants of the power law of practice have often been used to describe the trial-by-trial evolution of motor performance during exercise, both within and between training sessions (the "functional" level of description).

In a widely used approach to robot-assisted rehabilitation (Krebs et al., 1998; Colombo et al., 2007), subjects are allowed a specific time interval to complete the task without assistance. After that, the robot completes the task through a high-stiffness position controller. In this way, the amount of robot intervention (fraction of the movement operated by the robot) automatically adjusts to the subject's performance (more active performance, less robot intervention). However, to prevent movements from being totally passive, at least initially, this technique requires at least a minimum residual amount of voluntary control. In a systematic analysis of various performance indicators during robot-assisted rehabilitation based on this protocol, Colombo et al. (2008, 2010) used an exponential model to account for their temporal evolution.
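An exponential description of this kind, y(t) = y_inf + (y0 - y_inf) * exp(-t / tau), can be fitted per indicator to compare time constants. The sketch below uses synthetic data for a fast and a slow indicator, not the studies' measurements:

```python
# Fitting an exponential recovery model to two synthetic indicators with
# different time constants (all values illustrative, not the studies' data).
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(7)

def expo(t, y_inf, y0, tau):
    return y_inf + (y0 - y_inf) * np.exp(-t / tau)

t = np.arange(40, dtype=float)                                     # training epochs
smooth = expo(t, 1.0, 0.2, 4.0) + 0.02 * rng.normal(size=t.size)   # fast indicator
speed = expo(t, 1.0, 0.2, 12.0) + 0.02 * rng.normal(size=t.size)   # slow indicator

(_, _, tau_smooth), _ = curve_fit(expo, t, smooth, p0=(1.0, 0.0, 5.0))
(_, _, tau_speed), _ = curve_fit(expo, t, speed, p0=(1.0, 0.0, 5.0))
print(tau_smooth, tau_speed)      # the "speed" time constant comes out longer
```

Comparing fitted time constants across indicators is exactly how the fast recovery of smoothness versus the slower fine-tuning of speed was quantified.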

The main finding of these studies was that some movement features, namely force control and movement smoothness, improved more quickly than parameters that relate to the fine-tuning of the movement, like speed. Specifically, speed improvement exhibited a much longer (2–3 times) time constant than force control and smoothness. Furthermore, in several subjects the path curvature exhibited a non-monotonic time course, with an initial increase until a peak, followed by a steady decrease.

These findings shed some light on the process of functional recovery. Subjects first explore the action space with the primary objective of exploiting their residual abilities to achieve the goal; then they undergo effort optimization to make the movement more efficient. The distinctive behavior of path curvature is a consequence of this two-phase process: the increase relates to the exploration phase, whereas the decay denotes the beginning of the effort optimization phase. The final step is the fine tuning of movement performance, denoted by the increase in movement speed (Colombo et al., 2012). These processes likely start at the same time and run in parallel, but have different time constants (faster for the former, slower for the latter).

A different, but related view of recovery is based on the notion that each movement is built by combining multiple submovements, and the empirical observation (Dipietro et al., 2009) that neuromotor recovery is characterized by a gradual decrease in the number of sub-movements—which results in an improved smoothness.

These mechanisms suggest that training-mediated recovery shares common features with motor skill learning.

In a subsequent study, Colombo et al. (2012) developed a control algorithm, Progressive Task Regulation (PTR), that automatically adjusts various aspects of the task (amplitude of the movement, number of subtasks, assistance modality) according to the evolution of the different performance indicators. After each training epoch, performance is measured (moving average over the last three epochs), and the difficulty level of the task is modified according to a set of rules. Depending on the fraction of movement completed without the robot's assistance, exercise difficulty can switch to more challenging assistance modalities. The observed evolution of the different performance indicators was incorporated into the threshold values used to determine when the switch occurs. The decisions to switch task difficulty based on the PTR algorithm were found to be highly consistent with those based on subjective therapists' assessment.
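In spirit, such a rule can be expressed as a threshold test on a moving average of recent performance; the window length and the thresholds below are illustrative, not the published PTR values:

```python
# Threshold-based task regulation on a moving average of recent performance
# (window and thresholds are illustrative, not the published PTR values).
def regulate(history, difficulty, up=0.8, down=0.4, window=3):
    """history: fraction of movement completed unassisted, one value per epoch."""
    if len(history) < window:
        return difficulty
    avg = sum(history[-window:]) / window
    if avg >= up:
        return difficulty + 1          # switch to a more challenging mode
    if avg <= down:
        return max(0, difficulty - 1)  # ease the task / increase assistance
    return difficulty

scores = []
level = 2
for perf in [0.5, 0.7, 0.9, 0.92, 0.95]:
    scores.append(perf)
    level = regulate(scores, level)
print(level)   # -> 4: steadily improving performance raises difficulty twice
```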

This model focuses only on performance time series and thus only provides information on the temporal evolution of the recovery process; it does not attempt to describe the mechanisms underlying recovery. Specifically, the model describes the evolution of performance in non-assisted trials. The same modeling framework can address situations in which the degree of assistance is constant or changes slowly and monotonically, but it cannot separate the robot's contribution from the subject's voluntary contribution in situations when they act together.

Recently, the same group proposed a dynamical model of recovery (Panarese et al., 2012) that attempts to reproduce the mechanisms underlying the trial-by-trial evolution of performance. The model derives from a computational framework originally developed by Smith et al. (2004) in the context of associative learning in animal studies. In this model, an internal state variable denoting "motor improvement" is modeled as a random walk, reflecting the assumption that motor improvement builds up as the effect of many different factors. As an addition to the original model, the logarithms of a number of observable performance indicators at a given trial are assumed to be proportional to the amount of motor improvement. The model was found to reproduce the variety of time constants of recovery observed in the different performance indicators. While interesting, this model is little more than a noisy version of the exponential model. In particular, the random walk assumption says little about what determines recovery.
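This scheme can be sketched as a simple state-space simulation. The sketch below is illustrative only: the random-walk state and the log-linear observation assumption follow the description above, but the gains, noise levels, and trial count are invented, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

n_trials = 200
# Latent "motor improvement" state: a random walk (Smith et al.-style
# state-space model), reflecting many small, unmodeled contributions.
state = np.cumsum(rng.normal(0.0, 0.05, n_trials))

# Log-performance of each observable indicator is assumed proportional
# to the latent state, plus observation noise. Gains are illustrative.
gains = np.array([1.0, 0.6, 1.4])            # one gain per indicator
noise = rng.normal(0.0, 0.02, (3, n_trials))
log_perf = gains[:, None] * state + noise    # (indicators x trials)

# Each indicator tracks the same latent improvement, so their log time
# series remain strongly correlated with the state despite different gains.
r = [np.corrcoef(log_perf[i], state)[0, 1] for i in range(3)]
```

Because all indicators share one latent state, the model reproduces different apparent time constants (via the gains) without positing separate recovery processes.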

# **THE ROLE OF ASSISTANCE**

A few modeling studies have addressed the issue of how robot assistance affects recovery. Reinkensmeyer (2003) and Emken et al. (2007) focus on adaptive changes in gait movements in the presence of assistive forces. These studies specifically address the trial-by-trial evolution of performance during adaptation to an assistive force field, and suggest that adaptation can be explained as an optimization process that accounts for a combination of motor error and effort.

The resulting computational model is summarized in **Figure 4**. Similar to force field adaptation experiments (Thoroughman and Shadmehr, 2000; Scheidt et al., 2001; Donchin et al., 2003; Cheng and Sabes, 2006), the adaptation process is modeled as a linear autoregressive model. A controller receives as inputs the desired trajectory, the performance error and the motor command (muscle force) on the previous trial. The motor command on the next trial is the controller output. The controller includes two modules: (i) an inverse model of limb dynamics, which transforms a desired trajectory into an appropriate motor command; (ii) a "learning rule", which adapts the motor command to changes in the dynamics of the limb and/or of the environment. The output muscle force is applied to the body, whose movements are disturbed by an external perturbation.

The "learning rule" module (**Figure 4**, inset) relies on the notion that while dealing with assistive forces the motor system behaves as a "greedy" optimizer, so that these forces are quickly incorporated into the motor plan in order to minimize effort while maintaining the required performance level (e.g., a small error). The force estimator predicts the disturbance on the next trial, and an optimizing controller generates the next motor command by minimizing a cost function on a trial-by-trial basis. In the model derived from this optimization framework, the "slacking" mechanism is captured by a decay term in the dynamics equation (the "f" term in the Learning Rule block, see **Figure 4**).
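Under these assumptions, the learning rule reduces to a two-term trial-by-trial update. The following sketch is a minimal, hypothetical instance: the update form (a forgetting factor f plus an error-driven correction g) mirrors the description above, but the parameter values and the constant disturbance are invented for illustration.

```python
import numpy as np

# Trial-by-trial "greedy optimizer" learning rule (sketch):
#   u[n+1] = f * u[n] + g * e[n]
# where f < 1 is the slacking (forgetting) factor that discounts the
# previous motor command, and g scales the error-driven correction.
f, g = 0.9, 0.3          # illustrative values, not fitted to data
d = 1.0                  # constant disturbance to be compensated
n_trials = 100

u = np.zeros(n_trials)   # motor command
e = np.zeros(n_trials)   # performance error
for n in range(n_trials - 1):
    e[n] = d - u[n]                  # error left uncompensated
    u[n + 1] = f * u[n] + g * e[n]   # slacking + error correction
e[-1] = d - u[-1]

# Steady state: u* = g*d / (1 - f + g) = 0.75 < d, i.e., the system
# tolerates a residual error (0.25) in exchange for reduced effort.
```

The decay term prevents full compensation of the disturbance: the command plateaus below what error cancellation alone would demand, which is exactly the effort-error trade-off the model attributes to slacking.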

Emken et al. (2007) also suggested that slacking may have adverse effects on recovery. As a consequence, in robot-assisted rehabilitation, assistance should be kept to a minimum. Furthermore, it has to be decreased—manually or automatically—as performance improves. A variety of techniques have been proposed to adaptively regulate the magnitude of the assistive force as a function of the observed outcome. In Casadio et al. (2009) the therapist manually selected the assistance level, keeping it at the minimum that evoked the functional response needed to accomplish the task. In Vergaro et al. (2010) a linear controller continuously and automatically regulated the assistive force provided by the robot, depending on on-line performance measures. Similar mechanisms have been proposed in the context of the upper limb—the performance-based progressive robot-assisted therapy used by the MIT-Manus robot (Krebs et al., 2003)—and the lower limb—the patient-cooperative training modality used by the Lokomat system (Riener et al., 2005; Mihelj et al., 2007). Using a computational model of human-robot load sharing, Reinkensmeyer et al. (2007) suggested that to achieve assistance-as-needed, the robot controller should possess a slacking mechanism resembling that observed in humans. Wolbrecht et al. (2008) designed a more complex adaptive control scheme, based on task performance, that automatically negotiates an error-reducing and an effort-enhancing component. The controller needs an explicit model of the subject's arm and its neural control; in this specific study, the model took the form of a radial basis function neural network, which was learned from experience.

The "greedy optimizer" model has been highly influential in rehabilitation, but it does not really address the recovery mechanisms. In fact, it was derived from studies on healthy subjects, and its implications for recovery are largely speculative. The slacking hypothesis has never been directly tested in clinical rehabilitation trials.

In a recent study, Casadio and Sanguineti (2012) developed a linear dynamical model to describe the trial-by-trial evolution of the motor performance of chronic stroke survivors who underwent a rehabilitation protocol based on a robot-assisted arm extension task.

The model is based on the notion that in robot-assisted exercise the robot device and the subject cooperate toward a common goal—a form of shared control. Specifically, the model assumes that task performance is a function of a voluntary, human-generated command (taken as the model's state variable) and of a robot-generated assistive force (taken as one of the model's inputs); see **Figure 5**, top right.

As regards the dynamics of the actual recovery process, the model assumes that the amount of voluntary control on the next trial is the sum of three components: (i) a "memory" or "retention" term—a fraction of the current amount of voluntary control; (ii) a "learning" component, proportional to a "driving" signal—a function of movement performance that can be interpreted as a reward—and (iii) an assistance-related component proportional to the magnitude of the assistive force.
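A minimal sketch of the assumed three-component dynamics might look as follows. The structure (retention, performance-driven learning, assistance-related slacking) follows the description above; the performance function, parameter values, and constant assistance level are assumptions made for illustration only.

```python
import numpy as np

def simulate(a, b=0.05, c=-0.02, F=1.0, n_trials=300):
    """Trial-by-trial voluntary control x[n] under shared control:
       x[n+1] = a*x[n]   (retention / memory)
              + b*r[n]   (learning, driven by performance r)
              + c*F      (slacking, proportional to assistance)
    Performance is shared between subject and robot: r[n] = x[n] + F.
    All parameter values are illustrative, not fitted to patient data."""
    x = np.zeros(n_trials)
    for n in range(n_trials - 1):
        r = x[n] + F                        # performance under assistance
        x[n + 1] = a * x[n] + b * r + c * F
    return x

# Higher retention (a) yields a better long-term plateau of voluntary
# control, consistent with retention predicting long-term outcome.
x_high = simulate(a=0.90)
x_low = simulate(a=0.70)
```

With these (invented) values the voluntary command settles at (b+c)F/(1-a-b): 0.6 for a=0.90 vs. 0.12 for a=0.70, so the retention parameter alone changes the asymptote, not just the speed, of recovery.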

A distinctive feature of this model is that it posits separate mechanisms for "retention" and for the effect of assistance, i.e., the actual "slacking". These terms have often been used interchangeably; see Reinkensmeyer et al. (2009a) for a review that specifically covers slacking models.

The model was used to analyze the trial-by-trial time series of performance in nine chronic stroke survivors who underwent a 10-session training protocol; see **Figure 5**, left. The estimates of the model parameters for each subject suggested that recovery is determined by a complex interplay of memory (retention), performance and slacking. One specific finding was that in severely impaired subjects recovery is greater when the driving (reward) signal is greater; hence, recovery improves when the performance—not the motor error—is greater. Another finding, which somewhat confirmed the observation of Emken et al. (2007), was that a greater assistive force has a negative impact on recovery (slacking). However, only a few subjects—the least impaired—exhibited a significant "slacking" effect. The single most important finding was that the retention rate (memory decay) parameter accurately predicts the long-term outcome of the rehabilitation trial (see **Figure 5**, bottom right). This finding is consistent with Han et al. (2008): the hypothesis that recovery must reach a threshold in order to self-sustain implies a buildup mechanism that integrates the effect of repeated motor activities. High retention is an essential prerequisite of this mechanism.

This model suggests that the mechanisms of recovery may differ across subjects. Again, this calls for an adaptive regulation of assistance, in which the peculiarities of individual subjects are taken into account.

Squeri et al. (2011) designed an adaptive Bayesian regulator that adjusts the magnitude of the assistive force (or another task parameter) to keep the average performance around a target magnitude (a performance clamp). In this way, as performance improved, the controller automatically reduced the amount of assistance. Squeri et al. (2012) used this model to analyze the temporal evolution of the subjects' voluntary control in a task that involved submovements in different directions. A single controller was used to adaptively regulate the degree of assistance in all submovements. The model suggested that the dynamics of recovery is direction-dependent, mostly due to the different sensitivity to assistance exhibited by the arm when moving in different directions. These results suggest that the modulation of assistance is better made separately for each direction. More generally, they point to the need for therapy personalization.
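The performance-clamp idea can be illustrated with a deliberately simplified regulator. This is not the Bayesian scheme of Squeri et al. (2011), but a deterministic proportional sketch in which a hypothetical subject's capability improves over training while the controller holds observed performance near a target; all quantities and gains are invented.

```python
import numpy as np

n_trials = 200
target = 1.0                     # desired performance level (the clamp)
k = 0.5                          # regulator gain (illustrative)

# Hypothetical subject: voluntary capability improves during training.
capability = np.minimum(0.01 * np.arange(n_trials), 1.0)

F = np.zeros(n_trials)           # assistive force magnitude
y = np.zeros(n_trials)           # observed performance
F[0] = 1.0
for n in range(n_trials - 1):
    y[n] = capability[n] + F[n]              # shared performance
    # Reduce assistance when performance exceeds the target,
    # increase it when performance falls short (never below zero).
    F[n + 1] = max(0.0, F[n] - k * (y[n] - target))
y[-1] = capability[-1] + F[-1]
```

As the subject's capability approaches the target, the clamp drives the assistive force toward zero while performance stays pinned at the target, which is the qualitative behavior the regulator is designed to produce.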

While this model attempts to distinguish between recovery and slacking (i.e., adaptation to assistance), it remains purely descriptive and does not explicitly address the underlying mechanisms.

# **SYNERGY-BASED MODELS OF IMPAIRMENT**

The characterization of compensatory strategies in stroke survivors is receiving increasing attention. A number of studies look at movements in terms of their basic building blocks, or synergies (Ting and Macpherson, 2005; Tresch et al., 2006; Raghavan et al., 2010; Cheung et al., 2012). The notion of muscle synergy relies on the assumption that the nervous system recruits spinal inter-neural modules that activate groups of muscles as individual units.

Cortical lesions affect the organization of these modules, thus resulting in abnormal muscle activations (Twitchell, 1951; Brunnstrom, 1970), incorrect regulation of interaction torques, incorrect timing of action sequences (Archambault et al., 1999), decreased joint range of motion (Levin and Dimov, 1997), loss of inter-joint coordination (Levin, 1996a) and, ultimately, abnormal movements.

To extract muscle synergies, Cheung et al. (2012) collected myoelectric signals (EMGs) from different muscles and applied a factorization algorithm—Non-negative Matrix Factorization (NMF)—to the rectified, integrated, and variance-normalized EMGs. Based on this factorization, the temporal pattern of activation of each muscle is expressed as the sum of a small number of (time-varying, task-dependent) signals. The relative contribution of each of these signals to each muscle is time-invariant and task-independent, and denotes groups of muscles that are recruited together.
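As a rough illustration of this decomposition, the sketch below factorizes a synthetic non-negative "EMG" matrix with the classic Lee-Seung multiplicative updates. It is not Cheung et al.'s actual processing chain; the data dimensions, number of synergies, and iteration count are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic "EMG" data: 10 muscles x 500 time samples generated from
# 3 ground-truth synergies (all entries non-negative by construction).
n_muscles, n_samples, n_syn = 10, 500, 3
W_true = rng.random((n_muscles, n_syn))        # muscle weights
H_true = rng.random((n_syn, n_samples))        # activation signals
V = W_true @ H_true                            # rectified/integrated EMG

# Lee-Seung multiplicative updates for V ~ W @ H with W, H >= 0.
eps = 1e-9
W = rng.random((n_muscles, n_syn))
H = rng.random((n_syn, n_samples))
for _ in range(500):
    H *= (W.T @ V) / (W.T @ W @ H + eps)
    W *= (V @ H.T) / (W @ H @ H.T + eps)

rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

The columns of W play the role of the time-invariant muscle groupings, and the rows of H play the role of the time-varying activation signals; the multiplicative form of the updates keeps both factors non-negative throughout.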

These authors systematically analyzed the muscle synergies exhibited by stroke survivors with different levels of impairment (mild to severe) and different disease durations (post-acute to super-chronic). Subjects performed different tasks with each of their arms to allow a comparison of their muscle synergies in the impaired and the unimpaired sides. A portion of the synergies observed in the impaired arm was found to be similar to those found in the intact arm of the same subjects ("preserved" synergies). The remaining synergies were altered. Specifically, the authors identified two distinct types of alteration, which they named "merging" and "fractionation". "Merging" refers to a situation in which multiple normal synergies merge together in the impaired arm. The amount of merging was found to correlate with the severity of the impairment. "Fractionation" refers to a situation in which the normal synergies split into multiple, novel patterns. The amount of fractionation was found to correlate with the disease duration (time since stroke onset), which suggests that it can be seen as a form of spontaneous reorganization.

Synergy models do not explicitly address the recovery process, but they can be used to characterize impairment and the effects of treatment in individual subjects, in terms of their repertoire of movement strategies at the articular and/or muscular level. In other words, muscle synergies are physiological markers of both the degree of impairment and the degree of recovery. Dipietro et al. (2007) observed that in chronic stroke survivors, robot-assisted exercise results in a more efficient control of the shoulder and elbow joints. They suggested that these changes are due to a "tuning" of the existing abnormal synergies, not to their modification.

Latash and Anson (1996) suggested that in impaired individuals, the modified motor strategies should not necessarily be considered pathological; rather, they should be seen as forms of adaptation to the primary disorder. Therefore, their correction should not be the primary concern of the rehabilitative treatment. However, this view may be in contrast with the long-term goal of recovering motor functions (Levin, 1996b; Cirstea and Levin, 2000). Failure to modify the abnormal synergies may lead to incorrect postures and weakening of underutilized muscles. With time, it may worsen the chances of recovering other abilities (Levin, 1996b).

It is conceivable to design technological aids that drive recovery toward preserving and/or facilitating the "normal" articular and muscle synergies while, conversely, reducing or preventing the "abnormal" ones. Recently, Crocher et al. (2012) demonstrated that in healthy subjects, training with an exoskeleton may induce changes in the arm-related synergies. In severe stroke survivors, Ellis et al. (2005) demonstrated that training reduces abnormal isometric elbow and shoulder joint torque coupling.

Rehabilitation treatments built upon synergy models should be designed to emphasize the synergies that are silent or underactivated, while discouraging compensatory strategies in favor of new and more independent coordination patterns.

Synergy-based models are descriptors of impairment; they do not provide an explicit model of the recovery process, although they can be used for describing this process as a change at the level of muscle activity.

# **DISCUSSION**

Computational models are widely used to investigate motor skill learning and sensorimotor adaptation. Similar models might potentially contribute to our understanding of the mechanisms of recovery after a stroke at the "functional" level, and to the design of optimal, individualized rehabilitation strategies.

Neuromotor recovery is facilitated by exercise and is mediated by neural reorganization at cortical and sub-cortical levels, whose physiological substrate is synaptic plasticity and rewiring through axonal outgrowth.

On one hand, neural plasticity mechanisms constrain the way recovery proceeds. On the other hand, sensorimotor behaviors determine the patterns of neural activity, thus inducing specific, activity-dependent synaptic changes. Therefore, behavioral mechanisms and neural reorganization cannot be simply treated as different levels of description. Rather, both aspects need to be accounted for when modeling the neuromotor recovery process.

### **DO COMPUTATIONAL MODELS PROVIDE TESTABLE HYPOTHESES ON THE MECHANISMS OF RECOVERY?**

It has been suggested (Krakauer, 2006; Dipietro et al., 2012) that neuromotor recovery shares at least some features with motor learning. The class of models focusing on the recovery of functions highlights this "motor learning" component.

To acquire (or re-acquire) a motor skill, an individual must understand how to achieve the movement goal and, more generally, learn how to obtain high rewards while minimizing the necessary muscle effort. Some of the models discussed in the previous sections unveil specific aspects of this process. The observation (Colombo et al., 2010) that different aspects of motor performance exhibit a different, possibly non-monotonic, temporal evolution reflects the notion that recovery is a complex multifactorial process, in which maximization of performance is only one of the components. Another aspect is generalization, from simpler to more complex movements (Dipietro et al., 2009).

Other models (Han et al., 2008; Reinkensmeyer et al., 2012a) describe neuromotor recovery within a reinforcement learning paradigm. Casadio and Sanguineti (2012) suggest that in severe chronic stroke survivors, improvements in voluntary control are determined by performance, not error—another indication that recovery has much in common with motor skill learning.

Other models focus on the "central" level, and describe cortical reorganization in terms of Hebbian and/or self-organization principles. These models highlight a number of physiological mechanisms of recovery. The model of Goodall et al. (1997) predicts an initial increase in the size of peri-lesional "silent" areas immediately after the lesion, followed by overall reorganization. Another specific prediction is that tonic stimulation of the lesioned side would limit the size of the peri-lesional "silent" areas. A similar effect is obtained by limiting the activity of the contralateral hemisphere. The model of Butz et al. (2009) focuses on axonal outgrowth driven by the push toward restoring the homeostatic inter-hemispheric balance. Its main prediction is that stimulation may facilitate axonal outgrowth but, because of a saturation effect, paused stimulation is more effective than sustained stimulation.

Still other models address reorganization mechanisms either at the level of the musculoskeletal system or at the level of CS circuitry. Reinkensmeyer et al.'s (2012a) model includes a basic form of reorganization, based on the recruitment of corticospinal pathways that originate from cortical areas (e.g., the SMA) that in the intact brain are normally not used because they are less efficient, and therefore more "costly", in contributing to force generation.

In Han et al.'s (2008) model, the activity of the impaired limb induces a reorganization of the ipsilesional cortical areas, which in turn makes this same hand more likely to be selected for movement. Its main prediction is that for recovery to self-sustain, the activity of the impaired hand must reach a threshold. The hand selection model (Han et al., 2008) suggests that if there is too much emphasis on performing the task but too little pressure toward reorganization—for instance, if the affected arm is not likely to be selected and hence the impaired hemisphere is not active enough to undergo reorganization—recovery of the paretic side functions will not self-sustain and will possibly wash out. One testable prediction is that techniques that facilitate reorganization independent of motor learning—e.g., by increasing cortical excitability, as with trans-cranial direct current stimulation (tDCS) (Reis et al., 2009)—might lower the "threshold" activity level that allows recovery to self-sustain. The model of Takiyama and Okada (2012) is very similar. It predicts that the inter-hemispheric activation induced by bimanual exercise facilitates cortical reorganization on the ipsilesional side.
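The threshold behavior can be caricatured with a one-dimensional, deterministic reduction of the use-dependent loop. This is not Han et al.'s actual reinforcement-learning model of hand selection; the sigmoid selection rule and all parameter values are invented for illustration. Two initial conditions on either side of the threshold show self-sustaining recovery versus wash-out.

```python
import numpy as np

def recovery_path(s0, n_steps=500):
    """Reduced sketch of the use-dependent recovery loop:
    s is the impaired hand's functional level; the probability of
    selecting that hand grows sigmoidally with s; use drives
    improvement, disuse lets function decay. Parameters illustrative."""
    alpha, delta = 0.1, 0.1          # gain of use, decay of disuse
    theta, tau = 0.5, 0.05           # selection threshold and steepness
    s = s0
    for _ in range(n_steps):
        p_use = 1.0 / (1.0 + np.exp(-(s - theta) / tau))
        s = s + alpha * p_use - delta * s
    return s

# Above threshold, use and function reinforce each other (recovery
# self-sustains); below it, function washes out toward zero.
s_above = recovery_path(0.7)
s_below = recovery_path(0.3)
```

The loop is bistable: the sigmoid creates two stable fixed points (near full function and near zero), so interventions that nudge the state, or lower the threshold theta, change which basin the patient ends up in.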

A complementary view of the recovery process is provided by attempts to characterize muscle synergies in a quantitative way (Ting and Macpherson, 2005; Tresch et al., 2006; Raghavan et al., 2010; Cheung et al., 2012). These models do not directly address the recovery mechanisms, but they provide a window into the way muscle groups are recruited in different phases of the recovery process, and they make it possible to distinguish whether recovery occurs through structural changes in muscle synergies or as a tuning of existing synergies.

Only a few models address the facilitatory role of assistive forces. In principle, assistive forces enable achieving the same motor performance with less voluntary contribution (Emken et al., 2007), but how this mechanism plays a role in the recovery process has not been extensively investigated. The "slacking" hypothesis predicts that reduced voluntary commands have a detrimental effect on recovery. A recent study (Hu et al., 2009) confirmed this prediction by demonstrating that subjects undergoing passive training exhibit less recovery than subjects who are actively involved in the exercise through an electromyography-driven robot. However, Casadio and Sanguineti (2012) found that this effect may not be equally important in all subjects.

### **CAN COMPUTATIONAL MODELS PREDICT THE RECOVERY ON A PATIENT-BY-PATIENT BASIS?**

By "predicting the recovery" we mean estimating the future evolution—either spontaneous or induced by treatment—of the impairment and/or the functional performance of a specific subject, measured in terms of clinical scales and task-specific performance indicators. A number of mathematical models, e.g., Chaudhuri et al. (1988), Saeki et al. (1994), Oczkowski and Barreca (1997), Stineman et al. (1997), Lofgren et al. (2000) and Mirbagheri et al. (2012), have been proposed to predict the recovery outcome within different time scales. These models use different types of information, such as the initial degree of impairment and the nature, size or location of the lesion. In contrast, computational models explicitly account for the mechanisms by which recovery takes place, namely use-dependent neural reorganization and motor learning.

Some computational models focus on general principles, but do not address the dynamics of the recovery process in a specific individual. All models at the central level account for brain areas, CS circuitry and neural plasticity mechanisms in a way that cannot be immediately associated with empirical observations on individual subjects.

However, some comparison of these models with empirical data is still possible. For instance, in Reinkensmeyer et al. (2003) the lesion is modeled as a reduction in the number of available direction-tuned cells in the motor map. The size of the lesion can be related to the degree of impairment, a correlation that has been observed empirically. In principle it would be possible to personalize the model in terms of the location and size of the lesion, or the fraction of intact CS pathways (e.g., from imaging data), but the general focus is on quantities (e.g., the activity and the changes in spatial tuning of individual cortical columns) that are hard to associate with empirical measures.

Other models allow inferring how recovery would take place in a specific individual. These include all functional models and synergy-based models.

Specifically, Colombo et al. (2010) predict the time constant of the recovery process by fitting the time course of the performance data during the rehabilitation treatment. Casadio and Sanguineti (2012) predict the long-term retention of the recovery by using a state space model and by looking at the memory decay of the learning process. These models directly refer to observable quantities, so that it is relatively easy to identify subject-specific model parameters. Model parameters capture the way a given subject undergoes recovery. For instance, a model may allow estimating the subject's rate of spontaneous recovery, or may provide information on his/her peculiar response to mechanical perturbations (mechanical impedance).
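Identifying subject-specific parameters from observable trial series can be sketched as an ordinary least-squares fit of a first-order linear model. The model form, parameter values, and noise level below are illustrative assumptions, not the estimation procedure actually used in these studies.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulate a performance time series from a first-order linear model,
#   y[n+1] = a*y[n] + u + noise,
# where a is the retention (memory) rate and u a constant learning input.
a_true, u, n_trials = 0.95, 0.05, 300
y = np.zeros(n_trials)
for n in range(n_trials - 1):
    y[n + 1] = a_true * y[n] + u + rng.normal(0.0, 0.01)

# Identify (a, u) from the observed trials by ordinary least squares:
# regress y[n+1] on [y[n], 1].
X = np.column_stack([y[:-1], np.ones(n_trials - 1)])
a_hat, u_hat = np.linalg.lstsq(X, y[1:], rcond=None)[0]
```

Because the model refers only to observable performance, the retention parameter can be recovered directly from a single subject's trial series, which is what makes this class of models usable for individual outcome prediction.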

All the above models enable significant, but limited, predictions. All focus on limited aspects of the recovery process (cortical, muscular, functional) and provide only a limited account of their complex interplay. The importance of such interplay is exemplified by the prediction that whether recovery will self-sustain can be inferred by evaluating whether the amount of spontaneous use of the impaired arm reaches a certain threshold (Han et al., 2008).

In conclusion, multiple levels of description would be necessary. One important issue is whether these models can incorporate the specific features of one patient (nature, size and location of the lesion; type and degree of impairment). This is particularly important for the cortical level, for which there is a need for descriptions that are based on observable quantities.

### **DO MODELS ALLOW DESIGNING PATIENT-SPECIFIC "OPTIMAL" THERAPY?**

Computational models may enable designing patient-specific therapy, aimed at maximizing speed and amount of recovery of that patient.

Patient-specific models provide a description of a patient's status and characteristics, e.g., impairment, articular and muscular synergies, residual movements, force generation abilities, etc. They also provide a better understanding of what determines the recovery of that patient. This information can be used to define specific goals for treatment, and to assess its efficacy in terms of the progress of the subject's status (motor strategies and functional behavior) toward the treatment goals.

Colombo et al. (2012) designed and tested a controller, PTR, that automatically selects exercise parameters (amplitude of the movement, number of sub-movements, assistance modality) based on their previous work on modeling the evolution of different performance indicators (Colombo et al., 2010).

Reinkensmeyer et al. (2007) proposed a robot controller based on the assist-as-needed principle, with a slacking mechanism similar to the one observed in humans. In several other applications, the magnitude of assistive force is adaptively regulated as a function of the observed outcome, in both the upper limb (Krebs et al., 2003; Vergaro et al., 2010) and the lower limb (Riener et al., 2005; Mihelj et al., 2007).

Bayesian regulation of robot-generated assistance (Squeri et al., 2011) is a direct derivation of the model of recovery proposed by Casadio and Sanguineti (2012).

The success of the above-mentioned approaches suggests that an even greater advantage would come from the much more ambitious goal of designing treatment modalities that directly rely on patient-specific recovery models. In principle, if a model allows predicting the evolution and final outcome of a specific rehabilitation intervention, it should be possible to use this same model as a basis for optimal planning of the intervention. This may include the timing of the individual exercise sessions, the specific exercises to be administered, the trial-by-trial regulation of the degree of assistance, and the online planning of assistive or resistive forces (or other forms of stimulation, such as Functional Electrical Stimulation (FES)).

Only a few examples of model-based robot controllers for treatment have been proposed so far (Wolbrecht et al., 2008; Reinkensmeyer et al., 2012b). One major difficulty is that incorporating recovery models in the robot controller requires achieving a dual goal: the controller should select the goal and the difficulty level of the exercise based on the subject's state as predicted by the model and, at the same time, the model should be continuously adjusted on the basis of the observed subject and robot performance as the treatment proceeds. Therefore, the resulting treatment protocol has to be a trade-off between exploitation (of the model) and exploration (of the treatment control space, to keep the model up to date).
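The exploitation-exploration trade-off can be caricatured as a bandit-style sketch, in which a hypothetical controller maintains a running model of the expected benefit of each difficulty level and occasionally explores to keep that model current. None of this corresponds to a published controller; the gain values, noise level, and exploration rate are invented.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical sketch: the controller chooses among discrete exercise
# difficulty levels. Each level yields a noisy "recovery gain"; the
# controller keeps a running model (mean gain per level) and trades off
# exploiting the current model vs. exploring to keep it up to date.
true_gain = np.array([0.2, 0.5, 0.35])   # unknown to the controller
n_levels = len(true_gain)
epsilon = 0.1                            # exploration rate

est = np.zeros(n_levels)                 # model: estimated gain per level
count = np.zeros(n_levels)
choices = np.zeros(2000, dtype=int)
for t in range(2000):
    if rng.random() < epsilon:
        level = rng.integers(n_levels)   # explore: refresh the model
    else:
        level = int(np.argmax(est))      # exploit: use the model
    gain = true_gain[level] + rng.normal(0.0, 0.1)
    count[level] += 1
    est[level] += (gain - est[level]) / count[level]  # running mean
    choices[t] = level
```

Pure exploitation would risk locking onto a stale model; the small exploration rate keeps all estimates updated while still concentrating trials on the level the model currently rates best.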

# **DIRECTIONS FOR FUTURE RESEARCH**

Computational approaches to the study of neuromotor recovery after stroke are innovative and promising, but still in their infancy. The models described in this review open a new view of the recovery process, and along these lines there is still a lot to understand, to discover, and to integrate into clinical practice.

We can identify the following directions for future research: (i) multi-level models; (ii) the role of modularity in neuromotor recovery, and (iii) new modeling approaches.

The models reviewed in this paper capture important and complementary aspects of the recovery process at the central, muscular and functional levels. With few exceptions, the focus is on a single level of description. Convergence toward multiple levels of description may provide a more comprehensive representation of the different aspects of the recovery process and of their interconnections. Multi-level models are being used successfully in other related fields, such as musculo-skeletal disorders (see Fregly et al., 2012 for a review), and are of great importance both for a more comprehensive understanding of the mechanisms related to the observed phenomenon—in our case, neuromotor recovery—and for planning the most appropriate intervention.

The well-established notion of modularity in the motor system has been receiving renewed attention in computational motor control. In the context of neuromotor recovery, the view of motor strategies as combinations drawn from a repertoire of muscle synergies may be the key to unveiling the role of mechanical redundancy in counteracting the effects of a central focal lesion, either toward compensation or true recovery. A recent study (Overduin et al., 2012) highlighted the neural correlates of muscle synergies in two rhesus macaque monkeys. However, more work is needed to understand how a focal lesion affects muscle synergies and how they evolve over time as a consequence of spontaneous recovery and/or exercise-based therapy.

One main difficulty in this application domain is our limited understanding of the physiological mechanisms underlying synaptic plasticity and neural reorganization. Moreover, these phenomena are hard to monitor and quantify experimentally, so that models at the central level are difficult to identify from empirical observations and have poor predictive power. Novel modeling approaches, relying on quantities that can be observed with simple and non-invasive procedures, such as fMRI and EEG, would facilitate progress in this respect. Novel modeling approaches can also benefit the study of the functional level of description. Novel computational approaches used for describing the neural control of movements and motor learning might find insightful application in the study of the recovery process. Optimization is a particularly viable concept for describing neuromotor recovery: given the available sensory information and the constraints that derive from the actual impairment (sensory, motor), it may suggest how the next voluntary command is selected. Bayesian inference, optimal control and reinforcement learning may play a role here.

Computational models have been successfully applied to the study of motor learning and adaptation, providing important insights into how the brain controls movement and reacts to changes in the environment or in task variables. Their application to the rehabilitation field is fairly new, and the first approaches suggest that they will lead to a deeper understanding of the mechanisms underlying neuromotor recovery.

# **ACKNOWLEDGMENTS**

This work was partly supported by the EU Grant FP7-ICT-271724 (HUMOUR), by a grant from the Italian Ministry of Research (PRIN 2009), and by the COST Action TD1006 (European Network on Robotics for NeuroRehabilitation).

# **REFERENCES**

Barbay, S., Plautz, E. J., Friel, K. M., Frost, S. B., Dancause, N., Stowe, A. M., et al. (2006). Behavioral and neurophysiological effects of delayed training following a small ischemic infarct in primary motor cortex of squirrel monkeys. *Exp. Brain Res.* 169, 106–116. doi: 10.1007/s00221-005-0129-4



Krakauer, J. W. (2006). Motor learning: its relevance to stroke recovery and neurorehabilitation. *Curr. Opin. Neurol.* 19, 84–90. doi: 10.1097/01.wco.0000200544.29915.cc



reaching impairment after stroke using a population vector model of movement control that incorporates neural firing-rate variability. *Neural Comput.* 15, 2619–2642. doi: 10. 1162/089976603322385090


*Med. Rehabil.* 75, 858–860. doi: 10. 1016/0003-9993(94)90109-0

Schaechter, J. D. (2004). Motor rehabilitation and brain plasticity after hemiparetic stroke. *Prog. Neurobiol.* 73, 61–72. doi: 10.1016/j.pneurobio. 2004.04.001

Scheidt, R. A., Dingwell, J. B., and Mussa-Ivaldi, F. A. (2001). Learning to move amid uncertainty. *J. Neurophysiol.* 86, 971–985.


131–140. doi: 10.1007/s00221-002- 1340-1


Wolbrecht, E. T., Chan, V., Reinkensmeyer, D. J., and Bobrow, J. E. (2008). Optimizing compliant, model-based robotic assistance to promote neurorehabilitation. *IEEE Trans. Neural Syst. Rehabil. Eng.* 16, 286–297. doi: 10.1109/TNSRE.2008. 918389

**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 22 March 2013; paper pending published: 26 April 2013; accepted: 25 June 2013; published online: 22 August 2013.*

*Citation: Casadio M, Tamagnone I, Summa S and Sanguineti V. (2013)* *Neuromotor recovery from stroke: computational models at central, functional, and muscle synergy level. Front. Comput. Neurosci. 7:97. doi: 10.3389/ fncom.2013.00097*

*This article was submitted to the journal Frontiers in Computational Neuroscience.*

*Copyright* © *2013 Casadio, Tamagnone, Summa and Sanguineti. This is an open-access article distributed under the* *terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

### *Y. P. Ivanenko1 \*, G. Cappellini 1,2, I. A. Solopova3, A. A. Grishin3, M. J. MacLellan1, R. E. Poppele4 and F. Lacquaniti 1,2,5*

*<sup>1</sup> Laboratory of Neuromotor Physiology, Santa Lucia Foundation, Rome, Italy*

*<sup>2</sup> Centre of Space Bio-Medicine, University of Rome Tor Vergata, Rome, Italy*

*<sup>3</sup> Laboratory of Neurobiology of Motor Control, Institute for Information Transmission Problems, Russian Academy of Sciences, Moscow, Russia*

*<sup>4</sup> Department of Neuroscience, University of Minnesota, Minneapolis, MN, USA*

*<sup>5</sup> Department of Neuroscience, University of Rome Tor Vergata, Rome, Italy*

### *Edited by:*

*Martin Giese, University Clinic Tuebingen/Hertie Institute, Germany*

### *Reviewed by:*

*Maurizio Mattia, Istituto Superiore di Sanità, Italy*

*Walter Maetzler, University of Tuebingen, Germany*

### *\*Correspondence:*

*Y. P. Ivanenko, Laboratory of Neuromotor Physiology, Scientific Institute Foundation Santa Lucia, 306 via Ardeatina, 00179 Rome, Italy e-mail: y.ivanenko@hsantalucia.it*

Human locomotor movements exhibit considerable variability and are highly complex in terms of both neural activation and biomechanical output. The building blocks with which the central nervous system constructs these motor patterns can be preserved in patients with various sensory-motor disorders. In particular, several studies highlighted a modular burst-like organization of the muscle activity. Here we review and discuss this issue with a particular emphasis on the various examples of adaptation of locomotor patterns in patients (with large fiber neuropathy, amputees, stroke and spinal cord injury). The results highlight plasticity and different solutions to reorganize muscle patterns in both peripheral and central nervous system lesions. The findings are discussed in a general context of compensatory gait mechanisms, spatiotemporal architecture and modularity of the locomotor program.

**Keywords: compensation, plantarflexor weakness, spinal cord injury, stroke, EMG activity, modularity, locomotor pattern generation**

# **INTRODUCTION**

Investigating locomotor responses after neurological lesions is fundamental to the development of improved rehabilitation strategies and to exploring the mechanisms involved in improving locomotor function. The problem of motor neurorehabilitation is significant and complex. Numerous studies have shown that motor activity after brain damage plays an essential role in anatomo-physiological reorganization, which may occur in the areas adjacent to the damage (Cao et al., 1998; Nelles et al., 1999). Nevertheless, the building blocks with which the central nervous system constructs the motor patterns can be preserved in patients with neurological disorders. In particular, several studies have highlighted a modular burst-like organization of muscle activity.

While biomechanical and neural aspects of human locomotion have been documented in many studies both in normal and pathological gait, the architecture of neural circuits and the nature of descending neural signals that are involved in locomotor control remain elusive in humans. To date, little work has been completed on characterizing the neural substrates for modularity in both healthy individuals and in neurological patients with different sensory-motor disorders. A number of studies explored the bases of central motor programming by decomposing muscle activation patterns as a means to look backward from the periphery to the CNS (Davis and Vaughan, 1993; Prentice et al., 1998; d'Avella and Bizzi, 2005; Ivanenko et al., 2006; Giszter et al., 2007; Tresch and Jarc, 2009; Chvatal and Ting, 2012; Bizzi and Cheung, 2013; Lacquaniti et al., 2013). While different studies use different decomposition techniques, the common message is the emphasis on modular architecture of the motor output. Furthermore, these computational techniques often converge to a similar solution (Ivanenko et al., 2005; Tresch et al., 2006). The data and concepts discussed here offer a new approach to characterizing the mechanisms underlying control of human locomotion that may potentially benefit the study of pathological gait and the ability of current therapeutic exercises to improve patient outcomes.

In patients, the mechanisms involved in locomotor improvements may rely on the inherent spatiotemporal organization of neural circuitry and its adaptability. The question arises as to whether the rhythmic patterning elements are invariant when muscle activation patterns can be compromised by spinal cord lesions, brain damage and other motor disturbances. Compensatory strategies for plantarflexor weakness or after distal limb segment amputation also represent important examples of locomotor adaptations. Several recent studies provide some clues on this topic. We will consider and discuss these examples in the following sections.

### **LOCOMOTOR PATTERNS IN HEALTHY SUBJECTS**

Muscle activity during normal locomotion has both invariant and variant features. In each step, the control system needs to compensate for body weight, provide forward and lateral stability, and maintain forward progression. Coordinating a musculoskeletal system with non-linear properties and multiple degrees of freedom is complex and requires the activity of tens of leg muscles. Major muscle activity during walking tends to be organized in bursts at specific moments of the gait cycle (**Figure 1A**) to perform specific functions dictated by the biomechanics of bipedal walking (Winter, 1989; Zajac et al., 2003; Lacquaniti et al., 2012). For instance, in early stance hip and knee extensors contribute to weight acceptance at heel contact (**Figure 1B**). Ankle plantar flexors provide body support and forward propulsion in late stance, while ankle dorsiflexors and hip flexors contribute to foot lift-off in early- to mid-swing. Simultaneously, erector spinae muscles activate at this time to stabilize the trunk. In late swing, hamstrings decelerate the leg in preparation for heel contact and then stabilize the pelvis. Throughout the entire step cycle, adductor muscles contribute to the control of medio-lateral accelerations of the center of body mass. However, it is worth noting that most leg muscles involved in the control of forward progression in the sagittal plane have a noticeable lateral component of force production and thus are also involved in the control of motion in non-sagittal planes. Even though there is a relationship between the neural and biomechanical control of the gait cycle, the system is evidently much more complex due to the dynamic coupling of multiple body segments (e.g., Zajac et al., 2003).

**FIGURE 1 | Motor patterns in healthy volunteers. (A)** Ensemble-averaged EMGs (*n* = 8 subjects) recorded from 10 ipsilateral leg muscles during walking on a treadmill at 5, 7, and 9 km/h. At 9 km/h, there is an "atypical" burst of activity in several thigh muscles that is synchronous with the peak activity in the calf muscles [the data are illustrated from Ivanenko et al. (2008)]. ST, semitendinosus; BF, biceps femoris; TFL, tensor fascia latae; SART, sartorius; Gmed, gluteus medius; RF, rectus femoris; VM, vastus medialis; VL, vastus lateralis; MG, gastrocnemius medialis; LG, gastrocnemius lateralis; SOL, soleus; TA, tibialis anterior. **(B)** Ensemble-averaged (±*SD*) ankle, knee and hip moments of force (normalized to the subject's weight) of the right leg during overground walking at about the same speeds (as in panel **A**) in one representative healthy subject.
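Ensemble-averaged EMG profiles of the kind shown in Figure 1A are typically obtained by time-normalizing each stride to a fixed percentage of the gait cycle and then averaging across strides. The sketch below is a hedged, generic illustration of that procedure (the synthetic data and parameters are assumptions, not the recordings from the cited studies).

```python
import numpy as np

# Hedged sketch: each stride is resampled onto a common 0-100% gait-cycle
# axis, then the mean and SD across strides are taken.  The synthetic
# signals stand in for rectified, low-pass-filtered EMG envelopes.
def ensemble_average(strides, n_points=100):
    """Time-normalize each stride to n_points samples and average."""
    resampled = []
    for stride in strides:
        t_old = np.linspace(0.0, 1.0, len(stride))
        t_new = np.linspace(0.0, 1.0, n_points)
        resampled.append(np.interp(t_new, t_old, stride))
    resampled = np.asarray(resampled)
    return resampled.mean(axis=0), resampled.std(axis=0)

rng = np.random.default_rng(0)
# Strides of slightly different durations, each with a burst near 60% of the cycle
strides = []
for _ in range(12):
    n = int(rng.integers(90, 110))
    t = np.linspace(0.0, 1.0, n)
    strides.append(np.exp(-((t - 0.6) ** 2) / 0.005) + 0.05 * rng.standard_normal(n))

mean_env, sd_env = ensemble_average(strides)
print(int(np.argmax(mean_env)))  # burst peak lands near 60% of the cycle
```

Time-normalization is what makes strides of different durations comparable, which is why burst timing is usually reported as a percentage of the gait cycle rather than in seconds.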

Whereas this "invariant" picture of muscle activation (**Figure 1A**) has been documented in healthy subjects, there are also variant features of muscle activity that depend on the context and on differences between individuals. Aiming to reactivate the "normal" motor patterns in patients and to extrapolate them to any walking condition may therefore not benefit the patient, owing to the specific pathology or to individual differences that occur in pathological as well as healthy subjects. Muscle activity in healthy subjects may show markedly non-linear changes in both amplitude and temporal envelope, e.g., with changing speed or body weight support, even while kinematic patterns remain similar (Ivanenko et al., 2002; Lacquaniti et al., 2002). For instance, the amplitude of EMG activity of "anatomical" synergists may diverge remarkably in these conditions: lateral and medial gastrocnemius muscles at different walking speeds (Huang and Ferris, 2012), soleus and gastrocnemius muscles at different levels of limb loading (McGowan et al., 2010). With body weight unloading (Ivanenko et al., 2002), most muscles (e.g., gluteus maximus and distal leg extensors) decrease their activity, while other muscles demonstrate a "paradoxical" increase in activation (e.g., quadriceps) or considerable changes in the activation waveforms (hamstring muscles). In addition, muscle activity patterns are shaped by the direction of progression (e.g., forward vs. backward, Grasso et al., 1998, or walking along a curved path, Courtine et al., 2006). In particular, such studies suggest that comparisons of normal and pathological gait should preferably be performed under the same stepping conditions.

There is also notable inter-individual variability in muscle activity during walking (Winter and Yack, 1987). The most variable patterns are observed in the proximal and bi-articular muscles, especially at lower walking speeds. For example, quadriceps activity is virtually silent in some subjects at low speeds (*<*4 km/h), whereas it is still present in others (Ivanenko et al., 2002). Finally, notable systematic changes in EMG activity during walking occur with age, e.g., co-contraction of leg muscles in infants (Forssberg, 1985; Teulier et al., 2012; Ivanenko et al., 2013b) and widening of EMG patterns in the elderly (Monaco et al., 2010).

Nevertheless, a number of recent studies using statistical analyses of EMG suggest that the nervous system may adopt a relatively simple control strategy (e.g., Ivanenko et al., 2005; Cappellini et al., 2006; Clark et al., 2010; McGowan et al., 2010; Monaco et al., 2010). Using pattern-recognition mathematics, both the stereotypical activation patterns and the across-stride variability of these patterns can be accounted for by combining and scaling a small set of basic activation components. This tendency for a few basic patterns to account for about 90% of the variance indicates that leg muscles tend to group their activity in order to perform specific biomechanical functions during gait.
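One common instance of such pattern-recognition mathematics is non-negative matrix factorization, which writes the muscle-by-time EMG matrix as a product of muscle weightings and a few basic activation components. The sketch below is an illustrative toy example on synthetic data (it is not the pipeline of the cited studies); the variance-accounted-for (VAF) measure echoes the ~90% figure mentioned above.

```python
import numpy as np

# Illustrative sketch (assumed example): non-negative matrix
# factorization E ~= W @ C via multiplicative updates (Lee & Seung),
# with W the muscle weightings and C the basic activation components.
def nmf(E, n_components, n_iter=500, seed=0):
    rng = np.random.default_rng(seed)
    n_muscles, n_samples = E.shape
    W = rng.random((n_muscles, n_components))
    C = rng.random((n_components, n_samples))
    eps = 1e-9
    for _ in range(n_iter):
        C *= (W.T @ E) / (W.T @ W @ C + eps)
        W *= (E @ C.T) / (W @ C @ C.T + eps)
    return W, C

def vaf(E, W, C):
    """Variance accounted for by the reconstruction W @ C."""
    return 1.0 - np.sum((E - W @ C) ** 2) / np.sum(E ** 2)

# Synthetic "EMG": 10 muscles built from 4 underlying timed components
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 200)
C_true = np.array([np.exp(-((t - c) ** 2) / 0.01) for c in (0.1, 0.35, 0.6, 0.85)])
W_true = rng.random((10, 4))
E = W_true @ C_true + 0.02 * rng.random((10, 200))

W, C = nmf(E, n_components=4)
print(round(float(vaf(E, W, C)), 3))  # a few components capture most variance
```

In practice the number of components is chosen as the smallest set whose VAF exceeds a threshold (often around 90%, as in the studies cited above).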

While intuitively one would expect some changes in EMG activity in patients, the question at hand is whether the basic modular patterns or the functional grouping of muscles are conserved in patients. Below we summarize various examples of motor patterns in patients with both peripheral and central lesions. The main focus is placed on studies that analyzed multi-muscle EMG patterns.

# **LOCOMOTOR PATTERNS IN PERIPHERAL LESIONS**

Compensatory strategies for plantarflexor weakness or after below-knee amputation represent an important example of gait adaptation. The human bipedal gait and heel-to-toe rolling pattern are unique (Bramble and Lieberman, 2004) and require specific inter-segmental coordination (Lacquaniti et al., 2002), balance control, and walking experience for the acquisition of plantigrade gait at the beginning of independent walking (Forssberg, 1985; Ivanenko et al., 2007). The plantarflexors are an important muscle group that regulates gait speed, compensates for body weight, and provides the vertical and horizontal (anterior-posterior shear) forces during the push-off phase. Weakness of the plantarflexors is considered one of the limiting factors that prevent humans from walking at faster speeds (Nadeau et al., 1999; Brunner and Romkes, 2008).

In addition to the development of extensor forces in the distal antigravity muscles, there is important sensory feedback from these muscles and from numerous foot receptors. Peripheral neuropathy and aging may result in muscle weakness and substantial impairments of sensory feedback and balance control (Nardone et al., 2000, 2006; Nardone and Schieppati, 2004; Mazzaro et al., 2005). For instance, in older adults ankle plantarflexor work remains relatively constant with increasing speed, in contrast to the systematic increase in ankle work output with walking speed in young adults (Winter et al., 1990; Judge et al., 1996).

# **PATIENTS WITH LARGE FIBER PERIPHERAL NEUROPATHY**

Weakness of distal extensors in patients with large-fiber neuropathy can be observed after acute nerve compression in the sciatic notch, associated with a reduced level of motor and sensory function. After sciatic nerve compression there may be a loss of reflexes, movement skills, and sensation in the affected area, and atrophy of the affected muscles can occur (Hagiwara et al., 2003). Sciatica commonly refers to pain that radiates along the sciatic nerve, typically felt in the back of the leg and possibly into the foot, and is one of the most common forms of pain caused by compression of the spinal nerves.

We analyzed adaptations of gait patterns at different walking speeds in four patients with a unilateral large-fiber neuropathy of S1 innervation resulting from acute nerve compression in the sciatic notch. Plantarflexor weakness on the affected side was evidenced by subjective difficulty in lifting and supporting the body weight on the forefoot region (forefoot standing) by plantarflexing the ankle joint during standing. Reflexes and sensory thresholds were all normal in the contralateral leg. **Figure 2** illustrates EMG patterns in these patients during walking at slow and fast speeds. After acute nerve compression in the sciatic notch, the patients walked somewhat slower (even their self-selected fast speed was *<*5 km/h) than healthy individuals. The EMG patterns differed from those of healthy individuals walking at the same speeds (**Figure 1A**) and were variable between the patients as well as between left and right leg muscle activities (**Figure 2**). Part of these differences may originate from slightly different EMG electrode placements and/or skin impedance. Nevertheless, we found an interesting cooperation of distal and proximal extensors (**Figure 2**, marked in red) and discuss its general functional significance for bipedal gait adaptations (Dickey and Winter, 1992; Beres-Jones and Harkema, 2004; Nene et al., 2004; Ivanenko et al., 2008, 2013a).

The "atypical" burst of activity in the proximal leg muscles was more prominent in patient 1 (**Figure 2**) on the affected side, while it could also be observed in other patients on the contralateral side. To better understand its link to the kinematics and kinetics of gait, we recorded patient 1 again during overground (**Figure 3B**) and treadmill walking (**Figures 3A,C**). The second time (1 year later, session 2) the patient had recovered all reflexes, and reduced sensation was limited to the lateral plantar surface of the foot. The patient was now able to fully support his body weight on the left leg during "forefoot standing," although some weakness still remained. The most prominent decrements in angular oscillations and angular velocities were observed in the ankle joint motion (**Figure 3A**). In the knee joint, angular motion asymmetry was significant only at higher speeds (7 and 9 km/h). When the plantarflexor strength and/or reflexes were compromised in one leg, the primary compensatory mechanism was an increase in activity of proximal extensor muscles (VL, RF, VM, TFL, evidenced by red circles in **Figure 2**) during late stance. This compensatory effect was observed at all recorded walking speeds in patient 1 in session 1 (**Figure 2A**) but only at the higher speeds (*>*5 km/h) in session 2 (**Figure 3C**), which required greater propulsion forces. This suggests a possible link to the extent of the weakness in the ankle extensors.

**FIGURE 2 |** Ensemble-averaged bilateral EMG activity of leg muscles during walking on a treadmill at different speeds. In session 1 (**Figure 2A**), the patient could walk only at speeds up to 5 km/h due to plantarflexor weakness, while in session 2 we recorded walking over a wide range of speeds (3–9 km/h). Adapted from Ivanenko et al. (2013a). Note a prominent burst of activity (marked in red) in the proximal extensors during late stance on the affected (left) side at low speeds in session 1 (**Figure 2A**) and only at higher speeds (*>*5 km/h) in session 2. Furthermore, at 9 km/h the "atypical" burst of activity was present in both legs, as in healthy subjects (see **Figure 1A**).

**FIGURE 3 |** [...] **[the same patient as in Figure 2A but recorded 1 year later (session 2)]. (A)** Ensemble-averaged (across 12 consecutive steps) joint angular displacements (left panel, mean ± *SD*) and amplitudes of angular joint motion (right panel) during walking at 3 km/h. Asterisks denote significant differences. Note significantly smaller distal joint oscillations on the affected (left) side. **(B)** Ensemble-averaged (*n* = 5 steps) vertical (*Fz*) and horizontal anterior-posterior (*Fx*) ground reaction forces, and ankle, knee and hip joint moments of force normalized to the patient's weight during overground walking at ∼5 and 7 km/h in session 2 (left panels). The patterns are plotted [...]

Healthy volunteers typically do not show activity in the proximal extensor muscles during late stance in normal walking (Nilsson et al., 1985; Winter, 1989; Prilutsky and Gregor, 2001; Nene et al., 2004; Cappellini et al., 2006). However, the atypical proximal burst was present in all healthy subjects at high, non-preferred walking speeds (**Figure 1A**, right panel). The proximal muscles involved (RF, VM, VL, SART, Gmed, TFL) were activated synchronously with the distal extensors (MG, LG, SOL), as was observed for the patient. At the higher speeds, the muscle gain, or force produced for a given level of activation, may be lower due to the muscle force-velocity relationship (Neptune and Sasaki, 2005), so the force produced by the ankle extensors alone may be insufficient to initiate lift-off or to provide appropriate limb stiffness. The proximal activation may be recruited to compensate by supplying additional extensor torque and stiffness.

Similarities can also be noted between this type of adaptation and unilateral acute pathologies such as hemiplegia (Knutsson and Richards, 1979), below-knee amputation (Winter and Sienko, 1988) or unilateral ischemic block of distal leg muscles in healthy subjects (Dickey and Winter, 1992). Co-activation of distal and proximal extensors during stance in each leg was also observed in clinically incomplete spinal cord injury individuals during weight-bearing treadmill stepping (Beres-Jones and Harkema, 2004, see also below) and in healthy adults walking on a slippery surface (Cappellini et al., 2010). Such similarity indicates that this kind of adaptation does not seem to depend on the location of the lesion within the neuronal system but may be related to the activation of existing basic activation patterns or muscle synergies.

Nevertheless, despite a potentially broad-spectrum functional significance of this compensatory response to plantarflexor weakness, its biomechanical nature remains puzzling. One would perhaps expect cooperation of distal and proximal extensors during knee-flexed locomotion, e.g., as happens during digitigrade gait, which requires a significant positive (anti-gravity) knee torque during stance. However, in our case (**Figure 3B**) there was no generation of positive knee moments of force in response to the extra activation of knee extensors, likely reflecting dynamic coupling between body segments (Zajac et al., 2003). Moreover, biomechanical simulations of the compensatory strategies in response to muscle weakness do not seem to predict the appearance of an "atypical" burst of activity in the proximal extensors (Goldberg and Neptune, 2007). Perhaps a complete consideration of all factors affecting gait optimization is necessary, including a 3D rather than 2D musculoskeletal gait model (e.g., Gmed, TFL, and SART may generate a noticeable trunk torsion or lateral force component, Dostal et al., 1986) and taking into account the mechanisms that regulate leg stiffness during walking (Fiolkowski et al., 2005).

Whatever the exact biomechanical reasons for the observed phenomenon (**Figure 3**), it is important to emphasize that the timing of the reaction in the proximal muscles corresponded to the timing of the calf muscle activation. This supports the idea of a temporal architecture of the locomotor program linked to specific kinematic events (Ivanenko et al., 2005, 2006; Giszter et al., 2007; McGowan et al., 2010) or critical points in the processing of sensory information (Saltiel and Rossignol, 2004) in the gait cycle.

## **TRANSTIBIAL AMPUTEES**

Various efforts have been made to restore normal EMG patterns in patients, presumably by reactivating the central pattern generator (CPG) circuitry, more directly by functional electrical stimulation (FES) (Thrasher et al., 2006; Solopova et al., 2011), or by implementing myoelectric control of powered limb prostheses in amputees (Au et al., 2008; Huang et al., 2011). For instance, transtibial amputees can learn to volitionally activate residual leg muscles (Au et al., 2008; Ha et al., 2011; Hargrove et al., 2011), which can be used for movement intent recognition in the myoelectric control of powered limb prostheses. However, such strategies and their underlying hypotheses are often based on the assumption that the motor patterns are relatively invariant across different walking conditions, for instance, when using FES for gait rehabilitation (Thrasher et al., 2006). Below we consider the reorganization of EMG activity in transtibial amputees.

Below-knee amputation represents severe damage to the neuromuscular apparatus of the leg and impairs sensory feedback. As a result, below-knee amputation may produce EMG patterns different from those in healthy subjects. For instance, co-activation of distal and proximal extensors during stance, similar to that described in the previous section (**Figures 2**, **3C**), was also observed in below-knee amputees (Winter and Sienko, 1988).

Of particular interest is residual lower leg muscle activation following such an amputation. For instance, if the CPG output were relatively fixed (e.g., providing an alternating activity of flexors and extensors, Zehr, 2005) one would not expect major changes in residual muscle activation profiles. **Figure 4** illustrates the results of a recent study on multi-muscle EMG activity in both proximal and residual leg muscles during walking in transtibial amputees (Huang and Ferris, 2012). In the upper leg muscles, the data showed that amputee subjects had greater inter-subject variability in their biceps femoris and gluteus medius muscle activation profiles compared to control subjects during walking, as well as a different BF activation profile shape (**Figure 4**, right panels). Amputee subjects also demonstrated reliable muscle recruitment signals from residual lower leg muscles recorded within the prosthetic socket during walking. However, the grouping of muscles activated together differed from that in controls (see, for instance, "atypical" co-activation of Gmed, BF and MG in the A02 subject or TA, BF and Gmed in the A10 subject). Overall, muscle activation profile variability was higher for amputee subjects than for control subjects (Huang and Ferris, 2012). Nevertheless, it is interesting to note that muscle recruitment signals in amputees tended to be locked to particular phases of the gait cycle (**Figure 4**).

# **LOCOMOTOR PATTERNS IN CENTRAL LESIONS**

In contrast to impairments of peripheral sensory feedback or the neuromuscular apparatus, nervous system lesions may essentially affect central controllers and thus provide some insights into the spatiotemporal organization of neural circuitry. In particular, if muscle modules are indeed mechanisms by which task-level biomechanical goals are implemented, one would expect that impairments to the neural control of such modules would directly result in impaired biomechanical outputs (Cheung et al., 2009; Ivanenko et al., 2009; Clark et al., 2010). In addition, lesions at different levels of the neuraxis could differentially affect locomotor control. Adaptation of gait after cortical, subcortical, or spinal cord damage might thus represent the experimental field in which one can test such hypotheses and the therapeutic relevance of different interventions.

### **STROKE PATIENTS**

Post-stroke locomotor impairments are often associated with abnormal spatiotemporal patterns of muscle coordination (Knutsson and Richards, 1979; De Quervain et al., 1996; Mulroy et al., 2003; Den Otter et al., 2007). Furthermore, impaired locomotor coordination post-stroke may be accompanied by fewer modules (Clark et al., 2010; Safavynia et al., 2011), though in a recent study Gizzi et al. (2011) argued that impulses of activation, rather than muscle synergies, are preserved in the locomotion of subacute stroke patients (**Figure 5**). The discrepancies could be accounted for by different sets of recorded muscles or different patient populations. The authors of both studies hypothesized that identification of motor modules may lead to new insight into how nervous system injury alters the organization of motor modules and their biomechanical outputs. Furthermore, entraining appropriate motor modules can be of major importance for neurorehabilitation of gait in these patients, since many of them develop an abnormal stereotype of movement during walking that is difficult to correct.

A similar conclusion has been reached in recent studies on upper limb control (Cheung et al., 2009, 2012). All patients studied suffered from a mostly unilateral cortical and subcortical lesion resulting from either an ischemic or a hemorrhagic stroke. The robustness of muscle synergies observed in those studies supports the notion that descending cortical signals represent neural drives that select, activate, and flexibly combine muscle synergies specified by networks in the spinal cord and/or brainstem, and it suggests an approach to stroke rehabilitation focusing on the synergies with altered activations after stroke. Despite higher variability in muscle activation patterns, all these studies suggest a modular organization of muscle coordination underlying motor control in both healthy and post-stroke subjects.

In fact, one of the effective approaches to gait rehabilitation after stroke consists of using step-synchronized FES of leg muscles (Yan et al., 2005; Tong et al., 2006; Ferrante et al., 2008). FES has been shown to be an effective tool for muscle strength augmentation, increases in the range of motion of joints, and improvements in walking in neurological patients (Popovic et al., 2009). FES is delivered in reference to the timing of natural muscle excitation during movement, and it provides additional sensory reinforcement, which ultimately improves learning. As a result, the locomotor centers or networks are excited or released from inhibition in phase with their expected activity and thus may be accessible for correction or stimulating effects (Yan et al., 2005; Tong et al., 2006; Popovic et al., 2009; Solopova et al., 2011). This approach thus takes advantage of the spatiotemporal architecture of the locomotor program and increases the patient's functional abilities and the effectiveness of rehabilitation.
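The timing logic behind step-synchronized stimulation can be sketched very simply: each muscle group is stimulated only within its expected activation window in the gait cycle. The window boundaries below are illustrative assumptions for exposition, not clinically validated values from the cited FES studies.

```python
# Hedged sketch of phase-triggered stimulation logic.  Each channel is
# enabled only inside its expected activation window, expressed as a
# percentage of the gait cycle (values here are illustrative assumptions).
STIM_WINDOWS = {
    "plantarflexors": (40, 60),   # push-off, late stance
    "dorsiflexors": (60, 100),    # foot lift and swing clearance
}

def stim_command(percent_cycle, windows=STIM_WINDOWS):
    """Return which channels to stimulate at a given % of the gait cycle."""
    return [muscle for muscle, (start, end) in windows.items()
            if start <= percent_cycle < end]

print(stim_command(50))   # late stance
print(stim_command(75))   # swing
```

A real controller would estimate the gait-cycle phase online (e.g., from foot-switch or inertial signals) and feed it to such a lookup, which is what keeps the stimulation in phase with the expected activity of the locomotor networks.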

### **SPINAL CORD INJURY**

Flexibility and adaptability of locomotor patterns are evident from monitoring and analyzing the spatiotemporal spinal segmental output after spinal cord injury. For instance, in motor-incomplete paraplegics who recovered independent control of their limbs, an additional activation burst is present in the lumbosacral enlargement at full loading (**Figures 6A,C**). The presence of this burst is related to abnormal activation of the quadriceps muscle during this time. Patients can be trained to step unassisted with body weight support, but they use activity patterns in individual muscles that are often different from those of healthy individuals.

A number of clinical trials have suggested possible beneficial effects of locomotor training in SCI patients (Edgerton and Roy, 2012). In patients with severe SCI, initial training is performed while the patient is supported by a harness or with the body partially unloaded; assistance of leg movements by a therapist (or robotics) may also be required. These patients frequently show EMG patterns different from those of healthy individuals, suggesting that the human spinal cord can interpret loading- or velocity-dependent sensory input differently during stepping (Beres-Jones and Harkema, 2004). One method used to study such variability involves reconstructing the total output pattern of motoneuron activity of the lumbosacral enlargement of the spinal cord by mapping the recorded EMG waveforms onto the known charts of segmental localization (Ivanenko et al., 2006). Spatiotemporal maps of motoneuron activity are generally different from those of healthy subjects (**Figure 6C**; Grasso et al., 2004). The legs may also show muscle activity that is not systematically synchronized with the gait cycle on the most affected side (**Figure 6B**; marked in red). Although training of stepping with body weight support can be facilitated in a laboratory setting, new coordinative strategies often appear. Training can be used to help patients relearn the foot kinematics of healthy individuals, but the muscle activation patterns used to generate these kinematics differ from those of a healthy group (Pepin et al., 2003; Grasso et al., 2004). While most incomplete paraplegics can recover independent control of leg muscles sufficient to propel the limbs in swing and to support body weight in stance, complete paraplegics were unable to do so and typically used their arms and body to assist the leg movements. SCI patients largely relied on proximal and axial muscles to lift the foot and to project the limb forward.
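The mapping approach described above can be sketched as a weighted projection: each muscle's EMG envelope is distributed over the spinal segments that innervate it, and contributions are summed per segment to form a spatiotemporal map. The segmental weights and envelopes below are illustrative assumptions, not the published innervation charts used in the cited work.

```python
import numpy as np

# Sketch of projecting EMG envelopes onto spinal segments.  Rows of the
# chart give each muscle's (assumed, illustrative) weights over segments
# L2, L3, L4, S1, S2.
innervation = {
    "RF": [0.5, 0.5, 0.0, 0.0, 0.0],   # rectus femoris   ~ L2-L3
    "TA": [0.0, 0.0, 1.0, 0.0, 0.0],   # tibialis anterior ~ L4
    "MG": [0.0, 0.0, 0.0, 0.5, 0.5],   # gastrocnemius     ~ S1-S2
}

def motoneuron_map(emg, chart):
    """Weight each muscle's envelope by its segmental chart and sum per segment."""
    n_seg = len(next(iter(chart.values())))
    n_time = len(next(iter(emg.values())))
    segments = np.zeros((n_seg, n_time))
    for muscle, env in emg.items():
        w = np.asarray(chart[muscle])[:, None]      # (n_segments, 1)
        segments += w * np.asarray(env)[None, :]    # broadcast over time
    return segments  # (n_segments, n_time) spatiotemporal map

# Synthetic envelopes: each muscle bursts at a different phase of the cycle
t = np.linspace(0.0, 1.0, 100)
emg = {
    "RF": np.exp(-((t - 0.1) ** 2) / 0.01),
    "TA": np.exp(-((t - 0.7) ** 2) / 0.01),
    "MG": np.exp(-((t - 0.45) ** 2) / 0.01),
}
m = motoneuron_map(emg, innervation)
print(m.shape)  # one row per segment, one column per % of the gait cycle
```

Plotted as an intensity image (segments on the vertical axis, gait-cycle phase on the horizontal), such a matrix gives the spatiotemporal motoneuron-activity maps compared between patients and healthy subjects in the text.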

It is thought that spinal lesions may trigger plasticity, including modified synaptic strengths, sprouting, and the anatomical development of new circuits, as well as plasticity of unlesioned descending pathways involving both subcortical and cortical motor areas. Stepping may also depend more heavily on cortical (and voluntary) control after severe spinal lesions (Van den Brand et al., 2012) than it does in healthy subjects, in whom locomotion may be more automatic. The spinal cord itself also contributes to the proposed adaptation mechanisms. Indeed, experiments on both animals and SCI patients suggest that the spinal cord is capable of adaptive locomotor plasticity with training after spinal lesion (Hodgson et al., 1994; Belanger et al., 1996; Heng and de Leon, 2007) or after peripheral motor nerve lesions (Bouyer et al., 2001).

Modular pattern generator elements (or burst synergies) nevertheless tend to be conserved after spinal cord injury (Fox et al., 2013; Giszter and Hart, 2013). Our previous work in SCI patients (Ivanenko et al., 2003) has shown a similar set of temporal components extracted from EMG activity (**Figure 6D**). In addition, muscles both rostral and caudal to the lesion could be strongly weighted on a given component. However, some patients do exhibit a smaller number of basic components during walking (Ivanenko et al., 2003; Hayes et al., 2011; Fox et al., 2013), although these results may depend on the number and selection of muscles recorded during stepping. Nevertheless, the similar activation timings in SCI patients (**Figure 6D**) may ultimately be related to the global kinematic goal (a motor-equivalent solution; Grasso et al., 2004) or to the necessity to apply forces at particular phases of the gait cycle (Lacquaniti et al., 2012). These data highlight the importance of understanding the modular structure of motor behaviors to best provide principled therapies after central nervous system lesions (Giszter and Hart, 2013).
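As a rough sketch of how such temporal components can be obtained, the following uses an SVD-based decomposition as a simplified stand-in for the factor analysis used in the cited studies; the data shapes and component count are illustrative assumptions:

```python
import numpy as np

def temporal_components(emg, n_components):
    """Extract basic temporal components from an EMG matrix
    (time_bins x muscles) via SVD: a simplified stand-in for the
    factor analysis used in the cited studies."""
    X = emg - emg.mean(axis=0)                 # center each muscle
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    components = U[:, :n_components] * S[:n_components]  # time courses
    weights = Vt[:n_components]                # muscle weighting coefficients
    return components, weights

emg = np.abs(np.random.randn(200, 12))         # toy: 200 time bins, 12 muscles
comps, w = temporal_components(emg, 5)
print(comps.shape, w.shape)  # (200, 5) (5, 12)
```

The time courses correspond to the basic temporal components compared across controls and patients, and the weights to the muscle-specific loading coefficients.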

**FIGURE 6 | (partial caption).** […] and incomplete (at 0.89 m/s, right panel) SCI individuals [modified from Beres-Jones and Harkema (2004) and Maegele et al. (2002) with permission of the authors]. The stance phase in the right panel is evidenced by the elevation in the ground reaction force trace and indicated by the shaded region. MH, medial hamstring; load, vertical ground reaction force. **(B)** Ensemble-averaged (across 5 strides) EMG patterns in the SCI-C patient during walking at a natural speed (∼3.1 km/h). Note the variable and weaker muscle activity on the most affected side (marked in red). **(C)** Examples of spatiotemporal patterns of α-motoneuron activity in the lumbosacral enlargement in controls and […] EMG waveforms (normalized method, see Ivanenko et al., 2006) onto the known charts of segmental localization. White vertical lines denote the stance-to-swing transition time. **(D)** Time course of the temporal components in controls and patients for stepping at 2 km/h, 0–75% body weight support. The components were extracted by factor analysis from individual subjects. The right panel illustrates the weighting coefficients of the temporal components in the individual activity patterns of 12 muscles for all groups of subjects on a color-coded scale. Adapted from Ivanenko et al. (2003). Note the similar basic EMG components in controls and patients, as opposed to the quite different EMG patterns and weighting coefficients.

# **CONCLUDING REMARKS**

Taken together, the data support the idea of plasticity and distributed networks for controlling human locomotion (Scivoletto et al., 2007). Tens of muscles participate in the control of limb and body movements during locomotion, and redundancy in the neuromuscular system is an essential element of gait adaptability (Winter, 1989; Cai et al., 2006; Noble and Prentice, 2006; Ivanenko et al., 2009; Molinari, 2009; Duysens et al., 2013). Indeed, experimental studies performed on individuals with well-identified pathologies have demonstrated distinct adaptations. Due to muscle redundancy, various neuromotor strategies may exist to compensate for decreased muscle strength and joint stiffness (Grasso et al., 2004; Goldberg and Neptune, 2007; Ivanenko et al., 2009; Gordon et al., 2013).

A modular motor organization may be needed to solve the degrees-of-freedom problem in biological motor control (Giszter et al., 2010). Nevertheless, many open questions remain concerning the choice of appropriate modules, their task dependence, the influence of sensory input, and adaptation to the malfunctioning of neural networks in different gait pathologies. While many studies have succeeded in decomposing motor patterns into a few "motor modules," the way in which the central nervous system combines them, and how and where the weighting coefficients are encoded, is not understood. It is often difficult to distinguish what stems primarily from the pathology and what reflects compensatory mechanisms. We suggest that many adaptive features in various neurological disorders, including modified EMG patterns during walking, are likely compensatory. In view of the task dependence of muscle synergies, it would also be interesting to compare them across different behaviors and to examine whether plasticity in muscle patterns originates from sharing common modules or from creating new muscle synergies.

An impulsive (burst-like) controller made of a low-dimensional set of time-delayed excitation pulses has also been considered thoroughly in a simulation study from the biomechanical viewpoint (Sartori et al., 2013). In particular, simulated gait motions based on a few modular activation patterns were successfully produced (see also Neptune et al., 2009; Allen and Neptune, 2012; Allen et al., 2013). Once calibrated, the musculoskeletal model could work in open loop, approximating joint moments over multiple degrees of freedom using only the recorded kinematics and the internal impulsive controller. The accuracy of the joint-torque estimates was comparable when using the low-dimensional activation signals (Sartori et al., 2013). This approach has substantial implications for the design of human-machine interfaces for prosthetic and orthotic devices.

Uncovering a common underlying neural framework for the modular control of human locomotion and its development represents an interesting avenue for future work. Motor primitives may reflect how the nervous system develops, building up or modifying modules as it matures. Some functional units are likely inborn; others may develop later or depend on individual body size/proportions or experience (Dominici et al., 2011). Such investigations may have important implications for the construction of gait rehabilitation technology.

# **ACKNOWLEDGMENTS**

Supported by the Italian Ministry of Health, the Italian University Ministry (PRIN project), the Italian Space Agency (DCMC and CRUSOE grants), and the European Union FP7-ICT program (MINDWALKER grant #247959 and AMARSi grant #248311).

# **REFERENCES**

*Gait Posture* 27, 399–407. doi: 10.1016/j.gaitpost.2007.05.009

*J. Neurosci.* 32, 12237–12250. doi: 10.1523/JNEUROSCI.6344-11.2012

Mondì, V., Cicchese, M., et al. (2011). Locomotor primitives in newborn babies and their development. *Science* 334, 997–999. doi: 10.1126/science.1210617

neuroethological perspective. *Prog. Brain Res.* 165, 323–346. doi: 10.1016/S0079-6123(06)65020-6

for stepping during development from infant to adult. *J. Neurosci.* 33, 3025–3036a. doi: 10.1523/JNEUROSCI.2722-12.2013

motor system in hemiplegic stroke patients: a positron emission tomography study. *Stroke* 30, 1510–1516. doi: 10.1161/01.STR.30.8.1510

474–480. doi: 10.1007/s002210050591

spinal cord injury using functional electrical stimulation. *Spinal Cord* 44, 357–361. doi: 10.1038/sj.sc.3101864
**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 30 November 2012; accepted: 16 August 2013; published online: 10 September 2013.*

*Citation: Ivanenko YP, Cappellini G, Solopova IA, Grishin AA, MacLellan MJ, Poppele RE and Lacquaniti F (2013) Plasticity and modular control of locomotor patterns in neurological disorders with motor deficits. Front. Comput. Neurosci. 7:123. doi: 10.3389/fncom.2013.00123*

*This article was submitted to the journal Frontiers in Computational Neuroscience.*

*Copyright © 2013 Ivanenko, Cappellini, Solopova, Grishin, MacLellan, Poppele and Lacquaniti. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Are muscle synergies useful for neural control?

#### *Aymar de Rugy<sup>1</sup>\*, Gerald E. Loeb<sup>2</sup> and Timothy J. Carroll<sup>1</sup>*

*<sup>1</sup> Centre for Sensorimotor Neuroscience, School of Human Movement Studies, The University of Queensland, Brisbane, QLD, Australia <sup>2</sup> Department of Biomedical Engineering, University of Southern California, Los Angeles, CA, USA*

### *Edited by:*

*Andrea D'Avella, IRCCS Fondazione Santa Lucia, Italy*

### *Reviewed by:*

*Lena H. Ting, Emory University and Georgia Institute of Technology, USA Kevin Englehart, University of New Brunswick, Canada*

### *\*Correspondence:*

*Aymar de Rugy, Centre for Sensorimotor Neuroscience, School of Human Movement Studies, The University of Queensland, Room 424, Building 26, St. Lucia, QLD 4072, Australia. e-mail: aymar@hms.uq.edu.au*

The observation that the activity of multiple muscles can be well approximated by a few linear synergies is viewed by some as a sign that such low-dimensional modules constitute a key component of the neural control system. Here, we argue that the usefulness of muscle synergies as a control principle should be evaluated in terms of the errors produced not only in muscle space, but also in task space. We used data from a force-aiming task in two dimensions at the wrist, obtained with an electromyogram (EMG)-driven virtual biomechanics technique that overcomes typical errors in predicting force from recorded EMG, to illustrate through simulation how synergy decomposition inevitably introduces substantial task-space errors. Then, we computed the optimal pattern of muscle activation that minimizes summed squared muscle activities, and demonstrated that synergy decomposition produced similar results on real and simulated data. We further assessed the influence of synergy decomposition on aiming errors (AEs) in a more redundant system, using the optimal muscle pattern computed for the elbow-joint complex (i.e., 13 muscles acting in two dimensions). Because EMG recordings are typically not available from all contributing muscles, we also explored reconstructions from incomplete sets of muscles. The redundancy of a given set of muscles had opposite effects on the goodness of muscle reconstruction and on task achievement: higher redundancy was associated with better EMG approximation (lower residuals) but with higher AEs. Finally, we showed that the number of synergies required to approximate the optimal muscle pattern for an arbitrary biomechanical system increases with task-space dimensionality, which indicates that the capacity of synergy decomposition to explain behavior depends critically on the scope of the original database. 
These results have implications regarding the viability of muscle synergy as a putative neural control mechanism, and also as a control algorithm to restore movements.

**Keywords: aiming movement, muscle coordination, motor control, biomechanics, optimal control**

# **INTRODUCTION**

There is now considerable evidence from a broad range of tasks and contexts that the activity of multiple muscles can appear to be well approximated by only a few muscle synergies, each defined as a set of fixed relative levels of muscle activation (d'Avella et al., 2003, 2006; Torres-Oviedo et al., 2006; Tresch and Jarc, 2009; Dominici et al., 2011; Roh et al., 2012). Although many view this as a sign that muscle synergy is an important principle used by the nervous system to control movement, we believe that the viability of synergies as control elements needs to be evaluated in relation to task achievement rather than only to accuracy in accounting for observed muscle activity. All synergy decomposition procedures (including, for example, those based on convenient optimization algorithms such as non-negative matrix factorization; Lee and Seung, 2001) care only about explaining as much variance of the muscle activity as possible. These procedures are therefore completely blind to any consideration of task achievement, ignoring the functional significance of the (typically modest) muscle-space errors that are inevitably introduced when approximating an original muscle pattern using fewer synergies than muscles. However, because the musculoskeletal system has complex and highly non-linear properties, statistical methods that minimize errors in the input signal (muscle activity) may result in unacceptably large errors in the output (limb kinematics). A careful assessment of the behavioral errors introduced by synergy decomposition therefore appears necessary to evaluate the viability of synergy as a potential biological control principle. Such errors would also affect the utility of synergy decomposition as a control strategy to restore movement artificially, such as with functional electrical stimulation (FES) or myoelectric control (Davoodi et al., 2003; Parker et al., 2006; Hargrove et al., 2009).

Neptune and colleagues evaluated whether muscle synergies extracted during human walking would actually produce well coordinated locomotion (Neptune et al., 2009; Allen and Neptune, 2012). They found that the activations of muscle synergies required substantial fine-tuning based on their consequences in task-space (i.e., minimizing difference between actual and simulated walking kinematics and ground reaction forces) to achieve satisfactory motor behavior. This suggests that relatively small errors produced by muscle synergies in reproducing muscle activation patterns can lead to important functional deficits. Other studies demonstrated the capacity to generate functional movements with a limited number of muscle synergies (McKay and Ting, 2008, 2012; Berniker et al., 2009; Kargo et al., 2010). These also required fine-tuning of synergy activation to produce reasonable behavior, a requirement that might result from discrepancies between the real and modeled biomechanics. None evaluated the functional consequence of synergy decomposition by comparing the movements predicted by the extracted synergies with those actually occurring when the basis-set of electromyograms (EMG) signals was recorded.

A likely reason for the lack of attention devoted to the functional consequences of synergy approximation is the complexity of the mapping between muscle activities and their resulting effects on the limb. In addition to an accurate biomechanical model, effective forward simulation of limb kinematics from EMG requires an accurate measurement of the activation of each muscle. However, EMG signals are subject to crosstalk (i.e., contamination by nearby muscles) and representativeness issues (e.g., regional segregation of early-recruited, slow-twitch motor units vs. late-recruited, fast-twitch units; Chanaud et al., 1991b), and are therefore imperfect measures of muscle activation (Staudenmann et al., 2010; Hug, 2011). To accommodate this, we designed a practical forward simulation approach whereby a virtual representation of muscle biomechanics is defined that best reconstructs force when driven by EMG recordings (de Rugy et al., 2012c). This "virtual biomechanics" technique offers a unique opportunity to map the functional consequences of synergy approximation: because the mapping between muscle activity and force is explicitly defined and used to control the task, the mapping between muscle synergies and force is also explicit, and enables unambiguous assessment of the functional consequences of using extracted synergies, compared to the original EMG signals, in accounting for the motor behavior actually measured during those EMG recordings.

Another advantage of the explicit representation of muscle biomechanics defined with this technique is that it provides an experimental basis from which we can compute muscle activity according to the principles of optimal control theory: to achieve the task while minimizing a cost such as effort or movement variability. Optimal control theory has been shown to reproduce patterns of muscle recruitment that are consistent with the existence of motor synergies (Todorov, 2004; Chhabra and Jacobs, 2006; Diedrichsen et al., 2010), and muscle synergies have been used to simplify the computational cost of optimization in several optimal control schemes (Todorov et al., 2005; Lockhart and Ting, 2007; Berniker et al., 2009). Here, we directly compared synergy decomposition and the resulting task performance between real data and simulated optimal muscle patterns for the same task. This comparison should indicate whether synergies similar to those extracted from real data can result from an optimal control scheme, and whether we can use simulated optimal muscle patterns in place of unavailable EMG to explore the functional consequences of synergy decomposition in arbitrary biomechanical systems. First, we evaluated the functional consequences of synergy decomposition using previous data obtained when subjects performed force-aiming in two dimensions at the wrist, with force reconstructed online from EMG recordings (de Rugy et al., 2012c). Then, we computed the optimal muscle pattern that minimized the summed squared muscle activities (Fagg et al., 2002; Diedrichsen et al., 2010) for the virtual biomechanics extracted for each subject, and compared synergy decomposition on this simulated pattern with that obtained on real data. 
We also assessed synergy decomposition on the optimal muscle pattern for the higher-dimensional elbow-joint complex (13 muscles), and explored reconstruction from an incomplete set of muscles, as this represents the vast majority of cases for which recordings are available from only a subset of contributing muscles. The idea that the sampling of muscles degrades the estimate of muscle synergies has been addressed in the literature (Clark et al., 2010; Ting and Chvatal, 2010; Allen and Neptune, 2012), but here we additionally determined the influence of the redundancy of the muscle selection on both synergy approximation and task performance. Finally, we explored the implications of increasing the dimensionality of the task (i.e., from 2-d to 3-d) for synergy decomposition of the optimal muscle pattern for an arbitrary biomechanical system. The scope of the original database is known to influence the results of synergy decomposition (Macpherson, 1991; Ting and Chvatal, 2010; Burkholder and van Antwerp, 2012), and we wanted to evaluate the influence of task dimension on synergy decomposition of optimal muscle patterns.

# **MATERIALS AND METHODS**

# **WRIST EXPERIMENT**

We re-analyzed data from Experiment 1 in de Rugy et al. (2012c), applying synergy decomposition methods and measures, and additionally assessed their consequences in task space.

## *Participants*

Six healthy, right-handed subjects (all men, aged 23–38) volunteered for this study. All had normal or corrected to normal vision and gave informed consent prior to the experiment, which was approved by the local ethics committee and conformed to the Declaration of Helsinki.

### *General procedure*

Subjects sat 80 cm from a computer display positioned at eye level. The right hand was maintained in a custom-made manipulandum with the forearm in a neutral position (midway between pronation and supination, as displayed in **Figure 1**). The elbow was kept at 110◦ with the forearm parallel to the table and supported by a custom-built device. The wrist was fixed by an array of adjustable supports, contoured to fit the hand at the metacarpophalangeal joints (12 contacts) and the wrist just proximal to the radial head (10 contacts). This allowed wrist forces to be applied without the need for a gripping force. Wrist forces were recorded using a 6-df force/torque transducer (JR3 45E15A-I63-A 400N60S, Woodland, CA) coupled with the wrist manipulandum.

Real-time visual feedback of either the real wrist forces or the reconstructed wrist forces was presented on the visual display. Targets were presented at 16 radial positions around the center of the display (i.e., 22.5◦ apart). Flexion/extension corresponded to the horizontal axis (flexion left) and radial/ulnar deviation corresponded to the vertical axis (radial deviation up).

A block of 32 maximal voluntary contraction (MVC) trials was first conducted for each subject. This block was used to normalize

the activity of each muscle during the aiming task to the maximal EMG obtained in that muscle during MVC toward any target direction. Each of the 16 target directions was presented twice in a randomized order. For each direction, subjects were asked to raise their force rapidly to the maximal extent while maintaining the force direction within a delineated range of ±8◦ of target direction. Maximal forces were held for approximately 2 s. Fifteen seconds were allowed for rest before the next target appeared in another direction.

The experiment contained a "force-driven" block in which the visual cursor used to reach targets represented the real force, followed by an "EMG-driven" block in which the cursor represented the reconstructed force. The force-driven block consisted of 96 trials (six trials for each of the 16 target directions) in which a low level of force (i.e., 22.5 N, which represents approximately 20% MVC for the subjects tested) was required to reach targets. This level of force was identical across all subjects, and chosen to reduce the possibility of fatigue. Each trial began only if the cursor was maintained less than 5% of the target distance from the origin continuously for 200 ms. The origin was calibrated to zero force along both axes (wrist relaxed) prior to each block. A random delay (1–2 s) elapsed before a single target appeared coincident with a brief tone. Participants were asked to move the cursor to the target with a movement time of between 150 and 250 ms, defined as the time between 10 and 90% of the radial distance to the target, and to hold the cursor continuously for 1 s within the target zone (a trapezoid ±8◦ from target direction by 10% of radial distance to target). A high-pitched tone signaled that the target had been acquired. If the target was not acquired within 2 s of target presentation, a low-pitched tone indicated the end of the trial. A second tone (200 ms after the first) indicated whether the movement time was correct (high tone) or not (low tone), and a bar graph provided visual feedback of the movement time in relation to the prescribed time window. Both the target and cursor disappeared at target acquisition or trial end, and at least 1 s elapsed before the start of the next trial. For each block, six consecutive trials were conducted for each one of 16 randomly ordered targets. The "EMG-driven" block was identical to the "force-driven" block, except that the real force feedback was replaced by the reconstructed force.

# *EMG procedure*

Bipolar electromyographic signals were recorded from the extensor carpi radialis longus (ECRl), extensor carpi radialis brevis (ECRb), flexor carpi radialis (FCR), flexor carpi ulnaris (FCU), and extensor carpi ulnaris (ECU) muscles with self-adhesive surface electrodes. Signals were band-pass filtered from 30 Hz to 1 kHz, amplified 200–5000 times (Grass P511, Grass Instruments, AstroMed, West Warwick, RI, USA), and sampled at 2 kHz. Electrode locations were determined according to procedures previously reported (Selvanayagam et al., 2011).

# *Data reduction and analysis*

Muscle tuning curves, or the time-independent muscular activity (**a**) for the different target directions, were determined for each muscle as the mean rectified EMG during the hold-phase of the task (i.e., in a time window from 300 to 1000 ms after movement onset), averaged over five trials to each target (the first of the six consecutive trials to each target was discarded to prevent the uncertainty about target direction from contaminating the data).

Virtual biomechanics, the representation of muscle biomechanics that best reaches the target when combined with EMG data, was extracted from muscle tuning curves obtained in the "force-driven" block as indicated in de Rugy et al. (2012c). A coordinate descent was used to determine the set of pulling vectors **P** (**Figure 1**) that resulted in the best aiming performance, i.e., that minimized the endpoint errors $E = \left\| \mathbf{x}_{\text{targ}} - \mathbf{x} \right\|^2$ between cursor positions **x** (**x** = **Pa**) and target positions **x**targ. This coordinate descent used the following steps: (1) Assign random values to the initial set of pulling vectors within the physiological range of muscle force and direction. (2) Pick a muscle at random and modify its pulling vector by changing its endpoint by a step in four orthogonal directions. The target errors associated with each of the five pulling vectors (i.e., the original and the four modified versions for that muscle) were then calculated as the summed squared error between targets and reconstructed reaches, and the pulling vector that produced the lowest cost was retained. (3) One iteration of the model was completed when each muscle had been optimized once. (4) The whole model was iterated until the overall cost converged to a low value.
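The four steps above can be sketched as follows; the data are simulated, and the step size, iteration count, and initialization range are illustrative assumptions rather than the published settings:

```python
import numpy as np

rng = np.random.default_rng(0)

def endpoint_cost(P, a, x_targ):
    # Summed squared error between targets and EMG-reconstructed forces.
    return np.sum((x_targ - P @ a) ** 2)

def fit_virtual_biomechanics(a, x_targ, n_iter=200, step=0.05):
    """Coordinate descent over pulling vectors, following the steps in
    the text. a: muscles x targets tuning curves; x_targ: 2 x targets."""
    n_muscles = a.shape[0]
    P = rng.uniform(-1, 1, size=(2, n_muscles))        # (1) random init
    moves = np.array([[step, 0], [-step, 0], [0, step], [0, -step]])
    for _ in range(n_iter):                            # (4) iterate
        for m in rng.permutation(n_muscles):           # (3) each muscle once
            candidates = [P.copy()]
            for d in moves:                            # (2) four directions
                Pc = P.copy()
                Pc[:, m] += d
                candidates.append(Pc)
            P = min(candidates, key=lambda Pc: endpoint_cost(Pc, a, x_targ))
    return P

# Toy check: recover pulling vectors from simulated tuning curves.
P_true = rng.uniform(-1, 1, size=(2, 5))
a = np.abs(rng.standard_normal((5, 16)))   # 5 muscles, 16 targets
x_targ = P_true @ a
P_fit = fit_virtual_biomechanics(a, x_targ)
```

Because the current pulling vector is always among the candidates, the cost is monotonically non-increasing across sweeps.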

The resulting set of pulling vectors was then multiplied online by the rectified filtered EMG of the five muscles to reconstruct force used as a feedback in the "EMG-driven" condition. This virtual biomechanics was also used to compute the optimal muscle pattern, as explained below. We showed previously that muscle tuning curves for this data set were not different in the "force-driven" and in the "EMG-driven" blocks (de Rugy et al., 2012c).

Synergy extraction was conducted using the non-negative matrix factorization algorithm (Lee and Seung, 2001) on muscle tuning curves obtained in the "EMG-driven" block, for which we unambiguously know the mapping between muscle activity and task space. The muscle activation pattern **a** was first normalized such that each muscle has unit variance, and the normalized pattern **a<sup>∗</sup>** was approximated with N muscle synergies according to

$$\mathbf{a}^\* \approx \hat{\mathbf{a}}^\* = \mathbf{c}\,\mathbf{w}$$

where **a<sup>∗</sup>** is a matrix with each component representing the normalized activation of a specific muscle for a specific target direction, $\hat{\mathbf{a}}^*$ is the approximated muscle pattern, **w** is a matrix with each component representing the activation of a specific synergy for a specific target direction, and **c** is a matrix of non-negative scaling coefficients.
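A minimal sketch of this decomposition using the Lee and Seung multiplicative updates; the matrix orientation (muscles × targets), update count, and toy data are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

def nmf(a, n_syn, n_updates=500, eps=1e-9):
    """Lee & Seung multiplicative updates: a ~ c @ w, with c
    (muscles x synergies) and w (synergies x targets) non-negative."""
    n_mus, n_targ = a.shape
    c = rng.random((n_mus, n_syn))
    w = rng.random((n_syn, n_targ))
    for _ in range(n_updates):
        w *= (c.T @ a) / (c.T @ c @ w + eps)   # update synergy activations
        c *= (a @ w.T) / (c @ w @ w.T + eps)   # update scaling coefficients
    return c, w

# Toy normalized muscle pattern: 5 muscles x 16 target directions.
a_star = np.abs(rng.standard_normal((5, 16)))
c, w = nmf(a_star, n_syn=3)
approx = c @ w   # the approximated pattern a_hat
```

The multiplicative form guarantees that **c** and **w** stay non-negative throughout the iterations, which is why this algorithm is a convenient choice for synergy extraction.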

The goodness of synergy approximation was calculated as a multivariate R2 (Mardia et al., 1979; d'Avella et al., 2006):

$$R^2 = 1 - \frac{SSE}{SST} = 1 - \frac{\left\|\mathbf{a}^\* - \hat{\mathbf{a}}^\*\right\|^2}{\left\|\mathbf{a}^\* - \bar{\mathbf{a}}^\*\right\|^2}$$

where *SSE* is the summed squared residuals and *SST* is the summed squared residual from the mean normalized activation vector ($\bar{\mathbf{a}}^*$). We also computed the variance accounted for (VAF), a related measure where *SST* is simply the summed squared activation, i.e., calculated on uncentered data (Cheung et al., 2005; Roh et al., 2012):

$$\text{VAF} = 1 - \frac{SSE}{SST} = 1 - \frac{\left\|\mathbf{a}^\* - \hat{\mathbf{a}}^\*\right\|^2}{\left\|\mathbf{a}^\*\right\|^2}$$

For each synergy decomposition, an associated aiming error (AE) was computed as the distance between the targets **x**targ and the force vector produced by combining the pulling vectors with the (un-normalized) muscle activity approximated by the synergies ($\hat{\mathbf{x}} = \mathbf{P}\hat{\mathbf{a}}$):

$$\text{AE} = \left\| \hat{\mathbf{x}} - \mathbf{x}\_{\text{targ}} \right\|^2$$

For each subject and synergy number (*N* = 1–5), the synergy decomposition was conducted 10 times, and averaged values were obtained for each of the three measures (R2, VAF, and AE).
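The three measures can be written directly from the formulas above; the matrix orientation (muscles × targets) and the toy data are assumptions for illustration:

```python
import numpy as np

def r2(a, a_hat):
    """Multivariate R^2: residuals relative to the mean normalized
    activation vector (mean across target directions)."""
    sse = np.sum((a - a_hat) ** 2)
    sst = np.sum((a - a.mean(axis=1, keepdims=True)) ** 2)
    return 1 - sse / sst

def vaf(a, a_hat):
    """Variance accounted for: residuals relative to uncentered data."""
    return 1 - np.sum((a - a_hat) ** 2) / np.sum(a ** 2)

def aiming_error(P, a_hat, x_targ):
    """Summed squared distance between targets and the forces produced
    by the synergy-approximated muscle pattern (x_hat = P @ a_hat)."""
    return np.sum((P @ a_hat - x_targ) ** 2)

a = np.abs(np.random.default_rng(3).standard_normal((5, 16)))
print(r2(a, a), vaf(a, a))  # 1.0 1.0 for a perfect reconstruction
```

Note that R2 and VAF are computed on the normalized pattern, whereas AE uses the un-normalized activations combined with the pulling vectors.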

### **WRIST SIMULATIONS**

We computed the optimal muscle pattern **a**opt for the set of pulling vectors extracted individually for each subject using the procedure described in Fagg et al. (2002). This procedure minimizes the following composite cost *C*, which ensures task achievement by minimizing target errors while simultaneously minimizing the summed squared muscle activations:

$$C = \frac{1}{2} \left\| \mathbf{x}\_{\text{targ}} - \mathbf{x} \right\|^2 + \frac{\lambda}{2} \left\| \mathbf{a}\_{\text{opt}} \right\|^2$$

where λ is a regularization parameter set to 0.02 to represent allowable errors on the order of 2% of movement magnitude. Synergy decomposition was applied to the optimal muscle pattern as for the experimental data, and differences between experimental and simulated data were tested using two-way [data type (experimental vs. simulated) × number of synergies (1–5)] repeated-measures ANOVAs for the three measures (R2, VAF, and AE). Differences between AEs produced with different numbers of synergies were also tested on experimental data using paired-sample *t*-tests. The significance level was set to α = 0.05.
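One way to minimize this composite cost under non-negative activations is projected gradient descent; the optimizer, learning rate, and pulling-vector directions below are illustrative assumptions (the cited procedure may differ in its details):

```python
import numpy as np

def optimal_pattern(P, x_targ, lam=0.02, lr=0.05, n_steps=2000):
    """Minimize C = 1/2 ||x_targ - P a||^2 + (lam/2) ||a||^2 subject to
    a >= 0 (non-negative activations), by projected gradient descent."""
    a = np.zeros((P.shape[1], x_targ.shape[1]))
    for _ in range(n_steps):
        grad = -P.T @ (x_targ - P @ a) + lam * a   # gradient of C w.r.t. a
        a = np.maximum(0.0, a - lr * grad)          # project onto a >= 0
    return a

# Hypothetical 2-d system: five unit pulling vectors, 16 radial targets.
angles = np.deg2rad([0, 75, 150, 225, 300])
P = np.vstack([np.cos(angles), np.sin(angles)])
t = np.deg2rad(np.arange(0, 360, 22.5))
x_targ = np.vstack([np.cos(t), np.sin(t)])
a_opt = optimal_pattern(P, x_targ)
err = np.sum((P @ a_opt - x_targ) ** 2) / x_targ.shape[1]  # per-target error
```

With λ small relative to the task term, the solution reaches the targets almost exactly while distributing activity across the muscles whose pulling vectors bracket each target direction.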

### **ELBOW SIMULATIONS**

The optimal muscle pattern was also computed on an existing biomechanical model of the arm for a similar center-out isometric task performed at the elbow-joint complex in the flexion/extension and supination/pronation workspace (de Rugy et al., 2009; de Rugy, 2010). The biomechanical model developed by Davoodi and colleagues (**Figure 4A**; Davoodi et al., 2002a,b; de Rugy et al., 2008; de Rugy, 2010) was used to extract the pulling vectors of 13 arm and forearm muscles in this workspace (**Figure 4B**): Supinator (SUP), the long and short heads of Biceps Brachii (BIC ln and sh), Brachialis (BRA), Brachioradialis (BRD), Pronator Teres (PT), Pronator Quadratus (PQ), the long, medial, and lateral heads of Triceps (TRI ln, m, and lt), and three wrist muscles (FCR, ECRl, and ECRb; please note that the two remaining wrist muscles, FCU and ECU, were not included because their moments are negligible in this workspace).

Synergy decomposition was conducted on the optimal muscle pattern according to the method described above for the wrist, and the procedure was repeated 100 times to obtain the mean and standard error of the three measures (R2, VAF, and AE).

To account for the vast majority of cases in which recordings are limited to an incomplete set of muscles, we also explored reconstructions from a limited number of muscles. In particular, we considered two qualitatively different selections of eight muscles amongst the 13 muscles: a "redundant" selection (**Figure 5A**: BIC ln and sh, BRA, BRD, TRI ln, lt, and m, PT), and a "less redundant" one (**Figure 5B**: BIC ln, BRA, TRI ln, PT, SUP, PQ, FCR, ECRb). The idea behind this choice of qualitatively different sets of muscles was that the level of biomechanical redundancy might translate into relationships between muscle activations that would be visible through synergy decomposition. R2 and VAF were calculated as before for these incomplete sets of muscles, and AE was computed on the basis of the virtual biomechanics reconstructed from these eight muscles only (i.e., based on the set of pulling vectors that best achieves targets when combining the activity of the eight muscles, rather than on the true pulling vectors). This was designed to assess the quality of the reconstruction in both muscle space and task space for common situations in which recordings are available from fewer muscles than those contributing to the task.

## **SIMULATIONS WITH ARBITRARY PULLING VECTORS IN 3 AND 2 DIMENSIONS**

We explored the implications of changing the dimensionality of the task-space for synergy decomposition using the optimal muscle pattern for an arbitrary biomechanical system represented by a set of 13 pulling vectors in three and two dimensions. A set of 13 unit vectors approximately uniformly distributed in three dimensions was first defined using a repulsive iterative algorithm (**Figure 6A**). The same iterative algorithm was then used to generate a set of 200 targets approximately uniformly distributed on a sphere (**Figure 6B**), and the optimal muscle pattern for these pulling vectors and targets was determined by minimizing the same cost function as for the wrist and elbow (i.e., a composite cost combining target errors and summed squared muscle activations). The corresponding simulations in two dimensions were conducted using the original set of 16 targets used for the wrist and elbow, and the same set of 13 three-dimensional pulling vectors with the third dimension ignored (Z in **Figure 6B**; 2-D pulling vectors shown in **Figure 6C**). The optimal muscle pattern was computed as before (**Figure 6D**). Synergy decomposition was conducted, and the mean and standard error of the three main measures (R2, VAF, and AE) obtained, as for the elbow system.
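A repulsive iterative algorithm of the kind mentioned can be sketched as follows. This is our own minimal version under the assumption of simple inverse-square pairwise repulsion; the study's exact algorithm may differ.

```python
import numpy as np

def repel_on_sphere(n, dim=3, n_iter=2000, step=0.01, seed=0):
    """Spread n unit vectors approximately uniformly on the unit sphere by
    iteratively pushing each point away from all others (inverse-square
    repulsion) and renormalizing to unit length after every step."""
    rng = np.random.default_rng(seed)
    P = rng.standard_normal((n, dim))
    P /= np.linalg.norm(P, axis=1, keepdims=True)
    for _ in range(n_iter):
        diff = P[:, None, :] - P[None, :, :]              # pairwise displacements
        dist = np.linalg.norm(diff, axis=2) + np.eye(n)   # avoid self-division
        force = (diff / dist[..., None] ** 3).sum(axis=1) # inverse-square push
        P += step * force
        P /= np.linalg.norm(P, axis=1, keepdims=True)     # project back to sphere
    return P

vectors = repel_on_sphere(13)  # 13 approximately uniform pulling-vector directions
print(np.round(np.linalg.norm(vectors, axis=1), 6))  # all unit length
```

The same routine with a larger `n` would generate the target set; dropping the third coordinate of the returned vectors gives the two-dimensional pulling vectors used in the reduced-dimension simulations.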

# **RESULTS**

## **SYNERGY DECOMPOSITION ON REAL AND OPTIMAL DATA AT THE WRIST**

**Figure 2A** shows how synergy decomposition approximates the original muscle pattern for one subject, and **Figures 2B,C** show how this approximation translates into AEs for the same subject (**B**) and for all six subjects (**C**). This figure illustrates that the muscle pattern reconstructed from synergies only starts to resemble the real pattern with three or four synergies, but also that the remaining muscle-space differences translate into substantial AEs. These errors disappear only when five synergies have been extracted, which reflects the unreduced dimensionality of the complete musculoskeletal system.

**Figure 3** shows that the goodness of the muscle reconstruction increases as expected with the number of synergies, and that AEs decrease accordingly. It is worth noting that a VAF of approximately 0.5 is obtained with only one synergy, even though the cursor has not moved from the center of the workspace toward any target (**Figure 2C**). This illustrates that 50% of the variance in muscle activity is accounted for by a synergy decomposition that is doing no better at reaching targets than simply not activating any muscles. **Figures 2B,C** also illustrate that synergy decomposition systematically results in undershoot errors. The reason for this is obvious with one synergy, where the solution found by the nonnegative matrix factorization algorithm takes the form of muscles that co-contract to their average activity level in the original muscle pattern. For other numbers of synergies, the muscle pattern approximated with fewer synergies than muscles similarly results in wider muscle tuning curves, which produces undershoot errors when summing muscle contributions at the joint.

**Figure 3** shows that the reconstruction with four synergies explained most of the variance of the muscle pattern (VAF = 0.95 and *R*<sup>2</sup> = 0.91) while still producing considerable task errors (average error of 13.5% of target distance, visible in **Figure 2C**), significantly higher than the errors produced by the original muscle pattern (3.6% average error; *t* = 8.11, *p* < 0.0005).

**Figure 3** also shows that synergy decomposition conducted on optimal muscle patterns computed for the virtual biomechanics extracted from individual subjects generated similar results to synergy decomposition of real data. There were no differences in R2 or AE values calculated on real vs. simulated data [*F*(1, 5) < 4.98, *p* > 0.08], although VAF values were slightly higher when calculated on real data than on optimal data [*F*(1, 5) = 9.19, *p* = 0.03]. These results indicate that synergy decomposition of optimal muscle recruitment patterns produces results similar to those obtained from real EMG signals. This makes it possible to explore the inherent consequences of synergies interacting with realistic musculoskeletal dynamics without the potential confounds introduced by missing, poorly sampled, or noisy EMG signals.

## **SYNERGY DECOMPOSITION ON OPTIMAL DATA AT THE ELBOW JOINT COMPLEX**

**Figure 4** shows that synergy decomposition conducted on the optimal muscle pattern (**C**) computed for the 13 muscles of the biomechanical arm model provides results (**D**) closer to those usually reported in the literature; i.e., a goodness of the reconstruction (both VAF and R2) that quickly rises to reach an asymptotic level at which most muscle variance is explained with substantially fewer synergies than muscles. For instance, the 90% variance level is reached with only four synergies (VAF = 0.94 and *R*<sup>2</sup> = 0.90) and almost the entire variance is explained with five synergies (VAF = 0.985 and *R*<sup>2</sup> = 0.975). However, the associated AEs are still substantial (14.6% and 7.5% of target distance with four and five synergies, respectively), and at least six synergies are needed to generate AEs below 5% of target distance (i.e., 4.1% with six synergies).

The situation changes when only an incomplete selection of muscles is available to perform the synergy decomposition and the force reconstruction, reflecting the common experimental situation in which recordings are not available from all contributing muscles. **Figure 5** shows that the goodness of muscle approximation is better for the selection of redundant muscles than for the selection of less redundant muscles. For instance, at least 90% of the variance is explained with three to four synergies for the set of redundant muscles, whereas this degree of variance explained requires an additional synergy for the set of less redundant muscles. Conversely, the AE reconstructed by combining the synergy approximation with the virtual biomechanics extracted from the incomplete set of available muscles is much higher for the redundant muscles than for the less redundant muscles. In particular, the error remains above 30% from four to eight synergies for the set of redundant muscles, but decreases monotonically to a minimum of 3.8% for the less redundant muscles.

It is worth noting that although the AEs associated with synergy decomposition are lower for the set of less redundant muscles, they are still substantially higher than with the complete set of muscles. For instance, AEs for 5–7 synergies are 29, 17.5, and 8% for the set of less redundant muscles, and 14.6, 7.5, and 4.1% for the complete set of muscles. This indicates that aiming performance suffers more from the synergy approximation when an incomplete set of muscles is used.

## **SYNERGY DECOMPOSITION ON ARBITRARY BIOMECHANICS IN 2 AND 3 DIMENSIONS**

**Figure 6E** shows that synergy decomposition conducted on the optimal muscle pattern computed on the 13 muscles of the arbitrary biomechanical model provides different results for the two- and three-dimensional versions of the task. As for the elbow system, the optimal muscle pattern for the 13 two-dimensional pulling vectors (**Figures 6C,D**) reaches the 90% variance level with only four synergies (VAF = 0.94, *R*<sup>2</sup> = 0.90, AE = 14.6%). In contrast, nine synergies were required to reach the same 90% variance level for the three-dimensional version of the task (VAF = 0.94, *R*<sup>2</sup> = 0.91, AE = 12%). In both cases, an additional synergy is required for the averaged AE to drop below 10% (i.e., 8% with five synergies and 9.8% with 10 synergies for the two- and three-dimensional cases, respectively). This indicates that the capacity of synergies to explain optimal muscle patterns and their functional outcome depends critically on the scope of the original database, where higher dimensional behaviors will require more synergies or will produce poorer fits.

# **DISCUSSION**

The purpose of this study was to assess the functional consequences of approximating an actual pattern of muscle recruitment with fewer synergies. We first used previous data obtained when people performed a force-aiming task in two dimensions at the wrist, where force was reconstructed online from EMG recordings (de Rugy et al., 2012c), to show that despite successfully explaining muscle activities, synergy-approximated EMG data would introduce substantial errors in task space. Then, we showed that synergy decomposition on the optimal muscle pattern that minimizes summed-squared muscle activities for a representation of muscle biomechanics produces similar muscle approximations, with the same functional consequences. We also assessed the influence of synergy decomposition on the optimal muscle pattern computed for the more redundant elbow-joint complex, to show that when selecting an incomplete set of muscles, the redundancy of that selection has opposite effects on the goodness of muscle approximation and on task achievement: higher redundancy is associated with better muscle approximation, but with higher AEs. Finally, we showed that increasing the dimensionality of the task-space from 2 to 3 dimensions also increases the number of synergies required to approximate the optimal muscle pattern and produce low AEs.

If synergies are used only as a tool to summarize observed muscle activation patterns, then they are no different from regression analysis, in which the goodness of fit depends simply on the number of free parameters and the complexity of the source data. Instead, synergies were introduced originally by Bernstein (1967 translation of 1934 book) and continue to be offered as a theory for how the nervous system solves a very specific control problem known as redundancy. The musculoskeletal system contains more degrees of mechanical freedom and more muscles than required to perform tasks successfully. If the nervous system must compute the pattern of muscle activations required to perform a given task, how does it decide which of many solutions to use? One solution is to add performance criteria that can be optimized by one and only one solution (e.g., minimize trajectory errors in the face of noise or minimize effort to conserve energy). Computing or discovering such optimal solutions tends to be extremely difficult for systems with the complexity of a typical limb (reviewed by Valero-Cuevas et al., 2009 and Loeb, 2012). An alternative solution is for there to be arbitrary restrictions on the available patterns of muscle recruitment, either as a consequence of hard-wired neural circuits or learned motor habits. The synergies extracted by decomposition of observed EMG patterns would then be indicative of this control strategy at work. The validity of synergies as a neural control strategy thus depends on their necessity (what are the alternatives?) and their utility (what are the consequences?), which are discussed below.

## **MUSCLE SYNERGIES INTRODUCE AIMING ERRORS**

The importance of considering error introduced in the task space when assessing the usefulness of muscle synergies is clearly illustrated by the fact that in our wrist isometric task, about 50% of muscle variance is accounted for (i.e., VAF) by only one synergy. In this case, the solution found by the non-negative matrix factorization algorithm takes the form of muscles that co-contract to their average activity level in the original muscle pattern. When summing muscle contributions at the joint, this resulted in zero net force, which therefore translated into no movement whatsoever toward the force targets. This extreme case might indicate that centered data should be preferred when calculating the goodness of muscle approximation (i.e., use R2 instead of VAF) to avoid over-interpreting spurious results arising from the nature of the statistical method. However, both R2 and VAF are relatively insensitive to the functional consequences of the synergy approximation of muscle activity. For instance, approximating the activity of the five wrist muscles with four synergies explained most of the muscle variance (R2 and VAF > 0.9), while still missing the force targets by 13.5% of target distance. This is important because this order of magnitude of explained muscle variance has been considered as an accurate description in different contexts [e.g., R2 of 0.85 in reaching (Muceli et al., 2010) or VAF of 0.88 for locomotion (Oliveira et al., 2012)]. It is interesting to note that synergy decomposition systematically resulted in undershoot errors (**Figure 2**) because it inevitably produces inappropriate cocontraction.
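The divergence between VAF (computed on uncentered data) and R2 (computed on centered data) in the degenerate one-synergy case is easy to reproduce on a toy cosine-tuned muscle pattern (invented for illustration): a flat reconstruction at each muscle's mean activity yields a substantial VAF but an R2 near zero.

```python
import numpy as np

# Toy muscle pattern: 5 cosine-tuned muscles, rectified, over 16 target directions.
angles = np.linspace(0, 2 * np.pi, 16, endpoint=False)
preferred = np.linspace(0, 2 * np.pi, 5, endpoint=False)
M = np.maximum(np.cos(angles[None, :] - preferred[:, None]), 0.0)

# Degenerate "one synergy" reconstruction: each muscle flat at its mean level.
R = np.tile(M.mean(axis=1, keepdims=True), (1, M.shape[1]))

vaf = 1 - np.sum((M - R) ** 2) / np.sum(M ** 2)              # uncentered
r2 = 1 - np.sum((M - R) ** 2) / np.sum((M - M.mean()) ** 2)  # centered
print(f"VAF = {vaf:.2f}, R2 = {r2:.2f}")
```

Both measures are computed from the same reconstruction; only the reference variance differs, which is why VAF rewards flat co-contraction while R2 does not.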

The assessment of the functional consequences of muscle approximation by synergies was possible here because for the isometric task at relatively low force (approximately 20% of MVC), the relationship between muscle activity and task space is likely to be close to linear. Although we have not tested our virtual biomechanics technique in contexts where this might not be the case, this linear relationship might be required for subjects to perform the task similarly well with either the real force or the force reconstructed online from EMG recordings (de Rugy et al., 2012c). Because the synergy decomposition was conducted on data obtained when the task was performed with reconstructed force, the mapping between EMG and task space was known, and we applied this mapping directly to calculate errors introduced by synergy decomposition in task space. Nevertheless, the mapping between muscle activity and task space is likely to be far more complex in broader dynamic contexts that include strongly nonlinear relationships between muscle force and velocity for a given level of activation (Brown et al., 1999), which might introduce more errors. For example, the final positions of center-out reaches will depend more on the relatively small EMG signals but large forces that stop the movement on target than on the larger EMG signals but smaller forces in the agonists that accelerate the limb at the beginning of the task. This illustrates that an important part of the control problem might reside in an arbitrarily small proportion of unexplained muscle activation variance.

As mentioned in the introduction, we believe that the complexity of the mapping between muscle activity and the resulting action is the primary reason for the scant attention that has been devoted to the functional consequences of synergy approximation. Although Neptune and colleagues found that synergies were a useful starting point, they required substantial fine-tuning based on their consequences in task-space to produce well coordinated locomotion (Neptune et al., 2009; Allen and Neptune, 2012). There is no question that the statistical methods used to extract synergies from EMG recordings must capture substantial features of the very behavior from which the recordings were obtained. The question of whether these synergies reflect an organizing principle of neural control depends on their capacity to sufficiently account for behavior. This condition was not met in either the walking studies or in the wrist task presented here.

## **MUSCLE SYNERGIES MIGHT ARISE FROM, OR SUBSERVE, OPTIMAL CONTROL**

Despite the shortcomings mentioned above in relation to AEs generated with the wrist system, we found that synergy decomposition produces similar results on real and simulated (optimal) data. This direct comparison extends previous reports that optimal control schemes produce synergy-like properties (Todorov, 2004; Chhabra and Jacobs, 2006) by showing both qualitative and quantitative matches within the same protocol. Because the deleterious effects of synergy decomposition in terms of AEs at the wrist might relate to the relatively low muscle redundancy of that system, we also explored the more redundant elbow-joint complex. We show that synergy decomposition on the optimal pattern for the 13 muscles of that system produces results that correspond to those typically reported in muscle synergy studies (d'Avella et al., 2006, 2008; Roh et al., 2012), with a goodness of muscle approximation that quickly rises to reach an asymptotic level at which most muscle variance is explained with substantially fewer synergies than muscles. Thus, properties that are typical of the muscle synergy hypothesis arise from the inherent principles of optimal control, highlighting the obvious possibility that synergy might just be a by-product of an alternative control scheme rather than a control principle in itself (Todorov, 2004; Chhabra and Jacobs, 2006; Diedrichsen et al., 2010). One such control scheme that could result in muscle activations that resemble the output of muscle synergies involves the online computation of optimal task solutions by feedback control laws (i.e., optimal feedback control; Todorov, 2004; Diedrichsen et al., 2010). However, muscle synergies have also been suggested to subserve optimal feedback control, as a possible neural control element to reduce the high computational cost associated with online optimization (Todorov et al., 2005; Berniker et al., 2009) or to implement a simple feedback rule at the task level (Lockhart and Ting, 2007; Ting and McKay, 2007).

Alternatively, muscle synergies might prevail over optimal control schemes. Similarities between real and optimal muscle patterns might reflect behaviors that have developed and evolved to minimize biologically relevant costs, and whose production is mediated by muscle synergies that are relatively less flexible at shorter time scales. In fact, we showed recently that when faced with novel biomechanics, participants adapted by scaling their original muscle patterns linearly rather than re-optimizing them (de Rugy et al., 2012b), which could at first glance appear to favor the existence of hard-wired synergies. However, we also found that muscle patterns observed when simulating the biomechanics of a posture different from the real posture were better described by a linear scaling of the muscle pattern associated with the real posture than with the simulated posture (de Rugy et al., 2012a,b). This result is not consistent with a re-optimization of activation signals to a set of fixed muscle synergies, as this should have enabled reproduction of the optimal muscle pattern associated with the simulated posture. This result could potentially be explained by posture- or feedback-specific synergies (Cheung et al., 2005; d'Avella et al., 2008), but this would be inconsistent with the definition of a muscle synergy as a fixed, linear combination of muscle activations, and with the associated potential benefits in terms of dimensionality reduction for higher level controllers. Indeed, if synergies are allowed to vary depending on the task or context, these variations would become additional degrees of freedom requiring both additional circuits that can produce the additional synergies and control circuits to select among them. Alternatively, the observed limitation of flexibility of posture-dependent muscle patterns could conceivably pertain to stored and recalled activation signals to synergies rather than to the synergies themselves.
However, this would be difficult to distinguish from alternative control schemes involving stored motor programs that would be based on an equal or higher number of control signals than muscles. The concept of synergies might be defined broadly to reflect any tendency to use muscles in learned patterns rather than requiring the existence of specific circuits that generate fixed combinations of muscle activations. But in that case, it is really a neologism for regression analysis that has no predictive value as a reductionist theory of motor control.

## **THE NUMBER OF SYNERGIES INCREASES WITH TASK DIMENSION**

Our simulations show that when going from aiming in two dimensions to aiming in three dimensions with the same (arbitrary) biomechanical system, more synergies are required to approximate the optimal muscle pattern well and to generate sufficiently small errors (**Figure 6E**). This illustrates that the capacity of synergies to explain behaviors depends critically on the scope of the original database, where more diverse behaviors will require more synergies or will produce poorer fits. In other words, the seemingly high capacity to account for most activity of numerous muscles with only a few linear synergies might hold in restricted experimental contexts (Loeb, 2000), but is expected to deteriorate for more diverse natural behaviors (Macpherson, 1991).

The need to study sufficiently rich behavioral sets has been well-recognized in EMG studies of behaving animals, in which it is possible to surgically implant selective and precisely positioned recording electrodes (Loeb and Gans, 1986). A monkey can learn gradually to selectively modulate muscles that appear to be closely synergistic on both anatomical and electrophysiological grounds if some mechanical advantage can thereby be attained (Cheng and Loeb, 2008). Cats walking on a treadmill exhibit stereotypical patterns of synergy in some major muscles but appear to have learned idiosyncratic patterns of use for smaller muscles (Loeb, 1993), patterns that depend on the musculoskeletal mechanics of the limb rather than on genetically specified spinal pattern generators for locomotion (Loeb, 1999). Even within anatomically singular muscles, neuromuscular compartments that have somewhat different mechanical actions on the skeleton can be differentially recruited for some but not all tasks (Chanaud et al., 1991a,b; Pratt et al., 1991; Pratt and Loeb, 1991). These refinements of neural control are likely to go unappreciated in the EMG databases obtainable from surface electrodes, but they seem likely to be present nonetheless in humans.

## **IMPLICATIONS FOR THE USE OF SYNERGY IN ARTIFICIAL CONTROL**

Functional electrical stimulation (FES) typically requires the transformation of a motor goal into control signals designed to stimulate muscles in order to achieve that goal (Davoodi et al., 2003; Loeb and Davoodi, 2005). We illustrated previously that the nervous system does not seem to re-optimize activation signals to a set of muscle synergies (de Rugy et al., 2012b), but this does not mean that the principle of low dimensional control modules cannot be useful to restore movements artificially with FES. For instance, optimal muscle patterns computed for the biomechanics of a particular limb could be decomposed into fewer synergies, which could be used to enable online optimization over fewer control signals (Todorov et al., 2005), or to implement simple feedback rules at the task level without having to solve complex redundancy problems (Lockhart and Ting, 2007; Ting and McKay, 2007). The value of such schemes, however, remains contingent upon whether their benefits in reducing computational cost outweigh the task-space errors introduced by the original approximation into fewer synergies.

Myoelectric control is another important area where muscles are used to restore movement artificially, although in this case muscles are used to generate control signals rather than to receive them. In contrast to traditional myoelectric prostheses, in which muscle activities are translated into velocity about a joint, the goal of contemporary myoelectric control research is the simultaneous and proportional control of multiple degrees of freedom (Parker et al., 2006; Jiang et al., 2009). In this context, a recent technique that involves transferring residual nerves to alternative muscle sites, targeted muscle reinnervation (TMR), has increased the number of muscle signals available to control a prosthetic device in amputees (Kuiken et al., 2007, 2009). Although promising, this re-innervation technique is unlikely to restore the complete set of original control signals. Similar recording limitations obtain in intact musculoskeletal systems, which is why we explored simulations with incomplete sets of muscles. The first consideration, before synergy decomposition, is to reconstruct the motor output from the available muscles. We showed previously that this could be done with our virtual biomechanics technique, which empirically finds a representation of muscle biomechanics that best reconstructs force when driven by EMG recordings (de Rugy et al., 2012c). Here, we additionally show that on simulated (optimal) muscle patterns, the reconstruction is better for an incomplete set of muscles that is less redundant than for a set of more redundant muscles. Then, irrespective of how good or bad this reconstruction is, synergy decomposition adds further errors on top of it. From that perspective, applying synergy decomposition to the set of available muscles appears to be of little use, and it seems more appropriate to make full use of all available muscles to best reconstruct the motor output in task space.
This is essentially what is now done with the method of principal components analysis of TMR recordings (Jiang et al., 2009).

Nevertheless, synergy decomposition might suggest a useful, although paradoxical way to decide which muscles to record from in the case of myoelectric controllers in which the number of available myoelectric channels is limited. Those muscles that are most difficult to decompose into a small number of synergies should be accorded the highest priority for obtaining command signals.

## **A HIGH DIMENSIONAL ALTERNATIVE TO SYNERGY**

Finally, we ask whether there really is a redundancy problem to be solved at all. The notion that the nervous system might control movement through a limited number of synergies seems at odds with the high number of neurons available to process the transformation between sensory information and motor commands, as well as with the numerous divergent pathways that have been suggested by some as a possible basis for the implementation of muscle synergies. For instance, the processes of sensorimotor transformation and adaptation are well described by gain fields, or population codes formed by numerous basis neurons each responding to a particular range or combination of inputs (Andersen et al., 1985; Salinas and Abbott, 1995; Pouget and Snyder, 2000; Baraduc et al., 2001). Although decoding algorithms such as those developed for neural prosthetics involve a great deal of dimensionality reduction to extract motor goals from neuronal populations (Musallam et al., 2004; Hauschild et al., 2012), there seems to be no compelling reason to believe that the nervous system should operate a comparable dimensionality reduction into muscle synergies before increasing the dimensionality again to pools of motor units that have mechanically distinct actions. It has been suggested that the nervous system may actually encode more muscle synergies than muscles, but that only a subset of the entire synergy library is used in any given task (Chiel et al., 2009). Such a scheme clearly does not alleviate redundancy, and although it could conceivably simplify control within the context of a given task, it would require an additional control process to select appropriate synergies for each task. It would also appear to be impossible to generate testable hypotheses regarding the existence of an unlimited number of unrealized synergies.

If the nervous system does not control movements through a limited number of synergies, then how does it decide which of many good-enough (i.e., redundant) motor programs to use? Recent modeling work that includes the spinal cord circuitry provides interesting insight into this question. Indeed, a system with a large number of control inputs to a realistic set of interneuronal pathways was found to enable a simple learning algorithm to rapidly converge to physiological solutions (Raphael et al., 2010; Tsianos et al., 2011). Instead of reducing the dimensionality of control signals to assist computation of an unlikely global optimum, the nervous system might take advantage of the high probability of finding good-enough local minima within the high dimensional space of low-level circuitry (Loeb, 2012). A system that learned and stored such motor habits would appear to be limited to the synergies that it happened to have learned, but not as a result of any fundamental mechanism. Such a system might tend to get stuck in motor habits that could become suboptimal if the musculoskeletal system were to change its properties. This is exactly what we found when we applied our virtual biomechanics methodology (de Rugy et al., 2012c) to studies of wrist control in human subjects (de Rugy et al., 2012b).

# **ACKNOWLEDGMENTS**

We thank the two reviewers for their helpful comments, Rahman Davoodi for the musculoskeletal model of the arm and David Lloyd for the figure of the apparatus. This work was funded by the Australian Research Council.

# **REFERENCES**


Bernstein, N. A. (1967). *The Coordination and Regulation of Movements.* Oxford: Pergamon Press.


Chanaud, C. M., Pratt, C. A., and Loeb, G. E. (1991). Functionally complex muscles of the cat hindlimb. II. Mechanical and architectural heterogeneity within the biceps femoris. *Exp. Brain Res.* 85, 257–270.


Cheung, V. C. K., d'Avella, A., Tresch, M. C., and Bizzi, E. (2005). Central and sensory contributions to the activation and organization of muscle synergies during natural motor behaviors. *J. Neurosci.* 25, 6419–6434.


Lee, D. D., and Seung, H. S. (2001). Algorithms for non-negative matrix factorization. *Adv. Neural Inf. Process. Syst.* 13, 556–562.


Oliveira, A. S., Gizzi, L., Kersting, U. G., and Farina, D. (2012). Modular organization of balance control following perturbations during walking. *J. Neurophysiol.* 108, 1895–1906.


Tresch, M. C., and Jarc, A. (2009). The case for and against muscle synergies. *Curr. Opin. Neurobiol.* 19, 601–607.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 12 December 2012; accepted: 05 March 2013; published online: 21 March 2013.*

*Citation: de Rugy A, Loeb GE and Carroll TJ (2013) Are muscle synergies* *useful for neural control? Front. Comput. Neurosci. 7:19. doi: 10.3389/fncom. 2013.00019*

*Copyright © 2013 de Rugy, Loeb and Carroll. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

# The number and choice of muscles impact the results of muscle synergy analyses

*Katherine M. Steele<sup>1,2</sup>\*, Matthew C. Tresch<sup>2,3,4</sup> and Eric J. Perreault<sup>2,3,4</sup>*

*<sup>1</sup> Mechanical Engineering, University of Washington, Seattle, WA, USA*

*<sup>2</sup> Sensorimotor Performance Program, Rehabilitation Institute of Chicago, Chicago, IL, USA*

*<sup>3</sup> Biomedical Engineering, Northwestern University, Evanston, IL, USA*

*<sup>4</sup> Department of Physical Medicine and Rehabilitation, Northwestern University Feinberg School of Medicine, Chicago, IL, USA*

### *Edited by:*

*Andrea D'Avella, IRCCS Fondazione Santa Lucia, Italy*

### *Reviewed by:*

*Ioannis Delis, Istituto Italiano di Tecnologia, Italy Vincent C. K. Cheung, Massachusetts Institute of Technology, USA*

### *\*Correspondence:*

*Katherine M. Steele, Mechanical Engineering, University of Washington, Stevens Way, Box 352600, Seattle, WA 98195, USA e-mail: kmsteele@uw.edu*

One theory for how humans control movement is that muscles are activated in weighted groups or synergies. Studies have shown that electromyography (EMG) from a variety of tasks can be described by a low-dimensional space thought to reflect synergies. These studies use algorithms, such as nonnegative matrix factorization, to identify synergies from EMG. Due to experimental constraints, EMG can rarely be taken from all muscles involved in a task, and it is unclear whether the choice of muscles included in the analysis impacts estimated synergies. The aim of our study was to evaluate the impact of the number and choice of muscles on synergy analyses. We used a musculoskeletal model to calculate muscle activations required to perform an isometric upper-extremity task. Synergies calculated from the activations from the musculoskeletal model were similar to a prior experimental study. To evaluate the impact of the number of muscles included in the analysis, we randomly selected subsets of between 5 and 29 muscles and compared the similarity of the synergies calculated from each subset to a master set of synergies calculated from all muscles. We determined that the structure of synergies is dependent upon the number and choice of muscles included in the analysis. When five muscles were included in the analysis, the similarity of the synergies to the master set was only 0.57 ± 0.54; however, the similarity improved to over 0.8 with more than ten muscles. We identified two methods, selecting dominant muscles from the master set or selecting muscles with the largest maximum isometric force, which significantly improved similarity to the master set and can help guide future experimental design. Analyses that included a small subset of muscles also overestimated the variance accounted for (VAF) by the synergies compared to an analysis with all muscles. Thus, researchers should use caution when using VAF to evaluate synergies if EMG is measured from a small subset of muscles.

**Keywords: muscle synergy, electromyography, simulation, nonnegative matrix factorization, musculoskeletal model**

# **INTRODUCTION**

The human musculoskeletal system is complex, providing a robust and flexible system for executing tasks of daily life. A primary challenge for researchers, clinicians, and others seeking to evaluate human movement is to understand how we control this complex system. The musculoskeletal system is highly redundant, with more muscles than degrees of freedom. Thus, there are many different ways that muscles can be recruited to execute a given task. Understanding the control strategies used during movement can provide insight into pathologic conditions, optimize performance, and inspire the design of novel robotics.

One theory of the control of human movement suggests that muscles are activated in groups, commonly referred to as synergies or modes (Lee, 1984; Tresch et al., 1999; Krishnamoorthy et al., 2003; Ting and Macpherson, 2005; Ivanenko et al., 2006). Activating multiple muscles with a single control signal is theorized to provide a simplified system compared to controlling each muscle individually. Previous studies have shown that muscle activity during a variety of tasks in humans (Ting and Macpherson, 2005; Ivanenko et al., 2007; Cheung et al., 2012) and animals (Tresch et al., 2002; d'Avella and Bizzi, 2005) can be described by a low-dimensional space thought to reflect synergies. In these experiments, electromyography (EMG) is measured during a variety of tasks and matrix factorization algorithms, such as nonnegative matrix factorization (NNMF), are used to determine a subset of vectors, or synergies, which describe the EMG signals. For postural control, gait, and upper-extremity tasks in humans, a lower-dimensional space of four to six synergies has consistently been shown to describe muscle activity (Ivanenko et al., 2006; Torres-Oviedo et al., 2006; Roh et al., 2012).

Using NNMF or other algorithms to identify synergies relies upon measuring EMG from the muscles used to execute a task. However, due to constraints on time, resources, and subject comfort, EMG can usually only be measured for a subset of the muscles involved in the task. For example, in the human arm, there are over twenty muscles that may contribute to movement and force generation. Previous studies of synergies in the human arm typically only measure EMG from eight to nineteen muscles (d'Avella et al., 2006; Cheung et al., 2009; Roh et al., 2012). Researchers typically try to include as many muscles as possible within the constraints of their experimental set-up and often choose larger muscles thought to contribute to the task and from which it is easier to record surface EMGs. However, no rigorous analysis has been performed to determine how the number and choice of muscles included affects the results of synergy analyses.

A few studies have been able to measure EMG from all muscles involved in a task, such as Valero-Cuevas et al. (2009), who evaluated human index finger force generation. This study determined that when EMG was measured from all muscles involved in the task, more synergies were required to describe the variance in EMG activity and a lower percentage of variance in EMG activity was explained by the synergies compared to prior studies that included EMG for a subset of muscles. Furthermore, studies that have tried to use synergies to drive musculoskeletal simulations have found that the synergies identified from EMG are often inadequate for controlling motion. For example, an experimental analysis of synergies during gait did not include EMG of the iliopsoas, a deep muscle that is difficult to measure with EMG. Simulations indicated that a synergy that included the iliopsoas was required to control gait (Neptune et al., 2009). However, even from this analysis it was not possible to determine if the number of synergies was incorrect (i.e., an extra synergy for the iliopsoas was required) or the structure of the synergies was incorrect (i.e., the iliopsoas should have been included in one of the original synergies). These results suggest that synergies identified from a subset of muscles involved in a task may be inaccurate or incomplete. Understanding the sensitivity of synergy analyses to the number and choice of muscles will enable researchers to understand the limitations of these methods, design experimental protocols, and critically analyze the results of these analyses.

The aim of this study was to evaluate how the number and choice of muscles included in synergy analyses impact synergies calculated from EMG. We created a musculoskeletal model of an isometric force task in the upper-extremity and evaluated the effect of using different combinations of muscles to calculate synergies. By comparing the similarity of synergies from different combinations of muscles, we determined that synergies were affected by the number and choice of muscles included in the analysis. We also were able to identify methods, such as selecting the largest muscles, which can be used to design experimental protocols and decrease the sensitivity of synergy analyses to the number and choice of muscles.

# **MATERIALS AND METHODS**

### **MUSCULOSKELETAL MODEL**

**Table 1 | Musculotendon actuators included in model.**

*\*Maximum isometric force (Holzbaur et al., 2005)*

A previously developed model of the upper-extremity (Holzbaur et al., 2005) with 30 muscles (**Table 1**) was used to recreate an upper-extremity isometric force task that has previously been used to evaluate synergies (Roh et al., 2012). The model included seven degrees of freedom: three at the shoulder (flexion/extension, abduction/adduction, and internal/external rotation), elbow flexion/extension, forearm supination/pronation, and two at the wrist (flexion/extension and radial/ulnar deviation). The model was positioned according to the experimental protocol of Roh et al. (2012) with the hand at half an arm's length in front of the shoulder, the shoulder and elbow flexed, and the forearm and wrist in neutral positions. Muscle activations required to generate isometric forces in various directions at the hand were estimated by minimizing the sum of squared activations. Similar to the experimental protocol, the muscle activations to hold the force were examined and the periods ramping up or down from each force target were not included in the analysis. In the experimental protocol, the subjects generated forces in 54 or 210 directions evenly distributed in a sphere around the hand. Since, with a musculoskeletal model, we did not have the constraints of time, attention, or fatigue of the subject, we included 1000 force directions randomly distributed in a sphere around the hand. For each force direction, we solved for the muscle activations required to generate a ten-newton force. The musculoskeletal model and analysis were executed using OpenSim, an open-source software platform for musculoskeletal modeling and simulation (Delp et al., 2007).
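As a concrete illustration of the activation-estimation step, the sketch below solves for activations that reproduce a required hand force while minimizing the sum of squared activations. The muscle-to-force matrix `M` and its values are hypothetical stand-ins, and the unconstrained pseudoinverse solution omits the activation bounds that the optimization in OpenSim enforces:

```python
import numpy as np

# Toy stand-in for the model's muscle-to-endpoint-force mapping: 3 force
# components at the hand, 30 muscles (values are illustrative, not from the model).
rng = np.random.default_rng(0)
M = rng.normal(size=(3, 30))

# Target: a ten-newton force along one direction at the hand.
f = np.array([0.0, 0.0, 10.0])

# Minimum-norm solution of M @ a = f, i.e., the activations minimizing the sum
# of squared activations; the study additionally bounds activations to [0, 1],
# which this unconstrained sketch omits.
a = np.linalg.pinv(M) @ f
```

Repeating this solve for each of the 1000 force directions yields the activation matrix used in the synergy analysis below.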

### **SYNERGY ANALYSIS**

The muscle activations estimated from the musculoskeletal model during the upper-extremity isometric force task were used to calculate synergies. The muscle activations from all force directions were combined into an *m* × *t* matrix, V, where *m* was the number of muscles (i.e., 30) and *t* was the number of force directions (i.e., 1000). The activations for each muscle were normalized to unit-variance to ensure that the synergies were not biased toward high-variance muscles (Roh et al., 2012). NNMF was used to calculate synergies (Lee and Seung, 1999; Tresch et al., 1999, 2006) such that V = W∗C where W is the *m* × *n* matrix with *n* synergies and C is the *n* × *t* matrix of synergy activation coefficients. Thus, each column of W represents the weights of each muscle for one synergy, and each row of C represents how much the corresponding synergy was activated or used to generate force in each direction. The number of synergies, *n*, was set at four to compare to the prior experimental study. The NNMF algorithm was implemented within an iterative optimization which tested random initial estimates of W and C and selected the muscle weights and activation timings that minimized the sum of squared error between V and the muscle activations.
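The factorization and restart procedure described above can be sketched in NumPy. The multiplicative-update rules of Lee and Seung (1999) stand in here for whatever NNMF implementation was actually used, and the synthetic rank-four activation matrix is an illustrative assumption rather than the model's output:

```python
import numpy as np

def nnmf(V, n_synergies=4, n_iter=500, n_restarts=5, seed=0):
    """Factorize V (muscles x force directions) as V ~ W @ C using the
    multiplicative updates of Lee and Seung (1999), keeping the best of
    several random restarts, as in the iterative optimization described above."""
    rng = np.random.default_rng(seed)
    m, t = V.shape
    best_err, best_W, best_C = np.inf, None, None
    for _ in range(n_restarts):
        W = rng.random((m, n_synergies)) + 1e-9   # random initial estimates
        C = rng.random((n_synergies, t)) + 1e-9
        for _ in range(n_iter):
            C *= (W.T @ V) / (W.T @ W @ C + 1e-9)
            W *= (V @ C.T) / (W @ C @ C.T + 1e-9)
        err = np.linalg.norm(V - W @ C)           # sum-of-squared-error criterion
        if err < best_err:
            best_err, best_W, best_C = err, W, C
    return best_W, best_C

# Synthetic activations with a known 4-synergy structure (hypothetical data).
rng = np.random.default_rng(1)
V = rng.random((30, 4)) @ rng.random((4, 1000))   # 30 muscles x 1000 directions
V /= V.std(axis=1, keepdims=True)                 # unit-variance normalization
W, C = nnmf(V, n_synergies=4)
vaf = 1.0 - np.sum((V - W @ C) ** 2) / np.sum(V ** 2)   # variance accounted for
```

Here VAF is computed as one minus the ratio of squared reconstruction error to total squared activation, one common convention for the total variance accounted for.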

To demonstrate that our simulation was consistent with experimental observation, we first compared the synergies estimated from the musculoskeletal model to the synergies from the experimental protocol reported by Roh et al. (2012). The experimental protocol included EMG from eight muscles: the brachioradialis, biceps brachii, triceps brachii (long and lateral heads), deltoid (anterior, medial, and posterior fibers), and pectoralis major (clavicular fibers). Thus, for this comparison, we used the activations from the musculoskeletal model for the eight muscles with EMG to calculate synergies using NNMF. We compared the synergies from the musculoskeletal model to the experimental synergies from eight unimpaired subjects. We calculated the similarity of the synergies as the average correlation coefficient. To evaluate if the synergies from the simulation were within the inter-subject variability, we compared the synergies from the musculoskeletal model to the experimental synergies of each subject. We calculated the similarity of the experimental synergies from each subject to one another to evaluate the inter-subject variability. Each subject's synergies were then compared to the simulated synergies to evaluate the similarity between the experimental and simulated synergies. We used an equivalence test to determine if the similarity of the experimental and simulated synergies was within the inter-subject similarity with a significance level of 0.05. For both the inter-subject similarity and the similarity between experimental and simulated synergies, we report the 95% confidence intervals.

### **IMPACT OF NUMBER OF MUSCLES ON SYNERGIES**

**Figure 1 |** Synergies calculated from random subsets of the muscles required to perform the isometric upper-extremity force task were compared to the corresponding muscles of the master set; the similarity of the synergies was compared as the average correlation coefficient.

To evaluate the impact of the number of muscles included in the analysis on the resultant synergies, we compared the synergies calculated from random subsets of muscles to the "master set" of synergies (**Figure 1**). The master set of synergies was determined from the activations of all 30 muscles and 1000 force directions using NNMF. We then calculated synergies from the muscle activations of random subsets of muscles. We evaluated subsets that included odd numbers of muscles between 5 and 29. For each number of muscles (e.g., five muscles), we selected 1000 random combinations, or as many combinations as possible given 30 muscles, and calculated synergies from the activations of these muscles using NNMF. The synergies from each subset were then compared to the same subset of muscles isolated from the master set. For each combination, the synergies were normalized to unit length and similarity was evaluated as the average correlation coefficient between the subset of the master set and the synergies calculated from the subset of muscle activations. The average correlation coefficient was determined by matching the pairs of synergies from the master set and subset that had the greatest similarity and averaging the correlation coefficients across the pairs. The correlation coefficient was normalized from zero to one, where zero is the similarity expected by chance and one is perfect similarity (Tresch et al., 2006). The similarity expected by chance was calculated as the average correlation coefficient comparing each set of synergies to 25 randomly generated sets of synergies of the same size (i.e., same number of muscles and force directions). Across all analyses, four synergies were calculated from NNMF.
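The matched-pair similarity and its normalization against chance (cf. Tresch et al., 2006) might be computed as below; with four synergies, the best pairing can be found by exhaustive search over the 4! orderings. The random data are placeholders:

```python
import numpy as np
from itertools import permutations

def synergy_similarity(W1, W2):
    """Average correlation coefficient between best-matched pairs of synergies
    (columns of W1 and W2), maximized over all pairings."""
    n = W1.shape[1]
    corr = np.corrcoef(W1.T, W2.T)[:n, n:]        # cross-correlations of synergies
    return max(np.mean([corr[i, p[i]] for i in range(n)])
               for p in permutations(range(n)))

def normalized_similarity(W_a, W_b, n_random=25, seed=0):
    """Rescale similarity so 0 is the level expected by chance (estimated from
    randomly generated synergy sets of the same size) and 1 is perfect."""
    rng = np.random.default_rng(seed)
    chance = np.mean([synergy_similarity(W_a, rng.random(W_b.shape))
                      for _ in range(n_random)])
    return (synergy_similarity(W_a, W_b) - chance) / (1.0 - chance)

# Sanity check: the same synergies in shuffled order match perfectly.
rng = np.random.default_rng(2)
W_master = rng.random((30, 4))
sim = synergy_similarity(W_master, W_master[:, [2, 0, 3, 1]])
norm_sim = normalized_similarity(W_master, W_master)
```

The exhaustive matching avoids the greedy pairing's failure mode, in which an early match can force a poor match later; for four synergies the cost is negligible.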

To evaluate the impact of changes in synergy weights, W, on the recruitment of synergies, C, we compared the directional tuning of synergies. The directional tuning was calculated as the dot product of the activation level of each synergy and each force direction. The resulting direction indicates the force direction for which a given synergy was highly recruited. The impact of the number of muscles on synergy recruitment was evaluated as the average angle between the directional tuning of the synergies from each subset of muscles and the master set.
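Under one reading of the description above, each synergy's directional tuning is the activation-weighted (dot-product) combination of the unit force-direction vectors, and tuning differences are angles between the resulting vectors. The toy directions and coefficients here are assumptions for illustration:

```python
import numpy as np

def directional_tuning(C, directions):
    """Preferred force direction of each synergy: the dot product of its
    activation levels with the unit force-direction vectors, normalized to
    unit length (one interpretation of the computation described above)."""
    tuning = C @ directions                       # n_synergies x 3
    return tuning / np.linalg.norm(tuning, axis=1, keepdims=True)

def tuning_angle_deg(u, v):
    """Angle in degrees between two unit tuning vectors."""
    return np.degrees(np.arccos(np.clip(np.dot(u, v), -1.0, 1.0)))

# Toy example: two force directions along +x and +y.
directions = np.array([[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0]])
C = np.array([[1.0, 0.0],    # synergy recruited only for +x forces
              [0.5, 0.5]])   # synergy recruited equally for +x and +y
tuning = directional_tuning(C, directions)
angle = tuning_angle_deg(tuning[0], tuning[1])
```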

Experimental EMG is commonly a noisy signal. To evaluate the impact of noise on synergies we repeated the analysis for each number of muscles between 5 and 29 with varying levels of noise. Noise was added to the estimated activations as a random normal distribution with a signal-to-noise ratio (SNR) between 0 and 20 dB, adjusted for the level of activity in each muscle. The similarity of the estimated synergies with noise was compared to the master set, as described above.
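A plausible implementation of this noise model is sketched below; scaling the noise power to each muscle's mean squared activity and clipping negative values (so the signal remains valid NNMF input) are assumptions consistent with, but not dictated by, the description above:

```python
import numpy as np

def add_emg_noise(V, snr_db, seed=0):
    """Add zero-mean Gaussian noise at a target signal-to-noise ratio (in dB),
    scaled to each muscle's activity level; activations are clipped at zero so
    the result stays valid input for NNMF. A hypothetical implementation of
    the procedure described above."""
    rng = np.random.default_rng(seed)
    signal_power = np.mean(V ** 2, axis=1, keepdims=True)   # per-muscle power
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))  # SNR = P_s / P_n
    noisy = V + rng.normal(0.0, np.sqrt(noise_power), size=V.shape)
    return np.clip(noisy, 0.0, None)

rng = np.random.default_rng(3)
V = rng.random((30, 1000)) + 0.5     # hypothetical activations, bounded away from 0
noisy = add_emg_noise(V, snr_db=20.0)
snr_measured = 10.0 * np.log10(np.mean(V ** 2) / np.mean((noisy - V) ** 2))
```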

### **IMPACT OF THE CHOICE OF MUSCLES ON SYNERGIES**

To improve the similarity of synergies estimated from a subset of muscles to the master set, we evaluated different methods for choosing which muscles to include in the analysis. We evaluated two protocols for choosing muscles and compared the protocols to randomly selected subsets of muscles. We first evaluated the impact of selecting random sets of muscles from the dominant muscles of the master set of synergies. Dominant muscles were defined as muscles whose weight was within twenty percent of the maximum weight for each synergy; 22 of the 30 muscles met this criterion in at least one synergy. We varied the threshold for defining dominant muscles from 5 to 30% and found that the similarity results were comparable. To select a subset of muscles from the group of dominant muscles, an equal number of muscles were chosen from the dominant muscles of each synergy. For example, for combinations of five muscles, one dominant muscle was selected from each of the four synergies and then the final muscle was randomly selected from all the remaining dominant muscles. We compared the similarity of random combinations of 5–21 muscles selected from the dominant muscles to the similarity of the random combinations of muscles described above.

Selecting dominant muscles based on a synergy analysis of all relevant muscles requires that researchers have access to a musculoskeletal model appropriate for simulating their experimental protocol. For cases when this requirement may not be practical, we also evaluated the impact of selecting muscles according to size. We selected the largest muscles, according to maximum isometric force (**Table 1**), and determined the similarity to the master set for subsets that included from 5 to 29 of the largest muscles. These methods were evaluated to provide guidance for experimental protocols and to assist researchers in deciding which muscles to measure EMG from for synergy analyses.
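Both selection protocols can be sketched directly; the weight matrix and isometric forces below are small hypothetical examples, not the values from the model or Table 1:

```python
import numpy as np

def dominant_muscle_set(W_master, threshold=0.2):
    """Muscles whose weight is within `threshold` (default 20%) of the maximum
    weight in at least one synergy (column of W_master), per the definition above."""
    cutoff = W_master.max(axis=0) * (1.0 - threshold)   # per-synergy cutoffs
    rows = np.argwhere(W_master >= cutoff)[:, 0]
    return sorted(set(rows.tolist()))

def largest_muscles(max_isometric_forces, n):
    """Indices of the n muscles with the largest maximum isometric force
    (e.g., the values tabulated in Table 1)."""
    return list(np.argsort(max_isometric_forces)[::-1][:n])

# Toy example with 4 muscles and 2 synergies (hypothetical weights and forces).
W = np.array([[1.0, 0.1],
              [0.85, 0.2],
              [0.1, 1.0],
              [0.2, 0.75]])
forces = np.array([500.0, 200.0, 800.0, 350.0])
dom = dominant_muscle_set(W)       # muscles 0 and 1 dominate synergy 1; muscle 2, synergy 2
big = largest_muscles(forces, 2)   # the two strongest muscles
```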

# **RESULTS**

### **COMPARISON OF EXPERIMENTAL AND MODEL SYNERGIES**

Synergies estimated from a musculoskeletal model of an upper-extremity isometric force task were similar to synergies calculated from experimental EMG (**Figure 2**). From the experimental EMG, the average similarity of synergies between subjects was 0.79 ± 0.10 (95% CI: 0.75–0.82). We also calculated synergies from the musculoskeletal model using the activations of the eight muscles that had EMG in the experimental protocol. The average similarity of the synergies from the musculoskeletal model to the eight subjects' experimental EMG was 0.72 ± 0.10 (95% CI: 0.65–0.78). Thus, the similarity of synergies estimated from the musculoskeletal model to the experimental synergies was slightly less than the inter-subject similarity of synergies from experimental EMG, but not significantly different. From **Figure 2**, the primary difference between the model and experimental synergies was the grouping of the posterior deltoid. In the synergies from the model, the posterior deltoid (DELT3) was grouped with the triceps while, in the synergies from the experimental EMG, the posterior deltoid was grouped with the other compartments of the deltoid. This may be due to the simplified shoulder (i.e., ball and socket) used in the model, which does not include the posterior deltoid's role in controlling other shoulder degrees of freedom.

### **SYNERGIES CALCULATED FROM ALL MUSCLES**

**Figure 2 |** Synergies from the musculoskeletal model (dark gray bars) and experimental EMG (black outlined bars showing average ± one standard deviation and light gray bars showing synergies of individual subjects). The similarity of the synergies from the musculoskeletal model and the experimental EMG was not significantly different from the inter-subject similarity of synergies.

The relative weightings and grouping of muscles in the synergies calculated from all 30 muscles differed from the synergies calculated from the eight muscles included in the experimental protocol (**Figure 3**). When all 30 muscles were included in the analysis, four synergies described 88% of the total variance in muscle activity. Similar to the analysis with eight muscles, one synergy was dominated by the biceps (**Figure 3**, see Synergy 1) and another synergy was dominated by the triceps (see Synergy 2); however, the dominant muscles of the other synergies included muscles that were not included in the experimental analysis, such as the latissimus muscles (see Synergy 3) and the shoulder rotator cuff and forearm muscles (see Synergy 4). Additionally, the grouping and relative weights of muscles differed with thirty versus eight muscles. For example, in the analysis with eight muscles, the deltoids dominated Synergy 4 and were also coupled with the pectoralis major clavicular (PECM1) in Synergy 3. However, when all 30 muscles were included in the analysis, the deltoids did not dominate one of the synergies and were no longer coupled with PECM1, suggesting fundamental differences in the grouping of muscles. The changes in the synergy weights, W, also altered the synergy activations in the C matrix. The directional tunings of synergies calculated from all 30 muscles were different from synergies calculated from the subset of eight muscles used experimentally, with differences in direction ranging from 12.2° to 74.5° (**Figure 4**).

If five synergies were included in the analysis, the variance accounted for increased to 92% and included four synergies with the same dominant muscles as the analysis with four synergies. The dominant muscles of the fifth synergy included the posterior deltoid and supraspinatus. To maintain consistency with the prior experimental analysis, four synergies were used for all subsequent analyses.

### **THE IMPACT OF NUMBER OF MUSCLES ON SYNERGIES**

**Figure 5 | (A)** Similarity of synergies calculated from random subsets of muscles to the master set of synergies from all 30 muscles (light gray bars, average ± 1 standard deviation). The dark gray bars show the similarity expected by chance for each number of muscles included in the analysis. **(B)** Average similarity of synergies normalized to the similarity expected by chance. **(C)** As the number of muscles included in the analysis increased, the total variance accounted for approached the variance accounted for when all 30 muscles were included in the analysis (dotted line).

The number of muscles included in the synergy analysis impacted the results from the NNMF algorithm (**Figure 5**). We compared the similarity of the synergies calculated from random subsets that included between 5 and 29 muscles to the synergies calculated from all 30 muscles (the master set). The average similarity of the random subsets to the master set was greater than 0.8 for all subsets that included between 5 and 29 muscles (**Figure 5A**). However, the similarity expected by chance increased when fewer muscles were included in the analysis. When only five muscles were included in the analysis, the similarity expected by chance was 0.63 (see dark bars, **Figure 5A**). Thus, the average normalized similarity (with 0 equal to similarity expected by chance) was only 0.57 ± 0.54 with five muscles (**Figure 5B**) and remained below 0.8 when fewer than 11 muscles were included in the analysis. When a small number of muscles were included in the analysis, the variance in similarity was also greater between subsets. For example, with five muscles, the average normalized similarity was 0.57 but some combinations of muscles approached perfect similarity while the similarity of other combinations was not different from similarity expected by chance. The set of five muscles with the greatest normalized similarity (0.999) included the triceps lateral head, teres major, supinator, latissimus dorsi inferior, and brachioradialis. The normalized similarity of the subset of eight muscles used in the experimental protocol to the master set of synergies was 0.52.

The difference in directional tunings of the synergy activations (C matrix) between subsets of muscles and the master set decreased as more muscles were included in the analysis. The average difference in directional tuning compared to the master set of synergies was 26.2° (±15.3°) when only five muscles were included; however, the difference in directional tuning decreased to 12.7° (±11.0°) when 15 muscles were included and approached zero degrees as the number of muscles increased. Differences in directional tuning would indicate errors in interpreting how synergies are recruited to produce force in various directions.

Total variance accounted for is often used to determine the number of synergies to include in an analysis and to evaluate how well a set of synergies reproduces muscle activity. The average total variance accounted for decreased as the number of muscles included in the analysis increased (**Figure 5C**). For example, the average total variance accounted for was 0.98 ± 0.03 when only five muscles were included in the analysis and decreased toward the total variance accounted for by the master set of synergies, 0.88 (see dotted line, **Figure 5C**), as more muscles were included in the analysis. Four synergies could more easily describe the variability in muscle activity when fewer muscles were included in the analysis. These results demonstrate that experimental analyses that include fewer muscles may over-estimate the total variance accounted for compared to an analysis that includes all muscles involved in a task.

The task simulated in this analysis included 1000 force directions; however, the experimental protocol included either 54 or 210 force directions. We evaluated the impact of the number of force directions on the normalized similarity and total variance accounted for. The total variance accounted for was not sensitive to the number of force directions; however, the normalized similarity was reduced when fewer than 100 force directions were included. Additionally, there was greater variability in the normalized similarity between trials when fewer force directions were included, depending upon the choice and dispersion of the force directions.

### **THE IMPACT OF NOISE ON SYNERGIES**

Surface EMG is an inherently noisy signal; however, the activations from a musculoskeletal model estimate muscle activity without noise. Thus, we sought to determine the impact of noise on the similarity of synergies to the master set of synergies. Increasing noise decreased the average normalized similarity (**Figure 6**), especially for combinations that included fewer than 15 muscles. Noise with an SNR greater than 10 dB had minimal effect on combinations that included more than 15 muscles. These results emphasize the importance of maintaining a high SNR, especially when fewer muscles are included in the analysis.

### **PROTOCOLS TO IMPROVE SIMILARITY OF SYNERGIES**

To aid researchers in selecting muscles to include in an experimental protocol for synergy analyses, we evaluated several methods that could minimize the impact of measuring EMG from a subset of muscles. The most successful method involved selecting a subset of muscles evenly distributed across the dominant muscles from the master set of synergies (**Figure 7**). The dominant muscles were defined as muscles that were within 20% of the maximum for each synergy; however, the effectiveness of this method remained similar if the cut-off for defining dominant muscles was varied between 5 and 30%. This protocol resulted in a normalized similarity to the master set greater than 0.95 for subsets including between 5 and 29 muscles.

We also evaluated a method in which the subset of muscles was selected based on muscle size, using the maximal isometric force of each muscle. Such a method might be useful for the general situation where a musculoskeletal model is not available but overall muscle sizes might be. Selecting the largest muscles significantly improved the similarity to the master set. The normalized similarity of synergies calculated from the five largest muscles was 0.75 and all combinations with more than seven muscles had an average normalized similarity greater than 0.9.

# **DISCUSSION**

In this study we sought to determine whether the number and choice of muscles with EMG in an experimental protocol impact the synergies identified using matrix factorization algorithms, such as NNMF. We found that the number and choice of muscles do impact the structure of synergies and the amount of variance in muscle activity accounted for by a given set of synergies. However, we were also able to identify several strategies that can be used to minimize the impact of using a subset of muscles. We also compared our results from the musculoskeletal model to experimental results and found similar synergies, suggesting that results from musculoskeletal modeling were comparable to experimental conditions and can provide a platform for investigating muscle synergies.

The average similarity of synergies to the master set dropped below 0.8 when fewer than eleven muscles were included in the analysis. Prior studies of synergies during upper-extremity tasks have included between 8 and 19 muscles and thus may be significantly impacted by the choice of muscles. For example, our comparison to synergies from eight muscles used in the experimental protocol by Roh et al. (2012) had a low normalized similarity to the master set of only 0.52. Although the structure of two of the four synergies, dominated by the biceps and triceps, were similar, the other two synergies identified from the master set were dominated by muscles not included in the experimental protocol. Furthermore, the relative weighting and grouping of muscles also changed significantly when all 30 muscles were included in the analysis. Although synergies identified from a subset of muscles may be able to describe the variance in EMG activity, they may not correctly reflect how muscles are recruited or activated together which can impact the functional interpretations of synergies. Previous studies of similar tasks have also identified synergies with different structures and dominant muscles which may be due to the different muscles included in the experimental protocols (e.g., Cheung et al., 2012; Roh et al., 2013). Evaluating muscle synergies from a subset of muscles may still be valuable for comparing populations, such as unimpaired individuals and individuals after stroke, if the same subset of muscles is used for all groups. However, the limitations of using a subset of muscles involved in a task should be considered in analyzing the results of synergy analyses and in generalizing results to the over-arching neuromuscular control strategy.

Variance accounted for is also commonly used as a measure to evaluate the results of muscle synergy analyses and to determine the number of synergies used in a given task. However, the results of this study highlight that when fewer muscles are included in the analysis, the variance accounted for is over-estimated. Using variance accounted for to determine the number of synergies may result in fewer muscle synergies being selected than if all muscles were included in the analysis. The impact of the number of muscles included on variance accounted for suggests that complementary methods, such as using the ability of synergies to discriminate between tasks (Delis et al., 2013), should be used to determine the number of synergies in experimental protocols that include a small subset of muscles.

To assist with future experimental design, we identified several strategies to select muscles which can improve similarity when only a subset of muscles is included due to experimental constraints. Selecting from the dominant muscles of the synergies identified from musculoskeletal simulation was the most successful method. By selecting an equal number of dominant muscles from each synergy in the master set, the average similarity increased to over 0.95, even for cases with only five muscles. This approach worked well because it identified important muscles from each synergy of the master set which translated to similar synergies identified from NNMF. However, since musculoskeletal simulation may not always be available for analysis, we determined that selecting the largest muscles, as determined by maximum isometric force, also improved similarity. The largest muscles have the greatest contribution to movement and force generation and overlap with the dominant muscles identified with musculoskeletal simulation. These strategies for choosing which muscles to include in an experimental protocol are important for decreasing the sensitivity of synergies to experimental constraints and improving our understanding of the generalizability of synergy analyses.

The synergies calculated from the musculoskeletal model were similar to the experimental synergies; however, it is important to note that the muscle activations from the model were determined without reference to synergies. Similar to previous studies, we determined the muscle activations required to perform the task by minimizing the sum of squared activations. Thus, no synergies or other coupling between muscles was incorporated into the musculoskeletal model. Previous studies have suggested that the lower-dimensional subspace determined from matrix factorization algorithms may be more reflective of biomechanical or task constraints rather than the underlying neuromuscular control strategy (Valero-Cuevas et al., 2009; Kutch and Valero-Cuevas, 2012; Burkholder and Van Antwerp, 2013). The similarity between the synergies calculated from the model and experiment in this study further reinforce this theory since we could recover similar synergies as the experimental task in the absence of a control strategy based upon synergies.

This study also demonstrated how musculoskeletal simulation can be used to complement and optimize experimental design for muscle synergy analyses. Based upon the posture, kinematics, and external forces for an experimental protocol, musculoskeletal simulation can be used to estimate expected muscle forces and test a priori the impact of experimental constraints such as the number of muscles with EMG. Musculoskeletal simulation can also be used to predict the functional impacts of altered synergies or test if synergies identified from matrix factorization algorithms can control movement (Neptune et al., 2009; Allen and Neptune, 2012). Free musculoskeletal simulation platforms such as OpenSim (Delp et al., 2007) provide a variety of human and animal models as well as the simulation algorithms that can be used for these analyses.

Matrix factorization algorithms provide a valuable tool for evaluating neuromuscular control of movement through the framework of synergies. Synergy analyses can provide insight into lower-dimensional subspaces that describe muscle activity during a variety of tasks and may reflect the underlying strategies for controlling the complexities of the neuromuscular and musculoskeletal systems. Muscle synergy analyses are increasingly being used to evaluate altered neuromuscular control in clinical populations, such as individuals after stroke (Clark et al., 2010; Cheung et al., 2012; Roh et al., 2013). As the applications of muscle synergy analysis reach the clinical realm, it is even more important to understand the limitations and generalizability of these methods. Understanding if the structure of synergies is altered in clinical populations because of different control strategies, altered biomechanics, or other factors will be critical for using synergy analyses to improve treatment and will require careful experimental design. This study has demonstrated that, although synergies estimated from NNMF are sensitive to the number and choice of muscles, there are multiple strategies that can be employed to improve experimental design and decrease the sensitivity of these analyses to experimental constraints. Researchers should especially note the increased risk of over-estimating the variance accounted for by synergies when fewer muscles are used in an experimental analysis. Combining simulation and experimental studies provides a complementary platform to address these challenges and continue to refine our knowledge of how humans control movement and interact with the world.

# **ACKNOWLEDGMENTS**

The authors would like to thank Samuel Hamner for his assistance with the upper-extremity model, as well as Jinsook Roh and Claire Honeycutt for their feedback and discussion. This work was funded by NIH K12HD073945, 5R24HD050821-09 and R01 NS053813.

Clark, D. J., Ting, L. H., Zajac, F. E., Neptune, R. R., and Kautz, S. A. (2010). Merging of healthy motor modules predicts reduced locomotor performance and muscle coordination complexity post-stroke. *J. Neurophysiol.* 103, 844–857. doi: 10.1152/jn.00825.2009

Holzbaur, K. R. S., Murray, W. M., and Delp, S. L. (2005). A model of the upper extremity for simulating musculoskeletal surgery and analyzing neuromuscular control. *Ann. Biomed. Eng.* 33, 829–840. doi: 10.1007/s10439-005-3320-7

Ting, L. H., and Macpherson, J. M. (2005). A limited set of muscle synergies for force control during a postural task. *J. Neurophysiol.* 93, 609–613. doi: 10.1152/jn.00681.2004

Valero-Cuevas, F. J., Venkadesan, M., and Todorov, E. (2009). Structured variability of muscle activations supports the minimal intervention principle of motor control. *J. Neurophysiol.* 102, 59–68. doi: 10.1152/jn.90324.2008

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 03 May 2013; accepted: 13 July 2013; published online: 08 August 2013. Citation: Steele KM, Tresch MC and Perreault EJ (2013) The number and choice of muscles impact the results of muscle synergy analyses. Front. Comput. Neurosci. 7:105. doi: 10.3389/fncom. 2013.00105*

*Copyright © 2013 Steele, Tresch and Perreault. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Quantitative evaluation of muscle synergy models: a single-trial task decoding approach

*Ioannis Delis 1,2\*, Bastien Berret 1,3, Thierry Pozzo 1,4,5 and Stefano Panzeri 6,7\**

*<sup>1</sup> Robotics, Brain and Cognitive Sciences Department, Istituto Italiano di Tecnologia, Genoa, Italy*

*<sup>2</sup> Communication, Computer and System Sciences Department, Doctoral School on Life and Humanoid Technologies, University of Genoa, Genoa, Italy*


*<sup>6</sup> Center for Neuroscience and Cognitive Systems @UniTn, Istituto Italiano di Tecnologia, Rovereto, Italy*

*<sup>7</sup> Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, UK*

### *Edited by:*

*Andrea D'Avella, IRCCS Fondazione Santa Lucia, Italy*

### *Reviewed by:*

*Simon Giszter, Drexel Med School, USA*

*Dominik M. Endres, HIH, CIN, BCCN and University of Tübingen, Germany*

### *\*Correspondence:*

*Ioannis Delis, Robotics, Brain and Cognitive Sciences Department, Istituto Italiano di Tecnologia, Via Morego 30, 16163 Genoa, Italy. e-mail: ioannis.delis@iit.it; Stefano Panzeri, Institute of Neuroscience and Psychology, University of Glasgow, 58 Hillhead Street, Glasgow G12 8QB, UK. e-mail: stefano.panzeri@glasgow.ac.uk*

Muscle synergies, i.e., invariant coordinated activations of groups of muscles, have been proposed as building blocks that the central nervous system (CNS) uses to construct the patterns of muscle activity utilized for executing movements. Several efficient dimensionality reduction algorithms that extract putative synergies from electromyographic (EMG) signals have been developed. Typically, the quality of synergy decompositions is assessed by computing the Variance Accounted For (VAF). Yet, little is known about the extent to which the combination of those synergies encodes task-discriminating variations of muscle activity in individual trials. To address this question, here we conceive and develop a novel computational framework to evaluate muscle synergy decompositions in task space. Unlike previous methods considering the total variance of muscle patterns (VAF based metrics), our approach focuses on variance discriminating execution of different tasks. The procedure is based on single-trial task decoding from muscle synergy activation features. The task decoding based metric evaluates quantitatively the mapping between synergy recruitment and task identification and automatically determines the minimal number of synergies that captures all the task-discriminating variability in the synergy activations. In this paper, we first validate the method on plausibly simulated EMG datasets. We then show that it can be applied to different types of muscle synergy decomposition and illustrate its applicability to real data by using it for the analysis of EMG recordings during an arm pointing task. We find that time-varying and synchronous synergies with similar number of parameters are equally efficient in task decoding, suggesting that in this experimental paradigm they are equally valid representations of muscle synergies. Overall, these findings stress the effectiveness of the decoding metric in systematically assessing muscle synergy decompositions in task space.

### **Keywords: muscle synergies, reaching, arm movement, task decoding, single-trial analysis**

# **INTRODUCTION**

The question of how the central nervous system (CNS) coordinates muscle activity to produce movements is central to the understanding of motor control (Tresch et al., 1999). The human brain has to deal with a redundant musculoskeletal system comprising approximately 600 muscles actuating approximately 200 joints. It has been suggested that the CNS reduces the complexity of this control problem by exploiting various types of modularity present in the motor system (Bizzi et al., 2002, 2008; Flash and Hochner, 2005; Berret et al., 2009). A prominent example of such modularity is given by muscle synergies (D'Avella et al., 2003; Ting and McKay, 2007), loosely defined as stereotyped patterns of coordinated activations of groups of muscles. According to this hypothesis, the muscle patterns driving movements originate from linear combinations of a small number of synergies presumably recruited by a premotor drive generated by

some neuronal population (Delis et al., 2010; Hart and Giszter, 2010).

A relatively standard approach to individuate putative muscle synergies from EMG recordings of multiple muscles (**Figure 1B**) while subjects perform a variety of motor tasks (**Figure 1A**) is to first apply dimensionality reduction techniques to decompose the recorded EMGs into a set of synergies (**Figure 1C**), and then to assess the validity of the decomposition using measures of goodness of approximation such as the Variance Accounted For (VAF) (D'Avella et al., 2006; Torres-Oviedo et al., 2006). This analytical approach has yielded valuable insights and hypotheses about the structure of modularity in muscle space.

Yet, actions are defined in task space and evaluation of the functional role of muscle synergies requires relating them explicitly to the execution of motor tasks (Todorov et al., 2005; Nazarpour et al., 2012; Ting et al., 2012). For this reason, the

analytical framework needed to critically test synergy decompositions on empirical data should include additional elements that are currently not considered systematically. First, synergies must constitute not only a low dimensional but also a functional representation of a variety of motor tasks (Overduin et al., 2008). Thus, to evaluate how well muscle synergy recruitment relates to differences across motor tasks, we ought to quantify how well the executed motor tasks can be distinguished on the basis of the synergy activation coefficients (Brochier et al., 2004; Torres-Oviedo and Ting, 2010; Chvatal et al., 2011).

Second, the CNS generates appropriate motor behaviors on single trials, thus synergy recruitment must accurately describe single-trial muscle activations (Tresch et al., 1999; Torres-Oviedo and Ting, 2007; Roh et al., 2011; Ranganathan and Krishnan, 2012). Third, to evaluate the extent to which muscle synergies implement a dimensionality reduction, the number of synergies that discriminate among different task-related movements must be correctly identified and compared to the degrees of freedom of the musculoskeletal system (e.g., the number of muscles) and to the number of tasks (Ting and McKay, 2007). The estimation of which synergies are needed to fully describe all the task-discriminating movement variations is therefore crucial. Moreover, muscle synergies can contribute to motor function even if their activations do not depend on the task at hand (e.g., if they relate to body posture or reflect biomechanical constraints). To understand the function of a putative set of synergies, it is important to be able to easily tease apart task-dependent and task-independent synergies.

In practice, however, not only the identity of these synergies, but also their number and their contribution to task-discriminating variations, are unknown a priori. To select the number of synergies, the most widely used criteria rely on the dependence of the amount of variance explained upon the number of synergies. However, criteria solely based on VAF also depend substantially on factors not related to task execution, such as noise (neural variability, measurement fluctuations, etc.) and preprocessing (filtering, averaging of EMG signals, etc.), and cannot distinguish synergies that describe task-to-task variations from synergies that do not. We argue that synergy model selection should reflect not only the reconstruction of the dataset but also include a way to assess the reliability of the associated mapping from synergy recruitment to motor task identification.

To address these needs, here we propose, implement and validate on EMG data a method (schematized in **Figure 1**) for predicting, on a single-trial basis, the motor task from the synergy activation parameters. The method is based on quantifying the task discriminability afforded by one or more synergies, and then using this for an automated objective selection of the minimal set of synergies containing all information about such salient task-related differences in synergy activation patterns. We validate the robustness and applicability of the method using simulated EMG datasets. We finally apply it to real EMG recordings during a reaching task to illustrate how this method may be used for individuating synergy sets relevant for the execution of a given set of tasks, and for evaluating in task space the effectiveness of different types of muscle synergy decompositions.

# **MATERIALS AND METHODS**

# **MUSCLE SYNERGY EXTRACTION**

To identify muscle synergies, we used the time course of EMG activity of all recorded/simulated muscles in all individual trials for each task. We considered two well-established mathematical models for the representation of muscle patterns as synergy combinations: synchronous and time-varying.

### *Synchronous synergy model*

We used the Non-negative Matrix Factorization (NMF) algorithm (Lee and Seung, 1999) to extract synchronous synergies. In this model, the EMGs are represented as a linear combination of a set of time-invariant activation balance profiles across all muscles, each activated by a time-dependent activation coefficient:

$$m^s(t) = \sum\_{i=1}^{N} c\_i^s(t) \, w\_i + \varepsilon^s(t) \tag{1}$$

where $m^s(t)$ is again the EMG data of all muscles at time $t$; $w\_i$ is the synergy vector for the $i$-th synergy; $c\_i^s(t)$ is the scalar coefficient for the $i$-th synergy at time $t$; $N$ is the total number of synergies composing the dataset; and $\varepsilon^s(t)$ is the residual (e.g., noise). $N$ is an input to the NMF algorithm, so we varied the number of extracted synergies from 1 to 8. In this case, the sample-independent muscle synergies $w\_i$ are time-invariant vectors and the parameters that have to be modified in each sample $s$ are the time-varying waveforms $c\_i^s(t)$ (Cheung et al., 2005); the superscript $s$ is used to denote sample-dependent quantities.
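As an illustration, the decomposition of Equation 1 can be computed with the multiplicative-update rules of Lee and Seung (1999). The following is a minimal sketch on invented toy data (muscle counts, iteration budget, and data are illustrative; this is not the implementation used in the study):

```python
import numpy as np

def extract_synchronous_synergies(M, N, n_iter=500, seed=0):
    """Multiplicative-update NMF (Lee and Seung, 1999): factorize the
    non-negative data M (muscles x time samples) as W @ C, where the columns
    of W are the N synergy vectors and the rows of C their time-dependent
    activation coefficients."""
    rng = np.random.default_rng(seed)
    W = rng.random((M.shape[0], N)) + 1e-6
    C = rng.random((N, M.shape[1])) + 1e-6
    for _ in range(n_iter):
        C *= (W.T @ M) / (W.T @ W @ C + 1e-12)   # update activation coefficients
        W *= (M @ C.T) / (W @ C @ C.T + 1e-12)   # update synergy vectors
    return W, C

# Toy data: 9 muscles, 200 time samples, built from 3 known synergies.
rng = np.random.default_rng(1)
M = rng.random((9, 3)) @ rng.random((3, 200))
W, C = extract_synchronous_synergies(M, 3)
vaf = 1 - np.sum((M - W @ C) ** 2) / np.sum((M - M.mean()) ** 2)
```

The multiplicative updates keep both factors non-negative at every iteration, which is why NMF is the standard choice for this model.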

### *Time-varying synergy model*

We used the time-varying synergies model first introduced in (D'Avella and Tresch, 2002). According to it, a muscle pattern recorded during one sample *s* is decomposed into *N* time-varying muscle synergies combined as follows:

$$m^s(t) = \sum\_{i=1}^{N} c\_i^s \, w\_i(t - t\_i^s) + \varepsilon^s(t) \tag{2}$$

where $m^s(t)$ is a vector of real numbers, each component of which represents the activation of a specific muscle at time $t$; $w\_i(\tau)$ is a vector representing the muscle activations for the $i$-th synergy at time $\tau$ after the synergy onset; $t\_i^s$ is the time of synergy onset; $c\_i^s$ is a non-negative scaling coefficient; and $\varepsilon^s(t)$ is the residual (e.g., noise). This is a linear model providing a very compact representation of the muscle activity during one sample, since it has only two free parameters (one amplitude and one time coefficient) for each synergy (D'Avella et al., 2006). Note that the synergies $w\_i$ are sample-independent, whereas the parameters $t\_i^s$ and $c\_i^s$ must be adjusted for each sample (i.e., trial or task).

We fed the dataset to the time-varying synergies extraction algorithm to identify a set of muscle synergies and their activation coefficients that reconstructed the entire set of muscle patterns with minimum error. The number of extracted synergies $N$ is a parameter of the model, and thus we repeated the extraction with a number of synergies ranging from 1 to 8. In order to minimize the probability of finding local minima, for each $N$, we ran the algorithm 10 times using different random initializations of the synergies and coefficients and selected the solution with the lowest reconstruction error. We used a convergence criterion of ten consecutive iterations for which the average error decreased by less than $10^{-6}$. After extraction, the synergies (and the corresponding coefficients) were normalized to their maximum muscle activations.

# *Synergy similarity*

We assessed the robustness of the synergy sets extracted from different experimental datasets as done in (D'Avella et al., 2003). In brief, we quantified the similarity between pairs of synergies as their correlation coefficient. For evaluating the similarity of the whole ensembles of synergies recorded in different datasets, we started by selecting the pair with the highest similarity, and then the synergies in that pair were removed from their sets. We then computed the similarities between the remaining synergies and repeated this procedure until all synergies in the smallest set had been matched. In the case of time-varying synergies, we computed the correlation coefficient of the two synergies over all muscles and time samples.
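The greedy matching procedure described above can be sketched as follows, for the simple case of synchronous synergy vectors (the permutation and noise level in the example are invented for illustration):

```python
import numpy as np

def match_synergies(W1, W2):
    """Greedy matching of two synergy sets (columns of W1 and W2): repeatedly
    select the cross-set pair with the highest correlation coefficient, remove
    both synergies from their sets, and repeat until the smaller set is
    exhausted. Returns (index-in-W1, index-in-W2, correlation) triples."""
    avail1 = list(range(W1.shape[1]))
    avail2 = list(range(W2.shape[1]))
    pairs = []
    while avail1 and avail2:
        best = None
        for i in avail1:
            for j in avail2:
                r = np.corrcoef(W1[:, i], W2[:, j])[0, 1]
                if best is None or r > best[2]:
                    best = (i, j, r)
        pairs.append(best)
        avail1.remove(best[0])
        avail2.remove(best[1])
    return pairs

# Example: the second set is a reordered, slightly noisy copy of the first.
rng = np.random.default_rng(0)
W1 = rng.random((9, 4))                      # 9 muscles, 4 synergies
perm = [2, 0, 3, 1]
W2 = W1[:, perm] + 0.01 * rng.random((9, 4))
pairs = match_synergies(W1, W2)
```

For time-varying synergies, each column would simply be replaced by the synergy unrolled over all muscles and time samples before computing the correlation.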

# **CLASSICAL VAF-BASED CRITERIA FOR ASSESSING THE VALIDITY OF SYNERGY DECOMPOSITIONS AND FOR SELECTING THE SMALLEST SET OF SYNERGIES**

The purpose of this study is to develop a methodology to select the smallest set of synergies that accounts for all task-discriminating variability in a recorded EMG dataset. Before we describe this new methodology, in this subsection we briefly summarize the current methodology used to choose the set of synergies that decomposes a dataset.

Synergy selection can be cast as a model selection problem, because different synergy decompositions are obtained when varying the number of extracted synergies. In the literature, there are typically *ad-hoc* criteria to assess the number of synergies, all based on the dependence of the amount of explained variance on the number of synergies extracted (*N*). This VAF is a measure of how well the actual muscle patterns can be fitted with a given set of synergies. The VAF of each synergy decomposition (D'Avella et al., 2006) is defined as follows:

$$VAF = 1 - \frac{\sum\_{s=1}^{S} \sum\_{t=1}^{T} \left\| m^s(t) - \hat{m}^s(t) \right\|^2}{\sum\_{s=1}^{S} \sum\_{t=1}^{T} \left\| m^s(t) - \bar{m} \right\|^2}. \tag{3}$$

where $s$ indexes samples and $t$ indexes time steps; $\hat{m}^s(t) = \sum\_{i=1}^{N} c\_i^s \, w\_i(t - t\_i^s)$ for the time-varying synergy model and $\hat{m}^s(t) = \sum\_{i=1}^{N} c\_i^s(t) \, w\_i$ for the synchronous synergy model; and $\bar{m}$ is the mean (over samples and time steps) activation vector.

To detect the correct number of synergies from the VAF, there are four main existing criteria all based on the "VAF curve," i.e., the function quantifying the dependence of the VAF on the number of synergies extracted (*N*): (a) the point in the VAF curve at which a threshold (usually 0.9) is exceeded ("VAF-T" criterion) (Torres-Oviedo et al., 2006), (b) the point at which the highest change in slope (the "elbow") is observed ("VAF-E") (Tresch et al., 2006), (c) the point at which the curve "plateaus" to a straight line ("VAF-P") (Cheung et al., 2005) and (d) the point at which any further increase in the number of extracted synergies yields a VAF increase smaller than 75% of that expected from chance ("VAF-S") (Cheung et al., 2009b). VAF-S is implemented by first shuffling the EMG data across muscles and time steps and then extracting synergies from this "randomized" dataset. The VAF curve obtained from this decomposition exhibits an almost linear increase. Comparison of its slope with the slope of the real VAF curve at each point gives the selected number of synergies.
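For concreteness, the VAF of Equation 3 and the "VAF-T" threshold criterion can be sketched as follows (the example VAF curve is hypothetical, not taken from the data of this study):

```python
import numpy as np

def vaf(M, M_hat):
    """Variance Accounted For (Equation 3): 1 minus the residual sum of
    squares over the total variance of the data around its mean."""
    return 1.0 - np.sum((M - M_hat) ** 2) / np.sum((M - M.mean()) ** 2)

def select_by_vaf_threshold(vaf_curve, threshold=0.9):
    """'VAF-T' criterion: the smallest N whose VAF exceeds the threshold.
    vaf_curve[k] is the VAF obtained with N = k + 1 synergies."""
    for k, v in enumerate(vaf_curve):
        if v >= threshold:
            return k + 1
    return len(vaf_curve)          # threshold never reached

# A hypothetical VAF curve for N = 1..8 extracted synergies.
curve = [0.55, 0.74, 0.86, 0.93, 0.95, 0.96, 0.97, 0.98]
n_selected = select_by_vaf_threshold(curve)   # N = 4 is the first to exceed 0.9
```

The other criteria (VAF-E, VAF-P, VAF-S) operate on the same curve but inspect its slope rather than its absolute value.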

# **NEW TASK-DECODING BASED CRITERION FOR ASSESSING THE VALIDITY OF THE SYNERGY DECOMPOSITION AND FOR SELECTING THE SMALLEST SET OF SYNERGIES**

In this section, we present an additional method to quantitatively evaluate muscle synergies in task space and to automatically identify the smallest set of synergies that captures all the variance describing single-trial task-to-task variations in the dataset. To this aim, we first used a single-trial decoding analysis to quantify how well the single-trial coefficients of individual synergies, or groups thereof, discriminate between different tasks. A similar analysis has been implemented in the past in the context of muscle synergies (Brochier et al., 2004; Weiss and Flanders, 2004; Overduin, 2005) and in studies of modularity in the kinematics space (Santello and Soechting, 1998; Santello et al., 1998; Jerde et al., 2003). Intuitively, our reasoning is that VAF includes both the "interesting variance" (the one related to variations in synergy recruitment across tasks) and the "less interesting variance" (the one unrelated to variations in synergy recruitment across tasks and in some cases reflecting various sources of noise). In some conditions, the presence of the latter variance may make the selection of the correct number of synergies difficult. Also, the removal of "noise" variance using VAF requires the experimenter's intuition and partly arbitrary or *ad-hoc* criteria (see section "Classical VAF-Based Criteria for Assessing the Validity of Synergy Decompositions and for Selecting the Smallest Set of Synergies" above). Our synergy evaluation method overcomes this problem by singling out (by single-trial task decoding) only the task-discriminating variance and then studying the dependence of this part of the variance upon the number of synergies.

# *Task decoding based metric*

For a given set of synergies extracted via classical dimensionality reduction techniques (e.g., NMF), we defined the new metric as the decoding performance, i.e., the percentage of correctly decoded individual trials, based on the single-trial measure of their activation coefficients (or some parameters of them). To avoid artificially inflating the decoding performance because of data over-fitting, each trial was decoded based on the distribution of all other trials (decoding with leave-one-out cross-validation).

The task-decoding metric for determining the minimal set of synergies potentially depends on the choice of the algorithm used to decode the tasks. Thus, to test the consistency of the results with respect to the details of the underlying calculations, we evaluated the performance of different decoding algorithms.

Here we used the following algorithms: (i) A linear discriminant algorithm (LDA), which worked as follows. For each pair of classes (i.e., motor tasks to be decoded), it first projects the *N*-dimensional values onto the direction that optimally separates the samples of the two classes, namely the direction that maximizes the ratio of the between-class to the within-class distances; this direction is the normal to a linear decision boundary. The trial to be predicted is then assigned to the class whose samples are closest along this direction (minimum Euclidean distance). An example of the decoding procedure using 2 pairs of synergies to decode the eight different motor tasks is given in **Figure 6E**. The LDA has determined the decision boundaries for classifying the trials according to the motor task performed and, as a result, has separated the 2-dimensional space into eight regions, one for each task. Each point is assigned to the class represented by the colored region on which it lies. (ii) The quadratic discriminant algorithm (QDA), i.e., a discriminant algorithm that assumes unequal variances across classes, leading to quadratic decision boundaries. (iii) The Naive Bayes classifier (NB), which assumes that data points are independent with Gaussian in-class distributions and calculates the most likely class using Bayes' theorem. (iv) The *k*-Nearest-Neighbors classifier with Euclidean distances (*k*-NN) (Duda et al., 2001). We set the number of neighbors *k* (a free parameter) to *k* = 10 because empirical investigations revealed that this selection maximized decoding performance for the dataset examined.

Unless otherwise stated, in this paper we relied on decoding using a LDA, because of its high computational speed and performance on the datasets considered here.
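The decoding metric can be sketched as follows. For simplicity, this toy example substitutes a nearest-class-centroid rule for the LDA decoder (a deliberate simplification, not the classifier used in the study), and the task means and noise levels are invented:

```python
import numpy as np

def loo_decoding_accuracy(X, y):
    """Leave-one-out task decoding: each trial is classified from the
    distribution of all other trials. A nearest-class-centroid rule is used
    here as a simplified stand-in for the LDA decoder described in the text."""
    X = np.asarray(X, float)
    y = np.asarray(y)
    n = len(y)
    correct = 0
    for i in range(n):
        mask = np.arange(n) != i                  # hold out trial i
        classes = np.unique(y[mask])
        centroids = np.array([X[mask][y[mask] == c].mean(axis=0) for c in classes])
        pred = classes[np.argmin(np.linalg.norm(centroids - X[i], axis=1))]
        correct += int(pred == y[i])
    return correct / n

# Invented single-trial activation coefficients: 8 tasks x 40 trials, N = 3 synergies.
rng = np.random.default_rng(0)
task_means = rng.random((8, 3))                   # one coefficient vector per task
y = np.repeat(np.arange(8), 40)
X = task_means[y] + 0.02 * rng.standard_normal((320, 3))
acc = loo_decoding_accuracy(X, y)                 # fraction of correctly decoded trials
```

Chance level in this example is 1/8; the metric reports how far above chance the synergy coefficients allow single-trial task identification.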

# *Automated procedure for selecting the minimal number of synergies based on task decoding*

From the proposed decoding based metric, we developed an automated procedure to select the minimal number of synergies. This model selection technique is based upon the progressive evaluation of the statistical significance of the task-discriminating information added when progressively increasing the number of synergies in the decomposition model. After evaluating decoding performance with *N* = 1 synergy, the number of synergies in the decomposition model is increased step by step, until adding a synergy no longer yields a statistically significant increase in decoding performance. The procedure is automatic because of this statistical test of significance. In this way, the chosen set of *N* synergies is the smallest decomposition that captures all available task-discriminating variance within the synergy space.

Crucial to this selection procedure is the test to determine whether adding one more synergy significantly increases decoding performance. We needed to ensure that the different dimensionality of two models with *N* and *N* + 1 synergies does not lead to any artifactual difference in the computed percent-correct values. Thus, we designed the statistical test as follows. For a given value *N*, we compare the decoding performance of the synergy parameters when using the *N* synergies with the decoding performance of the parameters of all subsets consisting of *N* − 1 synergies plus the parameters of the *N*-th synergy pseudo-randomly permuted ("shuffled") across conditions. We repeat this shuffling procedure a number of times (100 in our implementation) to obtain a nonparametric distribution of decoding performance values under the null hypothesis that the additional synergy does not add to the decoding power of the synergy decomposition. In the following we evaluated this significance at the *p* < 0.05 threshold. The statistical threshold for a significant increase of decoding performance was graphically highlighted in the plots of decoding performance (% correct) as a function of the number of synergies as a shaded area indicating the 95% confidence intervals constructed using this bootstrap procedure (**Figure 2C**). In this way, if the original decoding performance curve enters the shaded area at the *N*-th synergy, the *N*-th dimension does not significantly increase decoding performance, and this suggests that *N* − 1 synergies should be selected. The selected number of synergies can be simply visualized as the smallest value of *N* for which decoding performance lies above the no-significance (shaded) area.
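A minimal sketch of this shuffle-based significance test follows, again using the simplified nearest-centroid decoder in place of LDA and invented toy coefficients (shuffle counts are reduced for speed; the study used 100):

```python
import numpy as np

def loo_accuracy(X, y):
    """Leave-one-out decoding with a nearest-class-centroid rule
    (a simplified stand-in for the LDA decoder)."""
    n = len(y)
    hits = 0
    for i in range(n):
        mask = np.arange(n) != i
        cls = np.unique(y[mask])
        cents = np.array([X[mask][y[mask] == c].mean(axis=0) for c in cls])
        hits += int(cls[np.argmin(np.linalg.norm(cents - X[i], axis=1))] == y[i])
    return hits / n

def nth_synergy_adds_information(X, y, n_shuffles=100, alpha=0.05, seed=0):
    """Shuffle the last synergy's coefficients across trials to build a null
    distribution of decoding accuracies; the synergy adds significant task
    information if the intact accuracy exceeds the 1 - alpha null quantile."""
    rng = np.random.default_rng(seed)
    observed = loo_accuracy(X, y)
    null = []
    for _ in range(n_shuffles):
        Xs = X.copy()
        Xs[:, -1] = Xs[rng.permutation(len(Xs)), -1]   # permute last column
        null.append(loo_accuracy(Xs, y))
    return bool(observed > np.quantile(null, 1 - alpha))

# Toy coefficients: 4 tasks x 20 trials; the first two "synergies" are noise,
# the third cleanly separates the tasks; a fourth variant is all zeros.
rng = np.random.default_rng(1)
y = np.repeat(np.arange(4), 20)
noise_dims = 0.3 * rng.standard_normal((80, 2))
informative = y[:, None] + 0.1 * rng.standard_normal((80, 1))
adds_info = nth_synergy_adds_information(np.hstack([noise_dims, informative]), y, n_shuffles=30)
adds_zero = nth_synergy_adds_information(np.hstack([noise_dims, np.zeros((80, 1))]), y, n_shuffles=30)
```

As expected, the test flags the task-informative synergy and not the constant one, mirroring the stopping rule of the automated selection procedure.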

### **FIGURE 2 | Continued**

**(C–E)** VAF and decoding performance curves (red curves whose scale is indicated on the right *y*-axes) for datasets generated from 3 **(C)**, 4 **(D)**, and 5 **(E)** "ground-truth" synergies. For the noise levels, we used the "reference" values (*a* = 0.1 and *v* = 0.1). The red shaded areas represent the 5–95% confidence intervals of the bootstrap test for decoding. **(F–H)** VAF and decoding performance curves [plotted with the same conventions as in panels **(C–E)**] when generating data from five synergies and varying the simulated noise parameters as follows: increased signal-dependent noise (*a* = 0.3, *v* = 0.1) **(F)**; increased trial-by-trial variability of the activation coefficients (*a* = 0.1, *v* = 0.3) **(G)**; increased signal-dependent noise and trial-by-trial variability of the activation coefficients (*a* = 0.3, *v* = 0.3) **(H)**.

# *Choice of the synergy coefficients' parameters*

In practice, task decoding is applied in synergy space, thus it requires working with synergy activation variables. The nature of such activation coefficients depends on the synergy model considered. After the decomposition into *N* synergies, the pattern of synergy activation in each trial can be described by a set of scalar parameters, vectors and temporal waveforms. To illustrate the method clearly, we first restricted our analysis to only one single-trial parameter per synergy for both models (time-varying and synchronous synergies), so each trial was represented by *N* values. The time-varying synergy model has two single-trial parameters (one scaling coefficient and one time delay) per synergy. We used the scaling coefficients, which were shown to be more task-informative than the time-delays (see section "Results"). In the case of the synchronous model, we had to extract the single-trial parameters from the time-varying activation coefficient. We decided to use the time integral of the activation coefficient over the entire task period because preliminary investigations (not shown) revealed that integral measures largely outperformed the decoding performance obtained with other measures based on single points, such as the timing and amplitude of the first or second activation peak (Karst and Hasan, 1991; Flanders et al., 1996), and measures based on Principal Component Analysis of the activation time course (Optican and Richmond, 1987). The use of integrated EMG activity is standard and has been extensively used in biomechanical studies (e.g., Winter, 2009).

Then, we went on to investigate whether taking more single-trial parameters into account would increase the task identification power of each type of synergy decomposition. For the time-varying model, we evaluated the decoding performance using both single-trial parameters (scaling coefficients and time delays). For the synchronous model, the time-varying activation coefficients of the synergies add a large number of extra parameters (up to an independent parameter per time point, if activation varied fast enough) that could potentially carry more task-discriminating information. To test the possibility that complex multi-parameter single-trial descriptions of the time course of activation of synchronous synergies carry more information, we progressively refined the parameterization of the single-trial activation coefficients by binning them into smaller bins, and we computed the decoding performance as a function of the number of bins.
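The progressive binning of the activation coefficients can be sketched as follows (bin counts and waveforms are illustrative, not taken from the recorded data):

```python
import numpy as np

def bin_activation(c, n_bins):
    """Parameterize a single-trial activation waveform c (length T) by its
    integrated activity over n_bins equal sub-intervals; n_bins = 1 recovers
    the time integral over the entire task period."""
    edges = np.linspace(0, len(c), n_bins + 1).astype(int)
    return np.array([c[a:b].sum() for a, b in zip(edges[:-1], edges[1:])])

# One simulated trial: N = 3 synchronous-synergy activation waveforms, 50 steps.
rng = np.random.default_rng(0)
C_trial = rng.random((3, 50))
features = np.concatenate([bin_activation(c, 5) for c in C_trial])   # 3 x 5 = 15 features
```

Increasing `n_bins` refines the parameterization, so the decoder receives `N * n_bins` features per trial instead of `N`.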

### **EMG RECORDING PROCEDURES**

To validate the analytical methods developed here, we applied them to a set of EMGs recorded during the execution of a reaching task, as described in the following.

Four healthy right-handed subjects (AM, AB, AK, and ES) participated voluntarily in the experiment. The experiment

conformed to the Declaration of Helsinki; informed consent was obtained from all participants, and the protocol was approved by the local ethics committee ASL-3 ("Azienda Sanitaria Locale," local health unit), Genoa. The protocol consisted of executing multijoint reaching movements (flexions and extensions of the shoulder and elbow joints) in the horizontal plane (**Figure 1A**). The subjects sat in front of a table and were instructed to perform fast one-shot point-to-point movements between a central location (P0) and 4 peripheral locations (P1-P2-P3-P4) evenly spaced along a circumference of radius 40 cm. Subjects supported the weight of their arm themselves; no device was used to remove static gravitational effects. The upper trunk was not restrained, but analysis of the kinematics data showed that its movement during the investigated tasks was negligible. The subjects made center-out (fwd) and out-center (bwd) movements (**Figure 1A**) to each one of the 4 targets. In total, the experimental protocol specified 8 distinct motor tasks denoted by T1*,...,*T8, each of which was performed 40 times; thus the entire experiment consisted of 8 tasks × 40 trials = 320 samples. The subjects were asked to perform such a relatively high number of repetitions of each task because this was useful for the validation of the single-trial algorithms and for assessing the impact of trial-to-trial variability on the generation of muscle activation patterns. This experimental design can be viewed as a variant of the one proposed in (D'Avella et al., 2006, 2008), since performance of each of the 8 motor tasks requires executing movements in 8 different directions. The main difference with respect to this previous work is that, instead of considering movements from the center of a circle to 8 points on its circumference, we analyze forward and backward movements between the center and 4 of these points. The order in which the movements were performed was randomized.

Electromyographic (EMG) activity was recorded by means of an Aurion (Milano, Italy) wireless surface EMG system. The EMG signals were recorded with a sampling rate of 1 kHz from the following muscles: (1) finger extensors (FE), (2) brachioradialis (BR), (3) biceps brachii (BI), (4) triceps medial (TM), (5) triceps lateral (TL), (6) anterior deltoid (AD), (7) posterior deltoid (PD), (8) pectoralis (PE), (9) latissimus dorsi (LD) (**Figure 1B**). The EMGs for each trial were digitally full-wave rectified, low-pass filtered (Butterworth filter; 20 Hz cutoff; zero-phase distortion), their duration was normalized to 1000 time steps, and then the signals were integrated over 20 time-step intervals, yielding a final waveform of 50 time steps. This is a standard EMG treatment in muscle synergy studies (see D'Avella et al., 2006). Body kinematics was recorded by means of a Vicon (Oxford, UK) motion capture system with a sampling rate of 100 Hz. Six passive markers were placed on the fingertip, wrist (over the styloid process of the ulna), elbow (over the lateral epicondyle), right shoulder (on the lateral epicondyle of the humerus), back of the neck and left shoulder. The kinematics data were low-pass filtered (Butterworth filter, cut-off frequency of 20 Hz) and numerically differentiated to compute tangential velocity and acceleration. Movement onset and movement end were identified as the times at which the velocity profile of the fingertip exceeded 5% of its maximum. The mean movement duration varied across subjects from 370 to 560 ms. We verified that, for all the subjects included in this analysis, none of the muscles showed a systematic change in signal amplitude across the recording sessions, which would be an indication that the EMG sensors were partially detached from the skin.
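The preprocessing chain (rectification, 20 Hz zero-phase low-pass filtering, duration normalization to 1000 steps, integration over 20-step intervals) can be sketched as follows; the Butterworth filter order is our assumption, as it is not stated in the text, and the input burst is simulated:

```python
import numpy as np
from scipy.signal import butter, filtfilt

def preprocess_emg(raw, fs=1000.0, cutoff=20.0, order=4):
    """Standard synergy-study EMG treatment: full-wave rectification,
    zero-phase low-pass Butterworth filtering (20 Hz cutoff), duration
    normalization to 1000 time steps, and integration over 20-step
    intervals, yielding a 50-step waveform. The filter order is an
    assumption (not stated in the text)."""
    rect = np.abs(raw)                                    # full-wave rectification
    b, a = butter(order, cutoff / (fs / 2.0), btype="low")
    env = filtfilt(b, a, rect)                            # zero-phase distortion
    t_norm = np.linspace(0.0, len(env) - 1.0, 1000)
    env = np.interp(t_norm, np.arange(len(env)), env)     # normalize duration
    return env.reshape(50, 20).sum(axis=1)                # integrate 20-step bins

# A simulated 450 ms burst of raw EMG sampled at 1 kHz.
rng = np.random.default_rng(0)
raw = rng.standard_normal(450) * np.hanning(450)
wave = preprocess_emg(raw)
```

Normalizing every trial to the same 50-step waveform is what allows trials of different durations to be stacked into a single matrix for synergy extraction.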

# **GENERATION OF SIMULATED EMG DATASETS**

To demonstrate the validity of our method and investigate its robustness, we tested it on simulated single-trial EMG responses in which the synergy set that actually generated the data (the "ground truth") was known by construction and which contained physiologically plausible sources of single-trial variance. We constructed two synthetic datasets by linear summation of a small number (from 2 to 5, depending on the simulation) of either synchronous or time-varying synergies and corrupted them with two physiologically plausible sources of single-trial variance: motor noise on the synergy activations and additive signal-dependent noise. The amount of each type of noise was parameterized by a separate free parameter: *v* and *a*, respectively.

The first EMG dataset (simulation of synchronous synergies) was generated as a weighted combination of a set of synchronous synergies according to Equation 2. The data simulated the activation of *M* = 30 muscles used for executing *R* = 40 repetitions of each of *T* = 15 motor tasks, i.e., 40 × 15 = 600 simulated samples in total (**Figure 2A**). The *M*-dimensional synergies were drawn from exponential distributions with mean 10 and were then normalized to each synergy's maximum muscle activation (Tresch et al., 2006). Their number was varied from *N* = 3 to 5, and their corresponding activation coefficients were assumed time-invariant for simplicity (Tresch et al., 2006). The use of time-invariant activation coefficients is common in the synchronous synergy framework (Ting and Macpherson, 2005; Torres-Oviedo et al., 2006). Typically, such time-invariant activations result from preprocessing the EMG signals to reduce dimensionality in the time domain (e.g., time-averaging of the muscle activities); however, the model can readily incorporate temporal variability too. In our simulations, we assumed that, to execute a motor task, each of the *N* synergies was activated by a scalar coefficient drawn from a uniform distribution on the [0,1] interval, so each task was represented as an *N*-dimensional vector of activation coefficients. The variability in the neural motor command, which in turn leads to trial-to-trial variations of the synergy coefficients in each task, was modeled as additive white Gaussian noise with covariance matrix *v*<sup>2</sup>**I**. Hence, we varied the parameter *v* to modulate the trial-to-trial variability of the synergy recruitment for each task.
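A minimal sketch of this generative procedure (exponential synergies normalized to their maxima, uniform task coefficients, and Gaussian trial-to-trial noise with covariance *v*<sup>2</sup>**I**) follows; clipping the noisy coefficients to non-negative values is our assumption, as the text does not state how negative draws were handled.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, T, R = 30, 5, 15, 40      # muscles, synergies, tasks, repetitions per task
v = 0.1                         # trial-to-trial motor-noise level

# Synergies: exponential entries, each normalized to its maximum activation
W = rng.exponential(scale=10.0, size=(M, N))
W /= W.max(axis=0)

# One N-dimensional coefficient vector per task, drawn uniformly in [0, 1]
task_coefs = rng.uniform(0.0, 1.0, size=(T, N))

# Per trial, corrupt the task coefficients with Gaussian noise (covariance v^2 I)
trials = []
for task in range(T):
    c = task_coefs[task] + v * rng.standard_normal((R, N))
    c = np.clip(c, 0.0, None)   # keeping activations non-negative is an assumption
    trials.append(c @ W.T)      # each row is one simulated muscle pattern
X = np.vstack(trials)           # (T*R, M) = 600 simulated samples
print(X.shape)  # (600, 30)
```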

To construct the second simulated dataset (simulation of time-varying synergies), we used the time-varying synergies (from *N* = 2 to 4) and the corresponding scaling coefficients extracted from the single-trial experimental data of the typical subject (**Figures 3A,B**). Importantly, this simulated dataset was built such that, without any source of task-unrelated variability, the underlying set of synergies is theoretically sufficient to produce the EMG patterns of 9 muscles for the execution of 8 motor tasks, and such that, even with other sources of noise added, all task-discriminating variations are described by the scaling coefficients of the "ground-truth" synergies. We computed the means of the extracted coefficients for each motor task and fitted their distributions with *N*-dimensional Gaussians. Based on these data, we modeled the motor noise that corrupts the coefficients and delays as a Gaussian whose standard deviation (SD) is a fraction *v* of the SDs measured on real EMGs. A value *v* = 1 corresponds to the amount of variability empirically found in real data, whereas *v* > 1 corresponds to trial-to-trial synergy recruitment variability larger than that of the experimental data. From the resulting Gaussian distributions, we generated 40 sets of activation coefficients *c*<sup>s</sup><sub>i</sub> per task, which scaled the extracted synergies **w**<sub>i</sub>(*t*) according to Equation 1 to generate the simulated EMG dataset consisting of 9 muscles and 40 trials × 8 tasks = 320 samples. The duration of each sample was 50 time steps.
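The time-varying combination rule (scaling coefficients and onset delays applied to time-varying synergies, in the style of Equation 1) can be sketched as follows. The function name is hypothetical, and zero-padding the shifted synergies at the edges is our assumption.

```python
import numpy as np

def compose_time_varying(synergies, coefs, delays):
    """Combine time-varying synergies: each synergy w_i(t), an (M, L) array,
    is scaled by c_i and shifted in time by t_i samples, then summed.

    synergies: (N, M, L); coefs: length-N; delays: length-N integer shifts.
    Returns an (M, L) muscle pattern (parts shifted outside [0, L) are cut).
    """
    N, M, L = synergies.shape
    out = np.zeros((M, L))
    for w, c, d in zip(synergies, coefs, delays):
        shifted = np.zeros((M, L))
        if d >= 0:
            shifted[:, d:] = w[:, :L - d]
        else:
            shifted[:, :L + d] = w[:, -d:]
        out += c * shifted
    return out

rng = np.random.default_rng(6)
W = rng.uniform(size=(3, 9, 50))        # 3 synergies, 9 muscles, 50 time steps
pattern = compose_time_varying(W, coefs=[0.5, 1.0, 0.2], delays=[0, 3, -2])
print(pattern.shape)  # (9, 50)
```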

For both simulations, the resulting data were then corrupted by task-unrelated variability during movement execution. This was done by adding the noise term ε<sup>s</sup>(*t*) in Equations 1, 2. Muscle activation patterns have been reported to be corrupted by noise that scales with the amplitude of the motor signal, termed signal-dependent noise (Tresch et al., 2006). Such noise, whose SD is proportional to the amplitude of the noiseless EMG, was added to each muscle's simulated EMG activity. In summary, the two types of variability included in the generation of the simulated datasets are: (1) trial-to-trial motor noise on the synergy activation coefficients (and delays), parameterized by *v*; and (2) additive signal-dependent noise on the muscle activities, parameterized by *a*.


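The signal-dependent component can be sketched as follows, assuming (as stated in the text) a zero-mean Gaussian whose SD is the fraction *a* of the noiseless amplitude at each time sample; the function name is hypothetical.

```python
import numpy as np

def add_signal_dependent_noise(emg, a, rng=None):
    """Add zero-mean Gaussian noise whose SD is a * |noiseless EMG| per sample."""
    rng = np.random.default_rng() if rng is None else rng
    return emg + rng.standard_normal(emg.shape) * (a * np.abs(emg))

clean = np.random.default_rng(1).uniform(size=(9, 50))  # one simulated trial
noisy = add_signal_dependent_noise(clean, a=0.1)
print(noisy.shape)  # (9, 50)
```

Because the noise SD is proportional to the local amplitude, silent portions of the simulated EMG stay silent, which is the defining property of signal-dependent noise.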
The above simulation parameters were varied to test the robustness of the proposed method to all sources and different levels of noise. In all cases, the maximal values of these parameters were chosen as the largest values that allowed accurate reconstruction of the original synergies by the synergy extraction algorithms.

### **FIGURE 3 | Continued**

coefficients activating the four synergies across the 8 motor tasks performed. Histograms are plotted as means ± SDs across all trials of the given task. **(C–E)** VAF and decoding performance curves for datasets generated from 2 **(C)**, 3 **(D)**, and 4 **(E)** "ground-truth" synergies. For noise levels, we used the "reference" values (*a* = 0.1, *v* = 1). **(F–H)** VAF and decoding performance curves when generating data from four synergies and varying the simulated noise parameters as follows: increased signal-dependent noise (*a* = 0.5, *v* = 1) **(F)**; increased trial-by-trial variability of the activation coefficients (*a* = 0.1, *v* = 5) **(G)**; increased signal-dependent noise and trial-by-trial variability of the activation coefficients (*a* = 0.3, *v* = 3) **(H)**. Conventions as in **Figure 2**.

### **FIGURE 4 | Continued**

Top: Extraction of two synergies (*w*<sub>1</sub> and *w*<sub>2</sub>) from the activities of 3 muscles executing 3 motor tasks. The 3-dimensional muscle space is approximated by a linear 2-dimensional synergy space. Bottom: Distributions of the synergy activation coefficients across trials for the three tasks. The variability of these distributions determines how reliably synergy recruitment maps onto task identification.

# **RESULTS**

# **COMPLEMENTARITY OF VAF AND DECODING METRICS FOR ASSESSING THE VALIDITY OF SYNERGY DECOMPOSITIONS**

Before proceeding to a detailed validation of the newly proposed decoding metric, we begin by discussing and exemplifying the complementarity between our decoding metric and the traditionally used VAF metric (Ting and Chvatal, 2011). This serves both to illustrate our methodology and to justify the need for complementing the current evaluation of muscle synergy models with a task-space metric.

For this illustration, we simulated the activity of three muscles executing three different tasks (**Figure 4**; red, blue, green). For simplicity, in this preliminary example we assume that muscle activation is described by a time-invariant scalar. In each trial, the activation of these three muscles can be represented as a point in the 3-dimensional muscle space [as done in (Tresch et al., 2006) in simulations and in (Torres-Oviedo et al., 2006; Torres-Oviedo and Ting, 2007) with real data]. In **Figure 4A** (top), we show a total of 30 sample points (10 for each task). Muscle synergy identification consists of finding basis vectors spanning a lower-dimensional linear space in which the original data can be accurately described: in this case, we extracted two synergies with non-negativity constraints (the arrows *w*<sub>1</sub> and *w*<sub>2</sub>) that best approximate the data by a 2-dimensional hyperplane defined by the 3-dimensional synergy vectors (**Figure 4A**, top). This representation is considered good if the approximation error (i.e., the residual error *r*<sup>(j)</sup><sub>⊥</sub>), defined as the distance of the data points from the hyperplane, is low. The metric currently most used to determine the set of synergies best describing a dataset is the VAF, which quantifies the proportion of variability in the EMG dataset that is accounted for by the muscle synergy decomposition (see Equation 3 in section "Materials and Methods" for the exact definition). This metric is defined in muscle space; thus, a high value of VAF means a high quality of reconstruction of the original muscle pattern (i.e., a relatively low *r*<sup>(j)</sup><sub>⊥</sub>). In this part of the analysis, only unsupervised learning is performed and the available knowledge about the task is ignored.
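For illustration, a VAF curve of the kind discussed here can be computed with an off-the-shelf non-negative matrix factorization. The uncentered normalization used below (dividing the residual by the total sum of squares) is one common convention in the synergy literature and may differ in detail from the paper's Equation 3.

```python
import numpy as np
from sklearn.decomposition import NMF

def vaf_curve(X, n_max=8, seed=0):
    """VAF of NMF reconstructions of X (trials x muscles) for 1..n_max synergies.

    Uses the uncentered convention VAF = 1 - ||X - CS||^2 / ||X||^2;
    conventions for the denominator vary across studies.
    """
    vafs = []
    for n in range(1, n_max + 1):
        model = NMF(n_components=n, init="random", random_state=seed, max_iter=1000)
        C = model.fit_transform(X)   # activation coefficients (trials x n)
        S = model.components_        # synergy vectors (n x muscles)
        sse = np.sum((X - C @ S) ** 2)
        vafs.append(1.0 - sse / np.sum(X ** 2))
    return np.array(vafs)

X = np.random.default_rng(2).uniform(size=(120, 9))  # toy non-negative dataset
print(np.round(vaf_curve(X, n_max=4), 2))
```

Plotting this curve against *N* and looking for a plateau or threshold crossing reproduces the VAF-based selection procedure the text describes.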

However, as outlined in section "Introduction," VAF alone cannot tell whether, or how well, activations of these synergies describe variations of movement across different tasks in individual trials, nor whether two different synergies describe different or identical types of task-discriminating variations. In other words, there is a second and equally important type of variability in the synergy decomposition that cannot be addressed by the VAF, which concerns trial-by-trial differences in the recruitment of muscle synergies. To analyze this, we move to coordinates in the synergy space (**Figure 4A**, bottom). This variance reflects the spread of the synergy activation coefficients *c*<sup>(j)</sup> for the execution of the same task. The extent to which this latter variance affects task discrimination (and thus likely also task execution) can be assessed by quantifying the identifiability of the task performed on every trial in the synergy space. This is because executing a motor task using muscle synergies requires a well-defined mapping from synergy recruitment to task identification (i.e., one possible outcome associated with a given synergy activation). To quantify the reliability of this mapping on a single-trial basis, we introduced (see section "Task Decoding Based Metric") the decoding metric, which measures the percentage of times the task *j* is correctly predicted by the single-trial activation coefficients *c*<sup>(j)</sup>. This additional metric is necessary for the evaluation of muscle synergy representations with respect to task identification.
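The decoding metric can be sketched with a cross-validated linear discriminant classifier applied to single-trial coefficients. The function name and the 5-fold cross-validation scheme are illustrative choices on our part, and the toy dataset below is a stand-in, not the paper's data.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

def decoding_performance(coefs, task_labels, folds=5):
    """Percent of single trials whose task is correctly predicted from the
    synergy activation coefficients, via cross-validated LDA."""
    scores = cross_val_score(LinearDiscriminantAnalysis(), coefs,
                             task_labels, cv=folds)
    return 100.0 * scores.mean()

# Toy example: 8 tasks x 40 trials, 4-dimensional synergy coefficients
rng = np.random.default_rng(3)
centers = rng.uniform(0.0, 1.0, size=(8, 4))   # mean coefficients per task
labels = np.repeat(np.arange(8), 40)
coefs = centers[labels] + 0.05 * rng.standard_normal((320, 4))
print(decoding_performance(coefs, labels) > 12.5)  # well above 1/8 chance
```

Running the same computation with the coefficients of 1, 2, ..., *N* synergies produces the decoding-performance curve used throughout the Results.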

We further illustrate in **Figure 4B** the *complementarity* of the two metrics (VAF and decoding based) by considering EMG datasets corrupted with extreme levels (either high or low) of both types of variability described above. These datasets were generated from the five synchronous synergies shown in **Figure 2A** (see section "Materials and Methods" for details), and in all cases the noise levels were restricted so that the original synergies could be recovered by the synergy extraction algorithm (**Figure 2B**). Note that with very large noise the original synergistic structure can be hidden or lost, and the extraction algorithm may therefore fail to extract the correct synergies. Our analysis thus relies on the assumption that the NMF algorithms were able to recover the correct synergies (at least for the correct *N*, which was checked a posteriori).

First, we considered low levels of both types of variance for each task (the variance in muscle space, assumed to lie on a plane orthogonal to the synergy space, which characterizes how the actual muscle pattern varies across trials; and the variance in synergy space, which characterizes trial-to-trial variations in the synergy recruitment). Note that the distinction between these two sources of variability is not unrealistic, as it may reflect actual neural noise acting at different levels of the CNS. With low variance, the VAF metric reveals that five synergies accurately reconstruct all the muscle patterns, and the decoding metric confirms that all the tasks are perfectly discriminable from the activation coefficients of these synergies (**Figure 4B**, top left). Then, we tested the case of very high variance only in *r*<sup>(j)</sup><sub>⊥</sub>. Because such variability cannot, by hypothesis, be written as a synergy decomposition, the VAF of the extracted synergy decomposition is never sufficiently high: the high levels of unstructured noise do not allow a good approximation of the data in a low-dimensional space. Nevertheless, decoding performance proves that 5 synergies describe all task-to-task differences; therefore, these synergies constitute the minimal set that guarantees a reliable mapping between their activation and task identification (**Figure 4B**, top right). On the other hand, when there is very high variability in the synergy activations *c*<sup>(j)</sup>, the synergy decomposition cannot distinguish between different tasks even though the dimensionality reduction is not affected. This is shown by the decoding performance curve, which never significantly exceeds chance level. Hence, although all the data points still lie in a 5-dimensional space (VAF = 1 exactly for *N* = 5), there is no way to guarantee which task will be executed for a fixed activation of the synergies (in this extreme case, the mapping is even a uniform random variable, i.e., chance level) (**Figure 4B**, bottom left). Finally, we considered high variance for both *r*<sup>(j)</sup><sub>⊥</sub> and *c*<sup>(j)</sup> simultaneously. As expected, VAF exhibits no saturation point and decoding performance is at chance level. In this case, the identified synergy decomposition neither reconstructs the original (i.e., recorded) muscle activities well nor discriminates between tasks (**Figure 4B**, bottom right). This "worst case" scenario would typically invalidate the synergy decomposition because both metrics yield low scores.

These extreme but theoretically plausible cases exemplify the usefulness of a systematic methodology that quantitatively evaluates not only the approximation of the muscle space by the synergy space (addressed by the VAF) but also the mapping between synergy activation and task identification (addressed by single-trial task decoding). Such an approach allows an objective and automatic assessment of muscle synergy models/decompositions and can thus serve as a model selection criterion.

It is useful to consider other conceptual cases in which concurrent evaluation of VAF and task decoding may provide valuable insights. First, consider a case in which VAF identifies a larger set of synergies than that identified by decoding. The extra synergies identified by VAF explain a useful amount of EMG variance but do not add any task-discriminating information. This may happen either because such extra synergies are always activated in the same way in all tasks (which can be tested by verifying that, when considered individually, they lead only to chance-level task decoding) or because they do not add any task discrimination to that already carried by the minimal set of synergies determined by the decoding metric. Careful decoding analysis of the individual extra synergies contributing to VAF but lying outside the minimal task-decoding set may help identify synergies (e.g., postural synergies) that are important for generating the appropriate movements even though their activation is constant across tasks. A second possibility is that decoding selects a larger set of synergies than VAF. This case indicates that some synergies (e.g., because they have little signal amplitude) do not explain a large part of the variance of the dataset, yet they provide unique information about task-to-task differences not carried even by other, larger-amplitude synergies in the set. These considerations highlight the potential value and complementarity of the insights provided by the joint use of VAF and decoding metrics.

In the following, we aim to show how our new method is crucial in this respect. We will illustrate how the VAF and decoding curves behave when muscle synergies are extracted from real data and verify that the method is reliably applicable to a variety of realistic EMG datasets with different properties giving robustly correct results.

# **SIMULATION STUDIES OF THE ROBUSTNESS OF THE DECODING-BASED METHOD**

To demonstrate and test our novel method for selecting the set of synergies that best describes all task-related differences, we used two types of simulated datasets (one for each type of muscle synergy model; see section "Materials and Methods" for details). Importantly, in both simulated datasets, which contain both task-discriminating and non-discriminating ("noise") variance, we know by construction the number of synergies sufficient to describe the entire task-discriminating part of the EMG variance. This allows a direct evaluation of the performance of the method against a ground truth. Moreover, by manipulating the parameters (such as the level of noise, the number of true synergies, or the number of recorded samples), we can investigate how these variables affect the quality of synergy detection. In the following, we first validate the methodology by implementing various tests on the dataset generated by synchronous synergies and then show that its application extends naturally to the time-varying synergy model.

# *Assessment of the method's performance when applied to synchronous synergies*

*Varying the number of "ground truth" synergies.* First, we checked the extent to which our method could identify the correct set of synergies that generated the data by varying the number of ground-truth synergies from which we generated the simulated EMG data. More precisely, we simulated three, four, or five synchronous synergies (see section "Materials and Methods") and corrupted them with both types of noise (*a* = 0.1 and *v* = 1). After extracting muscle synergies from the simulated datasets, we obtained the VAF and decoding curves reported in **Figures 2C–E**. In all three cases, the ground-truth sets of synergies that explain the task-discriminating variations were correctly identified, indicating that our method detects synergy sets reliably, independently of the precise number of underlying synergies.

*Varying the sources and levels of noise.* We then estimated the effect of varying the amount of the two types of physiological noise. In these further investigations, we generated the data using all 5 simulated synchronous synergies (plotted in **Figure 2A**). In particular, we tested how noise differentially affects the selection of synergies with our decoding method and with standard VAF criteria. (The threshold for the VAF-T criterion was set to 0.9 in all subsequent simulations).

We started by generating an EMG dataset with a set of "reference" values of the noise parameters (*a* = 0.1 and *v* = 0.1), on which we performed the subsequent synergy identification and decoding analysis. **Figure 2E** shows that the decoding method correctly identified the number of synergies expressing task-discriminating variance. This already suggests that the decoding metric is quite robust to the presence of task-irrelevant variance in the dataset.

In the subsequent simulations, we increased the level of one noise parameter at a time in order to gain some intuition about the dependence of the VAF and percent correct metrics on these different sources of noise and also examine the robustness of our method to higher levels of noise.

First, we examined the impact of increasing the signal-dependent noise. We increased the signal-dependent noise parameter to *a* = 0.3 while keeping *v* unchanged. The VAF drastically decreased, indicating its susceptibility to unstructured additive noise (**Figure 2F**). In fact, in the limit of very large additive noise (e.g., due to very noisy recordings), the VAF curve exhibited an almost linear increase with low slope, implying that a large number of synergies would be required to explain the variance of the dataset (see e.g., **Figures 2F,H**). These findings highlight a degree of arbitrariness in selecting synergy sets by setting thresholds purely based on VAF criteria: VAF also contains noise variance, so the choice of a good threshold crucially depends on the amount of noise in the considered dataset. More specifically, in the high additive noise case, the VAF-based criteria select a very large number of synergies because the curve neither reaches the 0.9 threshold nor plateaus. On the other hand, the percent correct metric exhibited robustness to additive noise, demonstrating only a slight decrease (**Figure 2F**). Hence, by attempting to decode the task performed, we can discount task-unrelated variance, identify in a general way the synergies that account for task-discriminating variability, and separate them from those explaining unstructured noise in the data.

Second, increasing the trial-to-trial variability of the synergy activations (*v* = 0.3) does not change the VAF curve (**Figure 2G**), meaning that the whole dataset can still be reconstructed with low error, but it produces synergies whose activations are more variable across trials, i.e., the motor tasks are performed in a less stereotyped way, which causes the decoding performance of the extracted synergies to decrease. However, although the decoding ability of the extracted synergies is reduced, our method is still able to detect the synergies that explain all the task-relevant variability in the dataset (**Figure 2G**). Note that in the limit of very large motor noise, decoding would tend to chance level while the VAF would still be near 100% (see e.g., **Figure 4C**). In such an extreme case, our decoding algorithm would correctly detect that the synergies (though they explain the variance of the data) cannot be used for identifying and selecting the task-discriminating movement features. In fact, the mapping between synergy activations and tasks executed would be completely random, demonstrating that such a synergistic structure could not support accurate motor control.

Finally, we increased both types of noise simultaneously (*a* = 0.3 and *v* = 0.3). Consequently, both curves fell to lower values. However, although the VAF curve did not exhibit any elbow or saturation point, the decoding curve had a clear peak at *N* = 5, demonstrating the robustness of our approach to high levels of both types of variability (**Figure 2H**).

*Varying the number of samples.* Because experimentally obtainable datasets comprise a limited number of trials, we investigated how this limitation may affect the performance of our method. Our aim was to assess the smallest number of repetitions of each motor task that can guarantee reliable selection of the set of muscle synergies accounting for all the task identification power. In these simulations we used the same noise levels as in **Figure 2E** (*a* = 0.1 and *v* = 0.1). We generated 5 repetitions of each simulated motor task, i.e., 5 trials × 8 tasks = 40 samples, and corrupted them with 20 different instances of noise, yielding 20 different simulated datasets. On these datasets we applied our decoding-based method and counted how many times it gave the correct result (five). We repeated the same procedure when simulating 10 and 20 repetitions of every motor task. **Figure 5A** shows that 10 trials per task were sufficient to determine the five synergies in the vast majority of the algorithm runs, while with 20 trials the algorithm always identified the number of synergies correctly. In general, we found (results not shown) that increasing the number of samples reduced both the variance of the decoding performance at a fixed number of synergies and the confidence intervals of the "null hypothesis" (indicated as a shaded red area in **Figures 2**, **3**), which both contribute to increasing the stability of the method. In the case of five trials, although the correct number was found in most cases, the algorithm sometimes underestimated the number of underlying synergies because relying on only 5 trials per task rendered our test of statistical significance unreliable; the synergies contributing least to describing task-related differences were thus often falsely excluded.
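The shuffled-surrogate "null hypothesis" interval used in this test can be sketched as follows; the function name, the number of shuffles, and the 5-95% interval are illustrative choices, and the toy coefficients are deliberately unrelated to the task labels so the interval reflects chance.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

def shuffled_null(coefs, labels, n_shuffles=20, folds=5, seed=0):
    """Decoding performance on task-label-shuffled surrogates: an estimate of
    the chance-level interval against which each added synergy is tested."""
    rng = np.random.default_rng(seed)
    null = []
    for _ in range(n_shuffles):
        shuffled = rng.permutation(labels)  # break coefficient-task association
        scores = cross_val_score(LinearDiscriminantAnalysis(), coefs,
                                 shuffled, cv=folds)
        null.append(100.0 * scores.mean())
    return np.percentile(null, [5, 95])

rng = np.random.default_rng(4)
labels = np.repeat(np.arange(8), 10)     # 8 tasks x 10 trials
coefs = rng.uniform(size=(80, 3))        # coefficients unrelated to the task
low, high = shuffled_null(coefs, labels)
print(round(low, 1), round(high, 1))     # both percentiles sit near 12.5% chance
```

A synergy is then judged to add task-discriminating information only if the decoding gain it brings exceeds this surrogate interval; with few trials per task the interval widens, which is exactly the failure mode described above.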

*Using different decoding algorithms.* To validate that our results do not depend on the specific choice of single-trial decoding algorithm (LDA), we applied our methodology to the simulated dataset constructed from the five synergies (**Figure 2A**), corrupted with the "reference" noise values (*a* = 0.1 and *v* = 0.1) and comprising 40 simulated trials per task, using a wider range of classification algorithms (see section "Materials and Methods" for details). We used these noise levels and this number of trials because they reflect the corresponding values used/computed in our experiment. We tested three other standard decoding algorithms (QDA, NB, and 10-NN) and found that all of them correctly identified the set of (five) synergies describing all information about the task. Moreover, the discrimination power of the synergy decomposition proved robust to the assumptions underlying the classification procedure, as the decoding performance was almost identical for all algorithms (**Figure 5C**, left) (also when we varied the number of trials and levels of noise; results not shown).
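A comparison of this kind across the four classifier families can be sketched as follows; the synthetic coefficients below are a toy stand-in for the simulated dataset, and the noise level and seed are arbitrary.

```python
import numpy as np
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(5)
centers = rng.uniform(size=(8, 5))                 # 8 tasks, 5-dim coefficients
labels = np.repeat(np.arange(8), 40)
coefs = centers[labels] + 0.05 * rng.standard_normal((320, 5))

# The four decoder families named in the text (LDA, QDA, NB, 10-NN)
classifiers = {
    "LDA": LinearDiscriminantAnalysis(),
    "QDA": QuadraticDiscriminantAnalysis(),
    "NB": GaussianNB(),
    "10-NN": KNeighborsClassifier(n_neighbors=10),
}
for name, clf in classifiers.items():
    acc = 100.0 * cross_val_score(clf, coefs, labels, cv=5).mean()
    print(f"{name}: {acc:.1f}%")
```

When the coefficient distributions are well separated, all four decoders give near-identical accuracies, mirroring the robustness reported in **Figure 5C**.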

### *Extension of the method's applicability to time-varying synergies*

So far, we have validated our method on EMG data described by synchronous synergies. However, it is readily applicable to any type of representation of muscle activation patterns. To demonstrate its robustness independently of the underlying model, we repeated all the tests performed above with simulated data generated from four physiologically plausible time-varying synergies (see section "Materials and Methods"). The reference values of the noise parameters (*v* = 1 and *a* = 0.1) used in these simulations were set so that the amounts of task-discriminating and non-discriminating variability in the simulated data matched those measured in the experimental data on which we built the simulations. We note that in all cases adding realistic amounts of noise (such as those expressed by the parameter values used in all subsequent simulations) allowed accurate reconstruction of the shape of the underlying synergies (shown in **Figure 3A**).

First, we validated that the method correctly identifies the set of synergies that generated all task-to-task differences, independently of the original synergy set dimensionality (**Figures 3C–E**). Then, using all four synergies, we examined the method's robustness to higher levels of both types of variability. Again, the number of synergies explaining all task-discriminating variations was reliably identified (**Figures 3F–H**). The slightly larger confidence intervals of the "null hypothesis" (shaded red area) result from the larger variance of the estimates of this decomposition with respect to the synchronous synergies. This can arise from the non-guaranteed convergence of the time-varying synergy extraction algorithm and its sensitivity to the initial guess, two factors that make the output of time-varying decompositions less stable across simulations.

Following this, we aimed at estimating the minimal number of trials required to guarantee reliable synergy set selection. Five trials appeared to be too few for our proposed test of statistical significance (**Figure 5B**). In this case, the assessment of the number of synergies was not reliable because the distribution of the shuffled surrogates had a very large variance. As a result, increasing the dimensionality did not significantly increase the decoding performance (with respect to the shuffled surrogates), and our method thus usually yielded one synergy. Therefore, we suggest a minimum of 10 repetitions of each motor task for potential future studies evaluating the importance of trial-to-trial variability in time-varying synergy extraction.

We note that the minimum number of trials required to correctly identify the synergy set may depend on both the number of muscles recorded by the EMG setup and the level of noise in a given session. The number of trials needed is likely to increase with the number of recorded muscles (because decoding in a higher-dimensional space with a small dataset is more difficult) and with the level of noise (due to the difficulty of detecting patterns in noisy conditions). While our simulations demonstrate that our method can reliably detect the synergy set with feasible amounts of data, for the reasons indicated above it is valuable to evaluate the minimum number of trials at a preliminary stage using simulated data (such as those considered here) constructed with statistical properties as similar as possible to the actual experimental data of interest.

Finally, we examined whether other decoding algorithms yield the same results for the time-varying synergies too (**Figure 5C**, right). Indeed, both the identified number of synergies and their decoding performance were again robust, once more indicating the applicability of our method independently of the properties of the dataset or the details of the mathematical implementation.

# **IDENTIFICATION OF THE SMALLEST SYNERGY SET THAT ACCOUNTS FOR ALL TASK-DISCRIMINATING VARIABILITY IN SINGLE-TRIAL SYNERGIES EXTRACTED DURING CENTER-OUT POINTING**

To illustrate the ability of our methodology to identify synergy sets on real data, we applied it to a dataset of EMG activity recorded during an experimental protocol (fully described in section "Materials and Methods") comprising many repetitions of a variety of point-to-point reaching movements. For each of the four subjects tested, we formed an EMG matrix of dimensions 9 muscles × (50 time steps × 320 samples) consisting of all the movement-related EMG activity (rectified and filtered) of the 9 muscles for all recorded samples. To illustrate our methodology, we first present extensive results from the analysis of only one ("typical") subject's EMG dataset. In such a realistic situation, the correct number of synergies is unknown and the amount of the different types of noise is not available. Here we test the extent to which extracted synergies can express single-trial task-related differences and how many synergies are selected by our method (compared to the VAF-based criteria). A summary of the results of all four datasets is reported at the end of the following section.

# *Synchronous synergy identification*

We first illustrate the application of our method to recorded synchronous synergies. We extracted using the NMF algorithm (see section "Materials and Methods") single-trial synchronous synergies from the experimental EMG data, beginning with the typical subject (**Figure 6**). These synergies consist of constant vectors of levels of muscle activations (**Figure 6A**) recruited by time-varying activation coefficients. Note that the synergies and coefficients shown in this figure were obtained using *N* = 4 in

**FIGURE 6 | Application of our decoding method to the synchronous synergies extracted from the EMG data of a typical subject recorded during the execution of an arm pointing task. (A)** The four synchronous synergies obtained from the experimental data of the example subject as four vectors of activation levels of 9 shoulder and elbow muscles. **(B)** Histograms of the integral of the activation coefficient (Int) across the 8 motor tasks performed. **(C)** VAF (black curve whose scale is indicated in left *y*-axis) and percent correct curve (red curve whose scale is indicated in right *y*-axis). The shaded area represents the 5–95% confidence interval of the bootstrap test for decoding. **(D)** Histogram of the number of synergies

selected by each one of the existing criteria (gray bars) and our proposed one (black bar). **(E,F)** Decoding the 8 motor tasks using the integral of the 4 synchronous synergies' activation coefficients. For a given trial to be decoded, the activation coefficients of 2 synergies are represented as a point in the 2-dimensional space. The color of each point indicates the actual task which this trial corresponds to. The linear discriminant algorithm has divided the space into 8 regions, one for each class (motor task). The trial is assigned to the class indicated by the color of the region on which the point lies. **(E)** Classification using the integral of activation of synergies S2 and S4. **(F)** Classification in the S1–S3 space.

the synergy extraction algorithm because we found (see next sections) that this is the minimal set of synergies explaining all task-discriminating variations. However, the four most task-discriminating synergies and their coefficients obtained assuming more synergies were quantitatively and qualitatively very similar to those presented in **Figure 3** (results not shown). The resulting muscle groupings had a straightforward anatomical and functional interpretation: Synergy1 describes mainly the activation of two shoulder flexors; Synergy2 has strong activations of elbow extensors; Synergy3 has strong activations of elbow flexors and Synergy4 is primarily composed of shoulder extensors (**Figure 6A**).
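The extraction step can be sketched with a minimal multiplicative-update NMF (the classic Lee–Seung rules) applied to a toy EMG matrix. The function names, matrix sizes, and iteration count below are illustrative assumptions, not the authors' actual implementation:

```python
import numpy as np

def extract_synergies(emg, n_syn, n_iter=1000, seed=0):
    """Factorize a non-negative EMG matrix as emg ~ W @ H using
    Lee-Seung multiplicative updates. W (muscles x n_syn) holds the
    synchronous synergy vectors; H (n_syn x samples) their activations."""
    rng = np.random.default_rng(seed)
    n_mus, n_samp = emg.shape
    W = rng.random((n_mus, n_syn)) + 1e-9
    H = rng.random((n_syn, n_samp)) + 1e-9
    for _ in range(n_iter):
        H *= (W.T @ emg) / (W.T @ W @ H + 1e-12)
        W *= (emg @ H.T) / (W @ H @ H.T + 1e-12)
    return W, H

def vaf(emg, W, H):
    """Variance accounted for by the reconstruction W @ H."""
    resid = emg - W @ H
    return 1.0 - (resid ** 2).sum() / (emg ** 2).sum()

# Toy data: 9 muscles x 200 samples generated from 4 ground-truth synergies.
rng = np.random.default_rng(1)
emg = rng.random((9, 4)) @ rng.random((4, 200))
W, H = extract_synergies(emg, n_syn=4)
print(round(vaf(emg, W, H), 3))  # close to 1 on noiseless rank-4 data
```

In the paper the factorization is applied to single-trial, time-normalized EMG; the toy matrix here merely demonstrates the decomposition into muscle groupings and activation coefficients.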

In order to assess how synchronous synergies allow decoding of the task on single trials, we computed the time integral of the synergies' activation coefficients, which represents the magnitude of the synergy activation (see section "Materials and Methods"). The magnitude of each synergy clearly depended upon the task. More precisely, S1 is primarily activated for out-center movements starting from the left; S2 is recruited for center-out leftward movements; S3 is recruited for out-center movements starting from the right and S4 for center-out rightward movements (**Figure 6B**).

**Figure 6C** plots the VAF curve as a function of the number of synergies (black curve) and **Figure 6D** shows the number of synergies that would be selected according to each of the standard VAF-based criteria. It is clear that these criteria do not yield a consistent number of synergies for this dataset, and selecting the correct set of synergies in this way necessarily relies on a somewhat arbitrary choice. We attempted to resolve this problem by complementing the VAF curve with the decoding performance afforded by the single-trial parameters of the model. We applied our proposed decoding-based method to the magnitude of the extracted synchronous synergies. The decoding performance curve (red curve in **Figure 6C**) obtained by varying the number of synergies (*N*) saturates at *N* = 4, indicating that the task-discrimination power added by the magnitude parameters of any additional synergy is negligible. Thus, application of our model selection algorithm gave 4 synergies (the last point on the curve lying above the red shaded area).
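The saturation rule can be sketched as follows: keep adding synergies until the gain in decoding performance becomes negligible. Here a fixed tolerance stands in for the paper's bootstrap confidence band, and the example curve is invented to resemble the one in Figure 6C:

```python
def select_n_synergies(perf_by_n, tol=0.02):
    """perf_by_n[i] = decoding performance (fraction correct) obtained
    with i+1 synergies. Return the smallest N after which adding one
    more synergy improves decoding by no more than tol."""
    for n in range(1, len(perf_by_n)):
        if perf_by_n[n] - perf_by_n[n - 1] <= tol:
            return n  # performance saturated at n synergies
    return len(perf_by_n)

# Invented curve resembling Figure 6C: decoding saturates at 4 synergies.
curve = [0.30, 0.48, 0.60, 0.70, 0.705, 0.71]
print(select_n_synergies(curve))  # -> 4
```

The paper's criterion uses a bootstrap test rather than a fixed tolerance, but the logic of stopping at the last significant gain is the same.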

To gain more insight into how these 8 motor tasks are decoded using the magnitude parameters of the four extracted synergies, we visualized the decoding procedure. In **Figures 6E,F**, we show scatter plots of the parameters in a 2-dimensional space that has been split by the LDA into 8 different classes, one for each task. Each point is colored according to the actual class to which the trial corresponds and is assigned by the decoding algorithm to the class represented by the colored region on which it lies. Thus, the trial is decoded correctly only if these two colors coincide. From **Figure 6E**, it is clear that activations of synergies S2 and S4 can discriminate well the forward tasks (T1-T2-T3-T4), but the backward ones (T5-T6-T7-T8) are much less discriminable (larger overlap of the data points). This is why two synergies alone are not sufficient to distinguish all eight tasks. We may expect that adding the other two synergies will resolve this problem. Indeed, **Figure 6F** shows that in the S1-S3 space, even though tasks T1-T2-T3-T4 are poorly classified, there is a clear improvement in the classification of tasks T5-T6-T7-T8. So, to get maximal discrimination of the 8 motor tasks four synergies are required, two of them encoding the forward movements and the other two the backward ones.
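The decoding step can be sketched with a from-scratch shared-covariance linear discriminant. The toy 2-D "synergy magnitude" clusters, their noise level, and the class layout are invented for illustration and do not reproduce the paper's data:

```python
import numpy as np

class LDA:
    """Minimal linear discriminant classifier with a pooled covariance
    (a sketch standing in for the paper's LDA decoder)."""
    def fit(self, X, y):
        self.classes = np.unique(y)
        self.means = np.array([X[y == c].mean(axis=0) for c in self.classes])
        centered = np.vstack([X[y == c] - X[y == c].mean(axis=0)
                              for c in self.classes])
        self.cov_inv = np.linalg.pinv(np.cov(centered.T))
        self.priors = np.array([(y == c).mean() for c in self.classes])
        return self

    def predict(self, X):
        # Linear score per class: x'S^-1 mu - (1/2) mu'S^-1 mu + log prior
        quad = np.einsum('ij,jk,ik->i', self.means, self.cov_inv, self.means)
        scores = X @ self.cov_inv @ self.means.T - 0.5 * quad + np.log(self.priors)
        return self.classes[np.argmax(scores, axis=1)]

# Four well-separated "tasks" in a 2-D synergy-magnitude space.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0], [3.0, 3.0]])
X = np.vstack([c + 0.3 * rng.standard_normal((40, 2)) for c in centers])
y = np.repeat(np.arange(4), 40)
acc = (LDA().fit(X, y).predict(X) == y).mean()
print(round(acc, 2))  # near-perfect on well-separated clusters
```

The colored regions in Figures 6E,F correspond to the decision regions induced by such linear discriminant scores: a trial is assigned to whichever class score is highest at its point in synergy space.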

We then examined the results from the single-trial analysis of the synchronous synergies extracted from all four recorded subjects. The overall decoding performance afforded by synchronous synergies ranged from 63% to 73% across subjects for the LDA decoding algorithm. The set of synergies needed to explain all task-discriminating variability consisted of four synergies in two subjects and three in the other two. In general, the synchronous synergies extracted from different subjects were qualitatively and quantitatively similar. The average similarity between the synchronous synergies extracted from different subjects is shown in **Figure 8** (lower left part). In the two subjects that needed four synergies to capture all task-discriminating variations, muscle groupings were similar to the ones shown in **Figure 6A**. In the two subjects that needed only three synergies, the elbow extensors (S2 in **Figure 6A**) were either grouped with the shoulder extensors (S4 in **Figure 6A**) to form one synergy or activated by all three remaining synergies at lower levels (results not shown). Hence, S1, S3, and S4 were identified in all subjects, whereas S2 was present only in the two subjects that used four synergies to perform the tasks. In all subjects the decoding performance of the selected synergies was significantly higher than random for all tasks (*p* < 0.05, bootstrap test). We also tested the decoding performance of each of the four synchronous synergies separately and found higher-than-random decoding for all of them (ranging from 31% to 46%). This implies that each synergy in the set exhibits a degree of tuning to all task directions.

# *Time-varying synergy identification*

To characterize the spatiotemporal organization of the recorded muscle patterns, we fed the EMG matrix to the time-varying synergy extraction algorithm and extracted sets of time-varying synergies (see section "Materials and Methods"). Each EMG pattern in each sample was then described by the coefficients specifying the amplitude (scaling coefficient) and time delays of the activation of each synergy. The extracted synergies, the mean activation coefficients and delays across tasks are shown for the typical subject in **Figure 3**.
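Formally, the time-varying model reconstructs each single-trial muscle pattern as a sum of synergy waveforms, each scaled by a non-negative coefficient and shifted in time; the scaling coefficient and delay are exactly the two single-trial parameters used for decoding below:

```latex
\mathbf{m}(t) \;\approx\; \sum_{i=1}^{N} c_i \,\mathbf{w}_i(t - t_i), \qquad c_i \ge 0,
```

where $\mathbf{m}(t)$ is the vector of muscle activations at time $t$, $\mathbf{w}_i(\cdot)$ is the $i$-th time-varying synergy waveform, and $c_i$ and $t_i$ are its trial-specific amplitude and onset delay.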

We then examined how these single-trial parameters were modulated by the task (**Figure 3B**). We considered the distribution across tasks of the scaling coefficient and found that it exhibited a clear task-dependence (**Figure 3B**) (D'Avella et al., 2008), similar to the one found for the integral in the synchronous synergies model. The corresponding delays also depended on the task performed, but their relationship with the motor tasks was less apparent than that of the scaling coefficients (results not shown). Then, we applied the decoding-based method to select the set of time-varying synergies that explains all task-discriminating variability. Our method yielded four synergies (**Figure 7A**) also for this model. We further asked whether we could reach the same result using the VAF-based criteria. **Figure 7B** shows that each one indicated a different number of synergies, rendering such an assessment inconclusive and pointing out the arbitrariness of any selection made using these criteria.

**FIGURE 7 | Application of our decoding method to the time-varying synergies extracted from the EMG data of the typical subject. (A)** VAF (black curve whose scale is indicated in left *y*-axis) and decoding performance curve (red curve whose scale is indicated in right *y*-axis) for the example subject. The red curve represents the percent correct values using the scaling coefficients of the time-varying synergy model. The shaded area represents the 5–95% confidence interval of the bootstrap test for decoding. **(B)** Histogram of the number of synergies selected by each one of the existing criteria (gray bars) and our proposed one (black bar). **(C,D)** Decoding the 8 motor tasks using the time-varying synergies' scaling coefficients. **(C)** Classification using the coefficients of synergies S1 and S2. **(D)** Classification in the S3–S4 space. Region and marker color conventions as in **Figures 6E,F**.

As we did for the synchronous synergies, we examined the discriminability of the motor tasks also in the time-varying synergy activations space (**Figures 7C,D**). Again, we found the activations of two synergies (S1-S2) describing the differences across the forward tasks (T1-T2-T3-T4) and the other two (S3-S4) were used to distinguish the backward ones (T5-T6-T7-T8).

We note that, with respect to previous studies of the role of synergies in goal-directed reaching movements (D'Avella et al., 2006, 2008), our method identified in this subject synergies spanning all four cardinal directions of the planar reaching space. This may be due to specific subject-to-subject or task-setup differences between our experiments and those previously published. Another possibility is that some synergies encoding specific directions carry relatively little EMG variance (such as those pointing right in right-handed movements, e.g., synergy S4 in **Figure 3A**, which expresses out-center movements starting from the right) and may thus be discarded according to VAF-based criteria. However, our method picks these synergies because they express degrees of freedom of movement that are relevant for the task and not expressed by other synergies, and so they must necessarily be included in a task-decoding analysis.

Similar results held for all four recorded subjects. The overall decoding performance afforded by the scaling coefficients of time-varying synergies ranged from 64% to 74% (for the LDA algorithm) across subjects (for comparison, chance level is 12.5%). The number of time-varying synergies needed to explain all task-discriminating variability was the same as the number of synchronous synergies for all four subjects. The shape of the synergies (i.e., the time course and the relative levels of muscle activations) was similar across subjects, as indicated by the high similarity index obtained for all pairs of subjects (**Figure 8**, upper right part). The main difference across subjects was that, in the two subjects that needed only three synergies, the muscle activations constituting the fourth (non-significant) synergy were either included in one of the other three synergies or distributed across all of them (results not shown). The resulting scaling coefficients were modulated accordingly, so as to produce movements to all task directions and make all motor tasks identifiable. Indeed, in all subjects the scaling coefficients of the selected synergies had significantly higher-than-random decoding performance (*p* < 0.05, bootstrap test) for all reaching directions. Furthermore, we found that all time-varying synergies, too, exhibited some degree of tuning to all task directions (individual decoding performance ranged from 33.75% to 38.75%).
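The inter-subject similarity index behind Figure 8 can be sketched as a best-match cosine similarity between two subjects' synergy sets. The greedy matching and the toy data below are illustrative assumptions and not necessarily the paper's exact procedure:

```python
import numpy as np

def synergy_similarity(W_a, W_b):
    """Mean best-match cosine similarity between two synergy sets
    (columns = synergies). Each synergy in set A is greedily paired
    with its most similar, still-unmatched synergy in set B."""
    A = W_a / np.linalg.norm(W_a, axis=0)
    B = W_b / np.linalg.norm(W_b, axis=0)
    sim = A.T @ B                           # pairwise cosine similarities
    matched, free = [], list(range(sim.shape[1]))
    for i in np.argsort(-sim.max(axis=1)):  # strongest matches first
        j = max(free, key=lambda j: sim[i, j])
        matched.append(sim[i, j])
        free.remove(j)
    return float(np.mean(matched))

# Identical synergy sets (up to column order) give similarity ~1.
rng = np.random.default_rng(0)
W = rng.random((9, 4))
print(round(synergy_similarity(W, W[:, [2, 0, 3, 1]]), 3))  # -> 1.0
```

The matching step matters because synergy extraction returns components in arbitrary order, so sets must be aligned before averaging the similarities.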

In sum, a small set of time-varying or synchronous synergies described all task-related differences in the examined EMG datasets. This set of synergies would have been hard to select relying on VAF-based criteria, because some of the standard criteria would exclude synergies that express task-discriminating muscle activations and contribute to the generation of each task, while others would include synergies that explain only task-irrelevant variations reflecting different sources of noise in the recorded data.

**FIGURE 8 | Synergy similarity matrix across all tested subjects.** The synergy sets extracted from all subjects that were selected by our method exhibited a high similarity either when the synchronous (bottom left) or when the time-varying synergy (top right) model was used.

# **USING THE DECODING FORMALISM FOR COMPARING THE ABILITY OF MUSCLE SYNERGY MODELS TO ENCODE TASK-DISCRIMINATING INFORMATION**

We further note that the overall decoding performance of a synergy model may be used as a criterion to decide which type of synergy decomposition is most suitable to describe a set of tasks and which (and how many) parameters of such a representation carry information about the task at hand. To illustrate this point, we compared the decoding performance of both time-varying and synchronous synergies with comparable numbers of parameters per synergy for all subjects tested.

We first examined decoding performance when using one parameter per synergy. We started by assessing the discrimination power of the two single-trial parameters of the time-varying synergies (i.e., scaling coefficients and time delays). Although decoding with the time delays is significantly above chance level (1/8 = 12.5%) for all subjects, the scaling coefficients afford a significantly higher decoding performance (37.3 ± 4.3% vs. 70.5 ± 2.2%, respectively; **Figure 9A**; *p* < 0.05, paired *t*-test). Comparing the scaling coefficients of the time-varying synergies with the integral of the activation coefficients of the synchronous synergies (**Figures 9A,B**), we found that the time-varying and the synchronous model had comparable performance (70.5 ± 2.2% vs. 69 ± 1.9%, respectively). The corresponding VAF values for these synergy decompositions were 81 ± 2.1% vs. 89 ± 1.4%, respectively. Consistent with previous findings (D'Avella and Bizzi, 2005), the synchronous synergy decomposition captures a significantly higher percentage of the variability of the data (*p* < 0.05, paired *t*-test).

Then, we evaluated the decoding performance gain obtained when combining two parameters per synergy. For the time-varying synergies, using both parameters yielded a significantly better decoding than with the scaling coefficients alone

**FIGURE 9 | Comparison of the decoding performance of synchronous and time-varying synergies. (A)** Percent correct values of the time-varying synergy decomposition for all four subjects tested using only the scaling coefficients (left), only the delays (middle) and both single-trial parameters per

synergy (right). The last column (black) is the mean across subjects. **(B)** Decoding performance as a function of the number of bins in which the integral of the activation coefficient is split for all subjects. The black curve and shaded area represents the mean ± SEM across subjects.

(72 ± 2.7% vs. 70.5 ± 2.2%, respectively; **Figure 9A**; *p* < 0.05, paired *t*-test). For the synchronous synergies, we divided the movement duration into two phases; we computed the integral of the synchronous synergy activation coefficients for each phase and used these values as the decoding parameters. Also for this model, decoding using two parameters was significantly better than with one (72 ± 2.6% vs. 69 ± 1.9%, respectively; *p* < 0.05, paired *t*-test). Then, we compared the task-discrimination power of the two models. We found that, when using two parameters per synergy (scaling and delay coefficients for the time-varying model and the integral in two equal bins for the synchronous), the decoding performance of the two types of synergy decomposition was 72 ± 2.7% vs. 72 ± 2.6%, respectively (**Figure 9B**). Thus, for this particular set of tasks and muscles, both synergy decompositions seem equally adequate and equally compact in describing the task-discriminating variations in muscle activation signals. Indeed, equal decoding performance is obtained when using the same number of parameters in both approaches (*p* > 0.05, paired *t*-test).

Finally, we tested whether adding more single-trial parameters per synergy adds more information about the task. As the time-varying synergies have only two single-trial parameters, we restricted this analysis to the synchronous synergy model. By binning the activation coefficients into progressively shorter time bins and computing the corresponding integrals, we progressively increased the number of parameters used for decoding. **Figure 9B** depicts the decoding performance for all subjects as a function of the number of bins. Decoding performance saturated quickly for all subjects, at approximately three bins, meaning that all task-discriminating information can be described by considering three basic phases of one-shot rapid movements. This is reminiscent of the well-known triphasic pattern (a first agonist burst, followed by an antagonist burst, followed by a second agonist burst) observed during the control of ballistic single-joint rotations but also during whole-body actions (Berardelli et al., 1996; Chiovetto et al., 2010). This result points out that, despite the potentially high number of free parameters describing the activation time course of synchronous synergies, the task being performed may be encoded by a much more compressed set of parameters without loss of information about task-discriminating variations in synergy recruitment.
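The binning procedure can be sketched as follows; `binned_integrals` is a hypothetical helper name, and equal-width bins over the movement duration are an assumption:

```python
import numpy as np

def binned_integrals(activation, n_bins):
    """Split a synergy's activation time course into n_bins contiguous
    phases and integrate (sum) each one, yielding n_bins decoding
    parameters per synergy."""
    return np.array([p.sum() for p in np.array_split(activation, n_bins)])

# With one bin this reduces to the overall activation integral used earlier.
act = np.array([0.0, 0.2, 0.8, 1.0, 0.5, 0.1])
print(binned_integrals(act, 1))   # the total integral
print(binned_integrals(act, 3))   # one integral per movement phase
```

Feeding the concatenated per-bin integrals of all synergies to the decoder reproduces the parameter-count sweep of Figure 9B: one bin gives one parameter per synergy, three bins give three, and so on.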

In sum, in this particular dataset the synchronous synergy model was slightly more effective at describing task-discriminating variability than the time-varying one when the time course of synchronous synergies was modeled by three or more parameters. This corresponds to a slight increase in model complexity with respect to the use of time-varying synergies whose single-trial activation is described by two parameters per synergy. However, when using the same number of parameters, time-varying and synchronous synergies explained the same amount of task-discriminating variability.

# **DISCUSSION**

In this article we proposed and implemented an automated method, based on single-trial task decoding, to evaluate the extent to which muscle synergy recruitment can be mapped onto motor task identification. We showed how this new metric complements the VAF metric commonly used to test the validity of muscle synergy decompositions and can be used to address questions related to movement execution in task space. Quantifying the degree of validity of muscle synergies in task space is fundamental since synergies (as extracted from dimensionality reduction techniques) are assumed to be the building blocks of movement production that can be reused across tasks, while the synergy combination coefficients represent the task-dependent motor features. This study presents a systematic procedure to measure the degree of feasibility of such a modular and hierarchical control scheme by means of a single-trial task decoding technique. The significance of our conceptual and computational developments is discussed in the following.

## **CONCEPTUAL FOUNDATIONS OF THE METHOD**

Fundamental to the method is the fact that motor behaviors are produced on single trials and therefore should be analyzed on such a basis (Ranganathan and Krishnan, 2012; Ting et al., 2012). Even though single-trial analysis is not new in the study of muscle synergies (Tresch et al., 1999; Torres-Oviedo and Ting, 2007, 2010; Kutch et al., 2008; Valero-Cuevas et al., 2009; Chvatal et al., 2011), it appears to be a necessary and rational component in our methodology (otherwise task-decoding would be trivial with data averaged per task). Thus, exploiting single-trial analysis tools, we came up with a computational procedure that quantifies the task identification power afforded by different synergy decompositions. Our method uses this formalism to separate out trial-to-trial variations in synergy space that reflect task-discriminating variability (and hence increase decoding performance) from those that do not account for task-related differences. The latter can be regarded as "noise" as far as task discrimination is concerned, even if they reflect neurophysiological processes. This conceptualization mirrors the one often used in single-trial neural literature to separate signal from noise and identify the most informative components of neural responses (Quian Quiroga and Panzeri, 2009). The advantage of this method is that it allows the user to focus on task-discriminant aspects of variability using an objective and useful scale in a user-defined "task space" rather than measuring variability on a scale related only to the amplitude of the EMG signals (Quian Quiroga and Panzeri, 2009; Tolambiya et al., 2011).

As exemplified by our application to real data, one potential advantage of the decoding metric is that it can identify muscle activation components of relatively low amplitude (accounting for a small amount of the VAF) that nonetheless reflect unique information about the task. Furthermore, a comparison of synergy sets determined by VAF or by decoding metrics may be useful to tease apart synergies that provide unique information about task-to-task differences (such as the synergy set individuated by decoding) from synergies that are task-invariant but contribute an important part of the variance because they, for example, implement muscle activation for maintaining posture. Such important task-invariant synergies are likely to appear as "extra" synergies selected by the VAF method (because they carry variance) but not by the decoding method (because they do not add much task-discriminating power).

Interestingly, in the experimental data we found that our method revealed relatively high decoding scores for relatively small synergy sets, supporting the idea that a small set of synergies are recruited in different ways during the execution of a larger number of different motor tasks. This finding is compatible with the idea of muscle synergies as intermediate low-dimensional representations of the mapping between motor commands and task goals and the associated theory of hierarchical motor control (Ting and McKay, 2007; Todorov, 2009).

The hierarchical view of motor control implies that a desired motor behavior can be mapped onto specific recruitment of muscle synergies whose activation leads to the expected behavior (Bizzi et al., 2000; Tresch and Jarc, 2009). Due to the nonlinearities of the musculoskeletal plant, even small variations in the muscle pattern could lead to very different behaviors at the end-effector level and thus could affect task achievement. Therefore, evaluating the effectiveness of synergies in terms of motor task performance is crucial. Ultimately, biomechanical modeling/simulation is needed to achieve this (Neptune et al., 2009; Kargo et al., 2010). Task-decoding is a useful additional metric to validate (or invalidate) a certain chosen synergy decomposition before, or even without, this step. High task-decoding scores lend credibility to a synergy decomposition. In contrast, low task-decoding scores (e.g., decoding performance that is robustly around chance level for different decoding algorithms) may falsify a given synergy decomposition, even if VAF is close to 100%. Generally speaking, the decomposition/model yielding concurrently the highest VAF and the highest decoding score could be viewed as the most likely representation of neural synergies implemented in the CNS. In practice, the decoding metric was shown to be more robust to various sources of noise than the VAF. As such, it provides a more stable reference for inter-subject or inter-study comparisons. Concretely, this approach can be useful to reduce side effects on VAF values related to the different equipment, subjects, or signal pre-processing used by different researchers.

Previous studies have proposed to separate task-discriminating from non-discriminating variations by evaluating VAF within each task separately (Torres-Oviedo and Ting, 2010; Chvatal et al., 2011; Roh et al., 2011) or by identifying task-specific muscle synergies (Cheung et al., 2009a). These methods are very effective when some tasks in the examined set are executed using synergies that are not shared with other tasks. In contrast, our method is more effective when muscle activations in different tasks differ not because of the highly specific activation of synergies in particular tasks only, but rather because of different activation coefficients of the same group of synergies. In this latter case, applying synergy decompositions separately to each individual task would lead to the identification of a larger set of synergies than those actually generating the task-discriminating muscle activations (we verified this intuition by extracting synergies one task at a time in our simulated datasets, which invariably led to the incorrect identification of a larger set of synergies with lower task-discrimination power than the synergy set determined by our method; data not shown).

Our findings parallel results recently obtained in experimental and modeling studies in frogs (Kargo and Giszter, 2000, 2008; Kargo et al., 2010). In these studies, the authors varied the initial configurations as well as the level of muscle vibrations applied to the frog's hindlimb and showed that flexible combinations of a small set of invariant muscle patterns can produce accurate targeting of the frog's hindlimb from a large range of starting positions. It would be of interest to apply our method to this dataset in order to assess quantitatively the reliability of the correspondence between such task variables or feedback stimuli and muscle pattern activations. We believe that our method could serve to determine the functional role of the identified muscle patterns and further evaluate the significance of their coupling under different experimental conditions.

# **ON THE COMPUTATIONAL METHOD AND ITS USE**

# *Critical comparison of the validity of different synergy models*

An advantage of the derived decoding metric is that it allows a direct comparison of the performance of different synergy decompositions in terms of task execution when using the same number of parameters in all decompositions. This is useful for testing the validity of various hierarchical motor control schemes or various mathematical representations of muscle synergies. For example, the synchronous synergy model may account for more variance than the time-varying one (as happens in the EMG dataset considered here), but this may be due to the fact that synchronous synergies are potentially characterized by a larger number of parameters than time-varying synergies (because the former retain the full time course of activation, whereas the single-trial activation of the latter is described by two parameters per synergy) (D'Avella and Bizzi, 2005; D'Avella et al., 2006). Specifically, by applying this metric to our dataset we found that the two decompositions decode tasks equally well when using the same number of parameters. Moreover, our formalism can be used to evaluate the loss of task-discriminating information due to simplification of the time course of synergy activation. In our example dataset, the analysis suggested that a full representation of the temporal activation pattern of synchronous synergies is not crucial for encoding the goal of the task: only the average activation during two or three temporal phases seems sufficient to determine which target will be reached.

To shed more light on the effectiveness of synergies as low-dimensional structures for motor control, future research could aim at extending our formalism to quantify the predictability of kinematic movement features in continuous time (Corbett et al., 2010), rather than just decoding which task (out of many) was performed. This would enable a proper comparison between the dimensionality of the instantaneous kinematics needed to perform the tasks and the dimensionality of the muscle representations that explain all the kinematic range elicited by task execution. Furthermore, it would be of interest to apply our methods to experimental datasets containing a wider range of complex movements, to determine which synergy decomposition is more effective in general and particularly in more "daily-life" situations. For example, in an entire reach-and-grasp motor task, we could expect that the time shifts of the time-varying synergy model, or a more detailed consideration of the temporal profile of the synchronous synergy activations, would be more relevant to task discrimination.

### *Automated selection of the minimal number of synergies*

Since our framework allows critical evaluation of any muscle synergy model, it can be used in particular to select the minimal number of synergies for a fixed synergy representation. Assuming, for example, time-varying synergies as the model from which movements are generated, the simple question of how many synergies are required to describe distinctly the entire set of motor tasks under consideration is usually difficult to answer. The empirical and intuitive VAF-based methods are either not automatic or lack an objective rationale. Even more problematic is the fact that a synergy accounting for a large or small amount of the total VAF might have no implication with respect to the task goal. We proposed a recursive and automated method to compare a synergy set of *N* elements with a synergy set of *N* − 1 elements. The basic argument is that if adding a synergy significantly improves task-decoding performance, then increasing the dimensionality by one is worthwhile. We showed that this procedure is effective on simulated datasets for various levels of noise. However, examining decoding alone might be insufficient in some cases: for some datasets, one synergy could allow perfect decoding even though the muscle pattern cannot be reconstructed accurately (very low VAF). To cope with such cases, a potential variation of our method may involve first choosing a lower bound for *N* based on an inspection of the VAF graph, and then running our automated procedure from this *N* to refine the selection of the number of synergies. More generally, considering both VAF and decoding seems important to fully understand the function of the computed synergies, as a consequence of the complementarity of the two metrics discussed above.
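The recursive *N* vs. *N* − 1 comparison can be sketched as a bootstrap test on per-trial decoding correctness. The function name, confidence level, and toy correctness vectors below are assumptions for illustration, not the authors' exact bootstrap:

```python
import numpy as np

def significant_gain(correct_n, correct_nm1, n_boot=2000, alpha=0.05, seed=0):
    """Decide whether decoding with N synergies beats N-1 synergies:
    bootstrap the mean per-trial correctness difference and test whether
    its lower alpha-quantile stays above zero."""
    rng = np.random.default_rng(seed)
    diff = np.asarray(correct_n, float) - np.asarray(correct_nm1, float)
    boots = [diff[rng.integers(0, len(diff), len(diff))].mean()
             for _ in range(n_boot)]
    return np.quantile(boots, alpha) > 0.0  # gain significantly above zero

# Per-trial correctness (1 = decoded correctly): a clear 30-point gain ...
gain = significant_gain([1] * 70 + [0] * 30, [1] * 40 + [0] * 60)
# ... and an essentially negligible 1-point gain.
no_gain = significant_gain([1] * 55 + [0] * 45, [1] * 54 + [0] * 46)
print(gain, no_gain)  # -> True False
```

Run from *N* = 1 upward, the selected number of synergies is the last *N* for which the gain over *N* − 1 remains significant, matching the "last point above the shaded area" rule used in the figures.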

## **POSSIBLE APPLICATIONS AND EXTENSIONS**

There is evidence that the activity of motor cortical neurons in the monkey (Holdefer and Miller, 2002) and the cat (Yakovenko et al., 2010) during reaching movements, as well as of spinal interneurons in the frog during reflex motions (Hart and Giszter, 2010), correlates with muscle synergies and/or their recruitment. These studies suggest that the physiological basis of muscle synergy structures, which are observed in the motor output and assumed to simplify motor control, may rely on a focused selection of a set of neurons. We note that our decoding approach could in principle be applied to decode the task from single-trial neural population patterns activating synergies, to determine which patterns encode the task and which patterns carry information additional or independent to that carried by other patterns. Application of our method to simultaneous recordings of neural activity and EMGs during the execution of different movements could therefore lead to the determination, at the same time, of the minimal set of synergies and the minimal set of neural activity patterns that explain all task-discriminating neural and muscle activity, and to the specification of an explicit link between these two sets. These considerations suggest that the work presented here lays the foundations for a deeper understanding of the relationships between single-trial neural activity and the resulting recruitment of muscle synergies.

More generally, this investigation could prove useful for human-machine interfaces and neuroprosthetics (Nazarpour et al., 2012; Ting et al., 2012). Assuming it is possible to decrypt movement intention (e.g., what target a subject wants to reach), a set of synergies could be recruited accordingly and, if our metric showed good task-decoding performance for the synergy decomposition considered, we could ensure that a coordinated multijoint arm movement would be generated toward the expected target. Recent works have aimed at exploiting synergies to control arm and hand motion (Jackson et al., 2006; Radhakrishnan et al., 2008; Vinjamuri et al., 2011), and assessing the validity of muscle synergies in task space appears to be a basic prerequisite for the effectiveness of such techniques.

# **ACKNOWLEDGMENTS**

We acknowledge the financial support of the SI-CODE project of the Future and Emerging Technologies (FET) programme within the Seventh Framework Programme for Research of The European Commission, under FET-Open grant number: FP7–284553. We are deeply indebted to E. Chiovetto for participating in the first stage of this project and to M. Jacono for technical assistance.



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 06 November 2012; accepted: 07 February 2013; published online: 26 February 2013.*

*Citation: Delis I, Berret B, Pozzo T and Panzeri S (2013) Quantitative evaluation of muscle synergy models: a singletrial task decoding approach. Front. Comput. Neurosci. 7:8. doi: 10.3389/ fncom.2013.00008*

*Copyright © 2013 Delis, Berret, Pozzo and Panzeri. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

# A methodology for assessing the effect of correlations among muscle synergy activations on task-discriminating information

# *Ioannis Delis 1,2,3\*, Bastien Berret 1,4, Thierry Pozzo1,5,6 and Stefano Panzeri 3,7*


### *Edited by:*

*Andrea D'Avella, IRCCS Fondazione Santa Lucia, Italy*

### *Reviewed by:*

*Masafumi Oizumi, University of Wisconsin–Madison, USA*

*Andrew Jackson, Newcastle University, UK*

### *\*Correspondence:*

*Ioannis Delis, Institute of Neuroscience and Psychology, University of Glasgow, 58 Hillhead Street, Glasgow, G12 8QB, UK. e-mail: ioannis.delis@glasgow.ac.uk*

Muscle synergies have been hypothesized to be the building blocks used by the central nervous system to generate movement. According to this hypothesis, the accomplishment of various motor tasks relies on the ability of the motor system to recruit a small set of synergies on a single-trial basis and combine them in a task-dependent manner. It is conceivable that this requires a fine tuning of the trial-to-trial relationships between the synergy activations. Here we develop an analytical methodology to address the nature and functional role of trial-to-trial correlations between synergy activations, which is designed to help to better understand how these correlations may contribute to generating appropriate motor behavior. The algorithm we propose first divides correlations between muscle synergies into types (noise correlations, quantifying the trial-to-trial covariations of synergy activations at fixed task, and signal correlations, quantifying the similarity of task tuning of the trial-averaged activation coefficients of different synergies), and then uses single-trial methods (task-decoding and information theory) to quantify their overall effect on the task-discriminating information carried by muscle synergy activations. We apply the method to both synchronous and time-varying synergies and exemplify it on electromyographic data recorded during performance of reaching movements in different directions. Our method reveals the robust presence of information-enhancing patterns of signal and noise correlations among pairs of synchronous synergies, and shows that they enhance by 9–15% (depending on the set of tasks) the task-discriminating information provided by the synergy decompositions.
We suggest that the proposed methodology could be useful for assessing whether single-trial activations of one synergy depend on activations of other synergies and quantifying the effect of such dependences on the task-to-task differences in muscle activation patterns.

**Keywords: muscle synergies, correlations, information theory, task decoding, single-trial analysis**

# **INTRODUCTION**

The central nervous system (CNS) is capable of performing a wide repertoire of motor tasks despite the high complexity of the musculoskeletal system (Bizzi et al., 1998). A possible strategy for achieving such versatility despite the difficulty of controlling so many degrees of freedom may rely on generating movement as a combination of a small number of invariant muscle patterns, commonly referred to as muscle synergies (Tresch et al., 1999; D'Avella et al., 2003; Bizzi et al., 2008). Muscle synergies are presumably recruited by neural motor commands—the so-called synergy activations—to produce the muscle activities required for task execution (Ting and McKay, 2007). However, how the appropriate synergies are combined and their corresponding activation levels selected in single trials is still an open question.

An important empirical observation is that, although the CNS is able to generate reliable and consistent motor behavior in each single trial, synergy activations are highly variable across trials: repeated executions of the same motor task rely on different activations of muscle synergies (Tresch and Jarc, 2009). The impact of such trial-to-trial variability on task performance remains to be understood. A question of particular interest is whether activations of different synergies in the same trials are correlated—in other words, whether the recruitment of a given synergy may depend not only on the task at hand but also on the activation of other synergies—and what the potential roles of such correlations among synergy activations may be (Saltiel et al., 2001; Tresch et al., 2006). These trial-to-trial correlations of synergy activations are often neglected in studies that report, for example, only the mean of each synergy activation coefficient independently of the others. Investigating such correlations requires, of course, ways to quantify the joint distributions of synergy activations across different trials.

From a theoretical point of view, these correlations may arise from different factors or serve a number of different purposes. Because individual synergies are presumably recruited by different neural drives that are not necessarily synchronous, it is conceivable that information about the recruitment of one synergy may be utilized for the activation of complementary synergies. In the same vein, the brain could rely on such correlations to cope with neural noise by reinforcing the relationships between the recruitment of the different synergies. Moreover, correlations between synergy activations could emerge as a constraint of the task being executed and its identity. For example, variations of speed in a set of arm pointing movements to a given spatial target could naturally lead to such correlations. Alternatively, these correlations may not serve a purposeful function, but rather arise from limitations of the neural-musculoskeletal system and be detrimental to task performance. In the latter case, a useful strategy would be to minimize such correlations rather than using them as part of the motor strategy.

Here we introduce an analytical methodology to address the nature and functional role of trial-to-trial correlations between synergy activations. This method, which takes inspiration from methodologies developed for studying neural population codes, is designed to quantify how these correlations may contribute to generating appropriate motor behavior in single trials. The algorithm we propose first divides correlations between muscle synergies into types (noise correlations, quantifying the trial-to-trial covariations of synergy activations at fixed task, and signal correlations, quantifying the similarity of task tuning of the trial-averaged activation coefficients of different synergies). Then, building on recent work designed to quantify the single-trial task discriminability of EMG data (Delis et al., 2013), the method employs single-trial methods (task-decoding and information theory) to quantify the overall effect of correlations between synergy activations on the task-to-task differences in patterns of muscle activation. We show that the methodology is readily applicable to any type of synergy decomposition and demonstrate its use for addressing the functional role of coordinated synergy recruitment on a task-by-task basis. To illustrate the method, and to begin to reason about the existence and potential function of cross-synergy correlations in real EMG data, we finally apply the method to muscle synergies extracted from an electromyographic (EMG) dataset recorded during the execution of a variety of reaching tasks (Delis et al., 2013). This application reveals the robust presence of information-enhancing patterns of signal and noise correlations among pairs of synchronous synergies, and shows that they enhance by approximately 9–15% (depending on the set of tasks considered) the task-discriminating information provided by synchronous synergy decompositions.
Activations of time-varying synergies were instead much more weakly correlated and their correlations had a more limited impact on task information (0–5%).

# **MATERIALS AND METHODS**

# **MUSCLE SYNERGY EXTRACTION**

The extraction of muscle synergies relies on dimensionality reduction algorithms that determine stereotyped muscle activation patterns from the EMG data by modeling muscle activities as linear combinations of the extracted synergies (D'Avella et al., 2003; Tresch et al., 2006; Tresch and Jarc, 2009). There exist two influential models for describing muscle patterns as synergy combinations: the time-varying synergies (D'Avella and Tresch, 2002), which are genuine spatiotemporal patterns of muscle activation, with the EMG output specified by the amplitude and time lag of the recruitment of each synergy; and the synchronous synergies, which are co-varying groups of muscle activations, with the EMG output specified by a temporal profile defining the timing of each synergy during the task execution (Tresch et al., 1999; Cheung et al., 2005). Here, we implemented both models using algorithms based on Non-negative Matrix Factorization (NMF).

We selected this dimensionality reduction technique over alternatives, such as Principal Component Analysis (PCA) or Independent Component Analysis (ICA), for two reasons. First, it imposes a non-negativity constraint on the extracted synergies (Tresch et al., 2006). Such a constraint reflects the properties of muscle activation signals well, as muscles cannot be activated "negatively." Second, and more importantly for this study, PCA and ICA make specific assumptions about the dependencies among the extracted synergies (orthogonality and statistical independence, respectively), which impose constraints on the relationships between the corresponding synergy activations as well. NMF does not impose such constraints and thus seems more suitable for studying the trial-to-trial relationships of synergy activations.

### *Synchronous synergy model*

We used the NMF algorithm (Lee and Seung, 1999) to extract synchronous synergies. In this model, the EMGs are represented as a linear combination of a set of time-invariant activation balance profiles across all muscles, each activated by a time-dependent activation coefficient:

$$\mathbf{m}^{s}(t) = \sum\_{i=1}^{N} c\_{i}^{s}(t)\,\mathbf{w}\_{i} + \boldsymbol{\varepsilon}^{s}(t) \tag{1}$$

where **m**<sup>*s*</sup>*(t)* is the EMG data of all muscles at time *t*; **w**<sub>*i*</sub> is the synergy vector for the *i*-th synergy; *c*<sub>*i*</sub><sup>*s*</sup>*(t)* is the scalar coefficient for the *i*-th synergy at time *t*; *N* is the total number of synergies composing the dataset; and **ε**<sup>*s*</sup>*(t)* is the residual (e.g., noise). The superscript *s* indicates the trial-dependence of the variables.
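As a concrete sketch of the extraction step, the multiplicative update rules of Lee and Seung (1999) can be written in a few lines. This is a minimal illustration on synthetic non-negative data, not the implementation used in this study; the matrix sizes, iteration count, and random stand-in "EMG" matrix are illustrative assumptions.

```python
import numpy as np

def nmf(V, N, n_iter=200, seed=0, eps=1e-9):
    """Multiplicative-update NMF (Lee and Seung, 1999): V ~= W @ C,
    with V (muscles x samples), W (muscles x N), C (N x samples)."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], N)) + eps
    C = rng.random((N, V.shape[1])) + eps
    for _ in range(n_iter):
        C *= (W.T @ V) / (W.T @ W @ C + eps)   # update activations c_i(t)
        W *= (V @ C.T) / (W @ C @ C.T + eps)   # update synergy vectors w_i
    return W, C

# Synthetic stand-in for rectified EMG: 9 muscles, 500 pooled samples
rng = np.random.default_rng(1)
V = rng.random((9, 500))
W, C = nmf(V, N=4)   # W: synergy vectors w_i; C: coefficients c_i(t)
```

In practice the number of synergies *N* would be chosen by a model-selection criterion such as the one described below, and the factorization would be repeated from several random initializations to avoid poor local minima.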

### *Time-varying synergy model*

We used the time-varying synergy model first introduced in D'Avella and Tresch (2002). According to it, a muscle pattern recorded during one sample *s* is decomposed into *N* time-varying muscle synergies combined as follows:

$$\mathbf{m}^{s}(t) = \sum\_{i=1}^{N} c\_{i}^{s}\,\mathbf{w}\_{i}\left(t - t\_{i}^{s}\right) + \boldsymbol{\varepsilon}^{s}(t) \tag{2}$$

where **m**<sup>*s*</sup>*(t)* is a vector of real numbers, each component of which represents the activation of a specific muscle at time *t*; **w**<sub>*i*</sub>(τ) is a vector representing the muscle activations for the *i*-th synergy at time τ after the synergy onset; *t*<sub>*i*</sub><sup>*s*</sup> is the time of synergy onset; *c*<sub>*i*</sub><sup>*s*</sup> is a non-negative scaling coefficient; and **ε**<sup>*s*</sup>*(t)* is the residual (e.g., noise). To implement this model, we used the NMF-based time-varying synergy extraction introduced in D'Avella et al. (2003).

In the following, our purpose is to develop a mathematical procedure for quantifying the single-trial correlations among synergy activations and in particular their contribution to the task-discriminating information carried by the synergy decompositions.

# **SINGLE-TRIAL DECODING OF MOTOR TASKS IN THE MUSCLE SYNERGY SPACE**

In our previous work, we introduced an approach for predicting the motor task performed in every single trial using muscle synergy activation parameters (Delis et al., 2013). In this study, our aim is to examine the functional role of the correlations between muscle synergy activations in each single trial. For simplicity, we restricted our decoding analysis to one single-trial parameter per synergy. We use as decoding parameters the time-integral of the synergy activation coefficient for the synchronous synergies and the scaling coefficient for the time-varying synergies. Decoding was performed using a quadratic discriminant algorithm (QDA) (Duda et al., 2001). QDA assumes that the probability of obtaining the synergy activation vector given that task *t* is performed follows a Gaussian distribution for each task. Based on this assumption, the algorithm attempts to decode the task by determining the decision boundaries that maximize the ratio of the between-task to the within-task distances. In contrast to the linear discriminant algorithm that we used previously, QDA assumes unequal covariance matrices across tasks, and this difference leads to quadratic instead of linear decision boundaries (Duda et al., 2001). Although the differences between these two algorithms in terms of task decoding were relatively small, QDA can in principle better handle the differences between correlated and uncorrelated data, as strong correlations may lead to curved decision boundaries (Duda et al., 2001; Averbeck and Lee, 2006).

To validate decoding results, we implemented "leave-one-out" cross-validation (Kjaer et al., 1994; Quian Quiroga and Panzeri, 2009), in which each trial is predicted based on the distribution of all other trials. Hence, in each step of this cross-validation procedure the training set consists of *M* − 1 trials, while the test set consists of 1 trial. This process is repeated until all *M* trials have been tested. The "leave-one-out" approach maximizes the number of trials used for optimizing the decoder (training set) as well as for assessing its performance (test set).
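A minimal sketch of this decoding scheme pairs scikit-learn's QDA implementation with leave-one-out cross-validation on synthetic single-trial activation parameters (one parameter per synergy per trial). The task structure, noise level, and regularization value below are illustrative assumptions, not the article's dataset.

```python
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.model_selection import LeaveOneOut, cross_val_predict

rng = np.random.default_rng(0)
n_tasks, trials_per_task, n_synergies = 8, 10, 4
tasks = np.repeat(np.arange(n_tasks), trials_per_task)
# Synthetic task-tuned activations plus trial-to-trial noise
X = 0.5 * tasks[:, None] + rng.normal(0.0, 0.3, (tasks.size, n_synergies))

# QDA fits one Gaussian (with its own covariance) per task, giving
# quadratic decision boundaries; each trial is predicted from all others
qda = QuadraticDiscriminantAnalysis(reg_param=0.1)
predicted = cross_val_predict(qda, X, tasks, cv=LeaveOneOut())
percent_correct = (predicted == tasks).mean()
```

The per-task covariance estimates make this decoder sensitive to noise correlations between synergy activations, which is the property exploited in the analyses below.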

### **QUANTIFYING INFORMATION FROM THE CONFUSION MATRIX**

Once decoding is performed, we need to choose a measure of decoding performance. The simplest possibility, which we used in our earlier work, is to use the percentage of correct decoding (Delis et al., 2013). A potential problem with using percent correct is that it may fail to capture all task-discriminating information even when using an optimal decoding algorithm. Synergy coefficients may convey information by means other than just reporting the most likely task given the muscle synergy activation pattern: for example, they can provide the information that some tasks are utterly unlikely based on the synergy activations (Quian Quiroga and Panzeri, 2009).

A way to capture these additional forms of information in our calculation is to use the mutual information *I(T*; *TP)* between the actual and the predicted tasks from the decoding outcomes (Shannon, 1948). The information *I(T*; *TP)* is a measure of the overall information about which task out of a set is gained by the prediction of the most likely task from the single-trial muscle synergy activations, and is defined as the mutual information between the rows and columns of the so-called "confusion matrix" (i.e., the matrix quantifying the probability of predicting a given task given the execution of a certain task):

$$I\left(T;T^{P}\right) = \sum\_{t,t^{P}} P\left(t,t^{P}\right) \log\_{2} \frac{P\left(t,t^{P}\right)}{P(t)P\left(t^{P}\right)}\tag{3}$$

where *t* is the motor task performed, *t<sup>p</sup>* is the one predicted by our decoding algorithm, *P(t)* is the probability of execution of task *t*, *P(t<sup>p</sup>, t)* is the confusion matrix, i.e., the joint probability of predicting task *t<sup>p</sup>* and executing task *t*, and *P(t<sup>p</sup>)* is the probability of predicting task *t<sup>p</sup>* across all tasks in the considered set. Such a measure of information *I(T*; *TP)* is not an absolute property of the synergy set, but depends on the specific set of tasks considered (because it measures discriminability between tasks belonging to that set). In this article we will compute information about three possible task sets: the set of all eight different reaching tasks; the set of all four center-out tasks; and the set of all four out-center tasks. All quantities are computed from Equation (3), summing over the appropriate set or subset of tasks.
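Equation (3) is straightforward to compute once the confusion matrix has been tallied from the decoding outcomes. The sketch below uses a plug-in estimate and omits the Panzeri-Treves bias correction applied in this article; the function name is ours.

```python
import numpy as np

def confusion_information(actual, predicted, n_tasks):
    """Mutual information (bits) between the rows and columns of the
    confusion matrix P(t, t^p), Eq. (3)."""
    joint = np.zeros((n_tasks, n_tasks))
    for t, tp in zip(actual, predicted):
        joint[t, tp] += 1
    joint /= joint.sum()                       # joint probability P(t, t^p)
    p_t = joint.sum(axis=1, keepdims=True)     # marginal P(t)
    p_tp = joint.sum(axis=0, keepdims=True)    # marginal P(t^p)
    nz = joint > 0                             # skip zero cells (0 log 0 = 0)
    return float(np.sum(joint[nz] * np.log2(joint[nz] / (p_t @ p_tp)[nz])))

# Perfect decoding of 8 equiprobable tasks yields log2(8) = 3 bits
labels = np.repeat(np.arange(8), 10)
print(confusion_information(labels, labels, 8))  # → 3.0
```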

It is useful to recall that information is measured in bits. Every bit of information reduces the overall uncertainty about the task by a factor of two. Perfect knowledge about the task from the synergy decomposition gives the maximum mutual information of log<sub>2</sub> *K*, where *K* is the number of tasks. For a given percent-correct value, more information can be obtained if incorrect predictions are concentrated into clusters around the correct task rather than distributed randomly (Panzeri et al., 1999b; Samengo, 2002). This information measure can reveal the effect of such systematic errors in the decoder and thus also capture this form of information that may be carried by muscle synergy activations. Information values were computed using the Information Breakdown Toolbox (Magri et al., 2009) available at www.infotoolbox.org. To eliminate the systematic bias from which information measures suffer when computed from small datasets, we used the Panzeri-Treves (PT) bias-correction method (Panzeri and Treves, 1996; Panzeri et al., 2007).

In Results, for simplicity we will abbreviate *I(T*; *TP)* simply as *I*.

# **SELECTING THE SET OF SYNERGIES THAT CARRY ALL TASK-DISCRIMINATING INFORMATION IN THE MUSCLE SYNERGY SPACE**

The number of synergies (*N*) used to perform a set of motor tasks, and their contribution to task-discriminating variations, are unknown a priori. To address this, we developed an automated procedure to select the minimal number of synergies that carry all information about task-to-task differences. This model selection technique is based on progressively evaluating the statistical significance of the task-discriminating information added when increasing the number of synergies in the decomposition model. After evaluating the information carried by *N* = 1 synergy, the number of synergies in the decomposition model is increased step by step, until adding a synergy no longer yields a statistically significant gain in information. The test of statistical significance was designed as follows. For a given value *N*, we compare the information carried by the synergy parameters when using the *N* synergies with the decoding performance of the parameters of all subsets consisting of *N* − 1 synergies plus the parameters of the *N*-th synergy pseudo-randomly permuted ("shuffled") across conditions. We repeat this shuffling procedure a number of times (100 in our implementation) to obtain a non-parametric distribution of decoding performance values under the null hypothesis that the additional synergy does not add to the information carried by the synergy decomposition. In the following, we evaluated this significance at the *p <* 0*.*05 threshold. The statistical threshold for a significant increase of decoding performance was graphically highlighted in the curves of information as a function of the number of synergies as a shaded area indicating the 95% confidence intervals constructed using this bootstrap procedure (**Figure 1C**). The selected number of synergies can be simply visualized as the smallest value of *N* for which information lies above the no-significance (shaded) area.
In this way, the chosen set of *N* synergies is the smallest decomposition that captures all available task-discriminating information within the synergy space.

*[Figure 1 caption, partially recovered: EMGs recorded during the execution of an arm pointing task. **(A)** The sets of synchronous synergies obtained from the experimental data of four subjects as vectors of activation levels of nine shoulder and elbow muscles. Decoding curves (blue, scale on the left *y*-axis) and information curves (red, scale on the right *y*-axis); the shaded area represents the 5–95% confidence interval of the bootstrap test for decoding (see Materials and Methods).]*

We verified, by generating realistically simulated EMG datasets [based on linear combinations of realistic synchronous and time-varying synergies with the addition of various kinds of physiologically-relevant noise—see Delis et al. (2013)], that the method needed only a small number of trials per task (10 or more) to identify correctly the set of synergies generating the data and to evaluate correctly their information content [data not shown, but see Delis et al. (2013) for similar evaluations of the robustness of decoding algorithms].

# **DEFINING AND QUANTIFYING SIGNAL AND NOISE CORRELATIONS AMONG SYNERGY ACTIVATIONS**

Before we present our methodology based on the metrics defined so far, we explain what we mean by correlations among activations of muscle synergies, how we can quantify these correlations and how they may impact on the information about the task. In general, we would like to distinguish between different kinds of correlations ("signal" and "noise"—see below) that, as shown in previous studies of neural population codes and reviewed below, are known to have different impacts on information about the task (Averbeck and Lee, 2006; Averbeck et al., 2006; Ince et al., 2010).

To separate out the contribution of task modulation and of variability not attributable to task-to-task differences, it is useful to characterize the activation of each synergy in each trial as "signal plus noise" (Gawne and Richmond, 1993; Panzeri et al., 1999a; Averbeck et al., 2006), where we refer to the trial-averaged synergy activation coefficients for each task as the "signal" and to the trial-by-trial fluctuations of the activation around its average across trials at fixed task as the "noise." We stress that such "noise" does not necessarily reflect only noise in the real sense, but comprises all types of variation at fixed task, which may well include various types of potentially important contributions such as modulations arising from variations of the movement kinematics across trials.

We performed a linear analysis of correlations across synergies of both the signal and the noise, as follows. The correlations, across different tasks, between the trial-averaged (over all trials of the same task) activation coefficients of two given synergies are called "signal correlations" (Gawne and Richmond, 1993; Panzeri et al., 1999a; Averbeck et al., 2006) because they are entirely attributable to task-to-task variations. The signal correlation coefficient was computed, for each synergy pair, as the Pearson correlation across tasks of the trial-averaged activations. Positive values indicate that the two synergies have similar task preferences, whereas a zero value indicates that the two synergies have radically different task tuning.

Correlations manifested as covariations of the trial-by-trial fluctuations around the mean response to the task are called "noise correlations" (Gawne and Richmond, 1993; Panzeri et al., 1999a; Averbeck et al., 2006). Since these noise covariations are measured at fixed task, they exclude all effects attributable to common task-to-task variations. To quantify the strength of noise correlations, we computed the Pearson correlation coefficient (across trials at fixed task) of the trial-average-subtracted synergy coefficients. This quantifies the correlations of the variations around the mean in each trial and task. Positive values of noise correlation mean that when the activation of one synergy fluctuates above its mean value, the activation of the other synergy is also likely to do so.

The division into signal and noise correlations is important because, as we shall see below, the two types have different impacts on the information about tasks.
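The signal/noise split described above can be sketched as follows. The data are synthetic, and pooling the noise residuals across tasks (rather than computing a per-task coefficient at fixed task, as in the article) is a simplification made for brevity.

```python
import numpy as np

def signal_noise_correlations(c1, c2, tasks):
    """c1, c2: single-trial activation coefficients of two synergies;
    tasks: task label of each trial. Returns (signal_r, noise_r)."""
    task_ids = np.unique(tasks)
    mean1 = np.array([c1[tasks == t].mean() for t in task_ids])
    mean2 = np.array([c2[tasks == t].mean() for t in task_ids])
    # Signal correlation: Pearson r across tasks of trial-averaged values
    signal_r = np.corrcoef(mean1, mean2)[0, 1]
    # Noise correlation: correlate trial-average-subtracted residuals
    # (pooled across tasks here rather than per task, for brevity)
    pos = np.searchsorted(task_ids, tasks)
    noise_r = np.corrcoef(c1 - mean1[pos], c2 - mean2[pos])[0, 1]
    return signal_r, noise_r

# Synthetic example: similar task tuning plus shared trial-to-trial noise
rng = np.random.default_rng(0)
tasks = np.repeat(np.arange(4), 50)
shared = rng.normal(0.0, 1.0, tasks.size)
c1 = 0.5 * tasks + shared + rng.normal(0.0, 0.2, tasks.size)
c2 = 1.0 * tasks + shared + rng.normal(0.0, 0.2, tasks.size)
sig, noi = signal_noise_correlations(c1, c2, tasks)  # both positive here
```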

# **MEASURING HOW CORRELATIONS BETWEEN SYNERGY ACTIVATIONS AFFECT TASK-DISCRIMINATING INFORMATION**

After having defined correlations, we proceed to present a methodology for characterizing how they affect the total task information carried by the set of synergies. In other words, we aim at comparing the information available in the set of synergies including the correlations between them [denoted *I*, and defined in Equation (3)] with the information that would be available if correlations were absent (Hatsopoulos et al., 1998; Nirenberg and Latham, 1998; Panzeri et al., 1999a, 2010; Golledge et al., 2003; Schneidman et al., 2003; Ince et al., 2010). The former is simply computed as described above on the original data, which consist of combinations of synergy coefficients simultaneously acquired in each trial and thus contain trial-to-trial correlations between synergy activation coefficients. The information in the absence of correlations can be denoted as *I*ind*(T*; *TP)*, the "ind" subscript indicating that it is built from data that are made to be distributed independently at fixed task. *I*ind*(T*; *TP)* can be computed again with the procedure described above, but applying it to combinations of synergy coefficients obtained after "shuffling" the data at fixed task, i.e., combining synergy values into non-simultaneous arrays each taken (randomly and without replacement) from different trials in which task *t* was performed (Ince et al., 2010; Panzeri et al., 2010). This shuffling preserves the marginal distributions of the activation of each synergy, and only changes the distribution of their joint observations. All subsequent decoding and information analyses were performed on the shuffled data, and the information results were then averaged over the outcomes of 50 independent random shufflings. Note that shuffling is done on both the test and the training data for the decoder; in this way the effect of correlations is removed both from the decision boundaries determined by the decoder (using the training data) and from the actual data to be decoded (test data).
In the following, for brevity we will denote *I*ind*(T*;*TP)* simply as *I*ind.
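The within-task shuffling described above can be sketched as follows: each synergy's coefficients are permuted independently across the trials of the same task, preserving marginal distributions while destroying trial-to-trial correlations. Array shapes are illustrative.

```python
import numpy as np

def shuffle_within_task(X, tasks, rng):
    """X: (trials x synergies) activation matrix. Returns a copy in
    which each column is independently permuted within each task,
    preserving marginals but destroying joint (noise) correlations."""
    Xs = X.copy()
    for t in np.unique(tasks):
        idx = np.where(tasks == t)[0]
        for j in range(X.shape[1]):
            Xs[idx, j] = X[rng.permutation(idx), j]
    return Xs

rng = np.random.default_rng(0)
X = rng.random((20, 3))             # 20 trials, 3 synergies (synthetic)
tasks = np.repeat([0, 1], 10)
X_shuffled = shuffle_within_task(X, tasks, rng)
```

The correlation-free information is then obtained by running the same decoding and information pipeline on the shuffled data (both training and test sets), averaged over many independent shufflings.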

From the theoretical point of view, correlations between synergy coefficients can either increase (i.e., *I > I*ind) or decrease (i.e., *I < I*ind) information with respect to the case in which their marginal distributions are the same but there is no correlation (Pola et al., 2003). If noise correlations increase the information carried by the muscle synergies, one may speculate that correlations between synergy activations are useful in describing the salient task-to-task variations of muscle activation patterns. Whether correlations increase or decrease the information depends on several factors (Oram et al., 1998; Panzeri et al., 1999a; Averbeck et al., 2006). The first is the task modulation of the strength of noise correlations: strongly task-modulated correlations tend to increase the information, because their task-to-task modulation tends to further pull apart the task-conditional distributions of joint synergy activations (Panzeri et al., 1999a; Pola et al., 2003). The second is the interplay between the signs of signal and noise correlations. If signal and noise correlations have opposite signs, noise correlations increase task discriminability compared to what it would be if noise correlations were zero, because in this case they tend to pull apart the task-specific joint distributions (Oram et al., 1998; Abbott and Dayan, 1999). If, instead, noise and signal correlations have the same sign, then noise correlations decrease information, and tasks are less discriminable than in the zero noise correlation case. For an intuitive illustration of these effects, see **Figure 2**.

### **GENERATION OF SIMULATED EMG DATA**

To investigate whether the synergy extraction algorithm used (NMF) affects the correlational structure of the identified synergies, we tested it on EMG data generated from muscle synergies with known task-dependent correlations. To this end, we simulated EMG data from synchronous synergies using the model introduced in Delis et al. (2013). Briefly, the data simulated the activation of 10 muscles used for executing 50 repetitions of each of two motor tasks (T1 and T2). To execute each motor task, the first synergy was activated by a scalar coefficient drawn from a uniform distribution in the [0,1] interval. The activation coefficients of the second synergy were correlated with those of the first synergy, with correlation coefficients *r*<sup>1</sup> for task T1 and *r*<sup>2</sup> for task T2, respectively. We varied the levels and signs of *r*<sup>1</sup> and

**FIGURE 2 | The effect of the interplay of signal and noise correlations between synergies on task information.** Each panel sketches joint distributions of the activations of two hypothetical synergies during two different tasks (data for tasks one and two are plotted in orange and green, respectively). The dots represent a hypothetical scatterplot of single-trial activations for the given task, and each ellipse denotes 95% confidence limits. In the upper panels, there is positive signal correlation (i.e., the individual synergy activations to each task are positively correlated), whereas in the lower panels there is negative signal correlation. Positive noise correlations correspond to ellipses aligned along the diagonal. The more the ellipses are elongated, the stronger the noise correlation. The sign of the noise correlations between the joint responses differs across the columns of this figure (noise correlation is positive in the left column and negative in the right column). In this figure, noise correlations are task independent, i.e., equally strong across tasks (all the ellipses within a panel have the same elongation). In general, if noise and signal correlations have opposite signs, the effect of correlations increases the information about tasks, because the joint response probabilities for each task become more separated. If instead noise and signal correlations have the same sign, tasks are less discriminable.

*r*<sup>2</sup> as shown in **Table 1** to generate four different datasets and ran the synergy extraction 50 times for each dataset.
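One simple way to draw pairs of activation coefficients with a target Pearson correlation *r* per task is linear mixing, sketched below. The article does not fully specify its generative scheme, so this construction (and the function name) is our assumption.

```python
import numpy as np

def correlated_activations(r, n_trials, rng):
    """Draw two activation series with Pearson correlation ~= r.
    c1 is uniform in [0, 1]; c2 mixes c1 with independent uniform
    noise of equal variance, so corr(c1, c2) = r in expectation.
    (For negative r, c2 may need shifting to stay non-negative.)"""
    c1 = rng.random(n_trials)
    noise = rng.random(n_trials)
    c2 = r * c1 + np.sqrt(1.0 - r**2) * noise
    return c1, c2

rng = np.random.default_rng(0)
c1, c2 = correlated_activations(0.8, 5000, rng)   # 50 trials suffice in the article
```

Repeating this draw per task with task-specific values of *r* yields datasets analogous to the four simulated conditions summarized in **Table 1**.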

# **EXPERIMENTAL PROCEDURES**

To test the methodology presented above and gain more insight into its potential usefulness, we applied it to physiological EMG data recorded in an experiment that has been presented before (Delis et al., 2013). In brief, the experimental dataset used throughout the Results section was composed of the EMG activity recorded from nine upper body and arm muscles during the execution of arm pointing movements in the horizontal plane (see Delis et al., 2013 for a detailed description). Four participants were asked to perform center-out (forward, denoted by fwd) and out-center (backward, denoted by bwd) one-shot point-to-point movements between a central location (P0) and four peripheral locations (P1-P2-P3-P4) evenly spaced along a circle. In total, the experimental protocol specified 4 targets × 2 directions = 8 distinct motor tasks, denoted by T1, T2, . . . , T8. Each task was composed of 40 trials. Such a relatively high number of repetitions of each task was useful for evaluating the impact of trial-to-trial variability on the combination of muscle activation patterns.

Body kinematics were recorded by means of a Vicon (Oxford, UK) motion capture system at a sampling rate of 100 Hz. The kinematic data were low-pass filtered (Butterworth filter, cut-off frequency of 20 Hz) and numerically differentiated to compute tangential velocity and acceleration. For each movement, we measured movement onset time, movement end time, maximum speed, maximum acceleration, and their times of occurrence. Movement onset and movement end were identified as the times at which the velocity profile of the fingertip exceeded 5% of its maximum. Subjects performed all motor tasks at a variety of speeds ranging from normal to very fast. For example, movement durations for subject AK ranged from 182 ms to 651 ms. Across subjects, the mean movement duration varied from 370 ms to 560 ms.
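The 5% threshold rule can be sketched as follows (a hypothetical bell-shaped speed profile stands in for the filtered, differentiated Vicon data; function and variable names are illustrative):

```python
import numpy as np

def movement_bounds(speed, dt=0.01, frac=0.05):
    """Movement onset/end: first and last samples at which the tangential
    speed profile exceeds `frac` (here 5%) of its maximum. dt = 0.01 s
    matches the 100 Hz sampling rate used in the experiment."""
    above = np.flatnonzero(speed >= frac * speed.max())
    return above[0] * dt, above[-1] * dt

# Hypothetical Gaussian speed profile peaking at t = 0.5 s
t = np.arange(0.0, 1.0, 0.01)
speed = np.exp(-((t - 0.5) ** 2) / (2 * 0.1 ** 2))
onset, end = movement_bounds(speed)
```

For this profile the 5% crossings fall symmetrically around the speed peak, giving a movement duration of roughly half a second.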

To identify muscle synergies from these data, we used the time course of EMG activity of all recorded muscles in all individual trials for each task.

# **RESULTS**

In this section, we illustrate the application of our method to muscle synergies identified from the EMG data recorded during the performance of point-to-point reaching movements in different directions in the horizontal plane. We first present extensive results of our analysis applied to the synchronous synergies of all subjects and then refer briefly to its application to the time-varying synergies.

**Table 1 | Correlation coefficients of the original (left columns) and the reconstructed (right columns) activations of the two simulated synergies for tasks T1 (***r***1) and T2 (***r***2).**

# **IMPACT OF THE SYNERGY EXTRACTION ALGORITHM ON CORRELATIONS**

Before applying our methodology to experimental data, we tested the ability of the NMF algorithm to identify correctly the correlational structure in the synergy space. Thus, we simulated EMG data from synergies with known correlations (see Materials and Methods) and checked whether the output of the NMF algorithm represented reliably the correlations that were present in the underlying synergies. An illustration of the impact of synergy extraction on the correlations between synergy activations is shown in **Figure 3**. In this case, the original activations of the two synergies are highly positively correlated for both tasks considered (**Figure 3A**). The NMF algorithm identified correctly the two synergies as well as the positive noise correlations between their activation coefficients (**Figure 3B**). However, the strength of noise correlations was slightly underestimated by NMF.

We then investigated what happened when simulating data with different levels of either positive or negative correlations between the synergy activations. The results are summarized in **Table 1**. In all cases, the NMF algorithm recovered the original synergies and the sign and strength of the signal and noise correlations between the activation coefficients. The signal correlations were reconstructed accurately in all cases (*p* > 0.05, *t*-test), whereas the reconstructed noise correlations were in general slightly but significantly (*p* < 0.05, *t*-test) lower than those of the original activations (see **Table 1**). In sum, the application of NMF to our simulated data estimated correctly the signal correlation and underestimated slightly the noise correlation among the activation coefficients of the extracted synergies.
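This simulation-based check could be sketched as follows (parameter values are assumed for illustration, and scikit-learn's `NMF` stands in for the extraction algorithm; the study's exact simulation may differ):

```python
import numpy as np
from sklearn.decomposition import NMF

# Build EMG from two known non-negative synergies with positively
# correlated activations, factorize with NMF, and check the factorization.
rng = np.random.default_rng(1)
W_true = np.abs(rng.normal(size=(9, 2)))                  # 9 muscles, 2 synergies
cov = [[1.0, 0.8], [0.8, 1.0]]                            # positive correlation
C_true = np.abs(rng.multivariate_normal([3, 3], cov, size=200))  # 200 trials
V = C_true @ W_true.T                                     # simulated EMG (trials x muscles)

model = NMF(n_components=2, init="nndsvda", max_iter=2000, random_state=0)
C_rec = model.fit_transform(V)                            # recovered activations
rel_err = np.linalg.norm(V - C_rec @ model.components_) / np.linalg.norm(V)

r_true = np.corrcoef(C_true.T)[0, 1]
r_rec = np.corrcoef(C_rec.T)[0, 1]   # compare with r_true, as in Table 1
```

Comparing `r_rec` with `r_true` across repeated runs and correlation levels is the analog of the left/right columns of **Table 1**.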

### **MUSCLE SYNERGY IDENTIFICATION AND TASK MODULATION**

We extracted synchronous muscle synergies from the recorded and pre-processed EMG data and used our information-based methodology to identify the smallest synergy sets that explain, for each subject, all task-discriminating information in the synergy space. In brief, the method first computes the information about the task carried by each synergy; it then computes the information carried by a set of synergies, starting with the most informative ones and adding further synergies until they stop providing additional information. Since this selection is done using simultaneous (non-shuffled) data, this set of synergies contains all information about the task that can be extracted from the EMG data, including the information that is transmitted by correlations among synergy activations. Considering only synergies belonging to this set (rather than, for example, including other synergies that explain some variance in the data but do not add any task discriminability) ensures that we can work with a compact set that nonetheless contains all variables relevant for task discrimination.
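A minimal sketch of this greedy, information-based selection, assuming cross-validated decoding accuracy as a stand-in for the Shannon-information criterion (variable names and data are hypothetical):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

def select_informative_synergies(H, tasks, tol=0.01, cv=5):
    """Greedy forward selection: start from the most task-informative synergy
    and keep adding synergies while decoding performance (cross-validated
    accuracy, used here as a proxy for task information) improves by > tol."""
    remaining = list(range(H.shape[1]))
    chosen, best = [], 0.0
    while remaining:
        score, j = max((cross_val_score(LinearDiscriminantAnalysis(),
                                        H[:, chosen + [k]], tasks, cv=cv).mean(), k)
                       for k in remaining)
        if score - best <= tol:
            break                       # no additional task information
        chosen.append(j)
        remaining.remove(j)
        best = score
    return chosen, best

# Hypothetical activations: only synergy 0 is task-modulated
rng = np.random.default_rng(2)
tasks = np.repeat(np.arange(4), 50)
H = rng.normal(size=(200, 4))
H[:, 0] += 3.0 * tasks
chosen, acc = select_informative_synergies(H, tasks)
```

In this toy case only the task-modulated synergy should be selected first; uninformative synergies fail the improvement test and are left out.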

Results of this synergy selection using this information criterion were very similar to the synergy selection obtained in a recent study (Delis et al., 2013) on the same dataset using a slightly different and slightly less powerful criterion (percentage of correct decoding, rather than Shannon information, as the task-discriminability metric in the above procedure). We will therefore present the selected synergy sets briefly and refer to Delis et al. (2013) for more details. We found that our method selected four synergies for two subjects (AK, ES) and three for the other two (AM, AB) (**Figure 1C**). These synergies not only carried all information about differences across tasks but also explained a high percentage of the variance of the recorded EMG data (see VAF curves in **Figure 1C**). Most of the extracted synergies were highly similar across subjects and each one had a distinct functional role in movement execution (**Figure 1A**). S1 activated mainly muscles flexing the shoulder, S2 consisted mainly of muscles extending the elbow, S3 had high activations for elbow flexors, and S4 activated highly the shoulder extensors.

**FIGURE 3 | Impact of synergy extraction on the correlations between synergy activations. (A)** Scatterplot of the original activations of the two simulated synergies for two tasks (T1 and T2). **(B)** Scatterplot of the reconstructed activations; the *r* values report correlation coefficients at fixed task. Noise correlations are slightly weaker after application of NMF to the data.

The results were relatively robust across subjects: three of the four synergies (S1, S3, S4) were identified in all subjects and the fourth (S2) was present in two of them (AK, ES). In the other two subjects, the elbow extensors were activated by other synergies. By examining the trial-averaged synergy activations (**Figure 1B**), we observe that all synergies were shared across the eight motor tasks but their activation levels differed, which led to a high task-discrimination power in the muscle synergy space. Task modulations of individual synergies were robust across subjects, indicating a consistent mapping between synergy activations and task identity.

# **THE INTERPLAY OF SIGNAL AND NOISE CORRELATIONS AND ITS EFFECT ON TASK-DISCRIMINATING INFORMATION**

The previous section considered the tuning of individual synergy coefficients to the task. We next wanted to investigate the nature of the joint (rather than the marginal) distributions of synergy coefficients across trials. In other words, we asked whether the activation of any given synergy depended only on the task or also on the particular level of activation of other synergies in the same trial. We also asked whether single-trial correlations between synergy activations may affect task discriminability.

To gain more insights into trial-to-trial variations of synergy activations, we first illustrated scatter plots of the integrals of the single-trial activation coefficients for one pair of synchronous synergies (S1–S3). In **Figure 4A**, different tasks are color-coded to show the distribution of synergy activations at fixed task. Colored straight lines indicate the principal axis of dependence and the reported values correspond to the correlation coefficient for each task. Positive noise correlations are reflected in elliptic distributions aligned along the diagonal. The more the ellipses are elongated, the stronger the noise correlation.

The strength of noise correlations between synergy activations varied significantly depending on the task. For example, activations of the S1–S3 synergy pair were strongly positively correlated for the out-center tasks (T5-T6-T7-T8), but they were almost uncorrelated for the center-out tasks T1-T2-T3-T4 (**Figure 4A**). This finding was robust across all four subjects. Thus, these two synergies were differentially coupled across tasks, suggesting that they constitute a functional pair whose single-trial interactions are relevant for the performance of a subset of motor tasks. Theoretical studies demonstrate that the modulation of correlations across different experimental correlates can only increase the information carried about the external correlates, with respect to the case of no correlation or non-modulated correlations (Panzeri et al., 1999a; Pola et al., 2003). These theoretical considerations suggest that the observed task modulation of noise correlations of synergy pairs increases task discriminability.

**FIGURE 4 | Signal and noise correlations between the activations of synergies S1–S3.** The two straight lines in **(B)** are the best-fit lines for the distribution of points for the center-out and out-center tasks, respectively. Conventions are the same as in **(A)**. **(C)** Comparison of the information carried by the activations of synergies S1–S3 when taking into account correlations (*I*) and when ignoring them (*I*ind). The scatterplots show *I* vs. *I*ind across subjects; the 45°-slope line is plotted for comparison. Left: information about the center-out tasks. Middle: information about the out-center tasks. Right: information about all tasks.

To evaluate further the impact that noise correlations may have on the information carried by this pair of synergies, we also examined their signal correlation. As discussed in Materials and Methods, signal correlations of sign opposite to that of noise correlations are best suited to increase task information (**Figure 2**). We considered separately signal correlations among out-center and among center-out tasks, as they seemed to give a different pattern of correlations. The joint distribution of the trial-averaged activations of synergies S1 and S3 across out-center tasks shows that signal correlation was robustly negative across subjects (**Figure 4B**). In other words, out-center tasks that elicited on average a higher activation of synergy S1 tended to elicit a lower average activation of synergy S3 (**Figure 4B**). Taking into account that noise correlations for out-center tasks were consistently positive, we expected a significant increase of information about out-center tasks due to correlation. We verified this by comparing the amount of information *I* about out-center tasks that can be obtained by observing S1 and S3 simultaneously to the information *I*ind obtained from data shuffled by pairing S1 and S3 activations from randomly selected trials of the same task. We recall that, while *I* contains all information carried by synergy coefficients, including the effect of their correlations, *I*ind is computed from data manipulated to preserve the same marginal distributions of each synergy but without any correlation. Indeed, we found (**Figure 4C** middle) that the information carried by S1 and S3 about out-center tasks was, for each subject, significantly higher (22.5 ± 1.5%, paired *t*-test, *p* < 0.01) than the information *I*ind that neglected the effect of correlations. This suggests that correlations among such synergies enhance the discriminability of out-center tasks.
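The comparison of *I* with *I*ind can be sketched as follows, using a decoding-based information estimate (a QDA confusion matrix, as in Figure 5) and within-task shuffling to destroy noise correlations; the data and parameter values here are hypothetical:

```python
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.model_selection import cross_val_predict

def task_info_bits(H, tasks, cv=5):
    """Task information (bits): mutual information between true and decoded
    labels, estimated from the cross-validated QDA confusion matrix."""
    pred = cross_val_predict(QuadraticDiscriminantAnalysis(), H, tasks, cv=cv)
    labels = np.unique(tasks)
    joint = np.array([[np.mean((tasks == a) & (pred == b)) for b in labels]
                      for a in labels])
    pa = joint.sum(axis=1, keepdims=True)
    pb = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log2(joint[nz] / (pa @ pb)[nz])).sum())

def shuffle_within_task(H, tasks, rng):
    """Destroy noise correlations: independently permute each synergy's
    trials within every task (marginals at fixed task are preserved)."""
    Hs = H.copy()
    for t in np.unique(tasks):
        idx = np.flatnonzero(tasks == t)
        for j in range(H.shape[1]):
            Hs[idx, j] = H[rng.permutation(idx), j]
    return Hs

# Hypothetical pair with negative signal and positive noise correlation
rng = np.random.default_rng(5)
tasks = np.repeat([0, 1], 200)
shared = rng.normal(0, 1, 400)
H = np.column_stack([np.where(tasks == 0, 2.0, 4.0) + shared + rng.normal(0, .3, 400),
                     np.where(tasks == 0, 4.0, 2.0) + shared + rng.normal(0, .3, 400)])
I_full = task_info_bits(H, tasks)                           # with correlations
I_ind = task_info_bits(shuffle_within_task(H, tasks, rng), tasks)  # shuffled
```

With opposite-sign signal and noise correlations, as here, `I_full` exceeds `I_ind`: shuffling rounds the elliptic distributions and increases overlap between tasks.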

We then considered signal and noise correlations of S1 and S3 activations among center-out tasks. For the center-out tasks, the sign and strength of signal correlations varied across subjects, being mostly positive and generally lower in magnitude than the signal correlations among out-center tasks. Hence, we expected the effect of correlations between S1 and S3 on information about center-out tasks to be lower than the effect on out-center tasks. Indeed, by computing the information carried by these two synergies when correlations are taken into account and comparing it with that of the "independent" model (i.e., when we remove noise correlations by shuffling), we found that correlations did not contribute much to the discrimination of the center-out tasks (**Figure 4C** left). This means that noise correlations between S1 and S3 have a positive effect on the identification of the out-center tasks, but essentially no effect for the center-out ones. Over all tasks, there was a significant increase of 4 ± 0.5% (paired *t*-test, *p* < 0.05) in the information carried by the activations of S1 and S3 due to the correlation between them (**Figure 4C** right).

To gain more insight into the impact of correlations on task discrimination, we illustrated graphically the decoding procedure for these two synergies and the four out-center tasks using data from one subject (AB). **Figures 5A,B** show the integrals of the activations of S1 and S3 when correlations were present (**Figure 5A**) and when they were removed by shuffling (**Figure 5B**). In both panels, the QDA determined the decision boundaries for classifying the trials to the motor task performed and, as a result, separated the 2-dimensional space into four regions, one for each task. Each point is assigned to the task represented by the colored region on which it lies. The presence of noise correlations resulted in elliptic distributions of the synergy activations at fixed task, which were identified well by the QDA algorithm (**Figure 5A**).

**FIGURE 5 | Illustration of the effect of removing noise correlations on decision boundaries and joint synergy activation distributions in task discrimination.** Decoding of the four out-center motor tasks (T5-T6-T7-T8) using the S1–S3 activation coefficients, taking into account correlations **(A)** or after removing them **(B)**. For a given trial to be decoded, the activation coefficients of the synergies are represented as a point in the 2-dimensional space. The color of each point indicates the actual task to which the trial corresponds. The quadratic discriminant algorithm has divided the space into four regions, one for each motor task. The trial is assigned to the task indicated by the color of the region on which the point lies.

Thus, 57% of the trials were correctly decoded and 0.52 bits of information about the task were carried by the co-activation of these two synergies. By contrast, after eliminating correlations, the distributions of data points were more circular, which led to a greater overlap between different tasks and, as a result, to decreased decoding performance (52%) and information (0.35 bits), as shown in **Figure 5B**.

Following this, we investigated how these patterns of signal and noise correlations generalized to other synergy pairs. We found (**Figure 6**) that the synergy pair S2–S4 exhibited a pattern of positive noise and negative signal correlations for the center-out tasks (but not for the out-center tasks). This finding held for both subjects (AK, ES) for whom we detected four synergies describing all the task-related information. We quantified the information gain also in this case and found that information about the task increased by 17% on average when correlations were kept in the analysis.

For all other synergy pairs, noise correlations were mostly weak (*r <* 0*.*35) and non-significant. The few cases of strong noise correlations were not robustly found across subjects and/or tasks. Regarding signal correlations, activations of synergies S3–S4 were robustly negatively correlated across all eight tasks considered (*r* = −0*.*80 on average) (as can also be observed by their task-tuning curves in **Figure 2**) but did not show strong noise correlations at fixed task (*r* = 0*.*14 on average).

Finally, we evaluated quantitatively the contribution of correlations between all synergies explaining task-to-task differences for the four recorded datasets. When considering only the center-out tasks, noise correlations had a slightly detrimental effect on task-discriminating information for the two subjects that used three synergies to perform the tasks, but contributed positively for the other two subjects (**Figure 7A**). Across subjects, the average information increase when considering correlations was 11 ± 5.5%, but the increase was not significant at the population level (paired *t*-test, *p* > 0.1). For out-center tasks, there was a significant information gain of 11 ± 2.5% (paired *t*-test, *p* < 0.1) when including correlations across all four subjects (**Figure 7B**). We should also note that task discrimination of out-center tasks was poorer than that of center-out tasks for all datasets. Thus, the presence of noise correlations could play a role in improving task discriminability in cases where confusions between tasks are more likely. Overall, the presence of noise correlations resulted in a significant increase of 9 ± 1.5% (paired *t*-test, *p* < 0.05) in the total task-discriminating information present in the muscle synergy decompositions (**Figure 7C**).

**FIGURE 6 | Signal and noise correlations between synergies S2–S4 and their effect on information about the task. (A)** Scatterplots of the integral of the single-trial activations of synergies S2–S4 across the eight tasks. The <sup>∗</sup> denotes statistical significance at *p* < 0.05. Conventions are the same as in **Figure 4**.

These findings suggest that single-trial correlations between muscle synergy activations increase task-related differences in muscle activation patterns.

### **DEPENDENCE OF SYNERGY CO-ACTIVATIONS ON MOVEMENT SPEED**

We next asked whether noise correlations among synergy activations may in part reflect trial-to-trial covariations of a movement parameter. To answer this, we analyzed the relationship of synergy activations with single-trial kinematic parameters of the movements, such as movement duration, maximum speed, and maximum acceleration. Interestingly, our correlation analysis revealed a significant dependence of the activations of synergies S1–S3 on the speed with which the movement was performed for all out-center tasks. **Figure 8** reports scatterplots of the activations of each of the two synergies at fixed task with respect to maximum movement speed for one subject. These results generalize across all subjects: significant activation-speed correlations were found in 10 out of 16 cases (4 out-center tasks × 4 subjects) for synergy S1 and 11 out of 16 for synergy S3 (*p* < 0.05). Similarly, activations of synergies S2–S4 were correlated with movement speed for the four center-out tasks for both subjects that used S2 (5/8 and 6/8 with significant correlations, *p* < 0.05). Therefore, the positive synergy correlations at fixed task reflect to a large extent the variability in the speed with which the task was executed.
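The per-task activation-speed correlation analysis could be sketched as follows (hypothetical data; the *r* and *p* values are analogous to those reported in Figure 8):

```python
import numpy as np
from scipy import stats

def activation_speed_corr(act, speed, tasks):
    """Per-task Pearson correlation between a synergy's single-trial
    activation and the maximum movement speed of the same trial."""
    return {t: stats.pearsonr(act[tasks == t], speed[tasks == t])
            for t in np.unique(tasks)}

# Hypothetical data: activation scales with trial speed within each task
rng = np.random.default_rng(3)
tasks = np.repeat([0, 1], 100)
speed = rng.uniform(0.5, 2.0, 200)                  # trial maximum speed (a.u.)
act = 1.0 + tasks + 0.8 * speed + rng.normal(0, 0.2, 200)
res = activation_speed_corr(act, speed, tasks)
r0, p0 = res[0]                                     # r and p for task 0
```

Because activation depends on speed within each task here, the per-task correlation is strongly positive and significant, mirroring the pattern reported in the text.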

### **EXTENSION OF RESULTS TO TIME-VARYING SYNERGIES**

Although we have so far illustrated our methodology and discussed our findings only within the synchronous synergy framework, the method's applicability readily extends to time-varying synergies. Here we illustrate this extension by decomposing the same dataset into time-varying rather than synchronous synergies.
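A minimal sketch of the time-varying synergy generative model, in which each synergy contributes an amplitude and a time shift per trial (the dimensions and values below are hypothetical, and the extraction algorithm itself is omitted):

```python
import numpy as np

def compose_emg(synergies, coefs, shifts):
    """Time-varying synergy model: a trial's EMG is the sum of fixed
    spatiotemporal patterns w_i(t), each governed by just two single-trial
    parameters: an amplitude c_i and an onset shift t_i (in samples)."""
    n_syn, T, M = synergies.shape
    emg = np.zeros((T, M))
    for w, c, s in zip(synergies, coefs, shifts):
        emg[s:, :] += c * w[:T - s, :]     # scaled pattern, delayed by s samples
    return emg

# Hypothetical example: 2 synergies, 50 time samples, 9 muscles
rng = np.random.default_rng(4)
syns = np.abs(rng.normal(size=(2, 50, 9)))
emg = compose_emg(syns, coefs=[1.5, 0.7], shifts=[0, 10])
```

The compactness noted in the Discussion is visible here: a whole trial of 50 × 9 EMG samples is parameterized by only two numbers per synergy.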

We found that our synergy selection method identified the same number of time-varying synergies as synchronous synergies for all four datasets tested. **Figure 9A** shows the four time-varying synergies identified from one subject's dataset. Similar task modulations of average synergy activations were present also for this model (**Figure 9B**). Also in this case, all synergies participated in the execution of the eight motor tasks, varying their activation levels in a task-dependent manner. However, when examining the impact of trial-to-trial correlations between synergies, we found weak noise correlations among the scaling coefficients activating the time-varying synergies. An illustrative example of the strength of noise correlations between time-varying synergy activations is shown in **Figures 9C,D**. To compare with the synchronous synergy case presented above, we depict scatterplots for the synergy pairs S1–S3 and S2–S4. In this case, noise correlations are non-significant for almost all tasks. Therefore, these correlations had no impact on task discrimination (they did not increase task-discriminating information, **Figure 7F**). In particular, correlations increased information about the task in only one of the four datasets for both center-out and out-center tasks, whereas they reduced or did not affect information for the other three datasets (see **Figures 7D–F**). This finding indicates that correlated activations of time-varying synergies contribute less to task identification than the modulations of individual synergies.

**FIGURE 8 |** Scatterplots of the single-trial activations of synergy S1 (blue) and S3 (green) with respect to the maximum movement speed in each trial at fixed task. The *r* values report correlation coefficients and the <sup>∗</sup> denotes statistical significance at *p* < 0.05.

# **DISCUSSION**

In this article, we extended a previously-developed single-trial task decoding formalism (Delis et al., 2013) to derive a novel methodology that evaluates quantitatively the impact of trial-to-trial correlations among muscle synergy activations on the task-to-task differences in patterns of muscle activation. Our aim was to suggest and test methodological ideas to answer an important question concerning task information coding in the CNS, and more precisely in the context of neuromuscular synergies: do correlations between synergy activations play a role in discriminating motor tasks?

### **METHODOLOGICAL DEVELOPMENTS**

The methodological approach that we suggested derives from the literature on neural population coding, in which the role of trial-to-trial correlations among groups of simultaneously active neurons has been intensively investigated (Averbeck and Lee, 2006; Averbeck et al., 2006; Ecker et al., 2011). These approaches have been useful both to clarify the conditions in which correlations may increase or decrease information (Oram et al., 1998; Panzeri et al., 1999a; Schneidman et al., 2003; Latham and Nirenberg, 2005; Averbeck et al., 2006; Oizumi et al., 2010) and to indicate the potentially large impact of correlations on the accuracy of population codes, especially when considering large numbers of cells (Zohary et al., 1994; Salinas and Sejnowski, 2001; Averbeck et al., 2006; Schneidman et al., 2006; Pillow et al., 2008; Ince et al., 2010; Oizumi et al., 2011). This has led several authors to propose that the ability of cortical circuits to modify, regulate or tune their correlations is an important feature of cortical functional organization (Salinas and Sejnowski, 2001; Ecker et al., 2010; Kumar et al., 2010; Renart et al., 2010).

In this article we discussed how to adapt this approach to EMG recordings. From the methodological point of view, we needed to make some progress before we could adapt previous neural approaches to EMG recordings. First, we needed a method to select correctly the number of synergies to be used, which is one of the free parameters of the analysis. Here, we took the approach of selecting this number so as to yield the smallest possible set of variables that carries all task-discriminating information. Second, the sampling issues in computing information are more severe in EMG experiments than in most experiments used to study neural coding in the peripheral system or in anaesthetized animals, and synergy activations are analog variables rather than binary ones (as for spiking neurons). Here we addressed these problems by using an intermediate "decoding" step to project the relatively high-dimensional space of all synergy activations onto the task space, and we computed the effect of correlations using data manipulations to destroy them rather than computing their effect with analytical techniques (Panzeri et al., 1999a; Pola et al., 2003). The approach was overall robust and allowed us to study, with realistic datasets, the effect of correlations on the task information in decompositions of arrays of up to nine muscles. We feel that the coupling of NMF procedures with information algorithms presented here could be a valuable tool not only for the evaluation of muscle synergies, but also for the analysis of large-scale simultaneous recordings of neural activity, as it combines two powerful data compression procedures to describe compactly the information content of large datasets. As such, it could further our understanding of the functional role of correlations among groups of neurons. Here, we used this approach to quantify, using EMGs recorded in a simple reaching task, the amount of task-discriminating information carried by trial-to-trial correlations among synergy activations.

**FIGURE 7 |** Histograms are plotted as means ± SDs across all motor tasks performed.

We found that, overall, the presence of correlations between the integrated activations of synchronous synergies enhanced information significantly. Correlations among activations of synchronous synergies increased information about all tasks by 9%, and increased information about out-center reaching tasks by approximately 15%. While these gains may seem relatively small, we must take into account that the effect of correlations might have been negative, and that this was a simple task involving only a small set of synergies. Previously documented scaling of the impact of correlations on information with the number of elements in the array suggests that more complex tasks involving larger numbers of synergies may show larger correlation-driven increases of information. All in all, these preliminary results suggest that the study of the impact of correlations among synergies may help future experiments to better understand how neural drives need to interact with each other to optimize motor strategies.

# **ROBUSTNESS OF SYNERGY ACTIVATION CORRELATIONS TO FACTORIZATION ERRORS**

The applicability of the proposed methodology for evaluating the role of correlations in synergy activations relies on the ability of the synergy extraction algorithm (NMF here) to correctly estimate the correlational structure of the underlying muscle synergy activations. We evaluated this issue using simulated data with known correlations between synergy activations and found that the output of the NMF algorithm recovers correctly the sign and strength of signal correlations and underestimates only slightly the strength of noise correlations. It is interesting to consider how these small errors in correlation estimation introduced by NMF affect the estimated impact of correlations on task-discriminating information. In the EMG dataset that we analyzed, the increase in task-discriminating information due to correlation resulted from the combination of negative signal and positive noise correlations. Hence, the algorithm's artifact (a slight reduction in noise correlations only) would slightly reduce the estimated effect of correlations on task-discriminating information with respect to the true value. Thus, the conclusion that correlations between synergy activation coefficients increase task-discriminating information seems a genuine property of the data that cannot be attributed to artifacts in correlation estimation due to the NMF decomposition.

### **POTENTIAL ORIGINS OF NOISE CORRELATIONS**

It is tempting to speculate that noise correlations may emerge as a result of mechanisms implemented by the CNS to guarantee reliable movement reproducibility. For example, correlations among synergies may coordinate their relative activation levels in order to stabilize limb movement against the detrimental effect of motor noise, which leads to trial-by-trial variability in the neural motor commands. A possible way to achieve this is by using positive noise correlations to regulate the level of muscle co-contraction during movement execution. Put simply, when two muscle groups increase their activation simultaneously, many muscles are highly activated, which increases the stiffness of the moving limb and enhances movement stability. An example of such a case is given in **Figure 4A**. Synergies S1 and S3 comprise shoulder flexors and elbow flexors, respectively. Their positive noise correlations during performance of tasks T5-T6-T7-T8 (combined with the negative signal correlation) result in the co-contraction of these muscles, which enhances task discrimination. As these two groups of muscles act on different joints, their positive noise correlations may suggest a cross-joint coupling of synergy activations to achieve specific task goals. In other words, such anatomical muscle groups may be coupled together to form new functional muscle synergies for some subsets of tasks. Alternatively, because of the proximity of these muscle groups, these positive correlations might result from crosstalk artifacts. However, the presence of crosstalk would imply strongly positive correlations for all tasks performed, which is not the case here. Instead, the different levels of muscle co-contraction may serve as a "tag" for the target reached in each trial. These differences might be explained by the inertial properties of the arm, for which some movement directions require less muscle effort than others (Gordon et al., 1994).

We also showed that a large part of the synergy co-variations at fixed task explained the variability in the speed with which different trials were performed. Since the task (pointing to a target) did not impose movement speed, the variability in the muscle synergy activations that captures trial-to-trial speed variations can be considered irrelevant for the set of tasks considered (Scholz and Schoner, 1999). Hence, a large part of the variability in the muscle synergy space may correspond to these redundant (task-irrelevant) dimensions in task space (Todorov and Jordan, 2002; Todorov et al., 2005). This interpretation relates closely to the "uncontrolled manifold" (UCM) concept (Scholz and Schoner, 1999) and its application to muscle synergies (Krishnamoorthy et al., 2003). In broad agreement with the UCM hypothesis, our results show that the noise correlation between synergy activations accounts in part for the trial-to-trial variability of a task-irrelevant parameter, such as movement speed in our experiment. As such, our methodology may contribute to answering questions related to the identification of hypothesized task performance variables. Whereas applying the UCM concept to EMG data is not straightforward, because partitioning the synergy-space variance into components that do and do not affect the task variable is tricky [in Krishnamoorthy et al. (2003), the method relies on the computation of the Jacobian of the mapping from synergy activations to task variables and its null-space], our method provides a principled and robust information-theoretic approach to the problem. Extending our method to handle task-execution variables such as the reach endpoint coordinates is a promising line of future research.

Other neurophysiological evidence suggests that noise correlations between synergy activations may arise during the execution of corrective movements. Corrections in reaching movements driven by feedback mechanisms have been shown to be described by the superposition of the muscle synergies used for unperturbed reaching (D'Avella et al., 2011). Although trial-to-trial variability was not considered in that study, synergy superposition may reflect robust muscle synergy correlations in single trials that are identified by the synergy extraction algorithm as invariant patterns. Thus, the open-loop muscle synergies may be coupled to form new muscle patterns appropriate for the accomplishment of the corrected motor task. Further support for this consideration comes from a study showing that sensory feedback can couple two independently-organized synergies (or uncouple two centrally-coupled synergies) by modulating the activation of each synergy independently (Cheung et al., 2005). These findings suggest that synergy activations may be coordinated in single trials by central mechanisms.

# **CORRELATIONS IN SYNCHRONOUS vs. TIME-VARYING SYNERGY ACTIVATIONS**

Our results indicate a more important role of noise correlations in the synchronous synergy model compared to the time-varying synergies. Intuitively, this can be explained if we take into account the functionality of each type of synergy. On the one hand, synchronous synergies consist of functional groups of muscles whose activities co-vary across all tasks (Tresch et al., 2006). In our data, each of the four synchronous synergies includes muscles that have the same functional role (either flexion or extension of the limb) and act on the same joint. As such, these groups have to be recruited simultaneously in many flexible combinations to perform a variety of motor tasks. Furthermore, recruitment of each synergy in every single trial depends crucially on the recruitment of other synergies, as muscle groups counteract or complement the activation of other muscle groups (e.g., an agonist-antagonist pair of synergies). This explains the existence of large trial-by-trial interactions between synchronous synergy activations and points out the importance of their function. On the other hand, time-varying synergies consist of spatiotemporal patterns of muscle activities that are invariant across tasks (D'Avella and Tresch, 2002;

D'Avella and Bizzi, 2005). Although this formulation constitutes a very compact representation of muscle activities in single trials (2 single-trial parameters per synergy), it does not allow much flexibility in reusing muscle synergies across tasks because of the merging of spatial and temporal properties in one unique pattern. This results in the identification of more "task-specific" synergies that participate in the execution of only a subset of tasks. More importantly, the activation of only one time-varying synergy can be sufficient for the execution of a task, without requiring the simultaneous and interactive activation of other synergies. This is because every time-varying synergy specifies an entire activity waveform for each muscle. As a result, in this framework, task differences are mainly described by the activation of different time-varying synergies and thus, task-discriminating information is carried mainly by the modulations of individual time-varying synergies. Nevertheless, these conclusions are drawn from a relatively simple dataset, and further investigations should be performed on more complex data involving more motor tasks (and also more muscles, more targets, more speeds, etc.). The present study mainly aimed at establishing a systematic and principled methodology to address such questions.

# **ACKNOWLEDGMENTS**

We acknowledge the financial support of the SI-CODE project of the Future and Emerging Technologies (FET) programme within the Seventh Framework Programme for Research of The European Commission, under FET-Open grant number: FP7-284553.




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 26 January 2013; accepted: 19 April 2013; published online: 13 May 2013.*

*Citation: Delis I, Berret B, Pozzo T and Panzeri S (2013) A methodology for assessing the effect of correlations among muscle synergy activations on task-discriminating information. Front. Comput. Neurosci. 7:54. doi: 10.3389/fncom.2013.00054*

*Copyright © 2013 Delis, Berret, Pozzo and Panzeri. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

# Investigating reduction of dimensionality during single-joint elbow movements: a case study on muscle synergies

# *Enrico Chiovetto1,2\*, Bastien Berret 2,3, Ioannis Delis 2,4,5, Stefano Panzeri 5,6 and Thierry Pozzo2,7,8*

*<sup>1</sup> Section for Computational Sensomotorics, Department of Cognitive Neurology, Hertie Institute of Clinical Brain Research and Center for Integrative Neuroscience, University Clinic Tübingen, Tübingen, Germany*

*<sup>2</sup> Department of Robotics, Brain and Cognitive Sciences, Istituto Italiano di Tecnologia, Genoa, Italy*

*<sup>3</sup> UR CIAMS, EA 4532 - Motor Control and Perception Team, Université Paris-Sud 11, Orsay, France*

*<sup>4</sup> Department of Communication, Computer and System Sciences, University of Genoa, Genoa, Italy*

*<sup>5</sup> Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, UK*

*<sup>6</sup> Center for Neuroscience and Cognitive Systems @ UniTn, Istituto Italiano di Tecnologia, Rovereto, Italy*

*<sup>7</sup> Institut Universitaire de France, Université de Bourgogne, UFR STAPS, Dijon, France*

*<sup>8</sup> INSERM U887, Motricité-Plasticité, Dijon, France*

### *Edited by:*

*Andrea D'Avella, IRCCS Fondazione Santa Lucia, Italy*

### *Reviewed by:*

*Guy Cheron, Université Libre de Bruxelles, Belgium*

*Jinsook Roh, Rehabilitation Institute of Chicago, USA*

### *\*Correspondence:*

*Enrico Chiovetto, Section for Computational Sensomotorics, Department of Cognitive Neurology, Hertie Institute for Clinical Brain Research, Otfried-Müller-Str. 25, 72076 Tübingen, Germany. e-mail: enrico.chiovetto@ klinikum.uni-tuebingen.de*

A long-standing hypothesis in the neuroscience community is that the central nervous system (CNS) generates the muscle activities needed to accomplish movements by combining a relatively small number of stereotyped patterns of muscle activations, often referred to as "muscle synergies." Different definitions of synergies have been given in the literature. The best known are those of synchronous, time-varying and temporal muscle synergies. Each of them is based on a different mathematical model used to factor EMG array recordings collected during the execution of a variety of motor tasks into a well-determined spatial, temporal or spatio-temporal organization. This plurality of definitions and their separate application to complex tasks have so far complicated the comparison and interpretation of the results obtained across studies, and it has remained unclear why and when one synergistic decomposition should be preferred to another. By using well-understood motor tasks such as elbow flexions and extensions, we aimed in this study to clarify which motor features are characterized by each kind of decomposition and to assess whether, when and why one of them should be preferred to the others. We found that three temporal synergies, each accounting for a specific temporal phase of the movements, could account for the majority of the data variation. Similar performance could be achieved by two synchronous synergies, encoding the agonist-antagonist nature of the two muscles considered, and by two time-varying muscle synergies, each encoding a task-related feature of the elbow movements, specifically their direction. Our findings support the notion that each EMG decomposition provides a set of well-interpretable muscle synergies, identifying reduction of dimensionality in different aspects of the movements.
Taken together, our findings suggest that not all decompositions are equivalent and that they may be implemented by different neurophysiological substrates.

**Keywords: muscle synergies, non-negative matrix factorization, EMG, elbow rotations, dimensionality reduction, triphasic pattern**

# **INTRODUCTION**

Over the last two decades, a large number of studies have provided evidence that the central nervous system (CNS) generates the muscle patterns necessary to achieve a desired motor behavior by combining a relatively small number of stereotyped spatial and/or temporal patterns of muscle activation, often referred to as "muscle synergies" (Bizzi et al., 2008). An appeal of this framework is that it suggests that the CNS may control movement execution through a relatively small number of degrees of freedom (dof).

Different conceptual definitions of muscle synergies have been given in the literature. These in practice translate into different mathematical models used to factor electromyographic (EMG) array recordings collected during the execution of a variety of motor tasks into different kinds of temporal, spatial, or spatio-temporal organizations. Invariant temporal components (or "temporal synergies," see Ivanenko et al., 2004, 2005; Chiovetto et al., 2010, 2012; Dominici et al., 2011) are defined as temporal muscle activation profiles that can be simply scaled and summed together to reconstruct the actual activity of each muscle. "Synchronous synergies" (Cheung et al., 2005, 2009, 2012; Ting and Macpherson, 2005; Torres-Oviedo and Ting, 2007, 2010) are stereotyped co-varying groups of muscle activations, with the EMG output specified by a temporal profile defining the timing of each synergy during the task execution. "Time-varying synergies" (d'Avella et al., 2003, 2006, 2008, 2011) are genuine spatiotemporal patterns of muscle activation, with the EMG output specified by the amplitude and time lag of the recruitment of each synergy.

Typically, previous studies on muscle synergies focused on a given decomposition that was then used to investigate potential functions of muscle synergies in complex motor tasks involving a large number of dof. Each of these decompositions has been used successfully to identify common, physiologically important factors of muscle activity (Cheung et al., 2005; Ivanenko et al., 2005; d'Avella et al., 2006). The existence in the literature of multiple definitions of muscle synergies and their separate application to complex tasks, however, complicates the comparison and interpretation of the results obtained across studies, and it is not always clear why and when one synergistic decomposition should be preferred to another. We propose here instead that the systematic application of all these decompositions to the same simple data set, for which the mechanical action of each muscle contraction is well-known, would greatly help to build intuition about the merit and functional interpretation of each synergistic decomposition. This would moreover benefit the interpretation and comparison of different studies. We thus considered the extreme case of single-joint elbow movements, characterized by one kinematic dof, two antagonist muscles (biceps and triceps) and four experimental tasks (flexions and extensions along both the horizontal and vertical directions). We systematically applied decompositions into synchronous, time-varying and temporal synergies to EMG data recorded during this elementary and well-documented motor task (see Berardelli et al., 1996 for a review), whose biomechanical and neurophysiological bases have been studied intensively (Gottlieb et al., 1995; Shapiro et al., 2005).

Our findings support the notion that each EMG decomposition provides a set of well-interpretable muscle synergies, identifying reduction of dimensionality in different aspects of the movements. Each temporal synergy indeed conveys information about a specific temporal phase of the movement (acceleration, deceleration, and stabilization). Synchronous and time-varying synergies instead encode, respectively, the simultaneous and coordinated actions of specific groups of muscles aiming to achieve a specific action goal, and a task-related feature of the elbow movements (specifically the direction of motion). Taken together, our findings suggest that not all decompositions are equivalent and that they may be implemented by different neurophysiological substrates.

# **MATERIALS AND METHODS**

# **SUBJECTS**

Eight healthy right-handed subjects (7 males, 1 female, age 29 ± 4 years, mass 74 ± 9 kg, height 1.77 ± 0.07 m) participated voluntarily in the experiments, which were all performed at the Robotics, Brain and Cognitive Sciences Department of the Italian Institute of Technology (IIT) in Genoa (Italy). All subjects were in good health and had no previous history of neuromuscular disease. The experiment conformed to the Declaration of Helsinki and informed consent was obtained from all the participants according to the protocol of the ethical committee of IIT.

### **PROTOCOL**

Subjects sat on a chair with their back straight and perpendicular to the ground. They were asked to perform one-shot 90◦ elbow rotations between two reference points along either a vertical or a horizontal plane (**Figure 1**). A total of four experimental conditions were thus studied (vertical flexion, VF; vertical extension, VE; horizontal flexion, HF; and horizontal extension, HE). For movements along the vertical direction, the two reference points were located in a vertical plane placed laterally at approximately 10 cm from the subject's movement plane. To this aim, we used a hollow wooden frame containing 1.5 cm-spaced thin vertical fishing wires, to which fishing leads indicating the requested initial fingertip position were attached. One reference point coincided with the subject's fingertip position in the vertical plane when the arm was completely relaxed and extended vertically, with the index fingertip pointing at the ground (vertical position number 1, or VP1). The second point coincided with the subject's fingertip position in the vertical plane when, starting from VP1, the elbow was rotated by about 90◦ so that the forearm ended parallel to the ground (vertical position number 2, or VP2). The positions of the fishing leads were adjusted for each subject before the initiation of the experiment, based on the subject's upper arm and forearm lengths. For vertical elbow flexion, subjects rotated the elbow so as to move their index finger from VP1 to VP2. Conversely, during vertical elbow extension they had to move the fingertip from VP2 back to VP1. For rotations in the horizontal plane, subjects sat in front of a table. One reference point on the table coincided with the horizontal location of the index fingertip when the upper arm was kept horizontal with respect to the ground and perpendicular to the coronal plane, with the forearm flexed by about 90◦ with respect to the upper arm (horizontal position 1, or HP1).
The second reference point coincided with the fingertip location when the whole arm was completely extended horizontally in front of the subject and perpendicular to the coronal plane (horizontal position 2, or HP2).

**FIGURE 1 |** Subjects had to accomplish flexions or extensions of the elbow along both the vertical (V) and horizontal (H) planes.

HP1 and HP2 were identified for each subject, and their locations were marked on the table by means of two small square pieces of colored tape. The table plane lay 10 cm below the plane of rotation of the arm, thus avoiding any interference with the movement. For horizontal elbow flexion, subjects rotated the elbow so as to move their index finger from HP1 to HP2. Conversely, during horizontal elbow extension they had to move the fingertip from HP2 back to HP1. Subjects were always asked to perform fast movements (mean velocities and average peak velocities are reported in **Table 1** for each subject and condition). They performed 20 elbow flexions and 20 extensions for each plane orientation. During the experiment, the wrist joint was immobilized by means of two light, small sticks attached to the distal part of the forearm and the proximal part of the hand. At each trial repetition, subjects placed their index finger on the starting position. The experimenter started data acquisition and gave the "go" signal. The subjects performed the movement after the "go" signal and stopped on the target for about a second. Data acquisition stopped automatically after 2 s. At the end of the trial, the subject moved the arm to a relaxed position until the beginning of the next trial. After every 20 trials, subjects took a pause of about 3 min to avoid fatigue.

# **APPARATUS**

During trial execution, kinematic data were recorded by means of a Vicon (Oxford, UK) motion capture system. Six passive markers were attached to the subjects' right arm (the acromion process, the lateral epicondyle of the humerus, the styloid process and the tip of the index finger) and head (external canthus of the eye and auditory meatus). EMG activity of the biceps brachii (Bic) and triceps longus (Tri) was monitored by means of an Aurion (Milan, Italy) wireless EMG system. The impedance between the surface electrodes was always checked not to exceed 5 kΩ; in the case of higher values, the skin was rubbed with an abrasive sponge in order to decrease it. EMG data were amplified (gain of 1000), band-pass filtered (10 Hz high-pass and 1 kHz low-pass) and digitized at 1000 Hz.

**TABLE 1 |** *For each movement and subject the average velocities (± standard deviations) are reported. Averages and standard deviations were computed over all trial repetitions. The first entry of each row of the table indicates the initials of the first and last name of a subject.*

# **DATA PRE-PROCESSING**

Data were analyzed off-line using customized software written in Matlab (Mathworks, Natick, MA). Kinematic data were low-pass filtered (Butterworth filter, cut-off frequency of 20 Hz). The angular displacement of the elbow was computed from the markers' spatial positions. The elbow angular velocity was obtained by numerical differentiation of the angular position. Mean and peak angular velocities were computed for each trial. The mean velocity was computed as the mean value of the angular velocity over the movement duration. The time instants of movement initiation (*t*0) and end (*tf*) were defined, respectively, as the instants at which the bell-shaped angular velocity profile of the elbow exceeded and dropped below 5% of its peak value. For the EMG analysis, muscle signals were full-wave rectified, normalized in amplitude with respect to their maximum value recorded across all trials and conditions, and low-pass filtered with a zero-lag Butterworth filter (cut-off frequency 5 Hz). The filtered EMG signals of each trial, comprised between 100 ms before *t*<sup>0</sup> and *tf*, were normalized to a standard time window of 200 samples. By considering 100 ms before movement initiation we wanted to include in the analysis any anticipatory activity associated with the movement. To identify specific invariant patterns characterizing the EMG activities of the different subjects, two versions of non-negative matrix factorization (NMF) were applied to the low-pass filtered EMGs. The standard NMF algorithm (Lee and Seung, 1999) was used to identify both temporal components and synchronous synergies.
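As a rough illustration, the pre-processing pipeline described above (rectification, zero-lag low-pass filtering, amplitude normalization, and time normalization to 200 samples) can be sketched in a few lines of Python. The function and parameter names are our own, and the amplitude normalization here is per-channel within one trial rather than across all trials and conditions as in the actual analysis:

```python
import numpy as np
from scipy.signal import butter, filtfilt


def preprocess_emg(raw, fs=1000, cutoff=5.0, n_samples=200):
    """Rectify, low-pass filter (zero-lag Butterworth), normalize, and
    time-normalize one EMG channel to a standard window of n_samples."""
    rectified = np.abs(raw)                       # full-wave rectification
    b, a = butter(4, cutoff / (fs / 2), btype="low")
    envelope = filtfilt(b, a, rectified)          # zero-lag filtering
    envelope = envelope / envelope.max()          # amplitude normalization
    # resample to a standard time base of n_samples points
    t_old = np.linspace(0.0, 1.0, envelope.size)
    t_new = np.linspace(0.0, 1.0, n_samples)
    return np.interp(t_new, t_old, envelope)
```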

### *Temporal synergies (or temporal components)*

NMF was applied to the matrix **M** of the EMG signals (size *m* by *T*, where *m* is the number of muscle signals and *T* the number of time samples), providing two matrices **U** and **C** (of dimension, respectively, *m* by *Nc* and *Nc* by *T*, where *Nc* is the number of temporal components) such that, at time instant *t*,

$$\mathbf{M}(t) = \sum\_{i=1}^{Nc} \mathbf{U}\_i \mathbf{C}\_i(t) + \text{residuals} \tag{1}$$

where **U***<sup>i</sup>* indicates the *i-th* column of the matrix **U** and C*i*(*t*) the *i-th* element of the column vector **C**(*t*). Note that the number of muscles *m* indicates the number of muscles recorded during one single experimental trial. When considering multiple trials, the matrix **M** was obtained by concatenating vertically the matrices of the single trials.
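For concreteness, decomposition (1) can be reproduced with an off-the-shelf NMF implementation such as the one in scikit-learn, whose multiplicative-update solver corresponds to the Lee and Seung algorithm. The synthetic data and all dimensions below are illustrative assumptions, not the actual recordings:

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
m, T, Nc = 2, 200, 3        # muscles, time samples, temporal components
n_trials = 20

# stand-in for the EMG matrix M: the (m x T) matrices of the single
# trials concatenated vertically, as described in the text
M = rng.random((m * n_trials, T))

# Eq. (1): M ≈ U C, with U holding the combination coefficients and
# C the non-negative temporal components (one per row)
nmf = NMF(n_components=Nc, solver="mu", init="nndsvda",
          max_iter=500, random_state=0)
U = nmf.fit_transform(M)
C = nmf.components_
```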

### *Synchronous synergies*

NMF was applied to the transposed matrix **M**′ of **M**, providing two matrices **V** and **W** (this time of dimension, respectively, *T* by *Ns* and *Ns* by *m*, where *Ns* is the number of synchronous synergies) such that

$$\mathbf{M}'(t) = \sum\_{i=1}^{Ns} \mathbf{V}\_i(t)\mathbf{W}\_i + \text{residuals} \tag{2}$$

where V*i*(*t*) indicates the *i-th* element of the row vector **V**(*t*) and **W***<sup>i</sup>* the *i-th* row of the matrix **W**. Note that in (1) the *j-th* row of the matrix **M** results from the linear combination of the rows of the matrix **C**, scaled by the scalar coefficients of the *j-th* row of the matrix **U**. Each row of **C** therefore contains one temporal component. In (2), conversely, the *j-th* row of **M**′ is obtained by combining linearly the rows of **W**, scaled by the coefficients of the *j-th* row of **V**. Each row of **W**, of dimension 1 by *m*, therefore represents a vector of muscle activations, i.e., a synchronous synergy. Note also that, because of the constraints imposed by NMF on the parameters, all the entries of the matrices **U**, **V**, **C**, and **W** are non-negative. Also in this case, when considering multiple trials, the transposed matrices of the single trials were concatenated vertically before applying NMF.
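The same library call yields decomposition (2) once the trial matrices are transposed and stacked. Again a sketch on synthetic data, with dimensions chosen to mimic the two-muscle setup (all names and sizes are our own assumptions):

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(1)
m, T, Ns = 2, 200, 2        # muscles, time samples, synchronous synergies
n_trials = 20

# stand-in for M': the transposed (T x m) single-trial matrices
# concatenated vertically, as described in the text
Mt = rng.random((n_trials * T, m))

# Eq. (2): M' ≈ V W, with V the activation time courses and
# W the synchronous synergies (one muscle-activation vector per row)
nmf = NMF(n_components=Ns, solver="mu", init="nndsvda",
          max_iter=500, random_state=0)
V = nmf.fit_transform(Mt)
W = nmf.components_
```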

### *Time-varying synergies*

We applied a customized version of standard NMF developed by d'Avella et al. (2003) and d'Avella and Bizzi (2005). As in standard NMF, all the identified parameters are non-negative, but temporal shifts of the synergies are also allowed, so that each column vector of **M** at instant *t* satisfies

$$\mathbf{M}(t) = \sum\_{i=1}^{Nt} c\_i \mathbf{w}\_i (t - \tau\_i) + \text{residuals} \tag{3}$$

where *Nt* is the number of time-varying synergies and *ci* and τ*<sup>i</sup>* are, respectively, the scaling coefficient and the time delay associated with the synergy **w***i*. The algorithm by d'Avella et al. requires specifying the temporal duration of each time-varying synergy. In this study the duration of each synergy was set, for each subject, equal to the duration of the whole trial after time standardization (200 samples). Note that the residuals in (1), (2), and (3) decrease as the number of synergies increases. In the case of multiple trials, the matrices of the single trials were concatenated horizontally.
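The generative model of Eq. (3) can be sketched as follows. This only shows the reconstruction step, not the extraction algorithm itself (which iteratively updates synergies, amplitudes, and delays); the function name and the integer-sample delays are our own simplifications:

```python
import numpy as np


def reconstruct(synergies, c, tau, T):
    """Reconstruct an (m x T) EMG pattern from time-varying synergies
    per Eq. (3): M(t) = sum_i c_i * w_i(t - tau_i).
    `synergies` is a list of (m x L) arrays; `c` and `tau` hold the
    per-synergy amplitude coefficients and integer sample delays."""
    m = synergies[0].shape[0]
    M = np.zeros((m, T))
    for w, ci, ti in zip(synergies, c, tau):
        L = w.shape[1]
        stop = min(T, ti + L)          # clip the shifted synergy at the window edge
        if 0 <= ti < stop:
            M[:, ti:stop] += ci * w[:, : stop - ti]
    return M
```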

### *Selection of the number of synergies to be included in the EMG decomposition and their significance*

In (1), (2), and (3) the numbers of muscle synergies (*Nc*, *Ns*, and *Nt*) are free parameters of the analysis that can be set arbitrarily by the experimenter. Here, in all three cases, the number of synergies was set according to a criterion based on the computation of the variance accounted for (VAF) as a function of *Nc*, *Ns*, and *Nt*. The VAF was defined as follows

$$\text{VAF} = 100 \cdot \left(1 - \left(\|\mathbf{M} - \mathbf{D}\|^2 / \|\mathbf{M} - \text{mean}(\mathbf{M})\|^2\right)\right) \tag{4}$$

where **D** is the matrix of the reconstructed EMGs obtained by using a certain number of synergies and mean() is an operator that computes a matrix of the same size as **M**, whose rows are equal point by point to the mean values of the corresponding rows of **M**. The number of synergies was determined as the number of components at which the graph of the cumulative VAF presented a considerable change of slope (an "elbow") and after which the slope of the graph became constant (Ferré, 1995). The exact point of change was determined quantitatively by using a linear regression procedure already used in the literature (Cheung et al., 2005, 2009; d'Avella et al., 2006; Chiovetto et al., 2010, 2012). We computed a series of linear regressions, starting from a regression on the entire cumulative VAF curve and progressively removing the smallest number of components from the regression interval. We then computed the mean squared residual error of the different regressions and selected as the optimal number of synergies the first number whose corresponding error was smaller than 10<sup>−</sup>3. To minimize the probability of finding local minima, we always ran NMF 25 times on the same data set and considered as the valid solution the one that provided the lowest reconstruction error between original and reconstructed data. To test the robustness and generality of the synergies extracted from each data set, we used the two following cross-validation procedures. We divided each data set into 5 parts of the same size. Since every data set consisted of the EMG activities of the Bic and Tri muscles collected during 20 repetitions of the same movement accomplished by one subject, each part consisted of the EMG activities of four trials. We then chose randomly 4 parts to use as the training data set and one part as the test data set. We extracted the synergies from the training data set and used them to reconstruct the activations of the test data set. We used the original and reconstructed test data sets to compute the VAF and draw the graph of the cumulative VAF. We also used the synergies extracted from each subject to reconstruct the EMG data sets of all the other subjects and assessed the goodness of reconstruction by computing the VAF. For all cases, we verified by simulation that the extracted synergies did not result from a bias associated with the extraction methods. For each subject and decomposition, we compared the VAF values for the reconstruction of the experimental data obtained by combining the identified synergies with the VAF values of the reconstruction of random, structureless data by combination of the synergies identified from those artificial data. Such data sets were generated by reshuffling the samples of each muscle independently in each trial of each subject. Reshuffled data were then low-pass filtered (5 Hz cutoff). For each one of the actual data sets we simulated 50 artificial data sets and extracted the synergies by using the same procedure used for the observed data. We estimated the significance by computing the 95th percentile of the VAF distribution for simulated data.
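A minimal sketch of the VAF of Eq. (4) and of the slope-based selection of the number of synergies described above; the function names and the default tolerance are our own, and the actual analysis applied the regression procedure to cumulative VAF curves from the real decompositions:

```python
import numpy as np


def vaf(M, D):
    """Variance accounted for, Eq. (4): percentage of the variance of M
    explained by the reconstruction D."""
    resid = np.linalg.norm(M - D) ** 2
    total = np.linalg.norm(M - M.mean(axis=1, keepdims=True)) ** 2
    return 100.0 * (1.0 - resid / total)


def select_n_synergies(vaf_curve, tol=1e-3):
    """Find the 'elbow' of a cumulative VAF curve: fit a line to the
    curve from index k onward, increasing k until the mean squared
    residual drops below tol (the procedure described in the text)."""
    x = np.arange(1, len(vaf_curve) + 1, dtype=float)
    y = np.asarray(vaf_curve, dtype=float)
    for k in range(len(y) - 1):
        coef = np.polyfit(x[k:], y[k:], 1)
        mse = np.mean((np.polyval(coef, x[k:]) - y[k:]) ** 2)
        if mse < tol:
            return k + 1            # number of synergies (1-indexed)
    return len(y)
```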

### *Similarity of synergies across subjects*

The similarity between synergies of different subjects was quantified by computing their scalar products. For synchronous synergies and temporal components we proceeded as follows. For all possible pairs of normalized synergies of two different subjects, the corresponding scalar products were computed. Note that, by definition, such a product can only take values ranging between 0 and 1. The pair with the highest similarity was selected and the corresponding synergies were removed from the two groups of synergies. The similarities between the remaining synergies were then computed, and the best matching pair of synergies was selected and removed from the two groups. This procedure was iterated until all synergies were matched. The procedure for computing the similarity between time-varying synergies was very similar, with the only difference that, in this case, before computing the scalar product the matrices of the synergies were first rearranged by disposing their entries in the form of vectors. The similarity between synergies was then quantified by computing the maximum of the scalar products over all possible time delays of the second synergy with respect to the first. To assess, however, the significance of the values of similarity provided by the scalar products, we defined a similarity index (*S*) between two synergies. This index, ranging from 0 (similarity at chance level) to 1 (perfect matching of the synergies), was defined as follows

$$\mathcal{S} = (s\_{\text{data}} - s\_{\text{chance}})/(1 - s\_{\text{chance}}) \tag{5}$$

where *s*data is the scalar product between two synergies extracted from the actual data and *s*chance is the mean scalar product between 200 pairs of random synergies. We generated the artificial synergies by resampling randomly from the distribution of the activation amplitudes of each muscle in the data set from which the synergies were extracted, constructing sequences of random data with the same length as the extracted synergies. Artificial data were then low-pass filtered to match the smoothness of the actual data.
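The greedy matching procedure and the similarity index of Eq. (5) can be sketched as follows. This is a simplified version for synchronous synergies or temporal components, without the time-shift maximization used for time-varying synergies; all names are our own:

```python
import numpy as np


def normalized_dot(a, b):
    """Scalar product between two non-negative, unit-normalized synergies."""
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return float(a @ b)


def greedy_match(A, B):
    """Greedily pair synergies from two subjects by highest scalar
    product (the matching procedure described in the text). A and B are
    lists of 1-D non-negative arrays of equal length."""
    pairs = []
    ia, ib = list(range(len(A))), list(range(len(B)))
    while ia and ib:
        i, j = max(((i, j) for i in ia for j in ib),
                   key=lambda p: normalized_dot(A[p[0]], B[p[1]]))
        pairs.append((i, j, normalized_dot(A[i], B[j])))
        ia.remove(i)
        ib.remove(j)
    return pairs


def similarity_index(s_data, s_chance):
    """Eq. (5): similarity rescaled so 0 = chance level, 1 = identity."""
    return (s_data - s_chance) / (1.0 - s_chance)
```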

# **RESULTS**

To compare systematically the results provided by different synergistic decompositions when characterizing the same EMG data set, we recorded EMGs during a series of elbow rotations and then extracted and compared synchronous, time-varying and temporal muscle synergies.

To illustrate the data, we begin by showing in **Figure 2A** the EMGs recorded during a typical trial accomplished by one subject, corresponding to an elbow flexion in the horizontal plane. Consistent with previous literature (Berardelli et al., 1996), such a movement is characterized by a sequence of three EMG bursts: an initial burst of the agonist muscle providing the propulsive force to accelerate the movement, followed by a second burst of the antagonist to decelerate the movement, and a third burst of the agonist to dampen the oscillation that otherwise appears at the end of the movement. This final corrective action is also reflected in the final overshoot of the finger velocity profile. This sequence of bursts of activity was also found for elbow extension in the horizontal plane and for flexion and extension in the vertical one (**Figure 2B**).

We then considered the extraction of synergies from these data. The first interesting question is how many synergies of each type are needed to describe the data. The number of synergies to consider was determined, for each subject and type of decomposition, from the dependence of the percentage of VAF (see "Materials and Methods") on the number of synergies. These curves are plotted in **Figure 3** for each type of synergy factorization and for each subject. The VAF curves in each decomposition were very similar across subjects. While for the temporal and time-varying decompositions we could extract up to 6 synergies (**Figures 3A**,**C**), we found that, for the synchronous synergistic decomposition, two synergies were enough to account for 100% of the variance associated with the original data. We thus did not extract a number of synergies higher than two. In **Figure 3B**, however, we reported an amount of variance equal to 100% even for *N* = 3, 4, 5, and 6, to make **Figure 3B** graphically coherent with the other two panels, i.e., **Figures 3A**,**C**.

**Figure 3A** reports the dependence of the VAF on the number of extracted temporal synergies. For all subjects, the VAF reached a high value when including 3 synergies, and the linear interpolation algorithm that we used (see "Materials and Methods") indicated that in all subjects 3 temporal synergies were sufficient to explain the vast majority of the variance (with additional temporal synergies generated by the NMF algorithm adding only a very small fraction of the total variance). The VAF curves for synchronous (**Figure 3B**) and time-varying (**Figure 3C**) synergies show that, for each individual subject, only two synergies were instead required to account for the variance of the EMG data.

After having individuated their number, we next considered the shapes of the synergies extracted by each decomposition. **Figure 4A** reports the shapes of the three temporal synergies extracted from the EMGs of a typical subject (LA). The three temporal components clearly remind of the triphasic organization presented in **Figure 2**. Each temporal component is characterized by one major bump. The first temporal synergy can be interpreted as the component contributing the most to the modulation of the first burst of the agonist muscle during movement accomplishment: the second as the first burst of the antagonist; and the third as the second burst of the agonist. Note that the third temporal synergy shows an initial deactivation before the occurrence of the main peak. This initial part of the synergy can be associated to the antagonist deactivation, prior to movement initiation, of the anti-gravitational muscles during rotation along the vertical plane. The combination coefficients in **Figure 4B** (averaged across the repetitions of each kind of movement) show the contribution of each component to the activity of each muscle. Consistently with a triphasic pattern, it is evident that the first component is contributing more to the activity of the biceps during VF and HF; conversely, it contributes more to the activation of the triceps in VE and HE. Similarly the second temporal synergy is more active for the muscles opposing the actions exerted by the muscles activated by the first components. Thus for HF and VF movements the coefficients of the triceps are higher than those of the biceps. Whereas for VE the coefficient of the biceps is higher than that of the biceps, for HE movements however the level of the coefficients of the two antagonist muscles is approximately the same. 
The coefficients also show that, in all movements, the third component contributes to the activation of both muscles in approximately equal proportions, so as to compensate for overshoots or to increase joint stiffness by co-activating opposing muscles.

Two points need to be remarked. First, in the pre-processing step all the EMG signals of each muscle were normalized with respect to the maximum value recorded for that muscle across all trials. Such a procedure may lead to a partial loss of information about the relationship among the EMG amplitudes of the different muscles monitored within the same trial. Moreover, trials were normalized in duration, which may introduce some additional temporal variability when merging all trials together to extract synergies. These factors may explain why the average coefficients of biceps and triceps relative to temporal synergy 2 in **Figure 4B** had approximately the same value for condition HE, contrary to the expectation that the coefficient of the biceps should have been much larger than that of the triceps. According to the triphasic strategy, indeed, the second component would be expected to contribute mainly to the activation of the biceps, which in HE plays the antagonist role.

In addition, it is important to note that three temporal synergies were identified, more than the number of degrees of freedom to control (one joint angle, two muscles). At first this may look like an unnecessary increase in complexity. However, the strength of a triphasic strategy in a single-joint motor task likely lies in its flexibility and power of generalization. Indeed, similar triphasic muscle organizations were found

**FIGURE 3 | Levels of approximation as a function of the number of synergies. (A)** Percentage of VAF as a function of the number of temporal synergies. **(B)** Percentage of VAF as a function of the number of synchronous synergies. **(C)** Percentage of VAF as a function of the number of time-varying synergies. Each colored line is associated with a specific subject (see rightmost panel), identified in the figure by the initials of their first and last name. In all three panels the vertical arrows indicate the number of primitives at which the curves satisfy the linear regression criterion used to choose the number of primitives (see "Materials and Methods"). These points are invariant across subjects and coincide, in most cases, with the points at which the curves present an "elbow" and start becoming straight.

**FIGURE 4 | Identified temporal synergies. (A)** Temporal components extracted from one typical subject (LA), ordered according to the time of the occurrence of their main peaks. **(B)** Corresponding scaling coefficients.

to also characterize arm raising (Friedli et al., 1984), rapid voluntary body sway (Hayashi, 1998) and whole-body reaching (Chiovetto et al., 2010, 2012) motor tasks. In accordance with this premise, one can note that the four tasks were all executed through a triphasic motor pattern. While previous studies mainly demonstrated the power of the synergy concept in reducing the dimensionality of motor control and execution, our results show in addition that temporal synergies present marked functional features.

**Figure 5A** depicts the two synchronous synergies extracted from the EMGs of a typical subject (LA). Each synergy is characterized by the activation of one single muscle. Due to their antagonist nature, biceps and triceps were therefore found to share no common level of activation. Note that, although such a result may seem trivial in a two-dimensional space, we might equally have obtained a pair of linearly independent vectors characterized by noticeable activity of both muscles. In **Figure 5B** the temporal evolutions of the scaling coefficients averaged across movement repetitions are

shown for each muscle and each movement. Note how, within each movement condition, the activities of the agonist and antagonist muscles are always characterized by one main burst, in agreement with a classic triphasic pattern. Only for the first coefficient relative to HF movements is the second burst not clearly visible, most likely because of the averaging procedure.

Finally, the two time-varying synergies are shown in **Figure 6A**. They were characterized by one single burst for each muscle, one for the biceps and one for the triceps. The two synergies differed, however, in the temporal order in which the two bursts occurred: whereas the burst of the biceps anticipated the burst of the triceps in the first time-varying synergy, the order of the peaks was reversed in the second one. The average scaling coefficients and temporal delays corresponding to each synergy are shown in **Figures 6B,C**. Note that also in this case the contribution of each synergy to the EMG activity of each movement is consistent with the biomechanical features of the movement itself. Thus, time-varying synergy 1, in which the biceps is activated first, contributes more to HF and VF movements, while time-varying synergy 2, in which the triceps is activated first, contributes more to HE and VE movements.

In sum, we found that each kind of muscle decomposition provided a set of interpretable synergies. Each temporal component described a temporal phase of the movement. Each synchronous synergy described the simultaneous and coordinated action of a group of muscles (only one in our case) aiming to achieve a specific action goal. Each time-varying synergy related instead to a specific task-related variable (specifically a direction of motion).

We used the synergies extracted from each subject to reconstruct the EMG data of each of the others and assessed the percentage of VAF. The results are reported in the form of confusion matrices (**Figure 7**). The average percentage of VAF computed across subjects was 90 ± 7% when temporal synergies were extracted and used for reconstruction, and 87 ± 4% for the data sets reconstructed using the time-varying synergies. These values were found to be significant and did not result from a bias built into the extraction methods. The average 95th percentiles of the distribution of VAF values obtained from the reconstructions of the simulated data were indeed much lower than those obtained from the reconstruction of the actual data: 17.6 and 39.3%, respectively, when data were decomposed according to the temporal and time-varying synergistic decompositions. The synchronous case was not considered, given the features of the extracted sources and the fact that with such synergies a perfect match of the actual data could always be achieved.
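The cross-subject test above amounts to holding one subject's synergies fixed and fitting only the combination coefficients to another subject's data. A minimal sketch of that procedure (synthetic data and a hand-rolled non-negative fit, not the study's pipeline) is:

```python
import numpy as np

def vaf_with_fixed_synergies(V, W, n_iter=1000, seed=0):
    """Fit non-negative coefficients H for data V given FIXED synergies W
    (multiplicative update for H only), then return the resulting VAF."""
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + 1e-3
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + 1e-12)
    E = V - W @ H
    return 1.0 - np.sum(E**2) / np.sum((V - V.mean())**2)

rng = np.random.default_rng(2)
W_A = rng.random((12, 3))              # synergies of a hypothetical "subject A"
V_B = W_A @ rng.random((3, 80))        # "subject B" data sharing the same synergies
V_rand = rng.random((12, 80))          # unstructured control data

vaf_cross = vaf_with_fixed_synergies(V_B, W_A)       # high: A's synergies explain B
vaf_control = vaf_with_fixed_synergies(V_rand, W_A)  # lower: chance-level baseline
```

Comparing the cross-subject VAF against a baseline distribution obtained from unstructured (simulated or shuffled) data is what distinguishes genuine shared structure from a bias built into the extraction method.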

We quantified how representative the synergies illustrated in **Figures 4, 5**, and **6** for one single subject were of the synergies extracted from the EMG activity of the other subjects. To this purpose we computed the average scalar products and similarity indices between groups of synergies belonging to different participants. For the temporal components the average scalar product was *s* = 0.93 ± 0.01; it was *s* = 1 ± 0 for the synchronous synergies and *s* = 0.91 ± 0.05 for the time-varying ones. The scalar products across subjects of synchronous synergies were always equal to 1 because the same set of synchronous synergies, in which only one single muscle was recruited at a time, was identified for all subjects. Note that in this case the similarity index *S* is also automatically equal to 1. The mean *S* values computed between groups of synergies extracted from different subjects are plotted in **Figure 8**. On average *S* = 0.86 ± 0.06 for the groups of temporal synergies and *S* = 0.85 ± 0.11 for the time-varying synergies. Note that in both cases the average similarity index was much higher than 0 (chance level). In sum, all synergy decompositions show a very high degree of robustness across subjects.
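A common way to compute such an index (a sketch under our own assumptions, not necessarily the exact definition used in "Materials and Methods") is the mean normalized scalar product between two synergy sets, maximized over the pairing of synergies:

```python
import itertools
import numpy as np

def best_match_similarity(W1, W2):
    """Mean normalized scalar product between two synergy sets (columns),
    maximized over all pairings of the synergies."""
    A = W1 / np.linalg.norm(W1, axis=0)   # unit-norm synergy vectors
    B = W2 / np.linalg.norm(W2, axis=0)
    C = A.T @ B                           # C[i, j] = cosine(W1_i, W2_j)
    k = C.shape[0]
    return max(np.mean([C[i, p[i]] for i in range(k)])
               for p in itertools.permutations(range(k)))

# Two hypothetical 3-muscle, 2-synergy sets from different subjects
W1 = np.array([[1.0, 0.0], [0.0, 1.0], [0.2, 0.1]])
W2 = np.array([[0.9, 0.1], [0.1, 1.0], [0.3, 0.0]])
s = best_match_similarity(W1, W2)   # close to 1 for highly similar sets
```

The exhaustive search over permutations is affordable here because the number of synergies per set is small; for larger sets an assignment algorithm would be used instead.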

# **DISCUSSION**

We used NMF-based methods to extract three different kinds of muscle synergies from the EMG activity of two antagonist muscles during the accomplishment of single-joint elbow rotations along both the horizontal and vertical planes. By using a well-understood motor task, we aimed to better clarify the

**FIGURE 7 | (A)** Percentage of VAF for the reconstruction of the actual EMG data set of one subject by using the temporal synergies identified from the data sets of the other subjects. VAF values along each row are associated with the reconstruction of the data of one single subject. **(B)** Percentage of VAF for the reconstruction of the actual EMG data set of one subject by using the time-varying synergies identified from the data sets of the other subjects.

motor features characterized by each kind of decomposition and to assess whether, when and why one of them should be preferred to another. We found well-defined, interpretable results for each of the EMG signal decompositions considered. This allows us to discuss in more detail what motor features each kind of muscle synergy decomposition encodes and, consequently, to explain why sometimes the extraction of one type of synergy may be more meaningful than another.

In some previous studies (Ivanenko et al., 2005; Tresch et al., 2006) different unsupervised learning algorithms were applied to the same data set to verify the independence of the synergies from the particular technique used for their identification, or to test the superiority of one algorithm over another. In such studies, however, all the algorithms relied on the same generative model, i.e., on the same definition of synergy. To our knowledge, this is the first study comparing synchronous, time-varying and temporal muscle synergies extracted from the same data set. Hence it offers the possibility to gain novel insights into the benefits provided by the different modular decompositions. Our choice of an elementary motor task, for which most of the neuromuscular functions are well understood, made the interpretation of the various synergies as transparent as possible.

The results that we presented revealed that in all the cases NMF led to the identification of interpretable muscle synergies.

The extraction of synchronous synergies yielded two primitives, each one characterized by the activation of only one of the two muscles, indicating that biceps and triceps (respectively flexor and extensor of the elbow joint) assumed independent levels of activation; in other words, their activation waveforms did not, in general, co-vary in time. This might look like a trivial result given the small number of muscles considered and in view of the antagonist nature of the two muscles during elbow rotations. Following the generic definition of a muscle synergy as a group of muscles working together to achieve a common goal, it may appear surprising that the two main muscles controlling the task performance are not synergistic. However, the definition of synergies can be restated as groups of muscles acting at one or multiple joints to achieve a specific motor function (in our case the motor function could simply be flexing or extending the arm; in other terms, accelerating or decelerating it). From this point of view, our interpretation is in agreement with previous studies considering more complex movements and a larger number of muscles. For instance, the synergies extracted by Cheung et al. (2009) from the EMG activations of sixteen elbow and shoulder muscles of subjects performing a set of arm movements in space can easily be split into two groups: one encompassing synergies in which the most active muscles are flexors and another in which extensor muscles dominate (see Cheung et al., 2009, their **Figure 3A**). Also in that case, therefore, the goal associated with each synergy was to flex or extend the arm. By extension, this may suggest that muscles belonging to the same synchronous synergy share similarities with respect to their biomechanical function in the movement to be performed. Synchronous synergies were shown, however, to encode also other kinds of functional goals, or "strategies".
Torres-Oviedo and Ting (2007) extracted synchronous synergies from a set of leg and trunk muscles during a postural task and found synergies characterized mainly by the activation of either ankle or knee muscles. These synergies therefore produced muscle activation patterns associated with two well-known postural strategies, usually referred to as the "hip" and "ankle" strategies, which were previously described in depth in human postural control (Horak and Macpherson, 1996).

When extracting temporal muscle components, the application of NMF provided a decomposition based on three temporal synergies. Each one of them was found to play a well-determined functional role during movement accomplishment, in agreement with the three movement phases present in the classical triphasic pattern (see Berardelli et al., 1996, for a review relative to elbow and wrist movements). The three phases can be summarized as follows: a first phase (coinciding with the first agonist EMG burst) providing the impulsive force to initiate the movement, a second phase (antagonist burst) dedicated to halting the movement at the desired end-point, and a third phase (coinciding with the second agonist burst) damping out the oscillations which might occur at the end of the movement. Although in a single-joint motor task such a triphasic strategy may look like an unnecessary increase in complexity, since the number of synergies is higher than the number of muscles to control, its strength likely lies in its flexibility and power of generalization. Indeed, similar muscle organizations were found to also characterize arm raising (Friedli et al., 1984), rapid voluntary body sway (Hayashi, 1998) and whole-body reaching (Chiovetto et al., 2010, 2012) motor tasks. Beyond the need to reduce movement complexity by reducing the number of degrees of freedom (number of muscles), the decomposition of EMG activations based on the definition of temporal synergies showed that, to some extent, even the temporal dimension of the movement is a source of complexity that could be controlled and simplified by the CNS. These findings also pose the question of the neural implementation of this kind of temporal synergy. For single-joint rotations, Irlbacher et al. (2006) showed that the bursts composing the triphasic pattern were triggered in cascade, with the possibility for the second burst to depend partly on what occurred during the first burst, rather than as a complete, indivisible sequence.
This is compatible with the extraction of three temporal synergies to account for the control of elbow rotations across several conditions. However, it raises the question of whether there are indeed three "spinal" temporal patterns recruited by different premotor drives, or whether the same temporal pattern is recruited by a delayed sequence of premotor drives. Interestingly, this idea of time shifts is present in the time-varying model of muscle synergies, which might resolve this issue.

We found that two time-varying muscle synergies could account quite well for the EMG activity associated with elbow movements. Each synergy was characterized by two main bursts of activation, one for the biceps and one for the triceps, whereas the time of occurrence of their peaks was inverted in the two synergies. While the burst of the biceps in the first synergy of **Figure 6A** occurs first, and may therefore be thought to contribute to starting elbow flexion while the burst of the triceps brakes it, in the second synergy the roles of the two muscles are inverted and the synergy is consistent with the pattern associated with an elbow extension. The two synergies therefore seem to intrinsically encode the direction of motion, or in other words the motor task, and may thus allow a hierarchical control of movements, in which only the task goals need to be specified to generate complete muscle patterns. This finding is consistent with the results presented in previous investigations of arm movements (d'Avella et al., 2006, 2008, 2011) in which, even when a larger number of muscles was taken into account in the analysis, time-varying synergies were found to be directionally tuned, so that they were active only when the movements occurred in well-determined directions. We also stress the subtle difference between the interpretation of time-varying synergies and synchronous synergies: with the first time-varying synergy only flexions can be performed (perhaps varying in speed or amplitude depending on the way it is recruited). In contrast, the first synchronous synergy can be used for both flexions (to accelerate) and extensions (to decelerate), showing that the two representations encode distinct aspects of the movement data set.

The use of very simple motor tasks characterized by a well-known triphasic pattern allows us to evaluate some pros and cons of each of the decompositions used in this study. Previous work demonstrated that, in a triphasic pattern, the time of activation of the antagonist muscle is controlled independently by the cerebellum (Manto et al., 1995). Other studies (Cheron and Godaux, 1986) also reported that the timing of the antagonist burst onset increases with movement amplitude, whereas that of the agonist does not. Our results showed that neither the temporal synergistic decomposition nor the time-varying one can capture such timing features. In the first case, indeed, each one of the three bumps of **Figure 4** is invariant in time and cannot be shifted temporally. This makes it impossible to model the inter-trial variability of the onset of the antagonist muscle. Rather, each bump represents the average temporal evolution of the corresponding burst across all trials. In the second case, in each of the time-varying synergies that we identified from the experimental data set, the time lag between the activations of the two antagonist muscles is constant. This prevents the possibility, when reconstructing the data, of varying from trial to trial the time interval between the activations of the agonist and antagonist muscles, as observed in human subjects. Different considerations can instead be made for the results associated with the synchronous decomposition. As each synergy identified from the data is responsible for the recruitment of one single muscle, the activation profile of each muscle can be set arbitrarily and independently for each trial. This therefore allows modeling independently not only the times of activation of each burst in each trial, but also their amplitudes, in agreement with other experimental observations. Hannaford et al. (1985) indeed demonstrated that the first agonist burst is

not modified by vibration of the agonist muscle. In contrast, the amplitude of the second agonist burst is increased, and vibration of the antagonist muscle increases the amplitude of the antagonist burst. Like the synchronous decomposition, the temporal decomposition is also suitable for capturing such amplitude features in the reconstructed data, as it allows the separate scaling of each one of the three identified bumps. The time-varying decomposition, on the contrary, introduces by construction a correlation between the amplitudes of the different muscles.

It has been demonstrated that discrete movements regulated by a triphasic pattern may present an oscillatory component in the neural command (see for instance Cheron and Godaux, 1986). Very recently, analysis of the dynamical structure of reaching movements also showed that non-periodic movements such as the ones presented here contain a strong rhythmic structure (Churchland et al., 2012). In that study the authors showed that, although EMG responses do not themselves exhibit state-space rotations, EMG can nevertheless be constructed from underlying rhythmic components. It thus makes sense to wonder which of the decomposition methods that we investigated can be more useful or complementary for understanding the oscillatory nature of the control of movement. Each model might indeed provide a set of synergies revealing specific oscillatory features underlying the EMGs. In this framework, synchronous components cannot be of help, as they carry spatial and not temporal information. Interesting results might instead be provided by drawing the phase plots associated with each temporal component, or with the activity of each muscle trace in a time-varying synergy. If the plots presented evident rotations, the hypothesis put forward by Cheron and Godaux and later by Churchland et al. would be strengthened. In the contrary case, however, the results obtained by these authors would not be discredited, as the absence of rhythmic features in the components might instead be due to the inability of the synergy models to account for such features correctly.

In this discussion we have tried to provide evidence that the simple results found for the simple movement and system considered in this study might very likely hold also for more complex behaviors involving the action of a large number of muscles. We therefore think that, in general, each kind of muscle synergy may encode a different motor feature. Specifically, temporal components encode different temporal phases of the movement, each one playing a specific functional role. Synchronous synergies encode the simultaneous and coordinated actions of specific groups of muscles aiming to achieve a specific motor function (e.g., accelerating the body toward the target). Finally, time-varying synergies encode high-level task-related functions (in this case the direction of motion). This suggests that the type of factorization to be chosen in each condition depends on which of these aspects the study intends to reveal. Note, however, that each type of synergy may not always uniquely characterize one single motor feature, mainly because two or more variables may be correlated. Thus, for instance, the direction of motion can also be inferred from the amplitude of the scaling coefficients relative to the temporal components (**Figure 4B**) once the action exerted by the muscles is known, and the triphasic temporal organization can also be reflected in the temporal evolution of the scaling coefficients in **Figure 5B**.

We conclude by stressing that a unifying synergy extraction method capturing all these aspects at once could simplify the interpretation of future work. If all these representations of synergies are simultaneously valid, then a more general model on top of them should exist. Used systematically, such a model could allow better comparisons and interpretations of muscle synergy studies in more complex motor tasks.

# **ACKNOWLEDGMENTS**

The authors wish to thank Ms. Laura Patanè for helping during data acquisition and Prof. Martin Giese for useful discussions. Dr. Chiovetto's research was partly supported by EU grant FP7-ICT-248311 (AMARSI).



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 16 November 2012; accepted: 10 February 2013; published online: 28 February 2013.*

*Citation: Chiovetto E, Berret B, Delis I, Panzeri S and Pozzo T (2013) Investigating reduction of dimensionality during single-joint elbow movements: a case study on muscle synergies. Front. Comput. Neurosci. 7:11. doi: 10.3389/fncom.2013.00011*

*Copyright © 2013 Chiovetto, Berret, Delis, Panzeri and Pozzo. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

# Effort minimization and synergistic muscle recruitment for three-dimensional force generation

#### *Daniele Borzelli<sup>1</sup>, Denise J. Berger<sup>1</sup>, Dinesh K. Pai<sup>2</sup> and Andrea d'Avella<sup>1</sup>\**

*<sup>1</sup> Laboratory of Neuromotor Physiology, Santa Lucia Foundation, Rome, Italy*

*<sup>2</sup> Department of Computer Science, University of British Columbia, Vancouver, BC, Canada*

### *Edited by:*

*Martin Giese, University Clinic Tuebingen/Hertie Institute, Germany*

### *Reviewed by:*

*Etienne Burdet, Imperial College London, UK Jinsook Roh, Rehabilitation Institute of Chicago, USA*

### *\*Correspondence:*

*Andrea d'Avella, Laboratory of Neuromotor Physiology, Santa Lucia Foundation, Via Ardeatina 306, 00179 Rome, Italy e-mail: a.davella@hsantalucia.it*

To generate a force at the hand in a given spatial direction and with a given magnitude the central nervous system (CNS) has to coordinate the recruitment of many muscles. Because of the redundancy in the musculoskeletal system, the CNS can choose one of infinitely many possible muscle activation patterns which generate the same force. What strategies and constraints underlie such selection is an open issue. The CNS might optimize a performance criterion, such as accuracy or effort. Moreover, the CNS might simplify the solution by constraining it to be a combination of a few muscle synergies, i.e., coordinated recruitments of groups of muscles. We tested whether the CNS generates forces by minimum effort recruitment of either individual muscles or muscle synergies. We compared the activation of arm muscles observed during the generation of isometric forces at the hand across multiple three-dimensional force targets with the activation predicted by either minimizing the sum of squared muscle activations or the sum of squared synergy activations. Muscle synergies were identified from the recorded muscle patterns using non-negative matrix factorization. To perform both optimizations we assumed a linear relationship between the rectified and filtered electromyographic (EMG) signals and force, which we estimated using multiple linear regressions. We found that the minimum effort recruitment of synergies predicted the observed muscle patterns better than the minimum effort recruitment of individual muscles. However, both predictions had errors much larger than the reconstruction error obtained by the synergies, suggesting that the CNS generates three-dimensional forces by sub-optimal recruitment of muscle synergies.

**Keywords: muscle synergies, isometric force, directional tuning, effort minimization, non-negative matrix factorization**

# **INTRODUCTION**

Object manipulation and tool use require accurate control of the three-dimensional force generated at the hand by the contraction of arm muscles. To generate a force at the hand in a given spatial direction and with a given magnitude, the central nervous system (CNS) has to coordinate the recruitment of many muscles. A desired force vector must result from the sum of the force vectors generated by the contraction of each individual muscle. Thus, the control policy implemented by the CNS must select an appropriate muscle activation pattern for each desired force vector output. Such a mapping from force targets to muscle patterns is the inverse of the biomechanical transformation of muscle contraction into output force. However, because of the redundancy of the muscular apparatus, the solution is not unique and infinitely many muscle patterns can generate the same force output. These patterns only differ with respect to the amount of muscle co-contraction, i.e., the part of the muscle contraction which generates force components that cancel each other (Valero-Cuevas, 2009).

How the CNS coordinates many redundant muscles is a long-standing question in motor neuroscience (Bernstein, 1967). One possibility is that the CNS selects the muscle pattern for a specific goal by minimizing some cost, such as effort or inaccuracy (Harris and Wolpert, 1998; Fagg et al., 2002; Todorov and Jordan, 2002; Franklin et al., 2008; Kutch et al., 2008). Such minimization may be performed by searching among all possible muscle patterns, potentially achieving the global minimum of the cost function. As optimization becomes computationally challenging when it involves a large number of variables, the CNS might search for a solution only within the subset of all possible patterns generated by the combination of a small number of muscle synergies, coordinated recruitments of groups of muscles with specific activation balances or profiles (Tresch et al., 1999; Saltiel et al., 2001; d'Avella et al., 2003; Ting and McKay, 2007; Bizzi et al., 2008; Lacquaniti et al., 2012; d'Avella and Lacquaniti, 2013). However, by reducing the number of variables, i.e., constraining the solution to combinations of muscle synergies, only a value of the cost function generally larger than the global minimum can be achieved. Thus, there is a trade-off between optimality and computational complexity in the solution of the coordination problem.
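The minimum-effort selection described above can be written as a small constrained optimization: minimize the sum of squared activations subject to the muscles' pulling directions summing to the target force, with non-negative activations. The sketch below uses a hypothetical 2-D, 4-muscle toy matrix (all numbers invented for illustration, not the study's estimated muscle model) and a generic SLSQP solver:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical "pulling-direction" matrix: columns are the planar force vectors
# produced by unit activation of each of 4 toy muscles (not real data).
M = np.array([[1.0, -1.0, 0.5, -0.5],
              [0.3,  0.3, 1.0,  1.0]])
f_target = np.array([0.5, 0.8])

def min_effort_pattern(M, f):
    """Minimum-effort activation: argmin ||a||^2 s.t. M @ a = f and a >= 0."""
    n = M.shape[1]
    res = minimize(lambda a: a @ a, x0=np.full(n, 0.5), jac=lambda a: 2 * a,
                   bounds=[(0.0, None)] * n,
                   constraints={"type": "eq", "fun": lambda a: M @ a - f},
                   method="SLSQP")
    return res.x

a_opt = min_effort_pattern(M, f_target)
# a_opt reproduces the target force exactly while avoiding unnecessary
# co-contraction of the antagonist pairs
```

A synergy-constrained variant would optimize over synergy activation coefficients instead, replacing `M` with the product of `M` and the synergy matrix, which shrinks the search space at the cost of a generally higher residual effort.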

Whether muscle synergies are a simplifying control strategy actually implemented by the CNS, or whether they represent a parsimonious description of the regularities in the motor output generated by a non-synergistic controller and due to specific task constraints, is a debated issue (Kutch et al., 2008; Tresch and Jarc, 2009; Valero-Cuevas et al., 2009; d'Avella and Pai, 2010; Kutch and Valero-Cuevas, 2012; Berger et al., 2013; Bizzi and Cheung, 2013). Evidence for muscle synergies as neural control strategies has come from the observation of low-dimensionality in the muscle patterns. In many species and behaviors the muscle patterns recorded in a variety of conditions can be reconstructed by a combination of a small number of muscle synergies (Tresch et al., 1999; d'Avella et al., 2003, 2006; Ivanenko et al., 2004; Ting and Macpherson, 2005; Torres-Oviedo and Ting, 2007; Overduin et al., 2008; Dominici et al., 2011; Delis et al., 2013). Moreover, neural recordings and stimulation responses suggest that muscle synergies are encoded in the CNS (Saltiel et al., 2001; Ethier et al., 2006; Gentner and Classen, 2006; Hart and Giszter, 2010; Overduin et al., 2012). However, recent simulation studies have argued that the low-dimensionality that might be observed in the muscle patterns during isometric force generation could derive from biomechanical constraints (Kutch and Valero-Cuevas, 2012) and that the shape of the covariance of the force fluctuations recorded during static isometric force production is not compatible with muscle synergies (Kutch et al., 2008).

The aim of this study is to test whether the control policy employed by the CNS for the generation of force minimizes effort by either independent recruitment of individual muscles or by synergistic recruitment. We have compared the activation of several muscles acting on the shoulder and elbow joints, observed during the generation of static isometric force at the hand across multiple three-dimensional force targets, with the muscle activation predicted by minimizing effort either over the set of all possible muscle patterns or within the subset of muscle patterns generated by combinations of muscle synergies. To derive these predictions, we estimated the isometric force generated by each muscle, assuming a linear relationship between the rectified and filtered electromyographic (EMG) signal and force, and we identified time-invariant muscle synergies by non-negative matrix factorization (NMF) (Lee and Seung, 2001; Tresch et al., 2006). While the observed muscle patterns could be reconstructed accurately by the combination of a small number of muscle synergies, they were not well predicted by either minimum effort recruitment of individual muscles or synergies. However, the synergistic prediction had a significantly lower error than the prediction based on individual muscles.
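The assumed linear EMG-to-force relationship can be estimated by ordinary least squares, as the sketch below illustrates on fully synthetic numbers (all dimensions and values are invented for illustration, not the study's recordings):

```python
import numpy as np

# Estimate the linear EMG-to-force map M, where f(t) = M @ emg(t),
# by multiple linear regression on simulated recordings.
rng = np.random.default_rng(0)
n_muscles, n_samples = 6, 400
M_true = rng.normal(size=(3, n_muscles))      # 3-D force per unit muscle activation
EMG = rng.random((n_muscles, n_samples))      # rectified, filtered EMG envelopes
F = M_true @ EMG + 0.01 * rng.normal(size=(3, n_samples))  # noisy measured forces

# Ordinary least squares: F.T ~ EMG.T @ M.T, solved for all force axes in one call
M_est = np.linalg.lstsq(EMG.T, F.T, rcond=None)[0].T
```

Once such a map is in hand, the force predicted for any candidate muscle pattern is a single matrix-vector product, which is what makes the effort-minimization comparisons tractable.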

# **MATERIALS AND METHODS**

### **PARTICIPANTS**

Nine right-handed subjects (5 males and 4 females, mean age 29.6 ± 4.4 years, age range 24–39) participated in the experiment after giving written informed consent. All procedures were approved by the Ethical Review Board of the Santa Lucia Foundation.

### **EXPERIMENTAL APPARATUS AND DATA ACQUISITION**

Subjects sat on a racing car seat with their torso immobilized by safety belts anchored behind their shoulders and hips. They inserted their right hand and forearm in a splint that immobilized the hand, wrist, and forearm, positioned on a desktop in front of them. The splint was attached to a steel bar and mechanically connected via a steel rod to a 6-axis force transducer (Delta F/T Sensor, ATI Industrial Automation, Apex, NC, USA) mounted below the desktop. In this posture the center of the palm was aligned with the body midline at the height of the sternum and the elbow was flexed by approximately 90◦. The height of the desktop and the distance of the chair from the desktop could be adjusted according to the subject's size. The subject's view of the right hand was occluded by a mirror (29.7 × 21 cm), parallel to the desktop, that reflected the image displayed by a 21-inch LCD monitor (Syncmaster 2233, Samsung Electronics Italia S.p.A., Cernusco sul Naviglio, MI, Italy), also parallel to the desktop (**Figure 1A**). The height of the monitor was adjusted to the height of the subject's eyes and the mirror was positioned halfway between the subject's hand and the monitor. During the experiments subjects wore 3D shutter glasses (3D Vision P854, NVIDIA Corporation, Santa Clara, CA, USA) and viewed stereoscopically a virtual desktop matching the real desktop and a spherical cursor positioned, at rest, approximately at the center of the palm. The virtual scene was rendered by a 3D graphics card (Quadro Fx 3800, NVIDIA) on a PC workstation using custom software. Force targets were shown as transparent gray spheres and force feedback was provided by the displacement of the spherical blue cursor (**Figure 1B**). The scene was updated at 60 Hz with the cursor position processed by a second dedicated data-acquisition PC workstation running a real-time operating system and transmitted to the first workstation through an Ethernet link using the UDP protocol.
Cursor motion was simulated in real time as a mass accelerated by the force applied by the subject on the splint, a viscous force, and an elastic force proportional to the distance from the rest position. The spring constant was set such that the force required to maintain the cursor stationary at a target located 5 cm from the center of the palm had a magnitude equal to 20% of the subject's mean maximum voluntary force (MVF) across force directions (see below). To maintain fast responses to changes in force while reducing the effect of transducer noise when the force was stationary, the mass was adjusted adaptively in the range 15–140 g as a sigmoidal function of the rate of change of the magnitude of the recorded force. The damping constant was set to make the system critically damped.
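These dynamics amount to a driven mass-spring-damper system with critical damping. A minimal simulation sketch, with an illustrative stiffness and a fixed 100 g mass (the actual apparatus adapted the mass between 15 and 140 g and scaled the stiffness to each subject's MVF):

```python
import numpy as np

def simulate_cursor(force, k=400.0, mass=0.1, dt=1e-3, n_steps=3000):
    """Integrate m*a = F_applied - b*v - k*x with critical damping.

    `k` (N/m), `mass` (kg), and the 1 kHz integration step are
    illustrative values, not the apparatus parameters.
    """
    b = 2.0 * np.sqrt(k * mass)   # critical damping: b = 2*sqrt(k*m)
    x, v = 0.0, 0.0
    for _ in range(n_steps):
        a = (force - b * v - k * x) / mass
        v += a * dt               # semi-implicit Euler step
        x += v * dt
    return x

# a constant 20 N force settles the cursor where the elastic force
# balances the applied force: x = F/k = 0.05 m (5 cm)
x_ss = simulate_cursor(force=20.0)
```

With the spring constant chosen this way, holding the cursor on a target 5 cm away requires a constant force of 0.2 MVF, as in the protocol.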

Electromyographic activity from 17 muscles acting on the right shoulder and elbow was recorded with active bipolar electrodes (DE 2.1, Delsys Inc., Boston, MA), after band-pass filtering (20–450 Hz) and amplification (gain 1000, Bagnoli-16, Delsys Inc.). The following muscles were recorded: teres major (TeresMaj), infraspinatus (InfraSp), latissimus dorsi (LatDors), inferior trapezius (TrapInf), middle trapezius (TrapMid), superior trapezius (TrapSup), brachioradialis (BracRad), biceps brachii, long head (BicLong), biceps brachii, short head (BicShort), triceps brachii, lateral head (TriLat), triceps brachii, long head (TriLong), triceps brachii, medial head (TriMed), anterior deltoid (DeltA), middle deltoid (DeltM), posterior deltoid (DeltP), pectoralis major clavicular (PectClav), and pectoralis major sternal (PectStern). Correct electrode placement was verified by observing the activation of each muscle during specific maneuvers. Force and EMG data were digitized at 1 kHz using an A/D PCI board (PCI-6229, National Instruments, Austin, TX, USA). Only the forces (Fx, lateral direction on the horizontal plane, positive to the right; Fy, frontal direction on the horizontal plane, positive away from the chest; Fz, vertical direction, positive up) were used during the experiment.

### **EXPERIMENTAL PROTOCOL**

For each subject, the MVF along the directions of the 20 vertices of a dodecahedron was estimated at the beginning of the experiment and used to scale the magnitude of the force targets. For each direction the maximum force magnitude was recorded in two trials in which subjects were instructed to generate maximum force in a spatial direction indicated by an arrow. Subjects then performed a series of 160 trials generating forces in 32 directions (5 repetitions of each of the 32 directions). The target directions were chosen to be approximately uniformly distributed on the surface of a sphere with a radius of 0.2 MVF. Targets were arranged on horizontal planes at different heights. On the Fz = 0 plane, 8 targets were equally distributed on a circumference. The height of the other horizontal force planes was calculated such that the difference in elevation angle (ϕ = tan<sup>−1</sup>(Fz/(Fx<sup>2</sup> + Fy<sup>2</sup>)<sup>1/2</sup>)) of two adjacent planes was approximately equal to the angle between two adjacent targets of the Fz = 0 plane. The number of targets on each plane was chosen such that the azimuth angle (ϑ = tan<sup>−1</sup>(Fy/Fx)) difference between two adjacent targets on the plane was as close as possible to the angle between two targets on the Fz = 0 plane (45◦ for 8 targets, see **Figures 1C,D**). At the beginning of each trial subjects were instructed not to apply any force and to maintain the cursor for 3 s (rest phase) within a transparent yellow sphere with a radius larger than the cursor sphere radius by 2% MVF and aligned with the center of the palm. A target, indicated by a gray transparent sphere with a radius larger than the cursor sphere radius by 2% MVF, was then displayed in one of the 32 locations and subjects were instructed to move the cursor to the target by applying force (**Figure 1B**). The target sphere turned yellow when the cursor was inside it.
Finally, subjects were required to maintain the cursor within the target for 3 s (hold phase) to successfully end the trial.
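The plane-by-plane target construction can be sketched as follows. This is an illustrative reconstruction with a 45◦ elevation step and single targets at the poles; it yields 22 directions, not the exact 32-target set used in the experiment:

```python
import numpy as np

def make_targets(base_n=8):
    """Lay force-target directions on horizontal planes so that both the
    elevation step and the azimuth spacing are close to the 45-deg step
    of the Fz = 0 plane. Illustrative sketch only, not the actual set.
    """
    step = 2 * np.pi / base_n                    # 45 deg
    targets = []
    for k in (-2, -1, 0, 1, 2):                  # elevation planes
        phi = k * step                           # plane elevation angle
        if abs(phi) >= np.pi / 2:                # poles hold one target
            targets.append((0.0, 0.0, float(np.sign(phi))))
            continue
        # number of targets keeping azimuth spacing close to 45 deg
        n = max(1, int(np.rint(base_n * np.cos(phi))))
        for j in range(n):
            th = j * 2 * np.pi / n
            targets.append((np.cos(phi) * np.cos(th),
                            np.cos(phi) * np.sin(th),
                            np.sin(phi)))
    return np.array(targets)

targets = make_targets()   # unit vectors, scaled to 0.2 MVF in the protocol
```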

### **DATA ANALYSIS**

EMG data were used to characterize the directional tuning of muscle activations, to identify time-invariant muscle synergies, and, together with force data, to estimate an EMG-to-force matrix. One subject was excluded from the analysis because we realized that during the experiment the position of the cursor when the subject was not applying any force to the splint (at the beginning of each trial) had drifted, likely due to a lack of proper immobilization of the hand and forearm in the splint. A few trials in which the remaining eight subjects were not able to reach or remain in the target (3.4 ± 4.5 over 160 total trials, mean ± *SD*, range 0–13) as well as a few additional trials with EMG artifacts (6.0 ± 4.2, range 1–13) were excluded from the analysis. Finally, a few trials of the MVF block with EMG artifacts were also excluded. The total number of excluded trials was 15.7 ± 12.3, range 1–33.

### *Directional tuning of muscle activations*

EMG data were rectified, digitally low-pass filtered (2nd order Butterworth, 5 Hz cutoff), and re-sampled at 100 Hz to reduce data size. In each trial, the mean EMG activity of each muscle during the last 0.6 s of the rest phase was used to estimate the baseline noise level of that muscle, which was then subtracted from the rest of the data. Filtered EMG waveforms for each muscle were aligned to the beginning of the hold phase and then averaged across repetitions of the same target to construct directional tuning curves. Averaged EMGs for each muscle were normalized to the maximum voluntary contraction across directions (MVC) recorded during the MVF trials.

The directional tuning of each muscle activation was also fitted by a spatial cosine function:

$$m(\mathbf{f};\ \mathbf{f}_{PD}) = \mathbf{f}^T \mathbf{f}_{PD} + m_{\text{offset}} = f_{PD}\left[\cos \varphi \cos \varphi_{PD} \cos(\vartheta - \vartheta_{PD}) + \sin \varphi \sin \varphi_{PD}\right] + m_{\text{offset}},$$

where **f** is the unit vector pointing in the direction of the force target, **f***PD* is a preferred direction vector with length *fPD*, azimuth angle ϑ*PD*, and elevation angle ϕ*PD*, and *m*offset is an offset level. The parameters of the preferred direction vector and the offset were estimated by multiple linear regression (Matlab function regress) and the significance of the tuning was assessed by an *F*-test.
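Because the tuning function is linear in the three components of **f***PD* and in *m*offset, the fit reduces to ordinary multiple linear regression, as in the Matlab `regress` call mentioned above. A minimal sketch with synthetic, noiseless data (the preferred direction and offset values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
F = rng.normal(size=(32, 3))
F /= np.linalg.norm(F, axis=1, keepdims=True)    # unit force-target directions

f_pd_true = np.array([0.8, -0.3, 0.5])           # hypothetical preferred direction
offset_true = 0.2                                # hypothetical offset level
m = F @ f_pd_true + offset_true                  # noiseless cosine tuning

X = np.hstack([F, np.ones((len(F), 1))])         # design matrix [fx fy fz 1]
beta, *_ = np.linalg.lstsq(X, m, rcond=None)
f_pd_hat, offset_hat = beta[:3], beta[3]

amplitude = np.linalg.norm(f_pd_hat)                         # f_PD
azimuth = np.arctan2(f_pd_hat[1], f_pd_hat[0])               # theta_PD
elevation = np.arctan2(f_pd_hat[2],
                       np.hypot(f_pd_hat[0], f_pd_hat[1]))   # phi_PD
```

With noiseless data the regression recovers the preferred direction vector and offset exactly; on real EMG the residuals feed the *F*-test of tuning significance.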

### *Muscle synergies*

Muscle synergies were identified by a NMF algorithm (Lee and Seung, 1999, 2001). Muscle activation vectors (**m***k*) were constructed from the rectified, filtered, and averaged EMG waveforms of each muscle during the hold phase of the *k*-th trial, normalized to MVC after baseline noise subtraction. Each vector (matrix column) was reconstructed as the combination of a unique set of *N* time-invariant synergies (**w***i*) scaled by synergy activation coefficients (*c<sub>i</sub><sup>k</sup>*)

$$\mathbf{m}^k = \sum_{i=1}^{N} c_i^k \mathbf{w}_i$$

or, equivalently, in matrix notation, **M** = **WC**. For each *N* from 1 to the number of muscles, the extraction algorithm was repeated 10 times and the repetition with the highest reconstruction *R*<sup>2</sup> was retained. *R*<sup>2</sup>, the fraction of total variation explained by the synergy model, was defined as 1 − SSE/SST, where SSE is the sum of the squared residuals and SST is the sum of the squared differences between the recorded muscle patterns and their mean.
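The decomposition can be sketched with the multiplicative update rules of Lee and Seung (2001). This is a bare-bones illustration on synthetic data, not the exact extraction pipeline (which repeated the extraction 10 times per *N* and kept the best repetition):

```python
import numpy as np

def nmf(M, N, n_iter=1000, seed=0):
    """Lee-Seung multiplicative updates: find nonnegative W (muscles x N)
    and C (N x trials) such that M is approximated by W @ C."""
    rng = np.random.default_rng(seed)
    W = rng.random((M.shape[0], N)) + 1e-6
    C = rng.random((N, M.shape[1])) + 1e-6
    eps = 1e-12   # guards against division by zero
    for _ in range(n_iter):
        C *= (W.T @ M) / (W.T @ W @ C + eps)
        W *= (M @ C.T) / (W @ C @ C.T + eps)
    return W, C

def r_squared(M, W, C):
    """R^2 = 1 - SSE/SST, with SST taken around the mean muscle pattern."""
    sse = np.sum((M - W @ C) ** 2)
    sst = np.sum((M - M.mean(axis=1, keepdims=True)) ** 2)
    return 1.0 - sse / sst

# synthetic check: data built from 3 nonnegative synergies is well
# reconstructed by a 3-synergy decomposition
rng = np.random.default_rng(1)
M = rng.random((17, 3)) @ rng.random((3, 150))
W, C = nmf(M, 3)
r2_val = r_squared(M, W, C)
```

The multiplicative updates preserve nonnegativity of both factors, which is what makes the synergy weights and activation coefficients physiologically interpretable.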

The number of synergies *N* is a free parameter that we chose as the smallest number that accurately reconstructed the data variation, taking noise and the directional tuning of the synergy activation coefficients into account. In previous studies using decomposition algorithms to identify muscle synergies, *N* was selected to capture the structured data variation not due to noise either according to a threshold in *R*<sup>2</sup> (Tresch et al., 1999; Ting and Macpherson, 2005; Torres-Oviedo et al., 2006) or by identifying a change in slope in the *R*<sup>2</sup> curve (Cheung et al., 2005; d'Avella et al., 2006; Tresch et al., 2006). We considered both criteria, and we computed (i) the smallest *N* for which the *R*<sup>2</sup> was larger than 0.9 and (ii) the smallest *N* at which the *R*<sup>2</sup> vs. *N* curve had a change in slope (MSE of a linear fit from *N* to 17, the number of muscles, below 10<sup>−4</sup>). In case of mismatch between the numbers of synergies selected according to the two criteria, we chose the set of synergies with a more uniform directional distribution of the preferred directions of the synergy activation coefficients (the direction of the maximum of the cosine function best fitting the directional tuning). To do so, for each of the two synergy sets, we arranged their preferred direction vectors on a unit sphere, we considered all pairs, and we selected the set with the smallest number of pairs with an angular difference below 20◦. Finally, the elements of each synergy vector (**w***i*) in the selected set were normalized to their maximum value.
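The two selection criteria can be sketched as follows, applied to an *R*<sup>2</sup> vs. *N* curve (toy values; the preferred-direction tie-breaking step is omitted):

```python
import numpy as np

def choose_n_synergies(r2, r2_thresh=0.9, mse_thresh=1e-4):
    """Apply both criteria to an R^2-vs-N curve (r2[i] is the R^2 obtained
    with i+1 synergies):
    (i)  smallest N with R^2 above r2_thresh;
    (ii) smallest N at which a straight line fitted to the curve from N to
         the last point has mean squared error below mse_thresh.
    """
    n_max = len(r2)
    n_thresh = next(i + 1 for i, v in enumerate(r2) if v > r2_thresh)
    n_slope = n_max
    for n in range(1, n_max + 1):
        x = np.arange(n, n_max + 1)
        y = np.asarray(r2[n - 1:], dtype=float)
        coef = np.polyfit(x, y, 1)               # straight-line fit
        if np.mean((np.polyval(coef, x) - y) ** 2) < mse_thresh:
            n_slope = n
            break
    return n_thresh, n_slope

# toy R^2 curve saturating after 6 synergies (values are illustrative)
r2_curve = [0.45, 0.62, 0.75, 0.84, 0.88, 0.93, 0.945, 0.955, 0.96, 0.965]
n_i, n_ii = choose_n_synergies(r2_curve)
```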

Directional tuning curves for the synergy activation coefficients, as for the muscle activations, were constructed by averaging their values in the hold phase and across trials to the same target.

### *EMG-to-force matrix*

The isometric end-point force (**f**) generated at the hand with the arm in a fixed posture (as both the trunk and the forearm were immobilized) by a muscle activation pattern (**m**) was modeled as a linear combination of the end-point forces associated to each muscle, **f** = **H m**, where **H** is a matrix with dimensions [3 × *Nm*] (*Nm* number of muscles). For each subject we estimated such matrix using multiple linear regressions of each force component, low-pass filtered (2nd order Butterworth, 5 Hz cutoff), on the rectified, filtered, re-sampled, baseline-subtracted, MVC-normalized EMG data recorded during the hold phase in all conditions. While the relationship between muscle activation and end-point force is generally not linear, for the low muscle activations required by the force magnitude of the targets used in the experiment (0.2 MVF) linearity provided an adequate approximation (Lawrence and De Luca, 1983).
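The estimation step reduces to one least-squares regression per force axis. A sketch with synthetic data (the true matrix `H_true`, the number of samples, and the noise level are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
H_true = rng.normal(size=(3, 17))      # hypothetical pulling-force matrix
M = rng.random((17, 200))              # nonnegative "EMG" data (muscle x sample)
F = H_true @ M + 0.01 * rng.normal(size=(3, 200))   # forces with small noise

# one least-squares regression per force axis: H solves F ~= H M,
# obtained from the transposed system M.T @ H.T = F.T
H_hat = np.linalg.lstsq(M.T, F.T, rcond=None)[0].T

# predicted end-point force for a muscle pattern m: f = H m
f_pred = H_hat @ M[:, 0]
```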

### *Minimum effort predictions*

We predicted the observed muscle activation pattern (**m**obs) for each force target either by minimizing the sum of squared muscle activations (**m**musc) (Buchanan and Shreeve, 1996; van Bolhuis and Gielen, 1999; Todorov and Jordan, 2002) or by minimizing the sum of squared synergy activations (**m**syn), under the constraint that the predicted pattern generates the desired force target according to the linear EMG-to-force mapping (**H**):

$$\mathbf{m}^{\text{musc}} = \arg\min(\|\mathbf{m}\|^2) \quad \text{such that } \mathbf{f} = \mathbf{Hm}$$

$$\begin{cases} \mathbf{m}^{\text{syn}} = \mathbf{Wc}^{\text{syn}}\\ \mathbf{c}^{\text{syn}} = \arg\min(\|\mathbf{c}\|^2) \quad \text{such that } \mathbf{f} = \mathbf{HWc} \end{cases}$$

We used the MATLAB function quadprog to find these minima.
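Without sign constraints, both equality-constrained quadratic programs have closed-form minimum-norm solutions via the Moore-Penrose pseudoinverse. The sketch below uses that shortcut on synthetic matrices; quadprog, as used in the paper, additionally accommodates nonnegativity bounds on the activations, which are omitted here:

```python
import numpy as np

rng = np.random.default_rng(3)
H = rng.normal(size=(3, 17))     # synthetic EMG-to-force matrix
W = rng.random((17, 6))          # synthetic synergy matrix
f = np.array([1.0, 0.5, -0.2])   # desired end-point force

# minimum muscle effort: m with smallest ||m||^2 such that f = H m
m_musc = np.linalg.pinv(H) @ f

# minimum synergy effort: c with smallest ||c||^2 such that f = H W c,
# then map back to muscle space
c_syn = np.linalg.pinv(H @ W) @ f
m_syn = W @ c_syn
```

Both patterns reproduce the target force exactly through the linear mapping; since `m_musc` has the smallest norm among all patterns satisfying **f** = **Hm**, its norm can never exceed that of `m_syn`.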

### *Minimum effort prediction with random muscle synergies*

We performed a Monte Carlo simulation to assess the significance of the prediction obtained by minimizing synergy effort. We compared the mean squared residual of the minimum synergy effort prediction with the distribution of the mean squared residuals obtained with 200 sets of random synergies. For each subject, after randomly shuffling each row (muscle) of the muscle activation data matrix (**M**) over the columns (trials), random synergies were generated either by selecting a number of columns equal to the number of synergies identified from the data or by extracting the same number of synergies from the shuffled data matrix.
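The shuffling step can be sketched as follows: permuting each muscle's row independently over trials destroys inter-muscle correlations while preserving each muscle's activation distribution (synthetic data matrix; 6 synergies assumed):

```python
import numpy as np

rng = np.random.default_rng(4)
M = rng.random((17, 150))          # muscle x trial data matrix (synthetic)

# permute each row (muscle) independently over columns (trials)
M_shuf = np.array([rng.permutation(row) for row in M])

# one set of random synergies: columns drawn from the shuffled matrix
# (the alternative in the text extracts synergies from M_shuf by NMF)
n_syn = 6
cols = rng.choice(M_shuf.shape[1], size=n_syn, replace=False)
W_rand = M_shuf[:, cols]
```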

### *Statistical analysis*

A Wilcoxon rank-sum test was performed for each subject to evaluate whether the mean over force targets of the squared error of the prediction obtained minimizing muscle effort, (ε<sup>musc</sup>)<sup>2</sup> = ||**m**musc − **m**obs||<sup>2</sup>, was statistically different from that of the prediction obtained minimizing synergy effort, (ε<sup>syn</sup>)<sup>2</sup> = ||**m**syn − **m**obs||<sup>2</sup>.

# **RESULTS**

All subjects were able to reach the force targets and to maintain the force within the required 2% MVF tolerance for 3 s. Examples of the raw EMG and force data recorded in three trials to targets along the positive Fx, Fy, and Fz axes are shown in **Figure 2**.

### **DIRECTIONAL TUNING OF MUSCLE ACTIVATIONS**

As in previous studies (Flanders and Soechting, 1990; Roh et al., 2012), we found that the activation of most muscles was modulated by force direction. **Figure 3** illustrates the modulation of the activity of the 17 arm muscles recorded in subject 8 as a function of the azimuth of the force target on three different horizontal planes (elevation angles: −29, 0, 29◦). For each muscle and target elevation, the directional tuning of the mean activity during the hold phase is illustrated by a polar plot in which the muscle activity is indicated by the radial distance of a marker in the direction of the target azimuth. Most muscles showed a directional tuning resembling that expected from a spatial cosine function. For muscles with a preferred direction vector of the best fitting spatial cosine function lying close to the horizontal plane (e.g., TrapMid and PectStern), the azimuth directional tuning resembles a circle tangent to the origin. For muscles with a large vertical component in their preferred direction (e.g., BicLong, BicShort, TriLong, and TriMed), the dependence of their activation on elevation is evident in the different radii of the circles. One muscle (BracRad) had a very narrow and non-significant spatial cosine tuning (*p* = 0.14). Other muscles had a significant (*p* < 0.05) but poor (*R*<sup>2</sup> value of the cosine fit less than 0.5) spatial cosine tuning (TeresMaj, LatDors, TrapInf, and TrapSup). Across subjects, 0.9 ± 0.3 (mean ± *SD*) muscles had a non-significant (*p* > 0.05) spatial cosine tuning and 3.2 ± 1.8 a poor fit (*R*<sup>2</sup> < 0.5).

### **MUSCLE SYNERGIES**

We decomposed the muscle patterns recorded during the hold phase as combinations of muscle synergies identified by the NMF algorithm. Across subjects (**Figure 4**) the number of synergies selected according to a threshold either in the fraction of the total data variation explained by the synergies (*R*<sup>2</sup>) or in the mean squared error of a linear fit of the final portion of the *R*<sup>2</sup> curve (see Materials and Methods) ranged from 6 to 7 (6.4 ± 0.5, mean ± *SD*). The corresponding *R*<sup>2</sup> values ranged from 0.90 to 0.95 (0.93 ± 0.01). Thus, a small number of synergies captured the modulation of activity in many arm muscles across directions and magnitudes of isometric force generated at the hand. **Figure 5A** shows the six synergies identified in the muscle patterns of subject 8. Each synergy has a different balance of activation across muscles, with some muscles more strongly active than others (TrapMid, DeltM, and DeltP in W1; TeresMaj, LatDors, TrapInf, TrapMid, TriLong, DeltP, and PectStern in W2; TriLat, TriLong, TriMed, DeltM, and DeltP in W3; InfraSp and TrapSup in W4; BicLong and BicShort in W5; TeresMaj, PectStern, and PectClav in W6) and with many muscles recruited in multiple synergies.

Synergy activation coefficients were in most cases also well captured by a spatial cosine function. The directional tuning of the activation coefficients of the six synergies of subject 8 (**Figure 5B**) was always significant (*p* < 0.0001) and well reconstructed by a cosine fit (*R*<sup>2</sup> > 0.5). Across subjects, only subject 6 had 4 out of 7 synergy activation coefficients not well fitted by a cosine function (*p* = 0.40, 0.22, 0.05, 0.05) while all other subjects had a significant (*p* < 0.05) spatial cosine tuning. Across all subjects, only 1.2 ± 1.7 synergy activation coefficients had a poor fit (*R*<sup>2</sup> < 0.5).

**FIGURE 4 | Selection of number of synergies.** The number of synergies (*N*) is chosen for each subject as (i) the smallest *N* for which the *R*<sup>2</sup> value (*blue markers and line*) was larger than 0.9 (*red dashed line*) or (ii) the point at which the *R*<sup>2</sup> vs. *N* curve had a change in slope [MSE of linear fit from *N* to max(*N*) below 10<sup>−4</sup>, *green dashed line*]. In case of mismatch between the two criteria, the set of synergies with the smallest number of similar preferred directions was selected (*red/green marker*, smallest number of synergy pairs with an angular difference between preferred directions below 20◦).

**FIGURE 5 (caption fragment) |** […] shows the components of one synergy vector, normalized to its maximum. **(B)** Directional tuning (polar plot as in **Figure 3**) of the synergy activation coefficients for force targets on three horizontal planes.

### **EMG-TO-FORCE MATRIX AND SYNERGY DIRECTIONAL TUNING**

As in previous studies of muscle activation during isometric force production (Osu and Gomi, 1999; Valero-Cuevas et al., 2009), we modeled the mapping between EMGs and sub-maximal magnitude (20% MVF) end-point force linearly. An EMG-to-force matrix (**H**) was estimated for each subject with multiple linear regressions of the mean EMGs and forces recorded in the hold phase. **Figure 6A** illustrates the force vectors associated to the activation of each muscle (columns of **H**) for subject 8. These force vectors in most cases matched the pulling directions of the muscles expected from their anatomical configuration. For example, on the horizontal plane (*left*), BracRad (elbow flexor) and TeresMaj (shoulder internal rotator and adductor) were associated to dorsally directed (negative Fy) forces, TriMed (elbow extensor) to a ventrally directed (positive Fy) force, PectClav and PectStern (shoulder flexors) to medially directed (negative Fx) forces, and DeltM (shoulder abductor) to a laterally directed (positive Fx) force. On the sagittal plane (*middle*), DeltA (shoulder adductor), InfraSp (shoulder external rotator), and PectClav showed a large rostral (positive Fz) and ventral force, BracRad a large rostral and dorsal force, TeresMaj a large caudal (negative Fz) and dorsal force, and TriMed a large caudal and ventral force. In the frontal plane (*right*) the two portions of pectoralis major showed distinct rostro-caudal (Fz) components. Across subjects, the forces recorded during the hold phase were reconstructed accurately by the product of the EMG-to-force matrix and the recorded EMGs (*R*<sup>2</sup> = 0.89 ± 0.02, mean ± *SD*, *n* = 8, for the reconstruction of the individual force samples in all trials; *R*<sup>2</sup> = 0.97 ± 0.01 for the reconstruction of the forces averaged across time and trials to the same target by the averaged EMGs).

We also estimated the force associated to the activation of individual muscle synergies by multiplying the EMG-to-force matrix with the synergy matrix (columns of the **HW** matrix, **Figure 6B**). Each synergy had a distinct force direction in space. W1 was associated to a lateral force, W2 to a dorso-caudal force, W3 to a ventro-caudal force, W4 to a ventro-rostral-lateral force, W5 to a dorso-rostral force, and W6 to a medial force. Moreover, the angular differences between the force directions of the synergies were larger than those between the force directions of individual muscles.

### **MUSCLE ACTIVATIONS PREDICTED BY MINIMUM EFFORT CRITERIA**

We compared the muscle activations observed in all force directions with those predicted by minimizing either muscle effort or synergy effort. Examples of the directional tuning curves on the horizontal force plane (polar plot, *left*) and for all directions (*right*) of three muscles (InfraSp, TrapMid, and DeltM) of subject 8 are illustrated in **Figure 7**. In all three cases the predicted tuning curves peak in the same directions as the observed curves but in some cases they do not fit the whole curve well. For InfraSp (*first row*), the minimum muscle effort curve underestimates the observed curve and the minimum synergy effort curve overestimates it. For TrapMid (*second row*), muscle effort minimization predicts a very weak activation while the minimum synergy effort prediction closely matches the observed data. For DeltM (*third row*), the minimum synergy effort prediction again matches the observed data while the minimum muscle effort prediction overestimates them. These differences between the two predictions depend on how the forces associated to the muscles (the columns of the **H** matrix, **Figure 6A**) and the synergies (**Figure 6B**) can be combined to minimize effort. For example, the minimum muscle effort criterion predicts an activation of TrapMid much weaker than the minimum synergy effort criterion because the minimum muscle norm solution is achieved by recruiting more strongly other muscles with a pulling direction close to that of TrapMid but with a larger force magnitude (in particular BracRad, see **Figure 6A**). In contrast, TrapMid has a stronger activation with the minimum synergy norm solution because it is recruited within W1 (see **Figure 5**) and no other synergies can generate forces in the medial-dorsal direction with small activations.

**FIGURE 7 (caption fragment) |** […] **Right:** Average EMG activity for all 32 targets. *Blue* markers and lines (interpolating the markers with spline curves in polar coordinates) represent experimental data, *green* markers and lines (interpolating the markers with spline curves with negative values set to zero) represent predictions according to the linear EMG-to-force model with the minimum muscle effort criterion, *red* markers and lines with the minimum synergy effort criterion.

Across subjects, we noticed that the mean residual of the minimum muscle effort prediction over all muscles and targets was always negative (sign test, *p* < 0.0001 for all subjects, see *green bars* in **Figure 8A**) and that the mean residual of the minimum synergy effort prediction was always positive (*p* < 0.01 for all subjects except subject 6, *red bars* in **Figure 8A**). Thus, the minimum muscle effort criterion underestimated the observed muscle activations and the minimum synergy effort criterion overestimated them. The minimum muscle effort underestimation corresponds to a larger than minimal amount of co-contraction in the observed muscle patterns. Indeed, the amount of co-contraction, quantified by the mean Euclidean norm of the projection of the muscle patterns onto the null space of the EMG-to-force matrix, was significantly higher for the observed data than for the minimum muscle effort prediction (sign test, *p* < 0.0001 for all subjects; mean ± *SD* across subjects: 0.16 ± 0.04 for the data and 0.09 ± 0.02 for the prediction). The mean null space norm for the minimum synergy effort criterion (mean ± *SD* across subjects: 0.19 ± 0.04) was higher than the mean norm for the minimum muscle effort criterion but also slightly higher than the mean norm for the observed data (sign test, *p* < 0.05 for subjects 2, 4, 5, 7, and 8), possibly due to inaccuracies in the estimation of the EMG-to-force matrix. Finally, we found that the residuals for many muscles were not normally distributed. Across subjects, the residuals of the prediction of individual muscle activations had a distribution over targets significantly different from the normal distribution (Lilliefors test, *p* < 0.05) in 62% of cases (84 of 136 cases; 17 muscles in 8 subjects) for the minimum muscle effort model and in 31% of cases for the minimum synergy effort model. However, we could not discern any clear pattern in the residuals.
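The co-contraction measure used above can be sketched as a projection onto the null space of the EMG-to-force matrix: the projected component produces no end-point force, so its norm quantifies force-irrelevant co-activation (synthetic **H** and muscle pattern):

```python
import numpy as np

rng = np.random.default_rng(5)
H = rng.normal(size=(3, 17))     # synthetic EMG-to-force matrix
m = rng.random(17)               # synthetic observed muscle pattern

# orthogonal projector onto null(H): I - pinv(H) @ H
P_null = np.eye(17) - np.linalg.pinv(H) @ H
m_null = P_null @ m              # force-free component of the pattern
cocontraction = np.linalg.norm(m_null)
```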

We then compared the prediction error magnitudes. We found that the mean squared residual of the minimum synergy effort prediction was lower than the mean squared residual of the minimum muscle effort prediction (**Figure 8B**). The difference of the squared residual, averaged across muscles and targets, between the two criteria was significant (Wilcoxon rank-sum test, *p* < 0.0001, *n* = 8). To assess the significance of these results we compared, for each subject, the mean squared residual of the minimum synergy effort prediction with the distribution of the mean squared residuals obtained applying the minimum effort criterion to random synergies. We performed a Monte Carlo simulation, generating random synergies for each subject either by randomly shuffling the EMG data or by performing NMF on randomly shuffled EMG data. We found that the mean squared residual obtained with the synergies extracted from the data was much smaller than the residuals obtained with both types of random synergies in all subjects (empirical *p* < 0.005), indicating that the value of the mean squared residual of the minimum synergy effort prediction was not simply due to the small number of synergies but depended on the actual structure of the synergies.

Finally, we compared the squared prediction error of the two models with the reconstruction error of the synergies. For all subjects, the fraction of the variation of the muscle patterns across force targets, averaged over repetitions to the same target, explained by the combinations of the synergies (**Figure 8B**, *black bars*, *R*<sup>2</sup> = 0.95 ± 0.01, mean ± *SD*) was much higher than the fraction explained by either model. However, the minimum muscle effort model had a smaller *R*<sup>2</sup> value (*green bars*, 0.02 ± 0.20) than the minimum synergy effort model (*red bars*, 0.65 ± 0.10).

# **DISCUSSION**

We investigated the muscle patterns underlying the generation of isometric force at the hand along 32 uniformly distributed directions in three-dimensional space. Across subjects, the directional tuning of most muscles was well captured by a spatial cosine function and the muscle patterns for all force targets could be reconstructed by combinations of 6 or 7 muscle synergies identified by NMF. We then estimated the force associated to muscle activation by multiple linear regressions and we used such linear mapping to predict the minimum muscle effort and the minimum synergy effort muscle patterns for each force target. We found that the prediction error with both minimum effort criteria was larger than the synergy reconstruction error but the error obtained minimizing synergy effort was significantly smaller than the error obtained minimizing muscle effort. These results suggest that the CNS recruits suboptimal combinations of muscle synergies to generate isometric forces.

The estimation of the mapping between muscle activity and isometric force at the hand was necessary to predict the minimum effort muscle patterns for a given force target. We approximated such mapping during the generation of a static isometric force (hold phase) as a linear transformation between rectified, low-pass filtered, MVC-normalized EMGs and low-pass filtered forces. We could then estimate an EMG-to-force matrix by linear regression of the force components as a function of the activity of all recorded muscles. The assumption of linearity is reasonable when the posture does not change and the generated forces are much smaller than the MVF (Lawrence and De Luca, 1983), as in our case. Linear models have been used before to predict isometric forces from EMG recordings (Valero-Cuevas et al., 2009) and minimum effort muscle patterns (Fagg et al., 2002). However, our linear approximation of the mapping between muscle activity and force may have contributed to the model prediction error. Qualitatively, the muscle pulling directions estimated by multiple linear regressions appeared compatible with the directions expected from the known anatomical arrangement and mechanical action of the muscles. A quantitative evaluation of the EMG-to-force matrix obtained with our simple procedure might be possible by comparing such matrix with one derived using a detailed musculoskeletal model of the arm (Holzbaur et al., 2005), but such comparison is challenging because of the many subject-specific anatomical and physiological parameters that need to be determined in order to generate reliable predictions with a musculoskeletal model. Thus, we believe that our simplifying assumptions are adequate for the purpose of comparing the two minimum effort criteria, since both minimizations rely on the same EMG-to-force matrix.

A second concern with our approach is the selection of the number of synergies. We used two criteria frequently employed in the muscle synergy literature (Tresch et al., 2006; Delis et al., 2013): the total variation accounted for by the synergies (synergy reconstruction *R*<sup>2</sup>) (Tresch et al., 1999; Torres-Oviedo et al., 2006) and the detection of a change in slope in the *R*<sup>2</sup> curve (d'Avella et al., 2003; Cheung et al., 2005). In case of discrepancy between the two criteria we selected the number of synergies with a more uniform distribution of the preferred directions of the synergy activation coefficients. Both criteria depend, however, on *ad-hoc* thresholds and thus, while they ensure a meaningful comparison across subjects, they cannot guarantee that the correct number of synergies is selected. In a recent study of the muscle synergies underlying force production in a task similar to ours (Roh et al., 2012), a smaller number of synergies has been reported (3–5). Such difference may be due to the smaller number of muscles recorded in that study (8 vs. 17 in ours) and to the different definition of variance accounted for (VAF). As muscle patterns are multidimensional observations, we referred the synergy reconstruction error to the total variation (Mardia et al., 1979) of the muscle patterns, i.e., the multidimensional generalization of the variance of a scalar observation, and we defined *R*<sup>2</sup> = 1 − SSE/SST, with SSE the sum of the squared residuals and SST the sum of the squared deviations from the mean muscle pattern, proportional to the total variation (d'Avella et al., 2006; Delis et al., 2013). Roh and colleagues, in contrast, defined VAF = 100 × (1 − SSE/SST), with SST the sum of the squared data, i.e., without subtracting the mean muscle pattern. As a consequence, such VAF value is higher than the *R*<sup>2</sup> value for the same number of synergies, and a smaller number of synergies is selected with the same threshold (90%).
When we performed the same analysis as Roh and collaborators on our data, using the same 8 muscles, we found a comparable number of synergies (3–5). Notably, a minimum of 4 synergies is required to generate forces in all spatial directions by non-negative combinations (Davis, 1954).
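The difference between the two metrics can be made concrete with a short sketch on synthetic data (shapes and values assumed): for the same reconstruction, the uncentered VAF of Roh et al. is always at least as high as the mean-subtracted *R*², so a fixed threshold is reached with fewer synergies.

```python
# Sketch (assumed data): the two reconstruction metrics discussed above,
# computed for the same data matrix and the same reconstruction.
import numpy as np

rng = np.random.default_rng(1)
data = rng.uniform(0.2, 1.0, size=(17, 200))       # muscle patterns (muscles x samples)
recon = data + 0.1 * rng.normal(size=data.shape)   # stand-in synergy reconstruction

sse = np.sum((data - recon) ** 2)

# R^2 references the error to the total variation about the mean pattern.
sst_r2 = np.sum((data - data.mean(axis=1, keepdims=True)) ** 2)
r2 = 1 - sse / sst_r2

# The VAF of Roh et al. references it to the raw sum of squares (no mean removal).
sst_vaf = np.sum(data ** 2)
vaf = 100 * (1 - sse / sst_vaf)

# Since sst_vaf >= sst_r2, VAF/100 >= R^2: the same reconstruction scores
# higher under VAF, so fewer synergies reach a fixed 90% threshold.
print(r2, vaf)
```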

A number of previous studies have investigated whether the observed muscle patterns can be the result of effort minimization. Buchanan and Shreeve (1996) used models of the muscles about the elbow (11 muscles) and wrist (5 muscles) to compare the observed directional dependence of muscle activation with the predictions from the minimization of several cost functions, including the sum of muscle force, stress, and normalized force (Buchanan and Shreeve, 1996). The choice of cost function had little influence on the results, and none of the cost functions was able to reliably estimate muscle activation as a function of force direction, even if predictions at the wrist were more favorable than those at the elbow due to the smaller number of muscles and degrees-of-freedom. A sensitivity analysis indicated that the discrepancies between predicted and observed values could not be explained by errors in the physiological parameters of the models, calling into question the applicability of optimization analysis to the study of such tasks. Our results are in accordance with those observations and extend them to the generation of hand forces by a larger number of muscles acting at the elbow and the shoulder. Moreover, the larger prediction errors that we obtained when minimizing muscle effort rather than synergy effort suggest that the synergistic recruitment of muscles contributes to the sub-optimal co-activation of muscles.

Investigating wrist movements, Fagg and colleagues showed that minimizing effort, defined as we did as the sum of squared muscle activations, yields muscle activation patterns qualitatively similar to those observed experimentally, in particular reproducing the observed cosine-like recruitment of the muscles as a function of movement direction and appropriately predicting that certain muscles are recruited more strongly in movement directions that differ significantly from their direction of action (Fagg et al., 2002). While our model predictions also reproduced cosine-like recruitment and qualitatively similar directional tuning curves in several muscles, in many cases we did observe substantial qualitative discrepancies between predicted and observed muscle activations. Such poorer model performance may be due, as observed by Buchanan and Shreeve, to the larger number of muscles and degrees-of-freedom considered in our study.

A recent study used a static quadrupedal musculoskeletal model of the cat to predict limb forces and muscle activity in response to multidirectional postural perturbations while minimizing different formulations of control effort, including muscle and synergy effort (McKay and Ting, 2012). Patterns of muscle activity producing the forces and moments at the center of mass necessary to maintain balance, and the resulting ground reaction forces predicted by the models, were compared to experimental data. Limb forces at different stance distances were well predicted by both minimum-effort solutions. Muscle tuning directions were found to be invariant across postural configurations, similar to experimental data, but the quality of the muscle pattern predictions was not quantified and there also appeared to be discrepancies (see their **Figure 8**), especially for the minimum muscle effort solution (e.g., no activity predicted in biceps femoris and gracilis), matching our observations in the human arm. McKay and Ting concluded that reduced-dimension neural control mechanisms, such as muscle synergies, can achieve kinetics similar to those of optimal solutions, demonstrating the feasibility of muscle synergies as physiological mechanisms for the implementation of near-optimal motor solutions. In our study we could not assess kinetics predictions, as the generation of a specific force target was a constraint in the optimization. However, our analysis of muscle pattern predictions also supports the conclusion that three-dimensional forces are generated as near- or sub-optimal motor solutions by muscle synergy combinations.

The fact that the observed muscle activation patterns did not minimize muscle or synergy effort does not rule out the possibility that they minimized some other cost. The additional co-contraction inherent in the non-minimal effort solutions might be related to an increase in stiffness during the hold phase, possibly due to endpoint stability maximization (Franklin and Milner, 2003). Since the task was isometric, in principle there was no need to increase endpoint stiffness to generate a target output force precisely. On the contrary, because of signal-dependent noise in force production by muscle activation, the precision would decrease with an increase in co-contraction. However, subjects had to control a moving cursor in a realistic virtual environment and they might have adopted a control strategy usually employed when required to generate a force while maintaining a freely moving endpoint, typically a tool, close to a fixed position. In those conditions an increase in stiffness associated with an increase in co-contraction would be an appropriate control strategy to achieve higher positional stability at the cost of additional muscular effort. Thus, as suggested in recent studies, the CNS might adopt habitual rather than optimal (de Rugy et al., 2012) or locally rather than globally optimal (Ganesh et al., 2010) muscle coordination strategies.

Whether muscle synergies are organized by the CNS to simplify motor control and motor learning (Giszter et al., 2007; Bizzi et al., 2008; d'Avella and Pai, 2010; Bizzi and Cheung, 2013; d'Avella and Lacquaniti, 2013) or whether they result from biomechanical and task constraints (Todorov and Jordan, 2002; Kutch et al., 2008; Kutch and Valero-Cuevas, 2012) is a controversial issue (Tresch and Jarc, 2009). Evidence for muscle synergies as neural control strategies has come mainly from the low-dimensionality of the muscle patterns recorded during a variety of behaviors and task conditions and across different species (Tresch et al., 1999; d'Avella et al., 2003, 2006, 2008, 2011; Krishnamoorthy et al., 2003; Hart and Giszter, 2004; Ivanenko et al., 2004, 2005; Cheung et al., 2005; Ting and Macpherson, 2005; Overduin et al., 2008; Muceli et al., 2010; Dominici et al., 2011; Chvatal and Ting, 2013; D'Andola et al., 2013; Gentner et al., 2013), from neural recordings and stimulation (Saltiel et al., 2001; Ethier et al., 2006; Gentner and Classen, 2006; Gentner et al., 2010; Hart and Giszter, 2010; Overduin et al., 2012), and, recently, from the observation that adaptation to a perturbation of the normal mapping between muscle activity and force, simulated in a virtual environment using myoelectric control, is slower when the perturbation is not compatible with the synergies than when it is (Berger et al., 2013).

Two recent studies (Kutch et al., 2008; Kutch and Valero-Cuevas, 2012) have argued against the neural origin of the muscle synergies involved in the generation of isometric forces. In the first study, Kutch and colleagues compared the directional dependence of the covariance of the force fluctuations observed experimentally during the generation of planar isometric forces with the index finger with the directional dependence predicted by either a minimum synergy effort model or a minimum muscle effort model (Kutch et al., 2008). They argued that, if individual muscles are activated flexibly and the force they generate is affected by signal-dependent noise (Harris and Wolpert, 1998), the force generated in the direction of action of an individual muscle must show a covariance ellipse elongated in the direction of the force. In contrast, if muscles are recruited within fixed synergies, multiple muscles are always activated simultaneously and the force covariance must be, on average, less elongated in the direction of the target force. For isometric forces generated by the index finger on a plane, the observed force covariance directedness was found to be more in agreement with the directedness predicted by minimum muscle effort than with that predicted by minimum synergy effort. However, we wonder whether the results of Kutch and colleagues depended on the fact that the synergies used in their calculation were not extracted from the data but generated randomly, while the directedness of the synergy model was evaluated only in three fixed directions corresponding to the peak values of the directedness of the data. We plan to test the directedness of the force covariance of three-dimensional forces generated at the hand by several arm muscles in a future study.

In a more recent study, Kutch and Valero-Cuevas studied the generation of isometric forces by actuation of the tendons of a cadaveric index finger and with a model of the human leg (Kutch and Valero-Cuevas, 2012). They argued that, if the set of all possible muscle coordination patterns that produce any single endpoint force vector is itself low-dimensional, the observed low-dimensionality of the muscle patterns could be misinterpreted as evidence for neurally-generated muscle synergies. Principal component analysis was performed on the set of all vertices of the solution set in muscle activation space for 16 planar force directions, identified with computational geometry techniques using the linear muscle-to-force mapping derived experimentally or from the model. The dimensionality of all possible coordination patterns was indeed lower than the number of muscles, thus providing an assessment of the upper limit imposed by biomechanics, but, at least for the leg model, it was higher than the dimensionality typically observed in the data. Thus, such a biomechanical limit on dimensionality should be directly compared to the dimensionality extracted from experimentally observed muscle patterns, as we also plan to do, before drawing any conclusion on the neural origin of muscle synergies.

In conclusion, we have demonstrated that the muscle patterns underlying the generation of three-dimensional forces can be reconstructed accurately by the combination of a small number of muscle synergies but could not be predicted accurately by either minimization of muscle effort or minimization of synergy effort. However, the minimum synergy effort model fitted the experimental data much better than the minimum muscle effort model, suggesting that the CNS generates three-dimensional forces by sub-optimal recruitment of muscle synergies.

### **ACKNOWLEDGMENTS**

This work was supported by the Human Frontier Science Program Organization (RGP11/2008), the European Union FP7- ICT program (Adaptive Modular Architectures for Rich Motor skills, AMARSI, Grant 248311), the Canada Research Chairs Program, NSERC, CFI, CIHR, NIH, and the Peter Wall Institute for Advanced Studies.

# **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 30 July 2013; accepted: 05 December 2013; published online: 20 December 2013.*

*Citation: Borzelli D, Berger DJ, Pai DK and d'Avella A (2013) Effort minimization and synergistic muscle recruitment for three-dimensional force generation. Front. Comput. Neurosci. 7:186. doi: 10.3389/fncom.2013.00186*

*This article was submitted to the journal Frontiers in Computational Neuroscience.*

*Copyright © 2013 Borzelli, Berger, Pai and d'Avella. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Dimensionality of joint torques and muscle patterns for reaching

### *Marta Russo1, Mattia D'Andola1, Alessandro Portone1,2, Francesco Lacquaniti1,2,3 and Andrea d'Avella1\**

*<sup>1</sup> Laboratory of Neuromotor Physiology, Santa Lucia Foundation, Rome, Italy*

*<sup>2</sup> Center of Space Biomedicine, University of Rome "Tor Vergata," Rome, Italy*

*<sup>3</sup> Department of Systems Medicine, University of Rome "Tor Vergata," Rome, Italy*

### *Edited by:*

*Martin Giese, University Clinic Tuebingen, Germany*

### *Reviewed by:*

*Simon Giszter, Drexel Med School, USA*

*Stan Gielen, Radboud University Nijmegen, Netherlands*

### *\*Correspondence:*

*Andrea d'Avella, Laboratory of Neuromotor Physiology, Santa Lucia Foundation, Via Ardeatina 306, 00179 Rome, Italy e-mail: a.davella@hsantalucia.it*

Muscle activities underlying many motor behaviors can be generated by a small number of basic activation patterns with specific features shared across movement conditions. Such low-dimensionality suggests that the central nervous system (CNS) relies on a modular organization to simplify control. However, the relationship between the dimensionality of muscle patterns and that of joint torques is not fixed, because of redundancy and non-linearity in mapping the former into the latter, and needs to be investigated. We compared the torques acting at four arm joints during fast reaching movements in different directions in the frontal and sagittal planes and the underlying muscle patterns. The dimensionality of the non-gravitational components of torques and muscle patterns in the spatial, temporal, and spatiotemporal domains was estimated by multidimensional decomposition techniques. The spatial organization of torques was captured by two or three generators, indicating that not all the available coordination patterns are employed by the CNS. A single temporal generator with a biphasic profile was identified, generalizing previous observations on a single plane. The number of spatiotemporal generators was equal to the product of the spatial and temporal dimensionalities and their organization was essentially synchronous. Muscle pattern dimensionalities were higher than torque dimensionalities but also higher than the minimum imposed by the inherent non-negativity of muscle activations. The spatiotemporal dimensionality of the muscle patterns was lower than the product of their spatial and temporal dimensionality, indicating the existence of specific asynchronous coordination patterns. Thus, the larger dimensionalities of the muscle patterns may be required for the CNS to overcome the non-linearities of the musculoskeletal system and to flexibly generate endpoint trajectories with simple kinematic features using a limited number of building blocks.

**Keywords: modularity, reaching movements, human subjects, inverse dynamics, EMGs, muscle synergies, temporal components, joint torques**

# **INTRODUCTION**

How the central nervous system (CNS) coordinates a large number of muscles to generate complex motor behaviors is an open question. The dynamic complexity of the skeletal system with its many degrees of freedom (DoF), the versatility of the motor system, capable of accomplishing many different tasks, and the redundancy and non-linearity of the muscular apparatus all pose challenging control problems. A modular architecture has been proposed as a way for the CNS to tackle the complexity of motor control. In a modular architecture, control is subdivided among basic building blocks, allowing for an efficient yet flexible task decomposition. In particular, a modular generation of the muscle patterns might allow for a low-dimensional representation of the motor output, incorporating knowledge of the dynamic behavior of the musculoskeletal system into a small set of basic functions shared across tasks and conditions. Recently, the modular control hypothesis has been supported by observations of low-dimensionality in the muscle patterns underlying a variety of motor behaviors in different species. Using multidimensional decomposition techniques such as principal component analysis (PCA), factor analysis (FA), independent component analysis (ICA), and non-negative matrix factorization (NMF), it has been possible to reconstruct the muscle activation patterns as the combination of a small number of components (Tresch et al., 2006; Giszter et al., 2007; Ting and McKay, 2007; Bizzi et al., 2008; Tresch and Jarc, 2009; Lacquaniti et al., 2012; d'Avella and Lacquaniti, 2013).
These components may capture different features of the muscle patterns shared across task conditions, such as specific relationships in the strength of activation of groups of muscles, i.e., muscle synergies (Tresch et al., 1999; Ting and Macpherson, 2005) or M-modes (Krishnamoorthy et al., 2003), specific time-courses of the activation waveforms for all muscles, i.e., temporal components (Ivanenko et al., 2004; Dominici et al., 2011), and specific collections of muscle activation waveforms, i.e., time-varying muscle synergies (d'Avella et al., 2003, 2006); in all cases, the muscle patterns are constructed by linear combinations of a small number of generators. However, even if muscle patterns can be accurately described by such generators, task accomplishment depends on the actual joint torques and the consequent joint motions produced by muscle contractions. Thus, to better understand how motor tasks may be accomplished by the combination of a few muscle pattern generators, it is necessary to assess the relationship between the organization of muscle patterns and that of joint torques.
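As an illustration of this family of decompositions, spatial synergy extraction with NMF can be sketched as follows. This is an assumed example on synthetic data, using the standard multiplicative updates of Lee and Seung rather than necessarily the exact algorithms of the cited studies; matrix sizes are arbitrary.

```python
# Sketch (assumed example): extracting non-negative spatial synergies W and
# activation coefficients C such that the muscle patterns M ≈ W C.
import numpy as np

rng = np.random.default_rng(2)
n_muscles, n_samples, n_syn = 19, 300, 4

# Synthetic non-negative muscle patterns built from 4 hidden synergies.
W_true = rng.uniform(size=(n_muscles, n_syn))
C_true = rng.uniform(size=(n_syn, n_samples))
M = W_true @ C_true

# Multiplicative-update NMF: alternately scale C and W so that the squared
# reconstruction error ||M - W C||^2 never increases.
W = rng.uniform(size=(n_muscles, n_syn))
C = rng.uniform(size=(n_syn, n_samples))
for _ in range(500):
    C *= (W.T @ M) / (W.T @ W @ C + 1e-12)
    W *= (M @ C.T) / (W @ C @ C.T + 1e-12)

sse = np.sum((M - W @ C) ** 2)
sst = np.sum((M - M.mean(axis=1, keepdims=True)) ** 2)
print(1 - sse / sst)  # reconstruction R^2, typically close to 1 at the true dimensionality
```

In practice the extraction is repeated for increasing numbers of synergies and the R² curve is inspected, as described for the torque and EMG analyses below.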

While joint torques underlying many different motor behaviors have been investigated extensively, a characterization of their dimensionality with multidimensional decomposition approaches, such as those recently used to analyze muscle patterns, is missing. Joint torque generators for a two-joint arm have been identified before using NMF from simulated data (Chhabra and Jacobs, 2006) but not from experimental data. Focusing on reaching movements in vertical planes, as in many previous studies (Soechting and Lacquaniti, 1981; Lacquaniti et al., 1982; Flanders et al., 1994, 1996; d'Avella et al., 2006, 2008, 2011), our aim was to investigate the dimensionality of joint torques and to compare it with the dimensionality of the muscle patterns. Moreover, we wanted to explore systematically the dimensionality of different types of generators, i.e., generators capturing shared structure in the spatial (across joints or muscles), temporal, and spatiotemporal dimensions. Planar point-to-point reaching movements, for which joint torques can be estimated using a simplified dynamical model of the arm with two joints, are normally associated with bell-shaped velocity profiles and biphasic torque profiles (Morasso, 1981; Soechting and Lacquaniti, 1981). The shape of such profiles is invariant with respect to movement speed (Soechting and Lacquaniti, 1981) or load (Lacquaniti et al., 1982), and the relationship between shoulder and elbow dynamic torques is almost linear (Soechting and Lacquaniti, 1981; Lacquaniti et al., 1986; Gottlieb et al., 1997). Similar observations were made for reaching movements in three-dimensional space (Lacquaniti et al., 1986). These observations indicate that joint torques for reaching have remarkable regularities, suggesting that their dimensionality is also low. One might hypothesize that there is a one-to-one relationship between muscle pattern generators and torque generators.
However, biomechanical characteristics and constraints must be taken into account.

To generate the joint torques **τ**(*t*) required to move the arm along a given joint trajectory **q**(*t*), i.e., torques for which the trajectory is a solution of the arm motion equations [see Equation (3) in Materials and Methods], the CNS, according to the modular control hypothesis, combines a set of *Nm* (spatial, temporal, or spatiotemporal) muscle pattern generators:

$$\mathbf{m}(t) = \sum_{n=1}^{N_m} a_n \mathbf{v}_n(t)$$

where **v***n*(*t*) is the *n*-th spatiotemporal generator or the product of the *n*-th temporal component times the *n*-th spatial weighting vector for spatial and temporal generators (see "Dimensionality of motor commands" in Materials and Methods). The tension generated by the activation of each muscle is determined by the dynamics of the musculotendon unit, which depends non-linearly on muscle length, velocity, and muscle activation. Muscle length and velocity depend on joint angles and joint velocities via a matrix of moment arms. Then, muscle torque depends on joint angles, joint velocities, and muscle activations:

$$\boldsymbol{\tau} = \boldsymbol{\tau}(\mathbf{q}, \dot{\mathbf{q}}, \mathbf{m}).$$

Thus, the required torque profile can be generated by an appropriate combination of the muscle pattern generators, i.e., it can be expressed as a function of the combination coefficients **a**:

$$\boldsymbol{\tau} = \boldsymbol{\tau}\Big(\mathbf{q}, \dot{\mathbf{q}}, \sum_{n} a_n \mathbf{v}_n\Big).$$

The torque does not, in general, depend linearly on the muscle activations and, consequently, on the combination coefficients **a**. When linearity is an adequate approximation, muscle torque can be expressed as a linear combination of the "force fields" associated with each generator, $\boldsymbol{\phi}_n = \boldsymbol{\phi}_n(\mathbf{q}, \dot{\mathbf{q}}) = \boldsymbol{\tau}(\mathbf{q}, \dot{\mathbf{q}}, \mathbf{e}_n)$, where $\mathbf{e}_n$ is the unit vector along the *n*-th dimension in coefficient space:

$$\boldsymbol{\tau} = \sum_{n} a_n \,\boldsymbol{\phi}_n(\mathbf{q}, \dot{\mathbf{q}}),$$

i.e., limb control can be achieved by combination of force-field primitives (Bizzi et al., 1991; Giszter et al., 1993, 2007; Mussa-Ivaldi et al., 1994; Kargo and Giszter, 2000a,b; Kargo et al., 2010; Giszter and Hart, 2013). However, the torque profiles observed across different task conditions can also be expressed as (or approximated by) a linear combination of *N*τ torque generators:

$$\boldsymbol{\tau}(t) = \sum_{n=1}^{N_\tau} b_n \,\mathbf{u}_n(t).$$

Even if there is a one-to-one relationship between muscle pattern generators and force-field primitives, the number of muscle pattern generators must be larger than the number of torque generators because of the non-negativity constraint on muscle activations. As muscles can only pull, muscle pattern generators are combined with non-negative combination coefficients and, even considering a linear muscle-to-torque mapping, at least *N*τ + 1 non-negative generators are required to generate torques spanning an *N*τ-dimensional space (Davis, 1954; Valero-Cuevas, 2009). Moreover, because of the redundancy of the muscular system, different muscle patterns can generate the same torques and thus the muscle pattern dimensionality can be larger than the minimum imposed by non-negativity, i.e., *Nm* ≥ *N*τ + 1. Thus, the minimum number of muscle pattern generators depends on the actual dimensionality of the joint torques required to perform all conditions of a specific task, and the actual number of muscle pattern generators can be larger than this minimum and must be determined experimentally. Importantly, we consider here tasks whose conditions can be described by a set of parameters, such as, for example, the position of the target of a point-to-point reaching movement. Then, since the skeletal system is also redundant for the performance of many tasks, e.g., a specific position of the wrist in space can be achieved with many different joint angle configurations, the actual dimensionality of the joint torques may be lower than the number of joints (i.e., DoF) involved, and it must also be determined experimentally. Finally, while the minimal number of muscle pattern generators might guarantee an optimal solution in terms of computational complexity, it might be suboptimal in terms of other costs such as muscular effort.
Thus, comparing the torque and muscle pattern dimensionality can provide new information on the control strategy employed to perform a specific task.
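The non-negativity bound invoked above (Davis, 1954) can be checked numerically. The following sketch is an assumed planar example, not from the paper: it tests whether non-negative combinations of a set of generators can reach every direction of a two-dimensional output space, showing that *N* generators fail where *N* + 1 suffice.

```python
# Sketch (assumed example): non-negative combinations of the columns of G
# span all output directions only if the columns' cones cover the plane.
import numpy as np
from scipy.optimize import nnls

def covers_all_directions(G, n_test=72):
    """True if every unit target direction is reachable as G @ b with b >= 0."""
    for ang in np.linspace(0, 2 * np.pi, n_test, endpoint=False):
        target = np.array([np.cos(ang), np.sin(ang)])
        _, residual = nnls(G, target)   # non-negative least squares
        if residual > 1e-8:
            return False
    return True

# Three well-spread generators (N + 1 = 3 for a 2D space) cover the plane...
G3 = np.array([[1.0, -0.5, -0.5],
               [0.0, np.sqrt(3) / 2, -np.sqrt(3) / 2]])
# ...but any two of them (N = 2) cannot.
G2 = G3[:, :2]

print(covers_all_directions(G3), covers_all_directions(G2))
```

The same argument, one dimension higher, is why a minimum of four synergies is needed for three-dimensional force generation.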

We analyzed EMG data recorded from 19 muscles and kinematic data collected from markers positioned on the arm of subjects performing fast reaching movements from one starting position to eight targets on the sagittal plane and eight targets on the frontal plane. We used a dynamic model of the arm with four rotational joints (three at the shoulder and one at the elbow) and three translational DoF (the position in space of the shoulder) to estimate joint torques from joint angles with an inverse dynamics computation (Corke, 1996). We then considered the dynamic component of the torques, i.e., the total torques with the gravitational components removed (Gottlieb et al., 1997), and the phasic component of the muscle activity waveforms, i.e., the total rectified and filtered EMG waveforms with the tonic, anti-gravity components removed (Flanders and Herrmann, 1992; d'Avella et al., 2006). Spatial, temporal, and spatiotemporal torque generators were identified by performing PCA on different arrangements of the data matrix. Similarly, spatial, temporal, and spatiotemporal muscle pattern generators were identified with NMF. We first determined the dimensionality of the generators according either to a threshold on the fraction of data variation explained (Tresch et al., 1999; Ting and Macpherson, 2005; Torres-Oviedo et al., 2006; Roh et al., 2012) or to the detection of a "knee" in the curve of the variation explained as a function of the number of generators (d'Avella et al., 2003; Cheung et al., 2005; d'Avella et al., 2006; Tresch et al., 2006). We used the former criterion for the torque data and the latter for the EMG data. However, to directly compare the dimensionality of torques and muscle patterns, we then also used a single criterion which took into account the different intrinsic variability of the two datasets when determining their dimensionality (Cheung et al., 2009).
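The three types of generators correspond to different unfoldings of the same data tensor. A minimal sketch (all shapes assumed for illustration) of the matrix arrangements on which the decompositions would be run:

```python
# Sketch (assumed shapes): unfolding one data tensor for spatial, temporal,
# and spatiotemporal decomposition.
import numpy as np

n_joints, n_time, n_cond = 4, 50, 32
torque = np.random.default_rng(4).normal(size=(n_joints, n_time, n_cond))

# Spatial: variables = joints; observations = all time samples x conditions.
spatial = torque.reshape(n_joints, n_time * n_cond)

# Temporal: variables = time samples; observations = joints x conditions.
temporal = torque.transpose(1, 0, 2).reshape(n_time, n_joints * n_cond)

# Spatiotemporal: variables = joint-time pairs; observations = conditions.
spatiotemporal = torque.reshape(n_joints * n_time, n_cond)

# PCA (or NMF, for non-negative EMG data) applied to each arrangement yields
# spatial, temporal, or spatiotemporal generators, respectively.
print(spatial.shape, temporal.shape, spatiotemporal.shape)
```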

# **MATERIALS AND METHODS**

# **PARTICIPANTS, EXPERIMENTAL APPARATUS, AND TASK**

Four right-handed subjects (aged between 27 and 40) gave their written informed consent to participate in the study, which conformed with the Declaration of Helsinki and had been approved by the Ethical Review Board of the Santa Lucia Foundation. The experimental apparatus and reaching task have been described in detail in a previous report (d'Avella et al., 2006). Briefly, standing subjects gripped with their right hand a handle (weight 180 g) which had a sphere (diameter 4 cm) attached to one extremity. The center of the sphere was aligned with the axis of the forearm at a distance of 12 cm from the center of the palm. Participants were instructed to move the sphere between a central position and eight targets uniformly arranged on a circle at a distance of 15 cm on either the frontal or sagittal plane while minimizing shoulder and wrist movements. The central position was adjusted for each subject so that it required maintaining the upper arm vertical and aligned with the trunk and the elbow flexed at 90°. The targets were indicated by transparent spheres lit from inside by an LED. In each trial, after holding the sphere at the start position for at least 1 s, subjects were instructed to move after a go signal, to reach the target with a movement of a duration (defined as the interval in which the speed of the sphere was above 10% of its maximum) shorter than 400 ms, and to hold there for at least 1 s. Unsuccessful trials were repeated. Each subject performed each movement successfully five times in different blocks of trials, for a total of 160 point-to-point movements (2 planes × 8 targets × 2 directions, from the center to the target and back, × 5 repetitions).

# **DATA ACQUISITION**

The motion of the arm was recorded using an optical motion-tracking system (Optotrak 3020, Northern Digital, Waterloo, Ontario, Canada) with a sampling frequency of 120 Hz and a spatial resolution below 0.1 mm. Active optical markers were positioned on the shoulder (acromion), the upper arm (at the proximal end, close to the head of the humerus), the elbow (epicondylus lateralis), and the wrist (one over the styloid process of the radius and one over the styloid process of the ulna). The motion of the sphere on the handle (end-point) was recorded with an electromagnetic motion-tracking system (Fastrak, Polhemus, Colchester, VT) with a sampling frequency of 120 Hz and a spatial resolution below 4 mm, as estimated by a calibration procedure performed within the workspace used in the experiment.

EMG activity was recorded with active bipolar surface electrodes (DE 2.1; Delsys, Boston, MA) from the following muscles: biceps brachii, short head (BicShort); biceps brachii, long head (BicLong); brachialis (Brac); pronator teres (PronTer); brachioradialis (BrRad); triceps brachii, lateral head (TrLat); triceps brachii, long head (TrLong); triceps brachii, medial head (TrMed); deltoid, anterior (DeltA); deltoid, middle (DeltM); deltoid, posterior (DeltP); pectoralis major, clavicular portion (PectClav); pectoralis major, lower portion (PectLow); trapezius superior (TrapSup); trapezius middle (TrapMid); trapezius inferior (TrapInf); latissimus dorsi (LatDors); teres major (TeresMaj); infraspinatus (InfraSp). The EMG signals were band-pass filtered (20–450 Hz) and amplified (total gain 1000; Bagnoli-16, Delsys Inc.). EMG data were digitized at 1 kHz (PCI-6035E, National Instruments, Austin, TX).
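The standard conditioning applied to such recordings (digitization, then full-wave rectification and zero-phase low-pass filtering to obtain envelopes, as mentioned in the Introduction) can be sketched as follows; the 10 Hz cutoff and filter order are illustrative assumptions, not the paper's parameters.

```python
# Sketch (assumed parameters) of EMG envelope extraction: rectify, then
# low-pass filter with a zero-phase FIR filter.
import numpy as np
from scipy.signal import firwin, filtfilt

fs = 1000.0                       # sampling rate, Hz (EMGs digitized at 1 kHz)
t = np.arange(0, 1.0, 1 / fs)
# Synthetic band-passed EMG: noise whose amplitude is slowly modulated.
raw = np.random.default_rng(3).normal(size=t.size) * (1 + np.sin(2 * np.pi * 2 * t))

rectified = np.abs(raw)                        # full-wave rectification
b = firwin(numtaps=101, cutoff=10.0, fs=fs)    # 10 Hz low-pass FIR (assumed cutoff)
envelope = filtfilt(b, 1.0, rectified)         # zero-phase filtering (no group delay)

print(envelope.shape)
```

Using `filtfilt` (forward-backward filtering) avoids the phase lag a causal filter would introduce, which matters when envelopes are later aligned to kinematic events.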

Data acquisition and experiment control were performed on a workstation with custom software written in LabView (National Instruments, Austin, TX). Fastrak data were processed on-line to compute the movement time and target accuracy and to provide auditory feedback about unsuccessful trials. The experiment control program logged the time of all relevant behavioral events.

### **DATA ANALYSIS**

### *End point kinematics*

All analyses were performed with custom software written in Matlab (Mathworks, Natick, MA). The position and orientation of the handle and the measured geometric parameters of the handle were used to compute the position of the end-point. The data were low-pass filtered (FIR filter; 15 Hz cutoff; zero-phase distortion; Matlab *fir1* and *filtfilt* functions) and differentiated to compute tangential velocity and speed. For each movement we computed the *onset time* and the *end time*, defined respectively as the first and the last time at which the speed profile crossed 10% of its maximum value, and the *movement duration* (MT), defined as the interval between the movement onset and the movement end.
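The onset/end definition can be sketched in a few lines (synthetic bell-shaped speed profile assumed; the analysis was done in Matlab, Python is used here for illustration):

```python
# Sketch (assumed data): movement onset and end as the first and last
# crossings of 10% of the maximum endpoint speed.
import numpy as np

fs = 120.0                                  # kinematic sampling rate, Hz
t = np.arange(0, 1.0, 1 / fs)
speed = np.exp(-((t - 0.5) ** 2) / (2 * 0.08 ** 2))   # bell-shaped speed profile

threshold = 0.1 * speed.max()
above = np.flatnonzero(speed > threshold)
onset_time, end_time = t[above[0]], t[above[-1]]
movement_duration = end_time - onset_time   # MT

print(onset_time, end_time, movement_duration)
```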

# *Arm model*

A kinematic and kinetic model of the arm, incorporating geometrical and inertial parameters of the upper arm and forearm segments, was used to estimate joint angles and joint torques from the recorded spatial positions of the shoulder, elbow, and wrist markers. The kinematic model was developed using the Denavit-Hartenberg (D-H) notation (Hartenberg and Denavit, 1955), i.e., as a chain of articulated links with four parameters for each link (*a*: length, α: twist, *d*: offset, ϑ: joint angle) describing the position and orientation of a Cartesian reference frame fixed on each link with respect to the reference frame fixed on the preceding link of the chain, according to the 4 × 4 homogeneous transformation matrix *T*:

$$T = \begin{bmatrix} \cos \vartheta & -\sin \vartheta \cos \alpha & \sin \vartheta \sin \alpha & a \cos \vartheta \\ \sin \vartheta & \cos \vartheta \cos \alpha & -\cos \vartheta \sin \alpha & a \sin \vartheta \\ 0 & \sin \alpha & \cos \alpha & d \\ 0 & 0 & 0 & 1 \end{bmatrix} . \tag{1}$$

The rotation axis of each joint coincides with the *z* axis of the preceding link in the chain. The *x* axis of each frame is directed along the common normal between the *z* axis of that frame and the *z* axis of the next frame. In this way the joint angle is the angle between the *x* axes of the frames of the two links connected by the joint. We modeled four rotational degrees-of-freedom (DOFs) of the arm (three rotations at the shoulder, i.e., adduction, flexion, and external rotation, and one rotation at the elbow, i.e., elbow flexion; see **Figure 1**) and three translational DOFs of the shoulder. We assumed that the shoulder was a spherical joint (i.e., that the rotation axes of the three shoulder joints intersect at a single point). The lengths of the upper arm, forearm, and hand of each subject were estimated as a function of the subject's weight and height according to regression equations (Winter, 1990). Forearm, hand, and handle were considered a single link (the 7th) of length equal to the sum of the forearm length and the length of the opened hand, thus approximating the total length of the closed hand and the handle along the direction of the forearm axis with the length of the opened hand.
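The per-link transform of Equation (1) can be sketched as a small helper; this is a hypothetical illustration in Python, not the Matlab Robotics Toolbox code the authors used, and the parameter values below are arbitrary.

```python
# Sketch: the D-H homogeneous transform of Equation (1) and its use in
# chaining link frames. Parameter values are illustrative assumptions.
import numpy as np

def dh_transform(a, alpha, d, theta):
    """4x4 homogeneous transform for one link (a: length, alpha: twist,
    d: offset, theta: joint angle), standard D-H convention."""
    ct, st = np.cos(theta), np.sin(theta)
    ca, sa = np.cos(alpha), np.sin(alpha)
    return np.array([
        [ct, -st * ca,  st * sa, a * ct],
        [st,  ct * ca, -ct * sa, a * st],
        [0.0,      sa,       ca,      d],
        [0.0,     0.0,      0.0,    1.0],
    ])

# Chaining link transforms gives the pose of the distal frame in the base
# frame; e.g., an "upper arm" link rotated 90 deg followed by a "forearm" link.
T = dh_transform(0.3, 0.0, 0.0, np.pi / 2) @ dh_transform(0.25, 0.0, 0.0, 0.0)
print(T[:3, 3])  # position of the distal frame origin
```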

The kinetic model of the arm was developed by adding to each link its inertial parameters (mass, center of mass, inertia tensor), also estimated as a function of the subject's weight and height according to regression equations (Zatsiorsky and Seluyanov, 1983). No mass was associated with the first three links, which served only to represent the spatial position of the shoulder; these translational DOFs were nonetheless introduced to take shoulder movements into account when estimating the joint torques. The mass of the upper arm was assigned to the 6th link, which had an offset equal to the length of the upper arm segment. The mass of the forearm, hand, and handle was assigned to the 7th link, associated with elbow flexion. The inertial parameters for this link were computed from the inertial parameters estimated separately from the regression equations for the forearm and hand. As the estimated positions of the centers of mass of the hand and of the handle coincided, the mass of the handle (180 g) was added to the mass of the hand. The moments of inertia were computed with respect to the center of mass of the combined link. The model was implemented in Matlab using the Robotics Toolbox (Corke, 1996, 2011). The D-H parameters of the generic arm model are reported in **Table 1** and the specific geometric and inertial parameters estimated for each subject are reported in **Table 2**.

### *Joint kinematics*

The arm model was used to estimate, at each time sample, the shoulder adduction angle, the shoulder flexion angle, the shoulder external rotation angle, and the elbow flexion angle using the positions of the shoulder and elbow markers and the mean position of the two wrist markers. For each time sample and each joint angle, a vector between two markers aligned with the axis of the limb segment defining the rotation of that joint (i.e., shoulder and elbow markers for shoulder adduction and shoulder flexion, elbow and wrist markers for shoulder external rotation and elbow flexion) was computed first. The segment vector was then transformed into the reference frame associated with the joint, according to the matrices defined by Equation (1), and the angle computed as

$$\vartheta_i = \tan^{-1}(y/x)\tag{2}$$

where *x* and *y* are the coordinates of the vector in the reference frame associated with the joint rotation axis (*z* axis). To compensate for potential misalignment between the tracker *z* axis and the vertical axis, the coordinates of the markers were first rotated into a Cartesian reference frame with the gravitational acceleration along the *z* axis. The direction of the gravitational acceleration

**Table 1 | D-H parameters for the 7 DOFs arm model.**


*Sh, shoulder; El, elbow; LF, forearm link length; LU, upper arm link length; P is for prismatic joints and R is for revolute joints.*

**Table 2 | Arm model parameters for individual subjects.**


*LU, upper arm length; LF, forearm length (including hand and handle); rU, position of the upper arm center-of-mass along the link-6 x axis; rF, position of the forearm center-of-mass along the link-7 x axis; I(lo)U, inertia along the longitudinal axis of the upper arm; I(ap)U, inertia along the antero-posterior axis of the upper arm; I(tr)U, inertia along the transversal axis of the upper arm; I(lo)F, inertia along the longitudinal axis of the forearm* + *hand* + *handle system; I(ap)F, inertia along the antero-posterior axis of the forearm* + *hand* + *handle system; I(tr)F, inertia along the transversal axis of the forearm* + *hand* + *handle system.*

was estimated by means of a calibration procedure based on tracking two markers attached to the fulcrum and the extremity of a pendulum.

Angular velocity and acceleration were computed by numerical differentiation. To validate the kinematic model, forward kinematics was used to compare estimated and measured end-point trajectories.
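A minimal sketch of the angle computation of Equation (2) followed by numerical differentiation; the identity joint frame, the toy trajectory, and the 100 Hz sampling rate are assumptions for illustration (the paper does not specify the differentiation scheme; central differences are used here, and `atan2` is the four-quadrant form of tan⁻¹(*y/x*)):

```python
import numpy as np

def joint_angle(segment_vec, R_joint):
    """Equation (2): angle of a limb-segment vector about a joint's
    rotation (z) axis. R_joint is the 3x3 rotation of the joint frame in
    tracker coordinates; the vector is first expressed in that frame."""
    v = R_joint.T @ segment_vec        # rotate into the joint frame
    return np.arctan2(v[1], v[0])      # four-quadrant tan^-1(y/x)

# Toy trajectory: a segment rotating in the joint's x-y plane, sampled at
# 100 Hz (sampling rate is an assumption).
dt = 0.01
t = np.arange(0.0, 1.0, dt)
true_angle = 0.5 * np.pi * t ** 2      # constant angular acceleration
segs = np.stack([np.cos(true_angle), np.sin(true_angle),
                 np.zeros_like(t)], axis=1)
angles = np.array([joint_angle(s, np.eye(3)) for s in segs])

# Angular velocity and acceleration by central-difference differentiation.
vel = np.gradient(angles, dt)
acc = np.gradient(vel, dt)
```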

### *Inverse dynamics*

Joint angles, joint velocities, and joint accelerations were used to estimate the torque profiles via a recursive Newton-Euler computation (*rne* function of the Matlab Robotics Toolbox). We computed the total torques **τ**

$$\boldsymbol{\tau} = \mathbf{M}(\mathbf{q})\ddot{\mathbf{q}} + \mathbf{C}(\mathbf{q}, \dot{\mathbf{q}})\dot{\mathbf{q}} + \mathbf{G}(\mathbf{q}) \tag{3}$$

where **M** is the manipulator inertia matrix, **C** captures the Coriolis and centripetal torques, and **G** the gravitational torques. To estimate the non-gravitational (dynamic) torques we subtracted the gravitational torques from the total torques.

To validate the inverse dynamics calculation we also performed a forward dynamics simulation (*fdyn* function of Matlab Robotics Toolbox) using the arm model and the estimated torque profiles to reproduce the original joint angle trajectories.
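The inverse-then-forward dynamics consistency check can be illustrated on a one-link analogue of Equation (3), where τ = I·q̈ + m·g·l·cos(q); the parameters, the trajectory, and the semi-implicit Euler integrator are assumptions of this sketch, not the Robotics Toolbox *rne*/*fdyn* routines:

```python
import numpy as np

# One-link "arm" rotating about a horizontal axis, with l the distance of
# the center of mass from the joint. Illustrative parameters only.
I, m, g, l = 0.06, 1.5, 9.81, 0.15
dt = 0.001
t = np.arange(0.0, 1.0, dt)
q_ref = 0.5 * (1 - np.cos(np.pi * t))          # prescribed joint angle
qd_ref = np.gradient(q_ref, dt)
qdd_ref = np.gradient(qd_ref, dt)

# Inverse dynamics (1-DOF analogue of Equation 3): total torque.
tau = I * qdd_ref + m * g * l * np.cos(q_ref)

# Forward dynamics: integrate qdd = (tau - m*g*l*cos(q)) / I and check
# that the original trajectory is reproduced.
q, qd = np.zeros_like(t), np.zeros_like(t)
q[0], qd[0] = q_ref[0], qd_ref[0]
for i in range(len(t) - 1):
    qdd = (tau[i] - m * g * l * np.cos(q[i])) / I
    qd[i + 1] = qd[i] + qdd * dt
    q[i + 1] = q[i] + qd[i + 1] * dt           # semi-implicit Euler step

r2 = 1 - np.sum((q - q_ref) ** 2) / np.sum((q_ref - q_ref.mean()) ** 2)
```

An R² close to 1 indicates that the inverse-dynamics torques, fed back through the forward model, reproduce the original trajectory, which is the logic of the validation described above.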

### *Data preprocessing*

The EMGs for each trial were digitally full-wave rectified, low-pass filtered (FIR filter, 20 Hz cut-off, zero-phase distortion, Matlab *fir1* and *filtfilt* functions), and integrated over 10 ms intervals. In a few cases muscle waveforms showed artifacts, possibly due to a partial detachment of the electrode from the skin or to a high correlation between two or more muscles, and those muscles were removed from further analysis (subject 1: PectLow; subject 2: TrapInf, PectLow).

EMGs and torques for all the trials in each experimental condition (2 planes × 8 targets × 2 directions) were aligned on the time of movement onset and averaged.

Finally, both torques and muscle waveforms were normalized in time to equal MT and resampled with 50 samples per MT. Samples from 0.5 MT before movement onset to 0.5 MT after movement end (total 100 samples) were considered for further analysis.
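The preprocessing pipeline (rectification, zero-phase 20 Hz FIR low-pass, 10-ms integration, resampling to 100 samples) might look as follows in Python, with SciPy's `firwin`/`filtfilt` standing in for Matlab's *fir1*/*filtfilt*; the 1 kHz sampling rate, the 64th-order filter, and the toy signal are assumptions:

```python
import numpy as np
from scipy.signal import firwin, filtfilt

fs = 1000                               # sampling rate in Hz (assumption)
rng = np.random.default_rng(0)
emg = rng.standard_normal(2000)         # toy raw EMG, 2 s

# Full-wave rectification, then zero-phase FIR low-pass at 20 Hz.
rect = np.abs(emg)
b = firwin(65, 20, fs=fs)               # 64th-order FIR (assumption)
env = filtfilt(b, 1.0, rect)

# Integrate over 10-ms (10-sample) intervals.
env10 = env[: len(env) // 10 * 10].reshape(-1, 10).mean(axis=1)

# Time-normalize to 100 samples (0.5 MT before onset to 0.5 MT after end).
norm = np.interp(np.linspace(0, len(env10) - 1, 100),
                 np.arange(len(env10)), env10)
```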

### *Dimensionality of motor commands*

We consider a set of *D* command signals (joint torques or muscle patterns) delivered by a controller in a given time interval (sampled *T* times) to accomplish a task in one of *K* distinct task conditions (e.g., different reaching targets). We hypothesize that a modular controller generates these command signals by modulating and combining a small set of generators whose structure is invariant across all task conditions. The structure of such generators may be defined in the spatial (across signals, i.e., muscles or joints), temporal, and spatiotemporal domains. The dimensionality of the ensemble of command signals is then simply the number of generators necessary to accomplish all *K* task conditions.

*Spatial dimensionality* is the number of generators necessary to capture time-invariant relationships between the signals. For *N* generators:

$$\mathbf{x}^k(t) = \sum_{n=1}^N c_n^k(t)\,\mathbf{w}_n \tag{4}$$

where **x**<sup>k</sup>(*t*) is the set of signals for condition *k*, i.e., a vector-valued (*D*-dimensional) function of time (or a *D* × *T* matrix for discrete time samples), *c<sub>n</sub><sup>k</sup>(t)* is a condition-dependent, time-varying combination coefficient for the *n*-th generator, and **w**<sub>n</sub> is the condition-independent, time-invariant *n*-th spatial generator, i.e., a *D*-dimensional vector capturing the relative activation weights of the different signals.

*Temporal dimensionality* is the number of generators necessary to capture temporal components shared across all signals (i.e., space-invariant). For *N* generators:

$$\mathbf{x}^k(t) = \sum_{n=1}^N c_n(t)\,\mathbf{w}_n^k \tag{5}$$

where **x**<sup>k</sup>(*t*) is again the set of signals for condition *k*, *c<sub>n</sub>(t)* is the condition-independent, time-varying *n*-th generator (or temporal component), and **w**<sub>n</sub><sup>k</sup> is the condition-dependent, time-invariant *D*-dimensional weight vector for the *n*-th component. Notice that the critical difference between the definitions of spatial and temporal generators lies in which factor depends on the task condition (*k*). Indeed, generators are useful concepts only if they can be used for a variety of conditions, thus allowing an effective reduction of the number of parameters to be selected for each condition.

*Spatiotemporal dimensionality* is the number of generators capturing simultaneously invariant spatial and temporal features in the signals. Thus, each generator includes a set of signal components that can be expressed as a time-varying vector. For *N* generators:

$$\mathbf{x}^k(t) = \sum_{n=1}^N a_n^k\,\mathbf{v}_n(t) \tag{6}$$

where *a<sub>n</sub><sup>k</sup>* is a condition-dependent combination coefficient for the *n*-th generator and **v**<sub>n</sub>(*t*) is the *n*-th spatiotemporal generator, i.e., a condition-independent, time-varying *D*-dimensional vector (or a *D* × *T* matrix for discrete time samples). However, as different signals may be related synchronously or asynchronously, we can distinguish the case of *synchronous* spatiotemporal generators:

$$\mathbf{v}_n(t) = c_n(t)\,\mathbf{w}_n \tag{7}$$

in which each generator **v**<sub>n</sub>(*t*) can be expressed as the product of a scalar function of time *c<sub>n</sub>(t)* times a time-invariant weight vector **w**<sub>n</sub>. In contrast, asynchronous spatiotemporal generators cannot in general be factorized into separate spatial and temporal generators.
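A synchronous spatiotemporal generator (Equation 7) is simply the outer product of a temporal waveform and a spatial weight vector; a toy sketch with hypothetical values:

```python
import numpy as np

c = np.sin(np.linspace(0, np.pi, 50))   # temporal waveform c_n(t), T = 50
w = np.array([1.0, 0.5, 0.0, 0.25])     # spatial weights w_n over D = 4 signals

V = np.outer(w, c)                      # synchronous generator v_n(t), D x T
# Every channel shares the same time course up to scaling, hence rank 1:
print(np.linalg.matrix_rank(V))         # 1
```

An asynchronous generator, by contrast, would have channel waveforms shifted or shaped differently, so the matrix would not factor into a single outer product.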

In addition to being scaled in amplitude, spatiotemporal generators may also be recruited at different times across task conditions, i.e., they may also show invariance for time shifts (d'Avella et al., 2003, 2006). If we assume that the duration of each spatiotemporal generator is smaller than the duration of the signals, we can incorporate condition-dependent onset times *t<sub>n</sub><sup>k</sup>* into Equation (6):

$$\mathbf{x}^{k}(t) = \sum_{n=1}^{N} a_{n}^{k}\,\mathbf{v}_{n}(t - t_{n}^{k}) \tag{8}$$
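Equation (8) amounts to summing amplitude-scaled copies of each generator placed at condition-specific onset times; a minimal sketch with hypothetical two-channel generators:

```python
import numpy as np

def reconstruct(generators, amps, onsets, T):
    """Equation (8): sum of amplitude-scaled, time-shifted spatiotemporal
    generators. generators: list of (D, L) arrays with L <= T;
    amps, onsets: per-generator amplitude coefficients and onset samples."""
    D = generators[0].shape[0]
    x = np.zeros((D, T))
    for v, a, t0 in zip(generators, amps, onsets):
        L = v.shape[1]
        x[:, t0:t0 + L] += a * v      # place each generator at its onset
    return x

# Toy example: two 2-channel generators of 5 samples in a 20-sample trial.
v1 = np.ones((2, 5))
v2 = np.tile([[1.0], [0.5]], (1, 5))
x = reconstruct([v1, v2], amps=[2.0, 1.0], onsets=[0, 10], T=20)
```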

### *Identification of generators and their dimensionality*

To investigate the spatial, temporal, and spatiotemporal dimensionality of joint torques and muscle patterns we used multidimensional decomposition techniques to identify the different types of generators. We considered the dynamic component of the torques and the phasic component of the muscle activity waveforms. We then used PCA to identify torque generators and, because of the inherent non-negativity of muscle activity, NMF to identify muscle pattern generators. Finally, as discussed below, we selected the number of generators according to three different criteria, two specific to one of the two datasets and one common to both.

*Dynamic torques and phasic muscle patterns.* Reaching movements in vertical planes require torques and muscle activities both to accelerate and decelerate the limb and to balance gravitational forces. In this work we focused on the former components, i.e., dynamic torques and phasic muscle patterns. Dynamic torques were computed as the total torques with the gravitational components [the last term of the right-hand side of Equation (3)] removed (Gottlieb et al., 1997). Flanders and collaborators (Flanders and Herrmann, 1992) found that it is possible to distinguish the phasic component (related to the movement) from the tonic component (related to maintaining a specific posture of the arm) of an EMG signal. As in d'Avella et al. (2006) we used a subtraction procedure to remove the tonic component, i.e., we subtracted a constant muscle activation level before and after the movement and a linear ramp between the two constant values during the movement. After the subtraction a small fraction of EMG samples assumed negative values, indicating that the phasic EMG activity was lower than the tonic activity. However, in order to use the NMF algorithm, we set all negative values to zero (ratio of negative area over total area of all muscles: 0.15 ± 0.04, mean ± *SD* over subjects). To assess the effect of this procedure on the number of generators and on their structure, we also identified generators from the original phasic muscle patterns without imposing a non-negativity constraint, using an iterative factorization algorithm based on gradient descent in place of NMF (see below).
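The tonic-subtraction procedure described above can be sketched as follows; the onset/end indices and the toy waveform are illustrative:

```python
import numpy as np

def phasic_component(emg, onset, end):
    """Subtract the tonic component: constant levels before movement onset
    and after movement end, linearly ramped in between, then clip negative
    residuals to zero for NMF (as described in the text)."""
    pre = emg[:onset].mean()            # tonic level before movement
    post = emg[end:].mean()             # tonic level after movement
    tonic = np.empty_like(emg)
    tonic[:onset] = pre
    tonic[end:] = post
    tonic[onset:end] = np.linspace(pre, post, end - onset)
    phasic = emg - tonic
    return np.clip(phasic, 0.0, None)   # zero out negative residuals

# Toy waveform: slow tonic drift plus a movement-related burst.
t = np.arange(100)
emg = 0.2 + 0.001 * t
emg[40:60] += 1.0
ph = phasic_component(emg, onset=30, end=70)
```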

*Data matrices.* To identify spatial, temporal, and spatiotemporal generators, joint torque and muscle pattern data, after preprocessing, were organized into three different data matrices that were factorized by either PCA (torques) or NMF (muscle patterns). For each subject we identified generators from *K* task conditions (*K* = 32, except for subject 3, for whom we had to exclude 2 conditions on the frontal plane and 2 on the sagittal plane because of missing data from the arm markers used to compute joint angles). To identify spatial generators, the data for each condition (*D* signals, EMG or torque, times *T* samples, with *T* = 100 after time normalization and resampling, see **Figure 2**) were arranged into a data matrix **X** with *D* rows and *T* × *K* columns, which was factorized, according to Equation (4) in matrix notation, as **X** = **W C**, where **W** is the condition-independent synergy matrix with *D* rows and *N* columns (*N* being the number of generators) and **C** is the matrix of condition- and time-dependent combination coefficients with *N* rows and *T* × *K* columns. For temporal generators, in contrast, the data matrix was constructed by arranging the waveforms from all signals in all conditions as columns, i.e., **X** had *T* rows and *D* × *K* columns, and it was factorized, according to Equation (5) in matrix notation, as **X** = **C W**, where **C** is the condition-independent matrix of temporal components, with *T* rows and *N* columns, and **W** is the condition- and signal-dependent matrix of weights, with *N* rows and *D* × *K* columns. Finally, for spatiotemporal generators, the data samples for all signals of each condition were arranged in a column and the data matrix **X**, with *D* × *T* rows and *K* columns, was factorized, according to Equation (6) in matrix form, as **X** = **V A**, with **V** the condition-independent matrix of time-varying synergies with *D* × *T* rows and *N* columns and **A** the condition-dependent

matrix of combination coefficients with *N* rows and *K* columns. For joint torque generators, the covariance of the data matrix was computed and, for each *N*, the first *N* principal components (extracted using the MATLAB *pcacov* function) were considered. For muscle pattern generators, for each *N*, the **C** and **W** matrices were initialized randomly and the best solution out of 20 runs of the NMF algorithm was selected. Each run of the iterative algorithm was terminated when the reconstruction R<sup>2</sup> increased in one iteration by less than 10<sup>−4</sup> for five consecutive iterations. To assess the effect of clipping to zero the negative values of the phasic muscle patterns, we also identified generators without non-negativity constraints using an iterative gradient descent algorithm. We minimized the data reconstruction error by combination of spatial, temporal, and spatiotemporal generators, iterating a step in which the combination coefficients (**C** and **A**), in the spatial and spatiotemporal cases, and the weights (**W**), in the temporal case, were updated, and a step in which the synergies (**W** and **V**), in the spatial and spatiotemporal cases, and the temporal components (**C**), in the temporal case, were updated. Both updates were performed along the direction opposite to the error gradient with a step size of 0.05 for the combination coefficients (spatial and spatiotemporal cases), 0.0018 for the weights (temporal case), 0.0005 for the synergies (spatial and spatiotemporal cases), and 0.0019 for the temporal components. These gradient step sizes were selected within a range of values in order to achieve the highest reconstruction R<sup>2</sup>. As in d'Avella et al. (2006), in the spatial and spatiotemporal cases, we added a term to the error function to penalize large negative values in the identified synergies. The same number of runs and termination conditions as for the NMF algorithm were used.
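A minimal sketch of the NMF extraction with the restart and stopping scheme described above (best of 20 random initializations; stop when R² improves by less than 10⁻⁴ for five consecutive iterations). The multiplicative Lee-Seung updates and the toy data are assumptions of this sketch, not necessarily the exact algorithm used in the original analysis:

```python
import numpy as np

def nmf(X, N, max_iter=2000, tol=1e-4, patience=5, rng=None):
    """Multiplicative-update NMF with the paper's stopping rule: terminate
    when the reconstruction R^2 improves by less than `tol` for `patience`
    consecutive iterations."""
    rng = np.random.default_rng(rng)
    W = rng.random((X.shape[0], N))
    C = rng.random((N, X.shape[1]))
    sst = np.sum((X - X.mean()) ** 2)
    r2_prev, stalled, eps = -np.inf, 0, 1e-12
    for _ in range(max_iter):
        C *= (W.T @ X) / (W.T @ W @ C + eps)     # Lee-Seung updates
        W *= (X @ C.T) / (W @ C @ C.T + eps)
        r2 = 1 - np.sum((X - W @ C) ** 2) / sst
        stalled = stalled + 1 if r2 - r2_prev < tol else 0
        if stalled >= patience:
            break
        r2_prev = r2
    return W, C, r2

# Best solution out of 20 random restarts, as in the text.
rng = np.random.default_rng(1)
X = np.abs(rng.random((10, 50)))                 # toy muscle-by-sample data
best = max((nmf(X, 3, rng=k) for k in range(20)), key=lambda r: r[2])
```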

*Selection of the number of generators.* For torque generators, we selected their number as the minimum number that explained at least 90% of the data variation (VAF or R<sup>2</sup>, defined as 1 − SSE/SST, with SSE the sum of squared residuals of the data reconstruction by the generators, and SST the sum of squared residuals of the data with respect to the mean over the rows of the data matrix). This criterion ("R<sup>2</sup> threshold") has been frequently used in the muscle synergy literature (Tresch et al., 1999; Ting and Macpherson, 2005; Torres-Oviedo et al., 2006; Roh et al., 2012), even if sometimes with a different definition (i.e., with SST defined as the sum of the squared data, see Delis et al., 2013). It is based on the assumption that the fraction of data variation left unexplained is due to noise, with the threshold supposed to separate structured variation due to the combination of generators from noise. However, if an independent estimate of the noise level is not available, the choice of such a threshold is necessarily *ad-hoc*. An alternative approach, also used in previous studies (d'Avella et al., 2003; Cheung et al., 2005; Tresch et al., 2006), which we used for selecting the number of muscle pattern generators, is the detection of a "knee" in the curve of R<sup>2</sup> as a function of the number of generators. This criterion ("R<sup>2</sup> knee") relies on the assumption that the noise is isotropic, i.e., that it contributes equally to all dimensions, and does not depend on a specific assumption about the relative level of noise. To detect a change in slope in the R<sup>2</sup> curve, for each *N*, we performed a linear fit of the portion of the curve from *N* to the end (i.e., *D*) and we selected the *N* for which the mean square error of the fit was < 10<sup>−4</sup>, indicating that the "tail" of the curve after the "knee" was essentially straight.
We could not use this second criterion for the torques, as their maximum spatial dimension (4) was too low and it was impossible to identify a "knee" with such a procedure. However, to compare torques and muscle patterns with the same criterion, we also determined their dimensionality with a criterion ("R<sup>2</sup> shuffle") that took into account the different intrinsic noise levels of the two datasets. We used a threshold on the slope of the R<sup>2</sup> curve relative to the slope of the curve obtained after randomly shuffling the rows of the data matrix (Cheung et al., 2009). Data samples after shuffling were low-pass filtered to match the smoothness of the original data. Shuffling destroys the multidimensional structure of the original data while each dimension maintains its original variability. Thus, we selected the number of generators as the point on the original R<sup>2</sup> curve at which any further increase in the number of extracted generators yielded an R<sup>2</sup> increase smaller than 75% of that for the generators extracted from the shuffled data (mean over 50 extractions from reshuffled data).
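The R² threshold and R² knee criteria can be sketched as follows (the shuffle criterion, which needs the data themselves, is omitted); the toy R² curve is illustrative:

```python
import numpy as np

def r2_threshold(r2_curve, thr=0.9):
    """Smallest N whose R^2 exceeds the threshold (R^2 threshold criterion)."""
    return int(np.argmax(np.asarray(r2_curve) >= thr) + 1)

def r2_knee(r2_curve, mse_tol=1e-4):
    """Smallest N from which the remainder of the R^2 curve is essentially
    straight: linear fit of the tail with mean squared error < mse_tol."""
    r2 = np.asarray(r2_curve)
    for n in range(1, len(r2)):
        x = np.arange(n - 1, len(r2))
        coef = np.polyfit(x, r2[n - 1:], 1)
        if np.mean((np.polyval(coef, x) - r2[n - 1:]) ** 2) < mse_tol:
            return n
    return len(r2)

# Toy curve: fast rise up to N = 3, then an essentially straight tail.
curve = [0.55, 0.78, 0.92, 0.94, 0.96, 0.98, 1.00]
print(r2_threshold(curve), r2_knee(curve))  # 3 3
```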

*Comparison of generators across subjects.* To compare generators across subjects, we tested how well a set of generators identified in one subject could reconstruct the data of a different subject. We then computed an *R*<sup>2</sup>-value to assess the similarity of the subspaces spanned by the generators of different subjects. To assess the significance of these *R*<sup>2</sup>-values we performed a Monte Carlo simulation, identifying generators from random data obtained by randomly shuffling the original data (50 runs for each subject and type of generator). We then computed the 95th percentile of the distribution of *R*<sup>2</sup>-values for the reconstruction of the original data with generators identified from random data.

*Effect of potential contamination of EMG recordings by cross-talk.* As surface EMG recordings can be affected by cross-talk due to volume conduction of the EMG signal from neighboring muscles, we performed a Monte Carlo simulation to assess the effect of such potential contamination on the dimensionality of the spatial muscle pattern generators. For each muscle, i.e., the *i*-th row of the normalized data matrix **X**, we simulated cross-talk contamination from a second muscle (*j*-th row), randomly chosen from all other muscles and not limited to the neighboring ones, according to a cross-talk weight (α) randomly drawn from an exponential distribution with mean 0.1, i.e., X′<sub>ik</sub> = X<sub>ik</sub> + α X<sub>jk</sub>. We then identified the generators from the contaminated data matrix **X**′ with NMF and estimated their dimensionality. For each subject, we performed a total of 100 simulation runs and, across all subjects, we found that the dimensionality estimated using the contaminated data matrix differed from the dimensionality estimated using the original matrix in only 6.7% of runs (0 runs for subjects 1, 2, 4; 27 runs for subject 3, mean dimensionality difference 0.27). Thus, while we cannot exclude that our EMG recordings were affected by cross-talk, we are confident that such potential contamination did not significantly alter the muscle pattern dimensionality estimation.
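The cross-talk contamination model, X′<sub>ik</sub> = X<sub>ik</sub> + α X<sub>jk</sub> with α drawn from an exponential distribution, can be sketched as follows; the toy data matrix and random seeds are illustrative:

```python
import numpy as np

def add_crosstalk(X, mean_alpha=0.1, rng=None):
    """Simulated cross-talk: each muscle (row i) receives a randomly chosen
    other muscle (row j) scaled by an exponentially distributed weight,
    i.e., X'_ik = X_ik + alpha * X_jk."""
    rng = np.random.default_rng(rng)
    Xc = X.copy()
    D = X.shape[0]
    for i in range(D):
        j = rng.choice([m for m in range(D) if m != i])  # any other muscle
        alpha = rng.exponential(mean_alpha)              # cross-talk weight
        Xc[i] += alpha * X[j]
    return Xc

rng = np.random.default_rng(0)
X = np.abs(rng.random((8, 200)))     # toy muscle-by-sample matrix
Xc = add_crosstalk(X, rng=1)
```

In the Monte Carlo procedure, the generator extraction and dimensionality estimation would then be repeated on each contaminated matrix.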

# **RESULTS**

### **DYNAMIC TORQUES**

Joint torques were estimated by inverse dynamics from joint angle trajectories using a kinetic model of the arm parametrized by the height and weight of each subject. To validate the arm model and the results of the inverse dynamics computation we performed a forward dynamics simulation. The joint angle trajectories simulated using the torques estimated by inverse dynamics matched well the original joint angle trajectories of all subjects and conditions (*R*<sup>2</sup> = 0.99 ± 0.02, mean across subjects ± *SD*). **Figure 3** shows an example of end-point trajectories, end-point speed profiles, joint angle trajectories, angular velocities, and gravitational and dynamic joint torque profiles for eight center-out movements on the frontal plane. As expected, end-point trajectories were straight and velocity profiles bell-shaped. Joint angle trajectories and the corresponding angular velocities were modulated by movement direction. Dynamic torques, i.e., total torques with the gravitational torques removed, were bi-phasic, as observed before (Gottlieb et al., 1997). The time courses of the joint angle trajectories and angular velocities were different across joints and conditions but, because of the dynamic interactions between the different DOFs, they were generated by a synchronous bi-phasic pulse of torque distributed across joints with different balances depending on the movement direction. Such coordination patterns in the dynamic torque profiles are clearly visible in a scatter plot of a pair of joint torques. **Figure 4** shows the six scatter plots of all pairs of joint torque profiles, during an interval of 250 ms around movement onset, approximately capturing the

**FIGURE 3 | Example of endpoint speed, velocity, joint angles, and torques.** Example of endpoint trajectories, end-point speed profiles, joint angles, joint angular velocities, and gravitational (*light gray*) and dynamic (*dark gray*) torques for eight center-out movements in the frontal plane of subject 1 (mean across repetitions of each movement). Vertical dashed lines represent the times of movement onset and movement end. Shaded areas around gravitational and dynamic torque profiles represent ±1 *SD* around the mean.

first phase of the profile, for the same 8 movements of **Figure 3**. If a pair of torques were modulated synchronously, the corresponding trajectory in the scatter plot would appear as a straight line segment with a direction depending on the relative amplitudes. Indeed, for most pairs and movement directions the dynamic torques appeared to be modulated close to synchronously, especially in the initial (rising) portion of the profile. Finally, for two pairs of dynamic torques, shoulder external rotation-shoulder adduction and elbow flexion-shoulder flexion, the direction of the line segment in the scatter plot depended only weakly on the movement direction, suggesting that the dynamic torques spanned a subspace of the four-dimensional torque space orthogonal to those two directions. We then generalized these observations by identifying dynamic torque generators and estimating their dimensionality.

### *Spatial dimensionality*

We first assessed the spatial dimensionality of the dynamic torques by identifying spatial generators, i.e., vectors in the torque space capturing specific balances of torque magnitudes which could reconstruct the data once multiplied by time- and condition-dependent coefficients (see Materials and Methods and **Figure 2**), using PCA. For each subject, the number of generators was selected as the minimum number for which the fraction of data variation explained exceeded 0.9 ("R<sup>2</sup> threshold" criterion) and as the number of generators for which adding an additional generator increased the *R*<sup>2</sup>-value by less than 75% of the mean *R*<sup>2</sup>-value obtained identifying generators from shuffled data ("R<sup>2</sup> shuffle" criterion). The mean dimensionality across subjects was 2.25 according to the R<sup>2</sup> threshold criterion and 2.75 according to the R<sup>2</sup> shuffle criterion (see **Table 3** for individual values). The maximum potential spatial dimensionality of the torques was 4, corresponding to the number of joints, i.e., the number of rows of the data matrix used for spatial decomposition (**Figure 2**).

**Figure 5A** shows the *R*<sup>2</sup>-value as a function of the number of generators for subject 1 and **Figure 5B** the three spatial generators of the same subject selected according to the R<sup>2</sup> shuffle criterion. The first generator (**w1**) is dominated by shoulder flexion torque. The second generator (**w2**) combines a large shoulder adduction torque with a smaller shoulder internal rotation (i.e., negative external rotation) and elbow extension (i.e., negative elbow flexion). Finally, the third generator (**w3**) represents a large elbow flexion torque and a smaller shoulder adduction torque. Notably, none of the generators or their combinations can generate coordinated shoulder adduction and shoulder external rotation torques, i.e., the direction orthogonal to the torque directions observed in the corresponding scatter plot of **Figure 4**. Thus, the structure of the spatial generators indicated that such a torque coordination pattern was never used to perform reaching movements in the frontal and sagittal planes.

**Figure 5C** illustrates an example of the reconstruction of the dynamic torque profiles of subject 1 in six different conditions by the combination of the three spatial generators of **Figure 5B**. The dynamic torques for the first two conditions, medial and lateral movements in the frontal plane, are generated by a comparable level of activation of all three generators: the bi-phasic activation of shoulder adduction and internal rotation followed by shoulder abduction and external rotation for the medial movement, and in the opposite order for the lateral movement, is captured mainly by the activation of the second generator, with similar bi-phasic profiles but opposite signs of its combination coefficient


**Table 3 | Comparison of different types of dimensionality of dynamic torques and phasic muscle patterns estimated according to three criteria for the selection of the number of generators.**

(c2). The last two conditions, backward and forward movements in the sagittal plane, require large shoulder flexion/extension torques that are generated by a bi-phasic activation of the first generator, captured by the first time-varying coefficient (c1).

To assess the similarity between the subspaces spanned by the generators identified in each subject, we reconstructed all dynamic torques of each subject with the generators of all subjects. **Table 4** shows the *R*<sup>2</sup>-values obtained using the number of generators determined according to the R<sup>2</sup> shuffle criterion (see **Table 3**). The *R*<sup>2</sup>-values for the reconstruction of the data of each subject by the generators extracted from the other subjects (0.96 ± 0.04, mean ± *SD*, *n* = 12) were close to the *R*<sup>2</sup>-values for the reconstruction of the data of each subject by the generators extracted from the same data (0.98 ± 0.02, *n* = 4) and were significantly higher than the *R*<sup>2</sup>-values obtained with generators identified from randomly shuffled data, indicating that the dynamic torques of the different subjects shared a similar spatial organization.

### *Temporal dimensionality*

To identify generators of the temporal organization of the dynamic torques we performed PCA on the collection of the torque profiles of all joints and conditions. The resulting temporal components were waveforms with the same duration as the torque profiles, and each profile was reconstructed by multiplying the component matrix by a weight specific for that joint and condition. The dimensionality was 1 for all subjects and for both criteria (see **Table 3**). In contrast, the maximum potential temporal dimensionality of the torques was 100, corresponding to the number of time samples after time-normalization and resampling from 0.5 MT before movement onset to 0.5 MT after movement end, i.e., the number of rows of the data matrix used for temporal decomposition (**Figure 2**).

**Figure 6A** illustrates the R<sup>2</sup> curve for the temporal decomposition up to 12 generators for subject 1 and **Figure 6B** the single temporal component identified in this subject, representative of all subjects and clearly showing a bi-phasic profile. **Figure 6C** illustrates the reconstruction of the joint torques for the same six conditions of **Figure 5C** by the temporal generator. The torque profiles for each condition are reconstructed by multiplying the single temporal component (c1) by a single condition-dependent weight vector (**w1**). With respect to the reconstruction with spatial generators, the weight vector, which has the same dimensions as a spatial generator, is now modulated by the movement condition. For example, the opposite signs in the bi-phasic profiles of shoulder adduction and external rotation for medial and lateral movements, and of shoulder flexion for backward and forward movements, are obtained through opposite signs of the corresponding components of the weight vector.

Finally, the temporal generators were also similar across all subjects. As for the spatial generators, the reconstruction of the data of each subject by the generators of all other subjects had *R*<sup>2</sup>-values (0.93 ± 0.01, mean ± *SD*) comparable with the *R*<sup>2</sup>-values for the reconstruction of the data of each subject by the generator extracted from the same data (0.94 ± 0.01) and significantly higher than the *R*<sup>2</sup>-values obtained with generators identified from randomly shuffled data.

### *Spatiotemporal dimensionality*

Spatiotemporal generators, which can be viewed either as time-varying vectors capturing a different spatial coordination among torques at each time or as collections of different waveforms for each torque, were identified by PCA on a data matrix obtained by arranging all time samples from all joints in a single column for each movement condition. Thus, torque samples from different joints and times represented different dimensions, and the possibility of generating the data with a number of generators smaller than the maximum potential dimension (400, corresponding to the number of joints times the number of samples) revealed a coordination in the activation of different joints at different times. Once a set of spatiotemporal generators is identified, the data are reconstructed by multiplying each generator by a single condition-dependent coefficient (see

**FIGURE 5 | Spatial decomposition of dynamic torques. (A)** R2 curve for subject 1 obtained by spatial decomposition using PCA. **(B)** Three spatial generators selected for subject 1. **(C)** Example of the reconstructions of the dynamic torques for six movement conditions of subject 1 obtained with the generators illustrated in panel **(B)** (*shaded area*: original data, *thick line*: reconstructed data, *bottom*: time-varying combination coefficients).

**Table 4 | R2 for the reconstruction of the data of each subject with the torque generators identified in all subjects.**


**Figure 2**). Thus the spatiotemporal decomposition provides a potentially very compact representation of the structure inherent in the data. The mean spatiotemporal dimensionality across subjects was 2.75 according to both criteria (see **Table 3** for individual values). Notably, mean spatial and spatiotemporal dimensionalities were very close and even equal for each subject when considering the R<sup>2</sup> shuffle criterion. Moreover, as the temporal dimensionality was 1, the spatiotemporal dimensionality was essentially the product of the spatial and the temporal dimensionalities.
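The arrangement of the data matrix for the spatiotemporal decomposition can be sketched as follows. The sizes and the synthetic data (three known generators plus additive noise) are assumptions chosen only to mirror the structure of the analysis:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sizes: 4 joints x 100 samples = 400 rows, 40 movement conditions.
J, T, C = 4, 100, 40

# Synthetic data with known spatiotemporal dimensionality 3: each condition
# (one column) is a weighted sum of 3 fixed spatiotemporal generators.
G = rng.normal(size=(J * T, 3))       # one long column per generator
coef = rng.normal(size=(3, C))        # one coefficient per generator and condition
X = G @ coef + 0.01 * rng.normal(size=(J * T, C))   # small additive noise

# Spatiotemporal decomposition: PCA on the (joints*samples) x conditions matrix.
Xc = X - X.mean(axis=1, keepdims=True)
s = np.linalg.svd(Xc, compute_uv=False)
r2 = np.cumsum(s ** 2) / np.sum(s ** 2)

print(bool(r2[2] > 0.99))   # True: three generators recover almost all variation
```

Each movement condition is then reconstructed by one scalar coefficient per generator, which is what makes this representation so compact.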

**Figure 7A** illustrates the R<sup>2</sup> curve for the spatiotemporal decomposition up to 12 generators and **Figure 7B** the three spatiotemporal components for subject 1. Comparing the structure of these generators with that of the spatial (**Figure 5B**) and temporal (**Figure 6B**) generators of the same subject, it is apparent that each spatiotemporal generator appears as the product of a spatial generator and a temporal one. Indeed, the activation waveforms of all spatiotemporal generators are approximately synchronous and similar to the waveform of the single temporal generator. **Figure 7C** illustrates the reconstruction of the joint torques for the same six conditions of **Figures 5C**, **6C**. The torque profiles for each condition are reconstructed by multiplying each spatiotemporal component by a single condition-dependent coefficient (**ci**), represented by the height of the rectangle below the torque profiles. In contrast to the reconstruction with spatial and temporal generators, movements requiring torque profiles with opposite signs are generated simply by changing the sign of a single combination coefficient, e.g., c2 for medial and lateral movements and c1 for backward and forward movements.

As in previous cases, the spatiotemporal generators were similar across subjects. The R<sup>2</sup>-values for the reconstruction of the data of each subject by the generators of all other subjects (0.90 ± 0.04, mean ± *SD*) were close to the R<sup>2</sup>-values for the reconstruction of the data of each subject by the generators extracted from the same data (0.95 ± 0.02) and significantly higher than the R<sup>2</sup>-values obtained with generators identified from randomly shuffled data.

**FIGURE 7 | Spatiotemporal decomposition of dynamic torques. (A)** R<sup>2</sup> curve for subject 1 obtained by spatiotemporal decomposition using PCA. **(B)** Three spatiotemporal generators selected for subject 1. **(C)** Example of the reconstructions of the dynamic torques for six movement conditions of subject 1 obtained with the generators illustrated in panel **(B)** (*shaded area*: original data, *thick line*: reconstructed data, *bottom*: combination coefficients represented by the height of the rectangle containing the temporal profile of each generator averaged over joints).

**FIGURE 6 | Temporal decomposition of dynamic torques. (A)** R<sup>2</sup> curve for subject 1 obtained by temporal decomposition using PCA. **(B)** The single temporal generator selected for subject 1. **(C)** Example of the reconstructions of the dynamic torques for six movement conditions of subject 1 obtained with the generator illustrated in panel **(B)** (*shaded area*: original data, *thick line*: reconstructed data, *bottom*: joint-specific weights).

### *Potential misestimation of subject mass and height*

The estimation of the joint torques from the recorded joint kinematics through the inverse dynamics calculation depends on geometric and inertial parameters which are estimated as a function of the subject's mass and height according to anthropometric tables (see Arm Model section in Materials and Methods). We assessed the effect of a potential misestimation of such parameters on the estimated torque dimensionality by identifying joint torque generators after varying the mass and the height of each subject by ±5, ±10, ±15, and ±20%. We recomputed the joint torques for individual trials of all subjects and re-processed the torque data as with the original parameters. Across all subjects and types of generators, the dimensionality was affected by a change in mass in 8 out of 96 cases (4 subjects × 3 types of generators × 8 mass change levels) and by a change in height in 9 cases. Thus, the estimation of torque dimensionality is robust to small errors in the estimation of anthropometric parameters.

### **MUSCLE PATTERNS**

Phasic muscle patterns, obtained by subtracting the anti-gravity (tonic) components from the rectified, filtered, averaged, time-normalized, and resampled EMG waveforms, were decomposed with NMF to assess their dimensionality. Phasic muscle patterns for fast reaching movements in vertical planes have been described before (d'Avella et al., 2006). In contrast to our previous study, here we identified spatial generators, temporal generators, and spatiotemporal generators without onset delays, and we compared their dimensionality with the dimensionality of the corresponding generators of the dynamic torques.
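The NMF decomposition of the phasic patterns can be sketched with the classic Lee-Seung multiplicative updates on synthetic data. The sizes, the four synthetic synergies, and the iteration count are assumptions for illustration, not the study's pipeline:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical sizes: 16 muscles, 50 time samples x 40 conditions = 2000 columns.
M, T, C, K = 16, 50, 40, 4

# Synthetic phasic patterns generated by 4 non-negative spatial generators
# (time-invariant muscle synergies) with non-negative coefficients.
W_true = rng.random((M, K))
H_true = rng.random((K, T * C))
V = W_true @ H_true

# After tonic subtraction real phasic patterns can have negative samples;
# those are clipped to zero before NMF (a no-op on this synthetic data).
V = np.clip(V, 0.0, None)

# NMF by multiplicative updates minimizing squared reconstruction error;
# updates preserve the non-negativity of W (generators) and H (coefficients).
W = rng.random((M, K))
H = rng.random((K, T * C))
eps = 1e-12
for _ in range(500):
    H *= (W.T @ V) / (W.T @ W @ H + eps)
    W *= (V @ H.T) / (W @ H @ H.T + eps)

r2 = 1.0 - np.sum((V - W @ H) ** 2) / np.sum((V - V.mean()) ** 2)
print(bool(r2 > 0.9))     # True: rank-4 non-negative data is well reconstructed
```

The same factorization, run for an increasing number of components, yields the R<sup>2</sup> curves used below for the knee and shuffle criteria.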

### *Spatial dimensionality*

The mean spatial dimensionality of the phasic muscle patterns across subjects was five according to the position of the change in slope of the R<sup>2</sup> curve as a function of the number of generators (R<sup>2</sup> knee criterion) and five according to the R<sup>2</sup> shuffle criterion (see **Table 3** for individual values). Thus, as expected, the dimensionality of the muscle pattern generators was larger than the number of spatial torque generators (2.75 according to the R<sup>2</sup> shuffle criterion), as muscle pattern generators could only be combined with non-negative time- and condition-dependent combination coefficients. However, the number of muscle pattern generators was also larger than the minimum required for generating a space with the same number of linear dimensions as the torque generators (2.75) by non-negative combinations (3.75 = 2.75 + 1).

**Figure 8A** shows the R<sup>2</sup> curve for the spatial decomposition of the phasic muscle patterns of subject 1, in which a knee at four generators is clearly visible. The lower R<sup>2</sup>-value at the selected number of muscle pattern generators (0.80) with respect to the corresponding value for the torque generators (0.99) indicated that a much larger fraction of the muscle data variation was due to noise. The four spatial generators (or time-invariant muscle synergies) for the same subject illustrated in **Figure 8B** (**w1**–**w4**) show specific groupings of muscles spanning multiple joints, with the same muscle recruited by multiple generators. Finally, **Figure 8C** presents examples of the reconstruction of the phasic muscle patterns for six movement conditions by the combination of the spatial generators. The temporal structure of muscle

**FIGURE 8 | Spatial decomposition of phasic muscle patterns. (A)** R<sup>2</sup> curve for subject 1 obtained by spatial decomposition using NMF. **(B)** Four spatial generators selected for subject 1. **(C)** Example of the reconstructions of the muscle patterns for six movement conditions of subject 1 obtained with the generators illustrated in panel **(B)** (*shaded area*: original data, *thick line*: reconstructed data, *bottom*: time-varying combination coefficients).


**Table 5 | R2 for the reconstruction of the data of each subject with the muscle pattern generators identified in all subjects.**

patterns and of combination coefficients is clearly more complex than that of the spatial generators, with the tri-phasic organization of the muscle patterns generated both by the temporal structure of the combination coefficients and by the superposition of different generators.

Finally, the spatial generators for the muscle patterns were less similar across subjects than the spatial generators for the torques (see **Table 5**). The reconstruction of the data of each subject by the generators of all other subjects had R<sup>2</sup>-values (0.66 ± 0.06, mean ± *SD*) much lower than the mean R<sup>2</sup> for the reconstruction by the generators extracted from the same data (0.80 ± 0.04), but still significantly higher than the R<sup>2</sup>-values obtained with generators identified from randomly shuffled data.

### *Temporal dimensionality*

The mean number of temporal generators of the phasic muscle patterns was 4.5 according to the R<sup>2</sup> knee criterion and 3.75 according to the R<sup>2</sup> shuffle criterion (see **Table 3** for individual values). As for the spatial generators, the temporal dimensionality of the muscle pattern generators was larger than the minimum number required to generate a space with the same number of linear dimensions as the number of temporal torque generators (1) by non-negative combinations (2 = 1 + 1).

**Figure 9A** shows the R<sup>2</sup> curve for the temporal decomposition of the phasic muscle patterns of subject 1 and **Figure 9B** the four temporal generators (or components) selected in that subject according to both criteria. The first three generators capture a single burst of muscle activity and the fourth component a small burst followed by a larger burst. The four components peak at different times and thus they appear to capture four distinct phases of the muscle patterns observed in different directions. However, the examples of muscle pattern reconstructions and combination weights for six movement directions (**Figure 9C**) show that in many cases the weight vectors loading the different components were similar within each movement condition (e.g., for the first two components of the medial movement and the last two components of downward movement), suggesting that such temporal decomposition was necessary to capture not only the major changes in the muscle patterns over the duration of the movement but also small asynchronous adjustments.

In contrast to the spatial generators but similarly to the temporal generators for the torques, the muscle pattern temporal generators were similar across all subjects. The R<sup>2</sup>-values for the reconstruction of the data of each subject by the generators of all other subjects (0.82 ± 0.04, mean ± *SD*) were close to the values for the reconstruction by the generators extracted from the same data (0.85 ± 0.03) and higher than the R<sup>2</sup>-values obtained with generators identified from randomly shuffled data.

### *Spatiotemporal dimensionality*

The mean number of spatiotemporal generators of the phasic muscle patterns was 5.5 according to the R<sup>2</sup> knee criterion and 7.25 according to the R<sup>2</sup> shuffle criterion (see **Table 3** for individual values). Both dimensionality estimates were larger than the minimum number of generators required for generating a space with the same number of linear dimensions as the number of spatiotemporal torque generators (2.75) by non-negative combinations (3.75 = 2.75 + 1). Moreover, differently from the torques, the product of the spatial and temporal muscle pattern dimensionalities (22.5 according to the R<sup>2</sup> knee criterion and 18.75 according to the R<sup>2</sup> shuffle criterion) was much higher than the spatiotemporal dimensionality. Thus the spatiotemporal generators captured asynchronous muscle coordination patterns that were not simply the result of the synchronous combination of all possible spatial and temporal generators.

**Figure 10A** shows the R<sup>2</sup> curve for the spatiotemporal decomposition of the phasic muscle patterns of subject 1 and **Figure 10B** the seven spatiotemporal generators selected in that subject according to the R<sup>2</sup> shuffle criterion. The asynchronous nature of the muscle activation waveforms is evident in most of these spatiotemporal generators. For example, TrLat and TrLong in **w1** show two clearly delayed peaks. Finally, the examples of muscle pattern reconstructions and combination coefficients for six movement conditions illustrate how the organization of the muscle patterns is captured parsimoniously by the spatiotemporal generators, as each movement is reconstructed by specifying only seven scalar combination coefficients (represented by the height of the rectangles depicting the mean generator waveform over all muscles).

Finally, muscle patterns of different subjects did not have similar spatiotemporal generators. The reconstruction of the data of each subject by the generators of all other subjects had much lower R<sup>2</sup>-values (0.37 ± 0.08, mean ± *SD*) than the R<sup>2</sup>-values for the reconstruction of the data by the generators extracted from the same dataset (0.80 ± 0.03), but still higher than the R<sup>2</sup>-values obtained with generators identified from randomly shuffled data.

### *Effect of setting to zero negative values in the phasic muscle patterns*

To identify muscle pattern generators from the phasic muscle patterns using NMF, we set to zero all negative values resulting from the subtraction of the tonic muscle activity from the filtered EMG waveforms. However, to assess the effect of such a procedure we also extracted muscle pattern generators from the unclipped phasic

**FIGURE 9 | Temporal decomposition of phasic muscle patterns. (A)** R<sup>2</sup> curve for subject 1 obtained by temporal decomposition using NMF. **(B)** The four temporal generators selected for subject 1. **(C)** Example of the reconstructions of muscle patterns for six movement conditions of subject 1 obtained with the generators illustrated in panel **(B)** (*shaded area*: original data, *thick line*: reconstructed data, *bottom*: muscle-specific weights of each generator).

**FIGURE 10 | Spatiotemporal decomposition of phasic muscle patterns. (A)** R<sup>2</sup> curve for subject 1 obtained by spatiotemporal decomposition using NMF. **(B)** Seven spatiotemporal generators selected for subject 1. **(C)** Example of the reconstructions of the muscle patterns for six movement conditions of subject 1 obtained with the generators illustrated in panel **(B)** (*shaded area*: original data, *thick line*: reconstructed data, *bottom*: combination coefficients represented by the height of the rectangle containing the temporal profile of each generator averaged over muscles).

muscle patterns using a gradient descent iterative algorithm (see Materials and Methods). In all cases the dimensionality of the generators identified with the gradient descent algorithm from the unclipped data was close to the dimensionality of the generators identified by NMF from the clipped data, and the generators extracted in the two cases were very similar. The dimensionality of the spatial generators identified from the unclipped data was, on average across subjects, 4.75 (according to both R<sup>2</sup> knee and R<sup>2</sup> shuffle criteria) and thus differed only by 0.25 from the dimensionality of the generators identified from the clipped data. For the temporal generators the difference in dimensionality was 0.5 according to the R<sup>2</sup> knee criterion and 0.25 according to the R<sup>2</sup> shuffle criterion. Finally, for the spatiotemporal generators the difference was 0.25 according to both criteria. The similarity between the spatial generators identified from the clipped data and the same number of generators identified from the unclipped data, quantified by the mean normalized scalar product between matched pairs of generators, was 0.91 ± 0.09 (mean across subjects ± *SD*). For all subjects the similarity value was significantly higher than the value expected by chance, i.e., it was above the 95th percentile of the distribution of the similarity values between generators identified from the unclipped data and generators identified from the shuffled clipped data. The similarity between the temporal generators identified from the clipped data and the same number of generators identified from the unclipped data was 0.91 ± 0.05, also significantly higher than chance for all subjects. Finally, for spatiotemporal generators the similarity was 0.90 ± 0.02, also significantly higher than chance for all subjects.
We can thus conclude that clipping to zero the negative values of the phasic muscle patterns affected the dimensionality and the structure of the identified muscle pattern generators only minimally.
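The similarity measure used above, the mean normalized scalar product between matched pairs of generators, can be sketched as follows. The greedy best-first matching and the synthetic generators are illustrative assumptions (the matching procedure is not fully specified in the text):

```python
import numpy as np

def normalized_scalar_product(a, b):
    """Cosine of the angle between two generator vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def mean_matched_similarity(G1, G2):
    """Average similarity over greedily matched generator pairs (columns)."""
    sims = np.array([[normalized_scalar_product(g1, g2) for g2 in G2.T]
                     for g1 in G1.T])
    total = 0.0
    for _ in range(sims.shape[0]):
        i, j = np.unravel_index(np.argmax(sims), sims.shape)
        total += sims[i, j]
        sims[i, :] = -np.inf      # remove the matched pair from the pool
        sims[:, j] = -np.inf
    return total / G1.shape[1]

rng = np.random.default_rng(3)
G = rng.random((16, 4))                     # 4 generators over 16 muscles
G_noisy = G + 0.05 * rng.random((16, 4))    # slightly perturbed copy
print(bool(mean_matched_similarity(G, G_noisy) > 0.95))   # True: nearly identical
```

A chance level for this statistic can then be obtained, as in the text, from the distribution of similarities against generators extracted from shuffled data.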

# **DISCUSSION**

We assessed the dimensionality of the dynamic joint torques responsible for accelerating and decelerating the arm during point-to-point reaching movements in different directions in the frontal and sagittal planes and the dimensionality of phasic muscle patterns underlying the production of those torques. We used multidimensional factorization techniques, PCA for the torques and NMF for the muscle patterns, to identify generators capturing the spatial, temporal, and spatiotemporal organization of the motor commands. The number of generators, selected according to either a threshold on the total data variation explained, a change in slope in the curve of the variance explained, or the increase in data variation explained when adding an additional generator relative to the increase obtained when extracting generators from randomly shuffled data, was taken as an estimate of the dimensionality. The spatial dimensionality of the dynamic torques was lower than the number of joints considered, indicating that some of the available spatial coordination patterns were never employed by the CNS when generating the joint torques for this task. A single temporal generator with a biphasic activation profile was identified in all subjects, in accordance with, and generalizing, previous observations on the temporal organization of dynamic torques on a single vertical plane (Gottlieb et al., 1997). However, a higher number of temporal generators may be required to account for more complex changes in joint torques in other types of reaching movements (e.g., slow or egocentric movements; see Lacquaniti et al., 1986). The number of spatiotemporal generators was in most subjects equal to the product of the spatial and temporal dimensionality, and their structure indicated that the spatiotemporal organization of the dynamic torques was essentially synchronous, obtained by the temporal modulation of the spatial generators by the biphasic profile of the single temporal generator.
In contrast, the spatial, temporal, and spatiotemporal dimensionalities of the phasic muscle patterns were higher than the corresponding torque dimensionality, as expected because of the non-negativity constraints in the combination of muscle pattern generators, but also higher than the minimum number required according to this biomechanical constraint. Moreover, the spatiotemporal dimensionality of the muscle patterns was much lower than the product of their spatial and temporal dimensionality, suggesting that specific asynchronous coordination patterns were used in the generation of muscle patterns. In fact, most of the identified spatiotemporal generators showed peaks of activity in different muscles at different times, i.e., coordination patterns that cannot be captured by the synchronous modulation of one of the spatial generators by one of the temporal generators.

The CNS might generate motor commands by organizing a few generators, basic elements in a modular architecture capturing shared knowledge across tasks and conditions, to reduce the number of parameters required for control (Alessandro et al., 2013; Ruckert and d'Avella, 2013). Evidence for a modular organization of the motor commands has recently come from the observation of low-dimensionality in the muscle patterns recorded in many species, behaviors, and tasks (Tresch et al., 1999; d'Avella et al., 2003, 2006; Hart and Giszter, 2004; Ivanenko et al., 2004; Ting and Macpherson, 2005; Overduin et al., 2008; Muceli et al., 2010; Dominici et al., 2011; Berger et al., 2013) and from neural recordings and stimulation (Saltiel et al., 2001; Ethier et al., 2006; Gentner and Classen, 2006; Hart and Giszter, 2010; Overduin et al., 2012). Intramuscular recordings during isometric contractions have also revealed that the number of basic muscle activation patterns in complex movements is very limited (ter Haar Romeny et al., 1984; van Zuylen et al., 1988). However, motor tasks and behaviors are accomplished by the joint torques generated by the simultaneous and coordinated activation of many muscles, and to understand how a small set of muscle pattern generators may accomplish a task it is necessary to understand the relationship between muscle patterns and joint torques. The transformation between muscle patterns and torques depends on several biomechanical characteristics and constraints. There are more muscles than joints, making motor commands at the level of muscle patterns redundant, i.e., the same torque pattern can be generated by different muscle patterns. Muscles can only pull and their activation can be expressed by non-negative values, thus introducing a fundamental non-negativity constraint in the generation of muscle patterns.
These characteristics and constraints affect the potential dimensionality of the joint torques associated with the dimensionality of the underlying muscle patterns. For a linear mapping of muscle activity into force, an assumption that may hold only in specific conditions such as submaximal isometric contractions (Borzelli et al., 2013) but is useful for illustrative purposes, the non-negativity constraint implies that at least *D* + 1 generators are required to span a *D*-dimensional torque space (Davis, 1954; Valero-Cuevas, 2009). In fact, by linearity, the image of the pseudo-inverse transformation of the *D*-dimensional torque space is a *D*-dimensional subspace of muscle space, but such a subspace is not contained in the positive orthant of the muscle space, i.e., it cannot be generated by non-negative activations, and an additional dimension in the null space of the linear transformation must be used to achieve a non-negative muscle pattern for each torque. In the general non-linear case, the manifold in muscle space containing all the minimum patterns associated with a *D*-dimensional torque space may have dimensionality higher than *D* even before considering the non-negativity constraint. Thus, because of the non-negativity constraint, the dimensionality of the muscle pattern generators must be at least *D* + 1 to generate a *D*-dimensional torque space, and possibly larger in the non-linear case. However, because of redundancy, the CNS might use a number of muscle pattern generators larger than this minimum to optimize some other cost in addition to the number of control parameters, such as effort. Since the generation of muscle patterns with a smaller number of generators in general requires more effort, the dimensionality of the muscle pattern generators might result from a trade-off between computational complexity and effort.
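The *D* + 1 argument can be made concrete in the simplest case, *D* = 1: a single joint driven by an antagonist pair. The unit moment arms below are a hypothetical choice for illustration:

```python
import numpy as np

# Simplest case, D = 1: one joint driven by a flexor and an extensor.
# Hypothetical unit moment arms map two non-negative activations to one torque.
A = np.array([[1.0, -1.0]])            # linear muscle-to-torque map, D x M

def torque(activations):
    a = np.asarray(activations, dtype=float)
    assert np.all(a >= 0.0), "muscle activations must be non-negative"
    return (A @ a)[0]

# One non-negative generator spans only half of the 1-D torque space;
# covering both signs requires D + 1 = 2 generators.
print(torque([1.0, 0.0]))   # 1.0  (flexion torque)
print(torque([0.0, 1.0]))   # -1.0 (extension needs the second generator)
```

The same geometry holds in higher dimensions: a *D*-dimensional linear space is the non-negative span of no fewer than *D* + 1 vectors.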
We found that the muscle pattern dimensionality is indeed larger than the minimum prescribed by non-negativity, likely an effect of non-linearity but possibly also due to a choice of generators capable of achieving the same motions with less effort. Future investigations comparing tasks with similar kinematics but different effort might help clarify this point.

As mentioned in the Introduction, muscle pattern generators can be associated with force-field primitives (Bizzi et al., 1991; Giszter et al., 1993; Kargo and Giszter, 2000a,b; Giszter and Hart, 2013), endpoint force or joint torque generators that depend on joint angles and velocities and are linearly combined using the activation coefficients of the muscle pattern generators. The competence of force-field primitives to generate observed force and kinematic behaviors has been demonstrated in the frog through forward biomechanical simulation (Kargo et al., 2010). Similarly, a forward dynamics simulation of a musculoskeletal model of the human leg has shown that the combination of a small number of muscle pattern generators is sufficient to perform the basic sub-tasks of walking (Neptune et al., 2009). Moreover, a recent simulation study using a musculoskeletal model of the human arm has indicated that spatiotemporal generators adequate to perform reaching movements can be learned through reinforcement (Ruckert and d'Avella, 2013). Here, in contrast, we did not perform forward dynamics simulations to assess the competence of muscle pattern generators and associated force-field primitives to control reaching movements. We focused, instead, on the dimensionality of recorded muscle patterns and the dimensionality of joint torques estimated from recorded kinematics by inverse dynamics. Force-field primitives are associated one-to-one with muscle pattern generators, i.e., they have the same dimensionality. Because of the non-negativity of muscle pattern generator activation coefficients and the non-linearity of the muscle-to-force mapping, such dimensionality is necessarily higher than the dimensionality of the joint torques, i.e., the number of torque generators necessary to adequately reconstruct the observed torques by linear combinations.
As different numbers of muscle pattern generators and force-field primitives can potentially generate the same set of observed torques, and thus be equally competent to perform a given behavior, comparing the dimensionality of muscle patterns and joint torques provides additional information on the strategy that the CNS employs to organize a modular control architecture.

We assessed the dimensionality of joint torques and muscle patterns according to three different definitions of generators (spatial, temporal, and spatiotemporal) and we could then also compare, within each dataset, the different types of dimensionality. For the torques we found that the dimensionality of the spatiotemporal generators was equal to the product of the dimensionalities of the spatial and temporal generators, suggesting that such generators can be obtained as the product of spatial and temporal generators (Delis et al., 2014). The spatial dimensionality was two or three, i.e., less than the number of angular DoF involved in the task, indicating that the CNS selected specific coordination strategies already at the kinematic level. The temporal dimensionality was one, indicating, in accordance with previous observations (Gottlieb et al., 1997), that the temporal organization of the dynamic torques is very simple: a bi-phasic profile shared by all joints and movement conditions (but see Lacquaniti et al., 1986 for more complex torque profiles). The dimensionality of the spatiotemporal generators was in all subjects equal to the spatial dimensionality because, given a single temporal generator, each spatiotemporal generator was obtained by the temporal modulation of each spatial generator by the temporal generator. Consequently, the spatiotemporal organization of the torques was essentially synchronous. In contrast, the number of spatiotemporal generators for the muscle patterns was much less than the product of spatial and temporal dimensionalities. Indeed, spatiotemporal generators captured asynchronous activations across muscles that could not be obtained by the modulation of a single spatial generator by a single temporal generator, which necessarily produces a synchronous pattern. Thus, spatiotemporal generators appear to provide a very compact representation of the organization of the muscle patterns (Delis et al., 2014). 
However, differently from our previous analyses of muscle patterns during reaching (d'Avella et al., 2006, 2008, 2011; d'Avella and Lacquaniti, 2013), in the present spatiotemporal decomposition we did not take into account the possibility of shifting in time the onset of different generators (Equation 8), because in this way we could use the same NMF algorithm used for the spatial and temporal decompositions. We found a larger number of generators without time-shifts than the number of time-varying muscle synergies (with time-shifts) reported before. Thus, additional structure in the muscle patterns can be captured by allowing the independent modulation of the recruitment time of each generator, yielding an even more compact representation of the muscle pattern organization.

The validity of our observations depends on a number of assumptions made in the analysis of the torque and muscle activity data. Concerning the estimation of the joint torques from the recorded motions of markers positioned on the subjects' arm, we relied on a simplified model of the human arm. We assumed that the shoulder was a spherical joint, i.e., that all three rotation axes intersect at a single point, and we estimated the length (Winter, 1990) and inertial parameters (Zatsiorsky and Seluyanov, 1983) of each segment as a function of the height and weight of each subject. To assess the effect of potential inaccuracies in the parameters of our model, we performed inverse dynamics varying the mass and height of each subject by up to ±20% and found that the estimated joint torque dimensionality changed only in a small fraction of cases. Thus, we believe that the estimation of torque dimensionality is robust to small inaccuracies in the anthropometric parameters. Concerning the identification of the muscle pattern generators with NMF, in order to be able to run such an algorithm we set to zero all samples with a negative value after subtracting the tonic components. We assessed the effect of such a procedure by identifying muscle pattern generators from unclipped data using a gradient descent algorithm, and we found that the dimensionality changed only minimally (by less than 0.5 in all cases) and that the structure of the generators was not significantly affected. Concerning the criteria for the selection of the number of generators, we used a threshold-based criterion for the torques, a criterion based on the detection of a change in slope (i.e., a "knee") in the R<sup>2</sup> curve for the muscle patterns, and a criterion based on the comparison of the slope of the R<sup>2</sup> curve for the generators extracted from the original data and those extracted after randomly shuffling the data (along the rows of the data matrix).
All these criteria rely on the assumption that the data are generated by a number of generators smaller than the maximum dimensionality and that a fraction of the variation observed in the data is due to noise. Such an assumption is shared by all previous studies using multidimensional decomposition to identify muscle synergies or temporal components, but it is not possible to exclude a priori that, once a specific number of generators has been selected, the additional dimensions attributed to noise might also be necessary to capture the structure in the motor commands or, vice versa, that a generator might actually describe noise instead of structure in the motor commands. Moreover, unless an independent estimation of the level of noise in the data is available, the selection of the number of generators depends on *ad-hoc* choices of thresholds and parameters. However, as the determination of the threshold on the MSE of the linear fit of the terminal portion of the R<sup>2</sup> curve, used for the R<sup>2</sup> knee criterion, is less dependent on the amount of noise in the data than the threshold on the R<sup>2</sup>-value, used in the R<sup>2</sup> threshold criterion, we prefer, whenever possible, the former criterion, as we have done previously (d'Avella et al., 2003, 2006; Cheung et al., 2005), to the latter, also used in previous studies (Tresch et al., 1999; Ting and Macpherson, 2005; Torres-Oviedo et al., 2006; Roh et al., 2012). Unfortunately, it was impossible to use the R<sup>2</sup> knee criterion for the spatial torque dimensionality, as the maximum dimension in that case was 4 and we could only estimate the residual of a linear fit of the R<sup>2</sup> curve from 1 or 2 to 4 generators, i.e., evaluate the presence of a knee only at 1 or 2 generators. We therefore used the third criterion to compare both datasets more directly.
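The R<sup>2</sup> knee criterion can be sketched as follows; the MSE threshold and the example curve are hypothetical values chosen for illustration, not those used in the study:

```python
import numpy as np

def knee_dimensionality(r2, mse_threshold=1e-4):
    """Smallest N such that a straight line fitted to the R2 curve from the
    N-th value to the last has mean squared error below the threshold, i.e.
    the curve is essentially linear (noise-like) beyond N generators."""
    r2 = np.asarray(r2, dtype=float)
    for n in range(1, len(r2) - 1):
        x = np.arange(n - 1, len(r2))
        seg = r2[n - 1:]
        coeffs = np.polyfit(x, seg, 1)           # linear fit of the tail
        mse = np.mean((seg - np.polyval(coeffs, x)) ** 2)
        if mse < mse_threshold:
            return n
    return len(r2)

# Hypothetical R2 curve: steep rise up to 3 generators, then a flat linear tail.
r2_curve = [0.55, 0.80, 0.95, 0.96, 0.97, 0.98, 0.985, 0.99]
print(knee_dimensionality(r2_curve))   # 3
```

With only 4 possible spatial torque generators the tail has too few points for a reliable fit, which is why this criterion could not be applied there.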
The R<sup>2</sup> shuffle criterion (Cheung et al., 2009) is based on the assumption that randomly shuffling the data affects the multidimensional structure but not the noise. Thus, the selection of the number of generators in datasets with different amounts of noise is based on comparing the slope of the R<sup>2</sup> curve for each dataset with the slope of the R<sup>2</sup> curve of a random dataset with a comparable level of noise. In most cases the numbers of generators selected by the two criteria either matched exactly or differed by one.
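The shuffle criterion can be sketched as follows, assuming the slope of each R<sup>2</sup> curve is approximated by its discrete differences (the original analysis may have estimated slopes differently); function names are ours.

```python
import numpy as np

def shuffle_rows(data, rng):
    """Independently permute the samples of each row (muscle or torque
    component) to destroy the shared structure across rows while
    preserving each row's amplitude distribution, and hence its noise."""
    shuffled = data.copy()
    for row in shuffled:
        rng.shuffle(row)
    return shuffled

def select_n_by_shuffle(r2_orig, r2_shuf):
    """Return the smallest N at which the slope of the original R^2 curve
    drops below the slope of the R^2 curve obtained from shuffled data."""
    slope_orig = np.diff(r2_orig)
    slope_shuf = np.diff(r2_shuf)
    for n, (s_o, s_s) in enumerate(zip(slope_orig, slope_shuf), start=1):
        if s_o < s_s:
            return n
    return len(r2_orig)
```

In this sketch the R<sup>2</sup> curves themselves would be produced by running the same decomposition on the original and on the shuffled data.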

In conclusion, whether spatial (time-invariant muscle synergies), temporal (temporal components or patterns), or spatiotemporal (time-varying muscle synergies) generators are fundamental building blocks in a modular control architecture, and how they are implemented in the CNS, remain open and debated questions. Our comparison of the dimensionality of muscle patterns and joint torques suggests that the larger dimensionality and spatiotemporal complexity of the muscle patterns with respect to the joint torques may be required for the CNS to overcome the non-linearities of the musculoskeletal system and, exploiting its redundancy, to flexibly generate endpoint trajectories with simple kinematic features using a limited number of building blocks.

# **ACKNOWLEDGMENTS**

Supported by the Italian Ministry of Health, the Italian Space Agency (DCMC, CRUSOE, and COREA grants), and the EU Seventh Framework Programme (FP7-ICT No 248311 AMARSi).

# **REFERENCES**




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 10 August 2013; accepted: 14 February 2014; published online: 03 March 2014.*

*Citation: Russo M, D'Andola M, Portone A, Lacquaniti F and d'Avella A (2014) Dimensionality of joint torques and muscle patterns for reaching. Front. Comput. Neurosci. 8:24. doi: 10.3389/fncom.2014.00024*

*This article was submitted to the journal Frontiers in Computational Neuroscience. Copyright © 2014 Russo, D'Andola, Portone, Lacquaniti and d'Avella. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Robustness of muscle synergies during visuomotor adaptation

#### *Reinhard Gentner<sup>1</sup>, Timothy Edmunds<sup>2</sup>, Dinesh K. Pai<sup>2</sup> and Andrea d'Avella<sup>1</sup>\**

*<sup>1</sup> Laboratory of Neuromotor Physiology, Santa Lucia Foundation, Rome, Italy*

*<sup>2</sup> Department of Computer Science, University of British Columbia, Vancouver, BC, Canada*

### *Edited by:*

*Martin Giese, University Clinic Tuebingen/Hertie Institute, Germany*

### *Reviewed by:*

*Todd Troyer, University of Texas, USA; Francesco Nori, Istituto Italiano di Tecnologia, Italy; Bastien Berret, Université Paris-Sud, France*

### *\*Correspondence:*

*Andrea D'Avella, Laboratory of Neuromotor Physiology, Santa Lucia Foundation, Via Ardeatina 306, 00179 Rome, Italy e-mail: a.davella@hsantalucia.it*

During visuomotor adaptation a novel mapping between visual targets and motor commands is gradually acquired. How muscle activation patterns are affected by this process is an open question. We tested whether the structure of muscle synergies is preserved during adaptation to a visuomotor rotation. Eight subjects applied targeted isometric forces on a handle instrumented with a force transducer while electromyographic (EMG) activity was recorded from 13 shoulder and elbow muscles. The recorded forces were mapped into horizontal displacements of a virtual sphere with simulated mass, elasticity, and damping. The task consisted of moving the sphere to a target at one of eight equally spaced directions. Subjects performed three baseline blocks of 32 trials, followed by six blocks with a 45◦ CW rotation applied to the planar force, and finally three wash-out blocks without the perturbation. The sphere position at 100 ms after movement onset revealed a significant directional error at the beginning of the rotation, gradual learning in subsequent blocks, and aftereffects at the beginning of the wash-out. The change in initial force direction was closely related to the change in directional tuning of the initial EMG activity of most muscles. Throughout the experiment, muscle synergies extracted using a non-negative matrix factorization algorithm from the muscle patterns recorded during the baseline blocks could reconstruct the muscle patterns of all other blocks with an accuracy significantly higher than chance, indicating structural robustness. In addition, the synergies extracted from individual blocks remained similar to the baseline synergies throughout the experiment. Thus, synergy structure is robust during visuomotor adaptation, suggesting that changes in muscle patterns are obtained by rotating the directional tuning of the synergy recruitment.

**Keywords: muscle synergies, visuomotor rotation, motor adaptation, isometric force, EMG, directional tuning**

# **INTRODUCTION**

Human subjects can learn to move in novel environments and they can adapt to visuomotor (Ghilardi et al., 1995; Imamizu et al., 1995; Ghahramani et al., 1996; Krakauer et al., 1999, 2000) or dynamic (Lackner and Dizio, 1994; Shadmehr and Mussa-Ivaldi, 1994) perturbations. Generally, when subjects are exposed to a perturbation of the mapping between motor commands and end-effector motion or force, they initially produce large errors and they then gradually adapt, compensating for the perturbation and re-establishing baseline performance. When the perturbation is removed subjects make large errors in the opposite direction (after-effects) before gradually re-adapting. If the perturbation is unexpectedly and occasionally removed in a single trial (Thoroughman and Shadmehr, 2000) or if it changes continuously and randomly (Scheidt et al., 2001; Baddeley et al., 2003; Cheng and Sabes, 2007) the error experienced in one trial affects the motor command generated in the following trial. These observations suggest that the central nervous system (CNS) relies on internal models of the body and of the environment to predict the sensory consequences of motor commands and that adaptive processes adjust the internal models to reduce sensory prediction errors (Shadmehr et al., 2010; Krakauer and Mazzoni, 2011; Wolpert et al., 2011). Such adaptive processes can be modeled as error-based learning that reduces sensory prediction error by adjusting an internal state according to a linear time-invariant dynamics (Thoroughman and Shadmehr, 2000; Donchin et al., 2003; Cheng and Sabes, 2007; Tanaka et al., 2012). Multiple learning processes operating at different timescales (Smith et al., 2006) and learning at different hierarchical levels in the internal model (Braun et al., 2009) explain the time-course of performance errors under a variety of experimental manipulations. 
However, although behavioral observations, such as error time-course and generalization properties, made in numerous motor adaptation studies are well captured by current models, how the motor commands change during motor adaptation has been investigated only in a few cases (Wise et al., 1998; Thoroughman and Shadmehr, 1999; Li et al., 2001; Paz et al., 2003; de Rugy et al., 2009). Muscle pattern generation and its relationship to force generation during motor adaptation remain to be fully understood.

Because of the redundancy in the musculoskeletal system, the change in motor commands underlying the change in motion or force necessary to compensate for a visuomotor or dynamic perturbation is not unique. For example, the rotation of the direction of force required to adapt to a rotation imposed onto the mapping between the force applied on an isometric joystick and the motion of a cursor on a computer screen (visuomotor rotation) may be accomplished by infinitely many different combinations of changes in individual muscle activations. In principle, during motor adaptation the performance error may be gradually reduced by changing the force output associated with each visual target, using the same muscle pattern used for that force output before the perturbation. Alternatively, error may be reduced by changing the activity of individual muscles independently of the muscle patterns used before the perturbation. For wrist muscles it has been shown that the rotation of the muscle directional tuning curve closely follows the rotation imposed onto the force-to-cursor mapping (de Rugy and Carroll, 2010), suggesting that adaptation occurs at the level of the planned force output. The first aim of our study was to investigate whether this is also true for shoulder and elbow muscles during visuomotor rotation of the mapping between isometric forces generated by the arm at the hand and cursor motion, i.e., with a musculoskeletal system involving a larger number of muscles and joints.

The changes of the motor commands underlying adaptation to a visuomotor rotation may occur at the target force or at the muscle level, but in both cases the question of how a specific muscle pattern is selected to generate a desired force remains open. One hypothesis that has recently received considerable attention is that muscle patterns are generated as combinations of a few muscle synergies, coordinated recruitment of groups of muscles with specific activation balances, thus requiring the selection of only a small number of synergy combination parameters to generate a desired force. While muscle synergies have been studied intensively in human reaching movements (d'Avella et al., 2006, 2008, 2011; Muceli et al., 2010), isometric force generation (Borzelli et al., 2012; Roh et al., 2012), locomotion (Ivanenko et al., 2004; Dominici et al., 2011), cycling (Hug et al., 2010, 2011), responses to postural perturbations (Krishnamoorthy et al., 2003; Torres-Oviedo and Ting, 2007; Chvatal and Ting, 2012), complex motor skills (Frere and Hug, 2012), and in several different animal behaviors (Tresch et al., 1999; Saltiel et al., 2001; d'Avella et al., 2003; Hart and Giszter, 2004; Cheung et al., 2005, 2009; d'Avella and Bizzi, 2005; Ting and Macpherson, 2005; Torres-Oviedo and Ting, 2007; Overduin et al., 2008, 2012; Hart and Giszter, 2010), they have not been directly investigated during adaptation to visuomotor rotations. Thus, our second aim was to investigate whether the synergies capturing the muscle patterns underlying the generation of multidirectional isometric forces are robust during motor adaptation. In summary, we hypothesized that the change in the tuning of muscles during adaptation to a visuomotor rotation closely follows the rotation of the force and that the underlying changes in the muscle patterns can be explained by changes in the recruitment of synergies whose structure remains fixed.

# **MATERIALS AND METHODS**

# **PARTICIPANTS**

All procedures were approved by the Ethical Review Board of Santa Lucia Foundation. Eight right-handed naïve subjects (mean age 28.6 ± 6.0 years, age range 24–43, 5 females and 3 males, see **Table 1**) participated in the experiments after giving written informed consent.

# **EXPERIMENTAL SETUP**

Subjects sat in front of a desktop with their torso immobilized by safety belts. Their right forearm was inserted into a splint immobilizing the hand, wrist, and forearm. The center of the palm was aligned with the body midline at the height of the sternum and the elbow was flexed approximately by 90◦. The subjects' view of the hand was occluded by a 21-inch LCD monitor inclined with its surface approximately perpendicular to the subjects' line of sight when looking at their hand (**Figure 1A**). After a calibration procedure, the monitor could display a virtual desktop matching the real desktop, a spherical cursor matching, at rest, the position of the center of the palm and moving on a horizontal plane, and spherical targets on the same plane (**Figure 1B**). A steel bar at the base of the splint was attached to a 6-axis force transducer (Delta F/T Sensor, ATI Industrial Automation, Apex, NC, USA) positioned below the desktop to record isometric forces. Surface electromyographic (EMG) activity from 13 muscles acting on the shoulder and elbow was recorded with active bipolar electrodes (DE 2.1, Delsys Inc., Boston, MA), after band-pass filtering (20–450 Hz) and amplification (gain 1000, Bagnoli-16, Delsys Inc.). Force and EMG data were digitized at 1 kHz using an A/D PCI board (PCI-6229, National Instruments, Austin, TX, USA). The virtual scene was rendered by a PC workstation with a refresh rate of 60 Hz using custom software. Cursor position information was processed by a second PC workstation running a real-time operating system and transmitted to the first workstation through an Ethernet link. Cursor motion was simulated in real time as a mass accelerated by the horizontal force (parallel to the desktop) applied by the subject on the splint, a viscous force, and an elastic force proportional to the distance from the rest position. The spring constant was set such that a constant force with a magnitude equal to 20% of the mean maximum voluntary force (MVF) magnitude across force directions (see below) would maintain the cursor stationary at 5 cm from the origin. The damping constant was set to make the system critically damped.
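The cursor dynamics can be illustrated with a one-dimensional sketch (the actual simulation was planar): the spring constant follows from the condition that 20% of MVF holds the cursor at 5 cm, and the damping constant makes the mass-spring system critically damped. The mass value, integration step, and function names here are our own assumptions.

```python
import math

def simulate_cursor(force_fn, mvf, mass=1.0, dt=0.001, t_end=2.0):
    """Integrate 1-D cursor dynamics: an applied force accelerates a
    virtual mass that is also subject to an elastic force (proportional
    to displacement from rest) and a viscous force (critical damping).
    force_fn(t) returns the applied force in N; mvf is the mean maximum
    voluntary force in N; returns the final position in meters."""
    k = 0.20 * mvf / 0.05          # N/m: a force of 20% MVF balances k * 0.05 m
    c = 2.0 * math.sqrt(k * mass)  # critical damping
    x, v, t = 0.0, 0.0, 0.0
    while t < t_end:               # semi-implicit Euler integration
        a = (force_fn(t) - k * x - c * v) / mass
        v += a * dt
        x += v * dt
        t += dt
    return x
```

With a constant force of 20% MVF the cursor settles, without overshoot, at 5 cm from the origin.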

# **EXPERIMENTAL PROTOCOL**

The experiment was subdivided into blocks, each consisting of a set of trials (**Figures 1C,D**). The first maximum voluntary contraction (MVC) block served to establish the mean MVF over horizontal force directions for each subject. In each trial subjects moved the sphere along a virtual line in one of 8 directions (equally spaced by 45◦) by applying horizontal forces until they reached their maximum force production capability. After remaining 1 s at the position of maximum force, subjects were instructed to relax and to bring the sphere back to the rest position. When the trial stopped after 15 s, a new trial with a different target direction was initiated. In the following blocks, subjects performed center-out force trials to 8 equally spaced targets with force levels of 20 and 30% of MVF (corresponding to displacements of 5 and 7.5 cm of the sphere, respectively). Each target was repeated two times in a pseudorandom order (i.e., 32 trials per block). A trial was initiated

**FIGURE 1 | Experimental setup and procedures. (A)** Subjects sat in a moveable chair with their forearm pronated and fixed in a splint rigidly coupled to a force transducer. A flat monitor occluded the subject's hand and displayed a virtual scene co-located with the real desktop. **(B)** Screenshot of the virtual scene. Subjects controlled the position of the blue sphere by applying forces to the force transducer. The sphere is illustrated inside the yellow semi-opaque sphere indicating the start position. The target is shown as a gray sphere. **(C)** Sequence of events in a trial. Subjects had to maintain the blue sphere inside the start sphere for 1 s. Afterwards the target appeared and the start sphere disappeared, instructing the subject to reach it and to hold the blue sphere inside the target sphere for 1 s. The target sphere changed its color from gray to yellow when the target was reached. Finally, the subject was instructed to return to the start position and remain there for 1 s. **(D)** Organization of an experimental session. In the first block the maximum voluntary contraction (MVC) during generation of maximum voluntary force across directions was established, followed by three baseline blocks, each consisting of 32 trials. From the fifth block to the tenth block a clockwise (CW) visuomotor rotation was introduced, followed by three washout blocks without visuomotor rotation.

by keeping the sphere at the start position (tolerance ±2% of MVF, i.e., 0.5 cm) for 1 s. Afterwards, a target appeared and the sphere indicating the start position disappeared. Subjects were instructed to move to the target as fast as possible, and to remain for 1 s at the target (tolerance ±2% of MVF). The trial was finished successfully 0.5 s after returning to the start position (**Figure 1C**).

Subjects performed three blocks of 32 trials (baseline), followed by six blocks with a 45◦ clockwise (CW) visuomotor rotation applied to the planar force used to compute the cursor displacement (rotation), and another three blocks without the rotation (washout) as shown in **Figure 1D**.

### **DATA ANALYSIS**

### *Initial directional error*

To evaluate adaptation to the visuomotor rotation at the kinematic level we computed the initial directional error at 100 ms after movement onset. Movement onset was defined as the time when the cursor speed exceeded 0.5 cm s<sup>−1</sup>. The initial directional error was defined as the angle between the vector pointing from the position at movement onset to the cursor position 100 ms later and the straight line to the target (**Figure 2B**).
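This measure can be sketched as follows, assuming cursor positions in meters and times in seconds; the function name and the default speed threshold are ours for illustration.

```python
import numpy as np

def initial_direction_error(xy, t, target_dir_deg, speed_thr=0.005):
    """Angle (deg) between the initial cursor direction and the straight
    line to the target. Onset = first sample at which the speed exceeds
    speed_thr (m/s); initial direction = vector from the onset position
    to the position ~100 ms later.
    xy: (n, 2) cursor positions in meters; t: (n,) times in seconds."""
    v = np.gradient(xy, t, axis=0)                    # velocity by finite differences
    speed = np.linalg.norm(v, axis=1)
    onset = np.argmax(speed > speed_thr)              # first above-threshold sample
    after = np.argmin(np.abs(t - (t[onset] + 0.1)))   # sample ~100 ms after onset
    d = xy[after] - xy[onset]
    move_dir = np.degrees(np.arctan2(d[1], d[0]))
    err = move_dir - target_dir_deg
    return (err + 180.0) % 360.0 - 180.0              # wrap to (-180, 180]
```

A trajectory heading straight at the target yields an error near zero; a trajectory deviated clockwise yields a negative error.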

### *Synergy extraction*

Muscle synergies from each block were identified by non-negative matrix factorization (NMF) from EMG patterns recorded from the go signal to the end of successful target acquisition. Recorded EMG data were rectified, digitally low-pass filtered (2nd order Butterworth, 5 Hz cutoff), and re-sampled at 100 Hz to reduce data size. In each trial, the mean EMG activity of each muscle during the initial rest phase was used as an estimate of the baseline noise level and subtracted from the rest of the data. The EMGs were normalized to the maximum activation across directions recorded during the MVC block. Finally, the rectified and normalized EMGs of each trial from a given block (or from several blocks) were pooled together into a single data matrix **M**. The concatenated EMG patterns **m** (columns of the matrix **M**) were described as combinations of synergies, **m** = **W c**, with **W** the *M* × *N* synergy matrix whose columns are vectors specifying relative muscle activation levels (invariant across time and trials), and **c** a *N*-dimensional synergy activation vector (time- and trial-dependent), where *N* is the number of synergies and *M* the number of muscles. The number of data points (columns) in the matrix **M** varied slightly between blocks and subjects because the time to successfully complete the target acquisition was not constant for each trial. For each possible *N* from 1 to *M*, the iterative optimization algorithm (Lee and Seung, 1999, 2001) was repeated 10 times and the solution with the highest fraction of data variation explained (*R*<sup>2</sup>) was retained. We selected the smallest number of synergies which explained more than 90% of the data variation. 
Synergies were extracted from the following 13 muscles: Brachioradialis (BracRad), Biceps brachii, short head (BicShort), Biceps brachii, long head (BicLong), Triceps brachii, lateral head (TrLat), Triceps brachii, long head (TrLong), anterior Deltoid (DeltA), medial Deltoid (DeltM), posterior Deltoid, posterior (DeltP), clavicular part of the Pectoralis major (PectMajClav), medial Trapezius (TrapMed), Latissimus dorsi (LatDorsi), Teres Major (TerMaj) and Infraspinatus (InfraSp).
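The extraction procedure described above can be sketched with the Lee and Seung multiplicative update rules for the squared-error cost; the numbers of restarts and iterations here are illustrative, not those of the original analysis.

```python
import numpy as np

def nmf(M, n, n_iter=500, seed=0):
    """Lee & Seung multiplicative updates minimizing ||M - W C||^2 with
    non-negative W (muscles x synergies) and C (synergies x samples)."""
    rng = np.random.default_rng(seed)
    W = rng.random((M.shape[0], n))
    C = rng.random((n, M.shape[1]))
    eps = 1e-12                       # guard against division by zero
    for _ in range(n_iter):
        C *= (W.T @ M) / (W.T @ W @ C + eps)
        W *= (M @ C.T) / (W @ C @ C.T + eps)
    return W, C

def r_squared(M, W, C):
    """Fraction of data variation explained by the reconstruction W C."""
    return 1.0 - np.sum((M - W @ C) ** 2) / np.sum((M - M.mean()) ** 2)

def extract_synergies(M, r2_threshold=0.90, n_rep=5):
    """Smallest number of synergies explaining more than 90% of the data
    variation, keeping the best of several random restarts for each N."""
    for n in range(1, M.shape[0] + 1):
        best = max((nmf(M, n, seed=s) for s in range(n_rep)),
                   key=lambda wc: r_squared(M, *wc))
        if r_squared(M, best[0], best[1]) > r2_threshold:
            return n, best[0], best[1]
    return n, best[0], best[1]
```

Applied to noiseless data generated by two synergies, the procedure recovers two synergies with a near-perfect reconstruction.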


### *Tuning curves*

Muscle and synergy tuning curves and preferred directions (PDs) were calculated for each block by a cosine fit (d'Avella et al., 2006) between the activation of each muscle or the coefficients of each synergy (averaged across target distances and repetitions in a block) and the corresponding target position. We fitted the muscle (or synergy) activity with a linear regression *m*(θ) = β<sub>0</sub> + β<sub>*x*</sub> cos(θ) + β<sub>*y*</sub> sin(θ), where *m*(θ) is the muscle (synergy) activity for a target in direction θ and θ<sub>*PD*</sub> = tan<sup>−1</sup>(β<sub>*y*</sub>/β<sub>*x*</sub>) is the PD of the cosine tuning. Tuning curves were computed for the time interval between movement onset and the following 100 ms, matching the interval used to compute the initial movement direction error. For visualization, the tuning curves were smoothed by a 2-dimensional spline interpolation and plotted in a polar coordinate system. Muscles or synergy coefficients which were not significantly cosine tuned were excluded from the analysis (see **Table 1**). Significant cosine tuning was assumed when the *p*-value of the regression between the data and the optimal cosine tuning was smaller than 0.05 (see **Table 1** for *R*<sup>2</sup> values of the regression for each subject). After applying a CW visuomotor rotation, the cursor movement was initially directed CW with respect to the target, i.e., with the same directional error that would have been obtained with a rotation of the target in a counter-clockwise (CCW) direction instead of the CW cursor rotation. Thus, the PDs of muscles and synergies were computed according to the CCW-rotated visual targets, i.e., the actual force targets. The initial change of PD is then directed CCW, as displayed for individual muscles in **Figure 3A**. To better compare the changes of muscle and synergy PDs with the initial cursor direction error, we changed the sign of those PDs in blocks 5–10.
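The cosine fit reduces to ordinary linear least squares on the regressors 1, cos(θ), and sin(θ). A minimal sketch (function name ours):

```python
import numpy as np

def fit_cosine_tuning(activity, directions_deg):
    """Fit m(theta) = b0 + bx*cos(theta) + by*sin(theta) by linear least
    squares. Returns the preferred direction atan2(by, bx) in degrees and
    the R^2 of the fit.
    activity: mean activation per target; directions_deg: target angles."""
    th = np.radians(directions_deg)
    X = np.column_stack([np.ones_like(th), np.cos(th), np.sin(th)])
    beta = np.linalg.lstsq(X, activity, rcond=None)[0]
    pd = np.degrees(np.arctan2(beta[2], beta[1]))     # atan2(by, bx)
    pred = X @ beta
    ss_res = np.sum((activity - pred) ** 2)
    ss_tot = np.sum((activity - np.mean(activity)) ** 2)
    return pd, 1.0 - ss_res / ss_tot
```

For a perfectly cosine-tuned activity profile, the fitted PD matches the peak of the cosine and the R<sup>2</sup> is close to one.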

### *Data reconstruction by synergies*

To quantify how well the structure of the muscle patterns of one block was captured by the synergies extracted from different blocks, we reconstructed the EMG traces by finding the synergy coefficients that reconstructed those traces with the highest *R*<sup>2</sup> value. To find the optimal synergy coefficients we ran the same iterative optimization algorithm used for the extraction of the synergies without updating the synergies. We calculated the *R*<sup>2</sup> value of the reconstruction of EMG traces from blocks 2–13 using the synergies extracted from the pooled data of blocks 2–4. Monte-Carlo simulations were used to ensure that the reconstruction quality was higher than chance level. For the Monte-Carlo simulation we selected 30 sets of EMG vectors at randomly chosen time points, each set constituting an *M* × *N* matrix of "random" synergies. The number of random synergies in each set for each subject was chosen equal to the number of selected synergies. For each set of random synergies the EMG traces were reconstructed and the *R*<sup>2</sup> value computed.
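Fitting coefficients with fixed synergies amounts to running only the coefficient-update step of the multiplicative algorithm. A minimal sketch, with names and iteration counts of our choosing:

```python
import numpy as np

def fit_coefficients(M, W, n_iter=500, seed=0):
    """Find non-negative coefficients C that best reconstruct M = W C for
    a FIXED synergy matrix W, using only the multiplicative update for C
    (the synergy-update step of the extraction algorithm is skipped)."""
    rng = np.random.default_rng(seed)
    C = rng.random((W.shape[1], M.shape[1]))
    for _ in range(n_iter):
        C *= (W.T @ M) / (W.T @ W @ C + 1e-12)
    return C

def reconstruction_r2(M, W):
    """R^2 of the best reconstruction of M by fixed synergies W."""
    C = fit_coefficients(M, W)
    return 1.0 - np.sum((M - W @ C) ** 2) / np.sum((M - M.mean()) ** 2)
```

Since W is fixed, this is a convex non-negative least-squares problem, so the multiplicative updates converge to the optimal coefficients.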

**Frontiers in Computational Neuroscience www.frontiersin.org** September 2013 | Volume 7 | Article 120 |

**FIGURE 3 | (A)** Tuning curves of two muscles (*first row*: PectMajClav, *second row*: DeltA) of subject 3 estimated from different blocks (*blue* markers and lines) compared to the tuning curves calculated from the pooled data of all baseline blocks (reference blocks, *gray* markers and lines), with the preferred direction (PD) shown as a *blue* radial segment. Muscles that were not significantly cosine tuned were excluded from the analysis. **(B)** Examples of the mean PD-change of three muscles across subjects and grand mean of the PD-change across muscles and subjects (*rightmost panel*). The PD changes of the muscles (*blue*) were not statistically different from the initial direction angle error (*red*).

### *Synergy similarity*

We measured the similarity between two synergies by normalizing the synergy vectors (Euclidean norm) and computing their scalar product. Similarity between two sets of synergies was assessed by first matching pairs of synergies according to their normalized scalar product (starting from the pair with the highest value and continuing with the pairs from the remaining synergies until all pairs were matched) and then computing the mean scalar product over all matched pairs. For all comparisons between sets of synergies, the number of synergies for each subject was set equal to the number of reference synergies. As for the reconstruction *R*<sup>2</sup>, Monte-Carlo simulations were used to ensure that the similarity was higher than chance level.
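The greedy matching procedure can be sketched as follows (function name ours): normalize the synergy columns, compute all pairwise scalar products, and repeatedly pair the two unmatched synergies with the highest value.

```python
import numpy as np

def match_synergies(W1, W2):
    """Greedy matching of synergies (columns of W1 and W2): repeatedly
    pair the two not-yet-matched synergies with the highest normalized
    scalar product. Returns the mean scalar product over matched pairs
    and the list of matched column-index pairs."""
    A = W1 / np.linalg.norm(W1, axis=0)
    B = W2 / np.linalg.norm(W2, axis=0)
    S = A.T @ B                            # all pairwise cosines
    n = min(W1.shape[1], W2.shape[1])
    sims, pairs = [], []
    for _ in range(n):
        i, j = np.unravel_index(np.argmax(S), S.shape)
        sims.append(S[i, j])
        pairs.append((int(i), int(j)))
        S[i, :] = -np.inf                  # exclude matched row and column
        S[:, j] = -np.inf
    return float(np.mean(sims)), pairs
```

Because the scalar product of normalized vectors is scale-invariant, a set of synergies matched against a rescaled, column-permuted copy of itself yields a similarity of one.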

### *Statistical analysis*

Two-Way repeated measures ANOVAs (2 adaptation measures × 6 perturbation blocks) were conducted to detect significant differences between the time course of initial direction error and that of change of muscle PDs. Paired *t*-tests with Bonferroni correction were applied where appropriate.

# **RESULTS**

# **INITIAL DIRECTION ERROR CHANGES DURING VISUOMOTOR ROTATION**

In the baseline blocks subjects produced relatively straight trajectories (**Figure 2A**, Block 4) with bell-shaped velocity profiles (**Figure 2A**, *bottom row*). The change of the visuomotor mapping caused distorted trajectories (**Figure 2A**, Block 5) and multipeaked velocity profiles reflecting the corrective movements necessary to reach the target. After exposure to six blocks with the visuomotor rotation, subjects compensated (**Figure 2A**, Block 7) for the perturbation and were able to produce relatively straight movements to the target (**Figure 2A**, Block 10). Velocity profiles approached a single bell-shaped profile again. When the visuomotor rotation was removed (**Figure 2A**, Block 11), subjects showed aftereffects in the direction opposite to the perturbation, which were extinguished after three washout blocks (**Figure 2A**, Block 13).

We quantified the adaptation of all subjects by analyzing the initial movement direction error with respect to a straight line to the target at 100 ms after movement onset (**Figure 2B**, *right dotted vertical line*). Movement onset was defined as the time when the cursor speed exceeded a threshold of 5 cm/s (**Figure 2B**, *left dotted vertical line*) after the Go-signal had occurred. Across subjects, the initial movement direction showed a large CW deviation when the rotation was introduced (Block 5, **Figure 2C**) that was gradually reduced with practice (Blocks 5–10) and a large CCW deviation (aftereffect) once the perturbation was removed (Block 11). One-Way repeated measures ANOVA confirmed a significant difference of initial direction error (factor: block number, *F* = 5.04, *p* < 0.001). In the baseline blocks the initial direction error was small (Block 2: −1.86 ± 2.41◦, mean ± SD, Block 3: −1.44 ± 3.05◦; Block 4: −1.96 ± 2.64◦). At the beginning of the visuomotor perturbation the initial direction error (Block 5: −29.59 ± 8.42◦) was significantly different from the error in the last baseline block (*p* < 0.001, two-tailed, paired *t*-test) but approached baseline level by a gradual adaptation in subsequent blocks (Block 10: −11.35 ± 4.53◦, *p* = 0.002 with respect to Block 4). After the perturbation was removed, the initial direction error was significantly (Block 11: 18.62 ± 4.33◦, *p* < 0.001) higher than in the last baseline block, but approached baseline level at the end of the washout (Block 13: 7.02 ± 2.71◦, *p* < 0.001 with respect to Block 4). All comparisons remained significant after Bonferroni correction.

# **PREFERRED DIRECTION CHANGE OF MUSCLES MATCHES INITIAL DIRECTION ERROR CHANGE**

We tested if the PD of muscle directional tuning followed the change of initial direction error, as was observed in visuomotor rotation of isometric wrist movements (de Rugy, 2010). We found that not all recorded muscles were cosine-tuned. Some muscles showed peaks of activity in multiple directions and their tuning was not captured by a single cosine function. We therefore excluded all muscles which were not significantly cosine-tuned in the reference baseline blocks (see **Table 1** for *R*<sup>2</sup> values of the cosine fit) for this analysis as their PDs did not characterize their directional tuning reliably.

**Figure 3A** shows an example of a typical change of the PD of a cosine-tuned muscle (*blue tuning curves*, PectMajClav) with respect to the PD of the reference blocks (*gray*, Blocks 2–4). Several muscles were found not to be cosine-tuned, for example DeltA shown in **Figure 3A** (*second row*). The time-course of the PD of DeltA across blocks would indicate a larger PD change in the last baseline block (Block 4) than in the first visuomotor rotation block (Block 5) with respect to the reference blocks. Considering only cosine-tuned muscles, the PDs closely followed the change in force direction error, as shown for three muscles and the overall mean across muscles and subjects, respectively, in **Figure 3B**. A Two-Way repeated measures ANOVA comparing the mean change of the muscle PDs across subjects and muscles with the mean initial direction error revealed neither a significant difference between the two measures (adaptation measure × perturbation block, *F* = 0.73, *p* = 0.394) nor a significant interaction (*F* = 0.34, *p* = 0.889).

# **ROBUSTNESS OF SYNERGY STRUCTURE**

Given the close relationship between muscles and forces we tested whether the adaptation process could be explained by fixed muscle synergies being recruited with PDs rotating together with the muscle PDs. We extracted synergies from each block from the EMG signals beginning at movement onset until the time point at which the target was successfully acquired. **Figure 4A** shows the fraction of data variation explained by the extracted synergies for the last baseline block (Block 4, *black*) and the first perturbation block (Block 5, *gray*). The number of synergies selected according to a 90% threshold was not significantly (*p* = 0.598, two-tailed paired *t*-test) different between the two blocks (Block 4: 4.75 ± 0.71 synergies, Block 5: 4.62 ± 0.52 synergies). When considering all blocks, the minimum number of synergies explaining more than 90% of the data variation was not significantly different over time, as revealed by a One-Way repeated measures ANOVA (factor: block number, *F* = 1.68, *p* = 0.093).

The structure of the synergies in most cases was similar across subjects and blocks. **Figure 4B** compares the synergies extracted

from Block 4 (black bars) with those extracted from Block 5 (white bars) for each subject. We compared a number of synergies in Block 4 and 5 equal to the number of synergies extracted from the pooled data of Blocks 2, 3, and 4 ("reference synergies," see **Table 1**). We identified the best matching pairs of synergies according to the similarity quantified by the cosine between the synergy vectors and plotted them side-by-side (**Figure 4B**, cosine values are shown with a gray shaded background). In general, the similarity of synergies from Block 4 and 5 across subjects, as indicated by a cosine value close to one, was high (mean ± SD of similarity: 0.95 ± 0.06, range: 0.54–0.99). However, 2 out of 36 pairs had a similarity <0.8 (S1: second pair, similarity 0.54; S6: second pair, similarity 0.71) and an additional pair had a similarity <0.9 (S7, fifth pair, similarity 0.88). Additional analysis of the synergies in such pairs indicated that they were similar either to one of the synergies in the reference set with the same number of synergies (S6: similarity 0.93 and 0.90 for the synergies extracted from Block 4 and 5 respectively; S1: similarity 0.98 for the synergy extracted from Block 5; S7: similarity 0.93 for the synergy extracted from Block 5) or to one of the synergies in the reference set with an additional synergy (S1: similarity 0.95 for the synergy extracted from Block 4). In one case (S7), the sets of 4 synergies extracted from Blocks 4 and 5 had a much higher similarity (mean 0.97, minimum 0.94) than the sets with the same number of synergies as the reference synergies (5 for S7). These observations suggest that the identification of the synergies from individual blocks is affected by noise and inter-trial variability more than the identification of synergies from the pooled data of all three baseline blocks. 
Moreover, even if the data of a specific block were best captured by a synergy that did not match closely any synergies in a different block, such synergy might only have captured a very small amount of variation in the data.
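The pairing of best-matching synergies by cosine similarity can be sketched as follows. The exact matching procedure is not specified in this excerpt, so a greedy best-first match over the cosine-similarity matrix is assumed; shapes and names are illustrative.

```python
import numpy as np

def match_synergies(w_a, w_b):
    """Greedily pair rows of w_a and w_b (n_synergies x n_muscles)
    by decreasing cosine similarity; returns (i, j, cosine) triples."""
    cos = (w_a @ w_b.T) / np.outer(np.linalg.norm(w_a, axis=1),
                                   np.linalg.norm(w_b, axis=1))
    pairs, used_a, used_b = [], set(), set()
    for _ in range(min(len(w_a), len(w_b))):
        # pick the highest-cosine pair among still-unmatched synergies
        i, j = max(((i, j) for i in range(len(w_a)) for j in range(len(w_b))
                    if i not in used_a and j not in used_b),
                   key=lambda ij: cos[ij])
        pairs.append((i, j, float(cos[i, j])))
        used_a.add(i)
        used_b.add(j)
    return pairs

# Sanity check: a permuted copy of a synergy set should match perfectly
rng = np.random.default_rng(1)
w = rng.random((4, 13))
perm = [2, 0, 3, 1]
pairs = match_synergies(w, w[perm])
```

Greedy matching is the simplest choice; an optimal assignment (e.g., the Hungarian algorithm) gives the same result whenever, as here, each synergy has one clearly dominant match.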

Thus, we quantified the stability of the subspace spanned by the reference synergies by assessing how well they could reconstruct the muscle patterns of all other blocks. The similarity of the reconstructed muscle patterns (obtained by multiplying the synergies extracted from the reference blocks with synergy coefficients fitted to the data of each block) to the actual muscle patterns of each block was quantified as an *R*<sup>2</sup> value. The high *R*<sup>2</sup> values (range: 0.72–0.94, mean ± SD: 0.88 ± 0.04) indicate that muscle patterns during adaptation to visuomotor rotation are selected from a stable muscle subspace (**Figure 5A**). However, there was a small but consistent decrease of *R*<sup>2</sup> values during the experiment, possibly reflecting small changes in elbow position or fatigue.

To exclude that the high *R*<sup>2</sup> values were obtained by chance, we attempted to reconstruct the data using random synergies. We repeated the reconstruction with 30 sets of random synergies. In all blocks the *R*<sup>2</sup> value obtained with random synergies was significantly smaller (all *p* < 0.001, two-tailed paired *t*-tests, *gray bars* in **Figure 5B**) than the reconstruction using synergies extracted from the reference blocks (**Figure 5B**, *black bars*).
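This reconstruction test can be sketched as follows: the synergy matrix is held fixed, only the recruitment coefficients are fitted to each sample (here by non-negative least squares, an assumption consistent with non-negative synergy models), and the *R*² of the reconstruction is compared with that obtained from random synergies. All shapes and the uniform-random null are illustrative.

```python
import numpy as np
from scipy.optimize import nnls

def r2_fixed_synergies(emg, synergies):
    """Fit non-negative coefficients per sample with the synergies held
    fixed, and return the R^2 of the reconstruction.
    emg: (n_samples, n_muscles); synergies: (n_syn, n_muscles)."""
    recon = np.vstack([nnls(synergies.T, m)[0] @ synergies for m in emg])
    ss_res = np.sum((emg - recon) ** 2)
    ss_tot = np.sum((emg - emg.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(2)
w = rng.random((3, 13))                    # "reference" synergies
emg = rng.random((50, 3)) @ w              # data generated from them
r2_ref = r2_fixed_synergies(emg, w)        # near 1 by construction
r2_rand = r2_fixed_synergies(emg, rng.random((3, 13)))
```

Because only the coefficients are refitted, a high *R*² certifies that the new data lie in the subspace spanned by the fixed synergies, which is the robustness claim being tested.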

We also assessed the similarity between the synergies extracted from the reference blocks and those extracted from all blocks. In all subjects and blocks, the mean normalized scalar product between the best matched pairs of synergies was close to one (**Figure 5C**), indicating a high similarity. Across subjects, in all blocks the similarity (**Figure 5D** *black bars*, range: 0.69–0.99, mean ± SD: 0.92 ± 0.07) was significantly higher than that between random synergies (*gray bars*, *p* < 0.01).

### **ADAPTATION BY ROTATION OF SYNERGY PREFERRED DIRECTION**

Given the stability of muscle synergies during visuomotor adaptation, we expected the change in the PDs of the synergy coefficients to closely match the change in the directional error of the initial force and the change in the PDs of the muscles. **Figure 6A** shows an example of the directional tuning and the PDs of the synergy coefficients for subject 3 (*blue tuning curves*) with respect to the tuning and PDs in the reference blocks (*gray*, Blocks 2–4). For subject 3, the PD change was similar across synergies, with a −1.43 ± 2.51◦ (mean ± SD) deviation in the last baseline block (Block 4), an initial deviation of −34.91 ± 0.85◦ at the beginning of the visuomotor rotation (Block 5), a gradual reduction of the PD change (−9.67 ± 4.03◦, Block 10), aftereffects at the beginning of the washout phase (19.79 ± 11.43◦, Block 11), and a gradual return to baseline (7.52 ± 6.49◦, Block 13) at the end of the experiment (**Figure 6B**). Across subjects, the mean PD change of the synergy coefficients (**Figure 6C**, blue traces) was 1.06 ± 4.53◦ in Block 4, −26.95 ± 5.54◦ in Block 5, −9.97 ± 6.02◦ in Block 10, and 6.16 ± 7.57◦ in Block 13, similar to the change in the PDs of the muscles.
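A preferred direction of this kind can be estimated by fitting a cosine tuning curve a + b·cos(θ − PD) to the per-target activations, which reduces to linear regression on cos θ and sin θ. The eight-target layout follows the center-out task; the fitting code and the synthetic tuning curve below are illustrative.

```python
import numpy as np

def preferred_direction(thetas, activations):
    """Fit a + b*cos(theta - PD) by linear regression on cos/sin
    regressors; thetas in radians, PD returned in degrees."""
    X = np.column_stack([np.ones_like(thetas), np.cos(thetas), np.sin(thetas)])
    _, b_cos, b_sin = np.linalg.lstsq(X, activations, rcond=None)[0]
    # b_cos = b*cos(PD), b_sin = b*sin(PD)  ->  PD = atan2(b_sin, b_cos)
    return np.degrees(np.arctan2(b_sin, b_cos))

thetas = np.radians(np.arange(0, 360, 45))          # 8 target directions
act = 1.0 + 0.8 * np.cos(thetas - np.radians(60))   # synthetic tuning, PD = 60 deg
pd_deg = preferred_direction(thetas, act)
```

Repeating this fit per block and subtracting the baseline PD yields the per-block PD changes reported above.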


**FIGURE 6 | Tuning curves and preferred directions of synergy coefficients. (A)** Example of tuning curves for the four synergies of subject 3 estimated from different blocks (*blue* markers and lines) compared to the tuning curves calculated from the pooled data of all baseline blocks (reference blocks, *gray* markers and lines). **(B)** Mean PD-change of synergy coefficients for subject 3 for all blocks. **(C)** Grand mean of the PD-change across synergies and subjects (*blue*) and initial direction angle error (*red*) for all blocks. The PD change of the synergy coefficients was not statistically different from the initial direction angle error.

### **DISCUSSION**

We investigated the changes in the muscle patterns underlying visuomotor adaptation in a virtual reaching task requiring the generation of multidirectional isometric forces with the arm. We found that the changes in the PDs of most of the muscles closely followed the change in force direction required to compensate for the perturbation, suggesting that the adaptive process relies on remapping target directions into new planned force output directions. We then tested whether a given force output is generated, during adaptation to a novel rotation of the force-to-cursor mapping and after re-adaptation to the normal mapping, by the same set of muscle synergies that capture the muscle patterns in the baseline condition. We found that the number and the structure of the synergies were robust throughout adaptation and re-adaptation. In each subject, the four or five synergies extracted in the baseline condition could reconstruct the muscle patterns recorded before, during, and after the visuomotor rotation with a comparable level of accuracy, and their mean similarity with the synergies extracted from individual blocks throughout the experiment was significantly higher than the mean similarity between random synergies. We also found that the change in the PDs of the synergy recruitment closely matched the change in the PDs of the individual muscles.

Many studies of motor adaptation after dynamic or visuomotor perturbation have used center-out reaching tasks in which the motion of the hand is either directly perturbed, by viscous force fields generated by a robotic device (Shadmehr and Mussa-Ivaldi, 1994) or by Coriolis forces arising in a rotating room (Lackner and Dizio, 1994), or mapped into a virtual end-effector, as with visuomotor rotation of a cursor on a computer screen (Ghilardi et al., 1995; Imamizu et al., 1995; Krakauer et al., 2000). Only in a few cases has isometric force at the hand been used as motor output instead of hand motion to investigate visuomotor adaptation (Hinder et al., 2007; de Rugy et al., 2009; de Rugy and Carroll, 2010). With isometric force, because the posture of the arm is fixed, there is no need for visual-proprioceptive recalibration after the perturbation. In contrast, the adaptive response to a visuomotor rotation of the movement of a cursor associated with actual hand movement involves both sensorimotor remapping and sensory recalibration, i.e., alignment of the felt and seen position of the hand at the end of the movement (Simani et al., 2007). Moreover, with isometric force as motor output there is no need to increase limb impedance by increasing muscle co-contraction to stabilize the hand trajectory immediately after the perturbation, i.e., before it is compensated in the feed-forward motor command. Increases in muscle activation have been reported after dynamic perturbations (Thoroughman and Shadmehr, 1999; Franklin et al., 2003) but also after visuomotor rotation (Paz et al., 2003). In contrast, we did not observe a significant increase in muscle activation, in accordance with a previous study of visuomotor rotation in an isometric reaching task (de Rugy and Carroll, 2010).

Stability of the relationship between directional tuning of the muscles and force has been observed before in macaque monkeys and human subjects after both visuomotor and dynamic perturbations. In monkeys performing a reaching task by moving a joystick that controlled a cursor on a video screen, most muscles recorded in the shoulder, neck and trunk showed clear PDs which were stable in motor coordinates during adaptation to visuomotor rotations (Wise et al., 1998). In humans adapting to viscous force fields, i.e., to velocity dependent forces applied perpendicularly to the hand movement direction, the peak of EMG activity of two pairs of shoulder and elbow muscles counteracting the perturbation gradually shifted earlier in the reaching movement, becoming a feed-forward command, and the EMG tuning curves gradually rotated by an amount specific to the force field (Thoroughman and Shadmehr, 1999). Similarly in monkeys, during adaptation to force fields the PDs of shoulder and elbow muscles rotated in the direction of the external force and returned to baseline when the perturbation was removed (Li et al., 2001), indicating a stable relationship between muscle directional tuning and generated force. In an isometric virtual reaching task in which wrist flexion/extension and radial/ulnar deviation forces generated by human subjects were mapped, respectively, into horizontal and vertical movements of a cursor on a vertically mounted computer screen, the changes in the directional tuning of four wrist muscles closely matched the rotation of the directional error in force after a 45◦ visuomotor rotation, indicating that the functional contribution of muscles remained consistent during adaptation (de Rugy and Carroll, 2010). 
Thus, our observations on the stability of the relationship between the directional tuning of cosine-tuned muscles and force are in accordance with these previous findings, but they are reported for the first time for adaptation to a visuomotor rotation of the isometric force generated by a large number of arm muscles.

Importantly, subjects were not informed about the kind of perturbation (a consistent rotation of the force by a fixed angle) they would experience in the experiment. This kind of adaptation is likely to occur implicitly when the desired hand trajectory and the executed trajectory in visual space do not match (Mazzoni and Krakauer, 2006; Krakauer, 2009), possibly by reducing a prediction error computed by a forward model (Shadmehr et al., 2010). A question that arises in this context is why the nervous system does not exploit the redundancy inherent in the neuromuscular system to compensate for the perturbation by reducing the force error at the level of the force components generated by individual muscles. Indeed, it would have been possible to compensate for the perturbation by adapting the activation of each muscle to reduce the error between the target force and the muscle force, possibly ending with a muscle pattern in the adapted state different from the muscle pattern selected to generate the rotated force before the perturbation. Despite the theoretical flexibility of the mapping from muscles to forces, i.e., many different muscle patterns can generate the same forces (Kutch and Valero-Cuevas, 2012), adaptation of individual muscles did not appear to be used to compensate for the distorted visuomotor mapping. In contrast, the relationship between muscles and forces remained fixed, indicating that subjects adapted to the perturbation by rotating the forces ("aiming in a different direction") from the outset. An explanation could be that adaptation to this kind of perturbation occurs early in the sensorimotor transformations mapping visual targets into muscle patterns, by adapting a single learning parameter (force direction) and thereby producing a coordinated rotation of all the muscle PDs.
Adapting only a few high-level parameters may be computationally advantageous, but it might also be necessary if the generation of muscle patterns is not as flexible as theoretically possible. The characteristics of the connectivity between different areas of the motor system might prevent the nervous system from adapting the recruitment of individual muscles to compensate for visuomotor perturbations. Divergence from premotor neurons to many muscles, and convergence from many premotor neurons onto a single muscle (Graziano, 2006; Rathelot and Strick, 2006), might underlie the organization of muscle synergies in the primary motor cortex (Gentner et al., 2010; Overduin et al., 2012) and in the spinal cord (Hart and Giszter, 2010).

We therefore considered whether the visuomotor adaptation process is compatible with muscle synergies. Muscle synergies have recently been identified during isometric force production (Roh et al., 2012). However, to our knowledge, a direct test of the robustness of muscle synergies during visuomotor adaptation had never been conducted. The structure of the synergies and their number appeared to be similar across subjects and blocks. During adaptation, the coefficients of cosine-tuned synergies rotated almost identically to the PDs of individual cosine-tuned muscles, although some synergies contained contributions of non-cosine-tuned muscles. Therefore, isometric visuomotor adaptation can be equally well described by rotation of forces, by muscle-PD changes, and by PD changes of synergy coefficients. Moreover, as adaptation may involve components with different learning rates (Smith et al., 2006), the analysis of muscle synergies may allow different components of the adaptive process to be dissociated. A rapidly adapting component may be related to the adjustment of synergy coefficients, e.g., rotating their recruitment to compensate for a visuomotor rotation. A slower component may be involved in the acquisition of new synergies or in changing the structure of existing ones. In isometric visuomotor adaptation we found only a fast learning component, as there is no need to alter the synergies. In contrast, we recently found that adaptation to novel perturbations that cannot be compensated by adapting the recruitment of existing synergies, but require new or altered synergies, is slower than adaptation to similar perturbations compatible with the synergies (Berger et al., 2013). Testing such perturbations has provided new direct evidence for a synergistic organization (d'Avella and Pai, 2010).

To assess the robustness of the synergies during adaptation, we evaluated the quality of reconstruction of the muscle patterns recorded throughout the experiment by the synergies extracted from the baseline condition (reference blocks), and their similarity with the same number of synergies extracted from individual blocks. For each subject, the number of synergies was selected as the minimum number sufficient to explain at least 90% of the data variation. While criteria based on a threshold on the variation accounted for (VAF) have been frequently used in the muscle synergy literature (Tresch et al., 1999; Ting and Macpherson, 2005; Torres-Oviedo et al., 2006), other criteria have been proposed, based on the detection of a "knee" in the curve of the VAF as a function of the number of synergies (d'Avella et al., 2003; Cheung et al., 2005; Tresch et al., 2006) or on a combination of VAF threshold and knee (Berger et al., 2013). All these criteria depend on some threshold that must be chosen *ad hoc*. Recently, a new criterion based on decoding single-trial task parameters from synergy coefficients has been proposed (Delis et al., 2013). Such a criterion does not depend on *ad hoc* parameters, but it can only be applied to synergies extracted from a large number of repetitions (>10) of the same experimental condition. Thus, selection criteria for synergies extracted from averaged data or from a limited number of repetitions, as in our case, mainly guarantee that the number of synergies can be meaningfully compared across different conditions and subjects, rather than ensuring that the "true" number of synergies has been selected. Moreover, to simplify the assessment of synergy similarity, we compared the same number of synergies extracted from the reference blocks and from individual blocks.
An alternative approach would have been to select a different number of synergies for each block according to the VAF criterion and to assess both the similarity between the pairs formed with the smaller of the two synergy sets and the dimensionality of each set. However, as the VAF criterion is affected by noise, we preferred to rely only on the number of synergies selected from the three baseline blocks (reference blocks) rather than on the number selected from individual blocks. An incorrect identification of the number of synergies in a single block might in fact significantly affect the mean similarity, as a set with an additional synergy often contains two synergies resulting from the splitting of one of the synergies in the original set (d'Avella et al., 2003), and both synergies can have low similarity with the original one. In any case, the *R*<sup>2</sup> measure of synergy subspace robustness does not depend on the number of synergies selected for the individual blocks, as it is based on the reconstruction of the actual data of each block.
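The "knee" criterion mentioned above admits several operationalizations; one simple sketch (an assumption, not the exact definition used in the cited papers) takes the point of the VAF curve farthest from the straight line joining its endpoints.

```python
import numpy as np

def knee_point(vaf):
    """vaf[k] is the VAF obtained with k+1 synergies. Returns the number
    of synergies at the point farthest from the endpoint-to-endpoint chord."""
    x = np.arange(1, len(vaf) + 1, dtype=float)
    dx, dy = x[-1] - x[0], vaf[-1] - vaf[0]
    # perpendicular distance from each point to the chord
    dist = np.abs(dy * (x - x[0]) - dx * (vaf - vaf[0])) / np.hypot(dx, dy)
    return int(x[np.argmax(dist)])

# Illustrative VAF curve that saturates after three synergies
knee = knee_point(np.array([0.55, 0.75, 0.90, 0.93, 0.95, 0.96]))
```

Like the threshold criterion, this heuristic has a free choice baked in (here, the chord construction), which is the point being made in the text: all such criteria involve an *ad hoc* element.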

In summary, our results indicate that muscle synergy structure is robust during visuomotor adaptation and that the required changes in the muscle patterns are obtained by rotating the directional tuning of the synergy recruitment. Visuomotor adaptation may occur by remapping desired end-effector movement into synergy coefficients. Further experiments are required to identify synergies as a physiological correlate of motor learning.

# **ACKNOWLEDGMENTS**

This work was supported by the Human Frontier Science Program Organization (RGP11/2008), the European Community's Seventh Framework Programme (FP7/2007- 2013—Challenge 2—Cognitive Systems, Interaction, Robotics, grant agreement No 248311-AMARSi), the Canada Research Chairs Program, the Natural Sciences and Engineering Research Council of Canada, Canada Foundation for Innovation, the Canadian Institutes of Health Research, NIH, and the Peter Wall Institute for Advanced Studies. We thank D. Borzelli for help with the illustration of the apparatus.




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 17 March 2013; accepted: 15 August 2013; published online: 03 September 2013.*

*Citation: Gentner R, Edmunds T, Pai DK and d'Avella A (2013) Robustness of muscle synergies during visuomotor adaptation. Front. Comput. Neurosci. 7:120. doi: 10.3389/fncom.2013.00120*

*This article was submitted to the journal Frontiers in Computational Neuroscience.*

*Copyright © 2013 Gentner, Edmunds, Pai and d'Avella. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Physiological modules for generating discrete and rhythmic movements: action identification by a dynamic recurrent neural network

# *Ana Bengoetxea1,2\*, Françoise Leurs 1, Thomas Hoellinger 1, Ana M. Cebolla1, Bernard Dan1,3, Joseph McIntyre4,5 and Guy Cheron1,6*

*<sup>1</sup> Laboratoire de Neurophysiologie et Biomécanique du Mouvement, Faculté des Sciences de la Motricité, Université Libre de Bruxelles, Brussels, Belgium*

*<sup>2</sup> Laboratorio de Cinesiología y Motricidad, Departamento de Fisiología, Facultad de Medicina y Odontología, Universidad del País Vasco-Euskal Herriko Unibertsitatea (UPV/EHU), Leioa, Spain*

*<sup>3</sup> Département de Neurologie, Hôpital Universitaire des Enfants Reine Fabiola, Université Libre de Bruxelles, Brussels, Belgium*

*<sup>4</sup> Health Division, Fondation Tecnalia Research and Innovation, San Sebastian, Spain*

*<sup>5</sup> IKERBASQUE – Basque Foundation for Science, Bilbao, Spain*

*<sup>6</sup> Laboratoire d'Électrophysiologie, Université de Mons-Hainaut, Mons, Belgium*

### *Edited by:*

*Yuri P. Ivanenko, IRCCS Fondazione Santa Lucia, Italy*

### *Reviewed by:*

*Juan C. Moreno, Spanish National Research Council, Spain*

*Thierry Pozzo, Institut National de la Santé et de la Recherche Médicale, France*

### *\*Correspondence:*

*Ana Bengoetxea, Laboratoire de Neurophysiologie et Biomécanique du Mouvement, Faculté des Sciences de la Motricité (C.P. 640), Université Libre de Bruxelles, 808, Route de Lennik, 1070 Bruxelles, Belgium*

*e-mail: abengoec@ulb.ac.be*

In this study we employed a dynamic recurrent neural network (DRNN) in a novel fashion to reveal characteristics of the control modules underlying the generation of muscle activations when drawing figures with the outstretched arm. We asked healthy human subjects to perform four different figure-eight movements in each of two workspaces (frontal plane and sagittal plane). We then trained a DRNN to predict the movement of the wrist from information in the EMG signals from seven different muscles. We trained different instances of the same network on a single movement direction, on all four movement directions in a single movement plane, or on all eight possible movement patterns, and looked at the ability of the DRNN to generalize and predict movements for trials that were not included in the training set. Within a single movement plane, a DRNN trained on one movement direction was not able to predict movements of the hand for trials in the other three directions, but a DRNN trained simultaneously on all four movement directions could generalize across movement directions within the same plane. Similarly, the DRNN was able to reproduce the kinematics of the hand for both movement planes, but only if it was trained on examples performed in each one. As we will discuss, these results indicate that there are important dynamical constraints on the mapping of EMG to hand movement that depend both on the time sequence of the movement and on the anatomical constraints of the musculoskeletal system. In a second step, we injected EMG signals constructed from different synergies derived by principal component analysis (PCA) in order to identify the mechanical significance of each of these components. From these results, one can surmise that discrete-rhythmic movements may be constructed from three different fundamental modules: one regulating the co-activation of all muscles over the time span of the movement and two others eliciting patterns of reciprocal activation operating in orthogonal directions.

**Keywords: rhythmic movement, muscular synergy, dynamic recurrent neuronal network, principal component analysis, upper limb, figure-eight**

# **INTRODUCTION**

The concept of synergy, associated with basic motor modules of activity, refers to two distinct notions. On the one hand, the large variety of movements accomplished by a limb could be explained by the activation of a reduced number of muscular synergies (Saltiel et al., 2001; Ivanenko et al., 2004; d'Avella et al., 2006). On the other hand, for a given movement, the establishment by the central nervous system (CNS) of synchronous muscular synergies could explain how activity is distributed within a muscle group (Weiss and Flanders, 2004; d'Avella and Bizzi, 2005; Klein Breteler et al., 2007). The first notion gives rise to a simplification in the number of degrees of freedom to be controlled by the CNS for motor control while the second one links modules of activity presented by limb muscles and their functional meaning in the context of motor action. Non-invasive recording of the electromyographic (EMG) signals are widely used to extract muscular synergies (d'Avella et al., 2003, 2008; Ivanenko et al., 2004; Klein Breteler et al., 2007; Cheung et al., 2012; Frère and Hug, 2012). These muscular synergies seem to be structured in the brain stem and spinal cord (Cheung et al., 2009; Clark et al., 2010) and even in the motor cortex for highly skilled movements (Gentner and Classen, 2006; Rathelot and Strick, 2006).

Most attempts to define muscular synergies to date have relied on tools such as principal component analysis or other forms of factor analysis that extract stable relationships (structure) between the activation patterns of multiple muscles. These techniques do not, however, serve to identify structure in the mapping of EMG inputs to the actual motor output (e.g., movement of the hand). In this context, the use of dynamic recurrent neural networks (DRNN) to interpret biological signals coming from the human body could be an interesting complementary approach for extracting modules underlying the input-output relationship between muscle activation patterns and movement, where the input signals consist of the EMG signals provided by the different muscles implicated in the movement and the output signals of the DRNN are the movement kinematics. Our proposition is that using a DRNN to map EMGs to kinematics can provide a new, indirect method to better understand motor organization in the CNS, for reasons that we will lay out in the following paragraphs.

DRNNs are recognized as universal approximators of dynamical systems (Kuan and Hornik, 1991; Doya, 1996; Yi et al., 2006; Tani et al., 2008; Bicho et al., 2011; Laje and Buonomano, 2013), and the attractor states reached through DRNN learning of EMG-to-kinematic patterns correspond to biologically interpretable solutions (Cheron et al., 1996, 2003, 2006, 2007, 2011; Song and Tong, 2005; Liu and Buonomano, 2009). After the learning phase, the identification performed by the DRNN offers a dynamic memory which has been used, for example, to recognize the physiological preferred direction of action of the studied muscles (Cheron et al., 1996, 2003, 2006, 2007). But the correct recognition by a trained DRNN of EMG patterns not included in the training set may also be related to motor learning, as shown by the following example. When humans learn a specific movement, the initial solutions acquired through self-organizing principles are often unstable and become more stable with practice. This feature is apparent in the study by Dominici et al. (2011), which demonstrated that the development of motor patterns from neonates to toddlers consists of learning new muscle synergies, adding new patterns to the few basic patterns already present at birth. When a DRNN was applied to EMG and kinematic data acquired from infants and toddlers (Cheron et al., 2001), we showed that only when behaviors have been practiced sufficiently by the children, and when the task and the context are unchanging, were the emerging patterns sufficiently stable to allow the DRNN to generalize (Cheron et al., 2011). Thus, the ability of a DRNN to generalize across movements is a reflection of the stability and maturity of the underlying building blocks. Here we apply a similar concept to analyze a different question: how the CNS generalizes the task of programming movements across different kinematic and biomechanical conditions.
We hypothesize that the CNS accomplishes this task by exploiting modules to simplify the computation of the motor command. If this hypothesis is valid, then application of the DRNN can be used to characterize which modules are stable across varying situations.

Taking into account Bernstein's theory of motor control (Bongaardt, 2001) where the motor program (also called engram) used to generate a movement is organized at a higher level in the CNS while the details of motor action (also called ecphoria) are selected at a lower level, we can consider that in the EMG command one can find a mixture of the higher (topological) and lower (metrics) aspects of motor action. If we extract the "synchronous synergies" for a given movement, each module could contain different levels of information ranging from the general (what is the form to be reproduced by the hand) to the more specific (what are the joint displacements and muscle activations used to generate the movement of the hand). Applied to the analysis of a drawing movement, such as a figure-eight, we should find in the EMG signals information corresponding to a generalized "figure-eight" motor program mixed with the information corresponding to the specific aspects of motor execution, such as the movement's velocity, amplitude, joint configuration and biomechanical constraints.

We chose to study figure-eight movements. These gestures require the displacement of the end-effector segment through all the directions within the plane of the figure. Note, however, that starting from the central point one can perform this figure with one of four different initial directions. Given the fact that EMG patterns are modulated by movement direction in 3D space (Flanders et al., 1994, 1996; Hoffman and Strick, 1999), forcing the DRNN to converge to any one of these four patterns of movement should create an attractor state that reflects the directional tuning of synergies within the workspace. Given also that a muscle's activation depends on its mechanical action, which in turn depends on joint configuration (Hogan, 1985; Buneo et al., 1997), a DRNN that converges to the four figure eights realized in one part of the joint workspace may or may not recognize muscle activities when the same movements are performed in a different workspace region. Finally, considering that the precise structures of some muscle synergies are subject-specific (Torres-Oviedo and Ting, 2010), a DRNN trained with all the figure-eight movements of one subject may or may not detect the synergy tuning of another, depending on how stable the underlying modules are across subjects. We therefore set out to measure the ability of a DRNN to learn and recognize movements from EMG signals for figure-eight movements performed in different directions and in different parts of the workspace, as a means of assessing the invariance of movement modules or primitives across a variety of movement conditions. We also used the DRNN in a novel fashion to identify the physical manifestation, in terms of hand kinematics, of the synchronous synergies (d'Avella and Bizzi, 2005; Klein Breteler et al., 2007) identified by principal component analysis in our previous study (see companion paper, this issue).

# **MATERIAL AND METHODS**

Data were collected from five right-handed subjects aged between 21 and 40 years. All were in good health, free from known neurological disorders, and had given informed consent to take part in the study, which was approved by the local ethics committee. They were asked to draw, as fast as possible, two series of figure-eight movements in free space with the right arm fully extended at the elbow (for more details see Bengoetxea et al., 2010). Movements were initiated in the center of the figure with an initial up-right (UR), down-right (DR), up-left (UL) or down-left (DL) direction with respect to external coordinates. Three subjects performed the task in both the frontal and sagittal workspaces (in separate sets of trials) depending on the flexion or abduction posture of the shoulder. Two additional subjects (subjects 4 and 5) performed the movements only in the frontal workspace.

# **DATA ACQUISITION**

Movements of the index finger were recorded and analyzed using the optoelectronic ELITE system (2 CCD cameras, sampling rate of 100 Hz; BTS, Milan; Ferrigno and Pedotti, 1985). The cameras were placed 4 m apart from each other and 4 m from the subject. Four markers were attached to the arm (on the acromion, the lateral condyle of the humerus, the radial apophysis of the wrist and the index finger). Velocity signals were obtained by digitally differentiating position signals using a fifth-order polynomial approximation. Reconstruction of the arm movements by the ELITE system using the trajectories of the 4 markers confirmed the visual observation that the upper arm, forearm, hand and index finger acted as a rigid link (Bengoetxea et al., 2010). Thus, we analyzed here only the marker on the index finger that was used to trace the figure-eight.
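The differentiation step can be sketched with a Savitzky-Golay filter, which fits a local polynomial (here of fifth order, matching the text) and evaluates its derivative; the window length below is an assumed illustrative choice, not a parameter reported in the paper.

```python
import numpy as np
from scipy.signal import savgol_filter

FS = 100.0  # ELITE sampling rate, Hz

def velocity(position, fs=FS, window=21, polyorder=5):
    """Differentiate a position signal with a local fifth-order
    polynomial fit (Savitzky-Golay, first derivative)."""
    return savgol_filter(position, window, polyorder, deriv=1, delta=1.0 / fs)

# Sanity check against a known trajectory: d/dt sin(2*pi*t) = 2*pi*cos(2*pi*t)
t = np.arange(0, 2, 1 / FS)
vel = velocity(np.sin(2 * np.pi * t))
```

Local polynomial differentiation suppresses the high-frequency noise that naive finite differences would amplify, which is why it is preferred for marker-based kinematics.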

Surface electromyographic activity (EMG) was recorded with the TELEMG system (BTS, Milan) synchronized with the kinematic data. Silver-silver chloride electrode pairs (inter-electrode distance of 2.5 cm) were placed over the belly of the following 7 muscles: posterior deltoid (PD), anterior deltoid (AD), median deltoid (MD), pectoralis major superior and inferior (PMS and PMI), latissimus dorsi (LD), and teres major (TM). Raw EMG signals (differential detection) were amplified by a portable unit with a gain of 1000 and transmitted to the main unit via a telemetry system (Telemg, BTS). A functional resistance test that isolated specific muscles was performed to verify the absence of cross-talk between adjacent muscles. Thereafter, EMGs were band-pass filtered (10–500 Hz), digitized at 1 kHz, full-wave rectified and smoothed by means of a third-order averaging filter with a time constant of 20 ms (Hof and Van den Berg, 1981).
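The rectify-and-smooth stage can be illustrated as follows. This is a sketch, not the exact Hof and Van den Berg (1981) filter: approximating the third-order averaging filter by three cascaded 20 ms moving averages is an assumption about the filter's realization.

```python
import numpy as np

def emg_envelope(raw, fs=1000.0, tau_ms=20.0, passes=3):
    """Full-wave rectify a raw EMG trace and smooth it with `passes`
    cascaded moving averages of width `tau_ms` (a sketch of a third-order
    averaging filter with a 20 ms time constant)."""
    rectified = np.abs(np.asarray(raw, dtype=float))   # full-wave rectification
    width = max(1, int(round(tau_ms * fs / 1000.0)))   # 20 ms -> 20 samples at 1 kHz
    kernel = np.ones(width) / width
    envelope = rectified
    for _ in range(passes):                            # third-order: three passes
        envelope = np.convolve(envelope, kernel, mode="same")
    return envelope
```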

# **DYNAMIC RECURRENT NEURAL NETWORK**

We used a DRNN that consisted of 50 fully connected hidden neurons, 7 input neurons and 2 output neurons. The network included a looping mechanism (fully connected structure) that enables it to learn and store information (memory). This feature allows the network to model complex situations with multiple influences. This particular DRNN structure has varying time constants as well as varying weights for the artificial neurons. The adaptive time constants make the DRNN dynamic and therefore capable of modeling time-varying inputs and outputs.

The DRNN was governed by the following equation:

$$T_i \frac{dy_i}{dt} = -y_i + F(x_i) + I_i \tag{1}$$

where *F*(α) is the squashing function

$$F(\alpha) = \left(1 + e^{-\alpha}\right)^{-1},$$

*yi* is the state or activation level of unit *i*, *Ii* is an external input (or bias), and *xi* is given by

$$x_i = \sum_j w_{ij} \, y_j, \tag{2}$$

which is the propagation equation of the network (*xi* is called the total or effective input of the neuron, and *wij* is the synaptic weight between units *i* and *j*). The time constant *Ti* acts like a relaxation process, allowing a more complex dynamical behavior and improving the non-linearity effect of the sigmoid function (Cheron et al., 1996; Draye et al., 1996, 1997). In order to make the temporal behavior of the network explicit, an error function is defined as

$$E = \int\_{t\_0}^{t\_1} q\left(\boldsymbol{y}\left(t\right), t\right) dt\tag{3}$$

where *t*<sub>0</sub> and *t*<sub>1</sub> give the time interval during which the correction process occurs. The function *q*(*y*(*t*), *t*) is the cost function at time *t*, which depends on the vector of the neuron activations *y* and on time *t*. We then introduce new variables *pi* (called adjoint variables) that are determined by the following system of differential equations:

$$\frac{dp_i}{dt} = \frac{1}{T_i} p_i - e_i - \sum_j \frac{1}{T_j} w_{ji} \, F'(x_j) \, p_j \tag{4}$$

with boundary conditions *pi*(*t*1) = 0. After the introduction of these new variables, we can derive the learning equations:

$$\frac{\partial E}{\partial w_{ij}} = \frac{1}{T_i} \int_{t_0}^{t_1} y_j \, F'(x_i) \, p_i \, dt; \quad \frac{\partial E}{\partial T_i} = \frac{1}{T_i} \int_{t_0}^{t_1} p_i \frac{dy_i}{dt} \, dt \tag{5}$$

The training of the DRNN was supervised, involving learning-rule adaptations of the synaptic weights and time constants of each unit (for more details, see Draye et al., 1997). This algorithm, called "backpropagation through time," aims to minimize the error value, defined as the differential area between the experimental and simulated output kinematic signals.
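Numerically, the forward pass of Equations (1) and (2) amounts to integrating a small ODE system. The sketch below uses explicit Euler integration, which is an assumption on our part; the paper does not state the solver or step size used.

```python
import numpy as np

def sigmoid(a):
    """F(alpha) = (1 + e^-alpha)^-1, the squashing function of Eq. (1)."""
    return 1.0 / (1.0 + np.exp(-a))

def drnn_step(y, W, T, I, dt=0.001):
    """One explicit-Euler step of T_i dy_i/dt = -y_i + F(x_i) + I_i,
    with the effective input x_i = sum_j w_ij y_j of Eq. (2)."""
    x = W @ y
    return y + dt * (-y + sigmoid(x) + I) / T

def simulate_drnn(y0, W, T, I, steps, dt=0.001):
    """Roll the dynamics forward from y0 and return the activation history."""
    ys = [np.asarray(y0, dtype=float)]
    for _ in range(steps):
        ys.append(drnn_step(ys[-1], W, T, I, dt))
    return np.array(ys)
```

With zero weights and zero external input, each unit relaxes to *F*(0) = 0.5 with its own time constant, which provides a quick sanity check on the integration.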

### *DRNN learning strategy*

The DRNN used here was adapted from a previous version originally developed for the reproduction of a figure eight (Cheron et al., 1996; Draye et al., 1997). Although in our previous study we showed that the DRNN could recognize the preferential direction of the muscles based on a single movement, it was not able to generalize from training on one movement to reproduce movements based on EMG signals from trials with different initial directions of the movement. In order to obtain this ability to generalize, we developed a new learning procedure called "multi-pattern learning" (**Figure 1**). This figure illustrates multi-pattern learning for four movements realized in the frontal workspace, each corresponding to a figure eight initiated in a different direction. The DRNN was trained in sequential iterations, with one of the four patterns presented on each iteration in a random sequence. Three types of multi-pattern training were performed: the first with the 4 movements realized in the frontal plane, the second with the 4 figure-eight movements realized in the sagittal plane, and the third with the 8 figure-eight movements taken from both planes. We compared the results of these training processes to the results obtained when training on a single movement, as reported in our previous publication.
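The multi-pattern schedule itself is simple to express. The sketch below shows only the alternation logic; `drnn.train_step` is a hypothetical stand-in for one backpropagation-through-time update of the weights and time constants, which the paper delegates to Draye et al. (1997).

```python
import random

def multi_pattern_training(drnn, patterns, n_iterations, seed=0):
    """Train on a set of (EMG, kinematics) patterns by drawing one pattern
    at random on each iteration, as in the multi-pattern learning scheme.
    `drnn.train_step` is a hypothetical method performing one supervised
    update and returning the current error value."""
    rng = random.Random(seed)
    errors = []
    for _ in range(n_iterations):
        emg, kinematics = rng.choice(patterns)
        errors.append(drnn.train_step(emg, kinematics))
    return errors
```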

### *DRNN generalization*

**FIGURE 1 | The central box symbolizes the DRNN.** Inside each corner of the box the four EMG patterns used to train the network are illustrated, one for each of the four movement patterns shown outside the box. Each EMG signal from a given movement was sent to all 50 artificial neurons (hidden units), which converge on two output units acting merely as summators. One output neuron provides the vertical component of the finger velocity, the other the horizontal component. Each iteration of the training was performed with one pattern at a time in random order. Black velocity profiles represent the learned output. Gray dashed velocity profiles correspond to the experimental data. The corresponding experimental trajectory is illustrated in each corner of the figure.

After training on data from a given workspace, EMG profiles corresponding to novel movements from either the same workspace or from the other workspace were fed into the trained DRNN. A comparison was then made between the velocity profiles predicted by the DRNN and the actual measured movements of the index finger. In order to quantify the resemblance between the measured and simulated velocity profiles, we calculated a similarity index (SI), using the following equation:

$$SI = \frac{\int f\_1\left(t\right) f\_2\left(t\right) dt}{\left[\left(\int f\_1\left(t\right)^2 dt\right) \left(\int f\_2\left(t\right)^2 dt\right)\right]^{\frac{1}{2}}} \tag{6}$$
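In discrete time, Equation (6) is the cosine similarity of two sampled velocity profiles. A minimal sketch with the integrals replaced by sums:

```python
import numpy as np

def similarity_index(f1, f2):
    """Similarity index of Eq. (6): the normalized inner product of two
    velocity profiles sampled at the same instants. Equals 1 for identical
    (or positively scaled) profiles and 0 for orthogonal ones."""
    f1 = np.asarray(f1, dtype=float)
    f2 = np.asarray(f2, dtype=float)
    numerator = np.sum(f1 * f2)                           # integral of f1*f2
    denominator = np.sqrt(np.sum(f1**2) * np.sum(f2**2))  # sqrt of the two energies
    return numerator / denominator
```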

We looked for the effect of the injection of novel EMG profiles in each of the 3 types of multi-pattern DRNN. For statistical analysis, we first tested for normality in the distributions of SIs using the Kolmogorov–Smirnov test. We then used a repeated-measures ANOVA followed by Scheffe's test for *post-hoc* analyses (Statistica ©Statsoft).

### *Synergy action identification via DRNN simulation*

In the second part of our study we explored the physiological meaning of purported muscle synergies by reconstructing the EMG signals from different combinations of components computed by principal component analysis and injecting them into the DRNN. The methods used to compute the principal components, as well as an analysis of the resulting synergies, are presented in our companion paper published in this issue. We limited this analysis to the first three PCs, which accounted for at least 75.28% of the total variance in each movement (mean across movements: 83.01 ± 2.84% of the total variance). For each muscle we reconstructed the EMG signals from components PC1, PC2, and PC3 individually, and compound EMG signals constructed from PC1&2, PC1&3, PC2&3, and PC1&2&3, plus the movements predicted from the full EMG signal (i.e., PC1-7), for a total of 8 different sets of EMG signals. These sets of EMG signals were then injected into the DRNN that had been trained on figure-eight movements performed in all four directions in both the frontal and sagittal planes. A comparison was then made between the velocity profiles predicted by the DRNN and the actual measured movements of the index finger, using the SI. We performed a similar procedure for EMG signals reconstructed from the first three factors after a varimax rotation.
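The reconstruction of EMG signals from subsets of components can be sketched with a plain SVD-based PCA. This is an illustration under stated assumptions: the companion paper's exact preprocessing (normalization, muscle ordering, etc.) is not reproduced here.

```python
import numpy as np

def reconstruct_from_pcs(emg, keep):
    """Reconstruct a (time x muscles) EMG matrix from a subset of principal
    components, e.g. keep=[0, 1] for PC1&2. Assumes a plain PCA on the
    mean-centered data via the singular value decomposition."""
    emg = np.asarray(emg, dtype=float)
    mean = emg.mean(axis=0)
    U, s, Vt = np.linalg.svd(emg - mean, full_matrices=False)
    keep = np.asarray(list(keep))
    # scores of the kept components, mapped back through their loading vectors
    approx = (U[:, keep] * s[keep]) @ Vt[keep, :]
    return approx + mean
```

Keeping all 7 components returns the original signals, while `keep=[0, 1, 2]` corresponds to a PC1&2&3 reconstruction.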

# **RESULTS**

We first looked at the ability of the DRNN to predict patterns of movement from EMG signals as a function of the set of movements used to train the network. **Figure 1** illustrates the typical performance of the DRNN trained on the EMG patterns (center) and movement recordings from a set of 4 frontal workspace movements. To each side of the rectangle, representing the DRNN, we have superimposed the learned (black curves) and the measured (gray dashed curves) velocity profiles. After a learning phase involving 15,000 iterations, the DRNN trained on this set of movements in the frontal workspace was able to reproduce the horizontal and vertical velocity profiles of the training set with a mean error value of 0.004 ± 0.001.

**Figure 2** compares the learning performance of the network for the different learning strategies: 1-pattern learning (left column), 4-pattern learning with the figure-eight movements realized in the frontal workspace (middle column), and 8-pattern learning with the movements realized in both the frontal and sagittal planes with the 4 different initial directions (right column). The first 40 iterations of each training session are plotted in the top row, showing the sequence of movements presented to the network on each iteration and thus illustrating the difference between the three training strategies.

The middle row of **Figure 2** shows the RMS error as a function of iteration in the learning procedure. Note the change in scale on the X axis. The learning error reached the value of 0.001 that we observed in our previous studies only for the single-pattern training, despite the greater number of iterations performed for the multi-pattern trainings. For the multi-pattern training illustrated here, the mean error was 0.004 ± 0.001 and 0.009 ± 0.002 for 4- and 8-pattern training, respectively. In terms of the ability of the network to converge to a stable response, essentially all networks starting from a random set of initial synaptic weights and time constants and trained on a single movement pattern converged. In contrast, of the 90 multi-pattern trainings with 4 movements that were initiated, 32.5% had asymptotic error curves, with a mean error of 0.016 ± 0.21. Similarly, of the 79 sessions initiated for multi-pattern trainings with 8 movements, 21.5% had asymptotic error curves, with a mean error of 0.011 ± 0.002.

The bottom row of **Figure 2** shows the actual, measured hand velocities (Vy and Vz) for a single movement pattern (Pattern #3, which started in the upward/leftward direction) compared to the simulated hand velocity produced by each of 3 DRNNs, after 1-pattern (left), 4-pattern (middle) and 8-pattern (right) learning. For the three learning conditions shown in **Figure 2**, the temporal relationship between the real and the learned velocity profiles was well reproduced in all cases. The only differences between the actual and reconstructed hand trajectories appeared in the magnitude of the peak velocities, as can be seen in the example shown here. These differences were reflected in the similarity index for each of the velocity components for this particular movement. For the example shown, the SI for the vertical component decreased from 0.99 to 0.97 to 0.95, and for the horizontal component from 0.98 to 0.96 to 0.94, for 1-, 4-, and 8-pattern training, respectively.

The mean SI ± SD for the vertical and the horizontal velocity components were, respectively, 0.98 ± 0.004 and 0.97 ± 0.001 for the 1-pattern training, 0.97 ± 0.01 and 0.97 ± 0.02 for the 4-pattern training, and 0.93 ± 0.02 and 0.95 ± 0.01 for the 8-pattern training. These values are illustrated in **Figure 3A** for the 1-pattern and 4-pattern training and in **Figure 3B** for the 4-pattern and 8-pattern learning (filled circles). A Kolmogorov–Smirnov test showed that the SIs followed a normal distribution for the two velocity components and for the three types of training. To test the ability of each training method to reproduce the movements within the respective training sets, we performed a repeated-measures ANOVA on the SIs with training type (1-pattern, 4-pattern, 8-pattern) and velocity component (Vy, Vz) as independent factors. The ANOVA showed a significant main effect of training type [*F*(2, 14) = 36.92; *p* < 0.001] and Scheffe's *post-hoc* analysis showed that the DRNN trained on 8 patterns reproduced the velocity curves significantly less accurately than either the 4- or 1-pattern trained DRNNs (*p* < 0.001 for both comparisons). There was no main effect of velocity component, and although there was a significant interaction between the two factors [*F*(2, 14) = 6.04, *p* = 0.0128], indicating a difference in the way that the SIs changed for the two velocity components across training types, *post-hoc* analysis did not detect a significant difference between SIs for Vy and Vz within any of the three training types.

**FIGURE 2 | Comparison of network learning for 1-pattern (left column), 4-pattern (middle column), and 8-pattern (right column) learning.** Green and gray surfaces represent the frontal and sagittal workspaces. The first 40 iterations for each training session are plotted in the top row, showing the sequence of movements presented to the network.

**DRNN GENERALIZATION**

To test the ability of each DRNN to predict movement patterns from EMG signals that were not in the training set, we injected various EMG patterns, corresponding to figure-eights initiated in different directions and realized in both workspaces, as inputs to the DRNN and compared the output of the network to the corresponding movement. For the generalization phase we used only the DRNN instances with the best mean error for each learning condition. This set of 7 trained DRNNs (four 1-pattern, two 4-pattern, and one 8-pattern) was used to test the ability of the networks trained in each fashion to generalize across movements within the same movement plane, to generalize across movement planes, and to generalize across subjects, as we describe in the following paragraphs.

### *Generalization between movement directions*

Consider first the generalization across movement patterns within the same workspace. Four instances of the DRNN were each exposed during the training phase to a single movement and the corresponding EMG recordings in the frontal plane, one for each of the four possible directions of movement. We then injected EMG patterns from four additional movements performed by the same subject in the frontal plane into each of the four instances of the DRNN (for a total of 16 EMG/DRNN pairings) and computed the similarity index between the predicted and actual movement velocities in Y and Z. The resulting similarity indices were then divided into four groups according to the pairing between the test movement and the movement on which the particular instance of the DRNN was trained. Four pairings consisted of test and training movements that started in the same direction in both Y and Z. Four pairings consisted of test and training movements that started in the same direction in Y but opposite directions in Z while, conversely, four pairings consisted of test and training movements that started in the same direction in Z but opposite directions in Y. Finally, four pairings consisted of training and test movements that started in opposite directions in both Y and Z. To this we added one additional pairing in which each of the four test movements was injected into an instance of the DRNN that had been exposed to all four movements from the training set according to the multi-pattern learning scheme depicted in **Figures 1**, **2**.

**FIGURE 3 | (A)** Similarity indexes for 1-pattern DRNN and 4-pattern DRNN trainings. **(B)** Similarity indexes for same and crossed workspaces for 4-pattern DRNN and 8-pattern DRNN trainings. SIs for learned and simulated curves correspond to black and open circles, respectively. **(C)** The simulated and experimental velocity profiles. **(D)** The simulated velocity profiles from injecting the sagittal EMG patterns into the frontal DRNN. **(E)** The simulated velocity profiles from injecting the sagittal EMG patterns into the dual DRNN. The similarity indexes are indicated above each profile's graph.

Although an instance of the DRNN trained on a single movement converged to a very low RMS error for predicting the velocity of the hand from the EMG used to train it, such 1-pattern trained DRNNs generally did a poor job of reproducing, from EMG signals, figure-eight movements that were not included in the training set (mean SIs for the vertical and horizontal velocity components were 0.63 ± 0.22 and 0.68 ± 0.15, respectively, across movements in all four directions). Of greater interest is the effect of movement direction on the ability of a single-pattern DRNN to reproduce the movement. When the EMG came from a movement not in the training set, but initiated in the same direction as the training movement in both Y and Z, the mean values of SI were 0.80 ± 0.16 and 0.76 ± 0.09 for the vertical and horizontal velocity components, respectively. When the test and training movements shared the same initial vertical component but had opposite initial horizontal components, the mean SIs were 0.63 ± 0.14 and 0.66 ± 0.23 for Vy and Vz, respectively, while when the two movements shared the same initial horizontal component but opposite vertical components, the corresponding SIs were 0.71 ± 0.22 and 0.68 ± 0.05. SIs were lowest when both vertical and horizontal components were different, with values of 0.62 ± 0.08 and 0.55 ± 0.27, respectively. An ANOVA with factors *velocity component* and DRNN/direction *pairing* showed a significant effect of the pairing factor [*F*(4, 12) = 4.8398, *p* = 0.01477], with no significant difference between Vy and Vz and no interaction. The DRNN trained on all 4 patterns from the same workspace was much better able to reproduce the kinematics generated from the EMG recordings for other movements performed within that workspace.
Indeed, the 4-pattern DRNN, with average SIs of 0.92 ± 0.03 and 0.91 ± 0.03 for Vy and Vz, respectively, across all movement directions (**Figure 3A**), was better able to reproduce the hand velocities than a 1-pattern trained DRNN could for figure eights performed in the same direction as its own training movement.

### *Generalization between planes*

Next we considered the ability of a DRNN to generalize between different parts of the workspace (i.e., different movement planes). For the frontal and sagittal movements, each simulated by the appropriate frontal and sagittal DRNNs, the mean SIs for the vertical and horizontal velocity components were 0.89 ± 0.06 and 0.91 ± 0.03, respectively (**Figure 3B**, "same workspace"). **Figure 3C** illustrates a typical pair of simulated velocity profiles (Vy and Vz) computed from EMG signals taken during movements in the frontal plane that were not in the training set, overlaid on the actual velocity profile.

DRNNs trained on all four movement patterns within one plane were nevertheless much less able to predict the hand trajectories from EMG signals recorded from movements in the other. **Figure 3D** illustrates the simulated and actual velocity profiles for the injection of the EMG patterns from a sagittal movement into the DRNN trained on the 4 frontal movements. While the predictions of the horizontal velocity profiles (Vz) achieved levels of SI similar to that produced by the 4-pattern DRNN for the same workspace, the DRNN did not successfully reproduce the vertical velocity component (Vy) across workspaces. The mean SI for simulated movements computed by the 4-pattern DRNN trained on the "other" workspace was 0.84 ± 0.06 for the horizontal component and 0.50 ± 0.13 for the vertical component (**Figure 3B**, "different workspace"). On the other hand, an 8-pattern DRNN trained on movements from both workspaces was able to reproduce movements in either workspace just as well as each of the 4-pattern DRNNs were able to reproduce movements within their own workspaces. For the 8-pattern DRNN, the mean similarity index for the vertical and horizontal velocity components was 0.90 ± 0.07 and 0.92 ± 0.02, respectively (**Figure 3B**, "8 patterns").

We used ANOVA to test for statistical significance of the observations described above. From all the movements recorded for the one subject whose data were used to train the networks, we injected the EMG signals from all the other movements that were not in the training sets into each of three instances of the DRNN: the one that had been trained on 4 movements in the frontal plane, the one that had been trained on 4 movements in the sagittal plane, and the one that had been trained on all 8 movements, resulting in a total of 8 × 3 = 24 simulated movements. We then used the pairing between the DRNN and the actual movement's workspace to divide the 24 simulated movements into 3 groups of 8 movements each: those produced by the 4-pattern DRNN trained on movements from the same plane, those produced by the 4-pattern DRNN trained on the other plane, and those produced by the 8-pattern DRNN trained on both planes. This resulted in a 3 × 2 multifactor ANOVA, with *DRNN/EMG pairing* (same-plane, cross-plane, and dual-plane) and *velocity component* (Vy, Vz) as within-group factors (i.e., a repeated measure for the same movement produced by the subject). Note that the normality of our data set was first verified by the Kolmogorov–Smirnov test before the ANOVA was applied.

The ANOVA described above revealed a highly significant main effect for the type of DRNN/EMG pairing [*F*(2, 14) = 23.61; *p* = 0.0003]. There was a significant main effect of velocity component [*F*(1, 7) = 7.58, *p* = 0.0284] and a significant interaction [*F*(2, 14) = 6.04, *p* = 0.0129]. Scheffe's *post-hoc* analysis showed that there was no significant difference between the ability of each of the three DRNN/EMG pairings to reproduce the horizontal velocity component (Vz). On the other hand, the SIs for the vertical velocity component (Vy) were significantly lower (worse) for the cross-plane simulations than for the simulations produced by either the same-plane or the dual-plane DRNN/EMG pairings (illustrated in **Figure 3B**).

As a control, we considered whether the DRNN's inability to predict movements across planes could be attributed to differences in the kinematics of the figure-eight movements performed in each plane. **Figure 4A** shows a comparison of the velocity components for pattern #3 for the reference subject, performed in the frontal (blue) and sagittal (red) planes. One can observe that the movements were very similar both in terms of velocity amplitude and in terms of temporal characteristics. **Figure 4B** shows a comparison of the mean similarity index (SI) computed between the real test movement in one plane and the corresponding training movement from the other plane (real-trained), and between the real test movement from one plane and the movement predicted from the corresponding EMGs by the DRNN that had been trained on movements from the other plane. On average, the SIs between actual movements in different planes were high for both the vertical and horizontal velocity components (0.93 ± 0.04 and 0.92 ± 0.06, respectively). SIs between actual movements and movements predicted by the DRNN from EMGs were somewhat lower, especially for Vy (0.93 ± 0.04 and 0.6 ± 0.3). Statistical analyses revealed that although there was no difference between the SIs for the comparison of real movements and predicted movements for the horizontal velocity component (Vz), there was a significantly lower similarity between predicted and actual movements for the vertical component (Vy) (Scheffe's *post-hoc*: *p* < 0.005), compared to the similarity of actual movements performed in different planes [*F*(1, 14) = 10.986, *p* = 0.00511]. In other words, the inability of the DRNN to generalize across movement planes in terms of Vy cannot be attributed to differences in the movement kinematics in each plane.

Finally, we asked whether the inability of the DRNN to generalize across movement planes could be related to changes in muscle synergies as identified through principal component analysis of these same movements and EMGs. In our companion paper we showed that the loading vectors (synergies) varied, on average across subjects, between figure eights drawn in the frontal and sagittal planes. **Figure 5** shows the average loading for each PC, computed for each of the two movement planes, for the reference subject alone. We performed an ANOVA on the loadings with movement plane (frontal or sagittal) as a grouping factor and muscle (AD, MD, PD, PMS, PMI, LD, TM) as a repeated measure. For each of the 3 PCs, there was a significant main effect of the muscle factor, a significant main effect of movement plane, and a significant interaction between the two. These global contrasts, however, were not the focus of this analysis; the pertinent test was the *post-hoc* analysis that we applied to determine which muscle loadings, if any, changed between the two movement planes within each PC. For PC1 and PC2, Scheffe's *post-hoc* test detected no significant changes in individual muscle loadings between the frontal and sagittal planes. For PC3, however, there was a significant increase in the loading of LD and a significant decrease in the loading of PMI when passing from the frontal to the sagittal movement plane.

### *Synergy action identification via DRNN simulation*

We then set out to see how the DRNN would interpret the action of EMG signals associated with each of the different components identified by principal component analysis and by varimax factor analysis (see companion paper). The simulation phase consisted of sending to the 8-pattern (dual-plane) trained DRNN the EMG signal reconstructed with the first, second and third components and the combinations of components 1&2, 1&3, 2&3, and 1&2&3, for both the principal component (**Figure 6**) and varimax (**Figure 7**) decompositions. These figures illustrate the simulations for EMG signals taken from the reference subject for the same figure-eight movement (initiated down and to the right) realized in the frontal and sagittal workspaces.

**FIGURE 5 | Average loadings for each PC and each muscle, computed for each of the two movement planes.** Each graph shows the loading for Anterior Deltoid (AD), Medial Deltoid (MD), Posterior Deltoid (PD), Pectoralis Major Superior (PMS), Pectoralis Major Inferior (PMI), Latissimus Dorsi (LD), and Teres Major (TM). In blue are represented the loadings for movements performed in the frontal workspace, in red the loadings for movements performed in the sagittal workspace. Stars show loadings for individual muscles that were significantly different between the frontal and sagittal planes (*p* < 0.001).

### *Principal components*

The superposition of real (gray dashed curves) and simulated (black and red curves) velocity curves in **Figure 6** shows that the EMG signals composed of PC2 and PC3 alone reproduced very nearly the horizontal and vertical velocity components, respectively, while the EMG signal reconstructed from PC1 alone produced simulated movements that did not resemble either of the velocity components. Nevertheless, combining PC1 with either PC2 or PC3 (lower part of **Figure 6**) increased the level of reproduction of the horizontal and vertical components, respectively, compared to any one component alone. Indeed, PC1 seems to contribute to stability in the static phase existing before and after the movement, as can be seen in the simulated velocity curves resulting from EMG reconstructed with the second and third, but not the first, PCs combined. Compared to the simulations from PC1, PC1&2, and PC1&3, the simulated movements that did not contain PC1 (PC2, PC3, PC2&3) exhibited a negative bias in Vy both before and after the figure-eight movement, indicating that without the PC1 component the hand would drift downwards.

**FIGURE 6 |** Simulated velocity profiles for EMG signals reconstructed from PC1, PC2, and PC3 (top part of the figure) and PC1&2, PC1&3, and PC2&3 (bottom part) for the same figure-eight movement (initiated down and to the right) realized in the frontal and sagittal workspaces … arbitrarily set to 0.5. SIs are indicated in the upper right corner of each graph.

We used ANOVA to statistically test the ability of each PC or combination of PCs to reproduce the velocity profiles of the actual movements. We compared the similarity indexes between the 8 measured movement patterns for the reference subject and the simulated movements from the eight different DRNN reconstructions from the corresponding EMG signals (PC1, PC2, PC3, PC1&2, PC1&3, PC2&3, PC1&2&3, PC1-7 = real EMG). Recall that the EMG signals and movement recordings that were used to simulate movements via the DRNN were different from the EMG and movement recordings used to train the DRNN. This resulted in an ANOVA with two repeated-measures factors (PC combination, velocity component). The repeated-measures ANOVA showed that the SIs were significantly different depending on which combination of PCs was used to reconstruct the EMG signal [main effect: *F*(7, 49) = 144.36, *p* < 0.0001]. Scheffe's *post-hoc* analysis showed that SIs obtained with EMG reconstructed from PC1&2&3 were not significantly different from the movements simulated with the real EMG (mean SI: 0.80 ± 0.02 and 0.91 ± 0.01, respectively; *p* > 0.99). SIs for movements simulated with PC2&3 (0.78 ± 0.02) were significantly lower than those simulated from the real EMG (*p* = 0.046), but only slightly worse than those simulated with PC1&2&3 (*p* > 0.99). For all the other simulated movements, with PC1, PC2, PC3, PC1&2, or PC1&3 (mean SI 0.05 ± 0.04, 0.42 ± 0.03, 0.37 ± 0.02, 0.45 ± 0.03, 0.36 ± 0.03, respectively), the SIs were significantly lower than for the reconstruction from the full EMG signal (*p* < 0.001).

The repeated-measures ANOVA did not show a significant main effect of the velocity component factor [*F*(1, 7) = 4.26, *p* = 0.078], but there was a highly significant interaction between the velocity component and PC combination factors [*F*(7, 49) = 117.95, *p* < 0.0001]. Scheffe's *post-hoc* analysis confirmed the results illustrated in **Figure 6**: there were no differences in SIs for the horizontal and vertical velocity components for EMGs composed with PC2&3, with PC1&2&3, or for the real EMG (*p* > 0.99). The mean SIs for the horizontal and vertical velocity components were 0.81 ± 0.03 and 0.74 ± 0.02, 0.88 ± 0.02 and 0.72 ± 0.03, and 0.92 ± 0.01 and 0.90 ± 0.02, respectively.

### *Varimax*

The simulations of the velocity traces by the DRNN based on EMG signals reconstructed from the varimax decomposition (**Figure 7**) were more difficult to interpret in terms of the actions of each component. We display the velocity signals in this figure in black or red depending on whether the SIs for the varimax-reconstructed EMGs were significantly different from the best scores obtained for the principal components (**Figure 6**). From this one can see that no single varimax component reproduced either of the velocity components very well. Even the traces shown in red manifest noticeable differences between the actual and predicted velocities. While VM3 produced the same number of peaks in Vy, the predicted velocities were strictly negative whereas the real velocity had both positive and negative components. For VM1, the predicted and actual Vz showed similarities in the number of peaks, but the DRNN predicted velocities that were not nearly as strong as the actual measured velocities.

We used ANOVA to statistically test the ability of each PCA method (unrotated or varimax) to reproduce the velocity profiles of the actual movements. We compared the similarity indexes between the 8 measured movement patterns for the reference subject and the simulated movements from the 7 different DRNN reconstructions of the corresponding EMG signals from each decomposition: principal components (PC1, PC2, PC3, PC1&2, PC1&3, PC2&3, PC1&2&3) and varimax (VM1, VM2, VM3, VM1&2, VM1&3, VM2&3, VM1&2&3). This resulted in an ANOVA with two repeated-measures factors (EMG components × velocity component). **Figure 8** illustrates the mean and SD values of the SIs obtained for the 8 measured figure-eight movements. In this figure we have drawn a "threshold" line that separates the SI values presenting a significant difference from the best SI values obtained with unrotated PCA, for EMG composed with PC1&2. One observes that PC3 and VM3 were interpreted similarly, as acting on the vertical component of the movement, whereas VM1 acted more like PC2, each having a functional link with Vz. VM2 seems to carry a more complex combination of information concerning both velocity components. When looking at the DRNN interpretation of the components combined two by two, one observes that PC2&3 and VM1&3 were interpreted similarly by the DRNN, as having the same level of action on Vy and Vz (Scheffé's *post-hoc* *p* > 0.99). The same was true for unrotated and varimax PC1&2. But the pattern of SIs for VM2&3 did not bear any resemblance to the patterns achieved with the principal components, since it reproduced Vy better than Vz. This observation, added to the fact that the mean SI for VM1&2 was not different from the one obtained for PC1&3, leads us to conclude that VM2 corresponds to a synergy that acts partly on the horizontal and partly on the vertical velocity component, without the clear demarcation between components that is found for the unrotated principal components.
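Varimax rotation itself is a standard procedure: it seeks an orthogonal rotation of the loading matrix that maximizes the variance of the squared loadings within each component, concentrating each muscle's weight on as few components as possible. A minimal numpy sketch of the classic Kaiser iteration follows (not the authors' implementation; Kaiser row normalization is omitted for brevity):

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Varimax rotation of a (muscles x components) loading matrix:
    returns the rotated loadings and the orthogonal rotation R that
    maximizes the variance of the squared loadings per component."""
    p, k = loadings.shape
    R = np.eye(k)
    d_old = 0.0
    for _ in range(max_iter):
        L = loadings @ R
        # gradient of the varimax criterion (gamma = 1, no row scaling)
        B = loadings.T @ (L ** 3 - L @ np.diag((L ** 2).sum(axis=0)) / p)
        U, s, Vt = np.linalg.svd(B)
        R = U @ Vt                     # projection back onto rotations
        d = s.sum()
        if d - d_old < tol * d:        # criterion has stopped improving
            break
        d_old = d
    return loadings @ R, R

rng = np.random.default_rng(2)
pcs = rng.normal(size=(7, 3))          # loadings of 7 muscles on 3 PCs
rotated, R = varimax(pcs)
```

Because the rotation is orthogonal, the total squared loading (explained variance) is preserved; only its distribution across components changes, which is why the unrotated and varimax solutions can map so differently onto Vy and Vz.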

We went further to ask whether the differences between the principal component and varimax decompositions, in terms of the mapping of EMG components to hand velocity, could be explained simply by differences between particular instances of the trained DRNN, or whether the pattern is repeatable across any successfully trained instance of the DRNN. In **Figure 9** we compared the behavior of 3 different instances of the DRNN, all of which were trained with the same 8 movement trials from the reference subject, but each of which converged starting from a different random set of initial weights and time constants. Globally, the observations made for the single DRNN above were valid for all three instances of the DRNN: PC2 was associated with Vz while PC3 was associated with Vy. A three-way ANOVA with PC combination, velocity component, and DRNN instance as factors showed no main effect of DRNN instance [*F*(14, 84) = 1.4213, *p* = 0.16], nor any cross effect between DRNN instance and either of the other two factors.

# *DRNN generalization between subjects*

We then tested the ability of a DRNN trained on data from one subject to simulate movements based on EMG signals recorded from another. We fed EMG signals from five different subjects to the same 8-pattern DRNN trained on movements from both planes performed by a single subject. The set of five subjects included the one subject whose data were used to train the networks, hereafter referred to as the reference subject, and four other subjects who performed the experiment in the frontal plane. Note that, as in the previous analysis, the EMGs used to simulate movements for the reference subject were different from that subject's EMG recordings used to train the network. The results of this analysis are shown in **Figure 10A**, where we have overlaid the velocity traces for each subject and plotted the similarity indexes comparing the actual movements with the movements predicted by the DRNN from the EMG (dark symbols). For comparison, we have plotted the SIs comparing the actual movements performed by each subject with the corresponding actual movements from the reference subject used to train the network (gray symbols). ANOVA showed that in general the DRNN predicted the velocity profiles of the other four subjects less well, and that in all cases the SIs for Vy (Scheffé's *post-hoc*: *p* < 0.05) were lower than for Vz [cross effect between subjects, velocities, and real-trained movements vs. real-DRNN predictions, *F*(4, 49) = 15.156, *p* < 0.001]. This is in contrast to the between-subject comparisons of the actual movement profiles, which showed SIs that were greater than those observed for the DRNN-reconstructed movements and which did not differ between Vy and Vz.

We performed a two-factor ANOVA to test the ability of each DRNN to reproduce the measured hand velocities from the recorded EMGs. As before, *velocity component* (Vy and Vz) was treated as a repeated measure, to which was added the grouping factor *subject* to indicate which subject's EMG data were used to simulate each movement. This led to a 2 × 5 mixed-model ANOVA. The ANOVA showed a significant difference in SIs between the five subjects [main effect of subject, *F*(4, 49) = 28.439, *p* < 0.0001]. *Post-hoc* tests showed that the DRNN did a significantly better job, on average across both velocity components, of reproducing the trajectories for the reference subject (mean SI 0.91 ± 0.02) compared to all four other subjects (*p* < 0.00001). There was no overall difference (*p* > 0.3) between the simulations for three of the other subjects (mean SI 0.64 ± 0.06, 0.52 ± 0.31, and 0.52 ± 0.37 for subjects 2, 3, and 5, respectively). But subject 4 presented significantly lower SIs (*p* < 0.05) than any of the other subjects (mean SI 0.33 ± 0.46). There was, however, a significant cross effect between the subject and velocity-component factors [*F*(4, 49) = 13.947, *p* < 0.0001]. Indeed, the main difference between the simulations for the reference subject and those for the other subjects was found for the vertical velocity component: Scheffé's *post-hoc* analysis showed a significant difference (*p* < 0.0001) for Vy between the reference subject (mean SI 0.9 ± 0.07) and three of the other four subjects (mean SI 0.3 ± 0.25, 0.004 ± 0.05, and 0.26 ± 0.19 for subjects 3, 4, and 5, respectively), but not for subject 2 (*p* = 0.054; mean SI 0.60 ± 0.2). For the horizontal component, Scheffé's *post-hoc* analysis showed no differences between the reference subject and the four others (mean SIs were 0.92 ± 0.02, 0.68 ± 0.18, 0.73 ± 0.14, 0.66 ± 0.12, and 0.79 ± 0.03 for subjects 1, 2, 3, 4, and 5, respectively).

We went further to test whether the patterns in the mapping of EMG components to hand velocity were specific to this particular instance of a trained DRNN, or whether the pattern was repeatable across any successfully trained instance of the DRNN. In **Figure 10B** we compared the behavior of three different DRNN instances, all of which were trained with the same eight movement trials from the reference subject, but each of which converged starting from a different random set of initial weights and time constants. We performed an ANOVA on the similarity indices with *DRNN instance* (dual-plane A, dual-plane B, and dual-plane C) and *velocity component* (Vy and Vz) as repeated measures. To this was added the grouping factor *subject* to indicate which subject's EMG data were used to simulate each movement. This led to a 3 × 2 × 5 mixed-model ANOVA. There was no significant main effect of DRNN instance [*F*(2, 98) = 0.31392, *p* = 0.73131], nor was there a significant cross effect between DRNN instance and the subject factor [*F*(8, 98) = 1.6801, *p* = 0.11275]. There was, however, a significant cross effect between the velocity and subject factors [*F*(4, 49) = 16.861, *p* < 0.00001]. As observed in the other analyses, the main difference between the simulations for the reference subject and those for the other subjects was found for the vertical velocity component: Scheffé's *post-hoc* analysis showed a significant difference (*p* < 0.05) between the reference subject and each of the other 4 subjects for Vy but not for Vz. We note, finally, that the three-way interaction between subject, velocity component, and DRNN instance was not significant [*F*(8, 98) = 1.0999, *p* = 0.36998].

**Figure 10 |** … pattern performed by 5 subjects. Similarity indexes for movements reconstructed from EMGs collected from each of 5 different subjects by an 8-pattern DRNN trained on data from the reference subject only. **(B)** Similarity indices for movements reconstructed by three different instances of an 8-pattern DRNN trained on data from the reference subject. Each instance converged to the final solution from a different set of random initial weights and time constants. Stars show SIs that were significantly different (*p* < 0.005).

We completed our analysis by examining the ability of a DRNN trained on data from one subject to reproduce the hand trajectories of the other subjects on the basis of EMG signals reconstructed from different combinations of principal components. The similarity index was computed between each simulated movement and the corresponding actual movement, and the SIs were subjected to a mixed-model ANOVA with *subject* as a grouping factor and *PC combination* (PC1, PC2, PC3, PC1&2, PC1&3, PC2&3, PC1&2&3, PC1–7) and *velocity component* (Vy, Vz) as repeated measures. This analysis showed that SIs depended on which subject's EMGs were fed to the DRNN [subject main effect: *F*(4, 49) = 13.98, *p* < 0.0001], on which principal components were used to reconstruct the EMG [PC combination main effect: *F*(7, 343) = 175.65, *p* < 0.0001], and on the velocity direction [velocity component main effect: *F*(1, 49) = 91.757, *p* < 0.0001]. All interaction effects were highly significant (*p* < 0.0001). **Figure 11** shows the overall results, from which one can make the following observations:


(3) The DRNN decoded the horizontal component of the hand velocity (Vz) just as well across all subjects as it did for the reference subject on whose data the network was trained. Compared to the reference subject, however, the DRNN did a much poorer job of reproducing the vertical component (Vy) for the 4 other subjects.

Finally, we tested whether the ability of the DRNN to predict movements for different subjects could be related to differences between subjects in the loading vectors (synergies) identified by principal component analysis. We applied ANOVA to the loading vectors obtained in our companion study for each subject, with *subject* as a grouping factor and *muscle* as a repeated measure. We limited this analysis to movements in the frontal plane. **Figure 12** shows the average loadings for subjects 2–5, compared to the average loadings for the reference subject (S1). Differences in individual muscle loadings that were significant (as measured by Scheffé's *post-hoc* test) are indicated with a <sup>∗</sup>. As shown in our companion article, the identified principal components were remarkably similar across subjects, with PC1 representing a global activation of all 7 muscles over the course of the movement, PC2 indicating a reciprocal relationship primarily between AD, PMS, and PMI on one side and MD and PD on the other, and PC3 showing a reciprocal relationship primarily between AD and MD on one side and PMI and TM on the other. Differences between the loadings of each subject and the loadings of the reference subject were restricted largely to PC3. This is consistent with the observations that (1) PC2 is linked to Vz and PC3 is linked to Vy and (2) the DRNN predicted Vz better than Vy across subjects. There is nevertheless an indication of a tradeoff in the participation of TM, with this muscle sometimes participating primarily in PC2 and sometimes in PC3.

# **DISCUSSION**

In this study we looked for modularity in the patterns of drawing movements exemplified by the figure eights performed by our subjects. We used the training of a DRNN as a means to identify structure in the mapping from muscle activations to hand movements. While one might quibble over whether the DRNN truly learned the dynamics of the arm, it is certain that the DRNN identified structure in the mapping of EMG signals to movements of the hand. Otherwise, it would not have been able to predict movements from EMGs not included in the training set. But as we reported, the DRNN was not always able to predict movements from novel EMGs.

A failure of a trained DRNN to generalize to EMG inputs from outside the training set can arise either because the DRNN does not have enough degrees of freedom to learn the actual movement dynamics or because the training set does not contain enough contrasting information to reveal all the underlying structure. In our study, whenever the DRNN failed to generalize from a given, limited set of training movements, it always succeeded when presented with a broader set of examples during training. We concluded that the DRNN structure was sophisticated enough to capture the EMG-to-movement relationship, if presented with a rich enough training set. It is therefore interesting to contrast when the DRNN could and could not generalize from one dataset to another, as this provides an indicator as to what changes in conditions require modifications to the underlying movement-generating modules.

In this article we have also proposed a novel use of the DRNN: to reveal properties of the modules that generate movement of the human arm. In the past we fed the already-trained DRNN modified versions of the individual EMG inputs that were scaled in amplitude (Cheron et al., 1996, 2003) or delayed in time (Cheron et al., 2007), in order to identify the action of individual muscles. In the current study we fed the trained DRNN with potential synergies that we identified through principal component analysis. This more global approach addresses the question of the neural organization of muscle activations from a modular point of view, in contrast to our preceding anatomical viewpoint concerning muscle mechanical actions.
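For readers unfamiliar with the tool, the DRNN in question (Cheron et al., 1996) is built from fully interconnected leaky-integrator units whose synaptic weights and time constants are adapted during training. The sketch below shows only a schematic forward pass under assumed shapes and parameter values; it is not the authors' implementation, and training (gradient descent through time on the velocity error) is omitted.

```python
import numpy as np

def drnn_forward(emg, W_in, W_rec, W_out, tau, dt=0.01):
    """Schematic forward pass of a recurrent network of
    leaky-integrator units mapping EMG envelopes (time x muscles)
    to hand velocity (time x 2). Weights and time constants `tau`
    would normally be learned; here they are simply given."""
    n_hidden = W_rec.shape[0]
    y = np.zeros(n_hidden)                  # unit states
    out = np.zeros((emg.shape[0], W_out.shape[0]))
    for t, x in enumerate(emg):
        drive = np.tanh(W_in @ x + W_rec @ y)
        y = y + (dt / tau) * (-y + drive)   # leaky integration per unit
        out[t] = W_out @ y                  # linear readout: e.g. Vy, Vz
    return out

# hypothetical shapes: 7 muscles, 20 hidden units, 2 velocity outputs
rng = np.random.default_rng(3)
emg = np.abs(rng.normal(size=(300, 7)))
W_in = 0.1 * rng.normal(size=(20, 7))
W_rec = 0.1 * rng.normal(size=(20, 20))
W_out = 0.1 * rng.normal(size=(2, 20))
tau = rng.uniform(0.05, 0.5, size=20)       # per-unit time constants (s)
vel = drnn_forward(emg, W_in, W_rec, W_out, tau)
```

The per-unit time constants are what give the network its "memory", allowing it to map the same instantaneous EMG to different velocities depending on recent history.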

### **SPATIAL vs. TEMPORAL**

As shown by de Rugy et al. (2013), variability in synergies could arise from specific behaviors or tasks to be accomplished. For the same motor program (a figure-eight) we should find in the EMG signals information corresponding to a general "figure-eight" motor program mixed with the information corresponding to the specific aspects of motor execution such as the movement's velocity, amplitude, joint configuration and biomechanical constraints. In this context, the cases where the DRNN failed to generalize can be summarized as follows: (1) the DRNN could not generalize from a single movement direction in a given workspace to the other three movements in the same workspace, (2) the DRNN could not generalize from movements performed in the four different directions in one workspace to the same four movement patterns in the other workspace, and (3) the DRNN could not generalize from one subject to another. One might hypothesize that the inability to generalize in any or all of these situations could be due to differences in the kinematics of the movements that were actually performed, i.e., the DRNN might fail to generalize if it has not been exposed to hand-velocity patterns in the training phase that are included in the test dataset. While this would be an interesting observation in itself, this was not the case in the study reported here. All the movements reported here corresponded to the realization of a figure eight. For the movements in different directions in the same plane, we have previously shown that the four movements were very similar in terms of spatial parameters and the temporal aspects of the trajectories (Cheron et al., 1999). 
Concerning the inter-workspace and inter-subject comparisons, the analysis of similarity indices between movements in the training and test datasets reported here (**Figure 4**) rejects the hypothesis that the inability of the DRNN to generalize across these conditions can be attributed to differences in the movement patterns themselves. The differences detected by the DRNN between conditions must therefore represent contrasts in the mapping between EMG and movement in each situation.

Consider, then, the inability of the DRNN to generalize between movement directions within the same workspace. The spatial aspects of each of the four movements were the same, with the hand following the same spatial form (the figure eight), and the arm was fully outstretched in all cases. The biomechanical aspects of the different movements were therefore essentially constant, at least in terms of the moment arms and preferred directions of the muscles involved. The velocities and accelerations of the hand were also similar across the four different movement directions, although they were not presented in the same order within each of the different movements. If the mapping from movement to the EMG required to generate that movement consisted of a simple mapping of instantaneous position, velocity, or acceleration to muscle activations, the DRNN should have been able to capture that simple, time-invariant relationship from any one of the four movements in the same plane. The fact that the DRNN was unable to generalize from one movement direction to another suggests, therefore, that the modules underlying movement generation must take into account the temporal aspects of muscle activation patterns (Ivanenko et al., 2004; d'Avella et al., 2008; Delis et al., 2014). As shown in our companion article concerning the synergy analyses of figure-eights, the factor loadings of the first three PCs did not show any systematic differences with respect to the different initial directions of movement, but the temporal components of PC2 and PC3 were modulated according to the horizontal and vertical movement components, respectively.

The fact that the DRNN, trained with 4 figure-eights realized only in the frontal workspace, was unable to generalize to the sagittal workspace (and vice versa) could indicate that it was able to detect the biomechanical differences between the two workspaces and the related retuning of the modular commands (Hogan, 1985; Buneo et al., 1994; Cheung et al., 2005; Kamper et al., 2006; d'Avella et al., 2008). Similarly, even though the DRNN was able to generalize across all movements for the reference subject when trained with all eight movement patterns, it was not able to predict movement patterns from EMGs taken from other subjects. In this context we note that across our 5 subjects the DRNN consistently associated the second principal component with the horizontal component of the finger velocity, and that the analyses of the loadings showed that for all subjects this second PC revealed the same muscular synergy: a reciprocal activation of muscles according to their line of action in the horizontal plane. In contrast, the DRNN did not correctly predict movements from EMG generated with the third PC for anyone but the reference subject, and the loadings analyses showed significant differences in the grouping of muscles between subjects. Studies concerning kinematic and muscular synergies have already proposed that the lower PCs may be responsible for the general aspects of the movement and present less inter-individual variability, whereas the higher PCs would be responsible for subtler aspects and consequently present more inter-individual variability (Santello et al., 1998; Torres-Oviedo and Ting, 2007; Frère and Hug, 2012). In the case of our vertical figure-eight, the invariant aspects necessary to accomplish the trajectory would be expressed by the activation of PC1 and PC2, whereas the personal "signature" would be the consequence of the activation of PC3.

Exactly the same invariant identification was observed for the reference subject when we crossed the EMG/DRNN pairing. A 4-pattern DRNN trained on movements from one plane was able to reproduce the horizontal component of the finger velocity for EMG from movements in the other plane, but was unable to reproduce the vertical component. Factor loadings analyses showed that PC2 maintained the same muscular synergy across planes, whereas the loadings of the third PC showed significant differences in the grouping of muscles between planes. One can therefore conclude that the DRNN was able to detect directional, biomechanical, and subject dependencies in the mapping from EMG to movement (Muceli et al., 2010; Torres-Oviedo and Ting, 2010; Frère and Hug, 2012; Kristiansen et al., 2013). Indeed, it associated directional dependencies with the temporal tuning, and biomechanical and subject dependencies with the spatial tuning, of the third PC.

# **HORIZONTAL vs. VERTICAL**

An interesting question emerges from the observation that the DRNN was much more able to reproduce the horizontal component of the hand's velocity than the vertical component. A 4-pattern DRNN trained on movements from one plane was able to reproduce the horizontal component of the finger velocity for EMG from movements in the other plane, but was unable to reproduce the vertical component. Similarly, an 8-pattern DRNN trained on data from one subject was able to reproduce, in most cases, the horizontal component but not the vertical component of movements produced by the other subjects. This observation indicates that the trained DRNN was able to identify an invariant aspect corresponding to the figure-eight, i.e., the control of the horizontal velocity.

Across all the movement conditions the only aspect that remained invariant was the fact that all the movements corresponded to the realization of a figure-eight. Across all the generalizations the only relationship that remained invariant was the identification of the synergy extracted by PC2 with the finger horizontal velocity component. This synergy corresponded to a reciprocal command that groups the shoulder muscles with respect to their horizontal preferred action direction. In a previous work (Bengoetxea et al., 2010) concerning the temporal activation pattern for a figure-eight, cross-correlation analyses showed that the invariant aspect across shoulder position and subjects was the emergence of two groups of muscles acting in a reciprocal mode in relation with the horizontal direction. This invariant synergy suggests the existence of an underlying oscillator module (Hogan and Sternad, 2012), acting in the horizontal direction, and the DRNN seems to have identified this module.

The analyses of the loadings corresponding to PC3 across subjects indicate that the muscular synergies associated with the vertical component of the figure-eight were more variable than the synergies defined by PC2 (see companion paper). We can offer several possible explanations of this observation. The first is purely methodological. The third principal component is by definition the one that explains the least variance in the input signals, compared to the first and second. This means that the signal associated with this purported synergy would be smaller and thus more sensitive to noise. But that seems unlikely to explain the enormous difference in the ability to predict the horizontal vs. vertical velocity components. The second is behavioral: it could be that each subject has developed their own idiosyncratic synergies depending, for instance, on their professional activities or sports played. An alternative explanation may be found in the phasic/tonic aspects of a discrete figure-eight movement. For discrete reaching movements it has been shown that synergies are stable across subjects and shoulder positions (d'Avella et al., 2008). Modulations of synergies correspond to a cosine tuning for postural and tonic synergies and to more complex patterns for phasic synergies. Tonic synergies are responsible for antigravity and postural control, whereas phasic synergies are responsible for overcoming inertia to accelerate and decelerate the arm (d'Avella et al., 2008). In our case the DRNN had to learn both tonic synergies, before and after the movement, and phasic synergies during the movement. The fact that the DRNN was not able to generalize the vertical component of the movement, or its third PC, across workspaces or subjects could be due to the postural synergy and the phasic synergy for the vertical component of the movement being mixed.

Comparison of the principal component and varimax decompositions via the DRNN provides further fuel for our argument that muscle synergies for discrete-rhythmic movements are best captured by the set identified by the principal components, i.e., one module controlling co-activation and two modules producing reciprocal activations, one in the horizontal and one in the vertical direction. When we fed the trained DRNN with EMG reconstructed from the first, second, and third PCs, we obtained a clear identification of each of the two spatial velocity components of the figure-eight movement. The reciprocal command extracted by PC2, in which muscles were partitioned by their horizontal line of action, was clearly associated by the trained DRNN with the horizontal component of the finger velocity. Similarly, the reciprocal grouping by PC3, in which muscles were partitioned by their vertical preferred direction, was associated by the trained DRNN with the vertical component of the finger velocity. The DRNN predicted little or no movement from an EMG signal constructed only from PC1, as would be expected from a co-activation module destined to tune the mechanical state of the system rather than to generate movement *per se*. The hand velocities predicted by the DRNN for the first three varimax components (VM1, VM2, VM3) were not nearly so well demarcated, with each producing a combination of vertical and horizontal velocity.

Using the DRNN to interpret the physiological meaning of the muscle synergies that were previously identified through principal component analysis is therefore an interesting addition to the tools that may be used to study modularity in movement control. One might say that the DRNN has captured to some degree the physiological and mechanical relationship between the muscles and the motor output. Of course, use of the DRNN cannot replace a thorough biomechanical model of muscle, bones and joints if one wishes to fully understand the mapping from EMG to movement. But like principal component analysis and other forms of factor analysis, the analysis by DRNN can be useful to identify structure in the underlying relationship, with the added advantage of linking muscle activation to actual movement and with the possibility of identifying causal relationships resulting from neural connections as well as from biomechanical constraints. The DRNN could potentially be coupled with other exploratory techniques, such as more recent efforts to identify modularity in temporal as well as spatial domains (d'Avella et al., 2003; d'Avella and Bizzi, 2005; Delis et al., 2014). Indeed, the "memory" elements of the DRNN have the potential to identify dynamical constraints that determine not only which muscles to activate for a given movement, but also when.

# **CONCLUSIONS**

A comparison of a DRNN's ability to generalize between movement conditions, combined with principal component analysis, suggests that the tuning of movement-generation modules for movement direction is related primarily to the temporal aspects of the movement, whereas tuning that takes into account joint biomechanics and inter-subject differences appears to be spatial (in the sense of how activity is spread between muscles). Analysis of the network's interpretation of synergies identified by principal component analysis provides further insight into how movement-generating modules are defined. This tool may therefore be used to motivate future experiments on the question of how human motor behavior may be organized in a modular fashion.

### **ACKNOWLEDGMENTS**

This work was funded by the Belgian Federal Science Policy Office, the European Space Agency, (AO-2004, 118), the Belgian National Fund for Scientific Research (FNRS), the research funds of the Université Libre de Bruxelles and of the Université de Mons (Belgium), the FEDER support (BIOFACT), and the MINDWALKER project (FP7 – 2007–2013) supported by the European Commission.

### **REFERENCES**


**Conflict of Interest Statement:** The Guest Associate Editor Dr. Ivanenko declares that, despite having collaborated with authors Dr. Bengoetxea, Dr. Cheron, Dr. Dan and Dr. Hoellinger, the review process was handled objectively. The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 25 November 2013; accepted: 06 August 2014; published online: 17 September 2014.*

*Citation: Bengoetxea A, Leurs F, Hoellinger T, Cebolla AM, Dan B, McIntyre J and Cheron G (2014) Physiological modules for generating discrete and rhythmic movements: action identification by a dynamic recurrent neural network. Front. Comput. Neurosci. 8:100. doi: 10.3389/fncom.2014.00100*

*This article was submitted to the journal Frontiers in Computational Neuroscience. Copyright © 2014 Bengoetxea, Leurs, Hoellinger, Cebolla, Dan, McIntyre and Cheron. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Physiological modules for generating discrete and rhythmic movements: component analysis of EMG signals

#### **Ana Bengoetxea<sup>1,2</sup>\*, Françoise Leurs<sup>1</sup>, Thomas Hoellinger<sup>1</sup>, Ana Maria Cebolla<sup>1</sup>, Bernard Dan<sup>1,3</sup>, Guy Cheron<sup>1,4</sup> and Joseph McIntyre<sup>5,6</sup>**

<sup>1</sup> Laboratoire de Neurophysiologie et Biomécanique du Mouvement, Faculté des Sciences de la Motricité, Université Libre de Bruxelles, Brussels, Belgium <sup>2</sup> Departamento de Fisiología, Laboratorio de Cinesiología y Motricidad, Facultad de Medicina y Odontología, Universidad del País Vasco-Euskal Herriko Unibertsitatea (UPV/EHU), Leioa, Spain

<sup>3</sup> Département de Neurologie, Hôpital Universitaire des Enfants Reine Fabiola, Université Libre de Bruxelles, Brussels, Belgium

<sup>4</sup> Laboratoire d'Électrophysiologie, Université de Mons-Hainaut, Mons, Belgium

<sup>5</sup> Health Division, Fundacion Tecnalia Research and Innovation, San Sebastian, Spain

6 IKERBASQUE Science Foundation, Bilbao, Spain

### **Edited by:**

Tamar Flash, Weizmann Institute, Israel

### **Reviewed by:**

Todd Troyer, University of Texas, USA Juan C. Moreno, Spanish National Research Council, Spain

### **\*Correspondence:**

Ana Bengoetxea, Departamento de Fisiología, Laboratorio de Cinesiología y Motricidad, Facultad de Medicina y Odontología, Universidad del País Vasco-Euskal Herriko Unibertsitatea (UPV/EHU), Barrio Sarriena s/n., 48940 Leioa, Spain e-mail: ana.bengoetxea@ehu.es

A central question in Neuroscience is that of how the nervous system generates the spatiotemporal commands needed to realize complex gestures, such as handwriting. A key postulate is that the central nervous system (CNS) builds up complex movements from a set of simpler motor primitives or control modules. In this study we examined the control modules underlying the generation of muscle activations when performing different types of movement: discrete, point-to-point movements in eight different directions and continuous figure-eight movements in both the normal, upright orientation and rotated 90°. To test for the effects of biomechanical constraints, movements were performed in the frontal-parallel or sagittal planes, corresponding to two different nominal flexion/abduction postures of the shoulder. In all cases we measured limb kinematics and surface electromyographic (EMG) signals for seven different muscles acting around the shoulder. We first performed principal component analysis (PCA) of the EMG signals on a movement-by-movement basis. We found a surprisingly consistent pattern of muscle groupings across movement types and movement planes, although we could detect systematic differences between the PCs derived from movements performed in each shoulder posture and between the principal components associated with the different orientations of the figure. Unexpectedly, we found no systematic differences between the figure eights and the point-to-point movements. The first three principal components could be associated with a general co-contraction of all seven muscles plus two patterns of reciprocal activation.
From these results, we surmise that both "discrete-rhythmic" movements such as the figure eight and discrete point-to-point movements may be constructed from three fundamental modules: one regulating the impedance of the limb over the time span of the movement and two others operating to generate movement, one aligned with the vertical and the other with the horizontal.

**Keywords: rhythmic movement, figure-eight, muscular synergy, principal component analysis, varimax factor analysis, upper limb**

## **INTRODUCTION**

Following the quantitative definitions for discrete and rhythmic gestures proposed by Hogan and Sternad (2007), handwriting movements, in terms of behavioral and observational features, are special cases of discrete movements because they have rhythmic phases but last a finite duration, with the hand starting and ending at zero velocity. Making the distinction between discrete and rhythmic movements is central because their underlying neural control could be different (Hollerbach, 1981). In fact, one can find in the literature three different proposals concerning the control of discrete vs. rhythmic movements. One view is that rhythmic movements are a concatenation of a series of discrete movements, the latter of which form the basic building blocks for complex movements (Abend et al., 1982; Soechting and Terzuolo, 1987a,b; Kalaska et al., 1997; Sabes, 2000). An opposing view is that rhythmic movements represent the fundamental class and that discrete movements are simply abbreviated rhythmic movements (Sternad and Schaal, 1999; Sternad et al., 2000; Schaal and Sternad, 2001; Sternad and Dean, 2003). Both of these viewpoints would suggest that only a single, common control mechanism is used to achieve both types of movement. A third possibility is that rhythmic and discrete movements represent two distinct movement classes that are mediated by separate neural control circuitry. Recent behavioral (Ikegami et al., 2010; Howard et al., 2011) and imaging studies (Schaal et al., 2004) support this latter hypothesis.

Numerous studies have examined the invariance of kinematic parameters for drawing movements, looking for the principles used by the central nervous system (CNS) for motor control (Viviani and Terzuolo, 1982; Lacquaniti et al., 1983; Viviani and McCollum, 1983; Soechting et al., 1986; Lacquaniti, 1989). Based on the kinematic invariance of the end-effector obtained, these authors have proposed that for curved movements the CNS respects the so-called "2/3 power law" and that each kinematic segment respects the same kinematic invariance presented by discrete movement, as specified by the "isochrony principle". In light of these kinematic invariances, their conclusions have been used to support the hypothesis that the figure-eight and other "discrete-rhythmic" movements are composed of a series of concatenated discrete movements. Indeed, the observed presence of multiple peaks in the endpoint velocity profile might suggest that a figure-eight is composed of a series of superimposed discrete segments (Richardson and Flash, 2002). But kinematic segmentation doesn't necessarily imply a segmented control of the movement (Sternad and Schaal, 1999). Indeed, evidence that the figure-eight is in fact an abbreviated rhythmic movement is emerging (Bengoetxea et al., 2010).
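The 2/3 power law relating angular velocity to curvature can be illustrated on synthetic data. As a minimal sketch (not from the study; the ellipse semi-axes are arbitrary illustrative values), harmonic tracing of an ellipse obeys the law exactly, with gain (ab)^(1/3):

```python
import numpy as np

# Harmonic tracing of an ellipse (x = a cos t, y = b sin t) satisfies the
# 2/3 power law exactly: angular velocity = K * curvature**(2/3) with
# K = (a*b)**(1/3).  Semi-axes are arbitrary illustrative values.
t = np.linspace(0.0, 2.0 * np.pi, 2000, endpoint=False)
a, b = 0.20, 0.10
dx, dy = -a * np.sin(t), b * np.cos(t)         # analytic first derivatives
ddx, ddy = -a * np.cos(t), -b * np.sin(t)      # analytic second derivatives
speed = np.hypot(dx, dy)
curvature = np.abs(dx * ddy - dy * ddx) / speed**3
angular_velocity = speed * curvature           # omega = v * kappa
K = angular_velocity / curvature**(2.0 / 3.0)  # constant if the law holds
```

Here the ratio K is constant over the whole trace, which is the signature of the power law.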

Within the set of all handwriting movements, the figure-eight is of particular interest from a theoretical and experimental point of view because it can be described as a Lissajous figure for which the vertical and horizontal frequency components are in an exact ratio of 2 (Buchanan et al., 1996). A figure eight can therefore also be described as the result of two coupled oscillators acting in perpendicular directions over a finite number of cycles (two horizontal cycles and one vertical cycle, to be exact). Although we previously demonstrated that for rapid execution of a single figure-eight movement the isochrony principle and the 2/3 power law between angular velocity and curvature are respected, and that the tangential velocity profile is invariant relative to the initial direction of movement (Cheron et al., 1999), electromyographic activity (EMG) analyses have shown that muscular activations present temporal modulation related to the figure as a whole, in contrast to the directional pattern of tuning that would point to a segmented control. Moreover, we have shown that the prime movers are partitioned into two sets of synergistic muscles acting in a reciprocal mode, and that this reciprocal command was highly correlated with the spatial component of the velocity presenting the highest frequency (for a vertical figure-eight, the horizontal velocity component) (Bengoetxea et al., 2010). These results pointed to one or more oscillators controlling two muscular synergies.
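The 2:1 Lissajous description can be made concrete with a short sketch; the amplitudes and the 1 Hz base frequency below are illustrative, not values taken from the experiment:

```python
import numpy as np

# A "vertical" figure-eight as a 2:1 Lissajous figure: the horizontal
# component completes two cycles while the vertical completes one.
# Amplitudes and the 1 Hz base frequency are illustrative values.
f = 1.0                                       # base (vertical) frequency, Hz
t = np.linspace(0.0, 1.0 / f, 500)            # one full vertical cycle
x = 0.05 * np.sin(2.0 * 2.0 * np.pi * f * t)  # horizontal: two cycles
y = 0.15 * np.sin(2.0 * np.pi * f * t)        # vertical: one cycle
# The trace starts and ends at the center of the figure, as in the task.
```

Plotting `x` against `y` yields a closed figure-eight whose long axis is vertical, matching the movement subjects were asked to draw.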

In the study presented here we set out to determine whether the modules underlying the production of discrete-rhythmic movements, in terms of muscle synergies, reflect an organization based on a series of discrete movements or on a combination of abbreviated oscillations. We reasoned that if two orthogonal coupled oscillators underlie the execution of the figure-eight movement, these oscillators should define two muscular synergies, each one dedicated to one of the two spatial components of the kinematics. On the other hand, we know that the synergistic organization is flexible and that a single muscle may be a member of more than one synergy (Tresch et al., 1999; Weiss and Flanders, 2004). We also know that EMG patterns are modulated by movement direction in 3D space (Flanders et al., 1994, 1996; Hoffman and Strick, 1999) and that a muscle's activation depends on its mechanical action, which in turn depends on joint position (Hogan, 1985; Buneo et al., 1997). Finally, we know that the mapping from required joint torques to muscle forces is most often underconstrained, allowing the CNS to exploit additional degrees of freedom to tune other properties of the musculoskeletal system, such as limb impedance (Hogan, 1985). We therefore looked at how each of these considerations influences the grouping of muscles into functional modules.

In the present work we asked how movement type (discrete vs. discrete-rhythmic), in addition to directional and biomechanical constraints, affects the organization of modules used to generate movements of the arm. We used principal component analysis (PCA) and varimax factor analysis to extract synchronous synergies (d'Avella and Bizzi, 2005; Klein Breteler et al., 2007) and to assess the relative involvement of each recorded muscle. We compared the synergies identified by these methods between different orientations, joint configurations and directions of movement for the figure eight, and between figure eights and discrete point-to-point movements. In a companion article (see Bengoetxea et al., 2014) we combined this factor analysis with the identification of the relationship between EMG and movement parameters via a dynamic recurrent neural network (DRNN), in order to link the muscular synergies extracted with the movement generated. These two studies revealed a high level of commonality between the production of discrete and discrete-rhythmic movements and suggest an organization of motor control constructed from one or more modules controlling limb dynamical properties (e.g., impedance) and multiple modules that elicit reciprocal activation of opposing muscles to generate forces and movement.

## **MATERIAL AND METHODS**

Data were collected from a total of 8 right-handed subjects, 4 males and 4 females, aged between 21 and 40 years. All were in good health, free from known neurological disorders, and had given informed consent to take part in the study, which was approved by the ethics committee at Brugman Hospital in Brussels ("Comité d'éthique hospitalier"—OM26). Data from one subject were unfortunately unusable due to a technical problem, leaving a total subject pool of 7 (3 males, 4 females).

Subjects were asked to draw, as fast as possible, figure-eight movements in free space with the right arm fully extended at the elbow (for more details see Bengoetxea et al., 2010). Movements were initiated in the center of the figure with an initial up-right (UR), down-right (DR), up-left (UL) or down-left (DL) direction with respect to external coordinates, and subjects performed each of these movements twice. Two trials for one subject were lost for technical reasons, leaving a total of 7 × 4 × 2 − 2 = 54 figure-eight movements in the frontal plane. All seven subjects also performed eight point-to-point movements starting from a central target, one in each of eight different directions. In addition, three subjects (subjects 1, 2 and 3) performed figure-eight movements in both the frontal and sagittal workspaces, while the four other subjects (subjects 4, 5, 6 and 7) performed "horizontal" figure-eight movements (figure eights rotated in the frontal plane by 90◦, such that the long axis of the figure was horizontal instead of vertical). A part of the data has been reported in a previous study (data from the three subjects performing figure eights in the frontal and sagittal planes, see Bengoetxea et al., 2010). Data from the discrete movements performed by these subjects, and from the figure-eight and discrete movements from the four other subjects, are reported for the first time here and in our companion article in this issue.

# **DATA ACQUISITION**

Data acquisition methods were the same for both the previously reported data sets (Bengoetxea et al., 2010) and the new data reported here. Movements of the index finger were recorded and analyzed using the optoelectronic ELITE system (2 CCD cameras, sampling rate of 100 Hz) (BTS, Milan) (Ferrigno and Pedotti, 1985). The cameras were placed 4 m apart from each other and 4 m from the subject. Four markers were attached to the arm (on the acromion, the lateral condyle of the humerus, the radial apophysis of the wrist and the index finger). Velocity signals were obtained by digitally differentiating position signals using a fifth-order polynomial approximation. Reconstruction of the arm movements by the ELITE system using the trajectories of the 4 markers confirmed the visual observation that the upper arm, forearm, hand and index finger acted as a rigid link (Bengoetxea et al., 2010). Thus, we analyzed here only the index-finger marker that was used to trace the figure-eight.
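The differentiation step can be sketched with a Savitzky-Golay filter, one common realization of differentiation via a local polynomial fit; the 11-sample window and the use of `scipy.signal.savgol_filter` are our assumptions, as the text specifies only a fifth-order polynomial approximation:

```python
import numpy as np
from scipy.signal import savgol_filter

# Differentiation of a marker trajectory via a local polynomial fit.
# A Savitzky-Golay filter with polyorder=5 is one common realization of a
# "fifth-order polynomial approximation"; the 11-sample window is an
# assumption (the paper does not give one).
fs = 100.0                               # ELITE sampling rate (Hz)
t = np.arange(0.0, 2.0, 1.0 / fs)
pos = 0.15 * np.sin(2.0 * np.pi * t)     # synthetic 1 Hz marker signal (m)
vel = savgol_filter(pos, window_length=11, polyorder=5, deriv=1, delta=1.0 / fs)
true_vel = 0.15 * 2.0 * np.pi * np.cos(2.0 * np.pi * t)
```

On this synthetic sinusoid the filtered derivative agrees closely with the analytic velocity away from the record edges.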

Surface EMG was recorded with the TELEMG system (BTS, Milan) synchronized with the kinematic data. Silver-silver chloride electrode pairs (interelectrode distance of 2.5 cm) were placed over the belly of the following 7 muscles: posterior deltoid (PD), anterior deltoid (AD), median deltoid (MD), pectoralis major superior and inferior (PMS and PMI), latissimus dorsi (LD) and teres major (TM). Raw EMG signals (differential detection) were amplified 1000 times by a portable unit and transmitted to the main unit with a telemetry system (Telemg, BTS). A functional resistance test that isolated specific muscles was performed in order to verify the absence of cross-talk between adjacent muscles. Thereafter, EMGs were band-pass filtered (10–500 Hz), digitized at 1 kHz, full-wave rectified and smoothed by means of a third-order averaging filter with a time constant of 20 ms (Hof and Van den Berg, 1981).
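The conditioning chain might be approximated as follows. The specific filter realizations are assumptions, not the authors' exact implementation: at 1 kHz sampling the 500 Hz upper band edge coincides with the Nyquist frequency, so the band-pass reduces to a 10 Hz high-pass, and a third-order Butterworth low-pass stands in for the third-order averaging filter:

```python
import numpy as np
from scipy.signal import butter, filtfilt

def condition_emg(raw, fs=1000.0):
    # Band-pass, rectify and smooth a raw EMG trace (a sketch of the
    # pipeline in the text; filter realizations are assumptions).
    # 10-500 Hz band-pass: at fs = 1 kHz the upper edge is the Nyquist
    # frequency, so a 10 Hz high-pass covers the pass band.
    b, a = butter(4, 10.0 / (fs / 2), btype="highpass")
    bandpassed = filtfilt(b, a, raw)
    rectified = np.abs(bandpassed)                 # full-wave rectification
    # The "third-order averaging filter with a 20 ms time constant" is
    # approximated here by a 3rd-order low-pass Butterworth with cutoff
    # 1 / (2*pi*0.020) ~ 8 Hz (an assumption).
    b, a = butter(3, 8.0 / (fs / 2), btype="lowpass")
    return filtfilt(b, a, rectified)

fs = 1000.0
t = np.arange(0.0, 1.0, 1.0 / fs)
rng = np.random.default_rng(0)
raw = rng.normal(size=t.size) * (1.0 + np.sin(2.0 * np.pi * t))  # toy EMG
envelope = condition_emg(raw, fs)
```

The zero-phase `filtfilt` calls avoid the group delay a causal filter would introduce; whether the original processing was zero-phase is not stated in the text.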

# **COMPONENT ANALYSIS**

In the first part of our study we set out to identify synchronous synergies (d'Avella and Bizzi, 2005) using PCA. The input to the PCA was the EMG signal for each muscle and each figure-eight movement. The EMG signals were first normalized on a movement-by-movement basis for the discrete-rhythmic movements: for each EMG recording, the minimum value over the entire signal was subtracted and the result was divided by its maximum value, so that all EMG signals for each movement ranged from 0 to 1. A similar analysis was performed on a set of 8 point-to-point movements concatenated together, one in each of 8 directions, to produce the principal components associated with the production of discrete movements (Klein Breteler et al., 2007). We performed the PCA using the Statistica (© Statsoft) factor analysis module. This analysis resulted in 7 principal component vectors, each composed of 7 loading factors (W1muscle–W7muscle) corresponding to the weights given to the EMG from each of the 7 muscles for each factor.
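A minimal sketch of this normalization and decomposition, assuming PCA on the correlation matrix (i.e., z-scored channels, as a factor-analysis module operates on standardized inputs) and toy EMG data:

```python
import numpy as np

def emg_pca(emg):
    # Min-max normalize each channel for the movement, then factor the
    # correlation matrix (channels z-scored).  Returns eigenvalues
    # (variance explained per component) and loading vectors as columns,
    # both in decreasing order of explained variance.
    e = emg - emg.min(axis=0)
    e = e / e.max(axis=0)                       # each channel in [0, 1]
    z = (e - e.mean(axis=0)) / e.std(axis=0)    # unit-variance inputs
    corr = (z.T @ z) / z.shape[0]
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]

# Toy 7-muscle "EMG" driven by two underlying waveforms plus noise.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 300)
sources = np.c_[np.sin(2.0 * np.pi * t), np.sin(4.0 * np.pi * t)]
emg = np.abs(sources @ rng.normal(size=(2, 7))) + 0.05 * rng.random((300, 7))
variance, loadings = emg_pca(emg)
```

With 7 z-scored channels the eigenvalues sum to 7, and each eigenvalue divided by 7 gives the fraction of variance explained by that component.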

We focused the subsequent analysis of the principal component decompositions on the first 3 principal components, as these components accounted for 83.01 ± 2.84 % of the variance in the EMG data (mean across movements). We also computed the varimax rotation (Kaiser, 1958) of the first three principal components for each movement to generate a new set of three orthogonal loading vectors for each movement.
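Varimax rotation is not provided by NumPy or SciPy, but the standard SVD-based algorithm is short; the sketch below applies it to random toy loadings standing in for the first three principal components:

```python
import numpy as np

def varimax(loadings, tol=1e-8, max_iter=500):
    # Varimax rotation (Kaiser, 1958): find the orthogonal rotation of the
    # (variables x components) loading matrix that maximizes the variance
    # of the squared loadings within each component (standard SVD recipe).
    p, k = loadings.shape
    rotation = np.eye(k)
    prev = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        u, s, vt = np.linalg.svd(
            loadings.T @ (rotated**3 - rotated * (rotated**2).sum(axis=0) / p)
        )
        rotation = u @ vt
        if s.sum() < prev * (1.0 + tol):
            break
        prev = s.sum()
    return loadings @ rotation

rng = np.random.default_rng(2)
L = rng.normal(size=(7, 3))   # stand-in for the first three PC loadings
V = varimax(L)
# An orthogonal rotation leaves the total explained variance unchanged.
```

Because the rotation is orthogonal, the rotated loadings span the same subspace and explain the same total variance as the unrotated ones, which is the point made in the "Comparing PCA vs. varimax" section.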

Because we computed the principal components on a movement-by-movement basis, we obtained multiple principal component and varimax decompositions for each direction, plane, figure-eight orientation and movement type. The principal component calculation, by design, assigns loading vectors in decreasing order according to the amount of variance explained by each one. If one adopts the basic premise that principal components reflect an underlying module or synergy, a given synergy could be represented by the first, second or third principal component for a given movement trial, if the amount of movement (variance) associated with that synergy increases or decreases between trials. We therefore used k-means clustering, with the number of clusters set to three, as an objective means to assign each loading vector to the group PC1, PC2 or PC3 based on similarity rather than on the amount of variance explained. The clustering algorithm was applied to the set of first three principal component loadings collected across all movements, all mixed together for a total of 351 vectors, without regard for each vector's ranking within the trial from which it was obtained. If the synergies are stable across movement types and subjects, one would expect that one of the three loading vectors from each movement would be assigned to PC1, one to PC2 and one to PC3. In the rare case where the k-means clustering assigned two loading vectors from a single movement trial to the same cluster, the loading vector with the greater distance from the cluster mean was shifted to the cluster that was left unassigned for that trial. A similar process was applied to assign the varimax loading vectors in each trial to groups VM1, VM2 and VM3.
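The assignment procedure might be sketched as follows, using `scipy.cluster.vq.kmeans2`; the repair step handles only the single-duplicate case described in the text, and the toy data are illustrative:

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def assign_components(vectors, trial_ids, seed=0):
    # Cluster all per-trial loading vectors into three groups, then repair
    # trials in which two vectors landed in the same cluster by moving the
    # vector farthest from that cluster's mean to the unassigned cluster
    # (a sketch; only the single-duplicate case is handled).
    centroids, labels = kmeans2(vectors, 3, minit="++", seed=seed)
    for trial in np.unique(trial_ids):
        idx = np.where(trial_ids == trial)[0]
        used = labels[idx]
        for c in range(3):
            dup = idx[used == c]
            if len(dup) > 1:
                missing = ({0, 1, 2} - set(used)).pop()
                dists = np.linalg.norm(vectors[dup] - centroids[c], axis=1)
                labels[dup[np.argmax(dists)]] = missing
    return labels

# Toy data: 10 trials, each contributing 3 well-separated loading vectors.
rng = np.random.default_rng(3)
base = np.zeros((3, 7))
base[0, 0] = base[1, 1] = base[2, 2] = 1.0
vectors = np.vstack([base + 0.05 * rng.normal(size=(3, 7)) for _ in range(10)])
trial_ids = np.repeat(np.arange(10), 3)
labels = assign_components(vectors, trial_ids)
```

After the repair pass, each trial contributes exactly one vector to each of the three groups, which is the property the stability argument in the text relies on.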

The time courses of the activation of each principal component (PC1–PC3) and varimax loading (VM1–VM3) were then computed by projecting, at each time step, the vector of 7 muscle EMGs onto the loading vector describing each component. Note that the calculation of the covariance used to compute the PCA removes the mean from each of the 7 columns of the input matrix (i.e., removes the average EMG at the input) and also scales each input so that each channel has a variance equal to 1. The reconstructed EMG signals were therefore scaled and offset appropriately to account for this scaling of the inputs to the PCA.
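The projection and inverse-scaling steps can be sketched as below; with all components retained, inverting the z-scoring recovers the original signals exactly, which serves as a sanity check:

```python
import numpy as np

def pc_time_courses(emg, loadings, k=3):
    # Project the z-scored (time x muscle) EMG onto the first k loading
    # vectors to obtain component time courses, then invert the scaling
    # and offset when reconstructing EMG, as described in the text.
    mu, sd = emg.mean(axis=0), emg.std(axis=0)
    z = (emg - mu) / sd
    scores = z @ loadings[:, :k]                  # component time courses
    recon = scores @ loadings[:, :k].T * sd + mu  # back to EMG units
    return scores, recon

# Sanity check on toy data: with all 7 components kept, the inverse
# scaling recovers the original signals exactly.
rng = np.random.default_rng(4)
emg = rng.random((200, 7))
corr = np.corrcoef(emg, rowvar=False)
_, eigvec = np.linalg.eigh(corr)                  # orthonormal loadings
scores, recon = pc_time_courses(emg, eigvec, k=7)
```

With k = 3, as in the study, `recon` is instead the rank-3 approximation of the EMG carried by the first three components.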

# **Statistical analyses**

We considered that the loading vectors associated with PC1, PC2 and PC3 within a given condition were sufficiently similar to allow the PCs to be compared based on the mean and variance of the loading vectors computed across subjects. This assertion is supported by two statistical arguments. First, we performed the k-means cluster analysis on the ensemble of loading vectors identified by PCA for the figure-eight movements performed in the frontal plane. Out of 54 movements (162 principal-component loading vectors), only one PC1 loading vector was misclassified into the cluster containing PC2s. Thus, the principal components were highly repeatable and unambiguously grouped into three clusters. We further verified that the loading values for each muscle and each PC across subjects did not violate the assumption of a normal distribution, according to the Kolmogorov-Smirnov (K-S) test (*p* > 0.20 in all cases). We therefore used MANOVA to compare the average PC1, PC2 or PC3 vectors across different conditions. Whenever the MANOVA revealed a significant difference (*p* < 0.01) of the loading vectors between conditions we performed a one-way ANOVA muscle-by-muscle to determine which muscle loadings were affected.
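The per-muscle normality screen might look like the following sketch; the values are illustrative, and testing the standardized sample against a standard normal (the study's analysis was run in Statistica) is our assumption:

```python
import numpy as np
from scipy import stats

# Sketch of the normality screen: a muscle's loading across the 7
# subjects is standardized and compared to a normal distribution with a
# Kolmogorov-Smirnov test (toy values, not the study's data).
rng = np.random.default_rng(5)
loadings_across_subjects = rng.normal(0.55, 0.10, size=7)  # e.g. AD in PC1
z = loadings_across_subjects - loadings_across_subjects.mean()
z = z / loadings_across_subjects.std(ddof=1)
stat, p = stats.kstest(z, "norm")   # p > 0.05: no evidence of non-normality
```

A non-significant result (as reported, *p* > 0.20 in all cases) licenses the subsequent parametric MANOVA comparisons.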

Loading vectors based on the varimax rotation (VM1, VM2 and VM3) were somewhat less distinct across trials. Using the same k-means clustering as described for the principal components in the frontal plane, there were more instances (8 out of 54 movements) where the k-means clustering attributed two loading vectors from the same movement to the same group. Furthermore, even after correcting these cases by reclassifying the vector with the largest distance from the cluster mean, the loading values for individual muscles did not always follow the normal distribution across trials (K-S: *p* < 0.05). Nevertheless, based on visual inspection and the central limit theorem, we considered that the within-subject averages could be compared across trials as a means of detecting systematic changes between conditions. Indeed, when we computed the average VM1, VM2 and VM3 for each subject across all 4 figure-eight movement directions (See Section Results for further details), the individual weight for each muscle of these average loading vectors did respect the normal distribution (K-S: *p* > 0.2). We therefore also applied MANOVA to compare VM1, VM2 and VM3 for different types of movement, as we did for the principal components PC1, PC2 and PC3.

To compare which of the two factoring methods (principal component or varimax) produced less variation in loadings across subjects, we counted the number of times that the cluster analysis assigned two loading vectors from the same movement to the same cluster, under the assumption that the more the loading vectors varied in direction, the higher the chance that such misclassification could occur. We also computed the distance from the cluster mean for each loading vector and applied a one-way ANOVA with component (PC1, PC2, PC3, VM1, VM2, VM3) as the independent factor as a measure of the dispersion of individual vectors within each cluster.
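The dispersion comparison can be sketched with `scipy.stats.f_oneway` on toy distances; the group means and SDs below are illustrative, chosen only to show the mechanics:

```python
import numpy as np
from scipy.stats import f_oneway

# One-way ANOVA on the distance of each loading vector from its cluster
# mean, with the six component clusters as groups (toy data: 117 vectors
# per group, means and SD chosen for illustration only).
rng = np.random.default_rng(6)
dists = [np.abs(rng.normal(loc, 0.05, size=117))
         for loc in (0.10, 0.15, 0.20, 0.20, 0.15, 0.20)]  # PC1..PC3, VM1..VM3
F, p = f_oneway(*dists)
```

A significant main effect, as in the study, indicates that the clusters differ in how tightly their member vectors gather around the cluster mean.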

# **RESULTS**

We first looked to see if the PCA, which we applied to each movement one by one, was able to identify regular patterns of muscle involvement across the different movement directions and movement planes. Given that subjects may differ in the way that muscles are organized into modules or synergies, we first analyzed the results from a single representative subject. This is the same subject whose data were used to train the artificial neural network in our companion study (see Bengoetxea et al., 2014). We then analyzed the principal components obtained across all participants to look for systematic, subject-independent changes in potential muscle synergies between conditions.

# **PCA ANALYSIS**

**Figure 1** illustrates, for one representative subject, the factor loadings for the first 3 principal components (left column) for each initial movement direction, the latter represented by different colors and symbols. From the factor loadings, one can observe that PC1 included a contribution of all 7 muscles in a synergistic pattern (all weights were positive), PC2 identified a reciprocal pattern of activation (positive and negative weights) between MD, PD and TM on one side and AD, PMS and PMI on the other (LD had loadings close to 0), while PC3 identified a different reciprocal relationship, with AD and MD clearly on one side and PMI and TM clearly on the other (PD, PMS and LD had loadings close to 0). It is interesting to note that in PC2 and PC3 two groups of muscles appeared according to their mechanical actions. For PC2, the two sets of muscles have opposite actions with respect to horizontal (left-right) movements, while for PC3, the groups of muscles have opposite actions with respect to vertical (up-down) movements.

**Figure 1** (right column) also shows the temporal evolution of the principal components for each of the 4 different movement directions (UL, UR, DL, DR). One can see that PC1 showed activation over the duration of the movement, with little or no activity in the stationary phase of the recording (prior to 0.5 s and after 2.0 s in the figure shown). There was little difference between the 4 different movement directions. The time courses of the second and third PCs both showed significant modulation over the course of the movement that depended on the direction. PC2 presented 3 positive and 2 negative peaks for the UR and DR directions and 3 negative and 2 positive peaks for the UL and DL directions. PC3 also showed temporal modulation, but differed in terms of the number of peaks compared to PC2. Specifically, PC3 presented 1 negative and 2 positive peaks for the UL and UR directions and 1 positive and 2 negative peaks for the DL and DR directions.

To characterize the extent to which the different muscles participated in each principal component, independent of their direction of action, we performed for this subject a repeated-measures ANOVA on the absolute values of the loading factors, with principal component (PC1, PC2, PC3) and muscle (AD, MD, PD, PMS, PMI, LD, TM) as factors. The interaction between muscle and PC was significant (*F*(12,84) = 48.012, *p* < 0.001). Scheffé's *post-hoc* analyses showed that AD and PMI participated with similar loadings in all three PCs (mean loading ± SD: 0.53 ± 0.1, 0.52 ± 0.11 and 0.54 ± 0.1 for AD in PC1, PC2 and PC3, respectively, and 0.65 ± 0.07, 0.50 ± 0.09 and 0.39 ± 0.09 for PMI). MD and LD participated significantly more in the first PC compared to PC2 and PC3 (*p* < 0.03), whereas they showed little or no difference between PC2 and PC3. LD had loadings near 0 for PC2 and PC3, while MD participated in both PC2 and PC3 at the same level as AD and PMI (mean loadings ± SD: 0.75 ± 0.03, 0.46 ± 0.05 and 0.35 ± 0.05 for MD in PC1, PC2 and PC3, respectively, and 0.86 ± 0.02, 0.1 ± 0.07 and 0.11 ± 0.07 for LD). PD and PMS had the same level of participation in PC1 and PC2 but were not implicated in PC3 (mean loadings ± SD: 0.74 ± 0.05, 0.59 ± 0.05 and 0.04 ± 0.03 for PD in PC1, PC2 and PC3, respectively, and 0.53 ± 0.09, 0.75 ± 0.05 and 0.1 ± 0.07 for PMS). TM was the only recorded muscle with the same level of activity in PC3 as in PC1 (0.58 ± 0.1, 0.67 ± 0.06) and participated only weakly in PC2 (0.21 ± 0.12).

**Figure 1 |** Left column: factor loadings for PC1, PC2 and PC3 for each of the 7 muscles recorded. AD: anterior deltoid, MD: medial deltoid, PD: posterior deltoid, PMS: pectoralis major superior, PMI: pectoralis major inferior, LD: latissimus dorsi, TM: teres major. Right column: the temporal component for each PC and each direction, identified by color. UR: green, UL: purple, DR: blue, DL: red.

**Figure 2** illustrates the EMG signals for each muscle corresponding to the time course and loadings of each of the first 3 principal components, shown here for the movement initiated downward and to the right (DR). The reconstructed signals reinforce the interpretation given previously about the role of each principal component (synergy) in the execution of the movement. Specifically, one can observe a co-contraction of all muscles during the movement for EMG reconstructed from PC1, whereas the synergies from PC2 and PC3 produced reciprocal activation patterns for which not all muscles participated at the same level. For EMG activations reconstructed from PC2, the figure illustrates that for a movement initiated in the down and right direction, MD and PD were the prime movers (for a right arm their action is to move the arm to the right) and presented a reciprocal command with respect to AD, PMS and PMI; LD and TM were not implicated in this synergy. The synergy extracted by PC3 shows that TM and PMI were agonists and were activated first, consistent with their action of moving the arm downward, and presented reciprocal activity with respect to AD, MD and PMS. The reciprocal activity for PC3 was, however, less "pure" than for PC2.

**Figure 3** shows the muscle loadings for each of the first three principal components for all subjects, separated as a function of PC and of movement direction. The loading factors were remarkably similar for the 7 different subjects; the average intersubject standard deviation for each muscle and each factor was 0.133 ± 0.047. This observation, plus the stability in the cluster analysis of PCs (See Section Methods), justified the statistical comparison of loading patterns across subjects.

### **FACTORS AFFECTING MUSCLE SYNERGIES**

Based on the analysis and observations presented above we proceeded to analyze the data based on the means and variances across subjects of the muscle loadings for each of the first three principal components. We considered four main factors that might affect how muscles are grouped into synchronous synergies:

1. The initial direction of the movement (UR, UL, DR or DL).
2. The orientation of the figure eight (vertical or horizontal).
3. The workspace in which movements were performed (frontal or sagittal plane).
4. The type of movement (discrete-rhythmic figure eights vs. discrete point-to-point movements).
The search for potential effects of factors 1 and 4 addressed the primary questions that motivated our study, i.e., how might the time course of movement directions affect the grouping of muscles into modules or synergies? Thus, all seven subjects were asked to perform trials that allowed these two contrasts. The other two factors (figure orientation and workspace) addressed secondary questions that provided interesting benchmarks against which to compare the primary results. For practical reasons, therefore, we asked only 4 of our subjects to perform figure eights in both the vertical and horizontal orientations and only 3 of our subjects to perform movements in both the frontal and sagittal planes. The results of each of these contrasts are described below. The loading vectors averaged across subjects are shown in **Figure 4**, while **Figure 5** shows the contribution of each muscle to each of the first three principal components for each subject and each condition. The details of the statistical tests are reported in **Table 1**.

# **Movement direction**

All seven subjects performed the figure-eight movement two times in each of the four possible directions. The loadings within a given PC were remarkably similar across the four different movement directions. There was no significant difference between the loading vectors computed across subjects for each direction (*p* > 0.2, see **Table 1**), and one-way ANOVA tests conducted separately on each muscle loading for each PC confirmed that the loading assigned to each muscle did not change as a function of movement direction for any of the first three PCs.

### **Figure-eight orientation**

Four subjects traced out figure eights in both the vertical and horizontal orientations in the frontal plane. The average loading vectors across subjects were qualitatively very similar for the two orientations. Nevertheless, some reliable differences were detected. The MANOVA test of loading vectors was significant for each of the three PCs (*p* < 0.01). One-way ANOVAs computed *post-hoc* showed that PD, PMI and TM all had significantly greater weight (more negative values) in PC3 for the horizontal vs. vertical orientation. There was a concomitant decrease in the weight of PMI in PC1 and of PD and TM in PC2. AD had somewhat less influence in PC2 (less negative weight) in the horizontal figure eight, but there was no change in AD's contribution to either PC1 or PC3.

### **Frontal vs. sagittal plane**

Three subjects performed the figure-eight movements in two different nominal orientations of the outstretched arm: with the arm extended straight ahead (frontal plane) and with the arm stretched out straight to the side (sagittal plane). Again, the loadings for each PC were qualitatively similar between the two conditions, but with some small variations. The MANOVA showed only a marginally significant difference between the planes for PC1 (*p* = 0.0206) but a statistically reliable difference between the planes for PC2 and PC3 (*p* < 0.01). One-way ANOVAs applied *post-hoc* demonstrated that, in the sagittal compared to the frontal plane, PMS had a slightly greater influence in PC1; MD had a greater influence, and AD and TM lesser influences, on PC2; and MD and PMI had less weight in PC3.

### **Discrete vs. discrete-rhythmic movements**

All seven subjects performed both the figure eights and the point-to-point movements in the frontal plane. As for the comparison between movement directions, there was no apparent difference in the principal components computed for the discrete-rhythmic figure eights and the discrete point-to-point movements. The MANOVA showed no significant difference (*p* > 0.4) between movement types for any of the three principal components PC1, PC2 and PC3.

### **VARIMAX ROTATION**

Compared to the un-rotated principal components computed for S1 (**Figure 1**), the varimax rotation for the same subject (**Figure 6**) grouped muscles into components (synergies) in a quite different fashion. Rather than identifying a "co-activation" module and two "reciprocal" modules, as seen for the first three principal components, the three varimax components could be described as one that drives more-or-less rightward rotations (MD, PD, LD), one that drives more-or-less leftward rotations (AD, PMS, PMI), and one that favors muscles with a component of action in the downward direction for movements of the outstretched arm (PMI, LD, TM). But this is a gross over-simplification: the participation of the different muscles in the three varimax loading vectors was much more mixed in terms of direction of action. Indeed, even if VM1 is dominated by muscles that rotate the outstretched arm rightward (MD, PD), other muscles (LD, TM) participate just as much in VM1 as in VM3. Similarly, PD contributes as much to VM3 as it does to VM1.

**Figure 7** shows the comparison of average loading for VM1, VM2 and VM3 as a function of movement direction (UR, UL, DR, DL), figure-eight orientation (horizontal, vertical), movement plane (frontal or sagittal) and movement type (discrete-rhythmic or discrete). **Figure 8** shows the contribution of each muscle to each varimax component, computed separately for each subject and each condition. The only highly reliable difference found in the MANOVA analysis of these data (see **Table 2**) was in the comparison between the vertical and horizontal figure-eight orientations (*p* < 0.01 for VM1 and VM3; *p* = 0.056 for VM2). *Post-hoc* analyses showed fewer and statistically weaker differences (*p* < 0.05) in the loading for individual muscles, compared to the equivalent tests applied to the principal components.

### **COMPARING PCA VS. VARIMAX**

As a matter of mathematical principle, the varimax rotation of the first three principal components explains the same amount of variance in the data as the first three principal components themselves, so neither decomposition can be preferred over the other on that basis. We instead asked whether the PCA or the varimax decomposition was more invariant across subjects and conditions. Two observations suggest that the first three principal components were somewhat more regular than the corresponding varimax rotation. First, consider the k-means clustering process that we used to assign the PCA vectors to the PC1, PC2 or PC3 groups and the varimax vectors to the VM1, VM2 and VM3 groups. If the three orthogonal vectors were well aligned across subjects and conditions, the k-means algorithm should assign one vector from each individual decomposition to each group. If, however, the orientation of the three vectors varies considerably around the mean, a given vector might fall between two clusters, causing two vectors from the same decomposition to be assigned to the same group. This happened 3 times for the principal components and 21 times for the varimax decompositions, out of a total of 117 movements. Second, an ANOVA applied to the distance from the mean for each component cluster (PC1, PC2, PC3, VM1, VM2, VM3) showed a significant main effect (*F*(5,696) = 50.69, *p* < 0.001) and a planned comparison showed a significant overall difference between the principal component clusters and the varimax clusters (*F*(1,696) = 36.09, *p* < 0.001). Scheffe's *post-hoc* analysis showed that PC1 had the smallest average distance; PC2 and VM2 were next, followed by PC3, VM1 and VM3. Thus, overall the principal component decomposition was less variable across movements, with the difference being mainly attributed to the lower intertrial variability of the co-activation component defined by PC1.

### **Table 1 | Statistics for raw PCA**.


Significant differences identified by post hoc analyses (right columns) are indicated by \* (p < 0.05) and \*\* (p < 0.01).
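The cluster-assignment consistency check described in this section can be sketched as follows. The data here are synthetic stand-ins (random loading matrices), not the recorded decompositions, and the use of `scipy.cluster.vq.kmeans2` is an illustrative choice rather than the authors' actual pipeline:

```python
import numpy as np
from scipy.cluster.vq import kmeans2

# Synthetic stand-in: one 3 x n_muscles loading matrix per movement
# (117 movements, as in the data set analyzed here).
rng = np.random.default_rng(0)
n_movements, n_muscles = 117, 8
loadings = rng.normal(size=(n_movements, 3, n_muscles))

# Pool all component vectors and cluster them into three groups.
vectors = loadings.reshape(-1, n_muscles)
_, labels = kmeans2(vectors, 3, minit='++', seed=0)

# A decomposition is consistently oriented if its three vectors land in
# three distinct clusters; count the decompositions where they do not.
labels = labels.reshape(n_movements, 3)
violations = sum(len(set(row)) < 3 for row in labels)
```

With real loading matrices, `violations` corresponds to the counts reported above (3 for PCA, 21 for varimax).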

# **DISCUSSION**

In this study we looked for modularity in patterns of muscle activation used to perform discrete-rhythmic movements, a class of movements typical of handwriting (Hogan and Sternad, 2007), and we compared the underlying structure with that identified for discrete movements performed in eight different directions in the frontal plane. We asked subjects to draw figure eights in different directions, in different orientations and in two different nominal arm postures. Thus, in addition to the main comparison between discrete and discrete-rhythmic movements, we also considered how the underlying modules might be tuned as a function of the directional, biomechanical and rhythmic constraints.

PCA and other forms of factor analysis have in recent years become important tools for identifying the muscular synergies underlying human movement, from reaching (d'Avella et al., 2003; Bizzi et al., 2008) and locomotion (Ivanenko et al., 2004; Dominici et al., 2011) to more complex movements (Weiss and Flanders, 2004; Klein Breteler et al., 2007; Danna-Dos-Santos et al., 2008). Here we used PCA and the varimax rotation as means to identify structure in the activation of different muscles.
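As a concrete illustration of the two decompositions, the sketch below extracts the first three principal components of a synthetic EMG matrix via SVD and applies Kaiser's varimax rotation. The matrix sizes are arbitrary, and, as noted above, an orthogonal rotation leaves the total explained variance unchanged:

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-6):
    """Kaiser's varimax rotation of a (muscles x components) loading matrix."""
    p, k = loadings.shape
    R = np.eye(k)
    var = 0.0
    for _ in range(max_iter):
        L = loadings @ R
        u, s, vh = np.linalg.svd(
            loadings.T @ (L**3 - L @ np.diag((L**2).sum(axis=0)) / p))
        R = u @ vh
        if s.sum() - var < tol:
            break
        var = s.sum()
    return loadings @ R

# Hypothetical EMG envelope matrix: time samples x muscles.
rng = np.random.default_rng(1)
emg = rng.random((500, 8))
emg = emg - emg.mean(axis=0)                 # center each muscle
_, _, vt = np.linalg.svd(emg, full_matrices=False)
pcs = vt[:3].T                               # loadings of the first three PCs
vm = varimax(pcs)                            # varimax-rotated loadings

# Both decompositions explain the same total variance.
var_pcs = (emg @ pcs).var(axis=0).sum()
var_vm = (emg @ vm).var(axis=0).sum()
```

The rotation only redistributes variance among the three components; it is this shared subspace that makes the comparison between the two decompositions meaningful.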

Both the three principal component vectors and the three varimax vectors were remarkably stable across the different movement directions, figure-eight orientations, joint configurations (movement plane) and types of movement (discrete or rhythmic). There were small but measurable differences in loading between the figure-eight orientations (horizontal or vertical) and between the movement planes (frontal or sagittal). It should be noted, however, that fewer subjects performed the movements in each of these two conditions, whereas all 7 subjects performed the figure eights and discrete movements in the frontal plane. It is possible that the orientation and movement-plane comparisons were more sensitive to inter-individual differences between conditions due to the lower N in each case. Furthermore, we do not exclude the possibility that loadings change between movement directions and movement types for individual subjects. But the main result, in the context of the questions raised in the Introduction, is that overall the synchronous synergies, whether identified through PCA or varimax rotation, were no more affected by the type of movement (rhythmic or discrete) than by changing the time series of movement directions, the organization of oscillations in cyclic movements or the biomechanical constraints. This observation runs counter to our hypothesis, according to which we expected the CNS to exploit the redundant degrees of freedom within the system to select synergies best adapted to the performance of one or the other type of movement. Our component analyses suggest that three main modules can be extracted for the movements described here, because they capture the bulk of the variation in EMG signals. For the un-rotated principal components, the first component showed a general co-activation of all the muscles, irrespective of the type of movement or the initial direction.
This co-activation started and ended with the movement, despite the fact that before and after the movement the arm was held in a static position. The co-contraction induced by PC1 would tend to stiffen the arm and thus serve to stabilize the arm's posture before, during and after the movement and to tune the impedance of the limb to meet the demands of the movements to follow. The second and third principal components each showed a pattern of reciprocal activation but differed in how the muscles were grouped. Whereas the second module encompassed muscles that are antagonistic in terms of their horizontal direction of action, the same muscles were divided in the third module according to their vertical direction of action. Under this decomposition, the actual movement would then be realized by two reciprocal synergies represented by PC2 and PC3. In the parlance proposed by Hogan and Sternad (2012), PC1 would constitute a "mechanical impedance" synergy while PC2 and PC3 would each be representative of "oscillation" synergies. Such a decomposition would be particularly adapted to rhythmic movements where each reciprocal synergy could be associated with a separate oscillator.

In contrast, each of the varimax components manifested only non-negative weightings (on average). The varimax decomposition would be consistent with "sub-movement" synergies that tend to push the limb in one direction or another through the activation of a set of agonistic muscles, without automatic co-activation or reciprocal de-activation of the effective antagonists (Hogan and Sternad, 2012). The varimax decomposition is more representative of a vector strategy in which each underlying module drives the limb in a given direction, reflecting the fact that muscles can pull, but not push, and thus cannot be negatively active. Modulation of limb impedance can, nevertheless, be achieved through the varimax decomposition, even if there is no identified co-activation module *per se*. Co-activation, and thus impedance modulation, could be achieved by recruiting VM1, VM2 and VM3 simultaneously, while cyclic movements could be achieved by various activations of the same modules to generate movement in different directions.

### **METHODOLOGY**

One might ask to what extent the details of the analysis procedures play a role in the modules that we observed. For instance, PCA is known to be potentially sensitive to the normalizations applied to the input data. In this study we set out to compute the principal components on a movement-by-movement basis, thus allowing us to examine the stability of the principal component decompositions across repeated movements in the same conditions and across different conditions by using standard statistical methods such as ANOVA.<sup>1</sup> But the algorithms for PCA transform the incoming data to be centered on zero with variance equal to one, essentially normalizing the data on a trial-by-trial basis. By doing so, we open up the possibility that factor decompositions might change from one movement to the next due to scaling factors that also changed from trial to trial. Surprisingly, our data showed that the decompositions were very stable, despite the potential variability stemming from the normalization procedure. Our trial-by-trial normalization therefore represents the more conservative method vis-à-vis our conclusion that synchronous muscular synergies vary little between discrete and discrete-rhythmic movements.
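A minimal sketch of this trial-by-trial normalization, assuming EMG envelopes arranged as time samples × muscles (the array sizes below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)

# One trial of hypothetical EMG envelopes: time samples x muscles.
trial = rng.random((200, 8))

# Trial-by-trial normalization: each channel is centered to zero mean
# and scaled to unit variance before the per-movement PCA.
z = (trial - trial.mean(axis=0)) / trial.std(axis=0)

# PCA of the single normalized trial via SVD.
_, s, _ = np.linalg.svd(z, full_matrices=False)
explained = s**2 / (s**2).sum()          # variance fraction per component
```

Because the centering and scaling factors are recomputed for every trial, any trial-to-trial change in EMG amplitude is absorbed by the normalization rather than by the component loadings.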

A second, perhaps more fundamental question concerns the factorization method used to analyze the data. Different factor-analysis approaches have been used in the past to extract synergies, and the results obtained depend on the method used (Tresch et al., 2006). Here we compared the results from two different methods: varimax vs. un-rotated principal components. Can one claim, based on our empirical observations, that the varimax is a better description of the underlying neural structure than the un-rotated principal components, or vice versa? Both the un-rotated principal components and the varimax decompositions are mathematically valid solutions that describe equally well the variance of the various EMG signals. We therefore asked whether one or the other provided a more consistent representation of muscle activation patterns across subjects and across movements. In our conditions we found that the principal component decomposition was less variable than the varimax decomposition when computed on a trial-by-trial basis. One might expect to see such a result if the neural hardware indeed organizes muscles into a fixed set of synergies according to PC1, PC2 and PC3. Thus, these observations support the hypothesis that muscles are organized in a set of co-contraction and reciprocal synergies. These observations do not, however, constitute a definitive proof, due to properties of the principal component computation. Principal component vectors are in fact the eigenvectors of a covariance matrix. Those vectors are distinct and well defined only when the eigenvalues corresponding to each vector are different. Had the first and second principal components accounted for similar amounts of variance in the EMG, the directions of the first and second PCs would be ill-defined and one would expect them to vary considerably just due to measurement noise. By analogy, the process of finding the varimax solution might add variability across trials if the optimal solution is not sharply defined in each case.

<sup>1</sup>Note, however, that the discrete movements were normalized as a set (i.e., the EMG from each muscle was divided by its maximum value across all 8 movements) and the principal component analysis was applied to each set individually. This makes sense in that, through the comparison of these data with the figure eights, we were testing the null hypothesis (which actually proved to be true) that figure-eight movements are constructed as a serial concatenation of the modules used to produce discrete movements.

### **Table 2 | Statistics for varimax rotation**.


Significant differences identified by post hoc analyses (right columns) are indicated by \* (p < 0.05).
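The dependence of principal component stability on eigenvalue separation can be illustrated numerically; the spectra and noise level in this sketch are arbitrary choices, not estimates from the EMG data:

```python
import numpy as np

rng = np.random.default_rng(3)

def noisy_leading_pc(eigvals, noise):
    """Leading eigenvector of a diagonal covariance perturbed by noise."""
    cov = np.diag(eigvals).astype(float)
    pert = rng.normal(scale=noise, size=cov.shape)
    cov += (pert + pert.T) / 2          # keep the perturbed matrix symmetric
    _, vecs = np.linalg.eigh(cov)
    return vecs[:, -1]                  # eigenvector of the largest eigenvalue

# Well-separated eigenvalues: the leading PC barely moves under noise.
stable = [abs(noisy_leading_pc([4.0, 1.0, 0.5], 0.05)[0]) for _ in range(50)]

# Nearly equal leading eigenvalues: the leading PC direction wanders.
unstable = [abs(noisy_leading_pc([1.05, 1.0, 0.5], 0.05)[0]) for _ in range(50)]
```

When the gap between the first two eigenvalues is large relative to the noise, the leading eigenvector stays close to its noise-free direction; when the gap shrinks toward the noise level, the same perturbation rotates it substantially, which is the ill-definedness argued above.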

By considering the un-rotated principal components and the varimax rotation of the same data we have therefore evoked an interesting contrast in the way that movements can be generated through the action of muscles. We note, however, that this clear contrast between the two identification strategies was a fortuitous outcome of our experimental conditions. The varimax rotation that we used here does not explicitly seek to generate only positive loading factors for muscles; it just happened to do so for the movements studied here. Recent studies (Ivanenko et al., 2004; Tresch et al., 2006; d'Avella et al., 2008; Delis et al., 2014) have employed the technique of non-negative factorization to find such solutions explicitly. Similarly, PCA does not explicitly seek to group muscles into a co-contraction module plus reciprocal activation, but that also happened to be the outcome of the analysis of our data. Future studies could instead use a factorization algorithm that explicitly looks to organize components in this manner. For instance, a hierarchical factor analysis could be used, in which a "secondary" factor would identify the co-contraction unit while a rotation of the primary vectors to maximize "reciprocity" could provide an appropriate solution.

The comparison of the two factorial decompositions presented here and the discussion above should therefore serve as a cautionary tale for future studies. From the purely mathematical analysis presented here, one cannot claim with high confidence to have identified the neural structure of modules or synergies based only on the correlations between muscle activations. As we have shown, the grouping of muscles into purported synergies through component analysis of EMG depends strongly on the *a priori* choice of which type of factor analysis is performed and on the experimental conditions. Additional information is needed before one can state a clear preference for one decomposition over another. In our companion article, we endeavored to do just that, by using an artificial dynamic recurrent neural network to search for the relationship between EMG and movement. Nevertheless, the simple fact that 3 components can account for a large part of the variance in EMG signals, regardless of which rotation is used, is consistent with what one would predict if activation patterns are organized into synergies as a means of reducing the number of degrees of freedom in the mapping from desired movement to muscle activations.

### **IMPLICATIONS FOR NEURAL MECHANISMS**

One can see in these analyses that, whichever decomposition is considered (PCA or varimax), two muscles might be agonistic in one synergy and antagonistic in another. It would be difficult to understand how the same muscle, if activated as a whole, could participate properly in both synergies. If we refer to the preferred action directions of motor units of the deltoid muscle (Herrmann and Flanders, 1998), most motor units exhibit a cosine tuning function with a unique preferred direction. Thus, whereas a single muscle may be shared across muscle synergies, this sharing may be realized by incorporating the motor units of a given muscle into each synergy according to the preferred direction of each motor unit.
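The cosine tuning referred to here can be written as a simple function of the angle between the force direction and the unit's preferred direction; the baseline and gain values below are illustrative only, not fitted to any recorded motor unit:

```python
import math

def cosine_tuning(direction_deg, preferred_deg, baseline=0.5, gain=0.5):
    """Activation of a motor unit with a single preferred direction."""
    return baseline + gain * math.cos(math.radians(direction_deg - preferred_deg))

# A unit preferring 90 deg (upward) is maximally active for upward force,
# minimally active for downward force, and at baseline for orthogonal ones.
up = cosine_tuning(90, preferred_deg=90)
down = cosine_tuning(270, preferred_deg=90)
side = cosine_tuning(180, preferred_deg=90)
```

Under such tuning, assigning motor units to synergies by preferred direction lets one anatomical muscle contribute to differently oriented modules.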

Transitions from fast discrete movements to rhythmic movements (Sternad et al., 2013), as well as transitions from slow continuous movements to sub-movements (Teeken et al., 1996; van der Wel et al., 2009), suggest that the underlying controllers for discrete and rhythmic movements are based on the same modules. Our results provide further evidence that the muscular synergies underlying both types of movement are the same. Although at the cerebral level the control of discrete vs. rhythmic movements has been shown to implicate different cortical areas (Schaal et al., 2004), the fact that directional discrete movements and rhythmic figure eights present no differences in the first three principal components identified at the muscular level supports the hypothesis that discrete and rhythmic movements share the same neural control (Sternad et al., 2000, 2002; Sternad and Dean, 2003), at least at the level of synchronous muscular synergies.

The organization of those modules might, however, reflect higher levels of processing as well. In the case of our principal component decomposition, control would be shared between a co-contraction module and two reciprocal modules, the latter of which were surprisingly well aligned with the vertical and horizontal directions of movement (see also our companion paper for further evidence of this point). It is likely not a coincidence that the modules are oriented along these two canonical directions. One might hypothesize that the horizontal/vertical orientations of PC2 and PC3 are linked to the spatial characteristics of the figure eight, which was intrinsically oriented along the horizontal and vertical. But the PCA of our discrete movements was carried out separately from the computations on the figure eights, yet we found the same groupings in either case. Since the eight movement directions were equally spaced in the frontal plane, a bias in the directions of movement cannot explain this phenomenon. Gravity itself acting on the arm could provide an explanation, as one might argue that the up/down synergies could take advantage of gravity as a driving force, reducing the amplitude of EMG modulation needed to produce movement in the vertical direction. But our EMG signals were normalized muscle-by-muscle, removing this as a possible explanation as well. On the other hand, there is ample evidence that human perception and visuomotor coordination are preferentially tuned to the vertical dimension, tied to a multi-modal reference frame that includes the body axis and gravity (Howard, 1982; Paillard, 1991; Gentaz et al., 2001; McIntyre and Lipshits, 2008; Tagliabue and McIntyre, 2012). The organization of muscular synergies, which may be implemented at the level of the periphery, might nevertheless be tuned according to constraints defined in supraspinal areas involved in the processing of spatial information (Bizzi and Cheung, 2013).

# **CONCLUSIONS**

In this study we set out to compare discrete and discrete-rhythmic movements in terms of the synchronous muscular synergies that can be identified through principal component and varimax factor analysis. To this question we found a remarkably clear answer: the invariance of the synchronous synergies, be they identified by principal components or varimax factors. This result suggests that a common mechanism underlies both types of movement, at least in terms of the purported synergies that underlie the generation of muscle activation patterns. It is perhaps somewhat surprising that the CNS does not exploit the additional degrees of freedom for generating forces to tune the system differently for these two classes of movements.

The secondary question of whether the principal component or varimax decomposition better represents the modules used to produce a certain class of upper-limb movements remains open. The un-rotated principal components suggested an organization based on a co-contraction module plus two modules for reciprocal activation, one horizontal and the other vertical. The varimax decomposition indicated instead a set of three basis vectors used to construct forces in different directions. Based on the analyses presented here, we argue for the co-contraction plus reciprocal organization, because of the somewhat lower variability of the principal component decomposition and on conceptual grounds. Nevertheless, the arguments presented here are admittedly not conclusive. In our companion article we search for further evidence to support our hypothesis by using an artificial neural network to identify the functional significance, in terms of movement, of the modules identified here.

# **ACKNOWLEDGMENTS**

This work was funded by the Belgian Federal Science Policy Office, the European Space Agency (AO-2004, 118), the Belgian National Fund for Scientific Research (FNRS), the research funds of the Université Libre de Bruxelles and of the Université de Mons (Belgium), the FEDER support (BIOFACT), the MINDWALKER project (FP7, 2007–2013) supported by the European Commission and the research funds of the Universidad del Pais Vasco/Euskal Herriko Unibertsitatea (UPV/EHU).

### **REFERENCES**


**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 11 July 2014; accepted: 07 December 2014; published online: 09 January 2015*.

*Citation: Bengoetxea A, Leurs F, Hoellinger T, Cebolla AM, Dan B, Cheron G and McIntyre J (2015) Physiological modules for generating discrete and rhythmic movements: component analysis of EMG signals. Front. Comput. Neurosci. 8:169. doi: 10.3389/fncom.2014.00169*

*This article was submitted to the journal Frontiers in Computational Neuroscience*.

*Copyright © 2015 Bengoetxea, Leurs, Hoellinger, Cebolla, Dan, Cheron and McIntyre. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution and reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms*.

# Effective force control by muscle synergies

# *Denise J. Berger and Andrea d'Avella\**

*Laboratory of Neuromotor Physiology, Santa Lucia Foundation, Rome, Italy*

### *Edited by:*

*Tamar Flash, Weizmann Institute, Israel*

### *Reviewed by:*

*Florentin Wörgötter, University Goettingen, Germany Sandro Mussa-Ivaldi, Northwestern University, USA*

### *\*Correspondence:*

*Andrea d'Avella, Laboratory of Neuromotor Physiology, Santa Lucia Foundation, Via Ardeatina 306, 00179 Rome, Italy e-mail: a.davella@hsantalucia.it*

Muscle synergies have been proposed as a way for the central nervous system (CNS) to simplify the generation of motor commands and they have been shown to explain a large fraction of the variation in the muscle patterns across a variety of conditions. However, whether human subjects are able to control forces and movements effectively with a small set of synergies has not been tested directly. Here we show that muscle synergies can be used to generate target forces in multiple directions with the same accuracy achieved using individual muscles. We recorded electromyographic (EMG) activity from 13 arm muscles and isometric hand forces during a force reaching task in a virtual environment. From these data we estimated the force associated with each muscle by linear regression and we identified muscle synergies by non-negative matrix factorization. We compared trajectories of a virtual mass displaced by the force estimated using the entire set of recorded EMGs to trajectories obtained using 4–5 muscle synergies. While the trajectories were similar, when feedback was provided according to the force estimated from the recorded EMGs (EMG-control), the trajectories generated with the synergies were on average less accurate. However, when feedback was provided according to the recorded force (force-control), we did not find significant differences in initial angle error and endpoint error. We then tested whether synergies could be used as effectively as individual muscles to control cursor movement in the force reaching task by providing feedback according to the force estimated from the projection of the recorded EMGs into synergy space (synergy-control). Human subjects were able to perform the task immediately after switching from force-control to EMG-control and synergy-control, and we found no differences in initial movement direction errors and endpoint errors across the three control modes. These results indicate that muscle synergies provide an effective strategy for motor coordination.

### **Keywords: non-negative matrix factorization, isometric force, reaching movements, myoelectric control, modularity, electromyography**

# **INTRODUCTION**

How the CNS coordinates a large number of muscles to control forces and movements is a long standing issue in neuroscience. Muscle synergies, coordinated recruitment of groups of muscles with specific activation balances or temporal profiles, have been proposed as building blocks employed by the CNS to simplify the generation of forces or movements (Jacobs and Macpherson, 1996; Tresch et al., 1999; Bizzi et al., 2002; d'Avella et al., 2003; Flash and Hochner, 2005; Giszter et al., 2007; Ting and McKay, 2007; Bizzi et al., 2008; d'Avella and Pai, 2010; Lacquaniti et al., 2012; Bizzi and Cheung, 2013; d'Avella and Lacquaniti, 2013). A small number of muscle synergies, identified by multidimensional factorization techniques such as non-negative matrix factorization (NMF) (Lee and Seung, 1999), independent component analysis (ICA) (Bell and Sejnowski, 1995), and other iterative algorithms (d'Avella and Tresch, 2002; Tresch et al., 2006; Omlor and Giese, 2011), have been shown to explain a large fraction of the variation in the muscle patterns in a variety of vertebrate species (Tresch et al., 1999; Saltiel et al., 2001; d'Avella et al., 2003; Hart and Giszter, 2004; Ivanenko et al., 2004; Cheung et al., 2005; Ting and Macpherson, 2005), across different behaviors and experimental conditions (d'Avella and Bizzi, 2005; Cappellini et al., 2006; d'Avella et al., 2006, 2008, 2011; Ivanenko et al., 2007; Torres-Oviedo and Ting, 2007; Overduin et al., 2008; Torres-Oviedo and Ting, 2010; Dominici et al., 2011; Hug et al., 2011; Chvatal and Ting, 2012; Frere and Hug, 2012; Roh et al., 2012; Chvatal and Ting, 2013; d'Andola et al., 2013; Gentner et al., 2013). These observations provide support to the existence of muscle synergies as neural control strategy employed by the CNS for motor coordination. 
However, they do not directly demonstrate that a small number of synergies is sufficient to generate the functional output of the muscle patterns, i.e., the forces or movements necessary for accomplishing a task (Alessandro et al., 2013). Thus, in order to validate muscle synergies as a neural control strategy, their functional consequences need to be investigated.
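As a sketch of how NMF extracts synergies from rectified EMG, the following applies Lee and Seung's multiplicative updates to a synthetic dataset built from four known non-negative synergies. All sizes, the noise-free data, and the iteration count are arbitrary illustrative choices, not taken from any of the cited studies:

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic rectified-EMG data built from 4 known non-negative synergies:
# 600 time samples x 13 muscles, mirroring the muscle count used here.
C_true = rng.random((600, 4))          # synergy activations over time
W_true = rng.random((4, 13))           # one non-negative synergy per row
V = C_true @ W_true

# Lee-Seung multiplicative updates for V ~ C @ W with C, W >= 0.
C = rng.random((600, 4))
W = rng.random((4, 13))
for _ in range(300):
    W *= (C.T @ V) / (C.T @ C @ W + 1e-12)
    C *= (V @ W.T) / (C @ W @ W.T + 1e-12)

r2 = 1 - ((V - C @ W)**2).sum() / ((V - V.mean())**2).sum()
```

The multiplicative form guarantees that activations and synergy weights stay non-negative at every iteration, which is what makes NMF a natural match for muscle activity that can only be positive.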

Torres-Oviedo et al. (2006) first investigated the functional consequences of muscle synergies by extracting synergies simultaneously from EMGs and foot contact forces during postural responses to multidirectional stance perturbations in cats. Such functional muscle synergies were able to explain both muscle activation patterns and endpoint forces in a range of postural configurations, thereby supporting a functional role for the synergies. A common set of functional muscle synergies was then also found to explain different types of postural responses in human subjects (Chvatal et al., 2011). Furthermore, forward dynamics simulations using a musculoskeletal model of the human trunk, pelvis, and legs have shown that a small number of muscle synergies are sufficient to perform the basic sub-tasks of walking in two (Neptune et al., 2009) and three (Allen and Neptune, 2012) dimensions. More recently, de Rugy et al. (2013) addressed the question of the functional consequences of muscle synergies in the context of isometric force generation at the wrist. They estimated forces as a linear function of EMGs recorded from five wrist muscles, and subjects used these estimated forces to perform a force reaching task. de Rugy and collaborators then compared the forces estimated using the EMGs with the forces estimated using synergies extracted from the EMGs. Four synergies explained most of the variation in the muscle patterns but were not able to accurately reproduce the forces estimated using the recorded EMGs. However, estimated and real forces were not compared, and it is not clear whether the apparent inaccuracy of the forces estimated using the synergies is specific to the wrist system.

Here, we extended the analysis of de Rugy et al. (2013) to a more complex and redundant system. We recorded muscle activity from 13 arm and shoulder muscles in humans performing a force reaching task in which a cursor in a virtual environment was displaced according to either the recorded isometric force (force-control) or the force estimated from the recorded EMGs (EMG-control). We compared the cursor trajectories executed during EMG-control with the trajectories reconstructed using synergies, as in de Rugy et al. (2013). However, we also performed two additional comparisons. First, we investigated the reconstruction of trajectories executed in force-control using synergies and individual muscles. Second, to explicitly validate the synergy hypothesis as a possible control principle, we directly tested whether subjects were able to control movements with synergies. We let subjects perform the force reaching task in force-control, in EMG-control, and in synergy-control, i.e., by projecting online each sample of the recorded muscle patterns in the synergy space, and we compared their performances across the three conditions.

We found that, across subjects, 4–5 synergies could adequately capture the variation in the EMG data but were not sufficient to reconstruct the trajectories executed in EMG-control with the same endpoint accuracy, thus extending to the arm the results obtained for the wrist by de Rugy et al. (2013). However, when we compared the reconstructions of trajectories executed in force-control using individual muscles and synergies we did not find any significant difference in several performance measures. These results demonstrate that individual muscles and synergies perform equally well in predicting the forces applied by human subjects. Finally, we found that humans were not only able to perform the task immediately after switching from force-control to EMG-control and synergy-control, but also showed no differences in performance between the three conditions. These results demonstrate that human subjects can achieve similar performance in an isometric reaching task using a small number of synergies and using individual muscles.

# **MATERIALS AND METHODS**

We asked naïve participants to reach targets on a virtual desktop by displacing a cursor (i.e., a virtual spherical handle) according to: (1) the force applied on a physical handle (force-control); (2) the force estimated from the EMG activity recorded from many shoulder and arm muscles (myoelectric or EMG-control); (3) the force estimated from the combination of the recorded EMG signals through a set of muscle synergies (synergy-control). Initially the reaching task was performed under force-control and, for each individual participant, the force and EMG data collected were used to estimate an EMG-to-force matrix by multiple linear regressions. EMG data collected during force-control or during EMG-control were also used to identify muscle synergies by non-negative matrix factorization. Such synergies were then used to reconstruct cursor trajectories executed in force- and EMG-control and to execute trajectories in synergy-control.
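The EMG-to-force estimation step described above can be sketched as an ordinary least-squares problem. The calibration data below are synthetic stand-ins for the recorded EMGs and planar forces, and the "true" mapping is known only because we generate it:

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical calibration data: rectified EMGs (samples x 13 muscles)
# and planar forces (samples x 2), assumed to be linearly related.
B_true = rng.normal(size=(13, 2))      # "true" EMG-to-force matrix
emg = rng.random((1000, 13))
force = emg @ B_true + 0.01 * rng.normal(size=(1000, 2))

# Least-squares estimate of the EMG-to-force matrix
# (one multiple linear regression per force component).
B, *_ = np.linalg.lstsq(emg, force, rcond=None)

# Online use: map one new EMG sample to an estimated planar force.
f_hat = rng.random(13) @ B
```

Once the matrix is estimated from the force-control block, each incoming EMG sample can be mapped to a force vector in real time, which is the basis of the EMG-control and synergy-control modes.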

# **PARTICIPANTS**

Fourteen right-handed naïve subjects (mean age 26.0 years, *SD* 3.5, age range 20–34, 9 females) participated in the experiments after giving written informed consent. All procedures were conducted in conformance with the Declaration of Helsinki and were approved by the Ethical Review Board of the Santa Lucia Foundation.

# **EXPERIMENTAL SET-UP**

Subjects sat in front of a desktop on a racing car seat with their torso immobilized by safety belts and their right forearm inserted in a splint immobilizing the hand, wrist, and forearm. The center of the palm was aligned with the body midline at the height of the sternum and the elbow was flexed by approximately 90°. The subjects' view of their hand was occluded either by a 21-inch LCD monitor inclined with its surface approximately perpendicular to the subjects' line of sight when looking at their hand (**Figure 1A**) during Experiment 1 (see Experimental Protocols below) or by a mirror displaying the virtual scene co-located with the real desktop positioned above the mirror (**Figure 1B**) during Experiment 2. After calibration, the monitor could display a virtual desktop matching the real desktop, a spherical cursor matching, at rest, the position of the center of the palm and moving on a horizontal plane, and spherical targets on the same plane (**Figure 1C**). A steel bar at the base of the splint was attached to a 6-axis force transducer (Delta F/T Sensor, ATI Industrial Automation, Apex, NC, USA) positioned below the desktop to record isometric forces and torques. Surface electromyographic (EMG) activity was recorded from the following 13 muscles acting on the shoulder and elbow: brachioradialis (BracRad), biceps brachii short head (BicShort), biceps brachii long head (BicLong), triceps brachii lateral head (TriLat), triceps brachii long head (TriLong), infraspinatus (InfraSp), anterior deltoid (DeltA), middle deltoid (DeltM), posterior deltoid (DeltP), pectoralis major (PectMaj), teres major (TerMaj), latissimus dorsi (LatDorsi), and middle trapezius (TrapMid). EMG activity was recorded with active bipolar electrodes (DE 2.1, Delsys Inc., Boston, MA), band-pass filtered (20–450 Hz) and amplified (gain 1000, Bagnoli-16, Delsys Inc.). Force and EMG data were digitized at 1 kHz using an A/D PCI board (PCI-6229, National Instruments, Austin, TX, USA).
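The EMG conditioning just described (a 20–450 Hz band-pass at a 1 kHz sampling rate, followed by rectification) can be sketched with SciPy. The hardware filtering here was analog; the digital filter order, the zero-phase filtering, and the synthetic signal are illustrative choices:

```python
import numpy as np
from scipy.signal import butter, filtfilt

fs = 1000.0                                  # sampling rate, Hz (1 kHz A/D)
nyq = fs / 2

# 4th-order Butterworth band-pass matching the 20-450 Hz analysis band.
b, a = butter(4, [20 / nyq, 450 / nyq], btype='bandpass')

rng = np.random.default_rng(6)
raw = rng.normal(size=2000)                  # hypothetical raw EMG trace
filtered = filtfilt(b, a, raw)               # zero-phase band-pass
rectified = np.abs(filtered)                 # full-wave rectification
```

Rectified EMG of this kind is the input both to the EMG-to-force regression and to the synergy extraction described below.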
The virtual scene was rendered by a PC workstation with a refresh rate of 60 Hz using custom software. In Experiment 2 the scene was rendered stereoscopically using a 3D graphic card (Quadro Fx 3800, NVIDIA Corporation, Santa Clara, CA, USA) and shutter glasses (3D Vision P854, NVIDIA). Cursor position information was processed by a second

**FIGURE 1 | Experimental setup and trial sequence.** Subjects sat in front of a desktop and applied forces on a transducer attached to a forearm, wrist, and hand splint. In Experiment 1 a LCD monitor **(A)** and in Experiment 2 a mirror **(B)** occluded the subject's hand and displayed a virtual scene co-located with the real desktop. **(C)** Transparent spheres positioned on a horizontal plane with centers at the same height as the center of the palm indicated force targets that the subjects were instructed to reach with a smaller spherical cursor moving on the same plane according to the force applied (force-control) or estimated from EMGs recorded from 13 arm and shoulder muscles (EMG-control, see Materials and Methods), or estimated from recorded EMGs recombined using a set of synergies extracted using NMF (synergy-control, see Materials and Methods). **(D)** Subjects were instructed to perform a center-out reaching task in which they had to maintain the cursor in a central start location for 1 s, reach a target as soon as it appeared at one of 8 peripheral locations, and maintain the cursor at the target for 0.2 s. **(A)** and **(C)** adapted from Berger et al. (2013); **(B)** adapted from Borzelli et al. (2013).

workstation running a real-time operating system and transmitted to the first workstation. Cursor motion was simulated in real time using an adaptive mass-spring-damper (MSD) filter (Park and Meek, 1995). Either the actual force recorded by the transducer (force-control), or the force estimated in real-time from the recorded and rectified EMGs (myoelectric or EMG-control) using a linear mapping (EMG-to-force matrix, *see below*), or the force estimated in real-time from synergies using the EMG-to-force matrix (synergy-control, *see below*), was applied to a virtual mass attached to a reference position through a critically damped spring. The position of the cursor corresponded to the position of the virtual mass. The reference position matched the position of the center of the palm. To maintain fast response to changes in force while reducing the effect of myoelectric noise, the simulated mass was adapted dynamically according to the time derivative of the applied force magnitude. Further details can be found in Berger et al. (2013).
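The adaptive filter described above can be sketched as follows; the adaptation law and all parameter values (spring constant `k`, mass bounds, `gain`) are illustrative assumptions, not the published implementation of Park and Meek (1995):

```python
import numpy as np

def simulate_cursor(forces, dt=1/60, k=1.0, m_min=0.05, m_max=1.0, gain=0.5):
    """Simulate cursor motion with an adaptive mass-spring-damper filter.

    forces: (T, 2) array of planar force samples at the display rate.
    The virtual mass shrinks when the force magnitude changes quickly
    (fast response) and grows when force is steady (noise suppression);
    the spring is kept critically damped at every step."""
    x = np.zeros(2)        # cursor position (virtual mass)
    v = np.zeros(2)        # cursor velocity
    x_ref = np.zeros(2)    # reference (rest) position of the spring
    prev_mag = 0.0
    traj = []
    for f in forces:
        mag = np.linalg.norm(f)
        df = abs(mag - prev_mag) / dt          # rate of change of |force|
        prev_mag = mag
        # fast force changes -> small mass; steady force -> large mass
        m = np.clip(m_max - gain * df, m_min, m_max)
        b = 2.0 * np.sqrt(k * m)               # critical damping
        a = (f - k * (x - x_ref) - b * v) / m  # Newton's second law
        v = v + a * dt
        x = x + v * dt
        traj.append(x.copy())
    return np.array(traj)
```

With a constant applied force the cursor settles at the spring equilibrium `x_ref + f/k`, which is the intended steady-state behavior of the filter.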

# **EXPERIMENTAL PROTOCOLS**

In all experiments subjects initially performed two blocks of trials in force-control. In the first force-control block, the mean maximum voluntary force (MVF) along 8 directions (separated by 45 deg) in the horizontal plane was estimated as the mean of the maximum force magnitude recorded across 16 trials in which subjects were instructed to generate maximum force in each direction. Subjects were then instructed to move the cursor quickly from the rest position to a target in one of the 8 directions by applying forces on the splint. At the beginning of each trial (**Figure 1D**) subjects were requested to maintain the cursor within a transparent sphere at the central start position for 1 s (tolerance of 2% MVF). Next, a *go* signal was given by displaying a transparent target sphere while the start sphere disappeared. Subjects were instructed to reach the target as quickly as possible and to remain there for 0.2 s (tolerance of 2% MVF). After successful target acquisition the cursor and the target disappeared indicating the end of the trial. Trials had to be completed within 2 s from the *go* signal. In the second force-control block subjects performed 72 trials to targets positioned at force magnitudes corresponding to 10, 20, and 30% of MVF (random order within cycles of 8 directions). After this block there was a 5 min pause to process the recorded data in order to construct the myoelectric controller. All subsequent blocks consisted of 24 trials with targets at 20% MVF in random order within cycles of 8 directions.

# *Experimental protocol 1*

Eight participants (numbered from 1 to 8) performed this protocol. Data collected in force-control and EMG-control mode in our previous study (Berger et al., 2013) were used in this study to reconstruct cursor trajectories with synergies. After two initial blocks of trials in force-control the rest of the experiment was performed in EMG-control. For the purpose of the present analysis we only considered the force-control block and the second EMG-control block. Further details of the experimental protocol are described elsewhere (Berger et al., 2013).

# *Experimental protocol 2*

Six participants (numbered from 9 to 14) performed this protocol. After the initial two blocks of trials performed in force-control, the system switched to synergy-control, using the subject-specific synergies extracted from the initial force-control block (synergy-control, *see below*). After 6 blocks of synergy-control, three blocks of force-control were introduced; this was followed by 6 blocks of EMG-control. At the end of the experiment a final block in force-control was performed.

# **EMG-TO-FORCE MAPPING**

If the arm is in a fixed posture, the force generated at the hand is approximately a linear function of the activation of muscles acting on shoulder and elbow:

$$\mathbf{f} = \mathbf{H}\mathbf{m} + \mathbf{e}\_{\mathbf{f}} \tag{1}$$

where **f** is the generated 2-dimensional force vector, **m** is the 13-dimensional vector of muscle activations, **H** is a 2 × 13 matrix relating muscle activation to force, and **e**<sub>f</sub> is a 2-dimensional vector of force residuals. The EMG-to-force matrix (**H**) was estimated using multiple linear regressions of each applied force component, low-pass filtered (2nd order Butterworth, 1 Hz cutoff), with EMG signals recorded during the initial force-control block (dynamic phase, i.e., from the *go* signal until the target was reached), low-pass filtered (2nd order Butterworth, 5 Hz cutoff), and normalized to the maximum EMG activity during the generation of MVF. We verified that the choice of filter parameters for the estimation of the **H** matrix did not affect the quality of the force reconstruction during EMG-control by investigating different force and EMG cutoff frequencies. **Figure 2A** illustrates an example of the columns of the EMG-to-force matrix (i.e., the force associated with each muscle, **h**<sub>i</sub>) estimated in subject 2.
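The regression step of Equation 1 can be sketched as follows; array shapes, variable names, and the use of `numpy.linalg.lstsq` are assumptions, while the filter orders and cutoffs follow the Methods:

```python
import numpy as np
from scipy.signal import butter, filtfilt

def estimate_emg_to_force(emg, force, fs=1000.0):
    """Estimate the 2 x 13 EMG-to-force matrix H (Equation 1) by least squares.

    emg   -- (T, 13) rectified EMGs, normalized to MVF activity (assumed shape)
    force -- (T, 2) planar force samples (assumed shape)
    Filter cutoffs (1 Hz force, 5 Hz EMG) follow the Methods."""
    bf, af = butter(2, 1.0 / (fs / 2))   # 2nd-order Butterworth, 1 Hz cutoff
    be, ae = butter(2, 5.0 / (fs / 2))   # 2nd-order Butterworth, 5 Hz cutoff
    f_filt = filtfilt(bf, af, force, axis=0)
    m_filt = filtfilt(be, ae, emg, axis=0)
    # Solve f = H m in the least-squares sense over all time samples
    H_T, *_ = np.linalg.lstsq(m_filt, f_filt, rcond=None)
    return H_T.T                          # shape (2, 13)
```

On synthetic data generated with a known linear mapping and slowly varying activations, the estimate recovers the true matrix closely, since both filters are nearly transparent to sub-hertz signals.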

# **SYNERGY EXTRACTION**

Muscle synergies were identified by NMF (Lee and Seung, 1999) from EMG patterns recorded from the *go* signal to target acquisition (dynamic phase):

$$\mathbf{m} = \mathbf{Wc} + \mathbf{e}\_{\mathbf{m}} \tag{2}$$

where **W** is an *M* × *N* synergy matrix whose columns are vectors specifying relative muscle activation levels (*N*, number of synergies; *M*, number of muscles), **c** is an *N*-dimensional synergy activation vector, and **e**<sub>m</sub> is an *M*-dimensional vector of muscle activation residuals. For the comparison between trajectories executed in EMG-control and their reconstruction using synergies, EMG patterns recorded during EMG-control were used for synergy extraction. For the comparison of trajectories from data collected during force-control, EMG patterns recorded during force-control were used for synergy extraction. EMG patterns were first low-pass filtered (2nd order Butterworth filter, 5 Hz cutoff frequency)

and rectified, their baseline noise level was then subtracted, and finally they were normalized to the maximum EMG activity of each muscle recorded during the generation of MVF. Baseline noise was estimated at the beginning of the experiment and updated periodically throughout the experiment while the subject was relaxed. For each possible *N* from 1 to *M*, the extraction algorithm was repeated 10 times and the repetition with the highest reconstruction R<sup>2</sup> was retained. **Figure 2B** illustrates an example of the set of 4 synergies extracted in subject 2.
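A minimal sketch of the extraction procedure, assuming scikit-learn's `NMF` as the factorization routine (the paper specifies the multiplicative-update algorithm of Lee and Seung, 1999, not a particular library):

```python
import numpy as np
from sklearn.decomposition import NMF

def extract_synergies(emg, n_syn, n_reps=10, seed=0):
    """Extract muscle synergies (Equation 2) by non-negative matrix
    factorization, repeating the fit and keeping the run with the highest
    reconstruction R^2, as in the Methods.

    emg: (T, M) non-negative, normalized EMG envelopes (assumed layout:
    time samples in rows, muscles in columns)."""
    sst = np.sum((emg - emg.mean(axis=0)) ** 2)
    best = None
    for rep in range(n_reps):
        model = NMF(n_components=n_syn, init='random',
                    random_state=seed + rep, max_iter=1000)
        c = model.fit_transform(emg)     # (T, N) synergy activations
        w = model.components_.T          # (M, N) synergy vectors
        sse = np.sum((emg - c @ w.T) ** 2)
        r2 = 1.0 - sse / sst
        if best is None or r2 > best[2]:
            best = (w, c, r2)
    return best                          # (W, C, R^2) of the best repetition
```

Keeping the best of several random restarts guards against the local minima that NMF, a non-convex factorization, can converge to.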

# **NUMBER OF SYNERGIES**

For each subject the number of synergies adequately capturing the EMG data (*N*) was selected according to the fraction of data variation explained, defined as R<sup>2</sup> EMG = 1 − SSE<sub>EMG</sub>/SST<sub>EMG</sub>, where SSE<sub>EMG</sub> is the sum of the squared muscle activation residuals and SST<sub>EMG</sub> is the sum of the squared residuals of the muscle activation from its mean vector. We considered two criteria. The first criterion was a threshold of 0.9 on R<sup>2</sup> EMG. The second criterion was based on the detection of a change in slope in the curve of the R<sup>2</sup> value as a function of *N*. A series of linear regressions were performed on the portions of the curve included between *N* and its last point (*M*). *N* was then selected as the minimum value for which the mean squared error of the linear regression was less than 10<sup>−4</sup>. In case of mismatch between the two criteria, the larger *N* was chosen.
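The two selection criteria can be sketched as follows; function and variable names are illustrative:

```python
import numpy as np

def select_n_synergies(r2_curve, r2_threshold=0.9, mse_threshold=1e-4):
    """Choose the number of synergies from the R^2-vs-N curve using the two
    criteria in the Methods: (1) the smallest N with R^2 >= 0.9; (2) the
    smallest N such that a straight line fitted from N to the last point has
    mean squared error < 1e-4 (slope-change detection). The larger N wins.

    r2_curve: R^2 values for N = 1 .. M (index 0 corresponds to N = 1)."""
    M = len(r2_curve)
    n_thresh = next((n for n in range(1, M + 1)
                     if r2_curve[n - 1] >= r2_threshold), M)
    n_slope = M
    for n in range(1, M + 1):
        xs = np.arange(n, M + 1)
        ys = np.asarray(r2_curve[n - 1:])
        if len(xs) < 2:                    # a single point fits a line exactly
            n_slope = n
            break
        coef = np.polyfit(xs, ys, 1)       # linear fit of the curve tail
        mse = np.mean((np.polyval(coef, xs) - ys) ** 2)
        if mse < mse_threshold:
            n_slope = n
            break
    return max(n_thresh, n_slope)
```

For a curve that saturates into a straight line after the "elbow", both criteria pick the elbow; when they disagree, the larger value is returned as specified in the Methods.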

# **EMG- AND SYNERGY-CONTROL**

Output forces **f** during EMG-control were computed using the EMG-to-force mapping (**H**) and the recorded muscle activity **m** (see EMG-to-Force Mapping above), i.e., by

$$\mathbf{f} = \mathbf{H} \,\,\mathbf{m} \tag{3}$$

thus allowing for individual muscle control. During synergy-control muscle activity was substituted by the product of the initially extracted subject-specific synergies (**W**) and estimated synergy coefficients (**ĉ**), i.e., by **f** = **HWĉ**, where **HW** is the synergy-to-force mapping (illustrated in **Figure 2C** for subject 2). Synergy coefficients were estimated by projecting recorded muscle activity onto the synergy space, i.e., by **ĉ** = **W**<sup>+</sup>**m**, where **W**<sup>+</sup> is the pseudo-inverse of **W**, corresponding to estimating **ĉ** from **m** as the least-squares solution of **m** = **Wc**. Thus, during synergy-control output forces were computed as:

$$\mathbf{f} = \mathbf{H} \,\mathbf{W} \,\mathbf{W}^+ \mathbf{m} \tag{4}$$
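Equations 3 and 4 reduce to a few lines of linear algebra; this sketch uses assumed shapes (2 × M for **H**, M × N for **W**):

```python
import numpy as np

def synergy_control_force(H, W, m):
    """Compute the output force under synergy-control (Equation 4):
    project the recorded muscle vector m onto the synergy space with the
    pseudo-inverse of W, then map through the synergy-to-force matrix H W.

    H: (2, M) EMG-to-force matrix; W: (M, N) synergies; m: (M,) EMGs."""
    c_hat = np.linalg.pinv(W) @ m     # least-squares synergy coefficients
    return H @ W @ c_hat              # f = H W W^+ m
```

When **m** already lies in the span of the synergies, **WW**<sup>+</sup>**m** = **m** and synergy-control produces exactly the EMG-control force **Hm**; any component of **m** outside the synergy space is discarded.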

# **TRAJECTORY RECONSTRUCTION USING EMGS AND SYNERGIES**

Cursor trajectories executed in EMG-control, i.e., computed online as the displacement of a virtual mass-spring-damper system under the force estimated by Equation 3, were reconstructed using synergies, i.e., by computing offline how the cursor would have been displaced using the forces computed by Equation 4. Similarly, cursor trajectories executed in force-control, i.e., computed online as the displacement of a virtual mass-spring-damper system under the recorded force, were reconstructed using EMGs, i.e., using the forces computed by Equation 3, and synergies, i.e., using the forces computed by Equation 4.


# **PERFORMANCE MEASURES**

We compared cursor trajectories driven by EMGs, i.e., either executed in EMG-control (EC) or reconstructed by EMGs (ER) during force-control, with trajectories reconstructed by synergies (SR) during either EMG- or force-control, by assessing the fraction of EMG-driven trajectory variation explained, R<sup>2</sup> traj = 1 − SSE<sub>traj</sub>/SST<sub>traj</sub>, where SSE<sub>traj</sub> is the sum of the squared trajectory residuals and SST<sub>traj</sub> is the sum of the squared residuals of the EMG-driven trajectory from its mean vector. We also compared cursor trajectories computed from recorded forces with trajectories reconstructed by EMGs (ER) or synergies (SR) during force-control by assessing the fraction of force-driven trajectory variation explained with a similarly defined R<sup>2</sup> traj-force measure. Both measures quantified the similarity of the entire time course of two sets of trajectories. We then also compared performances at the beginning of the movement, quantifying an initial angle error, and at the end of each movement, quantifying an endpoint error. Initial angle error was defined as the angular deviation of the initial movement direction of the cursor with respect to target direction. The angular deviation was computed as |ϑ<sub>target</sub> − ϑ<sub>cursor</sub>|, where ϑ<sub>target</sub> is the target direction and ϑ<sub>cursor</sub> is the direction of the displacement between the position of the cursor at movement onset and at the first following peak of its tangential velocity. Taking the absolute value avoided cancellations, when averaging across targets, between angular deviations of opposite sign. Endpoint error was defined as the Euclidean distance, normalized to target distance from the origin, between the target position and the mean cursor position during the 0.2 s following the cursor's entrance into the target region. 
Finally, in Experiment 2, we compared the fraction of unsuccessful trials, i.e., the fraction of trials in which the cursor did not reach and remain within the target within the instructed time intervals, during task execution in EMG-control and in synergy-control.
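The three performance measures can be sketched as follows; the peak-velocity detection is a simplified stand-in for the paper's movement-onset and peak-detection procedure:

```python
import numpy as np

def r2_traj(ref, rec):
    """Fraction of reference-trajectory variation explained by a
    reconstruction; ref, rec: (T, 2) cursor trajectories."""
    sse = np.sum((ref - rec) ** 2)
    sst = np.sum((ref - ref.mean(axis=0)) ** 2)
    return 1.0 - sse / sst

def initial_angle_error(traj, target, onset=0, peak_idx=None):
    """Absolute angular deviation (degrees) between the target direction and
    the cursor displacement from movement onset to the first peak of its
    tangential velocity (approximated here by the largest speed sample)."""
    if peak_idx is None:
        speed = np.linalg.norm(np.diff(traj, axis=0), axis=1)
        peak_idx = int(np.argmax(speed)) + 1
    d = traj[peak_idx] - traj[onset]
    ang = np.arctan2(target[1], target[0]) - np.arctan2(d[1], d[0])
    # wrap to (-pi, pi] before taking the absolute value
    return np.degrees(abs((ang + np.pi) % (2 * np.pi) - np.pi))

def endpoint_error(traj_in_target, target):
    """Euclidean distance between the target and the mean cursor position
    over the 0.2 s hold window, normalized to target distance from origin."""
    return (np.linalg.norm(traj_in_target.mean(axis=0) - np.asarray(target))
            / np.linalg.norm(target))
```

The angle wrapping keeps deviations in (−180◦, 180◦] before the absolute value is taken, matching the averaging rationale described above.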

# **STATISTICAL ANALYSIS**

Differences in performance measures were assessed either by *t*-test statistics (paired, two-tailed) if the data were distributed normally (according to a Lilliefors test) or by a Wilcoxon rank-sum test otherwise.
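A sketch of the test-selection logic; scipy provides no Lilliefors test, so Shapiro-Wilk stands in here as an explicitly assumed substitute for the normality check:

```python
import numpy as np
from scipy import stats

def compare_conditions(a, b, alpha=0.05):
    """Paired comparison of a performance measure between two conditions,
    following the Methods: a paired two-tailed t-test if both samples pass
    a normality test, a Wilcoxon rank-sum test otherwise.

    Note: the paper uses a Lilliefors normality test; Shapiro-Wilk is used
    here as a stand-in. Returns (test_name, p_value)."""
    a, b = np.asarray(a), np.asarray(b)
    normal = (stats.shapiro(a)[1] > alpha and
              stats.shapiro(b)[1] > alpha)
    if normal:
        return 't-test', stats.ttest_rel(a, b).pvalue   # paired, two-tailed
    return 'rank-sum', stats.ranksums(a, b).pvalue      # as in the Methods
```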

# **RESULTS**

# **SYNERGY RECONSTRUCTION OF CURSOR TRAJECTORIES DURING EMG-CONTROL**

To address the question of whether a small set of muscle synergies not only explains a large fraction of the variation of the muscle activity but can also generate the forces necessary to perform an isometric reaching task accurately, we compared the trajectories performed by human subjects during EMG-control (EC) with the trajectories reconstructed using the subject-specific synergies (synergy-reconstructed, SR). **Figures 3A,B** illustrate examples of EC (*green*) and SR (*red*) trajectories for 8 different targets in one subject. **Figure 3C** shows the corresponding filtered EMG traces (*gray area*) and their synergy reconstruction (*red line*) using four synergies (**Figure 2B**). These synergies adequately captured the muscle patterns across directions and muscles. For each direction a different combination of synergy coefficients is used (**Figure 3D**), e.g., in direction 0◦ (1st column) synergies 2 and 4 are recruited, whereas in direction 225◦ (6th column) synergies 1 and 3 are recruited. The directional tuning of all four synergies is well captured by cosine functions (**Figure 3E**, see also Borzelli et al., 2013; Gentner et al., 2013). The SR trajectories show a high similarity to the EC trajectories in each of the eight target directions. To quantify similarity, we first computed the fraction of EC trajectory variation (R<sup>2</sup> traj) explained by trajectories reconstructed using, for each subject, a number of synergies adequately capturing the EMG data variation (R<sup>2</sup> EMG, see Methods), which varied across subjects between 4 and 5 (see **Table 1**). On average across subjects (*n* = 8) we found that SR trajectories accurately reconstructed EC trajectories (mean R<sup>2</sup> traj value 0.96 ± 0.03 *SD*, range: 0.89–0.98, **Table 1**). We then assessed the mean similarity of EC and SR trajectories as a function of the number of synergies (*N*) used for the reconstruction (**Figure 4A**, averages across subjects, *n* = 8). 
Five synergies were sufficient to reconstruct the EC trajectories with an average R<sup>2</sup> traj value larger than 0.9 (mean R<sup>2</sup> traj values were 0.89 and 0.97, for *N* = 4 and 5, respectively).
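The cosine fit of synergy directional tuning can be reproduced by ordinary linear regression on cos θ and sin θ; this is a standard cosine-tuning fit, not necessarily the paper's exact fitting code:

```python
import numpy as np

def fit_cosine_tuning(directions, coefs):
    """Fit c(theta) = b0 + b1 * cos(theta - theta_pref) by linear regression
    on cos(theta) and sin(theta), which linearizes the preferred-direction
    parameter. Returns (baseline b0, modulation depth b1, theta_pref)."""
    X = np.column_stack([np.ones_like(directions),
                         np.cos(directions), np.sin(directions)])
    b, *_ = np.linalg.lstsq(X, coefs, rcond=None)
    b0, bc, bs = b
    # bc*cos(t) + bs*sin(t) = b1*cos(t - theta_pref)
    return b0, np.hypot(bc, bs), np.arctan2(bs, bc)
```

With the 8 equally spaced target directions of the task, the cosine and sine regressors are orthogonal, so the fit is well conditioned.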

To assess the performance of the synergy reconstruction at the beginning of the movement, we then compared the initial angle error of EC and SR trajectories (**Table 1** and **Figure 4B**). Using for each subject a number of synergies adequately capturing the EMG data variation (see **Table 1**), we did not find any significant differences between the mean initial angle error of EC and SR trajectories (**Table 1**, *p* = 0.087, *t*-test, *n* = 8). **Figure 4B** (*red*) shows the mean initial angular error of SR trajectories as a function of the number of synergies (*N*), averaged across subjects (*n* = 8), in comparison with the mean error of EC trajectories (*green*). While the trajectories reconstructed with 4 synergies had a significantly larger angle error than EC trajectories (*p* = 0.027, *N* = 4, *n* = 8, *t*-test, with mean values ± *SD* of 15.1 ± 5.5◦, with respect to 8.9 ± 2.5◦ in EC), there was no significant difference between the error of EC trajectories and the error of trajectories reconstructed using 5 synergies (*p* = 0.14, *N* = 5, *n* = 8, *t*-test, mean value ± *SD*: 14.7 ± 10.2◦).

To assess the performance of the synergy reconstruction at the end of the movement and to make a direct comparison to the results of de Rugy et al. (2013), we also estimated the endpoint error of the trajectories (**Table 1** and **Figure 4C**). When comparing the endpoint error of the EC and SR trajectories we found similar results to those of de Rugy et al. (2013). Using for each subject a number of synergies adequately capturing the EMG data variation (see **Table 1**) we found a significant difference between the mean endpoint errors of EC and SR trajectories (*p* = 0.00015, Wilcoxon rank-sum test, *n* = 8). Comparing the endpoint errors of EC and SR trajectories as a function of the number of synergies, we also found a significant difference between average endpoint errors for both four and five synergies (*p* = 0.0002 and *p* = 0.0003 for *N* = 4 and *N* = 5, Wilcoxon rank-sum test, *n* = 8; mean ± *SD* 0.22 ± 0.12 and 0.12 ± 0.05 for *N* = 4 and *N* = 5, respectively, compared to 0.067 ± 0.005 during EC). However, the analysis of individual subjects revealed a high inter-subject variability (**Figure 4D**). There was no significant difference between the endpoint error of EC and SR trajectories using 5 synergies for three out of eight subjects (*p* = 0.3481, *p* = 0.2883, and

**reconstruction by synergies. (A)** Examples of cursor trajectories executed by subject 2 in EMG-control (EC, *green*) and their reconstruction using four synergies (SR, *red*). Each column shows a trial to a different target (*gray circle*). Markers indicate the time of target acquisition. **(B)** Corresponding cursor displacements in x- and y- force directions for each trial. **(C)** Rectified and filtered EMG traces recorded during each trial (*gray area*) and their reconstruction (*red*) by the four subject-specific synergies shown in **Figure 2B**. Vertical dashed lines indicate the time of target acquisition. **(D)** Time-varying synergy coefficients (color coded as in **Figure 2B**) for each trial. **(E)** Polar plot of the directional tuning of the four synergies shown in **Figure 2B**.


**Table 1 | Number of synergies estimated from EMG-patterns in EMG-control (Experiment 1), fraction of EMG data variation explained, and performance measures for individual subjects of EMG-control trajectories (EC) and trajectories reconstructed using synergies (SR).**

*p* = 0.9589 for subjects 2, 4, and 8, with *N* = 5) and for one subject using 4 synergies (*p* = 0.0764 for subject 1).

In summary, we found on average a high similarity between cursor trajectories under EMG-control and trajectories reconstructed using a set of synergies which adequately explained EMG data variation. Moreover, the mean initial angle error, indicative of the accuracy of the feed-forward commands, was not different between trajectories executed in EMG-control and trajectories reconstructed using synergies. However, the mean endpoint error, more sensitive to feedback control, was larger for trajectories reconstructed using synergies than for trajectories executed using EMG-control of individual muscles.

# **SYNERGY RECONSTRUCTION OF CURSOR TRAJECTORIES DURING FORCE-CONTROL**

As inaccuracies in the EMG-to-force mapping could be corrected by online adjustments in EC while inaccuracies in the synergy-to-force mapping could not be corrected in the offline reconstruction by synergies, we also compared the trajectories reconstructed using the EMG data (ER) recorded while human subjects performed the isometric reaching task in force-control (FC) with the trajectories reconstructed using subject-specific synergies (SR). We quantified the similarity of ER trajectories with SR trajectories by computing the fraction of ER trajectory variation explained by SR trajectories (R<sup>2</sup> traj, **Table 2** and **Figure 5A**). When we used, for each subject, a number of synergies adequately explaining EMG data variation (**Table 2**), we found, across subjects (*n* = 8), a mean R<sup>2</sup> traj value of 0.88 ± 0.08 *SD* (**Table 2**). When we considered the mean R<sup>2</sup> traj value as a function of the number of synergies (**Figure 5A**), we found that the mean R<sup>2</sup> traj value was 0.85 ± 0.10 with four synergies and 0.93 ± 0.06 with five. We also compared how ER and SR trajectories reconstructed FC trajectories by computing the ratio of the fraction of FC trajectory variation explained (R<sup>2</sup> traj-force) by SR and by ER trajectories (**Table 2** and **Figure 5B**). Selecting a number of synergies, for each subject, adequately capturing EMG data variation we found a mean R<sup>2</sup> traj-force ratio (SR/ER) of 0.92 ± 0.11 (*SD*). Averaging the R<sup>2</sup> traj-force ratio across subjects as a function of the number of synergies (**Figure 5B**) we found that 5 synergies reached a value of 0.96 ± 0.09 (*SD*). Thus, during force-control, we found a high similarity between the trajectories reconstructed using the entire set of recorded muscles and those reconstructed using only a small number of synergies.

We then compared the initial angle error of ER and SR trajectories. **Figure 5C** shows the average errors for ER trajectories and SR trajectories using, for each subject, a number of synergies which captured EMG data variation adequately. We found no significant differences between mean errors for ER and SR trajectories (*p* = 0.26, *t*-test, *n* = 8, mean ± *SD* 18.5 ± 8.2◦ for SR and 15.5 ± 5.3◦ for ER). We also found no significant difference between the mean angle error of ER trajectories and the mean error of SR trajectories reconstructed using 4 synergies (*p* = 0.17 for *N* = 4 synergies, *t*-test, *n* = 8, with mean values of 18.4 ± 5.9◦ for SR and as above for ER). Finally, comparing the endpoint error of ER to SR trajectories (**Table 2** and **Figure 5D**) we found no significant differences between the average errors using the subject-specific number of synergies (*p* = 0.19, *n* = 8, Wilcoxon rank-sum test, mean ± *SD*: 0.46 ± 0.09 for SR and 0.40 ± 0.06 for ER) as well as when using 4 synergies for all subjects (*p* = 0.10, *n* = 8, Wilcoxon rank-sum test, mean ± *SD* for SR: 0.46 ± 0.09). Thus individual muscles and synergies showed similar performance during force-control.

# **PERFORMANCE DURING SYNERGY-CONTROL, EMG-CONTROL, AND FORCE-CONTROL**

We then investigated how well subjects were able to control the cursor directly with the synergy activation estimated online from the recorded EMGs, i.e., in synergy-control (SC) mode, by comparing their performances in FC, SC, and EC. In SC we used for each subject a number of synergies adequately capturing EMG data variation (**Table 3**). Subjects were able to control the cursor in SC and EC mode immediately after FC. **Figure 6A** shows examples of the trajectories for the first three movements in each of the eight directions performed in each control mode by one subject. In these examples, all trials except one (bottom right target in EC) were successful. On average across subjects only 2.8 ± 3.4% (*SD*) of the trials in SC and 3.5 ± 6.7% of the trials in EC were unsuccessful while all trials were successful in FC for all subjects. The differences in the fraction of unsuccessful trials between all conditions were not significant (*t*-test, *n* = 6; SC–EC: *p* = 0.72; FC–SC: *p* = 0.10; FC–EC: *p* = 0.26).

We first assessed the subjects' performance by comparing initial angle errors (**Figure 6B**). There was no significant difference in angle error for the pairwise comparisons of the three different control modes (FC–SC: *p* = 0.067; SC–EC: *p* = 0.15; FC–EC: *p* = 0.13, *t*-test, *n* = 6, mean values ± SD for FC, SC, and EC, respectively, were: 7.7 ± 2.5, 11.6 ± 3.8, and 9.7 ± 3.9◦). We then

dashed line indicates 0.9 R<sup>2</sup>). **(B)** Mean initial angle error for EC and SR trajectories as a function of the number of synergies. **(C)** Mean endpoint error (normalized to target distance from the origin, *n.u.: normalized units*) for EC and SR trajectories as a function of the number of synergies. **(D)** Endpoint error of SR trajectories for individual subjects (bars with different gray levels) as a function of the number of synergies. Average endpoint error of EC trajectories across subjects is indicated by the *green* line.

investigated the endpoint error (**Figure 6C**). We also found no significant difference in the endpoint error for the pairwise comparisons of the three different control modes (FC–SC: *p* = 0.94; FC–EC: *p* = 0.89; SC–EC: *p* = 0.91, *t*-test, *n* = 6, mean values ± *SD* for FC, SC, and EC, respectively, were: 0.072 ± 0.005, 0.070 ± 0.006, and 0.067 ± 0.006). In summary, during all three control modes subjects were able to control the cursor accurately and there were no significant differences in initial angular error and endpoint error.

# **DISCUSSION**

As in many previous studies, we found that a small number of muscle synergies explained a large fraction of the variation of the muscle patterns recorded during different task conditions. In this study, however, we focused on the question whether a small number of muscle synergies can accurately generate the forces involved in the task and, ultimately, whether muscle synergies can be used to perform the task effectively. We first tested whether trajectories of a virtual mass displaced by EMG activity recorded from 13 arm muscles during an isometric reaching task could be reconstructed with similar accuracy using the combinations of the same muscle activities into a small number of muscle synergies identified by NMF. Our results showed that the trajectories reconstructed using 4–5 synergies were as accurate as the trajectories obtained by displacing a virtual cursor according to the hand force estimated from EMGs recorded from the entire set of muscles when considering the initial movement direction error but not in terms of endpoint error. However, these results were not consistent across subjects, as for some subjects we found no difference in endpoint error using 4 or 5 synergies. We then assessed whether the availability of feedback in EMG-control mode could explain the lower endpoint accuracy of the synergies by comparing the reconstructions of the trajectories executed in force-control using synergies and using the entire set of muscles. Indeed, in force-control cursor movements depended only on the applied force and did not provide any information on the inaccuracy likely present in the EMG-to-force and synergy-to-force mappings used for the reconstruction. 
Trajectories reconstructed using synergies were not significantly different from trajectories reconstructed using individual muscles, suggesting that the difference between the trajectories generated during EMG-control and their reconstruction by synergies observed in some subjects was due to online adjustments performed during the EMG-control mode. Finally, we explicitly tested whether human subjects were able to control a virtual cursor and to successfully perform a force reaching task by synergy recruitment. In synergy-control mode the cursor movement depended only on the portion of the recorded EMGs that could be reconstructed by synergy combinations. Subjects were able to control the cursor in synergy-control mode immediately after switching from force-control mode and there were no significant differences in performance between the three control modes.

As mentioned in the Introduction, several studies have provided evidence supporting the hypothesis that movements are controlled by a limited set of modules or muscle synergies. However, most of these studies focused on how well synergies describe muscle patterns, showing that a small number of synergies capture a large fraction of the muscle pattern variation across task conditions and often that such synergies are robust across different tasks, but they did not directly address the question whether synergies can be used by the CNS to effectively


**Table 2 | Number of synergies estimated from EMG-patterns in force-control (Experiment 1), fraction of EMG data variation explained, and performance measures for individual subjects of trajectories reconstructed using EMGs (ER) and synergies (SR) during force-control.**

accomplish a task. Thus, in order to validate the synergy hypothesis, it must be demonstrated that a small number of synergies are sufficient to generate the forces or movements necessary for accurate task performance. Two recent studies have addressed the functional role of the muscle synergies underlying postural responses to stance perturbations in cats (Torres-Oviedo et al., 2006) and in humans (Chvatal et al., 2011) by using NMF to simultaneously extract synergies from EMG data and kinetic data (contact forces, center of mass accelerations). Such functional muscle synergies could explain both muscle activation patterns and kinetic data in a range of postural configurations and in different types of responses, suggesting that muscle synergies are responsible for the control of specific biomechanical functions shared across task conditions. Consistent functional roles of muscle synergies have also been demonstrated in dynamic simulations of human pedaling and walking (Raasch and Zajac, 1999; Neptune et al., 2009; Allen and Neptune, 2012). Identification of functional muscle synergies, however, relies on the assumption of a linear relationship between EMGs and kinetic variables which might be valid only in limited conditions. Similarly, accurate forward dynamic simulations using muscle synergies depend on many musculoskeletal parameters that are difficult to validate and require fine-tuning of the muscle excitation patterns. Thus, these studies do not provide direct evidence that a small set of muscle synergies is sufficient for achieving accurate task performance.

A recent study by de Rugy et al. (2013) investigated the relation between synergies and task performance in humans by comparing force trajectories generated under EMG-control during an isometric reaching task, similar to the one used in the present study but involving only five wrist muscles, with the trajectories reconstructed using muscle synergies. One advantage of this experimental approach is that task performance depends on a well-defined linear transformation of the recorded EMGs, even if such linear mapping is not an accurate estimate of the real EMG-to-force mapping. de Rugy and collaborators found that four synergies on average explained more than 90% of the EMG data variation but reconstructed cursor trajectories with a much higher endpoint error than the trajectories executed in EMG-control. The authors claimed that synergy decomposition introduces substantial task space errors and concluded that applying synergy decomposition onto a set of available muscles appears of little use to best reconstruct the motor output in task space.

In the present study we have addressed the issue of whether muscle synergies can accurately reconstruct and generate forces in a number of ways. First, we performed the same analysis that de Rugy and collaborators performed on the wrist system (using EMG activity recorded from five muscles) on the more complex arm system (using EMG activity recorded from 13 muscles). We compared the endpoint error of cursor trajectories during EMG-control of a reaching task with the endpoint error of trajectories reconstructed using synergies. We found, on average across subjects, that a small number of synergies were not sufficient to reach the same endpoint accuracy as during EMG-control, thus replicating for the arm the results by de Rugy et al. (2013) for the wrist. However, investigation of individual subjects showed that this result was inconsistent across subjects. For half of the subjects synergy reconstruction using 4 or 5 synergies was sufficient to reach the same performance as during EMG-control. Moreover, as our subjects were instructed to reach the target with a fast reaching movement and were not required to minimize endpoint error, we also compared the entire cursor trajectories and the initial directional error and we found no significant differences between EMG-control and reconstruction with a number of synergies adequately explaining the EMG data.

Second, as the comparison of EMG-control and synergy reconstruction may be biased by the use of online feedback, we also compared EMG and synergy reconstruction of cursor trajectories generated by recorded forces. In EMG-control cursor trajectories are generated by forces estimated using EMG signals and not by the recorded forces, and subjects were able to exploit feedback to correct inaccuracies in the estimated EMG-to-force mapping. However, inaccuracies in the synergy-to-force mapping may require different corrections unavailable in the offline synergy reconstruction. We therefore argue that in order to conclude that synergy decomposition decreases task performance with respect to individual muscle control, one needs either to compare experimental conditions in which online corrections to both EMG-to-force and synergy-to-force inaccuracies are possible (as in our second experimental protocol), or to compare the reconstructions, using individual muscles and synergies, of trajectories executed in force-control mode, in which neither EMG-to-force nor synergy-to-force inaccuracies can be corrected. When we compared the trajectories reconstructed using synergies and using individual EMGs, using data collected while the task was performed in force-control mode, we did not find any significant difference.

**FIGURE 5 | Comparison between trajectories reconstructed using individual muscles and synergies during task performance in force-control.** Trajectories reconstructed using EMG data (ER) are shown in *green*; trajectories reconstructed using synergies (SR) are shown in *red*. **(A)** Mean fraction of EMG data variation explained by synergies (R<sup>2</sup> EMG, *black*) and mean fraction of ER trajectory variation explained by SR trajectories (R<sup>2</sup> traj, *blue*) as a function of the number of synergies (*n* = 8, shaded areas indicate SD, dashed line indicates R<sup>2</sup> = 0.9). **(B)** Mean of the ratio between the fraction of FC trajectory variation explained by SR trajectories (R<sup>2</sup> traj-force) and the fraction of FC trajectory variation explained by ER trajectories. Dashed line indicates a ratio of 1. **(C)** Mean initial angle error for ER and SR trajectories as a function of the number of synergies. **(D)** Mean endpoint error (normalized to target distance from the origin) for ER and SR trajectories as a function of the number of synergies.

These results indicate that the lower accuracy of trajectories reconstructed using synergies, relative to trajectories executed in EMG-control, is due to the online feedback corrections available only during EMG-control.

Third, we directly demonstrated the effectiveness of a small number of synergies in generating the forces involved in a reaching task by showing that subjects were able to control the cursor accurately with those synergies. We compared performance when subjects controlled the cursor with synergies to performance in EMG-control and force-control. Remarkably, subjects were able to perform the task immediately after switching from force-control to synergy-control, and they showed no significant differences in initial angle error and endpoint error between the three control modes. These results show that subjects were able to control a cursor in a reaching task using synergies with performance similar to that during force-control and EMG-control.

Finally, de Rugy et al. (2013) examined the wrist system, which has relatively low muscle redundancy and might therefore not require synergistic control. They also drew similar conclusions from analyzing arm muscle patterns that were simulated so as to generate the target forces, using an EMG-to-force mapping derived from a biomechanical model, with minimal summed squared muscle activations (Fagg et al., 2002). They justified the use of simulated data for assessing the task efficacy of muscle synergies in the more complex arm system with the observation that simulated wrist data show a fraction of variance explained by synergy decomposition, and an endpoint error of trajectories reconstructed using synergies, similar to those obtained with experimental data. However, the similarity between simulated and experimental data in the weakly redundant wrist system might be due to the lack of synergistic control in such a system. In contrast, simulated data might have a different synergy decomposition than experimental data collected in the arm system if the arm is controlled synergistically. In fact, a recent study by Borzelli et al. (2013) investigating arm muscle patterns underlying isometric force generation has shown that the estimated minimum-effort recruitment of individual muscles does not adequately capture the observed muscle activation patterns. Thus, by using simulated muscle patterns, de Rugy et al. (2013) did not test whether a small number of synergies could achieve good task performance in the arm system. Not surprisingly, the simulated data had a high dimensionality and the synergy decomposition resulted in higher aiming errors than individual muscles. However, if the data had been generated by combinations of a small number of synergies, their decomposition into the same number of synergies would have achieved the same task performance as individual muscles.
Thus, the results of these simulations, because they depend on the assumptions made for data generation, cannot support any conclusion on whether the CNS employs synergies to simplify control.

The number of synergies used for reconstructing cursor trajectories executed in force- and EMG-control, and for projecting recorded EMG data in synergy-control, is a critical parameter with a major effect on task-space accuracy. We selected the number of synergies, a free parameter in the NMF decomposition of the EMG data, by comparing the EMG data variation accounted for by different numbers of synergies (R<sup>2</sup> EMG). We considered two criteria used in many previous studies of muscle synergies: the minimum number of synergies with R<sup>2</sup> EMG over 90% (Tresch et al., 1999; Ting and Macpherson, 2005; Torres-Oviedo et al., 2006; Roh et al., 2012; Delis et al., 2013) and the number of synergies at which the R<sup>2</sup> EMG curve shows a change in slope (d'Avella et al., 2003; Cheung et al., 2005; d'Avella et al., 2006; Tresch et al., 2006; Delis et al., 2013). When the numbers of synergies selected by the two criteria did not match, we chose the larger one to ensure the best reconstruction of the EMG data when analyzing task performance. However, both criteria depend on *ad-hoc* thresholds and do not guarantee selection of the correct number of synergies. Importantly, we noticed that adding a single synergy may sometimes have a small effect on R<sup>2</sup> EMG but a large effect on the quality of the reconstruction of EMG-control trajectories using synergies (R<sup>2</sup> traj) (see **Figure 4D**, subject 3). Thus, the lower task performance of the trajectories reconstructed using synergies, in addition to the lack of feedback-driven adjustments to synergy-to-force inaccuracies discussed above, might also be due to an inappropriate selection of the number of synergies for some of the subjects. However, the number of synergies selected with our criterion did allow subjects to perform the task in synergy-control with accuracy similar to that of force- and EMG-control, suggesting that the criterion was adequate and that the lower performance of synergy-reconstructed trajectories executed in EMG-control was mainly due to the lack of appropriate feedback. Finally, task performance in synergy-control could be used as a new criterion for selecting the number of synergies, especially for synergy-based myoelectric control applications: it would simply require testing an increasing number of synergies and selecting the number that ensures performance comparable to that obtained with individual muscles.

**Table 3 | Number of synergies extracted from EMG-patterns in force-control (Experiment 2), fraction of EMG data variation explained, and performance measures for individual subjects of trajectories executed in force-control (FC), synergy-control (SC), and EMG-control (EC).**
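As a concrete illustration, the two selection criteria can be combined in a short routine. This is a sketch in Python; the function name and the particular slope-change heuristic (largest drop in incremental gain) are our assumptions, not the paper's implementation:

```python
import numpy as np

def choose_n_synergies(r2_curve, threshold=0.90):
    """Combine the two criteria from the text: (1) the smallest N with
    R^2 above the threshold and (2) the N at the sharpest change in
    slope of the R^2 curve; return the larger of the two, as in the
    paper. r2_curve[k] is the R^2 obtained with k + 1 synergies."""
    r2 = np.asarray(r2_curve, dtype=float)
    above = np.nonzero(r2 > threshold)[0]
    n_thresh = int(above[0]) + 1 if above.size else len(r2)  # criterion 1
    gains = np.diff(r2)                  # incremental R^2 per added synergy
    drops = gains[:-1] - gains[1:]       # decrease in slope between steps
    n_slope = int(np.argmax(drops)) + 2  # N just after the sharpest elbow
    return max(n_thresh, n_slope)
```

For a curve such as `[0.60, 0.80, 0.92, 0.94, 0.95]`, both criteria agree on three synergies; when they disagree, the larger value is returned.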

In conclusion, the investigation of a 2-dimensional force reaching task demonstrated that a complex arm muscle system can be effectively controlled by a small number of synergies. Our results suggest that muscle synergies are employed by the CNS to cope with the high number of degrees of freedom in the musculoskeletal system and to simplify movement coordination. However, the fact that we found no significant reduction in performance using muscle synergies cannot definitively prove whether or not synergies are actually employed by the CNS. Further insights into the synergy hypothesis may be gained by testing subjects' adaptation to perturbations. In a recent study (Berger et al., 2013) we showed that adaptation to virtual surgeries, i.e., perturbations of the muscle-to-force mapping, depends on the compatibility of the surgery with the synergies. Human subjects adapted strikingly faster after compatible virtual surgeries, in which a full range of movements in the task space could be achieved by recombining the initially identified synergies, than after incompatible virtual surgeries, for which new or modified synergies would be required. Muscle synergies might thus allow for faster adaptation to perturbations and to environmental demands. Comparing adaptation to perturbations directly under synergy-control and EMG-control might therefore shed further light on possible control strategies employed by the CNS. Synergy-control could moreover be useful for achieving intuitive simultaneous and proportional control of myoelectric prostheses (Jiang et al., 2009, 2013) and robot arms (Artemiadis and Kyriakopoulos, 2010), and for the development of novel diagnostic tools and rehabilitation approaches (Safavynia et al., 2011). Specifically, rehabilitation exercises in a virtual environment with synergy-control might promote recovery of movement skills in stroke patients by facilitating the recruitment of spared muscle synergies (Cheung et al., 2009, 2012) and the re-organization of altered ones (Roh et al., 2013).

# **ACKNOWLEDGMENTS**

This work was supported by the Human Frontier Science Program Organization (RGP11/2008), the European Union FP7- ICT program (Adaptive Modular Architectures for Rich Motor skills, AMARSI, Grant 248311). We thank Daniele Borzelli for help with setting up and making illustrations of the experimental apparatus and Benedetta Cesqui for useful discussions.

# **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 07 October 2013; accepted: 28 March 2014; published online: 17 April 2014. Citation: Berger DJ and d'Avella A (2014) Effective force control by muscle synergies. Front. Comput. Neurosci. 8:46. doi: 10.3389/fncom.2014.00046*

*This article was submitted to the journal Frontiers in Computational Neuroscience. Copyright © 2014 Berger and d'Avella. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Common muscle synergies for balance and walking

# *Stacie A. Chvatal and Lena H. Ting\**

*The Wallace H. Coulter Department of Biomedical Engineering, Georgia Tech and Emory University, Atlanta, GA, USA*

### *Edited by:*

*Andrea D'Avella, IRCCS Fondazione Santa Lucia, Italy*

### *Reviewed by:*

*Yuri P. Ivanenko, IRCCS Fondazione Santa Lucia, Italy Thierry Pozzo, INSERM, France*

### *\*Correspondence:*

*Lena H. Ting, The Wallace H. Coulter Department of Biomedical Engineering, Georgia Tech and Emory University, 313 Ferst Drive, Atlanta, GA 30332-0535, USA. e-mail: lting@emory.edu*

Little is known about the integration of neural mechanisms for balance and locomotion. Muscle synergies have been studied independently in standing balance and walking, but not compared. Here, we hypothesized that reactive balance and walking are mediated by a common set of lower-limb muscle synergies. In humans, we examined muscle activity during multidirectional support-surface perturbations during standing and walking, as well as unperturbed walking at two speeds. We show that most muscle synergies used in perturbation responses during standing were also used in perturbation responses during walking, suggesting common neural mechanisms for reactive balance across different contexts. We also show that most muscle synergies used in reactive balance were also used during unperturbed walking, suggesting that neural circuits mediating locomotion and reactive balance recruit a common set of muscle synergies to achieve task-level goals. Differences in muscle synergies across conditions reflected differences in the biomechanical demands of the tasks. For example, muscle synergies specific to walking perturbations may reflect biomechanical challenges associated with single limb stance, and muscle synergies used during sagittal balance recovery in standing but not walking were consistent with maintaining the different desired center of mass motions in standing vs. walking. Thus, muscle synergies specifying spatial organization of muscle activation patterns may define a repertoire of biomechanical subtasks available to different neural circuits governing walking and reactive balance and may be recruited based on task-level goals. Muscle synergy analysis may aid in dissociating deficits in spatial vs. temporal organization of muscle activity in motor deficits. Muscle synergy analysis may also provide a more generalizable assessment of motor function by identifying whether common modular mechanisms are impaired across the performance of multiple motor tasks.

**Keywords: locomotion, posture, muscle synergy, motor control, electromyography**

# **INTRODUCTION**

Humans and animals are able to robustly move over diverse terrains and withstand challenging disturbances to balance during locomotion. Achieving these remarkable behaviors requires precise and dynamic coordination of multiple muscles across the limbs and trunk via hierarchical neural pathways. However, little is known about how the nervous system integrates the concurrent control of locomotion and balance functions over different movement contexts. Neural circuits for locomotion have been identified in the mammalian spinal cord, and can endogenously produce rhythmic motor patterns to muscles (Brown, 1911; Grillner, 1975; Rossignol et al., 1996). These patterns can be modified by sensory feedback (Forssberg et al., 1980; Quevedo et al., 2000; Rossignol and Bouyer, 2004) and motor planning mechanisms (Drew et al., 2002) that alter the gait pattern. Perturbations to walking can elicit long-latency muscle responses (Tang et al., 1998; Chvatal and Ting, 2012) as well as alter the locomotor rhythm during stumbling corrective responses (Pijnappels et al., 2005; van Der Linden et al., 2007). During standing balance control, perturbations evoke coordinated long-latency responses in muscles that help to return the body to postural equilibrium; these require brainstem integration of multisensory cues (Macpherson and Fung, 1999; Deliagina et al., 2008).

Are there common neural mechanisms underlying the control of walking and reactive balance control, and how are these mechanisms integrated during natural movements? Recent research demonstrates that the neural control of muscles may be modular, organized in functional groups often referred to as muscle synergies (Tresch et al., 1999; Giszter et al., 2007; Ting and McKay, 2007; Drew et al., 2008; Chiel et al., 2009; Yakovenko et al., 2011). Each muscle synergy is proposed to specify a fixed pattern of co-activation across multiple muscles at any given time point. Muscle synergies have been used to describe muscle coordination during a variety of motor behaviors including balance control (Torres-Oviedo and Ting, 2007), walking (Ivanenko et al., 2004; Clark et al., 2010; Chvatal and Ting, 2012), reaching (D'Avella et al., 2006; Muceli et al., 2010), and grasping (Hamed et al., 2007; Acharya et al., 2008; Overduin et al., 2008). Moreover, common muscle synergies have been identified across different motor behaviors such as frog swimming, kicking, and jumping (Hart and Giszter, 2004; Cheung et al., 2005; D'Avella and Bizzi, 2005; Cheung et al., 2009), forward and backward locomotion (Raasch and Zajac, 1999; Ting et al., 1999), and across reactive balance conditions (Torres-Oviedo and Ting, 2010; Chvatal et al., 2011). Based on these findings, we hypothesize that a common set of muscle synergies may be recruited by parallel neural pathways governing voluntary, reactive, and automatic motor behaviors in the upper and lower limbs. Muscle synergy analysis may thus provide a more generalizable assessment of motor function in neuromotor deficits, providing more specific information about functional deficits that may guide more targeted rehabilitation interventions.

It has been demonstrated that the phasic recruitment of muscle synergies underlies variability in locomotor behaviors such as pedaling and walking (Ting et al., 1999; Ivanenko et al., 2004; Krouchev et al., 2006; Clark et al., 2010; Lacquaniti et al., 2012). During locomotion, specific muscle synergies have been associated with a particular phase of the gait cycle (Ivanenko et al., 2004; Krouchev et al., 2006; Clark et al., 2010), despite differences in how muscle synergies are defined (Safavynia and Ting, 2012). Furthermore, the order of recruitment has been shown to be consistent across conditions, such as when subjects concurrently perform voluntary tasks (Ivanenko et al., 2005) or shift from walking to running (Cappellini et al., 2006). Moreover, recruitment of muscle synergies within a particular phase of gait has been shown to be modulated systematically as a function of walking speed, and also to account for cycle-by-cycle variability in the locomotor pattern (Clark et al., 2010). These variations in muscle synergy recruitment may reflect changing task demands across gait conditions. Muscle synergies may be organized to produce specific whole-limb or whole-body biomechanical functions during locomotion (Raasch and Zajac, 1999; Neptune et al., 2009; Allen and Neptune, 2012) such that altering the phase, amplitude, or duration of muscle synergy recruitment may produce a variety of locomotor behaviors (Raasch and Zajac, 1999; Ting et al., 1999; McGowan et al., 2010). This is consistent with the idea that muscle synergies reflect motor modules that allow the nervous system to produce consistent biomechanical functions.

Rapid and complex changes in the coordination of muscles are required to recover from discrete perturbations that produce large disruptions to the locomotor pattern. In response to unexpected obstacles, slipping, waist pulls, or surface heights, immediate and delayed changes to muscle activity and kinematics have been observed (Tang et al., 1998; You et al., 2001; Ferber et al., 2002; Misiaszek, 2003; Oddsson et al., 2004; Chambers and Cham, 2007; van Der Linden et al., 2007; Bachmann et al., 2008; Shinya and Oda, 2010). Perturbations typically alter the duration of stance and swing phase such that changes in stance duration, step length, and step width are observed within the perturbed step (Oddsson et al., 2004) and can continue in subsequent steps (Patla, 2003). Corrective muscular responses at long latencies (∼100 ms) following perturbations are observed in both the stance and swing limb (Tang et al., 1998; Bachmann et al., 2008) and appear to be superimposed upon the locomotor pattern (Gorassini et al., 1994; Hiebert et al., 1994). Moreover, different motor strategies to maintain whole-body stability can be evoked, consistent with the idea that long-latency responses are organized to maintain task-level goals (Nashner, 1976; Horak and Macpherson, 1996; Carpenter et al., 1999; Chvatal et al., 2011), and not simply limb posture (cf. autogenic reflex). We recently demonstrated that the same muscle synergies recruited phasically during overground walking were also recruited at long latencies in response to discrete perturbations to walking (Chvatal and Ting, 2012). However, it remains unknown whether these long-latency responses are similar in organization to those evoked during perturbations to standing balance, which would support the idea of a common mechanism mediating balance responses across movement contexts.

In discrete perturbations to standing balance control, the modulation of a few muscle synergies can robustly explain variations in muscle activity across reactive balance responses during different perturbations to standing. In response to multidirectional support-surface perturbations, muscle synergy recruitment is directionally tuned to perturbation direction, and generates a specific biomechanical function (e.g., ground-reaction force direction) to restore the center-of-mass (CoM) in both humans and animals (Ting and Macpherson, 2005; Torres-Oviedo et al., 2006; Chvatal et al., 2011). Moreover, the multidirectional recruitment of muscle synergies during standing balance can be predicted based on the deviation of CoM kinematics from the upright, static state, such that they reflect the task-relevant error for maintaining postural equilibrium and orientation (Safavynia and Ting, 2013). Variations in muscle activity from trial to trial in identical perturbation conditions (Torres-Oviedo and Ting, 2007; Safavynia and Ting, 2012) as well as across biomechanical contexts [e.g., standing with wide, narrow, crouched, or single-limb stance (Torres-Oviedo et al., 2006; Torres-Oviedo and Ting, 2010)], can be accounted for by the differential modulation of a common set of muscle synergies for balance. Furthermore, muscle activity and the resulting force production during reactive non-stepping and stepping responses were explained by a common set of muscle synergies (Chvatal et al., 2011), demonstrating the robustness of the muscle synergy organization and function in mediating a variety of balance behaviors.

Here, we hypothesized that a common set of lower-limb muscle synergies mediates reactive balance and walking. Recent studies during perturbed locomotion demonstrate that locomotor muscle synergies are recruited at long latencies following discrete perturbations to walking (Chvatal and Ting, 2012; Oliveira et al., 2012). This suggests that muscle synergies recruited rhythmically for locomotion may also be recruited during atypical phases of gait due to sensorimotor feedback mechanisms governing long-latency balance responses. However, it is not known whether the same muscle synergies are recruited during long-latency balance responses evoked during standing and walking. To compare muscle activity during directional balance control, we imposed twelve directions of support-surface perturbations during standing and walking at self-selected and slow speeds. First, we predicted that reactive balance responses to multidirectional perturbations across contexts, e.g., during standing and walking, would be mediated by common muscle synergies. We then predicted that muscle activity in both reactive balance and overground walking would also be mediated by common muscle synergies. Our results suggest that a common set of muscle synergies is differentially recruited by neural circuits mediating reactive balance across movement contexts and for locomotion.

# **METHODS**

In order to determine whether common muscle synergies are recruited during postural responses to perturbations in different dynamic contexts, we recorded postural responses to ramp and hold translations of the support surface during standing balance as well as during walking at both self-selected and slow walking speeds. Perturbations in twelve directions in the horizontal plane were delivered in random order in each condition. Muscle synergies were extracted from both the standing balance and walking conditions, as well as from trials of unperturbed walking. Muscle synergies and recruitment coefficients from each condition were compared to give insight into neural mechanisms underlying each condition.

# **DATA COLLECTION**

Seven healthy subjects (four male, three female) between the ages of 19 and 26 responded to support surface translations according to an experimental protocol that was approved by the Institutional Review Boards of Georgia Institute of Technology and Emory University. All subjects gave informed consent before participating in each of three experimental blocks (standing balance, self-selected speed walking, and slow walking). The order in which the blocks were presented was randomized for each subject.

In the standing balance block, subjects stood on an instrumented platform that translated in 12 equally spaced directions in the horizontal plane (see **Figure 1**). Subjects were instructed to maintain balance without stepping if possible. The platform's displacement was 12.4 cm, velocity was 35 cm/s, and acceleration was 0.5 g. Five trials of each of the 12 directions of perturbation were collected in random order. All subjects were able to maintain balance without taking a step.

In the walking blocks, subjects walked overground slowly (0.6–0.7 m/s) or at a self-selected pace (1.2–1.5 m/s) for approximately 7.5 m, or 7 gait cycles. Subjects were instructed to maintain

**FIGURE 1 | (partial caption)** … vastus medialis (VMED), biceps femoris (BFLH), medial gastrocnemius (MGAS), soleus (SOL), peroneus (PERO), and tibialis anterior (TA) responses. Mean EMG activity was calculated for 3 time bins during the automatic postural response (PR), indicated by the red shaded region, beginning 100 ms (PR1), 175 ms (PR2), and 250 ms (PR3) following perturbation. One complete gait cycle is shown for each walking speed, and the horizontal bar indicates stance (gray) and swing (white) phase. Perturbations during walking were administered in early stance.

a pace as closely as possible to a metronome beat. Subjects listened to 4 metronome beats and then began walking at a self-selected time after the metronome was silenced. In slow trials the metronome was set at 60 bpm, and in self-selected trials the metronome pace was matched to each subject's preferred pace, determined when they first arrived. Subjects began walking with their right foot, and data collection began on the third step to eliminate any variability associated with gait initiation. Eight trials of unperturbed walking were collected at the beginning of each block, in which the subject knew there would be no perturbation. In the remaining trials, subjects were told that there may or may not be a perturbation. Twelve trials of unperturbed walking were collected randomly in between the perturbation trials in order to capture any anticipatory responses. In perturbed trials, perturbations (displacement 12.4 cm, velocity 40 cm/s, acceleration 0.7 g) were applied as subjects crossed the instrumented platform halfway along the path, during early stance phase of the right leg. The perturbation was applied when the ground reaction force of the right foot reached ∼60% of body weight as measured by force plates (AMTI, Watertown, MA) embedded in the platform. Perturbation direction was randomized, and three trials of each direction for each walking speed were collected. Data from the four cardinal directions of perturbations and from unperturbed walking trials were analyzed and published previously (Chvatal and Ting, 2012).

Surface EMG activity was recorded from sixteen muscles of the lower-back and leg on the subject's right side, the stance leg in perturbed walking. Muscles recorded included: vastus lateralis (VLAT), rectus femoris (RFEM), rectus abdominis (REAB), biceps femoris long head (BFLH), semitendinosus (SEMT), adductor magnus (ADMG), erector spinae (ERSP), abdominal external oblique (EXOB), vastus medialis (VMED), tibialis anterior (TA), medial gastrocnemius (MGAS), lateral gastrocnemius (LGAS), soleus (SOL), peroneus (PERO), tensor fasciae latae (TFL), and gluteus medius (GMED). EMG data were sampled at 1080 Hz, high pass filtered at 35 Hz, de-meaned, rectified, and low-pass filtered at 40 Hz, using custom MATLAB routines. Additionally, kinetic data was collected at 1080 Hz from force plates under the feet, and kinematic data was collected at 120 Hz using a motion capture system (Vicon, Centennial, CO) and a custom 25-marker set that included head-arms-trunk (HAT), thigh, shank, and foot segments.
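The EMG conditioning steps described above can be sketched as follows (Python/SciPy in place of the original custom MATLAB routines; the filter order and the use of zero-phase filtering are our assumptions, as the paper does not specify them):

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS = 1080.0  # EMG sampling rate in Hz, as reported in the text

def process_emg(raw, fs=FS):
    """High-pass at 35 Hz, de-mean, rectify, then low-pass at 40 Hz."""
    # 4th-order Butterworth filters; cutoffs normalized to Nyquist
    b_hp, a_hp = butter(4, 35.0 / (fs / 2.0), btype="high")
    x = filtfilt(b_hp, a_hp, np.asarray(raw, dtype=float))
    x = np.abs(x - x.mean())             # de-mean and full-wave rectify
    b_lp, a_lp = butter(4, 40.0 / (fs / 2.0), btype="low")
    return filtfilt(b_lp, a_lp, x)       # smooth linear envelope
```

Zero-phase filtering (`filtfilt`) avoids shifting response latencies, which matters when binning activity relative to perturbation onset.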

### **DATA PROCESSING**

In order to identify muscle synergies, we first generated EMG data matrices from each condition as follows:

In the standing balance condition, three time bins during the automatic postural response were analyzed (Torres-Oviedo and Ting, 2007; Chvatal et al., 2011). The automatic postural response (APR) has been well-characterized and occurs ∼100 ms following the perturbation (Horak and Macpherson, 1996). Due to variations in muscle activity during this APR, we further divided it into three 75-ms time bins beginning 100 ms (PR1), 175 ms (PR2), and 250 ms (PR3) after perturbation onset (**Figure 1A** red shaded areas). Mean muscle activity for each muscle during each time bin was calculated for each trial. These numbers were assembled to form the data matrix used for subsequent muscle synergy analysis,

which consisted of 3 time bins × 12 directions × 5 trials = 180 points for each of the 16 muscles.

Similarly, in the perturbed walking conditions we also analyzed three 75-ms time bins to characterize the reactive response to perturbation. Mean muscle activity was calculated during three time bins beginning 100 ms, 175 ms, and 250 ms after the perturbation (**Figure 1B** red shaded areas). For perturbed walking, the data matrix consisted of 3 time bins × 12 directions × 3 trials = 108 points for each of the 16 muscles.
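A sketch of how the binned means for one trial might be computed and assembled (Python; the function name and onset-alignment convention are our assumptions):

```python
import numpy as np

FS = 1080                         # samples per second
BIN_MS = 75                       # bin width in ms
BIN_STARTS_MS = (100, 175, 250)   # PR1, PR2, PR3 onsets after perturbation

def apr_bin_means(emg_trial, onset_idx, fs=FS):
    """Mean activity per muscle in the three postural-response bins.
    emg_trial: (n_muscles, n_samples) processed EMG for one trial.
    Returns an (n_muscles, 3) array, one column per time bin."""
    n_bin = int(round(BIN_MS * fs / 1000.0))
    cols = []
    for start_ms in BIN_STARTS_MS:
        i0 = onset_idx + int(round(start_ms * fs / 1000.0))
        cols.append(emg_trial[:, i0:i0 + n_bin].mean(axis=1))
    return np.column_stack(cols)

# Standing data matrix: hstack the per-trial bins over
# 12 directions x 5 trials -> 16 muscles x 180 columns, as in the text.
```

The same routine covers perturbed walking, where 12 directions × 3 trials give 108 columns per muscle.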

In the unperturbed walking condition, at least three complete gait cycles for each trial were included in the analysis. EMG data were downsampled by averaging the data in 75-ms time bins (Chvatal and Ting, 2012). Reducing the size of the time bins to 10 ms during walking did not affect the number or structure of muscle synergies in prior studies (Chvatal and Ting, 2012), as well as for the current paper (not shown). Time-courses of EMG from unperturbed walking trials of each subject were concatenated to form the data matrix. The size of the data matrix varied across subjects and walking speeds since no time-normalization was performed on walking cycles, but each subject's data matrix had greater than 1044 points for each of the 16 muscles.

For all conditions, the activation of each muscle in each subject was normalized to the maximum activation observed during the unperturbed walking trials at the self-selected walking speed. The elements of each row of a data matrix (each muscle) constructed from unperturbed walking trials at the self-selected speed therefore ranged from 0–1. Identical normalization factors from the unperturbed self-selected walking condition were used for all other conditions for each subject. Tuning curves were generated by plotting the activation of each muscle with respect to perturbation direction within a given time bin.
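The normalization step can be sketched as below (a hypothetical helper; `reference` stands for the unperturbed self-selected-speed walking matrix described in the text):

```python
import numpy as np

def normalize_by_reference(data, reference):
    """Divide each muscle (row) by its maximum in the reference
    condition, so reference rows span 0-1 and the same per-muscle
    factors are reused for every other condition."""
    scale = np.asarray(reference, dtype=float).max(axis=1, keepdims=True)
    return np.asarray(data, dtype=float) / scale
```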

# **EXTRACTION OF MUSCLE SYNERGIES**

We extracted muscle synergies from each data matrix of EMG recordings using non-negative matrix factorization (NNMF) (Lee and Seung, 1999; Tresch et al., 1999), which has previously been used for muscle synergy analysis (Ting and Macpherson, 2005; Torres-Oviedo and Ting, 2007). NNMF assumes that a muscle activation pattern, M, in a given time period is composed of a linear combination of a few muscle synergies, W<sub>i</sub>, each recruited by a synergy recruitment coefficient, c<sub>i</sub>. A particular muscle activation pattern, M, is therefore represented by:

$$\mathbf{M} = c_1 \mathbf{W}_1 + c_2 \mathbf{W}_2 + c_3 \mathbf{W}_3 + \dots$$

where W<sub>i</sub> specifies the relative contributions of the muscles involved in synergy *i*. Each muscle synergy has a fixed composition, and each is multiplied by a scalar recruitment coefficient, c<sub>i</sub>, which changes over time and across conditions. Prior to extracting muscle synergies, each muscle vector in the data matrix was normalized to have unit variance to ensure equal weighting in the muscle synergy extraction. After extracting muscle synergies, the unit-variance scaling was removed from the data so that each muscle's data returned to the scale in which 1 is the maximum activation during self-selected-speed unperturbed walking, in order to permit comparison of responses and muscle synergies across conditions.
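A minimal version of the factorization, using the Lee and Seung (1999) multiplicative update rules, might look as follows (a sketch only; the paper's actual implementation and stopping criteria are not specified, and the per-muscle unit-variance scaling described above would be applied before calling it):

```python
import numpy as np

def nnmf(M, n_syn, n_iter=2000, seed=0, eps=1e-9):
    """Factor a non-negative matrix M (n_muscles x n_samples) into
    synergies W (n_muscles x n_syn) and recruitment coefficients
    C (n_syn x n_samples) so that M is approximated by W @ C."""
    rng = np.random.default_rng(seed)
    n_mus, n_samp = M.shape
    W = rng.random((n_mus, n_syn)) + eps
    C = rng.random((n_syn, n_samp)) + eps
    for _ in range(n_iter):
        C *= (W.T @ M) / (W.T @ W @ C + eps)   # update recruitment coeffs
        W *= (M @ C.T) / (W @ C @ C.T + eps)   # update synergy weights
    return W, C
```

The multiplicative updates preserve non-negativity of both factors at every iteration, which is what makes the decomposition interpretable as muscle co-activation patterns.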

Muscle synergies for reactive balance were identified independently in each of the three perturbation conditions: standing, slow walking, and self-selected walking. We extracted 1–16 muscle synergies, and the goodness of fit of the data reconstruction using each number of muscle synergies was quantified by variance accounted for (VAF), defined as 100 × uncentered Pearson's correlation coefficient (Zar, 1999; Torres-Oviedo et al., 2006). The number of muscle synergies selected to describe each dataset *(Nsyn)* was determined by choosing the smallest number of synergies that could account for greater than 90% of the overall VAF. We added the further local criterion that the muscle synergies also account for greater than 75% VAF in each muscle and each perturbation direction. This local fit criterion was more stringent and ensured that relevant features of the data set were reproduced. VAF for each muscle (VAFmus) quantified the extent to which the muscle synergies accounted for variability in the activity of individual muscles across all time bins, perturbation directions, and trials. VAF for each perturbation direction (VAFcond) quantified the extent to which the muscle synergies accounted for the variability in muscle activation patterns formed by the response of all 16 muscles to a single perturbation direction during one time bin across all trials.
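In code, the uncentered VAF measure might look like this (a sketch; VAF formulations vary slightly across studies, so treat the exact expression as an assumption):

```python
import numpy as np

def vaf(measured, reconstructed):
    """Variance accounted for with uncentered sums of squares:
    100 * (1 - SSE / SST), where SST is taken about zero rather than
    about the mean, matching the uncentered-correlation definition."""
    m = np.asarray(measured, dtype=float).ravel()
    r = np.asarray(reconstructed, dtype=float).ravel()
    return 100.0 * (1.0 - np.sum((m - r) ** 2) / np.sum(m ** 2))
```

Per-muscle VAF would apply the same formula to one row of the data matrix, and per-direction VAF to the columns belonging to one perturbation direction.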

To validate the similarity of muscle synergies in reactive balance responses across movement contexts, we further identified muscle synergies using two additional methods that combined perturbation conditions. First, muscle synergies identified from perturbation responses during standing were used to reconstruct the responses during the two walking perturbation conditions. Condition-specific muscle synergies were extracted from walking perturbation response data that was not accounted for by the standing muscle synergies. To this end, we used an iterative algorithm that held fixed the muscle synergies extracted from standing data while optimizing a new set of muscle synergies extracted from the remainder of the variability in the walking perturbation data not accounted for by the standing muscle synergies (Cheung et al., 2009; Torres-Oviedo and Ting, 2010; Chvatal et al., 2011). As a second validation, we extracted muscle synergies from a data matrix containing all three perturbation conditions combined, and compared these to the muscle synergies identified from the independent data sets. In a combined extraction, there is a possibility of one condition dominating the others, so most of the results presented are comparisons of the muscle synergies identified from the independent datasets.
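The fixed-synergy extraction can be sketched in the same multiplicative-update framework: the columns holding the standing synergies are simply excluded from the synergy update, so only the new condition-specific columns and all recruitment coefficients are fit to the data. The code below is an illustrative sketch with invented synergy counts and synthetic data, not the published algorithm of Cheung et al. (2009).

```python
import numpy as np

def extract_with_fixed(emg, W_fixed, n_new, n_iter=3000, seed=0):
    """Fit n_new additional synergies to the variability not accounted
    for by W_fixed; the fixed columns are never updated."""
    rng = np.random.default_rng(seed)
    n_mus, n_samp = emg.shape
    k = W_fixed.shape[1]
    W = np.hstack([W_fixed, rng.random((n_mus, n_new))])
    C = rng.random((k + n_new, n_samp))
    eps = 1e-12
    for _ in range(n_iter):
        C *= (W.T @ emg) / (W.T @ W @ C + eps)
        update = (emg @ C.T) / (W @ C @ C.T + eps)
        W[:, k:] *= update[:, k:]  # leave the first k (fixed) columns alone
    return W, C

# Hypothetical check: data from 3 synergies, 2 of them supplied as fixed.
rng = np.random.default_rng(1)
true_W = rng.random((16, 3))
emg = true_W @ rng.random((3, 600))
W, C = extract_with_fixed(emg, true_W[:, :2], n_new=1)

vaf = 100 * (1 - np.sum((emg - W @ C) ** 2) / np.sum(emg ** 2))
print(round(vaf, 1))  # high VAF: the free column absorbs the third synergy
```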

Muscle synergies for walking were first identified from unperturbed walking. Since we previously showed that similar muscle synergies are identified when each walking speed is analyzed individually (Chvatal and Ting, 2012), one set of muscle synergies was extracted from a data matrix consisting of both self-selected speed unperturbed walking catch trials and slow unperturbed walking catch trials, and these muscle synergies were termed "walking" muscle synergies. For each subject, we selected the least number of muscle synergies (*Nsyn*) that satisfied both the global criterion of reconstructing at least 90% of the overall variance (VAF = 90%) as well as the local criterion of reconstructing at least 75% of the variability in each muscle (Chvatal and Ting, 2012). Once *Nsyn* was selected for each condition, the muscle synergies were used to reconstruct the EMG patterns, and measured and reconstructed data were compared for a particular muscle, time bin, and perturbation direction for each trial to examine the ability of the muscle synergies to account for inter-trial variations. Similarities between measured and reconstructed data were quantified using *r*<sup>2</sup> and VAF.

We used two methods to validate the similarity of muscle synergies for reactive balance during standing and unperturbed walking. First, muscle synergies identified from perturbation responses during standing were used to reconstruct unperturbed walking data and *vice versa*. Using the algorithm described above, walking-specific muscle synergies were extracted from unperturbed walking data that was not accounted for by the standing muscle synergies, and standing-specific muscle synergies were extracted from standing perturbation response data that was not accounted for by the walking muscle synergies. Significant differences between reconstructions using the various muscle synergy sets were determined using paired *t*-tests. Second, we also extracted muscle synergies from a data matrix containing both unperturbed walking data and standing perturbation response data combined. To ensure an equal amount of walking and standing data in the combined data matrix, only a single trial of unperturbed walking at each walking speed was included. We first verified that the muscle synergies extracted from the single trial of walking at each speed were similar to the walking muscle synergies described above.

# **MUSCLE SYNERGY COMPARISON**

To determine similarity in muscle synergies across conditions, we compared muscle synergies extracted from reactive balance during standing and walking, as well as from reactive balance compared to unperturbed walking. When comparing two sets of muscle synergies, we calculated correlation coefficients (*r*) between each muscle synergy vector in the first set and each in the second set. A pair of muscle synergies were considered "similar" if they had *r* > 0.623, which corresponds to the critical value of *r*<sup>2</sup> for 16 muscles [*r*<sup>2</sup> = 0.388; *p* = 0.01; see Chvatal et al. (2011) for muscle synergy comparison details].
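This pairwise comparison can be sketched as follows; the synergy sets here are synthetic, and the function is illustrative rather than the authors' analysis code.

```python
import numpy as np

def similar_pairs(W_a, W_b, r_thresh=0.623):
    """Return (i, j, r) for each pair of synergy vectors (columns of
    W_a and W_b) whose Pearson r exceeds the critical value for
    16 muscles (r = 0.623, i.e. r^2 = 0.388 at p = 0.01)."""
    pairs = []
    for i in range(W_a.shape[1]):
        for j in range(W_b.shape[1]):
            r = np.corrcoef(W_a[:, i], W_b[:, j])[0, 1]
            if r > r_thresh:
                pairs.append((i, j, round(float(r), 3)))
    return pairs

# Synthetic example: synergy 1 of set A reappears (with small noise) as
# synergy 0 of set B, so that pair should be flagged as similar.
rng = np.random.default_rng(0)
W_a = rng.random((16, 3))
W_b = rng.random((16, 3))
W_b[:, 0] = W_a[:, 1] + 0.01 * rng.random(16)

matches = similar_pairs(W_a, W_b)
print(matches)
```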

Across perturbation conditions, we compared the tuning curves of similar synergies to determine if they were recruited for similar perturbation directions in different contexts. We examined the composition and tuning of any condition-specific muscle synergies to determine their potential similarity in function.

# **RESULTS**

# **DIFFERENCES IN INDIVIDUAL MUSCLE ACTIVATION ACROSS PERTURBATION CONDITIONS**

The muscles activated in response to different perturbation directions were generally similar in standing and walking, but their activity could vary greatly in magnitude (**Figure 1**). For example, in one subject, in response to a forward, rightward perturbation, TA and PERO were strongly activated in standing and slow walking, but more weakly activated in the self-selected walking condition; RFEM was activated strongly in standing but less so in walking; trunk muscles ERSP and GMED were strongly recruited in walking but less so in standing (**Figure 1**). Differences in muscle activity between standing and walking conditions were observed across perturbation directions (**Figure 2**). For example, agonist pairs of muscles such as MGAS/LGAS and PERO/TA, which were activated similarly in perturbation responses during standing, had distinct activation patterns in perturbation responses during walking.

Muscle tuning curves revealed some differences in the directional tuning of muscles across perturbation conditions (**Figure 3**). For example, ERSP had the same preferred direction (forward/leftward perturbations) in all conditions, but was more highly activated in response to perturbations during walking compared to standing. TFL had a different preferred direction of activation in standing (forward/leftward perturbation) compared to walking (rightward perturbations). MGAS was recruited in similar directions (backward perturbations) and magnitudes in response to perturbations during both standing and walking.


# **COMMON MUSCLE SYNERGIES IN PERTURBATION RESPONSES DURING STANDING AND WALKING**

For each perturbation condition, on average, we identified 5.2 ± 0.9 muscle synergies (range 4–7, **Figure 4**) that explained the variance in muscle activation patterns across directions, time bins, and trials. In perturbation responses during standing, 5.6 ± 0.8 muscle synergies per subject (range 5–7) were sufficient to account for >90% total variability and >75% variability in each muscle and condition (all 3 time bins, 12 perturbation directions, across 5 trials of each) in the EMG data. In perturbation responses during slow walking, 4.9 ± 0.7 muscle synergies (range 4–6) were sufficient to explain the same amount of variability. In perturbation responses during self-selected walking, 5.1 ± 1.2 muscle synergies (range 4–7) were sufficient to explain the same amount of variability. Individual muscles were recruited by multiple muscle synergies; for example, PERO was recruited strongly by W1, but also was recruited by W2 and W5.

When muscle synergies were extracted separately from each perturbation condition, we found similarities in most of the muscle synergies identified (**Figure 4**). Of the muscle synergies extracted from perturbation responses during standing, 3.6 ± 1.0 (range 2–5) were identified in perturbation responses during slow walking, and 3.1 ± 0.9 (range 2–4) in perturbation responses during self-selected walking (**Table 1**). Of the 1–3 muscle synergies identified in perturbation responses to each walking condition but not in perturbation responses to standing, one was similar between the perturbation responses during both walking conditions (e.g., Wsl5 and Wss5, **Figure 4**) for five subjects. The remaining muscle synergy identified in perturbation responses to self-selected walking was similar to a muscle synergy identified in unperturbed walking for most subjects, which will be discussed in detail later.

When muscle synergies extracted from perturbation responses during standing were used to reconstruct perturbation responses during walking, 1.7 ± 0.5 (range 1–2) additional muscle synergies specific to slow walking perturbations, and 2.4 ± 0.8 (range 1–3) additional muscle synergies specific to self-selected walking perturbations, were required to explain the variability. Across subjects, the minimum VAF across muscles was significantly lower when only muscle synergies from standing perturbations were used to reconstruct walking perturbation responses than when muscle synergies extracted from walking perturbations were used (minimum VAFmus = 67.7 ± 11.6% vs. 82.4 ± 4.6%; *p* < 0.001). The reconstruction improved once additional walking perturbation-specific muscle synergies were extracted, as evidenced by an increase in the minimum muscle VAF (minimum VAFmus = 82.3 ± 3.6%; *p* < 0.001). For all subjects, at least one of the walking perturbation-specific muscle synergies was similar to those identified from walking perturbations alone. For one subject all walking perturbation-specific muscle synergies were similar to walking perturbation synergies, for four subjects all but one were similar, for one subject all but two were similar, and in one subject all but three were similar.

Finally, we compared muscle synergies extracted above to those identified when all three perturbation conditions were combined. Across subjects, 6.6 ± 1.3 synergies could explain >90% VAF and >75% VAF in each muscle across all three perturbation conditions. For three subjects, all of the condition-specific muscle synergies were also identified from the combined data. For the other four subjects, at least one of the condition-specific muscle synergies was identified from combined perturbation response data. For muscle synergies that were similar across conditions, similar recruitment coefficients were identified from the combined dataset and each condition individually (VAF = 90.6 ± 3.6%, *r* = 0.90 ± 0.04).

# **MUSCLE SYNERGY TUNING ACROSS PERTURBATION CONDITIONS**

The differential recruitment of similar muscle synergies accounted for the differences in individual muscle patterns we observed during perturbation responses in both standing and walking. We found different magnitude and directional tuning of muscle synergies that were similar in standing and walking perturbation conditions (**Figure 5**). For example, W1, W2, and W3 were used in perturbation responses during standing as well as during both walking conditions. Muscle synergy recruitment tuning curves revealed differences in both the magnitude and directional tuning of muscle synergies across perturbation conditions. For example, W2 was recruited for backward perturbations during both standing and walking, and W3 was recruited during postural responses to medial/lateral perturbations in all conditions, but both were more highly recruited in response to perturbations during walking compared to standing. W1 had a different preferred direction of activation in standing (forward and backward perturbations) compared to walking (rightward/forward perturbations).

The muscle synergies identified in standing but not walking perturbation responses were generally recruited for forward or backward perturbations. Four subjects had a muscle synergy tuned for backward perturbations in standing but not walking perturbation responses (**Figure 6A**), and a separate set of four subjects had a muscle synergy tuned for forward perturbations


*The first row shows the total number of muscle synergies identified across all subjects for each condition. The second row indicates the number of muscle synergies identified in standing perturbation responses that were also identified in each of the other conditions.*

**(A)** standing perturbation responses as well as perturbation responses during **(B)** slow and **(C)** self-selected walking. W2 was recruited for backward perturbations in standing perturbation responses, and for anterior and lateral perturbations in walking perturbation responses.

in standing that was not used in perturbation responses during walking (**Figure 6B**). For example, W4 was highly recruited in PR3 of standing responses to move the CoM backward, but was not identified in perturbation responses during walking (**Figure 6A**), consistent with the walking goal of moving the CoM forward for forward progression. Similarly, we identified other muscle synergies that were highly recruited to move the center of mass forward in standing that were not used during walking (**Figure 6B**), presumably because whole-body forward momentum carries the CoM forward during walking.

The muscle synergies specific to perturbation responses during walking were recruited for medial/lateral perturbations. For example, muscle synergies having strong contributions from hamstring muscles and TA (e.g., **Figure 4**, Wsl5 and Wss5) were recruited following leftward perturbations during walking (**Figure 7A**). Additional muscle synergies using PERO and TFL were recruited following rightward perturbations during walking (**Figure 7B**) and resembled a muscle synergy previously identified to emerge in postural responses during standing on one leg (Torres-Oviedo and Ting, 2010). An additional muscle synergy identified in perturbation responses during self-selected walking (Wss4) was similar to a muscle synergy identified in unperturbed walking (Ww6, **Figure 4**, *r* = 0.86), and was not strongly recruited in any perturbation direction, possibly playing a trunk stabilization role.

# **COMMON MUSCLE SYNERGIES IN PERTURBATION RESPONSES AND UNPERTURBED WALKING**

Although unperturbed walking generally required a greater number of muscle synergies than perturbation responses during standing, the compositions of several muscle synergies used for walking and standing postural control were similar (see **Figure 4**). In unperturbed walking, on average, 6.9 ± 1.2 muscle synergies were required per subject to meet the reconstruction criteria.

When standing perturbation muscle synergies were used to reconstruct unperturbed walking, and vice versa, reconstruction quality decreased (minimum VAFmus = 79.8 ± 2.7% decreased to 48.4 ± 13.6%, *p* = 0.001; and minimum VAFmus = 87.4 ± 3.8% decreased to 56.2 ± 21.6%, *p* = 0.009, respectively). With the addition of condition-specific muscle synergies, reconstructions were improved (minimum VAFmus = 76.7 ± 7.3%, *p* = 0.002; and minimum VAFmus = 82.3 ± 4.6%, *p* = 0.007, respectively). When muscle synergies extracted from perturbation responses during standing were used to reconstruct unperturbed walking, 3.5 ± 0.9 (range 3–5) additional muscle synergies specific to walking were identified in order to meet reconstruction criteria, consistent with our observation that a greater number of muscle synergies are used in unperturbed walking as compared to perturbation responses during standing. When muscle synergies extracted from unperturbed walking were used to reconstruct perturbation responses during standing, 2.0 ± 0.8 (range 1–3) additional muscle synergies specific to standing perturbations were required to explain the variability.

Across subjects, 6.0 ± 0.6 synergies could explain >90% VAF and >75% VAF in each muscle when standing perturbation data and unperturbed walking data were combined. For all subjects, 5.1 ± 0.9 (range 4–6) of the muscle synergies identified from standing balance perturbations and 4.3 ± 1.0 (range 3–6) of the muscle synergies identified from unperturbed walking were also identified when standing and walking data were combined.

Condition-specific muscle synergies reflected differences in the biomechanical demands of each condition. For example, muscle

**FIGURE 6 | Recruitment of muscle synergies identified in perturbation responses during standing but not walking.** Muscle synergies used in standing perturbation responses that were not used in walking perturbation responses were recruited for **(A)** backward or **(B)** forward perturbation directions, shown for two different subjects.

**FIGURE 7 | Recruitment of muscle synergies identified in perturbation responses during walking but not standing.** Muscle synergies used in walking perturbation responses that were not used in standing perturbation responses were recruited for **(A)** leftward or **(B)** rightward perturbation directions, shown for two different subjects.

synergies used in unperturbed walking that were not used in perturbation responses during standing were composed of hip/trunk muscles and were recruited throughout the gait cycle, suggesting they may play a role in trunk stabilization during walking (**Figure 8**, Ww6). The muscle synergies used in perturbation responses during standing but not in unperturbed walking either had large contributions from TFL and were active for medial/lateral perturbations, or had large contributions from TA and PERO and were active for anterior perturbations (not shown). Some muscle synergies used for posterior CoM movements were common to both perturbation responses during standing and unperturbed walking, but were not identified in perturbation responses during walking (i.e., W4, **Figure 4**).

# **DISCUSSION**

Our results suggest that a common set of muscle synergies form a motor repertoire for both locomotion and reactive balance control. Our work unifies several different studies demonstrating a modular organization underlying variations in motor patterns across different walking and balance conditions. For example, step-by-step variation in muscle activity during walking as well as changes in muscle activity across walking speeds can all be explained by differential recruitment of walking muscle synergies as task demands change (Clark et al., 2010; Chvatal and Ting, 2012). Similarly, a common set of muscle synergies for reactive balance has been shown to underlie trial-by-trial differences in muscle activation patterns both within and across perturbation conditions (Torres-Oviedo and Ting, 2007, 2010; Chvatal et al., 2011). To bridge balance and walking behaviors, our study was the first to examine responses to perturbations during walking

**FIGURE 8 | Recruitment of muscle synergies identified in unperturbed walking.** Ww6 was identified from unperturbed walking but not during perturbation responses in any condition. Shown are the recruitment coefficients for a single trial of unperturbed walking at the self-selected speed. The gray boxes indicate stance phase.

over 12 directions in the horizontal plane. We demonstrate that two different motor behaviors, walking and reactive balance, may in fact be constructed from a common set of muscle synergies. Our prior work also demonstrated that anticipatory changes in the walking pattern could be described by changes in the recruitment of a fixed set of walking muscle synergies (Chvatal and Ting, 2012). Thus, our study supports the idea that muscle synergies form a general repertoire of motor actions that can be recruited by a variety of different neural pathways for voluntary, rhythmic, and reactive motor behaviors (Chvatal and Ting, 2012).

Our findings support prior work demonstrating that changes in the modular organization of walking affect both walking and balance function. For example, the number of muscle synergies for walking is often reduced in the paretic limb of individuals with post-stroke hemiplegia (Clark et al., 2010) or their organization is modified (Gizzi et al., 2011). The reduction in muscle synergies is correlated not only to walking speed (Clark et al., 2010), but also to measures of balance control during standing (Bowden et al., 2010). In Parkinson's disease (PD), a reduced number of muscle synergies has been identified during walking (Rodriguez et al., 2013), but the relationship to postural instability, a cardinal sign of PD, is unknown. Further, degradation in upper limb function after stroke may be a result of changes in the modular organization of long-latency responses (Trumbower et al., 2010, 2013). These initial studies suggest that muscle synergy analysis may be a powerful tool for distinguishing specific deficits in muscle coordination leading to functional impairments that may be generalized across different motor behaviors.

Muscle synergies may therefore form the lowest level of the motor control hierarchy, recruited by parallel descending pathways mediating a wide variety of motor behaviors. Because muscle synergies only prescribe the spatial coordination of muscles to produce a motor function at a given instant in time, they may be concurrently recruited by different neural circuits mediating motor behaviors with common task-level goals. During locomotion, spinal mechanisms specifying the locomotor rhythm are known to be distinct from the spatial patterning of muscles across the limbs (Lafreniere-Roula and McCrea, 2005). Moreover, spatial patterning by muscle synergies is not modified by sensory feedback (Hart and Giszter, 2004; Cheung et al., 2005; Kargo et al., 2010) and is thought to be downstream of the rhythm generation mechanisms (Burke et al., 2001; McCrea and Rybak, 2008). Further evidence suggests that the same muscle synergies can also be recruited by cortical mechanisms to alter the locomotor pattern during motor planning and obstacle avoidance during locomotion (Drew, 1988; Drew et al., 2002, 2008), or anticipation of a balance perturbation during walking (Chvatal and Ting, 2012). Similarly, in reactive balance, the temporal patterning of the long-latency sensorimotor feedback response also appears to be independent of the precise spatial patterning of muscles defined by muscle synergies (Welch and Ting, 2009; Torres-Oviedo and Ting, 2010; Chvatal et al., 2011). Both voluntary and reactive balance responses are thought to be mediated by brainstem pathways (Schepens and Drew, 2004; Deliagina et al., 2008; Schepens et al., 2008; Lyalka et al., 2009) and may recruit different muscle synergies depending upon task demands. In reduced preparations, common muscle synergies are used in both automatic and reactive motor behaviors (Cheung et al., 2005; Kargo and Giszter, 2008; Roh et al., 2011).
Therefore, muscle synergies may form a modular repertoire of actions that is specific to any given motor task, but recruited by a variety of neural pathways governing different motor behaviors.

As long-latency responses to perturbations are modified with task-level goals during posture and movement, it is likely that common mechanisms govern reactive balance responses in standing and walking. Following perturbations during both standing (Horak and Nashner, 1986; Torres-Oviedo and Ting, 2007) and walking (Pijnappels et al., 2005; van Der Linden et al., 2007; Chvatal and Ting, 2012), muscles exhibit similar long-latency responses (∼100 ms in humans). Moreover, long-latency responses coordinate the stance and swing limbs even when afferent inputs originate from a single limb (Dietz et al., 1989; Tang et al., 1998; Dietz and Duysens, 2000; Ting et al., 2000; Reisman et al., 2005; Duysens et al., 2013). Therefore, in contrast to short-latency responses that simply return the limb posture to the original configuration (Horak and Macpherson, 1996), long-latency motor patterns reflect abstract task-level goals such as controlling endpoint or CoM motion, which cannot be specified by independent joint-level controllers and require multisensory information. Moreover, it has been shown in both the upper and lower extremity that long-latency responses are modulated by motor planning, obstacle avoidance, and voluntary movement goals (Marsden et al., 1972, 1981; Carpenter et al., 1999; Pruszynski et al., 2009; Shemmell et al., 2010; Pruszynski and Scott, 2012) and are influenced by cortical contributions (Evarts and Tanji, 1976; Cheney and Fetz, 1984; Taube et al., 2006; Pruszynski et al., 2011). Accordingly, we previously found that reactive balance responses during whole-body reaching were modified to support target acquisition (Trivedi et al., 2010). Similarly, in this study, muscle synergy W4 is recruited in reactive balance during standing to move the CoM backwards, but not during walking, presumably so as not to inhibit forward progression.
Likewise, our recent studies further demonstrate that the recruitment of muscle synergies during reactive balance reflects task-relevant error, e.g., deviation of the CoM from the upright condition (Safavynia and Ting, 2013), even when perturbations are imposed while the body is already deviated from the desired state. Therefore, long-latency mechanisms which restore the body to the upright equilibrium state during standing balance appear to be modified during voluntary movements and walking to support the return of the body to the desired trajectory (Pozzo et al., 1990; Borghese et al., 1996).

Here, differences observed in muscle synergies identified across walking and reactive balance conditions could be explained by differences in the task-level goals. Muscle synergies that appear to be specific to walking perturbations were similar to a previously identified muscle synergy in perturbations during standing on one leg (Torres-Oviedo and Ting, 2010). Therefore, these muscle synergies likely reflect the additional biomechanical challenges associated with single limb stance, whether it occurs during standing or walking. Although this muscle synergy was not used in unperturbed walking, it might be expected to contribute to walking conditions that require more non-sagittal plane control such as turning (Courtine and Schieppati, 2003). Further, muscle synergies used during sagittal plane motions in standing balance recovery were absent in perturbations to walking, consistent with the different goals for CoM motion in standing and walking. For example, some muscle synergies were recruited following forward perturbations in standing to move the CoM forward back to the original position, but were not identified in walking (**Figure 6B**); presumably the forward momentum of the body during walking was sufficient to move the CoM forward in response to forward perturbations during walking. Similarly, following backward perturbations which caused the body to fall forward, W4 was recruited in standing to move the CoM back to the original position (**Figure 6A**), whereas in walking these perturbations were consistent with the goal of moving the CoM forward for forward progression, so recruiting W4 was not necessary. Furthermore, these muscle synergies for posterior CoM movements identified in perturbation responses during standing but not walking were actually found to contribute to unperturbed walking, likely acting to decelerate the limb near the end of swing. 
Additionally, some muscle synergies involving hip and trunk muscles used in unperturbed walking that were not found in the standing perturbation responses measured here (Ww6, **Figure 8**) are similar to muscle synergies that emerge under more dynamic perturbations in standing balance that were hypothesized to stabilize trunk orientation (Safavynia and Ting, 2013). Together these findings suggest that muscle synergies provide a motor repertoire for the lower limbs and trunk across diverse balance and locomotor behaviors.

A complementary notion of modularity in locomotion has focused on the generation of temporal patterns during locomotion. A set of fixed temporal patterns governing muscle activity has been identified during walking, even when voluntary actions are concurrently performed (Ivanenko et al., 2004, 2005). These studies demonstrated that the neural mechanisms governing rhythmic generation of motor commands may also be modular. However, reactive balance is clearly a feedback response that depends closely on the characteristics of the perturbation (Lockhart and Ting, 2007; Welch and Ting, 2009), and as we have shown here, the temporal response to perturbations in reactive balance is likely controlled independently of the locomotor rhythm. When using NMF to identify fixed temporal patterns, the spatial muscle coordination pattern necessarily varies across timepoints (Safavynia and Ting, 2012). However, it is likely that the fixed temporal patterns for locomotor rhythm generation recruit spatially-fixed muscle activation patterns, such as the muscle synergies identified here. Dissociating modularity in the temporal and spatial domains requires a hierarchical analysis procedure to first identify modularity in spatial muscle activity, followed by modularity in temporal muscle activity (Safavynia and Ting, 2012, 2013). Our results suggest that reactive balance responses during walking are achieved by distinct mechanisms governing reactive balance, which may be superimposed upon the rhythmic walking patterns and recruit a common set of spatially-fixed muscle synergies.

Although muscle synergies can explain a large proportion of variability observed in muscle activity, using muscle synergy analysis to draw conclusions about neural mechanisms has limitations. First, the selection of the number of muscle synergies could vary depending on the method used. Typically, to ensure that the results are physiologically interpretable, several different criteria must be achieved. Overall VAF is typically a poor indicator of the goodness of fit, whereas local criteria, such as the VAF for a particular experimental condition or phase of gait, reflect the degree to which actual muscle coordination patterns are reconstructed (Ting and Chvatal, 2010). Remaining variability in muscle activity may reflect sensorimotor noise, or other neural mechanisms such as short-latency reflex responses, and may not be accounted for by recruitment of muscle synergies (Ting, 2007). In particular, heterogenic reflex responses (Nichols, 1994) may have different organization than muscle synergies for long-latency responses and voluntary movements (Trumbower et al., 2013). Further, motoneuron excitability may vary with joint angle (Hyngstrom et al., 2007), causing differences in apparent muscle synergy composition across postures. One strength of the muscle synergy analysis is that the number of independent motor command signals is not affected by crosstalk in EMG signals; however, crosstalk will alter the apparent composition of muscle synergies extracted. While it is not possible to dissociate co-activation from crosstalk in adjacent muscles, muscle synergy analysis can identify whether a muscle is activated independently of an adjacent muscle even in the presence of crosstalk. Here, we only performed comparisons of muscle synergies within the same subject, such that effects of any possible crosstalk would carry over from one condition to the next and do not affect conclusions about similarity of muscle synergies across conditions.
However, such crosstalk could be more problematic when comparing muscle synergies across subjects. Finally, the number of muscle synergies that can be identified is limited by the number of muscle signals analyzed as well as by the number of disparate conditions examined. Therefore, the number of muscles recorded and the number of experimental conditions must both be large enough that a sufficiently diverse set of muscle coordination patterns is represented in the dataset. Nonetheless, given the appropriate experimental design, muscle synergy analysis can help describe and potentially predict muscle coordination patterns in a functional and physiologically relevant way.

A modular organization of spatial motor patterns may be a common principle for control of the upper and lower limbs and may be useful for discerning mechanisms of motor deficits. Although muscle synergies for the upper and lower limbs may have different neural substrates, common principles likely govern their recruitment and organization. Muscle synergies for reaching may be organized by pyramidal cells in the motor cortex which project to multiple motoneurons (Holdefer and Miller, 2002; Gentner and Classen, 2006; Overduin et al., 2008; Gentner et al., 2010; D'Avella et al., 2011). Pyramidal cells can also project to reticulospinal (Davidson et al., 2007) and propriospinal interneurons (Rathelot and Strick, 2009; Alstermark and Isa, 2012), which may explain residual motor function following stroke (Davidson and Buford, 2006). Muscle synergies for lower-limb movements are more likely encoded in the spinal cord (Hart and Giszter, 2004, 2010; Cheung et al., 2005; Kargo et al., 2010) and recruited by different neural pathways in the spinal cord, brainstem, and higher brain regions (Roh et al., 2011). By dissociating spatial from temporal aspects of motor coordination, muscle synergy analysis may aid in identifying neural impairments that are not evident in current clinical measures of motor function (Wolf et al., 1997; Coote et al., 2009; Hackney and Earhart, 2010). Such information may be important in identifying specific neural pathways that should be targeted for rehabilitation interventions, as well as for predicting generalized deficits in motor behaviors that are not specific to the particular tasks performed.

# **ACKNOWLEDGMENTS**

This work was supported by NIH Grant NS058322 to Lena H. Ting, and NSF Graduate Research Fellowships to Stacie A. Chvatal.



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 21 December 2012; accepted: 08 April 2013; published online: 02 May 2013.*

*Citation: Chvatal SA and Ting LH (2013) Common muscle synergies for balance and walking. Front. Comput. Neurosci. 7:48. doi: 10.3389/fncom.2013.00048*

*Copyright © 2013 Chvatal and Ting. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

# Between-subject variability of muscle synergies during a complex motor skill

#### *Julien Frère <sup>1</sup> and François Hug <sup>2</sup>\**

*<sup>1</sup> Laboratory "Motricité, Interactions, Performance," University of Maine, Le Mans, France; <sup>2</sup> Laboratory "Motricité, Interactions, Performance," University of Nantes, Nantes, France*

### *Edited by:*

*Andrea D'Avella, IRCCS Fondazione Santa Lucia, Italy*

### *Reviewed by:*

*Vincent C. K. Cheung, Massachusetts Institute of Technology, USA Gelsy Torres-Oviedo, University of Pittsburgh, USA*

### *\*Correspondence:*

*François Hug, Laboratory "Motricité, Interactions, Performance" (EA 4334), University of Nantes, 25 bis boulevard Guy Mollet, BP 72206, 44322 Nantes cedex 3, France. e-mail: francois.hug@univ-nantes.fr*

The purpose of the present study was to determine whether subjects who have learned a complex motor skill exhibit similar neuromuscular control strategies. We studied a population of experienced gymnasts during backward giant swings on the high bar. This cyclic movement is interesting because it requires learning, as untrained subjects are unable to perform this task. Nine gymnasts were tested. Both kinematics and electromyographic (EMG) patterns of 12 upper-limb and trunk muscles were recorded. Muscle synergies were extracted by non-negative matrix factorization (NMF), providing two components: muscle synergy vectors and synergy activation coefficients. First, the coefficient of correlation (*r*) and circular cross-correlation (*r*max) were calculated to assess similarities in the mechanical patterns, EMG patterns, and muscle synergies between gymnasts. We performed a further analysis to verify that the muscle synergies (in terms of muscle synergy vectors or synergy activation coefficients) extracted for one gymnast accounted for the EMG patterns of the other gymnasts. Three muscle synergies explained 89.9 ± 2.0% of the variance accounted for (VAF). The coefficients of correlation of the muscle synergy vectors among the participants were 0.83 ± 0.08, 0.86 ± 0.09, and 0.66 ± 0.28 for synergy #1, #2, and #3, respectively. By keeping the muscle synergy vectors constant, we obtained an averaged VAF across all pairwise comparisons of 79 ± 4%. For the synergy activation coefficients, *r*max-values were 0.96 ± 0.03, 0.92 ± 0.03, and 0.95 ± 0.03 for synergy #1, #2, and #3, respectively. By keeping the synergy activation coefficients constant, we obtained an averaged VAF across all pairwise comparisons of 72 ± 5%. Although variability was found (especially for synergy #3), the gymnasts exhibited grossly similar neuromuscular strategies when performing backward giant swings. This confirms that muscle synergies are consistent across participants, even during a skilled motor task that requires learning.

**Keywords: motor modules, muscle coordination, non-negative matrix factorization, motor primitives, electromyography, backward giant circle, gymnastics**

# **INTRODUCTION**

Understanding how the central nervous system controls movement of the human body is a challenging question due to the biomechanical redundancy of the neuromusculoskeletal system, which is referred to as Bernstein's degrees of freedom problem (Bernstein, 1967). For example, at the neuromuscular level, the same movement can be performed by different muscle coordination strategies across trials (Torres-Oviedo and Ting, 2007) and/or between subjects (Ryan and Gregor, 1992; Hug et al., 2004). Low-dimensional modules formed by muscles activated in synchrony, referred to as muscle synergies, have been proposed as building blocks that may simplify the construction of motor behaviors (Ivanenko et al., 2003; d'Avella and Bizzi, 2005; Ting and McKay, 2007; Torres-Oviedo and Ting, 2007; Ting and Chvatal, 2010). The decomposition of multiple surface electromyographic (EMG) signals can be used to extract these synergies. This decomposition algorithm is based on two components: "muscle synergy vectors" which represent the relative weighting of each muscle within each synergy; and a "synergy activation coefficient" which represents the recruitment of the muscle synergy over time (Torres-Oviedo and Ting, 2007; Hug et al., 2011). Some previous research has proposed that temporal recruitment patterns are invariant while the weights can change across subjects/test conditions (Ivanenko et al., 2004, 2005; Cappellini et al., 2006; Dominici et al., 2011). Others have suggested that the muscle synergies are spatially fixed (i.e., muscle weightings are invariant) across subjects/test conditions while temporal recruitment patterns can change (Saltiel et al., 2001; Hart and Giszter, 2004; Torres-Oviedo and Ting, 2007; Hug et al., 2011; Safavynia and Ting, 2012). 
In line with this latter proposition, it has been shown during both postural (Torres-Oviedo and Ting, 2010) and locomotor tasks (Hug et al., 2011; Chvatal and Ting, 2012) that muscle synergy vectors (i.e., muscle weightings) are robust across various mechanical constraints allowing the temporal recruitment to vary according to the task demand. Moreover, altering the recruitment pattern of spatially fixed muscle synergies can produce different motor behaviors in animals (Cheung et al., 2005; Kargo et al., 2010).

As proposed by Safavynia et al. (2011), the acquisition of new motor skills can encourage the development of new muscle synergies, change the composition of existing synergies, and/or change their temporal activation. Through the process of learning, the modulation of the number and/or the composition of muscle synergies has been identified for postural tasks in humans (Asaka et al., 2008; Danna-Dos-Santos et al., 2008) and for reach-tograsp tasks in rodents (Kargo and Nitz, 2003). Simultaneously with improving performance, the composition of the muscle synergies modulates toward consistent patterns across the animals, in terms of both synergy composition and temporal recruitment (Kargo and Nitz, 2003). Hug et al. (2010) reported similar muscle synergies among trained cyclists. However, one common feature of the movements studied in the aforementioned studies is that they can be considered as fundamental or basic movement skills (mainly locomotor and balance skills) and consequently all healthy subjects would be able to perform them with similar mechanical performance in terms of both kinematics and kinetics. As evidence, kinetic patterns in terms of both effective force and mechanical effectiveness are very similar between untrained subjects and trained cyclists (Sanderson, 1991; Mornieux et al., 2008).

The purpose of the present study was to determine whether experts exhibit similar neuromuscular control strategies during a complex motor skill. In other words, did the learning process necessary to perform this task lead to similar muscle synergies, or did each individual develop specific synergies related to their personal anthropometric, anatomical, or muscular characteristics? To answer these questions, we studied a homogeneous population of nine experienced gymnasts performing giant swings on a high bar. This cyclic movement is interesting because it requires learning, as untrained subjects are unable to perform this task. As proposed by Wulf and Shea (2002), motor tasks can be qualified as "complex" if they cannot be mastered in a single session. Consequently, we considered the gymnastic backward giant swing on a high bar as a complex motor skill that would provide a contrast to fundamental motor skills such as balance, walking, or pedaling. For the purpose of this study, we used a non-negative matrix factorization (NMF) algorithm to identify muscle synergies from surface electromyographic recordings performed on 12 upper-limb and trunk muscles of the right side. In light of recent studies (Chvatal and Ting, 2012; Safavynia and Ting, 2012), we hypothesized that performing a complex motor skill would result from the recruitment of similar spatially fixed muscle synergies, flexibly recruited over the giant swings, across all individuals.

# **MATERIALS AND METHODS PARTICIPANTS**

Nine gymnasts performing at national level (age: 19*.*8 ± 2*.*5 years, height: 171 ± 8 cm, body mass: 66 ± 8*.*1 kg) and with 14 ± 3 years of training experience participated in this study. They were informed of the purpose of the study and methods used before providing written consent. The local ethics committee (University of Nantes) approved the study, and all the procedures conformed to the Declaration of Helsinki (last modified in 2004).

# **PROTOCOL**

Participants were asked to perform two sets of 11–12 linked backward giant swings, with a 3–5 min rest period in-between. A giant swing was defined as a complete rotation of the subject around the high bar. In this study, we considered the beginning and the end of a giant swing to be when the gymnast was in the vertical position under the bar (**Figure 1**). To manage a complete rotation around the bar, the gymnasts can vary their body length to compensate for the loss of velocity due to friction between the hands and the bar. The gymnasts extend their body away from the bar to lengthen their radius of gyration during the descent phase, and shorten their body in the ascent phase (Sevrez et al., 2009). To do this, and in line with the recommendations of the Code of Points of the International Gymnastics Federation, the elbow and knee joints should be maintained in extension and only flexion-extension of the shoulder and hip joints is authorized for varying the body length.

# **MATERIALS AND DATA COLLECTION**

## *Motion analysis*

The giant swings were recorded with a video camera (Casio Exilim EX-ZR100, Casio Computer Co. Ltd., Tokyo, Japan) in the main plane of movement (i.e., sagittal plane) with a sampling frequency of 120 Hz. The camera was placed along the longitudinal axis of the bar at a distance of 5 m and a height of 2.60 m, equivalent to the height of the bar from the landing mat. The placement of the video camera had to cover a sufficient range to record the entire body of the gymnast during the giant swing, with the high bar at the center of the field. The calibration square was 1 × 1 m and the origin of the inertial coordinate system was located at the center of the bar in its neutral position. The x-axis was defined as the horizontal axis in the main plane of movement. The y-axis was defined as the vertical axis. The angular position of the gymnast from the bar was defined as the angle formed by the axis linking the femoral trochanter (hip marker) with the bar and the y-axis of reference (below the bar). To reconstruct a multisegment model of the gymnast, adhesive strips were placed on defined body locations for use as markers. The digitization of body marks was performed using Skillspector© software (Video4coach, Svendborg, Denmark) for the lower extremity (ankle, knee, and hip joints) and the upper extremity (wrist, elbow, and shoulder joints). The trunk was delineated by the shoulder and the hip. Thus, the model was composed of five segments (**Figure 2**): the arm, forearm, trunk, thigh, and leg. The masses and moments of inertia of different segments were calculated using an anthropometric table (de Leva, 1996), which was adjusted with consideration that the digitized model had one upper and one lower limb. Position data was smoothed using a 4th order low-pass Butterworth filter with a cut-off frequency of 5 Hz.

# *Surface electromyography*

From the assumption of symmetry in the actions of both upper limbs during the execution of the elements on the high bar, and to avoid electrocardiogram artifacts, the activity of 12 muscles of the right side of the body was recorded: flexor digitorum (FD), short head of the biceps brachii (BBsh), long head of the biceps brachii (BBlh), lateral head of the triceps brachii (TB), clavicular (anterior) and scapular (posterior) parts of the deltoideus (DC and DS, respectively), upper part of the trapezius (TZ), latissimus dorsi (LD), sternocostal part of the pectoralis major (PM), rectus abdominis (RA), erector spinae at the level of L4 (ES), and rectus femoris (RF). The surface EMG recordings were made using self-adhesive Ag/AgCl pairs of electrodes (Blue sensor N, Ambu, Denmark) with an inter-electrode distance of 20 mm (center-to-center). The electrodes were placed longitudinally with respect to the underlying muscle fiber arrangement (de Luca, 1997) and were located according to the recommendations of Surface EMG for Non-Invasive Assessment of Muscles (SENIAM) (Hermens et al., 2000) when available. For the back muscles (TZ, LD, and ES), electrode placement followed de Sèze and Cazalets (2008). The skin was shaved and cleaned with alcohol and ether to minimize impedance before applying the electrodes. The wires connected to the electrodes were secured carefully with adhesive tape to avoid any movement-induced artifacts. Raw EMG signals were preamplified close to the electrodes (gain 375, bandwidth 8–500 Hz) at a sampling rate of 1000 Hz (ME6000, Mega Electronics Ltd., Kuopio, Finland). The EMG device was firmly attached to a belt during execution of the giant swings.

**Figure 1 | (A)** Angular position of the gymnast, defined as the angle formed by the axis linking the femoral trochanter (hip marker) with the bar and the vertical axis of reference (below the bar); **(B)** evolution of the angular position of the gymnast as a function of the normalized time of the backward giant swing; the gymnasts performed the swings in a very similar fashion (for more details see section "Mechanical Data"). The relationship between the angular position and the normalized time of the backward giant swing is not linear: the lower part (0–90◦ and 270–360◦) is shorter than the upper part (90–270◦) of the giant swing.

### *EMG-video synchronization*

To synchronize the motion capture with the EMG recordings, percutaneous muscular stimulation (model DS7A, Digitimer Ltd., Letchworth Garden City, UK) was applied to the forearm muscles of the gymnast before and after each set of linked giant swings. Both video and EMG were recorded when the stimulation was applied, producing a brief artifact in the EMG signal of the FD muscle and lighting an LED in the field of the video camera.

### **DATA ANALYSIS**

### *Biomechanical profile of the giant swing*

Kinematic and dynamic variables were extracted from the motion capture, such as the horizontal and vertical positions of the center of gravity (CG) of each segment and of the gymnast's model (in m), the angular velocity of the gymnast (ω*G*, in ◦/s), and the shoulder and hip flexion-extension angles (in degrees). Herein a flexion of the shoulder joint refers to a decrease in the trunk-upper arm angle of the digitized model, which is in contrast to the clinical frontal shoulder flexion that generally corresponds to an increasing angle between the trunk and the arm. The moment of inertia (*IG*, in kg·m²) around the gymnast's CG and the gymnast's total body energy (*E*Tot, in J/kg) were calculated. The moment of inertia around the gymnast's CG was computed as follows:

$$I\_G = \sum\_{i=1}^{5} \left[ I\_i + (M \cdot m\_i) \cdot d\_i^2 \right],\tag{1}$$

where *Ii* was the moment of inertia of the *i*th segment, *M* the mass of the gymnast, *mi* the mass of the *i*th segment, *di* the distance between the CG of the *i*th segment and the CG of the whole gymnast's body. According to de Leva (1996), the moment of inertia of each segment *i* was equal to:

$$I\_i = (M \cdot m\_i) \cdot (l\_i \cdot r\_i)^2,\tag{2}$$

where *li* was the length of the *i*th segment and *ri* was the radius of gyration of the *i*th segment about the transversal axis expressed as a proportion of the segment length.
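As a concrete illustration of Eqs. (1) and (2), the parallel-axis summation over the five segments can be sketched in a few lines of Python. The segment parameters below are hypothetical placeholders for illustration only, not values taken from de Leva's (1996) table:

```python
def segment_inertia(M, m_frac, length, r_gyr):
    # Eq. (2): I_i = (M * m_i) * (l_i * r_i)^2, with m_i the segment's mass
    # fraction and r_i its radius-of-gyration ratio (fraction of length)
    return (M * m_frac) * (length * r_gyr) ** 2

def whole_body_inertia(M, m_fracs, lengths, r_gyrs, dists):
    # Eq. (1): parallel-axis sum over the five segments; dists[i] is the
    # distance from segment i's CG to the whole-body CG
    total = 0.0
    for m_frac, L, r, d in zip(m_fracs, lengths, r_gyrs, dists):
        total += segment_inertia(M, m_frac, L, r) + (M * m_frac) * d ** 2
    return total

# Hypothetical segment parameters (arm, forearm, trunk, thigh, leg)
M = 66.0                                       # gymnast mass, kg
m_fracs = [0.027, 0.016, 0.433, 0.142, 0.043]  # mass fractions (illustrative)
lengths = [0.28, 0.27, 0.53, 0.42, 0.41]       # segment lengths, m
r_gyrs = [0.29, 0.28, 0.37, 0.33, 0.30]        # radius-of-gyration ratios
dists = [0.60, 0.85, 0.10, 0.35, 0.75]         # distances to body CG, m

I_G = whole_body_inertia(M, m_fracs, lengths, r_gyrs, dists)
```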

According to Arampatzis and Brüggemann (2001), the gymnast's total body energy was equal to:


$$E\_{\rm Tot} = \sum\_{i=1}^{5} \left( \frac{1}{2} (M \cdot m\_i)\, \nu\_i^2 + \frac{1}{2}\, I\_i\, \omega\_i^2 + (M \cdot m\_i)\, g\, h\_i \right), \quad (3)$$

where *vi* was the linear velocity of the *i*th segment, ω*<sup>i</sup>* the angular velocity of the *i*th segment, *g* the acceleration due to gravity, and *hi* the height of the CG of the *i*th segment. *E*Tot was normalized to the body mass of the gymnast for comparison purposes. The examined variables (positions of the CG of the gymnast, joint angles, angular velocity, moment of inertia, and mechanical energy) were presented as a function of the body position angle, from 0 to 360◦, with 0◦ corresponding to the vertical axis below the high bar.
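Eq. (3) follows the same segmental scheme, summing translational, rotational, and potential terms. A minimal Python sketch with hypothetical per-segment inputs:

```python
G = 9.81  # gravitational acceleration, m/s^2

def total_body_energy(M, m_fracs, v, I_seg, omega, h):
    # Eq. (3): translational + rotational + potential energy summed over
    # the five segments, then normalized to body mass (J/kg)
    E = 0.0
    for m_frac, vi, Ii, wi, hi in zip(m_fracs, v, I_seg, omega, h):
        mi = M * m_frac  # absolute segment mass, kg
        E += 0.5 * mi * vi ** 2 + 0.5 * Ii * wi ** 2 + mi * G * hi
    return E / M
```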

### *Extraction of muscle synergies*

variables were similar among the participants.

As inter-cycle variability contains important information for identifying muscle synergies (Clark et al., 2010; Ting et al., 2012), the synergies were extracted from a set of nine consecutive giant swings, with the first and last giant swings automatically removed. EMG signals were band-pass filtered (20–450 Hz, Butterworth filter, 2nd order), rectified, smoothed with a zero-lag low-pass filter (9 Hz, Butterworth filter, 2nd order), and time-normalized to obtain 200 data points for each giant swing. EMG was normalized to the maximum level of activity across all giant swings (Turpin et al., 2011a). Therefore, as is classically done in studies focusing on muscle synergies, the absolute degree of muscle activity was not taken into consideration.
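The preprocessing chain above (band-pass, rectification, zero-lag low-pass envelope, time- and amplitude-normalization) can be sketched as follows, assuming NumPy and SciPy; the function name is illustrative, not from the authors' code:

```python
import numpy as np
from scipy import signal

FS = 1000  # Hz, EMG sampling rate

def preprocess_emg(raw, n_points=200):
    # 20-450 Hz band-pass (2nd-order Butterworth, zero-lag via filtfilt)
    b, a = signal.butter(2, [20 / (FS / 2), 450 / (FS / 2)], "bandpass")
    x = signal.filtfilt(b, a, raw)
    # full-wave rectification
    x = np.abs(x)
    # 9 Hz zero-lag low-pass envelope (2nd-order Butterworth)
    b, a = signal.butter(2, 9 / (FS / 2), "low")
    x = signal.filtfilt(b, a, x)
    # time-normalize the giant swing to 200 samples
    env = np.interp(np.linspace(0, 1, n_points), np.linspace(0, 1, x.size), x)
    # amplitude-normalize to the maximum level of activity
    return env / env.max()
```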

NMF was performed on this dataset. For this purpose, we implemented the Lee and Seung (2001) algorithm. The factorization minimizes the residual Frobenius norm between the initial matrix and its decomposition, and is given as:

$$\mathbf{E} = \mathbf{W}\mathbf{C} + \mathbf{e} \tag{4}$$

$$\min\_{\begin{subarray}{c}W\geq 0\\C\geq 0\end{subarray}} \|\mathbf{E} - \mathbf{W}\mathbf{C}\|\_{\text{FRO}}\tag{5}$$

where **E** is a *p*-by-*n* initial matrix (*p* = number of muscles and *n* = number of time points), **W** is a *p*-by-*s* matrix (*s* = number of synergies), **C** is an *s*-by-*n* matrix, and **e** is a *p*-by-*n* matrix. ‖·‖FRO denotes the Frobenius norm; **W** represents the muscle synergy vectors matrix, **C** the synergy activation coefficients matrix, and **e** the residual error matrix. The algorithm is based on iterative updates of an initial random guess of **W** and **C** that converge to a locally optimal matrix factorization [see Lee and Seung (2001) for more details]. To avoid local minima, the algorithm was repeated 20 times for each subject, and the lowest-cost solution (i.e., minimal squared error between original and reconstructed EMG patterns) was kept. The initial matrix **E** consisted of nine consecutive giant swings for the 12 muscles. As each giant swing was interpolated to 200 time points, **E** was a 12-row by 1800-column matrix.
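The Lee–Seung multiplicative updates are compact enough to sketch directly. The following is an illustrative NumPy reimplementation of Eqs. (4)–(6), not the authors' code:

```python
import numpy as np

def nmf(E, n_syn, n_iter=500, seed=0):
    # Lee-Seung multiplicative updates minimizing ||E - WC||_FRO (Eqs. 4-5)
    rng = np.random.default_rng(seed)
    p, n = E.shape
    W = rng.random((p, n_syn))   # muscle synergy vectors (p x s)
    C = rng.random((n_syn, n))   # synergy activation coefficients (s x n)
    eps = 1e-10                  # guards against division by zero
    for _ in range(n_iter):
        C *= (W.T @ E) / (W.T @ W @ C + eps)
        W *= (E @ C.T) / (W @ C @ C.T + eps)
    return W, C

def vaf(E, W, C):
    # Eq. (6): overall variance accounted for
    return 1 - np.sum((E - W @ C) ** 2) / np.sum(E ** 2)
```

As in the study, such an extraction would be restarted several times from different random initializations and the lowest-cost solution kept, since the updates only converge to a local optimum.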

We iterated the analysis by varying the number of synergies between 1 and 12 and then selected the smallest number of synergies that accounted for *>*90% of the variance accounted for (VAF) (Torres-Oviedo et al., 2006), or beyond which adding an additional synergy did not increase VAF by *>*5% (Clark et al., 2010). Mean total VAF was defined as (Torres-Oviedo et al., 2006):

$$\text{VAF} = 1 - \frac{\sum\_{i=1}^{p} \sum\_{j=1}^{n} (e\_{i,j})^2}{\sum\_{i=1}^{p} \sum\_{j=1}^{n} (E\_{i,j})^2} \tag{6}$$

As the determination of the correct number of muscle synergies is not a trivial matter (Tresch et al., 2006), we further confirmed our results by using the best linear fit (BLF) method, which selected the smallest *n* such that a linear fit of the "VAF" vs. "number of synergies" curve, from *n* to 12, had a residual mean square error of less than 5 × 10<sup>−5</sup> [i.e., the point at which the VAF curve plateaus to a straight line; see Cheung et al. (2005); Ajiboye and Weir (2009)]. Finally, we used a method reported by Cheung et al. (2009), named "knee point" (KP) herein. Briefly, the "VAF" vs. "number of synergies" curve was constructed from both the original EMG dataset and an unstructured EMG dataset generated by randomly shuffling the original dataset across time and muscles. *n* was then defined as the point beyond which the slope of the original curve drops below 75% of the slope of the surrogate curve. This corresponds to the number beyond which any further increase in the number of extracted synergies yields a VAF increase smaller than 75% of that expected from chance.
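The threshold criterion (VAF > 90%, or an extra synergy gaining < 5% VAF) can be sketched as follows. This version uses scikit-learn's NMF solver as a stand-in for the extraction step, so it illustrates the selection logic rather than the authors' implementation:

```python
import numpy as np
from sklearn.decomposition import NMF

def vaf(E, W, C):
    # Eq. (6): overall variance accounted for
    return 1 - np.sum((E - W @ C) ** 2) / np.sum(E ** 2)

def choose_n_synergies(E, vaf_thresh=0.90, gain_thresh=0.05):
    # Smallest s with VAF > 90%, or stop once an extra synergy adds < 5% VAF
    prev = None
    for s in range(1, E.shape[0] + 1):
        model = NMF(n_components=s, init="random", random_state=0, max_iter=2000)
        W = model.fit_transform(E)
        v = vaf(E, W, model.components_)
        if v > vaf_thresh:
            return s
        if prev is not None and v - prev < gain_thresh:
            return s - 1
        prev = v
    return E.shape[0]
```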

We calculated VAF for each muscle (VAFmuscle) to ensure that each muscle activity pattern was well accounted for by the extracted muscle synergies [for further details, see Hug et al. (2011)]. Finally, to further determine the subject-specific dimensionality of the data we calculated, for each gymnast, VAF for each of the extracted muscle synergies.

### *Cross-validation of the extracted muscle synergies*

To verify the within-subject consistency of the extracted muscle synergies, we used a cross-validation procedure as proposed by previous research (e.g., Cheung et al., 2005, 2009; Ting and Chvatal, 2010). For each participant, we checked that the muscle synergy vectors extracted for one set of giant swings (first set) accounted for the EMG patterns in the other set. To do this, the muscle synergy matrix (muscle synergy vectors) was held fixed in the algorithm and the coefficients matrix was free to vary [for additional details, see Hug et al. (2011)].

### *Between-subject similarity*

The similarity in shape (i.e., waveform) of the mechanical patterns, individual EMG patterns, and synergy activation coefficients was assessed using two criteria: Pearson's correlation coefficient (*r*) and the circular cross-correlation coefficient (*r*max). We also calculated the absolute lag times, which quantify differences in timing (i.e., the magnitude of the time shift between mechanical patterns, EMG patterns, or synergy activation coefficients), as the lag time at the maximum of the cross-correlation function. Although *r*-values, *r*max-values, and lag times provide partly redundant information, we chose to report all of these indexes to improve comparability with other studies that did not necessarily report all of them. The index of similarity corresponded to the averaged *r*- and *r*max-values between each pair of participants.
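A minimal sketch of the circular cross-correlation index, assuming NumPy (the function name is illustrative). With 200 time-normalized samples per cycle, one sample of lag corresponds to 0.5% of the giant swing:

```python
import numpy as np

def circular_xcorr(x, y):
    # r_max and signed lag (in samples) from the circular cross-correlation
    # of two time-normalized waveforms of equal length
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    n = len(x)
    # Pearson r at every circular shift of y
    r = np.array([np.mean(x * np.roll(y, k)) for k in range(n)])
    k = int(np.argmax(r))
    lag = k if k <= n // 2 else k - n  # map shifts > n/2 to negative lags
    return r[k], lag
```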

We then determined the similarity of muscle synergy vectors across participants by calculating a Pearson's correlation coefficient between each pair of participants. Following the approach previously described by Safavynia and Ting (2012), we considered a pair of muscle synergy vectors to be similar if *r* ≥ 0.71, which corresponds to the critical value of *r* for 10 degrees of freedom (i.e., 12 muscles − 2) at *p* = 0.01. However, because the NMF algorithm constrains muscle weightings to be non-negative, one would expect positive correlations by chance (Safavynia and Ting, 2012). Therefore, for each extracted synergy we generated 1000 random permutations of the weightings obtained from the extraction of the muscle synergy vectors. We then calculated the *r*-value for each pair (36 pairs × 1000 iterations = 36,000 *r*-values) and for each synergy, yielding a distribution of *r*-values expected by chance. An *r*-value of 0.71 corresponded to the 99th percentile of this distribution. Consequently, we considered a pair of muscle synergy vectors with *r* ≥ 0.71 to be more similar than expected by chance, and muscle synergy vectors with *r <* 0.71 were considered different.
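The permutation baseline can be sketched as follows (illustrative NumPy implementation; `weights` is assumed to hold one non-negative synergy vector per subject):

```python
import numpy as np

def chance_r_threshold(weights, n_perm=1000, pct=99, seed=0):
    # Shuffle the muscle weightings within each subject's synergy vector and
    # collect pairwise Pearson r-values to build the chance distribution
    rng = np.random.default_rng(seed)
    rs = []
    for _ in range(n_perm):
        perms = [rng.permutation(w) for w in weights]
        for i in range(len(perms)):
            for j in range(i + 1, len(perms)):
                rs.append(np.corrcoef(perms[i], perms[j])[0, 1])
    # similarity threshold = chosen percentile of the chance distribution
    return np.percentile(rs, pct)
```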

To further assess the similarity of the muscle synergies between the gymnasts, we checked that the muscle synergies extracted from one gymnast accounted for the overall and individual EMG patterns of each of the other gymnasts. The first step aimed at identifying the robustness of the muscle synergy vectors across the subjects. To do this, the muscle synergy vectors matrix extracted from one subject (i.e., control subject herein, **W**control) was held fixed in the NMF algorithm while the activation coefficient matrix of the compared subject (**C**subject) was free to vary (Torres-Oviedo et al., 2006; Hug et al., 2011). **C**subject was initialized with random values and iteratively updated until convergence. The EMG data matrix of the compared subject (**E**subject) was provided to the algorithm with the following update rule (Lee and Seung, 2001):

$$(\mathbf{C}\_{subject})\_{ij} \leftarrow (\mathbf{C}\_{subject})\_{ij} \frac{(\mathbf{W}\_{control}^{\mathsf{T}} \mathbf{E}\_{subject})\_{ij}}{(\mathbf{W}\_{control}^{\mathsf{T}} \mathbf{W}\_{control} \mathbf{C}\_{subject})\_{ij}} \tag{7}$$

This process was performed for each of the 72 pairwise comparisons (nine gymnasts each compared with the eight others). The overall VAF and VAFmuscle were used to quantify how well the fixed muscle weightings and the newly computed synergy activation coefficients reconstructed the EMG patterns. A VAFmuscle *>* 75% was considered satisfactory (Torres-Oviedo and Ting, 2007). The second step was similar to the first but aimed to determine the robustness of the activation coefficients across the participants by fixing the activation coefficients matrix (**C**control) while the muscle synergy vectors matrix (**W**subject) was free to vary. Finally, a two-way ANOVA (factors: muscles and reconstruction method, i.e., fixed muscle synergy vectors vs. fixed synergy activation coefficients) was used to determine whether VAFmuscle differed between the muscles and was influenced by the reconstruction method. *Post-hoc* analyses were made with Scheffé's tests. The level of significance was set at *p* < 0.05.
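The fixed-**W** reconstruction of Eq. (7) can be sketched as follows, assuming NumPy (an illustrative reimplementation of the update rule, with the VAF of Eq. 6 scoring the fit):

```python
import numpy as np

def fit_C_fixed_W(W_control, E_subject, n_iter=1000, seed=0):
    # Eq. (7): hold the control subject's synergy vectors fixed and
    # multiplicatively update only the compared subject's coefficients
    rng = np.random.default_rng(seed)
    C = rng.random((W_control.shape[1], E_subject.shape[1]))
    eps = 1e-10  # guards against division by zero
    for _ in range(n_iter):
        C *= (W_control.T @ E_subject) / (W_control.T @ W_control @ C + eps)
    return C

def vaf(E, W, C):
    # Eq. (6): quantifies how well the fixed synergy vectors reconstruct
    # the compared subject's EMG patterns
    return 1 - np.sum((E - W @ C) ** 2) / np.sum(E ** 2)
```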

# **RESULTS**

### **MECHANICAL DATA**

**Figure 2** depicts the kinematic and dynamic data computed from the motion capture of the giant swings. Due to the hip and the shoulder flexion, the gymnasts managed to increase their angular velocity (on average from 230 to 275◦/s between 35 and 90◦ of the giant swing), which when associated with the decrease in the moment of inertia of the gymnast, allowed an increase in mechanical energy to a sufficient level to attain the handstand position above the bar (i.e., 180◦ of the giant swing).

Relative to the horizontal and vertical positions of the CG of each segment of the gymnast's model, the indices of similarity (i.e., *r* and *r*max) were extremely high, ranging from 0.98 ± 0.02 to 1.00 ± 0.00. Regardless of the height of the participants, the trajectory of the CG of the gymnast's model was similar among them (**Figure 2**), with an averaged *r*-value and *r*max-value of 1.00 ± 0.00. The averaged absolute lag time between each pair of participants was below 1% of the giant swing for each kinematic parameter (horizontal and vertical position of the segments' and gymnast's CG). The indices of similarity for the shoulder and hip joint angles, the angular velocity, the moment of inertia, and the mechanical energy of the gymnast are reported in **Table 1**. Except for the moment of inertia, which exhibited a low averaged *r*-value (0.32 ± 0.55; range: −0.81 to 0.97) due to large time shifts (averaged lag time = 17.2 ± 17.3%; range: 0.5–50.0% of the giant swing), all the biomechanical variables were similar among the participants. The lag that led to differences in moment of inertia was mainly attributable to participant #7, who exhibited a moment of inertia in anti-phase relative to the other participants. This confirms that our population of gymnasts was homogeneous, as they performed their backward giant swings similarly in terms of kinematics as well as dynamics.

**Table 1 | Similarity of the kinematic and dynamic parameters across participants.**

*Values are mean (min–max). Lags were calculated as the lag times that maximized the cross-correlation function and correspond to the absolute time shift between the two waveforms (% giant swing).*

*Except for the moment of inertia, which exhibited a low averaged r-value due to large time shifts (about 17% of the giant swing), all the biomechanical variables were similar among the participants.*

### **INDIVIDUAL EMG PATTERNS**

For each participant, the EMG patterns of the 12 muscles investigated are depicted in **Figure 3**. The inter-subject indices of similarity (i.e., *r* and *r*max) are reported in **Table 2**. The averaged *r*-value between each pair of participants was 0.70 ± 0.20 and ranged from 0.26 (DC) to 0.89 (FD). The averaged *r*max-value was 0.90 ± 0.05 and ranged from 0.83 (DC) to 0.96 ± 0.02 (FD and RA). While the patterns of activity of some muscles exhibited high similarity between participants (e.g., FD, BBlh, LD, RA, ES, and RF), others were more variable (e.g., BBsh, TB, DC, DS, and TZ). The higher *r*max-values compared with *r*-values show that the variability between participants can be partly explained by time shifts of the EMG patterns. Indeed, we found absolute lag times ranging from 1.5% (ES) to 18.5% (DC) of the giant swing (**Table 2**). The largest time shifts were observed for the BBsh, DC, and TB muscles and were mainly attributable to participants #6 and #7 (**Figure 3**).

### **NUMBER OF EXTRACTED MUSCLE SYNERGIES**

**Figure 4A** depicts the cumulative percentage of variance explained by each number of muscle synergies. Using the criterion previously described (i.e., VAF *>* 90% or until adding an additional synergy did not increase VAF by *>*5%), three synergies were identified for all the participants. When applying the BLF method, 6 out of 9 participants exhibited 3 muscle synergies (**Figure 4B**). When applying the KP method described by Cheung et al. (2009), 8 out of 9 participants exhibited 4 muscle synergies (**Figure 4B**). These three analyses reveal that all (or most of) the participants exhibited the same number of muscle synergies (100% for the threshold method, 66% for BLF, and 89% for KP). Because it has not been demonstrated that one of these methods is more accurate than the others to determine the correct

number of muscle synergies and because three muscle synergies were found to characterize the data in 2 out of the 3 methods, we decided to use three muscle synergies for all the participants for the subsequent analysis.
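The VAF threshold criterion can be sketched as below, using scikit-learn's NMF as a stand-in factorization (the authors' exact solver and update rules are not specified here): selection stops once VAF exceeds 90% and the next synergy adds less than 5%.

```python
import numpy as np
from sklearn.decomposition import NMF

def n_synergies(emg, vaf_threshold=0.90, gain_threshold=0.05):
    """Select the number of muscle synergies with the VAF criterion:
    the smallest N with VAF > 90% such that adding one more synergy
    increases VAF by < 5%. `emg` is a non-negative muscles x time
    matrix. A sketch, not the authors' exact implementation."""
    vaf = []
    for n in range(1, emg.shape[0] + 1):
        model = NMF(n_components=n, init="nndsvd", max_iter=1000)
        W = model.fit_transform(emg)      # muscle synergy vectors
        H = model.components_             # synergy activation coefficients
        resid = emg - W @ H
        vaf.append(1.0 - np.sum(resid**2) / np.sum(emg**2))
        if (len(vaf) >= 2 and vaf[-2] > vaf_threshold
                and vaf[-1] - vaf[-2] < gain_threshold):
            return n - 1, vaf
    return emg.shape[0], vaf
```

On synthetic EMG built from three well-separated synergies, the criterion recovers three.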


Three muscle synergies accounted for a mean VAF of 89*.*9 ± 2*.*0% (range: 86.1–92.5%) and the VAFmuscle ranged from 70*.*9 ± 8*.*5 to 92*.*8 ± 4*.*3% (**Figure 4C**). While the VAFmuscle of BBsh, DC, and TZ was lower than 75% for some participants (1–2, depending on the muscle), VAFmuscle consistently dropped below 75% for RF. The VAF explained by each of the three extracted muscle synergies is depicted in **Figure 5** for each gymnast. We observed between-subject variability, especially for synergies #2 and #3 (coefficient of variation = 6.3, 34.7, and 53.9% for synergies #1, #2, and #3, respectively). This variability can be explained mainly by the fact that VAF was higher for synergy #3 than for synergy #2 for participants #1 and #2, while the opposite was true for all the other participants (**Figure 5**).

# **WITHIN-SUBJECT CONSISTENCY OF THE EXTRACTED MUSCLE SYNERGIES**

An individual example (participant #6) of the three muscle synergies extracted during both the first and the second set is depicted in **Figure 6**. The cross-validation procedure showed that the muscle synergy vectors extracted for the first set of linked backward giant swings explained 87*.*9 ± 2*.*7% (range: 83.5–91.7%) of the variability of the dataset obtained during the second set.
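The cross-validation step can be sketched as follows: fix the muscle synergy vectors W extracted from one set and fit only the activation coefficients to the other set by non-negative least squares, then report the VAF of the fit. The update actually used by the authors may differ; this is an illustrative reconstruction.

```python
import numpy as np
from scipy.optimize import nnls

def vaf_with_fixed_synergies(W, emg_new):
    """Keep the muscle synergy vectors W (muscles x N) fixed, fit
    only the synergy activation coefficients to a new EMG dataset
    (muscles x time) by non-negative least squares, and return the
    variance accounted for (VAF) of the reconstruction."""
    H = np.column_stack([nnls(W, emg_new[:, i])[0]
                         for i in range(emg_new.shape[1])])
    resid = emg_new - W @ H
    return 1.0 - np.sum(resid**2) / np.sum(emg_new**2)
```

If the second set is generated by the same synergy vectors, the VAF is close to 1; a lower value quantifies how much the new data depart from the fixed spatial structure.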

To further assess the repeatability of the extracted muscle synergies, we compared the two sets of giant swings. Both the synergy activation coefficients and the muscle synergy vectors exhibited good repeatability. The averaged *r*-value over the three muscle synergies was 0*.*93 ± 0*.*05 (range: 0.88–0.98) for the synergy activation coefficients and 0*.*93 ± 0*.*06 (range: 0.87–0.98) for the muscle synergy vectors.


Overall, these results clearly show that the muscle synergies were robust for a given participant allowing us to interpret a difference between participants as different motor control strategies rather than as methodological issues.

# **BETWEEN-SUBJECT VARIABILITY OF THE EXTRACTED MUSCLE SYNERGIES**

The three extracted muscle synergies are depicted in **Figure 7**. The temporal activation of muscle synergies (i.e., synergy activation coefficients) was consistent across participants [*r*-value of 0.87 (range: 0.53–0.98), 0.76 (range: 0.50–0.87), and 0.72 (range: −0.03 to 0.98) for synergies #1, #2, and #3, respectively; *r*max-value of 0.96 (range: 0.87–0.99), 0.92 (range: 0.86–0.98), and 0.95 (range: 0.85–0.99) for synergies #1, #2, and #3, respectively]. The higher *r*max-values compared with *r*-values suggest that variability between participants is partly explained by time shifts. The mean absolute lag time was 4*.*3 ± 2*.*1% of the giant swing and ranged from 2*.*7 ± 1*.*9% (synergy #1) to 6*.*7 ± 9*.*9% (synergy #3) of the giant swing. The larger time shift observed in synergy #3



*Values are mean (min-max). Lags were calculated as the lag times that maximized the cross-correlation function and correspond to the absolute time shift between the two waveforms (% giant swing).*

*While the pattern of activity of some muscles exhibited high similarity between participants (e.g., FD, BBlh, LD, RA, ES, and RF), others were more variable (e.g., BBsh, TB, DC, DS, and TZ). As rmax -values are very high, variability between participants can be partly explained by time shifts of the EMG patterns.*

compared to synergies #1 and #2 was mainly attributable to participant #5. Indeed, this participant's peak of activation occurred during the first half of the swing (*<*50% of the total swing), while the other participants had their peak of activation coefficients in the second half of the swing (**Figure 7**).

Concerning the muscle synergy vectors, we found an averaged *r*-value of 0.83 (range: 0.62–0.97), 0.86 (range: 0.64–0.98), and 0.66 (range: 0.03–0.97) for synergies #1, #2, and #3, respectively. Considering the critical *r*-value of 0.71 (see Methods), four pairwise comparisons (out of 36 possibilities, i.e., 11%) were different for synergy #1, two (6%) were different for synergy #2, and 13 (36%) were different for synergy #3. This clearly shows that the composition of synergies #1 and #2 was consistent across participants, while the composition of synergy #3 was more variable, as highlighted by **Figure 7**.

As explained in the Methods, two additional analyses have been performed to test the similarity of the muscle synergies between participants. First, by keeping the muscle synergy vectors constant, we obtained an averaged VAF across all pairs of 79*.*3 ± 3*.*7% (range: 70.6–87.5%). The VAFmuscle ranged between 48*.*2 ± 9*.*9% (DC) and 86*.*8 ± 2*.*4% (RA). Relative to the preset threshold of VAFmuscle *>*75%, the EMG patterns of the BBsh, DC, and RF muscles were not correctly reconstructed (**Figure 8**). By keeping the synergy activation coefficients constant, the averaged VAF was 72*.*4 ± 4*.*8% (range: 60.2–82.9%). The VAFmuscle ranged between 56*.*1 ± 3*.*8% (DC) and 83*.*0 ± 5*.*4% (FD). Relative to the preset threshold of VAFmuscle *>*75%, the EMG pattern of the BBsh, TB, DC, DS, TZ, PM, ES, and RF muscles were not correctly reconstructed (**Figure 8**). The Two-Way ANOVA

**FIGURE 4 | Variance accounted for (VAF) and number of extracted muscle synergies. (A)** The percentage of variance accounted for is depicted for each participant as a function of the number of extracted synergies. Error bars indicate the 95% bootstrap confidence interval across the participants for both the VAF calculated from the original data set (black bold line) and the VAF calculated from the unstructured EMG dataset generated by randomly shuffling the original dataset across time and muscles (gray bold line). **(B)** Number of extracted muscle synergies based on the VAF threshold method (VAF), the best linear fit method (BLF, Cheung et al., 2005) and the knee point method (KP, Cheung et al., 2009). **(C)** VAFmuscle is depicted for each participant and each muscle. For both Panels (**A** and **C**), the black bold line indicates the mean profile across the nine gymnasts. Abbreviations for individual muscles are described in the legend of **Figure 3**. For the color legend, see **Figure 1**.

showed a significant main effect for both "muscle" and "reconstruction method" (*p <* 0*.*01). More precisely, VAFmuscle was significantly lower when the synergy activation coefficients were fixed than when muscle synergy vectors were fixed (70*.*2 ± 10*.*5%

**FIGURE 5 | Variance accounted for (VAF) for each of the three extracted muscle synergies.** For each gymnast, the VAF explained by each of the three extracted muscle synergies was calculated (**Left panel**). The temporal activation of each muscle synergy is depicted as a function of the mean

angular position (**Right panel**). For all the gymnasts, it clearly appears that the dimensionality in their EMG data was mainly explained by the first muscle synergy. Gymnasts #1 and #2 exhibited a higher VAF by the third synergy than the second one. This strategy differs from the seven other gymnasts.

**FIGURE 6 | Within-subject consistency of the muscle synergies extracted during the two sets of linked backward giant swings.** This figure depicts an individual example (Participant #6). On the left panel, the thin lines correspond to the synergy activation coefficient extracted for each giant swing and the bold lines correspond to the averaged profile over the 9 consecutive giant swings. Red stands for the set #1 and blue stands for the set #2. The corresponding muscle synergy vectors are depicted on the right panel. This figure clearly shows that the muscle synergies were robust for a given participant. Indeed, the cross-validation procedure showed that the muscle synergy vectors extracted for the first set of linked backward giant swings explained 87*.*9 ± 2*.*7% (range: 83.5–91.7%) of the variability of the dataset obtained during the second set. Abbreviations for individual muscles are described in the legend of **Figure 3**.

vs. 75*.*4 ± 14*.*1%, respectively). VAFmuscle was significantly lower for BBsh, DC, DS, TZ, and RF muscles than for the others. Overall, these results suggested that the muscle synergy vectors were more consistent across the gymnasts than the synergy activation coefficients.

# **DISCUSSION**

The results of the present study outlined three muscle synergies that accounted for the EMG patterns during giant swings on a high bar. The relative consistency of muscle synergies across trained gymnasts confirms that muscle synergies are consistent across participants (Torres-Oviedo and Ting, 2007; Cheung et al., 2009; Hug et al., 2011; Turpin et al., 2011b; Chvatal and Ting, 2012), even during a skilled motor task requiring learning.

### **METHODOLOGICAL CONSIDERATIONS**

As done in previous research (Turpin et al., 2011a), EMG activity from each muscle was normalized to its peak value from all of the cycles. Note that this normalization procedure only provides information about the level of muscle activity in relation

to this peak value (i.e., waveform of the EMG patterns). In other words, while interindividual variability in terms of degree of muscle activity can exist, the present study focuses only on the EMG waveform variability. This choice was motivated by the fact that an ideal normalization method to quantify the degree of muscle solicitation does not exist (Burden, 2010). Whatever the normalization method used, a part of the observed variability would have been attributable to methodological considerations. Consequently, we considered a muscle synergy as a covariation of muscle activations, with the output level of this activation not taken into consideration.
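The peak normalization described above amounts to the following sketch (the muscles x samples array layout is an assumption):

```python
import numpy as np

def normalize_to_peak(envelopes):
    """Divide each muscle's EMG envelope by its peak value over all
    cycles, so analyses compare only the waveform of the patterns,
    not absolute activation levels. `envelopes` is muscles x samples."""
    return envelopes / envelopes.max(axis=1, keepdims=True)
```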


By quantifying the interindividual variability using *r*-values, *r*max-values, and lag times, our goal was to compare our results with those of the literature. However, it should be kept in mind that the smoothing of both the EMG patterns and the synergy activation coefficients can influence the *r*-values (Hug, 2011). As a wide variety of low-pass filter cut-offs has been used in the literature on muscle synergy extraction during locomotor tasks (e.g., from 4 Hz in Clark et al., 2010 to 40 Hz in Chvatal and Ting, 2012), caution must be taken when comparing the results of interindividual variability across studies that used different cut-off frequencies.


# **FUNCTIONAL ROLES OF MUSCLE SYNERGIES**

The extracted muscle synergies were well related to the mechanics of the giant swing. Synergy #1 mainly involved the trunk and hip flexor muscles (e.g., LD, PM, RA, RF) at the beginning of the ascending phase of the swing, which would allow the gymnast to decrease his moment of inertia and gain some mechanical energy and angular velocity to attain the handstand position.

**FIGURE 8 | VAFmuscle obtained by keeping either the muscle synergy vectors constant or the synergy activation coefficients.** Two additional analyses have been performed to test the similarity of the muscle synergies between participants (see "Materials and Methods" section). First, by keeping the muscle synergy vectors constant, we obtained, on average, a VAFmuscle ranging between 48*.*2 ± 9*.*9% (DC) and 86*.*8 ± 2*.*4% (RA). Relative to the preset threshold of VAFmuscle *>* 75%, the EMG patterns of BBsh, DC, and RF muscles were not correctly reconstructed. By keeping the synergy activation coefficients constant, the VAFmuscle ranged between 56*.*1 ± 3*.*8% (DC) and 83*.*0 ± 5*.*4% (FD). Relative to the preset threshold of VAFmuscle *>* 75%, the EMG pattern of BBsh, TB, DC, DS, TZ, PM, ES, and RF muscles were not correctly reconstructed. For muscle abbreviations, see **Figure 3** legend.

The forearm muscle (FD) was also involved in synergy #1 to firmly grip the bar [which was under high tension in that phase of the swing (Cagran et al., 2010)], while the arm muscles (BBsh, BBlh) were recruited to stiffen the elbow and glenohumeral joints, respectively. Synergy #2 mainly involved the arm (TB) and shoulder muscles (DC, DS, TZ) and was activated during the upper part of the giant swing. In light of the inverse dynamic model of a ground handstand (Kerwin and Trewartha, 2001), synergy #2 was activated to support the body weight. The activation profile of synergy #2 also showed a second lower peak near the end of the descending phase of the giant swing, simultaneously with the peak in activity for synergy #3 and with the peak in angular velocity. The angular velocity of the gymnast increased due to the gravitational acceleration up to this peak, which might coincide with the end of the "fall-like" part of the giant swing and with the highest tensile load on the high bar (Cagran et al., 2010). Therefore, the arm and shoulder muscles of synergy #2 (TB, DC, DS, TZ), plus the trunk muscles of synergy #3 (LD), were activated to limit the extension and the tensile load within the shoulder joint. Finally, the other muscles of synergy #3 ensured the grip on the bar (FD) and hip extension (ES) of the gymnast's body. This arch-like position of the body would set the tension in the flexor chain muscles and favor the subsequent shoulder flexion during the ascending section (Frère et al., 2012).

# **INTERINDIVIDUAL VARIABILITY OF THE NEUROMUSCULAR CONTROL STRATEGIES**

Interindividual variability in EMG patterns has often been reported at the level of individual muscles (Ryan and Gregor, 1992; Guidetti et al., 1996; Hug and Dorel, 2009; Hug et al., 2011). This was also the case in the present study, where some individual EMG patterns (e.g., BBsh, TB, DC, DS, and TZ) exhibited interindividual variability that appears higher than the variability reported during pedaling (Hug et al., 2010). This difference may be explained by several factors, such as the number of degrees of freedom (closed vs. open kinematic chain for pedaling and giant swing, respectively) and the heavier smoothing of the EMG profiles in the study by Hug et al. (2010), which may increase the similarity of the waveform (Hug et al., 2012).

It is unclear whether this variance in muscle activation across subjects arises from variance in the motor program itself. In some cases, different muscle synergies have been identified in subpopulations (Torres-Oviedo and Ting, 2007). For instance, Torres-Oviedo and Ting (2007) extracted in some participants a muscle synergy specific to a knee-bending strategy during balance control. In contrast, despite a relatively high interindividual variability of some individual muscles, Hug et al. (2010) reported a similar modular organization of muscle coordination (in terms of number of extracted muscle synergies, composition, and temporal activation) across trained cyclists during pedaling. In the present study, three consistent muscle synergies accounted for the EMG patterns in trained gymnasts during a giant swing, as reported in other cyclic tasks such as pedaling and rowing (Hug et al., 2010; Turpin et al., 2011b). Despite the overall similarity of both muscle synergy vectors and synergy activation coefficients across gymnasts, some differences occurred (36% of the pairwise comparisons of muscle synergy vectors), mainly for synergy #3. As this synergy is activated at the end of the descending phase, the variability of the muscle synergy vectors may be explained by a lower muscular demand. Indeed, during the descending phase of the giant swing, muscular torque contributed less than gravitational and inertial torques in enabling the arch-like position of the gymnast (Sevrez et al., 2012). According to previous studies demonstrating that the spatial components of the muscle synergies are related to biomechanical functions (Ting and Macpherson, 2005; Torres-Oviedo and Ting, 2007; McKay and Ting, 2008), this low muscular demand might involve subtle subject-specific muscle synergy compositions. A high tensile load has been reported at the end of the descending phase of the giant swing (Cagran et al., 2010).
To counteract this tensile load, gymnasts stiffened the shoulder joint likely by the second peak of activity visible in synergy #2 rather than by synergy #3. This may confirm the relationship between muscle synergy composition and functional demand. This also confirms previous observations that although some muscle synergies are very robust across subjects, others are more variable (Hug et al., 2010).

A key piece of information provided by the synergy analysis is the number of extracted muscle synergies, which has been proposed to reflect the complexity of motor control (Clark et al., 2010). As justified in the Methods section, we extracted the same number of muscle synergies for all the participants. However, determining the correct number of muscle synergies is not a trivial matter (Tresch et al., 2006) and, despite the use of different criteria, we cannot affirm that all the participants exhibited the same number of muscle synergies and thus the same complexity of motor control. However, the low coefficients of variation (SD/mean × 100) in VAF values (ranging from 0% for 12 muscle synergies to 7% for 1 muscle synergy; 2.2% for 3 muscle synergies) strongly suggest that the gymnasts possess a very similar dimensionality in their EMG data.

# **NEUROPHYSIOLOGICAL INTERPRETATIONS**

The present results showed a strong similarity in neuromuscular control strategies across the experts during a skilled motor task (**Figure 7**). This consistency in muscle synergies might reflect the existence of lower-level neural control structures that can be flexibly modulated to produce complex, learned movements, as previously suggested (Cheung et al., 2005; Ting and McKay, 2007; Torres-Oviedo and Ting, 2007; Hug et al., 2011; Chvatal and Ting, 2012). During skill learning, it has been shown that muscle synergy composition is modulated until a stable state is reached, which then allows a subsequent change in the temporal profile of the muscle synergies (Kargo and Nitz, 2003). This suggests that muscle synergies may be formed by adaptive processes in relationship to the experiences of each individual. Consequently, the relatively good similarity of muscle synergies observed between the gymnasts could be explained by their similar training experience. It should also be noted, however, that instead of constructing new muscle synergies during the learning process, it is also possible that the extracted muscle synergies have been adapted from existing synergies (Safavynia et al., 2011). Although numerous studies have suggested that the central nervous system produces movement through a flexible combination of muscle synergies (Ting and McKay, 2007), it should be kept in mind that other research has suggested that the synergies better reflect task constraints (Kutch et al., 2008; Valero-Cuevas et al., 2009). Therefore, the consistency observed in the present study might also be explained by the mechanical requirements demanded by the task and would only signify that the observed synergies are compatible with the execution of a backward giant swing. As we studied only one condition (without varying constraints), we were not able to test this hypothesis. However, although mechanical constraints were similar across individuals, high interindividual variability was evident for some EMG patterns (e.g., DS, TZ, TB, **Figure 3**), confirming that different muscle activity patterns may lead to similar mechanical patterns, or task performance (Chvatal et al., 2011).

The higher VAF and VAFmuscle values obtained when muscle synergy vectors were fixed compared to fixed coefficients of activation further suggest that muscle synergies are spatially fixed while their temporal patterns of recruitment can vary (Chvatal and Ting, 2012; Safavynia and Ting, 2012). This spatial consistency of the nervous control of motor behavior might support the notion that descending cortical signals represent neuronal drives that select, activate, and flexibly combine muscle synergies specified by networks in the spinal cord and/or brainstem (Hart and Giszter, 2004; Cheung et al., 2005). In line with this, it has been shown that only the temporal activation of muscle synergies (and not their spatial structure) is altered by deafferentation or cortical stroke in humans (Cheung et al., 2005, 2009).

# **CONCLUSION**

Although variability was found (especially for synergy #3), the gymnasts exhibited broadly similar neuromuscular strategies when performing several consecutive giant swings. This confirms that muscle synergies are consistent across participants, even during a skilled motor task requiring learning. Further investigations are necessary both to confirm that these muscle synergies reflect lower-level neural control rather than biomechanical constraints and to understand whether they are constructed during the learning process or adapted from existing synergies.



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 15 September 2012; accepted: 09 December 2012; published online: 28 December 2012.*

*Citation: Frère J and Hug F (2012) Between-subject variability of muscle synergies during a complex motor skill. Front. Comput. Neurosci. 6:99. doi: 10.3389/fncom.2012.00099*

*Copyright © 2012 Frère and Hug. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

# Feedback of mechanical effectiveness induces adaptations in motor modules during cycling

# *Cristiano De Marchis\*, Maurizio Schmid, Daniele Bibbo, Anna Margherita Castronovo, Tommaso D'Alessio and Silvia Conforto*

*Department of Engineering, Roma TRE University, Rome, Italy*

### *Edited by:*

*Andrea D'Avella, IRCCS Fondazione Santa Lucia, Italy*

### *Reviewed by:*

*Francois Hug, The University of Queensland, Australia Vincent C. K. Cheung, Massachusetts Institute of Technology, USA*

### *\*Correspondence:*

*Cristiano De Marchis, Biolab3, Department of Engineering, Roma TRE University, Via Vito Volterra 62, 00146 Rome, Italy. e-mail: cristiano.demarchis@ uniroma3.it*

Recent studies have reported evidence that the motor system may rely on a modular organization, even if this behavior has yet to be confirmed during motor adaptation. The aim of the present study is to investigate the modular motor control mechanisms underlying the execution of pedaling by untrained subjects in different biomechanical conditions. We use the muscle synergies framework to characterize the muscle coordination of 11 subjects pedaling under two different conditions. The first one consists of a pedaling exercise with a strategy freely chosen by the subjects (Preferred Pedaling Technique, PPT), while the second condition constrains the gesture by means of a real time visual feedback of mechanical effectiveness (Effective Pedaling Technique, EPT). Pedal forces, recorded using a pair of instrumented pedals, were used to calculate the Index of Effectiveness (*IE*). EMG signals were recorded from eight muscles of the dominant leg and Non-negative Matrix Factorization (NMF) was applied for the extraction of muscle synergies. All the synergy vectors, extracted cycle by cycle for each subject, were pooled across subjects and conditions and underwent a 2-dimensional Sammon's non-linear mapping. Seven representative clusters were identified on the Sammon's projection, and the corresponding eight-dimensional synergy vectors were used to reconstruct the repertoire of muscle activation for all subjects and all pedaling conditions (*VAF >* 0*.*8 for each individual muscle pattern). Only 5 out of the 7 identified modules were used by the subjects during the PPT pedaling condition, while the 2 additional modules were specific to the EPT pedaling condition. The temporal recruitment of three identified modules was highly correlated with *IE*.
The structure of the identified modules was found to be similar to that extracted in other studies of human walking, partly confirming the existence of shared and task-specific muscle synergies, and providing further evidence of the modularity of the motor system.

**Keywords: muscle synergies, modularity, cycling, biomechanical function, instrumented pedals, pedaling effectiveness, biofeedback**

# **INTRODUCTION**

The study of the neuro-physiological mechanisms underlying movement production has a long fascinating history (Bernstein, 1967). In the last decade the scientific community has been focusing its attention on the possibility of simplifying the role of the central nervous system (CNS) for the production of movement, by hypothesizing that the complex muscle coordination shown during the execution of a variety of motor acts relies on a simple combination of motor modules (d'Avella et al., 2003). Experimental evidence has been provided that surface ElectroMyoGraphic signals (sEMG) recorded from many muscles during the execution of movement can be represented by the combination of a reduced number of muscle synergies. The latter constitute modules of muscle co-activation that—flexibly combined through amplitude scaling and time shifting mechanisms—can accurately reconstruct the repertoire of muscle activation for many motor tasks (d'Avella et al., 2003). Muscle synergies have been investigated in motor tasks like running (Cappellini et al., 2006), postural responses (Torres-Oviedo and Ting, 2007), pedaling (Hug et al., 2010), walking in normal and pathologic conditions (Ivanenko et al., 2004; Clark et al., 2010; Monaco et al., 2010) and upper limb reaching (d'Avella et al., 2008; Cheung et al., 2012). From this background it emerges that the muscle synergies paradigm seems to fairly represent the neural strategies underlying the control of movement, with motor modules characteristic of each task and robustly shared among different subjects, in terms of both temporal and spatial organization of the muscle activity. Moreover, the existence of a few shared and task-specific muscle synergies during the execution of different movements in freely moving frogs (d'Avella and Bizzi, 2005) and the existence of separate modules during the coordination of locomotion with voluntary actions (Ivanenko et al., 2005) provide further evidence of modularity.

Nevertheless, the fact that muscle synergies actually reflect neural strategies has been criticized, and it has been hypothesized that they simply reflect the biomechanical constraints during movement execution (Kutch and Valero-Cuevas, 2012). As a matter of fact, a neuro-physiological mechanism able to fully justify the muscle synergy model is still lacking (Tresch and Jarc, 2009).

The linkage between muscle coordination and the mechanical outcome of movement has recently provided further insight into the modular control of movement through the use of simulation studies of human gait (Neptune et al., 2009). It has also been shown that scaling in amplitude and shifting in time the same small number of fixed motor modules leads to the satisfaction of altered mechanical task demands, and that the main modifications occur in the recruitment of those modules recognized as responsible for that particular biomechanical sub-function (Cheung et al., 2009; McGowan et al., 2010).

Modification of the mechanical constraint has also been investigated in cycling tasks executed by trained subjects, where it has been shown that the same few modules are shared among subjects and among different pedaling conditions, with limited adaptations in the synergy activation coefficients (Hug et al., 2011).

Even though the cycling gesture is a quasi-constrained exercise with controllable experimental conditions, little is known about the effect of the pedaling technique (i.e., a mechanical factor in terms of force orientation on the pedal along the pedaling cycle) on the structure of muscular coordination and on the underlying structure of the motor modules. Analysis of forces through the use of instrumented pedals can provide insight into the effect of different pedaling techniques on muscle coordination (Mornieux et al., 2008). Many variables involved in cycling, such as physiological and metabolic factors, could influence the mechanical outcome, and they are functionally connected to the evaluation of the athletes' performance (Zameziati et al., 2006). The concept of mechanical effectiveness in cycling is one of these: it is directly related to the ability of the subject to orient the pedal forces so that all the expressed forces contribute to the propulsive action. The index of mechanical effectiveness, defined as the ratio between the tangential force component and the total one, has been used as an indicator of cycling behavior and has been related to other parameters such as muscular efficiency or metabolic consumption (Mornieux et al., 2006; Zameziati et al., 2006).

In this study we enrolled untrained subjects to investigate whether a change in the pedaling technique, induced by a visual feedback of mechanical effectiveness, is accompanied by neuromuscular adaptations in modular motor control. To do this, we described the pedaling gesture from a biomechanical point of view by combining pedal forces, measured by instrumented pedals, and multi-muscle surface EMG recordings. Our main hypothesis is that, when passing from a freely chosen pedaling technique to a novel one imposed by the visual feedback of mechanical effectiveness, EMG patterns would change by altering synergy recruitment rather than the structure or the number of the motor modules, as a consequence of motor adaptation.

# **MATERIALS AND METHODS**

### **PARTICIPANTS**

Eleven male volunteers (aged 27*.*4 ± 2*.*5 years) participated in the study. No subject had previous experience with professional cycling, and each reported less than 50 km of riding in the previous 3 years. None of them reported a previous history of lower limb pathology or surgery. The subjects were informed about the possible discomforts deriving from the experimental protocol and gave their informed consent to participate. The study was carried out according to the principles of the Declaration of Helsinki.

# **PEDAL FORCES RECORDINGS AND FEEDBACK OF MECHANICAL EFFECTIVENESS**

Pedal forces were recorded by means of a pair of custom two-component instrumented clipless pedals, measuring the two orthogonal components of force *Fx* and *Fz* (respectively parallel and orthogonal to the pedal surface as in **Figure 1**, with an accuracy of 0.1% and a range of 2000 N), together with the angle θ*<sup>p</sup>* between the direction of the crank arm and the direction orthogonal to the pedal surface (Bibbo et al., 2009a). The pedals use a KEO clipless fastening. The *Fx* and *Fz* force components were acquired using a previously developed wireless recording system (iPED) (Bibbo et al., 2009b) that provides the total, tangential, and radial force components according to Equations 1–3:

$$F_{\text{tot}} = \sqrt{F_x^2 + F_z^2} \tag{1}$$

$$F_{\text{tg}} = F_x \cos(\theta_p) + F_z \sin(\theta_p) \tag{2}$$

$$F_{\text{rd}} = -F_x \sin(\theta_p) + F_z \cos(\theta_p) \tag{3}$$

The iPED system also provides the index of mechanical effectiveness (*IE*), determined as reported in Equation 4:

$$IE = \frac{\int_0^{2\pi} F_{\text{tg}}(\theta_p)\, d\theta_p}{\int_0^{2\pi} F_{\text{tot}}(\theta_p)\, d\theta_p} \tag{4}$$

*IE* is an index theoretically varying in the range [−1, 1]; it approaches 1 as the tangential force profile overlaps the total one along the whole pedaling cycle. *IE* was used as a global indicator of performance, while the subjects were provided with a real-time visual feedback of instantaneous mechanical effectiveness *IEi*, drawn on a polar plot (**Figure 2**) and defined as follows:

$$IE_i(\theta_p) = \frac{F_{\text{tg}}(\theta_p)}{F_{\text{tot}}(\theta_p)} \tag{5}$$

*IEi* is drawn as a vector whose magnitude is proportional to the instantaneous mechanical effectiveness and whose phase corresponds to the pedal angle.

In this way, the subjects were helped to orient the forces effectively along the pedaling cycle, receiving real-time information about which sector of the cycle they had to improve to reach an optimal pedaling technique (Bibbo et al., 2012). An entirely filled circle thus corresponds to *IE* = 1.
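As a concrete illustration, the force decomposition of Equations 1–3 and the effectiveness indices of Equations 4–5 can be sketched in a few lines of numpy (the function and variable names are ours, not part of the iPED system; the integral of Eq. 4 is approximated as a ratio of sums over a uniformly sampled cycle):

```python
import numpy as np

def pedal_forces(fx, fz, theta_p):
    """Decompose the pedal-frame forces Fx, Fz into total, tangential, and
    radial components, given the pedal angle theta_p (rad) between the
    crank arm and the normal to the pedal surface (Eqs. 1-3)."""
    f_tot = np.hypot(fx, fz)                               # Eq. 1
    f_tg = fx * np.cos(theta_p) + fz * np.sin(theta_p)     # Eq. 2
    f_rd = -fx * np.sin(theta_p) + fz * np.cos(theta_p)    # Eq. 3
    return f_tot, f_tg, f_rd

def effectiveness(f_tg, f_tot):
    """Global index of effectiveness IE (discrete version of Eq. 4,
    assuming uniform sampling of theta_p) and instantaneous IE_i (Eq. 5)."""
    ie = np.sum(f_tg) / np.sum(f_tot)   # the common d(theta) cancels
    ie_i = f_tg / f_tot                 # one value per pedal angle
    return ie, ie_i
```

A purely tangential force pattern yields *IE* = 1 (an entirely filled circle in the feedback plot), while radial "dissipated" force lowers *IE*.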

# **EXPERIMENTAL PROTOCOL**

The experimental protocol was carried out on an aerodynamically braked cycle-simulator equipped with the instrumented pedals described in the previous section and standard 170 mm cranks. Before the exercises, the subjects performed a 10-min warm-up session, followed by a 10-s all-out trial to determine the maximum reachable power output (resulting in 634.4 ± 85.5 W). The experimental procedure started after a 3-min rest period and consisted of two different sub-maximal pedaling exercises, each lasting 2 min. The first exercise was a 2-min pedaling task with a strategy freely chosen by the subject (Preferred Pedaling Technique, PPT). At the end of the first exercise the subjects, who had no previous knowledge of the concept of mechanical effectiveness in cycling, were instructed by the experimenter on how to follow the visual feedback of *IEi* and how to optimally orient the forces on the pedal. After familiarization with the feedback system, the subjects executed a second 2-min pedaling task aided by the feedback (Effective Pedaling Technique, EPT). For both exercises the subjects were asked to adopt the same freely chosen pedaling cadence (resulting in 67.3 ± 5.7 rpm, corresponding to 120.6 ± 17.1 W of power output) in a comfortable seated position on the saddle. This protocol was chosen to avoid any sign of neuromuscular alteration due to fatigue (Conforto and D'Alessio, 1999), which could negatively bias the execution of the exercises (Castronovo et al., 2012).

### **sEMG RECORDINGS**

sEMG data were recorded from the following eight muscles of the dominant leg, defined as the leg the subject would use to kick a ball: Gluteus Maximus (Gmax), Biceps Femoris long head (BF), Gastrocnemius Medialis (GAM), Soleus (SOL), Rectus Femoris (RF), Vastus Medialis (VAM), Vastus Lateralis (VAL), and Tibialis Anterior (TA). These muscles were chosen because they are deemed representative of the main muscular groups acting across the three main degrees of freedom involved in cycling, namely hip flexion-extension (RF, Gmax, BF), knee flexion-extension (RF, VAL, VAM, BF, GAM), and ankle plantar-flexion (GAM and SOL) and dorsi-flexion (TA) (So et al., 2005; Hug and Dorel, 2009). A pair of Ag/AgCl electrodes was applied to each muscle, following the SENIAM recommendations (Hermens et al., 2000). Before applying the electrodes, the skin was shaved and cleaned to improve the electrode/skin impedance. sEMG data were collected with a wireless system (BTS FREEEMG 300, BTS s.p.a.) equipped with eight bipolar wireless channels, sampled at 1000 samples/s and digitized at 14 bits. All the sEMG signals were synchronized with the force data coming from the instrumented pedals. Preliminary results including part of these data were previously published (De Marchis et al., 2012).

### **DATA PREPROCESSING**

sEMG signals were band-pass filtered at (20–450) Hz, preprocessed for noise removal (Conforto et al., 1999), full-wave rectified, and low-pass filtered at 4 Hz with a 3rd-order Butterworth filter to obtain the signal amplitude envelope (Neptune et al., 2009). Each muscle pattern was amplitude-normalized to the maximum value of the envelope across the two pedaling conditions (i.e., PPT and EPT). The time scale was normalized by interpolating each sEMG envelope and each force component over 100 data points per cycle, each representing one integer percentage of the pedaling cycle. A pedaling cycle was defined as the complete revolution of the crank starting from Top Dead Center (TDC, 0◦), passing through Bottom Dead Center (BDC, 180◦) and back to TDC in a 360◦ cycle.
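The two normalizations above can be sketched as follows (a minimal numpy version under our own naming; one call per muscle and per cycle):

```python
import numpy as np

def time_normalize_cycle(signal, n_points=100):
    """Resample one pedaling cycle (TDC to TDC) onto n_points samples by
    linear interpolation, so each sample is one percent of the cycle."""
    x_old = np.linspace(0.0, 1.0, len(signal))
    x_new = np.linspace(0.0, 1.0, n_points)
    return np.interp(x_new, x_old, signal)

def amplitude_normalize(env_ppt, env_ept):
    """Scale a muscle's envelopes by its maximum across both conditions,
    so the normalized peak over PPT and EPT together equals 1."""
    peak = max(np.max(env_ppt), np.max(env_ept))
    return env_ppt / peak, env_ept / peak
```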

### **MUSCLE SYNERGIES EXTRACTION**

Muscle synergies were extracted by means of Non-negative Matrix Factorization (NMF) (Lee and Seung, 1999) applied to the matrix *M* containing the envelopes of the eight muscles: the algorithm looks for an approximate solution of the form *M* ≈ *W* × *H* by minimizing the matrix norm ||*M* − *WH*||, where *M* is the initial matrix containing the envelopes of the signal from each muscle, *W* is an 8 × *s* matrix of synergy vectors, and *H* is a matrix containing the time-varying activation profiles, *s* being the number of modules specified before applying NMF. Convergence is ensured by the use of multiplicative update rules at each iteration of the algorithm.
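The multiplicative update rules are straightforward to implement; the sketch below is our own minimal numpy version (not the authors' code), factoring an envelope matrix into *s* synergy vectors and activation profiles:

```python
import numpy as np

def extract_synergies(M, s, n_iter=1000, seed=0):
    """Lee & Seung NMF: M (muscles x samples) ~ W (muscles x s) @ H
    (s x samples). Multiplicative updates keep W and H non-negative and
    monotonically decrease the Frobenius norm ||M - WH||."""
    rng = np.random.default_rng(seed)
    W = rng.random((M.shape[0], s)) + 1e-9
    H = rng.random((s, M.shape[1])) + 1e-9
    eps = 1e-12  # guards against division by zero
    for _ in range(n_iter):
        H *= (W.T @ M) / (W.T @ W @ H + eps)
        W *= (M @ H.T) / (W @ H @ H.T + eps)
    return W, H
```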

The applied procedure followed the hypothesis that the number of synergies is not fixed from cycle to cycle; rather, the subjects can select a subset of modules belonging to the space of possible basis vectors. This is particularly true for the feedback condition, in which the subjects adopt a new pedaling technique and may thus explore the space of muscle coordination to meet the biomechanical demands. We made no assumption about the similarity between modules, so the dimensionality of the space of basis vectors explored by the subjects is a priori unknown.

For each subject and each pedaling condition (PPT and EPT), the entire data set was divided into multiple episodes, each containing three consecutive cycles. A set of muscle synergies was then extracted from each of these episodes. The number of muscle synergies *s* for the reconstruction of the matrix *M* for each episode was chosen by calculating the Variance Accounted For (*VAF*) by the reconstruction *W* × *H* for each muscle activation profile, defined by Equation 6:

$$\text{VAF}_{i} = 1 - \frac{\sum_{j=1}^{k} (M_{ij} - R_{ij})^2}{\sum_{j=1}^{k} M_{ij}^2} \tag{6}$$

where *R* = *W* × *H* is the matrix obtained from the synergy model, *k* is the number of samples, and *i* indicates the muscle considered for the *VAFi* calculation. The number of extracted synergies *s* was varied between 1 and 8, and *s* was chosen as the smallest number able to explain at least 90% of the variance of each muscle. This criterion is stringent enough to ensure a proper reconstruction of the original EMG signals in each cycle.
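Under these definitions, the model-order selection can be sketched as follows (the `extract` argument stands for any factorization routine returning *W* and *H*; the function names are illustrative):

```python
import numpy as np

def vaf_per_muscle(M, R):
    """Eq. 6: VAF_i = 1 - SSE_i / SST_i for each muscle (row) of M,
    with R = W @ H the synergy-model reconstruction."""
    sse = np.sum((M - R) ** 2, axis=1)
    sst = np.sum(M ** 2, axis=1)
    return 1.0 - sse / sst

def choose_n_synergies(M, extract, vaf_threshold=0.90, s_max=8):
    """Smallest s whose reconstruction reaches the VAF threshold for
    every muscle."""
    for s in range(1, s_max + 1):
        W, H = extract(M, s)
        if np.all(vaf_per_muscle(M, W @ H) >= vaf_threshold):
            return s
    return s_max
```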

All the extracted modules were then pooled across subjects and conditions to obtain the whole synergy matrix *W***all**, containing all the synergy vectors extracted from all the pedaling cycles of all the subjects.

### **SYNERGY DISCOVERY**

A synergy discovery procedure was then applied to the matrix *W***all** by performing a 2-dimensional non-linear Sammon's mapping (Sammon, 1969). Briefly, this procedure maps a dataset of *k* L-dimensional vectors (i.e., the 8-dimensional space of muscle synergy vectors in this study) to a set of *k* N-dimensional vectors (with *N* < *L*, usually *N* = 2 or 3) while preserving the inherent data structure. The inter-point distances defined in the L-dimensional space are maintained in the projected N-dimensional space through an error minimization procedure: an error criterion is minimized that penalizes differences between the distances in the original L-space and the distances between the corresponding points in the projected N-space. The error function is defined as follows:

$$E = \frac{1}{\sum_{i=1}^{k-1} \sum_{j=i+1}^{k} d_{ij}} \sum_{i=1}^{k-1} \sum_{j=i+1}^{k} \frac{(d_{ij} - d_{ij}^*)^2}{d_{ij}} \tag{7}$$

where *k* is the number of vectors in both the original and projected datasets, *dij* is the Euclidean distance between the *i*-th and *j*-th points in the L-space, and *d*∗*ij* is the Euclidean distance between the *i*-th and *j*-th points in the N-space. The error function is minimized using a second-order steepest descent procedure (Sammon, 1969; De Ridder and Duin, 1997).
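For reference, the stress of Equation 7 can be computed directly from the two point sets (a numpy sketch under our own naming; the descent itself is omitted):

```python
import numpy as np

def sammon_stress(X_high, X_low):
    """Sammon's error E (Eq. 7): weighted mismatch between the pairwise
    Euclidean distances d_ij in the original L-space (X_high, k x L) and
    d*_ij in the projected N-space (X_low, k x N)."""
    def pairwise(X):
        diff = X[:, None, :] - X[None, :, :]
        return np.sqrt((diff ** 2).sum(axis=-1))
    d = pairwise(np.asarray(X_high, dtype=float))
    d_star = pairwise(np.asarray(X_low, dtype=float))
    i_upper = np.triu_indices(d.shape[0], k=1)   # each pair i < j once
    d, d_star = d[i_upper], d_star[i_upper]
    return np.sum((d - d_star) ** 2 / d) / np.sum(d)
```

A projection preserving all inter-point distances gives E = 0; Sammon's iterative descent adjusts the low-dimensional coordinates to drive E down.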

This procedure was followed with the aim of establishing the dimensionality underlying *W***all**, that is, the number of representative modules. The interpretation and clustering of the Sammon's 2-D distribution allows a quantification of the number of underlying basis vectors by observing their groupings on the 2-D map.

The number of underlying basis vectors composing *W***all** was obtained by applying hierarchical clustering (Ward's minimum variance method, Matlab Statistics Toolbox) to the Sammon's map values, organizing them into clusters in the 2-D space. These clusters were used to group the synergy vectors contained in *W***all**, and the representative basis vectors *W***basis** were calculated as the average *W* within each cluster.
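Assuming a scipy environment in place of the Matlab toolbox, this clustering step can be sketched as follows (Ward linkage on the 2-D Sammon coordinates, then averaging the members of each cluster in the original 8-D space; names are ours):

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def cluster_synergies(sammon_xy, W_all, n_clusters):
    """Ward hierarchical clustering of the 2-D Sammon coordinates
    (sammon_xy, k x 2); the representative basis vector of each cluster
    is the mean of its member synergy vectors in W_all (k x 8)."""
    Z = linkage(sammon_xy, method="ward")
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")
    W_basis = np.vstack([W_all[labels == c].mean(axis=0)
                         for c in range(1, n_clusters + 1)])
    return labels, W_basis
```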

### **SYNERGY ACTIVATION ANALYSIS**

After the clustering, we performed a Non-negative Reconstruction (NNR; Muceli et al., 2010) on all the consecutive cycles (60 on average for each trial) for each subject and each pedaling condition. NNR consists of applying NMF while keeping *W* fixed and letting *H* update at each iteration of the algorithm with the following rule:

$$H_{rc} \leftarrow H_{rc} \frac{(W_{\text{basis}}^T M)_{rc}}{(W_{\text{basis}}^T W_{\text{basis}} H)_{rc}} \tag{8}$$

where the indices *r* and *c* refer to the components of the matrices defined above, and *T* denotes the transpose. The temporal activation *H* for each component of *W***basis** provides information about the recruitment of that synergy within the trial, cycle by cycle, and was related to the index *IE* cycle by cycle. The amount of activation of a synergy was expressed as the area under the temporal profile of activation within each cycle. The ability of *W***basis** to reconstruct the repertoire of muscle activations from the set of all consecutive cycles for all the subjects was evaluated by computing *VAFi* for each muscle.
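A minimal numpy sketch of the NNR step and of the per-cycle activation area (our own naming, with cycles assumed time-normalized to 100 samples):

```python
import numpy as np

def nnr(M, W_basis, n_iter=1000, seed=0):
    """Non-negative reconstruction: keep W fixed at W_basis and update
    only H with the multiplicative rule of Eq. 8."""
    rng = np.random.default_rng(seed)
    H = rng.random((W_basis.shape[1], M.shape[1])) + 1e-9
    WtM = W_basis.T @ M                  # constant numerator factor
    WtW = W_basis.T @ W_basis
    for _ in range(n_iter):
        H *= WtM / (WtW @ H + 1e-12)     # Eq. 8, all entries at once
    return H

def activation_area(H, cycle_len=100):
    """Amount of activation of each synergy in each cycle, expressed as
    the area under its temporal activation profile."""
    n_cycles = H.shape[1] // cycle_len
    trimmed = H[:, :n_cycles * cycle_len]
    return trimmed.reshape(H.shape[0], n_cycles, cycle_len).sum(axis=2)
```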

### **STATISTICAL ANALYSIS**

Differences between force profiles, indices of effectiveness, muscle activation patterns, and muscle synergy activation coefficients were assessed by a one-way ANOVA with condition (PPT vs. EPT) as factor, with statistical significance set at *p* < 0.05. All differences were evaluated in four different sectors of the pedaling cycle, defined according to Hug et al. (2008) and roughly corresponding to the following: the 1st sector is around TDC, the 2nd sector refers to the down-stroke phase, the 3rd sector roughly corresponds to the region around BDC, and the 4th sector is related to the up-stroke phase; the corresponding EMG sectors were defined by taking into account the electromechanical delay (Conforto et al., 2006; Hug et al., 2008).

To check that each obtained NNR reconstruction is significantly different from that expected by chance, we used a procedure similar to that of Cheung et al. (2012): for each reconstruction, 100 random synergy matrices were generated by shuffling the muscle components of each *W***basis** vector, and the resulting chance-level *VAF***shuffle** values were compared with the reference *VAFi* reconstruction values.
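The chance-level baseline can be generated by permuting the muscle weights inside each basis vector (a sketch under our own naming; the shuffled matrices would then be fed to the same NNR/VAF pipeline):

```python
import numpy as np

def shuffled_bases(W_basis, n_shuffles=100, seed=0):
    """Chance-level synergy matrices: each column (basis vector) of
    W_basis has its muscle components independently permuted, preserving
    the weight values but destroying their assignment to muscles."""
    rng = np.random.default_rng(seed)
    return [np.column_stack([rng.permutation(W_basis[:, j])
                             for j in range(W_basis.shape[1])])
            for _ in range(n_shuffles)]
```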

### **RESULTS**

In this section, the pedal force profiles are shown for the two pedaling conditions, and the corresponding changes in the EMG profiles and underlying muscle synergies are reported as potential signs of neuromuscular adaptation to an altered pedaling technique.

### **MODIFICATION OF THE FORCE PROFILES**

When passing from the PPT condition to the EPT one, there is a significant improvement in *IE* (*IE*PPT = 0.41 ± 0.09, *IE*EPT = 0.68 ± 0.14, *p* < 0.005). As outlined in **Figure 3A**, the feedback of mechanical effectiveness helps the subjects to effectively orient the forces on the pedal, leading the profile of *F*tg to follow the trend of *F*tot along the whole cycle, whereas this happens only in the down-stroke phase during the PPT condition. The effective force application mainly consists of a reduction of the average *F*tot value along the first half of the pedaling cycle and an increase in *F*tg during the second half (*p* < 0.005); this action is associated with a reduction in amplitude of the radial force component profile *F*rd along the whole cycle, as shown in **Figure 3B**, and is reflected in the instantaneous index of effectiveness (*IEi*), which shows an increased level in the second part of the cycle, in particular for sectors 3 and 4 (**Figure 4**).

**FIGURE 3 | Modifications in the force profiles when changing the pedaling technique through** *IE* **feedback.** Upper panels **(A)**: Dashed lines represent tangential components, continuous lines represent total force components, and bold black lines are the profiles averaged across subjects. Different colors refer to different subjects. PPT condition, left panel: *F*tg follows the trend of *F*tot only in the down-stroke phase. EPT condition, right panel: subjects improve their mechanical effectiveness, projecting the forces in such a way as to lead *F*tg to approach *F*tot also during the pull-back and pull-up phases. Lower panels **(B)**: Radial force components in the two conditions. PPT condition, left panel: the distribution of the dissipated forces is spread over a wide range of values. EPT condition, right panel: radial components are reduced to a narrower range around 0 N, highlighting the improvement in the pedaling strategy. Black lines represent average *F*rd profiles.

# **NEUROMUSCULAR ADAPTATIONS**

The change in the distribution of the force profiles described in the previous section is accompanied by adaptations in the activation of the individual muscles, particularly evident as an increased level of activity for BF, GAM, RF, and TA (**Figure 5**), a reduction in the activity of the mono-articular knee extensors (VAM and VAL) in the 1st sector, and an increased tonic activity of the Gmax and SOL muscles (statistically significant values are reported in **Table 1**).

During the cycle-by-cycle synergy extraction, from 3 to 5 synergies were extracted and used to populate the matrix *W***all**. Seven clusters were identified from the 2-dimensional Sammon's projections of *W***all** pooled across the two pedaling conditions (**Figure 6**). The corresponding centroids *W***basis** in the 8-dimensional space capture the structured information in the data. When passing from PPT to EPT, the locations of the clusters on the map remain stable; two zones (red and green in **Figure 6**) were more populated in EPT, meaning that additional basis vectors are explored in the EPT condition.

Synergy activation coefficients *H* (**Figure 7**) were obtained by applying NNR to each set of consecutive cycles (60 on average for each subject) while keeping *W***basis** fixed. The NNR allowed a reconstruction with mean *VAFi* values always higher than 0.9 for each muscle (except for Gmax, which nonetheless presented a satisfactory reconstruction level). All the obtained *VAF* values were significantly higher than those expected by chance from NNR with *W***shuffle** (*p* < 0.01 for each muscle, **Table 2**).

In the PPT pedaling condition only the first four modules (*W1–4*, **Figure 7**) showed a significant level of activity. The first module *W1* consists of the co-activation of the two mono-articular knee extensor muscles (VAM and VAL) and a bi-articular one (RF, which also crosses the hip joint), and it is active during the first part of the pedaling cycle. *W2* mainly consists of the activity of the two ankle plantar-flexors (SOL and GAM) together with Gmax, and is active within the first quarter of the cycle. *W3* involves the co-activation of two bi-articular knee flexors (BF and GAM) and is active around BDC. *W4* is composed of the activity of RF and TA and intervenes in the last quarter of the cycle, before TDC.

In the EPT condition the subjects showed an altered recruitment of the synergies active in PPT and used additional synergies belonging to *W***basis**. *W1* and *W2* are less active (*p* < 0.05, **Table 1**), while *W3* and *W4* show an increased level of activation within the functional sectors in which they are recruited (*p* < 0.005). The change in the distribution of the values on the Sammon's map consists in the activation of two additional modules. *W5* consists of the co-activation of the knee flexor muscles and TA, and is active within sectors 3 and 4. *W6* occurs just before TDC and is mainly composed of a merging of synergies *W1* and *W4* (RF, vastii, and TA). *W7* mainly reflects the tonic activity of Gmax and SOL during the EPT condition.

# **SYNERGY ACTIVATION COEFFICIENTS AND MECHANICAL EFFECTIVENESS**

The activation of synergies *W3*, *W4*, and *W5* is related to the change in mechanical effectiveness (correlation values in **Table 3**). These are the only synergies consistently correlated with *IE*, as they show a low coefficient of variation *CV* (**Table 3**), meaning that their behavior is consistent across subjects.

# **DISCUSSION**

The obtained results seem to support the existence of modular motor control in humans, with a few muscle synergies shared among different subjects and able to reconstruct the variable muscle activation repertoire shown under different pedaling conditions. Pedal force measurements, together with the use of a visual feedback of mechanical effectiveness, allowed a controlled change in the pedaling strategy, which resulted in the ability to orient the pedal forces in a direction almost completely tangential to the circle spanned by the pedal, thus confirming the validity of the protocol.

When the subjects chose their PPT, they adopted a strategy mainly based on the propulsive action during the down-stroke phase (TDC–BDC, 0–180◦), where the tangential component of force almost coincides with the total force. *F*tg becomes negative during the second part of the cycle, meaning that the action of the leg slightly opposed propulsion, so that most of the propulsion was generated by the down-stroke action of the other leg. In addition, a dissipated radial force component *F*rd was present over the whole cycle. The obtained values of *IE* are in line with previous studies measuring mechanical effectiveness during pedaling with a self-selected strategy (Sanderson, 1991; Mornieux et al., 2006; Zameziati et al., 2006).

**Table 1 | Neuromuscular adaptations passing from PPT to EPT, quantified as the average amount of muscle activity, expressed as the area under the profile of the muscle activations and of the synergy activation coefficients.**

*Force profile modifications refer to the change in the mean value in each sector. A statistically significant difference in the amount of activation and in the force profile (p < 0.05) is indicated by \*. Dark gray rectangles indicate that the muscle/synergy is not active in that particular sector.*

When the subjects adopted an effective strategy, *F*tg tended to follow the profile of *F*tot also in the second part of the cycle (BDC – TDC, 180–360◦), and this action was accompanied by a reduction in *F*tot. This behavior can be associated with the reduction of the radial force component resulting in a significant increase of the index of mechanical effectiveness *IE*.

# **ADAPTATION IN THE MODULAR CONTROL OF PEDALING ACROSS DIFFERENT PEDALING TECHNIQUES**

An episode-by-episode synergy extraction procedure and the subsequent clustering of the Sammon's non-linear projection allowed the identification of seven basis muscle synergy vectors. To satisfy the mechanical requirement, the subjects switched between the available modules to form different motor programs (Kargo and Nitz, 2003).

Pedaling with a low mechanical effectiveness was accomplished by using a modular muscle coordination mainly characterized by four muscle synergies which were able to account for most of the variance of the EMG data.

Passing to the mechanically effective technique (EPT) modified the mechanical demand, and this was accompanied by a modification in the muscle activation patterns with respect to the PPT condition, with additional modules activated to explain the variance of the data. We therefore speculate that these additional muscle synergies may represent a neural mechanism reflecting short-term adaptation, whereby the subjects tend to adopt the muscle coordination already learnt in PPT and add modules to achieve the imposed mechanical requirement.

**FIGURE 6 | Sammon's maps. Right upper panel**: Sammon projections of *W*all. Different colors refer to the different clusters identified on the map. Each point corresponds to the projection of a single synergy vector of *W*all. **Left upper panel**: 8-D average synergy vectors (mean + SD in figure) among the elements of *W*all belonging to the 2-D clusters identified on the map. **Lower panels:** Sammon's distributions related to the synergy vectors of *W*all extracted from the PPT (left panel) and EPT (right panel) conditions.

### **FUNCTIONAL INTERPRETATION OF THE MUSCLE SYNERGIES**

The structure of the extracted muscle synergies may be associated to different biomechanical sub-functions during the pedaling cycle (**Figure 7**):

*W1*, mainly consisting of knee extensor activity (VAM, VAL, RF), acts during the first part of the cycle and is key to power production during the down-stroke phase, when the knee joint passes from a flexed position (TDC) to an almost completely extended one (BDC).

*W2* involves the co-activation of SOL, GAM, and Gmax. The main action of the two ankle plantar-flexors (SOL and GAM) may be responsible for the ankle angle variations during the pedaling cycle. This synergy might thus contribute to the fine control of the ankle movement preparing the pull back phase.

*W3* is characterized by the activity of two bi-articular knee flexors (BF and GAM), and it starts just before BDC, when the knee joint begins its flexing action in the second part of the cycle.

*W4* is a synergy characterized by the co-activation of RF and TA, and it intervenes during the last quarter of the cycle, in the transition phase between up-stroke and down-stroke passing around TDC, during the hip flexion action, propelling the crank toward the end of flexion.

*W5* is a synergy specific to the EPT pedaling condition, mainly consisting of the co-activity of the knee flexors (BF and GAM), Gmax, and TA, and it may be responsible, together with *W3* and *W4*, for the pull-up action during the up-stroke.

*W6* appears in EPT and seems to consist of a merging of modules *W1* and *W4*. It is active during the last part of the cycle and may be related to an adaptation of the transition phase around TDC. *W7* clearly presents a tonic recruitment along the cycle, reflecting the tonic components of Gmax and SOL in EPT.

**FIGURE 7 | Synergy activation coefficients obtained as a NNR by using** *W***basis. Central column**: components of *W*basis. **Side columns**: synergy activation coefficients extracted for the reconstruction of the muscle activation patterns of the PPT (left column) and EPT (right column) conditions. Subjects switch between additional modules to accomplish the different mechanical requirements of the EPT pedaling condition.

*There is a statistically significant difference between the VAFi values obtained by reconstruction with Wbasis and the VAFi values obtained from shuffled versions of the original basis vectors.*

Passing from PPT to EPT, the activation coefficients *H*, obtained by NNR using *W***basis**, show some adaptations involving the amplitudes rather than the timings, which may reflect the satisfaction of the new mechanical requirements imposed by the feedback. This is particularly evident for synergies *#3* and *#4*, whose coefficients *H***<sup>3</sup>** and *H***<sup>4</sup>** show a significant increase that contributes to the modification of the orientation of *F*tg, leading to the improvement of *IE*. *W5* displays a level of activation comparable to that of the other synergies, meaning that its action contributes to the increase of the pedaling propulsion, in particular powering the pedal during the up-stroke phase. In contrast, *W1* and *W2*, which are active during the first part of the cycle, show a reduced activation, which may be due to the contribution of the other leg while pulling up.

*Mean* ± *SD for each activation coefficient is reported, together with the coefficient of variation CV. H3, H4, H5 are the only components to show a robust behavior across subjects, as indicated by the low CV value.*

The intervention of *W5* is in accordance with what was outlined in previous studies (Mornieux et al., 2010), where an increased activity of BF and TA was reported in elite cyclists pedaling with a feedback of mechanical effectiveness. Interestingly, these two muscles were also found to act in synergy in a mechanically altered pedaling task: they were co-active after a phase shift of the hamstrings activation during backward pedaling, in a phase-reversal of the main biomechanical functions (Ting et al., 1999). This supports the idea that *W5* is a module available for the accomplishment of cycling in different conditions.

### **CONTRIBUTION OF THE MODULES TO MECHANICAL EFFECTIVENESS**

By analyzing the cycle-by-cycle correlation between the synergy activation coefficients and *IE*, it emerges that the synergies showing an increased activity during EPT (i.e., *W3*, *W4*, *W5*) also show a significant correlation with *IE*, confirming their contribution to the change in pedaling technique.

# **EVIDENCE OF A BETWEEN-TASK SHARED MODULAR ORGANIZATION: THE CASE OF HUMAN WALKING**

The fascinating hypothesis that some muscle synergies may be task-specific while others are shared between different tasks (d'Avella and Bizzi, 2005) is interestingly supported by our study. In particular, 4 of the 7 synergies identified here (*W1*, *W2*, *W4*, *W5*) are highly similar to those extracted in other studies of human walking, but they are recruited in a different order during the movement cycle of the two tasks (**Figure 8**). The similarity is more evident when the comparison is carried out with studies using the same decomposition technique (Neptune et al., 2009; Clark et al., 2010), so that a common interpretation can be drawn. According to the modules extracted in those studies, *W1* (knee extensors) is active in walking during the early stance phase, providing body support; *W2* (ankle plantar-flexors and gluteus) intervenes during late stance and contributes to swing initiation; *W4* (RF and TA) is recruited just before stance and provides dorsiflexion during and just after heel strike; *W5* (hamstrings and TA) decelerates the leg at the end of swing.

It is worth noting that two quite different tasks like walking and cycling, which involve essentially the same lower-limb segments, share some modules, but activate them differently to satisfy the current task requirements in terms of both kinematics and kinetics. While *W3* (BF and GAM) seems to be specific to a task like cycling, *W5* seems to be shared with walking but appears in cycling only when the biomechanical requirement changes, providing further evidence that a small set of motor modules can account for a variety of motor tasks through a simple selective activation and combination of modules. This aspect, related to motor adaptation, is in accordance with the theoretical functioning of a modular controller, since the use of an already existing module allows a faster adaptation to a perturbation of the task that is likely to be compatible with the modules (d'Avella and Pai, 2010).

# **ASPECTS RELATED TO TRAINING AND REHABILITATION**

With respect to the study carried out on trained cyclists (Hug et al., 2010), where 3 synergies were extracted, here we extracted 4 synergies in the PPT condition. Beyond the possible effect of the EMG processing techniques and of the criterion used to choose the number of modules, a possible explanation could be the different power output expressed by the two studied populations, since a higher power output would increase the signal-to-noise ratio and would lead to a reduced number of synergies explaining a higher amount of *VAF*. An alternative hypothesis is that the difference in the number of synergies may be due to a reorganization of the recruitment of the modules in trained subjects (Chapman et al., 2008), mainly consisting of the simultaneous recruitment of modules *W2* and *W3*; this may be a sign of the differences between the two studied populations of cyclists. Even though the merging of motor modules has been observed in stroke patients, where it explained the main biomechanical impairments during upper and lower limb movements (Clark et al., 2010; Cheung et al., 2012), it is not yet known whether a simultaneous recruitment of separate motor modules is feasible in healthy conditions, nor whether it can lead to improved performance in terms of metabolic or muscular efficiency as a consequence of expertise.

The possible spatio-temporal reorganization of the modules could be studied for the functional evaluation of cycling performance in both healthy and pathological conditions; for example, by relating the adaptations in modularity to the evolution of physiological factors such as muscular efficiency (Zameziati et al., 2006) or muscle fatigue (Theurel et al., 2011), or, in a neuro-rehabilitation program based on cycling (Ambrosini et al., 2012), by relating the changes in modularity to the changes in the mechanical outcome of movement.

### **POSSIBLE METHODOLOGICAL LIMITATIONS**

The methodological approach used in the present study, consisting of the extraction of muscle synergies on an episode-by-episode basis, is subject-specific and able to highlight intra-subject variation in muscle synergies. Nevertheless, it may not be able to characterize the behavior of the participant sample as a whole, since it might fail to capture common features that could emerge only by decomposing data pooled across subjects and conditions.

Another possible limitation lies in the use of the synchronous synergy model: in fact, it is not yet known whether the application of the time-varying synergy model could extract features otherwise not accessible when studying cyclic movements of the lower limbs.

# **CONCLUSIONS**

Our results provide further evidence that the motor system might rely on the combination of a reduced number of motor modules for the control of movement. A small number of synchronous muscle synergies, scaled in amplitude and adjusted in time, are able to account for most of the variance of the EMG data. These modules are shared among subjects and across modifications in the mechanical requirements for the execution of the pedaling gesture imposed by the feedback, with the main adaptations occurring in those modules deemed responsible for a particular biomechanical sub-function (i.e., pulling up during the up-stroke phase).

Adapting to a new pedaling technique imposed by the feedback seems to be accomplished by exploring an already learnt modular structure, which is not pedaling-specific but is largely shared with the one generally found in human gait. This aspect opens further perspectives in neuro-rehabilitation, e.g., the inclusion of cycling-based programs for the functional recovery of pathologic gait.

With respect to the study carried out by Kargo and Nitz (2003), where it was shown that skill learning is achieved by increasing the probability of selecting the most efficacious motor programs, our study considered only a very short time slot of exercise, so that an immediate effect of training on a possible tuning of muscle synergies is not visible. Based on these observations, further studies should analyze the effect of short or long periods of training with biofeedback on the structure of muscle synergies in cycling, in order to establish whether a modification in modularity occurs by altering the structure of the synergy vectors or by selecting different motor programs. To this aim, and in order to overcome possible limitations of the present study, inter-limb coordination should also be taken into account.

# **ACKNOWLEDGMENTS**

This work has been partially funded by the Italian Ministry of Higher Education and Research. We would like to thank the reviewers for their constructive comments, which contributed to improving the scientific content of the work.

# **REFERENCES**

Bibbo, D., Conforto, S., Bernabucci, I., Schmid, M., and D'Alessio, T. (2009b). "A wireless integrated system to evaluate efficiency indexes in real time during cycling," in *4th European Conference of the International Federation for Medical and Biological Engineering, IFMBE Proceedings*, Vol. 22 (Antwerp), 89–92.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 19 December 2012; accepted: 28 March 2013; published online: 17 April 2013.*

*Citation: De Marchis C, Schmid M, Bibbo D, Castronovo AM, D'Alessio T and Conforto S (2013) Feedback of mechanical effectiveness induces adaptations in motor modules during cycling. Front. Comput. Neurosci. 7:35. doi: 10.3389/fncom.2013.00035*

*Copyright © 2013 De Marchis, Schmid, Bibbo, Castronovo, D'Alessio and Conforto. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

# Distinguishing synchronous and time-varying synergies using point process interval statistics: motor primitives in frog and rat

#### *Corey B. Hart<sup>1,2</sup> and Simon F. Giszter<sup>1</sup>\**

*<sup>1</sup> Neurobiology and Anatomy, Drexel University College of Medicine, Philadelphia, PA, USA <sup>2</sup> Lockheed Martin Corporation, Philadelphia, PA, USA*

### *Edited by:*

*Andrea D'Avella, IRCCS Fondazione Santa Lucia, Italy*

### *Reviewed by:*

*Stefano Panzeri, Italian Institute of Technology, Italy Dominik M. Endres, HIH, CIN, BCCN and University of Tübingen, Germany*

### *\*Correspondence:*

*Simon F. Giszter, Neurobiology and Anatomy, Drexel University College of Medicine, 2900 Queen Lane, Philadelphia, PA 19129, USA. e-mail: sgiszter@drexelmed.edu*

We present and apply a method that uses point process statistics to discriminate the forms of synergies in motor pattern data, prior to explicit synergy extraction. The method uses electromyogram (EMG) pulse peak timing or onset timing. Peak timing is preferable in complex patterns where pulse onsets may be overlapping. An interval statistic derived from the point processes of EMG peak timings distinguishes time-varying synergies from synchronous synergies (SS). Model data show that the statistic is robust under most conditions. Its application to both frog hindlimb EMG and rat locomotion hindlimb EMG shows that data from these preparations are clearly most consistent with synchronous synergy models (*p* < 0.001). Additional direct tests of pulse and interval relations in frog data further bolster the support for synchronous synergy mechanisms in these data. Our method and analyses support separated control of rhythm and pattern of motor primitives, with the low-level execution primitives comprising pulsed SS in both frog and rat, across both episodic and rhythmic behaviors.

**Keywords: synergy, primitives, synchronous synergy, time-varying synergy, point process**

# **INTRODUCTION**

The efficient control of an organism's motor architecture poses significant difficulties for the central nervous system. In particular, control of the limbs is an ill-posed problem: too many possible solutions are available to perform a particular motion for the nervous system to find the correct combinations of muscles in a timely manner. As a solution to this problem, a variety of modular control strategies have been presented (Giszter et al., 1989, 1991, 1992, 1993, 2007, 2010a,b; Mussa-Ivaldi et al., 1990, 1994; Bizzi et al., 1991, 1992; Mussa-Ivaldi, 1992; Mussa-Ivaldi and Giszter, 1992; Giszter and Kargo, 2000, 2002; Kargo and Giszter, 2000a,b; Mussa-Ivaldi and Bizzi, 2000; Mussa-Ivaldi and Solla, 2004; Cheung et al., 2005, 2009; d'Avella and Bizzi, 2005; d'Avella et al., 2006; Tresch et al., 2006; Bizzi et al., 2008). Modular control of motor structures reduces the number of independent points of control for the system and therefore reduces the number of degrees of freedom available in the execution of a movement.

### **TYPES OF MODULARITY**

Specifying that the motor system employs a modular control scheme does not, in the end, tell us very much. There are many different kinds of motor modularity. For example, some groups have found evidence of "kinematic modularity" or regular, repeated structure in movements or the planning level of movements (Viviani and Terzuolo, 1982; Hogan, 1984; Sosnik et al., 2004; Chiovetto et al., 2010; Omlor and Giese, 2011). In contrast, and perhaps as complement to kinematic modularity, there is execution modularity, or modular organization in the control mechanisms underlying performance of a particular task or set of tasks. Many examples of execution modularity have been reported in recent years and include such examples as central pattern generators (Grillner, 2006), half center oscillator models, blends (Stein et al., 1986; Stein, 1989), motor primitives (Giszter et al., 1991, 1993; Hart and Giszter, 2004, 2010), and time-varying synergies (d'Avella et al., 2006).

We are interested in examining two of these forms of execution modularity in detail. The first, synchronous synergies (SS) (Hart and Giszter, 2004, 2010; Kargo et al., 2010), are built from synergistic groups of muscles activated with a fixed time course. Work done both in our lab and in several others supports the observation that most movements generated by an unconstrained frog can be represented as the summation of multiple motor-primitive-style elements (Bizzi et al., 1991; Giszter et al., 1992; Mussa-Ivaldi and Giszter, 1992; Giszter et al., 1993; Mussa-Ivaldi and Bizzi, 2000). However, some researchers have advanced time-varying synergies as an alternative to the motor primitive model (d'Avella et al., 2006). In this model, temporally coordinated (but not necessarily synchronous) drives are supplied to groups of muscles. These time-varying synergies are thought to form sequence units. Kinematic strokes form modules, and the time-varying synergy (TVS) might be thought to correspond to such higher order task units. Such time-varying drives, as formulated in theory, may be dilated uniformly across their temporal duration, as required by the task. In the TVS model described above, there is a strong connection between the duration of a sequence of pulses on several muscles and the temporal widths of the pulses supplied to those muscles, i.e., uniform temporal scaling across the entire motor compositional unit.

To distinguish these two frameworks, "spatial/synchronous muscle synergies" and time-varying synergies, we have examined how these models constrain the onset and peak timings of muscle activity viewed as point processes. To do this we construct models of point processes reflecting both synchronous and TVS schemes, and we show that distinct point process statistics arise. We then test the hypothesis that real electromyogram (EMG) activity recorded in frogs resembles activity generated by a synchronous synergy model of motor production rather than by a TVS model, and examine the degree to which statistics computed on real EMG data from 10 hindlimbs of the bullfrog resemble the statistics computed on each model.

There have been many metrics and procedures developed for quantifying the level of synchronous activity between sets of point processes. Many of these statistics have attempted to quantify synchronicity in terms of metrics computed between pairs of spike trains, such as cross-correlograms and joint peristimulus time histograms (Ellaway and Murthy, 1985; Adams et al., 1989; Datta and Stephens, 1990; Nordstrom et al., 1990, 1992; Bremner et al., 1991a,b; Datta et al., 1991; Ushiba et al., 2002), although most of these metrics show significant sensitivity to the density of events within measured intervals (Bremner et al., 1991a,b; Kim et al., 2001). Higher order synchronization (between three timestamps) has been examined and quantified using such tools as the snowflake plot (Perkel et al., 1975; Czanner et al., 2005). However, statistics on synchronicity for time series of arbitrary dimensionality are lacking. Since the number of muscles participating in synergies is not necessarily fixed, this lack represents a problem for comparison of time series representing the activation of differing types of multi-muscle synergies. In light of these difficulties, in order to make comparisons between such processes, we have developed a metric that shows sensitivity to the degree of onset or peak synchronization over a wide range of parameters. We have performed extensive testing of this statistic using data from models of both synchronous motor synergy strategies and TVS strategies. We then apply this statistic to frog motor pattern data.

# **METHODS**

All experimental data from frogs and rats used in this paper for exposition was obtained under strict compliance with USDA and PHS guidelines, and with full oversight of the Drexel University College of Medicine IACUC.

### **DATA CONSTRUCTION**

Both real data and modeled data were used and tested in the discriminant analysis developed here.

### *Real EMG data collection*

Real EMG data were derived from 10 spinalized frogs, with recording electrodes in 10 hindlimb muscles (RA, RI, AD, SM, GL, VI, BI, SA, VE, ST). Frogs were anaesthetized, spinalized, and decerebrated. Ball electrodes, constructed as in Hart and Giszter (2004), were implanted in these 10 muscles of the hindlimb. After at least a day of recovery, EMG recordings were made at 2 kHz during a variety of frog behaviors. EMGs were recorded using a Cerebus 128-channel data acquisition system (Blackrock Microsystems) and saved to file. Files were imported into the MATLAB™ programming environment for analysis. Imported EMG data were rectified, smoothed using an 80-point triangular-window moving average filter, and downsampled to 250 Hz. The data were then saved to a MATLAB MAT-file and retained for later analysis.
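The preprocessing chain just described (rectification, 80-point triangular smoothing, downsampling from 2 kHz to 250 Hz) can be sketched as follows; the original analysis was done in MATLAB, so this NumPy version, including the function name and defaults, is our own illustrative stand-in:

```python
import numpy as np

def preprocess_emg(raw, win_len=80, down_factor=8):
    """Rectify, smooth with an 80-point triangular moving-average window,
    and downsample (2 kHz -> 250 Hz when down_factor = 8)."""
    rectified = np.abs(raw)
    win = np.bartlett(win_len)      # triangular window
    win /= win.sum()                # normalize to a unit-sum moving average
    smoothed = np.convolve(rectified, win, mode="same")
    return smoothed[::down_factor]
```

For a 2 kHz trace of 2000 samples this returns a 250-sample, 250 Hz envelope that is nonnegative by construction.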

*Basic model data generation parameters.* The basic model was defined as a renewal process, constructed on intervals of variable duration Δ, ranging between 90 and 250 ms. Every Δ = 90–250 ms, an interval value was drawn from a Poisson interval distribution (exponential), and a pulse or sequence of pulses was placed at the cumulative sum of intervals up to that draw. The distribution of intervals constructed in this manner was approximately Poisson, with a maximum interval cutoff below 2 × Δ ms (because every interval is populated with a draw). There was also some distortion away from a true Poisson distribution at the longer time scales, due to the fact that the Poisson process "reboots" at the edge of every Δ-ms interval. We tested and compared several different point process distributions; however, these did not affect our basic discrimination findings, and here we focus entirely on the Poisson process results.
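A minimal sketch of this renewal construction, assuming one exponential draw per Δ-wide window with the offset capped at Δ so that every window is populated and no inter-event interval exceeds 2 × Δ (the function name and the fixed Δ = 150 ms default are our assumptions):

```python
import numpy as np

def renewal_event_times(n_windows, delta=0.15, mean_interval=0.1, rng=None):
    """One event per delta-wide window, at an exponentially drawn offset
    capped at delta; successive intervals therefore stay below 2 * delta,
    and the process "reboots" at every window edge."""
    rng = np.random.default_rng(0) if rng is None else rng
    offsets = np.minimum(rng.exponential(mean_interval, size=n_windows), delta)
    return np.arange(n_windows) * delta + offsets
```

The capping at the window edge is what produces the distortion away from a true Poisson interval distribution at long time scales.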

### **DISCRIMINANT METHOD**

We examined the effect of varying several synergy parameters on the subsequent discriminability of synchronous and timevarying synergies from one another using point process model statistics, and to distinguish the sum of time intervals between local maxima in each type of time series. These parameters were the density of synergies on an interval (ρ), the number of muscles per synergy (σ) and the number of simultaneous muscles in TVS models (η). We do not assume any *a priori* knowledge of σ in real-world data, although we do know that the majority of primitives examined in our work generally contribute significantly to between ∼2 and 4 muscles (Hart and Giszter, 2004), and because synergistic groupings with more or less muscles are possible.

Two distributions were used to model σ. The first distribution was sharply peaked and σ was permitted to vary between 2 and 10 muscles, with a maximum likelihood occurring at around 3 muscles. A second flat distribution, in which all values of σ are equiprobable, was also used. Other parameters, such as pulse amplitude were not expected to have a significant effect, based on the structure of the onset or peak-picking algorithm (the peak detection algorithm identifies all peaks above the noise threshold based on their resemblance to a template peaked waveform ) but were examined as well. For each model run, the total number of synergies was selected, at random, from a Gaussian distribution with a mean and median of ∼4 synergies/run, omitting negative results.

### *Synchronous synergy model construction (SS model)*

Modeled data were constructed using the two different models (synchronous vs. time-varying) and used as the main poles of comparison for real data in this discrimination task.

The first model (**Figure 1A**) represented an idealization of SS or motor primitives. This model consisted of several simultaneous pulses delivered to a subset of channels ("muscles") and multiplied by a channel-specific gain. The EMG was constructed on intervals of 500 ms and the density (ρ) of possible synergies delivered on a single interval was varied between 0 and 5. Each simulated EMG would be constructed of several distinct synergies. The number of distinct synergies to be used in the entire constructed data set was selected and allowed to vary stochastically between different realizations, with care being taken not to include any duplicate synergies (as an example of this constraint, if there were 8 muscles in one synergy, then only 2 non-identical and non-overlapping or disjoint synergies could be drawn: one synergy with 8 muscles and a second with the remaining 2 muscles). We then created and drew from a distribution representing the number of possible muscles (σ) participating in each of these synergies. As mentioned in the preceding section, this distribution was permitted to be either a peaked function (reaching a maximum between 2 and 4 muscles per primitive, consistent with observed data) or a flat function where primitive muscle membership numbers were all equiprobable. For each synergy, we then randomly selected "muscles" from the 10 available muscles.

SS synergies were modeled as occurring in episodic motor patterns. After a time of occurrence of the first SS synergy (the episode onset) was chosen, the other SS synergies were positioned to follow it. To accomplish this, the (randomly selected) following SS synergies in the motor pattern were assigned random onset times on the 250 ms interval following the first "seed" synergy event of the episode. This created a point process representation of the motor pattern under the SS model. A continuous signal was then constructed for each synergy's component muscles, and these were combined to obtain the different activity patterns on each virtual EMG channel. More specifically, we proceeded as follows: a Gaussian pulse with a time course of around 300 ms, forming the "seed synergy", was placed at each of the point-process-sampled motor pattern event times. Each synergy pulse occurring on the interval was shifted from the "seed synergy event" by a small amount drawn from an exponential distribution, with a mean value of 250 ms and a median of 153 ms. Given that several synergies may in this way be invoked on the same interval, several pulses may overlap in a motor pattern "event" interval; in this case, the amplitudes of each pulse of an individual muscle's activity occurring in overlapping synergies were summed. After the synergies were positioned in time, the EMG signals were calculated in this way. White noise was then added to all EMG channels of the model, with a maximum noise amplitude of 5% of the maximum value of the clean signal. Amplitudes of each synergy pulse were assumed to be either the same, or drawn from a normal distribution with a mean value of 0.5 and truncated at amplitude values of 0 and 1. Amplitude effects on the final results were negligible.
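A compact SS-style generator in this spirit might look like the following sketch; the function names, the jitter standard deviation, and the use of NumPy are our illustrative choices, not the authors' code:

```python
import numpy as np

def gaussian_pulse(t, center, width=0.3):
    """Gaussian pulse with a ~300 ms time course (width = 6 sigma)."""
    return np.exp(-0.5 * ((t - center) / (width / 6.0)) ** 2)

def synth_ss_emg(event_times, synergies, gains, duration=10.0, fs=250, rng=None):
    """Synchronous-synergy sketch: at each event a randomly chosen synergy
    fires one near-simultaneous pulse on each of its muscles, scaled by a
    muscle-specific gain; overlapping pulses sum, then 5% noise is added."""
    rng = np.random.default_rng(1) if rng is None else rng
    t = np.arange(int(duration * fs)) / fs
    emg = np.zeros((gains.size, t.size))
    for ev in event_times:
        syn = synergies[rng.integers(len(synergies))]
        jitter = rng.normal(0.0, 0.005)          # small shared onset jitter (s)
        for m in syn:
            emg[m] += gains[m] * gaussian_pulse(t, ev + jitter)
    if emg.max() > 0:
        emg += 0.05 * emg.max() * rng.random(emg.shape)   # 5% white noise
    return t, emg
```

The defining SS property appears in the inner loop: every muscle in the chosen synergy receives the same pulse time, so peak times across channels are (near-)synchronous.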

### *Time-varying synergy model construction (TVS model)*

The second model (**Figure 1B**) used in this analysis was constructed to simulate various instantiations of a TVS model of motor control. A single pulse event was created and placed on a 250 ms interval at a randomly drawn time. Delays of the remaining pulse events from the drawn time ranged from −100 to 100 ms and were sampled from an exponential distribution with a mean of 100 ms (the sign of the delay was selected randomly). These shifted pulse events were then delivered to a subset of channels ("muscles") and multiplied by a channel-specific gain. A small number of such TVS were then randomly selected to form the set used in an individual realization, by repeating a randomly selected member of the set at the (Poisson distributed) synergy event times. Each such constructed synergy occurrence was scaled in time in its overall duration by a factor drawn from an exponential distribution with a mean of 2. This created a point process representation of motor pattern variations under the TVS model. Apart from these changes, the TVS model continuous signal representation was constructed as in the SS case. TVS models could thus range from models in which no synchronous drives were delivered to any muscles, to models in which synchronous drives were supplied to many muscles in the TVS while other muscles in a synergy were asynchronously activated. This latter variant could in principle be thought of as a TVS containing within it some number of SS.
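The point-process side of the TVS construction can be sketched as below, assuming signed exponential delays clipped to ±100 ms and a per-occurrence exponential dilation factor; the names and the fixed event-rate default are ours:

```python
import numpy as np

def synth_tvs_peaks(n_events, n_muscles=10, mean_delay=0.1, rng=None):
    """Time-varying synergy sketch: one pulse per muscle, offset from the
    synergy onset by a signed exponential delay clipped to +/-100 ms; each
    occurrence dilates the whole delay pattern by an exponential factor."""
    rng = np.random.default_rng(3) if rng is None else rng
    delays = rng.exponential(mean_delay, n_muscles) * rng.choice([-1.0, 1.0], n_muscles)
    delays = np.clip(delays, -0.1, 0.1)
    onsets = np.cumsum(rng.exponential(0.5, n_events))   # synergy event times
    scales = rng.exponential(2.0, n_events)              # per-occurrence dilation
    peaks = onsets[:, None] + scales[:, None] * delays[None, :]
    return onsets, peaks   # peaks: (n_events, n_muscles) pulse peak times
```

In contrast with the SS sketch, peak times here are spread around each synergy onset, and the spread itself stretches with the dilation factor, which is exactly what the Q statistic below is designed to detect.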

### *Discriminant analysis of real and model data*

Simulated data from both TVS and SS models were compared with rectified, filtered, and downsampled examples of real data from frogs.

Peak/onset times were extracted from each channel of real or simulated EMG by thresholding the EMG at 2 standard deviations above its mean value and identifying all points which exceeded this value. We then used a sliding Gaussian pulse of duration 250 ms to discriminate which values over the predefined threshold represented peaks in the recorded data. We chose this peak width as it most closely reflected the dominant time scale observed in EMG recordings in bullfrog (Hart and Giszter, 2004). This peak discrimination process served simply to subject both real and artificially created data to a similar workflow, including any errors introduced in peak or onset selection. Peaks were identified as those points over the amplitude threshold where the correlation with the sliding Gaussian pulse is more than 2 standard deviations over the mean correlation value (**Figure 2A**). This criterion nearly always (>95% of the time) found the correct local maximum in a sequence of data. We then chose the times taken from one chosen sample channel of EMG as "reference times" to be used in the ensuing analysis. The times of peak occurrence on the other channels were identified via the thresholding algorithm described above, and the differences from the reference times were calculated on each successive 250-point time window defined around each reference time. Peak time differences were rank ordered within each window, and the mean time difference on the intervals was calculated.
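The two-criterion peak picking (amplitude threshold plus template-correlation threshold) can be sketched as follows; the zero-mean template and local-maximum test are our reading of the procedure, not the authors' code:

```python
import numpy as np

def pick_peaks(emg, fs=250, tpl_dur=0.25):
    """Peak-picking sketch: amplitude-threshold at mean + 2 SD, correlate
    with a sliding 250 ms Gaussian template, and keep local maxima of the
    correlation that also exceed mean + 2 SD of the correlation trace."""
    n = int(tpl_dur * fs)
    tpl = np.exp(-0.5 * ((np.arange(n) - n / 2.0) / (n / 6.0)) ** 2)
    tpl -= tpl.mean()                      # zero-mean template -> correlation
    corr = np.convolve(emg, tpl[::-1], mode="same")   # sliding cross-correlation
    cand = (emg > emg.mean() + 2 * emg.std()) & \
           (corr > corr.mean() + 2 * corr.std())
    # retain candidate samples that are local maxima of the correlation trace
    return np.array([i for i in range(1, emg.size - 1)
                     if cand[i] and corr[i] >= corr[i - 1] and corr[i] >= corr[i + 1]])
```

On a clean two-burst test signal this recovers one peak index per burst, close to each burst center.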

A cumulative statistic Q was then created for the time window by summing time differences (**Figures 2B**,**C**) as follows:

$$Q = \frac{\sum_{t_{\mathrm{ref}}} \sum_{t_i} \left(|t_i - t_{\mathrm{ref}}|\right) \times \Theta\left(|t_i - t_{\mathrm{ref}}|\right)}{N} \tag{1}$$

**Figure 2** (caption). **(A)** On intervals of −250 to 250 ms, peaks in a rectified and smoothed set of EMG waveforms are identified by sliding a Gaussian waveform along the intervals and marking points of maximum correlation with amplitudes larger than a rejection amplitude (more than 2 SDs above the mean EMG activity). **(B)** Absolute differences are summed over that interval. **(C)** The same procedure is performed for all such non-overlapping intervals on the data set; the resultant Q values are rank ordered for the purpose of comparing distributions from different sets of data.

where Θ is a step function with value 1 when its argument is < 250 ms, and 0 otherwise; *ti* is the time of a peak in the window, *t*ref is the reference time, and *N* is the total number of pulse events in the analysis window.

The procedure for calculating Q was then repeated for the next time window in the record defined by the reference channel and reference times. We calculated Q values for each segment of the real data files, as well as each segment of both model implementations. This analysis was carried out on all runs of each model. Reference channels used were randomly assigned. In some instances all channels were tested as reference and compared; the choice of reference had little effect on the statistics. This statistic, Q, captures the deviation of the peak clustering in SS and TVS patterns from that observed or expected in an unconstrained random distribution of pulse times, constrained to neither SS nor TVS structure. The composite interval structures captured in the measure Q will deviate from predictions based on uncorrelated point process assumptions, due to the correlations and constraints on intervals imposed by synergy structures. For example, interval differences in unconstrained Poisson processes will effectively represent the result of a Poisson process of higher rate, reflecting the independent constituent process rates. In contrast, SS constraints enforce short intervals in the joint process, while TVS enforce longer intervals, and also short intervals to the extent that SS are found within the TVS sequences.
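Equation (1) translates directly into code. In this sketch (our naming), peak times are in milliseconds, Θ is realized as a boolean mask on the ±250 ms window, and N counts only the in-window pulse events:

```python
import numpy as np

def q_statistic(ref_times, other_times, window=250.0):
    """Cumulative interval statistic Q (Equation 1): summed absolute
    peak-time differences from each reference time, divided by the number
    N of pulse events falling inside the 250 ms analysis window."""
    diffs = np.abs(other_times[None, :] - ref_times[:, None])
    inside = diffs < window          # Theta(|t_i - t_ref|)
    n = inside.sum()                 # N: pulse events within the windows
    if n == 0:
        return 0.0
    return float(diffs[inside].sum() / n)
```

Tightly clustered (SS-like) peaks yield small Q; peaks spread over the window (TVS-like) yield large Q, which is the basis of the discrimination below.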

### *Discriminant scaling analysis of real data*

To compare models and data we used the cumulative distributions of the statistic Q. Given the likelihood that connected peaks in TVS data will be farther apart than in SS data, we expected that data consistent with a TVS model would have larger values of this statistic than data consistent with SS models. Given a set of randomly drawn reference times, and for any distribution of pulses on a given interval, the maximum likelihood (ML) of the absolute value of the difference between Q values computed from a set of reference times and the distribution of pulses will be nonzero. If data are described perfectly by an SS model, Q-values will cluster around the ML value for this model, more or less normally, by the central limit theorem. Q-values from data better described by a TVS model will cluster around the ML Q-value for TV synergies. Taking the difference between the Q values for each model (Qss, Qtvs) and the real data (Qreal), we arrive at two error distributions, both approximately normal.

We further anticipated that by rescaling (i.e., dilating) small time differences in EMG peak times in SS models, we will eventually obtain Q-statistics characteristic of, or similar to a TVS model, or a non-SS random pulse pattern.

Rescaling of synergies was done in constructed data as follows. During construction of simulated data sets, time-varying synergies were generated according to the procedure outlined above and were then rescaled by a variable factor. Each synergy was scaled independently before being added to the data. Consequently, synergies consisting of only a single muscle were not shifted in time, nor was the stochastic rate at which synergies were generated scaled, leaving the length of the time series intact. Constructed SS were scaled in an analogous fashion, although in this case the rescaling factor was applied to the small temporal jitter between pulses in a primitive (see the previous section for details of SS construction).

To scale real data for comparison, we first identified likely synergies by identifying near-synchronous pulses in different EMG channels. Any pulses with a time difference of less than 5 ms were selected as potential SS and retained for dilation and statistical analysis. Those collections of pulses that appeared more often than expected by random chance (more than 2 SD above the average frequency of occurrence calculated from 50 shuffles of the time indexes of all peaks detected in a record) were identified as "synchronous synergies" for the purpose of this analysis. The small jitter between the pulses in each such synergy was then scaled as above. Scaling was relative to the mean midpoint time for each synergy/collection of pulses, so early pulses were shifted backward and late pulses forward.
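The midpoint-relative dilation of a detected pulse group amounts to a one-line affine rescaling; this hypothetical helper (our naming) makes the backward/forward shifting explicit:

```python
import numpy as np

def dilate_synergy(peak_times, factor):
    """Rescale pulse jitter about the group's mean midpoint: early pulses
    shift backward, late pulses forward, by the given dilation factor."""
    center = peak_times.mean()
    return center + factor * (peak_times - center)
```

For example, dilating the group (100, 102, 104) ms by a factor of 5 keeps the midpoint at 102 ms while spreading the pulses to (92, 102, 112) ms.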

Since stretching intervals beyond 250 ms is impossible in our formulation (it invalidates the interval for the analysis by pushing timestamps into the next window), it follows that dilation should not significantly move the Q difference distribution of TVS data. In contrast, dilation of the SS model data was expected to yield cumulative Q values that eventually became more statistically similar to the original TVS Q values. Assuming real data are described by the SS model, subjecting the data to this procedure should yield Q distributions that are similar to TVS distributions. If real data are better described by the TVS model, one would expect little change in cumulative Q upon performing the dilation operation. With this in mind, we took the timestamps associated with peak times in real data and scaled the differences in timestamps over scale factors ranging from ×2 to ×20, calculating Q values at each scale step. Scaling the differences in timestamps and scaling the underlying waveforms made no difference, as the peak-picking algorithm identifies the same peak for a pulse regardless of its width. Differences between the median Q at each scale and the original unscaled median Q-value were retained.

### *Are primitive timing and dynamics related?*

We also used more data-driven techniques to further assess the likelihood that synergies in the bullfrog consisted of SS style constructions, which has been our working hypothesis in prior research. To do this, we performed two analyses on real frog EMG data. In keeping with the Gottlieb work showing triphasic activation of muscle groups during movements (Gottlieb, 1998), we examined timing relationships within triplets of pulses extracted from frog EMG data. We first performed regressions of pulse widths for each pulse in a triplet against both the pulse widths of the other members of that triplet, and against the time delay between pulse 1 and pulse 2, as well as between pulse 2 and pulse 3. In order to assess the main sources of variance to the resultant regression coefficients, we then performed a principal components analysis on these coefficients, and generated projection biplots to assess the relative significance of the contribution of each component. Additionally, we took the entire sequence of pulse widths and pulse time differences, treated each of these variables as covariates for an independent component analysis (ICA) in order to assess the relative independence of each of the variables. Additional details on this analysis are provided in the Results section below.
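The regression-plus-PCA portion of the triplet analysis can be sketched as below, under stated assumptions: `widths` is an (n, 3) array of pulse widths per triplet, `delays` an (n, 2) array of the pulse 1→2 and 2→3 delays, least squares stands in for the original regressions, and an SVD of the centered coefficient matrix stands in for the PCA; the ICA step is omitted. All names are ours:

```python
import numpy as np

def triplet_regression(widths, delays):
    """Regress each pulse width in a triplet on the other two widths and
    the two inter-pulse delays, then take principal components of the
    resulting regression-coefficient matrix to find dominant covariation."""
    n = widths.shape[0]
    coeffs = []
    for j in range(3):
        # design matrix: intercept, the other two widths, the two delays
        X = np.column_stack([np.ones(n), np.delete(widths, j, axis=1), delays])
        beta, *_ = np.linalg.lstsq(X, widths[:, j], rcond=None)
        coeffs.append(beta)
    C = np.array(coeffs)                     # (3 regressions) x (5 coefficients)
    Cc = C - C.mean(axis=0)                  # center before PCA
    _, s, vt = np.linalg.svd(Cc, full_matrices=False)
    return C, s, vt                          # coefficients, PC weights, PC axes
```

The singular values `s` indicate how much of the coefficient variance each principal axis captures, which is what the projection biplots in the text visualize.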

### **RESULTS**

### **DIFFERENCE OF Q STATISTIC FOR SS AND TVS MODEL DATA**

We constructed 25 time series for each parameter set as above (Poisson-interval distributed events) and calculated Q values on simulated TVS and simulated SS models for each sampling of these series. For a single draw from a distribution with six components, η = 1, and ρ = 1/90, we found that the Q distributions were easily distinguished from one another. The rank ordering of Q values from the SS and TVS distributions (**Figure 3A**), as well as the cumulative probabilities for each distribution (**Figure 3B**), were clearly and cleanly distinguishable. Additionally, we compared Q statistics from the distribution of uniformly distributed synergy events (see above) with those calculated using the Poisson interval distribution, for each of the chosen sets of parameter values. Q statistics could not be discriminated between the uniform and Poisson generators using paired *t*-tests (comparison of TVS statistics: *p* = 0.34; SS statistics: *p* = 0.75). Lowering ρ to 1/250 did not appreciably alter the situation.
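The rank-ordered and cumulative comparisons come down to empirical CDFs of the two Q samples; this small sketch (our naming; the significance testing in the text used paired *t*-tests, not shown here) illustrates the idea on toy values:

```python
import numpy as np

def ecdf(values):
    """Empirical cumulative distribution: sorted values and their ranks."""
    x = np.sort(np.asarray(values, dtype=float))
    return x, np.arange(1, x.size + 1) / x.size

# Illustrative, well-separated Q samples: an SS-like set (small Q) and a
# TVS-like set (large Q) produce cleanly non-overlapping ECDF supports.
q_ss = np.array([10.0, 12.0, 11.0, 9.0])
q_tvs = np.array([40.0, 42.0, 39.0, 41.0])
x_ss, F_ss = ecdf(q_ss)
x_tvs, F_tvs = ecdf(q_tvs)
```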

### **EFFECTS OF DIFFERENT PARAMETERS**

As previously mentioned, we recognized that varying the parameters σ, ρ, and η has the potential to alter our ability to discriminate the outcomes of these different models. Several parameters must be examined for their effect on the discriminability of TVS and SS models. First, the number of synergies must be considered. Second, each synergy may comprise a variable number of muscles, and the distribution of the number of muscles in a synergy may also have an effect on discriminability. Q values calculated on two-muscle synergies will tend toward smaller summed differences than those of five-muscle synergies, by the nature of the number of items measured and summed; this aspect of the statistic is unavoidable. Finally, the number of synergies on each interval used to calculate a Q value can impact the value of Q as well.

The number of synergies on an interval can have a drastic impact on the value of Q, as can the form of the TVS synergy itself. Therefore, we examined each 250 ms interval of our constructed data and counted the number of pulses on that interval.

### **EFFECT OF VARYING σ**

For small values of σ (the number of muscles per synergy), we find that the TVS, SS, and real local Q distributions are very hard to discriminate. Very similar Q values persist across all three models for a range of ρ values (**Figure 3C**), but begin to diverge clearly at a σ value of five or six muscles per primitive. For higher σ values, TVS and SS Q values are easily discriminated from one another, and real values are much more similar to SS values than to TVS values. The reason is easy to see. At small values of σ, only one or two muscles participate in a given synergy, so it becomes more difficult to say whether a given synergy is time-varying or not. At larger values, there are many pulses on an interval. For TVS, many of these pulses will be separated by significant time delays, resulting in a higher Q statistic; for SS, the pulses will be more tightly coordinated, yielding a lower Q statistic.

### **EFFECT OF VARYING ρ AND η**

Because σ (the number of muscles per synergy) tends to be around three or four muscles for both SS and TVS models as described in the literature, and since σ and η (the number of simultaneous muscles in a TVS model) will be somewhat coupled, we chose to examine the effect on model discriminability of varying ρ and η simultaneously.

For our purposes, we are classifying as "synchronous" only those synergy models in which all muscle activations in a given synergy are simultaneous (within some error δ). Synergy models containing two or more out of phase muscle activations (with the remainder occurring at variable times) will be considered, for the purposes of this study, as "time-varying."

Given this classification, we chose to examine how the number of simultaneous muscles in a TVS model interacts with the density of primitives on a 200 ms interval to affect the discriminability of TVS and SS models. We did this by identifying the points at which the TVS and SS distributions were maximally discriminable, a measure which coincides with the Kolmogorov-Smirnov (KS) statistic. KS statistics for the discriminability of TVS and SS models, as a function of these variables, are shown in **Figure 4**. TVS and SS models were constructed using both peaked (**Figure 4A**) and flat (**Figure 4B**) muscle/synergy distributions. For the peaked distribution, KS statistics were significant for a wide range of parameter combinations, and the number of simultaneous muscles within a synergy did not appear to significantly affect these differences even at the lowest synergy density. For the peaked distributions, the Q statistic failed to distinguish TVS and SS models only for high numbers of muscles (>4) in combinations that occurred rarely on an interval (black floor in **Figure 4C**); expanding the interval, or increasing the data set, might abolish this dead zone of discrimination. For the flat distribution of primitives, TVS and SS models were easily discriminable for all combinations of these variables.
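The KS-based discriminability measure can be illustrated with stand-in Q samples; the distributions below are invented for illustration and are not drawn from the paper's models.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(1)
# Stand-in Q samples for the two models (illustrative values only)
q_ss = rng.normal(0.05, 0.01, 200)   # tight, low Q values (SS-like)
q_tvs = rng.normal(0.25, 0.05, 200)  # broad, high Q values (TVS-like)

# The KS statistic is the maximum distance between the two
# empirical cumulative distributions; its p-value tests discriminability.
ks_stat, p_value = ks_2samp(q_ss, q_tvs)
print(ks_stat, p_value)
```

Well-separated Q distributions, as in Figure 3, give a KS statistic near 1 with a vanishingly small p-value; overlapping distributions (small σ, high η) drive the statistic toward chance.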

**Figure 4 (caption fragment).** ...from SS models at an alpha level of 0.001 for up to three simultaneous muscles in a TVS.

### **DISTANCE BETWEEN REAL DATA AND SS DATA vs. REAL DATA AND TVS DATA**

We next calculated Q values for five records of real EMG activity (see Methods). We then compared the middle of this Q distribution (Qreal) to the middles of the Q distributions calculated for SS (Qss) and TVS (Qtvs) models at each of the parameter values examined above, plotting the distance between Qreal and Qss against that between Qreal and Qtvs. Where the two model distributions are broadly discriminable, whether for synergy densities drawn from a peaked distribution (**Figure 5A**) or a flat distribution (**Figure 5B**), the distances between Qreal and Qss were smaller than those between Qreal and Qtvs (**Figures 5C,D**). Further, the real data fall *beyond* the model SS cumulative curve, rather than between the SS and TVS curves: the real data exhibited stronger SS statistics than our artificially created data. Accordingly, the cumulative distributions suggest that the evidence for the SS hypothesis in the real data is likely even stronger than the *p* < 0.001 calculated for the artificial data; however, we limit our assessment to this value here.

### **EFFECTS OF FORCING DILATION ON DATA AND Q STATISTICS**

The construction of the Q statistic provides an additional test for the presence of SS synergies, based on a dilation manipulation of the synergies' time scale within the analysis window. The analogy is to a higher-powered microscope field: the intervals are dilated, but some fall out of the field of view of the analysis window. Suppose a particular distribution of pulses were constituted primarily of time-varying synergies with a maximum time scale of less than 200 ms (reasonable for frogs, given that most movements are executed in under half a second). Then finding the time of each pulse with respect to some arbitrary reference time on each interval, and scaling these times by a constant dilation within each interval, should not impact the Q statistic significantly: in pure TVS synergies lacking SS components or non-synergy processes, this scaling would simply push a few of the pulses out of the analysis window used for computing Q values, so the Q value before dilation minus the Q value after dilation should often be near zero. In contrast, for an SS model it should be possible to dilate all the short pulse times (with respect to a reference time) while keeping them all in the analysis window. Because short intervals are highly associated with SS synergies (e.g., Krouchev et al., 2006; Markin et al., 2012), dilation within the analysis window affects SS strongly, first altering the Q statistic and then plateauing at the point at which pulses are pushed into the next computational interval. A new statistical measure is then obtained from the difference of the Q statistic after dilation minus that of the unaltered data. This difference should rise to a plateau at a particular scale value. We found that dilating the time intervals of real data does push the Q statistic for the dilated data toward the Q distributions for TVS data, and this is captured in the difference (**Figure 6A**). As the real data are dilated (**Figure 6B**), the cumulative distribution moves from its position beyond the synthetic SS model data and crosses over to lie between the SS and TVS cumulative distributions. Furthermore (**Figures 6C,D**), the difference between dilated and unaltered Q statistics in model data is close to zero for TVS synergies (the Q statistic cannot be significantly altered as the analysis intervals are dilated) but increases significantly for real data or SS model synergies. This observation holds for both peaked and flat synergy density distribution models.

**Figure 5 (caption fragment).** Linear discriminant separates Qss-Qreal (y axis) and Qtvs-Qreal (x axis). **(A)** Data drawn from the peaked distribution. **(B)** Cumulative probability shows the TVS/SS distinction. **(C)** For the peaked distribution (see **Figure 4A**), over a range of discriminable parameters, Qss-Qreal < Qtvs-Qreal. **(D)** For the flat distribution (see **Figure 4B**), for all parameter pairs ρ and η, Qss-Qreal < Qtvs-Qreal. In both cases, this is consistent with a model where EMG activity is generated via an SS-type strategy.
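A minimal sketch of the dilation test follows, again using the summed-pairwise-difference stand-in for Q and a 250 ms window (both assumptions; the pulse times are invented, not taken from the data).

```python
import numpy as np

WINDOW = 0.25  # 250 ms counting window (assumed, per the text)

def q_statistic(times):
    """Illustrative Q: summed absolute pairwise pulse-time differences."""
    d = times[:, None] - times[None, :]
    return np.abs(d[np.triu_indices(len(times), 1)]).sum()

def dilation_delta_q(times, scale):
    """Q(dilated) - Q(original); dilated pulses leaving the window are dropped."""
    t0 = times.min()
    dilated = t0 + (times - t0) * scale
    kept = dilated[dilated < t0 + WINDOW]
    if len(kept) < 2:
        return -q_statistic(times)
    return q_statistic(kept) - q_statistic(times)

ss_like = np.array([0.100, 0.101, 0.102, 0.103])  # tightly synchronous pulses
tvs_like = np.array([0.02, 0.08, 0.15, 0.22])     # widely spread pulses

print(dilation_delta_q(ss_like, 3.0))   # positive: SS Q grows under dilation
print(dilation_delta_q(tvs_like, 3.0))  # pulses leave the window instead of inflating Q
```

In this toy version, synchronous pulses can be dilated threefold without leaving the window, so Q grows until a plateau; the spread-out pulses are pushed past the window edge, so Q cannot grow, matching the asymmetry the test exploits.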

### **CONNECTION BETWEEN PULSE TIMING AND SCALING OF MUSCLE ACTIVATION**

TVS models predict a certain degree of covariance between the time scale of individual EMG activations and the relative timing of pulses in a synergy (d'Avella and Bizzi, 2005; d'Avella et al., 2006). Therefore, separate from the Q statistic analysis, we also examined the functional dependence of primitive duration on the time delay between primitives. We explored this in wipes with a three-primitive sequence, largely following work on triphasic bursts in human reaching (Gottlieb, 1998). We wanted to examine the degree to which primitive duration varied within these three-primitive sequences, as well as the dependence of primitive duration on pulse timing within a short sequence of primitive activity. Occurrences of each triplet were shuffled, and significant three-primitive sequences in the original data were identified as those triplets with z-scores greater than 2 (based on the statistics of the shuffled data). For each occurrence of a significant triplet, we calculated regression coefficients between (a) the durations of pulse 1 (D1) and pulse 2 (D2), (b) pulse 2 (D2) and pulse 3 (D3), (c) the time of pulse 2 minus the time of pulse 1 (t2-t1) and the time of pulse 3 minus the time of pulse 1 (t3-t1), and (d) all cross terms (e.g., pulse duration 1 vs. the time delay between pulse 1 and pulse 2). A principal components analysis was performed on the calculated regression coefficients, and the first two principal components were retained and plotted against one another (**Figure 7A**), with each of the regression coefficients plotted on these axes as well. The regression coefficients between pulse widths (a, b) tended to align with the second principal component (**Figure 7A**, red lines), while the regression coefficients between pulse time delays (c) aligned strongly with the first principal component. Cross-term regression coefficients tended to be smaller than either of these and were distributed more or less equally between the principal component axes.

**Figure 6 (caption fragment).** ...generated by a TVS strategy, for a fixed analysis window. **(A)** The Q statistic distribution of real data (green, left), which is clearly SS in form, is moved toward the Q statistic distribution of the TVS model data when intervals between real data are scaled linearly with an unscaled analysis window **(B)**. **(C)** Peaked σ-distribution: comparing the difference between ... tightly around particular time scales (i.e., there is more room to scale intervals within the window before a timestamp is forced into the next counting interval and drops from the statistic) compared to TVS-generated data. **(D)** Flat σ-distribution: note the nearly identical performance to that in **(C)**.

**Figure 7 (caption fragment).** ... **(B)** Mixing matrix coefficients from an independent component analysis on time series, defined by each peak time difference from the first element in each triplet concatenated to the scale of the pulses during the counting interval, demonstrate a strong segregation of time-scale-related information and peak-time-related information. Taken together, these results indicate that the durations of pulses in the EMG do not scale linearly with the variations in the interval to their time of occurrence, which is a prediction of a TVS model.

As a second test, for each occurrence of each significant three-primitive sequence, we placed the t2-t1 values for each triplet occurrence in one row of a multidimensional array, the t3-t1 values in another row, and D1, D2, and D3 in the final three rows. The resulting array was treated as a time series and decomposed using independent components analysis (ICA). The resulting mixing weights (**Figure 7B**) show a clear segregation between the time-scale components (the final three components, which project to D1, D2, and D3) and the pulse occurrence components (the first two components, which project to the t2-t1 and t3-t1 series).
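The independence of durations and delays tested here can be illustrated with synthetic triplet data; the values below, and the simple correlation analysis standing in for the paper's PCA/ICA, are our own illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500  # synthetic occurrences of one significant triplet

# Durations and delays generated independently of one another,
# mimicking the SS-like structure reported for the frog data.
d1, d2, d3 = (rng.normal(0.05, 0.01, n) for _ in range(3))
t21 = rng.normal(0.10, 0.02, n)        # t2 - t1
t31 = t21 + rng.normal(0.10, 0.02, n)  # t3 - t1 shares delay structure with t21

# Rows: occurrences; columns: [t21, t31, D1, D2, D3]
X = np.column_stack([t21, t31, d1, d2, d3])
corr = np.corrcoef(X, rowvar=False)

# Cross-block correlations (delays vs. durations) stay near zero,
# while the within-block delay correlation is substantial.
cross = np.abs(corr[:2, 2:]).max()
within = abs(corr[0, 1])
print(round(cross, 2), round(within, 2))
```

A PCA or ICA of such data produces components loading almost exclusively on the delay block or on the duration block, which is the segregation seen in the mixing weights of Figure 7B; a uniformly scaling TVS would instead produce large cross-block correlations.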

### **CYCLIC PATTERNS—SS AND TVS COMPARED IN RAT TREADMILL WALKING**

We examined the behavior of the Q statistic and related measures in rat ambulation, where cyclic patterns occur; the value of the Q statistic would be very limited if its use were confined to non-rhythmic motor behaviors. Surrogate TVS and SS data sets were constructed, and the Q statistic calculated, as described in the Methods. We limited the counting time window to 400 ms in order to better accommodate the compressed time scale of pulse activation observed in EMGs recorded from treadmill-walking rats. Pulse widths extracted from data at a mean step rate of 1.0 step/cycle did not appear to exhibit significant variability (**Figure 8A**), an observation that was confirmed when we compared pulse widths against the actual step cycle length (**Figure 8B**).

We then sought to systematically vary step cycle duration and observe the effect on the Q statistic and on pulse scaling (**Figures 9A,B**). The distribution of Q values did not exhibit any noticeable trend as step cycle duration was varied, in either the TVS and SS models or the real data (**Figure 9A**). The Kolmogorov-Smirnov test showed that the rat interval structure deviated significantly from TVS (*p* < 2.3E-15) and also from expected SS (*p* < 5.2E-11). However, once again, as in the frog data, the real rat data fall *beyond* the model SS cumulative curve rather than lying between the SS and TVS curves. The discontinuity observed in the real rat EMG Q statistic curve in **Figure 9** is due to the overlapping occurrence, on some trials, of two nearly simultaneous but apparently distinct muscle groups. Because of these short-interval follow-on synergies in the rat data, the Q statistic could not rise much above a certain value on the intervals where these overlaps occurred: the short time separation between them lowered Q significantly and thus created the low values and the discontinuity seen in the Q value population rank order. This situation was not explicitly modeled in either the TVS or the SS cyclic model data. It does not appear to affect our discrimination results using median values (**Figure 9B**), but it may account for the *increased* distance of the rat data from the TVS curves. Examining pulse time course as a function of step cycle duration (**Figures 9C,D**), we observed no strong monotonic trend relating cycle duration and the scaling of pulse widths. Q statistics of the real data favored the SS models over TVS patterns, even in these cyclic, repeating "pattern generator" data, consistent with a separation of pattern formation and rhythm generation (Rybak et al., 2006; McCrea and Rybak, 2007, 2008).

# **DISCUSSION**

There is significant controversy in motor control as to the nature and origin of the strategy used for reduction of dimensionality. Many kinds of primitive (synchronous synergy, TVS, motor pattern "block," or kinematic stroke) have been advanced as the fundamental organizational units for the great bulk of motor activities. To date it has proven difficult to identify which organizational strategy is used in the assembly of a given motor pattern observed during behavior.

Prior work has used onset or peak timings in EMG to explicitly cluster patterns into SS (Krouchev et al., 2006; Drew et al., 2008; Markin et al., 2012). Here we presented a newly developed set of tests capable of discriminating the presence of TVS and SS based on onset or peak timing statistics, prior to any explicit clustering of data. The largest value of our method is that it is applicable to the EMG time series before any synergy extraction or fitting process, and is free of assumptions about the precise type or number of synergies. After this new analysis, other synergy extraction procedures, guided by the discriminant results, could be applied to extract the synergy structures with better precision. We investigated the statistical properties of a range of TVS and SS models for the construction of muscle activity and found that the Q statistic we developed represents a good discriminant over a large range of parameters when applied to the EMG time series in this way.

### **RANGE OF APPLICABILITY OF Q**

For a large range of parameters, it is possible to discriminate data arising from a TVS model from data arising from an SS model by examining the Q statistic of the EMG (or mixed) output time series. We varied three parameters in the modeling phase of this study: the density of synergies occurring on an interval (ρ), the number of muscles per synergy (σ), and the number of simultaneous muscles in a TVS model (η). In general, the models became harder to discriminate as η increased (i.e., for mixed models), and varying the shape of the synergy membership (σ) distribution appeared to have the strongest effect on the significance of these discriminations.

These limitations are not surprising: if most primitives have between two and four synchronous muscles active (as is the case in the peaked distribution), it becomes much more difficult to discriminate TVS-constructed data from SS data as the number of synchronous muscles mixed into the set of time-varying synergies increases. In effect, the time-varying synergies look more and more like SS in this case, because of the larger number of small delays. However, in real data sets from rats and frogs the Q statistic was unambiguous, lying structurally and clearly in the SS model domain. In fact, the real-data Q statistic curve was "more" SS (i.e., further from the TVS curve) than the randomly generated family of SS data curves, and in both frog and rat the data lay beyond these SS curves rather than between SS and TVS. The Q statistic of the actual motor patterns thus appears to be a specific realization within the SS family, and one that differs strongly from our randomly generated SS models.

As a test of the generalizability of this approach, we analyzed cyclic data from treadmill walking in adult rats. The behavior of the Q statistic was qualitatively the same, although some adjustment had to be made for the facts that bursts of activity in rat EMG tend toward shorter time scales, at least during treadmill walking, and that short-latency synergy burst overlaps occurred in some cycles. Our Q statistic results for the rat were consistent with the idea of separate rhythm and pattern components of CPG output suggested by McCrea and Rybak. The Kolmogorov-Smirnov test shows the rat interval structure deviates significantly from TVS (*p* < 2.3E-15) and also from expected SS (*p* < 5.2E-11). However, Q aligns well with SS Q behavior over much of its range (**Figure 9A**). We attribute the deviation in the mid range to the cyclical pattern of locomotion, the occasional close overlap or near synchrony of specific synergies during the cycle, and the consequent deviations from the levels of intermittency used in the original simulations of motor patterns that generated the expected Q distributions. A more complete exploration of the Q statistic and other interval-pulse-scaling metrics in cyclic patterns may provide significant insight into the mechanisms underlying locomotor flexibility in quadrupedal mammals.

### **WHAT Q TELLS US ABOUT SYNERGIES IN THE FROG**

We compared synchronous to time-varying synergies using a linear discriminant analysis on the Q statistic. Differences were taken between the Q values from each model and the Q values computed on real data. These differences were then plotted against one another, and a separatrix representing equal differences was used as the criterion for classification: points "below" the line represented values where Qtvs-Qreal was greater than Qss-Qreal, i.e., points "closer" to an SS model than to a TVS model. The separatrix used in this case was thus a line with slope of 1.
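In code, the slope-1 separatrix amounts to comparing the two distances directly; the median-based distance and all numbers below are our illustrative assumptions, not the paper's values.

```python
import numpy as np

def classify(q_real, q_ss, q_tvs):
    """Slope-1 separatrix: assign the real data to the model whose
    Q distribution lies closer (here, distance to the median)."""
    d_ss = abs(q_real - np.median(q_ss))
    d_tvs = abs(q_real - np.median(q_tvs))
    return "SS" if d_ss < d_tvs else "TVS"

q_ss_model = np.array([0.04, 0.05, 0.06])   # invented model Q samples
q_tvs_model = np.array([0.20, 0.25, 0.30])
print(classify(0.03, q_ss_model, q_tvs_model))  # → SS
```

Plotting d_ss against d_tvs for many parameter sets, as in Figure 5, puts SS-classified points below the unit-slope line.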

A pattern broadly similar to that of the preceding section was seen. Discrimination was possible for synergies with muscle densities (σ) drawn from a flat distribution, but became more difficult for σ drawn from the peaked distribution as η (the number of simultaneous muscles in a synergy) was increased. The results of comparing frog data to a flat σ distribution are unambiguously indicative of an SS model. Assuming instead that σ is drawn from a peaked distribution in frogs, it remains most likely that the synergies observed in the bullfrog during a variety of reflex and locomotor tasks arise only from SS. The only alternative would be time-varying synergies with very weak scaling and very high jitter, within which most muscles are activated synchronously. This alternative begs the question, weakens the elegance and simplicity of the TVS formulation as a set of more strongly coupled units, and may not match other measures of synergy compositionality. The real data lay outside the bounds of both the TVS and SS curves, below the SS curves in both rat and frog. In the rat, inspection of a discontinuity revealed that an intermittent coactivation of two synergies caused the discontinuous, outward deviation. Similar coactivations occur in frogs (e.g., in wiping behaviors). The statistics associated with these tight coactivations, or "synergies of synergies," were not well represented in our simulations and would have been very rare in our generator processes. However, such processes would further differentiate the point process statistics between TVS and SS rather than collapsing them together, and should not be assumed *a priori*.

As a final check on the discrimination, we used an interesting side effect of the Q statistic that we discovered: rescaling the pulse times, thereby moving some pulses outside the bounds of the 250 ms counting interval, does not appreciably change the TVS Q statistic but significantly alters the SS Q statistic. We used this as an additional check of whether TVS or SS model statistics better explain the observed EMG data from the frog. Because the statistic is only sensitive to rescaling within each counting interval, rescaling the time differences between nearly synchronous pulses produces large changes in the Q statistic relative to the unscaled data. These changes grow with increasing scaling, until plateauing at a particular value (the point at which "rescaled" pulses are pushed into the next counting window). Rescaling fails to alter the Q statistic appreciably for TVS models. Consonant with the other results so far, when we performed these operations on real EMG data we found that the data rescaled as one would expect SS to scale. The coactive synergies noted above in real frog and rat data would likely have further exacerbated these scaling effects.

Taken together, these data and analyses lead to the following conclusion: at least in the bullfrog, an SS model, or synergist coactivation of groups of muscles, appears to be the norm. Any strategies similar to the TVS model are much more the exception, at least within this preparation.

### *Covariance of muscle activation and pulse timing*

As a final check for any evidence of time-varying synergies in the frog, given the Q statistic results, we explored directly whether the time differences between pulses in a three-pulse triplet depended at all on the pulse widths in that sequence, which would indicate correlated time rescaling. PCA performed on regression coefficients between pulse time differences and pulse widths found that the most significant contributions to variance came entirely from components closely aligned to either the time difference axes or the pulse width axes. In contrast, PCs representing regression cross terms between pulse widths and pulse time differences contributed very much less variance to the overall data set. The results thus indicate that there is little overlap or covariation between pulse width and pulse time/phasing differences, and that the bulk of the variation in the delay between pulses in the EMG is independent of the variations underlying pulse durations. This observation is inconsistent with the notion of a uniformly scaled TVS, but matches the Q statistic data presented here. In frogs, other data support SS structure based on ICA decomposition (Hart and Giszter, 2004), neural analyses (Hart and Giszter, 2010), and explicit physiological perturbations: we have demonstrated the ability to recruit SS synergies as single pulses (Kargo and Giszter, 2000a,b) and to perturb them separately within a motor pattern (Kargo and Giszter, 2008), both inconsistent with TVS descriptions.

To bolster the temporal structure observations here, we performed an ICA on all time differences in the data set and the corresponding pulse widths. The resulting mixing matrices indicated very little mixing between pulse widths and time differences, with components contributing primarily to one or the other category of time series. This observation is also inconsistent with the uniform or correlated scaling of pulse duration and sequence expected in TVS. The rat cyclic data likewise showed a lack of correlation between pulse duration and cycle duration.

In summary, we present a new analysis that works directly with EMG peak or onset data to differentiate synchronous and TVS patterns prior to full decomposition. Applied to data from frog and rat, these analyses support the composition of motor patterns from independent rhythms and synchronous synergy pulses, consistent with separated control of the rhythm/phase and pattern compositional elements. Such separated controls may be important in allowing the force/effector compositional controls to adjust as needed to support the next level of task composition. Unitary task elements at the kinematic and kinetic levels of task description occur in reaching and locomotion, but adapting these to momentary conditions may require less stereotypy in the supporting compositionality of muscle synergy bursts and patterns.

## **ACKNOWLEDGMENTS**

Supported by NIH NS54896, and NIH NS40412 and NSF IIS-0827684.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 21 December 2012; accepted: 16 April 2013; published online: 09 May 2013.*

*Citation: Hart CB and Giszter SF (2013) Distinguishing synchronous and time-varying synergies using point process interval statistics: motor primitives in frog and rat. Front. Comput. Neurosci. 7:52. doi: 10.3389/fncom.2013.00052*

*Copyright © 2013 Hart and Giszter. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

# Kinematic decomposition and classification of octopus arm movements

#### *Ido Zelman<sup>1,2</sup>\*, Myriam Titon<sup>1</sup>, Yoram Yekutieli<sup>1,3</sup>, Shlomi Hanassy<sup>4</sup>, Binyamin Hochner<sup>4</sup> and Tamar Flash<sup>1</sup>\**

*<sup>1</sup> Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot, Israel*

*<sup>2</sup> General Motors, Advanced Technical Center - Israel, Herzliya, Israel*

*<sup>3</sup> Department of Computer Science, Hadassah Academic College, Jerusalem, Israel*

*<sup>4</sup> Department of Neurobiology and Interdisciplinary Center for Neural Computation, Hebrew University, Jerusalem, Israel*

### *Edited by:*

*Yuri P. Ivanenko, IRCCS Fondazione Santa Lucia, Italy*

### *Reviewed by:*

*Dagmar Sternad, Northeastern University, USA Michael J. MacLellan, IRCCS Fondazione Santa Lucia, Italy*

### *\*Correspondence:*

*Ido Zelman and Tamar Flash, General Motors, Advanced Technical Center - Israel, PO 12091, Herzeliya Pituach 46725, Israel. e-mail: ido.zelman@gmail.com; tamar.flash@weizmann.ac.il*

The octopus arm is a muscular hydrostat and, due to its deformable and highly flexible structure, it is capable of a rich repertoire of motor behaviors. Its motor control system uses planning principles and control strategies unique to muscular hydrostats. We previously reconstructed a data set of octopus arm movements from records of natural movements using a sequence of 3D curves describing the virtual backbone of arm configurations. Here we describe a novel representation of octopus arm movements in which a movement is characterized by a pair of surfaces that represent the curvature and torsion values of points along the arm as a function of time. This representation allowed us to explore whether the movements are built up of elementary kinematic units by decomposing each surface into a weighted combination of 2D Gaussian functions. The resulting Gaussian functions can be considered as motion primitives at the kinematic level of octopus arm movements, and can be used to examine underlying principles of movement generation. Here we used combinations of such kinematic primitives to decompose different octopus arm movements and to characterize several movement prototypes according to their composition. The representation and methodology can be applied to the movement of any organ which can be modeled by means of a continuous 3D curve.

**Keywords: octopus, motion analysis, kinematic motion primitives (kMPs), 3D reconstruction, muscular hydrostat**

# **INTRODUCTION**

Octopuses are considered to be among the most developed and intelligent animals in the invertebrate kingdom, and at least part of their skill can be attributed to the high maneuverability of their arms and the capacity of the peripheral nervous system to process sensory information and control arm movements. The octopus uses its arms for various tasks such as locomotion, food gathering, hunting, and sophisticated object manipulation (Wells and Wells, 1957; Fiorito et al., 1990; Mather, 1998). The versatile and adaptive nature of octopus movements is mainly due to the flexibility of the octopus arms, which contain no rigid skeleton. The octopus arm is a muscular hydrostat built of closely packed arrays of muscle fibers organized in three main muscle groups: parallel, perpendicular, and helical, the last running obliquely to the long axis (Matzner et al., 2000). A constant-volume constraint that holds for muscular hydrostats allows forces to be transferred between the longitudinal and transverse muscle groups. The movements of a muscular hydrostat are based on combinations of four elementary movements that can occur at any location: elongation, shortening, torsion, and bending (Kier and Smith, 1985). Both structural support and force transmission are therefore achieved through the arm's musculature, such that the biomechanical principles governing octopus arm movements differ from those operating in arms with rigid skeletal support.

The octopus nervous system is divided into central and peripheral nervous systems. Axial nerve cords project from the brain along the center of each arm, and the peripheral neurons located in the axial nerve cords are organized into an extensive nervous system comprising both sensory and motor circuits (Young, 1971). Behavioral studies suggest that the nerve cord circuitry and the peripheral components play a major role in the control of the complex actions performed by octopus arms (Altman, 1971; Wells, 1978).

Analyses of octopus reaching (Gutfreund et al., 1996, 1998; Sumbre et al., 2001; Yekutieli et al., 2005a,b) and fetching movements (Sumbre et al., 2001, 2005, 2006) have revealed some of the control principles that underlie movement generation. During reaching, a bend point propagates along the arm following an invariant velocity profile. Fetching movements use a vertebrate-like strategy, reconfiguring the arm into a stiffened *quasi*-articulated structure. These movements were studied by analyzing the kinematics of specific points along the arm, which display several stereotypical characteristics. Electromyographic recordings and detailed biomechanical simulations assisted in revealing common principles which reduce the complexity associated with the control of these movements. The travelling bend used in arm extension movements was found to be associated with a propagating wave of muscular activation, where simple adjustments of the excitation levels at the initial stages of the movement can set the velocity profile of the whole movement. Recently, a soft robotic arm inspired by the octopus arm was designed to reproduce the octopus arm's motor performance and to examine the possibility of implementing the motor control principles identified in the octopus as part of its controller (Laschi et al., 2009; Calisti et al., 2011).

However, describing the movements of specific points along the arm is insufficient for capturing the full complexity of octopus arm movements. Determining whether the kinematics of octopus arm movements can be described by a reduced set of motion primitives requires analysis of different types of arm movements and of the shape of the entire arm as it moves through space. Motion primitives can be regarded as a minimal set of movements which can be combined in many different ways, giving rise to the richness of vertebrate and invertebrate movement repertoires and allowing motor learning of new skills (Flash and Hochner, 2005; Bizzi et al., 2008). Motor primitives have been inferred at various levels of the motor control system: submovements were shown to be combined at the kinematic level (Krebs et al., 1999; Rohrer et al., 2002), a reduced set of static force fields was shown to underlie the control of arm posture (Mussa-Ivaldi and Bizzi, 2000; d'Avella et al., 2003, 2006), and movement dynamics can be learned through a flexible combination of dynamic primitives (Thoroughman and Shadmehr, 2000). Dynamical movement primitives were also used to model attractor behaviors of autonomous non-linear dynamical systems and rhythmic movements (Ijspeert et al., 2002, 2013), and discrete and rhythmic movement elements were used to investigate single-joint and multi-joint motor behaviors (Sternad et al., 2000; Sternad and Dean, 2003). Inferring motion primitives from octopus arm movements may help uncover underlying principles and kinematic optimality measures, and provide new understanding of how the nervous system of muscular hydrostats handles the complexities associated with the control of hyper-redundant arms. This may also facilitate the design of control systems for hyper-redundant robotic manipulators.

Here we address the behavioral level and aim at describing octopus arm behaviors as being composed of elementary kinematic units, to which we also refer as motion primitives (Flash and Hochner, 2005). We believe that identifying basic kinematic patterns is the first step in further investigating the existence of primitives at the control, movement dynamics, muscle activation, and neural control levels, as was demonstrated in earlier studies of the octopus motor system (Gutfreund et al., 1996, 1998; Sumbre et al., 2001, 2005, 2006; Yekutieli et al., 2005a,b). As discussed in Flash and Hochner (2005), elementary building blocks may exist at all the above levels of motor representation, but the most immediate and direct approach is to search for elementary units at the kinematic level. Movement strokes with specific spatial and temporal features, as well as submovements, were shown to successfully describe both periodic and discrete motions and were indicated as plausible building blocks of human and monkey movements (Sosnik et al., 2004; Polyakov et al., 2009). Furthermore, in robotics research, locomotion trajectories for a humanoid robot were constructed from kinematic motion primitives derived from human locomotion trajectories (Moro et al., 2011, 2012). Relations between the behavioral and control levels were suggested in several earlier studies. For example, hand trajectories of stroke patients were shown to be composed of submovements with velocity primitives obeying the minimum jerk model (Flash and Hogan, 1985), whose number decreased as the patients gained better control of their limb (Rohrer et al., 2004). Similarly, simple curved two-dimensional trajectories that follow the two-thirds power law (Lacquaniti et al., 1983) were described by means of parabolic units that corresponded to neural activation states identified using a hidden Markov model (Polyakov et al., 2009).
Another example is grasping and object manipulation movements, described as arising from well-coordinated combinations of basic motor actions: arm transfer and hand shaping (Jeannerod, 1994).

An algorithm for 3D tracking and analysis of octopus arm movements (Yekutieli et al., 2007; Zelman et al., 2009) enabled us to create a large database of many types of modeled octopus arm movements. Here we describe a new framework for extracting kinematic units from these reconstructions. Each octopus arm movement was represented by a pair of surfaces describing the curvature and torsion values of the arm. 2D Gaussians were extracted for each surface, such that each Gaussian represented the characteristic shape of the curvature or torsion along a section of the octopus arm during some time interval. We found that Gaussian functions generally fit the continuous form of the configurations of the octopus arm quite well with respect to both the time and arm-index dimensions: the curvature and torsion values changed smoothly along the arm length for any quasi-static arm configuration, and the magnitude of curvature or torsion at any specific point changed gradually with time during the movement. Gaussian-like functions were previously used in composing hand velocity (Thoroughman and Shadmehr, 2000) and limb position profiles (Hwang et al., 2003).

The resulting Gaussians were divided into clusters whose centroids defined kinematic units, and each movement was represented as a weighted combination of such units. These kinematic units can be used to form a language of motion primitives, allowing characterization and representation of a large repertoire of octopus arm movements. We show how these kinematic units can be used to classify octopus arm movements into meaningful groups. Understanding how kinematic primitives can be utilized and combined can greatly contribute not only to studies of motor control in octopus arms and other hyper-redundant appendages but can also provide a deeper understanding of motor control systems in general.

# **METHODS**

The analyzed octopus arm movements in our study were performed by four specimens of *Octopus vulgaris*, weighing 200 (female), 200 (male), 450 (male), and 470 (female) g. The animals were maintained in 50 × 40 × 40 cm tanks containing artificial seawater. The water was circulated continuously in a closed system through a biological filter of Orlon, gravel, and coral dust. Water temperature was held at 16°C in a 12 h light/dark cycle. Prior to the video recording sessions, the animals were moved to a larger glass tank (80 × 80 × 60 cm) with a water temperature of 18°C.

# **SPATIO-TEMPORAL REPRESENTATION OF MOVEMENT AS A PAIR OF CURVATURE AND TORSION SURFACES**

Since the octopus arm displays no well-defined landmarks, a skeletal representation can be naturally used to model the octopus arm using curves which prescribe its virtual backbone. The backbone was found using a "grass fire" algorithm that extracts the middle line of the arm: first the contour of the arm is separated into two sides, dorsal and ventral, from base to tip. Then two distinct waves are initiated from the two sides of the contour and are propagated at an equal speed inward. The set of points where the wave fronts collide is the midline (Yekutieli et al., 2007).

The reconstructions of octopus arm movements result in a sequence of 3D curves prescribing the virtual backbone of the octopus arm as its configuration changes during the movement. **Figure 1** presents an extension movement as a sequence of 60 3D curves that prescribe the virtual backbone during the movement. For each curve, green and red points mark the base of the arm (aligned between sequential images using textural cues) and the tip, respectively. Given a sequence of *m* 3D curves as input, we wished to construct a pair of surfaces describing the values of the curvature and torsion along these curves. Since arm configurations were reconstructed from video records sampled at 50 frames/s, the smoothness of the motion between consecutive configurations of a movement was guaranteed, and a spline function was used to smooth noisy points as necessary (Yekutieli et al., 2007; Zelman et al., 2009).

Each 3D curve was first represented by *n* = 100 sample points uniformly distributed along the curve. The curve was then approximated by a cubic smoothing spline constructed of piecewise third-order polynomials passing through the *n* sample points. The approximation balances the smoothness of the spline against its distance from the sample points. Formally, given the data sites *x*(*j*) and the corresponding data values *y*(*j*) for *j* = 1,..., *n*, the cubic smoothing spline *f* minimizes:

$$p\sum_{j=1}^{n} \left| y(j) - f(x(j)) \right|^2 + (1 - p) \int \left| D^2 f(t) \right|^2 dt,$$

where the integral over the second derivative of *f* is taken over the smallest interval containing all the entries of *x*. The smoothing parameter *p* defines the tradeoff between fidelity to the data points and the smoothness term.
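The same fidelity-versus-smoothness tradeoff can be sketched in a few lines of Python. This is not the authors' implementation; `scipy.interpolate.UnivariateSpline` uses a smoothing factor `s` that bounds the residual sum of squares rather than weighting the two terms by *p*, but it plays an analogous role. The arm data are replaced here by a hypothetical noisy curve:

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Hypothetical noisy samples of one coordinate of the arm's virtual
# backbone (the paper uses n = 100 points per 3D curve).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 100)          # normalized arm index
y_true = np.sin(2 * np.pi * x)          # underlying smooth shape
y = y_true + 0.05 * rng.standard_normal(x.size)

# UnivariateSpline fits a cubic spline balancing fidelity to the data
# against roughness; its smoothing factor `s` bounds the sum of squared
# residuals, playing a role analogous to the (1 - p) smoothness weight.
spline = UnivariateSpline(x, y, k=3, s=0.25)   # s chosen by eye here
y_smooth = spline(x)

rms_noise = np.sqrt(np.mean((y - y_true) ** 2))
rms_fit = np.sqrt(np.mean((y_smooth - y_true) ** 2))
print(rms_fit < rms_noise)   # smoothing should reduce the noise level
```

Each of the three coordinates of a backbone curve would be smoothed this way before computing curvature and torsion.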

We then calculated the curvature (κ) and torsion (τ) values at the *n* sample points along each of the 3D curves. The curvature was calculated using the circle passing through three successive points as an approximation of the osculating circle to the curve at the middle point. This is formally described by Calabi et al. (1998): let *A*, *B*, *C* be three successive points on the curve with Euclidean distances *a* = *d*(*A*,*B*), *b* = *d*(*B*,*C*), *c* = *d*(*A*,*C*). Let $\Delta$ denote the area of the triangle whose vertices are *A*, *B*, *C*, and let $s = \frac{1}{2}(a + b + c)$ denote its semi-perimeter, so that $\Delta = \sqrt{s(s-a)(s-b)(s-c)}$ by Heron's formula. The radius of the circle passing through *A*, *B*, *C* is then $R = \frac{abc}{4\Delta}$, leading to the formula for its curvature:

$$\kappa(A, B, C) = 4 \frac{\sqrt{s(s-a)(s-b)(s-c)}}{abc}.$$

Throughout this study, curvature refers to the inverse of the radius of curvature.
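This three-point curvature estimate is straightforward to implement; the sketch below, with hypothetical sample points, follows the formula above:

```python
import numpy as np

def curvature_three_points(A, B, C):
    """Curvature of the circle through three successive points,
    using Heron's formula for the triangle area (Calabi et al., 1998)."""
    A, B, C = map(np.asarray, (A, B, C))
    a = np.linalg.norm(B - A)
    b = np.linalg.norm(C - B)
    c = np.linalg.norm(C - A)
    s = 0.5 * (a + b + c)                                 # semi-perimeter
    area_sq = max(s * (s - a) * (s - b) * (s - c), 0.0)   # guard rounding
    return 4.0 * np.sqrt(area_sq) / (a * b * c)

# Points on a circle of radius 2 should give curvature 1/2.
t = np.array([0.0, 0.1, 0.2])
P = np.c_[2 * np.cos(t), 2 * np.sin(t), np.zeros(3)]
print(round(curvature_three_points(*P), 6))   # → 0.5
```

Sliding this window along the *n* sample points yields the curvature profile of one arm configuration.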

The torsion along a 3D curve, defined as $\tau = \frac{d\theta}{ds}$ (the rate of change of the angle between successive osculating planes per unit arc length), was calculated for a pair of successive points as the angle between the normals to the planes defined by the successive triangles corresponding to these points, divided by the distance between the points (Boutin, 2000). Let *A*, *B*, *C*, *D*, *E* be five successive points on the curve such that $\hat{n}_{ABC}$ and $\hat{n}_{CDE}$ are the normals to the planes defined by *A*, *B*, *C* and *C*, *D*, *E* respectively, and let *d*(*B*,*D*) be the Euclidean distance between points *B* and *D*. The torsion at point *C* is then calculated as:

$$\tau(C) = \frac{\cos^{-1}\left(\hat{n}_{ABC} \cdot \hat{n}_{CDE}\right)}{d(B, D)}.$$
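A minimal sketch of this five-point torsion estimate (with hypothetical points; the plane normals are obtained from cross products):

```python
import numpy as np

def torsion_five_points(A, B, C, D, E):
    """Discrete torsion at C from five successive curve points:
    angle between the normals of triangles (A,B,C) and (C,D,E),
    divided by the distance between B and D (Boutin, 2000)."""
    A, B, C, D, E = map(np.asarray, (A, B, C, D, E))
    n1 = np.cross(B - A, C - A)
    n2 = np.cross(D - C, E - C)
    n1 = n1 / np.linalg.norm(n1)
    n2 = n2 / np.linalg.norm(n2)
    angle = np.arccos(np.clip(np.dot(n1, n2), -1.0, 1.0))
    return angle / np.linalg.norm(D - B)

# A planar curve has zero torsion everywhere.
t = np.linspace(0, 1, 5)
planar = np.c_[t, t ** 2, np.zeros(5)]
print(torsion_five_points(*planar))   # → 0.0
```

A non-planar curve (e.g. a helix) would yield strictly positive values with this estimator, since the angle between normals is unsigned.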

Finally, the curvature and torsion values calculated for a sequence of 3D curves were represented on two surfaces that separately described the curvature and torsion as a function of time and arm index. The result was a pair of smooth and normalized curvature and torsion surfaces, such that a single arm movement was compactly represented by a pair of *n* by *n* matrices. This representation was invariant to rotation and translation in a Cartesian coordinate system, as curvature and torsion measures are intrinsic (i.e., they do not depend on the orientation and position of the arm in 3D space).

**Figure 2** presents the curvature and torsion surfaces for the extension movement shown in **Figure 1**. The relatively high values of the curvature surface generally describe the propagation of a bending section from the middle of the arm toward the tip. The decrease in torsion values as the movement proceeds indicates that the arm configuration became relatively planar.

### **SURFACE DECOMPOSITION USING GMM**

The Gaussian Mixture Model (GMM) is a statistical method for density estimation and data clustering (McLachlan and Peel, 2000). In this framework, a Gaussian fitting method can be used to approximate a function of one variable by a weighted sum of one-dimensional Gaussians; more generally, GMM can approximate multidimensional data by a set of multivariate Gaussians. The model uses an iterative process which optimizes the Gaussians' parameters via the Expectation Maximization (EM) algorithm (Xuan et al., 2001).

A function of one variable (*y* = *f*(*x*)) can usually be approximated by a mixture of 1D Gaussians, where each Gaussian is defined by its *mean* and *standard deviation*. In our case, we refer to a surface as a function of two variables *z* = *f*(*s,t*), where *z* stands for either the curvature or torsion values, and *s,t* refer to the arm index and time dimensions, respectively. We therefore use 2D GMM to approximate a surface by a weighted combination of 2D Gaussians. Specifically, a surface is approximated as:

$$z(s,t) = \sum_{i} w_{i} \cdot g_{\{\mu_{i}, \Sigma_{i}\}}(s,t),$$

where *g* is a Gaussian defined by a 2 × 1 *mean* vector μ and a 2 × 2 *covariance* matrix Σ, and *w* is the Gaussian weight. The Gaussian *g* is defined as:

$$g_{\{\mu, \Sigma\}}(\vec{x}) = \frac{1}{2\pi \left|\Sigma\right|^{1/2}} \exp\left(-\frac{(\vec{x}-\mu)^{T}\Sigma^{-1}(\vec{x}-\mu)}{2}\right),$$

where $\vec{x} = (s, t)^T$. The *mean* vector μ corresponds to the position of the Gaussian center on the surface, and the two eigenvalues of the *covariance* matrix Σ correspond to the *standard deviations* of the 2D Gaussian along its principal axes. Its two eigenvectors correspond to the axes of the Gaussian with respect to a fixed coordinate system, so that the *covariance* matrix defines the shape and orientation of the Gaussian.
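As a rough illustration of fitting 2D Gaussians to a surface, one option (not necessarily the authors' EM procedure) is to treat the non-negative surface as an unnormalized density, draw samples with probability proportional to the surface height, and let `sklearn.mixture.GaussianMixture` run EM on the samples. The surface below is a synthetic stand-in for a curvature surface:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)

# Hypothetical curvature surface on an n x n grid: one Gaussian "hill"
# standing in for real curvature data, peaked at (s, t) = (0.3, 0.4).
n = 50
s, t = np.meshgrid(np.linspace(0, 1, n), np.linspace(0, 1, n), indexing="ij")
z = np.exp(-((s - 0.3) ** 2 + (t - 0.4) ** 2) / 0.02)

# Treat the surface as an unnormalized density: draw (s, t) samples
# with probability proportional to z, then let EM fit the mixture.
pts = np.c_[s.ravel(), t.ravel()]
p = z.ravel() / z.sum()
samples = pts[rng.choice(pts.shape[0], size=5000, p=p)]

gmm = GaussianMixture(n_components=1, random_state=0).fit(samples)
center = gmm.means_[0]
print(np.round(center, 2))   # close to the true peak (0.3, 0.4)
```

The fitted means and covariances then correspond to the Gaussian centers, shapes, and orientations described above.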

We also added a criterion, based on the Minimum Description Length (MDL) principle, for choosing the number of Gaussians into which each surface should optimally be decomposed. The MDL descriptor minimized here is the Bayesian Information Criterion (BIC), as developed by Andrews and Lu (2001):

$$\text{BIC} = -2 \cdot L + d \cdot \log(n)$$

where *L* is the log likelihood of the mixture of Gaussians, *d* is the number of parameters in the model (number of degrees of freedom) and *n* is the number of observations in the sample. The BIC criterion allows choosing the most parsimonious model, i.e., the model which best describes the data with respect to the number of Gaussians it uses for the decomposition. [See also Bhat and Kumar (2010) for a more detailed derivation of the BIC formula].

**Figure 3** shows the approximation of a curvature surface by a weighted combination of four Gaussians. Intuitively, each Gaussian can be pictured as a hill whose center, shape, orientation, and height are defined by the Gaussian parameters. The decomposition into 2D Gaussians not only allows us to explore Gaussians as possible units defining the kinematics of octopus arm movements, but also provides a compact representation of a surface as a weighted sum of 2D Gaussians.

### **CLUSTERING ALGORITHM**

The GMM allows us to decompose octopus arm movements into curvature and torsion 2D Gaussians which describe their kinematics. We refer to each of the resulting Gaussians as a data point whose dimension corresponds to the number of parameters defining a Gaussian (see section surface decomposition using GMM). To cluster these points (i.e., the 2D Gaussians) into meaningful groups we used the *kmeans* clustering algorithm, an unsupervised clustering method. The output of *kmeans* is *k* disjoint clusters, where each cluster includes a different number of points and is represented by a centroid that can be regarded as the average of all the points assigned to the cluster. *kmeans* uses a two-phase iterative algorithm to yield a clustering that minimizes the point-to-centroid distances summed over all *k* clusters.

The distance usually employed for *kmeans* is the Euclidean distance (Hastie et al., 2009), but here we improved the clustering in two respects. First, we designed a weighted Euclidean distance: for each of the parameters of the Gaussians (center, shape, area, and orientation) we separately computed the Euclidean distance among the different elements of the sample. This yields four distances, each related to one of the four parameters, and the quantity minimized at each step of the *kmeans* algorithm is the average of these four distances. Second, we used the Gap-Statistics (Tibshirani et al., 2001) as a criterion for the optimal number of clusters. The Gap-Statistics compares the within-cluster dispersion $W_k$ of the clustering given by *kmeans* to the within-cluster dispersion $W^*_{kb}$ of Monte Carlo samples drawn within the range of the reference distribution. This criterion was used, for example, by Ben-Hur et al. (2002) and Pedersen and Kulkarni (2006). The idea is thus to compare the graph of log(*W<sub>k</sub>*) with its expectation under an appropriate null distribution; the mathematical rationale is explained in greater detail by Tibshirani et al. (2001). Defining *B* as the number of generated reference data sets, the Gap-Statistics is expressed as:

$$\text{Gap}(k) = \frac{1}{B} \sum_{b=1}^{B} \log\left(W_{kb}^*\right) - \log(W_k).$$

The optimal number of clusters is the smallest *k* at which the Gap attains a local maximum.
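A minimal Gap-Statistics sketch (uniform reference samples drawn in the bounding box of the data, a simplification of the reference distributions discussed by Tibshirani et al.; the data are hypothetical):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)

def within_dispersion(X, k):
    """Sum of squared point-to-centroid distances, W_k, via k-means."""
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_

def gap_statistic(X, k, B=10):
    """Gap(k) = mean_b log(W*_kb) - log(W_k), with W*_kb computed on
    uniform reference data drawn within the bounding box of X."""
    log_wk = np.log(within_dispersion(X, k))
    lo, hi = X.min(axis=0), X.max(axis=0)
    ref = [np.log(within_dispersion(rng.uniform(lo, hi, X.shape), k))
           for _ in range(B)]
    return np.mean(ref) - log_wk

# Three well-separated clusters (a hypothetical stand-in for the
# Gaussian-parameter points clustered in this study).
X = np.vstack([rng.normal(c, 0.05, size=(100, 2))
               for c in ([0, 0], [1, 0], [0.5, 1])])
gaps = {k: gap_statistic(X, k) for k in range(1, 6)}
print(gaps)   # the Gap rises sharply at k = 3 for this data
```

The Gap curve jumps where adding a cluster explains genuinely separated structure rather than noise.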

# **RESULTS**

Our data set consisted of 60 reconstructions of octopus arm movements that included *extension* movements. Extension in the octopus arm is generally characterized by a bend propagating along the arm (**Figure 11**). Some of these extension movements were preceded by initialization movements, referred to as *pre-extension* movements, in which the octopus arm moved from an initial random position to a configuration that seemed ideal for the extension (**Figure 12**).

Careful examination of the video sequences allowed us to define start- and end-points of the extension phase as characterized in earlier studies (Gutfreund et al., 1996, 1998). In an extension movement, a bend is created usually near the base of the arm and is propagated along the arm toward the tip where it vanishes, while the base of the arm points in the direction of propagation. A pre-extension movement is generally defined as a movement in which an arbitrary configuration of the arm is reconfigured to an initial extension configuration. Based on these observations we initially divided our data into 25 pre-extension and 60 extension movements. In order to characterize sets of kinematic units and determine synthetic rules allowing reconstruction of the observed movements, we next decomposed the movements into curvature and torsion Gaussian units (as defined in section surface decomposition using GMM) and analyzed these units as described below. Since each movement was defined by a specific combination of kinematic units, we could classify the movements into sub-groups, such that all the movements in a sub-group were defined by a combination of similar kinematic units. In order to explain the different phases of the movement analysis as clearly as possible we focus here mainly on the group of extension movements.

### **DECOMPOSITION AND CLUSTERING OF KINEMATIC UNITS**

Curvature and torsion surfaces were extracted for all the octopus arm movements in our database (see section spatio-temporal representation of movement as a pair of curvature and torsion surfaces). The curvature and torsion values at the tip of the arm could be very high (and sometimes noisy) relative to the values along the proximal and middle parts of the arm and were, therefore, analyzed separately. The resulting surfaces were approximated using the GMM, yielding decompositions of each curvature/torsion surface into 2D Gaussian units (see section surface decomposition using GMM). These units were found to naturally describe the surfaces, as each unit essentially described bending or torsion along a defined section of the arm and its movement along the arm as a function of time.

A set of 2D Gaussians was assembled as a set of kinematic units separately for the pre-extension and extension movement groups. A set of Gaussians can be variously clustered by referring only to a subset of the parameters defining the 2D Gaussians, namely the center location, size, shape and orientation (see section clustering algorithm). These parameters were easily extracted from the *mean* and *covariance matrix* of a 2D Gaussian; the coordinates of the Gaussian center on the surface (i.e., the time point and the arm index at which the Gaussian reached its maximum) were directly defined by the Gaussian *mean*. The orientation of the Gaussian (the angle between the Gaussian axes and the axes of the fixed coordinate system) was defined by the eigenvectors of the Gaussian *covariance matrix*. The projection of the Gaussian on the plane was an ellipse whose size and eccentricity were defined by the eigenvalues of the *covariance matrix* and the ratio between them. Finally, the relative influence of the Gaussian in the decomposition in which it participated was defined by its weight. The clustering results presented here were obtained by referring to the Gaussian's center location (Gaussian *mean*), Gaussian's shape (ratio between the eigenvalues of the Gaussian's *covariance matrix*), and Gaussian's weight.

**Figure 4** presents the clustering results obtained for the curvature and torsion Gaussians of the 60 extension movements. Gaussians marked by the same color belong to a single cluster. Executing *kmeans* with the Gap-Statistics method (section clustering algorithm) resulted in three clusters both for the curvature and torsion Gaussians. Coordinates of the centroids of the various clusters are presented in **Table 1**.

These results essentially suggest that all the Gaussians composing the curvature and torsion surfaces of the extension movements can be classified into three types according to the values of the Gaussian's center location and shape. Explicitly, the blue curvature cluster (**Figure 4** left) represents Gaussians defining curvature along the proximal section of the arm during the movement. Examining the orientation of these Gaussians, as defined by the eigenvectors of their *covariance* matrices, shows a relatively small angle between each of the Gaussian axes and the axes of the arm-index and time coordinate system (**Table 2**); that is, the internal Gaussian axes were almost aligned with the arm-index and time axes. This characteristic means that the section of the arm to which these Gaussians relate did not change during the movement; we therefore term them "fixed" Gaussians. We suggest that these Gaussians correspond to movements used to aim the base of the arm toward a target point during the extension movement (see discussion below).

The green and magenta clusters represent curvature Gaussians that travel toward the tip of the arm during the extension and are probably associated with the main characteristics of the bend

**Frontiers in Computational Neuroscience www.frontiersin.org** May 2013 | Volume 7 | Article 60 |

*(Figure 4 caption, continued:)* **GMM.** The arm index and time coordinates of each cluster centroid are given in **Table 1**.


**Table 1 | The arm index and time coordinates of the cluster centroids shown in Figure 4.**

**Table 2 | The mean (μ), median (μ1***/***2) and standard deviation (σ) for the orientation values (in degree) of the resulting curvature and torsion clusters shown in Figure 4.**


propagation in extension movements. The mean orientation of each of these clusters is significantly larger compared with the blue cluster that refers to the base of the arm (**Table 2**). A similar interpretation holds for the resulting clusters of the torsion Gaussians, where the green and magenta clusters refer to torsion Gaussians in the early stage of a movement, and the blue cluster refers to torsion Gaussians at the end of the movement. **Table 2** presents the median, mean, and standard deviation of the orientation values of the resulting clusters. These findings are supported by earlier analyses and simulations of the stereotypical characteristics of an extension movement (Gutfreund et al., 1998; Yekutieli et al., 2005b).

### **SYNTHESIZING ARM BEHAVIORS FROM KINEMATIC UNITS**

The 2D Gaussians were clustered by the *kmeans* algorithm based on their mean vector and covariance matrix values. The centroid point of each cluster represented the center of the cluster, i.e., the point giving the minimum sum of distances from it to all the data points in that cluster. Since the *mean* vector and *covariance* matrix which define a Gaussian as a data point also apply to the centroid point, a centroid point essentially defines a representative Gaussian for its cluster. Each of these Gaussians has a unique position, size and orientation, thus uniquely defining the octopus arm movement in 3D space.

We therefore consider the curvature and torsion Gaussians defined by the resulting centroid points as kinematic units that can be used to generate a set of behaviors of the octopus arm. These are local, time-dependent behaviors, since each refers to a specific section of the arm at a specific time during the movement. **Figure 5** presents the three curvature Gaussians (left) and the three torsion Gaussians (right) defined by the centroid points of the clusters found for the extension movements (**Figure 4**). A curvature unit alone defines a planar arm behavior, as it defines a change in the curvature level along a section of the arm as a function of time, with zero torsion associated with the arm. Coupling a curvature unit with a torsion unit, such that both refer to a common section of the arm, defines a 3D behavior. A torsion unit on its own, however, is meaningless, since applying torsion to a straight line representing the backbone of the arm has no effect on its configuration; nor does a torsion unit have a significant effect when coupled with a curvature unit that refers to a different section of the arm. In general, *nC* curvature units and *nT* torsion units can define *nC* · *nT* 3D behaviors, and since the *nC* curvature units also define *nC* planar behaviors when not coupled with any torsion unit, they can define *nC* · (*nT* + 1) behaviors overall. **Figure 6** presents some of the behaviors that can be defined by the curvature and torsion kinematic units extracted for the extension group (**Figure 5**). They are shown as sequences of quasi-static configurations in 3D space, where the red, black, and blue curves represent the initial, intermediate, and final configurations in the sequence, respectively.

### **CLASSIFYING OCTOPUS ARM MOVEMENTS**

The kinematic units (curvature/torsion Gaussians) extracted for the extension and pre-extension movement groups can be used further to classify movements in a given group into different sub-groups according to the mixture of Gaussians composing their curvature and torsion surfaces. Intuitively, movements that were decomposed into weighted combinations of similar kinematic units were classified into the same sub-group, as they were assumed to be characterized by a similar 3D behavior.

We represented each of the movements in our data set by a weighted combination of the curvature and torsion kinematic units defined for the group of movements to which the movement belonged. That is, a movement *m* approximated by a pair of curvature (*C*) and torsion (*T*) surfaces, $C = \sum_i w_i^C \cdot g_i^C$ and $T = \sum_j w_j^T \cdot g_j^T$, was represented by a row vector of the weights: $w = [w_i^C, w_j^T]$. We applied the *kmeans* algorithm (section clustering algorithm) to the set of weight vectors corresponding to a group of movements, such that the input to the algorithm is a matrix of weights whose rows correspond to the analyzed movements and whose columns correspond to the Gaussians previously identified as curvature and torsion units.

**FIGURE 5 | The curvature (left) and torsion (right) centroid Gaussians of the extension group of movements.** Each Gaussian is essentially the centroid of one of the clusters in **Figure 4**.

Each row thus defines a movement as a weighted sum of the elementary Gaussian units (**Figure 7**). The *kmeans* algorithm separated the movements into clusters such that movements belonging to the same cluster shared a similar pattern of weights; that is, a cluster consists of movements that can be spanned by a similar weighted sum of the available curvature and torsion units. We therefore refer to each of the resulting clusters as a sub-group of movements that share similar characteristics of their 3D behavior. The centroid point of each resulting cluster was considered a representative pattern of a weighted combination of kinematic units, defining the behavior of the sub-group of movements in 3D space. We refer to these different behaviors as movement prototypes.
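The clustering of movements by their weight vectors can be sketched as follows; the weight matrix here is synthetic, generated from two hypothetical prototype patterns rather than from real decompositions:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(4)

# Hypothetical weight matrix: one row per movement, one column per
# kinematic unit (curvature units first, then torsion units), as in
# w = [w_i^C, w_j^T].  30 movements over 6 units, generated from two
# underlying prototype patterns plus noise.
prototypes = np.array([[1.0, 0.2, 0.0, 0.5, 0.1, 0.0],
                       [0.1, 0.9, 0.6, 0.0, 0.0, 0.7]])
labels_true = rng.integers(0, 2, size=30)
W = prototypes[labels_true] + 0.05 * rng.standard_normal((30, 6))

# Movements with similar weight patterns fall into the same cluster;
# each centroid is a representative weight pattern (a "prototype").
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(W)
print(km.cluster_centers_.shape)   # → (2, 6)
```

Each row of `cluster_centers_` plays the role of a movement prototype: a representative weighted combination of the kinematic units.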

Each of the three pairs of curvature and torsion surfaces presented in **Figure 8** is defined by a weighted combination of kinematic units corresponding to one of the extension sub-groups found by the process just described. Each pair of curvature and torsion surfaces defined a prototype which characterized one of the sub-groups by simulating a sequence of 3D curves whose curvature and torsion values corresponded to the values given by the surfaces (**Figure 9**). Although the curvature surfaces of the first and third prototypes (**Figure 8** top, bottom) share a similar topographic structure, the differences between them define prototypes with different characteristics. Relative to the first prototype, the third prototype (**Figure 9** right) describes an extension movement in which a higher level of torsion is observed along with the propagating bend, causing the 3D configuration to deviate from the movement plane. Furthermore, the higher weight of the proximal curvature Gaussian results in a higher level of curvature along the base of the arm. The values of the torsion surface of the first prototype decrease during the second half of the movement, meaning that the configuration of the arm tends to become more planar as the movement progresses. For the first prototype (**Figure 9** left), the arm section around the propagating bend, which takes higher curvature values, creates a loop during the initial phase of the extension movement. Compared with the first and third prototypes, the propagating bend in the second prototype (**Figure 9** middle) starts with a higher curvature value and with lower curvature values along the proximal section of the arm.

### **PRE-EXTENSION RESULTS**

The analysis described above was also applied to the *pre-extension* movement group. The movements in this group refer to the actions that the octopus arm was observed to perform just before the extension phase started. The well-defined time point at which the bend starts to propagate along the arm was used to define the time at which a pre-extension movement ends. The kinematic units and arm behaviors extracted for this group showed some similarities to those of the extension group, but also some unique characteristics. **Figure 10** presents the single prototype that was found for pre-extension movements. It appears to represent the initializing phase of the arm, in which the base is directed toward a target and the bend (which is propagated during the extension) is generated. The initialization of the bend is achieved by generating movements corresponding to the curvature and torsion kinematic units on the same mid-arm section. Such dynamics may be associated with a minimal loss of energy due to interactions with drag forces. Computer simulations of the movements using the dynamic model of the octopus arm (Yekutieli et al., 2005a,b) will help to further explore and characterize this prototype with respect to muscle activation and energy expenditure.

# **DISCUSSION**

By carefully watching the octopus arm movements in video sequences and identifying the time points bounding the extension phase, we were able to divide our data set of reconstructed arm movements into two main groups, *pre-extension* and *extension* movements. The analysis described here was applied separately to each of these groups but we have presented results mainly related to the extension group. Equivalent results for the pre-extension group are also available.

Instead of the common representation of octopus arm movements in 3D Euclidean space, we modeled each arm movement using pairs of curvature and torsion surfaces. These surfaces essentially describe the curvature and torsion values at the sampled points along the virtual backbone of the octopus arm as a function of time and arm index. Such pairs of curvature and torsion surfaces provide a compact description of arm configuration which is independent of the arm location in 3D space and is invariant to rotation and translation. Most importantly, this approach can be used to demonstrate the existence of kinematic units or motor primitives in octopus arm movements.

The characteristics of the surfaces led us to examine whether they can be meaningfully decomposed. We applied the GMM, using 2D Gaussians as building blocks approximating the curvature and torsion surfaces of a movement. These 2D Gaussians provided a mathematically quantified representation whose hilly shapes fitted the topographic characteristics of the surfaces well. We have thus demonstrated a meaningful representation for octopus arm movements and a method (GMM) for decomposing the movements into well-defined building blocks (2D Gaussians), allowing us to further examine them as possible kinematic units. We have also applied an alternative method which decomposes a surface into its fundamental surfaces by analyzing the principal curvature values at each point on the surface. These parameters allow defining eight fundamental surfaces (e.g., peak, pit, ridge) that correspond to the topography of a surface (Yilmaz and Shah, 2005). Interestingly, we found the results extracted by this method to be very similar to those achieved using the GMM—the positions of the peak fundamental surfaces were highly correlated with the mean values of the positions derived using the 2D Gaussians.

Intuitively, being able to represent the arm by a 2D curvature Gaussian corresponds to the propagation of a bend point along a defined section of the arm and during a defined time interval. All the kinematic properties—the affected section of the arm, the time interval, the maximal curvature value and the velocity of propagation—are simply defined by the center location of the Gaussian, by its covariance matrix and by the weight assigned to this Gaussian. By clustering Gaussians with similar characteristics, we were able to characterize each cluster by its centroid and use the Gaussian representing the entire cluster as a stereotypical kinematic unit. We obtained a set of such curvature and torsion units for each of the pre-extension and extension movement groups. Curvature and torsion units were then combined to simulate new movements in 3D space and to examine whether the entire observed repertoire of complex 3D octopus arm behaviors can be spanned using the derived basic set of kinematic units. We found that patterns of weighted combinations of the kinematic units can be clustered into prototypes of movements in the pre-extension and extension phases, allowing classification of the movements into sub-groups.
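As an illustration of this representation, the following sketch (with invented unit parameters) evaluates a curvature surface as a weighted sum of 2D Gaussians over normalized time and arm index; a center that shifts along the arm as time advances mimics a propagating bend.

```python
import numpy as np

def gaussian2d(t, s, center, cov, weight):
    """One kinematic unit: a weighted 2D Gaussian over time t and
    normalized arm index s (0 = base of the arm, 1 = tip)."""
    d = np.stack([t - center[0], s - center[1]], axis=-1)
    q = np.einsum('...i,ij,...j->...', d, np.linalg.inv(cov), d)
    return weight * np.exp(-0.5 * q)

# grid over normalized time (axis 0) and arm index (axis 1)
t, s = np.meshgrid(np.linspace(0, 1, 50), np.linspace(0, 1, 50),
                   indexing='ij')
# two units whose centers shift along the arm as time advances,
# mimicking a bend propagating from the base toward the tip
surface = (gaussian2d(t, s, (0.3, 0.3), np.diag([0.01, 0.02]), 1.5)
           + gaussian2d(t, s, (0.7, 0.7), np.diag([0.01, 0.02]), 1.0))
```

Here the Gaussian centers, covariances, and weights are illustrative stand-ins for the extracted unit parameters; summing more units with different centers would span richer surfaces.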

The combinations of kinematic units which define the prototypes need further investigation to reveal the principles underlying the execution of the different arm movements. Sumbre et al. (2001) have suggested that a relation between kinematic features and basic motor programs (embedded within the neural circuitry of the arm itself) greatly simplifies the motor control of the octopus arm. In addition, a simple command producing a wave of muscle activation in a dynamic model was sufficient to replicate the kinematic characteristics of natural reaching movements (Yekutieli et al., 2005a). Specifically, it was found that natural extension movements can be generated by a dynamic model in which a simple propagating neural activation signal is sent to contract muscles along the arm. In the model, the control of only two parameters fully specified the extension movement: the amplitude of the activation signal and the activation travelling time, such that different levels of activation can result in the desired kinematics (Yekutieli et al., 2005b). We suggest that the values of these two control parameters can be associated with the characteristics of the kinematic units extracted here. That is, the weight, shape, orientation, and size of a Gaussian can be related to the amplitude of the activation signal and the activation travelling time.

The relation between the kinematic units and the biomechanics of the octopus arm remains to be examined (Feinstein et al., 2011). The arm morphology shows that the dorsal group of the longitudinal muscles is much thicker than the ventral group, and both groups differ from the lateral groups. This anisotropy suggests that while bending movements to the left and right might be similar, this is not the case when comparing upward, downward and sideward directions. The oblique muscles are composed of three pairs of helical bands, such that the handedness of the helix of one member of the pair is opposite to that of the other member of the same pair (Kier and Stella, 2007). This isotropy with respect to the arm axis suggests that torsion in the two different directions is applied in a similar manner.

The results of our analysis agree with our data on octopus arm movements. The extension movement shown in **Figure 11A** matched prototype no. 2 of the extension group (**Figure 9** middle), as a movement in which a highly curved bend along a relatively short section of the arm propagated rapidly toward a target. The movement in **Figure 11B** matched prototype no. 1 of the extension group (**Figure 9** left), as a movement in which the arm moved relatively slowly toward the target with relatively low curvature values for the propagating bend. The Gaussians capturing the main characteristic of extension movements strengthen the previous findings of Gutfreund et al. (1996) of a stereotypic profile of the position and velocity of the bend point. They suggested that the position of the bend in space and time is a controlled variable, which simplifies motor control. The travelling bend, associated with a propagating wave of muscle activation (Gutfreund et al., 1998), was simulated as a biomechanical mechanism in a dynamic model of the octopus arm (Yekutieli et al., 2005a).

The two pre-extension movements shown in **Figure 12** matched the prototype of the pre-extension group (**Figure 10**). These movements involved a substantial manipulation of the initial arm configuration, creating a bend in the arm and directing it toward the target. In this movement, curvature and torsion kinematic units are both applied on the mid-section of the arm during the pre-extension phase. These results demonstrate that the Gaussian description of movement primitives allows us to describe a complex motor behavior. Clearly, additional types of octopus arm movements other than the pre-extension and extension movements analyzed here are part of the motor repertoire of octopus behavior. While reconstruction and analysis of these other movements will probably reveal additional kinematic building blocks, we can expect that the general characteristics of octopus

**FIGURE 11 | Two extension movements.** The upper movement **(A)** was classified as prototype no. 2 of the extension group (**Figure 9** middle), as a movement in which a highly curved bend was rapidly propagated toward a target, while the base of the arm stayed oriented with a fixed direction. The lower movement **(B)** was

classified as prototype no. 1 of the extension group (**Figure 9** left), as a movement in which the bend showed lower curvature values and moved relatively slowly toward the target while the direction of the base of the arm was not preserved. The movements progress from left to right in each panel.

**FIGURE 12 | A pre-extension movement as a sequence progressing from the upper left (A) to the lower right (F) frames.** A substantial manipulation, creating a bend and directing it toward a target, was applied to the initial

configuration (upper left). This movement is matched with the prototype of the pre-extension group (**Figure 10**). Frame **(F)** presents a temporal configuration that matches the beginning of an extension movement.

arm movements, mainly the smoothness with which curvature and torsion values vary along both the arm and time dimensions, will hold for other types of movements. Therefore, we believe that Gaussian functions could also be used efficiently in the decomposition and description of those movements.

Our results fit with Yekutieli et al.'s (2007) observations on the kinematic characteristics of the initiation of a reaching movement. The kinematic description is sufficiently rich for describing complex arm movements, although factors such as the biomechanics of the octopus arm (e.g., the different type of muscles and the constant volume constraint), water drag forces and energy expenditure also strongly influence the arm movement characteristics. For example, the perpendicular drag coefficient for an octopus arm is nearly 50 times larger than the tangential drag force coefficient. This most likely affects the preferred arm configuration during extension movements; only a small part of the arm is oriented perpendicularly to the direction of movement, minimizing drag (Yekutieli et al., 2005a). Yekutieli et al. (2005b) also found that the control of extension movements can be specified by the amplitude of the muscle activation signals and the activation travelling time. The primitives we suggest here can be used to further investigate the relation between the kinematic and muscle activation levels.

In our analysis the curvature and torsion surfaces were extracted for arm configurations whose length had been normalized. Replotting the curvature and torsion surfaces with the actual arm length values from the live data (**Figure 13** left) shows that the proximal section of the arm (i.e., the section between the base and the bend point) elongates during an extension. This is demonstrated clearly in **Figure 13** (right), which shows the ratio of the length of the proximal section to the length of the entire arm during an extension movement. Arm elongation has recently been shown to play a key role in the biomechanics and control of octopus arm movements (Hanassy, 2008). The travelling bend during extension movements has been modeled as the propagation of a muscle activation and stiffening wave (Gutfreund et al., 1998), in which co-contraction of both the longitudinal and transverse muscles pushes the bend forward (Yekutieli et al., 2005b). Therefore, different ratios between the activation levels of the longitudinal and transverse muscles can be used to control the elongation of the arm along the proximal section between the base and the bending point. The Gaussian units, which were found in this study to describe the travelling bend during extension movements, will be further examined in order to support recent findings related to the biomechanics and control of the octopus arm.

Our analysis presents a possible language of kinematic primitives: 2D Gaussians of either curvature or torsion, whose different combinations define and classify octopus arm behaviors. Constructing a taxonomy of possible movements for a species is one approach to the study of its behavior. To construct a taxonomy of octopus arm movements and to reveal how combinations of components result in a variety of behaviors, Mather (1998) used components consisting of movements of the arm itself, the ventral suckers and their stalks, as well as the relative position of the arms and the skin web between them. Comparing similar movement taxonomies and ethograms (catalogs of body patterns and associated behaviors) in the squid and various octopus species (Hanlon et al., 1999; Huffard, 2007; Mather et al., 2010) suggests that behaviors may be conserved throughout the evolution of these species. Our results identify a number of kinematic units, possible time-dependent units and sub-groups (**Table 3**). As more reconstructed octopus arm movements become available (Yekutieli et al., 2007; Zelman et al., 2009), we will be better able to use our analytical tools to define a comprehensive language of motor primitives that incorporates the underlying kinematic principles, thus enriching the ethogram and taxonomy of octopus arm behavior.

Our analysis provides a new framework for research on the kinematics and control of any natural or mechanical flexible manipulator. Possible arm behaviors can be simulated by synthesizing new combinations of the extracted Gaussians. New primitives can be hypothesized and tested on dynamic models of the octopus arm, and the resulting movements can be compared with live movements. This, in turn, may allow future studies of activation commands at the neural control level, which may then enable operation of a real flexible manipulator to perform specified goal-oriented tasks (Laschi et al., 2009, 2012; Calisti et al., 2011).

**Table 3 | The analysis applied to each movement group yielded a number of kinematic units defining possible local-temporal 3D behaviors of the arm.** *Each group was divided into a number of sub-groups which were represented by different movement prototypes.*

# **ACKNOWLEDGMENTS**

We thank Dr. Jenny Kien for suggestions and editorial assistance. This work was supported by the European Commission in the ICT-FET OCTOPUS Integrating Project under contract #231608, and by Israel Science Foundation #1270/06 to Hochner and Flash. Tamar Flash is an incumbent of the Dr. Hymie Morros professorial chair.

# **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at http://www.frontiersin.org/Computational\_Neuroscience/10.3389/fncom.2013.00060/abstract

# **REFERENCES**

of fast-reaching movements by muscle synergy combinations. *J. Neurosci.* 26, 7791–7810.

(Jerusalem, Israel: The Hebrew University).


*Reach to Grasp Movement.* ed K. M. B. Bennett, U. Castiello (North-Holland: Elsevier), 3–15.


of Sepioteuthis sepioidea squid with a muscular hydrostatic system. *Mar. Freshwater Behav. Physiol.* 43, 45–61.


merging optimality with geometric invariance. *Biol. Cybern.* 100, 159–184.


single joint movements. *Hum. Mov. Sci.* 19, 627–665.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 16 December 2012; accepted: 27 April 2013; published online: 24 May 2013.*

*Citation: Zelman I, Titon M, Yekutieli Y, Hanassy S, Hochner B and Flash T (2013) Kinematic decomposition and classification of octopus arm movements. Front. Comput. Neurosci. 7:60. doi: 10.3389/fncom.2013.00060*

*Copyright © 2013 Zelman, Titon, Yekutieli, Hanassy, Hochner and Flash. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

# Segmenting sign language into motor primitives with Bayesian binning

#### *Dominik Endres<sup>1</sup>\*†, Yaron Meirovitch<sup>2</sup>†, Tamar Flash<sup>2</sup> and Martin A. Giese<sup>1</sup>\**

*<sup>1</sup> Department of Cognitive Neurology, Section Computational Sensomotorics, CIN, HIH and University Clinic Tübingen, Tübingen, Germany <sup>2</sup> Department of Applied Mathematics and Computer Science, The Weizmann Institute of Science, Rehovot, Israel*

### *Edited by:*

*Andrea D'Avella, IRCCS Fondazione Santa Lucia, Italy*

### *Reviewed by:*

*Emili Balaguer-Ballester, Bournemouth University, UK Ioannis Delis, Istituto Italiano di Tecnologia, Italy*

### *\*Correspondence:*

*Dominik Endres and Martin A. Giese, Department of Cognitive Neurology, Section Computational Sensomotorics, CIN, HIH and University Clinic Tübingen, Otfried-Müller-Str. 25, Tübingen 72076, Germany. e-mail: dominik.endres@klinikum.uni-tuebingen.de; martin.giese@uni-tuebingen.de*

*†These authors have contributed equally to this work.*

The endpoint trajectories of human movements fulfill characteristic power laws linking velocity and curvature. The parameters of these power laws typically vary between different segments of longer action sequences. These parameters might thus be exploited for the unsupervised segmentation of actions into movement primitives. For the example of sign language we investigate whether such segments can be identified by Bayesian binning (BB), using a Gaussian observation model whose mean has a polynomial time dependence. We show that this method yields good segmentation and correctly models ground truth kinematics composed of consecutive segments derived from wrist trajectories recorded from users of Israeli Sign Language (ISL). Importantly, polynomial orders between 3 and 5 yield an optimal trade-off between complexity and accuracy of the trajectory approximation, in accordance with the minimum acceleration and minimum jerk models. Comparing the orders of the polynomials best approximating natural kinematics against those needed to fit the power law ground truth data suggests that kinematic properties not compatible with power laws are also not adequately represented by low order polynomials and require higher order polynomials for a good approximation.

**Keywords: motor primitives, two-thirds power law, differential invariants, Bayesian binning, sign language, minimum jerk model**

# **1. INTRODUCTION**


Complex motor behavior might be organized in terms of sequences of temporal movement primitives that follow each other in time. Determining such primitives from kinematic data is an important problem for many technical applications, e.g., in robotics, computer vision and computer graphics. At the same time, the characterization of possible temporal primitives that underlie the planning and execution of complex motor behavior remains a partially unresolved issue in motor control (Flash and Hochner, 2005). While the appropriate characterization of the temporal organization of complex motor behavior might ultimately require hierarchical multi-level representations (Flash and Hochner, 2005), many previous studies that investigated the nature of such primitives have focused on the analysis of movement kinematics. Specifically, it has been investigated how the temporal and kinematic properties of the movement are influenced by the path followed by the hand (see e.g., Polyakov et al., 2009b).

One approach to the definition of temporal segments is based on invariant properties that characterize movements within individual segments. It was already established at the end of the nineteenth century that for arm movements curvature and speed are correlated variables, speed typically obeying an inverse relation to curvature (Jack, 1895). Almost 100 years later, this rule was quantitatively formalized as the two-thirds power law. Specifically, this rule dictates that for figure drawing movements the speed along the motion path is proportional to the curvature of this path raised to the minus one-third power (Lacquaniti et al., 1983):

$$|\nu(t)| = \alpha \kappa^{-\frac{1}{3}}(t) \tag{1}$$

where *v* is the Euclidean velocity, κ is the Euclidean curvature (i.e., the reciprocal of the radius of the osculating circle) and α is the so-called velocity gain factor, which is constant within each individual segment.
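The quantities in Equation (1) can be estimated from sampled trajectories by finite differences. The following sketch computes speed and curvature for a planar trajectory and checks them on a circle, where the curvature must equal the reciprocal of the radius.

```python
import numpy as np

def speed_and_curvature(x, y, dt):
    """Speed and Euclidean curvature of a planar trajectory,
    estimated with finite differences."""
    xd, yd = np.gradient(x, dt), np.gradient(y, dt)
    xdd, ydd = np.gradient(xd, dt), np.gradient(yd, dt)
    v = np.hypot(xd, yd)                        # Euclidean speed
    kappa = np.abs(xd * ydd - yd * xdd) / v**3  # 1 / osculating radius
    return v, kappa

# sanity check on a circle of radius 2: kappa should be 1/2, v should be 2
dt = 0.001
t = np.arange(0, 2 * np.pi, dt)
v, kappa = speed_and_curvature(2 * np.cos(t), 2 * np.sin(t), dt)
```

In practice noisy recordings would be low-pass filtered before differentiation, since curvature involves second derivatives and amplifies noise.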

Extensive research has investigated the conditions and origins of the two-thirds power law. Equation (1) was first developed for two-dimensional drawings but was also later applied to threedimensional drawing under isometric force conditions (Massey et al., 1992) and to various movement modalities including eye pursuit (Viviani and deSperati, 1997) and speech movements (Tasko and Westbury, 2004; Perrier and Fuchs, 2008). Also, the exponent of the two-thirds power law varies in children and becomes more stable with age (Viviani and Schneider, 1991). This principle not only applies to motion production but also to motion perception as has been supported by studies of the perception of handwriting and drawing movements (Soechting et al., 1986; Viviani et al., 2000) and of the motion of abstract visual stimuli (Viviani and Stucchi, 1992; Levit-Binnun et al., 2006). Finally, a functional magnetic resonance imaging (fMRI) study has supported a central representation of the perception of this kinematic law (Dayan et al., 2007). Taken together, these studies show that the 2/3 power law is most likely not the expression of bio-mechanical constraints but may reflect the involvement of the central nervous system (for a review see Flash et al., 2012).

A recent study (Meirovitch, 2008) has investigated the wrist trajectories in sign language with the goal of identifying the motor control strategies for such spatially constrained movements. Movement recordings from normal naive participants revealed that a generalized form of the 2/3 power law (Equation 1) predicts the velocity profiles of the wrist trajectories. The executed trajectories across a number of repetitions were fitted with the model:

$$|\nu(t)| = \alpha \kappa^{\beta}(t) \tag{2}$$

where *v*, κ and α are defined in the same way as for Equation (1) and the parameter β ∈ [−1, 0] typically remains constant during individual trajectory segments. This suggests that this parameter might be used to segment longer action sequences into movement primitives, identifying the segments with an approximately constant β. The presence of kinematic segments does not necessarily imply segmented control by the brain [see Schaal and Sternad (1999) and Flash and Hochner (2005)]. Therefore, an attempt to unravel kinematic primitives would need to be consistent with optimization models used in motor control, such as the minimization of jerk (the time derivative of acceleration), variance, etc. [see section 1.1]. Here, we employ the segment-wise constancy of β to generate ground-truth data for the testing of motion segmentation algorithms. We use sign language trajectories as a basis for our analysis and modify their velocity to match a possible power law segmentation with fixed values of the parameters α and β within each predefined segment. The timing of each segment in the ground-truth parameterization closely follows the timing of the natural sign language trajectories. Moreover, in simulating the discrete segmentation of the ground-truth data, the algorithm optimizes for a smooth transition between adjacent segments by choosing suitable power law parameters. The details of this procedure are described in section 2.4.
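Within a segment, the parameters α and β of Equation (2) can be estimated by linear regression in log-log coordinates, since log *v* = log α + β log κ. A sketch on noise-free synthetic data (all values invented):

```python
import numpy as np

def fit_power_law(v, kappa):
    """Least-squares fit of log v = log(alpha) + beta * log(kappa)."""
    beta, log_alpha = np.polyfit(np.log(kappa), np.log(v), 1)
    return np.exp(log_alpha), beta

# synthetic noise-free segment with known (invented) parameters
rng = np.random.default_rng(1)
kappa = rng.uniform(0.5, 5.0, 200)
v = 1.8 * kappa ** (-1 / 3)
alpha_hat, beta_hat = fit_power_law(v, kappa)
```

On noise-free data the fit recovers α = 1.8 and β = −1/3 essentially exactly; with real recordings a robust or non-linear regression, as used in the paper, is preferable.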

# **1.1. CONNECTION BETWEEN POLYNOMIALS AND POWER LAWS**

In addition to these power laws (Equation 2), human movements were shown to be well captured by optimization models that maximize the smoothness of the trajectories, mathematically expressed by the minimization of integrated jerk or of higher-order time derivatives of position, i.e., snap, crackle and so on (Flash and Hogan, 1985; Todorov and Jordan, 1998). Other types of movements (e.g., locomotion and arm reaching) were shown to be well captured by minimum acceleration models (Ben-Itzhak and Karniel, 2008; Mombaur et al., 2010). Mathematically, such models predict that the trajectories will be well captured by polynomials of orders 3, 5, and 7, corresponding to minimum acceleration, jerk and snap models, respectively. In Richardson and Flash (2002) it was mathematically shown that such polynomial trajectories, which optimize mean squared derivative cost functions

$$C\_n = \int\_0^T \left| \left| \frac{d^n r}{dt^n} \right| \right|^2 dt,\tag{3}$$

(where *n* = 3, 4 correspond to minimum jerk and snap, respectively) follow generalized power laws whose exponents depend on the cost function being optimized and on the geometrical shape of the trajectory being traced. In addition, such predicted power laws were shown to be consistent with the power law found empirically in the experimental data.
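For a straight point-to-point movement with zero boundary velocity and acceleration, the minimum-jerk solution (*n* = 3 in Equation 3) reduces to a well-known fifth-order polynomial in normalized time; the sketch below evaluates it and can serve as a reference segment model.

```python
import numpy as np

def minimum_jerk(x0, xf, T, t):
    """Fifth-order minimum-jerk position profile for a point-to-point
    movement with zero boundary velocity and acceleration."""
    tau = t / T  # normalized time in [0, 1]
    return x0 + (xf - x0) * (10 * tau**3 - 15 * tau**4 + 6 * tau**5)

t = np.linspace(0.0, 1.0, 101)
x = minimum_jerk(0.0, 1.0, 1.0, t)
```

The profile starts and ends at rest and passes through the midpoint at half the movement time, the symmetric bell-shaped speed profile characteristic of minimum-jerk movements.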

In another study Polyakov et al. identified parabolic strokes whose generation both obeys the 2/3 power law and yields minimum-jerk trajectories (Polyakov et al., 2009b). Parabolas are interesting because of their invariance with respect to affine transformations and additionally their special role as geodesics in equi-affine geometry which predicts the two-thirds power law (Flash and Handzel, 2007).

For 3-dimensional geometrically complex trajectories, a power law that depends on torsion, which measures the rate of change of the osculating plane, was analyzed for 3D drawing movements (Maoz et al., 2009; Pollick et al., 2009). Although the link between power laws and variational optimization principles was studied for several figural forms (Polyakov et al., 2009b), such links have not been examined for natural complex trajectories. Here we provide the first detailed account of the computational equivalence between the generalized power law and variational models. To this end, we present a Bayesian approach for the temporal segmentation of complex end-effector trajectories based on a polynomial observation model and show that the resulting segments can be used to identify the power law structure of the kinematic profiles.

# **1.2. UNSUPERVISED SEGMENTATION OF COMPLEX END-EFFECTOR MOVEMENTS**

Segmenting trajectories into power-law obeying pieces is difficult for three reasons: first, the number of segments and their temporal boundaries are *a-priori* unknown. Second, estimating higher derivatives from noisy trajectory data is prone to errors. Third, segments obeying power laws with different βs (Equation 2) are typically connected by short transition periods during which the trajectory is not well described by a power law (Viviani and Flash, 1995). We address the first two problems by choosing a Bayesian approach, based on Bayesian Binning (BB). Estimating the number of segments is a model complexity estimation problem, which we deal with using the "Occam's razor" inherent in Bayesian approaches: the larger parameter space of more complex models (e.g., having more segments) implies that every individual instantiation of such a model is *a-priori* less probable than a more parsimonious model (Bishop, 2007). Thus, simpler models are preferable if they can explain the observable data equally well. To handle the (possibly large amounts of) noise in parameter estimates, Bayesian models infer a posterior distribution over parameters, instead of the point estimates yielded by maximum-likelihood fitting procedures. This posterior distribution allows us to assess not only the expected value, but also the uncertainty of the parameter of interest. We sidestep the third problem (transition periods which do not obey power laws) by using a dataset in which all transition periods are very short, almost instantaneous, and a segment model which can also describe the point-to-point and transition movements well (Flash and Hogan, 1985; Polyakov et al., 2009a). As detailed above, both polynomials and power laws can be derived from the same optimization principles, and power law trajectories can be well fitted by time-dependent polynomials, but the kinematic profiles at the boundaries of segments are better explained by polynomials.
Furthermore, the parameterization by polynomials avoids singularities that arise in Equation (2) for straight curve segments with zero curvature.
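The following sketch is not the full Bayesian binning machinery but a simplified penalized-likelihood analog: an exact dynamic program that chooses segment boundaries by trading per-segment polynomial fit error against a fixed per-segment penalty (a crude stand-in for the Bayesian Occam factor). All data and parameter values are invented.

```python
import numpy as np

def segment_trajectory(y, t, order=3, penalty=1.0, min_len=5):
    """Exact DP over all segmentations: each segment costs the residual
    sum of squares of an order-`order` polynomial fit plus `penalty`."""
    n = len(y)
    def sse(i, j):                         # fit cost on samples i..j-1
        c = np.polyfit(t[i:j], y[i:j], order)
        r = y[i:j] - np.polyval(c, t[i:j])
        return float(r @ r)
    best = np.full(n + 1, np.inf)
    back = np.zeros(n + 1, dtype=int)
    best[0] = 0.0
    for j in range(min_len, n + 1):
        for i in range(0, j - min_len + 1):
            if not np.isfinite(best[i]):
                continue
            cost = best[i] + sse(i, j) + penalty
            if cost < best[j]:
                best[j], back[j] = cost, i
    bounds = [n]                           # backtrack the optimal boundaries
    while bounds[-1] > 0:
        bounds.append(back[bounds[-1]])
    return bounds[::-1]                    # includes 0 and n

# piecewise-cubic signal with a clear break near t = 1
t = np.linspace(0, 2, 80)
y = np.where(t < 1, t**3, (2 - t)**3 + 2)
bounds = segment_trajectory(y, t, order=3, penalty=0.01)
```

Unlike full BB, this yields a single point estimate rather than a posterior over segmentations, but it illustrates how a complexity penalty keeps the number of segments small.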

The rest of the paper is structured as follows. First we give a concise description of the data recordings and ground truth generation in section 2, since these data have not been published before. We develop BB for the segmentation of wrist trajectories recorded with motion capture in section 2.5. There, we show how to use BB with observations models having a polynomial time dependence of the mean and segment-wise constant coefficients. In section 3 we demonstrate the results achieved by BB and compare them with the ground truth. Finally, we give an outlook for further investigations in section 4.

# **2. MATERIALS AND METHODS**

To generate a form of ground truth data that complies with the kinematic law (Equation 2), we corrected the speed parameterization of the original, recorded trajectories in a way that made them exactly compatible with the kinematic law.

Our experiments were based on data adapted from Meirovitch (2008). A short description of the experiments is given below. In section 2.4 we give a more detailed description of the segmentation mechanism which we used to synthesize the "ground truth" segmentation. While the employed synthesis method may appear complicated, we chose it to generate a ground truth with a high degree of biological realism.

# **2.1. SUBJECTS AND SIGNS**

Subjects were two natural users of Israeli Sign Language (ISL): S00 (male, 45), a native signer who acquired ISL during his childhood through exposure to his parents, and S01 (male, 44) who acquired ISL in childhood.

Each subject was asked to sign two words, "cake (baked)" and "chandelier." The English equivalents of these ISL signs were shown on a screen in front of the subject prior to execution. Each sign was repeated 20 times.

# **2.2. DATA RECORDING AND PREPROCESSING**

Hand movements were recorded using the Polhemus LIBERTY 240/16 motion capture system which recorded the location of a sensor fixed to the subject's wrist at an accuracy of 0.08 cm at a frequency of 240 Hz.

The trajectories were preprocessed with a 50-sample 6 Hz low-pass FIR filter (normalized gain of −6 dB at 6 Hz), and their velocity profiles and Euclidean curvature were calculated for each sample *n* (Calabi et al., 1998).
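The preprocessing step can be sketched with a generic Hamming-windowed sinc low-pass FIR filter (an illustrative design, not the study's exact filter coefficients):

```python
import numpy as np

def lowpass_fir(signal, cutoff_hz, fs, numtaps=51):
    """Hamming-windowed sinc low-pass FIR filter applied by convolution
    (a generic sketch, not the study's exact filter)."""
    n = np.arange(numtaps) - (numtaps - 1) / 2
    h = np.sinc(2 * cutoff_hz / fs * n) * np.hamming(numtaps)
    h /= h.sum()                       # normalize to unit DC gain
    return np.convolve(signal, h, mode='same')

fs = 240.0                             # sampling rate used in the study (Hz)
t = np.arange(0, 1, 1 / fs)
clean = np.sin(2 * np.pi * 1.0 * t)    # slow 1 Hz component (passband)
noisy = clean + 0.5 * np.sin(2 * np.pi * 50 * t)  # 50 Hz noise (stopband)
filtered = lowpass_fir(noisy, cutoff_hz=6.0, fs=fs)
```

Away from the edges (where the convolution lacks full support) the slow component survives while the high-frequency noise is strongly attenuated, which is the point of filtering before numerical differentiation.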

# **2.3. FITTING OF POWER LAW AND QUANTIFICATION OF COMPLIANCE**

We used correlation coefficients to compare the actual trajectories with the predictions based on the best-fitting power law within individual trajectory segments. Within each segment, the law was fitted using non-linear regression (Coleman and Li, 1994, 1996; Maoz, 2007), where the predicted speed value is denoted as *v*ˆ(*n*) and where αˆ and βˆ are the fitted parameter values for this segment. The predicted speed is given by the relationship:

$$
\hat{\nu}(n) = \hat{\alpha} \kappa^{\hat{\beta}}(n) \tag{4}
$$

A measure for the quality of the fit is given by the compliance

$$R_s^2(\hat{\alpha}, \hat{\beta}) = 1 - \frac{\sum_{i \le n \le j} \left(\nu(n) - \hat{\nu}(n)\right)^2}{\sum_{i \le n \le j} \left(\nu(n) - \nu_{\text{average}}\right)^2} \tag{5}$$

where *s* is the segment for which regression is carried out and *v*average is the average speed within the segment.
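As an illustration, the per-segment fit and the compliance measure (Equations 4 and 5) can be sketched as follows. Note that the paper fits the law by non-linear regression (Coleman and Li, 1994, 1996); this minimal sketch instead uses ordinary least squares in log–log space as a stand-in, and the function names are illustrative:

```python
import numpy as np

def fit_power_law(speed, curvature):
    """Fit v = alpha * kappa**beta by least squares in log-log space
    (a simplified stand-in for the non-linear regression used in the paper)."""
    beta, log_alpha = np.polyfit(np.log(curvature), np.log(speed), 1)
    return np.exp(log_alpha), beta

def compliance_r2(speed, curvature, alpha, beta):
    """Equation (5): 1 - SSE / SST of the power law prediction within a segment."""
    v_hat = alpha * curvature ** beta
    sse = np.sum((speed - v_hat) ** 2)
    sst = np.sum((speed - speed.mean()) ** 2)
    return 1.0 - sse / sst
```

For a trajectory segment that complies exactly with a power law, the recovered exponent matches the generating one and the compliance is 1 up to numerical precision.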

# **2.4. POWER LAW SEGMENTATION**

To generate a data set as ground truth that fully complies with the power law (Equation 2) the speed of the original trajectories was reparameterized (by time warping) to make the individual trajectory segments exactly compatible with the best fitting kinematic law. The purpose of the time-warping is the generation of a ground truth dataset that fully complies with the power law against which the automatic (polynomial) segmentation can be compared. Our paradigm enables the treatment of several minimization principles (e.g., acceleration, jerk, snap, crackle etc.) in parallel by choosing the model via Bayesian model comparison.

The idea employed in the ground truth synthesis is based on randomizing some parameters of the segmentation while optimizing others. At the first step the algorithm iterates through possible segmentations in which the temporal breakpoints and the exponents of the power laws are randomized. To respect the smoothness characteristic of natural movement, the gain-factors of the respective segments are then calculated from the curvatures at the boundaries of the segments, and the trajectory is then timewarped. The segments that comprise the ground truth dataset have durations comparable to the respective segments in the natural trajectories, comparable maximal speed, and their speed is continuous at the boundaries. An example of a synthesized ground truth trajectory is shown in **Figure 1**.

In the following, we give some additional technical details. To avoid singularities we excluded from the dataset the initial and terminal parts of the movement, where the speed was below a prescribed fraction of the maximal speed (<15% of maximal speed).

First, each trajectory was randomly partitioned into $N = 3$ consecutive time intervals, $\{[t_i, t_{i+1}]\}_{i=1}^{N}$, where the first interval begins at $t_1$ and the last interval terminates at the end of the recorded movement, at time $t_{N+1}$, such that the duration of each segment, $t_{i+1} - t_i$, was not too short (>300 ms). We proceeded to synthesize the power law parameters for each of the random partitions. At the first step, we uniformly randomized $\beta = \{\beta_1, \ldots, \beta_N\} \in [-1, 0]^N$ such that $|\beta_{i+1} - \beta_i| > 0.1$, and those $\beta$ $N$-tuples that were biologically implausible were rejected according to criteria that are described below.

The $\alpha$ parameter, $\alpha = \{\alpha_1, \ldots, \alpha_N\}$, was determined based on the randomized $\beta$ values and the empirical speed and curvature. The first value, $\alpha_1$, was determined from the empirical speed using Equation (2) according to:

$$
\alpha\_1 = \nu(0)\kappa(0)^{-\beta\_1},
$$

**FIGURE 1 | The trajectories of an ISL "cake" repetition before (a) and after (d) time-warping. (a)** The samples of the raw data are colored according to time. **(d)** The time-warped trajectory, where the three colors designate power law segments. Log–log plots of curvature and velocity are depicted on the right side: **(b)** raw data and **(c,e)** after time-warping, colored according to time and segments, respectively. The spatial representation, i.e., the path, of the raw and time-warped trajectories is identical. Also, although both raw and time-warped log–log curves are characterized by linear segments, the time warping is based on a simpler power law representation comprising three highly fitting ($R^2 > 0.97$) segments with beta values ranging from about −0.19 to −0.6. The transitions between long straight segments in the log–log representation occur either within a very brief transitional period which does not comply with the power laws of the adjacent segments [e.g., the portion between the blue and green segments in **(e)**] or at the temporal point of intersection of the piecewise linear sections [e.g., green to red segment in **(e)**].

The $\alpha_{i+1}$ value was determined by enforcing the constraint that the speed must be continuous at the segment boundaries, resulting in the relationship:

$$\nu(t_{i+1}) = \alpha_{i+1}\kappa(t_{i+1})^{\beta_{i+1}} = \alpha_i\kappa(t_{i+1})^{\beta_i}.$$

Using spline interpolation we reparameterized the trajectories within each time interval, $[t_i, t_{i+1}]$, defining a new effective time parameter, $\tau$, that was defined up to a constant by:

$$ds = \alpha_i \kappa(t)^{\beta_i} \, d\tau$$

where *ds* is the Euclidean arc-length parameterization of the trajectory. This relationship results in the differential equation:

$$\frac{d\tau}{dt} = \frac{1}{\alpha_i} \nu(t) \kappa(t)^{-\beta_i}$$

that links the original and the warped time axes of the trajectory.
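The boundary-continuity rule and the time-warping equation above can be sketched numerically. The function names are illustrative, and the trapezoidal integration below is an assumption (the paper reparameterizes via spline interpolation):

```python
import numpy as np

def continuous_alpha(alpha_prev, beta_prev, beta_next, kappa_boundary):
    """Gain of the next segment, from speed continuity at the boundary:
    alpha_{i+1} = alpha_i * kappa(t_{i+1})**(beta_i - beta_{i+1})."""
    return alpha_prev * kappa_boundary ** (beta_prev - beta_next)

def warp_time(t, v, kappa, alpha, beta):
    """Integrate d(tau)/dt = v(t) * kappa(t)**(-beta) / alpha with the
    trapezoidal rule; tau is defined up to an additive constant."""
    dtau_dt = v * kappa ** (-beta) / alpha
    increments = 0.5 * (dtau_dt[1:] + dtau_dt[:-1]) * np.diff(t)
    return np.concatenate(([0.0], np.cumsum(increments)))
```

A sanity check: if the speed already obeys $v = \alpha\kappa^{\beta}$ exactly within the segment, then $d\tau/dt = 1$ and the warped time equals the original time (up to the constant).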

Finally, trajectories that induced biologically improbably high speed ratios were rejected. For the synthesized trajectories that were not rejected, we recalculated, using non-linear regression (see section 2.3), the actual $\hat\alpha$ and $\hat\beta$ $N$-tuples for the accepted $[t_i, t_{i+1}]$ time intervals, and these were stored for further analyses. It is important to note that our reparameterization method did not change the durations of the original behavioral time intervals, i.e., $\tau_{i+1} - \tau_i = t_{i+1} - t_i$.

### **2.5. BAYESIAN BINNING FOR SIGN LANGUAGE SEGMENTATION**

In the following, we present an unsupervised segmentation algorithm that is based on BB. Briefly, BB is an approach to modeling data with a totally ordered structure, such as time series, by functions which are piecewise defined. The total order allows for an efficient iteration over all possible segment configurations in polynomial time.

BB was originally developed for modeling of (typically very noisy) neural spike train data (Endres et al., 2008; Endres and Oram, 2009) and their information-theoretic evaluation (Endres and Földiák, 2005). It was later generalized for regression of piecewise constant functions (Hutter, 2007). Concurrently, a closely related Bayesian formalism for dealing with multiple change point problems was introduced by Fearnhead (2006).

To apply BB to wrist trajectories we augment it by an observation model for the trajectory segments which is Gaussian with a full covariance matrix and a polynomial time dependence of the mean. This model was originally developed by two of the authors for segmenting joint angle trajectories of human actors in a "natural" fashion (i.e., in agreement with human intuition) (Endres et al., 2011a,b). To make this paper self-contained, the following sections describe the prior over bin boundaries (section 2.5.1) and the observation model (section 2.5.5). The algorithmic details of evaluating posterior expectations are only outlined schematically, they are elaborated in Endres and Földiák (2005). A full derivation of the polynomial observation model, including the exact posterior updates can be found in Endres et al. (2011a).

The results of this segmentation algorithm are validated using the ground-truth dataset from section 2.4, which consists of trajectories whose segments exactly comply with the previously described power law. We show (section 3) that BB results in good segmentation of data fulfilling this kinematic law. Furthermore, we argue that BB generalizes the segmentation approaches presented in Barbič et al. (2004) and Polyakov et al. (2009b) (see section 4).

# *2.5.1. The bin boundary prior*

Our objective is to model a time series $D$ in the time interval $[t_{\min} = t_1, t_{\max} = t_{N+1}]$. We discretize $[t_{\min}, t_{\max}]$ into $T$ contiguous intervals of duration $\Delta t = (t_{\max} - t_{\min})/T$, such that interval $j$ is $[j \cdot \Delta t, (j+1) \cdot \Delta t]$ (see **Figure 2**). $\Delta t$ is chosen small enough to capture all relevant features of the data<sup>1</sup>. We model the generative process of $D$ by $M + 1$ contiguous, non-overlapping segments, indexed by $m$ and having *inclusive* upper boundaries $q_m \in \{q_m\}$. Segment $m$ therefore covers the time interval $T_m = (\Delta t \, q_{m-1}, \Delta t \, q_m]$. Let $D_m$ be the part of the data which falls into segment $m$. We assume that the probability of $D$ given $\{q_m\}$ factorizes as

$$P(D|\{q\_m\}, M) = \prod\_{m=0}^{M} P(D\_m|q\_{m-1}, q\_m, M) \tag{6}$$

with $q_{-1} = -1$, $q_M = T - 1$.

# *2.5.2. Prior on* **{***qm***}**

Since we have no preferences for any segment boundary configuration other than they be totally ordered, the segment configuration prior becomes

$$P(\{q\_m\}|M) = \binom{T-1}{M}^{-1} \tag{7}$$

<sup>1</sup>E.g., choose $\Delta t$ on the order of 1/(sampling rate).

where $\binom{T-1}{M}$ is the number of configurations in which $M$ ordered segment boundaries can be distributed across $T - 1$ places (segment boundary $M$ always occupies position $T - 1$, hence there are only $T - 1$ positions left). While this prior expresses no preference for boundary positions, it is important for complexity control: as long as $T \gg M$ (which is typically the case), this prior will decrease exponentially in $M$, thereby penalizing models with larger numbers of segments.
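A quick numerical illustration of this complexity control (a sketch, not the authors' code; the function name is illustrative):

```python
from math import comb, log

def log_boundary_prior(T, M):
    """Log of the segment-configuration prior, Equation (7):
    log P({q_m} | M) = -log C(T-1, M)."""
    return -log(comb(T - 1, M))
```

For $T = 100$, the log prior drops steadily as $M$ grows (while $M \ll T$), so each additional segment boundary must buy a corresponding improvement in the data likelihood.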

# *2.5.3. Prior on M*

We have no preference for any model complexity (i.e., number of segment boundaries), so we let

$$P(M) = \frac{1}{T} \tag{8}$$

since the number of segment boundaries *M* must satisfy 0 ≤ *M* ≤ *T* − 1.

# *2.5.4. Posterior of* **{***qm***}**

For temporal segmentation, the most relevant posterior is that of the {*qm*} for a given *M*:

$$P(\{q\_m\}|D,M) = \frac{P(D|\{q\_m\})P(\{q\_m\}|M)}{P(D|M)}\tag{9}$$

For the denominator, we need to compute *P*(*D*|*M*):

$$P(D|M) = \sum\_{q\_0=0}^{q\_1-1} \sum\_{q\_1=1}^{q\_2-1} \dots \sum\_{q\_{M-1}=M-1}^{T-1} P(D|\{q\_m\}, M) \tag{10}$$

which appears to be $O(T^M)$ since it involves $M$ sums of length $O(T)$. However, using the factorized form of $P(D|\{q_m\}, M)$ (Equation 6) and the distributivity of multiplication over addition allows us to "push sums" past all factors which do not depend on the summation variable:

$$P(D|M) = \sum_{q_0=0}^{q_1-1} \sum_{q_1=1}^{q_2-1} \dots \sum_{q_{M-1}=M-1}^{T-1} \prod_{m=0}^{M} P(D_m|q_{m-1}, q_m)$$

$$= \sum_{q_0=0}^{q_1-1} P(D_0|q_{-1}, q_0) \sum_{q_1=1}^{q_2-1} P(D_1|q_0, q_1) \dots \sum_{q_{M-1}=M-1}^{T-1} P(D_M|q_{M-1}, q_M) \tag{11}$$

Each sum of length $O(T)$ needs to be evaluated $O(T)$ times for the possible values of the upper summation boundary. As there are $M$ sums, this calculation has complexity $O(MT^2)$, rather than the naïve $O(T^M)$. This is an instance of the **sum-product** algorithm (Kschischang et al., 2001). As explained in Endres and Földiák (2005), the expectations of functions of the model parameters (e.g., segment boundary position, segment width, or probability of a segment boundary at a given point in time) can be evaluated similarly, provided the function depends only on the parameters of one segment.
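The "pushed-sums" computation can be sketched as a dynamic program. Here `seg_lik(i, j)` is a hypothetical callback returning the marginal likelihood $P(D_m|q_{m-1} = i, q_m = j)$ of the data in intervals $i+1, \ldots, j$; the recursion below is a sketch of the idea, not the authors' implementation:

```python
import numpy as np

def evidence(seg_lik, T, M):
    """Compute P(D|M), Equation (11), by the sum-product recursion.
    Complexity O(M T^2) instead of the naive O(T^M)."""
    # A[q] = sum over all configurations of segments 0..m with q_m = q
    A = np.array([seg_lik(-1, q) for q in range(T)])
    for m in range(1, M + 1):
        A_next = np.zeros(T)
        for q in range(m, T):  # q_m must leave room for m earlier boundaries
            A_next[q] = sum(A[qp] * seg_lik(qp, q) for qp in range(m - 1, q))
        A = A_next
    return A[T - 1]  # last boundary is fixed at q_M = T - 1
```

With the trivial segment likelihood `seg_lik(i, j) = 1`, the result simply counts boundary configurations, reproducing $\binom{T-1}{M}$ from Equation (7).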

# *2.5.5. Observation models P(D***|{***qm***}***) for wrist trajectories*

For the wrist trajectories, we employed a multivariate Gaussian observation model with polynomial time-dependence of the mean, because we would like to explore the relationship between power-laws and polynomials. With this choice, we can specify a conjugate prior on the parameters, which allows for an evaluation of expectations and marginal probabilities within each segment in closed form. A prior is conjugate to an observation model, if the resulting posterior has the same functional form as the prior (Bishop, 2007). In that case, posterior updates reduce to parameter updates of the prior, instead of having to compute a (often intractable) multi-dimensional integral. Thus, we can efficiently compute the marginal probability of the data given the number of bin boundaries (Equation 11), as explained above.

The exponential family conjugate prior on the mean $\vec\mu$ and the precision matrix **P** (inverse covariance) is given by an extended Gauss–Wishart density (see e.g., Bishop, 2007). Let $\vec{X}_t \in D$ be an $L = 3$-dimensional vector of wrist positions at time $t \in T_m$, and let $S$ be the chosen polynomial order. Let $t_m = \Delta t \, q_{m-1}$ be the start time of segment $m$. Then

$$p(\vec{X}\_t | t \in T\_m) = \mathcal{N}(\vec{X}(t); \vec{\mu}\_m, \mathbf{P}\_m^{-1}) \tag{12}$$

$$p(\mathbf{P}\_m|\upsilon\_m, \mathbf{V}\_m) = \mathcal{W}(\mathbf{P}\_m; \upsilon\_m, \mathbf{V}\_m) \tag{13}$$

$$\vec{\mu}\_m = \sum\_{i=0}^{S} \vec{a}\_{i,m} (t - t\_m)^i \tag{14}$$

The $\vec{a}_m = (\vec{a}_{i,m})$ are the polynomial coefficients in segment $m$. Note that this vector has $(S+1) \cdot L$ components. $\mathcal{N}(\vec{X}; \vec\mu, \mathbf\Sigma = \mathbf{P}^{-1})$ is a multivariate Gaussian density in $\vec{X}$ with mean $\vec\mu$ and covariance matrix $\mathbf\Sigma$. $\mathcal{W}(\mathbf{P}; \nu, \mathbf{V})$ is a Wishart density in $\mathbf{P}$ with $\nu$ degrees of freedom and scale matrix $\mathbf{V}$. To construct a prior which is conjugate to the likelihood (Equation 12), we choose a vector $\vec\alpha_m = (\vec\alpha_{i,m})$ with $(S+1) \cdot L$ components, which are the biases on $\vec{a}_m$. Furthermore, we introduce a symmetric, positive (semi-)definite $(S+1) \times (S+1)$ matrix $\mathbf{B}_m$, which contains the concentration parameters on $\vec{a}_m$. The prior on $\vec{a}_m$ given $\mathbf{P}_m$ is then a multivariate Gaussian density

$$\mathcal{P}(\vec{a}\_m|\vec{\alpha}\_m, \mathbf{B}\_m, \mathbf{P}\_m) = \mathcal{N}(\vec{a}\_m; \vec{\alpha}\_m, \mathbf{Q}\_m^{-1}) \tag{15}$$

where the (*S* + 1)*L* × (*S* + 1)*L* matrix **Q***<sup>m</sup>* is given by the Kronecker-product of **B***<sup>m</sup>* and **P***<sup>m</sup>* (i.e., block-wise multiplication of the entries **B***m*,*i*,*<sup>j</sup>* of **B***<sup>m</sup>* with **P***m*):

$$\mathbf{Q}\_m = \mathbf{B}\_m \otimes \mathbf{P}\_m = \begin{pmatrix} \mathbf{B}\_{m,0,0} \mathbf{P}\_m & \cdots & \mathbf{B}\_{m,0,S} \mathbf{P}\_m \\ \vdots & \ddots & \vdots \\ \mathbf{B}\_{m,S,0} \mathbf{P}\_m & \cdots & \mathbf{B}\_{m,S,S} \mathbf{P}\_m \end{pmatrix} \tag{16}$$

It is shown in Endres et al. (2011a) that the product of the Gaussian (Equation 15) with the Wishart (Equation 13) does constitute a conjugate prior on the likelihood given by Equation (12). Since the prior is conjugate, we can evaluate the marginal likelihood of the data in each segment, and BB can be applied with this observation model.
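To illustrate why conjugacy matters here, the sketch below computes a segment's log marginal likelihood in closed form for a simplified 1-D, 0th-order analogue of the model: a Normal-Inverse-Gamma prior on mean and variance instead of the full polynomial Gauss–Wishart prior of Endres et al. (2011a). The function name and hyperparameter defaults are illustrative, and the formula is the standard conjugate-Gaussian result:

```python
import numpy as np
from math import lgamma, log, pi

def log_marginal_likelihood(x, mu0=0.0, kappa0=1.0, a0=1.0, b0=1.0):
    """Closed-form log marginal likelihood of 1-D data x under a
    Normal-Inverse-Gamma conjugate prior: posterior updates reduce to
    parameter updates, so no numerical integration is needed."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xbar = x.mean()
    kappa_n = kappa0 + n
    a_n = a0 + n / 2.0
    b_n = (b0 + 0.5 * np.sum((x - xbar) ** 2)
           + kappa0 * n * (xbar - mu0) ** 2 / (2.0 * kappa_n))
    return (lgamma(a_n) - lgamma(a0) + a0 * log(b0) - a_n * log(b_n)
            + 0.5 * (log(kappa0) - log(kappa_n)) - (n / 2.0) * log(2.0 * pi))
```

Because the prior is conjugate, the chain rule holds exactly: the marginal likelihood of two points equals the marginal likelihood of the first times the predictive probability of the second under the updated (posterior) parameters. In the full model, this per-segment quantity is exactly what the sum-product recursion of Equation (11) consumes.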

# **3. RESULTS**

# **3.1. TRAJECTORY FITTING AND POLYNOMIAL ORDER DETERMINATION**

To illustrate that BB is a suitable tool for the computation of compact and accurate ISL trajectory representations, we generated ISL-like trajectories with a 3rd order polynomial segment structure and evaluated whether BB was able to recover this polynomial order and the segment boundaries. These trajectories were computed by fitting the original data (black lines in **Figure 3A**) with 3rd order polynomials using BB and evaluating the posterior expected trajectories (red lines in **Figure 3A**), which we will refer to as "fitted trajectories." The dotted vertical lines in this plot are the most probable segmentation points determined by BB, which suggests *M* = 7 boundary points with near certainty. We determined this number by finding the maximum of Equation (11) with respect to *M*.

We then tested whether BB would be able to recover the polynomial order of such fitted trajectories. To this end, we ran BB on the fitted trajectories and evaluated the posterior distribution of the segment order. The result, averaged across the whole dataset, is shown in **Figure 3B**. The correct polynomial order, here 3, is recovered with near certainty.

The fitted trajectories follow the original trajectories very accurately. **Figure 3D** shows the variance explained (EV) by polynomial orders between 0 and 7. Even for 0th order fits, EV > 0.95, obtained with on average *M* = 13 bin boundaries. EV > 0.99 for orders greater than 1, and it stays in that range for all tested orders up to 7, where on average *M* = 2 boundaries are needed to fit the data.

For a quantitative evaluation of the match between the segmentation points of the fitted trajectories and the BB segmentation points computed on these fitted trajectories, we conducted a hit rate analysis similar to Endres et al. (2011a). The results are plotted in **Figure 3C**. We obtained this plot in the following way: after computing the most probable number of segmentation points with Equation (11), say *M*<sub>opt</sub>, we found the *M*<sub>opt</sub> maxima of the posterior distribution of the segmentation point locations. This yielded the "predicted segmentation points" (PSP). Denote with PSP<sup>3</sup> the segmentation points of the fitted trajectory, and with PSP<sup>*k*</sup> the segmentation points of a *k*-th order BB model computed from the fitted trajectory. A PSP<sup>*k*</sup> counted as a hit if it was within an accuracy window of 90 ms of a PSP<sup>3</sup>, and if no other PSP<sup>*k*</sup> had been matched to that PSP<sup>3</sup> already. All remaining PSP<sup>*k*</sup> comprised the false positives. PSP<sup>3</sup>s without a matching PSP<sup>*k*</sup> were counted as misses. The hit rate is then computed in the usual way:

$$\text{hit rate} = \frac{\text{hits}}{\text{hits} + \text{misses}}$$

which implies that the hit rate ≤1. Moreover, the miss rate (or false negative rate) is just given by miss rate = 1 − hit rate. Computing a false positive rate for a standard ROC analysis

$$\text{false positive rate} = \frac{\text{false positives}}{\text{false positives} + \text{true negatives}}$$

is somewhat problematic, since it requires the evaluation of the "true negatives," i.e., the number of instances where neither

**FIGURE 3 | (A)** Fitting a sign language trajectory (black lines) with a 3rd order polynomial segment model (red lines). Dotted vertical lines: most probable segmentation points determined by Bayesian binning. The fit closely models the original trajectory. Explained variance averaged across the whole dataset is >99%, see **(D)**. **(B)** Posterior probability of segment order, computed by using the 3rd order fitted trajectories computed with BB [red lines in panel **(A)**] as data. The correct polynomial (3, indicated by red vertical line) order is recovered with near certainty. **(C)** Hit rate analysis for polynomial segment orders between 0 and 7, using the 3rd order fitted trajectories as data. Dashed line: line of no discrimination. At order 3, hit rate is maximal with no false positives. Error bars (standard errors of the means of hit rate and false positive rate) are smaller than the symbols. **(D)** Explained variance of the original trajectories as a function of the polynomial order of the BB fit. Error bars are ±1 standard deviation, computed across the whole dataset. All polynomial orders are able to fit the data well. For details, see text.

BB model predicts a segmentation event. This number depends on the chosen discretization: the false positive rate can be reduced almost arbitrarily by increasing the temporal resolution, since both PSP<sup>*k*</sup> and PSP<sup>3</sup> are (almost) point events. We therefore chose to evaluate the false positives per second, which is largely independent of the temporal resolution. As a reference, we computed a "line of no discrimination" (dashed line in **Figure 3C**) assuming a homogeneous Poisson process with rate parameter λ as a generator of uninformative segmentation events. Each setting of λ corresponds to one point on the line of no discrimination. If a given model's performance is above that line, then it can be said to provide an informative signal about the fitted trajectory segmentation points.
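The matching procedure behind the hit rate can be sketched as follows; this is an illustrative greedy implementation with the 90 ms accuracy window, and the function name is hypothetical:

```python
def hit_rate_analysis(predicted, reference, window=0.09):
    """Match predicted to reference segmentation points (times in seconds).
    Each reference point absorbs at most one prediction within the window;
    unmatched predictions are false positives, unmatched references misses."""
    matched = set()
    hits = 0
    for p in sorted(predicted):
        # nearest still-unmatched reference point within the accuracy window
        candidates = [(abs(p - r), k) for k, r in enumerate(reference)
                      if k not in matched and abs(p - r) <= window]
        if candidates:
            matched.add(min(candidates)[1])
            hits += 1
    misses = len(reference) - hits
    false_positives = len(predicted) - hits
    return hits / (hits + misses), false_positives
```

Dividing the returned false-positive count by the trajectory duration gives the "false positives per second" used in place of a conventional false positive rate.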

All polynomial orders provide an informative signal, with the 3rd order model performing optimally: it combines a very high hit rate with almost no false positives.

We performed the above analyses with fitted trajectories of orders between 1 and 7, see Appendix A1. The results are very similar to the 3rd order results presented here: the posterior distribution of the segment order peaks strongly at the order of the fitted trajectories. Moreover, the hit rates of the fitted order are near one, with almost no false positives.

# **3.2. POWER LAW GROUND TRUTH EVALUATION**

We applied BB to the generated power law ground truth trajectories to determine whether the segments predicted by BB would match the imposed power law segments (see section 2.4). Prior hyperparameters were $\vec\alpha_m = \vec{0}$ and $(\mathbf{B}_m)_{i,j} = 0.1 \, \delta_{i,j}$; the prior covariance was diagonal, with the data variances as diagonal entries. We chose $T = 100$ time discretization points, and experimented with polynomial orders between 0 (constant trajectories per segment) and 7.

As shown in **Figure 4**, top panel, the posterior expectation of the trajectories follows the actual trajectory closely for all orders. However, the 0th order observation model requires a large number of segments to do so. With increasing order, the number of necessary segments decreases. Shown in **Figure 4**, lower panels, are the segmentation densities, i.e., the posterior probability

**FIGURE 4 | Top panel:** Trajectory of X coordinate (black line) and posterior expected trajectories for observation models of 0th (red), 3rd (blue), and 6th (cyan) order. All observation models provide a good fit. Vertical dotted black lines indicate ground truth segmentation points in all panels. **Lower panels**: Segmentation densities (i.e., probability density of segmentation boundaries) for these observation models. The 3rd order model puts boundaries close to the ground truth, with no false positives in this example.

densities of finding a segment boundary at a given point in time. Black dotted vertical lines indicate ground truth segmentation boundaries in all panels. The 0th order model generates many, uncertain segmentation boundaries, resulting in a large number of false positives with respect to the ground truth. The 6th order model generates too few segments, but its segment boundaries coincide with the ground truth. The 3rd order model puts boundaries close to the ground truth, without false positives in this example.

For a more quantitative performance evaluation, we conducted a hit rate analysis as described above. The results are plotted in **Figure 5**. Here, power law ground truth segmentation points (vertical dotted lines in **Figure 4**) are compared against segmentation points predicted by BB models of polynomial orders between 0 and 7. The BB segmentation points were obtained as described in section 3.1. As can be seen in **Figure 5**, most polynomial orders provide an informative signal about the ground truth. However, the lower orders generate significantly more false positives per second than the higher ones. For orders >3, the hit rate decreases without a matching decrease in the false positives. This can be seen more clearly in the hit rate per false positives per second (HPFPPS) plot in **Figure 6**, bottom panel. Let

$$\text{HPFPPS} = \frac{\text{hit rate}}{\text{false positives per second}} \tag{17}$$

The larger HPFPPS, the fewer false positives are incurred per hit, hence a large HPFPPS is desirable. In the ground truth data, it peaks at polynomial segment order 3. This peaking is

**FIGURE 5 | Hit rate analyses for each sign ("cake" and "chandelier") and subject ("S00" and "S01"), for all polynomial orders between 0 and 7.** A predicted segmentation point counted as a "hit" if it occurred within an accuracy window of 90 ms around a ground truth segmentation point. The lines of no discrimination (black dashed) were computed assuming a homogeneous Poisson process as a generator of uninformative segmentation events. Error bars are ±1 standard errors of the means. For details, see text.

significant (Kruskal–Wallis, *p* < 10<sup>−7</sup> both for testing order 3 vs. the rest and for testing all orders against each other). A polynomial order of ≈3 therefore seems a reasonable choice for these data. This observation is confirmed by the posterior distribution of the polynomial orders (**Figure 6**, top panel), which peaks at order 3 for the ground truth data.

# **3.3. POLYNOMIAL ORDER OF ISL TRAJECTORIES**

Interestingly, the best polynomial order for the real ISL data peaks at 4, with *P*(3 ≤ order ≤ 5) > 0.95 (see **Figure 7**). The order of 5 corresponds to trajectories that comply with the minimum jerk principle (Flash and Hogan, 1985), which has been largely established as describing the structure of many types of

**FIGURE 6 | Top panel:** Posterior probability of polynomial segment orders for the power law ground truth data. **Bottom panel:** Hit rate per false positives per second (HPFPPS); for the ground truth data of **Figure 5**, this quantity is maximized for order 3. Error bars are ±1 standard errors of the means.

natural movements (Todorov, 2004). This shows that the power law temporal structure in the real data requires higher order polynomials, suggesting that the co-articulation between consecutive power law segments is better represented by polynomial orders > 3. In other words, BB combined with a segment-wise polynomial trajectory model results in biologically reasonable segments that could be indicative of individual optimally controlled submovements. We elaborate this point further in the discussion (section 4).

# **3.4. INTERPRETATION OF SEGMENTS**

We worked on single signs, so the segments discovered by BB are units on a sub-semantic level. Even within single movements, like the drawing of a letter or an ellipse, there are often multiple segments that are described by different mixtures of several non-Euclidean geometries (Bennequin et al., 2009; Polyakov et al., 2009a; Pham and Bennequin, 2012). Our approach aimed at estimating such invariants. Consequently, a hit rate analysis for the real data cannot be done meaningfully, both because we segmented single signs and because there is no accepted method for the sequential decomposition of trajectories based on power laws. Finding the relevant segments is exactly the scientific problem in motor control research which is addressed by our Bayesian approach. We therefore created ground truth data with known segments against which the Bayesian decomposition was successfully compared. Whether these segments are related to the temporal aspects of "phonemes" of sign language (Sandler and Lillo-Martin, 2006) (phonemes are defined as the smallest contrastive units in a spoken language) remains to be investigated.

# **4. DISCUSSION**

We presented two novel contributions in this paper: firstly, we demonstrated the applicability of BB with piecewise polynomial observation models to motion capture data with a segment-wise

**FIGURE 7 | Posterior probability of segment orders, with polynomial orders between 0 and 7, computed on the real ISL trajectories.** Posteriors were averaged across signs and subjects. 4th order is preferred.


power law structure. Secondly, we found that ISL wrist trajectories are best described by observation models with polynomial orders between 3 and 5.

This is compatible with established principles in motor control, like the minimum jerk and minimum acceleration principles. The study in Richardson and Flash (2002) suggested three main insights. First, among all optimization criteria whose mean squared derivative (MSD) cost functions are

$$C\_n = \int\_0^T \left| \left| \frac{d^n r}{dt^n} \right| \right|^2 dt,$$

the optimal trajectories that correspond to *n* = 3 (minimum jerk) provide the best kinematic fit to point-to-point reaching movements. Second, for periodic movements, the cost functions corresponding to *n* = 3 (minimum jerk; fifth-order polynomials) and *n* = 4 (minimum snap; seventh-order polynomials) provide reasonable predictions, while the optimal trajectories corresponding to the limit case *n* → ∞ converge to the 2/3 power law. Third, earlier studies (e.g., Viviani and Cenzato, 1985) suggested, based on the two-thirds power law, that complex movements should be segmented at inflection points; however, this segmentation criterion is also predicted by a path-constrained minimum jerk criterion and thus need not be a result of segmented control by the brain. It should be noted that inflection points are special cases in equi-affine geometry, since at these points the equi-affine arc-length vanishes faster than the Euclidean arc-length [$d\sigma/ds \to 0$, see Flash and Handzel (2007); Bennequin et al. (2009)], from which one deduces that the 2/3 power law breaks down at inflection points. Hence any kinematic model that is compatible with the 2/3 power law will give segments similar to those hypothesized according to the law, which explains the observations of Richardson and Flash (2002) and of Todorov and Jordan (1998), both of which used a constrained minimum jerk model (as it was named by Todorov and Jordan). However, it was not *a priori* clear whether this agreement would hold for different complex geometries and for different optimization principles. Our results indicate that the two approaches lead to compatible segmentations in a general sense. The unsupervised BB approach shows that for highly complex motor tasks, optimal MSD segments are temporally aligned with the generalized power law segments.
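For point-to-point movements, the minimum-jerk (*n* = 3) optimum has a well-known closed form: a fifth-order polynomial with vanishing velocity and acceleration at both endpoints (Flash and Hogan, 1985). A minimal sketch:

```python
import numpy as np

def minimum_jerk(x0, x1, T, t):
    """Minimum-jerk point-to-point trajectory from x0 to x1 over duration T:
    x(t) = x0 + (x1 - x0) * (10 s^3 - 15 s^4 + 6 s^5), with s = t / T.
    Velocity and acceleration vanish at both endpoints."""
    s = t / T
    return x0 + (x1 - x0) * (10 * s**3 - 15 * s**4 + 6 * s**5)
```

The fifth-order form of this optimum is what makes segment-wise polynomial observation models of order ≈3–5, as found here, consistent with MSD optimality.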
It should be noted that MSD criteria are derived from first principles and thus provide a predictive model, while the two-thirds power law was mainly studied as a descriptive model (Lacquaniti et al., 1983; Viviani and Cenzato, 1985). Nevertheless, it was found that the 2/3 power law is theoretically grounded in equi-affine geometry (Pollick and Sapiro, 1996; Flash and Handzel, 2007; Bennequin et al., 2009). From this we hypothesize that the power law modulation is a possible outcome of an optimization procedure that takes into account different MSD criteria, such as the minimum acceleration, minimum jerk, and minimum snap models.

Another implication concerns the distribution of polynomial orders found in the power law ground truth kinematics vs. that of the original kinematics. The kinematics in the ground truth dataset were implemented by introducing a perfect power law segmentation that respects the timing of the original data in a segment-wise manner and maintains continuity at the segment boundaries. The original kinematics may differ from the ground truth in the parameters of the natural power law regularities and in the transitions between segments, which may comprise both co-articulatory movement and movement kinematics not adhering to the power law. The latter more probably captures the differences between the two datasets. We therefore hypothesize that the differences in the distributions of the polynomial orders found by BB for these two datasets are related to transitional movements that are less compatible with the generalized power law and require a description involving higher order polynomials. Interpreted in terms of the MSD approach, this suggests that the minimum acceleration model (*C*<sub>2</sub>) does not provide an equally good explanation for the complex co-articulatory movements in between and at the boundaries of power law segments, for which jerk and snap minimization are required.

Three methods for motion capture data segmentation are compared in Barbič et al. (2004): segment-wise PCA, probabilistic PCA (pPCA), and finite Gaussian mixture models. The pPCA method is found to deliver the best performance relative to manual segmentation. Our 0th order polynomial segment model, due to its full covariance matrix, essentially describes each segment by a different (p)PCA decomposition. This decomposition could be extracted from the posterior covariance matrices. Hence, the 0th order model is approximately equivalent to the best method of Barbič et al. (2004). Segment positions are decided via a Mahalanobis distance criterion in Barbič et al. (2004), which is related, but not equivalent, to the marginal log-likelihood of our Gaussian observation model used by BB<sup>2</sup>. As illustrated in **Figure 5**, our higher-order models offer a significant performance advantage over a pPCA model with constant means on the ISL data.

The authors of Polyakov et al. (2009a) found that monkey scribbling trajectories could be fitted well with parabolic pieces. Such pieces can be generated by our 2nd order segment model. We showed that higher polynomial orders are favored on both an ISL-inspired ground truth and real (human) ISL data. However, the segmentation criterion in that paper is rather different from ours: while we use a marginal likelihood criterion that follows from the polynomial observation model, the authors of Polyakov et al. (2009b) first extracted from the recorded data the portions corresponding to active movement and those of rest. The extracted movement portions were then segmented into strokes at curvature extrema. This is just one example of a wide range of segmentation approaches based on kinematic descriptors; another example is the work of Fod (2002), which uses speed features.

Hidden Markov Models (HMMs) have been used extensively for both action segmentation and recognition; see, e.g., Kulic et al. (2009) for a template-based approach, or the switching HMM approach of Green (2003), where actions are segmented into "dynemes," a kind of dynamical primitive. While dynamical primitives are in principle more invariant, and hence more tolerant of variation (e.g., against time-warping) than polynomial segments, they are also much harder to learn: in Green (2003), the dynemes had to be defined manually. For American Sign Language recognition, Vogler and Metaxas (1998) used a semi-supervised training scheme, which was extended to deal with two-hand signing using parallel HMMs in Vogler and Metaxas (1999, 2001). In that work, labeled and pre-segmented data were used to bootstrap the training process. In contrast, we segmented sign language based on kinematic regularities, which, in order to be independent of representation or a linguistic formalism (Sandler and Lillo-Martin, 2006), must be done unsupervised. Furthermore, unsupervised segmentation facilitates working with large datasets.

<sup>2</sup>Due to the conjugate prior on the polynomial observation model, we can integrate over the likelihood before taking the logarithm; see Endres et al. (2011a) for details. The resulting marginal log-likelihood is monotonically related to a Mahalanobis distance.

We conclude that BB combined with polynomial observation models represents a biologically well-motivated approach to the unsupervised extraction of movement primitives from natural action streams. It remains to be investigated whether our approach is applicable to data obtained with other recording modalities, e.g., EMG, and whether it yields interpretable results on forces/torques instead of positions. Instead of a polynomial observation model for (wrist) positions, one could also construct an observation model for velocities and curvatures. This would lead to a more direct power law segmentation than the approach presented here, and will be of interest for future work.

# **ACKNOWLEDGMENTS**

This work was supported by the EU Commission, 7th Framework Programme: EC FP7-ICT-249858 TANGO, EC FP7-ICT-248311 AMARSi, the Deutsche Forschungsgemeinschaft: DFG GI 305/4-1, DFG GZ: KA 1258/15-1, the German Federal Ministry of Education and Research: BMBF FKZ: 01GQ1002A, and the European Commission, FP7-PEOPLE-2011-ITN (Marie Curie): ABC PITN-GA-011-290011. We also acknowledge support by the Deutsche Forschungsgemeinschaft and the Open Access Publishing Fund of Tuebingen University. We thank Prof. Wendy Sandler and Mr. Meir Etedgy for very helpful discussions and for their critical contributions to the acquisition of the sign language data. We also thank the reviewers for their constructive comments.

# **REFERENCES**

Vogler, C., and Metaxas, D. (1999). "Parallel hidden Markov models for American Sign Language recognition," in *Proceedings of the International Conference on Computer Vision, 1999*, Vol. 1 (Los Alamitos: IEEE Computer Society), 116–122.

Vogler, C., and Metaxas, D. (2001). A framework for recognizing the simultaneous aspects of American Sign Language. *Comput. Vis. Image Understand.* 81, 358–384.

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 20 February 2013; accepted: 08 May 2013; published online: 27 May 2013.*

*Citation: Endres D, Meirovitch Y, Flash T and Giese MA (2013) Segmenting sign language into motor primitives with Bayesian binning. Front. Comput. Neurosci. 7:68. doi: 10.3389/fncom.2013.00068*

*Copyright © 2013 Endres, Meirovitch, Flash and Giese. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

# **A. APPENDIX**

# **A.1. Polynomial Order Determination for Ground Truth Orders ≠ 3**

In response to a reviewer request, we repeated the analysis in section 3.1 for polynomial ground truth orders other than 3. The resulting order posteriors and hit rate analyses are depicted in **Figure A1**. For all tested ground truth orders, the BB order posterior peaks at the ground truth order, i.e., this order can be recovered with high probability. Furthermore, hit rate is near one with almost no false positives for the ground truth order only, i.e., correct segment boundaries can also be determined by BB.

**FIGURE A1 | BB order posteriors and hit rate analyses for segment orders between 0 and 7, using fitted trajectories of several orders as ground truth.** Vertical red lines in the order posteriors indicate the ground truth order. Dashed lines in the hit rate analyses mark the line of no discrimination. Results for ground truth order 3 were depicted above. The order posteriors peak at the respective ground truth orders, i.e., the correct order is recovered by BB with high probability. Hit rates are near one, with almost no false positives, if and only if the ground truth order and BB order match. Error bars (standard errors of the means of hit rate and false positive rate) are mostly smaller than the symbols.

# Transitions between discrete and rhythmic primitives in a unimanual task

### *Dagmar Sternad1\*, Hamal Marino2, Steven K. Charles3, Marcos Duarte4, Laura Dipietro5 and Neville Hogan5,6*


### *Edited by:*

*Tamar Flash, Weizmann Institute, Israel*

### *Reviewed by:*

*Goren Gordon, Weizmann Institute of Science, Israel Omri Barak, Technion, Israel*

### *\*Correspondence:*

*Dagmar Sternad, Departments of Biology, Electrical and Computer Engineering, and Physics, Northeastern University, 134 Mugar Life Science Building, 360 Huntington Avenue, Boston, MA 02115, USA e-mail: dagmar@neu.edu*

Given the vast complexity of human actions and interactions with objects, we proposed that control of sensorimotor behavior may utilize dynamic primitives. However, greater computational simplicity may come at the cost of reduced versatility. Evidence for primitives may be garnered by revealing such limitations. This study tested subjects performing a sequence of progressively faster discrete movements in order to "stress" the system. We hypothesized that the increasing pace would elicit a transition to rhythmic movements, assumed to be computationally and neurally more efficient. Abrupt transitions between the two types of movements would support the hypothesis that rhythmic and discrete movements are distinct primitives. Ten subjects performed planar point-to-point arm movements paced by a metronome: starting at 2 s, the metronome intervals decreased by 36 ms per cycle to 200 ms, stayed at 200 ms for several cycles, then increased by similar increments. Instructions emphasized inserting explicit stops between movements, with a duration that equaled the movement time. The experiment was performed with eyes open and closed, and with short and long metronome sounds, the latter explicitly specifying the dwell duration. Results showed that subjects matched the instructed movement times but did not preserve the dwell times. Rather, they progressively reduced dwell time to zero, transitioning to continuous rhythmic movements before movement times reached their minimum. The acceleration profiles showed an abrupt change between discrete and rhythmic profiles. The loss of dwell time occurred earlier with the long auditory specification, when subjects also showed evidence of predictive control. While evidence for hysteresis was weak, taken together the results clearly indicated a transition between discrete and rhythmic movements, supporting the proposal that movement representation is based on primitives rather than on veridical internal models.

### **Keywords: discrete, rhythmic, internal models, primitives, arm movements**

# **INTRODUCTION**

### E pur si muove (and yet it moves). Galileo Galilei, June 22, 1633

One core question in motor neuroscience is how neural control interacts with the peripheral mechanics of the body: What aspects of skilled movements are controlled by the nervous system and what aspects are attributable to intrinsic limb mechanics? What does the central nervous system "know" about the dynamics of its body? Neural control of movements necessitates some internal knowledge of the limb dynamics to predict and integrate the consequences of its commands to the mechanical periphery. However, it appears unlikely that the brain can make real-time predictions based on a veridical model of detailed body dynamics, especially if that model must include the dynamics of objects to be manipulated. Neural information transmission is extremely slow, which leads to substantial communication delays. This is particularly important for fast movements, where there is no time to correct for errors, yet complicated non-linear mechanical effects such as Coriolis accelerations become prominent. Consider wielding a whip: using a veridical model to predict the mechanics of this flexible body interacting with a compressible gas (air) would tax even modern supercomputers. Using this model to find an "optimal" action to place the end of the whip at a desired location in space is vastly more challenging. Yet, humans solve this apparently intractable problem and can manipulate a whip with astonishing skill. But how? Observing the solution that humans adopt is revealing: it appears to consist of essentially two relatively simple movements, a large sweeping arm motion combined with an extremely fast (and precisely-timed) wrist "flick."

We propose that the nervous system generates complex actions by combining elements from a limited set of modules or primitives. In recent work, Hogan and Sternad (2012, 2013) outlined a theoretical framework proposing dynamic primitives for the control of sensorimotor behavior. Specifically, we argued that actions are composed of submovements, oscillations, and mechanical impedances, the latter to account for interaction with external objects. Note that these primitives are dynamic attractors giving rise to the observable movements. Submovements and oscillations in particular generate the observable discrete and rhythmic movements, respectively. This work extends previous experimental and modeling work by Sternad and colleagues that examined complex movements as a combination of discrete and rhythmic elements (Sternad et al., 2000; Sternad, 2008). Generating actions on the basis of primitives may afford more computational efficiency but, in turn, may also imply less versatility. The present experimental study probes into such limitations to test whether rhythmic and discrete movements indeed reflect distinct classes of behavior, i.e., primitives.

A few studies have provided support that discrete and rhythmic movements are mediated by different neural circuits. An fMRI study revealed significantly different cerebral activation for the two types of movements (Schaal et al., 2004). In continuous rhythmic wrist movements cerebral activation was largely confined to unilateral primary motor areas, whereas a sequence of discrete movements elicited strong additional activity in the bilateral parietal cortex and cerebellum. Subsequent behavioral results reinforced this difference. For example, Ikegami et al. (2010) showed that adaptation to altered visuomotor conditions was almost fully transferred from discrete to rhythmic performance, while there was minimal transfer in the reverse direction. Howard et al. (2011) reported that when learning reaching movements in force fields with different directions, interference between reaching in different force fields was reduced when each field was performed in either a rhythmic or discrete manner.

While these studies present intriguing differences between rhythmic and discrete performance, a stronger test for the existence of primitives would be to reveal limitations that arise when the two movements coexist. Previous experimental work by Sternad and colleagues examined movements that combined rhythmic and discrete elements in uni- and bimanual, single- and multi-joint tasks. Probing superposition of rhythmic and discrete movements at random phasing revealed that discrete displacements preferentially occurred in limited phase windows of the ongoing rhythmic movements (Sternad et al., 2000, 2002; Sternad and Dean, 2003; Wei et al., 2003). If control were based on a veridical internal model, it should be possible to superimpose or merge discrete and rhythmic movements in any task-specified way, subject only to the limitations of the musculo-skeletal system. Other evidence comes from slow movements, where the slow response of the musculo-skeletal system becomes less important. If control were based on a veridical internal model, it should be possible to perform rhythmic movements arbitrarily slowly. However, several studies indicated that repetitive movements, if performed sufficiently slowly, transition to a sequence of discrete movements (Doeringer and Hogan, 1995; Adam and Paas, 1996; Hogan et al., 1999; van der Wel et al., 2010). Further, evidence from stroke patients showed that their earliest recovered movements are "quantized," but become smoother and more continuous with recovery (Krebs et al., 1999; Rohrer et al., 2002, 2004).

The present study adopted another approach to expose limitations arising from a modular representation. Based on the reasonable assumption that fast movements have high computational and mechanical demands (more prominent velocityand acceleration-dependent forces and time-stressed planning to compensate for them), we stressed the neuromechanical system by driving it fast. Performing a sequence of discrete point-topoint movements at an increasing frequency should challenge the central nervous system, leading to an eventual "break-down" of movement performance, either due to mechanical or computational limitations. We hypothesized that if the CNS operates on the basis of primitives, discrete movements with precisely timed starts and stops would become computationally more challenging with speed and give way to rhythmic movements as an easier way to satisfy the timing demands. Importantly, this transition should occur after the neuromuscular system had reached its mechanical limits and, ideally, within the course of a single cycle (*Hypothesis 1*).

In systems that have multiple stable states, transitions between them typically depend not only on the present state, but also on the history of states. Therefore, transitions in opposite directions may exhibit an asymmetry, usually termed hysteresis. This is particularly the case in systems that have a lag between input and output, as has been demonstrated in numerous physical systems. Importantly, this type of hysteresis has also been identified in biological systems and, specifically, in perceptual-motor systems. For example, the perception of motion direction shows hysteresis (Williams et al., 1986; Hock et al., 2005). In human locomotion the transition from walking to running typically happens at a speed greater than in the reverse direction (Thorstensson and Robertson, 1987; Hreljac, 1995; Li, 2000). We therefore hypothesized that if discrete and rhythmic movements are generated by dynamic primitives with attractor stability, then transitions between these two movements should exhibit hysteresis (*Hypothesis 2*).

To test these two hypotheses we required subjects to perform a sequence of precisely timed discrete movements in synchrony with a decreasing and, subsequently, increasing metronome interval. We used a chirp signal where, after initially constant intervals, each successive interval first decreased by a fixed amount per cycle, then was constant for several cycles, then increased again by a fixed amount per cycle, and finally remained constant. Discrete movements require explicit demarcation by a non-zero interval of dwell time (Hogan and Sternad, 2007). To emphasize starting and stopping as a task criterion, the instruction explicitly specified this dwell time to be half of the movement time. To make this instruction even more explicit, we presented not only a sequence of short auditory beeps, but also prolonged auditory stimuli that exactly specified the required dwell duration. In this way, any deviation from the instructed dwell time could not be attributed to ambiguous instruction. If movement control is based on primitives, we hypothesized that transitions from discrete to rhythmic movements would occur, even if explicit temporal information about dwell time was available (*Hypothesis 3*).

A large body of research has examined control and adaptation of discrete reaching movements. The majority of these studies included visual control or examined visually evoked adaptations explicitly (Krakauer et al., 2000; Shabbott and Sainburg, 2009, 2010). As visual information typically predominates over proprioceptive and auditory information, visual information and visually-based error-correction processes may mask aspects of representation and may potentially bias neural or computational constraints due to primitive-based representation. To avoid such masking or biasing effects, the experiment tested performance with and without visual information. We hypothesized that the transition from discrete to rhythmic movements would occur for both visual conditions, but may be facilitated under no-vision conditions (*Hypothesis 4*).

# **METHODS**

# **PARTICIPANTS**

Ten volunteers participated in this experiment (31 ± 12 years old, 6 male and 4 female). Nine subjects were right-handed according to the Edinburgh handedness test, one subject was left-handed. Prior to data collection, the participants were informed about the experimental procedure and signed an informed consent form approved by MIT's Institutional Review Board.

# **EXPERIMENTAL APPARATUS AND DATA COLLECTION**

The participant was seated in front of a table, with the sternum close to the table edge (**Figure 1**). To fix the shoulder position, two belts tied the upper body to the back of the chair. A mark on the table was used to place each subject in a comparable position. The marked spot was in the subject's mid-sagittal plane, 23 cm away from the edge of the table (in the anterior-posterior direction) and ∼26 cm distant from the subject's sternum. The height of the chair was adjusted to position the subject's upper arm at ∼45° to horizontal; the forearm rested on the table. From this neutral position, the subject could perform a reaching movement forward and backward in the sagittal direction, involving both shoulder and elbow joints, without reaching the limits of the workspace. The forearm was mounted on a low-friction skid that reduced the static and kinetic friction on the surface during the movements. A brace stabilized the wrist to discourage wrist joint rotation.

**FIGURE 1 | Experimental set-up.** The subject holds a handle with a magnetic sensor attached that measures its position and shows it online on the monitor in front. The subject is instructed to perform discrete movements forward (away from the body) and backward (toward the body), sliding on the horizontal table surface. Movement amplitude is indicated by two large circles on the monitor.

Two circular targets were shown on a screen in a vertical arrangement to signal the amplitude of the movements (**Figure 1**). The two targets were at a distance of 14 cm from the neutral position, specifying a movement amplitude of 28 cm. The targets had a radius of 5 cm, which was relatively large so that accuracy requirements were minimal. The monitor was placed ∼65 cm in front of the subjects to display the targets and a cursor showing their movements. The display gain was 0.5, showing the targets and movement amplitude at half of their real size. The subjects were asked to move from target to target by moving the handle back and forth on the table in the sagittal direction. A computer-generated metronome signal prescribed the timing for the movements.

The subject grasped a handle onto which a magnetic Flock of Birds sensor was attached (Ascension Technologies, Burlington, VT). The static accuracy and resolution were 0.25 and 0.08 cm, respectively. The combined weight of the sensor and handle was ∼70 g, about 1/8 of the mass of the hand. The sampling frequency was 100 Hz; this was sufficient because the frequency content of the motion was well below 50 Hz, so anti-aliasing was not required. The position was zeroed with the handle in a neutral position indicated by a mark on the tabletop. Data collection was controlled by a custom-made software routine written in Tcl on a computer running the Ubuntu Linux operating system.

# **EXPERIMENTAL CONDITIONS AND PROCEDURE**

At the beginning of each trial participants placed their hand in the neutral position. All participants used their dominant hand. They were then instructed to move between targets in synchrony with the metronome sounds. The metronome sequence took the following form for all trials: the trial began with 20 sounds separated by an interval of 2 s, presenting a constant-period signal for 40 s. Subsequently, 50 sounds were produced in which each interval decreased by 36 ms, ending at an interval of 200 ms. This short period was sustained for 20 sounds, equivalent to a duration of 4 s. After this constant-period interval, another 50 sounds followed in which each interval increased by 36 ms. The trial ended with 20 sounds separated by 2 s intervals. The total trial duration was 194 s for a sequence of 160 moves. **Figure 2** shows the sequence of periods as a function of time and also as a function of the number of metronome sounds. The figure also includes the change of inter-sound interval as a percentage of the previous inter-sound interval, highlighting that the changes were initially small, below 2%, but then grew to a maximum of 18%.
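The chirp schedule is fully determined by these numbers. As a sketch, the interval sequence can be reproduced in a few lines of Python (the function name and structure are ours, not taken from the authors' Tcl software):

```python
def metronome_intervals():
    """Inter-sound intervals (seconds) for one trial, per the description:
    20 intervals of 2 s, 50 intervals decreasing by 36 ms (ending at 200 ms),
    20 intervals of 200 ms, 50 intervals increasing by 36 ms (back to 2 s),
    and 20 final intervals of 2 s."""
    down = [2.0 - 0.036 * k for k in range(1, 51)]  # 1.964 s ... 0.200 s
    up = [0.2 + 0.036 * k for k in range(1, 51)]    # 0.236 s ... 2.000 s
    return [2.0] * 20 + down + [0.2] * 20 + up + [2.0] * 20

intervals = metronome_intervals()
total_s = sum(intervals)  # 194 s, matching the stated trial duration
```

Summing the 160 intervals indeed gives 194 s (40 + 54.1 + 4 + 55.9 + 40 s), consistent with the stated sequence of 160 moves.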

Subjects were instructed to perform point-to-point movements with an explicit dwell time to separate the movements into discrete movements. This dwell time should last for 50% of the metronome interval, as accurately as possible. Subjects were instructed to maintain this discretization of movements for as long as possible, even when the pace increased and made the task more difficult.

This same timing sequence was presented under three different perceptual conditions. In the "vision" condition *V-short*, subjects had their eyes open and executed their movements to the displayed targets on the monitor. All metronome sounds had a duration of 50 ms, a short "beep." In the "no-vision" condition, *NV-short*, subjects were asked to close their eyes, which removed the amplitude specification; the metronome sounds were still short with a 50 ms duration. In a third condition, *NV-long*, the sound duration was longer and adjusted at each interval to last 50% of the period, giving exact auditory information about the instructed dwell duration. Subjects again kept their eyes closed to encourage focus on the auditory timing information. These three conditions were repeated twice, both times in the same sequence. Comparison of the two trials allowed a test whether the performance features changed with practice. The total duration of these six experimental trials was ∼25 min. Prior to data collection, each participant performed several moves with the metronome to familiarize him/herself with the task.

# **DATA REDUCTION AND ANALYSIS**

Of the 3D signals from the Flock-of-Birds sensor only displacements in the sagittal direction were processed. **Figure 3** shows a complete time-series of one trial (condition *V-short*), divided into five segments due to the length of the trial. The time-series reveals the dwell times at the extreme positions in the initial and final slow-paced sections of the trial. As the pace increased, the dwell times steadily decreased and then disappeared. The dwell times reappeared with increasing metronome interval. The vertical lines denote the onset of the metronome sound; the circles mark the onset and offset of each movement.

To extract the changes of kinematics with the changing metronome pace and to identify whether a transition in control occurred, the kinematic signals were analyzed as follows. Before extracting quantitative markers, the position data were smoothed using a five-sample moving average filter, with centered filtering that did not introduce a lag (*smooth* function in Matlab®). Velocity was obtained numerically from the two-sample difference of the position signal, and was smoothed again with the same five-sample moving average filter. The acceleration signal was obtained by spline fitting and differentiation of the position signal (see below).
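These pre-processing steps can be sketched in Python/NumPy; this is an illustrative stand-in for the authors' Matlab pipeline (NumPy's central difference replaces the two-sample difference, and Matlab's `smooth` endpoint handling is approximated by a shrinking window):

```python
import numpy as np

def smooth5(x):
    """Centered 5-sample moving average (introduces no lag); the window
    shrinks at the edges, similar to Matlab's `smooth` function."""
    x = np.asarray(x, float)
    out = np.empty_like(x)
    for i in range(len(x)):
        lo, hi = max(0, i - 2), min(len(x), i + 3)
        out[i] = x[lo:hi].mean()
    return out

def velocity(pos, fs=100.0):
    """Velocity from the position signal (central difference as a stand-in
    for the paper's two-sample difference), smoothed with the same
    5-sample moving average; fs is the 100 Hz sampling rate."""
    v = np.gradient(np.asarray(pos, float)) * fs
    return smooth5(v)
```

For a constant-slope position signal this returns the correct constant velocity, since both the difference and the centered average are exact for linear data.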

# **PARSING INTO SINGLE MOVEMENTS**

For all analyses, the continuous kinematic data were parsed into single movements, delimited by *t*onset and *t*end (**Figure 4A**). Both onset and end of a movement were defined as the times when velocity crossed a threshold, defined as 3% of the peak velocity of the same movement. Given that all subsequent measures depended on this temporal demarcation, alternative thresholds of 1 and 5% were compared for their influence on subsequent analyses. As no significant differences were identified, we used 3% as the threshold for all subsequent analyses. An alternative would have been a fixed threshold, for example a fixed velocity value. However, with decreasing interval and increasing velocity, a fixed threshold would favor earlier parsing. As this might have biased the calculations of dwell times (see below) in favor of our hypothesis, we chose the percentage-based criterion.

This parsing analysis, however, faced a challenge when the movements merged to become approximately sinusoidal: the velocity decreased to zero but there was no longer any dwell time (**Figure 4B**). To eliminate false detections, a linear fit was applied to the velocity samples that were below the 3% threshold; the number of samples for this regression varied between 3 for fast movements up to 100 for slow movements (depending on the individual). *t*onset and *t*end were considered coincident when the *R*<sup>2</sup> of the linear fit was above 0.99. The sample with the lowest velocity defined the time separating adjacent movements. To test the robustness of the onset and end times, we also ran the algorithm with an *R*<sup>2</sup> cut-off of 0.95. The small differences that resulted did not have any effect on the subsequent analyses, most notably on onset and offset of dwell time.
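The 3% rule can be illustrated on a single simulated movement. The sketch below applies the per-movement threshold to an isolated velocity trace; it is a simplification of the full pipeline (the linear-fit *R*<sup>2</sup> > 0.99 merge test for vanished dwell times, described above, is only noted in a comment):

```python
import numpy as np

def parse_movement(v, thresh_frac=0.03):
    """Onset/end indices of a single movement: first/last sample where
    speed exceeds 3% of that movement's peak speed. The paper applies
    this rule movement-by-movement and additionally merges adjacent
    movements when a linear fit to the sub-threshold samples has
    R^2 > 0.99 (i.e., dwell time has vanished)."""
    speed = np.abs(np.asarray(v, float))
    above = np.flatnonzero(speed >= thresh_frac * speed.max())
    return above[0], above[-1]

# Example: a bell-shaped speed profile flanked by dwell (zero velocity)
t = np.linspace(0.0, 1.0, 101)
bell = np.sin(np.pi * t) ** 2
v = np.concatenate([np.zeros(20), bell, np.zeros(20)])
onset, end = parse_movement(v)  # indices just inside the bell
```

The detected onset and end fall a few samples inside the bell, exactly where speed first and last exceeds 3% of its peak.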

### **MOVEMENT TIME, DWELL TIME, AND TOTAL MOVEMENT TIME**

Movement time *MTi*, dwell time *DTi*, and total movement time *TMTi* were defined as

$$MT\_i = t\_{\text{end},i} - t\_{\text{onset},i}$$

$$DT\_i = t\_{\text{onset},i+1} - t\_{\text{end},i}$$

$$TMT\_i = t\_{\text{onset},i+1} - t\_{\text{onset},i}$$

where *i* and *i* + 1 denote the movement index. The duty cycle *DCi* for each movement was defined as:

$$DC\_i = MT\_i / (MT\_i + DT\_i)$$

By instruction, *DCi* should be equal to 0.5; it became 1.0 when dwell time became zero.

### **TRANSITION BETWEEN DISCRETE AND RHYTHMIC MOVEMENTS**

The criterion used to define a transition between discrete and continuous rhythmic movements was when dwell time became zero. In the accelerating portion of the trial, this time *DT* = 0Accel was defined by the movement *i* that had zero *DT* and was followed by at least two more movements with zero *DT*. Similarly, for the decelerating portion, where rhythmic movements transitioned to discrete movements, *DT* = 0Decel was defined as the

last movement *i* with zero *DT* preceded by at least two movements with zero *DT*. This criterion proved to be very robust, as the onset and offset times remained unaffected by small changes in the parsing algorithm (see above). **Figure 5** shows an exemplary trial with *MT* and *DT* displayed as a function of metronome number. For statistical comparisons we examined the movement times *MT* and metronome numbers associated with *DT* = 0.
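The zero-dwell transition criterion amounts to finding runs of at least three consecutive zero-dwell movements; a sketch (our own implementation of the stated rule):

```python
def transition_indices(DT, tol=1e-6, run=3):
    """DT=0_Accel: first movement with zero dwell followed by at least two
    more zero-dwell movements. DT=0_Decel: last zero-dwell movement
    preceded by at least two. Returns (accel_idx, decel_idx); an index is
    None if no such run exists."""
    zero = [dt <= tol for dt in DT]
    accel = decel = None
    for i in range(len(DT) - run + 1):
        if all(zero[i:i + run]):
            accel = i
            break
    for i in range(len(DT) - 1, run - 2, -1):
        if all(zero[i - run + 1:i + 1]):
            decel = i
            break
    return accel, decel

DT = [0.5, 0.4, 0.0, 0.3, 0.0, 0.0, 0.0, 0.0, 0.2, 0.3]
accel, decel = transition_indices(DT)  # isolated zero at index 2 is ignored
```

Requiring a run of three makes the criterion robust against a single spuriously detected zero dwell, consistent with the reported insensitivity to small changes in the parsing algorithm.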

**FIGURE 4 |** Dwell time greater than zero (first segment of profile); onset and offset are identical (second segment) and dwell time is zero. Linear regression at this point yields *R*<sup>2</sup> greater than 0.99.

To evaluate the hypothesized transition from discrete to continuous rhythmic movements, it was also necessary to determine the minimum movement duration that subjects could perform, which was typically longer than the shortest metronome interval. Inspection of the data showed that the region of minimum movement duration was not coincident with the trial segment of the 200 ms metronome intervals. Hence, to obtain a robust estimate of this interval when subjects reached and left their minimum movement duration, a window was defined using the following threshold (**Figure 5**):

$$MT \le MT\_{\min} + 0.05 \times (MT\_{\max} - MT\_{\min})$$

where *MT*min and *MT*max were the absolute minimum and maximum in the same trial, respectively. Successive *MT*s below this

threshold defined the window length, which was typically shorter than the 20 metronome sounds that defined the constant-period interval. This window is shown in **Figure 5** as a horizontal red line in the center of the trial. The mean *MT* in this interval was used as a robust estimate of *MT-*min.
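A sketch of this windowing step (reading the threshold as *MT*min plus 5% of the *MT* range; names are ours):

```python
def mt_min_window(MT, frac=0.05):
    """Window of movements at minimum duration: the longest run of
    consecutive MTs not exceeding MT_min + frac * (MT_max - MT_min).
    Returns (start index, end index, robust MT-min estimate = mean MT
    within the window)."""
    lo, hi = min(MT), max(MT)
    thresh = lo + frac * (hi - lo)
    below = [mt <= thresh for mt in MT]
    best = (0, -1)  # (start, end) of longest qualifying run
    i = 0
    while i < len(MT):
        if below[i]:
            j = i
            while j + 1 < len(MT) and below[j + 1]:
                j += 1
            if j - i > best[1] - best[0]:
                best = (i, j)
            i = j + 1
        else:
            i += 1
    start, end = best
    window = MT[start:end + 1]
    return start, end, sum(window) / len(window)

# Movement times that dip to a plateau near 0.3 s mid-trial
MTs = [1.0, 0.8, 0.5, 0.31, 0.30, 0.32, 0.31, 0.6, 0.9, 1.0]
start, end, mt_min = mt_min_window(MTs)
```

Averaging over the windowed plateau rather than taking the single smallest *MT* makes the estimate robust to cycle-to-cycle variability.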

These defined landmarks served to quantify the relation between *DT* = 0 and the time when the minimum movement duration was reached. In particular, we defined Δ*T*Accel as the interval between *DT* = 0Accel and the start of *MT-*min, and Δ*T*Decel as the interval between the end of *MT-*min and *DT* = 0Decel.

$$
\Delta T\_{\text{Accel}} = \text{start}(MT \text{- min}) - DT = 0\_{\text{Accel}}
$$

$$
\Delta T\_{\text{Decel}} = DT = 0\_{\text{Decel}} - \text{end}(MT \text{- min})
$$

### **DISCRETENESS INDEX**

It was expected that with changing pace, the shape of the kinematic profile of each movement would change. To capture this modulation, a discreteness index *DI* was defined for each movement *i* based on the position and acceleration profiles (**Figure 6**). The *DI* was defined as the relative timing of the first peak in the acceleration profile *t*acc with respect to the movement time:

$$DI\_i = (t\_{\text{acc},i} - t\_{\text{onset},i}) / MT\_i$$

**Figures 6A,B** illustrate this measure for discrete and rhythmic movements, using simulated profiles for clarity. For a discrete movement, *DI* had a non-zero value, while for the sinusoidal movement, *DI* was equal to zero. For reference, the *DI* for a cycloidal movement is 0.25, and a movement that minimizes jerk has a *DI* of about 0.20 (Hogan, 1984). **Figures 6C,D** show the corresponding measured data.

As acceleration is notoriously noisy, the position signal of each movement (in the interval between onset and offset, not considering dwell time) was approximated via a quintic spline. The second derivative of the spline served as the acceleration profile needed for the calculation of the discreteness index. As the figure for the fast rhythmic movement shows, the quintic spline was different for the first and second half of the movement. Importantly, though, the DI was zero.
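The index can be sketched on simulated profiles like those in **Figures 6A,B**. In this illustration, a numerical second derivative stands in for the paper's quintic-spline differentiation (the simulated profiles are noise-free, so no spline smoothing is needed):

```python
import numpy as np

def discreteness_index(t, x):
    """Discreteness index of one movement: DI = (t_acc - t_onset) / MT,
    where t_acc is the time of the first peak of the acceleration
    profile. Acceleration is obtained by double numerical
    differentiation of the position signal."""
    t = np.asarray(t, float)
    x = np.asarray(x, float)
    dt = t[1] - t[0]
    acc = np.gradient(np.gradient(x, dt), dt)
    i = 0
    while i + 1 < len(acc) and acc[i + 1] > acc[i]:
        i += 1  # walk uphill to the first local maximum of acceleration
    return (t[i] - t[0]) / (t[-1] - t[0])

tau = np.linspace(0.0, 1.0, 201)
x_mj = 10 * tau**3 - 15 * tau**4 + 6 * tau**5  # minimum-jerk movement
x_sin = (1.0 - np.cos(np.pi * tau)) / 2.0      # half cycle of a sinusoid
di_discrete = discreteness_index(tau, x_mj)    # close to 0.2 (cf. Hogan, 1984)
di_rhythmic = discreteness_index(tau, x_sin)   # close to 0
```

For the minimum-jerk profile the acceleration peaks at about 21% of the movement time, matching the reference value of roughly 0.20; for the sinusoid the acceleration peak sits at movement onset, so *DI* is (numerically) zero.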

## **ONSET ASYNCHRONY**

To quantify how movements were synchronized with the metronome sounds, a measure of onset asynchrony was defined as the temporal difference between each movement's onset and the corresponding metronome sound's onset (**Figure 4A**):

$$OA_i = t_{\text{onset},i} - t_{\text{metronome},i}$$

Positive values indicated that the movement onset lagged the onset of the metronome sound. Determining this difference required care because subjects occasionally skipped cycles in the faster portions of the trial. To avoid erroneous matches yielding onset asynchronies longer than one cycle, the algorithm matched metronome and movement times in order of increasing time for the first, accelerating part of the trial; for the second, decelerating part, the algorithm started at the end of the trial and worked backwards to determine the correspondence between metronome and movement times.
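A plausible reconstruction of this two-pass matching (a sketch, not the authors' exact implementation) pairs each movement onset with the nearest unused metronome onset, scanning forward through the accelerating half and backward through the decelerating half:

```python
def match_onsets(movement_onsets, metronome_onsets, forward=True):
    """Pair each movement onset with one metronome onset, never reusing a
    beat, so skipped cycles cannot yield asynchronies longer than a cycle."""
    step = 1 if forward else -1
    j = 0 if forward else len(metronome_onsets) - 1
    onsets = movement_onsets if forward else movement_onsets[::-1]
    pairs = []
    for t in onsets:
        # advance the beat pointer while the next beat lies closer to t
        while 0 <= j + step < len(metronome_onsets) and \
                abs(metronome_onsets[j + step] - t) < abs(metronome_onsets[j] - t):
            j += step
        pairs.append((t, metronome_onsets[j]))
        j = min(max(j + step, 0), len(metronome_onsets) - 1)
    return pairs if forward else pairs[::-1]
```

With beats every 2 s and one skipped movement, both passes pair the remaining movements with the correct beats and leave the skipped beat unmatched.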

# **STATISTICAL ANALYSES**

Comparison of the different measures in the three conditions was performed using two-way analysis of variance, with condition and trial as fixed factors and subject as a random factor. If variables of the accelerating and decelerating portions were compared, a three-way 3 (condition) × 2 (Accel, Decel) × 2 (trial) ANOVA was conducted. If different segments of the same trial were compared, a 3 (condition) × 2 (segment) × 2 (trial) ANOVA was used. Student *t*-tests were used to compare time estimates with metronome times. The significance level was always set at α = 0.05.
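For instance, the one-sample comparison of *MT*-min estimates against the instructed 200-ms interval could look like this sketch (synthetic values; the study's actual data are not reproduced here):

```python
import numpy as np
from scipy.stats import ttest_1samp

# Synthetic per-trial MT-min estimates in seconds (illustrative only).
mt_min = np.array([248, 236, 251, 242, 259, 245, 238, 255, 249, 241]) / 1000.0

# One-sample t-test against the instructed 200-ms movement interval.
result = ttest_1samp(mt_min, popmean=0.200)
significant = result.pvalue < 0.05  # alpha = 0.05, as used throughout the study
```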

# **RESULTS**

An exemplary time series of one subject in condition *V-short* is shown in **Figure 3**. In the first and last portions of the trial, dwell times at the position extrema demarcating discrete movements were pronounced, but these plateaus decreased and eventually merged at ∼90 s into the trial. They reappeared at approximately 115 s into the trial. Movements appeared to be synchronized with the metronome, although strict synchronization between metronome sounds and movement peaks was lost during the short-interval section in the center of the trial. In fact, almost all subjects skipped or even inserted cycles in the fast portion of the trial. In 22 out of 60 trials the number of cycles did not correspond to the metronome-specified number. However, these occurrences did not show any systematic dependency on task condition. As spontaneous comments of the subjects following the experiment confirmed, the fast movements were very difficult to perform and synchronization with the metronome could no longer be sustained.

Movement amplitudes were relatively invariant in the slower portions of the trial, following the visual targets, but became more variable and smaller when the movements became fast. As expected, the decrease in amplitude in the short-interval section was more pronounced in the no-vision conditions. This corroborated that maintaining synchrony with the metronome was indeed taxing, as intended, and that subjects compromised amplitude to accommodate the increasingly faster timing.

# **SYNCHRONIZATION WITH METRONOME**

To evaluate whether subjects followed the task instructions and produced the movement times as specified, **Figure 7** presents the total movement time *TMT*, defined as the temporal difference between two consecutive movement onsets, plotted for each movement and for all subjects in condition *V-short*. The green band is an envelope around the instructed intervals with a width of ±350 ms that encompassed 99% of all values. The results are displayed as a function of metronome number. This convention avoided inconsistent plotting across individuals when movement cycles were skipped, which happened frequently during the short-interval section of the trial.

To test the subjects' synchronization with the instructed times, *TMT* was regressed against the metronome interval. For perfect synchronization, this regression should follow the identity line with a slope of 1. For all 10 subjects in all three conditions the mean slope was 0.98 ± 0.025 (one standard deviation), with *R*<sup>2</sup> values ranging between 0.89 and 0.99. There was no significant difference between the three conditions. Hence, it can be concluded that, overall, subjects maintained synchronization with the metronome.
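The regression itself is standard; a sketch with toy values (not the study's data) using SciPy:

```python
import numpy as np
from scipy.stats import linregress

# Toy instructed metronome intervals (s) and produced TMTs (s).
metronome = np.array([2.0, 1.5, 1.0, 0.7, 0.5, 0.4, 0.3])
tmt = 0.98 * metronome + 0.01  # near-perfect tracking, for illustration

fit = linregress(metronome, tmt)
# Perfect synchronization would give slope 1 and intercept 0;
# fit.slope and fit.rvalue**2 correspond to the values reported above.
```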

## **DISCRETE MOVEMENTS AND DWELL TIME**

To assess whether subjects satisfied the task instructions and produced discrete movements separated by dwell times, the *TMT* was split into its component times, dwell time *DT* and movement time *MT*. **Figure 8** shows one subject's data in the three conditions (first trial), with *DT* in green, *MT* in blue, and the duty cycle *DC* in red; the black line represents the metronome-specified movement interval. As is clear from the duty cycle, the instructed dwell time was not always achieved, neither in the steady-state portions of the trial nor in the accelerating and decelerating portions. This subject did not display a duty cycle of 1.0 at the beginning of conditions *NV-short* and *NV-long*. Other subjects showed similar deviations in this part of the trial, but no systematic pattern or dependency on condition could be identified.
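The decomposition described above can be stated compactly: for each movement, *TMT* is the interval between consecutive onsets, *MT* runs from onset to offset, and *DT* fills the remainder. A sketch with hypothetical times:

```python
# Hypothetical onset/offset times in seconds, for illustration only.
onsets = [0.0, 2.0, 4.0]  # consecutive movement onsets
offsets = [1.0, 3.1]      # corresponding movement offsets

tmt = [onsets[i + 1] - onsets[i] for i in range(len(offsets))]  # total movement time
mt = [offsets[i] - onsets[i] for i in range(len(offsets))]      # movement time
dt = [onsets[i + 1] - offsets[i] for i in range(len(offsets))]  # dwell time
# TMT = MT + DT holds for each movement by construction.
```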

When the movement intervals changed after movement number 20, the duty cycle was not maintained for long. Following a steady decrease, dwell time *DT* vanished at about metronome number 60 in the accelerating portion of the trial. In the decelerating portion of the trial, *DT* re-appeared at approximately metronome number 100, followed by a declining duty cycle. **Figure 9** summarizes these observations for the condition *V-short*. The figure shows the average of all 10 subjects; the shaded error bands represent ± one standard deviation across subjects.

The important observation in **Figures 8**, **9** is that dwell times vanished before movement durations had reached their minimum in the center portion of the trial; in the decelerating portion of the trial, dwell times re-appeared only after movement time had already increased. According to the task instructions, dwell times should have declined continuously, proportional to movement duration, until minimum movement time was reached. In contrast, *Hypothesis 1* predicted that discrete movements would be abandoned early in favor of computationally less demanding rhythmic movements.

# **MINIMUM MOVEMENT TIME**

To test *Hypothesis 1*, we first determined minimum movement time, *MT*-min, as reached in the center of the trial (see **Figure 5** and methods). All subjects' values in the three perceptual conditions were submitted to a 3 (condition) × 2 (trial) ANOVA. Results revealed that the condition *V-short* showed a significantly longer *MT-*min of 248 ms, compared to 220 and 217 ms in the no-vision conditions *NV-short* and *NV-long*, *F*(2, 47) = 5.76, *p* < 0.001. *Post-hoc* tests showed that the 248 ms in the *V-short* condition was significantly different from the other two conditions, which did not differ from each other (*p* < 0.01). The two trials showed no significant difference. Pair-wise Student *t*-tests compared *MT-*min against the instructed movement interval of 200 ms and detected a significant difference for all subjects in all three conditions, *t*(59) = 4.82, *p* < 0.0001. This result again corroborated that, as intended, subjects had reached their limit of movement duration in all conditions. This conclusion was further supported by the fact that this effect was not influenced by practice. The shorter times in the no-vision conditions correlated with a decrease in amplitudes. As expected from *Hypothesis 4*, subjects complied better with the timing demands of the task when visual information was removed.

### **MOVEMENT TIME AT TRANSITION TO RHYTHMIC MOVEMENTS**

We next determined the times when dwell time disappeared and reappeared in the accelerating and decelerating portions and the corresponding movement times, *MT*<sub>*DT*=0</sub> (see **Figure 5**). Subjecting *MT*<sub>*DT*=0</sub> to a 3 (condition) × 2 (Accel-Decel) × 2 (trial) ANOVA revealed a significant difference between conditions, *F*(2, 80) = 4.77, *p* < 0.01, while there was no difference between the accelerating and decelerating segments, nor a significant difference across the two trials. Subjects performed with an average movement time of 463 and 457 ms for *V-short* and *NV-short*, respectively; for *NV-long* the average movement time was 536 ms, which was significantly longer than in the other two conditions, which did not differ from each other (*p* < 0.01). These times were significantly longer than the minimum movement times (*p* < 0.01). This result supported *Hypothesis 1.* However, the symmetry in movement times between the accelerating and decelerating portions was not consistent with *Hypothesis 2*. Importantly, these results also revealed that zero dwell times started earlier in the condition with explicit temporal information. This finding is consistent with *Hypothesis 3*, which predicted that explicit temporal information would not prevent a transition; in fact, the transition occurred even earlier than in the short-beep conditions.

### **HYSTERESIS**

Despite the symmetry in movement times at *DT* = 0, inspection of **Figures 8**, **9** shows that the first and second transition, i.e., the start and end of the minimum movement time *MT*-min, were slightly shifted relative to the instructed movement times or metronome indices, suggestive of hysteresis. To quantify this observation we computed the intervals between *DT* = 0 and *MT*-min in the accelerating and decelerating portions, Δ*T*Accel and Δ*T*Decel, for each subject (see methods and **Figure 5**). A first set of *t*-tests confirmed that these intervals were significantly different from zero for all subjects and conditions (*p* < 0.01). Subsequently, Δ*T*Accel and Δ*T*Decel were subjected to a 3 (condition) × 2 (Accel-Decel) × 2 (trial) ANOVA. Results showed a significant difference between the two transitions, *F*(1, 76) = 24.77, *p* < 0.0001, but no differences between the three perceptual conditions, nor between the two trials. Dwell time vanished on average 11 movements before the minimum movement time was reached; in the decelerating portion, movement time *MT* increased on average 7 movements before dwell time *DT* re-appeared. This asymmetry is consistent with a hysteresis effect as predicted by *Hypothesis 2*. The persistence of fast rhythmic movements after the metronome intervals had increased lasted ∼2.8 s, which is relatively long and makes a mechanical cause improbable.

### **DISCRETENESS INDEX**

To further identify the hypothesized transition we examined changes in the continuous movement kinematics in terms of the discreteness index *DI*; this index quantified the shape of the bell-shaped velocity profile of each movement. **Figures 10A–C** show the *DI* for all subjects in the three perceptual conditions. As can be seen, *DI* changed rapidly near movement number 60 and again at 100, in a similar fashion for all three conditions. **Figure 10D** shows an exemplary single subject's trial in the *V-short* condition to further illustrate the abrupt change in *DI*. For statistical analysis, the trial was parsed into three segments: before *DT* = 0Accel, the center portion, and after *DT* = 0Decel. This individual's *DI* values were 0.13 in the first and last segments, with a drop to 0.05 in the middle segment. A 3 (segment) × 3 (condition) × 2 (trial) ANOVA on all subjects' data identified significant differences across segments, *F*(2, 76) = 431.42, *p* < 0.0001. There were no main effects or interactions for the perceptual conditions and trials. The average *DI* across the three conditions was 0.13 before *DT* = 0Accel, 0.04 in the center portion, and 0.14 after *DT* = 0Decel, similar to the exemplary subject in **Figure 10D**. This measure captured changes in each trajectory and revealed a relatively abrupt transition from discrete to rhythmic movement control strategies, consistent with *Hypothesis 1*.

**FIGURE 11 | Onset asynchrony values of all subjects across metronome number. (A–C)** All subjects' values in the three perceptual conditions. **(D)** A single subject's trial with line fits. The linear regressions were performed over the intervals 0–40 and 120–160 of metronome numbers. The red lines are dashed after *DT* = 0Accel and before *DT* = 0Decel.

# **ONSET ASYNCHRONY**

Given that the task required synchronization with a metronome, the synchrony of movement onset with the auditory stimulus was analyzed to assess entrainment and predictive control. **Figures 11A–C** show the sequences of onset asynchronies, the difference between movement and metronome onset, for all individual subjects in all three conditions; there was considerable variability across subjects. Despite this inter-individual variability, the majority of subjects exhibited a relatively constant onset asynchrony *OA* from the trial start until movement numbers 50–60, which lasted well beyond the initial long-interval section. Similarly, the second part of the trial showed relatively constant values of *OA* for the last 40–50 movements. The *OA* values after *DT* = 0Accel and before *DT* = 0Decel are shown as dashed lines. To highlight these observations, **Figure 11D** shows *OA* values in a single subject's trial in the *V-short* condition. The straight black lines were obtained from linear regression over the first 40 and last 40 *OA* estimates in each trial. As the slopes did not differ from zero in any condition, we used the mean values to estimate a representative onset asynchrony for each accelerating and decelerating segment and condition.

A 3 (condition) × 2 (segment) × 2 (trial) ANOVA identified a main effect of condition, *F*(2, 106) = 6.755, *p* < 0.002, and a main effect of segment, *F*(1, 91) = 5.97, *p* < 0.05. There were no other main effects or interactions. In the two conditions with the short metronome sound, *V-short* and *NV-short*, subjects showed onset asynchronies of 536 and 399 ms, respectively; in *NV-long* it was only 129 ms, which was significantly different from the other two, as shown in *post-hoc* tests (*p* < 0.01). The delayed onset of the discrete movement in the two conditions with the short metronome sounds indicates reactive responses to the auditory stimulus, while the much shorter onset asynchrony in *NV-long* suggests some degree of predictive control. The long reaction times also suggest that the movements were complex and demanded substantial neural resources. The mean onset asynchronies in the first and second segments of the trials were 469 and 240 ms, respectively. Even though this difference between the two segments does not directly test hysteresis, it may still be consistent with the hysteresis effect shown above (*Hypothesis 2*). Summarizing, explicit auditory timing information apparently elicited predictive control.

# **DISCUSSION**

This study attempted to test whether movement control is based on a small set of dynamic primitives, specifically submovements and oscillations that underlie observable discrete and rhythmic movements. We hypothesized that coordination based on primitives is computationally less demanding. However, representation in the form of dynamic primitives may also limit the versatility of coordinated movements that can be performed. One such limitation may be the speed with which a sequence of discrete movements can be executed. If rhythmic primitives are less demanding of neural resources, then, when stressed by the requirement of rapid action, the system may merge a sequence of discrete movements into a continuous rhythmic movement.

# **TRANSITIONS ARE NOT DUE TO LIMITATIONS OF THE MOTOR PERIPHERY**

The findings unambiguously demonstrated a switch from the instructed discrete movements to continuous rhythmic performance as TMT was reduced and another switch back to discrete movements as TMT increased again. This could not be attributed to limitations of the peripheral neuro-musculo-skeletal system. Despite the fact that subjects matched their TMTs to the metronome interval, they did not preserve the instructed dwell times, but progressively reduced them. The movement times at which dwell times disappeared were significantly longer than the minimum movement time achieved.

This alone might be attributed to a limited peripheral response speed or bandwidth, because, for a given metronome interval, a movement with dwell time has higher frequency components than a movement without dwell. Stated in the time domain, a movement with infinitesimal dwell time has larger accelerations than a smoothly rhythmic movement (Hogan and Sternad, 2007). However, changing the sensory condition significantly affected the movement duration at which dwell times disappeared—536 ms for *NV-long* vs. 463 and 457 ms for *V-short* and *NV-short*, respectively. In the *NV-long* condition, the passage to zero dwell time cannot be attributed to limited peripheral response speed, because subjects were demonstrably capable of faster movements with non-zero dwell times.

In addition, in all subjects and conditions, the discreteness index showed an abrupt change around metronome number 60 and 100. It dropped rapidly to zero approximately coincident with dwell time reaching zero as TMT decreased; it recovered rapidly from zero approximately coincident with dwell time increasing from zero as TMT increased again. Both of these observations are as predicted by *Hypothesis 1*. Discrete movements with precisely timed starts and stops require more neural resources than smoothly rhythmic movements (Schaal et al., 2004). As available movement time decreased, the increasing demand on neural resources evoked a switch to the neurally simpler rhythmic movements. Once available movement time increased again, the more challenging discrete movements were reinstated.

# **WEAK SIGNS OF HYSTERESIS**

A common phenomenon in systems with more than one stable state is that transitions between these stable states may display hysteresis: transitions in one direction occur at different parameter values than in the reverse direction. To test *Hypothesis 2*, the experiment included accelerating and decelerating portions that induced transitions from discrete to rhythmic movements and back from rhythmic to discrete movements, respectively. However, the TMT when dwell time reached zero as metronome intervals decreased was not significantly different from the TMT when dwell time increased from zero as metronome intervals increased again. This result is not consistent with *Hypothesis 2*.

Nevertheless, some asymmetry in the transitions was evident. The fastest (rhythmic) movements persisted for several seconds (2.8 s on average) after the metronome intervals began to increase again. This was followed by a faster rate of increase of movement time and dwell time with metronome number than their rate of decrease during the accelerating portion of the trial. The exact origin of this phenomenon requires further investigation.

# **EXPLICIT AUDITORY INFORMATION PROMOTED RHYTHMIC MOVEMENTS**

One condition was included that presented explicit auditory information about the dwell duration in order to test whether this would guide subjects to produce the required dwell time. *Hypothesis 3* stated that explicit auditory information about dwell duration should not be able to prevent the transition between the two primitives: transitions should occur when the neural system was challenged and induced to resort to simpler solutions. Indeed, as predicted, auditory specification of dwell times did not facilitate the discrete movements and the weak hysteresis effect in the decelerating portion was unaffected by the different auditory conditions. However, there was one effect of prolonged auditory stimuli that was counter to expectations: when the metronome sounds were longer, subjects switched to rhythmic movements *earlier* and switched back to discrete movements *later* than in the short metronome conditions. This effect may be due to the fact that "filling" half of the time interval with sound made the auditory signal appear more periodic: sound and no-sound alternated periodically. The more rhythmic nature of the signal may have entrained the movement and induced rhythmic behavior.

The different auditory stimuli produced one additional effect. The two conditions with the short sound revealed a considerable delay between metronome and movement onset, 536 and 399 ms, that clearly indicated a reactive response. This is not surprising as the intervals of 2 s in the beginning and end of the trial were longer than humans can temporally integrate and perceive as rhythmic (Fraisse, 1984). In contrast, in the long-beep condition the delays from metronome onset to movement onset were considerably shorter (129 ms). In this condition the silent interval was at most 1 s, alternating with 1 s of sound. This sound pattern can easily be perceived as periodic. Human subjects readily predict periodic signals and have been shown to reduce the phase lag of their responses or even exhibit phase lead (Aschersleben, 2002). This distinctive feature of responses to periodic stimuli may also account for the transitions to and from rhythmic performance at longer metronome intervals. In effect, the auditory stimulus was more readily perceived as rhythmic and may have entrained a primitive rhythmic motor behavior (Mates et al., 1994). If so, this raises the intriguing possibility that *perception* as well as action may be based on dynamic primitives.

The substantially longer delays in the *V-short* and *NV-short* conditions suggest that subjects only reacted to the metronome trigger—without any prediction. Reactive behavior during the long inter-beep intervals of 2–1.5 s is not unexpected, although the reaction times were approximately 2–3 times longer than simple auditory reaction times. Moreover, the same reaction times persisted into much shorter intervals, even though the interval changes were systematic and predictable. These long reaction times may indicate the demands of movement planning (Henry and Rogers, 1960; Sternberg et al., 1978). It would appear that simple reaching movements with a timed dwell demanded considerable neural resources.

# **VISUAL INFORMATION HAS NEGLIGIBLE INFLUENCE ON TRANSITIONS**

We speculated in *Hypothesis 4* that visual information may mask primitive-based transitions and that performance without visual information may better reveal transitions due to neural or computational constraints. However, none of the transition effects showed any dependency on the presence of visual information. One possible explanation for this lack of effect could be that in the no-vision conditions, where the spatial targets were not visible, subjects may have decreased their amplitudes to maintain discrete movements longer, which could have cancelled the hypothesized effect. The only effect that depended on vision was the minimum movement time reached. In the vision condition the minimum movement time was ∼25 ms longer than in the no-vision conditions. This difference was probably due to subjects trying harder to maintain the target amplitude when the targets were visible, whereas with their eyes closed they traded amplitude for timing accuracy.

# **SUMMARY AND CONCLUSIONS**

The results are consistent with the overall hypothesis that control is based on primitives. Transitions from discrete to rhythmic movements occurred before the neuro-mechanical system had reached its maximum performance and the kinematic profiles changed rather abruptly. On the other hand, the evidence for hysteresis was weak. Interestingly, the transition from discrete to rhythmic was facilitated by long auditory metronome sounds, counter to expectations, indicating that perceptual rather than biomechanical factors influenced the transition. These results complement findings that revealed how continuous movements, if performed sufficiently slowly, decompose into chunks or submovements. Evidence for this intermittent control has been seen in healthy subjects and patients with lesions (Doeringer and Hogan, 1998; Krebs et al., 1999). In particular, slow rhythmic movements may transition to a sequence of discrete movements (Adam and Paas, 1996; van der Wel et al., 2010). Our experimental results add one more piece of evidence supporting the hypothesis that control is based on dynamic primitives.

# **ACKNOWLEDGMENTS**

Dagmar Sternad was supported by The National Institutes of Health R01-HD045639, the American Heart Association 11SDG7270001, and the National Science Foundation NSF DMS-0928587. Steven Charles was supported by a Whitaker Graduate Fellowship. Neville Hogan was supported in part by the Eric P. and Evelyn E. Newman fund and by DARPA under the Warrior Web program, BAA-11-72.

# **REFERENCES**


Sternberg, S., Monsell, S., Knoll, R. L., and Wright, C. E. (1978). "The latency and duration of rapid movement sequences: comparisons of speech and typewriting," in *Information Processing in Motor Control and Learning*, ed G. E. Stelmach (New York, NY: Academic Press), 117–152.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 21 March 2013; accepted: 17 June 2013; published online: 22 July 2013.*

*Citation: Sternad D, Marino H, Charles SK, Duarte M, Dipietro L and Hogan N (2013) Transitions between discrete and rhythmic primitives in a unimanual task. Front. Comput. Neurosci. 7:90. doi: 10.3389/fncom.2013.00090*

*Copyright © 2013 Sternad, Marino, Charles, Duarte, Dipietro and Hogan. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

# From ear to hand: the role of the auditory-motor loop in pointing to an auditory source

# *Eric O. Boyer 1,2\*, Bénédicte M. Babayan1†, Frédéric Bevilacqua1, Markus Noisternig1, Olivier Warusfel 1, Agnes Roby-Brami 3, Sylvain Hanneton2 and Isabelle Viaud-Delmon1*

*<sup>1</sup> STMS IRCAM-CNRS-UPMC, IRCAM, Paris, France*

*<sup>2</sup> Laboratoire de Neurophysique et Physiologie, CNRS UMR 8119, UFR Biomédicale des Saints Pères, Université Paris Descartes, Paris, France*

*<sup>3</sup> Institut des Systèmes Intelligents et de Robotique, CNRS UMR 7222, UPMC, Paris, France*

### *Edited by:*

*Yuri P. Ivanenko, IRCCS Fondazione Santa Lucia, Italy*

### *Reviewed by:*

*Jonathan Z. Simon, University of Maryland, USA Barbara La Scaleia, IRCCS Fondazione Santa Lucia, Italy*

### *\*Correspondence:*

*Eric O. Boyer, STMS IRCAM-CNRS-UPMC, IRCAM, 1 place Igor Stravinsky, 75004 Paris, France. e-mail: eric.boyer@ircam.fr*

### *†Present address:*

*Bénédicte M. Babayan, Laboratoire Neurobiologie des Processus Adaptatifs, Navigation, Memory, and Aging (ENMVI) Team, CNRS UPMC, UMR 7102, Paris, France.*

Studies of the neural mechanisms involved in goal-directed movements tend to concentrate on the role of vision. We present here an attempt to address the mechanisms whereby an auditory input is transformed into a motor command. The spatial and temporal organization of hand movements was studied in normal human subjects as they pointed toward unseen auditory targets located in a horizontal plane in front of them. Positions and movements of the hand were measured by a six-camera infrared tracking system. In one condition, we assessed the role of auditory information about target position in correcting the trajectory of the hand; to this end, the duration of the target presentation was varied. In another condition, subjects received continuous auditory feedback of their hand movement while pointing to the auditory targets. Online auditory control of the direction of pointing movements was assessed by evaluating how subjects reacted to shifts in the heard hand position. Localization errors were exacerbated by short durations of target presentation but not modified by auditory feedback of hand position. Long durations of target presentation gave rise to a higher level of accuracy and were accompanied by early automatic head-orienting movements consistently related to target direction. These results highlight the efficiency of auditory feedback processing in online motor control and suggest that the auditory system takes advantage of the dynamic changes of acoustic cues caused by changes in head orientation to support online motor control. How to design informative acoustic feedback needs to be carefully studied to demonstrate that auditory feedback of the hand could assist the monitoring of movements directed at objects in auditory space.

**Keywords: spatial audition, human, pointing movement kinematics, orienting movements, reaching, auditory-motor mapping, movement sonification**

# **INTRODUCTION**

Interactions between the auditory and motor systems have mainly been studied in the context of the perception and production of musical rhythm or vocal sounds (e.g., Hickok et al., 2003; Chen et al., 2009). However, hand pointing to sounds is often used to study auditory localization. It is a complex task that relies on a precise representation of auditory space that can be used for the control of directional motor output. Just like pointing to visual targets, it involves different modular neural processes, since spatial information about the target position and the hand position has to be combined across different senses and reference frames.

In order to address the mechanisms whereby an auditory input is transformed into a motor command, we studied online auditory control of the direction of pointing movements toward auditory sources. We first investigated whether pointing movements were more accurate when the target was present throughout the entire pointing movement than when the target disappeared shortly after the hand movement had begun.

We then added auditory feedback of the pointing hand's position during the entire hand movement to evaluate whether human subjects could use such feedback. This additional auditory feedback, termed an auditory avatar (by analogy with avatars used to visually represent a part of a participant's body in a virtual environment), was used to evaluate whether it would constitute stable and relevant information to guide the motor action of the user, as already suggested by recent results indicating that auditory information is used to control motor adaptation (Oscari et al., 2012). With such auditory feedback, the auditory modality conveys supplementary sensory information that is correlated with proprioception and processed in the same spatio-temporal reference frame as the target, hence facilitating precision in the pointing task. A well-designed auditory avatar, which corresponds to a sonification transforming relevant parameters of human movement into appropriate sound, could be used to enhance perception accuracy and would be useful for sensory substitution and motor-training technologies.

The first auditory avatar condition was contrasted with a shifted condition in which the heard hand position did not correspond to the actual hand position, thus creating a discrepancy between auditory and proprioceptive information. A similar methodology can be found in Forma et al. (2011), where participants were asked to point to virtual targets in a spatialized audio environment using the OpenAL library (an audio environment based on interaural time and level differences). Studying online adaptation to this sensory conflict was expected to provide further information about the contribution of auditory inputs generated by arm movements to motor control.

# **MATERIALS AND METHODS**

## **SUBJECTS**

Twenty-four self-reported right-handed volunteers (12 females and 12 males; 25.6 ± 6.6 years old) participated in the experiment. All were healthy and had normal hearing. The study was carried out in accordance with the Declaration of Helsinki. All subjects gave written informed consent and were paid for their time.

# **EXPERIMENTAL SETUP**

The experiment used real-time controlled virtual audio rendering for both representing sound sources at the target positions in space and attaching sounds to the subject's right hand during the pointing movement. Audio was played back over headphones and subjects were seated in front of a table from which the auditory targets virtually originated. To prevent any visual input interference during the experiment all subjects were blindfolded.

The stimuli for target sources and the auditory avatar were (mutually uncorrelated) white Gaussian noise signals. The virtual audio targets as well as the auditory feedback of the hand position were provided with the Head-Related Transfer Function (HRTF) binaural technique (Wightman and Kistler, 1989a,b). Spat∼, IRCAM's software for real-time sound source spatialization, was used to create the binaural signals. Binaural rendering uses HRTFs to reproduce the sound pressure at the ear entrance that corresponds to a sound source at a given position in three-dimensional space. Processing a monophonic audio signal with a set of HRTF filters and playing these signals back over headphones creates the illusion of a virtual sound source at the corresponding position in space. The spatialization of the sounds (stimuli and hand position) was calculated in real-time through the tracking of the head's and right hand's positions and orientations using a six-camera Optitrack (by Natural Point) 3-D infrared motion capture system. To this end, two rigid sets of markers were placed on the headphones and the right hand's forefinger; they were composed of seven and four reflective markers, respectively, tracked by the cameras. The coordinates of the hand's and head's locations in space were measured and recorded with the tracking system at a sampling frequency of 100 Hz. The minimal latency of the overall system is then 10 ms, with an audio latency of 0.6 ms, which is fast enough to ensure perceptive coherence when localizing virtual sound sources (Brungart et al., 2004). The orientation of the 7-marker rigid body fixed to the headphones allowed for computing the heading direction (0◦ is forward, positive is to the right, see **Figure 1**). The endpoint used to measure the kinematics of the hand corresponded to the tip of the index finger.
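The binaural rendering principle described above (filtering one monophonic signal with a per-ear impulse response and playing the pair back over headphones) can be sketched as follows. The two impulse responses here are toy stand-ins, not measured HRTF data: the right-ear response is delayed and attenuated, mimicking the interaural time and level differences of a source on the listener's left.

```python
def convolve(signal, kernel):
    """Direct-form FIR convolution (full output length)."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out

def binaural_render(mono, hrir_left, hrir_right):
    """Filter one mono signal with per-ear impulse responses to
    obtain the left/right headphone feeds."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)

# Toy head-related impulse responses for a source on the left:
# the right ear receives the sound two samples later (interaural
# time difference) and attenuated (interaural level difference).
hrir_l = [1.0, 0.3]
hrir_r = [0.0, 0.0, 0.5, 0.15]
left, right = binaural_render([1.0, 0.0, 0.0, 0.0], hrir_l, hrir_r)
```

In a real system such as Spat∼, the filter pair is selected (and interpolated) per audio block according to the tracked source-to-head direction, which is what ties the 100 Hz motion capture stream to the audio rendering.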

**FIGURE 1 | View of the experimental set up, protractor on the table (0◦ axis straight ahead) and optical markers of the Optitrack 3-D motion capture system on the head (attached to the headphones) and right hand of the subject.** Note the positive/negative angles reference.

# **EXPERIMENTAL PROCEDURE**

The experiment lasted 1 h and was composed of pre-trials and 4 sessions. The pre-trials aimed at selecting the best-fitting HRTF from a set of several HRTFs; this best-fitting HRTF was then used to convolve the stimuli of the main experiment. While hearing the spatialized targets, subjects tested HRTFs previously selected in past HRTF-fitting experiments [see Sarlat et al. (2006) for a description of the method], plus their individual HRTFs when available. Up to four functions were tested. Approximately 10 practice trials per tested HRTF were performed in a pseudo-random order using the five targets of the experiment. Subjects were asked whether they heard a spatialized sound and, if so, to point toward its direction. An HRTF was selected if the subject pointed toward the correct direction (±10◦ approximately) in at least 8 trials. The five subjects who did the pre-trials with their own HRTFs used them; the other subjects used the non-individual HRTFs they had selected during the pre-test.

Each session tested a different condition. In the short sound condition (named A) the auditory target was played for 250 ms before subjects pointed toward it. In the long sound condition (B) the auditory target was played for 2000 ms and subjects pointed toward it while hearing the auditory stimulus. Two other sessions included the auditory avatar providing auditory feedback of the position of the hand in space. The fingertip position was dynamically tracked in real-time with the motion capture system and controlled the sound spatialization, so that the white Gaussian noise stimulus was perceived as coming from the hand position. In these sessions the target was played for 250 ms and the avatar was heard continuously. In the "avatar condition" (C) the actual hand position was heard, and in the "conflicting avatar condition" (D) the rendered hand position was shifted 18.5◦ to the left of the real hand position. Before each session, the subjects did a few trials to get used to the task demands and to the auditory feedback. The subjects were divided into 2 groups: group 1 performed the sessions in the regular order (A-B-C-D) and group 2 in the reverse order (D-C-B-A).

At the beginning of a trial, subjects were told to put their right hand on the table in front of them near their abdomen, with the palm at a position indicated by a tactile marker, and to hold their head upright, facing ahead, during the experiment. The auditory sources originated from a virtual distance of 60 cm in the horizontal plane of the table, centered on the tactile marker. The targets originated from five directions with azimuth angles of −35◦, −20◦, 0◦ (ahead of the subject), 20◦, and 35◦ (right is positive). Each session contained 32 trials presented in the same pseudo-random order for each subject. Moreover, the table on which the subjects pointed was covered with a semi-circular protractor whose origin was located at the starting hand position. It enabled a measurement in degrees of the pointing direction, as subjects were asked to keep their hand still for a few seconds after pointing. After each trial the subjects returned their hand to the tactile marker. The experimental setup from the subject's viewpoint is shown in **Figure 1**.

# **DATA ANALYSIS**

### **LEVEL OF PERFORMANCE**

The pointing direction was read directly from the protractor. The level of performance was evaluated by the signed angular error, i.e., the difference between the target direction and the final direction pointed by the subject: the error was negative if the subject pointed to the left of the target, and positive if the subject pointed to the right.
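The sign convention can be stated compactly; this is a minimal sketch of the error measure, with azimuths following the paper's convention (right positive):

```python
def signed_angular_error(target_deg, pointed_deg):
    """Signed pointing error in degrees: negative when the response
    falls to the left of the target, positive when to the right
    (azimuths increase to the right)."""
    return pointed_deg - target_deg

def absolute_angular_error(target_deg, pointed_deg):
    """Unsigned accuracy measure used for the performance analysis."""
    return abs(signed_angular_error(target_deg, pointed_deg))

err = signed_angular_error(20.0, 14.0)   # pointed left of a +20 deg target
```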

### **MOVEMENT ANALYSIS**

The raw hand and head position data were recorded and processed off-line for the analysis of hand and head movement kinematics. A semi-automatic method was designed to detect and segment each pointing gesture and to discard the return movement to the starting tactile marker. A primary segmentation was performed by applying thresholds to the hand displacement in the horizontal plane (*x, y*). Typical trajectories projected onto the horizontal plane are shown in **Figure 2** for each condition.

The second segmentation step was based on a systematic analysis of the movement kinematics. To compute velocity, acceleration and jerk, position data were filtered with a Gaussian low-pass filter with a cut-off frequency of 5 Hz. As the movement is captured along the three dimensions of space, the computed values are 3-dimensional, energy-related quantities: *v3D*, *a3D*, and *j3D* are, respectively, the norms of the tangential velocity, acceleration and jerk vectors. The beginning and end of the movement were defined as the crossings of a threshold on *v3D* corresponding to 3% of the peak velocity of the trajectory. The "beginning" of the gesture is thus related to the energy of the movement. Typical velocity and acceleration profiles obtained for one pointing gesture are plotted in **Figure 3**.
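The smoothing-and-thresholding pipeline described above can be sketched as follows. This is a stdlib-only illustration, not the authors' code: the kernel width corresponding to the 5 Hz cutoff at 100 Hz sampling, the edge handling, and the finite-difference derivative are all assumptions.

```python
import math

def gaussian_kernel(sigma_samples, truncate=3.0):
    """Discrete Gaussian smoothing kernel, normalized to unit sum."""
    half = int(truncate * sigma_samples + 0.5)
    k = [math.exp(-0.5 * (i / sigma_samples) ** 2)
         for i in range(-half, half + 1)]
    s = sum(k)
    return [v / s for v in k]

def smooth(x, kernel):
    """Convolve with edge replication; output keeps the input length."""
    half, n = len(kernel) // 2, len(x)
    return [sum(w * x[min(max(i + j - half, 0), n - 1)]
                for j, w in enumerate(kernel)) for i in range(n)]

def speed_3d(pos, dt):
    """Norms of finite-difference velocity vectors from (x, y, z) samples."""
    return [math.dist(p0, p1) / dt for p0, p1 in zip(pos, pos[1:])]

def segment(speed, frac=0.03):
    """Onset/offset indices: first and last samples at or above
    frac * peak speed (3% of peak velocity in the paper)."""
    thr = frac * max(speed)
    above = [i for i, s in enumerate(speed) if s >= thr]
    return above[0], above[-1]
```

Applied to the tracked fingertip trajectory, `segment(smooth(speed_3d(positions, 0.01), gaussian_kernel(sigma)))` would return the start and end frames of one pointing gesture.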

Additionally, the kinematic analysis included the following measures for hand and head movement: movement duration, peak velocity value, average velocity, acceleration peak analysis (occurrence and position), and trajectory length in space. We counted the total number of acceleration peaks occurring before and after the maximum velocity peak of the movement (peak velocity point, PVP).

**FIGURE 2 | Typical trajectories of the tracked hand for a single subject for each of the four conditions tested: short sound condition (A), long sound condition (B), avatar condition (C), and conflicting avatar condition (D).** Better pointing precision and reduced overshooting is noticeable in condition **(B)**.
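The peak-counting measure can be sketched as follows; the strict local-maximum definition of an acceleration peak is an assumption, since the paper does not specify its peak detector.

```python
def local_maxima(x):
    """Indices where a sample strictly exceeds both neighbors."""
    return [i for i in range(1, len(x) - 1) if x[i - 1] < x[i] > x[i + 1]]

def peaks_around_pvp(speed, accel):
    """Count acceleration peaks before vs. after the peak velocity
    point (PVP), i.e., the index of maximum tangential speed."""
    pvp = max(range(len(speed)), key=speed.__getitem__)
    peaks = local_maxima(accel)
    before = sum(1 for i in peaks if i < pvp)
    return before, len(peaks) - before
```

Peaks in the deceleration phase (after the PVP) are the ones the authors interpret as signatures of iterative online corrections.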

In order to investigate the possible role of the head in sound localization before and during pointing to the estimated location of the source, we also measured the heading angle around the vertical axis and computed its maximum values and range of motion (ROM).

# **RESULTS**

### **STATISTICAL ANALYSIS**

The results of six participants were removed from the analysis based on three criteria: (i) not following the instruction to point directly toward the target, i.e., trajectory durations more than twice the average and longer than the longer stimulus duration in the long condition (three subjects); (ii) trajectories showing no dependence on target direction, with only two endpoints at ±90◦ (two subjects); (iii) trajectories too short (less than 10 cm) to yield stable angular calculations (one subject).

The dependent variables considered in our statistical analysis (ANOVA) were the measures (duration, maximum velocity, average velocity, etc.) averaged over each target direction and each condition. We considered two grouping factors: a two-level HRTF factor indicating whether the subject used their own HRTF, and a two-level group factor indicating the order of presentation of the experimental conditions. We also considered two repeated-measures factors: a five-level target direction factor corresponding to the direction of the target, and a four-level condition factor indicating the experimental condition of each trial (A-B-C-D).
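As an illustration of the repeated-measures logic (not the authors' full design with two grouping and two within-subject factors), the F test for a single within-subject factor such as condition can be computed by partitioning the total sum of squares into condition, subject, and residual components:

```python
def rm_anova(table):
    """One-way repeated-measures ANOVA F test.

    table[i][j]: averaged measure for subject i under level j of a
    within-subject factor (e.g., the four conditions A-B-C-D).
    Returns (F, df_effect, df_error)."""
    n, k = len(table), len(table[0])          # subjects, factor levels
    grand = sum(map(sum, table)) / (n * k)
    cond_means = [sum(row[j] for row in table) / n for j in range(k)]
    subj_means = [sum(row) / k for row in table]
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_err = ss_total - ss_cond - ss_subj     # residual after removing subject effect
    df_cond, df_err = k - 1, (k - 1) * (n - 1)
    return (ss_cond / df_cond) / (ss_err / df_err), df_cond, df_err
```

Removing the between-subject sum of squares from the error term is what distinguishes this from a between-groups ANOVA and matches the paper's F(3, 51)-style degrees of freedom (k − 1 and (k − 1)(n − 1)).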

Statistical analysis showed no main effect of the group factor: the order of the conditions affected neither the pointing performance nor the dynamical control of the gestures. There was a main effect of the individualized HRTF only on the proportion of acceleration peaks of the head after the PVP [*F*(1, 16) = 5.8, *p* < 0.05]. However, the average peak number was not significantly different between the two levels of the HRTF factor (*post-hoc* Bonferroni test). It is important to note that the individualized HRTF factor had no effect on the measures related to hand movement. The group factor and the individualized HRTF factor were therefore not used further in the analysis, and data were averaged per factor.

### **LEVEL OF PERFORMANCE**

There was a main effect of both the condition factor and the target factor on the absolute value of the angular error [*F*(3, 51) = 6.23, *p* < 0.005 and *F*(4, 68) = 5.80, *p* < 0.001, respectively]. Subjects were significantly more accurate in the long sound condition B (see **Figure 4** top, which shows the absolute pointing error for the different conditions and the results of the *post-hoc* Bonferroni test; error bars indicate the 95% confidence interval). Furthermore, there was a significant interaction between the two factors [*F*(12, 204) = 1.91, *p* < 0.05].

We also analysed the signed angular error, whose sign indicates whether subjects pointed to the left or to the right of the target direction. There was a main effect of the condition factor [*F*(3, 51) = 2.84, *p* < 0.05] and of the target direction factor [*F*(4, 68) = 20.34, *p* < 0.0001], and a significant interaction between them [*F*(12, 204) = 6.13, *p* < 0.0001]. Subjects overestimated the targets' azimuths (see **Figure 4** bottom, which shows the signed pointing error per target direction across the conditions tested): left targets were pointed at with negative errors, and right targets with positive errors. This overshooting was reduced in the B condition: −66%, −40%, −48%, −94%, and −90% for targets from left to right, compared with the maximum errors in the other conditions. Note, however, that subjects still presented a 9.8◦ average leftward bias when the target was presented straight ahead in the B condition.

### **GLOBAL KINEMATICS**

The parameters associated with movement velocity were significantly influenced only by the target direction [*F*(4, 68) = 8.66, *p* = 0.00001 for the duration; *F*(4, 68) = 81.59, *p* < 0.00001 for the peak velocity; and *F*(4, 68) = 72.31, *p* < 0.00001 for the average velocity]. Peak and average velocities were significantly higher for target sounds coming from the right (i.e., for +20◦ and +35◦): +37% for peak velocity and +31% for average velocity (*post-hoc* Bonferroni test, *p* < 0.0001). The same test revealed no exploitable difference between the five target directions regarding movement duration.

The condition factor, the target direction factor, and their interaction had a significant effect on trajectory length [*F*(3, 51) = 5.47, *p* < 0.005; *F*(4, 68) = 47.03, *p* < 0.0001; and *F*(12, 204) = 2.58, *p* < 0.005, respectively]. The analysis showed a significantly longer distance covered for targets on the right (0.473 m at +20◦ and 0.510 m at +35◦, against 0.414 m averaged over the three other targets, *p* < 0.005), and also in the B condition (0.482 m against 0.432 m on average, *post-hoc* Bonferroni test *p* < 0.05; see **Figure 5**).

### **MOVEMENT DYNAMICS AND SEGMENTATION**

The counting of acceleration peaks revealed a significant effect of the condition factor [*F*(3, 51) = 3.04, *p* < 0.05] and the target direction factor [*F*(4, 68) = 30.93, *p* < 0.00001] on the total number of peaks, and on the proportion of peaks before reaching the PVP [*F*(3, 51) = 3.34, *p* < 0.05 and *F*(4, 68) = 36.97, *p* < 0.00001].

In the B condition, subjects' movements presented a larger total number of acceleration peaks, although the difference from the other conditions was not significant (4.85 against 4.20 on average).

The number of peaks decreased as the target direction shifted to the subjects' right (significantly for the two rightmost targets, *post-hoc* Bonferroni test *p* < 0.0005: 3.89 and 3.70 against 4.73 on average). Only the target direction factor had an effect on the proportion of peaks after the PVP [*F*(4, 68) = 10.68, *p* < 0.00001], significantly different for the +20◦ and +35◦ targets (−18% at +20◦ and −22% at +35◦ on average), while the effect of the condition factor was only marginally significant [*F*(3, 51) = 2.47, *p* = 0.07].

Notably, subjects produced movements with more acceleration peaks in the second "half" of the trajectory, during the deceleration phase: 1.52 peaks before the PVP against 2.84 after, on average. When the proportion of peaks before or after the PVP was taken as a factor together with the condition factor, the increase of peaks after the PVP was significantly higher for condition B than for conditions A and C (*post-hoc* Bonferroni test *p* < 0.01).

### **HEAD MOVEMENT ANALYSIS**

The same analysis was conducted on the head movement data. The target direction factor had a significant effect on the total number of acceleration peaks of the head movement [*F*(4, 68) = 5.75, *p* < 0.005], with the same tendency toward right directions as for the hand (7.14 peaks for −35◦, 6.53 for +35◦). No significant effect was found on the proportion of acceleration peaks before the PVP. After this point, both the target direction and condition factors had significant effects [*F*(4, 68) = 4.97, *p* < 0.005 and *F*(3, 51) = 6.93, *p* < 0.001, respectively], again with the same behavior as for the hand. The B condition exhibited a significantly larger number of peaks after the PVP than the other conditions (+50% for B on average, *post-hoc* Bonferroni test *p* < 0.05), and the center and right targets exhibited fewer acceleration peaks (−17% on average).

**FIGURE 6 | Top: range of motion of the heading angle (in degrees) for each condition.** Effect significance: [*F* = 20.2, *p* < 0.0001]; *post-hoc* Bonferroni test. **Bottom: final heading angle (in degrees) for each target direction and each condition.** Interaction effect significance: *F* = 14.9, *p* < 0.00001. Bars indicate the 95% confidence interval.

Both the condition and target direction factors had a significant effect on the ROM of the heading angle [*F*(3, 51) = 20.2, *p* < 0.0001 and *F*(4, 68) = 3.93, *p* < 0.01, respectively], and there was a significant interaction between the two factors [*F*(12, 204) = 2.40, *p* < 0.01]. The ROM of the heading angle was significantly higher in the B condition than in the other conditions (21.9◦ against 5.31◦, 7.42◦, and 5.17◦ for the A, C, and D conditions, *post-hoc* Bonferroni test), as shown in **Figure 6** top. No significant difference was found among the target directions, but the ROM increased with target eccentricity (+45% on the left and +23% on the right on average, compared with the 0◦ target).

To investigate the potential link between target direction and head rotation for localization when pointing, we analysed the distribution of the heading angles at the end of the movement. As for the ROM, the condition factor, the target factor, and their interaction had an effect on this angle [*F*(3, 51) = 5.07, *p* < 0.005; *F*(4, 68) = 9.17, *p* = 0.00001; and *F*(12, 204) = 14.9, *p* < 0.00001, respectively]. Significant differences were found for the two right targets compared with the left targets (*p* < 0.01): subjects turned their head toward the hemispace corresponding to the target direction. When coupling the effects of condition and target direction, we found that this behavior prevailed under condition B (see **Figure 6** bottom). The two graphs in **Figure 6** show that subjects moved their head more under condition B, and in the direction of the target. The bias for the 0◦ target was also reduced under this condition: 0.30◦, compared with 6.66◦ for A, 3.06◦ for C, and 2.18◦ for D.

The analysis of the relative position of the PVP along the movement, for the hand and for the heading angle, shows that subjects tended to initiate the movement of the head before the pointing movement. The distribution of these relative positions is shown in **Figure 7** for every trial of every subject in each condition. On average, 43% of the gestures exhibited peak heading velocity within the first third of the movement, against only 12% for the hand. This tendency was observed in all conditions, despite the large between-condition differences in heading ROM and final angle.

# **DISCUSSION AND CONCLUSION**

In this study, we attempted to address the mechanisms whereby an auditory input is transformed into a motor command. First, we aimed at assessing the role of auditory information about target position in correcting the trajectory of the hand by varying the duration of the target presentation. Second, we attempted to evaluate whether human subjects could use an auditory feedback about their hand position and how they would react to shifts in this avatar of their heard hand position.

Only the long sound target condition exhibited a higher level of performance. This strong effect is comparable to the one obtained during pointing movements toward visual targets present throughout the entire pointing movement (Prablanc et al., 1986). In the present study, the target was presented during the whole movement only in the long sound duration condition (B). In the short sound duration condition, the location of the target needs to be memorized, and it is possible that a shorter sound leads to a less precise or reliable representation of the target. Errors in pointing to remembered, visually presented targets have been shown to depend on the delay between target offset and pointing (McIntyre et al., 1998). The neural processes involved in coding the target in a motor-related or body-related reference frame from its auditory spatial trace therefore seem to require a sufficiently long auditory stimulation. On the other hand, one can assume that a comparison of auditory information about target position with proprioceptive information is required to update or refresh an internal representation of the goal in order to drive the pointing hand optimally.

In addition to better performance and precision (reduced bias for 0◦ target), subjects presented longer trajectories in the longer sound condition and slightly more acceleration peaks. The proportion of acceleration peaks in the deceleration part of the movement also increased in this condition. These results show that the improvement of precision in this condition may not only be due to better memorization of the target but also to the possibility to make online corrections of the hand trajectory. The use of auditory information about target direction as a feedback for guiding the reaching movement is likely since the kinematics showed indices of iterative corrections in condition B (in particular, increased length of the trajectory and increased number of peaks after PVP). These online corrections can be produced only if a neural process is able to use the auditory estimation of the target position and to make it available continuously to the sensorimotor process that drives the hand. Therefore, a sound still heard at the end of the pointing movement as in condition B would allow a more efficient updating of the goal representation in relation to the hand's position and thus a more accurate movement.

# **CONTRIBUTION OF THE AUDITORY AVATAR**

As demonstrated in Oscari et al. (2012), hand trajectory can be controlled and optimized with auditory feedback. Here, however, the directional accuracy of pointing was no greater with auditory feedback of the hand position than without it (comparison of conditions A and C). Furthermore, in condition D the auditory feedback of hand position was shifted by 18.5◦ perpendicularly to the main movement direction. Following the shift, the hand trajectory was expected to deviate from that produced in the condition without the shift. The analysis showed no significant effect of the resulting discrepancy between auditory and proprioceptive information about hand position on pointing accuracy. It is possible that the levels of performance in all conditions but the long target condition were impeded by an inaccurate representation of the target relative to the body, and that this large inaccuracy masks a small effect of the hand auditory feedback. Indeed, in the short sound condition with no avatar (A), the mean absolute pointing error was 26◦, higher than the shift used with the avatar in condition D.

In the avatar conditions, the proprioceptive modality might also have dominated the overloaded auditory modality, hence the importance of the design of such feedback, as shown in Rosati et al. (2012). In their study, the authors compared the contribution of different sound feedback schemes to performance in a manual tracking task, and their interaction with visual feedback. They observed that sound feedback can be counterproductive depending on the task and on the mapping between gesture and sound. In our experiment the same sound was used for the targets and for the hand feedback. This might have confused the subjects when localizing the target, and it raises the question of whether spatial auditory information about limb position is sufficient to provide efficient feedback for a motor action. Different parameters of the motor action might indeed need to be sonified (for instance kinematics rather than position in space). It is therefore important to identify the appropriate parameters for auditory-motor mapping before useful information can be provided for rehabilitation and sensory substitution devices.

## **HEAD MOVEMENTS**

The analysis of final head orientation showed that in the B condition heading automatically accompanied the auditory-manual pointing task, despite the explicit instruction to avoid head movements. Thus, head rotations were only present when sufficient localization cues were available, and the heading direction was consistently related to target direction and eccentricity. A first hypothesis is that in all the other conditions tested, the auditory target was too short to provide enough information to elicit head movements. However, since heading direction and pointing direction are clearly related in condition B (see **Figure 6**), one can also propose that the long sound allows an orienting movement of the head toward the auditory target, and that the final angle of this orienting movement could guide the pointing movement of the hand. The fact that the head tended to reach its maximum heading velocity before the hand's PVP in all conditions (see **Figure 7**) shows that early head movement alone did not improve performance in condition B; it did so only together with a larger ROM and heading toward the target.

In general, heading movements belong to the automatic orienting reactions that have mainly been studied in the framework of gaze orienting behavior (Guitton, 1992). Here, in blindfolded subjects, we can assume that heading also aims at optimizing the binaural perception of the direction of the acoustic stimulation. The auditory system certainly relies on head motor information to build representations of the location of auditory targets; unfortunately, however, sound localization is mainly studied with the head fixed. Nevertheless, several studies have used head orientation to quantify the ability of participants to indicate the perceived direction of a natural acoustic stimulation (Perrott et al., 1987; Makous and Middlebrooks, 1990; Pinek and Brouchon, 1992). These studies demonstrated that the direction indicated by the head was underestimated (∼10◦). We obtained similar results despite different experimental conditions (voluntary head pointing vs. automatic orienting reaction). Orienting reactions and voluntary heading to natural acoustic stimulation were observed with relatively short stimuli (500 ms) in Goossens and Van Opstal (1999). In contrast, in our experiment with HRTF spatial rendering, heading toward the target was rarely observed with short sound stimuli. Goossens and Van Opstal suggested, however, that head movements could provide spatial information about sufficiently rich and long sounds, which the auditory system would use to update the internal representation of the target. Our results indeed suggest that the accuracy of pointing to long stimuli could be due to the contribution of heading toward the target, which provides a more accurate frame of reference for the anticipated control of pointing. This does not, however, exclude a direct role of the ongoing presentation of the acoustic target.

### **TARGET DIRECTIONS AND CHARACTERISTICS OF MOVEMENTS**

The estimated directions of the targets are characterized by a perceived space wider than the real one. This was also observed with hand pointing toward "natural" sounds produced by loudspeakers (Pinek and Brouchon, 1992). However, the effect was much larger in our study than with natural sounds (less than 10◦ for Pinek and Brouchon); this could originate in the use of non-individual HRTFs, in which the interaural differences are not adapted to the geometry of the subject's head. The leftward bias observed for straight-ahead targets could result from a pseudo-neglect effect favoring the left hemispace, similar to the pseudo-neglect effect observed in vision (Sosa et al., 2010).

The left/right asymmetry observed in the trajectory kinematics can be explained by this effect as well. Indeed, average and peak velocities increased for targets on the right, with no effect of condition. Together with the longer distances covered and the smaller number of acceleration peaks, this effect might reflect variations in the control parameters of the movements between the two hemispaces. The leftward bias observed for the 0◦ target sound supports this hypothesis. Nevertheless, considering the starting position of the task, with the palm placed at the center of the set-up, these results could also be explained by the subjects' greater ease in pointing to the right with their right hand.

### **MODULARITY**

This study also addresses the question of the cooperation between the different modular neural processes involved in the multisensory and motor representations of targets in goal-directed movements. Do these different processes share a global amodal spatial representation (e.g., Pouget et al., 2002), or do they each have their own dedicated spatial representation? Visual and auditory modules certainly use very different reference frames. Sounds are localized thanks to spectral and binaural cues naturally linked to a head-centered frame of reference, whereas visual positions are primarily coded in an eye-centered reference frame. In addition, the visual system is retinotopic, whereas the auditory system is characterized by broad tuning and a lack of topographical organization (Maier and Groh, 2009).

The question of modularity in motor control arises when we consider the coordination between head orienting movements and hand movements. In the longer sound condition, the auditory stimulation is long enough to trigger head rotations. Since the amount of head rotation is related to the participants' responses, the two processes must have a way to share common information. This suggests that the heading direction is coded in a body-centered reference frame and can be used directly by the reaching motor command, which shares the same reference frame.

To conclude, it is known that sound localization requires the integration of multisensory information and the processing of self-generated movements: a stable representation of an auditory source has to be based on acoustic inputs and their relation to motor states (Aytekin et al., 2008). Our results highlight that auditory representations extracted from a sound signal can be transformed online into a sequence of motor commands for coordinated action, underlining the role of the auditory-motor loop in spatial processing.

# **ACKNOWLEDGMENTS**

We wish to thank the reviewers for their constructive comments on previous versions of the manuscript. This work was supported in part by the ANR (French National Research Agency) Legos project (11 BS02 012) and by the CNRS MI (Mission Interdisciplinarité) DEFI-SENS. It has been accomplished within the Laboratory of Excellence SMART, supported by French state funds managed by the ANR within the Investissements d'Avenir program under reference ANR-11-IDEX-0004-02.




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 21 December 2012; accepted: 15 March 2013; published online: 22 April 2013.*

*Citation: Boyer EO, Babayan BM, Bevilacqua F, Noisternig M, Warusfel O, Roby-Brami A, Hanneton S and Viaud-Delmon I (2013) From ear to hand: the role of the auditory-motor loop in pointing to an auditory source. Front. Comput. Neurosci. 7:26. doi: 10.3389/fncom.2013.00026*

*Copyright © 2013 Boyer, Babayan, Bevilacqua, Noisternig, Warusfel, Roby-Brami, Hanneton and Viaud-Delmon. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

# Spatio-temporal analysis reveals active control of both task-relevant and task-irrelevant variables

#### *Kornelius Rácz <sup>1</sup> and Francisco J. Valero-Cuevas <sup>2</sup> \**

*<sup>1</sup> Department of Biomedical Engineering, and Neuroscience Graduate Program, University of Southern California, Los Angeles, CA, USA*

*<sup>2</sup> Department of Biomedical Engineering, Division of Biokinesiology and Physical Therapy, University of Southern California, Los Angeles, CA, USA*

### *Edited by:*

*Tamar Flash, Weizmann Institute, Israel*

### *Reviewed by:*

*Mark Latash, The Pennsylvania State University, USA*

*Jonathan B. Dingwell, University of Texas at Austin, USA*

### *\*Correspondence:*

*Francisco J. Valero-Cuevas, Department of Biomedical Engineering, Division of Biokinesiology and Physical Therapy, University of Southern California, 3710 McClintock Ave., RTH 404, Los Angeles, CA 90089-2905, USA e-mail: valero@usc.edu*

The Uncontrolled Manifold (UCM) hypothesis and Minimal Intervention principle propose that the observed differential variability across task-relevant (i.e., task goals) vs. task-irrelevant (i.e., in the null space of those goals) variables is evidence of a separation of task variables for efficient neural control, ranked by their respective variabilities (sometimes referred to as hierarchy of control). Support for this comes from spatial-domain analyses of the structure of kinematic, kinetic, and EMG variability. While proponents admit the possibility of *preferential* as opposed to strictly *uncontrolled* variables, such distinctions have only begun to be quantified or considered in the temporal domain when inferring control action. Here we extend the study of task variability during tripod static grasp to the temporal domain by applying diffusion analysis. We show that both task-relevant and task-irrelevant parameters show corrective action at some time scales; and conversely, that task-relevant parameters do not show corrective action at other time scales. That is, the spatial fluctuations of fingertip forces show, as expected, greater ranges of variability in task-irrelevant variables (*>*98% associated with changes in total grasp force; vs. only *<*2% in task-relevant changes associated with acceleration of the object). At some time scales, however, temporal fluctuations of task-irrelevant variables exhibit negative correlations clearly indicative of corrective action (scaling exponents *<*0.5); and temporal fluctuations of task-relevant variables exhibit neutral and positive correlations clearly indicative of absence of corrective action (scaling exponents ≥0.5). 
In agreement with recent work in other behavioral contexts, these results suggest that we revise our understanding of variability vis-à-vis task relevance by considering both spatial and temporal features of all task variables when inferring control action and understanding how the CNS confronts task redundancy. Instead of a dichotomy of presence vs. absence of control, we should speak of a continuum of weaker to stronger—and potentially different—control strategies in specific spatiotemporal domains, indicated here by the magnitude of deviation from the 0.5 scaling exponent. Moreover, these results are counterexamples to the UCM hypothesis and the Minimal Intervention principle, and the similar nature of control actions across time scales in both task-relevant and task-irrelevant spaces points to a level of modularity not previously recognized.

### **Keywords: motor control theory, redundancy, manipulation, dimensionality reduction, spatiotemporal dynamics**

# **INTRODUCTION**

Redundancy, and the variability it allows, has traditionally been viewed as the central problem of motor control research (Bernstein, 1967), which can be studied at a variety of levels (e.g., task, muscle, or goal redundancy). Here, we understand the term *task redundancy* to be the availability of infinitely many different *mechanical actions* by the neuromuscular system that can accomplish a given motor task. The totality of these mechanical actions forms the goal-equivalent manifold, a term coined in John and Cusumano (2007). This differs from *muscle redundancy*, which refers to the multitude of *muscle coordination patterns* producing the same mechanical action (Kutch and Valero-Cuevas, 2011). Multifinger static grasp has been studied extensively because it is a good example of task redundancy (Santello and Soechting, 2000; Latash and Zatsiorsky, 2009; Park et al., 2010; Rácz et al., 2012) since using *n* fingertips to satisfy static force and torque equilibrium of the object grasped is underconstrained (i.e., one can, for instance, squeeze an object harder without translating or rotating it). For multifinger grasp, the redundant task space of all applicable forces for static grasp can be mathematically separated into two mutually orthogonal subspaces: force variability that has no effect on static equilibrium (e.g., squeezing the object in static grasp) on the one hand, and force variability that disrupts static equilibrium (i.e., violates the task constraints) on the other. We and others refer to the former and latter subspaces as task-irrelevant (or null space) and task-relevant, respectively, as they indicate a distinction about where the controller is thought to place emphasis.

Proponents of the Uncontrolled Manifold (UCM) and Principle of Minimal Intervention hypotheses have suggested that, to simplify the control task, the nervous system only needs to identify and control the task-relevant subspace, and can disregard the task-irrelevant subspace (Scholz and Schoener, 1999; Scholz et al., 2002; Jordan, 2003; Valero-Cuevas et al., 2009; Latash et al., 2010). Compelling evidence for this comes from spatial domain analyses showing clear structure in the spatial variability of task variables. By *spatial variability* we mean the amplitude and range of the multidimensional task variables of fingertip or resultant forces. Researchers, including our group, have repeatedly shown that the spatial variability in task-irrelevant dimensions is relatively larger than in task-relevant dimensions (Scholz and Schoener, 1999) in analyses of kinematic (Tseng and Scholz, 2005), kinetic (Santello and Soechting, 2000), and EMG variability (Valero-Cuevas et al., 2009). In this context, larger spatial variability in a task dimension is assumed to imply less control effort (i.e., intervention) of those task variables that do not affect the successful performance of the task. In practice, however, even task-relevant dimensions will exhibit some variability because a certain amount is acceptable given, say, high contact friction, or unavoidable, given, say, sensory or motor noise, or neural delays. Conversely, task-irrelevant dimensions will also show some control action when, for instance, noise, delays or stochasticity drive the system across some boundary that requires intervention (e.g., Insperger, 2006; Milton et al., 2009b). Therefore, the relative magnitude of variability across task variables is not necessarily a robust predictor of task-relevance, control action or strategy (Valero-Cuevas et al., 2009; Dingwell et al., 2010). 
In fact, even proponents of the UCM hypothesis admit the possibility of *preferential* as opposed to a strict separation into clearly controlled and uncontrolled variables (Latash et al., 2010). Despite this qualification, we lack specific quantification and description of controlled intervention in both task-relevant and task-irrelevant spaces that would allow us to understand neural control strategies better.

### **SPATIAL VERSUS TEMPORAL VARIABILITY**

There is a growing emphasis to infer neural control strategies by supplementing spatial quantification of variance with temporal analyses. As described above, much more attention has been given to spatial variability. However, relatively little attention has been directed at the temporal structure of variability in task variables in the context of task redundancy (Valero-Cuevas et al., 2009; Dingwell et al., 2010; van Beers et al., 2013). By *temporal variability* we mean the time-varying features of the multidimensional task variables, e.g., fingertip or resultant forces in this case. Lest the reader think that time-varying actions during static force production or grasp are an oxymoron, others and we have shown that finger muscles and fingertip forces exhibit rich dynamics during static grasp (Santello and Soechting, 2000; Valero-Cuevas et al., 2009; Rácz et al., 2012). Because task-irrelevant variability is considered, and called, uncontrolled, the implicit and explicit assumption is that it exhibits the spatial and temporal properties of uncontrolled dynamical processes. In the anomalous diffusion literature, this is considered either a white noise process, consisting of uncorrelated samples, or Brownian motion, formed by the integration of the former (Ben-Avraham and Havlin, 2000; Kantz and Schreiber, 2004). In the context of neural control, we take it to mean the state of least control (i.e., truly uncontrolled, where the dynamics of the plant is not influenced by the controller). Conversely, a process that is controlled, continuously or intermittently (Collins and De Luca, 1994; Guckenheimer, 1995; Milton et al., 2009a; Suzuki et al., 2012), will exhibit the temporal properties of controlled dynamical processes, such as negative correlations between time samples (i.e., if a task variable moves in one direction, at some future time it will require a corrective action in the opposite direction). 
Please also note that the mechanical properties of the musculoskeletal plant act as filters on the neural input, and can give rise to correlations in the output. This is a limitation common to all studies of neural commands. Therefore, studying the force variability that naturally occurs in static grasp provides unique opportunities to reveal the time-varying nature of control actions without the confounding, or at least superimposed, effects of additional dynamics coming from other features of more dynamical tasks such as gait (Dingwell et al., 2010). By applying a combination of temporal and spatial analysis techniques to multifinger static grasp, we find that task-relevant and task-irrelevant variables are both subject to strong and weak control actions at different time scales. Therefore, these results provide evidence against the UCM hypothesis and the Minimal Intervention principle. We conclude that it is necessary to revisit and revise our understanding of variability vis-à-vis task relevance when inferring control action and understanding how the CNS confronts task redundancy.

# **METHODS**

We combine linear spatial approaches and non-linear temporal approaches to (1) quantify the spatio-temporal nature of the variability in both the task-relevant and task-irrelevant subspaces; (2) compare them to the mechanical predictions of necessary control actions for the task; and (3) evaluate them in light of the UCM hypothesis and Minimal Intervention principle. We selected the task of static tripod grasp because it is a common and useful redundant motor task, and a fundamental aspect of human manipulation (Yoshikawa and Nagai, 1991; Flanagan et al., 1999; Rácz et al., 2012).

# **DATA COLLECTION**

We asked 12 young, healthy and consenting subjects (ages 20–36, 6 males, 9 right-handed) to perform a static tripod grasp of an instrumented rigid object designed and built in our lab (**Figure 1**), whose use has been reported in Rácz et al. (2012). While performing the grasp, the thumb, index and middle finger were in contact with three ATI Nano17 6-axis force transducers (Apex, NC, USA) locked in a configuration comfortable for each subject. The angle between index and middle finger was approximately 30◦, while the angles formed with the thumb by each finger were approximately 165◦. Each force transducer was coated with a Teflon surface to reduce reliance on friction by the subjects to achieve a stable grasp. The force transducers were connected to a 16-bit National Instruments 6225 M-series data acquisition card (National Instruments, Austin, TX, USA). Attached to the object were three markers for motion capture, forming an equilateral triangle, whose plane was parallel to the grasp plane of the three fingertips. Seven motion capture cameras (Vicon, Oxford, UK) allowed us to measure the object's position and orientation to quantify how well the subject met the task goal of maintaining a simple static grasp.

Furthermore, three different weights (50, 100, and 200 g) were attached to the object from below (**Figure 1C**). Additionally, the latter half of the trials were performed with visual feedback presented to the subjects approximately 1 m away on a 23-inch computer screen. The visual feedback consisted of a horizontal target line representing the target sum of normal forces (in Newtons) applied by the three fingers, and a crosshair representing the actually applied sum of normal forces (**Figure 1D**). The goal in those trials with visual feedback was to align the horizontal component of the crosshair with the target line and keep the variability of force application minimal. The target force was the average sum of normal forces applied by subjects across all trials without visual feedback. In effect, the visual feedback added another task-relevant dimension to the task, besides keeping the grasp as static as possible.

Subjects performed all trials with their dominant hand, determined as per Oldfield (1971), as shown in Rácz et al. (2012). Subjects were seated in a chair, with the grasping hand resting on the chair's armrest (**Figure 1**). Moreover, we asked subjects to immobilize the wrist of their grasping hand by gripping the wrist with their non-dominant hand to minimize wrist rotation and hand translation, since we were interested in the coordination of fingertip forces for steady-state static grasp with as little motion as possible.

Subjects performed three repetitions of static grasp trials of 68 s duration for each weight and each visual condition, for a total of 18 trials per subject (3 × 3 × 2). The instructions to the subjects were to simply hold the object in a static tripod grasp with as little motion as possible, as in **Figure 1**. Even though the object was light (max. 260 g), we provided subjects with 1 min of rest to avoid fatigue or discomfort. Trials were block randomized: the different weights were attached in random order for each condition, but the nine trials with visual feedback were always performed after the ones without. This was because the target total grasp force line height was based on the self-selected average sum of normal forces for each weight in the non-visual condition. The individual experimental conditions are described in **Table 1**.

# **DATA PREPROCESSING**

The three-dimensional force data recorded by each transducer were sampled at 400 Hz, while the motion-capture marker positions were sampled at 200 Hz (both force and motion data collection were triggered synchronously). We removed the first 7 s and the last 1 s of each trial's time series to avoid transients. Next, we downsampled both the force and motion capture time series to 100 Hz to balance the need for temporal resolution against computational cost. Having performed the same analysis on a subset of the trials at the original sample rate, we found that the results were unaffected when the analysis was repeated at lower sample rates. Hence, 100 Hz was a useful compromise, as it still allows for a physiologically meaningful temporal resolution on the order of 10<sup>−1</sup> s.

As required by our temporal analysis (see below), we did not filter the data, to avoid creating artifactual correlations.
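As a minimal Python sketch, the trimming and decimation steps described above could look like the following (function and parameter names are ours, not from the original analysis code; decimation is done by simple subsampling because, as noted, no anti-aliasing filter was applied):

```python
import numpy as np

def preprocess(forces, fs=400, fs_out=100, trim_start=7.0, trim_end=1.0):
    """Trim transients and downsample a force time series.

    forces : (n_samples, n_channels) array sampled at fs Hz.
    Returns the trimmed series decimated to fs_out Hz. No filtering
    is applied, to avoid introducing artifactual temporal correlations.
    """
    a = int(trim_start * fs)               # drop the first 7 s
    b = len(forces) - int(trim_end * fs)   # drop the last 1 s
    trimmed = forces[a:b]
    step = fs // fs_out                    # 400 -> 100 Hz: keep every 4th sample
    return trimmed[::step]
```

For a 68 s trial at 400 Hz this yields 60 s of data at 100 Hz, i.e., 6000 samples per channel.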

# **DATA ANALYSIS—SPATIAL**

To analyze the spatial coordinated action among the three fingertip forces, we first performed principal component analysis (PCA) on the time series of each sensor's normal forces for each trial. PCA is a popular linear method for the estimation of spatial correlation structures in data (Clewley et al., 2008). Specifically, we computed the three principal components (PCs) of the 3 × 3

**Table 1 | Overview of the experimental conditions (number of trials in parentheses).**


*The instructions to the subjects were to simply hold the object in a static tripod grasp with as little motion as possible, as in Figure 1*

normal force covariance matrix. Each PC is a unit vector whose elements, called loadings, specify the multidimensional correlation among variables; a combination of PCs forms a basis defining a vector subspace that is a linear approximation to the spatial correlation structure in the data (Clewley et al., 2008). PCA has been commonly used to estimate effective degrees of freedom in motor systems and, in the context of the UCM hypothesis, to compute task-relevant and -irrelevant latent variable spaces, which are represented by the orthogonal PC vectors (e.g., Santello and Soechting, 2000). We then projected the 3-dimensional normal force time series (one normal force per force sensor) onto the three principal components. Following Rácz et al. (2012), we call the first, second, and third principal components the Grasp, Compensation, and Hinge Modes of this task (**Figure 2**), respectively. We also tested this same analysis on the full 3D force data (normal and two tangential force components per force sensor; see Discussion and **Figures 10**–**13**), but the results are unchanged from those using only the normal force component from each sensor, in particular because the magnitudes of the tangential force fluctuations were several orders of magnitude smaller than those of the normal forces (though not their mean levels, since vertical tangential components are required to sustain the weight of the object against gravity). Importantly, adding tangential forces to the analysis adds several task-relevant or task-irrelevant dimensions, which, however, does not affect the fundamental question or findings of this study, i.e., the implications of certain temporal dynamics for the study of control of task-relevant and -irrelevant dimensions.
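The covariance-based PCA and projection described above can be sketched as follows, assuming a 3 × T array of normal forces (thumb, index, middle); names are ours and the sketch is not the authors' actual analysis code:

```python
import numpy as np

def grasp_modes(normal_forces):
    """PCA of a (3, T) matrix of normal forces (thumb, index, middle).

    Returns (components, variance_fraction, projections):
    components[i] is the i-th PC loading vector (the Grasp,
    Compensation, and Hinge Modes in the tripod-grasp task),
    variance_fraction[i] the fraction of variance it explains, and
    projections[i] the time series projected onto that PC.
    """
    X = normal_forces - normal_forces.mean(axis=1, keepdims=True)
    cov = X @ X.T / X.shape[1]            # 3 x 3 normal force covariance
    evals, evecs = np.linalg.eigh(cov)    # eigh returns ascending order
    order = np.argsort(evals)[::-1]       # sort by descending variance
    evals, evecs = evals[order], evecs[:, order]
    projections = evecs.T @ X             # time series along each PC
    return evecs.T, evals / evals.sum(), projections
```

The rows of `components` correspond to the Mode loadings (e.g., a first PC near [0.81 0.41 0.41] for the Grasp Mode), and `projections` provides the per-Mode time series to which the temporal analysis below is applied.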

# **DATA ANALYSIS—TEMPORAL**

Next, we applied Detrended Fluctuation Analysis (DFA) to each projected time series (Kantelhardt et al., 2001) to detect *temporal correlations in non-stationary time series*. It has the advantage, in particular over the classical time-lagged autocorrelation function, that it can distinguish unwanted trends of arbitrary order that can give rise to spurious non-zero correlations, from

**FIGURE 2 | Illustration of the three Modes of normal forces associated with the principal components computed from the data and the simulations, across all subjects, and conditions [adapted from Rácz et al. (2012)].** Please note that the loadings are the unit vectors describing the multidimensional correlation defining each PC. Therefore the loadings for this PC show that the thumb, index and middle finger forces all co-vary in this Mode. We refer to these three PCs as: (i) the task-irrelevant *Grasp Mode*, along [0.81 0.41 0.41]*<sup>T</sup>* , as it reflects synchronous increases and decreases in the three normal forces, which are also known as grasp forces, (ii) the *Compensation*

*Mode*, along [0.0 −0.71 0.71]*<sup>T</sup>* , reflecting the out-of-phase opposition, or compensation, of thumb normal force by either the index or middle finger normal force, and (iii) the task-relevant *Hinge Mode*, along [0.6 −0.5 −0.5]*<sup>T</sup>* , reflecting an increase (decrease) in thumb normal force accompanied by a simultaneous decrease (increase) in the index and middle finger normal forces, which would typically occur if the object was accelerated by the thumb, thus violating the mechanical task requirements of static grasp (without loss of generality, the violation of static grasp by purely rotating the object using tangential forces is not considered here, see Discussion).

actual long-range correlations in non-stationary data. Examples of non-stationary data are time-series with trends that are long relative to the length of the time series or which exhibit clustering—mathematically speaking, data whose two-point autocorrelation is time-variant. DFA has been used extensively for the analysis of behavioral and physiological data (Hausdorff et al., 1996; Peng et al., 1998; Penzel et al., 2003). Mathematically, it quantifies the power-law increase of the root-mean square deviations from a trend in the time series fluctuations, once segments of increasing length *n* have been subtracted from it to remove trends of that length:

$$F(n) = \left[\frac{1}{L} \sum\_{j=1}^{L} \left(X\_j - (aj + b)\right)^2\right]^{\frac{1}{2}}$$

where *Xj* − (*aj* + *b*) represents the residuals of the linear fit *aj* + *b* to the time series segments *Xj* of length *n*. For a given segment length *n*, there are *L* overlapping segments in the process. The complete expression for *F(n)* represents the average root-mean-square deviation at segment length, or time scale, *n*. In a non-stationary process, this time scale is related to *F(n)* by the relationship

*F(n)* ∝ *n*<sup>α</sup>

This power-law increase in root-mean square deviation is mathematically linked to long-range temporal correlations in the data: negative correlations will, over time, lead to a smaller rate of increase than positive correlations. The scaling exponent α indicates the type of correlation, as well as the strength of the relationship between data increments separated by a time scale *n*. DFA reveals empirically (i.e., in a model-free way with minimal assumptions) the inherent time scales for which different temporal correlations exist in the data by showing if the scaling exponent α [i.e., the slope of the logarithmic plots of *n* vs. *F(n)*] differs at different time scales. These time scales are found based on regions of slope linearity in the logarithmic plots of *n* vs. *F(n)*, and thus regions of actual power-law scaling.

In particular, the scaling exponents α can be fit to the logarithmic plots of the time scales *n* vs. the *F(n)* (for an interpretation of these scaling exponents, see **Table 2**).
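The procedure can be sketched in Python as follows, using the standard DFA convention of first integrating the signal and applying first-order (linear) detrending; for brevity this sketch uses non-overlapping windows, whereas the text describes overlapping segments (names and simplifications are ours):

```python
import numpy as np

def dfa(x, scales):
    """Detrended Fluctuation Analysis with first-order detrending.

    x      : 1-D signal (e.g., forces projected onto one Mode).
    scales : iterable of window lengths n, in samples.
    Returns (F, alpha): the fluctuation function F(n) and the scaling
    exponent alpha fitted over the given scales.
    alpha < 0.5 -> anti-correlated increments (corrective action),
    alpha = 0.5 -> uncorrelated (random walk of the profile),
    alpha > 0.5 -> positively correlated (diffusive) increments.
    """
    y = np.cumsum(x - np.mean(x))                  # integrated profile
    F = []
    for n in scales:
        n_win = len(y) // n
        segs = y[:n_win * n].reshape(n_win, n)     # non-overlapping windows
        A = np.vstack([np.ones(n), np.arange(n)]).T    # linear design matrix
        coef, *_ = np.linalg.lstsq(A, segs.T, rcond=None)  # fit each window
        resid = segs.T - A @ coef                  # detrended residuals
        F.append(np.sqrt(np.mean(resid ** 2)))     # RMS deviation at scale n
    alpha = np.polyfit(np.log(scales), np.log(F), 1)[0]
    return np.array(F), alpha
```

In practice, as described above, the log–log plot of *n* vs. *F(n)* is inspected for regions of slope linearity, and a separate α is fit within each such region rather than across all scales at once.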

Because long-range negative correlations reflect corrective actions that prevent dissipation, they are interpreted as evidence for the workings of corrective and stabilizing (i.e., negative feedback) control, while positive correlations can be interpreted as evidence of a lack of corrections and thus a lack of stabilizing control actions (Collins and De Luca, 1993, 1994). (Please note that these notions are related, but not equivalent, to notions of stability, which are beyond the scope of this work because our static grasp task is stable). Recent work also supports the idea of interpreting scaling exponents as indicating the degree of control effort (Dingwell and Cusumano, 2010; Dingwell et al., 2010).

To further confirm the reliability of our results, we repeated the DFA on the first and second half of each trial to test if the structure of the variability in normal forces is sensitive to the level of total grasp force. We felt this to be necessary because, as is commonly reported in studies of static grasp (e.g., Johansson and Westling, 1987), we noticed that some trials exhibited a relaxation of the total grasp force, likely an adaptation to reduce fingertip forces over time to mitigate fatigue (see Results).

# **MODELING OF TRIPOD GRASP**

As in Rácz et al. (2012), we applied the same analysis methods to synthetic data generated by a simulation of the task. For a description of the model, see Appendix. In that model the variability in the simulated normal forces comes from our implementation of a standard Brownian random walk [see Appendix and Rácz et al. (2012) for details]. Analyzing data from a strictly mechanical simulation allows us to disambiguate features of mechanical origin from features of the control that cannot be explained by mechanics, and are therefore of likely neural origin [for other examples of this approach see Kutch and Valero-Cuevas (2012), Rácz et al. (2012), and Ristroph et al. (2010)].
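A minimal sketch of such a mechanically constrained random walk, using the Mode loadings quoted in this paper, is shown below; the implementation details (names, step size, baseline force) are our assumptions, and the authors' actual model is described in their Appendix:

```python
import numpy as np

def simulate_static_grasp(T=5000, step=0.01, f0=5.0, seed=0):
    """Synthetic normal forces for static tripod grasp.

    A Brownian random walk confined to the 2-D task-irrelevant plane
    spanned by the Grasp and Compensation Modes, added to a baseline
    grasp force f0. By construction, no variance is injected along the
    Hinge Mode, so the simulated forces never accelerate the object.
    Returns a (T, 3) array: thumb, index, middle normal forces.
    """
    rng = np.random.default_rng(seed)
    grasp = np.array([0.81, 0.41, 0.41])   # Grasp Mode loadings (text)
    comp = np.array([0.0, -0.71, 0.71])    # Compensation Mode loadings
    walk = np.cumsum(rng.normal(0.0, step, (T, 2)), axis=0)
    return f0 + walk[:, :1] * grasp + walk[:, 1:] * comp
```

Applying PCA to this synthetic data recovers the Grasp/Compensation plane, and the variance projected onto the Hinge Mode direction is negligible compared with the Grasp Mode, which is the mechanical baseline against which the experimental data are compared.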

# **RESULTS**

# **PRINCIPAL COMPONENT ANALYSIS OF SIMULATED NORMAL FORCES**

**Figure 3** shows the simulated normal forces plotted against each other, which shows that, by construction, the valid solutions populate a plane representing the constraints of the task. In agreement with our mechanical analysis (Rácz et al., 2012), PCA of the simulated data finds the two basis vectors (principal components, or PCs) describing that plane: the Grasp Mode [0.81 0.41 0.41]*<sup>T</sup>* and the Compensation Mode [0.0 −0.71 0.71]*T*, **Figure 2**.

Mechanically, the dynamics associated with the Compensation Mode reflects movement of the intersection point of the three force vectors, as shown in Yoshikawa and Nagai (1991) and Flanagan et al. (1999): as long as the force vectors, extended from their respective application points, intersect in one common point inside the object, there will be no moment exerted on the object. The only physical limitation is that the force vector extended from each fingertip stays within its friction cone. The Grasp Mode, in turn, quantifies changes in the total grasp force, which is equivalent to the intersection point not moving side-to-side on the manifold in **Figure 2**, but rather up-and-down, as the distance to the origin quantifies the total grasp force.

**Table 2 | Different scaling exponents found by linear fitting in the logarithmic displacement vs. time scale plot.**


These two PCs together explain all the normal force variance in the simulated data. In this idealized case, by construction once again, if the variability of normal fingertip forces exhibits this structure in steady-state static tripod grasp, then such variability will not give rise to accelerations or rotations of the grasped object and exists entirely in the null space of the task. Actual acceleration of the object is associated with variability of normal forces perpendicular to this plane, along the PC vector of the Hinge Mode [0.6 −0.5 −0.5]*T*.

### **PRINCIPAL COMPONENT ANALYSIS OF EXPERIMENTAL FORCES**

As expected, subjects met the task requirements of not dropping the object and holding it still, while still showing some variability in their normal forces and object movement. (Unlike the simulation, which enforces a hard constraint of minimum normal force, the subjects' minimum force is more flexible and can result in a downward trend in total grasp force.) The object markers (for motion capture) stayed well within 5 mm in all directions, and object motion was significantly affected by the presence of visual feedback, but not by weight (*p* < 0.01, Mann–Whitney U test). Given the mechanics of the task and the instructions to the subjects, the small but measurable linear accelerations of the object must be due to dynamics along the Hinge Mode (or, to a lesser extent, to the unmodeled vertical motion and 3D rotation modes, given that the wrist was held fixed).

We applied PCA to the time series of experimental normal forces (see **Figures 4**, **5** for representative trials for two different conditions) and, as expected from the mechanical requirements of the task (Rácz et al., 2012), we found that the variability of normal forces consistently exhibited a structure described by the three principal components found in the simulation.

In the case of no visual feedback, the Grasp Mode obtained from PCA explains approximately 90% of the normal force variance, the Compensation Mode approximately 5–10%, and the Hinge Mode 1–3% (**Figure 6**). In contrast, in trials with visual feedback the Grasp and Compensation Modes contribute roughly equally to the normal force variance, slightly less than 50% each (**Figure 6**), with 1–3% accounted for by the Hinge Mode. The low percentage of variance explained by the Hinge Mode in both cases shows that subjects were mindful of the request to perform static grasp, and satisfied the task requirement of not accelerating the object. Lastly, and not surprisingly, the Hinge Mode shows almost no variation over time given that the object was held relatively still, as confirmed by motion capture. **Figure 10** further shows that the variance explained by the three Modes remains unaffected even if we consider all three force components for each digit.

# **DETRENDED FLUCTUATION ANALYSIS OF TIME SERIES PROJECTED ONTO PRINCIPAL COMPONENTS**

Our first finding is that the Grasp, Compensation and Hinge Modes all naturally exhibit three distinct scaling regions, representing temporal correlations at three different time scales (**Figure 8**). In particular, the distinct time scales are at tens, hundreds, and thousands of milliseconds, subject to some fluctuation. Because of this fluctuation, we calculated the scaling exponent only for a conservative subrange of these time scales that was common to all trials and subjects, i.e., 1–50, 250–500, and 3500–7000 ms.

In the following, all reported changes in scaling exponents α (i.e., slopes of the log–log plots) are statistically significant at the *p* < 0.01 level, based on Kruskal–Wallis (across the three weight conditions) and Mann–Whitney U statistical tests (across the two visual feedback conditions). We used these non-parametric tests (equivalents of the ANOVA and *t*-test, respectively) because inspection revealed that the distribution of α clearly deviated from the normality required for parametric tests.

# **DETRENDED FLUCTUATION ANALYSIS: GENERAL SCALING EXPONENT RESULTS**

Consider **Figure 9**, which shows the mean scaling exponents across all trials. At short time scales (1–50 ms), the slopes associated with both the Compensation and Hinge Mode time series are close to 0.5, indicating a lack of positive or negative correlation between increments (approximating a random walk) and thus the absence of corrective control effort, while the Grasp Mode has a mean slope of 0.7, reflective of positive correlations (i.e., diffusive growth) in the time series.

At medium time scales (200–500 ms), the slope of the Grasp Mode decreases to 0.5, indicating a lack of corrective control effort along this dimension, while the Compensation Mode now indicates the activity of a stabilizing or correcting effort, with its scaling exponent α having decreased to a value of 0.3; the Hinge Mode shows a very strong negative correlation (indicative of strong corrective action), with RMS deviation scaling with exponent α = 0.1, indicating a strong tendency to enforce a constant mean level. Importantly, the 200–500 ms range includes the shortest voluntary time scales of the sensorimotor system (Kawato, 1999).

The long time scale (3500–7000 ms) is not particularly different from the 200–500 ms time scale in terms of DFA slopes, except that the Grasp Mode now becomes corrective as well, with a slope having decreased from 0.5 to 0.3.

Importantly, DFA scaling exponents did not significantly differ between the first and second halves of the trials.

# **EFFECT OF ADDING VISUAL FEEDBACK**

Solid arrows in **Figure 9** show the effect of adding visual feedback. Note that these arrows indicate only the statistically significant changes found by our Mann–Whitney U tests. Visual feedback had the predictable effect of decreasing the scaling exponent α of the Grasp Mode at the long time scales of 3500–7000 ms, indicating the success of the long visuomotor loop in keeping the total grasp force constant. However, and somewhat counter-intuitively, it also increased the slope of the Grasp Mode at short time scales (1–50 ms), indicating greater positive correlations (i.e., diffusive growth) at the short latencies not affected by the visuomotor loop. This may reflect increased signal-dependent noise and the spurious corrections known to result from higher gains in the motor and sensory components of a feedback loop (in this case the visuomotor loop). The Hinge Mode was the only other Mode affected by visual feedback: its slope at the long time scales became slightly, but statistically significantly, more corrective, changing from 0.13 to 0.10.

# **DETRENDED FLUCTUATION ANALYSIS: EFFECT OF INCREASING WEIGHT**

Dashed arrows in **Figure 9** show the effect of adding weight to the object. Note that these arrows indicate only the statistically significant changes found by our Kruskal–Wallis tests. The α slope of the Grasp Mode at short time scales (1–50 ms) increased toward 1.0, as in the case of adding visual feedback. Again, this perhaps reflects the increase in signal-dependent noise with the need for greater grasp forces. Signal-dependent noise scales linearly with force, is observed in the 8–12 Hz frequency band of force measurements (Jones et al., 2002; i.e., time scales of <125 ms), and induces positive mechanical correlations across fingers due to reaction forces. The only other significant effect of weight was a slight increase of the Hinge Mode slope at the medium time scales, possibly reflecting the increased difficulty of keeping the more massive objects immobile, which would manifest as less effective corrections at this time scale (see **Figure 9**).

# **DISCUSSION**

Our spatio-temporal analysis of static grasp demonstrates that fingertip forces exhibit both the presence and the absence of corrective actions in the task-relevant and task-irrelevant subspaces alike. Our main message is that, during a static tripod grasp, we find examples at different time scales of how task-irrelevant parameters, which are commonly associated with the UCM, are actively controlled, and of how task-relevant parameters (i.e., performance variables) are not actively controlled. This evidence critically extends our approach to task relevance, and compels us to revise our understanding of the neural control of task redundancy. In particular, our results challenge the currently dominant approaches to redundancy, the UCM Hypothesis and the Minimal Intervention principle, which advocate a separation of control strategies between task-relevant and task-irrelevant variables. Rather, we demonstrate that there exist corrective actions common to all task variables, which supports the notion of a continuum, rather than a separation, of neural control strategies across task-relevant and task-irrelevant variables. Moreover, the similarity of control actions across time scales seen in both task-relevant and task-irrelevant spaces points to a level of modularity in corrective action not previously recognized. After explaining why methodological considerations do not challenge our main findings, we discuss the implications of our results for our understanding of the neural control of task redundancy.

# **METHODOLOGICAL CONSIDERATIONS**

We find that the variability of the normal forces of the fingertips on the object during static grasp suffices to provide a counter-example to current thinking about the neural control of task redundancy. We designed our experimental paradigm of static equilibrium to sidestep methodological and theoretical difficulties encountered by prior studies of more complex tasks (e.g., Dingwell et al., 2008, 2010; van Beers et al., 2013). Studies investigating the UCM hypothesis and the Minimal Intervention principle must restrict themselves to a measurable subset of performance variables (it is not practical to record EMG from all muscles, angles from all joints, etc.) during well-defined tasks (like planar limb motion or body motion in the sagittal plane). We used a mechanical model developed in Rácz et al. (2012) to interpret our normal force data, and were careful to analyze only trials for which the linear and angular accelerations were measured as negligible based on motion capture data, and thus considered as static grasp. We initially analyzed the 9-dimensional system that included the tangential forces of all three fingertips, but found that the only significant tangential forces were those counteracting gravity. These were relatively constant, which is not surprising given the trials we considered as valid examples of static grasp. The magnitudes of the fluctuations of the other tangential forces (those in the horizontal plane) were several orders of magnitude smaller than the normal forces, and were therefore considered negligible for the purposes of making our main point: namely, showing a counter-example of task-irrelevant variables (those associated with the UCM) being actively controlled during a static tripod grasp. Tangential forces add to the dimensionality of the motor control task—some of the performance variables associated with these dimensions could be identified as task-relevant, others as being part of the UCM.
However, any additional task-variability dimension is, mathematically and in the context of the UCM hypothesis, perpendicular to the existing dimensions (e.g., moment-cancelation efforts do not necessitate normal force variability, from a purely mechanical point of view). Therefore, whether or not these additional dimensions are subject to control (i.e., constitute additional performance variables) has no bearing on our main finding: that there exist at least two task-irrelevant (from a UCM point of view) dimensions of variability (i.e., the Grasp and Compensation Modes) that are continuously controlled in the task of static tripod grasp, while, simultaneously, there exists a task-relevant direction, or performance variable (i.e., the Hinge Mode), that is not controlled at short time scales. Nevertheless, studying the potential coupling between mechanically independent task dimensions is a worthwhile problem. In fact, we examined this problem for a similar (but dynamic) task in a previous paper (Rácz et al., 2012), but that analysis and discussion are beyond the scope of this work.

Our methodology has some important strengths and differences compared to prior work that uses a temporal analysis. Our work on multifinger manipulation differs from that on locomotion (Dingwell et al., 2008, 2010) and on reaching and gaze shifting (van Beers et al., 2013) in that: (1) it is a substantially simpler problem than locomotion, making it easier to identify performance variables; (2) it is equally important to activities of daily living; and (3) it is particularly relevant to human evolution. In particular, Dingwell et al. (2010) recently showed that gait, a non-linear dynamical task, exhibits the expected greater variability along goal-irrelevant directions as per the UCM and the Minimal Intervention principle. In agreement with our findings, they find corrections for deviations in both goal-relevant and goal-irrelevant directions, but prefer to say that the nervous system largely "ignores non-essential variations." While they use DFA to study the correlation structure along each projected time series, they interpret the scaling exponents as continuous variables that indicate different levels of control action at different time scales in different subspaces. Given the complexity and non-linearity of their task, they explore variations in model structure to alter what was being controlled, but not the task variables, to further strengthen their conclusions. We did not need to do that because we chose a simpler task where the analytical solution to the mechanics of the system and task allows us to define our Modes and interpret the scaling exponents. Importantly, they cite us (Valero-Cuevas et al., 2009)—when stating that quantification of variances along spatial dimensions alone can lead to incorrect conclusions about control—as motivation for their use of temporal analyses as a necessary next step. This is the point we also now make by emphasizing spatio-temporal analyses for static grasp.
In fact, it is perhaps a testament to the utility of these spatio-temporal analyses that, even when conducted at multiple levels of observation and across multiple tasks, different studies agree that temporal dynamics are critical to a proper interpretation of neural control. Lastly, van Beers et al. (2013) study two simultaneous discrete movement tasks (reaching and gaze shifts between visual targets) that are not directly related to our work on multifinger grasp. However, their autocorrelation analysis of task-relevant and task-irrelevant variables shows that task-irrelevant variability is corrected less intensively. Because their tasks are dynamic, target-driven tasks, they interpret the temporal structure of variability in the task-irrelevant variables as motor exploration, learning, and performance optimization. Given that our static grasp task is simpler and has clear goals that can be modeled mechanically, we can make stronger claims as to the nature and structure of the variability. Our approach is, however, necessarily silent about methodological issues in those other non-linear dynamical experimental and analytical paradigms. Nevertheless, we agree with them that active exploration for fatigue mitigation is a potential benefit of variability in these task-irrelevant variables (see below).

Furthermore, it is important to consider prior studies that have identified voluntary and involuntary collaborative force interactions among fingertips when pressing on or grasping rigid objects [e.g., Baud-Bovy and Soechting (2001); Scholz et al. (2002); Shim et al. (2005); Latash and Zatsiorsky (2009); see Schieber and Santello (2004) for a review]. From the mechanical perspective, many extrinsic flexor and extensor muscles are multitendoned or have multiple compartments subject to a certain level of common neural input [although the thumb and index finger are largely independent (Brand and Hollister, 1999)]. This provides a level of mechanical coupling across fingers—which is mostly known to prevent large, individuated, or disparate finger motions [Agee et al. (1991); Brand and Hollister (1999); Zilber and Oberlin (2004); as reviewed by Schieber and Santello (2004)]. Our task was designed to account for these potential confounds by requiring a low-magnitude static grasp in postures where all fingers are similarly flexed so that tendinous interconnections do not play a dominant role. Common neural inputs to muscles across fingers are also not a confound because, as reported by Latash and Zatsiorsky (2009), those common drives do not produce the kind of variability that leads to a pervasive dynamic Grasp Mode in the low frequency range during non-grasp force production tasks. Common neural input, by definition, is composed of highly correlated, short-latency (i.e., high frequency) discharge of motor units. As reported by Bremner et al. (1991), the duration of the synchronization ranged from 5 to 31 ms (mode = 13 ms). These latencies are only applicable to the shortest (i.e., 1–50 ms) time scales in **Figure 9**. Moreover, the Grasp Mode captures such effects of common neural drive because it is defined as a synchronous increase or decrease of finger forces.
Common neural drive would not enter the other Modes because they require opposing changes (i.e., simultaneous increases and decreases) in finger forces. Thus, common neural drive cannot explain our findings of evidence of control action in task-irrelevant variables, and of a lack of it in task-relevant variables, spread across Modes and time scales.

Lastly, Bryce and Sprague (2012) have urged caution when analyzing non-linear or non-stationary signals with DFA. However, our goal is not to estimate exact or specific Hurst exponents, but rather to show that a clear deviation from the 0.5 line exists, much as in the recent work by Dingwell and colleagues. We did consider the potential confound of non-stationary time series, but our results are robust when the first and second halves of each trial are analyzed separately. Furthermore, we do not observe the initial curvature mentioned by Bryce and colleagues, in part because, as described in our Methods, we do not estimate exponents at very small time scales. Instead, we see an initial linear region, with a different scaling for each Mode. This finding is robust across subjects and trial halves. This underscores the stability of our conclusion: that task-irrelevant dimensions are indeed subject to control intervention, and vice versa, and that this observation is time-invariant.

# **SPATIAL ANALYSIS**

Our simulation results clearly show that the first two principal components, the Grasp and Compensation Modes, span the null space of the force dynamics associated with successful static grasp: variation of force inside this manifold does not violate the constraints of static grasp (i.e., zero net force and moment). Given, however, that noise and variability are inevitable elements of neuromuscular systems, successful task completion naturally leads to the population of the null-space manifold, while task-relevant variability in the Hinge Mode orthogonal to the solution manifold (i.e., modulating linear motion of the object in violation of the static task requirement) will be minimal, but not necessarily zero.
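
This geometry can be checked numerically with the Mode loadings reported later in this section: the Grasp and Compensation directions lie, to rounding error, in the null space of the Hinge direction (the combination that would accelerate the object). Treating the Hinge loadings as a single constraint row is our simplification for illustration:

```python
import numpy as np
from scipy.linalg import null_space

# Hinge Mode loadings: the normal-force combination that would induce linear motion.
hinge = np.array([[0.6, -0.5, -0.5]])
manifold = null_space(hinge)             # 3x2 basis of the static-grasp solution manifold

grasp = np.array([0.81, 0.41, 0.41])     # Grasp Mode loadings (reported values)
comp = np.array([0.0, -0.71, 0.71])      # Compensation Mode loadings (reported values)

# Both task-irrelevant Modes are (near-)orthogonal to the Hinge constraint,
# so moving along them does not violate static equilibrium.
resid_grasp = float(abs(hinge @ grasp))  # small, non-zero only due to rounded loadings
resid_comp = float(abs(hinge @ comp))    # zero
```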

In the case of static grasp, the fingertips are coplanar in the horizontal plane, and their vertical tangential components serve to cancel gravity. Therefore, the point of intersection of 3D force vectors in the horizontal plane can either:

**Remain stationary**. In this case the only possible changes in the fingertip force vectors are to increase or decrease their magnitudes simultaneously and proportionally, i.e., to change the total grasp force. Note that these magnitudes are bounded above by finger strength and the possibility of crushing the object, and below by the need to support the object against gravity. Regardless of the location of the point of intersection within the object, such simultaneous and proportional increases or decreases in the 3D fingertip force vector magnitudes will induce equally simultaneous and proportional changes in the normal components of the fingertip forces. This is captured by the Grasp Mode, where all normal forces are positively correlated and therefore have PC loadings of the same sign, as in [0.81 0.41 0.41]*<sup>T</sup>* in **Figure 2**. Please note that the loadings are the unit vectors describing the multidimensional correlation defining each PC. The loadings for this PC therefore show that the thumb, index, and middle finger forces all co-vary in this Mode. This analytical argument shows that the normal forces suffice to detect the spatial correlation structure defining the Grasp Mode. To confirm this, **Figure 11** plots the loadings of the 1st PC of the 3D force analysis case (i.e., the normal and two tangential forces of each digit) for all subjects and trials. This 9D equivalent of the Grasp Mode shows that positive correlation of all three normal forces dominates, and that the loadings of the tangential forces straddle the zero line (i.e., do not show strong covariation with the normal forces), creating a vector of roughly [0 0 0.8 0 0 0.4 0 0 0.4]*<sup>T</sup>*.

**Move within the object**. If, say, the thumb force vector maintained its magnitude but changed its direction along an arc to the right, increasing its tangential component and decreasing its normal component, then maintaining static equilibrium (as in the experiments we analyzed) would require the other two fingertip force vectors to track the 3D thumb force vector. In so doing, the magnitude of one fingertip force vector must increase and the other decrease. This lengthening and shortening of the vectors must again be simultaneous and proportional. Once again, this will also induce equally simultaneous and proportional changes in the normal components of the fingertip forces. This is captured by the Compensation Mode, where one normal force is positively correlated with the thumb force and the other negatively. Thus the fingers have PC loadings of opposite signs, as in [0 −0.71 0.71]*<sup>T</sup>* in **Figure 2**. That is, the full three-component force vectors are not required to detect these changes; the normal forces suffice to detect them and their associated structure as the Compensation Mode. To confirm this, **Figure 12** plots the loadings of the 2nd PC of the 3D force analysis case for all subjects and trials. This 9D equivalent of the Compensation Mode shows a dominant anti-correlation between the loadings of the normal forces of the index and middle finger, and that the loadings of the normal force of the thumb and of all tangential forces straddle the zero line, creating a vector of roughly [0 0 0 0 0 −0.7 0 0 0.7]*<sup>T</sup>*.

A different combination of normal forces is the one perpendicular to the manifold. This is the "Hinge Mode," which would induce linear motion, with PC loadings [0.6 −0.5 −0.5]*<sup>T</sup>* (the thumb normal force increasing with simultaneous and proportional decreases in the fingers' normal forces). Our results show that the dynamics along this task-relevant Mode were minimal because, by construction, we only analyzed cases where the object was in static equilibrium, in agreement with the UCM hypothesis that this Mode exhibits less variability. The normal forces suffice to detect these changes and their associated structure as the Hinge Mode, as also shown for the 3D force analysis case in **Figure 13**.

Very critically, we did not need PCA to identify our three Modes empirically. Rather, these were prescribed by the analytical solution to the mechanics of the system and task. PCA was only applied to the experimental data to identify for each subject the directions of normal force variability that maximally corresponded to the known directions inferred from mechanical analysis. For all subjects, these agreed well, by construction, with the closed-form analytical solution as mentioned in the results.
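
A sketch of this matching step, using synthetic rather than experimental data: normal-force fluctuations are generated from the three analytical Mode directions (the amplitudes are arbitrary assumptions, with the Hinge amplitude smallest, as in a valid static trial), PCA is run, and each empirical PC is checked against its prescribed Mode:

```python
import numpy as np

modes = np.array([[0.81, 0.41, 0.41],    # Grasp
                  [0.00, -0.71, 0.71],   # Compensation
                  [0.60, -0.50, -0.50]]) # Hinge
amps = np.array([1.0, 0.5, 0.05])        # task-relevant Hinge varies least

# Synthetic (thumb, index, middle) normal-force fluctuations, samples x 3.
rng = np.random.default_rng(2)
forces = (rng.standard_normal((5000, 3)) * amps) @ modes

# PCA via eigendecomposition of the covariance matrix, PCs in descending order.
_, V = np.linalg.eigh(np.cov(forces.T))
pcs = V[:, ::-1].T

# |cosine| between each PC and each analytical Mode (signs of PCs are arbitrary).
alignment = np.abs(pcs @ modes.T)
best_match = alignment.argmax(axis=1)    # expected: PC1->Grasp, PC2->Comp, PC3->Hinge
```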

Our experimental spatial results, as expected, are in agreement with our simulations and with the prior evidence for the UCM Hypothesis and the Minimal Intervention principle (Scholz and Schoener, 1999; Jordan, 2003): the variance in task-relevant variables is smaller than in the task-irrelevant spaces. The difference in variance explained by the Grasp and Hinge Modes in each case is explained by comparing **Figures 4**, **5**, where, in the absence of visual feedback, the total grasp force (Grasp Mode) shows large variability that is absent when visual feedback is provided to avoid such drift. More specifically, the projection of the fingertip force time series recorded without visual feedback onto the Grasp Mode shows a very slow monotonic downward trend (**Figure 7** for a representative trial). We interpret this slow trend to be the major contributor to the large spatial variability explained by this Mode: it is caused by the three fingers reducing their normal forces simultaneously. This underscores an important shortcoming of PCA when applied to non-stationary signals (for a detailed discussion see Clewley et al., 2008). On the other hand, in trials *including* visual feedback, the Grasp Mode does not exhibit such a trend (**Figure 5** for a representative trial). This is not surprising, since holding a constant total force is now an explicit task constraint, converting the Grasp Mode into a task-relevant Mode (see **Table 3**). As a consequence, the Compensation Mode (the other task-irrelevant dimension) now contributes a larger proportion of the overall variability (**Figure 6**). The fact that variability in the Grasp Mode does not disappear with visual feedback is well known and can be attributed to unavoidable motor noise and other central and peripheral sources of correlated finger forces (Santello and Soechting, 2000; Poston et al., 2010; Rácz et al., 2012). The Compensation Mode also exhibits a slow non-monotonic modulation, both increasing and decreasing over time (**Figure 7**). This indicates that the index and middle finger normal forces are slowly and continuously modulated, out of phase, during static grasp.

**Table 3 | Summary of findings, highlighting in bold the discrepancy between the UCM and Minimal Intervention (MI) predictions and the temporal Detrended Fluctuation Analysis (DFA) results.**

*As per Figures 8 and 9, by uncontrolled we mean evidence of unstable or Brownian growth (scaling exponents ≥0.5), and by controlled we mean evidence of corrective action (scaling exponents <0.5). In particular, both the Grasp and Compensation Modes are task-irrelevant but show temporal features of corrective action at some time scales. Similarly, the Hinge Mode is task-relevant but shows temporal features of a lack of corrective action at some time scales.*

As we have argued before (Clewley et al., 2008; Kutch and Valero-Cuevas, 2011, 2012), PCA of analytical solutions and experimental data alike naturally shows a reduction in the dimensionality of task variables, which is a necessary result of meeting task constraints with a biomechanical plant. But this does not imply that the CNS is itself using a low-dimensional controller to simplify or optimize the redundancy problem. Rather, it simply reflects the structure of the solution space. Therefore, the question is not only whether the CNS *can* meet the requirements of the task (by definition it did if the task was accomplished), but also *how* it continues to meet them as time goes by. This makes temporal analysis of task-variable dynamics critical to understanding the neural control actions in both the task-relevant and task-irrelevant spaces.

### **TEMPORAL ANALYSIS**

Our DFA results, on the other hand, demonstrate the presence and absence of corrective actions by the CNS at different time scales in both the task-relevant and task-irrelevant subspaces. Both linear and non-linear time series analyses have been commonly employed to reveal temporal correlation structures (positive or negative) indicative of control strategies (destabilizing or stabilizing, respectively), primarily in postural control research (Collins and De Luca, 1994; Jeka et al., 2004). For instance, in a seminal paper, Collins and De Luca (1994) demonstrated a complex correlation structure in the center-of-pressure time series recorded during quiet stance, a highly redundant task. However, this perspective has not been brought to bear on the study of task redundancy. Once again, one can argue that the available literature endorses a *preferential*, as opposed to a strict, separation into clearly controlled and uncontrolled variables, but we have lacked a specific quantification of the temporal nature of the dynamics of task-relevant and task-irrelevant variables that would allow us to infer the neural control strategies in each space.

**FIGURE 8 | Representative DFA of projected normal force time series from one subject, where the data were collected in a 200 g weight trial,** *without* **visual feedback.** The plot shows the three scaling regions (1–50, 250–500, and 3500–7000 ms) over which we fit the scaling exponent for each normal force correlation Mode (Grasp, Compensation, and Hinge Modes). The red lines show the linear fits to the behavior of diffusion vs. time scale; their slopes can be greater than, equal to, or less than 0.5 (dashed line), indicating that the diffusive process is positively correlated (pc, or uncorrected divergence), uncorrelated (uc, or Brownian motion), or negatively correlated (nc, or corrective action), respectively, at those time scales.

As per **Figures 8**, **9**, we find that both task-relevant and task-irrelevant variables exhibit the features of uncorrected divergence, Brownian motion, and corrective action, depending on the time scale considered—as evidenced by positive, neutral, and negative correlations between force increments separated by different time periods (i.e., scaling exponents >0.5, = 0.5, and <0.5, respectively). As per **Table 3**, the UCM and Minimal Intervention approaches would predict a clearer separation of corrective actions (i.e., control strategies) across task-relevant and task-irrelevant variables.
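
The mapping from scaling exponent to inferred control regime used throughout this discussion can be stated compactly; the tolerance band around 0.5 is our illustrative choice, not a threshold taken from the analysis:

```python
def control_regime(alpha, tol=0.05):
    """Classify a DFA scaling exponent into the three regimes of Figures 8-9."""
    if alpha > 0.5 + tol:
        return "uncorrected divergence"   # positively correlated increments
    if alpha < 0.5 - tol:
        return "corrective action"        # negatively correlated, stabilizing
    return "Brownian motion"              # uncorrelated increments

# e.g., the short-time-scale Grasp Mode slope of 0.7 maps to "uncorrected
# divergence", and the medium-time-scale Hinge Mode slope of 0.1 to
# "corrective action".
```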

The temporal features of the task-irrelevant Grasp Mode challenge the UCM Hypothesis and the Minimal Intervention principle. The Grasp Mode (when no visual feedback is given) exhibits all three control strategies as the time scales lengthen, going from uncorrected divergence, to Brownian motion, to corrective action. The slow downward trend in total grasp force in trials without visual feedback happens at medium to long time scales, so it does not explain the uncorrected divergence seen at the short time scales. Such divergence, which was also present and even accentuated with visual feedback, is more likely a consequence of positive correlations that can be shown to result from the interplay between purely random signal-dependent noise (Jones et al., 2002), motor unit synchronization (Schieber and Santello, 2004), and instantaneous (but low-pass filtered by skin compliance) mechanical reaction forces. This variability, in what are both task-irrelevant and task-relevant variables, is nevertheless left uncorrected by the CNS, either as part of the neural control strategy or because of an inability to correct at such short latencies. Alternatively, we can argue that task-irrelevance is not only a spatial consideration but also a temporal one, where low-magnitude or short-term variability is accepted and only corrected upon crossing a certain spatial or temporal threshold. Such an interpretation, however, is not really compatible with the UCM Hypothesis and the Minimal Intervention principle, but rather with other theories specifically phrased to advocate intermittent or drift-and-act control as an optimal strategy (Collins and De Luca, 1994; Guckenheimer, 1995; Milton et al., 2009a; Suzuki et al., 2012).

**FIGURE 10 | Percent variance explained when considering three-dimensional forces for each digit (3D force analysis, which is 9-dimensional given three forces for each of three digits; empty box plots) vs. when considering only the normal force at each digit (normal force analysis; gray box plots).** The box plots show the variance explained by each PC from all subjects and trials, where the 1st PC explains the majority of the variance, the 2nd PC a modest amount, and the 3rd PC less than 10%. The remaining variance explained by PCs four to nine is shown for the 3D force analysis. The structure of each PC is given by its loadings (as shown in **Figures 11**–**13**). Those figures show that, even in the 3D force analysis case, the 1st, 2nd, and 3rd PCs represent the Grasp, Compensation, and Hinge Modes seen in the normal force analysis. This consistency across percent variances explained demonstrates that the reduced normal force analysis is valid and equivalent to the full 3D force analysis.

**FIGURE 11 | Loadings of the 1st PC, the Grasp Mode, when considering the 3D force analysis case (normal plus two tangential force components; empty box plots) for each digit vs. when considering only its normal force (normal force analysis; gray box plots).** Box plots show loadings from all subjects and trials. Note that the loadings of all tangential forces (Fx and Fy) straddle the zero line, demonstrating that they are not relevant to the correlation structure of the 1st PC. The normal force components (Fz) of all digits have positive and non-zero loadings, indicating that the structure of this PC using normal forces is equivalent to that of the full 3D force analysis. Neither the dispersion nor the exact median values in the box plots are the means by which we establish the task-relevance or task-irrelevance of a PC. That dispersion is a consequence of natural variability and inaccuracies in motor performance, and of unavoidable sensor noise. It is the goals of the task and the mechanical analysis that determine how to identify the task-relevant and task-irrelevant Modes.

**FIGURE 12 | Loadings of the 2nd PC, the Compensation Mode, when considering the 3D force analysis case for each digit (empty box plots) vs. when considering only its normal force (gray box plots).** Box plots show loadings from all subjects and trials. Note that the loadings of the normal force component (Fz) of the thumb, and of all tangential force components (Fx and Fy), straddle the zero line, demonstrating that they are not relevant to the correlation structure of the 2nd PC. The normal force components (Fz) of the index and middle finger exhibit anti-correlation, indicating that the structure of this PC using normal forces is equivalent to that of the full 3D force analysis. The increase in dispersion in the full 3D force analysis compared to the Grasp Mode in **Figure 11** is naturally associated with the increased susceptibility to measurement noise, as this Mode explains much less of the variance in the data (see **Figure 10**).

The neutral and negative correlations in the Grasp Mode at medium and long latencies, respectively, cannot be attributed to control intervention to avoid dropping the object due to a critical reduction in Grasp Mode force. The total grasp force level always remained well above the weight of the object, the hand was held still, and the scaling exponents were unchanged between the first and second halves of the trials (**Figure 9**); moreover, slip-grip responses happen at latencies well below 200 ms (Cole and Abbs, 1988; Gysin et al., 2003; Rácz et al., 2012). Thus we conclude that corrective control intervention depends on factors other than safety boundaries or the automatic grasp tendencies seen only during dynamic manipulation (Rácz et al., 2012). Moreover, such corrective control intervention occurs regardless of whether the Grasp Mode is task-irrelevant or task-relevant (without or with visual feedback, respectively). Further challenging the UCM Hypothesis and the Minimal Intervention principle, the task-irrelevant Compensation Mode also exhibits corrective control intervention at medium and long time scales.

**FIGURE 13 | Loadings of the 3rd PC, the Hinge Mode, when considering the 3D force analysis case for each digit (empty box plots) vs. when considering only its normal force (gray box plots).** Box plots show loadings from all subjects and trials. Note that the loadings of the tangential forces (Fx and Fy) of the thumb, index, and middle fingers straddle the zero line, demonstrating that they are not relevant to the correlation structure of the 3rd PC. The normal force component (Fz) of the thumb exhibits anti-correlation with those of the index and middle fingers, indicating that the structure of this PC using normal forces is equivalent to that of the full 3D force analysis. The increase in dispersion in the full 3D force analysis compared to the Grasp Mode in **Figure 11** is naturally associated with the increased susceptibility to measurement noise, as this Mode explains much less of the variance in the data (see **Figure 10**).

DFA exposes an absence of correlation at very short time scales in the task-relevant Hinge Mode. This indicates an absence of corrective actions (i.e., control). This lack of control may, however, simply be due to the inability of the neuromuscular system to act at such short latencies, or it may be evidence of an intermittent or drift-and-act strategy. While finding the reasons for this requires further investigation, it is nevertheless important to point out this temporal feature, which, to the best of our knowledge, has not previously been addressed by the UCM Hypothesis or the Minimal Intervention principle. That is, the fact remains that, due to physiological limitations or control strategy, even highly task-relevant variables are left uncontrolled at some time scales.

The fact that the results are so similar between the first and the second halves of the trials indicates that the observed dynamics and the associated correlation structure depend neither on time nor on the total grasp force (which can be interpreted as location in force space; in control terms, our findings are not state-dependent). This in turn suggests a temporal control strategy that is state-independent (except potentially *at* the boundaries, which we have no reason to believe our subjects approached, but which could be an important next research step).

One possible explanation for the observed negative correlations along the Grasp and Compensation Modes could be that traversing the solution manifold is an active process, through which the CNS actually takes advantage of redundancy. Specifically, controlled dynamics along the Compensation Mode corresponds to the regulation of the index and middle finger contributions to the opposition of thumb normal force. In agreement with others, we speculate that the CNS may be actively trying to shift the demands between the two fingers over time, which in turn might mitigate effects of fatigue at the muscle level (e.g., Cote et al., 2002; Dingwell et al., 2008; van Beers et al., 2013). By gradually varying fingertip forces, the CNS can achieve a change in the underlying muscle coordination pattern, which in turn will change the rates of fatigue of individual muscles, thus allowing for improved use of available resources. The slow downward trend along the Grasp Mode direction of normal forces agrees with this fatigue-reduction strategy: a general reduction of forces generated by the muscles leads to a reduction in the fatigue rate. But at these low levels of grasp force magnitude, the redundancy of solutions for a given set of fingertip force vectors would also allow changes in coordination patterns that would not be detectable as changes in the magnitude or direction of fingertip force vectors. This issue, therefore, deserves further investigation.

Lastly, note that here we do not employ DFA to determine self-similarity or fractional dimensionality in the data, as has been done in some studies (Hausdorff et al., 1996). In those studies, the linearity in the logarithmic plots needs to extend over at least one order of magnitude to count as strong evidence of fractality (Kantz and Schreiber, 2004). In our case the requirements for the linearity of the logarithmic plots are not as rigid, because the quantification of long-range correlations applies to data where the linearity extends over shorter ranges of time scales. Moreover, challenging the preferential separation of control action across task variables, as in the UCM Hypothesis and the Minimal Intervention principle, only requires evidence of similar corrective actions (or their absence) in both task-relevant and task-irrelevant spaces, which our results clearly show. These results expose a fundamental limitation of the UCM Hypothesis and the Minimal Intervention principle: their focus on spatial aspects of motor variability and disregard for temporal aspects.
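The scaling exponents discussed throughout come from detrended fluctuation analysis. As a minimal sketch (not the authors' code; the window sizes, first-order detrending, and white-noise test signal are choices made only for this illustration), first-order DFA can be implemented as:

```python
import numpy as np

def dfa(x, scales):
    """First-order detrended fluctuation analysis of a 1-D series.

    Returns the fluctuation F(s) for each window size s; the slope of
    log F vs. log s estimates the scaling exponent alpha (alpha > 0.5:
    persistent drift; alpha < 0.5: anti-persistent, corrective dynamics).
    """
    y = np.cumsum(x - np.mean(x))          # integrated profile
    F = []
    for s in scales:
        n = len(y) // s                    # non-overlapping windows of size s
        resid = []
        for i in range(n):
            seg = y[i * s:(i + 1) * s]
            t = np.arange(s)
            coef = np.polyfit(t, seg, 1)   # linear detrend in each window
            resid.append(np.mean((seg - np.polyval(coef, t)) ** 2))
        F.append(np.sqrt(np.mean(resid)))
    return np.array(F)

# White noise is the uncorrelated reference case: alpha should be near 0.5.
rng = np.random.default_rng(0)
scales = np.array([8, 16, 32, 64, 128])
F_white = dfa(rng.standard_normal(4096), scales)
alpha = np.polyfit(np.log(scales), np.log(F_white), 1)[0]
```

An exponent estimated this way is meaningful only over the range of scales where the log-log relation is approximately linear, which is the caveat the paragraph above raises.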

## **CONCLUSIONS AND MODULARITY**

We show that both task-relevant and task-irrelevant parameters show corrective action at some time scales; and conversely, that task-relevant parameters do not show corrective action at other time scales. In agreement with recent work in other behavioral contexts, these results suggest we revise our understanding of variability vis-à-vis task relevance by considering both spatial and temporal features of all task variables when inferring control action and understanding how the CNS confronts task redundancy. Moreover, these results are counterexamples to the UCM Hypothesis and the Minimal Intervention principle, which assume a separation of task variables into relevant and irrelevant ones, indicated by their respective variabilities. As mentioned above, proponents of the UCM Hypothesis and the Minimal Intervention principle admit the possibility of *preferential* as opposed to strictly *uncontrolled* variables (Latash et al., 2010), or that the nervous system largely "ignores non-essential variations" (Dingwell et al., 2010), but such qualitative distinctions have only begun to be quantified or considered in the spatio-temporal domain when inferring control action. Following up on those qualifications, we present specific spatio-temporal quantitative examples of controlled intervention (or lack thereof) in both task-relevant and task-irrelevant spaces (based on a mechanical/mathematical definition of the task and its possible modes of variability) to expand our understanding of neural control strategies. Additional work is needed to build a view of neural control that takes into consideration both spatial and temporal aspects of neuromuscular function and variability, and the structure and nature of the solution space of the task.

The similar nature of control actions across time scales in both task-relevant and task-irrelevant spaces that we find points to a level of modularity not previously recognized. The spatio-temporal results presented here instead suggest that neural control uses a continuum of control strategies, ranging from uncorrected divergence to strong corrective actions, that is not defined by the level of task-relevance of the controlled variables, and which may also involve intermittent and drift-and-act characteristics. Importantly, while the increase in weight and the addition of visual feedback do seem to modulate the dynamics on the individual dimensions, they do not lead to a crossing of the 0.5 line and therefore not to a fundamental change in the control strategy. Our methodological considerations and spatio-temporal analysis allow us to present clear examples of how the task-irrelevant parameters (i.e., elemental variables that are organized to constitute the UCM) are actively and continuously controlled during a tripod grasp at certain time scales, while the task-relevant parameter (or performance variable) is not actively controlled at certain time scales. Therefore, we show that estimating the different extents of control based on task variable variances alone (a purely spatial approach) is insufficient, as we and Dingwell have proposed before (Valero-Cuevas et al., 2009; Dingwell et al., 2010). Rather, those variables constituting the UCM (which, again, are mathematically defined by the unambiguous mechanics of the task, see **Figure 2**) may have different temporal dynamics, but are not controlled in a fundamentally different way.

This spatio-temporal approach to variability provides a tool to quantify the nature and degree of neural control action, extending the traditional spatial variance-magnitude approach by quantifying the temporal nature of variability. For example, traversing the solution manifold is an active process by which the controller *enforces* the constraints of the task. The CNS does not *create* the solution manifold<sup>1</sup>, but rather seeks to inhabit it, as has been discussed earlier (Keenan et al., 2009; Kutch and Valero-Cuevas, 2012; Suzuki et al., 2012). As such, the means by which the CNS enters and continually inhabits the solution manifold can be thought of as the implementation of a dynamical attractor on the task variables. In the context of time-varying stochastic behavior of differential and discrete-time distributed systems like the neuromuscular system, the implementation of such a controller enforcing an attractor can be thought of as the implementation of a specific probability density of the state (for a presentation of this view, which differs from Bayesian estimation and discrete-time Markov processes, see Sanger, 2011).

This emerging view of the nervous system as functioning at the level of affecting probability density functions (Sanger, 2011) is compatible with a modular interpretation of our spatio-temporal results. DFA estimates the statistical self-affinity of stochastic processes with memory whose underlying statistics (mean, standard deviation, and higher-order moments) or dynamics are nonstationary (Kantelhardt et al., 2001). That is, DFA quantifies how well a probability density function is implemented. Thus the continuum of control strategies seen across all Modes and time scales can be thought of as essentially differently tuned versions of the same modular control process, which can allow drift in (i.e., uncorrected divergence), remain indifferent to, or enforce (i.e., apply corrective action to) the statistics of the time-varying probability density of the state so that it populates the solution space. Hence the level of modularity in the controller rests on the ability of the system to work with probability density functions in the task-relevant and task-irrelevant spaces at different time scales—and not with distinct basis functions or synergies implementing a separation of task variables.

# **REFERENCES**


<sup>1</sup>To be clear, the solution manifold arises independently of the controller as it depends only on the characteristics of the plant and the constraints of the tasks. A controller can then choose to inhabit a particular region or subset of the solution manifold to meet the requirements of the task (Kutch and Valero-Cuevas, 2012; Suzuki et al., 2012).




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 24 January 2013; accepted: 17 October 2013; published online: 13 November 2013.*

*Citation: Rácz K and Valero-Cuevas FJ (2013) Spatio-temporal analysis reveals active control of both task-relevant and task-irrelevant variables. Front. Comput. Neurosci. 7:155. doi: 10.3389/fncom.2013.00155*

*This article was submitted to the journal Frontiers in Computational Neuroscience. Copyright © 2013 Rácz and Valero-Cuevas. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# **APPENDIX**

## **IDENTIFICATION AND MODELING OF THE MECHANICAL REQUIREMENTS OF THE TASK AND ITS NULL SPACE**

Each fingertip applies a three-dimensional force $\tilde{f}$ to the object. Computing the cross product of the moment arm, i.e., the vector between the point of force application and the object's center of mass, with the fingertip force vector yields the moment applied to the object. The total 6-dimensional force and moment applied to the object can be computed with the following mapping $\mathbf{W}$:

$$
\begin{bmatrix} \sum \mathbf{f} \\ \sum \mathbf{m} \end{bmatrix}_{6 \times 1} = \begin{bmatrix} \mathbf{I}_{3 \times 3} & \mathbf{I}_{3 \times 3} & \mathbf{I}_{3 \times 3} \\ \mathbf{M}_{\mathrm{th}} & \mathbf{M}_{\mathrm{ind}} & \mathbf{M}_{\mathrm{mid}} \end{bmatrix}_{6 \times 9} \begin{bmatrix} \tilde{f}_{\mathrm{th}} \\ \tilde{f}_{\mathrm{ind}} \\ \tilde{f}_{\mathrm{mid}} \end{bmatrix}_{9 \times 1} = \mathbf{W}\,\tilde{f}
$$

where $\mathbf{I}_{3 \times 3}$ is the identity matrix and $\mathbf{M}_{\{\mathrm{th,ind,mid}\}}$ is the skew-symmetric matrix representing the cross product between the moment arm of each finger and its force vector $\tilde{f}_{\{\mathrm{th,ind,mid}\}}$. Since $\mathbf{W}$ is a mapping from 9-dimensional space (three 3-D finger forces) to 6-dimensional space (six degrees of freedom of the grasped object), the associated null space, i.e., the space of vectors for which $\mathbf{W}\tilde{x} = \tilde{0}$, has 3 dimensions. Any vector $\tilde{x}$ in this null space represents a solution to the static grasp requirement $\left[\sum \mathbf{f};\ \sum \mathbf{m}\right] = \tilde{0}$, i.e., that both the sum of forces and the sum of moments be zero. This is the mathematical description of the task-irrelevant subspace: fingertip forces can change inside this space while the object remains static.
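This null-space computation can be sketched numerically. The moment arms below are illustrative placeholders, not the geometry of the experimental object, and numpy/scipy stand in for the authors' tools:

```python
import numpy as np
from scipy.linalg import null_space

def skew(r):
    """Skew-symmetric matrix so that skew(r) @ f equals np.cross(r, f)."""
    return np.array([[0.0, -r[2], r[1]],
                     [r[2], 0.0, -r[0]],
                     [-r[1], r[0], 0.0]])

# Hypothetical moment arms (m) from the object's center to the three contacts.
r_th  = np.array([-0.02, 0.00, 0.0])
r_ind = np.array([ 0.02, 0.01, 0.0])
r_mid = np.array([ 0.02, -0.01, 0.0])

I3 = np.eye(3)
W = np.block([[I3, I3, I3],
              [skew(r_th), skew(r_ind), skew(r_mid)]])   # 6 x 9 grasp map

N = null_space(W)        # 9 x 3 orthonormal basis of the task-irrelevant subspace

# Any combination of the basis vectors leaves net force and moment at zero.
x = N @ np.array([0.3, -1.2, 0.7])
assert np.allclose(W @ x, 0)
```

For three non-collinear contacts the map has full rank 6, so the null space has the 3 dimensions stated in the text.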

However, this is a necessary, but not sufficient, requirement. Additionally, we require that the fingertips do not slip, so the tangential forces are upper-bounded through the friction relationship $f_{\mathrm{tangential}} \le \mu f_{\mathrm{normal}}$, i.e., the tangential force cannot exceed the normal force multiplied by the friction coefficient $\mu$, which we have set to 0.04, the approximate friction coefficient of Teflon on Teflon. This represents a lower bound on the coefficient of friction, since this coefficient is certainly greater when fingertip and Teflon surface interact. That is, the grasp under experimental conditions is actually less constrained and tangential components can be greater. In addition, the sum of tangential forces directed vertically needs to oppose the force applied to the object by gravity. A nominal object had a weight of 100 g, hence the sum of tangential forces had to equal 0.981 N, which in turn determined the sign (positive, i.e., into the object) and the minimum magnitude of the normal forces. This requirement changes with the weight condition, of course. For a complete list of static grasp model constraints, see **Table A1**.
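These constraints can be checked numerically. The following sketch assumes, purely for illustration, that `Fz` is the normal component and `Fy` the vertical tangential component; the function name and axis conventions are ours, not taken from Table A1:

```python
import numpy as np

MU = 0.04  # lower-bound friction coefficient used in the model (Teflon on Teflon)

def satisfies_static_grasp(f, weight_N=0.981):
    """Check a sketch of the static grasp constraints for a tripod grasp.

    f: (3, 3) array, one row per finger (thumb, index, middle),
    columns (Fx, Fy, Fz) with Fz assumed normal and Fy assumed vertical.
    """
    normal = f[:, 2]
    tangential = np.linalg.norm(f[:, :2], axis=1)
    no_slip   = bool(np.all(tangential <= MU * normal))    # friction cone
    pressing  = bool(np.all(normal > 0))                   # push into the object
    supported = bool(np.isclose(f[:, 1].sum(), weight_N))  # tangentials carry the weight
    return no_slip and pressing and supported
```

With the 100 g object, three fingers each contributing 0.327 N of vertical tangential force satisfy the support constraint, but only if the normal forces are large enough that 0.327 N stays inside the friction cone.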

Given that the task null space is well described, it was important to simulate the properties of expected solutions to compare against the experimental results, to properly disambiguate the spatio-temporal features of the fingertip forces that can be explained by mechanics from those of neural origin, as in Rácz et al. (2012) and Kutch and Valero-Cuevas (2012). Enforcing all constraints gives us a mathematical description of the null space of the task. To simulate instances of these fingertip forces, we numerically sampled vectors $\tilde{f}^{\,t}_{\mathrm{null}}$ from the null space of the above linear mapping by multiplying the three null space basis vectors $\tilde{n}_i$ with random values $a, b, c$, drawn from a standard Brownian random walk: $\tilde{f}^{\,t}_{\mathrm{null}} = a \cdot \tilde{n}_1 + b \cdot \tilde{n}_2 + c \cdot \tilde{n}_3$. We then added these null space vectors to the minimum sum-of-squared-forces solution $\tilde{f}_{\mathrm{min\,sq}}$ of force vectors that met all the above-described static grasp constraints: $\tilde{f}^{\,t} = \tilde{f}_{\mathrm{min\,sq}} + \tilde{f}^{\,t}_{\mathrm{null}}$, using MATLAB's (Natick, MA) quadprog() function to determine the actual solution with minimum Euclidean distance to $\tilde{f}_{\mathrm{min\,sq}} + \tilde{f}^{\,t}_{\mathrm{null}}$.
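A minimal sketch of the null-space sampling step in Python (numpy in place of MATLAB); the final quadprog() projection onto the full constraint set is omitted, and the step size and seed are arbitrary choices for the example:

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_null_space_forces(N, f_min_sq, n_steps=1000, step=0.01):
    """Sample fingertip-force trajectories confined to the task null space.

    N: (9, 3) null-space basis of the grasp map W.
    f_min_sq: (9,) minimum sum-of-squared-forces solution.
    The coefficients (a, b, c) follow a standard Brownian random walk,
    as in the simulation described above.
    """
    abc = np.cumsum(step * rng.standard_normal((n_steps, 3)), axis=0)
    return f_min_sq + abc @ N.T      # (n_steps, 9) simulated force vectors
```

By construction, every simulated vector differs from $\tilde{f}_{\mathrm{min\,sq}}$ only within the null space, so the net force and moment on the object are unchanged along the whole trajectory.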

**Table A1 | List of relevant constraints in static grasp.**


# Dynamic primitives in the control of locomotion

#### *Neville Hogan<sup>1</sup>\* and Dagmar Sternad<sup>2</sup>*


### *Edited by:*

*Tamar Flash, Weizmann Institute, Israel*

### *Reviewed by:*

*Francesco Lacquaniti, University of Rome Tor Vergata, Italy Auke Ijspeert, Ecole Polytechnique Federale de Lausanne, Switzerland*

### *\*Correspondence:*

*Neville Hogan, Department of Mechanical Engineering, Brain and Cognitive Sciences, Newman Laboratory for Biomechanics and Human Rehabilitation, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Room 3-146, Cambridge, MA 02139, USA e-mail: neville@mit.edu*

Humans achieve locomotor dexterity that far exceeds the capability of modern robots, yet this is achieved despite slower actuators, imprecise sensors, and vastly slower communication. We propose that this spectacular performance arises from encoding motor commands in terms of *dynamic* primitives. We propose three primitives as a foundation for a comprehensive theoretical framework that can embrace a wide range of upper- and lower-limb behaviors. Building on previous work that suggested discrete and rhythmic movements as elementary dynamic behaviors, we define submovements and oscillations: as discrete movements cannot be combined with sufficient flexibility, we argue that suitably-defined submovements are primitives. As the term "rhythmic" may be ambiguous, we define oscillations as the corresponding class of primitives. We further propose mechanical impedances as a third class of dynamic primitives, necessary for interaction with the physical environment. Combination of these three classes of primitive requires care. One approach is through a generalized *equivalent network*: a virtual trajectory composed of simultaneous and/or sequential submovements and/or oscillations that interacts with mechanical impedances to produce observable forces and motions. Reliable experimental identification of these dynamic primitives presents challenges: identification of mechanical impedances is exquisitely sensitive to assumptions about their dynamic structure; identification of submovements and oscillations is sensitive to their assumed form and to details of the algorithm used to extract them. Some methods to address these challenges are presented. Some implications of this theoretical framework for locomotor rehabilitation are considered.

**Keywords: discrete, submovement, rhythmic, oscillation, impedance, primitive, locomotion, rehabilitation**

# **INTRODUCTION**

In a recent publication, we asserted a pressing need for a fundamental mathematical theory to help organize and structure the prodigious volume of knowledge about sensorimotor control (Hogan and Sternad, 2012). We contend that such a theory has come within reach, though we anticipate that its development will require a process of continuous and incremental revision. While it is common practice to develop mathematical models for narrowly-specified sensorimotor tasks, to establish a reliable theoretical foundation it is necessary to take a broader perspective and consider the widest feasible range of behaviors—even if for no other reason than to uncover and confront facts that might prove embarrassing to a narrowly-formulated theory. Previously we outlined a theoretical framework for upper-extremity motor control that could encompass those quintessentially human behaviors, object manipulation and the use of tools. The goal of this essay is to extend this framework to lower-extremity motor control. To illustrate the potential value of such a theory, we consider some of its possible implications for locomotor rehabilitation.

Of course, we acknowledge that an integrated theory of upper- and lower-extremity motor control is ambitious, but it ought to be possible—after all, there is only one central nervous system (CNS). Moreover, many commonplace actions require integrated control and coordination of upper and lower extremities, indeed of the entire body. For example, drilling a horizontal hole in a vertical wall using a hand-held drill is commonly performed in a standing position. Therefore, the force exerted by the hand on the drill and wall necessitates tangential force on the ground at the feet. In fact, almost all of the body's degrees of freedom must be coordinated—essentially everything between the hands and feet. The horizontal force results in an overturning moment that must be offset by displacing the center of gravity from the center of pressure below the feet, and a sufficiently strong hand force is typically accomplished by moving the center of gravity far beyond the base of support—i.e., by leaning hard into the push or pull (Dempster, 1958; Rancourt and Hogan, 2001). That is a common cause of falls if the horizontal force exceeds the frictional force between feet and ground and the feet slip (Grieve, 1983). Moreover, with feet together in this leaning posture, an unstable dynamic zero is introduced such that the hand force cannot *decrease* without transiently *increasing*, and vice-versa (Rancourt and Hogan, 2001). With feet far apart, that dynamic zero can be eliminated. The essential point is that the configuration of the *feet* dictates the dynamics of force exertion by the *hands*.

Even aside from the need to integrate upper- and lower-extremity motor control, the spectacular agility of human locomotion demands explanation. Even walking, that most mundane of behaviors, is a subtle and complex dynamic process. Despite intensive and ongoing research, the dynamics of human walking have yet to be reproduced by robots, even though they have actuators faster than muscle by factors of tens to thousands, and communication faster than neurons by a factor of a million or more (Kandel et al., 2000; Hogan and Sternad, 2012). But locomotor behavior is far more versatile than walking. For example, soccer, arguably the world's most popular sport, not only requires agile high-speed maneuvering to avoid equally agile opponents, but controlling the ball requires dexterity with the legs and feet comparable to that of the hands and fingers. In comparison, robot soccer—though fun, highly motivating, and a superb enticement to study science and engineering—is a pale shadow of the "beautiful game."

# **DYNAMIC PRIMITIVES**

Why is human locomotion so agile despite the limitations of our neuro-mechanical system? We believe that the answer lies in the distinctive character of human motor control. Mounting evidence indicates that sensorimotor control relies on a composition of primitive dynamic actions (Sternad et al., 2000; Thoroughman and Shadmehr, 2000; Flash and Hochner, 2005; Kargo and Giszter, 2008; Sternad, 2008; Sing et al., 2009; Degallier and Ijspeert, 2010; Dominici et al., 2011). We propose that *human motor control is encoded solely in terms of these primitive dynamic actions*.

Part of the challenge of controlling locomotor behavior is the high-dimensional, strongly nonlinear, hybrid character of the mechanical dynamics. "Hybrid" in this context refers to a mixture of discrete-event dynamics (each foot making or breaking contact with the ground changes the structure of the dynamic equations) and continuous dynamics (the motion of the skeleton in response to muscular action). With one foot on the ground, the human skeleton has on the order of 200 degrees of freedom; with two feet on the ground, a closed-chain kinematic constraint adds to the complexity. Moreover, kinematically-constrained rigid-body mechanics as described by Lagrange's equations is at best an approximation. Soft tissues contribute significantly to musculo-skeletal dynamics and add more degrees of freedom, e.g., via the deformation of muscles or body fat. For example, the impact due to heel strike can cause the mass of the calf muscles to resonate with the elasticity of passive tissues such as the Achilles tendon (Wakeling and Nigg, 2001; Wakeling et al., 2003). That phenomenon cannot be described by a model with only kinematically-constrained rigid bodies. The human body is a forbiddingly complex dynamic object. As we outline below, control via primitive dynamic actions may provide a way to manage this complexity.

The idea that motor control is accomplished by combining primitive elements is not at all new but the full extent of its ramifications for motor control may not yet have been fully articulated. The search for primitive elements that generate motor actions dates back at least a century. Sherrington proposed stereotyped neuromuscular responses to sensory events—the *reflexes*—as building blocks of more complex actions (Sherrington, 1906; Gallistel, 1980; Elliott et al., 2001). The subsequent wave of behaviorist psychology explored how stimulus-response associations (S-R units) could become an "alphabet" for complex behavior. Learning a new action would comprise "chaining" such S-R units or reflexes such that each reflexive action resulted in sensory events that "triggered" the next (Bässler, 1986).

Discrete and rhythmic movements have been proposed as candidates for two classes of primitive actions (Schaal et al., 2000, 2003; Sternad et al., 2000; Sternad, 2008; Ijspeert et al., 2013). They have been termed *dynamic* primitives as they refer to patterns of behavior that may robustly emerge from dynamic systems. To explain, two of the prominent behaviors exhibited by non-linear dynamic systems are *point attractors* and *limit cycles*; a point attractor may describe a discrete movement to a stable posture; a limit cycle may describe a rhythmic movement. Even some of the simplest dynamic systems can exhibit these behaviors, as may be seen by considering the class of negative-resistance oscillators from engineering (Strauss, 1970). Those second-order dynamic systems can exhibit robustly sustained oscillation (limit cycle behavior) or stable convergence to a single state (point attractor behavior); changing the value of a single parameter is sufficient to select or induce a transition between these two alternatives. More biologically plausible models of neural oscillators exhibit similar properties, thereby lending support to these mechanisms as generators of observable behavior (Fitzhugh, 1961; Nagumo et al., 1962; Matsuoka, 1985; Ronsse et al., 2009).
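The van der Pol equation is a standard example of such a negative-resistance oscillator: flipping the sign of its single parameter switches between limit-cycle and point-attractor behavior. The following simulation is our illustration of that switch, not code from the cited works:

```python
import numpy as np
from scipy.integrate import solve_ivp

def van_der_pol(mu):
    """x'' - mu*(1 - x^2)*x' + x = 0 as a first-order system."""
    return lambda t, s: [s[1], mu * (1.0 - s[0] ** 2) * s[1] - s[0]]

t_span, s0 = (0.0, 200.0), [0.5, 0.0]
sol_cycle = solve_ivp(van_der_pol(+1.0), t_span, s0, max_step=0.05)
sol_point = solve_ivp(van_der_pol(-1.0), t_span, s0, max_step=0.05)

# mu > 0: sustained oscillation; the state converges to a limit cycle
# of amplitude near 2 regardless of the (small) initial condition.
amp_cycle = np.abs(sol_cycle.y[0][-1000:]).max()
# mu < 0: positive effective damping near the origin; the same initial
# condition decays to rest (point attractor).
amp_point = np.abs(sol_point.y[0][-1000:]).max()
```

The same dichotomy appears in the neural oscillator models cited above; here a single scalar parameter selects between the two attractor types.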

Discrete and rhythmic movements describe unconstrained behavior but frequent physical contact with the ground is an inescapable aspect of human locomotion. A different class of dynamic primitives is required to manage that physical interaction. Locomotion is often described as "controlled falling," yet most of the control occurs not during the fall, but during the sudden stop at its end. During single-legged support the body behaves approximately like an inverted pendulum and available control authority is quite limited. Most of the opportunities for control arise from the behavior of the swing leg as it contacts the ground. This dynamic "shock absorber" behavior is characterized by mechanical impedance. Controllable mechanical impedance is required as a third class of dynamic primitives to account for interaction, in locomotion as in object manipulation. We contend that, taken together, these three dynamic primitives may account for a wide range of behavior.

# **LEVELS OF ANALYSIS**

To understand how dynamic primitives might account for human locomotor control, we distinguish between (at least) three levels of analysis: an *observational level* of overt, measurable behavior; a *combinatorial level* at which the dynamic primitives may be combined; and a *physiological level* from which the dynamic primitives may actually arise—e.g., through a combination of muscular and/or neural dynamics giving rise to submovements, oscillations, and impedances. These levels are loosely analogous to Marr's three levels of analysis—computational, algorithmic, and implementational (Marr, 1982). However, Marr's levels refer to computation or information-processing, specifically for vision. While control of locomotion also involves computation or information processing, the control of physical interaction is essential and not adequately subsumed under information processing.

A failure to distinguish between these levels—observational, combinatorial, and physiological—all too frequently confounds sensorimotor neuroscience. The definitions of dynamic primitives we propose below describe *product* rather than *process*. That is, in an attempt to establish a foundation, we focus on the phenomenology of motor behavior, not on specific hypothesized mechanisms that may give rise to that observable behavior. For clarity, we define dynamic primitives in the mechanical domain of motions and forces at the interface (points of contact) between the neuro-mechanical system and the physical world.

### **ATTRACTORS**

We define dynamic primitives as patterns of behavior that robustly emerge from dynamic systems, that is, as *attractors*. For example, a reasonably general representation of a dynamic system describes the evolution of behavior in a finite-dimensional state space, $\dot{\mathbf{x}} = f(\mathbf{x})$, where $\mathbf{x} \in \mathbb{R}^n$ for finite $n$. An attractor is a subset of state space with at least two properties: first, it is an *invariant set*: if the system begins in an invariant set, it never leaves it. Second, that invariant set is *attractive*: if the system starts sufficiently close to it, the system will ultimately converge to the attractor. Attractor sets may have many forms. A *point attractor* is a single point in state space. An attractor set that is a closed path (or orbit) defines a *limit cycle*. There are alternatives: any feasible path in state-space—any trajectory—may be an attractor; this may describe discrete reaching movements, which exhibit trajectory stability (Lackner and Dizio, 1994; Shadmehr and Mussa-Ivaldi, 1994; Won and Hogan, 1995; Burdet et al., 2001). Other subsets of state-space (e.g., manifolds) may also be attractors; these may describe synergies. Chaotic dynamic systems may have *strange attractors*, prodigiously complex objects with fractal geometry, and there is evidence that locomotion may exhibit chaotic dynamics (Strogatz, 1994; Hausdorff et al., 1995, 2001).

One important feature of this definition of dynamic primitives is that an attractor exhibits a degree of robustness that might be termed "*temporary permanence*": permanence due to robustness to perturbation, temporary due to the fact that dynamic primitives may have limited duration. The pattern of behavior described by the invariant set will re-emerge after perturbation, at least for sufficiently small perturbations.

An important consequence of defining dynamic primitives as attractors is that this definition points to experiments that might test their objective reality (at least in principle). Due to the robustness of the attractor, a dynamic primitive should manifest as a common pattern of behavior observable in different contexts and despite the presence of noise or perturbations. This feature may lend itself to experimental testing.

### **DISCRETE MOVEMENTS AND SUBMOVEMENTS**

An important requirement for a theory based on primitives is "composability"—it should be possible to combine the elements to generate a repertoire of behavior. In previous work we proposed precise quantitative definitions of discrete and rhythmic movements (Hogan and Sternad, 2007). Our definitions were deliberately confined to the behavioral or observational level, remaining silent about possible generative processes that might give rise to these observations. For a movement to be discrete, i.e., distinct from other movements, we reasoned that any consistent definition requires that it should begin and end with a period of no movement. With that definition, discrete movements can only be sequenced and cannot overlap in time. That would severely restrict the repertoire that could be generated.

To overcome this limitation, we propose that *submovements* are primitive dynamic elements of motor behavior. In essence, submovements are like discrete movements but they may overlap in time and their profiles may superimpose. A submovement is conceived as a *coordinative atom*: just as atoms are primitive units of chemical reactions, submovements are elements of dynamic coordination used to compose motor behavior. Just as atoms have complex internal structure, submovements may require complex patterns of neuromuscular activity to instantiate the dynamic process from which a submovement emerges as an attractor.

We define a submovement as an attractor that describes a smooth sigmoidal transition of a variable from one value to another with a stereotyped time-profile. For limb position, the variable is a vector in some coordinate frame, $\mathbf{x} = [x_1, x_2, \ldots, x_n]^T$. If it is foot position in visually-relevant coordinates, the elements of $\mathbf{x}$ might be the positions and orientations of the foot ($n \le 3$ for location and $n \le 6$ if orientation is included). If it describes the configuration of the entire limb, the number of coordinates may be substantially greater, e.g., all of the relevant joint angles. Each coordinate's speed profile has the same shape, which is nonzero for a finite duration $d = e - b$, where $b$ is the time when the submovement begins and $e$ is the time it ends, i.e., it has *finite support*:

$$\dot{x}\_j(t) = \hat{v}\_j \, \sigma(t), \quad j = 1 \ldots n$$

where *v̂<sub>j</sub>* is the peak speed of element *j*; σ*(t)* *>* 0 if *b < t < e* and σ*(t)* = 0 if *t* ≤ *b* or *e* ≤ *t*. The speed profile has only one peak: there is only one point *t<sub>p</sub>* ∈ *(b, e)* at which σ˙*(t<sub>p</sub>)* = 0, and at that point, σ*(t<sub>p</sub>)* = 1.
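
The definition above constrains only the qualitative features of σ*(t)*: finite support, a single peak, and a unit maximum. As a minimal sketch, the minimum-jerk speed shape 16τ²(1 − τ)² satisfies all three properties; the choice of this particular shape is an illustrative assumption, not part of the definition.

```python
import numpy as np

def sigma(t, b=0.0, d=1.0):
    """Unit-peak, single-peaked speed profile with finite support (b, b + d).

    The minimum-jerk shape 16*tau^2*(1 - tau)^2 is used purely as an
    illustrative choice; the definition in the text requires only finite
    support, a single peak, and a peak value of 1.
    """
    tau = (np.asarray(t, dtype=float) - b) / d
    s = 16.0 * tau**2 * (1.0 - tau)**2
    # zero outside the support interval (b, b + d)
    return np.where((tau > 0) & (tau < 1), s, 0.0)
```

With *b* = 0 and *d* = 1 the profile peaks at exactly 1 at the midpoint *t* = 0.5 and vanishes outside (0, 1), as the definition requires.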

Note that this definition is deliberately silent about possible generative dynamic processes that might give rise to submovements. However, some constraints on those processes can be identified. It may seem that a dynamic process with a point attractor is appropriate. However, physiological evidence shows that at least in reaching movements, the CNS does not simply specify final position (Bizzi et al., 1984; Won and Hogan, 1995). It is the trajectory, rather than final position, that is controlled. Further, this trajectory has a stereotyped time profile (Atkeson and Hollerbach, 1985). This dynamic primitive may be termed a "trajectory attractor."

### *Composability*

Submovements may be considered as *basis functions* and combined with overlap in time to produce a wide range of motion profiles. Though several combination operators are possible, linear vector superposition of discrete point-to-point reaching movements has been shown to provide an accurate description of movement trajectories in which a target shifts abruptly (Flash and Henis, 1991). Combining *m* submovements yields

$$\dot{x}\_j(t) = \sum\_{k=1}^{m} \hat{v}\_{jk} \, \sigma(t \,|\, b\_k, d\_k), \quad j = 1 \ldots n$$

where each submovement *k* has the same shape but may have a different peak speed *v̂<sub>jk</sub>*, start time *b<sub>k</sub>*, and duration *d<sub>k</sub>*.
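
A minimal sketch of this superposition, again assuming a minimum-jerk-shaped σ and arbitrary example parameters (not values from the text):

```python
import numpy as np

def composite_speed(t, submovements):
    """Superpose m submovements: x_dot(t) = sum_k v_k * sigma(t | b_k, d_k).

    `submovements` is a list of (peak_speed, start_time, duration) triples;
    the minimum-jerk shape is an illustrative choice for sigma.
    """
    t = np.asarray(t, dtype=float)
    total = np.zeros_like(t)
    for v_peak, b, d in submovements:
        tau = (t - b) / d
        s = 16.0 * tau**2 * (1.0 - tau)**2   # same unit-peak shape for all
        total += v_peak * np.where((tau > 0) & (tau < 1), s, 0.0)
    return total
```

For two overlapping submovements (e.g., starts at 0.0 and 0.5, each of duration 1.0), the composite speed profile is nonzero over the union of their supports and can exceed either component's peak in the overlap region.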

Composability has its drawbacks. One important disadvantage is the concomitant difficulty of identifying submovements unambiguously from a continuous motion record. Some responses to this challenge are discussed below.

### **RHYTHMIC MOVEMENTS AND OSCILLATIONS**

From a strictly mathematical perspective, a rhythmic dynamic primitive is not essential. Rhythmic movements could be described parsimoniously as a composite of overlapping submovements in opposite directions. However, rhythmic movement is very old phylogenetically. Available evidence indicates that oscillatory behavior of both upper and lower extremities is a *distinct* dynamic primitive element of biological motor control and not a composite of submovements (Brown, 1911, 1914; Grillner and Wallen, 1985; Schaal et al., 2004).

Because the term "rhythmic" has numerous confusing variations of meaning, for precision we denote the corresponding dynamic primitive as an *oscillation* (Hogan and Sternad, 2007). Describing limb position as a vector quantity, **x** = [*x*<sub>1</sub>, *x*<sub>2</sub> ... *x<sub>n</sub>*]<sup>*t*</sup>, we define the primitive as an attractor that describes almost-periodic motion:

$$\left| x\_j(t) - x\_j(t + \Delta t + lT) \right| < \varepsilon\_j \quad \forall t, \;\; l = 0, \pm 1, \pm 2, \ldots, \;\; j = 1 \ldots n$$

where *T* is a constant (its smallest value is the period), |Δ*t*| *<* δ, and ε<sub>*j*</sub> and δ are small constants. This definition allows for the ubiquitous fluctuations exhibited in biological behavior, whether due to stochastic processes (noise) or deterministic chaos (Raftery et al., 2008). The main point of this definition is that the *average* time-course of an almost-periodic behavior is strictly periodic. The amplitude and phase of each vector component may differ, but all components exhibit an average time-variation with the same shape and period, *T*.

As with submovements, this definition is deliberately silent about possible generative processes that might give rise to these observations. However, it seems reasonable to conjecture that oscillations emerge from a generative dynamic process with a limit cycle attractor (Kay et al., 1991; Rabinovich et al., 2006).

### *Composability*

Evoking Fourier's theorem, it is clear that a wide range of almost-periodic behaviors may be composed by superposition of oscillatory primitives,

$$\overline{x}\_j(t) = \sum\_{k=1}^{m} \hat{x}\_{jk} \, s(t \,|\, T\_k, \phi\_k), \quad j = 1 \ldots n$$

where the overbar denotes an average, *s(t* | *T<sub>k</sub>*, φ*<sub>k</sub>)* is a sinusoid with period *T<sub>k</sub>* and phase φ*<sub>k</sub>* as parameters, and *x̂<sub>jk</sub>* is its amplitude. However, as with submovements, composability also implies a challenge. Unless their form is known precisely, unambiguous identification of the type and number of oscillatory primitives from a continuous motion record is problematic.
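
A minimal sketch of this superposition of sinusoidal primitives, with arbitrary example amplitudes, periods, and phases:

```python
import numpy as np

def oscillation(t, components):
    """Superpose sinusoids: x(t) = sum_k a_k * sin(2*pi*t / T_k + phi_k).

    `components` is a list of (amplitude, period, phase) triples; the
    specific values used in any example are illustrative only.
    """
    t = np.asarray(t, dtype=float)
    return sum(a * np.sin(2.0 * np.pi * t / T + phi) for a, T, phi in components)
```

If the component periods are commensurate (e.g., 1.0 and 0.5), the composite is strictly periodic with the longest period, mirroring Fourier composition of a periodic signal.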

### **MECHANICAL IMPEDANCE**

To account for contact and physical interaction with the ground, a third class of dynamic primitives is required: mechanical impedance. Loosely speaking, mechanical impedance is a generalization of stiffness to encompass nonlinear dynamic behavior (Hogan, 1985). Mathematically, it is a *dynamic operator* that determines the force (time-history) evoked by an imposed displacement (time-history). The force and displacement must be *energetically conjugate*; that is, they must refer to the same point(s) so that incremental mechanical work *dW* may be defined, i.e.,

$$dW = \mathbf{f}^t \, d\mathbf{x} = \sum\_{j=1}^{n} f\_j \, dx\_j$$

where **x** = [*x*<sub>1</sub>, *x*<sub>2</sub> ... *x<sub>n</sub>*]<sup>*t*</sup> is a vector of positions and **f** = [*f*<sub>1</sub>, *f*<sub>2</sub> ... *f<sub>n</sub>*]<sup>*t*</sup> is a vector of forces, both defined with respect to any suitable coordinate frame. A mechanical impedance operator **Z** maps displacement onto the conjugate force.

$$\mathbf{f}(t) = \mathbf{Z}\left\{\Delta \mathbf{x}(t)\right\}$$

The form of this mapping may be nonlinear and time-varying. For convenience we often assume a state-determined representation

$$\dot{\mathbf{z}} = Z\_{s}(\mathbf{z}, \Delta \mathbf{x}, t)$$

$$\mathbf{f} = Z\_{o}(\mathbf{z}, \Delta \mathbf{x}, t)$$

where **z** = [*z*<sub>1</sub>, *z*<sub>2</sub> ...]<sup>*t*</sup> is a vector of state variables and *Z<sub>s</sub>* and *Z<sub>o</sub>* are algebraic functions. For brevity, we often omit the "mechanical" prefix.
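
As a minimal sketch of the state-determined *Z<sub>s</sub>* / *Z<sub>o</sub>* pair, consider a Maxwell element (a spring of stiffness *k* in series with a damper *c*), with the damper extension as the single state variable; the element and the parameter values are illustrative assumptions, not models from the text.

```python
import numpy as np

def maxwell_impedance(dx, dt, k=100.0, c=10.0):
    """Discrete-time sketch of a state-determined impedance Z: {dx(t)} -> {f(t)}.

    Maxwell element (spring k in series with damper c), an illustrative
    example with arbitrary parameter values:
        z_dot = Z_s(z, dx) = k * (dx - z) / c   (damper extension rate)
        f     = Z_o(z, dx) = k * (dx - z)       (spring force)
    """
    z = 0.0
    forces = []
    for x in dx:
        f = k * (x - z)        # output equation Z_o
        z += dt * f / c        # state equation Z_s (forward Euler step)
        forces.append(f)
    return np.array(forces)
```

A step displacement held constant evokes an initial force *k*·Δ*x* that then relaxes as the damper extends, illustrating why the force history, not just the instantaneous displacement, is determined by the operator.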

The displacement inputs need not be at the same physical location in space, provided they can be paired with energetically-conjugate forces. For example, the several joints of the lower extremity (hip, knee, ankle, etc.) are in different physical locations. The limb configuration may be described using joint angles, **θ** = [θ<sub>1</sub>, θ<sub>2</sub> ... θ<sub>*n*</sub>]<sup>*t*</sup>, a special case of *generalized coordinates* (Goldstein, 1980). The corresponding *generalized forces* (joint torques) **τ** = [τ<sub>1</sub>, τ<sub>2</sub> ... τ<sub>*n*</sub>]<sup>*t*</sup> are defined such that incremental mechanical work may be defined.

$$dW = \boldsymbol{\tau}^t \, d\boldsymbol{\theta} = \sum\_{j=1}^{n} \tau\_j \, d\theta\_j$$

Joint mechanical impedance maps joint angular displacements onto the evoked joint torques.

$$\boldsymbol{\tau}(t) = \mathbf{Z}\_{\text{joint}} \left\{ \Delta \boldsymbol{\theta}(t) \right\}.$$

As with submovements and oscillations, humans can voluntarily control mechanical impedance (Hogan, 1979, 1980, 1984, 1985; Burdet et al., 2001; Franklin and Milner, 2003; Franklin et al., 2007). The most obvious way is by co-contraction of antagonist muscle groups, but the configuration of the limb also has a profound effect—posture modulates impedance (Hogan, 1990). During locomotion the mechanical impedance of the lower limb at the point of contact with the ground, and hence the way it absorbs or transmits the shock of impact to the rest of the body, depends strongly on whether first contact is made with the heel or the ball of the foot, and on whether the leg is straight or the knee slightly flexed.

Mechanical impedance is a different kind of primitive than a submovement or oscillation; nevertheless it has properties of an attractor as we identified above. Mechanical impedance is extremely robust to contact and interaction. While the force and motion of the foot are obviously sensitive to contact with the ground, mechanical impedance at, say, the ankle is a property that emerges solely from the dynamics of the neuro-mechanical system supporting the foot and is completely independent of contact. It exhibits the robustness that we require for a dynamic primitive. Neuro-muscular mechanical impedance depends on the intrinsic physical properties of the muscular and skeletal systems but it is also influenced by neural feedback loops, especially those involving muscle spindles and Golgi tendon organs at the spinal level or higher. A compelling case has been made that one important function of these feedback loops is to maintain the mechanical impedance of the neuro-muscular actuator (Nichols and Houk, 1976; Hoffer and Andreassen, 1981). Undesirable impedance reduction due to cross-bridge detachment evoked by imposed displacement is corrected by enhanced neural activation; this makes the impedance an attractor of the closed-loop system. Moreover, it is known that the gains of these feedback pathways are highly modifiable, either via gamma motoneuron activity or via descending drive to spinal interneuron pools (Prochazka et al., 2000). Thus, the attraction to a particular impedance that these feedback loops provide has the *temporary permanence* that we believe is a hallmark of dynamic primitives.

### *Composability: superposition of impedances*

A remarkable feature of mechanical impedance is that, when coupled to skeletal inertia, *non-linear impedances may be combined by linear superposition* (Hogan, 1985). That is, given a set of different impedances {**Z**<sub>1</sub>, **Z**<sub>2</sub>, ... **Z**<sub>*m*</sub>} appropriate for different aspects of a task, the total impedance is

$$\mathbf{Z}\_{\text{total}} = \sum\_{k=1}^{m} \mathbf{Z}\_{k}$$

even if any or all of the component impedances **Z**<sub>*k*</sub> are nonlinear. This superposition property is among the reasons why modulating mechanical impedance is a particularly efficacious way to control interaction tasks (Toffin et al., 2003; Hogan and Buerger, 2004; Franklin et al., 2007). It is also among the reasons why we believe that mechanical impedance is an essential dynamic primitive for contact tasks.
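
A minimal sketch of this superposition for the simplest (static, memoryless) special case, using an illustrative cubic spring and linear spring as the nonlinear and linear components:

```python
def total_impedance_force(dx, impedances):
    """Linear superposition of (possibly nonlinear) impedances: f = sum_k Z_k{dx}.

    Each element of `impedances` maps a displacement to a force. Static,
    memoryless components are used here only as the simplest illustration
    of the superposition rule.
    """
    return sum(Z(dx) for Z in impedances)

# Illustrative components (arbitrary example coefficients):
Z_linear = lambda dx: 50.0 * dx            # linear spring
Z_cubic = lambda dx: 2000.0 * dx ** 3      # nonlinear (cubic) spring
```

The total force evoked by a displacement is simply the sum of the forces each component would evoke alone, even though the cubic component is nonlinear.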

### **COMBINING DIFFERENT CLASSES OF DYNAMIC PRIMITIVES**

A theory based on dynamic primitives requires specification of how those primitives may be combined. An example may illustrate the challenge: a successful soccer kick requires skillful placement of the stance foot relative to the ball, a vigorous but carefully controlled motion of the swinging leg, and determination of appropriate impedance between foot and ball at the moment of contact, usually against the background of rhythmic running<sup>1</sup>. It is therefore essential to specify how the different dynamic primitives interact to produce observable forces and/or motions.

To do so we use the construct of a *virtual trajectory*, denoted **x**<sub>0</sub>. It summarizes the net motion due to commands from the CNS when the force exerted is identically zero. We make the mild assumption that the mechanical impedance is such that if the force is identically zero, the corresponding displacement is also identically zero: **f** ≡ 0 ⇒ Δ**x** ≡ 0, or in words, if force and all of its time derivatives and integrals are identically zero, then the corresponding displacement and all of its time derivatives and integrals are also identically zero. This allows us to *define* the displacement input to the impedance operator, Δ**x**, to be the difference between the virtual (zero-force) and actual (non-zero-force) trajectories: Δ**x** = **x**<sub>0</sub> − **x**, or in joint coordinates, Δ**θ** = **θ**<sub>0</sub> − **θ** (Hogan, 1985). If the force is zero, the virtual and actual trajectories coincide. If the force is non-zero, the virtual trajectory **x**<sub>0</sub>*(t)* may be inferred from a knowledge of mechanical impedance **Z**, force **f***(t)*, and actual motion **x***(t)* as **x**<sub>0</sub>*(t)* = **x***(t)* + **Z**<sup>−1</sup>{**f***(t)*}. This requires the inverse mapping **Z**<sup>−1</sup>{·} to exist. Note that the magnitude of impedance may be small provided it is non-zero.
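
A minimal sketch of this inference for the simplest case, assuming (purely for illustration) a static linear impedance **f** = *K*(**x**<sub>0</sub> − **x**), so that **Z**<sup>−1</sup> reduces to division by the stiffness *K*:

```python
import numpy as np

def virtual_trajectory(x, f, K=200.0):
    """Infer the virtual trajectory x0(t) = x(t) + Z^{-1}{f(t)}.

    Assumes, purely for illustration, a static linear impedance
    f = K * (x0 - x), with an arbitrary example stiffness K, so that
    Z^{-1} is simply division by K.
    """
    x = np.asarray(x, dtype=float)
    f = np.asarray(f, dtype=float)
    return x + f / K
```

If a force record is generated from a known virtual trajectory through this same impedance, the construction recovers the virtual trajectory exactly; with a richer (dynamic) impedance model, **Z**<sup>−1</sup> would be a dynamic operator rather than a simple gain.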

This is not the only way these three classes of dynamic primitives (submovements, oscillations, and impedances) might be combined, but an advantage of this construction is that it defines a non-linear extension of the *equivalent networks* widely used in engineering to describe physical interaction between dynamic systems, e.g., an audio amplifier and the speakers it drives (Hogan, 1985; Johnson, 2003a,b; Hogan, in revision). According to our view of dynamic primitives, the virtual trajectory **x**<sub>0</sub>*(t)* specified by the CNS may be composed of *submovements* and/or *oscillations*. Based on the difference between virtual and actual trajectories, *impedances* specified by the CNS determine the forces evoked by contact. With this representation, much prior engineering insight about dynamic interaction in machines may be re-purposed to help understand physical interaction in locomotion.

### **RELATION TO THE LAMBDA HYPOTHESIS**

The virtual trajectory is related to the "lambda" or "equilibrium-point" hypothesis but is also distinct from it in important ways. A common theme running through the several variants of the lambda hypothesis is the proposal that the CNS encodes motor commands as time-varying equilibrium postures (Feldman, 1966, 1986; Feldman and Latash, 2005). This is a proposed description of at least part of the process of generating movement. However, the mere existence of an "instantaneous equilibrium point," though not guaranteed, is not by itself very surprising from a physiological perspective; for example, the variation of muscle tension with length may suffice. Therefore, an instantaneous equilibrium point does not by itself provide compelling evidence about how the CNS encodes motor commands.

<sup>1</sup>To understand the importance of mechanical impedance, consider the difference between kicking a ball and "trapping" it; the former requires stiffening the ankle to transfer momentum to the ball; the latter requires relaxing the foot to "deaden" the bounce.

To define a virtual trajectory only requires that mechanical impedance—a physically measurable quantity—has a well-defined zero as described above. Most descriptions of the neuromuscular actuator satisfy this requirement. If so, an "instantaneous equilibrium point" (which we term a virtual position) may always be defined. This construct is a consequence of observable neuro-muscular mechanics. It is a description of behavior (the *product*) and may have no relation to how the CNS goes about producing that behavior (the *process*). Existence of a virtual trajectory does not require or imply that the CNS knows about or uses this construct for control. Indeed, available evidence suggests that this would be at best an incomplete account (Lackner and Dizio, 1994).

# **DYNAMIC PRIMITIVES IN LOCOMOTION**

What is the evidence that these dynamic primitives describe the control of locomotion? Walking clearly exhibits a strongly rhythmic character, but that by itself is not sufficiently informative; walking could be a sequence of discrete (or overlapping) steps and in some cases—e.g., the slow pacing used in a funeral march—it may be. Furthermore, what is the role of mechanical impedance?

### *Role of oscillations and submovements*

Observations of fictive locomotion in non-human vertebrates provide unequivocal evidence that neural circuits capable of generating an oscillatory dynamic primitive—sustained rhythmic activity—exist in the spinal cord isolated from its periphery, though sensory feedback is known to play a key role (Brown, 1911; Grillner and Wallen, 1985; Kriellaars et al., 1994; Stein et al., 1995; Cazalets et al., 1996; Grillner et al., 1998; Pearson et al., 2004). For unimpaired humans, continuous leg muscle vibration produced locomotor-like stepping movements, and spinal electromagnetic stimulation applied to unimpaired human vertebrae induced involuntary locomotor-like movements (Gurfinkel et al., 1998; Gerasimenko et al., 2010). That suggests the existence of a rhythmic central pattern generator (CPG) in the human spinal cord that may contribute to generating locomotor activity, though feedback elicited by limb loading, hip extension, or pressure on the sole of the foot also plays an important role (Grillner and Wallen, 1985; van Wezel et al., 1997; Dietz and Harkema, 2004).

However, the relative contribution of rhythmic pattern generation to unimpaired human locomotion remains unclear. Human infants exhibit a primitive rhythmic stepping reflex but it typically disappears at about 6 weeks after birth without training (Yang and Gorassini, 2006). When independent walking emerges at about one year old, it does not initially exhibit the rhythmic pattern of mature walking and this cannot be ascribed to immature postural control (Ivanenko et al., 2005). Furthermore, the locomotor-like movements evoked by stimuli to unimpaired human subjects were observed in a gravity-neutral position, unlike normal walking, rendering it difficult to assess how those results would apply to upright walking (Gurfinkel et al., 1998; Gerasimenko et al., 2010).

Walking in unimpaired adults is characterized by a remarkably repeatable spatial trajectory of the foot (Ivanenko et al., 2002). In response to surface irregularity in the form of small obstacles, subjects adjusted their minimum toe clearance using subtle adjustments of lower-limb kinematics (Schulz, 2011). Patients with spinal cord injury (SCI) who recovered following bodyweight supported treadmill training generated a foot trajectory that closely matched the normal pattern, although they used very different joint coordination patterns to do so (Grasso et al., 2004). Together, these observations suggest the presence of a trajectory attractor underlying foot motion similar to that underlying hand motion in simple reaching movements (Bizzi et al., 1984; Lackner and Dizio, 1994; Shadmehr and Mussa-Ivaldi, 1994; Won and Hogan, 1995; Burdet et al., 2001).

Given our definition of dynamic primitives as attractors, studying the stability properties of ambulatory behavior may help resolve this question. Robustly sustained oscillation emerges as a *limit cycle attractor* from nonlinear dynamical systems such as relaxation oscillators (van der Pol, 1926). Nonlinear limit cycle oscillators not only encapsulate the robust and stable rhythmic motion of the periphery in human walking; they also serve as competent models of neural rhythmic pattern generators (Matsuoka, 1987; Taga et al., 1991; Collins and Richmond, 1994; Taga, 1998; Rybak et al., 2006). One of their distinctive characteristics is that they may exhibit *dynamic entrainment*: under certain conditions they will synchronize their period of oscillation to that of an imposed oscillation and *phase-lock* to establish a particular phase relation with it (Bennett et al., 2002). Usually entrainment occurs only for a limited range of frequencies; it exhibits a *narrow basin of entrainment*. In fact, entrainment to periodic mechanical perturbation has been reported in several non-human vertebrates which show clear evidence of spinal pattern generators (Grillner et al., 1981; Pearson et al., 1992; McClellan and Jang, 1993; Kriellaars et al., 1994).
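
The attractor property of such oscillators can be illustrated with the van der Pol equation cited above: trajectories launched from very different initial conditions converge to the same limit cycle. The following is a minimal numerical sketch, with illustrative parameter values and a simple fixed-step integrator.

```python
def van_der_pol_amplitude(x0, v0, mu=1.0, dt=1e-3, t_end=50.0):
    """Integrate the van der Pol oscillator x'' - mu*(1 - x^2)*x' + x = 0
    and return the peak |x| over the last ~10 seconds.

    Illustrates the limit-cycle attractor: trajectories from different
    initial conditions converge to the same oscillation (amplitude near 2
    for moderate mu). Parameters and integrator are illustrative choices.
    """
    x, v = x0, v0
    n = int(t_end / dt)
    tail_start = n - int(10.0 / dt)
    peak = 0.0
    for i in range(n):
        a = mu * (1.0 - x * x) * v - x   # acceleration
        v += dt * a                      # semi-implicit Euler step
        x += dt * v
        if i > tail_start:
            peak = max(peak, abs(x))
    return peak
```

Starting near the origin (small-amplitude) and far outside the limit cycle (large-amplitude) yields essentially the same steady-state amplitude, the hallmark of a limit cycle attractor as opposed to, say, a conservative oscillator whose amplitude depends on initial conditions.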

A recent study reported behavioral evidence that a neuromechanical oscillator contributes to human walking, though perhaps weakly (Ahn and Hogan, 2012b). As unimpaired human subjects walked on a treadmill at their preferred speed and cadence, periodic torque pulses were applied to the ankle. Though the torque pulse periods were different from their preferred cadence, the gait period of 18 of 19 subjects converged to match that of the perturbation (**Figure 1A**). Significantly, this entrainment occurred only if the perturbation period was close to subjects' preferred walking cadence: it exhibited a narrow basin of entrainment. Further, regardless of the phase within the walking cycle at which the perturbation was initiated, subjects' gait synchronized or phase-locked with the mechanical perturbation at a phase of gait where it assisted propulsion. These results were affected neither by auditory feedback nor by a distractor task. However, the convergence to phase-locking was slow, requiring many tens of strides.

The existence of a basin of entrainment, however narrow, indicates that a non-linear limit-cycle attractor underlies level treadmill walking, but it does not discriminate between several physiologically-plausible mechanisms that might be responsible, e.g., a CPG in the spinal cord or a "closed chain" of reflexive actions such that each results in sensory events that "trigger" the next (Bässler, 1986; Gurfinkel et al., 1998; Gerasimenko et al., 2010). Nevertheless, a highly-simplified mathematical model in which afferent feedback triggered actuation of the trailing leg reproduced all of the features observed experimentally (**Figure 1B**): (1) a periodic bipedal walking pattern; (2) local asymptotic stability of that periodic walking pattern; (3) entrainment of that walking pattern to periodic mechanical perturbations with a narrow basin of entrainment; and (4) phase-locking to locate the perturbation at the end of double stance when entrained (Ahn and Hogan, 2012a; Ahn et al., 2012).

### *Role of musculo-skeletal mechanical impedance*

A key insight derived from that model is that stable locomotion requires energy dissipation. Although collision-free legged locomotion is physically possible, to the best of our knowledge non-elastic interaction between foot and ground, which dissipates kinetic energy, is a common characteristic of legged animal locomotion. In human locomotion, muscles do more positive than negative work, even when walking at constant average speed on level ground, which provides evidence of energy dissipation (Devita et al., 2007).

The impact between the foot and the ground happens very rapidly; foot-ground forces have significant frequency content up to 15 or 20 Hz and beyond (Antonsson and Mann, 1985; Wakeling and Nigg, 2001). The bandwidth of lower-limb muscles in response to neural excitation is no more than a couple of Hz, and the shortest transmission delay associated with spinal feedback is 50 ms or longer. As a result, reactive control of foot-ground interaction based on neural feedback is unworkable. However, musculo-skeletal mechanical impedance enables controlled reactions much faster than neural responses. Modulating shock absorption and energy dissipation depends on pre-tuning lower-limb mechanical impedance, i.e., using impedance as a dynamic primitive of motor control. The magnitude of the required shock absorption varies with walking speed, and variation of lower-limb stiffness with speed of human locomotion has been widely reported (Farley and Gonzalez, 1996; Ferris et al., 1999; Holt et al., 2003).

In the simplified mathematical model described above, ankle mechanical impedance also affected the energy added during the push-off phase; a pre-stretched spring-like muscle was released (Ahn and Hogan, 2012a; Ahn et al., 2012). Though this is at best a crude approximation to the action of lower-limb muscles, it yielded a more stable walking cycle (i.e., a larger basin of attraction) than simply modeling muscle action as generating a force or torque pulse with zero impedance. This further supports our contention that musculo-skeletal mechanical impedance is one of the essential dynamic primitives required for human locomotion.

### *Interaction between dynamic primitives in locomotion*

If dynamic primitives underlie locomotion, then interaction between them may also play an important role. One mathematical simulation study suggested that a hybrid dynamic walker was more stable when synchronized with an oscillator that acted as a clock than when it operated independently (Seipel and Holmes, 2007). Notably, the interaction between the oscillator and the periphery was exactly analogous to the equivalent network we propose: the oscillator specified a nominal limb trajectory; joint torque was exerted, determined by a simple mechanical impedance, as a function of the difference between nominal and actual limb trajectories.

In addition to rhythmic cycling of the limbs, functional locomotion requires the ability to place a foot, e.g., to avoid an obstacle or to secure an appropriate foothold on the first step of a flight of stairs. This requires the production of a discrete step against the background of an ongoing rhythm. In principle, that might be achieved by simple linear superimposition of a virtual trajectory corresponding to a submovement onto one corresponding to an oscillation. However, upper-extremity studies have shown that, against the background of rhythmic motion, the onset of a discrete action preferentially occurs at selected phases of the ongoing rhythm, which implies a nonlinear interaction (Sternad et al., 2002). A model comprising a Matsuoka oscillator coupled to antagonist muscles acting about a single joint successfully reproduced this phenomenon (De Rugy and Sternad, 2003; Ronsse et al., 2009).

Whether similar phenomena occur in human walking is, to the best of our knowledge, unknown at this time. It seems clear that a single interposed discrete step—e.g., a sidestep—does not catastrophically disrupt an ongoing walking rhythm. However, it is less clear which aspects of that rhythm exhibit the stability of an attractor. Subjects exhibit a preferred cadence and step length that appear to be robust (MacDougall and Moore, 2005). However, transient lower-limb perturbations induce phase-resetting of the walking rhythm, a persistent change of phase relative to the pre-perturbation oscillation (Nomura et al., 2009; Feldman et al., 2011). This indicates that the oscillatory lower-limb trajectory, e.g., the time history of joint angles, is not an attractor. Whether interposed side-steps evoke similar phase-resetting is a matter for future investigation.

## **IDENTIFYING DYNAMIC PRIMITIVES IN LOCOMOTION**

Some progress towards identifying dynamic primitives and their interaction in locomotion has been made. Experimental identification of impedance requires mechanical perturbation; the evoked response at the point(s) of interaction is determined by impedance. However, even the static component of multivariable joint impedance (the relation between torque and angular displacement) may be highly structured. Measurements on unimpaired subjects show a pronounced weakness in inversion-eversion, the direction of most ankle injuries (**Figure 2**). Increasing muscle activation increases stiffness but does not eliminate this relative weakness (Lee et al., 2012c).

It is common to assume that the combination of skeletal inertia and neuro-muscular impedance exhibits second-order dynamics (Dolan et al., 1993; Tsuji et al., 1995). Though that may seem reasonable, it is not necessarily correct and there is good reason to expect higher-order dynamic behavior (Wakeling and Nigg, 2001). To avoid *a-priori* assumptions about the order of the dynamics, stochastic methods may be used to identify a locally linear approximation to dynamic behavior (Palazzolo et al., 2007; Chang et al., 2012). They have been applied successfully (**Figure 3**) to identify the steady-state multi-variable dynamic impedance of the ankle (Rastgaar et al., 2009, 2010; Lee et al., 2012b).

Stochastic methods may also be extended to identify *time-varying* mechanical impedance (Lortie and Kearney, 2001). They have recently been applied (**Figure 4**) to identify a time-varying trajectory of multivariable ankle mechanical impedance during level walking (Lee et al., 2012a; Lee and Hogan, 2013).

A virtual trajectory, **x**<sub>0</sub>*(t)*, can also be measured experimentally. If the point of interaction is the sole of the foot, then during swing phase the force is identically zero, **f** ≡ 0, and because Δ**x** ≡ 0, the observed motion is the virtual trajectory. During stance phase the force is non-zero, but **x**<sub>0</sub>*(t)* may be inferred from a measurement of mechanical impedance, **Z**, force, **f***(t)*, and actual motion, **x***(t)*, as **x**<sub>0</sub>*(t)* = **x***(t)* + **Z**<sup>−1</sup>{**f***(t)*}, provided **Z**<sup>−1</sup> exists. If the point of interaction is at a joint—say, the ankle or the knee—then the dynamics between the joint and the point of force application must be identified and subtracted. The main difficulty is that estimates are exquisitely sensitive to the *assumed* order of the neuro-muscular impedance model used to infer a virtual trajectory—see Gomi and Kawato (1996) but compare with Gribble et al. (1998). However, there is no fundamental reason it cannot be determined, and model-independent experimental methods have been demonstrated (Hodgson and Hogan, 2000).

Given a measured virtual trajectory, there remains the challenge of identifying underlying motion primitives, such as submovements and oscillations. Composability, the requirement that dynamic primitives may be combined to produce behavior, may introduce ambiguities. One common approach to identifying submovements is to examine derivatives of the trajectory to identify local peaks, but that method is completely unreliable (**Figure 5**). A composite of two smooth submovements may yield one, two, or three local velocity peaks (Rohrer and Hogan, 2003).

Alternative methods use "greedy" algorithms which first find a submovement that best fits the trajectory in some suitable sense (least residual error, highest peak speed, etc.), then subtract it and repeat the procedure on the residual until the error between the sum of submovements and the original trajectory falls below a specified threshold. Unfortunately, these methods also yield spurious decompositions (**Figure 5**). Even in a simulated "test" case, where a sequence of submovements is known *a-priori* and used to compose a continuous trajectory, these methods cannot reliably recover the underlying submovements (Rohrer and Hogan, 2003).
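
A deliberately crude sketch of the greedy strategy just described: repeatedly fit the single submovement that best reduces the residual error (here by grid search over onset and duration with a least-squares amplitude), subtract it, and recurse. The minimum-jerk-shaped profile, grid resolutions, and tolerances are all arbitrary illustrative choices.

```python
import numpy as np

def mj_speed(t, v, b, d):
    """Submovement speed profile with peak speed v, onset b, duration d
    (minimum-jerk shape, used purely for illustration)."""
    tau = (t - b) / d
    s = 16.0 * tau**2 * (1.0 - tau)**2
    return v * np.where((tau > 0) & (tau < 1), s, 0.0)

def greedy_decompose(t, speed, n_max=5, tol=1e-3):
    """Greedy submovement extraction: fit the best single submovement,
    subtract it, and repeat on the residual until the error is small."""
    residual = speed.astype(float).copy()
    found = []
    for _ in range(n_max):
        best = None
        for b in np.arange(t[0], t[-1], 0.05):       # candidate onsets
            for d in np.arange(0.2, 1.5, 0.05):      # candidate durations
                shape = mj_speed(t, 1.0, b, d)
                denom = shape @ shape
                if denom < 1e-12:
                    continue
                v = (shape @ residual) / denom       # least-squares amplitude
                err = np.sum((residual - v * shape) ** 2)
                if best is None or err < best[0]:
                    best = (err, v, b, d)
        err, v, b, d = best
        found.append((v, b, d))
        residual = residual - mj_speed(t, v, b, d)
        if np.sum(residual ** 2) <= tol * np.sum(speed ** 2):
            break
    return found, residual
```

Even when the residual error becomes small, the recovered parameters need not match the submovements actually used to compose the profile, which is precisely the ambiguity the text describes.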

However, global optimization methods have been developed which avoid spurious decompositions (**Figure 6**). With these methods it has been shown that (1) the statistics of the extracted submovement parameters are robust to the assumed submovement shape and (2) the errors introduced by inappropriate submovement shapes can be detected even in the presence of substantial measurement noise (Rohrer and Hogan, 2003, 2006).

# **LOCOMOTOR REHABILITATION**

A theoretical framework based on dynamic primitives may have particular relevance for sensorimotor rehabilitation, both in the development of assistive technologies and in the design of therapeutic procedures.

# **ASSISTIVE TECHNOLOGIES**

The design and implementation of assistive orthoses and amputation prostheses has unequivocally demonstrated the importance of controllable mechanical impedance. It is a key element of recent highly-successful designs of ankle-foot orthoses and transfemoral prostheses (Blaya and Herr, 2004; Au et al., 2007; Sup et al., 2008, 2009; Ha et al., 2011; Lawson et al., 2011; Sup et al., 2011). A central feature of these designs is the equivalent network structure referred to above, which is used to combine neural and mechanical influences on how the foot interacts with the ground. However, it is less clear whether submovements and/or oscillations play a prominent role. For example, the designs by Goldfarb and colleagues implement a finite number of states arranged in a closed cycle (Sup et al., 2008). Rhythmic behavior emerges as a consequence of this closed cycle rather than from any neurally-generated oscillation. To anticipate future work, this new technology may provide essential tools to test a theory based on dynamic primitives.

### **PHYSIOTHERAPY**

A theoretical framework based on dynamic primitives may also have a substantial value for therapies to *recover* neuro-motor function rather than assist it or replace it. To date, therapeutic practices have lacked a basis in experimentally-verified theory. This is understandable because there is, as yet, little scientific consensus on the neural control of unimpaired locomotion, and certainly none on how the CNS responds to injury. Nevertheless, it is difficult to understand how rational design of therapeutic procedures might be accomplished without a fundamental theory of locomotion and its recovery.

Most rehabilitation practices tacitly assume that motor recovery is loosely analogous to unimpaired motor learning. However, unimpaired motor learning happens in an intact nervous system and is not accompanied by the common sequelae of neurological injury, which include muscular weakness, spasticity, abnormal muscle tone, abnormal synergies, and disrupted or unbalanced sensory pathways (Hogan et al., 2006). Nevertheless, the most successful form of upper-extremity robotic therapy to date was designed to incorporate principles of motor learning and it has proven effective (Krebs et al., 2003; Miller et al., 2010). It therefore seems probable that something resembling motor learning is at least part of the recovery process.

We propose that motor learning (and, by extension, recovery) consists of encoding the parameters of dynamic primitives and subsequently using them to reconstruct the primitives, rather than details of behavior. Support is found in the analysis of infant reaching movements, which initially exhibit submovements but become essentially continuous at around 6 months of age (Hofsten, 1991; Berthier, 1996). More recent work showed that the earliest movements made by patients recovering after a paralyzing stroke were composed of submovements with remarkably stereotyped speed profiles, even for different patients with different lesions (Krebs et al., 1999). This degree of robustness or "temporary permanence" makes a compelling case that submovements are, indeed, a primitive dynamic element of human motor behavior. Studies of movement changes during recovery after stroke (**Figure 7**) showed that submovements grew progressively larger, fewer, and more blended as recovery progressed (Rohrer et al., 2002, 2004; Dipietro et al., 2009). Whether similar patterns will be found in lower-extremity behavior remains a topic for future research.

Muscular weakness, spasticity and abnormal muscle tone may all manifest as disruptions of mechanical impedance. Because impedance is at the *interface* between the CNS and the physical world, inappropriate impedance may hinder the recovery of effective motor actions. We therefore expect normalization of impedance concurrently with recovery. Support is found in recent preliminary studies of how ankle impedance influences recovery of locomotion. The ankle impedance of neurologically impaired subjects was significantly different from that of age-matched healthy subjects (Roy et al., 2009, 2011; Lee et al., 2011). In robot-aided therapy, patients were seated with the foot clear of the ground ("open chain") and performed visually evoked "reaching" movements with the ankle while the robot provided graded assistance as needed. This therapy successfully resolved the abnormality and, most remarkably, resulted in a 20% improvement of over-ground walking speed (Forrester et al., 2011).

This observation seems to suggest that correcting abnormal impedance due to weakness, spasticity or abnormal muscle tone is a *pre-requisite* for recovery, but caution is appropriate; changing impedance may not be a cause of recovery but a consequence. To elaborate, neurological injury may result in weakened and/or unbalanced descending neural drive from higher levels of the CNS to the periphery. This may alter the excitability of spinal segmental neurons, e.g., increasing reflex feedback gains by reducing inhibition, and that, in turn, may alter impedance. Active participation of the patient is an essential element of the robot-aided therapy that corrected abnormal ankle impedance (Forrester et al., 2011). Active participation may have increased descending drive, leading *both* to more normal impedance and improved overground locomotion. Of course, it must be emphasized that these are mere speculations; further study is needed to test whether they contain any grain of truth.

*Figure caption (fragment)* **(Rohrer and Hogan, 2003).** Solid lines: simulated speed profiles. Dotted lines: Gaussian submovements. Dashed lines: minimum-jerk submovements. A speed profile composed of Gaussian submovements yields substantially greater fitting error when decomposed into minimum-jerk submovements (right column) and vice-versa (left column).

If motor learning is an essential part of neuro-recovery, we may expect that greater intensity of practice will yield better outcomes, and that is consistent with the success of upper-extremity robot-aided therapy. It might then seem that rhythmic practice should be most effective because it enables a greater intensity of practice—more movements per unit time than discrete movements spanning the same workspace. However, if rhythmic and discrete movements arise from distinct dynamic primitives, then learning one type may not generalize to improved performance of the other. In fact, recent studies of unimpaired subjects' adaptation showed that the benefits of rhythmic practice did not transfer to performance of discrete movements (Ikegami et al., 2010; Howard et al., 2011).

If that result generalizes to lower extremity actions, it might account for some of the difficulties that have thwarted attempts to improve locomotor therapy. Treadmill-based robot-aided therapy has been found less effective than conventional therapy and described as ". . . still in its infancy" (Miller et al., 2010). Human-administered locomotor therapy has fared little better: an extensive study of body-weight supported treadmill training found that it yielded no better outcome than a home-based exercise program that was ". . . expected to have little or no effect on the primary outcome, gait speed" (Duncan et al., 2007, 2011). Both of these treadmill-based approaches emphasized rhythmic practice of walking movements. However, any benefits may have generalized poorly to the wider context of functional walking, which may in addition require discrete actions and controlled impedance. Once again, these are no more than speculations, but they illustrate some of the insight that might be afforded by applying a theoretical framework based on dynamic primitives to both upper- and lower-extremity behavior.

# **FUTURE DIRECTIONS**

We have attempted to outline how a theory based on dynamic primitives might be applied to describe control of human locomotion as well as object manipulation and the use of tools. This outline is no more than a tentative beginning. Our first concern is that a theory should be *competent* to account for a wide range of observed behavior. Where possible, we have attempted to be faithful to the underlying physiology but that is a secondary consideration at this point; fidelity without competence is useless.

Moreover, complete fidelity is probably unattainable and certainly impractical. For example, functioning nerves and muscles require expression of genes to produce essential proteins but our present knowledge of that process and how it is controlled remains profoundly limited. Even if it were possible, a theoretical description of motor control that attempted to include that level of detail would be hopelessly cumbersome. It would defeat the main purpose of formulating a theory, to gain insight.

Any sufficiently ambitious theory will inevitably be contradicted by some experimental observations but this does not mean that it should be discarded outright. A practical theory should be *incrementally revisable* to accommodate new knowledge as it is gained. Theory-building is an iterative, ongoing process. In order for the revisions to be incremental rather than catastrophic, the foundations must be reliable. This requires the theory—and especially its foundations—to have passed the test of falsification (Ajemian and Hogan, 2010).

That reveals one of the challenges of a theory based on dynamic primitives, and may explain why it has not yet been established despite more than a century of investigation. In order to describe the wide repertoire of human behavior competently, the primitives must exhibit what we have called "composability"—observed behavior may be composed of multiple primitives overlapping in time. But composability also implies that unambiguously identifying primitives solely from measurements at the *observational* level is difficult. If the detailed form of the primitive (submovement, oscillation or impedance) is known, the problem is tractable. Without that knowledge, it is "ill-posed" in the sense that a unique solution may not exist. Some progress has been made on this problem by using optimization to provide regularization, but much remains to be done (Rohrer and Hogan, 2003, 2006).

Of course, scientific studies are not confined to this observational level. Studies at the *physiological* level may resolve the ambiguities. For example, it is not clear whether any convincing evidence of an oscillatory dynamic primitive can be found at the observational level; rhythmic movements could be a combination of back-to-back overlapping submovements in opposite directions. But physiological evidence clearly shows that rhythmic behavior cannot always be dismissed as a combination of submovements and is a distinct dynamic primitive.

Another open question is how many classes of primitives may be required. Here we have considered three—submovements, oscillations and impedances—but there are other possibilities. For example, synergies have been proposed as primitive elements of motor coordination to simplify the problem of managing the many degrees of freedom of the biological motor control system. That may be true, but it is also possible that at least some synergies may be an emergent property of mechanical impedance (Hogan and Sternad, 2012). Which of these possibilities is more competent requires further study.

We expect the parameters of individual exemplars within each class of primitives to be limited but we do not yet know the precise values of those limits. For example, a lower limit to the period of oscillatory movements seems uncontroversial (infinitely rapid movements are physiologically implausible) but there also appears to be an upper limit to the period of primitive oscillatory actions. Beyond that limit, submovements appear to predominate (Doeringer and Hogan, 1998; Dipietro et al., 2004, 2005a,b; van der Wel et al., 2010). Within their limits it is unclear whether the parameters may take on any of a continuous range of values or are confined to a finite set of values. Further research is required.

One essential aspect of our conception of motor control based on primitives is that they are attractors. That prompts the question: which attractors underlie human locomotion? They might be point attractors, e.g., to support foot placement; or trajectory attractors, e.g., to control foot trajectory; or limit-cycle attractors, e.g., to account for the orbital stability of the walking rhythm; or even chaotic attractors as reported by Hausdorff et al. (1995) and Hausdorff et al. (2001). Which of these attractors, or combinations of them, are demonstrable in human locomotion remains to be established.

To conclude, we do not have the temerity to claim that what we have outlined is yet a complete account of upper- and lower-extremity motor behavior. Yet we do contend that such a theory is possible, necessary and timely—perhaps even overdue. Its development will inevitably require considerable hard work from many contributors. To quote Ziman (1969):

*This technique, of soliciting many modest contributions to the store of human knowledge, has been the secret of Western science since the seventeenth century, for it achieves a corporate, collective power that is far greater than one individual can exert.*

The main thing is to get started. This paper is an attempt to do so.

# **ACKNOWLEDGMENTS**

Neville Hogan was supported in part by the Eric P. and Evelyn E. Newman fund and by DARPA under the Warrior Web program, BAA-11-72. Dagmar Sternad was supported by The National Institutes of Health R01-HD045639, the American Heart Association 11SDG7270001, and the National Science Foundation NSF DMS-0928587.

**Conflict of Interest Statement:** Neville Hogan holds equity in Interactive Motion Technologies, Inc., a Massachusetts corporation that manufactures robotic technology for rehabilitation. The other author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 05 February 2013; paper pending published: 17 March 2013; accepted: 12 May 2013; published online: 21 June 2013.*

*Citation: Hogan N and Sternad D (2013) Dynamic primitives in the control of locomotion. Front. Comput. Neurosci. 7:71. doi: 10.3389/fncom.2013.00071*

*Copyright © 2013 Hogan and Sternad. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

# Synaptic and functional linkages between spinal premotor interneurons and hand-muscle activity during precision grip

# *Tomohiko Takei<sup>1,2</sup>\* and Kazuhiko Seki<sup>1,2,3</sup>\**

*<sup>1</sup> Department of Neurophysiology, National Institute of Neuroscience, Tokyo, Japan*

*<sup>2</sup> Department of Developmental Physiology, National Institute for Physiological Sciences, Okazaki, Japan*

*<sup>3</sup> PRESTO, Japan Science and Technology Agency, Tokyo, Japan*

### *Edited by:*

*Andrea D'Avella, IRCCS Fondazione Santa Lucia, Italy*

### *Reviewed by:*

*Simon Giszter, Drexel Med School, USA*

*Marco Santello, Arizona State University, USA*

### *\*Correspondence:*

*Tomohiko Takei and Kazuhiko Seki, Department of Neurophysiology, National Institute of Neuroscience, 4-1-1 Ogawa-Higashi, Kodaira, Tokyo 187-8502, Japan. e-mail: takei@ncnp.go.jp; seki@ncnp.go.jp*

Grasping is a highly complex movement that requires the coordination of a number of hand joints and muscles. Previous studies showed that spinal premotor interneurons (PreM-INs) in the primate cervical spinal cord have divergent synaptic effects on hand motoneurons and that they might contribute to hand-muscle synergies. However, the extent to which these PreM-IN synaptic connections functionally contribute to modulating hand-muscle activity is not clear. In this paper, we explored the contribution of spinal PreM-INs to hand-muscle activation by quantifying the synaptic linkage (SL) and functional linkage (FL) of the PreM-INs with hand-muscle activities. The activity of 23 PreM-INs was recorded from the cervical spinal cord (C6–T1), with EMG signals measured simultaneously from hand and arm muscles in two macaque monkeys performing a precision grip task. Spike-triggered averages (STAs) of rectified EMGs were compiled for 456 neuron–muscle pairs; 63 pairs showed significant post-spike effects (PSEs; i.e., SL). Conversely, 231 of 456 pairs showed significant cross-correlations between the IN firing rate and rectified EMG (i.e., FL). Importantly, a greater proportion of the neuron–muscle pairs with SL showed FL (43/63 pairs, 68%) compared with the pairs without SL (203/393, 52%), and the presence of SL was significantly associated with that of FL. However, a significant number of pairs had SL without FL (SL∩!FL, *n* = 20) or FL without SL (!SL∩FL, *n* = 203), and the proportions of these incongruities exceeded the number expected by chance. These results suggested that spinal PreM-INs function to significantly modulate hand-muscle activity during precision grip, but the contribution of other neural structures is also needed to recruit an adequate combination of hand-muscle motoneurons.

**Keywords: muscle synergy, grasping, spinal cord, spike-triggered average, cross-correlation**

# **INTRODUCTION**

Grasping is a highly complex movement that requires the coordination of a number of hand joints and muscles. The large number of degrees of freedom (DOF) of hand anatomy enable its flexible and varied movement, but this requires a high computational load and causes the "DOF problem" (Wing et al., 1996). Previous electromyographic (EMG) studies in non-human primates showed that hand-muscle activity can be explained by a linear combination of a few basic components (i.e., muscle synergy), suggesting that the neural system reduces hand anatomy DOF by using muscle synergies as modules (Brochier et al., 2004; Overduin et al., 2008).

Neural implementation of muscle synergy has been extensively investigated for hind-limb movement in frogs (Giszter et al., 1993; Mussa-Ivaldi et al., 1994; Tresch et al., 1999; Saltiel et al., 2001; Bizzi et al., 2002) and rats (Tresch and Bizzi, 1999), and spinal interneuron involvement has been suggested. As for hand-muscle synergy, we showed that spinal premotor interneurons (PreM-INs) had divergent facilitatory effects on multiple finger muscles by compiling the spike-triggered averages (STAs) of rectified EMGs in monkeys performing a precision grip task (Takei and Seki, 2010), and PreM-INs showed significant trial-to-trial correlations with grip force and target muscle activity (Takei and Seki, 2006). These results suggested that PreM-IN divergent connections facilitate the coactivation of hand muscles and that they could contribute to hand-muscle synergy formation. However, the extent to which these PreM-IN synaptic connections functionally contribute to activating hand muscles and building muscle synergies is not clear.

In this study, we specifically tested how PreM-IN output contributes to hand-muscle activation by quantifying PreM-IN synaptic linkage (SL) and functional linkage (FL) with hand-muscle activity (Miller et al., 1993; McKiernan et al., 2000; Holdefer and Miller, 2002). In two macaque monkeys performing a precision grip task, we recorded PreM-IN activity in the cervical spinal cord (C6–T1), with EMG signals measured simultaneously from hand and arm muscles. SL was quantified by testing the existence of post-spike effects (PSEs) with STAs of the rectified EMG. FL was determined by calculating the long-term cross-correlation between the PreM-IN firing rate and each rectified EMG. We then compared SL and FL and examined the association between the presence of SL and that of FL. Our results showed that SL and FL between PreM-INs and their target muscle activities were significantly associated, indicating that spinal PreM-INs significantly contribute to modulating hand-muscle activity involved in grasp control. However, a significant number of incongruities between SL and FL were also found, suggesting that other neural structures contributed to recruiting an adequate combination of hand-muscle motoneurons.

# **MATERIALS AND METHODS**

### **ANIMALS**

Electrophysiological recordings were obtained from two adult macaque monkeys (*monkey A*: *Macaca fuscata*, male, 6.8 kg, and *monkey E*: *Macaca mulatta*, male, 5.6 kg). Experiments were performed in accordance with the National Institutes of Health Guidelines for the Care and Use of Laboratory Animals and were approved by the Animal Research Committee at the National Institute for Physiological Sciences, Japan.

# **PRECISION GRIP TASK**

Details of the behavioral task, surgical operations, experimental setup, and procedures for recording single-unit and EMG activity were described previously (Takei and Seki, 2008, 2010). Briefly, monkeys were trained to grip spring-loaded levers with the index finger and thumb (precision grip task, **Figure 1A**). The lever positions were displayed on a computer screen as cursors, and monkeys were required to track targets. Each trial consisted of a rest period (1.0–2.0 s), lever grip, lever hold (1.0–2.0 s), and lever release. Successful completion of a trial was rewarded with a drop of applesauce. The force required to reach the target positions was adjusted individually for the index finger and thumb (monkey A: 0.4–2.0 N for index finger, 1.0–3.0 N for thumb; monkey E: 0.6–1.1 N for index finger, 0.1–0.3 N for thumb).

# **SURGICAL PROCEDURES AND DATA ACQUISITION**

Unilateral laminectomy of vertebrae C5–T1 was performed while the animals were anesthetized with isoflurane (1.0–2.0% in 2:1 O2:N2O) or sevoflurane (1.5–3.0% in 2:1 O2:N2O) under aseptic conditions, and a custom-made recording chamber was implanted over the laminectomy (Perlmutter et al., 1998). During the recording, the monkey was seated in a primate chair with the head and upper back restrained. Single-unit activities from C5–T1 were recorded with a tungsten or Elgiloy microelectrode. EMGs from the hand, forearm, and upper arm muscles were simultaneously recorded (**Figure 1B**). For EMG recording, pairs of stainless steel wires (AS632, Cooner Wire) were chronically implanted subcutaneously in 19 (monkey A) or 20 (monkey E) forelimb muscles, including intrinsic hand muscles: first, second, third, and fourth dorsal interosseous (FDI, 2DI, 3DI, and 4DI, respectively); adductor pollicis (ADP); abductor pollicis brevis (AbPB); abductor digiti minimi (AbDM); extrinsic hand muscles: flexor digitorum superficialis (FDS), radial and ulnar parts of flexor digitorum profundus (FDPr and FDPu), abductor pollicis longus (AbPL), extensor digitorum-2,3 (ED23), extensor digitorum-4,5 (ED45), extensor digitorum communis (EDC); wrist muscles: flexor carpi radialis (FCR), flexor carpi ulnaris (FCU), palmaris longus (PL), extensor carpi radialis longus and brevis (ECRl and ECRb), extensor carpi ulnaris (ECU); and elbow muscles: brachioradialis (BRD), pronator teres (PT), biceps brachii (biceps), and triceps brachii (triceps). The muscles recorded in each monkey were tabulated in a previous paper (Takei and Seki, 2010). Data recorded over at least 10 trials for each single unit were included in the present dataset.

**FIGURE 1 | Data recording and analysis procedures. (A)** Spinal interneuron (IN) activity and forelimb EMG activities were recorded while monkeys performed a precision grip task. **(B)** The signals recorded during two successive trials are shown: grip force (*top*), spinal interneuron firing (*middle*), and 20 EMG recordings (*bottom*). **(C)** Data analysis procedures. Spike-triggered averages were compiled using a spike train and rectified EMG signals (*top*). Cross-correlations were calculated from the neural signal, which was transformed to an instantaneous firing rate signal, low-pass filtered, and downsampled, and EMG signals, which were low-pass filtered and downsampled (*bottom*).

### **SPIKE-TRIGGERED AVERAGING OF RECTIFIED EMGs**

To quantify the SL from spinal INs to hand motoneuron pools, we computed the STA of rectified EMGs (**Figure 1C**). Details of the STA method were described previously (Takei and Seki, 2010). Briefly, STAs were compiled off-line for neuron–muscle pairs with at least 2000 recorded action potentials. All spikes recorded during whole-task phases (i.e., rest, grip, hold, and release phases and intertrial intervals) were used to compile the STAs. EMG was rectified and averaged over an interval of 80 ms, beginning 30 ms before and ending 50 ms after the spike onset. The baseline STA trend was subtracted using the incremented-shifted averaging (ISA) method (Davidson et al., 2007), and then the STA was smoothed with a flat five-point finite impulse response filter. Significant STA effects were identified with multiple-fragment statistical analysis (Poliakov and Schieber, 1998) at *p <* 0*.*0025 (*p <* 0*.*05 after Bonferroni's correction). The test window was set at a duration of 12 ms (i.e., between 3 and 15 ms) after the spinal neuron spike. Potential cross-talk between simultaneously recorded EMGs was evaluated by combining a cross-correlation method (Buys et al., 1986) and the third EMG differentiation (Kilner et al., 2002). STA effects potentially resulting from crosstalk between EMG recordings were eliminated from the present dataset.
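The core of the STA computation can be sketched as follows. This is a minimal Python illustration with hypothetical names; the ISA baseline subtraction, the five-point smoothing, and the multiple-fragment significance test described above are omitted.

```python
import numpy as np

def spike_triggered_average(spike_times, emg, fs, pre=0.030, post=0.050):
    """Average full-wave-rectified EMG around each spike.

    spike_times: spike times (s); emg: raw EMG trace; fs: sampling rate (Hz).
    Returns (lags_ms, sta) over a window from -pre to +post around the spike,
    mirroring the 80-ms window (30 ms before to 50 ms after) used in the text.
    """
    rect = np.abs(emg)                           # full-wave rectification
    npre, npost = int(pre * fs), int(post * fs)
    segments = []
    for ts in spike_times:
        i = int(round(ts * fs))
        if npre <= i < len(rect) - npost:        # keep only complete windows
            segments.append(rect[i - npre:i + npost])
    sta = np.mean(segments, axis=0)
    lags_ms = np.arange(-npre, npost) / fs * 1000.0
    return lags_ms, sta

# Synthetic check: noise EMG with a small deflection 8 ms after each spike
rng = np.random.default_rng(0)
fs = 2000.0
emg = rng.normal(0.0, 1.0, int(10 * fs))
spike_times = np.sort(rng.uniform(0.1, 9.9, 300))
for ts in spike_times:
    emg[int(round(ts * fs)) + 16] += 3.0         # 16 samples = 8 ms at 2 kHz
lags_ms, sta = spike_triggered_average(spike_times, emg, fs)
peak_lag = lags_ms[np.argmax(sta)]               # lies in the 3-15 ms test window
```

With the post-spike deflection placed at +8 ms, the STA peak falls inside the 3–15 ms test window used in the study.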

The STAs of rectified EMGs can produce two types of effects: PSEs and synchrony effects (Schieber and Rivlis, 2005). PSEs reflect the mono- or disynaptic effects of trigger neurons on the motoneuron pool that facilitate or suppress the EMG signal (Fetz and Cheney, 1980). In contrast, synchrony effects are derived from synaptic inputs from other neurons in the motoneuron pool that are synchronized with the discharges of the trigger neurons (Fetz and Cheney, 1980; Schieber and Rivlis, 2005). Therefore, synchrony effects can appear in STAs even if no mono- or disynaptic connection exists between the trigger neuron and the motoneuron pool. Based on criteria established by Schieber and Rivlis (2005), we discriminated PSEs from other synchrony effects according to the onset latency and peak width at half maximum (PWHM) of the STA effects (Schieber and Rivlis, 2005). Onset latency was defined as the time when the averaged EMG exceeded two standard deviations (SDs) from the baseline mean (from 10 to 30 ms before the trigger). PWHM of the STA effect was determined by finding the level that was half of the peak amplitude above (or below for a trough) the baseline mean and by measuring the width of the peak (or trough) at this level. The earliest possible onset latency of the PSEs was set at 3.5 ms based on our previous investigation (Takei and Seki, 2010). The largest PSE PWHM was set at 7 ms based on theoretical considerations (Baker and Lemon, 1998). Therefore, if a neuron produced PSEs with an onset latency of *>*3.5 ms and PWHM of *<*7 ms on at least one muscle, the neuron was identified as a PreM-IN.
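These acceptance criteria can be sketched as follows (Python, hypothetical helper names; the statistical detection of a significant STA effect is assumed to have happened already):

```python
import numpy as np

def onset_and_pwhm(lags_ms, sta):
    """Estimate onset latency and peak width at half maximum (PWHM) of an STA
    peak. Baseline mean and SD are taken 30-10 ms before the trigger, and onset
    is the first post-trigger crossing of mean + 2 SD, as described in the text."""
    base = sta[(lags_ms >= -30) & (lags_ms <= -10)]
    mu, sd = base.mean(), base.std()
    post = lags_ms > 0
    above = post & (sta > mu + 2 * sd)
    onset = lags_ms[above][0] if above.any() else None
    ipk = int(np.argmax(np.where(post, sta, -np.inf)))   # post-trigger peak
    half = mu + 0.5 * (sta[ipk] - mu)                    # half height above baseline
    i0 = i1 = ipk
    while i0 > 0 and sta[i0 - 1] >= half:                # walk out to half height
        i0 -= 1
    while i1 < len(sta) - 1 and sta[i1 + 1] >= half:
        i1 += 1
    return onset, lags_ms[i1] - lags_ms[i0]

def is_post_spike_effect(onset_ms, pwhm_ms):
    """PSE criteria quoted in the text: onset > 3.5 ms and PWHM < 7 ms.
    Earlier or broader peaks are treated as synchrony effects."""
    return onset_ms is not None and onset_ms > 3.5 and pwhm_ms < 7.0

# Synthetic STA: noisy flat baseline plus a narrow peak centered at +8 ms
rng = np.random.default_rng(1)
lags = np.arange(-30.0, 50.5, 0.5)
sta = np.exp(-((lags - 8.0) ** 2) / (2 * 1.5 ** 2))      # sigma = 1.5 ms
sta[lags <= 0] += rng.normal(0.0, 0.05, (lags <= 0).sum())
onset, pwhm = onset_and_pwhm(lags, sta)
```

For this synthetic peak the onset falls after 3.5 ms and the PWHM is well under 7 ms, so the criteria classify it as a post-spike effect.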

### **CROSS-CORRELATION BETWEEN PreM-IN AND EMG ACTIVITY**

To quantify the FL between neuronal activity and hand-muscle activity, we calculated cross-correlations between the neuronal and muscle activities (**Figure 1C**). First, the instantaneous firing rate [*IFR*(*t*)] of PreM-INs was calculated as the inverse of the interspike interval:

$$IFR(t) = \frac{1}{t_{i+1} - t_i}, \quad \text{for } t_i < t < t_{i+1},$$

where *t<sub>i</sub>* is the time of the *i*th spike. The instantaneous firing rate was then low-pass filtered (second order, Butterworth, cutoff of 20 Hz in forward and backward directions) and downsampled to 1000 Hz. Rectified EMGs were also low-pass filtered (second order, Butterworth, cutoff of 20 Hz in forward and backward directions) and downsampled to 1000 Hz. Continuously recorded 90-s data segments, which contained ∼10 successive trials including whole-task phases, were used to calculate the cross-correlation. Cross-correlation significance was defined by a Monte Carlo method (Miller et al., 1993). To estimate the distribution of peak cross-correlation values between uncorrelated signals, we transposed the first and second halves of the spinal IN rate signal and calculated the full set of cross-correlations. The 0.5th and 99.5th percentiles of this distribution were used as the lower and upper levels of significance for the cross-correlations (i.e., *p <* 0*.*01). The transposed-signal cross-correlations were compiled for 456 neuron–muscle pairs, and the lower and upper limits were set at −0.29 and 0.25, respectively. These analyses were performed off-line using MATLAB (MathWorks).
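These steps can be sketched as follows. This is a Python stand-in for the authors' MATLAB analysis; the function names and synthetic signals are illustrative, and in the study the surrogate percentiles were taken over all 456 transposed pairs rather than a single one.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def instantaneous_rate(spike_times, fs, duration):
    """IFR(t) = 1/(t_{i+1} - t_i) for t_i < t < t_{i+1}, sampled at fs (Hz)."""
    t = np.arange(0.0, duration, 1.0 / fs)
    ifr = np.zeros_like(t)
    for t0, t1 in zip(spike_times[:-1], spike_times[1:]):
        ifr[(t >= t0) & (t < t1)] = 1.0 / (t1 - t0)
    return ifr

def smooth(x, fs, cutoff=20.0):
    """2nd-order Butterworth low-pass applied forward and backward (zero phase)."""
    b, a = butter(2, cutoff / (fs / 2.0))
    return filtfilt(b, a, x)

def peak_cc(x, y):
    """Largest-magnitude normalized cross-correlation over all lags."""
    x = (x - x.mean()) / (x.std() * len(x))
    y = (y - y.mean()) / y.std()
    cc = np.correlate(x, y, mode="full")
    return cc[np.abs(cc).argmax()]

fs = 1000.0
# A regular 10-Hz spike train yields a constant IFR of 10 spikes/s.
ifr_reg = instantaneous_rate(np.arange(0.0, 1.01, 0.1), fs, 1.0)

# A correlated rate/EMG pair versus a half-transposed ("surrogate") rate:
rng = np.random.default_rng(2)
rate = smooth(rng.normal(size=4000), fs)
emg = rate + 0.5 * smooth(rng.normal(size=4000), fs)
rate_sw = np.concatenate([rate[2000:], rate[:2000]])   # transpose the two halves
cc_true, cc_surr = peak_cc(rate, emg), peak_cc(rate_sw, emg)
```

Transposing the halves preserves the amplitude distribution and autocorrelation of the rate signal while destroying its temporal alignment with the EMG, which is what makes it a useful surrogate for the significance threshold.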

# **RESULTS**

# **SYNAPTIC LINKAGE BETWEEN SPINAL PreM-INs AND HAND-MUSCLE ACTIVITY**

Among the 210 spinal neurons recorded from the two monkeys (34 in monkey A, 176 in monkey E), 23 neurons produced 63 significant PSEs (51 facilitations and 12 suppressions, SL) in hand and arm muscles, and were identified as PreM-INs (18 excitatory and five inhibitory). The neurons had either post-spike facilitation (PSF) or post-spike suppression (PSS) effects on at least one muscle; no neuron had both PSF and PSS simultaneously. As an example, a single PreM-IN STA is shown in **Figure 2A**. This IN produced significant PSF in four hand muscles (FDI, ADP, AbDM, and FDS). In total, PreM-INs produced PSE in 2.7 ± 2.1 [mean ± standard deviation (SD)] muscles (excitatory: 2.8 ± 2.1; inhibitory: 2.4 ± 2.1) on average, which is referred to as a muscle field (Fetz and Cheney, 1980; Buys et al., 1986). This result indicated that the spinal PreM-INs had a divergent hand-muscle field rather than affecting the activity of a single muscle.

# **FUNCTIONAL LINKAGE BETWEEN SPINAL PreM-INs AND HAND-MUSCLE ACTIVITY**

A majority of PreM-INs (19 of 23; 83%), including 17 excitatory and two inhibitory PreM-INs, had significant cross-correlations with at least one muscle (FL). In total, 246 of 456 neuron–muscle pairs had significant cross-correlations. Interestingly, FL polarity was positively biased; most FLs were positive (231 of 246; 94%), and only a few pairs showed negative FLs (15 of 246; 6%). Moreover, all PreM-INs with significant FLs had positive FLs regardless of whether they were excitatory or inhibitory PreM-INs; two excitatory PreM-INs concurrently had negative FLs (**Table 1**). This result suggested that the excitatory and inhibitory PreM-INs were mostly coactivated with hand muscles during precision grip rather than being reciprocally activated. An example of cross-correlations in a single PreM-IN (the same neuron as in **Figure 2A**) is shown in **Figure 2B**. This IN had a significant positive cross-correlation with six hand muscles (FDI, ADP, FDS, FDPu, AbPL, and EDC). In total, PreM-INs had an FL with 10.7 ± 6.7 muscles on average, and the size was significantly larger than that of the muscle field of SL (*p <* 0*.*05, *t*-test). This result indicates that PreM-IN activity had significant covariation with muscles other than those on which they had output effects.

*Figure 2 caption (fragment):* Data from 10 muscles were selected. Spike-triggered averages with significant post-spike effects (PSEs) are shown in red, and cross-correlations with significant peaks are shown in blue. The solid . . . as follows: both SL and FL were significant (SL∩FL); SL was significant but not FL, or vice versa (SL∩!FL and !SL∩FL); and neither SL nor FL was significant (!SL∩!FL).



### **ASSOCIATION BETWEEN SYNAPTIC AND FUNCTIONAL LINKAGES**

To test the relationship between synaptic connections and functional covariation further, we examined the pairwise association between SL and FL. The example in **Figure 2** shows various combinations of SL and FL. For example, FDI showed both significant PSF and cross-correlation, indicating that the PreM-IN had a strong excitatory effect on the motoneurons of this muscle, and their activities strongly covaried. This implies a causal relationship, as the PreM-IN activity modulated the target muscle activity. In addition to these congruent cases, however, there were many incongruent instances. The AbDM had a clear, significant PSF from the PreM-IN, but the cross-correlation of their activities was not significant. In another example, FDPu showed a clear cross-correlation peak, but it had no significant PSE on the STA. To quantify the association between SL and FL, the neuron–muscle pairs were categorized into four groups according to the existence of significant SL and FL (**Figure 2C**): both SL and FL were significant (SL∩FL); SL was significant but FL was not, or vice versa (SL∩!FL and !SL∩FL); and neither SL nor FL was significant (!SL∩!FL). In the PreM-IN shown in **Figure 2**, three pairs (pairs with FDI, ADP, FDS) were SL∩FL, one pair (AbDM) was SL∩!FL, three pairs (FDPu, AbPL, and EDC) were !SL∩FL, and three pairs (ED23, FCR, and ECU) were !SL∩!FL.

Among a total of 456 neuron–muscle pairs, 266 pairs showed either a SL (PSF or PSS) or a FL (positive or negative), and 43 pairs concurrently showed both SL and FL (SL∩FL, **Figure 3A**, **Table 2**). The existence of SL and FL was significantly associated (*p* = 0.014, χ<sup>2</sup> = 6.0); a greater proportion of pairs with significant SL than of pairs without significant SL also showed significant FL (43/63 pairs, 68%, vs. 203/393 pairs, 52%), and only 20/63 pairs with significant SL lacked a significant FL. This clear association between SL and FL suggested that spinal PreM-IN output effects significantly modulate target muscle activities.
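The 2 × 2 association test can be reproduced from the counts reported above (43 SL∩FL, 20 SL∩!FL, 203 !SL∩FL, and 393 − 203 = 190 !SL∩!FL). A minimal sketch, assuming SciPy is available (this is not the authors' analysis code):

```python
import numpy as np
from scipy.stats import chi2_contingency

# Contingency table of pair counts reported in the text.
table = np.array([[43, 20],     # rows: SL present / SL absent
                  [203, 190]])  # cols: FL present / FL absent
chi2, p, dof, expected = chi2_contingency(table, correction=False)
print(f"chi2 = {chi2:.1f}, p = {p:.3f}")  # chi2 = 6.0, p = 0.014
```

Without the Yates continuity correction, this reproduces the reported statistics.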

Interestingly, the SL–FL association depended on whether the PreM-INs were excitatory or inhibitory (**Figures 3B,C**). In the excitatory PreM-INs, the majority (42 of 51; 82%) of neuron–muscle pairs with a significant SL (PSF) also showed a significant FL, and the association between SL and FL was significant (*p* = 0.006, χ<sup>2</sup> = 7.5). On the other hand, for the inhibitory PreM-INs, only one of 11 neuron–muscle pairs with a significant SL (PSS) showed a significant FL, and the association was not significant (*p* = 0.6, χ<sup>2</sup> = 0.3). This result suggested that excitatory PreM-INs constituted the prime movers of the target muscle activity, whereas inhibitory PreM-INs were involved to a lesser extent.

Although there was a significant association between SL and FL, there were many exceptions: 20 pairs had SL without FL (SL∩!FL), and 203 pairs had FL without SL (!SL∩FL). To test

**FIGURE 3 | Association between SL and FL.** Venn diagrams showing the association between SL and FL in all PreM-INs **(A)**, excitatory PreM-INs **(B)**, and inhibitory PreM-INs **(C)**. The relative sizes of the red, blue, and purple areas are proportional to the number of pairs in each category: SL∩!FL, !SL∩FL, and SL∩FL, respectively.


whether these incongruities occurred by chance, we quantified the chance level of these incidences. PSE significance was tested at *p* < 0.0025 (*P<sub>SL</sub>*), and cross-correlation significance was tested at *p* < 0.01 (*P<sub>FL</sub>*). Therefore, the chance levels of SL∩!FL and !SL∩FL were set to *P<sub>SL</sub>* × (1 − *P<sub>FL</sub>*) and (1 − *P<sub>SL</sub>*) × *P<sub>FL</sub>*, respectively. A binomial test showed that the numbers of SL∩!FL (*n* = 20) and !SL∩FL (*n* = 203) pairs significantly exceeded the chance level (*p* < 0.001, binomial test). These results indicated that SL and FL were clearly associated, but a significant number of incongruities between SL and FL also existed.
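The chance-level computation just described can be sketched as follows; the use of `scipy.stats.binomtest` is an assumption for illustration, not the authors' implementation:

```python
from scipy.stats import binomtest

# Significance thresholds used for the two tests, and the total pair count.
P_SL, P_FL, n_pairs = 0.0025, 0.01, 456

for label, n_obs, p_chance in [
    ("SL&!FL", 20, P_SL * (1 - P_FL)),        # chance level P_SL * (1 - P_FL)
    ("!SL&FL", 203, (1 - P_SL) * P_FL),       # chance level (1 - P_SL) * P_FL
]:
    res = binomtest(n_obs, n_pairs, p_chance, alternative="greater")
    print(f"{label}: expected {n_pairs * p_chance:.1f}, "
          f"observed {n_obs}, p = {res.pvalue:.2g}")
```

Both observed counts vastly exceed their expected counts (about 1.1 and 4.5 pairs, respectively), so both binomial p-values are far below 0.001.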

# **DISCUSSION**

The existence of hand-muscle synergies and the modular control of primate grasping have been suggested (Brochier et al., 2004; Overduin et al., 2008), but the neural implementation of hand-muscle synergy has remained unclear. Here we explored how PreM-IN output effects contribute to hand-muscle activation by investigating the PSEs of PreM-INs on hand muscles (i.e., SL) and the long-term cross-correlation between PreM-IN and hand-muscle activity (i.e., FL). Our results showed that the existence of SL and FL was significantly associated, suggesting that spinal PreM-IN output effects significantly contribute to hand-muscle activity modulation during grasp control. However, we also found considerable incongruities between SL and FL. This result suggested that although PreM-IN output projections significantly affect hand-muscle activity modulation, other neural structures are needed to recruit an adequate combination of hand-muscle motoneurons.

## **ASSOCIATION BETWEEN SPINAL PreM-IN SL AND FL WITH HAND-MUSCLE ACTIVITY**

The contribution of spinal interneurons to muscle synergy has been extensively investigated in the hind-limb movements of frogs (Giszter et al., 1993; Mussa-Ivaldi et al., 1994; Tresch et al., 1999; Saltiel et al., 2001; Bizzi et al., 2002; Hart and Giszter, 2010) and rats (Tresch and Bizzi, 1999). However, it was not self-evident that an analogous neural mechanism could be assumed for the primate cervical spinal cord and the control of hand grasping. Our results revealed that spinal PreM-INs in the primate cervical cord had divergent output effects on hand muscles and functioned significantly to modulate target muscle activity, suggesting that they could be part of the neural implementation of hand-muscle synergy. This is analogous to frog lumbar spinal interneurons (Hart and Giszter, 2010). In hind-limb movement, a small number of motor primitives are represented in the spinal cord, and their combination can construct a variety of reflexive and natural movements (d'Avella et al., 2003). As the motor primitives exist in the lower CNS (i.e., the spinal cord), the control dimension in the higher motor structures might be reduced (Tresch and Jarc, 2009). Similarly, primate hand movements are characterized by a very high number of degrees of freedom (Ogihara and Oishi, 2012) and therefore may derive a computational advantage if a neural structure for hand-muscle synergy is implemented in the spinal cord.

The clear association between SL and FL was specific to excitatory PreM-INs and was not found in inhibitory PreM-INs (**Figures 3B,C**). This suggests a functional difference between excitatory and inhibitory spinal PreM-INs related to the control of primate grasping. Excitatory PreM-INs mostly positively covaried with the target muscles (**Figure 3B**), suggesting that excitatory PreM-INs were a prime mover of hand-muscle coactivation. Conversely, few inhibitory PreM-INs significantly covaried with target muscles (**Figure 3C**). Because no inhibitory PreM-INs showed significant negative covariation with target muscle activity (**Table 1**), inhibitory PreM-INs may function to adjust the activities and response gains of agonist muscles (Chance et al., 2002; Berg et al., 2007; Kristan, 2007), rather than reciprocally inhibiting antagonist muscles.

# **POSSIBLE MECHANISMS OF INCONGRUENCE BETWEEN SL AND FL**

In addition to the significant association between SL and FL, our results also showed a significant number of incongruities, i.e., !SL∩FL and SL∩!FL (**Figures 2**, **3**). Several mechanisms could explain these incongruities (**Figure 4**). First, the incongruities can be explained by assuming a common input into several PreM-INs, which have different types of muscle field. **Figure 4A** shows a schematic illustration of how a common input ("S") can produce these incongruities. Common input into two excitatory (IN1 and IN2) and one inhibitory PreM-IN (IN3) induces synchronization among these PreM-INs. This synchronization, in turn, would induce covariation between the activity of the recorded PreM-IN (IN1) and its non-target muscles (M4–5) due to the synchronized excitatory PreM-IN (IN2) input to them (!SL∩FL). Additionally, the inhibitory PreM-IN (IN3), synchronized with the recorded PreM-IN, suppresses the shared target muscle activity (M1), and this might result in decorrelation between IN1 and M1, even though IN1 had a synaptic effect on M1 (SL∩!FL). The correlation between spinal INs reported by Prut and Perlmutter (2003) may have been induced by divergent branching of the descending (Shinoda et al., 1981; Li and Martin, 2002) and afferent (Ishizuka et al., 1979; Brown, 1981; Ralston et al., 1984) axons to the spinal cord.

Another possible explanation for the incongruities is the involvement of other premotor systems (**Figure 4B**). If a premotor system ("D"), parallel to the spinal PreM-INs, primarily contributes to the formation of the hand-muscle coactivation pattern, incongruities could arise between the SL and FL of a PreM-IN and its target muscles (**Figure 4B**). For example, if a premotor system coactivates the target muscles (M2–3) and the recorded PreM-IN (IN1) while suppressing some of the target muscles (M1) via inhibitory neurons, incongruences will occur between the SL and FL of the recorded PreM-IN and its target muscles (M1 is SL∩!FL and M3 is !SL∩FL). Every premotor system that bypasses spinal PreM-INs [e.g., corticomotoneuronal (CM), rubromotoneuronal (RbM), and reticulomotoneuronal (RtM) cells, and group-Ia primary afferents] is a possible candidate for the premotor system that contributes to the coactivation of hand-muscle activity. CM cells have a selective hand-muscle field (Buys et al., 1986), and they could function to coactivate a small group of hand muscles. However, CM neurons are specifically active during precision grip as compared with power grip (Muir and Lemon, 1983), and their firing increases when one of their target muscles is more active than another, in contrast to equal coactivation of the target muscles (Bennett and Lemon, 1996). Therefore, CM cells might function to fractionate hand-muscle activity rather than simply to coactivate the target muscles. The relative contributions of CM cells and PreM-INs to hand-muscle activity control and hand-muscle synergies should be further tested. RbM cells are another candidate for constructing muscle synergy. Several studies showed that RbM cells have a divergent hand-muscle field (Mewes and Cheney, 1991; Sinkjaer et al., 1995).
However, it has been reported that their muscle field is strongly biased toward the forearm extensors (Mewes and Cheney, 1991; Sinkjaer et al., 1995), and cells in the magnocellular division of the red nucleus, where most RbM cells are located, are preferentially activated when monkeys preshape their hand rather than when they grasp objects during a reaching-to-grasp task (Van Kan and McCurdy, 2001, 2002). These results suggested that RbM cells mainly contribute to constructing the muscle synergy for preshaping the hand rather than for grasping objects. Finally, Davidson (2011) recently reported the PSEs of pontomedullary reticular formation (PMRF) neurons on the extrinsic hand muscles; hence, RtM cells may function to modulate hand-muscle activity involved in the control of grasping. In addition to these descending sources, Ia afferents to spinal motoneurons, which are also obvious premotor neurons, show task-relevant activity during wrist movement (Flament et al., 1992), but their contribution to hand movement is unknown. Afferent feedback may function to modulate the SL–FL relationship and to define the final muscle activities according to context and external events. So far, as seen in these previous reports, the contributions of each descending tract to hand grasping have been investigated separately. Therefore, the differential contributions of these multiple premotor systems (spinal PreM-INs, CM cells, RbM cells, RtM cells, and primary afferents) to the control of hand grasping, and the mechanism of their coordination, remain to be clarified. To approach this issue, it is crucial to directly compare the functional contributions of these parallel premotor systems to the formation of hand-muscle synergy under the same behavioral paradigm and in the same subjects.

# **CONCLUSIONS AND FUTURE DIRECTIONS**

In this study, we explored how PreM-IN output effects contribute to hand-muscle activation by investigating SL and FL between PreM-INs and hand-muscle activity in monkeys performing a precision grip task. Our results showed that SL and FL between PreM-IN and their target muscle activities were significantly associated, indicating that spinal PreM-INs contribute to hand-muscle activity modulation during control of grasping. However, a significant number of incongruities between SL and FL were also found, suggesting the contribution of other neural structures in recruiting an adequate combination of hand-muscle motoneurons. Further studies are needed to elucidate the relative importance of multiple premotor systems to the control of hand-muscle activity during grasping.

The co-existence of associations and incongruities between SL and FL may reflect that the modular control of hand movements is characterized by both fixed and flexible control (Macpherson, 1991). First, the clear association between SL and FL indicates that synaptic connections from PreM-INs significantly contribute to the modulation of hand-muscle activity. As PreM-INs have a divergent muscle field, these neuroanatomical, or hardwired, connections may produce the invariant activation patterns of hand-muscle activity. On the other hand, the fact that FL is not always restricted to instances of SL but can be dissociated from the latter suggests that FL may be flexible according to context or task (Nazarpour et al., 2012). Let us imagine that the monkeys in this study performed a different type of grasping task (e.g., a power-grip task) in addition to the precision-grip task. It is possible that the PreM-INs activated during the precision grip would also be recruited in a different grasping task and that the mechanism shaping FLs would be flexible enough to modify the basic pattern of the SLs according to task demands. In this case, the SL for a specific movement would be generalizable to other types of movement. Alternatively, it is also possible that the power-grip task would recruit populations of PreM-INs producing SLs that differed from those recruited for the precision grip, and that these PreM-INs would form FLs adequate for the power-grip task. In this case, the generalization of a given SL would be rather limited, and a different movement would be controlled by different PreM-INs exhibiting unique SLs. Although the results in this paper suggest flexible FLs, these two possibilities may not be mutually exclusive. Further studies investigating PreM-IN firing during different types of grasping may contribute to understanding the invariance and flexibility of the modular control of hand movements.

# **ACKNOWLEDGMENTS**

This work was supported by a Grant-in-Aid for Scientific Research on Priority Areas [Mobilligence] and [System study on higher-order brain function] from MEXT (18020030, 18047027), a Grant-in-Aid for Young Scientists (B) from MEXT (21700437, 23700482), and the JST PRESTO program. We thank Nobuaki Takahashi (NIPS) for technical assistance.

# **REFERENCES**


Giszter, S. F., Mussa-Ivaldi, F. A., and Bizzi, E. (1993). Convergent force fields organized in the frog's spinal cord. *J. Neurosci.* 13, 467–491.


Nazarpour, K., Barnard, A., and Jackson, A. (2012). Flexible cortical control of task-specific muscle synergies. *J. Neurosci.* 32, 12349–12360.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 11 January 2013; paper pending published: 15 February 2013; accepted: 03 April 2013; published online: 25 April 2013.*

*Citation: Takei T and Seki K (2013) Synaptic and functional linkages between spinal premotor interneurons and hand-muscle activity during precision grip. Front. Comput. Neurosci. 7:40. doi: 10.3389/fncom.2013.00040*

*Copyright © 2013 Takei and Seki. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

# Compositionality in neural control: an interdisciplinary study of scribbling movements in primates

# *Moshe Abeles 1,2\*, Markus Diesmann3, Tamar Flash4, Theo Geisel 5, Michael Herrmann6 and Mina Teicher 1,7*

*<sup>1</sup> Gonda Brain Research Center, Bar Ilan University, Ramat Gan, Israel*


*<sup>5</sup> MPI for Dynamics and Self-Organization, Göttingen, Germany*


### *Edited by:*

*Martin Giese, University Clinic Tuebingen, Germany*

### *Reviewed by:*

*Alessandro Treves, Scuola Internazionale Superiore di Studi Avanzati, Italy*

*Joaquín J. Torres, University of Granada, Spain*

### *\*Correspondence:*

*Moshe Abeles, Gonda Brain Research Center, Bar-Ilan University, Max and Anna Webb St., 52900 Ramat-Gan, Israel e-mail: abelesm@mail.biu.ac.il*

This article discusses the compositional structure of hand movements by analyzing and modeling neural and behavioral data obtained from experiments where a monkey (*Macaca fascicularis*) performed scribbling movements induced by a search task. Using geometrically based approaches to movement segmentation, it is shown that the hand trajectories are composed of elementary segments that are primarily parabolic in shape. The segments could be categorized into a small number of classes on the basis of decreasing intra-class variance over the course of training. A separate classification of the neural data employing a hidden Markov model showed a coincidence of the neural states with the behavioral categories. An additional analysis of both types of data by a data mining method provided evidence that the neural activity patterns underlying the behavioral primitives were formed by sets of specific and precise spike patterns. A geometric description of the movement trajectories, together with precise neural timing data indicates a compositional variant of a realistic synfire chain model. This model reproduces the typical shapes and temporal properties of the trajectories; hence the structure and composition of the primitives may reflect meaningful behavior.

**Keywords: voluntary-movements, scribbling, compositionality, hand-motion-model, synfire chains, motion-primitives**

# **1. INTRODUCTION**

The hallmark of human cognition is compositionality. We compose words out of phonemes, phrases out of words, sentences out of phrases. Primate dexterity exhibits the same property and may even form the evolutionary origin of language. Compositionality may be manifested either by stringing components along in time, such as in speech, or simultaneously, such as in understanding a visual scene. In motor behavior both forms are found abundantly. Complex drawing motions may be generated by concatenating simpler strokes. Picking up an object (*prehension*) requires simultaneous coordination of three time-dependent motions: arm reaching, hand orientation, and finger shaping.

In drawings, as in language, not all possible combinations are utilized. The rules—which stroke may be concatenated to which—constitute the syntax of action. Motor compositionality, again like in language, is manifested at multiple levels starting from the way we activate groups of muscles to produce a movement, and ending in the way we coordinate fingers, hand and arm when using a tool.

This article explores continuous two-dimensional scribbling motions as a compositional product of brain activity. It integrates mathematical modeling of scribbling, recordings of multiple single-unit activities during scribbling, and the syntax of concatenating scribbling primitives into a continuous drawing. It also develops neural-network scribbling models that are consistent with the known psychophysics of hand motion control and with the observable properties of single units in the motor cortex. Overall, it is shown how such a compositional approach may be applied to two-dimensional scribbling. It is based on the rationale that whereas an infinite plethora of drawings could be created, in practice there is only a limited set of shapes that are generated spontaneously. This is suggested by an intuitive analysis of the drawings of small children; here the principles are demonstrated for monkey drawings as well as for adult scribbling.

Our data indicate that in a well-trained monkey, scribbling is composed of a small number of elementary shapes concatenated into a continuous drawing. Human subjects under similar conditions tend to behave similarly. Such elementary components may be regarded as drawing primitives if it can be shown that they exhibit the following properties:


Other studies have attempted to identify primitives at the kinematic, dynamic, and muscular levels (see, for example, Brandon et al., 2002; d'Avella et al., 2003; Mussa-Ivaldi and Solla, 2004; Polyakov et al., 2009a,b; Dominici et al., 2011; Overduin et al., 2012a,b), while still others have proposed computational models for inferring sub-movements and/or primitives and for modeling other aspects of motor compositionality (Thoroughman and Shadmehr, 2000; Giese et al., 2009; Degallier and Ijspeert, 2010; Hogan and Sternad, 2012; Ijspeert et al., 2013; and see the current special issue of Frontiers in Neuroscience). Relatively few studies, however, have searched for the neural correlates of primitives or sub-movements at the cortical or spinal cord levels (Hatsopoulos et al., 2007; Hart and Giszter, 2010; Overduin et al., 2012a,b). Here we combined behavioral, neurophysiological, and theoretical studies and investigated the inference of motion primitives, their neural representations, and the syntactic principles subserving their combination into sequences. We have also suggested novel concepts concerning temporal and spatial aspects of brain representations of complex movements based on the notion of synfire chains. In particular, generating a complex drawing shape repeatedly calls for precise sequencing of muscular activities. We suggest a neural network structure that is compatible with the known anatomy of the cortex and can generate the sequence of elementary shapes observed in the behavior. We show by way of simulations that such networks may indeed generate the observed behavior and provide indirect evidence for their existence.

Our presentation is built on two levels. In section 2 we provide a synopsis in which we describe the main facts that lead to the above description. Following this synopsis, each part is described in greater detail and more supporting evidence is provided.

# **2. SYNOPSIS**

Here we briefly describe our main results in seven subsections. In section 2.1, we describe the monkey's scribbling data. In sections 2.2, 2.3, and 2.4 we describe various ways of analyzing the neural activities recorded while the monkeys were scribbling. In section 2.5 we describe a neural network that can produce scribbling similar to that of the monkey. Finally, in sections 2.6 and 2.7 we analyze human scribbling and ways in which the syntax of concatenating simple strokelets into a complete drawing may be detected.

# **2.1. MONKEY SCRIBBLING**

### *2.1.1. Materials and methods*

Two monkeys (*Macaca fascicularis*) were trained to sit in a monkey chair, hold a low-friction, low-inertia manipulandum, and continuously move it in the horizontal plane. The hand and manipulandum were under an opaque white screen so the monkey could not see them. A yellow dot was projected on the screen just above the position of the manipulandum's handle. The controlling computer was programmed to select a random (invisible) target in the working space. Once the monkey hit this target a beep was sounded, the monkey got a few drops of orange juice, and another target was selected at random. In this way the monkey was induced to move the manipulandum continuously. The *X*-*Y* position of the manipulandum was recorded at 100 samples per second. Target entry times, beep times, and reward delivery times were also recorded.


# *2.1.2. Results*

Initially the monkeys moved the manipulandum erratically but within several sessions the motion became rounded and more relaxed. Practice continued almost on a daily basis for a few weeks until the shape of the motion seemed stable. **Figure 1A** illustrates a short segment of a drawing. Red circles mark the points at which the monkey hit a target and got a reward.

Based on a previously suggested theory postulating the possible importance of equi-affine geometry in motion planning (Pollick and Sapiro, 1997; Handzel and Flash, 1999; Flash and Handzel, 2007), which was more recently extended to suggest a possible mixture of several geometries (Bennequin et al., 2009), it was postulated (Flash and Handzel, 2007; Polyakov et al., 2009a,b) that parabolas may serve as drawing primitives. Indeed, we found that large segments of the drawings could be approximated by parabolas, with consecutive parabolas concatenated along their initial and final portions. **Figure 1B** illustrates such an approximation. The shape and orientation of a parabola may be specified by its focal distance (the radius of the circle at the tip) and the direction of its axis of symmetry. **Figure 1C** shows the distribution of these two parameters as the monkey's training progressed. Clearly, with time, the shapes "crystallized" into 2–3 parabolas.
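To illustrate how these two parameters might be extracted from data, the following is a hedged sketch (not the authors' fitting procedure): the orientation and focal distance of a noisy parabolic segment are recovered by a brute-force search over candidate orientations, fitting a quadratic in the rotated frame at each angle. All numerical values are invented for the example.

```python
import numpy as np

# Ground-truth parabola: canonical form v = u^2 / (4 f0), rotated by theta0.
rng = np.random.default_rng(0)
f0, theta0 = 0.5, 0.7
s = np.linspace(-1.5, 1.5, 200)
px, py = s, s**2 / (4 * f0)
R = np.array([[np.cos(theta0), -np.sin(theta0)],
              [np.sin(theta0),  np.cos(theta0)]])
pts = R @ np.vstack([px, py]) + rng.normal(0, 1e-3, (2, 200))

# Search orientations; at the correct angle a quadratic fit has minimal residual.
best = (np.inf, None, None)
for th in np.linspace(0, np.pi, 360, endpoint=False):
    Ri = np.array([[np.cos(-th), -np.sin(-th)],
                   [np.sin(-th),  np.cos(-th)]])
    u, v = Ri @ pts                      # rotate back to a candidate canonical frame
    coef, res, *_ = np.polyfit(u, v, 2, full=True)
    if res[0] < best[0]:
        best = (res[0], th, 1 / (4 * coef[0]))   # focal distance from leading coef

_, theta_hat, f_hat = best
print(f"orientation = {theta_hat:.3f}, focal distance = {f_hat:.3f}")  # close to 0.7 and 0.5
```

A production analysis would use a proper conic fit rather than a grid search, but the sketch shows that the (orientation, focal distance) pair fully characterizes each parabolic segment.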

Although the monkey often paused and occasionally drew some other shapes, the sequence of parabola #1 followed by parabola #2 followed by parabola #3 was by far the most prevalent.

If an entire parabola represents some elementary drawing component, it must have some expression in brain activity. This issue is examined in the next sections.

# **2.2. HIDDEN MARKOV MODEL ANALYSIS**

Drawing a large segment of a parabola requires recruiting neurons with different directional tuning in the correct sequence at the correct timing. If such a sequence is produced repeatedly it is likely that there will be brain activity that determines the drawing.

Producing a sequence of shapes repeatedly can be considered to be similar to the situation in speech when sequences of phonemes are repeatedly produced. The exact sound may vary from production to production, but the intention is the same. Speech analysis has benefitted from treating it as a Hidden Markov Model (Rabiner, 1989) where the intended phoneme is a hidden state, and the actual sound is an expression of this state. It is assumed that the hidden states behave like a first order Markov process. In such a process, time proceeds in small steps and at each step the state of the system either remains unchanged or flips to another state. The probabilities of transitions (or lack thereof) are

**FIGURE 1 | Parabolic elements of scribbling. (A)** Sample drawing. Motion is sampled at 100/s (blue dots); points at which reward was given are marked by red circles. **(B)** Breaking a piece into three parabolas. Black dots—the measurements; red, green, magenta—three parabolas. **(C)** Changes over the course of training. For each parabolic segment two parameters were extracted: the orientation of the parabola and its focal distance. These are plotted in the two-dimensional (2-D) histogram. On the left is the histogram of the orientations (Y-marginal of the 2-D histogram). Each pair of histograms is for one training day. The left pair is from an early training session, the middle from after several months, and the right from after half a year. Clearly, with training, three classes of parabolas emerged.

specified by a matrix *P* that indicates the probability of changing from a state *i* to a state *j* for all pairs *(i, j)*. This process is hidden, but at each state the system emits some outputs. The probability of emitting each possible output *O* at each state *i* is given by an emission probability matrix *Q*. Once *P* and *Q* are known, for any observable set of emissions the most likely sequence of hidden states the system went through can be computed.
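The decoding step just described is the Viterbi algorithm. A minimal sketch with an illustrative two-state model follows; the matrices and observation sequence are invented for the example and are not fitted to the recorded data.

```python
import numpy as np

# P[i, j]: probability of moving from hidden state i to state j.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
# Q[i, o]: probability of emitting output o while in state i.
Q = np.array([[0.7, 0.3],
              [0.1, 0.9]])
pi = np.array([0.5, 0.5])   # initial state distribution

def viterbi(obs, P, Q, pi):
    """Most likely hidden-state sequence for an observed output sequence."""
    n, T = P.shape[0], len(obs)
    logd = np.log(pi) + np.log(Q[:, obs[0]])   # best log-probability ending in each state
    back = np.zeros((T, n), dtype=int)
    for t in range(1, T):
        cand = logd[:, None] + np.log(P)       # all one-step extensions
        back[t] = np.argmax(cand, axis=0)      # best predecessor of each state
        logd = cand[back[t], np.arange(n)] + np.log(Q[:, obs[t]])
    states = [int(np.argmax(logd))]
    for t in range(T - 1, 0, -1):              # trace the best path backwards
        states.append(int(back[t, states[-1]]))
    return states[::-1]

obs = [0, 0, 1, 1, 1, 1, 1, 0]
print(viterbi(obs, P, Q, pi))   # → [0, 0, 1, 1, 1, 1, 1, 0]
```

Note how the sticky transition probabilities make the decoder ignore isolated emission noise: a short run of atypical outputs is not enough to flip the inferred state.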

In our case, the command or intention to draw a certain parabola can be considered to be a hidden state of the motor system. The brain activity during this state is the observed output. Applying hidden Markov analysis to recorded brain activity may reveal hidden states that correspond to such intentions or commands.

# *2.2.1. Materials and methods*

Once the monkey reached a stable behavior it was anesthetized and prepared for recording of single-unit activities through metal microelectrodes. Spike shapes were detected by template matching either during the experiments or after it. More technical details are described in section 3.

# *2.2.2. Results*

In the past, analysis of parallel spike trains in the cortex of behaving monkeys revealed a sequence of quasi-stable firing rate states (Radons et al., 1994; Abeles et al., 1995; Seidemann et al., 1996). This was found to be true in the present analysis as well. **Figure 2** illustrates a small time span of the analysis, showing the probability of being in any one of six states over this time span.

It is often difficult to judge whether the analysis truly reveals some underlying states or is an inevitable result of the assumption that the observed firing is the outcome of a Markov process. In our data there is a unique opportunity to tackle this problem. In addition to the sequence of states we observed the behavior of the monkeys. If the HMM reveals states that are linked in time to the production of the parabolas we can infer that there are activity states which represent the intention (or maybe the plan or command) to draw the parabolas.

In this study we recorded the activity of only a few neurons out of the tens of millions that are probably involved in the planning and execution of arm motion. The recording area was in the arm region of M1 and the premotor cortices. Activities there are assumed to be related to the velocity vector of the hand end point. As such, they are expected to change dramatically over the span of each drawn parabola. A priori, the chances of finding a stable firing configuration that spans a whole parabola seem slim.

Nevertheless, in four out of eight cases we found a clear connection between the hidden Markov states and the parabolas. Section 4 describes this analysis in more detail. **Figure 3** illustrates the clearest case.

The finding that in some cases stable firing configurations related to the drawn parabolas supports the notion that the parabolas represent a distinct entity in the monkey's scribbling. Is there any special significance to parabolas in hand motion? The next section addresses this issue from a mathematical point of view.

# **2.3. EQUI-AFFINE GEOMETRICAL ANALYSIS**

Continuous two-dimensional (2D) drawings of humans and monkeys obey a series of simple laws. One of them states that during continuous 2D drawing the drawing speed becomes slower as the curvature of the drawing becomes higher. The relation between angular speed and curvature is described by a power law which may be written as:

$$A = KC^{\beta}$$

or, equivalently, expressed in terms of the radius (*R*):

$$V = KR^{(1 - \beta)}$$

states are coded by the thickness of the arrows. The actual drawing shapes associated with each HMM state are plotted near the state number. The different colors have no meaning, they are meant to facilitate discrimination among the various repetitions of similar shapes. State 1 appears to be associated with one parabola, while states 2 and 3 with the two others. The other states are associated with other drawings, many of which had to do with pausing or restarting to move.

where *A* is the angular velocity, *C* is the curvature, *V* is the tangential velocity, *R* is the radius, *K* is the velocity-gain constant, and β is typically near 2/3. Note that *K* becomes larger when the drawing is larger (isochrony) and smaller when the level of accuracy is higher (Fitts' law). There is something fundamental in the 2/3 power law, as it holds not only for the production of movement but also for the perception of motion speed. When a dot moves along a convoluted line, it is perceived as though it were moving at a constant speed when its movement follows the 2/3 power law (Viviani and Stucchi, 1992; Pollick and Sapiro, 1997; Levit-Binnun et al., 2006).
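The power law can be checked numerically on a synthetic trajectory. An ellipse traced at constant parameter speed satisfies the two-thirds power law exactly, so a log–log regression of *A* against *C* should recover β = 2/3. This is a sketch on synthetic data, not the analysis pipeline used for the monkey recordings.

```python
import numpy as np

# Synthetic "drawing": an ellipse traced at constant parameter speed.
dt = 2 * np.pi / 4000
t = np.arange(0, 2 * np.pi, dt)
x, y = 2.0 * np.cos(t), 1.0 * np.sin(t)

# Numerical derivatives of the path.
xd, yd = np.gradient(x, dt), np.gradient(y, dt)
xdd, ydd = np.gradient(xd, dt), np.gradient(yd, dt)

V = np.hypot(xd, yd)                      # tangential speed
C = np.abs(xd * ydd - yd * xdd) / V**3    # curvature
A = V * C                                 # angular speed

# Fit log A = log K + beta * log C; trim edges where np.gradient is one-sided.
sl = slice(10, -10)
beta, logK = np.polyfit(np.log(C[sl]), np.log(A[sl]), 1)
print(f"beta = {beta:.3f}, K = {np.exp(logK):.3f}")   # beta ≈ 0.667
```

For this ellipse the law is exact (V = 2^{1/3} C^{−1/3}), so the fitted exponent matches 2/3 up to numerical-differentiation error.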

In differential geometry, the local properties of a curve can be described by taking the derivatives of the path with respect to some measure of distance (i.e., arc-length) and looking for properties that are invariant under some family of transformations. Of particular interest here are the equi-affine transformations, in which a trajectory <*x*(*t*), *y*(*t*)> parameterized by time (or by arc-length) is transformed into a new trajectory <*u*(*t*), *w*(*t*)> by:

$$u(t) = ax(t) + by(t) + c$$

$$w(t) = dx(t) + ey(t) + f$$

with the condition: *ae* − *bd* = 1, i.e., the area within a closed loop is preserved by the transformation. When such trajectories follow the two-thirds power law in the time domain they have a constant equi-affine speed *K*.
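The area-preservation condition is easy to verify numerically. A minimal sketch (with arbitrary coefficients chosen so that *ae* − *bd* = 1) applies such a transformation to a closed curve and compares the enclosed areas via the shoelace formula:

```python
import numpy as np

# Hypothetical check that an equi-affine map with a*e - b*d = 1
# preserves the area enclosed by a closed curve.

def shoelace_area(x, y):
    """Signed area of the closed polygon (x, y) via the shoelace formula."""
    return 0.5 * np.sum(x * np.roll(y, -1) - y * np.roll(x, -1))

t = np.linspace(0.0, 2.0 * np.pi, 1000, endpoint=False)
x, y = 2.0 * np.cos(t), np.sin(t)           # a closed test curve

a, b, d = 2.0, 3.0, 0.5                     # arbitrary coefficients...
e = (1.0 + b * d) / a                       # ...with a*e - b*d = 1 enforced
u = a * x + b * y + 1.0                     # translation terms c = 1, f = -2
w = d * x + e * y - 2.0

area_before, area_after = shoelace_area(x, y), shoelace_area(u, w)
print(area_before, area_after)              # equal: the map is area-preserving
```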

All parabolas have zero equi-affine curvature. **Figure 4A** illustrates the equi-affine properties of a short piece of a monkey's drawing. We observe a half-second stretch (from 0.2 s to 0.73 s) during which the equi-affine curvature is close to 0 (red trace). This segment is composed of 2 parabolas, as can be seen in **Figure 4B**. In fact, it was such a finding that drew our attention to the fact that the drawings tended to be composed of sequences of parabolas. This issue is described in more detail in section 4.

Thus in equi-affine differential geometry parabolas play the role of straight lines in Cartesian geometry. The finding that the monkey drawing is composed to a large extent of parabolas, and that both motion production and the perception of the speed of a moving target obey the two-thirds power law which is equivalent to having a constant equi-affine speed, raises the possibility that at least some of the neuronal activity in the brain is coding the equi-affine parameters of the motion. This analysis is described in detail in section 4 and is summarized in the next section.

# **2.4. TUNING OF SINGLE UNITS IN THE MOTOR CORTEX: PARTIAL CORRELATIONS**

It is well accepted that the activity of neurons in the arm areas of M1 and premotor cortices codes for the direction and velocity of motion (Georgopoulos et al., 1982, 1986, 1988; Moran and Schwartz, 1999), although there is strong evidence that they also code for lower-level representations of movement [muscle-group activity (Kakei et al., 1999)] and for higher-level representations such as the serial order of movement sequences (Carpenter et al., 1999) or other more complex features of the motion (Paninski et al., 2004). Many studies of velocity tuning of cortical motor areas have been based on a simple task in which the monkey must move from one starting location toward one of eight peripheral targets. In such a task the direction of the velocity vector, the direction of the initial acceleration vector, and the position of the final target vary together. For this reason it may be difficult to distinguish which of the three components (position, velocity, acceleration) a unit is tuned to. In the current experiments, where the monkey is scribbling, as well as in experiments where the monkey had to trace convoluted trajectories, the coupling is less tight. However, in these experiments too there are couplings between the parameters. These are even stronger if variable delays between the parameters are allowed, because for quasi-periodic movements the velocity vector often looks like its derivative (acceleration) shifted in time. For this reason we developed a partial-correlation approach to find, for simultaneously represented parameters, a solution in which each parameter is tuned at its own delay. For three parameters *x*, *y*, and *z* (here *x*, *y*, *z* stand for position, velocity, and acceleration) we describe the firing rate of a neuron at time *t* [λ(*t*)] as a linear combination of the effects of the three components with three delays:

$$\lambda(t) = ax(t + \tau_x) + by(t + \tau_y) + cz(t + \tau_z) + d \tag{1}$$

We fit the best coefficients *a*, *b*, *c*, and *d* for every possible combination of the three delays in the range of ±250 ms, (taking into account that the rate cannot be negative) and test the fit (Stark et al., 2006, 2009). The results may then be displayed in a cube whose three dimensions are τ*x*, τ*y*, and τ*z*, with color reflecting the goodness of fit. However, as position, velocity, and acceleration may be highly correlated, it is better to build three such cubes for each neuron, one showing the regression on position when the contributions of velocity and acceleration are factored out; one for velocity when the contributions of position and acceleration are factored out; and one for acceleration when the other two are factored out. **Figure 5** illustrates examples of such cubes.
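A stripped-down sketch of this delay search is shown below. It is not the authors' code: the signals and the "neuron" are synthetic, and for brevity only the velocity delay is scanned (the full method scans all three delays and builds the partial-correlation cubes). The built-in tuning, a 70 ms lead on velocity, is recovered by the grid search.

```python
import numpy as np

# Synthetic sketch of the delay grid search behind Equation (1):
# rate(t) = a*pos(t+tp) + b*vel(t+tv) + c*acc(t+ta) + d.
# The "neuron" is built to be velocity-tuned with a 70 ms lead,
# and (for brevity) only the velocity delay is scanned.
rng = np.random.default_rng(0)
dt = 0.01                                    # 10 ms bins
t = np.arange(0.0, 20.0, dt)
pos = np.sin(2 * np.pi * 0.7 * t) + 0.4 * np.sin(2 * np.pi * 1.3 * t)
vel = np.gradient(pos, dt)
acc = np.gradient(vel, dt)

def shift(sig, lag_bins):
    """sig(t + lag): circular shift (fine for this periodic demo)."""
    return np.roll(sig, -lag_bins)

true_lag = 7                                 # 70 ms lead on velocity
rate = 2.0 + 1.5 * shift(vel, true_lag) + 0.1 * rng.standard_normal(t.size)

best_lag, best_sse = None, np.inf
for lag in range(-25, 26):                   # +/- 250 ms in 10 ms steps
    X = np.column_stack([pos, shift(vel, lag), acc, np.ones(t.size)])
    coef = np.linalg.lstsq(X, rate, rcond=None)[0]
    sse = np.sum((X @ coef - rate) ** 2)
    if sse < best_sse:
        best_lag, best_sse = lag, sse
print(best_lag * dt)                         # ~0.07 s: the built-in lead
```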

Eighty percent (218 out of 272) of the units recorded were tuned to at least one of these parameters. The most prevalent was velocity (71% of the tuned units), 19% included acceleration, and 10% position. Fifty-six percent of the tuned units were tuned to only one of these parameters but only very few to all three. It is worth noting that when a unit was tuned to more than one parameter, the delays were generally different, as illustrated in **Figure 5A2**.

In a similar way velocity tuning may be related to the tangential velocity or the equi-affine velocity. These two parameters are very strongly correlated, so the distinction is less clear. The fit of the firing rate f(t) would look like:

$$f(t) = a + b\dot{s}(t + \tau_{\dot{s}}) + c\dot{\sigma}(t + \tau_{\dot{\sigma}}) \tag{2}$$

*(Figure caption fragment: "… well-fitted by parabolas (dashed lines).")*

**FIGURE 5 | Multi-parameter tuning.** Data for single units in the motor cortex of a monkey tracing a convoluted trajectory. Three possible parameters were studied: position, velocity, and acceleration. For each of them, all possible delays within ±250 ms were tried. The color within the cube shows the contribution to the variance of the firing rate for one parameter (e.g., velocity) given the other two (e.g., position and acceleration). **(A1)** Velocity tuning. This single unit showed only one plane of higher contribution to the total variance. In the velocity cube (middle) we see a horizontal plane (for τvel) at 70 ms (the firing leading the velocity). **(A2)** Velocity and position tuning. For position (given velocity and acceleration) we see a vertical plane at 0 delay, and for velocity (given position and acceleration) we see a horizontal plane with the firing leading the velocity by 90 ms. This single unit was coding two parameters at different delays.

where $\dot{s}$ and $\dot{\sigma}$ are the amplitudes of the Euclidean and equi-affine velocities, and $\tau_{\dot{s}}$ and $\tau_{\dot{\sigma}}$ are the delays between the firing and the Euclidean and equi-affine velocities, respectively.

**Figure 6** illustrates a case with stronger tuning to equi-affine velocity.

In 6 out of the 16 units for which this analysis was attempted, the tuning to equi-affine velocity was the stronger of the two.

Even in M1, some single units may code not for instantaneous position, velocity, or acceleration, but only for the serial order in which targets are presented, only for the direction of movement when the monkey moves its arm, or for both. Even when a single unit codes for both, the preferred directions may be very different.

In our data, when the monkeys alternated between continuous curved motion (free scribbling or tracing) and center-out straight movements, we found that 110 out of 304 units were directionally tuned during only one of the tasks, while only 72 were tuned during both. But even when a unit was directionally tuned in both tasks it did not necessarily have the same preferred direction: thirty-eight of the seventy-two had a preferred-direction difference of more than 45°. Thus, representations in the motor cortex are far from uniform and depend heavily on the context in which they are studied.

**FIGURE 6 | Equi-affine tuning.** As the Euclidean velocity and the equi-affine velocity have the same direction, only the amplitudes of the velocity vectors were considered here. All possible delay combinations within ±250 ms were evaluated. The (thick) vertical line when equi-affine speed was the main regressor indicates that this unit is tuned to equi-affine speed. The thickness of the line may be attributed to the fact that the two types of speed are highly correlated; hence the contribution of one, after factoring out the effect of the other, is noisy. $\tau_{|\dot{\sigma}|}$ and $\tau_{|\dot{s}|}$ are the delays of the amplitudes of the equi-affine and Euclidean velocities.

# **2.5. SIMULATION OF NEURAL NETWORKS FOR GENERATING PARABOLAS**

### *2.5.1. Introduction*

We saw that the monkeys' scribbling tends to be characterized by concatenated parabolas and that parabolas are special shapes in terms of equi-affine differential geometry. We also know that many motor-cortex units are tuned to the direction and velocity of the hand motion.

Suppose that we plot these units in velocity coordinates (rather than by their topographic position on the cortex). In such a plot, the *(x, y)* position of a neuron is given by its velocity components in the *X* and *Y* directions. If we express its position in polar coordinates (ϕ, *v*), ϕ represents the direction of motion and *v* its speed. In these coordinates, motion with constant acceleration is motion along a straight line. Conversely, a straight line in these velocity coordinates represents motion along a parabola. This is illustrated in **Figure 7**.

Activity that moves at a constant acceleration along any such straight line in velocity space will represent motion along a parabolic trajectory while obeying the two-thirds power law.
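This correspondence is easy to check numerically. The sketch below (with arbitrary initial conditions) integrates a constant acceleration and verifies that the velocity samples are collinear while the path is exactly quadratic:

```python
import numpy as np

# Hypothetical check: under constant acceleration the velocity traces a
# straight line in velocity space and the path is a parabola.
dt = 0.001
t = np.arange(0.0, 1.0, dt)
acc = np.array([0.0, -2.0])                       # constant acceleration
v0 = np.array([1.0, 1.0])                         # initial velocity

v = v0 + np.outer(t, acc)                         # velocity samples
p = np.outer(t, v0) + 0.5 * np.outer(t**2, acc)   # position samples

# Collinearity: the centered velocity samples have rank 1.
rank = np.linalg.matrix_rank(v - v.mean(axis=0), tol=1e-9)
# The path y(x) is exactly quadratic: a parabola fit leaves no residual.
coeffs = np.polyfit(p[:, 0], p[:, 1], 2)
resid = np.max(np.abs(np.polyval(coeffs, p[:, 0]) - p[:, 1]))
print(rank, resid)
```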

If a neuronal network is behind the generation of a parabola it may be composed of groups of neurons that are activated one after the other, where each group of neurons has an appropriate location in velocity space. Each such group should be composed of neurons tuned for an appropriate direction and velocity such that when these groups are plotted in velocity coordinates they form a straight line. Activity should propagate with a constant delay from group to group. This description is consistent with the activity along a synfire chain. This notion was tested in simulations.

# *2.5.2. Simulations*

A synfire chain is essentially a feed-forward network composed of many pools of neurons. Each pool excites the following one by multiple diverging-converging excitatory connections. In such a network, activity may propagate as a synchronous volley of spikes travelling through the pools, or dissipate in time and quench (Abeles, 1982, 1991; Diesmann et al., 1999). Any individual neuron may take part in several pools of the same chain as well as in pools of other chains. Theoretical work suggests that if the pools are large enough (on the order of 100 neurons per pool or more) and the overall average activity is low enough (about 5 spikes per second per neuron), each neuron may participate in many (10–100) pools without confusion (Bienenstock, 1995; Herrmann et al., 1995).

A simulation that mimics cortical tissue in which multiple synfire chains are embedded needs to include balanced excitation and inhibition, so that the membrane potential of each neuron fluctuates randomly below threshold and only occasionally hits the threshold and fires (Brunel, 2000). Each neuron should also receive multiple, possibly uncorrelated, inputs from other brain regions [on average, the number of excitatory inputs arriving at any cortical patch is of the same order of magnitude as the number of neurons in the patch (Abeles, 1991; Braitenberg and Schuez, 1998)]. This implies simulating many tens of thousands of neurons with hundreds of millions of connections among them. Despite the theoretical work on the feasibility of embedding multiple synfire chains in a large network, in simulations the entire network tended to explode into global synchrony in a periodic manner and lost the identity of the individual synfire chains. To enable the embedding of several synfire chains in the network while maintaining spontaneous low levels of asynchronous background activity, the homogeneity of the neuromime properties had to be broken. In such a network each neuron can participate in several pools. Under these conditions it is possible to excite individual synfire chains while assuring the propagation of a synchronous wave along each of the chains. More details are provided in section 7.

Consider a situation where each pool in a synfire chain codes for a different direction and velocity, such that when the pools' positions are plotted in velocity coordinates they lie along a straight line. A wave of activity progressing with a constant delay from pool to pool of the chain will thus produce a parabolic trajectory. A wave in a synfire chain may be elicited by synchronous excitation in one of its pools, or by enhanced, asynchronous, excitation to several of its initial pools. In the monkey, after long training, a particular sequence of parabolas tended to appear repeatedly. The parabolas became connected so that the end of one was tangent to the beginning of the other, conveying the impression of a smooth transition from one to the other.
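The volley propagation itself can be caricatured in a few lines. The sketch below is a toy binomial model, not the 50,000-neuromime simulation described in section 7: each neuron of the next pool receives a random subset of the current volley and fires when the input count reaches a threshold. A full volley propagates stably, while a weak one dies out.

```python
import numpy as np

# Toy binomial caricature of volley propagation along a synfire chain:
# `w` neurons per pool; each neuron of the next pool samples the current
# volley with connection probability `p_conn` and fires at `theta` inputs.
# All parameters are illustrative.
rng = np.random.default_rng(1)
w, n_pools = 100, 20
p_conn, theta = 0.5, 20

def propagate(n_start):
    """Final volley size after traversing the chain from n_start spikes."""
    n = n_start
    for _ in range(n_pools - 1):
        inputs = rng.binomial(n, p_conn, size=w)   # inputs per neuron
        n = int(np.sum(inputs >= theta))           # neurons that fire
    return n

stable, dying = propagate(100), propagate(15)
print(stable, dying)    # a full volley survives; a weak one dies out
```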

In the above scenario the end of a synfire chain producing one parabola was at the same velocity-space location as the beginning of another synfire chain which produced another parabola. There may be several synfire chains starting at (or passing through) that very same velocity-space region, but with practice the end of one chain becomes connected more strongly to the beginning of another specific chain. These stronger connections between chains may be thought of as the basis for the drawing syntax, whereby after a particular parabola *A* there is a higher probability of drawing another particular parabola *B*. If with training the monkey tends to produce the sequence *A, B, C, A, . . .*, the end of the network producing *A* becomes connected to the beginning of the network producing *B*, whose end becomes connected to the beginning of the network producing *C*, and so on.

This situation was simulated in a network of 50,000 neuromimes with 10 synfire chains three of which were connected as described above. Once activity was initiated in one of them it would circulate through the three producing a sequence of parabolas. Sample results of this simulation are illustrated in **Figure 8**.

If parabolic drawings are produced by activity waves in synfire chains, it is reasonable to inquire how these chains came about. Were they always there, explaining the tendency to draw parabolas? Or perhaps whatever induced the monkey to draw the same parabola several times caused the pools of neurons with the appropriate velocity tuning to become connected to each other by spike-timing-dependent plasticity. Once such connections are established, even in a weak form, activity tends to follow the same pattern and strengthen the connections until they form a reliable synfire chain and produce a reliable segment of a parabola. We cannot resolve this issue at this stage.

The simulations show the feasibility of generating the drawings by synfire chains, but this feasibility needs corroboration. Is there any experimental evidence to support this idea? The next section deals with this issue.

### *2.5.3. Experimental evidence*

Direct observation of activity along a synfire chain requires the simultaneous recording of many neurons from the region in which the synfire chain is embedded. In simulations it was found that the activity of at least 200 neurons needs to be recorded simultaneously (Schrader et al., 2008). While recording techniques approach this limit, they typically record from a much larger volume than that in which a single synfire chain is expected to exist. The recording methods employed in the present study were unable to match the above figures. Thus, the most we can hope for is that, by luck, we record from two to three neurons that take part in the same chain.

**FIGURE 8 | Simulations of the production of 3 parabolas. (A)** The shape of the drawings generated by the simulation. **(B)** Raster plot of the activity in the network. The abscissa denotes time and the ordinate the cell number; the activity of each neuron is described by dots along one line. The neurons along the ordinate are arranged according to their participation in the synfire chains. Synfire chains 4, 6, and 8 are the ones generating the drawing on the left. **(C)** The layout of the 3 synfire chains in velocity space. The currently active synfire chain is colored in thick red. At this moment activity is at the red dot, producing the vertical motion near the maximal curvature of the parabola. Note that when one synfire chain ends there is a competition among several others, e.g., at 750 ms synfire 6 ended and synfires 3, 7, and 8 show enhanced activity; after a short while synfire 8 wins.

If we record from two neurons that take part in a single chain, then every time a wave sweeps through this chain we expect the two neurons to fire with some fixed interval in between. If the chain is associated with part of the monkey's drawing, we expect to see this interval appearing repeatedly. In our data, with approximately 10 neurons measured simultaneously and the drawing quantized into approximately 22 different strokes, we can construct 10 × 9 × 22 = 1980 different neuron-pair/stroke combinations over all the quantized drawings. If for each such combination we consider 50 possible intervals, there will be 50 × 1980 = 99,000 possible intervals, some of which will certainly contain a large number of repetitions even if there were no real preferred interval. To overcome this problem we adopted an idea proposed by Bienenstock and Geman (Geman et al., 2000; Amarasingham et al., 2003), as follows:


*(The enumerated steps of the teetering procedure were lost in extraction; the surviving fragment reads: "… become closer than the refractory period, re-teeter their times until refractoriness is preserved.")*


This smallest *W* provides an estimate of the timing precision of the data. **Figure 23** illustrates the results of steps 1, 2, 3, and 4 for *W* = 10 ms. The precision obtained in this way is an upper bound on the spike-timing precision. Clearly, by defining a better statistic, a smaller *W* would suffice to obtain a significant difference between the real-data statistic and the teetered distribution.
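The core of the teetering surrogate can be sketched as follows (synthetic, well-spaced spike times; the window and refractory period are illustrative values):

```python
import numpy as np

# Sketch of the teetering surrogate: jitter each spike uniformly within
# +/- W/2, redrawing until no two spikes violate the refractory period.
# Spike times are synthetic and well spaced.
rng = np.random.default_rng(2)
W = 0.010                                   # teetering window (10 ms)
refractory = 0.001                          # refractory period (1 ms)

def teeter(spike_times):
    """One teetered copy of a sorted spike train."""
    while True:
        jit = spike_times + rng.uniform(-W / 2, W / 2, spike_times.size)
        jit = np.sort(jit)
        if np.all(np.diff(jit) >= refractory):
            return jit

spikes = 0.1 * np.arange(50) + rng.uniform(0.0, 0.05, 50)  # sorted train
surrogate = teeter(spikes)
print(np.max(np.abs(spikes - surrogate)))   # every spike moved by < W/2
```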

This statistic, dubbed the "relations-score," was based on estimating the probability of obtaining *N* or more repetitions of each possible interval, for each possible pair of neurons, around each of the drawing strokelets. Minus the log of the product of the 10 smallest probabilities (*p<sub>i</sub>*) was defined as the relations-score:

$$\text{Relations-score} = -\sum\_{i=1}^{10} \log\_{10} \left( p\_i \right)$$
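A direct transcription of this score, with made-up probabilities, might look like:

```python
import math

# Relations-score with made-up p-values: minus the sum of log10 over
# the 10 smallest probabilities.
def relations_score(p_values, k=10):
    """-sum(log10(p)) over the k smallest probabilities in p_values."""
    return -sum(math.log10(p) for p in sorted(p_values)[:k])

p = [1e-4, 0.03, 1e-5, 0.2, 0.008, 0.5, 1e-3, 0.07, 0.9, 0.04, 0.6, 0.01]
print(relations_score(p))   # large score = improbably many repeated intervals
```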

We took several measures to assure that large relations-scores would not be due to imperfect spike sorting, or merely to some strong correlations for one particular pair.


Eight experimental days in which the spike-shape isolation was good and the firing rates stayed stable along the recording session were selected for analysis. This analysis was repeated for *W* of 10 ms and less, down to 0.3 ms. In five of the eight days, the accuracy was 8 ms or better. **Figure 9** summarizes this analysis. If 5% is chosen as the significance level, the accuracy can reach 0.5 ms!

Accuracy did not reach even 10 ms on any day for randomly selected times during the recordings. Thus, the observed accuracy is clearly associated with the shape of the drawings. While these findings show that precise spiking intervals exist, and that they are associated with particular segments of the drawings, they do not prove that the drawings are produced or controlled by activity in synfire chains. However, we tried to disprove the hypothesis that the drawings were produced by synfire chains by looking for precise timing. Had we found none, the support for our hypothesis would have been weakened. Nevertheless, we did find precise timing in five out of eight datasets, at a precision of 0.5 ms. While these findings support our hypothesis, stronger support or disproof must wait until technology allows us to measure the spiking activity of many hundreds of neurons in a small volume in behaving subjects.

Elements of the drawings that repeat again and again can be produced by synfire chains. A movement primitive was defined as an entity that cannot be intentionally stopped before its completion (Polyakov et al., 2009a,b). A hint for such behavior was found in a well-trained monkey where the movement was usually decelerated after receiving a reward, but it stopped only after the completion of a sequence composed of several parabolic segments. A more direct test for the existence of "point of no return" in human scribbling is examined in the next section.

## **2.6. POINT OF NO RETURN: ANALYSIS OF HUMAN SCRIBBLING**

To study the existence of the "point of no return" we trained human subjects in the same setup as the monkeys. The subjects sat holding the manipulandum. They were told to keep moving it so as to hear as many sound beeps as possible. The working space was tiled with invisible hexagonal targets. Whenever the manipulandum handle passed through one of them a beep was sounded and another hexagonal target was selected at random. Some subjects tended to produce rounded movements while others moved essentially only to and fro (as the monkey initially did).

After a few sessions the task was changed. The subject was told to move, as before, but to stop immediately after hearing a beep and wait until told to start moving again.

Examination of the drawing velocity indicated a succession of peaks with deep troughs in between. If stopping in the middle of the drawing posed no problem, we would expect the subject to stop at the first trough after the stop signal plus a processing delay. **Figure 10** illustrates an example in which the drawing continued for several up-and-down cycles after the stop signal, suggesting that the subject had to complete some pre-planned sequence of strokes before stopping. Such behavior was repeatedly observed in 6 out of 9 subjects. For more details see Sosnik et al. (2007).
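The trough-based criterion can be illustrated on a synthetic speed profile: locate the velocity troughs and list those that were passed between the stop signal and the actual stop (all times and shapes below are invented for illustration):

```python
import numpy as np

# Illustration of the "point of no return" test on a synthetic speed
# profile: find the velocity troughs that were passed between the stop
# signal and the actual stop.
def local_minima(v):
    """Indices of strict interior local minima of a 1-D signal."""
    return np.where((v[1:-1] < v[:-2]) & (v[1:-1] < v[2:]))[0] + 1

dt = 0.01
t = np.arange(0.0, 4.0, dt)
speed = 1.0 + np.cos(2 * np.pi * 1.5 * t)   # peaks with deep troughs
speed[t > 3.2] = 0.0                        # movement actually stops at 3.2 s

stop_signal = 1.9                           # time of the beep (s)
troughs = t[local_minima(speed)]
passed = troughs[troughs > stop_signal]
print(passed)   # troughs the subject moved through before stopping
```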

The existence of drawing elements that repeat over and over again in scribbling, and the finding that some cannot be stopped in the middle, support the notion that these represent some sort of drawing primitives. The idea that each such primitive is generated by an activity wave propagating through a synfire chain is attractive in the sense that it explains the sequencing in time of the series of small movements (strokelets) that compose the primitive, and explains why it is very difficult to stop such elements of motion in the middle. However, the available data neither prove this idea nor rule out the possibility that some other type of neural network is responsible for producing such elements. The scribbling was often composed of several distinct repeating elements and tended to show regularities in the order of recruitment of these elements. These regularities may be treated as the "syntax" of scribbling. The next section discusses how this syntax can be detected.

# **2.7. THE SYNTAX OF SCRIBBLING**

Some of the human subjects carrying out the scribbling tasks tended to move to and fro in a regular manner. **Figure 11** illustrates such a scribbling session. To the human eye and brain this strategy is quite clear: the subject scans the workspace horizontally and, once finished, scans it vertically, then horizontally again, and so on. Can this structure be revealed automatically?

We start the analysis by breaking the scribbling into smaller segments (strokelets). The velocity of scribbling shows multiple peaks separated by deep troughs. It is reasonable to assume that each section between two successive velocity minima should be considered as one segment. Nevertheless, in the monkeys' scribbling the different parabolas were concatenated at points of maximal velocity. Lacquaniti et al. (1983) showed that when tracing motion is parsed according to the 2/3 power law, one obtains segments with close-to-constant velocity-gain coefficients (*K*), with abrupt shifts in *K* at the points of maximal velocity. For these reasons we decided to parse the scribbling into segments between adjacent extrema of the velocity.
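A minimal version of this parsing step, on a synthetic trajectory, might look like the following (the extrema detector and the signal are illustrative, not the study's actual preprocessing):

```python
import numpy as np

# Illustrative parsing of a synthetic scribble into strokelets: cut the
# tangential-speed profile at every strict local extremum.
def speed_extrema(v):
    """Indices where the 1-D signal v has a strict local max or min."""
    mid = v[1:-1]
    ext = ((mid > v[:-2]) & (mid > v[2:])) | ((mid < v[:-2]) & (mid < v[2:]))
    return np.where(ext)[0] + 1

dt = 0.01
t = np.arange(0.0, 3.0, dt)
x = np.cos(2 * np.pi * t) + 0.3 * np.cos(2 * np.pi * 3 * t)
y = np.sin(2 * np.pi * t) + 0.3 * np.sin(2 * np.pi * 2 * t)
speed = np.hypot(np.gradient(x, dt), np.gradient(y, dt))

cuts = speed_extrema(speed)
strokelets = [slice(i, j + 1) for i, j in zip(cuts[:-1], cuts[1:])]
print(len(strokelets))   # number of strokelets between adjacent extrema
```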

Each such strokelet was then described by the direction of motion at 10 points along its trajectory, and whether it was accelerating or decelerating during the strokelet. The strokelets were clustered into 47 clusters, 23 for accelerating strokelets and 24 for decelerating. We then used the information bottleneck analysis (Tishby et al., 1999; Slonim and Weiss, 2000) to cluster these

47 groups so as to preserve the maximal mutual information between each strokelet and its successor. The results are depicted in the matrix in **Figure 12**. Cluster 1 (top left in the matrix) is composed of 3 groups of strokelets. These are followed by strokelets in cluster 2, which are followed by strokelets in cluster 3, which are followed by strokelets in cluster 4, which are followed by strokelets from cluster 1. Thus, with a few exceptions, the clusters form cycles 1 → 2 → 3 → 4 → 1 → . . .

This could be compared to spoken language. Each of the 47 groups is like a phone. A cluster of phones with similar transition properties is like a phoneme, and a sequence of frequently connected phonemes (e.g., 1, 2, 3, 4) is like a word. In this analogy there are three words: A, composed of clusters 1, 2, 3, and 4; B, composed of clusters 5, 6, 7, and 8; and C, composed of clusters 9, 10, 11, and 12. The sentences in this analogy are: "α," composed of the sequence A, A, A, . . . ; "β," composed of B, B, B, . . . ; and "γ," composed of C, C, C, . . . The transitions between sentences are not random. Sentence β is followed only by sentence α. This transition occurs only when one of the group members ("allophones") of cluster 8 is followed by one of the group members of cluster 4 in sentence α. Similarly, sentence γ is followed by sentence α only when one of the group members of cluster 9 is followed by one of the group members of cluster 2 in sentence α. Sentence α, on the other hand, may be followed by either sentence β or γ. Here too the transitions are through one specific "allophone," from cluster 3 in α into cluster 8 in β, or from cluster 2 in α into cluster 11 in γ. Thus, the structure of a paragraph in this analogy is: β, α, γ, α, β, . . .

Sentence β is composed of left-right-left . . . strokes, sentence γ is composed of up-down strokes and sentence α of diagonal strokes. While this is a very simple example it shows that there is some internal "syntax" for the scribbling and illustrates how this syntax can be revealed without human interpretation.
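The cycle-detection part of this analysis can be sketched with a first-order transition matrix over a synthetic cluster sequence (the information-bottleneck clustering itself is not reproduced here):

```python
import numpy as np

# Sketch of reading off the dominant strokelet-cluster cycle from a
# first-order transition matrix.  The symbol sequence is synthetic,
# mostly 1 -> 2 -> 3 -> 4 -> 1 with a few exceptions.
seq = [1, 2, 3, 4] * 10 + [1, 2, 3, 3, 4]
n = max(seq)
counts = np.zeros((n, n))
for a, b in zip(seq[:-1], seq[1:]):
    counts[a - 1, b - 1] += 1
trans = counts / counts.sum(axis=1, keepdims=True)  # row-normalized

state, cycle = 1, [1]
for _ in range(n):                  # follow the most probable successor
    state = int(np.argmax(trans[state - 1])) + 1
    cycle.append(state)
print(cycle)    # [1, 2, 3, 4, 1]: the dominant cycle
```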

The language of scribbling in monkeys and men is very limited. The above method would probably be insufficient to reveal the syntax of handwriting. Yet this analysis shows that scribbling does have its own internally driven syntax.

More details on this analysis can be found in section 8.

# **2.8. SUMMARY**

This overview described how scribbling is generated, the brain-activity correlates of scribbling, which neural networks can generate the scribbling, and the rules by which simple elements of scribbling are concatenated when a longer drawing is generated. Although each of these topics deserves fuller exploration, these findings are sufficient to support our hypothesis that synfire-chain-like structures are likely to underlie the neural data and can, conversely, reproduce the behavioral data.

# **3. DATA ACQUISITION**

### **3.1. MATERIALS AND METHODS**

Monkey subjects (*Macaca fascicularis*) were trained to hold a low-friction, low-inertia manipulandum and carry out movements in the horizontal plane. The subjects could not see their hand or the manipulandum. An opaque white screen was positioned just above the manipulandum and a cursor (yellow circle) was projected at a point just above the manipulandum's handle. When necessary, additional targets were projected on the same screen as well. A juice spout was in touch with the subject's lips. The desired behavior was reinforced by releasing a few drops of orange juice whenever the subject successfully completed a trial. No negative reinforcements (punishments) were employed.

Three types of experiments were performed: free scribbling, tracing, and a center-out task. In the free-scribbling task the subject was motivated to move the manipulandum continuously in the following way: a target zone (invisible to the subject) was selected at random; when the target was hit, a short beep was sounded, the subject was rewarded, and the target jumped to another random location.

In the tracing task, a trial started with the projection of a green target on the opaque screen in front of the subject. As soon as the subject brought the cursor into this green target, a convoluted trajectory was displayed in gray. The initial target disappeared and an elongated target made of eight partially overlapping green circles was displayed on the gray trajectory, just in front of the initial target. Once the subject placed the cursor in the first circle, that circle disappeared and another circle was added at the front of the elongated target. In this way it appeared as if the subject were chasing an elongated worm that progressed along the convoluted trajectory. When the subject reached the end of the trajectory it was rewarded with a few drops of orange juice. On each day a set of 40 different trajectories was selected out of a repertoire of 100. The trajectories were generated by spline interpolation between 10 randomly selected points.

### **3.2. SURGERY AND MONKEY HANDLING**

Following training, a localizing MRI scan was performed and a chamber (22 × 22 mm) was implanted in aseptic conditions [halothane anesthesia, induced by ketamine and medetomidine hydrochloride (Domitor)] over the left hemisphere. The dimensions of the chamber were selected in order to allow access to both motor and premotor areas. Analgesia [pentazocine (Talwin), carprofen (Rymadil)] and antibiotics (ceftriaxone) were administered peri-operatively. The dura mater was left intact. Location of sulci was confirmed visually during surgery and, following chamber implantation, by another MRI scan.

All procedures were supervised by the institutional veterinarian, approved beforehand by the institutional ethics committee, and conformed to the laws of Israel and the NIH Guide for the Care and Use of Laboratory Animals (1996).

# **3.3. NEURAL AND BEHAVIORAL RECORDINGS**

During each recording session, up to eight glass-coated tungsten micro-electrodes (impedance 0.2–2 MΩ at 1 kHz) were inserted through the dura. The electrodes were arranged in circular guide tubes (MT, Alpha-Omega Engineering, Nazareth, Israel), such that the inter-electrode spacing within a circle was ∼300 μm. Each electrode was moved independently (EPS 1.31, Alpha-Omega Eng.). The electrodes were inserted into either the primary motor or the premotor areas. The signal from each electrode was amplified (×10,000), band-pass filtered (1–6000 Hz), and sampled at 25 kHz (Alpha-Map 5.4, Alpha-Omega Eng.). Eye movements were recorded using an infra-red beam system (Oculometer, Dr. Bouis, Karlsruhe, Germany) tracking movements of the right eye. The 2D signal from this system, as well as the position of the robotic arms, was sampled at 100 Hz. Behavioral events (LEDs, switches, lights, and so on) were sampled at 6 kHz. The workspace and the monkey's movements were monitored using three infra-red CCD video cameras synchronized to the task and recorded on computer disk.

### **3.4. NEURAL DATA PREPROCESSING**

An offline procedure was applied to identify spike waveforms in the 25 kHz digitized traces (Bar-Hillel et al., 2006). Spikes were subjected to manual offline spike sorting (Abeles and Goldstein, 1977) (Alpha-Sort 4.0, Alpha-Omega Eng.), and the resulting clusters were examined for unit isolation (ISI histograms, individual spike shapes) and unit separation (Ben-Shaul et al., 2003). Long-term (trial-to-trial) stationarity of the responses of each unit was determined by an algorithm based on a time-varying Poisson counting process and validated by visual inspection of raster plots.

Further details on the methods of analysis of the behavior and spike trains are described in the appropriate places of the following sections.

# **4. GEOMETRIC APPROACH TO MOVEMENT ANALYSIS**

# **4.1. PIECE-WISE PARABOLIC PATTERNS EMERGE IN SPONTANEOUS OVER-TRAINED PRIMATE DRAWINGS**

Since the pioneering work of Bernstein (1967), there has been a general consensus that we have mental templates of motions which we try to follow when executing motor tasks. For example, reaching movements consist of fairly straight lines with bell-shaped velocity profiles. Several optimization criteria have been suggested as the basis for the selection of these motion primitives (Flash and Hogan, 1985). The shape of the velocity profiles is invariant under changes in speed (Atkeson and Hollerbach, 1985) unless there are accuracy demands, in which case the movements obey Fitts' law (Fitts, 1954).

Simple curved motions which contain several velocity peaks may be observed during obstacle-avoidance movements or when the target jumps in the middle of the motion. Such motions seem to be composed of the summation of straight or slightly curved motions, which are partially overlapped in time (Morasso and Mussa-Ivaldi, 1980). Similarly, the analysis of movements generated by stroke patients or during load adaptation tasks has shown that the template for the speed of the hand trajectory might be composed of a single or a few velocity primitives (Krebs et al., 1999).

Continuous two-dimensional drawing motions tend to follow a power law (Lacquaniti et al., 1983) *A* = *KC*<sup>β</sup>, where *A* is the angular velocity, *C* is the curvature, the power β is often near 2/3, and *K* is a gain factor which has been shown to be piecewise constant. The value of K depends on the linear extent of the segment, in a way that conforms to the isochrony principle (Viviani and Schneider, 1991). Thus, in spite of the apparent continuity of drawing movements, they may in fact be intrinsically discontinuous and constructed of individual segments. We term such intrinsic components "primitives" of motion. While there is experimental evidence supporting the idea that complex motion is composed of primitives (Flash and Hochner, 2005; Hart and Giszter, 2010), this view is not universally accepted (Tresch and Jarc, 2009).
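This power-law relation is easy to probe numerically. The sketch below (assuming NumPy; `power_law_exponent` is an illustrative helper, not the authors' pipeline) estimates β by log-log regression of angular velocity on curvature; an ellipse traced at constant equi-affine speed obeys the law exactly, with β = 2/3 and *K* = (*ab*)<sup>1/3</sup>.

```python
import numpy as np

def power_law_exponent(x, y, dt):
    """Fit A = K * C**beta on a sampled 2D path by log-log regression.

    A (angular velocity) and C (curvature) are estimated by finite
    differences; samples with near-zero speed or curvature are
    discarded. Illustrative sketch, not the authors' exact analysis.
    """
    vx, vy = np.gradient(x, dt), np.gradient(y, dt)
    ax, ay = np.gradient(vx, dt), np.gradient(vy, dt)
    speed = np.hypot(vx, vy)
    cross = np.abs(vx * ay - vy * ax)
    curv = cross / speed**3          # C
    ang_vel = cross / speed**2       # A = speed * C
    ok = (speed > 1e-6) & (curv > 1e-9)
    beta, log_k = np.polyfit(np.log(curv[ok]), np.log(ang_vel[ok]), 1)
    return beta, np.exp(log_k)

# Ellipse traced at constant equi-affine speed: the two-thirds power
# law holds exactly, with K = (a*b)**(1/3).
t = np.linspace(0.0, 2.0 * np.pi, 4000, endpoint=False)
beta, k = power_law_exponent(2.0 * np.cos(t), np.sin(t), t[1] - t[0])
```

Here `beta` should come out very close to 2/3 and `k` close to 2<sup>1/3</sup>.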

To examine the nature of the movement elements from which monkey scribbling movements are constructed, three data sets recorded from two monkeys were analyzed: two data sets recorded during 16 or 17 sessions at the beginning of the practice period, and one data set recorded from one of these monkeys during 17 sessions conducted after a full year of practice. Kinematic analysis of the scribbling movements of the highly trained monkey showed that these movements can be well-approximated by parabolic segments. The movements were first segmented into periods of rest and active drawing; the drawing movements were then analyzed kinematically, and various geometric, temporal, and kinematic variables were calculated. These variables included hand velocity and acceleration, Euclidean arc-length and curvature, and equi-affine arc-length and curvature.

To examine empirically whether the monkey movements can indeed be shown to be composed of a sequence of parabolas, the movement records were segmented into separate strokes, each lying between local minima of Euclidean curvature and containing a single maximum of Euclidean curvature. These strokes were then fitted with parabolic segments whose canonical representation is *y* = *x*<sup>2</sup>/2*p*. The parameter *p* is the focal parameter of the parabola; its value equals the radius of curvature at the point of maximum curvature. For more details see Polyakov et al. (2009a,b).

The error in fitting a stroke with a parabolic model was estimated using the parameter D evaluating the proportion of the data variance unexplained by the parabolic model, namely:

$$D = 1 - R^2 = \frac{\sum_i \left(y_i - \frac{x_i^2}{2p}\right)^2}{\sum_i \left[\left(x_i - \text{mean}(x)\right)^2 + \left(y_i - \text{mean}(y)\right)^2\right]}$$
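The fit itself reduces to one-parameter least squares once a stroke is expressed in the parabola's canonical frame. A minimal sketch (assuming NumPy; `fit_parabola_canonical` is a hypothetical helper, and the rotation into the canonical frame is assumed to have been done already):

```python
import numpy as np

def fit_parabola_canonical(x, y):
    """Least-squares fit of y = x**2 / (2p) to a stroke given in the
    parabola's canonical frame. Returns the focal parameter p and the
    unexplained-variance fraction D (0 for a perfect parabola).
    """
    c = np.sum(x**2 * y) / np.sum(x**4)        # c = 1/(2p)
    p = 1.0 / (2.0 * c)
    resid = y - x**2 / (2.0 * p)
    total = np.sum((x - x.mean())**2 + (y - y.mean())**2)
    return p, np.sum(resid**2) / total

# A noiseless parabolic stroke with p = 2 is recovered exactly.
x = np.linspace(-1.0, 1.0, 200)
p, d = fit_parabola_canonical(x, x**2 / 4.0)
```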

# **4.2. PARABOLIC PATTERNS DURING DRAWING: EVIDENCE FROM MONKEY SCRIBBLING**

Recorded trajectory segments were fitted with parabolic strokes (see **Figure 13**). The length of the movement segments that were well-approximated by parabolas was found to be longer for data derived from well-trained behavior vs. those derived from movements performed during the beginning of the practice period (see **Figure 14A**). **Figure 14B** shows the values of the D parameter estimating the error in fitting parabolas to the extracted segments. As is clear from this figure, this error became smaller as a function of the amount of practice the monkeys had had. To assess the degree to which scribbling movements are well-approximated by parabolic-like strokes, the values of the equi-affine curvature along these strokes were derived (see also **Figure 4A**) and their modifications with practice were examined.

This analysis showed that the distributions of the equi-affine curvature k of the fitted strokes peaked at zero (histograms in **Figure 15A**), with both negative and positive values. Moreover, it was found that during the first five to six practice sessions the absolute values of the equi-affine curvature |*k*| consistently decreased, converging toward nearly zero equi-affine curvature (**Figure 15B**). Hence, with practice, the extracted movement segments indeed tended to become more parabola-like. We also assessed what other geometric forms besides parabolic strokes might provide a good fit to the extracted scribbling segments; these included ellipses and polynomials of third, fourth, and fifth order. Several considerations suggest that parabolic rather than elliptic or polynomial segments provide a better model for drawing movements [for further details see Polyakov et al. (2009b)]. To quantify the tradeoff between goodness-of-fit and model simplicity (number of parameters of the fitted curve), the Schwarz information criterion (SIC) (Schwarz, 1978) was used (Polyakov et al., 2009b). This analysis showed that the parabolic model yielded the highest SIC score, indicating that it is optimal in the sense of the goodness-of-fit vs. simplicity trade-off. Hence, taken together, these results suggested that parabolas might be considered as more attractive candidates for serving as plausible movement primitives.

# **4.3. CLUSTERING OF THE EXTRACTED PARABOLIC SEGMENTS**

The focal parameter and the orientation define a unique parabola up to translation (see **Figure 16**). The parabolic segments that were fitted to the recorded movement segments were then clustered according to their spatial orientation. In contrast to the lack of distinct clusters in the histograms obtained for the parabolic segments derived from the movements recorded during the beginning of the practice period, the parabolic segments extracted from the well-practiced movements clearly showed convergence toward a few well-separated clusters (see **Figure 16**). Hence the well-practiced movements could be fitted by three parabolic segments (see **Figure 16B**). Further examination of the locations of the vertices of similarly oriented extracted parabolic segments also showed that after a period of practice, these locations could be separated into three distinct locations.

**FIGURE 14 | Degree of fit and length of the parabolic strokes fitted to movement segments. (A)** Euclidean lengths of the fitted parabolic strokes. In each plot, median values (over sessions) and 95% confidence intervals are shown. **(B)** Values of the D parameter estimating the error in fitting parabolas to the extracted segments.

### **4.4. NEURAL CODING BY MEANS OF EQUI-AFFINE VARIABLES**


While the above analysis has indeed supported the hypothesis that parabolas might be considered as likely candidates for serving as movement primitives, further analysis was carried out to directly examine whether equi-affine speed is indeed represented in single-unit motor cortical activities recorded during well-trained scribbling. This analysis was based on the method suggested by Stark and Abeles (2007). Here we describe the procedure used to compare the representation strengths of Euclidean vs. equi-affine speeds [for details of the method see Stark and Abeles (2007) and Polyakov et al. (2009b)].

Given that the equi-affine and Euclidean speeds were found to be highly correlated for the recorded scribbling movements, neural activity related to one of these two variables is expected to be trivially related to the other variable at a similar time lag. To overcome this problem, single-unit firing rates were regressed simultaneously on the Euclidean and equi-affine speeds at multiple time lags, using the following multiple linear regression model (see Equation 2 of section 2.4):

$$f(t) = a + b\,\dot{s}\,(t + \tau_{\dot{s}}) + c\,\dot{\sigma}\,(t + \tau_{\dot{\sigma}}) \tag{3}$$

where *f*(*t*) is the unit's firing rate at time *t*, τ<sub>*ṡ*</sub> and τ<sub>*σ̇*</sub> are the time lags for the Euclidean and equi-affine speeds, respectively, and *a*, *b*, and *c* are regression coefficients. Positive time lags correspond to neural activity preceding the movement. Note that the Euclidean and equi-affine velocity vectors have the same direction; we therefore ignore the direction of the velocity vectors and consider only their amplitudes. The influence of the Euclidean and equi-affine speeds was estimated using the measure of contribution defined in Stark et al. (2006).
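A minimal sketch of this multi-lag regression (assuming NumPy, pre-binned signals on a common time base, and lags expressed in samples; `lagged_r2` is an illustrative helper, not the published implementation):

```python
import numpy as np

def lagged_r2(rate, euc, aff, lags):
    """R^2 of f(t) = a + b*s(t + tau_s) + c*sigma(t + tau_sig)
    over a grid of non-negative lag pairs (in samples). Positive lags
    shift the speed signals forward, i.e., the neural activity leads.
    """
    rate, euc, aff = map(np.asarray, (rate, euc, aff))
    n, m = len(rate), max(lags)
    y = rate[:n - m]
    sst = np.sum((y - y.mean())**2)
    r2 = np.zeros((len(lags), len(lags)))
    for i, ts in enumerate(lags):
        for j, ta in enumerate(lags):
            X = np.column_stack([np.ones(n - m),
                                 euc[ts:ts + n - m],
                                 aff[ta:ta + n - m]])
            coef, *_ = np.linalg.lstsq(X, y, rcond=None)
            r2[i, j] = 1.0 - np.sum((y - X @ coef)**2) / sst
    return r2
```

A unit whose rate tracks one speed at a fixed lag produces a ridge of high R<sup>2</sup> along the corresponding row or column of the matrix.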

Following Stark et al. (2006) [for further details see Polyakov et al. (2009b)], a stripe (a horizontal or vertical set of values) in a contribution matrix, all of whose entries correspond to the same time lag of one of the two speed parameters, was deemed dominant if at least half of its constituent values were above *max*(*R*<sup>2</sup>)/2, where the maximum is taken over all *R*<sup>2</sup> values at all time-lag combinations of the two parameters. Using this method, the activity of 87 well-isolated units recorded during the scribbling task was analyzed. The activity of 72 units (83%) was related to the monkey's hand position, velocity, acceleration, or some combination of these kinematic variables.
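The dominance criterion translates almost directly into code. A sketch (assuming NumPy; `dominant_stripes` is a hypothetical name, and the matrix is indexed Euclidean-lag by equi-affine-lag):

```python
import numpy as np

def dominant_stripes(contrib, r2_max):
    """Flag horizontal/vertical stripes of a contribution matrix as
    dominant when at least half of their entries exceed r2_max / 2,
    where r2_max is the maximum R^2 over all lag combinations.
    """
    above = contrib > r2_max / 2.0
    rows = above.mean(axis=1) >= 0.5    # one fixed Euclidean lag
    cols = above.mean(axis=0) >= 0.5    # one fixed equi-affine lag
    return rows, cols
```

A single strong column yields one flagged vertical stripe and no horizontal ones.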

The contribution matrices for the combined speed model for one of these units are shown in **Figure 17A**. For the unit depicted there, the contribution of equi-affine speed to the firing-rate variance is dominant: the right matrix contains a dominant vertical region around a time lag of 0.12 s, indicating that neural activity precedes movement (permutation test, *P* < 0.05). In contrast, the contribution matrix for the Euclidean speed (left) does not contain a dominant stripe. Further analysis showed that the combined Euclidean/equi-affine speed model (see above) provided a good fit for the firing rates of 16/72 (22%) of the movement-related units (permutation test, *P* < 0.05; **Figure 17B**). The activity of seven of these units (44%) was related to both Euclidean and equi-affine speeds. However, equi-affine speed was dominant in the activity of six units (38%), whereas Euclidean speed was dominant for only three units.

# **5. ANALYSIS OF NEURAL STATES IN PARALLEL SPIKE TRAINS**

In this section we wish to treat the parallel recorded spike trains as a time-varying vector of activity. We do this by considering the recorded activity as the output of a Markov chain.

In a first-order Markov chain a system may be in one of *N* states while time progresses in discrete steps. For each state there is a vector of probabilities that the system will flip to any other state or remain where it is. The state of the system may not be known explicitly, but is observable through some output that is related to the state in a probabilistic manner. The observed information can be disentangled by estimating the most likely state with a Hidden Markov Model (HMM). HMMs have been used for the analysis of spike trains in the past (Radons et al., 1994; Abeles et al., 1995; Gat et al., 1997). In our analysis, we assume that the little piece of cortex in which our electrodes were placed behaves like a Markov chain, and that the activity of the several (*M*) recorded neurons is the observable output of the system. Thus, assuming that the piece of cortex may be in one of *N* states, we first find an optimal *N* × *N* matrix of transition probabilities among states (*P*) and an optimal set of *M* firing rates for each state (an *N* × *M* matrix of firing rates, Λ). As firing rates are typically low, the time steps had to be fairly large (50 ms). We assume that the probability of observing *n* spikes of a certain single unit is *p*(*n*|*x*) = *e*<sup>−*x*</sup>*x*<sup>*n*</sup>/*n*!, where *x* is the expected number of spikes given by *x* = λ*t*, λ is the firing rate, and *t* is the duration of the step. We further assume that the different single units fire independently.
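Under these assumptions the per-step emission probability of the HMM is a product of Poisson terms, handled in log space for numerical safety. A minimal sketch (standard library only; firing rates are assumed strictly positive):

```python
import math

def emission_log_likelihood(counts, rates, dt):
    """Log-likelihood of one time step's spike counts under one HMM
    state: sum over units of log p(n|x), with p(n|x) = exp(-x) x**n / n!
    and x = rate * dt. Units are assumed to fire independently.
    """
    ll = 0.0
    for n, lam in zip(counts, rates):
        x = lam * dt                      # expected count, must be > 0
        ll += -x + n * math.log(x) - math.lgamma(n + 1)
    return ll
```

For a unit firing at 20 spikes/s in a 50 ms bin, x = 1, so observing zero spikes has log-probability −1.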

With these assumptions, and knowing *P* and Λ, we may estimate the likelihood of observing the measured spike trains throughout the recorded period for any possible series of states of the Markov chain. We start with a guess of *P* and Λ and improve them by an expectation-maximization procedure. Here we used both the Viterbi training algorithm and the Baum-Welch algorithm; once optimal *P* and Λ are found, we can also specify the state sequence that provides the best likelihood of observing the recorded spike train. Typically, for any given spike train, within a window of 50 ms no spikes at all were observed in many cases, occasionally there was one spike, and rarely two or three. This situation produces a very uneven terrain of likelihood (as a function of *P* and Λ). Therefore, the initial guess of *P* and Λ may be critical. We used the following algorithm to obtain the initial guess.

For each 50 ms window and each spike train, we computed the probability of observing that many spikes given the number of spikes observed in the preceding 200 ms. The product of these probabilities over all the recorded spike trains provided an estimate of the likelihood that the firing rates in the present 50 ms are the same as in the previous 200 ms. **Figure 18** illustrates a small stretch of such a computation.

Instead of the probability itself, we used the negative log of the probability (MLL, for minus log-likelihood). Peaks in MLL that exceeded five times the standard deviation of MLL were taken as points in time at which the activity flipped from one stationary state to another.
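This change-point step can be sketched as follows (assuming NumPy and 50-ms binned counts; the thresholding here uses mean + 5 SD of the MLL trace, one plausible reading of the criterion, and `transition_times` is a hypothetical helper):

```python
import math
import numpy as np

def transition_times(counts, past_bins=4, n_sd=5.0):
    """Detect cooperative rate flips in binned spike counts.

    counts: (units, bins) array of per-bin spike counts. Each bin's
    count is scored by its Poisson log-probability given the mean
    count of the preceding past_bins bins (200 ms for 50-ms bins);
    MLL sums the negative logs over units, and peaks above
    mean + n_sd * SD mark candidate transitions.
    """
    units, bins = counts.shape
    mll = np.zeros(bins)
    for b in range(past_bins, bins):
        expected = counts[:, b - past_bins:b].mean(axis=1)
        for u in range(units):
            x = max(expected[u], 1e-3)    # guard against log(0)
            n = counts[u, b]
            mll[b] -= -x + n * math.log(x) - math.lgamma(n + 1)
    return np.flatnonzero(mll > mll.mean() + n_sd * mll.std())
```

A simultaneous rate jump in all units produces a single sharp MLL peak at the flip.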

The number of states of the HMM was preselected between *N* = 6 and *N* = 8, as this range seemed to yield the best results, as described below. Any series of observations will converge to some optimal *P* and Λ. How do we know that the HMM is a reasonable one? One way is to look at the probability of the Markov chain being in each possible state as a function of time. A good fit to an HMM results in sharp transients of probabilities, as may be seen in **Figure 2**. In the present experiments, we have the advantage of observing the behavior (scribbling) during the time that the brain activity was recorded. Thus, we may confront the series of hidden states with the drawings, as illustrated in **Figure 19**.


In arm-related motor areas the majority of units are related to the direction of movement of the arm's end-point. Indeed, as seen in **Figure 19**, most of the time the individual HMM states map onto short arcs of the scribbling with similar directions.

However, we also showed in section 4 that the monkey's drawings tend to be composed of concatenations of three parabolas. One can hope that on some days the HMM may also reveal the "intention" to draw parabolas. If unique states of ensemble activity of motor cortical cells represent movement primitives, it should be possible to associate such states with distinct movements having common characteristics. To test this possibility, we used an HMM (Abeles et al., 1995; Gat et al., 1997), and the recorded motor cortical activities were segmented in an unsupervised manner without using any information about the concurrent movements. The HMM analysis was applied to the activity of groups of simultaneously recorded motor cortical units (5–12 units/session, 8 sessions). To be considered dominant, a state had to have a probability above 0.5 for at least 0.1 s, with its time-average being at least 0.75.

One such segmentation is demonstrated for the results shown in **Figure 20A**. The HMM provided the a posteriori probabilities of the states as a function of time. **Figure 20C** shows a period when state 1 was dominant. Movement periods associated with the periods of dominance of the eight identified states were identified by finding an optimal time lag between the neural state and the corresponding movement segments. An optimal time lag for each state was determined by seeking the time lag providing the highest similarity between the geometrical shapes of the movement segments associated with that state, lagged relative to the neural activity. A single time lag was used, although many units were active during each state and different neurons may have diverse time lags (Moran and Schwartz, 1999; Stark and Abeles, 2007). The paths identified with each state are depicted in **Figure 20A**. Nearly 50% of the duration of the neural data analyzed in this session was identified with periods of dominant a posteriori probabilities of the hidden states. States 1–4 corresponded to geometric strokes easily identifiable as parabolic strokes with specific orientations. State 1 (**Figures 20B,C**), for example, corresponded to parabolic strokes having an orientation of 270 degrees (direction of the normal at the vertex). States 1 and 2 could be identified with single parabolic strokes, whereas states 3 and 4 corresponded to elements from sequences composed of two parabolic strokes. States 5–8 corresponded to slower movements, presumably associated with periods of rest. Thus, the HMM segmentation, although unsupervised, resulted in partitioning the movements into sets of parabola-like elements.

**FIGURE 18 | Cooperative changes in firing rates.** Negative log-likelihood that the activity is stationary during a 5-s time interval. For all the spike trains we compute MLL<sub>*i*</sub> = −log [prob(#spikes now | #spikes in the past 200 ms)] and MLL = Σ<sup>*M*</sup><sub>*i* = 1</sub> MLL<sub>*i*</sub>. We show the MLL of 11 well-isolated and stable spike trains (blue traces), the sum of the blue traces (red trace), and times of transition (black lines). The top 22% of the peaks were taken as transition times. Note that several individual MLLs have peaks at the same points, indicating the tendency of cooperative changes in firing rates. For each such "stationary" piece we computed the mean firing rates of every single unit, obtaining a vector of M firing rates. The vectors of mean firing rates were clustered into *N* groups by the *k*-means algorithm. The probability of transition from state *i* to state *j* was estimated by counting how many times activity assigned to group *i* was followed by activity assigned to group *j*. The firing rate for group *i* was computed by pooling all the time slices judged to belong to group *i*. These were then used to initialize *P* and Λ.


# **6. TIME PRECISION OF NEURAL DATA**


In the cerebral cortex, where each nerve cell is affected by thousands of others, it is commonly believed that the exact time of a spike is random up to an average firing rate. Yet precise time relations among several neurons have been observed in brain slices (Ikegaya et al., 2004) and in behaving animals. In behaving monkeys, the time intervals between spikes, measured in correspondence with a specific behavior, may be controlled to within the millisecond range, with the best case reaching 0.5 ms. The realization that time relations among different neurons can be precisely controlled and read out also implies that complex representations could be built from simpler ones efficiently and very fast. We used data-mining techniques and rigorous statistical testing to test how precise the time intervals between spikes of different neurons can be.


# **6.1. DATA MINING ON THE SPIKES RECORDING**

Single-unit activity was recorded from 8 microelectrodes inserted into the motor and premotor cortices of a monkey while it was freely scribbling as described in section 3. Spike data analysis was carried out for two sets of measurements. In the first set (consisting of three experimental days), time resolution of recording was 1 ms, while in the second set (consisting of five other days) it was 0.1 ms.

The basic entity of the neural data is a single spike generated at a specific time by a specific neuron. A neural component was defined as a triple (*n*1,*n*2,δ), where *n*1 and *n*2 are two neurons and δ is a time-interval (between spikes generated by these neurons). In this way, each pair of spikes in the neural data could be interpreted as an occurrence of some neural component. For the results given in this article, the total number of time-intervals between spikes was limited to 50. In the first set of measurements, time-intervals were quantized to 2 ms. In the second set, in which spikes were recorded with a resolution of 0.1 ms, the bin width was 1 ms.
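Enumerating neural components is then a scan over ordered neuron pairs. A sketch (pure Python; `neural_components` is a hypothetical helper, and the same-electrode exclusion discussed next is left out for brevity):

```python
from collections import Counter

def neural_components(spikes, n_intervals=50, bin_ms=2):
    """Count occurrences of neural components (n1, n2, delta).

    spikes: dict mapping neuron id -> sorted spike times (ms).
    delta is the quantized interval from a spike of n1 to a later
    (or simultaneous) spike of n2, limited to n_intervals bins.
    """
    counts = Counter()
    horizon = n_intervals * bin_ms
    for n1, t1s in spikes.items():
        for n2, t2s in spikes.items():
            if n1 == n2:
                continue
            j = 0
            for t1 in t1s:
                while j < len(t2s) and t2s[j] < t1:
                    j += 1            # first n2 spike not before t1
                k = j
                while k < len(t2s) and t2s[k] - t1 < horizon:
                    counts[(n1, n2, int((t2s[k] - t1) // bin_ms))] += 1
                    k += 1
    return counts
```

With 2-ms bins, an interval of 37 ms falls in bin 18; the returned `Counter` maps each component to its occurrence count.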

Spike sorting by shapes recorded through a single electrode can result in confusion: intracellular properties which may generate precise time intervals can be confused with precise timing generated by the organization of activity in the network. Therefore, we considered only neural components consisting of two neurons recorded through different electrodes. For example, if we have three electrodes recording spikes from two neurons each, there are 30 − 6 = 24 valid pairs of neurons (note that the pairs <*n*1,*n*2> and <*n*2,*n*1> are different). Combining these with the 50 possible time intervals per component, we have 24 · 50 = 1200 potential neural components, some of which are frequent while others may never occur. In the days analyzed there were thousands of such neural components.

# **6.2. DATA MINING ON THE DRAWING RECORDING**

The hand position was sampled 100 times per second (dots in the trajectory drawings). The monkey mostly drew in a counter-clockwise direction. In order to find repeated patterns of drawing we used data-mining techniques. For this purpose the continuous drawing must be converted into a sequence of events (drawing events); in one experimental day there are hundreds of such events, and searching for repeating sequences of events is greatly facilitated by data-mining algorithms. A drawing event was marked as occurring at the time at which a certain drawing property changed from one range of values to another. The property itself may be arbitrarily chosen. For example, it could be defined as a change in the drawing direction from a range of 0–30° to a range of 30–60°. Other definitions can be based on changes in the curvature or in the velocity of the drawing. Once a set of criteria for identifying drawing events is defined, the drawing data is translated into a sequence of these events along the time axis. Then, data-mining algorithms are applied to detect repeating subsequences in the translated data. The repeating subsequences are called drawing components. Naturally, different definitions of the set of criteria lead to different drawing components.
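One concrete event definition, crossings between direction bins, can be sketched as follows (assuming NumPy; `direction_events` is an illustrative helper, and curvature- or velocity-based events would follow the same pattern):

```python
import numpy as np

def direction_events(x, y, bin_deg=30):
    """Translate a sampled 2D trajectory into drawing events: the
    sample indices at which the movement direction crosses from one
    angular bin (e.g. 0-30 deg) into another. Returns a list of
    (sample_index, new_bin) pairs.
    """
    angles = np.degrees(np.arctan2(np.diff(y), np.diff(x))) % 360.0
    bins = (angles // bin_deg).astype(int)
    change = np.flatnonzero(np.diff(bins) != 0) + 1
    return [(int(i), int(bins[i])) for i in change]
```

A rightward stroke turning upward produces a single event at the turn (bin 0 to bin 3 with 30° bins).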

Repeated scribbling paths were extracted by data-mining algorithms. These paths are called drawing components. In a typical day there are 12–22 such drawing components. **Figure 21** illustrates the monkey's drawings and two simple drawing components.

# **6.3. COMPARISON**

To determine whether there are precise timing relations between the spikes of two neurons and the drawing, we selected a time slice before the start of each drawing component. For a given pair of neurons, we counted how many times a spike in the first neuron was followed by a spike in the second neuron within each of 50 particular time intervals. For the first set of measurements these intervals were 0–1 ms, 2–3 ms, . . . , 98–99 ms, and for the second set they were 0–0.9 ms, 1–1.9 ms, 2–2.9 ms, . . . , 49–49.9 ms. The interval that repeated the largest number of times was hypothesized to show precise firing times in relation to this particular drawing component. For example, the interval 37–38 ms between neuron 1 from electrode 8 (denoted by 8.1) and neuron 2 from electrode 1 (denoted by 1.2) repeated 372 times within the time window 400 ms to 100 ms prior to the drawing component shown in **Figure 21B**. **Figure 22** depicts 62 of these repetitions (uniformly distributed such that 1 of each 6 is shown).
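The counting underlying this support measure can be sketched as follows (assuming NumPy, times in seconds; `best_interval` is a hypothetical helper returning the most frequent quantized interval and its count):

```python
import numpy as np

def best_interval(t1s, t2s, comp_times, t_from, t_to,
                  n_bins=50, bin_s=0.002):
    """For one neuron pair and one drawing component, count each
    quantized interval (spike of neuron 1 -> later spike of neuron 2)
    inside the windows [T_i + t_from, T_i + t_to] around the
    component's occurrences; return (best_bin, support).
    """
    t1s, t2s = np.asarray(t1s), np.asarray(t2s)
    counts = np.zeros(n_bins, dtype=int)
    for tc in comp_times:
        lo, hi = tc + t_from, tc + t_to
        for t1 in t1s[(t1s >= lo) & (t1s <= hi)]:
            deltas = t2s[(t2s >= t1) & (t2s < t1 + n_bins * bin_s)] - t1
            for d in deltas:
                counts[int(d // bin_s)] += 1
    best = int(np.argmax(counts))
    return best, int(counts[best])
```

An interval of 37 ms with 2-ms bins falls in bin 18 (the 36–38 ms bin).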

To assess the probability of chance events, we generated 1000 surrogate spike trains by randomly teetering the time of each spike within 10 ms around its real time. For each of these surrogates we counted, in the same way, all possible intervals between neurons 8.1 and 1.2 that were repeated around the same drawing component during the same time slice. As before, the maximal frequency of these intervals was taken as representative of that surrogate. For the two single units whose firing is illustrated in **Figure 22**, the 1000 maximal frequencies tended to be substantially smaller than 372. Their mean and variance were used to estimate the probability of obtaining 372 repetitions, assuming a normal distribution of counts. For this drawing component and pair of neurons the mean was 350.7 and the standard deviation was 5.0, yielding a probability of 0.00001.

The counting process is not likely to be distributed normally, so that assessing probability by the mean and variance may be misleading. A much more complicated issue involves finding a rare event for all twelve drawing components recorded that day.

**FIGURE 22 | Frequent inter-spike intervals.** Dot display showing occurrences of a frequent inter-spike interval around occurrences of the drawing component that was shown in **Figure 21B**. **Top panel** shows the firing times of unit 8.1. The **bottom panel** shows the firing times of unit 1.2. Each linelet represents a single spike. Linelets representing spikes which took part in the selected interval are colored red. The rasters were aligned on the first spike of the selected interval. The time of onset of the drawing is colored blue. Trials are sorted by increasing delays between the neural intervals and the drawing components. The gray line in each panel represents the average firing rate considering all 372 common occurrences using bins of 9 ms. Scale bars at the bottom right corner of each panel are 50 spikes per second.

We analyzed all the 50 possible pairs of neurons on that day, all 50 possible time intervals for each pair, and for seven different time slices around the start of each of the drawing components. Picking the rarest event out of all these possibilities should yield a highly unlikely event. Hence, we need to assess the likelihood of finding such low probabilities when multiple trials are conducted.

Once we had the times of occurrence of all the drawing components and the neural components, we were interested in finding relations between drawing components and pairs of neurons. For each drawing component A and for each possible pair of different neurons <*n*1,*n*2> we counted the occurrences of each neural component around A that consisted of a spike from *n*1 and a spike from *n*2. In other words, we were interested in the total occurrences of each relevant time-interval between a spike of *n*1 and a spike of *n*2. The time regions in which we counted these intervals were determined relative to the occurrences of A by two external parameters *T*from and *T*to. Formally, suppose that during a recording day A occurred at {*T*1, *T*2, *T*3, . . . , *Tn*}; then the time regions are [*Ti* + *T*from, *Ti* + *T*to], where 1 ≤ *i* ≤ *n*.

Eventually, we defined the support of a relation to be the total number of occurrences of the most frequent interval. Note that in practice the supports of the relations were computed for several [*T*from,*T*to] windows. For the first set of measurements the windows were {[−1.4, −1.1], [−1.2, −0.9], [−1.0, −0.7], . . . , [−0.2, 0.1]}, while for the second set they were {[−1.0, −0.9], [−0.9, −0.8], [−0.8, −0.7], . . . , [−0.1, 0.0]}. At a later stage, the range with the strongest result was selected for each recording day.

**Figure 22** shows the spike activity around the appearances of the drawing components in **Figure 21A**. This figure provides further indications that the relations between the neural interval and the drawing component were not random or due to trivial artifacts. First, the delay between the neuronal component and the drawing component (blue marks) is not evenly distributed between −0.4 and −0.1 s, as might be expected for chance relations. Second, in the lower panel the firing rate is stationary; in this condition, had the red marks been random, the spike density around these marks should have approximated the autocorrelation function, which must be symmetric. However, the small troughs on both sides of the peak (relative refractoriness) are not symmetric. The difference is significant at *P* = 0.013.

Using statistical analysis we showed that the probability of synchronization between the scribbling and the neural activity is more than 99.8%. Using the computed support for each relation in a recording day, a statistic called the *relations-score* was extracted for that day (details are given below). Intuitively, the relations-score gets larger as the likelihood of the support for the relations decreases. Once a relations-score was computed for the actual data (denoted by *S*0), we evaluated the probability of its occurring by chance.

As relations between hand motion and firing rates of neurons have been studied extensively, we wanted to test whether *S*0 was significantly higher than what we would expect from random data that have similar firing rates. In order to simulate random data that preserves firing rates of neurons, we randomly teetered the time of each original spike within a small window of size W (Geman et al., 2000; Amarasingham et al., 2003).

For example, if *W* = 10 ms and the original time of some spike was 125 ms, its new time after teetering may be any time within (120, 130) ms. Using this technique, we generated 5000 such surrogate spike trains (for the first set of measurements) or 1000 such surrogates (for the second set). Each surrogate train was given a relations-score (*S*1, *S*2, *S*3, . . . , respectively) following exactly the same procedure as for computing *S*0 (including re-teetering 1000 times to evaluate the probability of each relation, re-selecting the best support among all possible intervals, and re-selecting the best time slice [*T*from,*T*to] that leads to the largest relations-score). These 5000 (or 1000) values were used to estimate the probability [denoted by *p*(*S*0)] of getting the value *S*0 by chance. For example, if only 50 surrogate trains exceeded the relations-score of the actual data, then *p*(*S*0) was estimated as 50/5000 = 1%. **Table 1** shows the minimal width of the teetering window for each day that led to a significant value of *p*(*S*0).
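The surrogate construction and the probability estimate can be sketched as follows (assuming NumPy; `teeter` and `p_value` are illustrative helpers, and the full relations-score pipeline around them is omitted):

```python
import numpy as np

def teeter(spike_times, w, rng):
    """Jitter ('teeter') each spike uniformly within +/- w/2 of its
    original time, preserving overall firing rates (Geman et al.,
    2000; Amarasingham et al., 2003)."""
    s = np.asarray(spike_times, dtype=float)
    return np.sort(s + rng.uniform(-w / 2.0, w / 2.0, size=s.shape))

def p_value(actual_score, surrogate_scores):
    """Estimate p(S0) as the fraction of surrogate relations-scores
    that reach or exceed the actual one."""
    return float(np.mean(np.asarray(surrogate_scores) >= actual_score))
```

With W = 10 ms, a spike at 125 ms lands somewhere in (120, 130) ms, exactly as in the example above.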

A significant value for this probability indicates that the relations between drawing components and pairs of neurons (on that day) are damaged by teetering within a window of size W. From this we conclude that around occurrences of similar behavior, pairs of spikes tend to prefer specific inter-spike intervals (ISIs). We considered only those pairs involved in significant relations (i.e., relations with an estimated probability of less than 0.05). Note that because only a minute subset of the neurons in the cortex is recorded, not all the recorded neurons would necessarily show the real time accuracy of the brain.

A method for testing the null hypothesis that spike times are random within a window of width W has been proposed (Geman et al., 2000; Amarasingham et al., 2003). If this null hypothesis is true, then replacing the time of each spike by a randomly selected time within W around its true time should not affect any statistic extracted from the spike times. To use this idea we need to describe the entire set of relations between firing intervals and drawing by one statistic. To do so we defined a statistic based on the ten least likely relations between pairs of spikes and any of the drawing components, which we termed the *relations-score*. Intuitively, the relations-score grows as the existing relations become less likely to exist by chance. For each recording day we computed a relations-score for the actual data; we then randomly teetered all spike times within some time window W and recomputed it for the teetered data. Given a teetering width W, the computation of the relations-score statistic for a recording day involves the following steps:

	- (a) Compute the support of R (denoted by *R*support).
	- (b) Skip R if its support is less than a predefined noise threshold. This step is carried out in order to prune noisy relations. Note that the values used for this threshold were 60, 40, 20, 20, 30, and 30 for the six significant recording days listed in **Table 1** from left to right respectively (depending on the firing rates of the neurons in that day).



**TABLE 1 |** *The table shows the minimal widths of the teetering windows used in each of the six (out of eight) recording days to obtain a value of p(S0) of less than 5%. The second set of measurements, where spike times were recorded at a resolution of 0.1 ms (instead of 1 ms), led to much better precision. Note that no significant results were achieved when smaller teetering windows were used.*


Independent teetering was done 5000 times (for the first set of measurements) or 1000 times (for the second set), and a histogram of the relations-scores for the teetered data was constructed. After each teetering, all the parameters for extracting the relations-score were re-evaluated to obtain the highest possible value for each teetered data set. **Figure 23A** illustrates this histogram for *W* = 10 ms. As stated above, the relations-score for each teetered data set was computed de novo by the same process used for the actual data (including trying all possible drawing components, pairs of neurons, time intervals, and time slices).

This method was used to estimate the probability *p* of obtaining the relations-score of the actual data by chance. From this probability we derived the surprise value, defined as −log2(*p*). **Figure 23B** shows the surprise values for one recording session, obtained when spike times were teetered within different windows between 2 and 8 ms. Clearly, teetering within 3 ms already had a significant effect on the surprise value; thus **Figure 23B** already indicates that the spike times of the cortical neurons are accurate to within 3 ms. Although we used 2 ms bins for measuring the intervals on that day, teetering by 2 ms is not pointless, because the intervals are binned only after the teetering is done. Since the original data are measured at a resolution of 1 ms in this case, intervals may still be damaged by teetering.

**FIGURE 23 | Relations-score and teetered data. (A)** Distribution of relations-scores for surrogate spike trains and the actual data. Five thousand surrogate spike trains were independently generated by teetering spike times within 10 ms. For each of these a relations-score was extracted, and the distribution of these values was estimated by a histogram. The actual data had a value of 106.37 (arrow). None of the 5000 surrogate trains had a value above it; hence the *p*-value for the actual data was estimated as less than 1/5000. **(B)** Surprise values for different teetering windows. Abscissa is the teetering window, ordinate is the surprise value. The horizontal line shows the surprise value for a significance of 0.05. Thus, teetering within 3 ms already had a significant effect.

Significant relations-score values were observed in three days (out of three) for the first set of measurements and in three days (out of five) for the second set (six significant days in total). For the three days of the first set, the smallest teetering windows producing significant results were 3 ms (shown here), 6 ms, and 12 ms; for the three days of the second set they were 0.5, 3, and 4 ms. Note that these values represent an upper bound on the resolution: a different statistic might indicate even higher temporal precision.

When the same procedure was repeated step by step for the neural data around randomly selected points in time (instead of the occurrence times of drawing components), no significant surprise values were found (for a teetering window W of 10 ms). Thus, significant time relations were obtained only by relating the neural intervals to specific features of the behavior. Furthermore, no significant surprise values were found when the same procedure was repeated on teetered neural data instead of the original data.

In section 7 we will present a specific implementation of a neural architecture, the synfire chain (Abeles, 1991), that provides an explanation for the observed timing relations.

# **7. MODELING NEURAL MOVEMENT CONTROL WITH SYNFIRE CHAINS**

### **7.1. SYNFIRE CHAINS**

Here we describe neural network simulations of synfire chains and consider in particular how multiple synfire chains may be embedded in a large network. Synfire chains can be seen as constituting sub-networks of the local cortical network that interact in the context of the ongoing background activity. The original version was a chain of length *l* composed of groups of *w* neurons each (the chain width), with full unidirectional connectivity between successive groups. A group together with its output connections is also called a link of the synfire chain. Such a chain structure propagates near-synchronous activity along successive groups like a row of dominoes. Detailed properties of various versions of such systems have received considerable theoretical attention: stability properties have been studied (Herrmann et al., 1995), as well as structural variations such as partial group-to-group connectivity or feedback arising when a given neuron occurs in more than one group. There have also been studies of partially interconnected synfire chains to define the conditions in which activity can jump between chains (Hayon et al., 2005).

For a theoretical framework, we will refer to the balanced random network architecture, a common single-layered model of the local cortical network (van Vreeswijk and Sompolinsky, 1996; Amit and Brunel, 1997; Brunel, 2000). It explains the asynchronous irregular low-rate activity of cortical neurons and the large fluctuations of the membrane potential observed *in vivo*, and will serve as the basic architecture here as well (**Figure 24**). Our network differs from previous approaches in two essential properties: first, there is heterogeneity in the number of dendritic synapses; second, the excitatory-to-excitatory sub-network is purely composed of synfire chains.

The dynamics of the network is adjusted to the asynchronous irregular (AI) regime to reproduce relevant aspects of cortical dynamics: low-rate firing of the individual neurons with Poisson-like interval statistics and large fluctuations of the membrane potential. The model consists of 80% excitatory and 20% inhibitory integrate-and-fire point model neurons (see **Figure 24** for the full set of parameters). For the E-I, I-I, and I-E connections, each neuron establishes a random number (drawn from a binomial distribution) of dendritic synapses. The mean of the distribution is a tenth of the size of the respective target population, which does not exclude multiple synapses between a pair of neurons.

**FIGURE 24 | Sketch of the cortical network model.** All connections between the 40,000 excitatory neurons in the model network are formed by synfire chains (vertically striped arrow). A neuron can occupy only one position in each chain but may contribute to several of the total of 50 chains. Each chain consists of 20 pools, each of which is fully connected to the next pool. The intra-chain synapses enable excitatory postsynaptic potentials (EPSPs) of a = 0.5 mV generated by alpha-function currents with a rise time τα = 0.2 ms. Synaptic delays are drawn from a uniform distribution between 0.5 and 3 ms, but are identical for all synapses connecting one pair of pools. Chains are stimulated at 1 Hz with independent Poisson sources. Stimuli arrive only at the neurons in the first pool of a chain and consist of 100 spike times drawn from a Gaussian distribution with standard deviation σ = 1 ms. In addition to the excitatory (E) neurons there are 10,000 inhibitory (I) neurons which are recurrently interconnected (horizontally striped arrows): each neuron establishes a random number of synapses drawn from a binomial distribution with the mean given by 10% of the size of the respective target population. EPSP amplitudes outside chains are a = 0.1 mV; inhibitory postsynaptic potential (IPSP) amplitudes are 6-fold larger and have a rise time of 0.6 ms. Delay distributions are the same as within chains. Each neuron receives (cross-hatched arrows) an excitatory DC drive of 350 pA. Further parameters for the integrate-and-fire point neurons are τ = 20 ms, C = 250 pF, θ = 20 mV, τref = 2 ms.

The excitatory-to-excitatory sub-network is here, in contrast to other works, purely composed of a superposition of *m* = 50 synfire chains. These chains are constructed consecutively. First, we randomly select *w* × *l* neurons (here: *w* = 100, *l* = 20) from the population of excitatory neurons without repetition. Next, these neurons are connected into a feed-forward sub-network of *l* successive links of *w* neurons each. Each neuron, except those in the first link, establishes a dendritic synapse with all the neurons in the preceding group. The connection procedure is repeated *m* times and leaves us with complete divergent and convergent connectivity between the links (Griffith and Horn, 1963; Abeles, 1982).
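The wiring step for a single chain can be sketched as follows, using the parameter values given in the text (*w* = 100, *l* = 20, 40,000 excitatory neurons); the function name and data layout are ours, not part of the original model code.

```python
import numpy as np

def build_chain(n_excitatory, w, l, rng):
    """Select w*l excitatory neurons without repetition and wire them into
    l successive, fully connected links of w neurons each."""
    links = rng.choice(n_excitatory, size=w * l, replace=False).reshape(l, w)
    # every neuron in link k projects onto every neuron in link k + 1
    edges = [(pre, post)
             for k in range(l - 1)
             for pre in links[k]
             for post in links[k + 1]]
    return links, edges

rng = np.random.default_rng(1)
links, edges = build_chain(40000, w=100, l=20, rng=rng)
# repeating this m = 50 times with independent draws superimposes the chains;
# a neuron may then appear in several chains, but at most once per chain
```

Because `replace=False` applies only within one call, repetitions across calls are allowed, reproducing the property that a neuron may participate in several chains but occurs at most once in any given chain.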

An excitatory neuron may participate in several chains but occurs at most once in any given chain. The synaptic delays are drawn from the distribution specified earlier for the other synapses but are kept fixed for all synapses connecting two subsequent links of neurons. The amplitudes of postsynaptic potentials of synapses in the chains are fivefold larger than those of other excitatory connections. In the absence of specific stimuli the synfire chains are not active and the excitatory neurons spike irregularly at a low rate. For the data presented here we apply independent 1-Hz Poisson trains of stimuli to the synfire chains. A stimulus consists of a volley of 100 spikes; the individual spike times are drawn from a Gaussian distribution centered at the time of stimulation with a standard deviation of 1 ms. This volley is sent to all the neurons of the first group in a chain using synapses as strong as the intra-chain synapses.

Synfire activity stably propagates, but several chains can be active simultaneously. With a load of 50 chains the dynamics is still stable, although the global activity exhibits large fluctuations mainly due to the initiation and termination of chains.

In the neural net simulations we deliberately randomized the identity of neurons with respect to the embedded synfire structures to mimic the random sampling in real recordings. It is instructive, however, to remap some data so that adjacent lines in the raster are associated with particular links and chains. The strong, nearly vertical lines in the raster are firings of the synfire chain; the background firings in the raster represent either spontaneous activity or the activity of other chains, since each neuron in this chain is also a member of two to three other chains. **Figure 25B** shows a portion of this raster at high time resolution, revealing details of several chain runs. The offset between short vertical segments corresponds to activations of successive links; the offset is the inter-link propagation or synaptic delay. Note that failure of propagation is possible, as shown in the middle run of the chain.

Without the remapping of neuronal identities (**Figure 25**) synfire chain sequences are not visible in raster-plots. **Figure 26A** shows a 5-s segment of the activity of 150 excitatory neurons randomly selected (and not remapped according to chain membership) from a synfire simulation (50,000 neurons, 50 fully connected synfire chains independently stimulated by low-frequency Poisson trains); the raster has no particular structure. The raster from a larger set of 10,000 excitatory neurons over the same time

**FIGURE 25 | Raster representation of the activity of the neurons in a particular synfire chain. (A)** The randomly assigned neuron numbers have been remapped so that the 2000 neurons of the chain are labeled 8001–10,000 (ordinate). The data show an arbitrary 10-s segment (abscissa) of activity. The thin black, almost vertical lines represent runs of this chain. **(B)** Temporal magnification of a portion of **(A)**. Two complete runs and one partial run of the chain are shown. At this timescale the nearly synchronous activity of each link in the chain becomes visible, as does the propagation time between links.

range is shown in **Figure 26B**. Again, since the neurons are chosen randomly, there are no structures like those in **Figure 25B** that are directly ascribable to activations of individual synfire chains. However, there is an obvious broader vertical striping that connotes a coordinated fluctuation in the population activity; this striping becomes increasingly visible when the raster display shows a large number of neurons (contrast **Figure 26A** vs. **B**). It turns out that these coordinated rate fluctuations are the result of fluctuations in the number of synfire chains that have been activated at any time. Individual neurons, however, experience no rate increase.

**Figure 27** juxtaposes (1) the times of chain stimulations with (2) their (low-pass) temporal representation (i.e., the number of chains activated at any time) and (3) the fluctuations of total population rate. These are clearly strongly, although not completely, related.

The framework described so far combines general ideas about the dynamics of neural activity in the cortex with observations of precise firing patterns related to motor activity. We will now formulate a model that accounts for such relations. An obvious question is how movement primitives are represented in a dynamic network of spiking neurons.

It has previously been shown that networks of synfire chains can generate sequences of primitives in an abstract model (Schrader et al., 2010; Hanuschkin et al., 2011). We will consider a functional model that is simultaneously capable of reproducing several experimental findings on cortical activity and generating trajectories which exhibit key features of free monkey scribbling. Recent theoretical studies suggest that coupled synfire chain structures can demonstrate compositionality, the hierarchical representation of complex entities in terms of parts and their relations (Abeles, 1982; Hayon et al., 2005).

### **7.2. MOVEMENT REPRESENTATION MODEL**

The model consists of a topologically organized network of synfire chains. Neurons in the same pool of a chain encode the same preferred velocity vector, thus realizing a population coding for movement (Georgopoulos et al., 1986). The trajectories generated by our model consist of a series of parabolic segments similar to those identified experimentally (Polyakov et al., 2001, 2009b) which fulfill the well-established two-thirds power law relationship of velocity and curvature (Lacquaniti et al., 1983; Viviani and Flash, 1995). It has previously been demonstrated that monkey scribbling is well approximated by parabolic strokes (Polyakov et al., 2001, 2009b). Parabolic movement primitives obey the two-thirds power law, are invariant under equi-affine transformations and minimize end effector jerk.

A parabola can be constructed from the constant acceleration produced by a homogeneous force field. Assume the initial position and velocity of a point mass are $\mathbf{x}\_0 = (x\_0, y\_0)^T$ and $\mathbf{v}\_0 = \dot{\mathbf{x}}\_0 = (\dot{x}\_0, \dot{y}\_0)^T$, respectively. If the point mass experiences a constant acceleration $\mathbf{a} = (a\_x, a\_y)^T$, then the trajectory of $\mathbf{x} = (x, y)^T$ is given by

$$\mathbf{x}(t) = \mathbf{x}\_0 + \mathbf{v}\_0 t + \frac{1}{2}\mathbf{a}t^2 \tag{3}$$

The curvature of the path is

$$c = \frac{\left|\dot{x}\ddot{y} - \dot{y}\ddot{x}\right|}{v^3} \tag{4}$$

where $v = |\dot{\mathbf{x}}|$ is the tangential velocity. Since $\dot{x}\ddot{y} - \dot{y}\ddot{x} = \dot{x}a\_y - \dot{y}a\_x$ is constant, $c$ and $v$ obey the following relation

$$v = Kc^{-\frac{1}{3}}\tag{5}$$

where $K$ is the velocity gain factor. Equation (5) is thus satisfied by parabolic movement segments.

Constant acceleration is equivalent to a velocity that changes linearly in time, since $\mathbf{a} = \frac{d\mathbf{v}}{dt}$. In velocity space, linearly evolving velocities ($\dot{\mathbf{v}} = \mathrm{const}$) correspond to uniform motion along a straight line. Therefore, straight lines in velocity space can be mapped into parabolic trajectories in position space.
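This mapping can be checked numerically. The sketch below evaluates the velocity along a constant-acceleration path (initial velocity and acceleration are arbitrary illustrative values) and verifies that $v^3 c$, and hence the velocity gain factor of the power law, is constant along the segment:

```python
import numpy as np

t = np.linspace(0.0, 1.0, 2001)
v0 = np.array([0.3, 0.1])     # initial velocity (arbitrary units)
a = np.array([0.2, -0.5])     # constant acceleration -> parabolic path

# analytic velocity and acceleration components
dx, dy = v0[0] + a[0] * t, v0[1] + a[1] * t
ddx, ddy = a[0], a[1]

v = np.hypot(dx, dy)                       # tangential velocity
c = np.abs(dx * ddy - dy * ddx) / v**3     # curvature, Eq. (4)

# v**3 * c equals |x' y'' - y' x''|, which is constant for constant
# acceleration, so v is proportional to c**(-1/3) with a fixed gain K
K = v * np.cbrt(c)
assert np.allclose(K, K[0])
```

For these numbers $\dot{x}a\_y - \dot{y}a\_x = -0.17$ at every instant, so the check holds to machine precision.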

A neural architecture which can be associated with a uniform motion is the synfire chain (SFC) (Abeles, 1991). In the simplest formulation of the SFC concept, excitatory neurons are grouped in pools and each neuron is connected to all neurons of the following pool creating a chain of convergent and divergent feed-forward connections. If the first group is stimulated with sufficient strength and a sufficiently high degree of synchrony, a wave of synchronous activity propagates along the chain. The propagation along the chain is at constant speed (Wennekers and Palm, 1996; Diesmann et al., 1999) and stable under fairly general conditions (Herrmann et al., 1995).

An appropriate mapping of preferred velocities to the pools of a synfire chain enables the generation of an individual parabolic segment. By extension, a series of parabolic strokes in position space can be realized by uniform motion along a graph of connected straight lines in velocity space. Each arrow in velocity space is realized by a synfire chain with the corresponding velocity mapping. By construction, the velocity at the end of one parabolic stroke is equal to the velocity at the beginning of the next. When the activity volley in a synfire chain reaches the final pool, feed-forward connections to the initial pools of the two potential successor chains initiate the propagation of an activity volley in each of them. Assuming a strong competition between the two stimulated chains, such that only one of the chains can continue propagating the activity, a trajectory of parabolic segments is produced. We analyze the properties of the generated trajectories to see whether they are sequences of parabolic segments in accordance with the above remarks on equi-affine geometry.

The activity of single cells in the motor cortex has been shown to be directionally tuned to arm movements (Georgopoulos et al., 1982). The arm trajectory can be estimated by calculating the population average over all neurons (Georgopoulos et al., 1986, 1988). Similarly, we use population coding to generate a trajectory from simulated neuronal activity:

$$\mathbf{v}(t) = \sum\_{k}^{\text{neurons}} w\_k\, a\_k(t)\, \mathbf{p}\_k = \sum\_{i}^{\text{chains}} \sum\_{j}^{\text{pools}} w\_i^j\, a\_i^j(t)\, \mathbf{p}\_i^j \tag{6}$$

where $\mathbf{v}$ is the instantaneous velocity, $a\_i^j(t)$ is the activity in the $j$th pool of the $i$th chain and $\mathbf{p}\_i^j$ its preferred velocity. The weights are set to $w\_i^j = 0.02\,\mathrm{s}\;\forall i, j$, resulting in velocities comparable to the monkey experiments [median 300 mm/s as given by Polyakov et al. (2009a,b)]. The propagation speed of the activity volley in an SFC from one pool to the next is constant, as described above. We can therefore map an SFC to an arrow in velocity space. Each pool of the SFC is assigned its preferred velocity $\mathbf{p}\_i$ according to its position along the arrow, i.e., for a chain consisting of $n$ pools mapped to an arrow starting at $\mathbf{v}\_0$ and ending at $\mathbf{v}\_1$,

$$\mathbf{p}\_i = \frac{i-1}{n-1} \left(\mathbf{v}\_1 - \mathbf{v}\_0\right) + \mathbf{v}\_0 \tag{7}$$

This is illustrated in **Figure 28A**; the activity of the corresponding synfire chain is given in **Figure 28B**. As the preferred velocity of each pool changes linearly with the pool index and the propagation speed from one pool to the next is constant, the instantaneous velocity vector also evolves linearly, resulting in parabolic motion as derived above. **Figure 28C** shows the parabolic trajectory in position space generated by the synfire activity in **Figure 28B**. To extract the trajectory from the simulated neuronal activity, we bin the activity in 1 ms intervals and reconstruct the motion according to the population coding scheme given by Equations (6) and (7).

**FIGURE 28 | Mapping synfire activity to parabolic movements. (A)** The preferred velocity vectors for the pools of the synfire chain (gray arrows; shown for every third pool of the chain) are determined by sampling a straight line in velocity space (red arrow). **(B)** The spiking activity of an activity volley propagating with constant speed along a synfire chain. Preferred velocity vectors for every third pool, as in **(A)**, are shown as gray arrows above the dot display. **(C)** Generated parabolic trajectory. The black cross at (0,0) indicates the start position.
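The read-out of Equations (6) and (7) can be sketched as follows. This is an idealized toy version: the volley is reduced to one pool firing 100 synchronous spikes per 1 ms bin, the arrow endpoints are arbitrary illustrative values, and the decoded units are arbitrary.

```python
import numpy as np

n, dt, w = 20, 1e-3, 0.02                 # pools, bin width (s), weight w_i^j (s)
v_start = np.array([0.3, 0.0])            # arrow start in velocity space
v_end = np.array([-0.1, 0.2])             # arrow end in velocity space

# Eq. (7): preferred velocity of each pool sampled along the arrow
# (0-based pool index here, i.e., i/(n-1) instead of (i-1)/(n-1))
i = np.arange(n)[:, None]
pref = i / (n - 1) * (v_end - v_start) + v_start

# idealized volley: pool t fires 100 synchronous spikes in time bin t
activity = 100.0 * np.eye(n)

# Eq. (6): population-vector estimate of the instantaneous velocity per bin
v = w * activity @ pref

# integrating the decoded velocity yields the position trace (a parabolic stroke)
pos = np.cumsum(v * dt, axis=0)
```

Because the preferred velocities vary linearly with the pool index and one pool is active per bin, the decoded velocity evolves linearly in time, which is exactly the uniform motion in velocity space required for a parabolic stroke.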

### **7.3. COMPETITION AND COOPERATION BETWEEN CHAINS**

We develop a spiking network model to realize a generator of random trajectories consisting of parabolic segments. Our model comprises two interconnected networks. The synfire chain network (SFCs) consists of ten chains, each chain corresponding to one of the arrows in velocity space and thus encoding a parabolic segment. Each chain consists of 80% excitatory neurons that make feed-forward connections with dilution factor *p*<sup>d</sup> = 0.75 and 20% inhibitory neurons making *k*<sup>g</sup> random connections to other neurons in the SFCs network. To distinguish the random inhibitory connections from other connectivity patterns, we will refer to *k*<sup>g</sup> as the global inhibition parameter.

Feed-forward connections join the final group of each chain to the initial groups of two other chains; e.g., the final group of chain 1 has feed-forward connections to the initial groups of chains 2 and 7 (see **Figure 29**). The preferred velocity of the last group of a chain is the same as that of the first group of each chain it connects to, in order to generate trajectories that are smooth at the transition points. Reliable switching at the transition points is enabled by mutual inhibition between potential successor chains.

We create a synfire chain with feed-backward as well as feed-forward connections, both with a dilution factor of *p*<sup>d</sup> (a backward-and-forward connected chain, BFC). One end of the BFC makes excitatory feed-forward connections with dilution factor *p*<sup>d</sup> to the initial pool of chain 1 in the SFCs network. Each inhibitory neuron in the SFCs network makes *k*<sup>B</sup> connections to neurons randomly selected from the BFC network, thus inhibiting its activity when synfire activity is present. If synfire activity is extinguished, the drop in inhibition causes a self-ignition in the unstable BFC, which in turn triggers a fresh wave of activity in chain 1. Thus, the recurrent connections between the SFCs network and the BFC network ensure sustained activity. The dynamics of the BFC network and of its interaction with the SFCs network are investigated in section 7.4. The scaling of inhibitory synapses with respect to excitatory synapses and the rate of the external excitatory Poisson input to each neuron in the SFC network are chosen such that, in the absence of synfire activity, the network spikes in the asynchronous irregular (AI) regime (Brunel, 2000).

**FIGURE 29 | Generation of scribbling trajectories. (A)** Spiking activity of the bidirectional and synfire chain networks. The color of the activity of each chain corresponds to the color of the arrow in velocity space shown in **(B)**. Above the raster plot the average firing rates of the synfire network (red) and the bidirectional network (blue) are plotted. **(B)** Abstract generator for trajectories consisting of parabolic segments. Uniform motion along straight lines in velocity space is equivalent to parabolic motion in position space. Each colored arrow represents a parabolic segment and its direction of execution. When the end of an arrow is reached, one of the two successor arrows is selected. **(C)** Scribbling trajectory extracted from the spiking activity using population coding. Segments are drawn in the color of the most active synfire chain.

Each of the chains in the SFC network represents a parabolic movement primitive. To produce a series of primitives, it is necessary that activity reliably propagates from one SFC to exactly one of multiple (here two) potential successor SFCs at the vertices of the network graph. In our model, cross-inhibition between two competing SFCs realizes this switching between two simultaneously activated SFCs. There are two possible approaches to achieve reliable switching: (1) cross-inhibition can be structured such that synchronous activity in each pool directly inhibits the activity in the next pool of the competitor chain. This approach is motivated by the idea of synfire binding (Hayon et al., 2005; Schrader et al., 2010), in which two simultaneously active chains can bind a third chain due to structured excitation. In an alternative approach (2) the cross-inhibition is unstructured. Synfire chain competition relying solely on global inhibition has recently been proposed by Chang and Jin (2009). However, in their study the synfire chain activity is "driven": a supra-threshold driving input is combined with dominant global inhibition. In contrast, our model exhibits activity in the asynchronous irregular regime due to balanced global inhibition (van Vreeswijk and Sompolinsky, 1996) and only exhibits synfire activity if the initial pool of a chain receives additional stimulation. Due to our different activity regime, additional assumptions on the inhibition between chains need to be made to realize reliable switching.

Each neuron in the initial pools of the potential successor chains is activated by *pCE* randomly chosen excitatory neurons from the final pool of the preceding SFC. The symmetric connections ensure that the successor chains are stimulated equally. All inhibitory neurons of pool *i* of one potential successor chain project to *kc* neurons of pool *i* + 1 of the other potential successor chain, and vice versa. Thus, each wave of synchronous activity directly inhibits the propagation of the activity to the next pool in the competitor chain, leading to a competition. The activity in the losing chain dies away leaving the activity in the winning chain to continue propagating, thus realizing a switching mechanism.
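The switching can be caricatured by a two-unit rate model with mutual inhibition. This abstract sketch is our own simplification, not the spiking implementation described above; it only illustrates why symmetric stimulation plus cross-inhibition yields a single winner, with all parameter values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

r = np.array([0.5, 0.5])       # both successor chains start equally stimulated
beta, lr = 2.0, 0.1            # cross-inhibition strength, integration step
for _ in range(300):
    # each unit receives a common drive minus the competitor's activity,
    # plus a small noise term that breaks the initial symmetry
    drive = 1.0 - beta * r[::-1] + rng.normal(0.0, 0.01, size=2)
    r = np.clip(r + lr * (drive - r), 0.0, None)

winner, loser = r.max(), r.min()   # one chain keeps propagating, the other dies out
```

With `beta` > 1 the symmetric state is unstable, so the noise drives the system to a winner-take-all fixed point: the activity in the losing unit decays to zero while the winner saturates near its driven level.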

### **7.4. ACTIVITY STATE TRANSITION IN THE BFC**

When the backward-and-forward connected chain (BFC) network described above is not being inhibited by synfire activity in the SFCs network, the external drive is just strong enough to induce spontaneous synfire activity and thereby re-ignite activity in the SFCs network. The ignition of synfire activity in the network can be understood intuitively as follows: random synchronous activity in a small subset of the neurons in a given pool *i* is projected to the pools *i* ± 1, which in turn project back to *i*, thus building a recurrent positive feedback loop. Synfire activity emerges spontaneously at around 380 ms and propagates in both directions along the chain. Once the spike volleys have reached the ends of the BFC the synfire activity is extinguished. A reflection of activity does not occur because, for any activated pool, the neurons in the previously activated pool have been reset by the propagating volley. When the SFCs network has been re-ignited, the increased inhibition decreases the net drive to the network such that no further spontaneous synfire activity occurs.

Synfire volleys caused by spontaneous self-ignition tend to be of limited duration. Activity volleys traveling in different directions cancel each other when they meet and volleys reaching either end of the BFC are not reflected. Furthermore, the high activity in the BFC during synfire activity results in strong global inhibition and a reset of nearly all neurons, which in turn decreases the probability of self-ignition until activity has built up again. However, if the pool size is chosen sufficiently large, a single self-ignition results in ongoing pathological high firing rates.

The spiking activity of the complete network underlying the average firing rates is given in **Figure 29A**. The trajectory extracted from the synfire activity is shown in **Figure 29C**. The trajectory consists of a long random sequence of parabolic movement primitives. Small overlaps can be seen at the transition points where both successor chains are active before one of the chains wins the competition. The distribution of the length *n* of an uninterrupted sequence is well fitted by $P(n) = p\_0 \left(1 - p\_0\right)^n$, where $p\_0$ is the probability that neither successor chain is activated during synfire chain switching.
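The quoted distribution is geometric, and its form follows directly from independent switching failures; a quick numerical sanity check (with an illustrative failure probability, not a fitted value from the simulation):

```python
import numpy as np

rng = np.random.default_rng(2)
p0 = 0.05   # illustrative probability that neither successor chain ignites

# if each switch independently fails with probability p0, the number n of
# completed segments before a failure follows P(n) = p0 * (1 - p0)**n
lengths = rng.geometric(p0, size=100_000) - 1   # support shifted to n = 0, 1, 2, ...

# the mean of this distribution is (1 - p0) / p0, i.e., 19 for p0 = 0.05
assert abs(lengths.mean() - (1 - p0) / p0) < 0.5
```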

We analyze the characteristics of the trajectory shown in **Figure 29C**. Due to the piece-wise constant accelerations, the trajectory fulfills the two-thirds power law (e.g., Viviani and Flash, 1995). Moreover, the equi-affine curvature of the trajectory is close to 0. We therefore conclude that the trajectory does indeed consist of a series of parabolic segments.

# **8. COMPOSITIONAL NATURE OF THE DRAWINGS**

## **8.1. TEMPORAL PROPERTIES OF STROKELETS**

We will now describe another approach to analyzing the compositional nature of the drawings. Using machine learning techniques, we show how very elementary strokelets can be treated like the allophones of spoken language, which can in turn be clustered into equivalence groups (like the phonemes of language), and how, by Hidden Markov modeling combined with the information-bottleneck technique, these groups can be composed into words. Furthermore, this analysis reveals the syntax by which these words are sequenced into sentences.

Human subjects were tested on the hidden target task as described in the methods (Section 3). Each subject first held the manipulandum while searching for an invisible target. Each target hit added a small sum to the subject's fee, so subjects were highly motivated to hit as many targets as possible. After a few sessions of this type, subjects were told that every time they heard a beep they should stop as fast as possible.

We postulated that if the searching motion is composed of primitives (Sosnik et al., 2007), then it would be hard to stop anywhere: if the "command" to draw a primitive has already been issued, it will execute to completion. Indeed, we found that motion typically continued with several peaks in the tangential velocity, as illustrated in **Figure 10B**. The delay from the instruction until stopping was longer than the delays reported for stopping a simple motion. Furthermore, in subjects whose motion style showed repetitive patterns, the piece of drawing that continued after the "stop" instruction was similar to the typically repeated pattern.

**Figure 10** also illustrates the well-known phenomenon of isochrony. The duration of each hill in the drawing was approximately identical, and the hills after the stop instruction had the same duration as those produced while freely scribbling. We argue that the neural networks responsible for such behavior should have similar properties: constant duration of activity and difficulty stopping in the middle. Indeed, the synfire chains proposed in Section 7 have these properties.

To get a better glimpse of the elements of these drawings, we broke the drawing into small pieces and studied the rules for combining the pieces into a whole. We treated the drawing between successive extrema of tangential velocity as a drawing element. It may look artificial to break a motion into two elements at the peak velocity; however, drawing parameters may change abruptly at points of maximal tangential velocity. Our analysis in the previous sections pointed to parabolas as the drawing primitives in monkey scribbling, and these primitives too start and end at maxima of tangential velocity. **Figure 30** illustrates a small piece of the drawing and how it was decomposed into strokelets.

## **8.2. PARSING THE MOTION**

The scribbling in one block of motion for one subject is illustrated in **Figure 11**. If we look at a short stretch from this block (**Figure 30B**), it is clear that the subject moves at high speed when the trajectory is close to straight and slows down when it is curved. **Figure 30A** shows the tangential velocity (speed) of drawing for this section. It is composed of three hills of almost equal duration (isochrony).

We used these features to parse the drawing into small strokelets. **Figure 30A** shows the tangential velocity during 1 s of scribbling. Each strokelet spans the time between adjacent extrema and is colored differently. **Figure 30B** shows the actual drawn shape of each of these strokelets.
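A minimal sketch of this parsing step, assuming NumPy (the toy trajectory and the function name `parse_strokelets` are illustrative, not from the original analysis code):

```python
import numpy as np

def parse_strokelets(x, y, dt):
    """Split a drawing into strokelets at local extrema of the
    tangential velocity (illustrative sketch)."""
    vx, vy = np.gradient(x, dt), np.gradient(y, dt)
    speed = np.hypot(vx, vy)
    d = np.diff(speed)
    # an interior sample i is an extremum when the slope changes sign
    extrema = [i for i in range(1, len(speed) - 1)
               if d[i - 1] * d[i] < 0]
    bounds = [0] + extrema + [len(speed) - 1]
    return [(bounds[k], bounds[k + 1]) for k in range(len(bounds) - 1)]

# toy closed trajectory whose speed oscillates, giving alternating
# accelerating / decelerating strokelets
t = np.linspace(0, 2 * np.pi, 1000)
x = np.cos(t) + 0.3 * np.cos(3 * t)
y = np.sin(t) + 0.3 * np.sin(2 * t)
segments = parse_strokelets(x, y, t[1] - t[0])
```

Each returned index pair delimits one strokelet; consecutive strokelets share their boundary sample, so the segmentation covers the whole drawing without gaps.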

We selected 10 equally spaced points on each strokelet and computed the angle of the tangent at each point. These ten-dimensional vectors were then grouped by a Gaussian Mixture Model (GMM). By construction, an accelerating strokelet must be followed by a decelerating one and vice versa, except at the very beginning and end of the drawing; therefore, we grouped the two kinds of strokelets separately. The GMM divided the strokelets into 30–50 groups.
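The grouping step can be sketched as follows (assuming scikit-learn; the synthetic descriptors and the choice of three components are illustrative stand-ins for the 30–50 groups found in the data):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# synthetic stand-in for strokelet descriptors: 10 tangent angles
# sampled at equally spaced points along each strokelet, drawn here
# from three artificial shape templates plus a little noise
templates = rng.uniform(-np.pi, np.pi, size=(3, 10))
labels_true = rng.integers(0, 3, size=300)
descriptors = templates[labels_true] + 0.05 * rng.standard_normal((300, 10))

# fit a GMM and assign each strokelet descriptor to a group
gmm = GaussianMixture(n_components=3, random_state=0).fit(descriptors)
labels = gmm.predict(descriptors)
```

With well-separated templates, the GMM assigns all descriptors generated from the same template to the same group.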

## **8.3. INFORMATION BOTTLENECK METHOD**

We further clustered the groups using the information-bottleneck idea (Tishby et al., 1999). Suppose we deliver a large number of stimuli (X) and observe the responses (Y). We can ask whether we can represent X by a small number of clusters (P) such that the mutual information I(P; Y) between P and Y is close to that between X and Y. If so, P gives the "stimulus features" represented in the responses. Alternatively, we may ask whether we can cluster Y into a small number of clusters Q such that I(X; Q) is close to I(X; Y). If so, then Q gives a feature representation of the responses. In our case, we treat the strokelets as the "stimulus" and the following strokelets as the "responses." Thus, the sets X and Y are the same, and therefore we expect P = Q [see Erez et al. (2009) for more details]. Using this idea, we clustered the groups of strokelets into twelve clusters, see **Figure 31**. We can take the matrix of probabilities of transitions between strokelet groups and reshuffle its rows and columns so that all strokelets in the same cluster lie in neighboring rows (and columns), as in **Figure 12**.
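The clustering can be illustrated with a simple agglomerative variant of the information-bottleneck idea: repeatedly merge the two strokelet groups whose merge loses the least mutual information with the following strokelet. A sketch assuming NumPy (the block-structured toy transition table is illustrative; the published analysis followed Tishby et al., 1999, and Erez et al., 2009):

```python
import numpy as np

def mutual_info(joint):
    """I(X;Y) in bits for a joint probability table."""
    px = joint.sum(1, keepdims=True)
    py = joint.sum(0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log2(joint[nz] / (px @ py)[nz])).sum())

def agglomerative_ib(joint, n_clusters):
    """Greedily merge rows (stimulus groups) so that the loss in
    I(clusters; Y) at each step is minimal."""
    clusters = [[i] for i in range(joint.shape[0])]
    table = joint.copy()
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                merged = np.delete(table, b, axis=0)
                merged[a] = table[a] + table[b]
                loss = mutual_info(table) - mutual_info(merged)
                if best is None or loss < best[0]:
                    best = (loss, a, b, merged)
        _, a, b, table = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters, table

# block-structured toy transition table: groups 0-2 behave alike,
# as do groups 3-5, so the procedure should recover the two blocks
joint = np.zeros((6, 6))
joint[:3, :3] = 1.0
joint[3:, 3:] = 1.0
joint /= joint.sum()
clusters, _ = agglomerative_ib(joint, 2)
```

Groups with identical transition profiles can be merged at zero information cost, so the procedure recovers the block structure before any cross-block merge becomes necessary.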

Several features stand out when examining **Figure 12**. Nonzero transitions are concentrated in patches, and there are three distinct cycles: clusters {1, 2, 3, 4} follow each other in a cyclical manner, and so do {5, 6, 7, 8} and {9, 10, 11, 12}. Finally, transitions from one cycle to another happen only at particular groups. If we compare the drawings to language, we can say that each individual strokelet is like a phone, the 47 groups of strokelets are phonemes, the cycles are words, and the sequences of words are sentences. If we call the cycles W1, W2, and W3, then the sentences look like "W1 W1 W1 ... W1" followed by "W3 W3 W3 ... W3" followed by "W2 W2 W2 ... W2." The transition between sentences happens through specific allophones (for example, W3 changes to W1 mostly through one allophone of cluster 9 leading to one allophone of cluster 2).

This structure can be seen readily in the actual drawings (**Figure 11**), where the subject scans the workspace repeatedly from left to right and back, then vertically, and then obliquely. We note that this form of scribbling is not just less elegant than the monkey's scribbling (**Figure 1**); it also involves much more accelerating and decelerating motion and larger distances between hitting targets and obtaining the reward. We conclude by stating that we found strong evidence for primitives in human scribbling and for a hierarchical organization of the drawing elements.

# **9. SUMMARY**

In Sections 3–8 we provided further evidence to support our working hypothesis. According to this hypothesis, when a subject repeats the same scribbling task again and again, the entire scribbling may be regarded as a composition of a few elementary shapes. These shapes may be produced by activity propagating between neuron groups; in each group the neurons have similar velocity tuning, and the propagation proceeds with an approximately constant delay from group to group. When the drawing elements are parabolas, they are better described by equi-affine parameters, and the sequence of neuronal groups is laid along a straight line in velocity space. We provide some evidence that, for neurons tuned to the speed of motion, equi-affine speed is a somewhat more adequate descriptor than Euclidean speed.

The synfire chain model is a simple network that may have the required properties. We show that it is feasible to generate drawings similar to those of a monkey with such networks. We also provide indirect support for this idea by showing that one can detect precise timing relations between pairs of neurons in relation to the drawings.

Finally, we show how the preferred sequence of such elements in the drawings can be revealed, and suggest that such preferred sequences are generated by partial connections among synfire chains.

The supporting evidence given here is partial, and there is room for considerably more experimental and theoretical work on all the points suggested here.

# **ACKNOWLEDGMENTS**

We thank our students, post-docs, and colleagues who contributed to most of the work described here: I. Asher, Y. Ben Shaul, H. Bergman, R. Drori, K. Erez, K. Fiedler, J. Hass, A. Levina-Martius, Z. Nadasdy, F. Polyakov, M. Shemesh, T. Shmiel, R. Sosnik, E. Stark, and E. Vaadia. This project was supported in part by the Deutsch-Israelische Projektkooperation, DIP No. F1.2.



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 18 May 2013; paper pending published: 11 June 2013; accepted: 11 July 2013; published online: 12 September 2013.*

*Citation: Abeles M, Diesmann M, Flash T, Geisel T, Herrmann M and Teicher M (2013) Compositionality in neural control: an interdisciplinary study of scribbling movements in primates. Front. Comput. Neurosci. 7:103. doi: 10.3389/ fncom.2013.00103*

*This article was submitted to the journal Frontiers in Computational Neuroscience.*

*Copyright © 2013 Abeles, Diesmann, Flash, Geisel, Herrmann and Teicher. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Muscle synergies evoked by microstimulation are preferentially encoded during behavior

#### *Simon A. Overduin<sup>1</sup>\*, Andrea d'Avella<sup>2</sup>, Jose M. Carmena<sup>1,3,4</sup> and Emilio Bizzi<sup>5</sup>*

*<sup>1</sup> Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA, USA*

*<sup>2</sup> Laboratory of Neuromotor Physiology, Santa Lucia Foundation, Rome, Italy*

*<sup>3</sup> Helen Wills Neuroscience Institute, University of California, Berkeley, CA, USA*

*<sup>4</sup> UCB-UCSF Joint Graduate Group in Bioengineering, University of California, Berkeley, CA, USA*

*<sup>5</sup> Department of Brain and Cognitive Sciences, McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA*

### *Edited by:*

*Tamar Flash, Weizmann Institute, Israel*

### *Reviewed by:*

*Le Wang, Boston University, USA*

*Michael Graziano, Princeton University, USA*

*Christopher Fricke, University Hospital of Leipzig, Germany*

### *\*Correspondence:*

*Simon A. Overduin, BMI Systems Lab, Department of Electrical Engineering and Computer Sciences, University of California, 253 Cory Hall #1770, Berkeley, CA 94720, USA e-mail: overduin@mit.edu*

Electrical microstimulation studies provide some of the most direct evidence for the neural representation of muscle synergies. These synergies, i.e., coordinated activations of groups of muscles, have been proposed as building blocks for the construction of motor behaviors by the nervous system. Intraspinal or intracortical microstimulation (ICMS) has been shown to evoke muscle patterns that can be resolved into a small set of synergies similar to those seen in natural behavior. However, questions remain about the validity of microstimulation as a probe of neural function, particularly given the relatively long trains of suprathreshold stimuli used in these studies. Here, we examined whether muscle synergies evoked during ICMS in two rhesus macaques were similarly encoded by nearby motor cortical units during a purely voluntary behavior involving object reach, grasp, and carry movements. At each microstimulation site we identified the synergy most strongly *evoked* among those extracted from muscle patterns evoked over all microstimulation sites. For each cortical unit recorded at the same microstimulation site, we then identified the synergy most strongly *encoded* among those extracted from muscle patterns recorded during the voluntary behavior. We found that the synergy most strongly evoked at an ICMS site matched the synergy most strongly encoded by proximal units more often than expected by chance. These results suggest a common neural substrate for microstimulation-evoked motor responses and for the generation of muscle patterns during natural behaviors.

### **Keywords: motor, movement, muscle, synergy, hand, macaque, grasping, cortex**

# **INTRODUCTION**

In numerous studies, motor primitives have been defined as "synergies" in which a group of muscles is simultaneously recruited, each with a specific balance of activation. These investigations have involved frog axial and hindlimb behaviors (Tresch et al., 1999), cat axial and hindlimb behaviors (Ting and Macpherson, 2005) and forelimb reaches (Yakovenko et al., 2010), rat forelimb reaches and grasps (Kargo and Nitz, 2003), monkey forelimb reaches and grasps (Brochier et al., 2004; Overduin et al., 2012), and human axial and hindlimb behaviors (Torres-Oviedo and Ting, 2007) and forelimb reaches, grasps and gestures (d'Avella et al., 2006; Klein Breteler et al., 2007; Berger et al., 2013). Analogous synergies have also been defined at the kinematic level (e.g., Santello et al., 1998; Gentner and Classen, 2006; Gentner et al., 2010).

Much of the direct evidence for synergistic muscle control by the central nervous system (CNS) comes from studies of the spinal cord. For instance, chemical intraspinal microstimulation in the frog evokes topographically-organized, low-dimensional electromyographic (EMG) activity patterns (Saltiel et al., 2001). In spinalized frogs, too, neurons including those in the intermediate zone of the spinal cord are better correlated with synergistic premotor drives than to the activity of individual muscles (Hart and Giszter, 2010). The spinal cord of primates contains premotor interneurons facilitating multiple muscles, including those intrinsic to the hand (Takei and Seki, 2010), and has been proposed as a substrate for synergies (Tresch et al., 1999; Cheung et al., 2009). Phasic activation of such units may be responsible for multi-muscular EMG bursts (Kargo and Giszter, 2008).

Synergies may be encoded at supraspinal as well as spinal levels, e.g., in the brainstem (Roh et al., 2011). At the level of the primary motor cortex (MI), in rodents learning a reaching task the firing rates of a minority of neurons are correlated with changes in the activation of synergies extracted from forelimb muscle EMG activity (Kargo and Nitz, 2003). Continuous neural control of synergistic muscle groups in primates is also circumstantially suggested by the ability to reconstruct forelimb EMG profiles as the weighted sum of an ensemble of neurons (Morrow and Miller, 2003; Schieber and Rivlis, 2007). MI neurons have muscle fields (defined by the strongest cell-EMG correlations) that appear to fall into relatively few, synergy-like clusters (Holdefer and Miller, 2002). Such cortical muscle fields may be "hard-wired," changing their structure rarely, if ever (Kargo and Nitz, 2003). This may be particularly true of corticospinal and corticomotoneuronal cells, which facilitate small sets of muscles relatively directly (Fetz and Cheney, 1980; Bennett and Lemon, 1994). The latter cell population may define a "new MI" that affords muscular coordination unconstrained by synergies encoded in the spinal cord (Rathelot and Strick, 2009).

As with intraspinal microstimulation, application of relatively long trains of suprathreshold electrical current to individual sites in motor cortex can evoke complex movements. In cats (Ward, 1938), rats (Haiss and Schwarz, 2005; Ramanathan et al., 2006), prosimians (Stepniewska et al., 2005, 2011), and macaques (Graziano et al., 2002a, 2004b, 2005), suprathreshold microstimulation trains lasting several hundred milliseconds evoke complex multijoint forces that frequently drive the animal's body toward invariant postures. Shorter-train (<100 ms) intracortical microstimulation (ICMS), in contrast, typically evokes simpler twitch-like movements, often restricted to single joints (Graziano et al., 2002a; Stepniewska et al., 2011).

In the case of forelimb and hand areas of non-human primate motor cortex, ICMS has been shown to evoke behavioral fragments including reaching and defensive motions (Graziano et al., 2002a; Kaas et al., 2013). It has been suggested that motor cortical areas may be defined by a continuous map of endpoint postural space (Graziano et al., 2002b), e.g., divided according to the 3D regions around the monkey to which the forelimb is driven by ICMS (Graziano et al., 2002a), or to distinct behaviors like reaching and defending (Graziano et al., 2005). Whatever the nature of the topographical clustering of ethologically-relevant movements on the motor cortical surface, this organization appears to be reflected by distinct, interconnected regions in premotor and posterior parietal cortex (Stepniewska et al., 2005, 2011, 2013; Gharbawie et al., 2011).

Recently, we demonstrated that ICMS of monkey motor cortex elicited EMG patterns that could be decomposed into muscle synergies—ones similar to those seen in natural behavior (Overduin et al., 2012; for discussion see also Diedrichsen and Classen, 2012; Bizzi and Cheung, 2013; Santello et al., 2013). These EMG patterns co-occurred with movements of the hand toward a convergent posture. Like the apparent topographical organization of spinally-encoded synergistic limb movements (Tresch et al., 1999; Bizzi et al., 2000, 2002, 2008; Tresch et al., 2002), we also observed a non-uniform representation of each forelimb synergy on the cortical surface of macaques (Overduin et al., 2012). Much as microstimulation of multiple points in the spinal cord of frogs (Mussa-Ivaldi et al., 1994; Lemay et al., 2001) and rats (Tresch and Bizzi, 1999) evokes a linear summation of convergent forces and end posture, when ICMS is applied at multiple points in the motor cortex of anesthetized cats, the evoked EMG activity tends to sum linearly (Ethier et al., 2006). Together these results suggest a role of muscle synergies in simplifying (even to the point of linearizing) motor control.

Yet even if ICMS is able to evoke movements through combination of hard-wired muscle synergies, this does not imply that the synergies are *organized* intracortically. In humans, for instance, cortical stroke appears to spare some or most synergies (Cheung et al., 2009, 2012; Cruz and Dhaher, 2009), suggesting encoding at a subcortical locus. In the absence of descending signals from the brain, the spinal cord remains fully capable of generating complex behaviors and muscle activations (Pearson and Rossignol, 1991; Zimmermann et al., 2011), as well as microstimulation-evoked convergent forces (Giszter et al., 1993; Aoyagi et al., 2004).

Here, we examined motor cortical activity during voluntary behavior to see if it might play any role in controlling the activation (if not the structure) of muscle synergies. In particular, we tested a simple experimental hypothesis motivated by our earlier microstimulation work (Overduin et al., 2012), namely that cortical units should preferentially encode synergies similar to those evoked by ICMS at the same electrode. (Our null hypothesis, in contrast, was that whichever synergy was most strongly encoded by a given unit would bear only at-chance similarity to those evoked by ICMS near the unit.) If so, this would suggest that intracortical currents—whether endogenously generated by motor planning or exogenously introduced by microstimulation—may determine the degree to which downstream synergies are recruited.

# **MATERIALS AND METHODS**

### **SUBJECTS**

Behavioral, muscular, and cortical data were collected from two rhesus macaques (*Macaca mulatta*): G1 (a 5.9-kg, 8-year-old female) and G2 (a 6.5-kg, 4-year-old male). Procedures were approved by the MIT Committee on Animal Care, and conformed to the National Institutes of Health *Guide for the Care and Use of Laboratory Animals*.

# **BEHAVIOR**

Subjects used their left hand to press a start button and then reach for, grasp, and carry objects between two wells spaced 20 cm apart. Reward (0.2–0.3 ml of apple juice or water) was given if the object was removed from the first well within 1.0 s and released into the second well within another 1.0 s, where it had to remain for at least 0.1 s. Button press, reward dispensation, and data from two photosensors (E3T-SR12; Omron, Kyoto, Japan) mounted within each well were recorded together, allowing trials to be divided into reach and carry phases (Overduin et al., 2008). The 25 Delrin plastic objects (density 1.4 g/cm³) included 5 spheres of variable diameter (ranging from 1.6 to 3.6 cm), 5 cubes of variable width (1.5–3.6 cm), and 15 cylinders of which 5 each spanned one of three dimensions (height, 0.6–5.7 cm; uniform diameter, 1.3–3.8 cm; inner diameter, 0.6–3.2 cm). The number of objects presented in a given day, or spanned by any single unit recorded in a given session, could be fewer than 25. The same object was presented consecutively enough times for the animal to perform 10 successful trials, before another object was pseudorandomly selected. During recordings, subjects were head-restrained via an implanted cranial post.

### **SESSIONS**

Cortical recordings from monkey G1 comprised 7798 successful left-target-directed trials performed over 20 recording sessions spanning 45 days; those from G2 comprised 775 left-target-directed trials performed over 6 sessions spanning 6 days. The analysis presented here, however, was focused on the subset of these data for which ICMS experiments were performed on the same days as the cortical recordings (4485 trials over 13 sessions from G1 and 544 trials over 4 sessions from G2). Muscle recording was done in each of these sessions from G1 and G2, and also included trials (as well as other, interspersed sessions) when cortical recording was not done. The EMG data from which synergies were extracted (Overduin et al., 2008, 2012; **Figure 3A**) thus include trials recorded both with and without simultaneous neural data. In particular, of G1's and G2's 1000 left-target-directed trials used to construct trial-averaged EMG activity for synergy extraction, 482 and 462 trials overlapped with the 4485- and 544-trial subsets we focus on here, respectively. The EMG data of the remaining 89% (4003/4485, G1) and 15% (82/544, G2) of trials are new to this report. None of the cortical data have previously been presented.

# **SURGERY**

Cortical surgeries followed (G1) or occurred along with the first of (G2) the muscle electrode implantation surgeries (described in Overduin et al., 2008). These surgeries were performed under sterile conditions and general anesthesia (0.05 mg/kg atropine and 10 mg/kg ketamine injected intramuscularly, followed in G1 by 5 mg/kg sodium pentobarbital intravenously or in G2 by inhalation of 1–2% isoflurane with 2 L O2). Craniotomies were centered over right-hemisphere motor cortex. Custom stainless steel wells (G1: 28 mm wide, G2: 20 mm) were secured with bone screws and bone cement. The animals were given analgesics and systemic antibiotics following the surgeries. The dura was kept intact during surgery, and was subsequently treated with topical antibiotics and anti-inflammatories. Fresh connective tissue growth above the dura was further controlled by periodic (∼ 1× weekly) mechanical scraping, done under light anesthesia through the weeks of recording.

# **MUSCLES**

EMG recordings were made via 15 (G1) or 19 (G2) electrodes chronically implanted in muscles of the left forelimb. Proximal muscles acting on the shoulder and elbow included Del (deltoideus), Pec (pectoralis major), TriU and TriR (triceps brachii, ulnar and radial short heads), Bic (biceps brachii longus), and BR (brachioradialis). Wrist and extrinsic hand extensors included AbPL (abductor pollicis longus), ECRB (extensor carpi radialis brevis), EDC (extensor digitorum communis), ED23 (extensor digiti secundi and tertii proprius), ED45 (extensor digiti quarti and quinti proprius), and ECU (extensor carpi ulnaris). Wrist and extrinsic hand flexors included FCR (flexor carpi radialis), FDS (flexor digitorum superficialis), FDPU and FDPR (flexor digitorum profundus, ulnar and radial), and FCU (flexor carpi ulnaris). Intrinsic hand muscles included AbPB (abductor pollicis brevis), AdP (adductor pollicis), OpP (opponens pollicis), F5B (flexor digiti quinti brevis manus), and Op5 (opponens digiti quinti manus). EMG data were recorded on a trial-by-trial basis (between button-press and reward events). These data were bandpass-filtered (between 10 and 1000 Hz), notch-filtered (60 Hz), and differentially amplified (5000×) by a programmable signal conditioner (CyberAmp 380; Molecular Devices, Sunnyvale, CA) under software control (CyberControl; Molecular Devices). Data were digitized (2 kHz) via a data acquisition board (NI PCI-6035E; National Instruments, Austin, TX) under custom software control (LabVIEW; National Instruments). EMG channel subselection following cross-talk analysis is described in Overduin et al. (2008).

## **CORTEX**

Single units were recorded from dorsal premotor cortex (PMd), ventral premotor cortex (PMv), and MI (**Figure 1**). Areas were identified using magnetic resonance imaging data and sensorimotor mapping including ICMS. MRIs were collected 2 years after (G1) or 4 months before (G2) the recordings presented here. At the beginning of recording sessions, unit somatosensory (proprioceptive and cutaneous) response fields were estimated by passively moving the monkey's limbs and by stimulating its skin. Both in early exploratory sessions and at the end of each recording session, ICMS was applied via tungsten microelectrodes (FHC, Bowdoin, Maine; 0.3–3-MΩ impedance, 250-μm shaft diameter tapered to a 3-μm-wide tip). Up to 10 such electrodes were acutely introduced into the brain in each session using a manual microdrive (30-μm depth resolution). The microdrive was mounted on a grid that was secured to the recording well and that constrained the interelectrode spacing to 1 mm. The ICMS-evoked movements and stimulation thresholds were used to identify the portion of cortex sampled by the electrodes. Stimulation parameters used for this mapping included 2 × 0.2-ms pulse duration (cathodal-leading), 10–150-μA current, 330-Hz pulse frequency, and 0.05-s train duration. Modified ICMS parameters were used to evoke longer-lasting movements at the end of some sessions (**Figure 2A**): 8–100-μA current, 200-Hz pulse frequency, and 0.15–0.5-s train duration (as in Overduin et al., 2012). We consider only the first seven ICMS trains delivered at each site, this being the minimal number applied across the 33 (G1) or 13 (G2) sites (including 32 in MI, 9 in PMd, and 5 in PMv; Overduin et al., 2012).

# **UNITS**

Extracellular, intracortical voltages were recorded through the same electrodes used for ICMS. During recordings, no attempt was made to record from units within particular cortical layers or in sulcal sites, and the laminar location of recorded units was generally unknown given limited depth resolution and dimpling of the cortex upon penetration. Instead, we selected for recording the first stably-firing and well-discriminated unit(s) (if any) discovered along a given electrode track, as revealed by audio and oscilloscope monitoring of the recorded voltage. We rejected all cortical sites wherein somatosensory stimulation or ICMS had indicated response fields extending beyond the left forelimb (e.g., to the face or leg). We did not require the remaining sites to demonstrate sensorimotor responses, only that they be within topographical regions convexly bounded by sites that did. No other criteria were applied to units; i.e., we treated all as potentially task-related. We only consider units recorded at ICMS electrode sites (33 sites in G1, 13 in G2; Overduin et al., 2012); these comprise 94 units (83 in G1, 11 in G2). Signals were preamplified at unity gain by a headstage located ∼5 cm from the electrodes, and then passed to an amplifier for amplification (10,000×) and bandpass-filtering (600–6000 Hz, 2nd-order filter with roll-off on both ends) before digitization (Neuralynx, Inc., Tucson, Arizona). Spikes were identified online when electrode voltages exceeded a manually-set threshold, and 1.1-ms waveforms (including the threshold-crossing moment at 0.26 ms into the waveform), sampled at 30 kHz, were stored to disk. Spike times and other event times were recorded together for later synchronization of behavioral, EMG, and neural data. Offline, single units were identified based on spike waveform features and interspike intervals (ISIs), using manual clustering with MClust (MClust-3.4, A. D. Redish et al.) 
and custom routines written in MATLAB (MathWorks, Natick, Massachusetts), for this and the following analysis. Particular care was taken to ensure that waveform features and firing rates remained relatively constant over the recording span of each accepted unit. The 94 units had mean firing rates between 1.0 and 58.0 Hz, and signal-to-noise ratios between 2.4 and 24.3 (deCharms et al., 2009), with 1% of ISIs <1 ms.

### **PREPROCESSING**

After preprocessing as described in Overduin et al. (2008), grasping-related EMG data were integrated over 9-ms (G1) or 11-ms (G2) bins, and normalized to each channel's maximum integrated EMG over object conditions. The EMG data of 40 trials within each of the 25 object conditions were aligned on the time of object removal from the first well, cropped to a 100-sample window around this event [G1:(−0.35 : +0.55) s, G2:(−0.5 : +0.6) s], and then averaged over trials within each object condition. (The different window and integration times for the two animals were chosen based on their different movement latencies; Overduin et al., 2008.) ICMS-evoked EMG data were integrated over a (+0.025 : +0.150) s window relative to ICMS onset for each of 7 trains applied at each stimulation site, and normalized by the same factors as the grasping-related EMG data (Overduin et al., 2012). With regards to the neural data, mean firing rates were computed on a within-object basis, using all trials fully spanned by a unit. Trials were time-aligned on the time of object removal from the origin well (as for the grasping-related EMG data). Subsequent analysis was restricted to a (−0.4 : +0.5) s (G1) or (−0.55 : +0.55) s (G2) window around this event, i.e., to the same windows as for the grasping-related EMG data minus a fixed 50-ms delay (Morrow and Miller, 2003; Schieber and Rivlis, 2007; Stark et al., 2007). Within this window, each unit's spikes were summed within 9- or 11-ms bins (again, for consistency with the grasping-related EMG data binning). The units' mean firing rate profiles within each object condition were then smoothed by convolution with a 50-ms Gaussian kernel. (This smoothing, evident in **Figure 3B**, did not have a qualitative effect on the summary results of **Figure 4**.) All preprocessing was done in MATLAB.
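The binning and smoothing of the neural data can be sketched as follows (assuming NumPy; the spike times are synthetic and the function names `bin_spikes` and `smooth_rates` are illustrative, not from the authors' code):

```python
import numpy as np

def bin_spikes(spike_times, t_start, t_stop, bin_width):
    """Count spikes in fixed-width bins and convert counts to rates (Hz)."""
    edges = np.arange(t_start, t_stop + bin_width, bin_width)
    counts, _ = np.histogram(spike_times, bins=edges)
    return counts / bin_width

def smooth_rates(rates, bin_width, sigma=0.050):
    """Smooth a binned rate profile by convolution with a Gaussian
    kernel (sigma in seconds, truncated at 3 sigma)."""
    half = int(np.ceil(3 * sigma / bin_width))
    tk = np.arange(-half, half + 1) * bin_width
    kernel = np.exp(-tk**2 / (2 * sigma**2))
    kernel /= kernel.sum()
    return np.convolve(rates, kernel, mode="same")

# 9-ms bins over the (-0.4 : +0.5) s window used for monkey G1
rng = np.random.default_rng(1)
spikes = np.sort(rng.uniform(-0.4, 0.5, size=45))   # toy unit, ~50 Hz
rates = bin_spikes(spikes, -0.4, 0.5, 0.009)
smoothed = smooth_rates(rates, 0.009)               # 50-ms Gaussian kernel
```

The smoothed profile has the same number of bins as the raw rate profile, with sharp bin-to-bin fluctuations spread over the 50-ms kernel width.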

### **SYNERGIES**

Non-negative matrix factorization (NNMF) was used to identify a set of synchronous synergies underlying each monkey's muscle patterns (Lee and Seung, 1999; Tresch et al., 1999). There were two sets of muscle patterns to consider: ICMS-evoked EMG data *I* (including the data evoked at the sample site in **Figure 2A**, *right*), and grasping-related EMG data *G* (shown averaged over object conditions in **Figure 3A**, *bottom*). Matrix *I*(*e*, *s*, *l*) pools together the activity in *E* EMG channels evoked by each of *S* stimulation trains delivered at each of *L* ICMS locations (Overduin et al., 2012). *I* thus has dimensionality 15 × 7 × 33 (G1) or 19 × 7 × 13 (G2). Matrix *G*(*e*, *t*, *o*) specifies trial-averaged EMG activity over the same *E* channels, but across each of *T* time points in each of the *O* object conditions. *G* thus has dimensionality 15 × 100 × 50 (G1) or 19 × 100 × 50 (G2). Data *I* and *G* were independently reconstructed as combinations of *n* = 1, ··· , *N*<sup>I</sup> or *n* = 1, ··· , *N*<sup>G</sup> synergies, respectively, where both *N*<sup>I</sup> and *N*<sup>G</sup> were evaluated up to the number of EMG channels *E*, i.e., 15 (G1) or 19 (G2). Each ICMS-evoked or grasp-related synergy *ν*<sup>I</sup> *<sup>n</sup>*(*e*) or *ν*<sup>G</sup> *<sup>n</sup>* (*e*) is a vector of length *E* capturing a unique balance of activation across the EMG channels. In reconstructions, each synergy was multiplied by a scalar, non-negative weighting coefficient *w*I *<sup>n</sup>*(*s*, *l*) or *w*<sup>G</sup> *<sup>n</sup>* (*t*, *o*). These coefficients could vary both within conditions (i.e., over stimulation trains *s* or time samples *t*) and across conditions (i.e., over ICMS locations *l* or objects *o*). The reconstructions can be expressed as:

$$I(e, s, l) = \sum_{n=1}^{N^{\mathrm{I}}} w_n^{\mathrm{I}}(s, l) \cdot \nu_n^{\mathrm{I}}(e) \tag{1}$$

$$G(e, t, o) = \sum_{n=1}^{N^{\mathrm{G}}} w_n^{\mathrm{G}}(t, o) \cdot \nu_n^{\mathrm{G}}(e) \tag{2}$$

For a given *N*<sup>I</sup> or *N*<sup>G</sup>, the algorithm iteratively updated the structures *ν*<sup>I</sup><sub>*n*</sub> or *ν*<sup>G</sup><sub>*n*</sub> and coefficients *w*<sup>I</sup><sub>*n*</sub> or *w*<sup>G</sup><sub>*n*</sub> until the reconstruction quality, *R*<sup>2</sup>, increased by less than 0.001 over 10 iterations. The extraction was repeated five times for each dimensionality; the set of synergies with the highest EMG variation explained was selected for further analysis. Dimensionalities (*N*<sup>I</sup> and *N*<sup>G</sup>) were chosen by applying a threshold of *R*<sup>2</sup> = 95% (Overduin et al., 2012).
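
A minimal sketch of this extraction, assuming the standard Lee and Seung (1999) multiplicative updates together with the stopping and dimensionality criteria described above (function names and defaults are illustrative, not the authors' implementation):

```python
import numpy as np

def extract_synergies(V, n_syn, n_iter=500, tol=1e-3, window=10, seed=0):
    """NNMF by multiplicative updates (Lee and Seung, 1999): V ≈ W @ H.

    V : (n_channels, n_observations) non-negative EMG matrix, e.g. G
        reshaped from E x T x O into E x (T*O).
    Columns of W are the synergies nu_n(e); rows of H are the scaling
    coefficients w_n. Iteration stops when R^2 improves by less than
    `tol` over `window` iterations, mirroring the criterion in the text.
    """
    rng = np.random.default_rng(seed)
    E, M = V.shape
    W = rng.random((E, n_syn))
    H = rng.random((n_syn, M))
    sst = ((V - V.mean()) ** 2).sum()   # total variation of the data
    r2_hist = []
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + 1e-12)
        W *= (V @ H.T) / (W @ H @ H.T + 1e-12)
        r2 = 1.0 - ((V - W @ H) ** 2).sum() / sst
        r2_hist.append(r2)
        if len(r2_hist) > window and r2 - r2_hist[-window - 1] < tol:
            break
    return W, H, r2_hist[-1]

def choose_dimensionality(V, max_n, r2_thresh=0.95, n_repeats=5):
    """Smallest n whose best-of-`n_repeats` extraction reaches the
    R^2 threshold (95% in the text)."""
    for n in range(1, max_n + 1):
        best = max(extract_synergies(V, n, seed=s)[2]
                   for s in range(n_repeats))
        if best >= r2_thresh:
            return n
    return max_n
```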

### **MATCHING**

Synergies *ν*<sup>I</sup><sub>*n*</sub> were compared and matched to synergies *ν*<sup>G</sup><sub>*n*</sub> using a greedy search procedure (Tresch et al., 1999; Overduin et al., 2012). For all *N*<sup>I</sup> × *N*<sup>G</sup> possible pairs of ICMS-evoked and grasping-related synergies, we first computed dot products (e.g., 6 × 8 = 48 dot products for G2). The pair of ICMS-evoked vs. grasping-related synergies with the highest dot product was defined as the best-matching pair. The pair with the highest dot product among the remaining (*N*<sup>I</sup> − 1) × (*N*<sup>G</sup> − 1) pairs (e.g., 5 × 7 = 35 pairs for G2) defined the second-best match. This process was repeated until all synergies in one set had been paired (e.g., min(6, 8) = 6 iterations for G2). We then used Monte Carlo simulation to assess the significance of each match. We repeated the greedy search algorithm 10,000 times for each monkey, after first randomly shuffling EMG channel identity each time. (For G2, for instance, this involved computing 10,000 × 48 = 480,000 dot products.) We then compared the highest (best-matching) dot product between *actual* ICMS-evoked and grasping-related synergies with the distribution of highest dot products from the 10,000 comparisons of *shuffled* synergies. If the former value exceeded the 95th percentile of the latter distribution, we took the match as significant at *p* < 0.05. We then repeated this comparison for the second-best actual synergy pair vs. the distribution of second-best shuffled pairs, etc.
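
The greedy matching and shuffle-based significance test can be sketched as follows, assuming unit-normalized synergy vectors. This is a simplified stand-in for the authors' procedure, with illustrative function names:

```python
import numpy as np

def greedy_match(syn_I, syn_G):
    """Greedily pair ICMS-evoked and grasping-related synergies by
    dot product. syn_I: (N_I, E), syn_G: (N_G, E); rows are assumed
    unit-normalized. Returns (i, g, dot) tuples, best match first."""
    d = (syn_I @ syn_G.T).astype(float)  # all N_I x N_G dot products
    matches = []
    for _ in range(min(syn_I.shape[0], syn_G.shape[0])):
        i, g = np.unravel_index(np.argmax(d), d.shape)
        matches.append((int(i), int(g), float(d[i, g])))
        d[i, :] = -np.inf                # each synergy pairs only once
        d[:, g] = -np.inf
    return matches

def match_thresholds(syn_I, syn_G, n_shuffle=10000, seed=0):
    """95th percentiles of the rank-ordered dot products obtained after
    shuffling EMG-channel identity; an actual match is significant at
    p < 0.05 when its dot product exceeds the threshold at its rank."""
    rng = np.random.default_rng(seed)
    n_pairs = min(syn_I.shape[0], syn_G.shape[0])
    null = np.empty((n_shuffle, n_pairs))
    for k in range(n_shuffle):
        perm = rng.permutation(syn_I.shape[1])   # shuffle channels
        null[k] = [m[2] for m in greedy_match(syn_I[:, perm], syn_G)]
    return np.percentile(null, 95, axis=0)
```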


### **ANALYSIS**

In reconstructing the ICMS-evoked EMG vectors at a given site, the synergy *ν*<sup>I</sup><sub>*n*</sub> most "evoked" at the site was the one with the largest weighting coefficient *w*<sup>I</sup><sub>*n*</sub>, averaged over ICMS trains (**Figure 2A**, showing results for site XI in **Figure 1**, *right*). To determine which synergy was instead most "encoded" by a unit, the unit's firing rate profile was correlated against the task-related synergy scaling profiles *w*<sup>G</sup><sub>*n*</sub>, and the largest positive correlation was identified. Results were insensitive to the use of linear (Pearson) or rank (Kendall or Spearman) correlation. To determine which task-related synergy was most evoked at a recording site, the site's ICMS-evoked EMG patterns were decomposed into combinations of synergies, which in turn were matched to task-related synergies (as described above). Over units, we counted the frequency at which the most-encoded synergy was the same as the most-evoked synergy at the electrode, defined as above. The chance frequency of such matches was 1/6 = 17%, as the analysis considered the 6 muscle synergies both evoked by ICMS (in the case of G2, **Figure 2A**, *left*) and observed in the task data (**Figure 2B**). Results of this χ<sup>2</sup> test were evaluated for significance at a *p* < 0.05 threshold.
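
The selection of each unit's most-encoded synergy, and the test of the match frequency against chance, can be sketched as below. The χ<sup>2</sup> statistic here is a generic one-degree-of-freedom goodness-of-fit version; the statistic reported in the text appears to have been computed somewhat differently, and the function names are ours:

```python
import numpy as np
from math import erfc, sqrt

def most_encoded_synergy(rate, W_G):
    """Index of the grasping-related synergy whose coefficient time
    course correlates most positively with a unit's firing-rate
    profile. rate: (n_bins,); W_G: (N_G, n_bins) scaling profiles."""
    r = np.array([np.corrcoef(rate, w)[0, 1] for w in W_G])
    return int(np.argmax(r))

def match_rate_test(n_match, n_units, p_chance=1.0 / 6.0):
    """Chi-square (df = 1) goodness-of-fit test of the observed
    match frequency against the 1/6 chance level."""
    expected = np.array([n_units * p_chance, n_units * (1.0 - p_chance)])
    observed = np.array([n_match, n_units - n_match], dtype=float)
    stat = ((observed - expected) ** 2 / expected).sum()
    p = erfc(sqrt(stat / 2.0))  # survival function of chi-square, df=1
    return stat, p
```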

# **RESULTS**

In a recent study (Overduin et al., 2012), we examined ICMS-evoked movements and muscle activity in two rhesus macaques ("G1" and "G2"). EMG data were recorded from 15–19 electrodes chronically implanted in muscles of the shoulder, arm, and hand. As we reported (Overduin et al., 2012), ICMS within MI and dorsal and ventral premotor (PMd and PMv) cortex (**Figure 1**) appeared to drive the forelimb toward an invariant, site-specific posture, and at the same time to replace voluntary muscle activity with an invariant, site-specific tonic EMG pattern (as in Griffin et al., 2011).

After aggregating ICMS-evoked EMG data over multiple stimulation sites, these pooled data could be decomposed into

### **FIGURE 2 | Microstimulation-evoked muscle activity and reconstruction by synergies**

**(A)** The EMG activity evoked during the first 0.15 s of ICMS at site XI from **Figure 1**, shown both in gold-colored profile (*far right*) and as the integrated mean ± SE of this activity over stimulation trains (*second from right*). (The vertical 20-μV scale bars to the right of the EMG profiles give the relative level of activation in each channel.) The mean activity could be reconstructed by six ICMS-evoked synergies *ν*<sup>I</sup><sub>*n*</sub> (*left*), with the largest weighting given to *ν*<sup>I</sup><sub>4</sub>. **(B)** Of eight muscle synergies *ν*<sup>G</sup><sub>*n*</sub> derived from an object grasping behavior, synergies *ν*<sup>G</sup><sub>1</sub>–*ν*<sup>G</sup><sub>6</sub> could be matched in structure to ICMS-evoked synergies *ν*<sup>I</sup><sub>1</sub>–*ν*<sup>I</sup><sub>6</sub>.

combinations of a reduced set of synchronous synergies using NNMF (Lee and Seung, 1999; Tresch et al., 1999). Each of the ICMS-derived synergies *ν*<sup>I</sup><sub>*n*</sub> (*n* = 1, . . . , *N*<sup>I</sup>) captures a pattern of synchronous activation over muscles. We found that *N*<sup>I</sup> = 6 (G2) or 7 (G1) synergies were sufficient to reconstruct ≥95% of the variability in the ICMS-evoked EMG data (Overduin et al., 2012). Such data reconstruction is exemplified in **Figure 2A** for G2's ICMS site XI, in MI. The figure depicts the ICMS-evoked activity at this site in the form of: the time-varying, trial-averaged EMG signal (*far right*); an ICMS-evoked EMG vector integrating this per-ICMS activity over time (*middle*); and a vector sum of ICMS-derived synergies, each scaled by the weighting coefficient *w*<sup>I</sup><sub>*n*</sub> (*left*). For this site, the ICMS-evoked EMG activity was dominated by synergy *ν*<sup>I</sup><sub>4</sub>, as shown.

We also studied a manual behavior performed by these subjects (Overduin et al., 2008, 2012), in which they had to reach for an object presented in a well, grasp it, and then carry it to the opposing well. To elicit a variety of hand postures and forces, the objects included 25 spheres, cubes, and cylinders of different dimensions. Pooled over multiple days, the EMG data could be decomposed into combinations of grasping-related synergies *ν*<sup>G</sup><sub>*n*</sub> (*n* = 1, . . . , *N*<sup>G</sup>). We found that *N*<sup>G</sup> = 8 (G2) or 10 (G1) synergies were sufficient to reconstruct ≥95% of the variability in the grasping-related EMG data (Overduin et al., 2012). For each monkey, 6 of these synergies could be matched uniquely to one of the ICMS-derived synergies (**Figures 2A** vs. **2B**) using a greedy search procedure. **Figures 2B**, **3A** illustrate the synergies found for one animal and their reconstruction of average EMG activity.

Synergies could provide the animal with a mechanism to continuously and efficiently control its muscles, by specifying a task-specific amplitude coefficient time course *w*<sup>G</sup><sub>*n*</sub> for each of the synergies (**Figure 3A**, *top* vs. *bottom*). The involvement of motor cortex in this control is circumstantially suggested by the correspondence in synergy structure whether the synergies are derived from ICMS-evoked or voluntarily-generated EMG data (**Figures 2A** vs. **2B**). Here, we sought more direct evidence for the role of motor cortex. In particular, we looked for evidence that the muscle synergies evoked by ICMS (and matched to those generated voluntarily) were also encoded by single motor cortical units near the ICMS electrodes.

A positive example of such a correspondence is shown in **Figure 3A** (*top*) and **Figure 3B** (*top*), for monkey G2. The MI unit shown in **Figure 3B** is at site XI, the location of which is highlighted in **Figure 1** (*right*), and the ICMS-evoked EMG activity of which is shown in **Figure 2A**. The single unit recorded at this site exhibited a strong burst of activity just prior to the time when the object was retrieved from its origin well, and was largely quiet during the following carry and release movements

(**Figure 3B**). (These modulations were captured in the unit's firing rate, derived from spike counts binned at 50 ms and smoothed with a 50-ms Gaussian kernel.) Of all the grasping-related synergies *ν*<sup>G</sup><sub>1</sub>–*ν*<sup>G</sup><sub>6</sub>, the one whose activation profile was most positively correlated with that of the unit was *ν*<sup>G</sup><sub>4</sub> (**Figure 3A**, *top*), which appeared to capture the coactivation of forearm flexor muscles (**Figure 2B**). And of all the ICMS-evoked synergies *ν*<sup>I</sup><sub>1</sub>–*ν*<sup>I</sup><sub>6</sub>, it was the matching synergy *ν*<sup>I</sup><sub>4</sub> that was most dominant in reconstruction of the evoked EMG activity.

How consistently did the muscle synergy most encoded by a unit match the synergy most evoked by ICMS at the same site (as in the foregoing example)? For this analysis we simply counted the number of MI, PMd and PMv units for which the grasping-related synergy most strongly encoded by the unit matched the primary muscle synergy evoked by ICMS at the electrode. (This analysis is restricted to those ICMS sites at which units were also recorded; these are highlighted in **Figure 1**.) As shown in **Figure 4**, the actual frequencies were significantly higher than the 1/6 = 17% chance frequency [26 of 94 units, or 28%; χ<sup>2</sup>(1) = 6.82, *p* < 0.01], as was true for G1 alone [22/83 = 27% of units; χ<sup>2</sup>(1) = 4.82, *p* < 0.05] and supported by a trend among G2's smaller population of units [4/11 = 36% of units; χ<sup>2</sup>(1) = 2.56, *p* = 0.07]. (Note that the "1st most encoded synergy" in this plot is not necessarily *ν*<sup>G</sup><sub>1</sub>, but instead whichever synergy *ν*<sup>G</sup><sub>*n*</sub> was most strongly correlated with the unit's firing profile, e.g., *ν*<sup>G</sup><sub>4</sub> for the unit shown in **Figure 3B**. Similarly, the "most evoked synergy" is whichever synergy *ν*<sup>I</sup><sub>*n*</sub> was dominant in reconstruction of the EMG activity evoked at the unit's electrode. In general, *ν*<sup>I</sup><sub>3</sub>, *ν*<sup>I</sup><sub>4</sub>, and *ν*<sup>I</sup><sub>5</sub> appeared to be most commonly evoked by ICMS, being the synergies "evoked" at 18/83, 28/83, and 20/83 of monkey G1's units, respectively, and by 3/11, 3/11, and 4/11 of G2's units.)

While the qualitative pattern for G2 closely followed that of G1 (**Figure 4**), it likely failed to reach significance due to insufficient sampling of units (11 units, vs. 83 for G1). Also, these trends were weakened by inclusion of PMv units. Considering *only* MI and PMd units, the fraction of cases in which the grasping-related synergy most strongly encoded by a unit matched the synergy most strongly evoked by proximal ICMS was significantly higher than 1/6 in G2 [4/10 = 40% of units; χ<sup>2</sup>(1) = 3.27, *p* < 0.05] as well as in G1 [20/73 = 27% of units; χ<sup>2</sup>(1) = 5.04, *p* < 0.05], and in both animals combined [24/83 = 29%; χ<sup>2</sup>(1) = 7.47, *p* < 0.01].

While the synergies *ν*<sup>I</sup><sub>*n*</sub> extracted from ICMS-evoked EMG activity did cluster non-uniformly on the cortical surface (Overduin et al., 2012), we observed no tendency for the subset of sites at which similar synergies were both encoded and evoked to be topographically grouped on that surface.

# **DISCUSSION**

Our results suggest that direct stimulation of patches of motor cortex generates synergistic combinations of muscle activity that are weighted toward synergies encoded at the stimulating electrode. A common motor cortical substrate appears to be activated both by endogenous currents during voluntary behaviors and by exogenous currents introduced by electrical microstimulation. In either case, these currents appear to modulate the amplitude, not structure, of a shared set of downstream muscle synergies. Our earlier work also suggests that each synergy (and the posture reached through its tonic activation) may be represented by a non-uniform map over the motor cortical surface (Overduin et al., 2012).

It may be contested that electrical currents injected into the CNS via microstimulation elicit non-natural patterns of neural activity (even if the downstream muscle activity is resolvable into well-organized movements and natural muscle synergies). The biological basis of ICMS, and even the extent of cortex activated, remain poorly understood (Butovas and Schwarz, 2003; Tolias et al., 2005), despite recent studies using single-cell recording, behavioral methods, functional magnetic resonance imaging (Tehovnik et al., 2006), and two-photon calcium optical imaging (Histed et al., 2009). Moreover, these investigations of ICMS have largely been limited to more conventional, short-train and/or subthreshold ICMS rather than the form used here. In applying ICMS in motor cortex, researchers have traditionally used short train durations (typically 25–70 ms) and sub- or peri-threshold currents (typically 10–60 μA; e.g., Asanuma and Rosén, 1972; Sato and Tanji, 1989; Donoghue et al., 1992) in order to map the overt response of cell populations including corticomotoneuronal cells. Such studies neither sought nor reported convergent movement responses of the sort found here and by others (Graziano et al., 2002a,b), findings that have naturally aroused some controversy (Strick, 2002).

Together with Overduin et al. (2012), our findings indicate that movements, muscle patterns, and cortical activations evoked by relatively long-train ICMS can be related to those observed in natural behavior. Indeed, it can be argued that a physiologically realistic model of motor activation *requires* ICMS with relatively long trains, as well as suprathreshold currents and intermediate pulse frequencies (optimally 80–140 μA and 80–140 Hz, in the case of primate MI forelimb-area ICMS; Van Acker et al., 2013). Graziano and coworkers (Graziano et al., 2002a) have emphasized that ∼500-ms stimulation trains approximate the time scale of natural movements like primate reach and grasp (Georgopoulos et al., 1986; Reina et al., 2001). Even in spinalized amphibians, in which convergent force patterns were first observed (Mussa-Ivaldi et al., 1990; Giszter et al., 1993; Loeb et al., 1993), these movements were evoked by similarly long-train (typically 300-ms) intraspinal microstimulation. Researchers using ICMS to study oculomotor and somatosensory systems have also used relatively long trains. For instance, 400-ms trains were applied to the arcuate sulcus to replicate the time scale of typical head movements (Freedman et al., 1996), and 500-ms trains were applied to primary somatosensory cortex to mimic a tactile stimulus (Romo et al., 1998). While trains shorter than ∼500 ms elicit truncated movements when delivered to primate MI (Graziano et al., 2002a; Van Acker et al., 2013), in Overduin et al. (2012) we show that even 150 ms of ICMS yields convergent movements with a predictable equilibrium point and synergistic muscle activation.

For the majority of units we sampled, firing rate profiles were more strongly correlated with muscle synergies *other* than the most-evoked synergy (**Figure 4**, non-shaded bars). There are many ways to account for these other units. For example, units may encode other continuously-controlled quantities like the position, velocity, or acceleration of extrinsic effectors or intrinsic joints, or the forces underlying these kinematics, or the muscle contractions determining these dynamics (Carmena et al., 2003; Morrow and Miller, 2003; Schieber and Rivlis, 2007; Stark et al., 2007; Velliste et al., 2008; Ganguly and Carmena, 2009; Hochberg et al., 2012; Collinger et al., 2013). Motor cortical encoding of purely extrinsic variables may be unlikely based both on theoretical grounds (Todorov, 2000; Paninski et al., 2004) and on empirical evidence that neurons' intrinsic muscle fields are more stable than their tuning to extrinsic dimensions like hand direction (Morrow et al., 2007), with such tuning observed to fluctuate even during stable within-session behavior (Carmena et al., 2005; Rokni et al., 2007; cf. Chestek et al., 2007). However, the purpose of this report was not to determine whether motor cortical units encode muscle activity, synergy coefficients, or any other motor variable. Instead, we simply sought to test whether the units were more likely than chance to encode those synergies evoked by ICMS at the same electrode.

Another way to account for units which appeared to encode different synergies than those evoked by nearby ICMS is to recognize that the number of units affected by ICMS current far exceeds the handful sampled at an electrode. With the stimulation currents needed to elicit complex movements (≤100 μA in this report), cortex is no doubt activated far outside the 100-μm radius assumed for ∼10 μA stimulation (Ranck, 1981). Such suprathreshold trains are likely to recruit other cells beyond those immediately next to the stimulating electrode through synaptic connections (Histed et al., 2009), and indeed may be required to transsynaptically activate non-direct connections between motor cortex and the spinal cord (Strick, 2002).

Another possibility to be explored in future work is that populations of units may specify not only continuous variables like synergy amplitude, but also sequential recruitment of muscles (e.g., via synergies) in the form of discrete "motor programs" (Keele, 1968; Polit and Bizzi, 1978, 1979; Georgopoulos et al., 1983). Elsewhere these programs have been referred to as "time-varying synergies," in distinction from the "synchronous synergies" discussed in the present report (Kargo and Nitz, 2003), and have been extracted from muscle data using a modified form of NNMF (d'Avella et al., 2003, 2006; Overduin et al., 2008). A dynamical systems approach may also be appropriate for capturing time-varying patterns within population-level neural and muscular data (Churchland et al., 2012).

Besides studies of voluntary neural and muscular dynamics, microstimulation studies are also broadly consistent with the idea of discrete movement encoding. Intraspinal microstimulation of sufficient duration generates forces that tend to drive a limb to particular postures or through sequences of postures, in frogs (Mussa-Ivaldi et al., 1990; Giszter et al., 1993; Kargo and Giszter, 2000), cats (Lemay and Grill, 2004) and rats (Tresch and Bizzi, 1999). Sufficiently-long ICMS trains applied to mammalian motor cortex (Ward, 1938; Graziano et al., 2002a, 2004a, 2005; Stepniewska et al., 2005, 2011; Ramanathan et al., 2006) can evoke complex multijoint forelimb behaviors with multiple phases (such as reach, grasp, then retraction) and invariant endpoints. The stimulation-evoked convergent forces (Giszter et al., 1993; Kargo and Giszter, 2000), invariant endpoints (Graziano et al., 2004a; Graziano and Aflalo, 2007), and bell-shaped speed profiles (Graziano et al., 2005) all tend to overlap with motions and postures found in subjects' natural behavior. Together with the present results, these studies indicate that microstimulation-evoked movements are not artifactual, and indeed can provide insights into natural movement planning.

# **ACKNOWLEDGMENTS**

This work benefited from the assistance of Margo Cantor, Charlotte Potak, Jinsook Roh, and Sylvester Szczepanowski. The project was supported by fellowships to Simon A. Overduin from the Canadian Institutes of Health Research and Dystonia Medical Research Foundation, and a National Science Foundation grant EFRI-1137267 to Jose M. Carmena.

# **REFERENCES**


Giszter, S. F., Mussa-Ivaldi, F. A., and Bizzi, E. (1993). Convergent force fields organized in the frog's spinal cord. *J. Neurosci.* 13, 467–491.


circuits underlying motor behavior. *Proc. Natl. Acad. Sci. U.S.A.* 108, E725–732. doi: 10.1073/pnas.1109925108


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 17 November 2013; accepted: 09 February 2014; published online: 05 March 2014.*

*Citation: Overduin SA, d'Avella A, Carmena JM and Bizzi E (2014) Muscle synergies evoked by microstimulation are preferentially encoded during behavior. Front. Comput. Neurosci. 8:20. doi: 10.3389/fncom.2014.00020*

*This article was submitted to the journal Frontiers in Computational Neuroscience. Copyright © 2014 Overduin, d'Avella, Carmena and Bizzi. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Motor cortical regulation of sparse synergies provides a framework for the flexible control of precision walking

# *Nedialko Krouchev and Trevor Drew\**

*Groupe de Recherche sur le Système Nerveux Central, Département de Physiologie, Université de Montréal, Montréal, QC, Canada*

### *Edited by:*

*Andrea D'Avella, IRCCS Fondazione Santa Lucia, Italy*

### *Reviewed by:*

*Simon Giszter, Drexel Med School, USA Paul S. G. Stein, Washington University in St. Louis, USA*

### *\*Correspondence:*

*Trevor Drew, Groupe de Recherche sur le Système Nerveux Central, Département de Physiologie, Université de Montréal, CP 6128 Succ. Centre-ville, Montréal, QC H3C 3J7, Canada e-mail: trevor.drew@umontreal.ca*

We have previously described a modular organization of the locomotor step cycle in the cat in which a number of sparse synergies are activated sequentially during the swing phase of the step cycle (Krouchev et al., 2006). Here, we address how these synergies are modified during voluntary gait modifications. Data were analysed from 27 bursts of muscle activity, recorded from 18 muscles in the forelimb of the cat during locomotion. These were grouped into 10 clusters, or synergies, during unobstructed locomotion. Each synergy comprised only a small number of muscle bursts (sparse synergies), some of which included both proximal and distal muscles. Eight (8/10) of these synergies were active during the swing phase of locomotion. Synergies observed during the gait modifications were very similar to those observed during unobstructed locomotion. Constraining these synergies to be identical in both the lead (first forelimb to pass over the obstacle) and the trail (second limb) conditions allowed us to compare the changes in phase and magnitude of the synergies required to modify gait. In the lead condition, changes were observed particularly in those synergies responsible for transport of the limb and preparation for landing. During the trail condition, changes were particularly evident in those synergies responsible for lifting the limb from the ground at the onset of the swing phase. These changes in phase and magnitude were adapted to the size and shape of the obstacle over which the cat stepped. These results demonstrate that by modifying the phase and magnitude of a finite number of muscle synergies, each comprised of a small number of simultaneously active muscles, descending control signals could produce very specific modifications in limb trajectory during locomotion. We discuss the possibility that these changes in phase and magnitude could be produced by changes in the activity of neurones in the motor cortex.

**Keywords: locomotion, motor cortex, voluntary gait modifications, cat, synergy**

# **INTRODUCTION**

The question of modularity within the locomotor control system has a long history. From the original proposition of Graham Brown (1911, 1914) that two half-centres are responsible for generating locomotion grew a body of experimental work to determine the neuronal basis of the alternating rhythmical activity observed during locomotion (Jankowska et al., 1967a,b; Lundberg, 1981). This work took a new and, initially, controversial direction when Grillner postulated the existence of a central pattern generator (CPG) for locomotion (Grillner and Zangger, 1975, 1979; Grillner, 1981). Subsequent studies have shown that this CPG in the spinal cord has the intrinsic ability to generate an intricate and complex pattern of locomotor activity (Pearson and Rossignol, 1991). From the original concept of the CPG as a single neuronal entity (or network) has grown the idea that the CPG may in fact be considered as a series of modules that are interconnected and provide the capacity to produce a rich behavioral repertoire, involving flexible and coordinated activity around multiple joints.

A key evolution in this respect was the formulation of the concept that the CPG is comprised of a number of unit pattern generators (Grillner, 1981; Grillner and Wallen, 1985). Grillner suggested the existence of four pairs of unit pattern generators responsible for producing the rhythmical pattern of activity in the hip, knee, ankle, and toe muscles. He proposed that relatively simple command signals to modify the connections between the hip and knee modules, for example, could easily change the pattern of muscle activity required for forward progression into that required for walking backwards.

In the turtle spinal cord, such an organization of interconnected modules has been demonstrated electrophysiologically to generate several different scratch patterns depending on the location of the offending stimulus (Berkowitz and Stein, 1994; Stein and Smith, 1997). Additional evidence for modularity in the turtle has come from the studies of deletions, in which bursts of activity in some populations of interneurones and their associated motor pools are absent in some scratch cycles while others persist (Stein and Daniels-McQueen, 2002; Stein, 2008). However, the neuronal mechanisms leading to modularity within the mammalian spinal cord are more difficult to study because of the size and complexity of the neuronal networks. As in the turtle, some evidence for modularity, although not necessarily for independent burst generators, has also come from the study of deletions (Grillner and Zangger, 1979; Jordan, 1991; Smith et al., 1995; Lafreniere-Roula and McCrea, 2005; Zhong et al., 2012). These deletions are observed in flexor and extensor motoneurones and, in many cases, have no effect on cycle timing, supporting the view that rhythm generation (defining the basic rhythmicity of the step cycle) and pattern generation (defining the spatio-temporal organization of the muscle bursts within the step cycle) are separate processes (Lennard, 1985; Koshland and Smith, 1989). These results have led to a particularly interesting computational model of locomotion consisting of a rhythm generator and distinct pattern generators (Rybak et al., 2006; McCrea and Rybak, 2007; Zhong et al., 2012).

As an extension of these ideas, and particularly on the basis of Grillner's unit CPG model, we suggested (Drew, 1991a) that the existence of such modules could provide a substrate by which the motor cortex could exert a precise control over the magnitude, duration and relative timing of specific muscles groups while at the same time ensuring that these changes are appropriately integrated into the locomotor cycle. Similar ideas were incorporated into computer simulations designed to determine how supraspinal command signals might interact with spinal unit CPGs in a human model of locomotion (Taga, 1995, 1998).

More recently, the idea of modularity within the spinal circuits has been developed to incorporate the idea of muscle synergies. This development, which owes much to the studies of Bizzi and his collaborators (Bizzi et al., 1991; Tresch et al., 1999, 2002; d'Avella et al., 2003; d'Avella and Bizzi, 2005), posits that the nervous system produces complex movements by combining the activity of a limited number of synergies, 4–6 in most studies. In brief, synergies are defined mathematically, by using decomposition methods such as non-negative matrix factorization (NNMF), as a matrix of weights that differentially activate *all* of the muscles involved in producing a movement. By modifying the magnitude and the phase of activity of each synergy, a wide range of movement patterns can be produced (Tresch and Jarc, 2009). Such synergies have been described during locomotion, scratching and swimming in the frog (Giszter et al., 1993; Saltiel et al., 2001; Cheung et al., 2005), during postural compensation to perturbation in cats and humans (Ting and Macpherson, 2005), during human locomotion (Ivanenko et al., 2004; Lacquaniti et al., 2012) as well as in reaching movements in primates and humans (d'Avella et al., 2006, 2008; Overduin et al., 2008). Muscle synergy analysis has also been used to study the deficits in movement after stroke (Cheung et al., 2009, 2012; Clark et al., 2010). In some of these experiments, and particularly those performed in spinal animals, these synergies have been suggested to form the basis of unit burst generators of the type proposed by Grillner (Hart and Giszter, 2004; Cheung et al., 2005).

In our own studies of synergies (Krouchev et al., 2006; Drew et al., 2008a), we have taken a different approach in which synergies are defined using more classical, physiological methods. In our approach, a synergy is defined as a group of muscles that are synchronously activated such that the period of activity during locomotion begins and ends simultaneously in all muscles in a synergy. By applying a custom clustering algorithm we were able to identify 11 synergies in the forelimb of the cat during unobstructed locomotion, with nine of these occurring during the swing phase. These synergies differed from those identified most commonly by using mathematical decomposition methods (see above) in that they were both more numerous and contained only a small proportion of the total number of muscle bursts recorded. We refer to these as sparse synergies to differentiate them from those synergies comprised of all periods of muscle activity (see preceding paragraph). Our suggestion, as in our original work on this issue (Drew, 1991a), is that the motor cortex modulates these sparse synergies in order to modify limb trajectory during locomotion (Drew et al., 2008a,b).
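
The sparse-synergy definition above, bursts grouped by shared onset and offset times within the step cycle, can be illustrated with a deliberately simplified grouping rule. This is not the authors' custom clustering algorithm; the single-pass grouping and the `tol` phase tolerance are our assumptions:

```python
def cluster_bursts(onsets, offsets, tol=0.05):
    """Group muscle bursts into sparse synergies by similarity of their
    onset and offset phases (expressed as fractions of the step cycle).
    onsets, offsets: parallel lists, one entry per burst. Returns a
    synergy label per burst; `tol` is an illustrative phase tolerance
    standing in for the clustering criterion of Krouchev et al. (2006)."""
    n = len(onsets)
    labels = [-1] * n
    next_label = 0
    for i in range(n):
        if labels[i] != -1:
            continue                    # burst already assigned
        labels[i] = next_label
        for j in range(i + 1, n):
            # a burst joins the synergy if it starts AND ends together
            if labels[j] == -1 and abs(onsets[i] - onsets[j]) < tol \
                    and abs(offsets[i] - offsets[j]) < tol:
                labels[j] = labels[i]
        next_label += 1
    return labels
```

In this scheme a synergy containing only two or three of the 27 recorded bursts is "sparse," in contrast to decomposition-based synergies that weight all muscles.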

The question remains, however, as to *how* these synergies are modified during voluntary gait modifications and whether the modifications in synergies that are observed are compatible with the discharge activity of neurones in the motor cortex recorded during the same behaviors. To answer that question, we have extended our previous analysis to examine how synergies are modified when cats step over obstacles. We consider two conditions. First we examine how synergies are modified in the lead (when a given limb is the first to step over an obstacle) and the trail (second to step over the obstacle) condition, for which our previous studies showed differential patterns of muscle activation related to the biomechanical requirements of the gait modification (Drew, 1993; Lavoie et al., 1995; McFadyen et al., 1999). Second, we examine how the synergies are modified when the cats step over obstacles of different shapes and sizes (Drew, 1988, 1991b, 1993). We then discuss the implications of these results for the cortical control of voluntary gait modifications.

# **METHODS**

The methods used in this study were either identical or similar to those previously detailed (Krouchev et al., 2006) and will only be briefly described here. All animals used in this study were originally used for other studies (Drew, 1993; Stapley and Drew, 2009; Yakovenko et al., 2011). Details concerning the methods used for animal training and implantation methods in the different animals used in the current manuscript can be found in those manuscripts. Details concerning the analytical methods are found in Krouchev et al. (2006).

# **TRAINING**

Cats were trained to walk on a treadmill at 0.35–0.5 m.s<sup>−1</sup>, first in the absence of any obstacles (unobstructed locomotion) and then in the presence of 1 or 2 obstacles attached to the treadmill belt. In these experiments, the obstacles always moved at the same speed as the treadmill. The obstacles were visible to the cat a minimum of two steps before the step over the obstacle. Most data were obtained from studies in which the cat stepped over a cylindrical obstacle of 10 cm cross-section. Additional data were obtained during steps over a high obstacle (13 cm high, 1 cm wide), a small high obstacle (7.5 cm high, 1 cm wide) and a wide obstacle (2.5 cm high, 15 cm wide) (see Drew, 1993 and **Figure 8E**).

# **SURGERY**

After the cats were trained, they were prepared for surgery under general anesthesia and in aseptic conditions following the protocols approved by the animal ethics committee at the Université de Montréal and according to the recommendations of the Canadian Council for the Protection of Animals. Details for different animals can be found in the references above. Heart rate and temperature were monitored during surgery and anesthesia level was regularly verified by testing for a corneal reflex. Solutes were administered throughout the surgery and analgesics were provided prior and subsequent to the surgery.

In all animals, pairs of Teflon-insulated, braided stainless steel wires were implanted into multiple muscles of the left forelimb. Wires were led sub-cutaneously to a 51 pin connector attached to the cranium of the cat. Although other surgical procedures were practiced on these cats, they are not reported here, as they are not relevant to the data illustrated. Details of these supplementary procedures can be found in the published manuscripts (see above). All data presented in this manuscript were obtained several weeks following the surgical procedures.

### **PROTOCOL**

Electromyographic (EMG) activity was normally recorded along with cell activity during locomotion. Data were generally recorded during unobstructed locomotion and then during periods of locomotion when cats stepped over 1 or 2 obstacles attached to a treadmill belt. EMG data were band-pass filtered between 100 Hz and either 450 or 500 Hz and amplified by a factor of 1–10 K to produce a signal of ∼1 volt. Data were digitized at 1 kHz. Video recordings of the locomotion were obtained simultaneously with the EMG data and synchronized with the aid of a SMPTE (Society of Motion Picture and Television Engineers) digital time code.

## **ANALYSIS**

The video recordings were initially screened for sections of stable locomotion in which the cat maintained its position in the center of the treadmill. As such, we chose sections of data in which EMG activity was uniform from cycle to cycle with no evidence of intermittent changes. We then manually marked the onset and offset of each burst of EMG activity within these sections (**Figure 1A**) and classified them as either steps over the obstacle with the left leg leading or trailing, or as unobstructed locomotion (either no obstacles attached to the treadmill or at least two steps before a step over an obstacle; see Drew, 1993). A step cycle was defined as the time between two successive periods of activity in either the brachialis (Br; **Figures 1**–**7**) or the cleidobrachialis (ClB; **Figure 8**) muscle, each of which becomes active at approximately the onset of swing. The step cycle was normalized to unity (1.0) and the onsets and offsets of periods of activity in each selected muscle were then expressed as a proportion, or phase, of the step cycle. Measured events occurring after the onset of the ClB or Br were given positive values while those occurring before ClB or Br onset were given negative values. Data were plotted in phase space in which the phase of offset of a given burst was plotted as a function of its onset (**Figures 1C**, **2A**).
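The phase normalization described above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the authors' code; the function name and the choice of the nearest Br onset as the phase reference are our assumptions:

```python
import numpy as np

def burst_phases(br_onsets, burst_on, burst_off):
    """Express one muscle burst as (onset, offset) phases of the step cycle.

    br_onsets : successive Br onset times (s) defining the step cycles
    burst_on, burst_off : times (s) of one EMG burst in another muscle
    Events after the reference Br onset get positive phases; events
    before it get negative phases, as described in the text.
    """
    # the Br onset nearest the burst onset defines phase = 0.0 (our assumption)
    i = int(np.argmin(np.abs(np.asarray(br_onsets) - burst_on)))
    # cycle duration: time to the next Br onset (or from the previous one)
    cycle = br_onsets[i + 1] - br_onsets[i] if i + 1 < len(br_onsets) \
        else br_onsets[i] - br_onsets[i - 1]
    on_phase = (burst_on - br_onsets[i]) / cycle
    off_phase = (burst_off - br_onsets[i]) / cycle
    return on_phase, off_phase

# e.g. Br onsets every 0.8 s; a burst starting 0.1 s before a Br onset
on, off = burst_phases([0.0, 0.8, 1.6], burst_on=0.7, burst_off=0.95)
# on = -0.125 (before Br onset), off = 0.1875 (after Br onset)
```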

### *Associative clustering*

Trial data points are formed pairwise in the phase plane (*x* = *onset*, *y* = *offset*) and one or more bursts may be *associated* in clusters as described in Krouchev et al. (2006). It is important in this respect to realize that the data points for each muscle

**FIGURE 1 | Data selection and clustering algorithm. (A)** Untreated data during locomotion from the brachialis (Br), Cleidobrachialis (ClB), and Biceps brachii (Bic) muscles. Upward and downward directed arrows indicate onset and offset of muscle burst activity, respectively. All activity was synchronized to the Br and burst onset activity is defined as phase = 0.0. A step cycle is defined as the time between two successive bursts (phase = 1.0). Activity in other bursts is defined with respect to Br onset. The red arrows indicate data illustrated in **(B,C)**. The green arrows define the first burst of activity in Bic [Bic(1) in **Figure 2**]. **(B)** The rectangles define 1SD of the mean onset and offset of the activity of each muscle (P1–P3, individual data illustrated in **(C)**). A vector is drawn from the center of each rectangle to the vertex closest to the center of the nearest neighbor (thick lines, Sn). The vector distance (dotted line: Dn) is then calculated as described in the text. **(C)** Bivariate ellipses are drawn around the centroid of each cluster (see text). Same dataset as in **Figure 2**.

are defined as belonging to that muscle and cannot be divided among different clusters. In the clustering algorithm, it is assumed that each burst is fully described by its centroid—i.e., the mean (onset, offset) phase vector—and the associated pair of standard deviations (SDs: σ*X*, σ*Y*; assumed uncorrelated). Hence, each burst is assumed equivalent to the rectangle [*X* − σ*X*, *X* + σ*X*] × [*Y* − σ*Y*, *Y* + σ*Y*] encompassing the centroid (*X*, *Y*) (**Figure 1B**).

Sufficiently overlapping rectangles form clusters, or synergies (Krouchev et al., 2006). Thus the algorithm associates sets of data points from different muscles rather than dissociating them as do most clustering methods.

First, the individual data points are tested for outliers using Rosner's test (Rosner, 1983) with a 2 SD margin. For each burst *i*, the phase-plane vectors of mean phase:

$$P_i = (X_i, Y_i) = \mathrm{mean}\left(onset_i, offset_i\right)$$

and of standard deviations

$$S_i = (\sigma X_i, \sigma Y_i) = \mathrm{std}\left(onset_i, offset_i\right).$$

are calculated. The distance between two bursts *i* and *j* is expressed as the phase-plane vector (**Figure 1B**):

$$D_{ij} = P_i - P_j$$

Two bursts are considered part of the same cluster when:

$$q_{ij} = \max\left\{ \left( P_i \pm S_i \right) \cdot D_{ij} + \left( P_j \pm S_j \right) \cdot D_{ij} - \|D_{ij}\|^2 \right\} / \|D_{ij}\| > 0 \tag{1}$$

where the maximum is taken over all possible combinations (**Figure 1B**), the products are the dot (scalar) vector products, and || · || is the usual Euclidean norm in the xOy phase plane.

For all possible pairs of bursts *i* and *j* (*i* = 1, 2, ..., *n*; *j* = *i* + 1, ..., *n*), where *n* is the total number of bursts to classify, Equation (1) is verified and thence a boolean *n* × *n* upper-triangular adjacency matrix **B** is obtained (**B***ij* = 1 when *q**ij* > 0, and zero otherwise). This adjacency matrix is then the input to a Matlab implementation of an algorithm due to Press et al. (1992) to determine the equivalence classes—i.e., the clusters. The final cluster numbers are rearranged so that their mean onsets proceed in ascending order.
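A minimal sketch of the associative clustering in Python. This is not the authors' Matlab implementation: the overlap test of Equation (1) is approximated here by a simple axis-aligned rectangle-intersection check on the ±1 SD rectangles, and a union-find pass stands in for the Press et al. (1992) equivalence-class routine:

```python
import numpy as np

def cluster_bursts(P, S):
    """Group EMG bursts into synergies by rectangle overlap.

    P : (n, 2) array of burst centroids (mean onset, offset phase)
    S : (n, 2) array of the matching standard deviations
    Simplified stand-in for Equation (1): two bursts are linked when
    their +/-1 SD rectangles intersect; clusters are the connected
    components (equivalence classes) of the resulting adjacency relation.
    """
    P, S = np.asarray(P, float), np.asarray(S, float)
    n = len(P)
    parent = list(range(n))            # union-find over bursts

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]   # path compression
            a = parent[a]
        return a

    for i in range(n):
        for j in range(i + 1, n):
            # axis-aligned rectangles overlap on both phase axes
            if np.all(np.abs(P[i] - P[j]) < S[i] + S[j]):
                parent[find(i)] = find(j)

    roots = [find(i) for i in range(n)]
    # renumber clusters in ascending order of mean onset, as in the text
    order = sorted(set(roots), key=lambda c: np.mean(
        [P[k, 0] for k in range(n) if roots[k] == c]))
    return [order.index(c) for c in roots]

# three bursts: the first two overlap, the third is isolated
labels = cluster_bursts([[0.0, 0.2], [0.05, 0.25], [0.6, 0.8]],
                        [[0.04, 0.04], [0.04, 0.04], [0.03, 0.03]])
# labels -> [0, 0, 1]
```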

The clustering proceeds in two sub-stages. First, we cluster the bursts coming from the first, largest and most complete, subset of the data—all recorded in the same animal. In the second sub-stage, additional bursts (coming from the additional animals' data sets) are either allowed to join an existing cluster or to form a new one. In cases in which the existing cluster has more than one member, the new candidate burst will join if it overlaps with at least two bursts from the existing cluster. It should be noted that this associative clustering method is robust to even relatively large changes in the centroids of the EMGs making up a cluster. In a previous publication (Krouchev et al., 2006; supplemental information), we tested the stability of the method by introducing random jitter to the centroids of the muscles making up a cluster. In 970/1000 cases in which each centroid was displaced by <0.6 SD, there was no change in cluster composition compared to that obtained using the actual data.

### **CONSTRAINED vs. UNCONSTRAINED CLUSTERS**

In the analysis in our original papers (Krouchev et al., 2006; Drew et al., 2008a) the algorithm was always unconstrained in that muscles were formed into clusters, and therefore synergies, simply on the basis of the rules summarized in the preceding paragraph. A similar approach was used to obtain the synergies active during unobstructed locomotion (**Figure 2**) as well as during the lead and trail conditions of the voluntary gait modifications. However, because of the addition of periods of muscle activity that occurred only during the gait modifications, and because of the relative changes in phase of some muscles, there were small changes in the composition of some of the synergies. This makes a direct comparison of changes in the phase and magnitude of synergies during all three conditions problematic. We, therefore, also used a constrained analysis in which muscle bursts were confined to the same clusters identified during unobstructed locomotion. This allowed us to directly compare the changes in phase and magnitude of synergies comprised of the same muscles in all three conditions. Further details of the approach and of its limitations are provided in the Results and Discussion.
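The constrained assignment could be approximated as follows. This is a simplified sketch, not the authors' procedure: the paper assigns bursts present only during gait modifications by means of the adjacency matrix, whereas here each burst is simply given to the nearest reference cluster centroid in the phase plane:

```python
import numpy as np

def assign_to_reference(P_new, ref_centroids):
    """Constrained assignment: place each new burst into a cluster
    defined during unobstructed locomotion.

    P_new : (m, 2) centroids of bursts recorded during gait modifications
    ref_centroids : (k, 2) cluster centroids from unobstructed locomotion
    Simplification: nearest Euclidean distance in the (onset, offset)
    phase plane replaces the adjacency-matrix rule used in the paper.
    """
    P_new = np.asarray(P_new, float)
    ref = np.asarray(ref_centroids, float)
    # pairwise distances, bursts x reference clusters
    d = np.linalg.norm(P_new[:, None, :] - ref[None, :, :], axis=2)
    return d.argmin(axis=1)

ref = [[0.0, 0.2], [0.5, 0.7]]           # two reference clusters
labs = assign_to_reference([[0.05, 0.18], [0.48, 0.75]], ref)
# labs -> [0, 1]
```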

## *Direct component analysis (DCA)*

For each cluster we derive a temporal activation profile, which is labeled a *direct* component (DC). This contrasts with the muscle activation components in the literature, which are most often obtained through abstract mathematical decomposition.

DCs are derived directly from the corresponding overall **(**onset, offset**)** phase statistics, which describe the periods of activity of the muscles (or bursts) forming the cluster. For each cluster, we calculate the 2 *marginal univariate* Gaussian probability density functions (pdf) for the onset and the offset phases. For each cluster *k*, the onset/offset pdf is, respectively, *N*(*Xk*, σ*Xk*) and *N*(*Yk*, σ*Yk*) and activation of the muscles forming this cluster is assumed to span the phase interval (*Xk* − 3σ*Xk*, *Yk* + 3σ*Yk*).

We further assume that the overall shape of the temporal activation profile for cluster *k* is captured by Gaussian basisfunctions. Hence its DC *uk(t)* is described by:

$$u_k(t) = \begin{cases} \exp\left(-\left(t - Z_k\right)^2 / 2\sigma_1^2\right), & t < Z_k\\ \exp\left(-\left(t - Z_k\right)^2 / 2\sigma_2^2\right), & t \geq Z_k \end{cases}$$

where

$$Z_k = (X_k + Y_k)/2, \quad \sigma_1 = (Z_k - X_k)/3 + \sigma X_k, \quad \sigma_2 = (Y_k - Z_k)/3 + \sigma Y_k$$
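The direct component of a cluster can be sketched numerically. We assume the conventional squared-width Gaussian exponent; the widths σ1 and σ2 follow the definitions above, and the function name is ours:

```python
import numpy as np

def direct_component(t, Xk, Yk, sXk, sYk):
    """Direct component (DC) for one cluster: an asymmetric Gaussian
    centred between the mean onset (Xk) and offset (Yk) phases.

    Assumes the exponent uses sigma squared (the usual Gaussian form);
    the side-specific widths follow the definitions given in the text.
    """
    Zk = (Xk + Yk) / 2.0                 # peak of the activation profile
    s1 = (Zk - Xk) / 3.0 + sXk           # rising-side width
    s2 = (Yk - Zk) / 3.0 + sYk           # falling-side width
    t = np.asarray(t, float)
    s = np.where(t < Zk, s1, s2)         # pick a width per side of the peak
    return np.exp(-(t - Zk) ** 2 / (2.0 * s ** 2))

# profile for a cluster with mean onset 0.0 and mean offset 0.3
t = np.linspace(-0.2, 0.6, 161)
u = direct_component(t, Xk=0.0, Yk=0.3, sXk=0.03, sYk=0.04)
```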

### *Cluster statistics and representation*

In this paper we use a correlated bivariate-normal model. The data scatter for burst *k* is captured by the pdf:

$$f\left(x, y\right) \sim \exp\left(-z/2w\right) \tag{2}$$

where:

$$\mu = (x - X_k)/\sigma X_k, \qquad \nu = (y - Y_k)/\sigma Y_k,$$
$$z = \mu^2 + \nu^2 - 2\,\mu\nu\rho, \qquad w = 1 - \rho^2$$

Hence, each cluster is represented by an ellipse of the form:

$$
\mu^2 + \nu^2 - 2\,\mu\nu\rho = 1 - \rho^2 \tag{3}
$$

The ellipses' skewing and orientation depend on the value of ρ. Krouchev et al. (2006) assumed no (x, y) correlation—i.e., ρ = 0. This yielded ellipses with axes (σ*X*, σ*Y*) parallel to the coordinate axes. For the general case of non-zero ρ, it may be shown that the ellipse described by Equation (3) touches the ±1 SD encompassing rectangle exactly at the points (±1, ±ρ) and (±ρ, ±1) (**Figure 1C**).
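The tangency property stated above can be verified numerically. Using the standard parametrization of the ellipse of Equation (3), the curve stays within the ±1 SD rectangle and meets it where μ = ±1 (there ν = ±ρ) and where ν = ±1 (there μ = ±ρ):

```python
import numpy as np

def sd_ellipse(rho, n=400):
    """Points on the 1 SD ellipse of Equation (3) in normalized
    (mu, nu) coordinates, for a given correlation rho.

    Uses the standard parametrization mu = cos(t),
    nu = rho*cos(t) + sqrt(1 - rho**2)*sin(t), which satisfies
    mu^2 + nu^2 - 2*mu*nu*rho = 1 - rho^2 for every t.
    """
    t = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    mu = np.cos(t)
    nu = rho * np.cos(t) + np.sqrt(1.0 - rho ** 2) * np.sin(t)
    return mu, nu

mu, nu = sd_ellipse(rho=0.6)
# the ellipse stays inside the +/-1 rectangle; at t = 0, mu = 1 and
# nu = rho, i.e. the tangency point (1, rho) of the text
```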

### *Statistics*

To determine whether the phase and/or the magnitude of the synergies were significantly different in each condition we used ANOVAs across the conditions (unobstructed, lead and trail). Individual One-Way ANOVAs were performed for each of the synergies using the data for the individual periods of activity of each muscle within a synergy. Tests for differences in the phase of activity were made using the onset ∼ *N*(*Xk*, σ*Xk*), the offset ∼ *N*(*Yk*, σ*Yk*) and the peak phase (*Xk* + *Yk*)/2 of activity. The null hypothesis for each ANOVA is that condition has no significant effect on the range of phase-values in the sample. For synergies that showed a significant (*p* < 0.05) effect of the locomotion condition, we performed pair-wise *t*-tests between conditions. Individual One-Way ANOVAs were also performed to determine if the magnitude of the synergies also varied as a function of condition (see Results).
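The statistical procedure can be sketched as follows. This computes only the one-way ANOVA F statistic in plain NumPy (the p-value would then be read from the F distribution, e.g. via `scipy.stats`, and pair-wise t-tests would follow only for significant synergies); the data below are synthetic:

```python
import numpy as np

def one_way_anova_F(*groups):
    """F statistic for a one-way ANOVA across locomotion conditions.

    Each group is a 1-D sequence of one phase measure (e.g. onset)
    for the bursts of one synergy in one condition. Returns the ratio
    of between-group to within-group mean squares.
    """
    groups = [np.asarray(g, float) for g in groups]
    n = sum(len(g) for g in groups)          # total observations
    k = len(groups)                          # number of conditions
    grand = np.concatenate(groups).mean()
    ss_between = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# synthetic onset phases: a clear phase shift in the third condition
F = one_way_anova_F([0.0, 0.02, 0.04],     # unobstructed
                    [0.0, 0.02, 0.04],     # lead
                    [0.3, 0.32, 0.34])     # trail
# F -> 225.0
```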

# **RESULTS**

### **DATABASE**

The database for the principal analysis in this paper is based around cat RS26, as in a previous publication (Krouchev et al., 2006) examining synergies during unobstructed locomotion. Supplementary muscles in that previous publication were obtained from 3 other cats, PCM3, MC8 and RS23. Ideally, we would have used the same database in the current manuscript. Unfortunately, however, data for voluntary gait modifications were not available from all animals. We have, therefore, complemented the data from RS26 with data from cats MC29 and RS23 for the current analysis to produce a full dataset during unobstructed locomotion consisting of 27 bursts of EMG activity from 18 muscles of the forelimb, all recorded on the same side. The 10 clusters defining the muscle synergies in this dataset during unobstructed control locomotion are illustrated in **Figure 2** and the average activity of some of these muscles can be observed in **Figure 3** (black lines). Note that the clusters defined in the present analysis are very similar to those published previously (Figure 6 in Krouchev et al., 2006). The major difference is that the current dataset produced 10 clusters of activity rather than the 11 that we defined previously. This was primarily caused by the muscles in cluster #2 forming a single cluster (including EDC, LtD and TrM) rather than being divided into 2 clusters as in our previous analysis. One other difference is the presence of a very early burst of activity in the Bic, preceding activity in all other muscles. Apart from these minor differences with our previously described dataset (Krouchev et al., 2006), the essential features of these synergies are maintained. 
These include: the sequential pattern of activation of the synergies; the sparse nature of each synergy; the fact that a given muscle may be represented in more than one synergy and the fact that a given synergy can include muscles acting proximally together with others acting distally (e.g., EDC and TrM in cluster #2).

# **CHANGES IN PHASE OF SYNERGIES DURING GAIT MODIFICATIONS (LEAD AND TRAIL CONDITIONS)**

During the gait modifications, there were substantial changes in the magnitude and phase of some of the bursts of EMG activity as we have also described elsewhere (Drew, 1993; Lavoie et al., 1995; Drew et al., 2008a). The changes observed in selected muscles of the current dataset are illustrated in **Figure 3** for the lead and trail condition. The EMG data for all muscles are normalized in time to the average cycle duration for the original dataset in which they were recorded, and normalized in amplitude to the largest magnitude recorded from any one given muscle. As such, changes in amplitude and relative changes in phase are clearly visible in the presentation. Several points need to be emphasized. First, in both conditions, there are changes in amplitude and phase in a number of muscles. During the lead condition, these changes are primarily expressed as changes in amplitude (e.g., Br and BrR) with fewer changes in phase, although see phase delay in EDC. In contrast, during the trail condition, there are major changes in both phase and amplitude in a number of muscles. This is particularly clear in the shoulder retractor muscles (LtD and TrM) and in the muscles acting around the wrist and digits (PrT, EDC). Last, one other major difference from the control activity is the presence of periods of activity during the gait modifications (**Figure 3**, arrows) that were not present in the unobstructed condition. The clearest examples are the supplementary periods of activity in the EDC, SpD and the Tri at the onset of swing in the trail condition. There are also small changes in some of the extensor muscles. These occur after the step over the obstacle in the lead condition and before the step over the obstacle in the trail phase.

In general, most of the changes in phase of the muscles active during swing occur during the trail condition, before the onset of swing (defined here as Br onset) while the more modest phase changes in lead occur after swing onset. As discussed previously (Drew, 1993; Lavoie et al., 1995; McFadyen et al., 1999), this difference is related to the constraints of the task. During the lead condition, swing begins with the obstacle well forward of the paw and the major requirement is to lift the limb above and over the advancing obstacle. During the trail condition, swing begins with the obstacle close to the trail paw and the major requirement is to retract the limb sufficiently to ensure that the limb is lifted away and above the obstacle as it continues to advance (see Drew, 1993).

To determine the effects of the changes in the pattern of muscle activity on synergy composition, we performed the cluster

**FIGURE 2 | Synergies during unobstructed locomotion. (A)** The results of the cluster analysis are illustrated for a total of 27 periods of EMG activity recorded from 18 muscles in 3 cats (30–70 values for each muscle burst). 10 clusters, each corresponding to a muscle synergy, are identified by color and the EMG bursts in each cluster are identified by symbol and color as shown in the key to the right. The same order of muscles is used in the key of all figures except when a change in synergy composition makes it impossible. The spatial location of each cluster is illustrated by the ellipses (thick lines). EMG onsets and offsets are referenced to the onset of activity in the brachialis (Br) muscle (phase = 0.0). Negative phase onsets indicate muscle bursts active prior to the onset of activity in the Br. **(B)** The muscles used in this study are illustrated on stick figures of the cat's forelimb taken from Crouch (1969). Clusters are organized according to biomechanical function and the colors of the muscles correspond to the colors of the synergies. Note that the colors accorded to some muscles are different from those used in Krouchev et al. (2006) because of a slightly different composition of the synergies (see text). Muscle abbreviations: AcD, acromiodeltoideus; BrR, brachioradialis; ClT, cleidotrapezius; ECR, extensor carpi radialis; EDC, extensor digitorum communis; ECU, extensor carpi ulnaris; LtD, latissimus dorsi; LvS, levator scapularis; PaL, palmaris longus; PrT, pronator teres; SpD, spinodeltoideus; SSp, supraspinatus; Tri, triceps brachii, long head; TriL, triceps brachii, lateral head; TrM, teres major.

**FIGURE 3 | EMG activity in selected muscles during the lead (left column, red traces) and the trail (right column, green traces) condition.** Average activity during the gait modification is superimposed onto the activity in the unobstructed condition (black traces). Data are synchronized to the onset of the Br and the duration of each trace is normalized to the average cycle duration.

analysis on the datasets for the voluntary gait modifications (lead and trail conditions) using all available bursts of EMG activity including those that were present only during the voluntary gait modifications (**Figure 4**). As explained in the Methods, we used two complementary approaches to examine changes in synergy. In the first instance, we applied the clustering algorithm to the entire dataset in the lead and the trail condition in exactly the same way as we did for the unobstructed data. We refer to this as the unconstrained condition (**Figures 4A,B**). In this approach, we are effectively asking whether the same synergies are found in all three conditions studied, unobstructed, lead and trail. In the second approach, and the one that we used for all further analyses, we constrained the synergy composition during the lead and trail conditions to be identical to that observed in the unobstructed condition (**Figures 4C,D**). In this instance, we are determining how the phase of the synergies identified during unobstructed locomotion needs to be modified in order to obtain the patterns of activity observed during the gait modifications. In this second approach, we used the adjacency matrix to determine to which cluster the bursts activated only during the gait modifications were associated. This constrained approach has the advantage of allowing us to provide a direct comparison of the changes in phase and magnitude that occurred in each synergy during these gait modifications.

**FIGURE 3 (continued) |** Traces marked with an asterisk are taken from cat MC29; those without an asterisk are from cat RS26. Arrows indicate bursts of activity that were not present in the unobstructed condition.

The results obtained by using the unconstrained algorithm show that the muscle bursts observed during the voluntary gait modifications are organized into synergies that are fully consistent with those observed in the unobstructed condition. Specifically, the analysis results in a similar number of synergies as in the unobstructed condition, with the same sequential organization, and with each synergy being comprised of a small number of muscles (**Figures 4A,B**).

**FIGURE 4 | Cluster analysis during the voluntary gait modifications. (A,B)** Clusters formed by applying the analysis to all bursts recorded in each condition and allowing the analysis to define the resulting clusters (unconstrained). **(C,D)** Analysis performed using all bursts but with the clusters constrained to have the identical composition as in the unobstructed condition. In the unconstrained condition, the colors describing some clusters are different from those used in **Figure 2** because of the modification of synergies detailed in the text. In the constrained condition, the colors are identical to those used in **Figure 2**.

However, there were some differences in cluster composition compared to the unobstructed condition, as can be seen by examining the legend identifying the clusters in **Figures 4A,B**. These differences are illustrated in more detail in **Figure 5** where the synergies identified during the swing phase in the unconstrained condition (thin ellipses) are compared to those obtained in the constrained condition (thick ellipses) for the lead (**Figure 5A**) and trail (**Figure 5B**) conditions (see also **Table 1**).

In the lead condition, there were only minor changes between constrained and unconstrained synergies, as can be seen from the overlap of the ellipses representing the synergies obtained in the two conditions (**Figure 5A**) and by inspection of **Table 1A**. Indeed, 7 of the 10 clusters showed no changes at all in the two conditions (**Table 1A**). Among those clusters that were modified, the largest change was the division of cluster #2 (as defined in the unobstructed condition) into 2 clusters in the unconstrained condition. This was because of a slight phase advance of the EDC(1) burst with respect to the other three muscles in the cluster. Note, however, that even for this relatively large modification in phase, the change in centroid position of the EDC (0.059) was substantially less than the distance between the centroids of clusters #1 and #2 (0.13) and between clusters #2 and #3 (0.19) in the constrained condition. In addition, the Br was included in a cluster with the PrT and the SpD instead of being separate as in the unobstructed condition. This cluster in the unconstrained condition was displaced by only 0.008 from its location in the constrained condition (**Table 1A**). Despite these small changes in the unconstrained condition, it is evident that the basic elements of the synergies identified in the unobstructed condition are equally visible in the step over the obstacle.

**FIGURE 5 | Comparison of the constrained and unconstrained clusters during the swing phase.** Ellipses from the constrained condition are illustrated as thick lines and those from the unconstrained condition as thin lines. Each symbol represents the centroid of a burst of activity in a given muscle, as indicated by the key for the constrained condition (to the left of each illustration). It should be emphasized that the location of the centroid of each muscle is identical for the constrained and unconstrained clusters. Cluster #1 in the trail condition is not displayed because of the scale used. Note that the colors identifying each muscle and each ellipse are the same for the constrained and unconstrained conditions and identical to those used in **Figure 2**. As a result, muscles with the same colors sometimes belong to different clusters in the unconstrained condition (see key).

**Table 1 | Phase differences between the centroids of constrained and unconstrained clusters.** *The tables show the differences in the phases of the centroids of the constrained and unconstrained clusters as illustrated in **Figures 4, 5**.*

Similar qualitative changes were seen in the trail condition, as can be seen in **Figure 5B**. Again, the muscles comprising cluster #2 in the unobstructed condition were divided, this time into 2 groups of 2. Both clusters, however, remained well separated from the next cluster in the sequence. For example, the differences of the two unconstrained clusters (#2 and #3) from constrained cluster #2 were 0.057 and 0.046, respectively (**Table 1B**), while the distance between constrained clusters #2 and #3 was 0.17, three times as large (the distance between cluster #1 and cluster #2 was 0.29). There were also changes in the other clusters, but in all cases these involved the transfer of one muscle from one cluster into an adjacent cluster without change in the overall sequential nature of the activation of both individual muscles and clusters. In other words, in both the unobstructed condition and during the step over the obstacle, activation of, e.g., LvS follows activation of ClB and ClT, which follow activation of Br, which follows activation of PrT and SpD, and so on.

Overall, the underlying principle of sequential activation of sparse synergies is well supported by the analysis (see Discussion). In particular, the analysis serves to illustrate that the synergies observed during the voluntary gait modifications are fully consistent with those observed during the unobstructed condition. More specifically, the results show that the changes in the pattern of EMG activity that produce the modified limb trajectories required to step over the obstacles can be generated by changing the phase of activity of the synergies that underlie the EMG pattern during unobstructed locomotion.

To directly compare the phases of activity of the synergies in the lead and trail conditions with those identified during the unobstructed condition, we transformed the clusters obtained by using the constrained analysis into DCs, centered on the centroids of each cluster (**Figures 6A–C**) as described in the Methods (see also Krouchev et al., 2006; Drew et al., 2008a). These DCs, for each of the three conditions, are shown superimposed in **Figure 6D**, which allows the changes in phase of activation of each group of synergies to be synthesized in an economical manner. Changes in amplitude are ignored in this display but are addressed below. During the lead condition (red traces), the changes in phase are relatively minor apart from the phase delay in clusters 7 and 8, which are responsible for the wrist dorsiflexion and the preparation of the limb for landing. There is an increase in the duration of clusters #5 and #6, which include the Bic and the ClB (see **Figure 2**), and which are responsible for transporting the limb over the obstacle. In the trail condition, the phase advances observed in the activity of individual muscle bursts (**Figure 3**) have a substantial effect on the phase of activity of a number of different synergies, particularly those active prior to the onset of swing (i.e., clusters #1–3). Note that treating cluster #2 as two separate clusters, as suggested by **Figures 4A,B**, **5**, does not change the basic conclusion of a phase advance of muscle activity in this phase of the step cycle.

A One-Way ANOVA for phase of onset, phase of offset and peak phase (three different tests) as a function of locomotion condition showed significant differences for all combinations except one (phase of offset for synergy #9). Pair-wise comparisons of the

three conditions likewise showed significant changes of at least one of the measured variables (onset, offset, peak) for all three comparisons (control-lead, control-trail, and lead-trail). In the control-lead comparisons, phase of onset was unchanged for four

**FIGURE 6 | Direct components (DCs) calculated for the constrained clusters in the lead (A), trail (B) and unobstructed (C) conditions. (D)** Direct components showing the changes in phase of the synergies in the lead (red traces) and trail (green traces) conditions.

synergies while phase of offset was changed for all 10 synergies, reflecting the fact that most changes occur after swing onset in this condition. In the control-trail comparison, phase of offset was unchanged in three synergies, as suggested by the analysis in the unconstrained condition (**Figures 4**, **5**). Overall, these results emphasize that gait modifications require changes in the phase of activation of all synergies.

# **CHANGES IN AMPLITUDE OF SYNERGIES DURING GAIT MODIFICATIONS (LEAD AND TRAIL CONDITIONS)**

We measured the magnitude of the EMG activity of each burst of activity used in the analysis for the unobstructed, lead and trail conditions by integrating each 1 ms bin of activity between the measured onset and offset of the period of activity. These values were then divided by the duration of the burst (in ms) to give a value that represents the level of activity of the muscle burst in each condition, independent of any changes in duration of the muscle activity.
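The magnitude measure described above amounts to the mean rectified activity over the burst. A minimal sketch (hypothetical helper name, synthetic data):

```python
import numpy as np

def burst_magnitude(emg, onset_ms, offset_ms):
    """Mean level of activity of one EMG burst, independent of duration.

    emg : rectified EMG sampled in 1 ms bins (as digitized at 1 kHz)
    onset_ms, offset_ms : measured burst boundaries, in ms
    Integrates (sums) the activity in each 1 ms bin between onset and
    offset, then divides by the burst duration, as described above.
    """
    burst = np.asarray(emg, float)[onset_ms:offset_ms]
    return burst.sum() / (offset_ms - onset_ms)

emg = np.zeros(100)
emg[20:60] = 2.0                      # a 40 ms burst of constant level 2
m = burst_magnitude(emg, 20, 60)      # -> 2.0, independent of the 40 ms duration
```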

The results from this analysis are shown in **Figure 7A**, with each measured EMG burst classified according to the synergy to which it is assigned. As expected from the data illustrated in **Figure 3**, muscles such as the Br, ClB and ECR(2) show a large increase in activity (up to 200%) during the lead condition while others, such as the PrT(1) and the TrM(1), show their largest changes (up to 400%) during the trail condition. In most cases, these changes were significantly different (asterisks) from control levels in both the lead and trail conditions. Furthermore, in many cases, muscle amplitudes were significantly different between the lead and trail conditions. Note that changes in the magnitude of activity cannot be displayed for muscle bursts that were inactive in the unobstructed situation.

In general, changes in the magnitude of muscle bursts (relative to unobstructed locomotion) for clusters comprised of several muscles (e.g., clusters 2, 5, 8, and 10) were of the same sign and of similar magnitude. For example, in cluster #2 all of the muscles in the cluster showed a significant increase in activity during the trail condition. In clusters #3 and #5, each muscle shows a significant increase in the lead condition while in synergy #10, the magnitude of muscles in the lead condition is either unchanged or shows only a small increase. Only in cluster #8 does one muscle show a significant decrease in activity during the lead condition while the other five muscles either show an increase or are unchanged. A One-Way ANOVA for the changes in amplitude as a function of condition (control, lead, and trail) showed significant changes for all 10 synergies. Comparison between the pairs of conditions showed significant changes in at least seven synergies for all three pairwise comparisons.

These results indicate that the voluntary gait modifications require changes in the magnitude of muscle activity in most synergies and that these changes in magnitude differ between the lead and trail conditions.

To illustrate how control signals would need to change in both magnitude and phase in order to produce gait modifications, we used the average value of the EMG bursts in each cluster to scale the amplitude of the DCs illustrated in **Figure 6**. The results of this procedure, illustrated in **Figure 7B**, show that the gait modifications require coordinated changes in both the amplitude and the phase of the synergies. In the lead condition, the changes in phase or amplitude of the first two synergies (mainly responsible for paw lift) were small. There were then changes, primarily in amplitude, in synergies 3–6, which are responsible for flexing the limb above the obstacle and transporting it forwards. Subsequently, in synergies 7 and 8, responsible for wrist dorsiflexion and then placement of the paw on the substrate, there are increases in amplitude and a phase delay of synergy 7. Minimal changes are observed in the subsequent period of extensor muscle activity. In contrast, in the trail condition, there are phase advances in the first three clusters together with an increase in magnitude in clusters #2 and #3. There are minimal changes in phase or magnitude in clusters 4–6, although the slight decrease in the duration of cluster #5 leads to a phase advance of the activity in clusters 7 and 8.
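The scale-and-shift operation applied to a direct component can be sketched on a toy waveform (all numbers are hypothetical, and the real DCs come from the cluster analysis, not from a Gaussian):

```python
import numpy as np

phase = np.linspace(0.0, 1.0, 100, endpoint=False)  # fraction of the step cycle

def direct_component(center, width):
    """Toy DC: a Gaussian burst of activity centred at a given phase."""
    return np.exp(-0.5 * ((phase - center) / width) ** 2)

dc_control = direct_component(0.30, 0.05)  # DC during unobstructed locomotion

# Lead condition (hypothetical numbers): scale the amplitude by the mean
# change in burst magnitude of the cluster and delay the burst in phase.
gain, shift_bins = 1.8, 10                 # +80% amplitude, 0.10 phase delay
dc_lead = gain * np.roll(dc_control, shift_bins)

print(np.argmax(dc_control), np.argmax(dc_lead))  # 30 40
```

The two operations are independent: the gain changes only the level of the control signal, while the circular shift changes only when in the cycle it acts.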

# **EFFECT OF OBSTACLE DIMENSIONS**

Changing the shape and size of an obstacle changes the limb trajectory as the cat steps over it (Drew, 1988). This implies that the relative magnitude, duration and timing of bursts of EMG activity in each synergy have to be precisely modulated to produce the appropriate limb trajectory. This is illustrated in **Figure 8** with a different cat (MC8) for which data were available from a more limited number of muscles but for a variety of obstacles (**Figure 8E**).

**Figure 8A** illustrates EMG activity from representative muscles during control locomotion (black trace) and during steps over a thin, high obstacle (**Figure 8Eiii**) in the lead (red trace) and trail (green trace) conditions. As in the example illustrated in **Figure 3**, there were characteristic changes in activity, with a phase advance of the TrM burst prior to swing onset in the trail condition, together with an increase in the EMG amplitude of the ClB and a phase delay of the EDC burst in the lead condition. Performing the cluster analysis on the EMG activity during unobstructed locomotion (**Figure 8B**) produced the same sequential pattern of activity that was obtained with our full database, despite the smaller number of muscle bursts available in this cat. However, because of this smaller number of EMG bursts, we identified only seven synergies.

The changes in phase and amplitude of the DCs constructed from these clusters are illustrated in **Figure 8C** for the lead condition and in **Figure 8D** for the trail condition. For all of the obstacles for which data were analysed, these changes are similar in form to those obtained with our main database and synthesized in **Figure 7B**. ANOVAs showed a significant effect of both condition and obstacle on the phase and magnitude of many of the synergies. In the lead condition, the effect of obstacle was significant for synergies #3, 6, and 7. In the trail condition, there was an effect of obstacle on all seven synergies.

In general, the changes in phase and magnitude were smallest for the small high obstacle (green) and largest for the very high (mauve) and round (orange) obstacles. In cluster #3, in the lead condition, for example, there is a large increase in magnitude for the very high obstacle, presumably because of the need for shoulder retraction to raise the limb above the obstacle. In clusters #4 and 5, there are large changes in amplitude and duration for the three largest obstacles and relatively smaller ones for the small high obstacle. In cluster #6 there are large changes for the very high and round obstacles, which require the paw to be raised above the obstacles, and correspondingly large phase delays in cluster #7, especially for the wide obstacle. In the trail condition, changes in the amplitude of cluster #3 are particularly clear for the very high obstacle and, to a lesser extent, for the small high and the round obstacles. The changes in amplitude are least for the wide obstacle (which was also the least high). These changes are needed to retract the limb above the obstacle. The changes in duration are minimal except for the ECR (cluster #6). The latter is presumably because of the greater excursion required to bring the wrist into dorsiflexion in the trail condition (Lavoie et al., 1995).

### **FIGURE 7 | Change in magnitude of the synergies. (A)** the mean magnitude (+SD) of the activity in each muscle during lead (red bars) and trail (green bars) is expressed as a percentage of the activity in the unobstructed condition (100%, thin horizontal line). The magnitude is normalized to the duration of the burst (see text). Asterisks beside the bars indicate muscle bursts that were significantly different from control (*p* < 0.05). Horizontal lines with an asterisk indicate muscle bursts that were significantly different between the lead and trail conditions. **(B)** the average change in the level of activity of each of the synergies is used to modify the magnitude of the direct components, which now provide a representation of both the change in phase and the change in magnitude of the synergies that is required to step over the obstacles in the lead and trail conditions.

### **FIGURE 8 |** …**of different size and shape. (A)** averaged activity of 5 representative muscles during the lead (red), trail (green) and unobstructed (black) conditions. Data are synchronized to the onset of activity in the ClB. **(B)** Cluster analysis for the database of 14 muscle bursts available from this cat. **(C,D)** direct components of each synergy when the cat steps over obstacles of different sizes and shapes in the lead **(C)** and trail **(D)** conditions. The insets amplify the traces representing synergy #6 in the lead condition and synergy #3 in the trail condition. The color code for the obstacles is represented by the color of the cats in **(E)**.

# **DISCUSSION**

We have argued in previous publications (Krouchev et al., 2006; Drew et al., 2008a) that small modifications in the activity of the sparse synergies defined by our analysis could provide a flexible manner of producing modifications in limb trajectory during walking. The results obtained in the current study support this premise by clearly demonstrating that small changes in the phase and magnitude of the synergies identified during unobstructed locomotion can produce the changes in limb trajectory required to step over obstacles of different sizes and shapes, both in the lead and the trail condition.

# **SYNERGY DEFINITION AND COMPOSITION**

Our definition of a synergy is a group of muscles that become active simultaneously and remain active for the same period of time. One consequence of this definition is that our synergies include only a limited number of muscles (hence, sparse synergies) compared with those defined by mathematical decomposition, which generally contain all of the muscles in the dataset (see below). Indeed, some of our synergies contain only a few, or in some cases only one, muscle (e.g., cluster #7 in **Figure 2**). As we have argued previously (Krouchev et al., 2006), this is simply the result of recording a limited number of muscles, and one would expect other wrist and digit extensors to be included in this synergy, if recorded. A corollary of our definition of a synergy is that we define a relatively large number of synergies (10–11) compared to the 4–6 that are normally identified by the more commonly used decomposition methods (d'Avella et al., 2003; Tresch and Jarc, 2009). It is, however, interesting to note that synergies with compositions similar to those obtained using our methods can be obtained if the number of synergies defined by the linear decomposition methods is increased to 10 or 11 (Krouchev et al., 2006). Similarly, synergies comprised of a limited number of muscles can be obtained by other methods by including only those muscles whose weights differ significantly from lower-amplitude weights (Hart and Giszter, 2004).
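The grouping rule, active at the same time and for the same period, can be sketched as a simple clustering on burst onset and offset phases. This is a toy illustration only, not the authors' actual cluster analysis; the tolerance, muscle names, and phase values are all hypothetical:

```python
# Toy sketch: group muscle bursts whose onset AND offset phases (fractions
# of the step cycle) both lie within a tolerance of an existing cluster,
# so that a burst can belong to at most one (sparse) synergy.
TOL = 0.05  # hypothetical phase tolerance

bursts = {            # hypothetical (onset, offset) phases per burst
    "TrM(1)": (0.00, 0.12),
    "LtD":    (0.01, 0.13),
    "Br":     (0.20, 0.40),
    "ClB":    (0.21, 0.41),
    "ECR":    (0.45, 0.60),
}

clusters = []  # each cluster: [onset, offset of its first burst, [names]]
for name, (on, off) in sorted(bursts.items(), key=lambda kv: kv[1]):
    for c in clusters:
        if abs(c[0] - on) < TOL and abs(c[1] - off) < TOL:
            c[2].append(name)   # same onset and same duration: same synergy
            break
    else:
        clusters.append([on, off, [name]])  # start a new sparse synergy

for on, off, names in clusters:
    print(f"{on:.2f}-{off:.2f}: {names}")
```

With these toy values the rule yields three sparse synergies activated sequentially through the cycle, each containing only the bursts that share both onset and offset.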

In this study we have taken the approach of defining our synergies on the basis of the muscle activity patterns generated during a single behavior, unobstructed locomotion, and then defining how these synergies must be modified to produce other behaviors. This differs from the more common approach in which synergies are defined on the basis of all behaviors under study (see e.g., d'Avella and Bizzi, 2005). In other words, most studies ask what parameters (weighting and phase), applied to all muscles in the dataset, will result in all behaviors under study. In contrast, the question that we are asking is: how does one need to modify the magnitude and phase of the synergies obtained during unobstructed locomotion in order to obtain the patterns observed during voluntary gait modifications?

One of the results of defining synergies on the basis of a single behavior is the possibility that, in addition to magnitude and phase, synergy composition may also be modified in other behaviors (see e.g., Ivanenko et al., 2005; Kargo and Giszter, 2000; see below). Indeed, we observed some changes in synergy composition in the unconstrained condition of our analysis (**Figures 4A,B**, **5**). This raises the question of whether the voluntary gait modifications are really the result of modifying the synergies that define unobstructed locomotion or whether there is a real need to modify synergy composition, as suggested by **Figures 4A,B**. In some cases, such as the division of cluster #2 in the unobstructed condition into two clusters (#3 and #4) in the voluntary gait modifications, there is reason to think that this might indeed reflect a true need to modify the synergies. In most other cases, however, such as the merging of clusters 4 and 5, it is probable that this reflects the variability in the measurements of the onset and offset of EMG activity, together with the modifications of the adjacency table caused by the addition of bursts of activity not present in the unobstructed condition. Certainly, in the absence of additional independent methods to define synergies, this must remain an open question. However, what is clear is that the basic sequential activation of the muscle bursts, and of the resultant synergies, observed in the unobstructed condition is maintained in the voluntary gait modifications. For example, all four of the muscles comprising cluster #2 in the unobstructed condition are activated subsequent to the activation of the Bic(1) and prior to the activation of the PrT and SpD. Moreover, all four of these muscles show similar changes in magnitude during both the lead and the trail conditions (**Figure 7A**). Lastly, as illustrated in **Figure 6D**, treating these muscles as two clusters does not alter the basic fact that the muscles active at this time are phase advanced compared to the unobstructed condition. Similar considerations hold for the other synergies. For example, whether the Br forms part of a synergy with the PrT (**Figure 5A**) or is in a separate synergy (**Figures 2**, **5B**) does not change the fact that activity in both of these muscles occurs after the period of activity in the EDC and the TrM and before the period of activity in the ECR. Thus, the basic concept of a sequential activation of a (relatively) large number of sparse synergies is fully supported by the data, while the exact composition of each synergy must remain open to further investigation. Nonetheless, by maintaining the same number of synergies and the same composition, this approach has the advantage of allowing us to directly address some of the properties of the descending signals that would be required to modify gait by modifying the phase and magnitude of muscle synergies.

# **CORTICAL CONTROL OF SYNERGIES**

Our results demonstrate the existence of a relatively large number of synergies, each of which is comprised of a small number of muscles (sparse synergies) and each of which is active during only a small part of the step cycle. These synergies are activated sequentially during the step cycle and show well defined modifications during the gait modifications.

In the lead condition, the major change is the increased duration and the phase delay of the synergies active at or after swing onset. These changes lead to the increased swing duration and the delayed onset of the activity of the muscles related to paw placement. There is also an increase in the magnitude of the synergy that leads to increased elevation over the obstacles. These changes in phase and magnitude are scaled to the size and the shape of the obstacle. Wider obstacles lead to progressively increased durations and longer phase delays. Higher obstacles lead to smaller changes in duration and smaller phase delays but increased amplitude in the synergies related to limb flexion. In contrast, during the trail condition, the major changes are the phase advance and increased amplitude of the synergies active before swing onset, particularly those active at paw lift and responsible for lifting the paw above the obstacle. The data therefore illustrate how differentially modifying the activity of different synergies can lead to adaptive changes in limb trajectory that allow for the avoidance of obstacles covering a relatively wide range of sizes and shapes. There seems little doubt that the phase and magnitude of these synergies could be modified to produce an infinite range of limb trajectories, adapted to any specific obstacle encountered during locomotion.

The most parsimonious expectation is that the activity of these synergies would be modified by control signals with similar characteristics. This is exactly what is seen in recordings from the cat motor cortex during locomotion (Drew, 1993; Drew et al., 1996, 2008a). During voluntary gait modifications, a large population of motor cortical cells (86% in Drew, 1993) show a modification in activity (increase or decrease) consisting of changes in magnitude, duration and/or the relative timing (phase of onset) of the discharge (Drew, 1993; Drew et al., 1996). Sub-populations of motor cortical cells recorded during locomotion, and particularly during gait modifications, are each activated during only a small part of the step cycle, in a sequential pattern (Lavoie and Drew, 2002; Drew et al., 2008b). Examples of these activity patterns during the lead condition, taken from the database used in a previous publication (Drew, 1993), are summarized in **Figure 9A**, which illustrates both the discrete, phasic nature of the activity patterns observed in different pyramidal tract neurones (PTNs) in the motor cortex and the sequential nature of this activation pattern. PTN1, for example, is active as the paw is lifted from the support surface at the onset of the swing phase. This discharge occurs at the same time as the activity observed in muscles such as the TrM and the LtD. Subsequently, other PTNs become active coincident with the elbow flexion, characterized by activity in the Br (PTN2); then with the wrist dorsiflexion, characterized by activity in the ECR (PTN3); and finally during paw placement, characterized by activity in muscles such as the EDC, TrM and LtD (PTN4).

Particularly striking in our database (Drew, 1993) were cells whose discharge activity covaried with the first period of activity in the TrM (e.g., PTN1 in **Figure 9A**). As illustrated in **Figure 9B**, these cells generally show a phase advance, together with a major increase in the magnitude of their discharge, in the trail condition, in the same manner as the activity in synergy #3 (**Figures 6**, **7**), which contains the TrM. Importantly, the discharge activity of these cells is scaled for different obstacles in the same way as the muscle synergy.

Overall, the data from our previous recording studies suggest that there are multiple sub-populations of PTNs, each of which regulates the activity of a specific sparse synergy. We suggest that there should be as many sub-populations of cortical neurones as there are synergies. However, we emphasize that some of the neurones that we recorded did not correlate with the activity of any of the muscles recorded. Such neurones might be related to controlling higher-level aspects of the movement or may even relate to more general kinematic variables (see Morrow et al., 2007).

The changes in phase and magnitude of the discharge frequency of the overall population of task-related PTNs suffice to appropriately modify the activity of those muscles in a synergy that are active in all three conditions (lead, control, and trail). A question arises, however, as to how muscles active only during the voluntary gait modifications are controlled. This is a particularly important consideration for those muscles that are strongly activated only in the trail condition, e.g., the Tri(1) burst. There are several possible ways in which this burst could be controlled. First, it is possible that cells that contribute to the production of activity in the extensor muscles during stance would discharge an additional burst of activity in the swing phase during the trail condition and thereby contribute to activity both during swing and stance. However, an examination of the database of cells used in previous publications (Drew, 1993; Drew et al., 1996) showed little evidence for this pattern of activity. A second possibility is that there is a separate population of cells that discharge only, or primarily, in the trail condition and are responsible for the production of these additional bursts of activity. Again, there was little evidence for this in our population. A third possibility is that the discharge activity in the population of cells that influence activity in, e.g., the TrM(1) is sub-threshold for producing activity in, e.g., the Tri(1) during the unobstructed and lead conditions. The large increase in the activity of these cells during trail, however, would be sufficient to produce a (non-linear) increase in the excitation of this muscle. Such a non-linear effect might be facilitated by a change in the afferent input produced by the modified movement.
It is interesting to note, for example, that low intensity stimulation of peripheral afferents during the swing phase of locomotion evokes strong responses in the Tri, despite the lack of any natural activity in the motoneurones at this time (Drew and Rossignol, 1987).

In addition to issues of the characteristics of the control signal, one also has to consider the anatomical bases for the proposed control system. If a given synergy is controlled as a unit by activity in a small homogeneous subpopulation of PTNs, then one would expect one or both of the following conditions to be met: (1) given that there are no monosynaptic connections from the motor cortex to motoneurones in the cat (Illert et al., 1976), PTNs should project to interneurons in the spinal cord that selectively activate the muscles identified as belonging to a single synergy (**Figure 10A**) and/or (2) cells in the motor cortex with more restricted projections to a subset of the muscles identified as belonging to a single synergy are connected by intra-cortical connections (**Figures 10B,C**). General evidence for both of these propositions exists. The anatomical studies of Alstermark (Tantisira et al., 1996) and the electrophysiological studies of Lundberg et al. (1987) emphasize that interneurons within the spinal cord branch to activate groups of proximal and distal muscles. In addition, Shinoda has very elegantly shown that individual corticospinal axons branch to multiple levels of the cervical spinal cord (Shinoda et al., 1976, 1986; Futami et al., 1979). Together, this provides two mechanisms by which small populations of cortical neurones could activate muscle synergies.

Interestingly, Hart and Giszter (2010) have used spike triggered averaging (STA) to demonstrate that the muscles activated by a given interneurone in the frog spinal cord correlate strongly with the weighting matrix of individual motor primitives, or synergies. Furthermore, in the primate, STA demonstrates that individual corticomotoneuronal neurones in the motor cortex can influence the activity of multiple muscles, including those acting at proximal and distal joints (McKiernan et al., 1998; Griffin et al., 2008).

### **FIGURE 9 | (A)** PTNs are activated sequentially (filled histograms) as the cat steps over the obstacle (figure adapted from Drew et al., 2008b). **(B)** Activity of a PTN, illustrating the magnitude and phase of the PTN and the TrM. The data for these illustrations are taken from a previous publication (Drew, 1993).

Similarly, microstimulation of the motor cortex, particularly during locomotion (Armstrong and Drew, 1985), simultaneously activates both proximal and distal muscles. At the cortical level, there are abundant references showing the richness of the corticocortical connections between both adjacent and more distant regions of cortex (Huntley and Jones, 1991; Schneider et al., 2002; Capaday et al., 2009, 2011; Smith and Fetz, 2009). In addition, the motor cortex receives abundant input from the premotor cortex and from the posterior parietal cortex (Ghosh, 1997; Andujar and Drew, 2007), which might also serve to coordinate activity between different subpopulations of PTNs.

Missing from the experimental data in the mammal, however, is information as to whether the muscles linked anatomically or functionally in the studies mentioned above correspond to the synergies identified by our, or other, studies. In addition, there is no evidence of a direct link between the discharge pattern of an individual cell and the muscles of the synergy that is activated or influenced by its discharge.

# **COMPARISON WITH OTHER MODELS OF MODULARITY**

Our model shares the same basic concept proposed by Bizzi (see Introduction for references), namely that a complex series of behaviors can be produced by the differential combination of a finite number of synergies. This basic concept has been tested and confirmed in a large number of behaviors, as listed in the Introduction. However, the majority of these models, whether based on synchronous or time-varying synergies (d'Avella et al., 2003), propose a small number of synergies, typically 4–6 and sometimes fewer, to control a wide range of complex movements. Moreover, the control signals, represented by the waveforms resulting from the mathematical decomposition, especially those obtained using principal component analysis (PCA), are generally quite broad and occupy a large proportion of the step cycle (e.g., Lacquaniti et al., 2012) or the movement phase (e.g., Overduin et al., 2008). These control signals activate muscles according to a weighting matrix. As such, all muscles are activated, to varying degrees, by all control signals, and the movement produced is the summed result of the muscle activation, although generally a small number of muscles has relatively large weights in each synergy (see e.g., Kargo and Nitz, 2003; Hart and Giszter, 2004). Moreover, the calculation of the synergies is generally performed by including as wide a range of movements as possible to ensure that the resultant synergies are capable of reproducing a large behavioral repertoire. It is generally considered that these signals are then transformed at the level of the spinal cord to produce the appropriate periods of muscle activity (see e.g., Ting, 2007 for a schema for postural control). Some support for this view of motor control comes from recent experiments by Overduin et al. (2012), who showed that the EMG responses evoked by long trains of stimulation (150–500 ms) at distributed points in the primate motor cortex could be reduced to three synergies.
Moreover, the synergies evoked by microstimulation corresponded to the synergies defined by EMG activity during reach and grasp movements.
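The decomposition approach described above, EMG approximated as a weighting matrix times broad time-varying control signals, is commonly implemented with non-negative matrix factorization. A minimal sketch using the classic multiplicative-update rules on synthetic data follows (dimensions and values are hypothetical, and this is a generic NMF, not the specific algorithms used in the cited studies):

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic rectified EMG: 8 muscles x 200 time bins, generated from
# 3 underlying synergies (columns of W_true) and 3 control waveforms.
W_true = rng.random((8, 3))
H_true = np.abs(np.sin(np.linspace(0, 3 * np.pi, 600))).reshape(3, 200)
V = W_true @ H_true + 0.01 * rng.random((8, 200))

# Non-negative matrix factorization, V ~= W @ H, via the multiplicative
# update rules of Lee & Seung: columns of W are synergy weightings over
# muscles, and rows of H are the broad control signals that activate
# every muscle to a varying degree.
k, eps = 3, 1e-9
W = rng.random((8, k))
H = rng.random((k, 200))
for _ in range(500):
    H *= (W.T @ V) / (W.T @ W @ H + eps)
    W *= (V @ H.T) / (W @ H @ H.T + eps)

rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative reconstruction error: {rel_err:.3f}")
```

The contrast with the sparse-synergy approach is visible in the factors themselves: every muscle receives some weight from every control waveform, whereas the clustering method assigns each burst to exactly one synergy.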

In contrast, our approach and the resulting conceptual model differ in two important respects from the more commonly used methods. The first is that we use a more classical definition of a synergy (see above) and place no limit on the number of synergies that can be defined. During locomotion, this results in the identification of 10 or 11 synergies rather than the 4–6 more commonly reported. Moreover, the large number of synergies, and the fact that a single muscle burst cannot be included in more than one synergy, results in the sparse synergies defined in this manuscript and in Krouchev et al. (2006). These synergies occupy only a small part of the step cycle and are compatible with a control mechanism by which each synergy is modulated by a specific sub-population of cortical neurones, as defined in the previous section. The second difference is that we define our synergies during only a single behavior, locomotion. This is based on the premise that the neural circuits in the spinal cord primarily evolved to be optimal for producing the basic pattern of locomotion. As such, we propose that the motor cortex (and other descending pathways) act via these circuits, adapting their activity to the behavior as required. In some cases, as demonstrated here for voluntary gait modifications, modifying the phase and magnitude of these synergies is sufficient to produce changes in behavior. In other behaviors, however, the challenge might well be to modify these synergies by changing both their number and their composition (see also Kargo and Giszter, 2000, 2008). This contrasts with the idea of a limited number of immutable synergies that can be used to produce a wide range of behaviors (d'Avella and Bizzi, 2005; Overduin et al., 2008).

Our results can also be compared with studies, in other tasks, that have specifically examined the contribution of motor cortical discharge to the control of synergies. Kargo and Nitz (2003), for example, also showed motor cortical cells that were sequentially activated in rats trained to reach. They used independent component analysis (ICA) to identify a number of synergies active throughout a reach. Cross-correlation analyses showed that a majority of task-related motor cortical cells were significantly correlated with only a single independent component (IC). When cross-correlated with the activity of individual muscles, neurones were significantly correlated with many, although not all, of the muscles included in a synergy. Miller and colleagues (Holdefer and Miller, 2002; Morrow et al., 2009) also suggested the existence of a number of muscle synergies during a reaching task, with each synergy controlled by a separate population of motor cortical neurones. These synergies were relatively stable across tasks (Morrow et al., 2009). In our own work (Yakovenko et al., 2011), using methods similar to those in this manuscript, we have also shown sparse synergies during reaching movements that are similar to those observed during locomotion, and we have suggested that PTNs in the motor cortex control the limb trajectory during reaching in the same manner that we propose for voluntary gait modifications.

The model that we propose is clearly most similar to the original modular concepts proposed by Grillner (1981). As illustrated in Grillner and Wallen (1985), we propose that descending pathways act via modules in the spinal cord to modify locomotion. We further propose that the detailed motor commands observed in sub-populations of motor cortical neurones (Drew, 1993; Drew et al., 1996, 2004, 2008b) provide a means for exerting specific control over muscle activity while ensuring that the resulting gait modifications are fully integrated into the base locomotor rhythm (Drew, 1991a; **Figure 10D**). However, it should also be noted that in the majority of these experiments, again including our own, the synergies that are identified are multi-articular and not confined simply to a single joint as in the original model of Grillner (1981).

It could be argued that the synergies that we are defining in the intact cat emerge from the concerted activity of a unit CPG of the type proposed by Grillner, moulded by descending and peripheral afferents to produce our more fractionated, multi-joint modules. Indeed, a recent study by Markin et al. (2012), using methods similar to those described here, found some differences in the synergies identified in the hindlimb of intact cats and those identified during fictive locomotion in decerebrate cats. These differences were primarily observed in muscles acting around two joints, suggesting that the expression of the final pattern of activity in such muscles is modified by peripheral input. In this respect, it is important to realize that the patterns of activity observed in spinal animals are strongly modulated by peripheral input (Pearson and Rossignol, 1991; Rossignol et al., 1993; Lemay and Grill, 2004; Saltiel and Rossignol, 2004a,b; Cheung et al., 2005).

### **FIGURE 10 | (A)** a single subpopulation of PTNs connects with spinal interneurons (ins) that project to the motoneurones (mns) of all muscles in a synergy. **(B)** Two subpopulations of PTNs, linked by intracortical connections and discharging simultaneously during locomotion, connect with different populations of spinal interneurons, each of which innervates only part of the total muscle synergy. **(C)** Same principle as **(B)** but providing the possibility of more fractionation. Dotted lines in **(A–C)** indicate weaker connections to interneurons and motoneurones that are not part of the synergy directly modified by the cortical subpopulation. **(D)** Interaction of sub-populations of PTNs with spinal modules comprising the CPG. Each module will activate the motoneurones of a given synergy. Note that in this illustration we have separated the rhythm generator from the pattern generator elements, as in the model of Rybak et al. (2006) and as supported by the results of our microstimulation studies (Rho et al., 1999). **(D)** is modified from Drew (1991a); Drew et al. (2008b). Abbreviations: E, extensor part of rhythm generator; F, flexor part of rhythm generator; in, interneurone; mn, motoneurone; PTN, pyramidal tract neurone; Sn, synergy #n.

# **SYNERGIES AS A UNIT OF CONTROL OF GENERAL MOTOR ACTIVITY**

Even in cats, a large number of forelimb movements do not involve the simple sagittal pattern of intralimb coordination that we have so far examined; in primates, and especially in humans, the patterns of limb movement become ever more flexible. This is especially true when we consider grasping movements, which involve both arm movement and control of the hand. Indeed, the extent to which the control of hand movements may be explained by synergies is quite controversial (Brochier et al., 2004; Theverapperuma et al., 2006; Kutch et al., 2008; Overduin et al., 2008). How, then, does the concept of a control of movement by synergies, at least those of the type that we propose, lend itself to a flexible control of limb movement throughout a wide behavioral range?

The answer to this question lies to some extent in whether one considers muscle synergies as a concept that simplifies motor control, as is generally assumed (see Tresch and Jarc, 2009), or as the result of an evolution in which more recently developed descending pathways must act through spinal circuits that have been conserved (see Krouchev et al., 2006; Giszter et al., 2007; Giszter and Hart, 2013). As mentioned above, we favor the second possibility. We suggest that, with the parallel evolution of the nervous system and the musculoskeletal system, spinal circuits became adapted to provide more flexible and agile locomotor limb movements, culminating in reaching and, ultimately, in primates, in reaching and grasping movements (Georgopoulos and Grillner, 1989). In this view, as movements became more complex, the challenge for the nervous system was to produce flexible movements that are independent of the more stereotypical arm movements observed during locomotion. Indeed, from an evolutionary viewpoint one might speculate that a hierarchical control system developed together with the increasing flexibility required in the control of movement. Such a hierarchy is frequently discussed in the locomotor literature (Rossignol, 1996) but less frequently with respect to voluntary movements (see Ting, 2007; Roh et al., 2011).

During locomotion, it is well established that circuits in the spinal cord generate a rhythm that contains details about the pattern of muscle activity (see Introduction). These spinal circuits are then subject to modification by brainstem and cortical inputs. The brainstem pathways are suggested to regulate the level of muscle activity and posture, for example, during walking uphill (Orlovsky, 1972; Drew et al., 2004). This general control of muscle activity is facilitated by the fact that the axons of neurones in the brainstem pathways, the reticulo- and vestibulo-spinal tracts, have diffuse termination patterns that influence muscle activity around multiple joints, and in multiple limbs (Matsuyama et al., 1988; Drew and Rossignol, 1990a,b). Moreover, the signals recorded from reticulospinal neurones during unobstructed locomotion (Drew et al., 1986; Matsuyama and Drew, 2000) and during voluntary gait modifications (Prentice and Drew, 2001) do not show the same level of fractionation as signals from the motor cortex. Indeed, many cells in the reticulospinal system show broad patterns of modulation that are more compatible with the waveforms identified by many of the decomposition studies than those observed in the motor cortex. Moreover, rather than specifying the exact patterns of motor activity that need to be produced, we have suggested that the signals from these pathways are integrated with the rhythmical signals in the spinal cord to modify the production of many of the more stereotypical patterns of behavioral activity, including locomotion (Drew et al., 2004). In some respects, therefore, the activity and connectivity of brainstem pathways resembles that postulated for synergies in general, and especially those implicating the brainstem (Roh et al., 2011). Nonetheless, it should be emphasized that there is little evidence for a limited subset of discharge patterns among reticulospinal cells during locomotion as might be expected if they represented a small range of synergies. Moreover, it should also be noted that synergy studies generally concentrate on the activity within a single limb, whereas many neurones in the reticulospinal pathways appear to be involved in coordinating activity in two, or more, limbs.
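The decomposition studies referred to here typically extract synergies by factorizing recorded EMG envelopes into non-negative muscle weightings and activation waveforms. A minimal sketch of such a factorization, using the multiplicative-update rules of Lee and Seung on synthetic data (all dimensions, names, and values are illustrative, not taken from the studies cited):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic EMG: 8 muscles x 200 time samples built from 3 known synergies.
W_true = rng.random((8, 3))     # muscle weighting of each synergy
H_true = rng.random((3, 200))   # activation waveform of each synergy
emg = W_true @ H_true           # non-negative "recorded" envelopes

def nmf(V, k, iters=500, eps=1e-9):
    """Factor V ~ W @ H with non-negative multiplicative updates."""
    n, m = V.shape
    W = np.abs(rng.random((n, k)))
    H = np.abs(rng.random((k, m)))
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

W, H = nmf(emg, k=3)
vaf = 1 - np.sum((emg - W @ H) ** 2) / np.sum(emg ** 2)  # variance accounted for
print(f"VAF with 3 synergies: {vaf:.3f}")
```

In practice the number of synergies is chosen as the smallest k at which the variance accounted for (VAF) plateaus; here the data are exactly rank 3, so three synergies reconstruct them almost perfectly.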

At a higher level of the hierarchy, the motor cortex (and the red nucleus) provides a more specific level of control of locomotion. During unobstructed locomotion, the contribution is likely to be one of step-by-step regulation of the step cycle superimposed on the control exerted at lower levels of the nervous system. A contribution at this level is supported by the fact that motor cortical cells, including PTNs, are modulated during unobstructed locomotion [reviewed in Drew et al. (1996)] and that microstimulation of the motor cortex can modify both the pattern and the timing of the step cycle (Armstrong and Drew, 1985; Rho et al., 1999). The major contribution of the motor cortex, however, is in the control of precise locomotor movements on the basis of visual information (Drew, 1988, 1993; Beloozerova and Sirota, 1993). As developed in this manuscript, we propose that the motor cortex produces gait modifications by altering the timing, duration, and level of activity of muscle synergies. As suggested above, these signals are likely to be mediated by the same neuronal circuits responsible for the generation of the locomotor rhythm and the pattern of locomotion. This allows a specific control over the level of activity in the different synergies, thus adapting activity to the specific requirements of the task while still integrating this into the step cycle (**Figure 10D**). In our model, control of voluntary gait modifications does not require the production of new synergies but rather modification of the synergies already present in unobstructed locomotion. This should be compared with the results of Ivanenko et al. (2005), in which stepping over an obstacle by human subjects required the addition of a new principal component that was responsible for explaining >20% of the overall variance in the EMG patterns.

At the highest level there is a need to control movements that are non-stereotypic. This might involve breaking synergies apart or adding new ones. For example, during corrective manoeuvres, Kargo and Giszter (2000) have shown there is a need to modify synergy composition to correct for perturbations. In other conditions, for example when making relatively simple movements, there might be a need to activate one or two sparse synergies independently of others. Other movements may require separation of synergies, for example, uncoupling activity in wrist and elbow muscles. This may be achieved by activating only a part of the population of PTNs contributing to muscle activity in a given synergy (e.g., **Figure 10B**). Additionally, it might require inhibiting some of the motoneurones that would normally be activated by a given sub-population of PTNs. Indeed, it has been suggested that the development of the motor cortex and, in primates, the corticomotoneuronal system, provides the ability to generate more fractionated movements, in part by inhibiting some synergies and activating others (Lemon and Griffiths, 2005; Drew et al., 2008a). In other movements, particularly those involving the fingers, the number of patterns of motor cortical recruitment makes it difficult to argue that movement is controlled by synergies. Poliakov and Schieber (1999), for example, failed to find any evidence of broad groups of motor cortical neurones controlling finger movements and instead reported a highly diverse pattern of activity suggesting control by a distributed network.

# **SUMMARY**

The data illustrated in this manuscript are compatible with a view that motor cortical cells contribute to the control of locomotion by modulating the activity of a limited number of sparse synergies. We suggest that a relatively simple control system can serve to control limb trajectory under a wide range of situations simply by modifying the phase and magnitude of the period of activity in a limited number of functionally segregated populations of PTNs. In addition, Yakovenko et al. (2011) showed that similar synergies are observed during reaching as during locomotion. Indeed, the sequential organization of sparse synergies in unobstructed locomotion, stepping over obstacles, and even reaching, is strongly suggestive of a relationship to prime mover muscles and their agonists. These relationships, in turn, are activated subject to biomechanical (kinematic, dynamic) constraints and to existing neural circuits provided by subcortical and spinal pattern- and rhythm-generating circuitry.

# **ACKNOWLEDGMENTS**

This work was supported by a CIHR operating grant (MOP 9558) to T. Drew, a New Emerging Team Grant in Computational Neuroscience from the CIHR, and an infrastructure grant from the FRSQ. We thank Drs. Elaine Chapman and John Kalaska for their comments on this manuscript.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 26 March 2013; paper pending published: 16 April 2013; accepted: 12 June 2013; published online: 11 July 2013.*

*Citation: Krouchev N and Drew T (2013) Motor cortical regulation of sparse synergies provides a framework for the flexible control of precision walking.* *Front. Comput. Neurosci. 7:83. doi: 10.3389/fncom.2013.00083 Copyright © 2013 Krouchev and Drew. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

# Increased motor cortex excitability during motor imagery in brain-computer interface trained subjects

### *Olesya A. Mokienko1,2, Alexander V. Chervyakov1 \*, Sofia N. Kulikova1, Pavel D. Bobrov2,3, Liudmila A. Chernikova1, Alexander A. Frolov2,3 and Mikhail A. Piradov1*

*<sup>1</sup> Research Center of Neurology Russian Academy of Medical Science, Moscow, Russia*

*<sup>2</sup> Institute of Higher Nervous Activity and Neurophysiology of RAS, Moscow, Russia*

*<sup>3</sup> Technical University of Ostrava, Ostrava, Czech Republic*

### *Edited by:*

*Yuri P. Ivanenko, IRCCS Fondazione Santa Lucia, Italy*

### *Reviewed by:*

*Mikhail Lebedev, Duke University, USA*

*Francesca Sylos Labini, IRCCS Santa Lucia Foundation, Italy*

### *\*Correspondence:*

*Alexander V. Chervyakov, Research Center of Neurology Russian Academy of Medical Science, Marshala Katukova Street, 13-2-124, 123181 Moscow, Russia e-mail: tchervyakovav@gmail.com*

**Background**: Motor imagery (MI) is the mental performance of movement without muscle activity. It is generally accepted that MI and motor performance have similar physiological mechanisms.

**Purpose**: To investigate the activity and excitability of cortical motor areas during MI in subjects who were previously trained with an MI-based brain-computer interface (BCI).

**Subjects and Methods**: Eleven healthy volunteers without neurological impairments (mean age, 36 years; range: 24–68 years) were either trained with an MI-based BCI (BCI-trained, *n* = 5) or received no BCI training (*n* = 6, controls). Subjects imagined grasping in a blocked paradigm task with alternating rest and task periods. For evaluating the activity and excitability of cortical motor areas we used functional MRI and navigated transcranial magnetic stimulation (nTMS).

**Results:** fMRI revealed activation in Brodmann areas 3 and 6, the cerebellum, and the thalamus during MI in all subjects. The primary motor cortex was activated only in BCI-trained subjects. The associative zones of activation were larger in non-trained subjects. During MI, motor evoked potentials recorded from two of the three targeted muscles were significantly higher only in BCI-trained subjects. The motor threshold decreased (median = 17%) during MI, which was also observed only in BCI-trained subjects.

**Conclusion**: Previous BCI training increased motor cortex excitability during MI. These data may help to improve BCI applications, including rehabilitation of patients with cerebral palsy.

**Keywords: brain-computer interface, motor imagery, navigated TMS, functional MRI, neurorehabilitation**

# **INTRODUCTION**

Modularity is an important concept for understanding the mechanisms of motor control and motor learning. Recent investigations of muscle synergies, motor primitives, compositionality, basic action concepts, and related work in machine learning have advanced our understanding of the architecture underlying rich motor behaviors (Lacquaniti et al., 2013). One of the most interesting topics in research on motor system modularity and organization is the study of motor imagery and the changes in neural networks that accompany it.

Motor imagery (MI) activates brain regions that participate in motor control (Crammond, 1997; Jeannerod, 2001; Stippich et al., 2002; Ehrsson et al., 2003; Neuper et al., 2005). These structures include the premotor and supplementary motor cortices (Brodmann area 6), parietal cortical areas, cingulate gyrus, basal ganglia, and cerebellum. The primary motor cortex (Brodmann area 4) is also active during MI (Jeannerod, 2001; Neuper et al., 2005; Sharma et al., 2006; Dickstein and Deutsch, 2007; Mulder, 2007). Furthermore, several studies using transcranial magnetic stimulation (TMS) demonstrated increased corticospinal excitability and increased amplitudes of motor evoked potentials (MEPs) during MI (Fadiga et al., 1999; Hashimoto and Rothwell, 1999; Vargas et al., 2004; Cicinelli et al., 2006; Stinear et al., 2006; Pichiorri et al., 2011).

These previous findings led scientists to develop an MI training paradigm to stimulate neuroplastic changes in patients with paresis resulting from brain injury, or for use in athletic training. An electroencephalography (EEG)-based brain-computer interface (BCI) is a promising method to support MI during such training. The BCI transforms EEG signals generated during MI into commands that can control an external device (Prasad et al., 2010; Mokienko and Chernikova, 2011; Shih et al., 2012). Modulation of the sensorimotor rhythm can serve as the signal of brain activity during MI (Pfurtscheller and Lopes da Silva, 1999). However, the plasticity-related changes resulting from BCI-supported MI training have not been studied in detail. Furthermore, no study has included the navigated transcranial magnetic stimulation (nTMS) method, or compared functional magnetic resonance imaging (fMRI) and nTMS data for functional mapping during MI in BCI-trained and untrained subjects. The aim of our experiment was to investigate the activity and excitability of different cortical motor areas during MI in BCI-trained and BCI-naïve subjects.

# **MATERIALS AND METHODS**

### **PARTICIPANTS**

The inclusion criteria were an age of 20–70 years, an absence of neurological disorders, normal vision, right-handedness, and written informed consent. Eleven volunteers (mean age, 36 years; age range, 24–68 years; 7 males and 4 females) were included in the study. Subjects in group 1 (*n* = 5, mean age = 45.8 years) had 10 to 15 sessions of BCI-supported training, each lasting 20–30 min. The training course was followed by fMRI and nTMS examinations. Subjects in group 2 (*n* = 6, mean age = 27.6 years) were tested without preliminary training sessions. All subjects underwent fMRI and nTMS. The protocol was approved by the local Ethical Committee of the Research Center of Neurology of RAMS, Moscow. All subjects provided written informed consent.

# **BCI TRAINING**

The BCI training was based on EEG activity patterns recorded during grasping MI. Subjects sat comfortably in an armchair located 1 m from a computer screen that presented visual instructions. Subjects visually fixated on a circle presented in the center of the screen and received instructions from three surrounding figures (rhomboidal arrows). Subjects were given three commands instructing them to relax (upper arrows illuminated) or to imagine slow grasping movements with the right or left hand (right or left arrow illuminated). The "Relax" command meant that the subject had to sit still and look at the center of the screen. Commands were presented randomly, each of 10-s duration. For each subject, training was performed over 10–15 experimental days, with one 20–30 min session performed each day. Intervals between training sessions were 1–4 days.

A visual cue provided the subject with feedback regarding mental task recognition: the central circle turned green if the classifier recognized the task in agreement with the given command, or remained white if the signal was not recognized. The EEG was recorded with 30 electrodes distributed over the head in accordance with the standard international 10–20 system. EEG signals were band-pass filtered between 5 and 30 Hz. We used a Bayesian approach for EEG pattern classification. The activity sources most relevant for BCI functioning were identified using independent component analysis (ICA). Classification accuracy was measured with Cohen's kappa, a parameter conventionally used in BCI studies (Kohavi and Provost, 1998). A kappa of 1 indicates perfect recognition, whereas a kappa of 0 indicates recognition at chance level.
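Cohen's kappa corrects raw classification accuracy for the agreement expected by chance. A minimal sketch of its computation from a confusion matrix (the counts below are invented for illustration, not the study's data):

```python
import numpy as np

def cohens_kappa(confusion):
    """Cohen's kappa from a square confusion matrix (rows: true, cols: predicted)."""
    c = np.asarray(confusion, dtype=float)
    n = c.sum()
    p_observed = np.trace(c) / n                            # raw agreement
    p_chance = (c.sum(axis=0) * c.sum(axis=1)).sum() / n**2  # chance agreement
    return (p_observed - p_chance) / (1 - p_chance)

# Three commands (relax, left hand, right hand), illustrative trial counts:
conf = [[40,  5,  5],
        [ 6, 35,  9],
        [ 4, 10, 36]]
kappa = cohens_kappa(conf)
print(round(kappa, 2))  # → 0.61
```

With three equiprobable commands, chance agreement is 1/3, so a kappa of 0.46 (as reported below) corresponds to a raw accuracy of roughly 64%.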

# **fMRI**

fMRI was conducted using a Magnetom Avanto 1.5-T MRI system (Siemens, Germany). Standard axial T2-weighted turbo-spin echo imaging was performed initially to rule out pathological changes in brain tissue [repetition time (TR), 4000 ms; echo time (TE), 106 ms; section thickness, 5.0 mm; matrix, 230 × 230 mm; imaging time, 2 min 2 s]. Anatomical data were obtained with sagittal T1-weighted gradient echo imaging with isometric voxels (T1 multiplanar reconstruction: TR, 1940 ms; TE, 3.1 ms; TI, 1100 ms; section thickness, 1.0 mm; matrix, 256 × 256 mm; imaging time, 4 min 23 s). During the fMRI experiment, subjects performed the same task that was performed during the BCI training sessions, but without feedback. For each subject, three sets of functional data were obtained representing different conditions, including rest (8 replicates) and right or left hand movement imagery (4 replicates each). The imaging mode used was axial T2<sup>∗</sup> gradient echo (TR, 3800 ms; TE, 50 ms; matrix, 192 × 192 mm; section thickness, 3 mm) with fat suppression and correction for motion. The imaging time was 6 min 10 s.

The data analysis was performed in the MATLAB (Mathworks, Natick, MA, USA) environment using the SPM8 statistical package (Wellcome Trust Centre for Neuroimaging, London, UK). The first step of the analysis corrected head movement artifacts. Next, the functional data were translated to Montreal Neurological Institute (MNI) coordinates (i.e., normalization). Standard MNI coordinates, which are used in the SPM8 package, were developed and adopted by the International Consortium for Brain Mapping. In the next step, the normalized data were smoothed. This step was followed by a classic analysis that used generalized linear models. The results from each subject were used in the group analysis to identify areas showing task-specific activity.
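The first-level model-fitting step can be illustrated with a toy example: a boxcar task regressor is convolved with a canonical haemodynamic response function (HRF) and fit to a simulated voxel time course by least squares. The scan count, block lengths, noise level, and double-gamma HRF parameterization below are illustrative assumptions, not the study's actual design:

```python
import math
import numpy as np

rng = np.random.default_rng(0)
TR, n_scans = 3.8, 96  # TR from the protocol; scan count illustrative

# Block design: alternating 10-scan rest / 10-scan imagery periods (illustrative).
boxcar = (np.arange(n_scans) // 10) % 2

# Canonical double-gamma HRF sampled at the TR (peak ~5 s, undershoot ~15 s).
ht = np.arange(0, 32, TR)
hrf = (ht**5 * np.exp(-ht) / math.factorial(5)
       - ht**15 * np.exp(-ht) / math.factorial(15) / 6)
regressor = np.convolve(boxcar, hrf)[:n_scans]

# Design matrix: task regressor plus a constant term.
X = np.column_stack([regressor, np.ones(n_scans)])

# Simulated voxel time course: true task effect 2.0, baseline 0.5, plus noise.
y = X @ np.array([2.0, 0.5]) + 0.05 * rng.standard_normal(n_scans)

# Ordinary least-squares fit of the GLM.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(f"estimated task beta: {beta[0]:.2f}")
```

SPM performs this fit at every voxel and then tests the task beta against zero; the group analysis combines the resulting per-subject maps.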

# **nTMS**

The neurophysiological investigation was performed using nTMS with the NBS eXimia apparatus (Nexstim, Finland). It includes a 70 mm figure-eight-shaped BiPulse Nexstim coil, with a maximal magnetic field strength of 199 V/m and a magnetic impulse duration of 280 µs. The coil was placed anteromedially at a 45° angle from the midline. The stimulated hemisphere was not the same for all subjects and was chosen randomly.

As a first step, all subjects underwent an MRI investigation on a Magnetom Symphony 1.5 T scanner (Siemens, Germany) using a T1 multiplanar reconstruction regime (MPR); the data were loaded into the NBS eXimia Nexstim system to obtain subjects' individual 3D brain models. Following that, real anatomical entities were matched to their MRI representations.

The MEPs were recorded using a standard EMG machine (Nexstim, EMD, Finland) and surface electrodes. MEPs were recorded by placing 0.6 cm<sup>2</sup> EMG electrodes on the target muscle being mapped [abductor pollicis brevis (APB), flexor carpi ulnaris (FCU), and extensor carpi radialis (ECR)], positioned according to the atlas of Leis and Trapani (2000) following the belly–tendon principle. The ground electrode was placed on the right clavicle or on the upper third of the right forearm. We then determined the resting motor threshold (MT), defined as the lowest stimulation intensity evoking motor responses of at least 0.50 mV peak-to-peak amplitude in 5/10 trials with the subject at rest (Rossini et al., 1994). Resting MT was measured as a percentage (%) of the maximum intensity of the magnetic stimulator (1.5 T). Evoked motor responses (EMRs) and their amplitudes and latencies were recorded for each target muscle. Cortical motor representations were constructed from these observations.
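The threshold-hunting procedure described above can be sketched in code. The sigmoid recruitment model, the starting intensity, the step size, and all numeric values below are invented for illustration; only the 5-of-10 response criterion comes from the protocol:

```python
import numpy as np

rng = np.random.default_rng(1)
CRITERION_MV = 0.50  # response criterion from the protocol (mV, peak-to-peak)

def mep_amplitude(intensity, true_mt=44.0):
    """Toy MEP model: response probability and size grow with intensity (%)."""
    p = 1 / (1 + np.exp(-(intensity - true_mt) / 2.0))  # sigmoid recruitment
    evoked = rng.random() < p
    return (0.3 + 1.5 * p) * evoked                      # amplitude in mV

def resting_mt(start=60, step=-1, trials=10, needed=5):
    """Descend in intensity; return the lowest level still meeting 5/10."""
    intensity = start
    while intensity > 0:
        hits = sum(mep_amplitude(intensity) >= CRITERION_MV
                   for _ in range(trials))
        if hits < needed:
            return intensity - step  # previous level still met the criterion
        intensity += step
    return None

mt = resting_mt()
print(f"estimated resting MT: {mt}% of stimulator output")
```

Real threshold hunting alternates ascending and descending runs and averages; this one-direction sketch only shows the 5-of-10 decision rule.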

In the first step of the experiment, the areas of interest (contralateral primary motor and premotor cortices) were stimulated with magnetic fields of 80–110 V/m to identify EMRs with amplitudes of 100–500µV. The resting MT was determined for each site by the largest detected EMR amplitude. Cortical representations of the target muscles were mapped at 110% intensity of the determined resting MT. The mean EMR amplitudes and muscle motor representations were evaluated during cortical mapping.

In the second step of the experiment, the passive EMR threshold was determined and the motor representations were mapped while subjects imagined grasping with the contralateral hand. The motor representations were mapped using stimulus intensities as in the first step. The hand was positioned on the armrest in the neutral position of the radiocarpal joint.

Statistical analysis of quantifiable data was performed using a repeated measures analysis of variance (ANOVA) and Newman-Keuls *post-hoc* test using the Statistica 6.0 software package (StatSoft, 2003). The data are presented as the median and 25–75% quartiles. Differences were considered significant at *p <* 0*.*05.
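The descriptive statistics used throughout the Results (median with 25–75% quartiles) can be reproduced directly; the threshold values below are invented for illustration, not the study's data:

```python
import numpy as np

# Illustrative motor-threshold values (% of stimulator output); not study data.
data = np.array([38, 41, 44, 45, 47])

median = np.median(data)
q25, q75 = np.percentile(data, [25, 75])  # linear interpolation by default
print(f"{median:.0f} [{q25:.0f}; {q75:.0f}]")  # → 44 [41; 45]
```

This "median [q25; q75]" form matches the presentation used in Table 1.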

# **RESULTS**

### **SUBJECTS**

Eleven volunteers (mean age, 36 years; age range, 24–68 years; 7 men, 4 women) participated in the study. Subjects in group 1 (*n* = 5) underwent 10 to 15 sessions of BCI-supported training that were each 20–30 min in duration. The training course was followed by fMRI and nTMS examinations. Subjects in group 2 (*n* = 6) were tested without performing preliminary training sessions.

### **BCI TRAINING**

The achieved accuracy rates (median Cohen's kappa) were 0.46 [0.45; 0.52]. BCI control was achieved by all subjects through sensorimotor rhythm modulation. MI was accompanied by desynchronization of the mu and low beta rhythms (i.e., event-related desynchronization) (Pfurtscheller and Lopes da Silva, 1999). This signal served as a recognizable command for the BCI.
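Event-related desynchronization (ERD) is conventionally quantified as the percentage change in band power during imagery relative to rest (Pfurtscheller and Lopes da Silva, 1999). A toy numpy sketch on synthetic signals (sampling rate, amplitudes, and band edges are illustrative assumptions):

```python
import numpy as np

FS = 256  # sampling rate (Hz), illustrative
rng = np.random.default_rng(2)

def band_power(x, fs, lo, hi):
    """Mean power in [lo, hi] Hz from the FFT power spectrum."""
    freqs = np.fft.rfftfreq(len(x), 1 / fs)
    psd = np.abs(np.fft.rfft(x)) ** 2 / len(x)
    band = (freqs >= lo) & (freqs <= hi)
    return psd[band].mean()

t = np.arange(0, 10, 1 / FS)
noise = lambda: 0.5 * rng.standard_normal(t.size)
rest = 2.0 * np.sin(2 * np.pi * 10 * t) + noise()     # strong mu rhythm at rest
imagery = 0.8 * np.sin(2 * np.pi * 10 * t) + noise()  # attenuated during MI

p_rest = band_power(rest, FS, 8, 13)
erd = 100 * (band_power(imagery, FS, 8, 13) - p_rest) / p_rest
print(f"mu-band ERD: {erd:.0f}%")  # negative value = desynchronization
```

The BCI classifier detects exactly this kind of band-power drop over sensorimotor electrodes and maps it to a command.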

### **FUNCTIONAL MRI**

In BCI-trained subjects, MI was accompanied by activity in the contralateral somatosensory (Brodmann area 3), primary motor (Brodmann area 4), and premotor cortical areas. Activity also was observed in the bilateral supplementary motor cortex (Brodmann area 6), contralateral ventral lateral nucleus of thalamus, and ipsilateral cerebellum (*p <* 0*.*0005; **Figure 1**).

For BCI-naïve subjects, activity was observed in the contralateral somatosensory (Brodmann area 3) and premotor cortical areas, as well as the supplementary motor cortex (Brodmann area 6) bilaterally. Other activated areas included the contralateral ventral lateral nucleus of the thalamus, ipsilateral cerebellum, contralateral Brodmann area 9, and bilateral Brodmann areas 40 and 13 (*p <* 0*.*0005). The primary motor cortex was not activated in untrained subjects (**Figure 2**). The areas of activation including the somatosensory, premotor, and supplementary motor areas observed during MI were significantly larger in BCI-naïve subjects than in BCI-trained subjects (*p <* 0*.*01).

### **nTMS**

In BCI-trained subjects, the passive MT for the motor cortex decreased by 6–18% (median change, 17%) during MI compared to the rest condition. In BCI-naïve subjects, the threshold changes during MI compared to rest were not significant and were inconsistent among subjects: the threshold decreased by 1–8% in three subjects, increased non-significantly in two subjects, and was unchanged in one subject. The MT changes were statistically significant only for BCI-trained subjects (*p <* 0*.*01, **Table 1**).

For APB, the median change in motor response during MI compared to the rest condition was 63% in BCI-trained subjects and 11% in BCI-naïve subjects. For ECR, the change was 150% in BCI-trained subjects and 1% in BCI-naïve subjects. In BCI-trained subjects, the mean EMRs in APB and ECR were significantly higher during MI than in the rest condition (APB, *p* = 0*.*03; ECR, *p* = 0*.*01). In BCI-naïve subjects, the differences in EMR were not significant (APB, *p* = 0*.*24; ECR, *p* = 0*.*23, **Table 1**).

We did not observe statistically significant increases in mean EMR amplitude in FCU for either group (**Table 1**). The median change in motor response was 78% in BCI-trained subjects and 12% in BCI-naïve subjects. Moreover, in BCI-trained subjects, stimulation induced EMRs that were larger during MI than at rest, which was not observed in BCI-naïve subjects (**Figure 3**). A comparison of nTMS and fMRI maps revealed partial overlap of the motor areas detected by these two methods (**Figure 4**).

**FIGURE 2 | Areas of activation during grasping imagery in untrained subjects (group analysis of fMRI data, Left hand imagery** *>* **Rest,** *p <* **0***.***0005). (A),** Brodmann areas 3 and 4; **(B)**, supplementary motor cortex; **(C)**, cerebellum; **(D)**, insula; **(E)**, Brodmann area 9; **(F)**, Brodmann area 40.

**Table 1 | Motor thresholds and evoked motor responses during rest and motor imagery for the two groups (represented as median [25th and 75th percentiles]).**

*EMR, evoked motor response; APB, abductor pollicis brevis; FCU, flexor carpi ulnaris; ECR, extensor carpi radialis. ANOVA with Newman-Keuls post-hoc test.*

# **DISCUSSION**

The activation and excitability of the motor cortex during MI is different in BCI-trained and BCI-naïve subjects, and this difference can be detected with a combination of fMRI and nTMS.

# **MI AND ACTIVATION OF MOTOR STRUCTURES**

The fMRI analysis revealed the same brain areas were active during MI for both groups. These areas included the contralateral somatosensory, contralateral premotor, supplementary motor cortex bilaterally, contralateral ventral lateral nucleus of thalamus, and ipsilateral cerebellum. Similar activation patterns were described in other fMRI-based MI studies (Jeannerod, 2001; Neuper et al., 2005; Sharma et al., 2006; Dickstein and Deutsch, 2007; Mulder, 2007).

In the literature, there is debate regarding the role the primary motor cortex plays in MI, as some studies failed to observe its activation (Parsons et al., 1995; Hanakawa et al., 2003; Meister et al., 2004; de Lange et al., 2005). In our study, primary motor cortex activation was observed only in BCI-trained subjects. We therefore suppose that the primary motor cortex is involved in MI in individuals who can "successfully" imagine a movement, or who have been trained to do so (e.g., through BCI-supported training).

The somatosensory, premotor, and supplementary motor cortical areas were activated during MI in both groups, and these activations were larger on fMRI in BCI-naïve subjects. This is in agreement with the principle that newly learned movements recruit broader cortical territories than well-practiced, skilled ones.

In BCI-naïve subjects, we observed bilateral activation of Brodmann area 40, the complex associative cortex. This region plays a central role in developing cognitive strategies and motor programs, and its activation was described in several previous MI studies (Gerardin et al., 2000; Lafleur et al., 2002; Jackson et al., 2003). This associative area was reported to be activated predominantly in the left hemisphere during complex motor tasks performed by right-handed individuals (Gerardin et al., 2000). In addition, the dorsolateral prefrontal cortex (Brodmann area 9) was active in BCI-naïve subjects. This associative area represents the highest level of motor planning and regulation, and plays an important role in sensory and mnemonic information integration and working memory processes (Derrfuss et al., 2004). Right and left insula activation can be associated with cognitive control, task coordination, and working memory involvement (Derrfuss et al., 2004). It should be noted that BCI-trained subjects did not show significant activity in these associative areas.

# **MI AND CORTICOSPINAL EXCITABILITY**

Our nTMS findings indicate that MI is generally associated with a decrease in the evoked response threshold, an increase in EMR amplitude, and an expansion of evoked response areas against the background of decreased excitation thresholds. Together, these changes reflect increased motor cortex excitability during MI. These changes are enhanced by MI training and often do not occur in untrained individuals. Our results are in agreement with the findings of other studies using classical TMS (without MRI navigation) (Fadiga et al., 1999; Hashimoto and Rothwell, 1999; Vargas et al., 2004; Cicinelli et al., 2006; Stinear et al., 2006).

### **nTMS AND INVESTIGATION OF MI**

Pichiorri et al. (2011) used TMS to assess the neuroplastic changes associated with MI-based BCI training. In that study, 10 healthy volunteers participated in 6–8 40-min BCI sessions. The training resulted in a significant increase in motor cortex excitability, and enhanced EMRs in target muscles during MI (Pichiorri et al., 2011). The TMS used in those studies was not navigated using MRI or fMRI data. In contrast to conventional TMS, nTMS allows local and precise stimulation based on an individual's MRI data (Chervyakov et al., 2013). This technique makes it possible to assess cortical excitability with a high spatial (2 mm) and temporal resolution. In the present study, we obtained similar results in terms of EMR, but we used both nTMS and fMRI. nTMS allowed us to map target muscle representations during MI for each subject based on MRI and fMRI data.

nTMS can be used to evaluate the dynamics of neuroplastic processes accompanying MI. MI mapping is most commonly performed with fMRI, in which case the indirect measure of brain activity is the BOLD signal. The main advantage of fMRI is its high spatial resolution of approximately 1 mm (deCharms et al., 2004). However, the temporal resolution of this technique is relatively low, reaching 1–2 s. In addition, the physiological lag of the hemodynamic response ranges from 3–6 s (Weiskopf et al., 2004). Most fMRI-based studies of MI do not record EMG activity. In contrast, TMS provides high temporal and spatial resolution, and surface EMG recording during TMS mapping makes it possible to control for the absence of overt muscle activity during MI. Moreover, this mapping technique is based on directed and selective cortex stimulation, whereas in fMRI mapping, brain activity is evaluated based on an indirect signal.

**FIGURE 4 | Comparison of mapping results obtained during motor imagery in a BCI-trained subject by using fMRI (fist clenching imagery task) and nTMS.**

# **COMPARISON OF fMRI AND nTMS MOTOR REPRESENTATION MAPS**

The activity foci determined by the group analysis of fMRI data were in agreement with previously published data from other MI studies (Jeannerod, 2001; Neuper et al., 2005; Sharma et al., 2006; Dickstein and Deutsch, 2007; Mulder, 2007; Pichiorri et al., 2011). The discrepancy between the results for motor area mapping obtained using the two techniques (fMRI and nTMS) can be explained by the fact that TMS has a direct and selective effect on corticospinal pathways, whereas fMRI reflects BOLD signal changes associated with task performance (i.e., MI). A large study aimed at comparing these two neuroimaging techniques showed that the distance between motor areas identified by fMRI and nTMS ranged from 0 to 21.7 mm (3*.*70 ± 4*.*85 mm) (Neuvonen and Niskanen, 2009).

# **MI TRAINING AND ITS CLINICAL APPLICATION IN NEUROLOGICAL REHABILITATION**

Changes in EMR amplitudes and cortical representations were mainly associated with a decrease in motor thresholds in individuals who had undergone MI training in a similar task. The EMR threshold reflects motor cortical excitability and was shown to be an informative parameter in several neurological diseases (Nikitin and Kurenkov, 2003). Our results suggest that MI training has a significant effect on cortical motor representations, which is probably comparable to that of actual motor training. Therefore, MI can be recommended as a rehabilitation practice in patients with severe motor deficiencies resulting from central nervous system injury.

To conclude, although the number of participants in this experiment was small, the results suggest the possibility of appropriate and optimal neuroplasticity control using BCI training. The considerations discussed above also suggest that nTMS is a highly promising method for investigating neurological plasticity. Nevertheless, further combined TMS-MRI-fMRI studies are required to determine its optimal application sites.

## **ACKNOWLEDGMENTS**

This work was supported by the Research Center of Neurology, Russian Academy of Medical Sciences, by the Russian Foundation for Basic Research (project no. 13-04-12019), by the IT4Innovations Centre of Excellence (project no. CZ.1.05/1.1.00/02.0070), and by the Bio-Inspired Methods: research, development and knowledge transfer project (reg. no. CZ.1.07/2.3.00/20.0073) funded by the Operational Programme Education for Competitiveness, cofinanced by the ESF and the state budget of the Czech Republic.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 18 July 2013; accepted: 02 November 2013; published online: 22 November 2013.*

*Citation: Mokienko OA, Chervyakov AV, Kulikova SN, Bobrov PD, Chernikova LA, Frolov AA and Piradov MA (2013) Increased motor cortex excitability during motor imagery in brain-computer interface trained subjects. Front. Comput. Neurosci. 7:168. doi: 10.3389/fncom.2013.00168*

*This article was submitted to the journal Frontiers in Computational Neuroscience.*

*Copyright © 2013 Mokienko, Chervyakov, Kulikova, Bobrov, Chernikova, Frolov and Piradov. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Learned parametrized dynamic movement primitives with shared synergies for controlling robotic and musculoskeletal systems

#### *Elmar Rückert<sup>1</sup>\* and Andrea d'Avella<sup>2</sup>*

*<sup>1</sup> Institute for Theoretical Computer Science, Graz University of Technology, Graz, Austria; <sup>2</sup> Laboratory of Neuromotor Physiology, Fondazione Santa Lucia, Rome, Italy*

### *Edited by:*

*Martin Giese, University Clinic Tuebingen, Germany*

### *Reviewed by:*

*Florentin Wörgötter, University Goettingen, Germany Dominik M. Endres, HIH, CIN, BCCN and University of Tübingen, Germany Tomas Kulvicius, University of Goettingen, Germany*

### *\*Correspondence:*

*Elmar Rückert, Institute for Theoretical Computer Science, Graz University of Technology, Inffeldgasse 16b/1, 8010 Graz, Austria e-mail: rueckert@igi.tugraz.at*

A salient feature of human motor skill learning is the ability to exploit similarities across related tasks. In biological motor control, it has been hypothesized that muscle synergies, coherent activations of groups of muscles, allow for exploiting shared knowledge. Recent studies have shown that a rich set of complex motor skills can be generated by a combination of a small number of muscle synergies. In robotics, dynamic movement primitives are commonly used for motor skill learning. This machine learning approach implements a stable attractor system that facilitates learning and can be used in high-dimensional continuous spaces. However, it does not allow for reusing shared knowledge, i.e., for each task an individual set of parameters has to be learned. We propose a novel movement primitive representation that employs parametrized basis functions, which combines the benefits of muscle synergies and dynamic movement primitives. For each task a superposition of synergies modulates a stable attractor system. This approach leads to a compact representation of multiple motor skills and at the same time enables efficient learning in high-dimensional continuous systems. The movement representation supports discrete and rhythmic movements and in particular includes the dynamic movement primitive approach as a special case. We demonstrate the feasibility of the movement representation in three simulated multi-task learning scenarios. First, the characteristics of the proposed representation are illustrated in a point-mass task. Second, in complex humanoid walking experiments, multiple walking patterns with different step heights are learned robustly and efficiently. Finally, in a multi-directional reaching task simulated with a musculoskeletal model of the human arm, we show how the proposed movement primitives can be used to learn appropriate muscle excitation patterns and to generalize effectively to new reaching skills.

**Keywords: dynamic movement primitives, muscle synergies, reinforcement learning, motor control, musculoskeletal model**

# **1. INTRODUCTION**

Reinforcement Learning of motor skills in robotics is considered to be very challenging due to the high-dimensional continuous state and action spaces. In many studies it has been shown that learning can be facilitated by the use of movement primitives (Schaal et al., 2003; Rückert et al., 2013). Movement primitives are parametrized representations of elementary movements, where typically for each motor skill a small set of parameters is tuned or learned. However, many motor control tasks are related and could be learned more effectively by exploiting shared knowledge.

This is a well-known concept in motor neuroscience, where muscle synergies, coherent activations of groups of muscles (d'Avella et al., 2003; d'Avella and Bizzi, 2005; Bizzi et al., 2008), have been proposed to simplify the control problem of complex musculoskeletal systems. Analyses of muscle activation recordings have demonstrated that combining only a few muscle activation patterns can efficiently model multiple task instances of natural motor behaviors, e.g., fast reaching movements of humans (d'Avella et al., 2006), primate grasping movements (Overduin et al., 2008), or walking patterns of infants, toddlers, and adults (Dominici et al., 2011). One important finding of these studies is that the dimensionality of the motor control problem can be drastically reduced by reusing common knowledge of related tasks, e.g., grasping objects at different locations using a linear combination of shared muscle synergies. While this has been demonstrated in biological data analysis, only few robotic applications exist that use this shared task knowledge (Chhabra and Jacobs, 2006; Alessandro et al., 2012). These methods demonstrate the advantages of shared synergies in learning robotic tasks. However, different procedures were applied to obtain a parametric description of synergies: in Chhabra and Jacobs (2006) a variant of non-negative matrix factorization (d'Avella et al., 2003) was used given a set of pre-computed trajectories, and in Alessandro et al. (2012) the synergies were extracted from the dynamic responses of a randomly initialized robot system. In contrast, we propose to learn the synergy representation in a reinforcement learning framework, where task-specific and task-invariant parameters are learned simultaneously in a multi-task learning setting.

In robotics the most widely used approach for motor skill learning is Dynamic Movement Primitives (DMPs) (Schaal et al., 2003; Ijspeert et al., 2013). This approach uses parametrized dynamical systems to determine a movement trajectory and has several benefits. First, as it is a model-free approach, there is no need to learn the typically non-linear, high-dimensional dynamic forward model of a robot (this is, however, not the case when inverse dynamics controllers are used to compute the control commands). Second, it provides a linear policy parametrization which can be used for imitation learning and policy search (Kober and Peters, 2011). The complexity of the trajectory can be scaled by the number of parameters (Schaal et al., 2003), and one can adapt meta-parameters of the movement such as the movement speed or the goal state (Pastor et al., 2009; Kober et al., 2010). Finally, the dynamical system is constructed such that it is stable. This simplifies learning since even without modulating the dynamical system the movement trajectory is always attracted by a known (or learned) goal state. However, this parametrization does not allow for reusing shared knowledge, as suggested by the experimental findings on complex musculoskeletal systems (d'Avella et al., 2003; Bizzi et al., 2008; d'Avella and Pai, 2010). Thus, typically for each motor task an individual movement parametrization has to be learned.

In this paper we propose to use a superposition of learned basis functions or synergies to modulate the stable attractor system of DMPs. This allows for reusing shared knowledge when learning multiple related tasks simultaneously while preserving the benefits of the dynamical systems, i.e., the stability in learning complex motor behavior. The synergies and their activation in time are learned from scratch in a standard reinforcement learning setup. Note that imitation learning could also be applied to provide an initial guess for the synergies, e.g., by using decomposition strategies discussed in d'Avella and Tresch (2001); however, this is beyond the scope of this paper. Moreover, our approach is, like DMPs, applicable to discrete and rhythmic movements and allows for modeling time-varying synergies (d'Avella et al., 2006). We therefore denote our approach *DMPSynergies*. By using for each task a combination of individual, temporally fixed basis functions, DMPs can be modeled as a special case of this approach. The benefit of the common prior knowledge is even more pronounced when generalizing to new motor tasks given the previously learned basis functions: for simple synergies only the weights of the linear combination have to be acquired, and for time-varying synergies additionally the time-shift parameters need to be learned. This is demonstrated on a complex walking task and on a reaching task using an arm actuated by muscles.

As in previous studies on DMPs (Meier et al., 2011; Mülling et al., 2013), we want to go beyond basic motor skill learning. However, in contrast to those studies, which use a library of primitives for sequencing elementary movements (Meier et al., 2011) or mixing basic skills (Mülling et al., 2013), we implement the common knowledge shared among multiple tasks as a prior in a hierarchical structure. On the lower level, task-related parameters, i.e., amplitude-scaling weights or time-shift parameters, modulate a linear superposition of learned basis functions, the shared higher-level knowledge. This has the promising feature that diverse motor skills can be generated by combining just a small number of synergies.

In the Materials and Methods, we will first briefly introduce DMPs (Schaal et al., 2003; Ijspeert et al., 2013) as we build on this approach. We then extend DMPs to allow for reusing shared task knowledge in the form of parametrized synergies. The advantage of the shared knowledge is evaluated in the Results on three multi-task learning scenarios. First, a simple via-point task is used to demonstrate the characteristics of the proposed representation. Then, rhythmic movements are learned in a dynamic 5-link planar biped walker environment. Finally, a musculoskeletal model of a human arm is used to evaluate our primitives on a muscle actuated system learning discrete reaching movements to multiple targets.

# **2. MATERIALS AND METHODS**

### **2.1. DYNAMIC MOVEMENT PRIMITIVES**

DMPs generate multi-dimensional trajectories by the use of non-linear differential equations (simple damped spring models) (Schaal et al., 2003). The basic idea is to use for each degree-of-freedom (DoF), or more precisely for each actuator, a globally stable, linear dynamical system of the form

$$
\tau \dot{z} = \alpha\_z (\beta\_z (g - y^\*) - z) + f, \quad \tau \dot{y}^\* = z,\tag{1}
$$

which is modulated by a learnable non-linear function *f*. The final position of a movement is denoted by *g*, and the variables *y*<sup>∗</sup> and *y*˙<sup>∗</sup> represent the desired state, i.e., the joint angle and joint velocity. The constants α<sub>*z*</sub> and β<sub>*z*</sub> are usually pre-defined. The temporal scaling factor τ can be used to decelerate or accelerate the movement execution as needed. Finally, *z* denotes an internal variable of the dynamical system. For each DoF an individual function *f* is used, which is different for discrete and rhythmic movements.

For discrete movements the function *f* only depends on the phase *s*, which is an abstraction of time and was introduced to scale the movement duration (Schaal et al., 2003). The function *f*(*s*) is constructed as the weighted sum of *N* Gaussian basis functions Ψ<sub>*n*</sub>

$$f(s) = \frac{\sum\_{n=1}^{N} \Psi\_n(s)\, w\_n}{\sum\_{n'=1}^{N} \Psi\_{n'}(s)}\, s,\tag{2}$$

where for discrete movements these Gaussian basis functions are

$$\Psi\_n(s) = \exp\left(-\frac{1}{2h\_n^2} \left(s - \mu\_n\right)^2\right), \quad \tau\dot{s} = -\alpha\_s s.$$

Only the weights *w*<sub>*n*</sub> are parameters of the primitive, which can modulate the shape of the movement. The centers or means μ<sub>*n*</sub> ∈ [0, 1] specify at which phase of the movement the basis function becomes active. They are typically equally spaced in the range of *s* and not modified during learning. The bandwidth of the basis functions is given by *h*<sub>*n*</sub><sup>2</sup> and is typically chosen such that the Gaussians overlap.
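To make the discrete formulation concrete, the following Python sketch Euler-integrates Equations 1 and 2 for a single DoF. The gain values (α<sub>*z*</sub> = 25, β<sub>*z*</sub> = 6.25, α<sub>*s*</sub> = 4), the step size, and the bandwidth choice are common illustrative assumptions, not the settings used in the paper.

```python
import numpy as np

def simulate_discrete_dmp(w, g=1.0, y0=0.0, tau=1.0, dt=0.001, steps=1000,
                          alpha_z=25.0, beta_z=6.25, alpha_s=4.0):
    N = len(w)
    mu = np.linspace(0.0, 1.0, N)        # equally spaced centers in the range of s
    h = 0.5 * (mu[1] - mu[0])            # bandwidth chosen so the Gaussians overlap
    y, z, s = y0, 0.0, 1.0
    traj = []
    for _ in range(steps):
        psi = np.exp(-0.5 * ((s - mu) / h) ** 2)
        f = np.dot(psi, w) / psi.sum() * s   # forcing term of Eq. 2, vanishes as s -> 0
        z += dt / tau * (alpha_z * (beta_z * (g - y) - z) + f)  # Eq. 1
        y += dt / tau * z
        s += dt / tau * (-alpha_s * s)       # phase dynamics: tau * s_dot = -alpha_s * s
        traj.append(y)
    return np.array(traj)

traj = simulate_discrete_dmp(np.zeros(10))   # with w = 0 the stable attractor pulls y to g
```

With all weights zero the forcing term vanishes and the trajectory simply converges to the goal state *g*; learning the weights shapes the transient.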

For rhythmic movements periodic activation functions are used (Ijspeert et al., 2002). The non-linear function *f* reads

$$f(\phi) = \frac{\sum\_{n=1}^{N} \Psi\_n(\phi) w\_n}{\sum\_{n'=1}^{N} \Psi\_{n'}(\phi)},\tag{3}$$

where the periodic phase angle is denoted by φ ∈ [0, 2π]. In Ijspeert et al. (2002) an additional scalar variable was used to scale the amplitude of the oscillator; it is omitted here for simplicity. The basis functions are given by

$$\Psi\_n(\phi) = \exp\left(h\_n \left(\cos(\phi - \mu\_n) - 1\right)\right), \quad \tau\dot{\phi} = 1,$$

which implement von Mises basis functions. Note that for the periodic basis functions the trajectory in Equation 1 oscillates around the attractor point or goal state *g*.
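As a quick illustration of Equation 3, the snippet below evaluates the normalized von Mises forcing term at a given phase angle; the number of basis functions (*N* = 8), the bandwidth *h* = 2, and the unit weights are illustrative assumptions.

```python
import numpy as np

def rhythmic_forcing(phi, w, h=2.0):
    """Normalized von Mises forcing term f(phi) of Eq. 3."""
    N = len(w)
    mu = np.linspace(0.0, 2.0 * np.pi, N, endpoint=False)  # centers on the circle
    psi = np.exp(h * (np.cos(phi - mu) - 1.0))             # von Mises bases, max 1 at phi = mu
    return np.dot(psi, w) / psi.sum()

# f is 2*pi-periodic: the same phase angle always yields the same value
f0 = rhythmic_forcing(0.3, np.ones(8))
f1 = rhythmic_forcing(0.3 + 2.0 * np.pi, np.ones(8))
```

Because the bases depend on φ only through cos(φ − μ<sub>*n*</sub>), the forcing term is periodic by construction, which is what keeps the oscillation around the goal state stable.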

Integrating the dynamical system in Equation 1 for each DoF results in a desired trajectory **y**<sup>∗</sup><sub>*t*</sub>, **y**˙<sup>∗</sup><sub>*t*</sub> of the joint angles. To follow this trajectory, in the simplest case a linear feedback controller is subsequently used to generate appropriate control commands denoted by **u**<sub>*t*</sub>:

$$\mathbf{u}\_t = \text{diag}(\mathbf{k}\_{\text{pos}})(\mathbf{y}\_t^\* - \mathbf{y}\_t) + \text{diag}(\mathbf{k}\_{\text{vel}})(\dot{\mathbf{y}}\_t^\* - \dot{\mathbf{y}}\_t). \tag{4}$$

For each actuator the linear weights **W** = [**w**<sub>1</sub>, ..., **w**<sub>*D*</sub>] as well as the control gains **k**<sub>pos</sub> and **k**<sub>vel</sub> have to be specified, i.e., **θ** = [**W**, **k**<sub>pos</sub>, **k**<sub>vel</sub>]. This results in *ND* + 2*D* parameters for the movement representation, where *D* denotes the number of actuators or muscles of a system. The simulated trajectory is denoted by **y**<sub>*t*</sub>, **y**˙<sub>*t*</sub>.

In multi-task learning we want to learn *k* = 1..*K* tasks simultaneously. For very simple tasks, such as the via-point experiments described below, adapting the goal state *g* could be sufficient. However, this is usually not the case for more complex motor skill learning tasks in robotics. With DMPs, typically for each motor skill an individual movement parametrization **θ**<sub>*k*</sub> has to be learned. However, if we assume similarities among these tasks, the learning problem could potentially be simplified by reusing shared knowledge. Inspired by experimental findings in biology (d'Avella et al., 2003; Bizzi et al., 2008; d'Avella and Pai, 2010), we extend these DMPs. Only the parametrization of the non-linear function *f*(*s*) for discrete movements or *f*(φ) for rhythmic movements changes; the dynamical system in Equation 1 and the linear feedback controller in Equation 4 remain the same.

### **2.2. DYNAMIC MOVEMENT PRIMITIVES WITH SHARED SYNERGIES (DMPSynergies)**

For learning the *k*th task, we propose to use a linear combination of temporally flexible basis functions or synergies to parametrize the non-linear function *f*(*s*) in Equation 2 or, for rhythmic movements, *f*(φ) in Equation 3:

$$f(s,k) = \sum\_{m=1}^{M} \beta\_{m,k} \Lambda\left(s, \boldsymbol{\theta}\_m, \Delta s\_{m,k}\right) s,\tag{5}$$

$$f(\phi, k) = \sum\_{m=1}^{M} \beta\_{m,k} \Omega\left(\phi, \boldsymbol{\theta}\_m, \Delta s\_{m,k}\right),\tag{6}$$

where *s* denotes the phase variable, which is only used for discrete movements. As with DMPs (Ψ<sub>*n*</sub> in Equation 2), the functions Λ(·) and Ω(·) are different for discrete and rhythmic movements.

All *K* tasks share *m* = 1..*M* synergies, which are parametrized via the vector **θ**<sub>*m*</sub>. Solely the weights β<sub>*m*,*k*</sub> and the time-shifts Δ*s*<sub>*m*,*k*</sub> are individual parameters for each task. The basic concept of the model is sketched in **Figure 1** for a one-dimensional discrete movement.

The complexity of each synergy is controlled by the number of Gaussians for discrete movements or by the number of von Mises basis functions for rhythmic patterns. We denote this number by *N*, where in both cases we parametrize the amplitude, the mean, and the bandwidth. Thus, each synergy is represented by a parameter vector **θ**<sub>*m*</sub> = [*a*<sub>*m*,1</sub>, μ<sub>*m*,1</sub>, *h*<sub>*m*,1</sub>, ..., *a*<sub>*m*,*N*</sub>, μ<sub>*m*,*N*</sub>, *h*<sub>*m*,*N*</sub>].

For discrete movements the function Λ(·) reads

$$\Lambda\left(s,\boldsymbol{\theta}\_m,\Delta s\_{m,k}\right) = \sum\_{n=1}^{N} a\_{m,n} \exp\left(-\frac{1}{2h\_{m,n}} \left(s - \mu\_{m,n} + \Delta s\_{m,k}\right)^2\right). \tag{7}$$

For rhythmic movements a superposition of von Mises basis functions is used

$$\Omega\left(\phi,\boldsymbol{\theta}\_m,\Delta s\_{m,k}\right) = \sum\_{n=1}^{N} a\_{m,n} \exp\left(h\_{m,n}\left(\cos\left(\phi-\mu\_{m,n}+\Delta s\_{m,k}\right)-1\right)\right). \tag{8}$$

**FIGURE 1 | Conceptual idea of using shared synergies in dynamical systems. (A)** A synergy is constructed by a superposition of parametrized Gaussians. The parameters are the amplitude *a*<sub>*m*,*n*</sub>, the mean μ<sub>*m*,*n*</sub>, and the bandwidth *h*<sub>*m*,*n*</sub>. In the example two Gaussians (*n* = 1..2) are used to model the first (*m* = 1) synergy. **(B)** For each task only the activation β<sub>*m*</sub> of a synergy is learned. Time-varying synergies additionally implement a time-shift Δ*s*<sub>*m*</sub>. The key concept is that multiple tasks share the same parametrized synergies shown in **(A)**, which represent task-related common knowledge. **(C)** For each task the non-linear function *f*(*s*) is given by the weighted sum of the (time-shifted) synergies. Shown is a normalized version of *f*(*s*) to illustrate the effects of the superposition also at the end of the movement, where it would usually converge toward zero. **(D)** Finally, the non-linear function *f*(*s*) is used to modulate a dynamical system. The unperturbed system with *f*(*s*) = 0 is denoted by the dashed line, which is *attracted* by the goal state indicated by the large dot.

DMPs (Schaal et al., 2003) can be modeled as a special case of this formulation. For DMPs using *n* = 1..*N* basis functions, the mean μ<sub>*m*,*n*</sub> and the bandwidth *h*<sub>*m*,*n*</sub> of the basis functions are fixed as discussed in Section 2.1. Solely the *n* = 1..*N* amplitudes or weights *a*<sub>*m*,*n*</sub> are learned. By fixing these parameters and by modeling the non-linear function *f*(*s*) for discrete movements or *f*(φ) for rhythmic movements using a single (*M* = 1) synergy, our representation can be used to implement DMPs.
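The sharing mechanism of Equations 5 and 7 can be sketched in a few lines of Python: *M* = 2 synergies are reused by two tasks through task-specific weights β<sub>*m*,*k*</sub> and time-shifts Δ*s*<sub>*m*,*k*</sub>. All numerical values below are illustrative assumptions, not parameters from the paper.

```python
import numpy as np

def synergy(s, a, mu, h, delta_s):
    """Lambda(s, theta_m, delta_s_{m,k}): a sum of N parametrized Gaussians (Eq. 7)."""
    return float(np.sum(a * np.exp(-(s - mu + delta_s) ** 2 / (2.0 * h))))

def f_task(s, theta, beta_k, delta_k):
    """f(s, k) = sum_m beta_{m,k} * Lambda(s, theta_m, delta_s_{m,k}) * s (Eq. 5)."""
    return sum(b * synergy(s, th[0], th[1], th[2], d)
               for b, th, d in zip(beta_k, theta, delta_k)) * s

# M = 2 shared synergies, each parametrized by theta_m = (amplitudes, means, bandwidths)
theta = [(np.array([1.0, 0.5]), np.array([0.2, 0.6]), np.array([0.01, 0.01])),
         (np.array([0.8, 0.3]), np.array([0.4, 0.8]), np.array([0.01, 0.01]))]

# two tasks reuse the same synergies with individual weights and time-shifts
f1 = f_task(0.5, theta, beta_k=[1.0, 0.5], delta_k=[0.0, 0.0])
f2 = f_task(0.5, theta, beta_k=[0.2, 1.5], delta_k=[0.05, -0.05])
```

Only the two weights and two shifts differ between tasks; the 6*N* synergy parameters in `theta` are learned once and shared, which is the source of the dimensionality reduction.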

# *2.2.1. Multi-dimensional systems*

For multi-dimensional systems, for each actuator *d* = 1..*D* an individual dynamical system in Equation 1 and hence an individual function *f*(*s*, *k*) in Equation 5 or *f*(φ, *k*) in Equation 6 is used (Schaal et al., 2003). The phase variable *s* or φ is shared among all DoFs (note that *k* = 1..*K* denotes the task).

Extending our notation for multi-dimensional systems the non-linear function *f*(*s*, *k*) in Equation 5 can be written as

$$\underbrace{f(s,d,k)}\_{1\times 1} = \sum\_{m=1}^{M} \underbrace{\beta\_{m,k,d}}\_{1\times 1} \underbrace{\Lambda\left(s,\boldsymbol{\theta}\_{m,d},\Delta s\_{m,k,d}\right)}\_{1\times 1} s.$$

Depending on the dimension *d*, different weights β<sub>*m*,*k*,*d*</sub>, policy vectors **θ**<sub>*m*,*d*</sub>, and time-shift parameters Δ*s*<sub>*m*,*k*,*d*</sub> are used. Note that the policy vector **θ**<sub>*m*,*d*</sub> is task-independent. Interestingly, when additionally implementing dimension-independent policy vectors **θ**<sub>*m*</sub>, anechoic mixing coefficients (Giese et al., 2009) can be modeled.

Here, we only discuss discrete movement representations; however, the reformulation procedure also applies to rhythmic movement parametrizations. Let us also define a vector notation of **f**(*s*, *k*)

$$\underbrace{\mathbf{f}(s,k)}\_{1\times D} = \sum\_{m=1}^{M} \underbrace{\boldsymbol{\beta}\_{m,k}}\_{1\times D} \circ \underbrace{\mathbf{w}\_m \left(s, \boldsymbol{\theta}\_{m,1\dots D}, \Delta s\_{m,k,d}\right)}\_{1\times D},\tag{9}$$

where the symbol ◦ denotes the Hadamard product, the elementwise multiplication of vectors. The synergy vectors are specified by

$$\mathbf{w}\_{m}\left(s,\boldsymbol{\theta}\_{m,1\dots D},\Delta s\_{m,k,d}\right) = \left[\Lambda\left(s,\boldsymbol{\theta}\_{m,1},\Delta s\_{m,k,1}\right)s,\;\Lambda\left(s,\boldsymbol{\theta}\_{m,2},\Delta s\_{m,k,2}\right)s,\;\dots,\;\Lambda\left(s,\boldsymbol{\theta}\_{m,D},\Delta s\_{m,k,D}\right)s\right].$$

This vector notation is used in the following to compare to existing synergies representations (d'Avella et al., 2003, 2006).

## **2.3. MUSCULOSKELETAL MODELS AND MUSCLE SYNERGIES**

We also use the proposed movement representation, the DMPSynergies, to generate muscle excitation patterns. These patterns are applied as input in a forward simulation of a musculoskeletal model. A schematic overview of such a simulation is shown in **Figure 2**. We briefly discuss all processes involved.

**Muscle synergies for generating muscle excitation patterns** are used as input in forward dynamics simulations. In our simulation experiments we evaluate time-varying synergies (d'Avella et al., 2006), which are a particular instance of the DMPSynergies, i.e., the weights β<sub>*m*,*k*</sub> and time-shift parameters Δ*s*<sub>*m*,*k*</sub> in Equation 9 are independent of the dimension *d*. Thus, for discrete movements in multi-dimensional systems *f*(*s*, *k*) reads

$$\underbrace{\mathbf{f}(s,k)}\_{1\times D} = \sum\_{m=1}^{M} \underbrace{\beta\_{m,k}}\_{1\times 1} \underbrace{\mathbf{w}\_m \left(s + \Delta s\_{m,k}, \boldsymbol{\theta}\_{m,1\dots D}, \mathbf{0}\right)}\_{1\times D},\tag{10}$$

where β<sub>*m*,*k*</sub> is a scalar and the time-shift parameter Δ*s*<sub>*m*,*k*</sub> is directly added to the phase variable *s*. This allows for a comparison to, e.g., the formulation of time-varying synergies given in d'Avella et al. (2006), where

$$\mathbf{x}(t,k) = \sum\_{m=1}^{M} a\_m^k \mathbf{v}\_m \left( t - t\_m^k \right).$$

Shared synergies are represented by time-dependent vectors **v**<sub>*m*</sub>(*t* − *t*<sub>*m*</sub><sup>*k*</sup>), where, in contrast to the proposed DMPSynergies, a minor difference is the sign of the time-shift parameter *t*<sub>*m*</sub><sup>*k*</sup>.

In this formulation of time-varying synergies (d'Avella et al., 2006) only the time-invariant combination coefficients *a*<sub>*m*</sub><sup>*k*</sup> are task-dependent, whereas the vector **v**<sub>*m*</sub> is task-independent. However, by using task-, spatially, or temporally (in)variant implementations of the mixing coefficients *a* or the basis vectors **v**, other representations of synergies (d'Avella et al., 2003; Ivanenko et al., 2004; Giese et al., 2009) can also be implemented.

**Activation dynamics** model the delayed force-generating process in muscles, as they are not capable of generating force instantaneously. Typically, for each muscle a first-order differential equation is used, i.e., $\dot{a} = (f(s,k)^2 - f(s,k)\,a)/\tau\_{\text{rise}} + (f(s,k) - a)/\tau\_{\text{fall}}$ (Zajac, 1989).

Here, *f*(*s*, *k*) denotes the generated muscle excitation signal using, e.g., the proposed DMPSynergies. The actual muscle activation is denoted by *a*, which is a function of the rise time constant τ<sub>rise</sub> and the fall time constant τ<sub>fall</sub>. For our evaluations we implemented τ<sub>rise</sub> = 10 ms and τ<sub>fall</sub> = 40 ms (Winters and Stark, 1985).
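A forward-Euler sketch of these activation dynamics in Python, using the rise and fall time constants from the text; the constant excitation input and the integration step are illustrative assumptions.

```python
import numpy as np

def activation_trace(u, a0=0.0, tau_rise=0.010, tau_fall=0.040, dt=0.0005, steps=1000):
    """Integrate a_dot = (u^2 - u*a)/tau_rise + (u - a)/tau_fall for a constant excitation u."""
    a, trace = a0, []
    for _ in range(steps):
        a += dt * ((u * u - u * a) / tau_rise + (u - a) / tau_fall)
        trace.append(a)
    return np.array(trace)

trace = activation_trace(u=1.0)   # a step to full excitation, simulated for 0.5 s
# the activation lags the excitation step but converges toward u
```

For a step in excitation the activation rises with the fast time constant and, when the excitation drops, decays with the slower one, which is the asymmetry the two τ values encode.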

**Muscle tendon dynamics** describe the complex and non-linear force generation properties of muscles. A variety of approximate models exist (Zajac, 1989). In these models a muscle is approximated by a number of musculotendinous units, each of which is implemented by a Hill-type contractile element in series with a tendon. Characteristic properties of muscles are the optimal fiber length *L*<sub>0</sub><sup>*M*</sup>, the maximum isometric force *F*<sub>0</sub><sup>*M*</sup>, and the muscle pennation angle α, which are shown in **Table A5** in the appendix for the investigated model of a human arm. The tendon dynamics in this musculoskeletal model were approximated by the muscle model proposed in Schutte et al. (1993).

**Musculoskeletal geometry** represents the path of a muscle from its origin to its insertion, which can be implemented as a series of straight-line path segments that pass through a series of via points (Delp et al., 2007). To simulate how muscles wrap over underlying bone and musculature, wrapping surfaces, i.e., cylinders, spheres, and ellipsoids, are implemented; this model is based on the upper extremity model discussed in Holzbaur et al. (2005). A detailed description of the implemented musculoskeletal geometry is given in the supplement (in the form of a simulation model file, .osim).

**Multibody dynamics** are simulated by the physics simulation application OpenSim (Delp et al., 2007; Seth et al., 2011). It is an open-source software package that already implements a variety of muscle models (Zajac, 1989), and a large number of musculoskeletal models are freely available. In our experiments the computational time needed to simulate a movement with a duration of, e.g., 500 ms is between 10 and 20 s (OpenSim implements numerical integrators with an adaptive time step) on a standard computer (3 GHz and 4 GB memory). However, we exploited parallel computing techniques for policy search, which resulted in a speed-up by a factor of 10. Alternatively, the muscle dynamics could be approximated via regression methods to speed up the simulations (Chadwick et al., 2009).

# **2.4. LEARNING WITH MOVEMENT PRIMITIVES**

We denote the parametrization of a movement primitive by a policy vector **θ**. A widely used approach in robotics to learn these parameters is episodic reinforcement learning (Kober and Peters, 2011), which is outlined in **Figure 3A**. A policy search method is used to improve the movement primitive's representation **θ** given an objective or reward function *C*(τ) ∈ ℝ. Throughout this manuscript *C*(τ) denotes a cost value, equivalent to the negative reward in classical reinforcement learning (Sutton and Barto, 1998), which indicates the quality of an executed movement. A trajectory τ = {**y**<sub>1:*T*</sub>, **u**<sub>1:*T*−1</sub>} is specified by the simulated joint angles **y** and the applied controls (torques) **u**, where *T* denotes the number of time steps. We want to find a movement primitive's parameter vector **θ**<sup>∗</sup> = argmin<sub>**θ**</sub> *J*(**θ**) which minimizes the expected costs *J*(**θ**) = E[*C*(τ)|**θ**]. We assume that we can evaluate the expected costs *J*(**θ**) for a given parameter vector **θ** by performing roll-outs (samples) on the real or simulated system. In other words, each movement trajectory is quantified by a single scalar cost *C*(τ), which can be used by an optimization method to improve the best guess of the movement policy **θ**.

For learning or optimizing the policy parameters **θ**, a variety of policy search algorithms exist in the motor control literature. Examples are REINFORCE (Williams, 1992), the episodic Natural Actor Critic (Peters and Schaal, 2008), PoWER (Kober and Peters, 2011), or PI<sup>2</sup> (Theodorou et al., 2010), which are reviewed in Kober and Peters (2011). Alternatively, standard optimization tools such as second-order stochastic search methods (Hansen et al., 2003; Wierstra et al., 2008; Sehnke et al., 2010) can be used for policy search. These machine learning tools make no assumptions on a specific form of a policy and typically have just a single parameter to tune, the initial exploration rate. We therefore use the stochastic search method Covariance Matrix Adaptation (CMA) (Hansen et al., 2003) for learning the policy parameters in our experiments.

**FIGURE 3 | Overview of the learning framework. (A)** A parametrized policy **θ** modulates the output of a movement primitive that is used to generate a movement trajectory τ. The quality of the movement trajectory is indicated by a sparse reward signal *C*(τ), which is used for policy search to improve the parameters of the movement primitive. A single iteration of the implemented policy search method, Covariance Matrix Adaptation (CMA) (Hansen et al., 2003), is sketched in **(B)**. The parameter space is approximated using a multivariate Gaussian distribution denoted by the ellipses, which is updated (from left to right) using second-order statistics of roll-outs or samples that are denoted by the large dots (see text for details).

Roughly, CMA is an iterative procedure that locally approximates the function *C*(τ(**θ**)) by a multivariate Gaussian distribution, denoted by the ellipse in the sketch in **Figure 3B**. From left to right, a single optimization step for a two-dimensional policy vector **θ** = [*w*<sub>1</sub>, *w*<sub>2</sub>] is shown. The colored regions denote the unknown optimization landscape, where solid lines depict equal *C*(τ) values. From the current Gaussian distribution, denoted by the ellipse in the left panel, CMA generates a number of samples, denoted by the black dots, evaluates them (the size of the dots in the center panel is proportional to their *C*(τ) values), computes second-order statistics of those samples that reduced *C*(τ), and uses these to update the Gaussian search distribution, shown in the right panel. For algorithmic details we refer to Hansen et al. (2003).

Note that for most interesting robotic tasks the unknown optimization landscape, as sketched in **Figure 3B**, is multimodal and policy search might converge to a local optimum. Thus, the result of learning is sensitive to the initial policy parameters **θ**, and for evaluating the convergence rate of different policy search methods multiple initial configurations should be considered (Kober and Peters, 2011). However, in this manuscript we evaluate the characteristics of movement primitive representations and put less emphasis on a particular policy search method. As we will demonstrate in our experiments, CMA is robust in terms of converging to "good" solutions given the initial values of the evaluated movement primitive representations listed in the appendix.

In our experiments we compare single-task learning with DMPs to learning multiple tasks simultaneously with DMPSynergies. With DMPs, for each task *k* = 1..*K* an individual policy vector **θ**<sub>*k*</sub> is learned, where the objective function used in policy search takes the task index as additional argument, i.e., *C*(τ, *k*). For learning multiple tasks simultaneously with DMPSynergies, the policy vector **θ** encodes all *K* task-specific parameters β<sub>*m*,*k*</sub> and *s*<sub>*m*,*k*</sub>, and all shared parameters denoted by θ<sub>*m*</sub> in Equation 5 or Equation 6. The objective function is the sum of the individual task-dependent costs, *C*(τ) = Σ<sup>*K*</sup><sub>*k*=1</sub> *C*(τ, *k*).

### **3. RESULTS**

We evaluated the proposed movement representation, the DMPSynergies, with simulations using three multi-task learning scenarios. A simple via-point task was used to illustrate the characteristics of the proposed movement representation. A challenging robotic learning task was used to generate rhythmic walking patterns for multiple step heights. Discrete reaching movements were learned using a musculoskeletal model of a human arm with eleven muscles.

### **3.1. VIA-POINT REACHING TASK WITH A SIMPLE TOY MODEL**

The goal of this simple multi-task learning problem is to pass through *k* = 1..5 via-points (vp<sub>*k*</sub> ∈ {0.2, 0.1, 0, −0.1, −0.2}), denoted by the large dots in **Figure 4A**, and navigate to the goal state *g* at 1. We used a point mass system (1 kg), where the state at time *t* is given by the position *y*<sub>*t*</sub> and the velocity *ẏ*<sub>*t*</sub>. The applied controls *u*<sub>*t*</sub> shown in **Figure 4B** are computed using the linear feedback control law with *k*<sub>pos</sub> = 400 and *k*<sub>vel</sub> = 15 specified in Equation 4. The finite time horizon is given by *T* = 50. For the dynamical system in Equation 1 we used the parameters α<sub>*z*</sub> = 2, β<sub>*z*</sub> = 0.9 and τ = 0.1. Further parameter settings used for policy search are summarized in **Table A1** in the appendix.
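A rollout of the point mass under the linear feedback law of Equation 4 can be sketched as follows. Only the gains *k*<sub>pos</sub> = 400 and *k*<sub>vel</sub> = 15 and the 1 kg mass are taken from the text; the Euler integration and step size `dt` are simulation choices of this sketch:

```python
def rollout(y_des, dt=0.01, k_pos=400.0, k_vel=15.0, mass=1.0):
    """Simulate a 1 kg point mass tracking a desired position trajectory
    with the PD feedback law u_t = k_pos*(y* - y) + k_vel*(yd* - yd)."""
    y, yd = 0.0, 0.0
    traj = []
    for t in range(len(y_des)):
        # finite-difference estimate of the desired velocity
        yd_des = (y_des[t] - y_des[t - 1]) / dt if t > 0 else 0.0
        u = k_pos * (y_des[t] - y) + k_vel * (yd_des - yd)  # feedback control
        yd += (u / mass) * dt                               # point-mass dynamics
        y += yd * dt
        traj.append(y)
    return traj

# Track a constant desired position of 1.0 (the goal state g of the task).
traj = rollout([1.0] * 200)
```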

This task is specified by the objective function

$$\begin{split} C(k) &= 10^5 \left( y_{t_{\mathrm{vp}_k}} - \mathrm{vp}_k \right)^2 + 10^4 \left( \dot{y}_T^2 + 10 \left( y_T - g \right)^2 \right) \\ &\quad + 5 \cdot 10^{-4} \sum_{t=1}^T u_t. \end{split}$$

The first two terms punish deviations from the via-point vp<sub>*k*</sub> and the goal state *g*, where *y*<sub>*t*<sub>vp<sub>*k*</sub></sub></sub> denotes the position of the state at the time index of the *k*th via-point. The last term punishes high energy consumption, where *u*<sub>*t*</sub> denotes the applied acceleration. Note that for simplicity we dropped the variable τ denoting the movement trajectory in *C*(τ, *k*) from Subsection 2.4 and write *C*(*k*). We always add a Gaussian noise term with a standard deviation of σ = 0.5 to the control action to simulate motor noise.
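A direct transcription of this objective might look as follows; only the cost weights are taken from the equation above, while the trajectory data in the example are hypothetical placeholders:

```python
def viapoint_cost(y, yd, u, t_vp, vp, g=1.0):
    """Via-point objective: penalize the squared deviation from the k-th
    via-point at its time index, a non-zero final velocity and distance to
    the goal g, and the summed controls (energy term)."""
    return (1e5 * (y[t_vp] - vp) ** 2
            + 1e4 * (yd[-1] ** 2 + 10.0 * (y[-1] - g) ** 2)
            + 5e-4 * sum(u))

# A made-up trajectory that hits the via-point 0.2 at t_vp = 25 and ends
# at the goal at rest with zero controls, so every penalty term vanishes.
T = 50
y = [0.2 if t == 25 else (1.0 if t == T else 0.0) for t in range(T + 1)]
yd = [0.0] * (T + 1)
u = [0.0] * T
c = viapoint_cost(y, yd, u, t_vp=25, vp=0.2)
```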

We used a single synergy (*M* = 1) with *N* = 2 Gaussians to model the shared prior knowledge. The learning curve is shown in **Figure 4C**, where we compare to single-task learning using DMPs with *N* = 8 Gaussian basis functions. For the via-point task 8 Gaussians were optimal with respect to the convergence rate, where we evaluated representations using *N* = 2..20 Gaussians (not shown). Additionally, we compare to an incremental learning setup (DMP inc.) in **Figure 4C**, where the DMP representation is always initialized with the best learned solution from the previous task. On the x-axis the number of samples or trajectory evaluations on the point mass system is plotted. As can be seen, the proposed approach benefits from the shared knowledge and has a faster overall learning performance.

In this experiment, for each task we fixed the time-shift *s*<sub>*k*</sub> = 0 and only learned the *k* = 1..5 weights β<sub>*k*</sub> in Equation 5 (note that the synergy index *m* is omitted, as only a single synergy was used). For each of the *N* = 2 Gaussians we learned the mean μ, the bandwidth *h*, and the amplitude *a* in Equation 7. Thus, in total 5 + 2 × 3 = 11 parameters were learned. In contrast, with DMPs only the 8 Gaussian amplitudes were optimized per task.

The β values of the DMPSynergies representation for the five via-points are shown in **Figure 4D** for 10 runs. New motor skills can be generated without re-learning via a simple linear interpolation. The resulting trajectories are shown in **Figure 4E**. However, this is only the case in the simple via-point task. For more complex tasks these β values have to be learned.
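The interpolation mechanism can be sketched as follows: the learned synergy (a sum of *N* = 2 Gaussian basis functions) is rescaled by an interpolated weight β to produce a new modulation signal without re-learning. The Gaussian parameters below are made-up placeholders, not the learned values; only the β range [0.07, 0.34] is taken from the figure:

```python
import math

def synergy(phi, params):
    """Synergy value at phase phi; params holds (amplitude a, mean mu,
    bandwidth h) for each of the N Gaussian basis functions."""
    return sum(a * math.exp(-h * (phi - mu) ** 2) for a, mu, h in params)

def modulation(phi, beta, params):
    """Task-specific modulation: the shared synergy scaled by weight beta."""
    return beta * synergy(phi, params)

params = [(1.0, 0.3, 40.0), (-0.5, 0.7, 40.0)]   # placeholder N = 2 Gaussians
beta_a, beta_b = 0.07, 0.34          # learned weights of two via-point skills
beta_new = 0.5 * (beta_a + beta_b)   # linear interpolation -> new skill
f_new = [modulation(t / 100.0, beta_new, params) for t in range(101)]
```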

**FIGURE 4 | Results for the dynamic via-point task.** The goal of this simple multi-task learning problem is to pass through five via-points, denoted by the large dots in **(A)**, and navigate to the target state at 1. The corresponding controls (accelerations) of this dynamical system are shown in **(B)**. These five trajectories are simultaneously learned using DMPSynergies with a single synergy (*M* = 1) represented by *N* = 2 Gaussians. We compare to dynamic movement primitives (DMPs) with *N* = 8 Gaussians and to an incremental variant of DMPs in **(C)**. For the DMP approaches each task (via-point) has to be learned separately; thus, the two learning curves have five peaks. In contrast, with DMPSynergies we could learn these five tasks at once, which resulted in faster overall convergence. The plot in **(D)** illustrates the mean and the standard deviation of the learned β values for the DMPSynergy approach. By interpolating β and reusing the learned synergy, new motor skills can be generated without re-learning. This is illustrated in **(E)**, where β ∈ [0.07, 0.34].

# **3.2. DYNAMIC BIPED WALKER TASK**

To evaluate the DMPSynergies on a multi-dimensional robotic task we learned multiple walking patterns using a 5 degree-of-freedom (DoF) dynamic biped robot model, which is shown in **Figure 5A**. We demonstrate that by exploiting the shared knowledge among multiple walking gaits, solutions could be found more robustly and more efficiently in terms of learning speed compared to single task learning. Further, the shared synergies could be used to generalize new skills. The model is only as complex as required to study difficulties like limb coordination, effective underactuation, hybrid dynamics or static instabilities. More details on the design and challenges can be found in Westervelt et al. (2004).

The 10-dimensional state **q**<sub>*t*</sub> = [*q*<sub>1:5</sub>, *q̇*<sub>1:5</sub>] of the robot is given by the hip angles (*q*<sub>1</sub> and *q*<sub>2</sub>), the knee angles (*q*<sub>3</sub> and *q*<sub>4</sub>), a reference angle to the ground (*q*<sub>5</sub>), and the corresponding velocities *q̇*<sub>1:5</sub>. Only the hip and the knee angles are actuated. Thus, 4 dynamical systems of the form of Equation 1 are used to generate desired trajectories for the linear feedback controller in Equation 4. A phase-resetting strategy is implemented to facilitate learning (Nakanishi et al., 2004). At each impact of the swing leg the phase φ in Equation 8 is set to zero. This increases the stability of the robot, as the gait-cycle duration is implicitly given by the impact time.
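The phase-resetting strategy can be sketched as follows. The time constant τ = 0.06 and the step size of 2 ms are taken from this experiment, while the impact times are hypothetical event indices for illustration:

```python
def phase_trace(n_steps, impact_steps, dt=0.002, tau=0.06):
    """Phase variable phi advancing linearly in time and reset to zero at
    every swing-leg impact, so the gait-cycle duration is given implicitly
    by the impact times."""
    phi, trace = 0.0, []
    for t in range(n_steps):
        if t in impact_steps:
            phi = 0.0            # phase reset at swing-leg impact
        trace.append(phi)
        phi += dt / tau          # canonical phase dynamics
    return trace

# Two hypothetical impacts at steps 200 and 400.
trace = phase_trace(500, impact_steps={200, 400})
```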

The initial state **q**<sub>1</sub> ∈ ℝ<sup>10</sup>, the goal state **g**, and the control gains in Equation 4 were optimized in advance for a desired step height of *r*<sup>∗</sup> = 0.2 m to simplify learning. The resulting values are shown in **Table A2** in the appendix. For rhythmic movements the goal state **g** ∈ ℝ<sup>5</sup> models an attractor point, which is specified in Equation 1 only for joint angles and not for velocities. As for the via-point reaching task, Gaussian noise with σ = 1 is added to the simulated controls. For Equation 1 we used the parameters α<sub>*z*</sub> = 2, β<sub>*z*</sub> = 0.5 and τ = 0.06. The initial parameter values and the applied ranges used for policy search are shown in **Table A3** in the appendix.

**FIGURE 5 | The 5-DoF dynamic biped walker model (A).** Only the hip angles *q*<sub>1</sub>, *q*<sub>2</sub> and the knee angles *q*<sub>3</sub>, *q*<sub>4</sub> are actuated. The reference angle to the flat ground is denoted by *q*<sub>5</sub>. In this multi-task learning experiment we want to learn walking patterns for different step heights. Examples for step heights of 0.15, 0.25, and 0.3 m for a single step are shown in **(B–D)**. These patterns were learned using the proposed movement primitives with shared synergies (*M* = 2 and *N* = 3). The green bars in **(B–D)** denote the true (maximum) step heights, which are 0.19, 0.24, and 0.31 m.

In this multi-task learning experiment we want to learn walking patterns for different desired step heights: *r*<sup>∗</sup><sub>*k*</sub> ∈ {0.15, 0.2, 0.25, 0.3} m. Example patterns for step heights of 0.15, 0.25, and 0.3 m are shown in **Figures 5B–D**, where the green bars denote the maximum step heights during a single step (0.19, 0.24, and 0.31 m).

The objective function for a single walking task is given by the distance travelled in the sagittal plane, the duration of the simulation, and deviations from the desired step height *r*<sup>∗</sup><sub>*k*</sub> with *k* = 1..4:

$$C(k) = -0.6(x_T - x_1) + 0.2(5 - T \cdot \Delta t) + 50 \sum_{i=1}^{S} (r_i - r_k^*)^2,\tag{11}$$

where *x* denotes the x-coordinate of the hip, *S* the number of steps, and *r*<sub>*i*</sub> the maximal step height during the *i*th step. We used a time step of Δ*t* = 2 ms. The time horizon *T* ∈ [1, 5000] is given by the last valid state of the robot, where the biped does not violate the joint angle constraints specified by **q**<sub>min</sub> and **q**<sub>max</sub> in **Table A2** in the appendix.
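A direct transcription of Equation 11 might look like this; the hip trajectory and step heights in the example are hypothetical:

```python
def walker_cost(x_hip, T, step_heights, r_star, dt=0.002):
    """Walker objective (Equation 11): reward travelled distance, penalize
    the episode duration term, and penalize squared deviations of each
    step height r_i from the desired height r_star."""
    distance = x_hip[T - 1] - x_hip[0]       # distance travelled by the hip
    c = -0.6 * distance + 0.2 * (5.0 - T * dt)
    c += 50.0 * sum((r - r_star) ** 2 for r in step_heights)
    return c

# Hypothetical full-length episode (T = 5000, i.e., 10 s at dt = 2 ms) that
# walks forward and matches the desired step height exactly.
x_hip = [0.001 * t for t in range(5000)]
c = walker_cost(x_hip, T=5000, step_heights=[0.2, 0.2], r_star=0.2)
```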

With the proposed DMPSynergies the non-linear function *f*(φ, *k*) in Equation 6 is generated by combining a set of learned synergies that are shared among multiple task instances, i.e., the four (*k* = 1..4) desired step heights. This combination mechanism is illustrated for a representation using *M* = 2 synergies modeled by *N* = 3 Gaussians in **Figure 6**. For each actuator (left hip, right hip, left knee, and right knee) an individual function *f*(φ, *k*) is generated, which is subsequently used to modulate an attractor system shown in Equation 1 to compute the desired movement trajectories. The shared synergies shown in the last two rows in **Figure 6** can be scaled and shifted in time. This is indicated by the enclosing rectangles. Note that the color of the synergies is used to distinguish the four actuators of the walker model.
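The combination mechanism of Equations 5/6 can be sketched as follows: for task *k*, the modulation function *f*(φ, *k*) is a weighted, time-shifted superposition of *M* shared synergies, each modeled by Gaussian basis functions. All Gaussian parameters, weights, and shifts below are illustrative placeholders:

```python
import math

def synergy_value(phi, gaussians):
    """Value of one shared synergy at phase phi; gaussians holds
    (amplitude a, mean mu, bandwidth h) per Gaussian basis function."""
    return sum(a * math.exp(-h * (phi - mu) ** 2) for a, mu, h in gaussians)

def f(phi, beta_k, shift_k, synergies):
    """Task-specific modulation: beta_k[m] scales synergy m and
    shift_k[m] shifts it in phase before the superposition."""
    return sum(b * synergy_value(phi - s, syn)
               for b, s, syn in zip(beta_k, shift_k, synergies))

# M = 2 placeholder synergies with N = 1 Gaussian each.
synergies = [[(1.0, 0.25, 30.0)], [(0.8, 0.6, 30.0)]]
f_val = f(0.5, beta_k=[1.2, 0.7], shift_k=[0.0, 0.1], synergies=synergies)
```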

We evaluated different movement primitive representations with increasing complexity and compared to single-task learning using DMPs with *N* = 4 and *N* = 8 Gaussians. The average final costs *C*<sub>mean</sub> after learning over 10 runs are shown in **Table 1**. In the most simple representation we used *M* = 2 synergies modeled by *N* = 2 Gaussians. More complex representations implementing time-varying synergies are indicated by a flag set to 1 in **Table 1**. Here, additionally the time-shifts *s*<sub>1:*M*</sub> were learned for all synergies and all actuators. However, the final learning performance did not outperform the representation with fixed time-shifts (i.e., *M* = 2, *N* = 3 with fixed shifts: −21.4 ± 0.4 compared to *M* = 2, *N* = 3 with learned shifts: −20.5 ± 1.4). This can also be seen in **Figure 7**, where we plot the learning curve for synergies with *s*<sub>1:*M*</sub> = 0 in **Figure 7A** and the results for time-varying synergies in **Figure 7B**.

**Table 1 | Achieved costs for the walker task, where the standard deviation is denoted by the symbol ±.**

*In the second column a flag denotes whether additionally the 16 · M time-shift variables s are learned. The total number of parameters is denoted by the symbol #, where, e.g., for the representation in the 1st row 4 · M = 8 task-related weights and 3 · M · N · 4 = 48 shared parameters were optimized. The best results with the lowest costs C<sub>mean</sub> are highlighted in gray shading.*

The average final cost value of the DMP representation is higher (i.e., DMP<sub>*N*=8</sub>: −14.8 ± 1.4) compared to the best costs achieved with shared synergies (*M* = 2, *N* = 3 with fixed time-shifts: −21.4 ± 0.4). This also holds for an incremental learning setup (e.g., DMP inc.<sub>*N*=4</sub>: −19.2 ± 0.6), where DMPs were initialized with the best result from the previous task.

The joint angle trajectories of the left hip and knee joint for the DMPSynergy representation using *M* = 2 synergies modeled by *N* = 3 Gaussians with learned time-shifts are illustrated in **Figure 8**. The average step heights were *r* ∈ {0.22, 0.22, 0.26, 0.28}, which do not match the desired step heights *r*<sup>∗</sup> ∈ {0.15, 0.2, 0.25, 0.3}. The reason for this is that the objective function in Equation 11 is designed to prefer correct multi-step walking movements over exact matches of the step heights, since learning to walk is already a complex learning problem (approximately 90% of the costs are determined by the travelled distance and only 5% are caused by the distance to the desired step heights). However, for the different desired step heights the shape of the trajectories as well as the moment of the impact vary. The moments of impact are denoted by arrows in **Figure 8**.

**FIGURE 7 | Learning curves for the biped walker task.** This figure illustrates the learning performance over 10 runs of the proposed approach using *M* = 2 synergies with *N* = 3 Gaussian basis functions. In **(A)** the time-shift variables **s** are not learned and are set to zero, whereas in **(B)** these **s** variables are also adapted during learning. We compare to dynamic movement primitives (DMPs) with *N* = 4 Gaussians in **(A)** and to DMPs with *N* = 8 Gaussians in **(B)**. *DMP inc.* denotes the incremental variant, where the DMP representation is initialized with the best learned solution from the previous task.

While generalizing to new motor skills was straightforward for the simple via-point task, for the walking tasks a linear interpolation turns out to be ineffective. We therefore demonstrate in **Figures 7C,D** how a new walking pattern for a desired step height of *r*<sup>∗</sup> = 0.1 m can be learned by reusing the previously learned prior knowledge (taking the best solution for *r*<sup>∗</sup> = 0.25 m) for *M* = 2 synergies modeled by *N* = 3 Gaussians with learned time-shifts. Only the weights β<sub>1:*M*</sub> are optimized in this experiment, keeping the learned time-shifts fixed. The costs in **Figure 7C** and the average step height *r* in **Figure 7D** demonstrate the advantage of using a fixed prior, where we compare to DMPs with *N* = 8 Gaussians.

### **3.3. MULTI-DIRECTIONAL REACHING TASK WITH A MUSCULOSKELETAL MODEL OF THE HUMAN ARM**

A simplified model of a human arm based on the model by Holzbaur et al. (2005) was used to learn six reaching tasks simultaneously. The shoulder and the elbow joint were modeled as hinge joints; thus, only movements in the sagittal plane were possible. The initial arm configuration and the six target locations (at a distance of 15 cm from a marker placed on the radial stylion) are shown in **Figure 9A**. A learned example movement is illustrated in **Figure 9B**, where the cylinders, spheres, and ellipsoids denote the wrapping surfaces discussed in Subsection 2.3.

We focused on fast reaching movements of 500 ms duration (*T* = 500 and Δ*t* = 1 ms) that can be implemented in an open-loop control scheme. Note that with our approach closed-loop systems with feedback could also be implemented, as discussed below. Thus, the learnable non-linear function **f**(*s*, *k*) in Equation 10 is directly used as input to the system in the form of muscle excitation patterns. The parameter settings for learning are shown in **Table A4** in the appendix.

For learning the reaching tasks we evaluated the Euclidean distance of a marker **v**<sub>*k*</sub>(*t*) placed on the radial stylion to a given target **g**<sub>*k*</sub>, where *k* = 1..6 denotes the task index. Additionally, large muscle excitation signals are punished:

$$C(k) = 3 \cdot \frac{1}{T} \sum_{t=1}^{T} \|\mathbf{g}_k - \mathbf{v}_k(t)\| + 10^{-3} \int_{0}^{1} \mathbf{f}(s,k)^T \mathbf{f}(s,k)\, ds,\tag{12}$$

where ‖·‖ denotes the Euclidean distance between the marker **v**<sub>*k*</sub>(*t*) and the target **g**<sub>*k*</sub> at time *t*.
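A discretized sketch of this objective is given below; the integral over the excitations is approximated by a sum, and the marker and excitation data in the example are placeholders:

```python
import math

def reach_cost(marker, target, excitations, dt=0.001):
    """Reaching objective (Equation 12, discretized): mean Euclidean
    distance of the marker to the target plus a small penalty on the
    squared muscle excitations of all muscles."""
    T = len(marker)
    dist = sum(math.dist(m, target) for m in marker) / T
    effort = dt * sum(sum(e * e for e in u) for u in excitations)
    return 3.0 * dist + 1e-3 * effort

# A marker resting on the target with silent muscles incurs zero cost.
c = reach_cost([(0.1, 0.2)] * 500, (0.1, 0.2), [[0.0] * 11] * 500)
```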

We evaluated five movement representations, defined in Equation 10, with an increasing number of shared synergies, i.e., *M* = {1, 2, 3, 4, 5}. Each synergy is represented by a single (*N* = 1) Gaussian. For each target and for each synergy the task-specific parameters β<sub>*k*,*m*</sub> and *s*<sub>*k*,*m*</sub> are learned. The number of task-specific and the number of task-invariant or shared parameters are shown in **Table 2**.

**FIGURE 9 | Musculoskeletal model for learning reaching tasks.** A model of a human arm with eleven muscles shown in **Table A5** in the appendix was used to learn six reaching skills in the sagittal plane **(A)**. As reward signal we encoded the distance to a marker placed on the radial stylion (denoted by the *plus* symbol) and punished large muscle excitation signals. Targets are denoted by large dots. We focused on fast reaching skills of 500 ms duration, where an example movement is shown in **(B)**. To simulate how muscles wrap over underlying bone and musculature, wrapping surfaces are implemented as cylinders, spheres, and ellipsoids (Holzbaur et al., 2005).

**Table 2 | Details of the evaluated parametrizations and achieved costs for the reaching task.**


*M denotes the number of implemented synergies, C<sub>mean</sub> the final cost values, and ± the standard deviation. We use #<sup>K</sup> to denote the number of task-specific parameters (6 · M) and #<sup>M</sup> to denote the number of task-invariant or shared parameters (3 · 11 · M).*

We hypothesized that a muscle excitation signal can be generated by combining a small number of learned synergies. An example for the anterior deltoid muscle (DeltA) is shown in **Figure 10** for two movement directions. Here, DMPSynergies with *M* = 4 synergies were used to generate the muscle excitation patterns. The muscle excitation patterns for all six movement directions and all eleven muscles are shown in **Figure 11**. Two observations can be made: first, as our objective function in Equation 12 punishes large muscle excitation signals, a sparse representation of multiple motor skills is learned. Second, the learned muscle patterns partially show the typical triphasic behavior of human movement (Angel, 1975; Hallett et al., 1975; Berardelli et al., 1996; Chiovetto et al., 2010), where individual muscles (e.g., DeltA, PectClav, and BRA in the first column in **Figure 11**) become activated at the onset of the movement, shortly before the velocity peak to decelerate, and finally, multiple muscles co-contract at the target location. These three events are denoted by the labels 1, 2, and 3 in the last row in **Figure 11**, where a threshold of 2 cm s<sup>−1</sup> was used to determine the movement onset and the termination of the movement.
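The velocity-threshold segmentation behind these labels can be sketched as follows; the bell-shaped speed profile below is made up for illustration, while the 2 cm/s threshold is taken from the text:

```python
import math

def onset_offset(speed, threshold=0.02):
    """Movement onset is the first sample where the tangential velocity
    (in m/s) exceeds the threshold; termination is the last such sample."""
    above = [t for t, v in enumerate(speed) if v > threshold]
    return (above[0], above[-1]) if above else (None, None)

# Made-up bell-shaped tangential velocity profile peaking mid-movement.
speed = [0.5 * math.exp(-((t - 250) / 80.0) ** 2) for t in range(500)]
on, off = onset_offset(speed)
```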

Comparing all five movement representations (*M* = {1, 2, 3, 4, 5}), we found that at least three synergies were necessary to accomplish all reaching tasks. This is shown in **Figure 12A**, where with only one (*M* = 1) or two synergies (*M* = 2) not all targets can be reached. Shown are the marker trajectories of three independent learning sessions (out of ten runs). Note that similar findings were obtained in analyzing human arm reaching movements, where four to five synergies were observed (d'Avella et al., 2006). The corresponding learning curves for all five movement representations are shown in **Figure 12B**, where the parametrizations with *M* = 3..5 synergies perform equally well. This is also reflected in the final costs shown in **Table 2** (rows 3..5). As an example, the marker trajectories and the tangential velocity profiles for the representation using *M* = 4 synergies are illustrated in **Figure 12C**. As we evaluated an open-loop control scheme, these marker trajectories did not exactly terminate at the target location (after the limited number of episodes for learning). However, by increasing the number of episodes or by adding feedback the terminal accuracy could be improved.

**FIGURE 10 | Synergy combination mechanism.** We hypothesize that a muscle excitation signal can be generated by combining a small number of learned synergies. Here, we illustrate this combination process for the deltoid anterior (DeltA) with four synergies for two movement directions. For the two movement directions different combination coefficients β<sub>*m*,*k*</sub> and different time-shift parameters *s*<sub>*m*,*k*</sub> were learned. The synergies are represented by a single parametrized Gaussian, where the corresponding basis function for DeltA is denoted by a bold line in the enclosing rectangles.

For testing the generalization ability of DMPSynergies we rotated all six targets by 30 degrees and only re-learned the task-specific coefficients, i.e., the mixing coefficients β<sub>*m*,*k*</sub> and the time-shift parameters *s*<sub>*m*,*k*</sub>. Interim solutions with a movement representation implementing *M* = 4 synergies are shown in **Figure 13A**. Note that, as we evaluated an open-loop controller, the rotated targets were unknown to the controller. Solely the objective function in Equation 12 quantifies deviations from the targets. After 15 episodes a first trend toward the new targets was visible; however, most of the trajectories (three learned solutions are illustrated) ended at the original targets. The corresponding learning curves for DMPSynergies with three (*M* = 3) and four (*M* = 4) synergies are shown in **Figure 13B**. The learning curve for the unperturbed scenario from the previous experiment is denoted by the dashed line (*M* = 4 orig.). Note that in both the unperturbed and the perturbed experiments *K* = 6 reaching movements were learned, which demonstrates the benefit of the shared learned knowledge when generalizing to new skills. For comparison, the blue line denoted by DMP *N* = 4 illustrates the convergence rate of single-task learning with DMPs, where DMPSynergies (*M* = 4 orig.) can compete in terms of learning speed.

# **4. DISCUSSION**

We proposed a movement representation based on learned parametrized synergies (DMPSynergies) that can be linearly combined and shifted in time. These learned synergies are shared among multiple task instances, which significantly facilitates learning of motor control policies. This was demonstrated on simulated robotic and musculoskeletal systems. Below we discuss the significance and the implications of our findings with respect to robotics and biological motor control.

# **4.1. EXPLOITING SHARED SYNERGIES FOR MOTOR SKILL LEARNING IN ROBOTICS**

For motor skill learning in robotics a common strategy is to use parametrized elementary movements or movement primitives (Kober and Peters, 2011). In this paper we proposed

**FIGURE 12 | Learning multi-directional reaching movements.** We evaluated five movement representations with an increasing number of shared synergies, i.e., *M* = {1, 2, 3, 4, 5}. The resulting trajectories of the marker placed on the radial stylion are shown in **(A,C)**, where with less than three synergies not all targets can be reached. Illustrated are three independent learning results. In **(B)** we illustrate the average learning curves over 10 runs for these movement representations. For the representation using *M* = 4 synergies shown in **(C)** additionally the tangential velocity profiles are illustrated.

**FIGURE 13 | Generalization to new reaching directions.** For testing the generalization ability of the proposed DMPSynergies we fix the learned shared synergies and only adapt the task-specific parameters, i.e., the mixing coefficients β<sub>*m*,*k*</sub> and the time-shift parameters *s*<sub>*m*,*k*</sub>. The *K* = 6 targets were rotated by 30 degrees, where in **(A)** the marker trajectories after 15, 50, 200, and 1000 episodes for a movement representation with *M* = 4 synergies are shown. In **(B)** we show the averaged learning curves for DMPSynergies with three and four synergies over 10 runs (*M* = 3 and *M* = 4). The learning curve for the unperturbed scenario from the previous experiment is denoted by the dashed line (*M* = 4 orig.). For comparison, the blue line denoted by DMP *N* = 4 illustrates the convergence rate of single-task learning.

a generalization of the most widely used movement primitive representation in robotics, dynamic movement primitives (DMPs) (Schaal et al., 2003; Ijspeert et al., 2013). DMPs evaluate parametrized dynamical systems to generate trajectories. The dynamical system is constructed such that it is stable. This movement representation has many advantages. It is a model-free approach, which partially explains its popularity in robotics, as model learning in high-dimensional stochastic robotic systems is challenging. Further, its stable attractor system facilitates learning, and DMPs can represent both rhythmic and discrete movements. Meta-parameters can be used for adapting the movement speed or the goal state. Finally, the movement representation depends linearly on the policy parametrization, i.e., the learnable function *f* depends linearly on the parameters **θ** of the movement primitive: *f*(*s*) = **ψ**(*s*)<sup>*T*</sup>**θ**, where **ψ**(*s*) denotes the vector of basis functions evaluated at the time or phase variable *s*. As a result, imitation learning for DMPs is straightforward, as it can simply be done by performing linear regression (Schaal et al., 2003). However, for each task *k* an individual set of parameters **θ**<sub>*k*</sub> has to be learned, which unnecessarily complicates learning of a large number of related motor skills. In contrast, we proposed a generalization that allows for reusing shared knowledge among multiple related motor skills, i.e., the parameter vector **θ** is task-invariant.
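Because the forcing term is a linear combination of basis functions with weights **θ**, imitation learning can be sketched as a simple locally weighted fit to a demonstrated profile. The basis placement, bandwidth, and target profile below are illustrative; Schaal et al. (2003) use a closely related locally weighted regression scheme:

```python
import math

def basis(s, centers, h=100.0):
    """Normalized Gaussian basis vector psi(s)."""
    psi = [math.exp(-h * (s - c) ** 2) for c in centers]
    z = sum(psi)
    return [p / z for p in psi]

def fit_weights(s_grid, f_target, centers, h=100.0):
    """Fit each weight as a Gaussian-weighted average of the demonstrated
    forcing profile around the corresponding basis center (simplified
    locally weighted regression)."""
    theta = []
    for c in centers:
        num = den = 0.0
        for s, f in zip(s_grid, f_target):
            w = math.exp(-h * (s - c) ** 2)
            num += w * f
            den += w
        theta.append(num / den)
    return theta

centers = [i / 7.0 for i in range(8)]            # N = 8 basis centers in [0, 1]
s_grid = [t / 200.0 for t in range(201)]
f_target = [math.sin(2 * math.pi * s) for s in s_grid]  # demonstrated profile
theta = fit_weights(s_grid, f_target, centers)
# Reconstruct f(s) = psi(s)^T theta from the fitted weights.
f_hat = [sum(p * th for p, th in zip(basis(s, centers), theta))
         for s in s_grid]
```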

In particular, we replaced the non-linear modulation function *f*(·) in DMPs by a hierarchical function approximator. On the lower level, task-related parameters (amplitude-scaling weights and time-shift parameters) modulate a linear superposition of basis functions. These basis functions encode shared higher-level knowledge and are modeled by a mixture of Gaussians. With the proposed DMPSynergies representation both discrete and rhythmic movements can be generated. By using Gaussians at the higher level, DMPs can be implemented as a special case. Moreover, the DMPSynergies can compete with DMPs in terms of learning efficiency while allowing for learning multiple motor skills simultaneously.

This was demonstrated in two robotic multi-task learning scenarios, where we showed that, with the DMPSynergies, good policies could be found more reliably (local minima with high cost values were more often avoided) and more efficiently (fewer samples were needed), and that new skills could be generated by exploiting the previously learned shared knowledge. A simple via-point task was used to demonstrate the characteristics of the approach, where the proposed movement representation could be used to generate new movement trajectories by applying a linear interpolation to the synergy's weights β. In a second robotic task, a biped walker task, the hierarchical representation was used to learn walking patterns with multiple step heights. In this complex reinforcement learning task, it was shown that better solutions were found more reliably by exploiting the learned shared knowledge, which is a strong feature of a movement representation. While high-quality gaits were also learned with the classical DMP approach, on average the achieved costs were higher compared to the proposed hierarchical synergies representation, i.e., −19.2 ± 0.6 for DMPs with 4 Gaussians (and with incremental learning) compared to −21.4 ± 0.4 when using *M* = 2 synergies with *N* = 3 Gaussians (where the time-shift parameters were fixed and set to zero). In this experiment 10,000 samples were needed to learn 4 walking gaits simultaneously, where the DMPSynergies approach can compete with DMPs (15,000 samples). Additionally, we demonstrated in a generalization experiment that walking patterns for an unknown step height (*r*<sup>∗</sup> = 0.1 m) could be learned with 100 samples by exploiting the previously learned prior knowledge.

While DMPs (Schaal et al., 2003; Ijspeert et al., 2013) are most closely related to our shared synergies approach, there exist a few other approaches (Chhabra and Jacobs, 2006; Alessandro et al., 2012) that also implement shared knowledge. In Chhabra and Jacobs (2006) a variant of non-negative matrix factorization (d'Avella et al., 2003) was used to compute the synergies given a set of trajectories created by applying stochastic optimal control methods (Li and Todorov, 2004). In Alessandro et al. (2012) an exploration phase was introduced to compute the dynamic responses of a robot system with random initialization. After a reduction phase, where a small number of proto-tasks were executed, a reduced set of dynamic responses was used to compute the synergies matrix by solving a linear system of equations. We proposed an alternative for learning the synergies and their combination parameters, where all unknowns are learned in a reinforcement learning setting from a single sparse reward signal. Moreover, for robotic tasks we embed the synergies approach in stable dynamical systems as in DMPs. This combines the benefits of DMPs and muscle synergies, namely the efficient learning ability of DMPs in high-dimensional systems and the hierarchical representation of movements that can be used for multi-task learning.
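The factorization step used in such synergy-extraction approaches can be illustrated with a minimal non-negative matrix factorization based on the standard multiplicative updates; the data matrix below is a toy example, not recorded muscle data:

```python
import random

def nmf(V, rank, iters=300, seed=0):
    """Factor a non-negative matrix V (muscles x time) into synergies W
    (muscles x rank) and activations H (rank x time) with multiplicative
    updates: H *= (W^T V)/(W^T W H), W *= (V H^T)/(W H H^T)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]

    def matmul(A, B):
        return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
                 for j in range(len(B[0]))] for i in range(len(A))]

    def transpose(A):
        return [list(r) for r in zip(*A)]

    for _ in range(iters):
        WH = matmul(W, H)
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(Wt, WH)
        H = [[H[i][j] * num[i][j] / (den[i][j] + 1e-12) for j in range(m)]
             for i in range(rank)]
        WH = matmul(W, H)
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(WH, Ht)
        W = [[W[i][j] * num[i][j] / (den[i][j] + 1e-12) for j in range(rank)]
             for i in range(n)]
    return W, H

# Rank-1 toy "muscle pattern": one synergy active with a triangular profile.
V = [[a * b for b in [0.0, 0.5, 1.0, 0.5, 0.0]] for a in [1.0, 2.0, 3.0]]
W, H = nmf(V, rank=1)
```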

As with DMPs, the complexity of the DMPSynergies representation can be scaled by the number of combined synergies or the number of Gaussians modeling these synergies. However, as the trajectories generated with our representation depend non-linearly on the policy parameters (in contrast to DMPs), more sophisticated decomposition strategies, such as those of d'Avella and Tresch (2001) or Chiovetto et al. (2013), are needed for imitation learning. With such approaches the extracted synergies could be implemented as initial solutions in our learning framework.

# **4.2. LEARNED SHARED SYNERGIES FOR BIOLOGICAL MOVEMENT GENERATION**

The idea of reusing shared knowledge for movement generation is a well-known concept in biological motor control. Muscle activation patterns recorded during multiple task instances of natural motor behavior, i.e., fast reaching movements of humans (d'Avella et al., 2006), primate grasping movements (Overduin et al., 2008), or walking patterns (Dominici et al., 2011), could be efficiently modeled by combining only a few muscle activation patterns. In particular, time-varying muscle synergies (d'Avella et al., 2003; Bizzi et al., 2008) were proposed to be a compact representation of muscle activation patterns. The key idea of this approach is that muscle activation patterns are linear sums of simpler, elemental functions or synergies. Each muscle synergy can be shifted in time and scaled with a linear factor to construct a large variety of activation patterns. In this manuscript we proposed a generative model to represent and learn time-varying synergies (d'Avella et al., 2006).

The proposed framework allows for studying the concept of muscle synergies from a generative perspective, in contrast to the analytical approach, where muscle synergies are identified from observed data. Applying such a generative approach to a musculoskeletal model, we could provide a proof-of-concept of the feasibility of a low-dimensional controller based on shared synergies and a demonstration of its learning efficiency. Moreover, we could ask different questions, e.g., how does performance scale with the complexity of the movement representation, how sparse is the encoding of the muscle patterns that solve particular tasks, and how well does the learned representation generalize to new movements? We addressed these questions in a multi-directional reaching task, where we investigated a musculoskeletal model of the upper limb with 11 muscles. Motor skills for 6 reaching directions were learned within 3000 episodes, and by exploiting the learned shared synergies, movements for rotated target directions could be generalized 3 times faster (**Figure 13**). We found that a minimum of three synergies was necessary to solve the task (**Figure 12B**). In our objective function large muscle excitation signals were punished, which resulted in a sparse representation of muscle excitation patterns. This sparse representation, illustrated in **Figure 11**, shows similarities to observed electromyographic activity recorded in related human reaching tasks (d'Avella et al., 2006), i.e., triphasic muscle patterns, where some of the muscles contributed at the movement onset, some at the point of maximum tangential velocity, and some at the end of the movement to co-contract. However, sensory feedback might be an important modulation signal to make this effect more pronounced.

The model was designed to capture salient features of the human musculoskeletal system, such as muscle activation dynamics, Hill-type musculotendinous units, and realistic geometry. However, to reduce the computational effort needed to simulate a movement we made a few simplifying assumptions. First, only a limited number of muscles (11) was implemented, with simplified wrapping objects and muscle paths. Further, we implemented the shoulder and the elbow joint as hinge joints. Thus, only reaching movements in the sagittal plane could be performed. Finally, we focused on fast reaching movements in an open-loop control scheme. This was a valid assumption for comparison with human data for fast reaching movements (d'Avella et al., 2006). However, our proposed learning and control framework also allows for implementing closed-loop controllers, e.g., by introducing an inverse kinematics model **M** ∈ R<sup>*D*×3</sup> in Equation 1, i.e., τ**z**˙ = **M**(α*<sub>z</sub>*(β*<sub>z</sub>*(**g** − **y**<sup>∗</sup>) − **z**)) + **f**, where *D* denotes the number of actuators and we assumed that the goal state **g** lives in a three-dimensional Cartesian space. The inverse kinematics model maps the feedback error signal into the muscle pattern space and modulates the learned muscle excitation basis **f** ∈ R<sup>*D*</sup>. With such closed-loop systems we might better understand the contribution of feedback to muscle control in biological movement generation (Lockhart and Ting, 2007).

Musculoskeletal models have been used before to investigate movement generation with muscle synergies (Berniker et al., 2009; Neptune et al., 2009; McKay and Ting, 2012). Berniker and colleagues used model-order reduction techniques to identify synergies as a low-dimensional representation of a non-linear system's input/output dynamics and optimal control to find the activations of these synergies necessary to produce a range of movements. They found that such a set of synergies was capable of producing effective control of reaching movements with a musculoskeletal model of a frog limb and that it was possible to build a relatively simple controller whose overall performance was close to that of the system's full-dimensional non-linear controller. Neptune and colleagues generated muscle-actuated forward dynamics simulations of normal walking using muscle synergies identified from human experimental data using nonnegative matrix factorization as the muscle control inputs. The simulation indicated that a simple neural control strategy involving five muscle synergies was sufficient to perform the basic sub-tasks of walking. McKay and Ting, studying an unrestrained balance task in cats, used a static quadrupedal musculoskeletal model of standing balance to identify patterns of muscle activity that produced forces and moments at the center of mass (CoM) necessary to maintain balance in response to postural perturbations. CoM control could be accomplished with a small number of muscle synergies identified from experimental data, suggesting that muscle synergies can achieve similar kinetics to the optimal solution, but with increased control effort compared to individual muscle control. In line with these simulation studies, we also found that a small number of muscle synergies was sufficient to perform multiple reaching tasks in a forward dynamic simulation of a musculoskeletal model. 
However, we did not use experimental data or model-order reduction techniques to identify muscle synergies. In our framework, both synergy structural parameters and synergy combination parameters were found with reinforcement learning, supporting the generality of the solutions identified. Moreover, we were able to test the generalization ability of the synergies in the same framework by optimizing only the task-specific synergy combination parameters.

The proposed reinforcement learning framework with movement primitives relates to optimal control approaches in the biological motor control literature (Delp et al., 2007; Erdemir et al., 2007). In these simulation studies muscle patterns are parameterized by, e.g., bang-bang (on-off) controls, constant control values, or control vectors approximated with polynomials [see Table 2 in Erdemir et al. (2007) for an overview of different control strategies]. However, to the best of our knowledge none of these approaches implemented shared synergies as a control signal representation for learning multiple task instances simultaneously. Even with complex representations, e.g., with *M* = 5 synergies and 225 parameters, learning converged within 3000 episodes, which is a promising feature of the proposed approach for studies on more complex musculoskeletal models.

In this manuscript we demonstrated how time-varying synergies (d'Avella et al., 2006) can be implemented and learned from scratch. Interestingly, by adding an additional constraint on the movement representation, i.e., by using a single policy vector for all actuators, anechoic mixing coefficients (Giese et al., 2009) can be implemented. However, in general, any synergy representation, such as the synchronous synergies (Ivanenko et al., 2004; Dominici et al., 2011) used for locomotion, can be learned. Thus, we do not argue for a particular synergy representation. Our representation was motivated by the aim of extending the widely used DMPs (Schaal et al., 2003) to exploit shared task-invariant knowledge for motor skill learning in robotics.

# **5. CONCLUSION**

We proposed a movement primitive representation implementing shared knowledge in the form of learned synergies. The representation is competitive with the state of the art: it can implement DMPs (Schaal et al., 2003) as a special case, and it allows for efficient generalization to new skills. Importantly, shared knowledge simplifies policy search in high-dimensional spaces, which was demonstrated in a dynamic biped walking task. Further, the proposed learned synergies are a compact representation of high-dimensional muscle excitation patterns, which allows us to implement reinforcement learning in musculoskeletal systems. In such frameworks muscle patterns are learned from scratch using a sparse reward signal, and we could investigate how muscles and muscle synergies contribute to a specific task, how complex a task-invariant representation must be, and how well the learned synergies generalize to changes in the environment. In a multi-directional arm reaching experiment we provided first insights into these questions. In future research the proposed movement generation and learning framework will be used to study feedback signals and feedback delays, imitation learning from biological data, and the effect of simulated muscle surgeries.

# **ACKNOWLEDGMENTS**

This paper was written under partial support by the European Union project FP7-248311 (AMARSI) and project IST-2007-216886 (PASCAL2). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

### **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fncom.2013.00138/abstract

# **REFERENCES**

Alessandro, C., Carbajal, J., and d'Avella, A. (2012). "Synthesis and adaptation of effective motor synergies for the solution of reaching tasks," in *From Animals to Animats (SAB 2012)*, Vol. 7426, *Lecture Notes in Computer Science*, eds T. Ziemke, C. Balkenius, and J. Hallam (Odense, Denmark), 33–43. doi: 10.1007/978-3-642-33093-3\_4

Angel, R. W. (1975). Electromyographic patterns during ballistic movement of normal and spastic limbs. *Brain Res.* 99, 387–392. doi: 10.1016/0006-8993(75)90042-6

Berardelli, A., Hallett, M., Rothwell, J. C., Agostino, R., Manfredi, M., Thompson, P. D., et al. (1996). Single-joint rapid arm movements in normal subjects and in patients with motor disorders. *Brain* 119, 661–674. doi: 10.1093/brain/119.2.661

Bizzi, E., Cheung, V. C. K., d'Avella, A., Saltiel, P., and Tresch, M. (2008). Combining modules for movement. *Brain Res. Rev.* 57, 125–133. doi: 10.1016/j.brainresrev.2007.08.004

Chadwick, E., Blana, D., van den Bogert, A., and Kirsch, R. (2009). A real-time, 3-d musculoskeletal model for dynamic simulation of arm movements. *IEEE Trans. Biomed. Eng.* 56, 941–948. doi: 10.1109/TBME.2008.2005946

Ivanenko, Y. P., Poppele, R. E., and Lacquaniti, F. (2004). Five basic muscle activation patterns account for muscle activity during human locomotion. *J. Physiol.* 556, 267–282. doi: 10.1113/jphysiol.2003.057174

*International Conference on Robotics and Automation (ICRA 2009)*, (Kobe). doi: 10.1109/ROBOT.2009.5152385

*Proceedings of the 18th International Conference on Artificial Neural Networks, Part I, (ICANN 2008)*, (Berlin), 407–416.

*Trans.* BME-32, 826–839. doi: 10.1109/TBME.1985.325498

Zajac, F. E. (1989). Muscle and tendon: properties, models, scaling, and application to biomechanics and motor control. *Crit. Rev. Biomed. Eng.* 17, 359–411.

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 27 May 2013; paper pending published: 04 July 2013; accepted: 25 September 2013; published online: 17 October 2013.*

*Citation: Rückert E and d'Avella A (2013) Learned parametrized dynamic movement primitives with shared synergies for controlling robotic and musculoskeletal systems. Front. Comput. Neurosci. 7:138. doi: 10.3389/fncom.2013.00138*

*This article was submitted to the journal Frontiers in Computational Neuroscience.*

*Copyright © 2013 Rückert and d'Avella. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# **APPENDIX**

For the via-point task the parameter settings for learning are shown in **Table A1**. Initial parameter values and parameter settings for policy search for the biped walker task are shown in **Table A2** and in **Table A3**. In **Table A4** we list the learning settings for the multi-directional reaching task using a musculoskeletal model of a human arm. The implemented muscles and their characteristic parameters are shown in **Table A5**.

### **Table A1 | Parameter settings for the discrete via-point task.**


*In brackets are the variable names for the dynamic movement primitives, which are used for comparison.*

### **Table A2 | Biped walker setting of pre-optimized quantities.**


### **Table A3 | Policy search parameter settings for the rhythmic walking task.**


*In brackets are the variable names for the dynamic movement primitives.*

### **Table A4 | Parameter settings for the multi-directional reaching task.**




*The tendon slack length is denoted by L<sup>T</sup><sub>s</sub>, the maximum isometric force by F<sup>M</sup><sub>0</sub>, the optimal fiber length by L<sup>M</sup><sub>0</sub>, and the muscle pennation angle by α. To increase the reachable space, we adapted the tendon slack length L<sup>T</sup><sub>s</sub> of a small number of muscles (bold numbers vs. the original values in brackets).*

# A musculoskeletal model of human locomotion driven by a low dimensional set of impulsive excitation primitives

# *Massimo Sartori<sup>1</sup>\*, Leonardo Gizzi<sup>2</sup>, David G. Lloyd<sup>3</sup> and Dario Farina<sup>1</sup>*

*<sup>1</sup> Department of Neurorehabilitation Engineering, Bernstein Focus Neurotechnology Göttingen, University Medical Center Göttingen, Göttingen, Germany*

*<sup>2</sup> Pain Clinic, Center for Anesthesiology, Emergency and Intensive Care Medicine, University Hospital Göttingen, Göttingen, Germany*

*<sup>3</sup> Centre for Musculoskeletal Research, Griffith Health Institute, Griffith University, Gold Coast, QLD, Australia*

### *Edited by:*

*Yuri P. Ivanenko, IRCCS Fondazione Santa Lucia, Italy*

### *Reviewed by:*

*Juan C. Moreno, Spanish National Research Council, Spain*

*Richard Neptune, The University of Texas at Austin, USA*

### *\*Correspondence:*

*Massimo Sartori, Department of Neurorehabilitation Engineering, Bernstein Focus Neurotechnology Göttingen, Bernstein Center for Computational Neuroscience, University Medical Center Göttingen, Georg-August University, Von-Siebold-Str. 4, Göttingen, 37075, Germany e-mail: massimo.srt@gmail.com*

Human locomotion has been described as being generated by an impulsive (burst-like) excitation of groups of musculotendon units, with timing dependent on the biomechanical goal of the task. Despite this view being supported by many experimental observations on specific locomotion tasks, it is still unknown if the same impulsive controller (i.e., a low-dimensional set of time-delayed excitation primitives) can be used as input drive for large musculoskeletal models across different human locomotion tasks. For this purpose, we extracted, with non-negative matrix factorization, five non-negative factors from a large sample of muscle electromyograms in two healthy subjects during four motor tasks. These included walking, running, sidestepping, and crossover cutting maneuvers. The extracted non-negative factors were then averaged and parameterized to obtain task-generic Gaussian-shaped impulsive excitation curves or primitives. These were used to drive a subject-specific musculoskeletal model of the human lower extremity. Results showed that the same set of five impulsive excitation primitives could be used to predict the dynamics of 34 musculotendon units and the resulting hip, knee and ankle joint moments (i.e., *NRMSE* = 0.18 ± 0.08, and *R*<sup>2</sup> = 0.73 ± 0.22 across all tasks and subjects) without substantial loss of accuracy with respect to using experimental electromyograms (i.e., *NRMSE* = 0.16 ± 0.07, and *R*<sup>2</sup> = 0.78 ± 0.18 across all tasks and subjects). Results support the hypothesis that biomechanically different motor tasks might share similar neuromuscular control strategies. This might have implications in neurorehabilitation technologies such as human-machine interfaces for the torque-driven, proportional control of powered prostheses and orthoses. 
In such interfaces, device control commands (i.e., predicted joint torques) could be derived without direct experimental data, relying instead on simple parameterized Gaussian-shaped curves, thus decreasing the input drive complexity and the number of needed sensors.

**Keywords: EMG-driven modeling, musculoskeletal modeling, lower extremity, multiple degrees of freedom, muscle dynamics, muscle synergy**

# **INTRODUCTION**

Human movement is the result of the coordinated excitation of musculotendon units (MTUs), which actuate multiple joints in the upper and lower extremities. Because of the inherent redundant nature of the human neuromuscular system, multiple MTU excitation patterns can result in the same joint moment, position, and motion (Tax et al., 1990; Buchanan and Lloyd, 1995). When performing a motor task, the neural drive to MTUs defines the specific excitation patterns among the many possible solutions. Understanding the mechanisms underlying an individual's excitation patterns is an open question in current movement neuroscience and biomechanics. This is fundamental for understanding human locomotion and for the development of novel neurorehabilitation technologies.

The neural drive, or excitation, to MTUs is ultimately determined by action potential trains generated from pools of alpha motor neurons that innervate specific MTUs (Farina and Negro, 2012). Surface electromyography (EMG) indirectly reflects the neural excitation to MTUs and can be easily recorded during human movement. For this reason, EMG signals recorded from the major lower extremity muscle groups have been used to directly drive open-loop forward dynamics simulations using models that are accurate, physiological, and anatomical representations of the human neuromusculoskeletal system (Lloyd and Besier, 2003; Sartori et al., 2012a,c).

It has been shown that the multi-muscular EMG patterns observed during motor behaviors have a lower dimensionality than the number of muscles and associated MTUs (d'Avella et al., 2003; Bizzi et al., 2008). Therefore, the EMG excitation patterns can be expressed using a low-dimensional set of MTU excitation primitives (XPs). In human locomotion, the XPs have been consistently observed to be sequential and minimally overlapping impulses (burst-like) of excitation (Ivanenko et al., 2004, 2006; Cheung et al., 2005; Bizzi et al., 2008). Therefore, human locomotion has been interpreted as being generated by an impulsive excitation of groups of MTUs, with the timing dependent on the biomechanical goal of the task (Ivanenko et al., 2005). The association between XP timing and task performance was also particularly evident in Oliveira et al. (2013).

Based on this experimental evidence, in this study we propose the use of a low-dimensional set of single-impulse, Gaussian-shaped XPs to drive a physiologically accurate, subject-specific musculoskeletal model of the human lower extremity (Sartori et al., 2012a). Within the musculoskeletal model, the XPs operate as an impulsive controller, where the onset of an XP corresponds to the recruitment of the associated muscles and MTUs.

Although the use of single-impulse, Gaussian-shaped curves was previously supported by experimental evidence from human locomotion studies (Ivanenko et al., 2006), the current study does not speculate on the physiological nature of human locomotion excitation patterns. Rather, the use of single-impulse curves has the purpose of exploiting excitation primitives with simple mathematical formalizations. The combination of these single-impulse curves allows generation of the more complex multi-impulse excitation inputs and MTU recruitment patterns that might emerge from human locomotion tasks, which have often been observed in the literature (Davis and Vaughan, 1993; Ivanenko et al., 2006; Clark et al., 2010).

Previous studies have used low-dimensional sets of multi-impulse curves within musculoskeletal models of the human lower extremity for the purpose of assessing the mechanical role of muscles during human locomotion (Neptune et al., 2009; McGowan et al., 2010; Allen and Neptune, 2012). Furthermore, other studies assessed and explored the conceptual idea of muscle synergies in relation to the biomechanics of human and animal movement (Zajac et al., 2002; McKay and Ting, 2008; Fregly et al., 2012b; Kutch and Valero-Cuevas, 2012; Ting et al., 2012).

The present study, however, addresses a number of questions that have not been considered in the current literature. One main hypothesis is that a low-dimensional controller of single-impulse XPs could be designed to be generic to subjects and motor tasks, but sufficiently selective to drive a subject-specific musculoskeletal model of the human lower extremity. This would allow coordinating large groups of MTUs and subsequently predicting joint moments about multiple degrees of freedom (DOFs) in the lower extremity during a variety of motor tasks that are substantially different from each other. It is also hypothesized that the use of a *subject-generic, task-generic, low-dimensional* XP set, as opposed to a *subject-specific, task-specific, high-dimensional* EMG set, does not lead to a substantial loss of joint moment prediction accuracy in the driven musculoskeletal model. This view poses the problem of how to differentiate the model's outputs across movements, since the same impulsive controller is used as input drive across different tasks and subjects. In this scenario, it is hypothesized that the model's outputs (i.e., MTU forces and resulting joint moments) are differentiated across movements by the experimental joint kinematics input to the model. This input is subsequently used to estimate somatosensory information such as the instantaneous MTU kinematics (Sartori et al., 2012c) and to update the MTU force and resulting joint moment estimates (see Methods Section) (Lloyd and Besier, 2003; Sartori et al., 2012a). It is worth noting that the experimental joint kinematics input might, in fact, differ from the kinematics that would be obtained by converting the model's predicted joint moments into joint position.
Therefore, within our methodology, the experimental joint kinematics operates as an error correction factor that accounts for somatosensory information (i.e., MTU kinematics) and compensates for the static behavior and simplified structure of the generic XP-based controller. Prediction discrepancies are thus compensated for not by a task-specific, subject-specific impulsive controller but rather by joint kinematics information.

Addressing these questions not only would offer further indirect evidence of the theoretical correctness of the human locomotion control scheme previously proposed in the literature (Ivanenko et al., 2005, 2007; Lacquaniti et al., 2012), but would also support the hypothesis that a variety of human locomotion tasks, substantially different from each other, may share a similar neuromuscular control scheme of impulsive nature. Finally, it would provide a novel musculoskeletal model of human locomotion that (1) could be operated in an open-loop forward dynamics manner without using numerical optimization to match the experimental joint moments, (2) could therefore be executed at low computational cost (see Results Section), and (3) could produce movement-specific joint moment estimates even if driven by subject-generic and task-generic XPs.

The main advantage of the proposed approach is that, once an XP set has been defined, no EMG recordings are needed for the model operation. This might have substantial implications for the development of novel neurorehabilitation technologies. In this scenario, our proposed XP-driven musculoskeletal model can be operated at low computational cost for the real-time prediction of an individual's neuromusculoskeletal dynamics and for the subsequent development of neuromuscular human-machine interfaces for powered prostheses and orthoses control. In such interfaces, MTU activation and joint moments can be predicted in real time solely from the low-dimensional XP set and three-dimensional joint kinematics without loss of accuracy with respect to using EMG signals. This would substantially decrease the input drive complexity as well as the number of sensors needed on the wearable robot, thus increasing the system robustness.

In this paper, the proposed XP-driven modeling methodology is presented and validated with a direct comparison to the previously presented EMG-driven modeling method (Lloyd and Besier, 2003; Sartori et al., 2012a).

# **MATERIALS AND METHODS**

The procedure comprised four steps: (1) collecting human movement data using motion capture technology, (2) modeling and simulating the recorded human movement, (3) determining a low-dimensional set of XPs, and (4) calibrating and executing the musculoskeletal model of the human lower extremity.

### **HUMAN MOVEMENT DATA COLLECTION**

Two healthy men (age: 28 and 26 years, height: 183 and 167 cm, mass: 67 and 73 kg) volunteered for this investigation and gave their informed, written consent. The project was approved by the Human Research Ethics committee at the University of Western Australia.

The motion data acquired from the two subjects were static anatomical trials, functional calibration trials, and the actual dynamic gait trials. During all trials, the three-dimensional locations of retro-reflective markers placed on the subjects' body were recorded (250 Hz) using a 12-camera motion capture system (Vicon, Oxford, UK). During the dynamic trials, ground reaction forces (GRFs) and EMG signals were collected (2000 Hz) synchronously with the marker trajectories using an in-ground force plate (AMTI, Watertown, USA) and bipolar electrodes with a telemetered EMG system (Noraxon, Scottsdale, USA), respectively.

The dynamic trials were eight repetitions of four motor tasks including fast walking (FW, 1.3 ± 0.25 m/s), running (RN, 2.5 ± 0.5 m/s), sidestepping (SS, 2.0 ± 0.35 m/s), and crossover (CO, 1.9 ± 0.15 m/s) cutting maneuvers. Each trial included the full stance phase of gait of the subjects' right leg. The CO and SS tasks were straight running with a change of direction to the right and left, respectively. In these, the direction change was performed by having the right leg in contact with the floor and going through the full stance phase. In all tasks, velocities were measured by tracking the speed of the trunk markers during the stance phase. The four motor tasks were chosen because (1) they required the production of large joint moments about the six considered DOFs, including hip flexion-extension (HipFE), hip adduction-abduction (HipAA), hip internal-external rotation (HipROT), knee flexion-extension (KneeFE), ankle plantar-dorsi flexion (AnkleFE), and ankle subtalar flexion (AnkleSF), and (2) they reflected different MTU recruitment strategies and contraction dynamics. This permitted us to investigate whether the proposed methodology could use a simple XP set to predict joint moments produced about the six considered DOFs while accounting for different MTU operation strategies.

EMG data were collected from 16 muscle groups including: hip adductors (add), gluteus maximus (gmax), gluteus medius (gmed), gracilis (gra), tensor fasciae latae (tfl), lateral hamstrings (latham), medial hamstrings (medham), sartorius (sar), rectus femoris (recfem), vastus medialis (vasmed), vastus lateralis (vaslat), gastrocnemius medialis (gasmed), gastrocnemius lateralis (gaslat), peroneus group (per), soleus (sol), and tibialis anterior (tibant). Both GRFs and marker trajectories were low-pass filtered with a fourth-order Butterworth filter. Cut-off frequencies (between 2 and 8 Hz) were determined by a trial-specific residual analysis (Winter, 2004). EMGs were processed by band-pass filtering (30–450 Hz), then full-wave rectifying, and low-pass filtering (6 Hz).
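The EMG conditioning pipeline described above can be sketched with scipy; note that only the cut-off frequencies are stated in the text, so the second-order Butterworth filters below are an assumption:

```python
import numpy as np
from scipy.signal import butter, filtfilt

def emg_envelope(emg, fs=2000.0):
    """EMG conditioning as described: band-pass 30-450 Hz, full-wave
    rectification, then 6 Hz low-pass to obtain the linear envelope.
    Filter order (2nd-order Butterworth) is an assumption."""
    b, a = butter(2, [30.0 / (fs / 2), 450.0 / (fs / 2)], btype="bandpass")
    band = filtfilt(b, a, emg)            # zero-phase band-pass
    rectified = np.abs(band)              # full-wave rectification
    b, a = butter(2, 6.0 / (fs / 2), btype="lowpass")
    return filtfilt(b, a, rectified)      # smooth linear envelope
```

`filtfilt` (forward-backward filtering) avoids the phase lag a causal filter would introduce, which matters when envelope timing is later compared across muscles.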

From the collected dynamic trials, two distinct datasets were created: one for the calibration of the musculoskeletal model and the other for the validation. For each subject, the calibration dataset included two repeated trials of the four motor tasks (i.e., FW, RN, SS, and CO). A different dataset was used to validate the calibrated musculoskeletal model on each subject and included six repeated novel trials for the four considered motor tasks. None of the trials in the validation dataset were included in the calibration dataset; therefore, there was no common data between the two datasets.

# **MOVEMENT MODELING**

Using the software OpenSim<sup>1</sup> (Delp et al., 2007), a generic model of the human musculoskeletal geometry<sup>2</sup> was scaled to match the individual subject's anthropometry. This was done based on the experimentally measured marker positions recorded from the static standing poses, and on the locations of the hip, knee, and ankle joint centers as well as the knee flexion-extension axis determined using the functional calibration trials (Besier et al., 2003b). During the scaling process, virtual markers were created and placed on the musculoskeletal geometry model based on the positions of the experimental markers. The OpenSim Inverse Kinematics (IK) algorithm (Delp et al., 2007) solved for the joint angles that minimized the least-squared error between experimental and virtual markers. The joint moments needed to track the IK-generated angles were obtained using Inverse Dynamics (ID) and Residual Reduction Analysis (RRA) (Delp et al., 2007). The joint moments produced by this pathway were called the "experimental" moments. The alternative pathway to estimate joint moments was via the XP-driven and EMG-driven models (Sartori et al., 2012a).

The estimates produced by our proposed methodology can be directly compared to previously proposed works in the literature (Besier et al., 2009; Winby et al., 2009; Krishnaswamy et al., 2011; Sartori et al., 2012a,b; Hamner and Delp, 2013). Furthermore, **Figure A1** compares our derived experimental joint moments to those available in the literature (Winter, 1983, 2004; Liu et al., 2008; Hamner et al., 2010). The normalized root mean squared errors and correlation coefficients assumed values that ranged between 0.1–0.3 and 0.89–0.98, respectively (see Validation Procedure Section). During walking, our derived experimental joint moments had peak values of comparable magnitude between the hip extension in the early stance and the ankle plantar flexion in the late stance. This differed from what was reported in Winter (2004) and Liu et al. (2008), in which the peak hip extension moment was smaller than the peak ankle plantar flexion moment. However, it is worth stressing that the use of an ankle joint with oblique axes, like that used in our proposed work, directly couples ankle plantar-dorsi flexion with ankle subtalar flexion (Kirby, 2001; Yamaguchi et al., 2009). The effect of this coupling on the ankle plantar-dorsi flexion moments is less pronounced in ankle joint models with orthogonal axes (Winter, 2004) or in models in which the ankle subtalar flexion is constrained to the anatomical neutral position (Liu et al., 2008).
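The two agreement measures quoted above can be computed as follows; normalizing the RMSE by the reference range is one common convention, and the paper's exact definition is given in its Validation Procedure section:

```python
import numpy as np

def nrmse(y_pred, y_ref):
    """RMSE normalized by the reference range -- one common convention
    (the paper's exact normalization may differ)."""
    rmse = np.sqrt(np.mean((y_pred - y_ref) ** 2))
    return rmse / (np.max(y_ref) - np.min(y_ref))

def r_squared(y_pred, y_ref):
    """Squared Pearson correlation between predicted and reference
    joint moment waveforms."""
    r = np.corrcoef(y_pred, y_ref)[0, 1]
    return r ** 2
```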

# **MUSCLE EXCITATION PRIMITIVE**

To identify the muscle XPs, a two-step process was used based on a previously described non-negative matrix factorization (NNMF) technique (Lee and Seung, 2001). First, the muscle-specific EMG linear envelopes were normalized in time and with respect to the peak processed EMG values obtained from all trials (Ivanenko et al., 2004; Gizzi et al., 2011). In this way, each muscle for each trial and motor task was equally represented in the final muscle weighting computation and the results reflected only changes in timing. The normalized linear envelopes computed from all dynamic trials in the validation dataset collected from the two subjects were then combined into an *m* × *n* matrix, where *m* indicates the number of muscles and *n* the number of trial frames × number of trials × number of subjects. That is, each row was associated with a muscle and concatenated the muscle's EMG data from all trials and subjects. The NNMF was then applied to the *m* × *n* matrix with a number of non-negative factors identified together with their associated weightings (see Results Section). The extracted, experimental non-negative factors were linearly combined with their associated weightings to produce an *m* × *n* matrix of reconstructed EMGs and then compared to the original EMG matrix. The agreement between the two matrices was quantified by least squares errors. The NNMF was then iterated within an optimization procedure by adjusting the non-negative factors until they minimized the least squared error between experimental and reconstructed EMG data. In this procedure the dimensionality of the non-negative factor set was increased until the accuracy of the reconstructed EMG data exceeded a predefined threshold. This was assessed by means of the Variation Accounted For (VAF) index, defined as VAF = 1 − SSE/TSS, where SSE (sum of squared errors) represented the unexplained variation and TSS (total sum of squares) was the total variation of the EMG data. A minimal VAF value of 80% was the threshold to be exceeded in this study to consider the reconstruction quality satisfactory (Gizzi et al., 2011). This resulted in a matrix of five non-negative factors (i.e., a 5 × *n* non-negative factor matrix) that accounted for 89% of the variability in the EMG data. In this, each muscle group had five associated weighting factors.

<sup>1</sup>OpenSim release 2.2.0, available from: https://simtk.org/home/opensim
<sup>2</sup>Available from: https://simtk.org/home/nmblmodels/
These determined how much each of the five non-negative factors contributed to the excitation of a specific muscle group. The weightings were then normalized to the highest weighting value, i.e., *W̄<sub>j,i</sub>* = *W<sub>j,i</sub>* / max(*W<sub>1,i</sub>*, *W<sub>2,i</sub>*, *W<sub>3,i</sub>*, *W<sub>4,i</sub>*, *W<sub>5,i</sub>*), where *W<sub>j,i</sub>* represented the *j*th weighting (1 ≤ *j* ≤ 5) for the *i*th muscle group (1 ≤ *i* ≤ 16), and *W̄<sub>j,i</sub>* was the resulting normalized weighting (i.e., 0 ≤ *W̄<sub>j,i</sub>* ≤ 1).
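The factorization, the VAF index, and the per-muscle weighting normalization can be sketched with a plain multiplicative-update NNMF in numpy; the matrix sizes below are toy placeholders, and the uncentered TSS is an assumption about the paper's definition:

```python
import numpy as np

def nnmf(V, k, iters=500, seed=0):
    """Multiplicative-update NNMF (Lee and Seung, 2001): factor a
    non-negative m x n matrix V into W (m x k) and H (k x n)."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, k)) + 1e-6
    H = rng.random((k, n)) + 1e-6
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + 1e-12)
        W *= (V @ H.T) / (W @ H @ H.T + 1e-12)
    return W, H

def vaf(V, W, H):
    """Variation Accounted For, VAF = 1 - SSE/TSS (uncentered TSS assumed)."""
    sse = ((V - W @ H) ** 2).sum()
    tss = (V ** 2).sum()
    return 1.0 - sse / tss

def normalize_weights(W):
    """Normalize each muscle's weightings (one row per muscle) to the
    muscle's largest weighting, so all entries lie in [0, 1]."""
    return W / W.max(axis=1, keepdims=True)
```

In the paper's procedure, `k` would be increased until `vaf` exceeds the 80% threshold; here `k` = 5 was the result.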

In the second step, each non-negative factor associated with a specific trial and subject was isolated from the 5 × *n* non-negative factor matrix based on the frames associated with that trial. This removed the discontinuities between two adjacent non-negative factors. The extracted trial-specific non-negative factors were then averaged across trials. The five trial-averaged non-negative factors were then fitted with five Gaussian-shaped single-impulse curves (see Results Section), which represented the XPs used to drive the proposed XP-driven musculoskeletal model:

$$g(t) = h \cdot e^{-\frac{(t-b)^2}{2 c^2}} - s \tag{1}$$

where *t* is the time frame and *h*, *b*, *c*, and *s* are the function parameters defining the Gaussian peak height, the position of the peak center, the width of the bell, and the vertical shift, respectively. The function parameters were identified using a simulated annealing optimization algorithm (Goffe et al., 1994) that minimized the root mean squared error with respect to each of the five average non-negative factors.
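As an illustration, the fit can be sketched with a toy simulated-annealing loop. The authors used the algorithm of Goffe et al. (1994), so the cooling schedule, step sizes, and function names below are our own assumptions:

```python
import numpy as np

def gaussian_xp(t, h, b, c, s):
    """Single-impulse excitation primitive: g(t) = h*exp(-(t-b)^2/(2c^2)) - s (Eq. 1)."""
    return h * np.exp(-((t - b) ** 2) / (2 * c ** 2)) - s

def fit_xp(t, target, n_steps=20000, seed=0):
    """Toy simulated annealing over (h, b, c, s) minimizing the RMSE
    against a trial-averaged non-negative factor."""
    rng = np.random.default_rng(seed)
    span = t[-1] - t[0]
    # Initial guess: peak height and location from the data, moderate width.
    cur = np.array([target.max(), t[np.argmax(target)], span / 6, 0.0])
    rmse = lambda p: np.sqrt(np.mean((gaussian_xp(t, *p) - target) ** 2))
    cur_err = rmse(cur)
    best, best_err = cur.copy(), cur_err
    for k in range(n_steps):
        temp = 0.1 * (1 - k / n_steps) + 1e-6      # linear cooling schedule
        cand = cur + rng.normal(scale=[0.05, 0.02 * span, 0.02 * span, 0.02])
        cand[2] = max(abs(cand[2]), 1e-6)          # keep the width positive
        err = rmse(cand)
        # Accept downhill moves always, uphill moves with Boltzmann probability.
        if err < cur_err or rng.random() < np.exp((cur_err - err) / temp):
            cur, cur_err = cand, err
            if err < best_err:
                best, best_err = cand.copy(), err
    return best, best_err
```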

### **MUSCULOSKELETAL MODELING**

The XP-driven musculoskeletal model (**Figure 1**) was developed from our previously described EMG-driven model of the human lower extremity (Lloyd and Besier, 2003; Winby et al., 2009; Sartori et al., 2012a,c). The remainder of this section describes the XP-driven modeling workflow and the model components.

In the proposed *XP-driven modeling workflow*, a five-dimensional impulsive controller (i.e., made of five XPs, **Figure 2**) defined, a priori, an initial recruitment scheme for 34 MTUs in the human lower extremity. The properties of the recruitment scheme were preserved across subjects and motor tasks, including the relative position and peak amplitude of one XP with respect to another and the MTUs recruited by each XP. The five XPs were only time-scaled (i.e., stretched or compressed) to match the length of the stance phase across movements. A closed-loop calibration step (**Figure 1E**, also see below in this section) was then performed to identify a number of musculoskeletal model parameters, which varied non-linearly across subjects because of anatomical and physiological differences (Sartori et al., 2012a). In this step, the impulsive controller was further refined to determine a finer mapping between the low-dimensional set of XPs and the high-dimensional set of MTU-specific activations (**Figure 1B**) that best described the MTU-specific activation strategies across the four selected tasks.
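Since time scaling is the only across-trial change applied to the primitives, it reduces to resampling over normalized time; a minimal sketch (function name ours):

```python
import numpy as np

def time_scale_xp(xp, n_frames):
    """Stretch or compress an XP to the trial-specific stance-phase length
    by linear interpolation over normalized time [0, 1]."""
    t_old = np.linspace(0.0, 1.0, len(xp))
    t_new = np.linspace(0.0, 1.0, n_frames)
    return np.interp(t_new, t_old, xp)
```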

The calibrated XP-driven model was then validated on the same motor tasks selected for calibration (i.e., FW, RN, SS, and CO) but using a novel set of trials (see the Human movement data collection Section). During the validation step, the calibrated XP-driven model operated as an open-loop predictive system, which did not use numerical optimization to track the experimental joint moments and therefore operated at low computational cost (see Results Section). The MTU-specific activations and the resulting joint moments were directly determined as a function of the five XPs and the three-dimensional joint kinematics, i.e., there was no need to record further EMG data.

The proposed model's structure comprised five main components (**Figure 1**): Musculotendon Kinematics, Musculotendon Excitation-to-Activation, Musculotendon Dynamics, Moment Computation, and Model Calibration.

The *Musculotendon Kinematics component* (**Figure 1A**) used MTU-specific multidimensional spline functions to produce instantaneous estimates of MTU length $l^{mt}$ and three-dimensional moment arms $r$ as a function of joint angles (Sartori et al., 2012c)<sup>3</sup>.

The *Musculotendon Excitation-to-Activation component* (**Figure 1B**) mapped the five XPs into the 34 MTU-specific activations. The five XPs were initially associated with the 16 muscle groups from which the EMG data were recorded. In this, if a muscle group had an associated weighting factor greater than 0.4 on one of the five XPs, then the muscle group was considered to be associated with that specific XP, which defined its initial excitation (Neptune et al., 2009). With respect to Neptune et al. (2009),

<sup>3</sup>Code and documentation are freely available from: http://code.google.com/p/mcbs/

our proposed cut-off criterion differed in that weightings and primitives were extracted from the matrix concatenating all EMG linear envelopes from all subjects and trials (see Muscle Excitation Primitive Section). Therefore, NNMF generated a single set of weightings that applied to all subjects' trials. In Neptune et al. (2009), NNMF was applied to each subject individually, creating subject-specific weightings that were then averaged prior to the application of the 0.4 cut-off criterion. In our proposed methodology, if a muscle group had more than one associated XP, then the average across those XPs was calculated and used as the muscle group's initial excitation. Muscle weightings allowed arranging muscle groups into seven modules (see second test results in Results Section). A module included all muscle groups with a weighting greater than or equal to 0.4 on the same XP. All MTUs from all muscle groups within a module received the same initial XP. The XPs were also assigned to MTUs for which experimental EMG data could not be recorded, including the gluteus minimus, iliacus, psoas, and vastus intermedius. In this allocation, two MTUs that shared the same innervation and contributed to the same mechanical action were assumed to be in the same module and therefore to have the same initial XP (Kahle and Frotscher, 2002; Ivanenko et al., 2006). Therefore, the XP assigned to both the rectus femoris and the sartorius (module 2 in **Figure 2A**) was also assigned to the iliacus and psoas. The vastus medialis and vastus lateralis XP (module 3 in **Figure 2A**) was assigned to the vastus intermedius, and the gluteus medius XP to the gluteus minimus (also module 3 in **Figure 2A**). These assignments were motivated by anatomical and functional information on the MTUs, assuming that the dimensionality computed from a smaller set of MTUs could be applied to the entire set.
Furthermore, if the XPs were the reflection of the spinal circuitry dynamics, then the XPs must apply to all lower-extremity MTUs, whether or not experimental EMGs were available for a specific MTU. This mapping enabled us to use the low-dimensional set of XPs to excite a larger number of MTUs than those for which experimental EMG data were available. It is worth noting that this mapping is, in general, not entirely described by the muscle group weightings extracted from the available experimental EMG data using NNMF, because experimental EMGs do not directly reflect the activity of deeply located MTUs. For this reason, muscle group weightings were not used to linearly combine XPs; this allowed us to account for the impulsive nature of the MTU recruitment. The transformation from an XP (applied to a group of MTUs) to the MTU-specific activation (applied to a single MTU only) is discussed below.
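Under the rules above, the initial XP-to-muscle-group mapping reduces to a thresholded lookup with averaging; a minimal sketch assuming the normalized weighting matrix and the XP curves from the previous sections (names ours):

```python
import numpy as np

def assign_initial_excitations(W_norm, xps, cutoff=0.4):
    """W_norm: (n_groups, n_xps) normalized weightings; xps: (n_xps, T).
    Each muscle group is driven by every XP on which its weighting meets
    the 0.4 cut-off; multiple associated XPs are averaged. Because each
    group's weightings are normalized to a maximum of 1, at least one XP
    always qualifies."""
    excitations = []
    for w in W_norm:
        idx = np.flatnonzero(w >= cutoff)
        excitations.append(xps[idx].mean(axis=0))
    return np.vstack(excitations)
```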

Each XP that was assigned to a group of MTUs was processed by a critically damped second order recursive filter, which simulated the individual MTU twitch response to the initial XP excitation (Thelen et al., 1994; Lloyd and Besier, 2003):

$$u(t) = \alpha \cdot x(t - d) - \beta_1 u(t - 1) - \beta_2 u(t - 2) \tag{2}$$

where *x(t)* was the XP at time *t*, *u(t)* was the MTU-specific post-processed XP, α was the MTU-specific gain coefficient, and β₁ and β₂ were the MTU-specific recursive filtering coefficients. The term *d* was the electromechanical delay. This was set to 10 ms based on previously reported experimental results (Nordez et al., 2009) and was treated as a global parameter as previously suggested (Heine et al., 2003).

The resulting *u(t)* signal was then further processed using the non-linear transfer function in Equation 3 (Lloyd and Besier, 2003; Buchanan et al., 2004). This accounted for the non-linearity between the MTU excitation and force, reflecting the saturation at high levels of the motor unit recruitment in generating force (Lloyd and Besier, 2003; Buchanan et al., 2004; Farina and Negro, 2012):

$$a(t) = \frac{e^{A \cdot u(t)} - 1}{e^A - 1} \tag{3}$$

where *A* was the non-linear shape factor, which was constrained to −3 < *A* < 0, with 0 being a linear relationship (Lloyd and Besier, 2003). The resulting MTU-specific activation, *a(t)*, represented the ultimate control input to the MTU contractile component. Note that this transformation adjusted the timing (Equation 2) and shape (Equations 2 and 3) of the XPs for each MTU individually. This is important to account for the different activation timing emerging from different motor tasks (Ivanenko et al., 2005).
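Equations 2 and 3 chain into a short per-MTU routine; the sketch below is ours, with the delay expressed in samples and example coefficients chosen so the recursion is critically damped with unit steady-state gain (following the constraints in Lloyd and Besier, 2003):

```python
import numpy as np

def excitation_to_activation(x, alpha, beta1, beta2, d=10, A=-1.0):
    """Eq. 2 then Eq. 3: filter the XP sequence x into u(t), then shape it
    into the MTU-specific activation a(t). d is the electromechanical
    delay in samples; A is the shape factor, constrained to -3 < A < 0."""
    u = np.zeros(len(x))
    for t in range(len(x)):
        xd = x[t - d] if t >= d else 0.0            # delayed excitation
        u1 = u[t - 1] if t >= 1 else 0.0
        u2 = u[t - 2] if t >= 2 else 0.0
        u[t] = alpha * xd - beta1 * u1 - beta2 * u2  # Eq. 2
    return (np.exp(A * u) - 1.0) / (np.exp(A) - 1.0)  # Eq. 3

# Example coefficients: a repeated filter pole at z = 0.5 (critically damped)
# with alpha = 1 + beta1 + beta2 for unit steady-state gain.
BETA1, BETA2 = -1.0, 0.25
ALPHA = 1.0 + BETA1 + BETA2
```

A sustained unit excitation therefore settles, after the delay and filter transient, to an activation of 1.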

In the *EMG-driven model*, the EMG linear envelopes were directly used to drive the musculoskeletal model. As previously described, EMG linear envelopes were normalized with respect to the peak processed EMG values obtained from the entire set of recorded trials (Sartori et al., 2012a,b). In this scenario, a dedicated EMG linear envelope was associated with each muscle group individually, with all MTUs within a muscle group receiving the same EMG pattern (Sartori et al., 2012a). This accounted for the different excitation dynamics across muscle groups as opposed to when using XPs, which excited multiple muscle groups simultaneously. Therefore, the excitation-to-activation transformation (Equations 2 and 3) could be treated as a global transformation that applied equally to all MTUs. That is, the same values for the filtering coefficients and the shape factor were used for all MTUs in the model. In this context, the deeply located iliacus and psoas MTUs were not driven by EMG signals. As a result, only their passive force contribution was modeled using the Musculotendon Dynamics component (**Figure 1C**, see below) by setting the MTU activation to zero (Sartori et al., 2012a).

In the *Musculotendon Dynamics component* (**Figure 1C**), each MTU was modeled as a Hill-type muscle model. In this, the muscle fibers had generic force-velocity $f(v^m)$, passive force-length $f_P(l^m)$, and active force-length $f_A(l^m)$ curves, which were normalized to maximum isometric muscle force ($F^{\max}$), optimal fiber length, and maximum muscle contraction velocity (Zajac, 1989). The tendon dynamics was modeled using a non-linear force-strain function $f(\varepsilon)$ normalized to $F^{\max}$ (Zajac, 1989). Using biomechanical parameters from Delp et al. (1990) and Lloyd and Buchanan (1996, 2001), the MTU force $F^{mt}$ was calculated as a function of $a(t)$, fiber length $l^m$, and fiber contraction velocity $v^m$:

$$\begin{aligned} F^{mt} &= F^t = F^m \cos(\phi(t)) \\ &= \left[ a(t)\, f_A(l^m)\, f(v^m) + f_P(l^m) \right] F^{\max} \cos(\phi(t)) \end{aligned} \tag{4}$$

where $F^t$ and $F^m$ were the tendon and fiber forces, and $\phi(t)$ the pennation angle. During MTU force estimation, $l^m$ and $v^m$ were determined at each time point while ensuring equilibrium between $F^t$ and $F^m$ in Equation 4 (Lloyd and Besier, 2003).
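Equation 4 can be illustrated with toy normalized curves. The curve shapes below are simple placeholders, not the tabulated curves of Zajac (1989) used by the authors, and all names are ours:

```python
import numpy as np

# Toy normalized Hill-type curves (illustrative placeholders only).
def f_active(lm):
    """Active force-length: peaks at the optimal fiber length (lm = 1)."""
    return np.exp(-((lm - 1.0) ** 2) / 0.45)

def f_passive(lm):
    """Passive force-length: engages only beyond the optimal length."""
    return np.where(lm > 1.0, 0.5 * (lm - 1.0) ** 2, 0.0)

def f_velocity(vm):
    """Force-velocity: vm normalized to maximum contraction velocity."""
    return np.clip(1.0 - vm, 0.0, 1.8)

def mtu_force(a, lm, vm, f_max, phi):
    """Eq. 4: F_mt = [a * fA(lm) * f(vm) + fP(lm)] * F_max * cos(phi)."""
    return (a * f_active(lm) * f_velocity(vm) + f_passive(lm)) * f_max * np.cos(phi)
```

Note that with the activation set to zero (as done for the iliacus and psoas in the EMG-driven model), only the passive term contributes force.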

The *Moment Computation component* (**Figure 1D**) estimated the joint moments $M_X$ as the sum of the products of $r_X$ and $F^{mt}$ for each DOF *X*, i.e., HipFE, HipAA, HipROT, KneeFE, AnkleFE, and AnkleSF.
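The per-DOF moment computation itself is just a weighted sum over the MTUs spanning that DOF; a minimal sketch (names ours):

```python
import numpy as np

def joint_moment(moment_arms, mtu_forces):
    """M_X = sum_i r_{X,i} * F_i^{mt} over the MTUs spanning DOF X."""
    return float(np.dot(moment_arms, mtu_forces))
```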

The *Model Calibration component* (**Figure 1E**) determined the values for a set of parameters that vary non-linearly across subjects and cannot be determined experimentally or from literature (Winby et al., 2008). Parameters were varied within predefined boundaries to ensure MTUs always operated within their physiological range (Lloyd and Besier, 2003). Parameters were adjusted using a simulated annealing algorithm (Goffe et al., 1994) until the objective function *fE* = (*E*HipFE + *E*HipAA + *E*HipROT + *E*KneeFE + *E*AnkleFE + *E*AnkleSF) was minimized equally for each DOF. Each DOF error term (*E*HipFE, *E*HipAA, *E*HipROT, *E*KneeFE, *E*AnkleFE, *E*AnkleSF) was the sum of the root mean square differences between the predicted and experimental joint moments calculated over the eight calibration trials recorded for a specific subject.
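The calibration objective can be sketched as follows; the dictionary layout is our own assumption, with each DOF name mapping to its list of per-trial moment curves:

```python
import numpy as np

DOFS = ["HipFE", "HipAA", "HipROT", "KneeFE", "AnkleFE", "AnkleSF"]

def calibration_objective(predicted, experimental):
    """f_E = E_HipFE + E_HipAA + ... + E_AnkleSF, where each DOF error term
    sums the root mean square moment error over the calibration trials
    (eight per subject in the paper)."""
    f_e = 0.0
    for dof in DOFS:
        for m_pred, m_exp in zip(predicted[dof], experimental[dof]):
            f_e += np.sqrt(np.mean((np.asarray(m_pred) - np.asarray(m_exp)) ** 2))
    return f_e
```

A simulated annealing routine would then minimize `calibration_objective` over the model parameters within their physiological bounds.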

During calibration, two MTU-specific activation-filtering coefficients in the Musculotendon Excitation-to-Activation component (**Figure 1B**) were adjusted, while being constrained to realize a stable, positive solution and a critically damped impulsive response for the recursive filter (Equation 2) (Lloyd and Besier, 2003). In this, the two adjusted parameters determined the final values of α, β₁, and β₂ in Equation 2. The global shape factor parameter *A* (Equation 3) was also varied between −3 and 0 to account for the non-linear EMG-to-force relationship (Lloyd and Besier, 2003; Buchanan et al., 2004; Winby et al., 2009).

In the Musculotendon Dynamics component, 11 muscle strength coefficients were calibrated to scale the MTU-specific $F^{\max}$ to match the person's strength, while maintaining the force-generating capacity across MTUs. Strength coefficients were varied between 0.5 and 2 and gathered the MTUs into 11 groups according to their functional action: uniarticular hip flexors, uniarticular hip extensors, hip adductors, hip abductors, uniarticular knee flexors, uniarticular knee extensors, uniarticular ankle plantar flexors, uniarticular ankle dorsiflexors, biarticular quadriceps, biarticular hamstrings, and biarticular calf muscles. Tendon slack length $l^t_s$ and optimal fiber length $l^m_O$ were also adjusted so that $l^t_s$ = initial value ± 5% and $l^m_O$ = initial value ± 2.5%, with initial values obtained using the methodology presented in Winby et al. (2008).

# **VALIDATION PROCEDURE**

The validation comprised four tests to assess the XP-driven model prediction ability and to compare it to the EMG-driven model prediction ability. Furthermore, one additional test was performed to assess the XP-driven model computation time.

In the four prediction tests, the subject-specific calibrated XP-driven and EMG-driven models were operated in open loop (i.e., without using optimization to track the experimentally recorded joint moments) on each individual motor trial performed by each subject. In this, both the XP-driven and EMG-driven models predicted $a(t)$ and $M_X$ solely using the parameterized XPs or experimental EMGs, respectively, and the three-dimensional joint angles. The models' outputs were then time-normalized using a cubic spline, and the similarity between the predicted and the experimental variables was quantified using the coefficient of determination *R*² (i.e., the square of the Pearson product-moment correlation coefficient) and the normalized root mean squared deviation *NRMSD*:

$$NRMSD = \frac{\sqrt{\frac{1}{N} \sum_{i=1}^{N} \left(\hat{X}_i - X_i\right)^2}}{\max(\hat{X}, X) - \min(\hat{X}, X)} \tag{5}$$

where $X$ and $\hat{X}$ referred to the two variables being compared, which were (1) the predicted and experimental joint moments, (2) the $a(t)$ produced using the EMG-driven model and the XP-driven model, or (3) the experimental and parameterized XP curves. Furthermore, *N* referred to the number of points in the considered curves. In our proposed study, these metrics identified acceptable results for 0.0 ≤ *NRMSD* ≤ 0.3 and 0.7 ≤ *R*² ≤ 1.0. This criterion was based on results previously reported in the literature using EMG-driven methodologies (Besier et al., 2003a; Lloyd and Besier, 2003; Buchanan et al., 2004, 2005; Winby et al., 2009; Manal et al., 2011; Shao et al., 2011).
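Equation 5 and the R² criterion translate directly into code; a minimal sketch (names ours):

```python
import numpy as np

def nrmsd(x_hat, x):
    """Eq. 5: RMSD normalized by the combined range of the two curves."""
    rmsd = np.sqrt(np.mean((x_hat - x) ** 2))
    rng = max(x_hat.max(), x.max()) - min(x_hat.min(), x.min())
    return rmsd / rng

def r_squared(x_hat, x):
    """Square of the Pearson product-moment correlation coefficient."""
    return np.corrcoef(x_hat, x)[0, 1] ** 2
```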

For the sole purpose of displaying results concisely, in some cases the time-normalized model outputs from the same motor task were averaged across trials and/or subjects. This produced ensemble-average curves for the predicted $a(t)$ and $M_X$, as well as for the matching experimental joint moments $\hat{M}_X$.

The *first test* examined the five non-negative factors extracted from the experimental EMG data as well as the five XPs parameterized using Equation 1. The agreement between the *j*th experimental non-negative factor ($g^j_{exp}$, 1 ≤ *j* ≤ 5) and the corresponding *j*th parameterized XP curve ($g^j_{par}$, 1 ≤ *j* ≤ 5) was quantified using the *R*² and *NRMSD* coefficients. Furthermore, for the *i*th muscle group (1 ≤ *i* ≤ 16), the values of the five associated normalized weightings $\bar{W}_{j,i}$ (1 ≤ *j* ≤ 5) were analyzed. If $\bar{W}_{j,i}$ was greater than 0.4, then the *j*th XP was associated with the *i*th muscle group. Because EMG linear envelopes were normalized to the peak processed EMG values on a trial basis, it was expected that the amplitude of the final XPs would not affect the final muscle weighting distribution based on the employed 0.4 cut-off criterion. However, to assess this directly, an additional set of weightings was computed, in which the weighting factors $W_{j,i}$ (1 ≤ *j* ≤ 5, 1 ≤ *i* ≤ 16) extracted from NNMF were first multiplied by the peak amplitude of the associated XP and then normalized as previously described in the Muscle Excitation Primitive Section. The two muscle weighting sets were then directly compared.

The *second test* assessed whether the MTU activations predicted by the XP-driven model (i.e., resulting from the XP-to-activation mapping, Equations 2 and 3) were similar to the MTU activations predicted by the EMG-driven model. Because the XP-driven model was calibrated on an individual to best match the variety of MTU and joint dynamics observed over all four calibration tasks (i.e., FW, RN, SS, and CO), the MTU activations resulting from this task-generic XP-to-activation mapping were expected to match the EMG-driven MTU activations well on average over the four motor tasks. However, because the XP-to-activation mapping was preserved across tasks, a less favorable match with the EMG-driven MTU activations was expected for each individual trial.

The *third test* compared the joint moment prediction accuracy of the XP-driven and EMG-driven models. This assessed whether our proposed methodology could use a task-generic XP-to-activation mapping to predict task-specific joint moments simultaneously produced about the six considered DOFs.

The *fourth test* assessed whether the XP-driven musculoskeletal model was able to reproduce the variability observed in the joint moments predicted by the EMG-driven model as well as in the experimentally recorded joint moments. This question arises from the consideration that our proposed model is driven by the same set of XPs, which are only scaled in time to match the length of the trial-specific stance phase. A positive outcome of this test would give further confidence that the use of a subject-generic, task-generic, and low-dimensional XP set does not decrease the ability of the musculoskeletal model to produce movement-specific outputs and does not imply substantial loss of predictive ability with respect to using EMG recordings as an input to the model. Furthermore, this would imply that the predicted joint moment variability was a direct reflection of the predicted MTU kinematics, the only model input that varies across trials as a function of the three-dimensional joint angles. Finally, this would support the hypothesis that dynamically different movements could emerge from the same locomotion program encoded in the spinal circuitries. For this purpose, we calculated and compared the standard deviation curves extracted from the joint moments predicted using the XPs and the EMG data, as well as from those experimentally recorded.

In the *fifth test* the XP-driven musculoskeletal model calibration and execution time were examined. Calibration time was calculated as the time needed to calibrate the model on the eight calibration trials of each subject. Execution time was calculated as the average time needed to repeatedly compute one time point from all DOF joint moments 1000 times. Tests were performed on an 8 GB RAM Intel i7 CPU. If fast execution times were obtained from this test, it would imply the possibility of applying our proposed methodology for the on-line control of powered prostheses and orthoses.

# **RESULTS**

In the *first test* (**Figures 2**, **A2**, **Tables 1**, **A1**), the five experimentally extracted non-negative factors and muscle group weightings accounted for 89% of the experimental EMG data variability. Muscle groups were apportioned into seven modules according to the dominant weightings (**Table 1**). The NNMF (**Figure 2A**) revealed that non-negative factor 1 was mostly responsible for the excitation of add (*W*₁ = 0.77), medham (*W*₁ = 1), latham (*W*₁ = 0.64), and gra (*W*₁ = 1); for the remaining muscle groups *W*₁ ranged from 10⁻⁵ to 0.17. Non-negative factor 2 was mostly responsible for the excitation of recfem (*W*₂ = 1), sar (*W*₂ = 1), and tfl (*W*₂ = 1); for the remaining muscle groups *W*₂ ranged from 10⁻⁵ to 0.26. Non-negative factor 3 was mostly responsible for the excitation of gmax (*W*₃ = 1), gmed (*W*₃ = 1), tfl (*W*₃ = 0.92), vaslat (*W*₃ = 1), and vasmed (*W*₃ = 1); for the remaining muscle groups *W*₃ ranged from 10⁻⁵ to 0.33. Non-negative factor 4 was mostly responsible for the excitation of gaslat (*W*₄ = 1), gasmed (*W*₄ = 1), per (*W*₄ = 1), sol (*W*₄ = 1), and tfl (*W*₄ = 0.55); for the remaining muscle groups *W*₄ ranged from 10⁻⁵ to 0.13. Non-negative factor 5 was mostly responsible for the excitation of add (*W*₅ = 1), gra (*W*₅ = 1), and tibant (*W*₅ = 1); for the remaining muscle groups *W*₅ ranged from 10⁻⁵ to 0.3. The only muscle groups that received excitation from more than one XP were add, gra, and tfl.
The alternative muscle weighting set, computed accounting for the XP peak amplitude (see Muscle Excitation Primitive Section), resulted in the same XP-to-MTU distribution as the muscle weighting set that did not account for the XP peak amplitude. **Figure A2** and **Table A1** directly compare the values from the two muscle weighting sets. The parameterized XPs fitted the experimental non-negative factors well, with *R*² values ranging from 0.74 to 0.94 and *NRMSD* values ranging from 0.0003 to 0.25 (**Figure 2B**). **Table 1** also summarizes how the five parameterized XPs were assigned to the 16 muscle groups and the MTUs within them, and how these were apportioned into the seven muscle modules.

In the *second test* (**Figures 3**, **4**, **Table A2**), the MTU-specific activations predicted by the XP-driven model closely matched the activations predicted by the EMG-driven model on average across all subjects and tasks (**Figure 3**). For this analysis, the iliacus and psoas MTUs were not considered due to the lack of available experimental EMG data. The *R*² coefficient on the average MTU activations assumed values below 0.7 (i.e., between 0.07 and 0.67) for only six MTUs; for the remaining 26 MTUs, it ranged from 0.8 to 0.99. Similarly, the *NRMSD* coefficient on the average MTU activations assumed values above 0.3 (i.e., from 0.31 to 0.45) for only five MTUs; for the remaining 27 MTUs it ranged between 0.057 and 0.25. **Table A2** reports the detailed *R*² and *NRMSD* values for all MTUs averaged across trials and subjects (i.e., the results depicted in **Figure 3**). The proposed XP-driven model was also able to predict the activation of deeply located MTUs, such as psoas and iliacus, for which experimental EMG data were not available. The ability of the XP-driven model to match, on average, the EMG-dependent MTU activations generated in the four motor tasks gives further confidence that the XP-dependent activations predicted for the deeply located MTUs may also be a reliable reflection of their average physiological behavior, although further experimental validation is needed. **Figure 3** also shows that the XP-driven MTU activations assumed smaller variability (i.e., standard deviation range) than the EMG-driven MTU activations. This is because the task-generic XP-to-activation mapping does not reproduce the whole range of task-specific MTU activation patterns observed when using EMG data as input to the model. In this scenario, **Figure 4** depicts the specific case of a representative MTU, the peroneus brevis, for which *R*² ranged from 0.58 to 0.98 and *NRMSD* from 0.12 to 0.46 across subjects and tasks. Similar results were found for all remaining MTUs.
**Table A2** also reports the subject-specific *R*<sup>2</sup> and *NRMSD* values for all MTUs averaged across all trials within each motor task and for each subject individually.

In the *third test* (**Figure 5**, **Tables A3**, **A4**), the XP-driven model predicted the joint moments produced about the six lower extremity DOFs during the four motor tasks with performance comparable to the EMG-driven model (**Figure 5**). The *NRMSD* coefficient between predicted and experimental joint moments ranged from 0.048 to 0.46 when the EMG-driven model was used and from 0.082 to 0.42 when the XP-driven model was employed. The *R*² coefficient between predicted and experimental joint moments ranged from 0.2 to 0.99 for the EMG-driven model and from 0.3 to 0.98 for the XP-driven model. **Table A3** reports the full range of subject-specific and task-specific *R*² and *NRMSD* values observed between the XP-driven model predictions and the experimental measurements. **Table A4** reports the full range of subject-specific and task-specific *R*² and *NRMSD* values between the EMG-driven model predictions and the experimental measurements. The weakest prediction accuracy was observed, for both the XP-driven and EMG-driven models, for the moments about HipAA during FW and the moments about AnkleSF during CO and SS.

**FIGURE 3 | Predicted musculotendon unit (MTU) activations.** The ensemble average (filled lines) and standard deviation (dotted lines) activation curves are depicted for the 34 MTUs included in the musculoskeletal model. Data are averaged across all trials and subjects. MTU names are defined as in **Table 1**. MTU activations are reported from both the XP-driven and EMG-driven musculoskeletal models. The reported data are from the stance phase, with 0% being heel-strike and 100% toe-off events.

**FIGURE 4 |** Activation curves are depicted for the peroneus brevis MTU, as predicted by the XP-driven and EMG-driven musculoskeletal models. Data are averaged across all trials within each motor task performed by the two subjects individually. Motor tasks included fast walking (FW), running (RN), side-stepping (SS), and cross-over (CO) cutting maneuvers. The reported data are from the stance phase, with 0% being heel-strike and 100% toe-off events.

In the *fourth test* (**Figures 5**, **6**, **Table 2**), the upper and lower standard deviations (SDs) of the joint moments predicted using the XP-driven model assumed values similar to those predicted using the EMG-driven model and to those experimentally recorded. **Figure 5** displays joint moment standard deviations task-wise, resulting from averaging across the subjects' trials within each specific task. **Figure 6** and **Table 2** report joint moment standard deviations subject-wise, resulting from averaging across all trials and tasks performed by each subject individually.

**FIGURE 5 |** Curves are depicted for the predicted (i.e., XP-driven and EMG-driven) and experimental (i.e., Reference) joint moments about six degrees of freedom (DOFs) including: hip flexion-extension (HipFE), hip adduction-abduction (HipAA), hip internal-external rotation (HipROT), knee flexion-extension (KneeFE), ankle plantar-dorsi flexion (AnkleFE), and ankle subtalar flexion (AnkleSF), during four motor tasks including: fast walking (FW), running (RN), side-stepping (SS), and cross-over (CO) cutting maneuvers. The reported data are from the stance phase, with 0% being heel-strike and 100% toe-off events.

The SD similarity observed between the XP-driven and the EMG-driven model estimates was within acceptable ranges. The *R*² coefficients were always greater than 0.65, whereas the *RMSD* coefficients were always less than 0.26, about all DOFs, both across motor tasks (**Figure 5**) and subjects (**Figure 6** and **Table 2**), and for both the upper and lower SDs. This gives further confidence that our proposed XP-driven model can reproduce similar output variability across subjects and tasks with respect to the use of experimental EMG recordings as an input to the musculoskeletal model.

Across tasks (**Figure 5**), the least favorable *R*² and *RMSD* values for the SD similarity between the XP-driven model estimates and the experimental data were observed about the AnkleSF during CO (i.e., *R*² = 0.53 and *RMSD* = 0.4, upper SD) and SS (i.e., *R*² = 0.01 and *RMSD* = 0.5, upper SD), and during FW about HipAA (i.e., *R*² = 0.26 and *RMSD* = 0.36, lower SD) and KneeFE (i.e., *R*² = 0.38 and *RMSD* = 0.22, upper SD). In the remaining cases, the *R*² coefficients were always greater than 0.7, whereas the *RMSD* coefficients were always less than 0.21.

Across individuals (**Figure 6** and **Table 2**), subject 1's *R*² and *RMSD* coefficients were always greater than 0.69 and always less than 0.31, respectively. Less favorable values were obtained for subject 2 about AnkleSF (i.e., *R*² = 0.13 and *RMSD* = 0.48, lower SD) and HipAA (i.e., *R*² = 0.12, lower SD, and *RMSD* = 0.31, upper SD). This may also explain the less favorable results for CO, SS, and FW about the same DOFs obtained when analyzing data task-wise (**Figure 5**). In the remaining cases, the *R*² coefficients were always greater than 0.73, whereas the *RMSD* coefficients were always less than 0.19.

The *fifth test* revealed that the average calibration time for the XP-driven model was 21 h and 24 min. However, the calibrated open-loop models executed quickly: the average open-loop execution time was 20.32 ± 0.2 ms.

**FIGURE 6 | Predicted and experimental standard deviations (SDs).** Upper and lower SDs are reported subject-wise, resulting from averaging across motor tasks. SDs are reported for the experimental joint moments (i.e., Reference, shaded area) about six degrees of freedom (DOFs) including: hip flexion-extension (HipFE), hip adduction-abduction (HipAA), hip internal-external rotation (HipROT), knee flexion-extension (KneeFE), ankle plantar-dorsi flexion (AnkleFE), and ankle subtalar flexion (AnkleSF). The SDs are also shown for the same joint moments predicted by the XP-driven and EMG-driven musculoskeletal models (i.e., dotted lines). The data are averaged across all trials and tasks and are reported for the two subjects over the stance phase, with 0% being heel-strike and 100% toe-off events.

**Table 2 | Coefficients of determination (*R*²) and normalized root mean squared deviations (*NRMSD*) between the standard deviations (*SD*) of the joint moments measured experimentally and those predicted by the parameterized XP-driven model.**


# **DISCUSSION**

Although previous works used low-dimensional sets of impulsive curves to drive musculoskeletal models of the human lower extremity (Neptune et al., 2009; McGowan et al., 2010; Allen and Neptune, 2012), our proposed study combined muscle modularity with musculoskeletal modeling to address a number of novel questions. It showed that a single low-dimensional set of single-impulse excitation primitives, or XPs, could be found to best fit the variety of muscle recruitment and excitation patterns observed from two subjects performing motor tasks biomechanically different from each other (i.e. FW, RN, SS, and CO). Once an XP set was defined, no further EMG recordings were needed for the model operation. The XP set determined the structure of a task-generic impulsive controller, which could be preserved across all tasks and subjects. The simplified structure of the task-generic impulsive controller was compensated for by combining it with movement-specific estimates of MTU kinematics. This allowed producing movement-specific estimates of MTU force and joint moment with no loss of accuracy with respect to those derived from experimental EMG data.

The application of the NNMF algorithm to the EMG data set showed that each XP excited one specific subset of muscle groups (**Figures 2**, **A2**, **Tables 1**, **A1**). The only groups that were excited by more than one XP were the hip adductors, the gracilis, and the tensor fasciae latae. This scheme of recruitment was determined based on a 0.4 cut-off criterion on the muscle weightings (see the Musculoskeletal Modeling Section) and was preserved across subjects and motor tasks, where the XPs were only scaled to match the stance phase length of each individual motor trial.
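As an illustration of this extraction step, the following sketch factorizes a synthetic EMG matrix with a basic NNMF implementation and applies the 0.4 cut-off to the normalized weightings. The multiplicative-update solver, matrix sizes, and variable names are our own illustrative choices, not the study's actual pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

def nnmf(V, k, n_iter=500):
    """Basic non-negative matrix factorization via Lee-Seung
    multiplicative updates: V (m x n) ~ W (m x k) @ H (k x n)."""
    m, n = V.shape
    W = rng.random((m, k)) + 1e-3
    H = rng.random((k, n)) + 1e-3
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + 1e-9)
        W *= (V @ H.T) / (W @ H @ H.T + 1e-9)
    return W, H

# Synthetic stand-in for an EMG envelope matrix: rows = muscles,
# columns = time samples over the stance phase.
emg = rng.random((16, 200))
W, H = nnmf(emg, k=5)          # W: muscle weightings, H: non-negative factors

# Normalize each weighting column and apply the 0.4 cut-off used in
# the study to decide which muscles each primitive recruits.
W_norm = W / W.max(axis=0)
recruited = W_norm >= 0.4      # boolean muscle-by-factor recruitment map
```

Muscles whose normalized weighting on a factor exceeds the cut-off are treated as recruited by the corresponding primitive.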

The internal parameters of the XP-driven musculoskeletal model were then calibrated to match the physiological characteristics of each subject recruited in the study (see Methods Section). This also allowed defining a finer, non-linear mapping from the initial XPs to the 34 MTU-specific activations. This mapping represented a best fit of the variety of MTU excitation patterns observed during the four considered tasks and was specific to an individual. It was subsequently applied without further variation during the model validation step (i.e. model open-loop operation). It is worth stressing that, in the context of muscle synergies, the XP-to-activation transformation (Equations 1, 2, and 3) plays the role of the weightings on the XPs (**Figure 2A**), because it allows the amplitude level and the time shift of a specific XP to be refined for a specific MTU. One benefit of the XP-to-activation transformation is that it accounts for the dynamics of muscles (i.e. excitation, activation, and force) and joints (i.e. joint moment), as well as for the demands of the motor tasks used for calibration. These factors are not accounted for by previously proposed dimensionality reduction methodologies that operate on EMG recordings only, including NNMF, principal component analysis, independent component analysis, and factor analysis.
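Equations 1, 2, and 3 are not reproduced in this excerpt; the sketch below only illustrates the general idea of an MTU-specific refinement of a shared primitive, i.e. amplitude scaling, time shifting, and a non-linear shaping stage loosely analogous to common EMG-to-activation models. All parameter names and values are hypothetical:

```python
import numpy as np

def mtu_activation(xp, gain, shift_samples, shape_factor):
    """Illustrative MTU-specific refinement of a shared excitation
    primitive: amplitude scaling, time shifting, and an exponential
    shaping stage mapping [0, 1] to [0, 1] non-linearly."""
    scaled = np.clip(gain * np.roll(xp, shift_samples), 0.0, 1.0)
    if shape_factor == 0:
        return scaled
    return (np.exp(shape_factor * scaled) - 1.0) / (np.exp(shape_factor) - 1.0)

# One Gaussian-shaped primitive over the stance phase (0-100%).
t = np.linspace(0.0, 100.0, 101)
xp = np.exp(-0.5 * ((t - 45.0) / 10.0) ** 2)

# Two hypothetical MTUs sharing the same primitive with different refinements.
act_a = mtu_activation(xp, gain=0.9, shift_samples=0, shape_factor=-2.0)
act_b = mtu_activation(xp, gain=0.6, shift_samples=5, shape_factor=-1.0)
```

Calibrating one gain, shift, and shaping parameter per MTU is what allows a single low-dimensional primitive set to produce 34 distinct activation patterns.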

During validation, the calibrated model was operated in open loop on a set of novel trials that were not used during calibration. However, the novel set of validation trials comprised the same motor tasks used for calibration (i.e. FW, RN, SS, and CO). In this process, the proposed model was driven by the five XPs and by the three-dimensional joint kinematics; numerical optimization was not employed and experimental EMG data were not used as input.

Results demonstrated that the proposed XP-to-activation transformation (Equations 1, 2, and 3) could properly solve the neuromuscular redundancy by predicting a specific MTU activation solution (among the several possible ones) that well reflected a best fit of the different EMG-based MTU activation strategies observed during the four selected tasks (**Figures 3**, **4**, **Table A2**). This result gains further importance if we consider that MTUs were driven by a set of Gaussian-shaped curves that were not linearly combined according to the muscle weightings (**Figure 2** and **Table 1**). This shows that a subject-generic, task-generic, low-dimensional impulsive controller that recruits groups of MTUs with timing dependent on the stance phase can predict physiological MTU activation patterns that reflect subject-specific, task-specific EMG recordings of higher dimensionality.

Results also showed that the proposed XP-driven model was able to predict joint moments that matched those experimentally measured from the six selected lower extremity DOFs with accuracy comparable to that of the EMG-driven model (**Figure 5**, **Tables A3**, **A4**). The ability to match joint moments produced during different motor tasks implied that the proposed methodology was able to account for the different MTU activation strategies and contractile conditions associated with each motor task.

Furthermore, results showed that, although the excitation patterns driving the model (i.e. XPs) were the same across tasks (**Figures 3**, **4**), the patterns of predicted joint moments varied across trials, and this variability was in agreement with the variability observed both in the experimentally measured joint moments and in the joint moments predicted from EMG data (**Figures 5**, **6**, **Table 2**). Here, the task-generic excitation patterns were continuously modulated by the movement-specific estimates of MTU kinematics (i.e. MTU length and moment arms, **Figure 1A**) derived from the experimental joint kinematics input (Sartori et al., 2012c). This modulation process took place both in the Musculotendon Dynamics component (**Figure 1C**) and in the Moment Computation component (**Figure 1D**). The Musculotendon Dynamics component combined MTU activation with MTU length to compute MTU force (Equation 4). The MTU force was then combined with the MTU moment arms to compute MTU moment. Therefore, the computation of MTU force and moment could be seen as a transformation of the task-generic MTU activation that accounted for movement-specific MTU kinematics, thus resulting in movement-specific estimates of joint moments (i.e. the summation of MTU moments about a specific joint and DOF). This allowed compensating for the static behavior and the simplified structure of the single XP-based controller.
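The moment computation described above reduces, for each DOF, to summing the products of MTU forces and their moment arms; a minimal sketch with hypothetical values:

```python
import numpy as np

# Hypothetical MTU forces (N) and moment arms (m) about one DOF at one
# time frame; the moment arm sign encodes flexor vs. extensor action.
mtu_forces = np.array([850.0, 430.0, 610.0])
moment_arms = np.array([0.045, -0.032, 0.038])

# Joint moment about the DOF = summation of the individual MTU moments.
joint_moment = np.sum(mtu_forces * moment_arms)
```

Because the moment arms vary with joint angle, the same task-generic activations yield movement-specific joint moments once combined with the estimated MTU kinematics.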

Previous studies proposed analyses of muscle modularity using NNMF during locomotion tasks including walking (Ivanenko et al., 2005), running, and sidestepping cutting maneuvers (Oliveira et al., 2013). In these studies, five non-negative factors were identified and extracted, reflecting the recruitment of muscles in the lower and upper extremities. Our proposed study identified the same number of non-negative factors as those in the literature (Ivanenko et al., 2005; Oliveira et al., 2013). However, our study only analyzed lower extremity muscles and solely during the stance phase. Furthermore, our extracted set of non-negative factors and weightings reflected the dynamics of four motor tasks simultaneously (i.e. FW, RN, SS, and CO), whereas the previously proposed studies (Ivanenko et al., 2005; Oliveira et al., 2013) each analyzed a specific motor task individually, thus generating a task-specific set of non-negative factors and weightings. These differences were reflected in dissimilarities in the timing of the maximum peak amplitude observed across non-negative factors, as well as in the distribution of weightings across muscles. Our extracted non-negative factors had maximum peaks localized from about 20% to 80% of the stance phase (**Figure 2**). The non-negative factors reported by Ivanenko et al. (2005) during walking had maximum peaks distributed from about 5% to the end of the stance phase. However, their non-negative factors 1 and 5 reflected the recruitment of trunk and arm muscles, and their associated peaks occurred in the transition from the swing to the stance phase. Higher similarity in the maximum peak timing was observed in the inner non-negative factors, including non-negative factor 2 (at about 45% and 55% in our study and in Ivanenko et al. (2005), respectively) and non-negative factor 3 (at about 70% in both studies). A similar scenario was observed in Oliveira et al. (2013) during sidestepping, where the non-negative factors had maximum peaks distributed throughout the entire stance phase, with non-negative factors 1 and 5 reflecting the recruitment of the trunk muscles and implementing the transition across the stance and swing phases. Here too, higher similarity in the maximum peak timing was observed in the inner non-negative factors that most reflected the recruitment of lower extremity muscles. These included non-negative factor 2 in our study and non-negative factor 3 in Oliveira et al. (2013), both at about 45% of stance, as well as non-negative factor 3 in our study and non-negative factor 4 in Oliveira et al. (2013), at about 70% and 65%, respectively. The non-negative factors reported by Oliveira et al. (2013) during running also had maximum peaks distributed throughout the entire stance phase; in this case, the recruitment of the trunk muscles was reflected by all five non-negative factors. The best similarity in the maximum peak timing was observed in the non-negative factors that most accounted for the recruitment of lower extremity muscles. These included non-negative factor 1 (at about 20% in both studies), as well as non-negative factor 3 in our study and non-negative factor 2 in Oliveira et al. (2013), at about 70% and 65% of stance, respectively.

Future work is needed to further improve our proposed methodology. Experimental results showed that the trial-specific non-negative factors extracted from the different motor tasks had peaks that were substantially shifted in time depending on the nature of the task. This was especially evident in the third non-negative factor (**Figure 2B**). While the timing of the peaks for the trial-specific factors extracted from RN, SS, and CO was consistent, the peak for the trial-specific factors extracted from FW occurred about 30% of the stance phase earlier. Future work is needed to allow adjusting the peak timing, magnitude, and bell width of the parameterized XPs during model execution to implement the transition across tasks (i.e., RN, CO, and SS). Furthermore, the XP-to-activation transformation (Equations 1, 2, and 3) will be modulated across tasks, thus allowing a better representation of the dynamics of individual movements. Moreover, the parameterized XPs could, in the future, be further modulated in time and amplitude based on biomechanical events triggering appropriate muscle reflexes, to allow for adaptation to different gait dynamics and terrains.
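A parameterized single-impulse XP of the kind discussed here can be sketched as a Gaussian curve whose peak timing, magnitude, and bell width are the adjustable parameters (names and values below are ours):

```python
import numpy as np

def gaussian_xp(peak_time, magnitude, width, n_samples=101):
    """Single-impulse excitation primitive over the stance phase,
    expressed in percent of stance (0 = heel-strike, 100 = toe-off)."""
    t = np.linspace(0.0, 100.0, n_samples)
    return magnitude * np.exp(-0.5 * ((t - peak_time) / width) ** 2)

# Five primitives with peaks spread over ~20-80% of stance, matching
# the range reported for the extracted factors (values illustrative).
xps = np.stack([gaussian_xp(p, 1.0, 8.0) for p in (20, 35, 50, 65, 80)])
```

Adjusting `peak_time`, `magnitude`, and `width` online is exactly the kind of modulation the paragraph above proposes for implementing task transitions.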

The joint moment prediction accuracy tests (**Figure 5** and **Tables A3**, **A4**) revealed that neither the XP-driven nor the EMG-driven model could predict a substantial moment contribution about AnkleSF during the CO and SS cutting maneuver tasks and about HipAA during FW. This may be explained by the fact that the MTUs currently included in the model with AnkleSF and HipAA moment arms accounted for 80% and 86%, respectively, of the total physiological cross-sectional area. Future work should therefore include additional MTUs crossing the hip and ankle joints. The MTUs in the model with moment arms about the remaining four DOFs accounted for more than 90% of the total physiological cross-sectional area.

Our proposed methodology predicted joint moments during the stance phase only. The main reason for this was that calibration included trials of running, as well as sidestepping and crossover cutting maneuvers; for these motor tasks, the swing phase occurred partially or totally out of the motion capture volume. Therefore, only incomplete swing phase data were available for calibration across trials. The second, much lesser, reason was that joint moments were estimated using inverse dynamics, which strongly relies on the magnitude of the GRFs (Delp et al., 2007). During the swing phase of locomotion, the GRFs are zero, which means the inverse dynamics calculations become highly sensitive to segmental inertial parameters that are difficult to measure *in vivo*. These include the segment mass, the location of the segment center of mass, and the mass moment of inertia (Lanovaz and Clayton, 2001; Delp et al., 2007), which were only scaled linearly to the subject's size (Delp et al., 2007). Inverse dynamics measurements of joint moments during the swing phase may therefore not be reliable, and we preferred not to use them for the model calibration and for the subsequent validation step. Future work will focus on (1) using better methods for extracting subject-specific segmental parameters (i.e. using MRI), and (2) predicting joint kinematics, rather than joint moments, using full forward dynamics models (Barrett et al., 2007) or non-parametric methods such as Bayesian filtering (Ko and Fox, 2009). This will allow extending the analyses presented in this study to the whole gait cycle, thus increasing the applicability of our proposed methodology.

The present results showed that our proposed methodology could predict MTU forces and joint moments within the range of DOFs, tasks, and gait cycle phases (i.e. the stance phase) on which the model was calibrated. However, how the model extrapolates outside this range is currently not known. Furthermore, it is not known whether new ranges of DOFs, tasks, and gait cycle phases would require updating the Gaussian curves in the impulsive controller accordingly. This requires extensive and structured research, which was beyond the scope of this study. However, this will be important to determine, as the size of the calibration data set also affects the speed at which calibration can occur. Indeed, our proposed XP-driven model relies on an off-line calibration procedure that is time consuming. On the other hand, the model execution was observed to be fast, i.e. on the order of 20 ms per time frame. Future work should focus on the design of more efficient calibration algorithms. The use of MTU models that do not require an explicit integration of the MTU dynamics equations could considerably speed up the calibration process without loss of joint moment prediction accuracy, as was shown in Sartori et al. (2012b).

This work presented a study on two subjects only; therefore, it may not be completely generalizable. However, the proposed XP-driven musculoskeletal model was scaled and then calibrated to the actual subjects to account for the subject-specific (1) anthropometry, (2) XP-to-activation mapping, and (3) MTU intrinsic properties. This allows our methods to be applied across individuals without relying on the existence of specific anthropomorphic models, while accounting for the individual's muscle activation patterns across multiple DOFs. This represents an improvement over current state-of-the-art methodologies, in which the recruited subjects were chosen to be of similar build to the anatomical model (Lloyd and Besier, 2003; Martelli et al., 2011). However, a more general model validation across a larger number of individuals will be the subject of future work.

It is important to note that the aim of our proposed work was not to address all limitations associated with excitation-driven modeling in one single study. Our aim was to demonstrate that a single impulsive controller could be used as the input drive to large musculoskeletal models operating in open loop across different motor tasks, with no loss of accuracy with respect to using experimental EMGs. This supports the hypothesis that biomechanically different movements could emerge from the same locomotion program encoded in the spinal circuitries.

The proposed XP-driven model may have direct implications for the development of rehabilitation technologies. The proposed methodology could, in the future, be further extended to create generic XP sets descriptive of larger populations of subjects and motor tasks. Also, additional XP sets could be created specifically for different patient populations, thus describing the MTU recruitment patterns typically observed in different neurological or orthopedic conditions. This would open the possibility of extrapolating the generic XP-based impulsive controller to novel subjects (within a specific population) without needing to record further EMG data.

The ability of our proposed XP-driven model to predict physiological MTU activations and joint moments will allow obtaining accurate predictions of the user's effort during dynamic movement. This will allow determining how muscles contribute to modulating joint compliance in locomotion (Rapoport et al., 2003; Cronin et al., 2011; Heitmann et al., 2011; Pfeifer et al., 2012). It will allow determining the heat released by muscles and the resulting metabolic energy consumption during movement (Sawicki and Ferris, 2009; Bisi et al., 2011; Krishnaswamy et al., 2011; Farris and Sawicki, 2012). Furthermore, it will allow determining the magnitude of the reaction forces in the lower extremity joints (Winby et al., 2009; Lin et al., 2010; Fregly et al., 2012a; Modenese and Phillips, 2012; Manal and Buchanan, 2013). The ability to determine these variables will enable a number of applications in the field of neurorehabilitation technologies, including (1) the design of powered prostheses that modulate joint compliance

according to that modulated in the subject's contralateral leg, (2) the design of powered orthoses that can effectively reduce the energy consumption during locomotion, and (3) the monitoring and prevention of orthopedic conditions such as osteoporosis and osteoarthritis. In these scenarios, the proposed XP-driven model would only need direct recordings of joint angles and estimates of the gait cycle percentage as input. This would allow decreasing the input drive complexity and the number of needed sensors, thus increasing the robustness of the system with respect to the case requiring real measurements of EMG data.

The availability of the proposed methodology will facilitate the transition toward the design of human-inspired devices that can effectively embody the dynamics of the human neuromuscular control of movement without relying on explicit representations of task-specific control models.

# **ACKNOWLEDGMENTS**

The work was supported by the National Institute of Health in the USA [R01 EB009351-01A2]; the National Health and Medical Research Council in Australia [628850 and 334151]; the Western Australian Medical and Health Research Infrastructure Council; the ERC Advanced Grant DEMOVE; The EU-FP7 Project H2R. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 27 September 2012; accepted: 02 June 2013; published online: 26 June 2013.*

*Citation: Sartori M, Gizzi L, Lloyd DG and Farina D (2013) A musculoskeletal model of human locomotion* *driven by a low dimensional set of impulsive excitation primitives. Front. Comput. Neurosci. 7:79. doi: 10.3389/ fncom.2013.00079*

*Copyright © 2013 Sartori, Gizzi, Lloyd and Farina. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*


# **APPENDIX**

peak). Also see **Table A1**.


**Table A1 | Comparison of muscle weightings normalized with and without accounting for the excitation primitive (XP) peak.**

**Table A2 | Coefficients of determination (*R*<sup>2</sup>) and normalized root mean squared deviations (*NRMSD*) between the MTU activations predicted using the XP-driven and the EMG-driven models.**


*(Continued)*

### **Table A2 | Continued**


*The R*<sup>2</sup> *and the NRMSD are reported both for (1) the MTU activations averaged over all trials for a certain task and subject and for (2) the MTU activation values averaged over all trials and subjects.*

### **Table A3 | Coefficients of determination (*R*<sup>2</sup>) and normalized root mean squared deviations (*NRMSD*) between the joint moments measured experimentally and those predicted by the XP-driven model.**


*Values are reported for joint moment estimates averaged over all trials within a certain task.*

### **Table A4 | Coefficients of determination (*R*<sup>2</sup>) and normalized root mean squared deviations (*NRMSD*) between the joint moments measured experimentally and those predicted by the EMG-driven model.**


*Values are reported for joint moment estimates averaged over all trials within a certain task.*

# A computational analysis of motor synergies by dynamic response decomposition

#### *Cristiano Alessandro1 \*, Juan Pablo Carbajal <sup>2</sup> and Andrea d'Avella3*

*<sup>1</sup> AI Lab, Department of Informatics, University of Zurich, Zurich, Switzerland*

*<sup>2</sup> Department of Electronics and Information Systems, Ghent University, Ghent, Belgium*

*<sup>3</sup> Laboratory of Neuromotor Physiology, Fondazione Santa Lucia, Rome, Italy*

### *Edited by:*

*Tamar Flash, Weizmann Institute, Israel*

### *Reviewed by:*

*Florentin Wörgötter, University Goettingen, Germany Simon Giszter, Drexel Medical School, USA*

### *\*Correspondence:*

*Cristiano Alessandro, AI Lab, Department of Informatics, University of Zurich, Andreasstrasse 15, Zurich 8050, Switzerland e-mail: alessandro@ifi.uzh.ch*

Analyses of experimental data acquired from humans and other vertebrates have suggested that motor commands may emerge from the combination of a limited set of modules. While many studies have focused on physiological aspects of this modularity, in this paper we propose an investigation of its theoretical foundations. We consider the problem of controlling a planar kinematic chain, and we restrict the admissible actuations to linear combinations of a small set of torque profiles (i.e., motor synergies). This scheme is equivalent to the time-varying synergy model, and it is formalized by means of the dynamic response decomposition (DRD). DRD is a general method to generate open-loop controllers for a dynamical system to solve desired tasks, and it can also be used to synthesize effective motor synergies. We show that a control architecture based on synergies can greatly reduce the dimensionality of the control problem, while keeping a good performance level. Our results suggest that in order to realize an effective and low-dimensional controller, synergies should embed features of both the desired tasks and the system dynamics. These characteristics can be achieved by defining synergies as solutions to a representative set of task instances. The required number of synergies increases with the complexity of the desired tasks. However, a possible strategy to keep the number of synergies low is to construct solutions to complex tasks by concatenating synergy-based actuations associated with simple point-to-point movements, with a limited loss of performance. Ultimately, this work supports the feasibility of controlling a non-linear dynamical system by linear combinations of basic actuations, and illustrates the fundamental relationship between synergies, desired tasks, and system dynamics.

### **Keywords: muscle synergies, number of synergies, system dynamics, kinematic strokes, kinematic chain**

# **1. INTRODUCTION**

Richness, flexibility, and adaptability characterize the generation of movements in many animal species. During the last century these features have fascinated many scientists, who started to investigate the possible mechanisms underlying the observed motor performance. Although many questions remain open, today there is a large consensus that motor skills may arise from a modular and hierarchical organization of the movement system (Kargo and Giszter, 2000a,b; Hart and Giszter, 2004; Ting and McKay, 2007; Bizzi et al., 2008; Kargo and Giszter, 2008; d'Avella and Pai, 2010). This idea was initially introduced by Bernstein (1967) in the context of motor redundancy, and it has then evolved into different, yet related, concepts (Flash and Hochner, 2005; Giszter et al., 2010). The common denominator of these ideas is that motor actions emerge from the combination of a limited set of modules. This strategy would reduce the number of variables to be controlled, and therefore it might simplify motor control and learning.

One of the proposed forms of modularity are the so-called muscle synergies, coordinated activations of groups of muscles (Tresch et al., 1999; Saltiel et al., 2001; d'Avella et al., 2003). Hypothetically, the central nervous system (CNS) encodes a parsimonious set of synergies and combines them in a task-dependent fashion to generate appropriate motor commands. This hypothesis is typically evaluated by analyzing the spatio-temporal regularities of electromyographic signals (EMG) recorded from a group of subjects. Decomposition-based techniques, such as principal component analysis (PCA) or non-negative matrix factorization (NMF), are used to extract the components that best reconstruct the recorded dataset. In many cases these components (i.e., synergies) appear very similar across different experimental conditions, and therefore they are regarded as an indirect evidence of the hypothesized neural modularity. This methodology has been successful in explaining muscle contractions across a wide range of complex tasks (e.g. running, walking, keeping balance, reaching and other combined movements) in humans (Ivanenko et al., 2005; Cappellini et al., 2006; d'Avella et al., 2006, 2008, 2011; Torres-Oviedo and Ting, 2007, 2010), in frogs (Giszter et al., 1993; Mussa-Ivaldi et al., 1994; Kargo and Giszter, 2000b, 2008; Mussa-Ivaldi and Bizzi, 2000), cats (Ting and Macpherson, 2005; Torres-Oviedo et al., 2006), monkeys (Overduin et al., 2008, 2012), and other species (Dominici et al., 2011). However, the results are often descriptive in nature and they do not offer a principled investigation of the hypothesized synergy-based control strategy (Alessandro et al., 2013).

The implementation of muscle synergies within the CNS is currently under investigation (Bizzi and Cheung, 2013). Recently, Hart and Giszter (2010) have provided direct evidence that dedicated sets of spinal interneurons are associated to the temporal activations of synchronous synergies in frogs. Experiments with monkeys (Overduin et al., 2012) and humans (Cheung et al., 2009b; Clark et al., 2010) suggest that synergies may be organized in the spinal cord and in the cortico-spinal divergent connectivity, and that the motor cortex modulates their recruitment. For visually guided tasks, time-varying synergies might be represented also at the cortical level; their spatial structure might derive from divergent corticospinal connectivity or from spinally organized modules, and their temporal characteristic may originate from the activation dynamics of the motor cortex (d'Avella et al., 2006, 2008, 2011).

While these studies focus on physiological aspects of the muscle synergy hypothesis, very little research addresses the theoretical foundations of the proposed modular controller. Which synergies should be employed to execute the desired motor tasks? How many synergies are needed? How does the dynamics of the system to be controlled affect the synergy set? Is there a relation between the desired tasks and these elementary control modules? Addressing these theoretical questions would certainly provide a better understanding of the muscle synergy hypothesis, and might eventually lead to a computational model to explain the experimental data. In this paper we analyze these aspects from the perspective of controlling an idealized arm. We formulate control signals for a planar kinematic chain as linear combinations of a small set of predefined actuations (i.e., synergies), in accordance with the model of time-varying synergies (d'Avella et al., 2003). For this purpose we propose the dynamic response decomposition (DRD), a general tool to find the open-loop controllers that enable a dynamical system to solve desired tasks (Alessandro et al., 2012; Carbajal, 2012). Our method initially solves the task in state variables by interpolation; then, it identifies the combination of synergies (i.e., the actuation) that leads to the kinematic trajectory closest to the computed interpolant. Additionally, we propose a procedure to synthesize a limited set of effective synergies. In this manuscript we apply the DRD to point-to-point reaching tasks and to via-point movements. Within the latter class of tasks we analyze two specific scenarios: (1) moving to a desired target and coming back to the initial posture (i.e., reversal task), and (2) reaching a desired location while passing through a given via-point (i.e., via-point reaching).
Our theoretical analysis is independent of the biological implementation details of muscle synergies; i.e., we employ a kinematic chain instead of a biologically plausible musculoskeletal model, and DRD is currently not proposed as a model of the CNS mechanisms underlying muscle synergies. However, we believe that our results have general validity, as they address the fundamental problem of controlling a non-linear dynamical system by means of a modular synergy-based controller.
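To make the two-stage DRD procedure concrete, the following toy sketch applies it to a linear 1-DOF plant, where the responses to individual synergies superpose and stage two reduces to ordinary least squares. The plant, synergy library, and target trajectory are our own illustrative choices, not the paper's planar kinematic chain:

```python
import numpy as np

# Toy linear 1-DOF plant: q'' + c q' + k q = u(t), integrated from rest.
c, k, dt, n = 1.0, 4.0, 0.01, 300
t = np.arange(n) * dt

def simulate(u):
    """Semi-implicit Euler integration of the plant; returns q(t)."""
    q = np.zeros(n)
    v = 0.0
    for i in range(1, n):
        a = u[i - 1] - c * v - k * q[i - 1]
        v += a * dt
        q[i] = q[i - 1] + v * dt
    return q

# Synergy library: Gaussian torque pulses with different peak times.
synergies = [np.exp(-0.5 * ((t - p) / 0.3) ** 2) for p in (0.5, 1.0, 1.5, 2.0)]
responses = np.column_stack([simulate(s) for s in synergies])

# Stage 1 (trivial here): a desired trajectory interpolating the task points.
target = 0.2 * np.sin(np.pi * t / t[-1]) ** 2

# Stage 2: weights b minimizing || responses @ b - target ||.
b, *_ = np.linalg.lstsq(responses, target, rcond=None)
reconstruction = responses @ b
```

For the non-linear kinematic chain studied in the paper, superposition no longer holds exactly, which is why DRD searches for the combination whose simulated trajectory is closest to the interpolant rather than solving a single linear system.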

Reversal and via-point reaching movements can be subdivided into two distinct kinematic phases: from the initial to the intermediate point, and from the intermediate to the final point. A possible strategy to solve these tasks is therefore to concatenate the actuations associated with the two phases; each actuation is in turn realized as a combination of synergies. This idea is related to another form of modularity, the composition of movements into sequences of kinematic primitives, or strokes (Flash et al., 1992; Novak et al., 2003). While this segmentation explains a vast amount of experimental data, there is no consensus on whether such strokes effectively reflect a segmented control strategy (Fishbach et al., 2005, 2007). Alternatively, they could emerge as a result of a trajectory optimization (Dagmar and Schaal, 1999), or even be artifacts of the data analysis. In these latter cases the actuation could be computed in its entirety, without concatenation. In this manuscript we analyze both strategies: the concatenation of simple synergy-based control signals, and the computation of a synergy-based actuation for the whole task. This investigation provides some computational insights on the advantages and disadvantages of the two approaches, and it offers a proof of concept of how muscle synergies and kinematic modularity might be integrated into a unified framework.

This paper is organized as follows. In section 2 we introduce the mathematical formulation of DRD, the method that we employ throughout the paper to synthesize synergies and to compute task solutions. Section 3 presents the results obtained for reversal and via-point reaching tasks. Such results are further discussed in section 4, where we additionally summarize and speculate on important aspects of the muscle synergy hypothesis that are highlighted by DRD; finally we provide some concluding remarks.

# **2. METHODS**

In this section we introduce the mathematical details of the dynamic response decomposition (DRD). After some definitions, we present the core element of the method: a general procedure to compute actuations that solve generic reaching tasks (see section 2.1). Subsequently, in section 2.2, we show how DRD can be used for the synthesis of a set of synergies.

Let us consider a differential equation modeling a physical system

$$\mathcal{D}\left(q(t)\right) = u(t),$$

where *D* is a differential operator, *q(t)* represents the time-evolution of the configuration variables (their derivatives with respect to time are *q̇(t)*), and *u(t)* is the applied actuation. Inspired by the hypothesis of muscle synergies, we formulate the actuation as a linear combination of predefined motor co-activation patterns:

$$u(t) = \sum_{i=1}^{N_\phi} \phi_i(t)\, b_i := \Phi(t)\, b,\tag{1}$$

where the *N*<sub>φ</sub> functions φ<sub>*i*</sub>*(t)*, called *motor synergies*, are modulated by the weighting coefficients *b<sub>i</sub>*. The notation Φ*(t)* describes a formal matrix where each column is a different synergy, and the column vector *b* encapsulates the weighting coefficients. If we consider a time discretization, Φ*(t)* becomes an *N*·dim*(q)*-by-*N*<sub>φ</sub> matrix, where *N* is the number of time steps and dim*(q)* is the dimensionality of the configuration space. Equation (1) is essentially equivalent to the model of time-varying synergies (d'Avella et al., 2003); however, in this paper we neglect the possibility of modulating the onset time of each synergy.
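In discretized form, the synergy combination of Equation (1) is simply a matrix-vector product. A minimal numerical sketch (with hypothetical dimensions, not those of the arm model):

```python
import numpy as np

# Hypothetical discretization: N time steps, 2-DoF chain, N_phi synergies.
N, dof, N_phi = 100, 2, 5
rng = np.random.default_rng(0)

# Phi stacks each discretized synergy phi_i(t) as a column of length N*dof.
Phi = rng.standard_normal((N * dof, N_phi))
b = rng.standard_normal(N_phi)   # weighting coefficients b_i

u = Phi @ b                      # actuation u(t) = Phi(t) b, Equation (1)
assert u.shape == (N * dof,)
```

The synergies here are random placeholders; in the paper they are the outcome of the synthesis procedure of section 2.2.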

We define the *dynamic responses* (DRs) of a set of synergies as the responses **θ**<sub>*i*</sub>*(t)* of the system to each synergy (i.e., forward dynamics):

$$\mathcal{D}(\theta_i(t)) = \phi_i(t), \qquad i = 1, \ldots, N_\phi, \tag{2}$$

with initial conditions chosen arbitrarily.
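As an illustration of Equation (2), the following sketch integrates the forward dynamics of a toy 1-DoF linear plant (a stand-in for the planar arm model; the damping and stiffness values are assumptions made for the example) under each synergy to obtain its dynamic response:

```python
import numpy as np

def dynamic_response(phi, dt=0.01, c=0.5, k=2.0):
    """Forward-integrate q'' + c q' + k q = phi(t), a toy 1-DoF plant
    standing in for the arm model, with zero initial conditions."""
    q, qd = 0.0, 0.0
    traj = np.empty(len(phi))
    for n, u in enumerate(phi):       # semi-implicit Euler step
        qdd = u - c * qd - k * q
        qd += dt * qdd
        q += dt * qd
        traj[n] = q
    return traj

# One dynamic response theta_i per synergy phi_i (Equation 2).
t = np.linspace(0.0, 1.0, 101)
synergies = [np.sin(2 * np.pi * f * t) for f in (1, 2)]
thetas = [dynamic_response(phi) for phi in synergies]
```

For the actual two-link arm the same scheme applies, with the scalar update replaced by the non-linear rigid-body equations of motion.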

### **2.1. THE DYNAMIC RESPONSES DECOMPOSITION**

A generic reaching task consists in reaching a final state (*q<sub>T</sub>*, *q̇<sub>T</sub>*) from an initial state (*q*<sub>0</sub>, *q̇*<sub>0</sub>) in a given amount of time *T*, satisfying intermediate constraints called via-points. In the case of a single via-point defined at time *t<sub>v</sub>*, the task can be formalized as follows:

$$\begin{aligned} q(0) &\doteq q_0, & \dot{q}(0) &\doteq \dot{q}_0, \\ q(t_v) &\doteq q_v, & \dot{q}(t_v) &\doteq \dot{q}_v, \\ q(T) &\doteq q_T, & \dot{q}(T) &\doteq \dot{q}_T, \end{aligned} \tag{3}$$

where ≐ indicates a prescribed value, i.e., a point constraint. Depending on the desired task, more or fewer requirements can be imposed. For example, a simple point-to-point reaching task consists only of the constraints defined at *t* = 0 and *t* = *T*. Furthermore, one could formulate via-point tasks without prescribing any velocity; this would define a class of tasks where the system is free to traverse the desired positions with any velocity. In addition, it is also possible to constrain higher-order time derivatives of the configuration vector, e.g., acceleration, jerk, etc.

Controlling a system to perform a given task amounts to finding the actuation *u(t)* that leads to an evolution of the system variables that fulfills the point constraints (Equation 3). Specifically, assuming that the synergies are known, the goal is to identify the appropriate synergy combination coefficients *b*. The DRD procedure consists of, first, solving the problem in kinematic space (i.e., finding an appropriate *q(t)*), and then computing the corresponding actuation. From the kinematic point of view, solving the task can be seen as an interpolation problem; i.e., a set of functions is used to generate a trajectory *q(t)* that interpolates the points {*q<sub>k</sub>(t<sub>k</sub>)*, *q̇<sub>k</sub>(t<sub>k</sub>)*}, *k* = 0, *v*, *T*, associated with the task constraints (Equation 3). The idea is not to track a desired trajectory defined *a priori*, but to find any trajectory that passes through the points defined by the task. To build this interpolant one could employ orthonormal polynomials, trigonometric or Gaussian functions, to mention just a few possibilities. One of the most salient properties of DRD is that it employs the dynamic responses of the synergies (given by Equation 2) as the interpolating functions, that is:

$$q(t) = \sum_{i=1}^{N_\theta} \theta_i(t)\, a_i := \Theta(t)\, a.\tag{4}$$

The quality of the DRs as building blocks for the interpolation was evaluated in our previous works on planar kinematic chains (Alessandro et al., 2012) and other dynamical systems (Carbajal, 2012). As mentioned before, if time is discretized, Θ*(t)* becomes an *N*·dim*(q)*-by-*N*<sub>θ</sub> matrix, where *N*<sub>θ</sub> is the number of dynamic responses. The vector of combination coefficients *a* is chosen such that the task constraints are satisfied, obtaining one out of the myriad of possible trajectories that solve the task. Specifically, this vector is computed by solving the following linear system of equations:

$$\begin{pmatrix} \theta_1(0) & \dots & \theta_{N_\theta}(0) \\ \theta_1(t_v) & \dots & \theta_{N_\theta}(t_v) \\ \theta_1(T) & \dots & \theta_{N_\theta}(T) \\ \dot{\theta}_1(0) & \dots & \dot{\theta}_{N_\theta}(0) \\ \dot{\theta}_1(t_v) & \dots & \dot{\theta}_{N_\theta}(t_v) \\ \dot{\theta}_1(T) & \dots & \dot{\theta}_{N_\theta}(T) \end{pmatrix} a = Ma = \begin{pmatrix} q_0 \\ q_v \\ q_T \\ \dot{q}_0 \\ \dot{q}_v \\ \dot{q}_T \end{pmatrix} = P. \tag{5}$$

The matrix *M* on the left-hand side is called the *alternant matrix*; the solvability of the problem depends on its rank. If the matrix has full row rank, any point constraint can be satisfied. Otherwise, the possibility of finding an exact solution (as opposed to an approximation) becomes strictly dependent on the specific task. According to the Rouché-Capelli theorem, if the rank of the alternant matrix (not necessarily equal to the number of rows) is equal to the rank of the augmented matrix [*M*|*P*], where *P* is the vector of point constraints, the specific problem can be solved exactly. Section 3 presents some examples. These observations, and their implications for the hypothesis of muscle synergies, are further discussed in section 4.
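A sketch of this interpolation step, Equations (4)-(5): the alternant matrix is assembled from the dynamic responses sampled at the constraint times, solvability is checked via the Rouché-Capelli criterion, and the coefficients *a* are obtained by least squares. The dynamic responses below are random placeholders for a scalar configuration variable, purely for illustration:

```python
import numpy as np

def solve_point_constraints(thetas, thetas_dot, sample_idx, P):
    """Assemble the alternant matrix M of Equation (5) from the DRs
    sampled at the constraint times and solve M a = P.
    Returns (a, exact), where `exact` is the Rouché-Capelli test."""
    pos_rows = [[th[k] for th in thetas] for k in sample_idx]
    vel_rows = [[thd[k] for thd in thetas_dot] for k in sample_idx]
    M = np.array(pos_rows + vel_rows)
    exact = np.linalg.matrix_rank(M) == np.linalg.matrix_rank(
        np.column_stack([M, P]))
    a, *_ = np.linalg.lstsq(M, P, rcond=None)
    return a, exact

# Toy scalar dynamic responses (random placeholders).
rng = np.random.default_rng(1)
N, N_theta = 101, 8
thetas = [rng.standard_normal(N).cumsum() for _ in range(N_theta)]
thetas_dot = [np.gradient(th) for th in thetas]

# Constraints at t = 0, t_v (index 50), T (index 100): positions
# 0, 0.5, 1 and zero velocity at all three times.
P = np.array([0.0, 0.5, 1.0, 0.0, 0.0, 0.0])
a, exact = solve_point_constraints(thetas, thetas_dot, [0, 50, 100], P)
```

With eight DRs and six scalar constraints the alternant matrix generically has full row rank, so the combination Θ(t)a passes exactly through the prescribed points.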

Once a kinematic solution has been found (as a linear combination of DRs), the corresponding actuation *ũ(t)* can be obtained by applying the differential operator (i.e., inverse dynamics):

$$\mathcal{D}\left(\Theta(t)\, a\right) = \tilde{u}(t).$$

Finally, the vector *b* can be computed by projecting *ũ(t)* onto the linear span of the synergy set Φ. If *ũ(t)* does not belong to the linear span of Φ, the solution can only be approximated in terms of a chosen norm (e.g., Euclidean):

$$b = \operatorname*{arg\,min}_{b} \|\tilde{u}(t) - \Phi(t)\, b\|. \tag{6}$$

When time is discretized, all functions of time become vectors and this problem can be solved explicitly using the pseudo-inverse of the matrix Φ*(t)*:

$$
\Phi^+ \tilde{u} = \Phi^+ \mathcal{D}\left(\Theta a\right) = b. \tag{7}
$$

This equation highlights the mapping between the kinematic combination coefficients *a* (kinematic solution) and the synergy combination coefficients *b* (dynamic solution):

$$
\mathcal{F} = \Phi^+ \circ \mathcal{D} \circ \Theta,\tag{8}
$$

where ◦ denotes composition. Generically, this operator represents a non-linear mapping *F*: ℝ<sup>*N*<sub>θ</sub></sup> → ℝ<sup>*N*<sub>φ</sub></sup>, and it will be discussed in section 4.3.
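In discretized form, the projection step of Equation (7) and the projection error of Equation (10) reduce to a pseudo-inverse and a residual norm. A sketch with random placeholder data:

```python
import numpy as np

rng = np.random.default_rng(2)
N, N_phi = 200, 5
Phi = rng.standard_normal((N, N_phi))  # discretized synergies, one per column
u_tilde = rng.standard_normal(N)       # actuation from inverse dynamics (placeholder)

b = np.linalg.pinv(Phi) @ u_tilde      # Equation (7): projection coefficients
residual = u_tilde - Phi @ b
err_P = np.linalg.norm(residual)       # discretized Equation (10)
```

Since the placeholder `u_tilde` is generic noise, `err_P` is non-zero here; it vanishes exactly when *ũ* lies in the linear span of the synergies.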

To assess the quality of the solution we define the following measures:

*Interpolation error*: measures the quality of the interpolant Θ*(t)a* with respect to the task constraints.

$$\text{err}_I = \sqrt{\sum_{k \in K} e_{IPk}^2 + e_{IVk}^2}$$

$$e_{IPk} = \|q_k - \Theta(t_k)\, a\|, \qquad e_{IVk} = \|\dot{q}_k - \dot{\Theta}(t_k)\, a\|, \tag{9}$$

$$K = \{0, v_1, \ldots, v_n, T\},$$

where ||·|| denotes the Euclidean norm, and differences between angles are mapped to the interval (−π, π]. The subindex *k* identifies the point constraint, i.e., *k* = 0 for the initial condition, *k* = *v<sub>i</sub>* for the *i*-th via-point, and *k* = *T* for the final condition. In this work we consider tasks with a single via-point or with none, i.e., *K* = {0, *v*, *T*} and *K* = {0, *T*}, respectively (the latter case corresponding to simple point-to-point tasks). Note that err<sub>*I*</sub> is not a tracking error with respect to a predefined trajectory, but a measure of the distance between Θ*(t)a* and the points {*q<sub>k</sub>(t<sub>k</sub>)*, *q̇<sub>k</sub>(t<sub>k</sub>)*} defined by the tasks.

*Projection error*: measures the distance between the actuation *ũ(t)* that solves the task and the control signal obtained by the linear combination of the synergies Φ:

$$\text{err}_P = \sqrt{\int_0^T \|\tilde{u}(t) - \Phi(t)\, b\|^2\, dt}. \tag{10}$$

This error represents the loss caused by projecting the actuation *u*˜*(t)* onto the linear span of the synergies, and is zero only when the calculated actuation is an element of this span.

*Forward dynamics error*: measures the quality of the trajectory *q̃(t, b)*, obtained by applying the actuation Φ*(t)b* to the dynamical system (i.e., forward dynamics), with respect to the task constraints:

$$\text{err}_F = \sqrt{\sum_{k \in K} e_{FPk}^2 + e_{FVk}^2}$$

$$e_{FPk} = \|q_k - \tilde{q}(t_k, b)\|, \qquad e_{FVk} = \|\dot{q}_k - \dot{\tilde{q}}(t_k, b)\|, \tag{11}$$

$$K = \{0, v_1, \ldots, v_n, T\}.$$

Similarly to the interpolation error, err<sub>*F*</sub> is not a tracking error with respect to a desired trajectory, but a measure of the distance between *q̃(t, b)* and the points defining the tasks. Replacing *q̃*, *q̇̃*, *q<sub>k</sub>*, and *q̇<sub>k</sub>* with their corresponding end-effector values provides the *forward dynamics error of the end-effector*.

Note that the quantities err<sub>*I*</sub> and err<sub>*F*</sub> provide a cumulative evaluation of the DRD solution with respect to all the task-constraints. Mathematically, they represent the Euclidean distance between the DRD solution and the points characterizing the task. Since these errors are defined as a sum over quantities with different units, they can be hard to interpret from a physical point of view. To overcome this problem, we present our results in two ways. On one hand, we report them in terms of the error measures above, which provide a cumulative assessment that simplifies the explanation. On the other hand, we report them in terms of the quantities *e*<sub>IPk</sub>, *e*<sub>IVk</sub>, *e*<sub>FPk</sub>, *e*<sub>FVk</sub>, which represent interpolation and forward dynamics errors with respect to position and velocity constraints independently, and which are therefore amenable to a physical interpretation. These quantities are normalized by factors that provide a reference for the obtained results, defined in the next sections.

### **2.2. SYNTHESIS AND DEVELOPMENT OF SYNERGIES**

The synthesis of synergies is carried out in two phases: exploration and reduction. The exploration phase consists in actuating the system with an extensive set of motor signals Φ<sub>0</sub> to obtain the corresponding DRs Θ<sub>0</sub>. The reduction phase consists in solving a small set of tasks (which we call proto-tasks, each defined as a set of point constraints) in kinematic space, and then computing the corresponding actuations. The elements of the set Θ<sub>0</sub> are used to interpolate the proto-tasks as described in Equations (4) and (5); the obtained trajectories are taken as the elements of the reduced set Θ. Finally, the synergy set Φ is computed by applying relation (Equation 2), i.e., inverse dynamics, to these kinematic trajectories. As a result, there are as many synergies as proto-tasks (i.e., *N*<sub>φ</sub> = *N*<sub>θ</sub>).

In a nutshell, the synthesized synergies are the actuations solving the proto-tasks. A legitimate question is: "how do we choose the proto-tasks?" In principle, the DRD method does not impose any restriction. However, in order to obtain satisfactory performance, the synergies should be able to approximate the desired actuations. Since the control signals corresponding to similar tasks are likely to be characterized by similar features, a reasonable choice is that the proto-tasks belong to the class of the desired tasks (e.g., reversal, via-point reaching). In such a case, the synthesized synergies are actuations solving instances of the desired class of tasks, and therefore they embed the characteristic features of the desired control signals. Thus, we expect that appropriate linear combinations of these synergies are able to approximate the other actuations belonging to the desired set. In general, the more similar the proto-tasks are to the tasks to be solved (in terms of Equation 3), the better the performance of the corresponding synergies. Section 3.4 provides some examples and addresses these issues in detail.

Two other aspects that directly influence the quality of the synergy-based controller are the number of proto-tasks and their particular instances. To obtain good performance in a wide variety of tasks, the constraints defining the proto-tasks should cover relevant regions of the state space. Clearly, an increasing number of (different) proto-tasks corresponds to a gradual improvement of the overall performance. However, it also systematically expands the synergy set, thus affecting the dimensionality of the controller. In order to tackle this trade-off, we propose a procedure that parsimoniously adds a new proto-task only when and where it is needed: if the performance in a desired task is not satisfactory, we add a new proto-task in one of the regions of the state space with the highest projection error. In other words, the new proto-task is the task with the worst approximated actuation. Note that the procedure to evaluate the projection error over the entire workspace does not involve any actual task execution or forward dynamics integration, and is therefore computationally inexpensive.
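The incremental procedure can be sketched as a greedy loop. In the sketch below the DRD projection-error evaluation is replaced by a toy surrogate (the distance to the nearest proto-task target), an assumption made only to keep the example self-contained; the loop structure is the point:

```python
import numpy as np

def greedy_proto_tasks(candidates, err, tol, max_protos=10):
    """Greedy loop of section 2.2: repeatedly add the candidate task whose
    actuation is worst approximated (highest projection error) until every
    candidate falls below tolerance. `err(task, protos)` would be the DRD
    projection error; here it is any callable."""
    protos = [candidates[0]]
    while len(protos) < max_protos:
        errors = [err(c, protos) for c in candidates]
        worst = int(np.argmax(errors))
        if errors[worst] < tol:
            break
        protos.append(candidates[worst])
    return protos

# Toy surrogate (assumption, not the paper's model): the projection error
# grows with the distance to the closest proto-task target.
targets = [np.array(p) for p in [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5)]]
surrogate = lambda c, protos: min(np.linalg.norm(c - p) for p in protos)
chosen = greedy_proto_tasks(targets, surrogate, tol=0.6)
```

The loop mirrors Figures 2A and 3A: each new proto-task is placed where the current synergy set approximates the required actuation worst.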

# **3. RESULTS**

We apply the methodology described in section 2 to a simulated planar kinematic chain modeling a human arm [see Hollerbach and Flash (1982) for model details]. In the exploration phase, we employ an extensive set of motor signals Φ<sub>0</sub> to actuate the arm model and generate the corresponding dynamic responses Θ<sub>0</sub>. The nature of these signals has a marginal role and does not affect the quality of the obtained results (Alessandro et al., 2012; Carbajal, 2012). Here we use a set of 90 low-pass filtered uniformly random signals (Butterworth filter with cutoff frequency of 0.314 rad). We test the performance of the method on three classes of tasks: point-to-point (section 3.1), reversal (section 3.2), and via-point reaching (section 3.3).
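Exploration signals of this kind can be generated as follows; a single-pole IIR smoother stands in for the paper's Butterworth filter (an implementation assumption made to keep the sketch dependency-free), with `alpha` playing the role of the cutoff:

```python
import numpy as np

def exploration_signals(n_signals=90, n_steps=500, alpha=0.05, seed=0):
    """Low-pass filtered uniformly random torques for the exploration
    phase. A single-pole IIR smoother stands in for the Butterworth
    filter used in the paper; alpha sets the effective cutoff."""
    rng = np.random.default_rng(seed)
    raw = rng.uniform(-1.0, 1.0, size=(n_signals, n_steps))
    out = np.empty_like(raw)
    state = np.zeros(n_signals)
    for k in range(n_steps):
        state = (1 - alpha) * state + alpha * raw[:, k]
        out[:, k] = state
    return out

Phi0 = exploration_signals()   # one row per exploration actuation
```

Each row of `Phi0` would then be applied to the arm model (forward dynamics) to obtain the corresponding exploration DR in Θ<sub>0</sub>.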

### **3.1. POINT-TO-POINT TASKS**

A point-to-point reaching task consists in reaching a final state from an initial state in a given amount of time. Thus, a task instance is specified by four two-dimensional point constraints: initial and final joint angles and velocities. In this section we restrict our analysis to the subclass of tasks that are characterized by the initial position *q<sub>c</sub>* (red cross in **Figure 1**), and that impose initial and final velocities equal to zero, i.e., *q̇<sub>T</sub>* = *q̇*<sub>0</sub> = 0. The only unspecified constraints are the joint coordinates of the target; i.e., since the kinematic chain has two degrees of freedom (DoF), there are two free task-parameters. Essentially, the arm is required to start from the configuration *q<sub>c</sub>* and reach a desired target with zero velocity. Note that the velocity constraints are added just to restrict the class of desired tasks, and therefore to simplify the explanations throughout the paper. The method is mathematically general, and can also be used to solve tasks in which these constraints are not imposed.

**FIGURE 1 |** The kinematic chain and the testing targets; the configuration vector corresponding to the red cross is referred to as *q<sub>c</sub>*.

After the reduction phase the linear system in Equation (5) becomes:

$$\begin{pmatrix} q_c & \dots & q_c \\ \theta_1(T) & \dots & \theta_{N_\theta}(T) \\ 0 & \dots & 0 \\ 0 & \dots & 0 \end{pmatrix} a = \begin{pmatrix} q_c \\ q_T \\ 0 \\ 0 \end{pmatrix},\tag{12}$$

where **θ**<sub>*i*</sub> are the reduced DRs, and *q<sub>T</sub>* is the target posture (which uniquely defines a desired task instance, as *q<sub>c</sub>* is a fixed value). Since each block element is a two-dimensional column vector, the matrix contains four non-zero scalar rows; the first two of these consist of repetitions of the same numerical values (the components of *q<sub>c</sub>*). As a result, an exact kinematic solution is guaranteed if the rank of the alternant matrix is equal to 3; i.e., there should be at least three linearly independent columns. This poses a lower bound on the minimum required number of DRs, and therefore of synergies. However, a higher number of synergies might be necessary to achieve satisfactory approximations of the desired actuations, and ultimately to fulfill the task requirements.

Notice that in order to obtain the alternant matrix described in Equation (12), the proto-tasks should belong to the same class as the desired tasks (i.e., point-to-point, starting at *q<sub>c</sub>*). Additionally, the exploration DRs Θ<sub>0</sub> should be able to generate kinematic solutions that fulfill all the constraints of the proto-tasks (i.e., zero interpolation error). As shown by Carbajal (2012), for systems with non-linear dynamics this is likely to happen, as the 8-by-90 alternant matrix built from the exploration DRs most probably contains more than eight linearly independent columns. Thus any point-to-point task could be solved.

**Figure 2A** shows the distribution of the projection error for an increasing number of synergies, and exemplifies the proposed procedure to incrementally add new proto-tasks. Initially, two targets are chosen randomly (top left panel); subsequent targets are added in the regions characterized by higher projection error. As can be seen, the introduction of new proto-tasks leads to better performance over wider regions of the space, and eventually the actuations needed to solve any point-to-point task can be reasonably approximated (err<sub>*P*</sub> < 10<sup>−2</sup> Nm with seven synergies). The bottom right panel shows the distribution of the forward dynamics error of the end-effector obtained with seven proto-tasks. Comparing this panel with the bottom center one (projection error with seven proto-tasks), it can be seen that the forward dynamics error reproduces the distribution of the projection error, rendering the latter a good estimate of the relative forward performance across tasks. However, it is important to stress that, due to the non-linearity of the dynamical system, the projection error serves only as a heuristic estimate of the actual error made when executing the task.

**Figure 2B** shows the trend of the average projection error (across the targets distributed in the workspace) as a function of the number of proto-tasks. Depending on the precision required, more or fewer proto-tasks can be used. Here we employ seven proto-tasks to obtain an average projection error < 10<sup>−2</sup> Nm. This means that the actuations to solve any point-to-point task

**FIGURE 2 | Results of point-to-point tasks. (A)** Selection of proto-tasks based on projection error. Each panel shows the kinematic chain in its initial posture (straight segments), and the distribution of the projection error over the end-effector space (colored region). The color of each point indicates the projection error produced to reach a target in that position. The bottom right panel shows the distribution of the forward dynamics error of the end-effector using seven proto-tasks (seven synergies). **(B)** Average projection error (across targets distributed in the workspace) as a function of the number of proto-tasks. **(C)** Comparison between the synthesized synergies (filled circles) and subsets randomly selected from the exploration-actuations (box-plots). **(D)** Actuation that solves the task (continuous lines) and projected (dashed lines) torque, and interpolated (continuous lines) and executed (dashed lines) joint trajectories for the task with the highest projection error (i.e., target 11).

(starting at *q<sub>c</sub>*) can be approximated by combining only seven synergies. The average forward dynamics error err<sub>*F*</sub> using seven synergies amounts to ≈ 10<sup>−2</sup>. These results show that a set of "good" synergies can drastically reduce the dimensionality of the controller, while maintaining satisfactory performance. Note that the controller has to "choose" the values of two joint torques at each time step, thus its dimensionality is much higher than the number of DoF of the system (in fact, it is infinite-dimensional if we consider actuations as continuous vector-valued functions of time). Hence, seven synergies contribute a dimensionality reduction even if the system has two DoF (Alessandro et al., 2013).

To further demonstrate that the reduction phase is not trivial, we compare the errors resulting from the set of seven synthesized synergies with the errors corresponding to 100 random subsets of size seven drawn from the exploration signals. The testing point-to-point tasks are identified by the 13 targets depicted in **Figure 1**. **Figure 2C** shows that the errors of the random subsets (box-plots) are always orders of magnitude higher than the errors of the synergies resulting from the reduction phase (filled circles). The seven reduced DRs lead to an alternant matrix with rank equal to 3, therefore any point-to-point constraint vector of this class can be interpolated exactly. As a result, in contrast to the case of random DRs, the obtained interpolation error is negligible for all the testing tasks (err<sub>*I*</sub> ≈ 10<sup>−15</sup> ≈ 0). In terms of projection and forward dynamics errors, the reduced synergies perform about 2-3 orders of magnitude better than any random subset. Additionally, they lead to high task performance (forward dynamics errors in the range [10<sup>−3</sup>, 10<sup>−2</sup>]), while greatly reducing the dimensionality of the controller.

**Figure 2D** exemplifies these results for the testing task characterized by the highest projection error (target 11). The difference between the torque that solves the task, *ũ(t)* (continuous lines), and that obtained as a linear combination of synergies, Φ*(t)b* (dashed lines), is negligible. Similarly, there is negligible difference between the kinematic solution obtained as a linear combination of DRs (continuous lines) and the trajectory resulting from the projected actuation (dashed lines).

A more detailed evaluation of the obtained results is summarized in **Table 1**, which presents the normalized values of interpolation and forward dynamics errors for each task-constraint separately at the target points (i.e., *k* = *T*; see Equations 9 and 11). The errors in position (*e*<sub>IPT</sub> and *e*<sub>FPT</sub>) are normalized to ||*e<sub>PM</sub>*|| = 5.02 rad, where *e<sub>PM</sub>* is a vector containing the angular ranges of the two joints (thereby encoding the maximum possible position error); the errors in velocity (*e*<sub>IVT</sub> and *e*<sub>FVT</sub>) are normalized to ||*e<sub>VM</sub>*|| = 5.70 rad/s, where *e<sub>VM</sub>* contains the peak angular velocities of the two joints across the kinematic solutions to the 13 testing tasks. The very satisfactory maximum normalized values are 3.62 × 10<sup>−4</sup> (i.e., 0.0002 rad, task 12) for position and 5.13 × 10<sup>−3</sup> (0.03 rad/s, task 11) for velocity forward dynamics errors.

**Table 1 | Normalized interpolation (int) and forward dynamics (fwd. dyn.) errors for each task-constraint of the testing point-to-point tasks.**


*The normalization factors are* ||*e<sub>PM</sub>*|| = *5.02 rad and* ||*e<sub>VM</sub>*|| = *5.70 rad/s for position and velocity errors, respectively; the rationale behind these factors is discussed in section 3.1. The errors are evaluated at the time of the target constraint, T. The expressions pos and vel identify position and velocity constraints, respectively.*

# **3.2. REVERSAL TASKS**

A reversal task consists in reaching a desired target and coming back to the initial position. The tasks considered in this subsection are characterized by zero velocity at the time of the constraints, i.e., *q̇(*0*)* = *q̇(t<sub>v</sub>)* = *q̇(T)* = 0, and by the initial (and final) posture placed in the center of the operational space, i.e., *q(*0*)* = *q(T)* = *q<sub>c</sub>* (red cross in **Figure 1**). Thus, the only free task-parameters are the joint coordinates of the intermediate target (two parameters). In other words, the agent is required to reach a certain location with zero velocity (i.e., the via-point), and return to its initial posture. These reversal tasks are relevant as they resemble movements performed to carry objects to and from the agent, e.g., reaching for food and bringing it to the mouth, or picking up a salient object and moving it closer for examination.

After the reduction phase, the linear system of Equation (5) becomes:

$$\begin{pmatrix} q_c & \dots & q_c \\ \theta_1(t_v) & \dots & \theta_{N_\theta}(t_v) \\ q_c & \dots & q_c \\ 0 & \dots & 0 \\ 0 & \dots & 0 \\ 0 & \dots & 0 \end{pmatrix} a = \begin{pmatrix} q_c \\ q_v \\ q_c \\ 0 \\ 0 \\ 0 \end{pmatrix}, \tag{13}$$

where **θ**<sub>*i*</sub> are the reduced DRs, and *q<sub>v</sub>* is the intermediate desired position (which uniquely defines the specific task instance). By the same rationale as in section 3.1, to guarantee the existence of an exact kinematic solution for any reversal task belonging to this class, the rank of the alternant matrix, and therefore the minimal number of DRs, should be equal to 3. However, the number of synergies required to obtain satisfactory values of projection and forward dynamics errors might be higher.

As in the case of point-to-point movements, the proto-tasks belong to the same class as the desired tasks (i.e., reversal, *q*<sub>0</sub> = *q<sub>T</sub>* = *q<sub>c</sub>*), and they are added incrementally. Since the position of the desired intermediate target is the only unknown, each newly added proto-task is identified by placing the via-point in the region of the operational space with the highest projection error. As shown in **Figure 3A**, this strategy aims at decreasing the projection error over the entire configuration space, such that eventually the actuations necessary to solve any reversal task can be approximated satisfactorily. In particular, eight synergies are enough to obtain an average projection error err<sub>*P*</sub> < 10<sup>−2</sup> Nm (see **Figure 3B**, blue line), and an average forward dynamics error of ≈ 10<sup>−2</sup>.

The reduced synergies are compared to 100 subsets of eight actuations, randomly chosen from the exploration motor signals. The testing reversal tasks are identified by the 13 intermediate targets depicted in **Figure 1**. The results shown in **Figure 3C** provide additional evidence that the reduction phase identifies effective synergies: the mean errors of the random subsets (box-plots) are orders of magnitude higher than those corresponding to the reduced synergies (filled circles), and the forward dynamics errors lie in the range [10<sup>−3</sup>, 10<sup>−2</sup>], meaning that the 13 approximated actuations lead to good task performance. **Figure 3D** depicts the DRD solution of the task with the highest projection error (target 11). The difference between computed and projected torques, as well

**FIGURE 3 | Results of reversal tasks. (A)** Selection of proto-tasks based on projection error. Each panel shows the kinematic chain in its initial posture (straight segments), and the distribution of the projection error over the end-effector space (colored region). The color of each point indicates the projection error produced to reach that position and to go back to the initial posture. The bottom right panel shows the distribution of the forward dynamics error of the end-effector using eight proto-tasks (eight synergies). **(B)** Averaged projection error as a function of the number of proto-tasks for increasingly general classes of via-point tasks. The least general tasks are reversal motions (blue continuous line), characterized by two free task-parameters (i.e., configuration of the intermediate target). An increase in generality consists in fixing only the initial posture, while the intermediate target and the final position represent free task-parameters (red dotted line). Finally, the most general class (green dashed line) does not fix any posture (six free task-parameters). The number of synergies required to achieve the same error increases with the generality of the class of tasks. These results are discussed in section 3.4. **(C)** Evaluation of the reduction phase for the testing reversal tasks. Comparison between the synthesized synergies (filled circles) and subsets randomly selected from the exploration-actuations (box-plots). **(D)** Actuation that solves the task (continuous lines) and projected (dashed lines) torque, and interpolated (continuous lines) and executed (dashed lines) joint trajectories for the task with the highest projection error (i.e., target 11).

as the difference between computed and executed trajectories are negligible, showing the quality of the synthesized synergies.

The values of the normalized interpolation and forward dynamics errors for each task constraint are summarized in **Table 2**. The normalization factors, computed as in section 3.1, are ||*e<sub>PM</sub>*|| = 5.02 rad and ||*e<sub>VM</sub>*|| = 8.20 rad/s, for position and velocity errors, respectively. The maximum normalized values of the errors are 1 × 10<sup>−3</sup> (i.e., 0.005 rad, task 12, *k* = *T*) for position and 2.5 × 10<sup>−3</sup> (0.02 rad/s, task 11, *k* = *T*) for velocity forward dynamics errors.

**Table 2 | Normalized interpolation (int) and forward dynamics (fwd. dyn.) errors for each task-constraint of the testing reversal tasks.**

*The normalization factors are* ||*e<sub>PM</sub>*|| = *5.02 rad and* ||*e<sub>VM</sub>*|| = *8.20 rad/s for position and velocity errors, respectively. The errors are evaluated at the via-point (k* = *v) and at the final point (k* = *T). The expressions pos and vel identify position and velocity constraints, respectively.*

### *3.2.1. Concatenation of point-to-point actuations*

Reversal tasks are composed of two kinematically distinct phases: from the initial point to the target (center-out), and from the target back to the initial position (out-center). Therefore, it should be possible to generate suitable control signals by concatenating the actuations associated with the individual point-to-point tasks, each of which is solved by means of DRD. In the following we explore this possibility, and we compare the obtained solutions to the results of applying DRD to the entire reversal tasks.

In order to produce a meaningful solution from the concatenation, at the beginning of the out-center movement all the system variables (positions, velocities, and accelerations) should match the values obtained at the end of the center-out phase. This condition can be enforced by imposing additional constraints on the acceleration of the joints. Here we prescribe zero velocity and acceleration at the end of the center-out tasks, at the beginning of the out-center tasks, as well as at the target point of the reversal tasks. Clearly, any other value would represent an equally suitable choice. Additionally, we assign zero velocity at the beginning and at the end of the reversal movements. Formally, the tasks are defined as follows:

*Center-out*

$$\begin{aligned} q(0) &= q_c, & \dot{q}(0) &= 0, \\ q(t_v) &= q_v, & \dot{q}(t_v) &= 0, & \ddot{q}(t_v) &= 0 \end{aligned} \tag{14}$$

*Out-center*

$$\begin{aligned} q(t_v) &= q_v, & \dot{q}(t_v) &= 0, & \ddot{q}(t_v) &= 0, \\ q(T) &= q_c, & \dot{q}(T) &= 0 \end{aligned} \tag{15}$$

*Reversal*

$$\begin{aligned} q(0) &= q_c, & \dot{q}(0) &= 0, \\ q(t_v) &= q_v, & \dot{q}(t_v) &= 0, & \ddot{q}(t_v) &= 0, \\ q(T) &= q_c, & \dot{q}(T) &= 0. \end{aligned} \tag{16}$$
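The constraint sets above all share the same structure: a list of (time, derivative order, target value) triples per task phase. As a minimal illustrative sketch (this data layout is our own, not the paper's implementation), the reversal constraints of Equation (16) for a two-DoF chain could be assembled as:

```python
import numpy as np

def reversal_constraints(q_c, q_v, t_v, T):
    """Assemble the task constraints of Equation (16) for a reversal task.

    Each entry is (time, derivative order, target vector); order 0 = position,
    1 = velocity, 2 = acceleration.  q_c and q_v are joint-angle vectors.
    Hypothetical representation, for illustration only.
    """
    q_c = np.asarray(q_c, dtype=float)
    q_v = np.asarray(q_v, dtype=float)
    zero = np.zeros_like(q_c)
    return [
        (0.0, 0, q_c),   # q(0)      = q_c
        (0.0, 1, zero),  # dq(0)     = 0
        (t_v, 0, q_v),   # q(t_v)    = q_v
        (t_v, 1, zero),  # dq(t_v)   = 0
        (t_v, 2, zero),  # ddq(t_v)  = 0
        (T,   0, q_c),   # q(T)      = q_c
        (T,   1, zero),  # dq(T)     = 0
    ]

constraints = reversal_constraints(q_c=[0.5, 1.0], q_v=[1.2, 0.3], t_v=0.5, T=1.0)
# 7 vector constraints, each two-dimensional for a two-DoF chain
print(len(constraints) * 2)  # -> 14
```

Each vector constraint contributes one row per joint to the task-constraint vector that the alternant matrix must reproduce.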

The synthesis of the synergies for each class of tasks follows the same procedure described in section 2.2 and exemplified in **Figure 3A**. We choose the number of synergies for the point-to-point tasks (six synergies) and for the reversal tasks (seven synergies) in order to achieve comparable average projection errors across the 13 testing targets (0.011 for center-out, 0.014 for out-center, 0.016 for reversal tasks as computed by DRD, and 0.013 for the concatenation of DRD point-to-point solutions). The individual projection errors are depicted in **Figure 4A**. For targets 1–8, 10, and 13, the actuations provided by the concatenation of point-to-point DRD solutions are better suited than those computed by applying DRD to the entire tasks. However, the forward dynamics errors do not always follow the same relation (**Figure 4B**). For example, for targets 2–7 the entire DRD solution performs better than the concatenation of the point-to-point actuations, while the relation is kept for targets 1, 8, 10, 11, and 12. Although these results might seem counterintuitive, they can be explained by analyzing the forward dynamics errors of the single center-out and out-center tasks. When the error of the entire DRD reversal solution is lower than either of the point-to-point errors, the former solution is preferable to the concatenation-based trajectory (targets 2–7, 9, 11–13). On the other hand, when the forward errors of both point-to-point tasks are lower than the error of the entire reversal solution, concatenation seems to be the better strategy (targets 1, 8, 10). In most cases, the forward error of the concatenation, $\mathrm{err}_{F_{coc}}$, is close to the "sum" of the single point-to-point errors $\mathrm{err}_{F_{co}}$ and $\mathrm{err}_{F_{oc}}$. To conform to the definition of the error (see Equation 11), this sum is computed in quadrature as $\mathrm{err}_{F_{coc}} = \sqrt{\mathrm{err}_{F_{co}}^2 + \mathrm{err}_{F_{oc}}^2}$.
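Because the errors of Equation (11) are Euclidean norms, the "sum" of the two point-to-point forward errors combines in quadrature rather than linearly; a minimal sketch:

```python
import math

def concatenation_error(err_co, err_oc):
    """Combine the center-out and out-center forward errors in quadrature,
    consistent with errors defined as Euclidean norms."""
    return math.sqrt(err_co**2 + err_oc**2)

# A 3-4-5 check: quadrature gives 0.5, a linear sum would give 0.7.
print(round(concatenation_error(0.3, 0.4), 12))  # -> 0.5
```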

The relation between the forward error of the concatenation and the forward errors of the individual point-to-point DRD solutions is, in reality, far from trivial. The scenario is depicted schematically in **Figure 5**, where the red line represents a possible solution to a reversal task. Trivially, in the first part of the movement the trajectory obtained from the concatenation strategy (dashed line) corresponds to the DRD solution to the center-out task (dashed green). The actuation corresponding to the out-center task is then applied. Since the first submotion is affected by errors (i.e., the forward error of the center-out task, *eco(tv)*), the system does not lie in the initial conditions associated with the out-center task (yellow line). This initial error propagates over the course of the movement according to the dynamical properties of the system (dashed blue line), and affects the state at the end of the motion. The resulting final error *e*coc*(T)* is in general different from the forward error of the DRD out-center solution *eoc(T)*. As a result, the overall forward error of the concatenation can be higher (e.g., target 11) or lower (e.g., target 9) than the "sum" of the point-to-point errors. In theory, due to this effect, applying DRD to the entire task could lead to better performance than concatenating DRD point-to-point actuations even if the error of the entire solution is higher than both the center-out and out-center errors. Such a scenario is, however, unlikely if the error associated with the center-out task is very low (as in our examples).
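The propagation of an initial-state error through non-linear dynamics can be illustrated numerically. The sketch below uses a damped pendulum as a stand-in system (our own toy dynamics, not the paper's arm model): a small state error at the phase switch is reshaped by the dynamics, so the error at the end of the movement generally differs from the error that was injected.

```python
import numpy as np

def simulate(x0, dt=1e-3, T=1.0, k=9.81, c=0.5):
    """Euler-integrate a damped pendulum x'' = -k*sin(x) - c*x'.

    Stand-in dynamics to show how an error inherited from the first
    movement phase propagates to the end of the motion.
    """
    x, v = x0
    for _ in range(int(T / dt)):
        x, v = x + dt * v, v + dt * (-k * np.sin(x) - c * v)
    return np.array([x, v])

nominal = simulate((1.0, 0.0))
perturbed = simulate((1.0 + 0.01, 0.0))      # e_co(t_v): error from phase 1
final_error = np.linalg.norm(perturbed - nominal)
# The dynamics reshape the injected 0.01 error; the final magnitude
# depends on the system, not just on the initial perturbation.
print(final_error)
```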

In general terms, neither of the two methods is clearly better than the other; however, the following conclusions can be drawn. The concatenation-based solution accumulates the errors of the single movement phases. Furthermore, this strategy requires additional conditions on the kinematic variables to enforce compatibility between the two point-to-point trajectories. On the other hand, the application of DRD to the entire reversal task requires the definition of adequate proto-tasks. If these details are not available (i.e., the class of desired tasks is too general, see section 3.4), the concatenation method might be a viable alternative. **Table 3** summarizes the results of this and the next sections.

### **3.3. VIA-POINT REACHING**

In this section we show the performance of DRD in solving via-point reaching tasks. These motions require the agent to reach a desired final position, passing through a given via-point. Specifically, in this section we set the via-point to be the center of the operational space *q<sup>c</sup>* (red cross in **Figure 1**), and the initial, intermediate, and final velocities to be equal to zero. The joint-coordinates of the initial and final postures,

**Table 3 | Mean projection errors obtained for the testing instances of reversal and via-point reaching tasks using** *N***<sup>φ</sup> synergies.**


*See text for more details.*


*q*<sup>0</sup> and *qT*, represent the free task-parameters, as they can be chosen arbitrarily to instantiate specific tasks (four parameters). Finally, we prescribe zero acceleration at the via-point. As described in the previous section, this enables us to generate meaningful task solutions by concatenating the actuations corresponding to the two phases of the movement. Formally, the desired class of tasks can be described as follows:

$$\begin{aligned} q(0) &= q_0, & \dot{q}(0) &= 0, \\ q(t_v) &= q_c, & \dot{q}(t_v) &= 0, & \ddot{q}(t_v) &= 0, \\ q(T) &= q_T, & \dot{q}(T) &= 0. \end{aligned} \tag{17}$$

The synergies are synthesized as described in section 2.2. Since the parameters *q*<sup>0</sup> and *q<sup>T</sup>* can be chosen arbitrarily, the parameter space is four-dimensional. This condition does not affect the general procedure; i.e., proto-tasks are sequentially added at the point of the parameter space characterized by the highest projection error. **Figure 6A** depicts the average projection error (across the targets distributed in the parameter space) as a function of the number of synergies.
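The sequential proto-task addition described here is a greedy loop over the parameter space. The sketch below makes that loop explicit; `projection_error` is a hypothetical callback standing in for the paper's full projection-error computation, which we cannot reproduce here.

```python
import numpy as np

def add_proto_tasks(candidate_params, projection_error, n_synergies):
    """Greedy sketch of the reduction loop: repeatedly add a proto-task at the
    parameter-space point with the highest current projection error.

    `projection_error(theta, chosen)` returns the error at task parameters
    `theta` given the proto-tasks chosen so far (hypothetical stand-in).
    """
    chosen = []
    for _ in range(n_synergies):
        errors = [projection_error(theta, chosen) for theta in candidate_params]
        chosen.append(candidate_params[int(np.argmax(errors))])
    return chosen

# Toy check: with error = distance to the nearest chosen point, the loop
# spreads proto-tasks over the parameter space (farthest-point sampling).
grid = [np.array([x, y]) for x in np.linspace(0, 1, 5) for y in np.linspace(0, 1, 5)]
err = lambda t, chosen: min((np.linalg.norm(t - c) for c in chosen),
                            default=np.linalg.norm(t) + 1.0)
protos = add_proto_tasks(grid, err, 4)
print(len(protos))  # -> 4
```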

The synthesized synergies are tested on 18 tasks, the initial and final positions of which are drawn from the targets in **Figure 1**. **Figure 6B** reports the errors obtained by using 17 reduced synergies (upward green triangles), and the performance of 100 sets of size 17 drawn from the exploration signals (box-plots). The interpolation errors corresponding to the synthesized synergies are lower than, but comparable to, the mean errors of the random sets (≈10<sup>−14</sup>). This is not surprising, since 17 random signals are likely to produce an alternant matrix with full row rank, so any desired constraint vector can be obtained with negligible interpolation error. However, it is interesting to note that the information added by the reduction phase leads to lower interpolation errors. In terms of projection and forward dynamics errors, the synthesized synergies perform about two orders of magnitude better than the random signals, providing further evidence that the reduction phase is a valuable procedure. **Figure 6C** shows the DRD solution of the via-point reaching task with the highest projection error (starting at point 10 and arriving at point 5). Similarly to point-to-point and reversal movements, the difference between computed and projected actuations, and the difference between interpolated and executed trajectories, are negligible.

The detailed values of the normalized interpolation and forward dynamics errors are summarized in **Table 4**. Similarly to the position and velocity errors, the acceleration errors are defined as $e_{IA_k} = \|\ddot{q}_k - \ddot{q}(t_k; a)\|$ and $e_{FA_k} = \|\ddot{q}_k - \ddot{\tilde{q}}(t_k; b)\|$ (interpolation and forward dynamics, respectively). The normalization factors, computed as in section 3.1, are ||*ePM*|| = 5.02 rad and ||*eVM*|| = 7.05 rad/s for position and velocity errors, respectively; the errors in acceleration are normalized by ||*eAM*|| = 61.5 rad/s<sup>2</sup>, where *eAM* contains the peak angular accelerations of the two joints across the kinematic solutions to the testing tasks. The maximum normalized errors are 4.2 × 10<sup>−3</sup> (i.e., 0.021 rad, task 10-3, *k* = *T*) for position, 6.4 × 10<sup>−3</sup> (0.046 rad/s, task 13-1, *k* = *T*) for velocity, and 2.03 × 10<sup>−6</sup> (1.2 × 10<sup>−4</sup> rad/s<sup>2</sup>, task 2-8, *k* = *v*) for acceleration forward dynamics errors.

Finally, we compare the use of DRD for solving the entire tasks to the concatenation of individual DRD point-to-point solutions. In the same vein as the reversal tasks, the considered via-point reaching movements can be composed of an initial out-center motion (from *q*<sup>0</sup> to *qc*), followed by a center-out movement (from *q<sup>c</sup>* to *qT*). The number of synergies is chosen to obtain a comparable mean projection error across the 18 testing tasks. We used six synergies for both out-center and center-out tasks, and 17 synergies for via-point reaching, leading to the following average errors: 0.012 Nm for center-out, 0.014 Nm for out-center, 0.013 Nm for via-point reaching as solved by DRD, and 0.015 Nm for the concatenation. **Table 3** summarizes these results.

The yellow downward triangles in **Figure 6B** indicate the performance of the concatenation strategy. In line with the rationale in section 3.2.1, this method accumulates the errors of the sequential point-to-point solutions, resulting in higher values of the forward dynamics and interpolation errors. From the point of view of dimensionality reduction, however, the concatenation strategy might be convenient, as the number of synergies reduces from 17 to 12 (six for each movement phase) with a small loss of performance (see **Table 3**).

# **3.4. TASK GENERALITY AND NUMBER OF SYNERGIES**

The obtained results show that via-point reaching tasks require a higher number of synergies than reversal tasks. To achieve a mean projection error < 10<sup>−2</sup> Nm, via-point reaching needs at least 17 synergies, while the reversal tasks need at least 7. In this section, we provide a plausible interpretation of this difference, accompanied by additional results to support our rationale.

For the sake of clarity, let us first define the *generality* of a class of tasks as the number of its free task-parameters. As discussed above, the desired class of tasks can be defined by imposing certain values on the state variables and their derivatives. For example, the reversal tasks presented in section 3.2 impose zero velocities, and additionally fix the initial and final postures to a specific point of the configuration space, *qc*. Although they are essentially via-point tasks, each instance is defined only by the position of the desired intermediate target; thus the generality of this class of tasks is 2, as the target is specified by two values (i.e., its joint-coordinates). Via-point reaching tasks, as defined in section 3.3, fix the position of the via-point to *qc*, and impose initial, intermediate, and final velocities equal to zero; each task instance is therefore defined by the desired initial and final postures, so the generality of this class of tasks is 4.

The lower the generality of the desired class of tasks, the lower the variability of the control signals. This observation is exemplified in **Figure 7**, which shows the actuations associated with the reversal (panel **A**) and the via-point reaching testing tasks (panel **B**). As expected, the actuations in panel **A** are more regular than those in panel **B**. Quantitatively, the mean correlations between the (absolute values of the) control signals of the shoulder are 0.97 and 0.67 for reversal and via-point reaching, respectively, and the correlations between the actuations of the elbow are 0.70 and 0.53. The regularities observed in the first phase of the via-point reaching movements are simply due to the fact that groups of testing tasks share the same initial position (see the abscissa labels of **Figure 6B**). If this were not the case, the corresponding mean correlation values would be even lower.
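A statistic of this kind, the mean pairwise correlation of the absolute control signals of one joint across tasks, could be computed as sketched below. The exact definition used in the paper is not given, so this form is an assumption; the toy check only verifies the qualitative claim that more regular signal sets score higher.

```python
import numpy as np

def mean_abs_correlation(signals):
    """Mean pairwise Pearson correlation of the absolute control signals.

    `signals` is an (n_tasks, n_samples) array of one joint's actuation
    across tasks.  Assumed form of the statistic, for illustration.
    """
    a = np.abs(np.asarray(signals, dtype=float))
    r = np.corrcoef(a)                    # (n_tasks, n_tasks) correlation matrix
    iu = np.triu_indices_from(r, k=1)     # upper triangle: count each pair once
    return float(r[iu].mean())

t = np.linspace(0, 1, 200)
regular = [np.sin(2 * np.pi * t + 0.1 * i) for i in range(5)]    # near-identical
irregular = [np.sin(2 * np.pi * (i + 1) * t) for i in range(5)]  # varied shapes
print(mean_abs_correlation(regular) > mean_abs_correlation(irregular))  # -> True
```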

The number of required synergies is strictly related to the previous observations. Since the proto-tasks belong to the desired class of tasks (see section 2.2), the reduced synergies are elements of the set of desired actuations. If the desired control signals are characterized by a low degree of variability (e.g., the reversal case), their essential features can be captured by a handful of elements. Otherwise, a higher number of synergies is required.

To further test the validity of our rationale, we consider three increasingly general classes of tasks. The first class (a) consists of the reversal tasks described in section 3.2, in which the only free task-parameters are the joint-coordinates of the via-point. The second one (b) fixes only the initial position, while the via-point and the final posture can be chosen arbitrarily. Finally, the third class of tasks (c) does not impose any fixed posture. **Figure 3B** shows the trends of the average projection errors as a function of the number of synergies for the three cases (blue continuous, red dotted, and green dashed lines, respectively). As expected, the number of synergies needed to obtain a certain degree of performance increases with the generality of the class of tasks. The projection error is meaningful only if the kinematic solution fulfills the task constraints; thus the trends in **Figure 3B** should be considered starting from the minimum number of proto-tasks that guarantees this condition (i.e., three, five, and six synergies). The oscillations observed for smaller numbers of synergies can therefore be ignored, as they are not representative of task performance.

The effectiveness of the reduction phase is strictly related to the generality of the desired class of tasks. Very general classes lead to weakly correlated control signals. Thus, the reduction phase becomes less useful, and the synthesized synergies will embed regularities that are solely due to the dynamics of the system. Additionally, in order to obtain good performance in all the desired tasks, a large number of synergies will be required. As a direct consequence, the performance of the synthesized synergies will approach the performance of generic actuations. To illustrate this concept we compare the synergies synthesized for each of the previous classes of tasks with random sets of exploration actuations. The latter control signals are not generated through the process of reduction, and are therefore not expected to embed any information about the tasks to be solved. We choose the minimum number of synergies that guarantees a mean projection error < 10<sup>−2</sup> Nm, i.e., 8, 18, and 24 for classes (a), (b), and (c), respectively (see **Figure 3B**). Then we use these groups of synergies to solve the 13 reversal testing tasks. **Figure 8** depicts the


**Table 4 | Normalized interpolation (int) and forward dynamics (fwd. dyn.) errors for each task-constraint of the testing via-point reaching tasks.**

*The normalization factors are* ||*ePM*|| = *5.02 rad,* ||*eVM*|| = *7.05 rad/s, and* ||*eAM*|| = *61.5 rad/s<sup>2</sup> for position, velocity, and acceleration errors, respectively. The errors are evaluated at the via-point (k* = *v) and at the final point (k* = *T). The expressions pos, vel, and acc identify position, velocity, and acceleration constraints, respectively.*

difference between the mean projection errors obtained by using the random sets, *eri*, and the projection errors corresponding to the three sets of synergies, *esi* (i.e., *Ii* = *eri* − *esi* for each class *i*). As expected, this difference decreases for increasingly general tasks.

# **4. DISCUSSION**

We performed an analysis of the muscle synergy hypothesis from a computational perspective, i.e., the control of a planar kinematic chain through linear combinations of a limited set of torque profiles (motor synergies). We proposed DRD as a tool to generate appropriate synergy-based controllers and to synthesize an effective set of synergies; this tool was tested on point-to-point and via-point tasks. DRD generates a kinematic solution by combining the dynamic responses of the synergies, and employs inverse dynamics to compute the corresponding actuation; this control signal is finally approximated by a linear combination of synergies. The problem of finding a kinematic solution is therefore reduced to a simple interpolation, and the associated combination of synergies is obtained by projection. The quality of the obtained controller (and ultimately the task performance) depends on the set of synergies used.
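The data flow just summarized can be outlined schematically. In the sketch below, all callbacks (`interpolate`, `inverse_dynamics`, `project`) are hypothetical stand-ins for the paper's operators, shown only to make the three-step structure of DRD explicit; the toy check uses trivially linear stand-ins.

```python
import numpy as np

def drd_solve(synergies, dynamic_responses, task_constraints,
              interpolate, inverse_dynamics, project):
    """Schematic outline of the DRD pipeline (illustrative, not the paper's code)."""
    # 1. Kinematic solution: combine the DRs to satisfy the task constraints
    #    (a linear interpolation problem in the DR mixing weights a).
    a = interpolate(dynamic_responses, task_constraints)
    trajectory = sum(ai * dr for ai, dr in zip(a, dynamic_responses))
    # 2. Actuation: inverse dynamics of the interpolated trajectory.
    u = inverse_dynamics(trajectory)
    # 3. Synergy weights: project the actuation onto the synergy set
    #    (steps 2-3 realize the non-linear map F: a -> b).
    b = project(u, synergies)
    return trajectory, b

# Toy check with linear stand-ins: DRs are basis signals, dynamics is identity.
drs = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
traj, b = drd_solve(
    synergies=np.eye(2),
    dynamic_responses=drs,
    task_constraints=np.array([2.0, 3.0]),
    interpolate=lambda d, c: c,                  # weights equal constraints here
    inverse_dynamics=lambda x: x,                # identity "dynamics"
    project=lambda u, s: np.linalg.lstsq(s, u, rcond=None)[0],
)
print(traj, b)  # traj = [2. 3.], b = [2. 3.]
```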

Although our approach involves many assumptions and simplifications, we believe that it highlights important theoretical aspects of the muscle synergy hypothesis. First, we have provided direct evidence for the possibility of controlling a non-linear dynamical system by linear combinations of a parsimonious set of basic actuations; this scheme can result in good performance across many instances of the desired class of tasks. Hence, we support the paradigm of muscle synergies as a possible CNS principle to simplify motor control and learning. Furthermore, our results suggest that, in order to realize an effective and low-dimensional controller, synergies should embed features of the system dynamics and of the desired class of tasks. Within DRD, the information on the system dynamics is captured by the DRs (i.e., the trajectories of the system variables under the actuation of each synergy), and that on the desired class of tasks is obtained by means of the reduction procedure (i.e., by solving a representative set of proto-tasks). The beneficial effect of this approach is visible from two perspectives: at the kinematic level, it leads to an alternant matrix that can generate the desired constraint vectors (see Equations 12 and 13); at the actuation level, it provides samples of the desired control signals (see **Figure 7**). As a result, the obtained synergies outperform hundreds of arbitrary choices of basic controllers taken from the exploration motor signals.

The number of synergies required to achieve a given performance depends on the generality of the desired class of tasks (i.e., the number of free task-parameters); general tasks (e.g., via-point reaching) require more synergies than highly specific ones (e.g., reversal). These considerations further confirm that synergies are strictly tailored to the class of tasks to be solved. The mathematical formulation of DRD shows a clear non-linear relationship between kinematic and actuation modularity that is directly intertwined with the dynamics of the system. Our analysis of the concatenation of synergy-based controllers to solve via-point tasks is directly related to the notion of kinematic primitives (Flash et al., 1992), and it represents a control scheme that, for the

**FIGURE 7 | Actuations corresponding to the testing reversal and via-point reaching tasks.** Since the via-point task is more general, the corresponding control signals **(right)** are less correlated than the reversal ones **(left)**. This is particularly visible in the second phase of the movement (after the dashed vertical line that marks the time of the via-point). See text for more details and for the values of the correlation.

first time, integrates this form of modularity with muscle synergies. The obtained results show that the concatenation method accumulates the errors of the individual submotions. On the other hand, the application of DRD to the entire via-point task requires the definition of well-specified proto-tasks. If the class of tasks is too general, concatenation could be a viable strategy to keep the number of synergies low (see **Table 3**).

The usage of a kinematic chain rather than a muscle-driven skeletal model is a simplification of our work that is worth discussing. This simplification implies the definition of control signals (and therefore synergies) in the space of joint torques, not in muscle activation space. In a musculoskeletal system, the non-linear relation between torques and kinematic variables is complemented by the additional non-linear dynamics that translates muscle activations into joint torques, so the total mapping between muscle activations and kinematic variables is non-trivial. The two chained non-linearities might either compensate each other, resulting in overall milder non-linearities, or combine into an even stronger one. A more detailed model could also bring into play other effects, for example the preflex dynamical properties of muscles, which might themselves correct mild external disturbances, stabilizing the overall system. In any case, our mathematical framework aims at capturing the fundamental theoretical problem behind the muscle synergy hypothesis, i.e., the possibility of controlling the output variables of a non-linear dynamical system (kinematic chain or musculoskeletal model) by means of a linear input strategy (linear combination of torque or muscle synergies). Thus, although muscle synergies may emerge from the interaction between neural as well as biomechanical constraints (Ting and McKay, 2007), we believe that the findings of this work (see section 4.1) are qualitatively valid also for realistic musculoskeletal models. Nevertheless, quantitative details such as the obtained number of synergies and their waveforms are strongly intertwined with the dynamical system used. We intend to evaluate DRD on more biologically plausible systems in future developments of our work. In what follows we discuss our work in relation to the current debate on muscle synergies (sections 4.1 and 4.2), and to the field of robotics (section 4.3).

# **4.1. COMPUTATIONAL INSIGHTS ON THE MUSCLE SYNERGY HYPOTHESIS**

Many studies in experimental neuroscience analyze the validity of the muscle synergy hypothesis solely in terms of the accuracy in approximating recorded EMG signals (d'Avella et al., 2003; d'Avella and Bizzi, 2005; Torres-Oviedo and Ting, 2007; Cheung et al., 2009a; Torres-Oviedo and Ting, 2010). This measure is equivalent to our projection error, and it does not explicitly quantify the quality of the synergy-based controller. The introduction of complementary measures, similar to the forward dynamics error, would provide a direct evaluation of task performance, and could therefore shed new light on the hypothetical modularity of the CNS (Alessandro et al., 2013; Delis et al., 2013).

In this vein, some researchers introduced the concept of functional synergies, i.e., the components of an extended dataset that includes muscle activations as well as measurements of task variables (e.g., joint angles, end-limb force) (Torres-Oviedo et al., 2006; Chvatal et al., 2011). As a result, each component consists of two elements: a pattern of muscle contractions and the corresponding evolution of the task variables. Such an approach is not too different from the idea behind DRD: synergies are associated with their DRs (i.e., biomechanical functionalities), which are linearly combined to obtain the kinematic solution of the task. However, the identification of functional synergies by means of non-negative matrix factorization (NMF) implies that muscle synergies and their biomechanical functionalities are scaled by the same coefficients. This contrasts with our theoretical results, which show a non-linear relationship (the mapping *F*, see Equation 8) between the mixing weights of the synergies and those of the DRs. Ideally, one should go beyond the use of NMF, and develop novel techniques that do not impose a linear mapping between the two sets of coefficients.

The mapping *F* points out a fundamental non-linear relationship between kinematic and actuation modularity. More generally, this result applies to any groups of variables that are related to each other by a non-linear differential operator like *D* (e.g., kinematic and muscle variables, muscle excitation and activation, neural and muscle activation). However, linear forms of modularity have been investigated both at the kinematic (Berret et al., 2009) and at the muscle activation level (d'Avella et al., 2006). Our result suggests that these modularities cannot coexist; i.e., if one level of variables is constrained to a linear set (e.g., the kinematic variables in our work), the other level of variables can at most be approximated linearly, as it intrinsically belongs to a non-linear space (e.g., torque). Alternatively, additional processes might linearize the system dynamics, as suggested by Berniker et al. (2009) and Nori (2005).

The fact that synergies and DRs are related through the dynamics of the system has another important implication. Since the former are feasible kinematic solutions to the proto-tasks, the obtained synergies can always be realized as actuations. The same cannot be said, in general, for synergies identified from numerical analyses of biomechanical data. Though some studies have verified the feasibility of the extracted synergies as actuations (Neptune et al., 2009; McGowan et al., 2010; Allen and Neptune, 2012), biomechanical constraints are not explicitly included in the extraction algorithms. Additionally, Equation (2) provides an automatic way to cope with smooth variations of the agent morphology. That is, both the synergies and their dynamic responses evolve together with the body. In line with Nori (2005), these considerations highlight the importance of the body in the hypothetical modularization of the CNS.

The mathematical formulation of DRD, and in particular the system of linear equations (5), shows a clear relation between the minimum number of synergies and the difficulty of the task. To guarantee the existence of a kinematic solution, the alternant matrix should have full row rank. In other words, the minimum number of proto-tasks, and therefore of synergies, should correspond to the dimensionality of the task-constraint vector. For a two-DoF kinematic chain, general via-point tasks consist of three position and three velocity constraints (each of them two-dimensional); thus, at least 12 DRs are required to be able to solve any task in kinematic space. A highly specified class of tasks reduces the minimum number of required synergies. For example, point-to-point and reversal tasks, which are characterized by two free task-parameters (i.e., the location of the target), require three DRs (instead of 12); for via-point reaching this number increases to five (see section 3). Note that these bounds are based solely on kinematic considerations; since the dynamical system is non-linear, they do not necessarily guarantee low values of the projection and forward dynamics errors. In fact, as shown in section 3, the number of synergies required to obtain satisfactory performance is certainly higher than the theoretical kinematic-based estimate. However, this number still follows the principle that more general tasks require a higher number of synergies (see **Figure 3B** and section 3.4).
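The kinematic lower bound described here is simple counting arithmetic, sketched below for the two-DoF case discussed in the text:

```python
def min_drs(n_dof, n_position, n_velocity, n_acceleration=0):
    """Kinematic lower bound on the number of DRs: the alternant matrix must
    have full row rank, so it needs at least as many columns (DRs) as there
    are scalar task constraints."""
    return n_dof * (n_position + n_velocity + n_acceleration)

# General via-point task on a two-DoF chain: 3 position + 3 velocity
# constraints, each two-dimensional.
print(min_drs(2, 3, 3))  # -> 12
```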

Our method to synthesize synergies might be interpreted from a developmental perspective. Initially, the agent explores its sensory-motor system employing a variety of actuations. Later, it attempts to solve the first tasks (proto-tasks), perhaps obtaining weak performance, as the exploration phase may not have produced enough responses yet (see the box-plots in **Figures 2C**, **3C**, and **6B**). If the agent finds an acceptable solution to a proto-task, that solution is used to generate a new synergy (populating the synergy set); otherwise it continues with the exploration. Failure to solve tasks that are important for its survival could motivate the agent to include additional proto-tasks; **Figures 2A**, **3A** illustrate this mechanism. The development of the synergy set incrementally improves the overall abilities of the agent. Alternatively, existing proto-tasks could be modified. To be clear, we are not arguing that this procedure resembles the mechanisms involved in the motor development of biological organisms. It is, however, interesting that our procedure facilitates the autonomous generation of new synergies, and the possible adaptation of the existing ones to cope with changes in the body dynamics (see Equation 2). These features are in line with the recent findings by Dominici et al. (2011). An alternative strategy for synergy development (not implemented in this paper) might be the concatenation of movement chunks. If the agent has already developed the synergies to solve point-to-point tasks, via-point proto-tasks could be solved by the concatenation of point-to-point actuations. As shown in **Figures 4B**, **6B**, the results might not be as good as if the solution were computed *ad hoc* (i.e., for the entire via-point proto-tasks). However, inspired by Sosnik et al. (2004) and Rohrer et al. (2004), one could imagine that such solutions improve with practice, eventually leading to appropriate via-point modules.

The concatenation of point-to-point control signals to solve via-point tasks is based on the observation that movements can be composed of sequences of kinematic strokes, or submovements. The relation between this form of planning modularity (Morasso and Mussa-Ivaldi, 1982) and muscle synergies is still under debate. Possibly, as implemented in our formulation, each kinematic stroke translates into a combination of time-varying synergies, and the final movement plan therefore corresponds to a sequence of mixing patterns. This strategy would be in line with the hypothesis of an intermittent controller that sequentially initiates discrete movement primitives (Fishbach et al., 2005; Loram et al., 2010; Squeri et al., 2010; Karniel, 2013). Submovements might be combined in time succession (Soechting and Terzuolo, 1987; Meyer et al., 1988), or based on the vectorial summation of overlapping preplanned trajectories (Flash and Henis, 1991; Henis and Flash, 1995; Novak et al., 2003; Roitman et al., 2004; Pasalar et al., 2005). In this manuscript we have exemplified the former approach; the analysis of the latter by means of DRD is non-trivial, and it is therefore left for future work. As shown by recent experimental studies (d'Avella et al., 2011), such a strategy might enable reusing the synergies underlying point-to-point kinematic trajectories to generate the more complex trajectories involved in reaching toward a jumping target. Finally, it is important to notice that the kinematic solution to a via-point task appears to be composed of different movement chunks even when it is obtained from a single composition of highly specified synergies. This observation supports the idea that strokes could simply emerge from trajectory optimization (Sternad and Schaal, 1999), or even be data analysis artifacts.

Our work analyzes the theoretical aspects, rather than the implementation details, of the muscle synergy hypothesis. As such DRD does not represent a model of the neural substrates involved in muscle synergies, and we do not claim that DRD is somehow implemented within the CNS. In fact, the biological mechanisms involved in muscle synergies are probably very different from the mathematical techniques used in this paper. For example, in our method synergies can be obtained simply by computing the solution to the proto-tasks; on the contrary, the biological process of synergy development is very likely to be incremental, and it spans several years of development (Dominici et al., 2011). However, some of the functionalities of DRD are not biologically implausible. The computation of a kinematic solution to a task (see Equation 8) can be regarded as a form of kinematic planning, and can be performed by means of a recurrent neural network (Cichocki and Unbehauen, 1992a,b) that computes the DRs mixing weights *a*. Interestingly, DRD suggests that, although muscle synergies are defined at the motor command level (i.e., muscle activation), they could also be related to kinematic planning, and that the planning process might be carried out by exploiting knowledge of the system dynamics (in our framework embedded in the DRs). The non-linear function *F* is a mapping between two finite dimensional sets of variables (the DR weights, expression of the planned trajectory, and the synergy weighting coefficients *b*), therefore it can be encoded by means of a feedforward neural network. 
Conceptually, this function represents the neural pathways between the cortical areas related to planning (Buneo and Andersen, 2006) and the neural substrate where synergies are supposedly located; the outputs of this function represent the descending neural commands that modulate synergy recruitment (Ivanenko et al., 2003; Torres-Oviedo et al., 2006; Ting, 2007; Ting and McKay, 2007; Torres-Oviedo and Ting, 2010). As a matter of fact, *F* is a compact form of an inverse dynamical model. Thus, its hypothetical neural implementation may involve the primary motor cortex (M1), which is known to be related to dynamical features of movements (and in particular to inverse dynamics) (Kalaska, 2009), and the cerebellum, which is supposedly involved in the neural representation of internal models (Kawato, 1999; Diedrichsen et al., 2005; Bursztyn et al., 2006). These considerations are supported by the recent hypothesis that muscle synergies might be organized both at the spinal (Hart and Giszter, 2010) and at the cortical level (Overduin et al., 2012); their spatial structure might derive from divergent corticospinal connectivity or from spinally organized modules, and their temporal characteristics might originate from the dynamics of the recurrent connections of the motor cortex (d'Avella et al., 2006).

### **4.2. COMPARISON WITH OTHER COMPUTATIONAL STUDIES**

While many studies try to validate or falsify the hypothesis of muscle synergies, only a few researchers have focused on developing and testing control architectures based on this concept. Some of these works aim at proposing novel techniques for robot control; others intend to analyze the hypothesized modularity from a computational point of view. Our work falls into the second category; in this section we briefly compare it to similar contributions, in particular to those studies that provide a possible interpretation of muscle synergies. The reader is referred to Alessandro et al. (2013) for a more comprehensive review.

Inspired by the original work of Mussa-Ivaldi (1997), Nori and Frezza (2005) developed a control architecture for nonlinear systems based on the idea of spinal force fields (Giszter et al., 1993; Mussa-Ivaldi et al., 1994; Mussa-Ivaldi and Bizzi, 2000; Nori, 2005). Relying on the technique of feedback linearization, the method yields a set of synergies that is able to generate a complete repertoire of movements (i.e., the system can reach any arbitrary state in an arbitrary amount of time). Thus, the authors interpreted muscle synergies as a basis of the entire control action space. Berniker et al. (2009) define synergies as the smallest set of input vectors that influences the output of a reduced-order model of the agent while minimally restricting the commands useful to solve the desired tasks. Practically, this set is found by optimizing the synergies against a representative dataset of desired sensory-motor signals. Similarly, Todorov and Ghahramani (2003) employ an unsupervised learning procedure to identify muscle synergies from a collection of sensory-motor data, obtained by actuating the robot with random signals. Their work proposes that synergies are a constituent part of an inverse model of the sensory-motor system. Another interpretation is given by Marques et al. (2012), who suggest that synergies solely reflect the biomechanical constraints of the agent.

Similar computational approaches have also been used to test whether a given model of muscle synergies (or, more generally, a primitive-based controller) can reproduce experimental observations. The comparison between simulated and experimental data is often performed both at the kinematic and at the muscle activation level. Furthermore, the role of biomechanical constraints is explicitly taken into account; hence, employing biologically plausible models of the musculoskeletal apparatus becomes necessary. Kargo et al. (2010) have demonstrated that the model of premotor drives accounts for the kinematic trajectories and the isometric force fields observed in frog wiping reflexive behaviors (Kargo and Giszter, 2008). In particular, they have shown that realistic wiping trajectories can be obtained simply by modulating the amplitudes and phase-shifts of the activation pulses, without altering the muscle activation balance of each synchronous synergy. Similar studies have been carried out in the context of human walking (Neptune et al., 2009; McGowan et al., 2010; Allen and Neptune, 2012) and balancing in cats (McKay and Ting, 2008, 2012).

Unlike all those studies, the work presented herein does not aim at reproducing experimental data; rather, it provides a theoretical investigation of motor synergies. As discussed in section 3.4, our work suggests that synergies can be obtained by solving well-defined control problems. Similar ideas have already been proposed (Chhabra and Jacobs, 2008; Todorov, 2009; Alessandro and Nori, 2012; Thomas and Barto, 2012), but these studies do not investigate which class of problems is best suited for this purpose. In this manuscript we show that these problems (i.e., proto-tasks) should belong to the same class as the desired tasks; this leads to a compact set of synergies that captures features of both the system dynamics and the desired class of tasks, and therefore results in good task performance. Additionally, we show a clear relation between the number of synergies and two characteristics of the task: generality (i.e., the number of free task parameters) and difficulty (i.e., the number of constraints). Further, we propose a possible integration scheme between kinematic strokes and muscle synergies; to the best of our knowledge, no other synthetic study has tested this idea.

### **4.3. THE DRD METHOD AND ITS RELEVANCE TO ROBOTICS**

In robotics, an active field of research focuses on novel mechanisms to generate trajectories (e.g., kinematic patterns or motor commands) and to learn their representations from given samples. The frameworks of Dynamic Movement Primitives (DMPs) (Ijspeert et al., 2013) and the Stable Estimator of Dynamical Systems (SEDS) (Khansari-Zadeh and Billard, 2011) have recently received particular attention for their stability and invariance properties. Both methods encode desired trajectories in the attractor landscapes of appropriately tuned autonomous dynamical systems. While DMPs achieve this by modifying the dynamics of a well-known system by means of a learned forcing term, SEDS employs Gaussian mixture models (GMMs) to identify the desired attractor landscape from scratch. DRD, too, can be interpreted as a method to generate kinematic trajectories and control signals: the former are obtained by linearly combining the DRs (i.e., the kinematic solutions to the proto-tasks), and the latter by linearly combining the synergies (i.e., the projections of the actuations that solve the proto-tasks onto the synergy-span). A quantitative comparison between our method and dynamical system-based architectures is outside the scope of this paper; however, the following considerations can be made.

DMPs and SEDS employ advanced machine-learning techniques to learn a representation of externally provided desired trajectories (e.g., via imitation learning). In contrast, DRD is not limited to representing task solutions: it also provides a strategy to generate them autonomously (i.e., planning). Given a set of constraints defining the task, DRD finds both a kinematic solution by interpolation and the corresponding actuation by projection. As a result, no desired trajectory has to be provided externally, nor is any complex learning procedure required; instead, simple algebraic operations are used to solve the control problem. These features rest mainly on two non-trivial results: (1) the dynamic responses of non-linear systems are good basis functions with which to build interpolant trajectories, and (2) the actuations solving the proto-tasks (i.e., the synergies) span a representative set of control signals.
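The two algebraic operations just described (interpolation at the kinematic level, recombination at the actuation level) can be illustrated with a minimal numerical sketch. The basis trajectories below are random placeholders rather than true dynamic responses, and the mapping between DR weights and synergy weights is taken to be the identity, which holds only for linear dynamics; both are simplifying assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_t, n_dr = 100, 5                            # time samples, number of DRs
DRs = rng.standard_normal((n_t, n_dr))        # columns: kinematic proto-task solutions
synergies = rng.standard_normal((n_t, n_dr))  # columns: corresponding actuations

# Task constraints: prescribed positions at a few time indices
# (e.g., start, via-point, end).
idx = np.array([0, 49, 99])
targets = np.array([0.0, 1.0, 0.5])

# Kinematic planning by interpolation: find mixing weights a such that the
# combination of DRs satisfies the point constraints (least squares).
a, *_ = np.linalg.lstsq(DRs[idx, :], targets, rcond=None)
q = DRs @ a                                   # planned trajectory

# Actuation by projection: for this toy linear case the same weights
# recombine the synergies into the control signal (b = a).
u = synergies @ a

print(np.allclose(q[idx], targets))           # constraints satisfied -> True
```

With more DRs than constraints the system is underdetermined and `lstsq` returns the minimum-norm mixing weights; in the nonlinear setting of the paper, the synergy weights *b* would instead be obtained by evaluating the nonlinear map *F* on *a*.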

In terms of generalization, the spatial invariance property of DMPs can be exploited to generate only scaled versions of the learned movement kinematics (e.g., point-to-point reaching and reversal tasks). This is not the case for combinations of DRs, which, for example, can generate via-point reaching movements that share the same initial and intermediate points but have different targets (see section 3.3). This kind of generalization could be obtained by shaping the dynamics of the DMPs by means of appropriate basis functions that capture common features of the desired tasks (Rückert and d'Avella, 2013). This idea is similar in spirit to solving proto-tasks; however, it requires a computationally intensive learning phase compared to our method for synthesizing synergies. The same drawback applies to SEDS. Furthermore, synergies embed essential features of the desired control signals, and therefore, unlike DMPs and SEDS, DRD can also generalize at the actuation level.

The main disadvantage of DRD is its explicit time-indexing; as a result, it does not provide an easy strategy to modulate the speed of a given movement, and it leads to controllers that are not robust to temporal perturbations. Moreover, at the current stage DRD does not provide a provably stable controller, a feature offered by both DMP and SEDS. These drawbacks could be avoided by encoding synergies and DRs by means of DMPs. In a similar vein, techniques based on mixtures of DMPs have recently been proposed to improve generalization; outstanding results have been obtained, but each primitive has to be learned by demonstration (Muelling et al., 2010). With DRD, such primitives could instead be self-generated by solving proto-tasks, and then translated into dynamical systems. In conclusion, DRD and DMP could be combined into a unified technique that inherits the advantages of both approaches, rendering the two methods complementary rather than competitive.

In the DRD method, once the task is solved in kinematic space, the corresponding actuation can be computed using the explicit inverse dynamical model of the system (i.e., the differential operator *D*). It might therefore appear that there is no particular advantage in projecting this solution onto the linear span of the synergy set. However, the differential operator might be unknown or affected by errors; this is very often the case in robotics, where learning inverse models is still a hot topic of research (Nguyen-Tuong and Peters, 2011). A synergy-based controller would enable computing the appropriate actuation by evaluating the mapping *F* on the vector *a*, hence obtaining the synergy combinators *b*. Since *F* is a mapping between two low-dimensional vector spaces, estimating it may turn out to be easier than estimating the differential operator *D*. To estimate *F*, the input–output data generated during the exploration phase (i.e., the dynamic responses and the corresponding actuations) could be used as a learning data-set. The obtained relation could be instrumental in estimating a first guess of the synergy set; *F* and the synergy set could then be iteratively refined until convergence. Further work is required to test these ideas.
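The proposed estimation of *F* amounts to a standard supervised regression between two low-dimensional weight spaces. In the sketch below the "true" quadratic map is a made-up stand-in for the unknown *F*, and the polynomial feature set is an arbitrary modeling choice; the point is only that fitting a map between a few weight coordinates is a far smaller problem than identifying the full inverse dynamics.

```python
import numpy as np

rng = np.random.default_rng(1)

# Exploration data: pairs of DR weights a and synergy weights b. The true
# map F is unknown to the learner; here we simulate it as a mild nonlinearity.
A = rng.uniform(-1, 1, size=(200, 2))
B = np.column_stack([A[:, 0] + 0.3 * A[:, 1]**2,
                     A[:, 1] - 0.2 * A[:, 0] * A[:, 1]])

def features(A):
    """Quadratic polynomial features of the 2-D input."""
    a1, a2 = A[:, 0], A[:, 1]
    return np.column_stack([np.ones(len(A)), a1, a2, a1**2, a1 * a2, a2**2])

# Fit F by linear least squares in feature space.
W, *_ = np.linalg.lstsq(features(A), B, rcond=None)

# Evaluate on held-out weight vectors.
A_test = rng.uniform(-1, 1, size=(50, 2))
B_true = np.column_stack([A_test[:, 0] + 0.3 * A_test[:, 1]**2,
                          A_test[:, 1] - 0.2 * A_test[:, 0] * A_test[:, 1]])
err = np.abs(features(A_test) @ W - B_true).max()
print(err < 1e-6)  # quadratic features capture this toy F exactly -> True
```

In practice the true *F* would not lie exactly in the chosen feature span, so the fit would be approximate; a feedforward neural network, as suggested in the discussion above, is one way to make the function class richer.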

# **5. CONCLUSIONS**

The current work analyzes the hypothesis of muscle synergies from a computational perspective, i.e., the control of a planar kinematic chain through linear combinations of a limited set of torque profiles (motor synergies). The proposed DRD method is able to generate effective synergies, greatly reducing the dimensionality of the problem while maintaining a good performance level. In order to obtain good performance across a variety of task instances, synergies should capture the essential features of the tasks to be solved and take the system dynamics into account. The number of required synergies increases with the generality of the desired class of tasks. Nevertheless, to keep the number of synergies low, solutions to general tasks can be obtained by concatenating the synergy-based controllers associated with simple point-to-point movements, with limited degradation of task performance. Overall, our work serves as a proof of concept for the notion of muscle synergies, showing that linear combinations of actuation modules can be used to control a non-linear dynamical system. This paper highlights the advantages and limitations of this approach, and draws attention to important aspects that are not easily accessible in experimental studies.

Future developments of this research point in several directions. The relation between muscle synergies and kinematic submovements will be investigated further; in particular, we will analyze the idea of overlapping point-to-point strokes (Flash et al., 1992). Another interesting line of investigation is the validation of our method against biological data, paving the way toward a predictive model of the muscle synergy hypothesis. To this end, a first step will be the evaluation of DRD on realistic musculoskeletal models. From the theoretical point of view, we are currently studying the mathematical properties of the synergies synthesized by means of the reduction procedure. Finally, we plan to tackle the challenge of learning the mapping between kinematic and synergy coefficients.

The software used to produce all the results reported in this paper is available as a GNU Octave package under a free and open-source license<sup>1</sup>. The reader is encouraged to download the package, test it, report bugs, and submit improvements to the algorithm.

### **ACKNOWLEDGMENTS**

The research leading to these results has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013, Challenge 2: Cognitive Systems, Interaction, Robotics) under grant agreement No. 248311 (AMARSi), and from the EU project RobotDoC under grant agreement No. 235065 of the 7th Framework Programme (Marie Curie Action ITN).

# **AUTHOR CONTRIBUTIONS**

Cristiano Alessandro performed all numerical simulations and data analyses. Cristiano Alessandro and Juan Pablo Carbajal worked on the implementation of the algorithm. The DRD method was born during Juan Pablo Carbajal's visit to Andrea d'Avella's laboratory. Andrea d'Avella provided material support for this development and countless conceptual inputs. All three authors have contributed to the creation of the manuscript.

### **REFERENCES**


<sup>1</sup>http://users.elis.ugent.be/~jcarbaja/DRD/drd.html




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 09 July 2013; accepted: 21 December 2013; published online: 16 January 2014.*

*Citation: Alessandro C, Carbajal JP and d'Avella A (2014) A computational analysis of motor synergies by dynamic response decomposition. Front. Comput. Neurosci. 7:191. doi: 10.3389/fncom.2013.00191*

*This article was submitted to the journal Frontiers in Computational Neuroscience. Copyright © 2014 Alessandro, Carbajal and d'Avella. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Learned graphical models for probabilistic planning provide a new class of movement primitives

#### *Elmar A. Rückert<sup>1</sup>\*, Gerhard Neumann<sup>1</sup>, Marc Toussaint<sup>2</sup> and Wolfgang Maass<sup>1</sup>*

*<sup>1</sup> Institute for Theoretical Computer Science, Graz University of Technology, Austria*

*<sup>2</sup> Department of Computer Science, Freie Universität Berlin, Germany*

### *Edited by:*

*Andrea D'Avella, IRCCS Fondazione Santa Lucia, Italy*

### *Reviewed by:*

*Cees Van Leeuwen, Katholieke Universiteit Leuven, Belgium*

*Andrey Olypher, Emory University, USA*

*Petar Kormushev, Italian Institute of Technology, Italy*

### *\*Correspondence:*

*Elmar A. Rückert, Institute for Theoretical Computer Science, Graz University of Technology, Inffeldgasse 16b/1, 8010 Graz, Austria. e-mail: rueckert@igi.tugraz.at*

Biological movement generation combines three interesting aspects: its modular organization in movement primitives (MPs), its characteristics of stochastic optimality under perturbations, and its efficiency in terms of learning. A common approach to motor skill learning is to endow the primitives with dynamical systems, where the parameters of the primitive indirectly define the shape of a reference trajectory. We propose an alternative MP representation based on probabilistic inference in learned graphical models, with new and interesting properties that comply with salient features of biological movement control. Instead of endowing the primitives with dynamical systems, we propose to endow MPs with an intrinsic probabilistic planning system, integrating the power of stochastic optimal control (SOC) methods within a MP. The parameterization of the primitive is a graphical model that represents the dynamics and the intrinsic cost function such that inference in this graphical model yields the control policy. We parameterize the intrinsic cost function using task-relevant features, such as the importance of passing through certain via-points. The system dynamics as well as the intrinsic cost function parameters are learned in a reinforcement learning (RL) setting. We evaluate our approach on a complex 4-link balancing task. Our experiments show that our movement representation facilitates learning significantly and leads to better generalization to new task settings without re-learning.

**Keywords: movement primitives, motor planning, reinforcement learning, optimal control, graphical models**

# **1. INTRODUCTION**

Efficient motor skill learning in redundant stochastic systems is of fundamental interest both for understanding biological motor systems and for applications in robotics.

Let us first discuss three aspects of human and animal movement generation, the combination of which is the motivation for our approach: (1) its modular organization in terms of movement primitives, (2) its variability and behavior under perturbations, and (3) the efficiency in *learning* such movement strategies.

First, concerning the movement primitives (MPs) in biological motor systems: the musculoskeletal apparatus is a high-dimensional redundant stochastic system and has many more degrees of freedom (DoF) than needed to perform a specific action (Bernstein, 1967). A classical hypothesis is that such redundancy is resolved by a combination of only a small number of functional units, namely MPs (d'Avella et al., 2003; Bizzi et al., 2008; d'Avella and Pai, 2010). In other terms, MPs can be understood as compact parameterizations of elementary movements, which allow for an efficient abstraction of the high-dimensional continuous action space. This abstraction has been shown to facilitate learning of complex movement skills (d'Avella et al., 2003; Schaal et al., 2003; Neumann et al., 2009).

A second important aspect of biological movement is the characteristic variability of motor behavior under perturbations or stochasticity. If humans perform the same task several times, the resulting movement trajectories vary considerably. Stochastic optimal control (SOC), besides its high relevance in engineering problems, has proven an excellent computational theory of this effect (Todorov and Jordan, 2002; Trommershauser et al., 2005). An implication of SOC, the *minimum intervention principle*, states that one should only intervene in the system if it is necessary to fulfill the given task; if the task constraints are not violated, it is inefficient to suppress the inherent noise of the stochastic system. The fact that biological movements conform to such principles suggests that SOC principles are involved at the lowest level of movement generation.

These biological perspectives suggest that the third aspect, efficient motor skill learning, is facilitated by this combination of MPs with low-level SOC principles. While existing MP methods have demonstrated efficient learning of complex movement skills (d'Avella et al., 2003; Schaal et al., 2003; Neumann et al., 2009), they lack an integration of SOC principles *within* MPs. Instead, in current approaches the parameters of the MP compactly determine the shape of the desired trajectory, either directly or indirectly; this trajectory is then followed by feedback control laws. An example of an indirect trajectory parameterization is the widely used Dynamic Movement Primitives (DMPs) (Schaal et al., 2003), which use parameterized dynamical systems to determine a movement trajectory. The DMP idea of endowing MPs with an intrinsic dynamical system has several benefits: it provides a linear policy parameterization which can be used for imitation learning and policy search (Kober and Peters, 2011); the complexity of the trajectory can be scaled by the number of parameters (Schaal et al., 2003); and one can adapt meta-parameters of the movement such as the movement speed or the goal state (Pastor et al., 2009; Kober et al., 2010). Further, the dynamical system of a DMP is to some degree reactive to perturbations, since the time progression of the canonical system adapts to joint errors, thereby de- or accelerating the movement execution as needed (Ijspeert and Schaal, 2003; Schaal et al., 2003). However, the trajectory shape itself is fixed and non-reactive to the environment.

In our approach we aim to go beyond MPs that parameterize a fixed reference trajectory and instead truly integrate SOC principles within the MP. The general idea is to endow MPs with an intrinsic probabilistic planning system instead of an intrinsic dynamical system. Such a *Planning Movement Primitive* (PMP) can react to the environment by optimizing the trajectory for the specific current situation. The intrinsic probabilistic planning system is described as a graphical model that represents the SOC problem (Kappen et al., 2009; Toussaint, 2009). Training such a MP therefore amounts to learning a graphical model such that inference in the learned graphical model will generate an appropriate policy. This has several implications. First, this approach implies a different level of generalization compared to a dynamical system that generates a fixed (temporally flexible) reference trajectory. For instance, if the end-effector target changes between training and testing phase, an intrinsic planning system will generalize to the new target without retraining, whereas a system that directly encodes a trajectory would either have to be retrained or adapted by heuristics (Pastor et al., 2009). Second, this approach truly integrates SOC principles within the MP: the resulting policy follows the minimum intervention principle and is more compliant than a feedback controller that aims to follow a reference trajectory.

As with DMPs, a PMP is trained in a standard RL setting. Instead of parameterizing the shape of the trajectory directly, a PMP has parameters that determine the cost function of the intrinsic SOC system. While the reward function (typically) gives a single scalar reward for a whole movement, the learned intrinsic cost function is in the standard SOC form and defines task and control costs for every time-step of the movement. In other terms, training a PMP means learning, from a sparse reward signal, an intrinsic cost function such that the SOC system will, with high probability, generate rewarded movements. In parallel with this learning of an intrinsic cost function, a PMP also exploits the data to learn an approximate model of the system dynamics, which is used by the intrinsic SOC system. Therefore, PMP learning combines model-based and model-free RL: it learns a model of the system dynamics while at the same time training PMP parameters based on the reward signal; it does not learn an approximate model of the reward function itself. We can exploit supervised learning methods such as Vijayakumar et al. (2005) and Nguyen-Tuong et al. (2008a,b) for learning the system dynamics and at the same time use policy search methods to adapt the PMP parameters that determine the intrinsic cost function. This two-fold learning strategy has the promising property of fully exploiting the data by also estimating the system dynamics instead of only adapting policy parameters.

As mentioned above, our approach is to represent the intrinsic SOC system as a graphical model, building on the recent work on Approximate Inference Control (AICO), (Toussaint, 2009). AICO generates the movement by performing inference in the graphical model that is defined by the system dynamics and the intrinsic cost function. Since we learn both from experience, all conditional probability distributions of this graphical model are learned in the RL setting. The output of the planner is a linear feedback controller for each time slice.

Our experiments show that by using task-relevant features we can significantly facilitate learning and generalization of complex movement skills. Moreover, due to the intrinsic SOC planner, our MP representation implements the principles of optimal control, which allows learning high-quality solutions that are not representable with traditional trajectory-based methods.

In the following section we review in more detail related previous work and the background on which our methods build. Section 3 then introduces the proposed PMP. In section 4 we evaluate the system on a one-dimensional via-point task and a complex dynamic humanoid balancing task and compare to DMPs. We conclude with a discussion in section 5.

# **2. RELATED WORK AND BACKGROUND**

We review here the related work based on parameterized movement policies, policy search methods and SOC.

# **2.1. PARAMETERIZED MOVEMENT POLICIES**

MPs are a parametric description of elementary movements (d'Avella et al., 2003; Schaal et al., 2003; Neumann et al., 2009). We will denote the parameter vector of a MP by **θ** and the possibly stochastic policy of the primitive as π(**u**|**x**,*t*; **θ**), where **u** is the applied action and **x** denotes the state. The key idea of the term "primitive" is that several of these elementary movements can be combined not only sequentially but also simultaneously in time. However, in this paper, we want to concentrate on the parameterization of a single MP. Thus we only learn a single elementary movement. Using several MPs simultaneously is part of future work for our approach as well as for existing approaches such as Schaal et al. (2003) and Neumann et al. (2009).

Many types of MPs can be found in the literature. The currently most widely used movement representation for robot control is the DMP (Schaal et al., 2003). DMPs evaluate parameterized dynamical systems to generate trajectories. The dynamical system is constructed such that it is stable: a linear dynamical system is modulated by a learnable non-linear function *f*. A great advantage of the DMP approach is that the function *f* depends linearly on the parameters **θ** of the MP: *f*(*s*) = **φ**(*s*)<sup>*T*</sup>**θ**, where *s* is the time or phase variable and **φ**(*s*) is a vector of basis functions. As a result, imitation learning for DMPs is straightforward, as it can simply be done by linear regression (Schaal et al., 2003). Furthermore, this linear parameterization allows the use of many well-established RL methods such as policy gradient methods (Peters and Schaal, 2008) or Policy Improvement with Path Integrals (PI<sup>2</sup>) (Theodorou et al., 2010). The complexity of the trajectory can be scaled by the number of features used for modeling *f*. We can also adapt meta-parameters of the movement such as the movement speed or the goal state (Pastor et al., 2009; Kober et al., 2010). However, as the features **φ**(*s*) are fixed, the ability of the approach to extract task-relevant features is limited. Moreover, the change of the desired trajectory due to a change of the meta-parameters is based on heuristics and does not consider task-relevant constraints. While the dynamical system of a DMP is to some degree reactive to the environment (namely by adapting the time progression of the canonical system depending on joint errors and thereby de- or accelerating the movement execution as needed; Ijspeert and Schaal, 2003; Schaal et al., 2003), the trajectory shape itself is fixed and non-reactive to the environment. As DMPs are the most common movement representation, we will use them as a baseline in our experiments.
A more detailed discussion of the DMP approach can be found in the Appendix.
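A minimal one-dimensional sketch may help make the DMP structure concrete: a stable spring-damper system pulled toward a goal *g*, modulated by a forcing term *f*(*s*) = **φ**(*s*)<sup>*T*</sup>**θ** that is gated by an exponentially decaying phase variable *s*. The gains, basis placement, and Euler integration below are simplified illustrative choices, not the exact formulation of Schaal et al. (2003).

```python
import numpy as np

def dmp_rollout(theta, g, y0=0.0, tau=1.0, dt=0.001,
                alpha=25.0, beta=6.25, alpha_s=4.0):
    """Roll out a 1-D discrete DMP: a spring-damper toward goal g, modulated
    by a forcing term f(s) = phi(s)^T theta that vanishes as the phase s decays."""
    n = len(theta)
    centers = np.exp(-alpha_s * np.linspace(0, 1, n))  # basis centers in phase space
    widths = 1.0 / np.diff(centers, append=centers[-1] * 0.5) ** 2
    y, yd, s = y0, 0.0, 1.0
    traj = [y]
    for _ in range(int(tau / dt)):
        psi = np.exp(-widths * (s - centers) ** 2)     # Gaussian basis activations
        f = s * (psi @ theta) / (psi.sum() + 1e-10)    # forcing term, gated by s
        ydd = alpha * (beta * (g - y) - yd) + f        # stable transformation system
        yd += ydd * dt / tau
        y += yd * dt / tau
        s += -alpha_s * s * dt / tau                   # canonical (phase) system
        traj.append(y)
    return np.array(traj)

# With theta = 0 the forcing term vanishes and the system simply converges to g;
# learning theta (e.g., by linear regression) shapes the path taken to the goal.
traj = dmp_rollout(theta=np.zeros(10), g=1.0)
print(abs(traj[-1] - 1.0) < 1e-2)  # converges to the goal -> True
```

Because the rollout is linear in **θ** through *f*, imitation learning reduces to regressing the demonstrated accelerations onto the basis activations, which is the property exploited by the step-based RL methods discussed below.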

Another type of movement representation was introduced in Neumann et al. (2009) with the movement template framework. Movement templates are temporally extended, parameterized actions, such as sigmoidal torque, velocity, or joint position profiles, which can be sequenced in time. This approach uses a more complex parameterization than the DMPs; for example, it also incorporates the duration of different phases, such as an acceleration or deceleration phase. The division of a movement into single phases allows the use of RL methods to learn how to sequence these primitives. However, as the approach still directly specifies the shape of the trajectory, defining complex movements for high-dimensional systems remains complicated, which has restricted the use of movement templates to rather simple applications.

An interesting movement representation arising from the analysis of biological data are muscle synergies (d'Avella et al., 2003; Bizzi et al., 2008). They have been used to provide a compact representation of electromyographic muscle activation patterns. The key idea of this approach is that muscle activation patterns are linear sums of simpler, elemental patterns, called muscle synergies; each synergy can be shifted in time and scaled with a linear factor to construct the whole activation pattern. While the synergy approach has promising properties, such as linear superposition and the ability to share synergies between tasks, these MPs have, except for some smaller applications (Chhabra and Jacobs, 2006), only been used for data analysis and not for robot control.

All the MPs presented so far are inherently local approaches: the specified trajectory, and hence the resulting policy, are only valid in a local (typically small) neighborhood of the initial state. In a new situation, it is likely that the parameters of the MP need to be re-estimated. For these approaches, the generation of the reference trajectory is often an offline process that does not incorporate knowledge of the system dynamics or proprioceptive and other sensory feedback. Because the reference trajectory is usually created without any knowledge of the system model, the desired trajectory might not be applicable, and thus the real trajectory of the robot might differ considerably from the specified one.

There are only a few movement representations that can also be used globally, i.e., for many different initial states of the system. One such method is the Stable Estimator of Dynamical Systems approach (Khansari-Zadeh and Billard, 2011). However, this method has so far only been applied to imitation learning; using it to learn or improve new movement skills is not straightforward. We will therefore restrict our discussion to local movement representations.

Our PMP approach is, similar to the DMPs, a local approach. In a different situation, different abstract goals and features might be necessary to achieve a given task. However, as we extract task-relevant features and use them as parameters, the same parameters can be used in different situations as long as the task-relevant features do not change. As we will show, the valid region in which the local MPs can still be applied is much larger for the given control tasks than for trajectory-based methods.

# **2.2. POLICY SEARCH FOR MOVEMENT PRIMITIVES**

Let **x** denote the state and **u** the control vector. A trajectory τ is defined as a sequence of state-control pairs, τ = (**x**1:*T*, **u**1:*T*−1), where *T* is the length of the trajectory. Each trajectory has associated costs *C*(τ) (denoted as extrinsic cost), which can be an arbitrary function of the trajectory. It can, but need not, be composed of a sum of intermediate costs along the trajectory. For example, it could be based on the minimum distance to a given point throughout the trajectory. We want to find an MP's parameter vector **θ**<sup>∗</sup> = argmin<sub>**θ**</sub> *J*(**θ**) which minimizes the expected costs *J*(**θ**) = E[*C*(τ)|**θ**]. We assume that we can evaluate the expected costs *J*(**θ**) for a given parameter vector **θ** by performing roll-outs on the real system.
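As a minimal illustrative sketch (not the authors' implementation; the toy system, the names `rollout` and `extrinsic_cost`, and all constants are assumptions), the expected cost *J*(**θ**) can be estimated by Monte Carlo averaging of *C*(τ) over several noisy roll-outs:

```python
import numpy as np

def evaluate_expected_cost(theta, rollout, extrinsic_cost, n_rollouts=20, rng=None):
    """Estimate J(theta) = E[C(tau) | theta] by averaging the extrinsic
    cost C over several noisy roll-outs of the (simulated) system."""
    rng = np.random.default_rng(rng)
    costs = [extrinsic_cost(rollout(theta, rng)) for _ in range(n_rollouts)]
    return float(np.mean(costs))

# Hypothetical toy system: theta is a target position, a roll-out drifts
# toward it under noise, and C(tau) penalizes the squared distance of the
# final state to 1.0.
def rollout(theta, rng, T=50):
    x = 0.0
    traj = []
    for _ in range(T):
        x += 0.1 * (theta - x) + 0.01 * rng.standard_normal()
        traj.append(x)
    return np.array(traj)

extrinsic_cost = lambda tau: (tau[-1] - 1.0) ** 2
```

Averaging over roll-outs is what makes the noisy evaluation usable by episode-based policy search.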

In order to find **θ**<sup>∗</sup> we can apply policy search methods. A huge variety of possible methods exists. Policy search methods can be coarsely divided into step-based exploration and episode-based exploration approaches. Step-based exploration approaches such as Peters and Schaal (2008), Kober and Peters (2011), and Theodorou et al. (2010) apply exploration noise to the action of the agent at each time-step of the episode. Subsequently, the policy is updated such that the (noisy) trajectories with higher reward are more likely to be repeated. In order to do this update, step-based exploration techniques strictly rely on a policy which is linear in its parameters, which is true for the DMPs (Schaal et al., 2003). Currently, the most common policy search methods are step-based approaches, including the REINFORCE (Williams, 1992), the episodic Natural Actor Critic (Peters and Schaal, 2008), the PoWER (Kober and Peters, 2011), and the PI<sup>2</sup> (Theodorou et al., 2010) algorithms. This also partially explains the popularity of the DMP approach for motor skill learning, because DMPs are, of the representations introduced above, the only one which can be used with these step-based exploration methods (apart from very simple ones like linear controllers).

However, research has recently intensified on episode-based exploration techniques that make no assumptions about the specific form of the policy (Hansen et al., 2003; Wierstra et al., 2008; Sehnke et al., 2010). These methods directly perturb the policy parameters **θ** and then estimate the performance of the perturbed parameters by performing roll-outs on the real system. During the episode no additional exploration is applied (i.e., a deterministic policy is used). The policy parameters are then updated in the estimated direction of increasing performance. Thus, these exploration methods do not depend on a specific parameterization of the policy. In addition, they allow the use of second order stochastic search methods that estimate correlations between policy parameters (Hansen et al., 2003; Wierstra et al., 2008; Heidrich-Meisner and Igel, 2009b). This ability to apply correlated exploration in parameter space is often beneficial in comparison to the uncorrelated exploration applied by all step-based exploration methods, as we will demonstrate in the experimental section.

# **2.3. STOCHASTIC OPTIMAL CONTROL AND PROBABILISTIC INFERENCE FOR PLANNING**

SOC methods such as Todorov and Li (2005), Kappen (2007), and Toussaint (2009) have been shown to be powerful methods for movement planning in high-dimensional robotic systems. The incremental Linear Quadratic Gaussian (iLQG) algorithm (Todorov and Li, 2005) is one of the most commonly used SOC algorithms. It uses Taylor expansions of the system dynamics and cost function to convert the non-linear control problem into a system with Linear dynamics, Quadratic costs and Gaussian noise (LQG). The algorithm is iterative—the Taylor expansions are recalculated at the newly estimated optimal trajectory for the LQG system.

In Toussaint (2009), the SOC problem has been reformulated as an inference problem in a graphical model, resulting in the AICO algorithm. The graphical model is given by a simple dynamic Bayesian network with states **x***t*, actions **u***<sup>t</sup>* and task variables **g**[*i*] (representing the costs) as nodes, see **Figure 1**. The dynamic Bayesian network is fully specified by conditional distributions encoded by the cost function and by the state transition model. If the beliefs in the graphical model are approximated as Gaussians, the resulting algorithm is very similar to iLQG. Gaussian message passing iteratively re-approximates local costs and transitions as LQG around the current mode of the belief within a time slice. A difference to iLQG is that AICO uses forward messages instead of a forward roll-out to determine the point of local LQG approximation, and can iterate the belief re-approximation within a time slice until convergence, which may lead to faster overall convergence. For a more detailed discussion of the AICO algorithm with Gaussian message passing see section 3.5.

Local planners have the advantage that they can be applied to high-dimensional dynamical systems, but the disadvantage of requiring a suitable initialization. Global planners (Kuffner and LaValle, 2000), on the other hand, do not require an initial solution, but have much higher computational demands. Our motivation for using only a local planner as a component of a PMP is related to the learning of an intrinsic cost function.

Existing planning approaches for robotics typically use handcrafted intrinsic cost functions, and the dynamic model is either analytically given or learned from data (Mitrovic et al., 2010). PMPs use RL to train an intrinsic cost function for planning instead of trying to learn a model of the extrinsic reward directly. The reason is that a *local* planner often fails to solve realistically complex tasks by directly optimizing the extrinsic cost function. From this perspective, PMPs learn to translate complex tasks into a simpler intrinsic cost function that can efficiently be optimized by a local planner. This learning is done by trial-and-error in the RL setting: the PMP essentially learns from experience which intrinsic cost functions the local planner can cope with and uses them to generate good trajectories. Thereby, the RL of the intrinsic cost function can compensate for the limitations of the local planner.

# **3. MATERIALS AND METHODS**

In this section we introduce the proposed PMPs, in particular the parameterization of the intrinsic cost function. The overall system will combine three components: (1) a regression method for learning the system dynamics, (2) a policy search method for finding the PMP parameters, and (3) a SOC planner for generating movements with the learned model and PMP parameters.

# **3.1. PROBLEM DEFINITION**

We consider an unknown dynamical system of the form

$$\mathbf{x}\_{t+1} = f\_{\text{Dyn}}(\mathbf{u}\_t, \mathbf{x}\_t) + \boldsymbol{\epsilon}\_t,\tag{1}$$

with state variable **x***t*, controls **u***<sup>t</sup>* and Gaussian noise ε*<sub>t</sub>* ∼ *N*(0, **σ**). The agent has to realize a control policy π : **x***<sup>t</sup>* → **u***t*, which in our case will be a linear feedback controller for each time slice. The problem is to find a policy that minimizes the expected costs of a finite-horizon episodic task. That is, we assume there exists a cost function *C*(τ), where τ = (**x**1:*T*, **u**1:*T*−1) is a roll-out of the agent controlling the system. We do not assume that the cost function *C*(τ) is analytically known or that it can be written as a sum over individual costs for each time-step, i.e., *C*(τ) = Σ<sub>*t*</sub> *h<sub>t</sub>*(**x***t*, **u***t*). This would imply an enormous credit assignment problem with separate costs at each time-step. Thus, more generally, we only get a single scalar reward *C*(τ) for the whole trajectory. The problem is to find argmin<sub>π</sub> E[*C*(τ)|π].
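The roll-out of a per-time-slice linear feedback policy on the stochastic system of Equation (1) can be sketched as follows (an illustrative fragment, not the authors' code; the point-mass dynamics `f_dyn`, the gains in `pd_policy`, and the noise model are assumptions):

```python
import numpy as np

def rollout_policy(f_dyn, policy, x1, T, noise_std, rng):
    """Roll out a per-time-step feedback policy on the stochastic system
    x_{t+1} = f_dyn(u_t, x_t) + eps_t, returning tau = (x_{1:T}, u_{1:T-1})."""
    xs, us = [np.asarray(x1, float)], []
    for t in range(T - 1):
        u = policy(xs[-1], t)
        eps = noise_std * rng.standard_normal(len(xs[-1]))
        xs.append(f_dyn(u, xs[-1]) + eps)
        us.append(u)
    return np.array(xs), np.array(us)

# Illustrative point mass (position, velocity) with acceleration control.
def f_dyn(u, x, dt=0.01):
    return np.array([x[0] + dt * x[1], x[1] + dt * u])

# A simple hand-tuned stabilizing feedback law, just for demonstration.
pd_policy = lambda x, t: -10.0 * x[0] - 5.0 * x[1]
```

The returned pair `(xs, us)` is the trajectory τ on which the single scalar cost *C*(τ) is evaluated.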

The system dynamics *f*Dyn as well as the cost function *C*(τ) are analytically unknown. Concerning the system dynamics, we can estimate an approximate model from a set of roll-outs, as is standard in model-based RL. Concerning the costs, however, we only receive the single scalar cost *C*(τ) after a roll-out, indicating the quality or success of a movement. Note that *C*(τ) is a function of the whole trajectory, not only of the final state. Learning *C* from data would be an enormous task, more complex than learning an immediate reward function **x***<sup>t</sup>* → **r***<sup>t</sup>* as in standard model-based RL, where **r***<sup>t</sup>* denotes the reward at time *t*.

Generally, approaches that learn *C*(τ) directly in a form useful for applying SOC methods seem overly complex and violate the maxim "never try to solve a problem more complex than the original." Therefore, our approach will not try to learn *C*(τ) from data but will employ RL to learn *some* intrinsic cost function that can efficiently be optimized by SOC methods and that generates control policies which empirically minimize *C*(τ).

### **3.2. PARAMETERIZATION OF PMP'S INTRINSIC COST FUNCTION**

In PMPs the parameters **θ** encode task-relevant abstract goals or features of the movement, which specify an intrinsic cost function

$$L(\tau; \boldsymbol{\theta}) := \sum\_{t=0}^{T} l(\mathbf{x}\_t, \mathbf{u}\_t, t; \boldsymbol{\theta}) + c\_p(\mathbf{x}\_t, \mathbf{u}\_t), \tag{2}$$

where *l* denotes the intermediate intrinsic cost function for every time-step and *cp*(**x***t*, **u***t*) is used to represent basic known task constraints, such as torque or joint limits. We will assume that such basic task constraints are part of our prior knowledge, thus *cp* is given and not included in our parameterization. For the description of PMPs we will neglect the constraints *cp* for simplicity. We will use a via-point representation for the intermediate intrinsic cost function *l*(**x***t*, **u***t*, *t*; **θ**). Therefore, parameter learning corresponds to extracting goals which are required to achieve a given task, such as passing through a via-point at a given time. As pointed out in the previous section, *L*(τ; **θ**) is *not* meant to approximate *C*(τ). It provides a feasible cost function that empirically generates policies that minimize *C*(τ).

There are many ways to parameterize the intermediate intrinsic cost function *l*. In this paper we choose a simple via-point approach. However, in an ongoing study we additionally implemented a desired energy state of a pendulum on a cart, which simplifies the learning problem. The movement is decomposed into *N* shorter phases with durations *d*[*i*] , *i* = 1, .., *N*. In each phase the cost function is assumed to be quadratic in the state and control vectors. In the *i*th phase (*t* < *d*[1] for *i* = 1 and Σ<sub>*j*=1</sub><sup>*i*−1</sup> *d*[*j*] < *t* ≤ Σ<sub>*j*=1</sub><sup>*i*</sup> *d*[*j*] for *i* > 1) we assume the intrinsic cost has the form:

$$l(\mathbf{x}\_t, \mathbf{u}\_t, t; \boldsymbol{\theta}) = (\mathbf{x}\_t - \mathbf{g}^{[i]})^T \mathbf{R}^{[i]} (\mathbf{x}\_t - \mathbf{g}^{[i]}) + \mathbf{u}\_t^T \mathbf{H}^{[i]} \mathbf{u}\_t. \quad (3)$$

It is parameterized by the via-point **g**[*i*] in state space; by the precision vector **r**[*i*] which determines **R**[*i*] = diag(exp **r**[*i*] ) and therefore how steep the potential is along each state dimension; and by the parameters **h**[*i*] which determine **H**[*i*] = diag(exp **h**[*i*] ) and therefore the control costs along each control dimension. We represent the importance factors **r**[*i*] and **h**[*i*] in log space as we are only interested in the relationship between these factors. At the end of each phase (at the via-point), we multiply the quadratic state costs by the factor 1/Δ*t*, where Δ*t* is the time-step used for planning. This ensures that the via-point is reached at the end of the phase, while during the phase the movement is less constrained. With this representation, the parameters **θ** of our PMPs are given by

$$\boldsymbol{\theta} = [d^{[1]}, \mathbf{g}^{[1]}, \mathbf{r}^{[1]}, \mathbf{h}^{[1]}, \dots, d^{[N]}, \mathbf{g}^{[N]}, \mathbf{r}^{[N]}, \mathbf{h}^{[N]}].\tag{4}$$

Cost functions of this type are commonly used—and handcrafted—in control problems. They allow one to specify a via-point, but also to determine whether only certain dimensions of the state need to be controlled to the via-point, and how this trades off with control cost. Instead of hand-designing such cost functions, our method will use policy search to learn these parameters of the intrinsic cost function. As for the DMPs, we will assume that the desired final state at time index *T* is known, and thus **g**[*N*] is fixed and not included in the parameters. Furthermore, since we consider finite-horizon episodic tasks, the duration of the last phase is also fixed: *d*[*N*] = *T* − Σ<sub>*i*=1</sub><sup>*N*−1</sup> *d*[*i*]. Still, the algorithm can choose the importance factors **r**[*N*] and **h**[*N*] of the final phase.
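The phase-wise via-point cost of Equations (2)-(4) can be sketched as follows (an illustrative reading of the equations, with diagonal **R**[*i*] and **H**[*i*] stored as log-precision vectors; the per-phase 1/Δ*t* end-of-phase weighting and the constraint term *cp* are omitted for brevity):

```python
import numpy as np

def intrinsic_cost(xs, us, durations, goals, log_r, log_h):
    """Evaluate the via-point intrinsic cost L(tau; theta) of Eqs. (2)-(3):
    each phase i contributes (x_t - g[i])^T R[i] (x_t - g[i]) + u_t^T H[i] u_t,
    with diagonal R[i] = diag(exp r[i]) and H[i] = diag(exp h[i])."""
    bounds = np.cumsum(durations)                 # phase end times
    L = 0.0
    for t in range(len(us)):
        i = int(np.searchsorted(bounds, t + 1))   # phase index of time-step t
        dx = xs[t] - goals[i]
        R = np.exp(log_r[i])                      # diagonal precision, log space
        H = np.exp(log_h[i])                      # diagonal control cost, log space
        L += dx @ (R * dx) + us[t] @ (H * us[t])
    return L
```

The parameter vector **θ** of Equation (4) simply concatenates `durations`, `goals`, `log_r`, and `log_h` over the phases.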

### **3.3. DYNAMIC MODEL LEARNING**

PMPs are endowed with an intrinsic planning system. For planning we need to learn a model of the system dynamics *f*Dyn in Equation (1). The planning algorithm cannot interact with the real environment; it has to rely solely on the learned model. Only after the planning algorithm has finished is the resulting policy executed on the real system, and new data points ([**x***t*, **u***t*], **x**˙*t*) are collected for learning the model.

Many types of function approximators can be applied in this context (Vijayakumar et al., 2005; Nguyen-Tuong et al., 2008a,b). We use the lazy learning technique Locally Weighted Regression (LWR) (Atkeson et al., 1997) as it is a very simple and effective approach. LWR is a memory-based, non-parametric approach, which fits a local linear model to the locally weighted set of data points. For our experiments, the size of the data set was limited to 10<sup>5</sup> points, implemented as a first-in-first-out queue buffer, because the computational demands of LWR drastically increase with the size of the data set. In particular we used a Gaussian kernel as distance function with the bandwidth parameters *h*<sup>φ</sup> = 0.25 for joint angles, *h*φ˙ = 0.5 for velocities, and *hu* = 0 for controls<sup>1</sup>. For more details we refer to Chapter 4 in Atkeson et al. (1997).
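A minimal LWR prediction step might look as follows (a generic textbook sketch, not the authors' implementation; the treatment of zero bandwidths mirrors the footnote's choice of excluding the controls from the distance):

```python
import numpy as np

def lwr_predict(query, X, Y, bandwidths):
    """Locally Weighted Regression: fit a linear model weighted by a
    Gaussian kernel around the query point and evaluate it there.
    `bandwidths` sets the kernel width per input dimension; a zero entry
    removes that dimension from the distance (as done for the controls)."""
    h = np.asarray(bandwidths, float)
    scale = np.where(h > 0, 1.0 / np.maximum(h, 1e-12), 0.0)
    d = (X - query) * scale
    w = np.exp(-0.5 * np.sum(d ** 2, axis=1))     # Gaussian kernel weights
    A = np.hstack([X, np.ones((len(X), 1))])      # local linear model with bias
    sw = np.sqrt(w)[:, None]                      # weighted least squares
    beta, *_ = np.linalg.lstsq(sw * A, sw * Y, rcond=None)
    return np.append(query, 1.0) @ beta
```

Being memory-based, each prediction refits the local model, which is why the data set size must be bounded (here by the first-in-first-out buffer).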

### **3.4. POLICY SEARCH FOR PMP'S INTRINSIC COST FUNCTION**

Model learning takes place simultaneously with learning the parameters **θ** of the MP. In general this could lead to some instability. However, while the distribution *P*(**x***t*) depends on the policy and the data for model learning is certainly non-stationary, the conditional distribution *P*(**x***t*+1|**u***t*, **x***t*) is stationary. A local learning scheme such as LWR behaves rather robustly under this type of non-stationarity, which affects the input distribution only. On the other hand, from the perspective of the **θ** optimization, the resulting policies may change and lead to different payoffs *C*(τ) even for the same parameters **θ**, due to the adaptation of the learned system dynamics.

<sup>1</sup>The bandwidth parameter for controls is set to zero as the matrices **A** and **B** in the linearized model, i.e., **x**˙ = **A**(**x**)**x** + **B**(**x**)**u**, are independent of the controls **u**.

Since the resulting control policies of our PMPs depend non-linearly on the parameters **θ**, step-based exploration techniques cannot be used in our setup. Hence, we will use the second order stochastic search method CMA (Covariance Matrix Adaptation; Hansen et al., 2003), which makes no assumptions about the parameterization of the MP.

We employ the second order stochastic search method CMA to optimize the parameters **θ** w.r.t. *C*(τ). The search distribution over the parameter space is represented by a multivariate Gaussian distribution. Roughly, CMA is an iterative procedure that generates a number of samples from the current Gaussian distribution, evaluates the samples, computes second order statistics of those samples that reduced *C*(τ), and uses these statistics to update the Gaussian search distribution. In each iteration, all parameter samples **θ** use the same learned dynamic model to evaluate *C*(τ). Further, CMA includes an implicit forgetting in its update of the Gaussian distribution and therefore behaves robustly under the non-stationarity introduced by the adaptation of the system dynamics model.
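The spirit of this episode-based search can be conveyed with a heavily simplified cross-entropy-style sketch (an assumption-laden stand-in, not CMA itself: real CMA-ES additionally maintains step-size control, rank-μ updates, and evolution paths, as in Hansen et al., 2003):

```python
import numpy as np

def gaussian_search(J, theta0, sigma0=1.0, n_iter=60, pop=30, elite=8, rng=None):
    """Simplified episode-based stochastic search: sample parameters from a
    multivariate Gaussian, evaluate the expected cost J on each sample, and
    refit mean and covariance to the best (elite) samples.  The covariance
    refit captures correlations between parameters, the key advantage over
    uncorrelated step-based exploration."""
    rng = np.random.default_rng(rng)
    mean = np.asarray(theta0, float)
    cov = sigma0 ** 2 * np.eye(len(mean))
    for _ in range(n_iter):
        samples = rng.multivariate_normal(mean, cov, size=pop)
        order = np.argsort([J(s) for s in samples])
        elites = samples[order[:elite]]
        mean = elites.mean(axis=0)
        cov = np.cov(elites.T) + 1e-6 * np.eye(len(mean))  # regularized refit
    return mean
```

During the episode the policy induced by each sampled **θ** is deterministic; all exploration happens in parameter space.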

We will compare our PMP approach to both DMPs learned with CMA policy search and DMPs learned with the state-of-the-art step-based method PI<sup>2</sup> (Theodorou et al., 2010). However, in this work we focus on the characteristics of the movement representation and place less emphasis on a specific policy search method.

Note that even if the learned model is only a rough approximation of the true dynamics, the policy search for parameters of the intrinsic cost function can compensate for an imprecise dynamics model: the policy search approach finds parameters **θ** of the intrinsic cost function such that—even with a mediocre model the resulting controller will lead to low extrinsic costs in the real system.

### **3.5. PROBABILISTIC PLANNING ALGORITHM**

We use the probabilistic planning method AICO (Toussaint, 2009) as intrinsic planning algorithm. It offers the interpretation that a MP can be represented as graphical model and the movement itself is generated by inference in this graphical model.

The graphical model is fully determined by the learned system dynamics and the learned intrinsic cost function, see **Figure 1**. In order to transform the minimization of *L*(τ; θ) into an inference problem, for each time-step an individual binary random variable *zt* is introduced. This random variable indicates a reward event. Its probability is given by

$$P(\mathbf{z}\_t = 1 | \mathbf{x}\_t, \mathbf{u}\_t, t) \propto \exp(-l(\mathbf{x}\_t, \mathbf{u}\_t, t; \theta)),$$

where *l*(**x***t*, **u***t*,*t*; **θ**) denotes the cost function for time-step *t* defined in Equation (3). AICO now assumes that a reward event *zt* = 1 is observed at every time-step. Given that evidence, AICO calculates the posterior distribution *P*(**x**1:*T*, **u**1:*T*−1|*z*1:*<sup>T</sup>* = 1) over trajectories.
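The equivalence between cost minimization and inference on the reward events can be stated in two lines (a direct transcription of the likelihood definition above; variable names are illustrative):

```python
def reward_event_loglik(xs, us, l):
    """Log-probability of observing z_t = 1 at every time-step, given
    P(z_t = 1 | x_t, u_t, t) is proportional to exp(-l(x_t, u_t, t)).
    Up to the normalization constant, maximizing this evidence over
    trajectories is the same as minimizing the summed intrinsic cost."""
    return -sum(l(x, u, t) for t, (x, u) in enumerate(zip(xs, us)))
```

This is why conditioning on *z*1:*<sup>T</sup>* = 1 concentrates the posterior over trajectories on those with low intrinsic cost *L*(τ; **θ**).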

We will use the simplest version of AICO (Toussaint, 2009), where an extended Kalman smoothing approach is used to estimate the posterior distribution *P*(**x**1:*T*, **u**1:*T*−1|*z*1:*<sup>T</sup>* = 1). The extended Kalman smoothing approach uses Taylor expansions to linearize the system and subsequently uses Gaussian messages for belief propagation in a graphical model. Gaussian message passing iteratively re-approximates local costs and transitions as a LQG around the current mode of the belief within a time slice. For more details we refer to Toussaint (2009).

AICO is only a local optimization method, and we have to provide an initial solution which is used for the first linearization. We will use the direct path (the straight line) through the via-points **g**[*i*] in Equation (3) as the initial solution. Before learning, the via-points **g**[*i*] with *i* = 1..*N* − 1 are set to the initial state **x**1. The final via-point is fixed and set to the desired final state **g**[*N*] = **x***T*.

AICO provides us with a linear feedback controller for each time slice of the form

$$\mathbf{u}\_t = \mathbf{O}\_t \mathbf{x}\_t + \mathbf{o}\_t,\tag{5}$$

where **O***<sup>t</sup>* is the inferred feedback control gain matrix and **o***<sup>t</sup>* denotes the linear feedback controller term. This feedback control law is used as policy of the MP and is evaluated on a simulated or a real robot.
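AICO's Gaussian message passing is too involved for a short snippet, but the form of the resulting controller in Equation (5) can be illustrated with a plain finite-horizon LQR backward pass for known linear dynamics **x**′ = **Ax** + **Bu** and the quadratic via-point costs of Equation (3) (a generic optimal-control sketch under these simplifying assumptions, not the AICO algorithm itself):

```python
import numpy as np

def lqr_feedback(A, B, Q, R, goals, QT, gT):
    """Backward Riccati recursion for per-step cost
    (x - g_t)^T Q (x - g_t) + u^T R u and terminal cost
    (x - gT)^T QT (x - gT).  Returns the per-time-step affine feedback
    law u_t = O_t x_t + o_t, the form produced by AICO in Eq. (5)."""
    T = len(goals)
    S, s = np.asarray(QT, float), -np.asarray(QT, float) @ gT
    O_list, o_list = [], []
    for t in reversed(range(T)):
        M = R + B.T @ S @ B                      # control Hessian
        O = -np.linalg.solve(M, B.T @ S @ A)     # feedback gain matrix
        o = -np.linalg.solve(M, B.T @ s)         # affine feedback term
        AB = A + B @ O                           # closed-loop dynamics
        s = -Q @ goals[t] + O.T @ R @ o + AB.T @ (S @ B @ o + s)
        S = Q + O.T @ R @ O + AB.T @ S @ AB
        O_list.append(O)
        o_list.append(o)
    return O_list[::-1], o_list[::-1]
```

In AICO the matrices **A**, **B** come from linearizing the learned dynamics model, and the recursion is replaced by iterated Gaussian message passing.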

The original formulation of the AICO method (Toussaint, 2009) does not consider torque limits, which are important for many robotic experiments as well as for the dynamic balancing experiments we consider in this paper. Therefore, we extended the algorithm. This extension yields not only a modified form of the immediate cost function but also different update equations for the messages and, finally, different equations for the optimal feedback controller. A complete derivation of the extension, including the resulting messages and the corresponding feedback controller, is given in Rückert and Neumann (2012), where the algorithm is also listed.

An overview of the interactions between the policy search for the PMP's intrinsic cost function and the planning process using AICO is sketched in **Figure 2**. The learning framework is organized as follows: given the parameters **θ** from the policy search method CMA, AICO is initialized with the direct path through the via-points as initial solution. AICO then optimizes the parameterized intrinsic cost function *L*(τ; **θ**) to estimate a linear feedback controller for each time-step, see Equation (5). The feedback controller is subsequently executed on the simulated or the real robot and the extrinsic cost *C*(τ) is evaluated. Based on this evidence, CMA updates its distribution over the policy search space and computes a new parameter vector. Simultaneously, we collect samples of the system dynamics ([**x***t*, **u***t*], **x**˙*t*) while executing the MP. These samples are used to improve the learned dynamics model, which is used for planning.

# **4. RESULTS**

We start our evaluation of the proposed PMP approach on a one-dimensional via-point task to illustrate basic characteristics. In order to demonstrate our approach on a more challenging dynamic robot task we choose a complex 4-link humanoid balancing task. At the end of this section we discuss an important issue: the computational time of PMPs for simulated and real world tasks.

In our experiments, we focus on investigating the optimality of the solution, the robustness to noise for learning, and the generalizability to different initial or final states. For the 4-link task we additionally demonstrate how model learning influences the learning performance.

For comparison we take the commonly used DMPs as a baseline, using the newest version of the DMPs (Pastor et al., 2009) as discussed in detail in Appendix A. As described above, we use 2nd order stochastic search to learn the PMP and DMP parameters. In order to compare to a more commonly used policy search algorithm we additionally test the PI<sup>2</sup> algorithm (Theodorou et al., 2010) for learning the DMP parameters. For all experiments we empirically evaluate the optimal settings of the algorithms (such as the exploration rate of CMA and PI<sup>2</sup>, the number of centers for the DMPs, or the number of via-points for the PMPs), which are listed in Appendix B.

# **4.1. ONE-DIMENSIONAL VIA-POINT TASK**

In this task the agent has to control a one-dimensional point mass of 1 kg. The state at time *t* is denoted by **x***<sup>t</sup>* = [φ*t*, φ˙*t*]<sup>T</sup> and we directly control the acceleration *ut*. The time horizon was limited to *T* = 50 time-steps, which corresponds to a simulation time of 0.5 s with a time-step of Δ*t* = 10 ms. Starting at **x**1 = [0, 0]<sup>T</sup>, the agent has to pass through a given via-point *gv* = −0.2 at *tv* = 30. The velocity of the point mass at the via-point is not specified and can have any value. The final target *gT* was set to 1. The movement is shown in **Figure 3**. For this task we define the extrinsic cost function:

$$C(\tau) = 10^4 \left(\dot{\phi}\_T^2 + 10(g\_T - \phi\_T)^2\right) + 10^5 (g\_v - \phi\_{30})^2 + 5 \cdot 10^{-3} \sum\_{t=1}^{T} u\_t^2.$$

The first two terms punish deviations from the target *gT* and the via-point *gv*, where φ<sub>30</sub> denotes the first dimension of the state **x***<sup>t</sup>* = [φ*t*, φ˙*t*]<sup>T</sup> at time index 30. The target should be reached with zero velocity at *T* = 50. The last term punishes high energy consumption, where *ut* denotes the applied acceleration. The control action is noisy: we always add a Gaussian noise term with a standard deviation of σ = 20 to the control action. As this is a very simple task, we use it just to show different characteristics of the DMPs (using 10 Gaussians for that representation was optimal) and PMPs (apparently 2 via-points are sufficient for this task).
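The extrinsic cost of this task can be transcribed directly (a reading of the formula above with 0-based array indices; the trajectory arrays are illustrative inputs):

```python
import numpy as np

def extrinsic_cost(phi, phi_dot, u, g_T=1.0, g_v=-0.2, t_v=30):
    """Extrinsic cost C(tau) of the 1D via-point task: penalize the final
    velocity, the deviation from the target g_T at T, the deviation from the
    via-point g_v at time index t_v = 30, and the summed squared
    accelerations (the paper uses 1-based time indices, the arrays 0-based)."""
    return (1e4 * (phi_dot[-1] ** 2 + 10 * (g_T - phi[-1]) ** 2)
            + 1e5 * (g_v - phi[t_v - 1]) ** 2
            + 5e-3 * np.sum(np.asarray(u) ** 2))
```

The large weights on the via-point and final-state terms relative to the control term are what drive the minimum-variance behavior at those points.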

A quite similar task has been used in Todorov and Jordan (2002) to study human movement control. The experiments showed that humans were able to reach the given via-points with high accuracy; however, in between the via-points, the trial-to-trial variability was rather high. This is a well-known concept from optimal control, called the *minimum intervention principle*, showing also that human movement control follows basic rules of optimal control.

### *4.1.1. Optimality of the solutions*

We first estimate the quality of the *best available* policy with the DMP and the PMP approach. We therefore use the PMPs with two via-points and set the parameters **θ** by hand. As we are using a linear system model and a simple extrinsic cost function, the PMP parameters can be directly obtained by inspecting the extrinsic costs. As the PMPs use the AICO algorithm, which always produces optimal policies for LQG systems, the PMP solution is the optimal solution. We subsequently take the mean trajectory returned by AICO and use imitation learning to fit the DMP parameters. We also optimized the feedback controllers used for the DMPs<sup>2</sup>. In **Figure 3** we plot 100 roll-outs of the DMP and PMP approach using these optimal policies. The second column illustrates the trial-to-trial variability of the trajectories. The optimal solution has minimum variance at the via-point and the target. As expected, this solution is reproduced with the PMP approach, because the parameters of the PMPs are able to reflect the importance of passing through the via-point. The DMPs could not adapt the variance during the movement because the (optimized) feedback controller uses constant controller gains. As we can see, the variance of the DMP trajectory simply increases with time.

Comparing the optimal solutions, we find that PMPs, in contrast to DMPs, can naturally deal with the inherent noise in the system. This is also reflected by the average cost values over 1000 trajectories: 1286 ± 556 for the DMPs and 1173 ± 596 for the PMPs. The ± symbol always denotes the standard deviation. PMPs perform significantly better than DMPs (*t*-test: *p* < 10<sup>−3</sup>).

# *4.1.2. Robustness to noise for learning*

This advantage would not be very useful if we were not able to learn the optimal PMP parameters from experience. Next we test CMA policy search for learning the parameters of the DMPs and the PMPs. In addition, in order to compare to a more commonly used policy search method, we also compare to the PI<sup>2</sup> approach (Theodorou et al., 2010), which we could only evaluate for the DMP approach. We evaluated the learning performance in the case of no control noise, **Figure 4A**, and in the case of control noise σ = 20, **Figure 4B**, performing 15 runs.

<sup>2</sup>The control gains, i.e., the two scalars *k*pos and *k*vel of the linear feedback controller in Equation (7) in the Appendix, are learned using a 2nd order stochastic search method.

Without control noise the quality of the learned policy found by 2nd order search is similar for the DMPs (657.5 ± 0.18) and the PMPs (639.6 ± 0.01). PI<sup>2</sup> could not find solutions as good as those of the stochastic search approach. The reason is that PI<sup>2</sup> could not find the very large weight values which are needed for the last few centers of the DMPs in order to have exactly zero velocity at the final state (note that the weights of the DMPs are multiplied by the phase variable *s*, which almost vanishes at the end of the movement, and therefore these weight values have to be very high). Because CMA policy search uses second order information, such large parameter values are easily found. This comparison clearly shows that using 2nd order search for policy search is justified. If we compare the learning speed in terms of required episodes or roll-outs between DMPs and PMPs, we find an advantage for the PMPs, which could be learned an order of magnitude faster than the DMPs.

The second experiment (with control noise of σ = 20) was considerably harder to learn. Here, we needed to average each performance evaluation over 20 roll-outs. The use of more sophisticated extensions of CMA (Heidrich-Meisner and Igel, 2009a), which can deal with noisy performance evaluations and hence improve the learning speed of CMA policy search in the noisy setup, is part of future work. In **Figure 4B** we find that the PMPs could be learned an order of magnitude faster than the DMPs. As expected from the earlier experiment, the PMPs could find clearly better solutions than the DMPs, as they can adapt the variance of the trajectory to the task constraints. Again, PI<sup>2</sup> showed worse performance than 2nd order search. Illustrated are mean values and standard deviations over 15 runs of learning (1034 ± 1.46 for the PMPs and 1876 ± 131 for the DMPs using CMA). To compare these results to the optimal costs we evaluated the best learned policies of both approaches and generated 1000 trajectories. The learned solution for the PMPs was similar to the hand-coded optimal solution, with costs of 1190 ± 584 versus 1173 ± 596 for the optimal solution. DMPs achieved costs of 1478 ± 837, illustrating that, even though the DMPs are able to represent much better solutions with costs of 1286 ± 556 (see **Figure 3**), it is very hard to find such a solution.

In **Table 1**, we show the mean and variance of the found parameters, averaged over 15 runs, for the first via-point in comparison to the optimal PMP parameters.

**Table 1 | Learned parameters using PMPs for the via-point task (1st via-point).** *The symbol* ± *denotes the standard deviation.*

We can see that the found parameters matched the optimal ones. Interestingly, in the experiment with no noise, the found parameters had a larger deviation from the optimal ones, especially for the first via-point *g*[1] in **Table 1**. The reason for this is the simple observation that without noise, many choices of via-point result in the same trajectory, whereas with noise we have to choose the correct via-point in order to reduce the variance of the trajectory at this point in time.

## *4.1.3. Generalizability to different task settings*

Next, we investigate the ability of both approaches to adapt to different situations or to different priors. In the previous task (**Figure 3**), the initial and the target state were assumed as prior knowledge. The movement was learned for the initial state φ<sup>1</sup> = 0 and the target state φ*<sup>T</sup>* = 1. We want to investigate whether the same learned parameters can be re-used to generate different movements, e.g., for different initial or target states.

For PMPs, we use the new initial or final states, denoted by **x**<sup>1</sup> and **g**[*N*] in the graphical model in **Figure 1**, and re-plan the movement (using the same learned parameters). Changing the initial state or the target state is also allowed by the DMP framework. However, how the movement is generalized to these new situations is based on heuristics (Pastor et al., 2009) and does not consider any task constraints (in this example, passing through the via-point).

In **Figure 5** the learned policies are applied to different initial states φ<sup>1</sup> ∈ {−0.6, −0.4, −0.2, 0, 0.2, 0.4, 0.6} (**Figures 5A,B**) and different goal states φ*<sup>T</sup>* ∈ {1.5, 1.25, 1, 0.75, 0.5} (**Figures 5C,D**). All plots show the mean trajectory. In order to change the initial or the target state of the movement we have to change the point attractor of the DMPs, which changes the complete trajectory. Due to this heuristic, the resulting DMP trajectories shown in **Figures 5A,C** do not pass through the via-point any more. Note that we use a modified version of the DMPs (Pastor et al., 2009) which was explicitly built for generalization to different initial or target points. The PMPs, on the other hand, still navigate through the learned via-point when changing the initial or the goal state, as shown in **Figures 5B,D**.

### *4.1.4. Concluding summary*

As we have seen, the PMPs implement all principles of optimal control, which allows them to learn solutions for stochastic systems of a quality that is not representable with traditional trajectory-based methods such as the DMPs. The optimal movement trajectory could be learned from scratch up to one order of magnitude faster than with DMPs. This difference was even more visible in the stochastic case, where the DMPs needed more than 30,000 episodes to find satisfactory solutions. In the setting with control noise the learned parameters matched the optimal ones, because only under noise are the parameters uniquely determined. Finally, the PMPs could extract the task-relevant feature, the via-point. Even if the task changes, e.g., the initial or the final state, the movement trajectory still passes through the learned via-point. The DMPs, on the other hand, heuristically scale the trajectory, which offers no control over fulfilling task-relevant constraints.

With DMPs, 12 parameters were learned: the weights of 10 Gaussian kernels and 2 control gains. For the PMPs, 2 via-points were sufficient, where the last one was fixed; for both via-points we could additionally specify 3 importance weights. Thus, in total 8 = 2 + 3 + 3 parameters were learned.

## **4.2. DYNAMIC HUMANOID BALANCING TASK**

In order to assess the PMPs on a more complex task, we evaluate the PMP and DMP approach<sup>3</sup> on a dynamic non-linear balancing task (Atkeson and Stephens, 2007). The robot gets pushed with a specific force *F* and has to keep its balance. The push results in an immediate change of the joint velocities. The motor torques are limited, which makes direct counter-balancing of the force infeasible. The optimal strategy is therefore to perform a fast bending movement and subsequently return to the upright position, see **Figure 6**. This is a highly non-linear control problem: applying any type of (linear) balancing control, or a local optimal control algorithm such as AICO with the extrinsic cost function, fails. Thus, we have to use a parametric movement representation. As in the previous experiment, we take the DMP approach (Schaal et al., 2003) as a baseline.

We use a 4-link robot as a simplistic model of a humanoid (70 kg, 2 m) (Atkeson and Stephens, 2007). The eight-dimensional state **x***<sub>t</sub>* is composed of the arm, hip, knee, and ankle positions and their velocities. Table 1 in Rückert and Neumann (2012) shows the initial velocities (resulting from the force *F*, which always acts at the shoulder of the robot) and the valid joint angle range for the task. In all experiments the applied force was *F* = 25 Ns. If one of the joints leaves the valid range, the robot is considered to have fallen. In addition to the joint limits, the controls are limited to the intervals [±250, ±500, ±500, ±70] Nm (arm, hip, knee, and ankle). For more details we refer to Atkeson and Stephens (2007).

<sup>3</sup>For PMPs again 2 via-points were optimal. DMPs performed best when using 10 Gaussian kernels per dimension.

**FIGURE 5 | The movement was learned for a single initial state φ<sub>1</sub> = 0 and a single target state φ<sub>T</sub> = 1; the initial and the target state were assumed as prior knowledge. In this experiment we evaluated the generalization of the learned policies to different initial states φ<sub>1</sub> ∈ {−0.6, −0.4, −0.2, 0, 0.2, 0.4, 0.6} (A,B) and different target states (C,D); shown are mean trajectories. The DMPs (A,C) are not aware of task-relevant features and hence no longer pass through the via-point. (B,D) PMPs can adapt to varying initial or target states with small effects on passing through the learned via-point.**

Let *t*<sub>s</sub> be the last time index where the robot has not fallen and let **x***<sub>ts</sub>* be the last valid state. The final or resting state (upright position with zero velocity) is denoted by **x***<sub>r</sub>*. The movement was simulated for 5 s with Δ*t* = 10 ms, resulting in *T* = 500 time-steps. As extrinsic cost function *C*(τ) we use:

$$C(\tau) = 20(t\_s - T)^2 + (\mathbf{x}\_{t\_s} - \mathbf{x}\_r)^T \mathbf{R}\_E (\mathbf{x}\_{t\_s} - \mathbf{x}\_r) + \sum\_{t=1}^{t\_s} \mathbf{u}\_t^T \mathbf{H}\_E \mathbf{u}\_t. \tag{6}$$

The first term, 20(*t*<sub>s</sub> − *T*)<sup>2</sup>, is a punishment for falling over; if the robot falls over, this term typically dominates. The precision matrix **R**<sub>E</sub> determines how costly it is not to reach **x***<sub>r</sub>*. The diagonal elements of **R**<sub>E</sub> are set to 10<sup>3</sup> for joint angles and to 10 for joint velocities. Controls are punished by **H**<sub>E</sub> = 5 · 10<sup>−6</sup>**I**. Because of the term 20(*t*<sub>s</sub> − *T*)<sup>2</sup> we cannot directly encode the extrinsic cost function as a sum of intermediate costs, which is usually required for SOC algorithms. But we can use PMPs to transform this reward signal into an intrinsic cost function for a local probabilistic planner.
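For concreteness, the episode cost of Equation (6) can be sketched as follows. This is a minimal illustration with hypothetical array shapes, not the authors' implementation:

```python
import numpy as np

def extrinsic_cost(x, u, x_r, t_s, T, R_E, H_E):
    """Episode cost of Equation (6): falling penalty, terminal-state
    penalty, and accumulated control cost. x is (T, 8), u is (T, 4);
    t_s is the last valid time index (t_s == T if the robot never falls)."""
    fall_penalty = 20.0 * (t_s - T) ** 2
    err = x[t_s - 1] - x_r                       # last valid state vs. resting state
    terminal = err @ R_E @ err
    control = sum(u[t] @ H_E @ u[t] for t in range(t_s))
    return fall_penalty + terminal + control

# Weights as described in the text (8 state dimensions, 4 controls).
R_E = np.diag([1e3] * 4 + [10.0] * 4)
H_E = 5e-6 * np.eye(4)
```

With these settings, an episode in which the robot falls early is dominated by the quadratic falling penalty, exactly as the text describes.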

### *4.2.1. Optimality of the solutions*

We use additive zero-mean Gaussian control noise with a standard deviation of σ = 10. In contrast to the simple via-point task, where imitation learning was used to compare the trajectories shown in **Figure 3**, the policies for the 4-link task are learned from scratch. **Figure 7** illustrates the best learned policies for DMPs (left column) and PMPs (right column). Shown are the joint angle trajectories (**Figures 7A,B**) and the variance of these trajectories (**Figures 7C,D**); the corresponding controls are illustrated in **Figure 8**. We evaluated 100 roll-outs of the best policies found by both approaches. While the DMPs cannot adapt the variance during the movement (**Figure 7C**), the PMPs can exploit the power of SOC and are able to reduce the variance at the learned via-point (marked by crosses, **Figure 7D**). Because the PMPs control the variance only where it matters, the overall variance of the movement is much higher than for the DMPs (**Figures 7C,D**): accuracy only matters at the via-points. The arm trajectory, for example, has a high variance once the robot is close to a stable upright posture (**Figure 7B**), because it is not necessary to strictly control the arm in this phase. The best policy found with DMPs had costs of 568, while the best result using PMPs was 307. This strongly suggests that it is advantageous to reduce the variance at certain points in time in order to improve the quality of the policy.

# *4.2.2. Robustness to noise for learning*

Next, we again assess the learning speed of both approaches. We again used CMA policy search for the PMPs and DMPs, as well as PI<sup>2</sup> for the DMP approach. The learning curves, averaged over 20 runs, are illustrated in **Figure 9**. Using the PMPs as movement representation, good policies could be found at least one order of magnitude faster than with the trajectory-based DMP approach. The quality of the found policies was also better for the PMP approach (mean values and standard deviations after learning: 993 ± 449 for the DMPs and 451 ± 212 for the PMPs). For the DMP approach we additionally evaluated PI<sup>2</sup> for policy search; however, PI<sup>2</sup> was not able to find good solutions: the robot always fell over.
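The episode-based search loop used with both representations can be sketched generically. The snippet below uses a simplified sample-elite-anneal update in place of full CMA (which additionally adapts a complete covariance matrix from the elite samples); the function and parameter names are illustrative, not from the paper:

```python
import numpy as np

def episode_search(rollout_cost, theta0, sigma0=1.0, decay=0.95,
                   n_samples=20, n_elite=5, n_iters=100, seed=0):
    """Episode-based exploring policy search, simplified: sample parameter
    vectors, evaluate each by the cost of a whole episode roll-out, move
    the search mean toward the best ("elite") samples, and anneal the
    exploration. CMA (used in the paper) additionally adapts a full
    covariance matrix; the overall loop structure is the same."""
    rng = np.random.default_rng(seed)
    mean, sigma = np.asarray(theta0, float).copy(), sigma0
    for _ in range(n_iters):
        samples = mean + sigma * rng.standard_normal((n_samples, mean.size))
        costs = [rollout_cost(s) for s in samples]
        elite = samples[np.argsort(costs)[:n_elite]]
        mean = elite.mean(axis=0)
        sigma *= decay
    return mean

# Toy objective standing in for an episode roll-out (quadratic bowl at 3).
theta = episode_search(lambda th: np.sum((th - 3.0) ** 2), np.zeros(4))
```

Any such episode-based method can be plugged in; only the scalar episode cost is needed, not intermediate rewards.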

# *4.2.3. Generalizability to different task settings*

In the next step we again test the generalization to different initial or final states. More specifically, we investigate how well the approaches can adapt to different priors of the arm joint.

In the previous task the target was assumed to be known prior knowledge and the policy was learned for a final arm posture of φ<sub>T,arm</sub> = 0. We used this learned policy to generate movements to different final targets of the arm joint, φ<sub>T,arm</sub> ∈ {3, 2.5, 2, 1.5, 1, 0.5, 0, −0.2, −0.4, −0.6}. We only change either the arm position of the last via-point (PMPs) or the point attractor of the dynamical system (DMPs). The results shown in **Figure 10** confirm the findings of the one-dimensional via-point task. The PMPs first move to the via-point, always maintaining the extracted task constraints, and afterward move the arm to the desired position while keeping balance. All desired target positions of the arm could be fulfilled. In contrast, the DMPs managed to keep balance only for a few target positions; the valid range of the target arm position with DMPs was φ<sub>T,arm</sub> ∈ [−0.2, 1]. This shows the advantage of generalization while keeping task constraints over generalization via the DMP heuristics.

The ability of the two approaches to adapt to different initial states is illustrated in **Figure 11**. We used the policy learned for φ<sub>1,arm</sub> = 0 to generate movements from different initial states of the arm joint, φ<sub>1,arm</sub> ∈ {1, 0.5, 0.2, 0, −0.2, −0.4, −0.6}. The push perturbing the robot results in an immediate change of the joint velocities, which are shown in **Table A6** in the Appendix for these different initial states. For the DMPs, only the joint angles 0 and −0.2 resulted in successful policies, whereas with PMPs the valid range of the initial arm position was φ<sub>1,arm</sub> ∈ [−0.6, 0.5].

# *4.2.4. Model learning using PMPs*

So far, all experiments for the PMPs were performed using the known model of the system dynamics; these experiments are denoted by PMP in **Figure 12**. Note that the known system model has also been used for the DMPs, for inverse dynamics control. Now we evaluate how model learning affects the performance of our approach, see **Figure 12**. In the beginning of learning, the extrinsic costs are larger compared to motor skill learning with a given analytic model. However, as the number of collected data-points ([**x***<sub>t</sub>*; **u***<sub>t</sub>*], **ẋ***<sub>t</sub>*) increases, the PMPs with model learning quickly catch up and finally converge to the same costs. The PMP representation with parallel model learning still considerably outperforms the trajectory-based DMP approach in learning speed and in the final costs.

# *4.2.5. Computational time*

For our simulations we used a standard personal computer (2.6 GHz, 8 GB RAM) with implementations of the algorithms in C++. For the 4-link pendulum the DMPs could generate the movement trajectory in less than 0.1 s. With the proposed PMPs it took less than 1 s (including model learning). The time horizon was 5 s and we used a time-step of Δ*t* = 10 ms.

### *4.2.6. Concluding summary*

In contrast to the via-point task, the optimal solution for this dynamic balancing task is unknown. The comparison to the DMPs shows a similar result as for the via-point task. With the PMPs we could find movements of significantly higher quality compared to the DMPs, and the motor skill could be learned up to one order of magnitude faster. We again applied the same parameters to different initial or final states to demonstrate the generalization ability. Here we can see the advantage of the learned task-relevant features: while the PMPs still try to fulfill the extracted task-relevant constraints and therefore succeeded for almost all initial/final state configurations, the DMPs again just heuristically scale the trajectory, which results in the robot falling in almost all but the learned configurations. Finally, we showed that the dynamic model could be learned in parallel for the 4-link balancing task.

**FIGURE 9 | Learning performance of the two movement representations, DMPs and PMPs, for the 4-link balancing task.** Illustrated are mean values and standard deviations over 20 runs of CMA policy search. The controls (torques) are perturbed by zero-mean Gaussian noise with σ = 10 Nm. The PMPs are able to extract a characteristic feature of this task, a specific posture during the bending movement, shown in **Figure 7B**. Using the proposed planning movement primitives, good policies could be found at least one order of magnitude faster than with the trajectory-based DMP approach. Also, the quality of the best-found policy was considerably better for the PMP approach (993 ± 449 for the DMPs and 451 ± 212 for the PMPs). For the DMP approach we additionally evaluated PI<sup>2</sup> for policy search, which could not find as good solutions as the CMA policy search approach.

For all balancing experiments shown in this section the robot was pushed with the force *F* = 25 Ns. We performed the same evaluations for various forces, and the results are essentially the same. For example, a comparison of the learning performance using the negative force *F* = −25 Ns is shown in **Figure 13**; the executed movement of the best policy learned with PMPs is shown in **Figure 14**.

**FIGURE 11 | Joint angle trajectories (arm, hip, knee, and ankle) of the 4-link robot model during a balancing movement for different initial states of the arm joint (1, 0.5, 0.2, 0, −0.2, −0.4, −0.6).** The applied policies were learned for an initial arm posture of φ<sub>1,arm</sub> = 0. **(A)** The valid range of the arm joint using DMPs is φ<sub>1,arm</sub> ∈ [−0.2, 0]. Large dots in the plot indicate that the robot has fallen. **(B)** PMPs could generate valid policies for φ<sub>1,arm</sub> ∈ [−0.6, 0.5].


In total, 40 weights and 8 control gains were learned with DMPs. For PMPs we used 2 via-points, where the second one was fixed. Thus, we learned 8 parameters specifying the first via-point and additionally 12 importance weights for each via-point, resulting in 8 + 2 · 12 = 32 parameters.

**FIGURE 12 | Learning curves with model learning; experiments using the forward model learned from data are denoted by PMP-M.** In the beginning of learning the extrinsic costs are larger than with the known analytic model.

# **5. DISCUSSION**

In this paper we concentrated on three aspects of biological motor control, which are also interesting for robotic motor skill learning: (1) the modularity of the motor system, which makes it possible to represent high-dimensional action spaces in terms of lower-dimensional MPs, (2) its variability and behavior under stochasticity, and (3) the efficiency in learning movement strategies.

In order to achieve similar properties for robotic motor skill learning, we propose to exploit the power of SOC already at the level of the MP. Instead of endowing an MP with a dynamical system, like the DMPs (Schaal et al., 2003), we endow an MP with an intrinsic probabilistic planning system. The resulting MP is called a PMP. In the dynamical systems approach the parameters of the MP indirectly define the shape of the trajectory. In our case, the parameters of the MP define the intrinsic cost function of a graphical model, which represents a SOC problem. Performing inference in this graphical model yields the controls for executing the movement.
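The idea of parameters that specify a cost function for an internal planner can be illustrated with a deliberately simplified stand-in: for *linear* dynamics and quadratic via-point costs, the planning step reduces to regularized least squares in trajectory space. This sketch is not the paper's planner (which performs approximate inference, AICO, in the graphical model); all names are hypothetical:

```python
import numpy as np

def plan_viapoints(A, B, x0, viapoints, T, u_cost=1e-6):
    """Given linear dynamics x_{t+1} = A x_t + B u_t, find the control
    sequence minimizing control effort plus weighted squared distance to
    each via-point.  `viapoints` maps a time index t (1..T) to a pair
    (target state g, scalar importance weight w) -- the analog of the
    PMP parameters that define the intrinsic cost function."""
    n, m = B.shape
    powers = [np.linalg.matrix_power(A, k) for k in range(T + 1)]
    rows, offs = [], []
    for t in range(1, T + 1):
        # x_t = A^t x0 + sum_k A^{t-1-k} B u_k  (linear in stacked controls)
        row = np.zeros((n, m * T))
        for k in range(t):
            row[:, k * m:(k + 1) * m] = powers[t - 1 - k] @ B
        rows.append(row)
        offs.append(powers[t] @ x0)
    J = [np.sqrt(u_cost) * np.eye(m * T)]        # control-effort regularizer
    r = [np.zeros(m * T)]
    for t, (g, w) in viapoints.items():
        J.append(np.sqrt(w) * rows[t - 1])
        r.append(np.sqrt(w) * (g - offs[t - 1]))
    U, *_ = np.linalg.lstsq(np.vstack(J), np.concatenate(r), rcond=None)
    return U.reshape(T, m)
```

Changing a via-point target or its importance weight re-shapes the whole planned control sequence, which is exactly the role the PMP parameters play for the (non-linear, stochastic) intrinsic planner.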

Due to the use of the intrinsic planning system, our representation complies with basic principles of SOC. For example, the PMPs are able to account for the motor variability often observed in human motion. Instead of suppressing the noise of the system by following a single reference trajectory, the PMPs learn to intervene in the system only when necessary to fulfill a given task, also known as the minimum intervention principle (Todorov and Jordan, 2002). This allows a much higher variance in parts of the trajectory where less accuracy is needed. Current methods that rely on a reference trajectory are not able to reproduce these effects.

**FIGURE 13 | Learning performance of the two movement representations, DMPs and PMPs, for the 4-link balancing task, where, in contrast to the evaluations shown in the experimental section, a *negative* force F = −25 Ns is applied.** Illustrated are mean values and standard deviations over 20 runs of CMA policy search. The controls (torques) are perturbed by zero-mean Gaussian noise.

The parameters of PMPs encode learned task-relevant features of the movement, which are used to specify the intrinsic cost function for the MP's intrinsic planning system. Our experiments have shown that such a task-related parameterization facilitates learning and generalization of movement skills. Policies of higher quality could be found an order of magnitude faster than with the competing DMP approach. In addition, as confirmed by our experiments, the intrinsic planner allows a wide generalization of the learned movement, such as generalization to different initial or goal positions. The DMPs, on the other hand, have to use heuristics for this generalization (Pastor et al., 2009), with the consequence that the robot typically fell over in a new situation. In this case relearning is needed for the DMPs, while the PMPs allow reuse of the learned parameters.

In traditional SOC methods (Todorov and Li, 2005; Kappen, 2007; Toussaint, 2009) the intrinsic cost function is typically hand-crafted. In contrast, we learn the cost function from experience. We considered a general class of motor skill learning tasks where only a scalar reward is observed for the whole movement trajectory. Thus, with PMPs this external sparse reward signal is used to learn the intrinsic cost function. We applied the second-order stochastic search method CMA (Hansen et al., 2003) for finding appropriate intrinsic cost functions. In this paper we focused on the representation of movements and placed less emphasis on a specific policy search method. We want to point out again that our method does not depend on the policy search method used; any episode-based exploring policy search method can be applied. We also do not want to argue for episode-based exploring methods in general; however, as our experiments show, these methods provide useful alternatives to the more commonly used step-based approaches such as the PI<sup>2</sup> algorithm (Theodorou et al., 2010). Future work will concentrate on more grounded approaches for extracting immediate costs from a sparse reward signal. This can also be of interest for imitation learning, where we do not know the immediate costs used by the demonstrator, but can often evaluate the demonstrator's behavior by an external reward signal.

The planner requires knowledge of the system dynamics, which we also learn from data. As shown by our experiments, this can be done without significant loss of performance. Hence, our approach combines model-based and model-free RL. As in model-based RL, we learn a system model to plan with; model-free RL is used to search for appropriate intrinsic cost functions. We used locally weighted regression (LWR) (Atkeson et al., 1997) for learning the system dynamics, as it is a very simple and effective approach. Future work will also concentrate on more complex robot models, where more sophisticated methods like those of Vijayakumar et al. (2005), Nguyen-Tuong et al. (2008a), and Nguyen-Tuong et al. (2008b) could be applied for model learning.
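The core of LWR-based forward-model learning can be sketched in a few lines: for a query input (state and control stacked together), fit a local affine model to the stored transitions, weighted by a Gaussian kernel. This is a minimal sketch in the spirit of Atkeson et al. (1997), not the paper's implementation; bandwidth and ridge values are illustrative:

```python
import numpy as np

def lwr_predict(query, X, Y, bandwidth=1.0, ridge=1e-6):
    """Locally weighted regression: predict the state derivative for
    `query` from stored samples (X: inputs [x; u], Y: observed state
    derivatives) by fitting a local affine model around the query."""
    d2 = np.sum((X - query) ** 2, axis=1)
    w = np.exp(-0.5 * d2 / bandwidth ** 2)        # Gaussian kernel weights
    Xb = np.hstack([X, np.ones((len(X), 1))])     # affine local model
    WX = Xb * w[:, None]
    beta = np.linalg.solve(Xb.T @ WX + ridge * np.eye(Xb.shape[1]), WX.T @ Y)
    return np.append(query, 1.0) @ beta
```

Each planning query re-fits the local model, so the approximation improves simply by appending newly observed transitions to `X` and `Y`.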

In our experiments the number of phases was fixed (*N* = 2). It was assumed as prior knowledge and can model the complexity of the movement representation. (Similarly, the complexity of DMPs can be scaled by the number of Gaussian activation functions.) During our experiments we also evaluated the balancing task with up to *N* = 5 phases, but more than 2 phases did not improve the quality of the learned policy. One via-point, on the other hand, was not sufficient to describe the movement.

A promising topic for future investigation is the combination of primitives in order to achieve several tasks simultaneously. This is still a mostly unsolved problem for current movement representations. Because of the non-linear task demands and system dynamics, a naive linear combination in trajectory space usually fails. Here, our PMPs offer new opportunities: new movements can be inferred by a linear combination of cost functions, which results in a non-linear combination of the policies for the single tasks.

# **ACKNOWLEDGMENTS**

This paper was written under partial support by the European Union project FP7-248311 (AMARSI) and project IST-2007-216886 (PASCAL2). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

# **REFERENCES**

Peters, J., and Schaal, S. (2008). Reinforcement learning of motor skills with policy gradients. *Neural Netw.* 21, 682–697.

Williams, R. J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. *Mach. Learn.* 8, 229–256.

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 12 October 2012; accepted: 04 December 2012; published online: 02 January 2013.*

*Citation: Rückert EA, Neumann G, Toussaint M and Maass W (2013) Learned graphical models for probabilistic planning provide a new class of movement primitives. Front. Comput. Neurosci. 6:97. doi: 10.3389/fncom.2012.00097*

*Copyright © 2013 Rückert, Neumann, Toussaint and Maass. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

# **APPENDIX**

### **A. DYNAMIC MOVEMENT PRIMITIVES**

The most prominent representation for movement primitives (MPs) used in robot control is the DMP (Schaal et al., 2003). We therefore used the DMPs as a baseline in our evaluations and briefly review this approach here in order to clarify the differences to our work. For our experiments we implemented an extension of the original DMPs (Pastor et al., 2009), which adds a term to the dynamical system that facilitates generalization to different target states. For more details we refer to Schaal et al. (2003) and Pastor et al. (2009).

DMPs generate multi-dimensional trajectories by the use of non-linear differential equations. The basic idea is to use, for each degree-of-freedom (DoF) of the robot, a globally stable, linear dynamical system which is modulated by a learnable non-linear function *f* :

$$\tau \dot{z} = \alpha\_z (\beta\_z (g - y) - z) - \alpha\_z \beta\_z (g - y\_1) s + f,$$
$$\tau \dot{y} = z,$$

where the desired final position of the joint is denoted by *g* and the initial position of the joint is denoted by *y*1. The variables *y* and *y*˙ denote a desired joint position and joint velocity, which represent our movement plan. The temporal scaling factor is denoted by τ and α*<sup>z</sup>* and β*<sup>z</sup>* are time constants. The non-linear function *f* directly modulates the derivative of the internal state variable *z*. Thus, *f* modulates the desired acceleration of the movement plan. *s* denotes the phase of the movement.

For each DoF of the robot an individual dynamical system, and hence an individual function *f*, is used. The function *f* only depends on the phase *s* of the movement, which represents time: τ*s*˙ = −α<sub>s</sub>*s*. The phase variable *s* is initially set to 1 and converges to 0 for a proper choice of τ and α<sub>s</sub>. With α<sub>s</sub> we can modulate the desired movement speed. The function *f* is constructed as the weighted sum of *K* Gaussian basis functions Ψ<sub>i</sub>:

$$f(s) = \frac{\sum\_{i=1}^{K} \Psi\_i(s)\omega\_i s}{\sum\_{i=1}^{K} \Psi\_i(s)}, \quad \Psi\_i(s) = \exp(-\frac{1}{2\sigma\_i^2}(s - c\_i)^2).$$

As the phase variable *s* converges to zero also the influence of *f* vanishes with increasing time. Hence, the dynamical system is globally stable with *g* as point attractor.

In our setting, only the linear weights ω<sub>i</sub> are parameters of the primitive which can modulate the shape of the movement. The centers *c<sub>i</sub>* specify at which phase of the movement each basis function becomes active; they are typically equally spaced in the range of *s* and not modified during learning. The bandwidth of the basis functions is given by σ<sub>i</sub><sup>2</sup>.
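The equations above can be collected into a minimal single-DoF rollout sketch (Euler integration). The constants and the center placement are illustrative defaults, not the paper's settings:

```python
import numpy as np

def dmp_rollout(w, y1, g, T=200, tau=1.0, a_z=25.0, b_z=6.25, a_s=4.0):
    """Integrate one DMP degree of freedom: phase s decays from 1 to 0,
    the forcing term f(s) modulates the transient, and the system
    converges to the point attractor g. `w` are the K basis weights."""
    K = len(w)
    c = np.exp(-a_s * np.linspace(0, 1, K))       # centers equally spaced in time
    h = 1.0 / (np.diff(c, append=c[-1] * 0.9) ** 2 + 1e-8)
    dt = 1.0 / T
    s, y, z = 1.0, y1, 0.0
    traj = []
    for _ in range(T):
        psi = np.exp(-0.5 * h * (s - c) ** 2)
        f = (psi @ w) * s / (psi.sum() + 1e-10)   # weighted basis mix, gated by s
        zd = (a_z * (b_z * (g - y) - z) - a_z * b_z * (g - y1) * s + f) / tau
        yd = z / tau
        s += (-a_s * s / tau) * dt                # phase dynamics: tau s' = -a_s s
        y += yd * dt
        z += zd * dt
        traj.append(y)
    return np.array(traj)

# With zero weights, the trajectory smoothly converges toward the goal.
path = dmp_rollout(np.zeros(10), y1=0.0, g=1.0)
```

Learning shapes the transient through `w`; global stability is untouched because the influence of *f* vanishes as *s* → 0.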

Integrating the dynamical systems for each DoF results in a desired trajectory **y***<sub>t</sub>*, **y**˙*<sub>t</sub>* of the joint angles. We use an inverse dynamics controller to follow this trajectory (Peters et al., 2008). The inverse dynamics controller receives the desired accelerations **q**¨des as input and outputs the control torques **u**. In order to calculate the desired accelerations we use a simple decoupled linear PD controller:

$$
\ddot{\mathbf{q}}\_{\rm des} = \text{diag}(\mathbf{k}\_{\rm pos})(\mathbf{y}\_t - \mathbf{q}\_t) + \text{diag}(\mathbf{k}\_{\rm vel})(\dot{\mathbf{y}}\_t - \dot{\mathbf{q}}\_t). \tag{7}
$$

Unfortunately, standard inverse dynamics control did not work in our setup because we had to deal with control limits of a multi-dimensional system. Thus, we used an inverse dynamics controller which also incorporates control constraints: we performed an iterative gradient ascent using the difference between the actual accelerations (under constrained controls) and the desired accelerations **q**¨des as error function. This process was stopped after at most 25 iterations.
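The two pieces, the PD law of Equation (7) and an iterative correction under torque limits, can be sketched as follows. The correction loop is a simplified stand-in for the gradient procedure described in the text, with hypothetical rigid-body terms `M` (inertia matrix) and `h` (bias forces):

```python
import numpy as np

def desired_acceleration(q, qd, y, yd, k_pos, k_vel):
    """Decoupled PD law of Equation (7): desired accelerations from the
    tracking error of the DMP movement plan (y, yd)."""
    return np.diag(k_pos) @ (y - q) + np.diag(k_vel) @ (yd - qd)

def clipped_torques(qdd_des, M, h, u_max, iters=25, step=0.5):
    """Iterative correction under torque limits for dynamics M qdd + h = u:
    repeatedly shrink the gap between achieved and desired accelerations
    while clipping u to its limits (at most `iters` iterations, matching
    the cap of 25 used in the text)."""
    u = np.clip(M @ qdd_des + h, -u_max, u_max)
    for _ in range(iters):
        qdd = np.linalg.solve(M, u - h)           # achieved accelerations
        u = np.clip(u + step * M @ (qdd_des - qdd), -u_max, u_max)
    return u
```

When a joint's demand exceeds its limit, its torque saturates while the unconstrained joints still track their desired accelerations.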

For our comparison, we learn the linear weights **w** for each DoF as well as the controller gains **k**pos and **k**vel, i.e., **θ** = [**w**<sub>1</sub>,...,**w**<sub>D</sub>, **k**pos, **k**vel]. This results in *KD* + 2*D* parameters for the movement representation, where *D* denotes the number of DoF of the robot.

# **B. TASK SETTINGS AND PARAMETERS**

In this section the MP parameters and constants are specified for the one-dimensional via-point task and for the humanoid balancing task.

### **B.1 ONE-DIMENSIONAL VIA-POINT TASK**

For the one-dimensional via-point task the parameters of the dynamic movement primitives are listed in **Table A1**. The valid configuration space for the policy search algorithm is listed in **Table A2**. The CMA policy search algorithm has just one parameter, the exploration rate; the best exploration rate found for DMPs on this task was 0.05.

The limits of the parameterization of the planning movement primitives (PMPs) (see Equation 4) are listed in **Table A3**. For the via-point task we chose *N* = 2, where the second via-point *g*[*N*] = *g<sub>T</sub>* was given. The exploration rate was set to 0.1 in all experiments.


### **Table A2 | Via-point task: DMP policy search configuration parameters.**


# **Table A3 | Via-point task: PMP policy search configuration parameters with** *i* **= 1,2.**


# **B.2 DYNAMIC HUMANOID BALANCING TASK**

The DMP parameters for the balancing task are listed in **Table A4**. The policy search parameters are the same as for the via-point task, **Table A2**. The exploration rate was set to 0.1.

The PMPs were again evaluated with *N* = 2 via-points, where the second via-point *g*[*N*] = *g<sub>T</sub>* (the upright robot posture) was given; for the first via-point the valid joint angle configuration is shown in Table 1 in Rückert and Neumann (2012). The exploration rate was 0.1 and the policy search algorithm configuration is listed in **Table A5**.

In the generalization experiment we applied the same learned policy of the 4-link balancing task to different initial states. For different initial arm joint configurations, the push with force *F* resulted in different initial joint velocities, which are shown in **Table A6**.


# **Table A5 | Balancing task: PMP policy search configuration parameters with** *i* **= 1***,* **2.**


*Vector 1 denotes a 4-dimensional column vector, where all elements are equal to 1.*

# **Table A6 | Initial velocities (multiplied by 10<sup>2</sup>/*F* ) for different initial states of the arm joint φ<sub>0,arm</sub>.**


# Model selection for the extraction of movement primitives

# *Dominik M. Endres\*, Enrico Chiovetto and Martin A. Giese*

*Section Computational Sensomotorics, Department of Cognitive Neurology, CIN, HIH, BCCN, University Clinic Tübingen, Tübingen, Germany*

### *Edited by:*

*Andrea D'Avella, IRCCS Fondazione Santa Lucia, Italy*

### *Reviewed by:*

*Stefano Panzeri, Italian Institute of Technology, Italy Vincent C. K. Cheung, MIT, USA*

### *\*Correspondence:*

*Dominik M. Endres, Section Computational Sensomotorics, Department of Cognitive Neurology, CIN, HIH, BCCN, University Clinic Tübingen, Otfried-Müller-Str. 25, 72076 Tübingen, Germany e-mail: dominik.endres@ klinikum.uni-tuebingen.de*

A wide range of blind source separation methods have been used in motor control research for the extraction of movement primitives from EMG and kinematic data. Popular examples are principal component analysis (PCA), independent component analysis (ICA), anechoic demixing, and the time-varying synergy model (d'Avella and Tresch, 2002). However, choosing the parameters of these models, or indeed choosing the type of model, is often done in a heuristic fashion, driven by result expectations as much as by the data. We propose an objective criterion which allows selection of the model type, the number of primitives, and the temporal smoothness prior. Our approach is based on a Laplace approximation to the posterior distribution of the parameters of a given blind source separation model, re-formulated as a Bayesian generative model. We first validate our criterion on ground truth data, showing that it performs at least as well as traditional model selection criteria [the Bayesian information criterion, BIC (Schwarz, 1978), and the Akaike information criterion, AIC (Akaike, 1974)]. Then, we analyze human gait data, finding that an anechoic mixture model with a temporal smoothness constraint on the sources can best account for the data.

**Keywords: motor primitives, blind source separation, temporal smoothing, model selection, laplace approximation, bayesian methods, movement primitives**

# **1. INTRODUCTION**

In recent years substantial experimental evidence has been provided that supports the hypothesis that complex motor behavior is organized in modules or simple units called movement primitives (Flash and Hochner, 2005; Bizzi et al., 2008). In this framework each module, or motor primitive, consists of a set of movement variables, such as joint trajectories (Santello et al., 1998; Kaminski, 2007) or muscle activations (d'Avella et al., 2006; Chiovetto et al., 2010), acting synergistically over time. By combining small numbers of these primitives, complex motor behaviors can be generated. Several methods have been used in the literature for the identification of motor primitives from experimental data sets, ranging from well-known classical unsupervised learning techniques based on instantaneous mixture models, such as principal component analysis (PCA) and independent component analysis (ICA) (Chiovetto et al., 2010; Dominici et al., 2011), to more advanced techniques that include the estimation of temporal delays of the relevant mixture components (d'Avella et al., 2006; Omlor and Giese, 2011). On the one hand, all these approaches differ from each other in multiple aspects, such as their underlying generative models or the specific priors imposed on the parameters. On the other hand, for all of them the number of primitives to be extracted and subsequently used to approximate the original data has to be set a priori. To our knowledge, only very few motor control studies have so far addressed the problem of model selection in a principled way; see e.g., Delis et al. (2013) and Hart and Giszter (2013) for notable exceptions.
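The simplest of these extraction methods, PCA, can be sketched in a few lines: an instantaneous mixture model in which the data are approximated by a small number of source time courses combined through fixed mixing weights. This is a generic illustration (via SVD), not any specific study's pipeline:

```python
import numpy as np

def pca_primitives(data, n_primitives):
    """Extract primitives by PCA: `data` is a (time x channels) matrix of
    e.g. EMG or joint-angle signals. Returns source time courses, mixing
    weights, and the fraction of variance explained -- the instantaneous
    mixture model that time-shifted (anechoic) models generalize."""
    centered = data - data.mean(axis=0)
    U, S, Vt = np.linalg.svd(centered, full_matrices=False)
    sources = U[:, :n_primitives] * S[:n_primitives]   # (time x n_primitives)
    weights = Vt[:n_primitives]                        # (n_primitives x channels)
    explained = (S[:n_primitives] ** 2).sum() / (S ** 2).sum()
    return sources, weights, explained
```

The model selection question addressed in this article is precisely how to choose `n_primitives` (and the model class itself) in a principled rather than heuristic way.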
The existing generative models for the extraction of motor primitives have indeed been demonstrated to provide a low-dimensional decomposition of the experimental data, but no clear criterion has been developed to objectively determine which model is best suited for describing the statistical features of the data under investigation. We are concerned with two types of statistical features: the number of primitives underlying the data (the model order), and the temporal structure of the primitives (e.g., whether they combine synchronously or with temporal shifts, and how smooth they are over time).

Concerning model order selection, several criteria have been developed. Most of them require the computation of the likelihood function (Schwarz, 1978; Akaike, 1987; Basilevsky, 1994; Minka, 2000; Zucchini, 2000) and attempt to determine the right model order as the one that offers the best trade-off between accuracy of data fitting and complexity of the model. Our approach uses this trade-off in a more general setting. Such information criteria were proven to identify the model order of noisy data sets almost without error when the data were corrupted with Gaussian noise, but performance was shown to be noticeably worse when the data were corrupted with signal-dependent noise (Tresch et al., 2006), which is thought to strongly affect neural control signals (Harris and Wolpert, 1998). In this article we present a new objective criterion for model-order selection that extends the classical criteria based on information-theoretic and statistical approaches. The criterion is based on a Laplace approximation of the posterior distribution of the parameters of a given blind source separation method, re-formulated as a Bayesian generative model. We derive this criterion for a range of blind source separation approaches, including for the first time the anechoic mixture model (AMM) described in Omlor and Giese (2011).

We provide a validation of our criterion based on an artificial ground-truth data set generated so as to exhibit well-known statistical properties of real kinematic data. In particular, we show that our method performs at least as well as traditional model order selection criteria [Akaike's Information Criterion, AIC (Akaike, 1974), and the Bayesian Information Criterion, BIC (Schwarz, 1978)], that it works for both instantaneous and delayed mixtures and can distinguish between the two given moderately sized data sets, and that it can provide information regarding the level of temporal smoothness of the generating sources.

We finally apply the criterion to actual human locomotion data and find that, unlike standard synchronous linear models, a linear mixture of time-shiftable components characterized by a specific degree of temporal smoothness provides a better account of the data-generating process.

# **1.1. RELATED APPROACHES**

The well-known plug-in estimators, BIC and AIC, have the advantage of being easy to use when a likelihood function for a given model is available. Hence, they are often the first choice for model order estimation, but not necessarily the best one. In Tu and Xu (2011) several criteria for probabilistic PCA (or factor analysis) models were evaluated, including AIC, BIC, MIBS (Minka's Bayesian model selection) (Minka, 2000) and Bayesian Ying-Yang (Xu, 2007). The authors found that MIBS and Bayesian Ying-Yang work best. The approach presented in Kazianka and Pilz (2009) corrected the approximations made in MIBS, which yielded improved performance on small sample sizes. This corrected MIBS performed better than all other approaches tested in that paper, including AIC and BIC.

The authors of Li et al. (2007) estimated the number of independent components in fMRI data with AIC and minimum description length [MDL, (Rissanen, 1978)], which boils down to BIC. They showed that temporal correlations adversely affect the accuracy of standard complexity estimators, and proposed a subsampling procedure to remove these correlations. In contrast, we demonstrate below how to deal with temporal dependence as a part of our model. Another MDL-inspired approach, code length relative to a Gaussian prior (CLRG), was introduced in Plant et al. (2010) to compare different ICA approaches and model orders. It was demonstrated to work well on simulated data without the need to choose additional parameters, such as thresholds, and it was shown to recover task-related fMRI components better than heuristic approaches.

Such heuristic approaches typically utilize some features of the reconstruction error (or conversely, of the variance-accounted-for (VAF)) as a function of the model order, e.g., finding a "knee" (inflection point) in that function, a procedure which is inspired by the scree test for factor analysis (Cattell, 1966). For example, the authors of Cheung and Xu (1999) experimented with an empirical criterion for ICA component selection. The independent components were ordered according to their contribution to the reduction of reconstruction error. Only those independent components were retained that had a large effect on this error. Similarly, the approach of Sawada et al. (2005) used "unrecovered power," which is basically reconstruction error, to determine which components of a (reverberant) mixture are important. The work in Valle et al. (1999) compared various criteria for PCA component selection on real and simulated chemical reactor data, finding that some of the heuristic reconstruction-error based methods still perform well when PCA model assumptions are violated by the data-generating process.
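Such a reconstruction-error heuristic can be sketched in a few lines; the example below uses synthetic data (not taken from any of the cited studies) and a simple "largest drop in per-component VAF gain" rule as a stand-in for the scree test:

```python
import numpy as np

rng = np.random.default_rng(1)
# synthetic data: 3 sources mixed into 10 signals through orthonormal weights
S = rng.standard_normal((3, 200))
W, _ = np.linalg.qr(rng.standard_normal((10, 3)))
X = W @ S + 0.1 * rng.standard_normal((10, 200))

# VAF as a function of PCA model order, from the singular value spectrum
sv = np.linalg.svd(X - X.mean(axis=1, keepdims=True), compute_uv=False)
vaf = np.cumsum(sv**2) / np.sum(sv**2)

# heuristic "knee": the order after which the VAF increment drops most sharply
increments = np.diff(np.concatenate([[0.0], vaf]))
knee = int(np.argmax(-np.diff(increments))) + 1
print(knee)
```

For this clearly three-dimensional data set the heuristic recovers the correct order, but as discussed above such rules can fail when the spectrum decays gradually.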

To distinguish convolutive (but undelayed) mixtures from instantaneous ones, the work in Dyrholm et al. (2007) employed the framework of Bayesian model selection for the analysis of EEG data. Related to our approach, the authors of Penny and Roberts (2001) derived Laplace approximations to the marginal likelihood of several ICA model classes for model selection and model order determination. Their work is conceptually similar to our approach, but we also consider delayed mixtures.

All approaches reviewed so far are deterministic in nature. There are also sampling methods available for model selection purposes; see Bishop (2007) for details. One example is the work of Ichir and Mohammad-Djafari (2005), which used importance sampling and simulated annealing for model-order selection of L1-sparse mixtures.

# **2. MATERIALS AND METHODS**

We develop our model (order) criterion in the framework of Bayesian generative model comparison (Bishop, 2007). Let *D* be observable data, Θ*<sub>M</sub>* a tuple of model parameters for a model indexed by *M* (the "model index"), and Φ a tuple of hyperparameters. Using standard terminology, we denote

$$\text{likelihood} : p\left(D|\Theta\_M, \Phi, M\right) \tag{1}$$

$$\text{prior}: p\left(\Theta\_M|\Phi, M\right). \tag{2}$$

The likelihood is the probability density of the data given the model parameters, model index and hyperparameters. The parameter prior is the probability density of the model parameters. Then the marginal likelihood of *M*, or model evidence for *M* is given by

$$\begin{aligned} p(D|\Phi, M) &= \int d\Theta\_M\, p(D, \Theta\_M|\Phi, M) \\ &= \int d\Theta\_M\, p(D|\Theta\_M, \Phi, M)\, p(\Theta\_M|\Phi, M) \end{aligned} \quad (3)$$

where the second equality follows from the product rule for probability distributions. Strictly speaking, the hyperparameters Φ would have to be integrated out as well after choosing a suitable prior for them. However, to keep the problem tractable we determine their values by maximizing the model evidence with respect to them, finding that this yields sufficiently good approximations for our purposes. Once we have evaluated Equation 3 for all *M*, we select the *M* which maximizes the model evidence, since we have no *a-priori* preference for any *M*.
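As a toy illustration of Equation 3, when the parameter space is one-dimensional the model evidence can be evaluated by direct numerical quadrature. The sketch below (illustrative data and priors, not from the paper) compares a fixed zero-mean Gaussian model against a model with a free mean:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

rng = np.random.default_rng(0)
data = rng.normal(1.0, 1.0, size=20)  # toy data with a nonzero true mean

def log_evidence_free_mean(x, prior_sd=10.0):
    # Equation 3: p(D|M) = integral over theta of likelihood times prior
    def integrand(theta):
        return np.exp(norm.logpdf(x, theta, 1.0).sum()) * norm.pdf(theta, 0.0, prior_sd)
    # integrate over a window around the sample mean, where the mass lies
    val, _ = quad(integrand, x.mean() - 2.0, x.mean() + 2.0)
    return np.log(val)

def log_evidence_zero_mean(x):
    # no free parameters: the evidence is just the likelihood at theta = 0
    return norm.logpdf(x, 0.0, 1.0).sum()

# the free-mean model should have the higher evidence on these data
print(log_evidence_free_mean(data) > log_evidence_zero_mean(data))
```

The broad prior penalizes the free-mean model through the Occam factor, but on data drawn with a nonzero mean the gain in fit dominates, so the richer model is correctly selected.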

To apply this model selection framework, we reformulate three popular blind source separation (BSS) methods, namely probabilistic PCA (pPCA), ICA and anechoic demixing as generative models in section 2.1. This reformulation allows us to evaluate their likelihoods and parameter priors. We then use a Laplace approximation (Laplace, 1774) to compute an approximation to the marginal likelihood of each model. This approximation is derived in section 2.2.

### **2.1. GENERATIVE MODELS OF BLIND SOURCE SEPARATION METHODS**

The BSS methods we consider all assume a linear generative model in discrete time, where observable data **X** can be written as a linear superposition of sources **S** multiplied by weights **W**. Let *t* = 1,..., *T* index the *T* (equally spaced) time points, *i* = 1,..., *I* the source index, and *j* = 1,..., *J* the signal index. Note that *J* could also be interpreted as a trial index, i.e., one signal repeated *J* times, or any combination of trials and signals. For the models we consider, there is no formal difference between "trial" and "signal," as opposed to e.g., the time varying synergy model (d'Avella et al., 2006). Then **X** is a (*J* × *T*) matrix, **S** is (*I* × *T*) and consequently **W** must be (*J* × *I*) so that

$$\mathbf{X} = \mathbf{W}\mathbf{S} + \mathbf{Z} \tag{4}$$

$$\mathbf{Z}\_{jt} \sim \mathcal{N}\left(0, \sigma\_n^2\right) \tag{5}$$

where the entries of the noise matrix **Z** are drawn independently from a Gaussian distribution with zero mean and variance σ<sub>*n*</sub><sup>2</sup>. In an anechoic (delayed) mixture, the sources additionally depend on the signal index *j* (see section 2.1.3 for details).

The differences between the BSS approaches can be expressed as priors on **S** and **W**, which we describe in the following.

## *2.1.1. Probabilistic PCA (pPCA)*

PCA is one of the most widely used BSS approaches. In Tipping and Bishop (1999), it was demonstrated how PCA results from a probabilistic generative model: assuming the data have mean zero (i.e., ∀*j* : Σ<sub>*t*</sub> **X**<sub>*jt*</sub> = 0), and using an independent zero-mean Gaussian prior on the sources, i.e.,

$$\mathbf{S}\_{it} \sim \mathcal{N}\left(\mu = 0, \sigma^2\right) \tag{6}$$

the weights **W** which maximize the marginal likelihood of **X** after integrating out **S** are given by the scaled (and possibly rotated) *I* principal eigenvectors of the (*J* × *J*) data covariance matrix (1/*T*) **XX**<sup>*T*</sup>. This model differs from PCA insofar as the sources will only be equal to the PCA factors in the noise-free limit σ<sub>*n*</sub> → 0, and is hence referred to as *probabilistic* PCA (Tipping and Bishop,

**FIGURE 1 | Graphical model representations of the blind source separation algorithms for which we compute a model evidence approximation.** We follow standard graphical modeling terminology (see e.g., Bishop, 2007). Open circles represent random variables, which may also be random functions. Filled circles are parameters. Arrows denote conditional dependencies. The *plates* (colored frames) indicate that the enclosed structure is repeated as often as the corresponding letter indicates. Enclosure in multiple plates indicates a product of repetitions. For example, in panel **(A)** there are *I* × *T* random variables *S* which comprise the source matrix. **(A)** Instantaneous, undelayed mixtures such as pPCA (where λ = 0) and ICA. *J* × *T* signals *X* are computed by mixing *I* × *T* sources *S* with *J* × *I* weights *W*. σ*<sub>w</sub>* is the standard deviation of the zero-mean Gaussian prior on the weights. σ*<sub>n</sub>* is the noise standard deviation. μ and σ are the parameters of the Gaussian part of the prior on the sources; λ measures the deviation from Gaussianity. **(B)** Convolutive, delayed mixtures, like the anechoic mixture of Omlor and Giese (2011) with additional temporal smoothness constraints. *I* source functions *S*(*t*) are drawn from a Gaussian process *GP*(μ(*t*), *k*(*t*, *t*′)) with mean function μ(*t*) and kernel *k*(*t*, *t*′). These sources are shifted in time by *J* × *I* delays (one per trial and source) drawn from an exponential distribution with parameter γ and mixed with *J* × *I* weights *W* which are drawn from a zero-mean Gaussian distribution with standard deviation σ*<sub>w</sub>*, to yield *J* signals *X*(*t*). For details, see text.

1999). Similarly, when we put a prior on the weights

$$\mathbf{W}\_{ji} \sim \mathcal{N}\left(\mathbf{0}, \sigma\_{\mathbf{w}}^{2}\right) \tag{7}$$

and integrate them out, we find that the best *I* sources **S** are the principal eigenvectors of the (*T* × *T*) data covariance matrix (1/*J*) **X**<sup>*T*</sup>**X** (assuming zero-mean signals at every time step, i.e., ∀*t* : Σ<sub>*j*</sub> **X**<sub>*jt*</sub> = 0). We will therefore use both priors for a completely probabilistic pPCA model<sup>1</sup>.

A graphical model representation of pPCA is shown in **Figure 1A**. Open circles represent random variables, which may

<sup>1</sup>We will refer to this model interchangeably by pPCA or just PCA in the following.

also be random functions. Filled circles are parameters. Arrows denote conditional dependencies. The *plates* (colored frames) indicate that the enclosed structure is repeated as often as the corresponding letter indicates. Enclosure in multiple plates indicates a product of repetitions. Thus, in a pPCA model, *I* × *T* sources are *a-priori* drawn independently of each other (μ and σ are parameters, not random variables), and source values have no dependencies across time. Likewise, weights have no dependencies across sources or signals. In contrast, data points depend on both weights and sources, as indicated by the arrows converging on **X** from **S** and **W**.

Given the generative model (Equation 4) and the prior specification (Equation 6 and Equation 7), we can now write down the likelihood and prior terms which we need for the evaluation of the model evidence (Equation 3). To this end, we identify the number of sources *I* with the model index *M*, and (cf. Equation 3)

$$D = \mathbf{X} \tag{8}$$

$$\Theta\_M = (\mathbf{W}, \mathbf{S}) \tag{9}$$

$$\Phi = (\mu, \sigma, \sigma\_w, \sigma\_n)\tag{10}$$

$$p\left(D|\Theta\_M, \Phi, M\right) = \frac{\exp\left(-\frac{1}{2\sigma\_n^2} \|\mathbf{X} - \mathbf{W}\mathbf{S}\|\_F^2\right)}{\sqrt{2\pi\sigma\_n^2}^{JT}}\tag{11}$$

$$p(\Theta\_M|\Phi, M) = \frac{\exp\left(-\frac{1}{2\sigma^2} \|\mathbf{S} - \mu \cdot \mathbf{1}\_{IT}\|\_F^2\right)}{\sqrt{2\pi\sigma^2}^{IT}}$$

$$\times \frac{\exp\left(-\frac{1}{2\sigma\_w^2} \|\mathbf{W}\|\_F^2\right)}{\sqrt{2\pi\sigma\_w^2}^{JI}}\tag{12}$$

where **1**<sub>*IT*</sub> is an (*I* × *T*) matrix with every element equal to 1, and ‖**A**‖<sub>*F*</sub> is the Frobenius norm of matrix **A**.
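The eigendecomposition route to pPCA described above can be sketched in a few lines of numpy. This is a minimal illustration on synthetic data; the scale/rotation ambiguity of the pPCA solution is ignored and only the rank-*I* reconstruction is checked:

```python
import numpy as np

rng = np.random.default_rng(2)
J, T, I = 8, 500, 2
W_true = rng.standard_normal((J, I))
S_true = rng.standard_normal((I, T))          # zero-mean sources, cf. Equation 6
X = W_true @ S_true + 0.05 * rng.standard_normal((J, T))

# top-I eigenvectors of the (J x J) covariance (1/T) X X^T span the pPCA weights
evals, evecs = np.linalg.eigh(X @ X.T / T)
U = evecs[:, ::-1][:, :I]                     # eigh sorts ascending, so reverse
S_hat = U.T @ X                               # sources up to rotation and scale
X_hat = U @ S_hat                             # rank-I reconstruction of the data

vaf = 1 - np.linalg.norm(X - X_hat)**2 / np.linalg.norm(X)**2
print(vaf > 0.99)
```

With two true sources and low noise, the two-component reconstruction captures essentially all of the variance, as expected from the Tipping-Bishop result.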

# *2.1.2. Independent Component Analysis (ICA)*

The term ICA refers to a variety of BSS methods which try to decompose signals into sources with two main goals:

1. the sources should be statistically independent of each other;
2. the signals should be reconstructable from the sources through the generative model.

Infomax ICA (Bell and Sejnowski, 1995) tries to achieve these goals by maximizing the mutual information (Cover and Thomas, 1991) between sources and signals, which clearly promotes the second goal. The first one is promoted if the BSS system contains an information bottleneck, e.g., fewer sources than signals. In that case, maximizing mutual information amounts to maximizing the total source entropy, which is achieved if the sources are independent.

The FastICA algorithm (Hyvarinen, 1999) aims directly at minimizing the mutual information between the sources, thereby promoting goal one. Goal 2 is achieved by constraining the (linear) transformation from signals to the sources to be invertible, or at least almost invertible in the noisy or lossy case, such that the signals can be reconstructed using the generative model above (Equation 4). Mutual information is measured via *negentropy*, which is the negative difference between the entropy of a source and the entropy of a variance-matched Gaussian variable, i.e., it is a measure of non-Gaussianity. Maximizing negentropy then minimizes mutual information. To measure negentropy, the authors of Hyvarinen (1999) used the "contrast function" approach developed in Hyvärinen (1998). Contrast functions provide constraints on expectations of probability distributions, in addition to the mean and variance constraints of Gaussians. Consequently, the maximum entropy distributions obeying these constraints have the contrast function(s) as sufficient statistics, with an associated natural parameter, which controls the deviation of the resulting distribution from a Gaussian. For a detailed derivation see Hyvärinen (1998). This motivates the following source prior for probabilistic ICA models: let *G*(.) be the contrast function, then

$$p(\mathbf{S}\_{it}) = \frac{1}{Z(\mu, \sigma, \lambda)} \exp\left(-\frac{1}{2\sigma^2} (\mathbf{S}\_{it} - \mu)^2 + \lambda \, G(\mathbf{S}\_{it} - \mu)\right) (13)$$

where λ is the natural parameter associated with *G*(.). The normalization constant *Z*(μ, σ, λ) can be evaluated by numerical integration, since the prior is a density over a one-dimensional random variable. Similar to pPCA, we use a Gaussian prior on the weights. The graphical model representation of ICA is the same as for pPCA (see **Figure 1A**), since there is no a-priori dependency between sources or weights across time.
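For concreteness, the normalization constant *Z*(μ, σ, λ) of Equation 13 can be obtained by one-dimensional quadrature. The sketch below uses the log-cosh contrast *G*(*x*) = log cosh(*x*), a common choice in FastICA, with illustrative parameter values of our own choosing:

```python
import numpy as np
from scipy.integrate import quad

MU, SIGMA, LAM = 0.0, 1.0, -0.5   # illustrative values; lam tilts the prior away from Gaussian

def log_cosh(s):
    # numerically stable log(cosh(s)) that avoids overflow for large |s|
    a = np.abs(s)
    return a + np.log1p(np.exp(-2.0 * a)) - np.log(2.0)

def unnormalized(s):
    # Equation 13 without the 1/Z factor, with G(x) = log(cosh(x))
    return np.exp(-((s - MU)**2) / (2 * SIGMA**2) + LAM * log_cosh(s - MU))

Z, _ = quad(unnormalized, -np.inf, np.inf)   # Z(mu, sigma, lam) by quadrature

# sanity check: the normalized prior integrates to one
total, _ = quad(lambda s: unnormalized(s) / Z, -np.inf, np.inf)
print(abs(total - 1.0) < 1e-8)
```

Because the prior is a density over a single scalar, this quadrature is cheap, which is what makes the contrast-function prior practical inside the evidence computation.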

We can now identify the number of sources *I* with the model index *M*, and furthermore (cf. Equation 3)

$$D = \mathbf{X} \tag{14}$$

$$
\Theta\_M = (\mathbf{W}, \mathbf{S}) \tag{15}
$$

$$\Phi = (\mu, \sigma, \sigma\_w, \sigma\_n, \lambda) \tag{16}$$

$$p(D|\Theta\_M, \Phi, M) = \frac{\exp\left(-\frac{1}{2\sigma\_n^2} \|\mathbf{X} - \mathbf{W}\mathbf{S}\|\_F^2\right)}{\sqrt{2\pi\sigma\_n^2}^{JT}}\tag{17}$$

$$\begin{split}p(\Theta\_M|\Phi, M) &= \prod\_{i,t} p(\mathbf{S}\_{it})\\ &\times \frac{\exp\left(-\frac{1}{2\sigma\_w^2} \|\mathbf{W}\|\_F^2\right)}{\sqrt{2\pi\sigma\_w^2}^{JI}} \end{split} \tag{18}$$

# *2.1.3. Anechoic mixture models (AMM) and smooth instantaneous mixtures (SIM)*

AMMs may be seen as an extension of the above BSS approaches to deal with time-shifted sources (Omlor and Giese, 2007a). Such time shifts are obviously useful in motor control, where coordinated movement patterns, such as gaits, might be characterized by opposite joints moving in a similar manner but time-shifted against each other (e.g., the legs during walking); the well-known time-varying synergy model (d'Avella et al., 2006) is a kind of AMM. The generative models of AMMs are linear with additive Gaussian noise (similar to Equation 4), but the sources *Si*(*t*) are shifted by delays τ*ji*, which are the elements of a (*J* × *I*) matrix τ. We draw these delays from an exponential prior with mean γ, which promotes delays that differ sparsely from zero.

$$\begin{aligned}\mathbf{X}\_{jt} &= \sum\_{i} \mathbf{W}\_{ji}\, \mathbf{S}\_{i} (t - \tau\_{ji}) + \eta\_{jt} \\ &= \hat{\mathbf{X}}\_{jt} + \eta\_{jt} \end{aligned} \tag{19}$$

$$\eta\_{jt} \sim \mathcal{N}(0, \sigma\_n^2) \tag{20}$$

$$p(\tau\_{ji}) = \frac{1}{\gamma} \exp\left(-\frac{\tau\_{ji}}{\gamma}\right) \tag{21}$$

where we define the matrix of reconstructed signals **X̂** by **X̂**<sub>*jt*</sub> = Σ<sub>*i*</sub> **W**<sub>*ji*</sub>*S<sub>i</sub>*(*t* − τ<sub>*ji*</sub>). Moreover, we impose soft temporal regularity constraints on the sources. To this end, we draw the sources from a Gaussian process (GP) (Rasmussen and Williams, 2006) with mean function μ(*t*) and covariance (or kernel) function *k*(*t*, *t*′). A GP is a prior over functions *S*(*t*) where the joint distribution of any finite number of function values at times *t*<sub>1</sub>,..., *t<sub>N</sub>* follows a multivariate Gaussian distribution, i.e.,

$$\vec{S} = (S(t\_1), \dots, S(t\_N))\tag{22}$$

$$\vec{\mu} = (\mu(t\_1), \dots, \mu(t\_N))\tag{23}$$

$$\mathbf{K}\_{mn} = k(\mathbf{t}\_m, \mathbf{t}\_n) \tag{24}$$

$$
\vec{S} \sim \mathcal{N}(\vec{\mu}, \mathbf{K}) \tag{25}
$$

Thus, the choice of kernel function determines how much the function values at different points tend to co-vary a priori. Throughout this paper, we will use kernel functions of the form

$$k(t, t') \propto \text{sinc}(2f\_0|t - t'|) = \frac{\sin(2\pi f\_0|t - t'|)}{2\pi f\_0|t - t'|} \tag{26}$$

which is also called the *wave kernel* (Genton, 2001) in the machine learning literature. This choice is motivated by the observation that the inverse Fourier transform of an ideal low-pass filter with cutoff frequency *f*<sub>0</sub> is proportional to this kernel. Thus, functions drawn from a GP with this kernel contain frequencies only up to about *f*<sub>0</sub>, i.e., they vary on timescales comparable to 1/*f*<sub>0</sub>; see **Figure 2** for examples. Note, however, that the regularization provided by the kernel is "soft": when learning sources from small datasets, they will have the smoothness properties given by the kernel. For large datasets, the kernel regularization may be overridden by the data.
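Drawing sources from a GP prior with the wave kernel of Equation 26 amounts to sampling from a multivariate Gaussian whose covariance matrix is the kernel evaluated on the time grid. A minimal sketch (the 1 s / 100 Hz grid and 3 Hz cutoff are illustrative):

```python
import numpy as np

T, f0 = 100, 3.0                       # 1 s of data at 100 Hz, 3 Hz cutoff
t = np.linspace(0.0, 1.0, T)
D = np.abs(t[:, None] - t[None, :])    # matrix of time differences |t - t'|
K = np.sinc(2 * f0 * D)                # numpy's sinc(x) is sin(pi x)/(pi x), so
                                       # this equals sin(2 pi f0 d)/(2 pi f0 d), Equation 26
K = K + 1e-6 * np.eye(T)               # small jitter for numerical stability

rng = np.random.default_rng(3)
source = rng.multivariate_normal(np.zeros(T), K)   # one draw S(t) from the GP
print(source.shape)
```

Larger *f*<sub>0</sub> values yield faster-varying draws, reproducing the qualitative behavior shown in **Figure 2**.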

With this prior, the matrix of reconstructed signals **X̂** defined above, and the tuple *M* = (*I*, *f*<sub>0</sub>) as model index, we find

$$D = \mathbf{X} \tag{27}$$

$$\Theta\_M = (\mathbf{W}, \boldsymbol{\tau}, \mathbf{S}\_1(t), \dots, \mathbf{S}\_I(t)) \tag{28}$$

$$\Phi = \left(\mu(t), k\_{f\_0}(t, t'), \sigma\_n, \sigma\_w\right) \tag{29}$$

$$p(D|\Theta\_M, \Phi, M) = \frac{\exp\left(-\frac{1}{2\sigma\_n^2} \|\mathbf{X} - \hat{\mathbf{X}}\|\_F^2\right)}{\sqrt{2\pi\sigma\_n^2}^{JT}}\tag{30}$$

$$p(\Theta\_M|\Phi, M) = \prod\_{i}\frac{\exp\left(-\frac{1}{2}\mathbf{S}\_i\mathbf{K}^{-1}\mathbf{S}\_i^T\right)}{\sqrt{2\pi}^T\sqrt{|\mathbf{K}|}}\prod\_{j,i}\frac{1}{\gamma}\exp\left(-\frac{\tau\_{ji}}{\gamma}\right)$$

**FIGURE 2 | Examples of kernel functions (left) and sources (right) drawn from a Gaussian process prior with the corresponding kernel.** Throughout this paper, we use shift-invariant kernels of the form *k*(*t*, *t*′) ∝ sinc(2*f*<sub>0</sub>|*t* − *t*′|). **Top row:** Kernel function for *f*<sub>0</sub> = 1 Hz (left) and source function drawn from a Gaussian process with that kernel. The source varies rather smoothly, on a timescale comparable to 1/*f*<sub>0</sub>. **Bottom row:** Kernel function and source for *f*<sub>0</sub> = 3 Hz.

$$\times \frac{\exp\left(-\frac{1}{2\sigma\_w^2} \|\mathbf{W}\|\_F^2\right)}{\sqrt{2\pi\sigma\_w^2}^{JI}}\tag{31}$$

where **S***<sup>i</sup>* is the i-th row of the *undelayed* source matrix, i.e., the matrix of the source functions sampled at times *t* = 1,..., *T*. A graphical model representation of AMM is shown in **Figure 1B**.

As a special case of the AMM model above, we consider the case ∀*i*, *j* : τ*ji* = 0, i.e., a mixture without delays, but GP-induced temporal regularization. In the following, we refer to this as the *smooth instantaneous mixture*, or SIM.
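Generating data from the AMM model of Equations 19-21 can be sketched as follows. This is a toy illustration with hand-picked sources; a circular shift (`np.roll`) stands in for the time delay, which is an assumption of this sketch rather than the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(4)
I, J, T = 2, 5, 100
t = np.arange(T)

# two smooth periodic sources, sampled on a common time axis
S = np.stack([np.sin(2 * np.pi * 2 * t / T), np.cos(2 * np.pi * 3 * t / T)])

W = rng.uniform(-1, 1, size=(J, I))              # mixing weights
tau = rng.exponential(scale=5.0, size=(J, I))    # delays from an exponential prior (Equation 21)
tau = np.round(tau).astype(int)

# Equation 19: X_jt = sum_i W_ji S_i(t - tau_ji) + noise
X = np.zeros((J, T))
for j in range(J):
    for i in range(I):
        X[j] += W[j, i] * np.roll(S[i], tau[j, i])   # circular shift as a simple delay
X += 0.01 * rng.standard_normal((J, T))
print(X.shape)
```

Setting all entries of `tau` to zero recovers the SIM special case, an instantaneous mixture of smooth sources.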

### **2.2. LAPLACE APPROXIMATION**

We now turn to the evaluation of the model evidence, Equation 3. The difficult part, as usual in Bayesian approaches, is the integral over the model parameters Θ (we drop the index *M* in the following for notational simplicity, since we evaluate the model evidence for each *M* separately). Instead of seeking an exact solution, we therefore resort to a *Laplace approximation* (Laplace, 1774; Bishop, 2007). To use this approach, we concatenate the parameters Θ into a vector and then construct a saddle-point approximation (Reif, 1995) of intractable integrals of the form

$$\int d\Theta \exp\left(-f(\Theta)\right) \tag{32}$$

assuming that *f*(Θ) has a single, sharply peaked minimum at some Θ<sup>∗</sup> = argmin<sub>Θ</sub> *f*(Θ) and is twice continuously differentiable. In this case, only exponents close to the minimal exponent *f*(Θ<sup>∗</sup>) make noticeable contributions to the integral. Hence, we can approximate *f*(Θ) *locally* around Θ<sup>∗</sup> by a Taylor expansion

$$f(\Theta) \approx f(\Theta^\*) + \nabla\_{\Theta^\*} f(\Theta)^T (\Theta - \Theta^\*) + \frac{1}{2} (\Theta - \Theta^\*)^T \mathbf{H} \ (\Theta - \Theta^\*) \tag{33}$$

where

$$\mathbf{H}\_{\mu\nu} = \left. \frac{\partial^2 f(\Theta)}{\partial \theta\_{\mu} \partial \theta\_{\nu}} \right|\_{\Theta^\*}$$

is the Hessian matrix of second derivatives evaluated at Θ<sup>∗</sup>. Since Θ<sup>∗</sup> is the location of the minimum of *f*(Θ), it follows that ∇<sub>Θ<sup>∗</sup></sub> *f*(Θ)<sup>*T*</sup> = 0 and **H** is positive (semi-)definite. Thus

$$f(\Theta) \approx f(\Theta^\*) + \frac{1}{2}(\Theta - \Theta^\*)^T \mathbf{H} \ (\Theta - \Theta^\*) \tag{34}$$

and we can approximate the integral as

$$\begin{aligned} &\int d\Theta \exp\left(-f(\Theta)\right) \\ &\approx \int d\Theta \exp\left(-f(\Theta^\*) - \frac{1}{2}(\Theta - \Theta^\*)^T \mathbf{H} \left(\Theta - \Theta^\*\right)\right) \\ &= \exp\left(-f(\Theta^\*)\right) \int d\Theta \exp\left(-\frac{1}{2}(\Theta - \Theta^\*)^T \mathbf{H} \left(\Theta - \Theta^\*\right)\right) \\ &= \exp\left(-f(\Theta^\*)\right) \frac{(2\pi)^{\frac{F}{2}}}{\sqrt{|\mathbf{H}|}} \end{aligned}$$

where *F* = dim(Θ) is the dimensionality of Θ. For the derivation of our model comparison criterion, we will need the logarithm of this integral:

$$\log\left(\int d\Theta \exp\left(-f(\Theta)\right)\right) \approx -f(\Theta^\*) + \frac{F}{2}\log(2\pi) - \frac{1}{2}\log\left(|\mathbf{H}|\right) \,. \tag{35}$$

In summary, the Laplace approximation replaces the intractable integral with differentiation, which is always possible for the models we consider.
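The quality of this approximation is easy to check in one dimension, where the intractable integral can also be computed by quadrature. A minimal sketch with an arbitrary test function (the Hessian reduces to a scalar second derivative, here estimated by central differences):

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize_scalar

# a function with a single sharp minimum, cf. the assumptions above Equation 33
def f(theta):
    return 0.5 * 50 * (theta - 1.2)**2 + 0.1 * theta**4

theta_star = minimize_scalar(f).x

eps = 1e-5  # scalar "Hessian" f''(theta*) by central differences
H = (f(theta_star + eps) - 2 * f(theta_star) + f(theta_star - eps)) / eps**2

# Equation 35 with F = 1
log_laplace = -f(theta_star) + 0.5 * np.log(2 * np.pi) - 0.5 * np.log(H)

exact, _ = quad(lambda th: np.exp(-f(th)), -10, 10)
print(abs(log_laplace - np.log(exact)) < 0.01)
```

The sharper the minimum, the better the quadratic expansion, which is why the approximation is most trustworthy for well-constrained, data-rich problems.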

To approximate the model evidence (Equation 3) in this way, let

$$\Theta^\* = \operatorname{argmin}\_{\Theta} \left[ -\log(p(D|\Theta, \Phi, M)) - \log(p(\Theta|\Phi, M)) \right] \tag{36}$$

in other words, Θ<sup>∗</sup> are the parameters which maximize the likelihood subject to the regularization provided by the parameter prior. Furthermore, denote

$$\mathbf{H}\_{\mu\nu} = -\left. \frac{\partial^2 \log(p(D|\Theta, \Phi, M))}{\partial \Theta\_{\mu}\, \partial \Theta\_{\nu}} \right|\_{\Theta^\*} - \left. \frac{\partial^2 \log(p(\Theta|\Phi, M))}{\partial \Theta\_{\mu}\, \partial \Theta\_{\nu}} \right|\_{\Theta^\*} \tag{37}$$

and thus

$$\log p(D|\Phi, M) \approx \underbrace{\log(p(D|\Theta^\*, \Phi, M))}\_{\text{log-likelihood}} + \underbrace{\log(p(\Theta^\*|\Phi, M))}\_{\text{log-prior}}$$

$$+\underbrace{\frac{\dim(\Theta)}{2}\log(2\pi)-\frac{1}{2}\log(|\mathbf{H}|)}\_{\text{log-posterior-volume}}\tag{38}$$

which we will refer to as the *LAP* criterion for model comparison: the larger LAP, the better the model. It comprises three interpretable parts: the log-likelihood measures the goodness of fit, similar to explained variance or VAF. The second term is the logarithm of the prior, which corresponds to a regularization term for dealing with under-constrained solutions when the dataset is small. Finally, the third part measures the volume of the parameter posterior, since **H** is the posterior precision matrix (inverse covariance) of the parameters in the vicinity of Θ<sup>∗</sup>, i.e., it indicates how well the data constrain the parameters (a large |**H**| means a small posterior volume, which means the model is well-constrained).

We will compare the LAP criterion to two standard model complexity estimators below (see section 3): BIC and AIC. BIC is given by

$$\text{BIC} = -2\left(\log(p(D|\Theta^\*, \Phi, M)) - \frac{1}{2}\text{dim}(\Theta)\log(N)\right) \tag{39}$$

where *N* is the number of data points. The best model is found by minimizing BIC w.r.t. *M*. BIC can be obtained from LAP in the limit *N* → ∞, by dropping all terms from LAP which do not grow with *N* and multiplying by −2. Assuming that the model has no latent variables (whose number typically grows with *N*), the terms to be dropped from Equation 38 are the log-prior, the first term of the posterior volume, and the second term of the Hessian (Equation 37). For i.i.d. observations, the determinant of the first term of the Hessian will typically grow like *cN*<sup>dim(Θ)</sup>, where *c* is some constant independent of *N*; its logarithm therefore contributes −(dim(Θ)/2) log *N*, and the BIC follows. While this reasoning is somewhat approximate (a rigorous derivation can be found in Schwarz, 1978), it highlights that we may expect LAP to become more similar to BIC as the dataset grows.

AIC was originally derived from information-theoretic arguments (Akaike, 1987): a good model loses only a small amount of information when approximating (unknown) reality. When information loss is measured by the Kullback-Leibler divergence (Cover and Thomas, 1991), AIC follows. Alternatively, it can also be obtained by choosing a model complexity prior which depends on *N* and dim(Θ) (Burnham and Anderson, 2004), and it is given by

$$\text{AIC} = -2\left(\log(p(D|\Theta^\*, \Phi, M)) - \dim(\Theta)\right). \tag{40}$$

Like BIC, a good model has a low AIC score.
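Applying Equations 39-40 in practice can be sketched with a toy problem; the polynomial regression below (our own illustrative stand-in, not one of the BSS models of this paper) assumes Gaussian residuals so that the maximized log-likelihood has a closed form:

```python
import numpy as np

rng = np.random.default_rng(5)
x = np.linspace(-1.0, 1.0, 200)
y = 1.0 - 2.0 * x + 0.5 * x**2 + 0.1 * rng.standard_normal(200)  # true order: 2

def aic_bic(order):
    coeffs = np.polyfit(x, y, order)
    resid = y - np.polyval(coeffs, x)
    n = len(y)
    k = order + 2                      # polynomial coefficients plus noise variance
    sigma2 = np.mean(resid**2)         # maximum-likelihood noise variance
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1.0)
    return -2 * loglik + 2 * k, -2 * loglik + k * np.log(n)   # (AIC, BIC)

aic = [aic_bic(o)[0] for o in range(6)]
bic = [aic_bic(o)[1] for o in range(6)]
# both criteria should drop sharply until the true order is reached
print(bic[2] < bic[1] < bic[0])
```

Beyond the true order, the log-likelihood gain is small and the complexity penalty dominates, so both scores flatten out or rise again.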

## **2.3. ASSESSMENT OF CRITERION PERFORMANCE**

To validate our criterion we assessed its performance on synthesized data sets with well-known statistical properties and on actual kinematic data collected from human participants during a free walking task. We also compared the results with those provided by AIC and BIC. Before applying the model selection criteria we factorized each available data set according to the mixture models Equation 4 and Equation 19. The identification of the parameters was carried out in two phases: first, we applied singular value decomposition to identify the principal components (for PCA), or fastICA (Hyvarinen, 1999), or the anechoic demixing algorithm mentioned above (Omlor and Giese, 2011) to yield weights and sources. Second, we used these solutions to initialize an optimization of the corresponding likelihood function, to determine the optimal parameters Θ<sup>∗</sup> and the hyperparameters Φ needed for the Laplace approximation. The optimization in the second step was carried out using the L-BFGS-B routine in the SciPy package (Jones et al., 2001) for Θ<sup>∗</sup>; Φ was then re-estimated for fixed Θ<sup>∗</sup>. This second optimization was necessary for two reasons: the statistical reformulations of pPCA and ICA yield solutions which are very similar, but not identical, to those of the original algorithms, and the AMM method from Omlor and Giese (2011) cannot handle temporal smoothness priors. The number of components *I* identified ranged, for all algorithms, from 1 to 8.

### *2.3.1. Ground-truth data generation*

We simulated kinematic-like data (mimicking, for instance, joint-angle trajectories) based on the generative models Equation 4 and Equation 19, that is, linear combinations of *I* primitives that could be synchronous (SIM) or shifted in time (AMM). For the generation of each primitive *S<sub>i</sub>*(*t*) we drew 100 random samples from a normal distribution (MATLAB (2010) function "randn") and then low-pass filtered them with a 6th-order Butterworth filter [MATLAB (2010) functions "butter" and "filtfilt"]. Two cut-off frequencies were used for filtering, 5 and 10 Hz, to simulate data with two different frequency spectra. The sampling frequency of the data was assumed to be 100 Hz. This procedure allowed us to generate band-limited sources mimicking actual kinematic or kinetic trajectories of duration *T* = 1 s. We generated artificial mixture data by combining a number of sources ranging from 1 to 4. The combination coefficients of the mixing matrix **W** were generated from a continuous uniform distribution on the interval [−10, 10]. Temporal delays τ<sub>*ji*</sub> were drawn, when needed, from an exponential distribution with mean 20. Sets of noisy data were generated by corrupting the noiseless data generated as described above with signal-dependent noise. Noise was drawn from a Gaussian distribution with standard deviation σ = α |*x<sub>i</sub>*(*t*)|, where α is the slope of the relationship between the standard deviation and the noiseless data values *x<sub>i</sub>*(*t*) (Sutton and Sykes, 1967; Schmidt et al., 1979; van Beers et al., 2004). The slope α was computed through an iterative procedure: starting from α = 0, its value was iteratively increased by a predefined increment until a desired noise level 1 − *R*<sup>2</sup> was reached and stayed constant for at least 10 consecutive computations of 1 − *R*<sup>2</sup> given the same value of α.
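The data-generation recipe above translates directly to SciPy. The sketch below reproduces its main steps (band-limited sources via a 6th-order Butterworth filter, uniform mixing weights, signal-dependent noise); the value of `alpha` is fixed for brevity, whereas the paper tunes it iteratively to hit a target noise level:

```python
import numpy as np
from scipy.signal import butter, filtfilt

rng = np.random.default_rng(6)
fs, T = 100, 100                       # 100 Hz sampling, 1 s of data

# band-limited sources: low-pass filtered Gaussian noise
b, a = butter(6, 5 / (fs / 2))         # 5 Hz cutoff, normalized by Nyquist
S = filtfilt(b, a, rng.standard_normal((3, T)), axis=1)

W = rng.uniform(-10, 10, size=(5, 3))  # mixing coefficients in [-10, 10]
X = W @ S                              # noiseless synchronous (SIM-like) mixture

# signal-dependent noise: standard deviation proportional to |x(t)|
alpha = 0.2                            # illustrative slope, not iteratively tuned here
X_noisy = X + rng.standard_normal(X.shape) * alpha * np.abs(X)
print(X_noisy.shape)
```

Adding per-trial delays to the rows of `S` before mixing would yield the corresponding AMM-style data sets.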
We define *R*<sup>2</sup> as follows: since the artificial noiseless data sets and their corresponding noisy versions are multivariate time series, a measure of similarity (typically a ratio of two variances) must be based on a multivariate measure of data variability. We used the "total variation" (Mardia et al., 1979), defined as the trace of the covariance matrix of the signals, which yields:

$$R^2 = 1 - \frac{\left\|\mathbf{X}\_{noiseless} - \mathbf{X}\_{noisy}\right\|^2}{\left\|\mathbf{X}\_{noiseless} - \overline{\mathbf{X}}\_{noiseless}\right\|^2} \tag{41}$$

where **X**<sub>noiseless</sub> is the matrix of the noiseless data set, **X**<sub>noisy</sub> the noisy data, and **X̄**<sub>noiseless</sub> a matrix with the mean values of the noiseless data over trials. For each noiseless data set, two noisy data sets were generated with 1 − *R*<sup>2</sup> levels equal to 0.15 and 0.3, corresponding to approximate signal-to-noise ratios of 22 dB and 15 dB, respectively. We thus generated 2 models (AMM/SIM) × 2 cut-off frequencies (5 Hz/10 Hz) × 4 numbers of sources × 3 levels of noise (noiseless, 0.15, and 0.3) = 48 different data sets. Each of those data sets contained *J* ∈ {5, 10, 25} (data) trials. A "trial" (one row of the matrix **X** in Equation 4) is a one-dimensional time-series sampled at *T* points in time. For reliable averages, we drew 20 data sets for each number of trials.
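A minimal sketch of Equation 41 and of the iterative noise-level search described above (simplified: the noise is redrawn at each step and the increment is fixed; names are ours):

```python
import numpy as np

def r_squared(X_noiseless, X_noisy):
    """Multivariate R^2 of Equation 41, based on the 'total variation'
    (sum of squares after removing the mean over trials)."""
    X_mean = X_noiseless.mean(axis=0, keepdims=True)   # mean over trials
    num = np.sum((X_noiseless - X_noisy) ** 2)
    den = np.sum((X_noiseless - X_mean) ** 2)
    return 1.0 - num / den

def add_signal_dependent_noise(X, target, alpha_step=1e-3, seed=2):
    """Grow alpha until the noise level 1 - R^2 reaches `target`;
    the noise standard deviation is alpha * |x(t)|."""
    rng = np.random.default_rng(seed)
    alpha, noise_level = 0.0, 0.0
    X_noisy = X
    while noise_level < target:
        alpha += alpha_step
        X_noisy = X + rng.standard_normal(X.shape) * alpha * np.abs(X)
        noise_level = 1.0 - r_squared(X, X_noisy)
    return X_noisy, alpha

rng = np.random.default_rng(0)
X = np.cumsum(rng.standard_normal((10, 100)), axis=1)  # smooth-ish trials
X_noisy, alpha = add_signal_dependent_noise(X, target=0.15)
```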

# *2.3.2. Actual kinematic data*

We also applied the model selection criteria to a second data set, consisting of movement trajectories of human actors walking neutrally or with different emotional styles (happy and sad). These data were originally recorded for the study presented in Roether et al. (2008). The movements were recorded using a Vicon (Oxford, UK) optoelectronic movement recording system with 10 infrared cameras, which recorded the three-dimensional positions of spherical reflective markers (2.5 cm diameter) with a spatial error below 1.5 mm. The 41 markers were attached with double-sided adhesive tape to tight clothing worn by the participants. Marker placement was defined by the Vicon PlugInGait marker set. Commercial Vicon software was used to reconstruct and label the markers, and to interpolate short missing parts of the trajectories. The sampling rate was set at 120 Hz. We recorded trajectories from six actors, repeating each walking style three times per actor. A hierarchical kinematic body model (skeleton) with 17 joints was fitted to the marker positions, and joint angles were computed. Rotations between adjacent body segments were described as Euler angles, defining flexion, abduction and rotation about the connecting joints. The data for the BSS methods included only the flexion angles of the lower-body joints, specifically the right and left pelvis, hips, knees and ankles, since the other angles had relatively high noise levels. From each trajectory only one gait cycle was extracted, which was time-normalized. This resulted in a data set of 432 samples with a length of 100 time points each. It was shown previously (Omlor and Giese, 2007a,b) that an anechoic mixture model is more efficient than synchronous models for the representation of such kinematic data. To test the capability of the new LAP criterion to confirm this observation, we applied a temporal shift to each trajectory of the data set.
The delay for each trajectory was drawn from a continuous uniform distribution on the interval [−20,20], with the sign of the delay determining the shift direction (forwards or backwards); signals were wrapped around at the boundaries of the 100-time-point interval.
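The shifting procedure can be illustrated as follows (we round the continuous delays to integer samples for simplicity, an assumption not made in the text):

```python
import numpy as np

def apply_random_shifts(X, max_shift=20, seed=0):
    """Circularly shift each trajectory (row of X) by a random delay in
    [-max_shift, max_shift]; samples wrap around at the boundaries."""
    rng = np.random.default_rng(seed)
    delays = rng.integers(-max_shift, max_shift + 1, size=X.shape[0])
    shifted = np.stack([np.roll(row, d) for row, d in zip(X, delays)])
    return shifted, delays

# Five 100-sample trajectories of different amplitudes.
X = np.sin(np.linspace(0.0, 2.0 * np.pi, 100))[None, :] * np.arange(1, 6)[:, None]
X_shifted, delays = apply_random_shifts(X)
```

Because the shift is circular, each shifted row is a permutation of the original samples.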

# **3. RESULTS**

We first present the evaluation of the three model selection criteria, LAP, BIC and AIC, on the ground truth data described above in section 2.3.1. The evaluation addresses three questions: how well the generator type can be detected (AMM or SIM), how accurately the number of sources *I* can be estimated, and whether the amount of temporal smoothness [i.e., *f*<sup>0</sup> in Equation 26] can be determined. Second, we analyze the human gait data.

# **3.1. GROUND TRUTH EVALUATION**

# *3.1.1. Model type detection*

We measure the accuracy with which the generating model can be detected by the classification rate, averaged across generating and estimated number of sources, the estimated *f*<sup>0</sup> and the 20 data sets per condition. It is given by

$$\text{classification rate} = \frac{\text{number of correct detections}}{\text{total number of trials}} \qquad (42)$$

The results are summarized in **Table 1** for each number of trials *J*. LAP clearly outperforms BIC and AIC, particularly for small *J*. To understand where this difference comes from, **Figure 3** shows a detailed analysis of the results for *J* = 10 trials. The anechoic generator is correctly detected by both LAP and BIC in most cases, whereas AIC often mistakes it for a pPCA model. The SIM generator, on the other hand, is only detected by LAP; both BIC and AIC mistake it for a pPCA model. Hence, LAP achieves very high classification rates, BIC is wrong about half the time, and AIC is even worse. This is due to the terms in BIC and AIC that penalize complex models (second terms in Equation 39 and Equation 40, respectively): they depend only on the *number* of degrees of freedom and the number of data points, but do not measure the effects of any "soft" constraints. Since such soft constraints reduce the likelihood, BIC and AIC will prefer models without soft constraints over those with them. Consequently, BIC and AIC select pPCA over SIM. Note that in the limit *f*<sup>0</sup> → ∞, the kernel of the SIM model gives rise to a diagonal covariance matrix **K**, and thus uncorrelated sources, whereas **K** for finite *f*<sup>0</sup> imposes a correlational constraint. Thus, the SIM model turns into a pPCA model in this limit.
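Equations 39 and 40 are not reproduced here, but in their standard textbook form the BIC and AIC penalties depend only on the parameter count *k* and the sample size *n*, which is why a soft constraint that lowers the likelihood without changing *k* can only hurt these scores; a sketch:

```python
import numpy as np

def bic(log_likelihood, k, n):
    """Standard BIC (lower is better): the penalty k*ln(n) counts parameters only."""
    return -2.0 * log_likelihood + k * np.log(n)

def aic(log_likelihood, k):
    """Standard AIC (lower is better): the penalty 2k is likewise blind to soft constraints."""
    return -2.0 * log_likelihood + 2.0 * k

# A smoothness prior lowers the achievable likelihood without changing k,
# so for equal k the constrained (SIM-like) fit can only score worse:
unconstrained = bic(-100.0, k=10, n=1000)   # e.g., an unconstrained pPCA fit
constrained = bic(-105.0, k=10, n=1000)     # same k, lower likelihood
```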

In contrast, the LAP criterion measures the effect of the source correlations via the log-prior and log-posterior-volume terms. If the posterior is concentrated in a region of parameter space where the prior is high, the effects of the reduced likelihood can be counterbalanced. Since we evaluated the LAP criterion for *f*<sup>0</sup> ∈ {5 Hz,10 Hz}, one of the tested SIM models will match the generator and have a correspondingly high LAP score.

# *3.1.2. Estimating the number of sources*

Next, we looked at how well the criteria are suited for estimating the number of sources. **Table 2** shows the average difference between estimated and generating number of sources, averaged across noise levels, number of generating sources and *f*0s. An empty cell indicates that this model would have been picked by the above model type detection only very infrequently.

Particularly for a small number of trials *J*, LAP is closer to the correct number of sources than BIC or AIC. For larger numbers of trials, the results of BIC and LAP become more similar, which is to be expected, even though BIC does not detect the correct model type. Moreover, the average number of sources estimated by LAP is always within one standard deviation of 0, and these standard deviations are mostly smaller than those of BIC and AIC.

**Table 1 | Model type classification rates of the three tested criteria, for number of trials between 5 (top) and 25 (bottom).**


*1* − *R<sup>2</sup> is the noise level from Equation 41.*

*\* indicates the best criterion for each row. LAP consistently outperforms BIC and AIC, mostly because the latter two are unable to distinguish between a smooth instantaneous mixture and a pPCA model (see also Figure 3). Furthermore, for 5 trials BIC and AIC tend to mistake an anechoic mixture for an ICA model, leading to model type classification rates that are virtually zero.*

The dependency of the estimated number of sources on the noise level is depicted in **Figure 4** for *J* = 10. Unsurprisingly, the estimated number of sources decreases with increasing noise level, since noisy data contain less information about the generating process.

# *3.1.3. Temporal smoothness constraints*

In section 3.1.1, we showed that LAP is the only criterion that can detect the presence of temporal smoothness constraints. Now we investigate whether it can also identify the amount of smoothness, i.e., *f*<sup>0</sup> in Equation 26. To this end, we computed the LAP score for 16 smoothness settings: {1 Hz, 2 Hz, ..., 15 Hz}, and also without a smoothness constraint (i.e., effectively a pPCA model). We select the optimal smoothness setting for each dataset and compute the average deviation from the generator smoothness (either 5 or 10 Hz) across all numbers of generating and estimated sources. The results are summarized in **Table 3** for all noise levels; **Figure 5** shows the detailed distributions for *J* = 10 trials. Except for the noiseless anechoic case, the correct temporal smoothness is found with average deviations near zero and standard deviations < 1.5 Hz. We do not yet have an explanation for the overestimation in the noiseless anechoic case, but speculate that it is due to some jitter in the estimated delays of the anechoic model, which can be "explained away" by allowing for high-frequency components in the sources. As soon as noise is present in the data, this effect disappears.

# **3.2. HUMAN GAIT ANALYSIS**

Having confirmed the validity of the LAP criterion on the synthetic ground truth, we now turn to real data. Since we are interested in smoothness properties as well as model types, we carried out a comparison between PCA, ICA, SIM and AMM models with different *f*<sup>0</sup> ∈ {1 Hz, ..., 12 Hz}. The results are summarized in **Figure 6**, top, where the simple "Anechoic" model (dark blue) is an AMM without smoothness constraints. As might be expected, AMMs are the best models (within our tested models) for this kind of data. Furthermore, a correctly chosen *f*<sup>0</sup> increases the LAP score significantly, i.e., the soft constraint provided by the smoothing kernel is an important feature of these kinematic data, see **Figure 6**, bottom. The best AMM has 3 sources, whereas the best SIM model needs 5 and has a lower score (see **Figure 6**, bottom).

LAP is an approximation of the marginal log-probability of the data [cf. Equation 3]. The best SIM model and the best AMM differ by a LAP score of ≈ 46, which translates into a probability ratio of *P*(AMM)/*P*(SIM) > 10<sup>19</sup>. The best PCA model (5 sources) has a LAP score that is lower than the 7 Hz AMM score by ≈ 600.
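Assuming the LAP score uses natural logarithms, the quoted probability ratio follows directly from exponentiating the score difference:

```python
import math

delta_lap = 46.0               # LAP(best AMM) - LAP(best SIM), from the text
ratio = math.exp(delta_lap)    # marginal probability ratio P(AMM) / P(SIM)
assert ratio > 1e19            # consistent with the quoted bound of 10^19
```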



*Shown is the difference between the best number of sources determined with a given criterion (LAP, BIC, or AIC) and the number of sources in the generator. Gen. is the generating model, either anechoic (AMM) or smooth instantaneous (SIM) mixture. Anal. is the analysis algorithm. An empty cell indicates that a model comparison criterion (LAP, BIC, AIC) would have picked the corresponding analysis algorithm with a chance of less than 10% (cf. Figure 3 and Table 1). For rows with more than one entry, \* indicates the best criterion.*

Note that individual datasets consisted of *J* = 8 trials (one per joint angle), therefore models with more than 8 sources are a priori too complex. This fact is also detected correctly by LAP, which assigns a roughly linearly decreasing score (exponentially decreasing in marginal probability) to models with ≥ 8 sources.

# **4. DISCUSSION**

In this study, we attempted to develop a more objective probabilistic criterion for motor primitive model selection. Our criterion turned out to be more reliable than existing classical criteria (cf. sections 1.1 and 3) in selecting the generative model underlying a given data set, as well as in determining the corresponding dimensionality. The criterion can moreover provide accurate information about soft constraints, here the smoothness of the temporal evolution of the signals.

We tested LAP performance on synthesized, kinematic-like data and on actual motion capture trajectories. However, motor primitives have also been identified at the muscle level (Bizzi et al., 2008), where usually the signals are rectified after collection. As we tested LAP only on data with unconstrained signs, its applicability to positive-only data, such as EMG recordings, is a subject for further investigations.

The application of the criterion to emotional gait trajectories suggested the anechoic model as the most suitable description of the data. This result is in agreement with previous findings from our lab (Omlor and Giese, 2007b), where it was demonstrated that the anechoic model can represent emotional gait data more

**FIGURE 4 | Estimating the number of sources in the ground truth from *J* = 10 trials.** *ΔI = estimated − generating number of sources. 1 − R<sup>2</sup>: noise level from Equation 41. Error bars are ± one standard deviation. Symbol shapes stand for analysis algorithms, colors indicate the selection criterion.* ***Top panel:*** *Anechoic generator. If an AMM is used for analysis, BIC and LAP perform comparably well within the error bars; AIC tends to overestimate. For incorrect analysis models (SIM/ICA/PCA), all criteria overestimate the number of sources, since the extra variability provided by the time shifts needs to be explained via additional sources in instantaneous mixture models. Note, however, that a model type determination step based on BIC or LAP would have ruled out an instantaneous mixture with high probability (cf. Figure 3 and Table 2).* ***Bottom panel:*** *For the instantaneous mixture generator, all three criteria give good results when using PCA or SIM models.*

efficiently (in terms of data compression) than other classical synchronous models. The best number of primitives determined by LAP is also in line with Omlor and Giese (2007b), where three components were found capable of explaining about 97% of the total data variation. Interestingly, the criterion suggested a temporal smoothness regularization with *f*<sup>0</sup> = 7 Hz. Such a value may at first seem to contradict the step frequency of normal walking, which tends to be around 2 Hz (Pachi and Ji, 2005). The higher frequency found by LAP can, however, be justified for multiple reasons. First, our data also comprised happy walks, which are known to be characterized by higher movement energy (Omlor and Giese, 2007b; Roether et al., 2009) and higher average movement velocity compared to neutral or sad walks (Omlor and Giese, 2007a; Roether et al., 2009). The average frequency power spectrum of the walking trajectories therefore shows considerable power within the band from 0 to 10 Hz, with a peak at 5 Hz. In addition, in **Figure 6** the maximum LAP score occurs at *f*<sup>0</sup> = 7 Hz; taking the error bars into account, however, the LAP score associated with the optimal frequency is not statistically different from that associated with any score in the range [3 Hz, 10 Hz], in agreement with the power spectrum. Another factor contributing to *f*<sup>0</sup> = 7 Hz might be the tendency of LAP to overestimate the cutoff frequency slightly for nearly noise-free datasets, see **Figure 5**.

**Table 3 | Mean temporal smoothness estimation accuracies and standard deviations for the LAP criterion, marginalized across the number of sources of both generator and analysis model.**

*Estimation accuracy is given by the best estimated f*<sup>0</sup> *(see Equation 26) as determined by LAP minus the actual cutoff frequency (either 5 or 10 Hz). Gen. is the generative model: anechoic (AMM) or smooth instantaneous mixture (SIM). 1* − *R<sup>2</sup> is the noise level (Equation 41). Except for the zero-noise anechoic generator, f*<sup>0</sup> *can be determined to within 1 Hz of its true value. For details, see text.*

Additional and more advanced models of motor primitives, corresponding to a multivariate version of the anechoic mixture model considered in this study, have been developed (d'Avella et al., 2003, 2006) to describe the modular organization associated with EMG data sets. As we have not computed the LAP for these models, they are not among the possible model selection options yet. Future work will therefore aim to formulate the priors and generative models which would allow for the application of LAP to EMG data.

An interesting feature of LAP is its capability to discriminate between instantaneous and anechoic mixtures. Introducing temporal delays into the model of a motor behavior has proven crucial in some cases, for instance in the modeling of emotional movements or facial expressions (Roether et al., 2008; Giese et al., 2012). LAP is, to our knowledge, the first model selection criterion explicitly designed for this distinction.

Another remarkable feature of LAP is its capability, thanks to the addition of a smoothness prior, to identify the amount of smoothness in the data, in other words to select the frequency *f*<sup>0</sup> in Equation 26 based on the available data. While a Fourier analysis would also reveal where the power spectrum drops off, LAP has the advantage of providing a principled, quantitative trade-off between smoothness and goodness-of-fit, which allows for a more objective selection of *f*<sup>0</sup>. However, computing the power spectrum could be a first step to determine the range of *f*<sup>0</sup> values across which to search for the optimum.
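Such a first-pass spectral estimate could look like the following sketch (the 95% energy threshold is our choice, not from the text):

```python
import numpy as np

def spectrum_cutoff(x, fs, energy=0.95):
    """Frequency below which `energy` of the signal's power lies --
    a crude first guess for the range of f0 values worth testing."""
    psd = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    cum = np.cumsum(psd) / np.sum(psd)
    return freqs[np.searchsorted(cum, energy)]

fs = 100.0
t = np.arange(100) / fs
x = np.sin(2.0 * np.pi * 5.0 * t)          # pure 5 Hz oscillation
f0_upper = spectrum_cutoff(x, fs)          # near 5 Hz for this signal
```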

Moreover, incorporating smoothness priors in time and/or space might be a viable extension of LAP, making it suitable for distinguishing between a low-dimensional generative model based on time-invariant primitives and one based on space-invariant primitives. Muscle synergies, for instance, have indeed been presented in the literature in those terms. Among them, "synchronous" synergies (Cheung et al., 2005; Ting and Macpherson, 2005; Torres-Oviedo et al., 2006) have been described as stereotyped co-varying groups of muscle activations, with the EMG output specified by a temporal profile determining the timing of each synergy during task accomplishment. This definition of synergies reflects the idea of invariance across space (namely, the space spanned by the muscles) mentioned above. "Temporal" synergies (Ivanenko et al., 2004, 2005; Chiovetto et al., 2010, 2012) are instead defined as temporal activation profiles that can simply be linearly combined to reconstruct the actual activity of each muscle. Such a definition of synergies therefore incorporates a notion of invariance across time. More "hybrid" definitions of primitives have also been given. "Time-varying" synergies (d'Avella et al., 2003, 2006), for instance, are defined as spatio-temporal patterns of muscle activations, with the corresponding EMG output determined by the scaling coefficients and time delays associated with each synergy. Chiovetto et al. (2013) already showed heuristically which movement features these definitions of synergies describe. Although this knowledge can surely help to decide, depending on the kind of analysis one needs to carry out, which kind of synergies to extract from a given EMG data set, it does not provide a systematic criterion for such a decision. An extension of LAP might help here, too.

**FIGURE 6 |** *(caption fragment) ... SIM model with I = 1 set to 50) for both models peaks at f*<sup>0</sup> *= 7 Hz. However, the best AMM's approximate posterior probability is larger than the best SIM posterior by a factor of ≈ 10<sup>19</sup>. For details, see text.*

To apply LAP to a given source extraction method, it is necessary to (re)formulate this method in the language of generative probabilistic models. Only when the joint probability of the data and all latent variables (such as **W** or **S**) is available can Equation 38 be evaluated. Furthermore, since LAP results from a second-order approximation to the exponent of that joint probability, LAP will only yield (approximately) correct answers if such an approximation is valid. While the possibility of reformulating a given method can usually be decided *a priori*, the validity of the second-order approximation typically needs testing on ground-truth data.
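As an illustration of the kind of approximation involved (Equation 38 itself is not reproduced here), the generic Laplace approximation to the log evidence can be written down for a conjugate Gaussian model, where it happens to be exact:

```python
import numpy as np

def laplace_log_evidence(log_joint_at_mode, hessian):
    """Generic Laplace approximation to the log marginal probability:
    log p(D) ~= log p(D, theta*) + d/2 * log(2*pi) - 1/2 * log|H|,
    where H is the Hessian of the negative log joint at the mode theta*."""
    d = hessian.shape[0]
    _, logdet = np.linalg.slogdet(hessian)
    return log_joint_at_mode + 0.5 * d * np.log(2.0 * np.pi) - 0.5 * logdet

# Conjugate check: x_i ~ N(mu, 1) with prior mu ~ N(0, 1).  The joint is
# Gaussian in mu, so the second-order expansion is exact for this model.
rng = np.random.default_rng(0)
x = rng.normal(1.0, 1.0, size=50)
n = len(x)
mu_star = x.sum() / (n + 1.0)                      # posterior mode
log_joint = (-0.5 * np.sum((x - mu_star) ** 2) - 0.5 * mu_star ** 2
             - 0.5 * (n + 1.0) * np.log(2.0 * np.pi))
approx = laplace_log_evidence(log_joint, np.array([[n + 1.0]]))
# Closed-form log evidence for this conjugate model, for comparison:
exact = (-0.5 * n * np.log(2.0 * np.pi) - 0.5 * np.log(n + 1.0)
         - 0.5 * (np.sum(x ** 2) - x.sum() ** 2 / (n + 1.0)))
```

For non-Gaussian joints the two values diverge, which is exactly why the approximation needs testing on ground-truth data.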

In conclusion, we presented an innovative and objective criterion that can be used to reliably select an adequate factorization model to explain the variance associated with kinematic/dynamic data, along with its corresponding dimensionality. We showed LAP to perform better than two plug-in estimators, BIC and AIC. It needs, however, to be extended before it can be applied to additional types of data, such as EMG data.

# **ACKNOWLEDGMENTS**

This work was supported by the EU Commission, 7th Framework Programme: EC FP7-ICT-249858 TANGO, EC FP7-ICT-248311 AMARSi, FP7-ICT-2013-10/611909 Koroibot, Deutsche Forschungsgemeinschaft: DFG GI 305/4-1, DFG GZ: KA 1258/15-1, German Federal Ministry of Education and Research: BMBF, FKZ: 01GQ1002A, European Commission, Fp 7-PEOPLE-2011-ITN(Marie Curie): ABC PITN-GA-011-290011, The FP7 Human Brain Flagship Project. All graphics were prepared with matplotlib (Hunter, 2007). We acknowledge support by Deutsche Forschungsgemeinschaft and Open Access Publishing Fund of Tübingen University.

### **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 30 September 2013; paper pending published: 18 November 2013; accepted: 05 December 2013; published online: 20 December 2013.*

*Citation: Endres DM, Chiovetto E and Giese MA (2013) Model selection for the extraction of movement primitives. Front. Comput. Neurosci. 7:185. doi: 10.3389/fncom. 2013.00185*

*This article was submitted to the journal Frontiers in Computational Neuroscience. Copyright © 2013 Endres, Chiovetto and Giese. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Do muscle synergies reduce the dimensionality of behavior?

#### *Naveen Kuppuswamy<sup>1</sup>\* and Christopher M. Harris<sup>2</sup>*

*<sup>1</sup> Artificial Intelligence Laboratory, Department of Informatics, University of Zürich, Zürich, Switzerland*

*<sup>2</sup> Centre for Robotics and Neural Systems and Cognition Institute, Plymouth University, Plymouth, Devon, UK*

### *Edited by:*

*Andrea D'Avella, IRCCS Fondazione Santa Lucia, Italy*

### *Reviewed by:*

*Amir Karniel, Ben-Gurion University, Israel Evangelos Theodorou, Georgia Institute of Technology, USA*

### *\*Correspondence:*

*Naveen Kuppuswamy, Artificial Intelligence Laboratory, Department of Informatics, University of Zürich, Andreasstrasse 15, 8046 Zürich, Switzerland e-mail: naveenoid@ifi.uzh.ch*

The muscle synergy hypothesis is an archetype of the notion of Dimensionality Reduction (DR) occurring in the central nervous system due to modular organization. Toward validating this hypothesis, it is important to understand whether muscle synergies can reduce the state-space dimensionality while maintaining task control. In this paper we present a scheme for investigating this reduction using the temporal muscle synergy formulation. Our approach is based on the observation that constraining the control input to a weighted combination of temporal muscle synergies also constrains the dynamic behavior of a system in a trajectory-specific manner. We compute this constrained reformulation of the system dynamics and then use the method of system balancing to quantify the DR; we term this approach Trajectory Specific Dimensionality Analysis (TSDA). We then investigate the consequences of minimizing the dimensionality for a given task. These methods are tested in simulations on a linear (tethered mass) and a non-linear (compliant kinematic chain) system. The dimensionality of various reaching trajectories is compared when using idealized temporal synergies. We show that, as a consequence of this Minimum Dimensional Control (MDC) model, smooth straight-line Cartesian trajectories with bell-shaped velocity profiles emerge as the optima for the reaching task. We also investigated the effect on dimensionality of adding via-points to a trajectory. The results indicate that a trajectory- and synergy-basis-specific DR of behavior results from muscle synergy control. The implications of these results for the synergy hypothesis, optimal motor control, motor development, and robotics are discussed.

**Keywords: modular motor control, muscle synergies, dimensionality reduction, system balancing, Hankel singular values, optimal motor control**

# **1. INTRODUCTION**

There is an increasing consensus that the solution to the *Degree of Freedom (DoF) Problem* of Bernstein (1967) involves some form of Dimensionality Reduction (DR) resulting from modularization, although it is unclear how exactly this occurs. Of the many kinds of modules that have been proposed (Flash and Hochner, 2005), the muscle synergy hypothesis, typified by the coordinated activation of groups of muscles, has in recent times emerged as one of the front runners (Alessandro et al., 2013). Spatio-temporal regularities in activation patterns across many muscles that are seemingly task- and subject-independent are usually cited as evidence for DR in the muscle synergy hypothesis (d'Avella et al., 2003; Hart and Giszter, 2004; Ivanenko et al., 2004; Ting and Macpherson, 2005; Tresch et al., 2006). Nevertheless, a recurring criticism of the hypothesis is its phenomenological nature and difficulty of falsification (Tresch and Jarc, 2009; Kutch and Valero-Cuevas, 2012). One approach toward validating the hypothesis is to develop a well-grounded theoretical understanding of the functionality offered by muscle synergies for neural control.

Although various formulations have been proposed for muscle synergies in the literature (Chiovetto et al., 2013), there are some common characteristics across the various models: (1) there is a task-specific recruitment of task-independent modules; (2) the synergies themselves are considered as input-space generators (d'Avella et al., 2003); (3) it is suggested in some formulations that the number of modules available for recruitment represents a DR of the control input (Ting, 2007; Chiovetto et al., 2013); (4) there is a linearization of the highly non-linear control problem (Alessandro et al., 2012). From a computational viewpoint, each of these features facilitates real-time control and speeds up motor learning. However, from a control perspective, modularization could also potentially restrict the functionality of the system. Consequently, investigators have begun to examine the theoretical basis (Berniker et al., 2009; Alessandro et al., 2012) and the feasibility of experimentally extracted synergies for task control (Ting and Macpherson, 2005; Neptune et al., 2009; McKay and Ting, 2012; de Rugy et al., 2013). We propose that this task-space perspective (Alessandro et al., 2013) must be extended to also incorporate the ability of a given set of muscle synergies to reduce behavior dimensionality. Muscle synergies must be evaluated both for task performance and for their effectiveness as a reduced-dimensional controller. In the context of this paper, we denote behavior dimensionality as simply the (apparent) state-space dimensionality of the dynamics of the motor behavior.

The necessity of reducing behavior dimensionality is best seen from the viewpoint of optimal control theory. Observations of a number of seemingly task-independent regularities in biological movements have led to the claim of optimality principles underlying motor control. On the one hand, several investigators have attempted to uncover empirical rules governing motor behaviors, such as Fitts' law, the two-thirds power law (Viviani and Flash, 1995), or the bell-shaped velocity profiles of reaching behaviors (Morasso, 1981). Alternatively, the so-called complete models (Todorov and Jordan, 1998) have instead suggested that these features are a consequence of minimizing some performance index; several such candidate indices have been proposed, such as energy, force, accuracy, time, peak acceleration, torque change, etc. (Flash and Hogan, 1985; Harris, 1998b; Todorov, 2004). Nevertheless, it is unclear how organisms might autonomously acquire the optimal behavior, i.e., how the neural instantiation of optimality occurs. Developmental motor hypotheses instead suggest that this optimal control is acquired through an ontogenetic learning strategy (Vereijken et al., 1992; Sporns and Edelman, 1993; Ivanchenko and Jacobs, 2003), typically involving some form of progressive exploration of state-space by an organism. There is also evidence for some form of adaptive optimization mechanism underlying motor control learning (Izawa et al., 2008; Wolpert et al., 2011). However, regardless of the actual mechanism underlying motor learning, large state-space dimensionality has a critical impact on the tractability of iteratively acquired optimal behavior, i.e., the *learnability* of the control (Kuppuswamy and Harris, 2013). DR in this case might have a vital role to play in guaranteeing a tractable developmental acquisition of control.
We contend that a control strategy composed of synergies, in addition to reducing input dimensionality, must also facilitate a reduction in the dimensionality of the state-space relevant to the optimal motor control problem. This entails an analysis of the DR resulting from the constraints placed on the dynamics by muscle synergy control.
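For concreteness, the bell-shaped velocity profile referred to above is exemplified by the classic minimum-jerk trajectory of Flash and Hogan (1985); a short sketch (the example reach amplitude and duration are ours):

```python
import numpy as np

def minimum_jerk(x0, xf, T, n=101):
    """Minimum-jerk trajectory (Flash and Hogan, 1985): straight-line
    path with the characteristic bell-shaped velocity profile."""
    tau = np.linspace(0.0, 1.0, n)                 # normalized time t / T
    pos = x0 + (xf - x0) * (10 * tau**3 - 15 * tau**4 + 6 * tau**5)
    vel = (xf - x0) / T * (30 * tau**2 - 60 * tau**3 + 30 * tau**4)
    return tau * T, pos, vel

t, pos, vel = minimum_jerk(0.0, 0.2, 1.0)          # 20 cm reach in 1 s
```

The velocity is zero at both endpoints and peaks at mid-movement, the signature bell shape.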

The reduced-dimensional control perspective on muscle synergies was investigated by Berniker et al. (2009), who proposed a time-invariant synergy synthesis technique that utilized the method of system balancing (Lall and Marsden, 2002) for DR of the dynamics. A task-variable-relevant reduced-dimensional dynamic model was generated from an accurate musculo-skeletal model of a frog's leg. This reduced-dimensional model was used for synergy synthesis and control planning. Although the method yields synergies that closely correspond with those extracted experimentally, it must be noted that the time-invariant synergy formulation does not conveniently encode the temporal complexity of natural behaviors. For instance, in the analysis of locomotor movements it has been shown that temporal synergies (Ivanenko et al., 2003, 2004, 2005) are more effective in capturing the temporal aspects of the muscle activation patterns at various instances within a gait cycle. Temporal synergies are characterized by a dominant timing sequence that is seemingly independent of sensory feedback (Ivanenko et al., 2005). The synergies can then be interpreted as a pool of task-independent fixed temporal patterns that are selectively recruited in a task-dependent manner to generate the necessary muscle activation (Chiovetto et al., 2013). This formulation has also been used to model motor skill development; an increasing pool of synergies is seemingly employed by adults when compared with infants (Dominici et al., 2011), or in allowing increased behavioral complexity (Ivanenko et al., 2005). Therefore, we use the temporal synergy formulation to explore the DR in the motor behaviors of a system. The control input is composed of a weighted combination of task-independent orthonormal basis patterns as synergies; the weight matrix uniquely specifies the behavior (trajectory) of the system. This enables us to extend the procedure of Berniker et al. (2009) to generate both a task-variable- and a synergy-basis-relevant analysis of the DR of motor behavior.

In this paper, we first develop a method for analyzing the constraints placed on the dynamics by temporal muscle synergy control. For a given dynamical system, where the set of synergies and the weight matrix corresponding to a given trajectory are pre-specified, a "*constrained reformulation*" of the dynamics is computed. This is a trajectory- and synergy-basis-specific constrained reformulation of the dynamics in which the temporal synergies are treated as control inputs triggered for the duration of the movement. We then quantify the DR using the approach of system balancing (Moore, 1981; Hahn and Edgar, 2002; Lall and Marsden, 2002). This approach preserves the features of the dynamics that are most relevant to control: the subspace of the state that is most affected by the input (control variables) and in turn has the greatest effect on the output (task variables) is identified.
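A minimal sketch of the balancing computation for a stable linear system (this is the textbook Gramian-based calculation of Hankel singular values, not the authors' implementation; the example system is ours):

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def hankel_singular_values(A, B, C):
    """HSVs of a stable LTI system (A, B, C): square roots of the
    eigenvalues of Wc @ Wo, where Wc and Wo are the controllability
    and observability Gramians."""
    Wc = solve_continuous_lyapunov(A, -B @ B.T)    # A Wc + Wc A' = -B B'
    Wo = solve_continuous_lyapunov(A.T, -C.T @ C)  # A' Wo + Wo A = -C' C
    hsv = np.sqrt(np.linalg.eigvals(Wc @ Wo).real)
    return np.sort(hsv)[::-1]

# Two-state example: the fast, heavily damped state barely shows up in
# the input-output behavior, so one HSV dominates and the effective
# (behavioral) dimensionality is close to one.
A = np.array([[-1.0, 0.0], [0.0, -50.0]])
B = np.array([[1.0], [1.0]])
C = np.array([[1.0, 1.0]])
hsv = hankel_singular_values(A, B, C)
```

States with small HSVs contribute little to the input-output map and can be truncated, which is the sense in which balancing quantifies DR.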

Our proposed Trajectory Specific Dimensionality Analysis (TSDA) obtains both the dimensionality of this subspace and the corresponding reduced-dimensional dynamics of the system following a given trajectory. We then demonstrate that synergies can contribute to a DR in behaviors; however, the resulting reduction is specific to the synergy basis utilized and the trajectory that is followed in order to realize the task. We test our methods in simulations on two kinds of systems: (1) a linear system composed of a tethered mass, and (2) a non-linear compliant kinematic chain, and compare the DR obtained when performing reaching tasks along various trajectories. Idealized temporal synergies composed of the Legendre polynomial and Fourier bases are used for the experiments.

We then examine the consequences of reducing the dimensionality of a given task to the greatest extent possible. A cost function for quantifying the dimensionality is developed using the system balancing measure of Hankel Singular Values (HSV). Numerical minimization of this cost function obtains the weight matrix, and the corresponding trajectory, that best minimizes the dimensionality while satisfying the task constraints. This control model of Minimum Dimensional Control (MDC) is tested in the simulated linear and non-linear systems for two kinds of tasks: (1) reaching tasks, and (2) via-point tasks. From the results it can be seen that smooth trajectories with bell-shaped velocity profiles emerge as the optima. Furthermore, we show that the velocity profiles of the trajectories are dependent on the temporal synergy basis that is employed. The similarity of the resulting trajectories to experimentally observed human behaviors leads us to hypothesize that a dimensionality reduction principle might underlie motor control.

We introduce our approaches in the following way: In section 2 we first outline the temporal synergy control problem and review dimensionality reduction and system balancing. Subsequently we derive the TSDA and our proposed minimization model of MDC. This is followed by a description of the simulation setup and experiments in section 2.5 and the results in section 3. We then discuss the implications in section 4.

# **2. MATERIALS AND METHODS**

We first introduce some basic formalism to the optimal control problem. Consider the following representation of the neuromechanical dynamics,

$$\mathbf{y}(t) = h(\mathbf{x}, t), \quad \dot{\mathbf{x}} = f(\mathbf{x}, t) + g(\mathbf{x}, \mathbf{u}, t), \tag{1}$$

where $\mathbf{x}(t)$ denotes the state, $\mathbf{u}(t)$ the input, and $\mathbf{y}(t)$ the output. For this system the state-space dimensionality can be described by $\mathbf{x}(t) \in \mathbb{R}^N$, the input by $\mathbf{u}(t) \in \mathbb{R}^{N\_i}$, and the output by $\mathbf{y}(t) \in \mathbb{R}^{N\_o}$; $N\_i$ and $N\_o$ need not be equal to $N$. We utilize a continuous-time deterministic control system description, so $\mathbf{u}$ can be considered to lie in the infinite-dimensional space of continuous functions. Let us define this system by $F(f(\cdot), g(\cdot), h(\cdot))$, where $F$ belongs to a space of sufficiently regular (continuously differentiable) functions.

Although in this paper we consider $\mathbf{u}(t)$ to be input joint torques or actuator forces, the approach is unaffected if muscle activation dynamics are instead incorporated. The aim of control in the system $F$ is to influence the behavior in order to satisfy task requirements. For the scope of this paper, we simply define behavior as the trajectory followed by the system in accomplishing a task. A task $T$ is then denoted by a set of Cartesian constraints that must be obeyed, i.e., by the tuple $C\_T = \{\mathbf{y}\_T(t\_d) = \mathbf{y}\_{Tt\_d}, \dot{\mathbf{x}}\_T(t\_d) = \dot{\mathbf{x}}\_{Tt\_d}\}$. The constraints are specified by a set of boundary conditions on the behavior, such as zero endpoint velocity for reaching, or as a discrete set of via-points to be followed.

A trajectory, one of the many possible unique paths in the task space satisfying all of the task constraints $C\_T$, is also denoted by $T$. For this system, from an engineering perspective, the feedforward control problem is to compute the function (or policy) $\mathbf{u}(t) = f\_{ff}(T, \mathbf{x}(t\_0))$. Let us denote by $\mathbf{u}(t) \in U$ the set of admissible control inputs that satisfy the desired objectives $C\_T$. There may exist multiple solutions for the task, i.e., multiple trajectories, and therefore the cardinality of $U$ can be considered to be greater than 1. This non-uniqueness is the well-known *redundancy problem* of motor control, i.e., there is a non-univocal relationship between observed movements and input actuation (Bernstein, 1967).

Many investigators have suggested that the solution to the redundancy problem arises from minimizing some form of cost function *J*(**x**(*t*), **u**(*t*), *t*)—i.e., an underlying optimization principle to motor control. Typically such cost functions have been justified by citing various biologically relevant factors that impact survival such as energy requirements, accuracy, stability of control etc. (Hogan, 1984; Harris and Wolpert, 1998; Todorov and Jordan, 2002).

The optimal control approach to this problem is typically based on methods such as solutions of the Hamilton-Jacobi-Bellman (HJB) equation or the Pontryagin Minimum Principle (PMP) (Bertsekas, 1995). However, it may not always be possible to obtain analytical solutions: the complexity of plant dynamics and the requirement for accurate dynamic models have been major issues. Also, proponents of optimality in biological motor control do not really address how an organism might autonomously acquire optimal solutions. It is instead implied that some form of motor learning or adaptation at different time scales allows the acquisition of optimal behavior (Wolpert et al., 2011). Several developmental theories, such as Bernstein's three-stage learning model (Bernstein, 1967), have been put forward to explain how this might be autonomously acquired through a process of state-space exploration. In this context, state-of-the-art methods in artificial systems, such as iterative optimal control and the algorithms of reinforcement learning (Sutton and Barto, 1998), have proved to be a popular alternative to analytical optimal control techniques and have found many applications in areas such as robotics (Kober et al., 2013).

Regardless of the actual mechanism of neural learning, for the system in Equation (1), the complexity of control learning is dictated by a number of factors: the dimensionality of the input $N\_i$, the dimensionality of the goal $N\_o$, the temporal complexity of the goal trajectory $\mathbf{y}\_T(t\_d)$, the complexity of the cost function $J(\mathbf{x}(t), \mathbf{u}(t), t)$ and, finally, the dimensionality of the state, $N$. For even moderate-dimensional systems, this represents a serious limitation on the tractability of computing an appropriate control policy. Non-linearities in the functions $f(\cdot)$, $g(\cdot)$, and $h(\cdot)$ can further complicate the problem.

Even from a neuroscientific perspective, most investigations in optimal motor control have focused on relatively simple models approximating real musculo-skeletal structures (Harris, 1998b). However, optimal control models such as the minimum energy, minimum torque change, minimum jerk, and minimum variance models may instead be intractable for an organism confronted with anything more than a moderate number of dimensions. Clearly the redundancy and dimensionality problem is not just a motor neuroscience question but represents a constraint on learning for an organism (Kuppuswamy and Harris, 2013). The famous phrase "*curse of dimensionality*", coined by Bellman (1961) to describe the exponential growth of the search space of discrete optimization problems with increasing dimensionality, seems appropriate in describing this predicament. DR offers an obvious coping strategy by which the tractability of control learning can be ensured. It has therefore been suggested that neural architectures must intrinsically incorporate some form of DR, such as the muscle synergies. The temporal muscle synergy formulation is introduced next within this framework.

### **2.1. TEMPORAL MUSCLE SYNERGY FORMULATION**

Most models of the muscle synergy hypothesis tackle the DoF problem by constraining the space of control inputs to combinations of predefined primitives. The temporal synergy formulation has the advantage of conveniently delineating the spatial task-dependent and temporal task-independent components of synergistic control (Alessandro et al., 2013). Temporal synergies are primarily relevant to locomotor tasks and are a direct example of dimensionality reduction in the control input (Ivanenko et al., 2003, 2004, 2006; Cappellini et al., 2006), with relevance to developmental and evolutionary theories (Ivanenko et al., 2005; Dominici et al., 2011). Chiovetto et al. (2013) tested the equivalence of temporal muscle synergies with the other main formulations of time-invariant and time-varying synergies on a reaching task. The temporal synergy model also has the added advantage of allowing interpretation of the temporal components of the muscle activation occurring at different segments of the movement.

In this formulation, the input **u**(*t*) is constrained in the form of a weighted linear combination of *S* synergies ψ*i*(*t*) represented by,

$$\mathbf{u}(t) = \sum\_{i=1}^{S} w\_i \psi\_i(t),\tag{2}$$

which can be rewritten in matrix notation as $\mathbf{u}(t) = \hat{W}\Psi(t)$, where $\Psi(t) = [\psi\_1(t) \dots \psi\_S(t)]^T$ defines the temporal synergies and the weight matrix $\hat{W} = [w\_1 \dots w\_S]$ contains the linear combinators approximating a particular input signal $\mathbf{u}(t)$. In the reported models, arbitrary phase shifts are also included in the synergies; however, we do not incorporate them into the analysis presented in this paper.

There is a unique $\hat{W}$ for a given $\mathbf{u}(t)$ if the functions $\psi\_1(t), \dots, \psi\_S(t)$ are linearly independent and $\hat{W} \in \mathbb{R}^{N\_i \times S}$, i.e., the synergies form an orthonormal basis set of the space of inputs. The synergies are specified as a task-independent basis spanning the space of inputs, while the appropriate weight matrix is computed in a task-dependent manner.

The control learning problem is to obtain the appropriate weight matrix $\hat{W}\_d$ corresponding to a desired task $\mathbf{y}\_d(t)$. Due to the reduction in dimensionality, the desired solution lies within a space of size $N\_i \times S$. This is a linear space of inputs; learning can therefore be accomplished by a number of tools, and superposition can be utilized to generalize to novel problems. The direct approach for trajectory learning with temporal synergies using an inverse dynamic model can be seen in the top part of the schematic in **Figure 1**.
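As a concrete illustration of Equation (2) and the weight-learning step, the sketch below builds a small Legendre synergy basis, fits a weight matrix by least squares, and reconstructs the input. This is our own illustrative code (Python with NumPy assumed); the function names and the toy input signal are not from the original study.

```python
import numpy as np

def legendre_basis(t, t_d, S):
    """Sample S temporal synergies psi_0..psi_{S-1} as Legendre polynomials,
    time-scaled from [0, t_d] to the natural domain [-1, 1]."""
    tau = (2.0 * t - t_d) / t_d
    return np.stack([np.polynomial.legendre.Legendre.basis(i)(tau)
                     for i in range(S)])

def fit_weights(u, Psi):
    """Least-squares weight matrix W_hat (N_i x S) with u(t) ~= W_hat @ Psi(t)."""
    W_hat, *_ = np.linalg.lstsq(Psi.T, u.T, rcond=None)
    return W_hat.T

t_d = 1.0
t = np.linspace(0.0, t_d, 201)
Psi = legendre_basis(t, t_d, S=4)

# Toy single-input signal lying exactly in the span of P_0..P_3
u = (0.5 - 1.2 * ((2 * t - t_d) / t_d) ** 3)[None, :]
W_hat = fit_weights(u, Psi)
u_rec = W_hat @ Psi  # reconstruction from the synergy combination (Eq. 2)
```

Because the basis is linearly independent, the fitted weight matrix is the unique representation of any input lying in its span, which is the uniqueness property discussed above.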

Despite the reduction in dimensionality of inputs, we contend that the complexity of the optimal motor control problem may not necessarily be reduced simply through reduction of input space dimensionality. For instance, if the desired cost function is a function of the state **x**, the state dimensionality is a bottleneck affecting learnability. Also, the specification of the task might have an important role to play in existing methods of quantifying the dimensionality of synergies (de Rugy et al., 2013); i.e., the number of synergies may be insufficient to ensure optimal control or learning convergence.

In the case of synthetic systems, parameterized control policies in this form (sometimes also called motor primitives; Ijspeert et al., 2013) have been successfully applied in planning and control for robotics. Reinforcement learning approaches such as policy gradients (Peters and Schaal, 2008) offer several methods for iteratively updating policy parameters depending on some predefined behavioral objective. However, in the synthetic context, several *a priori* design choices must be carefully made in order to ensure convergence of the learning within reasonable time-scales in high-dimensional control problems (Kober et al., 2013); DR is one such approach toward rendering optimal control learning tractable. Clearly, if the policy design itself could facilitate DR of a system for a given task, the learning would in turn be naturally facilitated.

The primary question investigated in this paper is therefore: do temporal muscle synergies reduce the state-space dimensionality of the system in performing motor behaviors? Next, dimensionality reduction and system balancing are briefly introduced.

### **2.2. DIMENSIONALITY REDUCTION AND HANKEL SINGULAR VALUES**

From the control engineering viewpoint, the aim of dimensionality reduction is to simplify the input–output dynamics of a system in order to reduce the complexity of simulation and control optimization. Many algorithms have been proposed for model and controller order reduction (Antoulas et al., 2001), including both analytic and computational methods. Consider the state-space model of a system in Equation (1). The DR problem is the synthesis of an equivalent system given by,

$$\tilde{\mathbf{y}}(t) = h'(\mathbf{z}, t), \quad \dot{\mathbf{z}} = f'(\mathbf{z}, t) + g'(\mathbf{z}, \mathbf{u}, t), \tag{3}$$

where $\mathbf{z}(t) \in \mathbb{R}^K$, and typically the dimensionality of the new state variable $K < N$. When driven by input signals $\mathbf{u}(t)$, the output $\tilde{\mathbf{y}}(t)$ of the reduced system is close to $\mathbf{y}(t)$ for some measure of similarity. The dimensionality of the inputs and outputs remains unaffected by the reduction.

We seek a quantification of DR in a system instead of simply reducing it to the form of Equation (3). Therefore, we define the reduced dimensionality of a system by the operator *D*,

$$\mathcal{D}(\mathcal{F}) = D,\tag{4}$$

where *D* ∈ Z+, the space of positive integers. For the system defined in Equation (1), 1 ≤ *D* ≤ *N* for any given measure of dimensionality, or reduction algorithm. Obviously, *D* = *K* for the reduction leading to the system in Equation (3).

In order to achieve this kind of a reduction, the commonly used approach is to compute a projection of the full dimensional state into a lower dimensional subspace. This is defined as a mapping *W*, such that, **z** = *W***x**. Various methods exist for computation of an appropriate *W*, such that certain conditions are met in the input, state and output relationship. We utilize the well known method of system balancing (Moore, 1981) due to its relevance for control and stable numerical properties. System balancing also offers bounds on the approximation errors (Gugercin and Antoulas, 2004) which is crucial for robust controller design.

Through system balancing, we seek to rotate the system coordinates (i.e., the state-space) in order to balance the controllability (difficulty of reaching a state) and observability (difficulty of observing a state) of the system (Skogestad and Postlethwaite, 1996). This process reorganizes the system by ranking the importance of each of the state variables using the Hankel Singular Value (HSV) measure. The HSVs are defined as the square roots of the eigenvalues of the product of the controllability ($\mathcal{P}$) and observability ($\mathcal{Q}$) Gramians, measures computed on the dynamics of the system. For a Linear Time Invariant (LTI) system in the form of Equation (1), defined by the matrices in $f(\mathbf{x}, t) = A\mathbf{x}(t)$, $g(\mathbf{x}, \mathbf{u}, t) = B\mathbf{u}(t)$, and $h(\mathbf{x}, t) = C\mathbf{x}(t)$, analytical formulations exist for the Gramians, defined by,


$$\mathcal{P} = \int\_0^\infty e^{At} B B^T e^{A^T t} \mathrm{d}t, \quad \mathcal{Q} = \int\_0^\infty e^{A^T t} C^T C e^{At} \mathrm{d}t. \tag{5}$$

For non-linear systems, there is no analytical solution but instead *Empirical Gramians* may be computed using datasets of system behavior (Lall and Marsden, 2002).

First the system is perturbed in $r$ different (input) directions, defined by the set $\mathcal{T}^{n\_i} = \{T\_1, \dots, T\_r\}$, where $T\_i T\_i^T = I$, $T\_i \in \mathbb{R}^{n\_i \times n\_i}$, $i = 1 \dots r$, at $s$ different sizes of perturbation in each direction, defined by the set $\mathcal{M} = \{c\_1, \dots, c\_s\}$, where $c\_i \in \mathbb{R}$, $c\_i > 0$, $i = 1 \dots s$, across all the $n\_i$ inputs and across all $n$ states of the system, defined by the set of unit vectors $E^n = \{\mathbf{e}\_1, \dots, \mathbf{e}\_n\}$. The empirical Gramians are then obtained from the resulting state trajectories as,

$$
\hat{\mathcal{P}} = \sum\_{l=1}^{r} \sum\_{m=1}^{s} \sum\_{i=1}^{n\_i} \frac{1}{rsc\_m^2} \int\_0^{\infty} \Phi^{ilm}(t)\,\mathrm{d}t,
$$

$$
\hat{\mathcal{Q}} = \sum\_{l=1}^{r} \sum\_{m=1}^{s} \frac{1}{rsc\_m^2} \int\_0^{\infty} T\_l \Upsilon^{lm}(t) T\_l^T \,\mathrm{d}t,\tag{6}
$$

where, for the controllability Gramian $\hat{\mathcal{P}}$, $\Phi^{ilm}(t) \in \mathbb{R}^{n \times n}$ is given by $\Phi^{ilm}(t) = (\mathbf{x}^{ilm}(t) - \mathbf{x}^{ilm}\_0)(\mathbf{x}^{ilm}(t) - \mathbf{x}^{ilm}\_0)^T$, with $\mathbf{x}^{ilm}(t)$ being the state of the non-linear system corresponding to the impulse input $\mathbf{u}(t) = c\_m T\_l \mathbf{e}\_i \delta(t)$; and, for the observability Gramian $\hat{\mathcal{Q}}$, $\Upsilon^{lm}(t) \in \mathbb{R}^{n \times n}$ is given elementwise by $\Upsilon^{lm}\_{ij}(t) = (\mathbf{y}^{ilm}(t) - \mathbf{y}^{ilm}\_0)^T (\mathbf{y}^{jlm}(t) - \mathbf{y}^{jlm}\_0)$, where $\mathbf{y}^{ilm}(t)$ is the output of the system for the initial condition $\mathbf{x}(0) = c\_m T\_l \mathbf{e}\_i + \mathbf{x}\_0$, and $\mathbf{y}^{ilm}\_0$ is the corresponding steady-state output. A detailed description of non-linear balanced model reduction utilizing the empirical Gramian method can be found in Hahn and Edgar (2002).
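To make the empirical construction of Equation (6) concrete, the sketch below computes an empirical controllability Gramian for a stable linear system, where the result can be checked against the analytic Gramian of Equation (5). This is our own illustrative code (Python with NumPy/SciPy assumed): we take a single direction set $T_1 = I$ and a single perturbation size, i.e., $r = s = 1$.

```python
import numpy as np
from scipy.linalg import expm, solve_continuous_lyapunov

def empirical_ctrl_gramian(A, B, c=1.0, t_end=20.0, dt=0.002):
    """Empirical controllability Gramian for a stable linear system, using a
    single direction set T_1 = I and a single perturbation size c (r = s = 1).
    An impulse u(t) = c * e_i * delta(t) kicks the state to x(0+) = c * B e_i;
    Phi(t) = x(t) x(t)^T is then integrated over the free decay."""
    n, n_i = A.shape[0], B.shape[1]
    steps = int(t_end / dt)
    prop = expm(A * dt)                   # one-step propagator e^{A dt}
    P_hat = np.zeros((n, n))
    for i in range(n_i):
        x = c * B[:, i]                   # state just after the impulse
        for _ in range(steps):
            P_hat += np.outer(x, x) * dt  # rectangle-rule time integration
            x = prop @ x                  # free response x(t + dt)
    return P_hat / c**2                   # normalization 1/(r * s * c_m^2)

# One planar axis of the tethered mass of Eq. (16): state [phi, phi_dot]
A = np.array([[0.0, 1.0], [-6.0, -2.0]])
B = np.array([[0.0], [1.0]])

P_emp = empirical_ctrl_gramian(A, B)
P_true = solve_continuous_lyapunov(A, -B @ B.T)  # analytic Gramian of Eq. (5)
```

For linear systems the empirical Gramian converges to the analytic one as the integration step shrinks; for non-linear systems only the empirical construction is available.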

These Gramians allow quantification of how controllable and how observable the state variables are; taken together they measure the "importance" of individual state variables and can thus be used for a dimensionality reduction algorithm. For both linear and non-linear systems, the Hankel Singular Values (HSV) of a system σ*HSV* are then obtained as,

$$
\sigma\_{HSV} = \sqrt{\lambda(\mathcal{P}\mathcal{Q})},\tag{7}
$$

where the $\lambda$ operator yields the eigenvalues of the product matrix, and the resulting set $\sigma\_{HSV} = [\sigma\_1 \dots \sigma\_N]$ contains the HSVs corresponding to each state variable.

The HSVs can be viewed as a score of the control "energy" of the state variables. Thus, to reduce dimensionality it is sufficient to eliminate the states with low HSV magnitude. This process can be automated by first obtaining a rotation of the system, $\hat{\mathbf{x}} = T\mathbf{x}$, which reorders the states in decreasing magnitude of HSV, i.e., system balancing. This results in a transformation of the system to a basis in which the transformed states that are easiest to reach (control) are simultaneously easiest to measure (observe). Computationally efficient methods exist for computing the balancing transform $T$ of linear systems (Laub et al., 1987). It is then possible to truncate the resulting system to the first $K$ states; hence the method is called balanced truncation (Moore, 1981). The choice of $K$ is typically dependent on the requirements of the controller design and is usually fixed after examination of the HSV magnitudes (Hahn and Edgar, 2002).
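For an LTI system, the balancing transform can be computed with the standard square-root algorithm; the sketch below (our own illustrative code, Python with SciPy assumed) recovers the HSVs of Equation (7) as singular values and verifies that both transformed Gramians become diag(σ).

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, cholesky, svd

def balance(A, B, C):
    """Square-root balancing of a stable LTI system: returns the Hankel
    singular values and a transform T (x_hat = T x) in which both Gramians
    become diag(hsv)."""
    P = solve_continuous_lyapunov(A, -B @ B.T)    # controllability Gramian
    Q = solve_continuous_lyapunov(A.T, -C.T @ C)  # observability Gramian
    Lp = cholesky(P, lower=True)                  # P = Lp @ Lp.T
    Lq = cholesky(Q, lower=True)                  # Q = Lq @ Lq.T
    U, s, Vt = svd(Lq.T @ Lp)                     # s are the HSVs, descending
    T = np.diag(s ** -0.5) @ U.T @ Lq.T           # balancing transform
    return s, T, P, Q

# One planar axis of the tethered mass with position output
A = np.array([[0.0, 1.0], [-6.0, -2.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])

hsv, T, P, Q = balance(A, B, C)
Ti = np.linalg.inv(T)
P_bal = T @ P @ T.T    # balanced Gramians: both equal diag(hsv),
Q_bal = Ti.T @ Q @ Ti  # so states can be truncated by HSV rank
```

Truncating the balanced state to the first $K$ coordinates is exactly the balanced truncation described above; the discarded states are those that are both hard to reach and hard to observe.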

If the HSVs are normalized by using the sum, the DR is directly given by,

$$\mathcal{D}\_{HSV}(\mathcal{F}) = \begin{cases} K & \text{if there exists } \sigma\_K \le t\_r, \\ 1 & \text{otherwise,} \end{cases} \tag{8}$$

where the threshold $t\_r \in \mathbb{R}^+$, $t\_r \le 1$, and the resulting $K \in \mathbb{Z}^+$, with $1 < K \le N$. Clearly, this form of DR is dependent on the choice of threshold. In control engineering applications, the threshold is chosen on the basis of careful observation of the system (Antoulas et al., 2001). Our approach, presented next, includes a method that simplifies the choice of this threshold.

### **2.3. TRAJECTORY SPECIFIC DIMENSIONALITY ANALYSIS (TSDA)**

Through system balancing we can quantify the DR of a system. This is a task-independent quantification and depends on the system properties, e.g., the passive mechanical properties. However, if DR is to be utilized in order to facilitate learning and real-time control of various tasks, the task-dependent reduction of the state-space must instead be considered.

**Figure 1** depicts the stages of TSDA computation. The first step is to evaluate the constraints on the system dynamics resulting from the constraints placed on the input due to usage of temporal muscle synergies. The system in Equation (1) can now be represented by,

$$\mathbf{y}(t) = h(\mathbf{x}, t), \quad \dot{\mathbf{x}} = f(\mathbf{x}, t) + \hat{\mathbf{g}}(\mathbf{x}, \Psi, t), \tag{9}$$

We term this a constrained reformulation of the system dynamics, in which the inputs are the temporal synergies $\Psi(t)$; these can be viewed as signals which control the onset and termination of the movements for a task. For the duration of the behavior, the dynamics are described by Equation (9) due to the constrained input function $\hat{g}(\cdot)$, where,

$$
\hat{g}(\mathbf{x}, \Psi, t) = g(\mathbf{x}, \hat{W}\Psi, t). \tag{10}
$$

It must be emphasized that the constrained reformulation only describes a *virtual* system dynamics for the duration of the movement when actuated by the synergistic input $\Psi(t)$. The state-space, however, has not changed; i.e., the state variable $\mathbf{x}$ of the constrained-reformulated system is the same as that of the original system. Let us denote the system of Equation (9) by $\hat{F}(f(\cdot), \hat{g}(\cdot), h(\cdot))$.

Clearly, $\hat{F}$ is unique to a given trajectory and a given synergy basis set, since it incorporates the weight matrix $\hat{W}$ corresponding to a trajectory $T$ and uses input signals in the form of temporal synergies. Therefore $\hat{F}$ can be considered a trajectory-specific constrained reformulation of the dynamics. The trajectory-specific dimensionality is then given by,

$$\mathcal{D}(\hat{\mathcal{F}}) = D\_T,\tag{11}$$

If $\hat{W}$ is computed to solve a given task $T$ uniquely, Equation (11) gives the DR of the equivalent trajectory that satisfies the task requirements. The TSDA measure can be contrasted with the intrinsic DR of the system in Equation (4), which is task-independent.

In this formulation, although any kind of DR algorithm can be utilized for computing $D\_T$, we use the system balancing and HSV-based approach due to its relevance for the control problem. HSVs measure the importance of each of the state variables of the system $\hat{F}$ for both the outputs (the task) and the inputs (synergy patterns). Thus they quantify a DR of behavior that depends on the kind of synergy used and the kind of task being performed.

In order to compute the DR, it is desirable to reduce the dependence on a careful choice of the threshold in the HSV measure of Equation (8). Depending on the structure of the constrained-reformulated system, it can be expected that HSVs computed for different trajectories may be of completely different orders of magnitude. Even if normalization by the sum of the HSVs is employed, this may complicate the choice of a threshold for comparing trajectories. Furthermore, this could limit the applicability of the method in comparing different kinds of temporal synergies in reducing the dimensionality.

In order to address this issue in our approach, we simply normalize the HSVs after utilizing a cumulative sum. First the individual HSVs are redefined by,

$$\tilde{\sigma}\_i = \sum\_{j=1}^{i} \sigma\_j \bigg/ \sum\_{l=1}^{N} \sigma\_l,\tag{12}$$

therefore, the vector $\tilde{\sigma}\_{HSV}$ is the normalized cumulative sum of $\sigma\_{HSV}$. This process ensures that $\tilde{\sigma}\_N = 1$. Thus, independent of the basis or the weight matrix magnitude, the threshold can be chosen to lie in the interval $0 < t\_r < 1$. We later discuss the implications of the choice of threshold magnitude on motor skill development.

Using the threshold normalized HSVs, the trajectory specific dimensionality is therefore given by,

$$\mathcal{D}\_T(\hat{\mathcal{F}}) = \begin{cases} K & \text{if there exists } \tilde{\sigma}\_K \le t\_r, \\ 1 & \text{otherwise.} \end{cases} \tag{13}$$

The TSDA can therefore be computed for both linear and nonlinear systems. It must also be noted that through computation of the TSDA, an equivalent trajectory-specific reduced dimensional model of the behavior is also computed as described in **Figure 1**. We now extend these methods in order to examine the implications of dimensionality minimization, as described next.
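The normalization and thresholding steps of Equations (12)-(13) can be sketched as a small function; this is our own illustrative code (Python with NumPy assumed), and we read the threshold rule as "the smallest $K$ whose cumulative share reaches $t_r$", which is one plausible interpretation of the rule.

```python
import numpy as np

def trajectory_dimensionality(sigma_hsv, t_r):
    """D_T in the spirit of Eqs. (12)-(13): normalize the HSVs by a cumulative
    sum and return the smallest K whose cumulative share reaches t_r."""
    sigma = np.sort(np.asarray(sigma_hsv, dtype=float))[::-1]  # descending
    sigma_tilde = np.cumsum(sigma) / np.sum(sigma)  # Eq. (12); last entry is 1
    K = int(np.searchsorted(sigma_tilde, t_r) + 1)
    return max(K, 1), sigma_tilde
```

For example, the HSV set [10, 1, 0.1, 0.01] reduces to a single dimension at $t_r = 0.9$ but requires two dimensions at $t_r = 0.95$, illustrating how the threshold trades accuracy against dimensionality.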

# **2.4. MINIMUM DIMENSIONAL CONTROL (MDC)**

The objective of this paper is to test the supposition that temporal muscle synergies lead to a reduction of the state-space dimensionality. Through the method developed in the previous section, we can compare various trajectories that satisfy the task requirements in terms of the reduction in dimensionality. We now examine the consequence of minimizing this dimensionality for a given task and a given synergy basis. We define the minimization problem as follows.

As described earlier, for an orthonormal basis set of temporal synergies $\Psi(t)$, each weight matrix $\hat{W}$ corresponds to a unique trajectory in state-space (for the same initial conditions of the dynamical system). Therefore the problem is posed as a constrained minimization identifying the optimal weight matrix $\hat{W}^{\*}\_T$ that minimizes a dimensionality performance index $J(\mathcal{D}\_T)$ while satisfying the task constraints $C\_T$:

$$
\begin{aligned}
\hat{W}\_T^{\*} = \underset{\hat{W}\_T}{\operatorname{argmin}} & \quad J(\mathcal{D}\_T), \\
\text{subject to} & \quad \dot{\mathbf{x}} = f(\mathbf{x}, t) + g(\mathbf{x}, \mathbf{u}, t), \\
& \quad \mathbf{y}\_T(t\_d) = \mathbf{y}\_{Tt\_d}, \quad \dot{\mathbf{x}}\_T(t\_d) = \dot{\mathbf{x}}\_{Tt\_d},
\end{aligned}
\tag{14}
$$

where the task is specified by a set of task-space and state-space constraints. We term the solution to this minimization problem as Minimum Dimensional Control (MDC) as depicted in **Figure 1**. The key to this approach is the specification of an appropriate performance index.

In order to generalize our approach to different kinds of physical systems, a computational (numerical) solution is ideally sought. The desired properties of the performance index $J(\mathcal{D}\_T)$ are therefore that it be continuous and computationally simple for any kind of physical system $\hat{F}$.

From the definition of the normalized HSVs in Equation (12), it can be seen that $\tilde{\sigma}$ is a positive, real, bounded, and ordered vector of magnitudes. Also, by definition, the difference between adjacent cumulative HSVs, given by $\delta\_i = \tilde{\sigma}\_{i+1} - \tilde{\sigma}\_i$, monotonically decreases toward 0. This implies that the crucial determining factor for the minimum reduced dimensionality $K$ is the magnitude of the second cumulative HSV $\tilde{\sigma}\_2$: the magnitudes of all subsequent cumulative HSVs will be larger, and the first magnitude $\tilde{\sigma}\_1$ is irrelevant for the reduction since $D\_T \ge 1$.

For any convenient choice of threshold $t\_r$, a large magnitude of $\tilde{\sigma}\_2$ ensures that $K$ is minimized, since all subsequent values $(\tilde{\sigma}\_2, \dots, \tilde{\sigma}\_N)$ lie in the interval $[\tilde{\sigma}\_2, 1]$. Effectively, increasing $\tilde{\sigma}\_2$ is equivalent to increasing the range of values of $t\_r$ that result in a reduction to a subspace of dimensionality 1. Clearly, $\tilde{\sigma}\_2$ is the critical magnitude determining the reduction in dimensionality.

Based on this rationale the performance index we propose for the MDC is,

$$J(\mathcal{D}\_T) = \mathcal{S}\_F(1 - \tilde{\sigma}\_2),\tag{15}$$

where $S\_F$ is a positive rational scale factor. Computationally, the minimization can be carried out using any convenient numerical optimization algorithm. Since the obtained weight matrix $\hat{W}^{\*}\_T$ is specific to a given task, a given synergy basis set and a given dynamical system, the obtained optimal trajectories are similarly system-, task- and synergy-specific. Despite these conditions, as seen later in the results, invariant characteristics similar to human movements emerge as the optima on the tested linear and non-linear systems. An important consequence of deriving the MDC using the system balancing method is that the approach automatically yields a reduced-dimensional dynamic model corresponding to the minimum dimensional trajectory. This is therefore a task-specific reduced-dimensional model, as depicted in the lower portion of **Figure 1**.
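The performance index of Equation (15) itself is a one-liner once the normalized cumulative HSVs are available; the sketch below is our own illustrative code (Python with NumPy assumed; the function name is hypothetical).

```python
import numpy as np

def mdc_cost(sigma_hsv, scale=1.0):
    """Performance index of Eq. (15): J = S_F * (1 - sigma_tilde_2).
    A sigma_tilde_2 near 1 means the trajectory's input-output energy is
    concentrated in a single balanced state."""
    sigma = np.sort(np.asarray(sigma_hsv, dtype=float))[::-1]
    sigma_tilde = np.cumsum(sigma) / np.sum(sigma)  # normalized cumulative HSVs
    return scale * (1.0 - sigma_tilde[1])
```

In the full MDC, this index would be evaluated inside a numerical optimizer over the weight matrix, with the HSVs of the constrained reformulation recomputed for each candidate $\hat{W}$.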

We hypothesize that the MDC trajectories will lower the difficulty of task learning and optimization. This is particularly relevant for the case of adaptive control, when the dynamics of the system changes with time and optimizing schemes need to keep track of changes, i.e., necessitating a cost on the number of dimensions. The MDC essentially allows task-specific adaptation which can gradually change in a manner mirroring development (Berthier et al., 1999).

It must be noted that MDC itself might be susceptible to the curse of dimensionality and is not meant to explain the neural instantiation of control signals for real-time task planning and control. Instead we propose that it is a model for an optimal mechanism underlying trajectory planning in order to overcome the limitations imposed on the learnability. MDC thus represents a bridge between the muscle synergy hypothesis and the optimal motor control models of redundancy resolution.

# **2.5. SIMULATION SETUP**

The experiments were performed on two kinds of simulated systems, (1) the linear tethered mass, and (2) a non-linear compliant kinematic chain.

# *2.5.1. Tethered mass system*

This system consists of a point mass constrained to move in a 2D plane, as seen in **Figure 2A**. It is "tethered" to an origin by weak passive forces using linear springs and is subject to visco-elastic damping. The system can be actuated by independent forces in two orthogonal directions, and the output describes the position in the 2D space relative to the origin. The dynamics of this system are described by,

$$
\ddot{\boldsymbol{\phi}} = -K\boldsymbol{\phi} - C\dot{\boldsymbol{\phi}} + \mathbf{F}\_u, \tag{16}
$$

where $\boldsymbol{\phi}(t) = [\phi\_x(t), \phi\_y(t)]^T$ is the position of the mass in space, $K$ is a stiffness matrix, $C$ is a damping matrix and $\mathbf{F}\_u(t) = [F\_{u\_x}, F\_{u\_y}]^T$ are orthogonal input forces actuating the system. The simulation parameters were chosen as $C = 2I$ N/(m/s) and $K = 6I$ N/m.

The system can be considered a simplified analog of the oculomotor system. It describes the eye orb dynamics without taking torsional forces into consideration and approximates the passive effects of the orbital tissue. The output can be considered as the displacement angles in the horizontal and vertical directions (in radians), since linear approximations of orb movements have been shown to be valid in the range of ±π/6 radians (Bahill et al., 1980).
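The state-space matrices of Equation (16) can be assembled directly; the sketch below is our own illustrative code (Python with NumPy assumed), taking unit mass (the setup does not state a mass value) together with the quoted $K = 6I$ and $C = 2I$, and checking passive stability.

```python
import numpy as np

# State-space form of Eq. (16) with x = [phi_x, phi_y, d(phi_x)/dt, d(phi_y)/dt]^T,
# input u = [F_ux, F_uy]^T and position output; K = 6I N/m, C = 2I N/(m/s),
# and (our assumption) unit mass.
K_stiff, C_damp = 6.0, 2.0
A = np.block([
    [np.zeros((2, 2)), np.eye(2)],
    [-K_stiff * np.eye(2), -C_damp * np.eye(2)],
])
B = np.vstack([np.zeros((2, 2)), np.eye(2)])
C = np.hstack([np.eye(2), np.zeros((2, 2))])

# Passive stability: every eigenvalue of A has negative real part
eigs = np.linalg.eigvals(A)
```

Stability of the passive dynamics matters here because the Gramian integrals of Equation (5), and the empirical balancing procedure, are only defined for stable systems.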

# *2.5.2. Compliant kinematic chain*

This system is a two-link planar kinematic chain with passive joint compliance as seen in **Figure 2B**. Actuation is applied through the joint torques. The dynamics are described by Spong and Vidyasagar (2008),


$$\ddot{\boldsymbol{\theta}} = M(\boldsymbol{\theta})^{-1} \left[ -N(\boldsymbol{\theta}, \dot{\boldsymbol{\theta}})\dot{\boldsymbol{\theta}} - K(\boldsymbol{\theta} - \boldsymbol{\theta}\_0) + \boldsymbol{\tau} \right],\tag{17}$$

where the state is described by θ(*t*) = [θ<sub>1</sub>(*t*), θ<sub>2</sub>(*t*)]<sup>T</sup>, *M*(θ) is the mass-inertia matrix of the system, *N*(θ, θ̇) is the Coriolis matrix, *K* is the joint stiffness matrix, and the joint rest positions are given by θ<sub>0</sub>. The system is actuated by the torques τ(*t*) = [τ<sub>1</sub>(*t*), τ<sub>2</sub>(*t*)]<sup>T</sup> at the two joints. The simulation parameters were chosen as *m*<sub>1</sub> = 0.75 kg, *m*<sub>2</sub> = 0.5 kg, *l*<sub>1</sub> = 0.4 m, *l*<sub>2</sub> = 0.4 m. The applied torques are scaled by a factor of 1.88 at joint 1 and 0.45 at joint 2. A joint stiffness of 0.6 Nm/rad and a viscous joint friction of 0.15 Nm·s/rad are used at both joints, with rest angles fixed at θ(*t*<sub>0</sub>) = [−π/16, π/8]<sup>T</sup>. The output of the system is the position **P** = [P<sub>x</sub>(*t*), P<sub>y</sub>(*t*)]<sup>T</sup> in 2*D* Cartesian space, which is related to the joint angles through the forward kinematics.

This system describes the behavior of vertebrate limbs. The passive joint compliance not only adds biological realism, but also renders the system stable, which is a necessary condition for empirical balancing.
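The output map of this system can be sketched as the standard two-link planar forward kinematics, assuming the usual convention that the second joint angle is measured relative to the first link; this is an illustrative Python stand-in, not the paper's code.

```python
import numpy as np

# Link lengths from the text
l1, l2 = 0.4, 0.4

def forward_kinematics(theta1, theta2):
    """Endpoint position P = [Px, Py] for joint angles (theta2 relative to link 1)."""
    px = l1 * np.cos(theta1) + l2 * np.cos(theta1 + theta2)
    py = l1 * np.sin(theta1) + l2 * np.sin(theta1 + theta2)
    return np.array([px, py])

# Fully extended arm along the x-axis reaches l1 + l2 = 0.8 m.
print(forward_kinematics(0.0, 0.0))  # -> [0.8, 0.0]
```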

# *2.5.3. Synergy bases*

Two kinds of idealized temporal synergies composed of orthonormal basis functions are tested: (a) the Legendre polynomial basis (*l*(*t*)) and (b) the Fourier basis (*f*(*t*)). Both are well-known approximators used for curve fitting, and they simplify the weight learning for the analysis. They are represented by,

$$\Psi_l(t) = \sum_{i=0}^{n} a_i P_i\big((2t - t_d)/t_d\big),$$

$$\Psi_f(t) = a_0 + \sum_{i=1}^{n} \left[ a_i \sin(i\omega t) + b_i \cos(i\omega t) \right], \qquad (18)$$

respectively, where *t<sub>d</sub>* is the duration of the movement and the weights are given by *W*ˆ<sub>l</sub> = [*a*<sub>0</sub>, ..., *a<sub>n</sub>*] and *W*ˆ<sub>f</sub> = [*a*<sub>0</sub>, *a*<sub>1</sub>, ..., *a<sub>n</sub>*, *b*<sub>1</sub>, ..., *b<sub>n</sub>*]. The Legendre polynomials were computed using the standard Rodrigues formula. Since the polynomials are defined on [−1, +1], they are time-scaled to accommodate the entire duration of the intended movement.

These synergies have another convenient property: their magnitudes are bounded, i.e., abs(Ψ(*t*)) ≤ 1. This property is essential for non-linear TSDA using empirical balancing, since the method involves perturbing the inputs using unit impulse signals (Lall and Marsden, 2002). Since the TSDA treats the synergies as input signals, this ensures that a unity input perturbation can be applied.
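A minimal sketch of constructing such a time-scaled Legendre basis (in Python with NumPy, as a stand-in for the paper's MATLAB implementation) and verifying the boundedness property:

```python
import numpy as np
from numpy.polynomial.legendre import Legendre

def legendre_basis(order, t_d, n_samples=200):
    """Sampled Legendre synergy basis, time-scaled from [-1, 1] to [0, t_d]."""
    t = np.linspace(0.0, t_d, n_samples)
    s = (2.0 * t - t_d) / t_d            # map [0, t_d] -> [-1, 1]
    return np.stack([Legendre.basis(i)(s) for i in range(order + 1)])

B = legendre_basis(order=4, t_d=2.5)
# Legendre polynomials are bounded by 1 on [-1, 1], which is the property
# needed for unit-impulse perturbations in empirical balancing.
print(np.abs(B).max())  # -> 1.0
```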

# *2.5.4. Simulation framework*

The simulation was performed in MATLAB (2012). The equations were integrated using the *ode15s* solver in the ODE package with an absolute tolerance of 5 × 10<sup>−2</sup> and a relative tolerance of 1 × 10<sup>−3</sup>. Model reduction routines developed in Hahn and Edgar (2002) and Sun and Hahn (2005) were used for the non-linear system balancing. The weights *W*ˆ for the TSDA benchmark tasks and the MDC initialization were acquired using a least-squares method. The numerical optimization of MDC was carried out using the *fmincon* routine, with the *interior point* algorithm (Waltz et al., 2006) for the linear MDC and the *active set* algorithm (Gill et al., 1981) for the non-linear MDC.
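The least-squares acquisition of the weights can be sketched as follows; the sampled basis and the target control signal here are hypothetical stand-ins (the paper's signals come from inverse dynamics):

```python
import numpy as np

# Hypothetical sampled synergy basis Psi (S x T) and a desired control
# signal u_d (one input channel, T samples) to be approximated.
T = 200
t = np.linspace(0.0, 1.0, T)
Psi = np.stack([t**i for i in range(5)])   # stand-in polynomial basis
u_d = 1.5 * t**2 - 0.5 * t**4              # stand-in target control signal

# Least-squares weights: minimize || Psi^T w - u_d ||_2
w, *_ = np.linalg.lstsq(Psi.T, u_d, rcond=None)
print(np.allclose(Psi.T @ w, u_d))  # exact here, since u_d lies in the span
```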

# **3. RESULTS**

The results of the experiments on the test systems using TSDA and MDC are presented in this section.

### **3.1. TSDA ON THE TETHERED MASS**

A set of four benchmark trajectories, denoted by *T<sub>i</sub>* = φ<sub>i</sub>(*t*), were compared using TSDA for the tethered mass system. Each trajectory described a motion from the origin to a target output position of [0.5, 0.5], each thus representing a solution to the reaching task. The trajectories, seen in **Figure 3A**, were specified by via-points in Cartesian space, and a cubic-spline fit was computed with smoothness conditions enforced at the boundaries (2nd order boundary conditions set to 0). The weight matrix *W*ˆ<sub>i</sub> for the control of each of the trajectories was computed using a least-squares fit of the corresponding inverse dynamic control signals *u<sub>di</sub>*(*t*). Two kinds of synergies were compared: Fourier and Legendre polynomial bases of order 4 each, as seen in **Figures 3B,C**. In the case of the Fourier basis temporal synergy, 9 components are necessary, corresponding to the sinusoidal and co-sinusoidal parts of the Fourier basis, as seen in **Figure 3B**.

The result of the weight training can be seen in the Hinton diagrams of the weight matrices in **Figures 4A,B**. The weights, represented by the size of the shaded ellipses, clearly capture the temporal components of each of the trajectories. However, some trajectories are easier to interpret with one kind of synergy than with the other. For instance, while the weights corresponding to trajectory *T*<sub>1</sub> are identical in both rows, in the case of *T*<sub>3</sub> mirroring of weights across the inputs is seen only for the Fourier basis synergy.

For each trajectory, the constrained-reformulated system was constructed and the corresponding reduction, denoted by the vector *K<sub>T</sub>*, was computed using the linear system balancing procedure. The cumulative normalized HSVs of the constrained-reformulated system can be seen in **Figures 4C,D**. As noted earlier, the final HSV (σ̃<sub>4</sub> = 1) for all trajectories, i.e., the last bar in each plot, is always unity in magnitude. The magnitudes of the other HSVs reflect the task, the trajectory, and the synergy choice.
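For a stable LTI system, HSVs can be computed from the controllability and observability Gramians obtained by solving Lyapunov equations. The sketch below illustrates this on the plain (unconstrained) tethered-mass state-space; the paper's TSDA instead balances the trajectory-specific constrained reformulation, so these values are only illustrative:

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# Tethered-mass state-space (state [phi, phi_dot]), K = 6I, C = 2I
I2, Z2 = np.eye(2), np.zeros((2, 2))
A = np.block([[Z2, I2], [-6.0 * I2, -2.0 * I2]])
B = np.vstack([Z2, I2])          # two orthogonal input forces
Cm = np.hstack([I2, Z2])         # output: position only

# Controllability and observability Gramians from the Lyapunov equations
Wc = solve_continuous_lyapunov(A, -B @ B.T)
Wo = solve_continuous_lyapunov(A.T, -Cm.T @ Cm)

# Hankel singular values: square roots of the eigenvalues of Wc * Wo
hsv = np.sort(np.sqrt(np.linalg.eigvals(Wc @ Wo).real))[::-1]
print(hsv)
```

Because the x and y subsystems are identical and decoupled, the HSVs come in equal pairs for this system.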

For this experiment, a threshold value of *t<sub>r</sub>* = 0.975 was utilized to compute the DR (black solid lines in **Figures 4C,D**). It can be seen that the straight-line Cartesian trajectory seemingly has the minimum dimensionality of *K* = 1, independent of the choice of threshold magnitude. For the chosen threshold, the DR for each of the trajectories was then obtained as *K<sub>fourier</sub>* = [1, 3, 2, 3] and *K<sub>legendre</sub>* = [1, 3, 3, 3]. In the case of TSDA on the Legendre polynomial basis, it can be seen that tasks *T*<sub>2</sub>, *T*<sub>3</sub>, and *T*<sub>4</sub> are nearly identical in HSV magnitudes, barring minor differences in the 3rd HSV.
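The threshold rule for obtaining the DR from the cumulative normalized HSVs can be sketched as below; the example spectra are hypothetical, not the paper's values:

```python
import numpy as np

def dimensionality(hsv, tr=0.975):
    """Smallest k whose cumulative normalized HSV sum reaches threshold tr."""
    cum = np.cumsum(hsv) / np.sum(hsv)
    return int(np.searchsorted(cum, tr) + 1)

# Hypothetical normalized HSV spectra
print(dimensionality(np.array([0.98, 0.015, 0.004, 0.001])))  # -> 1
print(dimensionality(np.array([0.60, 0.25, 0.13, 0.02])))     # -> 3
```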

The obtained dimensionality on the straight-line trajectories implies that it could be a candidate for the minimum dimensional solution to the reaching tasks. This is investigated using the MDC framework, as described next.


# **3.2. MDC ON THE TETHERED MASS**

In this experiment, the MDC was synthesized for the tethered mass system for a point-to-point reaching task, i.e., with zero velocity at the boundaries. The constrained numerical optimization computed the weight matrix for the synergies which minimize the cost in Equation (15).

For the optimization, the initial weights were set using a cubic-spline interpolate of a trajectory fitting the boundary constraints (φ(*t<sub>d</sub>*) = [0.5, 0.5]<sup>T</sup>, φ̇(*t<sub>d</sub>*) = [0, 0]<sup>T</sup>). A constraint tolerance of 10<sup>−2</sup> was used as a terminal criterion for the minimization. In each of the cases, a local minimum was achieved when using the interior-point algorithm for minimization.


The trajectories resulting from MDC can be seen in **Figure 5** for the Legendre and Fourier basis synergies. Smooth sigmoidal trajectories were obtained as the optimal reaching solution in both cases for multiple movement durations. The terminal cost of the optimization was obtained as the 2nd HSV, σ̃<sub>2</sub> ≈ 0, for all cases. The time-normalized velocity profiles, as seen in **Figure 5B**, are bell-shaped.

Interestingly, from the peak velocities in **Figure 5B**, it can be seen that while the Legendre polynomial synergies correspond closely to the minimum jerk criterion (Hogan, 1984), the Fourier basis synergy result was a close match with the minimum acceleration criterion (Ben-Itzhak and Karniel, 2008) (represented by the dashed black lines in both cases). There were other minor differences between the trajectories for each kind of synergy. Nevertheless, in both cases the peak velocity of the trajectory scales linearly with the movement duration. The results show that the MDC model computes a synergy-specific minimum dimensional trajectory for a given task. It must, however, be noted that MDC does not guarantee symmetric bell-shaped velocity profiles; this is a consequence of the boundary conditions specified and the initialization of the weights for the constrained minimization. Nevertheless, it can be seen that the minimum dimensional solution for the reaching task corresponds to a reduction to a one-dimensional system, independent of the synergy basis chosen.
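For reference, the two criteria being compared predict different peak velocities for a unit-amplitude, unit-duration movement: 15/8 = 1.875 for minimum jerk and 3/2 = 1.5 for minimum acceleration (both scale as A/*t<sub>d</sub>* for amplitude A and duration *t<sub>d</sub>*). A short sketch, assuming the standard quintic and cubic profiles with zero boundary velocities:

```python
import numpy as np

def min_jerk(s):
    """Minimum-jerk position profile on normalized time s in [0, 1] (Hogan, 1984)."""
    return 10 * s**3 - 15 * s**4 + 6 * s**5

def min_acc(s):
    """Minimum-acceleration (cubic) profile with zero boundary velocities."""
    return 3 * s**2 - 2 * s**3

s = np.linspace(0.0, 1.0, 10001)
v_jerk = np.gradient(min_jerk(s), s)
v_acc = np.gradient(min_acc(s), s)
# Peak normalized velocities: 1.875 (minimum jerk) vs 1.5 (minimum acceleration)
print(v_jerk.max(), v_acc.max())
```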

Due to the linearity of the system, the weight matrix computed by MDC scales linearly with the movement duration, as seen in **Figure 5C** (represented only for one of the inputs). The magnitudes of the changes are synergy-dependent. This implies that for linear systems the peak velocity and movement duration are linear functions of the synergy weights, with the relationship depending on the synergy type.

The tethered mass system can be seen as an analog of the human eye mechanism. The passive forces acting on the mass are similar to the weak passive forces due to the orbital tissue. Although the notion of synergies does not seem to extend to the oculomotor system, the Fourier basis synergy can be viewed as a useful modeling tool for analysis of the frequency response characteristics (Harris, 1998a).

We then used the MDC framework to analyze the reduction in dimensionality in via-point tasks. Via-points were chosen to lie on a circle about the target position (as seen in **Figure 6**). The via-points were specified to be reached at exactly half the movement duration. In each case, the appropriate synergy weight matrix was computed for both tested synergy types using an inverse dynamic model and the linear least-squares procedure. In this experiment, instead of just minimizing the performance index, its variation with via-point orientation was obtained, as seen in the polar plot in **Figure 6B**.

As expected from the earlier reaching experiments, the minimum dimensional via-point is seen to lie exactly along the diagonal, i.e., along the straight line connecting the origin to the target of the movement. Interestingly, for the linear system a minimum dimensional solution was also obtained for the via-points corresponding to the reversal task, i.e., the via-points that lie beyond the target position but along the same line connecting origin and target. Reversal tasks and straight-line reaching are therefore seemingly identical in dimensionality for the linear system. This result also implies that the symmetry of velocity profiles is not guaranteed through MDC; rather, it is a consequence of the boundary conditions utilized.

In general, however, the results indicate that for the tested linear system, the choice of via-point can strongly impact the dimensionality of the dynamics. Furthermore, the synergy basis specific nature of the dimensionality in following via-points can be seen in the difference between the blue (Legendre polynomials) and red (Fourier basis) lines in **Figure 6B**. Clearly, the differences in the performance index with orientation between the two synergies indicate that certain via-points are 'easier' to reach with one kind of synergy basis. This observation is an ideal test scenario for experimental investigation with subjects and could potentially be used to identify the most appropriate experimentally extracted synergy basis.

The generalization of the MDC is demonstrated in **Figure 7**. The numerical optimization was initialized with a trajectory passing through a via-point located at (0.4, 0.3). The MDC optimization converged toward the straight-line trajectory with a bell-shaped velocity profile, as seen in **Figure 7B**. The change in cost with each iteration of the optimization shows that the algorithm rapidly converges towards the optimal solution of cost *J*(*D<sub>T</sub>*). The synergy weight matrix in the optimal case consists of identical values in each row, indicating that the MDC solution yields identical force inputs to the system for the reaching task.

### **3.3. TSDA ON THE KINEMATIC CHAIN**

In the case of the non-linear compliant kinematic chain system, the empirical balancing procedure was used to compute the TSDA. Again, a set of four benchmark trajectories *T*<sub>1...4</sub> was utilized. In each case, the arm was initialized with the angles θ(*t*<sub>0</sub>) = [−π/16, π/8]<sup>T</sup>, i.e., the rest position. Similar to the linear system experiments, each trajectory described a motion from the initial position to an end position [0.5, 0.2] in the Cartesian space. Again, the trajectories were obtained by fitting cubic splines to Cartesian via-points with smoothness conditions enforced at the boundaries (2nd order boundary conditions set to 0), each representing a variation on the reaching task. Inverse kinematics was then used to compute the joint angle trajectories for each trajectory; the "*down*" configuration was utilized, mimicking the reaching behaviors in humans. The required torque τ<sub>i</sub>(*t*) = [τ<sub>i1</sub>(*t*), τ<sub>i2</sub>(*t*)]<sup>T</sup> corresponding to each task *T<sub>i</sub>* was then computed using the inverse dynamics of the system. The weight matrix was then computed for each trajectory using a least-squares procedure. For the experiments carried out, the analysis was restricted to the Legendre polynomial synergies, since they offered a better fit of the desired torques at a relatively low order in comparison with the Fourier basis synergies.


The endpoint trajectories for the four cases using Legendre basis synergy control are seen in **Figure 8A**. The weight matrix is represented by the Hinton diagram in **Figure 8B**. From the size of the shaded ellipses, it can be seen that in all four cases the contribution of the proximal joint inputs is much higher. The temporal aspects of the trajectories can be seen in the relative contributions of the negative weights (ellipses with white shading). Again, the corresponding constrained reformulation was obtained and the empirical balancing procedure was utilized to compute the approximate HSVs. Since the Legendre polynomial synergy magnitudes are bounded, the empirical Gramians were computed from the state trajectories resulting from applying unit impulses across the inputs of the constrained-reformulated system.


The application of empirical balancing in this framework is equivalent to activating combinations of the synergies with bounded impulses; the magnitudes were chosen from a uniform distribution over an input ball of the same dimension as the number of synergies, i.e., of dimension *S*. The HSVs corresponding to each task *T<sub>i</sub>* computed by this method can be seen in **Figure 8C**. The DR using a threshold choice of *t<sub>r</sub>* = 0.935 was obtained as *K* = [1, 2, 2, 2]. Similar to the earlier linear example, it can be observed that the straight-line trajectory with a sigmoidal profile seemingly has the minimum dimensionality of 1. This observation was examined in detail in the MDC experiments, presented next.
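The impulse-based construction of the empirical controllability Gramian (Lall and Marsden, 2002) can be sketched as below. For checkability the sketch uses the linear tethered-mass system, where the empirical Gramian coincides with the analytic Lyapunov solution; the paper applies the procedure to the constrained-reformulated non-linear system:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Tethered-mass state-space matrices (state [phi, phi_dot])
I2, Z2 = np.eye(2), np.zeros((2, 2))
A = np.block([[Z2, I2], [-6.0 * I2, -2.0 * I2]])
B = np.vstack([Z2, I2])

def empirical_wc(A, B, t_end=10.0, n_t=4001):
    """Empirical controllability Gramian from +/- unit impulse responses."""
    t = np.linspace(0.0, t_end, n_t)
    dt = t[1] - t[0]
    w = np.full(n_t, dt)
    w[0] = w[-1] = dt / 2.0                          # trapezoid weights
    Wc = np.zeros((A.shape[0], A.shape[0]))
    for i in range(B.shape[1]):
        for sign in (+1.0, -1.0):
            # A unit impulse on input i is equivalent to starting from B[:, i]
            sol = solve_ivp(lambda t, x: A @ x, (0.0, t_end), sign * B[:, i],
                            t_eval=t, rtol=1e-8, atol=1e-10)
            Wc += (sol.y * w) @ sol.y.T              # integral of x x^T dt
    return Wc / 2.0                                  # average over the two signs

Wc_emp = empirical_wc(A, B)
```

For this system the analytic Gramian is block diagonal, with (1/24)I on the position block and (1/4)I on the velocity block, which the empirical estimate reproduces.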

# **3.4. MINIMUM DIMENSIONAL CONTROL IN KINEMATIC CHAIN**


The MDC experiment was repeated on the kinematic chain system for a set of reaching targets within the workspace of the arm. Similar to the linear case, the minimization process was initiated with the constraints of zero velocity enforced at the boundaries. A constraint tolerance of 10<sup>−2</sup> was used as a terminal criterion for the minimization.

The (locally) optimal trajectories resulting from MDC can be seen in **Figure 9** for the Legendre basis synergies. Smooth sigmoidal near-straight-line trajectories emerge for some movement durations; the results were obtained for movement durations of *t<sub>d</sub>* = 2.5, 3.5, and 4 s. In contrast with the linear MDC case, minor skewing effects can be seen in the velocity profiles. These effects are a consequence of the approximate fitting offered by a fixed set of synergies in order to meet the terminal boundary conditions.

Similar to the linear system experiments, the peak velocity obtained for the reaching movements is dependent on the movement amplitude. It can also be seen that in this case the obtained trajectories correspond more closely to the Minimum Acceleration (MA) model (Ben-Itzhak and Karniel, 2008) (black dashed lines in **Figures 9A,B**).

Clearly, a close correspondence is seen between the obtained reaching trajectories and human reaching behavior as reported by Morasso (1981) and by several others.

As in the earlier linear system experiments, we use the MDC framework to analyze the reduction in dimensionality in via-point tasks. Via-points are chosen to lie on a circle about the target position (as seen in **Figure 10**). Again, the via-points are specified to be reached at exactly half of the movement duration. For each trajectory, the appropriate synergy weight matrix was computed. The variation of the dimensionality performance index with respect to via-point orientation was obtained, as seen in the polar plot in **Figure 10B**.

In contrast with the linear example, it can be seen that there exists a non-zero minimum value of the performance index. The reaching target of (0.4, 0.2) was chosen from the set of points investigated in the earlier MDC reaching experiments. For this target position, it can be seen that the via-point resulting in the best DR lies on the straight line connecting the origin and the target position. However, reversal tasks are higher in dimensionality, implying that they are more complex to achieve in the kinematic chain system.

The generalization of the MDC in the non-linear case can be seen in **Figure 11**. The numerical optimization was initialized with a trajectory passing through a via-point located at (0.6, 0.1). The MDC converges toward a trajectory close to the straight line with a bell-shaped velocity profile, as seen in **Figure 11B**. The change in cost with each iteration of the optimization shows that the algorithm rapidly converges towards the optimal solution of cost *J*(*D<sub>T</sub>*). In contrast with the earlier linear result, at some stages of the optimization the intermediate cost is below the terminal cost, as seen in **Figure 11C**. This is a consequence of the *active-set* algorithm, which yields intermediate solutions that do not satisfy the constraints. The convergent (locally) optimal solution obeys the terminal position and velocity constraints, as seen in **Figure 11B**.

# **4. DISCUSSION**

In this paper, we develop a quantification for the reduction in the behavioral dimensionality of a system due to control in the form of muscle synergies. When using the temporal synergy formulation, the behavior dynamics are dependent on the synergy basis and the weight matrix. We model this as a trajectory-specific constrained reformulation of the dynamics of the system. Using the approach of system balancing, we quantified the reduction in dimensionality using a threshold-normalized Hankel Singular Value (HSV) measure; this process computes the dimensionality of the subspace of the dynamics of the balanced system. Using our method of Trajectory Specific Dimensionality Analysis (TSDA), we show that various trajectories that satisfy task constraints can be compared in terms of reduction in dimensionality in a system and synergy basis specific manner. We then develop a method for minimization of this dimensionality in our model of Minimum Dimensional Control (MDC). The method yields the weight matrix corresponding to the minimum dimensional trajectory that satisfies task constraints using a constrained minimization of the HSV measure. The proposed methods were simulated on biologically-relevant linear (tethered mass) and non-linear (compliant kinematic chain) systems. Using idealized temporal synergies, a task, synergy, and system specific reduction of the dimensionality of behavior due to control using muscle synergies was demonstrated. The trajectories obtained as a consequence of this minimization closely correspond with observations of some of the kinematically invariant features of human movements. We therefore propose that a dimensionality reduction principle might underlie motor control as a direct consequence of developmental necessities.

Bernstein's "*degrees of freedom problem*" remains a seminal observation of natural motor coordination, and continues to challenge our biological understanding as well as presenting a fundamental obstacle to biomimetic engineering. Some kind of DR surely occurs, but whether it is an implicit/emergent phenomenon (e.g., Lagrangian optimization) or an explicit 'simplifying' evolutionary and/or developmental strategy remains a conundrum. The muscle synergy hypothesis suggests that the DR is a fundamental advantage resulting from the partitioning of the space of inputs (Alessandro et al., 2013). However, it has faced criticism. Although statistical regularities seem to be present in the measurements of EMG and kinematic data from subjects performing behavioral tasks, the extracted synergies are strongly dependent on the nature of the observations that can be made (Steele et al., 2013). Although recent approaches to careful experiment design have aimed at addressing this criticism, the perception that this hypothesis represents only a phenomenological view of motor control seems hard to shake off (Tresch and Jarc, 2009). Falsification of this theory requires careful identification of the actual functionality offered by muscle synergies toward learning and control of optimal motor behavior.

Our view is that for DR to exist in biological organisms, it would need to impact the organism's behavior, as this is a major determinant of fitness. Muscle synergies would probably only evolve if they had a positive influence on an organism's ability to solve tasks, learn motor skills, and adapt to changes. To this end, TSDA quantifies the DR in dynamic behavior. The dimensionality of behavior is taken to denote the dimensionality of the state-space of the system under synergy control. It is specific to a task and to a defined set of synergies. The dynamic models obtained through the task-specific reduction of this state-space are reminiscent of the internal model hypothesis (Wolpert et al., 1998; Kawato, 1999). Although we do not investigate this relationship further in this work, the task-specific reduced internal representations associated with our MDC trajectories could be very relevant for motor planning (Braun et al., 2009). By following these minimum dimensional trajectories, an organism could minimize the neural complexity required for learning internal models.

Attempts have been made to fit synergy data extracted from behavior onto musculoskeletal models (Neptune et al., 2009; McKay and Ting, 2012; Steele et al., 2013). Our approach could potentially complement this analysis and allow the quantification of the differences between synergies extracted by various methods on a given dataset. This would then be a synthetic approach for testing the validity of any set of synergies toward simplifying the control and learning problem. Although we only employed fictitious synergies composed of idealized bases of Legendre and Fourier components, our methods can be applied to any synergy set specified by a time series. TSDA can also potentially be used to test the validity of a task definition, in terms of the constraints presented to subjects, as well as the nature, quality, and number of the EMG measurements that are employed for synergy extraction. Although our demonstration focused on the temporal synergy model, in principle the methods can be used for quantification of other models of synergies, such as the time-varying synergies (d'Avella and Bizzi, 2005).


The methods we developed in this paper represent a control-theoretic perspective on the muscle synergy hypothesis. This entails a synthetic examination of the role of muscle synergies in acting as facilitators of optimization through control dimensionality reduction. In this view, it is not only important to extract spatio-temporal regularities from biological behavior datasets, but also to carefully examine whether task control and learning is indeed facilitated (Alessandro et al., 2013; de Rugy et al., 2013). In particular, Berniker et al. (2009) suggested that synergies represent a task-variable specific reduction in controller dimensionality. We essentially extend this view by quantifying a task-variable and synergy basis specific reduction, thereby allowing us to understand the temporal aspects of motor behaviors. Our approach is also closely related to a recent analysis of the synergy hypothesis from an intermittent hierarchical control perspective (Karniel, 2013). In principle, the notion of minimal intermittency and our concept of minimum dimensionality both have an underlying objective of minimizing control effort, and further investigation of this relationship is definitely warranted.


The methods presented in this paper also have potential applications in the control of artificial systems such as robots. Current state-of-the-art methods such as policy gradients (Peters and Schaal, 2008) and the *PI*<sup>2</sup> (Policy Improvement through Path Integrals) algorithm (Theodorou et al., 2010) have been used for demonstrations of reinforcement learning applied to high-dimensional robot systems. In comparison with model-free reinforcement learning, model-based methods offer several advantages, such as the ability to update policies offline and then perform sporadic updates from real-world data. Model-based methods also allow safe exploration without risking damage to robots. Our approach naturally facilitates tractable model-based learning and could serve as a planning tool acting in concurrence with existing reinforcement learning algorithms in order to speed up learning.

In several reinforcement learning proposals, the trade-off between exploration and exploitation is often discussed. It is important to note that a method based on reduced dimensional internal models, although offering a potential speed-up of learning, could also limit the scope of the obtained solutions, i.e., the learning could converge to suboptimal behavior. Within the context of our framework, we believe that this problem could instead be tackled by a developmental scheme of progressively increasing internal model dimensionality along with the acquisition of control of newer skills. This notion is similar to the developmental hypothesis of degree-of-freedom freezing and freeing (Bernstein, 1967). Consequently, the developmental increase in the number of synergies to cope with increased task requirements (Dominici et al., 2011) would then be equally supplemented by a progressive increase in internal model dimensionality. Thus, task-specific internal models of increasing complexity would progressively be evaluated as the organism matures.

Although the scope of this paper was limited to the analysis of deterministic continuous-time systems, the methods can in principle be adapted to deal with stochastic effects and discretization. The resulting approach could then be used to supplement existing state-of-the-art methods in iterative stochastic optimal control (Theodorou et al., 2010). Furthermore, although the investigations focused on a feedforward control scenario, the methods can easily incorporate a feedback control formulation of the plant dynamics; the models we tested already include a weak mechanical feedback in the form of passive joint compliance. Nevertheless, it must be noted that several existing models in the synergy hypothesis suggest that muscle synergies are a high-level feedforward control scheme that incorporates low-level feedback (d'Avella et al., 2003; Hart and Giszter, 2004; Ivanenko et al., 2004; Ting and Macpherson, 2005; Tresch et al., 2006). In an artificial context, this notion has also been explored in the design of dynamical movement primitives (Ijspeert et al., 2013), wherein the policies encode trajectory features while the primitives themselves, owing to their dynamic nature, can be modified online in a smooth manner to take disturbances into account.

The Optimal Control Theory (OCT) models of human motor behavior originate from an evolutionary perspective: there is a fitness-driven necessity for behaviors to be optimal. Various Lagrangians have been proposed to quantify task optimality depending on the different perspectives of the system, such as the output (kinematic; Flash and Hogan, 1985), the control input (minimum variance, Harris and Wolpert, 1998; minimum norm, Dean et al., 1999), or intermediate variables (minimum torque, Nakano et al., 1999). However, it must be noted that OCT hypotheses employ relatively complex mathematical techniques; current theoretical limitations mean that OCT methods can only be applied analytically to relatively simple models, such as linearized models of the oculomotor system or limb movements (Harris and Wolpert, 1998). Also, there is no testable suggestion so far as to how and where the optimization might actually be happening in terms of actual neural mechanisms. The method proposed in this paper is possibly a step toward this goal, since we relate optimization to the actual recruitment of synergies to accomplish tasks.

From a developmental perspective, the process of acquisition of motor coordination is gradual and seemingly composed of intermediate stages of learning (Sporns and Edelman, 1993). If we consider that optimal solutions exist in a high dimensional space (system dynamics, neural control input) unique to an individual organism, then fitness must also depend on the ability to find good solutions in the developmental time frame (Harris, 2011). Searching for an optimal trajectory has little value if it takes a long time to find. We propose that the time taken to learn an optimal control, which we call "learnability," is itself an important parameter in a self-organizing system (Kuppuswamy et al., 2012). DR is one possibility which may speed up learning, but there might be a trade-off with precision and learning rate to the extent that non-redundant degrees of freedom are eliminated. Our approach provides a mechanism to examine this hypothesis through the measurement of the dimensionality of empirically measured trajectories relative to some assumed or computed basis set of synergies.

The most interesting results obtained through our methods are the smooth straight-line sigmoidal trajectories with bell-shaped velocity profiles as the minimum dimensional solution to reaching tasks. The similarity at the output for two basis sets (Legendre and Fourier) and for both linear and non-linear systems suggests the possibility of some kind of invariance at the output task variable level. We also observed that the symmetry of the velocity profiles is strongly affected by the specification of boundary conditions on the behaviors. Smoothness implies a potential relationship between DR and bandwidth reduction. Clearly, task demands place constraints on possible trajectories, and hence on their spectral content. In point-to-point reaching trajectories with zero velocity boundary conditions, the temporal truncation forces a strictly infinite bandwidth, with a rapidly decaying spectral energy limiting envelope (Harris, 2004). The fastest movements that can be achieved without exceeding this spectral limit are the family of minimum square derivative functions, such as minimum acceleration for 2nd order systems, or minimum jerk for 3rd order systems. The DR trajectories had lower peak velocities than expected from the minimum jerk profile, but were similar to minimum acceleration (dotted lines in **Figures 5**, **9**). The relationship between DR and low bandwidth is unclear at present, but has two important implications.

First, if this invariance is upheld, it implies that the choice of basis set is not critical (presumably provided the output trajectory can be spanned by the input basis set). Indeed, it may reflect the possibility that DR occurs at the output directly. In our work, we examined only the state-space dimensionality and the computation of the minimum dimensional weight matrix. In principle, this approach may also be used to investigate the optimal temporal characteristics of the basis sets themselves. For example, using the Legendre polynomial basis, we observe a reduction in dimensionality across tasks, in both the input and the output. In this respect, it is interesting that low bandwidth signals also have low Shannon numbers (although the Shannon number is an imprecise measure of signal dimension when duration is finite).

Second, there is a coincidence between low dimensionality and optimal control. That is, if low dimensionality is maintained, optimal or near-optimal trajectories are automatically generated for a given set of boundary conditions, and the curse of dimensionality is largely circumvented. An alternative is that the optimality approach itself is a misconstrued attempt to explain low dimensionality via a Lagrangian. However, for the minimum variance model, it would be difficult to explain the known presence of signal-dependent noise unless the noise is somehow a product of, or compensation for, DR.

This last point is also relevant to synthetic (robotic) systems. Minimization of biologically relevant Lagrangians in synthetic systems does not necessarily lead to biologically realistic behavior, but depends on the synthetic architecture. For example, minimizing reaching time in a natural system appears to be achieved by smooth bell-shaped velocity profiles, but in a linear robot the same Lagrangian (functional mimicry) would be optimized by bang-bang control, leading to skewed velocity profiles. In any case, finding such solutions in real-time is non-trivial, and often natural behavior must be programmed explicitly into the artificial system (esthetic mimicry) (Harris, 2009). However, when we consider DR as the underlying principle for generating natural behavior, we envision that functional mimicry in a robot would produce similar or the same natural behavior. It is not entirely clear at present how precise the mimicry would need to be. It is plausible that only crude approximations are needed. Furthermore, although we investigated two relatively simple systems performing reaching and via-point type tasks, the methods are computationally applicable to any control-affine system. Thus, in principle, these methods could be used to compute "natural" behaviors in robots of a variety of morphologies. A related application would be to optimize behavior in artificial systems that are driven by pattern based mechanisms such as Central Pattern Generators (CPGs) (Ijspeert, 2008). Our approach is thus a potential path toward robots with neurally inspired motor control of reduced complexity.

# **FUNDING**

This research was supported by the European Community under the 7th Framework Programme by the projects RobotDoc (Grant Agreement No. 235065), a Marie Curie Action ITN, and AMARSi (Grant Agreement No. 248311).

# **ACKNOWLEDGMENTS**

The authors would like to thank Dr. Juan-Pablo Carbajal for the discussions that led to this work and Dr. Cristiano Alessandro for offering helpful comments on early versions of the manuscript. We would also like to thank the reviewers for their helpful comments and suggestions in shaping this manuscript.

# **REFERENCES**


Wolpert, D. M., Diedrichsen, J., and Flanagan, J. R. (2011). Principles of sensorimotor learning. *Nat. Rev. Neurosci.* 12, 739–751. doi: 10.1038/nrn3112

Wolpert, D. M., Miall, R. C., and Kawato, M. (1998). Internal models in the cerebellum. *Trends Cogn. Sci.* 2, 338–347. doi: 10.1016/S1364-6613(98)01221-2

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 18 September 2013; accepted: 21 May 2014; published online: 23 June 2014.*

*Citation: Kuppuswamy N and Harris CM (2014) Do muscle synergies reduce the dimensionality of behavior? Front. Comput. Neurosci. 8:63. doi: 10.3389/fncom. 2014.00063*

*This article was submitted to the journal Frontiers in Computational Neuroscience. Copyright © 2014 Kuppuswamy and Harris. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Synergetic motor control paradigm for optimizing energy efficiency of multijoint reaching via tacit learning

# *Mitsuhiro Hayashibe1,2\* and Shingo Shimoda2*

*<sup>1</sup> INRIA DEMAR Project and LIRMM, University of Montpellier, Montpellier, France <sup>2</sup> Brain Science Institute-Toyota Collaboration Center, RIKEN, Nagoya, Japan*

*Edited by:*

*Andrea d'Avella, IRCCS Fondazione Santa Lucia, Italy*

### *Reviewed by:*

*Amir Karniel, Ben-Gurion University, Israel Elmar A. Rückert, Graz University of Technology, Austria J. Lucas McKay, Georgia Tech/Emory University, USA*

### *\*Correspondence:*

*Mitsuhiro Hayashibe, INRIA DEMAR Project and LIRMM, University of Montpellier, 161 Rue Ada, 34095 Montpellier, France e-mail: mitsuhiro.hayashibe@inria.fr*

A human motor system can improve its behavior toward optimal movement. The skeletal system has more degrees of freedom than the task dimensions, which incurs an ill-posed problem. The multijoint system involves complex interaction torques between joints. To produce optimal motion in terms of energy consumption, so-called cost function based optimization has been commonly used in previous works. Even if it is a fact that an optimal motor pattern is employed phenomenologically, there is no evidence for the existence of a physiological process in our central nervous system that resembles such a mathematical optimization. In this study, we aim to find a more primitive computational mechanism with a modular configuration that realizes adaptability and optimality without prior knowledge of the system dynamics. We propose a novel motor control paradigm based on tacit learning with task space feedback. The accumulation of motor commands during repetitive environmental interactions plays a major role in the learning process. The paradigm is applied to a vertical cyclic reaching task which involves complex interaction torques. We evaluated whether the proposed paradigm can learn optimized solutions with a 3-joint, planar biomechanical model. The results demonstrate that the proposed method was valid for acquiring motor synergy and resulted in energy efficient solutions for different load conditions. With feedback control alone, the behavior is largely affected by the interaction torques. In contrast, with tacit learning, the trajectory is corrected over time toward optimal solutions. Energy efficient solutions were obtained through the emergence of motor synergy. During learning, the contribution of the feedforward controller is augmented and that of the feedback controller is significantly reduced, down to 12% with no load at the hand and 16% with a 0.5 kg load. The proposed paradigm can thus provide an optimization process in a redundant system with a dynamic-model-free and cost-function-free approach.

### **Keywords: feedback error learning, motor synergy, optimality, interaction torques, redundancy, Bernstein problem, tacit learning**

# **1. INTRODUCTION**

A human motor system can continuously act to improve its behavioral performance toward optimal movement. Motor learning and control are executed seamlessly, adapting to environmental variations and newly generated goals based on a person's intentions. In addition, when we move our limbs to execute a motor task, our body has more degrees of freedom (DOF) than the number of dimensions in its task space. Kinematic redundancy can contribute to better dexterity and versatility, but incurs an ill-posed problem of inverse kinematics from the task-description space to the human joint space. Such an ill-posed problem of DOF was originally formulated by Bernstein (1967) as the DOF problem. How motor controllers in the brain solve kinematic redundancy is still an open problem.

It is known that the cerebellum plays an important role in such motor learning by developing the internal model while comparing the actual outcome to the predicted outcome (Wolpert et al., 1998; Kawato, 1999). Ito (1972) first proposed that the cerebellum contains forward models of the limbs. This internal model theory has been well supported by behavioral studies in the field of motor control (Schweighofer et al., 1998) and by neurophysiological studies (Kawato, 1999). To establish such an internal model, feedback-error-learning (FEL) has been well studied as a computational adaptation paradigm, covering prism adaptation, saccade adaptation, and reaching (Kawato and Gomi, 1992a,b). There is extensive evidence that the learning system using feedback error relies on the cerebellum. FEL provides an algorithm to establish the internal inverse dynamics model by minimizing the error against the desired joint angle trajectory. However, particularly for a redundant system, it does not provide a mechanism that can systematically improve performance toward optimal solutions such as minimizing total energy or torque changes (Uno et al., 1989). FEL has computational adaptability, but for computational optimality under motor redundancy, it should be used together with so-called cost function based optimization (Schweighofer et al., 1998; Todorov, 2004; Braun et al., 2009). In a typical approach to using FEL with a redundant system, the desired joint angle trajectory is prepared using optimization to solve the redundancy problem, and FEL is then applied to establish the inverse dynamics model mapping between joint angles and torques.

Several types of optimality models have been proposed. Such models are often defined as "minimum X," where X can be jerk (Flash and Hogan, 1985), torque change (Uno et al., 1989), motor command (Harris and Wolpert, 1998), or energy consumption (Alexander, 1997). For redundant manipulators, such cost function based optimal control was successfully applied in Todorov and Jordan (2002) and Guigon et al. (2007). In robotics, several methods have been studied to deal with redundancy (Nakamura, 1991; Nguyen-Tuong and Peters, 2011). They basically assume the use of a physical inverse dynamics model (Nakanishi et al., 2008) or an approximation-based model (Peters and Schaal, 2008). A model-based cost function is commonly used for the optimization process. As for model-free approaches, adaptive feedback control is well known in the control community (Astrom, 1987; Marino and Tomei, 1993). Adaptive control is basically a mechanism for adjusting the parameters of the model or the gains of the controller using the trajectory error. However, adaptive control cannot be applied to redundant systems without using cost function based optimization (Nakanishi and Schaal, 2004). In addition, the dual task of executing the target task and optimizing the behavior toward energy efficiency cannot be performed in parallel.

The phenomenological optimal solutions appearing in human motion can be obtained using such a mathematical optimization approach. It is known that we employ muscle synergies (D'Avella et al., 2003; Alnajjar et al., 2013) for natural motion, which should be related to optimal solutions in the redundant space. Even if it is a known fact that an optimal motor pattern is employed phenomenologically, there is no evidence for the existence of a physiological process in our brain or central nervous system (CNS) that resembles such a mathematical optimization. For instance, infants can modify their motion toward an optimal solution through repetitive interactions with the environment, even though the appropriate cost function may not initially be available. In addition, cost function based optimization involves a global image of the dynamic system over time rather than a simple feedback process on the states at an instant in time, making it a complex process to embed in the CNS as a modular configuration. Thus, we believe it is important to find an alternative, simpler computational paradigm that can induce an equivalent optimization property. There are two types of redundancy in human motion: muscle redundancy and kinematic redundancy. In this study, we focus on control solutions for kinematic redundancy, and we define a coordinated command pattern at the joint level as a motor synergy.

In this study, we aim to find a more primitive computational mechanism to realize both adaptability and optimality in a redundant system with a dynamic-model-free and cost-function-free approach. A simple control architecture that can deal with optimality in a redundant system can be a key organizational principle that the CNS employs for achieving versatility and adaptability in motor control. FEL allows the internal model to be established, but does not provide an optimization process in a redundant system. Thus, the main contribution of this paper is to propose a new way of inducing an optimization process, without prior knowledge of the system dynamics, by using the task space error. Recently, a novel learning scheme named *Tacit Learning* was proposed (Shimoda and Kimura, 2010; Shimoda et al., 2013) as an unsupervised learning paradigm. Tacit learning is a biomimetic learning architecture in which primitive behaviors composed of reflex actions are tuned into adaptive behavior. Experimental results demonstrated that a walking gait composed of primitive motions was well adapted to the environment in terms of walking efficiency (Shimoda et al., 2013). Here, we reformulate the paradigm as a supervised learning approach applied to simple cyclic reaching tasks, using the feedback motor command error as a supervising signal.

This work is also oriented toward reaching simulations of motor-impaired and able-bodied subjects. The skeletal system is a complex series of linkages that produces coupled dynamics. For instance, when we quickly move our forearm by flexing the elbow joint, the flexion torques on the elbow joint accelerate the forearm. However, because of the forearm's inertia, this acceleration also produces torques on the shoulder. These interaction torques induce the undesired effect of accelerating the upper arm segment. The dynamics of multijoint limbs often cause such complex torques. An able-bodied subject can normally handle such interaction torques through motor learning and predict them without difficulty (Shadmehr and Wise, 2005; Braun et al., 2009). In contrast, the vertical reaching task was studied in patients with cerebellar lesions by Bastian et al. (1996), who concluded that cerebellar patients had specific deficits in their predictive compensation for the interaction torques. In control subjects, the elbow and shoulder joints rotated in a synergetic manner to compensate for the interaction torques (Gribble and Ostry, 1999). For a patient with cerebellar damage, it was difficult to control the endpoint of the arm in a synergetic way between multiple joints because of gravity and interaction torques. This implies that cerebellar damage affects the prediction of interaction torques that is normally based on the internal model established through motor learning. Thus, it is important to enable control simulations of motor performance both for subjects who succeed in dealing with the interaction torques and for those who cannot. In this paper, control simulation results with redundant actuators demonstrate that the proposed method can systematically produce motor synergies and energy efficiency while finding a way to compensate for the interaction torques during multijoint reaching tasks.

# **2. MATERIALS AND METHODS**

# **2.1. VERTICAL REACHING AND DYNAMICS SIMULATION**

In this study, we propose an optimal control paradigm in motor learning which has adaptability similar to FEL and optimality without using cost-function based optimization. We verified the performance of tacit learning in vertical reaching that involves complex interaction torques and the gravitational effect, as shown in **Figure 1**. This configuration was used in Bastian et al. (1996). We evaluated whether the proposed computational learning paradigm can learn how to compensate the interaction torques during multijoint reaching.

For simulating joint dynamics, we used MatODE (Caarls, 2010), a Matlab interface to the Open Dynamics Engine (Russell Smith, 2000). In the sagittal plane, 3 DOF composed of the shoulder, elbow, and wrist joints were assumed. The upper arm, forearm, and hand segments were connected through these joints in the dynamics simulation environment. All dynamics simulations were managed by the external ODE package; during learning, the control module has access only to the control of each joint torque, and not to the manipulator dynamics model itself. It should be noted that the configuration used in this study corresponds to Bernstein's DOF problem: there is actuation redundancy because the task is performed in 2D with a 3-DOF manipulator.

# **2.2. CONFIGURATION OF CONTROLLER WITH TACIT LEARNING**

In tacit learning, command signal accumulation during repetitive interactions with the environment plays the main role in creating appropriate behavior. In biological controllers, signal accumulation can be considered a typical learning mechanism for creating adaptive behaviors, as in long-term depression (LTD) and long-term potentiation (LTP) in the cerebellum (Coesmans et al., 2004).

Previously, in tacit learning for biped walking, joints were divided into kinematically specified and unspecified groups (Shimoda et al., 2013). The unspecified joints were then controlled with tacit learning as an unsupervised learning paradigm. In the present reaching task, only the desired position in task space is given as a target to follow, and all joints are controlled with tacit learning as in **Figure 2**. The block diagram is formulated as a supervised learning paradigm using the feedback motor command error. Conceptually, it shares with FEL the use of feedback errors as supervising signals. However, in FEL, optimization of some criterion is still necessary to achieve optimality. Thus, we aim to provide a primitive mechanism for realizing such optimality along with an FEL-like controller, without using a cost function. As in the LTD/LTP mechanism of the cerebellum, simple tacit learning with torque signal accumulation is employed to realize adaptation and optimal control synchronously. This study is oriented toward compensation of the interaction torques of unknown multijoint dynamics. We assume that only forward kinematics (FK) information is available. In contrast, in FEL, a given inverse kinematics computation is typically assumed in order to establish the internal inverse dynamics (ID) model.

Let us explain the details of the proposed learning paradigm shown in **Figure 2** step by step:

1. The intention of the subject to follow the target is expressed by a force vector in the task space, which represents the direction to the target and the distance as its intensity, using the proportional (P) feedback error between the target and the current endpoint.
2. The feedback torque command error in joint space is computed through the Jacobian of the arm by mapping the feedback force into the joint torque space.
3. Local PD control represents the local reflex loop as a function of a muscle spindle.
4. The torque command accumulation part (shown in gray) corresponds to tacit learning.


Specifically, the controllers for PD feedback and tacit learning can be expressed as follows.

PD feedback case:

$$\boldsymbol{\tau}_1 = -\mathbf{J}^T(\boldsymbol{\theta})k\Delta\mathbf{p} - \mathbf{A}\Delta\boldsymbol{\theta} - \mathbf{B}\dot{\boldsymbol{\theta}}.\tag{1}$$

Tacit Learning case:

$$\boldsymbol{\tau}_2 = -\mathbf{J}^T(\boldsymbol{\theta})k\Delta\mathbf{p} - \mathbf{A}\Delta\boldsymbol{\theta} - \mathbf{B}\dot{\boldsymbol{\theta}} + \mathbf{C}\int \boldsymbol{\tau}_1\, dt. \tag{2}$$

$$\{\boldsymbol{\tau}_1, \boldsymbol{\tau}_2, \Delta\boldsymbol{\theta}, \dot{\boldsymbol{\theta}} \in \mathbb{R}^m,\ \Delta\mathbf{p} \in \mathbb{R}^n,\ \mathbf{J}^T(\boldsymbol{\theta}) \in \mathbb{R}^{m \times n},\ \mathbf{A}, \mathbf{B}, \mathbf{C} \in \mathbb{R}^{m \times m}\}$$

where *m* is the number of joints, *n* is the dimension of the task space, **τ** denotes the control torque inputs of the joints, **θ** denotes the joint angles, and **θ**˙ denotes the joint angular velocities. **J**<sup>*T*</sup>(**θ**) is the transpose of the Jacobian of the arm, *k* is the gain of the task space proportional feedback, and Δ**p** is the endpoint error vector. This term corresponds to the neural substrate of force mapping functionality, presumably due to corticospinal control (Bizzi et al., 1991).

**A** and **B** are diagonal matrices consisting of the proportional and derivative gains of the PD controllers. **C** is a diagonal matrix consisting of the gains of the torque command integration of the motor-command error and local feedback torque. The term **A**Δ**θ** is optional; it can be used to specify a neutral position for a joint. In this simulation, a neutral position is specified only for the wrist joint, because the wrist tends to return to its neutral position when we relax.

The local PD feedback corresponds to a local reflex loop as a function of the muscle spindles (Shadmehr and Mussa-Ivaldi, 1994). When a muscle is stretched, primary sensory fibers of the muscle spindle respond to changes in muscle length and velocity. The reflexively evoked activity in the alpha motoneurons is then transmitted via their efferent axons to the muscle, which generates force and thereby resists the stretch. This work remains at the joint-level representation, but the resistance to changes in muscle length and velocity can be captured by resistance to changes in joint angle and angular velocity, as in local PD control.

Note that all joints are controlled independently, so this configuration can be regarded as a modular structure, presumably implemented within cerebellar pathways. All dynamical parameters, such as segment inertia and mass, and the model itself, are completely hidden from the controller. Unlike typical optimal solutions based on model-based cost functions, our approach produces the optimization process without a cost function, purely through repetitive interactions with the environment. It works only with the controller presented, which tends to find optimal solutions for the given dynamic environment. The difference between the PD feedback case and the tacit learning case is only the last term of Equation (2), the command signal accumulation.
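
As a concrete sketch, the two control laws in Equations (1) and (2) can be transcribed for a 3-link planar arm. The link lengths and gains below follow the values quoted later in Section 2.4; the Jacobian is the standard planar-arm form derived from forward kinematics, and the variable names (`dp` for the endpoint error, `dtheta` for the deviation from the neutral posture, `acc` for the integral of **τ**<sub>1</sub>) are our own:

```python
import numpy as np

# Sketch of Equations (1)-(2) for a 3-link planar arm. Link lengths and
# gains follow the values quoted in the text; the integral in Eq. (2) is
# kept as an explicit accumulator `acc` updated by the caller.
l = np.array([0.282, 0.269, 0.086])   # link lengths [m]
k = 20.0                              # task-space P gain
A = np.diag([0.0, 0.0, 0.05])         # neutral-position gains (wrist only)
B = np.diag([0.01, 0.01, 0.01])       # joint damping gains
C = np.diag([0.15, 0.15, 0.15])       # accumulation gains

def jacobian_T(theta):
    """Transpose of the 2x3 endpoint Jacobian from planar forward kinematics."""
    s = np.cumsum(theta)              # absolute link angles
    J = np.zeros((2, 3))
    for i in range(3):
        J[0, i] = -np.sum(l[i:] * np.sin(s[i:]))   # d(endpoint x)/d(theta_i)
        J[1, i] = np.sum(l[i:] * np.cos(s[i:]))    # d(endpoint y)/d(theta_i)
    return J.T

def tau_1(theta, theta_dot, dp, dtheta):
    """Eq. (1): task-space force mapped to joint torques, plus local PD."""
    return -jacobian_T(theta) @ (k * dp) - A @ dtheta - B @ theta_dot

def tau_2(theta, theta_dot, dp, dtheta, acc):
    """Eq. (2): Eq. (1) plus the tacit-learning term C * integral(tau_1) dt."""
    return tau_1(theta, theta_dot, dp, dtheta) + C @ acc
```

Here `acc` would be updated each time step as `acc += tau_1(...) * dt`; this accumulation is the only difference between the tacit learning controller and pure PD feedback.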

### **2.3. MECHANISM OF TACIT LEARNING FOR REACHING**

As for the neurological interpretation of the proposed control model, it shares with FEL the use of feedback error as a supervising signal. We can basically apply the same physiological roles as in FEL and the so-called internal model theory of the cerebellum (Kawato, 1999). The climbing fiber inputs to Purkinje cells carry error signals in motor command coordinates, and their temporal waveforms can be well reproduced using the inverse dynamics model. The phase shift from feedback control to feedforward control during motor learning is well explained by the acquisition of the internal model in the cerebellum (Kitazawa et al., 1998; Kawato, 1999). Feedforward movements are made without sensory feedback and thus have a predictive character with respect to the given dynamics. Feedback control, in contrast, involves modification of the current movement using information from sensory receptors and error detection. Optimal movement control likely reflects a combination of both feedback and feedforward processes (Desmurget and Grafton, 2000).

The first difference of this work from a typical FEL configuration is that the motor-command error is created by the mapping between the task space force and the joint space torque. In FEL, the optimized desired trajectory of position and velocity in joint space must be prepared in advance by optimizing some criterion, specifically for an arm with redundant degrees of freedom (Schweighofer et al., 1998). Even though we use the Jacobian information, we do not perform the inverse kinematics (IK) computation explicitly. Unlike typical methods in robotics, the pseudo-inverse of the Jacobian is not computed, so no dimension reduction is performed. The Jacobian itself can be obtained with knowledge of the FK model. Thus, only FK information is assumed in this method; the IK and ID models are unknown, and how to take the dynamics into account is learned through repetitive interactions with the environment. The controller design is therefore different from a typical FEL configuration. In the proposed method, optimality can also be addressed by tacit learning with command signal accumulation. Thus, along with the adaptivity originating from the FEL architecture, the ability to manage optimal solutions is a significant contribution of this study.

As for how motor performance can be optimized over time, the motor command accumulation part serves as an energy feedback with task space directional information. In general error feedback control, when the error is fed back, the error can be minimized. Similarly, the integrated torque command contains an energy measure, since it accumulates the past torque generation history during the cyclic reaching task. This term then works as a directional energy feedback, so the energy is naturally minimized because it sits inside a feedback loop. Thus, tacit learning can induce energy minimization through repetitive actions with the environment while minimizing the endpoint error toward a given target point in the task space.
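
The intuition that the accumulated command behaves as a self-minimizing feedback signal can be illustrated on a deliberately simple, hypothetical 1-DOF joint (not the paper's 3-DOF arm): an overdamped joint holding a target posture against a constant gravity-like torque. The plant, target, and gains below are invented for illustration only:

```python
# Toy 1-DOF illustration of the torque accumulation term shifting load
# from feedback to feedforward. Plant: overdamped joint b*theta_dot =
# tau - g0, with constant gravity-like torque g0. All values hypothetical.
k, c, b, g0 = 5.0, 2.0, 1.0, 1.0
theta, target = 0.0, 1.0
tau_acc = 0.0                         # tacit term: c * integral of tau_fb
dt = 0.001
for _ in range(int(20.0 / dt)):       # simulate 20 s of interaction
    tau_fb = k * (target - theta)     # feedback command (error-driven)
    tau = tau_fb + tau_acc            # total motor command
    theta += (tau - g0) / b * dt      # joint dynamics (Euler step)
    tau_acc += c * tau_fb * dt        # accumulate the feedback command
```

After the simulated interaction, the accumulated term has absorbed the load (`tau_acc` ≈ `g0`) and the feedback command has decayed toward zero, mirroring the shift from feedback to feedforward contribution reported in the Results.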

## **2.4. CONTROL SIMULATION STUDY OF VERTICAL REACHING**

The task of vertical reaching is to move the endpoint of the arm following a target which moves between two points at a frequency *f* = 0.5 Hz. These two points in the coordinate system of **Figure 1** and the moving target *r*(*t*) are given as follows:

$$\mathbf{p}_1 = \begin{bmatrix} 0.25 & -0.5 \end{bmatrix}^T, \quad \mathbf{p}_2 = \begin{bmatrix} 0.35 & -0.1 \end{bmatrix}^T.$$

$$r(t) = (\mathbf{p}_1 - \mathbf{p}_2)\sin(2\pi f t)/2 + (\mathbf{p}_1 + \mathbf{p}_2)/2. \tag{3}$$
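
The two points and Equation (3) can be transcribed directly (a small sanity check using the values from the text):

```python
import numpy as np

# Target trajectory of Equation (3): the target oscillates along the
# line between p1 and p2 at f = 0.5 Hz (values from the text).
p1 = np.array([0.25, -0.5])
p2 = np.array([0.35, -0.1])
f = 0.5  # [Hz]

def r(t):
    """Moving target: midpoint plus a sinusoid along the p1-p2 direction."""
    return (p1 - p2) * np.sin(2 * np.pi * f * t) / 2 + (p1 + p2) / 2
```

At *t* = 0 the target sits at the midpoint of **p**<sub>1</sub> and **p**<sub>2</sub>; one quarter-cycle later (*t* = 0.5 s) it reaches **p**<sub>1</sub>, and at *t* = 1.5 s it reaches **p**<sub>2</sub>, so each 2 s cycle visits both points.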

Initial joint angles are θ<sub>1</sub> = 0°, θ<sub>2</sub> = 90°, θ<sub>3</sub> = 0°. The segment lengths, the moments of inertia around the *z* axis, and the masses of the upper arm, forearm, and hand are obtained from the anthropometric table reported in De Leva (1996). They are set as follows:

*l*<sub>1</sub> = 0.282 m, *l*<sub>2</sub> = 0.269 m, *l*<sub>3</sub> = 0.086 m; *I*<sub>1</sub> = 0.01275 kg·m², *I*<sub>2</sub> = 0.006516 kg·m², *I*<sub>3</sub> = 0.001305 kg·m²; *m*<sub>1</sub> = 1.978 kg, *m*<sub>2</sub> = 1.183 kg, *m*<sub>3</sub> = 0.445 kg.

The control gains are set respectively as follows:

$$k = 20.0, \quad \mathbf{A} = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0.05 \end{bmatrix}, \quad \mathbf{B} = \begin{bmatrix} 0.01 & 0 & 0 \\ 0 & 0.01 & 0 \\ 0 & 0 & 0.01 \end{bmatrix},$$

$$\mathbf{C} = \begin{bmatrix} 0.15 & 0 & 0 \\ 0 & 0.15 & 0 \\ 0 & 0 & 0.15 \end{bmatrix} \tag{4}$$

We investigate motor learning with different loads at the endpoint. Two conditions, with no load or with a 0.5 kg load attached to the hand, were evaluated. The energy consumption during each reaching cycle is compared between the PD-control-only case and the tacit learning controller under the different load conditions. In this study, 30% of the above mass parameters, for both the arm segments and the load, was used to achieve faster convergence of the learning within 60 s, allowing the whole learning process to be plotted in the limited space. Even under this condition, the allocated mass and inertial parameters create the effects of gravity and interaction torques. As long as these dynamical parameters remain hidden from the controller, as in this simulation, the scaling does not affect the verification of tacit learning performance except for the learning speed. The control gains are the same for the two controllers and for the different load conditions.

### **2.5. COMPARISON TO MODEL-BASED OPTIMIZATION**

The proposed method is purely based on control with sensory feedback information and the FK model, without using knowledge of the manipulator dynamics. The standard solution for redundant system control is to use mathematical optimization with a dynamics model, as described in the Introduction. Thus, we also performed model-based optimization for comparison with the result of tacit learning. We first define the equation of motion of a manipulator with *m* revolute joints, which can be described as follows (Nakamura, 1991):

$$\boldsymbol{\tau} = R(\boldsymbol{\theta})\ddot{\boldsymbol{\theta}} + \frac{1}{2}\dot{R}(\boldsymbol{\theta})\dot{\boldsymbol{\theta}} + S(\boldsymbol{\theta}, \dot{\boldsymbol{\theta}})\dot{\boldsymbol{\theta}} + \mathbf{g}(\boldsymbol{\theta}), \tag{5}$$

where **θ**, **θ˙**, **θ¨** ∈ ℝ<sup>*m*</sup> denote the vectors of joint angles, angular velocities, and accelerations, respectively. We assume that **θ** expresses the relative angles between neighboring links. *R*(**θ**) ∈ ℝ<sup>*m*×*m*</sup> is the inertia matrix, which is symmetric and positive definite. The eigenvalues of *R*(**θ**) have upper and lower bounds for any **θ**, because all elements of *R*(**θ**) are constants or trigonometric functions of **θ**. *S*(**θ**, **θ**˙) ∈ ℝ<sup>*m*</sup> denotes centrifugal and Coriolis forces. **g**(**θ**) ∈ ℝ<sup>*m*</sup> is the gravitational component derived from the potential energy of the manipulator *U*(**θ**). All elements of **g**(**θ**) are trigonometric functions of **θ**. The link lengths, link masses, and inertias are set as indicated in the previous section.

The Matlab function fminunc was used to optimize the joint torques **τ** under the constraint that the endpoint follows the desired trajectory. The inverse dynamics is available in closed form as in Equation (5), which allows the joint angles to be calculated from the torques and vice versa. Optimal control solutions were obtained by finding deterministic controls **τ**(*t*) = {τ<sub>*i*</sub>(*t*)} (*i* = 1 ... *m*) over [*t*<sub>0</sub>, *t<sub>f</sub>*] such that the cost function

$$E = \sum_{i=1 \ldots m} \int_{t_0}^{t_f} \tau_i^2(t)\,dt \tag{6}$$

is minimized over the cyclic reaching task. Fifty discrete points per 2 s reaching cycle were used for the optimization process.
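
The character of such a cost function based solution can be sketched on a toy problem. For a linear 1-DOF double integrator (a stand-in for the nonlinear arm, chosen so that the optimum is available in closed form via a pseudoinverse instead of an fminunc search), minimizing the summed squared torque subject to endpoint constraints yields the familiar smooth, bell-shaped velocity profile:

```python
import numpy as np

# Toy analogue of the optimization in Eq. (6): a 1-DOF double integrator
# (unit mass) must reach x(T) = 1 from rest and stop, with N discretized
# torque samples u minimizing sum(u^2). For this linear toy system the
# constrained minimum is the minimum-norm solution u = pinv(A) @ b.
N, T = 50, 2.0
dt = T / N

# Euler dynamics: v[k+1] = v[k] + u[k]*dt, x[k+1] = x[k] + v[k]*dt,
# so the final state [x(T), v(T)] is linear in u: A @ u = b.
A = np.zeros((2, N))
for k in range(N):
    A[0, k] = (N - 1 - k) * dt * dt   # contribution of u[k] to x(T)
    A[1, k] = dt                      # contribution of u[k] to v(T)
b = np.array([1.0, 0.0])              # reach x = 1 and end at rest

u = np.linalg.pinv(A) @ b             # minimum-effort torque sequence

# Recover velocity and position profiles by integration.
v = np.concatenate(([0.0], np.cumsum(u) * dt))
x = np.concatenate(([0.0], np.cumsum(v[:-1]) * dt))
```

The optimal torque is positive then negative (accelerate, then brake), and the resulting velocity profile is bell-shaped with its peak near mid-movement; for the real redundant arm, the same role is played by the fminunc search over the fifty discretized torque samples per cycle.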

# **3. RESULTS**

## **3.1. MOTOR CONTROL WITH TACIT LEARNING**

To evaluate the performance of the proposed tacit learning, we compare the control results for vertical reaching between **(A)** a PD feedback controller and **(B)** tacit learning with feedback controller. The task of vertical reaching is to move the endpoint of the arm following the target.

**Figure 3** shows the control results for vertical reaching. The first plot is the endpoint trajectory with PD feedback only, and the second plot is with tacit learning in addition to feedback control. The time course is illustrated using a color map that changes with the progress of time; the correspondence between color and time is given by the color bar on the right side of the figure. A cool color map is used for **(A)** PD feedback control, and a jet color map for **(B)** tacit learning. This color map configuration is also used in the other figures in this Results section.

**Figure 3** shows that PD control is strongly affected by gravity and the interaction torques. In contrast, with tacit learning the trajectory is corrected over time, minimizing the effects of gravity and interaction torques. **Figure 4** shows phase portraits of joint angle versus angular velocity for the shoulder, elbow and wrist joints. With tacit learning, each joint's phase portrait gradually shifts away from its original form, which resembles that under the PD controller. The system appears to seek out the joint space around the neutral position, θ1 = 0◦, θ2 = 90◦, θ3 = 0◦, regardless of the effects of gravity and interaction torques.

# **3.2. EMERGENCE OF MOTOR SYNERGY VIA TACIT LEARNING**

**Figure 5** shows phase portraits of the shoulder-elbow joint angles under no-load and 0.5 kg load conditions. The phase plot under tacit learning converges to more aligned, synergetic solutions than under feedback control alone. The phase form under PD control changed with the addition of a 0.5 kg load, indicating that the controller is significantly influenced by gravity and the interaction torques.

As a result, PD control shows more unrelated, non-synergetic solutions between the shoulder and the elbow. In contrast, under tacit learning the phase form is similar across load conditions: the shoulder and elbow joints are used in a synergetic way even with the load. As a metric of joint synergy, the coefficient of correlation between joint angles was calculated, as shown in **Figure 5**. It was low in both PD control cases, but high after the learning process in the tacit learning case. This implies that tacit learning enabled the system to learn how to manage the interaction torques and to find synergetic combinations between neighboring joints, achieving efficiency in multijoint coordination. It is interesting that such a synergetic solution is gradually found with a dynamics-model-free and cost-function-free approach. These "synergistic solutions" can be considered equivalent to "reduced-space coordination." Aligned solutions in a reduced dimension appeared under different load conditions with tacit learning, implying that they are also robust to changes in the dynamic conditions, since the phase form does not need to be modified substantially.
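The synergy metric itself is straightforward to reproduce; the sketch below computes the correlation coefficient between shoulder and elbow angle trajectories on synthetic data (the trajectories are illustrations, not the study's recorded ones):

```python
# Synergy metric as in Figure 5: Pearson correlation between the shoulder
# and elbow joint-angle time series over one reaching cycle.
import numpy as np

t = np.linspace(0, 2, 50)                       # one 2-s reaching cycle
shoulder = 0.4 * np.sin(np.pi * t)
elbow_synergetic = 0.8 * np.sin(np.pi * t)      # scaled copy: aligned phase
elbow_uncorrelated = 0.8 * np.sin(3 * np.pi * t + 1.0)

def synergy(theta_a, theta_b):
    """Correlation coefficient between two joint-angle trajectories."""
    return float(np.corrcoef(theta_a, theta_b)[0, 1])

print(synergy(shoulder, elbow_synergetic))      # close to 1: synergetic use
print(synergy(shoulder, elbow_uncorrelated))    # much lower: non-synergetic
```

A value near 1 corresponds to the aligned shoulder-elbow phase plots seen after tacit learning, and a low value to the scattered plots under PD control.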

The integrated torque term of the wrist joint is depicted in **Figure 6**. The plot in the cool color map represents the case with PD feedback; in this mode there is no integrated torque term in the controller, but the term was computed for comparison. The plot in the jet color map is the integrated torque term under tacit learning. The torque pattern of this term converges to a certain form, as seen in **Figure 6**. This torque pattern can be regarded as the part that compensates for gravity and the interaction torques of the dynamic system. In this sense, the torque integration term can be considered a feedforward (FF) controller that anticipates the environmental interactions during the reaching task. Motor learning is a process that develops a feedforward controller and minimizes the contributions from the feedback controller. During learning, the contribution from FF increased and the torque from FF converged to a certain pattern. Thus, tacit learning naturally matches this neurological learning process.
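A qualitative sketch of this mechanism, assuming a torque command of the form "PD feedback plus accumulated feedback torque" (the gains and time step are illustrative; the paper's Equation (2) gives the exact form):

```python
# Toy tacit-learning controller: a feedback (FB) torque plus an integral of
# past FB torque that gradually takes over as the feedforward (FF) term.
import numpy as np

class TacitLearningController:
    def __init__(self, n_joints, kp=50.0, kd=5.0, dt=0.04):
        self.kp, self.kd, self.dt = kp, kd, dt
        self.tau_ff = np.zeros(n_joints)   # accumulated feedforward torque

    def step(self, theta_err, dtheta_err):
        tau_fb = self.kp * theta_err + self.kd * dtheta_err  # FB part
        self.tau_ff += tau_fb * self.dt    # tacit accumulation of the command
        return tau_fb + self.tau_ff        # total joint torque

# As the tracking error decreases during learning, tau_fb shrinks while
# tau_ff retains the pattern compensating gravity and interaction torques.
```

This reproduces the qualitative shift of contributions from FB to FF described above, not the paper's exact dynamics.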

# **3.3. ENERGY EFFICIENCY AND TRACKING ERROR MINIMIZATION**

Energy consumption over one reaching cycle was measured and compared between PD control and tacit learning under different load conditions. Vertical reaching in this study was a motion between two vertically separated points at a frequency of 0.5 Hz; the energy consumption during every 2-s period was therefore calculated by summing each joint's energy consumption ∫ τ θ˙ dt. The transition of energy consumption under tacit learning with a 0.5 kg load is illustrated in **Figure 7** (middle). The corresponding endpoint error, computed as the root-mean-square (RMS) error between the target point and the current endpoint over one cycle, is plotted in **Figure 7** (top). The energy used in each joint is depicted in **Figure 7** (bottom) by the red line for the shoulder, green for the elbow, and blue for the wrist. In Equation (2), the PD feedback torque component can be regarded as a feedback (FB) controller, and the integration term as a FF controller. The energy consumption of each torque component was also computed, as shown in **Figure 7** (middle). For comparison, the transitions of endpoint error and energy consumption with a 0.5 kg load under feedback control alone are shown in **Figure 8**. The initial transient there is purely due to the energy stored in the mass-spring-damper property of PD control, since the arm starts from rest and moves toward the moving target; apart from this period, there is no adaptation in either the endpoint error or the energy consumption. From **Figure 7**, the endpoint error is minimized asymptotically during motor learning. The energy consumption is also minimized globally, while the contribution of the FF controller grows over the course of motor learning and the contribution of the FB controller shrinks. In addition, the energy used in the elbow increased, while that in the shoulder decreased.
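The per-cycle energy measure can be sketched as follows, integrating the absolute mechanical power τ θ˙ at each joint over the 2-s cycle (using absolute power, so that negative work also counts as consumption, is our assumption; the paper does not spell out the sign convention):

```python
# Per-cycle energy: sum over joints of the integral of |tau * dtheta| dt.
import numpy as np

def cycle_energy(tau, dtheta, dt):
    """Energy (J) consumed over one cycle.

    tau, dtheta : (N, m) arrays of joint torques and angular velocities
    sampled at N time points for m joints; dt is the sampling interval.
    """
    return float(np.sum(np.abs(tau * dtheta)) * dt)

# Example: constant 1 N*m torque at 1 rad/s on 3 joints over a 2-s cycle
N, m, dt = 50, 3, 0.04
E = cycle_energy(np.ones((N, m)), np.ones((N, m)), dt)  # 50 * 3 * 1 * 0.04
```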

The transitions of endpoint error and energy consumption over time are summarized in **Table 1** for all conditions. The energy consumption under tacit learning decreases during motor learning, while the energy remains constant under feedback control because there is no adaptive functionality. The endpoint error is likewise minimized, improving target tracking accuracy under tacit learning, while it remains constant under FB control. The energy consumption ratio of the FF controller was augmented, while that of the FB controller decreased. The figures in parentheses in **Table 1** indicate the cycle-to-cycle variability, used to evaluate the convergence of tacit learning. The error, the energy and the contributions of FF and FB all converge over the course of the optimization process in tacit learning.

Note that more energy is naturally required when the moving target is to be followed more precisely. The absolute energy consumption did not differ greatly between the feedback-only and tacit learning cases; however, the tracking error differed substantially between the two.

To realize accurate target tracking with PD control alone, much greater energy would be required by increasing the feedback gain, because the dynamic effects must be canceled precisely while conflicts between joints persist. With a 0.5 kg load, tacit learning clearly improved both accuracy and total energy through synergetic motor control of the shoulder and elbow joints, as a result of its optimality. In the high-gain PD case, a four times larger gain *k* of the task-space proportional feedback is used; even though the endpoint error is then of a similar scale to that in tacit learning, the energy consumption becomes larger. Between PD and Tacit, the same gain *k* of the task-space proportional feedback is used.

In addition, to show the performance for tasks in other directions, a result for multidirectional reaching is shown in **Figure 9**. The target line was tilted in 60◦ steps, giving three directions. For each target direction, learning started from the center position with no prior knowledge, and the same control gains were employed in all cases. The dynamic effects appear differently for different directions: the error and the energy consumption differed for each direction, and direction 2 in particular required more energy than the others. Recalling that the shoulder is located at the origin of the coordinate frame, the inertial momentum around the shoulder and the swing-up momentum around the elbow cannot be used effectively for direction 2, which likely resulted in the higher energy requirement.

# **3.4. RESULT WITH MODEL-BASED OPTIMIZATION**

We also analyzed the system with model-based optimization, using knowledge of the dynamics model. The endpoint error and energy consumption of the solution produced by model-based optimization are summarized in **Table 2**. The energy consumption is similar to that reached by tacit learning, implying that the tacit learning solution is close to the optimum found by the conventional optimization approach. Recalling that tacit learning, unlike such optimization, uses no knowledge of the dynamics, this result supports the advantage of the proposed method.


### **Table 1 | Endpoint RMS error (m) and energy consumption (J) in one cycle of reaching.**

*The figures in parentheses indicate the cycle-to-cycle variability to evaluate the convergence of tacit learning.*

*In high-gain PD, a four times larger gain k of the task-space proportional feedback is used. Even though the endpoint error is then of a similar scale to that in tacit learning, the energy consumption becomes larger. In contrast, in tacit learning the endpoint error is minimized together with the energy consumption. Between PD and Tacit, the same gain k of the task-space proportional feedback is used.*

**FIGURE 9 | Endpoint transition (no load at hand) for multidirectional reaching with tacit learning.** The target line was tilted in 60◦ steps. For each target direction, learning started from the center position with no prior knowledge. The resulting endpoint error and energy consumption are indicated, together with the contributions of the FF and FB controllers.

# **4. CONCLUSIONS AND DISCUSSION**

In this paper, we proposed a novel computational control paradigm for motor learning in a reaching task, in particular vertical reaching, which involves the management of interaction torques and gravitational effects. From the control results, we claim that the proposed method is valid for acquiring motor synergy in a system with actuation redundancy. We highlighted that tacit learning provides computational adaptability and optimality with a dynamics-model-free and cost-function-free approach, in contrast to previous studies. Energy-efficient solutions were obtained by

**Table 2 | Endpoint error and energy consumption in the case of model-based optimization.**


the emergence of motor synergy in the redundant actuation space. Not only were target tracking accuracy and energy efficiency improved, but the learning behavior was also supported by the observed shift of contributions between the FB and FF controllers, shown in **Figure 7** (middle). Phenomenologically, this shift fits well with the findings reported in internal model theory (Kawato, 1999).

Finally, the FF torque pattern converged to a specific temporal pattern that manages the given dynamics, as shown in **Figure 6**. This effect of command-signal accumulation may be regarded as phenomenological LTP and LTD provided by tacit learning. The above is a qualitative interpretation. As a theoretical explanation of the learning process, the motor command accumulation serves as an energy feedback carrying task-space directional information. Just as error is minimized in an error feedback loop, energy can be minimized when the integrated torque command is placed in the feedback loop through repeated interactions with the environment.

In this work, FEL was taken as an example to contrast with the proposed method, but the results do not actually conflict with FEL. On the one hand, the proposed method can be regarded as a special form of FEL. On the other, a neural network FEL architecture (Kawato et al., 1987) remains useful for memorizing the optimal control solutions of the obtained behavior, together with sensory feedback signals, for managing discrete movements. Thus, the proposed method could coexist with conventional neural network FEL, contributing its optimality feature in a complementary role.

In addition, the simulation results correspond well to the experimental results reported by Bastian et al. (1996). In their experiment, the inability to produce joint torques accommodating the dynamic interaction torques appeared to be an important cause of the kinematic deficits shown by subjects with cerebellar abnormalities: their reaching lacked coordination of the shoulder and elbow joints and followed a curved endpoint trajectory. These characteristics are equivalent to the results of **(A)** with FB control only in this study. In the PD feedback case, we can confirm the failure to compensate for gravity and the interaction torques in **Figure 3**, and the curved, non-synergistic joint use in **Figure 5**, which revealed the conflicts between the joints. The level of interference was higher in the 0.5 kg load condition because the interaction torque levels were higher.

In contrast, the experimental reaching of able-bodied subjects corresponds to result **(B)** with tacit learning. Bastian et al. (1996) suggest that a major role of the cerebellum is to generate joint torques that predict the interaction torques produced by other moving joints and compensate for them. This implies that the proposed computational learning paradigm represents well the learning principles actually at work in the cerebellum. Failure to manage interaction torques leads to a situation where one joint's motion disturbs the motion of another; the solution for managing these forces should therefore be found in the synergetic use of neighboring joints. When the conflicting torques are minimized, energy-effective motion naturally results. This may be one reason why we employ motor synergies that reduce interaction conflicts in a multijoint system. In the results, the energy used in the elbow increased while that in the shoulder decreased. This follows from the strategy of learning an energy-effective solution: angular acceleration at the shoulder involves all the arm segments from the upper arm to the hand, whereas angular acceleration at the elbow involves only the forearm and the hand, which are half the total mass of the arm dynamics system. Tacit learning found this solution solely through repeated interactions with the environment, without using a dynamic model or cost function. This process is similar to human motor control principles, whereby even an infant can improve motor control through repetition without thinking about it. The increasing contribution of the FF controller, which corresponds to so-called internal model development, also matches well the nature of computational motor learning in humans (Kawato, 1999).
In **Table 1**, the contribution ratio switches between FB and FF: initially FB is fully used, and as learning progresses the energy consumption of FB is reduced to 12% for no load and 16% for the 0.5 kg load condition, while the contribution of FF increases from 0% initially to 95% for no load and 96% for the 0.5 kg load condition. Since some conflict remains between the FB and FF solutions, the two contributions sum to more than 100%.

The results demonstrated in this paper also bear on Bernstein's DOF problem: how the CNS can find an optimal solution under actuation redundancy. The use of motor synergy was pointed out by Bernstein, but a fundamental motor control principle that can generate motor synergy had not yet been reported in neuroscience. Although this study treats a simple case of actuation redundancy, the proposed tacit learning is, to our knowledge, the first to generate motor synergy by a simple computational principle, one more likely to be embedded as a modular configuration in the CNS than the so-called cost-function-based mathematical optimization approach. The solution obtained by tacit learning also showed energy consumption similar to that of model-based optimization.

In this study, we did not tune the control gains, since our aim was to propose a new type of computational paradigm that can manage the optimization process in a redundant system. Nevertheless, under the given dynamic conditions and control gains, tacit learning showed that the contribution of the FF controller is augmented in the course of motor learning and the contribution of the FB controller is minimized for all directions and loads. More optimal solutions might be obtained under different control gain conditions, but a detailed analysis of the dynamic stability of the system would be required to generalize control gain tuning, which is left for future work. A simple cyclic reaching task was used here to show the optimization process in a redundant system; toward complex task management, additional work would be necessary to establish an internal model from the optimized torque solutions obtained with the proposed method.

A recent study from the group of G. Courtine (Van Den Brand et al., 2012) reported that smart circuits embedded in the brain stem and spinal cord may elaborate detailed motor commands toward optimal motor states, based on the supraspinal signal, the current limb position, and constraints. As the proposed controller is a simple modular paradigm with a distributed structure that can be embedded into individual controllers for multijoint coordination, achieving adaptivity and optimality for the total system, the proposed computational principle may also help to represent spinal adaptivity toward optimal solutions.

### **ACKNOWLEDGMENTS**

This work was supported by Toyota Motor Corporation.

# **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 12 May 2013; accepted: 10 February 2014; published online: 28 February 2014.*

*Citation: Hayashibe M and Shimoda S (2014) Synergetic motor control paradigm for optimizing energy efficiency of multijoint reaching via tacit learning. Front. Comput. Neurosci. 8:21. doi: 10.3389/fncom.2014.00021*

*This article was submitted to the journal Frontiers in Computational Neuroscience. Copyright © 2014 Hayashibe and Shimoda. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# A hexapod walker using a heterarchical architecture for action selection

### *Malte Schilling1 \*, Jan Paskarbeit 1, Thierry Hoinville2, Arne Hüffmeier 1, Axel Schneider 1, Josef Schmitz <sup>1</sup> and Holk Cruse1*

*<sup>1</sup> Center of Excellence 'Cognitive Interaction Technology,' Bielefeld University, Germany*

*<sup>2</sup> Biological Cybernetics, Bielefeld University, Germany*

### *Edited by:*

*Thomas Schack, Bielefeld University, Germany*

### *Reviewed by:*

*Florentin Wörgötter, University Goettingen, Germany Örjan Ekeberg, Royal Institute of Technology, Sweden Paolo Arena, DIEEI -University of Catania, Italy*

### *\*Correspondence:*

*Malte Schilling, Center of Excellence 'Cognitive Interaction Technology', University of Bielefeld, D-33594 Bielefeld, Germany e-mail: malteschilling@ googlemail.com*

Moving in a cluttered environment with a six-legged walking machine that has additional body actuators, and therefore 22 controlled DoFs, is not a trivial task. Even simple forward walking on a flat plane requires the system to select between different internal states. The orchestration of these states depends on walking velocity and on external disturbances. Such disturbances occur continuously, for example due to irregular up-and-down movements of the body or slipping of the legs, even on flat surfaces and in particular when negotiating tight curves. The number of possible states increases further when the system is allowed to walk backward, or when the front legs are used as grippers and cannot contribute to walking. Further states are necessary for expansions that allow for navigation. Here we demonstrate a solution for the selection and sequencing of the different (attractor) states required to control different behaviors, such as forward walking at different speeds, backward walking, and the negotiation of tight curves. This selection is made by a recurrent neural network (RNN) of motivation units that controls a bank of decentralized memory elements in combination with feedback through the environment. The underlying heterarchical architecture of the network allows various combinations of these elements to be selected. This modular approach, an example of neural reuse of a limited number of procedures, allows for adaptation to different internal and external conditions. We sketch how this approach may be expanded to form a cognitive system able to plan ahead. The architecture is characterized by different types of modules arranged in layers and columns, but the complete network can also be considered a holistic system showing emergent properties that cannot be attributed to any specific module.

### **Keywords: insect locomotion, motor control, decentral architecture, action selection**

# **INTRODUCTION**

In this article, we propose a simple neural architecture that consists of basically independent, parallel sensori-motor procedures, or modules, together with a means of orchestrating them. The architecture also lends itself to easy expansion. It is not based on a specific biological brain structure, but is inspired by behavioral experiments on insects (Cruse et al., 2009a), namely walking in unpredictable environments, studied in stick insects, and navigation, studied in desert ants and honey bees. Nonetheless, such a modular architecture may be of broader interest, because many authors assume that a modular structure is a basic property of brains in general.

For example, Anderson (2010) has argued that evolution had to find specific solutions for quite different requirements posed by specific environmental conditions, such as locomotion, mating, navigation or feeding, and that problems occurring later in evolutionary development may be solved by combining existing (functional) modules in different ways, following the principle of "neural reuse" (Anderson, 2010). In this way, different procedures might develop that ultimately serve the same or closely related purposes, leading to redundant structures. In insect navigation, for example, path integration and landmark navigation are used in parallel: different input modules drive the same output elements. Conversely, the same motor structure may be used for different purposes. This is obvious in the praying mantis, whose front legs can be used for walking but also for a completely different purpose, namely catching prey, which requires specific neuronal structures for these different functions. Furthermore, Flash and Hochner (2005) have reviewed results on both vertebrates and invertebrates that lead them to the interpretation that "many different movements can be derived from a limited number of stored primitives," and that these movements can further be "combined through a well defined syntax of action to form more complex action."

Many results therefore suggest the existence of such discrete primitives. The question of how behavioral choice is performed, i.e., how the lower-level elements are selected and orchestrated to control a specific behavior, is still open (Briggman and Kristan, 2008). Activating such a subset means selecting a specific internal state that sets priorities. At a higher level, this corresponds to the faculty of selective attention, i.e., the enactment of internal states that control which sensory input may be exploited. This top-down influence may be complemented by bottom-up attention: sufficiently strong, specific sensory inputs could influence and change the internal state of the system.

In this article, we propose a way to address this problem by exploiting an artificial neural network, Walknet, that has been developed to describe a large amount of biological data concerning insect, i.e., hexapod, (forward) walking (Dürr et al., 2004; Schilling et al., 2013). This network consists of a number of parallel modular elements able to control "microbehaviors," for example the swing movement of the left front leg or the stance movement of the right middle leg. This decentralized architecture is able to control forward walking over a continuous range of velocities, as observed in stick insects (Graham, 1972) and Drosophila (Wosnitza et al., 2013). Walknet can further describe a large number of behavioral experiments performed with stick insects, including difficult cases such as climbing over a gap of about body size. In doing so, the controller shows robustness to disturbances arising from an unpredictable environment as well as from the complex dynamics of the body itself.

Thus, Walknet is able to deal with complex behavior but still controls only one overarching context, namely forward walking, as will be briefly reviewed in Section The Basic Version (for a recent review see Schilling et al., 2013). In this structure, only local decisions are required, concerning the choice between swing and stance for a given leg. Walknet has been expanded by a body model, briefly reviewed in Section Body Model (for details see Schilling, 2011; Schilling et al., 2012), that represents the kinematics of the body, i.e., all 18 joints of the six legs plus two body joints, each with two degrees of freedom, as found in stick insects and as used by the physical hexapod robot Hector (Schneider et al., 2011). However, to control different behaviors, further decision structures are required. A simple case concerns the decision between forward and backward walking. In addition, the system may decide between six-legged and four-legged walking, where the front legs may be used for other purposes, for example as grippers, as in the above-mentioned mantis. A more general problem concerns the ability to exhibit trial-and-error behavior, i.e., to select a behavior that is not selectable in the current context. This faculty is a prerequisite for a further capability, namely the ability to plan ahead (i.e., internal simulation of a selected behavior), a property which may not be found at the insect level. Another problem concerns how biological systems may be able to invent and deal with symbols, or new concepts; hints exist that this property can already be found in insects (Giurfa et al., 2001), at least at a simple level.

Based on the existing Walknet, in this article we introduce a neural architecture that provides a precondition for dealing with these problems. To this end, we adopt a structure already applied successfully within a network, Navinet, that is able to control insect-like navigation (Cruse and Wehner, 2011; Hoinville et al., 2012). Navinet can be used as an expansion of Walknet to allow for decisions at higher levels of integration. For example, a foraging ant may decide between different food sources stored in memory and, at a lower level, between inbound and outbound travel. After having selected the food source, the ant may decide whether or not to attend to a visual landmark. In Navinet, this ability is given by a so-called motivation unit network, a structure that will now be applied to Walknet, too (Section Motivation Unit Network). This expansion allows the complete system to decide between different behaviors at different levels of integration. In the final version, this will lead to an architecture consisting of many parallel "columns" organized in four layers. As the motivation unit network can be considered the "backbone" of the architecture proposed here, we call the latter the Motivation Unit Based Columnar Architecture (MUBCA).
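The core mechanism of such a motivation unit network can be sketched as a small recurrent net with self-excitation and mutual inhibition, where the most strongly activated unit "wins" and enables its column of procedures. The unit names, weights, and clipped-linear activation below are illustrative assumptions, not values from the Navinet implementation:

```python
# Toy motivation-unit network: two competing units with self-excitation
# (diagonal) and mutual inhibition (off-diagonal); activations are clipped
# to [0, 1] as a piecewise-linear activation function.
import numpy as np

units = ["forward", "backward"]
W = np.array([[ 1.2, -1.0],
              [-1.0,  1.2]])

def settle(inputs, steps=50):
    """Iterate the recurrent dynamics until the winner-take-all state."""
    a = np.zeros(len(units))
    for _ in range(steps):
        a = np.clip(W @ a + inputs, 0.0, 1.0)
    return a

a = settle(np.array([0.4, 0.1]))    # sensory/context bias toward "forward"
winner = units[int(np.argmax(a))]
```

With the bias above, the "forward" unit saturates at 1 while "backward" is suppressed to 0, mirroring how a motivation unit selects one internal state and thereby one set of active procedures.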

In this article we show how Walknet can be expanded by motivation units while still inheriting its earlier properties. To this end, the behavior of the expanded Walknet is tested in different situations of forward walking, including starts from uncomfortable leg configurations and the negotiation of very tight curves. These tests are critical for studying the stability of the system because, due to the irregular stepping patterns resulting from the dynamical properties of the body, including possible leg slipping, the complete system is subject to continuous disturbances. We further show that the network can select between states of forward and backward walking. To illustrate how the motivation unit network could be expanded with further procedures, we discuss as an example the case of switching between six-legged and four-legged walking (Section Discussion) and note how Walknet can be connected with Navinet to equip the complete system with insect-like navigation procedures such as vector navigation and landmark navigation. In addition, to illustrate the capacity of this architecture, we briefly sketch how it could be exploited to equip the complete system with cognitive abilities, in the sense of being able to plan ahead, and how the use of symbols may become possible.

# **WALKNET**

# **THE BASIC VERSION**

Tightly based on the morphology of a stick insect (*Carausius morosus*), the walker has six legs, each equipped with three joints. The abdomen and the head of the insect are not functionally relevant for walking, so the body of the walker consists of only three functionally relevant body segments, to which the leg pairs are attached. The body segments are connected by joints, each allowing two degrees of freedom (up-down as well as sideways movements of the body segments). The controller therefore has to deal with 22 degrees of freedom (DoF). As the position of one body segment in space is defined by only six DoFs (three for position, three for orientation), 16 DoFs are left free for the controller to decide upon. However, "free" does not mean that the controller may leave the decision open; rather, it has to make these 22 decisions in a sensible way at every moment in time and, as mentioned, in an unpredictable environment. Note that in many similar robots the number of free DoFs is artificially reduced by a central controller (e.g., an explicit tripod controller), which simplifies control but, on the other hand, restricts the flexibility of the system.

As a first step, and to make it simple for our understanding (though not necessarily for the system itself), the walker is not equipped with distance sensors such as vision or acoustic sensors, but only with tactile sensors situated in the legs (and possibly the antennae), measuring contact with external objects, and with proprioceptors measuring the positions and velocities of the joints.

The walking system described in the following, which has been shown to be able to control a six-legged robot (Dürr et al., 2004; Schmitz et al., 2008), is based on behavioral (and to some extent neurophysiological) studies on insects, in particular stick insects. For the purpose required here, the reactive controller will also be equipped with elements forming plausible expansions. At first, we briefly describe the essentials of the earlier version, Walknet, and then introduce the expansions in Sections Further Procedural Elements and Body Model.

Forward walking in stick insects, which has already been studied intensively (Cruse et al., 1998), consists, on the leg level, of the stance movement, during which the leg maintains ground contact and is retracted to propel the body forward while supporting the weight of the body, and the swing movement, during which the leg is lifted off the ground and moved in the direction of walking to touch down at the location where the next stance should begin. Experiments on the walking stick insect have shown that the neuronal system is organized in a decentralized way (Wendler, 1964; Bässler, 1983; Cruse, 1990). Derived from these results, a model has been proposed in which a separate controller is attributed to each leg (Dürr et al., 2004). These single-leg controllers are assumed to be situated in the thoracic ganglia [for a review see (Bässler and Büschges, 1998)]. **Figure 1** sketches the approximate anatomical arrangement of the controllers and the numbering of the legs. Each controller is in charge of the behavior of the connected leg: it decides which behavior is executed by this leg and in which way the joints are moved. **Figure 2** shows the details of the controllers as used in Walknet for two legs (leg1, leg2, e.g., the right front leg and the right middle leg). A single leg controller mainly consists of several procedures that are realized by artificial neurons forming a local, in general recurrent, neural network (RNN). In most cases these networks consist of perceptron-like feedforward networks. These modules might receive direct sensory input and provide output signals that can be used for driving motor elements, but modules may also receive input from other modules. All these networks may be considered to form elements of the procedural memory.
The two most important procedural elements in our example are the Swing-net, responsible for controlling a swing movement, and the Stance-net controlling a stance movement [**Figure 2**, see (Dürr et al., 2004; Schumm and Cruse, 2006) for details concerning the Swing-net, and (Schmitz et al., 2008) for Stance-net]. In addition, each leg possesses a so-called Target\_fw-net (**Figure 2**). This net influences the Swing-net to determine the endpoint of the swing movement during forward walking. During normal forward walking the swing end-position is situated in the anterior section of the leg's range of movement (anterior extreme position, AEP). During stance the leg is moved backward until it reaches the posterior extreme position (PEP), the latter being represented by another memory element (not depicted in **Figure 2**).

**FIGURE 1 | Schema of the morphological arrangement of the leg controllers and the coordination influences (1–6) between legs.** Legs are marked by L for left legs and R for right legs and numbered from 1 to 3 for front, middle, and hind legs, respectively. The question mark indicates that there are ambiguous data concerning this influence.

Furthermore, a leg controller must also take into account the interaction with the other legs. Some of these interactions are mediated directly by the body and through the environment, making explicit computations superfluous [see, e.g., the local positive velocity feedback approach (Schmitz et al., 2008)]. While the physical coupling through the environment is important, it is not sufficient. In addition, the controllers of neighboring legs are coupled via a small number of channels transmitting information concerning the actual state of that leg (e.g., swing, stance) or its position (i.e., values of joint angles). These coordination rules were derived from behavioral experiments on walking stick insects (Cruse, 1990; Dean, 1991a,b, 1992a,b). In **Figure 1** the channels are numbered 1–6. Coordination rules 1–3 influence the length of the stance movement by influencing the transition from stance to swing movement, i.e., they change the PEP value. The Target-nets (rule 4) influence the AEP. As an example, in **Figure 2** (dashed line) only one connection is shown, rule 1 (r1), which suppresses the start of a swing movement of the anterior leg (in this case the front leg, leg1) during the swing movement of the posterior leg (here the middle leg, leg2).
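The gating effect of rule 1 can be sketched in a few lines of Python (a hedged illustration, not the authors' implementation; the dictionary fields `state`, `position`, and `pep` are hypothetical names, with positions measured along the body axis so that stance decreases the value toward the PEP):

```python
def swing_allowed(leg, posterior):
    """Coordination rule 1 (sketch): while the posterior neighbor is
    swinging, the start of a swing of the anterior leg is suppressed;
    otherwise a swing may start once the leg has reached its PEP."""
    if posterior["state"] == "swing":
        return False                        # rule 1: suppress swing start
    return leg["position"] <= leg["pep"]    # stance has moved the leg back to the PEP

# The middle leg's swing blocks the front leg's swing start:
blocked = swing_allowed({"position": -0.1, "pep": -0.05}, {"state": "swing"})
```

Once the posterior leg touches down, the same call returns `True` and the anterior leg may lift off.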

The local sensory influences and the coordination influences are integrated in the so-called analog selector net (Schilling et al., 2007) that decides whether a swing movement or a stance movement is performed. Activation of the Stance-net is triggered by ground contact, activation of the Swing-net is triggered when the current PEP value is reached. In **Figure 2** the selector net is represented by two units (marked red) which are connected by mutual inhibition, thus forming a winner-take-all network (WTA-net, for details see Section Motivation Unit Network). The activation of such a unit (between 0 and 1) controls the output of the corresponding procedure in a multiplicative way. Neither the representation of the default PEP value nor the detailed influence of the coordination rules on the actual PEP value is depicted in **Figure 2** (for reviews see Dürr et al., 2004; Schilling et al., 2013).
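The selector net's competition can be illustrated with a minimal two-unit sketch (the weights and the sensory bias of 0.5 are illustrative choices, not the values used in Walknet): mutual inhibition keeps one unit near 1 and the other near 0, and a sensory event flips the winner.

```python
import numpy as np

def clamp(x):
    """Piecewise linear activation with lower and upper limits 0 and 1."""
    return np.clip(x, 0.0, 1.0)

def selector_step(a_swing, a_stance, ground_contact, pep_reached,
                  w_self=0.9, w_inhib=0.8):
    """One relaxation step of a two-unit winner-take-all selector.
    Sensory events bias the competition: ground contact favors stance,
    reaching the PEP favors swing."""
    in_swing = w_self * a_swing - w_inhib * a_stance + (0.5 if pep_reached else 0.0)
    in_stance = w_self * a_stance - w_inhib * a_swing + (0.5 if ground_contact else 0.0)
    return clamp(in_swing), clamp(in_stance)

# Start in swing; ground contact then drives the net into the stance state.
a_sw, a_st = 1.0, 0.0
for _ in range(20):
    a_sw, a_st = selector_step(a_sw, a_st, ground_contact=True, pep_reached=False)
```

The activations would then gate the outputs of Swing-net and Stance-net multiplicatively, as described above.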

Kinematic and dynamic simulations as well as tests on robots have shown that this network can control walking at different velocities, producing different insect gaits including the continuous transition between the so-called tetrapod and tripod gaits, negotiate curves (Kindermann, 2002), climb over obstacles (Dürr et al., 2004) and over very large gaps (Bläsing, 2006), and cope with leg loss (Schilling et al., 2007). Thus, Walknet constitutes a free-gait controller in which the gaits are not explicitly implemented but emerge from a strictly decentralized architecture. Including some more recent extensions, Walknet can describe further behaviors observed in stick insects walking on variously shaped substrates (e.g., Diederich et al., 2002; Schumm and Cruse, 2006).

In the following, we will expand Walknet as illustrated in **Figure 3**. These expansions concern (i) the introduction of further procedural elements (Section Further Procedural Elements), (ii) a body model (Section Body Model), and (iii) a motivation unit network (Section Motivation Unit Network). The properties of the robot simulator will be briefly sketched in the Appendix.

### **FURTHER PROCEDURAL ELEMENTS**

Further procedures are required when the controller should be able to perform not only one type of behavior, for example forward walking, but also others like backward walking. Specifically, further Target-nets are introduced which can influence the corresponding Swing-net to move the leg in the posterior direction, as during backward walking (bw) the swing end position is situated in the posterior (rear) section (**Figure 3**, Target\_bw-net). Correspondingly, the default end position of the stance movement is represented by a "PEP-net," one for forward walking and another one for backward walking. For simplicity the latter are not depicted in **Figure 3**. The Target\_fw-net mentioned above can be realized by a two-layer perceptron (Dean, 1990). Its function is to compute the anterior target position that the leg aims at during the swing movement. The anterior target is usually situated directly behind the current position of the anterior neighboring leg; the computation therefore represents the inverse kinematics. Alternatively, and this version is used in the simulation shown here, the Target-nets and the PEP memories are realized as a three-unit RNN with bias, representing a static memory element storing the corresponding target position. For backward walking, in addition, the corresponding coordination rules are required, but these are not depicted in **Figure 3**.
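The idea of a static memory element realized as a recurrent net with bias can be sketched as follows (a simplified linear recurrence with one unit per coordinate; the weight, bias, and target values are illustrative, and the actual Target-nets may differ):

```python
import numpy as np

# A three-unit linear recurrence with bias whose fixed point stores a
# target position (x, y, z). With recurrent weight w < 1, the net relaxes
# from any start to a* = b / (1 - w); choosing b = (1 - w) * target makes
# the stored value exactly the target.
w = 0.5
target = np.array([0.15, 0.10, -0.05])   # e.g., an AEP in leg coordinates (m)
b = (1 - w) * target

a = np.zeros(3)
for _ in range(40):
    a = w * a + b                         # recurrent update toward the fixed point
# a now holds (numerically) the stored target position
```

The stored value is thus maintained by the network dynamics themselves rather than by a dedicated register, which is what makes such a memory element compatible with the rest of the recurrent architecture.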

### **BODY MODEL**

A second important expansion concerns the construction and implementation of a body model. This body model is represented by a specific RNN (Schilling, 2011) and has a modular structure (Schilling et al., 2012). It consists of six networks, each representing one leg. These modules are connected on a higher level, forming a seventh network representing the whole body. This network represents the central body and the legs in an abstract form only. In **Figure 3** the elements of the body model are marked in blue. Thus, the body model has a modular structure which, as it is constructed as an RNN, at the same time comprises a holistic system [**Figure 4**, for details concerning the body model see (Schilling, 2011; Schilling et al., 2012)].

In normal walking the body model is used for controlling stance movements. It is used in forward and backward walking as well as in negotiating curves and provides joint control signals to the system. Sensory data are fed into the body model. Due to its holistic structure, the body model integrates redundant sensory information and is able to correct possible errors in the sensor data (Schilling and Cruse, 2012). As will be sketched in the Discussion, due to its capability for pattern completion, this model can also be used as a forward model (Schilling, 2009). The model therefore also allows for prediction.

The function of the body model is to mediate the coupling between the single leg vectors (**Figure 4**). During a movement these vectors have to be moved in a coherent way. While the body moves to the front, the feet should stay at the same place on the ground, i.e., the relative position between the feet must not change. As the model mirrors the 22 degrees of freedom of the insect body, the task is underdetermined; it is therefore still a hard problem, and a unique solution is not directly computable (Schilling and Cruse, 2012).

As a solution, we apply the idea of the passive motion paradigm to this problem (von Kleist, 1810; Mussa Ivaldi et al., 1988; Loeb, 2001). Like a simulated marionette puppet (**Figure 5B**), the internally simulated body is pulled by its head in the direction of the desired body movement (**Figure 5B**, delta\_0), provided, for example, by a vector based on sensory input from the antennae (Dürr and Schütz, 2011) or, if available, by visual or acoustic input (**Figure 3**, sensory input, **Figure 5B**, delta\_0, delta\_back). As a consequence, the stance legs of the puppet move in an appropriate way. The changes of the simulated joint angles, as well as those of the body joints, can be used as commands to control the actual joints. Given a body model that represents the kinematic constraints of the real body, we obtain in this way an easy solution of the inverse kinematic problem, i.e., of the question how the joints of the legs standing on the ground have to be moved in concert to propel the body (for details and application to the control of curve walking see Schilling et al., 2012). In this case, the positive feedback input given to the Stance-net, as used in the earlier version (Schmitz et al., 2008), is no longer necessary (although application of both influences may be sensible). The Stance-net can therefore be considered to consist only of integral controllers, one for each leg joint, to which the body model provides the reference inputs.
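A minimal sketch of the passive motion paradigm, assuming a planar two-joint chain rather than the full MMC body model described by the authors: the pull vector at the end point is mapped onto the joints by the Jacobian transpose (a virtual force), so the joints yield to the pull without any explicit inverse kinematics being computed. The gain, link lengths, and target are illustrative.

```python
import numpy as np

def fk(theta, lengths):
    """Forward kinematics of a planar chain: joint angles -> end point."""
    angles = np.cumsum(theta)
    return np.array([np.sum(lengths * np.cos(angles)),
                     np.sum(lengths * np.sin(angles))])

def passive_pull(theta, lengths, target, gain=0.1, steps=200):
    """Pull the end point toward `target`; the joints yield like those of
    a marionette pulled by a string."""
    n = len(theta)
    for _ in range(steps):
        pull = target - fk(theta, lengths)        # delta vector at the end point
        angles = np.cumsum(theta)
        J = np.zeros((2, n))                      # planar end-point Jacobian
        for j in range(n):
            J[0, j] = -np.sum(lengths[j:] * np.sin(angles[j:]))
            J[1, j] = np.sum(lengths[j:] * np.cos(angles[j:]))
        theta = theta + gain * (J.T @ pull)       # joints follow the pull
    return theta

lengths = np.array([1.0, 1.0])
theta = passive_pull(np.array([0.3, 0.3]), lengths, target=np.array([1.2, 0.9]))
```

After relaxation the end point has followed the pull to the target; applied to all stance legs of the internal body model, the resulting joint-angle changes serve as the motor commands described above.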

# **MOTIVATION UNIT NETWORK**

To allow the system to select autonomously between different behaviors, for instance standing and walking, or forward and backward walking, Walknet is expanded by the introduction of an RNN consisting of so-called motivation units (**Figure 3**), as has been done for Navinet (Cruse and Wehner, 2011; Hoinville et al., 2012). The units used here are artificial linear summation neurons with a piecewise linear activation function with lower and upper limits of 0 and 1, respectively.

The function of a motivation unit as applied here is to control to what extent the corresponding procedural element contributes to the behavior. To this end, these units influence the strength of the output of this network in a multiplicative way. In an earlier version (**Figure 2**, Schilling et al., 2007) such a network containing only two motivation units was applied to control the output of the Swing-net and the Stance-net. In this network each motivation unit reinforces itself, whilst the two motivation units mutually inhibit each other, forming a WTA-net. Due to this competition, only one of the two behaviors is active most of the time. This decision is influenced by sensory signals acting on the motivation units. When the leg touches the ground at the front during a swing movement, the ground contact causes switching to the stance movement. When the leg reaches the PEP during stance, swing is initiated. The introduction of motivation units was inspired by Hassenstein (1983) and Maes (1991) and is based on the finding of Schumm and Cruse (2006) that there are indeed variable motivational states for individual procedures, in this case the swing controller.

Here we expand this network in two ways (**Figure 3**). First, each procedural element is equipped with a motivation unit. This means that not only the Swing-net and Stance-net, but also all Target-nets and all PEP memories, as well as the leg coordination channels, have their own motivation units. Motivation units for Target-nets and PEP-nets are required to select between different target positions for forward walking and backward walking. The motivation units influencing the coordination rules are motivated by Dürr (2005) and Ebeling and Dürr (2006), who showed that coordination influences can be modulated (e.g., during curve walking).

Second, motivation units can also be used to influence other motivation units via excitatory or inhibitory connections. This is illustrated in **Figure 3**. Units which belong to the procedural nets controlling the six legs (only two legs are depicted in **Figure 3**) show mutual positive connections to a unit termed "walk" in **Figure 3**. This unit serves the function of arousing all units possibly required when the behavior walk is activated. Neurophysiological grounding of such an influence is given by Büschges (1998): when walking is started, the basic potential level in a number of relevant neurons is increased. In addition, following earlier authors (see Discussion), we introduce units "forward" and "backward" to activate the procedures required for forward or backward walking (**Figure 3**, fw, bw), respectively, by selecting specific Target-nets and PEP-nets. It is only indicated in this figure that the unit "walk" may be coupled via mutual inhibition to other units that stand for different behaviors, for example standing still (unit "stand"). However, the corresponding procedures are not depicted (for a further expansion see **Figure 10**). Apart from the units fw and bw, it is also not shown that these "higher-level" motivation units, like the motivation units of the Swing-net and Stance-net, may receive direct or indirect input from sensory units that influence their activation.

As illustrated in **Figure 3**, this structure, hierarchical at first sight, does not in general form a simple, tree-like arborization. As indicated by the bi-directional connections, the motivation units form a RNN coupled by positive (arrowheads) and negative (T-shaped connections) influences (for details concerning the weights used see the Appendix). This structure may therefore be better described as "heterarchical." The combination of excitatory and inhibitory connections forms a network that can adopt various stable attractor states. Some of these motivation units are coupled by local winner-take-all connections. This is true for the Swing-net and Stance-net of each leg, as well as for the motivation units for forward and backward walking. Thereby, a selection of one of the available Target-nets and of the PEP-nets is possible. Excitatory connections between motivation units allow for building coalitions. As depicted in **Figure 3**, there are different overlapping ensembles. For example, all "leg" units and the unit "walk" are activated during backward walking and during forward walking, but only one of the two units termed "fw" (forward) and "bw" (backward), and only some of the targeting modules, are active in either case. This architecture can produce various stable attractor states or "internal states" (see Appendix for details). Such a state protects the system from responding to inappropriate sensory input. For instance, as a lower-level example, depending on whether a leg is in swing state or in stance state, a given sensory input can be treated differently: stimulation of a specific sense organ leads to a levator reflex (when hitting an obstacle the leg is briefly retracted and then lifted) when in swing, but not during stance (see **Figure 5** of Dürr et al., 2004). Correspondingly, internal states can be distinguished on higher levels, for example walking, standing still, or foraging (in the case of Navinet).
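The coalition-forming dynamics can be sketched with a five-unit toy network (the weights are illustrative, not those of the Appendix, and only one leg unit is shown): starting from the forward-walking attractor, a transient sensory pulse to bw flips the net into the backward-walking coalition, where it remains after the pulse ends.

```python
import numpy as np

def clamp(x):
    return np.clip(x, 0.0, 1.0)

# Units: 0 stand, 1 walk, 2 fw, 3 bw, 4 leg (one leg unit shown for brevity).
# Rows give the input weights of each unit: self-excitation (0.7), WTA pairs
# stand/walk and fw/bw (-1.0), and excitatory coalition links (0.3, 0.5).
W = np.array([
    [ 0.7, -1.0,  0.0,  0.0,  0.0],   # stand
    [-1.0,  0.7,  0.3,  0.3,  0.5],   # walk
    [ 0.0,  0.3,  0.7, -1.0,  0.0],   # fw
    [ 0.0,  0.3, -1.0,  0.7,  0.0],   # bw
    [ 0.0,  0.5,  0.0,  0.0,  0.7],   # leg
])

a = np.array([0.0, 1.0, 1.0, 0.0, 1.0])   # forward-walking attractor
for t in range(10):
    sensory = np.zeros(5)
    if t < 3:
        sensory[3] = 1.5                  # transient input to bw (e.g., antennal touch)
    a = clamp(W @ a + sensory)
# a has settled into the backward-walking coalition: walk, bw, and leg active.
```

Note that the "walk" and "leg" units stay active throughout the switch, while only the fw/bw pair changes its winner, mirroring the overlapping ensembles described above.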

# **RESULTS**

As this article focuses on demonstrating the structure and functioning of the heterarchical network in cooperation with the decentralized procedural memory, we will not report a detailed quantitative evaluation of the functional properties of the walking system as has been done by Kindermann (2002). Instead, we show six examples of walking situations for which various combinations of active motivation units are required (five cases for forward walking, one case for backward walking), such as different walking velocities, "uncomfortable" starting configurations, or curve walking. The behavior is illustrated by plotting the footfall pattern, i.e., for each leg the state swing (black) or stance (white) over time, to illustrate the temporal structure of the gaits. In addition, in the supplement we provide videos showing the behavior of the robot, the temporal development of the footfall patterns, and the temporal sequence of the internal states, represented by the activation of all motivation units.

During forward walking, continuously active units are walk, fw (forward), all six leg units and the units of the corresponding Target-nets and PEP-nets. More or less regularly alternating are the Swing and Stance units of the six legs.

### **STRAIGHT WALKING**

In forward walking a vector (**Figure 5B**, delta\_0), which might be provided by tactile input from the antennae or by visual input, pulls the body model straight forward. Velocities are given as a dimensionless number (relative velocity v\_rel = swing duration/stance duration). **Figure 6A** (**Movie S1**) shows high-velocity walking (v\_rel = 0.5) corresponding to what is usually called the tripod gait (at least three legs are on the ground at any time). **Figure 6B** (**Movie S2**) shows a walk with lower velocity (v\_rel = 0.4), usually called the tetrapod gait (at least four legs are on the ground at any time). **Figure 6C** (**Movie S3**) shows very slow walking (v\_rel = 0.15), sometimes termed the wave gait. Note that, as in the insects, there is no clear separation between these "gaits." Rather, the examples shown in **Figure 6** are taken from a continuum which depends on one control parameter, the velocity.

**Figure 6D** (**Movie S4**) shows a walk (v\_rel = 0.3) starting from a difficult starting configuration (see legend). In this case, contralateral leg pairs started with exactly the same leg position. This leads to a gallop-like stepping pattern (see the first three steps). A symmetry break occurs due to minor irregularities in the ODE simulation (e.g., short slipping of a leg). After about three to four further steps, a stable wave gait pattern can be observed (videos for all example walking cases are provided as supplementary material).

# **NEGOTIATING CURVES**

Still in forward mode, the body model can also be used for walking in curves, which leads to another kind of leg coordination. Only the pull vector acting on the body model has to be adjusted so that it points in the direction in which the agent should walk. As mentioned, the pull vector may be provided by signals from the antennae or via visual input. The body model is pulled (at the front of the first segment, **Figure 5B**, delta\_0) in this direction, and all standing legs as well as the body segments follow (Schilling et al., 2012). **Figure 7** (**Movie S5**) shows a simulation run where the body model is pulled to the front and the right by an angle of 12◦. Velocity is set to v\_rel = 0.4, i.e., no different velocities are required for right and left legs, in contrast to the simulation of Kindermann (2002). We also did not change the

**FIGURE 6 |** (caption continued) Starting positions (in m) for **(D)**: L1: 0.15, R1: 0.15, L2: -0.1, R2: -0.1, L3: -0.02, R3: -0.02. Black bars indicate swing movement of the respective leg: left front, middle and hind leg, right front, middle and hind leg, from top to bottom. Abscissa is simulation time. The lower bars indicate 500 iterations corresponding to 5 s real time.

nominal AEP and PEP values, in contrast to the behavior observed in the insects (Jander, 1985; Dürr, 2005; Dürr and Ebeling, 2005; Rosano and Webb, 2007; Gruhn et al., 2009). As has been shown by these authors, during swing the legs, in particular the front legs, target sideways, which may even allow these animals to turn on the spot (Cruse et al., 2009b). Nonetheless, the simulated agent can negotiate curves as tight as a radius of about one body length (distance between front leg coxa and hind leg coxa, 578 mm, see **Figure 8B**), compared to the approach of Kindermann (2002), whose tightest curves had a diameter of about three body lengths. During the sequence depicted in **Figure 7**, a curve of about 180◦ has been negotiated. In the simulation, the inner legs show much smaller stance velocities and smaller step amplitudes than the outer legs. As can be observed in **Figure 7**, the inner hind leg shows much smaller step frequencies and, depending on the radius of the curve, may even remain on the ground during the complete turn (not shown). Both results are comparable to those observed in insects (Dürr, 2005; Rosano and Webb, 2007; Gruhn et al., 2009).

**Figure 8A** illustrates the movement of the leg tips during stance in a top-down view, using the coordinate system of the last body segment. Shown is at least one stance phase of each leg over 2 s of simulation time, starting 6 s after curve walking has been initiated. This means that initialization of the curve has been completed and the behavior is now in a stable

**FIGURE 7 |** (caption, beginning truncated) **... internal body model.** The internal body model is constantly pulled to the front and the right. The complete run shown corresponds to a turn of about 180◦. Starting positions (in m, origin is position of coxa): L1: 0.20, R1: 0.05, L2: -0.04, R2: -0.14, L3: -0.02, R3: -0.22; for further explanations see **Figure 6**.

**FIGURE 8 | Curve walking of the model.** In **(A)** the movement of the leg tips during the stance movement is shown with respect to the last body segment. **(B)** shows snapshots of the movement over time. Every second of simulation time the body posture is shown, the postures of the legs in stance are shown four times each second.

execution state. This is also shown by the stable orientation of the body segment angles [the first segment joint has a mean value of 15.0◦ (std. ± 1.27), the second segment joint a mean value of 4.8◦ (std. ± 0.35)]. The outer legs perform faster movements during stance than the inner legs do (see the different distances between symbols). The outer front leg shows movements far away from the body, mostly because the body is pulled away from its foot point. In **Figure 8B** the movement is shown as a sequence, emphasizing the tightness of the curve. Here, the body posture is shown every second of simulation time, and in addition the postures of the standing legs are given four times per second. **Figure 8B** includes the initiation of curve walking (see also the video provided as supplementary material).

# **BACKWARD WALKING**

To trigger backward walking, the motivation unit for backward walking (**Figure 3**, bw) has to be activated. In animals, this behavior can be elicited by tactile stimulation of both antennae (Graham and Epstein, 1985). Motivation unit bw activates the corresponding Target-nets and PEP-nets. Activation of this motivation unit additionally sets the forward pull vector to zero and activates a displacement vector attached to the back of the last segment, thereby pulling the model backwards (**Figure 5B**, delta\_back). Furthermore, the leg coordination rules required for backward walking are switched on (here we used a mirror-image version of the rules used for forward walking). As in straight forward walking or curve walking, this vector is assumed to be provided by appropriate sensory input.
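The multiplicative effect of the fw/bw units on the two pull vectors can be sketched as follows (the vector values and names are illustrative; the real pull direction would come from sensory input as described above):

```python
import numpy as np

FORWARD_PULL = np.array([1.0, 0.0])    # applied at the front of the first segment
BACKWARD_PULL = np.array([-1.0, 0.0])  # applied at the back of the last segment

def pull_vectors(a_fw, a_bw):
    """Gate the body-model pull vectors by the fw/bw motivation unit
    activations (each in [0, 1]); the inactive direction is zeroed out."""
    delta_0 = a_fw * FORWARD_PULL
    delta_back = a_bw * BACKWARD_PULL
    return delta_0, delta_back

# Backward walking: fw loses the WTA competition, so delta_0 vanishes.
delta_0, delta_back = pull_vectors(0.0, 1.0)
```

Because the fw/bw units form a WTA pair, at most one of the two pull vectors is nonzero at any time.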

In the case of backward walking (**Figure 9**, v\_rel = −0.4, **Movie S6**) the attractor states of the motivation unit network are characterized by the continuously active units walk, bw (backward), all six leg units, as well as the units of the corresponding Target-nets and PEP-nets required for backward walking. As in forward walking, the Swing-nets and Stance-nets of the six legs show variable activation patterns, which, in backward walking, also result from the different coordination rules applied. The corresponding video further illustrates the details of the behavior.

# **DISCUSSION**

We describe a novel architecture that can be used to control an autonomous robot and that is based on earlier approaches, Walknet and Navinet. The neural controller Walknet, as described earlier (e.g., Dürr et al., 2004; Bläsing, 2006; Schmitz et al., 2008; Schilling et al., 2013), represents a typical case of an embodied controller: the network is able to control the movement of a hexapod walker in unpredictably varying environments without relying on information other than that available from the given mechanosensors. This is possible because the body and the properties of the environment are crucial elements of the computational system: the system is embodied not only in the sense that there is a body (i.e., that there are internal states being physically represented), but in the sense that the properties of the body (e.g., its geometry) are required for computational purposes. Exploiting the loop through the world (including the own body) allows for a dramatic simplification of the computation. In the version expanded by an internal body model, too, the control of DoFs does not result from explicit specification by the neuronal controller, but from a combination/cooperation of the neuronal controller, the internal body model, and the coupling via the environment.

The procedures forming the decentralized controller are basically arranged in parallel, i.e., obtain sensory input and provide motor output, but there are also procedures that receive input from other procedures and, as a consequence, procedures that provide output to other procedures. Application of such a decentralized architecture helps to solve the flexibility—automaticity dilemma. Modules have a fixed function, but can work together in a flexible way to solve difficult tasks. In the case described here this is realized by a system that, based on studies of insect behavior, has been designed to control hexapod walking but is also able to climb over very large gaps (Bläsing, 2006).

Flexibility of the system is improved by the introduction of the motivation unit network (Section Motivation Unit Network), which allows additional behaviors to be integrated into the process of behavioral selection. The organization of this network is especially designed to allow for competitions on different levels, in this way forming different clusters of units. For example, the competition on the leg level selects between swing and stance movements, while, on a more global level, the walking direction or behaviors other than walking can be selected. The activities of these motivation units not only allow for the selection of behavioral elements, but also provide a context according to which specific sensory inputs are attended to or not. In this sense, the motivation unit network can be considered a system for controlling attention. This property is more obvious in another case, in which the architecture proposed here has successfully been applied to controlling insect navigation ("Navinet," Cruse and Wehner, 2011; Hoinville et al., 2012). In this task the animals are able to select one of a number of learned food sources to visit, and to decide between traveling to the food source or back home. Of particular interest here is that a desert ant, and also Navinet, attends to known visual landmarks only in the appropriate context, i.e., depending on the food source it is currently traveling to. In this way, the system controlled by the motivation network allows for selective attention. As Navinet provides the walking direction as output, Navinet and Walknet can be combined directly, whereby the output of Navinet controls the pull vector of the body model.

Introduction of motivation units does not impair the behavior of the basic Walknet structure, because during walking in one direction (forward or backward) all motivation units maintain their activation values except for the Swing and Stance motivation units. Only the latter change their activation dynamically and do this in the same way as is the case in the earlier Walknet versions. Therefore, all properties of Walknet concerning forward walking are inherited in the version expanded by motivation units.

Although the procedures as such are essentially arranged in parallel, the motivation network provides connections between the modules that form a dynamical heterarchy. In contrast to Jenkins and Mataric (2003), for example, who discuss a three-level structure (motor level, skill level, task level), our architecture does not imply such a strict separation into levels. Rather, any combination of modules might, in principle, be possible in this architecture. Furthermore, this architecture is very flexible, as it easily allows for later expansions representing not only six-legged walking but, for example, four-legged walking in which both front legs could be used as grippers. How this could be done is illustrated in **Figure 10**. As depicted in this figure, to this end the front leg controller (in the figure, leg 1) is equipped with a parallel procedure termed "grasp," which is controlled by its own motivation unit (**Figure 10**, gri1). This motivation unit is activated if the unit "4-legged walk" is active, which in turn inhibits the unit "6-legged walk."

**Figure 11** indicates how the motivation unit network of Navinet (Cruse and Wehner, 2011; **Figure 1**) and that of the Walknet version depicted in **Figure 3** can be combined. To this end, we introduce a higher layer consisting of two motivation units "sleep" and "awake," which are then connected to the uppermost layer of Walknet (**Figure 3**), units "stand" and "walk," and to the uppermost layer of Navinet, units "nest" and "forage."

Our network provides an example showing that the concatenation of modules required for the control of complex behavior does not necessarily require explicit coding, but may emerge from local rules and the coupling through the environment. The heterarchical structure used in the expanded version of Walknet and in Navinet comprises a simple realization of "neural reuse" as proposed in Anderson's massive redeployment hypothesis (Anderson, 2010).

The network, as described, consists of a "hard-wired" structure, i.e., the weights connecting the artificial neurons are fixed. Nevertheless, the system is able to flexibly adapt to properties of the environment. However, situations may also occur in which the controller runs into a deadlock. Think, for example, of the situation in which, during forward walking, by chance all legs but the right hind leg are positioned in the frontal part of their corresponding range of movement, whilst the right hind leg is positioned very far to the rear. When this leg starts a swing movement, the body may fall backward, as the center of gravity is no longer supported by the legs on the ground. Such a "problem" might be signaled by specific sensory input, for example a specific load distribution across the legs. To find a way out of this deadlock, a random selection of a behavioral module not belonging to the current context (in our case, forward walking) may help. A possible example is a backward step of the right middle leg: such a step would make it possible to support the body, thereby allowing the hind leg to start a swing. However, in our controller, backward steps are only permitted in the context of backward walking. How might the system nonetheless find such a solution?

In **Figure 12** we briefly illustrate a simple expansion allowing the system to search for such a solution (Schilling and Cruse, 2008). A third layer, essentially consisting of a recurrent WTA-net, is arranged in such a way that each motivation unit has a partner unit in the WTA-net (**Figure 12**, green units). The WTA units might be activated by various "problem detectors" not depicted in **Figure 12**. Motivation units activated in the current context inhibit their WTA partner units (T-shaped connections in **Figure 12**). Thus, a random activation of the WTA-net will, after relaxation, single out one of the currently inactive modules. The WTA unit winning the competition can then be used to activate its partner motivation unit and thereby trigger a new behavior that can be tested for its ability to solve the problem. In this way, realizing a special type of top-down attention, the network has the capability of following a trial-and-error strategy.
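
The selection scheme described here can be sketched in a few lines. The following is a minimal illustration, not the published network: the module names, the weight values, and the initial "problem detector" activation are all hypothetical, and the inhibition from the active context is modeled as a simple subtractive term.

```python
import numpy as np

# Hypothetical module names; the units in the actual Walknet differ in detail.
modules = ["fw_step", "bw_step", "turn_left", "turn_right"]
context = np.array([1.0, 0.0, 0.0, 0.0])   # forward stepping is currently active

n = len(modules)
W = np.full((n, n), -0.5)    # mutual inhibition between all WTA units
np.fill_diagonal(W, 1.2)     # self-excitation of each WTA unit

# Activation by a "problem detector" (fixed values here for reproducibility);
# motivation units active in the current context inhibit their WTA partners.
x = np.array([0.30, 0.45, 0.25, 0.20]) - 2.0 * context

for _ in range(50):          # relax the recurrent WTA dynamics
    x = np.clip(W @ x - 2.0 * context, 0.0, 1.0)

winner = modules[int(np.argmax(x))]   # candidate module for trial-and-error
```

After relaxation exactly one unit outside the current context is fully active; its partner motivation unit would then trigger the behavior to be tested.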

As has been proposed (Schilling and Cruse, 2008), a further expansion of the system may permit using the body model instead of the real body to test the new behavior via "internal trial-and-error," whilst the motor output to the real body is switched off. To this end, switches have to be introduced that allow the motor output signals to circumvent the real body and be passed directly to the body model (not depicted in **Figure 12**). Only if the internal simulation has shown that the new trial provides a solution to the problem will the behavior actually be executed. McFarland and Bösser (1993) define a cognitive system in the strict sense as a system that is able to plan ahead, i.e., to perform internal simulations to predict the possible outcome of a behavior. Therefore, the latter expansion would, according to McFarland and Bösser, make the system a cognitive one (for details see Schilling et al., 2013).

Furthermore, inspired by Steels and Belpaeme (2005) and Steels (2007), Cruse (2010) has discussed the possibility of expanding the network by a fourth layer that contains specific procedures, namely networks that represent verbal expressions. These "word-nets" may likewise be used to utter or to comprehend the stored word. The underlying idea is to connect each word-net with the unit of the motivation network whose meaning it carries (e.g., the word-net "walk" should be connected with the motivation unit walk), thereby grounding the symbolic expression (Cruse, 2010). Interestingly, Jenkins and Mataric (2003) draw an analogy between the structure of what they call a "motor vocabulary" and linguistic grammar or "verb graphs," a property that is reflected in our network.

Although the latter two levels (WTA-net and word-nets) are still quite speculative, as they have not yet been tested, together with the two lower layers they illustrate the principal idea of this architecture (**Figure 13**). Modules (procedures, motivation units, WTA neurons, and procedures for words) are ordered within their horizontal layers in such a way that corresponding elements in the different layers appear in vertical register, leading to modules arranged in a columnar fashion (**Figure 13**, dashed rectangles). This columnar structure does not mean that each lower-level procedure or each motivation unit must have a partner in the upper layers, but only that such connections are in principle possible. Similarly, not every unit or procedure in the upper layers necessarily has a partner procedure in the lowest layer. As a column in this architecture is characterized by a motivation unit, we name our architecture MUBCA.

Interestingly, Schack and Ritter (2009) and Schack (2010) have likewise proposed a four-layer model describing an architecture for the control of complex movements, which makes a comparison tempting. The four layers of this model are characterized as (i) sensorimotor control, (ii) sensorimotor representation, (iii) mental representation, and (iv) mental control. Although this architecture is formulated at a more abstract level of description, our procedural layer might well be compared with Schack's sensorimotor control layer. Further, our motivation network has relations to Schack's sensorimotor representation layer (ii). Schack's layers (iii) and (iv) are not directly comparable, as "mental" is defined in his approach as consciously accessible, an aspect not considered here. Nonetheless, our WTA layer may be related to Schack's uppermost layer (iv), the mental control layer. Our uppermost layer might better be related to the mental representation layer of Schack's model, i.e., his layer (iii). This layer is characterized by containing the so-called basic action concepts, behavioral elements to which verbal expressions can be assigned. So, there appear to be some interesting relations between Schack's model and ours, although clear differences can be observed in detail.

Another proposal, the DAC architecture developed by Verschure and co-workers (for a review see Verschure, 2012), is at first glance formally very similar to our approach, as it consists of four layers and (three) columns. The four layers show some relation to our layers. DAC differentiates between the soma (the body including sensors and actuators) and three layers characterizing the brain: the reactive layer, the adaptive layer, and the contextual layer. The reactive layer roughly corresponds to our procedural layer; the adaptive layer controls (classical and operant) learning, which is not addressed in our approach; and the contextual layer has some relation to our motivation unit network. The contextual layer of the DAC architecture also contains memory elements (STM, LTM), in contrast to our system, where these memories are stored only in the lower, procedural layer; the motivation units only contribute to selecting the different procedures containing the memory content. The three columns, however, are quite different from the columns used in our architecture. They concern (i) the state of the world, (ii) the state of the self, and (iii) the control of action. In our network, these functions are implicitly embedded in the different layers, making our architecture much simpler. In our model, learning is envisaged as explorative learning (Schilling et al., 2013) and will be realized by a cooperation of the third layer (WTA-net) and the second layer (motivation unit network).

Our architecture follows the concept of d'Avella et al. (2003) in assuming that natural motor patterns are constructed by combining discrete elements ("modules," "motor programs"). To simplify the simulation, we assume that different modules are constructed from separate, non-overlapping neuronal elements. The situation might, however, be more complicated. As reviewed by Briggman and Kristan (2008), "morphologically defined circuits could be reconfigured into many distinct functional circuits [*...*] generating recognizable discrete behaviors." To cope with this case, Tani et al. (2004), for example, studied how a number of different behaviors can be represented by different states of one RNN. A similar question can be asked with respect to the motivation units: are they better described by individual units, or do they form a distributed structure? Tani (2007) presented interesting model studies in which individual motivation units are not responsible for modulating a given behavior; instead, a higher-level RNN adopts different attractor states, which in turn influence the properties of lower-level RNNs. As, however, distributed systems containing recurrent connections appear to be more difficult to stabilize, we have chosen to deal with distinct modules at the lower level and single units at the higher level. This is particularly helpful when dealing with a large number of modules and aiming to control relatively complex behaviors, as is the goal of Walknet and Navinet.

Two further approaches should be mentioned that address the problem of how various sensory inputs can be used to select between different behaviors. The architecture of Arena et al. (2009) essentially consists of a number of "basic behaviors" (e.g., phonotaxis, phototaxis, obstacle avoidance) and a "representation layer." The former compare with our procedures; the latter is related to our motivation unit network layer. Whereas our motivation unit layer comprises a simple, sparsely coded, Hopfield-like network forming a decentralized structure with local, dedicated sensory inputs for some of the units, the representation layer of Arena et al. (2009) is quite different, as it receives all sensory inputs to form a "central representation of the actual environmental situation" represented by a Turing pattern. The Turing pattern emerges from a structure consisting of sensory neurons and two layers of Reaction-Diffusion Cellular Non-linear Networks (CNN) whose temporal dynamics lead to static Turing patterns. The attractors represented by these patterns depend on the current activation of the sensory cells. An additional selector network computes, after learning is finished, the relative contribution of the different basic behaviors to the overall behavioral output. A comparable selection process is, in our system, performed directly by the motivation unit network, which allows for parallel activation of different procedures except for those that are directly connected via inhibitory weights. As an important functional difference to our approach, learning plays a crucial role in the system developed by Arena et al. (2009), requiring a more complex architecture. As no (online) learning takes place in our system, the network can be dramatically simplified and allows a simple way of finding sensible combinations of procedures. A central representation of the current situation given by the sensory input is not required.

An architecture that is similar at a general level, transforming sensory input data into a vector that is then used to drive the single behaviors, is given by Steingrube et al. (2010). Leaving details aside, the representation layer of Arena et al. (2009) is here replaced by a network showing chaotic properties. The neuronal chaos controller requires a preprocessor and a postprocessor for sensorimotor mapping and is able to select between 11 different basic behaviors, of which at least six are considered "typical walking patterns emerging in insects." Although this solution is, from a mathematical point of view, quite interesting, its biological grounding is not well justified. The different gaits used should not be interpreted as discrete "basic behavioral patterns." Rather, as discussed earlier, these "gaits" are arbitrarily selected patterns out of a continuum and should therefore be considered one basic behavior. Furthermore, although the chaos controller is claimed to allow fast switching between behaviors, it is slower than our very simple motivation unit network, which requires one or at most two iterations to find another attractor.

Both approaches are characterized by the application of centralized structures to solve the problem of combining a large amount of sensory data and using this information to control various different behaviors. As an alternative, we propose here a decentralized solution. Our approach of using individual motivation units is supported by evidence for the existence of discrete neurons at the higher level, at least in invertebrates. For example, Briggman et al. (2005) showed that in the leech a specific neuron drives crawling, while experimental inhibition of this neuron supports swimming. In Aplysia feeding behavior, a "command-like neuron" influenced by motivational states normally elicits ingestion behavior; after experimental application of a specific neuropeptide, this neuron elicits egestive behavior, thus deciding between two mutually exclusive behaviors (Jing et al., 2007). Briggman and Kristan (2008) reviewed further examples, including units releasing neuromodulatory substances.

Motivation units for controlling the selection of procedures arranged in parallel were introduced by Maes (1991; see also Hassenstein, 1983; both inspired by K. Lorenz). Maes also included connections between motivation units that allow temporal relationships between the procedures to be controlled, a property not applied here. Instead, we introduced a heterarchical structure to allow for the selection of various combinations of modules. Simple, non-heterarchical structures have already been applied in Walknet to select between procedures (Swing vs. Stance; Schilling et al., 2007) and between higher-level states (forward vs. backward; Schilling and Cruse, 2008; there called a "distributor net"). In the form of so-called command neurons, the selection between forward walking and backward walking was already applied by Ayers and Davis (1977). Similarly, mutually inhibitory units deciding between forward and backward walking have been used in a model by Tóth et al. (2012). In both cases, the units controlling forward or backward walking are directly connected with the motor units at the muscle level, in contrast to our approach. More generally, these models are based on strictly hierarchical structures, which makes it difficult to use individual modules in ways other than those allowed in the current context. Therefore, these hard-wired hierarchical structures prohibit trial-and-error variations in cases of emergency. Our heterarchical system with selective access to individual modules makes this possible (**Figure 12** and related text).

Although our approach was originally derived from studying insect behavior, the neuronal architecture is purposefully abstracted from the neuroanatomical constraints given by the insect brain. Therefore, concentrating on the functional aspect and searching for some kind of minimal model, this architecture may be applied to different types of brains. Nonetheless, in the following we briefly discuss to what extent this model may be mapped onto the neuronal system of insects. Some motivation units should clearly be part of the local, thoracic ganglia. These are the motivation units for swing and stance, target\_fw, target\_bw, PEP\_fw, and PEP\_bw of the different legs, as well as, most probably, the six "leg" motivation units. Less clear is the localization of the motivation units controlling the different coordination influences acting between the legs. All other motivation units, such as stand, walk, forage, (stay in) nest, forward, and backward, are to be attributed to the brain of the insect.

Finally, a nomenclatural question that might be briefly discussed is whether it is sensible to attribute a higher-level term such as "motivation" to units as simple as those used here to control microbehaviors like swing or stance. Generally accepted examples of motivational states are, for instance, aggression controlling fighting, or fear controlling flight. However, in our network there is functionally no fundamental difference between motivation units controlling behaviors at any level. Therefore, we believe it is justified to apply the term also to such lower-level elements, or "microbehaviors," like the swing or stance of a leg. Of course, this usage of the term motivation, often applied in robotics and animal behavior, is still quite different from that used in psychology, which focuses on systems with cognitive abilities. Motivation in psychology is generally considered a (multiplicative) combination of desire and expectancy; in our case, only the aspect of "desire" is addressed.

# **CONCLUSION**

Our architecture, MUBCA, integrates frequently discussed properties postulated for neuronal systems, such as modularity, hierarchy, cross-modal influences (e.g., path integration and landmark navigation in Navinet), bottom-up and top-down attention control (i.e., the selection of relevant input data to establish priorities), the application of internal models for prediction, and redundant structures. Because some central structures, such as the motivation unit network and the body model, are realized as RNNs, the complete network forms a holistic system. Therefore, this architecture can be considered modular and holistic at the same time. As the architecture does not follow strict construction rules, it is very flexible and can be expanded in various ways. To verify these capabilities further, our next step is to implement the cognitive layer as sketched in **Figure 12**. In parallel, all versions of the model will be tested on the physical robot Hector. The versatility of this approach is further underlined by a proposal as to how this architecture could be expanded to show properties of a mirror system and a Theory of Mind (Cruse and Schilling, 2011). Recently, these authors have argued that even higher-level properties such as intention, (bottom-up and top-down) attention, and volition, as well as some aspects of consciousness, can be attributed to a network based on this architecture. Interestingly, these properties are not explicitly implemented but appear as emergent properties of the network (Cruse and Schilling, 2013).

# **ACKNOWLEDGMENTS**

This work has been supported by the Center of Excellence "Cognitive Interaction Technology" (EXC 277) and by the EU project EMICAB (FP7-270182).

# **SUPPLEMENTARY MATERIAL**

**Movie S1 | Corresponds to Figure 6A, fast walking, normal starting posture.**


**Movie S6 | Figure 9 backward walking.**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 13 December 2012; accepted: 24 August 2013; published online: 17 September 2013.*

*Citation: Schilling M, Paskarbeit J, Hoinville T, Hüffmeier A, Schneider A, Schmitz J and Cruse H (2013) A hexapod walker using a heterarchical architecture for action selection. Front. Comput. Neurosci. 7:126. doi: 10.3389/fncom.2013.00126*

*This article was submitted to the journal Frontiers in Computational Neuroscience.*

*Copyright © 2013 Schilling, Paskarbeit, Hoinville, Hüffmeier, Schneider, Schmitz and Cruse. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# **APPENDIX**

# **MOTIVATION UNIT NETWORK**

The motivation unit network as applied here is a recurrent neural network with piecewise-linear summation units and a bounded activation function (output ≥ 0 and ≤ 1). Each unit has a fixed self-activation weight [w/(w + 1)], which gives each unit the property of a first-order low-pass filter with a time constant tau = w − 1 (here we used w = 3). The weights used are given in the matrix shown in **Table A1**. Note that all values are multiplied by (w + 1). In **Table A1** several submodules can be identified. One consists of the two units Stand and Walk, another of the units Walk, Fw, Bw, leg1 *...* leg6. In addition, there are six leg submodules, each consisting of leg#, Sw#, and St# (# = 1–6). The motivation units for the Target-nets, PEP-nets, and Coordination Rules only receive top-down influence. This is done to simplify the net dynamics, but in principle recurrent connections would be possible, too.
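
The unit dynamics described here can be sketched as follows. The self-activation weight w/(w + 1) and the bounded piecewise-linear activation are taken from the text; the mutual-inhibition weights, the tonic drive, and the input pulse in the two-unit example are illustrative values, not those of **Table A1**.

```python
import numpy as np

def act(u):
    # piecewise-linear activation, bounded to 0 <= output <= 1
    return np.clip(u, 0.0, 1.0)

w = 3.0
a = w / (w + 1.0)        # fixed self-activation weight w/(w+1) = 0.75

# A single unit with this self-weight behaves as a first-order low-pass filter:
# driven by a constant input of 1.0, its output relaxes toward 1.0.
x = 0.0
for _ in range(20):
    x = act(a * x + (1.0 - a) * 1.0)

# Two units coupled by mutual inhibition (e.g., Walk vs. Stand), each with a
# tonic excitatory drive; a transient input pulse to the silent unit flips
# the pair into the other stable state within a few iterations.
W = np.array([[a, -1.0],
              [-1.0, a]])
drive = np.array([0.5, 0.5])
s = np.array([1.0, 0.0])               # Walk active, Stand silent
for t in range(4):
    pulse = np.array([0.0, 2.0]) if t < 2 else np.zeros(2)  # transient command
    s = act(W @ s + drive + pulse)
# s is now [0, 1]: the pair has switched from Walk to Stand
```

The transient (high-pass-like) nature of the pulse matters: a sustained input would keep both units active, whereas a brief pulse lets the mutual inhibition settle the pair into the new state.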

Training of such a network is possible for linear and non-linear activation functions (Makarov et al., 2008; Cruse and Schilling, 2010). To obtain sparse coding, only those weights that are represented by (red) connections in **Figure 3** were allowed to be learned. In this way, the different procedures controlling the six legs were kept separate, which reduces the number of attractors to be trained. For training the leg network we used four vectors characterized by activation of units (1) walk, fw, leg1, sw; (2) walk, fw, leg1, st; (3) walk, bw, leg1, sw; and (4) walk, bw, leg1, st. The procedure proposed by Makarov et al. (2008) is guaranteed to find stable attractor solutions even for a piecewise-linear activation function with no upper limit. However, following this procedure, the stability of the network would depend on the exactness of the weight values, i.e., noise would deteriorate stability. We therefore made the network more robust against noisy weights by introducing an upper limit of 1 for each activation function. In this way, the network takes on a Hopfield-like structure, apart from asymmetric weights being possible. This activation function further allows, in a second step, increasing the weight values in order to decrease relaxation time. To this end, within a leg, positive weights were increased to 1 and negative weights decreased to −3. The weight values of the upper motivation unit "walk," which connects all leg units, were only increased to 0.2, but its inhibitory weights were decreased to −5, to cope with the different numbers of positive (five) and negative (one, in the example of **Figure 3**) input channels. This two-step procedure makes the already stable trained solution more robust with respect to noisy weight values and allows for faster relaxation. Note that the weight values given here and in **Table A1** are multiplied by a factor of (w + 1), with w = 3.

With the weights given in **Table A1**, the motivation network shows the following dynamic properties. Motivation units connected via mutual inhibition (e.g., Walk vs. Stand, Swing vs. Stance) change from zero to one (and back) within one iteration. Motivation units which receive only feedforward input (e.g., the motivation units of the Target-nets or PEP-nets) change their output value following a low-pass filter dynamic with a time constant of four iterations.

An important aspect is that the (sensory) input driving a motivation unit has to show a transient component, i.e., it has to be endowed with high-pass-filter-like properties. The amplitude of the first pulse is tuned by hand and depends on the value of the inhibitory weight stabilizing the WTA elements.


*Upper row and left column refer to the individual motivation units. Weights arranged in a horizontal row mark input weights of the unit shown at the left. Units marked in Figure 3 by "stand" or "walk" are abbreviated here by "st" or "wa," respectively. Units shown in Figure 3 as "fw," "bw," "leg1," "leg2" ... are shown as marked in Figure 3. Units not explicitly marked in Figure 3 are the motivation units for the swing procedure (here abbreviated by sw plus the number of the corresponding leg) and the motivation units for the stance procedure (marked by "leg model" in Figure 3, here abbreviated by st plus the number of the leg). The motivation units for the target nets are abbreviated by Tfw and Tbw for forward and backward walking, respectively, plus the number of the leg. Correspondingly, the motivation units for the PEP nets, not shown in Figure 3, are abbreviated by Pfw and Pbw plus the number of the leg. Furthermore, the motivation units for coordination rule 1 are abbreviated by R1fw and R1bw for forward and backward walking, respectively. All weights are multiplied by (w* + *1); w* = *3.*

# **ROBOT SIMULATION**

As mentioned in the Introduction, testing the capability of our controller requires a physical robot or, as a first step, a physics engine able to simulate the robot and its interaction with the environment. The robot, named Hector (Schneider et al., 2011), currently under development, has 22 DoFs. The spatial arrangement of the joints corresponds to that found in stick insects; however, all measurements are scaled up by a factor of 20. The distance between the onsets of the front leg and the middle leg is 360 mm; the corresponding distance between middle leg and hind leg is 218 mm. The three leg segments have lengths of 30, 260, and 280 mm (coxa, femur, and tibia, respectively). All legs of the robot are constructed identically in order to simplify the design. The nominal step length of a leg is 310 mm, i.e., about 85% of the scaled-up step length found in stick insects (stick insects have a step length of about 18 mm, which corresponds to 360 mm in the scaled robot). The step amplitude is limited mainly because in stick insects, in contrast to the robot, front legs and hind legs are longer than the middle legs. Nevertheless, the step amplitude is quite large compared to usual six-legged robots.

For the simulation, the Open Dynamics Engine is used as a physics engine to simulate the interaction between all the moving parts and with the environment. The simulation basically consists of a rigid-body dynamics engine and a collision detection engine. Within this virtual environment the robot is rebuilt, and the inertias of the sub-assemblies are set according to their real counterparts. The geometry of the housings and the movement ranges correspond to the real robot. The joints of the real robot will consist of a joint drive coupled with elastic properties in order to model muscle-like characteristics (Annunziata and Schneider, 2012). In the computer simulation the elasticity is likewise incorporated into the joint actuators and corresponds to a spring-damper system. As the nature of the friction between the legs and the environment is generally quite complex in real systems, these contacts are hard to simulate. Therefore, these interactions can only be approximated, which will lead to differences between the virtual and the real system; this makes the application of the real robotic system important. The weight of each of the three simulated body segments is 2000 g.
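
The spring-damper characteristic of such a joint actuator can be sketched for a single joint as follows. The stiffness, damping, and inertia values are hypothetical illustrations, not the parameters of Hector's actual drives.

```python
# Illustrative parameters, not those of the real robot.
k, d = 8.0, 2.0          # spring stiffness [Nm/rad], damping [Nm s/rad]
I = 0.05                 # segment inertia about the joint [kg m^2]
dt = 0.001               # integration step [s]

theta, omega = 0.0, 0.0  # joint angle [rad] and angular velocity [rad/s]
target = 0.5             # commanded joint angle [rad]

for _ in range(5000):    # 5 s of simulated time, explicit Euler integration
    torque = -k * (theta - target) - d * omega   # spring-damper actuator law
    omega += (torque / I) * dt
    theta += omega * dt
```

With these values the system is overdamped, so the joint settles on the commanded angle without oscillation; a physics engine such as ODE applies the same actuator torque within its rigid-body solver instead of this scalar integration.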

All simulation experiments were performed on flat terrain; however, due to considerable up- and down movements of the body (in particular during fast walking), irregular ground was effectively mimicked. In addition, as friction is limited, legs occasionally slipped in various directions during stance. The resulting irregularities are also visible in the footfall patterns.

# Biological oscillations for learning walking coordination: dynamic recurrent neural network functionally models physiological central pattern generator

*Thomas Hoellinger <sup>1</sup> \*, Mathieu Petieau1, Matthieu Duvinage2, Thierry Castermans 2, Karthik Seetharaman1, Ana-Maria Cebolla1, Ana Bengoetxea1, Yuri Ivanenko3, Bernard Dan1 and Guy Cheron1,4*

*<sup>1</sup> Laboratory of Neurophysiology and Movement Biomechanics, CP601, ULB Neuroscience Institute, Université Libre de Bruxelles, Brussels, Belgium*

*<sup>2</sup> TCTS Lab, Faculty of Engineering, Université de Mons, Mons, Belgium*

*<sup>3</sup> Laboratory of Neuromotor Physiology, Fondazione Santa Lucia, Rome, Italy*

*<sup>4</sup> Laboratory of Electrophysiology, Université de Mons, Mons, Belgium*

### *Edited by:*

*Tamar Flash, Weizmann Institute, Israel*

### *Reviewed by:*

*Abdelmalik Moujahid, University of the Basque Country UPV/EHU, Spain Thierry Pozzo, INSERM, France Jacques Duysens, KU-Leuven, Belgium*

### *\*Correspondence:*

*Thomas Hoellinger, Laboratory of Neurophysiology and Movement Biomechanics, CP601, ULB Neuroscience Institute, Université Libre de Bruxelles, 808 route de Lennik, 1070 Brussels, Belgium e-mail: hoellint@gmail.com*

The existence of dedicated neuronal modules such as those organized in the cerebral cortex, thalamus, basal ganglia, cerebellum, or spinal cord raises the question of how these functional modules are coordinated for appropriate motor behavior. Study of human locomotion offers an interesting field for addressing this central question. The coordination of the elevation of the three leg segments under a planar covariation rule (Borghese et al., 1996) was recently modeled (Barliya et al., 2009) by phase-adjusted simple oscillators shedding new light on the understanding of the central pattern generator (CPG) processing relevant oscillation signals. We describe the use of a dynamic recurrent neural network (DRNN) mimicking the natural oscillatory behavior of human locomotion for reproducing the planar covariation rule in both legs at different walking speeds. Neural network learning was based on sinusoid signals integrating frequency and amplitude features of the first three harmonics of the sagittal elevation angles of the thigh, shank, and foot of each lower limb. We verified the biological plausibility of the neural networks. Best results were obtained with oscillations extracted from the first three harmonics in comparison to oscillations outside the harmonic frequency peaks. Physiological replication steadily increased with the number of neuronal units from 1 to 80, where the similarity index reached 0.99. Analysis of synaptic weighting showed that the proportion of inhibitory connections consistently increased with the number of neuronal units in the DRNN. This emerging property in the artificial neural networks resonates with recent advances in neurophysiology of inhibitory neurons that are involved in central nervous system oscillatory activities. The main message of this study is that this type of DRNN may offer a useful model of a physiological central pattern generator for gaining insights in basic research and developing clinical applications.

**Keywords: central pattern generator (CPG), human locomotion, biological oscillations, dynamical recurrent neural network (DRNN), kinematics, neurophysiology of walking**

# **INTRODUCTION**

Neuronal modules profoundly influence many aspects of motor behavior. However, little is currently known about the control mechanisms that allow the coordination of these modular entities. In this theoretical context, human locomotion is a challenging movement because of the numerous neuroanatomical modules implicated in the different aspects of whole-body movement, ranging from the cyclic propulsion of the limb to balance control. In spite of the intricate movement components and neuronal modules involved, it has been proposed that the overall control can be realized by reducing the number of degrees of freedom of the system into global variables (Bernstein, 1967; Lacquaniti et al., 1999, 2002; Flash and Hochner, 2005; Latash, 2008). The fact that the elevation angles of the three main lower limb segments are coordinated during gait within a covariation plane (Borghese et al., 1996), forming an elliptic loop, corroborates the idea that the control of locomotion is also subject to this general law of reducing variables (Barliya et al., 2009). The law also applies at different walking speeds (Bianchi et al., 1998), in forward and backward directions (Grasso et al., 1998), on rectilinear or curvilinear walking paths (Courtine and Schieppati, 2004), when walking with stilts (Dominici et al., 2009; Leurs et al., 2011) or with a transfemoral prosthesis (Leurs et al., 2012), with different levels of body weight unloading (Ivanenko et al., 2002), and for running (Ivanenko et al., 2007). Notably, this inter-segmental coordination is not present in newly walking toddlers (Cheron et al., 2001a,b; Ivanenko et al., 2005), but covariation planarity rapidly emerges over the first few days of independent walking experience, while the shape of the loop and the plane orientation "mature" gradually over several years.
This evolution indicates that the attractor plane and the shape of the loop are neurophysiologically defined, rather than being imposed by biomechanical constraints (see Hicheur et al., 2006; and Ivanenko et al., 2008 for discussion). More recently, the developmental study of this complex behavior in newborn babies (Dominici et al., 2011) has made it possible to revisit the concept of locomotor modules coding for the control of movement primitives.

This modular approach raises the question of the dynamic coordination of modules in the context of the oscillatory properties of neuronal ensembles. Indeed, the dynamical structure of these modules must logically obey a common principle for movement generation: the production of oscillatory activity. Although this principle is well accepted in the case of the rhythmic nature of locomotion (Georgopoulos and Grillner, 1989; Grillner, 2006), the recent study of Churchland et al. (2012) surprisingly demonstrated that non-periodic movements such as reaching are also generated by neuronal oscillation. This means that there is a strong possibility that the spinal modules organized in a central pattern generator (CPG) could be dynamically controlled by cortical and/or supraspinal oscillations. Interestingly, Barliya et al. (2009) recently modeled the time course of the elevation angles of the three lower limb segments in terms of simple oscillators coupled with appropriate time shifts, reproducing the orientation of the covariation plane and the elliptical shape of the loop. The oscillators were obtained by taking, after Fourier transform, the first three harmonics of the elevation angle kinematics. Each of these oscillators can be interpreted as an oscillatory module acting in such a way that the orientation of the plane and the elliptic shape of the coordination are conserved. It is thus possible that oscillatory signals derived from these harmonics are used to activate CPG modules.

The existence of a CPG in the spinal cord of mammals was proposed a century ago (Brown, 1911). In essence, it represents a group of neurons acting as a network to produce coordinated patterns of rhythmic activity. New evidence has shown the presence of a CPG in the human spinal cord (Calancie et al., 1994; Bussel et al., 1996). The characteristics of such CPG modules are their adaptability and robustness, which lead to the production of different gait patterns adapted to the current environmental context. For example, young infants (less than 1 year old) are already able to walk on a split-belt treadmill with different types of coupling (Yang et al., 2005); some of them were even able to walk in opposite directions. Mimicking physiology, the robotics and neuroscience communities have developed artificial CPGs that are commonly used to animate robots from multilegged insect-like machines to humanoids (for a review see Ijspeert, 2008). In their pioneering work in the cat, Patla et al. (1985) proposed an analytic model of a limb locomotor pattern generator based on the muscle activity induced by electrical stimulation over the mesencephalic locomotor region (MLR) in the decerebrate cat. In this model, locomotor-like patterns of six muscles resulted from six independent oscillators with dedicated parameters. These oscillators were reduced to simple sine and cosine functions fed by a tonic input. Since then, different methods have been used, from coupled non-linear oscillators (Ijspeert et al., 2007; Duvinage et al., 2011, 2012a) to highly detailed biophysical models of small groups of interconnected neurons (Hellgren et al., 1992) and of rhythm genesis in larger groups of neurons (Wadden et al., 1997), mainly in animal models. 
While human locomotion has often been reproduced computationally in the robotics field using equations that are numerically integrated (Righetti et al., 2005; Righetti and Ijspeert, 2006; Ceccato et al., 2009; Duvinage et al., 2011, 2012a), few methods involving neuron modeling for human gait generation have been studied so far. Among them, Prentice and coauthors (1998, 2001) successfully transformed fundamental timing signals (sine and cosine inputs) into individual muscle activation bursts related to gait at different speeds using a feedforward neural network. Our group used the electromyographic (EMG) signals of the lower limb muscles as input for a dynamic recurrent neural network (DRNN) producing as output the lower limb kinematics during locomotion (Cheron et al., 2003, 2012). However, the possibility of producing the motion of the lower limb segments by means of oscillations derived from the three harmonics of the Fourier transform of the walking kinematics has not yet been assessed with the same DRNN. We show here that, after learning based on different walking velocities, the DRNN is able to reproduce the lower limb kinematics of both legs. The DRNN can also generalize to unlearned walking velocities. The analysis of the required network structure (e.g., number of units, distribution of time constants, and synaptic sign) provides a basis for the discussion of the constraints required for the elaboration of a CPG.

# **METHODS**

# **EXPERIMENTAL SETUP**

Seven healthy subjects aged from 25 to 35 years (mean age: 28 years) participated in this experiment. The protocol consisted of walking on a treadmill at 11 different velocities [from 1 km/h (0.28 m/s) to 6 km/h (1.67 m/s) in steps of 0.5 km/h (0.14 m/s)], leading to 11 trials per subject (a total of 77 trials over all subjects). Whole-body kinematics was recorded using a Vicon system with six infrared Bonita cameras recording at 100 Hz for 40 s in each trial. The tracking consisted of recording 29 markers placed over the whole body. The positions of the markers in an orthogonal reference frame were computed using a custom biomechanical model. From the positions of the markers of both legs, the kinematic (elevation) angles relative to the vertical plane of the laboratory were calculated bilaterally for the thighs, shanks, and feet. In this study we performed two experiments. The first one, a "proof of concept," was done to ascertain whether a DRNN can learn elevation angles from pure sine waves; this part only includes supervised learning on a single pattern (**Figure 1A**). The second experiment was performed in order to study the possibility of learning multiple patterns of walking (i.e., velocities) and of predicting kinematics from unlearned patterns (**Figure 1B**). After that, the DRNN structures were analyzed.

# **DYNAMICAL RECURRENT NEURONAL NETWORK**

The DRNN is capable of modeling time-varying input–output relationships and has adaptable weights as well as adaptable time constants for the artificial neurons (Pearlmutter, 1990). The adaptive time constants make it dynamic (Draye et al., 1996). The DRNN is governed by the following equations:


$$T_i \frac{dy_i}{dt} = -y_i + F(x_i) + I_i \tag{1}$$

where *F(a)* is the squashing function *F(a)* = (1 + *e*<sup>−*a*</sup>)<sup>−1</sup>, *yi* is the state or activation level of unit *i*, *Ii* is an external input (or bias), and *xi* is given by:

$$x_i = \sum_j w_{ij}\, y_j \tag{2}$$

which is the propagation equation of the network (*xi* is called the total or effective input of neuron *i*, and *wij* is the synaptic weight between units *i* and *j*). The time constants *Ti* act like a relaxation process. In order to make the temporal behavior of the network explicit, an error function is defined as:

$$E = \int\_{t\_0}^{t\_1} q(\mathbf{y}(t), t)dt\tag{3}$$

where *t*0 and *t*1 give the time interval during which the correction process occurs. The function *q*(*y*(*t*), *t*) is the cost function at time *t*, which depends on the vector of the neuron activations *y* and on time. We then introduce new variables *pi* (called adjoint variables) that are determined by the following system of differential equations:

$$\frac{dp_i}{dt} = \frac{1}{T_i}\, p_i - e_i - \sum_j \frac{1}{T_i}\, w_{ij}\, F'(x_j)\, p_j \tag{4}$$

with boundary conditions *pi*(*t*1) = 0. After the introduction of these new variables, we can derive the learning equations:

$$\frac{\partial E}{\partial w_{ij}} = \frac{1}{T_i} \int_{t_0}^{t_1} y_i\, F'(x_j)\, p_j\, dt \tag{5}$$

$$\frac{\partial E}{\partial T_i} = \frac{1}{T_i} \int_{t_0}^{t_1} p_i\, \frac{dy_i}{dt}\, dt \tag{6}$$
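
The coupled dynamics of Equations (1) and (2) can be sketched numerically. The following is a minimal illustration only, not the authors' implementation: it integrates a small, randomly weighted, fully connected network with forward Euler, and all sizes, parameter values, and names are arbitrary stand-ins.

```python
import numpy as np

def drnn_step(y, W, T, I, dt=0.01):
    """One forward-Euler step of the DRNN dynamics:
    T_i dy_i/dt = -y_i + F(x_i) + I_i, with x_i = sum_j w_ij y_j
    and squashing function F(a) = 1 / (1 + exp(-a))."""
    x = W @ y                      # Eq. (2): total (effective) input of each unit
    F = 1.0 / (1.0 + np.exp(-x))   # sigmoid squashing function
    dy = (-y + F + I) / T          # Eq. (1) rearranged for dy/dt
    return y + dt * dy

rng = np.random.default_rng(0)
n = 6                                   # illustrative number of units
W = rng.normal(scale=0.5, size=(n, n))  # synaptic weights (learned in practice)
T = rng.uniform(0.1, 1.0, size=n)       # adaptive time constants (also learned)
y = np.zeros(n)                         # unit activations

for step in range(1000):                # drive the network with an oscillatory input
    I = 0.5 * np.sin(2 * np.pi * 1.0 * step * 0.01 + np.arange(n))
    y = drnn_step(y, W, T, I)
```

In the actual training procedure the weights and time constants would be updated by gradient descent on Equations (5) and (6); here they are fixed random values just to show the forward pass.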

**FIGURE 1 |** The recorded elevation angles are transformed into the frequency domain using fast Fourier transform (FFT); the signal characteristics are computed for the first three harmonics and back-transformed into the temporal domain as sine waves. These data are fed as input to the DRNN; successful learning permits the DRNN to predict kinematics based on sine wave inputs only. These output signals can produce biologically plausible walking patterns in a virtual reality avatar or an actual robotic exoskeleton.

The sinusoidal signals derived from the fast Fourier transform (FFT) of the kinematic data are sent as input to a DRNN (**Figures 1**, **2**; cf. Experiments 1, 2), which has to learn from these data to produce the kinematics specified as elevation angles (**Figure 1A**). Successful trainings were also used to produce kinematic patterns from unknown inputs (**Figure 1B**), aiming to produce walking for multiple purposes, such as virtual avatars or robotic exoskeletons. The training is supervised, involving learning-rule adaptations of the synaptic weights and the time constant of each unit (Draye et al., 1995, 1996). A specific training procedure using the Almeida algorithm was used to optimize learning performance (Cheron et al., 2011). The DRNN involves a looping mechanism (fully connected structure) that enables the network to learn and store information (memory). This equips the network with the ability to model complex situations with multiple influences. The DRNN has been successfully used to map EMG signals onto kinematics during walking (Cheron et al., 2003), to identify the triphasic EMG pattern in subjects performing fast elbow flexion movements (Cheron et al., 2007), and to predict specific muscular activity in elite fencers compared to amateurs (Cheron et al., 2011).

### **EXPERIMENT 1: PROOF OF CONCEPT**

DRNN computation was used to prove that simple sine waves with specified temporal characteristics can be used as input to an artificial neural network and transformed into the elevation angles obtained from kinematic recordings during locomotion (**Figure 3**).

### *Input: construction of sine waves*

As the learning phase of the DRNN is a time-consuming process, we had to select appropriate samples from the whole available data as input. Moreover, even if the kinematics during walking is relatively stable, it may vary too much over a whole recording to feed the DRNN during the learning phase. For these reasons, we extracted two consecutive gait cycles from the 40 s of experimental data recorded in each trial (**Figure 2A**, black curves). They were chosen so as to be representative, in terms of frequency, of the whole recording set. Then, to determine the kinematic characteristics of gait, we transformed

**FIGURE 2 | (A)** Elevation angles (in degrees) of the three segments (thigh, shank, and foot) of each leg as a function of time for a subject walking at 3 km/h (0.83 m/s). The whole pattern is presented for a duration of 5 s; the black lines represent the two gait cycles used to determine the FFT characteristics. **(B)** The mean FFT transformation for the six segment angles over 40 s (in gray) and over the two gait cycles (in black). Note that the two-gait-cycle pattern was selected so as to preserve the FFT characteristics, in terms of amplitude and frequency (stars), of the overall 40 s pattern. These characteristics are used as parameters to generate the sine waves fed as input to the DRNN.

Three learning sets were defined according to the input used (SEA, SEB, SEC). The DRNNs were modeled with 30 hidden units; the network thus comprised 6 inputs, 30 hidden units, and 6 outputs.

the data of the lower limb segment elevation angles into the frequency domain using the Matlab *fft* function to get the FFT power amplitudes and the related frequency values of the first three harmonic peaks (**Figure 2B**). It has been shown previously that the first two Fourier harmonics account for approximately 98% of the experimental variance of the thigh, shank, and foot angles (Bianchi et al., 1998). We decided to create sine waves based on the characteristics of the first three harmonics to ensure that the signal proposed as learning input to the DRNN contains enough information to match the desired output precisely.
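
The harmonic-extraction step can be sketched as follows, here using NumPy's `rfft` in place of the Matlab *fft* call described in the text; the signal, sampling rate, and amplitudes below are synthetic stand-ins, not experimental data.

```python
import numpy as np

def first_three_harmonics(angle, fs):
    """Return the frequencies (f1, f2, f3) and amplitudes (a1, a2, a3)
    of the first three harmonic peaks of a zero-meaned elevation-angle
    signal, mirroring the FFT step described in the text."""
    n = len(angle)
    # single-sided amplitude spectrum (amplitude A of a sine maps to A)
    spec = np.abs(np.fft.rfft(angle - angle.mean())) * 2.0 / n
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    f1_idx = np.argmax(spec[1:]) + 1          # fundamental = dominant non-DC peak
    idx = [f1_idx, 2 * f1_idx, 3 * f1_idx]    # harmonics at f2 = 2 f1, f3 = 3 f1
    return freqs[idx], spec[idx]

# synthetic "two gait cycles" at 100 Hz with a 1 Hz fundamental
fs = 100.0
t = np.arange(0, 2.0, 1.0 / fs)
signal = 30 * np.sin(2 * np.pi * 1.0 * t) + 8 * np.sin(2 * np.pi * 2.0 * t) \
         + 3 * np.sin(2 * np.pi * 3.0 * t)
f, a = first_three_harmonics(signal, fs)   # f ≈ (1, 2, 3) Hz, a ≈ (30, 8, 3)
```

The recovered frequency/amplitude pairs are exactly the parameters used to rebuild the input sine waves for the DRNN.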

We extracted the values of the three frequencies (*f*1, *f*2, *f*3) corresponding to the maximal amplitudes (*a*1, *a*2, *a*3). It was verified that *f*2 = 2*f*1 and *f*3 = 3*f*1. We then artificially produced six sinusoidal signals using these frequency values as parameters (Equations 7 and 8).

$$y_{f_i,1} = \sin(2\pi \times f_i \times t) \tag{7}$$

$$y_{f_i,2} = \sin(-(2\pi \times f_i \times t)) \tag{8}$$

For *i* = 1, 2, 3

These six sine waves thus correspond to the frequency characteristics of the kinematic patterns to be learned. For the sake of clarity we call this set of original inputs SEA (for set of equations A). Additionally, we produced six new sine waves using the preceding computations (Equations 7 and 8) with *f*1′ = *f*1 + 0.25 Hz, and six other sine waves with *f*1′′ = *f*1 − 0.25 Hz, respectively called SEB (for set of equations B) and SEC (for set of equations C). Please note that in the latter two cases the original relations *f*2 = 2*f*1 and *f*3 = 3*f*1 were respected, hence *f*2′ = 2*f*1 + 0.50 Hz and *f*3′ = 3*f*1 + 0.75 Hz in set SEB, and *f*2′′ = 2*f*1 − 0.50 Hz and *f*3′′ = 3*f*1 − 0.75 Hz in set SEC. Three different input sets (SEA, SEB, SEC) were thus defined for learning.
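
A minimal sketch of how the three input sets could be built from Equations (7) and (8); the function name, duration, sampling step, and the value of *f*1 are illustrative (in practice *f*1 comes from the FFT of each subject's kinematics).

```python
import numpy as np

def sine_input_set(f1, t):
    """Six input sine waves (Eqs. 7-8): for each harmonic f_i in
    (f1, 2*f1, 3*f1), one sine and one sign-inverted sine."""
    waves = []
    for i in (1, 2, 3):
        fi = i * f1                                  # enforce f2 = 2 f1, f3 = 3 f1
        waves.append(np.sin(2 * np.pi * fi * t))     # Eq. (7)
        waves.append(np.sin(-(2 * np.pi * fi * t)))  # Eq. (8)
    return np.array(waves)

t = np.arange(0, 2.0, 0.01)          # 2 s sampled at 100 Hz (illustrative)
f1 = 1.0                             # fundamental frequency (from the FFT step)
SEA = sine_input_set(f1, t)          # original frequencies
SEB = sine_input_set(f1 + 0.25, t)   # shifted up:   f1' = f1 + 0.25 Hz
SEC = sine_input_set(f1 - 0.25, t)   # shifted down: f1'' = f1 - 0.25 Hz
```

Because the shift is applied to the fundamental before multiplying by the harmonic index, the relations *f*2 = 2*f*1 and *f*3 = 3*f*1 are preserved in all three sets, as in the text.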

## *Pattern to learn: kinematic data*

The pattern to be learned corresponds to the elevation angles of the right and left thigh, shank, and foot segments in the two gait cycles of a 3 km/h (0.83 m/s) walk, normalized between −1 and 1. The outputs were the same for SEA, SEB, and SEC, whereas the inputs differed.

### *DRNN learning phase*

It was expected that the DRNN would learn how to transform the input to output thanks to its dynamical and recurrent structure of 30 hidden neurons. For each of the seven subjects, the network iterated 10,000 times. This process was repeated 100 times, leading to 100 different networks. At the end of the learning phase, we selected for each subject the network for which the difference between predicted and real kinematics was minimal. This computation was performed by calculating a similarity index (*SI*) (Bengoetxea et al., 2010) defined by the following equation:

$$SI = \frac{\int_{-T_c/2}^{T_c/2} p_1(t)\, p_2(t)\, dt}{\left[\int_{-T_c/2}^{T_c/2} p_1(t)^2\, dt \int_{-T_c/2}^{T_c/2} p_2(t)^2\, dt\right]^{1/2}} \tag{9}$$

where *Tc* is the period of the limit cycle, and *p*1 and *p*2 are synchronized, i.e., the two patterns are aligned on their maxima. Note that if both functions are identical, *SI* = 1. The *SI* was calculated between the recorded pattern of elevation angles and the output of the DRNN.
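
For sampled, synchronized patterns, Equation (9) reduces to a normalized inner product. A minimal sketch (assuming a simple Riemann-sum discretization over one period; the sampling constants are illustrative):

```python
import numpy as np

def similarity_index(p1, p2):
    """Similarity index of Eq. (9): normalized inner product of two
    synchronized patterns over one limit-cycle period. The common
    time step cancels between numerator and denominator."""
    num = np.sum(p1 * p2)
    den = np.sqrt(np.sum(p1 ** 2) * np.sum(p2 ** 2))
    return num / den

t = np.linspace(0.0, 1.0, 500, endpoint=False)   # one period Tc = 1 s
pattern = np.sin(2 * np.pi * t)

si_same = similarity_index(pattern, pattern)       # identical patterns -> 1
si_opposite = similarity_index(pattern, -pattern)  # anti-phase patterns -> -1
```

This makes the property stated in the text explicit: identical patterns give *SI* = 1, and the index decreases as the predicted and recorded patterns diverge.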

### **EXPERIMENT 2: MULTIPLE PATTERN LEARNING AND PREDICTION**

In this experiment, sine waves modulated in frequency and amplitude were transformed into kinematic data using multi-pattern training. The prediction of kinematic patterns from unknown sine wave patterns was also tested.

## *Input*

As in the proof-of-concept experiment, we extracted two gait cycles from each trial at multiple velocities (in km/h) (*v* = {1.5, 2.5, 3.5, 4.5, 5.5}). We transformed the kinematic data into the frequency domain to get their frequency (*f*1, *f*2, *f*3) and amplitude (*a*1, *a*2, *a*3) parameters (**Figure 2B**), used in the following set of equations (10 and 11).

$$y_{v,f_i,a_i,1} = a_i \times \sin(2\pi \times f_i \times t) \tag{10}$$

$$y_{v,f_i,a_i,2} = a_i \times \sin(-(2\pi \times f_i \times t)) \tag{11}$$

For *i* = 1, 2, 3

# *Patterns to learn: kinematic data*

The patterns to be learned consisted of elevation angles of the right and left thigh, shank, and foot segments corresponding to the two gait cycles, normalized between –1 and 1. These patterns were obtained from recordings at multiple velocities (in km/h) (*v* = {1*.*5*,* 2*.*5*,* 3*.*5*,* 4*.*5*,* 5*.*5}).

### *DRNN training phase*

We exploited the possibility of training the DRNN on multiple patterns (Bengoetxea et al., 2005). For each subject, the DRNN was trained to match the input/output patterns corresponding to five different velocities within a single DRNN structure. One hundred networks were generated per subject, each of them iterating 50,000 times. This operation was performed for networks with 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, and 80 hidden units. When this was completed, we selected the best network for each subject and each number of hidden units using the maximal *SI* values.

### *Output prediction*

We built sine waves for intermediary pattern velocities (km/h) (*v* = {2, 3, 4, 5}) as explained above. We fed the best DRNN structures obtained after the training phases with these unlearned input sine waves and analyzed the DRNN behavior by comparing the predicted output with the experimental data using *SI* values.

### **NEURONAL PROPERTIES AND CONNECTIVITY AFTER LEARNING**

The introduction of adaptive time constants allows the modeling of more complex frequency behavior, enhancing the non-linear effect of the sigmoid function and the memory effect of time delays (Draye et al., 1995). The distributions of the time constants and of the synaptic weights between units (Draye et al., 1996) were analyzed after multiple-pattern learning and prediction. In particular, we recorded the number of positive and negative connections inside the DRNN structures of the best networks. Additionally, we studied the distribution of the neuronal time constants.

### **STATISTICAL ANALYSIS**

Statistical analysis was performed using Statistica software (Statsoft, www.statsoft.com). For each test described in the results section, we first verified whether the data were normally distributed using the Kolmogorov–Smirnov test. When the data were normally distributed, we used ANOVA with repeated measures and *post-hoc* Fisher analyses. When it was not possible to use ANOVA, we used non-parametric tests such as the Friedman ANOVA or sign tests.

## **COMPUTATIONS**

All DRNN computations were performed in the Hydra computing center hosted at ULB (https://cc.ulb.ac.be/hydra/). We allocated 1 node and 10 GB of memory per computation (i.e., per subject per condition in Experiment 1, per subject per structure in Experiment 2). The computations were run in parallel to optimize the learning duration. Afterwards, we simulated 5% of the overall experiment in the same conditions to estimate the learning time; the overall duration of the process was obtained by linear interpolation (**Figure 4**).

# **RESULTS**

### **EXPERIMENT 1: PROOF OF CONCEPT**

A statistical test was designed to compare *SI* values from the different input types (SEA, SEB, or SEC) (**Figures 3**, **5**). The Kolmogorov–Smirnov test did not reject the hypothesis that *SI* values were normally distributed (*D* = 1.4414, *p* > 0.20) when the values of the different inputs (SEA, SEB, and SEC) were analyzed together. We used an ANOVA with repeated measures where the dependent variable was *SI* and the independent variable was the type of input. The analysis showed an effect of the input frequency on the DRNN prediction (*SI* value) [*F*(2, 12) = 38.110, *p* = 0.00001]. *Post-hoc* analysis confirmed that the *SI* values of the original group with unchanged frequency input (SEA) were higher than those of the two modified groups in which the input frequencies had been changed (SEB and SEC).

### **EXPERIMENT 2: PREDICTION OF INTERMEDIARY VELOCITIES**

As we decided to use the frequency (*f*1, *f*2, *f*3) and amplitude (*a*1, *a*2, *a*3) characteristics to modulate the inputs for the multiple learning procedure (Equations 10 and 11), we verified that these parameters differed significantly across velocities. The Kolmogorov–Smirnov test for *f*1 (*D* = 0.14370, *p* < 0.1), *f*2 (*D* = 0.14370, *p* < 0.1), and *f*3 (*D* = 0.14370, *p* < 0.1) was not clear enough to reject the possibility that these populations follow a normal law. The distributions tend to be normal, and it is possible that the significance of the test is due to the small number of values. According to a similar test,

**FIGURE 4 |** [...] of the procedure. Please note that the overall process for the seven subjects took approximately 200 days to compute. **(B,C)** Only the units of the hidden layers are represented, without input or output neurons.

the values for parameters *a*1 (*D* = 0.08476, *p* > 0.2), *a*2 (*D* = 0.08758, *p* > 0.2), and *a*3 (*D* = 0.09035, *p* > 0.2) were normally distributed. We then used two ANOVAs with repeated measures in which the dependent variables were, respectively, the amplitude and frequency values, and the within-subject factors were walking velocity and harmonic number (1, 2, or 3).

The ANOVA showed significant changes in amplitude [*F*(10, 60) = 31.351, *p* < 0.0001] and frequency [*F*(10, 60) = 69.276, *p* < 0.0001] with increasing velocity. *Post-hoc* analyses revealed an increase in *f*1, *f*2, *a*1, *a*2, and *a*3 and a decrease in *f*3 with increasing velocity. These significant differences justified their use for building specific sine waves for the different walking velocities (**Figure 6**).

Concerning the performance of the DRNN outputs, we wanted to verify whether the *SI* values of the best networks differed between learning and prediction. Additionally, we wanted to check statistically whether the number of hidden units increases the *SI* value (**Figure 7**). The Kolmogorov–Smirnov test did not confirm that the populations of *SI* values for learning (*D* = 0.18467, *p* < 0.01) or prediction (*D* = 0.20684, *p* < 0.01) were normal. Thus, to compare the *SI* values between the learning and prediction processes we chose a sign test, as the structures of the networks (weights and time constants) were the same. This test revealed no significant differences in *SI* values between the two populations except when the network contained 1, 2, 4, 5, 6, or 7 hidden neurons. Moreover, we used a Friedman test to analyze the *SI* values (dependent variable) according to the number of hidden units (independent variable). There was an effect of the number of hidden units on the *SI* both for learning [χ²(16) = 111.5630, *p* < 0.0001] and for prediction [χ²(16) = 109.3721, *p* < 0.0001].

We also analyzed the output of the DRNN for specific velocities in the networks with the largest hidden layer (80 units). A Kolmogorov–Smirnov test run on the pooled learning and prediction values did not confirm that the population of *SI* values was normal (*D* = 0.18342, *p* < 0.05).

**FIGURE 6 |** [...] velocities [from 1 to 6 km/h (0.28 to 1.67 m/s)]. The FFT peaks [...] interval.

We used a Friedman test to analyze the *SI* values (dependent variable) according to the velocity (independent variable). There was an effect of velocity on the *SI* [χ²(8) = 30.22, *p* = 0.00019] (**Figure 8**). A *post-hoc* analysis at the 0.05 significance level revealed that the *SI* values at 4.5 km/h differed from those at 1.5 and 2 km/h; the *SI* values at 4 km/h also differed from those at 2 km/h. An example of prediction of intermediate velocities in one subject is illustrated in **Figure 9**.

### **DRNN STRUCTURES FROM EXPERIMENT 2**

### *Weight distribution analysis*

The percentage of positive and negative weights was calculated for the best network per subject and per condition. We wanted to verify whether the distributions of the positive and negative weights differed (**Figure 10A**). The Kolmogorov–Smirnov test did not confirm that the distribution of positive weights (*D* = 0.15022, *p* < 0.01) or that of negative weights (*D* = 0.15022, *p* < 0.01) was normal. We therefore used a non-parametric sign test to compare the two distributions, as the two samples were dependent. Regardless of the number of hidden units, the test showed a difference between the two populations (*Z* = 9.482, *p* < 0.0001). When the number of units was taken into account, the population of negative weights was larger than the population of positive weights whenever the network possessed more than 3 hidden units (for 4 hidden units, *Z* = 2.268, *p* = 0.023342). When the structure of the

**FIGURE 7 |** *SI* **values of all multi-patterns learning and prediction for all participants with different number of hidden neurons.** Whiskers correspond to 95% confidence interval.

DRNN reaches 80 hidden neurons, 70*.*6 ± 0*.*84% of the synapses are negative.

### *Time constant distribution analysis*

The distribution of the time constants was represented by the median of the neuronal time constants of the best networks per subject and number of hidden units (**Figure 10B**). The Kolmogorov–Smirnov test indicated a normal distribution of the time constant medians (*D* = 0.09110, *p* > 0.20). An ANOVA with repeated measures was designed with the number of hidden units as the independent variable and the time constant median as the dependent variable. The ANOVA showed a significant effect of the number of hidden neurons [*F*(16, 96) = 3.6245, *p* < 0.0001]. Overall, *post-hoc* analysis revealed that the distribution of the time constants differed when the network was small (fewer than 5 hidden units). It also revealed that the time constants of networks with 80 hidden neurons were distributed differently from those of medium-sized networks (8–10 hidden neurons).

# **DISCUSSION**

### **MAIN FINDING**

We show here that a fully connected recurrent neural network is able to reproduce human walking patterns based on the oscillatory properties of the kinematics. Although this network is a black-box model without a prewired structure mimicking a physiological CPG, its performance allows direct comparison with dedicated CPG structures and related algorithms (Duvinage et al., 2012a). Moreover, by its inherent input–output mapping, the DRNN models not only the CPG but also the neural feedback pathways and the musculoskeletal system. For simplicity, we refer to this neural network as a "CPG-like structure" here. We showed that the DRNN is capable of generating the kinematics, as elevation angle patterns of walking for both limbs (six degrees of freedom), from simple oscillations corresponding to the three main harmonics of the walking kinematics. Moreover, by modulating those frequencies and tuning their amplitudes as input, the DRNN was able to learn and reliably predict walking kinematics at different velocities (**Figure 9**). After this appropriate learning, the DRNN can thus be considered as a CPG-like structure that continuously receives oscillatory inputs to produce the relevant elevation patterns of the six leg segments. Another interesting result emerges from the structure of the best CPGs obtained after learning: all of them contain a majority of negative connection weights between units.

# **LIMITATIONS OF THE PRESENT APPROACH**

Obviously, there is an infinite number of different ways to train the DRNN, and this is a strength of the approach. However, it implies a corollary limitation, as not all possibilities could be tested in the present study. For example, in order to better document its generalization ability, the model could be trained on a defined low range of velocities, e.g., from 1 to 4 km/h, and then tested with unlearned oscillatory inputs corresponding to higher velocities. The reverse procedure could also be performed, i.e., training on faster velocities and predicting slower ones. Furthermore, inter-subject generalization has not been studied in the present investigation; the usefulness of doing so would largely depend on whether the purpose is basic research or application. Another limitation of the present work is the lack of feedback testing, which necessitates a priori identification of a reliable signal and a new operational strategy for learning. Future work will address these aspects specifically.

### **NEUROPHYSIOLOGICAL SIMILARITY BETWEEN MODELED CPG AND CPG IN HUMANS AND OTHER MAMMALS**

The understanding of CPG mechanisms remains central to the study of locomotion (Grillner, 2006; Kiehn, 2006; Rossignol and Frigon, 2011). The CPG is a spinal network of neurons capable of generating a rhythmic pattern of alternating activities between flexor and extensor motoneurons on the same side, with reciprocal activation of the homologous motoneurons of the contralateral limb. This intrinsic spinal circuitry has been well described in many invertebrate and vertebrate animals and is highly conserved even in humans, where greater cortical control of spinal modules, working in conjunction with sensory feedback, is required (Calancie et al., 1994; Bussel et al., 1996; Duysens and Van de Crommert, 1998; Drew et al., 2002; Rossignol et al., 2009). The unique characteristics of human walking probably reflect a complex neural mechanism responsible for pattern production. It is therefore difficult to directly extend experimental findings obtained in quadruped animals to human walking (Barbeau et al., 1998; Capaday, 2002).

The fact that some patients with incomplete spinal injury can move their legs in a rhythmic fashion (Dietz et al., 1995), and that the primary sensorimotor cortex provides oscillatory commands toward the spine during walking (La Fougère et al., 2010), motivates new experiments in which different types of oscillatory signals could be used as input to the CPG-like DRNN. In this context, recent studies have shown EEG oscillations in relation to the gait cycle phase, including event-related spectral perturbation in the alpha-beta and gamma bands (Gwin et al., 2011; Haefeli et al., 2011; Cheron et al., 2012; Wagner et al., 2012).

These results are consistent with top-down control of locomotion (Capaday, 2002) and demonstrate the feasibility of extracting EEG signals from the sensorimotor cortex controlling contralateral foot placement during walking, although distinguishing the brain signals directly linked to the motor commands from those related to the processing of multiple sensory signals remains a hard task. In this context, Petersen et al. (2012) found evidence of synchrony in the frequency domain between the primary motor cortex and the tibialis anterior muscle prior to heel strike during the swing phase of walking, signifying that rhythmic cortical activity is transmitted via the corticospinal tract to the active muscles. Additionally, Wagner et al. (2012) showed a significant difference in the alpha (8–12 Hz) and beta (18–21 Hz) rhythms recorded over the central midline area between passive and active walking with an exoskeleton. The role played by specific oscillations related to the initiation and control of human locomotion coming from supraspinal structures was recently demonstrated by local field potential recordings performed in the pedunculopontine nucleus of parkinsonian patients during rest and unconstrained walking (Thevathasan et al., 2012). Alpha oscillation recorded in the caudal part of this nucleus correlates with gait speed and helps suppress "task-irrelevant" distraction, improving gait performance. Moreover, these authors showed that gait freezing in parkinsonian patients was associated with attenuation of these alpha waves.

Consistent with this aspect of gait physiology, in our model, input sine waves are sufficient to predict successful output with the DRNN and offer the possibility to mimic this type of supraspinal oscillatory input. One could argue that the nonlinear mapping of sinusoidal oscillations to kinematic patterns could be realized by other mathematical functions, such as a Taylor series, but such a multi-dimensional approximation would be highly difficult to obtain and would not permit testing different network configurations mimicking biological organizations such as the CPG. In the present study, we focused on input oscillations derived from the first three harmonics of the kinematic signals, which were slower than the alpha frequency range. However, in future work it may be possible to extract slower oscillations from alpha- or beta-derived signals (envelopes) in order to activate the DRNN.
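The construction of such harmonic inputs can be sketched numerically. Below is an illustrative Python reconstruction of a signal from its first three gait-cycle harmonics via the FFT; the sampling rate, gait frequency, and signal composition are assumptions for the sketch, not the study's actual data.

```python
import numpy as np

# Hypothetical kinematic trace: an elevation angle over two 1-s gait cycles,
# sampled at 100 Hz (illustrative values, not the paper's data).
fs = 100.0                          # sampling rate (Hz)
t = np.arange(0.0, 2.0, 1.0 / fs)
signal = (10.0 * np.sin(2 * np.pi * 1.0 * t)
          + 3.0 * np.sin(2 * np.pi * 2.0 * t + 0.5)
          + 1.0 * np.sin(2 * np.pi * 3.0 * t + 1.0))

def first_harmonics(x, fs, cycle_freq, n=3):
    """Reconstruct x from the first n harmonics of the gait frequency."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    keep = np.zeros_like(spectrum)
    for k in range(1, n + 1):
        idx = np.argmin(np.abs(freqs - k * cycle_freq))  # bin of harmonic k
        keep[idx] = spectrum[idx]
    return np.fft.irfft(keep, n=len(x))

inputs = first_harmonics(signal, fs, cycle_freq=1.0, n=3)
```

Because the synthetic signal here contains only those three harmonics, the reconstruction is essentially exact; for real kinematic traces the residual would reflect the discarded higher harmonics.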

# **STRUCTURAL SIMILARITIES OF MODELED CPG AND NEUROPHYSIOLOGICAL CPGs IN ANIMALS**

It has been suggested that the biological CPG serves two basic functions: rhythm generation (RG) and pattern formation (PF).

Initially proposed by Perret and Cabelguen (1980), the idea that the biological CPG is composed of a rhythm generator and a pattern-amplitude generator is now widely accepted (Kriellaars et al., 1994; Guertin et al., 1995; Perreault et al., 1995; Grillner, 2006; Kiehn, 2006; Talpalar et al., 2011) and paved the way to more complex, multi-level CPG models (McCrea and Rybak, 2008; see below).

It is well-recognized that rhythm-generating networks can be realized by means of (1) pacemaker neurons with intrinsic membrane properties, such as those described in the stomatogastric ganglion of crustaceans or in the mammalian thalamus (Steriade and Llinás, 1988), or (2) simpler neurons without intrinsic pacemaker properties that interact through inhibitory synapses, producing oscillation as an emergent property of the neuronal population (Geisler et al., 2005). Both neuronal systems thus present the fundamental ability to oscillate. First described in the tadpole and lamprey CPGs, glutamatergic excitatory neurons distributed along the cord (Grillner, 2003) assume the function of rhythm generator by driving motor neurons and other ipsilateral and commissural inhibitory neurons that coordinate the different CPG modules. By blocking the inhibitory network in the lamprey, and also in rodent and cat, many authors (see Kiehn, 2006 for a review) have demonstrated that the glutamatergic burst neurons are the generators of the CPG rhythm.

In addition to intrinsic RG properties, the walking CPGs need to integrate the ipsilateral coordination of flexors and extensors across the same or different joints in a limb and perform interlimb coordination. It has been proposed (Zhong et al., 2012) that a subpopulation of CPG neurons driving extensor activity is tonically active and is regulated via inhibitory interactions with another rhythmic CPG structure responsible for flexor activity in the same hemicord. This assumption may explain why, during experimental recordings on the isolated neonatal mouse spinal cord, spontaneous deletions of extensor activity do not perturb rhythmic flexor activity. Thus, the inhibitory interneurons play a major role in the temporal sculpting and coordination of the CPG units. The interneurons and the Renshaw cells are involved in this function and in the regulation of walking speed. In addition, left-right coordination is achieved by a complex network of excitatory and inhibitory commissural interneurons acting on both motor neurons and inhibitory interneurons of the contralateral side (Kiehn, 2006). Interestingly, we have shown that a large percentage of artificial neurons become inhibitory (negative synaptic weight) when the number of neurons progressively increases in the DRNN structure. In this context, it was recently demonstrated in awake mice that the spiking activity of inhibitory neurons of the barrel cortex is organized so as to balance excitation and prevent explosive activity in the recurrently connected cortical microcircuit (Gentet et al., 2010). This physiological mechanism can also be proposed in the present case of the emergent structure of the artificial DRNN circuit. Another, non-exclusive explanation may reside in the prevalence of inhibitory recurrent connections for producing network oscillation (Geisler et al., 2005; Wildie and Shanahan, 2011).
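The last point — that oscillation can emerge from recurrent connectivity containing inhibitory (negative) weights, with no pacemaker unit — can be illustrated by a deliberately minimal linear sketch (this is not the DRNN itself; the weights and initial rates are arbitrary):

```python
import numpy as np

# Two units coupled through one inhibitory (negative) and one excitatory
# (positive) recurrent weight. The skew-symmetric weight matrix makes the
# rate vector rotate, i.e. oscillate, without any intrinsic pacemaker.
W = np.array([[0.0, -1.0],   # unit 1 is inhibited by unit 2
              [1.0,  0.0]])  # unit 2 is excited by unit 1
x = np.array([1.0, 0.0])     # initial firing rates (arbitrary units)
dt = 0.01
trace = []
for _ in range(2000):        # 20 time units of Euler integration
    x = x + dt * (W @ x)
    trace.append(x[0])
trace = np.array(trace)

# Unit 1's rate crosses zero repeatedly: an emergent network oscillation.
sign_changes = int(np.sum(np.diff(np.sign(trace)) != 0))
```

With roughly three rotation periods in the simulated interval, the first unit's rate changes sign several times, which is the signature of the emergent oscillation.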
In their review, Nishimaru and Kakizaki (2009) proposed that inhibitory interneurons play a major role in the CPG of the rodent spinal cord. These interneurons are likely to control the bursting of motor neurons during locomotion, and it appears that the synaptic transmission mediated by glycine and GABA shifts from excitatory to inhibitory during the prenatal period. It was recently demonstrated that, in the absence of glutamatergic synaptic transmission, the flexor-extensor alternation appears to be generated by the inhibitory interneurons mediating reciprocal inhibition from muscle proprioceptors to antagonist motor neurons (Talpalar et al., 2011).

The present artificial model does not pretend to mimic the complexity of the CPG structure. Instead, it presents a highly simplified recurrent organization from which CPG-like dynamic function emerges following appropriate learning. Sinusoidal inputs serve as a temporal reference to produce rhythmic angular patterns. This model could correspond to the RG structure described previously as a higher-order structure that determines the rhythmic output of the system (McCrea and Rybak, 2008; Zhong et al., 2012), since sine waves are transformed into kinematics. Another, lower-order structure responsible for phasing and intensity coordination (McCrea and Rybak, 2008; Zhong et al., 2012) could be implemented by another DRNN transforming theoretical kinematics into practical muscle commands. We have already studied such a relation, in which EMG signals from walking were used to predict kinematics (Cheron et al., 2003). To conclude on this point, we propose that two specific DRNNs (one producing elevation angles from sine waves and one producing muscular patterns from elevation angles during walking) could act as a complementary top-down pathway producing adequately coordinated patterns, as has been proposed to model locomotion in the spinal mouse (Zhong et al., 2012).
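For concreteness, a forward pass of such a sine-driven recurrent stage might be sketched as below. This is only a structural illustration with random, untrained weights; the actual DRNN (Draye et al.) is trained by gradient descent and also adapts the unit time constants, which is omitted here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Structural sketch of a leaky, fully recurrent network driven by a sine wave.
# All weights are random placeholders, i.e. the network is untrained.
n_hidden, n_out, n_steps = 8, 3, 500
dt = 0.01
tau = np.full(n_hidden, 0.1)                     # unit time constants (s)
W = rng.normal(0.0, 0.5, (n_hidden, n_hidden))   # recurrent weights
W_in = rng.normal(0.0, 1.0, (n_hidden, 1))       # input weights (sine drive)
W_out = rng.normal(0.0, 1.0, (n_out, n_hidden))  # readout: "elevation angles"

y = np.zeros(n_hidden)
outputs = np.zeros((n_steps, n_out))
for k in range(n_steps):
    u = np.sin(2 * np.pi * 1.0 * k * dt)         # oscillatory input
    dy = (-y + np.tanh(W @ y + (W_in * u).ravel())) / tau
    y = y + dt * dy                              # Euler step of unit dynamics
    outputs[k] = W_out @ y
# Training would shape `outputs` into desired angle waveforms; omitted here.
```

The tanh nonlinearity keeps the hidden rates bounded, so the untrained forward pass is stable; learning would then mold the output trajectories into the target kinematics.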

The present results can also be discussed in the light of the electrical stimulations performed in the mesencephalic locomotor region (MLR) inducing locomotor behavior in the decerebrate cat (Shik et al., 1969) or in the lamprey (McClellan and Grillner, 1984). In mice, prolonged rhythmic stimulation on the midline of the caudal hindbrain or the ventral spinal cord (C1–C4) induces a stable locomotor activity (Talpalar et al., 2011). Typically, low-frequency stimulation leads to slow-frequency movements and, conversely, fast-frequency stimulation leads to fast-frequency movements. Our model is in accordance with this physiological behavior: when the amplitude of the artificial sine wave inputs increases, the amplitude of stepping increases as well, leading to a change in walking velocity (**Figure 6B**). In terms of neurological development, there is some evidence for the existence of a CPG very early in CNS maturation (Yang et al., 1998; Dominici et al., 2011). Neonatal, so-called "infant" stepping has been ascribed to similar EMG activity patterns in different directions inducing stereotyped yet non-functional walking patterns (Lamb and Yang, 2000). This led the authors to conclude that the same CPG controls different stepping in human infants, in contrast with some studies in adults (Thorstensson, 1986; Grasso et al., 1998). Interestingly, we found that a DRNN with only four hidden artificial neurons can generate a walking pattern, whereas at least 50 hidden neurons are required to generate accurate movements (**Figure 7**). Obviously, recruitment and training of such high numbers of neurons requires long computational times. For example, with a DRNN structure of 4 hidden units, the learning process lasts about 5 min on our computer, while 160 min are necessary for a DRNN containing 80 units (**Figure 4**).

## **PERSPECTIVES**

Such a tool can be used to produce gait kinematics in numerous and various applications. In rehabilitation, it can be used to help people recover a walking pattern corresponding to their physical characteristics by training with appropriate feedback. Specifically dedicated DRNNs based on the proper dynamics of participants could be used for medical applications such as prosthesis and exoskeleton control (Cheron et al., 2012). The tool can also be integrated in BCI applications in which higher-order commands are used, e.g., from steady-state visual or somatosensory evoked potentials (Cheron et al., 2012) or the P300 (Castermans et al., 2011; Duvinage et al., 2012b). This neuronal avenue might lead to the decoding of the higher neuronal commands that govern CPG mechanisms. Since these CPGs can be trained using specific sinusoidal frequency signals, it might be possible to extract this type of signal from specific EEG rhythms, since the brain itself is an effective machine for producing oscillations (Buzsáki and Draguhn, 2004). One of the strengths of this approach is that it is not necessary to determine in advance the topology and the timing sequences between the artificial neurons. This contrasts with other CPGs, such as a recently developed one (Duvinage et al., 2011) based on coupled oscillators (Righetti and Ijspeert, 2006), where adjustment of intrinsic parameters by optimization techniques was necessary.

In future studies, by introducing an informational delay (Draye et al., 1997) or an artificial distance based on a Gaussian factor modulating the weights between the different neurons (Draye et al., 2002), it will be possible to analyze the self-tailored organization of the links between neurons and the possible emergence of specific topologies. In this case it will also be possible to select different modular architectures of the DRNN.

# **ACKNOWLEDGMENTS**

This work was funded by the Belgian Federal Science Policy Office, the European Space Agency, (AO-2004, 118), the Belgian National Fund for Scientific Research (FNRS), the research funds of the Université Libre de Bruxelles and of the Université de Mons (Belgium), the FEDER support (BIOFACT), and the MINDWALKER project (FP7 – 2007–2013) supported by the European Commission. M. Duvinage is a FNRS Research Fellow. This paper presents research results of the Belgian Network DYSCO (Dynamical Systems, Control, and Optimization), funded by the Interuniversity Attraction Poles Program, initiated by the Belgian State, Science Policy Office. The scientific responsibility rests with its author(s).


Zhong, G., Shevtsova, N. A., Rybak, I. A., and Harris-Warrick, R. M. (2012). Neuronal activity in the isolated mouse spinal cord during spontaneous deletions in fictive locomotion: insights into locomotor central pattern generator organization. *J. Physiol. (Lond.)* 590, 4735–4759.

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 20 December 2012; accepted: 09 May 2013; published online: 29 May 2013.*

*Citation: Hoellinger T, Petieau M, Duvinage M, Castermans T, Seetharaman K, Cebolla A-M, Bengoetxea A, Ivanenko Y, Dan B and Cheron G (2013) Biological oscillations for learning walking coordination: dynamic recurrent neural network functionally models physiological central pattern generator. Front. Comput. Neurosci. 7:70. doi: 10.3389/fncom.2013.00070*

*Copyright © 2013 Hoellinger, Petieau, Duvinage, Castermans, Seetharaman, Cebolla, Bengoetxea, Ivanenko, Dan and Cheron. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

# Modular neuron-based body estimation: maintaining consistency over different limbs, modalities, and frames of reference

#### *Stephan Ehrenfeld<sup>1</sup>, Oliver Herbort<sup>2</sup> and Martin V. Butz<sup>1</sup>\**

*<sup>1</sup> Cognitive Modeling, Department of Computer Science, Eberhard Karls University of Tübingen, Tübingen, Germany*
*<sup>2</sup> Department of Psychology, Julius-Maximilians University, Würzburg, Germany*

### *Edited by:*

*Thomas Schack, Bielefeld University, Germany*

### *Reviewed by:*

*Dirk Koester, Bielefeld University, Germany Emili Balaguer-Ballester, Bournemouth University, UK*

### *\*Correspondence:*

*Martin V. Butz, Cognitive Modeling, Department of Computer Science, Eberhard Karls University of Tübingen, Sand 14, Tübingen 72076, Germany e-mail: martin.butz@uni-tuebingen.de*

This paper addresses the question of how the brain maintains a probabilistic body state estimate over time from a modeling perspective. The neural Modular Modality Frame (nMMF) model simulates such a body state estimation process by continuously integrating redundant, multimodal body state information sources. The body state estimate itself is distributed over separate, but bidirectionally interacting modules. nMMF compares the incoming sensory and present body state information across the interacting modules and fuses the information sources accordingly. At the same time, nMMF enforces body state estimation consistency across the modules. nMMF is able to detect conflicting sensory information and to consequently decrease the influence of implausible sensor sources on the fly. In contrast to the previously published Modular Modality Frame (MMF) model, nMMF offers a biologically plausible neural implementation based on distributed, probabilistic population codes. Besides its neural plausibility, the neural encoding has the advantage of enabling (a) additional probabilistic information flow across the separate body state estimation modules and (b) the representation of arbitrary probability distributions of a body state. The results show that the neural estimates can detect and decrease the impact of false sensory information, can propagate conflicting information across modules, and can improve overall estimation accuracy due to additional module interactions. Even bodily illusions, such as the rubber hand illusion, can be simulated with nMMF. We conclude with an outlook on the potential of modeling human data and of invoking goal-directed behavioral control.

**Keywords: modular body schema, sensor fusion, multisensory perception, multisensory processing, multimodal interaction, probabilistic inference, population code, conflicting information**

# **1. INTRODUCTION**

Humans and other animals appear to learn and maintain a body schema<sup>1</sup> (Graziano and Botvinick, 1999; Haggard and Wolpert, 2005), which is used to realize goal-directed behavior control. Evidence for knowledge about the own body schema and the associated body image is already found in 2-month-old children, indicating that this knowledge is acquired very early in life (von Hofsten, 2004; Rochat, 2010). The more accurate the own body schema is, the better the infant is able to separate the external world (von Holst and Mittelstaedt, 1950) from its own body and, consequently, the better the infant is able to actively and goal-directedly explore the world (Konczak et al., 1995; Butz and Pezzulo, 2008). Developmental as well as neuroscientific evidence thus indicates that developing a body schema is critical for developing flexible, goal-directed behavioral control. In this paper we propose a computational neural model of how knowledge about the body can be represented, processed, and learned.

When learning such a body schema, specific challenges must be met. First, sensory information about the body is available in different modalities and frames of reference, so mappings between these modalities need to be established. Second, uncertainty due to noise, external forces, and changes of the body and the environment has to be handled effectively. Third, different information signals about the body may contradict each other, so that maintaining the present body state estimate is non-trivial.

The human brain has solved these challenges. In particular, the brain appears to be able to flexibly integrate multimodal sensory information about the body into a current estimate of its body state. This body state estimate seems to be modularized in two fashions: sensory modality-respective modularizations and body part-respective modularizations.

Evidence for sensor-specific modularizations can be found in brain imaging studies, which suggest that cross-modal sensory information fusion is common when perceiving the own body (Shams et al., 2000; Shimojo and Shams, 2001; Beauchamp, 2005). Related research suggests that body state representations are separated into body parts to certain degrees (Andersen et al., 1997; Gentner and Classen, 2006; Latash et al., 2007; Shadmehr and Krakauer, 2008; de Vignemont et al., 2009). Thus, a highly modularized body state estimate is maintained by our brain.

<sup>1</sup>Note that **Table 1** lists the terminology utilized in this paper.

### **Table 1 | MMF-terminology.**

For maintaining such a modularized but consistent body state estimate, information is effectively interchanged and fused across the modularizations (Tononi et al., 1998; Ernst and Bülthoff, 2004; Stein and Stanford, 2008). Hereby, the information exchange typically depends on how the body is currently positioned and oriented in space (Holmes and Spence, 2004; Butz et al., 2010). Neurological disorders further indicate that both sensory input and body state estimates are fused across modules (Giummarra et al., 2008). To combine incoming sensory information with the most accurate body state estimate, the brain also anticipates body state changes and consequent sensory feedback during movement execution (von Holst and Mittelstaedt, 1950; Blakemore et al., 2000; Sommer and Wurtz, 2006). Many of these interactions seem to take place in early stages of the cortical processing hierarchy (Stein and Stanford, 2008), probably before the sensory information is fully integrated into the own body state estimate. Further evidence for sensory information comparisons and the flexible fusion of this information for maintaining body state estimates is given by multimodal illusions like the rubber hand illusion (Botvinick et al., 1998; Haggard and Wolpert, 2005; Makin et al., 2008) and the Pinocchio illusion (Lackner, 1988). Thus, it appears that while the brain's body state estimate is highly modularized, many interactions ensure an effective estimate maintenance and sensory information integration. However, it remains unclear how, when, and which information is compared and selectively fused.

We recently proposed the Modular Modality Frame (MMF) model (Ehrenfeld and Butz, 2011, 2012, 2013), which models the maintenance of a body state estimate given noisy, multimodal sensory information sources. The MMF model fully relies on hard-coded kinematic knowledge of the simulated body and estimates body states by means of Gaussian probability densities. Here we present a neural extension of MMF—the neural Modular Modality Frame (nMMF) model. The novel contributions of nMMF are as follows:

First, body spaces, current body state estimation modules, and mappings between body modules are now implemented neurally. As a result, nMMF is able to encode arbitrary, even multimodal body state estimations. Moreover, the neural population encodings for body state estimates are plausible from a computational neuroscience perspective (Deneve and Pouget, 2004; Knill and Pouget, 2004; Denève et al., 2007; Doya et al., 2007). Second, we now ensure that the Shannon entropy of a distribution remains unchanged during multi-body state fusion, in order to avoid excessive information gain when fusing dependent sources of information. Third, information exchange is no longer restricted to forward and inverse kinematic mappings. Distal-to-proximal mappings are also included. This means that information about the hand in space can, for example, influence the estimate of the elbow location, of the orientation of the upper arm, or even of the shoulder joint angles.

The remainder of this paper is structured as follows. First, the nMMF model is detailed. Next, nMMF is evaluated on a simulated two-degree-of-freedom arm in a two-dimensional setup. The evaluations show that nMMF is able to detect faulty sensory information on the fly and to propagate information appropriately distal-to-proximal, i.e., from hand to upper arm. In the final discussion, we compare nMMF to related models and sketch out future research directions.

# **2. MATERIALS AND METHODS**

nMMF is inspired by those processes of human body state estimation which are detailed above. In a computational framework, these processes can be approximated by five key assumptions: (1) the body state is continuously estimated probabilistically over time; (2) multimodal, redundant sensory information sources are integrated based on Bayesian principles; (3) the body state representation is modularized along body parts as well as along modalities and their corresponding frames-of-reference; (4) the body modules are locally interactive in that information about the body state is compared and fused locally; (5) the redundant, modularized representation of the body is exploited for autonomous sensor failure detection and subsequent avoidance of the failing sensor's influence.
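Assumption (2) — Bayesian integration of redundant sensory sources — can be illustrated with the textbook precision-weighted fusion of two 1-D Gaussian estimates (a generic sketch, not nMMF's neural population-code implementation; the numbers are invented):

```python
# Precision-weighted fusion of two redundant Gaussian estimates of the same
# quantity: the product of two Gaussians, renormalized, is the Bayes-optimal
# combination of the two information sources.
def fuse(mu1, var1, mu2, var2):
    w1, w2 = 1.0 / var1, 1.0 / var2        # precisions (inverse variances)
    var = 1.0 / (w1 + w2)                  # fused variance shrinks
    mu = var * (w1 * mu1 + w2 * mu2)       # precision-weighted mean
    return mu, var

# E.g., vision reports the hand at 0.30 m (sd 0.01 m) while proprioception
# reports 0.36 m (sd 0.03 m); both numbers are purely illustrative.
mu, var = fuse(0.30, 0.01**2, 0.36, 0.03**2)
# The fused estimate lies near the more reliable (visual) cue, and its
# variance is smaller than either input variance.
```

This shrinking of the fused variance is exactly why uncritical fusion of *dependent* sources is dangerous — the motivation for nMMF's entropy-preserving fusion mentioned later.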

We now detail how these key aspects are realized in nMMF. First, we describe which modules are used, second, how neurons encode the sensory inputs and the body state, third, how information is fused, fourth, how information is projected across modules, fifth, how conflicting information is detected and blocked out, and, finally, how the overall information flow unfolds over time. In the subsequent evaluation section we show how nMMF processes sensory information, how faulty sensory information can be ignored to a certain degree, but also how such faulty sensory information can influence the complete body state estimation.

# **2.1. MODULES**

nMMF represents a body state by a collection of modules, where each module represents an aspect of the overall body state. In particular, nMMF's modules differ with respect to (1) the encoded joint (or the next distal limb) and (2) the modality frame in which the joint or limb is encoded. The term modality frame defines which modality is perceived (location, orientation, or joint angle) and in which frame of reference the modality is encoded (shoulder-centered or "local" with respect to the next proximal limb).

In the following, we focus on a general description of a humanoid arm, although the same principle may apply for a complete body description. First, we specify the state of an arm in general. Next, we detail how nMMF encodes the arm state in its respective modules.

# *2.1.1. Arm specification*

An arm state may be encoded by the arm's location in space, its limb orientations, or its joint angles. With respect to the arm's location, we denote the shoulder (elbow, wrist, fingertips) location by **λ**<sub>0</sub> (**λ**<sub>1</sub>, **λ**<sub>2</sub>, **λ**<sub>3</sub>) (cf. **Figure 1** for an illustration). To derive the arm limb orientations, we simply subtract successive limb locations. To additionally encode the inner rotations of the respective limbs, we define a point **κ**<sub>*i*</sub> for each limb *i*, where **κ**<sub>*i*</sub> is locked relative to the limb. Essentially, **κ**<sub>*i*</sub> always lies somewhere on the unit circle around **λ**<sub>*i*</sub>, where the unit circle's plane is perpendicular to the orientation of limb *i*. Finally, the joint angles of each arm joint *i* are denoted by the Tait-Bryan angles (φ<sub>*i*,1</sub>, φ<sub>*i*,2</sub>, φ<sub>*i*,3</sub>), which rotate about the intrinsic rotation axes <sup>*i*−1</sup>*x*, <sup>*i*−1</sup>*y*′, <sup>*i*−1</sup>*z*″, where one (two) apostrophes denote that the rotation axis has been rotated by the angles φ<sub>*i*,1</sub> (and φ<sub>*i*,2</sub>).

**FIGURE 1 | Schematic of the four "hand"-limb-encoding modules.** Three coordinate systems (solid axes) are shown, together with the components (dashed lines) of the respective encoded vector. Dark gray (Global Location module): the coordinate system is centered around the shoulder with fixed orientation. Encoded is the global location vector, which goes from shoulder to the end-effector. Yellow (Global Orientation module): the coordinate system has the same orientation as the gray one, but in this case the limb orientation is encoded by means of two vectors: a unit vector parallel to the "hand" limb (shown, dashed lines), and a perpendicular vector (not shown). Red (Local Orientation module): the local coordinate system is oriented along the forearm. Relative to this forearm orientation, the orientation of the "hand" limb is encoded by a unit vector parallel to the "hand" limb (shown), and a perpendicular vector (not shown). Green (Local Angle module): the fourth module encodes angles. The same four modules and respective coordinate systems exist for the forearm and the upper arm (not shown). Modified based on Ehrenfeld and Butz (2012, 2013).

### *2.1.2. nMMF's arm encoding*

nMMF encodes probabilistic arm states by means of distributed population codes in redundant modules. In particular, each limb is encoded in four modality frames: global location (*GL*), global orientation (*GO*), local orientation (*LO*), and local (joint) angles (*LA*). Note that other modalities could be used in addition and other combinations of modalities and frames of reference are possible—such as a local location. It is crucial, however, that the chosen combinations form a redundant estimate of the overall body state. nMMF's implemented modules and their interactions are shown in **Figure 4**; **Figure 1** shows the employed modality frames for an exemplar arm.

To encode each modality frame, respective coordinate systems need to be defined. In order to provide a consistent notation for all nMMF modules, we introduce **x**<sup>*Z<sub>i</sub>*</sup> as the estimated arm state of limb *i* in modality frame *Z*, where *Z* ∈ {*GL*, *GO*, *LO*, *LA*}<sup>2</sup>.

The first modality frame encodes the global location (*GL*) of an arm limb. Limb *i*'s end point **λ**<sub>*i*</sub> in the *GL* modality frame is the 3D vector from the shoulder to the end point of limb *i*:

$$\mathbf{x}^{GL\_i} \equiv \boldsymbol{\lambda}\_i - \boldsymbol{\lambda}\_0. \tag{1}$$

The global orientation (*GO*) is a 6D vector. It concatenates a 3D unit vector in the direction of the arm limb and a 3D unit vector perpendicular to the arm limb, which depends on the limb's inner rotation:

$$\mathbf{x}^{GO\_i} = \begin{pmatrix} \text{unit}\left(\lambda\_i - \lambda\_{i-1}\right) \\ \text{unit}\left(\kappa\_i - \lambda\_{i-1}\right) \end{pmatrix},\tag{2}$$

As both vectors are unit vectors and are perpendicular to each other, three degrees of freedom are canceled out and all remaining orientation vectors form a 3D manifold in 6D space.

The local orientation (*LO*) is analogous, but expresses both subvectors in a local coordinate system (e.g., *LO*<sup>2</sup> is expressed in a coordinate system whose axes are defined by *GO*1). Again, only a 3D manifold remains:

$$\mathbf{x}^{LO\_i} = \begin{pmatrix} \text{unit}\left(^{i-1}\lambda\_i - ^{i-1}\lambda\_{i-1}\right) \\ \text{unit}\left(^{i-1}\kappa\_i - ^{i-1}\lambda\_{i-1}\right) \end{pmatrix}. \tag{3}$$

Note that we use the pre-superscript to denote a particular, relative coordinate system, whereas we use the subscript to denote a particular limb. Furthermore, note that <sup>*i*−1</sup>**λ**<sub>*i*−1</sub> ≡ (0, 0, 0)<sup>*T*</sup> due to the definition of the coordinate system relative to limb *i* − 1.

Finally, the local angles (*LA*) are encoded as Tait-Bryan angles

$$\mathbf{x}^{LA\_i} \equiv \begin{pmatrix} \phi\_{i,1}, \phi\_{i,2}, \phi\_{i,3} \end{pmatrix}^T,\tag{4}$$

which is identical to the arm encoding itself.

Note that all modality frames are at most 3D. Thus, the locality of the modular architecture ensures that the number of neurons needed to represent a particular modality frame with a neural population code of *n* neurons per dimension scales in *O*(*n*<sup>3</sup>).
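For a planar two-link arm — the 2-D analog of the arm evaluated later in the paper — the four modality frames can be computed as follows. In 2-D, orientation reduces to a single unit vector and no inner-rotation point **κ** is needed; the link lengths and joint angles below are illustrative.

```python
import numpy as np

# Illustrative two-link planar arm: upper arm (limb 1) and forearm (limb 2).
L1, L2 = 0.3, 0.25                            # link lengths (m), assumed
phi1, phi2 = np.radians(30), np.radians(45)   # local (joint) angles = LA frame

lam0 = np.zeros(2)                                                  # shoulder
lam1 = lam0 + L1 * np.array([np.cos(phi1), np.sin(phi1)])           # elbow
lam2 = lam1 + L2 * np.array([np.cos(phi1 + phi2),
                             np.sin(phi1 + phi2)])                  # wrist

GL2 = lam2 - lam0                                    # global location, limb 2
GO2 = (lam2 - lam1) / np.linalg.norm(lam2 - lam1)    # global orientation
# Local orientation: limb-2 direction expressed in limb-1's coordinate system
# (rotate the global direction back by -phi1).
c, s = np.cos(phi1), np.sin(phi1)
LO2 = np.array([[c, s], [-s, c]]) @ GO2
LA2 = phi2                                           # local angle of joint 2
```

As a sanity check, in limb-1 coordinates the forearm direction depends only on the joint angle: LO2 equals (cos φ₂, sin φ₂). The redundancy among GL, GO, LO, and LA is exactly what nMMF exploits for cross-module consistency.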

### **2.2. PROBABILISTIC REPRESENTATION**

In complex tasks, uncertainty is ubiquitous due to sensory and motor noise, external forces, changes in the environment, and changes of the body schema. To deal with this uncertainty, humans apply probabilistic body state estimations (Ernst and Banks, 2002; Körding and Wolpert, 2004). In computational models (e.g., Ma et al., 2006), state estimates are often simplified by confining probability density estimates to one type of distribution (such as the Gaussian, Gamma or Poisson distributions). However, shapes may vary greatly due to non-linear influences of mappings across modules, constraints (like joint restrictions or obstacles), varying shapes of sensory input to begin with, or even neural disorders. Moreover, in certain circumstances the brain may actually maintain multimodal alternatives about the current body state.

In contrast to MMF, nMMF approximates probability distributions with neural population codes (Deneve et al., 1999) to enable the representation of probability distributions with arbitrary shapes. Each neuron in such a code is responsive to specific values of the input data (preferred value) and thus has a local receptive field of a particular size. Note that by using population codes, the shapes of the encoded probability distributions become unconstrained. The modularity of nMMF ensures a scalable neural encoding of the arm or even the full body. In the following, we describe how the receptive fields and the preferred values of the population neurons are determined.

# *2.2.1. Sampling of neural populations*

In order to create neurons only within the reachable manifolds, we let the populations of neurons grow while observing simulated arm states. This is done in the following way: A simulated arm is set to a random arm position, which is uniformly distributed in angular space. Then, noiseless measurements **z***<sup>j</sup>* are obtained in each module *j*. If

$$||\mathbf{z}^j - \mathbf{x}\_l^j|| > d\_{\text{min}} \,\forall \, l \in \{1, \dots, N^j\}, \tag{5}$$

<sup>2</sup>Without any additional specification, arm states are encoded in a "global", i.e., shoulder-respective coordinate system. In the case of the local orientation (*LO*) modality frame, however, the coordinate system used to encode the state is relative to the next proximal arm limb. We use the pre-superscript to denote the encoding of a location in a limb-relative coordinate system. For example, <sup>*i*−1</sup>**λ**<sub>*i*</sub> denotes the location of limb *i* relative to the location **λ**<sub>*i*−1</sub> and orientation of limb *i* − 1.

a new neuron is added at **z**<sup>*j*</sup>, where **x**<sub>*l*</sub><sup>*j*</sup> denotes the preferred value of neuron *l* and *N*<sup>*j*</sup> the current number of neurons that exist in module *j*. Next, the arm is set to a new random position. Thus, all sampling positions are independent of each other and the resulting neurons in each module are approximately uniformly distributed, covering the reachable manifold.
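This growth procedure can be sketched for the global-location module of a planar two-link arm (a simplified 2-D illustration; link lengths, *d*<sub>min</sub>, and the number of samples are assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

# Grow a population code over the reachable manifold: a neuron is added
# only where no existing preferred value lies within d_min (Equation 5).
L1, L2, d_min = 0.3, 0.25, 0.05     # illustrative link lengths and threshold
neurons = []                         # preferred values x_l
for _ in range(3000):
    phi = rng.uniform(0, np.pi, size=2)   # random posture, uniform in angles
    elbow = L1 * np.array([np.cos(phi[0]), np.sin(phi[0])])
    z = elbow + L2 * np.array([np.cos(phi.sum()), np.sin(phi.sum())])
    if all(np.linalg.norm(z - x_l) > d_min for x_l in neurons):
        neurons.append(z)            # noiseless measurement becomes a neuron
neurons = np.array(neurons)
```

By construction, the preferred values tile only the reachable region of hand locations, with all pairwise distances above *d*<sub>min</sub> — no neurons are wasted on unreachable states.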

# *2.2.2. Tuning function*

Each neuron has an associated tuning function (Deneve et al., 1999), which specifies how the neuron responds to a signal. We use Gaussian tuning functions with mean $\mathbf{x}_l$ and covariance **R**. For instance, if a measurement signal occurs at position **z**, the probability density function (PDF) at $\mathbf{x}_l$ is:

$$\mathbf{p}\_l = N\left(\mathbf{z}, \mathbf{R}\right)\left(\mathbf{x}\_l\right). \tag{6}$$

In effect, a Gaussian PDF is activated over the whole neural population (cf. **Figure 2**, yellow bars for an illustration). If the covariance **R** of all tuning functions is equal to the sensor covariance, then Equation (6) is the same as the inverse measurement model (Thrun et al., 2005).

Since probability mass has to be conserved when information flows from one module to another in nMMF, we derive the probability mass function (PMF) from the PDF. Note that the neural PMF encoding will typically slightly differ from the PDF encoding in nMMF, because the population codes in nMMF may not be uniformly distributed. This is illustrated in **Figure 2**.

### *2.2.3. Probability mass*

Let **X** be a multivariate random variable, and ω a subset of a sample space Ω. The probability mass $q_\omega$ in ω corresponds to the probability that **X** lies in ω:

$$q_{\omega} \equiv \Pr\left[\mathbf{X} \in \omega\right] = \int_{\omega} \mathbf{p}\left(\mathbf{x}\right) d\mathbf{x} \tag{7}$$

Just as *N* neurons are spread over Ω, Ω is discretized into *N* subsets $\omega_l$, $l \in \{1, \dots, N\}$, which are simply the Voronoi cells $R_l$ of those neurons (cf. Appendix A.2). The probability mass of a neuron can then be approximated by the volume $V_l$ of the cell times the density (Equation 6) at the neuron's position

$$q\_l = \int\_{R\_l} \mathbf{p}\left(\mathbf{x}\right) d\mathbf{x} \approx \frac{V\_l \cdot \mathbf{p}\left(\mathbf{x}\_l\right)}{\sum\_{l^\*=1}^N V\_{l^\*} \cdot \mathbf{p}\left(\mathbf{x}\_{l^\*}\right)},\tag{8}$$

where the denominator normalizes the probability mass to 1. An illustration of a probability mass is shown in **Figure 2**, blue bars. To handle potential approximation errors, we ensure that the sum of the probability mass over all *N* neurons in a module always equals 1, via

$$q\_l \leftarrow \frac{q\_l}{\sum\_{l^\*} q\_{l^\*}},\tag{9}$$

where the symbol "←" is used as a value update assignment.
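The density-to-mass conversion of Equations (6)–(9) can be sketched for a one-dimensional state space. The neuron positions, the measurement *z*, and the sensor variance **R** below are made-up values for illustration; in 1-D, the Voronoi "volumes" are simply the half-distances to the neighboring neurons.

```python
import numpy as np

def gaussian_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x (Equation 6)."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

# non-uniformly placed preferred values x_l on a 1-D state space (assumed)
x = np.array([0.0, 0.15, 0.4, 0.5, 0.55, 0.9, 1.3, 2.0])

# 1-D Voronoi cell volumes V_l: cells end halfway between neighbors
edges = np.concatenate(([x[0]], (x[:-1] + x[1:]) / 2, [x[-1]]))
V = np.diff(edges)

# Equation (6): density response of each neuron to a measurement z
z, R = 0.6, 0.3 ** 2
p = gaussian_pdf(x, z, R)

# Equations (8) and (9): probability mass = volume * density, normalized to 1
q = V * p
q /= q.sum()
```

Note that because the neurons are not uniformly spaced, the neuron carrying the largest mass need not be the one closest to *z*; this is exactly the PDF-versus-PMF distortion illustrated in **Figure 2**.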

# **2.3. INFORMATION FUSION**

With a neural, modularized, probabilistic body state representation in hand, we now focus on information processing and information exchange. In this section, we first detail the fusion of different neurally-represented PDFs and subsequently derive the fusion of different PMFs. Two cases are considered: the information carried by the different PMFs may be either independent or dependent.

The Bayesian fusion (Bloch, 1996) of multiple independent neurally-encoded probability distributions is the neuron-wise product of the respective PDFs. Thus, the fusion yields:

$$\mathbf{p}\_{\text{fused},l} \propto \prod\_{j=1}^{M} \mathbf{p}\_{j,l},\tag{10}$$

where *M* specifies the number of modality frames that are fused, *l* is the index of a specific neuron, and $p_{j,l}$ encodes the probability density that stems from modality frame *j* and that is covered by neuron *l*. As the density can be converted to a mass by $p_l = q_l \cdot V_l^{-1}$, applying this identity to both sides of Equation (10) yields the fusion of PMFs

$$q\_{\text{fused},l} = \frac{(V\_l)^{-(M-1)} \prod\_{j=1}^{M} q\_{j,l}}{\sum\_{l^\*} (V\_{l^\*})^{-(M-1)} \prod\_{j=1}^{M} q\_{j,l^\*}}.\tag{11}$$

When Equations (10) or (11) is used to fuse partly or fully dependent information, the resulting distribution is overconfident (i.e., too narrow).

To correct for this overconfidence, the PDF can be raised to the power of an exponent α < 1. However, since we encode PMFs, additional conversions are again necessary to account for the Voronoi volumes covered by the respective neurons. The correction for overconfidence is thus accomplished by:

$$q\_{\text{fused},l} \leftarrow \frac{V\_l^{1-\alpha} \left(q\_{\text{fused},l}\right)^{\alpha}}{\sum\_{l^\*} V\_{l^\*}^{1-\alpha} \left(q\_{\text{fused},l^\*}\right)^{\alpha}};\tag{12}$$

where the denominator normalizes the mass to 1. The effect is a widening of the encoded PMF, which is illustrated in **Figure 3**.
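A minimal numerical sketch of Equations (11) and (12), assuming a single population of five neurons with equal Voronoi volumes and two made-up input PMFs:

```python
import numpy as np

def fuse(masses, V):
    """Equation (11): neuron-wise Bayesian fusion of M independent PMFs."""
    M = len(masses)
    num = V ** (-(M - 1)) * np.prod(masses, axis=0)
    return num / num.sum()

def widen(q, V, alpha):
    """Equation (12): counter overconfidence by exponentiating with alpha < 1."""
    num = V ** (1 - alpha) * q ** alpha
    return num / num.sum()

V = np.full(5, 0.2)                        # equal cell volumes (assumed)
q1 = np.array([0.05, 0.10, 0.50, 0.30, 0.05])
q2 = np.array([0.10, 0.20, 0.40, 0.20, 0.10])
q_fused = fuse([q1, q2], V)
q_wide = widen(q_fused, V, alpha=0.5)      # wider, i.e., higher-entropy, PMF
```

Fusing sharpens the PMF around the central neuron, while `widen` with α < 1 flattens it again, raising its Shannon entropy.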

To infer the exponent α, a measure of information content is required. We use the Shannon entropy *h* to estimate the amount of information in a PMF:

$$h \equiv -\sum\_{l} q\_{l} \cdot \ln \left( q\_{l} \right), \tag{13}$$

where *ql* may denote the fused distribution as in Equation (11) or any other arbitrary distribution. If all distributions were Gaussian, the exponent could be derived from Equation (12) by requiring that the Shannon entropy in a module before fusion should be equal to the Shannon entropy after fusion:

$$\alpha = \mathbf{e}^{-2\left(\min\_{j} h\_{j} - h\_{\text{fused}}\right)}.\tag{14}$$

Due to the lack of a rigorous derivation of α in the general case, we utilize this approximation to determine α for our population-encoded probability masses in each module.
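Under the entropy-matching rationale above, α can be computed as follows. This is a sketch; as the text notes, the Gaussian-based formula of Equation (14) is applied to arbitrary PMFs only as an approximation.

```python
import numpy as np

def entropy(q):
    """Equation (13): Shannon entropy of a PMF (0 * ln 0 taken as 0)."""
    q = q[q > 0]
    return -np.sum(q * np.log(q))

def alpha_exponent(h_inputs, h_fused):
    """Equation (14): exponent restoring the pre-fusion entropy (exact for Gaussians)."""
    return np.exp(-2.0 * (min(h_inputs) - h_fused))

# Sanity check with 1-D Gaussians: fusing two identical Gaussians halves the
# variance, so the (differential) entropy drops by 0.5 * ln(2) ...
var = 0.3
h_in = 0.5 * np.log(2 * np.pi * np.e * var)
h_fused = 0.5 * np.log(2 * np.pi * np.e * var / 2)
alpha = alpha_exponent([h_in, h_in], h_fused)
# ... and Equation (14) then yields alpha = 1/2, which doubles the fused
# variance back to its original value when applied via Equation (12).
```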

### **2.4. CROSS-MODULE CONNECTIONS**

With notations for modules in nMMF, neurally-encoded probability masses, and information fusion of redundant sources of information at hand, we now specify how the neural, cross-module connections are implemented in nMMF.

Modules may differ along two axes: the limb axis (proximal-to-distal, shown horizontally in **Figure 4**), and the modality frame axis (forward and inverse, shown vertically in **Figure 4**). Information may flow from one or two input modules to a neighboring output module. This may happen diagonally: Out of the four diagonal directions, only three are single transformation steps: proximal-to-distal-forward, proximal-to-distal-inverse, and distal-to-proximal-forward.<sup>3</sup> Together, all three form a triangle in **Figure 4**—e.g., (*GL*2, *GL*3, *GO*3). In robotics, proximal-to-distal-forward and proximal-to-distal-inverse are typically termed forward and inverse kinematics, respectively, while distal-to-proximal mappings are often ignored.

### *2.4.1. Single transformation steps*

Rather than learning the neural connections, here we use hardcoded kinematic mappings

$$\mathbf{x}^{j,k \to i}(m,n) = \mathbf{f}^{j,k \to i}\left(\mathbf{x}\_m^j, \mathbf{x}\_n^k\right),\tag{15}$$

where *i*, *j*, *k* are neighboring modules of nMMF. A derivation of the closed form of $\mathbf{f}^{j,k \to i}$ can be found in Ehrenfeld and Butz (2013).

For all pairs of input neurons *m* and *n*, connections are built to those neurons *l* in the output module that are sufficiently close to the transformation result $\mathbf{x}^{j,k \to i}(m, n)$. The Gaussian

<sup>3</sup>In contrast, the fourth diagonal direction, distal-to-proximal-inverse, is not a single transformation step: the fingertip location and the hand orientation simply do not influence the proximal arm's orientation directly.

**FIGURE 4 | Transformation steps between different modules: The modules (shown as circles) differ with respect to limbs (horizontal axis) and with respect to modalities and frames of reference (vertical axis).** Every transformation step consists of one or two input modules and one output module. An example is the two solid lines on the top right: together, they encode how the wrist location *GL*<sup>2</sup> depends on both the fingertip location *GL*<sup>3</sup> and the global hand orientation *GO*3. Yellow dash-dotted lines are the forward kinematics, dark gray dotted lines the inverse kinematics, and red solid lines the distal-to-proximal kinematics. Modified based on Ehrenfeld and Butz (2012, 2013).

(Equation 30) value for the Euclidean distance of each neuron *l* in the output module *i* to the transformation result $\mathbf{x}^{j,k \to i}(m, n)$ is used as connection strength *w*:

$$\mathbf{w}\_{m,n \to l}^{j,k \to i} = V\_l \cdot N\left(\mathbf{x}^{j,k \to i} \left(m,n\right), \mathbf{R}\_{\text{Map}}^i\right)(\mathbf{x}\_l^i), \tag{16}$$

where the receptive field covariance $\mathbf{R}^i_{\text{Map}}$ regulates how much the mapping itself widens the encoded probability distribution. It models an information loss during a transformation, either due to inaccurate mappings or due to discretization errors. Since we use accurate mappings, we only need to consider the latter and therefore base $\mathbf{R}^i_{\text{Map}}$ on the neuron distance in the output module.

If the transformation step has two inputs from the location modality *GL* (e.g., an elbow location $GL_1$ and a wrist location $GL_2$), the distance of both neurons' preferred values, $\mathbf{x}_m^{GL_2} - \mathbf{x}_n^{GL_1}$, must be approximately equal to the length of the forearm. We introduce a modifying factor *F* with respect to neurons *m* and *n*, which reflects how well this constraint is met:

$$F_{mn} = \mathrm{e}^{-\frac{1}{2}\,\frac{\Delta\mathbf{x}_{mn}^T}{|\Delta\mathbf{x}_{mn}|}\,\left(\mathbf{R}^i_{\text{Map}}\right)^{-1}\,\frac{\Delta\mathbf{x}_{mn}}{|\Delta\mathbf{x}_{mn}|}\,\left(|\Delta\mathbf{x}_{mn}| - d_{\text{limb}}\right)^2}, \tag{17}$$

where $d_{\text{limb}}$ is the length of the respective arm limb, and $\Delta\mathbf{x}_{mn} \equiv \mathbf{x}_m^{GL_2} - \mathbf{x}_n^{GL_1}$ the relative position of both input neurons. Intuitively, $(|\Delta\mathbf{x}_{mn}| - d_{\text{limb}})^2$ penalizes larger deviations from the limb length, and the first factors scale this penalization depending on the covariance of the mapping. For all other transformation steps, no constraints are necessary, and $F_{mn} = 1$ in these cases. In consequence, the connection weights *w* are normalized by

$$\boldsymbol{w}\_{m,n\to l}^{j,k\to i} \leftarrow \frac{\boldsymbol{F}\_{mn}\cdot\boldsymbol{w}\_{m,n\to l}^{j,k\to i}}{\sum\_{l^\*} \boldsymbol{w}\_{m,n\to l^\*}^{j,k\to i}} \; \forall \; m, n,\tag{18}$$

where the modifying factor *Fmn* blocks the influence of pairs of location neurons that do not correspond with the arm length sufficiently well.

Finally, the projection of two probability distributions $q^j$, $q^k$ along the connections $w^{j,k \to i}$ into module *i* yields

$$q_l^i = \frac{\sum_m \sum_n q_m^j\, q_n^k\, w_{m,n \to l}^{j,k \to i}}{\sum_{l^*} \sum_m \sum_n q_m^j\, q_n^k\, w_{m,n \to l^*}^{j,k \to i}}, \tag{19}$$

where the denominator normalizes the overall activity again to 1.
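The construction of Equations (16), (18), and (19) can be sketched for the simplest case: a single-input transformation step (so $F_{mn} = 1$) from a one-link arm's angle module to its location module. The population sizes, the $\sigma_{\text{Map}}$ value, and the random neuron placement are assumptions of the example.

```python
import numpy as np

rng = np.random.default_rng(1)

angles = np.linspace(-np.pi, np.pi, 60, endpoint=False)   # x_m: input module
locs = rng.uniform(-1.2, 1.2, size=(400, 2))              # x_l: output module
V = np.full(len(locs), 1.0)                               # equal cell volumes (assumed)
sigma_map = 0.15                                          # mapping spread (assumed)

# Equation (15), hard-coded kinematics: angle -> location on the unit circle
f = np.stack([np.cos(angles), np.sin(angles)], axis=1)

# Equation (16): Gaussian connection strengths from each input neuron m
# to each output neuron l, based on the Euclidean distance to f(x_m)
d2 = ((f[:, None, :] - locs[None, :, :]) ** 2).sum(axis=-1)
w = V * np.exp(-0.5 * d2 / sigma_map ** 2)

# Equation (18) with F_mn = 1: normalize the outgoing weights per input neuron
w /= w.sum(axis=1, keepdims=True)

# Equation (19): project an input PMF into the output module
q_in = np.exp(-0.5 * ((angles - 0.5) / 0.3) ** 2)
q_in /= q_in.sum()
q_out = q_in @ w
q_out /= q_out.sum()
```

The projected mass `q_out` concentrates on output neurons near the point (cos 0.5, sin 0.5), i.e., the image of the input PMF's mode under the kinematic mapping.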

### *2.4.2. Chain of transformation steps*

As nMMF's modules are strongly interconnected, information flows from any module to all other modules. This requires that multiple information transformation steps be done successively.

In nMMF, information is projected into other modules by means of two different approaches. The first approach is used when information needs to stay independent for determining plausibility estimates (cf. section 2.5). In this case, the forward or inverse kinematic mappings are used without fusing other information on the way. Thus, information is not mixed and projections of independent information sources into a common module stay independent. For example, sensory input from a local angle module may be projected to the corresponding global location module by the forward kinematics chain *LA* → *LO* → *GO* → *GL*. Meanwhile, sensory information from the global orientation may also be projected into *GL* by *GO* → *GL*. These two information sources remain independent of each other but are now represented in a common module and can thus be directly compared.

The second approach is used when information is fused across modules (cf. section 2.6). In this case, the information is projected across the modules of nMMF by alternating between local projection and information fusion steps. For example, the *LA* information is projected to *LO*, where the result is fused with the *LO* input. The fused result is then projected further to *GO*, where the result is fused again, and so on. This method enables the integration of even incomplete information<sup>4</sup> and it reduces computation time because fewer transformation steps are required.

### **2.5. CONFLICT RESOLUTION**

The information exchanged via the specified cross-module connections has a specific certainty associated with it. This certainty is encoded implicitly in the neural population codes in each module. Sensory signals are encoded in a population code by making assumptions about the noise in the signal, typically using a measurement model (Thrun et al., 2005). However, those assumptions can be violated by, for example, sudden occurrences of systematic sensor errors, unfamiliar environmental conditions, or changes in the body schema due to growth or injury. To be able to account for such potentially unknown signal disturbances, nMMF estimates plausibilities for each signal. If a signal has low plausibility, it is mistrusted and its information content is consequently decreased.

Because the true state of the body is unknown, nMMF estimates signal plausibilities by comparing different, redundant information sources. The modular encoding of the body in nMMF is highly suitable for conducting such comparisons. Given several redundant distributions about a body state, a failing distribution can be detected when it systematically and strongly differs from the complementary, redundant sources of information.

### *2.5.1. Acquisition of plausibilities*

Let $m_{12}$ be a measure of how well two sources (or distributions) 1 and 2 match each other. Zhang and Eggert (2009) provide an overview of different potential measures for $m_{12}$. In nMMF, we use the scalar product as the matching measure. Given any neural module *i*, in which

<sup>4</sup>Incomplete information: If e.g., a location input *GL* is transformed into the global orientation module *GO*, the result specifies only one subvector in the direction of the arm, while the other, perpendicular subvector remains unspecified. The second approach can then easily fuse a complete *GO* input onto this incomplete information.

two PMFs (1 and 2) are encoded, their relative match is determined by:

$$m\_{12}^i = \frac{q\_1^i \left(\mathbf{x}^i\right) \cdot q\_2^i \left(\mathbf{x}^i\right)}{||q\_1^i \left(\mathbf{x}^i\right)|| \cdot ||q\_2^i \left(\mathbf{x}^i\right)||}$$

$$= \frac{\sum\_l q\_{1,l}^i \cdot q\_{2,l}^i}{\sqrt{\sum\_l q\_{1,l}^i \cdot q\_{1,l}^i} \cdot \sqrt{\sum\_l q\_{2,l}^i \cdot q\_{2,l}^i}},\tag{20}$$

where the dot · in the first line's numerator is the inner product of the two functions $q_1^i(\mathbf{x}^i)$ and $q_2^i(\mathbf{x}^i)$. The measure $m_{12}^i$ is symmetric, i.e., $m_{12}^i \equiv m_{21}^i$. Thus, if one source has an offset, the matching measure cannot determine which of the two sources has that offset. This can be resolved by comparing multiple pairwise matches, given at least three redundant sources of information.

To identify faulty sensory information, nMMF computes a plausibility value $m^i$ for each information source *i* by comparing it to multiple other redundant information sources *j*. The most direct comparison is done by determining the mean of the matches of channel *i* with all other channels *j* whose information was transferred to module *i*:

$$\left(m^i\right)^* = \frac{1}{N-1} \sum_{j=1,\, j \neq i}^N m_{ij}^i. \tag{21}$$

This measure may be termed an absolute plausibility measure of information source *i*. To obtain the final plausibility value, the relative matching quality is determined by dividing $(m^i)^*$ by the highest absolute plausibility measure $(m^j)^*$ among all related sources:

$$m^i = \frac{\left(m^i\right)^*}{\max_j \left(m^j\right)^*}. \tag{22}$$

The whole process is illustrated in **Figure 5**. In the illustration, sensor $S_4$ is assumed to have a systematic error. As this sensor is always included in the comparisons within its own module $m_4$, but only once in each other module, the arithmetic mean of its matching value is lower than that of the others. In our experience, this approach of comparing pairs of information sources is more robust than, for example, comparing one sensor to the combined information of all other sensors.

In summary, if a channel *i* is in accordance with most of the other channels, the plausibility estimate *m<sup>i</sup>* will be relatively high. In contrast, if a specific channel *i* systematically deviates from all other channels, its plausibility estimate will be relatively low.
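Equations (20)–(22) can be sketched as follows; the four sources are made-up discretized Gaussians over a common 20-neuron module, with the last source carrying a deliberate systematic offset.

```python
import numpy as np

def match(q1, q2):
    """Equation (20): normalized scalar product of two PMFs in a common module."""
    return (q1 @ q2) / (np.linalg.norm(q1) * np.linalg.norm(q2))

def plausibilities(sources):
    """Equations (21) and (22): mean pairwise match per source, rescaled by the best."""
    N = len(sources)
    m_abs = np.array([
        np.mean([match(sources[i], sources[j]) for j in range(N) if j != i])
        for i in range(N)
    ])
    return m_abs / m_abs.max()

grid = np.arange(20)

def pmf(mu, s=1.5):
    q = np.exp(-0.5 * ((grid - mu) / s) ** 2)
    return q / q.sum()

# three sources agree; the fourth has a systematic offset
sources = [pmf(5.0), pmf(5.3), pmf(4.8), pmf(12.0)]
m = plausibilities(sources)   # the offset source receives the lowest plausibility
```

Because the deviating source matches none of the others, its mean pairwise match, and hence its plausibility, is far lower than those of the three agreeing sources.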

### *2.5.2. Usage of plausibilities*

To incorporate the plausibility estimates into the sensor fusion process, the contribution of each information source *i* is weighted by its plausibility estimate $m^i$. This is done via Equation (12), where the exponent $\alpha^i$ needs to depend on the plausibility $m^i$. The boundary constraints are $\alpha^i(0) = 0$ and $\alpha^i(1) = 1$, and the mapping should be strictly monotonically increasing. We simply set $\alpha^i \equiv m^i$, which meets these constraints.

# **2.6. ITERATIVE INFORMATION FLOW**

With all options for information fusion at hand, we can finally specify the iterative information flow in nMMF. nMMF maintains an arm state estimate over time by executing four processing steps in each time step: a prediction step (A), a sensor fusion step (B), an update step (C), and a crosstalk step (D) (cf. **Figure 6**). The prediction step includes the impact of the movement on the estimates. The sensor fusion step first increases the dispersion of those sensory distributions that badly match other sensors. After that, the modified sensory distributions are fused. The next step integrates the sensor fusion result into the estimate of the body state. The last step enforces synchronization between the individual modules of the body state.

### *2.6.1. Prediction step*

In order to be able to use the information from previous time steps, the impact of any movement of the arm on the state estimates $q^i(\mathbf{x})$ is predicted. First, the arm movement $\Delta\mathbf{y}$ and motor noise $\mathbf{P}_{\Delta\mathbf{y}}$ are projected from motor space to all nMMF modules by linear approximations, resulting in $\Delta\mathbf{y}^i$ and $\mathbf{P}^i_{\Delta\mathbf{y}}$. The involved Jacobians can be found in Ehrenfeld and Butz (2013).

Second, the impact of the movement is predicted by convolving the probability distribution of the last time step, $q^i_{t-1|t-1}(\mathbf{x}^i)$, with the Gaussian $N(\Delta\mathbf{y}^i, \mathbf{P}^i_{\Delta\mathbf{y}})$. This convolution can be understood as a translation of $q^i_{t-1|t-1}(\mathbf{x}^i)$ along the vector $\Delta\mathbf{y}^i$ and a blurring with the covariance $\mathbf{P}^i_{\Delta\mathbf{y}}$. Thus, the activity $q^i_n$ of some source neuron *n* in module *i* flows to all target neurons *l* in the same module. The consequent *a priori* activity of target neuron *l*, after movement but before any sensor consideration, can be determined by:

$$q^i_{l,t|t-1} \leftarrow \sum_n \left( q^i_{n,t-1|t-1} \cdot \frac{V_l\, N\left(\mathbf{x}^i_n + \Delta\mathbf{y}^i,\, \mathbf{P}^i_{\Delta\mathbf{y}}\right)(\mathbf{x}_l)}{\sum_{l^*} V_{l^*}\, N\left(\mathbf{x}^i_n + \Delta\mathbf{y}^i,\, \mathbf{P}^i_{\Delta\mathbf{y}}\right)(\mathbf{x}_{l^*})} \right), \tag{23}$$

where the derivation is specified in the Appendix, cf. Equation (31). The equation sums up the activities from all source neurons *n*, where *N* is the Gaussian that performs the translation and blurring. The normalization in the denominator ensures that the activity that flows from each source neuron *n* is preserved.

**FIGURE 6 | Data flow for one limb: for simplicity, the inter-limb dependencies are not shown.** First, the forward model predicts the state estimate after the movement **(A)**. Second, the measurements are transformed from all modality frames to all other frames (dashed lines), where their respective qualities are calculated **(B.1)**. Third, copies of the original measurements are fused, weighted with both the quality and the quantity of their information **(B.2)**. These fused measurements are then integrated in their respective modality frame **(C)**. Lastly, the crosstalk shifts all state estimates toward all other estimates, synchronizing them **(D)**. **(A–D)** are then repeated for other limbs and other time steps. Modified based on Ehrenfeld and Butz (2012, 2013).

# *2.6.2. Multi-sensor fusion*

During multi-sensor fusion, conflicting information content is first reduced by deriving sensory plausibilities for each module (Equation 22) and modifying the sensory inputs accordingly (Equation 12). Second, the modified distributions are projected across modules (Equation 19) in order to provide each module with all the sensory input. During this projection, chains of transformation steps accumulate information from more and more modules along the way. Finally, in each module *i*, the underlying distribution is fused with the outputs from all three chains (forward, inverse, and distal-to-proximal). With Equation (11) the fusion is:

$$s^{i,\text{fused}}_{l,t} = \frac{V_l^{-3}\, s^i_{l,t} \cdot s^i_{l,t}|_{\text{for}} \cdot s^i_{l,t}|_{\text{inv}} \cdot s^i_{l,t}|_{\text{dis}}}{\sum_{l^*} V_{l^*}^{-3}\, s^i_{l^*,t} \cdot s^i_{l^*,t}|_{\text{for}} \cdot s^i_{l^*,t}|_{\text{inv}} \cdot s^i_{l^*,t}|_{\text{dis}}}, \tag{24}$$

where the notation $|_{\text{xyz}}$ indicates the particular sensory information source that is projected into module *i*, and $s^i_{l,t}$ denotes neuron *l*'s share of this information<sup>5</sup>. The denominator normalizes the result.

# *2.6.3. Sensor integration*

After sensor fusion, the fused sensor distributions $s^{i,\text{fused}}_{l,t}$ (Equation 24) are fused again, but this time with the *a priori* state estimate distributions $q^i_{l,t|t-1}$ resulting from the prediction step (Equation 23). The resulting posterior distribution before the final crosstalk step (denoted by ∼) thus equates to:

$$\tilde{q}\_{l,t|t}^i = \frac{V\_l^{-1} q\_{l,t|t-1}^i \cdot s\_{l,t}^{i, \text{fused}}}{\sum\_{l^\*} V\_{l^\*}^{-1} q\_{l^\*,t|t-1}^i \cdot s\_{l^\*,t}^{i, \text{fused}}}. \tag{25}$$

# *2.6.4. Multi-body state fusion*

Finally, the module interaction in nMMF is applied to ensure that the state estimates stay consistent across the modules. This is done the same way as in multi-sensor fusion, except that afterwards the resulting distributions are modified such that each one has the same entropy as it had before (using Equations 12–14).

<sup>5</sup>While *q* denotes the probability mass of a body state estimate, *s* denotes the probability mass of a neuron's response to sensory input.

Thus, during multi-body state fusion, information is first erroneously gained and then corrected for by an artificial information loss. The crosstalk step essentially shifts the means and shapes of each distribution toward those of the other modules, ensuring consistency over modules. It does so without changing the distribution width. As a result, we have determined the final posterior distribution encoded by the probability masses in all neurons *l* for all modules *i*, denoted by $q^i_{l,t|t}$.

This step concludes the iterative information processing in nMMF, which continuously cycles through these processing steps (cf. **Figure 6**) over time. In the following, we validate the functionalities and capabilities of nMMF.

# **3. RESULTS**

To test if nMMF is capable of maintaining a coherent body state estimate, we evaluated nMMF in a simple arm model setup, in which a simulated sensor failure occurs temporarily. We then analyzed whether the sensor failure can be detected (section 3.2); whether the sensor failure can be compensated for (section 3.3); how the available, partially conflicting information is propagated across modality frames (section 3.4); and if the distal-to-proximal mappings improve nMMF's state estimation (section 3.5).

# **3.1. ARM SETUP**

To keep it simple, we use a minimally complex arm that still shows all essential characteristics (i.e., modules that differ with respect to modalities, frames of reference, and limbs, and cross-module interactions as in section 2.6). Specifically, a simulated planar arm with two limbs is used. The arm is controlled by a kinematic simulator, disregarding angular momentum and gravity. The simulator executes noisy movements with mean zero in the (x, y)-plane. The motor noise in the angular modules is

$$\sigma^{LA_1}_{\text{movement}} = \sigma^{LA_2}_{\text{movement}} = \begin{pmatrix} 0 & 0 & 0.1\ \text{rad} \end{pmatrix}^T. \tag{26}$$

Each limb has one degree of freedom and a length equal to 1. Results are averaged over 200 runs. In each run, the arm is initially set to a new random position, while the state estimates start with uniform distributions (i.e., no knowledge).

# *3.1.1. Distribution of neurons*

Both neurons and mappings are built once before starting all 200 runs. The angles $\mathbf{x}^{LA_1}$ and $\mathbf{x}^{LA_2}$ can take on values in the interval (−π, π) on the *z*-axis. The direction parts of the global (local) orientations $\mathbf{x}^{GD_1}$ ($\mathbf{x}^{LD_1}$) and $\mathbf{x}^{GD_2}$ ($\mathbf{x}^{LD_2}$), as well as the location of the elbow, lie on the unit circle. Thus, the populations in the modules $LA_1$, $LO_1$, $GO_1$, $GL_1$, $LA_2$, $LO_2$, and $GO_2$ all need to cover **lines** of length 2π. Only the wrist location deviates from this: it must cover a whole **disk** with radius 2.

Two hundred neurons are sampled in each of the former modules. Thus, the average Euclidean distance between two neighboring neurons equals

$$d\_{\text{avg}} = \frac{2\pi}{200} \approx 0.031\tag{27}$$

The minimum allowed distance between two neurons (cf. section 2.2.1) is set to $d_{\min} = 0.7 \cdot d_{\text{avg}}$. In order to achieve the same average distance in $GL_2$, the number $N^{GL_2}$ of neurons that need to be sampled is defined by

$$
\sqrt{\frac{\pi r^2}{N^{GL\_2}}} = \frac{2\pi}{200}.\tag{28}
$$

The $GL_2$ neurons are distributed on a disk with radius $r = 2 + 3\sigma^{GL_2}_{\text{Map}} \approx 2.09$. The summand 2 accounts for the two limb lengths from shoulder to wrist, while $3\sigma^{GL_2}_{\text{Map}}$ (cf. section 3.1.2) guarantees that some neurons have receptive fields outside of, but close to, the arm's reach. This slightly enlarged neural coverage prevents boundary effects from distorting a probability distribution. The enforced equality (Equation 28) yields $N^{GL_2} = 14.0 \cdot 10^3$ neurons.
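The arithmetic of Equations (27) and (28) can be reproduced directly (using $\sigma_{\text{Map}} \approx d_{\text{avg}}$ from section 3.1.2):

```python
import numpy as np

d_avg = 2 * np.pi / 200              # Equation (27): ~0.031 on the 2*pi-long manifolds
sigma_map = d_avg                    # cf. section 3.1.2
r = 2.0 + 3 * sigma_map              # disk radius for the GL2 population, ~2.09
n_gl2 = np.pi * r ** 2 / d_avg ** 2  # Equation (28) solved for N^{GL2}, ~1.4e4
```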

# *3.1.2. Mappings*

We chose the standard deviation of the mapping's spreading (cf. Equation 16) to equal the average neuron distance, i.e., $\sigma^i_{\text{Map}} = d^i_{\text{avg}} \approx 0.031$. The mappings spread radially, i.e., $\mathbf{R}^i_{\text{Map}} = \text{diag}(\sigma^2_{\text{Map}})$, where diag refers to a diagonal matrix. We discarded any connections that fall outside a $3\sigma_{\text{Map}}$ range.

# *3.1.3. Tracking of information*

In order to track the information influence stemming from one module (here *GL*2), we (1) introduced an offset to *GL*<sup>2</sup> and (2) set its noise very low when compared to the other modules. The offset is introduced for two reasons: to distinguish the information that originates in *GL*<sup>2</sup> from all other information, and to observe how nMMF reacts to the sudden failure of a sensor. The offset has a magnitude of 0.5 limb length. It is switched on at time *t* = 4 and switched off again at *t* = 7. The offset is in a counterclockwise direction (i.e., from the arm's perspective, the offset is to the left). *GL*2's noise is low compared to other modules, in order to increase *GL*2's impact. We chose radial Gaussians for the sensor noise:

$$\sigma^i = \begin{cases} 0.05 \text{ limb length} & \text{if } i = \text{GL}\_2 \\ 0.5 \text{ (in rad, limb length, } \dots \text{)} & \text{otherwise} \end{cases}, \qquad (29)$$

where σ is the standard deviation.

Evaluating nMMF when conflict resolution is applied allows us to determine whether the sensor failure can be detected and how well nMMF compensates for it. When conflict resolution is turned off, the setup shows how information starting in *GL*<sup>2</sup> is generally propagated across modalities, frames of reference, and limbs.

# **3.2. DETECTION OF SENSOR FAILURE**

A sensor failure is modeled by the *GL*2-sensor offset during the interval *t* ∈ [4, 6]. By comparing all sensors, nMMF autonomously infers plausibility measures (Equation 22), which are displayed in **Figure 7**.

Even outside the offset interval, $GL_2$ (top right) shows a low plausibility *m* compared to the other modules. This is because, in general, three aspects characterize a distribution: its mean, its shape, and its dispersion. However, deciding which of these characteristics should be tested by a matching measure *m* depends on the application. For instance, Equation (22) compares all three characteristics. As $GL_2$'s receptive field (Equation 29) is narrower than all other receptive fields, its dispersion is lower, and $m^{GL_2}$ mainly detects the differing dispersions, while it might be more interesting to detect systematic errors of the mean instead. Thus, for this application, a dispersion-independent measure (Ehrenfeld and Butz, 2012, 2013) might be more appropriate. This would yield much higher measures $m^{GL_2}$ than shown in **Figure 7**, top right.

Nevertheless, the measure is still able to detect sensor failure: while the offset is present (*t* ∈ [4, 6]), the plausibility measure drops in the setup with offset (red), as compared to the setup without offset (yellow) (**Figure 7**, top-right).

## **3.3. COMPENSATION OF SENSOR FAILURE**

Plausibilities were introduced as a measure of the quality of an information source. If all sources provide correct data, plausibilities introduce a random change to an otherwise Bayesian fusion. Such a change can only worsen the state estimate. The results confirm this: with plausibilities switched on, state estimates get worse (cf. red vs. yellow, blue vs. green in **Figure 8**). If, however, a sensory source conflicts with the others (red and yellow in the interval *t* ∈ [4, 6]), plausibilities can suppress the influence of the false sensor information and improve the overall state estimate (red vs. yellow in **Figure 8**). This improvement is even visible under strong noise (red vs. yellow in **Figure 8**). Again, a dispersion-independent measure (Ehrenfeld and Butz, 2012, 2013) could improve the performance.

# **3.4. PROPAGATION OF INFORMATION ACROSS MODALITIES, FRAMES OF REFERENCE AND LIMBS**

The setup without conflict resolution (**Figure 8**, yellow and green) shows how information is propagated across modality frames and limbs in general. The yellow peak, which starts in $GL_2$ (top right), is successfully propagated to all other modality frames (from top to bottom) and to the next proximal limb (from right to left). The figure shows the estimation error (the Euclidean distance between the real and the estimated arm state).

# **3.5. PERFORMANCE IMPROVEMENT DUE TO DISTAL-TO-PROXIMAL MAPPINGS**

In order to see whether distal-to-proximal mappings improve or worsen the state estimation, two setups, one with these mappings and one without, are compared. **Figure 9** shows that the proximal limb's state estimate improves (yellow vs. blue, red vs. purple) because additional information flows to it from the distal limb. A slight improvement can even be seen in the distal limb. This is the case because the distal limb profits from more accurate forward and inverse kinematic estimates in the proximal limb.

**FIGURE 8 | An offset is propagated from** *GL***<sup>2</sup> to other modality frames and toward the upper arm (dashed yellow).** The usage of plausibilities reduces the offset's influence (the solid red curve is lower than the dashed yellow curve).

# **4. DISCUSSION**

We introduced the neurally-encoded modular modality frame (nMMF) model, which maintains a consistent and robust but also highly distributed body state estimate over time. As in the previously published Gaussian MMF model (Ehrenfeld and Butz, 2011, 2012, 2013), nMMF represents the body (an arm in the current implementation) modularized into body parts and sensor-respective frames of reference. Local, body-state-dependent mappings allow for continuous interactions between modules, ensuring consistency. Bayesian information fusion principles are applied to fuse sensory information in the respective modules, to compare redundant information across modules, and to adjust the modular body state estimate for maintaining estimation consistency. Forward models are used to anticipate the sensory consequences of the body's own movements and thus to fuse the consequent sensory information even more effectively.

In contrast to the MMF model, we showed that the same principles can be realized by means of a neural implementation, adding to the plausibility of the model. To succeed, population encoding principles for the state estimates had to be employed. To establish a population code in one nMMF module, arm states were sampled randomly. To establish the neural mappings between the population codes, weight matrices were set based on the distances of the connected neurons, where the distances were currently determined by an informed kinematic model of the arm. To determine plausibility values, we used the scalar product to compare two neurally-encoded distributions. To avoid overconfidence in body states and to effectively realize information fusion, we normalized the resulting distributions while maintaining the respective Shannon entropies in the neural encodings.
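As a toy illustration of the scalar-product plausibility comparison described above, the following sketch compares a population-encoded internal estimate against a consistent and a conflicting sensor reading. The function names, the 1-D state space, and the Gaussian tuning curves are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def population_code(mu, sigma, centers):
    """Encode a Gaussian belief as normalized activity over neurons
    with preferred states `centers` (a stand-in for the randomly
    sampled population code used in nMMF)."""
    act = np.exp(-0.5 * ((centers - mu) / sigma) ** 2)
    return act / act.sum()

def plausibility(p, q):
    """Scalar product of two neurally encoded distributions,
    normalized so that identical codes yield 1."""
    return np.dot(p, q) / max(np.dot(p, p), np.dot(q, q))

centers = np.linspace(0.0, 1.0, 101)       # preferred states of 101 neurons
a = population_code(0.50, 0.05, centers)   # internal body state estimate
b = population_code(0.52, 0.05, centers)   # consistent sensor reading
c = population_code(0.80, 0.05, centers)   # strongly conflicting reading

# the conflicting source receives a much lower plausibility value
print(plausibility(a, b), plausibility(a, c))
```

In a fusion step, such plausibility values could then down-weight the conflicting source, mirroring the sensor-failure compensation reported in section 3.3.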

In further contrast to the MMF model, nMMF also includes information exchanges from distal to proximal limbs and joints. This addition enables further-reaching information exchange. For example, information about the hand location can also influence estimates of the lower and upper arm, which was not the case in the MMF model (Ehrenfeld and Butz, 2013).

The evaluations confirmed that information from the wrist location influenced the whole arm estimate. First, we showed that due to the addition of the distal-to-proximal mappings, the location of the elbow or angles in the shoulder were adjusted by nMMF to generate an overall representation that is more consistent with the wrist estimate. We also showed that the additional mappings improve the state estimate due to the additional information exchange. Second, we showed that a systematic sensor error can be detected with the neural encoding. Third, although the inclusion of plausibilities slightly decreases the quality of the state estimate when all information sources are valid, if a sufficiently strong systematic error occurs in a sensor then the plausibility estimate can block this inconsistent information. Such sensor errors can be compared with situations in which visual information about the location of the hand is inaccurate, as is the case in the rubber hand illusion, thus leading to a misjudgment of the hand's location. The distal-to-proximal mappings in nMMF suggest, in addition to a misplacement of the hand, that the internal estimates of the elbow angles and lower arm orientations should be affected by the illusion.

# **4.1. RELATED MODELS**

The original motivation to develop the nMMF model came from SURE\_REACH (Butz et al., 2007), a neural, sensorimotor redundancy resolving architecture, which models human arm reaching. SURE\_REACH and the strongly related posture-based motion planning approaches (Rosenbaum et al., 2001; Vaughan et al., 2006) focused on flexible goal reaching capabilities and on anticipatory behavior capabilities, such as modeling the end state comfort effect (Rosenbaum et al., 1990). The current state of the body, although incorporated during action decision making, was not explicitly represented. In contrast, nMMF primarily focuses on the probabilistic, distributed representation of the body and effective information exchange. However, we believe that the nMMF model is ready to be combined with goal-oriented behavioral decision making, planning, and control routines. Moreover, while the SURE\_REACH model was also implemented by neural grids, it represented the angular space of the arm in one module. Such a representation, however, is unfeasible for a seven-degree-of-freedom humanoid arm. nMMF's modularizations yield spatial encodings that are maximally three dimensional. Thus, nMMF is applicable to a seven-degree-of-freedom arm. In particular, while SURE\_REACH needs *O*(*x*<sup>7</sup>) neurons to cover the angular space of a humanoid arm with a density of 1/*x* neurons per dimension, nMMF only needs *O*(3*x*<sup>3</sup>) neurons to encode a comparable density.
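A quick back-of-envelope check of this scaling argument, using an arbitrarily chosen resolution of x = 10 neurons per dimension:

```python
# Neuron counts for covering a 7-D angular arm space at a
# resolution of x neurons per dimension (x = 10 is an example value):
x = 10
sure_reach = x ** 7       # one monolithic module over all 7 dimensions
nmmf = 3 * x ** 3         # nMMF: modules of at most 3 dimensions each
print(sure_reach, nmmf)   # the monolithic encoding needs ~3000x more neurons
```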

The locality and modularity of nMMF relate the model to the mean of multiple computations (MMC) model (Cruse and Steinkühler, 1993; Schilling, 2011). However, nMMF additionally provides a probabilistic state representation, rigorous Bayesian-based information exchange, and plausibilityenhanced sensory information integration mechanisms. While the MMC model focuses on motor control, the nMMF model focuses on an effective, probabilistic body state representation. Nonetheless, the similarity to MMC suggests that similar motor control routines are implementable on a neural level in nMMF. Moreover, the fact that distributed, multisensory bodily representations serve well for goal-directed motor control (Andersen and Buneo, 2002) suggests that nMMF should be extended with adaptive motor control capabilities.

Various models use population codes for encoding probability distributions and exchange information in a comparable Bayesian fashion (Deneve and Pouget, 2004; Knill and Pouget, 2004; Doya et al., 2007). Information exchange across modalities and frames of reference takes place in the brain. Gain fields are good candidates for realizing frame-of-reference conversions neurally (Andersen et al., 1985; Salinas and Abbott, 1995; Hwang et al., 2003; Deneve and Pouget, 2004). In the current nMMF implementation we used fully connected, direct transformations, which will need to be adjusted to gain-field transformations in order to map two three-dimensional spaces into a third space. Nonetheless, in contrast to the related models, nMMF realizes a fully modularized, distributed probabilistic arm representation, which, to the best of our knowledge, has not been accomplished before. For example, Deneve and Pouget (2004) reviewed a multimodal gain field model that exchanged auditory, visual, and eye position information, enforcing consistency via population encodings. While nMMF has not considered auditory information so far, it goes beyond previous models in that it also incorporates a kinematic chain, relating body parts to each other along the chain. Thus, besides exchanging information across different frames of reference, nMMF also exchanges information from distal to proximal body parts and vice versa.

In sum, nMMF focuses on estimating the own body state, incorporating multiple sources of information across sensory modalities and their respective frames of reference, as well as across neighboring body parts. While flexible goal-oriented behavior cannot be generated by nMMF at this point, the relations to the MMC model, the SURE\_REACH model, and the posture-based motion planning theory suggest that behavioral decision making, planning, and control techniques can be incorporated.

# **4.2. FUTURE WORK**

Although the plausibility measure used in this work is generally well-suited, our previous work showed that a more rigorous normalization can yield very little information loss but the same gain in robustness when plausibilities are applied (Ehrenfeld and Butz, 2012, 2013). A similar normalization in the neural implementation seems to be possible only by means of heuristics, lacking the computational rigor. We are currently investigating alternatives.

In the current nMMF implementation several choices had to be made about which information should be exchanged, how plausibilities should be computed, and which reference frames should be represented. Additional frames-of-reference could be represented, such as a local location frame. Synergistic body spaces may also be represented, potentially accounting for the synergistic properties of the human body, the muscle arrangements, and the neural control networks involved (Latash, 2008). Also, plausibilities may be determined by considering the internal state estimations in addition to the redundant sensory information sources. Finally, the transformations between limbs and frames-of-reference may also be endowed with uncertainties. In this way, the body model itself would become adjustable, potentially accounting for illusions such as the Pinocchio illusion (Lackner, 1988), where a body part (e.g., the nose) elongates phenomenally.

Due to its modularity and focus on bodily representations, we believe that nMMF can be easily integrated into a layered control architecture. In such an architecture, other layers may encode extended bodily motion primitives, plan the desired kinematics of bodily motions, or control the dynamics of the body. In particular, extended motion primitives may be incorporated in order to execute a motion sequence, potentially selectively with any limb or joint currently available, similar to us being able to push down a door handle by means of our hands but also potentially with one of our elbows. Meanwhile, kinematic planning mechanisms may utilize the nMMF representation to generate motion plans online. Finally, lower-level dynamic control layers may be included.

# **5. CONCLUSION**

In conclusion, this paper has shown that a distributed, probabilistic bodily representation can be encoded by modularized neural population codes based on Bayesian principles. The presented nMMF architecture is able to mimic the capability of humans to integrate different sources of information about the body on the fly, weighted by the respective information content. Bodily illusions can also be mimicked. Besides the more rigorous modeling of human data with nMMF beyond qualitative comparisons, we believe that nMMF should be embedded in a layered representation and adaptive control architecture in order to generate flexible and adaptive goal-oriented behavior.

# **ACKNOWLEDGMENTS**

The authors would like to thank the Cognitive Modeling team. Funding from the Emmy Noether program (grant BU1335/3-1) of the German Research Foundation (DFG) is acknowledged. We also acknowledge support by the DFG for the Open Access Publishing Fund of Tübingen University.

# **REFERENCES**


*Artificial Cognitive Systems*. LNAI 5225, eds G. Pezzulo, M. V. Butz, C. Castelfranchi, and R. Falcone (Berlin: Springer-Verlag), 45–62.


107, 61–82. doi: 10.1007/s00422-012-0526-2


in infants: hand trajectory formation and joint torque control. *Exp. Brain Res.* 106, 156–168. doi: 10.1007/BF00241365


Sommer, M. A., and Wurtz, R. H. (2006). Influence of the thalamus on spatial visual processing in frontal cortex. *Nature* 444, 374–377. doi: 10.1038/nature05279

Stein, B. E., and Stanford, T. R. (2008). Multisensory integration: current issues from the perspective of the single neuron. *Nat. Rev. Neurosci.* 9, 255–266. doi: 10.1038/nrn2331

Thrun, S., Burgard, W., and Fox, D. (2005). *Probabilistic Robotics*. Cambridge, MA: MIT Press.


*Development* (Bloomington, IN).

von Hofsten, C. (2004). An action perspective on motor development. *Trends Cogn. Sci.* 8, 266–272. doi: 10.1016/j.tics. 2004.04.002

von Holst, E., and Mittelstaedt, H. (1950). Das reafferenzprinzip. *Naturwissenschaften* 37, 464–476. doi: 10.1007/BF00622503

Zhang, C., and Eggert, J. (2009). "Tracking with multiple prediction models," in *19th International Conference on Artificial Neural Networks (ICANN)* (Limassol, Cyprus), 855–864.

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 05 April 2013; accepted: 08 October 2013; published online: 28 October 2013.*

*Citation: Ehrenfeld S, Herbort O and Butz MV (2013) Modular neuron-based body estimation: maintaining consistency over different limbs, modalities, and frames of reference. Front. Comput. Neurosci. 7:148. doi: 10.3389/fncom.2013.00148*

*This article was submitted to the journal Frontiers in Computational Neuroscience.*

*Copyright © 2013 Ehrenfeld, Herbort and Butz. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# **A APPENDIX**

# **A.1 PSEUDOINVERSE AND ACTIVATION OF A GAUSSIAN**

A multivariate Gaussian with mean **μ** and covariance matrix **P** is given by:

$$N(\boldsymbol{\mu}, \mathbf{P})\left(\mathbf{x}\right) \equiv \frac{1}{\sqrt{(2\pi)^{k}|\mathbf{P}|}} \mathbf{e}^{-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{T}\mathbf{P}^{-1}(\mathbf{x}-\boldsymbol{\mu})},\qquad(\text{30})$$

where *k* is the number of dimensions in the manifold.

In nMMF, three cases occur where activity is spread over neighboring neurons: when sensory inputs are encoded (Equation 6), when neural activity is propagated to other modules (Equation 16), and when the body estimate is updated with the movement (Equation 23). If the involved tuning functions are Gaussian, the activity mass spreads to all individual neurons *l* according to

$$q\_l \left( \mu, \mathbf{P} \right) = f \cdot \frac{V\_l N(\mu, \mathbf{P}) \left( \mathbf{x}\_l \right)}{\sum\_{l^\*} V\_{l^\*} N(\mu, \mathbf{P}) \left( \mathbf{x}\_{l^\*} \right)},\tag{31}$$

where **μ** is the new mean, **P** the tuning function's covariance, and *f* the activity mass, which is spread. For instance, if a sensory input is activated, **μ** is equal to the sensory reading and *f* is 1. If, on the other hand, the activity of a single neuron **x***<sup>n</sup>* is updated with a movement **x**, **μ** is equal to **x***<sup>n</sup>* + **x**, and *f* is equal to the neuron's probability mass *qn*.
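The activity spreading of Equation (31) can be sketched numerically as follows. The 2-D neuron grid, the unit Voronoi volumes, and the diagonal tuning covariance are illustrative assumptions:

```python
import numpy as np

def spread_activity(mu, P_inv, centers, volumes, f=1.0):
    """Spread activity mass f over neurons at `centers` according to
    Gaussian tuning (Equation 31): q_l proportional to V_l * N(mu, P)(x_l).
    The Gaussian normalization constant cancels in the ratio."""
    d = centers - mu                                   # deviations x_l - mu
    g = np.exp(-0.5 * np.einsum('li,ij,lj->l', d, P_inv, d))
    q = volumes * g
    return f * q / q.sum()

# 2-D grid of 21 x 21 neurons with unit Voronoi volumes (example setup)
xs = np.linspace(-1.0, 1.0, 21)
centers = np.array([[x, y] for x in xs for y in xs])
volumes = np.ones(len(centers))
P_inv = np.linalg.inv(np.diag([0.05, 0.05]))           # inverse tuning covariance

q = spread_activity(np.array([0.1, -0.2]), P_inv, centers, volumes)
# the neuron closest to the new mean carries the most activity,
# and the total mass f is preserved by the normalization
```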

If no inverse **P**<sup>−1</sup> exists, it is approximated with the pseudoinverse **P**<sup>+</sup>. The pseudoinverse is computed via a singular value decomposition, which factorizes the (real) *m* × *m* covariance **P** into

$$\mathbf{P} = \mathbf{U}\Sigma\mathbf{V}^T,\tag{32}$$

where **U** and **V** are unitary and **Σ** is diagonal. **U** and **V** can be understood as rotation matrices, while **Σ** is responsible for the scaling. Then the pseudoinverse **P**<sup>+</sup> is

$$\mathbf{P}^+ = \mathbf{V}\Sigma^+\mathbf{U}^T,\tag{33}$$

where the pseudoinverse **Σ**<sup>+</sup> of the diagonal matrix **Σ** is obtained by taking the reciprocal of every non-zero element. If a diagonal element of **Σ** is equal to zero, this has to be interpreted as the probability distribution not depending on that element. Consistent with that interpretation is **Σ**<sup>+</sup>: the corresponding diagonal element remains zero and deviations (**x***<sub>l</sub>* − **μ**) from the mean in the direction of that element are multiplied with zero, i.e., they do not lower the result of the Gaussian (Equation 31). Unfortunately, this is a singularity. For diagonal elements close but unequal to zero, **Σ**<sup>+</sup> and consequently **P**<sup>+</sup> explode. This occurs especially if a sub-manifold in a higher dimensional space needs to be activated (e.g., a sphere of elbow positions).

Thresholds are introduced to prevent discretization errors and small numerical errors from destabilizing the model. Matrix elements (*P*<sup>−1</sup>)*<sub>ij</sub>* larger than the threshold 10<sup>12</sup> are set to zero:

$$(\Gamma)\_{ij} = \Theta \left( 10^{12} - \left( P^{-1} \right)\_{ij} \right) \left( P^{-1} \right)\_{ij},\tag{34}$$

where Θ is the Heaviside function. In the following, the distance vectors α*<sub>l</sub>* and β*<sub>l</sub>* are introduced as (**x***<sub>l</sub>* − **μ**) and Γ(**x***<sub>l</sub>* − **μ**), respectively, with components bounded by the threshold 10<sup>−10</sup>:

$$(\alpha\_l)\_i = \Theta \left( \left| \left[ \mathbf{x}\_l - \boldsymbol{\mu} \right]\_i \right| - 10^{-10} \right) \cdot \left[ \mathbf{x}\_l - \boldsymbol{\mu} \right]\_i \tag{35}$$

$$(\beta\_l)\_i = \Theta \left( \left| \left[ \Gamma \left( \mathbf{x}\_l - \boldsymbol{\mu} \right) \right]\_i \right| - 10^{-10} \right) \cdot \left[ \Gamma \left( \mathbf{x}\_l - \boldsymbol{\mu} \right) \right]\_i \tag{36}$$

Thus, Equation (31) becomes

$$q\_l \left( \mu, \mathbf{P} \right) = f \cdot \frac{V\_l \text{ e}^{-\frac{1}{2} \alpha\_l^T \beta\_l}}{\sum\_{l^\*} V\_{l^\*} \text{ e}^{-\frac{1}{2} \alpha\_{l^\*}^T \beta\_{l^\*}}},\tag{37}$$
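The pseudoinverse and stabilization steps of Equations (32-37) can be sketched with NumPy as follows. The function names, the example covariance, and the neuron positions are illustrative assumptions; note how deviations along a degenerate dimension do not lower the Gaussian, as discussed above:

```python
import numpy as np

def capped_precision(P, cap=1e12):
    """Pseudoinverse of covariance P via SVD (Equations 32-33),
    with elements exceeding the cap set to zero (Equation 34)."""
    U, s, Vt = np.linalg.svd(P)
    s_plus = np.array([1.0 / v if v > 0 else 0.0 for v in s])  # Sigma^+
    P_plus = Vt.T @ np.diag(s_plus) @ U.T                      # Equation (33)
    return np.where(P_plus > cap, 0.0, P_plus)                 # Gamma, Eq. (34)

def activity_weights(centers, mu, Gamma, volumes, eps=1e-10):
    """Stabilized activity spreading of Equation (37), using the
    thresholded distance vectors of Equations (35) and (36)."""
    d = centers - mu
    alpha = np.where(np.abs(d) < eps, 0.0, d)                  # Equation (35)
    b = d @ Gamma.T
    beta = np.where(np.abs(b) < eps, 0.0, b)                   # Equation (36)
    q = volumes * np.exp(-0.5 * np.sum(alpha * beta, axis=1))
    return q / q.sum()                                         # Eq. (37), f = 1

# degenerate covariance: the distribution does not depend on dimension 2
P = np.diag([0.1, 0.0])
Gamma = capped_precision(P)
centers = np.array([[0.0, 0.0], [0.0, 5.0], [1.0, 0.0]])
q = activity_weights(centers, np.zeros(2), Gamma, np.ones(3))
# neurons 0 and 1 differ only along the degenerate dimension,
# so they receive equal activity; neuron 2 is penalized
```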

### **A.2 VORONOI CELL AND VORONOI VOLUME**

When *N* neurons are spread over a sample space at positions **x***<sub>l</sub>*, *l* ∈ (1..*N*), the Voronoi cell *R<sub>l</sub>* of a neuron *l* is defined as the set of all points **x** that are closer to the neuron's position **x***<sub>l</sub>* than to the position of any other neuron, i.e.,

$$R\_l \equiv \{ \mathbf{x} \mid \forall m: ||\mathbf{x} - \mathbf{x}\_l|| \le ||\mathbf{x} - \mathbf{x}\_m|| \}\tag{38}$$

where ||·|| is the Euclidean norm. Intuitively, it is the subspace to which the neuron responds more strongly than any other neuron. The Voronoi volume is defined as the volume of that cell. As only relative values are required, any normalization of the volumes *V<sub>l</sub>* is arbitrary.
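Since only relative volumes matter, they can be estimated by simple Monte Carlo sampling: assign uniformly drawn sample points to their nearest neuron per Equation (38) and count. The neuron layout over the unit square and all numbers below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# neuron positions x_l scattered over the unit square (example setup)
neurons = rng.uniform(0.0, 1.0, size=(20, 2))

# Monte Carlo estimate of the relative Voronoi volumes V_l:
# each sample point belongs to the cell of its nearest neuron (Equation 38)
samples = rng.uniform(0.0, 1.0, size=(100_000, 2))
dists = np.linalg.norm(samples[:, None, :] - neurons[None, :, :], axis=2)
nearest = np.argmin(dists, axis=1)
V = np.bincount(nearest, minlength=len(neurons)) / len(samples)

# V now holds the fraction of the sample space each neuron responds to
```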

# A modular theory of multisensory integration for motor control

# *Michele Tagliabue\* and Joseph McIntyre*

*Centre d'Étude de la Sensorimotricité, (CNRS UMR 8194),Institut des Neurosciences et de la Cognition, Université Paris Descartes, Sorbonne Paris Cité, Paris, France*

### *Edited by:*

*Yuri P. Ivanenko, IRCCS Fondazione Santa Lucia, Italy*

### *Reviewed by:*

*Rava A. Da Silveira, Ecole Normale Supérieure, France Marion D. Luyat, University of Lille 3, France*

### *\*Correspondence:*

*Michele Tagliabue, Centre d'Étude de la Sensorimotricité, (CNRS UMR 8194), Institut des Neurosciences et de la Cognition, Université Paris Descartes, Sorbonne Paris Cité, 45 rue des Saints Pères, 75006, Paris, France e-mail: michele.tagliabue@ parisdescartes.fr*

To control targeted movements, such as reaching to grasp an object or hammering a nail, the brain can use diverse sources of sensory information, such as vision and proprioception. Although a variety of studies have shown that sensory signals are optimally combined according to principles of maximum likelihood, increasing evidence indicates that the CNS does not compute a single, optimal estimation of the target's position to be compared with a single optimal estimation of the hand. Rather, it employs a more modular approach in which the overall behavior is built by computing multiple concurrent comparisons carried out simultaneously in a number of different reference frames. The results of these individual comparisons are then optimally combined in order to drive the hand. In this article we examine at a computational level two formulations of concurrent models for sensory integration and compare them to the more conventional model of converging multi-sensory signals. Through a review of published studies, both our own and those performed by others, we produce evidence favoring the concurrent formulations. We then examine in detail the effects of additive signal noise as information flows through the sensorimotor system. By taking into account the noise added by sensorimotor transformations, one can explain why the CNS may shift its reliance on one sensory modality toward a greater reliance on another and investigate under what conditions those sensory transformations occur. Careful consideration of how transformed signals will co-vary with the original source also provides insight into how the CNS chooses one sensory modality over another. These concepts can be used to explain why the CNS might, for instance, create a visual representation of a task that is otherwise limited to the kinesthetic domain (e.g., pointing with one hand to a finger on the other) and why the CNS might choose to recode sensory information in an external reference frame.

**Keywords: sensory integration, motor control, maximum likelihood, reference frames**

# **1. INTRODUCTION**

Reaching to grasp an object requires that the CNS compare the position and orientation of the object with the position and orientation of the hand in order to generate a motor command that will bring the hand to the object. Depending on the situation, the CNS might use more than one sensory modality, such as vision and proprioception, to sense the position and orientation of the target and of the hand, with each source of information encoded in its own intrinsic reference frame. This raises the question as to how the CNS combines these different sources of information to generate the appropriate motor commands.

One school of thought contends that processes of sensor fusion for perception can be explained by the tenets of optimal estimation and control. According to the principles of maximum likelihood estimation, sensory signals that contain redundant information should be combined based on the expected variability of each so as to maximize the probability of producing a value close to the true value of what is being measured. This concept has been used with success in recent years to explain how humans combine different sources of sensory information to generate robust estimates of the position, size and orientation of external objects (Landy et al., 1995; Ernst and Banks, 2002; Kersten et al., 2004; Kording et al., 2007). Of greater interest for us, however, is the task of reaching an object with the hand, which adds additional aspects to the process beyond that of simple perception. The position and orientation of the object and of the hand must be effectively subtracted at some level, be it to compute a movement vector during task planning or to apply corrective actions based on real-time feedback during the course of the movement. This aspect of the task immediately brings to mind two additional issues that must be resolved: (1) To compare the position and orientation of two entities, sensory information about each must be expressed in a common coordinate frame. What reference frame(s) are used to perform the requisite computations? (2) The fusion of redundant sensory information might occur at various stages in the perception-action cycle. Where and how are the principles of maximum likelihood applied? In this article we will contrast two possible models of sensor fusion, which we will call *convergent* and *concurrent*, as illustrated in **Figure 1** for the task of hitting a nail with a hammer.

The convergent model shown in **Figure 1A** reflects the conventional idea that the CNS constructs a single representation of the target based on all available sensory information. In the example of hammering a nail, this includes the position of the nail-head in the visual field and the position of the fingertips holding the nail as sensed by kinesthesia. Weighting can be used to privilege either the visual or the kinesthetic information in the estimate of the target position; ditto for the estimation of the hammer's position and orientation, for which both visual and kinesthetic information are available. The combined representations are then compared in some reference frame that could be the reference frame intrinsic to one of the sensory modalities, or it could be some other, more generalized coordinate system. For instance, kinesthetic information could be transformed into retinal coordinates, or both visual and kinesthetic information could be transformed into a common reference frame centered on the head or on the trunk or referenced to external objects (McIntyre et al., 1997; Guerraz et al., 1998; Henriques et al., 1998; McIntyre et al., 1998; Carrozzo et al., 1999; Pouget et al., 2002a; Avillac et al., 2005; Obhi and Goodale, 2005; Byrne et al., 2010). Under this scheme, the CNS would combine all available sensory information about the target into a single, optimal representation of its position and orientation. Similarly, sensory information would be combined to form an optimal representation of the hand's position and orientation in the same general reference frame. The comparison of target and hand would then be carried out within this general reference frame and the difference between the two positions would be used to drive the motor response.

**Figure 1B** shows the alternative hypothesis by which the CNS performs a distributed set of concurrent comparisons within each reference frame first, and then combines the results to form a unique movement vector (Tagliabue and McIntyre, 2008, 2011, 2012, 2013; McGuire and Sabes, 2009, 2011; Tagliabue et al., 2013). In the example of hammering the nail, visual information about the nail-head is compared to visual information about the hammer while at the same time kinesthetic information about the hand holding the nail is compared with kinesthetic information about the hand swinging the hammer. Each comparison is carried out separately and thus may be carried out within the coordinate system intrinsic to the corresponding sensory modality. Under this formulation, a movement is programmed based on an optimal combination of the different movement vectors within each of the various reference frames. In this way the CNS accomplishes multimodal sensorimotor coordination in a modular fashion by performing a number of simpler target-hand comparisons in parallel.

The purpose of this article is to examine in greater detail these two hypotheses of convergent versus concurrent comparisons of target and hand for reaching movements, both at a theoretical level and through a targeted review of the pertinent literature. In section 2 we differentiate further the two models at the conceptual level by showing mathematically how the application of optimal estimation differs between them. Using these equations, we go on to present the experimental evidence supporting the hypothesis that the CNS functions according to the concurrent model. In section 3 we examine the conditions in which the CNS will transform information from the intrinsic reference frame of one sensor to the reference frame of another. Key to this discussion is an assessment of how coordinate transformations and memory processes affect the variability of the outcome, and we explicitly take into account how co-variation of transformed signals affects the choice of weighting. Section 4 examines the time course of the underlying sensorimotor processes, providing insight into when sensorimotor transformations are actually performed and, as a corollary, indicating that not only does the CNS perform multiple comparisons in parallel, it maintains parallel memory traces in multiple reference frames as well. In section 5 we generalize the concepts of convergent and concurrent processes to more than two sensory modalities, and in section 6 we use these formulations to consider trade-offs between using sensory information encoded in reference frames intrinsic to the sensors themselves or with respect to extrinsic reference frames such as the visual surround or with respect to gravity. In the final section we describe some specific predictions made by different concurrent and convergent formulations and discuss how the models might be differentiated experimentally.

# **2. MULTIPLE, CONCURRENT vs. MULTIMODAL, CONVERGENT**

The two models depicted in **Figure 1** can be described mathematically, in the linear case, as a set of weighted sums and differences. We use here linear formulations because they simplify the equations and are sufficient to make predictions about how the two models might differ computationally and experimentally. The main feature of the convergent model in **Figure 1A** is that a single representation of the target is compared to a single representation of the hand in the common reference frame, and a movement is performed that reduces the difference of these two estimates, Δ*x*, to zero. The equation describing this formulation is:

$$
\Delta \mathbf{x} = \left( w\_{\mathrm{T,V}} \mathbf{x}\_{\mathrm{T,V}} + w\_{\mathrm{T,K}} \mathbf{x}\_{\mathrm{T,K}} \right) - \left( w\_{\mathrm{H,V}} \mathbf{x}\_{\mathrm{H,V}} + w\_{\mathrm{H,K}} \mathbf{x}\_{\mathrm{H,K}} \right) \tag{1}
$$

where *x*<sub>T,V</sub> and *x*<sub>T,K</sub> represent the position of the target detected by vision and kinesthesia, respectively, *x*<sub>H,V</sub> and *x*<sub>H,K</sub> represent the detected position of the hammer in each of those reference frames, and *w*<sub>T,V</sub>, *w*<sub>T,K</sub>, *w*<sub>H,V</sub> and *w*<sub>H,K</sub> are the weights given to each of these pieces of information. In the concurrent model of **Figure 1B**, target and hand are compared in the reference frame of each sensory modality first, and then the final movement vector Δ*x* is computed as a weighted sum of the individual differences. This process can be described by the equation:

$$
\Delta \mathbf{x} = \lambda\_{\mathrm{V}} \left( \mathbf{x}\_{\mathrm{T,V}} - \mathbf{x}\_{\mathrm{H,V}} \right) + \lambda\_{\mathrm{K}} \left( \mathbf{x}\_{\mathrm{T,K}} - \mathbf{x}\_{\mathrm{H,K}} \right) \tag{2}
$$

where λ<sub>V</sub> and λ<sub>K</sub> represent the weight given to the comparisons carried out in each of the two sensory modalities. Common to both Equations (1 and 2) is the idea that redundant information from the various sensory modalities can be weighted differently through the factors *w* and λ. In fact, Equation (2) is a special case of Equation (1), with the added constraint that within each sensory modality, signals about the target and the hand must have the same weight:

$$
w\_{\mathrm{T},i} = w\_{\mathrm{H},i} = \lambda\_i \tag{3}
$$

In the linear formulation used here, therefore, the computational difference between the two models is not so much in terms of the order in which sensory information is added or subtracted, but rather in terms of how the weighting factors *w* and λ are chosen.

The principles of maximum likelihood estimation (MLE) can be applied to both Equations (1 and 2) to find weighting factors that are in some sense optimal, although they differ in terms of what is optimized. The optimal estimation of a parameter *p* given noisy measurements (*m*<sub>1</sub>, ..., *m<sub>n</sub>*) corresponds to the value that maximizes the probability distribution *P*(*m*<sub>1</sub>, ..., *m<sub>n</sub>*|*p*), which for independent measurements is equal to *P*(*m*<sub>1</sub>, ..., *m<sub>n</sub>*|*p*) = ∏<sup>*n*</sup><sub>*i* = 1</sub> *P*(*m<sub>i</sub>*|*p*). If each measurement is considered to be governed by Gaussian noise, the optimal estimate is analytically derived to be the weighted average such that the relative weight given to any one of the component quantities is equal to the inverse of its variance relative to all the other quantities:

$$\boldsymbol{w}\_{m\_i} = \frac{\sigma\_{m\_i}^{-2}}{\sum\_{i=1}^{n} \sigma\_{m\_i}^{-2}} \tag{4}$$

where σ<sup>2</sup><sub>*m<sub>i</sub>*</sub> is the variance of measurement *m<sub>i</sub>*. Thus, noisy variables are given less weight compared to those that are more reliable (Ghahramani et al., 1997). If weighted in this manner, the linear combination of different sources of information results in a reduction of output variability (i.e., an increase in movement precision) compared to the use of any one source of information alone. For illustration purposes, therefore, we assume that the noise exhibited by each sensory signal is Gaussian so that we may apply the linear maximum likelihood solution (Equation 4) to find the optimal weights.
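Equation (4) in code form; the variances below are invented example values, not measured ones:

```python
import numpy as np

def mle_weights(variances):
    """Inverse-variance weights of Equation (4)."""
    inv = 1.0 / np.asarray(variances, dtype=float)
    return inv / inv.sum()

# fusing a precise and a noisy measurement of the same quantity,
# e.g. hypothetical visual and kinesthetic variances:
w = mle_weights([0.01, 0.04])
print(w)  # the source with 4x lower variance gets 4x the weight

# the fused estimate is more precise than either source alone:
fused_var = 1.0 / (1.0 / 0.01 + 1.0 / 0.04)
```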

For the convergent model in **Figure 1A**, applying MLE in order to compute the weighting factors (*w*'s) in Equation (1) means that an optimal estimate of the position of the hand, derived from all available sensory feedback about the hand, will be compared to (subtracted from) an optimal estimate of the target's position, similarly derived from all available sources of sensory information about the target. Applying Equation (4) to the convergent model, the sets of weights for *i* = K and *i* = V are:

$$\omega\_{\text{T},i} = \frac{\sigma\_{\text{T},i}^{-2}}{\sigma\_{\text{T,V}}^{-2} + \sigma\_{\text{T,K}}^{-2}} \quad \text{and} \quad \omega\_{\text{H},i} = \frac{\sigma\_{\text{H},i}^{-2}}{\sigma\_{\text{H,V}}^{-2} + \sigma\_{\text{H,K}}^{-2}} \tag{5}$$

The computation of weighting factors (λ's) for the parallel structure in **Figure 1B** is somewhat different. Here, target and hand are compared in both sensory modalities in parallel ($\Delta_i = x_{\text{T},i} - x_{\text{H},i}$) and maximum likelihood then determines how much weight should be given to each of these comparisons, based on the expected variance of each of the computed differences. Given that the variance of a difference is simply the sum of the variances of its minuend and of its subtrahend ($\sigma_{\Delta_i}^2 = \sigma_{\text{T},i}^2 + \sigma_{\text{H},i}^2$) and applying Equation (4), the weight given to each difference is computed as:

$$
\lambda\_i = \frac{\sigma\_{\Delta\_i}^{-2}}{\sigma\_{\Delta\_{\rm V}}^{-2} + \sigma\_{\Delta\_{\rm K}}^{-2}} \tag{6}
$$

Conceptually, therefore, the convergent and concurrent models differ primarily in terms of what is optimized. For the convergent model, an optimal estimate of the target and an optimal estimate of the hand are computed and then used to compute a movement vector. Under the concurrent model, multiple movement vectors are computed and then these vectors are combined in an optimal fashion. Thus, even though Equations (1 and 2) are algebraically very similar, the choice of what to optimize when determining the various weights leads to different results for the two models. Note that the neural system may not operate in a strictly linear fashion, in which case differentiating between the two model structures would be even more important in terms of model predictions. But even the linear analysis presented here allows one to draw a distinction between the convergent and concurrent models, both conceptually, as we have described here, and experimentally, as we will show in the following paragraphs.

### **2.1. DISTINGUISHING BETWEEN MODELS**

When both target and hand can be localized via all the same sensory modalities, the convergent and concurrent formulations differ very little in terms of the predicted outcomes. In the example of hitting a nail with a hammer, this corresponds to the situation in which one can simultaneously see and feel with the hand both the hammer and the nail. In these circumstances, both models predict that more weight will be given to the most reliable (e.g., the least noisy) sensory channels. However, when only a subset of sensory information is available (e.g., only vision of the target or only kinesthesia about the hand), the two formulations predict two substantially different outcomes.

Consider the situation of a nail that is already embedded in the wall, such that it need not be held by the non-dominant hand (**Figure 2A**). Information about the target would therefore be limited to the visual domain. Compare this to hammering a nail that is held by the non-dominant hand, but whose head is obscured from view (**Figure 2B**). Hammering a nail one cannot see is perhaps not a very wise thing to do in real life, but it illustrates the point. To generalize, we will refer to these two types of tasks by the notation V-VK (visual target, visual and kinesthetic hand feedback) and K-VK, respectively, and to the original case of hammering a hand-held nail with full vision of both target and hands as a VK-VK task. In the case of the convergent model (**Figures 2C,D**), the lack of one source of information about the target simply means that an optimal combination of the remaining sensory cues will be used to localize the target. Thus, in V-VK, a representation of the target based on visual cues, transformed into the common reference frame, will be compared with a representation of the hand in that same reference frame derived from both visual and kinesthetic feedback. Similarly, in K-VK a representation of the target derived from kinesthetic information will be compared

**FIGURE 2 |** Two situations in which the target position can be sensed through **(A)** visual (*x*T,V) or **(B)** kinesthetic (*x*T,K) information only, whilst information from both sensory modalities (*x*H,V and *x*H,K) can be used to estimate the effector/hand position. Panels **(C,D)** represent how available sensory signals would be used following the Convergent Model in each of the two situations, respectively. Panels **(E,F)** illustrate the computational structure of the Concurrent Model for the same two situations. Green arrows represent the cross-modal sensory transformations that might be performed. Grayed out symbols indicate sensory inputs that are absent, as compared to the situation shown in **Figure 1**. All other notations and color conventions are the same as in **Figure 1**.

with a representation of the hand that is based on an optimal combination of visual and kinesthetic cues.

Applying the concurrent scheme to the situations shown in **Figures 2A,B**, however, raises the question: What is to be done with kinesthetic information about the hand when the target is presented only visually (V-VK) and what is done with visual information about the hand when the target is localized only kinesthetically (K-VK)? One possibility (not shown) is that the CNS simply ignores information about the hand in any sensory modality that is not also used to localize the target, relying only on sensory information that is directly comparable. Thus, only visual information about the hand would be used in the V-VK situation and only kinesthetic information about the hand would be used in the K-VK situation. But by doing so, one would forfeit the added precision that could be obtained by using both sources of sensory information about the hand holding the hammer. Alternatively, as illustrated in **Figures 2E,F**, the CNS could *reconstruct* the missing sensory information about the target by performing a cross-modal sensory transformation (green arrows). According to this arrangement, a kinesthetic representation of the target will be derived from visual information in V-VK, allowing both the visual and the kinesthetic information from the hand to be utilized. Analogously, the target can be reconstructed in visual space in K-VK, again allowing the comparison of target and hand to be carried out in both the visual and the kinesthetic domains.

The difference between the convergent and concurrent formulations becomes apparent if one compares the model predictions for V-VK versus K-VK in terms of the relative weighting given to visual or kinesthetic modalities. Consider first the concurrent models in **Figures 2E,F**. When computing the optimal weights λ<sup>V</sup> and λ<sup>K</sup> one must take into account not only the noise intrinsic to the sensory inputs, but also the noise added by cross-modal transformations (Soechting and Flanders, 1989; Tillery et al., 1991; Schlicht and Schrater, 2007) when a sensory input missing in one modality must be reconstructed from sensory signals in the other. Taking into account this additional noise when applying Equation (6), one obtains for K-VK:

$$\lambda\_{\rm V} = \frac{\left(\sigma\_{\rm T,K}^2 + \sigma\_{\rm H,K}^2\right)}{\left(\sigma\_{\rm T,K}^2 + \sigma\_{\rm H,K}^2\right) + \left(\sigma\_{\rm T,K}^2 + \sigma\_{\rm T,K \to V}^2 + \sigma\_{\rm H,V}^2\right)}\tag{7}$$

$$\lambda\_{\rm K} = \frac{\left(\sigma\_{\rm T,K}^2 + \sigma\_{\rm T,K \to V}^2 + \sigma\_{\rm H,V}^2\right)}{\left(\sigma\_{\rm T,K}^2 + \sigma\_{\rm H,K}^2\right) + \left(\sigma\_{\rm T,K}^2 + \sigma\_{\rm T,K \to V}^2 + \sigma\_{\rm H,V}^2\right)}$$

and for V-VK:

$$\begin{aligned} \lambda\_{\rm V} &= \frac{\left(\sigma\_{\rm T,V}^2 + \sigma\_{\rm T,V \mapsto K}^2 + \sigma\_{\rm H,K}^2\right)}{\left(\sigma\_{\rm T,V}^2 + \sigma\_{\rm T,V \mapsto K}^2 + \sigma\_{\rm H,K}^2\right) + \left(\sigma\_{\rm T,V}^2 + \sigma\_{\rm H,V}^2\right)} \\ \lambda\_{\rm K} &= \frac{\left(\sigma\_{\rm T,V}^2 + \sigma\_{\rm H,V}^2\right)}{\left(\sigma\_{\rm T,V}^2 + \sigma\_{\rm T,V \mapsto K}^2 + \sigma\_{\rm H,K}^2\right) + \left(\sigma\_{\rm T,V}^2 + \sigma\_{\rm H,V}^2\right)} \end{aligned} \tag{8}$$

where $\sigma_{\text{T,K}\mapsto\text{V}}^2$ and $\sigma_{\text{T,V}\mapsto\text{K}}^2$ represent the noise added when reconstructing a visual representation of the target from kinesthetic information and the noise added when reconstructing the target in kinesthetic space from visual information, respectively. One can see from these sets of equations that changing what sensory information is available about the target has the potential of changing the weight given to each type of sensory feedback used to guide the hand. Indeed, less weight (smaller λ's) will be given to the component comparisons that require the reconstruction of sensory information, due to the noise that these reconstructions add to the signals. In most cases, however, the weighting of the two component comparisons will shift toward the visual information when the target is visual (V-VK) and will shift toward the kinesthetic domain when the target is kinesthetic (K-VK). In the limit, if the transformation noise is very high compared to the input noise, the comparison that requires a sensory reconstruction will be given zero weight, leaving only the direct comparison to drive the response.
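Under hypothetical, equal noise in all sensory channels, Equations (7) and (8) can be evaluated numerically to show the predicted shift in weighting toward the target's modality. The function and variable names below are ours, for illustration only:

```python
def concurrent_weights(var_direct, var_reconstructed):
    """Equation (6) applied to two comparisons: each comparison is weighted
    in proportion to the inverse of its expected variance."""
    lam_direct = var_reconstructed / (var_direct + var_reconstructed)
    return lam_direct, 1.0 - lam_direct

# Hypothetical noise values: unit sensory variances, transformation noise = 1
s_t = s_h = s_trans = 1.0

# K-VK (Eq. 7): the kinesthetic comparison is direct; the visual one
# requires the reconstruction T,K -> V with added noise s_trans
lam_k_kvk, lam_v_kvk = concurrent_weights(s_t + s_h, s_t + s_trans + s_h)

# V-VK (Eq. 8): the visual comparison is direct; the kinesthetic one
# requires the reconstruction T,V -> K with added noise s_trans
lam_v_vvk, lam_k_vvk = concurrent_weights(s_t + s_h, s_t + s_trans + s_h)

# Weighting shifts toward the modality of the target in each case:
# lam_k_kvk = lam_v_vvk = 0.6 > 0.5
```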

For the convergent model, there is no inherent need to reconstruct sensory information that is not available. The CNS would simply use all the available sensory information about the target and all available sensory information about the hand in order to compute an optimal estimate of the position of each. This does not mean, however, that no sensorimotor transformations are required to implement the convergent formulation. On the contrary, in order to combine spatial information from different sources, the different pieces of information must be expressed in a common reference frame R. Thus, for the convergent model, coordinate transformations will be required even though no "reconstruction" of missing sensory information is needed. These transformations will also add noise, which will affect the weighting between the different inputs and should therefore be explicitly considered when comparing the concurrent and convergent models. According to Equations (1 and 5), the estimate of the hand's position and orientation will be based on a weighted sum of the visual and kinesthetic feedback, with the weights determined by the variance of the two feedback signals and by the noise added by the two sensorimotor transformations:

$$\begin{aligned} \boldsymbol{w\_{\rm H,V}} &= \frac{\sigma\_{\rm H,K}^2 + \sigma\_{\rm H,K \mapsto R}^2}{\sigma\_{\rm H,K}^2 + \sigma\_{\rm H,K \mapsto R}^2 + \sigma\_{\rm H,V}^2 + \sigma\_{\rm H,V \mapsto R}^2} \\ \boldsymbol{w\_{\rm H,K}} &= \frac{\sigma\_{\rm H,V}^2 + \sigma\_{\rm H,V \mapsto R}^2}{\sigma\_{\rm H,K}^2 + \sigma\_{\rm H,K \mapsto R}^2 + \sigma\_{\rm H,V}^2 + \sigma\_{\rm H,V \mapsto R}^2} \end{aligned} \tag{9}$$

One can see that even if one considers noise added by sensorimotor transformations, the convergent model, unlike the concurrent model, predicts that the weighting of sensory information will not change between V-VK and K-VK. Because the information available about the hand is the same in both V-VK and K-VK, the relative weight given to visual versus kinesthetic feedback about the hand will be the same in both circumstances, regardless of the sensory modality used to sense the target.
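This invariance follows directly from Equation (9): its terms involve only hand-related noise, so the weights cannot change with target modality. A minimal sketch (names are ours, noise values hypothetical):

```python
def convergent_hand_weights(var_hv, var_hv_to_r, var_hk, var_hk_to_r):
    """Equation (9): relative weight of visual vs. kinesthetic hand feedback,
    including the noise of each transformation into the common frame R."""
    num_v = var_hk + var_hk_to_r   # numerator of w_H,V
    num_k = var_hv + var_hv_to_r   # numerator of w_H,K
    total = num_v + num_k
    return num_v / total, num_k / total

# The arguments involve only the hand, so for fixed hand feedback the weights
# are identical in V-VK and K-VK: no shift with target modality is predicted.
w_hv, w_hk = convergent_hand_weights(1.0, 0.5, 2.0, 0.5)
```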

The convergent and concurrent models make two different predictions, therefore, about what happens when the modality of the target is changed while full feedback of the hand is available. These predictions allow one to differentiate between the two hypotheses experimentally. Indeed, a number of studies that have compared moving the hand to visual versus proprioceptive targets provide support for the hypothesis of concurrent comparisons shown in **Figure 1B**. For instance:


Because their data could not be reconciled with the encoding of movement parameters exclusively in either retinotopic space or kinesthetic space, the authors of the last two studies each proposed versions of the concurrent structure depicted in **Figure 1B**. The specifics of the models proposed by these different authors differ slightly from each other (more on the similarities and differences below) but both involve multiple comparisons in multiple reference frames and both can explain a shift in weighting toward visual information when the target was visible and toward kinesthetic information when the target was kinesthetic. Thus, compared to the hypothesis of convergent, multi-modal sensory integration shown in **Figure 1A**, the computational structure of multiple, concurrent comparisons depicted in **Figure 1B** provides a much more parsimonious explanation of the data reported from a number of different tasks and experimental paradigms.

# **3. TO RECONSTRUCT OR NOT TO RECONSTRUCT?**

Inherent to the concurrent model is the concept of sensory reconstruction. According to this idea, a visible target could be compared with proprioceptive information about the location of the hand if the visible information is transformed into proprioceptive space. Some such reconstruction would be necessary when, for instance, reaching toward a visual target with the unseen hand (V-K). The question remains, however, as to whether the visual target should be transformed into kinesthetic space or whether a visual representation of the hand should be constructed based on proprioceptive information from the arm. Transforming target information into kinesthetic space would be optional in a V-VK situation, where a direct comparison of target and hand could be carried out in visual coordinates. It would be even more superfluous to transform a purely kinesthetic (K-K) task into visual space. Yet visual representations are known to be implicated in purely kinesthetic tasks (Pouget et al., 2002b; Sober and Sabes, 2005; Sarlegna and Sainburg, 2007; McGuire and Sabes, 2009; Jones and Henriques, 2010). A key question to be addressed, therefore, is how the CNS chooses which comparisons to apply to a given task, and how to weight the different computations to arrive at the overall response. Under what conditions should information from one sensory modality be transformed into the reference frame of another?

In our original publication (Tagliabue and McIntyre, 2011) we argued that the CNS avoids sensory transformations, and thus performs direct comparisons whenever possible. Indeed, we observed that a V-VK task was carried out in visual coordinates while the equivalent K-VK task was carried out in kinesthetic space. (Note that we observed this result when subjects held their head upright. We saw a somewhat different result when subjects were asked to move their head during an imposed memory delay. We will discuss these latter results further down in this section). In our V-K and our K-V tasks, however, we observed that both visual and kinesthetic comparisons were performed, even though just one of these (and just one transformation) would have been sufficient. For instance, in V-K, subjects could have performed a single transformation of visual information into kinesthetic space, or they could have only transformed the kinesthetic hand information so as to perform the task in visual space. The fact that both transformations and both comparisons were performed shows that the CNS does sometimes perform "unnecessary" transformations beyond what would be minimally necessary to achieve the task.

In order to explain our results, and others, we had to resort to additional, albeit reasonable, assumptions that went beyond the basic tenets of MLE. The first was that direct comparisons are absolutely best, even though estimates of noise in the visual and kinesthetic channels and the conventional application of maximum likelihood would predict a more graded weighting between visual and kinesthetic information for the V-V and K-K tasks. The second was that the necessity of a single transformation would provoke the execution of a whole range of transformations into a number of different reference frames or sensory modalities. This could explain why the CNS would reconstruct a visual representation of a task that is otherwise purely kinesthetic, as was observed in the studies mentioned above. In the discussion of our results, we argued that this could be because a common neural network might generate the same amount of noise, whether performing one or many transformations. While this is a reasonable, and even testable, hypothesis, it still remains unproven and thus still constitutes, as of this writing, an *ad hoc* assumption that we had to invoke in order to reconcile empirical data with MLE.

In a more recent study, however, we showed how MLE *can* explain much, if not all, of the available data without these additional assumptions, if one properly accounts for co-variation of noise in sensory signals that have been reconstructed in one sensory modality from another (Tagliabue and McIntyre, 2013). The issue of co-variation is important because it conditions how two signals should be optimally weighted. If two signals are stochastically independent, the principle of maximum likelihood estimation says that the two quantities should be weighted according to the inverse of their respective expected variance. This weighted average will tend to reduce the effects of the independent noise in each component. But if the noise in one is correlated with the noise in the other, computing the weighted average will be less effective in reducing the overall noise. In the limit, if the noise in the two variables is perfectly correlated, then computing the weighted average will not reduce the overall noise at all.

To correctly compensate for covariance between two signals in the computation of the optimal weights to be applied, one must essentially take into account only the independent components of noise within each variable. In the case of two non-independent variables that exhibit Gaussian noise, the weighted combination of *x* and *y* that will minimize the variance of the output:

$$z = \lambda x + (1 - \lambda)\, y \tag{10}$$

is given by the equation:

$$\lambda = \frac{\left(\sigma\_{x}^{2} - \text{cov}\_{x,y}\right)^{-1}}{\left(\sigma\_{x}^{2} - \text{cov}\_{x,y}\right)^{-1} + \left(\sigma\_{y}^{2} - \text{cov}\_{x,y}\right)^{-1}} \tag{11}$$

where cov<sub>*x*,*y*</sub> is the covariance between *x* and *y*. Added insight can be achieved if one considers two components *x* and *y* that are derived from two stochastically independent signals, *p* and *q*, and a common component *c*:

$$\begin{aligned} x &= p + c \\ y &= q + c \end{aligned} \tag{12}$$

In this case, which is directly applicable to the sensorimotor transformations that are being considered in this paper, the covariance between *x* and *y* is precisely equal to the variance of the common component *c*:

$$\begin{aligned} \sigma\_x^2 &= \sigma\_p^2 + \sigma\_c^2\\ \sigma\_y^2 &= \sigma\_q^2 + \sigma\_c^2\\ \text{cov}\_{x,y} &= \sigma\_c^2 \end{aligned} \tag{13}$$

and Equation (11) reduces to:

$$
\lambda = \frac{\sigma\_q^2}{\sigma\_p^2 + \sigma\_q^2} \tag{14}
$$

In other words, the optimal weighting of *x* and *y* depends only on the variance of the independent components *p* and *q*.
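The reduction from Equation (11) to Equation (14) can be checked numerically. The values below are arbitrary illustrative variances, and `optimal_lambda` is our name for the computation:

```python
def optimal_lambda(var_x, var_y, cov_xy):
    """Equation (11): variance-minimizing weight on x in
    z = lambda*x + (1-lambda)*y when x and y share correlated noise."""
    a = 1.0 / (var_x - cov_xy)
    b = 1.0 / (var_y - cov_xy)
    return a / (a + b)

# Decomposition of Equations (12)-(13): x = p + c, y = q + c,
# so var_x = var_p + var_c, var_y = var_q + var_c, cov_xy = var_c
var_p, var_q, var_c = 2.0, 1.0, 5.0
lam = optimal_lambda(var_p + var_c, var_q + var_c, var_c)

# Equation (14): the common component c drops out entirely,
# leaving lam = var_q / (var_p + var_q) = 1/3
```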

One can see from Equation (14) that if one of the two constituent signals presents only noise that is common to both quantities *x* and *y*, e.g.:

$$\begin{aligned} x &= p + c \\ y &= c \end{aligned} \tag{15}$$

then the weight given to the constituent with the added noise (*x* in the example) will be zero. This fact can be used to predict when the CNS might reconstruct a representation of the task in a reference frame different from that of either the target localization or the feedback about the motor response. If the task allows for a direct comparison of target and effector information, e.g., when moving the hand to a remembered posture, the reconstructed comparison will contain all the variability of the kinesthetic inputs plus the noise added by the coordinate transformations, while the direct comparison will contain no noise that is not also included in the reconstructed comparison:

$$\begin{aligned} \sigma\_{\Delta \mathbf{V}}^2 &= \sigma\_{\mathbf{T}, \mathbf{K}}^2 + \sigma\_{\mathbf{H}, \mathbf{K}}^2 + \sigma\_{\mathbf{T}, \mathbf{K} \mapsto \mathbf{V}}^2 + \sigma\_{\mathbf{H}, \mathbf{K} \mapsto \mathbf{V}}^2 \\ \sigma\_{\Delta \mathbf{K}}^2 &= \sigma\_{\mathbf{T}, \mathbf{K}}^2 + \sigma\_{\mathbf{H}, \mathbf{K}}^2 \end{aligned} \tag{16}$$

Applying Equation (14) means that the comparison of the reconstructed signals, $\Delta_\text{V}$, will be given no weight compared to the direct comparison $\Delta_\text{K}$. In other words, there is no advantage to transforming the task into an alternate reference frame (e.g., into visual space) in this situation. On the other hand, if the target and hand are sensed in two different reference frames, such that at least one sensory transformation is required, then reconstruction into a third reference frame might be beneficial. For example, if one is asked to reproduce with the right hand the remembered orientation of the left, a transformation will have to be applied to compare the hand orientation between the two limbs (see **Figure 3**), leading to the equations:

$$
\begin{aligned}
\sigma\_{\Delta \mathbf{V}}^2 &= \sigma\_{\mathbf{T},\mathbf{K\_L}}^2 + \sigma\_{\mathbf{H},\mathbf{K\_R}}^2 + \sigma\_{\mathbf{T},\mathbf{K\_L} \mapsto \mathbf{V}}^2 + \sigma\_{\mathbf{H},\mathbf{K\_R} \mapsto \mathbf{V}}^2 \\
\sigma\_{\Delta \mathbf{K\_L}}^2 &= \sigma\_{\mathbf{T},\mathbf{K\_L}}^2 + \sigma\_{\mathbf{H},\mathbf{K\_R}}^2 + \sigma\_{\mathbf{H},\mathbf{K\_R} \mapsto \mathbf{K\_L}}^2 \\
\sigma\_{\Delta \mathbf{K\_R}}^2 &= \sigma\_{\mathbf{T},\mathbf{K\_L}}^2 + \sigma\_{\mathbf{H},\mathbf{K\_R}}^2 + \sigma\_{\mathbf{T},\mathbf{K\_L} \mapsto \mathbf{K\_R}}^2
\end{aligned} \tag{17}
$$

where K<sub>L</sub> and K<sub>R</sub> represent the kinesthetic information about the left and right hand, respectively. In this situation, each representation of the task, including the representation that involves no direct inputs ($\Delta_\text{V}$), includes at least one source of noise that is independent of each of the others. Thus, one might expect to find that the task is carried out simultaneously in the intrinsic reference frame of each arm, and also in visual space. Indeed, when we compared precisely these two situations (matching the

**FIGURE 3 | Direct vs. indirect comparisons (modified from Tagliabue and McIntyre, 2013).** The schematics represent the concurrent model applied to two tasks that are both purely kinesthetic (K-K). In the INTRA-manual task the subject feels the target position with the right hand (T,KR) and reproduces it with the same hand (H,KR). In the INTER-manual task the target is felt with the left hand (T,KL) and its position is reproduced with the right (H,KR). As in **Figures 1**, **2**, red and blue arrows represent visual and kinesthetic signals, respectively, circular nodes represent movement vectors computed in different reference frames and green arrows represent sensory transformations. Each task can potentially be carried out partially in visual space by reconstructing a visual representation of the target (T,V) and a visual representation of the hand (H,V) from available kinesthetic inputs. In the INTRA-, but not INTER-manual task, a direct comparison between the kinesthetic signals about target and response is possible. Taking into account co-variance between reconstructed signals, only in the INTER-condition would a reconstruction of an "unnecessary" visual representation reduce movement variability. Grayed-out symbols represent sensory inputs that are absent in each task while grayed-out green arrows depict sensory reconstructions that are given no weight when MLE is applied.

posture of the right hand to the remembered posture of the left versus matching the posture of the right hand to the remembered posture of the right hand) we observed exactly this behavior. The unilateral task showed no effect of deviations of the visual field, while the bilateral task did. This same reasoning can also be applied to a number of examples from the literature to explain why subjects appeared to reconstruct a visual representation of a task that could conceivably be carried out entirely in kinesthetic space (Pouget et al., 2002b; Sober and Sabes, 2005; Sarlegna and Sainburg, 2007; McGuire and Sabes, 2009; Jones and Henriques, 2010). Explicitly including the co-variation of reconstructed variables therefore increases the predictive value of the model structure depicted in **Figure 1B**.

### **4. THE TIMING OF SENSORY RECONSTRUCTIONS**

If one accepts the idea that the CNS transforms sensory information amongst multiple reference frames, one might also ask the question, when do such transformations occur? A number of studies have considered the performance of cross-modal transformations for the computation of a movement vector during planning (Sober and Sabes, 2003, 2005; Sarlegna and Sainburg, 2007; McGuire and Sabes, 2009; Burns and Blohm, 2010), but this is not the only time when such transformations may be needed. Sensory information about the target and limb continues to arrive throughout the movement, and the same issues about reference frames and sensor fusion arise when considering on-line corrections that are made based on this information. This question is of particular interest when one considers movements to memorized targets. In a V-K task, for instance, which is a task that requires at least one cross-modal sensory transformation, what happens if the target disappears before the reaching movement is started? How is the information about the target stored? Is it encoded in memory in visual space, to be transformed into kinesthetic space for comparison with proprioceptive information from the arm? Or is it immediately transformed into kinesthetic space and stored during the memory delay for later use?

The results of one of our recent experiments (Tagliabue et al., 2013) can be used to address this question. In that study we analyzed the V-K tasks alluded to above and illustrated in **Figure 4**. We asked subjects to perform this task in two different conditions, which differed only in terms of the timing of head movements. In one condition (U-T) subjects memorized the target with the head upright and produced the motor response with the head tilted. In the other condition (T-U) they memorized the target with the head tilted and moved the hand with the head upright. The rationale for performing this experiment with the head tilted at different times is based on the notion that transformations between visual and kinesthetic space are disrupted (noisier) when the head is not aligned with gravity (Burns and Blohm, 2010; Tagliabue and McIntyre, 2011). This assumption is supported by a study of orientation matching between visual and haptic stimuli (McIntyre and Lipshits, 2008). Whereas tilting the subject's entire body had no effect on visual-visual and haptic-haptic comparisons, responses were more variable

**FIGURE 4 | Experimental manipulation of transformation noise (modified from Tagliabue et al., 2013).** Two different experimental conditions are illustrated in which the subjects were asked to memorize the orientation (θ) of a visual target (red bar) and to reproduce it, after a delay, with their unseen hand. **(A)** In one condition (U-T) subjects memorized the target with the head upright and responded with the head tilted. **(B)** In the other condition (T-U), the target was memorized with the head tilted and the hand oriented with the head upright. On the right side of the figure are depicted the predictions of the Concurrent Model for each of the two experimental conditions. As in **Figures 1**, **2**, red and blue arrows represent visual and kinesthetic signals, respectively, and green arrows represent cross-modal transformations. Gray symbols represent sensory inputs that are absent. Because having the head tilted (yellow areas) causes cross-modal transformations to be significantly noisier, comparisons requiring such transformations are given less weight (faded green arrows) and comparisons for which sensory reconstructions are performed with the head upright are privileged.

in the case of a visual-haptic comparison when the body was tilted versus when it was upright. The fact that the inter-modal comparison became more variable, but not the intra-modal ones, indicates that it is the transformation between sensory modalities, and not the actual sensory inputs, that is noisier when tilted with respect to gravity. In light of this fact, the relative weight given to visual information (λ<sub>V</sub>) in our more recent experiment and the overall variance ($\sigma_\Delta^2$) will depend on whether each transformation is performed with the head upright or with the head tilted.

One can therefore differentiate between the different hypotheses in **Figure 4** as follows. For a V-K task we have:

$$\begin{aligned} \sigma\_{\Delta \mathbf{V}}^2 &= \sigma\_{\mathbf{T},\mathbf{V}}^2 + \sigma\_{\mathbf{H},\mathbf{K}}^2 + \sigma\_{\mathbf{H},\mathbf{K} \mapsto \mathbf{V}}^2 \\ \sigma\_{\Delta \mathbf{K}}^2 &= \sigma\_{\mathbf{T},\mathbf{V}}^2 + \sigma\_{\mathbf{T},\mathbf{V} \mapsto \mathbf{K}}^2 + \sigma\_{\mathbf{H},\mathbf{K}}^2 \end{aligned} \tag{18}$$

Taking into account the co-variation between a transformed signal and its source, as described in section 3, one can compute the weight given to the visual comparison:

$$
\lambda\_{\rm V} = \frac{\sigma\_{\rm T,V \mapsto K}^2}{\sigma\_{\rm T,V \mapsto K}^2 + \sigma\_{\rm H,K \mapsto V}^2} \tag{19}
$$

and given the formula for the variance of a weighted sum of two variables that are not independent:

$$
\sigma\_{ax+by}^2 = a^2 \sigma\_x^2 + b^2 \sigma\_y^2 + 2 \, ab \, \text{cov}\_{x,y} \tag{20}
$$

the overall variance of the optimal estimate will be:

$$
\sigma\_{\Delta}^{2} = \lambda\_{\text{V}}^{2} \sigma\_{\Delta \text{V}}^{2} + (1 - \lambda\_{\text{V}})^{2} \sigma\_{\Delta \text{K}}^{2} + 2\lambda\_{\text{V}}(1 - \lambda\_{\text{V}}) \text{cov}\_{\Delta \text{V}, \Delta \text{K}} \tag{21}
$$

$$=\sigma\_{\rm T,V}^{2} + \sigma\_{\rm H,K}^{2} + \lambda\_{\rm V}^{2}\sigma\_{\rm H,K \mapsto V}^{2} + (1-\lambda\_{\rm V})^{2}\sigma\_{\rm T,V \mapsto K}^{2} \tag{22}$$

$$=\sigma\_{\rm T,V}^{2}+\sigma\_{\rm H,K}^{2}+\frac{\sigma\_{\rm T,V\mapsto K}^{2}\sigma\_{\rm H,K\mapsto V}^{2}}{\sigma\_{\rm T,V\mapsto K}^{2}+\sigma\_{\rm H,K\mapsto V}^{2}}\tag{23}$$
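One can verify numerically that the weighted sum of Equation (21), with the covariance set to the sensory noise shared by the two comparisons, collapses to the closed form of Equation (23). The noise values below are arbitrary, and the function name is ours:

```python
def v_k_variance(s_tv, s_hk, s_tvk, s_hkv):
    """Equations (18)-(23) for a V-K task. s_tvk and s_hkv are the variances
    added by the T,V->K and H,K->V transformations; returns the Eq. (21)
    weighted-sum value and the Eq. (23) closed-form value."""
    lam_v = s_tvk / (s_tvk + s_hkv)          # Eq. (19)
    var_dv = s_tv + s_hk + s_hkv             # Eq. (18), visual comparison
    var_dk = s_tv + s_tvk + s_hk             # Eq. (18), kinesthetic comparison
    cov = s_tv + s_hk                        # noise common to both comparisons
    eq21 = (lam_v ** 2 * var_dv + (1.0 - lam_v) ** 2 * var_dk
            + 2.0 * lam_v * (1.0 - lam_v) * cov)      # Eqs. (20)-(21)
    eq23 = s_tv + s_hk + s_tvk * s_hkv / (s_tvk + s_hkv)
    return eq21, eq23

eq21, eq23 = v_k_variance(1.0, 1.0, 2.0, 3.0)  # both equal 3.2
```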

Now assume that the noise added when transforming from visual to kinesthetic or from kinesthetic to visual is the same, for a given orientation of the head, and that head tilt has the same additive effect on all transformations, i.e., we define:

$$
\sigma\_{\text{T,V}\mapsto\text{K}}^2 = \sigma\_{\text{H,K}\mapsto\text{V}}^2 = \sigma\_{\mapsto}^2 \tag{24}
$$

when the transformation is performed with the head upright, and:

$$
\sigma^2\_{\text{T,V}\mapsto\text{K}} = \sigma^2\_{\text{H,K}\mapsto\text{V}} = \sigma^2\_{\mapsto} + \sigma^2\_{//} \tag{25}
$$

when the transformation is performed with the head tilted to the side. Combining Equations (18–25), one can see that tilting the head will have no effect on λ<sub>V</sub> if both transformations are performed with the head upright or both are performed with the head tilted:

$$\begin{aligned} \lambda\_{\rm V}|\_{\rm up,up} &= \frac{\sigma\_{\mapsto}^2}{\sigma\_{\mapsto}^2 + \sigma\_{\mapsto}^2} = \frac{1}{2} \\ \lambda\_{\rm V}|\_{\rm tilt,tilt} &= \frac{\sigma\_{\mapsto}^2 + \sigma\_{//}^2}{\sigma\_{\mapsto}^2 + \sigma\_{//}^2 + \sigma\_{\mapsto}^2 + \sigma\_{//}^2} = \frac{1}{2} \\ \lambda\_{\rm V}|\_{\rm up,up} &= \lambda\_{\rm V}|\_{\rm tilt,tilt} \end{aligned} \tag{26}$$

Performing both transformations with the head upright or both with the head tilted will, however, have an effect on the overall variability:

$$\begin{aligned} \sigma\_{\Delta}^{2}|\_{\rm up,up} &= \sigma\_{\rm T,V}^{2} + \sigma\_{\rm H,K}^{2} + \frac{\sigma\_{\mapsto}^{2}\,\sigma\_{\mapsto}^{2}}{\sigma\_{\mapsto}^{2} + \sigma\_{\mapsto}^{2}} = \sigma\_{\rm T,V}^{2} + \sigma\_{\rm H,K}^{2} + \frac{\sigma\_{\mapsto}^{2}}{2} \\ \sigma\_{\Delta}^{2}|\_{\rm tilt,tilt} &= \sigma\_{\rm T,V}^{2} + \sigma\_{\rm H,K}^{2} + \frac{\left(\sigma\_{\mapsto}^{2} + \sigma\_{//}^{2}\right)\left(\sigma\_{\mapsto}^{2} + \sigma\_{//}^{2}\right)}{\sigma\_{\mapsto}^{2} + \sigma\_{//}^{2} + \sigma\_{\mapsto}^{2} + \sigma\_{//}^{2}} = \sigma\_{\rm T,V}^{2} + \sigma\_{\rm H,K}^{2} + \frac{\sigma\_{\mapsto}^{2} + \sigma\_{//}^{2}}{2} \\ \sigma\_{\Delta}^{2}|\_{\rm tilt,tilt} &= \sigma\_{\Delta}^{2}|\_{\rm up,up} + \frac{\sigma\_{//}^{2}}{2} \end{aligned} \tag{27}$$

On the other hand, if one of the transformations is performed with the head upright, and the other with the head tilted, the opposite pattern should be observed. The weight given to visual information will depend on whether the transformation T,V → K is performed with the head upright and the transformation H,K → V is performed with the head tilted (up,tilt), or vice versa (tilt,up):

$$\begin{aligned} \lambda\_{\rm V}|\_{\rm up,tilt} &= \frac{\sigma\_{\mapsto}^2}{\sigma\_{\mapsto}^2 + \sigma\_{\mapsto}^2 + \sigma\_{//}^2} \to 0 \,\text{ as }\, \sigma\_{//}^2 \to \infty \\ \lambda\_{\rm V}|\_{\rm tilt,up} &= \frac{\sigma\_{\mapsto}^2 + \sigma\_{//}^2}{\sigma\_{\mapsto}^2 + \sigma\_{//}^2 + \sigma\_{\mapsto}^2} \to 1 \,\text{ as }\, \sigma\_{//}^2 \to \infty \\ \lambda\_{\rm V}|\_{\rm up,tilt} &< \lambda\_{\rm V}|\_{\rm tilt,up} \end{aligned} \tag{28}$$

while one would expect to see similar levels of overall variability between the two conditions, because in both cases one transformation is performed with the head tilted and one with the head upright:

$$\begin{aligned} \sigma\_{\Delta}^{2}|\_{\rm up,tilt} &= \sigma\_{\rm T,V}^{2} + \sigma\_{\rm H,K}^{2} + \frac{\sigma\_{\mapsto}^{2} \left(\sigma\_{\mapsto}^{2} + \sigma\_{//}^{2}\right)}{\sigma\_{\mapsto}^{2} + \sigma\_{\mapsto}^{2} + \sigma\_{//}^{2}} \\ \sigma\_{\Delta}^{2}|\_{\rm tilt,up} &= \sigma\_{\rm T,V}^{2} + \sigma\_{\rm H,K}^{2} + \frac{\left(\sigma\_{\mapsto}^{2} + \sigma\_{//}^{2}\right)\sigma\_{\mapsto}^{2}}{\sigma\_{\mapsto}^{2} + \sigma\_{//}^{2} + \sigma\_{\mapsto}^{2}} \\ \sigma\_{\Delta}^{2}|\_{\rm up,tilt} &= \sigma\_{\Delta}^{2}|\_{\rm tilt,up} \end{aligned} \tag{29}$$

Note that the results remain valid even if $\sigma\_{\rm T,V \mapsto K}^2 \neq \sigma\_{\rm H,K \mapsto V}^2$, for plausible values of $\sigma\_{\rm T,V \mapsto K}^2$, $\sigma\_{\rm H,K \mapsto V}^2$, $\sigma\_{\rm T,V}^2$, $\sigma\_{\rm H,K}^2$, and $\sigma\_{//}^2$.
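
The predictions of Equations (26–29) for the four head-orientation conditions can be checked numerically. The sketch below uses arbitrary illustrative variances and our own variable names:

```python
# Numerical illustration of Equations (24-29): effect of head tilt on the
# visual weight lambda_V and on the overall variance sigma^2_Delta.
# s_map is the upright transformation noise sigma^2_{|->}; s_par is the
# extra noise sigma^2_{//} added when the head is tilted.
# All values are arbitrary illustrative choices.

s_map, s_par = 1.0, 4.0    # sigma^2_{|->}, sigma^2_{//}
s_TV, s_HK = 1.0, 1.0      # direct sensory variances

def predictions(tvk_tilted, hkv_tilted):
    """Return (lambda_V, sigma^2_Delta) given which transformation
    (T,V->K and H,K->V, respectively) is performed with the head tilted."""
    s_tvk = s_map + (s_par if tvk_tilted else 0.0)
    s_hkv = s_map + (s_par if hkv_tilted else 0.0)
    lam_V = s_tvk / (s_tvk + s_hkv)                       # Eq. (19)
    var = s_TV + s_HK + s_tvk * s_hkv / (s_tvk + s_hkv)   # Eq. (23)
    return lam_V, var

lam_uu, var_uu = predictions(False, False)   # up, up
lam_tt, var_tt = predictions(True, True)     # tilt, tilt
lam_ut, var_ut = predictions(False, True)    # up, tilt
lam_tu, var_tu = predictions(True, False)    # tilt, up

assert lam_uu == lam_tt == 0.5                       # Eq. (26)
assert abs(var_tt - (var_uu + s_par / 2)) < 1e-12    # Eq. (27)
assert lam_ut < lam_tu                               # Eq. (28)
assert abs(var_ut - var_tu) < 1e-12                  # Eq. (29)
```

The assertions reproduce the qualitative pattern used to discriminate the hypotheses of **Figure 5**: equal weights but unequal variances for up/up versus tilt/tilt, and unequal weights but equal variances for up/tilt versus tilt/up.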

Using these mathematical considerations and the results of our experiment, one can distinguish between the three hypotheses about the timing of sensory reconstructions shown in **Figure 5**. If the movement vector is computed while the target is still visible (**Figure 5A**), then both transformations (T,V → K and H,K → V) will be performed with the head upright in the U-T condition and both will be performed with the head tilted in the T-U condition. According to Equations (26 and 27), the relative weight given to visual information should not change between the U-T and T-U conditions, while the overall variance should be greater for T-U than for U-T. Neither of these predictions is consistent with our empirical results in which we observed a significantly greater weight given to visual information in the T-U condition, compared to U-T, and similar levels of overall variability for both (Tagliabue et al., 2013). Note that this hypothesis can also be rejected by the strong effect of response modality that we observed in our previous study (Tagliabue and McIntyre, 2011). In all conditions tested in that study (K-K, K-VK, K-V, V-K, V-VK, and V-V) the subject's hand was outside the field of view during the time when the target was being presented. Therefore in all conditions the information available about the hand's orientation during target observation was *de facto* the same. If **Figure 5A** were correct, we would not have observed the strong effect of response modality on the weight given to visual versus kinesthetic information.

**Figure 5B** depicts an alternative hypothesis by which the CNS performs the requisite coordinate transformations starting at movement onset, relying on visual memory of the target after it disappears. In this case both transformations (T,V → K and H,K → V) would be performed with the head upright in the T-U condition and with the head tilted in the U-T condition. Applying once again Equations (26 and 27), one would expect to see similar weight given to visual information in both conditions and a significant difference in the overall variability, although according to this hypothesis, the higher variability would occur for U-T. As before, the empirical observations (Tagliabue et al., 2013) do not match the predictions of **Figure 5B**.

Our experimental findings can, however, be reconciled with a hypothesis by which cross-modal reconstructions of target and hand occur continuously, but *only as long as the sensory input to be transformed is present* (**Figure 5C**). When the target disappears, as in our experiments, further reconstruction of its kinesthetic orientation from visual information is halted, and the remembered orientation is maintained in both spaces. Transformation of the continuously available hand kinesthesia into the visual domain proceeds, however, through the end of the movement. Here we fall into the situation in which the sensory transformations potentially used to control the movement do not all occur with the head at the same orientation. In the U-T condition, the last transformation of the target into kinesthetic space will occur with the head upright, while the latest transformations of the hand into visual space will occur throughout the movement, i.e., with the head tilted. Conversely, in the T-U condition, the last transformation of the target will occur with the head tilted, and the latest transformations of the hand with the head upright. Applying Equations (28 and 29), one expects to see a greater reliance on visual information in T-U than in U-T, with similar levels of overall variability between the two conditions, precisely as we observed (Tagliabue et al., 2013).

**FIGURE 5 | Timing of cross-modal reconstructions.** Hypotheses concerning the time course of sensorimotor reconstructions are represented for the task depicted in **Figure 4**. The visibility of the target (purple bar) and the tilt of the head (yellow bar) are shown as time progresses from left to right. The hand moves only after the rotation of the head is terminated. Horizontal lines represent internal representations of the target (θT,V and θT,K) and of the hand (θH,V and θH,K). Gray symbols indicate sensory inputs that are absent, while green arrows indicate cross-modal reconstructions that may be performed. Vertical arrows and nodes indicate when the comparisons of target and hand are carried out, according to three hypotheses: **(A)** Cross-modal reconstructions and concurrent target-hand comparisons (θV, θK) are performed while the target is visible and the resulting movement vector (θ) is maintained and updated through the end of the movement. **(B)** Cross-modal reconstructions are performed during movement execution, relying on sensory inputs about the target stored in memory. **(C)** Cross-modal reconstructions are performed continuously as long as the sensory input is present; direct and reconstructed target representations are maintained in memory in parallel through the end of the movement. Faded nodes indicate target-hand comparisons that are noisier because they rely on cross-modal reconstructions that were performed with the head tilted. Hypotheses **(A,B)** predict similar weighting of visual and kinesthetic information, and thus partial deviations of the response in both the U-T and T-U conditions, while hypothesis **(C)** predicts a significantly larger weighting of the visual comparison in the T-U than in the U-T conditions.

To summarize, we have shown that the reconstruction of sensory signals in alternate reference frames appears to occur only while the primary sensory input is available. An important corollary to this conclusion is that the CNS will also store spatial information concurrently in multiple reference frames, a prediction that can, in theory, be tested experimentally.

# **5. GENERALIZED CONVERGENT AND CONCURRENT MODELS**

In the preceding sections we have discussed how the CNS might benefit from performing multiple, concurrent comparisons when, for instance, bringing the hand into alignment with a target. This discussion has highlighted a number of pertinent issues, including the evidence for single versus multiple comparisons, the importance of considering co-variation of signals when computing weights based on maximum likelihood and the timing of inter-modal transformations. The preceding sections leave open a number of questions, however, about when the various input signals are combined and about how to extend these concepts to situations where more than two sensory modalities may be involved. In this section we will formalize the distinction between convergent versus concurrent structures. In the section that follows we will show how the various computational concepts can be broadened to include questions such as how the CNS makes use of intrinsic versus extrinsic reference frames.

### **5.1. FULLY CONVERGENT MODEL**

**Figure 6A** shows the computational structure of the fully convergent model. A maximum likelihood estimate is made from all available inputs about the target's position and a similar process is applied to all available information about the position of the hand. As pointed out in section 2.1, the various sources of information must be transformed into a common reference frame in order for these optimal estimates to be computed and these transformations add noise. The calculations that describe the convergent model are therefore given by:

$$\Delta \mathbf{x} = \sum\_{i=1}^{n} \mathbf{w}\_{\mathrm{T},i} \Psi\_{i \to r} \left( \mathbf{x}\_{\mathrm{T},i} \right) - \sum\_{j=1}^{m} \mathbf{w}\_{\mathrm{H},j} \Psi\_{j \to r} \left( \mathbf{x}\_{\mathrm{H},j} \right) \tag{30}$$

where $x\_{\mathrm{T},i}$ and $x\_{\mathrm{H},j}$ are the sensory inputs about the target position in reference frame *i* and the hand position in reference frame *j*. Each input is associated with its own intrinsic variability ($\sigma\_{\mathrm{T},i}^2$ or $\sigma\_{\mathrm{H},j}^2$). The operator $\Psi\_{a \to r}$ represents the transformation of a position value from some reference frame *a* into the common reference frame *r*. Applying $\Psi\_{a \to r}$ to an input value expressed in its intrinsic coordinate frame *a* creates a new value in the reference frame *r* with noise equal to the sum of the variance of the input (e.g., $\sigma\_{\mathrm{T},a}^2$) and the variance added by the transformation ($\sigma\_{a \to r}^2$). Note that the common reference frame *r* could be some abstract reference frame that is independent from any given sensory frame, or it could be one of the *n* reference frames intrinsic to the sensory modalities used to sense the target position or one of the *m* reference frames used to sense the hand position. In the latter case, no transformation will be required for at least one sensory input, and we define $\Psi\_{r \to r}(x) = x$ and $\sigma\_{r \to r}^2 = 0$.
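
A minimal sketch of Equation (30) can make the bookkeeping concrete. We assume scalar positions and Gaussian noise, model each $\Psi\_{a \to r}$ as simply adding variance, and take the MLE weights as inverse-variance weights after transformation into frame *r*; the names and numbers below are illustrative, not from any published implementation:

```python
# Sketch of the fully convergent model of Equation (30): combine all
# inputs about the target, and all inputs about the hand, in a common
# frame r before computing a single movement vector.

def mle_weights(variances):
    """Inverse-variance (maximum likelihood) weights."""
    inv = [1.0 / v for v in variances]
    total = sum(inv)
    return [x / total for x in inv]

def convergent_estimate(inputs, transform_var):
    """inputs: list of (value, intrinsic variance) in native frames.
    transform_var: variance added by Psi_{a->r} for each input
    (0 for an input already expressed in frame r)."""
    variances = [v + tv for (_, v), tv in zip(inputs, transform_var)]
    w = mle_weights(variances)
    est = sum(wi * x for wi, (x, _) in zip(w, inputs))
    var = 1.0 / sum(1.0 / v for v in variances)
    return est, var

# Target sensed visually (native to r) and kinesthetically (transformed)
target, var_T = convergent_estimate([(10.0, 1.0), (10.4, 1.0)], [0.0, 2.0])
# Hand sensed kinesthetically only (its native frame differs from r)
hand, var_H = convergent_estimate([(8.0, 1.5)], [2.0])

delta_x = target - hand   # the movement vector of Equation (30)
```

Note how the transformed kinesthetic input about the target receives less weight (0.25 vs. 0.75) purely because the transformation inflates its variance.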

### **5.2. HYBRID CONVERGENT/CONCURRENT MODEL**

According to the model presented in **Figure 6B**, it is presumed that the CNS will use all available information to represent the task in each of the component reference frames, and will then concurrently compare the target to the hand within each reference frame, before combining the results of each comparison to drive the motor response. We base this formulation on the model proposed by McGuire and Sabes (2009) for the combination of visual and kinesthetic information. From their discussion: *movements are always represented in multiple reference frames*, and from the Methods: *the model first builds internal representations of fingertip and target locations in both retinotopic and body-centered reference frames. These representations integrate all available sensory signals, requiring the transformation of non-native signals.* Extending these concepts to more than two sensory modalities and reference frames, the equation describing this formulation is:

$$\Delta \mathbf{x} = \sum\_{i=1}^{N} \lambda\_i \left( \sum\_{j=1}^{n} \mathbf{w}\_{i,\mathrm{T},j} \Psi\_{j \to i} \left( \mathbf{x}\_{\mathrm{T},j} \right) - \sum\_{j=1}^{m} \mathbf{w}\_{i,\mathrm{H},j} \Psi\_{j \to i} \left( \mathbf{x}\_{\mathrm{H},j} \right) \right) \tag{31}$$

where *N* is the total number of reference frames for which the comparison between target and hand will be made, *n* ≤ *N* is the number of reference frames in which target information is directly available, and *m* ≤ *N* is the number of reference frames in which hand feedback is available. Implicit in this formulation is the idea that the CNS will always reconstruct sensory signals across modalities, even when sensory information is directly available within a given modality. One can see that this formulation allows for two sets of weights: those that determine the weight given to direct and reconstructed inputs within each reference frame [$w\_{i,\mathrm{T},j}$ and $w\_{i,\mathrm{H},j}$, comparable to the weights *w* described in the convergent model of Equation (1)] and those used to combine the results of the differences computed in each reference frame [comparable to the weights λ in the concurrent model of Equation (2)]. So, for instance, if both visual and kinesthetic information is available about the target, both the direct visual input and a transformed version of the kinesthetic information will be used to construct a representation of the target in visual space. Similarly, both the direct sensory input and the reconstructed visual input will be used to construct a representation of the target in kinesthetic space. The weight given to each source of information, however, will take into account the noise added by the cross-modal transformations. Thus, the representation of the movement in visual space will give more weight to the direct visual input than to the visual representation that is reconstructed from kinesthetic signals, etc. According to this model, the CNS will read out the desired movement vector by combining the differences computed concurrently in each reference frame, also according to the expected variance of each of the differences.
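
The two-stage weighting of Equation (31) can be sketched as follows, again for scalar positions with additive transformation noise; for simplicity the sketch neglects co-variation between the per-frame comparisons when computing the λ weights, and all names and numbers are illustrative:

```python
# Sketch of the hybrid convergent/concurrent model of Equation (31):
# build an MLE estimate of target and hand within EACH reference frame
# (from direct and transformed inputs), compare them per frame, then
# combine the per-frame differences with weights lambda_i.

def mle_combine(values, variances):
    """Inverse-variance weighted average and its variance."""
    inv = [1.0 / v for v in variances]
    s = sum(inv)
    est = sum(x * w for x, w in zip(values, inv)) / s
    return est, 1.0 / s

# Two frames: 0 = visual, 1 = kinesthetic.
# trans[a][b]: variance added when transforming frame a -> frame b.
trans = [[0.0, 2.0], [3.0, 0.0]]

# Direct inputs (value, intrinsic variance) per frame.
target_in = [(10.0, 1.0), (10.5, 1.5)]
hand_in = [(8.2, 1.2), (8.0, 1.0)]

diffs, diff_vars = [], []
for i in range(2):   # comparison frame i
    t_est, t_var = mle_combine(
        [x for x, _ in target_in],
        [v + trans[j][i] for j, (_, v) in enumerate(target_in)])
    h_est, h_var = mle_combine(
        [x for x, _ in hand_in],
        [v + trans[j][i] for j, (_, v) in enumerate(hand_in)])
    diffs.append(t_est - h_est)          # per-frame comparison
    diff_vars.append(t_var + h_var)

# lambda_i: inverse-variance weights over the per-frame comparisons.
delta_x, _ = mle_combine(diffs, diff_vars)
```

Within each frame the direct input dominates its reconstructed counterpart, and the final λ weights favor whichever frame yields the less variable comparison.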

### **5.3. FULLY CONCURRENT MODEL**

Here we propose a third formulation, shown in **Figure 6C**, based on the concept that individual comparisons form the building blocks for multisensory control of hand-eye coordination. According to this proposal, each available sensory input may be transformed into any and all other potential reference frames, as in the hybrid model described above. The two models differ, however, in terms of how the various reconstructions are handled within each reference frame. According to the fully concurrent model, the direct and reconstructed signals are not combined into a single representation of the target and of the hand within each reference frame. Rather, the CNS would compute individually the differences between all possible permutations of target and hand representations, both direct and reconstructed, within each reference frame, on a pair-by-pair basis. Only then would the results of all the individual differences be combined through a weighted average according to MLE in order to compute the movement vector. The computations of such a fully distributed, concurrent model, based on individual differences, are described by:

$$\Delta \mathbf{x} = \sum\_{i=1}^{N} \sum\_{j=1}^{n} \sum\_{k=1}^{m} \gamma\_{i,j,k} \left( \Psi\_{j \to i} \left( \mathbf{x}\_{\mathrm{T},j} \right) - \Psi\_{k \to i} \left( \mathbf{x}\_{\mathrm{H},k} \right) \right) \tag{32}$$

A simple mathematical convenience serves to adapt Equation (32) to situations where direct sensory inputs about the target or the hand are missing in one or more of the *n* sensory modalities. According to MLE, a given signal is weighted according to the inverse of its expected variance. If the quantity 1/σ<sup>2</sup> is a measure of the confidence that one has in a given signal—i.e., the greater the variability, the lower the confidence—one can therefore assign to a missing sensory input an infinite variance, in the sense that the confidence in a missing signal will be 1/σ<sup>2</sup> = 1/∞ = 0. By doing so, the weight given to a missing input, or to a transformed version of a missing input will automatically fall to zero in the calculations derived from MLE.
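
Equation (32), together with the infinite-variance convention for missing inputs, can be sketched directly. The example is again a scalar, additive-noise illustration with hypothetical numbers; note that the placeholder value given to a missing input never influences the result, because its weight is exactly zero:

```python
# Sketch of the fully concurrent model of Equation (32), using the
# missing-input convention described in the text: a missing sensory
# input is assigned infinite variance, so 1/sigma^2 = 0 and its weight
# vanishes automatically under MLE.
import math

INF = math.inf

def fully_concurrent(target_in, hand_in, trans):
    """target_in / hand_in: (value, variance) per frame, with
    variance = inf (and a dummy value) if the input is missing.
    trans[a][b]: variance added by Psi_{a->b}."""
    N = len(trans)
    diffs, weights = [], []
    for i in range(N):                    # comparison frame i
        for j, (t, vt) in enumerate(target_in):
            for k, (h, vh) in enumerate(hand_in):
                var = vt + trans[j][i] + vh + trans[k][i]
                diffs.append(t - h)       # one pairwise comparison
                weights.append(1.0 / var) # 1/inf == 0: missing pairs drop out
    s = sum(weights)
    gammas = [w / s for w in weights]     # gamma_{i,j,k} of Eq. (32)
    return sum(g * d for g, d in zip(gammas, diffs))

trans = [[0.0, 2.0], [3.0, 0.0]]
# Visual target only (kinesthetic input missing); kinesthetic hand only.
dx = fully_concurrent([(10.0, 1.0), (0.0, INF)],
                      [(0.0, INF), (8.0, 1.5)], trans)
```

Of the eight candidate comparisons, only the two involving the available visual target and kinesthetic hand survive; the rest receive zero weight.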

Note that Model **6C** is "fully connected", allowing for the possibility that, for instance, the CNS will reconstruct and compare kinesthetic signals in a visual reference frame even though both target and hand may be visible. This means that there may be multiple comparisons of the target and hand within any one reference frame, due to the reconstruction from more than one other reference frame. Nevertheless, given the noise inherent to the reconstruction, the application of MLE will favor the comparison of the directly sensed visual signals within each reference frame, when such direct information is available. Indeed, some components may drop out of the equation because MLE gives them a weight of zero, as we will see in the following.

# **6. EXTRINSIC REFERENCE FRAMES**

In the examples given above we have focused mainly on intrinsic reference frames native to the sensory modalities used to localize the target and the hand. This is due in part to the fact that the most widely documented studies of sensor fusion for eye-hand coordination, including those cited above, have considered two main reference frames: retinal for visual information and body-centered for kinesthetic (a.k.a. proprioceptive) information. Depending on the task, however, other non-native reference frames are almost certainly of interest. For instance, ample evidence exists for the encoding of limb movements (Soechting and Ross, 1984; Darling and Gilchrist, 1991; Borghese et al., 1996; Luyat et al., 2001; Darling et al., 2008) or visual stimuli (Asch and Witkin, 1948b; Luyat and Gentaz, 2002) in a gravitational reference frame, as well as the encoding of information with respect to visual landmarks (Asch and Witkin, 1948a). In the following we examine the question of whether or not to make use of extrinsic reference frames in the context of each of the three models shown in **Figure 6**.

The convergent model of **Figure 6A** can accommodate the recoding of a sensorimotor task by realizing a change in the common reference frame *r*. Thus, the CNS may choose to combine sensory inputs in one possible reference frame or another, depending on the task conditions. Nothing in Equation (30), however, says anything about how *r* is chosen. Additional rules, not specified in Equation (30), would have to be found to resolve this outstanding question. As such, Model **6A** is incomplete. Models **6B,C** provide more elegant solutions to this question. An astute reader will have noticed the distinction between the lowercase *n* and *m* in Equations (31 and 32), representing the number of sensory inputs, from the uppercase *N* indicating the number of reference frames in which the comparison of target and hand is performed. These numbers could all be the same, but the two formulations allow for the use of additional reference frames not directly linked to a sensory input as well. According to these equations, each sensory input may be reconstructed in additional, non-native reference frames. Candidates include other, derived egocentric references such as the head or the shoulder or with respect to external references such as gravity or visual landmarks.

From the perspective of minimizing variability, however, recoding of sensory information in a non-native reference frame would not necessarily be advantageous, because the transformation of the information from a native to a non-native reference introduces additional noise. For instance, the variability of a visual target encoded with respect to gravity will include the variability of both retinal signals and of graviceptors. Moreover, all the variance of the target-hand comparison in the retinal reference frame will be included in the comparison encoded in the external reference frame. According to the analysis presented in section 3, the weight given to the external representation would drop to zero. One might therefore surmise that the recoding of spatial information in non-native reference frames will be avoided, when possible, in deference to direct comparisons of sensory information within the intrinsic reference frame of the different neural receptors. As we will show in the following examples, however, the native sensory representations may be affected by additional sources of noise, depending on the circumstances. The principle of maximum likelihood, coupled with the concurrent structures of Models **6B,C**, can then predict which of the *N* reference frames, intrinsic or extrinsic, come into play in any given situation.

## **6.1. EXTERNAL REFERENCE FRAMES**

**Figure 7** shows an example of how the concurrent models may be applied to the question of whether or not to make use of an external reference frame for a given task. The model predicts that if the target and the hand can be sensed through the same modality and no movement of the sensor occurs between target memorization and response (**Figure 7A**), the brain should privilege a direct egocentric encoding of the movement. Since the transformation into the alternative reference frame would add noise, maximum likelihood will give the most weight to the direct comparison. This effect is amplified if one considers the co-variation between direct and reconstructed signals. Because a comparison performed in any other reconstructed reference frame would co-vary precisely with the inputs to the direct comparison, performing these additional encodings would not reduce the variability of the movement at all. On the other hand, if a movement occurs after the target is stored in memory (**Figure 7B**), an egocentric memory of the target would need to be updated to account for the sensor displacement (Droulez and Berthoz, 1992; Duhamel et al., 1992; Medendorp et al., 2008). In this situation, reconstructing additional, external encodings of the movement becomes advantageous, because the noise added by the updating of the intrinsic representation becomes comparable to the noise added when reconstructing in an external reference frame. This is especially true when the noise in the information used to update the egocentric representation of the target and the noise in the signals used as external references are independent.

The parallel structures of Models **6B,C** are interesting because they provide a theoretical basis for using a combination of intrinsic and extrinsic reference frames, which appears to correspond well to behavioral (Burgess et al., 2004; Vidal et al., 2004; Burgess, 2006; Byrne et al., 2010) and physiological (Dean and Platt, 2006; Zaehle et al., 2007) evidence. Indeed, in a task of reaching with the outstretched hand for a visual or kinesthetic target, with visual or kinesthetic feedback about the response, or both, we were unable to reconcile empirical data with a computational model that relied on intrinsic reference frames alone (Tagliabue and McIntyre, 2012). We surmised that due to the movement of the head in our experiment, subjects encoded the task in external reference frames as well. Psychophysical studies have also shown that subjects tend to use egocentric representations if they remain stable after memorization, but they combine egocentric and external representations if their body moves (Burgess et al., 2004; Burgess, 2006). Similarly, during reaching to visual targets, external visual landmarks appear to be neglected if the hand visual feedback is reliable, whilst they are integrated to build an allocentric representation of the movement if the hand visual feedback is absent or unpredictable (Obhi and Goodale, 2005; Neely et al., 2008).

### **6.2. MEMORY**

The need to store target information in memory for some time before the movement occurs can also motivate the transformation

**FIGURE 7 | External reference frames.** Example of how external sources of information, such as gravity and the visual scene, can be combined to build external encodings of initially retino-centric signals about the target, *x*T,V, and the hand, *x*H,V. Open circular and square nodes represent the recoding of information with respect to an external reference (circles) or the updating of egocentric information to account for movements of the body (squares). All other symbols for inputs and transformations are as defined in previous figures. **(A)** If no movement occurs after the memorization of the retinal information about the target, its direct comparison with the retinal signal about the hand is possible; therefore, encoding these signals with respect to the external gravitational and scene references would not reduce movement variability. **(B)** If the head moves in space, or if the eye moves within its orbit, a direct comparison between retinal signals about the target and hand is not possible, because the retinal information about the target must be updated to take into account the sensor movement. In this case, encoding the initially retino-centric signals with respect to gravity and the visual scene becomes advantageous, because the egocentric and the external encodings become partially uncorrelated.

of sensory information into a non-native reference frame. In eye-hand tasks with imposed memory delays, the variability of responses tends to increase with the length of the delay (McIntyre et al., 1997, 1998). Thus, the simple act of storing spatial information in memory adds noise. According to the hypothesis presented in section 4, the target location will be stored in memory simultaneously in more than one reference frame. Assuming that each representation of the remembered target position will degrade independently (i.e., each will accumulate noise that is stochastically independent from the other), it becomes more and more interesting, from a maximum likelihood perspective, to make use of the non-native representations, despite the added cost of reconstructing those representations in the first place. This reasoning is supported by a study in which subjects were asked to point to targets located along a straight line in 3D space (Carrozzo et al., 2002). As the memory delay increased, patterns of variability of the pointing position were more and more constrained by the extrinsic reference provided by the direction of the line in 3D space. This can be interpreted as a shift in weighting between egocentric and allocentric reference frames, even when the body does not move. By simply substituting "memory processes" for "head/eye movement", however, **Figure 7** can be used to understand why the CNS may rely more on the encoding of a task in an external reference frame when memory processes are involved.
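
The re-weighting argument can be illustrated with a toy calculation. Suppose, purely hypothetically, that the egocentric memory of the target accumulates independent noise at a constant rate per second, while the allocentric encoding pays only a fixed reconstruction cost; MLE weighting then shifts toward the allocentric frame as the delay grows. All rates and variances below are invented for illustration:

```python
# Toy illustration of section 6.2: weight given to an allocentric
# representation as a function of memory delay, under inverse-variance
# (MLE) weighting. All parameter values are hypothetical.

def allocentric_weight(delay, var_ego0=1.0, k=0.5, var_trans=3.0):
    """var_ego0: initial egocentric variance; k: assumed rate at which
    the egocentric memory degrades (variance per second); var_trans:
    fixed cost of reconstructing the allocentric encoding."""
    var_ego = var_ego0 + k * delay      # degrades with memory delay
    var_allo = var_ego0 + var_trans     # fixed reconstruction cost
    return (1 / var_allo) / (1 / var_allo + 1 / var_ego)

w0, w4, w16 = (allocentric_weight(t) for t in (0.0, 4.0, 16.0))
assert w0 < w4 < w16   # reliance on the allocentric frame grows with delay
```

At zero delay the allocentric encoding receives little weight (its transformation cost dominates), but once the egocentric variance exceeds the fixed reconstruction cost the balance reverses, consistent with the delay-dependent shift reported by Carrozzo et al. (2002).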

# **7. DISCUSSION**

In this paper we have described three analytical models (see **Figure 6**) that share a number of defining features. One of these, the idea that the CNS can express spatial information in multiple reference frames while transforming information between them, is a common theme that is supported by numerous theoretical and experimental studies. To cite a few examples, Droulez and Cornilleau-Peres (1993) proposed a distributed model of "coherence constraint" by which spatial information may be encoded in reference frames intrinsic to each sensor and they described a computational structure by which information from one sensor can be reconstructed based on redundant information from other sensors when the primary source is not available. Bock (1986) identified a phenomenon of bias when pointing to targets that lie at a location peripheral to the center of gaze. This phenomenon has been used in a number of studies to argue that whether pointing to visual, auditory or even proprioceptive targets, the CNS carries out the task in retinotopic coordinates (Enright, 1995; Henriques et al., 1998; Pouget et al., 2002b). These observations can be linked to neural properties through models that solve the problem of recoding information in different reference frames by using basis functions and attractor dynamics (Pouget et al., 2002a) or restricted Boltzmann machines (Makin et al., 2013).

The premise that the CNS combines sensory information based on relative variance has also found considerable experimental support: van Beers et al. (1996) showed that the precision of pointing movements increased when the subject could use both visual and kinesthetic feedback signals, compared to when only one sensory feedback modality was available. They also showed that the relative weight given to the two sensory signals depended on their relative variability (van Beers et al., 1999). Ernst and Banks (2002) varied experimentally the noise in the sensory signals available to subjects when they grasped a virtual object that provided both visual and haptic cues about size. Using verbal judgments, they showed how the overall perceptual response shifted toward the haptic information when the precision of the visual inputs was degraded. Smeets et al. (2006) assumed that the CNS maintains both a visual and a kinesthetic representation of targeted movements. When vision of the hand was allowed, this sensory modality dominated due to its higher precision. But when vision of the hand was occluded and subjects were asked to make consecutive movements, the authors observed a gradual shift toward a reliance on proprioceptive information, as indicated by gradual drift in the direction of biases that are specifically associated with this modality. They attributed this shift to a re-weighting toward proprioceptive information as the visual representation of the occluded hand degrades over the course of sequential movements.

These themes of transformations and maximum likelihood come together when one considers the noise added when converting sensory information from one reference frame to another. As alluded to in section 2.1, the added noise inherent to sensory information that is reconstructed from other sources will cause a shift toward the alternative, directly sensed information. This principle has given rise to other empirical manifestations: Sober and Sabes (2003, 2005) postulated that the CNS combines visual and proprioceptive information at two different stages in the planning of targeted hand movements. First, the movement vector is calculated in visual space as the difference between the position of the visual target and the initial position of the hand. Kinesthetic information about the hand's position is also used at this stage, but because it must be transformed into visual space, it is given much less weight, in accord with MLE. At a second stage, the visual movement vector is converted into a motor vector, based primarily on proprioceptive information, but also accommodating a weaker influence of visual information about the target, hand and limb configuration transformed into motor coordinates. Burns and Blohm (2010), using the same model structure as Sober and Sabes, observed a reduction of the weight given to proprioceptive information in the calculation of the movement vector during planning when the head was tilted in a V-VK task. They attributed the shift to the fact that (a) the movement vector was calculated in visual space, requiring that the proprioceptive information about hand position be transformed in order to be useful and (b) tilting the head with respect to gravity increases the noise added by manual-to-visual transformations, thus further decreasing the weight given to the reconstructed signals. Tagliabue et al. (2013) examined the effects of head tilt on the weighting of sensory information. 
In a V-K task (**Figure 4**), if the head was tilted during target acquisition, but not the motor response, the CNS gave greater weight to the visual representation, presumably because transforming the visual target into kinesthetic space with the head tilted would be much noisier than transforming kinesthetic information about the hand into visual space with the head upright. Conversely, if the head was held upright when the target was acquired, but the head was tilted during the motor response, then the task was carried out in kinesthetic space so as to avoid the kinesthetic-to-visual transformation that would have to occur while the head was tilted.
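The reference-frame trade-off in these head-tilt experiments can be illustrated numerically. In the sketch below all variances are hypothetical; the point is only that the extra noise of a cross-modal transformation, which grows when the head is tilted, determines which comparison space yields the more precise target-hand comparison:

```python
# Illustrative sketch of the reference-frame trade-off described above.
# var_transform is the extra noise added by a cross-modal transformation;
# it is assumed to be larger when the head is tilted with respect to gravity.

def comparison_variance(var_direct, var_other, var_transform):
    """Variance of a target-hand comparison when one of the two signals
    must be reconstructed through a noisy cross-modal transformation."""
    return var_direct + (var_other + var_transform)

# Head tilted during target acquisition, upright during the response:
# transforming the visual target into kinesthetic space incurs the tilt
# penalty, while transforming the hand into visual space does not.
var_in_kinesthetic = comparison_variance(1.0, 1.0, 3.0)   # tilted transform
var_in_visual = comparison_variance(1.0, 1.0, 0.5)        # upright transform
# var_in_visual < var_in_kinesthetic, so the visual comparison is favored.
```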

Although the three computational models of **Figure 6** share a number of features, as described above, they vary in terms of the level of convergence or parallelism in the processing of sensory information. Model **6A** presents the highest level of convergence, combining all available inputs about the target and all available inputs about the hand before calculating a movement vector based on the two optimal estimates. Model **6A** provides no clue, however, as to what is the common reference frame for any given task, nor how the common reference frame might change from one task to another. Models **6B,C** provide more elegant solutions to this question by allowing the comparison of target and hand to be carried out simultaneously in multiple reference frames. The same rules that determine which sensory inputs will dominate in any given situation (maximization of likelihood) also determine the weight given to the comparison carried out in each of the component reference frames. The computational scheme depicted in **Figure 6B** combines features of both the convergent model of **Figure 1A** and the concurrent model of **Figure 1B**. Whereas multiple comparisons of target and hand are performed in different reference frames, one can see nevertheless that there is a convergence of multimodal sensory signals about the target and about the hand before these two quantities are compared (subtracted) within each reference frame. In contrast, Model **6C** combines the results of binomial comparisons of a single sensory input about the target (direct or reconstructed in another reference frame) with a single sensory input about the hand (also direct or reconstructed). Model **6C** is the least convergent of the three and as such, lends itself to a modular approach to sensory integration for the coordination of eye and hand.

## **7.1. MODEL PREDICTIONS**

Which of the three models depicted in **Figure 6** best represents human sensorimotor behavior and the underlying neurophysiology? The three computational structures that we have compared here can be distinguished on theoretical grounds and the differences between them lead to testable hypotheses, both at the behavioral level and in terms of the neural implementation as measured by electrophysiological or other methods.

### *7.1.1. Fully convergent vs. concurrent*

The question as to whether sensory signals are combined in a unique reference frame that is defined *a priori* (i.e., in line with **Figure 6A**) prior to performing the comparison between hand and target has received considerable attention in recent years and can, perhaps, already be rejected. From a Bayesian perspective, it can be argued that it is advantageous to maintain multiple representations of movement parameters, expressed in diverse reference frames, in order to optimize motor performance. Electrophysiological evidence also supports the notion that motor planning and execution are carried out in multiple reference frames in parallel, both across different regions of the brain and within a single cortical area (Buneo et al., 2002; Beurze et al., 2010; Buchholz et al., 2013; Maule et al., 2013; Reichenbach et al., 2014). At the behavioral level, the fully convergent model depicted in **Figure 6A** cannot predict certain experimentally observed characteristics of movement planning and execution. As explained in the earliest sections of this article (2–2.1), such a computational model cannot explain why sensory information about the hand is weighted differently between K-VK and V-VK tasks, nor would Model **6A** be able to predict why the CNS would reconstruct a visual representation of a kinesthetic pointing task when the task is bilateral, but not when it is unilateral (Tagliabue and McIntyre, 2013). Moreover, the combination of parallel comparisons in a variety of coordinate systems gives meaning to the concept of a *hybrid* reference frame (Carrozzo and Lacquaniti, 1994). Rather than considering that the task is executed in some abstract reference frame that has little or no physical meaning, one can instead understand that the characteristics of a so-called hybrid reference frame may in fact be the manifestation of a parallel, weighted combination of individual target-hand comparisons carried out in reference frames tied to identifiable objects or sensors.

Studies that have explicitly considered sensor fusion in the case of reaching or pointing tasks have often assumed, implicitly or explicitly, the fully convergent computational structure depicted in **Figure 1A**. One such example is the work carried out by van Beers et al. (1996, 1999) who postulated that a minimization of motor variability could be the driving factor behind the choice of one motor plan over another. They explicitly refer to a convergent maximum likelihood model structure along the lines of Equation (1). The work by Smeets et al. (2006) included the assumption that the CNS maintains both a visual and a proprioceptive representation of the hand and of the target, but did not include any explicit consideration of the transformation of visual information into proprioceptive space or vice versa. Furthermore, the equations that the authors used to make the model predictions in that study would appear to adhere to the computational structure evoked by the convergent model described by Equation (1). Nevertheless, the structure of concurrent comparisons described by Equation (2) can also accommodate both of these studies, without contradiction. Thus, even though Equation (1) has been used on occasion to explain the results of a number of studies, the ability of Equation (2) to explain those studies, and to also explain the effects of target modality that cannot be explained by Equation (1) means that Equation (2) provides a more parsimonious explanation of human sensorimotor behavior.

### *7.1.2. Hybrid concurrent/convergent vs. fully concurrent*

Experiments testing the two concurrent hypotheses (**Figures 6B,C**) have been performed by various groups and reported in the literature. We believe that the hybrid formulation of Equation (31) is representative of the model proposed by McGuire and Sabes (2009). These authors used a more sophisticated Bayesian analysis to formulate their hypothesis, but as they point out, the convolutions required to represent a coordinate transformation in Bayesian notation are simply additions or subtractions and, if there is no prior to be taken into account, the posterior is proportional to the likelihood. This model has been used to interpret a number of empirical results (McGuire and Sabes, 2009, 2011; Burns and Blohm, 2010). In our own studies and publications, we have implicitly used the computational structure of Equation (32) to interpret the results of a series of experiments on multisensory integration (Tagliabue and McIntyre, 2008, 2011, 2012, 2013; Tagliabue et al., 2013). But whereas both models have been used with success to explain a wide range of empirical results, the differentiation between the hybrid concurrent/convergent formulation of **Figure 6B** and the fully concurrent formulation in **Figure 6C** has not, to our knowledge, been explicitly taken up in the literature. Yet it should be possible to distinguish between the two mechanisms, both in terms of potential theoretical advantages of one computational scheme over the other and in terms of empirical results, as we will discuss below.

One key difference between **Figures 6B,C** is when the difference between target and hand is actually computed. In a linear system, this distinction is not very important, since Model **6B** can be rearranged algebraically to match Model **6C**, and vice versa. But evidence suggests that the combination of sensory signals occurs in a non-linear fashion, in part as a means to deal with sensory signals that may or may not come from the same stimulus or event (Roach et al., 2006; Knill, 2007; Hospedales and Vijayakumar, 2009). If sensory signals are separated in distance or in time, the Bayesian optimum may be to rely fully on one signal or the other, rather than a weighted sum of the two. A corollary of these non-linear processes is that as two redundant signals become more separated, the combined estimate may become noisier (Wallace et al., 2004). Model **6C** has an advantage over **6B** in this respect. By combining sensory signals only after computing the movement vector, disparity between reference frames will drop out, provided that the disparity is the same for the target and for the hand. One might therefore test this hypothesis by artificially modulating the disparity between reference frames. The prediction of Model **6C** is that such an operation will not affect motor precision.
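The disparity-cancellation prediction attributed to Model 6C can be illustrated with a one-dimensional toy example (the offset *d* and the coordinates below are invented for illustration):

```python
# In the fully concurrent scheme, a fixed misalignment d between two
# reference frames affects the target and hand representations equally,
# so it cancels in each within-frame movement vector (1-D toy example).

def movement_vector(target, hand):
    return target - hand

target_v, hand_v = 10.0, 4.0                   # positions, visual frame
d = 2.5                                        # disparity between frames
target_k, hand_k = target_v + d, hand_v + d    # same scene, shifted frame

# Both frames yield the same movement vector, regardless of d:
assert movement_vector(target_v, hand_v) == movement_vector(target_k, hand_k)
```

Combining the two vectors only after the subtraction is therefore immune to a common disparity, which is why modulating the disparity experimentally should not affect motor precision under Model 6C.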

The question of how the CNS takes into account covariance between signals could also provide the basis for favoring one model over the other. In Model **6B**, visual and kinesthetic information about the target is combined using a "local" optimality criterion, that is, by taking into account the variability of the signals to be combined (including the necessary cross-modal transformations), but neglecting how the resulting optimal estimate will be used in later stages. In particular, this locally optimal weighting of the target information neglects the consequences of any covariance that may be generated between the two concurrent comparisons ΔV and ΔK. The very same considerations are valid, of course, for the hand information. It follows that the brain could tend to over-estimate the benefit of weighting a given signal because, although it would "locally" provide a more precise estimate of the target and hand positions, "globally" it would increase the covariance between ΔV and ΔK and, if not corrected, would increase the variance of the final output. In other words, generating optimal estimates of target and hand does not necessarily lead to optimal targeted hand movements. Model **6C**, on the other hand, is based on the combination of pairwise comparisons of target and hand, with maximum likelihood being applied to minimize the variability of the combination of multiple movement vectors. Through this more modular approach, it is potentially easier to identify and adjust for co-variation between movement vectors.
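The cost of neglected covariance can be made concrete with the standard minimum-variance combination of correlated estimates, whose weights are given by w = C⁻¹**1**/(**1**ᵀC⁻¹**1**). The covariance values below are illustrative:

```python
import numpy as np

# Minimum-variance, unbiased combination of two correlated estimates of
# the same movement vector; weights follow w = C^{-1}1 / (1' C^{-1} 1).
# Covariance entries are illustrative.

def combine(estimates, cov):
    cov = np.asarray(cov, dtype=float)
    ones = np.ones(len(estimates))
    cinv = np.linalg.inv(cov)
    w = cinv @ ones / (ones @ cinv @ ones)     # optimal weights
    var = 1.0 / (ones @ cinv @ ones)           # variance of the combination
    return float(w @ estimates), float(var)

# Independent comparisons with equal variance: combining halves the variance.
_, var_ind = combine([1.0, 1.0], [[1.0, 0.0], [0.0, 1.0]])      # var 0.5

# Positively correlated comparisons (e.g., shared transformation noise):
# the benefit of combining them shrinks.
_, var_cor = combine([1.0, 1.0], [[1.0, 0.6], [0.6, 1.0]])      # var 0.8
```

Treating the correlated comparisons as independent would assign them the weights of the first case and thereby overstate the precision of the final output, which is the "local vs. global" pitfall described above.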

An example of this is shown in **Figure 8**, in the case of a V-VK task. The hybrid model predicts that both visual and kinesthetic information about the hand will be used to construct representations of the hand in each of the two reference frames (**Figure 8A**). Due to the inter-modal transformations, the comparison carried out in kinesthetic space will be correlated with the comparison carried out in visual space. The optimal combination of ΔV and ΔK will need to be modified to take into account the resulting co-variation. Model **6C** applied in this situation instead predicts that the comparison of the visual target position, reconstructed in kinesthetic space, with the representation of the hand, reconstructed from visual information, will simply drop out, due to its co-variance with the direct comparison of target and hand in visual space (**Figure 8B**). One might therefore ask the question: will the CNS, like Penelope waiting for Ulysses with her weaving (Homer, VIII century BC), perform cross-modal reconstructions, only to undo their effects at a later stage (**Figure 8A**)? Or, by maintaining a more modular approach, can the CNS more efficiently achieve the optimal solution by performing only those transformations and comparisons that are beneficial in any given situation (**Figure 8B**)?

Of course the ultimate test of the hypotheses presented here would be to find correlates of models **6B** or **6C** in electrophysiological studies of neuronal activity. Model **6B** predicts that one should find neurons that respond to multiple sensory inputs about the target and similar neurons encoding information about the hand. Model **6C** makes a novel prediction that certain cells will be sensitive to inputs about the target in one (and only one) sensory modality but that the spatial information will be expressed in the coordinate frame of another. For example, Model **6C** predicts the existence of a cell that encodes the movement vector in visual space, even though the cell may be sensitive to modulation of proprioceptive, but not visual, signals. This would not be the case for Model **6B**, where sensory signals from each available sensory modality are expected to converge prior to the computation of the movement vector.

# **8. CONCLUSIONS**

In this article we have formulated computational models that rely on multiple concurrent computations carried out in multiple reference frames in order to optimally drive the hand to a target. We have compared these concurrent models to the more conventional viewpoint that presupposes the use of a single, common reference frame for combining multi-sensory information. The concurrent models are attractive because of their modular structure and because they better explain a variety of empirical studies. Moreover, they place the question of how to combine sensory information and how to choose the reference frame(s) for any given task into a common theoretical framework, that of maximum likelihood estimation. They also make specific, testable predictions about the sensory transformations that are performed and the representations of target and hand that are maintained in working memory during the performance of sensorimotor tasks. In the spirit of this special issue on modularity in motor control, we therefore propose that the CNS performs multisensory integration in a highly modular fashion, building up the required motor commands for targeted movements from a principled combination of elementary target-hand comparisons.

**FIGURE 8 |** *(caption fragment)* …sensed only visually, but the subject has both visual and kinesthetic information about the hand. Missing sources of information are represented by faded colors. Dashed lines represent sensory transformations and comparisons that can be neglected without a decrease in motor performance, given the extent to which the noise in these calculations correlates with the other comparisons. The fully-concurrent model, but not the Hybrid model, predicts that in the V-VK condition the reconstruction of the kinesthetic representation of the hand from visual feedback can be avoided.

### **ACKNOWLEDGMENTS**

This work was supported by the French Space Agency (Centre National d'Etudes Spatiales). We gratefully acknowledge the support of the Paris Descartes Platform for Sensorimotor Studies (Université Paris Descartes, CNRS, INSERM, Région Île-de-France).

# **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 30 September 2013; accepted: 06 January 2014; published online: 31 January 2014.*

*Citation: Tagliabue M and McIntyre J (2014) A modular theory of multisensory integration for motor control. Front. Comput. Neurosci. 8:1. doi: 10.3389/fncom. 2014.00001*

*This article was submitted to the journal Frontiers in Computational Neuroscience.*

*Copyright © 2014 Tagliabue and McIntyre. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Learning modular policies for robotics

#### *Gerhard Neumann<sup>1</sup>\*, Christian Daniel<sup>1</sup>, Alexandros Paraschos<sup>1</sup>, Andras Kupcsik<sup>2</sup> and Jan Peters<sup>1,3</sup>*

*<sup>1</sup> Department of Computer Science, Intelligent Autonomous Systems, Technische Universität Darmstadt, Darmstadt, Germany*

*<sup>2</sup> School of Computing, National University of Singapore, Singapore*

*<sup>3</sup> Empirical Inference, Intelligent Systems, Max Planck Institute, Tübingen, Germany*

### *Edited by:*

*Andrea D'Avella, IRCCS Fondazione Santa Lucia, Italy*

### *Reviewed by:*

*Aude Billard, École Polytechnique Fédérale de Lausanne, Switzerland Scott Niekum, Carnegie Mellon University, USA*

### *\*Correspondence:*

*Gerhard Neumann, Department of Computer Science, Intelligent Autonomous Systems, Technische Universität Darmstadt, Hochschulstrasse 10, 64289 Darmstadt, Germany e-mail: neumann@ ias.tu-darmstadt.de*

A promising idea for scaling robot learning to more complex tasks is to use elemental behaviors as building blocks to compose more complex behavior. Ideally, such building blocks are used in combination with a learning algorithm that is able to learn to select, adapt, sequence and co-activate the building blocks. While there has been a lot of work on approaches that support one of these requirements, no learning algorithm exists that unifies all these properties in one framework. In this paper we present our work on a unified approach for learning such a modular control architecture. We introduce new policy search algorithms that are based on information-theoretic principles and are able to learn to select, adapt and sequence the building blocks. Furthermore, we developed a new representation for the individual building block that supports co-activation and principled ways for adapting the movement. Finally, we summarize our experiments for learning modular control architectures in simulation and with real robots.

**Keywords: robotics, policy search, modularity, movement primitives, motor control, hierarchical reinforcement learning**

# **1. INTRODUCTION**

Robot learning approaches such as policy search methods (Kober and Peters, 2010; Kormushev et al., 2010; Theodorou et al., 2010) have been very successful. Kormushev et al. (2010) learned to flip pancakes and Kober and Peters (2010) learned the game ball-in-the-cup. Despite these impressive applications, robot learning still offers many challenges due to the inherent high-dimensional continuous state and action spaces, the high costs of generating new data with the real robot, the partial observability of the environment and the risk of damaging the robot due to overly aggressive exploration strategies. These challenges have, so far, prevented robot learning methods from scaling to more complex real-world tasks.

However, many motor tasks are heavily structured. Exploiting such structure may well be the key to scaling robot learning to more complex real-world domains. One of the most common structures of a motor task is modularity. Many motor tasks can be decomposed into elemental movements or movement primitives (Schaal et al., 2003; Khansari-Zadeh and Billard, 2011; Rozo et al., 2013) that are used as building blocks in a modular control architecture. For example, playing tennis can be decomposed into single stroke-based movements, such as a forehand and a backhand stroke. To this end, we need a learning architecture that learns to select, improve, adapt, sequence and co-activate the elemental building blocks. Adaptation is needed because such building blocks are only useful if they can be reused for a wide range of situations, and, hence, the building block needs to be adapted to the current situation. For example, when playing tennis, the ball will always approach the player slightly differently. Furthermore, we need to learn how to sequence such parametrized building blocks. Taking up our tennis example, we need to execute a sequence of strokes such that the opponent player cannot return the ball in the long run. For sequencing the building blocks, we ideally want to be able to continuously switch from one building block to the next to avoid abrupt transitions, also called "blending" of building blocks. Finally, co-activation of the building blocks would considerably increase the expressiveness of the control architecture. Coming back to the tennis example, co-activating primitives that are responsible for the upper-body movement, i.e., the stroke, and primitives that are responsible for the movement of the lower body, i.e., making a side step or a forward step, would significantly reduce the number of required building blocks.

In this paper we present an overview over our work that concentrates on learning such modular control architectures by reinforcement learning. We developed new policy search methods that can select and adapt the individual building blocks to the current situation, learn and improve a large number of different building blocks as well as to learn how to sequence building blocks to solve a complex task. Our learning architecture is based on an information-theoretic policy search algorithm called Relative Entropy Policy Search (REPS) proposed by Peters et al. (2010). The main insight used by REPS is that the relative entropy between the trajectory distributions of two subsequent policies during policy search should be bounded. This bound is particularly useful in robotics as it can cope with many of the mentioned challenges of robot learning. It decreases the danger of damaging the robot as the policy updates stay close to the "data" generated by the old policy and do not perform wild exploration. Moreover, it results in a smooth learning process and prevents the algorithm from getting stuck prematurely in local minima even for high dimensional parameter spaces that are typically used in robotics (Peters and Schaal, 2008; Daniel et al., 2012a). While there are several other policy search approaches which can either learn the selection (da Silva et al., 2012), adaptation (Kober et al., 2010b; Ude et al., 2010) or the sequencing (Stulp and Schaal, 2011) of individual building blocks, to the best of our knowledge, our approach offers the first framework that unifies all these properties in a principled way.
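To give a flavor of the REPS update, the sketch below computes sample weights for an episodic setting by a crude grid search over the temperature of the dual function (Peters et al., 2010). The returns, the bound ε and the grid are illustrative; a proper implementation would use a dedicated dual optimizer:

```python
import numpy as np

# Episodic-REPS sketch: given episode returns R, find the temperature eta
# that satisfies the relative-entropy bound epsilon by minimizing the dual
# g(eta) = eta*epsilon + eta*log(mean(exp(R/eta))), then reweight the
# samples. A simple grid search stands in for a proper dual optimizer.

def reps_weights(returns, epsilon=0.5):
    returns = np.asarray(returns, dtype=float)
    r = returns - returns.max()                 # for numerical stability
    etas = np.linspace(0.05, 20.0, 2000)
    duals = [eta * epsilon + eta * np.log(np.mean(np.exp(r / eta)))
             for eta in etas]
    eta = etas[int(np.argmin(duals))]
    w = np.exp(r / eta)
    return w / w.sum()

w = reps_weights([0.0, 1.0, 2.0, 3.0], epsilon=0.5)
# Better episodes receive larger weights, but the KL bound keeps the
# weighting soft compared to greedily picking the single best episode.
```

The new policy is then fitted to the samples under these weights, which is what keeps the update close to the "data" generated by the old policy.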

A common way to implement the building blocks is to use movement primitives (MPs). Movement primitives provide a compact representation of elemental movements by parameterizing either the trajectory (Schaal et al., 2003; Neumann, 2011; Rozo et al., 2013), muscle activation profiles (d'Avella and Pai, 2010) or directly the control policy (Khansari-Zadeh and Billard, 2011). All of these representations offer several advantages, such as the ability to learn the MP from demonstration (Schaal et al., 2003; Rozo et al., 2013), global stability properties (Schaal et al., 2003), co-activation of multiple primitives (d'Avella and Pai, 2010), or adaptability of the representation via hyper-parameter tuning (Schaal et al., 2003; Rozo et al., 2013). However, none of these approaches unifies all the desirable properties of a MP in one framework. We therefore introduced a new MP representation that is particularly well suited for use in a modular control architecture. Our MP representation is based on distributions over trajectories and is called the Probabilistic Movement Primitive (ProMP). It can, therefore, represent the variance profile of the resulting trajectories, which allows us to encode the importance of time points as well as represent optimal behavior in stochastic systems (Todorov and Jordan, 2002). However, the most important benefit of a probabilistic representation is that we can apply probabilistic operations to trajectory distributions, i.e., conditioning for adaptation of the MP and a product of distributions for co-activation and blending of MPs. Yet, such a probabilistic representation is of little use if we cannot use it to control the robot. Therefore, we showed that a stochastic time-varying feedback controller can be obtained analytically, enabling us to use the probabilistic movement primitive approach as a promising future representation of a building block in modular control architectures.
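The conditioning operation that adapts a trajectory distribution to a via-point is ordinary Gaussian conditioning on the weight distribution. The sketch below uses an invented radial-basis parameterization and prior; it is not the implementation of Paraschos et al. (2013), only the underlying operation:

```python
import numpy as np

# ProMP-style sketch: a trajectory distribution is a Gaussian over
# basis-function weights; conditioning on a via-point adapts the whole
# distribution. Basis functions, prior and noise values are illustrative.

def rbf_features(t, n_basis=8, width=0.02):
    centers = np.linspace(0.0, 1.0, n_basis)
    phi = np.exp(-(t - centers) ** 2 / (2 * width))
    return phi / phi.sum()

mu_w = np.zeros(8)        # prior mean over weights
cov_w = np.eye(8)         # prior covariance over weights

# Condition on passing (approximately) through y* = 1.0 at t = 0.5:
phi = rbf_features(0.5)
sigma_y = 1e-4            # desired via-point accuracy
k = cov_w @ phi / (phi @ cov_w @ phi + sigma_y)    # Kalman-style gain
mu_post = mu_w + k * (1.0 - phi @ mu_w)
cov_post = cov_w - np.outer(k, phi) @ cov_w

# The conditioned mean trajectory now passes near the via-point:
y_at_half = rbf_features(0.5) @ mu_post
```

Co-activation and blending then correspond to products of such Gaussians, which is why the probabilistic formulation unifies these operations so naturally.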
We will present experiments on several real robot tasks such as playing tether-ball and shooting a hockey puck. The robots used for the experiments are illustrated in **Figure 1**.

## **1.1. RELATED WORK**

### *1.1.1. Movement representations*

Different elemental movement representations have been proposed in the literature. The most prominent one is the dynamic movement primitive (DMP) approach (Ijspeert and Schaal, 2003; Schaal et al., 2003). DMPs encode a movement in a parametrized dynamical system. The dynamical system is implemented as a second-order spring-damper system which is perturbed by a non-linear forcing function *f*. The forcing function depends non-linearly on the phase variable *z_t*, which denotes a clock for the movement. The evolution of the phase variable can be made faster or slower by the temporal scaling factor τ, which finally also changes the execution speed of the movement. The forcing function is linearly parametrized by a parameter vector **w** and can easily be learned from demonstrations. In addition to the high-dimensional parameters **w**, we can adjust meta-parameters of the DMPs, such as the goal attractor **g** of the spring-damper system and the temporal scaling factor. In Kober et al. (2010a), the DMPs have been extended to include the final desired velocity in their meta-parameters. DMPs have several advantages: they are easy to learn from demonstrations and by reinforcement learning, they can be used for rhythmic and stroke-based movements, and they have built-in stability guarantees. However, they also suffer from some disadvantages. They cannot represent optimal behavior in a stochastic environment. In addition, the generalization to a new end position is based on heuristics rather than learned from demonstrations, and it is not clear how DMPs can be combined simultaneously. Several other movement primitive representations have been proposed in the literature. Some of them are based on DMPs and aim to overcome their limitations (Calinon et al., 2007; Rozo et al., 2013), but none of them overcomes all the limitations in one framework. Rozo et al. (2013) estimate a time-varying feedback controller for the DMPs; however, how this feedback controller is obtained is based on heuristics. They also implement a combination of primitives as a product of GMMs, which is similar to the work presented here on the probabilistic movement primitives. However, this approach lacks a principled way of determining a feedback controller that exactly matches the trajectory distribution. Therefore, it is not clear what the result of this product is if we apply the resulting controller on the robot.
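The DMP structure described above can be sketched in a few lines. The gains, basis functions and Euler integration below are illustrative choices, not the exact formulation of Ijspeert and Schaal (2003):

```python
import numpy as np

# Minimal DMP sketch: a spring-damper system perturbed by a phase-dependent
# forcing term f, with a canonical system driving the phase z. Gains and
# basis parameters are illustrative choices.

def run_dmp(g, y0=0.0, tau=1.0, w=None, dt=0.001, alpha=25.0, beta=6.25,
            alpha_z=3.0, n_basis=10):
    if w is None:
        w = np.zeros(n_basis)                 # no forcing -> plain attractor
    centers = np.exp(-alpha_z * np.linspace(0.0, 1.0, n_basis))
    y, yd, z = y0, 0.0, 1.0
    for _ in range(int(1.0 / dt)):
        psi = np.exp(-50.0 * (z - centers) ** 2)        # basis activations
        f = (psi @ w) / (psi.sum() + 1e-10) * z * (g - y0)
        ydd = tau ** 2 * (alpha * (beta * (g - y) - yd / tau) + f)
        yd += ydd * dt
        y += yd * dt
        z += -alpha_z * tau * z * dt          # canonical (phase) system
    return y

# With zero forcing weights the system simply converges to the goal g:
y_final = run_dmp(g=1.0)   # -> close to 1.0
```

The forcing term vanishes with the phase variable, which is what gives the DMP its stability guarantee: whatever **w** encodes, the trajectory ends at the goal attractor.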

Most of the movement representations explicitly depend on time (Ijspeert and Schaal, 2003; Neumann and Peters, 2009; Paraschos et al., 2013; Rozo et al., 2013). For time-dependent representations, a linear controller is often sufficient to model complex behavior, as the non-linearity is induced by the time dependency. In contrast, time-independent models such as the Stable Estimator of Dynamical Systems (SEDS) approach (Khansari-Zadeh and Billard, 2011) directly estimate a state-dependent policy that is independent of time. Such models require more complex, non-linear controllers. For example, the SEDS approach uses a GMM to model the policy. The GMM is estimated such that the resulting policy is proven to be stable. Due to the simplicity of the policy, time-dependent representations can easily be scaled up to higher dimensions, as shown by Ijspeert and Schaal (2003). Due to their increased complexity, time-independent models are typically used for lower-dimensional movements, such as modeling the movement directly in task space. Yet, a time-independent model is the more general representation, as it does not require knowledge of the current time step. In this paper, we will nevertheless concentrate on time-dependent movement representations.

### *1.1.2. Policy search*

The most common reinforcement learning approach to learn the parameters of an elemental movement representation such as a DMP is policy search (Williams, 1992; Peters and Schaal, 2008; Kober and Peters, 2010; Kober et al., 2010a). The goal of policy search is to find a parameter vector of the policy such that the resulting policy optimizes the expected long-term reward. Many policy search methods use a stochastic policy for exploration. They can be coarsely categorized according to their policy update strategy. Policy gradient methods (Williams, 1992; Peters et al., 2003) are one of the earliest policy update strategies that were applied to motor primitive representations. They estimate the gradient of the expected long-term reward with respect to the policy parameters (Williams, 1992) and update the policy parameters in the direction of this gradient. The main disadvantages of policy gradient methods are the need to specify a hand-tuned learning rate, their slow learning speed, and the fact that typically many samples are required to obtain a new policy without sample re-use.
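A minimal likelihood-ratio policy gradient ("REINFORCE") on a toy one-parameter problem illustrates the update rule; the reward function, learning rate and sample sizes are invented for illustration:

```python
import numpy as np

# Likelihood-ratio policy gradient sketch for a Gaussian search
# distribution over a single parameter: grad = E[grad log pi(theta) * R].
# The toy reward peaks at theta = 2.

rng = np.random.default_rng(0)
mu, sigma, lr = 0.0, 1.0, 0.05     # policy mean, fixed std, learning rate

for _ in range(300):
    thetas = rng.normal(mu, sigma, size=50)     # sample parameters
    rewards = -(thetas - 2.0) ** 2              # episode returns
    baseline = rewards.mean()                   # variance reduction
    grad_log = (thetas - mu) / sigma ** 2       # d/dmu log N(theta|mu,sigma)
    mu += lr * np.mean(grad_log * (rewards - baseline))

# mu has climbed toward the optimum at 2.0
```

The hand-tuned learning rate `lr` is exactly the nuisance parameter criticized in the text: too small and learning crawls, too large and the updates diverge.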

More recent approaches rely on probabilistic methods. These methods typically base their derivation on the expectation-maximization algorithm (Vlassis et al., 2009; Kober and Peters, 2010) and formulate the policy search problem as an inference problem by transforming the reward into an improper probability distribution, i.e., the transformed reward is required to be always positive. Such a transformation is typically achieved by an exponential transformation with a hand-tuned temperature. The resulting policy update can be formulated as a weighted model-fitting task where each sample is weighted by the transformed long-term reward (Kober and Peters, 2010). Using a probabilistic model-fitting approach to compute the policy update has the important advantage that we can use a big toolbox of algorithms for estimating structured probabilistic models, such as the expectation-maximization algorithm (Dempster et al., 1977) or variational inference (Neal and Hinton, 1998). Additionally, it does not require a user-specified learning rate. These approaches typically explore directly in the parameter space of the policy by estimating a distribution over the policy parameters. Such an approach works well if we have a moderate number of parameters.
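The exponential reward transformation and weighted model fitting can be sketched as a reward-weighted regression loop on a toy problem; the temperature, reward function and the small variance floor are illustrative choices:

```python
import numpy as np

# Reward-weighted regression sketch: transform returns with an exponential
# (hand-tuned temperature beta, as noted in the text) and refit the
# Gaussian search distribution by weighted maximum likelihood.

rng = np.random.default_rng(1)
mu, sigma, beta = 0.0, 1.0, 2.0

for _ in range(60):
    thetas = rng.normal(mu, sigma, size=100)
    rewards = -(thetas - 1.5) ** 2                    # toy return, peak 1.5
    w = np.exp(beta * (rewards - rewards.max()))      # improper "probability"
    w /= w.sum()
    mu = w @ thetas                                   # weighted ML update,
    sigma = np.sqrt(w @ (thetas - mu) ** 2) + 0.1     # no learning rate;
    # the +0.1 keeps a minimum of exploration noise (illustrative choice)

# mu has converged toward the reward peak at 1.5
```

Note there is no learning rate: the new distribution is simply the weighted maximum-likelihood fit, which is the practical appeal of the EM-style formulation.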

Another algorithm that has recently gained a lot of attention is the policy improvement by path integrals (PI2) algorithm (Theodorou et al., 2010; Stulp and Sigaud, 2012). The path integral theory makes it possible to compute the globally optimal trajectory distribution along with the optimal controls without requiring a value function, as opposed to traditional dynamic programming approaches. However, the current algorithm is limited to learning open-loop policies (Theodorou et al., 2010; Stulp and Sigaud, 2012) and may not be able to adapt the variance of the exploration policy (Theodorou et al., 2010).

### *1.1.3. Generalization of skills*

An important requirement in a modular control architecture is that we can adapt a building block to the current situation or task. We will describe a task or a situation with a context vector **s**. The context vector can contain the objectives of the agent, e.g., throwing a ball to a desired target location, or physical properties of the environment, e.g., the mass of the ball to throw. Ude et al. (2010) use supervised learning to generalize movement primitives from a set of demonstrations. Such an approach is well suited to generalizing a set of demonstrations to new situations, but cannot be used to improve the skills beyond the demonstrations. To alleviate this limitation, da Silva et al. (2012) combine low-dimensional subspace extraction for generalization with policy search methods for policy improvement. Finding such low-dimensional sub-spaces is an interesting idea that can considerably improve the generalization of the skills. Yet, there is one important limitation of the approach presented in da Silva et al. (2012): the algorithms for policy improvement and skill generalization work almost independently from each other. The only way they interact is that the generalization is used as the initialization for the policy search algorithm when a new task needs to be learned. As a consequence, the method needs to create many roll-outs for the same task/context in order to improve the skill for this context. This limitation is relaxed by contextual policy search methods (Kober et al., 2010b; Neumann, 2011). Contextual policy search methods explicitly learn a policy that chooses the control parameters θ in accordance with the context vector **s**. Therefore, a different context can be used for each roll-out. Kober et al. (2010b) use a Gaussian Process (GP) for generalization. While GPs have good generalization properties, they are of limited use for policy search as they typically learn an uncorrelated exploration policy.
The approach in Neumann (2011) can use a directed exploration strategy, but it suffers from high computational demands.

### *1.1.4. Sequencing of skills*

Another requirement is to learn to sequence the building blocks. Standard policy search methods typically choose a single parameter vector per episode. Hence, such methods can be used to learn the parameters of a single building block. In order to sequence building blocks, we have to learn how to choose multiple parameter vectors per episode. The first approach (Neumann and Peters, 2009) for learning to sequence primitives was based on value-function approximation techniques, which restricted its application to a rather small set of parameters for each primitive. Recently, Stulp and Schaal (2011) adapted the path integral approach to policy search to sequence movement primitives. Other approaches (Morimoto and Doya, 2001; Ghavamzadeh and Mahadevan, 2003) use hand-specified sub-tasks to learn the sequencing of elemental skills. Such an approach limits the flexibility of the resulting policy, and the sub-tasks are typically not easy to define manually.

### *1.1.5. Segmentation and modular imitation learning*

Segmentation (Kulic et al., 2009; Álvarez et al., 2010; Meier et al., 2011) and modular imitation learning (Niekum et al., 2012) address the important and challenging problem of autonomously extracting the structure of the modular control policy from demonstrations. In Meier et al. (2011) and Álvarez et al. (2010), the segmentation is based on parameter changes in the dynamical system that is assumed to have generated the motion. In Chiappa and Peters (2010), Bayesian methods are used to construct a library of building blocks. Repeated skills are modeled as being generated by one of the building blocks, rescaled and perturbed by noise. Based on the segmentation of the demonstrations, we can infer the individual building blocks from the data by clustering the segments. One approach that integrates clustering and segmentation is to use Hidden Markov Models (HMMs). Williams and Storkey (2007) used an HMM to extract movement primitives from hand-writing data. While this is a very general approach, it has only been applied to rather low-dimensional data, i.e., 2-D movements. Niekum et al. (2012) use a beta-process autoregressive HMM to estimate the segmentation, which has the advantage that the number of building blocks can also be inferred from the data. DMPs are used to represent the policy of the single segments. Butterfield et al. (2010) use an HMM to directly estimate the policy. For each hidden state, they fit a Gaussian Process model to represent the policy of this hidden state. The advantage of these imitation learning approaches is that we can also estimate the temporal structure of the modular control policy, i.e., when to switch from one building block to the next. So far, such imitation learning approaches have not been integrated in a reinforcement learning framework, which seems to be a very interesting direction. For example, in current reinforcement learning approaches, the duration of the building blocks is specified by a single parameter. 
Estimating the duration of the building blocks from the given trajectory data seems to be a fruitful and more general approach.

# **2. INFORMATION THEORETIC POLICY SEARCH FOR LEARNING MODULAR CONTROL POLICIES**

In this section we will sequentially introduce our information theoretic policy search framework used for learning modular control policies. We start our discussion with the adaptation of a single building block. Subsequently, we discuss how to learn to select a building block and, finally, we will discuss sequencing of building blocks.

After introducing each component of our framework, we briefly discuss related experiments on real robots and in simulation. In this paper, we can only give a brief overview of the experiments. For more details, we refer to the corresponding papers. In our experiments with our information theoretic policy search framework, we used Dynamic Movement Primitives (DMP) introduced in Schaal et al. (2003) as building blocks in our modular control architecture. In all our experiments, we used the hyper-parameters of a DMP as parameters of the building blocks, such as the final positions and velocities of the joints (Kober et al., 2010a) as well as the temporal scaling factor of the DMPs for changing the execution speed of the movement.

## **2.1. LEARNING TO ADAPT THE INDIVIDUAL BUILDING BLOCKS**

We formulate the learning of the adaptation of the building blocks as a contextual policy search problem (Kober et al., 2010b; Neumann, 2011; Daniel et al., 2012a), where we will for now assume that we want to execute only a single building block. Adaptation of a building block is implemented by an upper-level policy π(θ|**s**) that chooses the parameter vector θ of the building block according to the current context vector **s**. The context describes the task. It might contain objectives of the agent or properties of the environment, for example, the incoming velocity of a tennis ball. After choosing the parameters θ, the lower level policy **u***<sup>t</sup>* = *f*(**x***<sup>t</sup>*, θ) of the building block takes over and is used to control the robot. Note that we use the symbol **x***<sup>t</sup>* to denote the state of the robot. The state **x***<sup>t</sup>* typically contains the joint angles **q***<sup>t</sup>* and joint velocities **q**˙*<sup>t</sup>* of the robot and it should not be confused with the context vector **s**. The context vector **s** describes the task and contains higher level objectives of the agent. For example, such a lower level policy can be defined by a trajectory tracking controller that tracks the desired trajectory of a dynamic movement primitive (DMP) (Schaal et al., 2003).
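As an illustration of this two-level structure, the following sketch samples θ once per episode from a linear-Gaussian upper-level policy and then queries the lower-level feedback policy at each time step. The dimensions and the placeholder lower-level controller are illustrative assumptions, not the DMP tracking controller used in our experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 2-D context s, 3-D parameter vector theta.
a = np.zeros(3)                 # policy offset
A = rng.normal(size=(3, 2))     # linear context gain
Sigma = 0.1 * np.eye(3)         # exploration covariance

def upper_level_policy(s):
    """Sample building-block parameters theta ~ N(a + A s, Sigma)."""
    return rng.multivariate_normal(a + A @ s, Sigma)

def lower_level_policy(x, theta):
    """Placeholder feedback law u = f(x, theta): drives the state toward
    a goal encoded in theta[:2] with a gain encoded in theta[2]."""
    goal, gain = theta[:2], abs(theta[2]) + 1.0
    return gain * (goal - x)

s = np.array([0.5, -0.2])       # context, e.g., incoming ball position
theta = upper_level_policy(s)   # chosen once per episode
x = np.zeros(2)                 # robot state
u = lower_level_policy(x, theta)
```

Note that θ is sampled only once, while the lower-level policy is evaluated at every control step.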

Our aim is to learn an upper-level policy that maximizes the expected reward

$$J\_{\pi} = \iint \mu(\mathbf{s}) \pi(\boldsymbol{\theta}|\mathbf{s}) R(\mathbf{s}, \boldsymbol{\theta}) d\mathbf{s} d\boldsymbol{\theta},$$

$$R(\mathbf{s}, \boldsymbol{\theta}) = \int p(\boldsymbol{\tau}|\mathbf{s}, \boldsymbol{\theta}) r(\boldsymbol{\tau}, \mathbf{s}) d\boldsymbol{\tau},\tag{1}$$

where *R*(**s**, θ ) is the expected reward of the resulting trajectory τ when using parameters θ in context **s** and μ(**s**) denotes the distribution over the contexts that is specified by the learning problem. The distribution *p*(τ |**s**, θ ) denotes the probability of a trajectory given **s** and θ and *r*(τ,**s**) a user-specified reward function that depends on the trajectory τ and on the context **s**. We use the Relative Entropy Policy Search (REPS) algorithm (Peters et al., 2010) as the underlying policy search method. The basic idea of REPS is to bound the relative entropy between the old and the new parameter distribution. Here, we will consider the episode-based contextual formulation of REPS (Daniel et al., 2012a; Kupcsik et al., 2013) that is tailored for learning such an upper-level policy. The policy update step is defined as a constrained optimization problem where we want to find the distribution *p*(**s**, θ ) = μ(**s**)π(θ |**s**) that maximizes the average reward given in Eq. 1 with respect to *p*(**s**, θ ) and simultaneously satisfies several constraints. We will first discuss these constraints and show how to compute *p*(**s**, θ ). Subsequently, we will explain how to obtain the upper-level policy π(θ |**s**) from *p*(**s**, θ ).

Generally, we initialize any policy search (PS) method with an initial policy *q*0(**s**, θ ) = μ(**s**)*q*0(θ |**s**), either obtained through learning from demonstration or by manually setting a distribution for the parameters. The variance of the initial distribution *q*0(**s**, θ ) defines the exploration region. Policy search is an iterative process. Given the sampling distribution *q*0(**s**, θ ), we obtain a new distribution *p*1(**s**, θ ). Subsequently, *p*<sup>1</sup> is used as new sampling policy *q*<sup>1</sup> and the process is repeated.

PS methods need to find a trade-off between keeping the initial exploration and constricting the policy to a (typically local) optimum. In REPS, this trade-off is realized via the Kullback-Leibler (KL) divergence. REPS maximizes the reward under the constraint that the KL-divergence to the old exploration policy is bounded, i.e.,

$$
\epsilon \ge \text{KL}\left(p(\mathbf{s}, \boldsymbol{\theta}) || q(\mathbf{s}, \boldsymbol{\theta})\right). \tag{2}
$$

Due to this bound, we can choose between exploitation with the greedy policy (high KL-bound) or continued exploration with the old exploration policy (very small KL-bound). The KL divergence in REPS bounds not only the conditional probability π(θ |**s**), i.e., the differences in the policies, but also the joint state-action probabilities *p*(**s**, θ ) to ensure that the observed state-action region does not change rapidly over iterations, which is paramount for a real robot learning algorithm. Using the (asymmetric) KL divergence KL(*p*(**s**, θ )||*q*(**s**, θ )) allows us to find a closed form solution of the algorithm. Such a closed form would not be possible with the opposite KL divergence, i.e., KL(*q*(**s**, θ )||*p*(**s**, θ )).
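For Gaussian policies such as ours, the KL divergence in Equation (2) can be evaluated in closed form. A minimal sketch, with made-up example distributions, that also illustrates the asymmetry mentioned above:

```python
import numpy as np

def kl_gauss(mu_p, cov_p, mu_q, cov_q):
    """Closed-form KL(N(mu_p, cov_p) || N(mu_q, cov_q)) for multivariate Gaussians."""
    d = len(mu_p)
    cov_q_inv = np.linalg.inv(cov_q)
    diff = mu_q - mu_p
    return 0.5 * (np.trace(cov_q_inv @ cov_p)
                  + diff @ cov_q_inv @ diff
                  - d
                  + np.log(np.linalg.det(cov_q) / np.linalg.det(cov_p)))

# Old exploration policy q and a candidate updated policy p over theta.
mu_q, cov_q = np.zeros(2), np.eye(2)
mu_p, cov_p = np.array([0.1, 0.0]), 0.9 * np.eye(2)

epsilon = 0.5                                   # illustrative KL bound
within_bound = kl_gauss(mu_p, cov_p, mu_q, cov_q) <= epsilon
```

Swapping the arguments gives a different value, which is why the direction of the KL divergence matters for the tractability of the update.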

We also have to consider that the context distribution *p*(**s**) = ∫ *p*(**s**, θ )*d*θ cannot be freely chosen by the agent, as it is specified by the learning problem and given by μ(**s**). Hence, we need to add the constraints ∀**s** : *p*(**s**) = μ(**s**) to match the given context distribution μ(**s**). However, for a continuous context vector **s**, we would end up with infinitely many constraints. Therefore, we resort to matching feature averages instead of single probability values, i.e., ∫ *p*(**s**)φ(**s**)*d***s** = φ̂, where φ(**s**) is a feature vector describing the context and φ̂ is the mean observed feature vector.

The resulting constrained optimization problem is now given by

$$\begin{aligned} \max\_{p} & \iint\_{\mathbf{s}, \boldsymbol{\theta}} p(\mathbf{s}, \boldsymbol{\theta}) R(\mathbf{s}, \boldsymbol{\theta}) d\mathbf{s} d\boldsymbol{\theta}, \quad \text{s.t.:} \; \epsilon \ge \text{KL} \left( p(\mathbf{s}, \boldsymbol{\theta}) ||q(\mathbf{s}, \boldsymbol{\theta}) \right), \\\\ & \int p(\mathbf{s}) \boldsymbol{\phi}(\mathbf{s}) d\mathbf{s} = \boldsymbol{\hat{\phi}}, \quad \iint p(\mathbf{s}, \boldsymbol{\theta}) d\mathbf{s} d\boldsymbol{\theta} = 1. \end{aligned} \tag{3}$$

It can be solved by the method of Lagrangian multipliers and yields a closed-form solution for *p* that is given by


$$p(\mathbf{s}, \boldsymbol{\theta}) \propto q(\mathbf{s}, \boldsymbol{\theta}) \exp\left(\frac{R(\mathbf{s}, \boldsymbol{\theta}) - V(\mathbf{s})}{\eta}\right),\tag{4}$$

where *V*(**s**) = **v***T*φ(**s**) is a context dependent baseline that is subtracted from the reward signal. The scalar η and the vector **v** are Lagrangian multipliers that can be found by optimizing the dual function *g*(η, **v**) (Daniel et al., 2012a). It can be shown that *V*(**s**) can be interpreted as a value function (Peters et al., 2010) and, hence, estimates the mean performance of the new policy in context **s**.

The optimization defined by the REPS algorithm is only performed on a discrete set of samples *D* = {(**s**[*i*], θ[*i*], *R*[*i*])}, *i* = 1,..., *N*, where *R*[*i*] denotes the return obtained by the *i*th roll-out. The resulting probabilities *p*(**s**[*i*], θ[*i*]), see Equation (4), of these samples are used to weight the samples. In order to obtain the weight *p*[*i*] for each sample, we need to divide *p*(**s**[*i*], θ[*i*]) by the sampling distribution *q*(**s**[*i*], θ[*i*]) to account for the sampling probability (Kupcsik et al., 2013), i.e.,

$$p^{\{i\}} = \frac{p\left(\mathbf{s}^{\{i\}}, \boldsymbol{\theta}^{\{i\}}\right)}{q\left(\mathbf{s}^{\{i\}}, \boldsymbol{\theta}^{\{i\}}\right)} \propto \exp\left(\frac{R(\mathbf{s}^{\{i\}}, \boldsymbol{\theta}^{\{i\}}) - V(\mathbf{s}^{\{i\}})}{\eta}\right). \tag{5}$$

Hence, being able to sample from *q* is sufficient and *q* is not needed in its analytical form.

The upper-level policy π(θ |**s**) is subsequently obtained by performing a weighted maximum-likelihood (ML) estimate. We use a linear-Gaussian model to represent the upper-level policy π(θ |**s**) = *N*(θ |**a** + **As**, **Σ**) of the building block, where the parameters **a**, **A** and **Σ** are obtained through the ML estimation. As a building block is typically reused only for similar contexts **s**, a linear model is sufficient in most cases. **Figure 2** shows an illustration of how a linear model can adapt the trajectories generated by a DMP. In practice, we still need an initial policy *q*. This initial policy can either be obtained through learning from demonstration or by selecting reasonable parameters and variance if the experimenter has sufficient task knowledge.
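The weighted ML update for the linear-Gaussian upper-level policy reduces to a weighted linear regression for **a** and **A** and a weighted covariance estimate for **Σ**. A self-contained sketch on synthetic data; in the actual algorithm the weights come from Equation (5) rather than being random:

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic weighted samples (s, theta, weight).
N, ds, dt = 100, 2, 3
S = rng.uniform(-1, 1, (N, ds))
true_A = np.array([[1.0, 0.0], [0.0, -1.0], [0.5, 0.5]])
Theta = S @ true_A.T + 0.2 + 0.05 * rng.normal(size=(N, dt))
w = rng.uniform(0.0, 1.0, N)
w /= w.sum()                                   # normalized sample weights

# Weighted ML estimate of pi(theta|s) = N(theta | a + A s, Sigma):
# weighted linear regression with a bias feature.
X = np.c_[np.ones(N), S]                       # features [1, s]
W = np.diag(w)
beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ Theta)
a_hat, A_hat = beta[0], beta[1:].T

# Weighted ML covariance of the residuals.
resid = Theta - X @ beta
Sigma_hat = resid.T @ W @ resid
```

With weights concentrated on high-reward samples, the fitted policy shifts toward the better parameters while the covariance shrinks around them.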

In Kupcsik et al. (2013), we further improved the data-efficiency of our contextual policy search algorithm by learning probabilistic forward models of the real robot and its environment. With these forward models, we can predict the reward *R*(**s**[*j*], θ[*j*]) for unseen context-parameter pairs **s**[*j*] and θ[*j*] and use these additional samples for computing the policy update. The data-efficiency of our method could be improved by up to two orders of magnitude using the learned forward models. As we used Gaussian Processes (GPs) (Rasmussen and Williams, 2006) to represent the forward models, this extension of our method is called GPREPS. These forward models were used to generate additional data points that are used for the policy update. For each of these virtual data points, we generated 15 trajectories with the learned forward models. We used the average reward of these predicted trajectories as the reward in the REPS optimization. We used sparse GPs (Snelson and Ghahramani, 2006) to deal with the high number of data points within a reasonable computation time.

### *2.1.1. Experimental evaluation of the adaptation of building blocks - robot hockey target shooting*

In this task we used GPREPS with learned forward models to learn how to adapt the building blocks such that the robot can shoot hockey pucks to different locations. The objective was to make a target puck move for a specified distance by shooting a second hockey puck at the target puck. The context **s** was composed of the initial location [*bx*, *by*] *<sup>T</sup>* of the target puck and the distance *d*<sup>∗</sup> that the target puck had to be shot, i.e., **s** = [*bx*, *by*, *d*∗] *<sup>T</sup>*. We chose the initial position of the target puck to be uniformly distributed, with displacements from the robot's base of *bx* ∈ [1.5, 2.5] m and *by* ∈ [0.5, 1] m. The desired displacement *d*<sup>∗</sup> was also uniformly distributed, *d*<sup>∗</sup> ∈ [0, 1] m. The reward function

$$r(\boldsymbol{\tau}, \mathbf{s}) = -\min\_{t} \|\mathbf{x}\_t - \mathbf{b}\|\_2 - |d\_T - d^\*|$$

consists of two terms with equal weighting. The first term penalizes missing the target puck located at position **b** = [*bx*, *by*] *<sup>T</sup>*, where **x**1:*T* denotes the trajectory of the control puck. The second term penalizes the error in the desired displacement of the target puck, where *dT* is the resulting displacement of the target puck after the shot. The parameters θ define the weights and goal position of the DMP. The policy in this experiment was a linear Gaussian policy. The simulated robot task is depicted in **Figure 3**.
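The reward function above can be sketched directly; the trajectory, puck position, and displacements below are made up for illustration:

```python
import numpy as np

def hockey_reward(x_traj, b, d_T, d_star):
    """Reward for the target-shooting task: penalize (i) the minimum distance
    between the control puck trajectory and the target puck at b, and (ii) the
    error in the target puck's displacement, with equal weights."""
    miss = np.min(np.linalg.norm(x_traj - b, axis=1))
    return -miss - abs(d_T - d_star)

# Hypothetical shot: the control puck passes close to the target puck.
x_traj = np.c_[np.linspace(0, 2, 50), np.linspace(0, 1, 50)]
b = np.array([1.0, 0.5])        # target puck near the trajectory midpoint
r = hockey_reward(x_traj, b, d_T=0.9, d_star=1.0)
```

A perfect shot, hitting **b** exactly and producing exactly the desired displacement, yields a reward of zero; any miss or displacement error makes the reward negative.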

GPREPS first learned a forward model to predict the initial position and velocity of the first puck after contact with the racket and a travel distance of 20 cm. Subsequently, GPREPS learned the free dynamics model of both pucks and the contact model of the pucks. We assumed that we know the geometry of the pucks to detect a contact. If there is a contact, we used the contact model to predict the state of both pucks after the contact given the state of both pucks before the contact. From this state, we again predicted the final puck positions after they came to a stop with a separate GP model.

We compared GPREPS in simulation to directly predicting the reward *R*(**s**, θ ), model-free REPS and CrKR (Kober et al., 2010b), a state-of-the-art model-free contextual policy search method. The resulting learning curves are shown in **Figure 3** (middle). GPREPS learned the task after only 120 interactions with the environment, while the model-free version of REPS needed approximately 10,000 interactions. Directly predicting the rewards from parameters θ using a single GP model resulted in faster convergence, but the resulting policies still showed a poor performance (*GP direct*). The results show that CrKR could not compete with model-free REPS. The learned movement is shown in **Figure 3** for a specific context. After 100 evaluations, GPREPS placed the target puck accurately at the desired distance with an error ≤ 5 cm.

Finally, we evaluated the performance of GPREPS on the hockey task using a real KUKA lightweight arm. The learning curve of this experiment is shown in **Figure 3** (right) and confirms that GP-REPS can find high-quality policies within a small amount of interactions with the environment.

## **2.2. LEARNING TO SELECT THE BUILDING BLOCKS**

In order to select between several building blocks *o*, we add an additional level of hierarchy on top of the upper-level policies of the individual building blocks. We assume that each building block shares the same parameter space. The parameters are now selected by first choosing the building block to execute with a gating policy π*G*(*o*|**s**) and, subsequently, the upper level parameter policy π*P*(θ |**s**, *o*) of the building block *o* selects the parameters θ . Hence, π(θ |**s**) can be written as a hierarchical policy

$$
\pi\left(\boldsymbol{\theta}|\mathbf{s}\right) = \sum\_{o} \pi\_{G}(o|\mathbf{s}) \pi\_{P}(\boldsymbol{\theta}|\mathbf{s}, o).\tag{6}
$$

In this model, the gating policy composes a complex, nonlinear parameter selection strategy out of the simpler upper level policies of the building blocks. Moreover, it can learn multiple solutions for the same context, which also increases the versatility of the learned motor skill (Daniel et al., 2012b). While a similar decomposition into gating policy and option policies has been presented in da Silva et al. (2012), their framework was not integrated into a reinforcement learning algorithm, and hence, generalization and improvement of the building blocks are performed by two independent algorithms, resulting in sample-inefficient policy updates.
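A sketch of sampling from the hierarchical policy of Equation (6): first draw an option from the gating policy, then draw θ from that option's linear-Gaussian parameter policy. The softmax gating and the option parameters below are illustrative assumptions (in our method the gating policy is Gaussian and all parameters are learned by weighted ML).

```python
import numpy as np

rng = np.random.default_rng(3)

def gating(s, W):
    """Softmax gating pi_G(o|s) over options from linear scores (illustrative)."""
    logits = W @ np.append(s, 1.0)
    e = np.exp(logits - logits.max())
    return e / e.sum()

# Two options, each with its own linear-Gaussian parameter policy (made up).
options = [
    {"a": np.array([1.0, 0.0]), "A": np.eye(2), "Sigma": 0.05 * np.eye(2)},
    {"a": np.array([-1.0, 0.0]), "A": -np.eye(2), "Sigma": 0.05 * np.eye(2)},
]
W = rng.normal(size=(2, 3))     # gating parameters (hypothetical)

def hierarchical_policy(s):
    """Sample theta from pi(theta|s) = sum_o pi_G(o|s) pi_P(theta|s, o)."""
    probs = gating(s, W)
    o = rng.choice(len(options), p=probs)
    opt = options[o]
    return o, rng.multivariate_normal(opt["a"] + opt["A"] @ s, opt["Sigma"])

s = np.array([0.2, -0.4])
o, theta = hierarchical_policy(s)
```

Because the two options have well-separated means, the mixture can represent two distinct solutions for the same context, as in the tetherball task below.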

To incorporate multiple building blocks, we now bound the Kullback-Leibler divergence between *q*(**s**, θ , *o*) and *p*(**s**, θ , *o*). As we are interested in versatile solutions, we also want to avoid that several building blocks concentrate on the same solution. Hence, we want to limit the "overlap" between building blocks in the parameter space. In order to do so, we bound the expected entropy of the conditional distribution *p*(*o*|**s**, θ ), i.e.,

$$-\int p(\mathbf{s},\boldsymbol{\theta})\sum\_{o}p(o|\mathbf{s},\boldsymbol{\theta})\log p(o|\mathbf{s},\boldsymbol{\theta})d\mathbf{s}d\boldsymbol{\theta} \leq \kappa. \tag{7}$$

A low entropy of *p*(*o*|**s**, θ ) ensures that our building blocks do not overlap in parameter space and, thus, represent individual and clearly separated solutions (Daniel et al., 2012a). The new optimization program results in the hierarchical version of REPS, denoted as HiREPS. We can again determine a closed form solution for *p*(**s**, θ , *o*), which is given in Daniel et al. (2012a). As in the previous section, the optimization problem is only solved for a given set of samples that has been generated from the distribution *q*(**s**, θ ). Subsequently, the parameters of the gating policy and the upper-level policies are obtained by weighted ML estimates. We use a Gaussian gating policy and an individual linear Gaussian policy π(θ |**s**, *o*) = *N*(θ |**a***o* + **A***o***s**, **Σ***o*) for each building block. As we use a linear upper-level policy and the used DMPs produce only locally valid controllers, our architecture might require a large number of building blocks.

### *2.2.1. Experimental evaluation of the selection of building blocks - robot tetherball*

In robot tetherball, the robot has to shoot a ball that is attached to the ceiling by a string, such that the ball winds around a pole. The robot obtains a reward proportional to the speed of the ball winding around the pole. There are two different solutions: winding the ball around the left or the right side of the pole. Two successful hitting movements of the real robot are shown in **Figure 5**. We decomposed our movement into a swing-in motion and a hitting motion. As we used the non-sequential algorithm for this experiment, we represented the two motions by a single set of parameters and jointly learned the parameters θ for the two DMPs. We started the policy search algorithm with 15 options with randomly distributed parameters sampled from a Gaussian distribution around the parameters of the initial demonstration. We used a higher number of building blocks to increase the probability of finding both solutions with the building blocks; if we use only two randomly initialized building blocks, the probability that both cover the same solution is quite high. We deleted unused building blocks that have a very small probability of being chosen, i.e., *p*(*o*) < 0.001. The learning curve is shown in **Figure 4** (left). The noisy reward signal is mostly due to the vision system and partly also due to real world effects such as friction. Two resulting movements of the robot are shown in **Figure 5**. The robot could learn a versatile strategy that contained building blocks that wind the ball around the left and building blocks that wind the ball around the right side of the pole.

## **2.3. LEARNING TO SEQUENCE THE BUILDING BLOCKS**

To execute multiple building blocks in a sequence, we reformulate the problem of sequencing building blocks as a Markov Decision Process (MDP). Each building block defines a transition probability *p*(**s**′|**s**, θ ) over future contexts and an immediate reward function *R*(**s**, θ ). It is executed until its termination condition *to*(**s**, θ ) is satisfied. However, in our experiments, we used a fixed duration for each building block. Note that traditional reinforcement learning methods, such as TD-learning, cannot deal with such MDPs, as their action space is high-dimensional and continuous.

We concentrate on the finite-horizon case, i.e., each episode consists of *K* decision steps where each step is defined as the execution of an individual building block. For clarity, we will only discuss the sequencing of a single building block, however, the selection of multiple building blocks at each decision step can be easily incorporated (Daniel et al., 2013).

In the finite-horizon formulation of REPS, we want to find the probabilities *pk*(**s**, θ ) = *pk*(**s**)π*k*(θ |**s**), *k* ≤ *K*, and *pK*+1(**s**) that maximize the expected long-term reward

$$J = \int\_{\mathbf{s}} p\_{K+1}(\mathbf{s}) R\_{K+1}(\mathbf{s}) d\mathbf{s} + \sum\_{k=1}^{K} \iint\_{\mathbf{s}, \boldsymbol{\theta}} p\_k(\mathbf{s}, \boldsymbol{\theta}) R\_k(\mathbf{s}, \boldsymbol{\theta}) d\mathbf{s} d\boldsymbol{\theta},$$

where *RK*+1(**s***K*+1) denotes the final reward for ending up in state **s***K*+1 after executing the last building block. As in the previous case, the initial context distribution is given by the task, i.e., ∀**s** : *p*1(**s**) = μ1(**s**). Furthermore, the context distributions at future decision steps *k* > 1 need to be consistent with the past distributions *pk*−1(**s**, θ ) and the transition model *p*(**s**′|**s**, θ ), i.e.,

$$\forall \mathbf{s}', k > 1: p\_k\left(\mathbf{s}'\right) = \iint\limits\_{\mathbf{s}, \mathbf{\theta}} p\_{k-1}(\mathbf{s}, \mathbf{\theta}) p\left(\mathbf{s}' | \mathbf{s}, \mathbf{\theta}\right) d\mathbf{s} d\mathbf{\theta},$$

for each decision step of the episode. These constraints connect the policies for the individual decision-steps and result in a policy π*k*(θ |**s**) that optimizes the long-term reward instead of the immediate ones. As in the previous sections, these constraints are again implemented by matching feature averages.

The closed form solution of the joint distribution *pk*(**s**, θ ) yields

$$\begin{aligned} p\_k(\mathbf{s}, \boldsymbol{\theta}) & \propto q\_k(\mathbf{s}, \boldsymbol{\theta}) \exp\left(\frac{A\_k(\mathbf{s}, \boldsymbol{\theta})}{\eta\_k}\right), \\ A\_k(\mathbf{s}, \boldsymbol{\theta}) &= R\_k(\mathbf{s}, \boldsymbol{\theta}) + \mathbb{E}\_{p(\mathbf{s}' | \mathbf{s}, \boldsymbol{\theta})} \left[V\_{k+1}\left(\mathbf{s'}\right)\right] - V\_k(\mathbf{s}). \end{aligned}$$

We can see that the reward *Rk*(**s**, θ ) is transformed into an advantage function *Ak*(**s**, θ ), where the advantage now also depends on the expected value E*p*(**s**′|**s**,θ)[*Vk*+1(**s**′)] of the next state. This term ensures that we do not just optimize the immediate reward but the long-term reward.
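The advantage-based weighting at decision step *k* can be sketched as follows; here the expectation over **s**′ is approximated by a single sampled next context per roll-out, the value functions are linear in context features, and all numbers are synthetic:

```python
import numpy as np

rng = np.random.default_rng(4)

# Per-step samples for decision step k: reward, current-context features,
# and (sampled) next-context features. V_k(s) = v_k^T phi(s).
N = 50
phi_k = rng.normal(size=(N, 3))
phi_next = rng.normal(size=(N, 3))
R_k = rng.normal(size=N)
v_k, v_next = rng.normal(size=3), rng.normal(size=3)
eta_k = 2.0                               # temperature from the dual (assumed)

# Advantage A_k(s, theta) = R_k + E[V_{k+1}(s')] - V_k(s), with the
# expectation replaced by the observed next context.
A = R_k + phi_next @ v_next - phi_k @ v_k
weights = np.exp((A - A.max()) / eta_k)
weights /= weights.sum()                  # normalized per-step sample weights
```

Samples whose building-block execution leads to valuable next contexts receive higher weight, even when their immediate reward is small.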

### *2.3.1. Experimental evaluation of sequencing of building blocks - sequential robot hockey*

We used the sequential robot hockey task to evaluate our sequential motor skill learning framework. The robot has to move the target puck into one of three target areas by sequentially shooting a control puck at the target puck. The target areas are defined by a specified distance to the robot, see **Figure 6** (left). The robot gets rewards of 1, 2, and 3 for reaching zone 1, 2 or 3, respectively. After each shot, the control puck is returned to the robot. The target puck, however, is only reset after every third shot.

The 2-dimensional position of the target puck defines the context **s** of the task, and the parameter vector θ defines the goal positions of the DMPs that define the desired trajectory of the robot's joints. After performing one shot, the agent observes the new context to plan the subsequent shot. In order to give the agent an incentive to shoot at the target puck, we punished the agent with the negative minimum distance of the control puck to the target puck after each shot. While this reward was given after every step, the zone reward was only given at the end of the episode (every third step) as *RK*+1(**s***K*+1).

We compared our sequential motor primitive learning method with its episodic variant on a realistic simulation. For the episodic variant we used one extended parameter vector θ̃ that contained the parameters of all three hockey shots. The comparison of both methods can be seen in **Figure 6** (middle). Due to the high-dimensional parameter space, the episodic learning setup failed to learn a proper policy, while our sequential motor primitive learning framework could learn policies of much higher quality.

On the real robot, we could reproduce the simulation results. The robot learned a strategy which could move the target puck to the highest reward zone in most of the cases after 300 episodes. The learning curve is shown in **Figure 6** (right).

# **3. PROBABILISTIC MOVEMENT PRIMITIVES**

In the second part of this paper, we investigate new representations for the individual building blocks of movements that are particularly suited to be used in a modular control architecture. In all experiments for our modular policy search framework, we so far used the Dynamic Movement Primitive (DMP) approach (Schaal et al., 2003). DMPs are widely used; however, when used for our modular control architecture, DMPs suffer from severe limitations as they do not support co-activation or blending of building blocks. In addition, DMPs use heuristics for the adaptation of the motion. Hence, we focus our discussion of our new movement primitive (MP) representation (Paraschos et al., 2013) on these two important properties.

We use trajectories τ = {*qt*}, *t* = 0 ... *T*, defined by the joint angles *qt* over time, to model a single movement. We will use a probabilistic representation of a movement, which we call probabilistic movement primitives (ProMP), where a movement primitive describes several ways to execute a movement (Paraschos et al., 2013). Hence, the movement primitive is given as a distribution *p*(τ ) over trajectories. A probabilistic representation offers several advantages that make it particularly suitable for a modular control architecture. Most importantly, it offers principled ways to adapt as well as to co-activate movement primitives. Yet, these advantages of a probabilistic trajectory representation are of little use if we cannot use it to control the robot. Therefore, we derive a stochastic feedback controller in closed form that can exactly reproduce a given trajectory distribution, and, hence, trajectory distributions can be used directly for robot control.

In this section, we present two experiments that we performed with the ProMP approach. As we focused on the representation of the individual building blocks, we evaluated the new representation without the use of reinforcement learning and learned the ProMPs by imitation. In our experiments, we illustrate how to use conditioning as well as co-activation of the building blocks.

## **3.1. PROBABILISTIC TRAJECTORY REPRESENTATION**

In the imitation learning setup, we assume that we are given several demonstrations in terms of trajectories τ*i*. In our probabilistic approach we want to learn a distribution of these trajectories. We will first explain the basic representation of a trajectory distribution and subsequently cover the two new operations that are now available in our probabilistic framework, i.e., conditioning and co-activation. Finally, we will explain in Section 3.3 how to control the robot with a stochastic feedback controller that exactly reproduces the given trajectory distribution.

We use a weight vector **w** to compactly represent a single trajectory τ. The probability of observing a trajectory τ given the weight vector **w** is given by a linear basis function model *p*(τ|**w**) = ∏*t* *N*(**y***t*|**Ψ***t<sup>T</sup>***w**, **Σ***y*), where **y***t* = [*qt*, *q*˙*t*]*<sup>T</sup>* contains the joint position *qt* and joint velocity *q*˙*t*, **Ψ***t* = [ψ*t*, ψ˙*t*] defines the time-dependent basis matrix, and **Σ***y* is the covariance of the zero-mean i.i.d. Gaussian observation noise.

We now abstract a distribution over trajectories as a distribution *p*(**w**; θ ) over the weight vector **w** that is parametrized by the parameter vector θ . The original trajectory distribution *p*(τ ; θ ) can now be computed by marginalizing over the weight vector **w**, i.e., *p*(τ ; θ ) = ∫ *p*(τ |**w**)*p*(**w**; θ )*d***w**. We will assume a Gaussian distribution *p*(**w**; θ ) = *N*(**w**|μ**w**, **Σ****w**) and, hence, *p*(τ ; θ ) can be computed analytically, i.e.,

$$p\left(\mathbf{y}\_t;\boldsymbol{\theta}\right) = \mathcal{N}\left(\mathbf{y}\_t|\boldsymbol{\Psi}\_t^T\boldsymbol{\mu}\_\mathbf{w}, \boldsymbol{\Psi}\_t^T\boldsymbol{\Sigma}\_\mathbf{w}\boldsymbol{\Psi}\_t + \boldsymbol{\Sigma}\_\mathbf{y}\right).$$
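This Gaussian marginal can be sketched for a 1-DoF, position-only primitive; the basis settings, mean weights, and noise variance below are assumed for illustration:

```python
import numpy as np

def gaussian_basis(z, centers, h):
    """Normalized Gaussian basis activations at phase z."""
    phi = np.exp(-(z - centers) ** 2 / (2 * h))
    return phi / phi.sum()

# Hypothetical 1-DoF ProMP parameters (velocities omitted for brevity).
K = 10
centers = np.linspace(0, 1, K)
mu_w = np.sin(2 * np.pi * centers)        # assumed mean weights
Sigma_w = 0.01 * np.eye(K)                # assumed weight covariance
sigma_y = 1e-4                            # observation noise variance

def promp_marginal(z, h=0.02):
    """Mean and variance of p(y_t; theta) at phase z."""
    psi = gaussian_basis(z, centers, h)
    mean = psi @ mu_w
    var = psi @ Sigma_w @ psi + sigma_y
    return mean, var

mean, var = promp_marginal(0.5)
```

Evaluating the marginal along the whole phase axis yields the mean trajectory with a confidence tube, which is how ProMP distributions are typically visualized.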

As a probabilistic MP represents multiple ways to execute an elemental movement, we also need multiple demonstrations to learn *p*(**w**; θ ). The parameters θ = {μ**w**, **Σ****w**} can be learned by maximum likelihood estimation, for example, by using the expectation maximization algorithm (Lazaric and Ghavamzadeh, 2010).

For multi-dimensional systems, we can also learn the coupling between the joints. Coupling is typically represented by the covariance of the joint positions and velocities. We can learn this covariance by maintaining a parameter vector $\mathbf{w}_i$ for each dimension $i$ and learning a distribution over the combined weight vector $\mathbf{w} = [\mathbf{w}_1^T, \ldots, \mathbf{w}_n^T]^T$.

To be able to adapt the execution speed of the movement, we introduce a phase variable $z$ to decouple the movement from the time signal (Schaal et al., 2003). The phase can be any function $z(t)$ that increases monotonically with time. The basis functions $\psi_t$ are now decoupled from time and depend on the phase, such that $\psi_t = \psi(z_t)$ and $\dot{\psi}_t = \psi'(z_t)\dot{z}_t$. The choice of the basis functions depends on whether we want to model rhythmic movements, where we use normalized von Mises basis functions that are periodic in the phase, or stroke-based movements, where we use normalized Gaussian basis functions,

$$\phi\_{i}^{\rm G}(z\_{t}) = \exp\left(-\frac{(z\_{t} - c\_{i})^{2}}{2h}\right),$$

$$\phi\_{i}^{\rm VM}(z\_{t}) = \exp\left(h\cos\left(2\pi\left(z\_{t} - c\_{i}\right)\right)\right). \tag{8}$$

The parameter $h$ defines the width of the basis and $c_i$ the center of the $i$th basis function. We normalize the basis functions $\phi_i$ with $\psi_i(z_t) = \phi_i(z_t) / \sum_j \phi_j(z_t)$.
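A minimal NumPy sketch of the two normalized basis families from Equation (8) (all parameter values are illustrative):

```python
import numpy as np

def normalized_bases(z, centers, h, kind="gaussian"):
    """Evaluate normalized basis functions at phase values z.

    kind="gaussian": stroke-based, phi_i(z) = exp(-(z - c_i)^2 / (2h))
    kind="vonmises": rhythmic,     phi_i(z) = exp(h * cos(2*pi*(z - c_i)))
    Each row is normalized so that the bases sum to one at every phase."""
    z = np.atleast_1d(z)[:, None]        # shape (T, 1)
    c = np.asarray(centers)[None, :]     # shape (1, n_bases)
    if kind == "gaussian":
        phi = np.exp(-(z - c) ** 2 / (2 * h))
    else:
        phi = np.exp(h * np.cos(2 * np.pi * (z - c)))
    return phi / phi.sum(axis=1, keepdims=True)
```

The von Mises bases are periodic in the phase (the same values recur at $z$ and $z+1$), which is what makes them suitable for rhythmic movements.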

### **3.2. NEW PROBABILISTIC OPERATORS FOR MOVEMENT PRIMITIVES**

The probabilistic formulation of MPs enables us to use new probabilistic operators on our movement primitive representation. Adaptation of the movement can be accomplished by conditioning on desired positions or velocities at time step $t$. Co-activation and blending of MPs can be implemented as a product of two trajectory distributions.

### *3.2.1. Adaptation of the building blocks by conditioning*

For efficient adaptation, our building blocks should support the modulation of hyper-parameters of the movement, such as the desired final joint positions or the joint positions at given via-points. For example, DMPs allow for the adaptation of the final position by modulating the point attractor of the system. However, how the final position modulates the trajectory is hard-coded in the DMP framework and cannot be learned from data. This adaptation mechanism might violate other task constraints.

In our probabilistic formulation, such adaptation operations can be described by conditioning the MP to reach a certain state $\mathbf{y}_t^*$ at time $t$. Conditioning can be performed by adding a new desired observation $\mathbf{x}_t^* = [\mathbf{y}_t^*, \boldsymbol{\Sigma}_\mathbf{y}^*]$ to our probabilistic model, where $\mathbf{y}_t^*$ represents the desired position and velocity vector at time $t$ and $\boldsymbol{\Sigma}_\mathbf{y}^*$ specifies the accuracy of the desired observation. By applying Bayes' theorem, we obtain a new distribution over $\mathbf{w}$, i.e., $p(\mathbf{w}|\mathbf{x}_t^*) \propto \mathcal{N}(\mathbf{y}_t^* \,|\, \boldsymbol{\Psi}_t^T\mathbf{w}, \boldsymbol{\Sigma}_\mathbf{y}^*)\,p(\mathbf{w})$. As $p(\mathbf{w}|\theta)$ is Gaussian, the conditional distribution $p(\mathbf{w}|\mathbf{y}_t^*)$ is also Gaussian and can be computed analytically:

$$\boldsymbol{\mu}\_{\mathbf{w}}^{\text{new}} = \boldsymbol{\mu}\_{\mathbf{w}} + \boldsymbol{\Sigma}\_{\mathbf{w}} \boldsymbol{\Psi}\_t \left(\boldsymbol{\Sigma}\_{\mathbf{y}}^\* + \boldsymbol{\Psi}\_t^T \boldsymbol{\Sigma}\_{\mathbf{w}} \boldsymbol{\Psi}\_t\right)^{-1} \left(\mathbf{y}\_t^\* - \boldsymbol{\Psi}\_t^T \boldsymbol{\mu}\_{\mathbf{w}}\right), \tag{9}$$

$$\boldsymbol{\Sigma}\_{\mathbf{w}}^{\text{new}} = \boldsymbol{\Sigma}\_{\mathbf{w}} - \boldsymbol{\Sigma}\_{\mathbf{w}} \boldsymbol{\Psi}\_{t} \left(\boldsymbol{\Sigma}\_{\mathbf{y}}^{\*} + \boldsymbol{\Psi}\_{t}^{T} \boldsymbol{\Sigma}\_{\mathbf{w}} \boldsymbol{\Psi}\_{t}\right)^{-1} \boldsymbol{\Psi}\_{t}^{T} \boldsymbol{\Sigma}\_{\mathbf{w}}. \tag{10}$$

We illustrate conditioning a ProMP on different target states in **Figure 7A**. As we can see, the modulation toward a target state is also learned from demonstration, i.e., the ProMP will choose a new trajectory distribution that goes through the target state and, at the same time, is similar to the learned trajectory distribution.
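Equations (9) and (10) are the standard Gaussian conditioning update; a direct NumPy transcription (function and variable names are our own) could look like:

```python
import numpy as np

def condition_promp(mu_w, Sigma_w, Psi_t, y_star, Sigma_y_star):
    """Condition a ProMP on observing y* at one time step (Eqs. 9-10).

    mu_w: (n,) weight mean, Sigma_w: (n, n) weight covariance,
    Psi_t: (n, d) basis matrix at time t, y_star: (d,) desired observation,
    Sigma_y_star: (d, d) accuracy of the desired observation."""
    S = Sigma_y_star + Psi_t.T @ Sigma_w @ Psi_t   # innovation covariance
    K = Sigma_w @ Psi_t @ np.linalg.inv(S)         # Kalman-style gain
    mu_new = mu_w + K @ (y_star - Psi_t.T @ mu_w)  # Eq. (9)
    Sigma_new = Sigma_w - K @ Psi_t.T @ Sigma_w    # Eq. (10)
    return mu_new, Sigma_new
```

With a tight `Sigma_y_star`, the conditioned mean passes through the target almost exactly, while the covariance shrinks only in the directions constrained by the observation.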

### *3.2.2. Combination and blending by multiplying distributions*

In our probabilistic representation, a single MP represents a whole family of movements. Co-activating two MPs should return a new set of movements that is contained in both MPs. Such an operation can be performed by multiplying the two distributions. We also want to weight the activation of each primitive $i$ by a time-varying activation factor $\alpha_i(t)$, for example, to continuously blend the movement execution from one primitive to the next. The activation factors can be implemented by taking the distributions of the individual primitives to the power of $\alpha_i(t)$. Hence, the co-activation of ProMPs yields $p^*(\tau) \propto \prod_t \prod_i p_i(\mathbf{y}_t)^{\alpha_i(t)}$.

For Gaussian distributions $p_i(\mathbf{y}_t) = \mathcal{N}(\mathbf{y}_t | \boldsymbol{\mu}_t^{[i]}, \boldsymbol{\Sigma}_t^{[i]})$, the resulting distribution $p^*(\mathbf{y}_t)$ is again Gaussian, and we can obtain its mean $\boldsymbol{\mu}_t^*$ and covariance $\boldsymbol{\Sigma}_t^*$ analytically as

$$\begin{aligned} \boldsymbol{\Sigma}\_t^\* &= \left(\sum\_i \left(\boldsymbol{\Sigma}\_t^{[i]} / \alpha\_i(t)\right)^{-1}\right)^{-1}, \\ \boldsymbol{\mu}\_t^\* &= \boldsymbol{\Sigma}\_t^\* \left(\sum\_i \left(\boldsymbol{\Sigma}\_t^{[i]} / \alpha\_i(t)\right)^{-1} \boldsymbol{\mu}\_t^{[i]}\right). \end{aligned} \tag{11}$$

Both terms are required to obtain the stochastic feedback controller that is finally used to control the robot. We illustrated co-activating two ProMPs in **Figure 7B** and blending of two ProMPs in **Figure 7C**.
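Equation (11) is the familiar product of Gaussians with activation-tempered covariances; a minimal NumPy sketch (names are our own) is:

```python
import numpy as np

def coactivate(means, covs, alphas):
    """Combine per-primitive Gaussians p_i(y_t) with activations alpha_i (Eq. 11).

    means: list of (d,) mean vectors, covs: list of (d, d) covariances,
    alphas: list of positive activation factors at the current time step."""
    # tempered precisions: (Sigma_i / alpha_i)^-1
    precisions = [np.linalg.inv(C / a) for C, a in zip(covs, alphas)]
    Sigma_star = np.linalg.inv(sum(precisions))
    mu_star = Sigma_star @ sum(P @ m for P, m in zip(precisions, means))
    return mu_star, Sigma_star
```

For two equally activated primitives with identical covariances, the combined mean is simply the average of the two means, and the combined covariance is halved, reflecting the agreement of both primitives.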

### **3.3. USING TRAJECTORY DISTRIBUTIONS FOR ROBOT CONTROL**

In order to use a trajectory distribution $p(\tau|\theta)$ for robot control, we have to obtain a controller that can exactly reproduce the given distribution. As we show in Paraschos et al. (2013), such a controller can be obtained in closed form if we know the system dynamics $\dot{\mathbf{y}} = f(\mathbf{y}, \mathbf{u}) + \boldsymbol{\epsilon}_\mathbf{y}$ of the robot<sup>1</sup>. We model the controller as a time-varying stochastic linear feedback controller, i.e., $\mathbf{u}_t = \mathbf{k}_t + \mathbf{K}_t\mathbf{y}_t + \boldsymbol{\epsilon}_u$, where $\mathbf{k}_t$ denotes the feed-forward gains, $\mathbf{K}_t$ the feedback gains, and $\boldsymbol{\epsilon}_u \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}_u)$ the controller noise. Hence, the controller is determined by $\mathbf{k}_t$, $\mathbf{K}_t$ and $\boldsymbol{\Sigma}_u$ for each time point. All these terms can be obtained analytically by predicting the distribution $p_{\text{model}}(\mathbf{y}_{t+dt})$ from $p(\mathbf{y}_t|\theta)$ with the known model of the system dynamics and subsequently matching the moments of $p(\mathbf{y}_{t+dt}|\theta)$ and the moments of the predicted distribution $p_{\text{model}}(\mathbf{y}_{t+dt})$. The resulting controller exactly reproduces the given trajectory distribution $p(\tau|\theta)$ (Paraschos et al., 2013).

While the ProMP approach has many similarities to the approach introduced in Rozo et al. (2013) by Calinon and colleagues, there are also important differences. They also learn a trajectory distribution, which is modeled with a GMM whose output variables are the joint angles and the time step $t$. The probability of the joint angles at time step $t$ is then obtained by conditioning on $t$. However, it is unclear how to condition on being at a certain state $\mathbf{q}_t^*$ at time step $t$, which is very different from just conditioning on being at time step $t$. In this case, the mixture components need to be changed such

<sup>1</sup>Alternatively, we can assume that we use inverse dynamics control on the robot, and, hence, the idealized dynamics of the robot are given by a linear system. Such an approach is, for example, followed by the DMPs, which also assume that the underlying dynamical system representing the robot is linear.

**FIGURE 7 | (A)** Conditioning on different target states. The blue shaded area represents the learned trajectory distribution. We condition on different target positions, indicated by the "x"-markers. The produced trajectories exactly reach the desired targets while keeping the shape of the demonstrations. **(B)** Combination of two ProMPs. The trajectory distributions are indicated by the blue and red shaded areas. Both primitives have to reach via-points at different points in time, indicated by the "x"-markers. We co-activate both primitives with the same activation factor. The trajectory distribution generated by the resulting feedback controller now goes through all four via-points. **(C)** Blending of two ProMPs. We smoothly blend from the red primitive to the blue primitive. The activation factors are shown at the bottom. The resulting movement (green) first follows the red primitive and, subsequently, switches to following the blue primitive.

that the trajectory distribution passes through $\mathbf{q}_t^*$ at time step $t$. How to implement this change with a GMM is an open problem. Note that the ProMP approach is very different from a GMM. It uses a linear basis function model and learns the correlation of the parameters of the basis functions across the different movements. Time is not modeled as a random variable but as a conditioning variable from the start. Due to the learned correlations, we can condition on reaching $\mathbf{q}_t^*$ at time step $t$, and the trajectory distribution smoothly passes through $\mathbf{q}_t^*$ with high accuracy.

Furthermore, a trajectory distribution alone is not sufficient to control a robot, as a feedback controller is required to determine the control actions. In Rozo et al. (2013), this feedback controller is obtained from the trajectory distribution by heuristics. That is, when the feedback controller is applied on the real robot, it will not reproduce the learned trajectory distribution. The produced trajectory distribution might be similar, but we do not know how similar. Therefore, for all operations performed on the trajectory distributions (e.g., a combination of distributions by a product), it is hard to quantify the effect of the operation on the resulting motions obtained from the heuristic feedback controller. In contrast, the ProMPs come with a feedback controller that exactly matches the trajectory distribution. Hence, for a combination of distributions, we know that the feedback controller will exactly follow the product of the two distributions.

# *3.3.1. Experimental evaluation of the combination of objectives at different time-points*

In this task, a seven-link planar robot has to reach different target positions in end-effector space at the final time point $t_T$ and at a via-point time $t_v$. We generated the demonstrations for learning the MPs with an optimal control law (Toussaint, 2009), adding noise to the control outputs. In the first set of demonstrations, the robot reached a via-point at $t_1 = 0.25$ s with its end-effector. We used 10 normalized Gaussian basis functions per joint, resulting in a 70-dimensional weight vector. As we learned a single distribution over all joints of the robot, we can also model the correlations between the joints. These correlations are required to learn to reach a desired via-point in task space. The reproduced behavior with the ProMPs is illustrated in **Figure 8** (top). The ProMP exactly reproduced the via-points in task space. Moreover, the ProMP exhibited the same variability in between the via-points. It also reproduced the coupling of the joints from the optimal control law, which can be seen from the small variance of the end-effector in comparison to the rather large variance of the individual joints at the via-points. We also used a second set of demonstrations where the via-point was located at $t_2 = 0.75$ s, as illustrated in **Figure 8** (middle). We co-activated the ProMPs learned from both demonstration sets. The robot could accurately reach both via-points at $t_1 = 0.25$ s and $t_2 = 0.75$ s, see **Figure 8** (bottom).

# *3.3.2. Experimental evaluation of the combination of simultaneous objectives - robot hockey*

In this task, the robot again has to shoot a hockey puck in different directions and over different distances. The task setup can be seen in **Figure 9A**. We recorded two different sets of demonstrations: one containing straight shots of varying distance, the other containing shots of varying shooting angle and almost constant distance. Each set contained ten demonstrations.

**FIGURE 8 |** Via-points at different time steps (black) and samples generated by the ProMP (gray). The resulting movement reached both via-points with high accuracy.

**FIGURE 9 | Robot Hockey.** The robot shoots a hockey puck. The setup is shown in **(A)**. We demonstrate ten straight shots for varying distances and ten shots for varying angles. The pictures show samples from the ProMP model for straight shots **(B)** and angled shots **(C)**. Learning from combined data set yields a model that represents

variance in both, distance and angle **(D)**. Multiplying the individual models leads to a model that only reproduces shots where both models had probability mass, in the center at medium distance **(E)**. The last picture shows the effect of conditioning on only left or right angles, the robot does not shoot in the center any more **(F)**.

Sampling from the two models generated from the different data sets yields shots that exhibit the demonstrated variance in either angle or distance, as shown in **Figures 9B,C**. When combining the data sets of both primitives and learning a new primitive, we obtain a movement that exhibits variance in both dimensions, i.e., angle and distance, see **Figure 9D**. When the two individual primitives are combined by a product of MPs, the resulting model shoots only in the center at medium distance, i.e., at the intersection of both MPs, see **Figure 9E**.

In this section, we presented two experiments performed with the ProMP approach. As we focused on the representation of the individual building blocks, we evaluated the new representation without the use of reinforcement learning and learned the ProMPs by imitation. The experiments illustrate how to use conditioning as well as co-activation of the building blocks.

# **4. CONCLUSION AND FUTURE WORK**

Using structured, modular control architectures is a promising concept for scaling robot learning to more complex real-world tasks. In such a modular control architecture, elemental building blocks, such as movement primitives, need to be adapted, sequenced or co-activated simultaneously. In this paper, we presented a unified data-efficient policy search framework that exploits such control architectures for robot learning. Our policy search framework can learn to select, adapt and sequence parametrized building blocks such as movement primitives while coping with the main challenges of robot learning, i.e., high-dimensional, continuous state and action spaces and the high cost of generating data. Moreover, we presented a new probabilistic representation of the individual building blocks that shows several beneficial properties. Most importantly, it supports efficient and principled ways of adapting a building block to the current situation, and it allows several of these building blocks to be co-activated.

Future work will concentrate on integrating the new ProMP approach into our policy search framework. Interestingly, the upper-level policy would in this case directly specify the trajectory distribution, and the lower-level control policy is automatically given by this trajectory distribution. We will also explore incorporating the co-activation of individual building blocks into our policy search framework. Additional future work will concentrate on incorporating perceptual feedback into the building blocks and on using more complex hierarchies in policy search.

# **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 22 November 2013; accepted: 21 May 2014; published online: 11 June 2014. Citation: Neumann G, Daniel C, Paraschos A, Kupcsik A and Peters J (2014) Learning modular policies for robotics. Front. Comput. Neurosci. 8:62. doi: 10.3389/fncom. 2014.00062*

*This article was submitted to the journal Frontiers in Computational Neuroscience. Copyright © 2014 Neumann, Daniel, Paraschos, Kupcsik and Peters. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# MACOP modular architecture with control primitives

# *Tim Waegeman\*, Michiel Hermans and Benjamin Schrauwen*

*Department of Electronics and Information Systems, Ghent University, Ghent, Belgium*

### *Edited by:*

*Andrea D'Avella, IRCCS Fondazione Santa Lucia, Italy*

### *Reviewed by:*

*Florentin Wörgötter, University Goettingen, Germany Andrea D'Avella, IRCCS Fondazione Santa Lucia, Italy*

### *\*Correspondence:*

*Tim Waegeman, Department of Electronics and Information Systems, Ghent University, Sint-Pietersnieuwstraat 41, Ghent, B9000 East-Flanders, Belgium e-mail: tim.waegeman@ugent.be*

Walking, catching a ball and reaching are all tasks in which humans and animals exhibit advanced motor skills. Findings in biological research on motor control suggest a modular control hierarchy which combines movement/motor primitives into complex and natural movements. Engineers draw inspiration from these findings in their quest for adaptive and skillful robot control. In this work we propose a modular architecture with control primitives (MACOP) which uses a set of controllers, each of which becomes specialized in a subregion of its joint- and task-space. Instead of having a single controller be used in such a subregion [as in MOSAIC (modular selection and identification for control), by which MACOP is inspired], MACOP relates more to the idea of continuously mixing a limited set of primitive controllers. By enforcing a set of desired properties on the mixing mechanism, a mixture of primitives emerges unsupervised that successfully solves the control task. We evaluate MACOP on a numerical model of a robot arm by training it to generate desired trajectories. We investigate how the tracking performance is affected by the number of controllers in MACOP and examine how the individual controllers and their generated control primitives contribute to solving the task. Furthermore, we show how MACOP compensates for the dynamic effects caused by a fixed control rate and the inertia of the robot.

**Keywords: reservoir computing, echo state networks, motor primitives, movement primitives, motor control, MOSAIC, robot control**

# **1. INTRODUCTION**

Catching a ball, reaching for a cup of coffee and drawing a figure on a blackboard are all tasks in which humans exhibit advanced motor control. We are able to perform such tasks robustly and adaptively, constantly anticipating an uncertain environment. Robots, by contrast, are most commonly used in fully deterministic environments and are programmed in such a way that all possible situations are foreseen by the engineer. However, inspired by humans and by biology in general, more and more techniques are emerging that allow robots to be used in dynamic environments without explicitly defining a set of rules to achieve advanced and adaptive motor control. The study of motor skills in nature has also sparked interest in modular representations of both planned and actual motor commands. For instance, research performed on frogs (Bizzi et al., 1991) showed that electrical microstimulation of different areas of the lumbar cord generated distinct types of force fields in the frog's isometric leg movement. Similar research on frogs (Mussa-Ivaldi et al., 1994; Kargo and Giszter, 2000) and rats (Tresch and Bizzi, 1999) has shown that simultaneous stimulation of such areas results in a superposition of the separately recorded force fields, suggesting a modular control system. In Mussa-Ivaldi and Bizzi (2000), this work was extended to the planning of limb movements and how to transform this planning into a sufficient set of motor commands.

Instead of using invasive stimulation techniques to investigate the existence of a modular control system, researchers (d'Avella et al., 2003) also developed methods to determine whether a large set of natural movements results from combining a limited set of *motor primitives*, based solely on observations of muscle activity. By measuring such activations with electromyography (EMG) and applying a decomposition technique over multiple EMG recordings, they found primitive representations, called synergies. These experiments were first conducted on frogs and later on humans (Hart and Giszter, 2004; Cheung et al., 2005; d'Avella and Bizzi, 2005).

At the behavioral level, too, it has been demonstrated that humans follow mental templates of motion when executing a task (Bernstein, 1967). The presence of these mental templates or *movement primitives* can also be detected as velocity bumps (Doeringer and Hogan, 1998) during online movement corrections. A more detailed overview of primitives at the neural, dynamic and kinematic levels can be found in Flash and Hochner (2005).

The idea of movement primitives has also inspired research in robotics (Schaal et al., 2003). Muelling et al. (2010) demonstrate a robot that learned to play table tennis based on a set of primitives learned by imitating human table tennis movements. In Schaal et al. (2005), a flexible and reactive framework for motor control was presented which uses dynamic movement primitives (DMPs) (Schaal, 2006). This framework proved useful for generating the walking motion of a biped based on oscillating DMPs, and for generating the swimming and walking motions of a salamander robot (Ijspeert et al., 2007) with DMPs used as central pattern generators.

Most of these approaches define the primitives as oscillators or learned movements and learn how to adapt them to achieve the desired objective. In this work, however, we take a different approach, in which a similar decomposition emerges naturally such that the combination of the modules converges to the objective. For this, we based our work on MOSAIC (modular selection and identification for control), originally proposed by Wolpert and Kawato (Wolpert and Kawato, 1998; Haruno et al., 2001), which suggests a feasible strategy for how the human motor control system learns and adapts to novel environments, for instance when moving an empty cup or one filled with coffee. As both objects have different dynamics, MOSAIC learns a different controller for each object and assigns a "responsibility" function, which allows for smooth switching between the controllers' individual contributions. When a new object is introduced, MOSAIC generalizes by combining the contributions of each controller. To determine which controller should be used, each controller contains a forward model that predicts the next state of the object based on the previous control commands. If a controller's forward model predicts well compared to the others, that controller is used. This architecture, however, cannot be related to the idea of movement or motor primitives, as the number of controllers depends roughly on the number of objects, such that when handling a known object, a single controller's output is used.

Many variants of MOSAIC have been studied. The original implementation uses a gradient-based method, and later ones hidden Markov models. In Lonini et al. (2009), an alternative architecture based on locally weighted projection regression (LWPR) (Vijayakumar and Schaal, 2000) was presented, which allows better incremental learning of new tasks. Another approach (Oyama et al., 2001) uses a separate performance prediction network to determine which module should be used to learn the inverse kinematics of an arm. Likewise, Nguyen-Tuong et al. (2009) propose a localized version of Gaussian process regression in which a different model is trained for different regions of the task-space.

The *modular architecture with control primitives* (MACOP) which we propose in this work is also inspired by MOSAIC. However, we build upon the idea of using a limited set of controllers whose contributions are continuously combined to produce the desired objective. Each controller's contribution is mixed in a manner that permits all controllers to contribute to the objective, while still allowing each controller to specialize in a part of the task. Due to the similarity with motion/motor primitives, we will call the contributions of the individual controllers *control primitives*. We provide a more detailed explanation of the similarities and differences with the common notion of primitives in the Discussion section of this paper.

Based on some interesting observations, we omit the use of forward models (unlike MOSAIC) and use a simple heuristic to achieve the desired mixing mechanism that determines the "responsibility". The controllers are constructed from echo state networks (ESNs) (Jaeger, 2001), which are inherently dynamic and therefore provide a natural platform for modeling and controlling a dynamic system such as a robot arm. After describing MACOP, we validate it by letting it learn the inverse kinematics (IK) of a 6 degrees-of-freedom (DOF) robot arm, and we analyze its behavior. In the discussion we address the following key points and differences with other techniques:


# **2. MATERIALS AND METHODS**

# **2.1. DIFFICULTY OF CONTROLLING AN UNKNOWN SYSTEM**

When controlling a system, we want to find a way to control the states of that system by changing its input, without actually knowing how the system works internally. All we can do is observe how the system responds to a certain input. Consider, for instance, a scenario in which a student tries to manipulate a cup of coffee with a robot arm. The actions that are under direct control of the student are the joint-torques, and this at a fixed control rate. The variables that the student wishes to control, however, represent the trajectory of the cup of coffee. The student needs to use trial and error to learn to produce the desired result. In this process, he or she will need to implicitly learn the connection between his or her actions and the resulting trajectory of the cup. The main difficulties concerning this task can be summarized as:


Similar to this scenario, the system we use is a multi-jointed robot arm that needs to trace out a desired trajectory in its task-space.

# **2.2. ROBOT ARM PLATFORM**

In this work we perform all our experiments on a dynamic Webots simulation model of the PUMA 500 robot arm, which has 6 DOF. This numerical model allows us to apply joint-angles and measure their actual values in the dynamic environment of the robot. We interact with this simulation model every 32 ms (the default Webots configuration), which means that between every sample we take, 32 ms pass in the simulation environment, regardless of the computation time needed for the proposed algorithm. We control the joint-angles of the robot arm, which are converted to joint-torques by PID (proportional-integral-derivative) controllers. Each joint is equipped with an encoder which allows us to measure the actual joint-angles. Additionally, the simulation environment provides us with the Euclidean end-effector position of the robot arm.

# **2.3. MACOP**

As mentioned before, controlling a system such as a robot arm poses some difficulties when the internal mechanisms of the robot are unknown. Although classical kinematic models are known for most commercially available robots, the increasing use of soft materials with passive compliant properties requires an adaptive modeling approach. Often a learning algorithm is used to create a model of the robot such that the model exhibits the same behavior (outcome) when perturbed by the same inputs (actions); this is called a forward model. By using, for instance, a neural network as the model, the known structure of this network can be exploited to calculate a gradient, which can be used to determine the actions that are needed to change the outcome as desired. Another approach is to learn an inverse model, which maps a desired outcome to an action. However, learning such a model is difficult when the correct actions for a certain outcome are unknown. A more detailed overview of training and using such models can be found in Jordan and Rumelhart (1992).

Often the modeling complexity of a control problem can be reduced by decomposing the problem into less complex parts. Consider again the cup-lifting scenario, for which we want to learn a model; this model will approximate the underlying dynamics of that particular task. If we extend the problem to lifting other objects, the resulting model needs to approximate a single function that covers several tasks, each with different dynamics.

One approach that can address such problems is MOSAIC (Wolpert and Kawato, 1998; Haruno et al., 2001), which suggests a feasible strategy for how the human motor control system learns and adapts to new dynamic characteristics of the environment. MOSAIC learns a different controller for each task and uses a "responsibility" function to decide which controller will be used, while still allowing for smooth switching between the controllers' individual actions. When a new task is introduced, MOSAIC generalizes by combining the actions of each controller. To determine each controller's "responsibility", every controller contains a forward model that predicts the next state of the object based on the previous control actions. If a controller's forward model is more accurate than the others, that controller's inverse model is trained further with the new observations and used to control the robot arm.
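The responsibility computation described above can be sketched as a softmax over forward-model prediction likelihoods. This is a simplified illustration with an assumed Gaussian noise scale, not the exact formulation of Haruno et al. (2001):

```python
import numpy as np

def mosaic_responsibilities(x_next, predictions, sigma=0.1):
    """Soft responsibility signal from forward-model prediction errors.

    x_next: (d,) observed next state.
    predictions: list of (d,) next-state predictions, one per module.
    sigma: assumed observation-noise scale (an illustrative value).
    Modules whose forward model predicts well get a larger share."""
    errors = np.array([np.sum((x_next - p) ** 2) for p in predictions])
    lik = np.exp(-errors / (2 * sigma ** 2))   # Gaussian likelihood per module
    return lik / lik.sum()                     # normalize to sum to one
```

The returned weights can then gate both which inverse model drives the robot and how strongly each module's models are trained on the new observation.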

One potential weakness of this approach is that the performance of a forward model is not necessarily a good indicator of the modeling performance of the inverse model. To confirm this we have tried an approach directly based on MOSAIC. For each controller, we trained a reservoir (see section 2.4.2) to serve as both an inverse and forward model at the same time. We found that all forward models had initially roughly the same prediction error, leading to an equal responsibility factor for each controller as a result. During the training phase, however, small variations in performance error influenced both the training of the forward and the inverse models. Eventually this always causes one controller to be fully responsible at all times, making the other controllers redundant. These findings confirm the observations made in Haruno et al. (2001) even though in the original MOSAIC setup, the inverse and forward models are completely separate from each other, meaning that there is no relation at all between the modeling performance of the inverse and forward model. Based on these experiments one can argue that the responsibility of a controller is fully determined by noise on the forward modeling performance of a controller. Any other controller selection mechanism might thus be as useful as the one used by MOSAIC.

Therefore we propose a Modular Architecture for Control with Primitives (MACOP), which is inspired by MOSAIC and depicted in **Figure 1**. Instead of using both an inverse and a forward model, we only use inverse models to produce actions for our robot arm. When a controller (inverse model) should contribute to the task is learned unsupervised, given the robot's state and some desired mixing properties. This mechanism can be related to a Kohonen map (Kohonen, 1998).

# *2.3.1. Controller selection*

As depicted in **Figure 1**, the actual control signal is a weighted sum of the outputs of a limited number of controllers. The weight (or scaling) factors, which are the equivalent of MOSAIC's "responsibility," depend on observable properties of the robot. Each controller learns to control the robot arm by creating an inverse robot model. Simultaneously, the mixing mechanism is trained, also online. In order for a controller to distinguish itself from the others, the rate at which each controller is trained is modulated according to its corresponding responsibility. This will be explained in more detail when we describe the operation of a single controller.

Suppose we have *N*<sub>c</sub> controllers. We denote the output of the *i*-th controller as **x**<sub>i</sub>(*t*). The controlled joint-angles **x**(*t*) are then given by:

$$\mathbf{x}(t) = \sum_{i=1}^{N_c} \zeta_i(t)\,\mathbf{x}_i(t),\tag{1}$$

where ζ<sub>i</sub>(*t*) is the scaling factor, or "responsibility," of a controller, which determines how much each controller is expressed in the final control signal. Ideally, we would like ζ<sub>i</sub>(*t*) to express the momentary accuracy of each controller. For example, if each controller is randomly initialized before training, certain controllers may be better than others when the arm is near a certain pose, and we would like to use the ζ<sub>i</sub> to scale up the control signals of these controllers and suppress those of the others. In reality, however, we cannot directly measure the accuracy of each individual controller, as the robot is driven with the weighted sum of the control signals, and not with the individual control signals.

Therefore, we apply a different strategy. We introduce a way in which the scaling factors automatically start to represent local parts of the operating regime of the robot, and then specialize the associated controllers to be more accurate within these local areas.

We wish ζ<sub>i</sub>(*t*) to depend only on the current end-effector position **y**(*t*) and the measured joint-angles **x**(*t*), both of which are observable properties of the robot arm. As each controller attempts to learn an inverse model, the combined control signal needs to be of the same magnitude as the individual control signals. Therefore, we make sure that the scaling factors are always positive, and always sum to one:

$$\sum_{i=1}^{N_c} \zeta_i(t) = 1 \quad \text{and} \quad 0 \le \zeta_i(t) \le 1. \tag{2}$$

Both these properties can be ensured if ζ<sub>i</sub> is calculated by a *softmax* function. First we use a linear projection from the joint-angles and end-effector position to a vector **r**:

$$\mathbf{r}(t) = \begin{bmatrix} r_1(t) \\ r_2(t) \\ \vdots \\ r_{N_c}(t) \end{bmatrix} = \mathbf{V}(t) \begin{bmatrix} \mathbf{y}(t) \\ \widehat{\mathbf{x}}(t) \end{bmatrix},\tag{3}$$

Next, we apply the softmax function:

$$\zeta_i(t) = \frac{\exp(r_i(t))}{\sum_{j=1}^{N_c} \exp(r_j(t))}.\tag{4}$$

The projection matrix **V**(*t*) is a matrix of size *N*<sub>c</sub> by *N*<sub>**y**</sub> + *N*<sub>**x**</sub>, where *N*<sub>**y**</sub> and *N*<sub>**x**</sub> are the dimensions of **y**(*t*) and **x**(*t*), respectively. The dependence on time comes from the fact that, like all parameters, **V** is trained online.
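As a concrete illustration, the mixing mechanism of Equations (3) and (4) can be sketched in a few lines of NumPy. The dimensions used here (3 controllers, a 3-D end-effector position, 6 joint-angles) are illustrative, not taken from the paper's experimental setup:

```python
import numpy as np

def responsibilities(V, y, x_hat):
    """Compute the scaling factors zeta_i (Equations 3-4): a linear projection
    of the end-effector position y and measured joint-angles x_hat through V,
    followed by a softmax so that the factors are positive and sum to one."""
    r = V @ np.concatenate([y, x_hat])   # Equation (3)
    r = r - r.max()                      # shift for numerical stability
    e = np.exp(r)
    return e / e.sum()                   # Equation (4)

# Illustrative dimensions: 3 controllers, 3-D end-effector, 6 joints.
rng = np.random.default_rng(0)
V = rng.normal(0.0, 0.1, size=(3, 3 + 6))   # initialization as in the text
zeta = responsibilities(V, rng.normal(size=3), rng.normal(size=6))
```

Note that when all elements of **V** are near zero the factors are near uniform, which matches the limit argument made below for the magnitude of **V**.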

**V**(*t*) is randomly initialized, with elements drawn from a normal distribution *N*(0, 0.1). It determines how the responsibilities are distributed. We need to train **V**(*t*) in such a way that MACOP learns to generate the target trajectories by mixing all contributions as we desire. In this work we wish to obtain the following two qualitative properties: first, the distribution of the scaling factors should gradually evolve from a uniform mixture toward a peaked one, so that controllers can specialize; second, every controller should keep contributing significantly to the robot motion, so that none of them becomes redundant.

Based on these two desired properties, we construct a learning algorithm for training **V**(*t*). The first property of our mixing mechanism can be achieved by gradually increasing the magnitude of **V**, which results in a more strongly peaked distribution of the scaling factors. This can be understood by looking at the limit situations. If all elements of **V**(*t*) are zero, all scaling factors are equal. If the magnitude goes to infinity, the softmax function equals one for the highest element and zero for all others. Controlling the magnitude of **V**(*t*) allows us to make a smooth transition between these extremes. We chose to increase the magnitude of **V**(*t*) linearly at each time step by adding a small increment, equal to **V**(*t*) divided by its Frobenius norm.

The second mixing property requires that all controllers contribute significantly to the robot motion. We chose to achieve this by suppressing the momentary maximal scaling factor, which ensures that no single scaling factor can remain dominant for a long time. Suppressing one scaling factor automatically scales up the others, such that in the end none of the scaling factors remains very small at all times. In order to train **V**(*t*) to obtain this effect, we need to set target values for the ζ<sub>i</sub>(*t*) at each time step. We set the target value of the highest ζ<sub>i</sub>(*t*) equal to *N*<sub>c</sub><sup>−1</sup> (which would be the long-term time average of all scaling factors if they all contributed equally). At the same time we have to make sure that the sum of the target values equals one (i.e., a target that can be reached by a softmax function). To obtain this, the target values of the other ζ<sub>i</sub>(*t*) are set equal to their current values, scaled up such that the sum of the targets equals one. If we denote the target value for ζ<sub>i</sub>(*t*) as θ<sub>i</sub>(*t*), we can write

$$\theta_i(t) = \begin{cases} h(t)\,\zeta_i(t), & \text{if } i \neq \operatorname{argmax}_j(\zeta_j(t)) \\ \frac{1}{N_c}, & \text{if } i = \operatorname{argmax}_j(\zeta_j(t)) \end{cases}\tag{5}$$

with

$$h(t) = \frac{1 - N_c^{-1}}{1 - \max_i(\zeta_i(t))}.\tag{6}$$

To train **V** according to these target values, we calculate the gradient of the cross-entropy<sup>1</sup> *H*(θ<sub>i</sub>, ζ<sub>i</sub>) with respect to **V**. For both desired properties we have defined an update rule, and at each time step we add up both contributions, resulting in the following combined update rule:

$$\mathbf{V}(t+1) = \mathbf{V}(t) + \eta_g \frac{\mathbf{V}(t)}{||\mathbf{V}(t)||_F} - \eta_s \left(\boldsymbol{\zeta}(t) - \boldsymbol{\theta}(t)\right)[\mathbf{y}^{\mathsf{T}}(t), \widehat{\mathbf{x}}^{\mathsf{T}}(t)], \tag{7}$$

where **ζ**(*t*) and **θ**(*t*) are column vectors containing the responsibility factors and their targets, respectively, and η<sub>g</sub> and η<sub>s</sub> are two learning rates. In order to prevent one mixing property from dominating the other, we set these learning rates such that both properties remain present. Left on its own, Equation (7) never converges: the magnitude of **V** slowly keeps increasing, and in the long term the scaling factor distribution becomes highly peaked (at each moment, one factor is close to one and the others close to zero). Therefore, during all our experiments, unless stated otherwise, we calculate the root-mean-square error (RMSE) between the desired and the measured end-effector position<sup>2</sup> over a moving time-window of 1000 samples. When this RMSE becomes smaller than 1 mm we start to linearly decrease both η<sub>g</sub> and η<sub>s</sub> over the course of 5000 samples until they reach 0. After this point the elements of **V** no longer change.
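A minimal NumPy sketch of one update of **V** (Equations 5–7), assuming gradient descent on the cross-entropy between the scaling factors and their targets; the learning rates and dimensions below are placeholders, not tuned values from the paper:

```python
import numpy as np

def update_V(V, y, x_hat, eta_g, eta_s):
    """One online update of the projection matrix V.
    Peaking term: grow V along itself, normalized by its Frobenius norm.
    Sharing term: descend the cross-entropy gradient toward the targets theta."""
    inp = np.concatenate([y, x_hat])
    r = V @ inp
    zeta = np.exp(r - r.max())
    zeta /= zeta.sum()                        # softmax responsibilities
    Nc = len(zeta)
    k = int(np.argmax(zeta))
    h = (1.0 - 1.0 / Nc) / (1.0 - zeta[k])    # Equation (6)
    theta = h * zeta                          # Equation (5): rescale the others...
    theta[k] = 1.0 / Nc                       # ...and pull the dominant one to the mean
    grad = np.outer(zeta - theta, inp)        # cross-entropy gradient w.r.t. V
    # Equation (7); np.linalg.norm on a matrix defaults to the Frobenius norm.
    return V + eta_g * V / np.linalg.norm(V) - eta_s * grad

rng = np.random.default_rng(0)
V = rng.normal(0.0, 0.1, size=(4, 9))         # 4 controllers, 3-D y, 6 joints
V_new = update_V(V, rng.normal(size=3), rng.normal(size=6), 1e-4, 1e-3)
```

With η<sub>s</sub> set to zero, only the magnitude-growth term remains and the norm of **V** increases every step, as the text describes.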

### **2.4. SINGLE CONTROLLER**

As described before, we will use inverse models for controlling our multi-jointed robot arm. Rather than finding the mapping from joint-angles (actions) to end-effector position (outcome), we will approximate the inverse mapping: from outcome to action. Essentially, this means that we train a model to directly provide us with the correct joint-angles for any given desired end-effector position.

### *2.4.1. General setup*

As described before, the end-effector position (outcome) is denoted by **y***(t)*, and the joint-angles (the actions) by **x***(t)*. We assume that we can train a model (model A in **Figure 2**) to approximate the past joint-angles **x***(t* − δ*)*, δ being a fixed delay

**FIGURE 2 | Schematic representation of a single controller.** Models A and B are identical at every moment in time, but receive different input signals. The optional limiter limits the values **x**(*t*) to a desired range which, for example, represents imposed motor characteristics. Afterwards, the limited values **x̃**(*t*) excite the plant (the robot in this work). The signal **x̃**(*t* − δ) is the desired output which model A is trained to generate from the plant output, i.e., it learns the inverse model. This inverse model is then simultaneously employed as a controller to drive the plant (model B), which receives a desired future plant state **y**<sub>d</sub>(*t* + δ) as input, instead of the actual one.

period, given that it receives the current and the delayed end-effector positions **y**(*t*) and **y**(*t* − δ), respectively. This part of the control mechanism is the inverse model.

Simultaneously with training the inverse model, we use it as a controller. In order to do this, we use an identical copy of the model (model B in **Figure 2**), which has as input the current end-effector position **y**(*t*) and a desired future end-effector position **y**<sub>d</sub>(*t* + δ). This model, given that the inverse model performs sufficiently well, provides the required joint-angles **x**(*t*) to reach the set target after the delay. For some plants it may be necessary to limit these values to a certain range. For instance, when controlling a joint, the angle in which it can be positioned is bounded. In **Figure 2** this bounding is represented by a limiter which converts the values **x**(*t*) to **x̃**(*t*).

In general, the optimal δ depends on the rate at which the dynamics are observed (sample rate) and the kind of dynamics (fast or slow) that are inherent to the control task. Plants (the system under control) with fast dynamics usually require a smaller δ than slower dynamical systems when using the same sample rate. In this work we chose δ = 1, because one time step delay is sufficient to capture the dynamics of the task at hand. More details concerning this parameter can be found in Waegeman et al. (2012).
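The way training examples for model A are assembled from logged data can be sketched as follows. This is a toy illustration; the function and variable names are ours, not taken from the original implementation:

```python
def inverse_model_pairs(y_log, x_tilde_log, delta=1):
    """Build (input, target) pairs for the inverse model (model A):
    the input is the concatenation of y(t) and y(t - delta), and the target
    is the limited joint-angle command x_tilde(t - delta) that caused the
    transition between them. At run time, model B instead receives the
    desired future position y_d(t + delta) together with the current y(t)."""
    pairs = []
    for t in range(delta, len(y_log)):
        inp = tuple(y_log[t]) + tuple(y_log[t - delta])
        target = tuple(x_tilde_log[t - delta])
        pairs.append((inp, target))
    return pairs
```

For a one-dimensional toy log `y = [(0,), (1,), (2,)]` with commands `x̃ = [(10,), (11,), (12,)]` and δ = 1, the first pair associates the transition from position 0 to 1 with the command 10 that produced it.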

In order to train the inverse model we use online learning, i.e., the inverse model is trained during operation. Initially the untrained inverse model will not be able to provide the desired joint-angles, and the driving signal **x**(*t*) is essentially random. Model A uses the resulting end-effector positions to learn what signal was provided by model B. The random driving signal ensures that in this initial training phase model A is provided with a sufficiently broad set of examples. However, to further improve

<sup>1</sup>Here we treat the scaling factors and their target values as if they were probabilities, which stems from the common use of a softmax function: to model a multinomial distribution (Bishop and Nasrabadi, 2006). We could equally well use the mean-square error to train **V**(*t*) on the target values, but the resulting update equations would be more complicated, whereas cross-entropy leads to a simple formula.

<sup>2</sup>This is the average distance, such that we can express the RMSE in millimeters or centimeters.

exploration and to speed up training in the initial training phase, we add a small amount of noise [initially drawn from *N*(0, 7), in mm] to the desired end-effector position, of which the standard deviation linearly diminishes to 0 over the course of 50,000 samples of training. As soon as model A becomes sufficiently accurate, model B will begin providing the actions for obtaining the desired end-effector positions. In this phase, the inverse model learns to become especially accurate in the operating regime in which the robot provides the resulting end-effector positions.
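The exploration-noise schedule described above amounts to a simple linear decay. The initial standard deviation of 7 mm and the 50,000-sample horizon are taken from the text; the function name is ours:

```python
def exploration_noise_std(t, std0=7.0, t_anneal=50_000):
    """Standard deviation (in mm) of the Gaussian noise added to the desired
    end-effector position at training step t: it starts at std0 and
    diminishes linearly to 0 over t_anneal samples, then stays at 0."""
    return max(0.0, std0 * (1.0 - t / t_anneal))
```

For example, halfway through the annealing window the noise standard deviation is 3.5 mm, and after 50,000 samples the desired position is no longer perturbed.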

In principle, there is no need to ever stop training the inverse model in the proposed controller. Indeed, if the experimenter knows that the conditions of the setup may change over time, it could be desirable to keep the online learning mechanism active at all times in order to let it keep track of changes in the system. For this paper, however, we chose to gradually slow down the learning algorithm and at some point in time let it stop, such that all parameters in the controller architecture remain fixed during testing. More details are provided in section 2.4.4.

# *2.4.2. Echo state networks*

We use an *Echo State Network* (ESN) (Jaeger, 2001) as inverse model. An ESN is composed of a discrete-time recurrent neural network [commonly called the *reservoir* because ESNs belong to the class of Reservoir Computing techniques (Schrauwen et al., 2007)] and a linear readout layer which maps the state of the reservoir to the desired output. A schematic overview of this is given in **Figure 3**. An ESN has an internal state **a***(t)* (often referred to as reservoir state), which evolves as follows:

$$\mathbf{a}(t+1) = \tanh\left(\mathbf{W}\_{\mathrm{r}}^{\mathrm{r}}\mathbf{a}(t) + \mathbf{W}\_{\mathrm{i}}^{\mathrm{r}}\mathbf{u}(t) + \mathbf{W}\_{\mathrm{o}}^{\mathrm{r}}\mathbf{o}(t) + \mathbf{W}\_{\mathrm{b}}^{\mathrm{r}}\right). \tag{8}$$

Here, **u**(*t*) is the reservoir input signal and **o**(*t*) the output produced by the ESN (see below). The weight matrices **W**<sub>g</sub><sup>k</sup> represent the connections from *g* to *k* between the nodes of the network, where *r, i, o, b* denote *reservoir, input, output*, and *bias*, respectively. In the controller setup we discussed earlier, the reservoir input **u**(*t*) of model A consists of the concatenation of the past and present end-effector positions, and that of model B of the present and desired end-effector positions. During our experiments we scale the input and training signals to the ESN such that

**FIGURE 3 | Description of an ESN.** Dashed arrows are connections which can be trained; solid arrows are fixed. **W**<sub>g</sub><sup>k</sup> is a matrix representing the connections from *g* to *k*, which stand for any of the letters *r, i, o, b*, denoting *reservoir, input, output*, and *bias*, respectively. **u**(*t*), **o**(*t*), and **a**(*t*) represent the input, output, and reservoir states, respectively.

their values are between −1 and 1. Consequently, we will need to undo this scaling before the generated network output represents actual joint-angles which can control the robot.

The ESN output **o***(t)* is generated by:

$$\mathbf{o}(t) = \mathbf{W}_{\mathrm{r}}^{\mathrm{o}}(t)\,\mathbf{a}(t),\tag{9}$$

i.e., a linear transformation of the reservoir state by **W**<sub>r</sub><sup>o</sup>(*t*). The core idea of ESNs is that only this weight matrix is explicitly trained. The other weight matrices have fixed, randomly chosen elements of which only the global scaling is set. As we use a continuously operating online learning strategy, **W**<sub>r</sub><sup>o</sup>(*t*) is time dependent. In the more common ESN setup the output weights are trained offline using, e.g., ridge regression, and remain fixed once they are determined.

Each ESN in this paper is initialized by choosing the elements of **W**<sub>r</sub><sup>r</sup>, **W**<sub>i</sub><sup>r</sup>, and **W**<sub>b</sub><sup>r</sup> from a standard normal distribution *N*(0, 1). Next, **W**<sub>r</sub><sup>r</sup> is scaled such that its spectral radius equals one, and the matrices **W**<sub>o</sub><sup>r</sup>, **W**<sub>i</sub><sup>r</sup>, and **W**<sub>b</sub><sup>r</sup> are multiplied by a factor 0.1 (hand tuned).

The reservoir serves as a random non-linear dynamical system which can extract useful information from its input. Due to its recurrence, a reservoir has *fading memory*, i.e., it retains information about past input signals but gradually forgets it over time, which allows ESNs to be used for processing time series.
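Equations (8) and (9), together with the initialization just described, can be condensed into a small NumPy sketch. The reservoir size and input/output dimensions are illustrative; the class and attribute names are ours:

```python
import numpy as np

class ESN:
    """Minimal echo state network: fixed random reservoir, input, output-feedback,
    and bias weights; only the readout W_out (W_r^o in the text) is trained."""
    def __init__(self, n_res, n_in, n_out, scale=0.1, seed=0):
        rng = np.random.default_rng(seed)
        W = rng.standard_normal((n_res, n_res))
        self.W_res = W / np.abs(np.linalg.eigvals(W)).max()  # spectral radius = 1
        self.W_in = scale * rng.standard_normal((n_res, n_in))
        self.W_fb = scale * rng.standard_normal((n_res, n_out))  # output feedback
        self.b = scale * rng.standard_normal(n_res)              # bias
        self.W_out = np.zeros((n_out, n_res))   # readout, trained online
        self.a = np.zeros(n_res)                # reservoir state

    def step(self, u):
        o = self.W_out @ self.a                 # Equation (9)
        self.a = np.tanh(self.W_res @ self.a + self.W_in @ u
                         + self.W_fb @ o + self.b)   # Equation (8)
        return o
```

Because of the `tanh` non-linearity, every component of the reservoir state stays bounded between −1 and 1.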

## *2.4.3. Linear controllers*

In order to check how much the operation of MACOP depends on the type of controller, we also conducted an experiment in which we used linear controllers. Here, the output of the inverse model is a direct linear combination of its input, so no non-linearity or memory is present in these controllers, and learning the non-linear part of the full inverse kinematics largely needs to be accounted for by training the scaling factors. Here too, we train the system online, according to the algorithm described in section 2.4.4.

### *2.4.4. Recursive least squares*

In order to train the inverse models online we use recursive least squares (RLS). At each iteration the output weights are adjusted such that the network converges to the desired output. However, the rate at which these weights change is controlled by the corresponding responsibility factor ζ<sub>i</sub>. Within the proposed MACOP architecture, such an adaptive learning rate allows each controller's inverse model to distinguish itself from the other controllers. Additionally, in order to allow the weights to converge to fixed values, the training speed is modulated by a factor *l*(*t*). Because our description of a controller is identical for all controllers within MACOP, we omit the index *i* which refers to the *i*-th controller. We can therefore write the weight update equation as follows:

$$\mathbf{W}_{\mathrm{r}}^{\mathrm{o}}(t) = \mathbf{W}_{\mathrm{r}}^{\mathrm{o}}(t-1) - l(t)\,\zeta(t)\,\mathbf{e}(t)\left(\mathbf{Q}(t)\,\mathbf{a}(t)\right)^{\mathsf{T}},\tag{10}$$

where

$$\mathbf{Q}(t) = \frac{\mathbf{Q}(t-1)}{\lambda} - \frac{\mathbf{Q}(t-1)\mathbf{a}(t)\mathbf{a}^T(t)\mathbf{Q}(t-1)}{\lambda(\lambda + \mathbf{a}^T(t)\mathbf{Q}(t-1)\mathbf{a}(t))},\tag{11}$$

and

$$\mathbf{Q}(0) = \frac{\mathbf{I}}{\alpha}.\tag{12}$$

Here, **a**(*t*) are the current reservoir states, λ is a forgetting factor, and α an initially chosen value. **Q**(*t*) is a running estimate of the inverse (**a**<sup>T</sup>**a**)<sup>−1</sup> appearing in the Moore–Penrose pseudo-inverse (Penrose, 2008), and **Q**(0) denotes its initial value. The error **e**(*t*) is the difference between the actual and the desired joint-angles. To allow **W**<sub>r</sub><sup>o</sup>(*t*) to converge together with the projection matrix **V** from Equation (7), *l*(*t*) is decreased linearly from 1 to 0 in the same fashion as the learning rates η<sub>g</sub> and η<sub>s</sub> in Equation (7): as long as the average error over the last 1000 time steps is larger than 1 mm, it equals one, and as soon as the error drops below this threshold, it linearly decreases to zero over the course of 5000 time steps.
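A sketch of one RLS step (Equations 10–12), taking **e**(*t*) as the actual minus the desired joint-angles as in the text; the function name and the toy dimensions are ours:

```python
import numpy as np

def rls_step(W, Q, a, e, zeta, l, lam):
    """One recursive-least-squares update of the readout weights, with the
    effective learning rate modulated by the responsibility zeta and the
    global decay factor l. Q stays symmetric throughout."""
    Qa = Q @ a
    Q_new = (Q - np.outer(Qa, Qa) / (lam + a @ Qa)) / lam   # Equation (11)
    W_new = W - l * zeta * np.outer(e, Q_new @ a)           # Equation (10)
    return W_new, Q_new

# Toy example: one output, two reservoir states, Q(0) = I / alpha with alpha = 1.
W0 = np.zeros((1, 2))
Q0 = np.eye(2)                                              # Equation (12)
a = np.array([1.0, 0.0])
e = W0 @ a - np.array([1.0])                                # actual minus desired
W1, Q1 = rls_step(W0, Q0, a, e, zeta=1.0, l=1.0, lam=1.0)
```

In this toy step the output for state `a` moves from 0 toward the desired value 1, illustrating how the update reduces the error while `Q` tracks the inverse state-correlation matrix.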

# **2.5. ANALYZING MACOP**

Each controller learns online to produce a set of joint-angles which contributes to solving the IK problem. We define such a set of joint-angles as a control primitive. Mixing these primitives results in the set of joint-angles to which the robot is positioned. To analyze each controller and its contribution we define several methods, which we describe in the remainder of this section. The results of these analyses can be found in section 3.

# *2.5.1. Tracking a trajectory*

As described above, we designed MACOP such that the scaling of a controller depends on the location of the end-effector and the robot's pose. We call the control primitive with the largest "responsibility" [the biggest ζ<sub>i</sub>(*t*)] the dominant primitive (generated by the dominant controller). Furthermore, we study the time course of the scaling factors and how they relate to the motion of the robot. We color the trajectory of the end-effector according to which controller is dominant at each position, in order to show which controllers specialize in which regions of task-space. We show the resulting trajectories using both ESN and linear controllers.

# *2.5.2. Selecting a single controller: control primitives*

Even when the scaling factors strongly fluctuate in time, this does not necessarily mean that the controllers are sending different control signals. Indeed, even though the learning speed is modulated according to the scaling factors, all controllers are still trained to perform the same task. In order to verify whether specialization indeed occurs, we conduct experiments in which, after the training phase, only one controller is used (i.e., we set its scaling factor to 1 and all others to 0). We show the resulting trajectories, and we study each individual model's performance compared to its true scaling factor.

# *2.5.3. One vs. multiple controllers*

One of the main research questions of this paper is of course how much we can profit from using MACOP instead of a single controller. In order to answer this, we measure how well the setup performs as a function of the number of controllers. In order to keep this comparison fair, we make sure that for each setup the number of trainable parameters (the total number of output weights of the ESNs) remains constant.

# **2.6. DYNAMIC EFFECTS OF KINEMATIC CONTROL**

An inverse kinematic mapping maps a desired task-space position to the corresponding joint-angles. In most evaluations of learning inverse kinematics, a new command is only sent when the previously desired joint-angles are reached. For this, a direct inverse mapping without memory suffices. However, in our experiments we do not wait for the robot to reach the commanded joint-angles, but send new joint-angles at a constant rate (every 32 ms). As shown in **Figure 4**, a PID-controller, which applies the necessary torque to reach a desired joint-angle, has a dynamic transition before reaching a new target angle. These dynamic transitions need to be compensated for by the controllers as well, in order to reach the desired outcome in time<sup>3</sup>. Such transitions require memory instead of a direct mapping. We evaluate MACOP's ability to cope with these dynamic effects by changing the *P*-parameter of each joint's PID-controller (which determines how fast the robot can react to changes in the desired joint-angles) such that these dynamic effects become more important.

# **3. RESULTS**

All training parameters for the experiments are provided in **Table 1**. The RLS parameters λ and α were chosen based on previous experience. The learning rates η<sub>g</sub> and η<sub>s</sub> were found by trial and error, but we experimentally verified that performance does not change much in a broad region around the provided values.

# **3.1. TRACKING A TRAJECTORY**

To investigate the overall behavior of MACOP, we applied multiple desired trajectories of the robot end-effector. In both

<sup>3</sup>This means that the control signal the robot receives can no longer be simply interpreted as joint-angles, but rather as a type of motor command.

**FIGURE 4 | Illustration of a sudden change in desired joint-angle of the bottom joint (black dashed line) of the PUMA 500.** The robot response is shown for this particular joint with different *P*-parameters, to illustrate the effect of a changed *P* value. By decreasing the *P*-parameter the dynamic transition time from one position to the other increases.

### **Table 1 | Simulation parameters.**


**Figures 5** and **6**, we show the resulting trajectories (after convergence) of following a rectangular and a circular-shaped target trajectory. In both experiments an RMSE of 10 cm was achieved within 10,000 samples and an RMSE of 1 mm (the point of convergence) within 100,000 samples, demonstrating that the system is able to follow a desired trajectory closely.

In the first experiment, we train the robot to generate a rectangular trajectory which spirals back and forth in the X-direction over several passes. For this we used 5 controllers, each with 50 neurons. In the top panel of **Figure 5** we show the trajectory generated by the robot after convergence (all learning rates are 0 and RMSE = 1 mm). Each part of the trajectory is colored according to which controller is dominant (has the maximum scaling factor) at that time. The responsibilities ζ<sub>i</sub>(*t*) themselves are shown in the bottom panel of **Figure 5**. It appears that the 5 controllers have developed a specialization for certain regions of task-space, their responsibilities ζ<sub>i</sub> peaking at the corresponding parts of the trajectory.

Notice that the depth of the trajectory in the X-direction is rather small (20 cm), and yet the scaling factors strongly vary as a function of it (as is especially apparent in the green and blue parts of the trajectory). This strong change in scaling factor is caused mostly by the pose of the robot, and not so much by the end-effector position, as we have verified by experimentally testing the sensitivity of the scaling factors as a function of the joint-angles and position. This suggests that the control architecture effectively uses information about the robot pose to solve the task.

A second experiment (*N*<sub>c</sub> = 4, 50 neurons each) extends the difficulty of the previous trajectory to demonstrate responsibility/task-space correlations over a larger time period of the desired movement. The trajectory describes four passes of a circle in a single direction during 8 s, after which it smoothly switches to a shifted circle traversed in the opposite direction.

We show the result after convergence in **Figure 6**. The rotation direction in which the trajectory is followed is indicated by the arrows. In the left part of the circular trajectory the blue controller contributes a significant part of the control of the robot. This contribution is reduced when the robot performs the right circular movement, in which the red controller contributes more. In order to verify MACOP's robustness we consider the double circle trajectory after training, and suddenly jump ahead in time in the desired trajectory such that there is a discontinuous jump in the target end-effector position. The result is shown in **Figure 7**. It appears that after a large initial overshoot, the robot recovers and is eventually capable of tracking the

**FIGURE 5 | Top panel:** the resulting end-effector trajectory generated by the robot arm for the rectangular target trajectory after convergence (RMSE *<* 1 mm for the full trajectory). The corresponding color of the dominant controller is shown. **Bottom panel:** the responsibility factors ζ*i(t)* as a function of time.

desired trajectory again. The overshoot can be largely explained by the fact that the controller never saw a discontinuous jump during training, such that it has seen no examples of what happens when large torques are applied on the joints. Furthermore, the sudden jump forces the robot arm into a region of task-space where it never resided during training, causing unpredictable behavior.

After training MACOP (same configuration as before) with the double circle trajectory we define a test grid on the plane of the training trajectory with a resolution of 1 cm. The test target points of this trajectory are visited by sweeping the grid back and forth

in the Z-direction. The result of such an experiment is shown in the bottom panel of **Figure 7**. Each pixel represents the RMSE (in meters) of a specific grid point. Averaged over 10 experiments (different initialization and training), a mean RMSE of 4.4 mm with a standard deviation of 3.1 mm is achieved. Note that the RMSE in the grid corners is larger because they are harder to reach.

In the final experiment of this section, we tried a set of linear controllers to see if MACOP is still able to control the robot arm to generate a trajectory with very low-complexity controllers. We found that we need at least 9 controllers to approximate the target trajectory, with a final RMSE of 1.5 cm; using MACOP with fewer controllers does not work. **Figure 8** shows the resulting trajectory and the scaling factors of the individual controllers. The fact that MACOP is capable of solving the tracking task with such basic controllers is a strong indicator that the presented training algorithm for the scaling factors is quite successful in distributing the complexity of the full task. It also demonstrates that MACOP can easily be extended to include any kind of inverse model.

### **3.2. ONE VS. MULTIPLE CONTROLLERS**

One of the main assumptions underlying MACOP is that it is beneficial to distribute the full control task over multiple controllers. The first check we performed was to make sure that the mixing using the softmax function is in fact responsible for the increase in performance, and not just the presence of several distinct controllers in the first place. We tested this by keeping the responsibility factor ζ<sub>i</sub>(*t*) constant and equal to *N*<sub>c</sub><sup>−1</sup> for each model, in the case of 5 ESN-controllers. It turns out that this situation leads to the same performance as that attained by using a single, large reservoir (which performs worse, as we will show next), showing that the variable responsibility factors directly increase performance.

To investigate how much the tracking performance depends on the number of controllers, we conducted an experiment in which we measure the mean error on the trajectory for different numbers of controllers. We measure the distance from the end-effector to the target, averaged over 5000 samples after training. We linearly reduce all learning rates to 0 after 100,000 samples of training (because some experiments never reach the requirement of an RMSE less than 1 mm). For an increasing number of controllers we apply MACOP to trajectories based on all 26 letters of the English alphabet, which we drew by hand and of which we recorded the position as a function of time. As the trajectory is repeated periodically, we also made sure that the end and starting points are the same in each trajectory (to avoid sudden jumps). After recording, we scaled the trajectories and placed them in the YZ-plane within reach of the robot.

In each experiment we train a randomly initialized controller ensemble to produce a single letter, and we measure the RMSE after convergence. For each number of controllers, we measure the average RMSE over 50 instantiations of each letter, such that the measured result for each *N*<sub>c</sub> is averaged over 50 × 26 = 1300 experiments. In order to keep the comparison fair, we keep the number of trainable parameters (the total number of output weights of all the ESNs) constant. In practice this means we used 250 neurons for a single controller, 125 for 2 controllers, etc. In **Figure 9** we present the results. A single controller performs rather poorly. The optimal number of controllers for the entire English alphabet is around 6. When the number of controllers increases further, the number of neurons, and hence the modeling power, of each controller becomes smaller. Similar to the experiment with the linear controllers, this experiment shows that a great deal of the modeling complexity is covered by the mixing mechanism.

It should be noted that in some cases, due to the random initialization, the robot can get stuck in a certain pose (as some joint-angles are limited to certain ranges), and never reach the desired trajectory. This is the reason why the maximum values in **Figure 9** are much larger than the mean over all experiments. If we disregard these cases, we obtain an RMSE of 1 mm within 100,000 training samples.

### **3.3. CONTROL PRIMITIVES**

As mentioned in the introduction, what we call control primitives in this paper differs from the regular notion of primitives. In this section, we investigate how the actual individual contributions of the controllers behave. Due to the MACOP setup it is not straightforward to get a good understanding of the role of a single controller: at all points in time, all controllers influence the robot, and due to the feedback, each controller influences all other controllers. We can think of two ways to study the individual controller contributions. Either we use a single controller of the ensemble (with its scaling set to one) to steer the robot, which ignores the feedback influence of the other controllers, such that emergent synergies are not expressed. Alternatively, we can record the control signals of the individual controllers during normal operation and use these recordings to steer the robot afterwards. Even though this approach takes potential synergies between the controllers into account, there is no feedback at all during testing, so the trajectory could start to drift from the objective. In our setup, however, such drift does not occur, and we therefore use this second approach. We tried the first approach as well, and the results were qualitatively similar.

To get a qualitative idea of what the individual contributions look like, we revisit the task inspired by the English alphabet, and train a controller ensemble to draw the letters of the word "amarsi"<sup>4</sup> one after another. After training, we use the recorded contributions of a single controller (unscaled) to steer the Webots simulation, and record the robot's response.

The result is shown in **Figure 10A**: the five rows starting from the top show the trajectory that each individual control primitive produces if it alone is present in the control architecture (in a corresponding color), plotted over the target (gray). The bottom row shows the trajectory of the full ensemble, colored according to the most dominant controller. It is interesting to note that, even though all individual controllers produce a trajectory that resembles the target, all of them deviate strongly from it, and each produces a distinctly different response. Their scaled combination, however, tracks the objective far more closely, which again indicates that the mixing mechanism combines the contributions of several controllers well.
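The weighted combination behind this observation can be sketched in a few lines. The following is a minimal illustration (not the authors' implementation), assuming each controller emits a joint-angle command and the mixing mechanism supplies one non-negative scaling factor per controller:

```python
import numpy as np

def mix_controllers(outputs, scalings):
    """Combine per-controller joint-angle commands into one command.

    outputs:  array of shape (n_controllers, n_joints), one row per controller
    scalings: array of shape (n_controllers,), non-negative mixing factors
    Returns the scaled sum, normalized so the factors act as weights.
    """
    outputs = np.asarray(outputs, dtype=float)
    s = np.asarray(scalings, dtype=float)
    s = s / s.sum()          # normalize so the weights sum to one
    return s @ outputs       # weighted sum over controllers

# Two controllers that each deviate from the target in opposite directions
target = np.array([0.5, -0.2])
u1 = np.array([0.8, -0.4])
u2 = np.array([0.2, 0.0])
mixed = mix_controllers([u1, u2], [1.0, 1.0])
# the equally weighted mixture lands on the target here: [0.5, -0.2]
```

Here two controllers that individually deviate from the target in opposite directions are averaged onto it, mirroring the observation that the scaled combination tracks the objective far more closely than any single contribution.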

In a second experiment we check whether true specialization occurs. After all, even though one controller is dominant, the other controllers also contribute strongly to the total motion. To check this, we performed an experiment similar to the double-circle objective depicted in **Figure 6**. We used four controllers, and used the individual recordings to drive the robot. Next, we recorded the distance error of the end-effector as a function of time, which we compare with the corresponding scaling factor of the controller. If specialization occurs, we expect a negative correlation between the error on the trajectory and the corresponding scaling of the controller: if the scaling factor of a certain controller is high, this indicates that it specializes in the current region of task-space, and the resulting error should be low. Conversely, if the scaling is low, the controller should perform worse, as it is outside its region of specialization.
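This check amounts to computing a correlation coefficient between two time series. A minimal sketch (our own construction, with synthetic signals standing in for the recorded error and scaling factor):

```python
import numpy as np

def specialization_score(error, scaling):
    """Pearson correlation between a controller's tracking error and its
    mixing factor; a clearly negative value suggests specialization
    (high scaling where the controller's own error is low)."""
    return np.corrcoef(error, scaling)[0, 1]

# Synthetic illustration: a "specialized" controller is accurate (low error)
# exactly in the region of the cycle where its scaling factor is high.
t = np.linspace(0.0, 2.0 * np.pi, 500)
scaling = 0.5 + 0.5 * np.sin(t)       # high in one region of the cycle
error = 0.2 - 0.1 * np.sin(t)         # low in that same region
print(specialization_score(error, scaling))  # strongly negative (≈ -1 here)
```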

The result for each controller is shown in **Figure 10B**. For some controllers there seems to be a strong relation between the error and the scaling of the corresponding model, especially for the red and blue controllers. The relation is weaker, however, for the other two. Indeed, the scaling factors of these two controllers fluctuate less, such that they are able to train their corresponding inverse models throughout the full trajectory, leading to better overall generalization. From this we conclude that MACOP uses both specialization and signal mixing to obtain good control over the robot arm.

### **3.4. COPING WITH DYNAMIC EFFECTS**

As mentioned before, MACOP is capable of handling dynamic effects and transitions during training caused by the inherent inertia of the robot and the fixed control signal rate. To demonstrate this ability, we apply MACOP to the robot with the square objective used in **Figure 5**, but reduce the velocity of each joint. The *P*-parameter of the PID-controllers is reduced from 10 to 2, the effect of which is shown in **Figure 4**. Furthermore, in order to assess the effect of a sudden jump, we periodically shift the square by half a meter, which introduces discontinuities in the objective.

In **Figure 11** we show the Z-coordinate as a function of time for the desired trajectory and the resulting trajectories for the different *P*-parameters. The top panel depicts the results during the beginning of the experiment (30.4–38.4 s), while the bottom panel depicts the results after convergence. The robot with the standard velocity (blue line) follows the objective more closely during the beginning of the experiment but exhibits some fluctuations. The Z-position trajectory of the reduced-velocity configuration is unable to reach the objective closely during the first part and clearly needs more time to learn the inverse model, indicating that the control problem is harder when the robot reacts more slowly. After convergence, the small fluctuations in the blue trajectory are reduced, and MACOP has learned to follow the objective closely. Interestingly, due to its larger maximum velocity, the blue trajectory overshoots strongly when the objective exhibits a sudden jump. The red trajectory exhibits almost no overshoot at the beginning of the experiment, but after convergence it also shows some overshoot. At the beginning of the experiment it is clear that the limited joint velocity prevents the robot's end-effector from reaching its target in time, as opposed to the version with fast control. This indicates that the desired trajectory is harder to attain if the robot has slow dynamics. Indeed, sudden changes in direction are made far more easily with a fast control response, where actuating a single joint may suffice, which is a simple task. With a slow response, the robot needs to use synergies between several joints to make the end-effector switch direction so suddenly, which makes the control problem far more complicated.

<sup>4</sup>AMARSi is an EU project concerning adaptive motor skills for robots.

[**Figure 10B** caption: each controller's error contribution (black) is plotted with its corresponding scaling factor (colored) as a function of time; the error scale is on the left vertical axis, the scaling scale on the right. **Figure 11** caption: the periodic jump corresponds to the shift of the squares.]
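The qualitative effect of the reduced *P*-parameter can be reproduced with a deliberately simplified first-order model of a single joint (a toy stand-in for the actual PID loop; all numbers are illustrative):

```python
def track_step(P, dt=0.01, steps=100):
    """First-order joint model x' = P * (target - x); the target steps
    from 0 to 1 at t = 0.  Returns the position after steps*dt seconds."""
    x = 0.0
    for _ in range(steps):
        x += dt * P * (1.0 - x)
    return x

fast = track_step(P=10.0)   # high gain: essentially reaches the target
slow = track_step(P=2.0)    # low gain: still closing the gap
```

With the smaller gain the joint is still visibly lagging the stepped target at the end of the window, which is the slow-dynamics regime in which MACOP has to exploit synergies between joints.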

Reducing the velocity of each robot joint has advantages in terms of power consumption and safety. However, a trade-off must be made between faster convergence, in the sense of closely following the objective, and the amount of overshoot allowed. In some tasks the objective may change very fast, and in such cases reducing the reaction speed will restrict the robot in reaching its targets.

### **4. DISCUSSION**

In this work we described a modular architecture with control primitives (MACOP), which learns to control a robot arm based on a pre-set number of controllers. The inspiration for this architecture stems from MOSAIC (Haruno et al., 2001), a control framework inspired by a plausible model of how human motor control learns and adapts to novel environments. MOSAIC uses a strategy in which an ensemble of inverse-model controllers is trained, one for each environment with different properties. On top of this, a selection mechanism determines which controller is active at each moment in time. Each controller is associated with a forward model of the system to be controlled, and controller selection happens by choosing the most accurate forward model.

Our observations [and those of others (Haruno et al., 2001)] show that such a strategy may not be optimal: there is no reason why the accuracy of a forward model should be correlated with that of the inverse model. Another selection mechanism for the controllers might thus be possible. In this work, we build upon the idea of a fixed number of control primitives which are continuously combined to produce a desired motion. Each controller used in MACOP consists of an inverse model, implemented as an Echo State Network and trained online. Given some high-level controller mixing requirements, an unsupervised division of the task and joint-space is achieved, which can be related to a Kohonen map (Kohonen, 1998). The mixing mechanism learns a subdivision of the entire task and joint-space and produces one scaling factor per controller, associated with the current end-effector and joint-angle position of the robot. The training error of each controller is scaled with the same factor, such that training data within a controller's associated part of the subdivision becomes more important than other data. As a result of this data selection mechanism, every controller can specialize within its appointed part of the joint and task-space. The mixing requirements prescribe that all controllers should contribute significantly to the task, while still allowing a controller to specialize in a certain subregion.
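The error-weighting described above can be sketched as a responsibility-weighted online update, where each controller's learning step is multiplied by its mixing factor. This is a simplified stand-in for the mechanism (linear models and a hard task-space split instead of ESNs and learned scaling factors):

```python
import numpy as np

rng = np.random.default_rng(0)

true_map = np.array([1.0, -2.0, 0.5])     # ground-truth linear inverse model
w = np.zeros((2, 3))                      # one weight row per controller

def weighted_update(w, x, target, scalings, lr=0.05):
    """Scale each controller's LMS step by its mixing factor, so a
    controller mainly learns from data in 'its' region of the task-space."""
    for i, g in enumerate(scalings):
        err = target - w[i] @ x           # controller i's own prediction error
        w[i] += lr * g * err * x          # step scaled by mixing factor g
    return w

for _ in range(4000):
    x = rng.normal(size=3)
    target = true_map @ x
    region = float(x[0] > 0)              # crude hard split of the input space
    w = weighted_update(w, x, target, [region, 1.0 - region])

# Both controllers recover the same map, but each one was trained almost
# exclusively on its own half of the input space.
```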

We validated MACOP on an inverse kinematic learning task where we controlled a 6 DOF robot arm by producing joint-angles which are sent at a fixed control rate. This is in contrast with other approaches, such as Oyama et al. (2001), where a static mapping from task-space position to joint-space is learned and a separate feedback control loop is needed to approach the target joint-angles. Such a separate feedback control system results in high control gains when there is an external perturbation of the robot's movement. Achieving compliant kinematic control thus argues for a dynamic learning approach which learns the control at a fixed control rate. In this work we rely on the approach proposed in Waegeman et al. (2012) for such a dynamic control method. As a result, MACOP is well suited to cope with the dynamical effects introduced by the non-instantaneous control of the robot: even when the robot responds slowly to the control signal, the MACOP architecture is able to compensate for it and produce the target trajectory.

We replaced each controller with a simple linear controller to validate MACOP's independence of the chosen ESN-controller. Such a linear controller is constructed by learning a linear combination of the architecture's input. When the number of linear controllers within MACOP is large enough, the end-effector will start to track the target. However, the tracking performance of MACOP with the ESN-based controllers is better due to its non-linear nature.

In this work, the contribution of a single controller to all the robot's joint-angles is called a control primitive. Motor and motion primitives generally refer to different building blocks at different levels of the motor hierarchy. They can be kinematic (e.g., strokes, submovements), dynamic (e.g., joint-torque synergies, control policies), or both. According to Flash and Hochner (2005), their crucial feature is that a wide variety of movements can be derived from a limited number of stored primitives by applying appropriate transformations. Within this definition a controller's contribution to the joint-angles can be called a primitive. Their organization is stored within the mixing transformation, such that after convergence a consistent controller selection is achieved. What differs from the common interpretation of primitives is that in our case, the control primitives are mixed and rescaled constantly, instead of truly being selected and weighted statically.

MACOP learns to spread a set of controllers in the vicinity of the target trajectory such that the primitives produced by the controllers can help in tracking this trajectory. Even when the complexity of these controllers is reduced (using linear controllers), the task is still solvable. Unlike in Nori and Frezza (2004), the MACOP control primitives and their mixing values both depend on the state of the robot, and because they are adapted online they become dependent on the task. After convergence (all learning rates are 0) this task dependency is removed.

In future work we wish to investigate how well MACOP is able to simplify the full dynamic control (e.g., controlling torques) of such systems including real-world robot platforms. Secondly, we wish to investigate how the mixing mechanism adapts to new tasks when the controllers are assumed to be fixed (training completed).

One interesting observation needs more scrutiny. Even though we showed that a single controller is unable to learn the inverse kinematics, we found that within the controller ensemble, individual controller contributions perform relatively well in tracking the target trajectory. We suspect that this is caused by the mixing mechanism and its effect on each controller's learning rate. The training data used to update a controller are weighted such that a good local model can be learned easily, instead of polluting the controller with data from other positions. Generalization to other positions in task-space can then proceed more gradually, simplifying the full training problem. Further tests will be needed to confirm this hypothesis.

# **ACKNOWLEDGMENTS**

We thank J. P. Carbajal for the valuable and inspiring discussions and useful suggestions. This work was partially funded by a Ph.D. grant of the Institute for the Promotion of Innovation through Science and Technology in Flanders (IWT-Vlaanderen) and the FP7 funded AMARSi EU project under grant agreement FP7-248311.

# **REFERENCES**


Haruno, M., Wolpert, D. M., and Kawato, M. (2001). MOSAIC model for sensorimotor learning and control. *Neural Comput.* 13, 2201–2220. doi: 10.1162/089976601750541778


Mussa-Ivaldi, F. A., Giszter, S. F., and Bizzi, E. (1994). Linear combinations of primitives in vertebrate motor control. *Proc. Natl. Acad. Sci. U.S.A.* 91, 7534–7538. doi: 10.1073/pnas.91.16.7534


Schrauwen, B., Verstraeten, D., and Van Campenhout, J. (2007). "An overview of reservoir computing: theory, applications and implementations," in *Proceedings of the 15th European Symposium on Artificial Neural Networks*, (Bruges), 471–482.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 26 January 2013; accepted: 03 July 2013; published online: 23 July 2013. Citation: Waegeman T, Hermans M and Schrauwen B (2013) MACOP modular architecture with control primitives. Front. Comput. Neurosci. 7:99. doi: 10.3389/fncom.2013.00099*

*Copyright © 2013 Waegeman, Hermans and Schrauwen. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

# A soft body as a reservoir: case studies in a dynamic model of octopus-inspired soft robotic arm

# *Kohei Nakajima1,2\*, Helmut Hauser 1, Rongjie Kang3,4, Emanuele Guglielmino3, Darwin G. Caldwell <sup>3</sup> and Rolf Pfeifer <sup>1</sup>*

*<sup>1</sup> Artificial Intelligence Laboratory, Department of Informatics, University of Zurich, Zurich, Switzerland*

*<sup>2</sup> Bio-inspired Robotics Laboratory, Department of Mechanical and Process Engineering, ETH Zurich, Zurich, Switzerland*

*<sup>3</sup> Department of Advanced Robotics, Istituto Italiano di Tecnologia, Genova, Italy*

*<sup>4</sup> Key Laboratory of Mechanism Theory and Equipment Design of Ministry of Education, Tianjin University, Tianjin 300072, P.R. China*

### *Edited by:*

*Tamar Flash, Weizmann Institute, Israel*

### *Reviewed by:*

*Vladimir Brezina, Mount Sinai School of Medicine, USA Dimitris Tsakiris, Institute for Computer Science - FORTH, Greece*

### *\*Correspondence:*

*Kohei Nakajima, Department of Informatics, University of Zurich, Andreasstrasse 15, 8050 Zurich, Switzerland e-mail: nakajima@ifi.uzh.ch*

The behaviors of animals and embodied agents are characterized by the dynamic coupling between the brain, the body, and the environment. This implies that control, which is conventionally thought to be handled by the brain or a controller, can partially be outsourced to the physical body and the interaction with the environment. This idea has been demonstrated in a number of recently constructed robots, in particular from the field of "soft robotics". Soft robots are made of soft materials that introduce high dimensionality, non-linearity, and elasticity, which often make these robots difficult to control. Biological systems such as the octopus master their complex bodies in a highly sophisticated manner by capitalizing on their body dynamics. We will demonstrate that the structure of the octopus arm can not only be exploited for generating behavior but can also serve, in a sense, as a computational resource. Using a soft robotic arm inspired by the octopus, we show in a number of experiments how control is partially incorporated into the physical arm's dynamics and how the arm's dynamics can be exploited to approximate non-linear dynamical systems and embed non-linear limit cycles. Future application scenarios, as well as the implications of the results for octopus biology, are also discussed.

**Keywords: reservoir computing, octopus, soft robotics, morphological computation**

# **1. INTRODUCTION**

Biological systems have certain morphologies<sup>1</sup> and material characteristics that improve their adaptivity and increase their probability of survival. This suggests that control is not only located in the brain, but that there is a tight coupling between the brain, the body, and the environment, an idea usually termed *embodiment* (Pfeifer and Bongard, 2006; Pfeifer et al., 2007, 2012; Li et al., 2011b; Nakajima et al., 2011c, 2012a,b). Recently, motivated by the fact that soft material is ubiquitous in the body structures of living creatures, a new family of robots, *soft robots*, has been constructed with the aim of incorporating flexible elements (Trivedi et al., 2008; Steltz et al., 2009; Brown et al., 2010; Shepherd et al., 2011; Pfeifer et al., 2012). These robots have significant advantages over traditional articulated robots in terms of morphological flexibility and interactional safety (Trivedi et al., 2008; Li et al., 2011b). However, controlling them with conventional techniques is difficult because of their high-dimensional body structures and their diverse body dynamics, which are due to their non-linearity and elasticity. In this context, the octopus has been a good source of inspiration for roboticists seeking a control strategy for soft robots. An octopus has hyper-redundant limbs with a virtually unlimited number of degrees of freedom (DOF), and its movements are known to be highly sophisticated (Sumbre et al., 2001, 2005; Trivedi et al., 2008). From a conventional control perspective, the octopus's ability to control its movements by harnessing its non-linear body dynamics is outstanding.

It is well known that the nervous system of the octopus is highly distributed throughout the entire body. Its relatively small central brain (about 50 million neurons), the central nervous system (CNS), controls the large peripheral nervous system (PNS) of the arms (about 300 million neurons): it integrates information from the visual system and then issues commands to lower motor centers controlling the elaborate neuromuscular system of the arms. A typical example showing the effectiveness of this distribution of the nervous system is the reaching behavior (Gutfreund et al., 1996; Gutfreund, 1998; Sumbre et al., 2001; Yekutieli et al., 2005a,b). Reaching behavior consists of a *bend propagation* along the arm toward the tip in a highly stereotypical and invariant way. Sumbre et al. showed that arm extensions can be evoked in arms whose connection with the brain has been severed (Sumbre et al., 2001). Because the evoked motions in denervated octopus arms were qualitatively and kinematically identical to natural bend propagations, an underlying motor program appears to be embedded in the neuromuscular system of the arm, which does not require continuous central control (Li et al., 2011a, 2012, 2013; Nakajima et al., 2011a,b; Kuwabara et al., 2012). In addition, the muscular organization of the octopus's arm has a characteristic structure called *muscular-hydrostats* (Kier and Smith, 1985; Smith and Kier, 1989; Taylor and Kier, 2003;

<sup>1</sup>By morphology, we do not only refer to the shape, but also sensor and actuator distributions, and physical properties, such as stiffness, etc.

Feinstein et al., 2011). In such structures, the volume of the organ remains constant during all movements, enabling the muscles themselves to perform all the functions usually performed by the skeleton (Sumbre et al., 2001, 2005; Taylor and Kier, 2003). This suggests that the body of the octopus arm is highly involved in the production of movements. Accordingly, in robotics, there have been several attempts to characterize the role of the muscular-hydrostat system in terms of its anatomical structure (Mazzolai et al., 2007; Laschi et al., 2009, 2012; Vavourakis et al., 2012a,b) and functionality (Nakajima et al., 2013).

In this paper, in line with these biological findings, we aim to provide quantitative evidence that the structure of the octopus's arm has the potential to embed multiple motor programs without any support from an external controller. Recently, it has been shown that non-linear mass-spring networks can be used as a computational resource (Caluwaerts and Schrauwen, 2011; Hauser et al., 2011, 2012; Sumioka et al., 2011; Caluwaerts et al., 2013; Nakajima et al., 2013). These works imply that the non-linear and elastic body dynamics of soft robots are not drawbacks for control, but rather can be directly exploited as a computational resource. In this paper, we build on theoretical models (Hauser et al., 2011, 2012) that have been proposed in the context of *reservoir computing*.

The term reservoir computing was proposed by Schrauwen et al. (2007) for a set of machine learning techniques used to emulate complex, non-linear computations. The idea is to drive a high-dimensional, non-linear dynamical system (which has been randomly initialized, but is afterwards fixed) with a low-dimensional input stream. This dynamical system, typically referred to as the *reservoir*, provides highly complex, but reproducible, responses in its state space to the input. It operates as a type of temporal and finite "kernel" by projecting the low-dimensional input non-linearly into the high-dimensional state space of the reservoir. Furthermore, since a reservoir is a dynamical system, it exhibits a memory, which fades out exponentially (i.e., fading memory)<sup>2</sup>. A remarkable property of the approach is that, if the reservoir is complex enough (i.e., high-dimensional and non-linear), a simple *linear*, *static* readout from the high-dimensional state space can be shown to be sufficient to emulate *non-linear*, *complex* computations. Such reservoir computing setups have been proven to outperform other machine learning techniques in a number of difficult tasks; see Jaeger (2003) for example. Another remarkable property of this setup is that the required properties for computationally powerful reservoirs turn out to be rather general. Hence, a number of different implementations for reservoirs have been proposed (Schrauwen et al., 2007). For example, simple, abstract dynamical systems are used for echo state networks (Jaeger, 2002; Verstraeten et al., 2007; Lukoševičius and Jaeger, 2009), and models of neurons are used in liquid state machines (Maass et al., 2002). Lately, it has been demonstrated that the complex, compliant bodies of biological systems and robots have the potential to serve as such a reservoir as well; see Hauser et al. (2011) and Hauser et al. (2012).
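A minimal echo state network with a ridge-regression readout illustrates the recipe: a fixed, random, non-linear reservoir driven by a low-dimensional input stream, plus a trained *linear, static* readout. This sketch is our own illustration of the general idea, not the implementation of any work cited here; the target task (a product of the current input with a delayed one) requires both non-linearity and fading memory:

```python
import numpy as np

rng = np.random.default_rng(1)

# Fixed random reservoir of N tanh units with spectral radius below 1
N = 100
W_in = rng.uniform(-1.0, 1.0, size=N)              # input weights
b = rng.uniform(-0.5, 0.5, size=N)                 # bias terms
W = rng.normal(size=(N, N))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))    # scale spectral radius to 0.9

def run_reservoir(u):
    """Drive the fixed reservoir with the 1-D input stream u."""
    x = np.zeros(N)
    states = np.empty((len(u), N))
    for t, ut in enumerate(u):
        x = np.tanh(W @ x + W_in * ut + b)         # fixed non-linear dynamics
        states[t] = x
    return states

# Target: a non-linear computation with memory, y_t = u_t * u_{t-2}
u = rng.uniform(-1.0, 1.0, size=2000)
y = u * np.roll(u, 2)
X, Y = run_reservoir(u)[200:], y[200:]             # discard the washout

# The only trained part: a linear static readout, fit by ridge regression
W_out = np.linalg.solve(X.T @ X + 1e-6 * np.eye(N), X.T @ Y)
nmse = np.mean((X @ W_out - Y) ** 2) / np.var(Y)   # well below 1 => emulated
```

The bias term matters here: a purely odd (zero-bias tanh) reservoir with a linear readout cannot represent an even target such as this product.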

Here, we demonstrate that the soft robotic arm inspired by the octopus can be used as a reservoir. This means that, by simply attaching a static, linear readout to the high-dimensional non-linear dynamics of the octopus arm, one can emulate complex, non-linear computations without altering the physical system itself. That is, we employ the physical body as part of a computational device. In this paper, a 3D dynamic model of this soft robotic arm is used as a test platform. Compared with the model used in Hauser et al. (2011) and Hauser et al. (2012), our model is more biologically inspired and physically feasible. It is a mass-spring-damper system, in which the springs are aligned to emulate the octopus's muscular organization, and it embeds the characteristic properties of a muscular-hydrostat. The arm is also assumed to be immersed in an underwater environment, in which the water friction constants are approximated by computational fluid dynamics (CFD) simulations. As a result, the arm reveals highly non-linear body dynamics when actuated. Using this platform, we demonstrate that its body dynamics can be exploited as a computational resource. To test its power, we defined two types of tasks: first, to emulate complex non-linear dynamical systems, where we investigate whether the body dynamics are exploited as a computational resource; and second, to implement closed-loop control, where we used several non-linear limit cycles to see how they can be embedded directly into the soft robotic arm without any support from an external controller. The choice of example functions for each type of task is motivated by the goal of evaluating the non-linearity and memory that the body dynamics contain.

This paper is organized as follows. In section 2, we explain the overall setting of the 3D dynamic model of the soft robotic arm platform and show in detail how the arm emulates the muscular organization of the octopus. The adopted input–output relations and the experimental procedures are also described in detail. In section 3, we explain the results for a series of tasks, and in section 4, we give concluding remarks, including future extension scenarios of our proposed approach.

# **2. MATERIALS AND METHODS**

In this section, we provide a detailed description of the soft robotic arm simulator model and explain how to exploit the system as a computational resource by introducing input–output relations. The experimental procedure is also explained in detail.

### **2.1. DYNAMIC MODEL OF A SOFT ROBOTIC ARM INSPIRED BY THE OCTOPUS**

In this paper, we use a 3D dynamic model of a soft robotic arm inspired by the octopus (Kang et al., 2011, 2012). The model is currently used for testing control architectures of soft robotic arms (Kuwabara et al., 2012; Nakajima et al., 2012a), and has been validated to be in good agreement with a physical soft robotic arm platform (Zheng et al., 2012). The overall structure of the entire arm is shown in **Figure 1F**. It is assumed to be immersed in an underwater environment, and the base of the arm is able to rotate in any direction. It consists of 20 compartments, and each compartment implements the unique characteristics of octopus muscles, called muscular-hydrostats. In an octopus arm, muscles are organized into transverse, longitudinal, and obliquely oriented groups (**Figure 1A**). This special muscular organization forms the structures of the muscular-hydrostats. Their main property is that their volume remains constant during muscle contractions. As a result, if the diameter of a muscular-hydrostat decreases, then its length increases, and vice versa. Several proposed models deal with the muscular-hydrostat system of the octopus [e.g., see Yekutieli et al. (2005a,b) and Kang et al. (2012)]. The overall structure of the muscular-hydrostat system adopted in this paper is shown in **Figure 1B**. We begin our description by focusing on the model of a single compartment, and then progress to describing an entire arm.

<sup>2</sup>The memory is due to the integration capability of dynamical systems (i.e., it accumulates information over time).

[**Figure 1** caption fragment: **(B)** the muscular-hydrostat system of a single compartment; **(C)** a longitudinal spring, with $f_{\text{damp}} = C_{lij}\dot{l}^{l}_{ij}$, $f_{\text{stiff}} = K_{lij}(l^{l}_{ij} - l^{l}_{0ij})$, and $\xi_{ij} = \xi_{cij} + \xi_{vij}$; **(D)** a transverse spring, with $f_{\text{damp}} = C_{rij}\dot{l}^{r}_{ij}$; **(E)** the ceiling plane. Each compartment contains four longitudinal springs and one transverse disk; the blue line connects the centers of the transverse disks, and the base of the arm can rotate in any direction.]

A single compartment is a mass-spring-damper system, shaped like a circular truncated cone, consisting of a base plane, a ceiling plane with four transverse springs, a central strut, and four longitudinal springs, which emulate the anatomical structure of the muscle alignment in a real octopus arm. (Note that, although we use the term "spring," it is a model for a muscle, so it has mass and damping.) The longitudinal springs control the position and orientation of the ceiling plane, while the transverse springs control the radius of the ceiling plane. The central strut provides kinematic constraints to guarantee the unique position of the ceiling plane. It is modeled as an ideal prismatic joint without mass, damping, and stiffness. The system has an isovolumetric structure, which provides forces constantly aiming to maintain its volume and is an expression of the property of the muscular-hydrostats; thus, all the springs are assumed to be implicitly or explicitly coupled with each other. The values for all the parameters of the model (e.g., spring coefficients, damping, etc.) are either inspired by the octopus or directly drawn from biological data (Kier and Curtin, 2002; Lieber, 2002; Shinohara et al., 2010). Standard SI units are used for the variables, and all of the ordinary differential equations presented are solved using the 4th order Runge–Kutta method, where *dt* is set to 0.001 *s* for the system throughout this paper.
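As a concrete illustration of the integration scheme, the following sketch integrates one damped spring of the form $M\ddot{l} + C\dot{l} + K(l - l_0) = \xi$ (a single-spring reduction of the longitudinal-spring dynamics) with the 4th order Runge–Kutta method and the paper's step size $dt = 0.001$ s. All parameter values are illustrative placeholders, not the biologically derived ones:

```python
import numpy as np

# One damped spring, M*l'' + C*l' + K*(l - l0) = xi, integrated with RK4.
M, C, K, l0 = 0.01, 0.5, 50.0, 0.1
xi = 1.0                               # constant control force
dt = 0.001                             # the paper's step size, in seconds

def deriv(state):
    l, v = state
    return np.array([v, (xi - C * v - K * (l - l0)) / M])

state = np.array([l0, 0.0])            # start at rest at the natural length
for _ in range(5000):                  # 5 s of simulated time
    k1 = deriv(state)
    k2 = deriv(state + 0.5 * dt * k1)
    k3 = deriv(state + 0.5 * dt * k2)
    k4 = deriv(state + dt * k3)
    state = state + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

# After the transient dies out, the spring settles at the static
# equilibrium l0 + xi/K.
```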

Coordinates are defined on a base plane and a ceiling plane, denoted by $A_j(x_j, y_j, z_j)$ and $B_j(u_j, v_j, w_j)$, respectively (**Figure 1B**), where $j$ is the index number of the compartment. A vector expressing a longitudinal spring, $\mathbf{d}_{ij}$, is given by $\mathbf{d}_{ij} = \mathbf{p}_j + \mathbf{b}_{ij} - \mathbf{a}_{ij}$, where $\mathbf{p}_j = A_jB_j = [0\;\,0\;\,h_j]^T$ is the position vector of the center of the ceiling plane, $\mathbf{b}_{ij}$ is the position vector of joint $B_{ij}$, and $\mathbf{a}_{ij} = A_jA_{ij}$ is the position vector of joint $A_{ij}$. The length of the $i$th spring in compartment $j$, $l^{l}_{ij}$, can then be obtained as $l^{l}_{ij} = \sqrt{\mathbf{d}_{ij}^T \cdot \mathbf{d}_{ij}}$. The dynamics of a longitudinal spring are expressed by:

$$
\xi_{cij} + \xi_{vij} - f_{\text{BL}zij} = M_{lij}\ddot{l}^{l}_{ij} + C_{lij}\dot{l}^{l}_{ij} + K_{lij}\left(l^{l}_{ij} - l^{l}_{0ij}\right), \qquad (1)
$$

where $\xi_{cij}$ is the control force, $\xi_{vij}$ is the isovolumetric constraint force, which will also be explained further in Equation (7), $f_{\text{BL}zij}$ is the component force of joint $B_{ij}$ acting along spring $i$ of compartment $j$, $l^{l}_{0ij}$ is the initial length of the spring, $M_{lij}$ is the mass of the spring, $C_{lij}$ is the damping coefficient, and $K_{lij}$ is the stiffness coefficient (**Figure 1C**). Then, the rotation of the longitudinal springs can be formulated in frame $A$ by:

$$I_{ij}\dot{\boldsymbol{\omega}}_{lij} = \mathbf{d}_{ij} \times \mathbf{f}_{\mathrm{BL}xyij},\tag{2}$$

where $\mathbf{f}_{\mathrm{BL}xyij}$ is the component of the force at joint $B_{ij}$ acting perpendicular to the spring, $I_{ij}$ is the moment of inertia of the spring about $A_{ij}$, and $\boldsymbol{\omega}_{lij}$ is the angular velocity of the spring about $A_{ij}$, where $\boldsymbol{\omega}_{lij} = (\mathbf{d}_{ij}/l^l_{ij}) \times (\mathbf{v}_{Bij}/l^l_{ij})$, and $\mathbf{v}_{Bij}$ is the velocity of $B_{ij}$.

To interlink several compartments serially, the reaction forces between the longitudinal springs and the base, $\mathbf{F}_{ALij}$, need to be calculated. These reaction forces are obtained by:

$$\mathbf{F}_{ALij} = M_{lij} \ddot{\mathbf{d}}_{ij} - \mathbf{F}_{\mathrm{BL}ij},\tag{3}$$

where $\mathbf{F}_{ALij}$ is the joint force on $A_{ij}$, and $\mathbf{F}_{\mathrm{BL}ij} = \mathbf{f}_{\mathrm{BL}xyij} + \mathbf{f}_{\mathrm{BL}zij}$ is the joint force on $B_{ij}$.

The four transverse springs spread out from the central point of the ceiling plane. **Figure 1D** illustrates a transverse spring. The dynamics of the length of a transverse spring are described as:

$$\delta_{cij} + \delta_{vij} - \mathbf{F}_{\mathrm{CL}ij} \cdot \mathbf{u}_{bij} = M_{rij} \ddot{l}^r_{ij} + C_{rij} \dot{l}^r_{ij} + K_{rij}\left(l^r_{ij} - l^r_{0ij}\right), \tag{4}$$

where $\delta_{cij}$ is the control force, $\delta_{vij}$ is the isovolumetric constraint force (which will be discussed in detail later), $\mathbf{F}_{\mathrm{CL}ij} = -\mathbf{F}_{\mathrm{BL}ij}$ is the force of joint $B_{ij}$ acting on the ceiling plane, $\mathbf{u}_{bij}$ is the unit vector of $\mathbf{b}_{ij}$, $M_{rij}$ is the mass of the spring, $l^r_{0ij}$ is the initial radius of the ceiling plane, $C_{rij}$ is the damping coefficient, and $K_{rij}$ is the stiffness coefficient (**Figure 1D**). **Figure 1E** illustrates the ceiling plane. The equation for the motion of the ceiling plane is:

$$M_{\mathrm{ceil}j}\left[0\ 0\ \ddot{h}_j\right]^T = \mathbf{F}_{\mathrm{CL}1j} + \mathbf{F}_{\mathrm{CL}2j} + \mathbf{F}_{\mathrm{CL}3j} + \mathbf{F}_{\mathrm{CL}4j} + \mathbf{F}_{\mathrm{CC}j} + \mathbf{F}_{\mathrm{ex}j}, \tag{5}$$

where $h_j$ is the height of the ceiling plane of compartment $j$, $M_{\mathrm{ceil}j}$ is the mass of the ceiling plane, $\mathbf{F}_{\mathrm{CC}j} = -\mathbf{F}_{\mathrm{BC}j}$ is the joint force on $B_j$ acting on the ceiling plane, and $\mathbf{F}_{\mathrm{ex}j}$ is the external force. The rotation is formulated as:

$${}^{B_j}\mathbf{I}_{\mathrm{ceil}j}\left[\ddot{\delta}_j\ \ddot{\alpha}_j\ 0\right]^T = \sum_{i=1}^{4} {}^{B_j}\mathbf{b}_{ij} \times {}^{A_j}\mathbf{R}^T_{B_j}\mathbf{F}_{\mathrm{CL}ij} + {}^{B_j}\mathbf{T}_{\mathrm{CC}j} + {}^{B_j}\mathbf{T}_{\mathrm{ex}j},\tag{6}$$

where ${}^{B_j}\mathbf{T}_{\mathrm{CC}j}$ is the constraint torque of joint $B_j$ acting on the ceiling plane, ${}^{A_j}\mathbf{R}^T_{B_j}$ is the Euler rotation matrix, ${}^{B_j}\mathbf{b}_{ij}$ is the position vector of $B_{ij}$ expressed in frame $B_j$, and ${}^{B_j}\mathbf{I}_{\mathrm{ceil}j}$ is the inertia matrix of the ceiling plane.

As explained previously, the system is isovolumetric due to its muscular-hydrostat structure. This means that an increase in the longitudinal length results in a reduction in the cross-sectional area, and vice versa. A pair of antagonistic forces is applied to the longitudinal and transverse springs to guarantee the isovolumetric constraints. These are expressed as:

$$
\xi_{vij} = -K_{lv} \times |\delta_{cij}| \times (V_{cj} - V_{0j}),
\tag{7}
$$

$$\delta_{vij} = -K_{rv} \times \left| \sum_{i=1}^{4} \xi_{cij} + \mathbf{F}_{\mathrm{ex}j} \cdot \mathbf{p}_j / h_j \right| \times (V_{cj} - V_{0j}), \tag{8}$$

where $V_{cj}$ is the actual volume of compartment $j$, $V_{0j}$ is the initial volume of the compartment, $K_{lv}$ is the constraint force gain for the longitudinal springs, and $K_{rv}$ is the constraint force gain for the transverse springs. From Equation (7), it can be seen that the constraint force $\xi_{vij}$ is a function of the transverse spring force, $\delta_{cij}$, and the compartment volume change, $V_{cj} - V_{0j}$. By applying $\xi_{vij}$ to Equation (1), the longitudinal springs act against the transverse springs to drive the volume change to zero. Similarly, the other constraint force, $\delta_{vij}$, is obtained by Equation (8) and applied to Equation (4), so that the transverse springs cancel the volume change induced by the longitudinal springs. Note that the external force, $\mathbf{F}_{\mathrm{ex}j}$, is included in Equation (8) because it acts on the compartment at joint $B_j$, which may also change the length of the longitudinal springs and thus the volume of the compartment.
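The antagonistic constraint forces of Equations (7) and (8) can be sketched for a single compartment as follows. This is a minimal illustration with assumed gains and state values (all hypothetical, not from Table 1):

```python
import numpy as np

def isovolumetric_forces(V_c, V_0, delta_c, xi_c, F_ex, p, h,
                         K_lv=1.0, K_rv=1.0):
    """Antagonistic constraint forces of Eqs. (7)-(8) for one compartment.

    delta_c : transverse control forces, shape (4,)
    xi_c    : longitudinal control forces, shape (4,)
    F_ex    : external force vector, shape (3,)
    p       : position vector of the ceiling-plane centre, shape (3,)
    h       : height of the ceiling plane
    """
    dV = V_c - V_0
    # Eq. (7): longitudinal constraint forces oppose the transverse drive.
    xi_v = -K_lv * np.abs(delta_c) * dV
    # Eq. (8): transverse constraint force opposes the longitudinal drive
    # plus the axial component of the external force.
    delta_v = -K_rv * abs(xi_c.sum() + F_ex @ p / h) * dV
    return xi_v, delta_v

# Hypothetical state: the compartment is compressed (V_c < V_0).
xi_v, delta_v = isovolumetric_forces(
    V_c=0.9, V_0=1.0, delta_c=np.ones(4), xi_c=np.ones(4),
    F_ex=np.zeros(3), p=np.array([0.0, 0.0, 0.1]), h=0.1)
```

When the compartment is compressed, both constraint forces come out positive, i.e., they act to restore the volume, consistent with the sign convention of Equations (7) and (8).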

In addition to the forces generated by the muscles, typical external forces applied to the soft robotic arm in an underwater environment are gravity, buoyancy, and hydrodynamic forces. These are considered as distributed forces acting on each compartment as:

$$\begin{split} \mathbf{F}\_{\text{exj}} &= \mathbf{F}\_{\text{g}j} + \mathbf{F}\_{bj} + \mathbf{F}\_{\text{hyd}j} \\ &= \mathbf{F}\_{\text{g}j} + \mathbf{F}\_{bj} + \left( \mathbf{F}\_{\text{hyd}\mathbf{D}j} + \mathbf{F}\_{\text{hyd}\mathbf{L}j} \right), \end{split} \tag{9}$$

where $\mathbf{F}_{\mathrm{ex}j}$ is the total external force acting on compartment $j$, $\mathbf{F}_{gj}$ is the gravity force, $\mathbf{F}_{bj}$ is the buoyancy force, and $\mathbf{F}_{\mathrm{hyd}j}$ is the hydrodynamic force, composed of the water drag force, $\mathbf{F}_{\mathrm{hyd}Dj}$, and the water lift force, $\mathbf{F}_{\mathrm{hyd}Lj}$.

The direction of buoyancy always opposes gravity. Thus, the resulting force due to gravity and buoyancy is:

$$\mathbf{F}_{gj} + \mathbf{F}_{bj} = (\rho_o - \rho_w)\, V_{cj}\, g\, \mathbf{u}_{gj} = \rho_o V_{cj}\, g_e\, \mathbf{u}_{gj},\tag{10}$$

where $\rho_w$ is the density of water, $\rho_o$ is the density of the octopus arm, $V_{cj}$ is the volume of compartment $j$, $\mathbf{u}_{gj}$ is the unit vector indicating the direction of gravity for compartment $j$, $g$ is the gravity constant, and $g_e$ is the equivalent gravity constant defined as:

$$g_e = \left( 1 - \frac{\rho_w}{\rho_o} \right) g. \tag{11}$$

By adjusting the value of *ge*, both gravity and buoyancy forces are included in the model.

The hydrodynamic forces applied to an octopus arm during a movement through a fluid medium are shown in **Figure 2**. For compartment *j*, the drag force **F**hydD*<sup>j</sup>* is parallel to the velocity vector **V***<sup>j</sup>* of the fluid (or the uniform arm velocity in a stationary fluid) and the lift force **F**hydL*<sup>j</sup>* is perpendicular to **V***j*, according to

$$\mathbf{F}_{\mathrm{hyd}Dj} = \frac{1}{2} C_{Dj} A_{rj} \rho_w ||\mathbf{V}_j||^2\, \mathbf{u}_{vj} \tag{12}$$

$$\mathbf{F}_{\mathrm{hyd}Lj} = \frac{1}{2} C_{Lj} A_{rj} \rho_w ||\mathbf{V}_j||^2\, \mathbf{u}^{\perp}_{vj},\tag{13}$$

where $C_{Dj}$ and $C_{Lj}$ are the drag and lift coefficients, respectively, $A_{rj}$ is the reference area of compartment $j$, $\mathbf{u}_{vj}$ is the unit vector indicating the direction of $\mathbf{V}_j$, and $\mathbf{u}^{\perp}_{vj}$ is the unit vector normal to $\mathbf{V}_j$. The hydrodynamic force coefficients, $C_{Dj}$ and $C_{Lj}$, for a segmented arm are obtained from high-fidelity CFD simulations (Kazakidi et al., 2012). They were found to depend on the flow incidence angle, $\theta_j$, and the configuration of the arm (e.g., straight vs. bent). As a first approximation of the hydrodynamic forces, common in the robotics literature (Ijspeert, 2001; Kazakidi et al., 2012), the dependence on arm configuration was ignored. Therefore, a single value for each coefficient at specific angles $\theta_j$ for a straight arm was identified by CFD simulations and approximated by a 4th order polynomial in the simulator, expressed as follows:

$$C\_{D\dot{j}} = e\_1^D \theta\_{\dot{j}}^4 + e\_2^D \theta\_{\dot{j}}^3 + e\_3^D \theta\_{\dot{j}}^2 + e\_4^D \theta\_{\dot{j}} + e\_5^D,\tag{14}$$

$$C_{Lj} = e_1^L \theta_j^4 + e_2^L \theta_j^3 + e_3^L \theta_j^2 + e_4^L \theta_j + e_5^L,\tag{15}$$

where $e^D_{1-5}$ and $e^L_{1-5}$ are the parameters identified by CFD simulations. The hydrodynamic forces for each compartment were then computed according to Equations (12) and (13), where $\mathbf{V}_j$ was taken as the velocity of $B_j$. All the parameters used in this study are shown in **Table 1**.
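The coefficient polynomials of Equations (14)-(15) and the resulting drag and lift of Equations (12)-(13) can be sketched as follows for a planar velocity. The polynomial coefficients here are placeholders (constant drag and lift coefficients), not the values identified by CFD:

```python
import numpy as np

def hydro_coeff(theta, e):
    """Evaluate the 4th-order polynomial of Eqs. (14)-(15).

    e = [e1, e2, e3, e4, e5], highest degree first (np.polyval convention).
    """
    return np.polyval(e, theta)

def hydro_forces(V, theta, A_r, rho_w, e_D, e_L):
    """Drag (along the flow) and lift (perpendicular to it), Eqs. (12)-(13)."""
    speed = np.linalg.norm(V)
    if speed == 0.0:
        return np.zeros(2), np.zeros(2)
    u_v = V / speed
    u_perp = np.array([-u_v[1], u_v[0]])   # 90-degree rotation in the plane
    q = 0.5 * rho_w * A_r * speed**2       # dynamic pressure times area
    F_D = q * hydro_coeff(theta, e_D) * u_v
    F_L = q * hydro_coeff(theta, e_L) * u_perp
    return F_D, F_L

# Placeholder polynomials: constant C_D = 1.2, C_L = 0.3 (hypothetical).
e_D = [0.0, 0.0, 0.0, 0.0, 1.2]
e_L = [0.0, 0.0, 0.0, 0.0, 0.3]
F_D, F_L = hydro_forces(np.array([0.5, 0.0]), theta=0.0,
                        A_r=1e-3, rho_w=1022.0, e_D=e_D, e_L=e_L)
```

By construction, the drag acts along the velocity and the lift perpendicular to it, so the two force vectors are orthogonal.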

This model is intrinsically non-linear. The non-linearities of the system are partly introduced by its kinematics (Kang et al., 2012): the relation between the spring length and the ceiling plane posture (position and orientation) is non-linear, and therefore the system dynamics become non-linear as well. The calculation of the isovolumetric forces and hydrodynamic forces introduces further non-linearities. See Kang et al. (2012) for a detailed discussion of the model. In the majority of cases, these non-linearities are undesirable from the viewpoint of classical control theory. However, as previously mentioned in section 1, such a complex body could potentially be used as part of a computational device, if appropriate inputs and readouts are applied.

**FIGURE 2 | Hydrodynamic forces acting on the soft robotic arm.**

**Table 1 | Parameters for the soft robotic arm adopted in this paper.**

*Note that $l^r_{0ij}$ and $V_{0j}$ decrease monotonically with the compartment number within the range given in the table.*

### **2.2. EXPERIMENTAL PROCEDURE**

Our aim in this paper is to demonstrate whether a soft robotic arm can be exploited as a computational resource as well as a controller. Accordingly, we need to define inputs ($\mathrm{In}(t)$) to the system and how to generate the corresponding outputs ($O(t+1)$). In this paper, we apply position control of the base rotation as an input, and an output is generated as a weighted sum of the longitudinal spring lengths of all 20 compartments (**Figure 3**).

Based on this I/O scheme, we set two types of tasks for our demonstrations. The first is the emulation of non-linear dynamical systems (**Figure 3A**), which aims to show whether the soft robotic arm can be exploited as a computational resource providing sufficient non-linearity and memory. The second task is to embed closed-loop control onto the soft robotic arm itself (**Figure 3B**). In particular, we aim to embed non-linear limit cycles, which are especially appealing for the control of robots. Typically, such limit cycles are implemented using non-linear oscillators, such as central pattern generators (CPGs), or a network of such oscillators (Righetti and Ijspeert, 2008). We here aim to demonstrate that the body of the soft robotic arm itself can be used to generate such limit cycles.

As explained in section 1, our approach is comparable to a reservoir computing approach, which normally uses randomly coupled non-linear elements as a computational resource (Jaeger, 2002; Maass et al., 2002; Jaeger and Haas, 2004). In the conventional reservoir computing approach, since the computational elements are coupled randomly, each element possesses a uniform role in the computation in the statistical sense. On the other hand, if we exploit the robot's body as a reservoir, then, according to the intrinsic structure of the body, each part of the body shows qualitatively different dynamics, which may lead to specific role distributions corresponding to each body part. Accordingly, in this paper, we investigate how the computational role is distributed through the arm in each task. In the following subsections, we provide detailed descriptions of each task.

# *2.2.1. Task 1: non-linear dynamical system emulation tasks*

In order to evaluate the computational power of the system, we here set non-linear dynamical system emulation tasks, which are often used as benchmark tasks (Jaeger, 2002; Verstraeten et al., 2007; Hauser et al., 2011) in the context of recurrent neural network learning (Atiya and Parlos, 2000) and the reservoir computing approach (Jaeger, 2002; Maass et al., 2002; Jaeger and Haas, 2004). Each task requires a certain degree of non-linearity and memory to be performed by the system. As explained above,

**FIGURE 3 | Schematics of the task setups.** According to the input, the soft robotic arm shows passive body dynamics. By setting a linear readout for each longitudinal spring length in each compartment, an output ($O(t+1)$) is calculated as a weighted sum of all the spring lengths. By adjusting only the linear readout, the variables of a non-linear limit cycle are emulated and fed back as the next input to the system to generate the base rotation movement; the non-linear limit cycle is thereby embedded onto the arm in a closed-loop manner. See the text for details.

we first apply position control of the base rotation as an input (**Figure 3A**), expressed as follows:

$$
\theta(t) = \phi(t) = \text{Scale} \times \text{In}(t), \tag{16}
$$

$$\mathrm{In}(t) = 0.2\sin(2\pi f_1 t \times dt)\sin(2\pi f_2 t \times dt)\sin(2\pi f_3 t \times dt), \tag{17}$$

where $\theta(t)$ and $\phi(t)$ are the base rotations at timestep $t$ about the x-axis and y-axis, respectively. The parameter *Scale* linearly scales the raw input, $\mathrm{In}(t)$, to the specific range of the base angle [degree], $-R \le \theta(t), \phi(t) \le R$. This scaling parameter can be freely chosen, but should be fixed throughout the experiment. The detailed setting will be explained in section 3. The parameters $f_1$, $f_2$, and $f_3$ are set to 2.11, 3.73, and 4.33, respectively. Similar inputs were adopted in Hauser et al. (2011), Sumioka et al. (2011), and Nakajima et al. (2013).
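The input of Equations (16)-(17) can be generated directly. In this sketch, the value of *Scale* is a hypothetical choice that maps the raw input (whose magnitude is bounded by 0.2) onto $R = 60$ degrees:

```python
import numpy as np

dt = 0.001                      # simulation step of the arm model (s)
f1, f2, f3 = 2.11, 3.73, 4.33   # input frequencies from the text
Scale = 300.0                   # hypothetical choice: 0.2 * 300 = R = 60

def In(t):
    """Raw input of Eq. (17); t may be an integer timestep or an array."""
    return (0.2 * np.sin(2 * np.pi * f1 * t * dt)
                * np.sin(2 * np.pi * f2 * t * dt)
                * np.sin(2 * np.pi * f3 * t * dt))

t = np.arange(16000)            # one experimental trial
u = In(t)
theta = Scale * u               # base rotation theta(t) = phi(t), Eq. (16)
```

The product of three unit-amplitude sinusoids with incommensurate frequencies yields an aperiodic drive bounded by 0.2, so the scaled base angle stays within $\pm R$.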

According to the base rotation, the arm generates passive body dynamics. The output of the system is calculated by using the resulting spring dynamics as follows:

$$O(t+1) = \sum_{j=1}^{20} \sum_{i=1}^{4} w^{ij}_{\mathrm{out}}\, s_{ij}(t),\tag{18}$$

where $s_{ij}(t)$ is the length of longitudinal spring $i$ ($i = 1, 2, 3, 4$) in compartment $j$ ($j = 1, 2, \ldots, 20$) at timestep $t$. When the input is projected as the base rotation at timestep $t$, the corresponding length of the spring, $l^l_{ij}$, is collected as $s_{ij}(t)$. The linear readout weight, $w^{ij}_{\mathrm{out}}$, corresponds to each spring length. Overall, the dynamics of the 80 (= 4 × 20) spring lengths are the expression of the body dynamics in this paper.

In order to achieve the required computation, we train only the linear readout ($w^{ij}_{\mathrm{out}}$). Since we have 80 nodes, $s_{11}(t), s_{21}(t), s_{31}(t), s_{41}(t), s_{12}(t), \ldots, s_{420}(t)$, for the lengths of the springs at timestep $t$, by collecting the lengths of the springs over $M$ timesteps, we can generate an $M \times 80$ matrix $\mathbf{L}$. We also collect the corresponding target outputs for the $M$ timesteps in a matrix $\mathbf{T}$. Then, the optimal readout weights, $\mathbf{W} = [w^{11}_{\mathrm{out}}, w^{21}_{\mathrm{out}}, w^{31}_{\mathrm{out}}, w^{41}_{\mathrm{out}}, w^{12}_{\mathrm{out}}, \ldots, w^{420}_{\mathrm{out}}]^T$, can be obtained by $\mathbf{W} = \mathbf{L}^{*}\mathbf{T}$, where $\mathbf{L}^{*}$ is the Moore–Penrose pseudo-inverse, which is needed since $\mathbf{L}$ is not a square matrix in general.
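The readout training amounts to a few lines of linear algebra. The spring-length matrix and target below are synthetic stand-ins; only the pseudo-inverse step mirrors the procedure described above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the collected spring lengths: M timesteps of
# the 80 = 4 x 20 longitudinal spring lengths (rows are timesteps).
M = 1000
L = rng.standard_normal((M, 80))

# Synthetic target time series (in the paper, the targets of Eqs. 19-23);
# here it is linearly generated so the recovery can be checked exactly.
w_true = rng.standard_normal(80)
T = L @ w_true

# Optimal linear readout via the Moore-Penrose pseudo-inverse:
# the least-squares solution of L @ W = T.
W = np.linalg.pinv(L) @ T

O = L @ W   # readout output on the training data
```

Since $M > 80$, the system is overdetermined and the pseudo-inverse yields the least-squares optimum; on this noiseless toy data it recovers the generating weights exactly.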

According to the given input, the system should emulate the following three non-linear dynamical systems as outputs. We prepared three corresponding output nodes for the system, whose linear readouts are trained separately for each task. (This procedure is often called *multitasking*.) The first task is a 2nd order non-linear dynamical system, expressed as follows:

$$y(t+1) = 0.4y(t) + 0.4y(t)y(t-1) + 0.6\,\mathrm{In}^3(t) + 0.1,\ \tag{19}$$

where *y(t)* denotes the output of the system. The second task is a 10th order non-linear dynamical system, expressed as follows:

$$y(t+1) = 0.3y(t) + 0.05y(t) \left(\sum_{i=0}^{9} y(t-i)\right) \tag{20}$$

$$+\ 1.5\,\mathrm{In}(t-9)\,\mathrm{In}(t) + 0.1,\tag{21}$$

where *y(t)* denotes the output of the system. The third task is a discrete Volterra series, expressed as follows:

$$y(t+1) = A \times \sum_{\tau_1=0}^{200} \sum_{\tau_2=0}^{200} h(\tau_1, \tau_2)\, \mathrm{In}(t - \tau_1)\, \mathrm{In}(t - \tau_2), \tag{22}$$

$$h(\tau_1, \tau_2) = \exp\left(-\frac{(\tau_1 \times dt - \mu_1)^2}{2\sigma_1^2} - \frac{(\tau_2 \times dt - \mu_2)^2}{2\sigma_2^2}\right), \tag{23}$$

where $A$ is a scaling parameter set to 0.0001, and $y(t)$ and $h(\tau_1, \tau_2)$ denote the output of the system and a Gaussian kernel, respectively. The parameters $\mu_1$, $\mu_2$, $\sigma_1$, and $\sigma_2$ are set as $\mu_1 = \mu_2 = 0.1$ and $\sigma_1 = \sigma_2 = 0.05$. Any computational model that can emulate the above dynamical systems must have a certain degree of memory and non-linearity. Simply put, emulating the 10th order system requires more memory and non-linearity than the 2nd order system, and emulating the Volterra task requires more than the 10th order system.
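As a sketch, the 2nd and 10th order targets of Equations (19)-(21) can be generated from an input sequence as follows (the driving input here is a simple sinusoid, not the full Equation 17):

```python
import numpy as np

def target_2nd(u):
    """2nd-order system of Eq. (19) driven by the input sequence u."""
    y = np.zeros(len(u))
    for t in range(1, len(u) - 1):
        y[t + 1] = (0.4 * y[t] + 0.4 * y[t] * y[t - 1]
                    + 0.6 * u[t] ** 3 + 0.1)
    return y

def target_10th(u):
    """10th-order system of Eqs. (20)-(21): the recurrence needs the
    last 10 outputs and an input delayed by 9 steps."""
    y = np.zeros(len(u))
    for t in range(9, len(u) - 1):
        y[t + 1] = (0.3 * y[t]
                    + 0.05 * y[t] * np.sum(y[t - 9:t + 1])
                    + 1.5 * u[t - 9] * u[t] + 0.1)
    return y

# A toy driving input (a single sinusoid, hypothetical).
u = 0.2 * np.sin(2 * np.pi * 2.11 * np.arange(2000) * 0.001)
y2, y10 = target_2nd(u), target_10th(u)
```

For inputs bounded by 0.2, both recurrences stay bounded, which makes them usable as stable training targets.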

For the experimental procedure, the soft robotic arm is first set in the resting state with $\theta(t) = \phi(t) = 0$, and before beginning the experiment, we run the arm with Equation (16) for $T_{\mathrm{ini}}$ timesteps. This phase sets a different initial position of the arm for each experimental trial; $T_{\mathrm{ini}}$ is randomly drawn from 0 to 1000 timesteps for each trial. The actual experimental trial consists of 16,000 timesteps, where the first 1000 timesteps are for washout, the following 10,000 timesteps are for the training phase, and the final 5000 timesteps are for the evaluation phase. After $T_{\mathrm{ini}}$ timesteps, we continue running the arm with Equation (16) and the actual experiment begins. By collecting the lengths of the springs and the corresponding target outputs for each task in the training phase, we train the linear readouts for the three outputs by adopting the previously explained procedure. Using the trained linear readouts, we evaluate the performance of the system output by calculating the mean squared error (MSE), $\mathrm{MSE} = \frac{1}{n}\sum_{t=1}^{n}\left(O(t+1) - y(t+1)\right)^2$, where $n = 5000$. We compare the performance of the system with outputs generated by simple linear regression, $O(t+1) = a \times \mathrm{In}(t) + b$, where $a$ and $b$ are trained using the same time series as in the training phase. As is clear from the equation, the linear regressor uses only the current input to generate the output and thus contains neither non-linearity nor memory; therefore, whenever the system outperforms the linear regressor on a task, we can conclude that the non-linearity and memory required for the task are positively exploited from the system.
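The MSE evaluation and the linear-regression baseline can be sketched as follows; the target here is a toy series needing one step of memory (hypothetical, not one of the three benchmark systems), which a memoryless linear fit cannot capture:

```python
import numpy as np

def mse(o, y):
    """Mean squared error used in the evaluation phase."""
    return np.mean((o - y) ** 2)

def linear_regressor(u_train, y_train):
    """Fit O(t+1) = a * In(t) + b by least squares (the baseline)."""
    A = np.column_stack([u_train, np.ones_like(u_train)])
    (a, b), *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return a, b

rng = np.random.default_rng(1)
u = rng.uniform(-0.2, 0.2, 5000)
y = np.roll(u, 1) * u           # toy target: needs one step of memory

a, b = linear_regressor(u, y)
baseline_mse = mse(a * u + b, y)
```

Because the target is uncorrelated with the instantaneous input, the fitted slope and offset are near zero and the baseline error stays close to the target's variance, illustrating why a memoryless linear fit is a meaningful lower bar.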

## *2.2.2. Task 2: closed-loop control—embedding non-linear limit cycles*

As previously explained, in this task, we aim to embed non-linear limit cycles in a closed-loop manner. The major difference from Task 1 is that the outputs generated by the system are fed back to the system itself as a motor command (an input) for the next timestep (**Figure 3B**). In particular, as will be explained later, we here aim to embed several limit cycles, which each have two variables. Accordingly, the outputs generated for the next motor commands (namely, In1*(t)* and In2*(t)* for each variable) are projected to θ*(t)* and φ*(t)*, respectively. The situation is expressed as follows:

$$\begin{cases} \theta(t) = \text{In}\_1(t), \\ \phi(t) = \text{In}\_2(t), \end{cases} \tag{24}$$

$$\begin{cases} \text{In}\_1(t) = \text{O}\_1(t), \\ \text{In}\_2(t) = \text{O}\_2(t), \end{cases} \tag{25}$$

$$\begin{cases} O_1(t+1) = \sum_{j=1}^{20} \sum_{i=1}^{4} w^{ij}_{\mathrm{out},1}\, s_{ij}(t), \\ O_2(t+1) = \sum_{j=1}^{20} \sum_{i=1}^{4} w^{ij}_{\mathrm{out},2}\, s_{ij}(t), \end{cases} \tag{26}$$

where $w^{ij}_{\mathrm{out},1}$ and $w^{ij}_{\mathrm{out},2}$ are the linear readouts corresponding to the two outputs, $O_1(t)$ and $O_2(t)$, respectively. (Note that, in Equation (24), unlike Equation (16) in Task 1, the scaling parameter *Scale* is not introduced. As will be explained later, this is because we here aim to emulate limit cycles that are already scaled with a certain parameter value, so the scaling procedure is already included in the target outputs.)

As in the procedure explained in Task 1, to train the system, we adjust only the linear readouts, $w^{ij}_{\mathrm{out},1}$ and $w^{ij}_{\mathrm{out},2}$. However, the procedures differ in two points. First, during the training phase, we clamp the feedback from the system outputs and provide the target outputs as inputs; that is, in Equation (25), we set $\mathrm{In}_1(t) = x_1(t)$ and $\mathrm{In}_2(t) = x_2(t)$, where $x_1(t)$ and $x_2(t)$ are the target outputs at timestep $t$. Thus, the training phase is carried out in an open loop, where the system is forced into the desired operative state by the target signals (this is called *teacher forcing*) (Hauser et al., 2012). Second, when collecting the lengths of the springs in the training phase, we add white noise in the range $[-\nu, \nu]$. By doing this, we can expect that the obtained optimal readouts will generate the target outputs even under the influence of noise (Hauser et al., 2012). The appropriate degree of $\nu$ is determined heuristically for each task.
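The open-loop training step with state noise can be sketched as follows; the state and target matrices are synthetic stand-ins:

```python
import numpy as np

rng = np.random.default_rng(2)

def train_with_teacher_forcing(states, targets, nu):
    """Open-loop readout training with state noise (teacher forcing).

    states  : (M, 80) spring lengths collected while the TARGET signal,
              not the system's own output, drives the base rotation
    targets : (M, 2) the two limit-cycle variables to reproduce
    nu      : amplitude of the white noise added to the collected lengths
    """
    noisy = states + rng.uniform(-nu, nu, states.shape)
    # Least-squares readouts for both outputs at once: shape (80, 2).
    return np.linalg.pinv(noisy) @ targets

# Toy stand-ins for the collected data.
S = rng.standard_normal((500, 80))
X = S @ rng.standard_normal((80, 2))
W = train_with_teacher_forcing(S, X, nu=0.01)
```

The added noise acts as a regularizer: the readouts must reproduce the targets from slightly perturbed states, which makes the subsequent closed loop less sensitive to small deviations from the taught trajectory.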

Here, we aim to embed three non-linear limit cycles. The first is the dynamical system of the Van der Pol equations, expressed as follows:

$$\begin{cases} \dot{\boldsymbol{x}}\_1 = \boldsymbol{x}\_2, \\ \dot{\boldsymbol{x}}\_2 = -\boldsymbol{x}\_1 + \left(1 - \boldsymbol{x}\_1^2\right) \boldsymbol{x}\_2. \end{cases} \tag{27}$$

The second one is a limit cycle, which we call the *quadratic limit cycle* (Hauser et al., 2012; Khalil, 2002), expressed as follows:

$$\begin{cases} \dot{\mathbf{x}}\_1 = \mathbf{x}\_1 + \mathbf{x}\_2 - 5\mathbf{x}\_1 \left( \mathbf{x}\_1^2 + \mathbf{x}\_2^2 \right), \\ \dot{\mathbf{x}}\_2 = -2\mathbf{x}\_1 + \mathbf{x}\_2 - \mathbf{x}\_2 \left( \mathbf{x}\_1^2 + \mathbf{x}\_2^2 \right). \end{cases} \tag{28}$$

The third is a Lissajous curve with a frequency ratio of $f_1/f_2 = 2$, expressed as follows:

$$\begin{cases} x_1 = \sin(f_1 t), \\ x_2 = \sin(f_2 t). \end{cases} \tag{29}$$

Since each limit cycle is symmetric about the point (0, 0), we select the variable with the larger range and scale both variables to the desired range of the base rotation [degree], $-R \le \theta(t), \phi(t) \le R$. As in Task 1, this scaling parameter can be freely chosen, but should be fixed throughout the experiment. Thus, what the system should emulate here is $x'_1(t+1) = \mathrm{Scale} \times x_1(t+1)$ and $x'_2(t+1) = \mathrm{Scale} \times x_2(t+1)$ as $O_1(t+1)$ and $O_2(t+1)$, respectively. Further settings of the parameter *Scale* will be explained in section 3. For the Van der Pol system and the quadratic limit cycle, the ordinary differential equations are solved for each simulation timestep by using the 4th order Runge–Kutta method, where *dt* is set to 0.01. Note that the timescales of the arm model and these limit cycles are different. When we refer to time in seconds, we always use the timescale of the arm model; otherwise, we use simulation *timesteps* to avoid confusion.
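Generating the target trajectory for, e.g., the Van der Pol system of Equation (27) and scaling it to the base rotation range can be sketched as follows ($R = 60$ here is one hypothetical choice of the freely chosen scaling):

```python
import numpy as np

def rk4(f, x, dt):
    """One 4th-order Runge-Kutta step for an autonomous system dx/dt = f(x)."""
    k1 = f(x)
    k2 = f(x + dt / 2 * k1)
    k3 = f(x + dt / 2 * k2)
    k4 = f(x + dt * k3)
    return x + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

def van_der_pol(x):
    """Eq. (27): x1' = x2, x2' = -x1 + (1 - x1^2) x2."""
    return np.array([x[1], -x[0] + (1 - x[0] ** 2) * x[1]])

dt = 0.01                     # limit-cycle integration step, as in the text
x = np.array([0.1, 0.0])      # small initial condition inside the cycle
traj = []
for _ in range(20000):        # long enough to converge onto the limit cycle
    x = rk4(van_der_pol, x, dt)
    traj.append(x.copy())
traj = np.asarray(traj)

# Scale the variable with the larger range to R degrees (both variables
# share the same factor, preserving the shape of the cycle).
R = 60.0
Scale = R / np.max(np.abs(traj))
scaled = Scale * traj
```

The scaled trajectory then serves as the teacher-forced target $(x'_1, x'_2)$ for the base rotation angles.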

In the experimental procedure, the soft robotic arm is first set in the resting state with $\theta(t) = \phi(t) = 0$, as in Task 1. We run the system under the teacher forcing condition for 70,000 timesteps and, discarding the first 10,000 timesteps, use 60,000 timesteps as the training phase, with white noise of degree $\nu$ added to the spring lengths. After obtaining the optimal readout weights from these collected data, we initialize the arm to the resting state and again start to run the system under the teacher forcing condition. After 5000 timesteps, we switch the inputs to the system outputs generated by the trained readout weights (Equations 25 and 26) and check whether the target limit cycle has been embedded. Unlike Task 1, multitasking cannot be adopted (due to the feedback control), so each limit cycle is trained separately in a different trial.

# **3. RESULTS**

In this section, we present the results of each task, indicating the performance of our system. We would like to note again that all the tasks presented in this section are performed with "one body", the same soft robotic arm explained in section 2.1, where all the parameter settings of the arm are fixed throughout the experiments. In addition, for Task 1, emulations of the three non-linear dynamical systems are performed simultaneously in each experimental run (i.e., multitasking).

### **3.1. TASK 1: NON-LINEAR DYNAMICAL SYSTEM EMULATION TASKS**

**Figure 4** shows a typical example of the time series of $\mathrm{In}(t)$, the lengths of the springs, and the performance of each task during the evaluation phase. The plots show the case when $R$ is set to 60. We can clearly see that, according to the input projected to the base rotation, our soft robotic arm shows diverse passive body dynamics (**Figures 4A,B**). Regarding the task performance, the system output shows better performance than the linear regressor in all the tasks (**Figure 4C**). In the emulation task for the 2nd order system, the linear regressor also showed relatively good performance. However, as we can see in the plots, as the degree of non-linearity and memory of the task increases in the 10th order system and the Volterra task, the performance of the linear regressor decreases significantly (**Figure 4C**). On the other hand, our system shows relatively good performance even in the Volterra series emulation task (**Figure 4C**). **Table 2** shows the statistical comparisons between the MSE of the output of the system and that of the linear regressor for each task. The values show the averaged MSEs over 20 trials. In each task, our system showed a significantly lower MSE. These results suggest that our system is able to exploit the non-linearity and memory originating from the passive body dynamics of the soft robotic arm to perform the tasks.

**FIGURE 4 | Performance of Task 1 in terms of time series in the evaluation phase. (A)** The time series of $\mathrm{In}(t)$. **(B)** Corresponding body dynamics, expressed as the time series of the lengths of springs 1, 2, 3, and 4. The spring dynamics of all 20 compartments are overlaid. **(C)** Comparisons of the performance of the system output and the linear regressor for the emulation tasks of the 2nd order system (upper diagram), 10th order system (middle diagram), and Volterra task (lower diagram). In the plots, for the 2nd and 10th order systems and the Volterra task, the MSEs of the system output were 1.89 ×10<sup>−7</sup>, 3.80 ×10<sup>−6</sup>, and 8.90 ×10<sup>−5</sup>, respectively, and those of the linear regressor were 9.96 ×10<sup>−7</sup>, 4.95 ×10<sup>−4</sup>, and 9.48 ×10<sup>−4</sup>, respectively. For each task, the system output showed better performance than the linear regressor. Note that the output of the linear regressor is not a straight line, but a scaled version of the input with an offset (this outcome is due to the scaling of the figure).

Unlike a conventional computational network or device, our system receives its input as a mechanical rotation of the base, which generates physical motion of the arm. Thus, we can easily imagine that, for example, if the input scaling of the base rotation range, $R$, is small, the arm can only vibrate slightly, which does not generate diverse body dynamics. Moreover, since the input is linearly scaled to the base rotation range in our setting, the degree of this scaling changes not only the amplitude of the rotation but also its speed. Considering that the water friction on the arm depends non-linearly on the velocity and the angle of the compartments (Equations 12, 13, and 14), the properties of the body dynamics should change according to the degree of input scaling, $R$. Accordingly, the performance of our system should also change for each task. In order to validate this, we varied $R$ from 15 to 90 and observed how the performance of the system changed for each $R$. **Figure 5A** shows the results. We can confirm a different error profile with respect to $R$ for each task. First, small $R$ values (around 15) show the highest errors; the errors then gradually decrease with increasing $R$ in all tasks. But, for example, in the case of the 2nd order system, the error starts to increase again at around $R = 30$ and has a local maximum at around $R = 45$. In the case of the 10th order system, the error simply decreases monotonically as the value of $R$ increases. In the case of the Volterra task, the error shows a minimum at around $R = 55$ and then starts to increase monotonically as the value of $R$ increases. This suggests that, even though the mechanical structure of the arm is the same, certain behaviors of the arm can reveal especially high computational power in some tasks, but not in others.

**Table 2 | Comparisons of MSE between the output of the system and that of the linear regressor for each task.**

As explained in section 1, our system is essentially classified as a reservoir computing approach, in which a number of randomly coupled nodes are usually used as a computational resource, and where each node has a statistically uniform role. In our system, on the other hand, due to the intrinsic body structure, we can expect a specific role for each body part. Here, we investigate this point in two ways. First, when running the evaluation phase, we take out the readouts from one compartment and analyze the error. Note that, in this analysis, we use the readouts trained with all 20 compartments. By iterating this procedure for each compartment, we can investigate how each compartment contributed to the task performance when the readouts were fully connected. We call this *contribution ratio analysis* of the compartments. Second, we perform the entire experiment (i.e., washout, learning, and evaluation phases) with only 19 compartments by skipping one compartment, and compare the performance with that obtained using 20 compartments. The difference from the contribution ratio analysis is that here the readouts are trained from the outset to maximize the performance with 19 compartments, using entirely new readout weights. Since the readout weights are optimized without using a specific compartment, we can infer the overall computational power of each compartment as a deviation from the original 20-compartment case. We call this *computational power analysis* of the compartments. Hereafter, the base rotation range, $R$, is fixed to 60 for the analyses.

**FIGURE 5 | (A)** Dependence of the MSE on the base rotation range, $R$, for each task. The plots show the averaged values of MSE over 20 trials. The error bars show the standard deviation. The averaged value of MSE of the linear regressor was much higher than that of the system output with respect to each $R$ value (usually an order of magnitude higher). **(B)** Results of the contribution ratio analysis for each task. The horizontal axis shows the number of the compartment discarded in the […] standard deviations. **(C)** Results of the computational power analysis for each task. The horizontal axis shows the number of the compartment excluded to train the readout weights, and the vertical axis shows the performance ratio (PR). The plots show the averaged PR values over 20 trials, and the error bars show the standard deviations. See text for the details.

In the contribution ratio analysis, the experimental procedure is the same as that explained in section 2.2, except that the evaluation phase is performed with the readouts of one specific compartment taken out (thus, four nodes are excluded). We iterate this procedure for each compartment, using the same input time series and body dynamics within a trial, and calculate the MSE for each case. After testing all the compartments, we normalize the MSEs by the maximum value collected and obtain the *contribution ratio* (CR) for each compartment. If the CR is high for a compartment, it implies that this compartment contributed substantially to the task performance when the readouts were fully connected. **Figure 5B** shows the result of the contribution ratio analysis for each task over 20 trials. In the case of the 2nd order system, although compartments 2, 3, 4, 8, and 9 seem to have high CRs, their standard deviations are also high, while compartments 1, 6, 19, and 20 have low CRs with low standard deviations. This suggests that specific compartments, such as 1, 6, 19, and 20, consistently contribute less to the task performance, while the computational role for this task is relatively distributed over the remaining compartments, among which no specific compartment consistently shows a high contribution. In the case of the 10th order system and the Volterra task, the situation is different: there seem to be key compartments that consistently show high contributions to the task performance. For the 10th order system, compartments 16, 17, and 18 show high CRs, and for the Volterra system, compartments 14 and 15 do. Overall, these results suggest that our system adopts different strategies to realize its computational abilities, depending on the task. One strategy is to distribute the computational role throughout the entire arm; the other is to consistently select and rely on the motion of specific body parts.
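The masking step of the contribution ratio analysis can be sketched as follows (a minimal illustration in Python/NumPy under our own naming conventions; this is not the authors' analysis code):

```python
import numpy as np

def contribution_ratios(w_out, S, target, n_comp=20, nodes_per_comp=4):
    """Contribution ratio (CR) analysis: with the readout weights trained
    on all compartments, zero out one compartment's readouts at a time,
    re-evaluate the MSE, and normalize by the largest MSE obtained.

    w_out:  (n_comp * nodes_per_comp,) trained readout weights
    S:      (T, n_comp * nodes_per_comp) sensory (spring-length) series
    target: (T,) target output of the emulation task
    """
    mses = []
    for c in range(n_comp):
        mask = np.ones_like(w_out)
        # drop the readouts of compartment c (its nodes_per_comp nodes)
        mask[c * nodes_per_comp:(c + 1) * nodes_per_comp] = 0.0
        output = S @ (w_out * mask)
        mses.append(np.mean((output - target) ** 2))
    mses = np.asarray(mses)
    return mses / mses.max()  # CR close to 1 = large contribution
```

A compartment whose removal barely changes the output thus ends up with a CR near zero.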

Next, in the computational power analysis, the experiment is performed both under the default condition (using 20 compartments) and with a specific compartment excluded when training the readout weights, using the same input time series and body dynamics within a trial for each task. We calculated the MSEs in the evaluation phase for both cases and divided the MSE of the case without the specific compartment by that of the default condition, obtaining the performance ratio (PR) for each task in each trial. Thus, if the PR is larger than 1.0, the task performance is worse than in the default condition, and the value indicates how strongly the exclusion of the specific compartment affects the overall task performance. **Figure 5C** shows the result of the computational power analysis for each task over 20 trials. In the 2nd order system task, we can clearly see high PR values around the base of the arm (compartments 1, 2, 3, and 4), suggesting that these compartments carry significant information for this task. Similarly, in the Volterra task, there are high PR values in compartments 14, 15, and 16. On the other hand, in the 10th order system task, PR values lower than 1.0 appear around the tip of the arm (compartment numbers higher than 15), suggesting that these compartments have a negative influence on the performance of this task. We speculate that this is caused by an overfitting effect produced by the compartments around the tip of the arm: the network becomes too specialized to the learning data and fails to generalize to the new (evaluation) data. The reason for this should be explored in more detail in future work. Overall, we showed that there are specific regions of the body that carry positive or negative information for the performance of the tasks.
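The performance ratio computation can be sketched as below (a hedged illustration in Python/NumPy; the least-squares readout fit stands in for the paper's training procedure, and all names are ours):

```python
import numpy as np

def performance_ratio(S_train, y_train, S_eval, y_eval, comp, nodes_per_comp=4):
    """Computational power analysis: retrain the readout from scratch
    without compartment `comp` and divide the resulting evaluation MSE
    by that of the default (all-compartment) readout. PR > 1.0 means
    the exclusion degraded the task performance."""
    def fit_and_eval(Str, Sev):
        # linear readout via least squares (an assumed stand-in for
        # the readout training described in the paper)
        w, *_ = np.linalg.lstsq(Str, y_train, rcond=None)
        return np.mean((Sev @ w - y_eval) ** 2)

    mse_default = fit_and_eval(S_train, S_eval)
    keep = np.ones(S_train.shape[1], dtype=bool)
    keep[comp * nodes_per_comp:(comp + 1) * nodes_per_comp] = False
    mse_excluded = fit_and_eval(S_train[:, keep], S_eval[:, keep])
    return mse_excluded / mse_default
```

Excluding a compartment that carries task-relevant information drives the ratio well above 1.0.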

In this section, we first demonstrated that our soft robotic arm can emulate non-linear dynamical systems by positively exploiting the non-linearity and memory originating from its body dynamics. We also confirmed that the way the input is applied (in our case, the amplitude range of the arm movement) significantly affects the computational ability, and that the body parts show specialized roles due to their intrinsic morphological structure and the corresponding diverse body dynamics, unlike in the conventional reservoir computing approach. In the next section, we see how these body dynamics can potentially be used to control the arm's motion in a closed-loop manner by embedding non-linear limit cycles.

# **3.2. TASK 2: CLOSED-LOOP CONTROL—EMBEDDING NON-LINEAR LIMIT CYCLES**

In this section, we show the results for Task 2. Following the procedure described in section 2.2.2, we conducted a number of computer simulations to train the readouts with various values of the base rotation range *R* and the degree of white noise ν. We heuristically found that the system performance is extremely sensitive to the setting of these parameters [as opposed to the results presented for the simpler, abstract networks used in Hauser et al. (2012)]. (As for *R*, we already showed in the previous section that *R* changes the computational power of the system significantly.) If these parameters were not set appropriately, we often observed that, when the system was switched from the teacher forcing condition to closed-loop control, the arm gradually approached the resting state or showed unrealistic behaviors due to numerical problems. In the latter case, since we adopt position control of the base angle, if the output jumped far above its value one timestep before (for example, if |*O(t* + 1*)* − *O(t)*| > 10, the arm would have to rotate its base faster than 10<sup>4</sup> deg/s, which is unrealistic on the physical platform), the simulator showed numerical problems. We carefully discarded these cases from our experiment. Even if the system has high computational power, as we saw in the previous section, the closed-loop setting requires additional care due to stability issues: since the output, which includes the error, is fed back to the system as input, the error may grow larger and larger at each simulation timestep.
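The screening of numerically broken trials described above amounts to a simple rate-limit check on the output sequence. A minimal sketch (the per-timestep threshold of 10 comes from the text; the function name is our own):

```python
import numpy as np

def is_physically_plausible(O, max_step=10.0):
    """Reject a closed-loop trial if the base-angle command jumps by
    more than `max_step` between consecutive timesteps, i.e., if the
    base would have to rotate unrealistically fast."""
    return bool(np.all(np.abs(np.diff(O)) <= max_step))
```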

**Figures 6**–**8** show the typical results we obtained when the arm does not approach the resting state or the unrealistic behaviors mentioned above, for closed-loop control of the Van der Pol limit cycle, the quadratic limit cycle, and the Lissajous curve,

respectively.<sup>3</sup> For the parameters (*R*, ν), we adopted (*R*, ν) = (130, 1.5 × 10<sup>−6</sup>), (90, 1.0 × 10<sup>−6</sup>), and (10, 1.0 × 10<sup>−11</sup>) for the Van der Pol limit cycle, the quadratic limit cycle, and the Lissajous curve, respectively. For the Van der Pol limit cycle, we can see that the system is not implementing the target trajectory (**Figures 6B,C**), but rather an irregular one (**Figure 6C**). We also observed that this trajectory is not stable, but constantly changes in each cycle, and this change persists throughout the trial. However, the results for the quadratic limit cycle and the Lissajous curve show an almost complete fit with the target trajectory (**Figures 7B,C**, **8B,C**, respectively). We confirmed

**FIGURE 7 | Results for implementing the quadratic limit cycle.** At timestep 5000, the system is switched from the teacher forcing condition to the closed-loop control (black line). **(A)** The time series of the lengths of springs 1, 2, 3, and 4 for all 20 compartments. **(B)** Comparisons between the system output (O1*(t)* and O2*(t)* (blue lines)) and the target output (*x*1*(t)* and *x*2*(t)* (red lines)). **(C)** Comparisons between the system output (blue lines) and the target output (red lines) in the O1*(t)* - O2*(t)* (*x*1*(t)* - *x*2*(t)*) plane. The time series from timestep 5000 to timestep 70,000 are overlaid.

that these trajectories were stable enough to run for 1,000,000 timesteps without leaving the trajectories of the target limit cycles. These results suggest that the task performance of the closed-loop control is restricted not only by the degree of non-linearity or memory required for the limit cycles but also by how the arm is driven; these preferences are caused by the intrinsic structure of the body. From here on, using the systems embedding the quadratic limit cycle (**Figure 7**) and the Lissajous curve (**Figure 8**), we analyze the stability of the closed-loop controls and the role of each body part in these limit cycles, as in the previous section.

One important aspect in evaluating the embedded closed-loop control is its robustness against external perturbations. To test this, we added white noise, δ*O*1*(t)* and δ*O*2*(t)*, in the range δ*O*1,2*(t)* ∈ [−ε, ε], to the two motor outputs of the embedded control, i.e., *O*1*(t)* + δ*O*1*(t)* and *O*2*(t)* + δ*O*2*(t)*, during timesteps *t* = 6000–7000 as an example. **Figure 9** shows typical results of the performance of the embedded quadratic limit cycle for each noise level (ε = 10.0, 5.0, and 1.0).

**FIGURE 8 | Results for implementing the Lissajous curve.** At timestep 5000, the system is switched from the teacher forcing condition to the closed-loop control (black line). **(A)** The time series of the lengths of springs 1, 2, 3, and 4 for all 20 compartments. **(B)** Comparisons between the system output (O1*(t)* and O2*(t)* (blue lines)) and the target output (*x*1*(t)* and *x*2*(t)* (red lines)). **(C)** Comparisons between the system output (blue lines) and the target output (red lines) in the O1*(t)* - O2*(t)* (*x*1*(t)* - *x*2*(t)*) plane. The time series from timestep 5000 to timestep 70,000 are overlaid.

<sup>3</sup>Strictly speaking, to prove that the embedded trajectory is really a "limit cycle", we need to analytically show whether the trajectory is an attractor of the system. In our case, this is unrealistic because the equation governing the mechanical system is too complex, and we would have to rely on a heuristic approach. Accordingly, as we see later, we here call the embedded trajectory a "limit cycle" if and only if the trajectory can stay at the target limit cycle for 1,000,000 timesteps and the trajectory has a certain attraction when perturbed externally.

We can clearly see that even if the noise level is relatively large, such as ε = 10.0, the trajectories eventually recover to the limit cycle, which suggests that the embedded quadratic limit cycle is robust against external noise. We also confirmed that, even if we lengthen the duration over which the noise is added, the system can successfully recover its performance. Note that, although the perturbed trajectories came back toward the limit cycle (**Figure 9B**), the oscillation phase was often shifted (**Figure 9A**). This was mainly caused by the relatively long duration of the added noise; we observed that shortening this duration reduces the phase shift accordingly. Next, let us consider the embedded Lissajous curve. Compared to the quadratic limit cycle case, the system is less robust. When ε is larger than around 0.3, we often observed that the trajectories leave the limit cycle and never come back. **Figure 10** shows typical results of the performance of the embedded Lissajous curve for noise levels below 0.3 (ε = 0.3, 0.15, and 0.1). If the noise level was less than around 0.3, the system performance recovered toward the limit cycle, as in the quadratic limit cycle case. However, even in this noise range, we sometimes observed an unstable trajectory, as shown in **Figure 11**. In addition, similarly to the quadratic limit cycle case, even when the perturbed trajectories came back toward the limit cycle (**Figure 10B**), the oscillation phase was often shifted (**Figure 10A**).
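The perturbation protocol can be sketched as follows (a minimal Python/NumPy illustration; the 6000–7000 window follows the text, while the uniform noise distribution, seeding, and names are our assumptions):

```python
import numpy as np

def perturb_outputs(O1, O2, eps, t0=6000, t1=7000, seed=0):
    """Robustness probe: add uniform white noise in [-eps, eps] to both
    motor output series during timesteps t0..t1-1, leaving the rest of
    the series untouched."""
    rng = np.random.default_rng(seed)
    O1p, O2p = np.array(O1, dtype=float), np.array(O2, dtype=float)
    n = t1 - t0
    O1p[t0:t1] += rng.uniform(-eps, eps, size=n)
    O2p[t0:t1] += rng.uniform(-eps, eps, size=n)
    return O1p, O2p
```

In the closed-loop experiment, the perturbed outputs would be fed back to the arm in place of O1(t) and O2(t) during the noise window.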

Now, we move on to the role of each body part (compartments or springs), as in Task 1. Since the motor outputs and the body dynamics are reciprocally coupled through the feedback loop, the scheme we adopted in the Task 1 case, such as skipping one compartment, would cause unrealistic behavior of the arm due to numerical problems, as explained previously, and cannot always be adopted to appropriately evaluate the contributions of the body parts. Accordingly, we investigate the contribution of each body part in terms of robustness against noise; namely, we evaluate how noise added to each body part affects the overall system performance. As is obvious from the system construction for the closed-loop control (Equations 24–26), a slight difference in the motor outputs at timestep *t* can affect the corresponding sensory time series, i.e., the lengths of the

springs, and this effect influences the motor outputs at timestep *t* + 1. That is, according to a slight difference in the motor outputs, δ*O*1*(t)* and δ*O*2*(t)*, the sensory time series, *s*<sub>ij</sub>*(t)*, will deviate from the original as *s*′<sub>ij</sub>*(t)* = *s*<sub>ij</sub>*(t)* + δ*s*<sub>ij</sub>*(t)*, where *s*′<sub>ij</sub>*(t)* is the actual spring length at timestep *t*. Then, from Equation (26), the outputs at timestep *t* + 1 can simply be expressed as:

$$O'(t+1) = \sum\_{j=1}^{20} \sum\_{i=1}^{4} w\_{\text{out}}^{ij} \, s'\_{ij}(t),\tag{30}$$

$$= \sum\_{j=1}^{20} \sum\_{i=1}^{4} w\_{\text{out}}^{ij} \left( s\_{ij}(t) + \delta s\_{ij}(t) \right),\tag{31}$$

$$= O(t+1) + \sum\_{j=1}^{20} \sum\_{i=1}^{4} w\_{\text{out}}^{ij} \, \delta s\_{ij}(t),\tag{32}$$

$$= O(t+1) + \delta O(t+1),\tag{33}$$

where *O*′*(t)* and *O(t)* are the actual and original motor outputs at timestep *t*, respectively. Note that, for simplicity, we dropped the index distinguishing the two outputs. Since the deviation of the motor outputs is expressed as $\delta O(t+1) = \sum\_{j=1}^{20} \sum\_{i=1}^{4} w\_{\text{out}}^{ij} \, \delta s\_{ij}(t)$, we can investigate how noise applied to a single spring at timestep *t* affects the motor outputs at timestep *t* + 1 by fixing the other springs at their original values. By investigating how δ*O(t* + 1*)* evolves through time, we can also evaluate the effect of the noise on the overall system performance.
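The one-step propagation above is just an inner product between the readout weights and the sensory deviations, which can be written as (a minimal sketch; names are ours):

```python
import numpy as np

def output_deviation(w_out, delta_s):
    """One-step propagation of sensory deviations to the motor output:
    delta_O(t+1) = sum_ij w_out[ij] * delta_s[ij](t). If only a single
    spring (i, j) is perturbed, this collapses to w_out[ij] * delta_s[ij],
    so the readout weight alone sets that spring's sensitivity."""
    return float(np.dot(np.ravel(w_out), np.ravel(delta_s)))
```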

Now, let us assume that noise is applied to the sensory value of spring *i* in compartment *j* at timestep *t*, with the other sensory values fixed at their original values. Then, the deviation of the motor output at timestep *t* + 1 can simply be expressed as $\delta O(t+1) = w\_{\text{out}}^{ij} \, \delta s\_{ij}(t)$, which straightforwardly means that the magnitude of δ*O(t* + 1*)* depends only linearly on the readout weight of the spring in question. Therefore, we can infer and compare how the noise added to each sensory value affects the

motor output at timestep *t* + 1, for a fixed noise value, simply by checking the weight distributions. Simply put, if $|w\_{\text{out}}^{ij}|$ is large, then the effect of noise applied to spring *i* in compartment *j* at timestep *t* on the outputs at timestep *t* + 1 is also large, meaning that this spring makes a large contribution to the transition of the outputs at timestep *t* + 1. **Figure 12** shows the readout weight distributions of *w*<sub>out,1</sub> and *w*<sub>out,2</sub> for the embedded quadratic limit cycle (**Figure 12A**) and Lissajous curve (**Figure 12B**). We can see that the distributions show a characteristic pattern for each limit cycle. For example, in the embedded quadratic limit cycle case, the weights often show a symmetric, corresponding distribution over the springs in each compartment (e.g., between springs 1, 2 and springs 3, 4 in compartments 3–13), and even across *w*<sub>out,1</sub> and *w*<sub>out,2</sub> (**Figure 12A**). In the embedded Lissajous curve case, this type of symmetric, corresponding distribution can also be found within each weight vector, but not across *w*<sub>out,1</sub> and *w*<sub>out,2</sub> (**Figure 12B**). In addition, for the distribution of *w*<sub>out,1</sub>, the weights are almost zero in compartments 7–20, which means that external noise applied to these compartments at timestep *t* hardly affects the motor outputs at timestep *t* + 1.

To systematically proceed with this line of analysis for the overall system performance, we need to confirm whether a large deviation in the sensory values at a specific time also leads to a large deviation of the motor outputs over time. Although this seems trivial in our system, it is not trivial in general, because the transition δ*O(t)* → δ*s*<sub>ij</sub>*(t)* depends on the construction of the body (for example, imagine a sensory value that exhibits saturation).<sup>4</sup> To evaluate this, we added small white noise of magnitude ε to the motor outputs only at timestep 6000, and investigated how the differences in the motor outputs, |δ*O(t)*|, and in the spring lengths, |δ*s*<sub>ij</sub>*(t)*|, carry on over time according to ε, by measuring the mean squared errors between the actual and original motor commands,

$$\text{MSE}\_{O1} = \frac{1}{T} \sum\_{t=1}^{T} \left( O'\_1(t) - O\_1(t) \right)^2, \qquad \text{MSE}\_{O2} = \frac{1}{T} \sum\_{t=1}^{T} \left( O'\_2(t) - O\_2(t) \right)^2,$$

and between the actual and original sensory time series,

$$\text{MSE}\_{\text{spring}} = \frac{1}{80 \times T} \sum\_{t=1}^{T} \sum\_{j=1}^{20} \sum\_{i=1}^{4} \left( s'\_{ij}(t) - s\_{ij}(t) \right)^2,$$

where *T* = 500 throughout this experiment. Note that we consider only a small noise range, ε ∈ [0.005, 0.1], since if the noise level is too large, the trajectories often show phase shifts, as we saw in the previous analysis (**Figures 9**, **10**), which would make the measures miss the intended difference even if the trajectories stayed on the original limit cycles. **Figure 13A** shows the averaged MSE<sub>O1</sub>, MSE<sub>O2</sub>, and MSE<sub>spring</sub> for each embedded limit cycle. We can clearly confirm that as the noise level ε increases, each measure also increases, which means that a large deviation of the motor outputs at a specific time also leads to a large deviation of the motor outputs over time. In addition, in the embedded Lissajous curve case, MSE<sub>O1</sub> is larger than MSE<sub>O2</sub> for each ε value, which suggests that the output *O*1*(t)* is more sensitive than *O*2*(t)* (**Figure 13A**, right).
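The three error measures used above are straightforward to compute from the recorded time series (a minimal Python/NumPy sketch with our own array-layout assumptions):

```python
import numpy as np

def deviation_measures(O_act, O_orig, S_act, S_orig):
    """Error measures for the perturbation experiment: MSE_O1 and
    MSE_O2 between actual and original motor commands, and MSE_spring
    averaged over all 80 springs (4 springs x 20 compartments) and T
    timesteps. O_*: (2, T) arrays; S_*: (T, 80) arrays."""
    mse_o1 = float(np.mean((O_act[0] - O_orig[0]) ** 2))
    mse_o2 = float(np.mean((O_act[1] - O_orig[1]) ** 2))
    # the mean over T x 80 entries equals the 1/(80*T) normalization
    mse_spring = float(np.mean((S_act - S_orig) ** 2))
    return mse_o1, mse_o2, mse_spring
```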

Now, we are ready to investigate the role of each body part. According to the results shown in **Figure 13A**, the weights assigned to each spring directly reflect how the noise added to each sensory value affects the overall system performance. Corresponding to the results in **Figure 13A**, we first calculated $d\_w = \sqrt{(w\_{\text{out},1}^{ij})^2 + (w\_{\text{out},2}^{ij})^2}$ for each spring *i* in each compartment *j*, since the size of the noise at the motor outputs can be expressed as $\delta s\_{ij} \sqrt{(w\_{\text{out},1}^{ij})^2 + (w\_{\text{out},2}^{ij})^2} = \sqrt{\delta O\_1(t)^2 + \delta O\_2(t)^2} \leq \epsilon$ by scaling δ*s*<sub>ij</sub> into the appropriate range. Thus, the value of *d*<sub>w</sub> directly reflects the contribution of each spring to the overall system performance for a fixed noise value. **Figure 13B** shows the value of *d*<sub>w</sub> for each spring *i* in each compartment for the embedded quadratic limit cycle and Lissajous curve. Interestingly, the value of *d*<sub>w</sub> for each spring is almost the same within each compartment for both limit cycles, which means that the contribution of each body part can be expressed at the compartment level. In the case of the quadratic limit cycle, *d*<sub>w</sub> shows an almost zig-zag pattern and gradually decreases as the compartment number increases from the base toward the tip (**Figure 13B**, left). In the case of the Lissajous curve, only the compartments around the base show high values of *d*<sub>w</sub> (**Figure 13B**, right). These results suggest that, depending on the embedded limit cycle, the sensitivity of each body part to noise, and the degree to which it affects the overall system performance, show different tendencies.
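The sensitivity measure is a per-spring Euclidean norm of the two readout weights, which can be sketched as (names are ours):

```python
import numpy as np

def dw_per_spring(w_out_1, w_out_2):
    """Sensitivity measure d_w = sqrt(w_out,1^2 + w_out,2^2) for each
    spring's pair of readout weights: springs with a large d_w are the
    ones where sensory noise of a fixed size perturbs the closed-loop
    outputs the most. Inputs are (n_springs,) weight arrays."""
    return np.sqrt(np.asarray(w_out_1) ** 2 + np.asarray(w_out_2) ** 2)
```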

<sup>4</sup>Theoretically, this is to analyze the basin structure surrounding the original trajectory. Due to the number of parameters, instead of analyzing the basin structure according to δO1*(t)* and δO2*(t)*, we analyzed the basin volume in terms of the error measures introduced later according to |δO*(t)*|.

In this section, we have investigated whether our soft robotic arm can embed limit cycles in a closed-loop manner, and have shown that several properties, such as the system performance, the robustness against external noise, and the role of each body part, differ according to which limit cycle is embedded. Since we adjust only the linear readouts while using the same body, we can speculate that these limit-cycle-specific properties are caused by the intrinsic body structure.

# **4. DISCUSSION**

In this paper, using a dynamic simulator of a soft robotic arm inspired by the octopus, we demonstrated that the robot's body dynamics are already capable of emulating non-linear dynamical systems and of embedding non-linear limit cycles in a closed-loop manner, by adjusting only the fixed linear readouts. The arm we used did not contain any rigid components. Instead, it is soft, consisting only of springs, which are aligned to mimic the muscular structure of the octopus. This resulted in several compartments, each of which had a specific muscular-hydrostat property, which forced the springs to be coupled in a well-defined but constrained manner. In addition, the arm was assumed to be immersed in an underwater environment, for which the friction constants were identified via CFD simulations. All these factors, including the intrinsic body structure and its interaction with the environment, generated diverse body dynamics with rich non-linearity and memory. The technique presented here allowed us to exploit these properties as computational resources. In addition, it makes it possible to infer the amount of non-linearity and memory that can potentially be exploited for information processing in terms of task performance. For roboticists, this may open up a way to quantitatively characterize which control is efficient for which body design, as well as to outsource the control load to the body parts. Although we kept the arm's mechanical structure as bio-inspired as possible throughout the analyses, it would also be meaningful to investigate how the information processing capability would change if the arm's mechanical properties (such as stiffness, damping,

drag parameters) are altered. This line of experimentation will be included in our future work.

For the tasks of emulating non-linear dynamical systems, in addition to demonstrating high computational power, we showed that each body part has a specific role according to the task type. Additionally, for the closed-loop control tasks, we showed that the arm prefers some limit cycles over others (i.e., the quadratic limit cycle and the Lissajous curve could be embedded, while the Van der Pol limit cycle could not). Such clear, task-specific regularities are usually not observed in conventional reservoir computing, where the reservoir consists of randomly coupled non-linear computational elements, suggesting that these properties originate from the intrinsic body structure.

From a biological systems point of view, this result seems natural. In nature, animals adapt to their respective ecological niches, evolving their body morphology to survive within their environment. The octopus is no exception; its specific body structure is specialized for survival in a complicated underwater environment, enabling it to behave efficiently in particular ways. In this context, it would be interesting to investigate in future work whether the arm could embed more biologically plausible behaviors. For example, as we mentioned earlier, it is well known that the octopus adopts a specific strategy for reaching, called *bend propagation* (Gutfreund et al., 1996; Gutfreund, 1998; Sumbre et al., 2001; Yekutieli et al., 2005a,b). In this motion, it is suggested that the CNS only initiates the movement and all the muscle activations are handled at the PNS level (Gutfreund et al., 1996; Gutfreund, 1998; Sumbre et al., 2001). Several studies have investigated this behavior by directly extracting the muscle contraction patterns from real octopuses and by externally applying these patterns to octopus arm models (Gutfreund et al., 1996; Gutfreund, 1998; Yekutieli et al., 2005a,b). On this point, the technique presented here may reveal further insights into this overall scheme by including the role of the arm's body dynamics. Considering that the PNS does not have plasticity (Kandel et al., 2000), it would be worth investigating how the arm's body dynamics, together with the PNS modeled as a linear, static feedback loop onto the arm, embed the motor patterns of bend propagation according to the initiation command sent by the CNS. This line of experiments can be investigated in future work.

A growing number of documented cases in nature support the view that certain morphologies found in animals facilitate a kind of computation. This observation is usually characterized by the term *morphological computation*. For example, the non-linear, non-homogeneous spatial arrangement of the ommatidia in insect eyes is denser toward the front than on the side, compensating for motion parallax, which is non-linear (Franceschini et al., 1992). The morphology counteracts the non-linearity introduced by the parallax; hence, the complexity of the computational task of steering through obstacles based on visual input is reduced. Since the resulting task for the brain is now simpler and no longer non-linear, thanks to the "clever" morphology, one could argue that part of the computation is conducted by the morphology. While this is a very simple case of morphological computation, since the given morphology represents only a static, non-linear mapping, the concept goes further if we consider, for example, soft, compliant bodies. Such bodies exhibit interesting dynamic properties, such as fading memory and non-linearity. One example of such complex computations outsourced to the physical layer is passive walkers (Collins et al., 2005). Their design pushes the limits of what can be outsourced to the physical body, insofar as no controller (i.e., CPU) is needed at all. Mechanical designs inspired by the musculoskeletal structure, enabling "preflexes" that can self-stabilize movements through elastic material properties, provide another example (Brown et al., 1995; Blickhan et al., 2007; Proctor and Holmes, 2010). Their morphology (i.e., the mechanical, soft design and the environment) is able to "do" all the computations needed to walk robustly. While such robots are impressive, their disadvantage is their inflexibility, since the computation is restricted to a fixed physical body.
In a biological system, a sensible distribution of the computation between the body and the brain is more probable. Despite the number of biological examples and the series of robots that have been built with the concept of morphological computation in mind, there are still only a few studies characterizing the concept within a quantitative framework (Hauser et al., 2011, 2012; Füchslin et al., 2013). In this context, we believe that the approach presented here is an interesting direction for further study.

# **REFERENCES**


# **FUNDING**

This work was supported by the European Commission in the ICT-FET OCTOPUS Integrating Project (EU project FP7- 231608), and was partially supported by JSPS Postdoctoral Fellowships for Research Abroad.

# **ACKNOWLEDGMENTS**

We would like to thank Stefan Häusler, Tao Li, and Junichi Kuwabara for fruitful discussions and helpful suggestions.







**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 26 January 2013; accepted: 17 June 2013; published online: 09 July 2013.*

*Citation: Nakajima K, Hauser H, Kang R, Guglielmino E, Caldwell DG and Pfeifer R (2013) A soft body as a reservoir: case studies in a dynamic model of octopus-inspired soft robotic arm. Front. Comput. Neurosci. 7:91. doi: 10.3389/ fncom.2013.00091*

*Copyright © 2013 Nakajima, Hauser, Kang, Guglielmino, Caldwell and Pfeifer. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

# Kinematic primitives for walking and trotting gaits of a quadruped robot with compliant legs

# *Alexander T. Spröwitz\*, Mostafa Ajallooeian, Alexandre Tuleu and Auke Jan Ijspeert*

*Biorobotics Laboratory, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland*

### *Edited by:*

*Martin Giese, University Clinic Tuebingen/Hertie Institute, Germany*

### *Reviewed by:*

*Luc Berthouze, University of Sussex, UK Gustavo A. Medrano-Cerda, Istituto Italiano di Tecnologia, Italy*

### *\*Correspondence:*

*Alexander T. Spröwitz, Biorobotics Laboratory, École Polytechnique Fédérale de Lausanne, EPFL STI IBI BIOROB, INN 237, Station 14, CH-1015 Lausanne, Switzerland e-mail: alexander.sproewitz@epfl.ch*

In this work we investigate the role of body dynamics in the complexity of kinematic patterns in a quadruped robot with compliant legs. Two gait patterns, lateral sequence walk and trot, along with leg length control patterns of different complexity, were implemented in a modular, feed-forward locomotion controller. The controller was tested on a small quadruped robot with a compliant, segmented leg design, and led to self-stable and self-stabilizing robot locomotion. *In-air* stepping and *on-ground* locomotion leg kinematics were recorded, and the number and shapes of motion primitives accounting for 95% of the variance of the kinematic leg data were extracted. This revealed that kinematic patterns resulting from feed-forward control had a lower complexity (*in-air* stepping, 2–3 primitives) than kinematic patterns from *on-ground* locomotion (4 primitives), although both experiments applied identical motor patterns. The complexity of the *on-ground* kinematic patterns increased through ground contact and mechanical entrainment. The complexity of the observed kinematic *on-ground* data matches that reported from level-ground locomotion data of legged animals. The results indicate that a very low complexity of modular, rhythmic, feed-forward motor control, in combination with passive, compliant legged hardware, is sufficient for level-ground locomotion.

**Keywords: motion primitives, locomotion patterns, central pattern generator, quadruped robot, passive leg compliance, entrainment, principal component analysis, walk and trot**

# **1. INTRODUCTION**

The overlapping fields of functional leg anatomy, leg and body compliance, and neuro-control in legged locomotion are intensively researched. Results potentially allow insights into the structure and functionality of the nervous system of animals. Not surprisingly, roboticists have started researching bio-inspired, legged robot systems, both on the functional morphological level, and the controller level. Though intrinsically limited (Webb, 2001), robots are beginning to be used as proof-of-concept platforms (Raibert et al., 1984; Raibert, 1990; Full and Koditschek, 1999; Ijspeert et al., 2007; Umedachi et al., 2010; Zhou and Bi, 2012).

In this experimental work we present results comparing basic patterns measured from kinematic leg data of *in-air* stepping movements of a suspended, legged, compliant robot, and of *on-ground* locomotion of the same robot during lateral sequence walk and trot. As a measure we used the number of significant principal components (PCs) extracted from joint-angle data of the robot's compliant, multi-segment legs. We compared four parameter setups, altering the robot's gait control parameters between walk and trot, its speed, and the modules and complexity of its locomotor drive signals. For the robot hardware, special attention was paid to the in-series and in-parallel leg compliance. The robot was designed such that its legs' compliance and cable-driven actuation were the medium of *change of kinematic complexity* between feed-forward-sent and observed kinematic joint patterns, through mechanical entrainment emerging during level-ground (flat ground) locomotion. During *in-air* leg movement, leg-joints were not exposed to gravitational or inertial forces acting on the robot body (they were only exposed to those acting on the leg's own light-weight segments). The robot's legs were hanging freely *in-air*, and the replayed motion patterns represented the kinematic complexity of the feed-forward locomotor controller. During *on-ground* locomotion, leg-joints were compressed by gravitational forces acting on the robot body, and by Newtonian dynamics acting on the robot's body, deflecting the robot's compliant limbs. Self-stable and self-stabilizing locomotion only emerged if appropriate feed-forward patterns were sent to the robot. In all other cases, the robot would stumble, fall, or move only very poorly, i.e., very slowly or even backwards.

Physical, biological leg compliance was found to function as an energy recoil mechanism, allowing animals to re-use negative work and reduce the metabolic cost of locomotion (Alexander, 1984, 1990, 1991; Biewener and Blickhan, 1988). Sources of compliance were found both in muscles and muscle complexes (Witte et al., 1994; Labeit and Kolmerer, 1995; Wilson et al., 2003), and in tendons and aponeuroses (Alexander, 1977; Witte et al., 1997; Biewener, 1998; Gregersen et al., 1998; Lichtwark et al., 2007). If biological systems rely so strongly on in-series and in-parallel compliant "locomotion hardware," the nervous system producing motor control patterns has to be able to cope with leg compliance, and is required to send the corresponding control signals. Ivanenko et al. (2002) present experiments with walking humans, and *in-air* stepping with varying amounts of gravity, leading to no or limited sensory feedback through foot contact. They observe that during air-stepping ". . . motor patterns are transformed in simple harmonic angular motion of the lower limb segments associated with alternating activation of antagonist muscles" (cf. Ivanenko et al., 2002, p. 3087). Although they draw conclusions about the role of peripheral sensory input, we build in this work on the idea that the complexity of leg kinematics during locomotion can be increased through compliant, purely mechanical components of the locomotor apparatus and its interactions with the ground. In turn, this could mean that the underlying layers of motor control do not need to send control signals as complex as one might have guessed from kinematic studies, while still achieving a sufficiently complex, adaptive kinematic output.

The spring-loaded inverted pendulum (SLIP) framework describes how an abstracted, energy-conservative point-mass system can make use of a compliant, in-series elastic leg design to run self-stably (Blickhan, 1989; Seyfarth et al., 2002). In a SLIP simulation, minimal control effort is required to stabilize locomotion: control of the leg's angle of attack is necessary, together with monitoring the system's hip apex height. A feed-forward SLIP controller will lead to self-stable locomotion patterns, also under the influence of small perturbations. Full and Koditschek (1999) describe the SLIP model as a "template." A biological or robotic legged system still requires an "anchor" to map the system mechanics to the template.
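The stance-phase dynamics behind this template can be sketched in a few lines of code; the mass, stiffness, rest length, and angle of attack below are illustrative values only, not parameters of the robot or of the cited studies:

```python
import math

def slip_stance(vx0, vy0, m=1.0, k=2000.0, l0=0.2, alpha_deg=68.0,
                g=9.81, dt=1e-5):
    """Integrate one SLIP stance phase with the foot pinned at the origin.

    alpha_deg: leg angle of attack at touchdown, measured from the ground.
    Returns hip position and velocity (x, y, vx, vy) at takeoff,
    i.e., when the leg has returned to its rest length l0.
    """
    a = math.radians(alpha_deg)
    x, y = -l0 * math.cos(a), l0 * math.sin(a)   # hip behind the foot at touchdown
    vx, vy = vx0, vy0
    while True:
        l = math.hypot(x, y)
        if l >= l0 and vy > 0:                   # leg unloaded, moving up: takeoff
            return x, y, vx, vy
        f = k * (l0 - l)                         # linear leg spring force
        ax = f * (x / l) / m
        ay = f * (y / l) / m - g
        vx += ax * dt; vy += ay * dt             # semi-implicit Euler step
        x += vx * dt;  y += vy * dt

# descending at touchdown; the spring redirects the vertical velocity upward
x, y, vx, vy = slip_stance(vx0=1.0, vy0=-0.5)
```

Because the model is energy-conservative, total mechanical energy at takeoff matches that at touchdown; repeated apex-to-apex iterations of such a stance map are what the cited self-stability analyses examine.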

In this work, central pattern generator (CPG) control signals in feed-forward mode were implemented as high-gain position signals from the robot's RC-servo motors, actuating hip joints and leg length. Interpreting CPG output patterns as position signals presents an abstraction and simplification of animal motor control and actuation. In animals, motor control signals of the nervous system are interpreted by sets of antagonistic muscles and muscle groups to produce joint torques (Inman et al., 1952). Hence, animals can control their limbs in many modes, including "position-control" (muscle lengths leading to joint angles), but also with adjustable joint torques (Winter, 1983; Fischer and Blickhan, 2006).

With the help of a robotic tool like the Cheetah-cub robot, this work intends to shed light on the interplay between rhythmic, modular, feed-forward motor control and the mechanical entrainment leading to stable gait patterns. Mechanical entrainment resulted from an in-series and in-parallel compliant, segmented, robotic, bio-inspired leg design, appropriate actuator control patterns, gravity and body dynamics, and ground contact during locomotion. Though not within the scope of quadruped walk and trot locomotion, entrainment in articulated robotic systems has been examined before; Lungarella and Berthouze (2002) designed a setup with a swinging humanoid robot, showing that physical entrainment led to a larger basin of attraction for the space of control parameters leading to stable swinging motions. Entrainment was also achieved in the presence of non-linear mechanical coupling of the humanoid to its environment (Berthouze and Lungarella, 2004). This is interesting because ground contact of Cheetah-cub robot also presents a non-linear perturbation (alternating leg swing and stance phases).

Here, the simplified case of a position-controlled system with serial compliance allows us to focus on a few components, and to set aside the effects of additional control and hardware complexity (e.g., explicit control feedback loops, corresponding controller architecture, or torque control strategies). As mentioned earlier, animals feature components of (simple) in-series leg compliance (Alexander, 1984; Gregersen et al., 1998). This morphological feature also works well for robots. In-series compliance can lead to reduced impact forces, which consequently can reduce control complexity (Meyer et al., 2006).

In the presented experiments, the change of complexity between the directly commanded *in-air* leg kinematics and the *on-ground* locomotion leg kinematics emerged through the robot's compliant leg design and the interactions with the ground. During level-ground locomotion, periodic leg length shortening is caused (a) by signals of the robot's feed-forward controller and (b) by the robot body's pitch, roll, vertical, and translational movements (**Figures 2-2–5**). The inertia- and gravity-induced robot body and leg length movements present a major difference between Cheetah-cub robot with the controller applied in this work, and other feedback-controlled and body-stabilized quadruped robots (Raibert et al., 1986; Havoutis et al., 2013). If one designed a similar set of experiments but sent feed-forward signals to a high-gain, position-controlled robot with a stiff leg design, its kinematic data from *in-air* running would largely match that from *on-ground* running experiments. Hardware springs and cable clutches are not the only way to achieve deviating, adaptive joint kinematics. Alternative setups can use low-gain position-controlled actuators, or more generally torque or force controllers (Buchli et al., 2009; Valenzuela and Kim, 2012). These setups, however, require explicit feedback control.

Kinematic leg patterns of Cheetah-cub robot were extracted by principal component analysis (PCA; Krzanowski, 1988; Jolliffe, 2002) on the normalized kinematic leg data, for both *in-air* leg movements and *on-ground* locomotion. Applying PCA to kinematic leg data recorded in locomotion experiments, or to corresponding electromyographic data of the participating muscles, has become a common tool in biology and neurobiology. Dominici et al. (2011) present, in a comparative study, basic patterns derived from electromyographic (EMG) data of stepping human neonates, toddlers, pre-schoolers, and adults. The authors compare these results to data of neonatal rats, and of adult quadruped animals such as cats and monkeys. They report that human neonate stepping and neonatal rat stepping can be represented by two basic patterns, and that human toddler locomotion activation patterns, along with those of all adult quadruped animals, share a similar pool of four basic patterns. However, toddler and human adult patterns show differences: ". . . the four patterns [of human adults] were accurately timed around the four critical events of the gait cycle . . . " (cf. Dominici et al., 2011, p. 998). Basic patterns from toddler locomotion were less time-structured. Dominici et al. (2011) conclude that the increase in patterns is caused by continuous learning, until adulthood. The similarity between toddler data and basic patterns of adult quadrupedal animal locomotion provides a possible explanation as to where gait patterns in vertebrates originate: central pattern generators located in the spinal cord (Delcomyn, 1980; Grillner, 1985; Ijspeert, 2008). Moro et al. (2013a) extracted four basic patterns (kinematic motion primitives, kMPs) from horses, for the three gaits walk, trot, and gallop (kMPs account for 93% and 97% of the kinematic data). Moro et al. (2013b) found five kMPs accounting for almost all variance of human walking and running kinematics. Ivanenko et al. 
(2004) reported five basic muscle activation patterns accounting for almost all variance of muscle activation during human walking. Koditschek et al. (2004) reported observations on retrieving basic patterns from running cockroaches. Despite the animal's very high number of degrees of freedom—a cockroach is six-legged and has multi-segment legs—". . . a single component represen[ted] over 80% of the variation . . . " (cf. Koditschek et al., 2004, p. 256), for very fast running. The authors report that almost all variation was captured by three basic components, leading to the conclusion that a very simple neural controller was likely responsible for the motor control of this insect. The above findings from insects and mammals suggest that basic locomotion on level-ground requires three to five basic patterns, and possibly fewer for very fast locomotion. It is intriguing to be able to hypothesize about, and then to test through robot hardware implementation and experimentation, the interplay between locomotion controller, robot morphology, and locomotion patterns. Although direct conclusions can only be drawn for the artificial, robotic system, similar designs of both systems and similar results for the task of locomotion can provide insights into animal locomotion control, and into how neuro-control interacts with bio-mechanical components.

We recorded joint-angle leg kinematics of our quadruped robot for two situations: *in-air* trotting movements, versus robot locomotion *on-ground* (**Figure 1**). With its legs swinging *in-air* and without contacting the floor, the kinematics of the robot's low-inertia leg-joints followed the commanded patterns. Once Cheetah-cub robot was placed *on-ground* to walk or trot, the interplay between ground contact, in-series leg compliance and spring deflection, and body movements altered the complexity of its leg kinematics. In all documented experiments, motor control patterns were sent feed-forward. Hence, the observed changes between *in-air* and *on-ground* leg kinematics were caused by ground contact and mechanical entrainment.

The experiments of this work are intended to inform on the potential interplay of a compliant, legged system, such as found in legged animals or robots, and its motor control. In animals, the interplay of the brain, spinal cord, peripheral nervous system (PNS), morphology, and intrinsic, mechanical properties (springiness, damping, inertial moments, lengths, speed and force properties) plays a key role in locomotion capabilities. Studies like Ivanenko et al. (2002) or Hägglund et al. (2013) show that understanding the role, structure, and interplay of locomotor components in animals is difficult. With a largely simplified robot "hardware model" we can focus on the interplay of only a few components. Specifically, this robot features only these locomotor components: a feed-forward, oscillatory motor controller (we implemented a central pattern generator, but any other controller with similar features could be applied), and the mechanically compliant, bio-inspired leg structure. Cheetah-cub robot is stripped of any task-level control feedback; the motor control CPG runs purely in feed-forward mode. Self-stable and self-stabilizing locomotion was a product of appropriate motor control patterns (derived through systematic testing, see Spröwitz et al., 2013), the robot's compliant leg design, and the mechanical entrainment of these components through ground contact during *on-ground* locomotion.

Establishing a reduced experimental setup, without any task-level feedback from the nervous system, is hard to achieve in live, locomoting animals. Possibilities for modulation of PNS pathways include preparations with drugs, lesions (see for example Grillner and Zangger, 1979), or more recently introduced methods like light-evoked activation and deactivation of spinal cord and PNS components (Daou et al., 2013; Hägglund et al., 2013). The vast number of publications in the field shows the complexity of locomotion generation and control in animals. From this perspective, the use of a legged robot with a programmable motor controller and dedicated hardware presents a contrasting, bottom-up approach to analyzing the interplay of its featured, much simpler locomotor components.

As our **first hypothesis** (H1-1) we expect four to five basic patterns accounting for 95% of the variance of kinematic leg data at low-speed and mid-speed robot locomotion, as observed in legged animals (Ivanenko et al., 2004; Dominici et al., 2011; Moro et al., 2013a,b). Further, Koditschek et al. (2004) reported a decreasing number of motion primitives at high locomotion speed, for a simpler biological legged system with leg compliance. Hence, we expect a similar trend: a decreasing number of *on-ground* PCs with increasing robot speed (H1-2).

In addition, we looked at the number of *in-air* stepping PCs versus the number of *on-ground* locomotion PCs. The interaction of ground contact, feed-forward controlled compliant legs, and the naturally emerging pitch and roll body movements produced a self-stable walk or trot gait (Spröwitz et al., 2013). This richer, additional patterning through mechanical entrainment should be visible as a higher number of observed basic principal components for *on-ground* locomotion, compared to non-contact, *in-air* stepping. *In-air* stepping presents virtual locomotion patterns with effectively rigid, non-compliant legs: without ground contact, the serial leg springs are never deflected from their slack length, and the recorded complexity of *in-air* kinematic data effectively represents the complexity of Cheetah-cub's feed-forward controller. As our **second hypothesis** we expect a higher number of basic patterns for *on-ground* locomotion, compared to *in-air* stepping (H2).

This paper is organized as follows: in section 2 we give a short overview of the robot's locomotion controller and hardware. We provide details of data recording and processing, and of the extraction of basic primitives from the recorded kinematic data. In section 3 we present the results from the four proposed experiments for walk and trot locomotion, *in-air* and *on-ground*. In section 4 we discuss results and their implications for biological and robotic systems. Finally, we conclude the paper.

# **2. MATERIALS AND METHODS**

The first part of this section covers a brief description of the robot's hardware. For a more thorough description, please refer to Spröwitz et al. (2013). Next, this section provides information about the experimental tools and setup, and details of the applied principal component analysis. Cheetah-cub robot's gait controller, based on a central pattern generator (CPG in feed-forward mode), is explained. Videos of Cheetah-cub robot running can be found at

**FIGURE 1 | Schematic presentation of the** *in-air* **experiment (left), the** *on-ground* **experiment (middle), and a picture of the real robot (right).** For the *in-air* experiment, feed-forward locomotion patterns (permutations of trot, lateral sequence walk, locomotion frequencies of 2.5 and 3.5 Hz, two different control pattern types) were sent to the robot, while the robot's body was mounted on a stand, with its legs swinging freely *in-air*. In the *on-ground* experiment, the quadruped robot walked and trotted freely on

level-ground, with an average speed between 0.45 and 0.9 ms<sup>−1</sup>. Identical motor control patterns were sent in both *in-air* and *on-ground* experiments, for each experiment type. In all experiments, the resulting kinematic patterns (leg angle and leg length) were recorded. The change of complexity of kinematic primitives between *in-air* and *on-ground* derived from ground contact and the robot's compliant leg design. Further details of Cheetah-cub robot's leg design and compliance are available in **Figure 2**.

**FIGURE 2 | Cheetah-cub leg mechanism, and leg compliance.** A single leg is shown abstracted, detailed leg segment ratios are omitted for clarity, robot heading direction is to the left. **(1)** shows the three leg angles αprox, αmid, and αdist. Hip and knee RC servo motors are mounted proximally, the leg length actuation is transmitted by a cable mechanism. The pantograph structure was inspired by the work of Witte et al. (2003) and Fischer and Blickhan (2006). **(2)** The foot segment describes a simplified foot-locus, showing the leg in

mid-swing. For ground clearance, the knee motor shortens the leg by pulling on the cable mechanism (green, *F*cable). *F*diag is the major, diagonal leg spring. Its force extends the pantograph leg, against gravitational and dynamic forces. **(3)** The leg during mid-stance. **(4)** In case of an external translational perturbation, the leg will be compressed passively. **(5)** If an external perturbation torque applies e.g., through body pitching, the leg linkage will transmit it into a deflection of the parallel spring, not of the diagonal spring.

http://biorob2.epfl.ch/utils/movieplayer.php?id=209 and http:// biorob2.epfl.ch/utils/movieplayer.php?id=207.

### **2.1. QUADRUPED ROBOT HARDWARE**

Cheetah-cub robot's leg design is based on a mammalian, animal-inspired pantograph mechanism (Witte et al., 2003; Fischer and Blickhan, 2006). An automatic, cable-based clutch mechanism, proximal actuation, and a compliant foot joint enhanced the original, bio-inspired hardware blueprint (Spröwitz et al., 2013). Each robot leg was individually controlled by two RC servo motors. The leg length (knee) actuator actively *flexed* the leg via a cable mechanism, antagonistic to the diagonal leg spring (**Figure 2-2**). The cable mechanism also works as an automatic decoupling mechanism: it goes slack if external forces are applied to the leg (**Figure 2-4**). The robot's proximal actuator was directly mounted between body and leg. It protracted and retracted front and hind legs.

The robot's body was implemented as a stiff plate; only the legs provided compliance. Three leg springs act in this leg design, under different load conditions (**Figure 2**): *F*diag is the spring in-parallel to the cable actuation, and provides anti-gravity support. *F*par is the spring replacing one of the struts of the linkage mechanism. Under tension it provides an in-series leg elasticity. The third spring is located in the most distal leg joint. It is a helical spring and provides serial foot torque. In sum, this presents a very compliant leg design with a very low leg stiffness, in comparison with biological systems. The linearized vertical leg stiffness of two in-parallel Cheetah-cub robot legs is about 0.25 kN/m, for a static measurement with isolated legs. During fast trotting locomotion (Froude number FR = 1.0), a leg stiffness of *F*vert/*l* = 0.65 kN/m was recorded. Leg stiffness for *running* quadruped animals of this body weight, but at faster speed, is documented to be almost twice as high (*k*leg = *M*<sup>0.67</sup> = 1.05 kN/m, using the convention of treating all ground-contacting legs as one; Farley et al., 1993). Compared to a young cat of equal weight, Cheetah-cub robot exhibits a more crouched leg posture. This is generally associated with a lower overall leg stiffness through bigger effective lever arms.
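To make the stiffness comparison concrete, the quoted scaling can be checked with simple arithmetic. Note that the body mass of roughly 1.1 kg is our assumption about the robot (it is not stated in this section), and the unit convention (kN/m for *M* in kg) is inferred from the numbers above:

```python
# Allometric leg-stiffness estimate for a running quadruped of this size,
# using the scaling relation quoted above (Farley et al., 1993):
# k_leg [kN/m] = M^0.67, with M in kg (unit convention inferred).
M = 1.1                          # body mass [kg]; an assumption, not from the text
k_animal = M ** 0.67             # predicted animal leg stiffness [kN/m]
k_robot_trot = 0.65              # measured during fast trot [kN/m]
k_robot_static = 0.25            # two in-parallel legs, static test [kN/m]

print(round(k_animal, 2))        # ~1.07 kN/m, close to the quoted 1.05 kN/m
print(k_animal / k_robot_trot)   # robot legs are roughly half as stiff
```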

In all experiments, the robot was tethered to a power supply through a long, light-weight power cable. CPG computation, RC servo motor control signal generation, and wireless communication were controlled from a single board computer, mounted on the robot's body.

### **2.2. DATASET, EXTRACTION, AND EXPERIMENTAL SETUP**

The robot was controlled with four different sets of control parameters (**Figure 4**). The robot ran with two different gait patterns (lateral sequence walk, and trot), over a speed range from mid-speed to higher speed, and with two different knee-control strategies. Thus, the robot exploited its body dynamics for different gait patterns, control complexities, and dynamical speed conditions: (a) lateral sequence walk gait with a locomotion cycle frequency of 2.5 Hz, with double-peak knee deflection (DP, **Figure 3**), at medium robot speed; (b) trot gait with a locomotion cycle frequency of 3.5 Hz, with DP knee actuation, at high robot speed; (c) trot gait with a locomotion cycle frequency of 2.5 Hz, with DP knee actuation, at medium robot speed; (d) trot gait with a locomotion cycle frequency of 3.5 Hz, with a single-peak knee actuator signal (SP), at higher [compared to (a) and (c)] robot speed. Single-peak and double-peak leg length control signals, and hip-joint control signals, are plotted for one locomotion cycle in **Figure 3**. All robot runs were repeated 10 times, and between 30 and 60 stride cycles were extracted for each gait. Kinematic robot data were recorded with a motion capture (MOCAP) system, based on infrared reflective markers of 11 mm diameter. Twelve MOCAP cameras (OptiTrack s250e, NaturalPoint, Inc., 2011) were mounted at 1.20 m and 2.30 m height, positioned in a large rectangular arena around the locomoting quadruped robot. The cameras observed a volume of 1.5 m width, 4 m length, and 0.5 m height. MOCAP data were captured at *f* = 250 fps. Marker trajectories were processed and cleaned with the Arena software (NaturalPoint, Inc., 2011). Unlabeled markers were labeled in *Mokka* (Barré and Armand, 2014). Data were loaded into Matlab (MATLAB, 2009, v. 7.9) with the *b-tk* framework (Barré and Armand, 2014). All marker trajectories were low-pass filtered, with an 18 Hz cut-off frequency.

For *in-air* experiments, Cheetah-cub was mounted on a small stand in the center of the arena. Its legs were hanging freely in the air, and MOCAP cameras recorded leg kinematics. For *on-ground* experiments, Cheetah-cub ran the full length of the motion capture arena (4 m), without restraints. The robot was powered externally by a power tether. Cheetah-cub's design includes no explicit degree of freedom for changing direction, i.e., adduction or abduction. Before reaching its steady state and before recording was started, the power tether was, in a few cases, used to correct the robot's heading. During recording, the robot would walk or trot while the tether was carefully kept loose. The robot was started from the ground; data were recorded once it reached steady state. This typically happened after less than three locomotion cycles. Markers were attached on the robot's right side, on fore and hind limb, at the proximal leg joints, mid-leg joints, and feet (**Figures 1**, **2**). Using leg kinematics, angles of proximal (αprox), middle (αmid), and distal (αdist) joints were calculated (**Figure 2-1**). As the robot leg's parallel spring and foot spring work in-series, deflections of the distal leg segment and the foot segment were combined, for simplification, into a single angle (αdistal). Recorded locomotion stride cycles were synchronized based on the π/2 crossing of the hip angle of the virtual leg (hip to foot), at mid-swing. The end of each stride cycle was calculated from the inverse stride frequency. Finally, all cycles were divided into 100 samples per cycle, using Piecewise Cubic Hermite Interpolating Polynomial interpolation (Fritsch and Carlson, 1980).
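The per-cycle resampling step described above can be sketched as follows, assuming SciPy's `PchipInterpolator` as a stand-in for the Piecewise Cubic Hermite interpolation used in the original pipeline (the sine-like joint trajectory is an illustrative stand-in for recorded MOCAP data):

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

def resample_cycle(t, angles, n=100):
    """Resample one stride cycle to n samples with monotone cubic (PCHIP)
    interpolation, which avoids the overshoot of ordinary cubic splines."""
    t = np.asarray(t, dtype=float)
    f = PchipInterpolator(t, np.asarray(angles, dtype=float))
    return f(np.linspace(t[0], t[-1], n))

# toy cycle: 63 raw samples of one 2.5 Hz stride (0.4 s)
t_raw = np.linspace(0.0, 0.4, 63)
theta_raw = 20.0 * np.sin(2 * np.pi * t_raw / 0.4)   # joint angle [deg]
theta_100 = resample_cycle(t_raw, theta_raw)          # 100 samples per cycle
print(theta_100.shape)                                # (100,)
```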

### **2.3. PRINCIPAL COMPONENT ANALYSIS, PCA**

Kinematic primitives describing the main components of the kinematic leg dataset can be computed by matrix factorization (Strang, 2003). We applied Principal Component Analysis (Krzanowski, 1988; Jolliffe, 2002) to implement matrix factorization (similar to Fod et al., 2002; Bizzi et al., 2008). The obtained data are represented as $\mathbf{X}_{n_o, n_v}$, with $n_o$ being the number of observations, i.e., the number of samples per cycle, and $n_v$ being the number of variables, i.e., the number of cycles. The dataset was first normalized to $\tilde{\mathbf{X}}_{n_o, n_v}$, so that each cycle had zero mean and a standard deviation equal to one. The covariance matrix of the normalized dataset was then calculated, obtaining $\Sigma_{n_v, n_v}$:

$$\Sigma = \frac{1}{N-1} \sum\_{i=1}^{N} (\tilde{X}\_i - \bar{X})^T (\tilde{X}\_i - \bar{X}) \tag{1}$$

where $\tilde{X}_i$ is the $i$-th observation in $\tilde{\mathbf{X}}$, and $\bar{X}$ is the mean observation. Principal components of the covariance matrix were extracted, obtaining the loading vectors $\mathbf{v}_i$, $i = 1 .. \min(n_o - 1, n_v)$, and the respective eigenvalues $\lambda_i$, sorted in descending order. Typically, a low number of principal components is sufficient to account for a large part of the variance. If the first $n_s$ components account for a given percentage of the variance (95% for all results in this work), then $n_s$ primitives are obtained by projecting the normalized dataset onto the most significant loading vectors $\mathbf{v}_i$, $i = 1 .. n_s$:

$$\mathbf{p}\_i = \mathbf{\tilde{X}} \mathbf{v}\_i \tag{2}$$
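A minimal sketch of this extraction procedure, mirroring Equations (1) and (2); the two-pattern synthetic dataset is illustrative only (the original analysis ran on recorded joint-angle cycles):

```python
import numpy as np

def kinematic_primitives(X, var_threshold=0.95):
    """X: (n_o x n_v) matrix, n_o samples per cycle (rows), n_v cycles (columns).
    Normalize each cycle, eigen-decompose the covariance matrix, and keep
    the fewest components reaching the variance threshold."""
    Xn = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # zero mean, unit std per cycle
    C = np.cov(Xn, rowvar=False)                       # (n_v x n_v) covariance
    eigval, eigvec = np.linalg.eigh(C)
    order = np.argsort(eigval)[::-1]                   # eigenvalues, descending
    eigval, eigvec = eigval[order], eigvec[:, order]
    explained = np.cumsum(eigval) / eigval.sum()
    ns = int(np.searchsorted(explained, var_threshold) + 1)
    primitives = Xn @ eigvec[:, :ns]                   # Equation (2): p_i = X~ v_i
    return primitives, ns, explained[:ns]

# synthetic cycles built from two underlying patterns
rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 100)
X = (np.outer(np.sin(t), rng.normal(size=30))
     + np.outer(np.cos(2 * t), rng.normal(size=30)))
P, ns, ev = kinematic_primitives(X)
print(ns)   # 2: the two generating patterns account for >95% of the variance
```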

### **2.4. LOCOMOTION CONTROL WITH CENTRAL PATTERN GENERATOR**

Central pattern generators (CPGs) have been successfully applied to generate locomotion patterns for legged and other robots (Fukuoka et al., 2003; Ijspeert, 2008; Spröwitz et al., 2008, 2013; Sato et al., 2011). We applied a CPG implemented as a network of coupled oscillators to rapidly and conveniently encode a feed-forward control signal with explicit, legged-locomotion-relevant input parameters such as duty factor, hip amplitude, leg length, and locomotion frequency. Cheetah-cub is an RC-servo-motor-controlled quadruped robot. The CPG provides for smooth trajectory transitions at gait initialization, because of damping terms in the CPG equations. The CPG controller was running feed-forward, i.e., it was streaming a position signal to the RC servo motors, without incorporating external feedback. Cheetah-cub robot's CPG controller consisted of two modules: a hip controller, commanding the hip motor, and a knee controller, commanding the leg length through a proximally mounted knee motor. Two knee control strategies (single-peak SP, and double-peak DP) were implemented. An example locomotion cycle is provided in **Figure 3**. Top and center plots show a gait applying double-peak knee signals (DP). Top and bottom plots show a gait with a single-peak (SP) knee signal. The hip signal (top plot) is identical for SP and DP gaits. We used previously derived CPG parameters for trot gait (Spröwitz et al., 2013). The hip-joint-driving CPG consisted of a network of four phase-coupled oscillators; each oscillator controlled one hip joint. The gait was switched from lateral sequence walk (cf. Hildebrand, 1989) to trot by setting the phase shift between hip oscillators accordingly (Righetti and Ijspeert, 2008). Each knee oscillator was coupled serially to its corresponding hip oscillator. The range of speeds obtained for walk and trot is shown in **Figure 4**.
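A minimal sketch of a phase-coupled oscillator network of this kind; the coupling weight, time step, hip amplitude, and explicit phase-offset matrix are illustrative choices, not the CPG parameters used on the robot:

```python
import math

def cpg_step(theta, f, phase_offsets, w=5.0, dt=0.002):
    """One Euler step for four phase-coupled hip oscillators.
    phase_offsets[i][j] is the desired phase of oscillator j relative to i;
    choosing these offsets selects the gait (cf. Righetti and Ijspeert, 2008)."""
    n = len(theta)
    return [theta[i] + dt * (2 * math.pi * f
            + sum(w * math.sin(theta[j] - theta[i] - phase_offsets[i][j])
                  for j in range(n) if j != i))
            for i in range(n)]

PI = math.pi
# trot: diagonal leg pairs (0,3) and (1,2) in phase, half a cycle apart
trot = [[0, PI, PI, 0],
        [-PI, 0, 0, -PI],
        [-PI, 0, 0, -PI],
        [0, PI, PI, 0]]

theta = [0.0, 2.5, 3.5, 0.5]                   # arbitrary initial phases
for _ in range(5000):                           # 10 s at dt = 0.002: settles into trot
    theta = cpg_step(theta, f=2.5, phase_offsets=trot)
hip_cmd = [0.3 * math.cos(th) for th in theta]  # hip position commands [rad]
```

Switching to lateral sequence walk would amount to replacing the offset matrix with quarter-cycle lags in footfall order; the knee oscillators would then be slaved to these hip phases.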

# **3. RESULTS**

This section provides the results from four experiments, each for *in-air* stepping and *on-ground* locomotion of the robot. The number and shapes of basic components accounting for at least 95% of the variance of the kinematic leg data are provided.

### **3.1. DOUBLE-PEAK (DP) KNEE PATTERN, WALK, *f* = 2.5 Hz**

**Figure 5** shows joint-angle data *in-air* and *on-ground* (**Figures 5A,B**), and the corresponding principal components (**Figures 5C,D**). Average robot speed for this experiment was 0.45 ms<sup>−1</sup>, around 2.3 body lengths per second. **Figures 5C,D** indicate that more than 97% of the *in-air* patterns can be represented by three principal components, and 96% of the *on-ground* joint-angle kinematics by four principal components. The first PC represents almost 50% of the kinematic data, for both the *in-air* and the *on-ground* case (**Figure 9A**).

### **3.2. DOUBLE-PEAK KNEE PATTERN, TROT, *f* = 3.5 Hz**

Average robot speed for the 3.5 Hz trot gait with double-peak knee actuation was the highest of all four experiment types (0.9 ms<sup>−1</sup>). Higher average robot speed is possible with higher motor coil voltage, up to 1.42 ms<sup>−1</sup> (Fr = 1.3, Spröwitz et al., 2013). However, the power consumption becomes so large that motors would break during longer experiments, caused by coil overheating. A speed of 0.9 ms<sup>−1</sup> was a compromise between fast robot locomotion, around 4.5 body lengths per second, and a repeatable, robust experimental setup. The results in **Figure 6** show three basic patterns for *in-air* stepping (98%), and four basic patterns for *on-ground* locomotion (97%). **Figure 9B** shows

**FIGURE 3 | Schematic presentation of the control signals for the two servo motors per robot leg: hip joint-angle (top, θ<sub>h</sub>) and knee joint-angle (center, bottom, θ<sub>k</sub>) for one stride cycle.** Swing phase is plotted on the left side (transparent background), stance phase on the right side (gray background). The *double-peak* (DP) knee signal activates the knee motor twice per stride cycle. The leg is flexed more strongly during swing phase (larger amplitude), and less flexed during stance phase (*A*<sub>k,st</sub>). The *single-peak* (SP) knee motor activation signal triggers only during swing phase (*A*<sub>k,sw</sub>). Leg flexing in SP mode during stance phase emerges through inertia and gravity acting on the robot body, compressing the compliant stance leg at ground contact. For DP knee activation, active actuator leg shortening overlapped with inertia-induced leg shortening (Spröwitz et al., 2013).

that the first *in-air* PC of this gait accounted for more than 60% of the variance of the kinematic data.

### **3.3. DOUBLE-PEAK KNEE PATTERN, TROT, *f* = 2.5 Hz**

In this experiment the robot trotted at mid-speed level, at *v*av = 0.55 ms<sup>−1</sup>. This was slightly more than the average speed of the robot in the walk-gait experiment (0.45 ms<sup>−1</sup>), at the same gait frequency. Three basic patterns account for 98% of the variance of the *in-air* stepping data (**Figures 7**, **9C**). Four basic components account for 97% of the variance of the *on-ground* locomotion joint-angle data.

### **3.4. SINGLE-PEAK (SP) KNEE PATTERN, TROT GAIT, *f* = 3.5 Hz**

Single-peak (SP) knee-actuation-based locomotion was only stable above a speed of 0.55 ms<sup>−1</sup>. The single activation burst of the knee actuator, effectively shortening leg length, is triggered during swing phase. This provides ground clearance to freely swing the leg and foot forward. Without leg-length actuation during stance phase, leg shortening relies solely on inertia and gravity acting on the robot body to compress (flex) the leg. This required inertial energy explains the minimum speed for SP experiments. **Figure 8** shows two almost sine-shaped PCs for *in-air* stepping. Both PCs account for 98%

**FIGURE 4 | Average speed values of the four experiments, sorted by gait type (walk, trot) and gait frequency (*f* = 2.5, 3.5 Hz).** The 3.5 Hz trot applied a single-peak knee trajectory (SP), the three remaining experiments applied double-peak knee trajectories (DP). For the applied RC servo motor voltage of 12 V, a trot gait speed of 0.9 ms<sup>−1</sup> (Fr = 1) is about the maximum average speed.

of the variance of the kinematic joint-angle data; the first PC accounts for 58% of the variance, the second PC for 40%. *On-ground* locomotion showed four PCs, accounting for 96% of the variance. The number of *in-air* PCs differed compared to all other experiments (**Figure 9D**). The major change was a switch in the complexity of the knee-control signal, from double-peak to single-peak. This reduction was reflected in the *in-air* PC data, but *not* in the *on-ground* PC data.

# **3.5. COMBINED DATA**

In **Figure 10**, PCs of *in-air* leg movements and *on-ground* locomotion of all four experiments are depicted. This common pool of all collected joint-angle kinematics includes lateral sequence walk and trot gait data. The corresponding joint-angle plot was omitted because it basically covers the entire plot area. Three *in-air* PCs account for 97% of the variance, and four *on-ground* PCs account for 95% of the variance.

# **4. DISCUSSION**

We presented results of locomotion patterns *in-air* and *on-ground*, for the two gait types walk and trot. The robot ran at average forward speeds between *v*<sub>av</sub> = 0.45 ms<sup>−1</sup>, more than 2 body lengths per second, and *v*<sub>av</sub> = 0.9 ms<sup>−1</sup>, around 4.5 body lengths

*(Figure caption fragment)* **... frequency of 2.5 Hz, with double-peak knee activation patterns.** **(A)** Joint angles for the *in-air* experiment, and **(B)** joint angles for the *on-ground* experiment.

*(Figure caption fragment)* ... *on-ground* experiment. Three basic patterns of the *in-air* experiment sum to 97% of the variance, the four basic patterns of the *on-ground* experiment to 96%.

per second. All *on-ground* locomotion experiments showed four principal components. *In-air* experiments revealed either two PCs (single-peak knee controller) or three PCs (double-peak knee controller), accounting for at least 95% of the variance of the kinematic leg data.
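The PC counting used throughout these results follows a standard cumulative-variance criterion. As a minimal, illustrative sketch (synthetic joint-angle data, not the robot's recordings), the following numpy snippet determines how many principal components are needed to reach the 95% threshold:

```python
import numpy as np

def n_primitives(joint_angles, threshold=0.95):
    """Number of principal components accounting for at least `threshold`
    of the variance; joint_angles has shape (samples, joints)."""
    X = joint_angles - joint_angles.mean(axis=0)        # center each joint
    cov = np.cov(X, rowvar=False)                       # joints x joints
    eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]    # descending variances
    explained = np.cumsum(eigvals) / eigvals.sum()      # cumulative fraction
    return int(np.searchsorted(explained, threshold) + 1)

# Synthetic stand-in: 8 joint angles driven by 3 underlying rhythms
t = np.linspace(0, 4, 1000)
rhythms = np.stack([np.sin(2 * np.pi * 2.5 * t),
                    np.cos(2 * np.pi * 2.5 * t),
                    np.sin(2 * np.pi * 5.0 * t)])
mixing = np.array([[1, 1, 0, 0, 1, 0, 1, 0],
                   [0, 1, 1, 0, 0, 1, 0, 1],
                   [0, 0, 1, 1, 1, 1, 0, 0]], dtype=float)
rng = np.random.default_rng(0)
data = rhythms.T @ mixing + 0.01 * rng.normal(size=(1000, 8))

print(n_primitives(data))  # 3: three basic patterns reach the 95% criterion
```

The same criterion applied to measured joint angles yields the 2, 3, or 4 primitives reported above.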

Stable walk and trot gait patterns were derived by encoding joint control patterns as a set of coupled oscillators (CPG) and manually tuning the CPG parameters. For all speeds and gaits, a single type of hip joint pattern was sufficient: a duty-factor-distorted sine-wave position signal (**Figure 3**, top plot). For lower-speed gaits, double-peak knee signals (**Figure 3**, middle plot) were necessary to produce stable gait patterns. For higher speeds, both single-peak (**Figure 3**, bottom plot) and double-peak signals produce stable gait patterns.

### **4.1. IMPACT OF MECHANICAL ENTRAINMENT**

For all experiments with the robot mounted on a stand and moving its legs *in-air*, gaits with double-peak knee signals (DP-trot and DP-walk) showed three principal components accounting for at least 95% of the variance. All experiments *on-ground*, independent of robot speed, showed four basic patterns. These results support the first hypothesis (H1-1), derived from observations in animals; level-ground locomotion showed four to five basic components (Ivanenko et al., 2004; Dominici et al., 2011; Moro et al., 2013a,b).

Our *in-air* observations (hypothesis H2) coincide qualitatively with observations from human *in-air* stepping, which exhibits simpler, harmonic leg kinematics (Ivanenko et al., 2002), essentially a lower observed kinematic complexity. However, we were unable to find quantitative descriptions of PCs for *in-air* stepping in animals with feed-forward-only motor control. Until otherwise reported, we consider the occurrence of either 2 or 3 PCs for the *in-air* patterns, corresponding to 4 *on-ground* PCs, as a weak indication of a similar mechanism in animals. Only similarly conducted animal experiments could reveal evidential details on feed-forward motor control mechanisms in animals.

The interaction of the robot with its environment (i.e., ground contact) increased the kinematic complexity by at least one principal component. Qualitatively, this was externally observable through emerging robot body pitch and roll patterns.

For the single-peak knee actuation experiment (SP, 3.5 Hz, trot), the Cheetah-cub robot ran at high speed *on-ground*, on average 0.8 ms<sup>−1</sup>. Replaying the same CPG drive signals


*in-air* showed two basic patterns (98%, **Figure 8C**). We found stable *on-ground* SP-gait patterns only at higher robot speeds, from 0.55 to 1.1 ms<sup>−1</sup>. From this speed on, the robot's leg springs were sufficiently deflected by inertial forces and gravity acting on the robot body, enabling mechanical entrainment. Four basic patterns accounted for at least 95% of the variance of *on-ground* locomotion for the SP-gait experiment. This result is not in accordance with the second part of hypothesis 1 (H1-2); a decreasing number of *on-ground* PCs was found at higher animal speeds for cockroaches (Koditschek et al., 2004). However, the Cheetah-cub robot is unable to reach the normalized speeds documented for these insects. The maximum robot speed recorded was 6.9 body lengths per second (Froude number Fr = 1.3, Spröwitz et al., 2013). It is possible that Cheetah-cub simply does not run fast enough to replicate similar results. The above results have potential implications for the general implementation of quadruped robot locomotion controllers: for level-ground running, the resulting *on-ground* patterns require no more than four basic components. This reduces the necessary complexity of the locomotion controller, also for controller types other than CPGs.

*(Figure caption fragment)* **(B)** joint angles for the *on-ground* experiment. **(C)** Principal components

# **4.2. CONTROL DIMENSION REDUCTION AND ROBOTIC GAIT GENERATION**

The CPG model used for the control of locomotion included more than 10 open parameters, tuned for each gait. We hypothesize that, with a collected dataset of sufficient size, the extracted *in-air* PCs representing this dataset can be used to reconstruct a new controller. Hence, to generate a new gait, one could tune the *in-air* primitive weights for different joints instead of the CPG control parameters. However, many additional experiments are required to prove this claim. If such weight tuning proved viable, one could also encode the extracted primitives into Dynamical Movement Primitives (Ijspeert et al., 2003, 2013). This would provide smooth modulation of the output signals as well as the capability to incorporate feedback (Ajallooeian et al., 2013). Such a controller would allow switching between different gaits by a change of primitive weights.
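The weight-tuning idea can be sketched with a small example; the primitive extraction (SVD) and the reconstruction function below are illustrative assumptions, not the authors' CPG implementation:

```python
import numpy as np

def extract_primitives(joint_angles, n):
    """Top-n kinematic primitives of centered joint-angle data
    (rows = time samples, columns = joints)."""
    mean = joint_angles.mean(axis=0)
    U, s, Vt = np.linalg.svd(joint_angles - mean, full_matrices=False)
    scores = U[:, :n] * s[:n]     # time courses of the primitives
    components = Vt[:n]           # per-joint loadings
    return scores, components, mean

def reconstruct(scores, components, mean, weights):
    """Candidate gait: re-weight each primitive's contribution."""
    w = np.asarray(weights)[:, None]       # one weight per primitive
    return scores @ (w * components) + mean

# Toy joint-angle data built from three rhythms (illustrative only)
t = np.linspace(0, 1, 200)
gait = np.stack([np.sin(2 * np.pi * t),
                 np.cos(2 * np.pi * t),
                 np.sin(4 * np.pi * t)], axis=1)
scores, comps, mean = extract_primitives(gait, n=3)

# Unit weights reproduce the original gait; other weights give new candidates
same = reconstruct(scores, comps, mean, [1.0, 1.0, 1.0])
print(np.allclose(same, gait))  # True
```

Tuning the weight vector, rather than the CPG parameters, would then serve as the reduced search space for new gaits.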

At present, we found no quantitative data available to compare our results with other legged robotic platforms, to observe the effect of mechanical entrainment on feed-forward controlled legged robots and their kinematic motion primitives. Platforms other than the Cheetah-cub robot exist that locomote with dynamical, feed-forward controlled, self-stabilizing gaits and similar leg mechanics. Bobcat-robot, for example, is a small bounding robot that can reach dynamical, full flight phases in-between touch-downs. It is equipped with a Cheetah-cub-like feed-forward controller, in-series elastic segmented legs, and an actuated spine (Khoramshahi et al., 2013). However, no data on the complexity of its kinematic *on-ground* PCs is available. Typically, motor controller designs for legged robots feature explicit feedback loops (Kimura et al., 2007; Buchli et al., 2009).

Mechanical compliance in legged robot design was introduced early; Raibert's robots featured in-series elasticity (air springs, Raibert et al., 1984). Raibert reported a closed-loop controller with explicit, though simple and linear, feedback. We are unaware to what extent Raibert's machines could have been controlled in a feed-forward manner. We found that sufficiently complex feed-forward CPG signals (two to three PCs, speed dependent), a segmented leg design, and in-series and in-parallel leg compliance were the necessary ingredients for a simple, yet self-stable, dynamically legged quadruped robot system. Cheetah-cub and other robots (Iida and Pfeifer, 2004; Khoramshahi et al., 2013) indicate that a larger design pool for mechanically entrained, dynamical robots exists.

As for the robot's range of leg compliance: Cheetah-cub features a slightly lower leg stiffness than observed in animals of the same weight and leg length. An interesting future experiment would be to incrementally alter the leg stiffness, up to a level where the leg has no compliance. This would also require a way to alter the feed-forward control patterns systematically, to ensure comparability between the resulting gaits.

### **4.3. RELEVANCE TO BIOLOGICAL SYSTEMS**

Cheetah-cub is a bio-inspired robot, designed and motion-controlled according to bio-inspired blueprints. It therefore presents a strong abstraction. We replicated observed blueprints from functional anatomy (pantograph leg, in-series and in-parallel compliance, clutch mechanism) and control (CPGs and locomotion parameters: duty factor, leg length, leg angle, amplitude, and frequency). The applied feed-forward controller produced position-control motor signals. Swing-leg dynamics differ from those of an animal, because the robot's mechanical spring force cannot be modulated online. The robot's distal compliance acts passively and in-series, whereas quadruped animals are

**FIGURE 9 | The percentage of variance accounted for by the first 10 primitives, as a function of the number of primitives, for in-air stepping (red, dashed lines) and on-ground (black, solid lines) locomotion patterns.** Horizontal, black, dashed lines indicate 80%, 90%, and 95% of the variance. In this article, we used a 95% variance criterion (top, dashed, black, horizontal line). This results in between 2 and 4 primitives to account for ≥95% of the variance. The single-peak (SP) trot gait at 3.5 Hz **(D)** showed the largest change from

able to adjust ankle stiffness. Cheetah-cub does not feature an antagonist actuator producing a foot-joint stiffness profile.

Studies on legged locomotion in biology indicate that 4–5 principal components account for a large part of kinematic data variance, both for vertebrates and invertebrates (Ivanenko et al., 2004; Holmes et al., 2006; Dominici et al., 2011; Moro et al., 2013a,b). This is remarkable, considering the range of leg lengths, body sizes and weights, and the differences in leg design and actuation strategies. In our study, four PCs accounted for 95% of the variance of all level-ground experimental data, including walk and trot gait data. The corresponding locomotion controller (a CPG in feed-forward mode) was programmed with varying complexity: depending on the robot's speed range, either 3 or 2 PCs accounted for the variance of the feed-forward controller data (slower and faster *in-air* patterns, respectively). Perturbation experiments with running birds (guinea fowl) indicate that bipedal running is controlled through a combination of feed-forward control and additional, reflex-based actuator changes (cf. Daley et al., 2006, 2009). For future legged robot experiments it will be interesting to observe and quantify the effect of explicit, possibly reflex-based feedback. Adult human locomotion patterns showed more pronounced basic gait patterns compared to those of toddlers (cf. Dominici et al., 2011); these patterns were also phase-locked to important gait events. It also remains unclear if and how a faster running robotic system would reduce the number of observable principal components with increasing robot speed, similar to the findings of Koditschek et al. (2004).

# **5. CONCLUSION**

In this study we reported on the interplay between a modular, feed-forward locomotion controller and the mechanical entrainment of a quadruped, self-stably walking and trotting, compliant legged robot. We measured the complexity of the feed-forward controller and the complexity of the resulting leg kinematics through the number of basic patterns accounting for a certain variance of kinematic data from *in-air* leg motions and *on-ground* locomotion, respectively. We implemented lateral sequence walk and trot gaits, and applied two different leg-length control strategies. We found that the number of basic patterns from *on-ground* locomotion data matched those reported for animals; four basic patterns accounted for ≥95% of the variance. Three basic patterns accounted for ≥95% of the variance in the *in-air* experimental kinematic data for lateral sequence walk and slower trot, and two basic patterns for faster trotting. Because patterns were sent in a feed-forward manner, the measured complexity of the *in-air* kinematic data represents the complexity of the feed-forward controller. This shows that a simple, modular rhythm generator is already sufficient for level-ground, feed-forward legged quadruped locomotion, for two different gaits, walk and trot. It also shows that passive mechanical compliance enables an increase of kinematic complexity, leading to dynamic and self-stabilizing walk and trot locomotion. In the case of our quadruped legged robot, the complexity of the kinematic data increased at ground contact, through mechanical entrainment between the feed-forward controller and the compliant, bio-inspired robot hardware. Here, the bio-inspired leg design supported the emergence of additional *on-ground* basic primitives, e.g., through passive leg compliance and leg segmentation. Animals show a much wider range of tools to adapt and modulate dynamic legged locomotion.
The similar results between the presented robot experiments and animal experiments in level-ground locomotion indicate that modular, feed-forward, rhythmic pattern-based motor control, in combination with compliant hardware, is an important component of animal neuro-control and bio-mechanics.

# **ACKNOWLEDGMENTS**

The research leading to these results has received funding from the European Community's Seventh Framework Programme FP7/2007-2013-Challenge 2-Cognitive Systems, Interaction, Robotics-under grant agreement No. 248311 (AMARSi), and from the Swiss National Science Foundation through the National Centre of Competence in Research Robotics.

# **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

### *Received: 30 September 2013; accepted: 20 February 2014; published online: 07 March 2014.*

*Citation: Spröwitz AT, Ajallooeian M, Tuleu A and Ijspeert AJ (2014) Kinematic primitives for walking and trotting gaits of a quadruped robot with compliant legs. Front. Comput. Neurosci. 8:27. doi: 10.3389/fncom.2014.00027*

*This article was submitted to the journal Frontiers in Computational Neuroscience. Copyright © 2014 Spröwitz, Ajallooeian, Tuleu and Ijspeert. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# The minimum transition hypothesis for intermittent hierarchical motor control

# *Amir Karniel\**

*Department of Biomedical Engineering, Ben-Gurion University of the Negev, Beer-Sheva, Israel*

### *Edited by:*

*Andrea D'Avella, IRCCS Fondazione Santa Lucia, Italy*

### *Reviewed by:*

*Todd Troyer, University of Texas, USA*
*Peter Gawthrop, The University of Melbourne, Australia*

*\*Correspondence: Amir Karniel, Department of Biomedical Engineering, Ben-Gurion University of the Negev, POB 653, Beer-Sheva, 84105, Israel. e-mail: akarniel@bgu.ac.il*

In intermittent control, instead of continuously calculating the control signal, the controller occasionally changes this signal at certain sparse points in time. The control law may include feedback, adaptation, optimization, or any other control strategy. When, where, and how does the brain employ intermittency as it controls movement? These are open questions in motor neuroscience. Evidence for intermittency in human motor control has been repeatedly observed in the neural control of movement literature. Moreover, some researchers have provided theoretical models to address intermittency. Even so, the vast majority of current models, and I would dare to say the dogma in most of the current motor neuroscience literature, involve continuous control. In this paper, I focus on an area in which intermittent control has not yet been thoroughly considered: the structure of muscle synergies. A synergy in the muscle space is a group of muscles activated together by a single neural command. Under the assumption that motor control is intermittent, I present the minimum transition hypothesis (MTH) and its predictions with regard to the structure of muscle synergies. The MTH asserts that the purpose of synergies is to minimize the effort of the higher level in the hierarchy by minimizing the number of transitions in an intermittent control signal. The implications of the MTH concern not only the structure of the muscle synergies but also the intermittent and hierarchical nature of the motor system, with various predictions about the process of skill learning and important implications for the design of brain-machine interfaces and human-robot interaction.

### **Keywords: muscle synergies, motor control, intermittent control, spinal cord, blind source separation**

Nature sets in motion by signs and watchwords, which are made with little momentum . . . Just as in the army the soldiers are set in motion by one word as if by a given signal and continue to move until they receive another signal to stop, so the muscles move in order and harmony from established custom.

William Harvey (1578–1657)

William Harvey eloquently described the notion of intermittency in hierarchical neural control of movement, "Nature sets in motion by signs and watchwords" (Harvey, 1627; see Whitteridge, 1959 and Meijer, 2001). In intermittent control, instead of continuously calculating the control signal, the controller occasionally changes this signal at certain sparse points in time according to the control law, which may or may not include feedback, adaptation, optimization, or other control strategies. When, where, and how does the brain employ intermittency as it controls movement? These are open questions in motor neuroscience (Karniel, 2011).

Evidence for intermittency in human motor control has been repeatedly observed in the neural control of movement literature (Navas and Stark, 1968; Neilson et al., 1988; Hanneton et al., 1997; Welsh and Llinas, 1997; Doeringer and Hogan, 1998; Fishbach et al., 2005; Gawthrop and Wang, 2006; Squeri et al., 2010; Loram et al., 2011). Moreover, some researchers have provided theoretical models to address intermittency (Hanneton et al., 1997; Ben-Itzhak and Karniel, 2008; Bye and Neilson, 2008, 2010; Gawthrop and Wang, 2009). Even so, the vast majority of current models, and I would dare to say the dogma in most of the current motor neuroscience literature involves continuous control (Todorov and Jordan, 2002; Karniel and Mussa-Ivaldi, 2003; Shadmehr and Wise, 2005).

In this paper I present the minimum transition hypothesis (MTH) asserting that the control signal in the high level of the motor system is intermittent and that the system evolved to minimize the transitions in this high level control signal. This hypothesis was first presented in a meeting of the neural control of movement society (Karniel et al., 2002) and has not been thoroughly tested yet.

# **MUSCLE SYNERGIES**

There are various definitions of synergies concentrating on the functional, neural, or muscular levels (Welsh and Llinas, 1997; Tresch et al., 1999; Giszter et al., 2000; Grossberg and Paine, 2000; Saltiel et al., 2001; Domkin et al., 2002; d'Avella et al., 2003; Kang et al., 2004; Mussa-Ivaldi and Solla, 2004; Cheung et al., 2005, 2009; d'Avella and Bizzi, 2005; Sosnik et al., 2007; Kargo and Giszter, 2008; Overduin et al., 2008; Berniker et al., 2009). Here, we define synergy at the muscular level: a group of muscles that can be activated together by a single neural command. Previous studies of such synergies employed recordings of electromyography (EMG) of multiple muscles and extracted the synergies with algorithms such as principal component analysis or non-negative matrix factorization, implicitly assuming that a few synergies represent most of the variance in the data. After presenting the hypothesis and its predictions, we further discuss more recent theories of synergies, such as the so-called time-varying synergies, and their relation to the MTH.

# **THE MINIMUM TRANSITION HYPOTHESIS**

The MTH asserts that the higher-level motor command is intermittent and sparse, and that the synergies have been developed to minimize the effort of this motor command as measured by the number of transitions. Two assumptions underlie the MTH: (1) There are groups of muscles that are typically activated together with a predefined pattern; we call each group a synergy. (2) The purpose of the synergies is to minimize the effort of the central nervous system (CNS) while controlling movements, i.e., the existence of synergies allows the CNS to send fewer commands than would be needed if each muscle were controlled individually. The minimization is hypothesized to be performed over the entire expected motor behavior of the animal.

More formally, consider a vector of motor commands $c(t)$ and a vector of muscle activations $e(t)$ generated by some spinal cord synergies, mathematically denoted by the operator $\mu$, namely $e = \mu(c)$. The MTH asserts that $\mu^{*} = \arg\min_{\mu} E\left[\sum_{t=1}^{T} \Delta C(t)\right]$, where $\Delta C(t)$ is the number of transitions in the control signal vector at time $t$, and the expectation is taken over the entire behavior of the animal.

## **MATHEMATICAL FORMULATION FOR LINEAR TIME INVARIANT SYNERGIES**

In order to demonstrate and validate the MTH, we adopt the following simplifying assumptions, which should be relaxed in future development and validation of the hypothesis: (1) The synergies are static and linear, i.e., there is a linear time-invariant relationship between the activation of the synergies and the activation of each muscle. (2) The activation of the synergies, namely the high-level motor command, can be well approximated by a sum of step functions of various amplitudes at various time points. The smoothness of the EMG is assumed to be the result of low-level filtering.
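The second assumption, step-like commands smoothed into continuous EMG by downstream filtering, can be visualized with a simple first-order low-pass filter; the signal and filter coefficient below are illustrative, not a physiological model:

```python
import numpy as np

n = 200
command = np.zeros(n)
command[50:150] = 1.0        # one on/off step command: two transitions

alpha = 0.1                  # assumed low-pass filter coefficient
emg = np.zeros(n)
for k in range(1, n):
    # first-order low-pass: smooths the piecewise-constant command
    emg[k] = (1 - alpha) * emg[k - 1] + alpha * command[k]

# the command jumps by 1.0; the filtered envelope never jumps more than alpha
print(np.abs(np.diff(command)).max())                 # 1.0
print(round(float(np.abs(np.diff(emg)).max()), 2))    # 0.1
```

The sparse, discontinuous command thus remains recoverable in principle even though the recorded EMG envelope is smooth.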

Suppose that there are *K* muscles and *N* synergies, where each synergy is activated by a control command *cj*(*t*). Following the first assumption of linearity, the EMG of each muscle can be written as the following linear combination of the control commands:

$$e_i(t) = \sum_{j=1}^{N} \mu_{i,j} \cdot c_j(t) \quad \forall i \in \{1, 2, \dots, K\} \tag{1}$$

where $\mu$ is a matrix of the weights of the synergies. Following the second assumption (commands being pulses and steps), one can count the number of changes in the control signal sent by the CNS as $\Delta C(t) = \sum_{j=1}^{N} \left[c_j(t) \neq c_j(t-1)\right]$, which can be practically relaxed by counting only changes with $\left|c_j(t) - c_j(t-1)\right| > \varepsilon$. The MTH asserts that the system evolved to minimize the effort of the CNS as measured by the number of transitions in the motor command; therefore, the MTH suggests that the synergies are the result of the following optimization: $\mu^{*} = \arg\min_{\mu} \sum_{t=1}^{T} \Delta C(t)$, where the changes should be counted over a representative sample of all possible control signals. **Figure 1** illustrates the main idea of the MTH with a simple toy example.
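A toy instance of this objective, with hypothetical synergy weights and commands, illustrates both Equation 1 and the transition count: two piecewise-constant synergy commands drive four muscles, and controlling the muscles through the synergies requires far fewer CNS transitions than controlling each muscle individually.

```python
import numpy as np

def count_transitions(signals, eps=1e-6):
    """Total number of transitions in piecewise-constant signals
    of shape (channels, time steps)."""
    return int(np.sum(np.abs(np.diff(signals, axis=1)) > eps))

# Equation 1 with hypothetical weights: K=4 muscles, N=2 synergies
mu = np.array([[1.0, 0.0],
               [0.5, 0.5],
               [0.0, 1.0],
               [0.2, 0.8]])

c = np.zeros((2, 10))        # piecewise-constant synergy commands
c[0, 2:6] = 1.0              # synergy 1: on at t=2, off at t=6
c[1, 4:] = 0.5               # synergy 2: on at t=4
e = mu @ c                   # muscle activations, Equation 1

print(count_transitions(c))  # 3 transitions in the CNS command
print(count_transitions(e))  # 9 transitions if muscles were driven individually
```

The gap between the two counts is exactly the saving the MTH attributes to the synergy structure.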

# **INTERMITTENCY AND THE HIERARCHICAL NATURE OF THE MOTOR SYSTEM**

This paper opened with a citation from Harvey describing the hierarchical nature of the motor system. The MTH is based on

the assumption that the higher level sends intermittent motor commands and that the lower level in the hierarchy evolved to minimize the motor commands sent from the higher level. In this section we consider a simple reaching movement and demonstrate how the MTH may fit into the current view of desired trajectories and current models of optimal hierarchical control. **Figure 2** illustrates the hierarchical nature of motor control and the location of the MTH within this system. Numerous criteria were proposed to account for trajectory formation in reaching movements (e.g., Abend et al., 1982; Flash and Hogan, 1985; Uno et al., 1989; Harris and Wolpert, 1998), and several studies have proposed various criteria for the manipulation of a mass on a spring (Dingwell et al., 2004; Svinin et al., 2005, 2006; Leib and Karniel, 2012). The desired trajectory is expected to be found in the high-level neural activity of the CNS, and under well-practiced conditions, the actual arm trajectory is expected to be similar to the desired trajectory. Here we discuss intermittent control, and in this context the minimum acceleration criterion with constraints [MACC; Ben-Itzhak and Karniel, 2008; Leib and Karniel, 2012], which predicts intermittent control signals, is probably the best candidate for the desired trajectory; however, other desired trajectories could be considered. One should note that the MACC predicts a continuous arm trajectory with a bell-shaped speed profile; however, the predicted jerk signal, namely the third time derivative of the path, is a piecewise-constant function of time, and therefore it practically predicts intermittent control signals. Some models do not include a desired trajectory; however, even these models typically have some cost function to be minimized or reward to be maximized, and the resultant

optimal trajectory can be called "a desired trajectory" in the current formulation. The MTH asserts that synergies evolved to minimize the transitions in the control signal issued from the central nervous system to the spinal cord. Learning a new skill can also be formulated in this framework. First, the high-level controller calculates the desired intermittent trajectory and quickly adapts through feedback error to perform the desired trajectory. Then, at a slower pace, the synergies are adapted to map the motor command to muscle activations so as to minimize the number of transitions.
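The MACC prediction referenced above (piecewise-constant jerk, continuous trajectory) can be reproduced with a minimal numerical sketch; the three-phase jerk profile below uses assumed switch times and amplitude, not the MACC-optimal values:

```python
import numpy as np

T, n = 1.0, 1000
dt = T / n
t = np.arange(n) * dt
J = 60.0

# Piecewise-constant jerk: +J, -J, +J with two switches (intermittent control)
jerk = np.where((t < T / 4) | (t >= 3 * T / 4), J, -J)

acc = np.cumsum(jerk) * dt   # triangular acceleration, zero at both ends
vel = np.cumsum(acc) * dt    # smooth, bell-shaped speed profile
pos = np.cumsum(vel) * dt    # continuous, monotone reaching trajectory

# only two control transitions, yet speed and position are smooth
print(round(float(vel.max()), 3))   # peak of the bell-shaped speed profile
```

A control signal with only two transitions thus suffices for a smooth point-to-point reach, which is the intermittency the MTH exploits.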

Recent measurements in the cerebellum have found clear evidence supporting an intermittent control strategy (Loewenstein et al., 2005; Yartsev et al., 2009). These studies clearly showed that the activity of cerebellar Purkinje cells demonstrates bistability: bursting activity separated by pauses.

# **METHODOLOGICAL TOOLS TO TEST THE MTH**

We have developed a few methods to validate the MTH using data from two frogs (one, fs11, performed jump, swim, and kick; the other, fs17, performed jump, swim, and steps). The multiple recordings of EMG from a behaving frog provide valuable information that could be used in the attempt to decipher the structure of the synergies. A study by d'Avella and colleagues (d'Avella and Bizzi, 2005) suggested a simple method to extract the dominant synergies from the measured EMG signals by means of an iterative minimum non-negative least squares algorithm (here we refer to these six synergies as SA6) (Saltiel et al., 2001; Karniel et al., 2002; d'Avella et al., 2003; d'Avella and Bizzi, 2005). For a detailed description of the animals and of the EMG recording and analysis used to produce the



normalized EMG signals and SA6 used here, see d'Avella and Bizzi (2005).

The SA6 convey some information about the tendency to activate some muscles together; however, it is not clear whether this procedure really extracts the underlying synergies at the spinal cord level. Nevertheless, since we already had a candidate set of synergies, we designed a simple test to check whether they are consistent with the MTH. Similar tests could later be applied to other candidate synergies, or they could be adapted to extract the optimal MTH-based synergies.

Our test was based on counting the number of transitions in the CNS command associated with each major transition in the EMG signal. A major transition was defined as a monotonically increasing/decreasing change during three time steps totaling more than 0.7 in the normalized EMG signal (this is clearly an arbitrary definition, and future sensitivity analysis could be used to validate the results). We then shuffled the EMG data to generate a non-physiological EMG signal that could be approximated with the exact same synergies, and repeated the same count of transitions on the new signal. The null hypothesis asserts that the SA6 are just a compact mathematical description and have nothing to do with the structure of the neural control system; therefore, in particular, we do not expect them to support the MTH. Thus, we expect that any change in the EMG would be described by a change in the contribution of all the "synergies" in SA6. The alternative hypothesis asserts that the SA6 coincide with the MTH, namely, that the SA6 represent synergies in the nervous system, and that these synergies exist to simplify the task of the higher-level controller. Therefore, we expect that many changes in the EMG would result from a change in the contribution of only a few synergies in SA6. The result of this analysis is presented in **Figure 3**. It refutes the null hypothesis and supports the MTH.

To compare a specific set of synergies (SA6, or simply S) to other possible synergies, we used S as a seed and generated many random S-equivalents (SE), defined as any set of synergies that can approximate the data with a small residual error. In matrix notation, we can write Equation 1 as $E(t) = M \cdot C(t)$. The vector of EMG signals $E(t)$ is given, and we use the S synergies $M_S$ to approximate the CNS signal $C(t)$ such that it is positive (by means of non-negative least squares): $C(t) = \mathrm{NNLS}\{M_S, E(t)\} + \varepsilon$.

To generate the equivalent synergies SE, we generate a random invertible matrix A, and use it to transform S and the control signal as follows:

$$\begin{aligned} E(t) &= M \cdot C(t) = M \cdot A \cdot A^{-1} \cdot C(t) \cong \tilde{M} \cdot \tilde{C}(t);\\ \tilde{M} &= M \cdot A; \quad \tilde{C}(t) = \mathrm{NNLS}\{\tilde{M}, E(t)\} + \tilde{\varepsilon} \end{aligned}$$

Note that we used the non-negative least squares (NNLS) algorithm instead of calculating $\tilde{C} = A^{-1}C$, since the latter may yield negative control signals.
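The equivalence transformation can be checked numerically. In this illustrative sketch (random but fixed matrices, not the frog data), the exact equivalent commands $A^{-1}C$ reproduce the same EMG while typically violating nonnegativity, which is why the NNLS re-estimation is needed:

```python
import numpy as np

rng = np.random.default_rng(2)

# Original synergies M (K=4 muscles, N=3 synergies) and sparse commands C
M = rng.uniform(0.0, 1.0, size=(4, 3))
C = np.zeros((3, 20))
C[0, 3:9] = 1.0
C[1, 6:15] = 0.7
C[2, 12:] = 0.4

A = rng.normal(size=(3, 3))      # random invertible mixing matrix
M_eq = M @ A                     # equivalent synergies M~ = M * A
C_eq = np.linalg.solve(A, C)     # exact equivalent commands A^-1 * C

E = M @ C
print(np.allclose(E, M_eq @ C_eq))   # True: identical EMG from either pair

# The exact equivalent commands generally go negative, so the paper
# re-estimates them with NNLS rather than using A^-1 * C directly.
print((C_eq < 0).any())
```

Counting transitions in many such $\tilde{C}$ estimates, as in **Figure 4**, then compares the seed synergies against their equivalents.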

**Figure 4** demonstrates that most equivalent synergies required more transitions than SA6, suggesting that SA6 is consistent with the MTH.

Additional validation of our predictions can be performed by comparing physiologically plausible synergies from the literature with equivalent synergies, and by deriving predictions about the difference between intact and deafferented preparations (namely, with and without proprioceptive feedback, respectively). By simulating continuous and intermittent control signals with physiological noise, we can generate predictions that support or refute the MTH, extract the MTH synergies, and compare them to other synergies reported for the same raw data.

As briefly mentioned in this section, we have employed these methods on multiple recordings of EMG from a behaving frog provided by Andrea d'Avella and colleagues at the Bizzi Lab (Saltiel et al., 2001; Karniel et al., 2002; d'Avella et al., 2003; d'Avella and Bizzi, 2005), and the results were consistent with the MTH. However, without probing the higher level system, it is extremely difficult to provide a convincing proof of the MTH, and therefore further tests are required in animals and in humans. The MTH also suggests a compelling technological solution for the control of artificial systems facing similar challenges of delay and other conditions that favor hierarchical control.

# **GENERALIZATION TO NONLINEAR TIME VARYING SYNERGIES, FEEDBACK, LEARNING, ADAPTATION, AND EVOLUTION**

The biological synergies in frogs or in humans are most likely not the static linear matrices suggested above. On the other hand, with arbitrarily complex nonlinear time-varying synergies, one could explain any behavior with a single transition in the higher nervous system. The MTH, as a biologically plausible hypothesis, asserts that there is a nontrivial hierarchical structure.

One way to relax the assumptions underlying the simple model described above is to allow for dynamic synergies [sometimes referred to as time varying synergies (d'Avella et al., 2003; Cheung et al., 2005)]. This extended model accounts for the possibility that the CNS issues one command, e.g., a step function, and the spinal cord, by means of the synergies, generates complex time varying signals with different delays and temporal structure to each muscle. In formal notation, Equation 1 may be replaced by

$$e\_j(t) = \sum\_{i=1}^{N} \mu\_{i,j} \left[ c\_i(t) \right] \quad \forall j \in \{1, 2, \dots, K\}$$

In this formulation, μ*i*, *<sup>j</sup>* is a functional, a general operator on the input signal, which generates a command to the muscle. The step responses of these operators are the time-varying signals used by d'Avella et al. (2003). Introducing general synergies and general filters in order to validate/refute the MTH in this general case is an overwhelming task, so we suggest limiting the search to time-invariant filters and considering two special cases: (1) delay operators, introducing various delays for each synergy and/or for each muscle, and (2) linear filters. In the latter case, each operator can be represented as a transfer function in the Laplace domain (μ*i*, *<sup>j</sup>*(*s*)). If one restricts the number of zeros and poles, it is possible to estimate these parameters. Some other directions for future exploration are: varying the number of synergies, comparing them to synthetic EMG signals with the same frequency content, and comparing synergies between behaviors.
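As a toy illustration of special case (2) (all parameter values here are our own assumptions), one channel μ*i*, *<sup>j</sup>* can be modeled as a pure delay followed by a first-order low-pass filter *g*/(τ*s* + 1); a single step transition in the CNS command then produces a smooth, delayed muscle waveform:

```python
import numpy as np
from scipy.signal import lfilter

def muscle_command(c, gain, tau, delay_steps, dt=0.01):
    """Apply one operator mu_{i,j}: pure delay followed by a discrete
    first-order low-pass y[k] = a*y[k-1] + (1-a)*gain*u[k], a = exp(-dt/tau)."""
    delayed = np.concatenate([np.zeros(delay_steps), c[:len(c) - delay_steps]])
    a = np.exp(-dt / tau)
    return lfilter([(1 - a) * gain], [1.0, -a], delayed)

c = np.zeros(300)
c[50:] = 1.0                                   # a single CNS transition
m = muscle_command(c, gain=1.0, tau=0.05, delay_steps=20)
```

With different gains, delays, and time constants per synergy-muscle channel, one CNS transition generates a whole family of temporally structured muscle signals.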

In order to better understand the meaning of this hypothesis, it is important to remember that a key property of the motor control system is adaptation in the wide sense, including feedback, adaptation, learning, and evolution; see **Figures 2**, **5** and Karniel (2009, 2011). As illustrated in **Figure 5**, feedback is available to all levels of the control hierarchy in real time; adaptation can modify the weights of each synergy, while skill learning can also modify the number of synergies, generate new synergies, and change the motor command accordingly to reduce the number of transitions.

It is also important to note that the hypothesis presented here is only a prototype for the MTH, and more details and assumptions are required to provide more specific predictions for a specific animal and motor task. For example, there is a tradeoff between communication rate and optimal performance that can be included in a more specific MTH; see Nair et al. (2007).

# **PREDICTIONS OF THE MTH**

**FIGURE 5 |** Feedback, adaptation, learning, and evolution in the spatiotemporal hierarchy of wide sense adaptation.

1. Higher level motor command is intermittent and sparse.


In this article we have presented the MTH, demonstrated how it can be tested, and listed its predictions. Our demonstrations cannot be considered statistical proof of the hypothesis and further studies are required to support (or refute) the MTH and to elaborate on the structure of synergies in terms of adaptation and generalization capabilities.

# **ACKNOWLEDGMENTS**

I wish to thank Sandro Mussa-Ivaldi, Emilio Bizzi, Andrea d'Avella, Vincent Cheung, Sharon de-Castro, Ofir Tiferet, Tal Furmanov, Ilana Nisky, and Raz Leib for useful discussions about the Minimum Transition Hypothesis. This research is currently being supported by the Israel Science Foundation.


of visual feedback. *J. Neurophysiol.* 80, 1787–1799.


*Neuroscience,* eds M. D. Binder, N. Hirokawa, and U. Windhorst (Berlin: Springer-Verlag), 832–837.


based on olivocerebellar physiology. *Prog. Brain Res.* 114, 449–461.

Whitteridge, G. (ed.). (1959). *De Motu Locali Animalium* [On Animal Movement] (1627), by William Harvey. Translated and introduced by Gweneth Whitteridge (Cambridge: Cambridge University Press).

Yartsev, M. M., Givon-Mayo, R., Maller, M., and Donchin, O. (2009). Pausing Purkinje cells in the cerebellum of the awake cat. *Front. Syst. Neurosci.* 3:2. doi: 10.3389/neuro.06.002.2009

**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 14 December 2012; accepted: 11 February 2013; published online: 28 February 2013.*

*Citation: Karniel A (2013) The minimum transition hypothesis for intermittent hierarchical motor control. Front.* *Comput. Neurosci. 7:12. doi: 10.3389/ fncom.2013.00012*

*Copyright © 2013 Karniel. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

# Spatiotemporal characteristics of muscle patterns for ball catching

#### *M. D'Andola1, B. Cesqui1, A. Portone1,2, L. Fernandez1,3, F. Lacquaniti1,2,4 and A. d'Avella1\**

*<sup>1</sup> Laboratory of Neuromotor Physiology, Santa Lucia Foundation, Rome, Italy*

*<sup>2</sup> Department of Systems Medicine, University of Rome, Rome, Italy*

*<sup>3</sup> Centre National de la Recherche Scientifique, Aix-Marseille Université, ISM UMR 7287, Marseille cedex, France*

*<sup>4</sup> Center of Space Biomedicine, University of Rome, Rome, Italy*

### *Edited by:*

*Tamar Flash, Weizmann Institute, Israel*

### *Reviewed by:*

*Alessandro Treves, Scuola Internazionale Superiore di Studi Avanzati, Italy Amir Karniel, Ben-Gurion University, Israel*

### *\*Correspondence:*

*A. d'Avella, Laboratory of Neuromotor Physiology, Santa Lucia Foundation, Via Ardeatina 306, 00179 Rome, Italy e-mail: a.davella@hsantalucia.it*

What sources of information and what control strategies the central nervous system (CNS) uses to perform movements that require accurate sensorimotor coordination, such as catching a flying ball, is still debated. Here we analyzed the EMG waveforms recorded from 16 shoulder and elbow muscles in six subjects during catching of balls projected frontally from a distance of 6 m and arriving at two different heights and with three different flight times (550, 650, 750 ms). We found that a large fraction of the variation in the muscle patterns was captured by two time-varying muscle synergies, coordinated recruitment of groups of muscles with specific activation waveforms, modulated in amplitude and shifted in time according to the ball's arrival height and flight duration. One synergy was recruited with a short and fixed delay from launch time. Remarkably, a second synergy was recruited at a fixed time before impact, suggesting that it is timed according to an accurate time-to-contact estimation. These results suggest that the control of interceptive movements relies on a combination of reactive and predictive processes through the intermittent recruitment of time-varying muscle synergies. Knowledge of the dynamic effect of gravity and drag on the ball may then be implicitly incorporated in a direct mapping of visual information into a small number of synergy recruitment parameters.

**Keywords: muscle synergies, interception, EMG activity, intermittency, time-to-contact**

# **INTRODUCTION**

Catching or hitting a moving object requires high spatiotemporal accuracy in the control of an end-effector in the presence of visuomotor delays. What sources of information and which visuomotor strategies the central nervous system (CNS) employs to achieve accurate control of interception has been extensively investigated and is still debated (Zago et al., 2009). On one hand, lawful relationships between visual information available in retinal and extraretinal variables and properties of the target motion, such as time-to-contact or motion in depth, might be exploited to bring the effector onto the target (Lee, 1998; Tresilian, 1999; Regan and Gray, 2000). On the other hand, prior knowledge of the characteristics of the target motion and their dependence on environmental conditions, such as gravity acceleration, might be necessary to successfully guide interception (McIntyre et al., 2001; Zago et al., 2004, 2008).

While interceptive movements have been studied extensively in terms of global performance measures and kinematic variables, the underlying muscle activation patterns have received considerably less attention. Even so, the analysis of the timing of muscle activation in preparation for interception has provided evidence for an accurate estimate of time-to-contact. The anticipatory electromyographic (EMG) activity recorded in elbow muscles when catching a ball falling on the hand from different heights shows an early component and a late component (Lacquaniti and Maioli, 1989). The early component has a roughly constant latency from the time of ball release. In contrast, the late component has a constant onset time with respect to the time of impact, indicating that the time-to-contact is estimated accurately. As the ball accelerates under gravity during the fall, precise estimation of time-to-contact cannot be obtained from first-order derivatives of retinal variables, such as the inverse of the rate of dilation of the retinal image (the tau variable; Lee et al., 1983), and requires a priori knowledge of gravity acceleration (Zago et al., 2008). The onset of EMG activity with respect to the time of impact is also constant for catching of balls thrown frontally (Savelsbergh et al., 1992).

However, to date, the muscle synergies underlying the coordination of multi-muscle EMG activity have not been characterized in catching tasks. It has recently been shown that modulation and superposition of a few time-varying muscle synergies, each appropriately scaled in amplitude and duration and shifted in time, capture the variation of the muscle patterns during reaching to stationary targets in different directions, with different speeds, and after a sudden change of the target location (d'Avella et al., 2006, 2008, 2011b). Such synergies are time-varying because they represent a collection of activation waveforms expressing a balance of muscle activation that varies over a few hundred milliseconds of the time-course of the muscle pattern for a single movement but that is, as a whole, invariant across movement conditions and repetitions. To test whether an accurate estimation of time-to-contact is used to control the execution of the final phase of a naturalistic and unconstrained interceptive movement and, more generally, to gain insight into the underlying control mechanisms, here we characterized the spatiotemporal organization of muscle patterns during one-handed catching of balls projected from a fixed location and reaching the subject's frontal plane at two different heights (above and below the subject's shoulder height) after three different flight times (550, 650, and 750 ms). This task involves projectile motion as in the previous study by Savelsbergh et al. (1992), but it requires bringing the hand to a point that is not specified in advance, in contrast with the fixed hand position of Savelsbergh et al. (1992). In our conditions, if the effector starts to move well in advance of the latest time at which visual information can affect its trajectory before impact, it is not clear how to detect a signature of an accurate estimate of time-to-contact in the on-going EMG patterns. Here we show that time-varying synergies make it possible to identify two partially overlapping components in the muscle patterns for intercepting balls with different flight characteristics: one aligned with the ball launch and a second timed to the ball impact.

# **MATERIALS AND METHODS**

# **SUBJECTS, APPARATUS, AND EXPERIMENTAL PROTOCOL**

Six right-handed subjects (4 males and 2 females, between 22 and 47 years old) gave their written informed consent to participate in the study, which conformed to the Declaration of Helsinki and had been approved by the Ethical Review Board of the Santa Lucia Foundation. These subjects are a subset of the 14 subjects included in our previous report focusing on task performance and wrist kinematics (Cesqui et al., 2012) for which EMG data was also available. Both the experimental apparatus and protocol have been described in detail previously (Cesqui et al., 2012). Briefly, subjects were asked to catch with the right hand a lightweight (20 g) ball (diameter 7 cm, size similar to that of a cricket or tennis ball) projected in space with different initial velocities by a launching system, built specifically for these experiments (d'Avella et al., 2011a), which allowed accurate control of the mean ball flight time and arrival height at the subject's frontal plane. The ball was projected through a hole in a large screen (4 × 3 m, width × height) at 6 m from the subject's right shoulder and flew along a vertical plane passing through the center of the hole and the shoulder. Subjects were instructed to wait for the ball launch standing with their arms beside their body. Before starting the experiment the system was calibrated to deliver the ball with 3 different flight times (*T*<sup>1</sup> = 0.55 s, *T*<sup>2</sup> = 0.65 s, *T*<sup>3</sup> = 0.75 s) at 2 different heights (*Z*<sup>1</sup> = 1.3 m, low launches, and *Z*<sup>2</sup> = 1.9 m, high launches) for a total of six different ball flight conditions. Each subject performed at least 1 block of at least 10 trials for each condition (typically, 3 blocks with 10–15 trials). The order of block execution was selected at random for each subject.

### **DATA ACQUISITION**

The spatial position of the ball (covered with retro-reflective tape, Scotchlite, 3M, Pioltello, Milan, Italy) was measured together with the spatial position of 8 optical retro-reflective markers placed on the subject's trunk and arm close to the following anatomical landmarks: seventh cervical vertebra (C7), clavicle (CL), sternum (SRN), right acromion (RSHO), right epicondylus lateralis (RELB), right forearm (RFRA), right wrist ulnar styloid (RWRU), right wrist radial styloid (RWRR). Marker positions in space were recorded with a sampling frequency of 100 Hz using a motion capture system (9-camera Vicon-612 system, Vicon, Oxford, UK). A very large tracking volume (6 × 3 × 3 m<sup>3</sup>) was required for capturing the motion of both the ball and the subject's upper limb. The marker reconstruction residual, averaged over the nine cameras, obtained in such a volume with the Vicon calibration procedure ranged across subjects between 0.93 and 1.03 mm (mean 0.97 mm). Marker coordinates were referred to a right-handed calibration frame placed on the floor at 6 m distance from the launch plane, oriented with the *x* axis directed along the antero-posterior axis on the horizontal plane pointing toward the screen and with the *z* axis on the vertical plane pointing upward.

EMG activity was recorded with active bipolar surface electrodes (*DE 2.1*; Delsys, Boston, MA) from the following 16 muscles: biceps brachii, short head (abbreviated as BicShort), biceps brachii, long head (BicLong), brachioradialis (BrRad), triceps brachii, lateral head (TrLat), triceps brachii, long head (TrLong), anterior deltoid (DeltA), middle deltoid (DeltM), posterior deltoid (DeltP), pectoralis major, clavicular portion (PectClav), pectoralis major, sternal portion (PectLow), superior trapezius (TrapSup), middle trapezius (TrapMid), inferior trapezius (TrapInf), latissimus dorsi (LatDors), teres major (TeresMaj), infraspinatus (InfraSp). Each electrode consisted of two parallel silver bars (10 mm spacing) and a differential preamplifier (gain, 10; rms noise, 1.2 μV; common mode rejection ratio, > 80 dB) housed in a compact case (41 × 20 × 5 mm). Electrodes were taped on the muscle belly and connected to an amplifier (Bagnoli-16, Delsys) where the EMG signal was band-pass filtered (20–450 Hz), amplified (total gain 1000) and digitized at 1 kHz by an A/D board in the Vicon system, synchronized with the kinematic data acquisition. For each muscle, correct electrode placement was tested by asking the subject to perform a number of maneuvers involving both free movements and isometric contractions and monitoring the resulting activation patterns on a computer screen.

# **DATA ANALYSIS**

All analyses were performed with custom software written in Matlab (Mathworks, Natick, MA). We considered trials in which the subjects either intercepted but did not catch the ball or caught the ball, thus excluding trials in which the subjects missed the ball (0.9% of the total number of trials on average). Motion capture and EMG data were analyzed between 200 ms before ball launch and 100 ms after the interception. The time of ball launch was obtained by detecting the instant at which the ball passed through the hole on the screen using a photo-sensor (E3T-S112, Omron Electronics S.p.A., Milan, Italy) mounted on the edge of the hole. The time of interception was obtained by computing the instant at which the distance between the ball trajectory (spatial coordinates as a function of time) and the plane passing through the wrist (RWRU, RWRR) and forearm (RFRA) marker positions reached its minimum (Cesqui et al., 2012). The onset time of wrist movement was defined as the time at which the tangential velocity of the averaged position of the RWRU and RWRR markers (marker RWRM) crossed a threshold equal to 10% of its maximum.
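The onset-detection rule can be sketched as follows (the marker trace here is a synthetic stand-in; the 10% threshold is from the text):

```python
import numpy as np

def movement_onset(pos, dt, fraction=0.1):
    """Return the time of the first sample at which the tangential
    velocity of a (T x 3) position trace crosses `fraction` of its max."""
    vel = np.linalg.norm(np.gradient(pos, dt, axis=0), axis=1)
    idx = int(np.argmax(vel >= fraction * vel.max()))
    return idx * dt

# Toy wrist trace: at rest for 1 s, then moving at constant velocity.
dt = 0.01
t = np.arange(300) * dt
pos = np.column_stack([np.maximum(0.0, t - 1.0),
                       np.zeros_like(t), np.zeros_like(t)])
print(movement_onset(pos, dt))
```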

## *EMG pre-processing*

A notch-filter was used to reduce 50 Hz line noise when present (quality factor 35; Matlab iirnotch function). The EMGs for each trial were digitally full-wave rectified, low pass filtered (20th order FIR filter; 50 Hz cutoff; zero phase distortion; Matlab fir1 and filtfilt functions) and resampled at 200 Hz. Diagnostic screening of EMG signal quality was performed by checking the maximum level of activation, the activation level before the launch time (i.e., with the arm at rest), and the power spectral density of each channel. Eighteen trials (out of a total of 581 trials) that showed high pre-launch activity or abrupt change in signal amplitude, likely resulting from a partial detachment of the electrode from the skin, were excluded. Presence of significant cross-talk was assessed by computing the mean cross-correlation between pairs of channels across trials. Across all subjects, the maximum of the absolute value of the mean cross-correlation was above 0.4 for 11 pairs of muscles (TrLat-TrLong in four subjects, BicShort-BicLong and TeresMaj-InfraSp in three subjects, BicLong-TrLat and PectLow-BrRad in two subjects, TrapInf-TrapMid, DeltA-DeltM, DeltM-TrLat, DeltP-TrLat, PectLow-DeltP, DeltP-BrRad in one subject). Because of the difficulty in distinguishing crosstalk due to volume conduction from synchronous recruitment of motor units in different muscles, we did not remove these muscles from the set used for the main analyses. However, in additional analyses, we verified that the removal of the muscles potentially affected by cross-talk did not change any conclusion drawn from the main analyses. Baseline activity, defined as the mean EMG activity in the 200 ms before launch, was subtracted from each EMG channel. EMGs recorded in each trial were aligned to launch time and averaged across repetitions on each one of the six launch conditions. Finally, the averaged EMG of each channel was normalized to its maximum amplitude across conditions.
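A SciPy approximation of this pipeline (filter orders and cutoffs as stated in the text; this is a sketch, not the authors' Matlab code) might read:

```python
import numpy as np
from scipy.signal import iirnotch, firwin, filtfilt, resample_poly

def preprocess_emg(emg, fs=1000, fs_out=200, line_hz=50.0, q=35.0, lp_hz=50.0):
    """50 Hz notch (Q = 35), full-wave rectification, zero-phase
    20th-order FIR low-pass at 50 Hz, resampling from 1 kHz to 200 Hz."""
    b, a = iirnotch(line_hz, q, fs=fs)       # cf. Matlab iirnotch
    x = filtfilt(b, a, emg)
    x = np.abs(x)                            # full-wave rectification
    h = firwin(21, lp_hz, fs=fs)             # 20th-order FIR = 21 taps (cf. fir1)
    x = filtfilt(h, [1.0], x)                # zero phase distortion (cf. filtfilt)
    return resample_poly(x, fs_out, fs)      # 1 kHz -> 200 Hz
```

Baseline subtraction, trial averaging, and per-channel normalization then follow as described above.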

# *Time-varying muscle synergies*

We modeled the construction of a muscle pattern by combination of N time-varying muscle synergies as follows (d'Avella et al., 2006):

$$\mathbf{m}\left(t\right) = \sum\_{i=1}^{N} c\_i \mathbf{w}\_i \left(t - t\_i\right) + \epsilon(t)$$

where **m**(t) is a vector of real numbers, each component of which represents the activation of a specific muscle at time *t*; **w**i(τ) is a vector representing the muscle activation for the *i*-th synergy at time τ after the synergy onset; *ti* is the time of synergy onset; *ci* is a non-negative scaling coefficient; ϵ(*t*) is the residual. The extraction algorithm was initialized by choosing random values for *N* synergies of a specific duration, and it proceeded by iterating three steps (d'Avella and Bizzi, 2005; d'Avella et al., 2006): (1) synergy onset time estimation, given the synergies; (2) amplitude coefficient estimation, given the synergies and their onset times; (3) synergy update, given onset times and amplitude coefficients.
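The generative model itself (not the iterative extraction algorithm, which is omitted here) reduces to a few lines; the names and dimensions are our own:

```python
import numpy as np

def combine_synergies(W, c, t_on, T):
    """m(t) = sum_i c_i * w_i(t - t_i): W is a list of N (K x D) synergy
    arrays, c the non-negative amplitudes, t_on the onset samples."""
    m = np.zeros((W[0].shape[0], T))
    for w, ci, ti in zip(W, c, t_on):
        lo, hi = max(ti, 0), min(ti + w.shape[1], T)
        m[:, lo:hi] += ci * w[:, lo - ti:hi - ti]
    return m
```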

We used a convergence criterion of five consecutive iterations in which the decrease in the reconstruction error was less than 10<sup>−4</sup>. To minimize the probability of finding local minima, for each *N* we repeated the optimization 20 times and selected the solution with the highest goodness of reconstruction. Because the EMG patterns and the residual of the reconstruction of patterns by synergy combination are multivariate time-series, we used the fraction of total variation explained by the model as a measure of the goodness of the reconstruction (*R*<sup>2</sup> = 1 − SSE/SST, where SSE is the sum of the squared residuals and SST is the sum of the squared deviations of each channel's activation from its mean).
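The goodness-of-reconstruction measure can be written directly from this definition (a sketch; variable names are ours):

```python
import numpy as np

def r_squared(E, E_hat):
    """R^2 = 1 - SSE/SST, with SST the squared deviation of each
    channel's activation from its own mean."""
    sse = np.sum((E - E_hat) ** 2)
    sst = np.sum((E - E.mean(axis=1, keepdims=True)) ** 2)
    return 1.0 - sse / sst
```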

The number of synergies (*N*) and the duration of each synergy were selected according to the general features of the time-course of the muscle activation waveforms observed (see **Figures 1**, **2**). In particular, we selected 2 synergies with a duration of 400 ms as the minimum number of synergies with the shortest duration which could fully capture the two distinct bursts of EMG activation. We also verified that the reconstruction *R*<sup>2</sup> did not increase significantly with synergies longer than 400 ms.

Synergies extracted from individual subjects were compared after grouping them with a hierarchical clustering procedure based on distance between synergy pairs. Distance between two synergies was defined as 1 − s, where s is the similarity between two synergies defined as the maximum of the scalar product between the two normalized synergies across all possible relative time shifts (d'Avella et al., 2006). A hierarchical tree was constructed from the distances between all pairs (Matlab function linkage) and such tree was used to form clusters (Matlab function cluster; "cutoff" parameter 0.15).
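A sketch of this procedure with SciPy (the 1 − s distance and 0.15 cutoff follow the text; whole-sample shifts with zero padding are our simplifying assumption):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def synergy_similarity(w1, w2):
    """Similarity s: max scalar product of the two normalized (K x D)
    synergies over all relative time shifts (zero padded outside)."""
    D = w1.shape[1]
    n1, n2 = np.linalg.norm(w1), np.linalg.norm(w2)
    best = -np.inf
    for s in range(-D + 1, D):
        shifted = np.zeros_like(w2)
        if s >= 0:
            shifted[:, s:] = w2[:, :D - s]
        else:
            shifted[:, :D + s] = w2[:, -s:]
        best = max(best, np.sum(w1 * shifted) / (n1 * n2))
    return best

def cluster_synergies(synergies, cutoff=0.15):
    """Group synergies by hierarchical clustering on distance 1 - s."""
    n = len(synergies)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = 1 - synergy_similarity(
                synergies[i], synergies[j])
    tree = linkage(squareform(dist), method='average')
    return fcluster(tree, cutoff, criterion='distance')
```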

When the highest similarity between synergies in the same cluster was obtained with a non-zero relative time shift, the value of that time shift with respect to the synergies of subject S1 was subtracted from the timing coefficient (*ti*) to allow comparison across subjects.

To assess the potential effect of catching performance, we compared the synergies extracted from trials in which the ball was caught with those extracted from trials in which the ball was intercepted but not caught.

To analyze synergy amplitude coefficients (*ci*), each synergy was normalized to the maximum of the Euclidean norm of the synergy vectors (**w***i*(t)) across time samples, and the corresponding coefficient was scaled by this norm. Moreover, to compare amplitude coefficients across launch conditions, each coefficient was normalized to its maximum value.

### *Statistical analysis*

To assess the effect of the experimental conditions (three ball flight times and two ball arrival heights) on the synergy recruitment parameters, the synergy amplitude and timing coefficients were submitted either to a Two-Way ANOVA (3 arrival times × 2 height conditions) or to a multiple linear regression analysis according to the model:

$$y = \alpha + \beta T + \gamma Z\_d + \delta \left(T \cdot Z\_d\right) + \varepsilon$$

where *T* is the time variable, i.e., the ball flight time, and *Zd* is a dummy variable for the ball arrival height *Z* (*Zd* = 1 for low launches and *Zd* = 2 for high launches). Statistical analyses were performed in the R software environment (R Development Core Team, 2011). The level of significance was set at *p* < 0.05.
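For concreteness, the regression can be sketched as an ordinary least-squares fit of the design matrix [1, *T*, *Zd*, *T*·*Zd*] on synthetic data (coefficient names follow the model above):

```python
import numpy as np

def fit_timing_model(T, Zd, y):
    """OLS fit of y = alpha + beta*T + gamma*Zd + delta*(T*Zd)."""
    X = np.column_stack([np.ones_like(T), T, Zd, T * Zd])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef                      # [alpha, beta, gamma, delta]

T = np.tile([0.55, 0.65, 0.75], 2)       # the three flight times (s)
Zd = np.repeat([1.0, 2.0], 3)            # dummy height (1 low, 2 high)
y = 0.1 + 0.5 * T - 0.2 * Zd + 0.3 * T * Zd   # synthetic coefficients
print(fit_timing_model(T, Zd, y))
```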

**FIGURE 1 | Example of wrist kinematics and EMGs profiles.** *First column*: example of wrist path in the sagittal plane **(top)**, rectified EMGs **(middle)**, and wrist speed profile **(bottom)** for one trial of subject S1 catching a ball with flight duration *T*<sup>1</sup> and arrival height *Z*1. *Second column*: wrist trajectories, filtered EMGs, and wrist speed profiles averaged across multiple trials of subject S1 in the condition (*T*1,*Z*1). *Thick black lines* represent averages across trials, *gray areas* represent the standard deviation; EMGs were aligned to the launch time and normalized to the maximum of each channel across all trials and all launch conditions (*vertical bars*). Times of wrist movement onset and of ball impact with the hand are indicated respectively by the *dashed lines* in the EMGs and speed profiles plots and by *gray circular markers* in the wrist trajectory plots.

# **RESULTS**

Subjects, starting with the arm at rest and relaxed along their body, were instructed to catch lightweight balls projected from about 6 m from their shoulder at a height of 1.66 m and arriving at their frontal plane at two different heights and with three different flight times. On average, 67.6 ± 11.8% (mean ± SD, *n* = 6, range 55.0–84.6%) of launched balls were successfully caught and 30.8 ± 12.0% (range 13.5–41.7%) were intercepted but not caught. Mean onset time of the wrist movement with respect to ball launch over all trials was 171 ± 15 ms (mean ± SD, *n* = 6, range 149–193 ms). Mean movement time was 377 ± 20, 459 ± 21, and 538 ± 23 ms (mean ± SD, *n* = 6) for low launches (*Z*1) at the three different flight times (*T*1, *T*2, and *T*3) and 370 ± 7, 462 ± 15, and 575 ± 23 ms for high launches (*Z*2).

### **MUSCLE PATTERNS**

The EMG activity waveforms recorded during catching of a ball flying along a trajectory with a low arrival height and the shortest flight time (condition *Z*<sup>1</sup> *T*1, **Figure 1**, *first column*, a single trial and, *second column*, average over 10 trials for subject S1) illustrate the key features of the patterns observed in all conditions and subjects. Muscles were activated mainly in two phases. Elbow flexors (biceps brachii and brachioradialis) and shoulder forward flexors and scapula elevators (anterior and middle fibers of deltoid, upper trapezius) were among the most active muscles in a first phase during which the hand quickly rose from its initial rest position toward the region of interception. Muscles involved in backward flexion, adduction and internal rotation of the humerus (teres major, posterior deltoid, pectoralis major) or in extension and flexion from extended positions of the humerus (latissimus dorsi), together with elbow extensors (triceps brachii), were instead mostly active in a second phase in which the hand impacted the ball.

The general temporal structure of the muscle patterns for catching and their dependence on the flight time can be grasped from the ensemble averages of the EMG waveforms across muscles and trials with the same flight time. Two distinct peaks are clearly visible in the averaged EMG waveforms (**Figure 2**). The first peak has an approximately constant latency with respect to the time of ball launch, independently of flight duration [203 ± 14 ms, mean ± SD, *n* = 6, for *T*1; 216 ± 18 ms for *T*2; 212 ± 11 ms for *T*3; One-Way ANOVA, main effect of *T*: *F*(2, 17) = 1.25, *p* = 0.31]. Instead, the second peak has an approximately constant lead with respect to the time of impact, independently of flight duration [−28 ± 16 ms for *T*1; −5 ± 19 ms for *T*2; −41 ± 26 ms for *T*3; One-Way ANOVA, main effect of *T*: *F*(2, 17) = 1.73, *p* = 0.21]. Thus, the general temporal structure suggests that an initial component of the muscle pattern is generated in response to the detection of the ball launch and a second component is precisely timed to the time of interception. This finding confirms previous results obtained for catching a free-falling ball (Lacquaniti and Maioli, 1989), and extends the observation to unconstrained catching of ballistic projectile motion.

### **TIME-VARYING SYNERGIES**

The novel results concern the synergic organization of muscle activity. To characterize the fine spatiotemporal structure of the muscle patterns, we decomposed them as the combination of two time-varying muscle synergies, each one with a condition-specific amplitude and onset time. Each synergy had a duration of 400 ms, spanning the duration of both EMG activation phases in all conditions. The reconstruction *R*<sup>2</sup> was on average 0.73 ± 0.07 (mean ± SD, *n* = 6, range 0.65–0.83), indicating that a decomposition into two time-varying muscle synergies adequately captured the key features of the EMG waveforms.

**FIGURE 2 |** Averaged EMG waveforms aligned to the onset and impact (*large dashes*) events. Each channel of the EMG averaged waveforms was normalized to its maximum amplitude across all movement conditions before averaging across channels. In the top panels waveforms are aligned to the onset time; in the bottom panel waveforms are aligned to the impact time. Evidence for a rough division of the movement into two phases is provided by the presence of two main peaks in all the averaged waveforms.

The set of all pairs of synergies extracted from all subjects could be grouped according to their similarity into two clusters (see Materials and Methods), each containing only one of the two synergies extracted from each subject (**Figure 3**). The mean similarity between pairs of synergies was 0.86 ± 0.03 (mean ± SD, *n* = 15, range 0.50–0.91) in the first cluster and 0.85 ± 0.04 (mean ± SD, *n* = 15, range 0.55–0.88) in the second cluster. Moreover, the structure of the synergies was not affected by catching performance. The mean similarity between synergies extracted from all trials and synergies extracted only from trials in which the ball was caught was 0.94 ± 0.04 (mean ± SD, *n* = 6, pairs of synergies in the two conditions for each subject) for the first synergy and 0.92 ± 0.05 for the second synergy.

The synergies in both clusters captured synchronous and asynchronous activations of many muscles. The synergies of the first cluster (**Figure 4**, *first row*) recruited strongly elbow flexors (biceps brachii and brachioradialis), shoulder flexors (anterior deltoid and the clavicular portion of pectoralis) and shoulder elevators (superior trapezius) with a shorter bursts in the elbow flexors than in the other muscles in most cases. The synergies of the second cluster (**Figure 4**, *second row*) showed a higher level of

subject) defined as the maximum of the scalar product between the two normalized synergies across all possible relative time shifts. Synergy pairs in the matrix are grouped into two clusters using a hierarchical clustering algorithm (see Materials and Methods).
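A minimal sketch of this similarity measure (maximum scalar product of the unit-norm synergies over all relative time shifts); the implementation details are our assumptions, not the published code:

```python
import numpy as np

def synergy_similarity(w1, w2):
    """Maximum scalar product between two synergies (muscles x time),
    each normalized to unit norm, across all relative time shifts."""
    w1 = w1 / np.linalg.norm(w1)
    w2 = w2 / np.linalg.norm(w2)
    n1, n2 = w1.shape[1], w2.shape[1]
    best = 0.0
    for shift in range(-n2 + 1, n1):
        # overlapping columns of w1 for this shift of w2
        lo, hi = max(shift, 0), min(shift + n2, n1)
        overlap = np.sum(w1[:, lo:hi] * w2[:, lo - shift:hi - shift])
        best = max(best, float(overlap))
    return best
```

The resulting pairwise similarity matrix can then be fed to any standard hierarchical clustering routine (e.g., single- or average-linkage on 1 − similarity) to obtain the two clusters.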

co-activation of the entire set of muscles, but also a more complex temporal structure and greater inter-individual variability than the synergies of the first cluster. Elbow extensors (triceps brachii, long and lateral heads) and shoulder extensors and adductors (posterior deltoid, lower portion of pectoralis, latissimus dorsi) were strongly recruited with a burst peaking before the end of the synergy, while other muscles (biceps brachii long head, anterior deltoid) were recruited later and showed a ramped activation.

# **SYNERGY RECRUITMENT MODULATION ACROSS CONDITIONS**

The averaged EMG patterns of subject S1 for all the launch conditions (**Figure 5**, top, gray shaded area) and their reconstruction as a combination of the two synergies (black line) illustrate how the synergy amplitude and timing coefficients (represented respectively by the height and the horizontal position of the rectangles below the EMG waveforms) were modulated across conditions. In all six conditions, the first synergy was aligned with the launch time (first dashed vertical line of **Figure 5**). Across all subjects, a Two-Way ANOVA (3 flight times × 2 arrival heights) showed that the ball flight time and arrival height did not affect the latency of the first synergy with respect to the ball launch [main effect of *T*: *F*(2, 35) = 2.24, *p* = 0.124; main effect of *Z*: *F*(1, 35) = 1.41, *p* = 0.244; interaction: *F*(2, 35) = 0.22, *p* = 0.801]. Instead, the onset of the second synergy appeared to be aligned with the impact time (second dashed vertical line of **Figure 5**). Across all subjects, there was a significant effect of the ball flight time but no effect of the ball arrival height on the onset time of the second synergy with respect to the impact time [main effect of *T*: *F*(2, 35) = 3.83, *p* = 0.033; main effect of *Z*: *F*(1, 35) = 1.92, *p* = 0.156; interaction: *F*(2, 35) = 1.91, *p* = 0.165]. However, the observed effect of the ball flight time could be ascribed to subjects S3 and S4, who recruited the second synergy closer to the impact event in the *T*3*Z*2 condition (S3 and S4) and in

movement in reaction to the ball launch. The second synergy (**bottom row**) captured mostly the EMG patterns related to the final interceptive phase of the movement. Below the set of muscle waveforms constituting each muscle synergy, the mean waveform is illustrated within a box.

the *T*2*Z*3 condition (S4). When we excluded those two subjects, we found no statistically significant effect of either flight time or arrival height [Two-Way ANOVA without S3 and S4; main effect of *T*: *F*(2, 23) = 2.14, *p* = 0.14; main effect of *Z*: *F*(1, 23) = 0.03, *p* = 0.85; interaction: *F*(2, 23) = 1.54, *p* = 0.24].

We also analyzed the onset of the two synergies as a function of ball flight duration (**Figure 6**) after alignment to the launch time (*top*) and to the impact time (*bottom*) for all subjects (*different colors*) and both low (*circular markers*) and high (*square markers*) launches, as the flight time also depended on the choice of interception point, which varied across conditions and individuals. When aligned to launch time, the onset of the first synergy did not depend on ball flight conditions (multiple regression analysis: *p*<sub>β</sub> = 0.14, *p*<sub>γ</sub> = 0.24, *p*<sub>δ</sub> = 0.29), while the onset of the second synergy increased linearly with the flight time (multiple regression analysis: *p*<sub>β</sub> < 0.01 with β = 1.08, *p*<sub>γ</sub> = 0.149, *p*<sub>δ</sub> = 0.12). Conversely, when aligned to impact time, the onset of the first synergy decreased linearly with flight time (multiple regression analysis: *p*<sub>β</sub> < 0.01 with β = −0.79, *p*<sub>γ</sub> = 0.25, *p*<sub>δ</sub> = 0.29), while the timing of the second synergy did not depend on flight time (multiple regression analysis: *p*<sub>β</sub> = 0.55, *p*<sub>γ</sub> = 0.149, *p*<sub>δ</sub> = 0.12).
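The multiple regression above can be sketched as an ordinary least-squares fit. We assume here that β, γ, and δ are the coefficients of flight time, arrival height, and their interaction; this is a plausible reading of the text, not a confirmed description of the authors' analysis:

```python
import numpy as np

def fit_onset_model(flight_time, arrival_height, onset):
    """OLS fit of: onset = alpha + beta*T + gamma*Z + delta*T*Z.
    Returns [alpha, beta, gamma, delta]."""
    X = np.column_stack([
        np.ones_like(flight_time),      # intercept (alpha)
        flight_time,                    # beta: flight-time slope
        arrival_height,                 # gamma: arrival-height slope
        flight_time * arrival_height,   # delta: interaction
    ])
    coef, *_ = np.linalg.lstsq(X, onset, rcond=None)
    return coef
```

With launch-aligned second-synergy onsets, such a fit would return β ≈ 1, consistent with an onset locked a fixed interval before impact.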

The amplitude coefficients *c<sub>i</sub>* of the synergies were also modulated with respect to the ball arrival height. **Figure 7** shows the coefficients of individual subjects for each arrival height, normalized to the maximum across all conditions and averaged across flight times. The first synergy coefficient increased with arrival height (multiple regression analysis: *p*<sub>β</sub> = 0.44, *p*<sub>γ</sub> = 0.004 with γ = 0.75, *p*<sub>δ</sub> = 0.0571). In contrast, the amplitude coefficient for the second synergy was modulated differently with arrival height [multiple regression analysis: *p*<sub>β</sub> = 0.02, *p*<sub>γ</sub> = 0.19, *p*<sub>δ</sub> = 0.034; Two-Way ANOVA: main effect of *T*: *F*(2, 35) = 0.75, *p* = 0.482; main effect of *Z*: *F*(1, 35) = 40.93, *p* < 0.01; interaction: *F*(2, 35) = 1.53, *p* = 0.23]. For subjects S1, S3, S4, and S6 the coefficient almost doubled going from *Z*1 to *Z*2, while for subjects S2 and S5 the coefficient showed only a small increase.

# **DISCUSSION**

The synergy decomposition of the muscle activity patterns recorded during one-handed catching of fast balls, projected along spatial trajectories with different flight durations and arrival heights, revealed remarkable spatiotemporal characteristics. A large fraction of the variation of the muscle patterns for catching of balls with different flight parameters was captured by amplitude modulation, onset-time shift, and superposition of two time-varying muscle synergies, each representing the coordinated recruitment of a group of muscles with specific temporal activation profiles. The first synergy was recruited with a fixed latency from the time of ball launch and the second synergy with a fixed anticipation of the time of hand-ball impact. These results indicate that the control of the fast hand movements required to intercept a ball with flight duration below 750 ms relies on a combination of reactive and predictive processes. The initial muscular response, captured by the first synergy, was generated as fast as possible after the detection of the ball launch and allowed the subject's hand to reach the interception zone, always above the position of the hand at the time of launch. The following component of the muscle pattern, captured by the second synergy, guided the hand to the interception point in a fixed time interval, i.e., according to an accurate estimation of the time necessary to reach the interception point.

The fact that we could reconstruct the muscle patterns for catching balls with three different flight times and two different arrival heights by combining two time-varying muscle synergies does not imply that the two identified synergies are basic building blocks sufficient to control all possible interceptive movements. Many recent studies of the muscle patterns for the control of the arm have identified muscle synergies by systematically varying the spatial constraints of the task (Muceli et al., 2010; Cheung et al., 2012; Roh et al., 2012; Delis et al., 2013). In particular, in our previous studies of time-varying muscle synergies for reaching (d'Avella et al., 2006, 2008, 2011b) we systematically varied the direction of the movement. In contrast, in this study we have focused on the effect of changing the temporal constraints inherent in an interception task (flight duration) with only two related spatial constraints (arrival height). A much larger set of flight conditions with a variety of arrival locations should be tested to make

the synergies identified by a decomposition algorithm from the muscle patterns representative of the basic building blocks underlying the generation of all possible catching movements. While we plan to test a more diverse set of experimental conditions in the future, the synergy decomposition approach was used here as a tool to identify spatiotemporal components in the muscle patterns that are independent of temporal constraints, such as flight duration. Thus, the synergies identified from this dataset might result from the combination of a larger number of more fundamental building blocks modulated by the spatial demands of the task, such as ball arrival locations at different medio-lateral coordinates.

Our observation of the modulation of the onset time of the second synergy by the temporal demands of the task, i.e., the duration of the ball flight and the time of ball arrival at the interception point, may be contrasted with the timing modulation of the synergies for reaching a stationary target. In our previous investigation of the time-varying synergies for reaching with different movement directions and speeds (d'Avella et al., 2008), we found that a set of phasic and tonic synergies captured the muscle patterns once time-normalized to equal movement duration, and that the onset of the synergies most active in each movement direction, expressed as a fraction of movement time, was approximately constant over a broad range of movement speeds. Thus, when reaching a stationary target with a movement which could be planned in advance and whose duration did not depend on an online estimation of the arrival time, as it does for an interceptive movement, the entire muscle pattern was scaled with the movement duration and the onset time of the synergies was shifted accordingly with respect to the movement end. To directly compare the synergy timing for catching and reaching in similar conditions, we considered a subset of the data we had previously collected (d'Avella et al., 2008), selecting only the trials for reaching upward, and we averaged the muscle patterns over the trials with a movement time within three intervals around the mean movement times of the catching movements for the three ball flight time conditions of this study. We then extracted two

time-varying muscle synergies from the averaged reaching muscle patterns, aligned to movement onset and time-normalized to equal movement time, as in the original analysis, and with the baseline activity before movement onset subtracted, as in the present analysis. The fraction of data variation explained by two synergies was higher for the reaching than for the catching patterns (*R*<sup>2</sup> = 0.83 ± 0.10, mean ± SD, *n* = 5, range 0.71 – 0.92), and the onset of the second synergy was not aligned with the movement end but occurred closer to the movement end for faster movements than for slower ones, as expected for muscle patterns scaled in time with movement speed. Thus, the modulation of the recruitment timing of the time-varying muscle synergies indicated that the muscle patterns for catching movements of different durations, unlike those for reaching, were not scaled in time as a unit but consisted of two distinct temporal components: one aligned to the ball launch and movement onset, and a second aligned to the time of ball interception and thus shifted progressively later for longer movement times.

The observation of a fixed temporal relationship between muscle synergies and specific task events, ball launch and ball impact with the hand, is in accordance with, and extends, previous observations made in the context of catching balls falling vertically onto the hand (Lacquaniti and Maioli, 1989) and of balls launched frontally onto the hand (Savelsbergh et al., 1992). A fixed relationship was found between the onset of an early component of

wrist and elbow muscle activity and the time of ball release, and between the onset of a late component and the time of ball impact, when catching balls falling from different heights (Lacquaniti and Maioli, 1989). Moreover, the onset of wrist and finger muscle activity showed a fixed relationship with the time of impact for balls launched frontally with different speeds (Savelsbergh et al., 1992). However, in contrast to those studies, in our catching task the location of the interception was not specified prior to ball launch, as the ball could be intercepted anywhere along the portion of the ball trajectory inside the hand workspace, and subjects had to determine not only the appropriate time of interception but also the appropriate place. Thus, in our unconstrained and naturalistic task subjects had to move their hand to the interception location and simultaneously time the closing of the fingers on the ball. Moreover, as all trials started with the arm relaxed along the body and all ball trajectories crossed the hand workspace above the hand's initial location, subjects initiated their movement as soon as they detected the ball launch to bring the hand close to the interception region. Indeed, such a reactive component was characterized by a time-varying synergy recruiting mainly elbow flexors, shoulder flexors, and shoulder elevators. The amplitude coefficient of this synergy was modulated by ball arrival height (**Figure 7**), which was known

second synergy showed instead a weaker modulation with arrival height for subjects S2 and S5 than for the other four subjects.

by the subject after the first trial of each block. When catching balls falling vertically onto the hand, a condition in which it is not necessary to displace the hand, the early anticipatory response in elbow (but not in wrist) muscles is time-locked to the ball release and modulated in amplitude by release height, and it has been interpreted as an alertness reaction (Lacquaniti and Maioli, 1989). More generally, the initial component of the interceptive muscle pattern, captured by the first synergy, may be interpreted as the reactive release of a motor program that brings the hand to an adequate spatial location from which a second motor program can be released once the interception location and time have been determined. As the time-to-contact during the ball flight depends on the final interception position, and since the onset of the second synergy occurred at about the same time-to-contact, we can speculate that during the first part of the flight the recruitment parameters of the second synergy, activation amplitude and onset time, are selected to intercept the ball at a specific location and time along its trajectory. The selection of the appropriate synergy recruitment parameters might derive from a combination of visual information and a priori knowledge of the dynamic behavior of the ball under gravity and viscous drag forces (Zago et al., 2009). Thus, in such a synergy-based control scheme, a priori knowledge of gravity and drag on the ball may be implicitly incorporated in a direct mapping of visual information into a small number of synergy recruitment parameters. Non-invariant parameters involved in this mapping, such as those relative to drag, may be learned quickly from experience if the mapping is intrinsically low-dimensional, as it would be if the dimensionality of the visual input were also reduced by the extraction of a few spatiotemporal features.

Whether the control of interceptive movement relies on predictive or prospective control is debated (Bootsma and van Wieringen, 1990; Brenner et al., 1998; Dessing et al., 2002; Tresilian, 2005; Zago et al., 2009). According to prospective control strategies, appropriate processing of the visual information directly drives the hand toward the target, continuously updating the motor response according to an extrapolation of the trajectory based on sensory feedback rather than on an explicit representation of the trajectory or a prediction of the interception point and timing (Chapman, 1968; McLeod and Dienes, 1993; Peper et al., 1994; McBeath et al., 1995; Brenner et al., 1998; Montagne et al., 1999; Dessing et al., 2002). In contrast, according to the simplest form of predictive control strategy, a stereotyped motor response of a fixed duration is preprogrammed and triggered when the predicted time-to-contact equals the sum of the movement duration and the sensorimotor delay (Tyldesley and Whiting, 1975). However, as movement duration and velocity have been observed to depend on the parameters of the interceptive task (Bootsma and van Wieringen, 1990; Smeets and Brenner, 1995; Tresilian and Plooy, 2006), according to a more flexible predictive control strategy, movement time and the criterion for triggering the response may be adjusted (Tresilian, 2005). In general, hybrid control strategies combining predictive and prospective control, with different weights depending on the time available for the interceptive movement, may be adopted by the CNS (Tresilian, 2005; Zago et al., 2009; Katsumata and Russell, 2012).
For rapid interceptive actions, such as those required in our task, the effectiveness of continuous on-line guidance by visual feedback is limited by sensorimotor delays, and subjects might adopt an intermittent control strategy (Burdet and Milner, 1998; Gawthrop et al., 2011; Loram et al., 2011; Karniel, 2013) based on an overlapping sequence of time-varying muscle synergies (d'Avella et al., 2011b). Each synergy might be triggered at a critical time and modulated according to a combination of visual information and a priori knowledge. This control scheme is related to the biphasic preprogrammed model proposed to explain biphasic movements observed in a one-dimensional hitting task (Tresilian, 2005). In such a model, a first slow component is initiated at one critical value of time-to-contact and a second, faster component at a smaller value. In our three-dimensional interception task, however, the location of the interception is not fixed and known prior to ball launch, and the parameters of the interceptive movement cannot be fully pre-programmed. Thus, rather than a process that simply triggers a command when a visually driven time-to-contact signal reaches a critical value dependent on the duration of the preprogrammed movement component, we envision a process that triggers a muscle synergy at a critical time-to-contact value and simultaneously adjusts the synergy amplitude parameters to ensure that the hand is at the right location after that time interval.
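The proposed scheme can be caricatured in a few lines: the first synergy is released at a fixed latency after launch detection, the second at a fixed time-to-contact before impact. The numerical values below are illustrative assumptions, not fitted parameters:

```python
def trigger_times(flight_time, reaction_latency=0.1, critical_ttc=0.24):
    """Onsets (s from launch) of the two synergies: the first released
    at a fixed latency after launch detection, the second at a fixed
    time-to-contact before impact. Values are illustrative only."""
    return reaction_latency, flight_time - critical_ttc
```

Under this rule, the second synergy's onset measured from launch grows linearly with flight time (slope 1), while measured from impact it is constant, mirroring the regression results of **Figure 6**.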

In our previous study of the kinematic characteristics of the interceptive movements in this naturalistic, unconstrained catching task, we found that individuals with similar performance levels used different movement strategies (Cesqui et al., 2012). Also for the six subjects included in this study, inter-individual differences were particularly clear in terms of wrist trajectory in the sagittal plane and wrist velocity at impact (**Figures 8A,B**). As the second synergy captures mostly the second phase of the movement, we expected differences in the relative recruitment of elbow flexors and extensors. Indeed, it is apparent in **Figure 4** that in the second synergy of subjects S4 and S6, who impacted the ball with, on average, a backward velocity (negative *x* component), both elbow flexors and elbow extensors are highly recruited. In contrast, in the second synergy of subjects S1, S2, and S3, who showed hook-like trajectories and impacted the ball with low horizontal velocity, the elbow extensors are more strongly recruited than the elbow flexors. Finally, in the second synergy of subject S5, who impacted the ball with a forward directed movement, the elbow flexors were almost completely silent and the elbow extensors were highly recruited. We then quantified these observations by computing, for the second synergy, the ratio between the mean area of the activation waveforms of the elbow flexor muscles (BicLong, BicShort, BrRad) and the mean area of the activation waveforms of the elbow extensor muscles (TrLat, TrLong), as a function of the horizontal component of the wrist velocity in the sagittal plane (**Figure 8C**). The linear regression between the elbow flexors/extensors ratio and the wrist tangential velocity at impact was significant (*R*<sup>2</sup> = 0.76, *p* = 0.02). Thus, some of the individual characteristics of catching kinematics are reflected in the structure of the synergies.
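The flexor/extensor ratio used in **Figure 8C** can be sketched as follows; the muscle abbreviations follow the text, while the function name and the sampling interval are our assumptions:

```python
import numpy as np

FLEXORS = ("BicLong", "BicShort", "BrRad")
EXTENSORS = ("TrLat", "TrLong")

def flexor_extensor_ratio(waveforms, dt=0.01):
    """Ratio of the mean activation area of elbow flexors to that of
    elbow extensors in a synergy. `waveforms` maps muscle name to a
    1-D activation waveform; `dt` is an assumed sample interval."""
    area = {m: float(np.sum(w)) * dt for m, w in waveforms.items()}
    flex = np.mean([area[m] for m in FLEXORS])
    ext = np.mean([area[m] for m in EXTENSORS])
    return flex / ext
```

A ratio near 1 indicates balanced flexor/extensor recruitment (as for S4 and S6), while a ratio near 0 indicates an extensor-dominated synergy (as for S5).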
However, despite these inter-individual differences in the structure of the second synergy, the overall spatiotemporal organization of both synergies was remarkably similar across individuals (**Figure 3**). Additional inter-individual characteristics in the synergy organization can be

**FIGURE 8 | Inter-individual variability in wrist kinematics and in the structure of the second synergy.** Individual characteristics in the movement kinematics were observed in several features, such as the wrist paths in the sagittal plane, illustrated in **(A)** for condition (*T*1,*Z*1), and the horizontal component of the wrist velocity in the sagittal plane (v*<sub>x</sub>*), illustrated in **(B)** for the same condition. Subject color coding as in preceding figures; *circular markers* indicate the time of impact. **(C)** Differences across subjects in the mean activation level of elbow extensors and

found when more than two synergies are extracted from each subject. We characterized the muscle patterns with two time-varying synergies because their gross temporal organization in all subjects was biphasic (**Figure 2**) and because two synergies always captured at least 65% of the data variation. However, while in two subjects (S1 and S5) the *R*<sup>2</sup> value reached a plateau at two synergies [assessed by the detection of a "knee" in the *R*<sup>2</sup> curve, see d'Avella et al. (2006)], in the remaining four subjects (S2, S3, S4, S6) the plateau was at three synergies. In these sets of three synergies, one synergy was always similar to the first synergy of the set of two and showed the same temporal relationship with ball launch. The other two synergies were related to the second synergy in the set of two but showed additional subject-specific features. In general, one of the two remaining synergies was aligned to the impact time. The other synergy was recruited mainly for high launches and was timed differently across subjects, generally in between the other two synergies, but with variable onset alignment. Thus, subtle differences in the muscle patterns captured by sets with more than two synergies might contribute to additional individual kinematic features. However, the similarity






elbow flexors of the second synergy waveforms are related to differences in the mean horizontal wrist velocity at impact across conditions. Subjects S4 and S6, who intercepted the ball with, on average, a negative (backward) velocity, showed a high ratio, close to 1, between the elbow flexor and elbow extensor activation areas. For the subjects who impacted the ball with low velocity (S1, S2, S3) the ratio was around 0.4–0.5, while for subject S5, who impacted the ball with a large forward velocity, the ratio was near zero.

of two synergies across subjects suggests that individual kinematic strategies emerge from a rather subtle differentiation of the same basic control scheme.

In conclusion, the decomposition into time-varying muscle synergies of the activation waveforms of a large set of shoulder and elbow muscles underlying unconstrained and naturalistic hand movements for catching fast flying balls reveals an intermittent control strategy based on two phases, associated with specific synergies and timed according to two key events, ball launch and ball impact. Such a strategy may allow for a fast and efficient selection of the appropriate motor commands by incorporating a priori knowledge of the ball flight dynamics in a low-dimensional mapping of the kinematic features of the ball trajectory, extracted from vision, into synergy amplitude and timing recruitment parameters.

## **ACKNOWLEDGMENTS**

Supported by the Italian Ministry of Health, the Italian Space Agency (DCMC and CRUSOE), and the EU Seventh Framework Programme (FP7-ICT No 248311 AMARSi).







**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 04 April 2013; accepted: 21 July 2013; published online: 07 August 2013. Citation: D'Andola M, Cesqui B, Portone A, Fernandez L, Lacquaniti F and d'Avella A (2013) Spatiotemporal characteristics of muscle patterns for ball catching. Front. Comput. Neurosci. 7:107. doi: 10.3389/fncom.2013.00107 Copyright © 2013 D'Andola, Cesqui, Portone, Fernandez, Lacquaniti and d'Avella. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Interfacing sensory input with motor output: does the control architecture converge to a serial process along a single channel?

#### *Cornelis van de Kamp1 \*, Peter J. Gawthrop2,3, Henrik Gollee2, Martin Lakie4 and Ian D. Loram1*

*<sup>1</sup> School of Healthcare Science, Institute for Biomedical Research into Human Movement and Health, Manchester Metropolitan University, Manchester, UK*

*<sup>2</sup> School of Engineering, University of Glasgow, Glasgow, UK*

*<sup>3</sup> Melbourne School of Engineering, The University of Melbourne, Melbourne, VIC, Australia*

*<sup>4</sup> School of Sport and Exercise Sciences, University of Birmingham, Birmingham, UK*

### *Edited by:*

*Yuri P. Ivanenko, IRCCS Fondazione Santa Lucia, Italy*

### *Reviewed by:*

*Raz Leib, Ben-Gurion University of the Negev, Israel David Logan, University of Maryland, USA*

### *\*Correspondence:*

*Cornelis van de Kamp, School of Healthcare Science, Institute for Biomedical Research into Human Movement and Health, Manchester Metropolitan University, John Dalton Building, Chester Street, Manchester M1 5GD, UK. e-mail: c.vandekamp@mmu.ac.uk*

Modular organization in control architecture may underlie the versatility of human motor control; but the nature of the interface relating sensory input through task-selection in the space of performance variables to control actions in the space of the elemental variables is currently unknown. Our central question is whether the control architecture converges to a serial process along a single channel. In discrete reaction time experiments, psychologists have firmly associated a serial single channel hypothesis with refractoriness and response selection [psychological refractory period (PRP)]. Recently, we developed a methodology and evidence identifying refractoriness in sustained control of an external single degree-of-freedom system. We hypothesize that multi-segmental whole-body control also shows refractoriness. Eight participants controlled their whole body to ensure a head marker tracked a target as fast and accurately as possible. Analysis showed enhanced delays in response to stimuli in close temporal proximity to the preceding stimulus. Consistent with our preceding work, this evidence is incompatible with control as a linear time invariant process. This evidence is consistent with a single-channel serial ballistic process within the intermittent control paradigm with an intermittent interval of around 0.5 s. A control architecture reproducing intentional human movement control must reproduce refractoriness. Intermittent control is designed to provide computational time for an online optimization process and is appropriate for flexible adaptive control. For human motor control we suggest that parallel sensory input converges to a serial, single channel process involving planning, selection, and temporal inhibition of alternative responses prior to low dimensional motor output. Such a design could aid robots to reproduce the flexibility of human control.
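The refractoriness at the core of this argument can be illustrated with a toy single-channel serial model in which each response occupies the channel for the intermittent interval; parameter values below are illustrative assumptions only:

```python
def response_times(stimuli, delay=0.15, refractory=0.5):
    """Single-channel serial model: each response occupies the channel
    for `refractory` seconds; a stimulus arriving while the channel is
    busy must wait, enhancing its reaction time."""
    free_at, rts = 0.0, []
    for s in stimuli:
        start = max(s, free_at)        # wait if the channel is busy
        rts.append(start + delay - s)  # reaction time for this stimulus
        free_at = start + refractory
    return rts
```

A second stimulus arriving after the channel is free is answered with the baseline delay; one arriving while the channel is still busy incurs the enhanced delay described above.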

**Keywords: modularity, motor control, intermittent control, posture, redundancy**

# **INTRODUCTION**

In the everyday acts of standing and movement, humans easily generate complex multi-joint behavior. When performed in conjunction with secondary task requirements (e.g., reaching to grasp an object) the Central Nervous System (CNS) is confronted with an impressive coordination and control problem involving redundancy at many levels (e.g., sensory, biomechanical, and neuromuscular). Despite the numerous possibilities, task performance is characterized by remarkable regularity and low-dimensionality in motor output (Latash et al., 2007). Equally, performance shows "repetition without repetition" (Bernstein, 1967), which means that repetitive solutions are never the same but always vary. Open, exciting questions for researchers studying human motor behavior include: how is this abundant, robust control achieved, and how might it be replicated artificially?

In the example of human standing, multi-elemental motor outputs are defined based on an input specifying what, out of multiple task possibilities, the system has to produce as a whole (cf. Latash, 2010). This input-output coupling is faced with the "problem of selection," that is: how do accumulated task possibilities and sensory information, supplied initially through parallel channels including different modalities and sensory cells within single modalities, converge to parallel motor output? It has been suggested that at the motor level, flexibility, versatility, and adaptability in (parallel) muscle output can be achieved through modularity of the control architecture [e.g., through muscle modes, motor primitives, pattern generators, etc.; for an overview see: Latash et al. (2007), Safavynia and Ting (2012)]. What is unknown, however, is how the many-to-many convergence from parallel sensory input to parallel motor output is organized, that is, the interface of sensing and action associated with task selection.

The "problem of selection" has been studied in the context of the multiple degrees of freedom (DOFs) problem (i.e., selecting a solution from the range of possible solutions). One approach, that of motor synergies, has been defined as a neural organization that ensures a one-to-many mapping between important and inconsequential variables/quantities, providing for both stability of important performance variables and flexibility of motor patterns to deal with possible perturbations and/or secondary tasks (Latash et al., 2007). More recently, it was suggested that "all the DOFs at all levels always participate in all the tasks." This hypothesis, called the "principle of abundance" (Latash, 2012), predicts both stability and flexibility in performance. According to this principle, the CNS facilitates "families of solutions," each of which is able to solve a multiple DOF problem. These solutions emerge from the interplay between the state of the system "body + environment" and the (task) imposed constraints (cf. Hu and Newell, 2011). This means that at any level of the sensori-motor control system, behavior is defined by the laws of physics (cf. Latash, 2012). However, a fundamental question remains unanswered: what and where is the process by which tasks are selected and by which these families of solutions are reduced to unique actualizations at temporal instances, in other words, the process of selection (Stepp and Turvey, 2010; Latash, 2012)?

In tasks such as human standing, the neuro-muscular-skeletal system uses sensory information to regulate its motor output. In control engineering terms this means that the feedback loop between the control system's inputs and outputs is closed. Redundancy implies that the motor system generates parallel possible goal-solutions (which include alternative equivalent motor solutions to the overall task goal) from parallel sensory input (Cisek, 2005) and that this mapping is many to one. Thus, in a full model, the interface between (all) sensory input and modular motor output should hypothetically include the processes of parallel goal-solution generation and convergence to an instantaneously unique goal-solution prior to motor output.

Within the generalized optimal control models of biological control (Li et al., 2004; Todorov, 2004; Todorov et al., 2005; Lockhart and Ting, 2007; Karniel, 2011, 2013; Safavynia and Ting, 2012), the process for solving the redundancy problem lies outside the low level feedback control loop: it lies within a response planner which provides (continuous) settings for the continuous feedback control loop. Thus, a single optimal solution is provided from many possibilities (Rosenbaum et al., 1995; Todorov et al., 2005). Although such a system is not invertible, causing a problem for control methods that use an inverse model to generate appropriate motor commands from desired movement outcomes, optimal control provides a unique solution to the control of multi-input systems (Goodwin et al., 2001). However, this scheme usually does not provide a full model for the interface between sensory input and modular motor output because: (1) the task level parameters, including the goal, the cost function, and the mapping between high level and low level variables, are usually preselected, and (2) the processes of parallel goal-solution generation and choice-selection are usually omitted. According to the generalized optimal control models (Todorov, 2004; Lockhart and Ting, 2007; Safavynia and Ting, 2012), the CNS monitors a (small) number of task parameters and, using low-level controllers, performs continuous feedback of plant output according to the task goals and optimization constraints set by high-level commands. Although this popular framework transcends the "classic robotics" approach, in which trajectories are planned in joint space and subsequently executed using servo control governing joint torque, even if correct, it leaves open the question of the process by which the task and optimality criteria are selected.

Recent demonstrations have shown that even for the control of external second order unstable systems<sup>1</sup>, continuous feedback control is not necessary, and that intermittent control has inherent advantages for control and adaptability (Loram et al., 2011). Evidence from sustained manual control of an external single degree of freedom system (Loram et al., 2012; van de Kamp et al., 2013) showed that it is unlikely that the entire sensory-motor pipeline is implemented in parallel as a continuous linear time invariant process. Rather, the evidence is highly consistent with a limiting serial process along a single channel, which is expressed formally in the intermittent control paradigm illustrated in **Figure 1** (Gawthrop and Wang, 2011; Gawthrop et al., 2011; Gawthrop and Gollee, 2012; Loram et al., 2012). The hypothesis of a limiting serial, single channel process is supported by extensive studies in psychology showing refractoriness in double stimulus experiments. This effect is known as the Psychological Refractory Period (PRP) (Telford, 1931; Pashler et al., 1998). Refractoriness refers to the temporal duration for which control responses cannot be, or are not, modified following their initiation (Vince, 1948; Pashler et al., 1998; Gawthrop et al., 2011; Loram et al., 2012; van de Kamp et al., 2013). Extensive experimentation has firmly associated the PRP both with the Single Channel Hypothesis (Smith, 1967) and with response planning/selection within a three-stage decomposition into sensory analysis (SA), response planning/selection (RP/S), and response execution (RE). While SA and RE operate through parallel channels, only RP/S converges to a serial process along a single channel in these experimental conditions (Pashler et al., 1998).

In the current study two competing hypotheses will be tested. The multi-channel mapping hypothesis predicts continuous parallel processing that is free from refractoriness. The alternative single channel hypothesis postulates a limiting serial process along a single channel associated with the Psychological Refractory Period (see **Figure 1**).
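The contrasting predictions can be illustrated with a minimal numerical sketch. This is our illustration, not the authors' model; the parameter values (feedback time delay 0.25 s, minimum open-loop interval 0.55 s) are taken from the Figure 6 caption:

```python
# Sketch of the two competing predictions for RT2 as a function of ISI.
# Assumed parameters (from the Figure 6 caption): feedback time delay
# td = 0.25 s, minimum open-loop interval d_ol = 0.55 s.

def rt2_parallel(isi, td=0.25):
    """Multi-channel hypothesis: responses to the two steps are planned
    in parallel, so the delay to the second step never depends on ISI."""
    return td

def rt2_single_channel(isi, td=0.25, d_ol=0.55):
    """Single channel hypothesis: planning of the second response cannot
    begin until d_ol has elapsed since the first response was planned,
    producing refractoriness (slope -1) for ISI < d_ol."""
    earliest_start = td + d_ol        # first plan at td, channel busy for d_ol
    return max(isi + td, earliest_start) - isi

for isi in (0.2, 0.3, 0.5, 0.8, 1.4, 2.4, 4.0):
    print(isi, rt2_parallel(isi), round(rt2_single_channel(isi), 2))
```

Under the single channel sketch, RT2 falls with slope −1 until the ISI exceeds the open-loop interval, after which both hypotheses predict the same ISI-independent delay.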

Here we aim at connecting two bodies of literature: the first concerns the PRP, the second concerns modularity in motor control. We ask a novel theoretical and experimental question: namely, in the control of whole body movements, is the single channel hypothesis (characterized by a PRP) relevant for the modular organization of the motor control system? In brief, the rationale behind this question is: (1) multi-segmental control is subject to redundancy, thus the process of selection is relevant; (2) for a flexible yet integrated multi-segmental structure, the organization of selection should converge on a single or coordinated number of task goals within the main feedback loop; and (3) response selection and planning is experimentally associated with refractoriness and the single channel hypothesis.

<sup>1</sup>The external second order unstable system can be thought of as an average standing human behaving as an unstable second order inverted pendulum with a time constant of ∼0.9 s.

**FIGURE 1 | General model of intermittent control.** The intermittent predictive controller includes continuous control as a special case, but generally the predicted system state is only used intermittently to update the time varying control signal sent from the generalized "hold" to the actuator. "Trig." detects when the control trajectory is to be updated; this event trigger requires three conditions: (1) a single event must be detected (i.e., all events within the sampling delay (Δ<sub>S</sub>) are considered as one), (2) a minimum open-loop interval (Δ<sub>OL</sub>) must have elapsed since the previous event, and (3) an error signal must exceed a threshold. Scalar signals are represented by solid lines, vector signals by dashed lines. The participant's neuro-muscular dynamics are modeled as linear in the "NMS" block with input *u*(*t*). The linear external controlled system with output *y*(*t*) (represented by the "System" block) is driven by signals *u*e(*t*) and *d*(*t*), representing the externally observed control signal and the disturbance signal. The state of the composite "NMS" and "System" blocks is estimated as *x*o(*t*) by the "observer" block. Sampling is preceded by an anti-aliasing low-pass filter "LP" of the subtracted set point disturbance *w*(*t*) and subject to an event delay Δ<sub>S</sub> between event and sampling. The trigger for the sampling times *t*i is provided by the event detector block labeled "trig." Sampling of *x*w(*t*) takes place at discrete times *t*i. Sampled signals (represented by the dotted lines) are defined only at the sample instants *t*i. The future state error *x*p(*t*i) is provided by the "predictor" block. The various delays in the human controller are accounted for by a pure time delay *t*d represented by the "delay" block. The block labeled "hold" is a system-matched hold that provides the delayed version of the continuous-time signal that is multiplied by the feedback gain vector *k* (block "State FB") to give the feedback control signal *u*(*t*). This figure and its caption are reproduced with permission from Gawthrop et al. (2011).

To summarize: we study control of the whole body to move the end effector (the head) in accordance with a tracking target. Although the tracking task has one degree of freedom (movement of the head marker in the Anterior-Posterior direction), all the relevant joints, even when locked, have to be controlled appropriately, and thus the task involves redundancy. We use our recently developed method to identify refractoriness in sustained control tasks and to discriminate intermittent (serial ballistic) from continuous (parallel) control (Loram et al., 2012; van de Kamp et al., 2013). We ask:


• Is there a plausible rationale for why biological control should converge to a single channel?

# **METHODS**

### **ETHICAL APPROVAL**

The experiments reported in this study were approved by the Academic Ethics Committee of the Faculty of Science and Engineering, Manchester Metropolitan University and conform to the Declaration of Helsinki. Participants gave written, informed consent to the experiment.

### **PROCEDURE, APPARATUS, AND MEASUREMENT**

The experimental setup is illustrated in **Figure 2**. Eight healthy subjects (6 male, 2 female), aged between 27 and 59 years, received real-time visual feedback about the Anterior-Posterior (AP) position of a VICON marker placed on the participant's head whilst pursuing a double stimulus tracking sequence with varying Inter Stimulus Intervals (ISIs: 0.2, 0.3, 0.5, 0.8, 1.4, 2.4, and 4 s). Feedback was displayed on a 42-inch TV screen mounted on a trolley positioned 1 m in front of the participant. The visual scene contained two (3 cm, green and magenta) spheres (moving up and down alongside the vertical mid line of the screen) and was constructed using the Simulink 3D Animation Toolbox (a 1 cm movement of the VICON marker in the AP direction corresponded to a 2.5 cm movement of the magenta sphere in the superior-inferior direction on the TV screen). Using Vicon's SDK we developed C++ code to stream (UDP protocol) marker data to the Simulink model, which was compiled using Real-Time Workshop and executed on a laptop using Real-Time Windows Target within MATLAB v7 (MathWorks) at a sample rate of 1000 samples per second. We informed the participants that every now and then the green target would jump up or down the screen, and that it should be pursued by controlling the magenta sphere, which represented antero-posterior head position. Participants were told to keep their feet in the initial position, to track the green target position by swaying their body forward and/or backwards as quickly and accurately as possible, and that, as a measure of performance, we would look at the deviation between target position and head position (i.e., the green and magenta spheres on the screen). In a randomized order, the stimuli with seven different ISIs (see above) were displayed four times. Following van de Kamp et al. (2013), the tracking target step sequence was designed such that participants were unable to anticipate the timing, direction, or amplitude of step changes in target position (**Figure 2**). Unpredictability of the direction of the double step stimuli was achieved by varying the direction of the 2 cm step in target position (up-down, down-up, up-up-down to center, down-down-up to center). By varying the ISI and recovery time, temporal predictability was also eliminated.
Based on previous experiments (Loram et al., 2012; van de Kamp et al., 2013), we estimated that when stabilizing posture, a random 4–5 s period would be sufficient to recover from a step response (participants kept tracking the target, which was then in the neutral/middle position). To serve as an independent base measure, the two longest ISIs were chosen in the vicinity of the recovery period. The remaining five ISIs were chosen to span the range from less than to more than the expected refractory duration based on previously published work (Loram et al., 2012; van de Kamp et al., 2013). Participants confirmed that the delay between marker movement and its presentation on the screen (∼100 ms) was not detectable. After familiarization with the intuitive control task and some practice, all participants were ready to take part. Trial duration was approximately 4 min.

### **METHOD OF ANALYSIS**

Here we applied our published three-stage method of analysis (Loram et al., 2012) to the set-point (target) and response (AP head position) signals. Details of the method of analysis are given more fully in previous work (Loram et al., 2012; van de Kamp et al., 2013) and are restricted here to the minimum necessary.

# *Stage 1: reconstruction of the set-point*

For each first and second step, we estimated the time delay (i.e., RT1 and RT2) between the step in target position and the subsequent whole-body movement response (see **Figure 3**). This was achieved by modeling the closed loop relationship between the target (step sequence) and response (head position) as a low order, zero delay, autoregressive moving average (ARMA) process (for details see: Loram et al., 2012; van de Kamp et al., 2013). Next, the step sequence was reconstructed by sequentially and individually adjusting the instant of each step. This procedure, optimizing the fit of the ARMA model, was done in two consecutive ways: (1) reconstruction of the step sequence using a single equal adjustment of the instant of all steps, in effect determining the time delay of the ARMA model in **Figure 3**, and (2) reconstruction of the step sequence allowing individual adjustment of the instant of each step (i.e., the optimized ARMA in **Figure 3**). By optimizing the delay to each step, this "set-point reconstruction" procedure provides a distribution of response delays to first and second steps (see RT1 and RT2 in **Figure 3**). The statistical analysis of these delays in the second stage enables testing for refractoriness.

**FIGURE 3 | Representative responses; reconstruction of the set-point (stage 1).** The figure shows two representative examples of a whole body response to a double-step disturbance over time (black solid lines). The first response is free of interference; the second response shows interference between the responses to the second and first stimulus. The dashed line (red) shows the time-invariant optimized ARMA fit corresponding to the original/actual double step stimulus (cyan dashed line). The dotted line (cyan) shows the best fitting ARMA model corresponding to the non-time-invariant optimized step sequence. Estimates of the first (RT1, blue horizontal bar) and second (RT2, green horizontal bar) delays hover above, and span, the interval between the actual and optimized step sequences.
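The logic of Stage 1 can be sketched as follows. This is an illustrative toy version, not the authors' implementation: a first-order lag stands in for the low order ARMA model, and a grid search over a common shift of the step instants stands in for the fit optimization; the best-fitting shift estimates the response delay.

```python
# Toy sketch of "set-point reconstruction": shift the step instants of a
# simple linear response model until the fit to the measured response is
# best; the optimal shift estimates the response delay.
import numpy as np

def simulate(step_times, amps, delay, tau=0.3, dt=0.005, T=6.0):
    """First-order lag response to a delayed step sequence (a stand-in
    for the low order ARMA model used in the paper)."""
    t = np.arange(0.0, T, dt)
    u = np.zeros_like(t)
    for st, a in zip(step_times, amps):
        u[t >= st + delay] += a          # delayed step input
    y = np.zeros_like(t)
    for i in range(1, len(t)):
        y[i] = y[i - 1] + dt * (u[i - 1] - y[i - 1]) / tau
    return y

# "Measured" double-step response with a true delay of 0.35 s
y_meas = simulate([1.0, 1.5], [2.0, -2.0], delay=0.35)

# Grid search over candidate delays, minimising the squared error
delays = np.arange(0.0, 0.8, 0.01)
errs = [np.sum((simulate([1.0, 1.5], [2.0, -2.0], d) - y_meas) ** 2)
        for d in delays]
best = float(delays[int(np.argmin(errs))])
print(round(best, 2))  # 0.35, the true delay
```

In the paper's second pass each step instant is adjusted individually, which yields separate RT1 and RT2 estimates per step pair rather than one common delay.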

# *Stage 2: statistical analysis of RT1s and RT2s*

To compare the distributions of RT1 and RT2 over ISIs (levels 1 through 7), a two factor (Stimulus Number and ISI) repeated measures ANOVA design was used. To evaluate significant main and interaction effects, *post-hoc* ANOVAs were run. To maximize statistical power, bi-directional and unidirectional step-pairs were analysed as one group.

The two stages outlined above allow us to test the following null hypotheses:


If these hypotheses are rejected, the following tests provide evidence discriminating against continuous control and quantifying the extent of refractoriness in this whole body movement task.


# *Stage 3: model based interpretation of delays*

If, in Stage 2, we find evidence of refractoriness favoring the alternative hypothesis that the single channel/IC model applies to multi-segment control of movement, the following tests would reveal its open-loop interval:

(v) Repeat the regression method explained in (iv), assuming a least mean squares fit with the slope constrained to −1 [i.e., if a response is triggered by the first step, a slope of −1 is predicted in the relationship between average RT2 and ISI for ISI < open-loop interval (Pashler et al., 1998; Gawthrop et al., 2011)].
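With the slope fixed at −1, this least mean squares fit has a closed form: the intercept is simply the mean of RT2 + ISI. A sketch with hypothetical data (the RT2 values below are ours, for illustration only):

```python
# Least mean squares fit of rt2 = c - isi with the slope fixed at -1.
# Minimising sum((rt2 - (c - isi))**2) over c gives c = mean(rt2 + isi).
import numpy as np

def fit_slope_minus_one(isi, rt2):
    """Return the intercept c of the constrained fit rt2 = c - isi."""
    isi = np.asarray(isi, dtype=float)
    rt2 = np.asarray(rt2, dtype=float)
    return float(np.mean(rt2 + isi))

# Hypothetical interfered RT2s (seconds) lying near a -1 line
isi = [0.2, 0.3, 0.5]
rt2 = [0.61, 0.52, 0.29]
print(round(fit_slope_minus_one(isi, rt2), 3))  # 0.807
```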

# **RESULTS**

### **REPRESENTATIVE PURSUIT TRACKING RESPONSES TO DOUBLE STEP STIMULI WHILE STABILIZING POSTURE; RECONSTRUCTION OF THE SET-POINT (STAGE 1)**

**Figure 3** shows the comparison between a response pair without interference followed by a response pair that shows evidence of interference (i.e., the response to the second stimulus is delayed owing to the close temporal proximity of the first stimulus). When the ISI is relatively long (see the 2.4 s example in **Figure 3**), the response delay to the second step is not elongated relative to the first response. However, with a small ISI (see the 0.2 s example in **Figure 3**), RT2 is clearly elongated compared to RT1. This observation is quantified by means of the "set-point reconstructed" ARMA procedure. That is, if in **Figure 3** we overlay the participant's whole body response (solid black), the ARMA prediction (dashed red), and the "set-point reconstructed" ARMA prediction (dotted cyan), we see that reconstructing the set-point results in a better (ARMA) description of the data. If we then compare the delays identified in this first stage of the method of analysis (i.e., the blue and green bars in **Figure 3**, displayed over the interval between the actual and optimized steps), we see that for the small ISI, RT2 is elongated relative to RT1.

### **STATISTICAL ANALYSIS (STAGE 2): GROUP RESULTS**

**Figure 4** shows that the average 5–95% range in RT was systematically affected by Step Number. The mean range in RT was significantly higher for step 2 than for step 1 [693 ± 77 ms vs. 497 ± 75 ms; *F*(1, 7) = 48.8, *p* < 0.0005].

The mean RT (see box plots in **Figure 5**) was significantly higher for step 2 than for step 1 [431 ± 130 ms vs. 357 ± 95 ms; *F*(1, 7) = 14.2, *p* < 0.01]. Combining RT1s and RT2s showed a significant increase in RT with decreasing ISIs [406 ± 93 ms, 488 ± 135 ms, 397 ± 149 ms, 403 ± 117 ms, 339 ± 103 ms, 349 ± 79 ms, 375 ± 103 ms; *F*(6, 42) = 3.83, *p* < 0.05]. The significant interaction effect between Step Number and ISI [*F*(6, 42) = 3.19, *p* < 0.05] indicates that reducing the ISI had different effects on RT1 than on RT2. Two separate *post-hoc* tests conducted to break down the interaction showed a significant effect of ISI on the RT2s [*F*(6, 42) = 4.53, *p* < 0.05], but not on the RT1s.

Refractoriness was quantified in three ways. The first metric, as shown in **Figure 5**, revealed that RT2 was increased relative to RT1 for ISIs up to 500 ms (see **Figure 5** for *p*-values of the planned comparison of Step Number at each ISI). The second metric showed a mean increase in RT2 of 145 ms [subtracting the average RT1 (355 ms) from the intercept (500 ms) of the regression line of mean interfered RT2s over ISIs (cf. **Figure 5**)].

### **SINGLE CHANNEL INTERPRETATION OF RT INTERFERENCE (STAGE 3)**

The red line in **Figure 5** (as discussed in Loram et al., 2012; van de Kamp et al., 2013) represents the mean RT2s in accordance with the Single Channel interpretation of the externally triggered Intermittent Control model in **Figure 6B**. The intercept of the red line (820 ms) in **Figure 5** minus the baseline of the refractory duration (i.e., the average RT1, 355 ms) provided a third metric (465 ms) for the refractory duration. As we showed in van de Kamp et al. (2013), this third metric corresponded closely to the first metric (compare 465 with 500 ms), both exceeding the second, unconstrained linear regression metric (i.e., 145 ms).

**FIGURE 5 | Group results: mean delays (stage 2).** The figure shows the inter-participant mean RT1 (blue) and RT2 (green) against ISI, combined across the eight participants. The *p*-values of the ANOVA's *post-hoc* test are displayed above each ISI level (black if <0.05, gray if not). The blue dotted line shows the mean RT1; the dashed green line shows the unconstrained linear regression fit between (interfered) RT2 and ISI. The red line is a linear interpretation of the single-channel hypothesis (note that the steps between ISI levels do not necessarily increase linearly).
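The arithmetic behind the three metrics (all values taken from the Results text) is:

```python
# Arithmetic behind the three refractoriness metrics reported here.
mean_rt1 = 355                    # ms, average delay to the first step
metric1 = 500                     # ms, largest ISI at which RT2 > RT1 (Figure 5)
intercept_unconstrained = 500     # ms, unconstrained RT2-over-ISI regression
intercept_single_channel = 820    # ms, red (single channel) line in Figure 5
metric2 = intercept_unconstrained - mean_rt1
metric3 = intercept_single_channel - mean_rt1
print(metric2, metric3)  # 145 465
```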

# **DISCUSSION**

### **SUMMARY OF RESULTS**

In this study we hypothesized that refractoriness is relevant to the coordination of multi-segmental movement. Eight participants controlled their whole body so that a head marker tracked a target as quickly and accurately as possible. The following results were shown unambiguously.

• Refractoriness was present in whole body movement. Delays in response to the first step (RT1) were independent of the inter-step interval (ISI). Delays to the second step (RT2) depended on ISI and were greater than RT1 for ISIs less than and including 500 ms.

**FIGURE 6 | Model based interpretation (stage 3).** Parameter variants of the generalized IC model of **Figure 1** showing several possible relationships between RT2 and inter-step interval (ISI) indicative of serial ballistic (intermittent) and continuous control behavior. The simulated system is zero order. The open-loop interval (Δ<sub>OL</sub>) is 0.55 s and the feedback time delay (*t*d) is 0.25 s. Four models are shown: **(A)** continuous LTI (Δ<sub>OL</sub> = 0), **(B)** externally-triggered intermittent control with a prediction error threshold, **(C)** internally-triggered intermittent control (with zero prediction error threshold, triggered to saturation), and **(D)** externally-triggered intermittent control supplemented with a sampling delay of 0.25 s, which is associated with the ISI at the maximum delay for RT2. The joined green circles represent the theoretical delays as a function of ISI, which are confirmed by the model simulations (blue dots). This figure and its caption are based on van de Kamp et al. (2013).


This paper concerns evidence for refractoriness, the relevance of refractoriness for modularity in motor control, and the possible rationale for a serial process along a single channel. Following the facts established here and previously (van de Kamp et al., 2013) we discuss the following issues:


# *Is this evidence of refractoriness consistent with a serial process along a single channel?*

The evidence for refractoriness in this whole body movement task is clear, even with a relatively small sample of trials and participants. This evidence of increased delays for RT2 at low ISI is consistent with sustained manual control of an external, second order, single degree of freedom system (van de Kamp et al., 2013). The similarity includes the evidence that the increased delay for RT2 vs. RT1 is reduced at the smallest ISI. The current results demonstrate refractoriness in sustained movement control of the whole body. This result extends the relevance of refractoriness in sustained control beyond manual tracking, where it might be argued that control of the hand is more refined and specialized than control of the postural muscles in the legs and trunk. It also extends the relevance of refractoriness beyond control of a uniaxial joystick, the task in our previous work (Loram et al., 2011, 2012; van de Kamp et al., 2013), because this whole body task requires coordinated control of multiple kinematic segments. Our results appear to contradict the currently prevailing hypotheses of optimal feedback control, in which feedback proceeds continuously along low level feedback loops whose goal and strategy are preset. Thus, the interpretation we have made previously (van de Kamp et al., 2013) applies also to this task of moving the whole body to control head position, and the reader is referred to that discussion. The key point is that refractoriness is not compatible with a continuous, time invariant, linear process and that, because the system is refractory, redundant, time varying with sensory delay, and contains many DOFs, a serial process along a single channel is relevant to its control. The reader will probably not be surprised that human motor control is non-linear.
However, the finding of refractoriness, a systematic time variance in which responses show increased delay when they follow a closely preceding response, points to a modular element or process in the human motor control architecture that authors usually neglect.

Does the evidence of refractoriness imply a serial process along a single channel? If RT2 were linearly related to ISI with a gradient of −1 then, as clearly articulated by Pashler and Johnston (1998), that would be consistent with a serial process along a single channel in which a second process cannot start until the first has completed. Within a control system, this idea is represented within an intermittent control paradigm in which feedback cannot be applied until a minimum open loop interval has elapsed (**Figure 1**) (Gawthrop and Wang, 2009, 2011; Gawthrop et al., 2011; Gawthrop and Gollee, 2012; Loram et al., 2012). The minimum open loop interval, or intermittent interval as it is also called, is an implementation of the single channel PRP as expressed by Craik (Vince, 1948) and Pashler (Pashler et al., 1998). Within this event driven intermittent control paradigm (**Figures 1**, **7**), elapse of the minimum open loop interval is one of the two necessary conditions for triggering a feedback-informed control trajectory. This ensures that control proceeds serially, as a sequence of control actions that are constrained to be ballistic for at least the minimum open loop interval. This serial process along a single channel, and the associated refractoriness, is not represented by state related switching, in which a state dependent error signal crosses a threshold. State related triggering is advocated by some authors (Asai et al., 2009; Suzuki et al., 2012) and is represented in the second triggering condition of our intermittent control paradigm (**Figure 1**).
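The two triggering conditions just described can be sketched in a few lines (assumed parameter names and illustrative values, not the authors' implementation; the full model in Figure 1 additionally merges events arriving within the sampling delay):

```python
# Sketch of the event trigger: a new ballistic control trajectory is
# released only if (1) the minimum open-loop interval has elapsed since
# the last trigger and (2) the prediction error exceeds a threshold.
class IntermittentTrigger:
    def __init__(self, open_loop_interval=0.55, threshold=0.1):
        self.d_ol = open_loop_interval
        self.threshold = threshold
        self.last_trigger = float("-inf")

    def step(self, t, error):
        """Return True if a new control trajectory is triggered at time t."""
        if t - self.last_trigger < self.d_ol:
            return False          # channel still busy: refractory
        if abs(error) <= self.threshold:
            return False          # prediction error too small to act on
        self.last_trigger = t
        return True

trig = IntermittentTrigger()
# Two target steps 0.2 s apart: the response to the second is blocked
# until the open-loop interval has elapsed, i.e. refractoriness.
print([trig.step(t, e) for t, e in [(0.0, 1.0), (0.2, 1.0), (0.6, 1.0)]])
# [True, False, True]
```

Setting `threshold` to zero makes the trigger fire at the maximal rate, the internally-triggered variant of **Figure 6C**.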

The purpose of the model simulations in **Figure 6** is to show that: (1) a continuous model does not show refractoriness; (2) a gradient of RT2 vs. ISI shallower than −1 is compatible with the single channel hypothesis; and (3) a reduced RT2 at the lowest ISI (i.e., departure from the red line in **Figure 5**) is compatible with the IC model once a sampling delay is included. As illustrated in **Figure 6**, our model simulations show that the largest ISI at which RT2 is significantly larger than RT1 gives the open loop interval, and that the ISI at which RT2 is largest indicates the sampling delay.

Departure at small ISIs from the linear relationship between RT2 and ISI, or from a gradient of −1, does not invalidate the single channel hypothesis (**Figure 6**). For this explicitly single channel model (**Figures 1**, **7**), when one event is triggered for each target step following the minimum open loop interval, the gradient is −1 (**Figure 6B**). If additional events are subsequently triggered by an error signal crossing a threshold (e.g., due to increased noise), the slope will be shallower than −1. Setting the event threshold to zero, so that events trigger internally at the maximal possible rate, results in a gradient of −0.5 and a range of reaction times equal to the intermittent interval (**Figure 6C**). If the applied noise is high enough, the relationship between RT2 and ISI is not defined by the IC model and the slope is zero (Loram et al., 2012). Supplementing the intermittent control model with low pass filtering of the set-point and a sampling delay (i.e., the delay between the event and the sampling instant, cf. **Figure 1**) leads to RT2 decreasing as ISI decreases, resulting in a peak in RT2 at an ISI equal to the sampling delay (**Figure 6D**). This feature reproduces the amplitude transition function (ATF) observed by Barrett and Glencross (1988a,b), in which participants combine their responses to the first and second step stimuli at small ISIs. Depending on the parameter settings for noise levels, event thresholds, sampling delays, and low pass filtering, varying relationships between RT2 and ISI can be simulated consistent with the single channel hypothesis. The key evidence supporting a serial process along a single channel is the evidence of refractoriness. The decrease in RT2 with decreasing ISI at the lowest inter-stimulus intervals may indicate incomplete convergence to a single channel at those intervals (Resulaj et al., 2009).

# *What is the relationship between intermittent control and modularity of motor output?*

The existence and nature of the modules in the control architecture is far from settled. For instance, regularity and low-dimensionality in the motor output are often taken as an indication of modularity, but could they simply be a by-product of optimization and task constraints? Moreover, what are the relationships between modules at different levels, such as muscle synergies and basic action concepts?

Our data for this whole body tracking task and for the visuo-manual tasks that we have studied (cf. van de Kamp et al., 2013) show refractoriness compatible with a single channel hypothesis as embodied in our intermittent control model (**Figure 1**). From psychology, refractoriness of this kind has been demonstrated to be associated with response selection according to a single channel hypothesis (Pashler et al., 1998). Biological systems are characterized by redundancy at multiple levels: many joint configurations can produce the same end effector location, many actuator/muscle activation patterns can produce the same joint torque, and many control/neural activation patterns and pathways can produce the same actuator/muscle activation. Our data, combined with the evidence from psychology, lead us to propose that a process of selecting one movement alternative from the many possible occurs within the feedback loop that regulates this head tracking task (cf. **Figure 7**).

Selection of a task in the space of performance variables and its translation to elemental variables seems common to many schemes. To aid discussion of this question, we show in **Figure 7** two main schemes. In the first scheme (see **Figures 7A,B**), the redundancy problem is solved outside the main feedback loop, within a planner. The planner provides hierarchically and temporally prior settings to the main feedback loop, which is continuous and parallel in nature. We regard this scheme as broadly representing the prevailing idea of biological control discussed by many authors within the continuous optimal feedback paradigm (e.g., Li et al., 2004; Todorov, 2004; Todorov et al., 2005; Lockhart and Ting, 2007; Karniel, 2011, 2013; Pruszynski and Scott, 2012; Safavynia and Ting, 2012). It has been argued, however, that this scheme is un-biological (Cisek, 2005). In the second, alternative scheme (see **Figures 1**, **7C,D**), we propose a new hypothesis: that the redundancy problem is solved within the feedback loop, in a refractory response selector. The refractory response selector continuously observes multiple sensory inputs and multiple possible task-goal choices, and converges the redundant possibilities into a single output which is communicated intermittently to the response execution process. The response execution process translates the single output synergistically to the multiple muscles according to its current parameter settings. Once these are selected, the underlying control actions in the space of the elemental variables can be achieved by lower order mechanisms such as pattern generators, muscle modes, synergies, or optimal feedback systems. This second scheme generalizes the intermittent control hypothesis presented in **Figure 1**.

**FIGURE 7 |** (caption fragment) ...ordering the selected strategy to be employed continuously via the low level feedback mechanism. This feedback loop consists of the "Controller" enclosing the continuous stages of sensory analysis (SA) and response execution (RE). Panel **B** shows a particular version of panel **A**: a standard engineering controller with observer and state FB blocks. The command signal serves as a single input to the "Plant," whose neuro-muscular system (NMS; panel **E**) synergistically (e.g., via pattern generators, muscle modes, synergies, or optimal feedback systems) translates it to the multiple muscles according to its current parameter settings. Once these are routed, the underlying muscle forces actuate...

# *Is there a possible rationale for why biological control should converge to a serial process along a single channel?*

Exploitation of redundancy is biologically important. Biological systems generally exploit redundancy to improve robustness and flexibility of control (Karniel, 2011). The exploitation of all available DOFs to maximize performance and flexibility is generally a sign of skill and learning (Bernstein, 1967, page 107–108), whereas the elimination of available DOFs is usually a symptom the "refractory Response Planner" from **Figure 1**) forms the intermediate stage between sensory analysis (SA) and response execution (RE); an online process of selecting one movement alternative from the many possible which occurs within the feedback loop that regulates the task. Panel **D** shows a particular version of panel **C** based on the authors' implementation of intermittent control in engineering terms (Gawthrop et al., 2011, **Figure 2**). Panel **E** shows a particular hierarchical representation of the NMS block of panels **A**–**D** where *k* is the feedback gain, Sigma generates a weighted sum of muscle forces and x\_ss synergistically allocates the desired forces to each muscle.

of declining ability through age (Hsu et al., 2012), disease (Oude Nijhuis et al., 2008; Pasman et al., 2011), or fear (Adkin et al., 2002).
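The NMS block described in the Figure 7 caption (feedback gain *k*, a weighted sum Sigma of muscle forces, and a synergistic allocation x\_ss) can be sketched in a few lines. The gain and the synergy weights below are illustrative assumptions:

```python
# Sketch of the NMS block from the Figure 7 caption: a scalar command is
# scaled by the feedback gain k and distributed to several muscles by a
# fixed synergy weight vector (the x_ss allocation); Sigma recovers the
# net force as a weighted sum. Gain and weights are illustrative.

k = 1.5                    # feedback gain (illustrative)
w = [0.5, 0.3, 0.2]        # synergy weights, summing to 1

def nms(command):
    desired_force = k * command                   # single-channel command -> net force
    muscle_forces = [wi * desired_force for wi in w]  # x_ss: synergistic allocation
    net = sum(muscle_forces)                      # Sigma: weighted sum of muscle forces
    return muscle_forces, net

forces, net = nms(2.0)
print(forces, net)  # net equals k * command because the weights sum to 1
```

The point of the allocation step is that redundancy is resolved once, by the fixed weights, so the higher-level controller only ever deals with a single command variable.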

Utilization of redundant possibilities requires selecting one possibility from many at any instant. Fundamentally, the process is one of convergence and, as stated by Cisek (2005), "the processes of motor planning appear to be inextricably entwined in the processes of decision-making." Important questions are whether convergence should narrow to a single channel and whether convergence should lie within or outside the main feedback loop. For example, choice might be exercised once, in selecting one strategy which is then employed continuously via low-level feedback mechanisms, or choice might be exercised online during execution of the task, in mechanisms which allow or require choice to be exercised iteratively within the feedback loop between sensory input and motor output. We see two relevant issues, namely compatibility and optimization of coordination, and rate of implementation of selection.

*Compatibility and optimization of coordination.* Some human tasks are incompatible. We cannot point and make a fist; we cannot flex our knees while at the same time extending them; we cannot walk and stand still simultaneously. Some tasks are partially compatible, for example walking and pointing. Avoiding simultaneous engagement in mutually incompatible tasks is obviously essential. More subtly, the selection of compatible routines and the suppression of routines which are partially incompatible or merely inappropriate must underlie skilled and economical task performance. When we perform compatible tasks simultaneously, such tasks need to be fully integrated to prevent mutual interference. One way to do this is to select sequentially a compatible family of lower level modules such as pattern generators, muscle modes, and synergies which translate the command to control actions. By using a single channel, only one such compatible family is selected at one time and all others are held off. For this experiment, staying upright and tracking a dot might be two independent compatible tasks, two independent incompatible tasks, or one coordinated task. Our reasoning is that optimization of coordination of two tasks (dot tracking and staying upright) by eliminating mutual interference becomes, in effect, the same thing as controlling a single task in the task-space. Hence we offer the rationale that optimization of coordination leads to unification of control into a single synergy requiring a single channel for its selection.

*Rate of implementation.* We suggest that maintaining the response selection process within the feedback loop maximizes the rate at which response selection can be translated to motor output as an open loop process. The alternative of placing the response selection process temporally or hierarchically outside the feedback loop reduces the rate of translating response selection to motor output, because it imposes the closed loop dynamics of the feedback control loop onto the translation between response selection and motor output (cf. **Figure 7**).

To summarize, until the point of selection, competitive commands could be prepared in parallel as envisaged in models of decision making (Cisek, 2005; Sinha et al., 2006; Carpenter et al., 2009; Noorani et al., 2011). After selection, competitors should be suppressed (Neumann, 1996). The duration of the suppression should be sufficient for the command to be executed without interference. Convergence of parallel input to a sequential, serial process along a single channel seems the ideal solution to maximize optimal, coordinated function. The serial process involves planning, selection and temporary inhibition of competing responses prior to low dimensional motor output. The serial, ballistic nature of the process removes the obligation of closed loop dynamics from response generation. The consequences are intermittent control and refractoriness. Because all input is squeezed through a single channel this is often referred to as a "bottleneck."
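The behavioral signature of this single-channel process, namely delayed responses to a second stimulus arriving during the refractory period, can be sketched with a toy latency model. The 0.25 s reaction time and 0.40 s ballistic interval are illustrative assumptions, not measured values:

```python
# Sketch of how serial single-channel selection produces refractoriness:
# a step arriving while the previous command is still executing must wait
# until that ballistic, open-loop interval ends. Numbers are illustrative.

BASE_RT = 0.25      # nominal reaction time to an isolated step (s)
OPEN_LOOP = 0.40    # duration of the ballistic, uninterruptible command (s)

def response_latency(isi):
    """Latency to the second of two steps separated by `isi` seconds."""
    # the first response starts at BASE_RT and occupies the channel
    # until BASE_RT + OPEN_LOOP; the second cannot start before then
    busy_until = BASE_RT + OPEN_LOOP
    start = max(isi + BASE_RT, busy_until)
    return start - isi      # latency measured from the second step

for isi in (0.1, 0.3, 0.8):
    print(isi, round(response_latency(isi), 2))
```

The model reproduces the qualitative finding: a second step at a long inter-stimulus interval is answered at the nominal reaction time, while one arriving shortly after the first is answered late, because the channel is still committed.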

# *Is there a plausible neural substrate for intermittent control and refractoriness?*

Some authors support a theory of central IC, with a short planning interval (100 ms), neurologically based in a cerebellothalamo-cortical loop, and related to a form of physiological tremor ("movement discontinuities" or "bumps") during slow movement (Vallbo and Wessberg, 1993; Neilson and Neilson, 2005; Bye and Neilson, 2008, 2010). However, these high frequency oscillations may simply be an effect of limb resonance (Lakie et al., 2012). Our recent evidence associates IC with longer open loop intervals (250–500 ms) related to the low bandwidth of voluntary control (Craik, 1947; Vince, 1948; Navas and Stark, 1968; Hanneton et al., 1997; Slifkin et al., 2000; Loram and Lakie, 2002; Loram et al., 2005, 2011, 2012; van de Kamp et al., 2013). The "bottleneck" associated with these longer periods of refractoriness does not occur at perceptual or motor stages of information processing, but at some central stage (Pashler and Johnston, 1998; Sigman and Dehaene, 2005). Where is it located? Some brain-imaging evidence from dual-task studies has implicated frontal or pre-frontal cortical structures (Jiang and Kanwisher, 2003; Dux et al., 2006). However, Szameitat et al. (2006) have suggested that the increased activity of the prefrontal structures may represent an active task-scheduling mechanism (response planning, i.e., temporal ordering of the dual tasks) rather than the site of the bottleneck itself. It may be that the bottleneck analogy, which suggests a restriction continuously throttling the flow, leads to a search for the wrong type of mechanism. A better metaphor might be that of a conductor, who intermittently engages and suppresses sections of the orchestra. The central limitation is then the rate at which the intermittent adjustments can be made. Where in the brain should an appropriate switching mechanism involving selective inhibition and facilitation of global motor activity (the central conductor) be sought?

The basal ganglia are a clear possibility. They are now believed (e.g., Redgrave et al., 1999) to operate as a generic action selection system, receiving input from a broad range of other brain areas, and producing output that selects particular actions to perform. The opposing roles of the two cortical re-entrant loops (direct loop—thalamic facilitation of cortical output via *ansa lenticularis*) and the indirect loop (thalamic inhibition of cortical output via *globus pallidus pars externa*) might provide a plausible mechanism to engage and inhibit cortical outputs. Gurney et al. (2001) have suggested a basal ganglia mechanism whereby *salient* actions are selected and *promiscuous* actions are suppressed, rather like center—surround antagonism in visual processing. There is evidence that the basal ganglia are important to response selection, are associated with refractoriness and are part of the IC loop (Houk et al., 2007). These findings have been used to link the basal ganglia with computational models of IC and with neurological and behavioral deficits in decision making and action selection associated with Parkinson's disease, schizophrenia and Attention Deficit Disorder (ADD) (Houk et al., 2007). This justifies the possibility that functional deficits in Parkinson's disease including the initiation and selection of new responses, freezing, and postural rigidity might be related to deficits in the IC loop. Clearly, it would be interesting to find physiological and anatomical evidence positively linking IC to the basal ganglia. Do they work as an intermittently adjusted selector switch, and what happens when this switch is damaged by disease?
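The selection principle attributed to the basal ganglia here, salient actions passed and competitors suppressed, can be sketched as a simple winner-take-all gate. The action names and salience values below are invented for illustration and are not from any published model:

```python
# Toy winner-take-all selector in the spirit of the selection mechanism
# proposed by Gurney et al. (2001): the most salient channel passes,
# all competing channels are inhibited to zero (centre-surround-like
# selection). Action names and salience values are illustrative.

def select(saliences):
    winner = max(saliences, key=saliences.get)
    # gate: the winner passes unchanged, losers are suppressed
    return {action: (s if action == winner else 0.0)
            for action, s in saliences.items()}

gated = select({"point": 0.4, "make_fist": 0.7, "stand_still": 0.2})
print(gated)  # only "make_fist" survives the gate
```

The essential feature is that suppression of losers is part of the same operation as selection of the winner, which is exactly the conductor-like engage-and-suppress role discussed above.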

# *Could the design of autonomous robots benefit from a module including a serial, single channel process?*

Robots, like humans, contain redundant possibilities within a multi-segmental structure. The rationale for a serial, single channel process as necessary to optimize task selection and coordination from redundant possibilities applies equally in this case as it does for humans. Intermittent control implements a serial, single channel process as the appropriate engineering solution to control problems in which there is a time-consuming online computational process (Ronco et al., 1999). When the actuators, the system being controlled and the external constraints are time invariant, the control signal can be computed rapidly from measured quantities, the reference signal, and pre-computed parameters such as the gains of a continuous optimal controller. However, when the actuators, system and constraints are time varying, online optimization and computation of the control signal is desirable. Intermittent open loop predictive control uses an intermittently moving time horizon which allows slow optimization to occur concurrently with a fast control action. This approach allows handling of time varying systems and constraints at the expense of increased online computational requirements. Thus, intermittent control provides for a time-consuming online optimization process which lies at the heart of flexible predictive control. A serial, single channel process, and its implementation through intermittent control, appears to be a valuable element missing from current schemes.
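The intermittently moving horizon described above, with slow re-optimization running concurrently with fast open-loop action, can be sketched for a plant whose parameter drifts in time. All numerical values, the drifting-parameter model, and the deadbeat gain rule are illustrative assumptions:

```python
import math

# Sketch of intermittent open-loop predictive control for a time-varying
# plant: a (notionally slow) optimization recomputes the feedback gain at
# each intermittent sample from the plant's current parameter, while the
# command is applied open-loop in between. All values are illustrative.

def reoptimized_gain(a, b, T):
    # choose k so the sampled closed-loop multiplier
    # e^{aT} - (b*k/a)*(e^{aT} - 1) is zero (deadbeat at sample instants)
    eaT = math.exp(a * T)
    return a * eaT / (b * (eaT - 1.0))

dt, T_ol = 0.001, 0.35
x, u, next_sample, k = 1.0, 0.0, 0.0, 0.0
for step in range(int(4.0 / dt)):
    t = step * dt
    a = 1.0 + 0.5 * math.sin(0.5 * t)       # slowly drifting plant parameter
    if t >= next_sample:
        k = reoptimized_gain(a, 1.0, T_ol)  # slow optimization, done intermittently
        u = -k * x
        next_sample = t + T_ol
    x += (a * x + u) * dt                   # open-loop evolution between samples
print(f"|x| after 4 s: {abs(x):.4f}")
```

Because the optimization only has to finish once per open-loop interval, it can afford to be slow, which is the engineering argument for intermittency made in the paragraph above.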

# **CONCLUSION**

Eight participants controlled their multi-segmental body to ensure their head tracked a stepwise moving target as fast and accurately as possible. This is a one degree of freedom task with control redundancy. Analysis showed enhanced delays in response to target steps with close temporal proximity to the preceding step. This evidence of refractoriness is incompatible with control as a linear time invariant process. It is, however, consistent with a single-channel serial ballistic process within the intermittent control paradigm, with a substantial intermittent interval related to the bandwidth of voluntary control. A control architecture reproducing intentional control of human movement must reproduce refractoriness and provide a solution to redundancy. Albeit at this stage not an experimental-deductive conclusion, we suggest that best coordination of redundant possibilities provides a rationale for why the biological control architecture might converge parallel sensory input to a serial single channel process involving planning, selection and temporary inhibition of alternative responses prior to low dimensional motor output. Intermittent control, a serial, single channel process, is designed to provide computational time for an online optimization process and is appropriate for flexible adaptive control. Such a design has the potential to help robots reproduce the flexibility of human control.

# **ACKNOWLEDGMENTS**

We acknowledge EPSRC financial support for the ICMM project via the linked grants EP/F068514/1, EP/F069022/1 and EP/F06974X/1. Furthermore, we would like to thank Andy Nisbet for writing the C++ code for the VICON-Simulink interface.

# **REFERENCES**

Dux, P. E., Ivanoff, J., Asplund, C. L., and Marois, R. (2006). Isolation of a central bottleneck of information processing with time-resolved fMRI. *Neuron* 52, 1109–1120.

Gurney, K., Prescott, T. J., and Redgrave, P. (2001). A computational model of action selection in the basal ganglia. I. A new functional anatomy. *Biol. Cybern.* 84, 401–410.

Redgrave, P., Prescott, T. J., and Gurney, K. (1999). The basal ganglia: a vertebrate solution to the selection problem? *Neuroscience* 89, 1009–1023.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 21 December 2012; accepted: 20 April 2013; published online: 09 May 2013.*

*Citation: van de Kamp C, Gawthrop PJ, Gollee H, Lakie M and Loram ID (2013) Interfacing sensory input with motor output: does the control architecture converge to a serial process along a single channel? Front. Comput. Neurosci. 7:55. doi: 10.3389/fncom.2013.00055*

*Copyright © 2013 van de Kamp, Gawthrop, Gollee, Lakie and Loram. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.*

# From action representation to action execution: exploring the links between cognitive and biomechanical levels of motor control

# *William M. Land1,2,3\*, Dima Volchenkov3,4, Bettina E. Bläsing1,2,3 and Thomas Schack1,2,3*

*<sup>1</sup> Department of Neurocognition and Action Biomechanics, Bielefeld University, Bielefeld, Germany*

*<sup>2</sup> Research Institute for Cognition and Robotics (CoR-Lab), Bielefeld University, Bielefeld, Germany*

*<sup>3</sup> Cognitive Interaction Technology - Center of Excellence (CITEC), Bielefeld University, Bielefeld, Germany*

*<sup>4</sup> Department of Physics, Bielefeld University, Bielefeld, Germany*

### *Edited by:*

*Yuri P. Ivanenko, IRCCS Fondazione Santa Lucia, Italy*

### *Reviewed by:*

*Florentin Wörgötter, University Goettingen, Germany*

*Martin V. Butz, Eberhard Karls University of Tübingen, Germany*

*Francesca S. Labini, IRCCS Santa Lucia Foundation, Italy*

### *\*Correspondence:*

*William Land, Department of Neurocognition and Action Biomechanics, Bielefeld University, Universitätsstraße 25, PF 100131, 33501 Bielefeld, Germany e-mail: william.land@uni-bielefeld.de*

Along with superior performance, research indicates that expertise is associated with a number of mediating cognitive adaptations. To this extent, extensive practice is associated with the development of general and task-specific mental representations, which play an important role in the organization and control of action. Recently, new experimental methods have been developed, which allow for investigating the organization and structure of these representations, along with the functional structure of the movement kinematics. In the current article, we present a new approach for examining the overlap between skill representations and motor output. In doing so, we first present an architecture model, which addresses links between biomechanical and cognitive levels of motor control. Next, we review the state of the art in assessing memory structures underlying complex action. Following this, we present a new spatio-temporal decomposition method for illuminating the functional structure of movement kinematics, and finally, we apply these methods to investigate the overlap between the structure of motor representations in memory and their corresponding kinematic structures. Our aim is to understand the extent to which the output at a kinematic level is governed by representations at a cognitive level of motor control.

**Keywords: mental representation, basic action concepts, kinematic structure, spatio-temporal kinematic decomposition of movement, structure dimensional analysis—motorics, hierarchical motor control, memory structure**

# **INTRODUCTION**

Research on expertise in sports has shown that skilled performance is based not only on physical ability, but equally on task-specific cognitive competences. During extensive practice, relevant mental representations are formed, adapted, and reorganized in such a way that flawless performance is progressively facilitated, based on increasing order formation in the athlete's long-term memory. According to the perceptual-cognitive perspective, actions are planned and performed on the basis of structured cognitive representations of action effects in motor memory (Hommel et al., 2001; Mechsner et al., 2001; Schack and Mechsner, 2006; Hoffmann et al., 2007; Shin et al., 2010). Furthermore, because these representations govern the tuning of motor commands and muscular activity patterns, skillful coordination occurs when appropriate mental representations of the motor task and action goals are constructed (Schack and Ritter, in press). In order to illustrate how these processes can be conceptualized and explored empirically, we will present studies that investigated the organization of task-related cognitive structures, and the way these structures correspond to functional components of skilled motor performance. Additionally, we will present a new empirical approach for linking these mental structures to the structures observed within the movement kinematics. Before we turn to the methodological aspects of these studies, we will first present the underlying theoretical conceptualization of the cognitive architecture of human motor action, beginning with the concept of mental representations, which is fundamental to this approach.

# **MENTAL REPRESENTATIONS OF HUMAN MOTOR ACTION**

The idea that cognitive representations play an important role in motor control is reminiscent of classical ideas in psychology, such as the "ideomotor" approach adopted by Lotze (1852) and James (1890) in the 19th century or the model-theory studies of the construction of movement presented by Bernstein (1947) in the middle of the 20th century. James, for instance, wrote in 1890 in his now-seminal work *The Principles of Psychology*: "We may ... lay it down for certain that every representation of a movement awakens in some degree the actual movement which is its object" (p. 526). Recently, the term mental representation has been widely used in a large variety of disciplines, often with rather diverse content. Gilbert and Wilson (2007) have stated: "the mental representation of a past event is a memory, the mental representation of a present event is a perception, and the mental representation of a future event is a plan." Even though this definition sounds viable, it might not be sufficient for our purposes. Mental representations were first discussed in the philosophy of language, referring mainly to linguistic representations. Later, the issue was taken up by other disciplines such as philosophy of mind and psychology, and various theories have been formulated to describe the nature of mental representations. From these theoretical perspectives, the functionalistic one seems most relevant in our context, as it states that mental representations predominantly play a functional role for the cognitive system. According to this perspective, the function of mental representations is to make situations and objects cognitively available that are otherwise physically unavailable—in this respect, they are the only way to make non-actual situations and objects available for thinking and acting (Vosgerau, 2009).

Several authors have reflected upon the nature of mental representations of actions (e.g., Rosenbaum et al., 2001), and it has been argued that even mental representations of static objects are dynamic in nature, as they are derived from and based on dynamical action representations, which are evolutionarily more relevant for controlling behavior than representations of static scenes or objects (Freyd, 1987). In the context of the studies presented in the following, we refer to mental representations in terms of states of mind that correspond to experiences, and to the physical reality of objects and movements. Such internal representations arise from exposure to sensory stimuli, are multimodal, and refer to objects or events that we perceive in our environment via the processes of perception and processing in our brain (Barsalou, 2008). Mental representations occur on different levels and, due to the nature of our nervous system, can be independent of the actual presence of the object that they refer to in the world. Our ability to store such representations is the basis of our ability not only to learn, but also to make plans and predictions regarding what will happen in the future. Wilson (2002) points out that our cognitive apparatus can even construct mental representations of situations that we have never experienced, purely on the basis of linguistic input. Mental representations thereby play a central role in the control and organization of actions, serving as "organizers of activity" (Steels, 2003). In cognitive systems, internal representations co-evolve together with corresponding actions and become vehicles for higher mental functions, such as thinking and planning (Steels, 2003). As a consequence, these representations stay closely connected to the actions they serve (Glenberg, 1997), resembling them most crucially in terms of structural similarity (Johnson-Laird, 1989). 
Mental representations are considered vital for learning complex movements and movement sequences, for refining and adapting learned movements to the requirements of actual situations, and for automatizing movement patterns on an expert level. Expert performance in sports is typically characterized by a high degree of control and a sense of clarity, which can arise based on regularities in the mental representation that allow for relieving cognitive load, or, as Wilson (2002) has put it, for circumventing the "representational bottleneck."

# **REPRESENTATIONS AS A BASIS FOR ACTION CONTROL**

Current perspectives in cognitive psychology suggest that actions are represented in terms of their anticipated perceptual effects (e.g., Prinz, 1997; Hommel et al., 2001; Knuf et al., 2001). Interestingly, these perspectives resonate with the earlier ideas of Bernstein (1996a,b, 1967) regarding the construction and control of movements. Prior to the current perspectives, Bernstein had already pointed toward the large number of degrees of freedom in the human motor system, the need for continuous processing of sensory feedback to control this highly redundant system, and the importance of the anticipation of movement effects for movement organization. Bernstein (1996a,b, 1967) proposed a model of the construction of movements according to which different organizational (and evolutionary) levels interact to generate and control different types of movement. These levels are thought to interact not simply in a fixed hierarchical manner; rather, their mode of interaction and hierarchical organization depends on the type of movement task and the level of expertise of the performer. Bernstein's model claims that movements are constructed on the basis of five levels described as (1) paleokinetic regulation (regulation of muscular tonus and basic postures, including tonic reflexes), (2) synergies (dynamical stability of movement, rhythmic and cyclic movement patterns), (3) movement in space (spatial orientation and object manipulation), (4) action (volitional control of movements, object-related action, focus of attention), and (5) symbol coordination (symbolic action control, speech).

Bernstein's model reflects the general idea that movement control is based on representations, which serve intentional movement planning, and that these representations reflect the functional movement structure. Alongside Bernstein's approach to the construction of action, there have been several formulations of the idea that movement control is constructed hierarchically. The model we propose here provides a comprehensive account of the way complex movements are controlled, ranging from the volitional initiation of the action to the lowest level of motor control. Thus, this model provides the relevant framework for a connection between mental representations and motor output. Specifically, the model proposed views the functional construction of actions (Schack, 2004; Schack and Ritter, 2009; Maycock et al., 2010) on the basis of a reciprocal assignment of performance-oriented regulation levels and representational levels (see **Table 1**). These levels differ according to their central tasks on the regulation and representation levels. Each level is assumed to be functionally autonomous.

**Table 1 | Levels of motor action (modified from Schack, 2004; Schack and Ritter, 2009).**

Both control levels, the *level of sensorimotor control* (I) and the *level of mental control* (IV), serve the main function of regulation, whereas the *level of sensorimotor representation* (II) and *level of mental representation* (III) are representational, and are closely connected to the two regulation levels. Levels I and II could be understood as responsible for the functional manipulation of objects and events, whereas levels III and IV can be assigned a more distal focus on objects and events. All levels are connected and interact with each other, but are functionally autonomous.

The *level of sensorimotor control* (I) is based on movement primitives and directly linked to the environment. It is induced perceptually, built on functional units composed of perceptual effect representations, afferent feedback, and effectors. The essential invariant, or set value, of such functional units is the representation of the movement effect within the framework of the action. The system is broadly autonomous, and automatisms emerge when the level of sensorimotor control possesses sufficient correction mechanisms to ensure the stable attainment of the intended effect. Studies of patients with impaired motility showed that the execution of movements can best be realized via anticipated sensory effects, and that such direct sensory effects are the crucial invariant of movement control (e.g., Van der Weel et al., 1991).

Modality-specific information representing the effects of the particular movement is stored on the *level of sensorimotor representation* (II). The relevant sensory modalities might change as a function of the level of expertise in the learning process and as a function of the task context. Grasping movements, for instance, are associated with kinesthetic, tactile, visual, and (in part) auditory feedback. This involves the representation of perceptual patterns of exteroceptive and proprioceptive effects that result from particular movements and refer back to the action goal. During the first steps of learning a novel complex motor action in sports, visual information is often used to monitor body posture and movement timing. In later stages of the learning process, proprioceptive information gains increased meaning, and aspects related to body position and timing no longer need to be monitored consciously. During the learning process, movement automatization is characterized by increasingly adequate correction mechanisms between levels I and II.

The *level of mental representation* (III) predominantly forms a cognitive workbench for the *level of mental control* (IV). The *level of mental representation* is organized conceptually, and is responsible for transforming the anticipated action effects into movement programs that sufficiently bring about desired outcomes. According to Bernstein (1967), an action is a structure subdivided into details, and action organization therefore has to possess a working model of this structure, containing the topology and spatiotemporal effects of the action. Mental representations of movement structures that serve this purpose are located within the *level of mental representation (III)*, and are based on the conceptual building blocks of action, Basic Action Concepts (BACs), which will be described in the following section.

The *level of mental control* (IV) is induced intentionally and is relevant for the anticipation of effects. Movements are planned, controlled, and performed with reference to the anticipated effects, or intended goal postures (e.g., Rosenbaum and Jorgensen, 1992; Rosenbaum et al., 1992; Kunde and Weigelt, 2005). Findings from such studies suggest the existence of a mental model of the action, including its outcome, to which all control processes can be related. Level IV comprises functional components of volition or mental control, such as the coding of intended effects into action goals. Specifically in the context of sports, instructions and self-applied strategies for focusing attention and stabilizing performance are important aspects of mental control on this level.

### **REPRESENTATION UNITS IN MOTOR ACTION**

Perceptual-cognitive approaches propose that motor actions are formed by cognitive representations of target objects, movement characteristics, movement goals, and the anticipation of potential disturbances. Movements can be understood as a serial and functional order of goal-related body postures (Rosenbaum et al., 2001) and their transitional states. Furthermore, the link between movements and perceptual effects is bi-directional and based on information that is typically stored in a hierarchical fashion in long-term memory.

Based on Schack's Cognitive Architecture model (see **Table 1**), complex movements can be conceptualized as a network of sensorimotor information. The better the order formation in memory, the more easily information can be accessed and retrieved. This leads to improved motor performance, which reduces the amount of attention and concentration required for successful performance. The nodes within this network contain functional subunits, or building blocks, that relate to motor actions and associated perceptual and semantic content. These building blocks, termed BACs, can be understood as representational units in memory that are functionally connected to perceptual events, or as functional units for the control of actions at the level of mental representation, linking goals at the level of mental control to perceptual effects of movements. Such BACs are activated by representations of starting conditions and deactivated by effect representations, both at the perceptual level.

Underlying neurocognitive theories state that actions are represented in functional terms as a combination of action execution and the intended or observed effect, or movement goal (Prinz, 1997; Hommel et al., 2001; Knuf et al., 2001; Koch et al., 2004). BACs can be regarded as cognitive tools for the execution of actions such as complex movement tasks in sports (see Schack, 2004). Within these tasks, BACs serve the purpose of reducing the cognitive effort necessary for controlling the action. The same applies to actions performed in everyday life, as their successful execution often depends on experience and thereby requires a level of expertise the performer is hardly aware of.

Altogether, BACs can be viewed as the mental counterparts of functionally relevant elementary components or transitional states of complex movements. They are characterized by recognizable perceptual features. They can be described verbally as well as pictorially, and can often be labeled with a linguistic marker. "Turning the head" or "bending the knees" might be examples of such BACs in the case of, say, a complex floor exercise. As mentioned above, each individual BAC is characterized by a set of closely interconnected sensory and functional features. For example, a BAC in tennis like "*whole body stretch motion*" is functionally related to providing energy to the ball, transforming tension into swing, stretching but remaining stable, and the like. Afferent sensory features of the corresponding submovement that allow monitoring of the initial conditions are bent knees, a tilted shoulder axis, and body weight on the left foot. Reafferent sensory features that allow monitoring of whether the functional demands of the submovements have been addressed successfully are stretched and tensed muscles (proprioception) and, finally, perhaps visual perception of the swinging arm with the ball in view.

BACs are stored at a basic level of representation and are investigated and defined by experimental methods (like reaction-time measurement) or with the help of biomechanical methods. Altogether, these methods are used to learn about the basic body postures of a particular movement and their mental counterparts in memory. The number of BACs that can be assigned to a given movement task depends on the complexity of the task, on the way it has been learned and trained, and on the level of expertise of the performer. It is hardly possible to define BACs without the extensive feedback and cooperation of persons who master the task at varying levels of expertise, taking into account their different types of knowledge. Consequently, it is important to take the experience of teachers into account, and also to look at the way the task is actively structured during learning and training, as concepts that emerge during training are likely to remain intact as scaffolding in long-term memory. During the experimental procedure, BACs can be represented as pictures or verbal labels that are meaningful to the participants in order to trigger movement-related long-term memory content. Pictures and verbal labels differ slightly in the way they address mental representations. Presenting an action in pictures instead of words commonly allows for a higher temporal resolution; however, dynamical cues cannot be represented in static pictures unless the stimuli are augmented by verbal terms or symbols (e.g., arrows). Furthermore, pictures represent very short time segments that have a clear temporal order within the action, whereas verbal terms can relate to longer-lasting and synchronous partial actions.

# **MEASURING MENTAL REPRESENTATIONS**

In principle, there are two methodological approaches to the experimental study of mental representation structures: to determine them from response behavior or to determine them from reaction times. Whereas the first approach has been used for the study of order formation in long-term memory (LTM), the second approach should be used only to ascertain chunk structures in working memory. Schack's architecture model proposes that not only the LTM structure of mental representations but also the exploitation of working-memory capacity serves a notable function in the organization of movement acts. It assumes that working memory forms a unit that is structurally and functionally distinct from LTM. A particularly interesting method for measuring structures of mental representation in LTM, the so-called Structure Dimensional Analysis (SDA), was originally developed by Lander and Lange (1996) in cognitive psychology for ascertaining relational structures in a given set of concepts, and was adapted by Schack (2001) for analyzing representations of movements (Structure Dimensional Analysis—Motorics, SDA-M). This experimental approach has been documented in several contributions (Schack, 2004; Schack and Mechsner, 2006; Hodges et al., 2007; Schack and Hackfort, 2007; Schack, 2012). Importantly, the method does not ask participants to give explicit statements regarding their representational structures, but rather reveals these structures by means of knowledge-based decisions in an experimental setting. Altogether, the SDA-M consists of four steps. First, a special splitting procedure requires participants to judge subjectively whether or not a given BAC is "functionally close" to another. A randomly selected BAC is presented as the standard unit, or anchor, and all other BACs are displayed below the anchor in a randomly ordered list. One after another, each BAC is subjectively compared for similarity to the anchor.
Thereby, the list of BACs is split into two subsets, a positive ("close") and a negative ("not close") set, which are then repeatedly submitted to the same procedure until every BAC has been compared to every other. Based on the participants' decisions, the program sums the positive and negative subsets separately and delivers a Euclidean distance scaling between the items (BACs). Second, a hierarchical cluster analysis is used to transform the set of items into a dendrogram.
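The distance-scaling and clustering steps just described can be sketched computationally. The example below is purely illustrative, not the original SDA-M software: the five BAC labels and the distance matrix are invented, and SciPy's average-linkage routine stands in for the hierarchical cluster analysis:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
from scipy.spatial.distance import squareform

# Hypothetical Euclidean distances between five tennis-serve BACs, as would
# be delivered by the splitting procedure (labels invented for illustration).
labels = ["bend knees", "toss ball", "stretch body", "strike", "swing out"]
D = np.array([[0.0, 1.2, 0.9, 2.5, 2.6],
              [1.2, 0.0, 0.8, 2.4, 2.7],
              [0.9, 0.8, 0.0, 2.2, 2.3],
              [2.5, 2.4, 2.2, 0.0, 1.0],
              [2.6, 2.7, 2.3, 1.0, 0.0]])

# Step 2: hierarchical cluster analysis on the condensed distance matrix.
Z = linkage(squareform(D), method="average")
tree = dendrogram(Z, labels=labels, no_plot=True)  # set no_plot=False to draw
```

In this toy matrix the first three BACs and the last two fall into two separate clusters, mimicking a functional phase structure.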

Third, a dimensioning of the cluster solutions is performed through a factor analysis linked to a specific cluster-oriented rotation process, resulting in a factor matrix classified by clusters. Finally, because the cluster solutions can differ both between and within individuals, a within- and between-group comparison of the cluster solutions is performed using the structural invariance measure lambda (Lander, 1991; Schack, 2010). This measure is determined from three defined values: the number of constructed clusters of the pair-wise cluster solutions, the number of items within the constructed clusters, and the average quantities of the constructed clusters. The lambda value is calculated as the square root of the product of two factors: one factor is the weighted arithmetic mean of the relative average quantity of the constructed clusters, the other is the proportional number of clusters in the compared cluster solutions.
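As a rough numerical illustration of such an invariance measure, one might compute something like the sketch below. This is our own loose reading, not Lander's (1991) published formula: both the size-weighted overlap and the cluster-count ratio are assumptions made purely for illustration.

```python
import math

def invariance_lambda(sol_a, sol_b):
    """Hedged sketch of a lambda-style invariance measure between two
    cluster solutions (each a list of sets of item labels)."""
    # Factor 1 (assumed form): size-weighted mean overlap of each cluster in
    # sol_a with its best-matching cluster in sol_b (Jaccard similarity).
    n_items = sum(len(c) for c in sol_a)
    overlap = sum(
        (len(c) / n_items) * max(len(c & d) / len(c | d) for d in sol_b)
        for c in sol_a
    )
    # Factor 2 (assumed form): proportional number of clusters.
    ratio = min(len(sol_a), len(sol_b)) / max(len(sol_a), len(sol_b))
    return math.sqrt(overlap * ratio)

identical = invariance_lambda([{1, 2}, {3, 4, 5}], [{1, 2}, {3, 4, 5}])
different = invariance_lambda([{1, 2}, {3, 4, 5}], [{1, 3}, {2, 4, 5}])
```

Identical solutions yield a value of 1, and increasingly dissimilar solutions yield smaller values.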

SDA-M can be applied in two alternative modes, a direct and an indirect scaling mode. In the direct scaling mode, participants make direct judgments about the functional equivalence of pairs of BACs (BAC × BAC: pairs of BACs are judged as closely or not closely related to each other). In the indirect scaling mode, decisions concerning the functional relationship of BACs are made on the basis of features (e.g., spatial, temporal, or force parameters of a given movement) that are assigned to the BACs, with the BACs serving only as anchors and features being judged as belonging or not belonging to the anchor in the context of the action (BAC × features: features are judged as closely or not closely related to anchor BACs). To determine classification probabilities of features in relation to BACs, the initial *z*-matrix is transformed into a probability matrix (*p*-matrix), consisting of *p*-values that indicate the classification probabilities of features to individual BACs belonging to clusters. Both modes of the SDA-M method include a hierarchical cluster analysis that reveals clusters of BACs (step 2); the difference is that in the indirect scaling mode the features are predefined, whereas in the direct scaling mode the concept dimensions can be accessed via a factor analysis (step 3).
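The *z*-to-*p* transformation can be illustrated as follows, under the assumption (ours, not stated in the text) that the *z*-values are scaled to a standard normal distribution, so that classification probabilities are obtained from the normal cumulative distribution function:

```python
import math

def z_to_p(z):
    """Classification probability from a standard-normal z-value (normal CDF).
    Assumed form; the original SDA-M software may use a different scaling."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical z-matrix: rows are features, columns are anchor BACs.
z_matrix = [[1.64, -0.50],
            [0.00, 2.33]]
p_matrix = [[round(z_to_p(z), 3) for z in row] for row in z_matrix]
```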

# **PREVIOUS STUDIES INVESTIGATING MENTAL REPRESENTATION STRUCTURES**

In the past, the SDA-M method has been used to study differences between groups who vary in their experience with cognitive or motor tasks. In the following, examples of such studies will be given to demonstrate the broad spectrum of potential applications of the Cognitive Architecture approach.

Schack and Mechsner (2006) applied the SDA-M method to investigate participants' mental representations of the tennis serve by comparing the structures of high-ranking tennis players, low-ranking players, and novices. With the help of tennis experts and coaches, 11 BACs were defined for the tennis serve in relation to the functional movement structure that can be derived on the basis of biomechanical movement parameters. According to this structure, the tennis serve consists of a pre-activation phase that serves to build up tension energy, the strike as the main functional phase during which the energy is conveyed to the ball, and the final swing during which the racket is decelerated and the body is brought back into a stable position. The results of the study showed that the mean group cluster solution of the high-ranking players corresponded to the functional movement structure, reflecting the three functional phases. The low-ranking players' cluster solution combined the strike and the final swing into one cluster that was differentiated from the pre-activation phase. The novices' solution did not contain any clusters, reflecting a lack of functional order formation within this group. Thus, the authors found that the mental representation structure of the experts was well matched to the functional and biomechanical demands of the task, whereas the low-ranking and novice players' representations were less hierarchically organized and not matched to the biomechanical demands of the task.

Similarly, Bläsing et al. (2009) applied the SDA-M method to compare the mental representations of classical dancers varying in expertise level and dance novices. Mental representations of two well-defined movements from the classical dance repertoire, the *pirouette en dehors* and a small jump called *petit pas assemblé*, were investigated in professional ballet dancers, advanced ballet amateurs, amateur beginners, and sport students without any dance training experience. The results for the *pirouette* revealed that the mean cluster solutions of both the professional dancers and the advanced amateurs corresponded to the functional phases, with only minor differences. The beginners' cluster solution differed markedly from the others and showed little alignment with the functional phases, and the novices hardly formed any relevant clusters. For the *pas assemblé*, no difference was observed between the two groups of amateurs. Amateurs and novices formed clusters that included all BACs and were similar to those formed by the experts; only the experts' cluster solution, however, reflected the dynamical initiation of the jump, corresponding to the functional phases.

In a follow-up study, the SDA-M method was applied in indirect scaling mode (BAC × feature), in which the BACs appear only as anchors and features are sorted in relation to these anchors (Bläsing and Schack, 2012). The same movements were used, and the BACs defined for the first study were presented as anchors. Spatial direction labels were related to these anchors as features, and the participants (who had already taken part in the previous study, Bläsing et al., 2009) were instructed to answer positively for spatial directions that they associated with a given BAC within an egocentric reference frame, according to their own motor imagery of the movement. Results showed that for the *pirouette*, only the professional dancers' mean cluster solution contained one cluster that clearly corresponded to the main functional phase and associated this phase with relevant directions. The amateurs' mean cluster solution did not result in any functional clusters, and the novices' did not contain any clusters at all. For the *assemblé*, the professionals' and amateurs' cluster solutions associated the main functional phase with relevant spatial directions, whereas the novices' cluster solution did not correspond to the functional movement parameters.

The SDA-M method has not only been applied to compare task-related cognitive representations of experts to those of novices, but also to monitor differences between developmental stages. In a study on anticipatory motor planning in children, Stöckel et al. (2012) investigated the end-state-comfort (ESC) effect in children aged 7–9 years. Additionally, the children took part in an SDA-M task with pictures of a hand grasping common objects in different ways. Only the 9-year-old children produced a cluster solution that separated uncomfortable from comfortable grasps. In combination with the results of the ESC experiment, the study showed that 9-year-old children had more distinct representations of comfortable and uncomfortable grasp postures and a better ability to plan movements to end in comfortable postures compared to younger children. Furthermore, both abilities were found to be related, as children who clustered by grasp comfort also showed the ESC effect, whereas children who did not cluster by grasp comfort performed less consistently in the ESC task, suggesting that cognitive representations of grasp postures are crucial for manual posture and action planning.

In the studies presented above, the SDA-M method was applied to investigate general differences in cognitive skill representation between participants of varying levels of expertise or developmental stages. For this purpose, mean cluster solutions of groups of participants were compared, with each group representing a defined level of expertise and its cluster solution being understood as typical for this expertise level. The studies did not, however, pay attention to the cluster solutions of individual participants within these groups, nor to inter-individual differences in cognitive movement representations. This individual approach was taken by Weigelt et al. (2011), who studied the mental representations of a judo throwing technique (Uchi-mata) in judoka competing at the national team level. The individual cluster solutions of two of the eight participants examined in this study were compared *post-hoc* to the mean group cluster solution, which represented the functional movement phases as expected. The individual cluster solutions differed in details from the mean cluster solution and the functional reference structure, reflecting individual preferences and technical differences as well as weaknesses in the judokas' performance. The authors point out that such differences, interpreted with care, can reveal subtle flaws in the technical skills of an athlete and can be used by an expert coach to improve and adapt further training.

Comparison between individual cluster solutions and group average cluster solutions has also been used in the context of rehabilitation. Braun et al. (2007) used SDA-M to analyze the mental representations of a common everyday activity, drinking from a cup, in elderly patients recovering from stroke. Sixteen patients, 3–26 weeks after their stroke, took part in the study as part of their rehabilitation program, along with sixteen matched controls. SDA-M was applied using pictures representing the action sequence, augmented with arrows indicating movement directions where necessary. The results of all participants were examined individually. The sixteen control subjects produced very similar cluster solutions consisting of two or three clusters corresponding to the functional action phases. The stroke patients' cluster solutions differed markedly from those of the control group and from each other, and were characterized by a weak functional integration of BACs, ranging from incomplete functional clusters, through non-functional clusters, to a total absence of clusters. The study showed that SDA-M can be applied as a tool in rehabilitation, even in patients with reduced motor and cognitive capabilities.

Bläsing et al. (2010) used SDA-M to compare individuals with congenitally missing limbs, one of them with congenital phantoms of arms and legs, to different control groups. Instead of BACs, body parts and related activities were used as items in order to evaluate the mental representations of the participants' own bodies. The results revealed that the cluster solution of the individual with congenital phantom limbs, the existence of which had been affirmed in several previous studies using behavioral and neurophysiological methods, differed only slightly from the groups' cluster solutions, showing the same modular structure (Haggard and Wolpert, 2005). In contrast, the cluster solution of the individual who had never experienced any phantoms differed markedly from this modular structure and instead reflected the individual's typical use of his body in everyday activities, providing evidence for an action-based influence on the adaptive body representation (Haggard and Wolpert, 2005). The findings from this study suggest that the SDA-M method might provide empirical access to the body schema, which is inextricably linked to the motor system and is constantly involved in the planning and execution of motor actions (de Vignemont, 2010).

The striking differences in representations found between high- and low-level performers support the assumption that motor learning leads to the development of task-specific representations, which play an important role in the control and organization of actions (e.g., Elsner and Hommel, 2001). According to skill acquisition theories (e.g., Fitts and Posner, 1967; Anderson, 1982, 1993, 1995), the cognitive mechanisms governing task performance are refined over the course of learning. To this extent, learning can be viewed as the modification and adaptation of representation structures in long-term memory (Schack, 2004; Schack and Ritter, in press). To directly test this assertion, Frank et al. (2013) investigated the developmental change in mental representation structures over the course of early skill acquisition of a complex motor task. Specifically, the authors employed a longitudinal design in which a group of novices practiced a golf putting task over the course of five training days. Both the change in participants' putting performance as well as the developmental change in the structure of the participants' mental representations were assessed before and after training. Results indicated that along with improved putting proficiency, significant developmental changes emerged within the practice group's mental representation as a result of task practice. These findings support the notion that functional adaptations of mental representations are closely tied to motor learning.

Research in the area of training and feedback has indicated that the type of attentional focus induced by training instructions can significantly impact the quality and rate of skill acquisition. To this extent, instructions that promote an external focus of attention (i.e., attention given to the effects of the movement on the environment) can lead to improved motor learning and retention (e.g., Wulf et al., 1999), reduced working memory demands (e.g., Wulf et al., 2001), reduced susceptibility to performance pressure (e.g., Land and Tenenbaum, 2012), and overall, better outcome performance (e.g., McNevin et al., 2003). Given that the sensory consequences of motor actions are considered an important component within mental representations (e.g., Ford et al., 2007), it appears likely that focusing on the sensory effects of one's movement (i.e., an external focus) during learning may act to facilitate the integration of perceptual effects during the formation of one's mental representation, leading to a more refined representation structure.

To explore this question, Land et al. (submitted) examined the developmental change in participants' representation structures as a function of instruction type. Specifically, novice participants trained on a golf putting task over the course of three training days. For half of the participants, training instructions directed attention to the external effects of their movement (i.e., the roll of the golf ball). For the other half, instructions directed attention internally to the execution of the movement (i.e., the swing of the arms). At the conclusion of practice on the third day, results indicated that both training groups displayed improved putting performance along with significant functional changes in their underlying mental representations. However, the performers who were given instructions directing attention to the sensory consequences of their movement significantly outperformed the group who trained while focusing on skill execution. Additionally, the representation structure of the external learning group was significantly more elaborate and more functionally similar to that of skilled golfers than that of the internal focus learning group. These findings highlight that the association between movements and their perceptual effects is crucial for learning. To this extent, the findings suggest that instructions which emphasize an external focus of attention aid the integration of perceptual effects during the development of mental representations. They furthermore give credence to the assertion that the sensory consequences of motor actions are an important component within mental representations.

# **INVESTIGATING THE RELATIONSHIP BETWEEN REPRESENTATIONS AND MOTOR ORGANIZATION**

The results of the aforementioned research indicate a clear relationship between mental representations and motor performance. Specifically, these findings support the hypothesis that voluntary actions are planned, executed, and stored in long-term memory by means of reference structures comprised of BACs. Central to this perspective, mental representations are functionally considered to guide the motor organization during the realization of action goals. In this regard, these representations serve as a cognitive reference for the creation of motor patterns. Given that movements are structured and controlled via these representations, an important theoretical advancement would be to identify direct links between mental representations and movement kinematics in the fulfillment of action goals.

In this direction, Schack (2003) investigated the relationship between mental representations of gymnastic somersaults and the underlying movement kinematics amongst gymnasts of varying skill levels. The results indicated significant correlations between spatial and temporal kinematic parameters of the movement and the structural relationships between BACs within the motor representation. For instance, a significant negative correlation was found between the angular velocity of the somersault and the Euclidean distance between two nodes within the representation structure related specifically to the initiation of the twisting motion.

In a similar investigation, Schütz et al. (2009) were able to show that key biomechanical parameters of a table tennis serve could be modeled based on the mental representation structure of the participants. The movement kinematics and the mental representations of nine table tennis experts revealed that movement duration and ball flight parameters were predictable based on the Euclidean distance between select representation nodes. For example, the movement duration of the table tennis strike could be predicted by the representational distance between the individual BACs "move racket backward" and "move racket downward and forward."

The implications of these two studies indicate that representational structures can be used to predict kinematic parameters of movement for a given task. Specifically, the relationships between local subsets of BACs within a representation structure have been found to be associated with spatial and temporal aspects of movement. However, given that mental representations serve as a reference structure for the unfolding of a movement pattern, it would be important to find a direct link between the structure of the representation and the overall structure of the movement on a more global level. For this to occur, a proper method for decomposing a movement into its structural features must be proposed, which we will do in the following section.

# **SPATIO-TEMPORAL KINEMATIC DECOMPOSITION OF MOVEMENT**

The modern technologies of motion tracking provide researchers with a wealth of kinematic data on the full-body movements of humans, animals, and various robotic platforms. In order to explore this rich data, we have created computationally feasible algorithms for decomposing movements into independent spatio-temporal features directly from the captured kinematic signal. The proposed approach is useful for understanding, interpreting, and modeling complex movements in systems possessing many degrees of freedom, and provides a means for examining the overall structure of a movement.

The following algorithms have been tested on recorded movements from classical ballet and golf, and they allow us to estimate the level of movement expertise, draw the detailed structures of arbitrary complex movements, and automatically classify them into a given repertoire.

From our work on the kinematic analysis of complex full-body movements in classical ballet (see Volchenkov and Bläsing, 2013), we show that a movement tracked by a motion tracking system (MTS) can be understood in terms of a hierarchy of major and minor scales, in which the spatial and temporal components can be separated and studied independently. Based on our Spatio-Temporal Kinematic Decomposition (STKD) method, the major structure of a movement can be assessed. Specifically, the affinity between markers is identified by measuring the distance between them on the largest scale of the kinematic signal, and by visualizing the results via a dendrogram. This approach reveals the functional relationship between markers through their geometric proximity. The typical character of a movement is captured by the few major scales, while the minor scales determine individual movement traits and can uniquely disclose an individual's level of movement expertise, an uneven distribution of fine motor skills, and the emotional character of the individual (see Volchenkov and Bläsing, 2013 for details). The functional separation of scales can explain why we perceive movements categorically (for example, as the highly stylized figures of classical ballet).

# **STKD PROCEDURE**

To track ballet movements (as in Volchenkov and Bläsing, 2013), we used an MTS (Vicon Motion Systems, Inc.) based on 12 high-resolution cameras outfitted with IR optical filters and rings of LED strobe lights, streaming data at 200 *fps*; the cameras detected the 3-dimensional spatial positions of passive retro-reflective spherical body markers with millimeter accuracy. Markers were attached to key anatomical locations according to the standard Vicon full-body marker placement protocol (Plug-in Gait) (see **Figure 1**).

### *Scale decomposition of kinematic data*

The MTS delivers positional data for *N* markers, across *T* time frames (*T* ≫ *N*), at the rate of 200 *fps*, in the form of a rectangular 3*N* × *T* matrix *M*, in which consecutive triples of rows, $x\_k = (x\_{k,t\_1}, \dots, x\_{k,t\_T})$, $y\_k = (y\_{k,t\_1}, \dots, y\_{k,t\_T})$, $z\_k = (z\_{k,t\_1}, \dots, z\_{k,t\_T})$, represent the Cartesian coordinates of the markers $k = 1, \dots, N$ at the sequential time frames $\tau = t\_1, \dots, t\_T$. The data matrix *M* is factorized using the *singular value decomposition* (SVD),

$$M = U\Sigma V^{\mathrm{T}} = \sum\_{s=1}^{3N} \sigma\_{s} \mathfrak{u}\_{s} \otimes \mathfrak{v}\_{s}^{\mathrm{T}},\tag{1}$$

where $\otimes$ stands for the outer product of vectors, $U$ is a $3N \times 3N$ unitary matrix whose columns $u\_s$ are the left singular vectors of $M$, $V$ is a $T \times 3N$ matrix whose columns $v\_s$ are the right singular vectors of $M$, and $\Sigma$ is a $3N \times 3N$ diagonal matrix of ordered non-negative scale factors (singular values): $\sigma\_1 > \sigma\_2 \geq \dots \geq \sigma\_{3N} > 0$. A number of the smallest singular values can be equal to zero if the MTS suffers from optical occlusion. Moreover, several left and right singular vectors can belong to the same singular value if the matrix $M$ enjoys an exact spatio-temporal symmetry; however, while processing actual motion tracking data, we have never encountered multiple singular values. If all singular values of $M$ are non-degenerate and non-zero, then the factorization (1) is unique, up to simultaneous multiplication of the left and right singular vectors by the same unit phase factor. The left singular vectors form an orthonormal basis for the spatial arrangement of markers, $(u\_s, u\_{s'})\_{\mathbb{R}^{3N}} = \delta\_{s,s'}$, with respect to the inner product in $\mathbb{R}^{3N}$. The right singular vectors, orthonormal with respect to the inner product in $\mathbb{R}^{T}$, $(v\_s, v\_{s'})\_{\mathbb{R}^{T}} = \delta\_{s,s'}$, form a basis for the temporal sequences of kinematic data. With the use of (1), the kinematic signal $M$ is decomposed into a weighted, ordered sum of separable matrices $\sigma\_s u\_s \otimes v\_s^{\mathrm{T}}$, in which the information about the spatial arrangement of markers corresponding to the singular value $\sigma\_s$ is represented by the vector $u\_s$, separately from the vector $v\_s$, which gives an account of the temporal evolution.
For each non-degenerate singular value, the separable matrix $\sigma\_s u\_s \otimes v\_s^{\mathrm{T}}$ is a rank-one $3N \times T$ matrix describing a *one-dimensional* mapping of spatial locations of markers to the sequential time frames, which corresponds to the synchronous motion of all markers (although with variable velocity) along straight lines. Namely, the trajectories of markers, specified by the consecutive triples of rows $\mathbf{r}\_k^{(s)}(\tau) = \left( x\_{k,\tau}^{(s)}, y\_{k,\tau}^{(s)}, z\_{k,\tau}^{(s)} \right)$ of the matrix $\sigma\_s u\_s \otimes v\_s^{\mathrm{T}}$, can be described mathematically using a single spatial dimension. Let us denote by $\vec{\rho}\_k^{(s)}$ the unit vector tracing the direction of the linear motion of the $k$th marker at the scale $s$,

$$\vec{\rho}\_k^{(s)} = \frac{\mathbf{r}\_k^{(s)}(\tau + 1) - \mathbf{r}\_k^{(s)}(\tau)}{\left\| \mathbf{r}\_k^{(s)}(\tau + 1) - \mathbf{r}\_k^{(s)}(\tau) \right\|}, \quad \text{for any } \tau = t\_1, \dots, t\_{T-1},$$

and the amplitude function of the linear motion, common to all markers, by

$$\gamma\_s(\tau) = \frac{\left(\mathbf{r}\_k^{(s)}(\tau), \vec{\rho}\_k^{(s)}\right)}{\left(\mathbf{r}\_k^{(s)}(t\_1), \vec{\rho}\_k^{(s)}\right)}, \quad \gamma\_s(t\_1) = 1.$$

Then, the trajectory $\mathbf{r}\_k(\tau)$ of the $k$th marker recorded by the MTS can be represented by the ordered sum of linear trajectories,

$$\mathbf{r}\_k(\tau) = \sum\_{s=1}^{3N} \vec{\rho}\_k^{(s)} \, \gamma\_s(\tau), \quad \tau = t\_1, \dots, t\_T. \tag{2}$$

The SVD of trajectories into linear components given by (2) is obvious for the motion of a single marker (see **Figure 2**) along a planar elliptic trajectory segment. In such a simple case, the components $\vec{\rho}^{(1)}$ and $\vec{\rho}^{(2)}$ are nothing but the major and minor axes of the ellipse, and the amplitude functions for the larger and smaller scales of motion are $\gamma\_1(\tau) = -\sin(\omega\_1 \tau)$ and $\gamma\_2(\tau) = -\sin(\omega\_2 \tau)$, respectively.
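The elliptic example can be checked numerically. The sketch below uses synthetic data (the semi-axis lengths a = 3 and b = 1 and the sampling are arbitrary choices): the left singular vectors recover the axes of the ellipse, and the ratio of the singular values equals the ratio of the semi-axes.

```python
import numpy as np

# A single marker on a planar ellipse: x = a*cos(tau), y = b*sin(tau).
a, b, T = 3.0, 1.0, 200
tau = 2 * np.pi * np.arange(T) / T
M = np.vstack([a * np.cos(tau),   # row of x-coordinates over T frames
               b * np.sin(tau)])  # row of y-coordinates: M is 2 x T

U, sigma, Vt = np.linalg.svd(M, full_matrices=False)

# The left singular vectors align with the major and minor axes (up to sign),
# and the singular values are proportional to the semi-axis lengths.
axis_ratio = sigma[0] / sigma[1]  # equals a / b for full-period sampling
```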

Application of the SVD in data analysis is similar to the well-known principal component analysis and Fourier analysis (see Jolliffe, 1986). By setting the small singular values to zero, we obtain the minimal set of independent spatio-temporal features, ordered according to the scales of motion, which approximates the original data with maximal precision. Namely, for $\ell < 3N$, the $3N \times T$ matrix $M^{(\ell)} = \sum\_{s=1}^{\ell} \sigma\_s u\_s \otimes v\_s^{\mathrm{T}}$ renders the best least-squares approximation to $M$ of rank $\ell$, with an error no larger than the first neglected singular value $\sigma\_{\ell+1}$. By neglecting the scales $\sigma\_s$ with $s > \ell$ in (1) and subsequently recombining the kinematic signal, we can filter out unsolicited scales of motion (e.g., small-scale movements of markers fixed on the clothing instead of the skin, movements of skin and tissues relative to the skeletal system, etc.). Despite a certain computational similarity, the SVD method differs essentially from latent variable models, such as factor analysis, which use regression modeling techniques to test hypotheses, producing error terms. The decomposition (1) does not involve any statistical hypotheses, being a purely descriptive technique.
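The scale-filtering step amounts to a truncated SVD. A minimal sketch, assuming NumPy and synthetic data in place of real marker recordings:

```python
import numpy as np

def lowrank_filter(M, ell):
    """Keep only the ell largest movement scales of a kinematic matrix M."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :ell] * s[:ell]) @ Vt[:ell, :]

rng = np.random.default_rng(0)
# Hypothetical signal: one dominant large-scale (rank-one) motion plus
# small-scale jitter (e.g., markers shifting on clothing).
M = (10.0 * np.outer(rng.standard_normal(6), rng.standard_normal(100))
     + 0.01 * rng.standard_normal((6, 100)))
M1 = lowrank_filter(M, 1)

# Eckart-Young: the spectral-norm error equals the first neglected scale.
err = np.linalg.norm(M - M1, 2)
sv = np.linalg.svd(M, compute_uv=False)
```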

### **ANALYZING MOVEMENT STRUCTURES**

Scale decomposition of tracked movements can be used as a basis for the functional alignment of markers. Spatio-temporal relationships between different body parts in evolving movements can be visualized by a dendrogram representing the relative distance between markers on the largest scale of movement through the horizontal branch length. In accordance with (2), a motion can be understood in terms of a hierarchy of scales evolving by $\gamma\_s(\tau)$. The lowest level of this hierarchy corresponds to fast, low-scale movements of markers fixed on the clothing relative to the body, whereas the highest levels encode relatively slow, large-scale movements of the skeletal system. Although a detailed analysis of the functions $\gamma\_s(\tau)$ lies beyond the scope of the present paper, it is worth mentioning that they typically constitute strongly anharmonic oscillations, indicating that the relationship between force and displacement at each movement scale is strongly non-linear.

Being primarily concerned with the movement on its largest scale, we note that its structure is determined in (2) by the spatial arrangement of the vectors $\vec{\rho}_k^{(1)}$ associated with the markers $k = 1, \ldots, N$. For each marker, the magnitude $\left\|\vec{\rho}_k^{(1)}\right\| \cdot \sum_{\tau=t_1}^{t_T} \left|\gamma_s(\tau)\right|$ can be considered as a relative measure of its mobility on the movement scale $\sigma_s$. The degree of affinity between a pair of markers, $k_1$ and $k_2$, can be attested on the largest scale of the movement by means of the Euclidean distance between the related vectors,

$$d\left(k\_1, k\_2\right) = \left\| \vec{\rho}\_{k\_1}^{(1)} - \vec{\rho}\_{k\_2}^{(1)} \right\|\tag{3}$$

It is customary to reproduce the matrices of all-to-all distances in the form of a dendrogram by placing closely related markers in the same cluster. To preserve the structure (3) as much as possible, we use the standard neighbor-joining tree-generating algorithm (Felsenstein, 2004). We search the matrix (3) for the closest markers, and then connect them into a block. Once the markers are connected, they are removed from the distance matrix and replaced by the block connecting them. The neighbor-joining algorithm continues until all *N* markers are connected in a tree, and each branch acquires a length, with length being interpreted as the estimated number of substitutions required to resolve the block. The functional contingency between blocks of markers on the largest scale of the movement is disclosed by their geometric proximity in the resulting dendrogram. Although all participants share roughly the same anatomy and performed the same movements, the structures of the calculated dendrograms can differ substantially, reflecting individual movement features and the level of movement expertise (Volchenkov and Bläsing, 2013).
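The construction of such a marker tree can be sketched as follows. The displacement vectors below are made up for illustration, and SciPy's agglomerative average linkage is used as a stand-in for neighbor joining proper (SciPy does not ship a neighbor-joining implementation), but the grouping of close markers is analogous:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist

# Hypothetical largest-scale vectors rho_k^(1) for N = 5 markers (rows);
# real values would come from the first SVD mode of the tracked data.
rho1 = np.array([
    [0.9, 0.1, 0.0],   # head
    [0.8, 0.2, 0.1],   # shoulder
    [0.1, 0.7, 0.3],   # hip
    [0.1, 0.8, 0.2],   # knee
    [0.0, 0.9, 0.3],   # foot
])

# Eq. (3): all-to-all Euclidean distances d(k1, k2) = ||rho_k1 - rho_k2||,
# in condensed (upper-triangle) form.
d = pdist(rho1, metric="euclidean")

# Average-linkage tree over the distance matrix; each row of Z records
# which two markers/blocks were joined and at what distance.
Z = linkage(d, method="average")
print(Z[:, :2])
```

With these example vectors, the two "arm-like" markers and the three "leg-like" markers fall into two clearly separated blocks, mirroring the upper/lower-body split discussed later in the text.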

# **REPRESENTING FUNCTIONAL ALIGNMENT OF MARKERS IN THE PIROUETTE EN DEHORS**

In **Figures 3**, **4**, we have shown the neighbor-joining dendrograms representing the functional alignment of markers in the *pirouette en dehors* performed by a professional ballet dancer and a novice, respectively. To visualize spatio-temporal relationships between markers on the major scale of recorded movements, we have used the *TreeView* software, which is freely available on the internet (see Page, 1996). We emphasize that the dendrogram shows the functional alignment of markers in the sequential phases of the movement, irrespective of the actual durations of these phases (see Volchenkov and Bläsing, 2013 for details). The *pirouette en dehors*, a controlled turn away from the supporting leg, is one of the most difficult of all ballet steps and can be executed with single or multiple rotations. The proper turning technique includes a periodic, rapid rotation of the head that serves to fix the dancer's gaze on a single spot, helping her to maintain control over the body (known as *spotting*). This rotational movement requires highly-defined coordination and constant adjustment of the body axis in order to be performed with the required stability and accuracy (Schack, 2001). The rhythmic structure of the *pirouette en dehors* is described by Tarassow (2005) as four measures in 2/4 time, the first two measures containing the preparation,

and the second two measures containing the turn and conclusion. According to this approach, the *pirouette en dehors* consists of two parts, the preparation and the actual turning movement. Both of these parts can be dissected further; the preparation can be broken down into two rhythmically-separated sections, whereas the turn segment consists of the actual turning movement and the opening to the front that concludes the turn. The dendrogram shown in **Figure 3** discloses the functional structure of the *pirouette en dehors* on the left leg, executed by an expert. On the largest scale of the movement, the pirouette starts (at the upper right corner of the dendrogram) with the function of body alignment by arranging the legs in the proper position: the right foot is placed in front of the left foot, both turned outward. The right foot slides to the side (*tendu*, or *dégagé*), which concludes the *body alignment* phase. The spring tension is built up for the turn during the *tension build-up* phase, as the right foot moves back and is placed behind the left one and the knees bend (*plié*). At the beginning of the *turn* phase, both legs push into the ground and the left (supporting) leg adopts the *pointe* or *demi-pointe* position (on the toes or on the ball of the foot, respectively), while the right knee is bent and the right foot is pulled up to the knee of the supporting leg. During the turn, the head whips around rapidly, which helps the dancer to maintain balance. Eventually, in the *landing* phase concluding the turn, the right foot is placed behind the left one, and the knees bend and stretch (*plié*). The arms open, and the arms and torso are used to cease rotation. It is important to mention that each functional phase elicited from the dendrogram shown in **Figure 3** can be ascribed directly to the functional phases of the *pirouette en dehors* as defined in Bläsing et al. 
(2009) via the BACs, the key points within the functional structure of the movement, which are stored in the long-term memory of a dancer.

In contrast to the movement sequence executed by the professional dancer, the movement of a novice performer inappropriately starts in both legs simultaneously, and turning starts prematurely, while the knees are still straightening (see **Figure 4**). In the turning phase, the movements are allocated to the superior iliac spine. The head apparently does not play a role until the movement ceases. Instead, vigorous hand movements play a major role in maintaining the body's rotation, which is a common mistake among beginners.

### **LINKING MENTAL AND KINEMATIC STRUCTURES**

Given that the STKD procedure identifies the underlying kinematic structure of movements (see **Figure 3**), we are able to examine whether the organization and structures found within the mental representations share common structural features with those found in the movement kinematics. To provide a first glimpse into this overlap, we examined the movement kinematics and mental representations of a golf swing in 9 participants (*M*<sub>age</sub> = 32.3, *SD*<sub>age</sub> = 10.6, 6 males) of varying skill levels (0–50 years of golf experience). Specifically, movement kinematics of a golf swing were captured using a 3-dimensional motion-tracking system (Vicon Motion Systems, Inc.) in which markers were placed on the anatomical landmarks of the body consistent with the standard Vicon full-body Plug-in Gait marker placement protocol. The subsequent marker trajectories for each trial were subjected to the STKD analysis (presented above) to produce a dendrogram of hierarchical couplings of movement trajectories, indicating the kinematic structure of the movement.

Likewise, the mental representation of the golf swing was procured for each participant through the SDA-M analysis (see Schack, 2004, 2012). The SDA-M identified the structural composition of mental representations by revealing the hierarchical and temporal structure of BACs within long-term memory. In order to assess the underlying mental representation on a level consistent with the kinematic structure revealed by the STKD approach (i.e., coupling between body segments), we utilized verbal labels indicating body parts as the representation units for the splitting task, the first step of the SDA-M. These verbal labels consisted of body parts relating to the tracked body segments in the motion-captured kinematic data<sup>1</sup>. Participants were instructed to assess the similarity between each of the body segments in terms of function and motion with respect to the golf swing. The resulting mental representation signified the functional relationship between body segments involved in the golf swing movement in long-term memory. The use of body parts as part of the SDA-M method has been successfully shown to distinguish between individuals with differing body representations in relation to a particular motor task (e.g., Bläsing et al., 2010).

<sup>1</sup>The body parts used as concepts within the SDA-M procedure, and subsequently tracked via motion capture, consisted of 1) head, 2) chest, 3) left shoulder, 4) left elbow, 5) left hand, 6) right shoulder, 7) right elbow, 8) right hand, 9) hips, 10) left thigh, 11) left knee, 12) left foot, 13) right thigh, 14) right knee, and 15) right foot.

The basis for the comparisons between the representational structure and the kinematic movement structure resides in the Euclidean distance matrices derived from both the STKD and SDA-M analyses. The distance matrix obtained from the SDA-M method comprises the Euclidean distances between concepts (body parts) as represented in feature space based on the results of the SDA-M splitting procedure. The distance matrix obtained from the STKD method contains the Euclidean distances between body markers within the major spatio-temporal scale of movement. From these two matrices, the hierarchical structure of the movements was derived and compared. More specifically, we computed both mean group dendrograms and individual dendrograms for both the mental representation and kinematic data via cluster analysis. Each cluster solution was established by determining a critical Euclidean distance (*d*<sub>crit</sub>). Nodes linked together above this critical value were considered unrelated, while BACs linked below this value were considered related. For all cluster analyses conducted, the critical value *d*<sub>crit</sub> = 5.64 was chosen, which reflects an alpha-level of α = 0.001.<sup>2</sup> Next, the invariance measure λ was calculated to determine the degree of similarity between two cluster solutions. According to Schack (2012), two cluster solutions are invariant (i.e., not significantly different) for λ > 0.68, while two cluster solutions are significantly variant for λ < 0.68. Additionally, the correlation between the distance matrices was examined as a measure of the overall relatedness of structural couplings of body parts on a mental and physical level.
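The thresholding step can be sketched as follows. All inputs below are illustrative stand-ins for the SDA-M and STKD feature-space data, the threshold is not the paper's alpha-derived 5.64, and a generic pair-agreement score replaces Schack's λ invariance measure, which is not reproduced here:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Illustrative stand-ins for the feature-space coordinates of 15 body
# segments under the SDA-M (mental) and STKD (kinematic) analyses.
rng = np.random.default_rng(1)
points_mental = rng.standard_normal((15, 3))
points_kinematic = points_mental + 0.1 * rng.standard_normal((15, 3))

d_crit = 2.0  # illustrative critical distance, not the paper's 5.64

def cluster_solution(points, d_crit):
    """Cut an average-linkage tree at d_crit: joins below it count as related."""
    Z = linkage(points, method="average", metric="euclidean")
    return fcluster(Z, t=d_crit, criterion="distance")

labels_mental = cluster_solution(points_mental, d_crit)
labels_kinematic = cluster_solution(points_kinematic, d_crit)

def pair_agreement(a, b):
    """Fraction of item pairs grouped the same way by two cluster solutions
    (a generic agreement score, not Schack's lambda invariance measure)."""
    same_a = np.equal.outer(a, a)
    same_b = np.equal.outer(b, b)
    iu = np.triu_indices(len(a), k=1)
    return float(np.mean(same_a[iu] == same_b[iu]))

print(pair_agreement(labels_mental, labels_kinematic))
```

The design choice mirrors the text: two solutions are compared not segment by segment but through which pairs of segments end up related below the critical distance.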

Results of our analyses indicated a high degree of consistency and similarity between the structure of mental representations and movement kinematics. Examination of group mean dendrograms for both the mental representation structure and the major movement kinematic structure revealed a significant similarity between the two structures (λ = 0.71; λ*crit* = 0.68; for more details, see Schack, 2012). That is, the structure revealed in the memory representation was statistically equivalent to the movement structure revealed by the STKD procedure (see **Figure 5**). Specifically, both the representation structure and kinematic structure displayed two distinct clusters representing the upper and lower body (*p* < 0.001). However, one difference emerged between the two structures in regard to the body segment "hips," such that in the mental representation, the hips were coupled with the functional relevance of the upper body, whereas the kinematic structure indicated that the hips were more closely coupled to the movement of the lower body. In this case, a mismatch exists between the group mental and group kinematic structure. Additionally, a comparison of the mental and kinematic structures on an individual level<sup>3</sup> revealed that 5 out of 9 individuals displayed significantly similar structures (*M*<sub>λ</sub> = 0.63, *SD* = 0.09; λ*crit* = 0.68, *n* = 9). **Figure 6** displays the similarity in the mental and kinematic structures for a single participant whose structures are statistically invariant (λ = 0.71; λ*crit* = 0.68). Interestingly, the degree of task experience was not related to the

degree of similarity between representation and kinematic structure in our current sample (*r* = −0.385, *p* = 0.31). However, future research is needed to determine why some individuals display stronger connections between mental and kinematic structures. To this extent, examination of differences between intact groups of novices and experts may provide further clarity into this connection.

In addition to the structural invariance measures based on the cluster solutions, significant correlations were evident between the mental and kinematic distance matrices. These correlations represent the degree of similarity between the coupling of body segments in feature space, as defined by the SDA-M and STKD procedures. Specifically, the group mean mental

<sup>2</sup>For more details on the SDA-M analysis, please see Schack (2012).

<sup>3</sup>For the individual comparisons, we utilized the SDA-M results and the STKD results from a single kinematic trial.

and kinematic distance matrices indicated a strong and positive correlation (*r* = 0.629, *p* < 0.001). As such, there is a close relationship between the relative body segments in both memory and physical execution. Likewise, significant correlations were evident on an individual level. For all participants, the correlations between the mental and kinematic distance matrices were significantly positive, with values ranging from *r* = 0.242 to *r* = 0.712. Notably, the correlations between mental and physical distance matrices remained remarkably stable over repeated trials within each individual, with an average correlation standard deviation of 0.02 across all participants.
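Such a matrix correlation is usually computed over the upper triangles only, since the zero diagonal and the duplicated symmetric entries would otherwise bias the estimate. A minimal Python sketch with synthetic matrices (the values are invented; real matrices would come from the SDA-M and STKD analyses):

```python
import numpy as np

# Two synthetic 15 x 15 Euclidean distance matrices standing in for the
# "mental" and "kinematic" matrices of 15 body segments.
rng = np.random.default_rng(2)
pts = rng.standard_normal((15, 3))
D_mental = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
D_kinematic = D_mental + 0.2 * np.abs(rng.standard_normal((15, 15)))
D_kinematic = (D_kinematic + D_kinematic.T) / 2  # keep it symmetric
np.fill_diagonal(D_kinematic, 0.0)

# Correlate only the strictly-upper-triangular entries: each segment pair
# enters the correlation exactly once.
iu = np.triu_indices(15, k=1)
r = np.corrcoef(D_mental[iu], D_kinematic[iu])[0, 1]
print(round(r, 3))
```

With the small perturbation used here the correlation comes out strongly positive, analogous to the group-level result reported in the text.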

Examining the scatter plot of the relationship between the mental and kinematic distance matrices also provides important insights into the link between representation and movement. Specifically, examination of outliers can be used as a diagnostic procedure to identify mismatches between an individual's movement representation and the physical execution of the movement. For instance, **Figure 7** displays the relationship between the mental and kinematic distance matrices for one participant. As can be seen, the points largely lie along the trend line, representing a strong relationship; however, point A represents a strong deviation from the predicted relationship. Examination of point A reveals that the left knee and left thigh were not similarly coupled in the execution of the movement like they were in the mental representation. This difference corresponds with the differences observed in the structural analyses as seen in **Figure 6** from the same participant. Such differences may suggest areas for targeted skill interventions by coaches, trainers, or physical therapists. Future research is needed to determine the qualitative impact on motor performance where such mismatches between mental and kinematic structure exist.
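This outlier screening can be sketched as a simple regression-residual check; the distance values below are invented for illustration, with one deliberately mismatched pair playing the role of "point A":

```python
import numpy as np

# Invented mental vs. kinematic distances for 7 body-segment pairs;
# pair 5 is deliberately mismatched (like "point A" in Figure 7).
mental = np.array([1.0, 1.2, 2.5, 3.1, 0.8, 2.0, 4.0])
kinematic = np.array([1.1, 1.3, 2.4, 3.0, 0.9, 4.5, 4.1])

# Fit the trend line between the two sets of distances.
slope, intercept = np.polyfit(mental, kinematic, deg=1)
residuals = kinematic - (slope * mental + intercept)

# Flag pairs deviating from the predicted relationship by more than
# two standard deviations of the residuals.
z = np.abs(residuals) / residuals.std()
outliers = np.flatnonzero(z > 2.0)
print(outliers)  # [5]
```

Flagged pairs point to body segments that are coupled differently in memory than in execution, the same diagnostic reading given to point A in the text.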


### **DISCUSSION OF FINDINGS**

The above findings further confirm the close link between mental representations and the execution of motor actions. These results build on previous findings of functional links between representational chunks and specific biomechanical parameters within a given movement (e.g., Schack, 2003; Schütz et al., 2009). Specifically, our findings suggest that mental representations not only play a key role in guiding key biomechanical parameters of a movement, but also guide the overall structure of the action. To this end, we observed a close relationship between the overall structure of both the mental representation and movement kinematics on both a group and individual level.

The close link between representation and kinematic structure is consistent with the Cognitive Architecture model proposed by Schack (2004), which suggests that motor skills are organized within hierarchical memory structures comprised of elementary components or transitional states of complex movements. These memory structures act as a cognitive reference for the unfolding of action, such that they serve to govern the tuning of motor commands and muscular activity patterns. As such, a tight correlation is predicted to exist between the structure of mental representation and the structure of movement.

As we show, the structures observed within the movement kinematics are reflected within the mental representations of the participants. Interestingly, the degree of association or invariance between the mental representation and the movement structure did not appear to be dependent on the level of skill of the individual. Although our sample consisted of relatively few participants, thus likely making any relationship difficult to observe, the lack of a moderating effect of skill level is not entirely surprising. While research has shown that experts have more elaborate and hierarchically-organized mental representations (e.g., Bläsing et al., 2009), the overall function of the representation is to guide the unfolding of motor patterns. As such, regardless of the quality of the representation, the movement unfolds in a manner that is consistent with the representation structure, and thus movements and representations would be clearly correlated across all skill levels. However, this assumption warrants future research to confirm or reject the influence of skill level and other factors on the degree of association between movement and representation structure.

Examining the structural relatedness between mental representations and movements has a number of practical applications. As has been demonstrated, investigating the mismatch between memory structures and kinematic synergies may be useful for diagnosing movement disorders or guiding training strategies. To this extent, investigation of mental representations via the SDA-M method has successfully been shown to identify representational problems within stroke patients who display movement deficiencies (Braun et al., 2007). Similarly, incongruence between mental and kinematic structures may be a key factor in determining how deficient mental representations are manifested within the overall motor production. Additionally, the degree of invariance between mental and kinematic structures may act as a useful benchmark for examining the efficacy of artificial cognitive systems within modern robotics. As robotic platforms make qualitative leaps in the areas of perceptual, cognitive, and motor capabilities, cognitive architectures designed to effectively integrate these functions


become essential. The field of cognitive robotics is increasingly turning to biological models of action organization to guide the development of efficient and natural cognitive control systems (e.g., Schack and Ritter, 2009, in press; Maycock et al., 2010). In this direction, first steps have already been taken in transferring higher-level cognitive representations derived from human data to robotic grasping applications (Maycock et al., 2010). Our proposed method may help to assess the extent to which an artificial cognitive system represents and guides complex actions with a level of cognitive sophistication similar to that of its biological counterparts.

# **SUMMARY**

The present article reviewed the substantial work on mental representations underlying complex action in humans. In doing so, we proposed a new experimental approach to capture the relationship between mental representation and the kinematic structure of movement. The STKD method presented in this paper allows for the segmentation of any recorded movement into a minimal number of independent spatio-temporal features.

This method has been found to effectively elicit the hierarchically-organized key kinematic elements of a movement in different spatio-temporal scales. Based on these analyses, we presented a first step toward linking the memory structure of a complex motor task to the unfolding movement dynamics. Results from our analyses indicated a clear structural relationship between the motor representation in long-term memory and the functional structure of movement kinematics. These findings support the theoretical perspective that complex actions are planned and performed with the help of structured cognitive representations in long-term memory that act to guide the biomechanical organization of movements (Hommel et al., 2001; Mechsner et al., 2001; Schack and Mechsner, 2006; Hoffmann et al., 2007). Implications of these findings are important for a number of movement related domains, including physical therapy, sports training, and artificial cognitive systems. While the present paper presents an important first step in the direction of linking cognitive and biomechanical structures, much work remains to extend the current findings to new task domains while also exploring the variables that moderate this relationship.







**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 31 May 2013; accepted: 30 August 2013; published online: 18 September 2013.*

*Citation: Land WM, Volchenkov D, Bläsing BE and Schack T (2013) From action representation to action execution: exploring the links between cognitive and biomechanical levels of motor control. Front. Comput. Neurosci. 7:127. doi: 10.3389/fncom.2013.00127*

*This article was submitted to the journal Frontiers in Computational Neuroscience.*

*Copyright © 2013 Land, Volchenkov, Bläsing and Schack. This is an openaccess article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*