Edited by: Georg Martius, Max Planck Institute for Intelligent Systems (MPG), Germany
Reviewed by: Kohei Nakajima, University of Tokyo, Japan; Tom Ziemke, University of Skövde & Linköping University, Sweden
Specialty section: This article was submitted to Computational Intelligence, a section of the journal Frontiers in Robotics and AI
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
The greater ubiquity of robots creates a need for generic guidelines for robot behavior. We focus less on
One of the trends of modern robotics is to extend the role of robots beyond being a specifically designed machine with a clearly defined functionality that operates according to a confined specification or safely separated from humans. Instead, robots increasingly share living and work spaces with humans and act as servants, companions, and co-workers. In the future, these robots will have to deal with increasingly complex and novel situations. Thus, their operation will need to be guided by some form of generic, higher-level instruction so that they can deal effectively with previously unknown and unplanned-for situations.
Once robotic control has to cope with more than replaying meticulously pre-arranged action sequences, or the execution of a predefined set of tasks as a reaction to a specified situation, the need arises for generic yet formalized guidelines which the robot can use to generate actions and preferences based on the current situation and the robot’s concrete embodiment.
We propose that any such guidelines should address the following three issues. First, “robot initiative”: we expect the principles to be generic enough for the robot to be able to apply them to novel situations. In particular, the robot should not only be able to respond according to predefined situations but also be able to
Science Fiction literature, an often productive vehicle to explore ideas about the future, has come across this problem in its countless speculations about the role of robots in society. Arguably, the best-known suggestion for generic rules for robot behavior is Asimov’s Three Laws of Robotics:
1. A robot may not injure a human being or, through inaction, allow a human being to come to harm.
2. A robot must obey the orders given to it by human beings, except where such orders would conflict with the First Law.
3. A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.
While there is ample room to discuss the technicalities and implications of the Three Laws (McCauley,
In part, this is based on fundamental AI problems, such as determining the scope and context pertinent to such rules by robots or AIs in general (Dennett,
Even if we were to somehow imbue a robot with a human-level understanding of human language, we would still face the more pragmatic problem that human language carries intrinsic ambiguity. One example of this is the legal domain, where humans will argue what exactly constitutes “harm” in legal cases, demonstrating that there is no unambiguous understanding of this term that could just be applied in a technical fashion. Relatively simple sentences, such as the amendments to the US Constitution, have spawned decades of interpretation. Also, several of Asimov’s stories illustrate how robots find loopholes in their interpretations of the Three Laws that defy human expectations. Because of all the previously mentioned problems, current-day robots are unable to generate actions or behavior complying with natural language directives, such as “protect human life” or “do no harm.” An even greater demand on natural language processing is posed by the Second Law, which requires the robot to be able to interpret any order. This requires a robust, unambiguous understanding of human language that cannot be realized even by humans.
In this paper, our aim is similar to the one Asimov had with his Three Laws; however, rather than exactly reproducing the Laws, we propose a formal, non-language-based method to capture the underlying properties of robots as tools. Instead of employing language, we suggest using the information-theoretic measure of empowerment (Klyubin et al.,
Centrally, we propose here that the empowerment formalism offers an operational and quantifiable route to technically realize some of the ideas behind the Three Laws in a generic fashion. To this end, we will first introduce the idea behind empowerment. We will then proceed to give both a formal definition and the different empowerment perspectives. We will then discuss how these different perspectives correspond to concepts, such as self-preservation, compliance, and safety. Finally, we will discuss extensions, challenges, and future work needed to fully realize this approach on actual robots.
To motivate empowerment and gain a better understanding, let us first take a brief look at the background, before moving on to the formal definition. Oesterreich (
In essence, empowerment formalizes a “motivation for effectance, personal causation, competence, and self-determination,” which is considered to be one area of intrinsic motivation by Oudeyer and Kaplan ( task-independent, computable from the agent’s perspective, directly applicable to many different sensorimotor configurations, without or with little external tuning, and sensitive to and reflective of different agent embodiments.
The task-independence demarcates this approach from most classical AI techniques, such as reinforcement learning (Sutton and Barto,
The computability from an agent’s perspective is an essential requirement. If some form of intrinsic motivation is to be realized by an organism or deployed onto an autonomous robot, then the organism/robot needs to be able to evaluate this measure from its own perspective, i.e., based on its own sensor input. This relies on a notion of
The next property of intrinsic motivation is the ability to cope with different, quite disparate sensorimotor configurations. This is highly desirable for the definition of general behavioral guidelines for robots. This means not having to define them separately for every robot or change them manually every time the robot’s morphology changes. The applicability to different sensorimotor configurations combined with the requirement of task-independence are the central requirements for such a principle to be universal. More precisely: to be universal, a driver for intrinsic motivation should ideally operate in essentially the same manner and arise from the same principles, regardless of the particular embodiment or particular situation. A measure of this kind can then identify “desirable” changes in both situation (e.g., in the context of behavior generation) and embodiment (e.g., in the context of development or evolution). For example, while most empowerment work focuses on state evaluation and action generation, some work also considers its use for sensor or actuator evolution (Klyubin et al.,
Furthermore, this implies that a measure for intrinsic motivation should not just remain generically computable, but also be sensitive to different morphologies. The challenge is to define a value function in such a way that it stays meaningful when the situation or the embodiment of the agent changes. Illustrative examples here are studies in which an agent had the ability to move and place blocks in a simulated world (Salge et al.,
To sum up this section, if we want to use an intrinsic motivation measure as a surrogate for what Pfeifer and Bongard (
To make our subsequent studies precise, we now give a formal definition of empowerment. Empowerment is formalized as the maximal potential causal flow (Ay and Polani,
To compute empowerment, we model the agent–world interaction as a perception–action loop as in Figure
The perception–action loop visualized as a Causal Bayesian network (Pearl,
For the computation of empowerment, we consider this perception–action loop as telling us how actions may potentially influence a state in the future, and by influence we emphatically mean not the actual outcome of the concrete trajectory that the agent takes, but rather the potential future outcomes at the given time horizon
Empowerment is then defined as the channel capacity between the agent’s actuators
Note that the maximization implies that it is calculated under the assumption that the controller which chooses the action sequences (
Furthermore, empowerment is a state-dependent quantity, as it depends on the state
Empowerment is defined for both discrete and continuous variables. However, while it is possible to directly determine the channel capacity for the discrete case using the Blahut–Arimoto Algorithm (Arimoto,
When talking about an empowerment-maximizing agent, it must be emphasized that the distribution
To illustrate, an agent might prefer a state where it would have the
In past work, two main strategies have been used. Greedy empowerment maximization considers all possible actions in the current state and computes the successor state (or distribution of successor states) for each of those actions. Then, empowerment is calculated for each of those successor states. The successor state with the highest empowerment is selected, and the agent performs the action leading to the chosen successor state. If an action leads to a distribution of successor states, then the action with the highest average successor state empowerment is selected. This has the advantage that the agent only needs to compute the empowerment for the immediate successor states in the future.
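The greedy strategy can be illustrated with a minimal sketch. The grid layout, the action set, and the function names below are hypothetical and chosen only for illustration. The sketch assumes fully deterministic dynamics, in which case the channel capacity reduces to the logarithm of the number of distinct states reachable within the horizon; it does not cover the stochastic case, where successor empowerment would have to be averaged.

```python
from itertools import product
from math import log2

# Hypothetical grid world with deterministic moves; '#' marks a wall.
GRID = [
    "#####",
    "#..##",
    "#...#",
    "#...#",
    "#####",
]
ACTIONS = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1), "stay": (0, 0)}


def step(state, action):
    """Deterministic transition: move unless the target cell is a wall."""
    row, col = state
    d_row, d_col = ACTIONS[action]
    new_row, new_col = row + d_row, col + d_col
    return (new_row, new_col) if GRID[new_row][new_col] != "#" else state


def empowerment(state, horizon):
    """n-step empowerment in bits, valid for deterministic dynamics only.

    Without noise, the capacity from action sequences to the final state
    equals log2 of the number of distinct reachable states.
    """
    reachable = set()
    for sequence in product(ACTIONS, repeat=horizon):
        current = state
        for action in sequence:
            current = step(current, action)
        reachable.add(current)
    return log2(len(reachable))


def greedy_action(state, horizon):
    """Select the action whose successor state has the highest empowerment."""
    return max(ACTIONS, key=lambda a: empowerment(step(state, a), horizon))
```

In this layout, an agent in the top-left corner cell `(1, 1)` with a horizon of one step would move south, since the cell below offers four reachable states while every other successor offers three.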
Alternatively, the agent could compute the empowerment values for each possible state of the world, or a subset thereof. The agent could then determine the state with the maximal empowerment and then plan a sequence of actions to get to this state (Leu et al.,
Here, however, we will, in accordance with the latter principle, generally present the computed empowerment values as an empowerment map, because it gives an overview of what behavior would be preferred in either case. The map visualizes both the local gradients and the optima. The behaviors resulting from these maps would then be either an agent that acts in order to climb up the local gradient or to reach a global optimum.
In this section, we outline how we can use the empowerment formalism to capture the essential aspects of the Three Laws. For this, we look at a system that contains both a human and a robotic agent. The Causal Bayesian Network from Figure
The time-unrolled perception–action loop with two agents visualized as a Bayesian Network.
In combination these three heuristics should provide an operationalized motivation for robots to act in a way that reflects the sentiment behind the Three Laws of Robotics. The core of this idea was initially suggested by Salge et al. (
Robot empowerment is the potential causal information flow from the robot’s actuators to the robot’s sensors at a later point in time. In this perspective, the human is simply included in the external part of the perception–action loop of the robot. From the robot’s perspective, all variables pertaining to the human are subsumed in
Typical empowerment-driven behavior can be seen, e.g., in control problems, where empowerment maximization balances a pendulum and a double pendulum, and also stabilizes a bicycle (Jung et al.,
More concretely relevant for the present argument is the work by Leu et al. (
Empowerment-driven robot follower as described in Leu et al. (
While the human-in-the-loop can be treated, from the robot’s perspective, as just another part of the environment, we can modify our question and ask how the maximization of robot empowerment will affect the human or what kind of interactive behavior will result from this? Consider Figure
Robot empowerment (in grayscale; dark: low, bright: high) dependent on the robot position. Obstacles are shown in blue. Two different behavior models for the human (yellow circle) are considered. If the robot is within the safety shutdown distance (red circle), it is not able to act. The red dots are the endpoints of the 2,000 random human action trajectories used to predict possible human actions; the two graphs differ in the assumed distribution of human movement in the next three steps. Comparing the empowerment close to the safety shutdown distance shows that the assumed human behavior influences the estimated empowerment while all other aspects of the simulation remain unchanged.
The effects of this can be seen in Figure
A reliably predictable human would also allow the robot to maintain a higher empowerment closer to the human if it stayed south-west of the human, basically opposite to the human’s predicted movement direction. In general, obtaining a better human model can increase the robot’s perceived model-based empowerment by reducing the human noise and by providing a better estimate of which states in close proximity to the human are less likely to become disempowering.
A related phenomenon was observed in a study by Guckelsberger et al. (
Better model acquisition is necessary for an agent to gauge whether or not to avoid humans, depending on their cooperation, unpredictability, or antagonism. But empowerment maximization does not, by itself, offer a direct incentive toward or away from human interaction. It is possible to imagine that the human could perform actions that would increase or preserve the robot’s or NPC’s empowerment, such as feeding or repairing/healing it. This would then likely create a drive toward the human, if this is modeled in viable timescales. In the absence of any specific possible benefits provided by the human, however, the agent driven by empowerment alone is not automatically drawn toward interacting with a human and is likely to drift away over time. This will be addressed in the other perspectives below.
In short, we propose to maximize robot empowerment to generate behavior by which the robot strives toward self-preservation. Specifically, becoming inoperational corresponds to vanishing empowerment. In turn, having high empowerment means that the robot has a high influence on the world it can perceive, implying a high readiness to respond to a variety of challenges that might emerge. We propose this principle, namely maximizing empowerment, as a generic and plausible proxy for producing behavior in the spirit of the Third Law, as it will cause the robot to strive away from states where it expects to be destroyed or inoperational, and to strive toward states where it achieves the maximum potential effect. Robot empowerment maximization thus acts to some extent as a surrogate for a drive toward self-preservation.
We now turn to human empowerment. Human empowerment is defined in analogy to robot empowerment as the potential causal flow from the human’s actuators to the human’s sensors. The robot is now part of the external component of the perception–action loop of the human. Maximizing, or at least preserving human empowerment has similar effects to the previous case: keeping it at a high value implies maintaining the human’s influence on the world and avoiding situations which would hinder or disable the human agent. A central difference to the previous case is that the human empowerment is made dependent on the
Figure
Human empowerment (in grayscale; dark: low, bright: high) dependent on the robot’s position. In this simulation, a laser (indicated by the red line) blocks the human’s movement, but the laser can be occluded by the robot body. Thus, human empowerment is the highest for robot positions toward the right wall where the robot blocks the laser and thereby allows the human a greater range of movement.
We can also see in Figure
For another illustrative example of the effect of human-empowerment-driven behavior, consider the NPC (non-player character; an autonomous, computer-controlled character in a video game) in the dungeon crawler game scenario from Guckelsberger and Salge (
Companion (C) and player (P), both purple, are threatened simultaneously by two enemies (E), both red. The images represent the successive moves: the companion escapes its own death, rescues the player, and finally defends itself. Left: combined heuristics for 2-step empowerment (see text), lighter colors indicating higher empowerment. Arrows indicate shooting. Figure taken from Guckelsberger and Salge (
In this section, we considered human empowerment maximization (in the variant of being influenced by the artificial agent rather than the human). Using this variant as a driver leads to a number of desirable behaviors. It prevents the robot from obstructing the human, for example, by getting too close or by interfering with the human’s actions. Both would be noticeable in the human empowerment value, because they would either constrain accessibility to states around the human, or inject noise into the human’s perception–action loop. In addition, the robot acts so as to enhance or maintain the human’s empowerment through “proactive”-appearing activities, represented in the above examples by removing a barrier from the environment or by neutralizing a threat that would destroy or maim the human or even just impede their freedom of movement. In this sense, human empowerment maximization can be plausibly interpreted as a driver for the agent toward protecting the human and supporting their agenda.
A number of caveats remain: to compute the human empowerment value, the robot not only needs a sufficient forward model but also needs to be able to identify the human agent in the environment, their possible actions, and how the human perceives the world via their sensors and what they are able to do with their actuators. This is not a trivial problem; however, it nevertheless has the advantage that it offers, in some ways, a “portable” and operational modeling route. While it depends on a sufficiently reliable algorithm for detecting humans and plausible, if strongly abstracted, models for human perception and actuation, once these are provided, the principle is applicable to a wide range of scenarios. The present proposal suggests possible routes toward an operational implementation of a “do not cause harm to a human” and a “do not permit harm to be caused to a human” principle, provided one can endow the artificial agent with what could loosely be termed a “proto-empathetic” perspective of the human’s situation.
Another critical limitation to the applicability of the formalism is the time horizon, which is the central free parameter in the empowerment computation. While a robot driven by human empowerment maximization might stop a bullet, or a fall into a pit, it would need to extend its time horizon massively to account for things that would be undesirable for the human in the short term, but are advantageous in the long run. To illustrate, consider the analogy from human lawmaking, where freedom to act on the short scale is curtailed in an effort to limit long-term damage (e.g., in environmental policies). The principle as discussed in this section is, therefore, best suited for interactions that have to avoid obstruction or interference by a robot in the short term and with immediate consequences. That being said, nothing in the formalism prevents one, in principle, from accounting for long-term effects. Doing so in practice will require extending the methods to deal with longer time-scales and levels of hierarchies.
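The role of the time horizon can be made concrete with a small sketch. The corridor world below, with an absorbing pit at one end, is a hypothetical example and not one of the simulations discussed in this paper. It again assumes deterministic dynamics, so that empowerment equals the logarithm of the number of distinct reachable states.

```python
from itertools import product
from math import log2

# Hypothetical corridor with cells 0..9; cell 0 is a pit that cannot be left.
PIT, RIGHT_WALL = 0, 9
MOVES = (-1, 0, +1)


def step(position, move):
    """Deterministic transition; the pit is absorbing, the right end is a wall."""
    if position == PIT:
        return PIT
    return min(max(position + move, PIT), RIGHT_WALL)


def empowerment(position, horizon):
    """n-step empowerment in bits under deterministic dynamics."""
    finals = set()
    for sequence in product(MOVES, repeat=horizon):
        current = position
        for move in sequence:
            current = step(current, move)
        finals.add(current)
    return log2(len(finals))
```

With a horizon of one step, the cell next to the pit has the same empowerment as a cell in the middle of the corridor, so the danger is invisible to the measure. With a horizon of two steps, the cell next to the pit reaches four final states instead of five, and only then does the measure favor moving away. Effects that lie further in the future than the horizon remain unaccounted for in the same way.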
As the third and last variant, we consider transfer empowerment. Transfer empowerment is defined as the potential causal information flow from the actions of
Figure
A visualization of the human-to-robot (HtR) transfer empowerment dependent on robot position (dark—low, light—high). This simulation shows a slightly elevated human-to-robot transfer empowerment around the human agent (yellow) at the shutdown distance (red circle). This is because the human can move toward or away from the robot, thereby having the potential to stop the robot. This creates a potential causal flow from the human’s actions to the robot’s sensors, which in this case measure the robot position. Here, the physical proximity of the human allows it to directly influence the robot’s state.
This effect can alternatively be obtained by the analogous robot-to-human transfer empowerment; this was demonstrated in Guckelsberger and Salge (
Two room scenario from Guckelsberger and Salge (
So, in regard to creating player-following behavior, both directions of transfer empowerment seem to be suitable. But, looking at the causal Bayesian network representation in Figure
The time-unrolled perception–action loop of two agents, colored to visualize the different pathways potential causal flow can be realized. The red dashed line indicate the causal flow from the human actuator
A visualization of the human-to-robot (HtR) transfer empowerment dependent on robot position (dark—low, light—high). In this scenario, the robot will mirror the human’s movement if it has a direct line of sight. This creates a high amount of potential causal flow through the robot where the latter sees the human. It results in comparatively high transfer empowerment in those areas where the robot has both a direct line of sight to the human (yellow) and is not shut down by close proximity to the human. The highest transfer empowerment is attained at a distance, but in an area where the robot can see and react to the human; with this, the robot provides the human with operational proximity, i.e., the ability to influence the robot’s resulting state.
The distinction between the two pathways for transfer empowerment, directly through the environment and through the internal part of the other agent’s perception–action loop, also provides us with reasons to prefer human-to-robot transfer empowerment over transfer empowerment in the other direction. In human-to-robot transfer empowerment, the internal pathway is through the agent; so the robot can consider adjusting its behavior, i.e., the way it responds with actions to sensor inputs, in order to increase the transfer empowerment. In the robot-to-human transfer empowerment, the internal pathway is through the human, which the robot cannot optimize and the human should ideally not be burdened with optimizing. So, if one seeks to elicit a reliable reaction of the robot to the human’s action, then human-to-robot transfer empowerment should be the quantity to optimize.
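The dependence of human-to-robot transfer empowerment on the robot's own behavior can be sketched as follows. The one-dimensional world, the sight range, and both robot policies are hypothetical and serve only as an illustration. The sketch assumes deterministic dynamics and a robot sensor that captures only the robot position, so that the capacity of the channel from human actions to robot sensor equals the logarithm of the number of distinct robot positions the human can induce.

```python
from itertools import product
from math import log2

HUMAN_MOVES = (-1, 0, +1)
SIGHT = 3  # hypothetical range within which the robot perceives the human


def ignore_policy(seen_move):
    """Robot controller that never reacts to the human."""
    return 0


def mirror_policy(seen_move):
    """Robot controller that copies the human movement it perceives."""
    return seen_move


def transfer_empowerment(human, robot, policy, horizon):
    """Human-to-robot transfer empowerment in bits (deterministic case).

    Counts the distinct robot positions that human action sequences induce.
    """
    finals = set()
    for sequence in product(HUMAN_MOVES, repeat=horizon):
        h, r = human, robot
        for move in sequence:
            # The robot only perceives the move if the human is in sight.
            seen = move if abs(h - r) <= SIGHT else 0
            h, r = h + move, r + policy(seen)
        finals.add(r)
    return log2(len(finals))
```

In this sketch, a robot that ignores the human yields zero transfer empowerment everywhere, while the mirroring robot yields a positive value wherever the human is within sight. The only thing that changed is the robot's internal mapping from sensor input to action, which is the pathway the robot itself can optimize.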
Another difference between the two directions of transfer empowerment becomes evident when we compare Figures
The different scenarios we looked at also illustrate the idea of “operational proximity” that transfer empowerment captures. The influence of one agent on another does not necessarily depend on physical proximity, but rather on both agents’ embodiment, here in the form of their actions and sensor perceptions. While in one scenario the human could stop the robot by physical proximity, in the other they could direct the robot along their line of sight. In the dungeon example, the companion needs proximity to directly affect the player by blocking or shooting it, but one could also instead imagine a situation where the NPC would push a button far away or block a laser somewhere else to affect the environment of the player. Maximizing transfer empowerment tries to attain this operational rather than physical proximity. In turn, operational proximity acts as a necessary precondition for any interaction and coordination between the agents. To interact, one agent has to be able to perceive the changes of the world induced by the other agent. Vanishing transfer empowerment would mean that not even this basic level of interaction is possible.
Furthermore, HtR transfer empowerment maximization also creates an incentive to reliably react to the human actions. In detail, this means increasing the transfer empowerment further by allowing for some potential causal flow through the
Summarizing the section, while the maximization of transfer empowerment does not precisely capture the Second Law, it creates operational proximity between the human and the robot, and thereby the basis for further interaction; together with the enhancement of human empowerment, it sets the foundation for the human to have the maximum amount of options available. Furthermore, if the robot behavior itself is included in the computation of transfer empowerment to be optimized, then this would provide an additional route to amplify the human’s actions in the world, namely by virtue of manipulating the robot via actions and proto-gestures which make use of an implicitly learnt understanding of the internal control of the robot.
The core aim of this article was to suggest three empowerment perspectives and to propose that these allow—in principle—for a formalization and operationalization of ideas roughly corresponding to the Three Laws of Robotics (not in order): the self-preservation of a robot, the protection of the robot’s human partner, and the robot supporting/expanding the human’s operational capabilities. Empowerment endows a state space cum transition dynamic with a generic pseudo-utility function that serves as a rich preference landscape without requiring an explicit, externally defined reward structure; on the other hand, where desired, it can be combined with explicit task-dependent rewards. Empowerment can be used as both a generic and intrinsic value function. It serves not only as a warning indicator that one is approaching the boundaries of the viability domain (i.e., being close to areas of imminent breakdown/destruction) but also imbues the interior of the viability domain with additional preference structure in advance of any task-specific utility. This, together with the properties outlined in Section
For the practical application of the presented formalism to real and more complex robot–human interaction scenarios, a number of issues, such as computability and model acquisition, still need to be addressed. The following discussion outlines suggestions on how some of these challenges can be overcome and the problems still to be solved to deploy the presented heuristics in real-world scenarios.
Extending the idea presented here from simple abstract models into the domain of practical robotics immediately raises the questions of computability. In the classical empowerment formalism, computation time scales dramatically with an increase in sensor and actuator states and with an extension of the temporal horizon. In discrete domains, previous work has demonstrated a number of ways as to how to speed up the computation: one being the
A simple alternative option is to just sample a subset of all action sequences and compute empowerment based on this sample to get a heuristic estimate for the actual value (Salge et al.,
Earlier, we mentioned a fast approximation method for continuous variables (Salge et al.,
With the establishment of empowerment as a viable and useful intrinsic motivation driver for artificial agents in a battery of proof-of-principle scenarios over the last years, it has become evident that it is well warranted to invest effort into improving and speeding up empowerment computation for realistically sized scenarios. The previous list of methods shows that a promising range of approaches to speed up empowerment computation already exists and that such approximations may well be viable. We have, therefore, grounds to believe that future work will find even better ways to scale empowerment computation up, thus rendering it more suitable to deployment on practically relevant robotic systems.
Traditional methods for empowerment calculation crucially require the agent to have an interventional forward model (Salge et al.,
First, one would learn the local causal dynamics of agent–world interaction; from this one can then compute empowerment which, as a second step, provides an intrinsic reward or pseudo-utility function which is associated with the different world states distinguishable to the agent. As an example, previous work in the continuous domain demonstrated how Gaussian Process learners (Rasmussen and Williams,
Whether the forward model is prespecified or learnt during the run, empowerment will generally drive the agent toward states with more options. However, if trained during the run of the agent, the model will in general also include uncertainty on the outcome of actions. Such uncertainty “devalues” any options available in this state and will lead to a reduction of empowerment. This reduction is irrespective of whether the uncertainty is due to “objective” noise in the environment and unpredictability (e.g., due to another agent) or due to internal model errors stemming from insufficient training. From the point of view of empowerment both effects are equivalent.
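The devaluing effect of uncertainty can be checked numerically with the Blahut–Arimoto algorithm, which computes the channel capacity in the discrete case. The following is a minimal sketch in plain Python; the function names and the noise model (with probability `eps` the outcome of an action is replaced by a uniformly random sensor state) are assumptions made for illustration, not a model taken from the cited studies.

```python
from math import exp, log


def blahut_arimoto(channel, tol=1e-9, max_iter=10_000):
    """Capacity in bits of a discrete channel given as rows p(sensor | action)."""
    n_in, n_out = len(channel), len(channel[0])
    p = [1.0 / n_in] * n_in  # start from a uniform action distribution
    lower = 0.0
    for _ in range(max_iter):
        # Sensor distribution induced by the current action distribution.
        q = [sum(p[a] * channel[a][s] for a in range(n_in)) for s in range(n_out)]
        # exp of the KL divergence between each row and the induced distribution.
        c = [
            exp(sum(w * log(w / q[s]) for s, w in enumerate(row) if w > 0))
            for row in channel
        ]
        total = sum(pa * ca for pa, ca in zip(p, c))
        lower, upper = log(total), log(max(c))  # bounds on the capacity (nats)
        p = [pa * ca / total for pa, ca in zip(p, c)]
        if upper - lower < tol:
            break
    return lower / log(2)


def noisy_channel(n, eps):
    """Hypothetical model: n actions, n sensor states, uniform noise level eps."""
    return [[(1 - eps) * (a == s) + eps / n for s in range(n)] for a in range(n)]
```

For four actions, `blahut_arimoto(noisy_channel(4, 0.0))` gives the full two bits, the value drops as `eps` grows, and it vanishes for `eps = 1.0`. The computation has no way of telling whether `eps` reflects genuine environmental noise or an insufficiently trained model, which is the equivalence noted above.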
The first class of uncertainty (environmental uncertainty) will tend to drive the agent away from noisy or unpredictable areas to comparatively more predictable ones if the available options are otherwise equivalent, or reduce the value of richer option sets when they can be only unpredictably invoked. The second class of uncertainty (model uncertainty) will—in the initial phase of the training—cause empowerment to devalue states where the model cannot resolve the available options. This has consequences for a purely empowerment-driven exploration of rich, but non-obvious interaction patterns; a prominent candidate for such a scenario would be learning the behavior of other agents, as long as they are comparatively reliable.
The intertwining of learning and empowerment-driven behavior can thus be expected to produce a number of meta-effects on top of the already discussed dynamics. For instance, such agents could exhibit a very specific type of exploratory behavior; moreover, they might initially be averse to encountering complex novel dynamics and other agents. On the other hand, by modulating the learning process and experience of an agent as well as its sensory resolution or “scope of attention” depending on the situation, one could guide the agent toward developing the desired sensitivities.
Such a process would, in a way, be reminiscent of the socialization of animals and humans to ensure that they develop an appropriate sense for the social dynamics of the world they live in. This, together with the earlier discussion in this paper, invites the hypothesis that, to be confident of the safety of an autonomous robot the following is essential: not only does this machine need to be “other-aware,” but if that “other-awareness” is to be learnt while enjoying to a large extent the level of autonomy that an intrinsic motivation model provides, it will be essential for the machine to undergo a suitably organized socialization process.
As a technical note, we remark that the type of model required for empowerment computation only needs to relate the actions and current sensor states to the expected subsequent sensor states and does not require the complete world mechanics. In fact, empowerment can be based on the general, but purely intrinsic
One central property of empowerment is that the formalism remains practically unchanged for different incarnations of robots or agents and has very few parameters; the time horizon being the only one for discrete empowerment. However, there is one, less obvious, “parameter” which has only been briefly discussed in the previous literature (Salge et al.,
By considering only certain sensor variables, one can reduce the state space and speed up computation immensely. However, one also influences the outcome of the computation by basically assigning which distinctions between states of the world should be considered relevant. An agent with positional sensors will only care about mobility, while an agent with visual sensors might also care about being in a state with different reachable views. In the simplest models, we often just assume that the agent perceives the whole world. In biological examples, we can lean on the idea of evolutionary adaptation (i.e., Jeffery (
This becomes even more of an issue when considering transfer empowerment, which is basically an example of using partial sensor selection to focus on relevant properties. If both the human and the robot could fully sense the environment, then the human empowerment and the human-to-robot transfer empowerment would be identical, as both the human and the robot sensors would capture exactly the same information. In the continuous 2D example presented in this paper, we considered the human’s sensors to only capture the human’s position, and the robot sensors to only capture the robot’s position, so their perceptions would be distinct. In the NPC AI example by Guckelsberger and Salge (
If we extended the human’s sensors to the extent that the human could sense at least everything that the robot can sense, then the sensors relevant for transfer empowerment would be a subset of those relevant for human empowerment. We could then just compute human empowerment and capture both potential causal flows at the same time. But by splitting the sensor variables, we basically compute partial sensor empowerment, once for the sensors pertaining to the human state and once with a selection of sensors pertaining to the robot state. In a real-world scenario, the embodied perspectives of the human and the robot usually lead to different perceptions of the world. Both have limited sensors and, as a result, there is a natural distinction between their respective sensor inputs. When using a simulated environment, on the other hand, it is often easy to give all agents access to the whole world state. In this case, it becomes necessary to limit the sensors of the different agents to introduce this split before one is able to differentiate between the two heuristics. This separation of human and human-to-robot empowerment allows for the prioritization of human empowerment over transfer empowerment. In general, we would expect one bit of human empowerment to be more valuable than one bit of human-to-robot empowerment and, therefore, aim to retain this distinction.
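The effect of sensor selection on the resulting value can be illustrated with a minimal sketch. It is not taken from the simulations in this paper: the 5 × 5 grid, the lamp variable, the action names, and the names `SIZE`, `step`, `SENSORS`, and `empowerment` are all hypothetical. The sketch further assumes fully deterministic dynamics, in which case the channel capacity reduces to the logarithm of the number of distinct sensor readings reachable by fixed-length action sequences.

```python
from itertools import product
from math import log2

SIZE = 5  # hypothetical 5x5 grid world (assumption, not from the paper)
ACTIONS = ["north", "south", "east", "west", "stay", "toggle"]


def step(state, action):
    """Deterministic toy dynamics: move within the grid or toggle a lamp."""
    x, y, lamp = state
    if action == "north":
        y = min(y + 1, SIZE - 1)
    elif action == "south":
        y = max(y - 1, 0)
    elif action == "east":
        x = min(x + 1, SIZE - 1)
    elif action == "west":
        x = max(x - 1, 0)
    elif action == "toggle":
        lamp = 1 - lamp
    return (x, y, lamp)


# Each sensor selection decides which distinctions between world states count.
SENSORS = {
    "full": lambda s: s,                  # agent perceives the whole world
    "position": lambda s: (s[0], s[1]),   # positional sensor only
    "lamp": lambda s: (s[2],),            # lamp sensor only
}


def empowerment(state, horizon, sensor):
    """n-step empowerment in bits for deterministic dynamics.

    Every fixed-length action sequence is executed without reacting to
    intermediate states; the capacity of a deterministic channel is
    log2 of the number of distinct final sensor readings.
    """
    readings = set()
    for sequence in product(ACTIONS, repeat=horizon):
        s = state
        for action in sequence:
            s = step(s, action)
        readings.add(sensor(s))
    return log2(len(readings))


if __name__ == "__main__":
    for name, sensor in SENSORS.items():
        print(name, round(empowerment((2, 2, 0), 2, sensor), 2))
```

In this toy world, an agent at the center of the grid with a horizon of two steps can reach 13 distinct positions (about 3.70 bits), 2 lamp states (1 bit), and 18 distinct full states (about 4.17 bits). The same world and the same action set thus yield different empowerment values depending solely on which sensor variables are considered.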
Whenever different variants of some evaluation function exist which cover different aspects of a phenomenon and these aspects are combined, one is left with the question of how to weigh them against each other. In the present case, this would mean balancing the three types of empowerment. The analogy with the Three Laws might suggest a clear hierarchy, where one would first maximize human empowerment and only then consider the other heuristics. However, given that one can always expect some minimal non-trivial gradient to exist in the first, such a lexicographic ordering would basically lead to maximizing human empowerment above all else, completely overriding the other measures.
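A small sketch may make the overriding effect of a strict hierarchy concrete. The candidate actions and their empowerment values below are invented purely for illustration, and `lexicographic_choice` is a hypothetical helper; the sketch relies only on the fact that Python compares tuples lexicographically.

```python
# Hypothetical per-action values in bits, ordered as
# (human empowerment, transfer empowerment, robot empowerment).
# The numbers are made up for illustration only.
candidates = {
    "action_a": (2.001, 0.0, 0.0),
    "action_b": (2.000, 3.0, 3.0),
}


def lexicographic_choice(scores):
    """Pick the action whose score tuple is largest in lexicographic order.

    Later entries matter only if all earlier entries are exactly equal.
    """
    return max(scores, key=lambda name: scores[name])


if __name__ == "__main__":
    print(lexicographic_choice(candidates))  # action_a
```

Here `action_a` is selected because of a gain of 0.001 bits in human empowerment, even though `action_b` offers three additional bits in each of the other two measures. The lower-priority measures only ever act as tie-breakers.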
On the other hand, going back to Figure
This conceptual tension is also present in the original Three Laws. Consider a gedankenexperiment where a robot is faced with two options: (A) inflicting minor harm, such as a scratch, on a single human or (B), by avoiding that, permitting the destruction of all (perfectly peaceful) robots on earth. In a strict interpretation, the Laws would dictate choosing option (B), but we might be inclined to consider that there must be some amount of harm so negligible that (A) would seem a better option than causing a regression to a robotic “Stone Age.”
But how could we capture this insight with our three previously developed heuristics? We can, of course, consider a straightforward weighted sum of the three heuristics, defining some trade-off between the three values in the usual manner. But this approach inevitably raises the question whether there would be some distinct non-arbitrary trade-off.
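The weighted-sum variant can be sketched in the same spirit. The scores, the weight vectors, and the helper names `weighted_score` and `weighted_choice` are assumptions made for illustration and do not correspond to any values used in this paper.

```python
# Hypothetical per-action values in bits, ordered as
# (human empowerment, transfer empowerment, robot empowerment).
scores = {
    "protect_human": (2.001, 0.0, 0.0),
    "preserve_robots": (2.000, 3.0, 3.0),
}


def weighted_score(values, weights):
    """Linear trade-off: sum of the three heuristics scaled by their weights."""
    return sum(w * v for w, v in zip(weights, values))


def weighted_choice(candidates, weights):
    """Pick the action with the highest weighted sum."""
    return max(candidates, key=lambda name: weighted_score(candidates[name], weights))


if __name__ == "__main__":
    # Two arbitrary weightings lead to opposite decisions.
    for weights in [(1.0, 0.5, 0.25), (1.0, 0.0001, 0.0001)]:
        print(weights, weighted_choice(scores, weights))
```

With weights (1, 0.5, 0.25) the second action wins (4.25 against 2.001), whereas with weights (1, 0.0001, 0.0001) the first action wins (2.001 against 2.0006). The decision therefore depends entirely on the chosen weights, which is exactly why a non-arbitrary trade-off would be desirable.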
The previous analogy is instructive. The problem is that option (B), the destruction of all robots, would create many more significant problems further down the line than the single scratch of option (A). It would result in the loss of all robots able to carry out human commands, and there would be fewer robots to protect and save humans in the future (they would have to be rebuilt, absorbing significant productive work, before reaching original levels, if at all).
Both of these problems are reflected in human empowerment on longer timescales. In fact, we would suggest that all three heuristics, and also the original Three Laws, reflect the idea that one core reason why humans build and program robots is to increase their very own empowerment, their very own options for the future. We already argued that transfer empowerment, the second heuristic, extends human empowerment further into the world, because the robot amplifies the human’s actions and their impact on the world. Similarly, robots that preserve themselves, for instance by following the third heuristic, make sure that they preserve or extend the human’s empowerment further.
The first heuristic is already directly about human empowerment maximization itself. So, in essence, the two other heuristics, robot empowerment and transfer empowerment, can be seen as a form of meta-heuristics for ultimate human empowerment maximization. We conjecture that the behaviors of both the second and the third heuristic might emerge once one maximizes human empowerment with a sufficiently long temporal horizon. For example, the robot could realize, with a good enough model of the future, that it needs to keep itself functional in order to prevent harm to the human in the future. So, basically, we hypothesize that the Second and Third Law might manifest themselves as a short-term proxy for a suitable longer-term optimization of human empowerment. If so, this may help define a natural trade-off: in our example, the robot might calculate that, by preventing the destruction of all robots on earth at the cost of inflicting a small scratch on a human, it would prevent many more and worse injuries to humans in the future, thanks to the robots thus rescued.
One final remark: in this paper we have not considered true multi-agent empowerment. The reason for this is subtle: empowerment so far is usually computed as an open-loop channel capacity. The future (potential) action sequences considered for empowerment are basically executed without reacting to the changing sensor states inside the time horizon. In other words, empowerment is computed as the channel capacity between fixed-length “open-loop” action sequences and the future sensor observation. In choosing these action sequences, intermediate sensor observations
This makes it impossible to formally account for instantaneously reacting to another agent’s actions during the computation of the potential futures, since this model selects actions only at the beginning and then only evaluates how they will affect the world at the end of the action run. However, we showed earlier that transfer empowerment can be massively enhanced by reacting to the human’s actions. This strongly indicates that it would be important to model empowerment with “reactive” action sequences, i.e., empowerment where action sequences are expressed in closed-loop form and which instantaneously react to other agents (or even changes in the environment) while still
These various aspects of the implementation of empowerment indicate a number of strategies to render it a useful tool to operationalize the Three Laws in a transparent way. Furthermore, they may also offer a pathway demonstrating how other classes of intrinsic motivation measures might be adapted to fuse the desirable autonomy of robots with the requirements of the Three Laws.
CS and DP developed the original idea. CS wrote and ran the simulations and drafted the paper. DP advised during the development of the simulations and co-wrote the paper.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.