The expansive research field of computational intelligence combines various nature-inspired computational methodologies and draws on rigorous quantitative approaches across computer science, mathematics, physics, and life sciences. Some of its research topics, such as artificial neural networks, fuzzy logic, evolutionary computation, and swarm intelligence, are traditional to computational intelligence. Other areas have established their relevance to the field fairly recently: embodied intelligence (Pfeifer and Bongard, 2006; Der, 2014), information theory of cognitive systems (Lungarella and Sporns, 2006; Polani et al., 2007; Ay et al., 2008), guided self-organization (Prokopenko, 2009; Der and Martius, 2012), and evolutionary game theory (Vincent and Brown, 2005).

The phenomenon of intelligence continues to fascinate scientists and engineers, remaining an elusive moving target. Following numerous past observations (e.g., Hofstadter, 1985, p. 585), it can be pointed out that several attempts to construct “artificial intelligence” have turned to designing programs with discriminative power. These programs would allow computers to discern between the meaningful and the meaningless in much the same way as humans do. Interestingly, as noted by de Looze (2006) among others, such discrimination is rooted in the etymology of “intellect”, derived from the Latin “*intellego*” (inter-lego): to choose between, or to perceive/read (a core message) between (alternatives). In terms of computational intelligence, the ability to read between the lines, extracting some new essence, corresponds to mechanisms capable of generating computational novelty and choice, coupled with active perception, learning, prediction, and post-diction. When a robot demonstrates stable control in the presence of *a priori* unknown environmental perturbations, it exhibits intelligence. When a software agent generates and learns new behaviors in a self-organizing rather than a predefined way, it seems to be curiosity-driven. When an algorithm rapidly solves a hard computational problem by efficiently exploring its search space, it appears intelligent.

In short, innovation and creativity shown within a rich space shaped by diverse “entropic” forces appeal to us as cognitive traits (Wissner-Gross and Freer, 2013). Can this intuition be formalized within rigorous and generic computational frameworks? What are the crucial obstacles on such a path?

Intuitively, *intelligent behavior is expected to be predictable and stable, but sensitive to change*. Attempts to formalize this duality date back at least to cybernetics. For example, Ashby’s well-known Law of Requisite Variety states that, for a controlled system to be stable, an active controller requires at least as much variety (number of states) as the system itself (Ashby, 1956). In order to explain the generation of behavior and learning in machines and living systems, Ashby also linked the concepts of ultrastability and homeostatic adaptation (Di Paolo, 2000; Fernández et al., 2014). The balance between robustness and adaptivity is often attained near “the edge of chaos” (Langton, 1990), and the corresponding phase transitions are typically detected via high sensitivities to underlying control parameters (thermodynamic variables) (Prokopenko et al., 2011). Stability in self-organizing systems can be generally related to negentropy, the entropy that the system exports (dissipates) to keep its own entropy low (Schrödinger, 1944). Despite significant advances in this direction, the fundamental question of whether stability, within processes developing far from equilibrium, necessitates specific entropy dynamics is still unanswered. Clarifying the connections between entropy dynamics and stable but adaptive behavior is one of the grand challenges for computational intelligence. Put simply, we need to know whether learning and self-organization necessitate phase transitions in certain spaces, in terms of some order parameters. Is it possible to characterize the richness of self-generated choice, intrinsic to intelligent behavior, with respect to generic thermodynamic principles?
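The counting argument behind the Law of Requisite Variety can be illustrated with a toy regulation game. The sketch below is only an illustration under stated assumptions, not Ashby's own formulation: the outcome table `(d - r) mod n`, the function name `min_outcome_variety`, and the brute-force search over regulator strategies are hypothetical choices made here. For this particular table, the smallest achievable number of distinct outcomes equals the number of disturbances divided by the number of responses, rounded up, so a regulator with too little variety cannot hold the outcome constant.

```python
from itertools import product
from math import ceil


def min_outcome_variety(n_disturbances: int, n_responses: int) -> int:
    """Smallest number of distinct outcomes any regulator strategy can achieve.

    Assumed toy outcome table: outcome = (d - r) mod n_disturbances,
    where d is the disturbance and r is the regulator's response.
    """
    best = n_disturbances
    # A strategy assigns exactly one response to each disturbance.
    for strategy in product(range(n_responses), repeat=n_disturbances):
        outcomes = {(d - r) % n_disturbances for d, r in enumerate(strategy)}
        best = min(best, len(outcomes))
    return best


if __name__ == "__main__":
    n = 6
    for k in (1, 2, 3, 6):
        # Columns: regulator variety, best achievable outcome variety, bound ceil(n / k)
        print(k, min_outcome_variety(n, k), ceil(n / k))
```

With six disturbances, only a regulator that itself has six responses can reduce the outcome variety to a single state in this toy setting.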

The notion of generating and actively exploiting new behaviors that adequately match the environment highlights that *to be intelligent is to be complex in creating innovations*. A mechanism producing computational novelty therefore needs to exceed some threshold of complexity. To be truly impressive in generating endogenous innovation, it needs to be capable of universal computation, or to approach this capability in finite implementations (Casti, 1994; Markose, 2004). In other words, computational novelty may be fundamentally related to undecidability. Again, serious advances have been made in this foundational area of computer science. For example, Casti (1991) analyzed deeper interconnections between dynamical systems, Turing machines, and formal logic systems: in particular, the complex, class IV, cellular automata were related to formal systems with undecidable statements (Gödel’s incompleteness theorem) and the Halting Problem. Nevertheless, the question of whether universal computation is the ultimate innovation-generator is still unresolved, offering another grand challenge: how is computational intelligence, including mechanisms producing richness of choice and novelty, related to undecidability? In an abstract sense, we need to know what the theoretical limits to computational cognition are.
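As a concrete instance of a complex (class IV) cellular automaton, one may consider elementary Rule 110, which is known to support universal computation. The following minimal sketch simulates it on a ring of cells. The choice of Rule 110, the periodic boundary, the single-seed initial condition, and the names `step` and `run` are assumptions made here for illustration and are not taken from the works cited above.

```python
def step(cells, rule=110):
    """Apply one synchronous update of an elementary cellular automaton on a ring."""
    n = len(cells)
    new_cells = []
    for i in range(n):
        left, center, right = cells[(i - 1) % n], cells[i], cells[(i + 1) % n]
        neighborhood = (left << 2) | (center << 1) | right  # integer in 0..7
        new_cells.append((rule >> neighborhood) & 1)         # read that bit of the rule number
    return new_cells


def run(width=64, steps=32, rule=110):
    """Return the space-time history, starting from a single active cell on the right."""
    cells = [0] * (width - 1) + [1]
    rows = [cells]
    for _ in range(steps):
        cells = step(cells, rule)
        rows.append(cells)
    return rows


if __name__ == "__main__":
    for row in run():
        print("".join("#" if c else "." for c in row))
```

Printing the history shows localized structures propagating and interacting on a periodic background, the kind of behavior usually associated with class IV dynamics.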

The cross-disciplinary nature of modern computational intelligence is emphasized by interactions with: (i) physics, e.g., via physics of information, econophysics; (ii) computer science, mathematics, and statistics, e.g., via probabilistic and Bayesian inference, graph theory, information theory; (iii) life sciences, e.g., via systems biology, artificial life, neuro-cognitive modeling, and analysis of neural data. A unifying theme underlying these interactions is the universal role of computation, ranging from DNA-based computation to distributed computation in cellular automata, to neural computation in cortical networks, to reservoir computing in artificial neural networks, to chaos-based computing in digital circuitry, to morphological computation in modular robots, to information cascades in self-organizing swarms, etc. Recent attempts to quantitatively capture this universal role have used information-theoretic characterizations of various properties of distributed computation, such as information storage, transfer, modification, and synergy (Harder et al., 2013; Griffith and Koch, 2014; Lizier et al., 2014), as well as several optimization principles (Klyubin et al., 2005; Lungarella and Sporns, 2006; Prokopenko et al., 2006; Ay et al., 2008; Polani, 2009). In general, *information dynamics of computation* have been precisely quantified within spatio-temporal systems, on both global and local scales. Nevertheless, the situation somewhat resembles the state of the mathematical art during pre-calculus times, when concepts of motion and change were not yet embedded within a general mathematical system consistently dealing with variable quantities. Creating an information-theoretic calculus allowing researchers to express and optimize numerous varied elements of information dynamics is arguably one of the contemporary grand challenges for computational intelligence.
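Of the quantities mentioned above, information transfer is commonly measured by transfer entropy. The sketch below gives a plug-in (frequency-count) estimate for two discrete time series with a history length of one step. It is an illustrative assumption rather than the estimator used in the cited works: the function name `transfer_entropy`, the fixed history length, and the toy data are hypothetical, and plug-in estimates are biased for short series.

```python
from collections import Counter
from math import log2
import random


def transfer_entropy(source, target):
    """Plug-in transfer entropy source -> target in bits, with history length 1.

    Estimates the sum over (x_next, x_prev, y_prev) of
    p(x_next, x_prev, y_prev) * log2[ p(x_next | x_prev, y_prev) / p(x_next | x_prev) ].
    """
    triples = list(zip(target[1:], target[:-1], source[:-1]))
    n = len(triples)
    c_full = Counter(triples)                                   # (x_next, x_prev, y_prev)
    c_past = Counter((xp, yp) for _, xp, yp in triples)         # (x_prev, y_prev)
    c_self = Counter((xn, xp) for xn, xp, _ in triples)         # (x_next, x_prev)
    c_prev = Counter(xp for _, xp, _ in triples)                # x_prev
    te = 0.0
    for (xn, xp, yp), count in c_full.items():
        p_given_both = count / c_past[(xp, yp)]
        p_given_self = c_self[(xn, xp)] / c_prev[xp]
        te += (count / n) * log2(p_given_both / p_given_self)
    return te


if __name__ == "__main__":
    rng = random.Random(0)
    y = [rng.randint(0, 1) for _ in range(5000)]  # random binary source
    x = [0] + y[:-1]                              # target copies the source one step later
    print("Y -> X:", transfer_entropy(y, x))
    print("X -> Y:", transfer_entropy(x, y))
```

In this toy setting the estimate from source to target is close to one bit, while the estimate in the reverse direction is close to zero, reflecting the directed nature of the measure.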

Another crucial challenge for computational intelligence is the lack of a unifying theory for various *deep learning* architectures. Deep learning – a broad family of machine learning methods based on learning representations (LeCun et al., 1989; Schmidhuber, 1992) – has achieved a series of successes over recent years. As pointed out by Schmidhuber (2014), shallow and deep learners, e.g., neural networks, are “distinguished by the depth of their *credit assignment paths*, which are chains of possibly learnable, causal links between actions and effects.” The underlying assumption behind deep learning algorithms is that observed data are generated by multi-level interactions of many different factors and can therefore be represented in a distributed multi-layered way, where distinct layers correspond to various levels of abstraction (Bengio et al., 2013). In addition, the predictive power of deep learning may come from an inherent computational parallelization utilized by distributed representations. Nevertheless, there may be even more fundamental reasons behind the prominence of deep learning methods. For example, it can be hypothesized that computation within a deep learner creates a sufficient variety of multiple credit assignment paths, increasing intrinsic plasticity via a larger requisite variety postulated by Ashby’s Law (Obst and Boedecker, 2014), and/or maximizing the ability of reservoir computing or neuronal networks in states near the edge of chaos (Legenstein and Maass, 2007; Büsing et al., 2010; Buckley and Nowotny, 2011; Boedecker et al., 2012). Furthermore, it is conceivable that the larger variety enables a more robust and precise identification of critical points in the learning dynamics – analogous to phase transitions in the connectivity of random graphs, which are detectable within ensembles of graphs (Newman, 2005). In other words, considering the dynamics of deep learning in terms of its critical behavior may reveal some underlying mechanisms for convergence toward optimal solutions. Developing a unifying theoretical framework that brings together several deep learning concepts, such as levels of abstraction, depth of credit assignment paths, requisite variety within distributed representations, critical behavior of learning dynamics, and so on, remains a major task.
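The random-graph analogy can be made concrete with a small ensemble experiment. The sketch below estimates the fraction of nodes in the largest connected component as the mean degree is varied. It is a simplified illustration under stated assumptions: edges are drawn uniformly with replacement (so rare self-loops and duplicate edges are tolerated), components are tracked with a union-find structure, and the names `largest_component_fraction` and `ensemble_mean` are hypothetical. In the classical random-graph model, a giant component emerges once the mean degree exceeds one.

```python
import random


def largest_component_fraction(n, mean_degree, rng):
    """Fraction of nodes in the largest component of one sampled random graph."""
    parent = list(range(n))
    size = [1] * n

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving keeps trees shallow
            u = parent[u]
        return u

    # Draw about mean_degree * n / 2 edges, endpoints chosen uniformly at random.
    for _ in range(int(mean_degree * n / 2)):
        a, b = find(rng.randrange(n)), find(rng.randrange(n))
        if a != b:
            if size[a] < size[b]:
                a, b = b, a
            parent[b] = a          # union by size
            size[a] += size[b]
    return max(size[find(u)] for u in range(n)) / n


def ensemble_mean(n, mean_degree, trials=10, seed=0):
    """Average the largest-component fraction over an ensemble of sampled graphs."""
    rng = random.Random(seed)
    return sum(largest_component_fraction(n, mean_degree, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    for c in (0.5, 0.8, 1.0, 1.2, 2.0):
        print(c, round(ensemble_mean(2000, c), 3))
```

Below the critical mean degree the largest component occupies a vanishing fraction of the graph, whereas above it a single component spans a finite fraction, and the change becomes sharper as the ensemble and the graph size grow.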

Rather than proposing a frontal attack on the moving target of computational intelligence, we suggest approaching the described grand challenges in parallel. Such a cross-disciplinary strategy may not only elucidate the field from different viewpoints, but also offer significant advances in the overlapping research areas.

## Conflict of Interest Statement

The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

## References

Ay, N., Bertschinger, N., Der, R., Güttler, F., and Olbrich, E. (2008). Predictive information and explorative behavior of autonomous robots. *Eur. Phys. J. B* 63, 329–339. doi:10.1140/epjb/e2008-00175-0

Bengio, Y., Courville, A., and Vincent, P. (2013). Representation learning: a review and new perspectives. *IEEE Trans. Pattern Anal. Mach. Intell.* 35, 1798–1828. doi:10.1109/TPAMI.2013.50

Boedecker, J., Obst, O., Lizier, J. T., Mayer, N. M., and Asada, M. (2012). Information processing in echo state networks at the edge of chaos. *Theory Biosci.* 131, 205–213. doi:10.1007/s12064-011-0146-8

Buckley, C. L., and Nowotny, T. (2011). Multiscale model of an inhibitory network shows optimal properties near bifurcation. *Phys. Rev. Lett.* 106, 238109. doi:10.1103/PhysRevLett.106.238109

Büsing, L., Schrauwen, B., and Legenstein, R. A. (2010). Connectivity, dynamics, and memory in reservoir computing with binary and analog neurons. *Neural Comput.* 22, 1272–1311. doi:10.1162/neco.2009.01-09-947

Casti, J. L. (1991). “Chaos, Gödel and truth,” in *Beyond Belief: Randomness, Prediction, and Explanation in Science*, eds J. L. Casti, and A. Karlqvist (Boston, MA: CRC Press), 280–327.

Casti, J. L. (1994). *Complexification: Explaining a Paradoxical World Through the Science of Surprise*. New York, NY: Harper Collins.

de Looze, L. (2006). *Manuscript Diversity, Meaning, and “Variance” in Juan Manuel’s “El Conde Lucanor”*. Toronto, ON: University of Toronto Press.

Der, R. (2014). “On the role of embodiment for self-organizing robots: behavior as broken symmetry,” in *Guided Self-Organization: Inception*, Vol. 9, ed. M. Prokopenko (Berlin: Springer), 193–221.

Der, R., and Martius, G. (2012). *The Playful Machine – Theoretical Foundation and Practical Realization of Self-Organizing Robots*. Berlin Heidelberg: Springer.

Di Paolo, E. A. (2000). “Homeostatic adaptation to inversion of the visual field and other sensorimotor disruptions,” in *From Animals to Animats 6: Proceedings of the 6th International Conference on the Simulation of Adaptive Behavior*, eds J.-A. Meyer, A. Berthoz, D. Floreano, H. Roitblat, and S. W. Wilson (Cambridge, MA: MIT Press), 440–449.

Fernández, N., Maldonado, C., and Gershenson, C. (2014). “Information measures of complexity, emergence, self-organization, homeostasis, and autopoiesis,” in *Guided Self-Organization: Inception*, Vol. 9, ed. M. Prokopenko (Berlin: Springer), 19–51.

Griffith, V., and Koch, C. (2014). “Quantifying synergistic mutual information,” in *Guided Self-Organization: Inception*, Vol. 9, ed. M. Prokopenko (Berlin: Springer), 159–190.

Harder, M., Salge, C., and Polani, D. (2013). Bivariate measure of redundant information. *Phys. Rev. E* 87, 012130. doi:10.1103/PhysRevE.87.012130

Hofstadter, D. R. (1985). *Metamagical Themas: Questing for the Essence of Mind and Pattern*. New York, NY: Basic Books.

Klyubin, A. S., Polani, D., and Nehaniv, C. L. (2005). “Empowerment: a universal agent-centric measure of control,” in *Proceedings of the IEEE Congress on Evolutionary Computation*, Vol. 1 (Edinburgh: IEEE Press), 128–135.

Langton, C. G. (1990). Computation at the edge of chaos: phase transitions and emergent computation. *Physica D* 42, 12–37. doi:10.1016/0167-2789(90)90064-V

LeCun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W., et al. (1989). Backpropagation applied to handwritten zip code recognition. *Neural Comput.* 1, 541–551. doi:10.1162/neco.1989.1.4.541

Legenstein, R. A., and Maass, W. (2007). Edge of chaos and prediction of computational performance for neural circuit models. *Neural Netw.* 20, 323–334. doi:10.1016/j.neunet.2007.04.017

Lizier, J. T., Prokopenko, M., and Zomaya, A. Y. (2014). “A framework for the local information dynamics of distributed computation in complex systems,” in *Guided Self-Organization: Inception*, Vol. 9, ed. M. Prokopenko (Berlin: Springer), 115–158.

Lungarella, M., and Sporns, O. (2006). Mapping information flow in sensorimotor networks. *PLoS Comput. Biol.* 2:e144. doi:10.1371/journal.pcbi.0020144

Markose, S. M. (2004). *Novelty and Surprises in Complex Adaptive System (CAS) Dynamics: A Computational Theory of Actor Innovation*. Colchester: University of Essex.

Newman, M. E. J. (2005). “Random graphs as models of networks,” in *Handbook of Graphs and Networks*, eds S. Bornholdt, and H. G. Schuster (Berlin: Wiley-VCH Verlag GmbH & Co. KGaA), 35–68.

Obst, O., and Boedecker, J. (2014). “Guided self-organization of input-driven recurrent neural networks,” in *Guided Self-Organization: Inception*, Vol. 9, ed. M. Prokopenko (Berlin: Springer), 319–340.

Pfeifer, R., and Bongard, J. C. (2006). *How the Body Shapes the Way We Think: A New View of Intelligence*. Cambridge, MA: The MIT Press.

Polani, D., Sporns, O., and Lungarella, M. (2007). “How information and embodiment shape intelligent information processing,” in *Proceedings of the 50th Anniversary Summit of Artificial Intelligence*, Vol. 4850, eds M. Lungarella, F. Iida, J. Bongard, and R. Pfeifer (Berlin: Springer), 99–111.

Prokopenko, M., Gerasimov, V., and Tanev, I. (2006). “Evolving spatiotemporal coordination in a modular robotic system,” in *From Animals to Animats 9: 9th International Conference on the Simulation of Adaptive Behavior (SAB 2006)*, Vol. 4095, eds S. Nolfi, G. Baldassarre, R. Calabretta, J. Hallam, D. Marocco, J.-A. Meyer, O. Miglino, and D. Parisi (Rome: Springer), 548–559.

Prokopenko, M., Lizier, J. T., Obst, O., and Wang, X. R. (2011). Relating Fisher information to order parameters. *Phys. Rev. E* 84, 041116. doi:10.1103/PhysRevE.84.041116

Schmidhuber, J. (1992). Learning complex, extended sequences using the principle of history compression. *Neural Comput.* 4, 234–242. doi:10.1162/neco.1992.4.2.234

Schmidhuber, J. (2014). *Deep Learning in Neural Networks: An Overview*. Technical Report IDSIA-03-14. arXiv:1404.7828v1.

Schrödinger, E. (1944). *What is Life? The Physical Aspect of the Living Cell*. Cambridge: Cambridge University Press.

Vincent, T. L., and Brown, J. S. (2005). *Evolutionary Game Theory, Natural Selection, and Darwinian Dynamics*. Cambridge: Cambridge University Press.