# QUANTUM STRUCTURES IN COGNITIVE AND SOCIAL SCIENCE

EDITED BY: Diederik Aerts, Jan Broekaert, Liane Gabora and Sandro Sozzo PUBLISHED IN: Frontiers in Psychology

#### *Frontiers Copyright Statement*

*© Copyright 2007-2016 Frontiers Media SA. All rights reserved. All content included on this site, such as text, graphics, logos, button icons, images, video/audio clips, downloads, data compilations and software, is the property of or is licensed to Frontiers Media SA ("Frontiers") or its licensees and/or subcontractors. The copyright in the text of individual articles is the property of their respective authors, subject to a license granted to Frontiers.*

*The compilation of articles constituting this e-book, wherever published, as well as the compilation of all other content on this site, is the exclusive property of Frontiers. For the conditions for downloading and copying of e-books from Frontiers' website, please see the Terms for Website Use. If purchasing Frontiers e-books from other websites or sources, the conditions of the website concerned apply.*

*Images and graphics not forming part of user-contributed materials may not be downloaded or copied without permission.*

*Individual articles may be downloaded and reproduced in accordance with the principles of the CC-BY licence subject to any copyright or other notices. They may not be re-sold as an e-book.*

*As author or other contributor you grant a CC-BY licence to others to reproduce your articles, including any graphics and third-party materials supplied by you, in accordance with the Conditions for Website Use and subject to any copyright notices which you include in connection with your articles and materials.*

> *All copyright, and all rights therein, are protected by national and international copyright laws.*

*The above represents a summary only. For the full conditions see the Conditions for Authors and the Conditions for Website Use.*

ISSN 1664-8714 ISBN 978-2-88919-876-4 DOI 10.3389/978-2-88919-876-4

# About Frontiers

Frontiers is more than just an open-access publisher of scholarly articles: it is a pioneering approach to the world of academia, radically improving the way scholarly research is managed. The grand vision of Frontiers is a world where all people have an equal opportunity to seek, share and generate knowledge. Frontiers provides immediate and permanent online open access to all its publications, but this alone is not enough to realize our grand goals.

# Frontiers Journal Series

The Frontiers Journal Series is a multi-tier and interdisciplinary set of open-access, online journals, promising a paradigm shift from the current review, selection and dissemination processes in academic publishing. All Frontiers journals are driven by researchers for researchers; therefore, they constitute a service to the scholarly community. At the same time, the Frontiers Journal Series operates on a revolutionary invention, the tiered publishing system, initially addressing specific communities of scholars, and gradually climbing up to broader public understanding, thus serving the interests of the lay society, too.

# Dedication to Quality

Each Frontiers article is a landmark of the highest quality, thanks to genuinely collaborative interactions between authors and review editors, who include some of the world's best academicians. Research must be certified by peers before entering a stream of knowledge that may eventually reach the public - and shape society; therefore, Frontiers only applies the most rigorous and unbiased reviews.

Frontiers revolutionizes research publishing by freely delivering the most outstanding research, evaluated with no bias from both the academic and social point of view. By applying the most advanced information technologies, Frontiers is catapulting scholarly publishing into a new generation.

# What are Frontiers Research Topics?

Frontiers Research Topics are very popular trademarks of the Frontiers Journals Series: they are collections of at least ten articles, all centered on a particular subject. With their unique mix of varied contributions from Original Research to Review Articles, Frontiers Research Topics unify the most influential researchers, the latest key findings and historical advances in a hot research area! Find out more on how to host your own Frontiers Research Topic or contribute to one as an author by contacting the Frontiers Editorial Office: researchtopics@frontiersin.org

# **QUANTUM STRUCTURES IN COGNITIVE AND SOCIAL SCIENCE**

Topic Editors:

**Diederik Aerts,** Free University of Brussels, Belgium **Jan Broekaert,** Free University of Brussels, Belgium **Liane Gabora,** University of British Columbia, Canada **Sandro Sozzo,** University of Leicester, UK

Double slit-like pattern created by the concept Fruits interfering with the concept Vegetables in the disjunction Fruits or Vegetables. Image by Diederik Aerts

Traditional approaches to cognitive psychology correspond with a classical view of logic and probability theory. More specifically, one typically assumes that cognitive processes of human thought are founded on the Boolean structures of classical logic, while the probabilistic aspects of these processes are based on the Kolmogorovian structures of classical probability theory. However, growing experimental evidence indicates that the models founded on classical structures systematically fail when human decisions are at stake. These experimental deviations from classical behavior have been called 'paradoxes', 'fallacies', 'effects' or 'contradictions', depending on the specific situation where they appear. Yet they involve a broad spectrum of cognitive and social science domains, ranging from conceptual combination to decision making under uncertainty, behavioral economics, and linguistics. This situation has constituted a serious drawback to the development of various disciplines, like cognitive science, linguistics, artificial intelligence, economic modeling and behavioral finance.

A different approach to cognitive psychology, initiated two decades ago, has meanwhile matured into a new domain of research, called 'quantum cognition'. Its main feature is the use of the mathematical formalism of quantum theory as a modeling tool for those cognitive situations where traditional classically based approaches fail. Quantum cognition has recently attracted the interest of important journals and publishing houses, academic and funding institutions, popular science and media. Specifically, within a quantum cognition approach, one assumes that human decisions do not necessarily obey the rules of Boolean logic and Kolmogorovian probability, and can on the contrary be modeled by the quantum-mechanical formalism. Different concrete quantum-theoretic models have meanwhile been developed that successfully represent the cognitive situations that are classically problematical, by explaining observed deviations from classicality in terms of genuine quantum effects, such as 'contextuality', 'emergence', 'interference', 'superposition', 'entanglement' and 'indistinguishability'. In addition, the validity of these quantum models is convincingly confirmed by new experimental tests. We also stress that, since the use of a quantum-theoretic framework is mainly for modeling purposes, the identification of quantum structures in cognitive processes does not presuppose (without being incompatible with it) the existence of microscopic quantum processes in the human brain.

In this Research Topic, we review the major achievements that have been obtained in quantum cognition, by providing an accurate picture of the state of the art of this emerging discipline. Our overview does not claim to be either complete or exhaustive. Rather, we aim to introduce psychologists and social scientists to this challenging new research area, encouraging them, at the same time, to consider its promising results. It is our opinion that, if continuous progress in this domain can be realized, quantum cognition can constitute an important breakthrough in cognitive psychology, and potentially open the way towards a new scientific paradigm in social science.

**Citation:** Aerts, D., Broekaert, J., Gabora, L., Sozzo, S., eds. (2016). Quantum Structures in Cognitive and Social Science. Lausanne: Frontiers Media. doi: 10.3389/978-2-88919-876-4

# Editorial: Quantum Structures in Cognitive and Social Science

#### Diederik Aerts <sup>1</sup> , Jan Broekaert <sup>1</sup> , Liane Gabora<sup>2</sup> and Sandro Sozzo<sup>3</sup> \*

*<sup>1</sup> Center Leo Apostel for Interdisciplinary Studies, Free University of Brussels, Brussels, Belgium, <sup>2</sup> Department of Psychology, University of British Columbia, Kelowna, BC, Canada, <sup>3</sup> School of Management, Institute for Quantum Social and Cognitive Science, University of Leicester, Leicester, UK*

Keywords: quantum structures, quantum cognition, decision theory, human cognition, cognitive modeling

### **The Editorial on the Research Topic**

### **Quantum Structures in Cognitive and Social Science**

A fundamental problem in cognitive and social science concerns the identification of the principles guiding human cognitive acts such as decision-making, categorization, and behavior under uncertainty. Identifying these mechanisms would have manifold implications for fields ranging from psychology to economics, finance, politics, computer science, and artificial intelligence. The predominant theoretical paradigm rests on a classical conception of logic and probability theory. According to this paradigm people make decisions by following the rules of Boole's logic, while the probabilistic aspects of these decisions can be formalized by Kolmogorov's probability theory. This classical approach was believed to provide a quite complete and accurate account of human decision-making at both a normative level (describing what people should do) and a descriptive level (describing what people actually do). However, starting from the seventies, experimental studies of conceptual categorization, human judgment and perception, and behavioral economics have revealed that this classical conception is fundamentally problematical, in the sense that the cognitive models based on these mathematical structures are not capable of capturing how humans make decisions in situations involving uncertainty. In the last decade, an alternative scientific paradigm has arisen that employs a different and more general modeling scheme; it uses the mathematical formalism of quantum theory to model situations and processes in cognitive and social science. This new approach has not only met with considerable success but is becoming increasingly accepted in the scientific community, having attracted interest from important scientists, top journals, funding institutions, and media. 
Prisoners' dilemmas, conjunction and disjunction fallacies, disjunction effects, violations of the Sure-Thing principle, Allais, Ellsberg and Machina paradoxes, are only some of the examples where the application of the quantum mechanical formalism has shown significant effectiveness over traditional modeling schemes of a classical type.

The Frontiers Research Topic "Quantum Structures in Cognitive and Social Science" presents an overview of current research that applies the formalism of quantum theory to cognitive and socioeconomic domains. The term "quantum" may be misleading. The aim here is not to investigate the microphysical processes occurring in the human brain and, as a consequence, driving human judgments. Rather, we inquire into the validity of quantum theory as a general, coherent, and unitary paradigm for human cognition. In this respect, this research benefits from studies into the axiomatic and operational foundations of quantum physics. The scope of this bold approach to human cognition is discovering general rules that associate the empirical phenomenology in these domains with states, measurements, and probabilities of outcomes in such a way that these entities are represented exactly as quantum theory in Hilbert space represents states, measurements, and probabilities of outcomes in the phenomenology of microphysics. The ensuing modeling is theory-based, not experiment-based; that is, the models are not built around a specific effect or experiment, although they are sometimes used in conjunction with empirical data to build a stronger case. The models are constructed following the general epistemological and technical constraints of quantum theory; hence the successes of this quantum theory-based modeling suggest that it might provide a general theory for human cognition.

Edited and reviewed by: *Bernhard Hommel, Leiden University, Netherlands*

> \*Correspondence: *Sandro Sozzo ss831@le.ac.uk*

#### Specialty section:

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology*

Received: *05 April 2016* Accepted: *07 April 2016* Published: *25 April 2016*

#### Citation:

*Aerts D, Broekaert J, Gabora L and Sozzo S (2016) Editorial: Quantum Structures in Cognitive and Social Science. Front. Psychol. 7:577. doi: 10.3389/fpsyg.2016.00577*

This Research Topic develops around three main directions of research, as follows.


The first set of results concerns knowledge representation and conceptual categorization. Aerts et al. analyze the results of a cognitive test on conjunctions and negations of natural concepts, showing that a quantum-theoretic probabilistic model in Hilbert space faithfully represents the collected data, at variance with a set-theoretic Kolmogorovian model. This result is explained by assuming the existence of two types of reasoning in human cognition, a dominant emergent reasoning, and a secondary logical reasoning. Some mathematical aspects of this quantum-theoretic model on conceptual conjunctions and negations are developed in Veloz and Desjardins through the introduction of unitary operators in Hilbert space. Aerts et al. show instead that the quantum-theoretic approach can be interpreted as a suitable generalization of Rosch's prototype theory, where prototypes are context-dependent and may interfere when concepts combine.

The second set of results concerns the modeling of human decision-making. Moreira and Wichert explore an alternative quantum-theoretic approach, the quantum-like Bayesian network, to describe the paradoxes related to the violation of the Sure-Thing Principle in experiments on human judgments. Their model is in good agreement with different sets of empirical data. The opinion paper by Pothos et al. reviews some current progress on the quantum similarity model in Hilbert space recently proposed by Pothos et al., which correctly represents human similarity judgments. Decision-making errors and preference reversal are also investigated in Yukalov and Sornette within their quantum decision theory. Wang and Busemeyer analyze the notion of complementarity in human cognition, and claim that the way in which it is used in quantum physics can also be helpful in cognitive science. Human perception is the object of the study in Khrennikov, where the author develops a quantum model of the sensation-perception dynamics, illustrating it by means of the model of bistable perception of a specific ambiguous figure, the Schröder stair. Finally, Tressoldi et al. identify a significant violation of temporal Bell inequalities in a set of cognitive tests. The violation indicates, according to the authors, the presence of temporal entanglement between binary human behavioral unconscious choices at a given time and binary random outcomes at a different time. In all these approaches, the presence of quantum structures in cognition is determined by the fact that the cognitive systems under investigation share a common feature, namely contextuality. A different position with respect to the presence of contextuality in cognition is assumed in Zhang and Dzhafarov, where the authors apply a theory of (non)contextuality to analyze series-parallel (SP) mental architectures.

The third set of results concerns advanced applications of the quantum-mathematical formalism to wider ranges of social science. Bisconti et al. propose an inverse Potts model, typically used in statistical quantum field theory, to reconstruct the node states in a real-world social network. Haven explores the properties of two types of potential functions, inspired by classical and quantum physics, that can potentially be employed to model financial information, including preferences toward risk and uncertainty. Finally, Dalla Chiara et al. investigate different, but mutually related, aspects of parallelism within the framework of quantum computation, cognition and music, and study potential applications of quantum computational semantics in both natural and musical language.

Leaving aside the specific differences between the approaches above, most of them agree in claiming that quantum structures are systematically present in cognitive and social science phenomena, and that quantum-inspired models are more efficient than traditional set-theoretic models of probability. Is "quantum" the end of the story? Is Hilbert space really the place where all these phenomena can be modeled? Is there any empirical deviation from quantum predictions? We do not yet have an answer to these questions. This is why we believe that the road leading toward a possibly generally accepted quantum theory of human decision-making will still be full of fascinating surprises.

# AUTHOR CONTRIBUTIONS

All authors listed have made substantial, direct, and intellectual contributions to the work, and approved it for publication.

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Aerts, Broekaert, Gabora and Sozzo. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Generalizing Prototype Theory: A Formal Quantum Framework

#### Diederik Aerts <sup>1</sup> , Jan Broekaert <sup>1</sup> , Liane Gabora<sup>2</sup> and Sandro Sozzo<sup>3</sup>

*<sup>1</sup> Center Leo Apostel for Interdisciplinary Studies, Free University of Brussels, Brussels, Belgium, <sup>2</sup> Department of Psychology, University of British Columbia, Kelowna, BC, Canada, <sup>3</sup> School of Management, Institute for Quantum Social and Cognitive Science, University of Leicester, Leicester, UK*

Theories of natural language and concepts have been unable to model the flexibility, creativity, context-dependence, and emergence exhibited by words, concepts and their combinations. The mathematical formalism of quantum theory has instead been successful in capturing phenomena such as graded membership, situational meaning, composition of categories, and also more complex decision making situations, which cannot be modeled in traditional probabilistic approaches. We show how a formal quantum approach to concepts and their combinations can provide a powerful extension of prototype theory. We explain how prototypes can interfere in conceptual combinations as a consequence of their contextual interactions, and provide an illustration of this using an intuitive wave-like diagram. This quantum-conceptual approach gives new life to original prototype theory, without however making it a privileged concept theory, as we explain at the end of our paper.

#### Edited by:

*Kevin Bradley Clark, University of California, Los Angeles, USA*

#### Reviewed by:

*Terrence C. Stewart, Carleton University, Canada Bruce MacLennan, University of Tennessee, USA*

> \*Correspondence: *Sandro Sozzo ss831@le.ac.uk*

#### Specialty section:

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology*

Received: *13 August 2015* Accepted: *09 March 2016* Published: *30 March 2016*

#### Citation:

*Aerts D, Broekaert J, Gabora L and Sozzo S (2016) Generalizing Prototype Theory: A Formal Quantum Framework. Front. Psychol. 7:418. doi: 10.3389/fpsyg.2016.00418*

Keywords: cognition, concept theory, prototype theory, contextuality, interference, quantum modeling

# 1. INTRODUCTION

Theories of concepts struggle to capture the creative flexibility with which concepts are used in natural language, and combined into larger complexes with emergent meaning, as well as the context-dependent manner in which concepts are understood (Geeraerts, 1989). In this paper, we present some recent advances in our quantum approach to concepts. More specifically, we follow the general lines illustrated in Gabora and Aerts (2002), Aerts and Gabora (2005a,b), and Gabora et al. (2008), and generalize the quantum-theoretic model elaborated in Aerts (2009b) and Aerts et al. (2013a).

According to the "classical," or "rule-based" view of concepts, which can be traced back to Aristotle, all instances of a concept share a common set of necessary and sufficient defining properties. Wittgenstein pointed out that: (i) in some cases it is not possible to give a set of characteristics or rules defining a concept; (ii) it is often unclear whether an object is a member of a particular category; (iii) conceptual membership of an instance strongly depends on the context.

A major blow to the classical view came from Rosch's work on color. This work showed that colors do not have any particular criterial attributes or definite boundaries, and instances differ with respect to how typical they are of a concept (Rosch, 1973, 1978, 1983). This led to the formulation of "prototype theory," according to which concepts are organized around family resemblances, and consist of characteristic, rather than defining, features. These features are weighted in the definition of the "prototype." Rosch showed that subjects rate conceptual membership as "graded," with degree of membership of an instance corresponding to conceptual distance from the prototype. Moreover, the prototype appears to be particularly resistant to forgetting. Prototype theory also has the strength that it can be mathematically formulated and empirically tested. By calculating the similarity between the prototype of a concept and a possible instance of it, across all salient features, one arrives at a measure of the "conceptual distance" between the instance and the prototype. Another means of calculating conceptual distance comes out of "exemplar theory" (Nosofsky, 1988, 1992), according to which a concept is represented not by a set of defining or characteristic features, but by a set of salient "instances" of it stored in memory. Exemplar theory has met with considerable success at predicting empirical results. Moreover, there is evidence of preservation of specific training exemplars in memory. Classical, prototype, and exemplar theories are sometimes referred to as "similarity based" approaches, because they assume that categorization relies on data-driven statistical evidence. They have been contrasted with "explanation based" approaches, according to which categorization relies on a rich body of knowledge about the world. 
For example, according to "theory theory" concepts take the form of "mini-theories" (Murphy and Medin, 1985) or schemata (Rumelhart and Norman, 1988), in which the causal relationships among properties are identified.

Although these theories do well at modeling empirical data when only one concept is concerned, they perform poorly at modeling combinations of two concepts. As a consequence, cognitive psychologists are still looking for a satisfactory and generally accepted model of how concepts combine.

The inadequacy of fuzzy set models of conceptual conjunctions (Zadeh, 1982) to resolve the "Pet-Fish problem" identified by Osherson and Smith (1981) highlighted the severity of the combination problem. People rate the item Guppy as a very typical example of the conjunction Pet-Fish, without rating Guppy as a typical example of either Pet or Fish ("Guppy effect") (Osherson and Smith, 1981, 1982). Studies by Hampton on concept conjunctions (Hampton, 1988a), disjunctions (Hampton, 1988b) and negations (Hampton, 1997) confirmed that traditional fuzzy set and Boolean logical rules are violated whenever people combine concepts, as one usually finds "overextension" and "underextension" in the membership weights of items with respect to concepts and their combinations. It has been shown that people estimate a sentence like "x is tall and x is not tall" as true, in particular when x is a "borderline case" ("borderline contradictions") (Bonini et al., 1999; Alxatib and Pelletier, 2011), again violating the rules of set-theoretic Boolean logic. The seriousness of the combination problem was pointed out by various scholars (Komatsu, 1992; Fodor, 1994; Kamp and Partee, 1995; Rips, 1995; Osherson and Smith, 1997). More recently, other theories of concepts have been developed, such as "Costello and Keane's constraint theory" (Costello and Keane, 2000), "Van Dantzig, Raffone, and Hommel's connectionist CONCAT model of concepts" (Van Dantzig et al., 2011), "Thagard and Stewart's emergent binding model" (Thagard and Stewart, 2011), and "Gagne and Spalding's morphological approach" (Gagne and Spalding, 2009). However, none of these theories has a strong track record of modeling the emergence and non-compositionality of concept combinations.

The approach to concepts presented in this paper grew out of earlier work on the application to concept theory of the axiomatic and operational foundations of quantum theory and quantum probability (Aerts, 1986; Pitowsky, 1989; Aerts, 1999). A major theoretical insight was to shift the perspective from viewing a concept as a "container" to viewing it as "an entity in a specific state that is changing under the influence of a context" (Gabora and Aerts, 2002). This allowed us to provide a solution to the Guppy effect and to successfully represent the data collected on Pet, Fish and Pet-Fish by using the mathematical formalism of quantum theory (Aerts and Gabora, 2005a,b). Then, we proved that none of the above experiments in concept theory can be represented in a single probability space satisfying the axioms of Kolmogorov (1933). We developed a general quantum framework to represent conjunctions, disjunctions and negations of two concepts, which has been successfully tested several times (Aerts, 2009a,b; Sozzo, 2014, 2015; Aerts et al., 2015a), and we put forward an explanatory hypothesis for the observed deviations from traditional logical and probabilistic structures and for the occurrence of quantum effects in cognition (Aerts et al., 2015b). We recently identified a strong and systematic non-classical effect, which is deeper than the ones typically detected in concept combinations and directly connected with the mechanisms of concept formation (Aerts et al., 2015c). 
This work is part of a growing domain of cognitive psychology that uses the mathematical formalism of quantum theory and quantum structures to model empirical situations where the application of traditional probabilistic approaches is problematical (probability judgments errors, decision-making errors, violations of expected utility theory, etc.; Aerts and Aerts, 1995; Aerts et al., 2000, 2013a,b, 2014, 2015; Aerts and Sozzo, 2011, 2014; Busemeyer and Bruza, 2012; Haven and Khrennikov, 2013; Pothos and Busemeyer, 2013; Khrennikov et al., 2014; Wang et al., 2014).

This paper outlines recent progress in the development of a quantum-theoretic framework for concepts and their dynamics. Section 2 explains how the "SCoP formalism" can be interpreted as a "contextual and interfering prototype theory that is a generalization of standard prototype theory" in which prototypes are not fixed, but change under the influence of a context, and interfere as a consequence of their contextual interactions (see also Gabora et al., 2008; Aerts et al., 2013a). Section 3 presents an amended explanatory version of the quantum-mechanical model in complex Hilbert space worked out in Aerts (2009b) and Aerts et al. (2013a) for the typicality of items with respect to the concepts Fruits and Vegetables, and their disjunction Fruits or Vegetables. This improved quantum model illustrates how the prototype of Fruits (Vegetables) changes under the influence of the context Vegetables (Fruits) in the combination Fruits or Vegetables. The latter combination is represented using the quantum-mathematical notion of linear superposition in a complex Hilbert space, which entails the genuine quantum effect of "interference." Hence, our model shows that the prototypes of Fruits and Vegetables interfere in the disjunction Fruits or Vegetables. Sections 2 and 3 also justify the fact that our quantum-theoretic framework for concepts can be considered as a "contextual and interfering generalization of original prototype theory." The presence of linear superposition and interference could suggest that concepts combine and interact like waves do.

In Section 4 we develop this intuition in detail and propose an intuitive wave-like illustration of the disjunction Fruits or Vegetables. Finally, Section 5 discusses connections between the quantum-theoretic approach to concepts presented here, and other theories of concepts. Although this approach can be interpreted as a specific generalization of prototype theory, it is compatible with insights from other theories of concepts.

We stress that our investigation does not deal with the elaboration of a "specific typicality model" that represents a given set of data on the concepts Fruits, Vegetables, and their disjunction Fruits or Vegetables. We inquire into the mathematical formalism of quantum theory as a general, unitary and coherent formalism to model natural concepts. Our quantum-theoretic model in Section 3 has been derived from this general quantum theory, hence it satisfies specific technical and general epistemological constraints of quantum theory. As such, it does not apply to any arbitrary set of experimental data. Our formalism exactly applies to those data that exhibit a peculiar deviation from classical set-theoretic modeling; such deviations are taken in our framework as indicative of interference and emergence. Data collected on combinations of two concepts systematically exhibit deviations from classical set-theoretical modeling, and traditional probabilistic approaches have difficulty coping with this. In this sense, the success of the quantum-theoretic modeling can be interpreted as a confirmation of the effectiveness of quantum theory to model conceptual combinations. We should also mention that our quantum-theoretic approach has recently produced new predictions, allowing us to identify entanglement in concept combinations (Aerts and Sozzo, 2011, 2014), and systematic deviations from the marginal law, deeply connected to the mechanisms of concept formation (Aerts et al., 2015a,c). These effects would not have been identified in a more traditional investigation of overextension and underextension.

It follows from the above analysis that our quantum-theoretic modeling rests on a "theory based approach," as it straightforwardly derives from quantum theory as "a theory to represent natural concepts." Hence, it should be distinguished from an "ad-hoc modeling based approach," only devised to fit data. One should be suspicious of models in which free parameters are added after the fact on an ad-hoc basis to fit the data more closely. In our opinion, the fact that our "theory derived model" reproduces different sets of experimental data is a convincing argument to support its advantage over traditional modeling approaches and to extend its use to more complex combinations of concepts.

# 2. THE SCoP FORMALISM AS A CONTEXTUAL INTERFERING PROTOTYPE THEORY

This section summarizes the SCoP approach to concepts by providing new insights to the research in Aerts and Gabora (2005a,b) and Gabora et al. (2008).

We mentioned in Section 1 that, according to prototype theory, concepts are associated with a set of characteristic, rather than defining, features (or properties), that are weighted in the definition of the prototype. A new item is categorized as an instance of the concept if it is sufficiently similar to this prototype (Rosch, 1973, 1978, 1983). The original prototype theory was subsequently put into mathematical form as follows. The prototype consists of a set of features {a<sub>1</sub>, a<sub>2</sub>, ..., a<sub>M</sub>}, with associated "weights" (or "application values") {x<sub>p1</sub>, x<sub>p2</sub>, ..., x<sub>pM</sub>}, where M is the number of features that are considered. A new item k is also associated with a set {x<sub>k1</sub>, x<sub>k2</sub>, ..., x<sub>kM</sub>}, where the number x<sub>km</sub> refers to the applicability of the m-th feature to the item k (for a given stimulus). Then, the conceptual distance between the item k and the prototype, defined as

$$d\_k = \sqrt{\sum\_{m=1}^{M} (\mathbf{x}\_{km} - \mathbf{x}\_{pm})^2} \tag{1}$$

is a measure of the similarity between item and prototype. The smaller the distance d<sub>k</sub> for the item k, the more representative k is of the given concept.
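As a minimal sketch of Equation (1), the conceptual distance can be computed as below. The function name `conceptual_distance` and all application values are hypothetical and chosen only for illustration; they are not taken from any of the cited experiments.

```python
import math

def conceptual_distance(item_values, prototype_values):
    """Euclidean distance of Equation (1) between an item and the prototype."""
    if len(item_values) != len(prototype_values):
        raise ValueError("item and prototype must use the same M features")
    return math.sqrt(sum((x_k - x_p) ** 2
                         for x_k, x_p in zip(item_values, prototype_values)))

# Hypothetical application values for M = 3 features (not from the source).
prototype = [0.9, 0.8, 0.1]
item_close = [0.8, 0.7, 0.2]
item_far = [0.1, 0.2, 0.9]

d_close = conceptual_distance(item_close, prototype)
d_far = conceptual_distance(item_far, prototype)

# The item with the smaller distance is the more representative one.
print(d_close < d_far)
```

Under these assumed values the first item lies closer to the prototype and would therefore be rated as the more representative instance.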

Prototype theory was developed in response to findings that people rate conceptual membership as graded (or fuzzy), with the degree of membership of an instance corresponding to the conceptual distance from the prototype. A second fundamental element of prototype theory is that it can in principle be confronted with empirical data, e.g., membership or typicality measurements.

A fundamental challenge to prototype theory (but also to any other theory of concepts) has become known as the "Pet-Fish problem." The problem can be summarized as follows. We denote by Pet-Fish the conjunction of the concepts Pet and Fish. It has been shown that, although people rate Guppy neither as a typical Pet nor as a typical Fish, they do rate it as a highly typical Pet-Fish (Osherson and Smith, 1981). This phenomenon, in which the typicality of a conjunctive concept is greater than (or "overextends") that of either of its constituent concepts, has also been called the "Guppy effect." Using classical logic, or even fuzzy logic, there is no specification of a prototype for Pet-Fish starting from the prototypes of Pet and Fish that is consistent with empirical data (Osherson and Smith, 1981, 1982; Zadeh, 1982). Fuzzy set theory falls short because standard connectives for conceptual conjunction yield typicality values that are less than or equal to each of the typicality values of the conceptual components, i.e., under these connectives the typicality of an item such as Guppy cannot be higher for Pet-Fish than for either Pet or Fish.

Similar effects occur for membership weights of items with respect to concepts and their combinations. Hampton's experiments indicated that people estimate membership in such a way that the membership weight of an item for the conjunction (disjunction) of two concepts, calculated as the large number limit of relative frequency of membership estimates, is higher (lower) than the membership weight of this item for at least one constituent concept (Hampton, 1988a,b). This phenomenon is referred to as "overextension" ("underextension"). "Double overextension" ("double underextension") is also an experimentally established phenomenon, in which the membership weight with respect to the conjunction (disjunction) of two concepts is higher (lower) than the membership weights with respect to both constituent concepts (Hampton, 1988a,b). Furthermore, conceptual negation does not satisfy the rules of classical Boolean logic (Hampton, 1997). Moreover, Bonini et al. (1999) and Alxatib and Pelletier (2011) identified the presence of "borderline contradictions," directly connected with overextension: a sentence like "John is tall and John is not tall" is estimated as true by a significant number of participants, again violating basic rules of classical Boolean logic. More generally, for each of these sets of experimental data, a single classical probability framework satisfying the axioms of Kolmogorov does not exist (Aerts, 2009a,b; Aerts et al., 2013a,b, 2015a; Sozzo, 2014, 2015). To clarify the latter sentence: no single probability space can be constructed for an item whose membership weight with respect to the conjunction of two concepts is overextended with respect to both constituent concepts.

These problems (compositionality, the graded nature of typicality, and the probabilistic nature of membership weights) present a serious challenge to any theory of concepts.

We have developed a novel theoretical model of concepts and their combinations (Gabora and Aerts, 2002; Aerts and Gabora, 2005a,b), covering conjunction (Aerts, 2009a; Aerts et al., 2013a, 2015a; Sozzo, 2014, 2015), disjunction (Aerts, 2009a; Aerts et al., 2013a), and conjunction with negation (Aerts et al., 2015a; Sozzo, 2015). It uses the mathematical formalism of quantum theory in Hilbert space to represent data on conceptual combinations, and it has been successfully tested several times. This quantum-conceptual approach enables us to model the above-mentioned deviations from classicality in terms of genuine quantum phenomena (contextuality, emergence, entanglement, interference, and superposition), thus capturing fundamental aspects of how concepts combine. More importantly, we have recently identified stronger deviations from classicality than overextension and underextension, which unveil, in our opinion, deep non-classical aspects of concept formation (Aerts et al., 2015c).

The approach was inspired by similarity based theories, such as prototype theory, in several respects:


A key insight underlying our approach is considering a concept not as a "container of instantiations" but, rather, as an "entity in a specific state," which changes under the influence of a context. In our quantum-conceptual approach, a context is modeled mathematically in the way quantum physics models a measurement on a quantum particle. The (cognitive) context changes the state of a concept in the way a measurement in quantum theory changes the state of a quantum particle (Aerts and Gabora, 2005a,b). For example, in our modeling of the concept Pet, we considered the context e expressed by "Did you see the type of pet he has? This explains that he is a weird person," and found that when participants in an experiment were asked to rate different items of Pet, the scores for Snake and Spider were very high in this context. In this approach, this is explained by introducing different states for the concept Pet. We call "the state of Pet when no specific context is present" its ground state p̂. The context e changes the ground state p̂ into a new state p<sub>weird person pet</sub>. Typicality here is an observable semantic quantity, which means that it takes different values in different states of the concept. As a consequence, a substantial part of the typicality variations that are encountered in the Guppy effect are due to changes of state of the concept Pet under the influence of a context. More specifically, the typicality variations for the conjunction Pet-Fish are in great part similar to the typicality variations for Pet under the context Fish (and also for Fish under the context Pet). Not only does context play a role in shaping the typicality variations for Pet-Fish, but interference between Pet and Fish also contributes, as we will analyze in detail in Section 3.

In general, whenever someone is asked to estimate the typicality of Guppy with respect to the concept Pet in the absence of any context, it is the typicality in the ground state p̂<sub>Pet</sub> that is obtained, and whenever the typicality of Guppy is estimated with respect to the concept Fish in the absence of any context, it is the typicality in the ground state p̂<sub>Fish</sub> that is obtained. But whenever someone is asked to estimate the typicality of Guppy with respect to the conjunction Pet-Fish, it is the typicality in a new ground state p̂<sub>Pet-Fish</sub> that is obtained. This new ground state p̂<sub>Pet-Fish</sub> is different from p̂<sub>Pet</sub> as well as from p̂<sub>Fish</sub>. It is close but not equal to the state into which the ground state p̂<sub>Pet</sub> changes under the context e<sub>Fish</sub>, and close but not equal to the state into which the ground state p̂<sub>Fish</sub> changes under the context e<sub>Pet</sub>, the difference being due to interference taking place between Pet and Fish when they combine into Pet-Fish (see Section 3). The "changes of state under the influence of a context" and the corresponding typicalities behave like the changes of state and corresponding probabilities in quantum theory, giving rise to a violation of the corresponding fuzzy set and/or classical probability rules. This partly explains the high typicality of Guppy in the conjunction Pet-Fish together with its normal typicality in Pet and Fish, and it is the reason why we identify the Guppy effect as an effect at least partly due to context. There is also an interference effect, as we will see later.

We developed this approach in a formal way, and called the underlying mathematical structure a "State Context Property (SCoP) formalism" (Aerts and Gabora, 2005a). Let A denote a concept. In SCoP, A is associated with a triple of sets, namely the set Σ of states (we denote states by p, q, ...), the set M of contexts (we denote contexts by e, f, ...), and the set L of properties (we denote properties by a, b, ...). The "ground state" p̂ of the concept A is the state where A is not under the influence of any particular context. Whenever A is under the influence of a specific context e, a change of the state of A occurs. In case A was in its ground state p̂, the ground state changes to a state p. The difference between states p̂ and p is manifested, for example, by the typicality values of different items of the concept, as we have seen in the case of the Guppy effect, and by the applicability values of different properties being different in the two states p̂ and p. Hence, to complete the mathematical construction of SCoP, two functions µ and ν are also needed. The function µ : Σ × M × Σ → [0, 1] is defined such that µ(q, e, p) is the probability that state p of concept A under the influence of context e changes to state q of concept A. The function ν : Σ × L → [0, 1] is defined such that ν(p, a) is the weight, or normalization of applicability, of property a in state p of concept A. The function µ mainly accounts for typicality measurements, while the function ν mainly accounts for applicability measurements. Through these mathematical structures the SCoP formalism captures both "contextual typicality" and "contextual applicability" (Aerts and Gabora, 2005a).

A quantum representation in a complex Hilbert space of data on Pet and Fish and different states of Pet and Fish in different contexts was developed (Aerts and Gabora, 2005a), as well as of the concept Pet-Fish (Aerts and Gabora, 2005b). Let us deepen the connections between the quantum-theoretic approach to concepts and prototype theory (see also Gabora et al., 2008). This approach can be interpreted in a rather straightforward way as a generalization of prototype theory which, unlike standard prototype theory, mathematically integrates context and formalizes its effects. What we call the ground state of a concept can be seen as the prototype of this concept. The conceptual distance of an item from the prototype can be reconstructed from the functions µ and ν in the SCoP formalism. Thus, as long as individual concepts are considered and in the absence of any context, prototype theory can be embodied into the SCoP formalism, and the prototype of a concept A can be represented as its ground state p̂<sub>A</sub>. However, any context will change this ground state into a new state. An important consequence of this is that when the concept is in this new state, the prototype changes. An intuitive way of understanding this is to consider the new state a new "contextualized prototype." More concretely, the concept Pet, when combined with Fish in the conjunction Pet-Fish, has a new contextualized prototype, which could be called "Pet contextualized by Fish." Hence, this is a prototype-like theory that is capable of mathematically describing the presence and influence of context. From the point of view of conceptual distance, this contextualized prototype will be close to, e.g., Guppy.

The interpretation of the SCoP formalism as a contextual prototype theory can be applied not just to conjunctions and disjunctions of two concepts, but also to abstract categories such as Fruits. It is very likely that the prototype of Fruits is close to, e.g., Apple, or Orange. But let us now consider the combination Tropical Fruits, that is, Fruits under the context Tropical. It is then reasonable to maintain that the new contextualized prototype of Tropical Fruits is closer to, e.g., Pineapple, or Mango, than to Apple, or Orange. The introduction of contextualized prototypes within the SCoP formalism enables us to incorporate abstract categories as well as deviations of typicality from fuzzy set behavior.

Another interesting aspect of this approach to prototype theory comes to light if we consider again the conceptual combination Pet-Fish. It is reasonable to suppose that the prototypes of Pet and Fish (the ground states p̂<sub>Pet</sub> and p̂<sub>Fish</sub>) interfere in Pet-Fish whenever the typicality of an item, e.g., Guppy, is measured with respect to Pet-Fish. This statement cannot, however, be made more explicit in the absence of a concrete quantum-theoretic representation of typicality measurements of items with respect to concepts and their combinations. Indeed, interference and superposition effects can be precisely formalized in such a quantum representation. This will be the content of Sections 3 and 4.

# 3. A HILBERT SPACE MODELING OF MEMBERSHIP MEASUREMENTS

One can gain insight into how people combine concepts by gathering data on "membership weights" and "typicalities." To obtain data on "typicalities," participants are given a concept, and a list of instances or items, and asked to estimate their typicality on a Likert scale. In other experiments participants are asked to choose which instance they consider most typical of the concept. Averages of these estimates or relative frequencies of the picked items give rise to values representing the typicalities of the respective items. A membership weight is obtained by asking participants to estimate the membership of specific items with respect to a concept. This estimation can be quantified using the 7-point Likert scale and then converted into a relative frequency, and then into a probability called the "membership weight."

Hampton used membership weights instead of typicalities (Hampton, 1988b), because all one can do with typicalities is fuzzy set type calculations: the minimum rule of fuzzy sets for conjunction or the maximum rule for disjunction. This approach has many serious shortcomings; indeed, the Pet-Fish problem could not be addressed by it. More serious failures are revealed by membership weight data. Hampton measured "membership weights" and "degrees of non-membership or membership," making these two measurements in one experiment. More specifically, Hampton's experiment generates magnitude data, measuring the "degree of membership or non-membership" on a 7-point Likert scale providing −1, −2, −3 for degrees of non-membership, 1, 2, 3 for degrees of membership, and 0 for borderline cases. From the same experiment membership weight data are obtained, with 8 possible triplets [±, ±, ±] per item, each triplet indicating with a + whether the participant considered item k to be a member of the first category (A), the second category (B), and the third, disjunction category (A or B), and with a − otherwise. In the present Hilbert space model we use the "degree of membership or non-membership" values obtained by Hampton, add 3 to them to make them all non-negative, sum them, and divide each one by this sum. Since there are 24 items in total, in this way we get a set of 24 values in the interval [0,1] that sum up to 1. We will use these values as a substitute for membership collapse probabilities.
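The renormalization just described (shift by 3, then divide by the total) can be sketched as follows. This is only an illustration under stated assumptions: the function name `to_collapse_substitutes` is hypothetical, the ratings are made-up per-item values on the −3 to 3 scale, and four items are used instead of the 24 in the source.

```python
def to_collapse_substitutes(ratings):
    """Shift Likert ratings from [-3, 3] to [0, 6] and normalize them to sum to 1."""
    shifted = [r + 3 for r in ratings]
    total = sum(shifted)
    if total == 0:
        raise ValueError("all ratings are -3; normalization is undefined")
    return [s / total for s in shifted]

# Hypothetical per-item ratings for four items (the source uses 24 items).
ratings = [2.5, 1.0, -0.5, -2.0]
mu = to_collapse_substitutes(ratings)

print(mu)        # values in [0, 1]
print(sum(mu))   # equals 1 up to floating-point rounding
```

The same function would be applied separately to the ratings for A, for B, and for "A or B", so that each of the three resulting lists sums to 1.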

Let us first explain how we arrive at the membership collapse probabilities as a consequence of a measurement, and why we can use the above-mentioned calculated values of "degree of membership or non-membership" as substitutes. Suppose that instead of using the data obtained by Hampton, we performed the following experiment. For each pair of concepts and their combination we ask the participant to select one and only one item that they consider the best choice for membership. Then we calculate for each of the 24 items the relative frequency of its appearance. These relative frequencies are 24 values in the interval [0,1] summing up to 1, and their limits for increasing numbers of participants represent the probabilities for each item to be chosen as the best member. These probabilities are what in a quantum model are called the "membership collapse probabilities." Of course, the above described experiment to determine the membership collapse probabilities has not been performed. However, the values calculated from Hampton's measurement of "degree of membership or non-membership," after renormalization as explained above, are expected to correlate with what these membership collapse probabilities would be if they were measured. This is why we use them as substitutes for the membership collapse probabilities in our quantum model. As we will see when we construct the quantum model, the exact values of the substitutes for the membership collapse probabilities are not critical. Thus, if we can model the substitutes for the membership collapse probabilities calculated from Hampton's data, we can also model the actual membership collapse probabilities (the data we would have if the experiment had been done).

To repeat: in **Table 1**, Hampton's experimental data (Hampton, 1988b) have been converted into relative frequencies. The "degrees of non-membership and degrees of membership" give rise to the values µ<sub>k</sub>(X), which now stand for the probability of the concepts Fruits (X = A), Vegetables (X = B) and Fruits or Vegetables (X = "A or B") to collapse to the item k, and thus add up to 1, that is,

$$\sum\_{k=1}^{24} \mu\_k(A) = \sum\_{k=1}^{24} \mu\_k(B) = \sum\_{k=1}^{24} \mu\_k(A \text{ or } B) = 1 \tag{2}$$

for the 24 items. The quantum model for concepts and their disjunction in complex Hilbert space is developed by building appropriate state vectors and projection operators for a given ontology of 24 items of two more abstract "container" concepts.

In our model, the Hilbert space is the complex n-dimensional space ℂ<sup>n</sup>, in which state vectors are n-dimensional complex-valued vectors. We use the "bra-ket" notation, ⟨·| and |·⟩ respectively, for vector states (see the Appendix for further explanation). The complex conjugate transpose of the ket-vector |·⟩ (n × 1 dimensional) is the bra-vector ⟨·| (1 × n dimensional). Projectors and operators are then formed as matrices |·⟩⟨·|, while scalars are obtained by the inner product ⟨·|·⟩. We represent the measurement, consisting in the question "Is item k a good example of concept X?," by means of an orthogonal projection operator M<sub>k</sub>. Each self-adjoint operator in the Hilbert space H has a spectral decomposition on {M<sub>k</sub> | k = 1, ..., 24}, where each M<sub>k</sub> is the projector corresponding to item k from the list of 24 items in **Table 1**. A priori we set no restrictions on the dimension of the complex Hilbert space, and thus neither on the projection space of the operators M<sub>k</sub>. Each separate concept Fruits and Vegetables is now represented by its proper state vector |A⟩ and |B⟩ respectively, while their disjunction Fruits or Vegetables is realized by their equiponderous superposition (1/√2)(|A⟩ + |B⟩). It is precisely this feature of the model, its ability to represent combined concepts as superposed states, that provides the interferential composition of what could not be classically composed using sets.

Following the standard rule of average outcome values of quantum theory, the probabilities µ<sub>k</sub>(A), µ<sub>k</sub>(B) and µ<sub>k</sub>(A or B) are given by:

$$
\mu\_k(A) = \langle A | M\_k | A \rangle \tag{3}
$$

$$
\mu\_k(B) = \langle B | M\_k | B \rangle \tag{4}
$$

$$
\mu\_k(A \text{ or } B) = \frac{\langle A | + \langle B |}{\sqrt{2}} M\_k \frac{|A\rangle + |B\rangle}{\sqrt{2}} \tag{5}
$$

After a straightforward calculation, the membership probability expression µ<sub>k</sub>(A or B) becomes:

$$
\begin{aligned}
\mu\_k(A \text{ or } B) &= \frac{1}{2} \left( \langle A | M\_k | A \rangle + \langle A | M\_k | B \rangle + \langle B | M\_k | A \rangle + \langle B | M\_k | B \rangle \right) \\
&= \frac{1}{2} \left( \mu\_k(A) + \mu\_k(B) \right) + \mathfrak{R} \langle A | M\_k | B \rangle
\end{aligned} \tag{6}
$$

where ℜ takes the real part of ⟨A|M<sub>k</sub>|B⟩. This expression shows the contribution of the interference term ℜ⟨A|M<sub>k</sub>|B⟩ in µ<sub>k</sub>(A or B) with respect to the "classical average" term ½(µ<sub>k</sub>(A) + µ<sub>k</sub>(B)). It consists of the real part of the complex probability amplitude of the k-th item in Vegetables (concept |B⟩) to be the one in Fruits (concept |A⟩).
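Equation (6) can be checked numerically. The sketch below assumes a hypothetical 6-dimensional space, random state vectors, and an arbitrarily chosen projector; none of these are the vectors or projectors of the actual model. It only illustrates that the superposition probability equals the classical average plus the real part of the cross term.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6  # hypothetical small dimension; the model in the text uses 25

def random_unit_vector(dim):
    v = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    return v / np.linalg.norm(v)

A = random_unit_vector(n)
# Make B orthogonal to A (Equation 7) by one Gram-Schmidt step, then renormalize.
B = random_unit_vector(n)
B = B - np.vdot(A, B) * A
B = B / np.linalg.norm(B)

# Orthogonal projector M_k onto the first two basis directions (arbitrary choice).
M_k = np.zeros((n, n), dtype=complex)
M_k[0, 0] = M_k[1, 1] = 1.0

def expect(bra, op, ket):
    """Return <bra| op |ket>; np.vdot conjugates its first argument."""
    return np.vdot(bra, op @ ket)

mu_A = expect(A, M_k, A).real
mu_B = expect(B, M_k, B).real
superposed = (A + B) / np.sqrt(2)
mu_A_or_B = expect(superposed, M_k, superposed).real
interference = expect(A, M_k, B).real

# Right-hand side of Equation (6): classical average plus interference term.
rhs = 0.5 * (mu_A + mu_B) + interference
print(np.isclose(mu_A_or_B, rhs))
```

The orthogonality of A and B is what keeps the superposed vector normalized; the identity in Equation (6) itself relies only on M_k being self-adjoint.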

The quantum concept model imposes the orthogonality of the state vectors corresponding to different concepts. Therefore, we have for the states of Fruits and Vegetables,

$$
\langle A|B \rangle = 0. \tag{7}
$$

The projector M<sub>k</sub> of each different item also provides an orthogonal projection space. Since the conceptual disjunction Fruits or Vegetables spans a subspace of 2 dimensions in the complex Hilbert space (along the rays of |A⟩ and |B⟩), we allow for a complex 2-dimensional subspace for each item. This brings the dimension of the complex Hilbert space to 48. However, we will choose the unit vectors of these subspaces in such a way as to eliminate redundant dimensions whenever possible. Each category vector is built on orthogonal unit vectors, defined by the projection operators M<sub>k</sub>, i.e., we define |e<sub>k</sub>⟩ as the unit vector along M<sub>k</sub>|A⟩, and |f<sub>k</sub>⟩ as the unit vector along M<sub>k</sub>|B⟩. Thus, each item is now represented by a vector in the subspace spanned by |e<sub>k</sub>⟩ and |f<sub>k</sub>⟩. Due to the orthogonality of the projectors M<sub>k</sub>, we have

$$
\langle e\_k | f\_l \rangle = \delta\_{kl} c\_k e^{i\gamma\_k} \tag{8}
$$

where the Kronecker δ<sub>kl</sub> = 1 for equal indices and zero otherwise, i.e., different item states are orthogonal as well, and c<sub>k</sub> expresses the angle between the two unit vectors |e<sub>k</sub>⟩ and |f<sub>k</sub>⟩ of each 2-dimensional subspace of item k. Notice that should some c<sub>k</sub> be 1, then the required dimension of the complex Hilbert space diminishes by 1, since the vectors |e<sub>k</sub>⟩ and |f<sub>k</sub>⟩ then coincide, a property that we will use to minimize the size of the required Hilbert space. Should c<sub>k</sub> be different from 1, then |e<sub>k</sub>⟩ and |f<sub>k</sub>⟩ span a subspace of 2 dimensions. The state vectors |A⟩ and |B⟩ of the concepts can then be expressed as a superposition of the vectors |e<sub>k</sub>⟩ and |f<sub>k</sub>⟩ for the items:

$$|A\rangle = \sum\_{k=1}^{24} a\_k e^{i\alpha\_k} |e\_k\rangle, \quad |B\rangle = \sum\_{k=1}^{24} b\_k e^{i\beta\_k} |f\_k\rangle \tag{9}$$

where a<sub>k</sub>, b<sub>k</sub>, α<sub>k</sub>, β<sub>k</sub> ∈ ℝ.

TABLE 1 | Membership collapse probability values µ<sub>k</sub>(X) of 24 items for the categories Fruits, Vegetables, and Fruits or Vegetables (Hampton, 1988b).

*Notice also that the membership collapse probabilities for* Mustard *and* Pumpkin *still show the mark of double underextension of the disjunction. Membership collapse probability data with δµ ≈ 10<sup>−4</sup> entail phase data δφ ≈ 2 · 10<sup>−1</sup> and lambda data δλ ≈ 4 · 10<sup>−4</sup>.*

We can express their inner product as follows:

$$\begin{aligned} \langle A|B \rangle &= \left(\sum\_{k=1}^{24} a\_k e^{-i\alpha\_k} \langle e\_k| \right) \left(\sum\_{l=1}^{24} b\_l e^{i\beta\_l} |f\_l\rangle \right) \\ &= \sum\_{k=1}^{24} a\_k b\_k c\_k e^{i(\beta\_k - \alpha\_k + \gamma\_k)} = \sum\_{k=1}^{24} a\_k b\_k c\_k e^{i\phi\_k} \end{aligned}$$

where we have defined the phase φ<sub>k</sub> as φ<sub>k</sub> := β<sub>k</sub> − α<sub>k</sub> + γ<sub>k</sub> in the last step. The membership probabilities given in Equations (3) and (4) and the interference terms in Equation (6) can be expanded on the projection spaces of the items:

$$
\mu\_k(A) = \left(\sum\_{l=1}^{24} a\_l e^{-i\alpha\_l} \langle e\_l | \right) \left( a\_k e^{i\alpha\_k} | e\_k \rangle \right) = a\_k^2 \tag{10}
$$

$$
\mu\_k(B) = \left(\sum\_{l=1}^{24} b\_l e^{-i\beta\_l} \langle f\_l | \right) \left( b\_k e^{i\beta\_k} | f\_k \rangle \right) = b\_k^2 \tag{11}
$$

$$\begin{aligned} \langle A|M\_k|B\rangle &= \left(\sum\_{l=1}^{24} a\_l e^{-i\alpha\_l} \langle e\_l| \right) M\_k \left(\sum\_{m=1}^{24} b\_m e^{i\beta\_m} |f\_m\rangle \right) \\ &= a\_k b\_k e^{i(\beta\_k - \alpha\_k)} \langle e\_k|f\_k\rangle = a\_k b\_k c\_k e^{i\phi\_k} \end{aligned} \tag{12}$$

Notice that the phase of the k-th component of the conceptual disjunction is not at play in the interference term ⟨A|M<sub>k</sub>|B⟩ (Equation 6). Taking the real part of the interference term in Equation (12), we can rewrite the membership probability of the disjunction in Equation (6) as follows:

$$\mu\_k(A \text{ or } B) = \frac{1}{2} \left( \mu\_k(A) + \mu\_k(B) \right) + c\_k \sqrt{\mu\_k(A)\mu\_k(B)} \cos \phi\_k \tag{13}$$

Rearranging this equation, we see that φ<sub>k</sub> must satisfy

$$\cos \phi\_k = \frac{\mu\_k(A \text{ or } B) - \frac{1}{2}(\mu\_k(A) + \mu\_k(B))}{c\_k \sqrt{\mu\_k(A)\mu\_k(B)}} \tag{14}$$

Since all the membership probabilities on the right side are fixed, the only remaining free parameters are the coefficients c<sub>k</sub>. These parameters must now be tuned in order to satisfy the orthogonality of |A⟩ and |B⟩. Using the expansion on the unit vector sets {|e<sub>k</sub>⟩}, {|f<sub>k</sub>⟩} we obtain for their orthogonality (Equation 7):

$$\sum\_{k=1}^{24} c\_k \sqrt{\mu\_k(A) \mu\_k(B)} \cos \phi\_k = 0, \tag{15}$$

$$\sum\_{k=1}^{24} c\_k \sqrt{\mu\_k(A) \mu\_k(B)} \sin \phi\_k = 0. \tag{16}$$

The "cosine sum" (Equation 15) is automatically satisfied: substituting the expression for cos φ<sub>k</sub> from Equation (14) turns each summand into µ<sub>k</sub>(A or B) − ½(µ<sub>k</sub>(A) + µ<sub>k</sub>(B)), and by the normalization of the membership probabilities (Equation 2) these terms sum to 1 − ½(1 + 1) = 0. The "sine sum" equation still needs to be satisfied. With the defining relation (Equation 14) of φ<sub>k</sub>, and sin φ<sub>k</sub> = ε<sub>k</sub>√(1 − cos<sup>2</sup> φ<sub>k</sub>), where ε<sub>k</sub> = ±1 provides the sign, this becomes<sup>1</sup>

$$\sum\_{k=1}^{24} \epsilon\_k \sqrt{c\_k^2 \mu\_k(A) \mu\_k(B) - \left(\mu\_k(A \text{ or } B) - \frac{1}{2} (\mu\_k(A) + \mu\_k(B))\right)^2} = 0. \tag{17}$$

In order to satisfy this equation a simple algorithm was devised (Aerts, 2009a). For convenience of notation we denote the square root expression, with c<sub>k</sub> = 1, by a separate symbol:

$$
\lambda\_k \colon= \sqrt{\mu\_k(A)\mu\_k(B) - (\mu\_k(A \text{ or } B) - \frac{1}{2}(\mu\_k(A) + \mu\_k(B)))^2}.\tag{18}
$$

First, we order the values λ<sub>k</sub> from large to small and then assign a sign ε<sub>k</sub> to each of them in such a way that each successive partial sum (increasing index) is kept as small as possible. The λ-ranking with the corresponding values has been tabulated in **Table 1**. We assign index m to the item with the largest λ-value. In the present case, the item Tomato has the largest value, 0.07679.

We now adopt an optimized complex Hilbert space for our model in which c<sub>k</sub> = 1 (k ≠ m), which reduces the space to 25 complex dimensions. We again note that all items except Tomato receive a 1-dimensional complex subspace, while Tomato is represented by a 2-dimensional subspace. The "sine sum" equation in Equation (17) can be written as

$$\sum\_{k=1, k \neq m}^{24} \epsilon\_k \lambda\_k + \epsilon\_m \sqrt{c\_m^2 \mu\_m(A) \mu\_m(B) - \left(\mu\_m(A \text{ or } B) - \frac{1}{2} (\mu\_m(A) + \mu\_m(B))\right)^2} = 0. \tag{19}$$

<sup>1</sup>The cosine value only defines the phase up to its absolute value |φ<sub>k</sub>|. Thus, the sign of the sine value is undefined. If ε<sub>k</sub> = −1, then φ<sub>k</sub> = −|φ<sub>k</sub>|.

Next, we define the partial sum of the λ<sub>k</sub> according to a scheme of signs ε<sub>k</sub> such that, going from large to small, the next ε<sub>k</sub>λ<sub>k</sub> is added so as to make the sum smaller but not negative.

$$S\_j = \sum\_{i=1}^{j} \epsilon\_i \lambda\_i \quad (\lambda\_i \text{ ordered by size}) \tag{20}$$

$$S\_{j+1} = S\_j - \lambda\_{j+1} \text{ and } \epsilon\_{j+1} = -1, \text{ if } S\_j - \lambda\_{j+1} \ge 0 \tag{21}$$

$$S\_{j+1} = S\_j + \lambda\_{j+1} \text{ and } \epsilon\_{j+1} = +1, \text{ if } S\_j - \lambda\_{j+1} < 0 \tag{22}$$

The first summand is thus λ<sub>m</sub>, with ε<sub>m</sub> = +1. Finally this procedure leads to

$$S\_{24} = \sum\_{k=1}^{24} \epsilon\_k \lambda\_k \ge 0$$

In the Fruits and Vegetables example with membership probability data in **Table 1**, this procedure gives:

$$S\_{24} = 0.0154\tag{23}$$
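The sign-assignment scheme of Equations (20) to (22) can be sketched as a short greedy loop. The function name `assign_signs` and the five λ values below are hypothetical and serve only as an illustration; they are not the Table 1 data, so the resulting sum differs from the value in Equation (23).

```python
def assign_signs(lambdas):
    """Greedy sign scheme of Equations (20)-(22).

    Returns (signs, partial_sum), with signs given in the original item order.
    """
    # Visit items from the largest lambda to the smallest.
    order = sorted(range(len(lambdas)), key=lambda i: lambdas[i], reverse=True)
    signs = [0] * len(lambdas)
    partial_sum = 0.0
    for position, i in enumerate(order):
        if position > 0 and partial_sum - lambdas[i] >= 0:
            signs[i] = -1   # Equation (21): subtract while the sum stays non-negative
            partial_sum -= lambdas[i]
        else:
            signs[i] = +1   # Equation (22), and the first (largest) summand
            partial_sum += lambdas[i]
    return signs, partial_sum

# Hypothetical lambda values for five items (not the Table 1 data).
lambdas = [0.020, 0.077, 0.031, 0.050, 0.012]
signs, S = assign_signs(lambdas)

print(signs)
print(S)
```

By construction the final sum is non-negative, and the item with the largest λ always receives the sign +1, which is the property used when c<sub>m</sub> is fixed afterwards.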

In general the "sine sum" equation then becomes

$$S\_{24} - \lambda\_m + \sqrt{c\_m^2 \mu\_m(A)\mu\_m(B) - \left(\mu\_m(A \text{ or } B) - \frac{1}{2} (\mu\_m(A) + \mu\_m(B))\right)^2} = 0. \tag{24}$$

From this we can fix c<sub>m</sub>, the only remaining c<sub>k</sub> not equal to 1:

$$c\_m = \sqrt{\frac{(S\_{24} - \lambda\_m)^2 + \left(\mu\_m(A \text{ or } B) - \frac{1}{2}(\mu\_m(A) + \mu\_m(B))\right)^2}{\mu\_m(A)\mu\_m(B)}} \tag{25}$$

In the present example we obtain the value c<sub>m</sub> = 0.8032. We thus have fixed the inner product, or "angle," of the vectors |e<sub>m</sub>⟩ and |f<sub>m</sub>⟩, and can now write an explicit representation in the canonical 25-dimensional complex Hilbert space ℂ<sup>25</sup>. We can take M<sub>k</sub>(H) to be rays of dimension 1 for k ≠ m, and M<sub>m</sub>(H) to be a 2-dimensional plane spanned by the vectors |e<sub>m</sub>⟩ and |f<sub>m</sub>⟩.

We let the space ℂ<sup>25</sup> be spanned by the canonical base {**1**<sub>i</sub>}, i ∈ [1 ... 25]. All items k ≠ m are represented by the respective **1**<sub>i</sub>, while for k = m we express the projections of |A⟩ and |B⟩ by M<sub>m</sub>(H) accordingly on **1**<sub>m</sub> and **1**<sub>25</sub>:

$$a\_m e^{i\alpha\_m} |e\_m\rangle = \tilde{a}\_m e^{i\alpha\_{m\_1}} \mathbf{1}\_m + \tilde{a}\_{25} e^{i\alpha\_{m\_2}} \mathbf{1}\_{25} \tag{26}$$

$$b\_m e^{i\beta\_m} |f\_m\rangle = \tilde{b}\_m e^{i\beta\_{m\_1}} \mathbf{1}\_m + \tilde{b}\_{25} e^{i\beta\_{m\_2}} \mathbf{1}\_{25} \tag{27}$$

with ã<sub>m</sub>, b̃<sub>m</sub>, ã<sub>25</sub>, b̃<sub>25</sub> ∈ ℝ to be specified. The parameters in these expressions should satisfy the inner product (Equation 8) for k, l = m

$$a\_m b\_m \langle e\_m | f\_m \rangle = \tilde{a}\_m \tilde{b}\_m e^{-i(\alpha\_{m\_1} - \beta\_{m\_1})} + \tilde{a}\_{25} \tilde{b}\_{25} e^{-i(\alpha\_{m\_2} - \beta\_{m\_2})} \tag{28}$$

$$= a\_m b\_m c\_m e^{i(\gamma\_m - \alpha\_m + \beta\_m)} \tag{29}$$

and the probability weights for k = m

$$a\_m^2 = \tilde{a}\_m^2 + \tilde{a}\_{25}^2,\tag{30}$$

$$b\_m^2 = \tilde{b}\_m^2 + \tilde{b}\_{25}^2. \tag{31}$$

Finally, the representation of all vectors of the items can now be rendered explicit by simply choosing α<sub>k</sub> = γ<sub>k</sub> = 0, and thus β<sub>k</sub> = φ<sub>k</sub>, ∀k. A further simplification for Tomato is made by setting ã<sub>25</sub> = 0, which also allows the free choice β<sub>m<sub>2</sub></sub> = 0. Then ã<sub>m</sub> = a<sub>m</sub>, b̃<sub>m</sub> = b<sub>m</sub>c<sub>m</sub>, and b̃<sub>25</sub> = b<sub>m</sub>√(1 − c<sub>m</sub><sup>2</sup>). We have rendered explicit these membership probabilities and phases in **Table 1**. Thus we can write the vectors |A⟩ and |B⟩ in the Hilbert space ℂ<sup>25</sup> corresponding to the categories Fruits and Vegetables respectively.
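The choices just described can be checked numerically. The sketch below builds the two components of each projected vector on (**1**<sub>m</sub>, **1**<sub>25</sub>) and verifies the norm condition of Equation (31) and the modulus and phase of the overlap. It assumes a<sub>m</sub><sup>2</sup> = µ<sub>m</sub>(A), takes β<sub>m<sub>1</sub></sub> = φ<sub>m</sub> as implied by α<sub>m</sub> = γ<sub>m</sub> = 0, and uses hypothetical values for b<sub>m</sub> and φ<sub>m</sub>; only c<sub>m</sub> = 0.8032 and the Tomato weight 0.0881 are quoted from the text.

```python
import cmath
import math

def plane_components(a_m, b_m, c_m, phi_m):
    """Components on (1_m, 1_25) with alpha = gamma = 0 and a~_25 = 0."""
    a_vec = (complex(a_m, 0.0), 0j)
    b_vec = (b_m * c_m * cmath.exp(1j * phi_m),
             complex(b_m * math.sqrt(1.0 - c_m ** 2), 0.0))
    return a_vec, b_vec

def inner(u, v):
    """Hilbert-space inner product, conjugate-linear in the first argument."""
    return sum(x.conjugate() * y for x, y in zip(u, v))

a_m = math.sqrt(0.0881)       # assumes a_m^2 equals the Tomato weight for Fruits
b_m = math.sqrt(0.0555)       # hypothetical placeholder
c_m = 0.8032                  # value obtained in the text
phi_m = math.radians(100.0)   # hypothetical placeholder

a_vec, b_vec = plane_components(a_m, b_m, c_m, phi_m)
overlap = inner(a_vec, b_vec)
print(abs(overlap), a_m * b_m * c_m)
```

The two printed numbers should agree, which is the statement that the overlap has modulus a<sub>m</sub>b<sub>m</sub>c<sub>m</sub>.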


$$|B\rangle = (\ldots,\ 0.1552). \tag{33}$$

This completes the quantum model for the membership probability of items with respect to Fruits, Vegetables and Fruits or Vegetables. It captures the enigmatic aspects of conceptual overextension and underextension identified in Hampton (1988b), explaining them in terms of genuine quantum phenomena.

Recalling the terminology adopted in Section 2, the unit vectors |A⟩ and |B⟩ in Equations (32) and (33) represent the ground states of the concepts Fruits and Vegetables, respectively. Equivalently, these unit vectors represent the prototypes of the concepts Fruits and Vegetables in prototype theory. The unit vector (1/√2)(|A⟩ + |B⟩) instead represents the "contextualized prototype" obtained by combining the prototypes of Fruits and Vegetables in the disjunction Fruits or Vegetables. If one now looks at Equation (6), one sees that the prototypes Fruits and Vegetables interfere in the disjunction Fruits or Vegetables, and the term ℜ⟨A|M<sub>k</sub>|B⟩ in Equation (6) specifies how much interference is present when the membership probability of k is measured.

# 4. AN ILLUSTRATION OF INTERFERING PROTOTYPES

In this section we provide an illustration of contextual interfering prototypes. It is not a complete mathematical representation as presented in Section 3 but, rather, an illustration that can help the reader with a non-technical background to have an intuitive picture of what a contextual prototype is and how contextual prototypes interfere. Consider the concepts Fruits, Vegetables and their disjunction Fruits or Vegetables. The contextual prototype of Fruits can be represented by the x-axis of a plane surrounded by a cloud containing items, features, etc., that is, all the contextual elements connected with the prototype of Fruits. Similarly, the contextual prototype of Vegetables can be represented by the y-axis of the same plane surrounded by a cloud containing items, features, etc., that is, all the contextual elements connected with the prototype of Vegetables. How can we represent the contextual prototype of the disjunction Fruits or Vegetables? Although, as we have seen, it cannot be represented in traditional fuzzy set theory, it can be represented in terms of waves, with peaks and troughs. Indeed, waves can be summed in such a way that the peaks and troughs of the combined wave reproduce the overextension and underextension of the data. In other words, waves provide an intuitive geometric illustration of the interference taking place when contextual prototypes are combined in concept combination as discussed in Section 3. For example, let us demonstrate the interference of the item Almond when its membership probability with respect to the disjunction Fruits or Vegetables is calculated based on its membership probabilities for Fruits and for Vegetables. The membership probabilities for the categories Fruits, Vegetables and Fruits or Vegetables have been calculated from Hampton's data and are reported in **Table 1**.

The idea of an illustration would be to show that in addition to "fuzziness" (as modeled using a fuzzy set-theoretic approach) there is a "wave structure." How can we graphically represent this "wave structure" of a concept? We start from the standard interference formula of quantum theory, which is the following. For an arbitrary item k we have

$$
\mu\_k(A \text{ or } B) = \frac{1}{2} (\mu\_k(A) + \mu\_k(B)) + c\_k \sqrt{\mu\_k(A)\mu\_k(B)} \cos \phi\_k. \tag{34}
$$

Now, we have

$$
\phi\_k = \beta\_k - \alpha\_k + \gamma\_k \tag{35}
$$

where α<sub>k</sub> is the phase angle connected with µ<sub>k</sub>(A), β<sub>k</sub> the phase angle connected to µ<sub>k</sub>(B), and γ<sub>k</sub> the phase angle connected to ⟨A|M<sub>k</sub>|B⟩. This has not yet been emphasized, but if one analyses the rest of the construction in Hilbert space, it is possible to see that one can always choose γ<sub>k</sub> = 0, which means that, with this choice, φ<sub>k</sub> becomes the difference between the phases β<sub>k</sub> and α<sub>k</sub>.

This is all we need to represent the "wave" nature of a concept in a manner analogous to that of quantum theory. Indeed, it is the "phase difference" between the waves (their phases being α<sub>k</sub> and β<sub>k</sub>, respectively) that we attach to µ(A)<sub>k</sub> and µ(B)<sub>k</sub>. Together with the membership probabilities µ(A)<sub>k</sub> and µ(B)<sub>k</sub>, these phases determine the interference that gives rise to the measured data for µ(A or B)<sub>k</sub>.

The c<sub>k</sub> are chosen such that only for the biggest value of λ<sub>k</sub>, which in this case is that of Tomato, c<sub>k</sub> is different from 1. This single choice different from 1, for Tomato, does not change the fact that φ<sub>k</sub> is the difference between β<sub>k</sub> and α<sub>k</sub> when we choose γ<sub>k</sub> = 0. Let us consider, for example, the first item, Almond, of the list of 24 in **Table 1**. We have

$$
\mu(A)\_1 = 0.0359 \,\tag{36}
$$

$$
\mu(B)\_1 = 0.0133 \tag{37}
$$

$$
\mu(A \text{ or } B)\_1 = 0.0269 \tag{38}
$$

These are the data measured by Hampton, and they are what exists for the concepts Fruits, Vegetables and their combination Fruits or Vegetables, with respect to the membership probability of the item Almond, in the realm where fuzzy set probability appears. These are the values that do not fit into a model in this realm, and for which an underlying wave-like realm is necessary. Calculating the angle φ<sub>1</sub> we get

$$
\phi\_1 = 84.0^\circ \tag{39}
$$

(see **Table 1**). This angle is the result of a wave being present underneath the fuzzy, probability realm for µ(A)<sub>1</sub> and µ(B)<sub>1</sub>, such that both waves give rise to a difference in phase (where the crests of one wave meet the troughs of the other) which is equal to β<sub>1</sub> − α<sub>1</sub>, and is the value of φ<sub>1</sub>. This can be represented graphically by attaching a wave pattern to µ(A)<sub>1</sub> and another one to µ(B)<sub>1</sub>, such that both have a phase difference of 84.0° (see also **Figure 1**).
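The value in Equation (39) can be reproduced by solving Equation (34) for φ<sub>k</sub>. The short Python sketch below does this for Almond with c<sub>1</sub> = 1, using the three probabilities of Equations (36) to (38); the function name is an illustrative choice, not part of the original model description.

```python
import math

def interference_angle_deg(mu_a, mu_b, mu_a_or_b, c_k=1.0):
    """Solve Equation (34) for the phase angle phi_k, returned in degrees."""
    interference = mu_a_or_b - 0.5 * (mu_a + mu_b)
    cos_phi = interference / (c_k * math.sqrt(mu_a * mu_b))
    return math.degrees(math.acos(cos_phi))

# Almond: Equations (36)-(38)
phi_1 = interference_angle_deg(0.0359, 0.0133, 0.0269)
print(round(phi_1, 1))  # → 84.0
```

An angle close to 90° means that the interference term is small: for Almond the measured disjunction value lies only slightly above the average of the two separate membership probabilities.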

Let us apply quantum theory to each of the items separately. Each item k has a Schrödinger wave function vibrating in the neighborhood of A, another one vibrating in the neighborhood of B and a third vibrating in the neighborhood of "A or B," and they are related by superposition. We have:

$$
\psi\_k^A = \sqrt{\mu\_k(A)} e^{i\alpha\_k} \tag{40}
$$

$$
\psi\_k^B = \sqrt{\mu\_k(B)} e^{i\beta\_k} \tag{41}
$$

$$
\psi\_k^{A\text{or}B} = \sqrt{\mu\_k(A \text{ or } B)} e^{i\delta\_k} \tag{42}
$$

In each case, this gives us the membership probabilities. Squaring (multiplying by its complex conjugate), we have

$$
\langle \psi\_k^A | \psi\_k^A \rangle = (\psi\_k^A)^\* (\psi\_k^A) = \left(\sqrt{\mu\_k(A)} e^{i\alpha\_k}\right)^\* \sqrt{\mu\_k(A)} e^{i\alpha\_k} = \sqrt{\mu\_k(A)} e^{-i\alpha\_k} \sqrt{\mu\_k(A)} e^{i\alpha\_k} = \mu\_k(A) e^{i(\alpha\_k - \alpha\_k)} = \mu\_k(A) \tag{43}
$$

$$
\langle \psi\_k^B | \psi\_k^B \rangle = (\psi\_k^B)^\* (\psi\_k^B) = \left(\sqrt{\mu\_k(B)} e^{i\beta\_k}\right)^\* \sqrt{\mu\_k(B)} e^{i\beta\_k} = \sqrt{\mu\_k(B)} e^{-i\beta\_k} \sqrt{\mu\_k(B)} e^{i\beta\_k} = \mu\_k(B) e^{i(\beta\_k - \beta\_k)} = \mu\_k(B) \tag{44}
$$

$$
\langle \psi\_k^{A\text{or}B} | \psi\_k^{A\text{or}B} \rangle = (\psi\_k^{A\text{or}B})^\* (\psi\_k^{A\text{or}B}) = \left(\sqrt{\mu\_k(A \text{ or } B)} e^{i\delta\_k}\right)^\* \sqrt{\mu\_k(A \text{ or } B)} e^{i\delta\_k} = \sqrt{\mu\_k(A \text{ or } B)} e^{-i\delta\_k} \sqrt{\mu\_k(A \text{ or } B)} e^{i\delta\_k} = \mu\_k(A \text{ or } B) e^{i(\delta\_k - \delta\_k)} = \mu\_k(A \text{ or } B) \tag{45}
$$

If we write the quantum superposition equation for each item we get

$$\frac{1}{\sqrt{2}}(\psi\_k^A + \psi\_k^B) = \psi\_k^{A \text{or} B} \tag{46}$$

$$\Leftrightarrow \frac{1}{\sqrt{2}} \left( \sqrt{\mu\_k(A)} e^{i\alpha\_k} + \sqrt{\mu\_k(B)} e^{i\beta\_k} \right) = \sqrt{\mu\_k(A \text{ or } B)} e^{i\delta\_k} \tag{47}$$

where 1/√2 is a normalization factor. It is the squaring (i.e., multiplying each side by its complex conjugate) that gives rise to the interference equation. Let us do this explicitly to see it.

First we multiply the left-hand side by its complex conjugate. We carry out the multiplication explicitly, writing out each step, to see clearly how the interference formula appears. Hence, we have

$$\begin{split}
&\left( \frac{1}{\sqrt{2}} \left( \sqrt{\mu\_k(A)} e^{i\alpha\_k} + \sqrt{\mu\_k(B)} e^{i\beta\_k} \right) \right)^\* \left( \frac{1}{\sqrt{2}} \left( \sqrt{\mu\_k(A)} e^{i\alpha\_k} + \sqrt{\mu\_k(B)} e^{i\beta\_k} \right) \right) \\
&= \left( \frac{1}{\sqrt{2}} \left( \sqrt{\mu\_k(A)} e^{-i\alpha\_k} + \sqrt{\mu\_k(B)} e^{-i\beta\_k} \right) \right) \left( \frac{1}{\sqrt{2}} \left( \sqrt{\mu\_k(A)} e^{i\alpha\_k} + \sqrt{\mu\_k(B)} e^{i\beta\_k} \right) \right) \\
&= \frac{1}{2} \left( \sqrt{\mu\_k(A)} e^{-i\alpha\_k} + \sqrt{\mu\_k(B)} e^{-i\beta\_k} \right) \left( \sqrt{\mu\_k(A)} e^{i\alpha\_k} + \sqrt{\mu\_k(B)} e^{i\beta\_k} \right) \\
&= \frac{1}{2} \Big( \sqrt{\mu\_k(A)} e^{-i\alpha\_k} \cdot \sqrt{\mu\_k(A)} e^{i\alpha\_k} + \sqrt{\mu\_k(A)} e^{-i\alpha\_k} \cdot \sqrt{\mu\_k(B)} e^{i\beta\_k} \\
&\quad + \sqrt{\mu\_k(B)} e^{-i\beta\_k} \cdot \sqrt{\mu\_k(A)} e^{i\alpha\_k} + \sqrt{\mu\_k(B)} e^{-i\beta\_k} \cdot \sqrt{\mu\_k(B)} e^{i\beta\_k} \Big) \\
&= \frac{1}{2} \Big( \mu\_k(A) e^{i(\alpha\_k - \alpha\_k)} + \sqrt{\mu\_k(A)\mu\_k(B)} e^{i(\beta\_k - \alpha\_k)} \\
&\quad + \sqrt{\mu\_k(A)\mu\_k(B)} e^{-i(\beta\_k - \alpha\_k)} + \mu\_k(B) e^{i(\beta\_k - \beta\_k)} \Big)
\end{split}$$

We now use that e<sup>i(α<sub>k</sub>−α<sub>k</sub>)</sup> = e<sup>0</sup> = 1, e<sup>i(β<sub>k</sub>−β<sub>k</sub>)</sup> = e<sup>0</sup> = 1, e<sup>i(β<sub>k</sub>−α<sub>k</sub>)</sup> = cos(β<sub>k</sub> − α<sub>k</sub>) + i sin(β<sub>k</sub> − α<sub>k</sub>) and e<sup>−i(β<sub>k</sub>−α<sub>k</sub>)</sup> = cos(β<sub>k</sub> − α<sub>k</sub>) − i sin(β<sub>k</sub> − α<sub>k</sub>), to get the following

$$\begin{aligned} &= \frac{1}{2} \Big( \mu\_k(A) + \sqrt{\mu\_k(A)\mu\_k(B)} \left( \cos(\beta\_k - \alpha\_k) + i\sin(\beta\_k - \alpha\_k) \right) \\ &\quad + \sqrt{\mu\_k(A)\mu\_k(B)} \left( \cos(\beta\_k - \alpha\_k) - i\sin(\beta\_k - \alpha\_k) \right) + \mu\_k(B) \Big) \end{aligned}$$

Noting that the terms in i sin(β<sub>k</sub> − α<sub>k</sub>) cancel, we get

$$\begin{aligned} &= \frac{1}{2} \left( \mu\_k(A) + 2\sqrt{\mu\_k(A)\mu\_k(B)}\cos(\beta\_k - \alpha\_k) + \mu\_k(B) \right) \\ &= \frac{1}{2} (\mu\_k(A) + \mu\_k(B)) + \sqrt{\mu\_k(A)\mu\_k(B)}\cos(\beta\_k - \alpha\_k) \end{aligned} \tag{48}$$

Let us now multiply the right-hand side of Equation (46) by its complex conjugate. This gives

$$\left(\sqrt{\mu\_k(A \text{ or } B)} e^{i\delta\_k}\right)^\* \left(\sqrt{\mu\_k(A \text{ or } B)} e^{i\delta\_k}\right) = \mu\_k(A \text{ or } B) \tag{49}$$

Hence, as a consequence of squaring Equation (46), we get exactly our interference formula

$$\frac{1}{2}(\mu\_k(A) + \mu\_k(B)) + \sqrt{\mu\_k(A)\mu\_k(B)}\cos(\beta\_k - \alpha\_k) = \mu\_k(A \text{ or } B) \tag{50}$$

Note that the difference in phase β<sub>k</sub> − α<sub>k</sub> between the wave connected with the item k and A and the wave connected with the item k and B is what generates the interference. The phase δ<sub>k</sub> of the new wave, connected to the item k and A or B, is not influenced by it; it is the amplitude of this new wave that is affected. This is the reason that interference is visible in the realm where the fuzzy nature appears, while it is provoked by the realm where the waves occur.
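The derivation of Equation (50) can also be checked numerically with complex arithmetic. The following minimal Python sketch squares the superposition of Equation (46) directly and compares it with the closed-form interference formula, using the Almond values and a phase difference of 84.0°. Setting α<sub>1</sub> = 0 is an arbitrary illustrative choice, since only the difference β<sub>1</sub> − α<sub>1</sub> enters; the function names are likewise assumptions made for this example.

```python
import cmath
import math

def superposed_probability(mu_a, mu_b, alpha, beta):
    """|(psi_A + psi_B)/sqrt(2)|^2 computed directly from Equations (40), (41), (46)."""
    psi_a = math.sqrt(mu_a) * cmath.exp(1j * alpha)
    psi_b = math.sqrt(mu_b) * cmath.exp(1j * beta)
    psi_or = (psi_a + psi_b) / math.sqrt(2.0)
    return (psi_or.conjugate() * psi_or).real

def interference_formula(mu_a, mu_b, alpha, beta):
    """Left-hand side of Equation (50)."""
    return 0.5 * (mu_a + mu_b) + math.sqrt(mu_a * mu_b) * math.cos(beta - alpha)

alpha_1 = 0.0                  # arbitrary reference phase
beta_1 = math.radians(84.0)    # phase difference from Equation (39)
direct = superposed_probability(0.0359, 0.0133, alpha_1, beta_1)
closed_form = interference_formula(0.0359, 0.0133, alpha_1, beta_1)
print(f"{direct:.4f} {closed_form:.4f}")
```

Both numbers agree with each other to machine precision and with the measured µ(A or B)<sub>1</sub> = 0.0269 to four decimals. Shifting α<sub>1</sub> and β<sub>1</sub> by the same amount leaves the result unchanged, which illustrates that only the phase difference matters.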

We put forward this "wave nature" aspect of concepts not just as an illustration, but to help the reader understand the manner in which such an underlying wave structure substantially increases the possible ways in which concepts can interact, as compared to the interaction possibilities in a modeling with fuzzy set structures. Of course the notion of a "wave" only adds clarification if we can imagine it to exist in some space-like realm. This is the case for the type of waves we all know from our daily physical environment, such as water waves, sound waves or light waves. The quantum waves of physical quantum particles can also be made visible in general by looking at probabilistic detection patterns of these quantum particles on a physical screen, and noting the typical interference patterns when the waves interact and the particles are detected on the screen. One might believe that an analogous situation is not possible for concepts, because intuitively concepts, unlike quantum particles, do not exist "inside" space. If we look at things in an operational way, however, an analysis can be put forward for the quantum model of the combination of the two concepts, and the graded structure of collapse probability weights of the 24 items, which does illustrate the presence of an interference pattern, and as a consequence reveals the underlying wave structure of concepts and their interactions. Let us explain how we can proceed to accomplish such an analysis.

We start by considering **Figure 2**. We see there the 24 different items of **Table 1** represented by numbered spots in a plane with a graded pattern that starts with the lightest region around spot number 8, which is Apple, and systematically becomes darker. Different numbers of items are situated in regions of different darkness; for example, number 16, Lentils, is situated in the darkest region. Let us explain how the figure is constructed. The "intensity of light" of a specific region corresponds to the "weights of the items" with respect to the concept Fruits in **Table 1**. Looking at **Table 1**, it is indeed Apple that has the highest weight, equal to 0.1184, and hence is represented by spot number 8 on **Figure 2**, in the lightest region. Next comes Elderberry with weight equal to 0.1134, represented by spot number 7 on **Figure 2**, on the border of the lightest and second lightest region. Next comes Raisin, with weight equal to 0.1026, represented by spot number 6 on **Figure 2**, on the border of the third and the fourth lightest region. Next comes Tomato, with weight equal to 0.0881, represented by spot number 19 on **Figure 2**, in the seventh lightest region, and so on. Last is Lentils, with weight equal to 0.0095, represented by spot number 16 on **Figure 2**, in the darkest region. Hence **Figure 2** contains a representation of the values of the collapse probability weights of the 24 items with respect to the concept Fruits. There is, however, more: we can, for example, ask why a representation in a plane was chosen. To explain this, turn to **Figure 3**. Let us first note that, although it might not seem the case at first sight, all the numbered spots are located at exactly the same positions in both **Figures 2**, **3**, with respect to the two orthogonal axes that coordinate the plane.
What is different in the two figures is the graded structure of lighter to darker regions: while in **Figure 2** the regions are centered around spot number 8, representing the item Apple, in **Figure 3** they are centered around spot number 21, representing the item Broccoli. Indeed, **Figure 3** represents, analogously to **Figure 2**, the collapse probability weights of the same 24 items, but this time with respect to the concept Vegetables. This explains why in **Figure 3** the lightest region is the one centered around spot number 21, representing Broccoli, while the lightest region in **Figure 2** is the one centered around spot number 8, representing Apple. Indeed, Broccoli is the most characteristic vegetable of the considered items, while Apple is the most characteristic fruit, if "characteristic" is measured by the size of the respective collapse probability, i.e., the probability to choose this item in the course of the study. What might not seem obvious is that in a plane it is always possible to find 24 locations for the 24 items such that a graded structure with center Apple and a second graded structure with center Broccoli can be defined, fitting exactly also the other items in their correct value of "graded light to dark." Such a situation is what we show in **Figures 2**, **3**. It can be proven mathematically that a solution always exists, although not a unique one, which means that **Figures 2**, **3** show one of these solutions.

FIGURE 2 | The probabilities µ(A)<sub>k</sub> of a person choosing the item k as a "good example" of Fruits are fitted into a two-dimensional quantum wave function ψ<sub>A</sub>(x, y). The numbers are placed at the locations of the different items with respect to the Gaussian probability distribution |ψ<sub>A</sub>(x, y)|<sup>2</sup>. This can be seen as a light source shining through a hole centered on the origin, and regions where the different items are located. The brightness of the light source in a specific region corresponds to the probability that this item will be chosen as a "good example" of Fruits.

We have chosen on purpose the graded structure from light to dark to be colored yellow, because we can interpret **Figures 2**, **3** such that an interesting analogy arises between our study of the 24 items and two concepts Fruits and Vegetables, and the well-known double slit experiment with light in quantum mechanics. It is this analogy that will also directly illustrate the "wave nature" of concepts. Suppose we consider a plane figuring in the experiment as a detection screen, and put counters for quantum light particles, i.e., photons, at the numbered spots on the plane. Then we send light through a first slit, which we call the Fruits slit, placed in front of the screen. The slit is placed such that the counters in the spots detect numbers of photons whose fractions of the total number of photons sent equal the collapse probability weights of the items represented by the respective spots with respect to the concept Fruits. The light received on the screen would then look like what is shown in **Figure 2**. Similarly, with counters placed in the same spots, we send light through a second slit, which we call the Vegetable slit. Now the counters detect numbers of photons whose fractions of the total number of photons equal the collapse probability weights of the same items with respect to the concept Vegetables. The light received on the screen would then look like what is shown in **Figure 3**. We can obtain the same figures directly for our psychological study, consisting of each participant choosing amongst the 24 items the one that he or she finds most characteristic of Fruits and Vegetables respectively. The relative frequencies of the first choice give rise to the image in **Figure 2**, while the relative frequencies of the second choice give rise to the image in **Figure 3**, if, for example, we mark each chosen item by a fixed number of yellow light pixels on a computer screen.

Before we combine the two slits to give rise to interference, let us specify the mathematics of the quantum mechanical formalism that underlies the two figures. The situation can be represented quantum mechanically by complex valued Schrödinger wave functions of two real variables, ψ<sub>A</sub>(x, y) and ψ<sub>B</sub>(x, y). For the light and the two slits, this situation is the "interaction of a photon with the two slits." For the human participants in the concepts study, this situation is the "interaction with the two concepts of the mind of a participant." We choose for ψ<sub>A</sub>(x, y) and ψ<sub>B</sub>(x, y) quantum wave packets, such that the radial part for both wave packets is a Gaussian in two dimensions. Considering **Figures 2**, **3**, we choose the top of the first Gaussian in the origin where spot number 8 is located, and the top of the second Gaussian in the point (a, b), where spot number 21 is located. Hence

$$
\psi\_A(x, y) = \sqrt{D\_A} e^{-\left(\frac{x^2}{4\sigma\_{Ax}^2} + \frac{y^2}{4\sigma\_{Ay}^2}\right)} e^{i S\_A(x, y)}
$$

$$
\psi\_B(x, y) = \sqrt{D\_B} e^{-\left(\frac{(x - a)^2}{4\sigma\_{Bx}^2} + \frac{(y - b)^2}{4\sigma\_{By}^2}\right)} e^{i S\_B(x, y)}\tag{51}
$$

The phase parts of the wave packets, e<sup>iS<sub>A</sub>(x,y)</sup> and e<sup>iS<sub>B</sub>(x,y)</sup>, are determined by two phase fields S<sub>A</sub>(x, y) and S<sub>B</sub>(x, y), which will account for the interference and hence carry the wave nature. Of course, these phase parts vanish when we multiply each wave packet by its complex conjugate to find the connection with the collapse probabilities. Hence,

$$\begin{aligned} |\psi\_A(x, y)|^2 &= D\_A e^{-\left(\frac{x^2}{2\sigma\_{Ax}^2} + \frac{y^2}{2\sigma\_{Ay}^2}\right)} \\ |\psi\_B(x, y)|^2 &= D\_B e^{-\left(\frac{(x - a)^2}{2\sigma\_{Bx}^2} + \frac{(y - b)^2}{2\sigma\_{By}^2}\right)} \end{aligned}\tag{52}$$

are the Gaussians to be seen in **Figures 2, 3**, respectively. Let us denote by Δ<sub>k</sub> a small surface specifying the spot corresponding to the item number k in the plane of the two figures. We then calculate the collapse probabilities of this item k with respect to the concepts Fruits and Vegetables in a standard quantum mechanical way as follows

$$
\mu\_k(A) = \int\_{\Delta\_k} |\psi\_A(x, y)|^2 \, dx \, dy = \int\_{\Delta\_k} D\_A e^{-\left(\frac{x^2}{2\sigma\_{Ax}^2} + \frac{y^2}{2\sigma\_{Ay}^2}\right)} \, dx \, dy \tag{53}
$$

$$
\mu\_k(B) = \int\_{\Delta\_k} |\psi\_B(x, y)|^2 \, dx \, dy = \int\_{\Delta\_k} D\_B e^{-\left(\frac{(x - a)^2}{2\sigma\_{Bx}^2} + \frac{(y - b)^2}{2\sigma\_{By}^2}\right)} \, dx \, dy \tag{54}
$$

We can prove that the parameters of the Gaussians, D<sub>A</sub>, σ<sub>Ax</sub>, σ<sub>Ay</sub>, D<sub>B</sub>, σ<sub>Bx</sub>, σ<sub>By</sub>, can be determined in such a way that the above equations hold, and this is exactly what we have done for the images of **Figures 2, 3**, using an approximation for the integrals that we explain later.
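To give a feel for the integrals in Equations (53) and (54), the sketch below evaluates a Gaussian density of the form in Equation (52) and compares two estimates of the collapse probability over a small square of area Δ: the one-point estimate (Δ times the density at the center) and a finer midpoint quadrature. All Gaussian parameters and the item location are hypothetical placeholders, not the fitted values of **Table 2**; the area 0.01 is the value of Δ quoted later in the text.

```python
import math

def gaussian_density(x, y, D, sigma_x, sigma_y, cx=0.0, cy=0.0):
    """|psi(x, y)|^2 for a Gaussian wave packet centered at (cx, cy), as in Equation (52)."""
    exponent = (x - cx) ** 2 / (2 * sigma_x ** 2) + (y - cy) ** 2 / (2 * sigma_y ** 2)
    return D * math.exp(-exponent)

def collapse_probability(xk, yk, delta, **gauss):
    """One-point estimate: area times the density at the center of the square."""
    return delta * gaussian_density(xk, yk, **gauss)

def collapse_probability_quadrature(xk, yk, delta, n=50, **gauss):
    """Midpoint rule on an n-by-n grid over the square of area delta centered at (xk, yk)."""
    side = math.sqrt(delta)
    h = side / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            x = xk - side / 2 + (i + 0.5) * h
            y = yk - side / 2 + (j + 0.5) * h
            total += gaussian_density(x, y, **gauss) * h * h
    return total

# Hypothetical parameters and item location, for illustration only.
params = dict(D=1.0, sigma_x=1.0, sigma_y=1.5)
one_point = collapse_probability(0.5, 0.3, 0.01, **params)
fine_grid = collapse_probability_quadrature(0.5, 0.3, 0.01, **params)
print(one_point, fine_grid)
```

For a square this small relative to the Gaussian widths, the two estimates differ by well under one percent, which is why the one-point estimate is a reasonable stand-in for the integral.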

If we open both slits it will be the normalized superposition of the two wave packets that quantum mechanically describes the new situation

$$
\psi\_{A\text{or}B}(x, y) = \frac{1}{\sqrt{2}} (\psi\_A(x, y) + \psi\_B(x, y)) \tag{55}
$$

We have

$$
\begin{split}
\mu\_{k}(A\text{ or }B) &= \int\_{\Delta\_{k}} \psi\_{A\text{or}B}(x, y)^{\*} \psi\_{A\text{or}B}(x, y) \, dx \, dy \\
&= \frac{1}{2} \left( \int\_{\Delta\_{k}} \psi\_{A}(x, y)^{\*} \psi\_{A}(x, y) \, dx \, dy + \int\_{\Delta\_{k}} \psi\_{B}(x, y)^{\*} \psi\_{B}(x, y) \, dx \, dy \right) \\
&\quad + \int\_{\Delta\_{k}} \Re(\psi\_{A}(x, y)^{\*} \psi\_{B}(x, y)) \, dx \, dy \\
&= \frac{1}{2} (\mu\_{k}(A) + \mu\_{k}(B)) + \int\_{\Delta\_{k}} \Re(\psi\_{A}(x, y)^{\*} \psi\_{B}(x, y)) \, dx \, dy
\end{split} \tag{56}$$

Let us calculate ∫<sub>Δ<sub>k</sub></sub> ℜ(ψ<sub>A</sub>(x, y)<sup>∗</sup>ψ<sub>B</sub>(x, y)) dx dy. We have

$$\begin{split}
&\int\_{\Delta\_{k}} \Re(\psi\_{A}(x, y)^{\*}\psi\_{B}(x, y)) \, dx \, dy \\
&= \int\_{\Delta\_{k}} \left(\sqrt{D\_{A}}e^{-\left(\frac{x^{2}}{4\sigma\_{Ax}^{2}}+\frac{y^{2}}{4\sigma\_{Ay}^{2}}\right)}\right) \left(\sqrt{D\_{B}}e^{-\left(\frac{(x-a)^{2}}{4\sigma\_{Bx}^{2}}+\frac{(y-b)^{2}}{4\sigma\_{By}^{2}}\right)}\right) \Re(e^{-iS\_{A}(x, y)}e^{iS\_{B}(x, y)}) \, dx \, dy \\
&= \int\_{\Delta\_{k}} \sqrt{D\_{A}D\_{B}}e^{-\left(\frac{x^{2}}{4\sigma\_{Ax}^{2}}+\frac{(x-a)^{2}}{4\sigma\_{Bx}^{2}}+\frac{y^{2}}{4\sigma\_{Ay}^{2}}+\frac{(y-b)^{2}}{4\sigma\_{By}^{2}}\right)} \Re(e^{i(S\_{B}(x, y)-S\_{A}(x, y))}) \, dx \, dy \\
&= \int\_{\Delta\_{k}} \sqrt{D\_{A}D\_{B}}e^{-\left(\frac{x^{2}}{4\sigma\_{Ax}^{2}}+\frac{(x-a)^{2}}{4\sigma\_{Bx}^{2}}+\frac{y^{2}}{4\sigma\_{Ay}^{2}}+\frac{(y-b)^{2}}{4\sigma\_{By}^{2}}\right)} \cos(S\_{B}(x, y) - S\_{A}(x, y)) \, dx \, dy
\end{split} \tag{57}$$

We can hence rewrite Equation (56) in the following way

$$\int\_{\Delta\_k} f(x, y) \cos \theta(x, y) \, dx \, dy = f\_k \tag{58}$$

where

$$f(x, y) = \sqrt{D\_A D\_B} e^{-\left(\frac{x^2}{4\sigma\_{Ax}^2} + \frac{(x - a)^2}{4\sigma\_{Bx}^2} + \frac{y^2}{4\sigma\_{Ay}^2} + \frac{(y - b)^2}{4\sigma\_{By}^2}\right)}\tag{59}$$

is a known Gaussian-like function (recall that we have determined D<sub>A</sub>, D<sub>B</sub>, σ<sub>Ax</sub>, σ<sub>Ay</sub>, σ<sub>Bx</sub>, σ<sub>By</sub> and a and b in choosing the solution shown in **Figures 2, 3**), and

$$f\_k = \mu\_k(A \text{ or } B) - \frac{1}{2}(\mu\_k(A) + \mu\_k(B))\tag{60}$$

are constants for each k determined by the data, and we have introduced

$$\theta(x, y) = S\_B(x, y) - S\_A(x, y) \tag{61}$$

the field of phase differences of the two quantum wave packets. This field of phase differences determines the interference pattern, and it is the solution of the 24 nonlinear equations in Equation (58). This set of 24 equations cannot be solved exactly, and even a general numerical solution is not straightforwardly within reach of current optimization programs. We have introduced two steps of idealization to find a solution. First, we have looked for a solution where θ(x, y) is a sufficiently large polynomial in x and y, more specifically one consisting of 24 independent sub-polynomials:

$$\begin{aligned} \theta(x, y) &= F\_1 + F\_2 x + F\_3 y + F\_4 x^2 + F\_5 x y + F\_6 y^2 + F\_7 x^3 \\ &+ F\_8 x^2 y + F\_9 x y^2 + F\_{10} y^3 + F\_{11} x^4 + F\_{12} x^3 y \\ &+ F\_{13} x^2 y^2 + F\_{14} x y^3 + F\_{15} y^4 + F\_{16} x^5 + F\_{17} x^4 y \\ &+ F\_{18} x^3 y^2 + F\_{19} x^2 y^3 + F\_{20} x y^4 + F\_{21} y^5 + F\_{22} x^6 \\ &+ F\_{23} x^5 y + F\_{24} x^4 y^2 \end{aligned} \tag{62}$$

Secondly, we suppose that Δ<sub>k</sub> = Δ is a sufficiently small square surface such that a good approximation of the integral in Equation (58) is given by Δ times the value of the function under the integral in the center of Δ; this is also the approximation we have used for the integrals in Equations (53) and (54). This transforms the set of 24 nonlinear equations (Equation 58) into a set of 24 equations that become linear in the coefficients F<sub>1</sub>, ..., F<sub>24</sub> once the cosine is inverted at each point:

$$
\Delta f(x\_k, y\_k) \cos \theta(x\_k, y\_k) = f\_k \tag{63}
$$

We have solved them for the points (x<sub>k</sub>, y<sub>k</sub>) where the 24 items are located in **Figures 2, 3**, for Δ = 0.01, which gives us θ(x, y), and hence also the expression for |ψ<sub>AorB</sub>(x, y)|<sup>2</sup> containing the expected interference term, and we have

$$\begin{split} \left| \psi\_{A\text{or}B}(x, y) \right|^{2} &= \frac{1}{2} \left( \left| \psi\_{A}(x, y) \right|^{2} + \left| \psi\_{B}(x, y) \right|^{2} \right) \\ &+ \left| \psi\_{A}(x, y) \psi\_{B}(x, y) \right| \cos \theta(x, y) \end{split} \tag{64}$$
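The two idealization steps can be sketched in code. The example below is a reduced illustration rather than the authors' actual computation: it inverts the cosine of the one-point approximation at each item location and then solves the resulting linear system for the polynomial coefficients. It uses only 6 hypothetical item locations with the first 6 monomials of Equation (62), a hypothetical envelope f(x, y) and synthetic target phases, because the fitted parameters of **Table 2** are not reproduced here. All function names are assumptions made for this sketch.

```python
import math

def monomial_exponents():
    """Exponent pairs (p, q) of x^p y^q, ordered as F_1..F_24 in Equation (62)."""
    exps = [(d - j, j) for d in range(6) for j in range(d + 1)]  # degrees 0..5
    exps += [(6, 0), (5, 1), (4, 2)]                              # degree-6 terms
    return exps

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("item locations do not determine the coefficients")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def fit_phase_coefficients(points, f_values, envelope, delta):
    """Invert the cosine at each point, then solve for the first n coefficients."""
    exps = monomial_exponents()[:len(points)]
    thetas = [math.acos(f_k / (delta * envelope(x, y)))
              for (x, y), f_k in zip(points, f_values)]
    A = [[x ** p * y ** q for (p, q) in exps] for (x, y) in points]
    return solve_linear(A, thetas), exps

def theta(x, y, coeffs, exps):
    """Evaluate the fitted phase-difference polynomial."""
    return sum(F * x ** p * y ** q for F, (p, q) in zip(coeffs, exps))

# Hypothetical data: six locations, an assumed envelope, synthetic phases in (0, pi).
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.0, 2.0)]
envelope = lambda x, y: math.exp(-(x * x + y * y) / 8.0)
delta = 0.01
target_thetas = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
f_values = [delta * envelope(x, y) * math.cos(t)
            for (x, y), t in zip(points, target_thetas)]

coeffs, exps = fit_phase_coefficients(points, f_values, envelope, delta)
print([round(theta(x, y, coeffs, exps), 6) for (x, y) in points])
```

The arccosine returns values in [0, π], so this sketch fixes one branch of the phase at each point; other branch choices would give other admissible phase fields, consistent with the remark that the solution shown in the figures is not unique.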

FIGURE 4 | The probabilities µ(A or B)<sub>k</sub> of a person choosing the item k as a "good example" of Fruits or Vegetables are fitted into the two-dimensional quantum wave function (1/√2)(ψ<sub>A</sub>(x, y) + ψ<sub>B</sub>(x, y)), which is the normalized superposition of the wave functions in Figures 2, 3. The numbers are placed at the locations of the different exemplars with respect to the probability distribution ½|ψ<sub>A</sub>(x, y) + ψ<sub>B</sub>(x, y)|<sup>2</sup> = ½(|ψ<sub>A</sub>(x, y)|<sup>2</sup> + |ψ<sub>B</sub>(x, y)|<sup>2</sup>) + |ψ<sub>A</sub>(x, y)ψ<sub>B</sub>(x, y)| cos θ(x, y), where θ(x, y) is the quantum phase difference at (x, y). The values of θ(x, y) are given in Table 1 for the locations of the different items. The interference pattern is clearly visible.


*The first column lists the different items, and the second column the coordinates of their locations in* Figures 2*–*4*. The third column contains the orthogonal set of sub-polynomials used as first approximation for the phase field* θ(*x*, *y*)*, and the fourth column their values. The fifth and sixth columns contain the Gaussian parameters and their values of the solution.*

In **Figure 4** we have graphically represented this probability density |ψ<sub>AorB</sub>(x, y)|<sup>2</sup>. The interference pattern shown in **Figure 4** is very similar to well-known interference patterns of light passing through an elastic material under stress. In our case, it is the interference pattern corresponding to "Fruits or Vegetables" as a contextual, interfering prototype. The numerical values of the solutions represented in **Figures 2**–**4** are in **Table 2**.

We have thus completed our illustration of contextual interfering prototypes. It is, however, important to remember that this representation is at the subtle level of an illustration, while the real working representation of contextual interfering prototypes needs the complete quantum-mechanical formalism. It can be considered as a pre-representation, exactly as the wave-like representations by de Broglie and Schrödinger in the early days of quantum physics can be considered as useful prequantum representations that capture something of the wave aspects of microscopic particles.

# 5. DISCUSSION

In this paper we showed that a generalization of prototype theory can address the "Pet-Fish problem" and related combination issues. This was done by formalizing the effect of the cognitive context on the state of a concept using a SCoP formalism (Gabora and Aerts, 2002; Aerts and Gabora, 2005a,b; Gabora et al., 2008). We also developed a quantum-theoretic model in complex Hilbert space to show that, in this contextualized prototype theory, prototypes can interfere when concepts combine, as evidenced by data where typicality measurements are performed. This could then lead one to think that the general quantum approach to concepts only presupposes a (contextual) prototype theory. We now explain why this inference is not true.

Let us make more explicit the relationship between our quantum-conceptual approach and other concept theories, such as prototype theory, exemplar theory and theory theory. A deeper analysis shows that our approach is more than a contextual generalization of prototype theory. Roughly speaking, other theories make assumptions about the principles guiding the formation and intuitive representation of a concept in the human mind. Thus, prototype theory assumes that a concept is determined by a set of characteristic rather than defining features, that the human mind has a privileged prototype for each concept, and that the typicality of a concrete item is determined by its similarity with the prototype (Rosch, 1973, 1978, 1983). Exemplar theory assumes instead that a concept is not determined by a set of defining or characteristic features but, rather, by a set of salient instances of it stored in memory (Nosofsky, 1988, 1992). Theory theory assumes that concepts are determined by "minitheories" or schemata, identifying the causal relationships among properties (Murphy and Medin, 1985; Rumelhart and Norman, 1988). These theories have all mainly been preoccupied with the question of "what predominantly determines a concept." We agree on the relevance of this question, though it is not the main issue we focus on here. Transposed to our approach, these theories mainly investigate "what predominantly determines the state of a concept." Conversely, the main preoccupation of our approach has been to propose a theory with the following features:


We seek to achieve (i) and (ii) independent of the question that is the focus of other theories of concepts. More concretely, and in accordance with the results of investigations into the question of "what predominantly determines a concept," as far as prototype theory, exemplar theory and theory theory are concerned, we believe that all approaches are partially valid. The state of a concept, i.e., its capability of influencing the values of measurable semantic quantities, such as typicality and membership weight, is influenced by the set of its characteristic features, but also by salient exemplars in memory, and in a considerable number of cases—where more causal aspects are at play—mini-theories might be appropriate to express this state. It is important that "a conceptual state is defined and gives rise, together with the context, to the values of the measurable semantic quantities." The fact that the specification of these values can be only probabilistic is a confirmation that potentiality and uncertainty occur even if the state is completely known, hence quantum structures are intrinsically needed.

It follows from the above that resorting and giving new life to prototype theory does not necessarily entail that contextual prototype theory is the only possible theory of concepts for what concerns the question of "what predominantly determines a concept." However, we choose to identify our general approach as a "generalized contextual interfering prototype theory," because the "ground state" of a concept is a fundamental notion of the theory, and this ground state is what corresponds to the prototype. There is not a similar affinity with exemplar theory and theory theory. However, the conceptual state and its interaction with the cognitive context can potentially capture the other conceptual aspects, exemplars and schemata, which are instead predominant in alternative concept theories. In this respect, an interesting analogy must be emphasized. The quantum-theoretic approach only aims at modeling concepts and their combinations in a unitary and coherent mathematical formalism. We do not pretend to give a universal definition of what a concept is and how it forms. Using a known analogy in mathematics, we can say that the quantum-theoretic model is to a concept as a traditional Kolmogorov model is to a probability. A Kolmogorovian model specifies how a probability can be mathematically formalized independent of the definition of probability that is chosen (favorable over possible cases, large number limit of frequencies, subjective, etc.). Analogously, the quantum-theoretic framework for concepts enables mathematical modeling of conceptual entities independent of the definition that is adopted in a specific concept theory (prototype, exemplar, theory, etc.).

We conclude with an epistemological consideration. The quantum-theoretic framework presented here constitutes a step toward the elaboration of a general theory for the representation of any conceptual entity. Hence, it is not just a "cognitive model for typicality, membership weight or membership probability." Rather, we are investigating whether "quantum theory, in its Hilbert space formulation, is an appropriate theory to model human cognition." To understand what we mean by this let us consider an example taken from everyday life. As an example of a theory, we could introduce the theory of "how to make good clothes." A tailor needs to learn how to make good clothes for different types of people, men, women, children, old people, etc. Each cloth is a model in itself. Then, one can also consider intermediate situations where one has models of series of clothes. A specific body will not fit in any clothes: you need to adjust the parameters (length, size, etc.) to reach the desired fit. We think that a theory should be able to reproduce different experimental results by suitably adjusting the involved parameters, exactly as a theory of clothing. This is different from a set of models, even if the set can cope with a wide range of data.

There is a tendency, mainly in empirically-based disciplines, to be critical with respect to a theory that can cope with all possible situations it applies to. This is because the theory contains too many parameters, which may lead one to think that "any type of data can be modeled by allowing all these parameters to have different values." We agree that, in case we have to do with an "ad-hoc model," i.e., a model specially made for the circumstance of the situation it models, this suspicion is grounded. Adding parameters to such an ad-hoc model, or stretching the already contained parameters to other values, does not give rise to what we call a theory. On the other hand, a theory needs to be well defined, its rules, the allowed procedures, its theoretical, mathematical, and internal logical structure, "independent" of the structure of the models describing specific situations that can be coped with by the theory. Hence also the theory needs to contain a well defined description of "how to produce models for specific situations." Coming back to the theory of clothing, if a tailor knows the theory of clothing, obviously he or she can make a cloth for every human body, because the theory of clothing, although its structure is defined independently of a specific cloth, contains a prescription of how to apply it to any possible specific cloth. In this respect, we think that one should carefully distinguish between a model that is derived by a general theory, as the one presented in this paper, and a model specifically designed to test a number of experimental situations.

This brings us to the important question of the "predictive power" of existing quantum-theoretic models. Models derived from a theory will generally need more data, from a bigger set of experiments, to become predictive for the outcomes of experiments not yet performed than is the case for models that are more ad-hoc. The reason is that in principle such models (think of the analogy with the theory of clothing above) must be able to faithfully represent the data of all possible experiments that can be performed on the conceptual entity in the same state. A tailor knowing the theory of clothing can in principle make clothes for all human bodies, and hence can also predict outcomes of experiments not yet performed, e.g., the measure of a specific part of the cloth, if enough data of a set of experiments are available to the tailor, e.g., data that determine the possible types of clothes still fitting these data and as a consequence also determine the measure of this part of the cloth. In general in quantum cognition, the scarcity of data is preventing models from having systematic and substantial predictive power. One can wonder why, if predictive power is not yet predominantly available in the majority of existing quantum-theoretic models, so much attention and value is actually attributed to them. Answering this question allows us to clarify an aspect of quantum cognition that is not obvious and even makes it special in a specific way, at least provisionally until more data are available. The success of quantum cognition is due to it "being able to convincingly model data that theoretically can be proven to be impossible to model with any model that relies on classical fuzzy set theory and/or classical Kolmogorovian probability theory." Hence, a different criterion than predictive power is provisionally used to identify the success of quantum cognition.
Of course, as soon as more data are collected, the models will also be able to be tested for their predictive power. Recent work in quantum cognition is starting to reach the level of being predictive, for example the study of order effects (Wang et al., 2014), and an elaboration and refinement of the model presented in this article (Aerts et al., 2015a,c). The latter model simultaneously investigates the "conjunction" and the "negation" of concepts, starting from data collected on such conceptual combinations. To explain the exact nature and also the accuracy of the predictive power we gained in the model in Aerts et al. (2015a,c), consider the following mathematical expression

$$I_{ABA'B'} = 1 - \mu(A \text{ and } B) - \mu(A \text{ and } B') - \mu(A' \text{ and } B) - \mu(A' \text{ and } B') \tag{65}$$

where A and B are the concepts Fruits and Vegetables, respectively, while A′ and B′ are their negations. Thus, "A and B′" means Fruits and not Vegetables, while "A′ and B" means not Fruits and Vegetables and "A′ and B′" means not Fruits and not Vegetables. In Aerts et al. (2015a,c) we published the data for the outcomes of experiments that test the membership of the same 24 items which we considered in the present article, but this time not only for the conjunction of A and B, but also for the conjunctions "A and B′," "A′ and B," and "A′ and B′." Suppose that the data follow a classical probabilistic structure. Then $I_{ABA'B'}$ has to be theoretically equal to zero for each considered item, and this purely follows from a general "law of probability calculus" related to the so-called "de Morgan laws" of classical probability. This means that, under the hypothesis of a classical probabilistic structure, if we measure the relative frequencies of "A and B," "A and B′" and "A′ and B," and hence determine experimentally the values of µ(A and B), µ(A and B′) and µ(A′ and B), a "prediction" for µ(A′ and B′) can be made theoretically, namely,

$$\mu(A' \text{ and } B') = 1 - \mu(A \text{ and } B) - \mu(A \text{ and } B') - \mu(A' \text{ and } B) \tag{66}$$

for each considered item. Let us explain the findings in Aerts et al. (2015a,c) that make it possible for us to speak of a specific type of predictability for the more elaborate and refined model we developed for the combination of concepts and their negations. In Aerts et al. (2015a,c) we collected data not only for the pair of concepts Fruits and Vegetables and the 24 items also treated in the present article, but for three more pairs of concepts, and for each of them again 24 items. Due to the already identified non-classical nature of overextension of the conjunction, we expected that $I_{ABA'B'}$ would not be equal to zero, and that indeed proved to be the case. However, we detected a highly systematic pattern, with the value of $I_{ABA'B'}$ fluctuating around an average of −0.81. A statistical analysis showed that the different values for individual items can be explained as fluctuations around this average (see Tables 1–4 in Aerts et al., 2015a). Next to the detailed statistical analysis to be found in Aerts et al. (2015a), we also put forward a theoretical explanation of this value. The elaborate and refined model for concept combinations developed in Aerts et al. (2015a) combines a pure quantum model and a classical model. It can be shown that for a pure quantum model the value of $I_{ABA'B'}$ would be −1. We also find that the quantum effects are dominant as compared to the classical effects when concepts are combined, which explains why our refined model gives rise to a value of $I_{ABA'B'}$ in between the classical one, which is 0, and the pure quantum one, which is −1, but closer to the quantum one, hence −0.81. This finding can be turned into a predictive one in the following way. Suppose we measure µ(A and B), µ(A and B′) and µ(A′ and B) for two arbitrary concepts and an item. Our model allows us to put forward the following prediction for µ(A′ and B′)

$$
\mu(A' \text{ and } B') = 1.81 - \mu(A \text{ and } B) - \mu(A \text{ and } B') - \mu(A' \text{ and } B) \tag{67}
$$

By comparing Equations (66) and (67), we get that the quantum-theoretic model in Aerts et al. (2015a) provides a "different prediction" from a classical probabilistic model satisfying the axioms of Kolmogorov, and experiments confirm the validity of the former over the latter. We add that the quantum model has different predictions from a classical model also for the values of other functions than $I_{ABA'B'}$, and these predictions are "parameter independent," in the sense that they do not depend on the values of free parameters that may accommodate the data.
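To make the difference between Equations (66) and (67) concrete, the following minimal Python sketch evaluates both predictions for one item. The three input weights are hypothetical values chosen for illustration only, not data from Aerts et al. (2015a), and the function names are our own.

```python
def interference_index(mu_ab, mu_ab_not, mu_a_not_b, mu_a_not_b_not):
    """Equation (65): one minus the sum of the four conjunction weights."""
    return 1.0 - mu_ab - mu_ab_not - mu_a_not_b - mu_a_not_b_not


def predict_not_a_and_not_b(mu_ab, mu_ab_not, mu_a_not_b, offset=1.0):
    """Predict mu(A' and B') from the other three weights.

    offset=1.0 reproduces the classical prediction of Equation (66);
    offset=1.81 reproduces the prediction of Equation (67).
    """
    return offset - mu_ab - mu_ab_not - mu_a_not_b


# Hypothetical membership weights for a single item (illustrative only).
mu_ab, mu_ab_not, mu_a_not_b = 0.55, 0.50, 0.40

classical = predict_not_a_and_not_b(mu_ab, mu_ab_not, mu_a_not_b)
quantum = predict_not_a_and_not_b(mu_ab, mu_ab_not, mu_a_not_b, offset=1.81)

print("classical prediction:", round(classical, 2))
print("Equation (67) prediction:", round(quantum, 2))
print("I for the Equation (67) value:",
      round(interference_index(mu_ab, mu_ab_not, mu_a_not_b, quantum), 2))
```

With overextended weights such as these, the classical formula returns a negative number, which cannot be a membership weight, whereas the value from Equation (67) stays in the unit interval.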

The results above can be considered as a strong confirmation that quantum-theoretic models of concept combinations provide predictions that deviate, in some situations, from the predictions of classical Kolmogorovian models, which is confirmed by experimental data.

# REFERENCES


# AUTHOR CONTRIBUTIONS

All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Aerts, Broekaert, Gabora and Sozzo. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# APPENDIX

# A. Quantum Mathematics for Conceptual Modeling

We illustrate in this section how the mathematical formalism of quantum theory can be applied to model situations outside the microscopic quantum world, more specifically, in the representation of concepts and their combinations. We will limit technicalities to the essential.

When the quantum mechanical formalism is applied for modeling purposes, each considered entity, in our case a concept, is associated with a complex Hilbert space $\mathcal{H}$, that is, a vector space over the field $\mathbb{C}$ of complex numbers, equipped with an inner product $\langle\cdot|\cdot\rangle$ that maps two vectors $\langle A|$ and $|B\rangle$ onto a complex number $\langle A|B\rangle$. We denote vectors by using the bra-ket notation introduced by Paul Adrien Dirac, one of the pioneers of quantum theory. Vectors can be "kets," denoted by $|A\rangle$, $|B\rangle$, or "bras," denoted by $\langle A|$, $\langle B|$. The inner product between the ket vectors $|A\rangle$ and $|B\rangle$, or the bra vectors $\langle A|$ and $\langle B|$, is realized by juxtaposing the bra vector $\langle A|$ and the ket vector $|B\rangle$. The result $\langle A|B\rangle$ is also called a "bra-ket," and it satisfies the following properties:

(i) $\langle A|A\rangle \geq 0$;

(ii) $\langle A|B\rangle = \langle B|A\rangle^{*}$, where $^{*}$ denotes complex conjugation;

(iii) $\langle A|(z|B\rangle + t|C\rangle) = z\langle A|B\rangle + t\langle A|C\rangle$, for $z, t \in \mathbb{C}$.


From (ii) and (iii) it follows that the inner product $\langle\cdot|\cdot\rangle$ is linear in the ket and anti-linear in the bra, i.e., $(z\langle A| + t\langle B|)|C\rangle = z^{*}\langle A|C\rangle + t^{*}\langle B|C\rangle$.

The "absolute value" of a complex number is defined as the square root of the product of this complex number times its complex conjugate, that is, $|z| = \sqrt{z^{*}z}$. Moreover, a complex number $z$ can either be decomposed into its cartesian form $z = x + iy$, or into its polar form $z = |z|e^{i\theta} = |z|(\cos\theta + i\sin\theta)$. As a consequence, we have $|\langle A|B\rangle| = \sqrt{\langle A|B\rangle\langle B|A\rangle}$. We define the "length" of a ket (bra) vector $|A\rangle$ ($\langle A|$) as $\||A\rangle\| = \|\langle A|\| = \sqrt{\langle A|A\rangle}$. A vector of unit length is called a "unit vector." We say that the ket vectors $|A\rangle$ and $|B\rangle$ are "orthogonal" and write $|A\rangle \perp |B\rangle$ if $\langle A|B\rangle = 0$.
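The definitions above can be checked numerically. The sketch below is only an illustration under our own assumptions: vectors of a finite-dimensional complex space are stored as Python lists of complex numbers, and the names `inner` and `length` and the sample vectors are hypothetical choices.

```python
import math


def inner(bra, ket):
    """Inner product <bra|ket>: anti-linear in the bra, linear in the ket."""
    return sum(a.conjugate() * b for a, b in zip(bra, ket))


def length(vec):
    """Length of a vector: the square root of <vec|vec>."""
    return math.sqrt(inner(vec, vec).real)


# Illustrative vectors in a two-dimensional complex space (hypothetical values).
A = [1 + 1j, 0j]
B = [1j, 2 + 0j]
D = [0j, 1 + 0j]

ab = inner(A, B)                      # <A|B>
ba = inner(B, A)                      # <B|A>, the complex conjugate of <A|B>
modulus = math.sqrt((ab * ba).real)   # |<A|B>| = sqrt(<A|B><B|A>)
unit_A = [a / length(A) for a in A]   # A rescaled to unit length

print(ab, ba, modulus)
print("A orthogonal to D:", inner(A, D) == 0)
```

Dividing a nonzero vector by its length, as done for `unit_A`, is the usual way to obtain a unit vector pointing in the same direction.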

We have now introduced the necessary mathematics to state the first modeling rule of quantum theory, as follows.

#### A.1. First Quantum Modeling Rule

A state $A$ of an entity, in our case a concept, modeled by quantum theory is represented by a ket vector $|A\rangle$ with length 1, that is $\langle A|A\rangle = 1$.

An orthogonal projection $M$ is a linear operator on the Hilbert space, that is, a mapping $M : \mathcal{H} \rightarrow \mathcal{H}$, $|A\rangle \mapsto M|A\rangle$, which is Hermitian and idempotent. The latter means that, for every $|A\rangle, |B\rangle \in \mathcal{H}$ and $z, t \in \mathbb{C}$, we have:

(i) $M(z|A\rangle + t|B\rangle) = zM|A\rangle + tM|B\rangle$ (linearity);

(ii) $\langle A|M|B\rangle = \langle B|M|A\rangle^{*}$ (hermiticity);


(iii) M · M = M (idempotency).

The identity operator $\mathbf{1}$ maps each vector onto itself and is a trivial orthogonal projection. We say that two orthogonal projections $M_k$ and $M_l$ are orthogonal operators if each vector contained in $M_k(\mathcal{H})$ is orthogonal to each vector contained in $M_l(\mathcal{H})$, and we write $M_k \perp M_l$ in this case. The orthogonality of the projection operators $M_k$ and $M_l$ can also be expressed by $M_k M_l = 0$, where $0$ is the null operator. A set of orthogonal projection operators $\{M_k \mid k = 1, \ldots, n\}$ is called a "spectral family" if all projectors are mutually orthogonal, that is, $M_k \perp M_l$ for $k \neq l$, and their sum is the identity, that is, $\sum_{k=1}^{n} M_k = \mathbf{1}$.
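As an illustration of these definitions, the following sketch builds two rank-one projectors in a two-dimensional complex space and checks that they form a spectral family. Matrices are plain nested lists, and the helper names (`outer`, `matmul`, `dagger`, `add`, `close`) and the two basis vectors are assumptions made for this example only.

```python
def outer(u, v):
    """The operator |u><v| as a nested list."""
    return [[x * y.conjugate() for y in v] for x in u]


def matmul(P, Q):
    """Matrix product P Q."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def dagger(P):
    """Hermitian conjugate (conjugate transpose) of P."""
    return [[P[j][i].conjugate() for j in range(len(P))]
            for i in range(len(P[0]))]


def add(P, Q):
    """Entrywise sum of two matrices."""
    return [[p + q for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]


def close(P, Q, tol=1e-12):
    """True if all entries of P and Q agree within tol."""
    return all(abs(p - q) < tol for rp, rq in zip(P, Q) for p, q in zip(rp, rq))


s = 2 ** -0.5
e1 = [s, s * 1j]    # two orthogonal unit vectors (hypothetical choice)
e2 = [s, -s * 1j]

M1, M2 = outer(e1, e1), outer(e2, e2)
identity = [[1, 0], [0, 1]]
zero = [[0, 0], [0, 0]]

print("idempotent:", close(matmul(M1, M1), M1))
print("Hermitian:", close(dagger(M1), M1))
print("mutually orthogonal:", close(matmul(M1, M2), zero))
print("sum is identity:", close(add(M1, M2), identity))
```

Any orthonormal basis yields a spectral family in this way, with one rank-one projector per basis vector.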

The above definitions give us the necessary mathematics to state the second modeling rule of quantum theory, as follows.

#### A.2. Second Quantum Modeling Rule

A measurable quantity $Q$ of an entity, in our case a concept, modeled by quantum theory and having a set of possible real values $\{q_1, \ldots, q_n\}$ is represented by a spectral family $\{M_k \mid k = 1, \ldots, n\}$ in the following way. If the entity is in a state represented by the vector $|A\rangle$, then the probability of obtaining the value $q_k$ in a measurement of the measurable quantity $Q$ is $\langle A|M_k|A\rangle = \|M_k|A\rangle\|^2$. This formula is called the "Born rule" in the quantum jargon. Moreover, if the value $q_k$ is actually obtained in the measurement, then the initial state is changed into a state represented by the vector

$$|A_k\rangle = \frac{M_k |A\rangle}{\|M_k |A\rangle\|}\tag{A1}$$

This change of state is called "collapse" in the quantum jargon.
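The Born rule and the collapse of Equation (A1) can be sketched for a two-outcome measurement as follows. The state, the two projectors and the function names are hypothetical choices made for illustration, and vectors and operators are again plain Python lists.

```python
import math


def apply(M, vec):
    """Apply the operator M (nested list) to a vector."""
    return [sum(M[i][k] * vec[k] for k in range(len(vec))) for i in range(len(M))]


def inner(bra, ket):
    """Inner product <bra|ket>."""
    return sum(a.conjugate() * b for a, b in zip(bra, ket))


def born_probability(M, state):
    """Born rule: probability <A|M|A> = ||M|A>||^2."""
    projected = apply(M, state)
    return inner(projected, projected).real


def collapse(M, state):
    """Equation (A1): the normalized projection M|A> / ||M|A>||.

    Only meaningful when the outcome has nonzero probability.
    """
    projected = apply(M, state)
    norm = math.sqrt(inner(projected, projected).real)
    return [x / norm for x in projected]


# Hypothetical unit state and a two-element spectral family.
state = [0.6 + 0j, 0.8j]
M1 = [[1, 0], [0, 0]]
M2 = [[0, 0], [0, 1]]

p1 = born_probability(M1, state)
p2 = born_probability(M2, state)
after = collapse(M2, state)

print("probabilities:", round(p1, 2), round(p2, 2))
print("state after obtaining the second outcome:", after)
```

Because the projectors form a spectral family and the state has unit length, the two probabilities add up to one.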

The tensor product $\mathcal{H}_A \otimes \mathcal{H}_B$ of two Hilbert spaces $\mathcal{H}_A$ and $\mathcal{H}_B$ is the Hilbert space generated by the set $\{|A_i\rangle \otimes |B_j\rangle\}$, where $|A_i\rangle$ and $|B_j\rangle$ are vectors of $\mathcal{H}_A$ and $\mathcal{H}_B$, respectively, which means that a general vector of this tensor product is of the form $\sum_{ij} |A_i\rangle \otimes |B_j\rangle$. This gives us the necessary mathematics to introduce the third modeling rule.

#### A.3. Third Quantum Modeling Rule

A state $C$ of a compound entity, in our case a combined concept, is represented by a unit vector $|C\rangle$ of the tensor product $\mathcal{H}_A \otimes \mathcal{H}_B$ of the two Hilbert spaces $\mathcal{H}_A$ and $\mathcal{H}_B$ containing the vectors that represent the states of the component entities, in our case concepts.

The above means that we have $|C\rangle = \sum_{ij} c_{ij}|A_i\rangle \otimes |B_j\rangle$, where $|A_i\rangle$ and $|B_j\rangle$ are unit vectors of $\mathcal{H}_A$ and $\mathcal{H}_B$, respectively, and $\sum_{i,j} |c_{ij}|^2 = 1$. We say that the state $C$ represented by $|C\rangle$ is a product state if it is of the form $|A\rangle \otimes |B\rangle$ for some $|A\rangle \in \mathcal{H}_A$ and $|B\rangle \in \mathcal{H}_B$. Otherwise, $C$ is called an "entangled state."
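The distinction between product and entangled states can be illustrated with a small sketch. It assumes that the $|A_i\rangle$ and $|B_j\rangle$ are orthonormal bases, so that a state is fully described by its coefficients $c_{ij}$, and it uses the standard fact that a state is a product state exactly when the coefficient matrix has rank one, i.e., when all of its 2×2 minors vanish. The function names and sample states are our own.

```python
def tensor(a, b):
    """Coefficients of |a> (x) |b>, flattened row by row (index i, then j)."""
    return [x * y for x in a for y in b]


def is_product_state(coeffs, dim_a, dim_b, tol=1e-12):
    """True if the coefficient matrix c_ij has rank one.

    Assumes coeffs is a nonzero flat list of dim_a * dim_b coefficients
    with respect to orthonormal bases of the two component spaces.
    """
    c = [coeffs[i * dim_b:(i + 1) * dim_b] for i in range(dim_a)]
    return all(abs(c[i][j] * c[k][l] - c[i][l] * c[k][j]) < tol
               for i in range(dim_a) for k in range(i + 1, dim_a)
               for j in range(dim_b) for l in range(j + 1, dim_b))


s = 2 ** -0.5
product = tensor([0.6, 0.8], [s, s * 1j])   # built as |A> (x) |B>
entangled = [s, 0, 0, s]                    # cannot be factorized

print(is_product_state(product, 2, 2))
print(is_product_state(entangled, 2, 2))
```

The second example has coefficients only on the two "diagonal" basis vectors, and no choice of $|A\rangle$ and $|B\rangle$ reproduces it as a single tensor product.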

The Fock space is a specific type of Hilbert space, originally introduced in quantum field theory. For most states of a quantum field the number of identical quantum entities is not conserved but is a variable quantity. The Fock space copes with this situation by allowing its vectors to be superpositions of vectors pertaining to different sectors for fixed numbers of identical quantum entities. More explicitly, the $k$-th sector of a Fock space describes a fixed number of $k$ identical quantum entities, and it is of the form $\mathcal{H} \otimes \ldots \otimes \mathcal{H}$ of the tensor product of $k$ identical Hilbert spaces $\mathcal{H}$. The Fock space $\mathcal{F}$ itself is the direct sum of all these sectors, hence

$$\mathcal{F} = \oplus_{k=1}^{j} \otimes_{l=1}^{k} \mathcal{H} \tag{A2}$$

For our modeling we have only used Fock space for the "two" and "one quantum entity" case, hence $\mathcal{F} = \mathcal{H} \oplus (\mathcal{H} \otimes \mathcal{H})$. This is due to considering only combinations of two concepts. The sector $\mathcal{H}$ is called the "first sector," while the sector $\mathcal{H} \otimes \mathcal{H}$ is called the "second sector." A unit vector $|F\rangle \in \mathcal{F}$ is then written as $|F\rangle = ne^{i\gamma}|C\rangle + me^{i\delta}(|A\rangle \otimes |B\rangle)$, where $|A\rangle$, $|B\rangle$ and $|C\rangle$ are unit vectors of $\mathcal{H}$, and such that $n^2 + m^2 = 1$. For combinations of $j$ concepts, the general form of Fock space in Equation (A2) should be used.
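A two-sector Fock space vector of this form can be assembled numerically as sketched below. The direct sum is represented by concatenating the coefficient lists of the two sectors, which is one possible convention and an assumption of this example, as are the function names and the sample vectors and parameters.

```python
import cmath
import math


def tensor(a, b):
    """Coefficients of |a> (x) |b>, flattened row by row."""
    return [x * y for x in a for y in b]


def fock_vector(C, A, B, n, gamma, delta):
    """Coefficients of n e^{i gamma}|C> + m e^{i delta}(|A> (x) |B>).

    The first len(C) entries belong to the first sector, the remaining
    entries to the second sector; m is fixed by n^2 + m^2 = 1.
    """
    m = math.sqrt(1.0 - n * n)
    first = [n * cmath.exp(1j * gamma) * c for c in C]
    second = [m * cmath.exp(1j * delta) * x for x in tensor(A, B)]
    return first + second


def norm_squared(vec):
    """Squared length of a vector."""
    return sum(abs(x) ** 2 for x in vec)


s = 2 ** -0.5
A, B, C = [1, 0], [0, 1], [s, s]   # hypothetical unit vectors of a 2-dim H
F = fock_vector(C, A, B, n=0.6, gamma=0.3, delta=1.1)

print("dimension:", len(F))
print("weight of first sector:", round(norm_squared(F[:2]), 2))
print("weight of second sector:", round(norm_squared(F[2:]), 2))
```

The squared lengths of the two parts are $n^2$ and $m^2$, so the condition $n^2 + m^2 = 1$ is exactly what makes the assembled vector a unit vector.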

The quantum modeling above can be generalized by allowing states to be represented by so-called "density operators" and measurements to be represented by so-called "positive operator valued measures." However, for the sake of brevity we will not dwell on this extension here.

# Quantum structure of negation and conjunction in human thought

Diederik Aerts<sup>1</sup>, Sandro Sozzo<sup>2</sup>\* and Tomas Veloz<sup>3</sup>

*<sup>1</sup> Center Leo Apostel for Interdisciplinary Studies (CLEA), Free University of Brussels (VUB), Brussels, Belgium, <sup>2</sup> School of Management, Institute for Quantum Social and Cognitive Science, University of Leicester, Leicester, UK, <sup>3</sup> Department of Mathematics, University of British Columbia, Kelowna, BC, Canada*

We analyze in this paper the data collected in a set of experiments investigating how people combine natural concepts. We study the mutual influence of conceptual conjunction and negation by measuring the membership weights of a list of exemplars with respect to two concepts, e.g., *Fruits* and *Vegetables*, and their conjunction *Fruits And Vegetables*, but also their conjunction when one or both concepts are negated, namely, *Fruits And Not Vegetables*, *Not Fruits And Vegetables*, and *Not Fruits And Not Vegetables*. Our findings sharpen and advance existing analysis on conceptual combinations, revealing systematic deviations from classical (fuzzy set) logic and probability theory. And, more important, our results give further considerable evidence to the validity of our quantum-theoretic framework for the combination of two concepts. Indeed, the representation of conceptual negation naturally arises from the general assumptions of our two-sector Fock space model, and this representation faithfully agrees with the collected data. In addition, we find a new significant and a priori unexpected deviation from classicality, which can exactly be explained by assuming that human reasoning is the superposition of an "emergent reasoning" and a "logical reasoning," and that these two processes are represented in a Fock space algebraic structure.

#### Edited by:

*Gabriel Radvansky, University of Notre Dame, USA*

#### Reviewed by:

*Ken McRae, University of Western Ontario, Canada Maria Luisa Dalla Chiara, University of Florence, Italy*

#### \*Correspondence:

*Sandro Sozzo, School of Management, Institute for Quantum Social and Cognitive Science, University of Leicester, University Road, Leicester LE1 7RH, UK ss831@le.ac.uk*

#### Specialty section:

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology*

Received: *26 June 2015* Accepted: *10 September 2015* Published: *30 September 2015*

#### Citation:

*Aerts D, Sozzo S and Veloz T (2015) Quantum structure of negation and conjunction in human thought. Front. Psychol. 6:1447. doi: 10.3389/fpsyg.2015.01447*

Keywords: cognition, concept theory, quantum structures, Fock space, conceptual emergence, concept formation

# 1. Introduction

Substantial evidence of the presence of quantum structures in processes connected with human behavior and cognition has been put forward in the last decade. More specifically, such quantum structures were identified in situations of decision making and in the structure of language (see e.g., Aerts, 2009; Khrennikov, 2010; Busemeyer and Bruza, 2012; Aerts et al., 2013b; Haven and Khrennikov, 2013; Pothos and Busemeyer, 2013; Wang et al., 2014). The success of this quantum modeling is interpreted as due to "descriptive effectiveness of the mathematical apparatus of quantum theory as formal instrument to model cognitive dynamics and structures in situations where classical set-based approaches are problematical." In particular, the mathematics of quantum theory in Hilbert space has proved very successful in modeling combinations of two concepts (Aerts, 2009; Aerts and Sozzo, 2001, 2014; Aerts and Gabora, 2005a,b; Aerts et al., 2013a,b; Sozzo, 2014, 2015).

The "combination problem," that is, the question of how the representation of the combination of two or more natural concepts can be connected to the representation of the component concepts, has been studied experimentally and within classical concept theories in great detail in the last 30 years. The main experimental challenges to traditional modeling approaches to concept combinations are sketched in the following<sup>1</sup>.


What one typically finds in the above situations is a failure of set-theoretic approaches (classical set, fuzzy set, Kolmogorovian probability) to supply satisfactory theoretic models for the experimentally observed patterns. Indeed, all traditional approaches to concept theory [mainly, "prototype theory" (Rosch, 1973, 1978, 1983), "exemplar theory" (Nosofsky, 1988, 1992), and "theory theory" (Murphy and Medin, 1985; Rumelhart and Norman, 1988)] and concept representation [mainly, "extensional" membership-based (Zadeh, 1982; Rips, 1995) and "intensional" attribute-based (Hampton, 1988b; Minsky, 1975; Hampton, 1997)] have structural difficulties to cope with the experimental data exactly where the "graded," or "vague" nature of these data abundantly violates (fuzzy) set-theoretic structures (Osherson and Smith, 1982; Zadeh, 1982), indicating that this violation of set-theoretic structures is the core of the problem. This situation is experienced as one of the major problems in the domain of traditional concept theories and an obstacle for progress (Komatsu, 1992; Fodor, 1994; Kamp and Partee, 1995; Rips, 1995; Hampton, 1997; Osherson and Smith, 1997).

Important results in concept research and modeling have been obtained in the last decade within the approach of quantum cognition, to which our research group has substantially contributed. For reasons of space, we cannot report in detail the results attained in our approach. We limit ourselves to summarizing the fundamentals and pointing to the relevant bibliographic sources in the following.


<sup>1</sup>One typically gains insight into how people combine concepts by gathering data on "typicalities" or "membership weights." To obtain data on "typicalities," participants are given a concept, and a list of instances or exemplars, and asked to pick which exemplar they consider most typical of the concept. A membership weight is instead obtained by asking people to estimate the membership of specific exemplars with respect to a concept. This estimation can, e.g., be quantified by using a 7-point (Likert) scale and then converted into a relative frequency and then into a probability called the "membership weight." We have worked on both typicality measurements, as in the analysis of the Guppy effect, and membership weight measurements, as in the analysis of Hampton's experiments and in the present paper.

(c) This quantum-theoretic framework was successfully applied to describe more complex situations, such as borderline vagueness (Sozzo, 2014) and the effects of negation on conceptual conjunction (Sozzo, 2015). In addition, specific conceptual combinations experimentally revealed the presence of further genuine quantum effects, namely, entanglement (Aerts et al., 2013a,b; Aerts and Sozzo, 2001, 2014) and quantum-type indistinguishability (Aerts et al., 2015c). Finally, other phenomena related to concept combination, such as "Ellsberg and Machina decision making paradoxes" (Ellsberg, 1961; Machina, 2009) were successfully modeled in the same quantum-conceptual framework (Aerts et al., 2012, 2014).

There has been very little research on how people interpret and combine negated concepts. In a seminal study, Hampton (1997) considered in a set of experiments both conjunctions of the form Games Which Are Also Sports and conjunctions of the form Games Which Are Not Sports. His work confirmed overextension in both types of conjunctions, also showing a violation of Boolean classical logical rules for the negation, which has recently been confirmed by ourselves (Sozzo, 2015). In the present paper we extend the collection of data in Sozzo (2015) with the aim of further exploring the use of negation in conceptual combinations and, more generally, the underlying logical structures being at work in human thought in the course of cognitive processes (Aerts et al., 2015a). Let us first put forward a specific comment with respect to the "negation of a concept." From the perspective of prototype theory, for quite some concepts the negation of a concept can be considered as a "singular concept," since it does not have a well defined prototype. In fact, while it is plain to determine the nonmembership of, e.g., Fruit, this does not seem to lead to the determination involving a similarity with some prototype of Not Fruit. Some authors maintain, for this reason, that single negated concepts have little meaning and that conceptual negations can be evaluated only in conjunctions of the form Fruits Which Are Not Vegetables (Hampton, 1997). We agree that there is an asymmetry between the way people estimate the membership of an exemplar, e.g., Apple, with respect to a positive concept, e.g., Fruits, and the way people estimate the membership of the same exemplar with respect to its negative counterpart, e.g., Not Fruits. Notwithstanding this, we believe it is meaningful to explicitly introduce the concept Not Fruits in our research. 
First of all, we do not confine our concept modeling to prototype theory. On the contrary, our approach is more general, and the basic structure of prototype theory can be recovered if we limit the concepts to be in their ground states (Aerts and Gabora, 2005a,b). Secondly, we will see that the quantum modeling elaborated in the present paper sheds light exactly on this problem, namely, the "negated concept" only appears as a full concept in "one part of the representation," while it is treated as "non-membership with respect to the positive concept" in the other part. Hence, the quantum-conceptual framework copes with this problem in a natural way.

Let us proceed by steps, summarizing the major findings in this paper, as follows.

In Section 2 we illustrate design and procedure of the four cognitive experiments we performed. In the first experiment, we tested the membership weights of four sets of exemplars with respect to four pairs (A, B) of concepts and their conjunction "A and B." In the second experiment, we tested the membership weights of the same four sets of exemplars with respect to the same four pairs (A, B) of concepts, but negating the second concept, hence actually considering A, B ′ and the conjunction "A and B ′ ." In the third experiment, we tested the membership weights obtained considering A ′ , B and the conjunction "A ′ and B." Finally, in the fourth experiment, we considered the membership weights obtained by negating both concepts, hence actually considering A ′ , B ′ and the conjunction "A ′ and B ′ ."

We investigate the representability of the collected data, reported in Appendix A3, in a "single classical Kolmogorovian probability space" (Kolmogorov, 1933). Basic notions and results on probability measures and classical modeling are briefly reviewed in Appendix A1. We prove theorems providing necessary and sufficient conditions for the modeling of the conceptual conjunctions "A and B," "A and B ′ ," "A ′ and B," and "A ′ and B ′ " in such a single classical Kolmogorovian framework. Then, we observe that the data significantly violate our theorems. More specifically, our analysis of classicality for the presence of conjunction and negation together leads to five classicality conditions that should be simultaneously satisfied by the data to fit into one classical probability framework together. When we analyze the deviations of our data with respect to these five conditions we also find a very strong, stable and systematic pattern of violation, i.e., the deviation has the same numerical values even over different pairs of concepts. That the violation is numerically the same independently of the considered pair of concepts indicates that we have identified a non-classical mechanism in human thought which is linked to the depth of concept formation itself, independent of the specific meaning for a specific pair of concepts and a specific set of considered exemplars. This was for ourselves a first surprising and unexpected finding, and we have recently devoted an article to investigate it in depth (Aerts et al., 2015b).

A second major and equally unexpected finding was that the numerical size of the "deviation of classicality pattern" can exactly be predicted in our quantum-theoretic model in two-sector Fock space. Moreover, it can be explained by assuming that human reasoning is the superposition of two simultaneous processes, a "logical reasoning" and a "conceptual," or "emergent," reasoning. Logical reasoning combines cognitive entities (concepts, combinations of concepts, propositions, etc.) by applying the rules of logic, though generally in a probabilistic way. Emergent reasoning instead enables formation of combined cognitive entities as newly emerging entities (new concepts, new propositions, etc.), carrying new meaning, linked to the meaning of the component cognitive entities, but with a connection not defined by the algebra of logic. Emergent reasoning can be modeled in the first sector of Fock space and, at variance with widespread beliefs, is dominant in our approach. Logical reasoning can be modeled in the second sector of Fock space, hence one expects that classical logical rules hold in this sector, as we explicitly prove here for conceptual conjunctions and negations (see also Aerts et al., 2015b).

Our quantum-theoretic model in two-sector Fock space for conceptual negations and conjunctions is elaborated in Section 3. It naturally extends the model in Aerts (2009) and follows the general lines traced in Sozzo (2014, 2015). It is however important to notice that the simultaneous modeling of conjunction and negation requires the introduction of two new conceptual steps which were not needed in the modeling of conjunction pairs: (i) the introduction of entangled states in the second sector of Fock space, which enables formalizing the situation where probabilities in the second sector can be formed by a "product procedure," even if they are not independent (this is an aspect of the Fock space model we had not understood in our earlier modeling, hence we could consider it a further surprising finding of the investigation presented in this paper); (ii) the handling of "negation" in the second sector by "logical inversion," in the same way as we handled conjunction in the second sector by "product." More concretely, an experiment with "negation" with respect to a concept is treated by "negating logically" an experiment on the concept itself. This is also the way in which our Fock space model naturally copes with the general non-prototypicality of a negated concept, as already mentioned above.

We see in Section 4 that a large amount of data can be faithfully represented in our two-sector Fock space, and construct an explicit representation for some relevant cases that are classically problematic. A complete representation of the data is provided in the Supplementary Material attached to this paper. As we can see, the findings presented in this paper provide strong and independent confirmation of our quantum-theoretic framework, and we devote Section 5 to commenting on our results and extensively discussing the novelties and corroboration of our approach. Technical appendices A4 and A5 complete the paper.

# 2. Description of Experiments and Classicality Analysis

James Hampton identified in his cognitive tests systematic deviations from classical (fuzzy) set predictions for membership weights of exemplars with respect to conjunctions and disjunctions of two concepts, and named these deviations "overextensions" and "underextensions" (Hampton, 1988a,b). Cases of "double overextension" were also observed. More explicitly, if the membership weight of an exemplar x with respect to the conjunction "A and B" of two concepts A and B is higher than the membership weight of x with respect to one concept (both concepts), we say that the membership weight of x is "overextended" ("double overextended") with respect to the conjunction (by abuse of language, we say that x is overextended (double overextended) with respect to the conjunction, in this case). If the membership weight of an exemplar x with respect to the disjunction "A or B" of two concepts A and B is less than the membership weight of x with respect to one concept (both concepts), we say that the membership weight of x is "underextended" ("double underextended") with respect to the disjunction (by abuse of language, we say that x is underextended (double underextended) with respect to the disjunction, in this case).

Similar effects were identified by Hampton in his experiments on conjunction and negation of two concepts (Hampton, 1997). The analysis in Aerts (2009) revealed further deviations from classicality in Hampton's experiments, due to the impossibility to generally represent the collected data in a classical probability framework satisfying the axioms of Kolmogorov. In Sozzo (2015) we moved along this direction and performed an experiment in which we tested both conjunctions of the form "A and B" and conjunctions of the form "A and B ′ ," for specific pairs (A, B) of natural concepts. We showed that very similar deviations from classicality are observed in our experiment too.

In the present paper we aim to generalize the results in Sozzo (2015), providing an extensive analysis of conceptual conjunction and negation and investigating their reciprocal influences. To this end we complete the experiment in Sozzo (2015) by performing a more general cognitive test, as described in the following sections.

## 2.1. Participants and Design

The participants in our experimental study (40 persons, chosen among our colleagues and friends) were asked to fill in a questionnaire in which they had to estimate the membership of four different sets of exemplars with respect to four different pairs (A, B) of natural concepts, and their conjunctions "A and B," "A and B′," "A′ and B," and "A′ and B′," where A′ and B′ denote the negations of the concepts A and B, respectively. We devised a "within-subjects design" for our experiments, hence all participants were exposed to every treatment or condition. The participants were presented with a preliminary text where we made explicit, by means of suitable examples, what one usually means by "membership of an exemplar with respect to a specific conceptual category." Further, we chose participants with different backgrounds, not only academics, to avoid issues connected with "selection biases."

We considered four pairs of natural concepts, namely (Home Furnishing, Furniture), (Spices, Herbs), (Pets, Farmyard Animals), and (Fruits, Vegetables). For each pair, we considered 24 exemplars and measured their membership with respect to these pairs of concepts and the conjunctions of these pairs mentioned above.

Conceptual membership was estimated by using a "7-point scale." The participants were asked to choose a number from the set +3, +2, +1, 0, −1, −2, −3, where the positive numbers +1, +2, and +3 meant that they considered "the exemplar to be a member of the concept"—+3 indicated a strong membership, +1 a relatively weak membership. The negative numbers −1, −2, and −3 meant that the participant considered "the exemplar to not be a member of the concept"—−3 indicated a strong non-membership, −1 a relatively weak non-membership.

Although we explicitly measured the "amount of membership" on a 7-point scale, for the scopes of this paper, we only need the data of a sub-experiment, namely the one tested for "membership" or "non-membership"—our plan is to use the "amount of membership data" for a following study leading to a graphical representation of the data, as we did with Hampton's data for the disjunction in earlier work (Aerts et al., 2013a,b).

A second reason for this specific form of the experiments is that we wanted to stay as close as possible to the disjunction experiments by Hampton (1988b), since we plan to investigate later the connections of our conjunction data with Hampton's disjunction data, for example to investigate the way in which the "de Morgan laws" take form in our modeling of the data. This is why we performed the full experiment measuring "amount of membership" and also testing simultaneously for "membership or non-membership," while it is only the latter sub-experiment that we use in the investigation presented in this article. The data of this sub-experiment give rise to relative frequencies for testing membership or not membership, which means that we can interpret them as probabilities in the limit of large numbers. More concretely, µ(A and B) is the large number limit of the relative frequency for x to be a member of "A and B" in the performed experiment. We get to this by converting the values collected on the 7-point scale by associating a value +1 to each positive value on the 7-point scale, −1 to each negative number, and 0.5 to each 0 on the same 7-point scale.

## 2.2. Procedure and Materials

This experimental study was carried out in accordance with the recommendations of the "University of Leicester Code of Practice and Research Code of Conduct, Research Ethics Committee of the School of Management" with written informed consent from all subjects. All subjects gave written informed consent in accordance with the Declaration of Helsinki. For each pair (A, B) of natural concepts, the 40 participants were involved in four subsequent experiments, eAB, eAB′ , eA′B, and eA′B′ , corresponding to the conjunctions "A and B," "A and B ′ ," "A ′ and B," and "A ′ and B ′ ," respectively. More specifically, the four sequential experiments can be illustrated as follows.

For the conceptual pair (Home Furnishing, Furniture), we firstly asked the 40 participants to estimate the membership of the first set of 24 exemplars with respect to the concepts Home Furnishing, Furniture, and their conjunction Home Furnishing And Furniture. Then, we asked the same 40 participants to estimate the membership of the same set of 24 exemplars with respect to the concept Home Furnishing, the negation Not Furniture of the concept Furniture, and their conjunction Home Furnishing And Not Furniture. Subsequently, we asked the 40 participants to estimate the membership of the 24 exemplars with respect to the negation Not Home Furnishing of the concept Home Furnishing, the concept Furniture, and their conjunction Not Home Furnishing And Furniture. Finally, we asked the 40 participants to estimate the membership of the 24 exemplars with respect to the negations Not Home Furnishing, Not Furniture, and their conjunction Not Home Furnishing And Not Furniture. The corresponding membership weights are reported in **Table A1**.

For the conceptual pair (Spices, Herbs), we firstly asked the 40 participants to estimate the membership of the second set of 24 exemplars with respect to the concepts Spices, Herbs, and their conjunction Spices And Herbs. Then, we asked the same 40 participants to estimate the membership of the same set of 24 exemplars with respect to the concept Spices, the negation Not Herbs of the concept Herbs, and their conjunction Spices And Not Herbs. Subsequently, we asked the 40 participants to estimate the membership of the 24 exemplars with respect to the negation Not Spices of the concept Spices, the concept Herbs, and their conjunction Not Spices And Herbs. Finally, we asked the 40 participants to estimate the membership of the 24 exemplars with respect to the negations Not Spices, Not Herbs, and their conjunction Not Spices And Not Herbs. The corresponding membership weights are reported in **Table A2**.

For the conceptual pair (Pets, Farmyard Animals), we firstly asked the 40 participants to estimate the membership of the third set of 24 exemplars with respect to the concepts Pets, Farmyard Animals, and their conjunction Pets And Farmyard Animals. Then, we asked the same 40 participants to estimate the membership of the same set of 24 exemplars with respect to the concept Pets, the negation Not Farmyard Animals of the concept Farmyard Animals, and their conjunction Pets And Not Farmyard Animals. Subsequently, we asked the 40 participants to estimate the membership of the 24 exemplars with respect to the negation Not Pets of the concept Pets, the concept Farmyard Animals, and their conjunction Not Pets And Farmyard Animals. Finally, we asked the 40 participants to estimate the membership of the 24 exemplars with respect to the negations Not Pets, Not Farmyard Animals, and their conjunction Not Pets And Not Farmyard Animals. The corresponding membership weights are reported in **Table A3**.

For the conceptual pair (Fruits, Vegetables), we firstly asked the 40 participants to estimate the membership of the fourth set of 24 exemplars with respect to the concepts Fruits, Vegetables, and their conjunction Fruits And Vegetables. Then, we asked the same 40 participants to estimate the membership of the same set of 24 exemplars with respect to the concept Fruits, the negation Not Vegetables of the concept Vegetables, and their conjunction Fruits And Not Vegetables. Subsequently, we asked the 40 participants to estimate the membership of the 24 exemplars with respect to the negation Not Fruits of the concept Fruits, the concept Vegetables, and their conjunction Not Fruits And Vegetables. Finally, we asked the 40 participants to estimate the membership of the 24 exemplars with respect to the negations Not Fruits, Not Vegetables, and their conjunction Not Fruits And Not Vegetables. The corresponding membership weights are reported in **Table A4**.

## 2.3. Methodology

A first inspection of **Tables A1**–**A4** already reveals that some exemplars present overextension with respect to all conjunctions "A and B," "A and B′," "A′ and B," "A′ and B′." This is the case, e.g., for the exemplar Lamp with respect to the concepts Home Furnishing and Furniture (**Table A1**), the exemplar Salt with respect to Spices and Herbs (**Table A2**), the exemplar Goldfish with respect to Pets and Farmyard Animals (**Table A3**), and the exemplar Mustard with respect to Fruits and Vegetables (**Table A4**). Hence, manifest deviations from classicality occurred in our experiments. When we say "deviations from classicality," we actually mean that the collected data behave in such a way that they cannot generally be modeled by using the usual connectives of classical (fuzzy set) logic for conceptual conjunctions, nor the rules of classical probability for their membership weights. In order to systematically identify such deviations from classicality we need however a characterization of the representability of these data in a classical probability space. To this end we derive in the following, step by step, conditions that will give us an overall picture of the classicality of conceptual conjunctions and negations. Finally, we arrive at a set of five conditions, formulated in Theorem 3 as a set of necessary and sufficient conditions of classicality for a pair of concepts, its negations and conjunctions to be representable within a classical Kolmogorovian probability model. Symbols and notions are introduced in Appendix A1. Let us mention that, to our knowledge, the necessary and sufficient conditions for probabilities µ(A), µ(B), µ(A′), µ(B′), µ(A and B), µ(A and B′), µ(A′ and B), and µ(A′ and B′) to be represented within a classical Kolmogorovian probability model have not yet been systematically derived, and hence are not known in the literature.
However, the necessary and sufficient conditions for probabilities µ(A), µ(B), and µ(A and B) to be represented within a classical Kolmogorovian probability model have been systematically studied (Pitowsky, 1989), and their direct derivation can for example be found in Aerts (2009), theorem 1 of Section 1.3. We start our investigation of the classicality conditions by taking the conditions that can be derived for µ(A), µ(B), and µ(A and B) and applying them additionally to µ(A′), µ(B′), and µ(A′ and B′). We then add some intermediate conditions connecting µ(A), µ(B), and µ(A and B) with µ(A′), µ(B′), and µ(A′ and B′), so as to also imply classicality for the mixed situations such as µ(A), µ(B′), and µ(A and B′). We can prove the following theorems (see also Appendix A4).

**Theorem 1.** The membership weights µ(A),µ(B),µ(A ′ ),µ(B ′ ), µ(A and B), µ(A and B ′ ), µ(A ′ and B), and µ(A ′ and B ′ ) of an exemplar x with respect to the concepts A, B, the negations "not A," "not B," the conjunctions "A and B," "A and B′ ," "A′ and B," and "A′ and B′ " are classical conjunction data i.e., can be represented in a classical Kolmogorovian probability model, if and only if they satisfy the following conditions.

$$0 \le \mu(A \text{ and } B) \le \mu(A) \le 1 \tag{1}$$

$$0 \le \mu(A \text{ and } B) \le \mu(B) \le 1 \tag{2}$$

$$0 \le \mu(A' \text{ and } B') \le \mu(A') \le 1 \tag{3}$$

$$0 \le \mu(A' \text{ and } B') \le \mu(B') \le 1 \tag{4}$$

$$\begin{aligned} \mu(A) - \mu(A \text{ and } B) &= \mu(B') - \mu(A' \text{ and } B')\\ &= \mu(A \text{ and } B') \end{aligned} \tag{5}$$

$$\begin{aligned} \mu(B) - \mu(A \text{ and } B) &= \mu(A') - \mu(A' \text{ and } B')\\ &= \mu(A' \text{ and } B) \end{aligned} \tag{6}$$

$$1 - \mu(A) - \mu(B) + \mu(A \text{ and } B) = \mu(A' \text{ and } B') \tag{7}$$

$$1 - \mu(A') - \mu(B') + \mu(A' \text{ and } B') = \mu(A \text{ and } B) \tag{8}$$

**Theorem 2.** The membership weights µ(A),µ(B),µ(A ′ ),µ(B ′ ), µ(A and B), µ(A and B ′ ), µ(A ′ and B), and µ(A ′ and B ′ ) of an exemplar x with respect to the concepts A, B, A′ , and B′ and the conjunctions "A and B," "A and B′ ," "A′ and B," and "A′ and B′ " are classical conjunction data if and only if they satisfy the following conditions.

$$0 \le \mu(A \text{ and } B) \le \mu(A) \le 1\tag{9}$$

$$0 \le \mu(A \text{ and } B) \le \mu(B) \le 1 \tag{10}$$

$$\begin{aligned} \mu(A) - \mu(A \text{ and } B) &= \mu(B') - \mu(A' \text{ and } B')\\ &= \mu(A \text{ and } B') \end{aligned} \tag{11}$$

$$\begin{aligned} \mu(B) - \mu(A \text{ and } B) &= \mu(A') - \mu(A' \text{ and } B')\\ &= \mu(A' \text{ and } B) \end{aligned} \tag{12}$$

$$0 \le 1 - \mu(A) - \mu(B) + \mu(A \text{ and } B) = \mu(A' \text{ and } B') \tag{13}$$

The classicality requirements in Theorems 1 and 2 are not symmetric with respect to the exchange of A with A ′ and B with B ′ . Thus, we can look for equivalent and more symmetric sets of requirements. These include validity of the "marginal law" of classical probability. We see this in Theorem 3, whose proof preliminarily requires the following lemma.

**Lemma 1.** The four equalities defined in Equations (5) and (6) are equivalent to the following four equalities, which express the marginal law for all elements.

$$
\mu(A) = \mu(A \text{ and } B) + \mu(A \text{ and } B') \tag{14}
$$

$$
\mu(B) = \mu(A \text{ and } B) + \mu(A' \text{ and } B) \tag{15}
$$

$$
\mu(A') = \mu(A' \text{ and } B') + \mu(A' \text{ and } B) \tag{16}
$$

$$
\mu(B') = \mu(A' \text{ and } B') + \mu(A \text{ and } B') \tag{17}
$$
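A brief sketch of why the lemma is plausible, assuming Equations (5) and (6) are each read as a chain of two equalities ending in a mixed conjunction weight (the text refers to Appendix A4 for proofs). Moving the subtracted conjunction weight to the other side of each individual equality gives exactly one marginal law:

$$\begin{aligned}
\mu(A) - \mu(A \text{ and } B) = \mu(A \text{ and } B') &\iff \mu(A) = \mu(A \text{ and } B) + \mu(A \text{ and } B')\\
\mu(B) - \mu(A \text{ and } B) = \mu(A' \text{ and } B) &\iff \mu(B) = \mu(A \text{ and } B) + \mu(A' \text{ and } B)\\
\mu(A') - \mu(A' \text{ and } B') = \mu(A' \text{ and } B) &\iff \mu(A') = \mu(A' \text{ and } B') + \mu(A' \text{ and } B)\\
\mu(B') - \mu(A' \text{ and } B') = \mu(A \text{ and } B') &\iff \mu(B') = \mu(A' \text{ and } B') + \mu(A \text{ and } B')
\end{aligned}$$

The first and fourth lines come from Equation (5) and give Equations (14) and (17); the second and third come from Equation (6) and give Equations (15) and (16).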

**Theorem 3.** The membership weights µ(A),µ(B),µ(A ′ ),µ(B ′ ), µ(A and B), µ(A and B ′ ), µ(A ′ and B), and µ(A ′ and B ′ ) of an exemplar x with respect to the concepts A, B, A′ , and B′ and the conjunctions "A and B," "A and B′ ," "A′ and B," and "A′ and B′ " are classical conjunction data if and only if they satisfy the following conditions.

$$0 \le \mu(A \text{ and } B) \le \mu(A) \le 1 \tag{18}$$

$$0 \le \mu(A \text{ and } B) \le \mu(B) \le 1 \tag{19}$$

$$
\mu(A) = \mu(A \text{ and } B) + \mu(A \text{ and } B') \tag{20}
$$

$$
\mu(B) = \mu(A \text{ and } B) + \mu(A' \text{ and } B) \tag{21}
$$

$$
\mu(A') = \mu(A' \text{ and } B') + \mu(A' \text{ and } B) \tag{22}
$$

$$
\mu(B') = \mu(A' \text{ and } B') + \mu(A \text{ and } B') \tag{23}
$$

$$0 \le 1 - \mu(A \text{ and } B) - \mu(A \text{ and } B') - \mu(A' \text{ and } B) = \mu(A' \text{ and } B') \tag{24}$$

The conditions above can be further simplified by observing that the membership weights we collected in our experiments are large number limits of relative frequencies, thus all measured quantities are already contained in the interval [0, 1]. Therefore, we have

$$\mu(A), \,\mu(B), \,\mu(A'), \,\mu(B'), \,\mu(A \text{ and } B), \,\mu(A \text{ and } B'), \,\mu(A' \text{ and } B), \,\mu(A' \text{ and } B') \in [0, 1] \tag{25}$$

Now, when Equation (25) is satisfied, it follows from Equations (20) and (21) that

$$
\mu(A \text{ and } B) \le \mu(A),
$$

$$
\mu(A \text{ and } B) \le \mu(B).
$$

This entails that Equations (18) and (19) are satisfied whenever Equations (20) and (21) are. Hence, remarkably enough, we can formulate Theorem 3 anew with only five conditions to be satisfied, four of which express the marginal law.

**Theorem 3**′ **.** If the membership weights µ(A),µ(B),µ(A ′ ),µ(B ′ ), µ(A and B), µ(A and B ′ ), µ(A ′ and B), and µ(A ′ and B ′ ) of an exemplar x with respect to the concepts A, B, A′ , and B′ and the conjunctions "A and B," "A and B′ ," "A′ and B," and "A′ and B′ " are all contained in the interval [0, 1], they are classical conjunction data if and only if they satisfy the following conditions.

$$
\mu(A) = \mu(A \text{ and } B) + \mu(A \text{ and } B') \tag{26}
$$

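$$
\mu(B) = \mu(A \text{ and } B) + \mu(A' \text{ and } B) \tag{27}
$$

$$
\mu(A') = \mu(A' \text{ and } B') + \mu(A' \text{ and } B) \tag{28}
$$

$$
\mu(B') = \mu(A' \text{ and } B') + \mu(A \text{ and } B') \tag{29}
$$

$$\begin{aligned} \mu(A \text{ and } B) + \mu(A \text{ and } B') + \mu(A' \text{ and } B) \\ + \mu(A' \text{ and } B') = 1 \end{aligned} \tag{30}$$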

Equations (26–30) express classicality conditions in their most symmetric form. A more traditional way to quantify deviations from classical conjunction in real data is resorting to the following parameters.

$$
\Delta_{AB} = \mu(A \text{ and } B) - \min\{\mu(A), \mu(B)\} \tag{31}
$$

$$
\Delta_{AB'} = \mu(A \text{ and } B') - \min\{\mu(A), \mu(B')\} \tag{32}
$$

$$
\Delta_{A'B} = \mu(A' \text{ and } B) - \min\{\mu(A'), \mu(B)\} \tag{33}
$$

$$
\Delta_{A'B'} = \mu(A' \text{ and } B') - \min\{\mu(A'), \mu(B')\} \tag{34}
$$

In fact, the quantities ΔAB, ΔAB′, ΔA′B, and ΔA′B′ typically measure overextension with respect to the conjunctions "A and B," "A and B′," "A′ and B," and "A′ and B′," respectively (Hampton, 1988a). However, overextension-type deviations are generally not the only way in which membership for conjunction of concepts can deviate from classicality. Let us now introduce the following quantities:

$$k_{AB} = 1 - \mu(A) - \mu(B) + \mu(A \text{ and } B) \tag{35}$$

$$k_{AB'} = 1 - \mu(A) - \mu(B') + \mu(A \text{ and } B') \tag{36}$$

$$k_{A'B} = 1 - \mu(A') - \mu(B) + \mu(A' \text{ and } B) \tag{37}$$

$$k_{A'B'} = 1 - \mu(A') - \mu(B') + \mu(A' \text{ and } B') \tag{38}$$

The quantities kAB, kAB′, kA′B, and kA′B′ have been named "Kolmogorovian conjunction factors" and studied in detail in Aerts (2009). The Kolmogorovian factors measure a deviation that can be understood as being of the "opposite type" to the deviation measured by the overextension. Namely, the condition for kAB is violated when both µ(A) and µ(B) are "too large" compared with µ(A and B). Finally, we introduce a new type of quantity that measures the deviations from classicality as expressed by Equations (26–30), hence essentially deviations from the marginal law of classical probability:<sup>2</sup>

$$I_{ABA'B'} = 1 - \mu(A \text{ and } B) - \mu(A \text{ and } B') - \mu(A' \text{ and } B) - \mu(A' \text{ and } B') \tag{39}$$

$$I_A = \mu(A) - \mu(A \text{ and } B) - \mu(A \text{ and } B') \tag{40}$$

$$I_B = \mu(B) - \mu(A \text{ and } B) - \mu(A' \text{ and } B) \tag{41}$$

$$I_{A'} = \mu(A') - \mu(A' \text{ and } B') - \mu(A' \text{ and } B) \tag{42}$$

$$I_{B'} = \mu(B') - \mu(A' \text{ and } B') - \mu(A \text{ and } B') \tag{43}$$

Finally, Theorem 3′ can be reformulated by means of the introduced parameters as follows.

**Theorem 3**′′ **.** If the membership weights µ(A),µ(B),µ(A ′ ),µ(B ′ ), µ(A and B), µ(A and B ′ ), µ(A ′ and B), and µ(A ′ and B ′ ) of an exemplar x with respect to concepts A, B, A′ , and B′ and the conjunctions "A and B," "A and B′ ," "A′ and B," and "A′ and B′ ," are all contained in the interval [0, 1], they are classical conjunction data if and only if

$$I_{ABA'B'} = I_A = I_B = I_{A'} = I_{B'} = 0 \tag{44}$$
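The check expressed by Theorem 3′′ is easy to mechanize. The following is a minimal Python sketch, not the procedure used by the authors: the class `MembershipWeights`, its field names, the function names, the tolerance `tol`, and both example data sets are assumptions introduced here for illustration. It computes the five deviations of Equations (39–43) and tests whether they all vanish.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class MembershipWeights:
    """Eight membership weights of one exemplar (field names are illustrative)."""
    a: float       # mu(A)
    b: float       # mu(B)
    na: float      # mu(A')
    nb: float      # mu(B')
    a_b: float     # mu(A and B)
    a_nb: float    # mu(A and B')
    na_b: float    # mu(A' and B)
    na_nb: float   # mu(A' and B')


def marginal_deviations(w):
    """Deviations from the marginal law, following Eqs. (39)-(43)."""
    return {
        "I_ABA'B'": 1 - w.a_b - w.a_nb - w.na_b - w.na_nb,
        "I_A": w.a - w.a_b - w.a_nb,
        "I_B": w.b - w.a_b - w.na_b,
        "I_A'": w.na - w.na_nb - w.na_b,
        "I_B'": w.nb - w.na_nb - w.a_nb,
    }


def is_classical(w, tol=1e-9):
    """Theorem 3'' check: weights in [0, 1] and all five deviations zero up to tol."""
    in_unit_interval = all(0.0 <= v <= 1.0 for v in astuple(w))
    deviations_vanish = all(abs(d) <= tol for d in marginal_deviations(w).values())
    return in_unit_interval and deviations_vanish


# Hypothetical weights built from a joint distribution, hence classical by construction.
p_ab, p_anb, p_nab, p_nanb = 0.3, 0.2, 0.4, 0.1
classical = MembershipWeights(
    a=p_ab + p_anb, b=p_ab + p_nab, na=p_nab + p_nanb, nb=p_anb + p_nanb,
    a_b=p_ab, a_nb=p_anb, na_b=p_nab, na_nb=p_nanb,
)

# Hypothetical overextended weights: the four conjunctions sum to 1.8 instead of 1.
non_classical = MembershipWeights(
    a=0.6, b=0.5, na=0.4, nb=0.5, a_b=0.45, a_nb=0.45, na_b=0.45, na_nb=0.45,
)

print(is_classical(classical))
print(is_classical(non_classical))
print(marginal_deviations(non_classical))
```

With measured relative frequencies an exact zero is not to be expected, so in practice the tolerance would have to be replaced by a statistical criterion such as the significance tests described in the results.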

## 2.4. Results

Let us now come back to our experiments. Theorems 1–3 are manifestly violated in several cases, and we report in Appendix A3 the relevant conditions that should hold in a classical setting. Since the conditions kAB > 0, kAB′ > 0, kA′B > 0, and kA′B′ > 0 are always satisfied, they are not explicitly inserted in **Tables A1**–**A4**. On the contrary, the classicality conditions on ΔXY, IX, and IY, with X = A, A′ and Y = B, B′, and on IABA′B′ are systematically violated. This means that deviations from a classical probability model in our experimental data are due to both overextension in the conjunctions and violations of classicality in the negations. We consider some relevant cases in the following.

The exemplar Apple scores µ(A) = 1 with respect to the concept Fruits, µ(B) = 0.23 with respect to the concept Vegetables, and µ(A and B) = 0.6 with respect to the conjunction Fruits And Vegetables, hence it has ΔAB = 0.38 (**Table A4**). The exemplar Prize Bull scores µ(A) = 0.13 with respect to Pets, µ(B) = 0.76 with respect to the concept Farmyard Animals, and µ(A and B) = 0.43 with respect to the conjunction Pets And Farmyard Animals, hence it has ΔAB = 0.29 (**Table A3**). The membership weight of Chili Pepper with respect to Spices is 0.98, with respect to Herbs is 0.53, while its membership weight with respect to the conjunction Spices And Herbs is 0.8, hence ΔAB = 0.27, thus giving rise to overextension (**Table A2**). Even stronger deviations are observed in the combination Fruits And Vegetables. For example, the exemplar Broccoli scores 0.09 with respect to Fruits, 1 with respect to Vegetables, and 0.59 with respect to Fruits And Vegetables (ΔAB = 0.49). A similar pattern is observed for Parsley, which scores 0.02 with respect to Fruits, 0.78 with respect to Vegetables and 0.45 with respect to Fruits And Vegetables (ΔAB = 0.43, **Table A4**).

<sup>2</sup>Remark that, if we set IAA′ = 1 − µ(A) − µ(A′) and IBB′ = 1 − µ(B) − µ(B′), we have IAA′ = IABA′B′ − IA − IA′ and IBB′ = IABA′B′ − IB − IB′, which means that the parameters IAA′ and IBB′ used in Sozzo (2015) can be derived from the parameters IABA′B′, IA, IB, IA′, and IB′.

Overextension is also present when one concept is negated. More explicitly:

(i) in the conjunction "A and B′." Indeed, the membership weights of Shelves with respect to Home Furnishing, Not Furniture, and Home Furnishing And Not Furniture are 0.85, 0.13, and 0.39, respectively, for a ΔAB′ = 0.26 (**Table A1**). Then, Pepper scores 0.99 with respect to Spices, 0.58 with respect to Not Herbs, and 0.9 with respect to Spices And Not Herbs, for a ΔAB′ = 0.32 (**Table A2**). Finally, Doberman Guard Dog gives 0.88 and 0.27 with respect to Pets and Not Farmyard Animals, respectively, while it scores 0.55 with respect to Pets And Not Farmyard Animals, hence it scores ΔAB′ = 0.28 (**Table A3**).

(ii) in the conjunction "A′ and B." Indeed, the membership weights of Desk with respect to Not Home Furnishing, Furniture and Not Home Furnishing And Furniture are 0.31, 0.95, and 0.75, respectively, for a ΔA′B = 0.44 (**Table A1**). The exemplar Oregano scores 0.21 with respect to Not Spices, 0.86 with respect to Herbs, and 0.5 with respect to Not Spices And Herbs, for a ΔA′B = 0.29 (**Table A2**). Finally, again Doberman Guard Dog gives 0.14 and 0.76 with respect to Not Pets and Farmyard Animals, respectively, while it scores 0.45 with respect to Not Pets And Farmyard Animals, hence it scores ΔA′B = 0.45 (**Table A3**).

When two concepts are negated, as in "A′ and B′," we have, for example, µ(A′) = 0.12, µ(B′) = 0.81 and µ(A′ and B′) = 0.43 for Goldfish, with respect to Not Pets and Not Farmyard Animals, hence ΔA′B′ = 0.31 in this case (**Table A3**). Moreover, the exemplar Garlic scores µ(A′) = 0.88 with respect to Not Fruits, µ(B′) = 0.24 with respect to Not Vegetables, and µ(A′ and B′) = 0.45 with respect to Not Fruits And Not Vegetables, for a ΔA′B′ = 0.21 (**Table A4**).

Double overextension is also present in various cases. For example, the membership weight of Olive with respect to Fruits And Vegetables is 0.65, which is greater than both 0.53 and 0.63, i.e., the membership weights of Olive with respect to Fruits and Vegetables, respectively (**Table A4**). Furthermore, Prize Bull scores 0.13 with respect to Pets and 0.26 with respect to Not Farmyard Animals, but its membership weight with respect to Pets And Not Farmyard Animals is 0.28 (**Table A3**). Also, Door Bell gives 0.32 with respect to Not Home Furnishing and 0.33 with respect to Furniture, while it gives 0.34 with respect to Not Home Furnishing And Furniture.
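The overextension parameters of Equations (31–34) and the distinction between single and double overextension can be recomputed from the quoted weights. The sketch below is illustrative only: the function names `overextension` and `classify` and the dictionary `quoted` are introduced here, and the inputs are the rounded weights quoted in the text, so a recomputed Δ may differ from a reported one in the last digit.

```python
def overextension(mu_x, mu_y, mu_xy):
    """Delta of Eqs. (31)-(34): mu(X and Y) - min(mu(X), mu(Y))."""
    return mu_xy - min(mu_x, mu_y)


def classify(mu_x, mu_y, mu_xy):
    """Label a conjunction weight relative to its two constituent weights."""
    if mu_xy > max(mu_x, mu_y):
        return "double overextension"
    if mu_xy > min(mu_x, mu_y):
        return "overextension"
    return "no overextension"


# (mu(X), mu(Y), mu(X and Y)) as quoted in the text for three exemplars.
quoted = {
    "Chili Pepper / Spices And Herbs": (0.98, 0.53, 0.80),
    "Olive / Fruits And Vegetables": (0.53, 0.63, 0.65),
    "Prize Bull / Pets And Not Farmyard Animals": (0.13, 0.26, 0.28),
}

for name, (mu_x, mu_y, mu_xy) in quoted.items():
    delta = overextension(mu_x, mu_y, mu_xy)
    print(f"{name}: Delta = {delta:.2f}, {classify(mu_x, mu_y, mu_xy)}")
```

A positive Δ only signals that the conjunction exceeds the smaller constituent weight; comparing against the larger weight as well is what separates the double case.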

Significant deviations from classicality are also due to conceptual negation, in the form of violation of the marginal law of classical probability theory. By again referring to **Tables A1**–**A4**, we have that the exemplar Field Mouse has IABA′B′ in Equation (39) equal to −0.46 (**Table A3**), while the exemplar Doberman Guard Dog has IABA′B′ = −1.03 (**Table A3**). Both exemplars thus violate Equation (30). Analogously, Chili Pepper has IA in Equation (40) equal to −0.73 (**Table A2**), hence it violates Equation (26), while Pumpkin has IB′ in Equation (43) equal to −0.13 (**Table A4**), hence it violates Equation (29).

We performed a statistical analysis of the data, estimating the probability that the experimentally identified deviations from classicality would be due to chance. We specifically considered the classicality conditions in Equations (26–30), with the aim of showing that the deviations I<sub>X</sub>, X = A, A′, I<sub>Y</sub>, Y = B, B′, and I<sub>ABA′B′</sub> in Equations (40–43) were statistically significant. We first performed a "two-tail t-test for paired two samples for means" to test deviations from the marginal law of classical probability, that is, we tested violations of Equations (26–29) by comparing µ(X) with Σ<sub>Y=B,B′</sub> µ(X and Y), X = A, A′, and µ(Y) with Σ<sub>X=A,A′</sub> µ(X and Y), Y = B, B′. Then, we performed a "two-tail t-test for one sample for means" to test Σ<sub>X=A,A′</sub> Σ<sub>Y=B,B′</sub> µ(X and Y) against 1. The corresponding p-values for df = 37 are reported in **Tables A5A–E**. Because of the high number of multiple comparisons (24 null hypotheses were tested for each pair (X, Y) of concepts) we applied a "Bonferroni correction procedure" to control the so-called "family-wise error rate" (FWER). Hence, we compared the obtained p-values with the reference value 0.05/24 ≈ 0.002. We found p-values systematically much lower than this reference value, for all exemplars and pairs of concepts, which makes it possible to conclude that the experimentally tested deviations from classicality are not due to chance.
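The two ingredients of this analysis, the paired t statistic and the Bonferroni-corrected threshold, can be sketched in Python as follows. This is only an illustration: the sample values are made up, the function names are hypothetical, and the conversion of the t statistic into a p-value (which requires the Student t distribution with the appropriate degrees of freedom) is left to a statistics package.

```python
import math
import statistics


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error rate below alpha."""
    return alpha / n_tests


def paired_t_statistic(sample_x, sample_y):
    """Return (t, degrees of freedom) for a paired two-sample t-test for means."""
    diffs = [x - y for x, y in zip(sample_x, sample_y)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    std_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1 in the denominator)
    return mean_diff / (std_diff / math.sqrt(n)), n - 1


# Reference value used in the text: 0.05 / 24, roughly 0.002.
threshold = bonferroni_threshold(0.05, 24)

# Hypothetical data: mu(X) versus the sum over Y of mu(X and Y) for four exemplars.
mu_x = [0.90, 0.75, 0.60, 0.85]
sum_xy = [1.20, 1.10, 0.80, 1.25]
t_value, dof = paired_t_statistic(mu_x, sum_xy)
print(threshold, t_value, dof)
```

A null hypothesis is rejected only when its p-value falls below `threshold`, not below the uncorrected level 0.05.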

In addition, our data analysis reveals a new, fundamental and a priori unexpected deviation from classicality. The numerical values of I<sub>A</sub>, I<sub>B</sub>, I<sub>A′</sub>, I<sub>B′</sub>, and I<sub>ABA′B′</sub> in Equations (40–43) are reported in Aerts et al. (2015b). They are such that the corresponding pattern of violation exhibits specific features:


Observations (i–iv) were for us a clue that I<sub>A</sub>, I<sub>B</sub>, I<sub>A′</sub>, I<sub>B′</sub>, and I<sub>ABA′B′</sub> are constant functions across all exemplars and pairs of concepts. This is indeed the case, as we have proved in Aerts et al. (2015b) by means of a "linear regression statistical analysis." This pattern is so unexpectedly stable, systematic and regular, being independent of exemplars, concepts and conceptual connectives, that it constitutes for us a fundamental new finding. We believe that this deviation from classicality occurs at a deeper level than the known deviations due to overextension and underextension, and that it expresses a fundamental mechanism of concept formation.

These results could already be considered as crucial for claiming that the violation of classicality occurs at a deep structural conceptual level, but this is not the end of the story. We will see in Section 5 that the stability of this violation can be explained exactly in a quantum-theoretic framework in two-sector Fock space that we have developed. Hence, we devote Sections 3 and 4 to presenting this modeling framework (the essentials of the formalism we apply are reviewed in Appendix A2, and we refer to it for symbols and notation).

# 3. Quantum Modeling Conceptual Conjunctions and Negations

In Aerts (2009) we proved that a large amount of the experimental data collected in Hampton (1988a,b) on conjunctions and disjunctions of two concepts can be modeled by using the mathematical formalism of quantum theory. A two-sector Fock space then provided an optimal algebraic setting for this modeling. In Sozzo (2014) we proved that this quantum-theoretic framework was suitable to model the data collected in Alxatib and Pelletier (2011) on conjunctions of the form "A and A′," and in Sozzo (2015) we were able to prove that the experimental data collected on conjunctions of the form "A and B" and "A and B′," for specific pairs (A, B) of concepts, can also be represented by using the same quantum mathematics. However, a complete modeling of data on both conjunctions and negations requires performing new experiments, where the conceptual conjunctions "A and B" and "A and B′" are tested together with the conceptual conjunctions "A′ and B" and "A′ and B′." The complete collection of these experiments has been discussed in Section 2. As anticipated in Section 1, we undertake this modeling task here. It is natural to observe that the modeling in Aerts (2009) needs a suitable generalization, since conceptual negation should be taken into account as well. However, we will see later in this section that such a generalization is completely compatible with the original model, because it rests on the assumption that (probabilistic versions of) logical rules hold only in second sector of Fock space. By introducing this quite natural assumption, we were able to model conceptual conjunctions and disjunctions in Fock space. We now show that conceptual conjunctions and negations can be modeled in Fock space by introducing the same assumption.

To model conceptual negations we also need a new theoretical step which was not necessary in our previous formulations, namely, the introduction of "entangled states" in second sector of Fock space to formalize situations where the membership weights are not independent. This introduction, together with the application of quantum logical rules in second sector of Fock space, is compatible with previous formulations, but it makes our generalization in this paper highly non-obvious. We will extensively discuss the novelties of the present modeling in the next sections. Let us first proceed with our mathematical construction.

Let us denote by µ(A), µ(B), µ(A′), µ(B′), µ(A and B), µ(A and B′), µ(A′ and B), and µ(A′ and B′) the membership weights of a given exemplar x with respect to the concepts A, B, the negations A′, B′ and the conjunctions "A and B," "A and B′," "A′ and B," and "A′ and B′," respectively.

The decision measurement testing whether a specific exemplar x is a member or not a member of a concept A is represented by the spectral decomposition of the identity consisting of the two orthogonal projection operators M (which generally depends on x; we omit this dependence for the sake of brevity) and 𝟙 − M defined in a complex Hilbert space H. The concepts A and B are represented by orthogonal unit vectors |A⟩ and |B⟩, respectively, of H. Hence we have

$$
\langle A|A\rangle = \langle B|B\rangle = 1 \quad \langle A|B\rangle = 0 \tag{45}
$$

By using standard rules for quantum probabilities (see Appendix A2), we have the following

$$
\mu(A) = \langle A|M|A\rangle \quad \mu(B) = \langle B|M|B\rangle \tag{46}
$$

where µ(A) and µ(B) are the measured membership weights of x with respect to the concepts A and B, respectively, in the performed experiment.

The conceptual negations A′ and B′ are represented by another pair of orthogonal unit vectors |A′⟩ and |B′⟩, respectively, such that the set {|A⟩, |B⟩, |A′⟩, |B′⟩} is an orthonormal set. Hence we have

$$
\mu(A') = \langle A'|M|A'\rangle \quad \mu(B') = \langle B'|M|B'\rangle \tag{47}
$$

where µ(A′) and µ(B′) are the measured membership weights of x with respect to the negations A′ and B′ of the concepts A and B, respectively, in the performed experiment.

#### 3.1. The First Sector Analysis

Let us first analyze the situation where we look for a modeling solution in the Hilbert space H, which for our complete quantum model in Fock space will be the first sector of this Fock space, as we will show in detail later. In the Hilbert space H, the concepts "A and B," "A and B′," "A′ and B," and "A′ and B′" are respectively represented by the superposition vectors<sup>3</sup> (1/√2)(|A⟩ + |B⟩), (1/√2)(|A⟩ + |B′⟩), (1/√2)(|A′⟩ + |B⟩), and (1/√2)(|A′⟩ + |B′⟩).

<sup>3</sup>We introduce in this model a superposition vector with equal weights on the two vectors. The general case of a weighted superposition can be considered in future investigation, and it is an interesting line of research in itself, as the interpretation of the weights is not trivial.

Let us analyze this situation in detail, with the aim of obtaining an overview of the possible solutions. Geometric considerations suggest that, if we look for a solution in the complex Hilbert space ℂ<sup>8</sup>, we will find the most general type of solution. Indeed, since we consider four orthonormal vectors |A⟩, |A′⟩, |B⟩, and |B′⟩, our Hilbert space will contain a four-dimensional subspace generated by these vectors. Further, we have two orthogonal projection operators M and 𝟙 − M that act on this four-dimensional subspace. The image of a subspace under a projection operator has dimension no greater than that of the subspace itself, which means that the image under M of the four-dimensional subspace has dimension at most four, and the same holds for its image under 𝟙 − M. Since M and 𝟙 − M are orthogonal, these two images can together span a subspace of dimension eight, but not more. This means that we can incorporate all that we need in an eight-dimensional complex Hilbert space. This is the reason why we look for our representation starting with ℂ<sup>8</sup>, knowing that the choice of a Hilbert space with more than eight dimensions would not add degrees of freedom that could give rise to solutions beyond those that can be found in ℂ<sup>8</sup>. Hence, we explicitly use this Hilbert space in what follows, bearing in mind, however, that our results hold in any higher dimensional Hilbert space. Let {|1⟩ = (1, 0, …, 0), |2⟩ = (0, 1, …, 0), …, |8⟩ = (0, 0, …, 1)} denote the canonical basis of ℂ<sup>8</sup>. We construct a representation in ℂ<sup>8</sup> where M projects on the subspace ℂ<sup>4</sup> generated by the last four vectors of this canonical basis, and 𝟙 − M on the subspace ℂ<sup>4</sup> generated by the first four vectors of it. If we set

$$|A\rangle = e^{i\phi_A}(a_1, a_2, a_3, a_4, a_5, a_6, a_7, a_8) \tag{48}$$

$$|A'\rangle = e^{i\phi_{A'}}(a'_1, a'_2, a'_3, a'_4, a'_5, a'_6, a'_7, a'_8) \tag{49}$$

$$|B\rangle = e^{i\phi_B}(b_1, b_2, b_3, b_4, b_5, b_6, b_7, b_8) \tag{50}$$

$$|B'\rangle = e^{i\phi_{B'}}(b'_1, b'_2, b'_3, b'_4, b'_5, b'_6, b'_7, b'_8) \tag{51}$$

then Equations (46) and (47) become

$$\mu(A) = \langle A|M|A\rangle = a_5^2 + a_6^2 + a_7^2 + a_8^2 \tag{52}$$

$$1 - \mu(A) = \langle A|\mathbb{1} - M|A\rangle = a_1^2 + a_2^2 + a_3^2 + a_4^2 \tag{53}$$

$$\mu(A') = \langle A'|M|A'\rangle = a'^2_5 + a'^2_6 + a'^2_7 + a'^2_8 \tag{54}$$

$$1 - \mu(A') = \langle A'|\mathbb{1} - M|A'\rangle = a'^2_1 + a'^2_2 + a'^2_3 + a'^2_4 \tag{55}$$

$$\mu(B) = \langle B|M|B\rangle = b_5^2 + b_6^2 + b_7^2 + b_8^2 \tag{56}$$

$$1 - \mu(B) = \langle B|\mathbb{1} - M|B\rangle = b_1^2 + b_2^2 + b_3^2 + b_4^2 \tag{57}$$

$$\mu(B') = \langle B'|M|B'\rangle = b'^2_5 + b'^2_6 + b'^2_7 + b'^2_8 \tag{58}$$

$$1 - \mu(B') = \langle B'|\mathbb{1} - M|B'\rangle = b'^2_1 + b'^2_2 + b'^2_3 + b'^2_4 \tag{59}$$

and the orthogonality conditions become

$$\begin{aligned} 0 = \langle A|A'\rangle &= a_1a_1' + a_2a_2' + a_3a_3' + a_4a_4' + a_5a_5' \\ &+ a_6a_6' + a_7a_7' + a_8a_8' \end{aligned} \tag{60}$$

$$\begin{aligned} 0 = \langle B|B'\rangle &= b_1b_1' + b_2b_2' + b_3b_3' + b_4b_4' + b_5b_5' \\ &+ b_6b_6' + b_7b_7' + b_8b_8' \end{aligned} \tag{61}$$

$$\begin{aligned} 0 = \langle A|B\rangle &= a_1b_1 + a_2b_2 + a_3b_3 + a_4b_4 + a_5b_5 \\ &+ a_6b_6 + a_7b_7 + a_8b_8 \end{aligned} \tag{62}$$

$$\begin{aligned} 0 = \langle A|B'\rangle &= a_1b_1' + a_2b_2' + a_3b_3' + a_4b_4' + a_5b_5' \\ &+ a_6b_6' + a_7b_7' + a_8b_8' \end{aligned} \tag{63}$$

$$\begin{aligned} 0 = \langle A'|B\rangle &= a_1'b_1 + a_2'b_2 + a_3'b_3 + a_4'b_4 + a_5'b_5 \\ &+ a_6'b_6 + a_7'b_7 + a_8'b_8 \end{aligned} \tag{64}$$

$$\begin{aligned} 0 = \langle A'|B'\rangle &= a_1'b_1' + a_2'b_2' + a_3'b_3' + a_4'b_4' + a_5'b_5' \\ &+ a_6'b_6' + a_7'b_7' + a_8'b_8' \end{aligned} \tag{65}$$

A solution of Equations (52–65) gives us a configuration of the four orthonormal vectors |A⟩, |A′⟩, |B⟩, and |B′⟩ in ℂ<sup>8</sup>, such that the self-adjoint operator formed by the spectral decomposition of the two orthogonal projections M and 𝟙 − M gives rise to the values µ(A), 1 − µ(A), µ(A′), 1 − µ(A′), µ(B), 1 − µ(B), and µ(B′), 1 − µ(B′), corresponding to the measured data.

By using standard rules for quantum probabilities we have that the membership weights for the conjunctions corresponding to the measured data should satisfy the following equations:

$$\begin{aligned} \mu(A \text{ and } B) &= \frac{1}{\sqrt{2}}(\langle A| + \langle B|)\, M\, \frac{1}{\sqrt{2}}(|A\rangle + |B\rangle) \\ &= \frac{1}{2}(\mu(A) + \mu(B)) + \Re\langle A|M|B\rangle \end{aligned} \tag{66}$$

$$\begin{aligned} \mu(A \text{ and } B') &= \frac{1}{\sqrt{2}}(\langle A| + \langle B'|)\, M\, \frac{1}{\sqrt{2}}(|A\rangle + |B'\rangle) \\ &= \frac{1}{2}(\mu(A) + \mu(B')) + \Re\langle A|M|B'\rangle \end{aligned} \tag{67}$$

$$\begin{aligned} \mu(A' \text{ and } B) &= \frac{1}{\sqrt{2}}(\langle A'| + \langle B|)\, M\, \frac{1}{\sqrt{2}}(|A'\rangle + |B\rangle) \\ &= \frac{1}{2}(\mu(A') + \mu(B)) + \Re\langle A'|M|B\rangle \end{aligned} \tag{68}$$

$$\begin{aligned} \mu(A' \text{ and } B') &= \frac{1}{\sqrt{2}}(\langle A'| + \langle B'|)\, M\, \frac{1}{\sqrt{2}}(|A'\rangle + |B'\rangle) \\ &= \frac{1}{2}(\mu(A') + \mu(B')) + \Re\langle A'|M|B'\rangle \end{aligned} \tag{69}$$

Hence, in ℂ<sup>8</sup> these equations become

$$\begin{aligned} \mu(A \text{ and } B) &= \frac{1}{2}(\mu(A) + \mu(B)) + \Re\langle A|M|B\rangle \\ &= \frac{1}{2}(\mu(A) + \mu(B)) + (a_5b_5 + a_6b_6 + a_7b_7 + a_8b_8)\cos(\phi_B - \phi_A) \end{aligned} \tag{70}$$

$$\begin{aligned} \mu(A \text{ and } B') &= \frac{1}{2}(\mu(A) + \mu(B')) + \Re\langle A|M|B'\rangle \\ &= \frac{1}{2}(\mu(A) + \mu(B')) + (a_5b_5' + a_6b_6' + a_7b_7' + a_8b_8')\cos(\phi_{B'} - \phi_A) \end{aligned} \tag{71}$$

$$\begin{aligned} \mu(A' \text{ and } B) &= \frac{1}{2}(\mu(A') + \mu(B)) + \Re\langle A'|M|B\rangle \\ &= \frac{1}{2}(\mu(A') + \mu(B)) + (a_5'b_5 + a_6'b_6 + a_7'b_7 + a_8'b_8)\cos(\phi_B - \phi_{A'}) \end{aligned} \tag{72}$$

$$\begin{aligned} \mu(A' \text{ and } B') &= \frac{1}{2}(\mu(A') + \mu(B')) + \Re\langle A'|M|B'\rangle \\ &= \frac{1}{2}(\mu(A') + \mu(B')) + (a_5'b_5' + a_6'b_6' + a_7'b_7' + a_8'b_8')\cos(\phi_{B'} - \phi_{A'}) \end{aligned} \tag{73}$$
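As a numerical check of the first sector construction, the following Python sketch builds two orthonormal vectors in ℂ<sup>8</sup>, applies the projector M onto the last four canonical basis vectors, and compares the directly computed weight of the superposition with the closed form of Equation (70). The amplitudes and phases are hypothetical values chosen only to make the arithmetic transparent, and all function names are illustrative.

```python
import cmath
import math


def ket(phase, amplitudes):
    """Return e^{i*phase} * (a_1, ..., a_8) as a list of complex numbers."""
    return [cmath.exp(1j * phase) * a for a in amplitudes]


def inner(u, v):
    """Hilbert-space inner product <u|v>, conjugate-linear in u."""
    return sum(x.conjugate() * y for x, y in zip(u, v))


def apply_M(v):
    """M projects onto the span of the last four canonical basis vectors."""
    return [0j] * 4 + list(v[4:])


def weight(v):
    """Membership weight <v|M|v> of a unit vector v."""
    return inner(v, apply_M(v)).real


def conjunction_weight(u, v):
    """Weight of the equal-weight superposition (|u> + |v>) / sqrt(2)."""
    superposition = [(x + y) / math.sqrt(2) for x, y in zip(u, v)]
    return weight(superposition)


def formula_70(a, b, phi_a, phi_b):
    """Right-hand side of Equation (70) for real amplitude lists a and b."""
    mu_a = sum(x * x for x in a[4:])
    mu_b = sum(x * x for x in b[4:])
    interference = sum(x * y for x, y in zip(a[4:], b[4:]))
    return 0.5 * (mu_a + mu_b) + interference * math.cos(phi_b - phi_a)


# Hypothetical real amplitudes with a . b = 0, so that <A|B> = 0.
s = math.sqrt(0.5)
a = [s, 0, 0, 0, s, 0, 0, 0]
b = [-s, 0, 0, 0, s, 0, 0, 0]
phi_a, phi_b = 0.0, math.pi / 3
A, B = ket(phi_a, a), ket(phi_b, b)

direct = conjunction_weight(A, B)
closed_form = formula_70(a, b, phi_a, phi_b)
print(weight(A), weight(B), direct, closed_form)
```

With these values µ(A) = µ(B) = 0.5 while the conjunction weight is 0.75, an overextension produced entirely by the interference term; changing the phase difference moves the conjunction weight between 0 and 1.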

The conditions that should be satisfied by experimental data in order to represent them in ℂ<sup>8</sup> are reported in Appendix A5. By analogy with what we found in Aerts (2009) and Sozzo (2014, 2015), however, we expect that our experimental data in Appendix A3 cannot generally be modeled in the complex Hilbert space ℂ<sup>8</sup>, or first sector of Fock space, alone, and that a second sector ℂ<sup>8</sup> ⊗ ℂ<sup>8</sup> of Fock space is also needed. Consider for example a simple case that holds in classical logic: µ(A) = 1, µ(B) = 0 and µ(A and B) = 0. This case is consistent with the minimum conjunction rule (Zadeh, 1982) but not with our first sector of Fock space (Hilbert space) model. Indeed, µ(B) = 0 means that M|B⟩ = 0, so the interference term ℜ⟨A|M|B⟩ vanishes and Equation (66) gives µ(A and B) = 1/2 rather than 0. We will see that cases of this type are compatible with second sector of Fock space, and show that a framework that encompasses both "logical" and "emergent" reasoning about membership in these situations requires the general properties of a Fock space. To make our description complete, however, we first have to introduce a new, conceptually relevant, ingredient.

#### 3.2. Introducing Entanglement in Conceptual Combinations

In Aerts (2009) and Sozzo (2014, 2015) we successfully modeled conjunctions of the form "A and B" in a Fock space constructed as the direct sum of an individual Hilbert space H, or "first sector of Fock space," and a tensor product Hilbert space H ⊗ H, or "second sector of Fock space." The concepts A and B were respectively represented by the unit vectors |A⟩ and |B⟩ of H, while the conjunction "A and B" was represented by the unit vector (1/√2)(|A⟩ + |B⟩) in first sector, and by the tensor product vector |A⟩ ⊗ |B⟩ in second sector. The decision measurement of a person who estimates whether a given exemplar x is a member of "A and B" was represented by the orthogonal projection operator M in first sector, and by the tensor product projection operator M ⊗ M in second sector. In the Fock space H ⊕ (H ⊗ H) as a whole, the conjunction "A and B" was then represented by a unit vector of the form ψ(A, B) = m e<sup>iθ</sup> |A⟩ ⊗ |B⟩ + n e<sup>iρ</sup> (1/√2)(|A⟩ + |B⟩), while the decision measurement was represented by the orthogonal projection operator M ⊕ (M ⊗ M) in the same Fock space. By using quantum probabilistic rules, one could then write the membership weight of x with respect to "A and B" as µ(A and B) = ⟨ψ(A, B)|(M ⊗ M) ⊕ M|ψ(A, B)⟩ = m<sup>2</sup>µ(A)µ(B) + n<sup>2</sup>(½(µ(A) + µ(B)) + ℜ⟨A|M|B⟩). This treatment now needs to be generalized to the decision measurement of the concepts A, B, the negations A′, B′ and the conjunctions "A and B," "A and B′," "A′ and B," and "A′ and B′." The first sector situation has already been analysed in Section 3.1, where we have also constructed an explicit representation in the complex Hilbert space ℂ<sup>8</sup>. Here we analyse the second sector situation, but we also allow for the possibility of representing concepts by entangled states.

Is it possible to introduce some "type of entanglement" in second sector ℂ<sup>8</sup> ⊗ ℂ<sup>8</sup>? This question is interesting, since it is reasonable to believe that the outcomes of experiments for A are not independent of the outcomes of experiments for B. For example, if a specific exemplar x is strongly a member of Fruits, this will influence the strength of its membership of Vegetables, and vice versa, because the meanings of Fruits and Vegetables are not independent. And this apparently occurs for all human concepts. Suppose we combined, for example, Fruits with Not Fruits; then one would expect, for any exemplar, a substantial amount of anti-correlation between its being a member of Fruits and its being a member of Not Fruits. How can we express the general situation, where anti-correlation, as well as correlation, can be increased or decreased by parameters? In the foregoing modeling (Aerts, 2009; Aerts et al., 2013b) we chose the simplest representation for the situation in second sector, namely the product state |A⟩ ⊗ |B⟩, which leads to a situation of complete independence between A and B, whatever exemplar is tested. Let us investigate what the situation would be for a general entangled state, and how a two-sector Fock space already incorporates this possibility.

Suppose that the concept "A and B" is not represented by the product state vector |A⟩ ⊗ |B⟩ in second sector of Fock space ℂ<sup>8</sup> ⊗ ℂ<sup>8</sup>, but by a general entangled state vector |C⟩ of ℂ<sup>8</sup> ⊗ ℂ<sup>8</sup>. Recall that ℂ<sup>8</sup> is the concrete Hilbert space we have constructed in Section 3.1. In the canonical basis {|i⟩}, i = 1, …, 8, of ℂ<sup>8</sup>, we have

$$|C\rangle = \sum_{i,j=1}^{8} c_{ij} e^{i\gamma_{ij}} |i\rangle \otimes |j\rangle \tag{74}$$

and

$$1 = \langle C|C\rangle = \sum_{i,j=1}^{8} c_{ij}^{2} \tag{75}$$

We now express the effect, as described in second sector, of the experiments where participants were asked to decide for membership (or non-membership) of a specific exemplar with respect to the concepts A and B, as follows. Membership with respect to A, as a yes-no measurement, is represented by the orthogonal projection operators M ⊗ 𝟙, (𝟙 − M) ⊗ 𝟙, as spectral family of the corresponding self-adjoint operator. Hence, in second sector, tests on concept A are described in the first component of the tensor product Hilbert space ℂ<sup>8</sup> ⊗ ℂ<sup>8</sup>, which forms second sector. In an analogous way, tests on concept B are described in the second component of the tensor product, by the orthogonal projection operators 𝟙 ⊗ M, 𝟙 ⊗ (𝟙 − M), as spectral family of the corresponding self-adjoint operator. Note that we do not introduce in any way the concepts A′ and B′ in the second sector description, nor their conjunctions with A and B, at least as far as these conjunctions represent new emergent concepts. The concepts A′ and B′ are indeed "emergent entities," because negating a concept gives rise to a new concept, namely the negation concept.

The experimental data collected on A′, B′ and combinations of them with A and B do not appear in second sector either. All emergence is indeed modeled in first sector. This means that the conjunction of A and B as a new emergent concept does not appear in second sector either; it only appears in first sector, where it is modeled by the superposition. All non-emergent equivalents of these are described by the tensor product of the two self-adjoint operators corresponding to the yes-no experiments with respect to membership performed on concept A and then on concept B, exactly as in our real life experiment that gave rise to our data on A and B. This means that the orthogonal projectors of the spectral family of this tensor product self-adjoint operator describe all cases of non-emergence. This family consists of {M ⊗ M, M ⊗ (𝟙 − M), (𝟙 − M) ⊗ M, (𝟙 − M) ⊗ (𝟙 − M)}. Let us express this on a general entangled state vector |C⟩. We have

$$\mu(A) = \langle C|M \otimes \mathbb{1}|C\rangle = \sum_{i=5}^{8} \sum_{j=1}^{8} c_{ij}^{2} \tag{76}$$

$$\mu(B) = \langle C|\mathbb{1} \otimes M|C\rangle = \sum_{i=1}^{8} \sum_{j=5}^{8} c_{ij}^{2} \tag{77}$$

$$1 - \mu(A) = \langle C|(\mathbb{1} - M) \otimes \mathbb{1}|C\rangle = \sum_{i=1}^{4} \sum_{j=1}^{8} c_{ij}^{2} \tag{78}$$

$$1 - \mu(B) = \langle C|\mathbb{1} \otimes (\mathbb{1} - M)|C\rangle = \sum_{i=1}^{8} \sum_{j=1}^{4} c_{ij}^{2} \tag{79}$$

Further, we have

$$\langle C|M \otimes M|C\rangle = \sum_{i=5}^{8} \sum_{j=5}^{8} c_{ij}^{2} \tag{80}$$

$$\langle C|M \otimes (\mathbb{1} - M)|C\rangle = \sum_{i=5}^{8} \sum_{j=1}^{4} c_{ij}^{2} \tag{81}$$

$$\langle C|(\mathbb{1} - M) \otimes M|C\rangle = \sum_{i=1}^{4} \sum_{j=5}^{8} c_{ij}^{2} \tag{82}$$

$$\langle C|(\mathbb{1} - M) \otimes (\mathbb{1} - M)|C\rangle = \sum_{i=1}^{4} \sum_{j=1}^{4} c_{ij}^{2} \tag{83}$$
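Equations (76)–(83) amount to summing squared moduli over blocks of the 8 × 8 table of coefficients of |C⟩. The Python sketch below makes this concrete for a hypothetical entangled state (the nonzero entries and phases are made up for illustration, and the names are not taken from the paper). It stores the complex coefficients c<sub>ij</sub>e<sup>iγ<sub>ij</sub></sup> directly, which shows that the phases γ<sub>ij</sub> play no role in second sector.

```python
import cmath
import math

LOW = range(0, 4)   # indices of |1>, ..., |4>: the range of 1 - M
HIGH = range(4, 8)  # indices of |5>, ..., |8>: the range of M
ALL = range(8)


def block_weight(coeffs, rows, cols):
    """Sum of |c_ij e^{i gamma_ij}|^2 over the given row and column index sets."""
    return sum(abs(coeffs[i][j]) ** 2 for i in rows for j in cols)


def second_sector_table(coeffs):
    """Evaluate Equations (76), (77) and (80)-(83) for a coefficient table."""
    return {
        "mu(A)": block_weight(coeffs, HIGH, ALL),           # Eq. (76)
        "mu(B)": block_weight(coeffs, ALL, HIGH),           # Eq. (77)
        "M x M": block_weight(coeffs, HIGH, HIGH),          # Eq. (80)
        "M x (1-M)": block_weight(coeffs, HIGH, LOW),       # Eq. (81)
        "(1-M) x M": block_weight(coeffs, LOW, HIGH),       # Eq. (82)
        "(1-M) x (1-M)": block_weight(coeffs, LOW, LOW),    # Eq. (83)
    }


# Hypothetical normalized state with four nonzero coefficients and arbitrary phases.
coeffs = [[0j] * 8 for _ in range(8)]
coeffs[4][5] = math.sqrt(0.4) * cmath.exp(0.3j)
coeffs[6][1] = math.sqrt(0.3) * cmath.exp(1.1j)
coeffs[2][7] = math.sqrt(0.2) * cmath.exp(-0.7j)
coeffs[3][0] = math.sqrt(0.1)

table = second_sector_table(coeffs)
print(table)
```

The four joint values add up to one and the marginals are sums of the corresponding joint values, which is the classical structure that the theorem stated after these equations makes precise.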

The values of ⟨C|M ⊗ M|C⟩, ⟨C|M ⊗ (𝟙 − M)|C⟩, ⟨C|(𝟙 − M) ⊗ M|C⟩, and ⟨C|(𝟙 − M) ⊗ (𝟙 − M)|C⟩ will respectively represent the amounts that, within our Fock space model, second sector contributes to the values of µ(A and B), µ(A and B′), µ(A′ and B), and µ(A′ and B′). We can prove that the second sector theoretical values, allowing the state to be a general entangled state in our ℂ<sup>8</sup> ⊗ ℂ<sup>8</sup> Hilbert space model, reach exactly the values to be found when the classicality conditions in Equations (26–30) of Theorem 3′ are satisfied. More explicitly, the following theorem holds (see Appendix A4 for its proof).

**Theorem 4.** If the experimentally collected membership weights µ(A), µ(B), µ(A′), µ(B′), µ(A and B), µ(A and B′), µ(A′ and B), and µ(A′ and B′) can be represented in second sector of Fock space for a given choice of the entangled state vector |C⟩ and the decision measurement projection operator M, then the membership weights satisfy Equations (26–30), hence they are classical data. Conversely, if µ(A), µ(B), µ(A′), µ(B′), µ(A and B), µ(A and B′), µ(A′ and B), and µ(A′ and B′) satisfy Equations (26–30), hence they are classical data, then an entangled state vector |C⟩ and a decision measurement projection operator M can always be found such that µ(A), µ(B), µ(A′), µ(B′), µ(A and B), µ(A and B′), µ(A′ and B), and µ(A′ and B′) can be represented in second sector of Fock space.

Theorem 4 implies that the tensor product Hilbert space model (second sector of Fock space) has exactly the same generality as the most general classical conditions for conjunction and negation. More specifically, given any data that satisfy the five classicality conditions of Theorem 3′, we can construct an entangled state such that in second sector "exactly" these classicality conditions are satisfied. Moreover, it clarifies that entangled states in our general Fock space modeling of data on conceptual conjunction and negation play a fundamental and unexpected role in the combination of human concepts. The fact that classical logical rules are satisfied, in a probabilistic form, in second sector of Fock space provides an important confirmation of our two-sector quantum framework, as we will see in Section 5.
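The converse direction of Theorem 4 can be illustrated with a small Python sketch. It assumes that the five classicality conditions are the four marginal laws plus normalization of the four conjunction weights (a reading inferred from the statistical tests of Section 2, since Equations (26–30) are not reproduced in this section), and it uses one possible state with only four nonzero coefficients. This construction is an illustrative choice and not necessarily the one used in the proof in Appendix A4; the data are hypothetical.

```python
import math


def satisfies_classicality(w, tol=1e-9):
    """Check the assumed conditions: four marginal laws and normalization."""
    deviations = [
        w["A"] - (w["AB"] + w["AB'"]),
        w["B"] - (w["AB"] + w["A'B"]),
        w["A'"] - (w["A'B"] + w["A'B'"]),
        w["B'"] - (w["AB'"] + w["A'B'"]),
        1.0 - (w["AB"] + w["AB'"] + w["A'B"] + w["A'B'"]),
    ]
    return all(abs(d) <= tol for d in deviations)


def entangled_state(w):
    """Coefficient table c_ij (0-based indices) built from the conjunction weights."""
    c = [[0.0] * 8 for _ in range(8)]
    c[4][4] = math.sqrt(w["AB"])     # |5>|5>: range of M in both factors
    c[4][0] = math.sqrt(w["AB'"])    # |5>|1>: range of M, then of 1 - M
    c[0][4] = math.sqrt(w["A'B"])    # |1>|5>: range of 1 - M, then of M
    c[0][0] = math.sqrt(w["A'B'"])   # |1>|1>: range of 1 - M in both factors
    return c


def second_sector_joint(c):
    """Block sums of Equations (80)-(83) for a real coefficient table."""
    low, high = range(4), range(4, 8)

    def block(rows, cols):
        return sum(c[i][j] ** 2 for i in rows for j in cols)

    return {"AB": block(high, high), "AB'": block(high, low),
            "A'B": block(low, high), "A'B'": block(low, low)}


# Hypothetical classical data.
data = {"A": 0.7, "B": 0.6, "A'": 0.3, "B'": 0.4,
        "AB": 0.4, "AB'": 0.3, "A'B": 0.2, "A'B'": 0.1}
recovered = second_sector_joint(entangled_state(data))
print(satisfies_classicality(data), recovered)
```

For data that violate the assumed conditions, for instance an overextended conjunction, the same construction does not yield a normalized state, in line with the first half of the theorem.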

#### 3.3. A Complete Modeling in Fock Space

In Section 3.1 we have considered the situation of first sector of Fock space, representing the starting concepts A and B by the state vectors |A⟩ and |B⟩, respectively, of a Hilbert space H. Then, we have introduced the state vectors |A′⟩ and |B′⟩, which represent the conceptual negations "not A" and "not B," respectively. Since first sector of Fock space describes "emergence," the state vectors |A′⟩ and |B′⟩ can be interpreted as representing the newly emergent concepts "not A" and "not B," respectively, compatibly with the core of the approach we developed in our quantum modeling of combinations of concepts. We have also seen in Section 3.1 that the newly emergent concept "A and B," or one of the other conjunction combinations, "A and B′," "A′ and B," and "A′ and B′," is directly represented in this first sector of Fock space by a state vector, more specifically by the superposition of the corresponding state vectors, namely (1/√2)(|A⟩ + |B⟩), (1/√2)(|A⟩ + |B′⟩), (1/√2)(|A′⟩ + |B⟩) and (1/√2)(|A′⟩ + |B′⟩), respectively.

Following Aerts (2009) and Sozzo (2014, 2015), we should also take into account the logical aspects of conceptual conjunctions and negations in second sector of Fock space, mathematically formed by the tensor product of the Hilbert space H of first sector with itself. In previous papers we had represented the state of the concept "A and B" in second sector by the product vector |A⟩ ⊗ |B⟩ of this tensor product H ⊗ H. However, this inevitably leads to the probability for the conjunction µ(A and B) being equal to the product µ(A)µ(B), as we have seen in Section 3.2. In classical probability theory this means that the corresponding events are probabilistically independent. Now, quite obviously, since the concepts A and B are related by their meaning, these probabilities are not independent. Suppose that B is the negation A′, as in the borderline effect (Sozzo, 2014). Then, we would obviously have an anti-correlation between µ(A) and µ(B). But even in cases that are not this simple, any meaning connection between A and B would give rise to probabilities that are not independent. On the other hand, we have seen in Section 3.2 that we can model any type of classical probabilistic dependence by introducing the proper entangled state for the concept representation of A and of B in second sector. This means that we should not in principle use |A⟩ ⊗ |B⟩ to represent the concepts in second sector, but a properly chosen entangled state.

Let us denote, following our analysis in Section 3.2, such a general entangled state in ℂ<sup>8</sup> ⊗ ℂ<sup>8</sup> by means of

$$|C\rangle = \sum_{i,j=1}^{8} c_{ij} e^{i\gamma_{ij}} |i\rangle \otimes |j\rangle \tag{84}$$

where |i⟩ and |j⟩ are the canonical basis vectors of ℂ<sup>8</sup>.

The state vector representing the concept "A and B" in its totality, hence its first sector part, describing emergent human thought, i.e., the formation of the new concept "A and B," and its second sector part, describing quantum logical human thought, i.e., the conjunctive connective structure "A and B," is then the following

$$
\psi(A,B) = m_{AB}e^{i\theta}|C\rangle + \frac{n_{AB}e^{i\rho}}{\sqrt{2}}(|A\rangle + |B\rangle) \tag{85}
$$

with m<sup>2</sup><sub>AB</sub> + n<sup>2</sup><sub>AB</sub> = 1. It is indeed the superposition of two vectors, one vector given by (1/√2)(|A⟩ + |B⟩) in first sector of Fock space, accounting for the emergent part of human thought with respect to the conjunction, and a second vector given by |C⟩ in second sector of Fock space, accounting for the quantum logical part of human thought with respect to the conjunction. By using Equations (84) and (85), we then get the following general expression for the membership weight of the conjunction

$$\begin{aligned} \mu(A \text{ and } B) &= \langle \psi(A, B)|(M \otimes M) \oplus M|\psi(A, B)\rangle \\ &= m_{AB}^2 \langle C|M \otimes M|C\rangle + \frac{n_{AB}^2}{2}(\langle A| + \langle B|)M(|A\rangle + |B\rangle) \\ &= m_{AB}^2 \sum_{i,j=5}^8 c_{ij}^2 + \frac{n_{AB}^2}{2}(\langle A|M|A\rangle + \langle B|M|B\rangle + \langle A|M|B\rangle + \langle B|M|A\rangle) \\ &= m_{AB}^2 \sum_{i,j=5}^8 c_{ij}^2 + n_{AB}^2\left(\frac{1}{2}(\mu(A) + \mu(B)) + \Re\langle A|M|B\rangle\right) \\ &= m_{AB}^2 \sum_{i,j=5}^8 c_{ij}^2 + n_{AB}^2\left(\frac{1}{2}(\mu(A) + \mu(B)) + (a_5b_5 + a_6b_6 + a_7b_7 + a_8b_8)\cos(\phi_B - \phi_A)\right) \end{aligned} \tag{86}$$

where we have used Equation (70) in the last line of Equation (86).
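The structure of Equation (86) is a convex combination of a second sector ("logical") term and a first sector ("emergent") term. A minimal Python sketch, with a hypothetical function name and with the two sector contributions passed in as plain numbers, shows how the weights n<sup>2</sup><sub>AB</sub> and m<sup>2</sup><sub>AB</sub> = 1 − n<sup>2</sup><sub>AB</sub> interpolate between the two regimes. Here `second_sector_term` stands for the double sum of c<sub>ij</sub><sup>2</sup> over i, j = 5, …, 8 and `interference` for ℜ⟨A|M|B⟩.

```python
def fock_conjunction_weight(n_squared, second_sector_term, mu_a, mu_b, interference):
    """Evaluate the last line of Equation (86) from its ingredients.

    n_squared is the weight of first sector; m_squared = 1 - n_squared
    is the weight of second sector.
    """
    if not 0.0 <= n_squared <= 1.0:
        raise ValueError("n_squared must lie between 0 and 1")
    m_squared = 1.0 - n_squared
    first_sector_term = 0.5 * (mu_a + mu_b) + interference
    return m_squared * second_sector_term + n_squared * first_sector_term


# The classical case discussed in Section 3.1: mu(A) = 1, mu(B) = 0, mu(A and B) = 0.
# With mu(B) = 0 the interference term vanishes and second sector contributes 0.
purely_logical = fock_conjunction_weight(0.0, 0.0, 1.0, 0.0, 0.0)
purely_emergent = fock_conjunction_weight(1.0, 0.0, 1.0, 0.0, 0.0)
print(purely_logical, purely_emergent)
```

Only the purely second sector choice reproduces the classical value 0 in this case, whereas the purely first sector choice gives 1/2; intermediate values of `n_squared` mix the two modes of reasoning.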

What is the procedure corresponding to emergent and quantum logical parts of human thought when we also take into account negations, i.e., when we consider the conjunctions "A and B′," "A′ and B," and "A′ and B′"? As for first sector of Fock space, we have already made it explicit in Section 3.1. Indeed, the new emergent concepts "A′" and "B′" are described by the state vectors |A′⟩ and |B′⟩, and the respective conjunctions, i.e., their emergent aspects as a new concept, each time by means of the corresponding superposition state vector. This is the way the emergence of negation and conjunction are jointly modeled in first sector of Fock space: new state vectors model the new emergent concepts due to negation, and the emergent conjunctions are modeled by the respective superpositions.

In second sector of Fock space, however, we have a specific situation to solve. Namely, exactly as we did for the conjunction, we need to identify the quantum logical structure related to negation, independently of its provoking the emergence of a new concept, i.e., the negation of the original concept. In second sector of Fock space we indeed only express the quantum logical reasoning in human thought and not the emergent reasoning. For the conjunction "A and B" we did this by means of the entangled state $|C\rangle$. Let us reflect on the negation, for example, with respect to the concept B. To make things clear, let us introduce the following two expressions. We are in the experimental situation where a person participating in the experiment has to decide on the membership, or the non-membership, of an exemplar x. The concept B can be involved, and the concept B′ can be involved.

Expression 1. The considered exemplar x is a member of the concept B′.

Expression 2. The considered exemplar x is "not" a member of the concept B.

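Our theoretic proposal is that:

(1) in Expression 1, the focus is on "membership" of this exemplar x with respect to the new emergent concept B′;

(2) in Expression 2, the focus is on "non-membership" of this exemplar x with respect to the old existing concept B.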

Expressions (1) and (2) are, structurally speaking, two subtly but deeply different possibilities of reasoning related to a concept and its negation.

Our third theoretic proposal is that:

(3) human thought, when confronted with this situation, follows a dynamics described by a quantum superposition of the two modes (1) and (2).

We will see in the following that the mathematical structure of Fock space enables modeling this in an impeccable way.

Indeed, expression (1) will be modeled in first sector of our Fock space, and it is mathematically realised by making $M$ work on $|B'\rangle$. Expression (2) will instead be modeled in second sector of Fock space, and it is mathematically realised by making $\mathbb{1} \otimes (\mathbb{1} - M)$ work on $|C\rangle$. In the complete Fock space, the direct sum of its first and second sectors, a superposition of the whole dynamics can be realised mathematically by considering the superposition state which we already specified in Equation (85), together with different structures of the projection operator on the whole of Fock space. More specifically, we use $(M \otimes M) \oplus M$ for "A and B," $(M \otimes (\mathbb{1} - M)) \oplus M$ for "A and B′," $((\mathbb{1} - M) \otimes M) \oplus M$ for "A′ and B," and $((\mathbb{1} - M) \otimes (\mathbb{1} - M)) \oplus M$ for "A′ and B′." However, for each of the combinations the vector representing the combination in second sector of Fock space will be $|C\rangle$. So, no new vector appears in second sector of Fock space, since the negation is expressed quantum logically here, hence by $M$ becoming $\mathbb{1} - M$. In first sector of Fock space, by contrast, the negation is expressed emergently, hence by $|A\rangle$ becoming $|A'\rangle$ and $|B\rangle$ becoming $|B'\rangle$, with $M$ remaining $M$, since the focus in this first sector, with emergent reasoning of human thought, is always on "membership," while in second sector, with quantum logical reasoning, the focus of negation is on "non-membership," described by $\mathbb{1} - M$.
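The second-sector part of this construction can be illustrated with a short Python sketch. It assumes, as the index ranges of the sums in Equations (86), (88), (90), and (92) suggest, that $M$ projects onto the basis vectors 5 to 8 of $\mathbb{C}^8$ and $\mathbb{1} - M$ onto the basis vectors 1 to 4. The entangled state used here is a randomly generated, hypothetical one, and all names are illustrative.

```python
import random

DIM = 8
MEMBER = range(4, 8)      # basis vectors 5..8 (0-based 4..7): range of M
NON_MEMBER = range(0, 4)  # basis vectors 1..4 (0-based 0..3): range of 1 - M


def quadrant_weight(c, rows, cols):
    """<C| P (x) Q |C> for coordinate projectors P, Q: sum of |c_ij|^2 over the block."""
    return sum(abs(c[i][j]) ** 2 for i in rows for j in cols)


# Hypothetical entangled state |C>: random complex coefficients, then normalized.
rng = random.Random(0)
c = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(DIM)]
     for _ in range(DIM)]
norm = sum(abs(x) ** 2 for row in c for x in row) ** 0.5
c = [[x / norm for x in row] for row in c]

weights = {
    "A and B": quadrant_weight(c, MEMBER, MEMBER),            # M (x) M
    "A and B'": quadrant_weight(c, MEMBER, NON_MEMBER),       # M (x) (1 - M)
    "A' and B": quadrant_weight(c, NON_MEMBER, MEMBER),       # (1 - M) (x) M
    "A' and B'": quadrant_weight(c, NON_MEMBER, NON_MEMBER),  # (1 - M) (x) (1 - M)
}
total = sum(weights.values())
marginal_a = quadrant_weight(c, MEMBER, range(DIM))  # M (x) 1
```

Because the four projectors are mutually orthogonal and sum to the identity on second sector, the four weights add up to one for any normalized $|C\rangle$, and the marginal for A equals the sum of the weights of "A and B" and "A and B′." This is the sense in which negation behaves logically in second sector.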

The above conceptual analysis makes it possible for us to write the complete Fock space formulas for the other combinations. More specifically, if we represent the concept "A and B′" by the unit vector

$$|\psi(A,B')\rangle = m\_{AB'}e^{i\theta}|C\rangle + \frac{n\_{AB'}e^{i\rho}}{\sqrt{2}}(|A\rangle + |B'\rangle) \tag{87}$$

with $m\_{AB'}^2 + n\_{AB'}^2 = 1$, then, by using Equations (84) and (87), we get

$$\begin{aligned} \mu(A \text{ and } B') &= \langle \psi(A, B') | (M \otimes (\mathbb{1} - M)) \oplus M | \psi(A, B') \rangle \\ &= m\_{AB'}^2 \langle C | M \otimes (\mathbb{1} - M) | C \rangle + \frac{n\_{AB'}^2}{2} (\langle A | + \langle B' |) M (| A \rangle + | B' \rangle) \\ &= m\_{AB'}^2 \sum\_{i=5}^8 \sum\_{j=1}^4 c\_{ij}^2 + \frac{n\_{AB'}^2}{2} (\langle A | M | A \rangle + \langle B' | M | B' \rangle + \langle A | M | B' \rangle + \langle B' | M | A \rangle) \\ &= m\_{AB'}^2 \sum\_{i=5}^8 \sum\_{j=1}^4 c\_{ij}^2 + n\_{AB'}^2 \left( \frac{1}{2} (\mu(A) + \mu(B')) + \mathfrak{R} \langle A | M | B' \rangle \right) \\ &= m\_{AB'}^2 \sum\_{i=5}^8 \sum\_{j=1}^4 c\_{ij}^2 + n\_{AB'}^2 \left( \frac{1}{2} (\mu(A) + \mu(B')) + (a\_5b\_5' + a\_6b\_6' + a\_7b\_7' + a\_8b\_8')\cos(\phi\_{B'} - \phi\_A) \right) \end{aligned} \tag{88}$$

where we have used Equation (71) in the last line of Equation (88). Analogously, if we represent the concept "A′ and B" by the unit vector

$$|\psi(A',B)\rangle = m\_{A'B}e^{i\theta}|C\rangle + \frac{n\_{A'B}e^{i\rho}}{\sqrt{2}}(|A'\rangle + |B\rangle) \tag{89}$$

with $m\_{A'B}^2 + n\_{A'B}^2 = 1$, then, by using Equations (84) and (89), we get

$$\begin{aligned} \mu(A' \text{ and } B) &= \langle \psi(A', B) | ((\mathbb{1} - M) \otimes M) \oplus M | \psi(A', B) \rangle \\ &= m\_{A'B}^2 \langle C | (\mathbb{1} - M) \otimes M | C \rangle + \frac{n\_{A'B}^2}{2} (\langle A' | + \langle B |) M (| A' \rangle + | B \rangle) \\ &= m\_{A'B}^2 \sum\_{i=1}^4 \sum\_{j=5}^8 c\_{ij}^2 + \frac{n\_{A'B}^2}{2} (\langle A' | M | A' \rangle + \langle B | M | B \rangle + \langle A' | M | B \rangle + \langle B | M | A' \rangle) \\ &= m\_{A'B}^2 \sum\_{i=1}^4 \sum\_{j=5}^8 c\_{ij}^2 + n\_{A'B}^2 \left( \frac{1}{2} (\mu(A') + \mu(B)) + \mathfrak{R} \langle A' | M | B \rangle \right) \\ &= m\_{A'B}^2 \sum\_{i=1}^4 \sum\_{j=5}^8 c\_{ij}^2 + n\_{A'B}^2 \left( \frac{1}{2} (\mu(A') + \mu(B)) + (a\_5' b\_5 + a\_6' b\_6 + a\_7' b\_7 + a\_8' b\_8) \cos(\phi\_B - \phi\_{A'}) \right) \end{aligned} \tag{90}$$

where we have used Equation (72) in the last line of Equation (90). Finally, if we represent the concept "A′ and B′" by the unit vector

$$|\psi(A',B')\rangle = m\_{A'B'}e^{i\theta}|C\rangle + \frac{n\_{A'B'}e^{i\rho}}{\sqrt{2}}(|A'\rangle + |B'\rangle) \tag{91}$$

with $m\_{A'B'}^2 + n\_{A'B'}^2 = 1$, then, by using Equations (84) and (91), we get

$$\begin{aligned} \mu(A' \text{ and } B') &= \langle \psi(A', B') | ((\mathbb{1} - M) \otimes (\mathbb{1} - M)) \oplus M | \psi(A', B') \rangle \\ &= m\_{A'B'}^2 \langle C | (\mathbb{1} - M) \otimes (\mathbb{1} - M) | C \rangle + \frac{n\_{A'B'}^2}{2} (\langle A' | + \langle B' |) M (|A' \rangle + |B' \rangle) \\ &= m\_{A'B'}^2 \sum\_{i,j=1}^4 c\_{ij}^2 + \frac{n\_{A'B'}^2}{2} (\langle A' | M | A' \rangle + \langle B' | M | B' \rangle + \langle A' | M | B' \rangle + \langle B' | M | A' \rangle) \\ &= m\_{A'B'}^2 \sum\_{i,j=1}^4 c\_{ij}^2 + n\_{A'B'}^2 \left( \frac{1}{2} (\mu(A') + \mu(B')) + \mathfrak{R} \langle A' | M | B' \rangle \right) \\ &= m\_{A'B'}^2 \sum\_{i,j=1}^4 c\_{ij}^2 + n\_{A'B'}^2 \left( \frac{1}{2} (\mu(A') + \mu(B')) + (a\_5' b\_5' + a\_6' b\_6' + a\_7' b\_7' + a\_8' b\_8') \cos(\phi\_{B'} - \phi\_{A'}) \right) \end{aligned} \tag{92}$$

where we have used Equation (73) in the last line of Equation (92).

Equations (86), (88), (90), and (92) contain the probabilistic expressions for simultaneously representing experimental data on conjunctions and negations of two concepts in a quantum-theoretic framework. These equations express the membership weights of the conjunctions "A and B," "A and B′," "A′ and B," and "A′ and B′" in terms of the membership weights of A, B, A′, and B′, for suitable values of the modeling parameters, in particular the interference angles $\phi\_Y - \phi\_X$, the convex weights $m\_{XY}$ and $n\_{XY}$, and the coefficients $c\_{ij}$ of the entangled state $|C\rangle$, with X = A, A′ and Y = B, B′.<sup>4</sup>


As we can see, our two-sector Fock space framework is able to cope with conceptual negation in a very natural way. In fact, negation is modeled by using the general assumption that emergent aspects of a concept are represented in first sector of Fock space, while logical aspects of a concept are represented in second sector. This will be made explicit in Section 5. It is however important to stress that, for a given experiment $e\_{XY}$, with X = A, A′, Y = B, B′, described in Section 2, there is no guarantee that sets of these parameters can be found such that Equations (86), (88), (90), and (92) are simultaneously satisfied. For this reason, we provide in Appendix A5 the conditions that should be satisfied by the experimental data µ(A), µ(B), ..., µ(A′ and B), µ(A′ and B′) such that these sets exist.

The conclusion we draw from the analysis above is that finding solutions for a given set of experimental data in our quantum-theoretic modeling is highly non-obvious, which makes the results in the next section even more significant.

# 4. Representation of experimental data in Fock space

Most of the data in **Tables A1**–**A4** are compatible with the intervals in Equations (A26), (A29), (A32), and (A35). Hence, almost all our data can be successfully modeled by using the quantum probabilistic expressions in Equations (86), (88), (90), and (92). Let us consider some interesting cases, distinguishing them by: (i) situations with double overextension, (ii) situations with complete overextension, (iii) situations requiring both sectors of Fock space and/or entanglement, (iv) partially classical situations. Complete modeling is presented in the Supplementary Material attached to this article.

(i) Let us start with exemplars that are double overextended.

Olive, with respect to (Fruits, Vegetables) (double overextension with respect to Fruits And Vegetables). Olive scored µ(A) = 0.53 with respect to Fruits, µ(B) = 0.63 with respect to Vegetables, µ(A′) = 0.47 with respect to Not Fruits, µ(B′) = 0.44 with respect to Not Vegetables, µ(A and B) = 0.65 with respect to Fruits And Vegetables, µ(A and B′) = 0.34 with respect to Fruits And Not Vegetables, µ(A′ and B) = 0.51 with respect to Not Fruits And Vegetables, and µ(A′ and B′) = 0.36 with respect to Not Fruits And Not Vegetables. If one first looks for a representation of Olive in the Hilbert space $\mathbb{C}^8$, then the concepts Fruits and Vegetables are represented by the unit vectors $|A\rangle = e^{i\phi\_A}(-0.02, -0.47, 0.5, -0.02, -0.07, -0.31, -0.18, -0.63)$ and $|B\rangle = e^{i\phi\_B}(0.04, 0.02, -0.6, 0.03, -0.26, 0.35, -0.39, -0.53)$, respectively, and their negations Not Fruits and Not Vegetables by the unit vectors $|A'\rangle = e^{i\phi\_{A'}}(0.06, -0.47, -0.55, 0.03, -0.02, -0.64, -0.06, 0.25)$ and $|B'\rangle = e^{i\phi\_{B'}}(-0.03, 0.75, -0.01, -0.01, -0.08, -0.6, -0.18, -0.19)$, respectively.

<sup>4</sup>We recall that $n\_{XY}^2 = 1 - m\_{XY}^2$, X = A, A′, Y = B, B′. In addition, only the sums $\sum\_{i,j=5}^8 c\_{ij}^2$, $\sum\_{i=5}^8 \sum\_{j=1}^4 c\_{ij}^2$, $\sum\_{i=1}^4 \sum\_{j=5}^8 c\_{ij}^2$, and $\sum\_{i,j=1}^4 c\_{ij}^2$ appear in Equations (86), (88), (90), and (92), respectively.

The interference angles $\phi\_{AB} = \phi\_B - \phi\_A = 57.31^\circ$, $\phi\_{AB'} = \phi\_{B'} - \phi\_A = 95.32^\circ$, $\phi\_{A'B} = \phi\_B - \phi\_{A'} = 103.43^\circ$, and $\phi\_{A'B'} = \phi\_{B'} - \phi\_{A'} = 85.56^\circ$ complete the Hilbert space representation in $\mathbb{C}^8$. A complete modeling in the Fock space $\mathbb{C}^8 \oplus (\mathbb{C}^8 \otimes \mathbb{C}^8)$ satisfying Equations (86), (88), (90), and (92) is given by an entangled state characterized by $\sum\_{i,j=5}^8 c\_{ij}^2 = 0.44^2$, $\sum\_{i=5}^8 \sum\_{j=1}^4 c\_{ij}^2 = 0.58^2$, $\sum\_{i=1}^4 \sum\_{j=5}^8 c\_{ij}^2 = 0.66^2$, and $\sum\_{i,j=1}^4 c\_{ij}^2 = 0.18^2$, and convex weights $m\_{AB} = 0.42$, $n\_{AB} = 0.91$, $m\_{AB'} = 1$, $n\_{AB'} = 0$, $m\_{A'B} = 0.78$, $n\_{A'B} = 0.63$, $m\_{A'B'} = 0.52$, and $n\_{A'B'} = 0.86$.
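As a consistency check, the Olive parameters quoted above can be inserted into the last lines of Equations (86), (88), (90), and (92). The sketch below is only illustrative: the helper names are hypothetical, the membership weights are recomputed from components 5 to 8 of the quoted vectors, and the weight $m\_{AB'}$ is taken as 1, the value forced by $n\_{AB'} = 0$ together with the normalization $m\_{AB'}^2 + n\_{AB'}^2 = 1$.

```python
import math

# Olive vectors as quoted in the text (the phases enter only through the angles).
A = [-0.02, -0.47, 0.5, -0.02, -0.07, -0.31, -0.18, -0.63]
B = [0.04, 0.02, -0.6, 0.03, -0.26, 0.35, -0.39, -0.53]
A_NOT = [0.06, -0.47, -0.55, 0.03, -0.02, -0.64, -0.06, 0.25]
B_NOT = [-0.03, 0.75, -0.01, -0.01, -0.08, -0.6, -0.18, -0.19]


def membership(v):
    """<V|M|V>: squared components 5..8 (0-based 4..7)."""
    return sum(x * x for x in v[4:])


def overlap(u, v):
    """u_5*v_5 + u_6*v_6 + u_7*v_7 + u_8*v_8."""
    return sum(x * y for x, y in zip(u[4:], v[4:]))


def conjunction(x, y, m, n, c_sum_sq, angle_deg):
    """Last line of Equations (86)-(92), using the reported m and n directly."""
    interference = overlap(x, y) * math.cos(math.radians(angle_deg))
    first_sector = 0.5 * (membership(x) + membership(y)) + interference
    return m * m * c_sum_sq + n * n * first_sector


model = {
    "A and B": conjunction(A, B, 0.42, 0.91, 0.44 ** 2, 57.31),
    "A and B'": conjunction(A, B_NOT, 1.0, 0.0, 0.58 ** 2, 95.32),
    "A' and B": conjunction(A_NOT, B, 0.78, 0.63, 0.66 ** 2, 103.43),
    "A' and B'": conjunction(A_NOT, B_NOT, 0.52, 0.86, 0.18 ** 2, 85.56),
}
data = {"A and B": 0.65, "A and B'": 0.34, "A' and B": 0.51, "A' and B'": 0.36}
largest_gap = max(abs(model[k] - data[k]) for k in data)
```

Since the quoted parameters are rounded to two decimals, the modeled values can only be expected to match the measured weights up to rounding; `largest_gap` reports the worst disagreement over the four conjunctions.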

Prize Bull, with respect to (Pets, Farmyard Animals) (double overextension with respect to Pets And Not Farmyard Animals). Prize Bull scored µ(A) = 0.13 with respect to Pets, µ(B) = 0.76 with respect to Farmyard Animals, µ(A′) = 0.88 with respect to Not Pets, µ(B′) = 0.26 with respect to Not Farmyard Animals, µ(A and B) = 0.43 with respect to Pets And Farmyard Animals, µ(A and B′) = 0.28 with respect to Pets And Not Farmyard Animals, µ(A′ and B) = 0.83 with respect to Not Pets And Farmyard Animals, and µ(A′ and B′) = 0.34 with respect to Not Pets And Not Farmyard Animals. If one first looks for a representation of Prize Bull in the Hilbert space $\mathbb{C}^8$, then the concepts Pets and Farmyard Animals, and their negations Not Pets and Not Farmyard Animals are respectively represented by the unit vectors $|A\rangle = e^{i\phi\_A}(0.07, -0.39, -0.84, 0.03, -0.06, -0.35, 0.04, -0.01)$ and $|B\rangle = e^{i\phi\_B}(0.03, 0.21, -0.44, 0.01, 0.01, 0.81, -0.2, -0.25)$, and $|A'\rangle = e^{i\phi\_{A'}}(0.01, 0.29, -0.19, 0, 0.11, 0.06, -0.2, 0.91)$ and $|B'\rangle = e^{i\phi\_{B'}}(0.01, 0.84, -0.19, 0, -0.17, -0.41, -0.01, -0.26)$.

The interference angles $\phi\_{AB} = \phi\_B - \phi\_A = 105.71^\circ$, $\phi\_{AB'} = \phi\_{B'} - \phi\_A = 40.23^\circ$, $\phi\_{A'B} = \phi\_B - \phi\_{A'} = 111.25^\circ$, and $\phi\_{A'B'} = \phi\_{B'} - \phi\_{A'} = 52.51^\circ$ complete the Hilbert space representation in $\mathbb{C}^8$. A complete modeling in the Fock space $\mathbb{C}^8 \oplus (\mathbb{C}^8 \otimes \mathbb{C}^8)$ satisfying Equations (86), (88), (90), and (92) is given by an entangled state characterized by $\sum\_{i,j=5}^8 c\_{ij}^2 = 0.24^2$, $\sum\_{i=5}^8 \sum\_{j=1}^4 c\_{ij}^2 = 0.27^2$, $\sum\_{i=1}^4 \sum\_{j=5}^8 c\_{ij}^2 = 0.84^2$, and $\sum\_{i,j=1}^4 c\_{ij}^2 = 0.41^2$, and convex weights $m\_{AB} = 0.46$, $n\_{AB} = 0.89$, $m\_{AB'} = 0.41$, $n\_{AB'} = 0.91$, $m\_{A'B} = 0.54$, $n\_{A'B} = 0.84$, $m\_{A'B'} = 0.52$, and $n\_{A'B'} = 0.85$.

Door Bell, with respect to (Home Furnishing, Furniture) (double overextension with respect to Not Home Furnishing And Furniture). Door Bell scored µ(A) = 0.75 with respect to Home Furnishing, µ(B) = 0.33 with respect to Furniture, µ(A′) = 0.32 with respect to Not Home Furnishing, µ(B′) = 0.79 with respect to Not Furniture, µ(A and B) = 0.5 with respect to Home Furnishing And Furniture, µ(A and B′) = 0.64 with respect to Home Furnishing And Not Furniture, µ(A′ and B) = 0.34 with respect to Not Home Furnishing And Furniture, and µ(A′ and B′) = 0.51 with respect to Not Home Furnishing And Not Furniture. If one first looks for a representation of Door Bell in the Hilbert space $\mathbb{C}^8$, then the concepts Home Furnishing and Furniture, and their negations Not Home Furnishing and Not Furniture are respectively represented by the unit vectors $|A\rangle = e^{i\phi\_A}(0, 0.33, 0.37, -0.05, 0.04, -0.29, 0, 0.81)$ and $|B\rangle = e^{i\phi\_B}(-0.14, 0.77, 0.17, -0.16, 0.24, -0.19, 0.07, -0.48)$, and $|A'\rangle = e^{i\phi\_{A'}}(0.21, -0.43, 0.66, 0.13, 0.22, -0.39, 0.22, -0.27)$ and $|B'\rangle = e^{i\phi\_{B'}}(-0.08, -0.03, -0.45, -0.02, -0.17, -0.52, 0.7, 0.04)$.

The interference angles $\phi\_{AB} = \phi\_B - \phi\_A = 102.81^\circ$, $\phi\_{AB'} = \phi\_{B'} - \phi\_A = 117.67^\circ$, $\phi\_{A'B} = \phi\_B - \phi\_{A'} = 67.37^\circ$, and $\phi\_{A'B'} = \phi\_{B'} - \phi\_{A'} = 77.65^\circ$ complete the Hilbert space representation in $\mathbb{C}^8$. A complete modeling in the Fock space $\mathbb{C}^8 \oplus (\mathbb{C}^8 \otimes \mathbb{C}^8)$ satisfying Equations (86), (88), (90), and (92) is given by an entangled state characterized by $\sum\_{i,j=5}^8 c\_{ij}^2 = 0.35^2$, $\sum\_{i=5}^8 \sum\_{j=1}^4 c\_{ij}^2 = 0.79^2$, $\sum\_{i=1}^4 \sum\_{j=5}^8 c\_{ij}^2 = 0.46^2$, and $\sum\_{i,j=1}^4 c\_{ij}^2 = 0.2^2$, and convex weights $m\_{AB} = 0.48$, $n\_{AB} = 0.88$, $m\_{AB'} = 0.91$, $n\_{AB'} = 0.41$, $m\_{A'B} = 0.65$, $n\_{A'B} = 0.76$, $m\_{A'B'} = 0.43$, and $n\_{A'B'} = 0.9$.

(ii) Let us now come to the exemplars that present complete overextension, that is, exemplars that are overextended in all experiments.

Goldfish, with respect to (Pets, Farmyard Animals) (big overextension in all experiments, but also double overextension with respect to Not Pets And Farmyard Animals). Goldfish scored µ(A) = 0.93 with respect to Pets, µ(B) = 0.17 with respect to Farmyard Animals, µ(A′) = 0.12 with respect to Not Pets, µ(B′) = 0.81 with respect to Not Farmyard Animals, µ(A and B) = 0.43 with respect to Pets And Farmyard Animals, µ(A and B′) = 0.91 with respect to Pets And Not Farmyard Animals, µ(A′ and B) = 0.18 with respect to Not Pets And Farmyard Animals, and µ(A′ and B′) = 0.43 with respect to Not Pets And Not Farmyard Animals. If one first looks for a representation of Goldfish in the Hilbert space $\mathbb{C}^8$, then the concepts Pets and Farmyard Animals, and their negations Not Pets and Not Farmyard Animals are respectively represented by the unit vectors $|A\rangle = e^{i\phi\_A}(-0.05, 0.16, -0.21, -0.01, -0.71, 0.22, 0.33, 0.51)$ and $|B\rangle = e^{i\phi\_B}(-0.24, 0.26, -0.84, -0.07, 0.38, -0.11, -0.01, 0.12)$, and $|A'\rangle = e^{i\phi\_{A'}}(0.18, 0.85, 0.35, 0.09, 0.2, -0.12, -0.03, 0.25)$ and $|B'\rangle = e^{i\phi\_{B'}}(0.01, -0.41, 0.14, -0.01, 0.27, -0.32, -0.13, 0.79)$.

The interference angles $\phi\_{AB} = \phi\_B - \phi\_A = 78.9^\circ$, $\phi\_{AB'} = \phi\_{B'} - \phi\_A = 43.15^\circ$, $\phi\_{A'B} = \phi\_B - \phi\_{A'} = 54.74^\circ$, and $\phi\_{A'B'} = \phi\_{B'} - \phi\_{A'} = 77.94^\circ$ complete the Hilbert space representation in $\mathbb{C}^8$. A complete modeling in the Fock space $\mathbb{C}^8 \oplus (\mathbb{C}^8 \otimes \mathbb{C}^8)$ satisfying Equations (86), (88), (90), and (92) is given by an entangled state characterized by $\sum\_{i,j=5}^8 c\_{ij}^2 = 0.35^2$, $\sum\_{i=5}^8 \sum\_{j=1}^4 c\_{ij}^2 = 0.9^2$, $\sum\_{i=1}^4 \sum\_{j=5}^8 c\_{ij}^2 = 0.22^2$, and $\sum\_{i,j=1}^4 c\_{ij}^2 = 0.17^2$, and convex weights $m\_{AB} = 0.45$, $n\_{AB} = 0.89$, $m\_{AB'} = 0.45$, $n\_{AB'} = 0.9$, $m\_{A'B} = 0.48$, $n\_{A'B} = 0.88$, $m\_{A'B'} = 0.45$, and $n\_{A'B'} = 0.89$.

Parsley, with respect to (Spices, Herbs) (overextension in all experiments). Parsley scored µ(A) = 0.54 with respect to Spices, µ(B) = 0.9 with respect to Herbs, µ(A′) = 0.54 with respect to Not Spices, µ(B′) = 0.09 with respect to Not Herbs, µ(A and B) = 0.68 with respect to Spices And Herbs, µ(A and B′) = 0.26 with respect to Spices And Not Herbs, µ(A′ and B) = 0.73 with respect to Not Spices And Herbs, and µ(A′ and B′) = 0.18 with respect to Not Spices And Not Herbs. If one first looks for a representation of Parsley in the Hilbert space $\mathbb{C}^8$, then the concepts Spices and Herbs, and their negations Not Spices and Not Herbs are respectively represented by the unit vectors $|A\rangle = e^{i\phi\_A}(0, 0.25, -0.63, -0.02, -0.02, 0.5, -0.06, 0.54)$ and $|B\rangle = e^{i\phi\_B}(0, 0.02, -0.32, -0.01, 0.09, -0.84, -0.23, 0.37)$, and $|A'\rangle = e^{i\phi\_{A'}}(0, 0.17, 0.66, 0.02, -0.17, 0.01, 0.14, 0.7)$ and $|B'\rangle = e^{i\phi\_{B'}}(0, -0.95, -0.06, -0.01, -0.04, 0.11, 0.02, 0.27)$.

The interference angles $\phi\_{AB} = \phi\_B - \phi\_A = 97.66^\circ$, $\phi\_{AB'} = \phi\_{B'} - \phi\_A = 84.49^\circ$, $\phi\_{A'B} = \phi\_B - \phi\_{A'} = 68.25^\circ$, and $\phi\_{A'B'} = \phi\_{B'} - \phi\_{A'} = 113.49^\circ$ complete the Hilbert space representation in $\mathbb{C}^8$. A complete modeling in the Fock space $\mathbb{C}^8 \oplus (\mathbb{C}^8 \otimes \mathbb{C}^8)$ satisfying Equations (86), (88), (90), and (92) is given by an entangled state characterized by $\sum\_{i,j=5}^8 c\_{ij}^2 = 0.66^2$, $\sum\_{i=5}^8 \sum\_{j=1}^4 c\_{ij}^2 = 0.32^2$, $\sum\_{i=1}^4 \sum\_{j=5}^8 c\_{ij}^2 = 0.68^2$, and $\sum\_{i,j=1}^4 c\_{ij}^2 = 0$, and convex weights $m\_{AB} = 0.48$, $n\_{AB} = 0.88$, $m\_{AB'} = 0.55$, $n\_{AB'} = 0.84$, $m\_{A'B} = 0.46$, $n\_{A'B} = 0.89$, $m\_{A'B'} = 0.5$, and $n\_{A'B'} = 0.87$.

(iii) Let us then illustrate some relevant exemplars that either cannot be modeled in a pure Hilbert space framework, or cannot be represented by product states in second sector of Fock space.

Raisin, with respect to (Fruits, Vegetables). Raisin scored µ(A) = 0.88 with respect to Fruits, µ(B) = 0.27 with respect to Vegetables, µ(A′) = 0.13 with respect to Not Fruits, µ(B′) = 0.76 with respect to Not Vegetables, µ(A and B) = 0.53 with respect to Fruits And Vegetables, µ(A and B′) = 0.75 with respect to Fruits And Not Vegetables, µ(A′ and B) = 0.25 with respect to Not Fruits And Vegetables, and µ(A′ and B′) = 0.34 with respect to Not Fruits And Not Vegetables. If one first looks for a representation of Raisin in the Hilbert space $\mathbb{C}^8$, then the concepts Fruits and Vegetables, and their negations Not Fruits and Not Vegetables are respectively represented by the unit vectors $|A\rangle = e^{i\phi\_A}(0.05, -0.01, 0.34, 0.01, -0.1, -0.51, 0.23, -0.75)$ and $|B\rangle = e^{i\phi\_B}(-0.41, -0.15, -0.73, -0.1, -0.38, -0.17, -0.19, -0.25)$, and $|A'\rangle = e^{i\phi\_{A'}}(0.56, -0.73, -0.09, 0.1, -0.08, -0.28, -0.15, 0.16)$ and $|B'\rangle = e^{i\phi\_{B'}}(0.07, 0.46, -0.14, 0.04, 0.13, -0.76, -0.11, 0.4)$.

However, a complete representation satisfying Equations (86), (88), (90), and (92) can only be worked out in the Fock space $\mathbb{C}^8 \oplus (\mathbb{C}^8 \otimes \mathbb{C}^8)$. This occurs for interference angles $\phi\_{AB} = \phi\_B - \phi\_A = 80.79^\circ$, $\phi\_{AB'} = \phi\_{B'} - \phi\_A = 160^\circ$, $\phi\_{A'B} = \phi\_B - \phi\_{A'} = 18.15^\circ$, and $\phi\_{A'B'} = \phi\_{B'} - \phi\_{A'} = 92.88^\circ$, and for an entangled state characterized by $\sum\_{i,j=5}^8 c\_{ij}^2 = 0.41^2$, $\sum\_{i=5}^8 \sum\_{j=1}^4 c\_{ij}^2 = 0.85^2$, $\sum\_{i=1}^4 \sum\_{j=5}^8 c\_{ij}^2 = 0.32^2$, and $\sum\_{i,j=1}^4 c\_{ij}^2 = 0.13^2$, and convex weights $m\_{AB} = 0.45$, $n\_{AB} = 0.89$, $m\_{AB'} = 0.65$, $n\_{AB'} = 0.76$, $m\_{A'B} = 0.26$, $n\_{A'B} = 0.97$, $m\_{A'B'} = 0.48$, and $n\_{A'B'} = 0.88$.
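One way to see that a product state would not do for Raisin is to test whether the four second-sector block weights quoted above factorize. For a product vector with $c\_{ij} = a\_i b\_j$, each block sum is the product of the corresponding marginals, so a non-zero gap between the joint weight and the product of marginals signals entanglement. The following Python sketch performs this check; the variable names are illustrative.

```python
# Second-sector block weights quoted for Raisin (squares of the reported values).
p_a_b = 0.41 ** 2        # sum over i, j = 5..8 of c_ij^2, "A and B"
p_a_notb = 0.85 ** 2     # i = 5..8, j = 1..4, "A and B'"
p_nota_b = 0.32 ** 2     # i = 1..4, j = 5..8, "A' and B"
p_nota_notb = 0.13 ** 2  # i, j = 1..4, "A' and B'"

total = p_a_b + p_a_notb + p_nota_b + p_nota_notb

# Second-sector marginals for A and B.
p_a = p_a_b + p_a_notb
p_b = p_a_b + p_nota_b

# A product state would give p_a_b == p_a * p_b.
factorization_gap = p_a_b - p_a * p_b
```

The block weights sum to one up to rounding, and the marginals stay close to the measured µ(A) = 0.88 and µ(B) = 0.27, while the joint weight for "A and B" differs clearly from the product of the marginals.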

Fox, with respect to (Pets, Farmyard Animals). Fox scored µ(A) = 0.13 with respect to Pets, µ(B) = 0.3 with respect to Farmyard Animals, µ(A′) = 0.86 with respect to Not Pets, µ(B′) = 0.68 with respect to Not Farmyard Animals, µ(A and B) = 0.18 with respect to Pets And Farmyard Animals, µ(A and B′) = 0.29 with respect to Pets And Not Farmyard Animals, µ(A′ and B) = 0.46 with respect to Not Pets And Farmyard Animals, and µ(A′ and B′) = 0.59 with respect to Not Pets And Not Farmyard Animals. If one first looks for a representation of Fox in the Hilbert space $\mathbb{C}^8$, then the concepts Pets and Farmyard Animals, and their negations Not Pets and Not Farmyard Animals are respectively represented by the unit vectors $|A\rangle = e^{i\phi\_A}(-0.07, -0.84, -0.39, -0.03, -0.02, -0.31, 0.02, 0.19)$ and $|B\rangle = e^{i\phi\_B}(-0.01, 0.17, -0.82, 0.01, -0.01, 0.28, -0.01, -0.47)$, and $|A'\rangle = e^{i\phi\_{A'}}(-0.05, 0.19, -0.31, -0.02, 0.12, 0.39, -0.02, 0.83)$ and $|B'\rangle = e^{i\phi\_{B'}}(-0.14, 0.47, -0.26, -0.08, -0.08, -0.8, 0.04, 0.17)$.

However, a complete representation satisfying Equations (86), (88), (90), and (92) can only be worked out in the Fock space $\mathbb{C}^8 \oplus (\mathbb{C}^8 \otimes \mathbb{C}^8)$. This occurs for interference angles $\phi\_{AB} = \phi\_B - \phi\_A = 96.58^\circ$, $\phi\_{AB'} = \phi\_{B'} - \phi\_A = 95.05^\circ$, $\phi\_{A'B} = \phi\_B - \phi\_{A'} = 85.68^\circ$, and $\phi\_{A'B'} = \phi\_{B'} - \phi\_{A'} = -20^\circ$, and for an entangled state characterized by $\sum\_{i,j=5}^8 c\_{ij}^2 = 0.05^2$, $\sum\_{i=5}^8 \sum\_{j=1}^4 c\_{ij}^2 = 0.36^2$, $\sum\_{i=1}^4 \sum\_{j=5}^8 c\_{ij}^2 = 0.55^2$, and $\sum\_{i,j=1}^4 c\_{ij}^2 = 0.76^2$, and convex weights $m\_{AB} = 0.51$, $n\_{AB} = 0.86$, $m\_{AB'} = 0.61$, $n\_{AB'} = 0.79$, $m\_{A'B} = 0.61$, $n\_{A'B} = 0.79$, $m\_{A'B'} = 0.66$, and $n\_{A'B'} = 0.75$.

(iv) Let us finally describe the quantum-theoretic representation of an exemplar that does not present overextension in any conjunction, but still does not admit a representation in a classical Kolmogorovian probability framework.

Window Seat, with respect to (Home Furnishing, Furniture). Window Seat scored µ(A) = 0.5 with respect to Home Furnishing, µ(B) = 0.48 with respect to Furniture, µ(A′) = 0.47 with respect to Not Home Furnishing, µ(B′) = 0.55 with respect to Not Furniture, µ(A and B) = 0.45 with respect to Home Furnishing And Furniture, µ(A and B′) = 0.49 with respect to Home Furnishing And Not Furniture, µ(A′ and B) = 0.39 with respect to Not Home Furnishing And Furniture, and µ(A′ and B′) = 0.41 with respect to Not Home Furnishing And Not Furniture. If one first looks for a representation of Window Seat in the Hilbert space $\mathbb{C}^8$, then the concepts Home Furnishing and Furniture, and their negations Not Home Furnishing and Not Furniture are respectively represented by the unit vectors $|A\rangle = e^{i\phi\_A}(-0.01, 0.69, 0.14, -0.01, -0.13, -0.66, -0.2, 0.11)$ and $|B\rangle = e^{i\phi\_B}(-0.08, -0.39, -0.6, 0, -0.03, -0.4, -0.17, 0.54)$, and $|A'\rangle = e^{i\phi\_{A'}}(0.13, -0.19, 0.69, 0.01, 0.09, 0.05, -0.05, 0.67)$ and $|B'\rangle = e^{i\phi\_{B'}}(-0.09, 0.57, -0.34, -0.02, 0.17, 0.54, 0.11, 0.47)$.

The interference angles $\phi\_{AB} = \phi\_B - \phi\_A = 76.57^\circ$, $\phi\_{AB'} = \phi\_{B'} - \phi\_A = 103.86^\circ$, $\phi\_{A'B} = \phi\_B - \phi\_{A'} = 84.42^\circ$, and $\phi\_{A'B'} = \phi\_{B'} - \phi\_{A'} = 85.94^\circ$ complete the Hilbert space representation in $\mathbb{C}^8$. A complete modeling in the Fock space $\mathbb{C}^8 \oplus (\mathbb{C}^8 \otimes \mathbb{C}^8)$ satisfying Equations (86), (88), (90), and (92) is given by an entangled state characterized by $\sum\_{i,j=5}^8 c\_{ij}^2 = 0.31^2$, $\sum\_{i=5}^8 \sum\_{j=1}^4 c\_{ij}^2 = 0.64^2$, $\sum\_{i=1}^4 \sum\_{j=5}^8 c\_{ij}^2 = 0.62^2$, and $\sum\_{i,j=1}^4 c\_{ij}^2 = 0.34^2$, and convex weights $m\_{AB} = 0.51$, $n\_{AB} = 0.86$, $m\_{AB'} = 0.77$, $n\_{AB'} = 0.63$, $m\_{A'B} = 1$, $n\_{A'B} = 0$, $m\_{A'B'} = 0.54$, and $n\_{A'B'} = 0.84$.
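The Window Seat data illustrate why the absence of overextension does not by itself make a data set classical. The sketch below checks two things on the quoted weights: that no conjunction exceeds the smaller of its constituents, and that two elementary consequences of a Kolmogorovian model fail. It assumes that classical negation is modeled as set-theoretic complement, so that the four conjunctions partition the sample space; this is a simplification for illustration and not a restatement of the classicality conditions in Equations (26)–(30).

```python
# Window Seat membership weights as quoted in the text.
mu = {"A": 0.5, "B": 0.48, "A'": 0.47, "B'": 0.55}
conj = {
    ("A", "B"): 0.45,
    ("A", "B'"): 0.49,
    ("A'", "B"): 0.39,
    ("A'", "B'"): 0.41,
}

# Overextension: a conjunction scoring above at least one of its constituents.
overextended = {pair: w > min(mu[pair[0]], mu[pair[1]]) for pair, w in conj.items()}

# Classically the four conjunctions would partition the space, so their
# weights would sum to 1 ...
total = sum(conj.values())

# ... and mu(A) would equal mu(A and B) + mu(A and B').
marginal_a = conj[("A", "B")] + conj[("A", "B'")]
marginal_gap = marginal_a - mu["A"]
```

No entry of `overextended` is true, yet the four conjunction weights sum to well above one and the marginal for A is far from µ(A), so no single classical probability measure can reproduce these data under the stated assumption.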

The theoretic analysis on the representability of the data in **Tables A1**–**A4** is thus concluded. We stress that the majority of these data can be faithfully modeled by using the mathematical formalism of quantum theory in Fock space. We finally observe that a large part of the collected data can be modeled by using only the first sector of Fock space, and that almost all the weights of $n^2$-type in first sector prevail over the weights of $m^2$-type in second sector. The reasons for this will be clear after the discussion in Section 5.

# 5. Discussion

Our experimental data on conjunctions and negations of natural concepts confirm that classical probability does not generally work when people combine concepts, as we have seen in the previous sections. Moreover, we have proved here that the deviations from classicality cannot be reduced to overextension and underextension, but also include a very strong and fundamental pattern of violation. On the other hand, our quantum-theoretic framework in Fock space has received remarkable corroboration. We therefore think it worthwhile to summarize and stress the novelties that have emerged in this article with respect to our approach.

We have recently put forward an explanatory hypothesis with respect to the deviations from classical logical reasoning that have been observed in human cognition (Aerts et al., 2015a). According to this hypothesis, human reasoning is a specifically structured superposition of two processes, a "logical reasoning" and a "conceptual reasoning" (also called "emergent reasoning"). The former combines cognitive entities, such as concepts, combinations of concepts, or propositions, by applying the rules of logic, though generally in a probabilistic way. The latter enables the formation of combined cognitive entities as newly emerging entities (new concepts in the case of concepts, new propositions in the case of propositions) that carry new meaning, linked to the meaning of the constituent cognitive entities, but with a linkage not defined by the algebra of logic. The two mechanisms act simultaneously and in superposition in human thought during a reasoning process: the first is guided by an algebra of "logic," the second follows a mechanism of "emergence." In this perspective, human reasoning can be mathematically formalized in a two-sector Fock space. More specifically, first sector of Fock space models "conceptual emergence," while second sector of Fock space models a conceptual combination from the combining concepts by requiring the rules of logic for the logical connective used for the combining to be satisfied in a probabilistic setting. The relative prevalence of emergence or logic in a specific cognitive process is measured by the "degree of participation" of first and second sectors, respectively. The abundance of evidence of deviations from classical logical reasoning in concrete human decisions (paradoxes, fallacies, effects, contradictions), together with our results, led us to draw the conclusion that emergence constitutes the dominant dynamics of human reasoning, while logic is only a secondary form of dynamics.

Now, if one reflects on how we represented conceptual negation in Section 3, one realizes at once that its modeling directly and naturally follows from the general hypothesis stated above. Indeed, suppose that a person is asked to estimate whether a given exemplar x is a member of the concepts A, B′, and "A and B′" (a completely equivalent explanation can be given for the conjunctions "A′ and B" and "A′ and B′"). Then, our quantum mathematics can be interpreted by assuming that a "logical thought" acts, where the person considers two copies of x and estimates whether the first copy belongs to A and the second copy of x "does not" belong to B, thus applying logical rules, though in a probabilistic way. But also a "conceptual thought" acts, where the person estimates whether the exemplar x belongs to the newly emergent concept "A and B′." The place where these superposed processes can be suitably structured is the two-sector Fock space. First sector of Fock space hosts the latter process, second sector hosts the former, hence one expects that classical logical rules are valid in second sector, though they are generally violated whenever both sectors are considered. The weights $m\_{AB'}^2$ and $n\_{AB'}^2$ indicate whether the overall process is mainly guided by logic or emergence.

The second confirmation of our quantum-conceptual framework comes from the significantly stable deviations from classicality in Equations (26)–(30). We have seen in Section 2.4 that these deviations occur at a different, deeper, level than overextension and underextension. We think we have identified a general mechanism determining how concepts are formed in the human mind. This would already be convincing even without mentioning a Fock space modeling. But this very stable pattern can be explained exactly in our two-sector Fock space framework by assuming that emergence plays a primary role in the human reasoning process, but also that aspects of logic are systematically present. Indeed, suppose that, for every exemplar x and every X = A, A′, Y = B, B′, $n\_{XY}^2 = 1$ and $m\_{XY}^2 = 0$, that is, the decision process only occurs in first sector of Fock space. This assumption corresponds, from our quantum modeling perspective (see Equations 86, 88, 90, and 92), to a situation where only emergence is present. We have then argued in Aerts et al. (2015b) that, for every X = A, A′, Y = B, B′, $I\_X = I\_Y = -0.5$ and $I\_{ABA'B'} = -1$ in this case. Then, an immediate comparison with the experimental values of $I\_X$, $I\_Y$, and $I\_{ABA'B'}$ in Section 2.4 reveals that a component of second sector of Fock space is also present, which is generally smaller than the component of first sector but systematic across all exemplars. The consequence is immediate: both emergence and logic play a role in the decision process; emergence is dominant, but logic is also systematically present. We believe that this finding is really a fundamental one, and it deserves further investigation in the future.
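To see where the values −0.5 and −1 can come from, the following short derivation evaluates the first-sector expressions with $n\_{XY}^2 = 1$ and $m\_{XY}^2 = 0$. It is a sketch under assumptions that are not restated in this section: the deviation parameters are taken to have the form $I\_A = \mu(A) - \mu(A \text{ and } B) - \mu(A \text{ and } B')$ and $I\_{ABA'B'} = 1 - \sum\_{X,Y} \mu(X \text{ and } Y)$, the interference terms are neglected, and $\mu(A) + \mu(A') = \mu(B) + \mu(B') = 1$ is assumed.

$$\begin{aligned} \mu(X \text{ and } Y) &= \tfrac{1}{2}(\mu(X) + \mu(Y)), \\ I\_A &= \mu(A) - \tfrac{1}{2}(\mu(A) + \mu(B)) - \tfrac{1}{2}(\mu(A) + \mu(B')) = -\tfrac{1}{2}(\mu(B) + \mu(B')) = -\tfrac{1}{2}, \\ I\_{ABA'B'} &= 1 - \sum\_{X,Y} \tfrac{1}{2}(\mu(X) + \mu(Y)) = 1 - (\mu(A) + \mu(A') + \mu(B) + \mu(B')) = -1. \end{aligned}$$

The same steps give $-\tfrac{1}{2}$ for the parameters associated with A′, B, and B′. Under these assumptions, experimental values lying between 0 and these limits would indicate a mixture of the two sectors.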

The third strong confirmation of this two-layered structure of human thought and its representation in two-sector Fock space comes from the peculiarities of conceptual negation. Indeed, being pushed to cope with conceptual conjunction and negation simultaneously, we have gained an insight that we had not noticed before: the emergent non-classical properties of the conjunction Fruits And Vegetables are naturally accounted for in the first sector of Fock space, while the fact that Not Vegetables does not have a well-defined prototype, but is rather the negation of Vegetables, is accounted for in the second sector of Fock space, where logic occurs. In both cases, the Fock space model has naturally suggested to us the right directions to follow.

The fourth corroboration derives from the fact that our Fock space indicates how and why to introduce entanglement. In our previous attempts to model conceptual combinations, we had not recognized that, by representing the combined concept by a tensor product vector |A⟩ ⊗ |B⟩, we implicitly assumed that membership weight probabilities are factorized in the second sector, that is, the membership weights µ(A) and µ(B) correspond to independent events in this sector. In Section 3.2 we have shown that, if one introduces entangled states to represent combined concepts in the second sector, one is able to fully reproduce all classicality conditions, Equations (26–30), in this sector. Moreover, one can formalize the fact that, for certain exemplars, the probabilities associated with memberships of, say, Fruits and Vegetables are not independent. Therefore, Fock space has suggested how to capture this relevant aspect in depth.

The discussion above shows, in our opinion, that the merits of our two-sector Fock space framework go beyond faithful representation of one or more sets of experimental data. It captures some fundamental aspects of the mechanisms through which concepts are formed, combine and interact in human cognition.

# Supplementary Material

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fpsyg.2015.01447


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Aerts, Sozzo and Veloz. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Appendix

# A1. Fundamentals of Modeling in a Classical Framework

We introduce in this section the elementary measure-theoretic notions that are needed to express the classicality of experimental data coming from the membership weights of two concepts A and B with respect to the conceptual negation "not B" and the conjunctions "A and B," "A and not B," "not A and B," and "not A and not B." As we have anticipated in Section 2, by "classicality of a collection of experimental data" we actually mean the possibility of representing them in a "classical," or "Kolmogorovian," probability model. We avoid superfluous technicalities in our presentation, but aim to be concise and rigorous at the same time.

Let us start with the definition of a σ-algebra over a set.

First definition. A σ-algebra over a set Ω is a non-empty collection σ(Ω) of subsets of Ω that is closed under complementation and countable unions of its members. It is a Boolean algebra, completed to include countably infinite operations.

Measure structures are the most general classical structures devised by mathematicians and physicists to structure weights. A Kolmogorovian probability measure is such a measure applied to statistical data. It is called "Kolmogorovian," because Andrey Kolmogorov was the first to axiomatize probability theory in this way (Kolmogorov, 1933).

Second definition. A measure P is a function defined on a σ-algebra σ(Ω) over a set Ω and taking values in the extended interval [0, ∞] such that the empty set has measure zero and P is countably additive, that is, the measure of a countable union of pairwise disjoint sets E<sub>i</sub> ∈ σ(Ω) equals the sum of their measures.

A Kolmogorovian probability measure is a measure with total measure one. A Kolmogorovian probability space (Ω, σ(Ω), P) is a measure space (Ω, σ(Ω), P) such that P is a Kolmogorovian probability. The three conditions expressed in a mathematical way are:

$$P(\emptyset) = 0 \quad P\left(\bigcup_{i=1}^{\infty} E_i\right) = \sum_{i=1}^{\infty} P(E_i) \quad P(\Omega) = 1 \tag{A1}$$

Let us now come to the possibility to represent a set of experimental data on two concepts and their conjunction in a classical Kolmogorovian probability model.

Third definition. We say that the membership weights µ(A), µ(B) and µ(A and B) of the exemplar x with respect to the pair of concepts A and B and their conjunction "A and B," respectively, can be represented in a classical Kolmogorovian probability model if there exist a Kolmogorovian probability space (Ω, σ(Ω), P) and events E<sub>A</sub>, E<sub>B</sub> ∈ σ(Ω) of the events algebra σ(Ω) such that

$$P(E_A) = \mu(A) \quad P(E_B) = \mu(B) \quad \text{and} \quad P(E_A \cap E_B) = \mu(A \text{ and } B) \tag{A2}$$

Let us finally come to the representability of a set of experimental data on a concept and its negation in a classical Kolmogorovian probability model.

Fourth definition. We say that the membership weights µ(B) and µ(not B) of the exemplar x with respect to the concept B and its negation "not B," respectively, can be represented in a classical Kolmogorovian probability model if there exist a Kolmogorovian probability space (Ω, σ(Ω), P) and an event E<sub>B</sub> ∈ σ(Ω) of the events algebra σ(Ω) such that

$$P(E_B) = \mu(B) \quad P(\Omega \setminus E_B) = \mu(\text{not } B) \tag{A3}$$
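The third and fourth definitions can be turned into simple numerical tests. The sketch below relies on two standard consequences of (A1) that are not stated in this appendix and are used here as assumptions: events E<sub>A</sub>, E<sub>B</sub> satisfying (A2) exist exactly when max(0, µ(A) + µ(B) − 1) ≤ µ(A and B) ≤ min(µ(A), µ(B)), and an event satisfying (A3) exists exactly when µ(B) + µ(not B) = 1. Function names, the tolerance, and the numbers are illustrative.

```python
def conjunction_is_classical(mu_a, mu_b, mu_ab, tol=1e-9):
    """True if mu(A), mu(B), mu(A and B) admit a representation as in (A2)."""
    if not (0.0 <= mu_a <= 1.0 and 0.0 <= mu_b <= 1.0):
        return False
    lower = max(0.0, mu_a + mu_b - 1.0)   # the two events must overlap at least this much
    upper = min(mu_a, mu_b)               # and at most as much as the smaller event
    return lower - tol <= mu_ab <= upper + tol


def negation_is_classical(mu_b, mu_not_b, tol=1e-9):
    """True if mu(B), mu(not B) admit a representation as in (A3)."""
    return 0.0 <= mu_b <= 1.0 and abs(mu_b + mu_not_b - 1.0) <= tol


# Illustrative numbers (not experimental data).
ok = conjunction_is_classical(0.5, 0.6, 0.4)
overextended = conjunction_is_classical(0.5, 0.6, 0.7)
```

In this example the second call fails because the conjunction weight exceeds both component weights, which is the situation referred to as overextension in the main text.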

# A2. Quantum Mathematics for Conceptual Modeling

We illustrate in this section how the mathematical formalism of quantum theory can be applied to model situations outside the microscopic quantum world, more specifically, in the representation of concepts and their combinations. As in Appendix A1, we will limit technicalities to the essential.

When the quantum mechanical formalism is applied for modeling purposes, each considered entity (in our case a concept) is associated with a complex Hilbert space ℋ, that is, a vector space over the field ℂ of complex numbers, equipped with an inner product ⟨·|·⟩ that maps two vectors ⟨A| and |B⟩ onto a complex number ⟨A|B⟩. We denote vectors by using the bra-ket notation introduced by Paul Adrien Dirac, one of the pioneers of quantum theory (Dirac, 1958). Vectors can be "kets," denoted by |A⟩, |B⟩, or "bras," denoted by ⟨A|, ⟨B|. The inner product between the ket vectors |A⟩ and |B⟩, or the bra vectors ⟨A| and ⟨B|, is realized by juxtaposing the bra vector ⟨A| and the ket vector |B⟩. The result ⟨A|B⟩ is also called a "bra-ket," and it satisfies the following properties:

(i) ⟨A|A⟩ ≥ 0;

(ii) ⟨A|B⟩ = ⟨B|A⟩<sup>∗</sup>, where ⟨B|A⟩<sup>∗</sup> is the complex conjugate of ⟨B|A⟩;

(iii) ⟨A|(z|B⟩ + t|C⟩) = z⟨A|B⟩ + t⟨A|C⟩, for z, t ∈ ℂ.

From (ii) and (iii) it follows that the inner product ⟨·|·⟩ is linear in the ket and anti-linear in the bra, i.e., (z⟨A| + t⟨B|)|C⟩ = z<sup>∗</sup>⟨A|C⟩ + t<sup>∗</sup>⟨B|C⟩.

We recall that the "absolute value" of a complex number is defined as the square root of the product of this complex number and its complex conjugate, that is, |z| = √(z<sup>∗</sup>z). Moreover, a complex number z can be written either in its Cartesian form z = x + iy, or in its polar form z = |z|e<sup>iθ</sup> = |z|(cos θ + i sin θ). As a consequence, we have |⟨A|B⟩| = √(⟨A|B⟩⟨B|A⟩). We define the "length" of a ket (bra) vector |A⟩ (⟨A|) as ‖|A⟩‖ = ‖⟨A|‖ = √⟨A|A⟩. A vector of unit length is called a "unit vector." We say that the ket vectors |A⟩ and |B⟩ are "orthogonal" and write |A⟩ ⊥ |B⟩ if ⟨A|B⟩ = 0.

We have now introduced the necessary mathematics to state the first modeling rule of quantum theory, as follows.

First quantum modeling rule: A state A of an entity (in our case a concept) modeled by quantum theory is represented by a ket vector |A⟩ with length 1, that is, ⟨A|A⟩ = 1.
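The inner product and the normalization required by the first modeling rule can be illustrated in a finite-dimensional space. This is a minimal sketch in which kets are plain Python lists of complex components with respect to an orthonormal basis; the helper names `braket`, `norm` and `normalize` and the sample vectors are ours.

```python
import math


def braket(a, b):
    """<A|B>: conjugate the components of the bra, multiply, and sum."""
    return sum(complex(x).conjugate() * y for x, y in zip(a, b))


def norm(a):
    """Length of a vector, sqrt(<A|A>)."""
    return math.sqrt(braket(a, a).real)


def normalize(a):
    """Rescale a non-zero vector to unit length, as a state vector requires."""
    length = norm(a)
    return [x / length for x in a]


# Two illustrative states in a two-dimensional complex space.
A = normalize([1 + 1j, 2j])
B = normalize([1, -1j])

z = 2 + 3j
scaled_bra = braket([z * x for x in A], B)    # <zA|B>
anti_linear = z.conjugate() * braket(A, B)    # z* <A|B>
```

Here `scaled_bra` and `anti_linear` agree up to rounding, which is the anti-linearity in the bra, and `braket(A, B)` equals the complex conjugate of `braket(B, A)`.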

An orthogonal projection M is a linear operator on the Hilbert space, that is, a mapping M : ℋ → ℋ, |A⟩ ↦ M|A⟩, which is Hermitian and idempotent. This means that, for every |A⟩, |B⟩ ∈ ℋ and z, t ∈ ℂ, we have:

(i) M(z|A⟩ + t|B⟩) = zM|A⟩ + tM|B⟩ (linearity);

(ii) ⟨A|M|B⟩ = ⟨B|M|A⟩<sup>∗</sup> (hermiticity);

(iii) M · M = M (idempotency).

The identity operator 1 maps each vector onto itself and is a trivial orthogonal projection. We say that two orthogonal projections M<sub>k</sub> and M<sub>l</sub> are orthogonal operators if each vector contained in M<sub>k</sub>(ℋ) is orthogonal to each vector contained in M<sub>l</sub>(ℋ), and we write M<sub>k</sub> ⊥ M<sub>l</sub> in this case. The orthogonality of the projection operators M<sub>k</sub> and M<sub>l</sub> can also be expressed by M<sub>k</sub>M<sub>l</sub> = 0, where 0 is the null operator. A set of orthogonal projection operators {M<sub>k</sub> | k = 1, ..., n} is called a "spectral family" if all projectors are mutually orthogonal, that is, M<sub>k</sub> ⊥ M<sub>l</sub> for k ≠ l, and their sum is the identity, that is, Σ<sup>n</sup><sub>k=1</sub> M<sub>k</sub> = 1.

The above definitions give us the necessary mathematics to state the second modeling rule of quantum theory, as follows.

Second quantum modeling rule: A measurable quantity Q of an entity (in our case a concept) modeled by quantum theory, and having a set of possible real values {q<sub>1</sub>, ..., q<sub>n</sub>}, is represented by a spectral family {M<sub>k</sub> | k = 1, ..., n} in the following way. If the entity is in a state represented by the vector |A⟩, then the probability of obtaining the value q<sub>k</sub> in a measurement of the measurable quantity Q is ⟨A|M<sub>k</sub>|A⟩ = ‖M<sub>k</sub>|A⟩‖<sup>2</sup>. This formula is called the "Born rule" in the quantum jargon. Moreover, if the value q<sub>k</sub> is actually obtained in the measurement, then the initial state is changed into a state represented by the vector

$$|A_k\rangle = \frac{M_k |A\rangle}{\| M_k |A\rangle \|}\tag{A4}$$

This change of state is called "collapse" in the quantum jargon.
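The Born rule and the collapse in (A4) can be sketched for a small spectral family. In this illustration, which is ours and not taken from the article, projectors are given as square matrices (lists of rows) in an orthonormal basis of a three-dimensional space; the names `apply`, `born_probability` and `collapse` are hypothetical helpers.

```python
import math


def apply(M, v):
    """Matrix-vector product M|v>."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]


def braket(a, b):
    return sum(complex(x).conjugate() * y for x, y in zip(a, b))


def born_probability(M, state):
    """<A|M|A> = ||M|A>||^2 for an orthogonal projection M."""
    projected = apply(M, state)
    return braket(projected, projected).real


def collapse(M, state):
    """State after the outcome associated with M has been obtained, as in (A4)."""
    projected = apply(M, state)
    length = math.sqrt(braket(projected, projected).real)
    if length == 0.0:
        raise ValueError("this outcome has probability zero")
    return [x / length for x in projected]


# A two-element spectral family: M1 + M2 is the identity and M1 M2 = 0.
M1 = [[1, 0, 0], [0, 0, 0], [0, 0, 0]]
M2 = [[0, 0, 0], [0, 1, 0], [0, 0, 1]]

state = [0.6, 0.8j, 0.0]          # unit vector: 0.36 + 0.64 = 1
p1 = born_probability(M1, state)
p2 = born_probability(M2, state)
after = collapse(M2, state)
```

The two probabilities sum to one because the projectors form a spectral family, and repeating the measurement on `after` yields the same outcome with probability one.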

The tensor product ℋ<sub>A</sub> ⊗ ℋ<sub>B</sub> of two Hilbert spaces ℋ<sub>A</sub> and ℋ<sub>B</sub> is the Hilbert space generated by the set {|A<sub>i</sub>⟩ ⊗ |B<sub>j</sub>⟩}, where |A<sub>i</sub>⟩ and |B<sub>j</sub>⟩ are vectors of ℋ<sub>A</sub> and ℋ<sub>B</sub>, respectively, which means that a general vector of this tensor product is of the form Σ<sub>ij</sub> |A<sub>i</sub>⟩ ⊗ |B<sub>j</sub>⟩. This gives us the necessary mathematics to introduce the third modeling rule.

Third quantum modeling rule: A state C of a compound entity (in our case a combined concept) is represented by a unit vector |C⟩ of the tensor product ℋ<sub>A</sub> ⊗ ℋ<sub>B</sub> of the two Hilbert spaces ℋ<sub>A</sub> and ℋ<sub>B</sub> containing the vectors that represent the states of the component entities (concepts).

The above means that we have |C⟩ = Σ<sub>ij</sub> c<sub>ij</sub>|A<sub>i</sub>⟩ ⊗ |B<sub>j</sub>⟩, where |A<sub>i</sub>⟩ and |B<sub>j</sub>⟩ are unit vectors of ℋ<sub>A</sub> and ℋ<sub>B</sub>, respectively, and Σ<sub>i,j</sub> |c<sub>ij</sub>|<sup>2</sup> = 1. We say that the state C represented by |C⟩ is a product state if it is of the form |A⟩ ⊗ |B⟩ for some |A⟩ ∈ ℋ<sub>A</sub> and |B⟩ ∈ ℋ<sub>B</sub>. Otherwise, C is called an "entangled state."
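Whether a given coefficient array c<sub>ij</sub> describes a product or an entangled state can be tested directly. The sketch below assumes that the |A<sub>i</sub>⟩ and |B<sub>j</sub>⟩ form orthonormal bases, in which case |C⟩ is a product state exactly when the matrix (c<sub>ij</sub>) has rank one, that is, when all its 2 × 2 minors vanish. The function name and the example states are illustrative.

```python
import math


def is_product_state(c, tol=1e-12):
    """c[i][j] are the coefficients of |C> in orthonormal product bases.

    A rank-one coefficient matrix factorizes as c[i][j] = a[i] * b[j],
    which is equivalent to every 2x2 minor being zero.
    """
    rows, cols = len(c), len(c[0])
    for i in range(rows):
        for k in range(i + 1, rows):
            for j in range(cols):
                for l in range(j + 1, cols):
                    if abs(c[i][j] * c[k][l] - c[i][l] * c[k][j]) > tol:
                        return False
    return True


s = 1.0 / math.sqrt(2.0)

# Product state |A> (x) |B> with a = (0.6, 0.8) and b = (s, i s).
a, b = [0.6, 0.8], [s, 1j * s]
product = [[x * y for y in b] for x in a]

# Entangled state with coefficients c_11 = c_22 = s and c_12 = c_21 = 0.
entangled = [[s, 0.0], [0.0, s]]

product_check = is_product_state(product)
entangled_check = is_product_state(entangled)
```

For the entangled example the joint probabilities do not factorize into a product of marginals, which is the property exploited in Section 3.2 of the main text.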

The Fock space is a specific type of Hilbert space, originally introduced in quantum field theory. For most states of a quantum field the number of identical quantum entities is not conserved but is a variable quantity. The Fock space copes with this situation by allowing its vectors to be superpositions of vectors pertaining to different sectors for fixed numbers of identical quantum entities. More explicitly, the k-th sector of a Fock space describes a fixed number of k identical quantum entities, and it is of the form ℋ ⊗ ... ⊗ ℋ, the tensor product of k identical Hilbert spaces ℋ. The Fock space ℱ itself is the direct sum of all these sectors, hence

$$\mathcal{F} = \oplus_{k=1}^{j} \otimes_{l=1}^{k} \mathcal{H} \tag{A5}$$

For our modeling we have only used Fock space for the "two" and "one quantum entity" case, hence ℱ = ℋ ⊕ (ℋ ⊗ ℋ). This is due to considering only combinations of two concepts. The sector ℋ is called the "first sector," while the sector ℋ ⊗ ℋ is called the "second sector." A unit vector |F⟩ ∈ ℱ is then written as |F⟩ = ne<sup>iγ</sup>|C⟩ + me<sup>iδ</sup>(|A⟩ ⊗ |B⟩), where |A⟩, |B⟩ and |C⟩ are unit vectors of ℋ, and such that n<sup>2</sup> + m<sup>2</sup> = 1. For combinations of j concepts, the general form of Fock space in (A5) should be used.
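A two-sector Fock space vector can be assembled explicitly in finite dimension. The sketch below is an illustration under our own conventions: ℋ is taken to be two-dimensional, vectors are lists of components, the direct sum is realized by concatenating the first-sector and second-sector components, and the names `kron`, `fock_vector` and `norm_squared` are hypothetical helpers.

```python
import cmath
import math


def kron(a, b):
    """Components of |A> (x) |B> in the product basis."""
    return [x * y for x in a for y in b]


def fock_vector(n, gamma, C, m, delta, A, B):
    """|F> = n e^{i gamma} |C>  (+)  m e^{i delta} (|A> (x) |B>)."""
    first_sector = [n * cmath.exp(1j * gamma) * x for x in C]
    second_sector = [m * cmath.exp(1j * delta) * x for x in kron(A, B)]
    return first_sector + second_sector   # direct sum as concatenation


def norm_squared(v):
    return sum(abs(x) ** 2 for x in v)


s = 1.0 / math.sqrt(2.0)
A, B, C = [1.0, 0.0], [0.0, 1.0], [s, s]     # unit vectors of H
n, m = 0.8, 0.6                              # n^2 + m^2 = 1

F = fock_vector(n, 0.3, C, m, 1.1, A, B)
```

With dim ℋ = 2 the vector `F` has 2 + 4 = 6 components, and its squared length is n² + m² = 1 regardless of the phases γ and δ.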

# A3. Data Modeling Tables and Statistical Analysis

**Tables A1**–**A5A–E**.

# A4. Proofs of Theorems 1–4

**Proof of Theorem 1.** If µ(A), µ(B), µ(A′), µ(B′) and µ(A and B), µ(A and B′), µ(A′ and B), µ(A′ and B′) are classical conjunction and negation data, then there exist a Kolmogorovian probability space (Ω, σ(Ω), P) and events E<sub>A</sub>, E<sub>B</sub> ∈ σ(Ω) such that P(E<sub>A</sub>) = µ(A), P(E<sub>B</sub>) = µ(B), P(Ω ∖ E<sub>A</sub>) = µ(A′), P(Ω ∖ E<sub>B</sub>) = µ(B′), P(E<sub>A</sub> ∩ E<sub>B</sub>) = µ(A and B), P(E<sub>A</sub> ∩ (Ω ∖ E<sub>B</sub>)) = µ(A and B′), P((Ω ∖ E<sub>A</sub>) ∩ E<sub>B</sub>) = µ(A′ and B) and P((Ω ∖ E<sub>A</sub>) ∩ (Ω ∖ E<sub>B</sub>)) = µ(A′ and B′). From the general properties of a Kolmogorovian probability space it follows that (1), (2), (3), (4), (5), (6), (7) and (8) are satisfied.

Now suppose that x is such that its membership weights µ(A), µ(B), µ(A′), µ(B′), µ(A and B), µ(A and B′), µ(A′ and B) and µ(A′ and B′) with respect to the concepts A, B, A′, B′, "A and B," "A and B′," "A′ and B" and "A′ and B′," respectively, satisfy (1), (2), (3), (4), (5), (6), (7) and (8). We will prove that, as a consequence, µ(A), µ(B), µ(A′), µ(B′), µ(A and B), µ(A and B′), µ(A′ and B) and µ(A′ and B′) are classical conjunction and negation data, in the sense that "there exists a classical Kolmogorovian probability space, such that we can represent all of them as measures on event sets of this space." We make our proof by explicitly constructing a Kolmogorovian probability space that models these data. Consider the set Ω = {1, 2, 3, 4} and σ(Ω) = 𝒫(Ω), the set of all subsets of Ω. We define

$$P(\{1\}) = \mu(A \text{ and } B) \tag{A6}$$

$$P(\{2\}) = \mu(A \text{ and } B') = \mu(A) - \mu(A \text{ and } B) \tag{A7}$$

$$P(\{3\}) = \mu(A' \text{ and } B) = \mu(A') - \mu(A' \text{ and } B') \tag{A8}$$

$$P(\{4\}) = \mu(A' \text{ and } B') \tag{A9}$$

and further for an arbitrary subset S ⊆ {1, 2, 3, 4} we define

$$P(S) = \sum_{a \in S} P(\{a\}) \tag{A10}$$

TABLE A1 | Representation of the membership weights in the case of the concepts Home Furnishing and Furniture.

*A* = *Home Furnishing, B* = *Furniture.*

Let us prove that P : σ(Ω) → [0, 1] is a probability measure. For this purpose, we need to prove that P(S) ∈ [0, 1] for an arbitrary subset S ⊆ Ω, and that the "sum formula" for a probability measure is satisfied. The sum formula for a probability measure is satisfied because of definition (A10). What remains to be proved is that P(S) ∈ [0, 1] for an arbitrary subset S ⊆ Ω, and that all different subsets that can be formed are contained in σ(Ω). P({1}), P({2}), P({3}) and P({4}) are contained in [0, 1] as a consequence of equations (1), (3), (5) and (6). Using (5) we have that P({1, 2}) = µ(A and B) + µ(A and B′) = µ(A and B) + µ(A) − µ(A and B) = µ(A). Using (6) we have that P({3, 4}) = µ(A′ and B′) + µ(A′ and B) = µ(A′ and B′) + µ(A′) − µ(A′ and B′) = µ(A′). Again using (6) we have that P({1, 3}) = µ(A and B) + µ(A′ and B) = µ(A and B) + µ(B) − µ(A and B) = µ(B), and using again (5) we have that P({2, 4}) = µ(A and B′) + µ(A′ and B′) = µ(B′) − µ(A′ and B′) + µ(A′ and B′) = µ(B′). Moreover, P({1, 2}), P({3, 4}), P({1, 3}) and P({2, 4}) are all contained in [0, 1] as a consequence of equations (1), (2), (3) and (4). We have thus found the representatives of all elements and their conjunctions in σ(Ω). But we have not yet considered all subsets of Ω. Indeed, let us consider P({1, 2, 3}) = µ(A and B) + µ(A and B′) + µ(A′ and B) = 1 − µ(A′ and B′). From (7) it follows that this is contained in [0, 1]. In an analogous way we prove that P({1, 2, 4}) = 1 − µ(A′ and B), P({1, 3, 4}) = 1 − µ(A and B′), and P({2, 3, 4}) = 1 − µ(A and B). We almost have all subsets of Ω. Let us consider {1, 4} and {2, 3}. Since by construction we have P({1}) ≤ P({1, 4}) ≤ P({1, 2, 4}) and P({2}) ≤ P({2, 3}) ≤ P({2, 3, 4}), it follows that both P({1, 4}) and P({2, 3}) are contained in [0, 1]. The last subset to check is Ω itself. We have P(Ω) = P({1}) + P({2}) + P({3}) + P({4}) = 1, following the calculation we made above. We have verified all subsets S ⊆ Ω, and hence proved that P is a probability measure. All subsets for which we have gathered data are represented in this σ-algebra, which completes our proof.
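The construction used in this proof can be reproduced computationally. The following sketch builds the four-atom space Ω = {1, 2, 3, 4} from the four conjunction weights, with the atom labels of (A6)–(A9), defines P on arbitrary subsets as in (A10), and checks every subset. Exact fractions are used to avoid rounding; the function names and the input weights are illustrative and are not experimental data.

```python
from fractions import Fraction
from itertools import combinations


def build_measure(mu_ab, mu_abn, mu_anb, mu_anbn):
    """Return P defined on subsets of {1, 2, 3, 4}.

    Atom 1: A and B, atom 2: A and B', atom 3: A' and B, atom 4: A' and B'.
    """
    atoms = {1: mu_ab, 2: mu_abn, 3: mu_anb, 4: mu_anbn}

    def P(subset):
        # Sum formula (A10): add up the weights of the atoms in the subset.
        return sum((atoms[a] for a in subset), Fraction(0))

    return P


def subsets(omega):
    """All subsets of omega, i.e., the power set used as sigma-algebra."""
    items = sorted(omega)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


OMEGA = {1, 2, 3, 4}
P = build_measure(Fraction(2, 10), Fraction(3, 10), Fraction(4, 10), Fraction(1, 10))

is_probability_measure = (
    P(OMEGA) == 1 and all(0 <= P(s) <= 1 for s in subsets(OMEGA))
)
mu_A, mu_B = P({1, 2}), P({1, 3})          # events E_A = {1, 2}, E_B = {1, 3}
mu_not_A, mu_not_B = P({3, 4}), P({2, 4})  # their complements
```

With these inputs the marginals are µ(A) = 1/2 and µ(B) = 3/5, and all sixteen subsets receive a value in [0, 1].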

**Proof of Theorem 2.** Let us consider Theorem 1. In its proof, we did not use (8), which means that conditions (5), (6), (7) and (8) are not independent. By using, for example, (5), (6), and then (7) we get

$$\begin{aligned} &1 - \mu(A') - \mu(B') + \mu(A' \text{ and } B') \\ &\quad = 1 - \mu(B) - \mu(B') + \mu(A \text{ and } B) \\ &\quad = 1 - \mu(B) - \mu(A' \text{ and } B') - \mu(A) + \mu(A \text{ and } B) + \mu(A \text{ and } B) \\ &\quad = \mu(A' \text{ and } B') - \mu(A' \text{ and } B') + \mu(A \text{ and } B) \\ &\quad = \mu(A \text{ and } B) \end{aligned}$$

which proves indeed that (8) can be derived from (5), (6) and then (7), and can be left out as a condition.

TABLE A2 | Representation of the membership weights in the case of the concepts Spices and Herbs.

*A* = *Spices, B* = *Herbs.*

Let us now prove a result which is useful for our purposes. Following (5) and (6) we have that µ(A and B′) + µ(A and B) + µ(A′ and B) = µ(A) + µ(B) − µ(A and B). Moreover, by using (7), we get µ(A and B′) + µ(A and B) + µ(A′ and B) + µ(A′ and B′) = µ(A) + µ(B) − µ(A and B) + 1 − µ(A) − µ(B) + µ(A and B) = 1. The equality

$$
\mu(A \text{ and } B') + \mu(A \text{ and } B) + \mu(A' \text{ and } B) + \mu(A' \text{ and } B') = 1 \tag{A11}
$$

can be used, together with (5) and (6), as follows.

$$\begin{aligned} \mu(A) + \mu(A') &= \mu(A \text{ and } B) + \mu(A \text{ and } B') + \mu(A' \text{ and } B) \\ &+ \mu(A' \text{ and } B') = 1 \\ \mu(B) + \mu(B') &= \mu(A \text{ and } B) + \mu(A' \text{ and } B) + \mu(A \text{ and } B') \\ &+ \mu(A' \text{ and } B') = 1 \end{aligned}$$

This means that from (1), which implies 0 ≤ µ(A) ≤ 1, it follows that 0 ≤ 1 − µ(A) ≤ 1, and hence 0 ≤ µ(A′) ≤ 1. Likewise, from (2), which implies 0 ≤ µ(B) ≤ 1, it follows that 0 ≤ 1 − µ(B) ≤ 1, and hence 0 ≤ µ(B′) ≤ 1. Suppose now that (1) and (2) are satisfied. This means that 0 ≤ µ(A) − µ(A and B) = µ(B′) − µ(A′ and B′) and hence µ(A′ and B′) ≤ µ(B′), and 0 ≤ µ(B) − µ(A and B) = µ(A′) − µ(A′ and B′) and hence µ(A′ and B′) ≤ µ(A′). The only condition still lacking in order to derive (3) and (4) from (1) and (2) is that 0 ≤ µ(A′ and B′). We can add this as a requirement to (7).

Hence, we have proved Theorem 2.

**Proof of Lemma 1.** That (14), (15), (16) and (17) follow from (5) and (6) is shown by a simple reshuffling of the terms. Suppose now that (14), (15), (16) and (17) are satisfied. The inverse reshuffling of the same terms proves that (5) and (6) are satisfied. This completes the proof of Lemma 1.

**Proof of Theorem 3.** Lemma 1 entails that we can substitute (5) and (6) by the four equations expressing the marginal law to be satisfied.

A further simplification is possible. Indeed, (5), (6) and (7) are equivalent to (14), (15), (16), (17) and (A11). We have proved above that (5), (6) and (7) imply (A11). Let us prove the inverse. Hence suppose that (14), (15), (16), (17) and (A11) are satisfied, and let us prove (7). We have

$$\begin{aligned} &1 - \mu(A) - \mu(B) + \mu(A \text{ and } B) \\ &\quad = 1 - \mu(A \text{ and } B) - \mu(A \text{ and } B') - \mu(A \text{ and } B) - \mu(A' \text{ and } B) + \mu(A \text{ and } B) \\ &\quad = 1 - \mu(A \text{ and } B) - \mu(A \text{ and } B') - \mu(A' \text{ and } B) \\ &\quad = \mu(A' \text{ and } B') \end{aligned}$$

which proves that (7) holds.

TABLE A3 | Representation of the membership weights in the case of the concepts Pets and Farmyard Animals.

*A* = *Pets, B* = *Farmyard Animals.*

We have thus proved Theorem 3, stating a new and more symmetric set of classicality conditions.
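The symmetric conditions lend themselves to a direct numerical check. The sketch below assumes, following the proof above and the table captions of Appendix A3, that the conditions consist of the four marginal-law equalities (each single-concept weight equals the sum of the two conjunction weights containing it) together with the normalization (A11), and it additionally requires every weight to lie in [0, 1]. The dictionary layout, function names, tolerance and numbers are our own illustrative choices.

```python
def marginal_law_deviations(mu):
    """mu maps 'A', "A'", 'B', "B'" and pairs such as ('A', "B'") to weights."""
    return {
        "I_A": mu["A"] - mu["A", "B"] - mu["A", "B'"],
        "I_A'": mu["A'"] - mu["A'", "B"] - mu["A'", "B'"],
        "I_B": mu["B"] - mu["A", "B"] - mu["A'", "B"],
        "I_B'": mu["B'"] - mu["A", "B'"] - mu["A'", "B'"],
        "I_ABA'B'": 1.0 - (mu["A", "B"] + mu["A", "B'"]
                           + mu["A'", "B"] + mu["A'", "B'"]),
    }


def satisfies_symmetric_conditions(mu, tol=1e-9):
    """Marginal law, normalization (A11), and all weights within [0, 1]."""
    in_range = all(0.0 <= w <= 1.0 for w in mu.values())
    devs = marginal_law_deviations(mu)
    return in_range and all(abs(d) <= tol for d in devs.values())


# Illustrative classical data (not experimental).
classical = {
    "A": 0.5, "A'": 0.5, "B": 0.6, "B'": 0.4,
    ("A", "B"): 0.2, ("A", "B'"): 0.3, ("A'", "B"): 0.4, ("A'", "B'"): 0.1,
}

# Illustrative non-classical data: conjunction weights are too large.
non_classical = dict(classical)
non_classical["A", "B"] = 0.55
non_classical["A", "B'"] = 0.45
```

For the second data set the deviation I<sub>A</sub> is −0.5, so the marginal law fails and no Kolmogorovian representation exists.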

**Proof of Theorem 4.** Suppose that the theoretical values for a modeling only in the second sector of Fock space are given. This means that |C⟩ and M are given, and the values of µ(A), µ(B), µ(A′), µ(B′), µ(A and B), µ(A and B′), µ(A′ and B) and µ(A′ and B′) are given respectively by ⟨C|M ⊗ 1|C⟩, ⟨C|1 ⊗ M|C⟩, ⟨C|(1 − M) ⊗ 1|C⟩, ⟨C|1 ⊗ (1 − M)|C⟩, ⟨C|M ⊗ M|C⟩, ⟨C|M ⊗ (1 − M)|C⟩, ⟨C|(1 − M) ⊗ M|C⟩ and ⟨C|(1 − M) ⊗ (1 − M)|C⟩. If we use the results calculated in (76), (77), (78), (79), (80), (81), (82) and (83), we can easily prove that all the classicality conditions (26)–(30) are satisfied. Let us prove one of them explicitly. For example, µ(A′ and B) + µ(A′ and B′) = ⟨C|(1 − M) ⊗ M|C⟩ + ⟨C|(1 − M) ⊗ (1 − M)|C⟩ = ⟨C|(1 − M) ⊗ (M + (1 − M))|C⟩ = ⟨C|(1 − M) ⊗ 1|C⟩ = µ(A′).

Let us prove the other implication. Suppose that we have available data satisfying the classicality conditions (26)–(30), and let us prove that we can find a state |C⟩ and an orthogonal projector M such that the second sector models these data. It is a straightforward verification that an entangled vector |C⟩ such that c<sub>ij</sub> = √(µ(A and B)/16) for 5 ≤ i ≤ 8 and 5 ≤ j ≤ 8, c<sub>ij</sub> = √(µ(A and B′)/16) for 5 ≤ i ≤ 8 and 1 ≤ j ≤ 4, c<sub>ij</sub> = √(µ(A′ and B)/16) for 1 ≤ i ≤ 4 and 5 ≤ j ≤ 8, and c<sub>ij</sub> = √(µ(A′ and B′)/16) for 1 ≤ i ≤ 4 and 1 ≤ j ≤ 4, is a solution, since the sixteen squared coefficients of each block then sum to the corresponding membership weight.
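The "straightforward verification" can be carried out numerically. The sketch below rests on two assumptions of ours: M is taken to be the projector onto the last four basis vectors of ℂ<sup>8</sup> (indices 5 to 8 in the text, 4 to 7 in zero-based Python indexing), and each block coefficient is taken to be √(µ/16), the value for which the sixteen squared entries of a block sum to the block's weight. Function names and input weights are illustrative.

```python
import math


def entangled_coefficients(mu_ab, mu_abn, mu_anb, mu_anbn):
    """8x8 block-constant coefficient matrix c[i][j] (zero-based indices)."""
    def entry(i, j):
        if i >= 4 and j >= 4:
            weight = mu_ab      # text indices 5..8 x 5..8: A and B
        elif i >= 4:
            weight = mu_abn     # 5..8 x 1..4: A and B'
        elif j >= 4:
            weight = mu_anb     # 1..4 x 5..8: A' and B
        else:
            weight = mu_anbn    # 1..4 x 1..4: A' and B'
        return math.sqrt(weight / 16.0)

    return [[entry(i, j) for j in range(8)] for i in range(8)]


def second_sector_weight(c, rows, cols):
    """<C| P_rows (x) P_cols |C> for projectors onto coordinate subspaces."""
    return sum(c[i][j] ** 2 for i in rows for j in cols)


UP, DOWN, ALL = range(4, 8), range(0, 4), range(8)   # M, 1 - M, identity

# Illustrative classical weights summing to one (not experimental data).
c = entangled_coefficients(0.2, 0.3, 0.4, 0.1)

mu_ab = second_sector_weight(c, UP, UP)        # <C| M (x) M |C>
mu_a = second_sector_weight(c, UP, ALL)        # <C| M (x) 1 |C>
mu_b = second_sector_weight(c, ALL, UP)        # <C| 1 (x) M |C>
total = second_sector_weight(c, ALL, ALL)      # <C|C>
```

The resulting vector has unit length, reproduces the four conjunction weights block by block, and yields the marginals µ(A) = 0.5 and µ(B) = 0.6 required by the marginal law.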

# A5. Conditions for the Existence of Solutions in First and Second Sector of Fock Space

In this section we make explicit the conditions that should be satisfied by experimental data in order to be represented in Fock space. In our analysis, we distinguish between the first sector representation in ℂ<sup>8</sup> and the complete two-sector representation in ℂ<sup>8</sup> ⊕ (ℂ<sup>8</sup> ⊗ ℂ<sup>8</sup>).

TABLE A4 | Representation of the membership weights in the case of the concepts Fruits and Vegetables.

*A* = *Fruits, B* = *Vegetables.*

Let us start from the ℂ<sup>8</sup> representation. We first analyze whether or not the solution of (52)–(65) is compatible with (70)–(73). To this end, note that the right-hand side of (70)–(73) corresponds to the average of the probabilities of the former concepts, plus the so-called "interference term," which depends on (i) how the vectors representing the former concepts in the combination, when restricted to the subspace determined by M, project onto each other, and (ii) the phase angles of the vectors (Aerts, 2009). Note that the configuration of the phase angles φ<sub>A</sub>, φ<sub>B</sub>, φ<sub>A′</sub>, φ<sub>B′</sub> allows one to model a variety of interference situations. In fact, when the difference between these phase angles is close to 0 or π, we have a maximal amount of interference while, when the difference between these phase angles is close to π/2 or 3π/2, we have a minimal amount of interference. We can then characterize a set of "compatibility intervals" for the solution of (52)–(65) and (70)–(73), which determines the modeling capacity of this Hilbert space model. Let

$$i(A,B)_1 = \frac{1}{2}(\mu(A) + \mu(B)) - |a_5 b_5 + a_6 b_6 + a_7 b_7 + a_8 b_8| \tag{A12}$$

$$i(A,B)_2 = \frac{1}{2}(\mu(A) + \mu(B)) + |a_5 b_5 + a_6 b_6 + a_7 b_7 + a_8 b_8| \tag{A13}$$

$$i(A,B')_1 = \frac{1}{2}(\mu(A) + \mu(B')) - |a_5 b'_5 + a_6 b'_6 + a_7 b'_7 + a_8 b'_8| \tag{A14}$$

$$i(A,B')_2 = \frac{1}{2}(\mu(A) + \mu(B')) + |a_5 b'_5 + a_6 b'_6 + a_7 b'_7 + a_8 b'_8| \tag{A15}$$

$$i(A',B)_1 = \frac{1}{2}(\mu(A') + \mu(B)) - |a'_5 b_5 + a'_6 b_6 + a'_7 b_7 + a'_8 b_8| \tag{A16}$$

$$i(A',B)_2 = \frac{1}{2}(\mu(A') + \mu(B)) + |a'_5 b_5 + a'_6 b_6 + a'_7 b_7 + a'_8 b_8| \tag{A17}$$

$$i(A',B')_1 = \frac{1}{2}(\mu(A') + \mu(B')) - |a'_5 b'_5 + a'_6 b'_6 + a'_7 b'_7 + a'_8 b'_8| \tag{A18}$$

$$i(A',B')_2 = \frac{1}{2}(\mu(A') + \mu(B')) + |a'_5 b'_5 + a'_6 b'_6 + a'_7 b'_7 + a'_8 b'_8| \tag{A19}$$

and let us define the following "solution intervals"

$$I_{AB} = [i(A,B)_1, i(A,B)_2] \tag{A20}$$

$$I_{AB'} = [i(A,B')_1, i(A,B')_2] \tag{A21}$$

$$I_{A'B} = [i(A',B)_1, i(A',B)_2] \tag{A22}$$

$$I_{A'B'} = [i(A',B')_1, i(A',B')_2] \tag{A23}$$

A solution of (52)–(73) exists if and only if the membership weights µ(A and B), µ(A′ and B), µ(A and B′) and µ(A′ and B′) are respectively contained in the intervals I<sub>AB</sub>, I<sub>A′B</sub>, I<sub>AB′</sub> and I<sub>A′B′</sub>.
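The compatibility intervals (A12)–(A23) all share the same form, so a single helper suffices to evaluate any of them. The sketch below assumes that the coefficients a<sub>5</sub>, ..., a<sub>8</sub> and b<sub>5</sub>, ..., b<sub>8</sub> (or their primed counterparts) are real numbers supplied by the first-sector solution of the main text; the function names, the tolerance and the numerical values are illustrative.

```python
def first_sector_interval(mu_x, mu_y, x_tail, y_tail):
    """[i(X,Y)_1, i(X,Y)_2]: average of the component weights, minus and
    plus the absolute value of the overlap sum over indices 5..8."""
    overlap = abs(sum(a * b for a, b in zip(x_tail, y_tail)))
    mean = 0.5 * (mu_x + mu_y)
    return mean - overlap, mean + overlap


def in_interval(value, interval, tol=1e-12):
    low, high = interval
    return low - tol <= value <= high + tol


# Illustrative values: mu(A) = 0.5, mu(B) = 0.6, with tails (a_5..a_8), (b_5..b_8).
a_tail = [0.5, 0.5, 0.0, 0.0]
b_tail = [0.2, 0.2, 0.6, 0.4]
interval_ab = first_sector_interval(0.5, 0.6, a_tail, b_tail)

representable = in_interval(0.7, interval_ab)       # an overextended weight
not_representable = in_interval(0.8, interval_ab)
```

With these numbers the interval is [0.35, 0.75], so a conjunction weight of 0.7, although larger than both component weights, still admits a first-sector solution, whereas 0.8 does not.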


#### Deviation of µ(A) from µ(A and B) + µ(A and B′)


*By applying a Bonferroni correction procedure, the null hypothesis can be rejected for a* p*-value less than 0.05/24 = 2.08 · 10<sup>−3</sup>.*

Let us now come to the complete representation in ℂ<sup>8</sup> ⊕ (ℂ<sup>8</sup> ⊗ ℂ<sup>8</sup>), and let us consider the data collected in the experiments e<sub>XY</sub>, X = A, A′, Y = B, B′, of Section 2. These data are explicitly reported in **Tables A1**–**A4**. Since the existence of solutions for (86)–(92) depends, for a given experiment e<sub>XY</sub>, on the exemplar x and the pair (A, B) of concepts that are considered, we explicitly report such dependence for all the relevant variables that are considered in this section. Hence, for each considered exemplar x, we collected the eight membership weights µ<sub>x</sub>(A), µ<sub>x</sub>(B), µ<sub>x</sub>(A′), µ<sub>x</sub>(B′), µ<sub>x</sub>(A and B), µ<sub>x</sub>(A and B′), µ<sub>x</sub>(A′ and B), and µ<sub>x</sub>(A′ and B′).

The analysis we made in the foregoing sections makes it possible for us to propose a general modeling procedure. As far as solutions that can be found on the first sector alone are concerned, we determined the intervals of solutions as explained in (A20), (A21), (A22) and (A23). We can now easily determine the general intervals of solutions, including the extra solutions made possible by the second sector. To this end, we need to consider the quantities Σ<sup>8</sup><sub>i,j=5</sub> c<sup>2</sup><sub>ij</sub>, Σ<sup>8</sup><sub>i=5</sub> Σ<sup>4</sup><sub>j=1</sub> c<sup>2</sup><sub>ij</sub>, Σ<sup>4</sup><sub>i=1</sub> Σ<sup>8</sup><sub>j=5</sub> c<sup>2</sup><sub>ij</sub> and Σ<sup>4</sup><sub>i=1</sub> Σ<sup>4</sup><sub>j=1</sub> c<sup>2</sup><sub>ij</sub>, respectively for the combinations "A and B," "A and not B," "not A and B" and "not A and not B." To be able to express the intervals of Fock space solutions, we introduce the following quantities.

$$s(A, B, x) = \min \left( \sum_{i,j=5}^{8} c_{ij}^2, \; i(A,B)_1 \right) \tag{A24}$$

$$t(A, B, x) = \max \left( \sum_{i,j=5}^{8} c_{ij}^2, \; i(A,B)_2 \right) \tag{A25}$$

Then, the interval

$$U_{sol}(AB, x) = [s(A, B, x), t(A, B, x)] \tag{A26}$$

is the solution interval for the general Fock space model. Hence, if the experimental value µ(A and B) is contained in this interval, a solution exists. As a second step we can then see whether a solution in the first sector alone exists, which consists of verifying whether the experimental value µ(A and B) is contained in I<sub>AB</sub>. Suppose that the answer is "yes"; then we can calculate the angle φ<sub>B</sub> − φ<sub>A</sub> that gives rise to this solution in the first sector. This angle is then an indication of which angle to choose for the general Fock space solution. Usually different choices are possible. If there is no solution in the first sector, we can still choose an angle φ<sub>B</sub> − φ<sub>A</sub> such that this angle, together with a choice of m<sub>AB</sub> and n<sub>AB</sub>, gives a solution. The possible values of the angle and the coefficients m<sub>AB</sub> and n<sub>AB</sub> are calculated by solving (86).

We can analyze the other combinations in an equivalent way. Let us start with the combination "A and not B." We have:
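The two-step procedure just described can be summarized in a few lines. In this sketch, which is an illustration rather than the authors' implementation, the second-sector value (the sum of squared coefficients entering (A24) and (A25)) and the first-sector interval are taken as given inputs; the function names and the returned labels are hypothetical.

```python
def fock_solution_interval(second_sector_value, first_sector_interval):
    """[s, t] as in (A24)-(A26): extend the first-sector interval so that it
    also contains the second-sector value."""
    i1, i2 = first_sector_interval
    return min(second_sector_value, i1), max(second_sector_value, i2)


def classify(mu_exp, second_sector_value, first_sector_interval, tol=1e-12):
    """Step 1: is there a Fock space solution? Step 2: does first sector suffice?"""
    s, t = fock_solution_interval(second_sector_value, first_sector_interval)
    if not (s - tol <= mu_exp <= t + tol):
        return "no solution"
    i1, i2 = first_sector_interval
    if i1 - tol <= mu_exp <= i2 + tol:
        return "first sector"
    return "two sectors"


# Illustrative inputs: second-sector value 0.3, first-sector interval [0.35, 0.75].
interval = fock_solution_interval(0.3, (0.35, 0.75))
labels = [classify(mu, 0.3, (0.35, 0.75)) for mu in (0.5, 0.32, 0.9)]
```

For these inputs the general interval is [0.3, 0.75]; the weight 0.5 is already reachable in the first sector, 0.32 requires a contribution from the second sector, and 0.9 lies outside the general interval. Determining the angle and the coefficients m<sub>AB</sub>, n<sub>AB</sub> still requires solving (86), which is not attempted here.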

$$s(A, B', x) = \min \left( \sum_{i=5}^{8} \sum_{j=1}^{4} c_{ij}^{2}, \; i(A,B')_1 \right) \tag{A27}$$


#### Deviation of µ(B) from µ(A and B) + µ(A′ and B)


*By applying a Bonferroni correction procedure, the null hypothesis can be rejected for a* p*-value less than 0.05/24 = 2.08 · 10<sup>−3</sup>.*

$$t(A, B', x) = \max \left( \sum_{i=5}^{8} \sum_{j=1}^{4} c_{ij}^{2}, \; i(A,B')_2 \right) \tag{A28}$$

Then, the interval

$$U_{sol}(AB', x) = [s(A, B', x), t(A, B', x)] \tag{A29}$$

is the solution interval for the general Fock space model. The equation to be used to calculate the angle φ<sub>B′</sub> − φ<sub>A</sub> and the coefficients m<sub>AB′</sub> and n<sub>AB′</sub> is (88).

For the combination "not A and B," we have:

$$s(A', B, x) = \min \left( \sum_{i=1}^{4} \sum_{j=5}^{8} c_{ij}^{2}, \; i(A',B)_1 \right) \tag{A30}$$

$$t(A', B, x) = \max \left( \sum_{i=1}^{4} \sum_{j=5}^{8} c_{ij}^{2}, \; i(A',B)_2 \right) \tag{A31}$$

Then, the interval

$$U_{sol}(A'B, x) = [s(A', B, x), t(A', B, x)] \tag{A32}$$

is the solution interval for the general Fock space model. The equation to be used to calculate the angle φ<sub>B</sub> − φ<sub>A′</sub> and the coefficients m<sub>A′B</sub> and n<sub>A′B</sub> is (90).

Finally, for the combination "not A and not B," we have:

$$s(A', B', x) = \min \left( \sum_{i=1}^{4} \sum_{j=1}^{4} c_{ij}^2, \; i(A',B')_1 \right) \tag{A33}$$

$$t(A', B', x) = \max \left( \sum_{i=1}^{4} \sum_{j=1}^{4} c_{ij}^2, \; i(A',B')_2 \right) \tag{A34}$$

Then, the interval

$$U_{sol}(A'B', x) = [s(A', B', x), t(A', B', x)] \tag{A35}$$

is the solution interval for the general Fock space model. The equation to be used to calculate the angle φ<sub>B′</sub> − φ<sub>A′</sub> and the coefficients m<sub>A′B′</sub> and n<sub>A′B′</sub> is (92).

#### TABLE A5C | Calculation of the *p*-values corresponding to the deviation I<sub>A′</sub> between µ(A′) and µ(A′ and B) + µ(A′ and B′).

#### Deviation of µ(A′) from µ(A′ and B) + µ(A′ and B′)

*By applying a Bonferroni correction procedure, the null hypothesis can be rejected for a* p*-value less than 0.05/24 = 2.08 · 10<sup>−3</sup>.*

#### TABLE A5D | Calculation of the *p*-values corresponding to the deviation I<sub>B′</sub> between µ(B′) and µ(A and B′) + µ(A′ and B′).

#### Deviation of µ(B′) from µ(A and B′) + µ(A′ and B′)

*By applying a Bonferroni correction procedure, the null hypothesis can be rejected for a* p*-value less than 0.05/24 = 2.08 · 10<sup>−3</sup>.*

#### TABLE A5E | Calculation of the *p*-values corresponding to the deviation I<sub>ABA′B′</sub> between µ(A and B) + µ(A and B′) + µ(A′ and B) + µ(A′ and B′) and 1.

#### Deviation of µ(A and B) + µ(A and B′) + µ(A′ and B) + µ(A′ and B′) from 1

*By applying a Bonferroni correction procedure, the null hypothesis can be rejected for a* p*-value less than 0.05/24 = 2.08 · 10<sup>−3</sup>.*

# Reconstruction of a Real World Social Network Using the Potts Model and Loopy Belief Propagation

Cristian Bisconti<sup>1</sup>, Angelo Corallo<sup>1</sup>, Laura Fortunato<sup>1</sup>, Antonio A. Gentile<sup>1,2</sup>\*, Andrea Massafra<sup>1</sup> and Piergiuseppe Pellè<sup>1,3</sup>

<sup>1</sup> CoSSNA Group, cPDM Lab, Department for Innovation Engineering, University of Salento, Lecce, Italy, <sup>2</sup> EKA srl, Lecce, Italy, <sup>3</sup> Advantech srl, Lecce, Italy

This paper tests a statistical model derived from Condensed Matter Physics for the reconstruction of the structure of a social network. The inverse Potts model, traditionally applied to repeated observations of quantum states in an ensemble of particles, is here applied to observations of the states of the members of an organization and of their (anti)correlations, so that interactions can be inferred as links among the members. With suitable (Bethe) approximations, this inverse problem is shown to be tractable. Within an operational framework, the network-reconstruction method is tested on a small real-world social network, the Italian parliament. In this case study, it is easy to track the states of the parliament members, using (co)sponsorships of law proposals as the initial dataset. In previous studies of similar activity-based networks, the graph structure was inferred directly from activity co-occurrences; here we compare our statistical reconstruction with such standard methods, outlining discrepancies and advantages.

Keywords: social network analysis, Potts model, network reconstruction, community detection, loopy belief propagation, inverse problem, quantum structures

#### Edited by:

Sandro Sozzo, University of Leicester, UK

#### Reviewed by:

Andreas Wichert, Instituto Superior Técnico - Universidade de Lisboa, Portugal

Serena Arima, Sapienza University of Rome, Italy

\*Correspondence: Antonio A. Gentile, antonio.gentile@unisalento.it

#### Specialty section:

This article was submitted to Cognition, a section of the journal Frontiers in Psychology

Received: 29 July 2015 Accepted: 21 October 2015 Published: 09 November 2015

#### Citation:

Bisconti C, Corallo A, Fortunato L, Gentile AA, Massafra A and Pellè P (2015) Reconstruction of a Real World Social Network Using the Potts Model and Loopy Belief Propagation. Front. Psychol. 6:1698. doi: 10.3389/fpsyg.2015.01698

# 1. INTRODUCTION

Interest in policy networks has grown in recent years in social and organizational studies, and the concept has flourished even in the absence of a widely agreed definition. Among the most successful definitions is that of Börzel (1997), based on the concept of horizontal networks linking a variety of actors who share common interests about a policy and cooperate toward its adoption. Such a broad idea has withstood critiques that regard the policy network as a mere metaphor rather than a model capable of explaining the genesis and evolution of policies (Dowding, 1995). A rich literature has adopted both qualitative and quantitative methods to analyse the network paradigm. In most case studies, the relations between the actors involved are depicted as links between the corresponding nodes of a graph, and most discussions are driven by network analysis tools and methods (Besussi, 2006).

Among quantitative methods, for our case study we focused on the collaborative nature of policy networks as revealed by voting behavior, an idea dating back to the "sociostructural and interactional effects" investigated since Lazarsfeld et al. (1968). Here, however, following a recent but well-developed approach, the dataset tracks sponsorships and endorsements of law proposals, rather than the actual votes cast when these proposals are approved or rejected. Social Network Analysis (SNA) performed with co-sponsorships and other similar data in legislative bodies was started by Fowler (2006), who obtained the network structure and proximity measures among the US Senate members. Further analyses were also based on roll calls, but they focused upon the creation and evolution of communities in the network (Porter et al., 2007; Zhang et al., 2008). Retrieving the policy network in a legislative body, and the communities therein, from roll calls and co-sponsorship data instead of final votes has both advantages and disadvantages. A useful discussion of this point can be found in Chiru and Neamu (2012).

The main interest of this paper is the preliminary reconstruction of member-member networks, starting from the member-activity affiliations, i.e., the roll-call votes data. Indeed, the preliminary network-reconstruction step of most SNA approaches rests on a simplistic assumption: two people are related to each other if, and only if, they simultaneously perform a (sub)set of activities, and the strength of their interaction is measured by directly counting (and weighting) these co-occurrences.

Evidently, this standard method for network reconstruction can be improved in its ability to find hidden links, or to remove those that are due to noise and carry no useful information. Various strategies may contribute significantly to this improvement: some originate from SNA itself (e.g., adopting homophily for the study of the network structure), others from other fields (e.g., analysing covariates generated by different observations of the network, making use of random/mixed effects models, or checking covariance data against pseudo-randomization in the samples). The interested reader may find more details in specific papers. For example, Newman and Leicht (2007) reconstruct the clusters inside a large-scale network via mixture models, investigating similar structural connections among the nodes. A mixture model for random graphs is also used in Daudin et al. (2008), this time enriched with a Bayesian approach, with the purpose of inferring unknown classes (Nowicki and Snijders, 2001). Finally, Jedidi et al. (1997) build a general finite mixture structural equation model, capable of dealing with heterogeneities in the network's structural equation models and based upon a set of observed variables (measured with error). In general, these approaches may adopt finite mixture simultaneous equation models, finite mixture confirmatory factor analysis, and finite mixture second-order factor analysis.

Most of the statistical methods outlined above are a way to relax the strong assumption that retains only those interactions due to co-occurrences. Here, instead, we discuss an approach adopting the inverse Potts model, which originated in Condensed Matter Physics. Inverse models aim to infer and model the interactions in an unknown network structure, starting from repeated observations of the nodes' states. As such, these models are well suited to capturing underlying quantum structures in a decision-making process, whenever the final decision state can be deduced from the observed actions (this argument is discussed in Section 2). Moreover, this paper also outlines how a Q-state Potts model enables a much better understanding and mimicking of the statistical features of complex network structures than, for example, a more basic Ising model<sup>1</sup>. The approach is tested on a policy network reconstruction, starting from co-sponsorship data collected from the Italian Senate<sup>2</sup>.

It is worth noting that Ising and Potts (direct) models have already found a large number of applications in the social sciences (Phani et al., 2004; Bordogna and Albano, 2007), including policy networks (Liu et al., 2010), but always applied to networks whose structure had previously been inferred by other strategies. The inverse problem formulation, however, has been confined to the Ising model alone, and most of its interest for non-physical problems has so far involved only biological and neural sciences (Yamanishi et al., 2004; Ricci-Tersenghi, 2012), or image reconstruction tasks (Kiwata, 2012). To the authors' knowledge, this paper is the first to use the inverse Potts problem to reconstruct a network in the social sciences, and more generally the first to apply a moment-based Loopy Belief Propagation (LBP) method<sup>3</sup> to solve the inverse Potts problem on real-world data.

The paper is structured as follows. Section 2 presents how the Q-state Potts model enters network reconstruction, and our approach to solving it. Section 3 then reports a reconstruction of the Italian Senate network, starting from data tracking co-sponsorships of law proposals and inferring interactions among the senators according to their decision patterns. Finally, in the Conclusions we compare the results with traditional SNA methods, i.e., those not employing statistical inference.

# 2. MODEL AND METHODOLOGY

The principle behind the approach described in this section is that (co-participation in) activities of an organization lead(s) to two-body interactions among the organization members, and these interactions can be captured by a network structure. In other words, a complete approach handling relations between different realms (e.g., users and activities) must also be able to examine relations within each realm separately. In typical SNA nomenclature, this means computing a one-mode network (represented by an adjacency matrix) starting from a two-mode network (represented by an affiliation matrix, which reports the participation of different organization members in the activities). Currently, the standard approach to deducing the one-mode matrix is based upon a mere counting and normalization of co-occurrences, according to schemes that include match counting, covariance and correlation measures, cross-products, and the Bonacich and Jaccard indexes (Hanneman and Riddle, 2005). Each of these methods has its own peculiar features, and the Jaccard index in particular is widely adopted (Borgatti, 2012), being well suited for the sparse affiliation matrices that are very common in the real world.
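As an illustration of the standard co-occurrence approach, the following minimal sketch projects a small two-mode affiliation matrix onto a one-mode network using the Jaccard index. The matrix `A` and the helper name `jaccard_projection` are made up for this example; it is not the pipeline used for the Senate data.

```python
import numpy as np

def jaccard_projection(A):
    """One-mode weights W[i, j] = |acts_i & acts_j| / |acts_i | acts_j|."""
    A = np.asarray(A, dtype=float)          # rows: members, columns: activities (0/1)
    inter = A @ A.T                         # number of shared activities
    sizes = A.sum(axis=1)                   # number of activities per member
    union = sizes[:, None] + sizes[None, :] - inter
    W = np.divide(inter, union, out=np.zeros_like(inter), where=union > 0)
    np.fill_diagonal(W, 0.0)                # drop self-links
    return W

# Hypothetical affiliation matrix: 3 members, 4 activities.
A = [[1, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 0, 0, 1]]
W = jaccard_projection(A)                   # W[0, 1] = 1/3, W[0, 2] = 0
```

In this toy case members 0 and 2 never take part in the same activity, so their weight is exactly zero regardless of any other regularity in their behavior.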

However, none of these standard approaches has probabilistic features capable of taking into account noisy signals, anti-correlations and co-occurrences of idle states<sup>4</sup>. This issue highlights the chance to improve the reconstruction of the corresponding one-mode networks by mapping it to an inverse statistical problem for pairwise Markov Random Fields (MRF), as discussed in detail later. Especially for large systems, inverse statistical problems are computationally expensive, and approximate methods must be used. For the inverse Ising problem, known approaches include expansions in correlations and clusters (Sessak and Monasson, 2009; Cocco and Monasson, 2011), methods based upon the Bethe approximation (Ricci-Tersenghi, 2012), and pseudo-likelihood methods (Ekeberg et al., 2013). Here, we refer to the moment representation of the LBP approach (MR-LBP), considered particularly advantageous for solving this task (Horiguchi, 1981).

<sup>1</sup>Adopted elsewhere in the SNA literature for the same task.

<sup>2</sup>Publicly available at http://www.senato.it/leg/16/BGT/Schede/Attsen/Sena.html.

<sup>3</sup>See Section 2.

However, Ising models pose severe limitations for SNA applications<sup>5</sup>, and it is advantageous to switch to the more general Q-state Potts model. A theoretical extension of MR-LBP to this general inverse problem has already been provided by Yasuda et al. (2012), making use of an expansion in Chebyshev polynomials. This approach is briefly outlined in this section, before explaining how to adapt it to the specific needs of our case. First, however, it is important to discuss at an introductory level why inverse Potts (Ising) models are considered suitable for dealing with affiliation matrices, which may well derive from quantum features of decision processes.

The starting point of the inverse Q-state Potts problem is a set of M observations: D = {d<sup>µ</sup> ∈ {0, 1, ..., Q − 1}<sup>n</sup> | µ = 1, 2, ..., M}. The task of the inverse problem is to reconstruct the (Potts) model underlying the observations<sup>6</sup>. In other words, each observation in D can be considered a "snapshot" of the network at a certain moment in time, where the (non-negative integer) state of each node x<sub>i</sub> is observed as d<sub>i</sub><sup>µ</sup> at the µ-th observation. Pairwise states are indicated as x<sub>(i,j)</sub> := {x<sub>i</sub>, x<sub>j</sub>}, meaning that, within the same observation, the states of nodes x<sub>i</sub> and x<sub>j</sub> were found to be as in x<sub>(i,j)</sub>. In this case study, the allowed Q states can be interpreted as the possible decisions, and thus positions (whether active or not), that Senate members can hold about a law proposal.

Now, among the fundamental principles of Quantum Mechanics is the possibility that, if an object can be in either of two generic orthogonal<sup>7</sup> states |φ⟩ and |ψ⟩, then in general it is also allowed to be in any linear superposition of the two: α|φ⟩ + β|ψ⟩. When a measurement of the object's state is performed, however, the state must collapse into either one or the other. This is also at the core of many models exploiting quantumness in the cognitive realm (Haven and Khrennikov, 2013). Mapping this general statement onto our specific case study is equivalent to supposing that policy network agents make decisions according to the same scheme as a quantum state measurement. Intuitively, this means that these agents do not already "embed" a decision about what to do before being asked to support a roll call. Only when they are confronted with the decision do they contextually choose one of the possible alternatives; before that moment, it is possible to suppose they were in a superposition of some (or all) possible decisions, i.e., they were also considering alternatives before finalizing their choice.

More formally, the generic decision state of each senator can be mapped to a superposition state |X<sub>i</sub>⟩ of (some of) the Q states |χ⟩ of the Potts model:

$$|X\_i\rangle = \sum\_{\chi=1}^{Q} \beta\_{i,\chi} |\chi\rangle \tag{1}$$

and each observation of a node's state can be understood as a POVM of |X<sub>i</sub>⟩ in the basis of the states |χ⟩, which are mutually orthogonal. This reflects the plausible assumption that a single member may desire, but not intend, more than one decision at once toward a certain law proposal: for example, they cannot simultaneously support and ignore the same roll call. Non-classical effects of this superposition of states guiding the final decision have already been discussed, e.g., in Aerts et al. (2012), and a more complex quantum model of decision making has been proposed in Bisconti et al. (2015).
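To make Equation (1) concrete, the sketch below treats an observation as a projective measurement in the orthonormal basis {|χ⟩} and uses the Born rule, so that outcome χ occurs with probability |β<sub>i,χ</sub>|². The Born rule is a standard quantum-mechanical assumption that the text does not spell out, and the amplitudes and the function name `outcome_probabilities` are purely illustrative.

```python
import numpy as np

def outcome_probabilities(beta):
    """Born-rule probabilities |beta_chi|^2 for a (normalized) state vector."""
    beta = np.asarray(beta, dtype=complex)
    beta = beta / np.linalg.norm(beta)       # enforce normalization
    return np.abs(beta) ** 2

# Hypothetical amplitudes for one senator with Q = 3 possible decisions.
beta = [0.6, 0.8j, 0.0]
p = outcome_probabilities(beta)              # probabilities 0.36, 0.64, 0.0

# Simulated observations d_i of this node: each draw is one "collapsed" decision.
rng = np.random.default_rng(0)
observed = rng.choice(len(p), size=5, p=p)
```

Only the classical outcomes in `observed` would enter the dataset D; the amplitudes themselves are never observed directly.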

It may be noticed that, when the Potts model was first introduced in this section, no explicit reference to quantum states was made. This is because an effective treatment of the quantum Potts model can be carried out within a classical formalism; a more technical justification follows in the rest of this section. Indeed, a quantum Potts model introduces a Hamiltonian characterized by two-body<sup>8</sup> interactions:

$$\mathcal{H}\_{\text{Potts}} = -\sum\_{\{i,j\}} H\_{(i,j)} \sum\_{\chi} P\_i^{\chi} P\_j^{\chi} \tag{2}$$

where P<sub>i</sub><sup>χ</sup> are projectors onto the |χ⟩ state of the local space of the i-th node. H is instead called the ferromagnetic coupling, and it captures the intensity of the interaction among the nodes.
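For a single pair of nodes, the operator in Equation (2) can be written out explicitly. Because every projector is diagonal in the basis {|χ⟩}, the pair Hamiltonian is diagonal as well, and its diagonal entries can be read as energies of classical configurations. The sketch below checks this numerically; Q = 3 and the coupling value 1.5 are arbitrary choices, and the function name is hypothetical.

```python
import numpy as np

def potts_pair_hamiltonian(Q, H):
    """-H * sum_chi (P^chi tensor P^chi) for two nodes with Q local states each."""
    ham = np.zeros((Q * Q, Q * Q))
    for chi in range(Q):
        P = np.zeros((Q, Q))
        P[chi, chi] = 1.0                    # projector onto |chi>
        ham -= H * np.kron(P, P)
    return ham

ham = potts_pair_hamiltonian(Q=3, H=1.5)     # arbitrary illustrative coupling
# energies[a, b] is the energy when node i is in state a and node j in state b:
# -1.5 when a == b, and 0 otherwise.
energies = np.diag(ham).reshape(3, 3)
```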

It is known that any classical (finite-dimensional) spin model on a lattice can be associated with a quantum model defined on the same lattice (Somma and Ortiz, 2010), by mapping every classical state x<sub>i</sub> to a measurement outcome of the state |X<sub>i</sub>⟩ and vice versa. Classically, the spin model has the energy functional:

$$\mathcal{E}\_{\text{Potts}} = -\sum\_{\{i,j\}} H\_{(i,j)} \mathbf{x}\_{(i,j)} \tag{3}$$

Therefore, the energy functional maps into the eigenvalues of the Hamiltonian operator defined in Equation (2), and when performing statistical inference from the observations of the nodes' states in the network, this correspondence allows us to refer directly to the values of the classical variable x<sub>i</sub>. In the following, therefore, the baseline assumption will be that a model underlying a statistical treatment of the network reconstruction problem, inspired by a quantum-mechanical counterpart, can be far more efficient in revealing hidden links and patterns from observations, inferring even those interactions that standard methods are not capable of detecting.

<sup>4</sup>To be understood as those states that label nodes observed to be inactive while certain activities are being performed by other nodes.

<sup>5</sup>E.g., the maximum number of allowed states is intrinsically limited to 2, while a generic Q-state Potts model allows Q ∈ ℕ, Q ≥ 2. Indeed, Section 3 will show how an inverse Ising problem fails for our case study.

<sup>6</sup>To be specific, the observations are supposed to be sampled from a certain MRF.

<sup>7</sup>I.e., they cannot be observed simultaneously for the same object.

<sup>8</sup>Here and in the following, single-node terms are omitted for simplicity.

# 2.1. The Inverse Potts Model

It has been seen how a statistical approach to the Potts problem, dealing with classical variables x<sub>i</sub>, still implicitly reflects an intrinsically quantum process of decision making, because the likelihood of observing a certain value d<sub>i</sub> for x<sub>i</sub> can be interpreted in terms of projecting the generic quantum state |X<sub>i</sub>⟩ onto the corresponding basis state |χ⟩, where each of the orthogonal basis states identifies a single possible decision. This section is devoted to a detailed explanation of the algorithm that infers relationships among nodes from the set of observations performed; non-technical readers may skip it and move to the considerations in Section 2.2.

It can be observed how the probability distribution—for observations of the node states x—is clearly connected with the energy functional in Equation (3):

$$\mathcal{P}(\mathbf{x}) \propto \exp\left[-\sum\_{(i,j) \in E}^{n} H\_{(i,j)}(\mathbf{x}\_{(i,j)})\right] \tag{4}$$

and this closely resembles the probability distribution in the general pairwise MRF formalism. Here E defines the set of connections expected in the model, and therefore the condition (i, j) ∈ E set in the summation can be understood as an explicit network constraint, whereas in Equation (3) we had the generic {i, j}. Now, in the inverse problem, the H<sub>(i,j)</sub> that set up the network model are unknown<sup>9</sup> and must be inferred from the probabilities in Equation (4). In terms of the orthogonal set of Chebyshev polynomials Φ<sub>k</sub>(x<sub>i</sub>) and appropriate constants J<sub>(i,j)</sub><sup>(k,l)</sup>, it is possible to write the two-body potential function H as:

$$\begin{aligned} H\_{\left(i,j\right)}\left(\mathbf{x}\_{\left(i,j\right)}\right) &= \frac{1}{\sqrt{Q}} \sum\_{k=1}^{Q-1} \left[ J\_{\left(i,j\right)}^{\left(k,0\right)} \Phi\_k(\mathbf{x}\_i) + J\_{\left(i,j\right)}^{\left(0,k\right)} \Phi\_k(\mathbf{x}\_j) \right] + \\ &+ \sum\_{k=0}^{Q-1} \sum\_{l=0}^{Q-1} J\_{\left(i,j\right)}^{\left(k,l\right)} \Phi\_k(\mathbf{x}\_i) \Phi\_l(\mathbf{x}\_j) + \text{constant} \end{aligned} \tag{5}$$

where constant terms in the expansion (e.g., those involving Φ<sub>0</sub>(x<sub>i</sub>)) have all been included in the last constant term. Starting from Equation (5), Yasuda et al. (2012) applied a moment representation of the LBP scheme and message-passing rules to the MRF described so far. Within the Bethe approximation, it was shown that the constants J can be found approximately from the marginal probabilities of the observations:

$$J\_{(i,j)}^{(k,l)} = -\sum\_{\mathbf{x}\_i=0}^{Q-1} \sum\_{\mathbf{x}\_j=0}^{Q-1} \Phi\_k(\mathbf{x}\_i) \Phi\_l(\mathbf{x}\_j) \ln \mathcal{P}\_{(i,j)}(\mathbf{x}\_{(i,j)}|\mathcal{D}) \tag{6}$$

thus minimizing the (Bethe) approximate entropy of the model: the P probability values are used to reconstruct the parameters of the Potts model.
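A minimal sketch of Equation (6) for one pair of nodes is given below. It rests on two assumptions that are not taken from the source: the polynomials Φ<sub>k</sub> are built as the orthonormal polynomial basis on the points 0, ..., Q − 1 (obtained here by a QR factorization of the Vandermonde matrix, which yields Φ<sub>0</sub> = 1/√Q), and a small pseudocount `eps` is added to the empirical counts so that the logarithm is always defined. All function names are hypothetical.

```python
import numpy as np

def orthonormal_basis(Q):
    """Phi[x, k] = Phi_k(x) for x = 0..Q-1; columns are orthonormal polynomials."""
    V = np.vander(np.arange(Q, dtype=float), increasing=True)   # monomials x^k
    Phi, R = np.linalg.qr(V)
    return Phi * np.sign(np.diag(R))         # positive leading coefficients

def pair_marginal(D, i, j, Q, eps=1e-3):
    """Empirical P_(i,j)(x_i, x_j | D), smoothed by a pseudocount (assumption)."""
    counts = np.full((Q, Q), eps)
    np.add.at(counts, (D[:, i], D[:, j]), 1.0)
    return counts / counts.sum()

def couplings(D, i, j, Q):
    """J[k, l] = -sum over x_i, x_j of Phi_k(x_i) Phi_l(x_j) ln P_(i,j)."""
    Phi = orthonormal_basis(Q)
    log_p = np.log(pair_marginal(D, i, j, Q))
    return -Phi.T @ log_p @ Phi

# Hypothetical observations: two nodes that always share the same state (Q = 3).
D = np.array([[0, 0], [1, 1], [2, 2]] * 10)
J = couplings(D, i=0, j=1, Q=3)
```

With the sign convention of Equation (4), perfectly correlated nodes give negative diagonal entries J[k, k] for k ≥ 1, while statistically independent, uniformly distributed nodes give J[k, l] ≈ 0 for all k, l ≥ 1.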

The probabilities P of observing in D the values x<sub>i</sub> and x<sub>(i,j)</sub>, respectively, can also be expressed as sums of Chebyshev polynomials:

$$\mathcal{P}\_i(\mathbf{x}\_i|\mathcal{D}) = \frac{1}{Q} + \sum\_{k=1}^{Q-1} \langle \Phi\_k(\mathbf{x}\_i) \rangle\_{\mathcal{D}} \Phi\_k(\mathbf{x}\_i) \tag{7}$$

$$\begin{split} \mathcal{P}\_{(i,j)}(\mathbf{x}\_{(i,j)}|\mathcal{D}) &= \frac{1}{Q^2} + \frac{1}{Q} \sum\_{k=1}^{Q-1} \left[ \langle \Phi\_k(\mathbf{x}\_i) \rangle\_{\mathcal{D}} \Phi\_k(\mathbf{x}\_i) + \langle \Phi\_k(\mathbf{x}\_j) \rangle\_{\mathcal{D}} \Phi\_k(\mathbf{x}\_j) \right] \\ &+ \sum\_{k=1}^{Q-1} \sum\_{l=1}^{Q-1} \langle \Phi\_k(\mathbf{x}\_i) \Phi\_l(\mathbf{x}\_j) \rangle\_{\mathcal{D}} \Phi\_k(\mathbf{x}\_i) \Phi\_l(\mathbf{x}\_j) \end{split} \tag{8}$$

Here, the interesting advantage of using the LBP moment representation is that all the quantities ⟨...⟩<sub>D</sub> can be derived by averaging over an appropriate number M of observations D of the network.
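The moment expansions in Equations (7) and (8) can be checked on toy data. The sketch below assumes an orthonormal polynomial basis on the points 0, ..., Q − 1 with Φ<sub>0</sub> = 1/√Q (built by QR factorization, which is a choice made for this example); the data and function names are hypothetical. The pairwise function uses the compact double sum over k, l = 0, ..., Q − 1, whose k = 0 or l = 0 terms reproduce the first two groups of terms in Equation (8).

```python
import numpy as np

def orthonormal_basis(Q):
    """Phi[x, k] = Phi_k(x) for x = 0..Q-1; columns are orthonormal polynomials."""
    V = np.vander(np.arange(Q, dtype=float), increasing=True)
    Phi, R = np.linalg.qr(V)
    return Phi * np.sign(np.diag(R))

def node_marginal_from_moments(D, i, Q):
    """P_i(x | D) = 1/Q + sum_{k>=1} <Phi_k(x_i)>_D Phi_k(x)."""
    Phi = orthonormal_basis(Q)
    moments = Phi[D[:, i], 1:].mean(axis=0)        # <Phi_k(x_i)>_D, k = 1..Q-1
    return 1.0 / Q + Phi[:, 1:] @ moments

def pair_marginal_from_moments(D, i, j, Q):
    """P_(i,j)(x_i, x_j | D) from the moments <Phi_k(x_i) Phi_l(x_j)>_D."""
    Phi = orthonormal_basis(Q)
    moments = Phi[D[:, i]].T @ Phi[D[:, j]] / len(D)
    return Phi @ moments @ Phi.T

# Hypothetical data: M = 6 observations of n = 2 nodes with Q = 3 states.
D = np.array([[0, 1], [0, 1], [2, 1], [1, 0], [0, 2], [2, 2]])
P0 = node_marginal_from_moments(D, i=0, Q=3)       # empirical frequencies 1/2, 1/6, 1/3
P01 = pair_marginal_from_moments(D, i=0, j=1, Q=3)
```

Because the basis is complete, the reconstructed marginals coincide with the empirical frequencies in D; the moments are just a different parametrization of the same information.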

As can be intuitively predicted, and as the numerical experiments in Yasuda et al. (2012) confirmed, the number of observations used is correlated with the quality of the final network reconstruction. Note that in the original paper numerical experiments were limited to the case in which the network structure underlying the inverse problem was a non-periodic lattice (i.e., |i − j| ∉ {θ[min(i, j) mod p], p} ⇒ (i, j) ∉ E ⇔ J<sub>(i,j)</sub> = 0, where θ is the step function and p the lattice period).

Since the main interest of this paper is the reconstruction of the network, i.e., of the pairwise interactions among the nodes, the key parameter of the Potts model is H<sub>(i,j)</sub>, which measures the intensity of the connection between users i and j in the network. Equation (5) shows that H<sub>(i,j)</sub> is directly related to the set of constants J<sub>(i,j)</sub><sup>(k,l)</sup>.

An interesting feature, which contributes to the sensitivity of this approach compared to the standard ones listed above, is that Φ<sub>k</sub>(x<sub>i</sub>)Φ<sub>l</sub>(x<sub>j</sub>), used in Equation (6) for calculating J, is in general different from 0 even when k ≠ l. Therefore, interactions are also inferred when simultaneous participation in the same activity plays no role. The interpretation is that, even if one expects no interaction to occur among users because they tended to perform different activities<sup>10</sup> in the observation snapshots, this assumption is actually tested by the reconstruction method against the observations, and indirect ("off-diagonal") correlations may be detected.

<sup>9</sup>In particular, it is unknown which nodes in the interaction model E are truly linked to each other, i.e., have a non-negligible interaction: (i, j) ∈ E ⟺ H<sub>(i,j)</sub> ≇ 0.

<sup>10</sup>I.e., assuming that J<sub>(i,j)</sub><sup>(k,l)</sup> ∝ δ(k, l), where δ(k, l) is the Kronecker delta. Indeed, also in the pseudo-observational models defined below, the parameters α<sub>(i,j)</sub> act as an initial guess for the interactions, based upon the assumption that interactions should be inferred only when simultaneous participation in activities occurs, but this is tested against the observations.

As explained in more detail in Section 2.2, in most cases data collected from social networks require caution before being used as "observational data" in a Q-state inverse Potts problem. Therefore, it is interesting to mention the possibility of simulating observations whenever the probability distributions are known to depend upon some parameter(s). For example, observation samples may be reconstructed using a single parameter α in a generative model: in this case, the probabilities of observing a certain collective state x are computed according to α, and it is possible to write down averaged functions (such as the averages required by Equations 7 and 8):

$$\langle f(\mathbf{x})\rangle\_{\mathcal{D}} = \sum\_{\mathbf{x} \in \mathcal{D}} f(\mathbf{x}) P\_{\text{GenMod}}(\mathbf{x}|\alpha) \tag{9}$$

Clearly, because of the assumptions underlying the LBP inverse problem approach, the most general choice for the generative model (GenMod) must be a Q-state Potts model.
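As a toy instance of Equation (9), the sketch below uses a two-node Potts-type generative model in which α rewards agreement between the two nodes, P(x | α) ∝ exp(α δ(x<sub>i</sub>, x<sub>j</sub>)). This specific form, the function names and the parameter values are assumptions made for illustration; the source only states that the generative model depends on a parameter α.

```python
import itertools
import numpy as np

def genmod_probabilities(Q, alpha):
    """Hypothetical two-node model: P(x | alpha) proportional to exp(alpha * [x_i == x_j])."""
    states = list(itertools.product(range(Q), repeat=2))
    weights = np.array([np.exp(alpha * (a == b)) for a, b in states])
    return states, weights / weights.sum()

def model_average(f, Q, alpha):
    """<f(x)> = sum over x of f(x) * P_GenMod(x | alpha)."""
    states, probs = genmod_probabilities(Q, alpha)
    return sum(f(x) * p for x, p in zip(states, probs))

def same_state(x):
    return float(x[0] == x[1])

agree_0 = model_average(same_state, Q=3, alpha=0.0)   # 1/3: no interaction
agree_2 = model_average(same_state, Q=3, alpha=2.0)   # larger: alpha favors agreement
```

The same routine can average any function of the collective state, for instance the products Φ<sub>k</sub>(x<sub>i</sub>)Φ<sub>l</sub>(x<sub>j</sub>) needed in Equations (7) and (8).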

# 2.2. Data and Observations

It remains to explain how to employ an inverse Q-state Potts model for the reconstruction of the Italian Senate network, starting from data tracking law co-sponsorships by senators. In the Italian legislative system, a law undergoes a few preliminary steps before being discussed in the Senate. First, one or more<sup>11</sup> senators are responsible for drafting, signing and proposing it; these are very similar to the sponsors in the US legislative system. After that, other senators who are aware that this specific law is being proposed for discussion may co-sign it as an act of endorsement; they act like the US co-sponsors. According to the Senate's schedule, the law is then discussed in detail and (possibly) subjected to a final vote. Therefore, collecting (co)sponsorship data provides considerable insight into patterns of collaboration and support among the senators, and can be considered equivalent to the studies of similar legislative bodies in other countries cited in the Introduction.

Our case study focuses on the first part<sup>12</sup> of the XVI Italian legislature, using co-sponsorship data for Senate roll calls in the same period. We chose this period for two reasons. First, one of our aims is to find communities (and their members) in the network by automatic community detection algorithms, and to compare the resulting groups with the "official memberships in political parties" of the senators. For this purpose, the beginning of a legislature is ideal, because senators have just been elected as members of a certain political party<sup>13</sup>. This makes it easy to refer to these parties as their true memberships, whereas at later points in time several senators may have moved to different political parties (e.g., because some parties have been dismantled), and tracking these changes carefully becomes extremely difficult. Second, the dataset of this case study is the most recent available (and thus possibly more interesting from a policy network point of view) that still refers to a past legislature. This makes the available data "crystallized," with less risk of subsequent updates.

The use of minimization procedures in Potts-like models for legislative bodies is not entirely new in the literature: for example, in Liu et al. (2010), an Ising model was used to model the US Senate network starting from bill co-sponsorships. Compared with this previous study, however, there are a few important differences here.


The connectedness of the US legislative network, given by the ratio cosponsorships/senators, and the reduced number of communities therein, make it suitable for treatment with a 2-state Ising model. Standard methods may also reproduce the structure of that network acceptably, given that its high density may well reflect<sup>14</sup> the absence of hidden or evolving links. These considerations, however, suggest that the same approach may provide poor results for the Italian situation.

<sup>11</sup>Usually one or just a few. In some special cases, the law follows a peculiar path in which no initial senator is listed as sponsoring the proposal.

<sup>12</sup>Corresponding to the first Cabinet.

<sup>13</sup>Indeed, when in the following there are references to the "true" memberships of the senators, these have been deduced from the participation of the senators in political groups: a specificity of the Italian Parliament that enables the tracking of a senator's loyalty to a party or group of parties. See http://www.senato.it/leg/16/BGT/Schede/GruppiStorici/Grp.html. When a senator belonged to multiple groups during the period under observation, they were assigned to the group in which they spent most of the observation time.

<sup>14</sup>Actually, a mirror-image interpretation is that the mechanism used to track interactions was inefficient, and therefore excess links should be excluded by an inference scheme.

Here we intend to use a Q-state Potts model directly for the network reconstruction, as outlined in Section **??**. A naive application of the model may involve only two possible states for the nodes (senators). This would be equivalent to an Ising model. To better explain what follows, a digression on the Italian case is worthwhile. Each activity, i.e., the proposal of each law in the Senate, involved on average about 8 people (N<sub>A</sub>). That is, for each of the M bills, the active community was on average 2.5% of the whole Senate. Now, using a 2-state Potts model as above implicitly generates correlations also among senators often detected in passive states: indeed, co-occurrences of inactive states would in principle be assigned the same importance as co-occurrences of active states.

Is this meaningful? Consider the underlying phenomenon: co-endorsing a law proposal presumes a much stronger link between two senators (as it implies sharing the same political point of view, as well as some sort of acquaintance with the senator who conceived the law itself) than simultaneously abstaining from the endorsement (which may be due to a lack of opportunity to discuss and share the law proposal, or to an early abundance of co-sponsors that makes it pointless for other senators to join the co-sponsoring group, etc.). Simple abstention from action is an ambiguous behavior, as it does not necessarily imply direct opposition or lack of interest. Therefore, it is intuitively necessary to find a mechanism that keeps these inactive correlations<sup>15</sup> less significant than those due to the simultaneous observation of the same active state in two nodes<sup>16</sup>.

A first approach may be to still use Q = 2, while explicitly ignoring inactive correlations when computing the interaction parameters. This can be done by replacing:

$$x\_{i(j)} \in \{0, \ldots, Q-1\} \to x\_{i(j)} \in \{1, \ldots, Q-1\} \tag{10}$$

in the sums of Equation (6), where we supposed that x<sub>i</sub> = 0 corresponds to the only inactive state. However, this choice misses the chance of capturing hidden connections that arise when inactive states occur simultaneously for some specific reason, in particular hostility against the law proposal under discussion.

An effective but computationally expensive solution is to pick a high enough Q-value for the model, assigning different x<sub>i</sub> ≠ 1 to members in inactive states. In particular, to avoid a priori considerations about the level of interaction of people belonging to the same faction, the random probability of assigning two nodes to the same inactive state (p<sub>ina</sub>) shall not be bigger than the average empirical probability of two nodes being assigned to the same active state (p<sub>act</sub>). Now: p<sub>ina</sub> = (N − N<sub>A</sub>)/[(Q + 1)N] ≤ p<sub>act</sub> ≅ 0.024, which in turn gives Q ≥ 40. Because of the computational complexity of the procedure (O(Q<sup>2</sup>)), for demonstrative purposes it will be shown here how the performance of the method changes when moving from Q = 2 up to Q = 10. That is, we start by assigning to inactive correlations the same importance as active correlations, then we progressively reduce the importance of the former relative to the latter. The case with Q = 5 has a specific underlying reason: community detection algorithms revealed 5 clusters in the Senate network when run against the network reconstructed with the standard Jaccard approach (see Section 3). The intent is therefore to use this information as an initial guess for the LBP approach, introducing a number of possible states corresponding to community membership (under the reasonable assumption that such a membership strongly influences the co-sponsorship decisions). However, it should be emphasized that partitioning the network into 5 communities may be non-optimal. Indeed, during the period analyzed the Parliament involved 4 major parties, plus senators who were independent or belonged to small<sup>17</sup> parties, but the 4 major parties were actually joined in 2 different alliances, thus reducing the number of effective communities to only 3. This is an important consideration and will be discussed again in the following.
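The bound Q ≥ 40 quoted above can be checked numerically. The following is only a minimal sketch: the value N = 320 is an assumption, back-computed from the statement that about 8 active senators correspond to 2.5% of the Senate, and the function names are hypothetical.

```python
# Sketch: smallest Q for which the random chance of two nodes sharing an
# inactive state does not exceed the empirical chance of sharing the active one.
# ASSUMPTION: N = 320 is back-computed from "about 8 people = 2.5% of the Senate".

def p_inactive(q: int, n: int, n_active: int) -> float:
    """p_ina = (N - N_A) / ((Q + 1) * N), as quoted in the text."""
    return (n - n_active) / ((q + 1) * n)


def minimal_q(n: int, n_active: int, p_active: float, q_max: int = 1000) -> int:
    """Return the smallest Q >= 2 with p_ina(Q) <= p_act."""
    for q in range(2, q_max + 1):
        if p_inactive(q, n, n_active) <= p_active:
            return q
    raise ValueError("no Q up to q_max satisfies the bound")


N, N_A, P_ACT = 320, 8, 0.024
Q_MIN = minimal_q(N, N_A, P_ACT)
print(Q_MIN, round(p_inactive(Q_MIN, N, N_A), 4))
```

Under these assumed numbers the search stops at Q = 40, in agreement with the bound stated in the text; a different Senate size would shift the result only marginally, since p<sub>ina</sub> depends on N mainly through the ratio (N − N<sub>A</sub>)/N.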

One further feature of the procedure remains to be discussed: cleaning and, where needed, generating the observation samples. This feature can be tuned as well, in order to introduce a priori knowledge about the network structure. In general, there are at least three different strategies for properly using the collected data:

• a generative strategy;

• a semi-observational strategy;

• a pure observational strategy.

It is worth noting that the first strategy replaces real data with samples obtained according to some reasonable<sup>18</sup> assumptions about the strength of the relationship among nodes, summarized as the elements α<sub>ij</sub><sup>19</sup> of a preliminary adjacency matrix. For example, in Liu et al. (2010) a count of co-occurrences of active states was used:

$$\alpha\_{ij} = \sum\_{\mu} \frac{\delta(\mathbf{x}\_i^{\mu}, \mathbf{x}\_j^{\mu}, 1)}{n\_{\mu}} \tag{11}$$

weighted with the number of cosponsors n<sub>µ</sub>, for each bill µ.

Even if a standard method is used for the preliminary calculation of the interaction among the nodes, the LBP procedure still makes it possible to infer hidden connections that are not evident from the first step. The generative approach is particularly useful whenever only a few or only aggregate<sup>20</sup> data are available for the analysis. However, this strategy still introduces a manipulation of the original data, in order to make the network reconstruction possible or less noisy<sup>21</sup>.

<sup>15</sup>As we will call them in the following for simplicity.

<sup>16</sup>In the following for simplicity: active correlations.

<sup>17</sup>Here by small, we mean parties with fewer than 10 senators, because this is the minimum number required to constitute an official group in the Italian Senate. Smaller parties are obliged to join the so-called mixed group.

<sup>18</sup>E.g., frequency considerations, such as those used in the following Equations (11) and (13) for the semi-observational model.

<sup>19</sup>Used in a second step to calculate the probability of observations in the sample, see Equation (12).

<sup>20</sup>I.e., there is no temporal allocation of the single observations, but only a global count.

The semi-observational strategy can be seen as a compromise between the adoption of a generative model and the direct usage of data with no further adjustments. In this case, observation data are used directly for each step of the network reconstruction, except the calculation of averages. More technically, in this case ⟨Φ<sub>k</sub>(x<sub>i</sub>)⟩<sub>D</sub> and ⟨Φ<sub>k</sub>(x<sub>i</sub>)Φ<sub>l</sub>(x<sub>j</sub>)⟩<sub>D</sub> are not simple averages over the sample set, but are adjusted according to Equation (9). As an example, the probabilities P<sub>GenMod</sub> of occurrence of the state x may be chosen as:

$$P\_{Potts}(\mathbf{x}|\alpha) \propto \exp\left[\frac{1}{2} \sum\_{i=1}^{N} \sum\_{j=1}^{N} \alpha\_{ij} \delta(\mathbf{x}\_i, \mathbf{x}\_j)\right] \tag{12}$$

Generally speaking, the introduction of a subtending model as in Equation (12) favors a reconstruction similar to the output of the standard preliminary reconstruction method, because the probabilities of observing configurations (not) matching the standard reconstructions are increased (decreased), compared to the probabilities calculated directly from the observations.
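To make the role of Equation (12) concrete, the sketch below evaluates the Potts weight by brute-force enumeration on a toy system. It is an illustration only: the 3-node coupling matrix is hypothetical, self-couplings are set to zero by assumption, and exhaustive normalisation is feasible only for very small N.

```python
import itertools
import math


def potts_weight(state, alpha):
    """Unnormalised weight exp[(1/2) * sum_ij alpha_ij * delta(x_i, x_j)]."""
    n = len(state)
    exponent = 0.0
    for i in range(n):
        for j in range(n):
            if state[i] == state[j]:
                exponent += alpha[i][j]
    return math.exp(0.5 * exponent)


def potts_distribution(alpha, q):
    """Brute-force normalisation over all q**N states (tiny N only)."""
    n = len(alpha)
    states = list(itertools.product(range(q), repeat=n))
    weights = [potts_weight(s, alpha) for s in states]
    z = sum(weights)
    return {s: w / z for s, w in zip(states, weights)}


# Hypothetical 3-node example: only nodes 0 and 1 are coupled.
ALPHA = [[0.0, 1.0, 0.0],
         [1.0, 0.0, 0.0],
         [0.0, 0.0, 0.0]]
DIST = potts_distribution(ALPHA, q=2)
P_AGREE = sum(p for s, p in DIST.items() if s[0] == s[1])
print(round(P_AGREE, 3))
```

In this toy case the coupled pair is found in the same state with probability e/(1 + e), i.e., more often than the 1/2 expected without coupling, which is the sense in which a larger α<sub>ij</sub> increases the probability of configurations matching the preliminary reconstruction.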

The α parameters were evaluated here in terms of frequencies of matching activities, within the set of observations, according to different approaches. One possibility is a pure frequentist probability, for the two nodes i and j to be observed in the same active state:

$$\alpha\_{ij} = \frac{1}{M} \sum\_{\mu} \frac{\delta(d\_i^{\mu}, d\_j^{\mu}, 1)}{n\_{\mu}} \tag{13}$$

with the generalized Kronecker delta δ(i, j, k) = 1 ⟺ i = j = k and null otherwise, and the same weighting as in Equation (11). This strategy penalizes the interactions of those nodes having a poor participation rate.

A second derivation for α, instead, was adjusted against the number of times the two users were active:

$$\alpha\_{ij} = \frac{1}{\sum\_{\mu} \left( \delta(d\_i^{\mu}, 1) + \delta(d\_j^{\mu}, 1) \right)} \sum\_{\mu} \frac{\delta(d\_i^{\mu}, d\_j^{\mu}, 1)}{n\_{\mu}} \tag{14}$$

thus reducing the bias of the previous formula toward active nodes.
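The two estimates in Equations (13) and (14) can be compared on a small example. The sketch below assumes a binary activity table `d[mu][i]` (a hypothetical layout, with one row per bill and one column per senator) and leaves self-couplings at zero, which is a choice of this sketch rather than something stated in the text.

```python
from fractions import Fraction


def alpha_matrices(d):
    """d[mu][i] is 1 if node i is active (cosponsor) on bill mu, else 0.

    Returns (alpha13, alpha14), following Equations (13) and (14).
    """
    m, n = len(d), len(d[0])
    n_mu = [sum(row) for row in d]  # cosponsors per bill
    activity = [sum(d[mu][i] for mu in range(m)) for i in range(n)]
    alpha13 = [[Fraction(0)] * n for _ in range(n)]
    alpha14 = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # self-couplings left at zero (assumption)
            # Weighted count of bills on which both i and j are active.
            co = sum((Fraction(1, n_mu[mu]) for mu in range(m)
                      if d[mu][i] == 1 and d[mu][j] == 1), Fraction(0))
            alpha13[i][j] = co / m
            total = activity[i] + activity[j]
            alpha14[i][j] = co / total if total else Fraction(0)
    return alpha13, alpha14


# Toy data: 3 bills (rows) x 4 senators (columns); senator 3 never cosponsors.
D = [[1, 1, 0, 0],
     [1, 0, 1, 0],
     [1, 1, 1, 0]]
A13, A14 = alpha_matrices(D)
print(A13[0][1], A14[0][1])
```

Exact fractions are used so that the two normalisations can be compared without rounding: for the pair (0, 1) the weighted co-occurrence count is 5/6, which Equation (13) divides by M = 3 and Equation (14) divides by the total number of activations of the two nodes.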

Finally, when a pure observational method is used, the α parameter should play no role<sup>22</sup>, because no generative model needs to be provided and all the averaged quantities are computed directly from the original set of data. Unfortunately, a pure observational method with the considered dataset (characterized by Q = 2, because of lacking information) intuitively requires omitting the contribution of inactive correlations, as in Equation (10), in order not to overestimate their contribution.

Whatever the strategy chosen to derive observation samples from the original data, the interactions H(i,j) will be calculated by inserting into Equation (5) the pairwise interactions J from Equation (6).

# 3. RESULTS AND DISCUSSION

The importance of the LBP inference method for discovering non-evident links and connections among the network members, as compared to traditional methods not employing statistical inference, was anticipated above. This section illustrates the first numerical application of an LBP procedure to reconstruct a generic graph Potts model. Previous simulations (Yasuda et al., 2012), indeed, dealt only with lattice-like Potts models: the sums in Equation (12) had a constant α instead of α<sub>ij</sub>, and the allowed indexes were only those compatible with the lattice structure, (i, j) ∈ E.

The first and most important results are shown in **Table 1** and **Figure 1**. The table reports the main network parameters for the various methods listed in Section 2. As a comparison, the network was also reconstructed via the Jaccard index, a standard method particularly suited to sparse networks (Borgatti, 2009), such as the one analyzed in this paper (the calculated density is indeed smaller than 0.01). For LBP-reconstructed networks, we introduced an additional parameter, the threshold (t<sub>m</sub>). In fact, after normalizing the intensity of connections (i.e., 0 ≤ H(i,j) ≤ 1), the density of these networks was close to 1 in most approaches. This is an effect of the sensitivity of the LBP method, which is prone to reproducing in the final adjacency matrix also links due to noise. In order to exclude the weakest links, we set a threshold value t<sub>m</sub> = 0.5 and compared the residual links with the standard network. In all cases, the LBP-reconstructed social network still displays many more connections than the Jaccard one. These hidden connections would be hard to identify without a statistical inference method, and this is a novelty of the approach. In the pure observational case, because off-diagonal interactions were neglected (i.e., the case in Equation 10), noisy connections also tend to occur in a small range, thus still producing a very high density at t<sub>m</sub> = 0.5. Because of the increased difficulty of properly filtering this noise, the pure observational model will be omitted from the following analyses. The average strength of all the detected links, "Avg. H(i,j)," also has an interesting behavior: it is strongly affected by the initial guess for the network structure that the LBP method tries to reproduce, and considerably less by the value chosen for Q.
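The thresholding step described above can be sketched as follows. This is a minimal illustration under stated assumptions: the min-max rescaling used to obtain 0 ≤ H(i,j) ≤ 1, the choice of keeping links with strength at least t<sub>m</sub>, and the 4-node matrix are all hypothetical, since the text does not specify them.

```python
def normalise(h):
    """Min-max rescale off-diagonal entries to [0, 1] (ASSUMED normalisation)."""
    n = len(h)
    vals = [h[i][j] for i in range(n) for j in range(n) if i != j]
    lo, hi = min(vals), max(vals)
    span = (hi - lo) or 1.0
    return [[(h[i][j] - lo) / span if i != j else 0.0 for j in range(n)]
            for i in range(n)]


def links_above(h_norm, t_m):
    """Undirected links (i < j) whose normalised strength is at least t_m."""
    n = len(h_norm)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if h_norm[i][j] >= t_m]


def density(n_links, n_nodes):
    """Fraction of the n(n-1)/2 possible undirected links that are present."""
    return n_links / (n_nodes * (n_nodes - 1) / 2)


# Hypothetical symmetric interaction matrix for 4 nodes.
H = [[0.0, 0.9, 0.2, 0.1],
     [0.9, 0.0, 0.6, 0.3],
     [0.2, 0.6, 0.0, 0.1],
     [0.1, 0.3, 0.1, 0.0]]
H_NORM = normalise(H)
for t_m in (0.0, 0.5, 0.9):
    kept = links_above(H_NORM, t_m)
    print(t_m, len(kept), round(density(len(kept), len(H)), 2))
```

With t<sub>m</sub> = 0 every pair is linked and the density is 1, mirroring the saturation reported for the LBP-reconstructed networks, whereas raising the threshold leaves only the strongest interactions.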

<sup>21</sup>Indeed, once the preliminary model has been decided, samples can be drawn from it in abundance, whereas a real sampling of a social network is clearly bound to pragmatic constraints.

<sup>22</sup>For the case Q = 2, actually, it is advised to adopt a fictitious α = constant ≪ 1, because the critical value for the 2-state Potts model is α = 0.88, and the calculation may therefore become unstable if averages are computed directly.

<sup>23</sup>Clearly, the density and thus the number of links detected in the Jaccard network is independent of any threshold chosen.

TABLE 1 | Collection of fundamental model and network parameters, for a set of network reconstruction methods, and basic metrics resulting from the analysis.

The analysis with Q = 10 required artificially generating more samples than the direct observations from data for the results to be reliable; therefore it is listed only within the Generative case. \*For numerical convergence reasons, in the Pure observational case, α<sub>ij</sub> = constant = 0.01 was set.

**Figure 1**, instead, presents a more systematic analysis of the relation between the number of links in the network and the threshold parameter<sup>23</sup>. A few interesting features are evident. First, all statistical methods tend to saturate the network at low values of t<sub>m</sub>. Moreover, a smoothing effect in the dependency of the number of links on the threshold value is observed, both when increasing Q and when decreasing the average α<sub>ij</sub>. The stronger the smoothing, the closer the data are to the expected exponential decay in the number of detected links<sup>24</sup>. Several possible explanations for this finding may be proposed. First, preliminary community detection analyses with networks reconstructed via standard methods identified 5 groups<sup>25</sup> in the Italian Senate network. This suggests that observing only 2 states in the roll calls tends to produce distortions and artifacts. In fact, results are also improved by randomly introducing states other than the observed (non)sponsoring. Smaller values of α, instead, allow the method to compute the J<sup>(k,l)</sup><sub>(i,j)</sub> away from the critical value ln(1 + √Q) of the Potts model, thus improving the stability of the results.
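The critical value ln(1 + √Q) mentioned in the text is easy to tabulate for the values of Q used in the analysis. The snippet below is only a numerical illustration; the function name is hypothetical.

```python
import math


def potts_critical_coupling(q: int) -> float:
    """Critical value ln(1 + sqrt(Q)) quoted in the text for the Q-state Potts model."""
    return math.log(1.0 + math.sqrt(q))


for q in (2, 5, 10):
    print(q, round(potts_critical_coupling(q), 3))
```

For Q = 2 this evaluates to approximately 0.88, the value cited for the 2-state Potts model, and it grows slowly with Q, so that small couplings such as α = 0.01 stay well below the critical region for every Q considered here.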

A different analysis focused on the capability of the method not only to reconstruct pairwise interactions, but also to better identify the clusters inside the network, to be interpreted as communities of members. **Figure 2** investigates how the number of such communities depends on t<sub>m</sub>: the plot shows that when a small t<sub>m</sub> is taken into account<sup>26</sup>, LBP-reconstructed networks have a cluster structure involving 2–3 groups. A plausible interpretation is that, if hidden links are considered, slight differences in the policy approach of the Senate members are wiped out in the analysis, and the CNM algorithm tends to detect only the fundamental communities: the ones related to the party(ies) participating in the Cabinet, and the group of parties opposing them (plus possibly a third group, which may be considered as composed of neutral senators). As the threshold is increased and the graph becomes more disconnected, clustering features are also emphasized, and the number of detected communities increases. In particular, when a high number of possible states is allowed (high Q) and at the same time weak interactions are hypothesized (α<sub>ij</sub> is small on average), the number of communities tends to "explode." However, excluding this extreme case, detected communities are otherwise stable, ranging between 3 and 6. It is also evident that, when links in the network are filtered and the cluster structure emerges, the network assuming Q = 2 totally fails to reproduce a plausible number of communities: this is clearly due to the artifact of imposing a naturally bipartite network in the model, which does not correspond, though, to the expected network structure.

<sup>24</sup>This can be inferred by recalling that the probability of observing a certain collective state x has the form in Equation (4).

<sup>25</sup>Here and in the following, communities are always detected with a very successful method based upon random graph theory, the Clauset-Newman-Moore (CNM) method (Clauset et al., 2004). Only communities whose sizes are bigger than 3 nodes will be considered, while isolated nodes and dyads will be omitted.

<sup>26</sup>Preserving more links, indeed, leads to the discovery of weak interactions.

Finally, we investigated whether, and to what extent, the LBP algorithm is able to improve the assignment of senator-nodes to the "right" political community, i.e., the one identified by the political party to which the senator officially belonged. The figure of merit will be a sort of false discovery rate (FDR), that is, the ratio between the number of senators assigned by the CNM algorithm to wrong communities and the total number of senators analyzed. In order to emphasize the role of hidden links, the focus will be on the capability of the algorithm to classify the senators as belonging to the group supporting the government, the opposition group, or the mixed independent group.
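The figure of merit defined above can be sketched in a few lines. How a detected community is matched to an official group is not specified in the text, so the majority-vote labelling used below is an assumption of this sketch, and the six-senator data are hypothetical.

```python
from collections import Counter


def misclassification_rate(detected, true_group):
    """Share of nodes whose detected community maps to the wrong official group.

    detected:   dict node -> community id (e.g., from a community-detection run)
    true_group: dict node -> official group label
    ASSUMPTION: each community is labelled with the most common official
    group among its members (majority vote).
    """
    members = {}
    for node, comm in detected.items():
        members.setdefault(comm, []).append(node)
    wrong = 0
    for nodes in members.values():
        majority, _ = Counter(true_group[v] for v in nodes).most_common(1)[0]
        wrong += sum(1 for v in nodes if true_group[v] != majority)
    return wrong / len(detected)


# Hypothetical example with 6 senators and 2 detected communities.
DETECTED = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
TRUE = {"a": "gov", "b": "gov", "c": "opp", "d": "opp", "e": "opp", "f": "mixed"}
RATE = misclassification_rate(DETECTED, TRUE)
print(round(RATE, 3))
```

In this toy case one senator in each community disagrees with the majority label, giving a rate of 2/6; other matching rules (for instance a one-to-one assignment between communities and groups) would generally yield different values.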

Referring to **Figure 3**, the reconstruction of the network via a simple Jaccard coefficient is in this case already capable of accurately reproducing the true membership of the nodes, with only about 15% of nodes misclassified. The case of LBP with Q = 2, instead, is very inefficient: almost half of the nodes are misclassified, even though the target number of communities is close enough to the allowed values of Q. This result shows the importance of extending the Ising model used elsewhere in analyses of policy networks: even when an underlying bipartite interaction is tracked (i.e., sponsoring vs. abstaining in a roll call vote), in the end this is a projection of the more complex state in which each agent in the network finds itself before performing the voting action. From a modeling perspective, such a gap between model and reality can be reduced by a full quantum Ising model (as shown by Liu et al., 2010), or by semi-classical approaches with a Q-state Potts model. In fact, results with Q = 5 display a great improvement compared to the case with Q = 2. Especially when t<sub>m</sub> has a value in the range where the number of detected communities is stable, the percentage of nodes classified in the wrong group almost matches, or even outperforms, the Jaccard one (13%), without assuming a priori that indirect correlations among the network members are negligible. Interestingly, the best results are achieved for threshold values corresponding to intervals where the number of communities detected is stable (compare with **Figure 2**).

Moreover, it must be remembered that this analysis is affected by an important bias. States other than the active state (i.e., x<sub>i</sub> = 1 for node i) are assigned randomly, therefore favoring communities of homogeneous cardinality: some misclassified nodes originally belonged to mid-size communities but were assigned, at the expense of those communities, to smaller groups. Networks obtained with very low thresholds are particularly prone to this effect. Some other misclassifications are due to a specular effect: similarly to the "rich get richer" phenomenon, the discovery of hidden links increases the size of the major communities at the expense of the smallest ones, as is expected when modularity-based community detection algorithms are applied to very dense networks. Indeed, in all LBP-reconstructed networks the group of senators belonging to the party leading the Cabinet was always (mistakenly?) bigger than expected. This effect is evident when comparing **Figure 4** with **Figure 5**, where the latter has been obtained with LBP and a relatively high t<sub>m</sub> = 0.35. The specular consideration above suggests how to compensate for this artifact, by suitably lowering the threshold t<sub>m</sub> (the corresponding network graph is omitted for brevity). In any case, it should be remembered that major Senate groups were actually bigger at the beginning of the legislature than at its end (when a few independent groups had been founded). It can thus be argued that, by inferring weak links, LBP algorithms proved more efficient in merging communities into a few principal components than forcing modularity algorithms to split the network into 2–3 groups (i.e., forcing the CNM algorithm to merge further the 5 communities detected as optimal by its modularity maximization procedure, leading to the results in **Figure 3**).

On the other hand, further increasing the value of Q required relying upon a Generative model in order to have a number of samples sufficient for the analysis (100 k samples), since the reduced number of original data was prone to cause difficulties<sup>27</sup> in the numerical simulations. As anticipated in Section 2.2, moving from a (Pseudo-)Observational to a Generative approach degraded the results, because the temporal information of the available data is lost. In conclusion, an approach with higher Q, but still low enough to be based upon observational data, seems to produce the best and most stable results for both hidden-link and community detection purposes.

# 4. CONCLUSIONS

In this paper, a method based upon the Q-state Potts inverse problem and the Bethe-LBP approximation for network reconstruction was elucidated. Several possible ways were presented to use the method for inferring links among the nodes of a generic networked social structure, under the hypotheses that (i) actions like roll call sponsorships resemble decision-making processes and (ii) these processes can be modeled efficiently by methods used for inferring the structure of an ensemble of quantum states, observed repeatedly over time.

The LBP-based resolution of the inverse problem was applied for the first time to reconstruct a generic graph structure. More specifically, in the Social Sciences realm, this work has been the first to use a Q-state model (instead of the Ising model) to infer the structure of a real network. The case study chosen was the Italian Senate, analyzed starting from a dataset tracking law proposal cosponsorships. This made it possible to evaluate the power of the method in detecting those links that cannot be retrieved via standard reconstruction methods. The role of the diverse modeling choices, and of the peculiar parameters employed, was also thoroughly discussed, finding that the maximal value of Q permitted by the Potts model can introduce crucial differences in the quality of the results, together with a priori knowledge about the network structure.

The capability of the model to reproduce the community structure of the network and the individual memberships of the senators was investigated as well: it was found that the present method must be carefully reviewed, compared to standard ones, in order to produce a reliable output. In fact, a naive application without any further assumption may lead to completely wrong conclusions. The reason is that the Potts-LBP method is much closer to an ab-initio approach; therefore it originally embeds no information such as the weight to be assigned to inactive vs. active states, or to direct vs. indirect correlations, or the extent to which weak connections shall be considered noisy. In turn, this higher flexibility makes it possible to explore the role (and therefore the plausibility) of several assumptions made when reconstructing the network.

The authors envisage that interesting directions for further investigation may be the adoption of a full quantum treatment of the Potts model, as well as the application of this extended method to cases where the data retrieved for the network natively exhibit non-bipartite features, thus allowing a more direct application of generic Q-state Potts models.

# FUNDING

This work was part of the project "MUSCA" (PAC02L1-0018), funded by the Italian Ministry of Education, University and Research.

# ACKNOWLEDGMENTS

We thank Emanuele Rizzo for his expert technical help, and Dr. Marianovella Mello for aiding us throughout any administrative trouble.


<sup>27</sup>The simulation for high values of Q requires limited precision in the intermediate values calculated, to reduce the memory space required.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Bisconti, Corallo, Fortunato, Gentile, Massafra and Pellè. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Quantum information, cognition, and music

Maria L. Dalla Chiara<sup>1</sup> \*, Roberto Giuntini <sup>2</sup> , Roberto Leporini <sup>3</sup> , Eleonora Negri <sup>4</sup> and Giuseppe Sergioli <sup>2</sup>

<sup>1</sup> Dipartimento di Lettere e Filosofia, Università di Firenze, Firenze, Italy, <sup>2</sup> Dipartimento di Pedagogia, Psicologia, Filosofia, Università di Cagliari, Cagliari, Italy, <sup>3</sup> Dipartimento di Ingegneria Gestionale, dell'Informazione e della Produzione, Università di Bergamo, Dalmine, Italy, <sup>4</sup> Scuola di Musica di Fiesole, San Domenico di Fiesole, Fiesole, Italy

Parallelism represents an essential aspect of human mind/brain activities. One can recognize some common features between psychological parallelism and the characteristic parallel structures that arise in quantum theory and in quantum computation. The article is devoted to a discussion of the following questions:


#### Edited by:

Sandro Sozzo, University of Leicester, UK

#### Reviewed by:

Tomas Veloz, University of British Columbia, Canada

Matías Graffigna, University of Buenos Aires, Argentina

> \*Correspondence: Maria L. Dalla Chiara dallachiara@unifi.it

#### Specialty section:

This article was submitted to Cognition, a section of the journal Frontiers in Psychology

Received: 08 August 2015 Accepted: 29 September 2015 Published: 21 October 2015

#### Citation:

Dalla Chiara ML, Giuntini R, Leporini R, Negri E and Sergioli G (2015) Quantum information, cognition, and music. Front. Psychol. 6:1583. doi: 10.3389/fpsyg.2015.01583

Keywords: Turing machines, quantum computation, quantum information, semantics, music cognition

# 1. INTRODUCTION

Parallelism represents an essential aspect of the activities of human brain and mind. One can recognize some common features between psychological parallelism and the characteristic parallel structures that arise in quantum theory and in quantum computation, being responsible for the extraordinary efficiency and speed of quantum computers.

Quantum parallelism and classical parallelism are deeply different, although it is sometimes claimed that quantum Turing machines are nothing but special examples of classical probabilistic Turing machines<sup>1</sup> . But what exactly are quantum Turing machines? So far, the literature has not provided a rigorous "institutional" concept of quantum Turing machine. Some definitions seem to be based on a kind of "imitation" of the classical definition of Turing machine, by referring to a tape (where the symbols are written) and to a moving head (which changes its position on the tape)<sup>2</sup> . These concepts, however, seem to be hardly applicable to physical quantum computers. Both in the classical and in the quantum case, it is expedient to consider a more abstract concept: the notion of state machine, which neglects both tapes and moving heads. Every finite computational task realized in different computational models proposed in the literature can be simulated by a state machine<sup>3</sup> . In order to compare classical and quantum parallelism, we will analyze the concepts of (classical) deterministic state machine, (classical) probabilistic state machine, and quantum state machine. On this basis we will discuss the question: to what extent can quantum state machines be simulated by probabilistic state machines? (Sections 2, 3).

<sup>1</sup> See, for instance, Penrose (1994).

<sup>2</sup> See, for instance, Fouché et al. (2007).

<sup>3</sup> See, for instance, Savage (1998) and Gudder (1999).

In the investigation about possible links between quantum structures and psychological structures a useful tool is represented by a special form of quantum logical semantics (called quantum computational semantics) that has been inspired by the theory of quantum computation. We will see how this semantics can be naturally applied to a formal analysis of musical compositions, where parallel structures, ambiguity, holism, and contextuality play an essential role (Sections 4, 5)<sup>4</sup> .

Our analysis seems to confirm a general conjecture that has been defended and discussed in different research-fields: the basic concepts of the quantum-theoretic formalism (which had for a long time been regarded as mysterious and potentially paradoxical) seem to have a universal interest that goes beyond the domain of microphysical phenomena.

# 2. CLASSICAL DETERMINISTIC AND PROBABILISTIC MACHINES

We will first introduce a formal definition for the notion of deterministic state machine. On this basis, probabilistic state machines will be represented as stochastic variants of deterministic machines, which are able to calculate different outputs with different probability-values.

**Definition 1.** Deterministic state machine.

A deterministic state machine is an abstract system **M** based on the following elements:


(R<sub>0</sub>, ..., R<sub>t</sub>).

Each R<sub>i</sub> is a partial function that transforms configurations into configurations. We may have R<sub>i</sub> = R<sub>j</sub> with i ≠ j. The number i, corresponding to the rule R<sub>i</sub>, represents the i-th step of the program. The following conditions are required:

4.1 The rule R<sub>0</sub> is defined for any configuration (s<sub>0</sub>, w<sub>0</sub>), where s<sub>0</sub> is the initial state s<sub>in</sub> and w<sub>0</sub> is a possible word-input. We have:

$$R\_0 \colon (s\_0, w\_0) \mapsto (s\_1, w\_1),$$

where s<sub>1</sub> is different from the initial state and from all halting states (if t ≠ 0).

4.2 For any i (0 < i < t),

$$R\_i \colon (s\_i, w\_i) \mapsto (s\_{i+1}, w\_{i+1}),$$

where s<sub>i+1</sub> is different from all s<sub>i</sub>, ..., s<sub>0</sub> and from all halting states.

4.3 R<sub>t</sub> : (s<sub>t</sub>, w<sub>t</sub>) ↦ (s<sub>t+1</sub>, w<sub>t+1</sub>), where s<sub>t+1</sub> is a halting state.

Each configuration (s<sub>i+1</sub>, w<sub>i+1</sub>) represents the output for the step i and the input for the step i + 1.

The concept of computation of a deterministic state machine can be now defined as follows.

**Definition 2.** Computation of a deterministic state machine. A computation of a deterministic state machine **M** is a finite sequence of configurations

$$((s\_0, w\_0), \dots, (s\_{t+1}, w\_{t+1})),$$

where:


$$(s\_{i+1}, w\_{i+1}) = R\_i((s\_i, w\_i)),$$

where R<sub>i</sub> is the i-th rule of the program.

The configurations (s<sub>0</sub>, w<sub>0</sub>) and (s<sub>t+1</sub>, w<sub>t+1</sub>) represent, respectively, the input and the output of the computation, while the words w<sub>0</sub> and w<sub>t+1</sub> represent, respectively, the word-input and the word-output of the computation.

Apparently, each deterministic state machine is devoted to a single task that is determined by its program.
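Definitions 1 and 2 can be illustrated with a small executable sketch. The encoding below is an assumption made for illustration only: configurations are represented as (state, word) pairs of strings, a partial function returns `None` where it is undefined, and the two rules `r0` and `r1` are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Config = Tuple[str, str]                      # (state, word)
Rule = Callable[[Config], Optional[Config]]   # partial function: None = undefined


def compute(program: List[Rule], start: Config) -> List[Config]:
    """Apply R_0, ..., R_t in order; return ((s_0, w_0), ..., (s_{t+1}, w_{t+1}))."""
    configs = [start]
    for step, rule in enumerate(program):
        nxt = rule(configs[-1])
        if nxt is None:
            raise ValueError(f"rule R_{step} is undefined on {configs[-1]}")
        configs.append(nxt)
    return configs


# Hypothetical two-step machine that flips each bit of a word, then halts.
def r0(conf: Config) -> Optional[Config]:
    state, word = conf
    if state != "s_in":
        return None
    return ("s_1", "".join("1" if b == "0" else "0" for b in word))


def r1(conf: Config) -> Optional[Config]:
    state, word = conf
    return ("s_halt", word) if state == "s_1" else None


RUN = compute([r0, r1], ("s_in", "0110"))
print(RUN)
```

Here the program has t + 1 = 2 rules, so the computation consists of three configurations: the input, one intermediate configuration, and the output whose state is the halting state.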

Let us now turn to the concept of probabilistic state machine. The only difference between deterministic and probabilistic state machines concerns the program, which may be stochastic in the case of a probabilistic state machine (**PM**). In such a case, instead of a sequence of rules, we will have a sequence (Seq<sub>0</sub>, ..., Seq<sub>t</sub>) of sequences of rules such that:

$$\begin{aligned} Seq\_0 &= (R\_{0\_1}, \dots, R\_{0\_r}) \\\\ &\dots \dots \dots \dots \dots \\\\ Seq\_t &= (R\_{t\_1}, \dots, R\_{t\_l}). \end{aligned}$$

Each rule R<sub>i<sub>j</sub></sub> (occurring in the sequence Seq<sub>i</sub>) is associated with a probability-value p<sub>i<sub>j</sub></sub> such that:

$$\sum\_{j} p\_{i\_{j}} = 1.$$

From an intuitive point of view, p<sub>i<sub>j</sub></sub> represents the probability that the rule R<sub>i<sub>j</sub></sub> is applied at the i-th step. A deterministic state machine is, of course, a special case of a probabilistic state machine characterized by the following property: each sequence Seq<sub>i</sub> consists of a single rule R<sub>i</sub>.

<sup>4</sup> Some basic intuitive ideas of the quantum computational semantics are close to the "quantum cognition approach" that has been extensively developed in recent times (see, for instance, Aerts and Gabora, 2005a,b; Aerts and Sozzo, 2014). In both theories concepts and thoughts are represented as special abstract entities that can be described in the framework of the quantum-theoretic formalism. The technical developments of the two approaches are, however, different.

Any probabilistic state machine naturally gives rise to a graph-structure for any choice of an input-configuration conf<sub>0</sub> = (s<sub>0</sub>, w<sub>0</sub>). As an example, consider the following simple case: a probabilistic state machine **PM** whose program consists of two sequences, each consisting of two rules:

$$Seq\_0 = (R\_{0\_1}, R\_{0\_2})$$

$$Seq\_1 = (R\_{1\_1}, R\_{1\_2}).$$

The graph associated with **PM** for the configuration conf<sub>0</sub> is illustrated by **Figure 1**.

How do probabilistic machines compute? In order to define the concept of computation of a probabilistic machine, let us first introduce the notions of program-path and of computation-path of a given probabilistic machine.

**Definition 3.** Program-path and computation-path. Let **PM** be a probabilistic state machine with program (Seq<sub>0</sub>, ..., Seq<sub>t</sub>).

• A program-path of **PM** is a sequence

$$\mathcal{P} = (R\_{0\_h}, \dots, R\_{i\_j}, \dots, R\_{t\_k}),$$

consisting of t + 1 rules, where each R<sub>i<sub>j</sub></sub> is a rule from Seq<sub>i</sub> (probabilistically independent of all other rules of P).

• For any choice of an input (s<sub>0</sub>, w<sub>0</sub>), any program-path P determines a sequence of configurations

$$\mathcal{CP} = ((s\_0, w\_0), \dots, (s\_i, w\_i), \dots, (s\_{t+1}, w\_{t+1})),$$

where (s<sub>i+1</sub>, w<sub>i+1</sub>) = R<sub>i<sub>j</sub></sub>(s<sub>i</sub>, w<sub>i</sub>) and R<sub>i<sub>j</sub></sub> is the i-th element of P. This sequence is called the computation-path of **PM** determined by the program-path P and by the input (s<sub>0</sub>, w<sub>0</sub>). The configuration (s<sub>t+1</sub>, w<sub>t+1</sub>) represents the output of CP.

Any program-path P = (R<sub>0<sub>h</sub></sub>, ..., R<sub>i<sub>j</sub></sub>, ..., R<sub>t<sub>k</sub></sub>) has a well-determined probability-value p(P), which is defined as follows (in terms of the probability-values of its rules):

$$p(\mathcal{P}) := p\_{0\_h} \cdot \ldots \cdot p\_{i\_j} \cdot \ldots \cdot p\_{t\_k}.$$

As expected, the probability-value of a program-path P naturally determines the probability-values of all corresponding computation-paths. It is sufficient to put:

$$p(\mathcal{CP}) \colon= p(\mathcal{P}).$$

Consider now the set **P**<sub>**PM**</sub> of all program-paths and the set **CP**<sub>**PM**</sub> of all computation-paths of a probabilistic machine **PM**. One can easily show that:

$$\sum\_{i} \left\{ p(\mathcal{P}\_{i}) \mid \mathcal{P}\_{i} \in \mathbf{P\_{PM}} \right\} = \sum\_{i} \left\{ p(\mathcal{CP}\_{i}) \mid \mathcal{CP}\_{i} \in \mathbf{CP\_{PM}} \right\} = 1.$$

On this basis the concept of computation of a probabilistic state machine can be defined as follows.

**Definition 4.** Computation of a probabilistic state machine. A computation of a probabilistic state machine **PM** with input (s<sub>0</sub>, w<sub>0</sub>) is the system of all computation-paths of **PM** with input (s<sub>0</sub>, w<sub>0</sub>).

Unlike the case of deterministic state machines, a computation of a probabilistic state machine does not yield a unique output. For any choice of a configuration-input (s<sub>0</sub>, w<sub>0</sub>), the computation-output is a system of possible configuration-outputs (s<sup>i</sup><sub>t+1</sub>, w<sup>i</sup><sub>t+1</sub>), where each (s<sup>i</sup><sub>t+1</sub>, w<sup>i</sup><sub>t+1</sub>) corresponds to a computation-path CP<sub>i</sub>. As expected, each (s<sup>i</sup><sub>t+1</sub>, w<sup>i</sup><sub>t+1</sub>) has a well-determined probability-value, defined as the sum of the probability-values of all computation-paths that yield it:

$$p((s\_{t+1}^i, w\_{t+1}^i)) := \sum\_{j} \left\{ p(\mathcal{CP}\_j) \mid \text{the configuration-output of } \mathcal{CP}\_j \text{ is } (s\_{t+1}^i, w\_{t+1}^i) \right\}.$$

One can easily show that the sum of the probability-values of all configuration-outputs of any machine **PM** is 1.
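Definitions 3 and 4 can be illustrated with a minimal Python sketch. The function name `run_probabilistic_machine`, the two toy rules `keep` and `flip`, the use of integers as internal states, and the particular probability-values are all assumptions made for this illustration; they are not part of the formal definitions above.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def run_probabilistic_machine(program, conf0):
    """Enumerate all computation-paths of a probabilistic state machine.

    program: list of sequences Seq_0..Seq_t; each sequence is a list of
             (rule, probability) pairs whose probabilities sum to 1.
    conf0:   input configuration (s0, w0).
    Returns the list of (computation_path, probability) pairs and the
    probability-value of every configuration-output.
    """
    paths = []
    outputs = defaultdict(Fraction)
    # A program-path picks exactly one rule from each sequence Seq_i.
    for program_path in product(*program):
        prob = Fraction(1)
        conf = conf0
        computation_path = [conf]
        for rule, p in program_path:
            prob *= p               # p(P) is the product of the rule probabilities
            conf = rule(conf)       # (s_{i+1}, w_{i+1}) = R_{i_j}(s_i, w_i)
            computation_path.append(conf)
        paths.append((computation_path, prob))
        outputs[conf] += prob       # sum over all paths with the same output
    return paths, dict(outputs)


# Hypothetical rules: both advance the internal state; `flip` negates the word.
def keep(conf):
    state, word = conf
    return (state + 1, word)


def flip(conf):
    state, word = conf
    return (state + 1, tuple(1 - bit for bit in word))


program = [
    [(keep, Fraction(1, 2)), (flip, Fraction(1, 2))],   # Seq_0
    [(keep, Fraction(1, 4)), (flip, Fraction(3, 4))],   # Seq_1
]
paths, outputs = run_probabilistic_machine(program, (0, (0,)))
```

With this toy program there are four program-paths, their probability-values sum to 1, and the two configuration-outputs each collect the probabilities of two distinct computation-paths.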

# 3. QUANTUM STATE MACHINES

The strong parallelism that characterizes quantum computers is based on two quantum-theoretic notions that have often been described as mysterious and potentially paradoxical: superposition and entanglement. For readers who are not experts in quantum theory it is useful to recall some concepts of the quantum formalism that are used in quantum computation<sup>5</sup>. The basic idea is that any piece of quantum information is mathematically represented as a possible state of a quantum system that can store and transmit the information in question. In the simplest situations one is dealing with a single particle S (say, an electron or a photon), whose "mathematical environment" is a special example of a vector space: the two-dimensional Hilbert space ℂ<sup>2</sup>, based on the set of all ordered pairs of complex numbers. The canonical (orthonormal) basis of ℂ<sup>2</sup> consists of the two following unit-vectors:

$$|0\rangle = (1,0); \ |1\rangle = (0,1),$$

which represent, in this framework, the two classical bits (0 and 1), or (equivalently) the two classical truth-values (Falsity and Truth). A pure state corresponds to a maximal piece of information that cannot be consistently extended to a richer knowledge. Such a state is represented as a unit-vector |ψ⟩ that can be expressed as a superposition of the two elements of the canonical basis of ℂ<sup>2</sup>:

$$|\psi\rangle = c\_0|0\rangle + c\_1|1\rangle,$$

where c<sub>0</sub> and c<sub>1</sub> (also called amplitudes) are complex numbers such that |c<sub>0</sub>|<sup>2</sup> + |c<sub>1</sub>|<sup>2</sup> = 1.

<sup>5</sup> A survey of quantum computation theory can be found, for instance, in Nielsen and Chuang (2000).

The physical interpretation of |ψ⟩ (also called qubit-state or, briefly, qubit) is the following: the physical system S in state |ψ⟩ might satisfy the physical properties that are certain for the bit |0⟩ with probability |c<sub>0</sub>|<sup>2</sup> and might satisfy the physical properties that are certain for the bit |1⟩ with probability |c<sub>1</sub>|<sup>2</sup>. Due to the characteristic indeterminism of quantum theory, the pure state |ψ⟩ is at the same time a maximal and logically incomplete piece of information that cannot decide some important physical properties of the system S. Accordingly, from an intuitive point of view, one can say that |ψ⟩ describes a kind of cloud of potential properties that might become actual when a measurement is performed. Measuring a physical quantity (by means of an apparatus associated to the canonical basis) determines a sudden transformation of the qubit |ψ⟩ either into the bit |0⟩ or into the bit |1⟩. Such a transformation is usually called collapse of the wave-function.
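The probabilistic reading of the amplitudes can be made concrete with a short Python sketch. The function names `measurement_probabilities` and `measure` are hypothetical, and the sampling step only imitates the statistics of a measurement in the canonical basis; it is not a model of the physical process.

```python
import math
import random


def measurement_probabilities(c0, c1):
    """Return (|c0|^2, |c1|^2) for the qubit c0|0> + c1|1>."""
    norm = abs(c0) ** 2 + abs(c1) ** 2
    if not math.isclose(norm, 1.0):
        raise ValueError("amplitudes must satisfy |c0|^2 + |c1|^2 = 1")
    return abs(c0) ** 2, abs(c1) ** 2


def measure(c0, c1, rng=random):
    """Sample the bit (0 or 1) obtained by measuring in the canonical basis."""
    p0, _ = measurement_probabilities(c0, c1)
    return 0 if rng.random() < p0 else 1


# A qubit with complex amplitudes: both outcomes have probability 1/2.
p0, p1 = measurement_probabilities(1 / math.sqrt(2), 1j / math.sqrt(2))
```

Note that the relative phase between the two amplitudes (here the factor `1j`) does not affect these probabilities, although it does matter once further gates are applied before measuring.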

Not all states associated to a physical system S are pure. Non-maximal pieces of information can be represented as mixtures of pure states (special examples of operators called density operators). In the space ℂ<sup>2</sup> a density operator ρ can be represented as a convenient finite sum of projection-operators:

$$\rho = \sum\_{i} w\_i P\_{\mid \psi\_i \rangle},$$

where the w<sub>i</sub> are real numbers such that Σ<sub>i</sub> w<sub>i</sub> = 1, while each P<sub>|ψ<sub>i</sub>⟩</sub> is a projection-operator that projects along the direction of |ψ<sub>i</sub>⟩. Notice that such a representation is not generally unique. A density operator that cannot be represented as a projection P<sub>|ψ⟩</sub> is called a proper mixture. While pure states codify an essential indetermination of some relevant properties of the quantum system under investigation, mixtures may correspond to an epistemic uncertainty of the observer. Unlike pure states (which always satisfy some well-determined properties), there are mixtures that cannot decide any (non-trivial) property of the associated system. An example of this kind is the state ρ = ½ I, where I is the identity operator of the space ℂ<sup>2</sup>.
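As a minimal sketch of the remark that the representation of a mixture is not unique, the following Python fragment builds ½ I in two different ways. The helper names (`projector`, `mixture`, `purity`) are assumptions of this example, and the purity test Tr(ρ²) is a standard criterion introduced here only for illustration; it is not used in the text.

```python
import math


def projector(psi):
    """Projection-operator |psi><psi| as a nested list (psi is a unit-vector)."""
    return [[a * b.conjugate() for b in psi] for a in psi]


def mixture(weights, states):
    """Density operator sum_i w_i P_|psi_i> for weights that sum to 1."""
    dim = len(states[0])
    rho = [[0j] * dim for _ in range(dim)]
    for w, psi in zip(weights, states):
        proj = projector(psi)
        for r in range(dim):
            for c in range(dim):
                rho[r][c] += w * proj[r][c]
    return rho


def purity(rho):
    """Tr(rho^2): equal to 1 for a pure state, smaller for a proper mixture."""
    dim = len(rho)
    return sum(rho[r][c] * rho[c][r] for r in range(dim) for c in range(dim)).real


s = 1 / math.sqrt(2)
zero, one = [1.0, 0.0], [0.0, 1.0]
plus, minus = [s, s], [s, -s]

rho_a = mixture([0.5, 0.5], [zero, one])     # (1/2) P_|0> + (1/2) P_|1>
rho_b = mixture([0.5, 0.5], [plus, minus])   # same operator, different decomposition
```

Both `rho_a` and `rho_b` coincide (up to rounding) with ½ I, whose purity is 1/2, whereas the projector of any single unit-vector has purity 1.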

As happens in classical information theory, quantum computation also needs complex pieces of information, which are supposed to be stored by composite quantum systems (generally consisting of n subsystems). Accordingly, one can naturally adopt the quantum-theoretic formalism for the mathematical representation of composite physical systems, based on the use of tensor products (special examples of products)<sup>6</sup>. While a single qubit is a unit-vector of the space ℂ<sup>2</sup>, a pure state representing a complex piece of information can be identified with a unit-vector of the n-fold tensor product of ℂ<sup>2</sup>:

$$\otimes^n \mathbb{C}^2 = \underbrace{\mathbb{C}^2 \otimes \dots \otimes \mathbb{C}^2}\_{n-times} \text{ (with } n \ge 1\text{)}\text{.}$$

Such vectors are called quregisters. The canonical basis of the space ⊗<sup>n</sup>ℂ<sup>2</sup> consists of all registers, products of bits that have the following form:

$$|x\_1\rangle \otimes \dots \otimes |x\_n\rangle \quad \text{(where any } x\_i \text{ is either 0 or 1)}.$$

Instead of |x<sub>1</sub>⟩ ⊗ ... ⊗ |x<sub>n</sub>⟩, it is customary to write |x<sub>1</sub>, ..., x<sub>n</sub>⟩. Any quregister can be represented as a superposition of registers:

$$|\psi\rangle = \sum\_{i} c\_{i} |x\_{i\_{1}}, \dots, x\_{i\_{n}}\rangle,$$

where the c<sub>i</sub> are complex numbers such that Σ<sub>i</sub> |c<sub>i</sub>|<sup>2</sup> = 1. A tensor product |ψ<sub>1</sub>⟩ ⊗ ... ⊗ |ψ<sub>n</sub>⟩ (of n quregisters) is often briefly indicated by: |ψ<sub>1</sub>⟩ ... |ψ<sub>n</sub>⟩.
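Registers and quregisters can be sketched as coordinate vectors in a few lines of Python. The ordering of the basis (the register |x<sub>1</sub>, ..., x<sub>n</sub>⟩ is placed at the index whose binary expansion is x<sub>1</sub>...x<sub>n</sub>) and the function names are conventions assumed for this example.

```python
import math


def tensor(u, v):
    """Tensor (Kronecker) product of two coordinate vectors."""
    return [a * b for a in u for b in v]


def register(bits):
    """Coordinate vector of the register |x_1, ..., x_n>."""
    vec = [1]
    for bit in bits:
        vec = tensor(vec, [1, 0] if bit == 0 else [0, 1])
    return vec


def quregister(terms):
    """Superposition sum_i c_i |x_i1, ..., x_in> from (amplitude, bits) pairs."""
    n = len(terms[0][1])
    vec = [0j] * (2 ** n)
    for amplitude, bits in terms:
        for index, value in enumerate(register(bits)):
            vec[index] += amplitude * value
    norm = sum(abs(a) ** 2 for a in vec)
    if not math.isclose(norm, 1.0):
        raise ValueError("the squared moduli of the amplitudes must sum to 1")
    return vec


s = 1 / math.sqrt(2)
# The quregister (1/sqrt(2)) (|0,0> + |1,1>) in the four-dimensional space.
bell = quregister([(s, (0, 0)), (s, (1, 1))])
```

The dimension of the vector doubles with every additional bit, so a quregister of length n is described by 2<sup>n</sup> amplitudes.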

Quantum computation makes essential use of some characteristic quantum states that are called entangled. In order to illustrate the concept of entanglement from an intuitive point of view, let us refer to a simple paradigmatic case. We are concerned with a composite physical system S consisting of two subsystems S<sub>1</sub> and S<sub>2</sub> (say, a two-electron system). By the quantum-theoretic rules that concern the mathematical description of composite systems, all states of S shall live in the tensor product ℋ = ℋ<sub>1</sub> ⊗ ℋ<sub>2</sub>, where ℋ<sub>1</sub> and ℋ<sub>2</sub> are the Hilbert spaces associated to the systems S<sub>1</sub> and S<sub>2</sub>, respectively. The observer has maximal information about S: a pure state |ψ⟩ of ℋ. What can be said about the states of the two subsystems? Due to the form of |ψ⟩, such states cannot be pure: they are represented by two identical mixtures, which codify a "maximal degree of uncertainty." A typical possible form of |ψ⟩ is the following Bell-state:

$$|\psi\rangle = \frac{1}{\sqrt{2}}(|0,0\rangle + |1,1\rangle),$$

which lives in the space ℂ<sup>2</sup> ⊗ ℂ<sup>2</sup>, whose canonical basis consists of the four vectors |0, 0⟩, |0, 1⟩, |1, 0⟩, |1, 1⟩.

This gives rise to the following physical interpretation: the global system S might satisfy the properties that are certain either for the state |0, 0⟩ or for the state |1, 1⟩ with probability-value ½. At the same time, |ψ⟩ determines that the reduced state of both subsystems (S<sub>1</sub> and S<sub>2</sub>) is the mixture ½ I. Although it is not determined whether the state of the global system S is |0, 0⟩ or |1, 1⟩, the two subsystems S<sub>1</sub> and S<sub>2</sub> can be described as "entangled," because in both possible cases they would satisfy the same properties, turning out to be indistinguishable. As a consequence, any measurement performed by an observer either on system S<sub>1</sub> or on system S<sub>2</sub> would instantaneously transform the potential properties of both subsystems into actual properties (by collapse of the wave-function).

<sup>6</sup> The basic property of the tensor product ℋ<sub>1</sub> ⊗ ℋ<sub>2</sub> of two (finite-dimensional) Hilbert spaces ℋ<sub>1</sub> and ℋ<sub>2</sub> is the following: ℋ<sub>1</sub> ⊗ ℋ<sub>2</sub> is a Hilbert space that properly includes an isomorphic image of the Cartesian product ℋ<sub>1</sub> × ℋ<sub>2</sub> (consisting of all ordered pairs of vectors that belong to the spaces ℋ<sub>1</sub> and ℋ<sub>2</sub>, respectively). Furthermore, ℋ<sub>1</sub> ⊗ ℋ<sub>2</sub> contains all possible superpositions of its elements. A vector |ψ⟩ of ℋ<sub>1</sub> ⊗ ℋ<sub>2</sub> is called factorized iff |ψ⟩ corresponds to a pair (|ψ<sub>1</sub>⟩, |ψ<sub>2</sub>⟩) ∈ ℋ<sub>1</sub> × ℋ<sub>2</sub>. In such a case, it is customary to write: |ψ⟩ = |ψ<sub>1</sub>⟩ ⊗ |ψ<sub>2</sub>⟩. Of course, not all vectors of ℋ<sub>1</sub> ⊗ ℋ<sub>2</sub> are factorized.
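The claim that the Bell-state determines the reduced state ½ I for both subsystems can be checked by a short worked computation. The partial-trace formula used here is the standard definition of the reduced state, written in the Red-notation that appears later in the text; it is supplied for illustration and is not spelled out in the source. The density operator of the Bell-state is

$$\rho = |\psi\rangle\langle\psi| = \frac{1}{2}\left(|0,0\rangle\langle 0,0| + |0,0\rangle\langle 1,1| + |1,1\rangle\langle 0,0| + |1,1\rangle\langle 1,1|\right).$$

Tracing out the second subsystem kills the two cross terms, because ⟨0|1⟩ = ⟨1|0⟩ = 0:

$$Red^{1}(\rho) = \sum\_{b \in \{0,1\}} (I \otimes \langle b|)\, \rho\, (I \otimes |b\rangle) = \frac{1}{2}\left(|0\rangle\langle 0| + |1\rangle\langle 1|\right) = \frac{1}{2} I.$$

By the symmetry of |ψ⟩ under exchange of the two subsystems, the same computation gives Red<sup>2</sup>(ρ) = ½ I.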

The celebrated "Einstein–Podolsky–Rosen paradox" (EPR) is based on a similar physical situation. As is well-known, what mainly worried Einstein was the possibility of "non-local effects": the subjective decision of an observer (who may choose among different incompatible observables to be measured on the system S<sub>1</sub>) seems to determine the instantaneous emergence of an actual property for the system S<sub>2</sub>, which might be very "far" from S<sub>1</sub> (possibly inaccessible by means of a light-signal). Interestingly enough, in the framework of quantum computation, entangled states have often been used as a powerful resource, even from a technological point of view (for instance, in the applications to teleportation-phenomena and to quantum cryptography).

As expected, quantum computation cannot be identified with a "static" representation of pieces of information. What is important is the dynamic process of information that gives rise to quantum computations. Such a process is mathematically performed by quantum logical gates (briefly, gates): special examples of unitary operators that transform quregisters into quregisters in a reversible way. Since in quantum theory the time-evolution of all physical systems is mathematically described by unitary operators, one can say that quantum computations can be regarded as the time-evolution of some special quantum objects.

We will now introduce the definition of quantum state machine, which represents a quantum counterpart of the classical notion of deterministic state machine. From an intuitive point of view, any quantum state machine can be regarded as a kind of quantum superposition of many classical deterministic state machines. Some definitions of quantum Turing machine discussed in the literature are based on a strong idealization: no limit is assumed for the length of the registers occurring in a computation. This corresponds to the classical assumption according to which a Turing machine is equipped with an infinite tape. We will consider a more realistic concept, closer to physical quantum computers, which are of course always bound to a limited memory.

#### **Definition 5.** Quantum state machine.

A quantum state machine is an abstract system **QM** associated to a (finite-dimensional) Hilbert space ℋ<sup>**QM**</sup> whose unit-vectors |ψ⟩ represent possible pure states of a quantum system that could physically implement the computations of the state machine. The space ℋ<sup>**QM**</sup> has the following form:

$$\mathcal{H}^{\mathbf{QM}} = \mathcal{H}^H \otimes \mathcal{H}^S \otimes \mathcal{H}^W.$$

The following conditions are required:


⊗<sup>m</sup>ℂ<sup>2</sup>, where 2<sup>m</sup> is the cardinal number of S. Accordingly, the set S can be one-to-one associated to a basis of ℋ<sup>S</sup>.

3. ℋ<sup>W</sup> (which represents the word-space) is identified with a Hilbert space ⊗<sup>n</sup>ℂ<sup>2</sup> (for a given n ≥ 1). The number n determines the length of the registers |x<sub>1</sub>, ..., x<sub>n</sub>⟩ that may occur in a computation. Shorter registers |x<sub>1</sub>, ..., x<sub>h</sub>⟩ (with h < n) can be represented in the space ⊗<sup>n</sup>ℂ<sup>2</sup> by means of convenient ancillary bits.

Let B<sup>**QM**</sup> be a basis of ℋ<sup>**QM**</sup>, whose elements are unit-vectors having the following form:

$$|\varphi\_{i}\rangle = |h\_{i}\rangle|s\_{i}\rangle|x\_{i\_{1}}, \dots, x\_{i\_{n}}\rangle,$$

where |h<sub>i</sub>⟩ belongs to the basis of ℋ<sup>H</sup>, while |s<sub>i</sub>⟩ belongs to the basis of ℋ<sup>S</sup>.

Any unit-vector |ψ⟩ of ℋ<sup>**QM**</sup> that is a superposition of basis-elements |ϕ<sub>i</sub>⟩ represents a possible computational state of **QM**. The expected interpretation of a computational state

$$|\psi\rangle = \sum\_{i} c\_{i} |h\_{i}\rangle |s\_{i}\rangle |x\_{i\_{1}}, \dots, x\_{i\_{n}}\rangle$$

is the following:


$$|\psi\rangle = \sum\_{i} |0\_H\rangle |s\_{in}\rangle |x\_{i\_1}, \dots, x\_{i\_n}\rangle.$$

5. Like a deterministic state machine, a quantum state machine **QM** is characterized by a program. In the quantum case, a program is identified with a sequence of unitary operators of ℋ<sup>**QM**</sup>:

$$(U\_0, \ldots, U\_t),$$

where we may have U<sub>i</sub> = U<sub>j</sub> with i ≠ j. The following conditions are required:

(a) For any possible input |ψ<sub>0</sub>⟩, U<sub>0</sub>(|ψ<sub>0</sub>⟩) = |ψ<sub>1</sub>⟩ is a superposition of basis-elements having the following form:

$$|h\_i^1\rangle|s\_i^1\rangle|x\_{i\_1}^1, \dots, x\_{i\_n}^1\rangle,$$

where all s<sup>1</sup><sub>i</sub> are different from s<sub>in</sub> and |h<sup>1</sup><sub>i</sub>⟩ = |0<sub>H</sub>⟩, if t ≠ 0.

(b) For any j (0 < j < t), U<sub>j</sub>(|ψ<sub>j</sub>⟩) = |ψ<sub>j+1</sub>⟩ is a superposition of basis-elements having the following form:

$$|0\_H\rangle |s\_{i}^{j+1}\rangle |x\_{i\_1}^{j+1}, \dots, x\_{i\_n}^{j+1}\rangle.$$

(c) U<sub>t</sub>(|ψ<sub>t</sub>⟩) = |ψ<sub>t+1</sub>⟩ is a finite superposition of basis-elements having the following form:

$$|1\_{H}\rangle|s\_{halt\_{j}}\rangle|x\_{i\_{1}}^{t+1}, \ldots, x\_{i\_{n}}^{t+1}\rangle.$$

The concept of computation of a quantum state machine can be now defined in a natural way.

**Definition 6.** Computation of a quantum state machine. Let **QM** be a quantum state machine, whose program is the operator-sequence (U<sub>0</sub>, ..., U<sub>t</sub>) and let |ψ<sub>0</sub>⟩ be a possible input of **QM**. A computation of **QM** with input |ψ<sub>0</sub>⟩ is a sequence of computational states of **QM**

$$\mathcal{QC} = (|\psi\_0\rangle, \dots, |\psi\_{t+1}\rangle),$$

such that: |ψ<sub>i+1</sub>⟩ = U<sub>i</sub>(|ψ<sub>i</sub>⟩), for any i (0 ≤ i ≤ t). The vector |ψ<sub>t+1</sub>⟩ represents the output of the computation, while the density operator Red<sup>3</sup>(|ψ<sub>t+1</sub>⟩) (the reduced state of |ψ<sub>t+1</sub>⟩ with respect to the third subsystem) represents the word-output of the computation.

Like all abstract notions of quantum computer, the concept of quantum state machine gives rise to some critical questions that have been often discussed in the literature. Two important problems (which cannot have any counterpart in the case of classical computation) are the following:


Consider now a quantum state machine whose program is

$$(U\_0, \ldots, U\_t).$$

Each U<sub>i</sub> naturally determines a corresponding word-operator U<sub>i</sub><sup>W</sup>, defined on the word-space ℋ<sup>W</sup>. Generally, it is not guaranteed that all word-operators are unitary. But it is convenient to refer to quantum state machines that satisfy this condition. In this way, any quantum state machine (whose word-space is ⊗<sup>n</sup>ℂ<sup>2</sup>) determines a quantum circuit, consisting of a sequence of unitary operators (gates):

$$(U\_0^W, \dots, U\_t^W),$$

where n represents the width, while t + 1 represents the depth of the circuit.

To what extent can quantum state machines be simulated by classical probabilistic state machines? In order to discuss this important question, let us refer to a celebrated quantum experiment, based on the Mach–Zehnder interferometer (represented by **Figure 2**).

The physical situation can be sketched as follows. Consider a photon-beam (possibly consisting of a single photon) and assume that |0⟩ describes the state of photons moving along the x-direction, while |1⟩ describes the state of photons moving along the y-direction. All photons go through a first beam splitter that "splits" them, giving rise to the following effect: within the box each photon follows a path corresponding either to the x-direction or to the y-direction with probability ½. Soon after, on both paths, all photons are reflected by a mirror that inverts their direction. Finally, the photons pass through a second beam splitter that determines the output-state. Suppose that all photons entering the interferometer-box are moving in the x-direction. According to a "classical way of thinking" we would expect that the photons detected at the end of the process will move either along the x-direction or along the y-direction with probability ½. The result of the experiment is, instead, completely different: the Mach–Zehnder interferometer always transforms the input-state |0⟩ into the output-state |0⟩, while the input-state |1⟩ is transformed into |1⟩.

From a mathematical point of view, such a "surprising" result can be explained by using, in an essential way, the concept of superposition. The apparatuses (used in the Mach–Zehnder experiment) can be mathematically represented by two important gates. A beam splitter can be regarded as a physical implementation of the Hadamard-gate √I (also called square root of identity), which is defined as follows (on the canonical basis of ℂ<sup>2</sup>):

$$
\sqrt{I}|0\rangle = \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle); \ \sqrt{I}|1\rangle = \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle).
$$

Evidently, the Hadamard-gate transforms the two classical bits |0⟩ and |1⟩ into two (different) genuine superpositions. As a consequence, within the Mach–Zehnder box a photon in state (1/√2)(|0⟩ + |1⟩) turns out to satisfy at the same time two alternative properties: the property of moving along the x-direction and the property of moving along the y-direction. We have here a characteristic quantum parallelism: a single photon "goes along" two different paths at the same time! Metaphorically, situations of this kind have sometimes been compared to the puzzling behavior of a "quantum skier" who runs at the same time on the left and on the right side of a given tree (see **Figure 3**).

The second apparatus of the Mach–Zehnder interferometer (the mirror), can be regarded as a physical implementation of another important gate, the negation NOT (a quantum generalization of the classical negation), which is defined as follows:

$$\mathsf{NOT}|0\rangle = |1\rangle; \ \mathsf{NOT}|1\rangle = |0\rangle.$$

Accordingly, the Mach–Zehnder circuit can be identified with the following sequence of three gates (all defined on the space ℂ<sup>2</sup>):

$$(\sqrt{I}, \mathsf{NOT}, \sqrt{I}).$$

Let us now apply the Mach–Zehnder circuit to the input |0i. We obtain:

$$\sqrt{I} : |0\rangle \mapsto \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle);$$

$$\mathsf{NOT} : \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle) \mapsto \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle);$$

$$\sqrt{I} : \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle) \mapsto |0\rangle.$$

We can see, in this way, how the Mach–Zehnder circuit transforms the input-state |0⟩ into the output-state |0⟩. In a similar way, the input-state |1⟩ is transformed into the output-state |1⟩.
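The action of the Mach–Zehnder circuit on both canonical inputs can be checked numerically. The following Python sketch represents the gates as 2×2 matrices acting on coordinate vectors; the names `HADAMARD`, `NOT`, `apply` and `run_circuit` are assumptions of this example.

```python
import math

S = 1 / math.sqrt(2)
HADAMARD = [[S, S], [S, -S]]   # the gate written as "square root of identity" in the text
NOT = [[0, 1], [1, 0]]


def apply(gate, state):
    """Apply a gate (matrix) to a state (coordinate vector)."""
    return [sum(gate[r][c] * state[c] for c in range(len(state)))
            for r in range(len(gate))]


def run_circuit(gates, state):
    """Apply the gates in order, as in a computation (psi_0, ..., psi_{t+1})."""
    for gate in gates:
        state = apply(gate, state)
    return state


MACH_ZEHNDER = [HADAMARD, NOT, HADAMARD]
out0 = run_circuit(MACH_ZEHNDER, [1, 0])   # input |0>
out1 = run_circuit(MACH_ZEHNDER, [0, 1])   # input |1>
```

For the input |1⟩ the computed amplitude of |1⟩ is −1 rather than +1. The vectors |1⟩ and −|1⟩ differ only by a global phase, which has no effect on any measurement probability, so this agrees with the statement that |1⟩ is transformed into |1⟩.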

Is there any natural "classical counterpart" for the Hadamard-gate? A natural candidate might be a particular example of a probabilistic state machine that we can conventionally call the classical probabilistic NOT-state machine (**PM**NOT). Such a machine can be defined as follows:


$$Seq\_0 = (R\_{0\_1}, R\_{0\_2}),$$

where:

$$R\_{0\_1} \colon (s\_{in}, (x)) \mapsto (s\_{halt\_j}, (x)) \text{ and } p(R\_{0\_1}) = \frac{1}{2};$$

$$R\_{0\_2} \colon (s\_{in}, (x)) \mapsto (s\_{halt\_j}, (1 - x)) \text{ and } p(R\_{0\_2}) = \frac{1}{2}.$$

Consider, for instance, the input (s<sub>in</sub>, (0)). The output will be the following set:

$$\left\{ (s\_{halt\_j}, (0)), (s\_{halt\_j}, (1)) \right\}.$$

On this basis, a "classical probabilistic Mach–Zehnder state machine" would determine (for the word-input (0)) the word-graph illustrated by **Figure 4**.

Such a machine turns out to compute both the words (0) and (1) with probability ½. Interestingly enough, this is the same probabilistic result that is obtained in the quantum case, when one performs a measurement inside the interferometer-box. In such a case, photons behave like "normal skiers," who pass either at the right or at the left side of a tree (where or represents here, of course, the exclusive disjunction).
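The classical side of this comparison can be sketched with stochastic matrices acting on probability distributions over the words (0) and (1). It is an assumption of this example that the classical machine is obtained by replacing each beam splitter with the probabilistic rule pair defined above and keeping the deterministic negation in between; the names `COIN`, `NOT`, `step` and `run` are hypothetical.

```python
from fractions import Fraction

HALF = Fraction(1, 2)
# Entry [y][x] is the probability of obtaining word (y) from word (x).
COIN = [[HALF, HALF], [HALF, HALF]]   # keep x or replace it by 1 - x, each with probability 1/2
NOT = [[0, 1], [1, 0]]                # deterministic negation


def step(matrix, dist):
    """Propagate a probability distribution over the words (0), (1)."""
    return [sum(matrix[y][x] * dist[x] for x in range(2)) for y in range(2)]


def run(steps, dist):
    """Apply the stochastic steps in order."""
    for matrix in steps:
        dist = step(matrix, dist)
    return dist


# Word-input (0): the distribution starts concentrated on the word (0).
classical = run([COIN, NOT, COIN], [Fraction(1), Fraction(0)])
```

The result is the uniform distribution over (0) and (1). Probabilities are non-negative and can only add up, so the two routes leading to the word (1) cannot cancel each other; in the quantum circuit the corresponding amplitudes have opposite signs and do cancel.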

The arguments we have developed seem to confirm the following conjecture: the characteristic superposition-patterns, that may occur during a quantum computation (when no measurement is performed during the computation-process), cannot be generally represented by probabilistic state machines. Quantum parallelism (based on superpositions) and classical parallelism are deeply different.

# 4. QUANTUM PARALLELISM, PSYCHOLOGICAL PARALLELISM, AND QUANTUM COMPUTATIONAL SEMANTICS

What kind of similarity can be recognized between quantum parallel structures and different forms of psychological parallelism? Trying to represent the human mind as a kind of system of quantum state machines would be, of course, naive and misleading. In spite of many important results in the framework of neurosciences, the complex network that connects human conscious and unconscious thoughts is still quite mysterious<sup>7</sup> . Quantum-like superpositions can be reasonably applied to represent some aspects of such complex networks. Even quantum interference phenomena (with the characteristic constructive and destructive effects) can find some natural psychological interpretations.

According to an interesting hypothesis (discussed by the neuroscientist Boncinelli, 2012), the mysterious emergence of an act of consciousness can be represented as a sudden transition from a parallel structure to a linear one. Is it reasonable to conjecture that such transition could be described as a kind of "psychological collapse of the wave-function?"

In the investigations about possible links between quantum structures and psychological structures a useful tool is represented by a special form of quantum logical semantics (called quantum computational semantics) that has been naturally inspired by the theory of quantum computation<sup>8</sup> .

Let us briefly recall the basic ideas of this semantics. We can refer to a first-order language L, whose non-logical alphabet contains individual terms (variables and names), predicates and sentential constants. Interpreting the language L means associating to any formula α a meaning, identified with a piece of quantum information that can be stored by a quantum system. Accordingly, any possible meaning of α is represented by a possible (pure or mixed) state of a quantum system: generally, a density operator ρ<sub>α</sub> that lives in a Hilbert space ℋ<sup>α</sup>, whose dimension depends on the linguistic complexity of α.

The logical operators of L are associated to special examples of Hilbert-space operations that have a characteristic dynamic behavior, representing possible computation-actions. The logical connectives are interpreted as particular (reversible) gates, like the negation NOT, the Hadamard-gate √I, the Toffoli-gate T (which allows us to define a reversible conjunction AND). At the same time, the logical quantifiers (∀, ∃) are interpreted as possibly irreversible quantum operations. Since the universe of discourse (which the language refers to) may be indeterminate, the use of quantum quantifiers may give rise to a reversibility-breaking, which is quite similar to what happens in the case of measurement-phenomena.

Due to the characteristic features of quantum holism, meanings turn out to behave in a holistic and contextual way: the density operator ρ<sub>α</sub> (which represents the global meaning of a formula α) determines the contextual meanings of all parts of α (which can be obtained by applying the reduced-state function to ρ<sub>α</sub>). As a consequence, it may happen that the meaning of a formula is an entangled pure state, while the meanings of its parts are proper mixtures. In such cases, the meaning of a global expression turns out to be more precise than the meanings of its parts. It is also admitted that one and the same formula receives different contextual meanings in different contexts.

As an example, consider the atomic sentence "Alice is pretty" (formalized as **Pa**). In order to store the information expressed by this sentence, we need three quantum objects whose states represent the pieces of information corresponding, respectively, to the predicate **P**, to the name **a** and to the truth-degree according to which the individual denoted by the name **a** satisfies the property denoted by the predicate **P**. Accordingly, the meaning of the sentence **Pa** can be identified with a (pure or mixed) state ρ<sub>**Pa**</sub> living in the tensor-product space ℋ<sup>**Pa**</sup> = ⊗<sup>3</sup>ℂ<sup>2</sup>. In order to obtain the contextual meanings of the linguistic parts of **Pa** it is sufficient to consider the two reduced states Red<sup>1</sup>(ρ<sub>**Pa**</sub>) and Red<sup>2</sup>(ρ<sub>**Pa**</sub>), which describe (respectively) the states of the first and of the second subsystem of the quantum object that stores the information expressed by the sentence **Pa**. From a logical point of view, Red<sup>1</sup>(ρ<sub>**Pa**</sub>) and Red<sup>2</sup>(ρ<sub>**Pa**</sub>) can be regarded as two intensional meanings: a property-concept and an individual concept, respectively; while ρ<sub>**Pa**</sub> represents a propositional concept (or event).

Like formulas, sequences of formulas can also be interpreted according to the quantum computational rules. As expected, a possible meaning of the sequence (α<sub>1</sub>, ..., α<sub>n</sub>) will be a density operator ρ<sub>(α<sub>1</sub>, ..., α<sub>n</sub>)</sub> living in a Hilbert space ℋ<sup>(α<sub>1</sub>, ..., α<sub>n</sub>)</sup>, whose dimension depends on the linguistic complexity of the formulas α<sub>1</sub>, ..., α<sub>n</sub>.

In this framework one can develop an abstract theory of vague possible worlds. Consider a pair

$$W = ( (\alpha\_1, \dots, \alpha\_n), \,\rho\_{(\alpha\_1, \dots, \alpha\_n)} ),$$

consisting of a sequence of formulas and of a density operator that represents a possible meaning for our sequence. It seems reasonable to assume that W describes a vague possible world, a kind of abstract scene where most events are characterized by a "cloud of ambiguities," due to quantum uncertainties. In some cases W might be exemplified as a "real" scene of a theatrical play or as a vague situation that is described either in a novel or in a poem. And it is needless to recall how ambiguities play an essential role in literary works.

As an example, consider the following vague possible world:

$$\boldsymbol{W} = ( (\mathbf{Pab}), \,\rho\_{(\mathbf{Pab})} ),$$

where **Pab** is supposed to formalize the sentence "Alice is kissing Bob," while ρ<sub>(**Pab**)</sub> corresponds to the pure state

$$|\Psi\rangle\_{\mathbf{Pab}} = |\varphi\rangle \otimes \frac{1}{\sqrt{2}} (|0,1\rangle + |1,0\rangle) \otimes |1\rangle,$$

where |ϕ⟩ lives in the space ℂ<sup>2</sup>, while |Ψ⟩<sub>**Pab**</sub> lives in the space ⊗<sup>4</sup>ℂ<sup>2</sup>. Here the reduced state of |Ψ⟩<sub>**Pab**</sub> that describes the pair (Alice, Bob) has the typical form of an entangled state; consequently, the states describing the two individuals Alice and Bob are two identical mixed states. In the context |Ψ⟩<sub>**Pab**</sub> Alice and Bob turn out to be indistinguishable: it is not determined "who is who" and "who is kissing whom." It is not difficult to imagine some "real" theatrical scenes representing ambiguous situations of this kind.
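The statement that Alice and Bob are described by two identical mixed states can be verified on the two-qubit factor (1/√2)(|0, 1⟩ + |1, 0⟩) alone, since the predicate factor |ϕ⟩ and the truth-value factor |1⟩ are not entangled with it. The Python sketch below uses a hypothetical helper `reduced_states`, and the assignment of the first qubit of the pair to Alice and the second to Bob is an assumption of the example.

```python
import math


def reduced_states(amp):
    """Reduced states of a two-qubit pure state.

    amp[a][b] is the amplitude of the register |a, b>.
    Returns (Red1, Red2) as 2x2 nested lists.
    """
    red1 = [[sum(amp[a][b] * amp[a2][b].conjugate() for b in range(2))
             for a2 in range(2)] for a in range(2)]
    red2 = [[sum(amp[a][b] * amp[a][b2].conjugate() for a in range(2))
             for b2 in range(2)] for b in range(2)]
    return red1, red2


s = 1 / math.sqrt(2)
# Amplitudes of (1/sqrt(2)) (|0,1> + |1,0>): only the registers |0,1> and |1,0> occur.
pair = [[0.0, s], [s, 0.0]]
alice, bob = reduced_states(pair)
```

Both `alice` and `bob` come out (up to rounding) as ½ I, so neither reduced state singles out a property that would distinguish one individual from the other.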

# 5. A QUANTUM SEMANTICS FOR MUSIC

An abstract version of the quantum computational semantics can be applied to a formal analysis of musical compositions, where both musical ideas and extra-musical meanings are generally characterized by some essentially vague and ambiguous features<sup>9</sup>.

<sup>7</sup> As is well-known, the literature devoted to the study of parallel structures in the mind/brain-behavior is very rich. As an example, one can refer to some important contributions of Damasio (see, for instance, Damasio, 1999).

<sup>8</sup> See (Dalla Chiara et al., 2005, 2010, in press).

Any musical composition (say, a sonata, a symphony, an opera,...) is, generally, determined by three elements:


While scores represent the syntactical component of musical compositions, performances are physical events that occur in space and time. From a logical point of view, we could say that performances are, in a sense, similar to extensional meanings, i.e., well-determined systems of objects which the linguistic expressions refer to.

Musical thoughts (or ideas) represent, instead, a more mysterious element. Is it reasonable to assume the existence of such ideal objects that are, in a sense, similar to the intensional meanings investigated by logic? Is there any danger in adhering, in this way, to a form of Platonism? When discussing semantic questions, one should not be "afraid" of Platonism. In the particular case of music, a composition cannot be simply reduced to a score and to a system of sound-events. Between a score (which is a system of signs) and the sound-events created by a performance there is something intermediate, represented by the musical ideas that underlie the different performances. This is the abstract environment in which both composers and conductors normally live, accustomed as they are to studying scores without the help of any material instrument.

Following the rules of the quantum semantics, musical ideas can be naturally represented as superpositions that ambiguously describe a variety of co-existent thoughts. Accordingly, we can write:

$$|\mu\rangle = \sum\_{i} c\_{i} |\mu\_{i}\rangle,$$

where:


As happens in the case of composite quantum systems, musical ideas (which represent possible meanings of musical phrases written in a score) have an essential holistic behavior: the meaning of a global musical phrase determines the contextual meanings of all its parts (and not the other way around).

An important feature of music is the capacity of evoking extramusical meanings: subjective feelings, situations that are vaguely imagined by the composer or by the interpreter or by the listener, real or virtual theatrical scenes (which play an essential role in the case of lyric operas and of Lieder). The interplay between musical ideas and extra-musical meanings can be naturally represented in the framework of our quantum semantics, where extra-musical meanings can be dealt with as special examples of vague possible worlds.

<sup>9</sup>See (Dalla Chiara et al., 2012).

We can refer to the abstract tensor product of two spaces

MSpace ⊗ WSpace,

where:


Following the quantum-theoretic formalism, we can distinguish between factorized and non-factorized global musical ideas. A factorized global musical idea will have the form:

$$|M\rangle = |\mu\rangle \otimes |\nu\rangle.$$

But we might also meet entangled global musical ideas, having the form:

$$|M\rangle = c\_1(|\mu\_1\rangle \otimes |\nu\_1\rangle) + c\_2(|\mu\_2\rangle \otimes |\nu\_2\rangle).$$
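The distinction between factorized and entangled global musical ideas can be illustrated with a minimal sketch. It assumes, purely for illustration, that MSpace and WSpace are both two-dimensional and that $|\mu_1\rangle, |\mu_2\rangle$ and $|\nu_1\rangle, |\nu_2\rangle$ are orthonormal; the function name `is_factorized` is hypothetical. Under these assumptions a non-zero global state factorizes exactly when its coefficient matrix has zero determinant.

```python
def is_factorized(amplitudes, tol=1e-12):
    """Return True if a two-by-two global state is a product state.

    amplitudes[i][j] is the coefficient of |mu_i> (x) |nu_j>, with both
    families assumed orthonormal. A non-zero state factorizes exactly when
    this coefficient matrix has rank one, i.e. zero determinant.
    """
    (a, b), (c, d) = amplitudes
    return abs(a * d - b * c) < tol


s = 2 ** -0.5

# (|mu_1> + |mu_2>)/sqrt(2) (x) |nu_1>: a factorized global idea.
factorized = [[s, 0.0], [s, 0.0]]

# c_1 |mu_1>|nu_1> + c_2 |mu_2>|nu_2> with c_1 = c_2 = 1/sqrt(2): entangled.
entangled = [[s, 0.0], [0.0, s]]

print(is_factorized(factorized))  # True
print(is_factorized(entangled))   # False
```

In the entangled case the determinant equals $c_1 c_2$, so the state fails to factorize whenever both coefficients are non-zero.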

As is well-known, music gives rise to a special kind of psychological experience, where some complex parallel structures are consciously grasped, in a way that may appear miraculous. Paradigmatic examples arise, for instance, in the case of trios or quartets of lyric operas. In such cases, the listener perceives a global polyphonic structure; at the same time, he/she is able to follow (at least to a certain extent) the different melodic lines and even the different thoughts and feelings of the characters who are singing. As an example, it may be interesting to consider three great masterpieces of the history of lyric operas: the quartet of Act 1 in Beethoven's Fidelio, the quartet of Act 3 in Verdi's Rigoletto and the trio of Act 3 of Der Rosenkavalier by Richard Strauss. The parallel structures that arise in these three examples have some significant differences both from the musical and from the semantic point of view.

In Fidelio's quartet the psychological contraposition between the four characters (Marzelline, Leonore, Rocco, Jaquino) is realized by means of a single musical theme that is successively sung by the four singers (**Figure 5**).

It is amazing how Beethoven succeeds in expressing, by one and the same theme, different attitudes and emotions: the joyful hope of Marzelline, the doubts and the anguish of Leonore, the paternal satisfaction of Rocco, the jealous rage of Jaquino. The whole context is dominated by strong ambiguities and antagonistic elements: the contrast between an improbable family-portrait and the cruel jail-environment, the contradictions of Rocco (who is at the same time a fond father and an accomplice of the prison-system), the sexual ambiguity of Leonore, the loving heroine who has disguised herself as a man (Fidelio), in the attempt to save her husband, the prisoner Florestan. The musical result is an extraordinary and highly emotional polyphonic construction based on very simple musical components.

The structure of Rigoletto's quartet is completely different. All characters are associated to specific musical themes that are repeated with some variations. The leading musical idea is represented by the wonderful theme sung by the Duke of Mantova at the very beginning (**Figure 6**)<sup>10</sup>.

Like Mozart's Don Giovanni, Verdi's Duke is a cynical seducer, who may appear sweet and sincere with his victims. And music often exalts a paradoxical co-existence of contradictory psychological attitudes. All contrasts are emphasized in the quartet by the sordid environment, where a crime is going to be committed. Maddalena's answer to the Duke is based on a fully different theme, a staccato-sequence of sixteenth-notes (**Figure 7**)<sup>11</sup>.

Both the music and the text reflect Maddalena's ambiguity: she is a prostitute who is playing a traditional seductive role; at the same time she is also instrumental to a murder-project. Gilda's entrance (soon after Maddalena's first phrase) determines a sudden dramatic change. What Gilda sings is a cry of sorrow, interrupted by some short pauses and appoggiaturas that seem to describe desperate sobs (**Figure 8**)<sup>12</sup>.

<sup>10</sup>Fairest daughter of love, I am a slave of your charms; with but a single word you could relieve my every pain. Come touch my breast and feel how my heart is racing. With but a single word you could relieve my every pain.

<sup>11</sup>Ah! Ah! That really makes me laugh, talk like that is cheap enough.

The reasons that may have led Gilda to her unreasonable sacrifice for an unworthy man who had deceived her have often been discussed. Representing Gilda as a naive and modest girl is, however, misleading and in contrast with the greatness expressed by the music. Gilda's death-choice can be perhaps better understood as a suicide, caused by an unendurable disillusion. Rigoletto's role in the quartet is musically less "visible." His mind is completely absorbed in the vengeance-project ("la vendetta") that shall be shortly accomplished. From a musical point of view, the quartet is constructed as a polyphonic structure, where the four voices are interlaced, each preserving its own musical, semantic and psychological autonomy.

Der Rosenkavalier by Strauss belongs to a musical and literary world that is somewhat far both from Fidelio and from Rigoletto. Different forms of ambiguity are exalted in this opera, which is characterized by an extraordinary unity of music and text, written by the great poet Hugo von Hofmannsthal. The theme of sexual ambiguity is here developed by the character of Octavian, the Rosenkavalier whose role is sung by a mezzo-soprano. Although Octavian may recall Mozart's Cherubino, ambiguities are in Strauss' opera more sophisticated: in two different situations Octavian disguises himself as a woman in order to make fun of the rude fiancé of the fascinating girl Sophie. Interestingly enough, some interpreters of the role of Octavian have told how often they have been puzzled by their "oscillating identity" during the opera's performance.

<sup>12</sup>Ah, these are the loving words the scoundrel spoke once to me! O wretched heart betrayed, do not break of sorrow.

A different and deeper "identity-question" is evoked in a splendid aria sung by the Marschallin in Act 1. After a passionate night spent with her lover Octavian, the lady is troubled by some sad thoughts about the flowing of time and the mysterious coexistence of different identities of one and the same person in different stages of life. She sings:

> Aber wie kann das wirklich sein, dass ich die kleine Resi war, und dass ich einmal die alte Frau sein werd' ... Wie kann denn das geschehen? Wie macht denn das der liebe Gott? Wo ich doch immer die gleiche bin. Und wenn er's schon so machen muss, warum lasst er mich denn zuschaun dabei mit gar so klarem Sinn? Warum versteckt er's nicht vor mir? Das alles ist geheim, so viel geheim<sup>13</sup>.

One is dealing with an extraordinary poetic and musical representation of a "hard" scientific and philosophical problem, that modern philosophers of science usually call "the genidentity-question"<sup>14</sup>.

The trio performed at the end of the opera by three female voices (the Marschallin, Sophie, Octavian) is a wonderful polyphonic construction, where the three characters express different thoughts and feelings, which are not generally associated to some specific musical themes (unlike the case of Rigoletto's quartet). The main theme is sung at the very beginning by the Marschallin (**Figure 9**)<sup>15</sup>.

By this deeply moving musical phrase the Marschallin expresses her extreme act of love, which is to renounce love. Her choice might recall what Violetta Valery sings in Verdi's La Traviata:

> Dite alla giovine sì bella e pura<sup>16</sup>

although Violetta and the Marschallin are, of course, completely different characters.

Sophie's entrance in the trio is, in a sense, surprising. She joins in, in the final part of the Marschallin's first phrase, just upon the critical word "andern" ("other"). Her intervention creates a sudden brief dissonance (a minor-second chord), which immediately disappears when the two women (who are both in love with Octavian) harmonically conclude the phrase at a distance of a minor-third. What Sophie perceives is a strange religious atmosphere that she cannot really understand, since she is not aware of the liaison between Octavian and the Marschallin. The incipit of the main theme (the characteristic imprinting of the whole trio) is then immediately transposed to a different key (from D flat major to A major) by Octavian, whose initial attitude seems to be mainly dominated by embarrassing doubts and questions. But finally the reasons of love prevail over all doubts. At the end of the trio, while the two young lovers sing an expected "dich habe ich lieb" ("I love you"), the Marschallin concludes with an enigmatic phrase:

> als wie halt Männer das glücklich sein verstehen<sup>17</sup>.

singing the last note alone over a perfect tonic chord.

<sup>13</sup>But how can it be that I was the little Resi and that I shall be the old lady. .... How can it come to pass? How can God decree it so? While, in fact, I am always the same. And if indeed it must be so, why does he let me look at it so clearly? Why does he not hide it from me? All this is a mystery, a great mystery.

<sup>14</sup>The term "genidentity," which refers to the problematic identity of individuals through time, has been introduced by Lewin (in his doctoral thesis in 1922) and has been further investigated by Reichenbach and many other scholars. See, for instance, Reichenbach (1928).

<sup>15</sup>I promised to love him in the right way, even to love his love for another woman.

<sup>16</sup>Tell the beautiful and pure girl.

The three examples of polyphonic constructions, created by Beethoven, Verdi, and Strauss, are all characterized by strong unitary conceptions, based on complex parallel networks of harmonic, melodic, timbric, and semantic relationships (which have been extensively analyzed in musicological literature<sup>18</sup>). At the same time, one can easily recognize some significant differences that distinguish the three cases, both from the musical and from the semantic point of view. The structure of Fidelio's quartet is very close to a canon-form, where the entrance of each voice is associated to a specific semantic connotation. Rigoletto's quartet is, instead, dominated by strong musical contrasts that reflect the conflicting feelings of four human beings, living in a highly dramatic situation. Finally, Strauss' trio seems to propose a kind of musical and semantic "peaceful resolution." The trio is perceived by the listener as a strongly unitary musical idea that evolves in time. The three female voices are in a sense "entangled," sometimes creating the illusion that a single voice is singing (as happens in the case of some entangled quantum objects, whose parts are indistinguishable). Such musical situations can be naturally represented in the framework of the quantum musical semantics, where musical thoughts are dealt with as holistic ideal objects that vaguely allude to a (possibly infinite) variety of co-existing ideas.

<sup>18</sup>See, for instance, Budden (1983), Solomon (1998), Principe (2004).

The analysis proposed in this article has concerned questions that belong to worlds apparently "far apart": the theory of quantum computers, psychology, logical semantics, and music. A common pattern that arises in all these fields is a frequent and sometimes essential emergence of some characteristic parallel structures. We have seen how the quantum-theoretic concepts of superposition and entanglement have inspired the development of a "bridge-theory" (based on the quantum computational semantics) that can be usefully applied to a formal representation of different kinds of phenomena where parallelism plays a relevant role.

# ACKNOWLEDGMENTS

GS's work has been supported by the Italian Ministry of Scientific Research within the FIRB project "Structures and Dynamics of Knowledge and Cognition," Cagliari Unit F21J12000140001; RL's work has been supported by the Italian Ministry of Scientific Research within the PRIN project "Automata and Formal Languages: Mathematical Aspects and Applications."


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Dalla Chiara, Giuntini, Leporini, Negri and Sergioli. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

<sup>17</sup>as far as men can understand happiness.

# Two types of potential functions and their use in the modeling of information: two applications from the social sciences

### Emmanuel E. Haven\*

*School of Management, Institute for Quantum Social and Cognitive Science and Institute of Finance, University of Leicester, Leicester, UK*

In this paper we consider how two types of potential functions, the real and quantum potential can be shown to be of use in a social science context. The real potential function is a key ingredient in the Hamiltonian framework used in both classical and quantum mechanics. The quantum potential however emerges in a different way in quantum mechanics. In this paper we consider both potentials and we attempt to give them a social science interpretation within the setting of two applications.

#### Edited by:

*Jan Broekaert, Vrije Universiteit Brussel, Belgium*

#### Reviewed by:

*Vyacheslav I. Yukalov, ETH Zurich, Switzerland Belal E. Baaquie, National University of Singapore, Singapore*

#### \*Correspondence:

*Emmanuel E. Haven, School of Management, Institute for Quantum Social and Cognitive Science and Institute of Finance, University of Leicester, University Road, Ken Edwards Building, Room 612, LE1 7RH Leicester, UK eh76@le.ac.uk*

#### Specialty section:

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology*

Received: *17 July 2015* Accepted: *18 September 2015* Published: *21 October 2015*

#### Citation:

*Haven EE (2015) Two types of potential functions and their use in the modeling of information: two applications from the social sciences. Front. Psychol. 6:1513. doi: 10.3389/fpsyg.2015.01513*

Keywords: potential functions, quantum mechanics

# 1. Introduction

Potential functions are to physicists what utility functions are to economists: they are both examples of fundamental workhorse tools. But can there exist some connection? Utility functions, $u$, are defined as: $u: C \to \mathbb{R}$, where $C$ is a set of objects. A preference relation on two objects, $x$ and $y$, such that, say, $x \succ y$ will imply that $u(x) > u(y)$ under the necessary conditions that the preference relation is transitive and asymmetric. This paper will not pretend to be at a level of rigor which has been attained in economics-based preference theory. Examples of such rigor abound in the various expected utility frameworks, and some papers in this special issue will be devoted to probing how deviations from central axioms like the sure-thing principle can be explained with the aid of quantum structures. Our objective in this paper is modest: we would like to inform (and maybe convince) the reader that with the help of two types of potential functions we can model, in a reasonably successful way, information. This information can include parameters which refer to attitudes toward risk (preferences for risk).

In the next section we introduce the basic structure where those two potential functions can exist. In the two sections after that we consider two applications from the social sciences which will attempt to highlight the possible added value of using those potential functions in a non-physics setting. In the last section of the paper we also briefly discuss the relevance of such potential functions in real world market settings.

# 2. Basic Structure where Two Potential Functions can Occur

The so-called stochastic equivalent of the Hamilton-Jacobi equations provides, at least in the view of the author of this paper, a sort of twilight state where we move from classical mechanics to stochastics and then to quantum mechanics. The hydrodynamic approach to quantum mechanics was developed by Edward Nelson, and in this section we provide the essentials of the basic structure which we need to develop the examples in Sections 3 and 4. We use elements from the set up in the paper by Haven (2015). For much more detail, the paper by Nelson (1966) is the essential reference. The book by Paul and Baschnagel (1999) is also an excellent source (see also Haven and Khrennikov, 2013).

As in the paper by Haven (2015), we will follow the approach of Paul and Baschnagel (1999) to Nelson's theory. We also use the same notation. Consider a position in space, $x$, indexed by time (which we denote here by the index $n$). Letting time contract to zero, one can then write, as in Nelson (1966) and Paul and Baschnagel (1999), that $\frac{dx_+}{dt} \simeq \frac{x_{n+1} - x_n}{\epsilon}$ and $\frac{dx_-}{dt} \simeq \frac{x_n - x_{n-1}}{\epsilon}$, where $\epsilon$ denotes the difference in time $t$, and $d$ indicates the infinitesimal differential operator. In the area of finance, the so-called Brownian motion is a very common way of describing the random evolution of an asset price over time. Brownian motions are well-known in physics, especially with the formalization Einstein gave of such motions. As we mention in Haven (2015), Nelson (1966) and Paul and Baschnagel (1999) define two Brownian motions of the following types:

$$dx(t) = b_{+}(x, t)dt + \sigma \, dW(t);\tag{1}$$

where $dW(t)$ is a Wiener process; $\sigma$ is the diffusion coefficient and $b_+(x, t)$ is the so-called drift function. They also define:

$$dx(t) = b_{-}(x, t)dt + \sigma \, dW(t). \tag{2}$$
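A minimal sketch of how a process of the type in Equation (1) can be simulated is given below. It uses the Euler-Maruyama discretization, which is a choice made here for illustration and is not taken from the cited papers; the function names and the parameter values are likewise hypothetical. Averaging many simulated end points shows that the noise term contributes nothing to the mean when the drift is constant.

```python
import math
import random


def simulate_path(drift, sigma, x0, total_time, n_steps, rng):
    """Euler-Maruyama end point of dx = drift(x, t) dt + sigma dW."""
    dt = total_time / n_steps
    x, t = x0, 0.0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Wiener increment over dt
        x += drift(x, t) * dt + sigma * dw
        t += dt
    return x


def mean_endpoint(drift, sigma, x0, total_time, n_steps, n_paths, seed=0):
    """Sample mean of the end point over n_paths independent paths."""
    rng = random.Random(seed)
    total = sum(simulate_path(drift, sigma, x0, total_time, n_steps, rng)
                for _ in range(n_paths))
    return total / n_paths


# Constant forward drift b_+ = 0.5 and diffusion coefficient sigma = 0.2
# (illustrative values only): the mean end point should be near x0 + 0.5 * T.
print(mean_endpoint(lambda x, t: 0.5, 0.2, 1.0, 1.0, 50, 2000))
```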

What is now the difference between the two drift functions? Still following the set up in Haven (2015), Nelson (1966), and Paul and Baschnagel (1999) define:

$$D_{+}x(t) = \lim_{\epsilon \to 0} E\left[\frac{x_{n+1} - x_{n}}{\epsilon}\right] = b_{+}(x, t), \tag{3}$$

and also:

$$D\_{-}x(t) = \lim\_{\epsilon \to 0} E\left[\frac{x\_n - x\_{n-1}}{\epsilon}\right] = b\_{-}(x, t). \tag{4}$$

Note that $E$ is the expectation operator. This paper has mentioned in its introduction that we consider two types of potentials. But what are they? The real potential is the first type, well-known from elementary classical mechanics. The real potential formalizes potential energy. The second type is the so-called quantum potential, which emerges from inserting the polar form of a wave function into the Schrödinger partial differential equation. The uses of those potentials were first proposed by Khrennikov (1999), more than 10 years ago. Haven (2015)<sup>1</sup> indicates that a key component of the so-called quantum potential can be written as follows:

$$\begin{split} \frac{\nabla^2 R'}{R'} &= \frac{1}{\sigma^2} \frac{\partial}{\partial x} \left[ \frac{1}{2} \left( b_+(x, t) - b_-(x, t) \right) \right] \\ &+ \frac{1}{4\sigma^4} \left[ b_+(x, t)^2 - 2b_+(x, t) b_-(x, t) + b_-(x, t)^2 \right], \end{split}\tag{5}$$

<sup>1</sup>This is Equation (25) in that paper.

where $R'$ is a scalar field. We then argue in Haven (2015)<sup>2</sup> that if we set $b_+(x, t) = b$ and $b_-(x, t) = c$, where $b \neq c$; $b, c \in \mathbb{R}$:

$$\frac{\nabla^2 R'}{R'} = \frac{1}{4\sigma^4} \left[b^2 - 2bc + c^2\right] = \frac{1}{4\sigma^4} \left(b - c\right)^2. \tag{6}$$

If one considers the classical mechanical equivalent of the quantum potential, one multiplies $\frac{\nabla^2 R'}{R'}$ with $\frac{-m\sigma^4}{2}$, where $m$ is mass. We note that use is made of the conversion: $\sigma^2 = \frac{\hbar}{m}$. We note that such conversion requires further discussion (see Nelson, 1985). Hence, we obtain, as in Haven (2015)<sup>3</sup>:

$$\frac{-m\sigma^4}{2}\frac{\nabla^2 R'}{R'} = \frac{-m}{8}(b-c)^2. \tag{7}$$
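Equations (5) to (7) can be checked numerically with a short sketch. The function names `key_component` and `quantum_potential` are hypothetical, the spatial derivative in Equation (5) is approximated by a central difference, and the parameter values are illustrative only.

```python
def key_component(b_plus, b_minus, sigma, x, t=0.0, dx=1e-5):
    """Right-hand side of Equation (5); d/dx is a central difference."""
    def half_difference(y):
        return 0.5 * (b_plus(y, t) - b_minus(y, t))

    derivative = (half_difference(x + dx) - half_difference(x - dx)) / (2.0 * dx)
    bp, bm = b_plus(x, t), b_minus(x, t)
    return derivative / sigma**2 + (bp**2 - 2.0 * bp * bm + bm**2) / (4.0 * sigma**4)


def quantum_potential(b_plus, b_minus, sigma, m, x, t=0.0):
    """Classical-mechanical equivalent: -(m sigma^4 / 2) times Equation (5)."""
    return -m * sigma**4 / 2.0 * key_component(b_plus, b_minus, sigma, x, t)


# Constant drifts b and c, as in Equations (6) and (7).
b, c, sigma, m = 0.3, 0.1, 0.5, 2.0
q = quantum_potential(lambda x, t: b, lambda x, t: c, sigma, m, x=1.0)
print(q, -m / 8.0 * (b - c) ** 2)  # the two values should agree
```

With constant drifts the derivative term vanishes, so the result does not depend on $x$ or on $\sigma$, which is what Equation (7) states.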

We are now ready to consider our first application.

# 3. Application 1: Three Examples Showing Which Additional Information is Brought on by the Use of the Quantum Potential

The Newtonian motion with both the real and quantum potentials is: $m \cdot a = -\nabla(V + Q)$ (see for instance Haven and Khrennikov, 2013, for more detail). If one considers the force, $-\nabla Q$, to be applied on Equation (7), one can see immediately that:

$$-\nabla Q = -\frac{\partial}{\partial x} \left[ \frac{-m\sigma^4}{2} \frac{\nabla^2 R'}{R'} \right] = \frac{-\partial}{\partial x} \left[ \frac{-m}{8} (b - c)^2 \right] = 0. \tag{8}$$

The force derived from the real potential, $-\nabla V$, is $-b_+(x, t)$ or $-b_-(x, t)$. From an economics point of view, such force can be interpreted as an expected return. Hence, this force can incorporate preferences for risk.

In order to give an interpretation to the quantum potential, we need to re-consider Equation (5) but now for the case where $b_+(x, t)$ and/or $b_-(x, t)$ are not constant. Hence, let us consider the simple case where $b_+(x, t) = \mu x$, where we can set that $\mu$ is now the expected return. We note that taking $\mu$ to be such an expected return parallels what is done in financial economics, where the drift term of the geometric Brownian motion is a product of the expected return and the position variable (i.e., the value of the stock price for instance). Let us assume, for simplicity, that $b_-(x, t) = 0$. In this case, we are not in Newtonian mechanics since we are now explicitly imposing that $b_+(x, t) \neq b_-(x, t)$. We re-consider Equation (5) again:

$$\frac{\nabla^2 R'}{R'} = \frac{1}{\sigma^2} \frac{\partial}{\partial x} \left[ \frac{1}{2} \left( \mu x \right) \right] + \frac{1}{4 \sigma^4} \left[ \mu^2 x^2 \right]. \tag{9}$$

<sup>2</sup>This is Equation (26).

<sup>3</sup>This is Equation (36).

The quantum potential, $Q$, is written as: $\frac{-m\sigma^4}{2}\frac{\nabla^2 R'}{R'} = \frac{-m\sigma^2}{2}\frac{\partial}{\partial x}\left[\frac{1}{2}(\mu x)\right] - \frac{m}{8}\left[\mu^2 x^2\right]$. The force is then:

$$-\nabla Q = -\frac{\partial}{\partial x} \left[ \frac{-m\sigma^2}{2} \frac{\partial}{\partial x} \left[ \frac{1}{2} \left( \mu x \right) \right] - \frac{m}{8} \left[ \mu^2 x^2 \right] \right]. \tag{10}$$

This yields then:

$$-\nabla Q = \frac{m}{4}\mu^2 x. \tag{11}$$
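As a worked step, and assuming only Equation (10) as written, the inner derivative is the constant $\frac{\mu}{2}$, so the first term in the bracket does not depend on $x$ and drops out under the outer derivative:

$$-\nabla Q = -\frac{\partial}{\partial x}\left[-\frac{m\sigma^2\mu}{4} - \frac{m\mu^2 x^2}{8}\right] = 0 + \frac{m\mu^2}{8}\cdot 2x = \frac{m}{4}\mu^2 x.$$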

If we assume that the force on the real potential is $-\nabla V = -b_+(x, t) = -\mu x$, then one can also write that:

$$-\nabla V - \nabla Q = -\mu x + \frac{m}{4} \mu^2 x. \tag{12}$$

We observe from the above, that besides the information received, via the force on the real potential, i.e., the expected return times the position, additional information is now injected via the force on the quantum potential.

If we were to let $b_+(x, t) = \mu x^2$, then using Equation (5):

$$\frac{\nabla^2 R'}{R'} = \frac{1}{\sigma^2} \frac{\partial}{\partial x} \left[ \frac{1}{2} \left( \mu x^2 \right) \right] + \frac{1}{4 \sigma^4} \left[ \mu^2 x^4 \right]. \tag{13}$$

Using $\frac{-m\sigma^4}{2}\frac{\nabla^2 R'}{R'}$, the force delivered by both the quantum and real potentials (assuming $-\nabla V = -b_+(x, t) = -\mu x^2$) is:

$$-\nabla V - \nabla Q = -\mu x^2 + \frac{m\mu^2}{2} x^3 + \frac{m\sigma^2}{2} \mu. \tag{14}$$

If we were to set $b_+(x, t) = \mu$, then [see Equation (8)], we would only be able to write that:

$$-\nabla V - \nabla Q = -\mu \tag{15}$$
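The three cases, Equations (12), (14), and (15), can be reproduced numerically. The sketch below is only an illustration under the same assumptions as the text ($b_-(x, t) = 0$ and $-\nabla V = -b_+(x, t)$); the function names, the finite-difference step, and the parameter values are hypothetical choices.

```python
def quantum_potential(b_plus, sigma, m, x, dx=1e-4):
    """Q = -(m sigma^4 / 2) * (right-hand side of Equation (5)), b_minus = 0."""
    slope = (b_plus(x + dx) - b_plus(x - dx)) / (2.0 * dx)  # d b_plus / dx
    ratio = 0.5 * slope / sigma**2 + b_plus(x) ** 2 / (4.0 * sigma**4)
    return -m * sigma**4 / 2.0 * ratio


def total_force(b_plus, sigma, m, x, dx=1e-4):
    """-grad V - grad Q, assuming -grad V = -b_plus(x) as in the text."""
    grad_q = (quantum_potential(b_plus, sigma, m, x + dx)
              - quantum_potential(b_plus, sigma, m, x - dx)) / (2.0 * dx)
    return -b_plus(x) - grad_q


mu, sigma, m, x = 0.1, 0.3, 1.0, 2.0  # illustrative values only

linear = total_force(lambda y: mu * y, sigma, m, x)
quadratic = total_force(lambda y: mu * y**2, sigma, m, x)
constant = total_force(lambda y: mu, sigma, m, x)

# Each pair should agree up to finite-difference error.
print(linear, -mu * x + m / 4 * mu**2 * x)                                   # Eq. (12)
print(quadratic, -mu * x**2 + m * mu**2 / 2 * x**3 + m * sigma**2 / 2 * mu)  # Eq. (14)
print(constant, -mu)                                                         # Eq. (15)
```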

Let us compare those simple cases, Equations (12), (14), and (15). We can observe that additional terms are added to the force on the real potential. Remark however that we have assumed that the drift term in the pure Newtonian environment is the same as the drift terms $b_+(x, t)$ and $b_-(x, t)$. If we translate the pure Newtonian environment into the Nelson framework we obtain $\nabla R(x, t) = \frac{E\left[\frac{dx_+}{dt} - \frac{dx_-}{dt}\right]}{2\sigma^2} = 0$ and therefore, in that setting, $R(x, t)$ is constant. This would mean that the density function in the Nelson framework, $e^{2R(x,t)}$, would be constant. The quantum potential would also be zero. Hence, the equivalent information of the pure Newtonian setting into a Nelsonian setting would be senseless. However, the quantum potential still is comparable to the real potential; after all, one can write: $-\nabla V - \nabla Q = m \cdot a$. This is in some sense a dilemma.

Consider again Equations (12, 14 and 15), and let us re-write slightly, as follows:

• Under the Newtonian based theory if the expected return (i.e., force on real potential) is the expected return µ then the additional information (brought by the gradient of the quantum potential) is nil


Given the reasonableness of this force on the quantum potential to exist (i.e., volatility is not zero and the uncertainty given by $\frac{x_{n+1} - x_n}{\epsilon} \neq \frac{x_n - x_{n-1}}{\epsilon}$ exists), it would then be reasonable to claim that if the expected drift is given as $\mu x$ in a Newtonian world [where $\frac{x_{n+1} - x_n}{\epsilon} = \frac{x_n - x_{n-1}}{\epsilon}$ and $\sigma = 0$] then in a Nelson world this information would need to be augmented with: $\frac{1}{4}\mu^2 x$. Similarly, if the expected drift is given as $\mu x^2$ in a Newtonian world [where $\frac{x_{n+1} - x_n}{\epsilon} = \frac{x_n - x_{n-1}}{\epsilon}$ and $\sigma = 0$] then in a Nelson world this information would need to be augmented with: $\frac{\sigma^2 \mu}{2} + \frac{\mu^2}{2} x^3$.

Note one very important issue, which concerns the setting of $m = 1$ in the above discussion. The Nelson theory allows for a transition from pure Newtonian mechanics into a stochastic environment [with the use of $R(x, t)$ as a scalar field] and from there, a further transition into quantum mechanics [with the use of $R(x, t)$ now as an input into the wave function]. This transition from stochastics to quantum mechanics also goes via the setting of $\sigma^2 = \frac{\hbar}{m}$.

For a given finite $\sigma^2$, in a quantum mechanical context when $\hbar = \sigma^2 m$, it would mean that $m$ should be extremely small indeed. The question becomes whether the level of $m$ has a continuum of values when transiting from the stochastic environment toward the quantum mechanics environment. If one considers the case $-\mu x + \frac{m}{4}\mu^2 x$ then when $m$ is extremely small, the term $\frac{m}{4}\mu^2 x$ would need to be extremely small. The same can be said for: $-\mu x^2 + \frac{m\mu^2}{2} x^3 + \frac{m\sigma^2}{2}\mu$, where the terms which are added to $-\mu x^2$ are then small too, because of small $m$: $\frac{m\mu^2}{2} x^3 + \frac{m\sigma^2}{2}\mu$. It would be a major achievement if indeed we could find how $m$ behaves when transiting from Newtonian → stochastics (with $R$ as scalar field) → quantum mechanics (with $R$ as an input to the wave function).

# 4. Application 2. An Example of How the Real and Quantum Potentials can be used in Lux's Noise Trader Infection Model

The "noise trader/infection" model was developed by Lux (1997) and we use it here to highlight the applications we can make, in a financial economics framework, of the potentials we have treated in our paper. From Equation (5), we can observe that $\sigma^2$ is essential to define the quantum potential. We will make the simple assumption, for the purposes of the model treated here, that $E\left[\frac{dx_-}{dt}\right] = 0$. There is a non-zero uncertainty due to the fact that $E\left[\frac{dx_+}{dt}\right] \neq E\left[\frac{dx_-}{dt}\right]$.

### 4.1. Brief Set up of the Lux Model

The model has two main types of traders, fundamental and chartist (also called noise) traders. The total number of chartist traders is $2N$. They are divided into two subgroups, $n_+$ (the number of noise traders who are positive about the development of the market) and $n_-$ (the number of noise traders who are negative about the development of the market) and hence $n_+ + n_- = 2N$. An opinion index, which is, in the words of Lux (1997) (p. 8) "the distribution of attitudes among the 'population'" is also constructed and it is defined as: $x = \frac{n_+ - n_-}{2N}$. Remark that in the model, chartist traders may change from one subgroup to the other. The smallest difference in the opinion index is given by $\pm\frac{1}{N}$ and the asset price's minimal change is ± one cent. The central equation in which we are interested for the purposes of our paper is as follows (Equation 3.7b in Lux, 1997):

$$\frac{d\overline{p}}{dt} = \beta \left( \overline{x} T_c + T_f \left( p_f - \overline{p} \right) \right);\tag{16}$$

where $\beta$ is defined as (Lux, 1997, p. 8): "a parameter for the average speed of price adjustment in the presence of excess demand"; $\overline{x}$ is the average distribution of attitudes; $T_c$ measures the trading volume of the chartist traders; $T_f$ measures the trading volume of the fundamental traders; $p_f$ is the perceived fundamental value of the asset and $\overline{p}$ is the expected price.
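To make the role of each parameter in Equation (16) concrete, the following sketch integrates the equation with a simple Euler scheme. It assumes, as a simplification that is not part of Lux's model, that $\overline{x}$ is held constant; the function names and the parameter values are hypothetical. Under that assumption the price relaxes to $p_f + \overline{x} T_c / T_f$, the value at which the right-hand side of Equation (16) vanishes.

```python
def mean_price_path(p0, x_bar, beta, t_c, t_f, p_f, dt, n_steps):
    """Euler integration of Equation (16) with x_bar held constant."""
    p = p0
    path = [p]
    for _ in range(n_steps):
        p += beta * (x_bar * t_c + t_f * (p_f - p)) * dt
        path.append(p)
    return path


def stationary_price(x_bar, t_c, t_f, p_f):
    """Price at which the right-hand side of Equation (16) is zero."""
    return p_f + x_bar * t_c / t_f


# Illustrative values only: optimistic chartists (x_bar > 0) hold the
# price above the perceived fundamental value p_f.
path = mean_price_path(p0=10.0, x_bar=0.5, beta=1.0, t_c=2.0, t_f=4.0,
                       p_f=10.0, dt=0.01, n_steps=2000)
print(path[-1], stationary_price(0.5, 2.0, 4.0, 10.0))
```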

### 4.2. Embedding Lux's Model in the Quantum/Real Potential Environment

If we want to embed the above model in the quantum/real potential model presented here in this paper, then a departure of Lux's model could be as follows:

$$d\overline{p} = \beta \left( \overline{x} T_c + T_f \left( p_f - \overline{p} \right) \right) dt + dW;\tag{17}$$

where dW is a Wiener process as defined before. Embedding this departure of Lux's model in the quantum/real potential model, we can then write that:

$$E\left[\frac{d\overline{p}}{dt}\right] = \overline{x}\beta T_c + \beta T_f p_f - \overline{p}\beta T_f + E(dW);\tag{18}$$

which can then be re-written as, using $E(dW) = 0$:

$$E\left[\frac{d\overline{p}}{dt}\right] = \overline{x}\beta T_c + \beta T_f p_f - \overline{p}\beta T_f.\tag{19}$$
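The step from Equation (17) to Equation (19) can be illustrated by simulation: averaging many noisy paths of Equation (17) should recover the behavior implied by the noise-free drift. The sketch below is an assumption-laden illustration, not part of Lux's model: $\overline{x}$ is held constant, an Euler-Maruyama discretization is used, and the function name and parameter values are hypothetical.

```python
import math
import random


def ensemble_mean_price(p0, x_bar, beta, t_c, t_f, p_f,
                        dt, n_steps, n_paths, seed=0):
    """Average end price over n_paths Euler-Maruyama paths of Equation (17)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        p = p0
        for _ in range(n_steps):
            drift = beta * (x_bar * t_c + t_f * (p_f - p))
            p += drift * dt + rng.gauss(0.0, math.sqrt(dt))  # + dW
        total += p
    return total / n_paths


# Illustrative values only. The drift alone would settle at
# p_f + x_bar * t_c / t_f = 10.25; the ensemble mean should lie close to it.
print(ensemble_mean_price(10.0, 0.5, 1.0, 2.0, 4.0, 10.0,
                          dt=0.01, n_steps=300, n_paths=1000))
```

Individual paths keep fluctuating around that level; only the average over paths is smooth, which is the sense in which the expectation operator removes the $dW$ term.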

Remark that the $\overline{x}$ parameter could be interpreted as being closely linked to some implicit preference functional, which in this model is driven by chartists. We remark that Equation (3.7a; p. 13) in Lux (1997) provides for the time dependent evolution of the $\overline{x}$ parameter.

### 4.3. Consequence of the Absence of Expectation Operators

Remark that if we were to write Equation (16) (as it is thus written in Lux's model), as a result of having it embedded in our quantum/real potential model (thus now without expectation operator) then this would imply that dW dt = 0. If we now go back to the importance of expectation operators in Nelson's theory we can say the following. Imagine we were to not use such an operator on dW dt . For instance, can we then still write that dx<sup>+</sup> dt = b+(x, t), using the Brownian motion Equation (1)? The answer to this question would impose the requirement that no time reversibility can exist. The argument is quite straightforward. Let us follow the arguments of Merton (1990) (please see also Neftci, 2000, for a treatment of Merton's arguments which we follow here) who shows that dW dt , as is well-known, can not be defined with ordinary derivatives. This problem is circumvented in the Nelson theory by using E h dW dt i , where E(dW) = 0.

Assume we want to impose that $\frac{dW}{dt} = 0$ and we thus allow for the use of no expectation operators. We note again that we are well aware that $\frac{dW}{dt}$ does not exist. But let us do a quick thought experiment and assume it were to exist. What would be the consequences?

Define elapsed time as $h = t\_k - t\_{k-1}$ and let $n = \frac{T}{h^m}$ with $m > 1$, where $T$ is total time. Following Neftci (2000), assume there exists a quantity $A\_2$ so that

$$\infty > A\_2 > \sum\_{k=1}^{\infty} E\left[\Delta W\_k^2\right],$$

and a quantity $A\_3$ so that $\frac{E\left[\Delta W\_k^2\right]}{V\_{\max}} > A\_3$ with $A\_3 \in \,]0, 1[$, where $V\_{\max} = \max\_k E\left[\Delta W\_k^2\right]$. In the proof which shows that $E\left[\Delta W\_k^2\right] = \sigma\_k^2 h$, the following relation is essential (see Neftci, 2000):

$$\frac{h}{T}\frac{A\_2}{A\_3} > E\left[\Delta W\_k^2\right] > \frac{A\_3 A\_1}{T} h,$$

where $0 < A\_1 < E\left[\Delta W\_k^2\right]$. To show under what conditions $\frac{dW}{dt} = 0$ (thus without expectation operator), we want to impose, using $n = \frac{T}{h^m}$ ($m > 1$), that $\frac{h^m}{T}\frac{A\_2}{A\_3} > V\_{\max}$. Clearly, if $h$ is small ($< 1$), then $h^m$ ($m > 1$) will be smaller than $h$. Hence, it is reasonable to write that $\frac{h^m}{T}\frac{A\_2}{A\_3} < \frac{h}{T}\frac{A\_2}{A\_3}$. Hence, if we still want $\frac{h^m}{T}\frac{A\_2}{A\_3} > V\_{\max}$, we must impose that

$$m > \frac{\log V\_{\max} + \log \frac{A\_3 T}{A\_2}}{\log h}.$$

We know that $A\_3 > 0$ and $A\_2 > A\_1 > 0$. However, $h = t\_k - t\_{k-1}$ must clearly be positive! We can then write that

$$\frac{h^m}{T}\frac{A\_2}{A\_3} > E\left[\Delta W\_k^2\right] > \frac{A\_3 A\_1}{T} h^m,$$

and one can then define $E\left[\Delta W\_k^2\right] = \sigma\_k^2 h^m$. We can approximate

$$\frac{dW}{dt} \simeq \lim\_{h \to 0} \frac{\Delta W}{h} = \lim\_{h \to 0} \frac{h^{\frac{m}{2}}}{h} = \lim\_{h \to 0} h^{\frac{m-2}{2}},$$

which for $m > 2$ will yield 0.

Thus, we obtain that $\frac{dW}{dt} \simeq 0$ in non-expectation-operator form when (i) $m > \frac{\log V\_{\max} + \log \frac{A\_3 T}{A\_2}}{\log h}$ and (ii) $m > 2$, and therefore we must impose that $\frac{\log V\_{\max} + \log \frac{A\_3 T}{A\_2}}{\log h} = 2$. This condition would mean that the uncertainty concentrates in a very specific period of time, $V\_{\max} = \frac{h^2 A\_2}{A\_3 T}$. It is clear from the condition $\frac{\log V\_{\max} + \log \frac{A\_3 T}{A\_2}}{\log h} = 2$ that there cannot exist time reversibility, since $h > 0$ (and $h \neq 1$) is needed for $\log h$ to be valid. This may, prima facie, "prove" that the expectation operators which have been imposed on $\frac{dW}{dt}$ in Nelson's theory are intrinsically connected to time reversibility.
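The scaling argument of this thought experiment can be checked with a few lines of arithmetic. The sketch below is only an illustration under stated assumptions: it sets the variance coefficient to 1, so that the typical size of the increment is h raised to the power m/2, and it uses hypothetical values for the bounds A2, A3 and the horizon T. The function names are not from the source.

```python
import math

def increment_ratio(h, m):
    """Typical size of (Delta W) / h when E[Delta W^2] = h**m, i.e. h**((m - 2) / 2)."""
    return h ** ((m - 2) / 2)

def threshold_exponent(V_max, A2, A3, T, h):
    """Left-hand side of the condition (log V_max + log(A3 * T / A2)) / log h = 2."""
    return (math.log(V_max) + math.log(A3 * T / A2)) / math.log(h)

# Hypothetical values for the bounds; V_max is then fixed by V_max = h^2 * A2 / (A3 * T).
h, A2, A3, T = 0.01, 2.0, 0.5, 1.0
V_max = h ** 2 * A2 / (A3 * T)
exponent = threshold_exponent(V_max, A2, A3, T, h)

# The ratio vanishes as h shrinks only for m > 2; for the usual Wiener case m = 1 it grows.
ratios_m3 = [increment_ratio(step, 3) for step in (1e-2, 1e-4, 1e-6)]
ratios_m1 = [increment_ratio(step, 1) for step in (1e-2, 1e-4, 1e-6)]
```

With m = 1, the standard Wiener scaling, the ratio grows without bound as h shrinks. This is the usual reason the derivative of W is said not to exist in the ordinary sense.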

A commutativity rule such as $E\left[x\_n \frac{x\_{n+1} - x\_n}{\epsilon} - \frac{x\_n - x\_{n-1}}{\epsilon} x\_n\right] \equiv E(C)$ also needs to use such operators. Without the expectation operators, setting $x\_n$ fixed, $x\_n \left[\frac{dx\_+}{dt} - E\left(\frac{dx\_-}{dt}\right)\right]$ is possible. Time reversibility is obtained via the expectation operator. In the set-up by Nelson, a function $u(x, t)$ is defined as $u(x, t) = \frac{1}{2}\left(b\_+(x, t) - b\_-(x, t)\right)$, and this function is zero in Newtonian mechanics. We also remark in Haven (2015)<sup>4</sup> that Nelson (1966) and Paul and Baschnagel (1999) define $u(x, t) = \frac{\sigma^2}{2} \nabla \ln f$, where $f$ is a probability density function (for instance $f = e^{2R(x,t)}$).

From the above, we cannot define $\frac{dx\_-}{dt}$ (since this would require time reversibility) and hence we cannot define $u(x, t) = \frac{\sigma^2}{2} \nabla \ln f$. This is the case because the relation is obtained under time reversal on the Fokker-Planck pde, $\frac{\partial f(x,t)}{\partial t} + \nabla \left(b\_+(x, t) f(x, t)\right) + \frac{\sigma^2}{2} \Delta f(x, t) = 0$, which is now impossible since $\log h$ is to exist. If $u(x, t)$ cannot be defined, then $\frac{\nabla^2 R'}{R'} = \Delta R + \left(\frac{u(x,t)}{\sigma^2}\right)^2$ is not definable.

# 4.4. What are the Real and Quantum Potential in Lux's Model?

Using the real potential, $V$, we can write that $E\left[\frac{dx}{dt}\right] = \nabla V = b\_{\pm}(x, t)$. In full analogy with this, we now write Equation (19) as:

$$
\nabla V = \overline{\mathbf{x}} \beta \, T\_c + \beta \, T\_f \mathbf{p}\_f - \overline{\mathbf{p}} \beta \, T\_f;\tag{20}
$$

which we write in shorthand format as $\nabla V = \alpha - \overline{p}\beta T\_f$. Recall Equation (5), which gave the expression for the "quantum potential":

$$\begin{split} \frac{\nabla^2 R'}{R'} &= \quad \frac{1}{\sigma^2} \frac{\partial}{\partial \mathbf{x}} \left[ \frac{1}{2} \left( b\_+ (\mathbf{x}, t) - b\_- (\mathbf{x}, t) \right) \right] \\ &+ \frac{1}{4\sigma^4} \left[ b\_+ (\mathbf{x}, t)^2 - 2b\_+ (\mathbf{x}, t) b\_- (\mathbf{x}, t) + b\_- (\mathbf{x}, t)^2 \right]. \end{split} \tag{5}$$

Using Equation (20) in Equation (5), we obtain:

$$\frac{\nabla^2 R'}{R'} = \frac{1}{\sigma^2} \frac{\partial}{\partial \overline{p}} \left[ \frac{1}{2} (\alpha - \overline{p}\beta T\_f) \right] + \frac{1}{4\sigma^4} \left[ (\alpha - \overline{p}\beta T\_f)^2 \right];\tag{21}$$

where we note that $\overline{x}$ is calculated as per Lux (1997), $\overline{x} = \sum\_x \sum\_p x \, L(x, p; t)$, where $L(x, p; t)$ is defined as the probability to occupy a state $(x, p)$; i.e., $x$ is the distribution of attitudes and $p$ is price, and hence indirectly $x$ is a function of $p$ via the probability $L$. Similarly, note that $\overline{p} = \sum\_x \sum\_p p \, L(x, p; t)$. Finally, $T\_c$ in Lux (1997) is defined as $T\_c \equiv 2N t\_c$, where $2N$ is the total noise trader population and $t\_c$ is the amount the chartist individually buys or sells. We assume that $t\_c$ is not dependent on $p$.

Simplifying Equation (21) is now straightforward and leads to:

$$\frac{\nabla^2 R'}{R'} = \frac{1}{\sigma^2} \frac{1}{2} (-\beta T\_f) + \frac{1}{4\sigma^4} \left[ (\alpha - \overline{p}\beta T\_f)^2 \right] \tag{22}$$

<sup>4</sup>This is Equation (19).

Frontiers in Psychology | www.frontiersin.org October 2015 | Volume 6 | Article 1513 |

Then, multiplying Equation (22) by $\frac{-m\sigma^4}{2}$ and also taking $-\frac{d}{d\overline{p}}$ so as to get the force, we obtain:

$$-\nabla Q = -\frac{d}{d\overline{p}} \left[ \left( \frac{-m\sigma^4}{2} \right) \frac{\nabla^2 R'}{R'} \right] = \frac{m}{4} \left( \alpha - \overline{p}\beta T\_f \right) (-\beta T\_f). \tag{23}$$

Recall from Equation (20), that we now can write:

$$-\nabla V = \overline{p}\beta T\_f - \alpha. \tag{24}$$

Recall that the m factor in −∇Q can indeed be very small if we move toward a quantum mechanical environment. If we thus write: −∇V − ∇Q, we then obtain:

$$-\nabla V - \nabla Q = \overline{p}\beta T\_f - \alpha + \frac{m}{4}\left(\alpha - \overline{p}\beta T\_f\right)(-\beta T\_f). \tag{25}$$

This can be simplified to:

$$(\overline{p}\beta T\_f - \alpha) \left[1 + \beta \, T\_f \left(\frac{m}{4}\right)\right]. \tag{26}$$

Clearly, if $m \to 0$ and $\sigma^2 \neq 0$, then $\beta T\_f \frac{m}{4}$ is indeed very small.
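The chain from Equation (22) to Equation (26) can be verified numerically. The sketch below assumes, as in the text, that the shorthand term α and the product of β and T_f do not depend on the mean price. It differentiates the quantum potential by central differences and compares the result with the closed forms in Equations (23) and (26). The parameter values and function names are hypothetical.

```python
def quantum_term(p, alpha, beta, T_f, sigma):
    """Right-hand side of Equation (22)."""
    grad_V = alpha - p * beta * T_f
    return (-beta * T_f) / (2 * sigma ** 2) + grad_V ** 2 / (4 * sigma ** 4)

def quantum_force_numeric(p, alpha, beta, T_f, sigma, m, eps=1e-5):
    """-dQ/dp by central differences, with Q = (-m * sigma^4 / 2) * Equation (22)."""
    def Q(q):
        return (-m * sigma ** 4 / 2) * quantum_term(q, alpha, beta, T_f, sigma)
    return -(Q(p + eps) - Q(p - eps)) / (2 * eps)

def quantum_force_closed(p, alpha, beta, T_f, m):
    """Right-hand side of Equation (23)."""
    return (m / 4) * (alpha - p * beta * T_f) * (-beta * T_f)

def total_force(p, alpha, beta, T_f, m):
    """Equation (26): the sum of the real and quantum forces."""
    return (p * beta * T_f - alpha) * (1 + beta * T_f * m / 4)

# Hypothetical parameter values.
p, alpha, beta, T_f, sigma, m = 100.0, 102.0, 0.5, 2.0, 1.5, 0.1
real_force = p * beta * T_f - alpha                  # Equation (24)
numeric = quantum_force_numeric(p, alpha, beta, T_f, sigma, m)
closed = quantum_force_closed(p, alpha, beta, T_f, m)
combined = total_force(p, alpha, beta, T_f, m)
```

Because the quantum potential is quadratic in the mean price here, the central difference agrees with Equation (23) up to rounding. The volatility cancels out of the force, so only the mass parameter m controls the relative weight of the quantum term.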

We can also write $\frac{\nabla V}{\nabla R} = 2\sigma^2$ and hence $\nabla R = \frac{\nabla V}{2\sigma^2}$. We can then write that:

$$R(\overline{p}) = \frac{1}{2\sigma^2} \int \left( \alpha - \overline{p}\beta \, T\_f \right) d\overline{p};\tag{27}$$

which is worked out as $R(\overline{p}) = \frac{1}{2\sigma^2}(\alpha \overline{p}) - \frac{1}{4\sigma^2} \beta T\_f \overline{p}^2 + C$. Recall that $s = \frac{1}{1+r} \int \exp(2R(p, t)) \, dp$, where $s$ is the state price. In the context of the model we consider here this would yield:

$$s = \frac{1}{1+r} \int \exp\left( \left( \frac{1}{\sigma^2} (\alpha \overline{p}) - \frac{1}{2\sigma^2} \beta \, T\_f \overline{p}^2 + C \right) \right) d\,\overline{p}.\tag{28}$$

This means that the state price (an insurance price which is paid to guarantee a financial outcome when a particular state of nature occurs) is now dependent, using Lux's model embedded in our quantum/real potential model, on (i) the volatility; (ii) the expected price of an asset (which chartists and fundamental buyers buy); (iii) the parameter $\beta$ for the average speed of price adjustment in the presence of excess demand; (iv) the term $\alpha = \overline{x}\beta T\_c + \beta T\_f p\_f$. Note that this density function uses the amplitude function $R$ from $\nabla R = \frac{\nabla V}{2\sigma^2}$. The quantum potential in its full form is absent from this relation, but parts of that potential (i.e., $\nabla R$) still figure in the equation. In this formulation the preference factor (via $x$) would be embedded in $p$. The volatility parameter seems to be the "conduit" factor which links the changes in $R$ with the changes in $V$, via $\nabla V = 2\sigma^2 \nabla R$. Thus, Equation (28) exists here only if this "conduit" factor exists (i.e., if $\sigma^2$ is not zero).
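To make the dependence of the state price on the model parameters concrete, the integral in Equation (28) can be evaluated under additional assumptions that are not stated in the text: the term α is treated as a constant, the product of β and T_f is positive, and the integration runs over the whole real line. Under those assumptions the integrand is Gaussian, and a standard Gaussian integral gives a closed form. The sketch below compares that closed form with a plain trapezoidal sum, using hypothetical parameter values and function names.

```python
import math

def exponent(p, alpha, beta, T_f, sigma, C=0.0):
    """Argument of the exponential in Equation (28)."""
    return alpha * p / sigma ** 2 - beta * T_f * p ** 2 / (2 * sigma ** 2) + C

def state_price_numeric(alpha, beta, T_f, sigma, r, C=0.0, lo=-20.0, hi=20.0, n=40_000):
    """Trapezoidal approximation of Equation (28) on [lo, hi] (an assumed range)."""
    step = (hi - lo) / n
    total = 0.5 * (math.exp(exponent(lo, alpha, beta, T_f, sigma, C))
                   + math.exp(exponent(hi, alpha, beta, T_f, sigma, C)))
    for k in range(1, n):
        total += math.exp(exponent(lo + k * step, alpha, beta, T_f, sigma, C))
    return total * step / (1 + r)

def state_price_closed(alpha, beta, T_f, sigma, r, C=0.0):
    """Gaussian-integral closed form, valid only when beta * T_f > 0."""
    a = beta * T_f / sigma ** 2   # coefficient of p^2 / 2 in the exponent
    b = alpha / sigma ** 2        # coefficient of p in the exponent
    return math.exp(C) * math.sqrt(2 * math.pi / a) * math.exp(b * b / (2 * a)) / (1 + r)

# Hypothetical parameter values, kept small so the exponential stays well within range.
args = dict(alpha=1.0, beta=0.5, T_f=2.0, sigma=1.0, r=0.05)
s_numeric = state_price_numeric(**args)
s_closed = state_price_closed(**args)
```

The closed form shows that the integral diverges as the volatility tends to zero. This is consistent with the remark that Equation (28) exists only if the "conduit" factor is non-zero.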

# 5. What is the Relevance of Real and Quantum Potentials in Empirical Work?

This paper has not yet answered a crucial question, for which we thank one of the referees of this paper: can the quantum and/or real potentials have empirical relevance? Fortunately, the answer is positive. Recent work by Belal Baaquie shows quite clearly how real potential functions can be estimated for traded commodities (see Baaquie, 2013). The same author also shows how the minimization of a real potential function, when that function is defined as the sum of supply and demand functions, yields a more general version of the equilibrium price which is well known in basic economics. The quantum potential can also be estimated from real market data. In the paper by Tahmasebi et al. (2015), the quantum potential is estimated for the Standard & Poor's (S&P) Index. More work is needed on how the path derived from the extended Newtonian motion (i.e., with two potentials) can be used to emulate price behavior over time.

# 6. Conclusion

We have attempted to show that preferences for risk seem to be captured by both types of potentials (i.e., the real and quantum potentials). When embedding the basics of the Lux model in the types of potential approach proposed in this paper, we seem to find that the quantum potential's influence depends on the mass parameter. This parameter varies depending on whether we are far removed (or not) from the quantum physical limit. The last section of the paper does indicate that the real and quantum potentials can be estimated within real financial data settings. But the question may remain, if the proposed analysis in this paper is of any value, how one can interpret the quantum potential in light of those real data interpretations? In the paper by Tahmasebi et al. (2015) it is shown quite clearly that a quantum potential with infinite walls occurs for the S&P Index for short time scales, and when the time scale grows those infinite walls disappear and the quantum potential for the S&P index for large time scales resembles the quantum potential for Gaussian white noise. Price variation is deemed to be very small for small time scales, but allowed to be larger for large time scales. This is intuitively acceptable.

From a Newtonian price path point of view, considering for instance Equation (25), our analysis in this paper seems to indicate that the influence of the quantum potential (next to the real potential) on the price path may vary. But to pinpoint, in an economic sense, what this parameter of variation really means is very difficult.

# References


Nelson, E. (1985). Quantum Fluctuations. Princeton, NJ: Princeton University Press.


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Haven. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Quantum-like model of unconscious–conscious dynamics

#### Andrei Khrennikov \*

Department of Mathematics, Mathematical Institute, Linnaeus University, Växjö, Sweden

We present a quantum-like model of sensation–perception dynamics (originating in Helmholtz's theory of unconscious inference) based on the theory of quantum apparatuses and instruments. We illustrate our approach with the model of bistable perception of a particular ambiguous figure, the Schröder stair. This is a concrete model for unconscious and conscious processing of information and their interaction. The starting point of our quantum-like journey was the observation that perception dynamics is essentially contextual, which implies the impossibility of (straightforward) embedding of experimental statistical data in the classical (Kolmogorov, 1933) framework of probability theory. This motivates the application of nonclassical probabilistic schemes. The quantum formalism provides a variety of well-established and mathematically elegant probabilistic schemes to handle results of measurements. The theory of quantum apparatuses and instruments is the most general quantum scheme describing measurements, and it is natural to explore it to model the sensation–perception dynamics. In particular, this theory provides the scheme of indirect quantum measurements, which we apply to model unconscious inference leading to the transition from sensations to perceptions.

#### Edited by:

Sandro Sozzo, University of Leicester, UK

#### Reviewed by:

George Kachergis, New York University, USA Harald Atmanspacher, Collegium Helveticum, Switzerland

#### \*Correspondence:

Andrei Khrennikov, Department of Mathematics, Mathematical Institute, Linnaeus University, Universitetsplatsen 1, Växjö S-35195, Sweden andrei.khrennikov@lnu.se

#### Specialty section:

This article was submitted to Cognition, a section of the journal Frontiers in Psychology

Received: 18 March 2015 Accepted: 02 July 2015 Published: 03 August 2015

#### Citation:

Khrennikov A (2015) Quantum-like model of unconscious–conscious dynamics. Front. Psychol. 6:997. doi: 10.3389/fpsyg.2015.00997

Keywords: sensation, perception, quantum-like model, quantum apparatuses and instruments, bistable perception, unconscious inference

# 1. Introduction

In recent years the mathematical formalism of quantum mechanics was applied to a variety of problems outside of quantum physics: from molecular biology and genetics to cognition and decision making (see the monographs, Khrennikov, 2010b; Busemeyer and Bruza, 2012; Haven and Khrennikov, 2012) and the extended lists of references in them as well as in the papers (Aerts et al., 2014; Khrennikov et al., 2014).

The problem of mathematical modeling of bistable perception and, more generally, unconscious inference<sup>1</sup> is that it can be rather complex and that its nature is not understood well enough to allow one to choose the optimal model. In spite of tremendous efforts during the last 200 years, this problem cannot be considered fully solved (cf. Newman et al., 1996; Laming, 1997). In this note we apply the theory of quantum apparatuses and instruments (Davies and Lewis, 1970; Busch et al., 1995; Ozawa, 1997) to quantum-like modeling of sensation–perception dynamics as a concrete example of unconscious and conscious processing of information and their interaction. Our model can be applied to general unconscious–conscious information processing. It generalizes the quantum-like model developed in Khrennikov (2004). We also point out that this paper is the first attempt to apply the theory of quantum apparatuses and instruments outside of physics, to cognition and psychology.

<sup>1</sup>Unconscious inference (Conclusion) is a term of perceptual psychology invented by von Helmholtz (von Helmholtz, 1866; Boring, 1942) to describe an involuntary, pre-rational and reflex-like mechanism which is part of the formation of visual impressions.

Special quantum structures were elaborated in order to mathematically represent most general measurement schemes and are applicable both in classical and quantum physics and, practically, in any domain of science. They generalize the pioneer quantum measurement representation by operators of the projection type, also known as von Neumann–Lüders measurements. In quantum physics, this new general framework is of vital importance since the projection type measurements do not completely cover real experimental situations (Davies and Lewis, 1970; Busch et al., 1995; Ozawa, 1997; Nielsen and Chuang, 2000). It seems that the same holds true in mathematical modeling in cognition and psychology (see Asano et al., 2010a,b; Khrennikov, 2010b; Asano et al., 2011, 2012; Khrennikov and Basieva, 2014; Khrennikov et al., 2014), although here the situation is not yet absolutely clear and, obviously, the underlying reason for using quantum instruments is different.

To motivate the use of the theory of quantum apparatuses and instruments, we shall compare it first to classical probabilistic methods and then to simpler quantum-like models of processing data from cognitive science and psychology based on the von Neumann–Lüders measurements. A detailed discussion on violation of laws of classical probability theory by statistical data collected in cognitive science and psychology can be found in Khrennikov, 2010b and SS. We can, for example, point to the order effect (Khrennikov, 2010b; Wang and Busemeyer, 2013) and the disjunction effect (Khrennikov, 2010b; Busemeyer and Bruza, 2012). In the probabilistic terms these are just various exhibitions of violation of the formula of total probability. In general, during recent years quantum probability and decision making were successfully applied to describe a variety of problems, paradoxes, and probability judgment fallacies, such as Allais paradox (humans violate Von Neumann–Morgenstern expected utility axioms), Ellsberg paradox (humans violate Aumann–Savage subjective utility axioms) (see e.g., Haven et al., 2009; Asano et al., 2010a,b, 2011, 2012; Busemeyer et al., 2011; Pothos and Busemeyer, 2013; Wang and Busemeyer, 2013; Aerts et al., 2014; Khrennikov and Basieva, 2014). Psychologists and economists explore the new way inspired by one simple fact from physics: quantum probability can work in situations where classical probability does not. Why? Answers may differ (see Khrennikov, 2010b). We point to contextuality of data as one of the main sources of its non-classicality (Khrennikov, 2010b; Dzhafarov and Kujala, 2012a,b, 2013).

As was pointed out, at the beginning of quantum theory physicists attempted to represent the quantum measurements they were dealing with by projectors. The same attitude could be observed in applications of the quantum formalism outside of physics. Granted, some statistical psychological effects can be nicely described with the help of the von Neumann–Lüders measurements (see e.g., Haven et al., 2009; Busemeyer et al., 2011; Busemeyer and Bruza, 2012; Pothos and Busemeyer, 2013; Wang and Busemeyer, 2013; Aerts et al., 2014). However, more detailed analysis showed (Asano et al., 2010a,b; Khrennikov, 2010b; Asano et al., 2011, 2012; Khrennikov and Basieva, 2014; Khrennikov et al., 2014) that, in general, data from cognitive psychology cannot be embedded into the projection-measurement scheme. Therefore, it is natural to follow the development of quantum physics and proceed within a general theory of measurements.

In this paper we do this by illustrating the general theory of quantum instruments with one concrete example: bistable perception of a concrete ambiguous figure, the Schröder stair. Why do we use a quantum-like model? Here the argument is more complicated than in the case of the order and disjunction effects and other probability fallacies mentioned above. The deviation from classical probability theory is expressed not as a violation of the formula of total probability, but as a violation of one of the Bell-type inequalities, namely, the Garg–Leggett inequality (Asano et al., 2014). We point out that the Bell-type inequalities play an important role in modern quantum physics. If such an inequality is violated, then the data cannot fit a classical probability space. As was shown in our previous study, the data collected in a series of experiments performed at Tokyo University of Science (see Asano et al., 2014, for details) violate the Garg–Leggett inequality (statistically significantly)<sup>2</sup>.

The first step toward the creation of a quantum-like model of bistable perception was made by Atmanspacher and Filk (2012, 2013). We studied this problem in Asano et al. (2014), where we demonstrated a violation of the Garg–Leggett inequality for experimental probabilistic data collected for a rotating image of the Schröder stair (the experiment was performed at Tokyo University of Science). In Accardi et al. (in press) we presented a quantum-like adaptive dynamical model for bistable perception. The latter is based on a more general formalism than the theory of quantum instruments, namely the theory of adaptive quantum systems. In the present paper, the traditional approach to quantum measurement theory is used for modeling the sensation–perception transition and unconscious inference.

Finally, we point out that violation of laws of classical probability theory is a statistical exhibition of violation of laws of classical Boolean logic. Thus, in logical terms the quantum-like modeling of cognition is modeling of a nonclassical reasoning, decision making, and problem solving. In particular, in our model unconscious inference, generation of a perception from a sensation, is not based on the rules of classical logics. We also remark that the so called quantum logic corresponding to the quantum formalism is just one special type of nonclassical logic. In principle, there are no reasons to assume that human (mental) cognition, even if it has a non-Boolean structure, can be modeled completely with the aid of quantum logic and quantum probability. Still more general models might be explored, see (Khrennikov and Basieva, 2014) for a discussion.

<sup>2</sup>We remark that the formula of total probability and the Bell-type inequalities can be treated as just two special statistical tests of non-classicality of the data (see Conte et al., 2008; Bruza et al., 2010; Khrennikov, 2010b; Asano et al., 2014; Dzhafarov and Kujala, 2014, for discussion). This is the "minimal interpretation." In quantum physics the standard interpretation of these inequalities is related to whether we can proceed with a realistic and local model. The Garg–Leggett inequality is a rather special type of Bell inequality, since it concerns time correlations for a single system, whereas the original Bell inequality concerns spatial correlations for pairs of systems.

# 2. Advantageousness of Quantum Instrumental Modeling in Cognitive Psychology

We emphasize that, as well as quantum physics (Plotnitsky, 2006, 2009), cognitive and social sciences also can be treated as theories of measurements. A great deal of effort has been put into the development of measurement formalisms, cf. with, e.g., the time-honored Stimulus–Organism–Response (S–O–R) scheme for explaining cognitive behavior (Woodworth, 1921). Just like the situation in quantum physics, cognitive and social scientists cannot approach the mental world directly; they work with results of observations. Both quantum physics and cognitive and social sciences are fundamentally based on operational formalisms for observations.

The basic notions of the operational formalism for quantum measurement theory are the quantum apparatus and the quantum instrument (Davies and Lewis, 1970; Busch et al., 1995; Ozawa, 1997). Quantum apparatuses are mathematical structures representing, at a high level of abstraction, physical apparatuses used for measurements. They encode the probabilities of the results of observations as well as the back-actions of the measurements on the states of physical systems. Such back-actions are mathematically represented with the aid of another important mathematical structure, a quantum instrument. Our aim is to explore the theory of quantum apparatuses and instruments, and especially its part devoted to indirect measurements, in cognitive and social sciences.

The scheme of indirect measurements is very useful for applications, both in quantum physics and in the humanities. In this scheme, besides the "principal system" S, a probe system S′ is considered. A measurement on S is composed of the unitary interaction with S′ and a subsequent measurement on the latter.

In our cognitive modeling, S represents unconscious information processing and S′ conscious information processing. In the concrete example of Helmholtz unconscious inference, S represents the processing of sensation (its unconscious nature was already emphasized by Helmholtz) and S′ represents the processing of perception, i.e., the conscious representation of sensation.

This approach provides a possibility to extend the class of quantum measurements which originally were only von Neumann–Lüders measurements of the projection type. Such an extension serves not only the natural seeking of generality. Generalized quantum measurements have some new features. Here we shall concentrate only on those of them relevant to our project on quantum-like cognition.

For us, one of the main problems of exploring solely projective (direct) measurements is their fundamentally invasive nature: as the feedback of a measurement, the quantum state is "aggressively modified"—it is projected onto the subspace corresponding to the result of this measurement. In any event, this feature is not so natural for the dynamics of sensation and perception states. Of course, each "perception–creation" modifies the states of sensation and perception, but these modifications are not of the collapse type, as they should be in the case of projections.

Important for our applications is that a variety of different quantum instruments (describing back-reaction transformations resulting from measurements) can correspond to one and the same observable on the principal system S. That is, measurements having the same statistical results may lead to very different state transformations (due to very different types of interaction between the principal and probe systems). In quantum mechanics (as emphasized by Ozawa, 1997), the same observable can be measured by different apparatuses having different state-transforming quantum instruments. This is a very important characteristic of the theory of generalized quantum measurements. It is also very useful for cognitive modeling, since it reflects the individuality of the measurement apparatuses/instruments which are used by cognitive systems (e.g., human beings) to generate the same perception.

We point out that the scheme of indirect measurements accounts for state dynamics in the process of measurement, which is not just a "yes"/"no" collapse as in the original von Neumann–Lüders approach. The possibility to mathematically describe the mental state dynamics in the process of perception–creation by means of the quantum formalism is very attractive. A study in this direction was already presented in the work of Pothos and Busemeyer (2013), although without appealing to the operational approach to quantum mechanics. In the series of works of Asano et al. (2010a,b, 2011, 2012), the process of decision making was described by a novel scheme of measurements generalizing the standard theory of quantum apparatuses and instruments.

Now we list once again the main advantageous properties of the quantum instrument/apparatus modeling in cognitive psychology:


# 3. Quantum States

We start with a brief introduction to the quantum basics and define pure and mixed quantum states. The state space of a quantum system is a complex Hilbert space. Denote it by $H$. This is a complex linear space endowed with a scalar product, a positive-definite non-degenerate Hermitian form. Denote the latter by $\langle \cdot | \cdot \rangle$. It generates the norm on $H$: $\|\psi\| = \sqrt{\langle \psi | \psi \rangle}$.

A reader who does not feel comfortable in the abstract framework of functional analysis can simply proceed with the Hilbert space $H = \mathbf{C}^n$, where $\mathbf{C}$ is the set of complex numbers, and the scalar product $\langle u | v \rangle = \sum\_i u\_i \overline{v}\_i$, $u = (u\_1, \ldots, u\_n)$, $v = (v\_1, \ldots, v\_n)$. Instead of linear operators, one can consider matrices.

Pure quantum states are represented by normalized vectors, $\psi \in H$: $\|\psi\| = 1$. Two collinear vectors, $\psi' = \lambda\psi$, $\lambda \in \mathbf{C}$, $|\lambda| = 1$, represent the same pure state. Each pure state can also be represented as the projection operator $P\_\psi$, which projects $H$ onto the one dimensional subspace based on $\psi$. For a vector $\phi \in H$, $P\_\psi \phi = \langle \phi | \psi \rangle \psi$. Any projector is a Hermitian and positive-definite operator<sup>3</sup>. We also remark that the trace of the one dimensional projector $P\_\psi$ equals 1: $\mathrm{Tr}\, P\_\psi = 1$. (We recall that, for a linear operator $A$, its trace can be defined as the sum of diagonal elements of its matrix in any orthonormal basis: $\mathrm{Tr}\, A = \sum\_i a\_{ii}$.) We summarize these properties of an operator (matrix) $\rho = P\_\psi$ representing a pure state. It is (1) Hermitian; (2) positive-definite; (3) of unit trace, $\mathrm{Tr}\, \rho = 1$; (4) idempotent, $\rho^2 = \rho$.


A linear operator is an orthogonal projector if and only if it satisfies (1) and (4); in particular, (2) is a consequence of (4). The properties (1–4) are characteristic for one dimensional orthogonal projectors—pure states [for a projector, (3) implies that it is one dimensional], i.e., any operator satisfying (1–4) represents a pure state.
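For readers who prefer matrices, the characterization of a pure state as a Hermitian, unit-trace, idempotent operator can be checked directly in two dimensions. The sketch below is an illustration only: the state vector is an arbitrary example, and the helper names are not from the source. It builds the matrix of the projector from the rule that the projector maps a vector to its scalar product with the state vector times the state vector, with the scalar product linear in its first argument.

```python
import math

def projector(psi):
    """Matrix of P_psi: entry (i, j) is psi_i * conj(psi_j)."""
    return [[a * complex(b).conjugate() for b in psi] for a in psi]

def matmul(A, B):
    """Product of two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def dagger(A):
    """Hermitian conjugate (conjugate transpose)."""
    n = len(A)
    return [[complex(A[j][i]).conjugate() for j in range(n)] for i in range(n)]

def trace(A):
    """Sum of the diagonal elements."""
    return sum(A[i][i] for i in range(len(A)))

def close(A, B, tol=1e-12):
    """Entrywise comparison up to a small tolerance."""
    n = len(A)
    return all(abs(A[i][j] - B[i][j]) <= tol for i in range(n) for j in range(n))

# An arbitrary normalized vector in C^2, chosen for illustration.
psi = [1 / math.sqrt(2), 1j / math.sqrt(2)]
P = projector(psi)

is_hermitian = close(P, dagger(P))
has_unit_trace = abs(trace(P) - 1) <= 1e-12
is_idempotent = close(matmul(P, P), P)
```

Positivity is not tested separately here because, for a Hermitian operator, it follows from idempotence.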

The next step in the development of quantum mechanics was the extension of the class of quantum states, from pure states represented by one dimensional projectors to states represented by linear operators (matrices) having the properties (1–3). Such operators (matrices) are called density operators (density matrices). (This nontrivial step of extension of the class of quantum states was based on the efforts of Landau and von Neumann.) One typically distinguishes pure states, as represented by one dimensional projectors, and mixed states, those density operators which cannot be represented by one dimensional projectors. The terminology "mixed" has the following origin: any density operator can be represented as a "mixture" of pure states $(\psi\_i)$:

$$\rho = \sum\_{i} p\_i P\_{\psi\_i}, \ p\_i \in [0, 1], \sum\_{i} p\_i = 1. \tag{1}$$

The state is pure if and only if such a mixture is trivial: all $p\_i$ except one are equal to zero. However, when operating with the terminology "mixed state," one has to take into account that the representation in the form of Equation (1) is not unique. The same mixed state can be interpreted as a mixture of different collections of pure states.

Any operator $\rho$ satisfying (1–3) is diagonalizable (even in an infinite-dimensional Hilbert space), i.e., in some orthonormal basis it is represented as a diagonal matrix, $\rho = \mathrm{diag}(p\_j)$, where $p\_j \in [0, 1]$, $\sum\_j p\_j = 1$. Thus, it can be represented in the form of Equation (1) with mutually orthogonal one dimensional projectors. The property (4) can be used to check whether a state is pure or not. We point out that pure states are merely mathematical abstractions; in real experimental situations it is possible to prepare only mixed states; one defines the degree of purity as $\mathrm{Tr}[\rho^2 - \rho]$. Experimenters are satisfied when the magnitude of this quantity is less than some small $\epsilon$.
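Both the non-uniqueness of the mixture representation in Equation (1) and the purity measure can be seen in a two dimensional example. The sketch below is illustrative only: the two bases and the helper names are choices made for this example, not constructions from the source. It mixes two different pairs of orthogonal pure states with equal weights and shows that they give the same density matrix.

```python
import math

def projector(psi):
    """Matrix of P_psi: entry (i, j) is psi_i * conj(psi_j)."""
    return [[a * complex(b).conjugate() for b in psi] for a in psi]

def mix(weights, states):
    """Density matrix sum_i p_i P_{psi_i} from Equation (1)."""
    n = len(states[0])
    rho = [[0j] * n for _ in range(n)]
    for w, psi in zip(weights, states):
        P = projector(psi)
        for i in range(n):
            for j in range(n):
                rho[i][j] += w * P[i][j]
    return rho

def purity_defect(rho):
    """Tr[rho^2 - rho]: zero for a pure state, negative for a mixed one."""
    n = len(rho)
    tr_rho2 = sum(rho[i][k] * rho[k][i] for i in range(n) for k in range(n))
    tr_rho = sum(rho[i][i] for i in range(n))
    return (tr_rho2 - tr_rho).real

s = 1 / math.sqrt(2)
e0, e1 = [1, 0], [0, 1]          # one orthonormal basis of C^2
plus, minus = [s, s], [s, -s]    # a second, rotated orthonormal basis

rho_a = mix([0.5, 0.5], [e0, e1])
rho_b = mix([0.5, 0.5], [plus, minus])
rho_pure = mix([1.0], [plus])

same_state = all(abs(rho_a[i][j] - rho_b[i][j]) < 1e-12 for i in range(2) for j in range(2))
```

The trace of the difference between the squared density matrix and the density matrix is never positive, so in practice it is its magnitude that is compared with a small threshold.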

# 4. Atomic Instruments/Apparatuses

The notions of instrument and apparatus are based on a very simple and natural consideration. Consider systems of any origin (physical, biological, social, financial). Suppose that the states of such systems can be represented by points of some set $X$. These are statistical states, i.e., by knowing the state of a system one can determine the values of observables only with some probabilities. Then, for each state $x \in X$, observable $A$, and concrete value $a\_i$ of $A$, there is defined a map

$$p\_i = f\_{A, a\_i}(\mathbf{x}) \tag{2}$$

giving the probability of the result A = a<sub>i</sub> for systems in the state x ∈ X. Here f<sub>A,a<sub>i</sub></sub> : X → [0, 1]. It is then natural to assume that the measurement modifies the state x, i.e., there is defined another map

$$\mathbf{x}\_{i} = \mathbf{g}\_{A, a\_{i}}(\mathbf{x}),\tag{3}$$

where g<sub>A,a<sub>i</sub></sub> : X → X. This scheme is applicable both in classical and quantum physics as well as in psychology, cf. the Stimulus–Organism–Response (S–O–R) scheme for explaining the behavior (Woodworth, 1921) of humans and other cognitive systems.

For a fixed observable A, the system of state transformation maps (g<sub>A,a<sub>i</sub></sub>) corresponding to all possible values (a<sub>i</sub>) of A is called an instrument, and the collection of maps (f<sub>A,a<sub>i</sub></sub>; g<sub>A,a<sub>i</sub></sub>) is called an apparatus. Of course, this scheme is too general and, to get something fruitful, one has to select a state space X having a special structure and special classes of f- and g-maps. Quantum theory is characterized by the selection of a state space starting with a complex Hilbert space. This choice leads to the theory of quantum instruments and apparatuses.
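Before specializing to Hilbert space, the general scheme of Equations (2) and (3) can be sketched for a purely classical state space. In the illustration below, X is the set of probability vectors over two hidden hypotheses, the f-maps are total probabilities and the g-maps are Bayesian updates. The class `Apparatus`, the table `LIKELIHOOD`, and its numbers are hypothetical choices made only for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

State = Tuple[float, ...]   # here: a probability vector over hidden hypotheses

@dataclass
class Apparatus:
    """Maps f (outcome probabilities) and g (state updates), one pair per value a_i."""
    f: Dict[str, Callable[[State], float]]
    g: Dict[str, Callable[[State], State]]

    def measure(self, x: State) -> Dict[str, Tuple[float, State]]:
        """Return {a_i: (p_i, x_i)} as in Equations (2) and (3)."""
        return {a: (self.f[a](x), self.g[a](x)) for a in self.f}

# Hypothetical likelihoods P(answer | hypothesis) for a two-hypothesis system.
LIKELIHOOD = {"yes": (0.9, 0.2), "no": (0.1, 0.8)}

def make_f(a: str) -> Callable[[State], float]:
    # Total probability of the answer a in the state x.
    return lambda x: sum(xi * li for xi, li in zip(x, LIKELIHOOD[a]))

def make_g(a: str) -> Callable[[State], State]:
    def g(x: State) -> State:
        joint = [xi * li for xi, li in zip(x, LIKELIHOOD[a])]
        total = sum(joint)
        return tuple(j / total for j in joint)   # Bayes posterior
    return g

apparatus = Apparatus(f={a: make_f(a) for a in LIKELIHOOD},
                      g={a: make_g(a) for a in LIKELIHOOD})
result = apparatus.measure((0.5, 0.5))
print(result)
```

The quantum case replaces the probability vector by a density operator and the Bayesian update by a quantum operation, while the overall shape of the apparatus stays the same.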

The general theory of quantum measurements is mathematically advanced (see Section 9). Therefore, it is useful to illustrate it by a simple example. We consider the simplest class of quantum instruments extending the class of von Neumann–Lüders instruments of the projection type. These are atomic instruments.

Suppose that the range of values of a measurement, the spectrum of an observable, is discrete: O = {a<sub>1</sub>, ..., a<sub>n</sub>}. The main point of the theory of instruments is that each measurement resulting in a concrete value a<sub>i</sub> generates a feedback action on the original state ρ of a quantum system, i.e., ρ is transformed into a new state ρ<sub>a<sub>i</sub></sub>, see Equation (3):

$$
\rho \to \rho\_{a\_i}.\tag{4}
$$

<sup>3</sup>We recall that a linear operator A in H is called Hermitian if it coincides with its adjoint operator, A = A<sup>⋆</sup>. If an orthonormal basis (e<sub>i</sub>) in H is fixed and A is represented by its matrix, A = (a<sub>ij</sub>), where a<sub>ij</sub> = ⟨Ae<sub>i</sub>|e<sub>j</sub>⟩, then it is Hermitian if and only if ā<sub>ij</sub> = a<sub>ji</sub>. A linear operator is positive-definite if, for any φ ∈ H, ⟨Aφ|φ⟩ ≥ 0. This is equivalent to positive definiteness of its matrix. We remark that, for a Hermitian operator, all its eigenvalues are real.

We start with the standard von Neumann–Lüders measurements, which give us an important class of quantum instruments/apparatuses (especially from the historical viewpoint). These measurements are mathematically represented by Hermitian operators,

$$A = \sum\_{i} a\_i P\_{a\_i},\tag{5}$$

where P<sub>a<sub>i</sub></sub> is the projector onto the eigensubspace corresponding to the eigenvalue a<sub>i</sub>. For pure states, the transformation (Equation 4) is based on the projection P<sub>a<sub>i</sub></sub>:

$$
\psi \to P\_{a\_i} \psi,\tag{6}
$$

this map is linear, and it is convenient to work with it. However, if P<sub>a<sub>i</sub></sub> ≠ I, where I is the unit operator, then in general ‖P<sub>a<sub>i</sub></sub>ψ‖ < 1, so the output of Equation (6) is not a state. To get a state, it has to be normalized by its norm:

$$
\psi \to \frac{P\_{a\_i}\psi}{\|P\_{a\_i}\psi\|}. \tag{7}
$$

This is a map from the space of pure states into the space of pure states, but it is nonlinear. This type of feedback reaction to the result of measurement was postulated by von Neumann. It is well known as the projection postulate of quantum mechanics (also called the state reduction postulate or the state collapse postulate; see Khrennikov and Basieva, 2014, for a psychologist-friendly discussion of these postulates and their role in quantum physics, cognitive psychology, and psychophysics)<sup>4</sup>.

Now, for a pure state ψ, one can consider its representation by the density operator ρ = P<sub>ψ</sub>. In such terms, the state transform (Equation 6) can be written as

$$
\rho \to P\_{a\_i} \rho P\_{a\_i}.\tag{8}
$$

This is the simplest example of a transformation which in quantum measurement theory is called a quantum operation. It can be extended to a linear map from the space of linear operators (matrices) to itself by the same formula (Equation 8). For a finite spectral set O, the collection of quantum operations (Equation 8), a<sub>i</sub> ∈ O, gives the simplest example of a quantum instrument.

We are again interested in a map from the space of density operators (matrices) to itself, see Equation (4). Thus, we again have to normalize:

$$
\rho \to \rho\_{a\_i} = \frac{\mathbf{P}\_{a\_i} \rho \mathbf{P}\_{a\_i}}{\text{Tr } \mathbf{P}\_{a\_i} \rho \mathbf{P}\_{a\_i}}.\tag{9}
$$

This map is nonlinear, and physicists work with quantum operations (forming instruments), normalizing by the trace only at the final step of calculations, which can involve a chain of measurements.

<sup>4</sup>It is less known (in fact, practically unknown) that von Neumann sharply distinguished the case of observables with non-degenerate spectra, i.e., all (P<sub>a<sub>i</sub></sub>) in the spectral decomposition of A, see Equation (5), are one-dimensional projectors, from the case of degenerate spectra, i.e., some of the (P<sub>a<sub>i</sub></sub>) are projectors onto multidimensional subspaces. In the first case he postulated the aforementioned state collapse (Equation 7), but in the second case he pointed out that the measurement feedback can generate state transformations different from the one given by Equation (7); in particular, the output for an initial pure state can be a mixed state. Later Lüders extended the von Neumann projection postulate even to observables with degenerate spectra, i.e., in fact, he reduced the class of possible state transformations (quantum operations). This simplification was convenient in theoretical studies and the projection postulate was widely treated as applicable generally, i.e., even to observables with degenerate spectra. The name of Lüders was washed out from the majority of foundational works and nowadays the projection postulate is typically known as the von Neumann projection postulate (see Khrennikov, 2008, for more details).

However, we are primarily interested not in the measurement feedback on the initial quantum state ρ, but in the probabilities of getting the results a<sub>i</sub> ∈ O. Denote them p(a<sub>i</sub>|ρ). Here they are given by Born's rule. If the initial state is pure, ρ = P<sub>ψ</sub>, then

$$p(a\_i|\psi) = \langle P\_{a\_i}\psi|\psi\rangle = \|P\_{a\_i}\psi\|^2. \tag{10}$$

It is easy to see that

$$p(a\_i|\psi) = \operatorname{Tr} P\_{a\_i} P\_{\psi}.\tag{11}$$

This formula can be easily generalized, e.g., via Equation (1), to an arbitrary initial state ρ:

$$p(a\_i|\rho) = \operatorname{Tr} P\_{a\_i} \rho. \tag{12}$$
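Born's rule (Equation 12) and the normalized state update (Equation 9) can be combined into one small routine. The following sketch assumes a two-dimensional state space and the projectors onto the standard basis vectors; the function name `luders_apparatus` is introduced here for illustration only.

```python
import numpy as np

def luders_apparatus(projectors, rho):
    """Return [(p(a_i|rho), rho_{a_i})] for a family of orthogonal projectors."""
    results = []
    for P in projectors:
        unnormalized = P @ rho @ P             # quantum operation, Equation (8)
        prob = np.trace(P @ rho).real          # Born's rule, Equation (12)
        # Normalization by the trace, Equation (9); undefined if prob is zero.
        post = unnormalized / prob if prob > 1e-12 else None
        results.append((prob, post))
    return results

P0 = np.array([[1, 0], [0, 0]], dtype=complex)
P1 = np.array([[0, 0], [0, 1]], dtype=complex)

psi = np.array([np.sqrt(0.3), np.sqrt(0.7)], dtype=complex)
rho = np.outer(psi, psi.conj())                # pure state rho = P_psi

outcomes = luders_apparatus([P0, P1], rho)
probs = [p for p, _ in outcomes]
print(probs)
```

Because both projectors are one-dimensional, each post-measurement state is again a one-dimensional projector, in line with the projection postulate for non-degenerate spectra.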

A quantum apparatus is the combination of feedback state transformations, i.e., a quantum instrument, and detection probabilities.

In the von Neumann–Lüders approach the quantum instrument is uniquely determined by an observable, the Hermitian operator A. The latter is the basis of the construction. However, even in this approach we could start directly with an instrument determined by a family of mutually orthogonal projectors (P<sub>a<sub>i</sub></sub>), i.e.,

$$\sum\_{i} P\_{a\_i} = I,\tag{13}$$

where P<sub>a<sub>i</sub></sub> ⊥ P<sub>a<sub>j</sub></sub>, i ≠ j, and then define the observable A simply as this family (P<sub>a<sub>i</sub></sub>). In quantum information the values a<sub>i</sub> merely have the meaning of labels for the results of measurement. For future generalization, we remark that the normalization condition (Equation 13) can be written as

$$\sum\_{i} P\_{a\_i}^{\star} P\_{a\_i} = I,\tag{14}$$

because, for any orthogonal projector P, P<sup>⋆</sup> = P and P<sup>2</sup> = P.

Now we move to general atomic instruments and apparatuses. Here quantum operations have the form:

$$
\rho \to Q\_{a\_i} \rho Q\_{a\_i}^{\star}, \tag{15}
$$

where, for each value a<sub>i</sub>, Q<sub>a<sub>i</sub></sub> is a linear operator which is a contraction (i.e., its norm is bounded by 1). These operators are constrained by the normalization condition, cf. Equation (14):

$$\sum\_{i} Q\_{a\_{i}}^{\star} Q\_{a\_{i}} = I.\tag{16}$$

These operations determine an atomic quantum instrument. Each quantum operation induces the corresponding state transformation:

$$
\rho \to \rho\_{a\_i} = \frac{Q\_{a\_i} \rho Q\_{a\_i}^{\star}}{\operatorname{Tr} Q\_{a\_i} \rho Q\_{a\_i}^{\star}}.\tag{17}
$$

In particular, pure states are transformed into pure states (similar to the von Neumann–Lüders measurements):

$$
\psi \rightarrow \frac{Q\_{a\_i} \psi}{\|Q\_{a\_i} \psi\|}. \tag{18}
$$

Probabilities of the results of measurements are given by the following generalization of Equation (12):

$$p(a\_i|\rho) = \operatorname{Tr} M\_{a\_i}\rho,\tag{19}$$

where

$$M\_{a\_i} = Q\_{a\_i}^{\star} Q\_{a\_i}.\tag{20}$$

(We remark that if Q<sub>a<sub>i</sub></sub> is a projector, then Q<sup>⋆</sup><sub>a<sub>i</sub></sub> = Q<sub>a<sub>i</sub></sub> and Q<sup>2</sup><sub>a<sub>i</sub></sub> = Q<sub>a<sub>i</sub></sub>. Thus, in this case Equation (19) matches Equation (12).) In this way we obtain the corresponding quantum instrument.

The class of atomic instruments and apparatuses is the most direct generalization of the von Neumann–Lüders class. In general, however, quantum instruments do not transform pure states into pure states; see Appendix.
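An atomic apparatus can be sketched in the same way as the projective one. The operators Q below are hypothetical diagonal contractions chosen only so that the normalization condition (Equation 16) holds; they are not taken from the text. The update is written as QρQ<sup>⋆</sup>, which coincides with QρQ for the Hermitian operators used here.

```python
import numpy as np

def dagger(Q):
    """Adjoint (conjugate transpose) of an operator."""
    return Q.conj().T

def atomic_apparatus(Qs, rho):
    """Return [(p(a_i|rho), rho_{a_i})] for operators Q_i with sum Q_i* Q_i = I."""
    identity = np.eye(rho.shape[0])
    if not np.allclose(sum(dagger(Q) @ Q for Q in Qs), identity):
        raise ValueError("normalization condition (Equation 16) is violated")
    out = []
    for Q in Qs:
        M = dagger(Q) @ Q                        # Equation (20)
        prob = np.trace(M @ rho).real            # Equation (19)
        post = Q @ rho @ dagger(Q) / prob if prob > 1e-12 else None  # Equation (17)
        out.append((prob, post))
    return out

# Hypothetical "unsharp" measurement operators (not projectors).
Q0 = np.diag(np.sqrt([0.9, 0.2])).astype(complex)
Q1 = np.diag(np.sqrt([0.1, 0.8])).astype(complex)

plus = np.array([1, 1], dtype=complex) / np.sqrt(2)
rho = np.outer(plus, plus.conj())

outcomes = atomic_apparatus([Q0, Q1], rho)
probs = [p for p, _ in outcomes]
post0 = outcomes[0][1]
purity0 = np.trace(post0 @ post0).real           # equals 1 for a pure state
print(probs, purity0)
```

The post-measurement state for a pure input stays pure, as stated for atomic instruments, although Q0 and Q1 are not projectors.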

# 5. Bistable Perception of Schröder Stair

The experiment concerns the perception of an ambiguous figure, the Schröder stair, see **Figure 1**. Here we reproduce data from Asano et al. (2014), where the reader can find a more detailed presentation.

A total of 151 subjects participated in the test performed at Tokyo University of Science. They were divided into three groups (n<sub>A</sub> = 55, n<sub>B</sub> = 48, n<sub>C</sub> = 48). To the subjects of all three groups, we showed 11 pictures of the Schröder stair leaning at different angles. For each picture, subjects answered L = "I can see that left side is front," or R = "I can see that right side is front." Thus, we have a random variable for perception, X<sub>θ</sub> = L, R. We denote the experimental probability that a subject answers "Left side is front" by p(X<sub>θ</sub> = L).

For the first group (A), the order of showing the pictures was randomly selected for each subject. For the second group (B), the angle θ was changed from 0 to 90 as if the picture were rotating clockwise.

Inversely, for the third group (C), the angle θ was changed from 90 to 0. As a result, we obtained perception trends with respect to the angle, see **Figure 2**. These graphs demonstrate contextuality of the data, i.e., its dependence on the experimental contexts (A)–(C); see Asano et al. (2014) for a numerical estimation of the degree of contextuality as violation of the Garg–Leggett inequality. As was discussed in the Introduction, contextual statistical data can be modeled by using the quantum formalism.

# 6. Mental Apparatuses

We shall proceed with finite dimensional state spaces by making remarks on the corresponding modifications in the infinite dimensional case. The symbol D(H) denotes the space of density operators in the complex Hilbert space H; L(H) the space of all linear operators in H (bounded operators in the infinite dimensional case).

The space L(H) can itself be endowed with the structure of a linear space. We also have to consider linear operators from L(H) into itself; such maps, T : L(H) → L(H), are called superoperators. We shall use this notion only in Section 9. Thus, for the moment, the reader can proceed without it.

Moreover, on the space L(H) it is possible to introduce the structure of Hilbert space with the scalar product

$$
\langle A|B\rangle = \operatorname{Tr} \mathbf{A^\star B}.
$$

Therefore, for each superoperator T : L(H) → L(H), there is defined its adjoint (super)operator T<sup>⋆</sup> : L(H) → L(H), ⟨T(A)|B⟩ = ⟨A|T<sup>⋆</sup>(B)⟩, A, B ∈ L(H).
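As a concrete instance, consider the superoperator T(A) = QAQ<sup>⋆</sup> for a fixed operator Q; by the cyclic property of the trace its adjoint with respect to the scalar product above is B → Q<sup>⋆</sup>BQ. The sketch below checks this numerically for randomly generated matrices; the seed and the dimension are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def random_operator(dim):
    """A random complex matrix, used only as a test operator."""
    return rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))

def hs_inner(A, B):
    """Scalar product <A|B> = Tr A* B on the space L(H)."""
    return np.trace(A.conj().T @ B)

Q = random_operator(3)

def T(A):
    """Superoperator A -> Q A Q*."""
    return Q @ A @ Q.conj().T

def T_adjoint(B):
    """Its adjoint B -> Q* B Q."""
    return Q.conj().T @ B @ Q

A, B = random_operator(3), random_operator(3)
lhs = hs_inner(T(A), B)
rhs = hs_inner(A, T_adjoint(B))
print(abs(lhs - rhs))
```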

For the reader's convenience we recall the notion of a POVM.

**Definition.** A positive operator valued measure (POVM) is a family of positive operators {M<sub>j</sub>} such that ∑<sub>j=1</sub><sup>m</sup> M<sub>j</sub> = I, where I is the unit operator.

Consider a cognitive system; to be concrete, consider a human individual, call her Keiko. She confronts some recognition problem, i.e., in our problem of bistable perception of the Schröder stair she has to make the choice between two perceptions, A = L, R. In the quantum(-like) model the space of her mental states is represented by a complex Hilbert space ℋ (pure states are represented by normalized vectors and mixed states by density operators).

In the model under construction the mental state space ℋ is tensor-factorized into two components, namely, ℋ = H ⊗ K, where H is the space of sensation-states and K is the space of perception-states. The states of the latter are open to conscious introspection, but the states of the former are in general not approachable consciously. We recall that we model Helmholtz unconscious inference.

In general, suppose that Keiko confronts some concrete recognition problem A with possible perceptions labeled as a<sub>i</sub>, i = 1, 2, ..., m. We denote the set of possible values of A by the symbol O, i.e., O = {a<sub>1</sub>, ..., a<sub>m</sub>}. By interacting with a figure (in our concrete case the figure is ambiguous) she generates the sensation-state ρ (e.g., a pure state, i.e., ρ = |ψ⟩⟨ψ|, ψ ∈ H, ‖ψ‖ = 1). The process of generation of ρ can be mathematically represented as a unitary transformation in the space H. Denote the pre-recognition state of sensation by ρ<sub>0</sub>. Then

$$
\rho = U\rho\_0 U^\star,
$$

where the unitary operator U : H → H depends on the figure; in our concrete case U = U<sub>Schr</sub>.

To come to the concrete perception, Keiko uses a "mental apparatus," denoted as 𝒜, which produces the results (perceptions) a<sub>i</sub> randomly with the probabilities p(a<sub>i</sub>|ρ), the output probabilities<sup>5</sup>. An apparatus represents not only perceptions and the corresponding probabilities, but also the results of the evolution of the initial sensation-state ρ as induced by the back-reaction to the concrete perception a<sub>i</sub>. This is a sort of state reduction, a "sensation-state collapse" resulting from the creation of the concrete perception a<sub>i</sub>. Thus, the sensation-state ρ which Keiko created from her visual image is transformed into the output state ρ<sub>a<sub>i</sub></sub>.

However, as we shall see, in general this sensation-state update can be sufficiently peaceful, so our model differs crucially from the orthodox quantum models of cognition (Busemeyer and Bruza, 2012) based on the projection-type state update. Thus, each mental apparatus 𝒜 corresponding to the recognition problem A is mathematically represented by the state transformations

$$
\rho \to \rho\_{a\_i}.\tag{21}
$$

The rigorous mathematical description of such state transformations leads to the notion of a quantum instrument, see Section 9.

#### 6.1. Mixing Law

In the quantum operational formalism it is assumed that these probabilities, p(a<sub>i</sub>|ρ), satisfy the mixing law. We remark that, for any pair of states (density operators) ρ<sub>1</sub>, ρ<sub>2</sub> and any pair of probability weights q<sub>1</sub>, q<sub>2</sub> ≥ 0, q<sub>1</sub> + q<sub>2</sub> = 1, the convex combination ρ = q<sub>1</sub>ρ<sub>1</sub> + q<sub>2</sub>ρ<sub>2</sub> is again a state (density operator). In accordance with the mixing law, any apparatus produces probabilities such that

$$p(a\_i|q\_1\rho\_1 + q\_2\rho\_2) = q\_1p(a\_i|\rho\_1) + q\_2p(a\_i|\rho\_2). \tag{22}$$

In our model of bistable perception the mixing law can be formulated as follows:

A probabilistic mixture of sensations produces the mixture of probabilities for perception outputs.

In physics this is a very natural assumption. However, in modeling cognitive phenomena, in particular unconscious inference, an additional analysis of its validity has to be performed. We cannot do this in this note, so we postpone such an analysis to a future publication. Now we mimic quantum physics explicitly and proceed under the assumption (Equation 22).
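For probabilities of the form p(a|ρ) = Tr Mρ the mixing law (Equation 22) follows from linearity of the trace. A short numerical check is sketched below; the operator `M`, the weights, and the randomly generated states are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_density_matrix(dim):
    """A random density matrix: positive by construction, with unit trace."""
    A = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    rho = A @ A.conj().T
    return rho / np.trace(rho).real

def prob(M, rho):
    """Output probability p(a|rho) = Tr M rho."""
    return np.trace(M @ rho).real

M = np.diag([0.9, 0.2]).astype(complex)      # hypothetical POVM component
rho1, rho2 = random_density_matrix(2), random_density_matrix(2)
q1, q2 = 0.25, 0.75

lhs = prob(M, q1 * rho1 + q2 * rho2)         # probability for the mixed state
rhs = q1 * prob(M, rho1) + q2 * prob(M, rho2)
print(abs(lhs - rhs))
```

A cognitive model in which the output probabilities depended nonlinearly on ρ would fail this check, which is why the validity of the mixing law for perception is an empirical question.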

#### 6.2. Composition of the Apparatuses

It is natural to assume that after resolving the recognition problem A a person is ready to look at another image B and proceed to its perception. In general, the perception of B depends on the preceding perception of A. Such a sequence of perceptions is represented as a new mental apparatus, the composition of the apparatuses 𝒜 and ℬ, denoted ℬ𝒜. Its outputs are ordered pairs of perceptions (a<sub>i</sub>, b<sub>j</sub>). It is postulated that the corresponding output probabilities and states are determined as

$$p((a\_i, b\_j) | \rho) = p(b\_j | \rho\_{a\_i}) p(a\_i | \rho);\tag{23}$$

$$
\rho\_{(a\_i, b\_j)} = (\rho\_{a\_i})\_{b\_j}.\tag{24}
$$

The law (Equation 23) can be considered as the quantum generalization of the Bayes rule. The law (Equation 24) is the natural composition law.
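The composition laws (Equations 23 and 24) can be sketched as follows. Two projective apparatuses with non-commuting projectors are composed in both orders, which makes the dependence on the order of perceptions explicit. The helper names `apparatus`, `compose`, and `projector` and the particular bases are illustrative assumptions.

```python
import numpy as np

def projector(v):
    """One-dimensional projector onto the (normalized) vector v."""
    v = np.asarray(v, dtype=complex)
    v = v / np.linalg.norm(v)
    return np.outer(v, v.conj())

def apparatus(Qs):
    """Build a map rho -> [(probability, post-state)] from operators Q_i."""
    def run(rho):
        out = []
        for Q in Qs:
            unnorm = Q @ rho @ Q.conj().T
            p = np.trace(unnorm).real
            out.append((p, unnorm / p if p > 1e-12 else None))
        return out
    return run

def compose(first, second):
    """Apparatus for 'first' followed by 'second', Equations (23) and (24)."""
    def run(rho):
        table = {}
        for i, (p_i, rho_i) in enumerate(first(rho)):
            if rho_i is None:            # outcome i never occurs
                continue
            for j, (p_j, rho_ij) in enumerate(second(rho_i)):
                table[(i, j)] = (p_j * p_i, rho_ij)
        return table
    return run

A = apparatus([projector([1, 0]), projector([0, 1])])    # outcomes 0, 1
B = apparatus([projector([1, 1]), projector([1, -1])])   # outcomes +, -
rho = projector([1, 0])

BA = compose(A, B)(rho)   # A first, then B: keys are (a_i, b_j)
AB = compose(B, A)(rho)   # B first, then A: keys are (b_j, a_i)
print(BA[(0, 0)][0], AB[(0, 0)][0])
```

For this state the pair "A gives 0, then B gives +" has probability 0.5, while "B gives +, then A gives 0" has probability 0.25, so the joint probabilities depend on the order of the two perceptions.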

In our experiment with rotation of the Schröder stair, we are interested in a sequence of instruments 𝒜<sub>θ</sub> corresponding to some sample of angles C = {θ<sub>1</sub>, ..., θ<sub>m</sub>}. Here C determines the context of the experiment. Our data from Section 5 can be represented as the composition of quantum apparatuses: 𝒜<sub>C</sub> = 𝒜<sub>θ<sub>m</sub></sub> ⋯ 𝒜<sub>θ<sub>1</sub></sub>. Here 𝒜<sub>C</sub> is the quantum apparatus representing the context C. In our experimental study we considered not only deterministic contexts corresponding to clockwise and counterclockwise rotations, but also the random context determined by the uniform probability distribution.

# 7. Perception through Unitary Interaction Between the Sensation and Perception-states

The above operational description of "perception–production" was formulated solely in terms of sensation-states. However, a sensation-state is a complex informational state which is in general unapproachable for conscious introspection. The operational representation of observables in the space of sensation-states is not straightforward and in general it cannot be formulated in terms of mutually exclusive perceptions. For example, in our experiment Keiko's perceptions can be binary encoded: A = L, R. However, her sensation of the Schröder stair is a complex information state depending on a variety of parameters (in particular, we are interested in dependence on the rotation angle). The subspaces corresponding to sensations leading to the L-perception and R-perception are in general not orthogonal. This non-orthogonality of sensation subspaces for different perceptions is the fundamental feature of bistable perception, i.e., the recognition of ambiguous figures.

<sup>5</sup>We are going toward creation of a cognitive analog of the quantum operational model of measurements with the aid of physical apparatuses.

Therefore, it is more fruitful to define the perception-observable directly by using an additional state space, the space of the perception-states K. In the perception space a perception-observable can be defined as the standard von Neumann–Lüders projection observable.

**Example 1.** Consider the simplest case: recognition of the fixed figure A, with dichotomous output, i.e., there are two possible outcomes of "perception-measurement," e.g., L = 0 and R = 1 for the Schröder stair. This observable can be represented by the pair of projectors (P<sub>0</sub>, P<sub>1</sub>) onto the subspaces K<sub>0</sub> and K<sub>1</sub> of the perception space K. Since the perceptions a<sub>0</sub> = 0 and a<sub>1</sub> = 1 are mutually exclusive, and sharply exclusive, the subspaces K<sub>0</sub> and K<sub>1</sub> are orthogonal. Hence, the projectors P<sub>0</sub> and P<sub>1</sub> can be selected as orthogonal. The perception-observable A can be represented as the conventional von Neumann–Lüders observable Â = a<sub>0</sub>P<sub>0</sub> + a<sub>1</sub>P<sub>1</sub> (= P<sub>1</sub>). However, we emphasize that this representation is valid only in the perception-state space K. It is often (but not always!) possible to proceed with one-dimensional projectors, i.e., to represent possible perceptions just by the basis vectors in the two-dimensional perception-state space, (|0⟩, |1⟩). Here each perception-state can be represented as a superposition

$$\phi = c\_0 |0\rangle + c\_1 |1\rangle, \ |c\_0|^2 + |c\_1|^2 = 1.\tag{25}$$

Measurement of A leads to probabilities of perceptions given by the squared moduli of the coefficients, p<sub>0</sub> = |c<sub>0</sub>|<sup>2</sup>, p<sub>1</sub> = |c<sub>1</sub>|<sup>2</sup>.

In the case of a finite-dimensional perception-state space, a perception-observable A can be represented as

$$A = \sum\_{i} a\_{i} P\_{i},\tag{26}$$

where (P<sub>i</sub>) is the family of mutually orthogonal projectors in the space of perception-states K and (a<sub>i</sub>) are real numbers encoding possible answers (perceptions).

Now we shall explore the cognitive analog of the standard scheme of quantum indirect measurements.

In our cognitive framework "indirectness" means that the sensation-states are in general unapproachable for conscious introspection. Therefore, it is impossible to perform a direct measurement on the sensation-state ρ (in particular, on a pure state ρ = |ψ⟩⟨ψ|). Moreover, in the sensation-state the alternatives, say 0/1, encoded in a perception-observable A are not represented exclusively; they can overlap. (Mathematically the overlap is expressed as non-orthogonality of the sensation subspaces corresponding to various perceptions.)

In the quantum measurement framework, this situation is described as follows: in the sensation space an observable A is represented as an unsharp observable of the POVM-type. Roughly speaking in the H-representation the A-zero contains partially the A-one and vice versa. The latter is simply a consequence of interpretation of POVM observables as unsharp observables.

**Remark 1.** To map the quantum physics scheme (Ozawa, 1997) of indirect measurements onto the quantum(-like) cognition scheme, one has to associate the state of the principal physical system S with the sensation-state and the state of the probe physical system S′ with the perception-state. We point out that in the cognitive framework we do not consider analogs of physical systems. In principle, one can consider the sensation-system S as a part of the neuronal system representing sensations and the perception system S′ as another part of the neuronal system representing possible perceptions. The latter can be specified: different measurements can be associated with different neuronal networks responsible for the corresponding perceptions. However, in principle we need not associate sensation and perception states with concrete physical neuronal networks. In the case of cognition, the use of isolated physical systems as carriers of the corresponding information states might be ambiguous. The interconnectivity of neuronal networks is very high. Therefore, the picture of a distributed computational system is more adequate. (Of course, even in physics the notion of an isolated system is just an idealization of the real situation.) Therefore, it is useful to proceed in the purely informational approach by operating solely with states, without coupling them to bio-physical systems. This is, in fact, the quantum information approach, where systems play a secondary role and one operates with states, especially in the information interpretation of quantum mechanics (Zeilinger, 2010).

In the simplest model we can assume that at the beginning of the process of perception-creation the sensation and perception-states, ρ and σ, are not entangled<sup>6</sup>. Thus, mathematically, in accordance with the quantum formalism, the integral sensation–perception-state, the complete mental state corresponding to the problem under consideration, can be represented as the tensor product

$$
R = \rho \otimes \sigma.
$$

In the process of perception-creation the sensation and perception-states (cf. Remark 1) "interact" and the evolution of the sensation–perception-state R is mathematically represented by a unitary operator<sup>7</sup> U : ℋ → ℋ acting in the full space ℋ = H ⊗ K:

$$R \to R\_{\text{out}} \equiv U R U^{\star}.\tag{27}$$

In the space of sensation–perception-states ℋ = H ⊗ K the perception-observable A is represented by the operator I ⊗ A. Thus, the probabilities of perceptions are given by

$$p\_{a\_i}^{A \otimes I} = \operatorname{Tr} R\_{\text{out}}(I \otimes P\_i) = \operatorname{Tr} U R U^\star (I \otimes P\_i),\tag{28}$$

where the projectors (P<sub>i</sub>) form the spectral decomposition of the Hermitian observable A in K, see Equation (26).

<sup>6</sup>One can say that they are independent. But one should use this terminology carefully, since the notion of quantum independence is more complicated than the classical one and it is characterized by a diversity of approaches.

<sup>7</sup>As was mentioned, in the works of Asano et al. (2010a,b, 2011, 2012) and Accardi et al. (in press) even non-unitary evolutions were used.

Since only the perception-state belonging to K is subject to conscious introspection, at the conscious level the perception process can be represented solely in the state space K. The post-interaction perception-state σ<sub>out</sub> can be (mathematically) extracted from the integral state R<sub>out</sub> with the aid of the operation of the partial trace:

$$
\sigma\_{\rm out} = \mathrm{Tr}\_{\rm H} \mathrm{R}\_{\rm out}.\tag{29}
$$

Then perceptions can be represented as the results of the A-measurement (measurement of the projection-type) in the perception space, performed on the output state σ<sub>out</sub>. The probabilities of the concrete perceptions (a<sub>i</sub>) are given by the standard Born rule:

$$\mathbf{p}\_{a\_i}^A = \mathrm{Tr}\_{\mathbf{K}} \sigma\_{\mathrm{out}} \mathbf{P}\_{\mathbf{i}} = \mathrm{Tr}\_{\mathbf{K}}(\mathrm{Tr}\_{\mathbf{H}} \mathbf{R}\_{\mathrm{out}}) \mathbf{P}\_{\mathbf{i}} = \mathrm{Tr} \mathbf{R}\_{\mathrm{out}} (\mathbf{I} \otimes \mathbf{P}\_{\mathbf{i}}) = \mathbf{p}\_{\mathbf{a\_i}}^{\mathrm{A} \otimes \mathbf{I}}.\tag{30}$$

Thus, Equations (28) and (30) match each other.

If the concrete result A = a<sub>i</sub> was observed, then the state of perception σ is transformed into

$$\sigma\_{i;\text{out}} = \frac{\text{Tr}\_{\text{H}} \text{R}\_{\text{out}} (\text{I} \otimes \text{P}\_{\text{i}})}{\text{Tr} \text{R}\_{\text{out}} (\text{I} \otimes \text{P}\_{\text{i}})}. \tag{31}$$

#### What happens in the sensation space?

The expression (Equation 28) for the probability of the perception a<sub>i</sub> can be represented as

$$\begin{split} p(a\_{i}|\rho) &= p\_{a\_{i}}^{A \otimes I} = \operatorname{Tr} R\_{\text{out}}(I \otimes P\_{i}) = \operatorname{Tr} (\rho \otimes \sigma) U^{\star}(I \otimes P\_{i}) U \\ &= \operatorname{Tr}\_{H} \rho M\_{a\_{i}}, \end{split} \tag{32}$$

where

$$M\_{a\_i} = \operatorname{Tr}\_{K} (I \otimes \sigma) U^\star (I \otimes P\_i) U. \tag{33}$$

The operator M<sub>a<sub>i</sub></sub> : H → H can also be represented in the following useful form (a consequence of the cyclic property of the trace operation):

$$M\_{a\_i} = \operatorname{Tr}\_{K} U^\star (I \otimes P\_i) U (I \otimes \sigma). \tag{34}$$

We remark that (Equation 33) implies:

$$\sum\_{i} M\_{a\_{i}} = \operatorname{Tr}\_{K}(I \otimes \sigma) U^{\star}\Big(I \otimes \sum\_{i} P\_{i}\Big) U = \operatorname{Tr}\_{K} (I \otimes \sigma) = (\operatorname{Tr} \sigma) I = I.$$

We also remark that each operator M<sub>a<sub>i</sub></sub> is positive and Hermitian.

Thus, in the sensation space the perception-observable of the projection-type A (acting in K) with the spectral family (P<sub>i</sub>) is represented as the POVM M = (M<sub>a<sub>i</sub></sub>). We remark that in general the operators M<sub>a<sub>i</sub></sub> are not projectors. Such a measurement cannot sharply separate sensations leading to the perceptions (a<sub>i</sub>) for different i.

The operational formalism also gives the "post-perception sensation-state," i.e., the state of sensation created as the feedback to the consciously recognized perception a<sub>i</sub>,

$$\rho\_{a\_i} = \frac{\operatorname{Tr}\_{\mathbf{K}} \mathbf{R}\_{\text{out}} (\mathbf{I} \otimes \mathbf{P}\_{\mathbf{i}})}{\operatorname{Tr} \mathbf{R}\_{\text{out}} (\mathbf{I} \otimes \mathbf{P}\_{\mathbf{i}})}. \tag{35}$$

The output sensation-state depends not only on the initial sensation-state ρ, but also on the initial perception-state σ, the interaction between beliefs and possible perceptions given by U, and the question-observable A acting in K.
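The indirect measurement scheme of this section can be traced numerically for two-dimensional spaces H and K. In the sketch below the interaction U is a hypothetical controlled rotation (the sensation component |1⟩ rotates the perception-state by an angle α), and ρ, σ are illustrative choices; none of these are specified in the text. The code computes the POVM components of Equation (33), compares the probabilities of Equations (28) and (32), and evaluates the post-perception sensation-states of Equation (35).

```python
import numpy as np

def partial_trace(R, over, dH, dK):
    """Partial trace of an operator on H (x) K over the factor named in `over`."""
    R4 = R.reshape(dH, dK, dH, dK)
    if over == "K":
        return np.einsum("ikjk->ij", R4)   # result acts in H
    return np.einsum("kikj->ij", R4)       # result acts in K

dH = dK = 2
I_H, I_K = np.eye(dH, dtype=complex), np.eye(dK, dtype=complex)
# Projectors onto the basis vectors; the same matrices serve in H and in K.
P = [np.diag([1, 0]).astype(complex), np.diag([0, 1]).astype(complex)]

# Hypothetical interaction: rotate the perception-state if the sensation is |1>.
alpha = np.pi / 3
rot = np.array([[np.cos(alpha), -np.sin(alpha)],
                [np.sin(alpha),  np.cos(alpha)]], dtype=complex)
U = np.kron(P[0], I_K) + np.kron(P[1], rot)

plus = np.array([1, 1], dtype=complex) / np.sqrt(2)
rho = np.outer(plus, plus.conj())          # initial sensation-state
sigma = P[0]                               # initial perception-state

R_out = U @ np.kron(rho, sigma) @ U.conj().T                      # Equation (27)

povm = [partial_trace(np.kron(I_H, sigma) @ U.conj().T @ np.kron(I_H, Pi) @ U,
                      "K", dH, dK) for Pi in P]                   # Equation (33)
p_full = [np.trace(R_out @ np.kron(I_H, Pi)).real for Pi in P]    # Equation (28)
p_povm = [np.trace(rho @ M).real for M in povm]                   # Equation (32)
post_sensation = [partial_trace(R_out @ np.kron(I_H, Pi), "K", dH, dK) / p
                  for Pi, p in zip(P, p_full)]                    # Equation (35)
print(p_full, p_povm)
```

With this choice of U the components are M<sub>0</sub> = diag(1, cos²α) and M<sub>1</sub> = diag(0, sin²α), so M<sub>0</sub> is not a projector although the observable in K is of the projection type.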

# 8. The Indirect Measurement Scheme for Rotation Contexts for Perception of Schröder Stair

As at the very end of Section 6.2, we consider contextual measurements for the Schröder stair: a sequence of perceptions corresponding to some sample of angles C = {θ<sub>1</sub>, ..., θ<sub>m</sub>}. Here C determines the context of the experiment. We apply the scheme of indirect measurements. We can assume that the perception space K is two dimensional with the orthogonal basis |L⟩, |R⟩ representing the "left-faced" and "right-faced" perceptions of the stair. Thus, the projectors P<sub>i</sub>, i = L, R, are one dimensional.

We start with the initial sensation state ρ<sub>0</sub>. By the visual image rotated at the angle θ<sub>1</sub> this state is transformed to

$$
\rho\_{\theta\_1} = U\_{\text{Sch};\theta\_1} \rho\_0 U\_{\text{Sch};\theta\_1}^\star,\tag{36}
$$

where U<sub>Sch;θ<sub>1</sub></sub> represents the unitary dynamics induced by this image. Then the perception of the image is modeled starting with

$$R\_{\theta\_1} = \rho\_{\theta\_1} \otimes \sigma\_0,\tag{37}$$

where σ<sub>0</sub> represents the state of perception preceding interaction with the state of sensation. It is natural to assume that σ<sub>0</sub> = |φ<sub>0</sub>⟩⟨φ<sub>0</sub>|, where

$$
\phi\_0 = (|L\rangle + |R\rangle)/\sqrt{2} \tag{38}
$$

is the neutral composition of the states "left-faced" and "right-faced." It represents the deepest state of uncertainty. Suppose (for simplicity) that, independently of the angle, the interaction of sensation and perception states is given by the same unitary operator U. Then Keiko's perception of the Schröder stair observed at the angle θ<sub>1</sub> with the fixed result i<sub>1</sub> = L or R leads to the new states of sensation and perception:

$$\sigma\_{i\_1;\theta\_1} = \frac{\operatorname{Tr}\_{H} U R\_{\theta\_1} U^{\star}(I \otimes P\_{i\_1})}{\operatorname{Tr} U R\_{\theta\_1} U^{\star}(I \otimes P\_{i\_1})}, \ \rho\_{i\_1;\theta\_1} = \frac{\operatorname{Tr}\_{K} U R\_{\theta\_1} U^{\star}(I \otimes P\_{i\_1})}{\operatorname{Tr} U R\_{\theta\_1} U^{\star}(I \otimes P\_{i\_1})}. \tag{39}$$

The probability of creation of the perception i<sub>1</sub> can be calculated as

$$p\_{i\_1;\theta\_1} = \operatorname{Tr}\_{H} \rho\_{\theta\_1} M\_{i\_1;\theta\_1}.\tag{40}$$

Here the POVM component M<sub>i<sub>1</sub>;θ<sub>1</sub></sub>, i<sub>1</sub> = L, R, has the form:

$$M\_{i\_1;\theta\_1} = \operatorname{Tr}\_{K} U^\star (I \otimes P\_{i\_1}) U (I \otimes \sigma\_0). \tag{41}$$

For the next measurement, corresponding to rotation of the Schröder stair by the angle θ<sub>2</sub>, Keiko selects ρ<sub>i<sub>1</sub>;θ<sub>1</sub></sub> and σ<sub>i<sub>1</sub>;θ<sub>1</sub></sub> as the initial states. This means that creation of the fixed perception i<sub>1</sub> leads to disentanglement of her mental state into the product of two states, the state of sensation and the state of perception. Then

$$
\rho\_{\theta\_2} = U\_{\text{Sch};\theta\_2} \rho\_{i\_1;\theta\_1} U\_{\text{Sch};\theta\_2}^{\star}, \tag{42}
$$

where U<sub>Sch;θ<sub>2</sub></sub> represents the unitary dynamics induced by the θ<sub>2</sub>-image. Then

$$R\_{\theta\_2} = \rho\_{\theta\_2} \otimes \sigma\_{i\_1;\theta\_1},\tag{43}$$

Then Keiko's perception of the Schröder stair observed at the angle θ<sub>2</sub> with the fixed result i<sub>2</sub> = L or R leads to the new states of sensation and perception:

$$\sigma\_{i\_2; \theta\_2} = \frac{\mathrm{Tr}\_\mathbf{H} \mathrm{U} \mathrm{R}\_{\theta\_2} \mathbf{U}^\star (\mathbf{I} \otimes \mathbf{P}\_{i\_2})}{\mathrm{Tr} \mathbf{U} \mathrm{R}\_{\theta\_2} \mathbf{U}^\star (\mathbf{I} \otimes \mathbf{P}\_{i\_2})}, \ \rho\_{i\_2; \theta\_2} = \frac{\mathrm{Tr}\_\mathbf{K} \mathrm{U} \mathrm{R}\_{\theta\_2} \mathbf{U}^\star (\mathbf{I} \otimes \mathbf{P}\_{i\_2})}{\mathrm{Tr} \mathbf{U} \mathrm{R}\_{\theta\_2} \mathbf{U}^\star (\mathbf{I} \otimes \mathbf{P}\_{i\_2})}. \tag{44}$$

The probability of creation of the perception i<sub>2</sub> can be calculated as

$$
p\_{i\_2; \theta\_2} = \text{Tr}\_{\text{H}} \rho\_{\theta\_2} \text{M}\_{i\_2; \theta\_2}.\tag{45}
$$

Starting with ρ<sub>i<sub>2</sub>;θ<sub>2</sub></sub> and σ<sub>i<sub>2</sub>;θ<sub>2</sub></sub>, Keiko generates the perception of the θ<sub>3</sub>-rotated stair, and so on. After the last test, Keiko's states of sensation and perception, ρ<sub>i<sub>n</sub>;θ<sub>n</sub></sub> and σ<sub>i<sub>n</sub>;θ<sub>n</sub></sub>, depend on the sequence of angles C and the sequence of her perceptions (i<sub>1</sub>, i<sub>2</sub>, ..., i<sub>n</sub>). The same is valid for the probability p<sub>i<sub>n</sub>;θ<sub>n</sub></sub>. If the experiment is performed for two different contexts C = {θ<sub>1</sub>, ..., θ<sub>m</sub>} and C′ = {θ′<sub>1</sub>, ..., θ′<sub>m</sub>}, then in general it is impossible to embed the probabilities of perceptions in a single Kolmogorov probability space. Therefore, the use of the quantum theory of measurement and "quantum probabilities" can be fruitful. Our approach makes it possible to model probabilities of perceptions that depend on a context, that is, on a sequence of angles.

# 9. Representing Perception by Quantum Instruments

The considered model of perception as the result of unitary interaction between the sensation-state and the perception-state describes an important class of transformations of the sensation-state, see Equation (35). We now turn to the general case which was considered in Section 6, see Equation (21). Set

$$E(a\_i)\rho = p(a\_i|\rho)\rho\_{a\_i} \tag{46}$$

and, for a subset Γ of O, where O = {a<sub>1</sub>, ..., a<sub>m</sub>} is the set of all possible perceptions, we set

$$E(\Gamma)\rho = \sum\_{a\_i \in \Gamma} E(a\_i)\rho = \sum\_{a\_i \in \Gamma} p(a\_i|\rho)\rho\_{a\_i}.\tag{47}$$

We point to the basic feature of this map:

$$\text{Tr}E(O)\rho = \sum\_{a\_i \in O} p(a\_i|\rho)\, \text{Tr}\rho\_{a\_i} = 1. \tag{48}$$

For each concrete perception a<sub>i</sub>, E(a<sub>i</sub>) maps density operators to linear operators (in the infinite dimensional case, these are trace-class operators, but we proceed in the finite dimensional case, where all operators have finite traces).

The mixing law implies that, for any Γ ⊂ O,

$$E(\Gamma)(q\_1\rho\_1 + q\_2\rho\_2) = q\_1E(\Gamma)\rho\_1 + q\_2E(\Gamma)\rho\_2. \tag{49}$$

As was shown by Ozawa (1997), under the assumption of the existence of composite apparatuses, any such map E(Γ): D(H) → L(H) can be extended to a linear map (superoperator)

$$E(\Gamma): L(H) \to L(H) \tag{50}$$

such that:

• each E(Γ) is positive, i.e., it maps the set of positive operators into itself;

• E(O) = Σ<sub>i</sub> E(a<sub>i</sub>) is trace preserving:

$$\text{Tr}\text{E}(\mathcal{O})\rho = \text{Tr}\rho.\tag{51}$$

The latter property is a consequence of Equation (48)<sup>8</sup> .
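The properties in Equations (46)–(51) can be checked numerically on the simplest possible example. The sketch below assumes a two-outcome instrument of projective type, E(a)ρ = P<sub>a</sub>ρP<sub>a</sub>, on a two-dimensional space; this particular instrument, the states `rho1` and `rho2`, and the weights `q1`, `q2` are illustrative assumptions, not part of the general construction.

```python
def matmul(A, B):
    """Product of two square matrices stored as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add(A, B):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(c, A):
    return [[c * x for x in row] for row in A]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

# Hypothetical perceptions O = {L, R} with orthogonal projectors.
PROJECTORS = {"L": [[1.0, 0.0], [0.0, 0.0]],
              "R": [[0.0, 0.0], [0.0, 1.0]]}
O = ["L", "R"]

def instrument(gamma, rho):
    """E(Γ)ρ = sum over a in Γ of P_a ρ P_a (unnormalized), cf. Equation (47)."""
    out = [[0.0, 0.0], [0.0, 0.0]]
    for a in gamma:
        out = add(out, matmul(matmul(PROJECTORS[a], rho), PROJECTORS[a]))
    return out

def probability(a, rho):
    """p(a|ρ) = Tr E(a)ρ."""
    return trace(instrument([a], rho))

def updated_state(a, rho):
    """ρ_a = E(a)ρ / Tr E(a)ρ."""
    return scale(1.0 / probability(a, rho), instrument([a], rho))

rho1 = [[0.5, 0.5], [0.5, 0.5]]   # pure superposition state
rho2 = [[1.0, 0.0], [0.0, 0.0]]   # state already "left-faced"
q1, q2 = 0.3, 0.7                 # mixing weights, q1 + q2 = 1

# Equation (48): the total map preserves the trace of a density operator.
total_trace = trace(instrument(O, rho1))

# Equation (49): mixing law (linearity on convex combinations).
mixture = add(scale(q1, rho1), scale(q2, rho2))
lhs = instrument(O, mixture)
rhs = add(scale(q1, instrument(O, rho1)), scale(q2, instrument(O, rho2)))
```

Here `lhs` and `rhs` agree entry by entry, and `updated_state("L", rho1)` returns the projector onto the first basis vector, as expected for a sharp measurement.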

Thus, the two very natural and simple assumptions, the mixing law for probabilities and the existence of composite apparatuses, have the fundamental mathematical consequence, the representation of the evolution of the state by a superoperator (Equation 50).

In quantum physics such maps are known as state transformers (Busch et al., 1995) or DL (Davies–Lewis; Davies and Lewis, 1970) quantum operations<sup>9</sup>.

Thus, each perception induces the back-reaction which can be formally represented as a state transformer. In these terms

$$\rho\_{a\_i} = \frac{E(a\_i)\rho}{\text{Tr}\mathbf{E}(a\_i)\rho} \tag{52}$$

We remark that the map Γ ↦ E(Γ), from subsets of the set of possible perceptions O into the space of superoperators L(L(H)), is additive:

$$E(\Gamma\_1 \cup \Gamma\_2) = E(\Gamma\_1) + E(\Gamma\_2), \ \Gamma\_1 \cap \Gamma\_2 = \emptyset. \tag{53}$$

This is a measure with values in the space L(L(H)). Such measures are called (DL) instruments (Davies and Lewis, 1970). To specify the domain of applications in our case, we shall call them perception instruments.

The class of such instruments is essentially wider than the class of instruments based on the unitary interaction between the sensation and perception components of the mental state, see Equation (35). The evident generalization of the scheme of Section 7 is to consider nonunitary interactions between the components of the mental state. Another assumption that can evidently be violated in the modeling of cognition is that the initial sensation and perception states are not entangled ("independent"); see Asano et al. (2010a,b, 2011, 2012) for generalizations of the aforementioned scheme.

<sup>8</sup> If one wants to extend E(Γ) by linearity from the set of density operators to the set of all linear operators (in the infinite dimensional case it has to be the set of finite-trace operators), then one has to set E(Γ)µ = E(Γ)(Trµ · µ/Trµ) = Trµ E(Γ)(µ/Trµ) and, in particular, Tr E(O)µ = Trµ Tr E(O)(µ/Trµ) = Trµ.

<sup>9</sup> The DL-notion of the quantum operation is more general than the notion used nowadays. The latter is based on complete positivity, instead of simply positivity as in the DL-notion; see the Appendix for the corresponding definition and a discussion of whether the reasons used in physics to restrict the class of state transformers can be automatically used in cognitive science.

We start with a discussion of the possible nonunitarity of the interaction between the sensation and perception states. In quantum physics the assumption of unitarity of the interaction between the principal system S and the probe system S′ (representing a part of the measurement apparatus interacting with S) is justified, because the compound system S + S′ can be considered (with a high degree of approximation) as an isolated quantum system and its evolution can be described (at least approximately) by the Schrödinger equation, which induces the unitary evolution of a state.

In cognition the situation is totally different. The main scene of cognition is not the physical space-time, but the brain. It is characterized by huge interconnectivity and parallelism of information processing. Therefore, it is more natural to consider the sensation and perception states corresponding to different visual inputs as interacting, especially at the level of the sensation-states. Thus, the perception-creation model based on the assumption of isolation of different perception-creation processes from each other seems to be too idealized, although it can be used in many applications, where the concentration on one fixed problem may diminish the influence of other perception-creation processes.

In physics, the assumption that the initial state of the system S + S′ is factorized is also justified, since excluding the influence of the state of the measurement device on the state of a system S prepared for measurement (and vice versa) is experimental routine. In cognition the situation is more complicated. One cannot exclude that in some situations the initial sensation and perception states are entangled.

The representation of probabilities with the aid of POVMs is not a feature of only the unitary interaction representation of apparatuses, see Equation (32). In general, any DL-instrument generates such a representation. Take an instrument E, where, for each a<sub>i</sub> ∈ O, E(a<sub>i</sub>): L(H) → L(H) is a superoperator. Then we can define the adjoint operator E<sup>⋆</sup>(a<sub>i</sub>): L(H) → L(H). Set M<sub>a<sub>i</sub></sub> = E<sup>⋆</sup>(a<sub>i</sub>)I, where I: H → H is the unit operator. Then p<sub>a<sub>i</sub></sub> = Tr E(a<sub>i</sub>)ρ = Tr I E(a<sub>i</sub>)ρ = ⟨I|E(a<sub>i</sub>)ρ⟩ = ⟨E<sup>⋆</sup>(a<sub>i</sub>)I|ρ⟩ = Tr(E<sup>⋆</sup>(a<sub>i</sub>)I)ρ = Tr M<sub>a<sub>i</sub></sub>ρ. By using the properties of an instrument it is easy to show that (M<sub>a<sub>i</sub></sub>) is a POVM. Thus, each mental apparatus can be represented by a POVM. We interpret this POVM as the mathematical representation of "unconscious" inference. Such "unconscious measurements" are not sharp: they cannot completely separate different perceptions a<sub>i</sub> which are mutually exclusive at the conscious level. Mathematically, the subspaces H<sub>a<sub>i</sub></sub> = M<sub>a<sub>i</sub></sub>H need not be orthogonal. Sensation states corresponding to the perceptions a<sub>i</sub> and a<sub>j</sub>, say ψ<sub>i</sub> ∈ H<sub>a<sub>i</sub></sub> and ψ<sub>j</sub> ∈ H<sub>a<sub>j</sub></sub>, in general have nonzero overlap, ⟨ψ<sub>i</sub>|ψ<sub>j</sub>⟩ ≠ 0.
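A small numerical illustration of an unsharp POVM with non-orthogonal ranges may help here. The sketch below uses the well-known "trine" POVM on a two-dimensional real space, with three components M<sub>k</sub> = (2/3)|ψ<sub>k</sub>⟩⟨ψ<sub>k</sub>| built from unit vectors at 120° to each other. This choice, and the use of three rather than two perceptions, is an assumption made purely for illustration.

```python
import math

# Three unit vectors at angles 0, 120 and 240 degrees (hypothetical sensation states).
angles = [2.0 * math.pi * k / 3.0 for k in range(3)]
psi = [(math.cos(t), math.sin(t)) for t in angles]

# POVM components M_k = (2/3) |psi_k><psi_k|.
M = [[[(2.0 / 3.0) * v[i] * v[j] for j in range(2)] for i in range(2)]
     for v in psi]

# Completeness: the components sum to the unit operator.
total = [[sum(M[k][i][j] for k in range(3)) for j in range(2)] for i in range(2)]

# The ranges of M_0 and M_1 are spanned by psi_0 and psi_1, which are not orthogonal.
overlap_01 = psi[0][0] * psi[1][0] + psi[0][1] * psi[1][1]

# Probabilities p_k = Tr(rho M_k) for the sensation state rho = |psi_0><psi_0|.
rho = [[psi[0][i] * psi[0][j] for j in range(2)] for i in range(2)]
probs = [sum(rho[i][j] * M[k][j][i] for i in range(2) for j in range(2))
         for k in range(3)]
```

Even though the sensation state coincides with ψ<sub>0</sub>, the perceptions associated with ψ<sub>1</sub> and ψ<sub>2</sub> still occur with nonzero probability, which is the sense in which such a measurement cannot completely separate the alternatives.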

# 10. Concluding Remarks

This paper is an attempt to present the theory of generalized quantum measurements based on quantum apparatuses and instruments in a humanities-friendly way. This is a difficult task, since this theory is based on an advanced mathematical apparatus. We hope that the reader can at least follow our introductory presentation in Sections 3 and 4. Although we applied quantum apparatuses and instruments to the concrete problem of cognition, modeling bistable perception and, more generally, Helmholtz unconscious inference, this approach can be used to model general unconscious–conscious information processing. We hope that in the future other interesting examples will be presented with the aid of this formalism (cf. Khrennikov, 2010a, 2014).




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Khrennikov. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Appendix

### Do we Need Complete Positivity?

Nowadays the theory of DL-instruments is considered old-fashioned; the class of such instruments is considered to be too general: it contains mathematical artifacts which have no relation to real physical measurements and to state transformations as back-reactions to these measurements. The modern theory of instruments is based on the extendability postulate (e.g., Busch et al., 1995; Ozawa, 1997; Nielsen and Chuang, 2000):

For any apparatus A<sub>S</sub> corresponding to measurement of an observable A on a system S and any system S̃ noninteracting with S, there exists an apparatus A<sub>S+S̃</sub> representing measurement on the compound system S + S̃ such that

$$p(a\_i|\rho \otimes r) = p(a\_i|\rho), \quad (\rho \otimes r)\_{a\_i} = \rho\_{a\_i} \otimes r,$$

for any state ρ of S and any state r of S̃.

In physics this postulate is quite natural: if, besides the quantum system S which is the object of measurement, there is (somewhere in the universe) another system S̃ which is not entangled with S, i.e., their joint pre-measurement state has the form ρ ⊗ r, then the measurement on S with the result a<sub>i</sub> can be considered as a measurement on S + S̃ as well, with the same result a<sub>i</sub>. It is clear that the back-reaction cannot change the state of S̃. Surprisingly, this very trivial assumption has tremendous mathematical implications.

Since we proceed only in the finite dimensional case, the corresponding mathematical considerations are simplified. Consider an instrument E<sub>S</sub> representing the state update as the result of the back-reaction from measurement on S. For each Γ, E<sub>S</sub>(Γ) is a linear map from L(H) to L(H), where H is the state space of S. Let W be the state space of the system S̃. Then the state space of the compound system S + S̃ is given by the tensor product H ⊗ W. We remark that the space of linear operators in this state space can be represented as L(H ⊗ W) = L(H) ⊗ L(W). Then the superoperator E<sub>S</sub>(Γ): L(H) → L(H) can be trivially extended to the superoperator E<sub>S</sub>(Γ) ⊗ I: L(H ⊗ W) → L(H ⊗ W). It is easy to prove that the state transformer corresponding to the apparatus for measurements on S + S̃ has to have the form E<sub>S+S̃</sub>(a<sub>i</sub>) = E<sub>S</sub>(a<sub>i</sub>) ⊗ I. Hence, this operator also has to be positive. We remark that if the state space W has dimension k, then the space of linear operators L(W) can be represented as the space of k × k matrices, which is further denoted as **C**<sup>k×k</sup>.

Formally, a superoperator T: L(H) → L(H) is called completely positive if it is positive and each of its trivial extensions T ⊗ I: L(H) ⊗ **C**<sup>k×k</sup> → L(H) ⊗ **C**<sup>k×k</sup> is also positive. There are natural examples of positive maps which are not completely positive (Nielsen and Chuang, 2000).
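A standard textbook example of a positive map that is not completely positive is matrix transposition. The sketch below, with all names chosen for illustration, applies transposition to the first factor of a maximally entangled two-qubit state and exhibits a vector on which the resulting operator has a negative expectation value, so the extension T ⊗ I with k = 2 is not positive.

```python
import math

def partial_transpose_first(X):
    """Apply (T ⊗ I) to a 4x4 operator on C^2 ⊗ C^2, where T is transposition.

    Basis ordering: index 2*i + k for the product vector |i>|k>.
    """
    Y = [[0.0] * 4 for _ in range(4)]
    for i in range(2):
        for k in range(2):
            for j in range(2):
                for l in range(2):
                    # Swap the row and column indices of the first factor only.
                    Y[2 * j + k][2 * i + l] = X[2 * i + k][2 * j + l]
    return Y

def expectation(v, X):
    """Quadratic form <v|X|v> for a real vector v."""
    n = len(v)
    return sum(v[a] * X[a][b] * v[b] for a in range(n) for b in range(n))

# Density operator of the entangled state (|00> + |11>)/sqrt(2).
bell = [[0.5, 0.0, 0.0, 0.5],
        [0.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0],
        [0.5, 0.0, 0.0, 0.5]]

transformed = partial_transpose_first(bell)

# Test vector (|01> - |10>)/sqrt(2).
s = 1.0 / math.sqrt(2.0)
v = [0.0, s, -s, 0.0]
value = expectation(v, transformed)   # negative, so `transformed` is not positive
```

Transposition alone leaves the eigenvalues of any operator unchanged and is therefore positive; the negative value appears only after the trivial extension to the compound system.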

A CP quantum operation is a DL quantum operation which is additionally completely positive; a CP instrument is based on CP quantum operations representing back-reactions to measurement. As was pointed out, in the modern literature only CP quantum operations and instruments are in use, so they are called simply quantum operations and instruments.

The main mathematical feature of (CP) quantum operations is that the class of such operations can be described in a simple way, namely, with the aid of the Kraus representation (Busch et al., 1995; Ozawa, 1997; Nielsen and Chuang, 2000):

$$T\rho = \sum\_{j} V\_{j}^{\star} \rho V\_{j},\tag{A1}$$

where (V<sub>j</sub>) are some operators acting in H. Hence, for a (CP) instrument, we have: for each a<sub>i</sub> ∈ O, there exist operators (V<sub>a<sub>i</sub>j</sub>) such that

$$E(a\_i)\rho = \sum\_j V\_{a\_i j}^\star \rho \, V\_{a\_i j}.\tag{A2}$$

Thus,

$$\rho\_{a\_i} = \frac{\sum\_j V\_{a\_i j}^\star \rho \, V\_{a\_i j}}{\mathrm{Tr} \sum\_j V\_{a\_i j}^\star \rho \, V\_{a\_i j}},\tag{A3}$$

where the trace one condition (Equation 48), together with the cyclicity of the trace, Tr V<sup>⋆</sup>ρV = Tr VV<sup>⋆</sup>ρ, implies that

$$\sum\_{i} \sum\_{j} V\_{a\_i j} V\_{a\_i j}^{\star} = I. \tag{A4}$$

The corresponding POVMs M<sub>a<sub>i</sub></sub> can be represented as

$$M\_{a\_i} = \sum\_j V\_{a\_i j} V\_{a\_i j}^\star.\tag{A5}$$
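The Kraus form of a CP instrument can be illustrated with a minimal sketch. The operators below are assumptions chosen for this example: one diagonal Kraus operator per perception, so that each operator commutes with its adjoint and the ordering of V and V<sup>⋆</sup> in the normalization condition plays no role.

```python
import math

def matmul(A, B):
    """Product of two square matrices stored as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(A):
    """Adjoint of a real matrix (every matrix in this sketch is real)."""
    return [list(col) for col in zip(*A)]

def add(A, B):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(c, A):
    return [[c * x for x in row] for row in A]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

# Hypothetical Kraus operators: a list of operators V_{a j} for each perception a.
KRAUS = {
    "L": [[[math.sqrt(0.8), 0.0], [0.0, math.sqrt(0.2)]]],
    "R": [[[math.sqrt(0.2), 0.0], [0.0, math.sqrt(0.8)]]],
}

def operation(a, rho):
    """E(a)ρ = sum_j V* ρ V, following the convention of Equation (A2)."""
    out = [[0.0, 0.0], [0.0, 0.0]]
    for V in KRAUS[a]:
        out = add(out, matmul(matmul(transpose(V), rho), V))
    return out

def povm(a):
    """POVM component sum_j V V* associated with the perception a."""
    out = [[0.0, 0.0], [0.0, 0.0]]
    for V in KRAUS[a]:
        out = add(out, matmul(V, transpose(V)))
    return out

rho = [[0.5, 0.5], [0.5, 0.5]]                      # hypothetical sensation state
p_L = trace(operation("L", rho))                    # Tr E(L)ρ
p_L_from_povm = trace(matmul(povm("L"), rho))       # Tr M_L ρ
rho_L = scale(1.0 / p_L, operation("L", rho))       # updated state, cf. Equation (A3)
completeness = add(povm("L"), povm("R"))            # should be the unit operator
```

Both ways of computing the probability agree, and the updated state `rho_L` has unit trace but is no longer the initial superposition.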

This is a really elegant mathematical representation. However, it might be that this mathematical elegance, and not a real physical situation, has contributed to widespread use of CP in quantum information theory (cf. Shaji and Sudarshan, 2005).

Is the use of the extendability postulate justified in the operational approach to cognition?

Seemingly, not (although further analysis is required). Any concrete perception takes place at the conscious level, and it is based on interaction with the sensation of a visual image. The state of this sensation corresponds to the state of the system S in the above considerations. To be able to consider the state of another sensation, the analog of the state of the system S˜, the brain has to activate this sensation. Thus, we cannot simply consider all possible sensations as existing in some kind of the mental universe simultaneously. Hence, in general, sensations generated by different visual stimuli cannot be treated as existing simultaneously.

It is more natural to develop the theory of perception instruments as the theory of DL instruments and not of CP instruments. In particular, although the Kraus representation can be used as a powerful analytic tool, we should not overestimate its applicability for the modeling of cognition.

# Quantum-Like Bayesian Networks for Modeling Decision Making

### Catarina Moreira\* and Andreas Wichert

Department of Computer Science, Instituto Superior Técnico, University of Lisbon, INESC-ID, Lisbon, Portugal

In this work, we explore an alternative quantum structure to perform quantum probabilistic inferences to accommodate the paradoxical findings of the Sure Thing Principle. We propose a Quantum-Like Bayesian Network, which consists in replacing classical probabilities by quantum probability amplitudes. However, since this approach suffers from the problem of exponential growth of quantum parameters, we also propose a similarity heuristic that automatically fits quantum parameters through vector similarities. This makes the proposed model general and predictive, in contrast to the current state-of-the-art models, which cannot be generalized to more complex decision scenarios and which only provide an explanation for the observed paradoxes. In the end, the model that we propose is a nonparametric method for estimating inference effects from a statistical point of view. It is a statistical model that is simpler than the previous quantum dynamic and quantum-like models proposed in the literature. We tested the proposed network with empirical data from several studies in the literature, mainly from the Prisoner's Dilemma game and the Two Stage Gambling game. The results obtained show that the proposed quantum Bayesian Network is a general method that can accommodate violations of the laws of classical probability theory and make accurate predictions regarding human decision-making in these scenarios.

#### Edited by:

Sandro Sozzo, University of Leicester, UK

#### Reviewed by:

Kirsty Kitto, Queensland University of Technology, Australia

Dominic Widdows, Microsoft Inc., USA

#### \*Correspondence:

Catarina Moreira catarina.p.moreira@ist.utl.pt

#### Specialty section:

This article was submitted to Cognition, a section of the journal Frontiers in Psychology

Received: 31 July 2015 Accepted: 05 January 2016 Published: 26 January 2016

#### Citation:

Moreira C and Wichert A (2016) Quantum-Like Bayesian Networks for Modeling Decision Making. Front. Psychol. 7:11. doi: 10.3389/fpsyg.2016.00011

Keywords: Bayesian networks, decision making, quantum probability, quantum cognition, sure thing principle

# 1. INTRODUCTION

The present work proposes a new model to make predictions in paradoxical situations where the Sure Thing Principle is violated. The Sure Thing Principle (Savage, 1954) is a fundamental principle in economics and probability theory. It states that if one prefers action A over B under state of the world X, and if one also prefers A over B under the complementary state of the world, ¬X, then one should always prefer action A over B even when the state of the world is unspecified. Several experiments have shown that people violate this principle in decisions under uncertainty, leading to paradoxical results and violations of the classical law of total probability (Tversky and Kahneman, 1974; Tversky and Kahneman, 1983; Tversky and Shafir, 1992; Aerts et al., 2004; Birnbaum, 2008).

# 1.1. Motivation

More recently, cognitive scientists have turned to quantum probability theory in order to accommodate these paradoxical findings. Although many models have been proposed in the literature, most of them cannot be considered predictive. Most of these models require a set of quantum parameters to be fitted and, so far, the only way these models have to fit the parameters is to use the final outcome of the experiment to set the parameters in order to explain that outcome. Moreover, these models cannot scale to more complex decision scenarios, because the number of parameters is exponentially large (Khrennikov, 2003a, 2004, 2006) or because of computational constraints in the computation of very large unitary operators (Busemeyer et al., 2006b, 2009; Pothos and Busemeyer, 2009).

# 1.2. Contributions

For these reasons, in this work, we propose a network structure framework that can easily scale to more complex decision scenarios. In other words, we propose a quantum-like Bayesian Network formalism, which consists in replacing classical probabilities by quantum probability amplitudes. However, since this approach also suffers from the problem of exponential growth of quantum parameters that need to be fit, we also propose a similarity heuristic (Shah and Oppenheimer, 2008) that automatically computes this exponential number of quantum parameters through vector similarities. A Bayesian Network can be understood as an acyclic directed graph, in which each node represents a random variable and each edge represents a direct causal influence from the source node to the target node (conditional dependence).

In this article, we will address the problem of violations of the Sure Thing Principle by examining two major problems in which these violations were verified: the Prisoner's Dilemma game and the Two Stage Gambling game. These violations were initially reported by Tversky and Shafir (1992) and later reproduced in several works in the literature that reported similar results (Li and Taplin, 2002; Busemeyer et al., 2006a; Hristova and Grinberg, 2008). We will show how the current classical models fail to explain the paradoxical findings implied in the violations of the Sure Thing Principle, and we will discuss in more depth the drawbacks of the most representative quantum-like models in the literature.

# 1.3. Research Questions

With the present work, we intend to address the following research questions. An answer to these questions is given in Section 8.


# 2. VIOLATIONS OF THE SURE THING PRINCIPLE

In this section, we present two experiments from the literature in which violations of the Sure Thing Principle, and consequently of the laws of classical probability theory and logic, were observed. The two experiments are the Prisoner's Dilemma game and the Two Stage Gambling game.

# 2.1. The Prisoner's Dilemma Game

The Prisoner's Dilemma game corresponds to an example of the violation of the Sure Thing Principle. In this game, there are two prisoners who are in separate solitary confinements with no means of speaking to or exchanging messages with the other. The police offer each prisoner an agreement: each prisoner is given the opportunity either to betray the other (Defect), by testifying that the other committed the crime, or to Cooperate with the other by remaining silent.

In order to test the veracity of the Sure Thing Principle under the Prisoner's Dilemma game, an experiment was made in which three conditions were tested:


**Table 1** summarizes the results of several works in the literature that have performed this experiment. Note that all entries of **Table 1** show a violation of the law of total probability. According to the law of total probability, it is expected that:

$$\begin{aligned} \Pr(P\_2 = \text{Defect} \mid P\_1 = \text{Defect}) &\geq \Pr(P\_2 = \text{Defect})\\ &\geq \Pr(P\_2 = \text{Defect} \mid P\_1 = \text{Cooperate}) \end{aligned}$$

Note that Pr(P<sub>2</sub> = Defect | P<sub>1</sub> = Defect) corresponds to the probability of the second player choosing the Defect action given that he knows that the first player chose to Defect. In **Table 1** this corresponds to the entry Known to Defect. In the same way, Pr(P<sub>2</sub> = Defect | P<sub>1</sub> = Cooperate) corresponds to the entry Known to Collaborate. The probability, observed during the experiments, of player 2 choosing to Defect, Pr(P<sub>2</sub> = Defect), corresponds to the entry Unknown of **Table 1**, since there is no evidence about the first player's actions. Finally, the entry Classical Probability corresponds to the classical probability Pr(P<sub>2</sub> = Defect), which is computed through the law of total probability:

TABLE 1 | Works of the literature reporting the probability of a player choosing to Defect under several conditions for the Prisoner's Dilemma Game: when the action of the second player is known to be Defect (Known to Defect), when the action of the second player is known to be Cooperate (Known to Collaborate), and when the action of the second player is not known (Unknown).


a corresponds to the average of the results reported in the first two payoff matrices of the work of Croson (1999).

b corresponds to the average of all seven experiments reported in the work of Li and Taplin (2002).

TABLE 2 | Works of the literature reporting the probability of a player choosing to make a second gamble under several conditions for the Two Stage Gambling Game: when the outcome of the first gamble is known to be Lose (Known to Lose), when the outcome of the first gamble is known to be Win (Known to Win), and when the outcome of the first gamble is not known (Unknown).


$$\begin{aligned} \Pr(P\_2 = \text{Defect}) = {} & \Pr(P\_1 = \text{Defect}) \cdot \Pr(P\_2 = \text{Defect} \mid P\_1 = \text{Defect}) \\ & + \Pr(P\_1 = \text{Cooperate}) \cdot \Pr(P\_2 = \text{Defect} \mid P\_1 = \text{Cooperate}) \end{aligned}$$
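The constraint imposed by the law of total probability can be made concrete with a short sketch. The three probabilities below are illustrative values chosen for this example; they are assumptions and are not read from **Table 1**. Whatever prior is assigned to the first player's action, the classical prediction is a weighted average of the two conditional probabilities and therefore cannot fall below the smaller of them.

```python
def total_probability(prior_defect, p_given_defect, p_given_cooperate):
    """Law of total probability for Pr(P2 = Defect)."""
    return (prior_defect * p_given_defect
            + (1.0 - prior_defect) * p_given_cooperate)

# Illustrative (hypothetical) values for the three experimental conditions.
known_defect = 0.97       # Pr(P2 = Defect | P1 = Defect)
known_cooperate = 0.84    # Pr(P2 = Defect | P1 = Cooperate)
observed_unknown = 0.63   # observed Pr(P2 = Defect) when P1's action is unknown

# Scan every prior Pr(P1 = Defect) on a grid from 0 to 1.
priors = [k / 100.0 for k in range(101)]
classical = [total_probability(q, known_defect, known_cooperate) for q in priors]
lowest, highest = min(classical), max(classical)

# The observation violates the classical law if it lies outside the attainable range.
violates = not (lowest <= observed_unknown <= highest)
```

With these numbers the attainable classical range is bounded by the two conditional probabilities, so no choice of prior reproduces the value assumed for the unknown condition.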

# 2.2. The Two Stage Gambling Game

The Two Stage Gambling game is another game that shows violations of the Sure Thing Principle. In this game, participants were asked at each stage to make the decision of whether or not to play a gamble that has an equal chance of winning \$200 or losing \$100. Three conditions were verified:


The overall results revealed that participants who knew that they had won the first gamble decided to play again. Participants who knew that they had lost the first gamble also decided to play again. According to Savage's Sure Thing Principle, the participants were therefore expected to play again even if they did not know the outcome of the first gamble. However, the results revealed something different: when the participants did not know the outcome of the first gamble, many of them decided not to play the second one.

We conclude this section by clarifying why we validate the proposed quantum-like Bayesian Network only on small decision problems (such as the Prisoner's Dilemma and the Two Stage Gambling Game), even though we are proposing a general quantum-like structure that is able to deal with complex decision scenarios. We used small decision scenarios because we could not find literature showing violations of the Sure Thing Principle for more complex decision scenarios. In fact, after some research, we believe that the violations of the Sure Thing Principle tend to diminish with the complexity of the decision scenario. Imagine, for instance, a three stage gambling game. It would be very hard to find significant data showing a player wishing to play the last gamble given that he has lost the two previous gambles. **Table 2** shows the results obtained in several works of the literature.

# 3. VIOLATION OF THE SURE THING PRINCIPLE: CLASSICAL APPROACHES

There are many classical approaches that could be used to try to accommodate violations of the Sure Thing Principle. Two of the main ones are Classical Markov Models and Classical Bayesian Networks. In this section, we describe how these two models work and explain why they cannot be used to simulate violations of the Sure Thing Principle.

# 3.1. Classical Markov Model

A Markov Model can be generally defined as a stochastic probabilistic undirected graphical model that satisfies the Markov property. This means that the process evolves (and tries to perform a prediction) based only on the present state: given the current state, the future states are independent of the past states. These probabilistic models are very useful for modeling systems that change states according to a transition matrix that specifies some probability distribution or some transition rules that depend solely on the current state.

The initial state is given by a vector that contains the probabilities of each event occurring. The entries of this vector must sum to one.

$$P\_I = \begin{bmatrix} a\_0 & a\_1 & \dots & a\_n \end{bmatrix} \cdot \frac{1}{\sum\_i a\_i}$$

The state transition is governed by a differential equation for a transition function T(t), which multiplies the initial probability state P<sub>I</sub>. This function is represented by a matrix of non-negative real numbers, with the constraint that each column must sum to one (normalization axiom), so that left-multiplication, as in Equation 3, preserves the normalization of the state. In other words, this matrix yields the new probability distribution across all possible outcomes after some time period t (Pothos and Busemeyer, 2009).

$$\frac{d}{dt}T(t) = K \cdot T(t) \tag{1}$$

The intensity matrix K corresponds to the problem's settings. For instance, for the Prisoner's Dilemma Game, it represents the payoffs of each player; in the Two Stage Gambling Game, it represents the rewards or losses that the player can have in each gamble. A solution to this equation is given by Equation 2, which allows one to construct a transition matrix for any time point from the fixed intensity matrix. In other words, the intensity matrix performs a transformation of the probabilities of the current state in order to favor a certain action in the decision problem.

$$T(t) = e^{K \cdot t} \tag{2}$$

In the end, we can compute the solution for the probability distribution over time by multiplying the transition matrix by the initial probability state.

$$P\_F(t) = e^{K \cdot t} \cdot P\_I(0) \tag{3}$$

In Equation 3, we do not need to perform any normalization in the end, because the operation in Equation 1 together with the intensity matrix K ensures that the values computed are already probability values.
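Equations 1–3 can be sketched for a two-state system as follows. The rates `a` and `b`, the unnormalized weights, and the truncated Taylor series used for the matrix exponential are all assumptions made for this illustration; in particular, the intensity matrix here is not derived from any payoff matrix. The state is treated as a column vector, so the columns of K sum to zero and the columns of T(t) sum to one.

```python
import math

def mat_mul(A, B):
    """Product of two square matrices stored as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_vec(A, v):
    """Matrix times column vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def expm(K, t, terms=40):
    """exp(K t) by a truncated Taylor series; adequate for small K t only."""
    n = len(K)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        step = [[K[i][j] * t / k for j in range(n)] for i in range(n)]
        term = mat_mul(term, step)          # term is now (K t)^k / k!
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

# Initial state: normalize hypothetical non-negative weights so that they sum to one.
weights = [1.0, 3.0]
P_I = [w / sum(weights) for w in weights]

# Hypothetical intensity matrix: rate a from state 0 to 1, rate b from state 1 to 0.
a, b = 1.0, 2.0
K = [[-a, b],
     [a, -b]]

t = 1.0
T = expm(K, t)              # Equation 2
P_F = mat_vec(T, P_I)       # Equation 3

# Closed-form solution of the two-state equation, used here only as a cross-check.
stationary_0 = b / (a + b)
expected_p0 = stationary_0 + (P_I[0] - stationary_0) * math.exp(-(a + b) * t)
```

No renormalization of `P_F` is needed: because the columns of `K` sum to zero, the entries of `P_F` sum to one up to rounding error.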

Since, in the end, the Markov Model has to obey the rules of probability theory and set theory, even if we parameterize the intensity matrix K, we would find that there are no values that could explain the violations of the Sure Thing Principle without violating the laws of classical probability theory. Several studies in the literature have demonstrated that the classical Markov Model cannot accommodate violations of the Sure Thing Principle (Busemeyer et al., 2009; Pothos and Busemeyer, 2009).
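The reason is the linearity of the Markov dynamics, which the following sketch illustrates. It assumes a four-state model whose states combine the opponent's action with the player's own action, and a hypothetical column-stochastic transition matrix `T`; neither the state space nor the numbers are taken from the cited studies.

```python
def evolve(T, p):
    """Apply a column-stochastic transition matrix to a probability vector."""
    return [sum(T[i][j] * p[j] for j in range(len(p))) for i in range(len(T))]

# State order: (opponent D, self D), (opponent D, self C),
#              (opponent C, self D), (opponent C, self C).
# Hypothetical transition matrix; every column sums to one.
T = [
    [0.9, 0.8, 0.0, 0.0],
    [0.1, 0.2, 0.0, 0.0],
    [0.0, 0.0, 0.7, 0.6],
    [0.0, 0.0, 0.3, 0.4],
]

def prob_defect(initial):
    """Probability that the player defects after the transition."""
    final = evolve(T, initial)
    return final[0] + final[2]

known_defect = prob_defect([0.5, 0.5, 0.0, 0.0])      # opponent known to defect
known_cooperate = prob_defect([0.0, 0.0, 0.5, 0.5])   # opponent known to cooperate

# Unknown condition: a mixture of the two known initial states with weight q.
q = 0.5
mixed_initial = [q * 0.5, q * 0.5, (1.0 - q) * 0.5, (1.0 - q) * 0.5]
unknown = prob_defect(mixed_initial)

# Linearity forces the law of total probability.
predicted = q * known_defect + (1.0 - q) * known_cooperate
```

Because `unknown` always equals `predicted`, it lies between `known_cooperate` and `known_defect` for every weight `q` and every stochastic `T`, so tuning the parameters cannot push the unknown condition below both known conditions.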

# 3.2. Classical Bayesian Networks

A classical Bayesian Network can be defined by a directed acyclic graph structure in which each node represents a different random variable from a specific domain and each edge represents a direct influence from the source node to the target node. The graph represents independence relationships between variables, and each node is associated with a conditional probability table, which specifies a distribution over the values of the node given each possible joint assignment of values of its parents. This idea of a node depending directly on its parent nodes is the core of Bayesian Networks. Once the values of the parents are known, no information relating directly or indirectly to its parents or other ancestors can influence the beliefs about it (Koller and Friedman, 2009).

A Bayesian Network can be understood as the representation of a full joint probability distribution through conditional independence statements. This way, a Bayesian Network can be used to answer any query about the domain by combining (adding) all relevant entries from the joint probability.

The full joint distribution (Russel and Norvig, 2010) of a Bayesian Network, where X is the list of variables, that is, the set of nodes of the Bayesian Network, is given by:

$$\Pr(X\_1, \ldots, X\_n) = \prod\_{i=1}^n \Pr(X\_i | \text{Parents}(X\_i)) \tag{4}$$

The formula for computing classical exact inferences on Bayesian Networks is based on the full joint distribution (Equation 4). Let e be the list of observed variables (nodes) and let Y be the remaining unobserved variables (nodes) in the network. For some query X, the inference is given by Equation 5. Note that Pr(X, e, y) corresponds to the full joint probability distribution.

$$\Pr(X|e) = \alpha \left[ \sum\_{y \in Y} \Pr(X, e, y) \right] \tag{5}$$

$$\text{where} \ \alpha = \frac{1}{\sum\_{x \in X} \Pr(X = x, e)}$$

The summation is over all possible y, i.e., all possible combinations of values of the unobserved variables Y. The α parameter corresponds to the normalization factor for the distribution Pr(X|e) (Russel and Norvig, 2010). This normalization factor comes from Bayes rule: it is the reciprocal of the probability of the evidence, Pr(e).
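Equations 4 and 5 can be sketched by brute-force enumeration on a very small network. The chain Y → X → E and all conditional probability values below are hypothetical and serve only to make the summation over the unobserved variable and the role of α concrete.

```python
# Hypothetical conditional probability tables for a chain Y -> X -> E.
p_y = {True: 0.5, False: 0.5}            # Pr(Y = y)
p_x_given_y = {True: 0.8, False: 0.3}    # Pr(X = True | Y = y)
p_e_given_x = {True: 0.9, False: 0.2}    # Pr(E = True | X = x)

def bernoulli(p_true, value):
    """Probability of `value` for a binary variable with Pr(True) = p_true."""
    return p_true if value else 1.0 - p_true

def joint(y, x, e):
    """Equation 4: product of each node's probability given its parents."""
    return p_y[y] * bernoulli(p_x_given_y[y], x) * bernoulli(p_e_given_x[x], e)

def query_x(e):
    """Equation 5: sum out the unobserved Y, then normalize with alpha."""
    scores = {x: sum(joint(y, x, e) for y in (True, False))
              for x in (True, False)}
    alpha = 1.0 / sum(scores.values())
    return {x: alpha * score for x, score in scores.items()}

posterior = query_x(e=True)
print(posterior)
```

Here the unnormalized scores are Pr(X = x, e), and α rescales them so that the two posterior values sum to 1.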

One might think that, if we parameterize the Bayesian Network, it could be possible to explain the paradoxical findings that violate the Sure Thing Principle. This line of thought is legitimate; however, one must take into account that, in the end, the probabilistic inferences computed through the Bayesian Network must obey set theory and the law of total probability. This means that, even if we parameterize the network, we could not find any closed form optimization that would accommodate violations of the Sure Thing Principle.

# 4. VIOLATION OF THE SURE THING PRINCIPLE: QUANTUM-LIKE APPROACHES

In this section, we introduce the most important quantum decision models proposed in the literature that can accommodate violations of the Sure Thing Principle. The models that we describe in this section are the following: the Quantum Dynamical Model (Section 4.1), the Quantum-Like Approach (Section 4.2), and the Quantum Prospect Decision Theory (Section 4.3).

# 4.1. The Quantum Dynamical Model

The Quantum Dynamical Model was originally proposed by Busemeyer (Busemeyer et al., 2009; Pothos and Busemeyer, 2009) and consists of a general framework that corresponds to a quantum version of a classical dynamical Markov model. The model takes time evolution into account, and it captures quantum interference effects through a superposition of paths.

The initial belief state corresponds to a quantum state representing a superposition of the participant's beliefs in the form of a vector. The term ψ corresponds to a quantum probability amplitude.

$$Q\_I = \begin{bmatrix} \psi\_0 \ \psi\_1 \ \dots \ \psi\_n \end{bmatrix} \cdot \frac{1}{\sum\_i |\psi\_i|^2} \tag{6}$$

Next, we need to create a unitary matrix. In quantum mechanics, a unitary matrix restricts the allowed evolution of quantum systems, ensuring that the sum of probabilities of all possible outcomes of any event is always 1. This means that the matrix must be orthonormal (the rows are mutually orthogonal unit vectors, as are the columns). In the Quantum Dynamical Model, this matrix encodes all state transitions that a person can experience while making a decision. The unitary matrix U is obtained by solving a differential equation called Schrödinger's equation.

$$\frac{d}{dt}U(t) = -i \cdot H \cdot U(t) \tag{7}$$

The parameter t corresponds to time. Under the Quantum Dynamical Model, this parameter is set to π/2, corresponding to the average time that a participant takes to make a decision (approximately 2 s) (Pothos and Busemeyer, 2009). The matrix H is called the Hamiltonian matrix, which must be Hermitian in order to generate a unitary matrix. Solving Equation 7 gives:

$$U(t) = \exp(-i \cdot H \cdot t) \tag{8}$$

By multiplying the unitary matrix by the initial superposition belief state, one can compute the transition of the participants' beliefs at each time. The final vector Q<sub>F</sub> represents the amplitude distribution across states after deliberation. The probability distribution over time then follows from the squared magnitudes of these amplitudes.

$$Q\_F = U \cdot Q\_I = e^{-i \cdot H \cdot t} \cdot Q\_I(0) \tag{9}$$

In Equation 9, no final normalization is needed, because the operation in Equation 8 together with the Hamiltonian matrix H ensures that the computed values are in accordance with the normalization axiom.
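The following sketch illustrates Equations 8 and 9 for a two-state system. The Hamiltonian `H` and the initial amplitudes are hypothetical (they are not the matrices used by Busemeyer and colleagues), and the matrix exponential is again approximated by a truncated Taylor series.

```python
import math

def mat_mul(a, b):
    """Multiply two matrices given as lists of rows (entries may be complex)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def mat_exp(m, terms=40):
    """Approximate exp(m) with a truncated Taylor series."""
    n = len(m)
    result = [[1.0 + 0j if i == j else 0j for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, m)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

# Hypothetical Hamiltonian: real and symmetric, hence Hermitian.
H = [[1.0, 0.5],
     [0.5, -1.0]]
t = math.pi / 2

U = mat_exp([[-1j * h * t for h in row] for row in H])   # Equation 8
Q_I = [[1 / math.sqrt(2)], [1 / math.sqrt(2)]]           # unit-length initial state
Q_F = mat_mul(U, Q_I)                                    # Equation 9
probabilities = [abs(row[0]) ** 2 for row in Q_F]        # squared magnitudes
print(probabilities, sum(probabilities))
```

Since `H` is Hermitian, `U` is unitary and the squared magnitudes of `Q_F` sum to 1 (up to rounding) with no normalization step.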

# 4.2. The Quantum-Like Approach

The Quantum-Like Approach has its roots in contextual probabilities. This model was proposed by Khrennikov and corresponds to a general contextual probability space from which the classical and quantum probability models can be derived (Khrennikov, 2009b, 2010).

In the Quantum-Like Approach, the context relates to the circumstances that form the setting for an event and in terms of which the event can be fully understood. More specifically, it is a complex of conditions under which a measurement is performed. For instance, in domains outside of physics, such as cognitive science, one can have mental contexts, and in the social sciences one can have a social context. The same idea applies to many other domains, such as economics, politics, game theory, biology, etc. (Khrennikov, 1999, 2001, 2003b, 2005a,b).

The Quantum-Like Approach corresponds to a contextual probabilistic model given by M = (𝒞, 𝒪, π(𝒪, 𝒞)), where 𝒞 is a set of contexts, 𝒪 is the set of observables, and π(𝒪, 𝒞) corresponds to a probability distribution of some observables belonging to a specific context. Each context is associated with a set of observables. In quantum mechanics, an observable corresponds to a self-adjoint operator on a complex Hilbert Space. Under the Quantum-Like Approach, these observables correspond to the set of possible events with their respective values.

Let us assume, for a context C ∈ 𝒞, that there are two dichotomous observables a, b ∈ 𝒪, and that these observables can take values α ∈ a and β ∈ b, respectively.

The Quantum-Like Approach can be built from the general structure of the quantum law of total probability. The quantum law of total probability is very similar to the classical law of total probability, except that it uses complex amplitudes instead of real probability values. In order to obtain a probability value, the magnitude of the quantum amplitude must be squared (Busemeyer and Bruza, 2012). This generates an additional term called the interference term. This term does not exist in classical probability and enables the representation of interferences between quantum states.

$$Pr(b=\beta) = \text{Classical\_Probability}(b=\beta) + \text{Interference\_Term} \tag{10}$$

Under this representation, we can replace Classical\_Probability by the classical law of total probability, and replace the quantum Interference\_Term by a measure of supplementarity, represented by δ(β|a, C).

If we normalize the supplementarity measure δ(β|a, C) by twice the square root of the product of all probabilities, we obtain:

$$\lambda\_{\theta} = \frac{\delta(\beta|a, \mathcal{C})}{2\sqrt{\prod\_{\alpha \in a} Pr(a = \alpha | \mathcal{C}) Pr(b = \beta | a = \alpha, \mathcal{C})}} \tag{11}$$

From Equation 11, the general probability formula of the Quantum-Like Approach can be derived. For two variables, it is given by:

$$\begin{aligned} \Pr(b = \beta | C) &= \sum\_{\alpha \in a} \Pr(a = \alpha | C) \Pr(b = \beta | a = \alpha, C) \\ &+ 2\lambda\_{\theta} \sqrt{\prod\_{\alpha \in a} \Pr(a = \alpha | C) \Pr(b = \beta | a = \alpha, C)} \end{aligned} \tag{12}$$

If we look closely at Equation 12, we see that the first summation of the formula corresponds to the classical law of total probability. The second term of the formula (the one that contains the λ<sub>θ</sub> parameter) does not exist in the classical model and is called the interference term.

In a quantum context, since the supplementarity term δ(β|a, C) is normalized in a quantum fashion, we automatically know that the magnitude of the indicator term λ<sub>θ</sub> can never exceed 1 in order to obtain quantum probabilities, that is, |λ<sub>θ</sub>| ≤ 1. So, under trigonometric contexts, the Quantum-Like Approach for quantum probabilities becomes:

$$\begin{aligned} \lambda\_{\theta} = \cos(\theta) \quad \rightarrow \quad Pr(\beta|C) &= \sum\_{\alpha \in a} Pr(\alpha|C) Pr(\beta|\alpha, C) \\ &+ 2 \sqrt{\prod\_{\alpha \in a} Pr(\alpha|C) Pr(\beta|\alpha, C)} \cos(\theta) \end{aligned} \tag{13}$$

For a dichotomous observable a with values α<sub>1</sub> and α<sub>2</sub>, Equation 13 can be simplified in the following way:

$$\begin{split} \Pr(\beta|\mathcal{C}) &= \left| \sqrt{\Pr(\alpha\_1|\mathcal{C}) \Pr(\beta|\alpha\_1, \mathcal{C})} \right. \\ &\left. + e^{i\theta\_{\beta|\alpha, \mathcal{C}}} \sqrt{\Pr(\alpha\_2|\mathcal{C}) \Pr(\beta|\alpha\_2, \mathcal{C})} \right|^2 \end{split} \tag{14}$$

Equation 14 corresponds to the representation of the quantum law of total probability. In this equation, the angle θ<sub>β|α,C</sub> corresponds to the phase of a random variable and incorporates the phases of both A = α<sub>1</sub> and A = α<sub>2</sub> in the following way: θ<sub>β|α,C</sub> = θ<sub>β|α<sub>1</sub></sub> − θ<sub>β|α<sub>2</sub></sub>.
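The equivalence between the interference form (Equation 13) and the squared-amplitude form (Equation 14) can be checked numerically. The probabilities and the phase below are arbitrary illustrative values, not data from any experiment.

```python
import cmath
import math

# Hypothetical inputs: Pr(alpha|C) and Pr(beta|alpha, C) for alpha_1 and alpha_2.
p_alpha = [0.5, 0.5]
p_beta_given_alpha = [0.6, 0.7]
theta = 2.0  # phase difference, an arbitrary illustrative value

# Classical probability of each path alpha_k -> beta.
paths = [pa * pb for pa, pb in zip(p_alpha, p_beta_given_alpha)]

# Equation 13: classical law of total probability plus the interference term.
eq13 = sum(paths) + 2 * math.sqrt(paths[0] * paths[1]) * math.cos(theta)

# Equation 14: squared magnitude of the sum of the two path amplitudes.
eq14 = abs(math.sqrt(paths[0]) + cmath.exp(1j * theta) * math.sqrt(paths[1])) ** 2

print(eq13, eq14)
```

Both expressions agree up to floating-point rounding; setting `theta` to π/2 removes the interference term and leaves the classical sum of the two paths.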

One should note that the Quantum-Like Approach can be extended to more complex decision scenarios, that is, scenarios with more than two random variables. However, this leads to the very difficult task of tuning an exponential number of quantum θ parameters. Peter Nyman noticed this problem when he generalized the Quantum-Like Approach to three dichotomous variables (Nyman, 2010, 2011b; Nyman and Basieva, 2011a,b).

#### 4.2.1. The Hyperbolic Interference

Although the Quantum-Like Approach offers great possibilities compared with the classical one, it seems that it cannot completely cover data from psychology and that a quantum formalism is not enough to explain some paradoxical findings (see Khrennikov et al., 2014), so hyperbolic spaces were proposed (Khrennikov, 2005c; Nyman, 2011a,b).

From Equation 12, if Pr(b = β) − ∑<sub>α∈a</sub> Pr(a = α|C)Pr(b = β|a = α, C) is different from zero, then some interference effects occur. To determine which type of interference occurred, one first tests the Quantum-Like Approach for quantum probabilities. This is done by normalizing the supplementarity measure in a quantum fashion, as presented in Equation 11.

If the probability Pr(b = β) was not computed in a trigonometric space (that is, it is not quantum), then the quantum normalization applied in Equation 11 gives a value whose magnitude is bigger than 1. Since we are not in the context of quantum probabilities, the quantum normalization factor fails to normalize the interference term, because the supplementarity measure is bigger than the normalization factor. Under these circumstances, the Quantum-Like Approach incorporates the generalization to hyperbolic probabilities, arguing that these probabilities were computed in a hyperbolic context (Khrennikov, 2009a, 2010; Nyman, 2011a).

Under hyperbolic contexts, the Quantum-Like Approach contextual probability formula becomes:

$$\begin{aligned} \lambda\_{\theta} = \cosh(\theta) \quad \rightarrow \quad Pr(\beta|C) &= \sum\_{\alpha \in a} Pr(\alpha|C) Pr(\beta|\alpha, C) \\ &\pm 2 \sqrt{\prod\_{\alpha \in a} Pr(\alpha|C) Pr(\beta|\alpha, C)} \cosh(\theta) \end{aligned} \tag{15}$$

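In summary, according to the values computed by the indicator function λ<sub>θ</sub>, the Växjö Model enables the computation of probabilities in the following contexts: classical contexts, when λ<sub>θ</sub> = 0 and the interference term vanishes; trigonometric (quantum) contexts, when 0 < |λ<sub>θ</sub>| ≤ 1; and hyperbolic contexts, when |λ<sub>θ</sub>| > 1.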
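The test described in this section can be sketched as a small routine that evaluates Equation 11 and classifies the context. The thresholds used here are an assumption read off the text above (λ<sub>θ</sub> = 0 for classical, magnitude at most 1 for trigonometric, magnitude above 1 for hyperbolic), and the function names and all probability values are hypothetical.

```python
import math

def lambda_theta(pr_beta_observed, p_alpha, p_beta_given_alpha):
    """Equation 11: supplementarity measure over twice the root of the product."""
    paths = [pa * pb for pa, pb in zip(p_alpha, p_beta_given_alpha)]
    delta = pr_beta_observed - sum(paths)   # deviation from the classical law
    return delta / (2 * math.sqrt(math.prod(paths)))

def context_type(lam, tol=1e-12):
    """Classify the context from the indicator value (assumed thresholds)."""
    if abs(lam) <= tol:
        return "classical"
    return "trigonometric" if abs(lam) <= 1 else "hyperbolic"

# Hypothetical observed probabilities paired with hypothetical conditionals.
lam_a = lambda_theta(0.85, [0.5, 0.5], [0.9, 0.8])    # matches the classical sum
lam_b = lambda_theta(0.60, [0.5, 0.5], [0.9, 0.8])    # moderate deviation
lam_c = lambda_theta(0.40, [0.5, 0.5], [0.1, 0.05])   # deviation too large for cos
for lam in (lam_a, lam_b, lam_c):
    print(round(lam, 4), context_type(lam))
```

In the third case the deviation exceeds what a cosine can account for, which is the situation in which the text above resorts to the hyperbolic formula.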


# 4.3. The Quantum Prospect Decision Theory

The Quantum Prospect Decision Theory was developed by Yukalov and Sornette (2008, 2011) and extended in several other works (Yukalov and Sornette, 2009a,b, 2010a,b). The foundations of this theory are very similar to those of the previously presented Quantum-Like Approach.

In the Quantum-Like Approach, we start with two dichotomous observables. In the Quantum Prospect Decision Theory, these observables are referred to as intentions. An intention is an intended action, and a set of intended actions defines a prospect.

Each prospect can contain a set of action modes, which are concrete representations of an intention. By analogy with the Quantum-Like Approach, a prospect can be seen as a random variable, and its action modes are the values that the random variable can take. For instance, the intention to play can have two representations: play action A or play action B.

Following the work of Yukalov and Sornette (2011), consider two intentions A and B with the respective representations A = x, where x ∈ {a<sub>1</sub>, a<sub>2</sub>}, and B = y, where y ∈ {b<sub>1</sub>, b<sub>2</sub>}. The corresponding state of mind is given by:

$$\left| \psi\_{s} \left( t \right) \right\rangle = \sum\_{i,j} c\_{i,j} \left( t \right) \left| A\_{i} \ B\_{j} \right\rangle \tag{16}$$

Equation 16 represents a linear combination of the prospect basis states. From a psychological perspective, the state of mind is a fixed vector characterizing a particular decision maker with his/her beliefs, habits, principles, etc. That is, it describes each decision maker as a unique subject.

The prospect states corresponding to the intentions A and B are given by Equation 17. The ψ symbol corresponds to the quantum amplitudes associated with the prospect state. Under the Quantum Prospect Decision Theory, these amplitudes represent the weights of the intended actions while a person is still deliberating about them.

$$\begin{aligned} \vert \pi\_{A=a\_1} \rangle &= \psi\_{11} \vert A = a\_1 B = b\_1 \rangle + \psi\_{12} \vert A = a\_1 B = b\_2 \rangle \\ \vert \pi\_{A=a\_2} \rangle &= \psi\_{21} \vert A = a\_2 B = b\_1 \rangle + \psi\_{22} \vert A = a\_2 B = b\_2 \rangle \end{aligned} \tag{17}$$

The probabilities of the prospects can be obtained by computing the squared magnitude of the prospect states (just like in the Quantum-Like Approach and the Quantum Dynamical Model). Consequently, the final probabilities are given by:

$$\begin{aligned} Pr(\pi\_{A=a\_1}) &= Pr(A=a\_1, B=b\_1) + Pr(A=a\_1, B=b\_2) \\ &+Interference\_{A=a\_1} \\ Pr(\pi\_{A=a\_2}) &= Pr(A=a\_2, B=b\_1) + Pr(A=a\_2, B=b\_2) \\ &+Interference\_{A=a\_2} \end{aligned} \tag{18}$$

where the interference term is defined by:

$$\begin{aligned} Interference\_{A=a\_1} &= 2 \cdot \varphi(\pi\_{A=a\_1}) \sqrt{Pr(A=a\_1, B=b\_1)} \cdot \sqrt{Pr(A=a\_1, B=b\_2)} \\ Interference\_{A=a\_2} &= 2 \cdot \varphi(\pi\_{A=a\_2}) \sqrt{Pr(A=a\_2, B=b\_1)} \cdot \sqrt{Pr(A=a\_2, B=b\_2)} \end{aligned} \tag{19}$$

In Equation 19, the symbol ϕ corresponds to the uncertainty factor and is given by:

$$\begin{aligned} \varphi(\pi\_{A=a\_1}) &= \cos \left( \arg \left( \psi\_{11} \cdot \psi\_{12} \right) \right) \\ \varphi(\pi\_{A=a\_2}) &= \cos \left( \arg \left( \psi\_{21} \cdot \psi\_{22} \right) \right) \end{aligned} \tag{20}$$

The interference term corresponds to the effects that emerge during the process of deliberation, that is, while a person is making a decision. These interference effects result from conflicting interests, ambiguity, emotions, etc. (Yukalov and Sornette, 2011).
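Equations 18 to 20 can be sketched as follows. The amplitudes are hypothetical, and one assumption deserves emphasis: the uncertainty factor is computed here from the phase difference of the two amplitudes, i.e., the argument of conj(ψ<sub>i1</sub>)·ψ<sub>i2</sub>. This reading of Equation 20 is what makes Equation 18 coincide with the squared magnitude of the summed amplitudes; it is an interpretation, not necessarily the authors' exact definition.

```python
import cmath
import math

# Hypothetical amplitudes psi[i][j] for A = a_(i+1), B = b_(j+1),
# written as (modulus, phase) pairs.
psi = [[cmath.rect(0.6, 0.0), cmath.rect(0.3, 2.0)],
       [cmath.rect(0.5, 0.0), cmath.rect(0.4, 1.0)]]

def prospect_score(psi_1, psi_2):
    """Unnormalized prospect probability following Equations 18 and 19."""
    p1, p2 = abs(psi_1) ** 2, abs(psi_2) ** 2
    # Assumption: uncertainty factor taken as the cosine of the phase difference.
    phi = math.cos(cmath.phase(psi_1.conjugate() * psi_2))
    interference = 2 * phi * math.sqrt(p1) * math.sqrt(p2)   # Equation 19
    return p1 + p2 + interference                            # Equation 18

scores = [prospect_score(*row) for row in psi]
print(scores)
```

The scores are not normalized in this sketch, and no attempt is made to enforce any constraint on the interference terms across prospects.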

One can notice that the Quantum Prospect Decision Theory is very similar to the Quantum-Like Approach proposed by Khrennikov (2009c). Both theories end up with the same quantum probability formula. However, the Quantum Prospect Decision Theory provides some heuristics on how to choose the uncertainty factors. These heuristics are addressed in the next section.

#### 4.3.1. Choosing the Uncertainty Factor

In order to accommodate the violations of the Sure Thing Principle, the uncertainty factor must be set in such a way that it will enable accurate predictions. Two methods were proposed by Yukalov and Sornette (2011) to estimate the uncertainty factor: the Interference Alternation method and the Interference Quarter Law.

• **Interference Alternation**: Under normalized conditions, the probabilities of the prospects Pr(π<sub>j</sub>) must sum to 1. This normalization only occurs if one characterizes the interference term as an alternation, such that the interference effects cancel when the probabilities of the prospects are summed. The interference alternation property is in accordance with the findings of Epstein (1999): destructive interference effects can be associated with uncertainty aversion, which makes an action less probable under uncertainty conditions. In contrast, the probabilities of other actions that contain less uncertainty are enhanced through constructive quantum interference effects. This uncertainty aversion happens quite frequently in situations where the Sure Thing Principle is violated. This implies that one of the probabilities of the prospects must be enhanced, whereas the other must be decreased.

$$\begin{aligned} \operatorname{sign} \left[ \varphi(\pi\_{A=a\_1}) \right] &= -\operatorname{sign} \left[ \varphi(\pi\_{A=a\_2}) \right] \quad \text{where} \\ \left| \varphi(\pi\_{A=a\_i}) \right| &\in \left[ 0, 1 \right] \end{aligned} \tag{21}$$

• **Interference Quarter Law**: the interference terms generated by quantum probabilistic inferences have a free quantum parameter, which is the uncertainty factor. The Interference Quarter Law corresponds to a quantitative estimation of this parameter. The modulus of the interference term q can be quantitatively estimated by computing the expectation value of a random variable ξ with probability distribution pr(ξ) in the interval [0, 1].

$$q \equiv \int\_0^1 \xi \cdot \operatorname{pr}(\xi) \, d\xi = \frac{1}{4} \tag{22}$$

The probability distribution pr(ξ) is given by Equation 23 and is obtained by averaging two probability distributions.

$$\operatorname{pr}\left(\xi\right) = \frac{1}{2}\left[\operatorname{pr}\_1\left(\xi\right) + \operatorname{pr}\_2\left(\xi\right)\right] = \delta\left(\xi\right) + \frac{1}{2}\Theta\left(1 - \xi\right) \tag{23}$$
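As a worked check of the value 1/4, one can substitute the distribution of Equation 23 into the integral of Equation 22. The step below assumes that the delta term contributes nothing to the first moment (since ξ·δ(ξ) vanishes at ξ = 0) and that Θ(1 − ξ) equals 1 on the interval [0, 1]:

$$q = \int\_0^1 \xi \left[ \delta(\xi) + \frac{1}{2}\Theta(1 - \xi) \right] d\xi = 0 + \frac{1}{2} \int\_0^1 \xi \, d\xi = \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}$$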

# 4.4. Quantum-Like Bayesian Networks in the Literature

There are two main works in the literature that have contributed to the development and understanding of Quantum Bayesian Networks. One belongs to Tucci (1995) and the other to Leifer and Poulin (2008).

In the work of Tucci (1995), it is argued that any classical Bayesian Network can be extended to a quantum one by replacing real probabilities with quantum complex amplitudes. This means that the factorization should be performed in the same way as in a classical Bayesian Network.

One big problem with Tucci's work is the lack of methods to set the phase parameters. The author states that one could have infinitely many Quantum Bayesian Networks representing the same classical Bayesian Network, depending on the values chosen for these parameters. This requires knowing a priori which parameters would lead to the desired solution for each node queried in the network (which we never know). So, for these experiments, Tucci's model (Tucci, 1995) cannot predict the observed results, since one does not have any information about the quantum parameters.

In the work of Leifer and Poulin (2008), the authors argue that, in order to develop a quantum Bayesian Network, one requires quantum versions of probability distributions, marginal probabilities, and conditional probabilities (**Table 3**). The authors made a preliminary study of these concepts. Generally speaking, a quantum probability distribution corresponds to a density matrix on a Hilbert space, with the constraint that the trace of this matrix must equal 1. In quantum probability theory, a full joint distribution is given by a density matrix ρ. This matrix provides the probability distribution of all states that a Bayesian Network can have. The marginalization operation corresponds to a quantum partial trace (Nielsen and Chuang, 2000; Rieffel and Polak, 2011). In the end, these models from the literature fail to provide any advantage relative to the classical models, because they cannot take into account interference effects between random variables. So, they provide no advantage in modeling decision-making problems that involve decisions violating the law of total probability.
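The correspondence between marginalization and the partial trace mentioned above can be sketched for two binary variables. The state below, (|00⟩ + |11⟩)/√2, is a standard textbook example chosen for illustration; it is not taken from Leifer and Poulin (2008).

```python
import math

# Hypothetical two-variable pure state (|00> + |11>) / sqrt(2),
# with basis index = 2 * a + b for values a, b in {0, 1}.
state = [1 / math.sqrt(2), 0.0, 0.0, 1 / math.sqrt(2)]

# Density matrix rho = |state><state|, playing the role of the joint distribution.
rho = [[complex(x) * complex(y).conjugate() for y in state] for x in state]
trace = sum(rho[i][i] for i in range(4)).real

def partial_trace_b(rho):
    """Trace out subsystem B, leaving the 2x2 marginal state of A."""
    return [[sum(rho[2 * a + b][2 * a2 + b] for b in range(2))
             for a2 in range(2)] for a in range(2)]

rho_a = partial_trace_b(rho)
print(trace, rho_a)
```

The trace of `rho` equals 1, as required of a density matrix, and the diagonal of `rho_a` gives the marginal probabilities of the first variable.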

TABLE 3 | Relation between classical and quantum probabilities used in the work of Leifer and Poulin (2008).


# 5. PROBLEMS WITH CURRENT CLASSICAL AND QUANTUM-LIKE APPROACHES

In this section, we summarize the three main models that were presented in the previous sections (**Table 4**) and point out the advantages and disadvantages of each one of them.

The Quantum-Like Approach is a very simple framework that enables the computation of quantum probabilities by performing a direct mapping between classical real probabilities and quantum probability amplitudes through Born's rule (Zurek, 2005, 2011). Although this model can be extended to N random variables and can also go beyond quantum probabilities by incorporating hyperbolic spaces, it cannot be called predictive, since there are no mechanisms to estimate the quantum θ parameters. One is required to know a priori the outcome of the decision scenario in order to fit the quantum parameters. So, this model is explanatory in nature as far as accommodating the paradoxical findings derived from violations of the Sure Thing Principle is concerned.

The Quantum Dynamical Model provides an elegant framework that can estimate decisions through time evolution. However, it suffers from a major disadvantage related to Hamiltonian matrices: specifying a Hamiltonian by hand is a very hard problem. All possible interactions of the decision problem must be known, and this specification must be made in such a way that the Hamiltonian is Hermitian, so that the transition probabilities derived from the resulting unitary operator form a doubly stochastic matrix. For more complex decision scenarios, this process is intractable. Furthermore, the Hamiltonian matrix grows exponentially with the complexity of the decision problem, and the computation of a unitary operator from such matrices is a very complex process. Most of the time, approximations are used because of the complexity of the calculations involved in the matrix exponentiation operation.

The Quantum Prospect Decision Theory is very similar to the Quantum-Like Approach, but it is not extended to hyperbolic spaces. Its main advantage over the other known quantum models is its predictive nature. The Quantum-Like Approach and the Quantum Dynamical Model are more explanatory: they require the outcome of an experiment to be known in order to fit the parameters of the model and explain the paradoxical findings. The Quantum Prospect Decision Theory, on the other hand, contains a heuristic (the Interference Quarter Law) that enables the estimation of the quantum parameters, making the model predictive. However, the Interference Quarter Law is a static heuristic: regardless of the decision scenario and of the complexity of the decision, the interference term remains constant for every problem.

The above models exhibit different growth rates in the number of parameters. For instance, the Quantum Dynamical Model parameterizes the actions plus one additional parameter that models cognitive dissonance effects. The number of parameters therefore stays fixed if we consider the N-Person Prisoner's Dilemma Game, that is, if the game is extended from 2 players to N players. In the case of the Quantum-Like Approach, we would have 2<sup>N</sup> parameters for the N-Person Prisoner's Dilemma Game, where the base 2 comes from the fact that each player has two actions (either Defect or Cooperate). The same applies to the Quantum-Like Bayesian Networks and to the Quantum Prospect Decision Theory. If we extend these models to N random variables, the number of parameters grows as N<sub>actions</sub><sup>N<sub>players</sub></sup>, but in the case of the Quantum Prospect Decision Theory these parameters are set automatically using the Law of Quantum Interference. The same holds for the proposed Quantum-Like Bayesian Network, except that the parameters are set automatically by a dynamic heuristic instead of a static one.

At this point, the reader might think that the Quantum Dynamical Model has a great advantage over the other models, since the number of parameters it requires corresponds to the players' actions plus an additional cognitive dissonance parameter. Although this line of thought is correct, one should also take into account how the model unfolds. Although the number of parameters does not grow exponentially as in the Quantum-Like Approach, the size of the Hamiltonian does: it has dimension N<sub>actions</sub><sup>N<sub>players</sub></sup> × N<sub>actions</sub><sup>N<sub>players</sub></sup>, where N<sub>actions</sub> is the number of actions of each player and N<sub>players</sub> is the number of players. As noted above, computing a unitary operator from such matrices is a very complex process, and approximations of the matrix exponential are used most of the time. **Table 5** summarizes the parameter growth rate of each approach.
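To make the growth of the Hamiltonian concrete, the short sketch below tabulates its dimension for a game with two actions per player. The function name `hamiltonian_shape` is hypothetical and introduced only for this illustration.

```python
def hamiltonian_shape(n_actions, n_players):
    """Shape of the Hamiltonian: one basis state per joint assignment of actions."""
    dim = n_actions ** n_players
    return dim, dim

# Two actions per player (Defect or Cooperate), increasing numbers of players.
for n_players in (2, 5, 10, 20):
    rows, cols = hamiltonian_shape(2, n_players)
    print(n_players, rows, rows * cols)   # players, matrix dimension, number of entries
```

The number of matrix entries is the square of the dimension, so it grows even faster than the number of joint action assignments.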


TABLE 4 | Summary of the most relevant quantum decision models of the literature.

TABLE 5 | Comparison of the different growth rates in parameters for some models proposed in the literature.


For these reasons, in this work we propose a network-structured framework that can easily scale to more complex decision scenarios. In other words, we propose a quantum-like Bayesian Network formalism, which consists in replacing classical probabilities by quantum probability amplitudes. However, since this approach also suffers from the exponential growth of the number of quantum parameters that need to be fit, we also propose a similarity heuristic that automatically computes this exponential number of quantum parameters (Shah and Oppenheimer, 2008).

# 6. A QUANTUM-LIKE BAYESIAN NETWORK FOR DECISION AND COGNITION

We chose Bayesian Networks because they provide a link between probability theory and graph theory. A fundamental property of graph theory is its modularity: one can build a complex system by combining smaller and simpler parts. It is easier for a person to combine pieces of evidence and to reason about them than to calculate all possible events and their respective beliefs (Griffiths et al., 2008). In the same way, Bayesian Networks represent the decision problem in small modules that can be combined to perform inferences. Only the probabilities that are actually needed to perform the inferences are computed.

A Quantum-Like Bayesian Network can be defined in the same way as a classical Bayesian Network, with the difference that real probability numbers are replaced by quantum probability amplitudes (Tucci, 1995). **Figure 1** shows an example of the proposed Quantum-Like Bayesian Network, containing quantum probability amplitudes, ψ<sub>i,j</sub>, instead of real probability values.

In this sense, the quantum counterpart of the full joint probability distribution corresponds to the application of Born's rule to Equation 4. This results in the quantum-like version of the full joint probability distribution:

$$Pr(X\_1, \ldots, X\_n) = \left| \prod\_{i=1}^n \psi(X\_i | Parents(X\_i)) \right|^2 \tag{24}$$

In order to perform exact inference in Bayesian Networks, the probability amplitude of each assignment of the network is propagated and influences the probabilities of the remaining nodes. That is, every assignment of every node of the Bayesian Network propagates throughout the network until it reaches the node representing the query variable. Note that, by taking multiple assignments and paths at the same time, these trails influence each other, producing interference effects.

The quantum counterpart of the Bayesian exact inference formula corresponds to the application of Born's rule to the classical marginal probability distribution equation (Equation 5).

$$Pr(X|e) = \alpha \left| \sum\_{y} \prod\_{x=1}^{N} \psi(X\_{x}|Parents(X\_{x}), e, y) \right|^2 \tag{25}$$

Expanding Equation 25 leads to the quantum marginalization formula with interference effects (Moreira and Wichert, 2014):

$$\begin{aligned} Pr(X|e) &= \alpha \left[ \sum\_{i=1}^{|Y|} \left| \prod\_{x}^{N} \psi(X\_{x}|Parents(X\_{x}), e, y = i) \right|^2 + 2 \cdot Interference \right] \\ Interference &= \sum\_{i=1}^{|Y|-1} \sum\_{j=i+1}^{|Y|} \left| \prod\_{x}^{N} \psi(X\_{x}|Parents(X\_{x}), e, y = i) \right| \cdot \left| \prod\_{x}^{N} \psi(X\_{x}|Parents(X\_{x}), e, y = j) \right| \cdot \cos(\theta\_{i} - \theta\_{j}) \end{aligned} \tag{26}$$

In the Quantum Dynamical Model, since it uses unitary operators, the doubly stochastic property of these operators means that the computed values do not need to be normalized. In the proposed approach, on the other hand, since we do not have the constraints of doubly stochastic operators, we need to normalize the final scores in order to obtain a probability value. In classical Bayesian inference, normalization of the inference scores is also necessary due to assumptions made in Bayes rule. The normalization factor corresponds to α in Equation 26.

TABLE 6 | Table representation of a quantum full joint probability distribution.

Note that, in Equation 26, if one sets (θ<sub>i</sub> − θ<sub>j</sub>) to π/2, then cos(θ<sub>i</sub> − θ<sub>j</sub>) = 0, which means that the quantum Bayesian Network collapses to its classical counterpart. That is, the proposed Quantum-Like Bayesian Network can behave in a classical way if one sets the interference term to zero. Setting the angles to right angles means that all cosine similarities are 0 or 1. This transforms a continuous-valued system into a Boolean-valued system. Moreover, in Equation 26, if the Bayesian Network has N binary random variables, we end up with 2<sup>N</sup> free quantum θ parameters.
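The inference of Equation 26 and its classical limit can be sketched on a minimal network Y → X with one unobserved parent. All probabilities and phases are hypothetical, and the phases are supplied by hand here; no heuristic for choosing them is implied.

```python
import cmath
import math

# Hypothetical network Y -> X with binary variables.
p_y = {True: 0.5, False: 0.5}            # Pr(Y = y)
p_x_given_y = {True: 0.8, False: 0.3}    # Pr(X = True | Y = y)

def amplitude(prob, phase=0.0):
    """Amplitude whose squared magnitude is `prob`."""
    return math.sqrt(prob) * cmath.exp(1j * phase)

def quantum_like_query(theta):
    """theta[x]: phase difference between the Y=True and Y=False paths for X=x."""
    scores = {}
    for x in (True, False):
        total = 0j
        for y in (True, False):
            p_x = p_x_given_y[y] if x else 1.0 - p_x_given_y[y]
            phase = theta[x] if y else 0.0
            total += amplitude(p_y[y] * p_x, phase)   # amplitude of one path
        scores[x] = abs(total) ** 2                   # Born's rule on the summed paths
    alpha = 1.0 / sum(scores.values())                # normalization factor
    return {x: alpha * s for x, s in scores.items()}

classical = quantum_like_query({True: math.pi / 2, False: math.pi / 2})
interfering = quantum_like_query({True: 0.0, False: math.pi})
print(classical, interfering)
```

With both phase differences at π/2 the cosine terms vanish and the result matches the classical marginal Pr(X = True) = 0.55; other phase choices shift the probabilities, and α restores normalization.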

The proposed Bayesian Network leaves an open research question regarding the quantum θ parameters: how can one compute such parameters in order to obtain realistic inferences? By realistic, we mean inferences that match the probabilities of events observed in experiments. These probabilities are impossible to compute using exact Bayesian inference in experiments where the Sure Thing Principle is violated. In the next section, we answer this question by proposing a similarity heuristic that is able to compute the quantum θ parameters through vector similarities between beliefs/actions in superposition.

# 6.1. Representation of Beliefs/Actions

The superposition quantum vector, comprising all possible events, is given by the quantum full joint probability distribution already presented in Equation 24. The full joint probability distribution can be illustrated in table form, as presented in **Table 6**.

The quantum probability inference formula is composed of two parts: one representing the classical probability and the other representing the quantum interference term. The interference term performs a summation over several combinations of the entries of the full joint probability distribution in groups of two variables: PN−<sup>1</sup> i=1 P<sup>N</sup> j=i+1 |ψi | ψj  cos θ<sup>i</sup> − θ<sup>j</sup> . For each pair of variables, we will represent them as a 2-dimensional vector: one component represents the probability of ψ<sup>i</sup> and the other corresponds to ψ<sup>j</sup> . Moreover, the different probabilities represented in the full joint probability distribution table can be seen as the different beliefs/actions that one might have available before making a decision.

$$\mathbf{a}(\mathbf{X}=\mathbf{T}) = \begin{bmatrix} \left| \psi\_{i} \cdot e^{i\theta\_{i}} \right|^{2} \\ \left| \psi\_{j} \cdot e^{i\theta\_{j}} \right|^{2} \end{bmatrix} \qquad \mathbf{b}(\mathbf{X}=\mathbf{F}) = \begin{bmatrix} \left| \psi\_{i} \cdot e^{i\theta\_{i}} \right|^{2} \\ \left| \psi\_{j} \cdot e^{i\theta\_{j}} \right|^{2} \end{bmatrix} \tag{27}$$

We always have two vectors, because the proposed Quantum-Like Bayesian Network only supports binary random variables, that is, the query performed on the network corresponds to a yes or no answer. In other words, one vector corresponds to the probability of the query random variable returning a positive answer, and the other corresponds to the probability of the query random variable returning a negative one. In a geometric space, these vectors are represented as in **Figure 2**. From these two vectors, similarity measures like the angles between the vectors or the distances between them can be computed. These similarity measures will be addressed in more detail in Section 6.2.

One could ask why these feature vectors are represented by probabilities. In our model, the goal is to find a quantum parameter that can be used to compute quantum probability inferences. The only information that one has is the set of probability distributions of a given scenario, which are encoded in the Bayesian Network.

In quantum mechanics, quantum states are always represented by unit length vectors. Since the proposed model is inspired by quantum formalisms, one might wonder why the vectors are not unit length as well. There are two reasons for this choice. First, this representation of beliefs/actions as probabilities in feature vectors is not new, and it is a common practice in the literature (Osherson, 1995). Second, since our model is represented by a Bayesian Network and the vectors are extracted directly from the network (through the representation of the full joint probability distribution), we do not need to have unit length vectors. Instead, this normalization will be performed during the inference process through the computation of the normalization factor α.

In the end, the quantum interference term is obtained by building a different vector representation for each pair of variables in the summation (**Figure 3**). These vectors are essential, since they enable the calculation of the different quantum θ parameters.
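The pairwise construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `pairwise_vectors` and `interference_term` and the three-entry distribution are hypothetical, and the phases are supplied by hand rather than by the similarity heuristic of the next sections. The sum is computed exactly as written in the text, so any overall factor (such as the factor 2 that appears in the two-variable example of Section 6.5) is left to the caller.

```python
import math
from itertools import combinations


def pairwise_vectors(joint_probabilities):
    """Return (i, j, (p_i, p_j)) for every pair i < j of table entries."""
    return [
        (i, j, (joint_probabilities[i], joint_probabilities[j]))
        for i, j in combinations(range(len(joint_probabilities)), 2)
    ]


def interference_term(amplitudes, phases):
    """Sum |psi_i| * |psi_j| * cos(theta_i - theta_j) over all pairs i < j."""
    return sum(
        abs(amplitudes[i]) * abs(amplitudes[j]) * math.cos(phases[i] - phases[j])
        for i, j in combinations(range(len(amplitudes)), 2)
    )


# Hypothetical three-entry distribution: one 2-dimensional vector per pair.
for i, j, vector in pairwise_vectors([0.5, 0.3, 0.2]):
    print(f"pair ({i}, {j}) -> vector {vector}")

# Two amplitudes whose phases differ by a right angle: the term vanishes,
# which is the classical limit.
print(interference_term([0.6595, 0.6083], [math.pi / 2, 0.0]))

# The same amplitudes with a phase difference close to pi: destructive interference.
print(interference_term([0.6595, 0.6083], [2.7393, 0.0]))
```

The second call illustrates why a right-angle phase difference recovers the classical behavior, while the third shows a negative contribution of the kind needed to lower the final probability.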

# 6.2. Acquisition of Additional Information

It is important to note that, in the current literature, quantum parameters must be assigned manually in order to obtain a prediction. So, for different experiments, we will have disparate quantum parameters. For this reason, it is very hard to create a universal heuristic that can assign quantum parameters for different applications. In this work, we propose a heuristic that is able to perform accurate predictions for the several different experiments reported in the literature related to the Prisoner's Dilemma Game and the Two Stage Gambling Game.

FIGURE 3 | Illustration of the different 2-dimensional vectors that will be generated for each step of iteration during the computation of the quantum interference term.

The goal of this similarity heuristic is to determine an angle between the vectors **a** and **b** (Equation 27) that can be used as the θ parameter in Equation 26. Moreover, the difference between vectors **a** and **b** defines a third vector **c**, whose norm is the Euclidean distance between **a** and **b**. Equation 28 shows how to obtain the norm of vector **c** from vectors **a** and **b** (**Figure 2**). Additional information is gained by comparing the similarity between the two vectors. This new information allows one to infer hidden properties of a participant's beliefs/actions from visible ones. This vector representation is similar to the approach proposed in the work of Pothos et al. (2013), where the authors represent a person's beliefs/actions in an n-dimensional vector space and the similarity between the vectors is measured by a projection operator, which corresponds to the computation of the squared length of the projected vector. This is similar to our approach, since we are also computing a length, namely the distance between the vectors **a** and **b**.

$$\begin{aligned} ||\mathbf{c}|| &= \\ ||\mathbf{a} - \mathbf{b}|| &= \sqrt{\left(a\_1 - b\_1\right)^2 + \left(a\_2 - b\_2\right)^2 + \dots + \left(a\_n - b\_n\right)^2} \end{aligned} \tag{28}$$

Since we are interested in the angles that these vectors make with each other, we use the law of cosines to determine them. The law of cosines is given by Equations 29–31, where θ<sub>A</sub> corresponds to the angle between vectors **b** and **c**, θ<sub>B</sub> corresponds to the angle between vectors **a** and **c**, and θ<sub>C</sub> corresponds to the angle between vectors **a** and **b**. Since we know the coordinates of vectors **a** and **b**, one can also compute angle θ<sub>C</sub> using the cosine similarity measure: cos(θ<sub>C</sub>) = (**a** · **b**) / (||**a**|| · ||**b**||). However, since we only know the length of vector **c**, the remaining angles θ<sub>A</sub> and θ<sub>B</sub> must be obtained through the law of cosines.

$$\begin{split} ||\mathbf{a}||^2 &= ||\mathbf{b}||^2 + ||\mathbf{c}||^2 - 2 \cdot ||\mathbf{b}|| \cdot ||\mathbf{c}|| \cdot \cos\left(\theta\_A\right) \Leftrightarrow \theta\_A \\ &= \cos^{-1}\left(\frac{||\mathbf{b}||^2 + ||\mathbf{c}||^2 - ||\mathbf{a}||^2}{2 \cdot ||\mathbf{b}|| \cdot ||\mathbf{c}||}\right) \end{split} \tag{29}$$

$$\begin{split} ||\mathbf{b}||^2 &= ||\mathbf{a}||^2 + ||\mathbf{c}||^2 - 2 \cdot ||\mathbf{a}|| \cdot ||\mathbf{c}|| \cdot \cos \left( \theta\_B \right) \Leftrightarrow \theta\_B \\ &= \cos^{-1} \left( \frac{||\mathbf{a}||^2 + ||\mathbf{c}||^2 - ||\mathbf{b}||^2}{2 \cdot ||\mathbf{a}|| \cdot ||\mathbf{c}||} \right) \end{split} \tag{30}$$

$$\begin{split} ||\mathbf{c}||^2 &= ||\mathbf{a}||^2 + ||\mathbf{b}||^2 - 2 \cdot ||\mathbf{a}|| \cdot ||\mathbf{b}|| \cdot \cos\left(\theta\_C\right) \Leftrightarrow \theta\_C \\ &= \cos^{-1}\left(\frac{||\mathbf{a}||^2 + ||\mathbf{b}||^2 - ||\mathbf{c}||^2}{2 \cdot ||\mathbf{a}|| \cdot ||\mathbf{b}||}\right) \end{split} \tag{31}$$
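As a minimal sketch of Equations 28–31, the following Python code computes the three inner angles of the triangle formed by **a**, **b** and **c** = **a** − **b**. The helper names (`norm`, `clamped_acos`, `triangle_angles`) are illustrative choices, not part of the original model description, and the clamping step is an added numerical safeguard. The input vectors are the feature vectors that appear later in Equation 35, so the printed angles can be compared with Equations 38–40.

```python
import math


def norm(v):
    """Euclidean norm of a vector given as a sequence of floats."""
    return math.sqrt(sum(x * x for x in v))


def clamped_acos(x):
    """acos that tolerates tiny floating-point overshoots outside [-1, 1]."""
    return math.acos(max(-1.0, min(1.0, x)))


def triangle_angles(a, b):
    """Inner angles (theta_A, theta_B, theta_C) of the triangle a, b, c = a - b."""
    na, nb = norm(a), norm(b)
    nc = norm([x - y for x, y in zip(a, b)])  # Equation 28
    theta_a = clamped_acos((nb**2 + nc**2 - na**2) / (2 * nb * nc))  # Equation 29
    theta_b = clamped_acos((na**2 + nc**2 - nb**2) / (2 * na * nc))  # Equation 30
    theta_c = clamped_acos((na**2 + nb**2 - nc**2) / (2 * na * nb))  # Equation 31
    return theta_a, theta_b, theta_c


# Feature vectors taken from Equation 35 (Defect and Cooperate).
a = [0.435, 0.370]
b = [0.065, 0.130]

theta_a, theta_b, theta_c = triangle_angles(a, b)
cosine_similarity = sum(x * y for x, y in zip(a, b)) / (norm(a) * norm(b))

print(f"theta_A = {theta_a:.4f}, theta_B = {theta_b:.4f}, theta_C = {theta_c:.4f}")
print(f"sum of angles = {theta_a + theta_b + theta_c:.4f}")
print(f"theta_C from cosine similarity = {math.acos(cosine_similarity):.4f}")
```

The last line checks that the cosine similarity and the law of cosines give the same θ<sub>C</sub>, as stated in the text.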

# 6.3. Definition of the Similarity Heuristic

Violations of the Sure Thing Principle imply a decrease in the final probability values when compared to the classical theory. This suggests that we need to force the quantum parameters to produce a destructive interference effect. This can be obtained by setting the quantum parameter to π (which is the angle that provides the smallest cosine value). The additional information that we incorporated in **Figure 2**, namely the Euclidean distance between vectors and their similarities, is translated into a triangle. This shape has the well-known property that its inner angles must sum to 180° or π radians. Moreover, we would like to have a destructive interference effect that takes into account the similarity of the original vectors. Equation 32 shows how one can obtain this relationship.

$$
\theta\_A + \theta\_B + \theta\_C = \pi \iff \pi - \theta\_C = \theta\_A + \theta\_B \tag{32}
$$

$$
\Leftrightarrow \pi - \frac{\theta\_C}{2} = \frac{\theta\_A + \theta\_B + \pi}{2}
$$

When the angle between the vectors is very small, that is, when θ<sub>C</sub> is close to zero, we can add a third relationship:

$$
\theta\_A + \theta\_B + \theta\_C = \pi \;\Rightarrow\; \pi \approx \theta\_A + \theta\_B
$$

In this sense, we can formulate the general formula of the proposed similarity heuristic:

$$h\left(a,\ b\right) = \begin{cases} \pi & \text{if } \phi < 0\\ \pi - \theta\_C/2 & \text{if } \phi > 0.2\\ \pi - \theta\_C & \text{otherwise} \end{cases} \tag{33}$$

We also define a similarity measure φ, which is given by a ratio between the angles that the vectors make with each other. In other words, it represents the similarity between the additional information found by manipulating the original vectors and is given by Equation 34.

$$
\phi = \frac{\theta\_C}{\theta\_A} - \frac{\theta\_B}{\theta\_A} \tag{34}
$$
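Equations 33 and 34 translate directly into a small function. The sketch below is illustrative only: the names `similarity_ratio` and `similarity_heuristic` are hypothetical, and the function takes the three triangle angles as input rather than the vectors themselves. The example call uses the angle values reported later in Equations 38–40.

```python
import math


def similarity_ratio(theta_a, theta_b, theta_c):
    """Equation 34: phi = theta_C / theta_A - theta_B / theta_A."""
    return theta_c / theta_a - theta_b / theta_a


def similarity_heuristic(theta_a, theta_b, theta_c):
    """Equation 33: map the triangle angles to a quantum theta parameter."""
    phi = similarity_ratio(theta_a, theta_b, theta_c)
    if phi < 0:
        return math.pi
    if phi > 0.2:
        return math.pi - theta_c / 2
    return math.pi - theta_c


# Angles reported in Equations 38-40 of the worked example.
A, B, C = 2.6102, 0.1294, 0.4023
phi = similarity_ratio(A, B, C)
theta = similarity_heuristic(A, B, C)
print(f"phi = {phi:.4f}, theta = {theta:.4f}")
```

With these angles, φ falls between 0 and 0.2, so the third branch of Equation 33 applies and the result agrees with Equation 41.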

The thresholds shown in the proposed similarity heuristic were obtained by observing the data from several experiments violating the Sure Thing Principle. These include several experiments in the literature on the Prisoner's Dilemma Game and the Two Stage Gambling Game. Yukalov and Sornette (2011) followed a similar procedure. They analyzed the experiments violating the Sure Thing Principle and came up with a static interference term (the Interference Quarter Law) that allows them to apply their model without knowing a priori the outcome of a specific experiment. The proposed model works under similar conditions. We analyzed several experiments from the literature on different games and mapped the trends of the data into a dynamic heuristic. So, in the end, the proposed model works under rules that enable a dynamic behavior (after all, each experiment is unique, so there should be the freedom of different quantum interferences) and that also allow the model to be applied without a priori knowledge of a specific experiment.

In quantum mechanics, the θ parameter corresponds to the phase of a wave. When representing a quantum state in a Hilbert space, this phase is given by the inner product between two quantum states (Busemeyer and Bruza, 2012). The proposed similarity heuristic is motivated by the same idea. For two vectors representing a person's belief/action, we find the angle (or, in this case, the combination of angles) that can lead to the observed probabilities for the Prisoner's Dilemma game and for the Two Stage Gambling game.

# 6.4. Summary of the Proposed Model

The proposed model is built on observed data to perform quantum probabilistic inferences. We use a similarity heuristic, which relies on the data of the Bayesian Network to indicate the parameters that will allow us to perform quantum probabilistic inferences. One should keep in mind that this function is a heuristic: it generally provides good results in many situations (in this case, the Two Stage Gambling game and the Prisoner's Dilemma game), but at the cost of occasionally giving inaccurate results (Shah and Oppenheimer, 2008).

In sum, the proposed model works as follows:


all entries of the distribution that have the assignment of the query variable set to true.

• After knowing the similarities that the vectors share between them, we can apply the proposed similarity heuristic given in Equation 33 to obtain a θ parameter that enables the computation of the final probability value of the query.

One might think that we use two of the three data points directly in the model (known Defect and known Collaborate) and then use one free parameter to account for the remaining data point (the probability of Defection in the unknown condition). However, this is not what we claim in this work. As already mentioned, this work is a nonparametric method for estimating inference effects from a statistical point of view. It is a statistical model that is simpler than the previous quantum dynamic and quantum-like models proposed in the literature. Again, this work is not about fitting through simulation methods. We are simply providing a Bayesian Network structure that enables a simple representation of more complex decision-making scenarios, together with a similarity heuristic (which results from algebraic manipulations) that assigns values to quantum parameters in a way that provides accurate predictions (that is, it can represent the data accurately).

In the next sections, we will present a full example of how the proposed Quantum-Like Bayesian Network can be applied (Section 6.5). We will also present experimental results of the proposed model applied to several works of the literature concerned with the Prisoner's Dilemma game (Section 7.1) and the Two Stage Gambling game (Section 7.2).

# 6.5. Example of Application of the Proposed Model

In this section, we will demonstrate how the proposed Bayesian Network can be applied to the average results presented in **Table 1** for the Prisoner's Dilemma game. The proposed Quantum-Like Bayesian Network can be summarized in the following steps:

• **Step 1: Create a Bayesian Network Representation of the Problem:** In the Prisoner's dilemma game, if nothing is told to the participants, then there is a 50% chance of the first participant choosing to Defect or Cooperate. The decision of the first participant is then followed by the decision of the second participant. A Bayesian Network representation of this problem is illustrated in **Figure 4**.

TABLE 7 | Quantum full joint probability distribution representation of the Bayesian Network in Figure 4.


• **Step 2: Compute the Vectors associated to each action.** Since we want to determine the Pr(P2 = Defect), this probability will be given by the quantum full joint probability distribution, which is represented in **Table 7**.

$$\mathbf{P} \mathbf{2}\_{Defect} = \begin{bmatrix} \left| 0.6595 \cdot e^{i \cdot \theta\_A} \right|^2 \\ \left| 0.6083 \cdot e^{i \cdot \theta\_C} \right|^2 \end{bmatrix} = \begin{bmatrix} 0.435 \\ 0.370 \end{bmatrix}$$

$$\mathbf{P} \mathbf{2}\_{Cooperate} = \begin{bmatrix} \left| 0.2550 \cdot e^{i \cdot \theta\_B} \right|^2 \\ \left| 0.3606 \cdot e^{i \cdot \theta\_D} \right|^2 \end{bmatrix} = \begin{bmatrix} 0.065 \\ 0.130 \end{bmatrix} \tag{35}$$

This way, one can build feature vectors using classical probabilities. For instance, the probability of Pr(P2 = Defect) is given by a 2-dimensional feature vector with entries: Pr(P1 = Defect) · Pr(P2 = Defect|P1 = Defect) and Pr(P1 = Cooperate) · Pr(P2 = Defect|P1 = Cooperate). The feature vector corresponding to the action Cooperate can be achieved in the same way (Equation 35).

• **Step 3: Determine the quantum parameters using the proposed similarity heuristic:** Since we only have two random variables, we only need to compute one θ parameter. This parameter can be obtained directly by first computing the Euclidean distance between **P2**<sub>Defect</sub> and **P2**<sub>Cooperate</sub>, and then computing the inner angles of the resulting triangle (**Figure 5**).

$$\begin{aligned} ||\mathbf{c}|| &= ||\mathbf{P2}\_{\text{Defect}} - \mathbf{P2}\_{\text{Cooperate}}|| \\ &= \sqrt{(0.435 - 0.065)^2 + (0.370 - 0.130)^2} = 0.4410 \end{aligned} \tag{36}$$

The norms of vectors **P2**<sub>Defect</sub> and **P2**<sub>Cooperate</sub> are given by:

$$||\mathbf{P2}\_{\text{Defect}}|| = \sqrt{0.435^2 + 0.370^2} = 0.5711$$

$$||\mathbf{P2}\_{\text{Cooperate}}|| = \sqrt{0.065^2 + 0.130^2} = 0.1453 \tag{37}$$

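The inner angles A, B and C of the triangle formed by vectors **P2**<sub>Defect</sub>, **P2**<sub>Cooperate</sub> and **c** (the angles θ<sub>A</sub>, θ<sub>B</sub> and θ<sub>C</sub> of Equations 29–31) can be computed from the law of cosines, as presented in Equations 38–40.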

$$A = \cos^{-1}\left(\frac{||\mathbf{P2}\_{\text{Cooperate}}||^2 - ||\mathbf{P2}\_{\text{Defect}}||^2 + ||\mathbf{c}||^2}{2 \cdot ||\mathbf{c}|| \cdot ||\mathbf{P2}\_{\text{Cooperate}}||}\right) = 2.6102 \tag{38}$$

$$B = \cos^{-1}\left(\frac{||\mathbf{P2}\_{\text{Defect}}||^2 - ||\mathbf{P2}\_{\text{Cooperate}}||^2 + ||\mathbf{c}||^2}{2 \cdot ||\mathbf{c}|| \cdot ||\mathbf{P2}\_{\text{Defect}}||}\right) = 0.1294 \tag{39}$$

$$C = \cos^{-1}\left(\frac{||\mathbf{P2}\_{\text{Defect}}||^2 + ||\mathbf{P2}\_{\text{Cooperate}}||^2 - ||\mathbf{c}||^2}{2 \cdot ||\mathbf{P2}\_{\text{Defect}}|| \cdot ||\mathbf{P2}\_{\text{Cooperate}}||}\right) = 0.4023 \tag{40}$$

Given that φ = θ<sub>C</sub>/θ<sub>A</sub> − θ<sub>B</sub>/θ<sub>A</sub> = 0.1046, the final quantum θ parameter can be computed by using the third condition of Equation 33:

$$
\theta = \pi - \theta\_C = \pi - 0.4023 = 2.7393 \tag{41}
$$

• **Step 4: Perform the Probabilistic Inference.** In order to compute Pr(P2 = Defect) we also need to compute the opposite probability, that is, Pr(P2 = Cooperate). Equation 42 represents quantum amplitudes through the symbol ψ. The subscripts D and C correspond to the actions Defect and Cooperate, respectively.

$$\begin{split} Pr(\text{P2} = \text{Defect}) &= \alpha \left[ \left| \psi\_{P2 = D|P1 = D} \right|^2 + \left| \psi\_{P2 = D|P1 = C} \right|^2 \right. \\ &\left. + 2 \cdot \left| \psi\_{P2 = D|P1 = D} \right| \cdot \left| \psi\_{P2 = D|P1 = C} \right| \cdot \cos \left( \theta \right) \right] \end{split} \tag{42}$$

$$\begin{aligned} Pr(\text{P2} = \text{Defect}) &= \alpha \left[ 0.5 \times 0.87 + 0.5 \times 0.74 \right. \\ &\left. + 2 \times \sqrt{0.5 \times 0.87} \times \sqrt{0.5 \times 0.74} \times \cos(2.7393) \right] \end{aligned} \tag{43}$$

Computing the probability of Pr(P2 = Cooperate) in the same way, we obtain:

$$\Pr(\text{P2} = \text{Defect}) = \alpha \cdot 0.0667$$

$$\Pr(\text{P2} = \text{Cooperate}) = \alpha \cdot 0.0258 \tag{44}$$

• **Step 5: Compute Normalization Factor and Final Probabilities.**

$$\alpha = \frac{1}{0.0667 + 0.0258} = \frac{1}{0.0925} = 10.8108 \tag{45}$$

The final probabilities are given by Equation 46. Note that in **Table 1**, the observed probability of a player choosing to Defect was 0.64. The proposed Bayesian Network estimated this probability to be approximately 0.72, which corresponds to a fit error of 12.63%.

$$\Pr(\text{P2} = \text{Defect}) = 0.7208 \quad \Pr(\text{P2} = \text{Cooperate}) = 0.2792\tag{46}$$
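Steps 4 and 5 above can be reproduced numerically with the short sketch below. It assumes the θ value of Equation 41 is already available and uses a hypothetical helper, `unnormalized_probability`, that applies the classical part plus the interference term to a two-entry feature vector; the variable names are illustrative only.

```python
import math


def unnormalized_probability(feature_vector, theta):
    """Classical part plus interference term for a two-entry feature vector."""
    p_i, p_j = feature_vector
    return p_i + p_j + 2 * math.sqrt(p_i) * math.sqrt(p_j) * math.cos(theta)


# Feature vectors of Equation 35, built from the classical probabilities.
defect = (0.5 * 0.87, 0.5 * 0.74)
cooperate = (0.5 * 0.13, 0.5 * 0.26)

theta = 2.7393  # quantum parameter from Equation 41

d = unnormalized_probability(defect, theta)      # Equation 43
c = unnormalized_probability(cooperate, theta)   # same formula for Cooperate
alpha = 1 / (d + c)                              # Equation 45

pr_defect = alpha * d
pr_cooperate = alpha * c
print(f"unnormalized: Defect = {d:.4f}, Cooperate = {c:.4f}")
print(f"alpha = {alpha:.4f}")
print(f"Pr(P2 = Defect) = {pr_defect:.4f}, Pr(P2 = Cooperate) = {pr_cooperate:.4f}")
```

Because the sketch keeps full floating-point precision instead of rounding the intermediate values to four decimals, α may differ slightly in the last digits from the value reported in Equation 45.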

# 7. EXPERIMENTAL RESULTS

Violations of the Sure Thing Principle are hard to verify in complex decision-making problems. For this reason, there is not much data available in the literature for validation purposes. So, in this work, we validate our model on several different experiments designed to detect violations of the Sure Thing Principle in the Prisoner's Dilemma Game (Section 7.1) and in the Two Stage Gambling game (Section 7.2).

# 7.1. Quantum Bayesian Network Applied to the Prisoner's Dilemma Game

In this section, we apply our model to predict the results obtained for the Prisoner's Dilemma game for several works in the literature.

It is common (and good) practice in cognitive science to compare the results of one's model to the results of leading comparable models. The fit error percentages that we present in the following sections would be much easier to interpret if there were other models to compare with. However, we cannot perform this comparison directly, because the current models in the literature only work for isolated experiments, as was shown for the Quantum Dynamical Model (Section 4.1) and the Quantum-Like Approach (Section 4.2). That is, each time there is a new experiment, the parameters of their respective models would need to be tuned manually in order to perform correct predictions. We propose a general and scalable framework that is able to perform predictions in several different settings with small fit errors.

In this sense, we modeled each result reported in **Table 1** with the proposed Bayesian Network and using the proposed similarity heuristic. We obtained the results that are presented in **Figure 6**.

For a more detailed analysis of **Figure 6**, **Table 8** shows the quantum θ parameters that were computed for each experiment and the quantum parameter that would be expected to achieve a 0% fit error. The fit error is a percentage value and was computed in the following way: (1 − computed\_probability / observed\_probability) × 100. In **Table 8**, the term computed\_probability corresponds to the column Pr(Defect) predicted and the term observed\_probability corresponds to the column observed\_probability.
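The fit error formula can be written as a one-line function. In the sketch below, the function name `fit_error_percentage` is an illustrative choice, and the interpretation of the sign is an assumption: the formula yields a negative value when the model overestimates the observed probability, whereas the text reports magnitudes. The two example calls use pairs of computed and observed probabilities quoted in this paper.

```python
def fit_error_percentage(computed_probability, observed_probability):
    """Fit error as defined in the text: (1 - computed / observed) * 100."""
    return (1 - computed_probability / observed_probability) * 100


# Worked example of Section 6.5: predicted 0.7208, observed 0.64.
example_error = fit_error_percentage(0.7208, 0.64)

# Lambdin and Burdsal (2007): minimum attainable 0.4593, observed 0.41.
lambdin_error = fit_error_percentage(0.4593, 0.41)

print(f"Section 6.5 example: {abs(example_error):.2f}%")
print(f"Lambdin and Burdsal (2007): {abs(lambdin_error):.2f}%")
```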

In **Table 8**, one can see that the proposed similarity heuristic was able to produce good approximations of the data. The dynamic heuristic made it possible to estimate different quantum interference effects for different decision problems. However, since it is a heuristic, it can sometimes lead to overestimations, which was the case in the work of Busemeyer et al. (2006a). These overestimations occur due to the sensitivity of the quantum parameters. That is, a small change in a quantum parameter can lead to a completely different probability value. This will be discussed in more depth in Section 7.1.2.

As one might have noticed, the work of Croson (1999) was not taken into account in the analysis of these results. We decided to analyse these results in the next section, because they have properties that differ from those of the remaining works. In Croson (1999), the participants were never told about the actions of the other player. The author asked the participants to first guess what action the other player chose and then make a decision. In another setting, participants were just asked to make a decision.

## 7.1.1. The Special Case of Croson's (1999) Experiments

For the work of Croson (1999), we used the results reported for the first two payoff matrices tested in that work and averaged them. When trying to compute the optimum quantum θ parameter that would lead to the computation of the probability with a 0% fit error, we could not find any. There was no possible parameter that could be obtained from the two feature vectors representing the probability of choosing either a Defect action or a Cooperate action.


TABLE 8 | Analysis of the quantum θ parameters computed for each work of the literature using the proposed similarity function.

Expected θ corresponds to the quantum parameter that leads to the observed probability value in the experiment. Computed θ corresponds to the quantum parameter computed with the proposed heuristic.

b corresponds to the average of all seven experiments reported.

TABLE 9 | Results for the two games reported in the work of Croson (1999) for the Prisoner's Dilemma Game for several conditions: when the action of the second player was guessed to be Defect (Guessed to Defect), when the action of the second player was guessed to be Cooperate (Guessed to Collaborate), and when the action of the second player was not known (Unknown).



Our first thought was that averaging the results could be the cause of this impossibility, because the averages are not the true probabilities of the events reported. So, we decided to analyse the outcome of each experiment of the work of Croson (1999) individually. **Table 9** specifies those results.

We analyzed the individual results of **Table 9** and, again, we could not find any quantum θ parameter that would lead to the computation of probabilities with a 0% fit error. On the contrary, the minimum fit errors found were 64.89, 83.25, and 17.06% for Game 1, Game 2 and the Average of these games, respectively. **Figure 7** presents all possible probabilities that can be computed using the quantum law of total amplitude.

Analysing Game 1 (**Figure 7**, left), the probability that leads to the smallest fit error is obtained when both θ parameters are set to zero, with a probability of 0.4123. The observed probability reported in this experiment corresponds to 0.2250, leading to a computed fit error of 64.69%.

For Game 2 (**Figure 7**, center), when θ<sub>1</sub> = 0 and θ<sub>2</sub> = π, we obtain the probability that leads to the smallest fit error, which is 0.4390, with a fit error of 83.25%.

When computing the average of both games (**Figure 7**, right), the quantum θ parameters found were θ<sub>1</sub> = 0 and θ<sub>2</sub> = 0. This leads to a probability of 0.4947, corresponding to a fit error of 17.06%.

#### 7.1.2. Analysing Li and Taplin (2002) Experiments

**Table 10** specifies the results collected by Li and Taplin (2002), which corresponded to the average of the results obtained in seven different experiments for the Prisoner's Dilemma game. In this section we analyse each of these seven experiments, by trying to predict their outcome using the proposed Bayesian Network.

TABLE 10 | Experimental results reported in work of Li and Taplin (2002) for the Prisoner's Dilemma game for several conditions: when the action of the second player is known to be Defect (Known to Defect), when the action of the second player is known to be Cooperate (Known to Collaborate), and when the action of the second player was not known (Unknown).

The column Violations of STP corresponds to determining if the collected results are violating the Sure Thing Principle. The values in bold represent the experiments that are not violating the Sure Thing Principle.

The results reported in the experiments conducted by Li and Taplin (2002) are presented in **Table 10**. Note that Games 3, 6 and 7 do not violate the Sure Thing Principle, because the probability in the unknown condition lies between the probabilities in the two known conditions: Pr(Defect) ≥ Pr(Unknown) ≥ Pr(Cooperate) or Pr(Cooperate) ≥ Pr(Unknown) ≥ Pr(Defect). Additionally, the results reported for the unknown condition in Games 3, 6 and 7 are very close to those of classical probability theory. The goal of the study performed by Li and Taplin was to question whether there really were violations of the Sure Thing Principle in the Prisoner's Dilemma game. According to **Table 10**, three of the seven experiments did not show a violation and reported results very similar to those of classical probability theory.
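The check for a violation of the Sure Thing Principle can be sketched as a simple interval test, under the assumption that the classical law of total probability places the unknown-condition probability between the two known-condition probabilities. The function name and the second set of numbers are hypothetical; the first call uses the average values from the worked example of Section 6.5 (0.87, 0.74 and 0.64).

```python
def violates_sure_thing_principle(p_known_defect, p_known_cooperate, p_unknown):
    """Return True if the unknown-condition probability falls outside the
    interval spanned by the two known-condition probabilities."""
    low = min(p_known_defect, p_known_cooperate)
    high = max(p_known_defect, p_known_cooperate)
    return not (low <= p_unknown <= high)


# Average results used in Section 6.5: the unknown condition (0.64) is below both.
print(violates_sure_thing_principle(0.87, 0.74, 0.64))

# Hypothetical game in which the unknown condition lies between the known ones.
print(violates_sure_thing_principle(0.80, 0.60, 0.70))
```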

By applying the proposed quantum-like Bayesian Network to each game in **Table 10**, we obtained the results illustrated in **Figure 8**.

The experiments that achieved the highest fit error rates correspond to Games 2 and 6. Game 6 corresponds to a situation where the Sure Thing Principle was not being violated. This leads to the conclusion that the proposed Bayesian Network can also predict classical probabilities, but with some fit errors.

**Table 11** shows the quantum parameters that were computed and compares them with the parameters that would be expected in order to obtain the smallest fit error percentage. One thing worth mentioning in the computation of these quantum parameters is their sensitivity. Consider the row of **Table 11** addressing the results of Game 2. The difference between the expected quantum parameter and the one computed using the similarity heuristic is just 0.0322. However, this small difference introduced a fit error of almost 11.28% in the computation of the final probabilities. **Figure 9** illustrates the relation between the quantum θ parameter and the final probabilities that can be obtained in Li's Game 2, Game 6 and the work of Busemeyer et al. (2006a).

Small changes in the θ parameters can lead to completely different probability outcomes. This has some relation with deterministic chaos, in which small differences in initial conditions yield widely diverging outcomes in a system. This chaos suggests how difficult the task of predicting human decisions is and how random it can be (Sterman, 1989).
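The sensitivity to θ can be illustrated with a small sweep. Since the per-game data of Li and Taplin (2002) are not reproduced here, the sketch below uses the feature vectors of the worked example in Section 6.5 (Equation 35) as a stand-in; the function name `pr_defect` and the chosen grid of θ values are assumptions made for illustration.

```python
import math


def pr_defect(theta, defect=(0.435, 0.370), cooperate=(0.065, 0.130)):
    """Normalized Pr(P2 = Defect) for a given quantum parameter theta."""

    def unnormalized(v):
        return v[0] + v[1] + 2 * math.sqrt(v[0] * v[1]) * math.cos(theta)

    d, c = unnormalized(defect), unnormalized(cooperate)
    return d / (d + c)


# Sweep theta from the classical limit (pi / 2) to full destructive interference (pi).
for theta in (math.pi / 2, 2.5, 2.7393, 2.9, 3.0, math.pi):
    print(f"theta = {theta:.4f} -> Pr(P2 = Defect) = {pr_defect(theta):.4f}")
```

For these vectors, moving θ from 2.7393 to π lowers the normalized probability from about 0.72 to about 0.19, which shows how steep the curve becomes near π.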

# 7.2. Quantum Bayesian Network Applied to the Two Stage Gambling Game

For the Two Stage Gambling Game, the overall results showed very small fit errors. The highest fit error percentage achieved was 16.3% and corresponds to the work of Kuhberger et al. (2001). Once again, the work of Kuhberger et al. (2001) does not show a violation of the Sure Thing Principle, reinforcing the previous conclusion that the proposed quantum-like Bayesian Network works best in situations where this violation exists.

In what concerns the work of Lambdin and Burdsal (2007), the proposed Quantum-Like Bayesian Network could not make accurate predictions. **Figure 10** shows all possible probabilities that can be obtained by varying the quantum parameters. As one can see, the minimum value that we can obtain corresponds to 0.4593. However, the observed probability reported by Lambdin and Burdsal (2007) corresponds to 0.41. This leads to a fit error of 12.02%.

The entries highlighted correspond to games that are not violating the Sure Thing Principle. Expected θ corresponds to the quantum parameter that leads to the observed probability value in the experiment. Computed θ corresponds to the quantum parameter computed with the proposed heuristic.

FIGURE 9 | Possible probabilities that can be obtained in Game 2 of the work of Li and Taplin (2002) (left). Possible probabilities that can be obtained in Game 6 of the work of Li and Taplin (2002) (center). Possible probabilities that can be obtained in the work of Busemeyer et al. (2006a) (right).

In the work of Busemeyer et al. (2012), the authors applied the quantum dynamical model to reproduce the results obtained for the Two Stage Gambling Game and also explored the use of Hierarchical Bayesian methods to estimate the values of quantum parameters to simulate the player's personal profile: risk aversion, loss aversion, memory and choice. In the recent work of Busemeyer et al. (2015), the authors also compare the quantum model with a classical model using Bayes factor. They concluded that the quantum approach was preferred by the Bayes Factor.

# 7.3. Comparison with Other Works of the Literature

In this section, we compare the results obtained with the proposed Quantum-Like Bayesian Network with the Quantum Prospect Decision Theory (Yukalov and Sornette, 2011). Of all the analyzed models, this is the only one that can be called predictive due to its static heuristic: the Interference Quarter Law. We proposed a dynamic heuristic because every decision problem is different and, consequently, quantum interference effects should also be different and not static. In the Quantum Prospect Decision Theory, the quantum interference term is fixed by the Interference Quarter Law, that is, the quantum interference term in the law of total probability is fixed to 0.25.

In the current model, since each decision problem is different, the proposed heuristic computes a quantum θ parameter through the similarities between the vectors, and these vectors are constructed from the experimental data. So, the vectors take into account the properties of each experiment, making it possible to compute different quantum interference terms for different decision problems.

**Table 12** shows the results obtained for the Quantum Prospect Decision Theory and for the Quantum-Like Bayesian Network for the different works of the literature that tested violations to the Sure Thing Principle in the Prisoner's Dilemma Game and the Two Stage Gambling Game.

In the end, the results from **Table 12** demonstrate that, in general, the proposed Quantum-Like Bayesian Network together with the dynamic heuristic managed to fit the observed results in the several different experiments with an average fit error of 6.3%, whereas the Quantum Prospect Decision Theory achieved an average fit error of 16.51%.

One needs to take into account that in the Quantum Prospect Decision Theory and in the proposed Quantum-Like Bayesian Network, heuristics are used to estimate the quantum interference effects. This means that the heuristic can lead to a good fit of the data most of the time, but, in some cases, it can lead to completely wrong results. In the Quantum Prospect Decision Theory, for instance, one can see that the static Interference Quarter Law heuristic produced several estimations with large fit errors. The same applies to the proposed Quantum-Like Bayesian Network. The difference is that this last model makes use of a dynamic heuristic. **Table 12** shows that the proposed dynamic heuristic overestimated the results in the works of Busemeyer et al. (2006a) and Kuhberger et al. (2001). This also happens due to the sensitivity of the θ parameters already discussed in **Figure 9**.

We also applied the Quantum Prospect Decision Theory and the proposed Quantum-Like Bayesian Network to all experiments performed in the work of Li and Taplin (2002). **Table 13** again shows great discrepancies between the average fit error obtained with the static heuristic of the Quantum Prospect Decision Theory and the one obtained with the proposed dynamic heuristic. In general, the proposed model manages to fit all seven experiments with an average fit error of 6.41%, whereas the Quantum Prospect Decision Theory achieved an error of 24.23%. Most of the time, the Interference Quarter Law produced estimates lower than the results observed in the experiments. This shows that having a dynamic heuristic that is able to adapt to the different decision problems brings advantages in terms of predictive effectiveness.

# 8. DISCUSSION AND CONCLUSION

In this work, we proposed an alternative quantum structure to perform quantum probabilistic inferences to accommodate the paradoxical findings related to the Sure Thing Principle. We proposed a Quantum-Like Bayesian Network, which consists in replacing classical probabilities by quantum probability amplitudes. However, since this approach suffers from the problem of exponential growth of quantum parameters, we also proposed a similarity heuristic that automatically fits quantum parameters through vector similarities. This makes the proposed model general and predictive, in contrast to the current state-of-the-art models, which cannot be generalized to more complex decision scenarios and which only provide explanations for the observed paradoxes.

In Section 1.3, we established a set of research questions that we would like to address with the present research work. Their answers are detailed below.

1. Why do we need another quantum-like model to explain violations of the Sure Thing Principle?

Many of the models that have been proposed in the literature cannot be considered predictive. Most of these models require a set of quantum parameters to be fitted and, so far, the only way to fit these parameters has been to use the final outcome of the experiment, so that the model explains that same outcome. There is, however, one model in the literature that proposed a static heuristic to compute the quantum interference effects and can be called predictive. This model is the Quantum Prospect Decision Theory, proposed by Yukalov and Sornette (2011).

TABLE 12 | Comparison between the Quantum Prospect Decision Theory (QPDT) model and the proposed Quantum-Like Bayesian Network (QLBN) for different works of the literature reporting violations to the Sure Thing Principle.

b corresponds to the average of all seven experiments reported. The values in bold represent the models that obtained the lowest Fit error.

TABLE 13 | Comparison between the Quantum Prospect Decision Theory (QPDT) model and the proposed Quantum-Like Bayesian Network (QLBN) for all the different experiments performed in the work of Li and Taplin (2002).

The values in bold represent the models that obtained the lowest Fit error.

2. What is the advantage of the proposed approach? How can it make a difference with respect to the well-established quantum models that have been proposed in the literature?

Since each decision problem is different, we believe that a quantum decision model would benefit from a dynamic heuristic that could take into account the decision problem's settings and come up with estimations for the quantum interference parameters. In the proposed model, quantum parameters are found based on the correlations that the vectors share between them. These correlations are explored through vector similarities that are computed using the Law of Cosines in a vector space. In this sense, we suggest that the quantum parameters that arise from interference effects might represent some degree of similarity between events. The previous work of Moreira and Wichert (2015) points out this semantic relation between vectors. In the end, the proposed model can be seen as a nonparametric method for estimating inference effects from a statistical point of view. It is a statistical model that is simpler than the previous Quantum Dynamical Model (Pothos and Busemeyer, 2009) and Quantum-Like Approach (Khrennikov, 2010) models proposed in the literature. The method makes use of the principles of Bayesian Networks, in order to obtain a more general and scalable model that can produce competitive results over the current state-of-the-art models.
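The Law of Cosines step mentioned above can be illustrated with a minimal sketch. It only shows how the angle between two vectors follows from their lengths and the length of their difference; the function name `angle_between` and the example vectors are hypothetical, and the sketch does not reproduce the authors' heuristic for turning a similarity into an interference parameter θ.

```python
import math

def angle_between(a, b):
    """Angle (radians) between vectors a and b via the Law of Cosines."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    # Third side of the triangle spanned by a, b and a - b.
    norm_diff = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    cos_theta = (norm_a**2 + norm_b**2 - norm_diff**2) / (2 * norm_a * norm_b)
    # Clamp against rounding error before taking the arc cosine.
    return math.acos(max(-1.0, min(1.0, cos_theta)))

# Hypothetical vectors, chosen only so that the angle is easy to check by hand.
theta = angle_between([1.0, 0.0], [1.0, 1.0])
print(math.degrees(theta))
```

For these two vectors the angle is 45 degrees, which matches the value obtained from the usual dot-product formula.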

Experimental data demonstrated that the proposed heuristic managed to produce accurate fits to the data, outperforming the previously proposed Quantum Prospect Decision Theory. This suggests that taking into account a dynamic estimation of quantum parameters is a good direction to build quantum-like predictive models.

# ACKNOWLEDGMENTS

This work was supported by national funds through Fundação para a Ciência e a Tecnologia (FCT) with reference UID/CEC/50021/2013 and through the PhD grant SFRH/BD/92391/2013. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Moreira and Wichert. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Progress and current challenges with the quantum similarity model

#### *Emmanuel M. Pothos<sup>1</sup>, Albert Barque-Duran<sup>1</sup>\*, James M. Yearsley<sup>1</sup>, Jennifer S. Trueblood<sup>2</sup>, Jerome R. Busemeyer<sup>3</sup> and James A. Hampton<sup>1</sup>*

*<sup>1</sup> Department of Psychology, City University London, London, UK*

*<sup>2</sup> Department of Cognitive Sciences, University of California, Irvine, Irvine, CA, USA*

*<sup>3</sup> Department of Psychological and Brain Sciences, Indiana University, Bloomington, IN, USA*

*\*Correspondence: albert.barque-duran@city.ac.uk*

#### *Edited by:*

*Sandro Sozzo, University of Leicester, UK*

#### *Reviewed by:*

*Kirsty Kitto, Queensland University of Technology, Australia; Ariane Lambert-Mogiliansky, Paris School of Economics, France; Reinhard Blutner, University of Amsterdam, Netherlands (retired); Thomas Filk, University of Freiburg, Germany*

#### **Keywords: similarity, similarity judgment, quantum probability theory, metric axioms, symmetry, diagnosticity**

This opinion paper reviews progress with the quantum similarity model (QSM), which was proposed by Pothos et al. (2013). In the QSM, concepts are associated with subspaces, the mental state is a state vector in a Hilbert space, and similarity between two concepts is computed in terms of the sequential projection between the corresponding subspaces. If there is a relevant context, this is incorporated as prior projections (e.g., **Figure 1**).

The QSM was developed as a way to primarily cover the empirical findings of Tversky (1977). Tversky (1977) reported a series of results for (mostly) simple (nonanalogical, see below) pairwise similarity judgments. Tversky's (1977) research severely challenged the dominant similarity models at that time, based on metric spaces and distances. Such models are constrained to obey the metric axioms (as long as similarities are simple functions of distances). Yet, in his seminal work, Tversky reported violations of all three metric axioms (minimality, symmetry, triangle inequality), in the similarity judgments of naïve observers. Moreover, Tversky reported a so-called diagnosticity effect, where the same similarity judgment could change greatly, depending on which other stimuli were present in a (broadly) relevant context set.

All of Tversky's (1977) findings reveal intuitions about human similarity that are, initially at least, very surprising. For example, how can it be possible that the similarity between (simplifying his example) China and Korea be less than Korea and China? Yet, some thought shows that we indeed prefer to judge a nonprominent object (e.g., Korea) as more similar to a prominent one (e.g., China), as compared to the reverse order. Equally, how can it be possible that Austria is seen as more similar to Sweden than to Hungary in the context of Poland, but more similar to Hungary than to Sweden in the context of Norway?

Tversky's findings have been a major focus of subsequent theoretical work on similarity judgments. Some of the most prominent models are the distance-density model (Krumhansl, 1978), general recognition theory (Ashby and Perrin, 1988), and the generalized context model (Nosofsky, 1988; this is a theory of categorization, rather than similarity, yet Nosofsky considered in his influential work how to accommodate Tversky's findings as well, e.g., Nosofsky, 1991). Limited space prevents us from a detailed analysis of this work. Overall, we think that while such work has provided many excellent intuitions regarding human similarity, its application to Tversky's (1977) findings is not uniformly satisfactory. This was a consideration that in part motivated the QSM.

Another motivating consideration has been the recently proposed model for the conjunction fallacy, based on quantum theory (Busemeyer et al., 2011). The conjunction fallacy is a famous result in decision making, whereby naïve observers judge a hypothetical person, Linda, to be more likely to be both a bank teller and a feminist, than just a bank teller (Tversky and Kahneman, 1983). Of course, such a result is paradoxical, if one employs the rules of classical probability theory. Tversky and Kahneman (1983) suggested that naïve observers in their experiment employed a so-called representativeness heuristic, judging Linda to be more similar to the category of bank tellers and feminists than to that of bank tellers alone. Thus, at the heart of the explanation for the conjunction fallacy is the idea that participants employ a similarity process (see also Shafir et al., 1990, for further validations of this idea). The quantum model for the conjunction fallacy indeed reflects operations that involve the overlap of a state vector (representing the mental state of participants) and subspaces (which correspond to different concepts in the participants' knowledge space, e.g., the idea that a woman can be a feminist; cf. Sloman, 1993). Thus, we were interested in whether the quantum model for the conjunction fallacy could be extended, more or less as it is, to function as a model of some aspects of similarity. This was indeed the approach that was adopted by Pothos et al. (2013) and the QSM is structurally and procedurally nearly entirely equivalent to Busemeyer et al.'s (2011) model for the conjunction fallacy. That the same principles can provide a route for explaining both aspects of decision making and similarity enables the exciting possibility that a formal unification may be possible between these two seemingly disparate aspects of cognition.

One emphasis of the QSM has been the demonstration of asymmetries in similarity judgments. In the QSM this arises in part because concepts are represented as subspaces. Note that the use of subspaces as such is not a uniquely quantum feature of the QSM, but the lack of commutativity in the projection sequence (which contributes to the emergence of asymmetries) is. Subspaces can have rich inner structure, corresponding e.g., to the characteristics of a concept. Thus, concepts for which we have more knowledge (such as China, if we imagine ourselves in the shoes of Tversky's participants in 1977) will be represented by a higher dimensionality subspace, contrasting with concepts for which we have less knowledge (such as Korea). Together with an assumption that the mental state prior to a (simple) similarity comparison is neutral between the two concepts to be compared, this enables a natural emergence of asymmetries in human similarity judgments, in the predicted direction. More generally, conceptually, we think that representations as subspaces are an important advance. This is because representations in the QSM can have inner structure, not just in terms of a list of characteristics, but also in terms of how the characteristics relate to each other. By contrast, in traditional spatial representations, with concepts being represented as points or vectors, there is no possibility of such structure at all. This would be the case even in e.g., Latent Semantic Analysis approaches to representation, which have proved extremely useful and influential (e.g., Dumais, 2004; see also Kintsch, 2014, for an insightful comparison between the QSM and Latent Semantic Analysis; note that in Kintsch's (2014) approach, vectors are given variable length, and this can capture differences in degree of knowledge). But even in Tversky's (1977) feature-based approach, concepts would be lists of features, and Tversky (1977) did not consider how dependencies among features could be incorporated in his model.
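The way subspace dimensionality and a neutral initial state produce an asymmetry can be made concrete with a small numerical sketch. It assumes, for illustration only, that the similarity of A to B is the squared length of the state after projecting first onto A and then onto B. The three-dimensional space, the particular subspaces for "China" (two-dimensional) and "Korea" (one-dimensional), the state vector, and the helper names `projector` and `similarity` are all hypothetical choices, not values taken from the QSM papers.

```python
import numpy as np

def projector(basis):
    """Orthogonal projector onto the span of the given (row) vectors."""
    q, _ = np.linalg.qr(np.asarray(basis, dtype=float).T)
    return q @ q.T

def similarity(state, first, second):
    """Squared length of the state after projecting onto `first`, then `second`."""
    return float(np.linalg.norm(second @ first @ state) ** 2)

# Hypothetical representations: more knowledge means a higher-dimensional subspace.
china = projector([[1, 0, 0], [0, 1, 0]])  # two-dimensional subspace
korea = projector([[1, 0, 1]])             # one-dimensional subspace (a ray)

# State chosen so that it is neutral: equal squared projection onto both concepts.
psi = np.ones(3) / np.sqrt(3)
neutral_gap = np.linalg.norm(china @ psi) ** 2 - np.linalg.norm(korea @ psi) ** 2

sim_korea_to_china = similarity(psi, korea, china)
sim_china_to_korea = similarity(psi, china, korea)
print(neutral_gap, sim_korea_to_china, sim_china_to_korea)
```

With these choices the Korea-then-China sequence retains more of the state vector than the China-then-Korea sequence, which is the direction of asymmetry described in the text. The two projectors do not commute, so the order of projection matters.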

The way violations of the triangle inequality arise in the QSM is very similar to how Tversky (1977) suggested such effects arise. Because in the QSM representations are subspaces, different regions in the overall space end up reflecting the features characteristic of the corresponding concepts. So, for example, imagine a region in the overall space with Russia and Cuba. This region will overall reflect the property of communism, noting that both Russia and Cuba are consistent with this property (thinking again as participants in Tversky's experiment in 1977). Then, imagine a region different to the first one containing Cuba and Jamaica. The shared characteristic of Cuba and Jamaica is their geographical proximity (they are both in the Caribbean), so this second region will likewise correspond to this property. It should be hopefully straightforward to then see how, if Cuba is on the boundary of the communism and Caribbean regions in psychological space, we can have Cuba highly similar to Russia, Cuba highly similar to Jamaica, but Russia and Jamaica dissimilar from each other, thus violating the triangle inequality. It has to be noted, however, that the triangle inequality is not a challenge for standard (non-linear) distance-based models of similarity. This is because the triangle inequality is already violated if one relates distance and similarity, via a non-linear function (such as the standard exponentially decaying function; Nosofsky, 1984; Shepard, 1987). Nevertheless, it is clearly important for a similarity model to cover violations of the triangle inequality in a convincing manner. Note, violations of the triangle inequality have been the focus of an alternative similarity model, based on quantum theory (Aerts et al., 2011).

A great focus for further work with the QSM concerns the diagnosticity effect. This is because the diagnosticity effect has proved difficult to replicate (e.g., see Evers and Lakens, 2014). We are interested in exploring whether the QSM model can provide insight into why the diagnosticity effect has proved elusive in its replicability. The diagnosticity effect is also significant because the quantum formalism, overall, is often said to embody strong contextual influences. So, perhaps, quantum theory would be particularly suitable for modeling context effects in similarity judgments? Well, the diagnosticity effect does emerge fairly naturally from the QSM, but the mechanisms that allow this are not the traditional contextual mechanisms in quantum theory (e.g., relating to entanglement or incompatibility). In the QSM, the contextual influences relevant to the diagnosticity effect emerge from the way prior projections are used to capture sensitivity to the grouping of context elements. In other words, the difficulty lies in the fact that contextual influences in similarity specifically depend on the degree of grouping of some of the options in the relevant choice set. For example, in Tversky's (1977) demonstration, participants were asked to decide which country is most similar to Austria, between Sweden, Hungary, and Poland. More participants chose Sweden, but when the choice set included Sweden, Hungary, and Norway, they chose Hungary. What we might call the "traditional" mechanisms for contextual influences in quantum theory are not sensitive to the similarity structure of the relevant options.

Contextual influences in the QSM arise in the following way. Similarity computations are based on projecting (laying down) the state vector (which represents the current mental state) onto different subspaces (which represent the concepts relevant in the similarity task; **Figure 1**). This projection operation can be highly order dependent in quantum theory. Of relevance, the outcome of a projection sequence is sensitive to the grouping of the subspaces across which projection takes place. If the subspaces are grouped together, then a projection sequence preserves the length of the state vector and vice versa. Thus, to account for the diagnosticity effect in the QSM, we postulated that, in a forced choice task (such as the one employed by Tversky, 1977, in his diagnosticity formulation), prior to the projections corresponding to the elements in the similarity judgment, there would be projections corresponding to the other elements in the choice set. So, for example, if a participant is considering which of Sweden, Hungary, and Poland is most similar to Austria, and is specifically evaluating the option of Sweden, then the similarity comparison would consist of projections from Sweden to Austria, but also there would be prior projections to Hungary and Poland. Using this scheme, with fairly minimal assumptions about the representation of the relevant stimuli, the diagnosticity effect emerges from the QSM.
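The sensitivity of a projection sequence to the grouping of the context subspaces can be sketched numerically. The example below uses rays in a two-dimensional real space; the angles, the order of the prior projections, and the helper names `ray` and `squared_length_after` are illustrative assumptions and are not fitted to Tversky's stimuli.

```python
import numpy as np

def ray(angle_deg):
    """Projector onto the one-dimensional subspace at the given angle."""
    a = np.deg2rad(angle_deg)
    v = np.array([np.cos(a), np.sin(a)])
    return np.outer(v, v)

def squared_length_after(state, projectors):
    """Apply the projectors in order and return the squared length of the result."""
    for p in projectors:
        state = p @ state
    return float(state @ state)

psi = np.array([1.0, 0.0])            # hypothetical initial mental state
option, target = ray(40), ray(50)     # the option being evaluated and the target concept

grouped_context = [ray(10), ray(20)]    # context elements close to each other
dispersed_context = [ray(10), ray(80)]  # context elements far apart

with_grouped = squared_length_after(psi, grouped_context + [option, target])
with_dispersed = squared_length_after(psi, dispersed_context + [option, target])
print(with_grouped, with_dispersed)
```

The same final comparison (option, then target) yields a much longer residual state vector when the prior context projections are grouped than when they are dispersed, which is the mechanism the QSM relies on for the diagnosticity effect.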

One important challenge in further developing the QSM is further formalizing the way contextual influences are taken into account. The idea of incorporating context as prior projections works well, but it has a heuristic feel to it. Can the QSM be extended such that these prior projections can be motivated in a more rigorous way (cf. Lambert-Mogiliansky et al.'s, 2009, quantum model of framing effects)? Moreover, as noted, can the QSM generate any new predictions regarding the emergence or suppression of the diagnosticity effect? Since Tversky's (1977) work, there has not really been much further examination (or little that has reached the journals), which is surprising (in the sense that the idea of context in similarity judgments seems like a vast topic). These questions are an important focus for our current work with the QSM.

Another important focus concerns so-called analogical similarity judgments (e.g., Goldstone, 1994; Gentner and Markman, 1997). Analogical similarity refers to the idea that, for example, if we are comparing two people, Jim and Jack, if they both have black hair, this will increase their similarity, but if Jim has black hair and Jack has black shoes (and blond hair), this will have less impact on their similarity. That is, work on analogical similarity recognizes that objects often consist of separate components. Commonalities on matching components (e.g., black hair) increase similarity more so than commonalities on mismatching components (e.g., black hair and black shoes). It is currently unclear whether there is a genuine distinction between cognitive processing corresponding to basic similarity tasks (as in Tversky, 1977) and analogical similarity ones (some researchers have suggested that different cognitive systems may mediate the two types of judgments; Casale et al., 2012). Nevertheless, there have been largely separate corresponding literatures, with different objectives. We think that the QSM can be extended to incorporate analogical similarity, because quantum theory already has extensive machinery in place for combining individual components into a whole (cf. Smolensky, 1990). We have been pursuing an approach based on tensor products and we are optimistic that a concrete proposal will be forthcoming soon (Pothos and Trueblood, 2015).

Finally, the QSM is only part of a broader effort within the quantum cognition community to understand similarity using quantum processes. A more challenging, though important objective, would be to examine the formal relation between QSM and, for example, Aerts's (2009) model for conceptual combination or Lambert-Mogiliansky et al.'s (2009) model of framing effects.

### **ACKNOWLEDGMENTS**

EMP was supported by Leverhulme Trust grant RPG-2013-004 and JRB by NSF grant ECCS-1002188. EMP and JRB were supported by Air Force Office of Scientific Research (AFOSR), Air Force Material Command, USAF, grants FA 8655-13-1-3044 and FA 9550-12-1-0397, respectively. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purpose notwithstanding any copyright notation thereon. JST was supported by NSF grant SES-1326275.

### **REFERENCES**


Shepard, R. N. (1987). Toward a universal law of generalization for psychological science. *Science* 237, 1317–1323. doi: 10.1126/science.3629243

Sloman, S. A. (1993). Feature-based induction. *Cogn. Psychol.* 25, 231–280. doi: 10.1006/cogp.1993.1006

Smolensky, P. (1990). Tensor product variable binding and the representation of symbolic structures in connectionist networks. *Artif. Intell.* 46, 159–216. doi: 10.1016/0004-3702(90)90007-M


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 18 November 2014; accepted: 10 February 2015; published online: 25 February 2015.*

*Citation: Pothos EM, Barque-Duran A, Yearsley JM, Trueblood JS, Busemeyer JR and Hampton JA (2015) Progress and current challenges with the quantum similarity model. Front. Psychol. 6:205. doi: 10.3389/fpsyg.2015.00205*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology.*

*Copyright © 2015 Pothos, Barque-Duran, Yearsley, Trueblood, Busemeyer and Hampton. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# A macroscopic violation of no-signaling in time inequalities? How to test temporal entanglement with behavioral observables

#### *Patrizio E. Tressoldi<sup>1</sup>\*, Markus A. Maier<sup>2</sup>, Vanessa L. Buechner<sup>2</sup> and Andrei Khrennikov<sup>3</sup>*

*<sup>1</sup> Dipartimento di Psicologia Generale, Università di Padova, Padova, Italy, <sup>2</sup> Psychology Department, Ludwig Maximilian University of Munich, Munich, Germany, <sup>3</sup> International Center for Mathematical Modeling in Physics, Engineering, Economics, and Cognitive Science, Linnaeus University, Växjö-Kalmar, Sweden*

In this paper we applied for the first time the no-signaling in time (*NSIT*) formalism discussed by Kofler and Brukner (2013) to investigate temporal entanglement between binary human behavioral unconscious choices at t1 and binary random outcomes at t2. *NSIT* consists of a set of inequalities and represents mathematical conditions for macro-realism which require only two measurements in time. The analyses of three independent experiments show a strong violation of *NSIT* in two out of three of them, suggesting the hypothesis of a quantum-like temporal entanglement between human choices at t1 and binary random outcomes at t2. We discuss the potential of using *NSIT* to test temporal entanglement with behavioral measures.

#### *Edited by:*

*Liane Gabora, University of British Columbia, Canada*

#### *Reviewed by:*

*Dean Radin, Institute of Noetic Sciences, USA; Thomas Filk, University of Freiburg, Germany*

#### *\*Correspondence:*

*Patrizio E. Tressoldi, Dipartimento di Psicologia Generale, Università di Padova, via Venezia 8, 35131 Padova, Italy patrizio.tressoldi@unipd.it*

#### *Specialty section:*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology*

*Received: 17 April 2015; Accepted: 10 July 2015; Published: 29 July 2015*

#### *Citation:*

*Tressoldi PE, Maier MA, Buechner VL and Khrennikov A (2015) A macroscopic violation of no-signaling in time inequalities? How to test temporal entanglement with behavioral observables. Front. Psychol. 6:1061. doi: 10.3389/fpsyg.2015.01061*

#### **Keywords: no-signaling in time, temporal entanglement, non-local correlation in time, human choices, random events**

# Introduction

The possibility to use mathematical and statistical formalisms adopted in quantum mechanics for the study of biological (e.g., Engel et al., 2007; Blankenship and Engel, 2010) and cognitive phenomena (e.g., Wang et al., 2014) is not only a theoretical proposal but a rich field of empirical research (see Khrennikov, 2010; Busemeyer and Wang, 2014, for a review).

The application of quantum formalisms to domains other than quantum physics (such as biological or mental processes) is independent of the hypothesis that processing of information by biological systems is based on quantum physical processes within these systems. This approach, known as "quantum biological information," is based on the *quantum-like paradigm*: biological systems of sufficiently high complexity process information in accordance with laws of quantum information theory (Khrennikov, 2010; Hameroff et al., 2014).

However, documenting the usefulness of such mathematical algorithms in modeling decision processes, memory, or consciousness, opens the possibility that the biological substrate constitutes the basis for the emergence of these quantum phenomena. This proposition is controversial and only a few researchers share this idea (see, e.g., Hameroff and Penrose, 2014). The main argument against the existence of quantum coherence or entanglement in biological systems like the brain refers to decoherence as a strong boundary condition of quantum phenomena (see, e.g., Tegmark, 2000; Jumper and Scholes, 2014). Decoherence of quantum states seems to occur so quickly that these effects could not operate over macroscopically relevant spatial distances or time scales (Tegmark, 2000). This would imply that non-temporal correlations between temporally separated events in the range of several hundred milliseconds or even up to seconds would be highly unlikely. In other words, the brain or the parts of it that are involved in actual information processing constitute a macroscopic entity and non-temporal correlations for macroscopic events are quite rare or even impossible (see Tegmark, 2000; but see Hameroff et al., 2014).

Independently of the quantum mind discussion, non-temporal correlations between temporally separated events (from a few hundred milliseconds up to several minutes) have recently been observed in psychology (see, e.g., Mossbridge et al., 2012; Bem et al., 2014; Maier et al., 2014). These phenomena usually involved a behavioral or physiological response at time 1 (RP t1) and an activating event happening later at time 2 (AE t2). In these studies a retro-causal influence and therefore temporally non-local correlations of AE t2 on RP t1 were reported.

Since Maier et al.'s (2014) studies will be re-analyzed within this article, we will refer to their data in more detail here to illustrate the basic finding. In a series of four out of seven studies a selective key-press at time 1 (left or right) was affected by the random assignment of negative or non-negative picture presentations at time 2. On average the participants were able to avoid negative future events. The random assignment at t2 was performed based on a pseudo random number generator (PRNG) in Studies 1, 2, and 3 and with a quantum based random number generator (RNG) in Study 4. In other words, the events at t1 and t2 were classically uncorrelated. The findings, however, indicated that event t1 was affected by event t2, which could only be the case if these macroscopically occurring events were in a state of temporally non-local correlation. Although Maier et al. (2014) reported a significant avoidance effect at t1 being affected by the event at t2, a direct test of temporal non-locality has not been performed. The goal of the analyses presented here is to fill this gap by providing such a test.

Entanglement in time or temporal non-locality, that is a non-causal correlation between events measured at successive time frames, is one of the many "odd" phenomena studied in quantum physics and mathematical tools have been developed to test the existence of these effects within the empirical data.

Although a commonly accepted mathematical algorithm for a strict test of temporal non-locality does not exist, some mathematical inequalities that can be applied to temporally distinct physical or mental states have been developed to test the quantum-nature of the underlying physical or cognitive mechanisms. If the inequalities applied to the data are found to be violated, they would indicate the involvement of superposed states.

# Contextual LG Inequality and No-Signaling in Time (NSIT) Inequality

The theoretical foundations were originally discussed by Leggett and Garg (1985) as a temporal variant of John Bell's inequalities, which mainly address entanglement or non-local correlations in space. A violation of the Leggett–Garg inequality would confirm quantum-like superposed states between temporally separated events and is thus a counterpart of the Bell inequalities for the time dimension. Whereas non-local temporal effects are intensely investigated in quantum physics (e.g., Olson and Ralph, 2012; Aharonov et al., 2014), there are still only a few analyses of this type applied to human cognition. Atmanspacher and Filk (2010, 2012, 2013) were probably the first to test temporal non-locality in bistable perception, applying their Necker–Zeno model, which requires three different measures. Similarly, Asano et al. (2014) derived an analog of the Leggett and Garg (1985) inequality, the "contextual LG inequality," and used it as a test of "quantum-likeness" of statistical data collected in a series of experiments on recognition of ambiguous figures. The Leggett–Garg approach has some limitations since this test can only be applied to situations involving three consecutively occurring events. For two-event scenarios, as is the case in the Maier et al. (2014) research, the Leggett–Garg inequality cannot be used. Fortunately, a test of non-local correlations for two consecutive events has recently been developed (Kofler and Brukner, 2013).

## The No-Signaling in Time (NSIT) Inequality

Kofler and Brukner (2013) discuss *NSIT* as a further necessary condition for macrorealism, in addition to the Leggett–Garg inequalities. Macrorealism is defined by the postulates that (a) macroscopic objects which may have two or more macroscopically different states are, at any given time, in a single specific state, (b) it is possible to measure this specific state without changing it, and (c) the properties of this macroscopic object are determined exclusively by the initial conditions.

*NSIT* requires only two measurements in time of two dichotomous observables, A and B, that may assume only two distinct states ±1. Hence, the basic scenario is: A<sub>t1</sub> = ±1, B<sub>t1</sub> = ±1 and A<sub>t2</sub> = ±1, B<sub>t2</sub> = ±1.

In accordance with the principle of *NSIT* the outcome probabilities for one part must not depend on the outcome probabilities of the second part and it is expressed by the following formula:

$$P(B_{t2} = +1) = P(A_{t1} = +1, B_{t2} = +1) + P(A_{t1} = -1, B_{t2} = +1)$$

and, by symmetry,

$$P(B_{t2} = -1) = P(A_{t1} = +1, B_{t2} = -1) + P(A_{t1} = -1, B_{t2} = -1) \tag{1}$$

A violation of the *NSIT* condition could be a first indicator that the mental state evolution cannot be described classically and may be explained by temporally distinct cognitive states existing in a state of superposition.
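As a rough illustration of how the *NSIT* condition can be checked against frequency data, the sketch below compares the probability of each outcome of B at t2, estimated from trials in which A was not measured, with the sum of the joint probabilities over the outcomes of A at t1. The function name `nsit_deviation`, the data layout, and all counts are hypothetical; this is not necessarily how the analyses reported in this article were computed, and a real analysis would also need a statistical test of the deviation.

```python
def nsit_deviation(joint_counts, marginal_counts):
    """Return, for each outcome b of B at t2, P(B=b) minus sum_a P(A=a, B=b).

    joint_counts[(a, b)]: trials where A was measured at t1 (outcome a)
                          and B at t2 (outcome b).
    marginal_counts[b]:   trials where only B was measured at t2.
    """
    n_joint = sum(joint_counts.values())
    n_marginal = sum(marginal_counts.values())
    deviations = {}
    for b in (+1, -1):
        summed = sum(joint_counts[(a, b)] for a in (+1, -1)) / n_joint
        direct = marginal_counts[b] / n_marginal
        deviations[b] = direct - summed
    return deviations

# Hypothetical counts, for illustration only.
joint = {(+1, +1): 30, (-1, +1): 20, (+1, -1): 25, (-1, -1): 25}
consistent = nsit_deviation(joint, {+1: 50, -1: 50})
violating = nsit_deviation(joint, {+1: 60, -1: 40})
print(consistent, violating)
```

A deviation of zero for both outcomes is what the condition requires; a deviation that differs reliably from zero would count as a violation.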

It is important to note that the temporal non-locality interpretation of NSIT is neither straightforward nor commonly accepted within the scientific community. The most accepted interpretation of violations of NSIT is that the data that violate these equalities are based on cognitive processes that most likely behaved in a quantum-like manner. This includes the possibility that the underlying mechanisms are best described as information states that co-exist in a state of superposition. Such quantum-like behavior of cognitive states could be considered a pre-condition for temporal non-locality to occur. In the analyses presented here we tested this pre-condition. To our knowledge, this is the first attempt to test this formalism in human behavioral tasks.

# Materials and Methods

# Description of the Experimental Data

Here we report the analyses of the three formal experiments in Maier et al.'s (2014) work, Study 1, Study 2, and Study 4 carried out with participants in the laboratory and with identical conditions and instructions to the participants. Our selection was based on the fact that only in these studies a retro-causal effect of t2 on t1 was observed. One successful study, Study 3, was eliminated since it was completed by a web-based program and participants could not be monitored during their task execution. Thus, only methodologically rigorously obtained data were included. A more detailed description of these experiments is presented in Maier et al. (2014).

# Participants

In all experiments participants were recruited among the undergraduate and graduate students of the University of Munich, Stony Brook NY, and Barcelona. The number of participants was 111, 201, and 327 for Study 1, Study 2, and Study 4, respectively.

## Procedure

Each participant was tested individually in a quiet lab room. After the completion of two preliminary tasks, lasting approximately 20 min and being unrelated to the crucial study, which were devised in order to increase the cognitive fatigue for inducing a more intuitive approach, participants were informed about their new task. A written instruction was presented on the screen: '*In the following experiment you have to press two keys on the keyboard as simultaneously as possible. You will see this instruction on the monitor's screen: Please Press the Keys. While seeing this instruction, please press both keys as simultaneously as possible! Afterward colored stimuli will be presented which you should simply watch.*'

After the participants read the instructions, the experimenter explained that the participants should put their index fingers on the left and right cursor keys of the keyboard. Both keys were placed on the table in front of the participants exactly at the same horizontal position as the midpoint of the computer screen. The experimenter emphasized that both index fingers should slightly touch the cursor keys throughout the experiment, and once the command appears they should press both keys as simultaneously as possible. Participants were informed that there is no rush, but the response should be spontaneous, and that after the key-press they should simply watch the following presentation of a colored stimulus.

Each trial started with the key-press command presented on the screen. Once the key-press was performed, the command line disappeared and, after a 500 ms delay with a black screen, a masked positive (Study 1) or neutral (Studies 2 and 4) picture, or a masked negative picture, was presented. The masked picture presentation consisted of three consecutive stimulus presentations.

First, a masking stimulus was presented for 72 ms, followed by the presentation of a negative or positive (neutral) picture for 18 ms, again followed by the same mask for 72 ms. Each negative and positive (neutral) picture was combined with an individual mask. The masking stimulus was constructed by dividing the original picture into small squares that were randomly rearranged. The resulting mask consisted of the same color and lightness properties as the original picture and could therefore effectively mask the content of the picture ensuring a subliminal presentation. According to our theoretical model, subliminal perception is critical to allow a superposition of the information states in time. After the second masking stimulus had disappeared, a 3000 ms intertrial interval appeared before the key-press command initiated the next trial. A total of 60 trial presentations were used in all studies. The 60 experimental trials were preceded by three practice trials with neutral pictures helping the participants to familiarize themselves with the task. Pictures were taken from the International Affective Picture System (IAPS; Lang et al., 2008).

Although participants were told to press both keys simultaneously, due to the design of a typical computer keyboard, one of the two keys is always triggered first. Thus, in any given trial, either a left or a right key-press was registered even though participants subjectively performed a simultaneous two-key response. For Studies 1 and 2 a closed deck procedure was applied, that is, in half of the trials, triggering a left key resulted in a positive (neutral) masked picture presentation and a right key in a negative one. In the other half, key and valence assignment were exactly reversed. The randomization procedure provided by E-Prime™ was used to randomize the order of trial presentation. The 10 positive and 10 negative pictures were randomly assigned to each trial with the restriction that each picture could be presented at most six times within a study [i.e., if a participant always 'chooses' a positive picture presentation, 60 (6 × 10) positive (neutral) pictures would be presented]. In Study 4 an open deck procedure was used, that is, the exact assignment of left and right key press to neutral vs. negative picture presentation was abandoned. Also, in this study a quantum-based randomizer, i.e., a true RNG, was used for randomization. Randomized trial selection was performed at the beginning of each trial. After the completion of the 60 trials, participants saw each masked picture presentation again and were asked after each whether they could recognize anything and, if so, what.

None of the participants in each of the experiments reported here could precisely name the content of any picture. Thus, the masking procedure met the criterion of subjective unawareness (from Maier et al., 2014, pp. 130–132).

In Studies 2 and 4, material, design, and procedure were the same as in Study 1, with the one difference that the 10 negative pictures from Study 1 were used together with 10 neutral instead of positive pictures. Again, the pictures were taken from the IAPS (Lang et al., 2008).

In Study 4, the only difference with respect to Study 2 was that the randomization was obtained by using a quantum based number generator (QRNG) from www.idquantique.com.

### Formal Mathematical Representation

There are two random variables A = A\_{t\_1} and B = B\_{t\_2}. The first one corresponds to the first task, where the right and left keys determine the values A = +1 and A = −1, respectively. The nature of the other variable is more complicated. The task at t\_2 determining B is the subliminal perception of a positive or a negative emotional picture. In psychology this task is considered a "response." Now if we assume that these random variables can be represented in the classical probabilistic framework, i.e., that a joint probability distribution P(A = x, B = y) can be introduced for their values, the additivity of probability implies that P(B = y) = P(A = +1, B = y) + P(A = −1, B = y).

Typically in applications this equality is treated in the form of the formula of total probability

$$P(B = y) = P(A = +1)\,P(B = y \mid A = +1) + P(A = -1)\,P(B = y \mid A = -1)$$

This formula is violated in a variety of psychological tasks related to disjunction, conjunction and order effects and various probability fallacies (see, for example, Khrennikov, 2010; Busemeyer and Bruza, 2012; Wang et al., 2014).

The main distinguishing feature of the present study is that we couple the violation of the formula of total probability, for statistical data collected in experiments with humans, with the (non)signaling problem in quantum physics, i.e., time is fundamentally involved in the experimental scheme.

# Application of NSIT Formalism

The left-hand side of equation (1) P(Bt2 = ±1) was estimated with a mean equal to 0.5 and a SD of 0.5 assuming a correct randomization.

The probabilities on the right-hand side of Equation (1) were obtained empirically by cross-tabulating the data from the three experiments (see Supplementary Material).

Following the suggestion of Khrennikov et al. (2014), we estimated the SEM of P(Bt2 = ±1) taking into account the number of trials of each experiment. The ratio of the observed *NSIT* to the SEM was used as an estimate of the *NSIT* violation.
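The estimation procedure described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the analysis code used for the studies: the function name `nsit_deviation`, the dictionary layout of the cross-tabulation, and the example counts are all hypothetical, and "the observed *NSIT*" is read here as the difference between the right-hand side of Equation (1) and the expected value 0.5.

```python
import math


def nsit_deviation(counts, expected=0.5, sd=0.5):
    """Return (rhs, sem, sigma) for a 2x2 cross-tabulation.

    counts maps (a, b), with a and b in {+1, -1}, to a number of trials,
    where a is the key registered at t1 and b the picture type at t2.
    """
    n_trials = sum(counts.values())
    # Right-hand side of Eq. (1): joint relative frequencies summed over A.
    rhs = (counts[(+1, +1)] + counts[(-1, +1)]) / n_trials
    # Standard error of the expected proportion for this number of trials.
    sem = sd / math.sqrt(n_trials)
    # Deviation from the expected probability, in units of standard errors.
    sigma = (rhs - expected) / sem
    return rhs, sem, sigma


# Hypothetical counts for illustration only; NOT data from the studies.
example = {(+1, +1): 2600, (-1, +1): 2500, (+1, -1): 2450, (-1, -1): 2450}
rhs, sem, sigma = nsit_deviation(example)
print(f"RHS = {rhs:.4f}, SEM = {sem:.4f}, sigma = {sigma:.2f}")
```

With these made-up counts the right-hand side lies about two standard errors above 0.5. Because the SEM shrinks with the square root of the number of trials, the same raw deviation yields a larger σ in a larger study.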

# Results

In **Table 1** we report the results of the application of the *NSIT* inequality and the standardized deviation from P(Bt2 = ±1), expressed in SE.

The σ values, which represent the violation of the *NSIT* inequality in terms of the number of SE from the expected probability at t2 (0.5 in our case), show a clear and strong *NSIT* violation both in the first two experiments and in the analysis of the total trials weighted for the number of trials. It is unclear to us why the *NSIT* analysis did not reveal a violation for Study 4.

#### TABLE 1 | Results of the three experiments.


*SEM, standard error of mean; NSIT, no-signaling in time;* σ = *NSIT/SE.*

One reason could be the different approaches used to realize the trial randomization: PRNGs were used in Studies 1 and 2, whereas a true RNG was applied in Study 4. However, a PRNG and a true RNG both produce random events equally well, especially when the seed number and the algorithm used for the PRNG procedure are unknown to the participants, which was the case in Studies 1 and 2. Raw data for independent analyses are available at http://figshare.com/articles/No\_Signaling\_in\_Time\_Raw\_Data/1383260.

# Discussion

Applying quantum mathematical formalisms to test the quantum-likeness of cognitive and behavioral phenomena is becoming more and more popular within the scientific community. In this study we applied the *NSIT* formalism to investigate temporal entanglement between binary unconscious human behavioral choices at t1 and binary random outcomes at t2. This is, to our knowledge, the first time that the *NSIT* formalism has been applied to psychological data sets. The results showed a strong violation of *NSIT* in Studies 1 and 2, suggesting the hypothesis of a quantum-like temporal entanglement between the choices at t1 and the binary random outcomes at t2. However, a null result was observed in Study 4. Overall, it seems that evidence for temporal entanglement could be found for the majority of the data.

Our results therefore support the idea of exploring quantum phenomena with data obtained with psychological variables involving unconscious decision making based on automatic affective processes. *NSIT* could thus be a valuable tool to test quantum effects in similar paradigms, since most psychological experiments consist of activating events and corresponding responses. The main goal of our analyses was to introduce this powerful set of inequalities to a broader psychologically interested scientific community.

In any event, it is too early to draw firm conclusions about the effect of the differences between the studies on the outcome of the *NSIT* analysis. At the moment, a pre-registered replication of Study 4 is being undertaken and will be completed in about 1 year. An additional analysis of these data with *NSIT* will shed some more light on the usefulness and applicability of the *NSIT* theorem in psychology.

# Acknowledgments

We thank Marco Genovese of INRIM, for an independent analysis of our results and his suggestion on how to weight the total results.

# References


# Supplementary Material

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fpsyg.2015.01061


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Tressoldi, Maier, Buechner and Khrennikov. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Unitary Transformations in the Quantum Model for Conceptual Conjunctions and Its Application to Data Representation

Tomas Veloz 1, 2, 3 \* † and Sylvie Desjardins 1 †

<sup>1</sup> Department of Mathematics, University of British Columbia, Kelowna, BC, Canada, <sup>2</sup> Center Leo Apostel, Vrije Universiteit Brussel, Brussels, Belgium, <sup>3</sup> Instituto de Filosofía y Ciencias de la Complejidad - IFICC, Ñuñoa, Chile

Quantum models of concept combinations have been successful in representing various experimental situations that cannot be accommodated by traditional models based on classical probability or fuzzy set theory. In many cases, the focus has been on producing a representation that fits experimental results to validate quantum models. However, these representations are not always consistent with cognitive modeling principles. Moreover, some important issues related to the representation of concepts, such as the dimensionality of the realization space, the uniqueness of solutions, and the compatibility of measurements, have been overlooked. In this paper, we provide a dimensional analysis of the realization space for the two-sector Fock space model for conjunction of concepts, focusing on the first and second sectors separately. We then introduce various representations of concepts that arise from the use of unitary operators in the realization space. In these concrete representations, a pair of concepts and their combination are modeled by a single conceptual state and by a collection of exemplar-dependent operators. Therefore, they are consistent with cognitive modeling principles. This framework not only provides a uniform approach to model an entire data set, but, because all measurement operators are expressed in the same basis, allows us to address the question of compatibility of measurements. In particular, we present evidence that it may be possible to predict non-commutative effects from partial measurements of conceptual combinations.

#### Edited by:

Sandro Sozzo, University of Leicester, UK

#### Reviewed by:

Zheng Wang, Ohio State University, USA Jose Acacio De Barros, San Francisco State University, USA

#### \*Correspondence:

Tomas Veloz tveloz@gmail.com

† These authors have contributed equally to this work.

#### Specialty section:

This article was submitted to Cognition, a section of the journal Frontiers in Psychology

Received: 03 August 2015 Accepted: 28 October 2015 Published: 12 November 2015

#### Citation:

Veloz T and Desjardins S (2015) Unitary Transformations in the Quantum Model for Conceptual Conjunctions and Its Application to Data Representation. Front. Psychol. 6:1734. doi: 10.3389/fpsyg.2015.01734

Keywords: concept combination, quantum cognition, data representation, unitary transformation, conjunction

# 1. INTRODUCTION

# 1.1. Concept Combinations in Quantum Cognition

The application of quantum models to cognitive phenomena is an emergent field known as quantum cognition (Aerts, 2009; Pothos and Busemeyer, 2013). One of the areas in quantum cognition that has received much attention is the study of concepts and their combinations (Aerts and Gabora, 2005a,b; Aerts, 2007a,b; Aerts and Sozzo, 2011; Aerts et al., 2013). In a general setting, a cognitive situation might include multiple concepts forming aggregated structures (Rips, 1995; Fodor, 1998). For example, the concepts "Fruit" and "Vegetable" can be combined to form a new concept "Fruit And Vegetable" (Hampton, 1988a). This example of a concept combination is built with the connective "And," which is also an operation mathematically defined in logic and probability. The question becomes: is it possible to apply the mathematical definition of the connective "And" to build the structure of "Fruit And Vegetable" from the structures of "Fruit" and "Vegetable"? Cognitive scientists have performed several experiments measuring various semantic estimations, including typicality, membership, and similarity, of concept combinations built with connectives such as "And" and "Not" (Hampton, 1988a,b, 1997a), and of adjective-noun compounds such as "Red Apple" (Medin and Shoben, 1988; Medin, 1989; Kamp and Partee, 1995). The evidence collected during two decades of research suggests that it might not be possible to represent all the experimental data for concept combinations using the mathematical structures of fuzzy logic or probability theory. Quantum cognition proposes an alternative approach.

While traditional models based on classical logic, probability, or fuzzy set theory have failed to properly account for cognitive phenomena exhibiting non-classical probabilistic features, quantum models have consistently provided a framework that easily encompasses these and other so-called cognitive biases (Gilovich et al., 2002; Busemeyer et al., 2011) or paradoxical phenomena (Aerts et al., 2011a). Quantum-inspired models have been successfully developed in the areas of decision making (Aerts et al., 2011b, 2012b; Busemeyer et al., 2011; Busemeyer and Bruza, 2012), psychology of categorization (Aerts and Aerts, 1995; Blutner et al., 2013; Sozzo, 2014), human memory (Bruza and Cole, 2005; Bruza et al., 2009, 2012), and finance (Khrennikov, 2009; Haven and Khrennikov, 2013). In this paper we will focus on the phenomenon of concept conjunction. However, since our analysis and methodology are based on purely mathematical notions of the quantum mechanical framework, the results presented in this paper can be extended to other concept combinations (Veloz, 2015).

Aerts (2009) formally states the conditions that characterize the existence of a classical probability model for concept conjunction:

Definition 1. Let µ(A), µ(B), and µ(AB) be the membership weights of an exemplar p with respect to a pair of concepts A and B and their conjunction AB. We say that these membership weights are classical conjunction data if there exists a Kolmogorovian probability space (Ω, σ(Ω), P) and events E<sub>A</sub>, E<sub>B</sub> ∈ σ(Ω) such that

$$\begin{aligned} P(E\_A) &= \mu(A), \\ P(E\_B) &= \mu(B), \\ P(E\_A \cap E\_B) &= \mu(AB). \end{aligned} \tag{1}$$

Classical conjunction data characterizes the membership values of the conjunction of concepts that can be modeled in a classical probabilistic framework. It is therefore important to characterize the notion of classical conjunction data in terms of the membership weights.

Corollary 1. The membership weights µ(A), µ(B), and µ(AB) of an exemplar p with respect to concepts A, B, and their conjunction AB are classical conjunction data if and only if

$$0 \le \mu(AB) \le \mu(A),\tag{2}$$

$$0 \le \mu(AB) \le \mu(B),\tag{3}$$

$$0 \le \mu(A) + \mu(B) - \mu(AB) \le 1. \tag{4}$$

A large body of experimental evidence and a considerable amount of data analysis indicate that the membership of exemplars with respect to concept combinations does not form classical conjunction data (Fodor and Lepore, 1996; Hampton, 1997a,b; Aerts and Gabora, 2005a,b). Namely, the membership with respect to the conjunction of concepts is generally larger than the membership of one of the former concepts, and thus violates either condition (2) or condition (3). This phenomenon is called single overextension. When conditions (2) and (3) are violated simultaneously, it is called double overextension. The violation of condition (4) is called the Kolmogorovian factor violation. We refer to Pitowsky (1989) and Aerts (2009) for an explanation of this phenomenon.
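The three conditions of Corollary 1 and the terminology just introduced can be illustrated with a short Python sketch. The function name `classify_conjunction` and the returned dictionary keys are hypothetical and serve only this illustration; the example weights are those quoted later in this paper for the exemplar "filing cabinet."

```python
def classify_conjunction(mu_a, mu_b, mu_ab):
    """Check membership weights against conditions (2)-(4) of Corollary 1."""
    over_a = mu_ab > mu_a          # upper bound of condition (2) violated
    over_b = mu_ab > mu_b          # upper bound of condition (3) violated
    k = mu_a + mu_b - mu_ab        # quantity bounded by condition (4)
    kolmogorov_violation = not (0 <= k <= 1)

    if over_a and over_b:
        overextension = "double"
    elif over_a or over_b:
        overextension = "single"
    else:
        overextension = "none"

    classical = (mu_ab >= 0 and overextension == "none"
                 and not kolmogorov_violation)
    return {
        "overextension": overextension,
        "kolmogorovian_violation": kolmogorov_violation,
        "classical": classical,
    }


# mu(A) = 0.97, mu(B) = 0.31, mu(AB) = 0.53 ("filing cabinet").
print(classify_conjunction(0.97, 0.31, 0.53))
```

For these weights µ(AB) exceeds µ(B) but not µ(A), so the exemplar is single overextended while condition (4) still holds.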

In Supplementary Table 1, we show two cases reported in Hampton (1988b). In the first case, the membership weight µ<sub>1</sub>(AB) of the item p<sub>1</sub> = "coffee table" with respect to the conjunction A<sub>1</sub>B<sub>1</sub> = "Furniture And Household Appliances" is single overextended with respect to the membership weights µ<sub>1</sub>(A) and µ<sub>1</sub>(B) of concepts A<sub>1</sub> = "Furniture" and B<sub>1</sub> = "Household Appliances," respectively. In the second case, the membership weight µ<sub>2</sub>(AB) of the item p<sub>2</sub> = "tree house" with respect to the conjunction A<sub>2</sub>B<sub>2</sub> = "Building And Dwelling" is doubly overextended with respect to the membership weights µ<sub>2</sub>(A) and µ<sub>2</sub>(B) of the concepts A<sub>2</sub> = "Building" and B<sub>2</sub> = "Dwelling," respectively.

The phenomenon of overextension has been demonstrated not only for membership estimations, but also for typicality (Smith and Osherson, 1981; Hampton, 1996; Storms et al., 1998), property relevance (Fodor and Lepore, 1996; Hampton, 1997a,b; Aerts and Gabora, 2005a,b), and probability estimations (Tversky and Kahneman, 1983; Moro, 2009).

# 1.2. The Quantum Approach to Concept Combination

The quantum approach to concepts introduces two fundamental assumptions that depart from classical approaches:


Concepts A and B are represented by the states |A⟩ and |B⟩, respectively. When we consider the conjunction AB of these two concepts, there are two different ways to combine the concepts (Aerts, 2009). The first considers the conjunction of concepts from an intuitive perspective, in the sense that the connective And does not play a logical role in the combination AB; instead the conjunction AB is viewed as an emergent entity. In particular, the quantum model assumes that the state of the combined concept |AB⟩ ∈ H is given by a superposition of the states of concepts A and B as follows:

$$|AB\rangle = \frac{1}{\sqrt{2}}(|A\rangle + |B\rangle). \tag{5}$$

The second way considers the conjunction of concepts from a logical perspective, in the sense that And does play a logical role in the combination AB. In particular, the quantum model assumes that the state of the combined concept |C⟩ is modeled in the tensor product space H ⊗ H, where each space in the product captures the representation of one of the concepts in the combination, while the entire space represents the conjunction. The two quantum models of concept combination are presented in Supplementary Material. These two modes can be unified in a mathematical framework developed in quantum mechanics called Fock space (Aerts, 2007a, 2009).

A Fock space is a direct sum of tensor products of Hilbert spaces, where each space in the sum represents the state space of a system having different numbers of particles (Meyer, 1995). For the case of concepts, we model the state of the combination of two concepts in the two-sector Fock space:

$$
\mathcal{F} = \mathcal{H} \oplus (\mathcal{H} \otimes \mathcal{H}).\tag{6}
$$

The first space, H, also called the first sector, represents the concept combination as an emergent entity. The second space, H ⊗ H, called the second sector, represents the concept combination as a logical entity. The state of the combined concept in the two-sector Fock space is hence a superposition of the two modes of combination.

For example, when |C⟩ = |A⟩ ⊗ |B⟩, the state |ψ⟩ of the concept combination is

$$|\psi\rangle = \frac{n e^{i\theta\_1}}{\sqrt{2}}(|A\rangle + |B\rangle) + \sqrt{1 - n^2}\, e^{i\theta\_2}|A\rangle \otimes |B\rangle,\tag{7}$$

and the membership formula is given by

$$
\mu(AB) = n^2 \left( \frac{\mu(A) + \mu(B)}{2} + \Re(\langle A | \mathbf{M} | B \rangle) \right) + (1 - n^2)\, \mu(A) \mu(B), \tag{8}
$$

for 0 ≤ n ≤ 1.

When n = 1, the membership weight µ(AB) corresponds to the average of µ(A) and µ(B) plus an interference term ℜ(⟨A|**M**|B⟩) bounded by

$$-\sqrt{\mu(A)\mu(B)} \le \Re(\langle A|M|B\rangle) \le \sqrt{\mu(A)\mu(B)}.$$

In the absence of interference, i.e., when ℜ(⟨A|**M**|B⟩) = 0, the membership weight is simply the average of the former membership weights. This particular case, which has been shown to provide a good first approximation to exemplars of conceptual conjunction (Aerts et al., 2012a), is overextended, and therefore non-classical. When n = 0, the membership weight corresponds to the product µ(A)µ(B), which is equivalent to the probability of two joint classical events that are independent. When 0 < n < 1, the state of the concept is in the superposition of the two modes of combination.
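The limiting cases just described can be checked numerically with a minimal Python sketch of the membership formula. The function name `fock_membership` and its parameters are hypothetical. The sketch weights the second-sector term by 1 − n², the squared modulus of the amplitude of |A⟩ ⊗ |B⟩ in Equation (7); that weighting is an assumption of this sketch, as is the example input.

```python
import math


def fock_membership(mu_a, mu_b, n, interference=0.0):
    """Membership weight of the conjunction in the two-sector model.

    n is the first-sector amplitude (0 <= n <= 1) and interference stands
    for the real part of <A|M|B>, bounded by sqrt(mu_a * mu_b).
    """
    if not 0.0 <= n <= 1.0:
        raise ValueError("n must lie in [0, 1]")
    if abs(interference) > math.sqrt(mu_a * mu_b):
        raise ValueError("interference exceeds sqrt(mu_a * mu_b)")
    first_sector = 0.5 * (mu_a + mu_b) + interference   # emergent mode
    second_sector = mu_a * mu_b                         # logical mode
    # Assumed weights: n^2 for the first sector, 1 - n^2 for the second.
    return n ** 2 * first_sector + (1.0 - n ** 2) * second_sector


mu_a, mu_b = 0.97, 0.31
print(fock_membership(mu_a, mu_b, n=1.0))   # average of mu_a and mu_b
print(fock_membership(mu_a, mu_b, n=0.0))   # product mu_a * mu_b
print(fock_membership(mu_a, mu_b, n=0.5))   # a mixture of the two modes
```

For intermediate n the result lies between the product and the interference-shifted average, which is how the model interpolates between the logical and the emergent modes of combination.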

Finally, the membership operator for a certain exemplar with respect to the conjunction of two concepts is given by

$$\mathbf{M}^{\mathcal{F}} = \mathbf{M} \oplus (\mathbf{M} \otimes \mathbf{M}),\tag{9}$$

where **M** is the operator that measures membership of the exemplar in the first sector, and **M**⊗**M** measures the membership of the exemplar with respect to the two concepts simultaneously in the second sector.

In addition to providing a suitable mathematical framework for cognitive models, quantum cognition also offers a different perspective on cognitive phenomena: uncertainty is described by means of superposed states (Aerts et al., 2011b), non-logical coherence involves interference (Aerts, 2009), order effects are revealed by incompatible measurements (Wang and Busemeyer, 2013), and certain "verb-noun" conceptual combinations mimic the structure of physically entangled particles (Aerts and Sozzo, 2014).

# 1.3. The Representation of Data

One of the reasons why quantum models of concept combinations have not been widely used is that the issue of data representation has been overlooked. Scholars have studied the capacity of quantum models to fit semantic estimations of concept combinations, and have presented concrete representations of the different estimations to validate the models (Aerts, 2007a,b, 2009; Aerts et al., 2012a; Sozzo, 2014); these concrete representations, however, model the data in an exemplar-based fashion, where one operator is used for all exemplars, but the conceptual state varies with exemplars.

For example, Aerts (2009) builds a quantum model in the Hilbert space ℂ<sup>3</sup> to consider the exemplars "filing cabinet" and "heated waterbed" with respect to concepts A = "Furniture," B = "Household Appliances," and their conjunction AB. For the first exemplar, we have µ(A) = 0.97, µ(B) = 0.31, and µ(AB) = 0.53. This case is represented by the vectors

$$\begin{aligned} \vert A \rangle &= (-0.57 + 0.40i, 0.29 - 0.63i, 0.13 + 0.11i), \\ \vert B \rangle &= (0.39, 0.39, 0.83). \end{aligned} \tag{10}$$

For the second exemplar, µ(A) = 1, µ(B) = 0.49, and µ(AB) = 0.78, and the state vectors are given by

$$\begin{aligned} \vert A \rangle &= \langle 0.71, 0.71, 0 \rangle, \\ \vert B \rangle &= \langle 0.49, 0.49, 0.71 \rangle. \end{aligned} \tag{11}$$

In both cases **M** is defined by the projection operator

$$\mathbf{M}(x, y, z) \to (x, y, 0).\tag{12}$$
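As a consistency check, the vectors of Equation (10) can be inserted into the first-sector formulas using only the Python standard library. This is a sketch under stated assumptions: the helper names `inner` and `project` are hypothetical, and **M** is taken to keep the first two coordinates, which is the choice under which the vectors return the quoted weights (keeping only the first coordinate would give µ(A) ≈ 0.48 rather than 0.97).

```python
# State vectors from Eq. (10), first exemplar ("filing cabinet").
A = [complex(-0.57, 0.40), complex(0.29, -0.63), complex(0.13, 0.11)]
B = [complex(0.39, 0.0), complex(0.39, 0.0), complex(0.83, 0.0)]


def inner(u, v):
    """Standard inner product <u|v>, conjugate-linear in the first slot."""
    return sum(x.conjugate() * y for x, y in zip(u, v))


def project(v, kept=(0, 1)):
    """Assumed form of M: keep the listed coordinates, zero the rest."""
    return [x if i in kept else 0j for i, x in enumerate(v)]


mu_a = inner(A, project(A)).real            # <A|M|A>
mu_b = inner(B, project(B)).real            # <B|M|B>
interference = inner(A, project(B)).real    # Re <A|M|B>
mu_ab = 0.5 * (mu_a + mu_b) + interference  # first-sector membership

print(f"<A|A> = {inner(A, A).real:.3f}, |<A|B>| = {abs(inner(A, B)):.3f}")
print(f"mu(A) = {mu_a:.3f}, mu(B) = {mu_b:.3f}, mu(AB) = {mu_ab:.3f}")
```

Up to the two-decimal rounding of the published components, the vectors are normalized and mutually orthogonal, and the computed weights agree with µ(A) = 0.97, µ(B) = 0.31, and µ(AB) = 0.53 to within 0.01.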

Such concrete representations are useful to validate models, but unwieldy if one seeks to build a model that can be used for studying and comparing large amounts of data. Because the state of a concept should be independent of the exemplar, it must remain the same for all measurements. But if we require the state representing the concept to remain fixed, then the number of measurement operators is restricted by the dimension of the Hilbert space H. In fact, because the membership operator is usually represented by the identity projector restricted to a smaller subspace, and the identity operator of the entire space and the null operator entail trivial measurements, the number of projectors available to represent membership measurements is restricted to n − 1, for n = dim(H). This implies that, if we consider n or more exemplars, then some exemplars will not have a unique membership operator. These issues become crucial in real-world situations involving concepts that entail thousands of exemplars (Tenenbaum et al., 2011).

In Section 2, we take a close look at the concrete representations of quantum models on each sector of the Fock space to identify the minimal dimensionality required to reach the modeling capacity of each of the sectors. In Section 3, we introduce the notion of unitary transformation for the first and second sectors of the Fock space separately, and propose concrete representations for concepts in these two models that require a single conceptual state, and a collection of exemplar-dependent operators. In Section 4, we use these representations to advance a conjecture concerning compatibility of measurements.

# 2. DIMENSIONALITY ANALYSIS OF THE TWO-SECTOR FOCK SPACE MODEL

In what follows we determine the dimension of H required to model concept combinations in the first and second sectors of the two-sector Fock space model. To explore this question, we assume H = ℂ<sup>n</sup> equipped with the standard inner product, and analyze how n relates to the representation of concepts.

# 2.1. First Sector Dimension Analysis

The Hilbert space model for concept conjunction requires two vectors, |A⟩, |B⟩ ∈ H, and an orthogonal projector, **M** : H → H, such that

$$
\langle A|A\rangle = \langle B|B\rangle = 1,\tag{13}
$$

$$
\langle A|B\rangle = 0,\tag{14}
$$

$$
\langle A|\mathbf{M}|A\rangle = \mu(A),\ \langle B|\mathbf{M}|B\rangle = \mu(B),\tag{15}
$$

$$
\mu(AB) = \frac{1}{2}(\mu(A) + \mu(B)) + \Re(\langle A|\mathbf{M}|B\rangle).\tag{16}
$$

The next theorem shows that n = 3 is sufficient to build a model that satisfies conditions (13–16).

Theorem 1. Let µ(A), µ(B), and µ(AB) denote the membership of an exemplar with respect to concepts A, B, and their conjunction AB. The membership weights are compatible with a complex Hilbert space model H = ℂ<sup>3</sup> if and only if

$$
\mu(AB) \in [\text{ave}(AB) - \text{dev}(AB), \text{ave}(AB) + \text{dev}(AB)], \tag{17}
$$

where

$$\begin{aligned} \text{ave}(AB) &= \frac{1}{2} (\mu(A) + \mu(B)), \text{ and} \\ \text{dev}(AB) &= \sqrt{\min\big(\mu(A)\mu(B), (1 - \mu(A))(1 - \mu(B))\big)}. \end{aligned} \tag{18}$$
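The interval of Theorem 1 is straightforward to evaluate. The following Python sketch, with the hypothetical function names `c3_interval` and `compatible_with_c3`, computes ave(AB) ± dev(AB) and tests whether a given µ(AB) falls inside it; the weights for "filing cabinet" quoted in Section 1.3 serve as the example.

```python
import math


def c3_interval(mu_a, mu_b):
    """Return (low, high) = (ave - dev, ave + dev) from Eqs. (17)-(18)."""
    ave = 0.5 * (mu_a + mu_b)
    dev = math.sqrt(min(mu_a * mu_b, (1.0 - mu_a) * (1.0 - mu_b)))
    return ave - dev, ave + dev


def compatible_with_c3(mu_a, mu_b, mu_ab):
    """True when mu_ab lies in the interval of Theorem 1."""
    low, high = c3_interval(mu_a, mu_b)
    return low <= mu_ab <= high


low, high = c3_interval(0.97, 0.31)
print(f"interval = [{low:.3f}, {high:.3f}]")
print(compatible_with_c3(0.97, 0.31, 0.53))
```

The interval is widest when µ(A) = µ(B) = 0.5 and collapses to the single point ave(AB) whenever one of the two weights equals 0 or 1.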

Proof. We derive Equation (17) by applying conditions (13–16). First, if **M** is a zero- or three-dimensional projector, then

$$
\begin{aligned}
\mu(A) &= \mu(B) = \mu(AB) = 0, \text{ or} \\
\mu(A) &= \mu(B) = \mu(AB) = 1,
\end{aligned}
\tag{19}
$$

respectively. Thus, Equation (17) holds, and Equations (13–16) are satisfied by choosing $|A\rangle$ and $|B\rangle$ to be any two mutually orthogonal unit vectors.

Next, we consider the cases where **M** is either a one- or two-dimensional projector. We apply conditions (13–16) to the vectors $|A\rangle$ and $|B\rangle$ in these two cases separately, and combine the results to obtain Equation (17).

If **M** is a one-dimensional projector, then without loss of generality, we can choose

$$\begin{aligned} \mathbf{M}(x, y, z) &\to (x, 0, 0), \text{ and} \\ |A\rangle &= (a\_1 e^{i\alpha\_1}, a\_2 e^{i\alpha\_2}, a\_3 e^{i\alpha\_3}), \\ |B\rangle &= (b\_1 e^{i\beta\_1}, b\_2 e^{i\beta\_2}, b\_3 e^{i\beta\_3}). \end{aligned} \tag{20}$$

Note that conditions (13) and (15) are satisfied by choosing the coefficients in $|A\rangle$ and $|B\rangle$ as follows:

$$\begin{aligned} a\_1 &= \sqrt{\mu(A)}; \ a\_2 = \sqrt{\lambda}\sqrt{1-\mu(A)} \; ; a\_3 = \sqrt{1-\lambda}\sqrt{1-\mu(A)}, \\ b\_1 &= \sqrt{\mu(B)}; \ b\_2 = \sqrt{\kappa}\sqrt{1-\mu(B)} \; ; b\_3 = \sqrt{1-\kappa}\sqrt{1-\mu(B)}, \end{aligned} \tag{21}$$

with 0 ≤ λ ≤ 1, and 0 ≤ κ ≤ 1. Moreover, Equation (16) implies that µ(AB) is given by

$$
\mu(AB) = \frac{1}{2}(\mu(A) + \mu(B)) + \sqrt{\mu(A)\mu(B)}\cos(\alpha\_1 - \beta\_1). \tag{22}
$$

We then apply condition (14), writing $\gamma\_k = \alpha\_k - \beta\_k$ for $k = 1, 2, 3$ and separating real and imaginary parts, to obtain

$$
-\sqrt{\mu(A)\mu(B)}\cos(\gamma\_1) = \sqrt{(1-\mu(A))(1-\mu(B))}\,F(\lambda,\kappa,\cos(\gamma\_2),\cos(\gamma\_3)), \tag{23}
$$

$$
-\sqrt{\mu(A)\mu(B)}\sin(\gamma\_1) = \sqrt{(1-\mu(A))(1-\mu(B))}\,F(\lambda,\kappa,\sin(\gamma\_2),\sin(\gamma\_3)), \tag{24}
$$

where

$$F(\lambda, \kappa, f(x), f(y)) := \sqrt{\lambda \kappa}\, f(x) + \sqrt{(1 - \lambda)(1 - \kappa)}\, f(y). \tag{25}$$

Since $F(\lambda,\kappa,\cos(\gamma\_2),\cos(\gamma\_3))$ is a linear combination of $\sqrt{\lambda\kappa}$ and $\sqrt{(1-\lambda)(1-\kappa)}$ whose coefficients $\cos(\gamma\_2)$ and $\cos(\gamma\_3)$ lie in $[-1, 1]$, we have

$$|F(\lambda, \kappa, \cos(\gamma\_2), \cos(\gamma\_3))| \le \sqrt{\lambda \kappa} + \sqrt{(1 - \lambda)(1 - \kappa)}.\tag{26}$$

We set

$$
\sqrt{\lambda} = \cos(\theta\_1), \ \sqrt{\kappa} = \cos(\theta\_2), \tag{27}
$$

for $\theta\_1, \theta\_2 \in [0, \frac{\pi}{2}]$. Then

$$\begin{aligned} \sqrt{1-\lambda} &= \sin(\theta\_1), \\ \sqrt{1-\kappa} &= \sin(\theta\_2). \end{aligned} \tag{28}$$

Substituting Equations (27) and (28) in Equation (26), we obtain

$$|F(\lambda, \kappa, \cos(\gamma\_2), \cos(\gamma\_3))| \le |\cos(\theta\_1 - \theta\_2)| \le 1. \tag{29}$$

Then Equation (23) implies that

$$|\sqrt{\mu(A)\mu(B)}\cos(\gamma\_1)| \le \sqrt{(1-\mu(A))(1-\mu(B))}.\tag{30}$$

Therefore, the interference term is bounded as follows:

$$|\sqrt{\mu(A)\mu(B)}\cos(\gamma\_1)| \le \min\left(\sqrt{\mu(A)\mu(B)},\ \sqrt{(1-\mu(A))(1-\mu(B))}\right) = \text{dev}(AB). \tag{31}$$

Next, combining Equations (23) and (24), we obtain

$$
\mu(A)\mu(B) = (1 - \mu(A))(1 - \mu(B))\hat{F}(\lambda, \kappa, \gamma\_2, \gamma\_3), \tag{32}
$$

where

$$\begin{split} \hat{F}(\lambda, \kappa, \gamma\_2, \gamma\_3) &= F^2(\lambda, \kappa, \cos(\gamma\_2), \cos(\gamma\_3)) \\ &+ F^2(\lambda, \kappa, \sin(\gamma\_2), \sin(\gamma\_3)). \end{split} \tag{33}$$

Hence,

$$
\mu(A) + \mu(B) = 1 + \mu(A)\mu(B)\left(1 - \frac{1}{\hat{F}(\lambda, \kappa, \gamma\_2, \gamma\_3)}\right). \tag{34}
$$

We use the parametrization for λ and κ given by Equation (27), and apply Equations (29–33), to obtain

$$0 \le \hat{F}(\lambda, \kappa, \gamma\_2, \gamma\_3) \le \cos(\theta\_1 - \theta\_2)^2 + \sin(\theta\_1 - \theta\_2)^2 = 1. \tag{35}$$

Combining Equations (35) and (34) yields

$$
\mu(A) + \mu(B) \le 1.\tag{36}
$$

Therefore, when **M** is a one-dimensional projector, conditions (13–16) imply

$$\begin{aligned} \mu(AB) &\in [\text{ave}(AB) - \text{dev}(AB), \text{ave}(AB) + \text{dev}(AB)], \text{ and} \\ \mu(A) + \mu(B) &\le 1. \end{aligned} \tag{37}$$

Next, consider the case in which **M** is a two-dimensional projector. Without loss of generality, we can assume

$$\mathbf{M}(x, y, z) \to (x, y, 0).$$

Conditions (13) and (15) are satisfied by choosing the coefficients in $|A\rangle$ and $|B\rangle$ as follows:

$$\begin{aligned} a\_1 &= \sqrt{\lambda} \sqrt{\mu(A)}; \ a\_2 = \sqrt{1 - \lambda} \sqrt{\mu(A)} \; ; a\_3 = \sqrt{1 - \mu(A)}, \\ b\_1 &= \sqrt{\kappa} \sqrt{\mu(B)}; \ b\_2 = \sqrt{1 - \kappa} \sqrt{\mu(B)} \; ; b\_3 = \sqrt{1 - \mu(B)}, \end{aligned} \tag{38}$$

with 0 ≤ λ ≤ 1, and 0 ≤ κ ≤ 1. Moreover, Equation (16) implies that µ(AB) is given by

$$
\mu(AB) = \frac{1}{2} (\mu(A) + \mu(B)) + \sqrt{\mu(A)\mu(B)}\, F(\lambda, \kappa, \cos(\gamma\_1), \cos(\gamma\_2)).\tag{39}
$$

We apply condition (14) to obtain

$$\begin{aligned} &\sqrt{\mu(A)\mu(B)}F(\lambda,\kappa,\cos(\gamma\_1),\cos(\gamma\_2)) \\ &= -\sqrt{(1-\mu(A))(1-\mu(B))}\cos(\gamma\_3). \end{aligned} \tag{40}$$

Since $|F(\lambda,\kappa,\cos(\gamma\_1),\cos(\gamma\_2))| \le 1$ and $|\cos(\gamma\_3)| \le 1$, Equation (40) implies that

$$|\sqrt{\mu(A)\mu(B)}\,F(\lambda, \kappa, \cos(\gamma\_1), \cos(\gamma\_2))| \le \min\left(\sqrt{\mu(A)\mu(B)},\ \sqrt{(1-\mu(A))(1-\mu(B))}\right) = \text{dev}(AB). \tag{41}$$

We repeat the procedure used in the one-dimensional case to obtain

$$
\mu(A)\mu(B)\hat{F}(\lambda,\kappa,\gamma\_1,\gamma\_2) = (1-\mu(A))(1-\mu(B)).\tag{42}
$$

Since $0 \le \hat{F}(\lambda,\kappa,\gamma\_1,\gamma\_2) \le 1$, Equation (42) yields

$$1 \le \mu(A) + \mu(B). \tag{43}$$

Therefore, when **M** is a two-dimensional projector, conditions (13–16) imply

$$\begin{aligned} \mu(AB) &\in [\text{ave}(AB) - \text{dev}(AB), \text{ave}(AB) + \text{dev}(AB)], \text{ and} \\ 1 &\le \mu(A) + \mu(B). \end{aligned} \tag{44}$$

We complete the proof by merging Equations (37) and (44).
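The sufficiency direction of Theorem 1 can also be illustrated numerically. The following Python sketch builds one explicit triple satisfying conditions (13–16) whenever Equation (17) holds. It is an illustration under stated assumptions rather than the parametrization used in the proof: the names `first_sector_model`, `inner`, and `apply` are hypothetical, $|A\rangle$ is placed in the first two coordinates, the projector is kept diagonal, and the two-dimensional case is handled by modeling the complements $1-\mu(A)$ and $1-\mu(B)$ with the rank-one projector $\mathbf{P}(x, y, z) \to (x, 0, 0)$ and taking $\mathbf{M} = \mathbf{1} - \mathbf{P}$.

```python
import math

def inner(u, v):
    """Hermitian inner product <u|v> (conjugate-linear in u)."""
    return sum(x.conjugate() * y for x, y in zip(u, v))

def apply(diag, v):
    """Apply a diagonal operator, given by its diagonal, to the vector v."""
    return [d * x for d, x in zip(diag, v)]

def first_sector_model(mu_a, mu_b, mu_ab, tol=1e-12):
    """Return (A, B, M_diag) in C^3 satisfying (13)-(16), or None if (17) fails."""
    ave = 0.5 * (mu_a + mu_b)
    dev = math.sqrt(min(mu_a * mu_b, (1 - mu_a) * (1 - mu_b)))
    t = mu_ab - ave                      # required interference term Re<A|M|B>
    if abs(t) > dev + tol:
        return None
    # Rank-one projector P = diag(1, 0, 0). If mu(A) + mu(B) > 1, model the
    # complements with P and use M = 1 - P = diag(0, 1, 1) instead.
    flip = mu_a + mu_b > 1
    pa, pb, s = (1 - mu_a, 1 - mu_b, -t) if flip else (mu_a, mu_b, t)
    amp = math.sqrt(pa * pb)             # modulus of <A|P|B>
    cos_g = max(-1.0, min(1.0, s / amp)) if amp > 0 else 0.0
    phase = complex(cos_g, math.sqrt(1 - cos_g ** 2))   # relative phase on coordinate 1
    rest = math.sqrt((1 - pa) * (1 - pb))
    w1 = -amp * phase / rest if rest > 0 else 0j        # enforces <A|B> = 0
    w2 = math.sqrt(max(0.0, 1 - abs(w1) ** 2))
    A = [complex(math.sqrt(pa)), complex(math.sqrt(1 - pa)), 0j]
    B = [math.sqrt(pb) * phase, math.sqrt(1 - pb) * w1, complex(math.sqrt(1 - pb) * w2)]
    M_diag = [0, 1, 1] if flip else [1, 0, 0]
    return A, B, M_diag

# Hypothetical triplet (mu(A), mu(B), mu(AB)) chosen only for illustration.
A, B, M = first_sector_model(0.6, 0.7, 0.75)
mu_ab = 0.5 * (0.6 + 0.7) + inner(A, apply(M, B)).real   # condition (16)
```

The function returns `None` exactly when the requested interference term exceeds dev(AB), which mirrors the necessity direction of the theorem.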

The general case, $\mathcal{H} = \mathbb{C}^n$ for $n > 3$, does not provide additional modeling power, since the condition given by Equation (17) remains. Also, the case $\mathcal{H} = \mathbb{C}^2$ is more restrictive than the $\mathcal{H} = \mathbb{C}^3$ case. In fact, membership data compatible with conditions (13–16) for $\mathcal{H} = \mathbb{C}^2$ must satisfy $\mu(A) + \mu(B) = 1$ (Veloz, 2015).
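As a quick way to screen data against these dimension results, the following Python sketch evaluates ave(AB) and dev(AB) from Equation (18), tests the interval condition of Equation (17), and checks the necessary condition quoted above for $\mathbb{C}^2$. The function names, the tolerance parameter, and the sample triplets are hypothetical and serve only as an illustration.

```python
import math

def ave_dev(mu_a, mu_b):
    """Return (ave(AB), dev(AB)) as defined in Equation (18)."""
    ave = 0.5 * (mu_a + mu_b)
    dev = math.sqrt(min(mu_a * mu_b, (1 - mu_a) * (1 - mu_b)))
    return ave, dev

def fits_c3(mu_a, mu_b, mu_ab, tol=1e-12):
    """Equation (17): mu(AB) lies in [ave(AB) - dev(AB), ave(AB) + dev(AB)]."""
    ave, dev = ave_dev(mu_a, mu_b)
    return abs(mu_ab - ave) <= dev + tol

def satisfies_c2_necessary(mu_a, mu_b, tol=1e-12):
    """Necessary condition for a model in C^2: mu(A) + mu(B) = 1."""
    return abs(mu_a + mu_b - 1) <= tol

# Hypothetical triplets (mu(A), mu(B), mu(AB)), chosen only for illustration.
for triplet in [(0.6, 0.7, 0.75), (0.2, 0.3, 0.9), (0.4, 0.6, 0.5)]:
    print(triplet, fits_c3(*triplet), satisfies_c2_necessary(*triplet[:2]))
```

With measured data, the tolerance would normally be set from the experimental uncertainty rather than from floating-point precision.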

# 2.2. Second Sector Dimension Analysis

The second sector of the two-sector Fock space requires a concept combination state $|C\rangle \in \mathbb{C}^n \otimes \mathbb{C}^n$ and an operator $\mathbf{M} : \mathbb{C}^n \to \mathbb{C}^n$, such that $|C\rangle$ restricted to the first sector represents the concept $A$, and $|C\rangle$ restricted to the second sector represents the concept $B$. However, $|C\rangle$ cannot in general be decomposed as a tensor product of the type $|C\_A\rangle \otimes |C\_B\rangle$, for $|C\_A\rangle, |C\_B\rangle \in \mathbb{C}^n$. Therefore, $|C\rangle$ is usually a non-separable state.

To recover the probabilistic structure of the former concepts in the combination, the operators **M** ⊗ **1** and **1** ⊗ **M** are applied to $|C\rangle$ to obtain µ(A) and µ(B), respectively. Moreover, since $|C\rangle$ as a whole represents the concept combination AB, the operator **M** ⊗ **M** is applied to $|C\rangle$ to obtain µ(AB).

The following definition summarizes how data is represented in the second sector.

Definition 2. Let µ = {µ(A), µ(B), µ(AB)} be a triplet denoting the membership of concepts A, B, and their conjunction AB. We say that the triplet µ admits a representation in $\mathbb{C}^n \otimes \mathbb{C}^n$ if there exists a unit vector $|C\rangle \in \mathbb{C}^n \otimes \mathbb{C}^n$, and an operator $\mathbf{M} : \mathbb{C}^n \to \mathbb{C}^n$ such that

$$
\langle C | \mathbf{M}^A | C \rangle = \langle C | \mathbf{M} \otimes \mathbf{1} | C \rangle = \mu(A), \tag{45}
$$

$$
\langle C | \mathbf{M}^B | C \rangle = \langle C | \mathbf{1} \otimes \mathbf{M} | C \rangle = \mu(B),\tag{46}
$$

$$
\langle C | \mathbf{M}^\wedge | C \rangle = \langle C | \mathbf{M} \otimes \mathbf{M} | C \rangle = \mu(AB). \tag{47}
$$

Let $\{|i\rangle\}\_{i=1}^{n}$ be the canonical basis of $\mathbb{C}^n$. Without loss of generality, we can take **M** to be an orthogonal projector on the subspace of $\mathbb{C}^n$ spanned by the basis elements $|1\rangle, ..., |r\rangle$, with r < n. Hence,

$$\mathbf{M}(x\_1, \dots, x\_n) \to (x\_1, \dots, x\_r, 0, \dots, 0).$$

Next, let $|C\rangle$ be a unit vector in $\mathbb{C}^n \otimes \mathbb{C}^n$. That is,

$$|C\rangle = \sum\_{i=1}^{n} \sum\_{j=1}^{n} c\_{ij} e^{\mathrm{i}\gamma\_{ij}} |i\rangle \otimes |j\rangle,\tag{48}$$

and

$$\begin{split} \langle C|C \rangle &= \sum\_{i,j=1}^{n} c\_{ij} e^{-\mathrm{i}\gamma\_{ij}} \langle i| \otimes \langle j| \sum\_{k,l=1}^{n} c\_{kl} e^{\mathrm{i}\gamma\_{kl}} |k \rangle \otimes |l\rangle \\ &= \sum\_{i,j,k,l=1}^{n} c\_{ij} c\_{kl} e^{\mathrm{i}\left(-\gamma\_{ij} + \gamma\_{kl}\right)} \langle i|k \rangle \langle j|l \rangle \\ &= \sum\_{i,j=1}^{n} c\_{ij}^{2} = 1. \end{split} \tag{49}$$

We now prove that the operator **M** and the vector $|C\rangle$ above satisfy Equations (45–47) if and only if µ(A), µ(B), and µ(AB) are classical conjunction data.

Theorem 2. Let µ = {µ(A), µ(B), µ(AB)} be a triplet denoting the membership of concepts A, B, and their conjunction AB. The triplet µ is classical conjunction data if and only if it admits a representation in $\mathbb{C}^n \otimes \mathbb{C}^n$ with $n = 2$.

Proof. If µ admits a representation in $\mathbb{C}^2 \otimes \mathbb{C}^2$, there exists a unit vector $|C\rangle \in \mathbb{C}^2 \otimes \mathbb{C}^2$ and an operator **M** such that Equations (45–47) are satisfied. If µ(A) = µ(B) = µ(AB) = 0 or 1, we can choose $|C\rangle$ to be any unit vector in $\mathbb{C}^2 \otimes \mathbb{C}^2$, and **M** to be a zero- or two-dimensional projector, respectively. Otherwise, let $\{|1\rangle, |2\rangle\}$ be the canonical basis for $\mathbb{C}^2$. Without loss of generality, we can define $|C\rangle$ by

$$\begin{split} |C\rangle = {} & c\_{11} e^{\mathrm{i}\gamma\_{11}} |1\rangle \otimes |1\rangle + c\_{12} e^{\mathrm{i}\gamma\_{12}} |1\rangle \otimes |2\rangle + c\_{21} e^{\mathrm{i}\gamma\_{21}} |2\rangle \otimes |1\rangle \\ & + c\_{22} e^{\mathrm{i}\gamma\_{22}} |2\rangle \otimes |2\rangle \end{split} \tag{50}$$

and **M** by the one-dimensional projector onto the subspace determined by $|1\rangle$. Note that

$$\begin{aligned} \mu(A) &= \langle C | \mathbf{M} \otimes \mathbf{1} | C \rangle = c\_{11}^2 + c\_{12}^2, \\ \mu(B) &= \langle C | \mathbf{1} \otimes \mathbf{M} | C \rangle = c\_{11}^2 + c\_{21}^2, \\ \mu(AB) &= \langle C | \mathbf{M} \otimes \mathbf{M} | C \rangle = c\_{11}^2. \end{aligned} \tag{51}$$

Then, clearly µ(AB) ≤ µ(A), µ(AB) ≤ µ(B), and since $|C\rangle$ is a unit vector,

$$
\mu(A) + \mu(B) - \mu(AB) = c\_{11}^2 + c\_{12}^2 + c\_{21}^2 \le 1. \tag{52}
$$

Therefore, µ is classical conjunction data. The other implication is proven by taking **M** to be the same one-dimensional projector, and $|C\rangle$ such that

$$\begin{aligned} c\_{11} &= \sqrt{\mu(AB)}, \\ c\_{12} &= \sqrt{\mu(A) - \mu(AB)}, \\ c\_{21} &= \sqrt{\mu(B) - \mu(AB)}, \\ c\_{22} &= \sqrt{1 - \mu(A) - \mu(B) + \mu(AB)}, \end{aligned}$$

and $\gamma\_{ij} = 0$, for i, j = 1, 2.

Theorem 2 proves the strict equivalence between classical conjunction data and the model of conjunction built in $\mathbb{C}^2 \otimes \mathbb{C}^2$.
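The construction in the proof of Theorem 2 can be sketched in a few lines of Python. The helper names `second_sector_state` and `expect` are hypothetical, the sample triplet is made up for illustration, and the phases are set to zero as in the proof, so the coefficients $c\_{ij}$ are real. Because **M** and **1** are diagonal in the canonical basis, the expectation values reduce to weighted sums of $c\_{ij}^2$.

```python
import math

def second_sector_state(mu_a, mu_b, mu_ab, tol=1e-12):
    """Coefficients c[i][j] of |C> in C^2 (x) C^2, or None if the data are not classical."""
    weights = [[mu_ab, mu_a - mu_ab],
               [mu_b - mu_ab, 1 - mu_a - mu_b + mu_ab]]
    if any(w < -tol for row in weights for w in row):
        return None
    return [[math.sqrt(max(w, 0.0)) for w in row] for row in weights]

def expect(c, left, right):
    """<C| L (x) R |C> for diagonal operators L and R, given by their diagonals."""
    return sum(left[i] * right[j] * c[i][j] ** 2 for i in range(2) for j in range(2))

M, ONE = [1, 0], [1, 1]                    # projector onto |1>, and the identity
c = second_sector_state(0.5, 0.4, 0.3)     # hypothetical classical triplet
memberships = (expect(c, M, ONE),          # Equation (45)
               expect(c, ONE, M),          # Equation (46)
               expect(c, M, M))            # Equation (47)
```

A triplet such as (0.5, 0.4, 0.45) violates µ(AB) ≤ µ(B), so one of the weights is negative and the function returns `None`.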

# 3. UNITARY TRANSFORMATIONS AND DATA REPRESENTATION

We now investigate how multiple exemplars can be concretely represented using a single concept state. To do so, we use unitary transformations to identify a basis of the realization space where multiple exemplars can be represented simultaneously. In this new framework, concrete representations are consistent with the cognitive principles of the quantum model of concepts. Namely, a concept exists in a single state for all exemplars, and the measurement of membership of an exemplar depends on the exemplar to be measured rather than on the concept state.

# 3.1. Data Representation in the First Sector

The following definition and theorem introduce the notion of data representation in the first sector that is consistent with the cognitive principles of the quantum model of concepts in $\mathbb{C}^3$.

Definition 3. Let $\mu = \{\mu\_i(A), \mu\_i(B), \mu\_i(AB)\}\_{i=1}^{k}$ be a set of experimental data, where $\mu\_i(x)$ is the semantic estimation of an exemplar $p\_i$ with respect to concepts A, B, and their conjunction AB. A representation of µ in $\mathbb{C}^3$ is defined as a pair of unit vectors $|A\rangle, |B\rangle \in \mathbb{C}^3$, and a collection of orthogonal projectors $\mathbf{M}\_i : \mathbb{C}^3 \to \mathbb{C}^3$ such that conditions (13–16) are satisfied for i = 1, ..., k. We say $(|A\rangle, |B\rangle, \{\mathbf{M}\_i\}\_{i=1}^{k})$ is a representation of µ in $\mathbb{C}^3$.

Theorem 3. Let $\mu = \{\mu\_i(A), \mu\_i(B), \mu\_i(AB)\}\_{i=1}^{k}$ be a set of experimental data, where $\mu\_i(x)$ is the semantic estimation of exemplar $p\_i$ with respect to concepts A, B, and their conjunction AB. The set of data µ has a representation in $\mathbb{C}^3$ if and only if for all i = 1, ..., k

$$\mu\_i(AB) \in \left[ \text{ave}\_i(AB) - \text{dev}\_i(AB), \text{ave}\_i(AB) + \text{dev}\_i(AB) \right]. \tag{53}$$

Proof. Let $|A\rangle = (1, 0, 0)$, $|B\rangle = (0, 1, 0)$, and $|C\rangle = (0, 0, 1)$ be the canonical basis for $\mathbb{C}^3$. We prove that, if Equation (53) is satisfied for each i = 1, ..., k, then there exists an orthogonal projector $\mathbf{M}\_i$ such that conditions (13–16) are satisfied for $|A\rangle$, $|B\rangle$, and $\mathbf{M}\_i$.

Since $\mu\_i(A)$, $\mu\_i(B)$, and $\mu\_i(AB)$ satisfy Equation (53), by Theorem 1 for each i ∈ {1, ..., k} there exist two vectors,

$$|A\_i\rangle = (a\_1 e^{\mathrm{i}\alpha\_1}, a\_2 e^{\mathrm{i}\alpha\_2}, a\_3 e^{\mathrm{i}\alpha\_3}), \ |B\_i\rangle = (b\_1 e^{\mathrm{i}\beta\_1}, b\_2 e^{\mathrm{i}\beta\_2}, b\_3 e^{\mathrm{i}\beta\_3}),\tag{54}$$

and an orthogonal projector $\hat{\mathbf{M}}\_i$ such that Equations (13–16) are satisfied. Thus, the pair of vectors $|A\_i\rangle$ and $|B\_i\rangle$, as constructed in the proof of Theorem 1, are orthonormal. We set $|C\_i\rangle = |A\_i\rangle \times |B\_i\rangle$ so that the set $\{|A\_i\rangle, |B\_i\rangle, |C\_i\rangle\}$ forms an orthonormal basis for $\mathbb{C}^3$ for any i ∈ {1, ..., k}. Next, we define the operator $\mathbf{U}\_i$ by

$$\mathbf{U}\_{i} = \begin{pmatrix} \langle A\_{i}|A\rangle & \langle A\_{i}|B\rangle & \langle A\_{i}|C\rangle\\ \langle B\_{i}|A\rangle & \langle B\_{i}|B\rangle & \langle B\_{i}|C\rangle\\ \langle C\_{i}|A\rangle & \langle C\_{i}|B\rangle & \langle C\_{i}|C\rangle \end{pmatrix} . \tag{55}$$

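$\mathbf{U}\_i$ is a unitary matrix whose action induces a change from the basis $\{|A\_i\rangle, |B\_i\rangle, |C\_i\rangle\}$ to the basis $\{|A\rangle, |B\rangle, |C\rangle\}$. Note that $\mathbf{U}\_i |A\_i\rangle = |A\rangle$, $\mathbf{U}\_i |B\_i\rangle = |B\rangle$, and $\mathbf{U}\_i |C\_i\rangle = |C\rangle$.

We can also use the operator $\mathbf{U}\_i$ to represent $\hat{\mathbf{M}}\_i$ in the canonical basis $\{|A\rangle, |B\rangle, |C\rangle\}$ as follows: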

$$\mathbf{M}\_i = \mathbf{U}\_i \hat{\mathbf{M}}\_i \mathbf{U}\_i^{-1}. \tag{56}$$

We use the fact that $\mathbf{1} = \mathbf{U}\_i^{-1} \mathbf{U}\_i = \mathbf{U}\_i \mathbf{U}\_i^{-1}$ to show that the remaining conditions are satisfied. That is, for each i = 1, ..., k,

$$\begin{aligned} \mu\_i(A) &= \langle A\_i | \hat{\mathbf{M}}\_i | A\_i \rangle = \langle A\_i | \mathbf{U}\_i^{-1} \mathbf{U}\_i \hat{\mathbf{M}}\_i \mathbf{U}\_i^{-1} \mathbf{U}\_i | A\_i \rangle = \langle A | \mathbf{M}\_i | A \rangle, \\ \mu\_i(B) &= \langle B\_i | \hat{\mathbf{M}}\_i | B\_i \rangle = \langle B\_i | \mathbf{U}\_i^{-1} \mathbf{U}\_i \hat{\mathbf{M}}\_i \mathbf{U}\_i^{-1} \mathbf{U}\_i | B\_i \rangle = \langle B | \mathbf{M}\_i | B \rangle, \end{aligned} \tag{57}$$

and

$$\begin{split} \mu\_i(AB) &= \frac{1}{2} (\mu\_i(A) + \mu\_i(B)) + \Re (\langle A\_i | \hat{\mathbf{M}}\_i | B\_i \rangle) \\ &= \frac{1}{2} (\mu\_i(A) + \mu\_i(B)) + \Re (\langle A\_i | \mathbf{U}\_i^{-1} \mathbf{U}\_i \hat{\mathbf{M}}\_i \mathbf{U}\_i^{-1} \mathbf{U}\_i | B\_i \rangle) \\ &= \frac{1}{2} (\mu\_i(A) + \mu\_i(B)) + \Re (\langle A | \mathbf{M}\_i | B \rangle). \end{split} \tag{58}$$

Theorem 3 provides a data representation in terms of a single pair of vectors $|A\rangle$ and $|B\rangle$, and a set of projectors $\mathbf{M}\_i$, for i = 1, ..., k, corresponding to the membership operator for each exemplar. Since unitary transformations preserve inner products, and therefore the matrix elements of operators between the transformed vectors, the values of the membership estimations $\mu\_i(A)$, $\mu\_i(B)$, and $\mu\_i(AB)$ are preserved.
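The change of basis in Equations (55) and (56) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the orthonormal pair and the projector are made up, and the third basis vector is taken as the complex conjugate of the componentwise cross product, which is one way to make it orthogonal to the other two under the Hermitian inner product.

```python
import math

def dagger(m):
    """Conjugate transpose of a matrix stored as nested lists."""
    return [[m[j][i].conjugate() for j in range(len(m))] for i in range(len(m[0]))]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def third_vector(a, b):
    """Unit vector orthogonal to the orthonormal pair a, b: conjugate of a x b."""
    cross = [a[1] * b[2] - a[2] * b[1],
             a[2] * b[0] - a[0] * b[2],
             a[0] * b[1] - a[1] * b[0]]
    return [x.conjugate() for x in cross]

def to_canonical(a_i, b_i, m_hat):
    """Return (U_i, M_i) with U_i as in Equation (55) and M_i as in Equation (56)."""
    c_i = third_vector(a_i, b_i)
    u = [[x.conjugate() for x in v] for v in (a_i, b_i, c_i)]   # rows <A_i|, <B_i|, <C_i|
    return u, matmul(matmul(u, m_hat), dagger(u))               # U^{-1} = U^dagger

s2, s3 = math.sqrt(2), math.sqrt(3)
a_i = [1 / s3, 1j / s3, 1 / s3]          # hypothetical orthonormal pair
b_i = [1 / s2, 0j, -1 / s2]
m_hat = [[1, 0, 0], [0, 0, 0], [0, 0, 0]]  # projector (x, y, z) -> (x, 0, 0)
u, m_i = to_canonical(a_i, b_i, m_hat)

# In the canonical basis, memberships are read off directly from M_i.
mu_a, mu_b = m_i[0][0].real, m_i[1][1].real
mu_ab = 0.5 * (mu_a + mu_b) + m_i[0][1].real
```

Repeating this for each exemplar yields operators that all act on the same pair of states (1, 0, 0) and (0, 1, 0), which is the point of Theorem 3.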

Consider for example the exemplars p = "filing cabinet" and q = "heated waterbed" mentioned in Section 1.3. These can now be represented by the states $|A\rangle = (1, 0, 0)$, $|B\rangle = (0, 1, 0)$ and the following measurement operators

$$\begin{aligned} \mathbf{M}\_{p} &= \begin{pmatrix} 0.97 & -0.11 + 0.09\mathrm{i} & 0.09 + 0.01\mathrm{i} \\ -0.11 - 0.09\mathrm{i} & 0.31 & 0.28 + 0.34\mathrm{i} \\ 0.09 - 0.01\mathrm{i} & 0.28 - 0.34\mathrm{i} & 0.72 \end{pmatrix}, \\ \mathbf{M}\_{q} &= \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0.49 & 0.499 \\ 0 & 0.499 & 0.51 \end{pmatrix}. \end{aligned} \tag{59}$$

From a geometric perspective, the operators $\mathbf{M}\_p$ and $\mathbf{M}\_q$ correspond to rotations of the one-dimensional projector $\mathbf{M}(x, y, z) \to (x, 0, 0)$ in $\mathbb{C}^3$.
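To make the use of these operators concrete, the short Python sketch below reads the membership values back from the matrices of Equation (59) through conditions (15) and (16), with $|A\rangle$ and $|B\rangle$ the first two canonical basis vectors. The function names are hypothetical, and the entries are the rounded values printed above, so the results are only as precise as that rounding.

```python
# Operators of Equation (59), in the basis where |A> = (1, 0, 0) and |B> = (0, 1, 0).
M_p = [[0.97, -0.11 + 0.09j, 0.09 + 0.01j],
       [-0.11 - 0.09j, 0.31, 0.28 + 0.34j],
       [0.09 - 0.01j, 0.28 - 0.34j, 0.72]]
M_q = [[1, 0, 0],
       [0, 0.49, 0.499],
       [0, 0.499, 0.51]]

def memberships(m):
    """(mu(A), mu(B), mu(AB)) from conditions (15) and (16)."""
    mu_a = complex(m[0][0]).real              # <A|M|A>
    mu_b = complex(m[1][1]).real              # <B|M|B>
    mu_ab = 0.5 * (mu_a + mu_b) + complex(m[0][1]).real   # interference Re<A|M|B>
    return mu_a, mu_b, mu_ab

def is_hermitian(m, tol=1e-12):
    """Check that the printed matrix equals its conjugate transpose."""
    return all(abs(m[i][j] - complex(m[j][i]).conjugate()) <= tol
               for i in range(3) for j in range(3))

for name, m in (("p", M_p), ("q", M_q)):
    print(name, is_hermitian(m), memberships(m))
```

Only the upper-left 2×2 block of each operator enters the membership values; the remaining entries are fixed by the requirement that the operator be a projector.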

# 3.2. Data Representation in the Second Sector

We now apply unitary transformations in the concrete representations of the tensor product model in $\mathbb{C}^n \otimes \mathbb{C}^n$. We first define different types of representations for multiple exemplars, and then provide explicit representation theorems for the cases n = 2 and 3.

Definition 4. A zero-type representation of $\{\mu\_i\}\_{i=1}^{k}$ on the tensor product space $\mathbb{C}^n \otimes \mathbb{C}^n$ is a unit vector $|C\rangle \in \mathbb{C}^n \otimes \mathbb{C}^n$, and a collection of orthogonal projectors $\{\mathbf{M}^A\_i, \mathbf{M}^B\_i\}\_{i=1}^{k}$ from $\mathbb{C}^n \otimes \mathbb{C}^n$ to $\mathbb{C}^n \otimes \mathbb{C}^n$, such that conditions (45)–(47) are satisfied with $\mathbf{M}^\wedge\_i = \mathbf{M}^A\_i \mathbf{M}^B\_i$, for i = 1, ..., k. We say $(|C\rangle, \{\mathbf{M}^A\_i, \mathbf{M}^B\_i\}\_{i=1}^{k})$ is a zero-type representation of $\{\mu\_i\}\_{i=1}^{k}$ in $\mathbb{C}^n \otimes \mathbb{C}^n$.

The zero-type representation is, mathematically speaking, the most general representation in the tensor product model that is consistent with the modeling principles of quantum cognition because it assumes a single concept state $|C\rangle$, and a collection of measurements that represent the membership weight estimations. However, this representation cannot be appropriately interpreted because $\mathbf{M}^A\_i$ and $\mathbf{M}^B\_i$ can be entangled measurements, for i = 1, ..., k.

A more reasonable representation of data assumes that $\mathbf{M}^A\_i = \mathbf{M}\_i \otimes \mathbf{1}$, and $\mathbf{M}^B\_i = \mathbf{1} \otimes \mathbf{M}\_i$, for i = 1, ..., k. Therefore, these operators are not entangled because they act on different sides of $\mathbb{C}^3 \otimes \mathbb{C}^3$.

Definition 5. A first-type representation of $\{\mu\_i\}\_{i=1}^{k}$ on the tensor product space $\mathbb{C}^n \otimes \mathbb{C}^n$ is a unit vector $|C\rangle \in \mathbb{C}^n \otimes \mathbb{C}^n$, and a collection of orthogonal projectors $\mathbf{M}\_i$ from $\mathbb{C}^n$ to $\mathbb{C}^n$, for i = 1, ..., k, such that $(|C\rangle, \{\mathbf{M}\_i \otimes \mathbf{1}, \mathbf{1} \otimes \mathbf{M}\_i\}\_{i=1}^{k})$ is a zero-type representation of $\{\mu\_i\}\_{i=1}^{k}$ in $\mathbb{C}^n \otimes \mathbb{C}^n$.

The first-type representation is a direct extension of the representation of individual exemplars in Definition 2, and thus it is interpreted according to such representation: The state $|C\rangle$ describes the situation of having two concepts and their combination, and $\mathbf{M}\_i$ represents the semantic estimation of exemplar $p\_i$, i = 1, ..., k.

The zero- and first-type representations require different conditions to model a collection of exemplars for a pair of concepts and their conjunction. While the first-type corresponds to the natural way to represent a pair of systems in quantum physics, and thus is the natural way to define a representation in the tensor product model for concepts, the zero-type provides a more general way to build concrete representations because it does not impose a product structure on the concept state or on the membership operators for the exemplars.

In fact, from Definitions 4–5 it is trivial to deduce that a first-type representation is also a zero-type representation.

The following theorem characterizes the cases when a set of data has a zero-type representation in $\mathbb{C}^2 \otimes \mathbb{C}^2$.

Theorem 4. The set of data $\{\mu\_i\}\_{i=1}^{k}$ has a zero-type representation in $\mathbb{C}^2 \otimes \mathbb{C}^2$ if and only if $\mu\_i$ is classical conjunction data for i = 1, ..., k.

Proof. For each i = 1, ..., k, we use the construction in the proof of Theorem 2 to obtain a tensor $|\tilde{C}\_i\rangle$ and a one-dimensional projector $\tilde{\mathbf{M}}$ such that $\tilde{\mathbf{M}}^A\_i = \tilde{\mathbf{M}} \otimes \mathbf{1}$, $\tilde{\mathbf{M}}^B\_i = \mathbf{1} \otimes \tilde{\mathbf{M}}$, and $\tilde{\mathbf{M}}^\wedge\_i = \tilde{\mathbf{M}} \otimes \tilde{\mathbf{M}}$. This gives the tensor product representation of $\mu\_i$. Next, we use unitary transformations to change this representation so that $|\tilde{C}\_i\rangle$ is a vector in the canonical basis of $\mathbb{C}^2 \otimes \mathbb{C}^2$. To facilitate the notation, we will make use of the isomorphism $\mathbb{I}$ between $\mathbb{C}^2 \otimes \mathbb{C}^2$ and $\mathbb{C}^4$. Let

$$\begin{aligned} |e\_1\rangle &= (1, 0, 0, 0), \\ |e\_2\rangle &= (0, 1, 0, 0), \\ |e\_3\rangle &= (0, 0, 1, 0), \\ |e\_4\rangle &= (0, 0, 0, 1). \end{aligned} \tag{60}$$

We define

$$\begin{aligned} \mathbb{I}(|1\rangle \otimes |1\rangle) &= |e\_1\rangle, \\ \mathbb{I}(|1\rangle \otimes |2\rangle) &= |e\_2\rangle, \\ \mathbb{I}(|2\rangle \otimes |1\rangle) &= |e\_3\rangle, \\ \mathbb{I}(|2\rangle \otimes |2\rangle) &= |e\_4\rangle. \end{aligned} \tag{61}$$

The isomorphism $\mathbb{I}$ allows us to represent $|\tilde{C}\_i\rangle$ by a vector $|C\_i\rangle$ in $\mathbb{C}^4$.

We can prove the theorem by building a unitary transformation that takes $|C\_i\rangle$ to one of the canonical basis vectors of $\mathbb{C}^4$, and use this transformation to represent the operators $\tilde{\mathbf{M}}^A$, $\tilde{\mathbf{M}}^B$, and $\tilde{\mathbf{M}}^\wedge$ by the operators $\bar{\mathbf{M}}^A$, $\bar{\mathbf{M}}^B$, and $\bar{\mathbf{M}}^\wedge$ in $\mathbb{C}^4$. Next, we apply the inverse isomorphism $\mathbb{I}^{-1}$ to map these new representations to $\mathbb{C}^2 \otimes \mathbb{C}^2$.

Let $|D\_i\rangle$, $|E\_i\rangle$, $|F\_i\rangle$ be three vectors in $\mathbb{C}^4$ such that

$$\begin{aligned} \langle D\_i | D\_i \rangle &= \langle E\_i | E\_i \rangle = \langle F\_i | F\_i \rangle = 1, \\ \langle C\_i | D\_i \rangle &= \langle C\_i | E\_i \rangle = \langle C\_i | F\_i \rangle = 0, \\ \langle D\_i | E\_i \rangle &= \langle D\_i | F\_i \rangle = \langle E\_i | F\_i \rangle = 0. \end{aligned} \tag{62}$$

The vectors $|C\_i\rangle$, $|D\_i\rangle$, $|E\_i\rangle$, and $|F\_i\rangle$ form an orthonormal basis for $\mathbb{C}^4$.

Let

$$\mathbf{U}\_{i} = \begin{pmatrix} \langle C\_{i}|e\_{1}\rangle & \langle C\_{i}|e\_{2}\rangle & \langle C\_{i}|e\_{3}\rangle & \langle C\_{i}|e\_{4}\rangle\\ \langle D\_{i}|e\_{1}\rangle & \langle D\_{i}|e\_{2}\rangle & \langle D\_{i}|e\_{3}\rangle & \langle D\_{i}|e\_{4}\rangle\\ \langle E\_{i}|e\_{1}\rangle & \langle E\_{i}|e\_{2}\rangle & \langle E\_{i}|e\_{3}\rangle & \langle E\_{i}|e\_{4}\rangle\\ \langle F\_{i}|e\_{1}\rangle & \langle F\_{i}|e\_{2}\rangle & \langle F\_{i}|e\_{3}\rangle & \langle F\_{i}|e\_{4}\rangle \end{pmatrix}. \tag{63}$$

Note that $\mathbf{U}\_i$ is a unitary matrix whose action induces a change from the basis $\{|C\_i\rangle, |D\_i\rangle, |E\_i\rangle, |F\_i\rangle\}$ to the basis $\{|e\_j\rangle\}\_{j=1}^{4}$. In fact,

$$
\mathbf{U}\_i |C\_i\rangle = |e\_1\rangle, \ \mathbf{U}\_i |D\_i\rangle = |e\_2\rangle, \ \mathbf{U}\_i |E\_i\rangle = |e\_3\rangle, \text{ and } \mathbf{U}\_i |F\_i\rangle = |e\_4\rangle.
$$

The operator $\mathbf{U}\_i$ can now be used to change the basis in which $\tilde{\mathbf{M}}^A\_i$, $\tilde{\mathbf{M}}^B\_i$, and $\tilde{\mathbf{M}}^\wedge\_i$ (identified with operators on $\mathbb{C}^4$ through $\mathbb{I}$) are represented, to the basis $\{|e\_j\rangle\}\_{j=1}^{4}$:

$$\begin{aligned} \bar{\mathbf{M}}\_i^A &= \mathbf{U}\_i \tilde{\mathbf{M}}\_i^A \mathbf{U}\_i^{-1}, \\ \bar{\mathbf{M}}\_i^B &= \mathbf{U}\_i \tilde{\mathbf{M}}\_i^B \mathbf{U}\_i^{-1}, \\ \bar{\mathbf{M}}\_i^\wedge &= \mathbf{U}\_i \tilde{\mathbf{M}}\_i^\wedge \mathbf{U}\_i^{-1}. \end{aligned} \tag{64}$$

Since $\mathbf{1} = \mathbf{U}\_i^{-1} \mathbf{U}\_i = \mathbf{U}\_i \mathbf{U}\_i^{-1}$, we obtain

$$\begin{aligned} \mu\_i(A) &= \langle C\_i | \tilde{\mathbf{M}}\_i^A | C\_i \rangle = \langle C\_i | \mathbf{U}\_i^{-1} \mathbf{U}\_i \tilde{\mathbf{M}}\_i^A \mathbf{U}\_i^{-1} \mathbf{U}\_i | C\_i \rangle = \langle e\_1 | \bar{\mathbf{M}}\_i^A | e\_1 \rangle, \\ \mu\_i(B) &= \langle C\_i | \tilde{\mathbf{M}}\_i^B | C\_i \rangle = \langle C\_i | \mathbf{U}\_i^{-1} \mathbf{U}\_i \tilde{\mathbf{M}}\_i^B \mathbf{U}\_i^{-1} \mathbf{U}\_i | C\_i \rangle = \langle e\_1 | \bar{\mathbf{M}}\_i^B | e\_1 \rangle, \\ \mu\_i(AB) &= \langle C\_i | \tilde{\mathbf{M}}\_i^\wedge | C\_i \rangle = \langle C\_i | \mathbf{U}\_i^{-1} \mathbf{U}\_i \tilde{\mathbf{M}}\_i^\wedge \mathbf{U}\_i^{-1} \mathbf{U}\_i | C\_i \rangle = \langle e\_1 | \bar{\mathbf{M}}\_i^\wedge | e\_1 \rangle. \end{aligned} \tag{65}$$

We then use the inverse isomorphism $\mathbb{I}^{-1}$ to obtain a zero-type representation in $\mathbb{C}^2 \otimes \mathbb{C}^2$:

$$\begin{aligned} |C\rangle &= \mathbb{I}^{-1} (|e\_1\rangle) = |1\rangle \otimes |1\rangle, \\ \mathbf{M}\_i^A &= \mathbb{I}^{-1} \bar{\mathbf{M}}\_i^A \mathbb{I}, \\ \mathbf{M}\_i^B &= \mathbb{I}^{-1} \bar{\mathbf{M}}\_i^B \mathbb{I}, \\ \mathbf{M}\_i^\wedge &= \mathbb{I}^{-1} \bar{\mathbf{M}}\_i^\wedge \mathbb{I}. \end{aligned} \tag{66}$$

We have constructed a zero-type representation $(|1\rangle \otimes |1\rangle, \{\mathbf{M}^A\_i, \mathbf{M}^B\_i\}\_{i=1}^{k})$ from a collection of representations $(|C\_i\rangle, \mathbf{M})$ for the exemplars $p\_i$ with $\mathbf{M}(x, y) \to (x, 0)$ obtained from Theorem 2.
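The steps of this construction can be sketched numerically in $\mathbb{C}^4$. The Python code below is a minimal illustration under stated assumptions: the function names and the two sample triplets are hypothetical, the orthonormal completion of $|C\_i\rangle$ is obtained by Gram-Schmidt on the canonical basis (any other completion would serve), and the operators from Theorem 2 are handled through their diagonals in the basis of Equation (61).

```python
import math

def dagger(m):
    """Conjugate transpose of a square matrix stored as nested lists."""
    n = len(m)
    return [[complex(m[j][i]).conjugate() for j in range(n)] for i in range(n)]

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def inner(u, v):
    return sum(complex(x).conjugate() * y for x, y in zip(u, v))

def unitary_to_e1(c):
    """Matrix with rows <C|, <D|, <E|, <F| as in Equation (63)."""
    n = len(c)
    basis = [list(c)]
    for k in range(n):
        v = [1.0 if j == k else 0.0 for j in range(n)]   # candidate basis vector e_k
        for b in basis:
            coeff = inner(b, v)
            v = [x - coeff * y for x, y in zip(v, b)]
        norm = math.sqrt(sum(abs(x) ** 2 for x in v))
        if norm > 1e-9 and len(basis) < n:
            basis.append([x / norm for x in v])
    return [[complex(x).conjugate() for x in b] for b in basis]

def rotate(diagonal, u):
    """Equation (64) for a diagonal operator: U diag(d) U^{-1}, with U^{-1} = U^dagger."""
    n = len(diagonal)
    d = [[diagonal[i] if i == j else 0 for j in range(n)] for i in range(n)]
    return matmul(matmul(u, d), dagger(u))

# Diagonals of M (x) 1, 1 (x) M and M (x) M in the basis e1..e4 of Equation (61).
M_A, M_B, M_AND = [1, 1, 0, 0], [1, 0, 1, 0], [1, 0, 0, 0]

# Hypothetical classical triplets (mu(A), mu(B), mu(AB)) for two exemplars.
data = [(0.5, 0.4, 0.3), (0.8, 0.6, 0.5)]
recovered = []
for mu_a, mu_b, mu_ab in data:
    c = [math.sqrt(w) for w in (mu_ab, mu_a - mu_ab, mu_b - mu_ab, 1 - mu_a - mu_b + mu_ab)]
    u = unitary_to_e1(c)
    # Memberships are now expectation values in the shared state e1, Equation (65).
    recovered.append(tuple(rotate(d, u)[0][0].real for d in (M_A, M_B, M_AND)))
```

After the rotation, both exemplars are described by the same state and differ only in their operators, which is what a zero-type representation requires.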

In the construction of Theorem 4, note that when Equation (66) yields operators $\mathbf{M}^A\_i$ and $\mathbf{M}^B\_i$ of the form $\mathbf{M}^A\_i = \check{\mathbf{M}}\_i \otimes \mathbf{1}$ and $\mathbf{M}^B\_i = \mathbf{1} \otimes \check{\mathbf{M}}\_i$, the representation is also of the first-type.

Stating the necessary and sufficient conditions required for a set of data to have a first-type representation is beyond the scope of this paper. However, we now introduce another type of representation that is mathematically simpler, and can be used to obtain sufficient conditions for a first-type representation.

Definition 6. A second-type representation of $\{\mu\_i\}\_{i=1}^{k}$ on the tensor product space $\mathbb{C}^n \otimes \mathbb{C}^n$ is a pair of unit vectors $|A\rangle, |B\rangle \in \mathbb{C}^n$, and a collection of orthogonal projectors $\mathbf{M}\_i$ from $\mathbb{C}^n$ to $\mathbb{C}^n$, for i = 1, ..., k, such that $(|A\rangle \otimes |B\rangle, \{\mathbf{M}\_i \otimes \mathbf{1}, \mathbf{1} \otimes \mathbf{M}\_i\}\_{i=1}^{k})$ is a zero-type representation of $\{\mu\_i\}\_{i=1}^{k}$ in $\mathbb{C}^n \otimes \mathbb{C}^n$.

The second-type is a mathematical simplification of the first-type representation that assumes $|C\rangle$ to be a product state.

Lemma 1. The set of data $\{\mu\_i\}\_{i=1}^{k}$ has a second-type representation in $\mathbb{C}^2 \otimes \mathbb{C}^2$ if and only if for each i = 1, ..., k there exist $|A\_i\rangle$, $|B\_i\rangle$, $\check{\mathbf{M}}^A\_i$, and $\check{\mathbf{M}}^B\_i$ such that Equations (45–47) are satisfied.

Proof. Let $\mathbf{U}\_i(A), \mathbf{U}\_i(B) : \mathbb{C}^2 \to \mathbb{C}^2$ be the unitary transformations that map $|A\_i\rangle$ to $|1\rangle$ and $|B\_i\rangle$ to $|1\rangle$, respectively, for i = 1, ..., k. Then, it is straightforward to show that $(|1\rangle \otimes |1\rangle, \{\mathbf{M}^A\_i \otimes \mathbf{1}, \mathbf{1} \otimes \mathbf{M}^B\_i\}\_{i=1}^{k})$ is a second-type representation of $\{\mu\_i\}\_{i=1}^{k}$ with

$$\begin{aligned} \mathbf{M}\_i^A &= \mathbf{U}\_i(A) \check{\mathbf{M}}\_i^A \mathbf{U}\_i(A)^{-1}, \\ \mathbf{M}\_i^B &= \mathbf{U}\_i(B) \check{\mathbf{M}}\_i^B \mathbf{U}\_i(B)^{-1}. \end{aligned} \tag{67}$$

Theorem 4 and Lemma 1 characterize the sets of data that have zero-type and second-type representations, respectively. Since the first-type representation is less general than the zero-type representation, but more general than the second-type representation, these results can be applied to obtain upper and lower bounds for the number of exemplars that have a first-type representation in a given set of data.
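One possible way to compute such bounds is sketched below in Python. It rests on two assumptions that should be read as ours rather than as statements of the text: the classicality test uses the three inequalities appearing in the proof of Theorem 2, and the second-type test uses the fact that a product state measured with product operators gives µ(AB) = µ(A)µ(B). The function names, the tolerance, and the data are hypothetical.

```python
def is_classical(mu_a, mu_b, mu_ab, tol=1e-9):
    """Inequalities from the proof of Theorem 2 (zero-type criterion of Theorem 4)."""
    return (mu_ab <= mu_a + tol and mu_ab <= mu_b + tol
            and mu_a + mu_b - mu_ab <= 1 + tol)

def is_product(mu_a, mu_b, mu_ab, tol=1e-9):
    """Assumed second-type criterion: a product state factorizes the conjunction."""
    return abs(mu_ab - mu_a * mu_b) <= tol

def first_type_bounds(data, tol=1e-9):
    """Return (lower, upper) bounds on the number of first-type exemplars."""
    lower = sum(1 for d in data if is_product(*d, tol=tol))
    upper = sum(1 for d in data if is_classical(*d, tol=tol))
    return lower, upper

# Hypothetical triplets (mu(A), mu(B), mu(AB)), one per exemplar.
data = [(0.5, 0.4, 0.2), (0.5, 0.4, 0.3), (0.6, 0.7, 0.75), (1.0, 0.49, 0.49)]
lower, upper = first_type_bounds(data)
```

Every product triplet is also classical, so the lower bound never exceeds the upper bound. With experimental data, the exact product condition is rarely met, and the tolerance would need to reflect measurement error.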

Note that Theorem 4 is built in $\mathbb{C}^2 \otimes \mathbb{C}^2$. We now extend our results to $\mathbb{C}^3 \otimes \mathbb{C}^3$ so they become compatible with the representation analysis developed in Section 3.1 for a Hilbert space model in $\mathbb{C}^3$. The next corollary extends the proof of Theorem 4 to the space $\mathbb{C}^3 \otimes \mathbb{C}^3$.

Corollary 2. If the set of data $\{\mu\_i\}\_{i=1}^{k}$ has a zero-type representation in $\mathbb{C}^2 \otimes \mathbb{C}^2$, then $\{\mu\_i\}\_{i=1}^{k}$ has a zero-type representation in $\mathbb{C}^3 \otimes \mathbb{C}^3$.

Proof. Let $(|C\rangle, \{\mathbf{M}^A\_i, \mathbf{M}^B\_i\}\_{i=1}^{k})$ be a zero-type representation of $\{\mu\_i\}\_{i=1}^{k}$ in $\mathbb{C}^2 \otimes \mathbb{C}^2$. We can create a vector

$$|C^\*\rangle = \sum\_{i,j=1}^3 c\_{ij}^\* |i\rangle \otimes |j\rangle \tag{68}$$

such that it is the trivial embedding of

$$|C\rangle = \sum\_{i,j=1}^{2} c\_{ij} |i\rangle \otimes |j\rangle \tag{69}$$

in $\mathbb{C}^3 \otimes \mathbb{C}^3$ by choosing

$$c\_{ij}^\* = \begin{cases} c\_{ij} & i, j \in \{1, 2\} \\ 0 & \text{else.} \end{cases} \tag{70}$$

Similarly, we can also create operators $\mathbf{M}^{A\*}\_i$ and $\mathbf{M}^{B\*}\_i$ by using the trivial embedding in such a way that the actions of the operators $\mathbf{M}^A\_i$ and $\mathbf{M}^B\_i$ on $\mathbb{C}^2 \otimes \mathbb{C}^2$ are preserved. This completes the proof.
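The trivial embedding used in this proof can be written out explicitly. In the Python sketch below, which is only an illustration with hypothetical function names and made-up coefficients, the state is padded with zeros as in Equation (70), and a 4×4 operator on $\mathbb{C}^2 \otimes \mathbb{C}^2$ is copied into a 9×9 operator on $\mathbb{C}^3 \otimes \mathbb{C}^3$ by relabeling the basis vector $|i\rangle \otimes |j\rangle$; all matrix elements involving the added basis vector $|3\rangle$ are set to zero.

```python
import math

def embed_state(c2):
    """Equation (70): pad the 2x2 coefficient array c_ij with zeros to 3x3."""
    return [[c2[i][j] if i < 2 and j < 2 else 0.0 for j in range(3)] for i in range(3)]

def embed_operator(m4):
    """Embed a 4x4 operator on C^2 (x) C^2 into a 9x9 operator on C^3 (x) C^3.

    With 0-based labels, |i> (x) |j> has index 2*i + j before and 3*i + j after.
    """
    m9 = [[0.0] * 9 for _ in range(9)]
    for i in range(2):
        for j in range(2):
            for k in range(2):
                for l in range(2):
                    m9[3 * i + j][3 * k + l] = m4[2 * i + j][2 * k + l]
    return m9

def expectation(c, m, n):
    """<C|M|C> for coefficients c[i][j] on C^n (x) C^n and an n^2 x n^2 matrix m."""
    flat = [c[i][j] for i in range(n) for j in range(n)]
    return sum(complex(flat[r]).conjugate() * m[r][s] * flat[s]
               for r in range(n * n) for s in range(n * n)).real

# Hypothetical state and the operator M (x) 1 with M the projector onto |1>.
c = [[math.sqrt(0.3), math.sqrt(0.2)], [math.sqrt(0.1), math.sqrt(0.4)]]
M_A = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
before = expectation(c, M_A, 2)
after = expectation(embed_state(c), embed_operator(M_A), 3)
```

The embedded operator is again an orthogonal projector, and the two expectation values coincide, so the membership values carry over unchanged.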

Since second-type representations are also first- and zero-type representations, we can apply Corollary 2 to obtain a first- and second-type representation in $\mathbb{C}^3 \otimes \mathbb{C}^3$.

# 4. A CONJECTURE ABOUT COMPATIBILITY OF EXEMPLARS

In quantum theory, measurement operators can be incompatible. That is, when we consider two different observables, the result of their sequential application can depend on the order in which they are applied. The fact that quantum measurements can be incompatible is related to fundamental differences between the quantum and classical realms, such as the observer phenomena, and the Heisenberg uncertainty principle (Heisenberg, 1927; Isham, 2001).

Definition 7. Let **M**<sub>1</sub> and **M**<sub>2</sub> be two operators represented in the same basis. We say that **M**<sub>1</sub> and **M**<sub>2</sub> represent compatible observables if and only if their commutator vanishes:

$$[\mathbf{M}\_1, \mathbf{M}\_2] = \mathbf{M}\_1 \mathbf{M}\_2 - \mathbf{M}\_2 \mathbf{M}\_1 = 0. \tag{71}$$

Otherwise, the operators represent incompatible observables.
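As a minimal sketch of Definition 7, the following Python code computes the commutator of two operators given as nested lists and tests whether it vanishes. The projectors used here are hypothetical two-dimensional examples chosen only to illustrate the definition; the helper names are our own.

```python
def matmul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def commutator(a, b):
    """Return [a, b] = ab - ba (cf. Eq. 71)."""
    ab, ba = matmul(a, b), matmul(b, a)
    n = len(a)
    return [[ab[i][j] - ba[i][j] for j in range(n)] for i in range(n)]

def are_compatible(a, b, tol=1e-12):
    """True when every entry of the commutator is numerically zero."""
    return all(abs(x) < tol for row in commutator(a, b) for x in row)

def projector(v):
    """Rank-one orthogonal projector onto the ray spanned by v."""
    norm_sq = sum(abs(x) ** 2 for x in v)
    return [[x * complex(y).conjugate() / norm_sq for y in v] for x in v]

# Hypothetical projectors in C^2, chosen only to illustrate the definition.
p_horizontal = projector([1, 0])
p_vertical = projector([0, 1])
p_diagonal = projector([1, 1])

print(are_compatible(p_horizontal, p_vertical))  # orthogonal rays
print(are_compatible(p_horizontal, p_diagonal))  # oblique rays
```

Projectors onto orthogonal rays share an eigenbasis and commute, whereas projectors onto oblique rays do not.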

In terms of cognitive phenomena, sequential measurements could be interpreted as consecutive cognitive actions where the previous action serves as a context for the next action (Busemeyer and Wang, 2007; Wang and Busemeyer, 2013). Since in our concrete representations membership operators are represented in the same basis for all exemplars p<sub>i</sub>, i = 1, ..., k, it is now possible to test whether or not these measurement operators commute. If we find exemplars whose operators are non-commutative, then we can conjecture the existence of a fundamental limit to the precision with which the membership of these exemplars can be known simultaneously.

Note that we would expect that classical probabilistic models should be compatible, and because the classical probabilistic model and the tensor product model are equivalent, tensor product operators obtained from the data should also be compatible for the vector representing the conceptual situation. However, Hilbert space models could exhibit incompatible measurements for certain data on concept combination, as the Hilbert space model represents non-classical measurements.

We introduce the following definitions to characterize the compatibility of exemplars in ℂ<sup>3</sup> and in ℂ<sup>3</sup> ⊗ ℂ<sup>3</sup>:

Definition 8. Let |A⟩ = (1, 0, 0), |B⟩ = (0, 1, 0), and let {**M**<sub>1</sub>, **M**<sub>2</sub>} be a representation in ℂ<sup>3</sup> of {(µ<sub>i</sub>(A), µ<sub>i</sub>(B), µ<sub>i</sub>(AB))}<sub>i=1</sub><sup>2</sup>, and set

$$\begin{aligned} c\_A &= \langle A | [\mathbf{M}\_1, \mathbf{M}\_2] | A \rangle, \\ c\_B &= \langle B | [\mathbf{M}\_1, \mathbf{M}\_2] | B \rangle, \\ c\_{AB} &= \frac{1}{2} (\langle A | + \langle B |)[\mathbf{M}\_1, \mathbf{M}\_2](|A\rangle + |B\rangle). \end{aligned} \tag{72}$$

We say p<sub>1</sub> and p<sub>2</sub> are compatible with respect to the concepts A, B, and AB if and only if c<sub>A</sub> = 0, c<sub>B</sub> = 0, and c<sub>AB</sub> = 0, respectively.

For simplicity, we will study compatibility for zero-type representations in ℂ<sup>3</sup> ⊗ ℂ<sup>3</sup>.
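The quantities of Definition 8 can be computed directly once two operators are given. The Python sketch below does this for two hypothetical rank-one projectors in ℂ<sup>3</sup>; they are not the operators obtained from the data, and the helper names are assumptions made for illustration. Because the commutator of two Hermitian operators is anti-Hermitian, each quantity is either zero or purely imaginary.

```python
def matmul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def commutator(a, b):
    """Return [a, b] = ab - ba."""
    ab, ba = matmul(a, b), matmul(b, a)
    n = len(a)
    return [[ab[i][j] - ba[i][j] for j in range(n)] for i in range(n)]

def projector(v):
    """Rank-one orthogonal projector onto the ray spanned by v."""
    norm_sq = sum(abs(x) ** 2 for x in v)
    return [[x * complex(y).conjugate() / norm_sq for y in v] for x in v]

def bracket(bra, op, ket):
    """Return <bra| op |ket> for vectors given as lists."""
    n = len(ket)
    return sum(complex(bra[i]).conjugate() * op[i][j] * ket[j]
               for i in range(n) for j in range(n))

def compatibility_indicators(m1, m2):
    """Return (c_A, c_B, c_AB) following Eq. 72."""
    k = commutator(m1, m2)
    ket_a, ket_b = [1, 0, 0], [0, 1, 0]
    ket_sum = [1, 1, 0]  # |A> + |B>
    return (bracket(ket_a, k, ket_a),
            bracket(ket_b, k, ket_b),
            0.5 * bracket(ket_sum, k, ket_sum))

# Hypothetical membership operators (illustration only).
m1 = projector([1, 1, 1])
m2 = projector([1, 1j, 0])
c_a, c_b, c_ab = compatibility_indicators(m1, m2)
print(c_a, c_b, c_ab)
```

In this hypothetical case c<sub>A</sub> and c<sub>B</sub> are nonzero while c<sub>AB</sub> vanishes, which shows why compatibility is stated separately for A, B, and AB.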

Frontiers in Psychology | www.frontiersin.org November 2015 | Volume 6 | Article 1734 |

Definition 9. Let |C⟩ = (1, 0, 0) ⊗ (1, 0, 0), and let {**M**<sup>A</sup><sub>1</sub>, **M**<sup>B</sup><sub>1</sub>, **M**<sup>∧</sup><sub>1</sub>} and {**M**<sup>A</sup><sub>2</sub>, **M**<sup>B</sup><sub>2</sub>, **M**<sup>∧</sup><sub>2</sub>} be a zero-type representation in ℂ<sup>3</sup> ⊗ ℂ<sup>3</sup> of the data {(µ<sub>i</sub>(A), µ<sub>i</sub>(B), µ<sub>i</sub>(AB))}<sub>i=1</sub><sup>2</sup>, and set

$$\begin{aligned} c'\_A &= \langle C | [\mathbf{M}\_1^A, \mathbf{M}\_2^A] | C \rangle, \\ c'\_B &= \langle C | [\mathbf{M}\_1^B, \mathbf{M}\_2^B] | C \rangle, \\ c'\_{AB} &= \langle C | [\mathbf{M}\_1^\wedge, \mathbf{M}\_2^\wedge] | C \rangle. \end{aligned} \tag{73}$$

We say p<sub>1</sub> and p<sub>2</sub> are compatible with respect to concepts A, B, and AB if and only if c′<sub>A</sub> = 0, c′<sub>B</sub> = 0, and c′<sub>AB</sub> = 0, respectively.

We have verified the compatibility of exemplars for each conceptual combination that can be modeled in ℂ<sup>3</sup> and in ℂ<sup>3</sup> ⊗ ℂ<sup>3</sup> using the data in Hampton (1988a,b). The results support our predictions. We have found that the tensor product model always leads to compatible measurements, and that the Hilbert space model leads to incompatible measurements in most cases.

For example, consider the concepts A = "Machine" and B = "Vehicle," and the exemplars p<sub>5</sub> = "sailboat" and p<sub>12</sub> = "skateboard." For the case of conceptual conjunction, we have

$$
\begin{aligned}
\mu\_5(A) &= 0.56, \,\mu\_5(B) = 0.8, \,\mu\_5(AB) = 0.42, \,\text{and} \\
\mu\_{12}(A) &= 0.28, \,\mu\_{12}(B) = 0.84, \,\mu\_{12}(AB) = 0.34.
\end{aligned}
\tag{74}
$$

Note that exemplar p<sub>5</sub> satisfies the conditions of Theorems 1 and 2. Thus, it can be represented in both ℂ<sup>3</sup> and ℂ<sup>3</sup> ⊗ ℂ<sup>3</sup>. However, the exemplar p<sub>12</sub> is singly overextended. Therefore, we can only represent the two exemplars simultaneously in ℂ<sup>3</sup>.
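The overextension status of the two exemplars can be read off the membership weights in Eq. 74. The short Python sketch below assumes the usual meaning of the term, namely that the conjunction weight µ(AB) exceeds exactly one of µ(A) and µ(B) for a singly overextended exemplar; the function names are our own, and the sketch does not restate the conditions of Theorems 1 and 2.

```python
def overextension_count(mu_a, mu_b, mu_ab):
    """Number of constituent concepts whose membership the conjunction exceeds."""
    return int(mu_ab > mu_a) + int(mu_ab > mu_b)

def describe(mu_a, mu_b, mu_ab):
    """Label an exemplar by how many constituents its conjunction exceeds."""
    labels = {0: "not overextended",
              1: "singly overextended",
              2: "doubly overextended"}
    return labels[overextension_count(mu_a, mu_b, mu_ab)]

# Membership weights (mu(A), mu(B), mu(AB)) from Eq. 74.
sailboat = (0.56, 0.80, 0.42)     # p5
skateboard = (0.28, 0.84, 0.34)   # p12

print("p5:", describe(*sailboat))
print("p12:", describe(*skateboard))
```

For "skateboard," the conjunction weight 0.34 exceeds µ(A) = 0.28 but not µ(B) = 0.84, which is the single overextension noted above.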

Applying Theorems 1 and 3 and Definition 8 to these data, we obtain

$$c\_A = 0.084i, \ c\_B = 0.097i, \text{ and } c\_{AB} = 0.137i. \tag{75}$$

Thus, exemplars p<sub>5</sub> and p<sub>12</sub> are incompatible. Moreover, note that the incompatibility is larger for the conjunction of the concepts than for either concept alone.

As a second example, consider the concepts A = "Building" and B = "Dwelling," and the exemplars p<sub>2</sub> = "cave" and p<sub>10</sub> = "synagogue," whose memberships are given by

$$\begin{aligned} \mu\_2(A) &= 0.28, \,\mu\_2(B) = 0.85, \,\mu\_2(AB) = 0.28, \,\text{and} \\ \mu\_{10}(A) &= 0.93, \,\mu\_{10}(B) = 0.49, \,\mu\_{10}(AB) = 0.45. \end{aligned} \tag{76}$$

Both exemplars satisfy the conditions of Theorem 2. Applying Theorems 2 and 4, and Definition 9, we obtain

$$c'\_A = c'\_B = c'\_{AB} = 0.$$

This is consistent with our expectations because the representation in the second sector ℂ<sup>3</sup> ⊗ ℂ<sup>3</sup> corresponds to classical (and thus compatible) measurements.

Our data were collected by presenting the exemplars in only one specific order (Hampton, 1988b). These computations nevertheless suggest that order effects could be predicted by identifying the exemplars that are incompatible. The results presented here are, however, speculative, since there are no experimental data with recorded order effects against which our computations could be checked. While our data set does not allow us to make a strong claim, we conjecture that order effects are predictable, and suggest that the concrete representations proposed in this paper could be used to develop Heisenberg-like uncertainty relations in the context of conceptual combinations.

# 5. CONCLUSION AND FUTURE WORK

In this paper, we have made some advances on the representational aspects of the quantum model for concept combinations. First, we proved that the first and second sectors of the two-sector Fock space model of concept conjunctions can be concretely represented in ℂ<sup>3</sup> and ℂ<sup>3</sup> ⊗ ℂ<sup>3</sup>, respectively. Next, we introduced unitary transformations to provide concrete representations that are consistent with the cognitive principles of the quantum model of concepts, and used these concrete representations to study the question of measurement compatibility.

The representations introduced here could be an important tool for future applications. First, since they are consistent with the cognitive principles of the quantum model of concepts, the model could easily be introduced to a wider audience, and extended to produce concrete representations in the two-sector Fock space model. Second, they can be adopted as a representational standard for different groups who seek to develop their own computational implementations of the model. Third, the fact that all the measurements are represented in a single basis constitutes a tremendous mathematical advantage for studying the probabilistic structure of concepts.

The evidence obtained in the application of our representations to the issue of exemplar compatibility is consistent with the assumptions of the model. Since the second sector entails logical reasoning, measurements in the tensor product model should be compatible. However, incompatible measurements are likely to be found in the Hilbert space model, since the first sector is associated with non-logical or intuitive reasoning. Moreover, this line of enquiry invites us to explore possible relations between the projector operator structure and the meaning of the exemplar.

In summary, the introduction of unitary transformations and the subsequent application to develop concrete representations of concepts and their combinations seems to be a promising line of research that has the potential to expand both theoretical and applied research in quantum cognition.

# FUNDING

This research has been funded by an internal grant from the I.K. Barber School of Arts and Sciences at UBC Okanagan.

# ACKNOWLEDGMENTS

We would like to acknowledge Diederik Aerts and Sandro Sozzo for discussions and insights that contributed to the development of this work.

# SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fpsyg.2015.01734


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Veloz and Desjardins. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# **Reintroducing the Concept of Complementarity into Psychology**

#### *Zheng Wang <sup>1</sup> \* and Jerome Busemeyer <sup>2</sup>*

*<sup>1</sup> School of Communication, The Ohio State University, Columbus, OH, USA, <sup>2</sup> Department of Psychological and Brain Sciences, Indiana University, Bloomington, IN, USA*

Central to quantum theory is the concept of complementarity. In this essay, we argue that complementarity is also central to the emerging field of quantum cognition. We review the concept, its historical roots in psychology, and its development in quantum physics and offer examples of how it can be used to understand human cognition. The concept of complementarity provides a valuable and fresh perspective for organizing human cognitive phenomena and for understanding the nature of measurements in psychology. In turn, psychology can provide valuable new evidence and theoretical ideas to enrich this important scientific concept.

**Keywords: quantum cognition, quantum probability, complementarity, commutativity, compatibility, Niels Bohr, William James, order effects**

#### *Edited by:*

*Sandro Sozzo, University of Leicester, UK*

#### *Reviewed by:*

*Patrizio E. Tressoldi, Università di Padova, Italy; Arkady Plotnitsky, Purdue University, USA*

#### *\*Correspondence:*

*Zheng Wang, wang.1243@osu.edu*

#### *Specialty section:*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology*

*Received: 13 August 2015; Accepted: 11 November 2015; Published: 27 November 2015*

#### *Citation:*

*Wang Z and Busemeyer J (2015) Reintroducing the Concept of Complementarity into Psychology. Front. Psychol. 6:1822. doi: 10.3389/fpsyg.2015.01822*

# **INTRODUCTION**

Central to quantum theory is the concept of complementarity. This essay argues that complementarity is also central to the emerging field of quantum cognition (e.g., Aerts and Aerts, 1994; Khrennikov, 1999; Pothos and Busemeyer, 2013; Wang et al., 2013; Bruza et al., 2015; Busemeyer and Wang, 2015), which applies abstract, mathematical principles of quantum theory to shed light on cognitive structures and processes. The concept of complementarity provides a valuable and fresh perspective for organizing human cognitive phenomena and for understanding the nature of measurements in psychology. In turn, psychology can provide useful new evidence and theoretical ideas to enrich this important scientific concept.

# **COMPLEMENTARITY, COMMUTATIVITY, AND COMPATIBILITY**

The general concept of complementarity was developed by Niels Bohr in a series of debates with Einstein, but the main idea can be summarized as follows (Plotnitsky, 2014, p. 5): Different measurement conditions for observing different phenomena are complementary when


An important consequence of complementarity is that the sequence or order of the measurements matters (von Neumann, 1932, 1962; Atmanspacher and Römer, 2012; Wang et al., 2014). The above definition of complementarity is deliberately general so that it can permit many specific implementations. Below, we provide a way to implement this idea in psychology.

The essential idea of complementarity can be illustrated using the following example involving the measurements of attitudes toward politicians. In a 1997 poll in the United States, half of the 1,002 nationally sampled respondents were asked, "Do you generally think Clinton is honest?" Then they were asked the same question about Gore. The other half answered the same questions in the opposite order. The results exhibited a striking order effect: The proportion saying "yes" to both questions was significantly higher when Gore was judged first (Moore, 2002).

In this example, the phenomena of interest concern a survey respondent's beliefs about the honesty of different politicians. Complementarity arises when a person cannot have a well-defined position on each politician simultaneously. We can obtain a measurement of honesty concerning Clinton or concerning Gore, but we cannot measure both simultaneously, and the order in which we measure them affects the answers. Once we obtain a measurement on, say, Clinton, that decision can create a definite position for Clinton, but then the opinion regarding Gore must be uncertain. However, both measurements are needed to obtain a complete understanding of a respondent's attitude to the two politicians being considered. Therefore, these measurements satisfy the general requirements for complementarity.

This example captures another idea relevant to complementarity. The phenomena that we observe are products of an interaction between some object of investigation and our measurement instruments. The measurement does not simply record a phenomenon, but it creates one. This idea is consistent with the constructionist view of beliefs, attitudes, and intentions proposed by many psychologists (e.g., Feldman and Lynch, 1988; Schwarz, 2007). From this view, because of limited mental capacity and cognitive economy, beliefs, attitudes, and intentions do not exist in memory as properties ready to be recorded; instead, they are constructed when needed. When one is asked a subsequent question, information carried over from the preceding question provides a context for the construction of the second and influences the subsequent response.

Next, we will explain complementarity more specifically by providing a simple "toy" quantum model for this example. To do this, we need to first compare some concepts from classical and quantum probability theories (see Busemeyer and Bruza, 2012, for more detail).

Classical probability theory is concerned with the assignment of probabilities to events. Suppose, for example, we ask a survey participant to evaluate various politicians with regard to their honesty. For example, an event, A, might be that politician X is evaluated as honest. According to classical probability theory, events are represented as subsets of a universal set<sup>1</sup>. For example, the event that a politician is honest is a subset of the universe of all the features that a politician might have. Another event, B, might be that politician Y is evaluated as dishonest. The conjunction of two events is defined by set intersection (in this case, A and B). As shown in **Figure 1**, the combined event "A and B" is the same as the combined event "B and A," and therefore the order of the two events does not matter. Formally, we say that the intersection event is *commutative*, and the probability assigned to "A and B" must equal the probability assigned to "B and A."

Quantum probability theory is also concerned with the assignment of probabilities to events. However, according to quantum theory, events are represented as subspaces of a universal vector space<sup>2</sup>. If events are defined as subspaces, then the conjunction of two events may or may not exist. The conjunction does not exist if the events are non-commutative so that the order of evaluating them matters. Events that are commutative are also called *compatible*, and events that are non-commutative are called *incompatible* (Atmanspacher and Römer, 2012). Classical probability theory essentially assumes that all events are compatible, but quantum probability theory allows some events to be incompatible.

**Figure 2** illustrates how the projective geometry used by quantum probability theory naturally accounts for order effects. The "yes" answer to the "Do you generally think Clinton is honest?" question is represented by the horizontal ray (which forms one axis from the blue basis), and the "yes" answer to the "Do you generally think Gore is honest?" question is represented by an oblique ray (which forms one axis from the red basis). These two answers are *incompatible* because the subspaces (rays in this "toy" example) for these answers are not defined by a common basis. A person needs to evaluate the Clinton question using one pair of axes (the blue axes), and then must shift her or his viewpoint to another pair of axes (the red axes) to evaluate the Gore question. The final result depends on the order of the applications, because answering one question provides a new contextualized state that is used to generate responses to the second question. As a consequence of incompatibility, if a person is certain about an answer to one question, then the person must be uncertain about the answer to the other question (evidencing the uncertainty principle of quantum theory). In other words, when the questions are incompatible, one cannot be certain about the answers to both questions simultaneously (evidencing the superposition principle of quantum theory).
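The order dependence described above can be reproduced with a few lines of Python. The sketch below is a minimal two-dimensional toy model in which each "yes" answer is a ray in the plane; the angles are hypothetical values chosen for illustration and are not fitted to the poll data or read from Figure 2, and the function names are our own.

```python
import math

def ray(degrees):
    """Unit vector in the plane at the given angle from the horizontal axis."""
    t = math.radians(degrees)
    return (math.cos(t), math.sin(t))

def prob_yes(state, axis):
    """Squared length of the projection of a unit state onto a unit axis."""
    return (state[0] * axis[0] + state[1] * axis[1]) ** 2

def prob_yes_then_yes(state, first, second):
    """P(yes to first, then yes to second) with projective state updates.

    After a "yes" to the first question the state collapses onto that
    question's ray, so the second factor depends only on the two rays.
    """
    return prob_yes(state, first) * prob_yes(first, second)

# Hypothetical angles in degrees (assumptions, not estimates).
belief = ray(30)
clinton_yes = ray(0)    # horizontal ray
gore_yes = ray(45)      # oblique ray

clinton_first = prob_yes_then_yes(belief, clinton_yes, gore_yes)
gore_first = prob_yes_then_yes(belief, gore_yes, clinton_yes)
print(round(clinton_first, 4), round(gore_first, 4))
```

With these particular angles the probability of answering "yes" to both questions is larger when the Gore question comes first; other angles would give a different size or direction of the effect.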

A key point here is that different bases (red vs. blue axes in **Figure 2**) are required to perform the Clinton and Gore measurements. According to quantum theory, two measurement conditions are complementary whenever we have to change the basis used to represent the outcomes of each measurement.

<sup>1</sup>The universal set is a set that contains a sigma algebra of subsets. The subsets are the events that can occur and the events are subsets of the universal set.

<sup>2</sup>The universal vector space is a vector space spanned by a set of basis vectors. The vector space contains subspaces, which are closed subsets of this vector space, and events are subspaces of the vector space.

*What makes two measures compatible in psychology?* Two questions are compatible if the subspaces representing each question are defined by a common basis. In our example, to form a common basis for representing the Clinton and Gore questions, we must posit at least a four-dimensional space, with the four basis vectors (or axes) representing the four conjunctions: (1) "yes" to Clinton and "yes" to Gore, (2) "yes" to Clinton and "no" to Gore, (3) "no" to Clinton and "yes" to Gore, and (4) "no" to Clinton and "no" to Gore. The belief state would be a vector in this four-dimensional space, and each coordinate would indicate the belief about a conjunction (e.g., the belief in "yes" to Clinton and "no" to Gore). When a compatible representation is used, the order of questions does not matter, because the person eventually arrives at the same conjunction with the same probability when finished. Also, the person can be certain about the answers to both questions at the same time. This seems like a more ideal case of human cognition. This, however, all comes at a higher cost, because more cognitive resources are needed to increase and maintain the higher dimensionality of the compatible representation space (Wang and Busemeyer, 2013; Bruza et al., 2015).
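The four-dimensional compatible representation can be sketched in the same spirit. In the Python code below, the belief state is a list of four amplitudes, one per conjunction, and answering a question keeps only the coordinates consistent with that answer. The amplitude values and all names are hypothetical and serve only to show that the order of questions no longer matters.

```python
import math

# Basis order: (Clinton answer, Gore answer).
BASIS = [("yes", "yes"), ("yes", "no"), ("no", "yes"), ("no", "no")]
QUESTION_INDEX = {"Clinton": 0, "Gore": 1}

def apply_answer(state, question, answer):
    """Keep only the coordinates consistent with the answer (a diagonal projector)."""
    idx = QUESTION_INDEX[question]
    return [amp if BASIS[k][idx] == answer else 0.0
            for k, amp in enumerate(state)]

def sequence_probability(state, steps):
    """Probability of a sequence of answers: squared norm of the projected state."""
    for question, answer in steps:
        state = apply_answer(state, question, answer)
    return sum(abs(amp) ** 2 for amp in state)

# Hypothetical unit-length belief state over the four conjunctions.
belief = [0.6, 0.5, 0.4, math.sqrt(1 - 0.6**2 - 0.5**2 - 0.4**2)]

clinton_first = sequence_probability(belief, [("Clinton", "yes"), ("Gore", "yes")])
gore_first = sequence_probability(belief, [("Gore", "yes"), ("Clinton", "yes")])
print(clinton_first, gore_first)
```

Both projectors are diagonal in the shared basis, so they commute and the two orders give the same probability, at the price of doubling the dimension relative to the two-dimensional incompatible model.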

# **FROM PSYCHOLOGY TO PHYSICS: THE HISTORY**

It is an interesting twist of history that the term "complementary" first appeared in the foundational work of psychology. In one of the most influential classic works in psychology, *The Principles of Psychology*, James (1890) wrote,

"*. . .*in certain persons, at least, the total possible consciousness may be split into parts which coexist but mutually ignore each other, and share the objects of knowledge between them. More remarkable still, they are complementary." (p. 204)

Although there is still debate among philosophers and historians whether Bohr's concept of complementarity was influenced by James, many agree on the clear similarity between the concept of complementarity that James created for psychology in 1890 and that Bohr introduced into physics four decades later, and believe Bohr was at least indirectly affected by James's work (e.g., Stapp, 1993; Plotnitsky, 2012). The concept of complementarity emerged around 1926 and 1927 from the discussions between Bohr and Werner Heisenberg related to the discovery of the uncertainty principle. In a lecture in Como, Italy, in 1927, Bohr (1928) for the first time discussed complementarity in public, and the lecture was published the next year. By the time of the famous debate with Einstein regarding the Einstein-Podolsky-Rosen experiment, Bohr had developed a rather complete definition of complementarity (Plotnitsky, 2014):

"Evidence obtained under different experimental conditions cannot be comprehended within a single picture, but must be regarded as complementary in the sense that only the totality of the phenomena exhaust the possible information about the objects*. . .*" (Bohr, 1987a, p. 40)

It is interesting that as an adolescent, Bohr had shown interest in describing human conscious processes (Folse, 1985, p. 175). Even in his earlier papers on complementarity and quantum physics, he tried to state how the concept of complementarity could be applied to psychology. For example, he ended his Como lecture,

"I hope, however, that the idea of complementarity is suited to characterize the situation, which bears a deepgoing analogy to the general difficulty in the formation of human ideas, inherent in the distinction between subject and object." (Bohr, 1928, p. 590)

A year later, in a paper he wrote for a Planck *Festschrift* in 1929, he stated his view on applying complementarity to psychology with greater clarity:

"For describing our mental activity, we require, on one hand, an objectively given content to be placed in opposition to a perceiving subject, while, on the other hand, as is already implied in such an assertion, no sharp separation between object and subject can be maintained, since the perceiving subject also belongs to our mental content. From these circumstances follows not only the relative meaning of every concept, or rather of every word, the meaning depending upon our arbitrary choice of view point, but also that we must, in general, be prepared to accept the fact that a complete elucidation of one and the same object may require diverse points of view which defy a unique description. Indeed, strictly speaking, the conscious analysis of any concept stands in a relation of exclusion to its immediate application. The necessity of taking recourse to a complementary, or reciprocal, mode of description is perhaps most familiar to us from psychological problems." (Bohr, 1987b, p. 96)

Complementarity is not limited to physics. Instead, it is a general concept that can be applied to any phenomena that feature "a participating observer." As Bohr recognized, these kinds of phenomena are typical in psychology. In the end, psychology is the field that studies "the participating observer": the observer's perception, attention, emotion, motivation, memory, and decision-making, among other psychological processes.

The concept of complementarity applies naturally to psychological systems. Just like a physical system, a psychological system can be measured in different, mutually exclusive ways. Although all these measurements are essential for describing the system, they cannot be measured simultaneously, only sequentially. In this case, we say the different measurements are complementary (Stapp, 1993). Importantly, this means that the measurement is "an essential part of making a property definite" (Stapp, 1993, p. 234). In other words, measurements do not merely record the property of a system but construct it.

# **EMPIRICAL TESTABILITY OF COMPLEMENTARITY IN PSYCHOLOGY**

Two criticisms are often raised in response to quantum cognition because of misunderstandings of this new research program. One, we believe, is a false alarm due to a general resistance to (and often a legitimate concern about) the loose, vague, metaphorical, speculative extension of quantum physics to cultural and social studies (Beller, 1998). However, unlike the work targeted by that criticism, the research program of quantum cognition rigorously uses mathematical principles of quantum probability theory to build new models of human cognition, develop specific new predictions, empirically test those predictions, and compare the new models against existing traditional models. Just as other cognitive models draw on classical probability theory, quantum cognition models take advantage of the quantum formalism to provide new theoretical and modeling tools that make precise predictions regarding human cognition.

The other typical criticism questions whether quantum cognition can ever provide the kind of rigor and precision that is shown by quantum mechanics. Unfortunately, it is true that compared to quantum physics, which provides rigorous and precise predictions about physical phenomena, psychological theories involve many more random variables that are hard to control, resulting in lower precision in prediction. To be fair, this is a general challenge for any theory in the behavioral and social sciences. However, through rigorous model comparison, empirical studies have shown that quantum models provide an elegant new way to specify general and vague verbal theories in psychology, and better explain and predict many phenomena puzzling to classical models, leading to highly testable models (e.g., Bruza et al., 2015; Busemeyer and Wang, 2015; Busemeyer et al., 2015).

In fact, compared to many other psychological theories and models, quantum cognitive models may be more falsifiable. Because quantum cognitive models are based on a coherent set of axioms that are clearly stated, these models must stand up to strict tests of these axioms in addition to performance comparisons against competing classical models. Using our quantum question order model as an example again, the model provides clear theoretical predictions about when order effects will or will not occur as well as the pattern of order effects that do occur (Wang and Busemeyer, 2013). One of the most convincing examples illustrating the testability of quantum models has been an *a priori*, parameter-free, and precise test called the quantum question equality, or QQ equality (Wang and Busemeyer, 2013; Wang et al., 2014). This equality, derived from quantum theory, imposes a strong symmetry condition on the nature of order effects, and empirical results from more than 70 U.S. national surveys provided surprisingly strong support for this precise prediction (Wang et al., 2014). Rarely in social science research do we find *a priori* and parameter-free predictions being upheld with such high accuracy. Classical models cannot explain—in a principled and *a priori* manner—both the question order effects and the QQ equality observed in the empirical data (Wang et al., 2014).
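The QQ equality can be illustrated numerically. As given in Wang and Busemeyer (2013), for two yes/no questions A and B asked back to back it states that p(A yes, B no) + p(A no, B yes) in the A-then-B order equals p(B yes, A no) + p(B no, A yes) in the B-then-A order. The Python sketch below checks this in a two-dimensional toy model with hypothetical angles; the function names and parameter values are assumptions made for illustration.

```python
import math

def ray(degrees):
    """Unit vector in the plane at the given angle from the horizontal axis."""
    t = math.radians(degrees)
    return (math.cos(t), math.sin(t))

def overlap_sq(u, v):
    """Squared inner product of two real unit vectors."""
    return (u[0] * v[0] + u[1] * v[1]) ** 2

def sequential(state, first, second):
    """P(first outcome, then second outcome) for rank-one projective measurements."""
    return overlap_sq(state, first) * overlap_sq(first, second)

def qq_statistic(state, a_yes, b_yes):
    """Difference between the two sides of the QQ equality (zero if it holds)."""
    a_no = (-a_yes[1], a_yes[0])   # ray orthogonal to a_yes
    b_no = (-b_yes[1], b_yes[0])   # ray orthogonal to b_yes
    a_then_b = sequential(state, a_yes, b_no) + sequential(state, a_no, b_yes)
    b_then_a = sequential(state, b_yes, a_no) + sequential(state, b_no, a_yes)
    return a_then_b - b_then_a

# Hypothetical angles in degrees (assumptions, not estimates).
state, a_yes, b_yes = ray(30), ray(0), ray(45)

q = qq_statistic(state, a_yes, b_yes)
order_effect = sequential(state, a_yes, b_yes) - sequential(state, b_yes, a_yes)
print(round(q, 12), round(order_effect, 4))
```

The sketch shows the pattern the test exploits: the order effect itself is nonzero, yet the QQ statistic vanishes without any free parameter being tuned.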

# **EXTENDING THE CONCEPT OF COMPLEMENTARITY IN PSYCHOLOGY**

Psychology provides an opportunity to extend and enrich the concept of complementarity beyond its current formulation in physics. When applied to psychology as opposed to physics, compatibility may take on a more fluid and malleable role. Perhaps compatibility varies across individuals, develops with age, and changes with experience. For example, very young children do not seem to have the ability to take on the perspective of another person; this capability to change perspectives develops only after a critical developmental stage (e.g., Epley et al., 2004).

As another example, perhaps compatibility can be formed after an individual has had many experiences with combinations of events that permit the formation of conjunctive concepts. To be more specific, if a combination of questions is new or unusual, then an answer must be constructed on-line that relies on a simpler, incompatible, lower-cost representation. However, if a person has a great deal of experience with a combination, then the person may have sufficient knowledge to form a compatible representation as a result of cognitive adaptation to the environment. Therefore, order effects are expected to occur for uncommon or unfamiliar pairs of questions, whose answers must be (at least partially) constructed on the spot. Indeed, two field experiments during the 1988 and 1992 presidential elections supported this possibility (Simmons et al., 1993). The authors found that the question order effects on issue opinions decreased as the election became closer, which would be predicted by the quantum model because the measurements on issue opinions might more frequently occur over time during media exposure or daily conversations—even if the measurements were not directly noted.

In sum, at this early stage of research, the concept of compatibility is new in psychology, and we can only speculate about which measures will be compatible or incompatible. These speculations or assumptions can then be empirically tested based on order effects or interference effects of the measures, among other predicted effects that follow from incompatibility. This will be a crucial question for future research in quantum cognition, which should enrich the concept of complementarity through psychological experiments and theories.

# **CONCLUDING COMMENTS**

As we have described, the idea of complementarity was introduced into psychology by James (1890). Later, the idea was developed formally and became one of the centerpieces of Niels Bohr's interpretation of quantum mechanics. Unfortunately, for many years the concept appeared to be useful only in physics, and it almost disappeared from the psychological literature (for exceptions, see Grossberg, 2000). In this article, we have attempted to reintroduce the concept of complementarity to its original home in psychology. We think the concept provides an invaluable service toward understanding the fundamental nature of human cognition.

# **ACKNOWLEDGMENTS**

The work was supported by US Air Force Office of Scientific Research (FA 9550-12-1-0397).


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Wang and Busemeyer. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Preference reversal in quantum decision theory

Vyacheslav I. Yukalov<sup>1,2\*†</sup> and Didier Sornette<sup>1,3†</sup>

<sup>1</sup> Department of Management, Technology and Economics, ETH Zürich, Zürich, Switzerland, <sup>2</sup> Bogolubov Laboratory of Theoretical Physics, Joint Institute for Nuclear Research, Dubna, Russia, <sup>3</sup> Swiss Finance Institute, University of Geneva, Geneva, Switzerland

We consider the psychological effect of preference reversal and show that it finds a natural explanation in the frame of quantum decision theory. When people choose between lotteries with non-negative payoffs, they prefer a more certain lottery because of uncertainty aversion. But when people evaluate lottery prices, e.g., for selling to others the right to play them, they do this more rationally, being less subject to behavioral biases. This difference can be explained by the presence of the attraction factors entering the expression of quantum probabilities. Only the existence of attraction factors can explain why, considering two lotteries with close utility factors, a decision maker prefers one of them when choosing, but evaluates higher the other one when pricing. We derive a general quantitative criterion for the preference reversal to occur that relates the utilities of the two lotteries to the attraction factors under choosing vs. pricing and test successfully its application on experiments by Tversky et al. We also show that the planning paradox can be treated as a kind of preference reversal.

#### Edited by:

Sandro Sozzo, University of Leicester, UK

#### Reviewed by:

Peter Dixon, University of Alberta, Canada
Andrei Khrennikov, Linnaeus University, Sweden

#### \*Correspondence:

Vyacheslav I. Yukalov, Department of Management, Technology and Economics, ETH Zürich, Scheuchzerstrasse 7, Zürich CH-8032, Switzerland syukalov@ethz.ch

† These authors have contributed equally to this work.

#### Specialty section:

This article was submitted to Cognition, a section of the journal Frontiers in Psychology

Received: 07 June 2015 Accepted: 23 September 2015 Published: 08 October 2015

#### Citation:

Yukalov VI and Sornette D (2015) Preference reversal in quantum decision theory. Front. Psychol. 6:1538. doi: 10.3389/fpsyg.2015.01538

Keywords: preference reversal, decision theory, uncertainty, behavioral quantum probability, planning paradox

# 1. Introduction

For many decades, psychologists and economists have been intrigued by a seemingly anomalous effect termed preference reversal. The simplest example illustrating this effect is as follows. First, subjects are asked to choose between two lotteries, say L<sub>1</sub> and L<sub>2</sub>, such that L<sub>1</sub> offers a high chance of winning a relatively modest prize, while L<sub>2</sub> offers a lower chance of winning a substantially larger prize. The majority of subjects choose the more certain win of lottery L<sub>1</sub>, even though lottery L<sub>2</sub> may have a larger expected utility. Then subjects are asked to price each of the lotteries, as if they owned them and wished to sell the right to play them. Surprisingly, the majority of subjects price the less certain lottery L<sub>2</sub> higher, in apparent contradiction with their previous choice. This example embodies the essence of the preference reversal effect.

Among the first scientists emphasizing the existence of this effect were Lindman (1971) and Lichtenstein and Slovic (1971, 1973). Their studies were followed by several authors demonstrating the occurrence of this effect in psychology and economics (Grether and Plott, 1979; Loomes and Sugden, 1983; Holt, 1986; Goldstein and Einborn, 1987; Karni and Safra, 1987; Segal, 1988; Tversky et al., 1988; Schkade and Johnson, 1989). Many other citations can be found in the review articles (Slovic and Lichtenstein, 1983; Tversky and Thaler, 1990; Tversky et al., 1990). The experimental studies have established the clear validity and robustness of the preference reversal phenomenon.

The preference reversal effect looks surprising because, according to the common understanding of utility, the choice among the given lotteries should be based on the objective values of the latter, thus, being procedure invariant. Since the lottery values are not changed, why then is the preference reversed?

It has been shown by Tversky and Thaler (1990) and Tversky et al. (1990) that it is the breaking of procedure invariance that is responsible for the preference reversal phenomenon. It turns out that subjects weight payoffs more heavily in pricing than in choice, so that the preference reversal is a purely psychological effect.

The origin of the preference reversal has recently been explained from the point of view of neurology by Kim et al. (2012). It has been shown experimentally that there exists a correlation between visual fixation and preferences. Visual fixations both reflect and influence preferences. On the one hand, these fixations reflect which objects seem more important to the subject. On the other hand, such fixations modulate the neural correlates of preferences, with activity in the ventromedial prefrontal cortex and ventral striatum reflecting the value of the fixated item compared to the value of the item not fixated. Kim et al. studied the process of decision making under risk and measured eye movements while people chose between gambles or bid in pricing gambles. Consistent with previous work, they found that, for two gambles matched in expected value, people systematically chose the higher probability option, but requested a higher ask price for the option that offered the greater amount to win, thus demonstrating preference reversal.

This effect was accompanied by a shift in fixation between the two attributes, with people fixating more on probabilities during choices and more on amounts during selling. In this way, there exists a probability-vs.-amount dichotomy: when choosing, one pays more attention to probabilities, while, when selling, one pays more attention to amounts.

Understanding the cause of the preference reversal is the first necessary step. The next step should be the description of this effect by a mathematical model. Previously suggested models were not successful, as was analyzed by Tversky and Thaler (1990) and Tversky et al. (1990). In the present paper, we show that the effect of preference reversal finds a simple and natural explanation in the frame of the Quantum Decision Theory developed by the authors (Yukalov and Sornette, 2008, 2009a,b, 2010, 2011, 2013, 2014a,b, 2015).

# 2. Basics of Quantum Decision Theory

There exist several approaches applying quantum notions to psychological sciences, as can be inferred from the books (Khrennikov, 2010; Busemeyer and Bruza, 2012; Bagarello, 2013; Haven and Khrennikov, 2013) and the review articles (Yukalov and Sornette, 2009b; Busemeyer et al., 2014; Sornette, 2014; Ashtiani and Azgomi, 2015), where numerous citations to the previous literature can be found. Quantum Decision Theory (QDT) differs in principle from all those approaches in two respects. First, QDT is based on a self-consistent mathematical foundation that is common to both quantum measurement theory and quantum decision theory. Starting from the von Neumann (1955) theory of quantum measurements, we have generalized it to the case of uncertain or inconclusive events, making it possible to characterize uncertain measurements and uncertain prospects. Second, the main formulas of QDT are derived from general principles, giving the possibility of quantitative predictions without fitting parameters. This is in contrast with the usual way of constructing particular models for describing some concrete experiments, with the model parameters fitted to empirical data.

We shall not repeat here the mathematical foundation of QDT that has been thoroughly expounded in our previous papers, but we will just briefly recall the resulting formulas that are necessary for describing the preference reversal effect.

Let us consider a composite event, called prospect,

$$
\pi\_n = A\_n \bigotimes B \,. \tag{1}
$$

Here A<sub>n</sub> is an operationally testable event, represented in a Hilbert space by an eigenstate |n⟩, while B = {B<sub>α</sub>, b<sub>α</sub>} is an inconclusive event, that is, a set of possible events B<sub>α</sub>, represented in a Hilbert space by eigenstates |α⟩ and equipped with random amplitudes b<sub>α</sub>, so that the inconclusive event is represented by a state |B⟩ = Σ<sub>α</sub> b<sub>α</sub>|α⟩.

The prospect operator is P̂(π<sub>n</sub>) = |nB⟩⟨nB|, such that the prospect probability is given by the quantum formula

$$p(\pi\_n) = \operatorname{Tr} \hat{\rho} \hat{P}(\pi\_n) \; , \tag{2}$$

where ρ̂ is a strategic state of a decision maker. By construction, the prospect probability enjoys the properties of a probability measure:

$$\sum\_{n} p(\pi\_n) = 1 \; , \qquad 0 \le p(\pi\_n) \le 1 \; . \tag{3}$$

It is easy to show that the prospect probability takes the form

$$p(\pi\_n) = f(\pi\_n) + q(\pi\_n) \ , \tag{4}$$

where the first term is called the utility factor, characterizing the utility of the prospect, while the second term is the attraction factor, representing behavioral biases.

The intuitive explanation of the above probability expression (4) is straightforward: the definition of a quantum probability (2) for a composite event can be separated into a term containing diagonal matrix elements and a term including off-diagonal elements. The diagonal elements compose the term f(π<sub>n</sub>), while the off-diagonal elements define the term q(π<sub>n</sub>). The occurrence of an off-diagonal term is a typical feature of quantum theory, where this quantity is called the interference term or coherence term. The existence of such an interference term constitutes the principal difference between the quantum approach and the classical consideration, where there are no interference terms. It is the appearance of interference terms that makes the structure of quantum expressions richer than that of the related classical ones and that allows one to explain psychological phenomena that are otherwise inexplicable in classical decision making. Sometimes, the quantum approach even yields conclusions that are impossible in classical decision making, such as the possibility to agree to disagree (Khrennikov and Basieva, 2014). Below we show that this interference term, composing the attraction factor, is essential in explaining the existence of the preference reversal effect, which cannot be described in classical decision theory.

The prospect probability satisfies the quantum-classical correspondence principle.

$$p(\pi\_n) \to f(\pi\_n) \,, \qquad q(\pi\_n) \to 0 \,\, . \tag{5}$$

This defines the utility factor as a classical-type probability, with the standard properties

$$\sum\_{n} f(\pi\_n) = 1 \; , \qquad 0 \le f(\pi\_n) \le 1 \; . \tag{6}$$

This is equivalent to the normalization condition

$$\sum\_{n\alpha} |b\_{\alpha}|^2 \langle n\alpha \mid \hat{\rho} \mid n\alpha \rangle = 1 \,,$$

imposing a constraint on the random quantities bα.

When considering lotteries, an event A<sub>n</sub> ≡ A(L<sub>n</sub>) implies the choice of a lottery L<sub>n</sub>. Then the inconclusive set B characterizes the decision maker's hesitations between uncertain events B<sub>α</sub>, describing uncertainty with respect to the decision maker's ability and with respect to the lottery formulation (Yukalov and Sornette, 2014b, 2015). The explicit form of the utility factor is given by minimizing the Kullback-Leibler information functional, which in the simple case of uncertainty yields

$$f(\pi\_n) = \frac{U(L\_n)}{\sum\_n U(L\_n)}\,,\tag{7}$$

with U(L<sub>n</sub>) being the expected utility of a lottery L<sub>n</sub>. Note that the minimization of the information functional results in expression (7), which may be familiar to psychologists as the Luce (1959) choice rule with utility as the response strength.
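As an illustration of expression (7), the following minimal Python sketch computes utility factors for a set of lotteries. It assumes that the expected utility is the probability-weighted sum of payoffs (a linear utility function); the function names and the two lotteries are hypothetical and serve only as an example.

```python
from fractions import Fraction


def expected_utility(lottery):
    """Expected utility of a lottery given as (payoff, probability) pairs.

    Assumption: utility is linear in the payoff, so U(L) = sum of x * p(x).
    """
    return sum(payoff * prob for payoff, prob in lottery)


def utility_factors(lotteries):
    """Utility factors f(pi_n) = U(L_n) / sum_n U(L_n), as in expression (7)."""
    utilities = [expected_utility(lottery) for lottery in lotteries]
    total = sum(utilities)
    if total <= 0:
        raise ValueError("expression (7) needs a positive total expected utility")
    return [u / total for u in utilities]


# Hypothetical lotteries, chosen only to illustrate the normalization.
lottery_a = [(10, Fraction(3, 4)), (0, Fraction(1, 4))]   # U = 15/2
lottery_b = [(30, Fraction(1, 2)), (0, Fraction(1, 2))]   # U = 15

factors = utility_factors([lottery_a, lottery_b])
print(factors)  # the factors sum to one by construction
```

Exact fractions are used so that the normalization of the utility factors can be checked without rounding error.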

The attraction factor reflects the effects of quantum coherence and interference, and in decision theory it represents the behavioral biases rendering the prospects more or less attractive from the subconscious point of view of the decision maker. By their definition, attraction factors lie in the interval

$$-1 \le q(\pi\_n) \le 1\tag{8}$$

and satisfy the alternation property

$$\sum\_{n} q(\pi\_{n}) = 0 \; . \tag{9}$$

Also, in the case of non-informative priors, the attraction factors for the considered prospect lattice {π<sub>n</sub> : n = 1, 2, . . . , N} obey the quarter law

$$\frac{1}{N} \sum\_{n=1}^{N} |q(\pi\_n)| = \frac{1}{4} \,\, . \tag{10}$$

This law makes it admissible to estimate the attraction factors by the values ±0.25, thus quantitatively predicting preferences.
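The constraints (8)–(10) can be checked mechanically. The sketch below is an illustrative helper, not part of QDT itself: the function name and the tolerance are assumptions. It tests whether a list of attraction factors is bounded, sums to zero, and has a mean absolute value of 1/4.

```python
def check_attraction_factors(q, tol=1e-9):
    """Report which of the conditions (8)-(10) a list of attraction factors meets.

    q   -- attraction factors q(pi_n), one per prospect in the lattice
    tol -- numerical tolerance (an assumption of this sketch)
    """
    n = len(q)
    return {
        "bounded": all(-1 <= x <= 1 for x in q),                           # (8)
        "alternation": abs(sum(q)) <= tol,                                 # (9)
        "quarter_law": abs(sum(abs(x) for x in q) / n - 0.25) <= tol,      # (10)
    }


# The non-informative estimate for a lattice of two prospects.
report = check_attraction_factors([0.25, -0.25])
print(report)
```

Since the quarter law is stated for non-informative priors, attraction factors extracted from a particular experiment may satisfy (8) and (9) while deviating from the value 1/4.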

The prospect lattice is ordered by the values of prospect probabilities. A prospect π<sub>i</sub> is termed preferable to π<sub>j</sub> if and only if

$$p(\pi\_i) > p(\pi\_j) \qquad \quad (\pi\_i > \pi\_j) \; . $$

At the same time, a prospect π<sub>i</sub> is more useful than π<sub>j</sub> when f(π<sub>i</sub>) > f(π<sub>j</sub>). A prospect π<sub>i</sub> is more attractive than π<sub>j</sub> when q(π<sub>i</sub>) > q(π<sub>j</sub>). In this way, a prospect can be more useful but less attractive and, as a result, less preferable.

A necessary condition for the existence of a nonzero attraction factor is that the composite prospect be entangled (Yukalov and Sornette, 2014a, 2015). Otherwise, there is no need of involving quantum probabilities.

# 3. General Criterion of Preference Reversal

Preference reversal may naturally arise in the frame of quantum decision theory. In this section, we derive the general criterion for the occurrence of this effect.

Suppose a decision maker considers a lattice of just two prospects

$$
\pi\_n = A(L\_n) \bigotimes B \qquad \left(n = 1, 2\right)\,,\tag{11}
$$

with the intention of choosing between them. Here A(L<sub>n</sub>) denotes the action of choosing a lottery L<sub>n</sub>, and B is a set incorporating the uncertainties associated with this choice. Let the decision maker prefer the prospect π<sub>1</sub> to π<sub>2</sub>, which means that

$$
p(\pi\_1) > p(\pi\_2) \qquad (\pi\_1 > \pi\_2) \,. \tag{12}
$$

Taking into account the alternation property, we have

$$q(\pi\_1) + q(\pi\_2) = 0 \; . \tag{13}$$

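Substituting decomposition (4) into (12) and eliminating q(π<sub>2</sub>) = −q(π<sub>1</sub>) by means of (13), we find that the prospect π<sub>1</sub> is preferred to π<sub>2</sub> if and only if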

$$f(\pi\_2) - f(\pi\_1) < 2q(\pi\_1) \,. \tag{14}$$

Now, assume that the decision maker plans to price the given lotteries, e.g., wishing to sell them. The lotteries remain the same as before. However, the uncertainties in selling differ from those in choosing; hence, the uncertain set B′ associated with selling is different from the set B associated with choosing. The decision maker now evaluates two different prospects

$$
\pi\_n = A(L\_n) \bigotimes B' \qquad (n = 3, 4) \,, \tag{15}
$$

where L<sub>1</sub> = L<sub>3</sub> and L<sub>2</sub> = L<sub>4</sub>.

Preference reversal implies that, contrary to the situation with choosing, the decision maker now evaluates the prospect π<sub>4</sub> higher than π<sub>3</sub>, so that

$$p(\pi\_3) < p(\pi\_4) \qquad \left(\pi\_3 < \pi\_4\right). \tag{16}$$

In view of the alternation property

$$q(\pi\_3) + q(\pi\_4) = 0 \; , \tag{17}$$

the prospect π<sub>4</sub> is preferred if and only if

$$f(\pi\_4) - f(\pi\_3) > 2q(\pi\_3) \,. \tag{18}$$

Since the lotteries are the same (L<sub>1</sub> = L<sub>3</sub> and L<sub>2</sub> = L<sub>4</sub>), their expected utilities are pairwise equal: U(L<sub>1</sub>) = U(L<sub>3</sub>) and U(L<sub>2</sub>) = U(L<sub>4</sub>). Therefore, the utility factors are also pairwise equal:

$$f(\pi\_1) = f(\pi\_3) \,, \qquad f(\pi\_2) = f(\pi\_4) \,. \tag{19}$$

Combining the above conditions, we obtain the preference reversal criterion:

$$2q(\pi\_3) < f(\pi\_2) - f(\pi\_1) < 2q(\pi\_1) \,. \tag{20}$$

Let us stress that in classical decision making, where q(π<sub>1</sub>) = q(π<sub>3</sub>) ≡ 0, the inequalities (20) cannot hold, which means that it is impossible to suggest a self-consistent mathematical explanation of the preference reversal phenomenon in classical terms, which is in agreement with discussions by Tversky and Thaler (1990) and Tversky et al. (1990).

Criterion (20) not only explains the preference reversal phenomenon, but also provides a quantitative estimate of how likely it is to happen, as well as an a posteriori confirmation of why it has happened. This is because the attraction factors are not arbitrary additional characteristics: their signs are prescribed by the notion of risk aversion, while their values are constrained by conditions (8)–(10). Thus, due to risk aversion when facing several choices, the more certain lottery is more attractive, hence q(π<sub>1</sub>) > q(π<sub>2</sub>), which, in view of the alternation property (9), implies that q(π<sub>1</sub>) > 0, while q(π<sub>2</sub>) < 0. In contrast, when pricing, risk aversion is absent, hence the more attractive lottery is the one that can provide the larger gain, so that q(π<sub>4</sub>) > q(π<sub>3</sub>), which, again taking into account the alternation property (9), tells us that q(π<sub>3</sub>) < 0 while q(π<sub>4</sub>) > 0. Estimating the absolute values of the attraction factors by the quantity 0.25, which follows from the quarter law (10), we have the criterion

$$-\frac{1}{2} < f(\pi\_2) - f(\pi\_1) < \frac{1}{2} \ .$$

Therefore, if the given lotteries are such that their utility factors satisfy the above inequalities, we may expect that preference reversal can occur. And, vice versa, if preference reversal has happened, then the above inequalities must hold. Below we demonstrate that criterion (20) indeed provides necessary and sufficient conditions for the preference reversal phenomenon.
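Criterion (20) is simple enough to be written as a one-line test. The following sketch is only illustrative: the function names and the numerical utility factors are hypothetical, and the second helper assumes the non-informative estimates q(π<sub>1</sub>) = 0.25 and q(π<sub>3</sub>) = −0.25 suggested by the quarter law.

```python
def preference_reversal_expected(f1, f2, q1, q3):
    """Criterion (20): 2*q(pi_3) < f(pi_2) - f(pi_1) < 2*q(pi_1)."""
    return 2 * q3 < f2 - f1 < 2 * q1


def quarter_law_criterion(f1, f2):
    """Criterion (20) with the estimates q(pi_1) = 0.25 and q(pi_3) = -0.25."""
    return preference_reversal_expected(f1, f2, q1=0.25, q3=-0.25)


# Hypothetical utility factors of two lotteries with close utilities.
print(quarter_law_criterion(0.45, 0.55))

# Classical limit: with vanishing attraction factors the criterion cannot hold.
print(preference_reversal_expected(0.45, 0.55, q1=0.0, q3=0.0))
```

The second call mirrors the remark above that, for q(π<sub>1</sub>) = q(π<sub>3</sub>) = 0, the two strict inequalities cannot be satisfied simultaneously.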

# 4. Confirmation of Preference Reversal Criterion

To confirm the validity of the preference reversal criterion, let us test it with empirical data of decision-making experiments. We shall consider pairs of lotteries with the notation of the previous section. The prospects related to the choice between the lotteries L<sub>1</sub> and L<sub>2</sub> are denoted as π<sub>1</sub> and π<sub>2</sub>, respectively. The prospects corresponding to the pricing of these lotteries will be denoted by π<sub>3</sub> and π<sub>4</sub>. The expected utility of a lottery L = {x<sub>i</sub>, p(x<sub>i</sub>)}, consisting of payoffs x<sub>i</sub> with weights p(x<sub>i</sub>), will be calculated by the formula U(L) = Σ<sub>i</sub> x<sub>i</sub> p(x<sub>i</sub>). The utility factors are given by expression (7).

**Example 1**. Let us start with the example given by Tversky and Thaler (1990). Consider two lotteries

$$L\_1 = \left\{4, \frac{8}{9} \mid 0, \frac{1}{9}\right\} \quad , \qquad L\_2 = \left\{40, \frac{1}{9} \mid 0, \frac{8}{9}\right\} \; ,$$

whose payoffs 4 and 40 are given in some monetary units. The type of units, whether Dollars, Euros, or Francs, is not important, since such units cancel in definition (7) of the utility factors. This is one of the advantages of employing dimensionless utility factors, which are invariant with respect to the type of payoff measures. The corresponding expected utilities

$$U(L\_1) = \frac{32}{9} \; , \qquad \quad U(L\_2) = \frac{40}{9}$$

result in the utility factors

$$f(\pi\_1) = \frac{4}{9}\ , \qquad f(\pi\_2) = \frac{5}{9}\ ,$$

which show that the second lottery is more useful.

The experimental probabilities are defined as the fractions of subjects preferring the related lotteries. According to Tversky and Thaler (1990), in the case of choice, it was found that 71% of decision makers preferred the more certain lottery L1, so that

$$p(\pi\_1) = 0.71 \,>\, p(\pi\_2) = 0.29\,\,,$$

even though this lottery is less useful. In view of (4), this corresponds to the attraction factors

$$q(\pi\_1) = 0.266\,, \qquad q(\pi\_2) = -0.266\,\,.$$

However, when pricing, 67% of subjects found lottery L<sub>2</sub> more valuable, so that

$$p(\pi\_3) = 0.33 \, < \, p(\pi\_4) = 0.67 \, ,$$

even though the win in this lottery is less probable. The related attraction factors are

$$q(\pi\_3) = -0.114\ \ , \qquad q(\pi\_4) = 0.114\ \ .$$

Notice that, in the case of pricing, the signs of the attraction factors are reversed as compared to the case of choosing. This is in agreement with the probability-amount dichotomy (Kim et al., 2012): when choosing, one accepts as more attractive the lottery with a higher probability of winning, while when pricing, one treats as more attractive the lottery with a higher payoff amount. In the process of pricing, decision makers are usually more pragmatic, evaluating the more useful lottery higher.

Combining the data of this experiment, the two inequalities (20) read

$$-0.228 < 0.111 < 0.532\ ,$$

which confirms the prediction of QDT.
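The arithmetic of Example 1 can be retraced step by step. The sketch below uses only the lotteries and the observed fractions quoted above; the variable names are illustrative, and exact fractions are used to avoid rounding before the final comparison.

```python
from fractions import Fraction

# Lotteries of Example 1 as (payoff, probability) pairs.
L1 = [(4, Fraction(8, 9)), (0, Fraction(1, 9))]
L2 = [(40, Fraction(1, 9)), (0, Fraction(8, 9))]


def expected_utility(lottery):
    """U(L) = sum over i of x_i * p(x_i)."""
    return sum(x * p for x, p in lottery)


U1, U2 = expected_utility(L1), expected_utility(L2)
f1, f2 = U1 / (U1 + U2), U2 / (U1 + U2)       # utility factors, expression (7)

p1_choice = Fraction(71, 100)   # fraction choosing L1
p3_price = Fraction(33, 100)    # fraction pricing L1 higher

q1 = p1_choice - f1             # attraction factor when choosing, from (4)
q3 = p3_price - f1              # attraction factor when pricing; f(pi_3) = f(pi_1)

criterion_holds = 2 * q3 < f2 - f1 < 2 * q1   # criterion (20)

print(f"f1 = {f1}, f2 = {f2}")
print(f"q1 = {float(q1):.3f}, q3 = {float(q3):.3f}")
print(f"criterion (20) holds: {criterion_holds}")
```

The same steps apply to any pair of lotteries for which the choice and pricing fractions are known.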

**Example 2**. When there is no preference reversal, the criterion (20) does not hold. To illustrate this, let us consider an example treated by Tversky et al. (1990), taking the lotteries

$$L\_1 = \{100, 0.97 \mid 0, 0.03\} \,, \qquad L\_2 = \{400, 0.31 \mid 0, 0.69\} \,.$$

Their expected utilities are

$$U(L\_1) = 97\text{ }, \qquad U(L\_2) = 124\text{ },$$

which yields the utility factors

$$f(\pi\_1) = 0.439\ , \qquad f(\pi\_2) = 0.561\ .$$

The first lottery is essentially more certain, and subjects overwhelmingly tend to prefer this lottery, so that

$$p(\pi\_1) = 0.91 \,>\, p(\pi\_2) = 0.09\,\,.$$

According to (4), the related attraction factors are

$$q(\pi\_1) = 0.471\,, \qquad q(\pi\_2) = -0.471\,\,.$$

When pricing, subjects pay more attention to the payoff amounts, so that the fraction of decision makers preferring the first lottery is drastically reduced. However, preference reversal as such does not occur, with a (narrower) majority still pricing the first lottery higher:

$$p(\pi\_3) = 0.54 \, > \, p(\pi\_4) = 0.46 \, .$$

The corresponding attraction factors are

$$q(\pi\_3) = 0.101\,, \qquad q(\pi\_4) = -0.101\,\,.$$

Since

$$f(\pi\_2) - f(\pi\_1) = 0.122 < 2q(\pi\_3) = 0.202\ ,$$

criterion (20) is not fulfilled, which is the expected situation in the absence of preference reversal.

This example demonstrates that, although in pricing one pays more attention to payoff amounts, the focus is not exclusively on these amounts. Probabilities also influence decisions, together with amounts.

**Example 3**. Another example from Tversky et al. (1990) deals with the lotteries

$$L\_1 = \{12, 0.92 \mid 0, 0.08\} \,, \qquad L\_2 = \{175, 0.06 \mid 0, 0.94\} \,.$$

The first lottery is both more certain and more useful, with the expected utilities

$$U(L\_1) = 11.04\ , \qquad U(L\_2) = 10.5\ $$

and the utility factors

$$f(\pi\_1) = 0.513\,, \qquad f(\pi\_2) = 0.487\,\,.$$

It is not surprising that, when choosing, decision makers prefer this lottery according to

$$p(\pi\_1) = 0.81 \,>\, p(\pi\_2) = 0.19\,.$$

The related attraction factors are

$$q(\pi\_1) = 0.297\,, \qquad q(\pi\_2) = -0.297\,\,.$$

When pricing, subjects take into account that the second lottery can provide a much higher payoff, yet with too small a probability. As a result, the fraction of decision makers preferring the first lottery diminishes, but preference reversal does not happen:

$$p(\pi\_3) = 0.58 \,>\, p(\pi\_4) = 0.42 \, .$$

In pricing, the first lottery becomes less attractive than in choosing, but remains more attractive than the second lottery, with the attraction factors

$$q(\pi\_3) = 0.067\,, \qquad q(\pi\_4) = -0.067\,\,.$$

In view of the relations

$$f(\pi\_2) - f(\pi\_1) = -0.026 \,<\, 2q(\pi\_3) = 0.134 \,,$$

criterion (20) does not hold, in agreement with the absence of preference reversal. Again, we see that payoff amounts as well as probabilities are considered in the process of pricing, although the role of payoff amounts, without doubt, is more important in pricing than in choosing.

We have also analyzed a large set of data presented by Tversky et al. (1990), demonstrating the effect of preference reversal. Pairs of lotteries were presented to 198 participants. In each pair, one of the lotteries, L<sub>1</sub>, had a high probability, while the other, L<sub>2</sub>, had a higher payoff with lower probability. These lotteries are given in **Table 1**. In each lottery, the first number is a payoff and the next number is the probability of this payoff. A lottery is represented as a set {x, p(x)}, implying that one gets either the payoff x, with probability p(x), or nothing, with probability 1 − p(x). The expected utilities and utility factors are shown. The first six lottery pairs include rather small payoffs. The following five pairs contain much larger payoffs by a factor of 25. And the last five pairs present a mixture of large and small payoffs. All the cases demonstrate the effect of preference reversal.

In **Table 2**, we show the prospect probabilities p(π<sub>1</sub>) and p(π<sub>3</sub>), with the corresponding attraction factors q(π<sub>1</sub>) and q(π<sub>3</sub>), demonstrating preference reversal, since p(π<sub>1</sub>) > p(π<sub>2</sub>), although p(π<sub>3</sub>) < p(π<sub>4</sub>). Those quantities that are not presented can be found from the relations

$$\begin{aligned} f(\pi\_1) &= f(\pi\_3) \ , & f(\pi\_2) &= f(\pi\_4) \ , \\ p(\pi\_2) &= 1 - p(\pi\_1) \ , & p(\pi\_4) &= 1 - p(\pi\_3) \ , \\ q(\pi\_2) &= -q(\pi\_1) \ , & q(\pi\_4) &= -q(\pi\_3) \ .\end{aligned}$$

TABLE 1 | Pairs of lotteries, with their expected utilities and utility factors.


TABLE 2 | Probability p(π<sub>1</sub>), defined as the fraction of decision makers choosing the lottery L<sub>1</sub>, and probability p(π<sub>3</sub>), defined as the fraction of subjects pricing the lottery L<sub>1</sub> higher. Also shown are the corresponding attraction factors q(π<sub>1</sub>) and q(π<sub>3</sub>), and the combination [f(π<sub>2</sub>) − f(π<sub>1</sub>)]/2 that should be compared with those attraction factors according to criterion (20) obtained from QDT, which reads here q(π<sub>3</sub>) < [f(π<sub>2</sub>) − f(π<sub>1</sub>)]/2 < q(π<sub>1</sub>).

We also show the value [f(π<sub>2</sub>) − f(π<sub>1</sub>)]/2 that has to be compared with q(π<sub>3</sub>) and q(π<sub>1</sub>) in order to check the validity of criterion (20). As is seen from **Table 2**, the preference reversal criterion (20) is always valid.

Since, in each pair of lotteries considered in the case of choosing or pricing, the utility factors do not change, the preference reversal effect can be interpreted within QDT as caused by the existence of the attraction factors. If one evaluated the lotteries solely on the basis of rational utility, no preference reversal would occur. However, the preferences of decision makers involve irrational feelings and biases, as well as other considerations not included in the utility, which are embodied in the attraction factors and account for the phenomenon of preference reversal. In order to characterize the deviation from rationality during decision making over a family of N trials, we can introduce the irrationality measure

$$\delta\_j \equiv \frac{1}{N} \sum\_{n=1}^{N} |q(\pi\_j)| \ .$$

Then δ<sub>1</sub> measures the level of irrationality in the course of choosing, while δ<sub>3</sub> describes the degree of irrationality in the process of pricing. From **Table 2**, we find δ<sub>1</sub> = 0.299 and δ<sub>3</sub> = 0.118. Thus, people seem to be significantly more irrational when choosing, as compared to pricing. In other words, the evaluation of lotteries in pricing is more rational.
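The irrationality measure is a mean of absolute attraction factors over trials, which the following sketch makes explicit. For illustration it uses the attraction factors of Examples 1 to 3 above rather than the Table 2 data, so the resulting numbers are not expected to reproduce the values δ<sub>1</sub> = 0.299 and δ<sub>3</sub> = 0.118; the function and variable names are assumptions of this sketch.

```python
def irrationality_measure(attraction_factors):
    """delta_j = (1/N) * sum over the N trials of |q(pi_j)|."""
    if not attraction_factors:
        raise ValueError("at least one trial is required")
    return sum(abs(q) for q in attraction_factors) / len(attraction_factors)


# Attraction factors from Examples 1-3 (illustrative; not the Table 2 data).
q1_choice = [0.266, 0.471, 0.297]     # q(pi_1), choosing
q3_price = [-0.114, 0.101, 0.067]     # q(pi_3), pricing

delta_choice = irrationality_measure(q1_choice)
delta_price = irrationality_measure(q3_price)

print(f"delta_1 = {delta_choice:.3f}, delta_3 = {delta_price:.3f}")
```

Even on this small illustrative sample, the measure is larger for choosing than for pricing, in line with the comparison drawn from Table 2.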

# 5. Discussion

We have shown that the phenomenon of preference reversal, which is treated as an anomaly in classical decision making, finds a natural explanation in the frame of quantum decision theory. In the latter, the preference probability consists of two terms, the utility factor quantifying the utility of a prospect, and the attraction factor characterizing behavioral biases of a decision maker. In that way, a prospect probability, defined as a quantum quantity, has the meaning of a behavioral probability taking into account both utility of the considered prospects, as well as their attractiveness for the decision maker, due to subconscious behavioral biases. We have formulated the criterion associated within QDT with preference reversal and we have illustrated its validity for a large set of empirical data.

We summarize the key steps of the logic we have followed.


We have thus demonstrated that QDT predicts two inequalities that must hold for the reversal to occur, and these turn out to be confirmed.

It is worth noting that the effect of preference reversal does not only occur when choice is compared with pricing, but similar reversals can happen in other cases. As another illustration, we can mention the so-called planning paradox that can be represented by the following stylized example.

Suppose one is deliberating about stopping smoking. Let the imaginary plan to stop smoking be denoted as the prospect π<sub>1</sub>, while continuing smoking corresponds to prospect π<sub>2</sub>. The utility of not smoking clearly outweighs that of smoking because of evident health reasons. In contrast, the negative feelings connected with addiction are still too imaginary to influence the mood of the decision maker. We thus expect that the related attraction factors should be rather small, so that the decision is based mainly on rational grounds. Hence, the preference for the plan π<sub>1</sub> is expressed by the inequality p(π<sub>1</sub>) > p(π<sub>2</sub>), implying that the majority of subjects would like to stop smoking.

However, when one has to decide to actually stop smoking now (rather than in the future), one faces a different pair of alternatives: actually stopping smoking, which can be denoted as the prospect π<sub>3</sub>, or continuing smoking, the prospect π<sub>4</sub>. Deciding whether to stop smoking now, one immediately confronts negative feelings anticipating the suffering resulting from addiction. This translates into the appearance of a negative attraction factor q(π<sub>3</sub>) devaluing the utility of not smoking. As a result, p(π<sub>3</sub>) becomes smaller than p(π<sub>4</sub>), which means that the majority of people do not actually quit smoking.

This planning paradox gives a clear example of preference reversal, which cannot be understood in terms of classical utility considerations, since the utility of prospects does not change. But there is no paradox in quantum decision theory, where the effect of preference reversal is explained by the variation of attraction factors. Numerous data, collected by Walsh and Sanson-Fisher (2001) from the World Health Organization, confirm the robust existence of the preference reversal in the stop-smoking planning paradox. Thus the preference reversal is a rather general phenomenon that obtains a straightforward explanation in the framework of quantum decision theory.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Yukalov and Sornette. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Noncontextuality with marginal selectivity in reconstructing mental architectures

Ru Zhang and Ehtibar N. Dzhafarov \*

*Department of Psychological Sciences, Purdue University, West Lafayette, IN, USA*

We present a general theory of series-parallel mental architectures with selectively influenced stochastically non-independent components. A mental architecture is a hypothetical network of processes aimed at performing a task, of which we only observe the overall time it takes under variable parameters of the task. It is usually assumed that the network contains several processes selectively influenced by different experimental factors, and then the question is asked as to how these processes are arranged within the network, e.g., whether they are concurrent or sequential. One way of doing this is to consider the distribution functions for the overall processing time and compute certain linear combinations thereof (interaction contrasts). The theory of selective influences in psychology can be viewed as a special application of the interdisciplinary theory of (non)contextuality having its origins and main applications in quantum theory. In particular, lack of contextuality is equivalent to the existence of a "hidden" random entity of which all the random variables in play are functions. Consequently, for any given value of this common random entity, the processing times and their compositions (minima, maxima, or sums) become deterministic quantities. These quantities, in turn, can be treated as random variables with (shifted) Heaviside distribution functions, for which one can easily compute various linear combinations across different treatments, including interaction contrasts. This mathematical fact leads to a simple method, more general than the previously used ones, to investigate and characterize the interaction contrast for different types of series-parallel architectures.

Keywords: interaction contrast, mental architectures, noncontextuality, response time, selective influences, series-parallel network

# 1. Introduction

The notion of a network of mental processes with components selectively influenced by different experimental factors was introduced to psychology in Saul Sternberg's (1969) influential paper. Sternberg considered networks of processes a, b, c, . . . involved in performing a mental task. Denoting their respective durations by A, B, C, . . ., the hypothesis he considered was that the observed response time T is A + B + C + . . . . One cannot test this hypothesis, Sternberg wrote, without assuming that there are some factors, α, β, γ, . . ., that selectively influence the durations A, B, C, . . ., respectively. Sternberg's analysis was confined to stochastically independent A, B, C, . . ., and the consequences of the assumptions of seriality and selective influences were tested on the level of the mean response times only.

#### Edited by:

*Sandro Sozzo, University of Leicester, UK*

#### Reviewed by:

*James T. Townsend, Indiana University, USA Joseph W. Houpt, Wright State University, USA*

#### \*Correspondence:

*Ehtibar N. Dzhafarov, Department of Psychological Sciences, Purdue University, 703 Third Street, West Lafayette, IN 47907-2081, USA ehtibar@purdue.edu*

#### Specialty section:

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology*

> Received: *31 March 2015* Accepted: *17 May 2015* Published: *17 June 2015*

#### Citation:

*Zhang R and Dzhafarov EN (2015) Noncontextuality with marginal selectivity in reconstructing mental architectures. Front. Psychol. 6:735. doi: 10.3389/fpsyg.2015.00735*

Subsequent development of these ideas was aimed at the entire distributions of the response times and at a greater diversity and complexity of mental architectures than just series of "stages." This development prominently includes Townsend (1984, 1990a,b); Schweickert and Townsend (1989); Townsend and Schweickert (1989); Roberts and Sternberg (1993); Townsend and Nozawa (1995); Schweickert et al. (2000), and several other publications, primarily by James Townsend and Richard Schweickert with colleagues. For an overview of these developments see Dzhafarov (2003) and Schweickert et al. (2012). In the present context we should separately mention the development of the ideas of stochastic ordering of processing times in Townsend (1984, 1990a) and Townsend and Schweickert (1989); as well as the idea of marginal selectivity (Townsend and Schweickert, 1989).

The notion of selective influences also underwent a significant development, having been generalized from stochastically independent random variables to arbitrarily interdependent ones (Dzhafarov, 2003; Dzhafarov and Gluhovsky, 2006; Kujala and Dzhafarov, 2008; Dzhafarov and Kujala, 2010, in press). The essence of the development is easy to understand using two random variables (e.g., process durations) A, B selectively influenced by two respective factors α, β. In Dzhafarov's (2003) notation, this is written $(A, B) \looparrowleft (\alpha, \beta)$. According to the definition given in Dzhafarov (2003), this means that there are functions f and g and a random variable R (a common source of randomness) such that f(α, R) = A and g(β, R) = B. If such a choice of f, g, R exists, it is not unique. For instance, R can always be chosen to have any distribution that is absolutely continuous with respect to the usual Borel measure on the real line (e.g., a standard uniform, or standard normal distribution, see Dzhafarov and Gluhovsky, 2006). However, a triple f, g, R need not exist. It does not exist, e.g., if marginal selectivity (Townsend and Schweickert, 1989) is violated, i.e., if the distribution of, say, A at a given value of α changes in response to changing β. But marginal selectivity is not sufficient for the existence of a triple f, g, R. Let, e.g., α and β be binary factors, with values 1, 2 each, and let the correlation ρ between A and B for a treatment (α, β) be denoted $\rho_{\alpha\beta}$. Then the triple in question does not exist if the correlations violate the "cosphericity test" (Kujala and Dzhafarov, 2008), also known in quantum mechanics as Landau's inequality (Landau, 1988):

$$|\rho\_{11}\rho\_{12} - \rho\_{21}\rho\_{22}| \le \overline{\rho}\_{11}\overline{\rho}\_{12} + \overline{\rho}\_{21}\overline{\rho}\_{22},\tag{1}$$

where $\overline{\rho}_{\alpha\beta} = \sqrt{1 - \rho_{\alpha\beta}^2}$. There are many other known conditions that must be satisfied for the existence of a triple f, g, R when marginal selectivity is satisfied (Dzhafarov and Kujala, 2010, 2012a,b, 2013, 2014a).
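Inequality (1) is straightforward to evaluate once the four correlations have been estimated. The following is a minimal Python sketch; the function name, the argument order, and the numerical tolerance are illustrative assumptions rather than part of the original formulation. Passing the check does not establish that a triple f, g, R exists; it only fails to rule it out.

```python
import math

def landau_inequality_holds(r11, r12, r21, r22, tol=1e-12):
    """Check inequality (1) for the correlations rho_{alpha beta} of a 2 x 2 design."""
    rhos = (r11, r12, r21, r22)
    if any(abs(r) > 1 for r in rhos):
        raise ValueError("correlations must lie in [-1, 1]")
    # bar(rho) = sqrt(1 - rho^2), as defined right after (1)
    b11, b12, b21, b22 = (math.sqrt(1.0 - r * r) for r in rhos)
    lhs = abs(r11 * r12 - r21 * r22)
    rhs = b11 * b12 + b21 * b22
    return lhs <= rhs + tol  # tol is an assumed guard against rounding error
```

For example, four correlations of 0.5 satisfy the inequality, whereas the pattern (1, 1, 1, −1) violates it, so in the latter case no triple f, g, R can exist even if marginal selectivity holds.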

The allusion to quantum mechanics is not accidental: as shown in Dzhafarov and Kujala (2012a,b), the theory of selective influences in psychology can be viewed as a special application of the theory of (non)contextuality. This theory is interdisciplinary (Khrennikov, 2009; Dzhafarov and Kujala, 2014b,c,d), but its origins are in quantum theory, dating from Kochen and Specker (1967) and Bell's (1964, 1966) celebrated work. For the modern state of the theory see Dzhafarov et al. (in press). A simplified account of the (non)contextuality analysis of the example given above is as follows. One labels each random variable in play contextually, i.e., by what property is being measured/recorded under what treatment (context):

$$(A_{\alpha})^{(\alpha,\beta)}, \quad (B_{\beta})^{(\alpha,\beta)}. \tag{2}$$

The notation here is, of course, redundant, because the context and property identifiers overlap, but we need now to emphasize the logic rather than achieve notational convenience. Once the labeling is done, one looks at all possible joint distributions imposable on all these random variables, for all properties and all treatments. A system is noncontextual if there exists a joint distribution in which any two random variables that represent the same property ("what is measured") are equal with probability 1. The latter is possible only if the random variables representing the same property always have the same distribution: in our case

$$(A_{\alpha})^{(\alpha,\beta)} \sim (A_{\alpha})^{(\alpha,\beta')}, \quad (B_{\beta})^{(\alpha,\beta)} \sim (B_{\beta})^{(\alpha',\beta)} \tag{3}$$

for any values α, β, α′, β′ of the two factors. This is called consistent connectedness (Dzhafarov et al., in press), and in physics is known under a variety of names, including (in certain paradigms) "no-signaling condition" (Popescu and Rohrlich, 1994; Cereceda, 2000; Masanes et al., 2006). In psychology, this is marginal selectivity. The definition of noncontextuality just given is not the most general one, as the notion of contextuality can be extended to inconsistently connected (violating marginal selectivity) systems (Dzhafarov et al., in press), but we do not need this generality in this paper. What is important for us here is that the existence of a joint distribution mentioned in our definition is equivalent to the existence of a random variable R and the functions f, g mentioned in the introductory paragraph.

It is easy to show (Dzhafarov, 2003) that the existence of a triple f, g, R for given joint distributions of (A, B) under different treatments (α, β) is equivalent to the existence of a quintuple $f', g', S, S_A, S_B$, where $S, S_A, S_B$ are random variables, such that $f'(\alpha, S, S_A) = A$ and $g'(\beta, S, S_B) = B$. In such a representation, one can speak of a common source of randomness S and specific sources of randomness $S_A$, $S_B$. In Dzhafarov et al. (2004) this representation was used to investigate different series-parallel arrangements of the hypothetical durations A and B. The reason this representation has been considered convenient is that if one fixes the value S = s, then $f'(\alpha, s, S_A) = A_{\alpha s}$ and $g'(\beta, s, S_B) = B_{\beta s}$ are stochastically independent random variables. One can therefore use theorems proved for stochastically independent selectively influenced components (Schweickert et al., 2000) to obtain a general result by averaging across possible values of s. For instance, let α, β be binary factors (with values 1, 2 each), and let us assume that the observed duration $T_{\alpha\beta}$ is $\min(A_{\alpha}, B_{\beta})$ for every treatment (α, β). Then $T_{\alpha\beta s} = \min(A_{\alpha s}, B_{\beta s})$ for every value S = s, and it is known that, for the independent $A_{\alpha s}, B_{\beta s}$ (satisfying a prolongation condition, as explained below),

$$\Pr\left(T_{11s} \le t\right) - \Pr\left(T_{12s} \le t\right) - \Pr\left(T_{21s} \le t\right) + \Pr\left(T_{22s} \le t\right) \le 0. \tag{4}$$

Since this should be true for every value S = s, then it should also be true that

$$\begin{split} \mathcal{C}\left(t\right) = \Pr\left(T\_{11} \le t\right) - \Pr\left(T\_{12} \le t\right) - \Pr\left(T\_{21} \le t\right) \\ &+ \Pr\left(T\_{22} \le t\right) \le 0. \end{split} \tag{5}$$

This follows from the fact that

$$\Pr\left(T_{\alpha\beta} \le t\right) = \int \Pr\left(T_{\alpha\beta s} \le t\right) \mathrm{d}m\left(s\right), \tag{6}$$

where m(s) is the probability measure for S, and the integration is over the space of all possible s. The linear combination C(t) in (5) is called the interaction contrast of distribution functions.

The Prolongation Assumption used in Dzhafarov et al. (2004), and derived from Townsend (1984, 1990a) and Townsend and Schweickert (1989), is that, for every S = s,

$$\Pr\left(A\_{1s} \le t\right) \ge \Pr\left(A\_{2s} \le t\right), \text{ } \Pr\left(B\_{1s} \le t\right) \ge \Pr\left(B\_{2s} \le t\right). \tag{7}$$

For this particular architecture, T = min(A, B), this is the only assumption needed. To prove analogous results for more complex mental architectures, however, one needs additional assumptions, such as the existence of density functions for $A_{\alpha s}, B_{\beta s}$ at every s, and even certain ordering of these density functions in some vicinity [0, τ].

The same results, however, can be obtained without these additional assumptions, if one adopts the other, equivalent definition of selective influences: f(α, R) = A and g(β, R) = B, for some triple f, g, R. If such a representation exists, then

$$a_{\alpha r} = f\left(\alpha, r\right), \quad b_{\beta r} = g\left(\beta, r\right) \tag{8}$$

are deterministic quantities (real numbers), for every value R = r. Any real number x in turn can be viewed as a random variable whose distribution function is a shifted Heaviside function

$$h(t - \mathbf{x}) = \begin{cases} 0, \text{ if } t < \mathbf{x}, \\ 1, \text{ if } t \ge \mathbf{x}. \end{cases} \tag{9}$$

In particular, the quantity $t_{\alpha\beta r} = \min(a_{\alpha r}, b_{\beta r})$ for the simple architecture T = min(A, B) considered above is distributed according to

$$h\left(t - t\_{\alpha\beta r}\right) = h\_{\alpha\beta r}\left(t\right). \tag{10}$$

Let us see how inequality (5) can be derived using these observations.

We first formulate the (conditional) Prolongation Assumption, a deterministic version of (7): the assumption is that f, g, R can be so chosen that for every R = r,

$$a\_{1r} \le a\_{2r}, \ b\_{1r} \le b\_{2r}.\tag{11}$$

Without loss of generality, we can also assume, for any given r,

$$a\_{1r} \le b\_{1r} \tag{12}$$

(if not, rename a into b and vice versa).

Remark 1.1. The Prolongation Assumption clearly implies (7). Conversely, if (7) holds, one can always find f, g, R for which the Prolongation Assumption holds in the form above. For instance, one can choose $R = (S, S_A, S_B)$, take $S_A$ and $S_B$ to be uniformly distributed between 0 and 1, and choose f(α, . . .), g(β, . . .) to be the quantile functions for the hypothetical distributions of A and B at the corresponding factor levels.

We next form the conditional (i.e., conditioned on R = r) interaction contrast

$$c\_r(t) = h\_{11r}(t) - h\_{12r}(t) - h\_{21r}(t) + h\_{22r}(t) \,. \tag{13}$$

**Notation Convention.** When r is fixed throughout a discussion, we omit this argument and write $a_{\alpha}, b_{\beta}, t_{\alpha\beta}, h_{\alpha\beta}(t), c(t)$ in place of $a_{\alpha r}, b_{\beta r}, t_{\alpha\beta r}, h_{\alpha\beta r}(t), c_r(t)$. (For binary factors α, β, we also conveniently replace α, β in indexation with i, j.)

Following this convention, there are three different arrangements of $a_1, a_2, b_1, b_2$ (for a given R = r) satisfying (11)–(12):

$$\begin{array}{l} \text{(i)} \ a\_1 \le b\_1 \le a\_2 \le b\_2\\ \text{(ii)} \ a\_1 \le a\_2 \le b\_1 \le b\_2\\ \text{(iii)} \ a\_1 \le b\_1 \le b\_2 \le a\_2 \end{array} \tag{14}$$

In all three cases,

$$t\_{11} = \min\left(a\_1, b\_1\right) = a\_1 = \min\left(a\_1, b\_2\right) = t\_{12}.\tag{15}$$

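For arrangement (i) we have

$$t_{21} = \min\left(a_2, b_1\right) = b_1 \le a_2 = \min\left(a_2, b_2\right) = t_{22}$$

and

$$\begin{array}{c|cccc} & t < t_{11} & t_{11} = t_{12} \le t < t_{21} & t_{21} \le t < t_{22} & t \ge t_{22} \\ \hline h_{11}(t) & 0 & 1 & 1 & 1 \\ h_{12}(t) & 0 & 1 & 1 & 1 \\ h_{21}(t) & 0 & 0 & 1 & 1 \\ h_{22}(t) & 0 & 0 & 0 & 1 \\ \hline c(t) & 0 & 0 & -1 & 0 \end{array}$$

This diagram shows the values of $h_{ijr}(t)$ and the resulting values of $c_r(t)$ as t changes with respect to the fixed positions of $t_{ijr}$ (with index r dropped everywhere). Analogously, for arrangements (ii) and (iii), we have, respectively, $t_{21} = a_2 = t_{22}$, so that $c(t) = 0$ for all t, and $t_{21} = b_1 \le b_2 = t_{22}$, so that $c(t) = -1$ for $t_{21} \le t < t_{22}$ and $c(t) = 0$ otherwise.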

In all three cases, c(t) is obviously zero for $t < t_{11}$ and $t \ge t_{22}$. We see that $c(t) = c_r(t) \le 0$ for all t and every R = r. It follows that C(t) ≤ 0, because

$$\Pr\left(T_{ij} \le t\right) = \int h_{ijr}\left(t\right) \mathrm{d}\mu\left(r\right), \tag{16}$$

for i, j ∈ {1, 2}, and

$$\mathcal{C}\left(t\right) = \int c\_r\left(t\right) \mathrm{d}\mu\left(r\right) \le 0,\tag{17}$$

where µ is the probability measure associated with R and the integration is over all possible r. We obtain the same result as in (5), but in a very different way.
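The argument leading to (17) can be illustrated numerically. The sketch below is a Python illustration under stated assumptions: the particular functions f and g (hidden inside `sample_durations`), the exponential distribution of the components of r, and all function names are hypothetical choices made only so that A and B share a common source of randomness and satisfy (11). Convention (12) is not enforced, since it is only a relabeling.

```python
import random

def heaviside(t, x):
    """Distribution function (9) of the constant x, evaluated at t."""
    return 1 if t >= x else 0

def conditional_contrast(t, a, b, compose=min):
    """c_r(t) of (13) for a fixed r; a = (a_1r, a_2r), b = (b_1r, b_2r)."""
    h = {(i, j): heaviside(t, compose(a[i], b[j]))
         for i in (0, 1) for j in (0, 1)}
    return h[0, 0] - h[0, 1] - h[1, 0] + h[1, 1]

def sample_durations(rng):
    """Hypothetical f and g: both durations depend on the shared value r = (x, y)."""
    x, y = rng.expovariate(1.0), rng.expovariate(1.0)
    a = (x, 2.0 * x)            # a_1r <= a_2r, as required by (11)
    b = (x + y, x + 3.0 * y)    # b_1r <= b_2r, as required by (11)
    return a, b

def observable_contrast(t, n=2000, seed=0):
    """Monte Carlo approximation of C(t) in (17): the average of c_r(t) over r."""
    rng = random.Random(seed)
    return sum(conditional_contrast(t, *sample_durations(rng))
               for _ in range(n)) / n
```

Because every sampled value of `conditional_contrast` is either 0 or −1 for `compose=min`, the average returned by `observable_contrast` cannot be positive, which is the content of (17) for this architecture.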

In this paper we extend this approach to other mental architectures belonging to the class of series-parallel networks, those involving other composition operations and possibly more than just two selectively influenced processes. In doing so we follow a long trail of work mentioned earlier. When dealing with multiple processes we follow Yang et al. (2013) in using high-order interaction contrasts. All our results are replications or straightforward generalizations of the results already known: the primary value of our work therefore is not in characterizing mental architectures, but rather in demonstrating a new theoretical approach and a new proof technique.

#### 1.1. Definitions, Terminology, and Notation

Since we deal with the durations of processes rather than the processes themselves, we use the term composition to describe a function that relates the durations of the components of a network to the overall (observed) duration. Formally, a composition is a real-valued function t = t(a, b, . . . , z) of an arbitrary number of real-valued arguments. The arguments a, b, . . . , z are referred to as durations or components. In this article, we will use X ∧ Y ∧ . . . ∧ Z to denote min(X, Y, . . . , Z), and X ∨ Y ∨ . . . ∨ Z to denote max(X, Y, . . . , Z).

A series-parallel composition (SP) is defined as follows.

**Definition 1.2.** (1) A single duration is an SP composition. (2) If X and Y are SP compositions with disjoint sets of arguments, then X ∧ Y, X ∨ Y, and X + Y are SP compositions. (3) There are no other SP compositions than those constructible by Rules 1 and 2.

Remark 1.3. The requirement that X and Y in Rule 2 have disjoint sets of arguments prevents expressions like X ∧ X or X + X ∨ Y. But if the second X in X ∧ X is renamed into X′, or X ∨ Y in X + X ∨ Y is renamed into Z, then the resulting X ∧ X′ and X + Z are legitimate SP compositions. This follows from the generality of our treatment, in which different components of an SP composition may have arbitrary joint distributions: e.g., X and X′ in X ∧ X′ may very well be jointly distributed so that Pr[X = X′] = 1. One should, however, always keep in mind the pattern of selective influences: thus, if X is influenced by α, then Z is also influenced by α in X + Z above.

Any SP composition is obtained by a successive application of Rules 1 and 2 (the sequence being generally non-unique), and at any intermediate stage of such a sequence we also have an SP composition that we can term a subcomposition.

**Definition 1.4.** Two durations X, Y in an SP composition are said to be parallel or concurrent if there is a subcomposition of this SP composition of the form $\mathrm{SP}^1(X, X', \ldots) \wedge \mathrm{SP}^2(Y, Y', \ldots)$ (in which case X, Y are said to be min-parallel) or $\mathrm{SP}^1(X, X', \ldots) \vee \mathrm{SP}^2(Y, Y', \ldots)$ (X, Y are max-parallel). X, Y in an SP composition are said to be sequential or serial if there is a subcomposition of this SP composition of the form $\mathrm{SP}^1(X, X', \ldots) + \mathrm{SP}^2(Y, Y', \ldots)$.

**Definition 1.5.** An SP composition is called homogeneous if it does not contain both ∧ and ∨ in it; if it does not contain ∧, it is denoted $\mathrm{SP}_{\vee}$; if it does not contain ∨, it is denoted $\mathrm{SP}_{\wedge}$.

The only SP composition that is both $\mathrm{SP}_{\wedge}$ and $\mathrm{SP}_{\vee}$ is a purely serial one: a + b + . . . + z. Most of the results previously obtained for mental networks are confined to homogeneous compositions. We will not need this constraint for the most part.
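Definitions 1.2, 1.4, and 1.5 describe a recursive structure, which can be made concrete as a small expression tree. The following Python sketch is only an illustration: the class names `Dur` and `Op`, the string labels `"min"`, `"max"`, `"sum"`, and the helper functions are assumptions introduced here, not notation from the article.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Dur:
    """Rule 1: a single duration, identified by its name."""
    name: str

@dataclass(frozen=True)
class Op:
    """Rule 2: two SP compositions joined by "min" (∧), "max" (∨), or "sum" (+)."""
    kind: str
    left: "SP"
    right: "SP"

SP = Union[Dur, Op]

def arguments(sp):
    if isinstance(sp, Dur):
        return {sp.name}
    return arguments(sp.left) | arguments(sp.right)

def is_valid(sp):
    """Rule 2 requires the two operands to have disjoint sets of arguments."""
    if isinstance(sp, Dur):
        return True
    return (sp.kind in ("min", "max", "sum")
            and is_valid(sp.left) and is_valid(sp.right)
            and not (arguments(sp.left) & arguments(sp.right)))

def kinds(sp):
    if isinstance(sp, Dur):
        return set()
    return {sp.kind} | kinds(sp.left) | kinds(sp.right)

def is_homogeneous(sp):
    """Definition 1.5: the composition does not contain both ∧ and ∨."""
    return not {"min", "max"} <= kinds(sp)

def relation(sp, x, y):
    """Definition 1.4: the operation of the subcomposition that separates x and y."""
    if isinstance(sp, Dur):
        return None
    left, right = arguments(sp.left), arguments(sp.right)
    if (x in left and y in right) or (x in right and y in left):
        return sp.kind
    return relation(sp.left, x, y) or relation(sp.right, x, y)

def evaluate(sp, values):
    """Overall duration for fixed component durations (a dict from name to number)."""
    if isinstance(sp, Dur):
        return values[sp.name]
    u, v = evaluate(sp.left, values), evaluate(sp.right, values)
    return min(u, v) if sp.kind == "min" else max(u, v) if sp.kind == "max" else u + v
```

For instance, the composition (A + X) ∧ (B ∨ Y) is valid and non-homogeneous; in it A and B are min-parallel, A and X are sequential, and B and Y are max-parallel.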

Since we will be dealing with compositions of more than just two components, we need to extend the definition of selective influences mentioned above. In the formulation below, ∼ stands for "has the same distribution as." A treatment $\phi = (\lambda^1_{i_1}, \ldots, \lambda^n_{i_n})$ is a vector of values of the factors $\lambda^1, \ldots, \lambda^n$, the values of $\lambda^k$ (k = 1, . . . , n) being indicated by subscripts, $\lambda^k_{i_k}$.

**Definition 1.6.** Random variables $(X^1, \ldots, X^n)$ are selectively influenced by factors $(\lambda^1, \ldots, \lambda^n)$, respectively,

$$(X^1, \dots, X^n) \looparrowleft (\lambda^1, \dots, \lambda^n), \tag{18}$$

if for some random variable R, whose distribution does not depend on $(\lambda^1, \ldots, \lambda^n)$, and for some functions $g_1, \ldots, g_n$,

$$(X^1_{\phi}, \dots, X^n_{\phi}) \sim (g_1(\lambda^1_{i_1}, R), \dots, g_n(\lambda^n_{i_n}, R)), \tag{19}$$

for any treatment $\phi = (\lambda^1_{i_1}, \ldots, \lambda^n_{i_n})$.

In the subsequent discussion we assume that all non-dummy factors involved are binary in a completely crossed design (i.e., the overall time T is recorded for all 2<sup>n</sup> vectors of values for φ). When we have random variables not influenced by any of these factors, we will say they are selectively influenced by an empty set of factors (we could also, equivalently, introduce for them dummy factors, with one value each).

# 2. SP Compositions Containing Two Selectively Influenced Processes

Consider two processes, with durations A and B in an SP composition. The overall duration of this SP composition can be written as a function of A, B and other components: T = T(A, B, . . .). We assume that A, B, and all other components are selectively influenced by α, β, and empty set, respectively: $(A, B, \ldots) \looparrowleft (\alpha, \beta, \emptyset)$. Let each factor have two levels: α = 1, 2 and β = 1, 2, with four allowable treatments (1, 1), (1, 2), (2, 1), and (2, 2). The corresponding overall durations (random variables) are written as $T_{11}$, $T_{12}$, $T_{21}$, and $T_{22}$.

By Definition 1.6 of selective influences, each process duration (a random variable) is a function of some random variable R and the corresponding factor: A = a (α, R), B = b (β, R). For any given value R = r, the component durations are fixed numbers,

$$\begin{array}{ll} a\left(\alpha=1,r\right) = a_{1r}, & a\left(\alpha=2,r\right) = a_{2r}, \\ b\left(\beta=1,r\right) = b_{1r}, & b\left(\beta=2,r\right) = b_{2r}, \\ x\left(\emptyset,r\right) = x_r, & \end{array} \tag{20}$$

where x is the value of any duration X in the composition other than A and B. We assume that R is chosen so that the Prolongation Assumption (11) holds, with the convention (12).

The overall duration T at R = r is also a fixed number, written as (recall that we replace α, β in indexation with i, j)

$$T\left(a_{ir}, b_{jr}, \dots\right) = t_{ijr}, \quad i, j \in \{1, 2\}. \tag{21}$$

The distribution function for $t_{ijr}$ is the shifted Heaviside function $h_{ijr}(t) = h(t - t_{ijr})$,

$$h_{ijr}(t) = \begin{cases} 0, \text{ if } t < t_{ijr}, \\ 1, \text{ if } t \ge t_{ijr}. \end{cases} \tag{22}$$

The conditional interaction contrast $c_r(t)$ is defined by (13). Denoting by $H_{ij}(t)$ the distribution function of $T_{ij}$, we have

$$H_{ij}\left(t\right) = \int_{\mathcal{R}} h_{ijr}\left(t\right) \mathrm{d}\mu_r, \tag{23}$$

with $\mathcal{R}$ denoting the set of possible values of R. For the observable (i.e., estimable from data) interaction contrast

$$C\left(t\right) = H\_{11}\left(t\right) - H\_{12}\left(t\right) - H\_{21}\left(t\right) + H\_{22}\left(t\right), \tag{24}$$

we have then

$$\mathcal{C}\left(t\right) = \int\_{\mathcal{R}} c\_r\left(t\right) \mathrm{d}\mu\_r. \tag{25}$$
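Since C(t) in (24) is a linear combination of four distribution functions, a natural estimate replaces each $H_{ij}$ by an empirical distribution function. The Python sketch below shows one way to do this; the function names and the layout of `samples` (a dictionary from treatment (i, j) to a list of observed response times) are assumptions made for illustration, and no claim is made about the statistical properties of the estimate.

```python
from bisect import bisect_right

def ecdf(sample):
    """Empirical distribution function of a non-empty sample of response times."""
    data = sorted(sample)
    n = len(data)
    # bisect_right counts the observations that are <= t
    return lambda t: bisect_right(data, t) / n

def interaction_contrast(samples):
    """Plug-in estimate of C(t) in (24); samples[(i, j)] holds the times for T_ij."""
    H = {key: ecdf(obs) for key, obs in samples.items()}
    return lambda t: H[1, 1](t) - H[1, 2](t) - H[2, 1](t) + H[2, 2](t)
```

A call such as `interaction_contrast(samples)(t)` then returns the estimated contrast at time t, which can be compared with the sign predictions derived for the different architectures.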

Note that it follows from our Prolongation Assumption that

$$H\_{11}\ (t) \ge H\_{12}\ (t) \ , \ H\_{21}\ (t) \ge H\_{22}\ (t) \ ,$$

$$H\_{11}\ (t) \ge H\_{21}\ (t) \ , \ H\_{12}\ (t) \ge H\_{22}\ (t) \ . \tag{26}$$

We also define two conditional cumulative interaction contrasts (conditioned on R = r):

$$c\left(0,t\right) = \int_0^t c\left(\tau\right) \mathrm{d}\tau, \tag{27}$$

$$c\left(t,\infty\right) = \int_{t}^{\infty} c\left(\tau\right) \mathrm{d}\tau = \lim_{u \to \infty} \int_{t}^{u} c\left(\tau\right) \mathrm{d}\tau. \tag{28}$$

The corresponding observable cumulative interaction contrasts are

$$\begin{split} C(0,t) &= \int_{\mathcal{R}} c(0,t) \, \mathrm{d}\mu_{r} = \int_{\mathcal{R}} \left( \int_{0}^{t} c(\tau) \, \mathrm{d}\tau \right) \mathrm{d}\mu_{r} \\ &= \int_{0}^{t} \left( \int_{\mathcal{R}} c(\tau) \, \mathrm{d}\mu_{r} \right) \mathrm{d}\tau = \int_{0}^{t} C(\tau) \, \mathrm{d}\tau, \end{split} \tag{29}$$

$$\begin{split} C(t,\infty) &= \int_{\mathcal{R}} c(t,\infty) \, \mathrm{d}\mu_{r} = \int_{\mathcal{R}} \left( \int_{t}^{\infty} c(\tau) \, \mathrm{d}\tau \right) \mathrm{d}\mu_{r} \\ &= \int_{t}^{\infty} \left( \int_{\mathcal{R}} c(\tau) \, \mathrm{d}\mu_{r} \right) \mathrm{d}\tau = \int_{t}^{\infty} C(\tau) \, \mathrm{d}\tau. \end{split} \tag{30}$$

In these formulas we could switch the order of integration by Fubini's theorem, because, for any interval of reals I,

$$\int_{I \times \mathcal{R}} |c(\tau)| \, \mathrm{d}(\tau \times \mu_r) \le \int_{I \times \mathcal{R}} 2 \, \mathrm{d}(\tau \times \mu_r) \le 2. \tag{31}$$

Frontiers in Psychology | www.frontiersin.org June 2015 | Volume 6 | Article 735 |

### 2.1. Four lemmas

Recall the definition of $c_r(t)$ in (13). We follow our Notation Convention and drop the index r in $c_r(t)$ and all other expressions for a fixed r.

**Lemma 2.1.** In any SP architecture, for any r,

$$t\_{11} \le t\_{12} \land t\_{21} \le t\_{12} \lor t\_{21} \le t\_{22}.$$

Proof. Follows from the (nonstrict) monotonicity of the SP composition in all arguments.

**Lemma 2.2.** In any SP architecture, for any r, c(t) equals 0 for all values of t except for two cases:

(Case+) if $t_{11} \le t < t_{12} \wedge t_{21}$, then $c(t) = 1 - 0 - 0 + 0 > 0$, and

(Case−) if $t_{12} \vee t_{21} \le t < t_{22}$, then $c(t) = 1 - 1 - 1 + 0 < 0$.

Proof. By direct computation.

**Lemma 2.3.** In any SP architecture, for any r, $c(t) \le 0$ for all values of t if and only if $t_{11} = t_{12} \wedge t_{21}$; $c(t) \ge 0$ for all values of t if and only if $t_{12} \vee t_{21} = t_{22}$.

Proof. Immediately follows from Lemma 2.2.
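The "direct computation" behind Lemmas 2.2 and 2.3 can be replayed exhaustively on a small grid. The Python sketch below is an illustration only; the function names and the choice of probe points are assumptions. It relies on the fact that c(t) is a right-continuous step function, so probing the grid points, the midpoints above them, and one point below the grid covers every value c(t) takes.

```python
import itertools

def contrast(t, t11, t12, t21, t22):
    """c(t) of (13) with the index r dropped."""
    return int(t >= t11) - int(t >= t12) - int(t >= t21) + int(t >= t22)

def lemma_2_2(t, t11, t12, t21, t22):
    """Value of c(t) as stated in Lemma 2.2."""
    if t11 <= t < min(t12, t21):
        return 1    # Case+
    if max(t12, t21) <= t < t22:
        return -1   # Case-
    return 0

def check_on_grid(values):
    """Compare the lemmas with brute force for all orderings allowed by Lemma 2.1."""
    values = sorted(values)
    probes = values + [v + 0.5 for v in values] + [values[0] - 1]
    for q in itertools.product(values, repeat=4):
        t11, t12, t21, t22 = q
        if not (t11 <= min(t12, t21) and max(t12, t21) <= t22):
            continue  # ordering excluded by Lemma 2.1
        if any(contrast(t, *q) != lemma_2_2(t, *q) for t in probes):
            return False
        never_positive = all(contrast(t, *q) <= 0 for t in probes)
        never_negative = all(contrast(t, *q) >= 0 for t in probes)
        if never_positive != (t11 == min(t12, t21)):  # Lemma 2.3, first part
            return False
        if never_negative != (max(t12, t21) == t22):  # Lemma 2.3, second part
            return False
    return True
```

Calling `check_on_grid(range(4))` runs through all admissible quadruples with entries in {0, 1, 2, 3}.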

**Lemma 2.4.** In any SP architecture, for any r,

(i) $c(0, t) = \int_0^t c(\tau)\, \mathrm{d}\tau \ge 0$ for any t if and only if $-t_{11} + t_{12} + t_{21} - t_{22} \ge 0$,

(ii) $c(t, \infty) = \int_t^{\infty} c(\tau)\, \mathrm{d}\tau \le 0$ for any t if and only if $-t_{11} + t_{12} + t_{21} - t_{22} \le 0$,

(iii) $\lim_{t \to \infty} c(0, t) = 0$ if and only if $-t_{11} + t_{12} + t_{21} - t_{22} = 0$,

(iv) $\lim_{t \to 0} c(t, \infty) = 0$ if and only if $-t_{11} + t_{12} + t_{21} - t_{22} = 0$.

Proof. Without loss of generality, put $t_{12} \le t_{21}$. We have

$$c(0, t) = \begin{cases} 0 & \text{if } t < t_{11}, \\ t - t_{11} & \text{if } t_{11} \le t < t_{12}, \\ (t - t_{11}) - (t - t_{12}) = t_{12} - t_{11} & \text{if } t_{12} \le t < t_{21}, \\ (t - t_{11}) - (t - t_{12}) - (t - t_{21}) = -t_{11} + t_{12} + t_{21} - t & \text{if } t_{21} \le t < t_{22}, \\ (t - t_{11}) - (t - t_{12}) - (t - t_{21}) + (t - t_{22}) = -t_{11} + t_{12} + t_{21} - t_{22} & \text{if } t \ge t_{22}. \end{cases}$$

The expressions for the first three cases are obviously nonnegative. If $-t_{11} + t_{12} + t_{21} - t_{22} \ge 0$, then $c(0, t) \ge 0$ for all t in the last case ($t \ge t_{22}$). With $-t_{11} + t_{12} + t_{21} - t_{22} \ge 0$, we have $-t_{11} + t_{12} + t_{21} - t \ge t_{22} - t \ge 0$ for the fourth case ($t_{21} \le t < t_{22}$). Hence $c(0, t) \ge 0$ for all t if $-t_{11} + t_{12} + t_{21} - t_{22} \ge 0$. Conversely, if $c(0, t) \ge 0$ for all t, then it is also true for $t \ge t_{22}$, whence $-t_{11} + t_{12} + t_{21} - t_{22} \ge 0$.

The proof for $c(t, \infty) = \int_t^{\infty} c(\tau)\, \mathrm{d}\tau \le 0$ requires replacing the integral first with $\int_t^u c(\tau)\, \mathrm{d}\tau$ for some $u > t_{22}$. We have

$$\int_{t}^{u} c(\tau)\, \mathrm{d}\tau = \begin{cases} (u - t_{11}) - (u - t_{12}) - (u - t_{21}) + (u - t_{22}) = -t_{11} + t_{12} + t_{21} - t_{22} & \text{if } t < t_{11}, \\ (u - t) - (u - t_{12}) - (u - t_{21}) + (u - t_{22}) = -t + t_{12} + t_{21} - t_{22} & \text{if } t_{11} \le t < t_{12}, \\ (u - t) - (u - t) - (u - t_{21}) + (u - t_{22}) = t_{21} - t_{22} & \text{if } t_{12} \le t < t_{21}, \\ (u - t) - (u - t) - (u - t) + (u - t_{22}) = t - t_{22} & \text{if } t_{21} \le t < t_{22}, \\ (u - t) - (u - t) - (u - t) + (u - t) = 0 & \text{if } t \ge t_{22}. \end{cases}$$

The expressions for the last three cases are obviously nonpositive. If $-t_{11} + t_{12} + t_{21} - t_{22} \le 0$, then $\int_t^u c(\tau)\, \mathrm{d}\tau \le 0$ for all t in the first case ($t < t_{11}$). With $-t_{11} + t_{12} + t_{21} - t_{22} \le 0$, we have $-t + t_{12} + t_{21} - t_{22} \le t_{11} - t \le 0$ for the second case ($t_{11} \le t < t_{12}$). Hence $\int_t^u c(\tau)\, \mathrm{d}\tau \le 0$ for all t if $-t_{11} + t_{12} + t_{21} - t_{22} \le 0$. Since in all expressions u is algebraically eliminated, they remain unchanged as u → ∞. Conversely, if $c(t, \infty) \le 0$ for all t, then it is also true for $t < t_{11}$, whence $-t_{11} + t_{12} + t_{21} - t_{22} \le 0$.

The statements (iii) and (iv) follow trivially.
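Statement (i) of Lemma 2.4 can also be checked numerically, using the fact that the integral of the shifted Heaviside function h(τ − x) over [0, t] equals max(0, t − x) for x ≥ 0. The Python sketch below is an illustration under that assumption of nonnegative durations; the function names are hypothetical.

```python
def ramp(t, x):
    """Integral of h(tau - x) over tau in [0, t], for x >= 0."""
    return max(0.0, t - x)

def cumulative_contrast(t, t11, t12, t21, t22):
    """c(0, t) of (27) in closed form."""
    return ramp(t, t11) - ramp(t, t12) - ramp(t, t21) + ramp(t, t22)

def linear_combination(t11, t12, t21, t22):
    """The quantity -t11 + t12 + t21 - t22 that decides the sign in Lemma 2.4."""
    return -t11 + t12 + t21 - t22

def nonnegative_everywhere(t11, t12, t21, t22):
    """c(0, t) is piecewise linear, zero before t11 and constant after t22,
    so it is nonnegative everywhere if it is nonnegative at its four kinks."""
    kinks = (t11, t12, t21, t22)
    return all(cumulative_contrast(t, *kinks) >= 0 for t in kinks)
```

For $t \ge t_{22}$ the value of `cumulative_contrast` coincides with `linear_combination`, which is the last case in the proof above.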

## 2.2. Parallel Processes

#### 2.2.1. Simple Parallel Architectures of Size 2

A simple parallel architecture corresponds to one of the two compositions: T = A ∧ B or T = A ∨ B, with $(A, B) \looparrowleft (\alpha, \beta)$. Recall the definition of C(t) in (24).

**Theorem 2.5.** For T = A ∧ B, we have c(t) ≤ 0 for any r, t; consequently, C (t) ≤ 0 for any t. For T = A∨B, we have c(t) ≥ 0 for any r, t; consequently, C (t) ≥ 0 for any t.

Proof. For T = A ∧ B with the Prolongation Assumption (11)–(12), we have

$$t\_{11} = a\_1 \land b\_1 = a\_1, \ t\_{12} = a\_1 \land b\_2, \ t\_{21} = a\_2 \land b\_1.$$

It follows that

$$t\_{12} \wedge t\_{21} = a\_1 \wedge b\_2 \wedge a\_2 \wedge b\_1 = a\_1 = t\_{11}.$$

By Lemma 2.3, c(t) ≤ 0. As C(t) in (25) preserves the sign of c(t), we have C(t) ≤ 0. For T = A ∨ B, we have

$$t\_{22} = a\_2 \lor b\_2, \ t\_{12} = a\_1 \lor b\_2, \ t\_{21} = a\_2 \lor b\_1.$$

It follows that

$$t\_{12} \lor t\_{21} = a\_1 \lor b\_2 \lor a\_2 \lor b\_1 = t\_{22},$$

whence, by Lemma 2.3, c(t) ≥ 0 and therefore C (t) ≥ 0.

### 2.2.2. Two Parallel Processes in an Arbitrary SP Network

Consider now a composition SP(A, B, . . .) with $(A, B, \ldots) \looparrowleft (\alpha, \beta, \emptyset)$.

**Lemma 2.6.** If A, B in SP(A, B, . . .) are parallel, then SP(A, B, . . .) can be presented as $A' \wedge B'$ if they are min-parallel, or as $A' \vee B'$ if they are max-parallel, so that $(A', B') \looparrowleft (\alpha, \beta)$ and, for any fixed R = r, the Prolongation Assumption holds.

Proof. By Definitions 1.2 and 1.4, if A, B are min-parallel, then SP(A, B, . . .) can be presented either as

$$\mathrm{SP}^1(A, \dots) \wedge \mathrm{SP}^2(B, \dots)$$

or

$$\left(\mathrm{SP}^1(A,\ldots)\wedge\mathrm{SP}^2(B,\ldots)+X\right)\wedge Y,$$

or else

$$\left(\mathrm{SP^1}(A,\ldots)\wedge\mathrm{SP^2}(B,\ldots)\wedge X\right)+Y,$$

where B does not enter in SP<sup>1</sup> and A does not enter in SP<sup>2</sup> . On renaming

$$\underbrace{\mathrm{SP}^1(A,\dots)}_{=A'} \wedge \underbrace{\mathrm{SP}^2(B,\dots)}_{=B'},$$

$$\begin{aligned} &\left(\mathrm{SP}^1(A,\dots) \wedge \mathrm{SP}^2(B,\dots) + X\right) \wedge Y \\ &\quad = \underbrace{\left(\mathrm{SP}^1(A,\dots) + X\right)}_{=A'} \wedge \underbrace{\left(\mathrm{SP}^2(B,\dots) + X\right) \wedge Y}_{=B'}, \end{aligned}$$

and

$$\begin{aligned} &\left(\mathrm{SP}^1(A,\dots) \wedge \mathrm{SP}^2(B,\dots) \wedge X\right) + Y \\ &\quad = \underbrace{\left(\mathrm{SP}^1(A,\dots) + Y\right)}_{=A'} \wedge \underbrace{\left(\mathrm{SP}^2(B,\dots) \wedge X + Y\right)}_{=B'}, \end{aligned}$$

we have, obviously, $(A', B') \looparrowleft (\alpha, \beta)$. Fixing R = r, by the (nonstrict) monotonicity of SP compositions,

$$a\_1' = \operatorname{SP}^1(a\_1, \dots) \le \operatorname{SP}^1(a\_2, \dots) = a\_2'$$

and

$$b\_1' = \mathrm{SP}^2(b\_1, \dots) \le \mathrm{SP}^2(b\_2, \dots) = b\_2'$$

We can also put $a_1' = \mathrm{SP}^1(a_1, \ldots) \le \mathrm{SP}^2(b_1, \ldots) = b_1'$ (otherwise we can rename the variables). The proof for the max-parallel case is analogous.

**Theorem 2.7.** If A, B in SP(A, B, . . .) are min-parallel, then c(t) ≤ 0 for any r, t; consequently, C(t) ≤ 0 for any t. If A, B in SP(A, B, . . .) are max-parallel, then c(t) ≥ 0 for any r, t; consequently, C(t) ≥ 0 for any t.

Proof. Immediately follows from Lemma 2.6 and Theorem 2.5.
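Theorem 2.7 can be spot-checked on non-homogeneous networks. In the Python sketch below, the two example networks, the uniform sampling of the fixed durations, and the function names are all assumptions chosen for illustration; the check itself only evaluates c(t) at the four jump points $t_{ij}$, which is sufficient because c(t) is a right-continuous step function that vanishes before the first jump.

```python
import random

def min_parallel(a, b, x, y):
    """Hypothetical network (A + X) ∧ (B ∨ Y): A and B are min-parallel."""
    return min(a + x, max(b, y))

def max_parallel(a, b, x, y):
    """Hypothetical network (A + X) ∨ (B ∧ Y): A and B are max-parallel."""
    return max(a + x, min(b, y))

def contrast_sign_holds(compose, sign, trials=500, seed=0):
    """Return True if sign * c(t) >= 0 at every jump point, for random fixed durations."""
    rng = random.Random(seed)
    for _ in range(trials):
        a1, b1, x, y = (rng.uniform(0.0, 5.0) for _ in range(4))
        a2 = a1 + rng.uniform(0.0, 5.0)  # a_1 <= a_2, as in (11)
        b2 = b1 + rng.uniform(0.0, 5.0)  # b_1 <= b_2, as in (11)
        times = {(i, j): compose(a, b, x, y)
                 for i, a in ((1, a1), (2, a2))
                 for j, b in ((1, b1), (2, b2))}
        for t in times.values():
            h = {key: int(t >= value) for key, value in times.items()}
            c = h[1, 1] - h[1, 2] - h[2, 1] + h[2, 2]
            if sign * c < 0:
                return False
    return True
```

With `sign=-1` the function tests c(t) ≤ 0 (the min-parallel prediction), and with `sign=+1` it tests c(t) ≥ 0 (the max-parallel prediction).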

### 2.3. Sequential Processes

#### 2.3.1. Simple Serial Architectures of Size 2

A simple serial architecture of size 2 corresponds to the SP composition T = A + B, with $(A, B) \looparrowleft (\alpha, \beta)$. Recall the definitions of the two cumulative interaction contrasts: (27)–(30).

**Theorem 2.8.** If T = A + B, then c(0, t) ≥ 0 and c(t,∞) ≤ 0 for any r, t; moreover,

$$\lim\_{t \to \infty} c\left(0, t\right) = \lim\_{t \to 0} c\left(t, \infty\right) = 0,$$

for any r, t. Consequently, C (0, t) ≥ 0, C (t,∞) ≤ 0 for any t, and

$$\lim\_{t \to \infty} C\left(0, t\right) = \lim\_{t \to 0} C\left(t, \infty\right) = 0$$

Proof. Follows immediately from Lemma 2.4, since

$$\begin{aligned} -t_{11} + t_{12} + t_{21} - t_{22} &= -\left(a_1 + b_1\right) + \left(a_1 + b_2\right) \\ &\quad + \left(a_2 + b_1\right) - \left(a_2 + b_2\right) = 0. \end{aligned}$$
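The serial case can be traced with concrete numbers. The Python sketch below uses the closed form of c(t, ∞) that results from the case analysis in the proof of Lemma 2.4: the integral of h(τ − x) over [t, u] equals u − max(t, x) for large u, and u cancels because the four signs in the contrast sum to zero. The function names and the numerical example are assumptions made for illustration.

```python
def serial_times(a1, a2, b1, b2):
    """t_11, t_12, t_21, t_22 for the composition T = A + B at a fixed r."""
    return a1 + b1, a1 + b2, a2 + b1, a2 + b2

def tail_contrast(t, t11, t12, t21, t22):
    """c(t, infinity) of (28) in closed form; the upper limit u has cancelled."""
    return -max(t, t11) + max(t, t12) + max(t, t21) - max(t, t22)
```

With $a_1 = 1$, $a_2 = 2$, $b_1 = 3$, $b_2 = 5$ the four durations are 4, 6, 5, 7, the linear combination $-t_{11} + t_{12} + t_{21} - t_{22}$ vanishes, and `tail_contrast` is nonpositive for every t and equal to zero at t = 0, as Theorem 2.8 states.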

#### 2.3.2. Two Sequential Processes in an Arbitrary SP Network

Consider now a composition SP(A, B, . . .) with (A, B, . . .) ↫ (α, β, ∅).

**Theorem 2.9.** If A and B are sequential in an SP(A, B, . . .) composition, then one or both of the following statements hold: (i) c(0, t) ≥ 0 for any r, t, and C (0, t) ≥ 0 for any t, (ii) c(t,∞) ≤ 0 for any r, t, and C (t,∞) ≤ 0 for any t.

Proof. In accordance with Definitions 1.2 and 1.4, SP(A, B, . . .) with sequential A, B can be presented as either

$$\left(\mathrm{SP}^1(A,\ldots) + \mathrm{SP}^2(B,\ldots)\right) \wedge X + Y \tag{32}$$

or

$$\left(\mathrm{SP}^1(A,\ldots) + \mathrm{SP}^2(B,\ldots)\right) \vee X + Y \tag{33}$$

(note that any Z in SP<sup>1</sup>(A, . . .) + SP<sup>2</sup>(B, . . .) + Z can be absorbed by either of the first two summands). For both cases, by the monotonicity of SP compositions, for any R = r, SP<sup>1</sup>(a<sub>1</sub>, . . .) ≤ SP<sup>1</sup>(a<sub>2</sub>, . . .), SP<sup>2</sup>(b<sub>1</sub>, . . .) ≤ SP<sup>2</sup>(b<sub>2</sub>, . . .), and we can always assume SP<sup>1</sup>(a<sub>1</sub>, . . .) ≤ SP<sup>2</sup>(b<sub>1</sub>, . . .). Denoting the durations of SP<sup>1</sup>(a<sub>i</sub>, . . .) + SP<sup>2</sup>(b<sub>j</sub>, . . .) by t′<sub>ij</sub>, we have therefore, by Theorem 2.8, −t′<sub>11</sub> + t′<sub>12</sub> + t′<sub>21</sub> − t′<sub>22</sub> = 0. Denoting the durations of X and Y by t′ and t′′, respectively, in the case (32) we have

$$t_{ij} = t'_{ij} \wedge t' + t''.$$

By Lemma 2.4, all we have to show is that −t<sub>11</sub> + t<sub>12</sub> + t<sub>21</sub> − t<sub>22</sub> ≥ 0. It is easy to see that t′′ does not affect this linear combination, and its value is (assuming t′<sub>12</sub> ≤ t′<sub>21</sub>, without loss of generality)

$$\begin{cases} 0 & \text{if } t' < t'\_{11} \\ -t'\_{11} + t' & \text{if } t'\_{11} \le t' < t'\_{12} \\ -t'\_{11} + t'\_{12} & \text{if } t'\_{12} \le t' < t'\_{21} \\ -t'\_{11} + t'\_{12} + t'\_{21} - t' & \text{if } t'\_{21} \le t' < t'\_{22} \\ -t'\_{11} + t'\_{12} + t'\_{21} - t'\_{22} & \text{if } t' \ge t'\_{22} \end{cases}$$

The nonnegativity of the first three expressions is obvious, the fifth one is zero, and the fourth expression is larger than the fifth because t′ < t′<sub>22</sub>.

The proof for the case (33) is analogous.
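The five-case expression used in the proof for case (32) can be compared with a direct computation. The following is a minimal sketch under stated assumptions: the names `contrast_of_truncated` and `piecewise_value` are hypothetical, and the values t′<sub>11</sub> = 3, t′<sub>12</sub> = 4, t′<sub>21</sub> = 4.5, t′<sub>22</sub> = 5.5 are an arbitrary serial example (a = 1, 2.5 and b = 2, 3) chosen so that t′<sub>12</sub> ≤ t′<sub>21</sub>.

```python
def contrast_of_truncated(tp11, tp12, tp21, tp22, t_x):
    """-t11 + t12 + t21 - t22 with t_ij = min(t'_ij, t').

    The added duration t'' of Y cancels in this combination, so it is omitted.
    """
    t11, t12, t21, t22 = (min(v, t_x) for v in (tp11, tp12, tp21, tp22))
    return -t11 + t12 + t21 - t22


def piecewise_value(tp11, tp12, tp21, tp22, t_x):
    """The five-case expression; requires tp11 <= tp12 <= tp21 <= tp22."""
    if t_x < tp11:
        return 0.0
    if t_x < tp12:
        return -tp11 + t_x
    if t_x < tp21:
        return -tp11 + tp12
    if t_x < tp22:
        return -tp11 + tp12 + tp21 - t_x
    return -tp11 + tp12 + tp21 - tp22


# Hypothetical serial durations: a = (1, 2.5), b = (2, 3).
primed = (3.0, 4.0, 4.5, 5.5)
grid = [k * 0.25 for k in range(29)]  # t' = 0, 0.25, ..., 7
direct = [contrast_of_truncated(*primed, t_x) for t_x in grid]
by_cases = [piecewise_value(*primed, t_x) for t_x in grid]
```

The two lists agree on the whole grid and contain no negative entries, which is what Lemma 2.4 requires.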

If the SP composition with sequential A, B is homogeneous (Definition 1.5), the statement of the theorem can be made more specific.

**Theorem 2.10.** If A and B are sequential in an SP<sup>∧</sup>(A, B, . . .) composition, then c(0, t) ≥ 0 for any r, t, and C(0, t) ≥ 0 for any t; if the composition is SP<sup>∨</sup>(A, B, . . .), then c(t, ∞) ≤ 0 for any r, t, and C(t, ∞) ≤ 0 for any t.

# 3. Multiple Processes

We now turn to networks containing n ≥ 2 processes with durations (X<sup>1</sup>, . . . , X<sup>n</sup>), selectively influenced by factors (λ<sup>1</sup>, . . . , λ<sup>n</sup>). In other words, we deal with compositions SP(X<sup>1</sup>, . . . , X<sup>n</sup>, . . .) such that (X<sup>1</sup>, . . . , X<sup>n</sup>, . . .) ↫ (λ<sup>1</sup>, . . . , λ<sup>n</sup>, ∅), where each λ<sup>k</sup> is binary, with values 1, 2. There are 2<sup>n</sup> allowable treatments and 2<sup>n</sup> corresponding overall durations, T<sub>11...1</sub>, T<sub>11...2</sub>, . . . , T<sub>22...2</sub>. According to Definition 1.6 of selective influences, each process duration here is a function of some random variable R and of the corresponding factor, X<sup>k</sup> = x<sup>k</sup>(R, λ<sup>k</sup>). For any fixed value R = r, these durations are fixed numbers for any given treatment, and so is the overall, observed value of the SP composition. We denote them

$$\boldsymbol{x}^{k}(r,\lambda^{k}=1)=\boldsymbol{x}\_{1r}^{k},\ \boldsymbol{x}^{k}(r,\lambda^{k}=2)=\boldsymbol{x}\_{2r}^{k},\tag{34}$$

and

$$T(x_{i_1 r}^1, x_{i_2 r}^2, \dots, x_{i_n r}^n, \dots) = t_{i_1 i_2 \dots i_n r}, \tag{35}$$

where i<sub>1</sub>, i<sub>2</sub>, . . . , i<sub>n</sub> ∈ {1, 2}. The distribution function for t<sub>i<sub>1</sub>i<sub>2</sub>...i<sub>n</sub>r</sub> is a shifted Heaviside function

$$h_{i_1 i_2 \dots i_n r}(t) = \begin{cases} 0, & \text{if } t < t_{i_1 i_2 \dots i_n r} \\ 1, & \text{if } t \ge t_{i_1 i_2 \dots i_n r} \end{cases} \tag{36}$$

Denoting by H<sub>i<sub>1</sub>i<sub>2</sub>...i<sub>n</sub></sub>(t) the distribution function of T<sub>i<sub>1</sub>i<sub>2</sub>...i<sub>n</sub></sub>, we have

$$H_{i_1 i_2 \dots i_n}(t) = \int_{\mathcal{R}} h_{i_1 i_2 \dots i_n r}(t)\,\mathrm{d}\mu_r. \tag{37}$$
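Equation (37) says that the observable distribution function is a mixture of step functions, one for each value of the hidden variable. The sketch below illustrates this for a single treatment under the simplifying assumption that R takes finitely many values, so that the integral becomes a weighted sum; the durations and probabilities are hypothetical.

```python
def heaviside(t, shift):
    """Shifted Heaviside function: 0 before the shift, 1 from the shift on."""
    return 1.0 if t >= shift else 0.0


def mixture_cdf(t, durations_by_r, weights_by_r):
    """H(t) as the weighted average of the conditional step functions h_r(t)."""
    return sum(w * heaviside(t, s) for s, w in zip(durations_by_r, weights_by_r))


# Hypothetical hidden variable with four values r and probabilities mu_r;
# durations_by_r[j] is the overall duration for the j-th value of r.
durations_by_r = [0.4, 0.7, 0.7, 1.2]
weights_by_r = [0.1, 0.2, 0.3, 0.4]

cdf_values = [mixture_cdf(t, durations_by_r, weights_by_r) for t in (0.0, 0.5, 0.7, 1.2)]
```

The resulting values are those of an ordinary distribution function: nondecreasing in t, equal to 0 before the smallest duration and to 1 from the largest duration on.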

Conditioned on R = r, the n-th order interaction contrast is defined in terms of mixed finite differences as

$$c_r^{(n)}(t) = \Delta_{i_1} \Delta_{i_2} \dots \Delta_{i_n} h_{i_1 i_2 \dots i_n r}(t), \tag{38}$$

which, with some algebra, can be shown to be equal to

$$c\_r^{(n)}\left(t\right) = \sum\_{i\_1, i\_2, \dots, i\_n} (-1)^{n + \sum\_{k=1}^n i\_k} h\_{i\_1 \dots i\_n r}\left(t\right) \,. \tag{39}$$

Thus,

$$\begin{aligned} c_r^{(1)}(t) &= \Delta_{i_1} h_{i_1 r}(t) = h_{1r}(t) - h_{2r}(t) \\ &= \sum_{i_1} (-1)^{1+i_1} h_{i_1 r}(t), \end{aligned} \tag{40}$$

$$\begin{aligned} c_r^{(2)}(t) &= \Delta_{i_1}\Delta_{i_2} h_{i_1 i_2 r}(t) = \left[h_{11r}(t) - h_{12r}(t)\right] - \left[h_{21r}(t) - h_{22r}(t)\right] \\ &= h_{11r}(t) - h_{12r}(t) - h_{21r}(t) + h_{22r}(t) \\ &= \sum_{i_1, i_2} (-1)^{2+i_1+i_2} h_{i_1 i_2 r}(t), \end{aligned} \tag{41}$$

$$\begin{aligned} c_r^{(3)}(t) &= \Delta_{i_1}\Delta_{i_2}\Delta_{i_3} h_{i_1 i_2 i_3 r}(t) \\ &= \left\{\left[h_{111r}(t) - h_{112r}(t)\right] - \left[h_{121r}(t) - h_{122r}(t)\right]\right\} \\ &\quad - \left\{\left[h_{211r}(t) - h_{212r}(t)\right] - \left[h_{221r}(t) - h_{222r}(t)\right]\right\} \\ &= h_{111r}(t) - h_{112r}(t) - h_{121r}(t) - h_{211r}(t) \\ &\quad + h_{122r}(t) + h_{212r}(t) + h_{221r}(t) - h_{222r}(t) \\ &= \sum_{i_1, i_2, i_3} (-1)^{3+i_1+i_2+i_3} h_{i_1 i_2 i_3 r}(t), \end{aligned} \tag{42}$$

etc. The observable distribution function interaction contrast of order n is defined as

$$C^{(n)}(t) = \int_{\mathcal{R}} c_r^{(n)}(t)\,\mathrm{d}\mu_r. \tag{43}$$

By straightforward calculus this can be written in extenso as

$$C^{(n)}(t) = \sum_{i_1, i_2, \dots, i_n} (-1)^{n + \sum_{k=1}^n i_k} H_{i_1 \dots i_n}(t), \tag{44}$$

or, in terms of finite differences,

$$C^{(n)}(t) = \Delta_{i_1}\Delta_{i_2}\dots\Delta_{i_n} H_{i_1 i_2 \dots i_n}(t). \tag{45}$$

This is essentially the high-order interaction contrast used by Yang et al. (2013), the only difference being that they use survivor functions 1 − H(t) rather than the distribution functions H(t). We see that c<sub>r</sub>(t) and C(t) in the preceding analysis correspond to c<sub>r</sub><sup>(2)</sup>(t) and C<sup>(2)</sup>(t), respectively.
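The equivalence of the finite-difference form (38) and the signed sum (39) can be illustrated for a fixed r. This is a sketch only: the function names are hypothetical, the difference operator is taken to be Δ<sub>i</sub> f = f(1) − f(2) as in (40), and the three pairs of process durations and the serial composition rule are arbitrary example choices.

```python
from itertools import product


def heaviside(t, shift):
    """Shifted Heaviside function: 0 before the shift, 1 from the shift on."""
    return 1.0 if t >= shift else 0.0


def contrast_by_sum(durations, n, t):
    """Signed sum over all 2**n treatments, as in (39)."""
    return sum(
        (-1) ** (n + sum(idx)) * heaviside(t, durations[idx])
        for idx in product((1, 2), repeat=n)
    )


def contrast_by_differences(durations, n, t, prefix=()):
    """Mixed finite differences, as in (38), with Delta_i f = f(1) - f(2)."""
    if len(prefix) == n:
        return heaviside(t, durations[prefix])
    return (contrast_by_differences(durations, n, t, prefix + (1,))
            - contrast_by_differences(durations, n, t, prefix + (2,)))


# Hypothetical process durations (x^k_1, x^k_2) for n = 3, combined serially.
x = [(0.3, 0.9), (0.5, 0.7), (0.2, 1.1)]
durations = {
    idx: sum(x[k][i - 1] for k, i in enumerate(idx))
    for idx in product((1, 2), repeat=3)
}
grid = [k * 0.1 for k in range(31)]
pairs = [(contrast_by_sum(durations, 3, t), contrast_by_differences(durations, 3, t))
         for t in grid]
```

Both functions return the same value at every grid point; the recursion simply unfolds the nested brackets of (42).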

We also introduce n-th order cumulative contrasts. Conditioned on R = r, we define

$$c\_r^{[1]}\left(0,t\right) = c\_r^{[1]}\left(t,\infty\right) = h\_{1r}\left(t\right) - h\_{2r}\left(t\right),\tag{46}$$

$$c\_r^{[2]}\left(0,t\right) = \int\_0^t c\_r^{(2)}\left(t\_1\right)dt\_1, \quad c\_r^{[2]}\left(t,\infty\right) = \int\_t^\infty c\_r^{(2)}\left(t\_1\right)dt\_1,\tag{47}$$

$$c\_r^{[3]}\left(0,t\right) = \int\_0^t \int\_0^{t\_1} c\_r^{(3)}\left(t\_2\right) dt\_2 dt\_1,$$

$$c\_r^{[3]}\left(t,\infty\right) = \int\_t^{\infty} \int\_{t\_1}^{\infty} c\_r^{(3)}\left(t\_2\right) dt\_2 dt\_1,\tag{48}$$

etc. Generalizing,

$$c\_r^{[n]}\left(0,t\right) = \int\_0^t \left(\int\_0^{t\_1} \dots \int\_0^{t\_{n-2}} c\_r^{(n)}\left(t\_{n-1}\right) dt\_{n-1} \dots dt\_2\right) dt\_1,\tag{49}$$

$$c_r^{[n]}(t,\infty) = \int_t^{\infty} \left(\int_{t_1}^{\infty} \dots \int_{t_{n-2}}^{\infty} c_r^{(n)}(t_{n-1})\,dt_{n-1} \dots dt_2\right) dt_1. \tag{50}$$

The corresponding unconditional cumulative contrasts of the nth order are, as always, defined by integration of the conditional ones:

$$\begin{aligned} \mathbf{C}^{[n]}\left(\mathbf{0},t\right) &= \int\_{\mathcal{R}} \mathbf{c}\_{r}^{[n]}\left(\mathbf{0},t\right) \mathbf{d}\mu\_{r} \\ &= \int\_{0}^{t} \left(\int\_{0}^{t\_{1}} \dots \int\_{0}^{t\_{n-2}} \mathbf{C}^{(n)}\left(t\_{n-1}\right) dt\_{n-1} \dots dt\_{2}\right) dt\_{1}, \end{aligned} \tag{51}$$

$$\begin{split} \mathbf{C}^{[n]}\left(t,\infty\right) &= \int\_{\mathcal{R}} \mathbf{c}\_{r}^{[n]}\left(t,\infty\right) \, \mathrm{d}\mu\_{r} \\ &= \int\_{t}^{\infty} \left( \int\_{t\_{1}}^{\infty} \dots \int\_{t\_{n-2}}^{\infty} \mathbf{C}^{(n)}\left(t\_{n-1}\right) \, dt\_{n-1} \dots dt\_{2} \right) dt\_{1} . \end{split} \tag{52}$$
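Because every h<sub>i<sub>1</sub>...i<sub>n</sub>r</sub> is a shifted Heaviside function, the iterated integrals in (49)–(50) can be evaluated in closed form. The worked formulas below are a sketch, not part of the original derivation: they assume n ≥ 2 and nonnegative durations, and the shorthand $(u)_+ = \max(u, 0)$ and $s_{i_1 \dots i_n} = (-1)^{n + \sum_k i_k}$ is introduced here only for illustration.

$$\begin{aligned} c_r^{[n]}(0,t) &= \sum_{i_1, \dots, i_n} s_{i_1 \dots i_n}\, \frac{\left(t - t_{i_1 \dots i_n r}\right)_+^{\,n-1}}{(n-1)!}, \\ c_r^{[n]}(t,\infty) &= -\sum_{i_1, \dots, i_n} s_{i_1 \dots i_n}\, \frac{\left(t_{i_1 \dots i_n r} - t\right)_+^{\,n-1}}{(n-1)!}. \end{aligned}$$

The first line follows by integrating each step function n − 1 times from 0. The second uses the fact that the 2<sup>n</sup> signs sum to zero, so that each h can be replaced by h − 1, which vanishes beyond the corresponding duration and makes every upper integral finite.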

In the proofs below we will make use of the recursive representation of the conditional cumulative contrasts c<sub>r</sub><sup>[n]</sup>. It is verified by straightforward calculus. Denoting

$$c_{i_w r}^{(n-1)}(t) = \sum_{i_1, \dots, i_{w-1}, i_{w+1}, \dots, i_n} (-1)^{n-1-i_w+\sum_{k=1}^n i_k}\, h_{i_1 \dots i_{w-1} i_w i_{w+1} \dots i_n r}(t), \tag{53}$$

where w ∈ {1, . . . , n} and i<sub>w</sub> is fixed at 1 or 2, we have:

$$c\_r^{[1]}\left(0,t\right) = c\_r^{[1]}\left(t,\infty\right) = h\_{1r}\left(t\right) - h\_{2r}\left(t\right),\tag{54}$$

$$\begin{split} c\_{r}^{[2]}\left(0,t\right) &= \int\_{0}^{t} c\_{r}^{(2)}\left(\tau\right)d\tau \\ &= \int\_{0}^{t} \left(h\_{11r}\left(\tau\right) - h\_{12r}\left(\tau\right) - h\_{21r}\left(\tau\right) + h\_{22r}\left(\tau\right)\right)d\tau \\ &= \int\_{0}^{t} \left[c\_{i\_{w}=1,r}^{(1)}\left(\tau\right) - c\_{i\_{w}=2,r}^{(1)}\left(\tau\right)\right]d\tau \\ &= \int\_{0}^{t} c\_{i\_{w}=1,r}^{[1]}\left(0,\tau\right)d\tau - \int\_{0}^{t} c\_{i\_{w}=2,r}^{[1]}\left(0,\tau\right)d\tau,\end{split} \tag{55}$$

$$\begin{aligned} c_r^{[2]}(t,\infty) &= \int_t^{\infty} c_r^{(2)}(\tau)\,d\tau \\ &= \int_t^{\infty} \left(h_{11r}(\tau) - h_{12r}(\tau) - h_{21r}(\tau) + h_{22r}(\tau)\right) d\tau \\ &= \int_t^{\infty} \left[c_{i_w=1,r}^{(1)}(\tau) - c_{i_w=2,r}^{(1)}(\tau)\right] d\tau \\ &= \int_t^{\infty} c_{i_w=1,r}^{[1]}(\tau,\infty)\,d\tau - \int_t^{\infty} c_{i_w=2,r}^{[1]}(\tau,\infty)\,d\tau, \end{aligned} \tag{56}$$

$$\begin{split} c\_{r}^{[3]}\left(0,t\right) &= \int\_{0}^{t} \int\_{0}^{t\_{1}} c\_{r}^{(3)}\left(t\_{2}\right) dt\_{2} dt\_{1} \\ &= \int\_{0}^{t} \int\_{0}^{t\_{1}} \left[ c\_{i\_{w}=1,r}^{(2)}\left(t\_{2}\right) - c\_{i\_{w}=2,r}^{(2)}\left(t\_{2}\right) \right] dt\_{2} dt\_{1} \\ &= \int\_{0}^{t} \left[ \int\_{0}^{t\_{1}} c\_{i\_{w}=1,r}^{(2)}\left(t\_{2}\right) dt\_{2} - \int\_{0}^{t\_{1}} c\_{i\_{w}=2,r}^{(2)}\left(t\_{2}\right) dt\_{2} \right] dt\_{1} \\ &= \int\_{0}^{t} c\_{i\_{w}=1,r}^{[2]}\left(0,\tau\right) d\tau - \int\_{0}^{t} c\_{i\_{w}=2,r}^{[2]}\left(0,\tau\right) d\tau, \end{split} \tag{57}$$

$$\begin{split} c\_{r}^{[3]}\left(t,\infty\right) &= \int\_{t}^{\infty} \int\_{t\_{1}}^{\infty} c\_{r}^{(3)}\left(t\_{2}\right) \, dt\_{2} \, dt\_{1} \\ &= \int\_{t}^{\infty} \int\_{t\_{1}}^{\infty} \left[ c\_{i\_{w}=1,r}^{(2)}\left(t\_{2}\right) - c\_{i\_{w}=2,r}^{(2)}\left(t\_{2}\right) \right] \, dt\_{2} \, dt\_{1} \\ &= \int\_{t}^{\infty} \left[ \int\_{t\_{1}}^{\infty} c\_{i\_{w}=1,r}^{(2)}\left(t\_{2}\right) \, dt\_{2} - \int\_{t\_{1}}^{\infty} c\_{i\_{w}=2,r}^{(2)}\left(t\_{2}\right) \, dt\_{2} \right] \, dt\_{1} \\ &= \int\_{t}^{\infty} c\_{i\_{w}=1,r}^{[2]}\left(\tau,\infty\right) \, d\tau - \int\_{t}^{\infty} c\_{i\_{w}=2,r}^{[2]}\left(\tau,\infty\right) \, d\tau, \end{split} \tag{58}$$

and generally, for n > 1,

$$c\_r^{[n]}(0,t) = \int\_0^t c\_{i\_w=1,r}^{[n-1]}(0,\tau) \,d\tau - \int\_0^t c\_{i\_w=2,r}^{[n-1]}(0,\tau) \,d\tau,\tag{59}$$

$$c^{\left[n\right]}\left(t,\infty\right) = \int\_{t}^{\infty} c^{\left[n-1\right]}\_{i\_{w}=1,r}\left(\tau,\infty\right)d\tau - \int\_{t}^{\infty} c^{\left[n-1\right]}\_{i\_{w}=2,r}\left(\tau,\infty\right)d\tau. \tag{60}$$

Also we have, by substitution of variables under integral,

$$c\_{i\_{\rm w}r}^{[n-1]}(0,t) = c\_r^{[n-1]}\left(0, t - x\_{i\_{\rm w}r}^{\rm w}\right),\tag{61}$$

$$c\_{i\_{\rm yr}}^{[n-1]}(t,\infty) = c^{[n-1]} \left(t - x\_{i\_{\rm w}r}^{\rm w}, \infty\right). \tag{62}$$

The Prolongation Assumption generalizing (11)–(12) is formulated as follows.

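**Prolongation Assumption.** R and functions x<sup>1</sup>, . . . , x<sup>n</sup> in (34) can be chosen so that x<sup>k</sup><sub>1r</sub> ≤ x<sup>k</sup><sub>2r</sub> for all R = r and for all k = 1, . . . , n. Without loss of generality, we can also assume x<sup>1</sup><sub>1r</sub> ≤ x<sup>2</sup><sub>1r</sub> ≤ . . . ≤ x<sup>n</sup><sub>1r</sub> (if not, rearrange x<sup>1</sup><sub>1r</sub>, . . . , x<sup>n</sup><sub>1r</sub>).

**Notation Convention.** As we did before for n = 2, when r is fixed throughout a discussion, we omit this argument and write x<sup>1</sup><sub>i<sub>1</sub></sub>, . . . , x<sup>n</sup><sub>i<sub>n</sub></sub>, t<sub>i<sub>1</sub>i<sub>2</sub>...i<sub>n</sub></sub>, h<sub>i<sub>1</sub>i<sub>2</sub>...i<sub>n</sub></sub>(t), c<sup>(n)</sup>(t) in place of x<sup>1</sup><sub>i<sub>1</sub>r</sub>, . . . , x<sup>n</sup><sub>i<sub>n</sub>r</sub>, t<sub>i<sub>1</sub>i<sub>2</sub>...i<sub>n</sub>r</sub>, h<sub>i<sub>1</sub>i<sub>2</sub>...i<sub>n</sub>r</sub>(t), c<sub>r</sub><sup>(n)</sup>(t).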

### 3.1. Parallel Processes

#### 3.1.1. Simple Parallel Architectures of Size n

**Theorem 3.1.** If T = X<sup>1</sup> ∧ . . . ∧ X<sup>n</sup>, then for any r, t, c<sup>(n)</sup>(t) ≤ 0 if n is even and c<sup>(n)</sup>(t) ≥ 0 if n is odd. Consequently, for any t, C<sup>(n)</sup>(t) ≤ 0 if n is even and C<sup>(n)</sup>(t) ≥ 0 if n is odd.

Proof. By induction on n, the case n = 1 being true by the Prolongation Assumption:

$$c^{(1)}(t) = h_1(t) - h_2(t) \ge 0.$$

Let the statement of the theorem be true for c<sup>(n−1)</sup>(t), with n − 1 ≥ 1. By the Prolongation Assumption,

$$t_{1 i_2 \dots i_n} = x_1^1 \wedge x_{i_2}^2 \wedge \dots \wedge x_{i_n}^n = x_1^1,$$

for any i<sub>2</sub>, . . . , i<sub>n</sub>, whence

$$h_{1 i_2 \dots i_n}(t) = \begin{cases} 0, & \text{if } t < x_1^1 \\ 1, & \text{if } t \ge x_1^1 \end{cases}$$

Therefore c<sup>(n−1)</sup><sub>i<sub>1</sub>=1</sub>(t) = 0 (all its terms coincide, and their signs sum to zero), and, applying the induction hypothesis to c<sup>(n−1)</sup><sub>i<sub>1</sub>=2</sub>(t),

$$\begin{aligned} \boldsymbol{c}^{(n)}\left(t\right) &= \boldsymbol{c}^{(n-1)}\_{i\_1=1}\left(t\right) - \boldsymbol{c}^{(n-1)}\_{i\_1=2}\left(t\right) = -\boldsymbol{c}^{(n-1)}\_{i\_1=2}\left(t\right) \\ &= \begin{cases} \leq 0, & \text{if } n \text{ is even} \\ \geq 0, & \text{if } n \text{ is odd} \end{cases} \end{aligned}$$

That C (n) (t) ≤ 0 if n is even and C (n) (t) ≥ 0 if n is odd follows by the standard argument.

**Theorem 3.2.** If T = X<sup>1</sup> ∨ . . . ∨ X<sup>n</sup>, then for any r, t, c<sup>(n)</sup>(t) ≥ 0. Consequently, for any t, C<sup>(n)</sup>(t) ≥ 0.

Proof. By induction on n, the case n = 1 being true by the Prolongation Assumption:

$$c^{(1)}(t) = h_1(t) - h_2(t) \ge 0.$$

Let the theorem be true for c<sup>(n−1)</sup>(t), where n − 1 ≥ 1. Let

$$x_2^1 \vee x_2^2 \vee \dots \vee x_2^n = x_2^m,$$

where 1 ≤ m ≤ n. We have then

$$t_{i_1 \dots i_{m-1} 2 i_{m+1} \dots i_n} = x_2^m,$$

and

$$h_{i_1 \dots i_{m-1} 2 i_{m+1} \dots i_n}(t) = \begin{cases} 0, & \text{if } t < x_2^m \\ 1, & \text{if } t \ge x_2^m \end{cases},$$

for all i<sub>1</sub>, . . . , i<sub>m−1</sub>, i<sub>m+1</sub>, . . . , i<sub>n</sub>. Then c<sup>(n−1)</sup><sub>i<sub>m</sub>=2</sub>(t) = 0, and

$$c^{(n)}(t) = c^{(n-1)}_{i_m=1}(t) - c^{(n-1)}_{i_m=2}(t) = c^{(n-1)}_{i_m=1}(t) \ge 0.$$

Consequently, C<sup>(n)</sup>(t) ≥ 0 for any t.
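The sign patterns of Theorems 3.1 and 3.2 can be checked by brute force for small n and a fixed r. The sketch below is illustrative: the helper names and the four pairs of durations are hypothetical, chosen only so that the Prolongation Assumption holds (x<sup>k</sup><sub>1</sub> ≤ x<sup>k</sup><sub>2</sub> for each k, and x<sup>1</sup><sub>1</sub> ≤ . . . ≤ x<sup>n</sup><sub>1</sub>).

```python
from itertools import product


def contrast(durations, n, t):
    """n-th order interaction contrast of shifted Heaviside functions, as in (39)."""
    return sum(
        (-1) ** (n + sum(idx)) * (1.0 if t >= durations[idx] else 0.0)
        for idx in product((1, 2), repeat=n)
    )


def overall_durations(x, combine):
    """Overall durations for all treatments; x[k] = (x^k_1, x^k_2), combine is min or max."""
    n = len(x)
    return {
        idx: combine(x[k][i - 1] for k, i in enumerate(idx))
        for idx in product((1, 2), repeat=n)
    }


# Hypothetical durations satisfying the Prolongation Assumption.
x_all = [(0.2, 0.9), (0.4, 0.6), (0.5, 1.3), (0.7, 0.8)]
grid = [k * 0.05 for k in range(41)]  # t = 0, 0.05, ..., 2

min_contrasts = {
    n: [contrast(overall_durations(x_all[:n], min), n, t) for t in grid]
    for n in range(1, 5)
}
max_contrasts = {
    n: [contrast(overall_durations(x_all[:n], max), n, t) for t in grid]
    for n in range(1, 5)
}
```

For the minimum rule the contrast values are nonpositive for even n and nonnegative for odd n, while for the maximum rule they are nonnegative for every n, as the two theorems state.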

#### 3.1.2. Multiple Parallel Processes in Arbitrary SP Networks

In a composition SP(X<sup>1</sup>, . . . , X<sup>n</sup>, . . .), the components X<sup>1</sup>, . . . , X<sup>n</sup> are considered parallel if any two of them are parallel. We assume selective influences (X<sup>1</sup>, . . . , X<sup>n</sup>, . . .) ↫ (λ<sup>1</sup>, . . . , λ<sup>n</sup>, ∅). We do not consider the complex situation when some of the selectively influenced processes X<sup>1</sup>, . . . , X<sup>n</sup> are min-parallel and some are max-parallel. However, if they are all (pairwise) min-parallel or all max-parallel, we have essentially the same situation as with a simple parallel arrangement of n durations.

**Lemma 3.3.** If X<sup>1</sup>, . . . , X<sup>n</sup> are all min-parallel or max-parallel in an SP composition, this composition can be presented as T = A<sup>1</sup> ∧ . . . ∧ A<sup>n</sup> or T = A<sup>1</sup> ∨ . . . ∨ A<sup>n</sup>, respectively. In either case, (A<sup>1</sup>, . . . , A<sup>n</sup>) ↫ (λ<sup>1</sup>, . . . , λ<sup>n</sup>) and the Prolongation Assumption holds for any R = r.

Proof. For the min-parallel case, by a minor modification of the proof of Lemma 2.6 we present the SP composition as

$$\underbrace{\operatorname{SP}^1(X^1, \ldots)}\_{=A^1} \wedge \operatorname{SP}^2(X^2, \ldots, X^n, \ldots),$$

or

$$\begin{aligned} &\left(\mathrm{SP}^1(X^1, \dots) \wedge \mathrm{SP}^2(X^2, \dots, X^n, \dots) + X\right) \wedge Y \\ &= \underbrace{\left(\mathrm{SP}^1(X^1, \dots) + X\right)}_{=A^1} \wedge \left(\mathrm{SP}^2(X^2, \dots, X^n, \dots) + X\right) \wedge Y, \end{aligned}$$

or else

$$\begin{aligned} &\left(\mathrm{SP}^1(X^1, \dots) \wedge \mathrm{SP}^2(X^2, \dots, X^n, \dots) \wedge X\right) + Y \\ &= \underbrace{\left(\mathrm{SP}^1(X^1, \dots) + Y\right)}_{=A^1} \wedge \left(\mathrm{SP}^2(X^2, \dots, X^n, \dots) \wedge X + Y\right). \end{aligned}$$

Then we analogously decompose SP<sup>2</sup>(X<sup>2</sup>, . . . , X<sup>n</sup>, . . .), achieving A<sup>1</sup> ∧ A<sup>2</sup> ∧ SP<sup>3</sup>(X<sup>3</sup>, . . . , X<sup>n</sup>, . . .), and proceed in this fashion until we reach the required A<sup>1</sup> ∧ . . . ∧ A<sup>n</sup>. The pattern of selective influences is seen immediately, and the Prolongation Assumption follows by the monotonicity of the SP compositions. The proof for the max-parallel case is analogous.
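The two rearrangements used in this proof rest on the fact that adding a duration distributes over the minimum. The following sketch checks both identities on random nonnegative numbers; the function name, the seed and the sampling range are arbitrary choices made for illustration, with a and b standing for the durations of SP<sup>1</sup>(. . .) and SP<sup>2</sup>(. . .).

```python
import random


def identities_hold(a, b, x, y, tol=1e-12):
    """Check the two min-plus rearrangements for fixed durations.

    (a ^ b + x) ^ y == (a + x) ^ (b + x) ^ y
    (a ^ b ^ x) + y == (a + y) ^ (b ^ x + y)
    where ^ denotes the minimum and binds more tightly than +.
    """
    lhs1 = min(min(a, b) + x, y)
    rhs1 = min(a + x, b + x, y)
    lhs2 = min(a, b, x) + y
    rhs2 = min(a + y, min(b, x) + y)
    return abs(lhs1 - rhs1) <= tol and abs(lhs2 - rhs2) <= tol


rng = random.Random(0)  # fixed seed so the check is reproducible
samples = [tuple(rng.uniform(0.0, 5.0) for _ in range(4)) for _ in range(1000)]
all_ok = all(identities_hold(*s) for s in samples)
```

The same check with `max` in place of `min` covers the max-parallel case.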

**Theorem 3.4.** If X<sup>1</sup>, . . . , X<sup>n</sup> are min-parallel in an SP composition, then for any r, t, c<sup>(n)</sup>(t) ≤ 0 if n is even and c<sup>(n)</sup>(t) ≥ 0 if n is odd. Consequently, for any t, C<sup>(n)</sup>(t) ≤ 0 if n is even and C<sup>(n)</sup>(t) ≥ 0 if n is odd. If X<sup>1</sup>, . . . , X<sup>n</sup> are max-parallel, then for any r, t, c<sup>(n)</sup>(t) ≥ 0, and for any t, C<sup>(n)</sup>(t) ≥ 0.

Proof. Follows from Lemma 3.3 and Theorems 3.1 and 3.2.

### 3.2. Sequential Processes

#### 3.2.1. Simple Serial Architectures of Size n

**Theorem 3.5.** If T = X<sup>1</sup> + . . . + X<sup>n</sup>, then for any r, t, c<sup>[n]</sup>(0, t) ≥ 0, while c<sup>[n]</sup>(t, ∞) ≤ 0 if n is even and c<sup>[n]</sup>(t, ∞) ≥ 0 if n is odd; moreover, c<sup>[n]</sup>(0, ∞) = 0 for any r. Consequently, for any t, C<sup>[n]</sup>(0, t) ≥ 0, while C<sup>[n]</sup>(t, ∞) ≤ 0 if n is even and C<sup>[n]</sup>(t, ∞) ≥ 0 if n is odd; moreover, C<sup>[n]</sup>(0, ∞) = 0.

Proof. By induction on n, the case n = 1 being true by the Prolongation Assumption:

$$c^{[1]}(0,t) = c^{[1]}(t,\infty) = h_1(t) - h_2(t) \ge 0,$$

and

$$\lim_{t \to \infty} c^{[1]}(0, t) = \lim_{t \to 0} c^{[1]}(t, \infty) = 0.$$

Let the statement of the theorem hold for all natural numbers up to and including n − 1 ≥ 1. Using the recursive representations (59)–(60),

$$\begin{aligned} c^{[n]}(0,t) &= \int_0^t c^{[n-1]}_{i_w=1}(0,\tau)\,d\tau - \int_0^t c^{[n-1]}_{i_w=2}(0,\tau)\,d\tau \\ &= \int_0^{t-x_1^w} c^{[n-1]}(0,\tau)\,d\tau - \int_0^{t-x_2^w} c^{[n-1]}(0,\tau)\,d\tau \\ &= \int_{t-x_2^w}^{t-x_1^w} c^{[n-1]}(0,\tau)\,d\tau, \end{aligned} \tag{63}$$

which is ≥ 0 since c<sup>[n−1]</sup>(0, τ) ≥ 0 and t − x<sup>w</sup><sub>2</sub> ≤ t − x<sup>w</sup><sub>1</sub>. Analogously,

$$\begin{aligned} c^{[n]}(t,\infty) &= \int_t^{\infty} c^{[n-1]}_{i_w=1}(\tau,\infty)\,d\tau - \int_t^{\infty} c^{[n-1]}_{i_w=2}(\tau,\infty)\,d\tau \\ &= \int_{t-x_1^w}^{\infty} c^{[n-1]}(\tau,\infty)\,d\tau - \int_{t-x_2^w}^{\infty} c^{[n-1]}(\tau,\infty)\,d\tau \\ &= -\int_{t-x_2^w}^{t-x_1^w} c^{[n-1]}(\tau,\infty)\,d\tau, \end{aligned} \tag{64}$$

which is ≤ 0 if n is even and ≥ 0 if n is odd. Applying the mean value theorem to the results of (63) and (64), we get, for some t − x<sup>w</sup><sub>2</sub> < t′, t′′ < t − x<sup>w</sup><sub>1</sub>,

$$\begin{aligned} \int_{t-x_2^w}^{t-x_1^w} c^{[n-1]}(0,\tau)\,d\tau &= c^{[n-1]}(0,t') \left(-x_1^w + x_2^w\right), \\ \int_{t-x_2^w}^{t-x_1^w} c^{[n-1]}(\tau,\infty)\,d\tau &= c^{[n-1]}(t'',\infty) \left(-x_1^w + x_2^w\right), \end{aligned}$$

and, as c<sup>[n−1]</sup>(0, ∞) = 0, both expressions tend to zero as, respectively, t → ∞ (implying t′ → ∞) and t → 0 (implying t′′ → 0).
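The statements of Theorem 3.5 can be checked numerically for small n and a fixed r. The sketch below is not the authors' procedure: it evaluates the iterated integrals (49)–(50) through closed forms that hold for step functions when n ≥ 2 (each step function integrated n − 1 times gives a truncated power divided by (n − 1)!), and the function names and durations are hypothetical.

```python
from itertools import product
from math import factorial


def signed_durations(x):
    """(sign, overall duration) for every treatment of T = X^1 + ... + X^n."""
    n = len(x)
    return [
        ((-1) ** (n + sum(idx)), sum(x[k][i - 1] for k, i in enumerate(idx)))
        for idx in product((1, 2), repeat=n)
    ]


def lower_cumulative(x, t):
    """c^[n](0, t) for n >= 2: sum of sign * max(t - d, 0)**(n-1) / (n-1)!."""
    n = len(x)
    total = sum(s * max(t - d, 0.0) ** (n - 1) for s, d in signed_durations(x))
    return total / factorial(n - 1)


def upper_cumulative(x, t):
    """c^[n](t, inf) for n >= 2: -sum of sign * max(d - t, 0)**(n-1) / (n-1)!."""
    n = len(x)
    total = sum(s * max(d - t, 0.0) ** (n - 1) for s, d in signed_durations(x))
    return -total / factorial(n - 1)


# Hypothetical durations (x^k_1, x^k_2) with x^k_1 <= x^k_2.
x_all = [(0.2, 0.9), (0.4, 0.6), (0.5, 1.3), (0.7, 0.8)]
grid = [k * 0.1 for k in range(51)]  # t = 0, 0.1, ..., 5
lower = {n: [lower_cumulative(x_all[:n], t) for t in grid] for n in (2, 3, 4)}
upper = {n: [upper_cumulative(x_all[:n], t) for t in grid] for n in (2, 3, 4)}
```

Up to floating-point rounding, the lower contrasts are nonnegative for n = 2, 3, 4, the upper contrasts alternate in sign with the parity of n, and both vanish at the corresponding end of the grid.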

#### 3.2.2. Multiple Sequential Processes in Arbitrary SP Networks

In a composition SP(X<sup>1</sup>, . . . , X<sup>n</sup>, . . .), the components X<sup>1</sup>, . . . , X<sup>n</sup> are considered sequential if any two of them are sequential. By analogy with Theorem 2.9 for two sequential processes and with Theorem 3.4 for parallel X<sup>1</sup>, . . . , X<sup>n</sup>, one might expect that the result for the simple sequential arrangement X<sup>1</sup> + . . . + X<sup>n</sup> will also extend to n sequential components of more complex compositions SP(X<sup>1</sup>, . . . , X<sup>n</sup>, . . .). However, this is not the case, as one can see from the following counterexample.

Consider the composition

$$\text{SP}(X^1, X^2, X^3, Y) = \left(X^1 + X^2 + X^3\right) \wedge \left(Y = 2\right), \tag{65}$$

with X<sup>1</sup>, X<sup>2</sup>, X<sup>3</sup> selectively influenced by binary factors, so that

$$\begin{array}{l} \mathbf{x}\_1^1 = \mathbf{x}\_1^2 = \mathbf{x}\_1^3 = \mathbf{0},\\ \mathbf{x}\_2^1 = \mathbf{x}\_2^2 = \mathbf{x}\_2^3 = 1. \end{array} \tag{66}$$

It follows that

$$\begin{aligned} t\_{111} &= 0, \\ t\_{112} &= t\_{121} = t\_{211} = 1, \\ t\_{122} &= t\_{212} = t\_{221} = t\_{222} = 2. \end{aligned} \tag{67}$$

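This is clearly a sequential arrangement of the three durations X<sup>1</sup>, X<sup>2</sup>, X<sup>3</sup>, but one can easily check that c<sup>[3]</sup>(0, t) here is not nonnegative for all t. Indeed, by (67), c<sup>(3)</sup>(τ) equals 1 for 0 ≤ τ < 1, −2 for 1 ≤ τ < 2, and 0 for τ ≥ 2, so that, by (48), c<sup>[3]</sup>(0, t) = 5/2 − t for t ≥ 2. For instance, at t = 3 we have c<sup>[3]</sup>(0, t) = −1/2. We conclude that there is no straightforward generalization of Theorem 3.5 to arbitrary SP compositions.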
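The counterexample (65)–(67) can be evaluated directly. The sketch below assumes definition (48) and uses the fact that integrating a shifted Heaviside function twice from 0 gives max(t − d, 0)²/2; the function name is hypothetical.

```python
from itertools import product


def lower_cumulative_3(durations, t):
    """c^[3](0, t) as the signed sum of max(t - d, 0)**2 / 2 over the 8 treatments."""
    return sum(
        (-1) ** (3 + sum(idx)) * max(t - d, 0.0) ** 2 / 2.0
        for idx, d in durations.items()
    )


# Durations from (66): x^k_1 = 0 and x^k_2 = 1 for k = 1, 2, 3.
x = [(0.0, 1.0)] * 3

# Overall durations from (65): (X^1 + X^2 + X^3) truncated at Y = 2.
durations = {
    idx: min(sum(x[k][i - 1] for k, i in enumerate(idx)), 2.0)
    for idx in product((1, 2), repeat=3)
}

value_at_2 = lower_cumulative_3(durations, 2.0)
value_at_3 = lower_cumulative_3(durations, 3.0)
```

Under these assumptions the contrast is still positive at t = 2 (0.5) but negative at t = 3 (−0.5), so the nonnegativity established for the simple serial arrangement fails here.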

# 4. Conclusion

The work presented in this paper is summarized in the abstract. By proving and generalizing most of the known results on the interaction contrast of distribution functions, we have demonstrated a new way of dealing with SP mental architectures. It is based on conditioning all hypothetical components of a network on a fixed value of a common source of randomness R (the "hidden variable" of the contextuality analysis in quantum theory), which renders these components deterministic quantities, and then treating these deterministic quantities as random variables with shifted Heaviside distribution functions.

The potential advantage of this method can be seen in the fact that the shifted Heaviside functions have the simplest possible arithmetic among distribution functions: for every time moment it only involves 0's and 1's. As a result, the complexity of this arithmetic does not increase with nonlinearity of the relations involved. Thus, Dzhafarov and Schweickert (1995), Cortese and Dzhafarov (1996), and Dzhafarov and Cortese (1996) argued that composition rules for mental architectures need not be confined to +, ∧, ∨. They analyzed architectures involving other associative and commutative operations, such as multiplication. Due to mathematical complexity, however, this work was confined to networks consisting of two components that are either stochastically independent or monotone functions of each other. It remains to be seen whether the approach presented here, mutatis mutandis, will lead to significant generalizations in this line of work.

The limitations of the approach, however, are already apparent. Thus, we were not able to achieve any progress over known results in applying it to Wheatstone bridges (Schweickert and Giorgini, 1999; Dzhafarov et al., 2004). The possibility that the "architecture" (composition rule) itself changes as one changes experimental factors makes the prospect of a general theory based on our approach even more problematic (e.g., Townsend and Fific, 2004). It seems, however, that these problems are not specific to our approach.

# Acknowledgments

This work is supported by NSF grant SES-1155956 and AFOSR grant FA9550-14-1-0318.

# References

Bell, J. (1964). On the Einstein-Podolsky-Rosen paradox. Physics 1, 195–200.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Zhang and Dzhafarov. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Corrigendum: Noncontextuality with marginal selectivity in reconstructing mental architectures

Ru Zhang and Ehtibar N. Dzhafarov \*

*Department of Psychological Sciences, Purdue University, West Lafayette, IN, USA*

Keywords: interaction contrast, mental architectures, response time, series-parallel networks, corrigendum

## **A corrigendum on**

**Noncontextuality with marginal selectivity in reconstructing mental architectures** by Zhang, R., and Dzhafarov, E. N. (2015). Front. Psychol. 6:735. doi: 10.3389/fpsyg.2015.00735

This corrigendum note points out and corrects two mistakes found in the paper cited in the title. These mistakes do not affect the correctness of the statements proved and expressions derived.

#### Edited by:

*Sandro Sozzo, University of Leicester, UK*

#### Reviewed by:

*Joseph W. Houpt, Wright State University, USA*

#### \*Correspondence:

*Ehtibar N. Dzhafarov ehtibar@purdue.edu*

#### Specialty section:

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology*

Received: *30 January 2016* Accepted: *11 March 2016* Published: *30 March 2016*

#### Citation:

*Zhang R and Dzhafarov EN (2016) Corrigendum: Noncontextuality with marginal selectivity in reconstructing mental architectures. Front. Psychol. 7:437. doi: 10.3389/fpsyg.2016.00437*

1. In Zhang and Dzhafarov (2015), Lemma 2.6 (p. 7) and Lemma 3.3 (p. 10) are formulated for series-parallel (SP) architectures in which the minimum (∧) and maximum (∨) operations may be intermixed. The proofs are shown for the min-parallel arrangement of the selectively influenced processes, with the correct statement that the max-parallel arrangement is dealt with analogously. However, by an oversight, the proof for the min-parallel arrangement is shown only for homogeneous SP<sup>∧</sup> architectures, those that cannot contain ∨ operations. The statements of the lemmas are correct despite this oversight, because the proofs remain valid if the rightmost ∧Y and ∧X in all expressions of the form

$$\left(\mathrm{SP}^1(\ldots) \wedge \mathrm{SP}^2(\ldots) + X\right) \wedge Y \text{ and } \left(\mathrm{SP}^1(\ldots) \wedge \mathrm{SP}^2(\ldots) \wedge X\right) + Y$$

are replaced with ∨Y and ∨X, respectively.

2. Equations (61) and (62) on p. 9 should be disregarded: one of them, (62), contains typos, and both are shown in the wrong place. These transformations are only valid in the context of Theorem 3.5, for sequential architectures, and this is the only place where they are used, in Equations (63) and (64) on p. 11.

(A typo: in Equation (60) on p. 9, c<sup>[n]</sup>(t, ∞) should be c<sub>r</sub><sup>[n]</sup>(t, ∞).)

# AUTHOR CONTRIBUTIONS

All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.

# ACKNOWLEDGMENTS

This work is supported by NSF grant SES-1155956 and AFOSR grant FA9550-14-1-0318.

# REFERENCES

Zhang, R., and Dzhafarov, E. N. (2015). Noncontextuality with marginal selectivity in reconstructing mental architectures. Front. Psychol. 6:735. doi: 10.3389/fpsyg.2015.00735

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Zhang and Dzhafarov. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.