Authors: Cordelia Mühlenbeck, Thomas Jacobsen
Abstract
What is the origin of visual symbols? The artefacts that are viewed as the first visual symbols—or at least their prototypes—are the remains of stones and other objects with engravings and colorful markings. Our only access to the origin of this behavior that we share with our ancestors within the genus Homo is through skeletons, artefacts, and genetic testing, and we can only draw indirect conclusions about the reasons for their behavior and the underlying cognitive capacities. Yet indications from different disciplines, including anthropology, archaeology, evolutionary biology, and psychology, fit together to form an overall picture. Through empirical studies, we can analyze and draw conclusions from the advantageous visual effects caused by material symbols. In this review, we first examine a definition of visual symbols that captures their essential characteristics and also provide an overview of the evolution of Homo sapiens and the emergence of the species’ cultural behavior. Next, we present two prominent theories regarding the origin of material symbols: a cultural intensification across the entire evolution of the genus Homo versus a later cultural revolution involving only anatomically modern humans and the assumption of additional anatomical or genetic changes, and we describe the difficulties each theory faces. We then examine differences in the cultural behaviors of different primates and indicate which aspects of the two theories are testable, discussing the advantages and limitations of experimental approaches. In conclusion, we clarify how the invention of material symbols can be embedded in the (cultural) evolution of Homo sapiens.
This article was published in: Mühlenbeck, C., & Jacobsen, T. (2020). On the origin of visual symbols. Journal of Comparative Psychology, 134(4), 435–452. https://doi.org/10.1037/com0000229. To cite the article, please use this reference.
©American Psychological Association, [2020]. This paper is not the copy of record and may not exactly replicate the authoritative document published in the APA journal. Please do not copy or cite without author’s permission. The final article is available at: [DOI: 10.1037/com0000229] Journal of Comparative Psychology.
1 Introduction
Human visual perception is, among other things, salience driven, with a biased competition between different objects in visual scenes (Desimone & Duncan, 1995). This competition is driven to focus on desired perceptual features through the inherent salience of objects (Yantis, 2005), on the one hand, and the influence of top–down attention (Desimone & Duncan, 1995), on the other (for a summary see: Shinn-Cunningham, 2008). The first nonutilitarian object manipulation (i.e., with no direct technical function) took the form of markings on objects highlighting object-inherent salience. Such findings date back not only to cognitively modern humans, but also to archaic Homo sapiens, Homo neanderthalensis, and Homo erectus (Hoffmann et al., 2018; Joordens et al., 2015; Rodríguez-Vidal et al., 2014). Examples include pigment processing with ochre from more than 280,000 years ago (McBrearty & Brooks, 2000) and use of incisions from different archaeological sites around 100,000 years (Balter, 2009b, 2009a; Hovers, Vandermeersch, & Bar-Yosef, 1997) to 75,000 years ago (Henshilwood, d’Errico, & Watts, 2009; Henshilwood et al., 2002). While these objects provide evidence for a gradual development (McBrearty & Brooks, 2000) of nonutilitarian object manipulation through highlighting existing structure, which raises the question of how salience was used to create the first material symbols, there are two contrasting theoretical explanations for the historical emergence of human production of material symbols. The first assumes that the emergence of human symbolic behavior was a gradual cultural intensification across the entire evolution of the genus Homo, while the second assumes a late revolutionary cultural change, rather than a gradual development, that involved anatomically modern humans but with an additional reorganization of the brain and/or genetic changes.
Given the discrepancies between the two theories, the following questions seem salient: What were the benefits of the object-marking and object-shaping behavior of these ancestors of Homo sapiens, and how might this behavior be related to the beginning of external symbolic storage (a term introduced by Merlin Donald; 1991)—i.e., could there be a connection between a gradual intensification of object manipulation and a late cultural revolution? Since the markings are visual attributes, we believe that investigations of how such objects are visually perceived may show how the two theories regarding the origin of human symbolic behavior can be reconciled. Markings can be used to create different object structures and to construct different object–background relations (Singer & Gray, 1995) and thereby function as representations of the perceived structure of the environment and thus as memory representations, but also as tools for guiding the attention of others. In this way, they can be regarded as external representations and thus as early symbols.
In this article, we argue that the earliest markings on objects should already be interpreted as the beginning of symbolic behavior, and we show how cultural and species comparative studies of the visual perception of marked objects can provide information about differences in mental-processing architectures (i.e., the functioning whole of all mental processes and structures) that can lead to inferences about the beginning of this first symbolic behavior. In particular, we discuss the similarities of nonhuman primate social behavior that appears to be a precursor of human symbolic expression, as great apes understand many aspects of their physical and social worlds—which means that characteristics such as language, sociality, and culture are not unique to humans but are also how great apes approach problem solving (Tomasello, 2014). For this comparative approach, we discuss several eye-tracking studies and the perceptual constraints on orangutan shape perception as examples of how an experimental approach can inform us about the processing of basic abstract visual symbols, since the visual processing architecture reveals aspects of conceptual processing and representations of the environment.
2 What Are Visual Symbols?
In this section, we argue that symbolic behavior can be understood as the ability to create a relation between a signifier and a signified entity. A symbol is a sign or entity that is used to stand for something else (Deacon, 1998); it can be divided into a content carrier[1] and the content, in line with De Saussure’s classification of the signifiant (signifier: content carrier, the symbol) and the signifié (signified: content, the symbolized idea; De Saussure, Baskin, & Meisel, 2011). The signifier can be any material or immaterial entity, such as a sound in language, a material object in art, an action in a ritual, and much more.
There are several requirements regarding what the content can be, which brings us to concepts. One commonality of all symbols is that, when shared by more than one person, they rely on a common information background and often draw the attention of those who share the same information background (Rosch, Mervis, Gray, Johnson, & Boyes-Braem, 1976). In line with cognitive psychology, we use the term “concept” to denote a representation in semantic memory (Collins & Quillian, 1969; Medin, Lynch, & Solomon, 2000; for review see: Putnam, 1979). Symbols can be viewed as external representations of mental concepts, externalized by using material or nonmaterial signifiers. The categories from which concepts are formed are based on family resemblances (Rosch & Mervis, 1975) in prototype theory or exemplar models (Storms, De Boeck, & Ruts, 2000), which require that certain components of perceptual inputs are highlighted during information filtering and are associated with the signifier when they are subsequently considered. Any type of content can function as an object of the subsequent consideration—that is, there are other kinds of concepts in addition to noun–object concepts (Medin et al., 2000).
Language-specific cognition is required for forming the concepts and symbolic representations we have described thus far, but when we turn to the beginnings of the production of visual symbols, we find parallels. The earliest manipulated objects are characterized by a highlighting of single components through colorful or structural markings on stones, bones, and shells (these are documented by the findings that have been preserved, although it is possible that other materials were also used but have since disintegrated; Colagè & d’Errico, 2018; Henshilwood et al., 2018). Markings can be used to orient attention in visual processing, which suggests that they can be used as vehicles for meaning because their specialness is recognized. In this sense, markings are the earliest material symbols: They externalize mental representations of the structure of the environment, or the structures individuals have filtered out of the environment.
For definitions, we will use the terms “symbol,” “sign,” and “signal” in line with cognitive psychological and linguistic usage. There are many ways in which signs and symbols can be defined. For example, Peircean semiotics has recently been applied to the analysis of various Paleolithic artefacts (Iliopoulos, 2016a, 2016b; Preucel, 2008). In this perspective, which semiotically deems early markings to be Peircean icons or indexes (see the comments by Coolidge, F. L., & Wynn, T., 2011, and Rossano, M., 2011, on Henshilwood et al., 2011), signs differ from symbols insofar as they are considered to be materially grounded: Signs refer only to themselves, while symbols can acquire meanings in arbitrary and conventional ways (Iliopoulos, 2016b). Consequently, the Peircean perspective assumes that the cognitive architecture associated with the realization of iconic and indexical artefacts could be different from the architecture required for creating full symbols and could have different implications regarding the causality of cognitive evolution.
However, we want to differentiate our use of the term “symbol.” We believe that abstract markings and icons are themselves already symbolic in a basic way, with symbols differing from signs only in adding another level of referencing; that is, there is a hierarchy of referencing in which symbols span multiple levels. For example, it is assumed (and only assumed) that the Venus figurines of the Upper Paleolithic period were a symbol of fertility. As icons, it is possible that they represent only the body of a woman. However, qua icons, they are the content carrier for a picture of the female body, and this necessitates the concept of a female body. It is not doubted that different symbolic levels were acquired during the cultural evolution of modern humans. Even Malafouris (2013), who differentiates between linguistic and material signs, believes that the difference lies primarily in the communicative dimension. Malafouris sees the enactive logic of material semiosis “as a product of a process of conceptual integration between material and conceptual domains”, and even though material icons might not refer to another culturally shared meaning, but only to their directly incorporated meaning, the ability to form concepts and mental representations is still necessary. This is why we focus on the cognitive foundations needed for basic mental representations in general.
Hence, as indicated above, we also use the term “symbol” for the lowest-level representations of structural elements of the environment, because they require (the same) conceptual abilities; we use the term to refer to any entity that stands for something else, where the entity can be any form of external representation (i.e., also including iconic concepts). We use the terms “sign” and “signal” (for an explanation of how these two terms form communicative elements, see, e.g., Tomasello, Call, & Gluckman, 1997) to refer to the identification marks that form part of a symbolic representation, that is, the signifiers. Using signs and signals, we highlight parts of reality to denote reality. Using symbols, we place ourselves in relation to reality and express our notions of reality, referring not only to objects, but also to the relations between them, the conditions and essential characteristics of their existence, and their localization in time and space. Hence, “symbol” refers to both our notion of reality (its perceived structure) and the content carrier that conveys it, while “sign” and “signal” both refer to the content carrier alone.
2.1 Early Forms of Material Symbols and Aesthetics
It is often assumed that the benefits of symbolic behavior are so self-evident (R. White, 1992) that there is no need to explain them or their effects for an individual or a social group in detail. We will argue that there is exactly one commonality between all examples of symbolic behavior: the ability to create representations, which is why marking behavior can already be viewed as visually symbolic and a behavior that distinguishes humans from other primates. To find reasons for the first invention of symbols, we focus on the visual effects of the earliest marked objects. However, it is important to examine not only the visual benefits, but also the possibility of underlying aesthetic rules, as some early artefacts feature structural manipulations that have no direct technical benefit, which hints at an aesthetic value.
Two examples are a hand axe from Tofts, Norfolk, that features a bivalve mollusk and another from Swanscombe, Kent, that features a fossil echinoid. Both come from the Acheuléen, which belongs to the Middle Pleistocene era and pre-dates Homo sapiens (Bahn & Vertut, 1997; McNamara, 2007). The mollusk shell and the echinoid were not added to these artefacts after they were made; rather, the stones featuring them were cut in such a way that the shell and the echinoid that were already embedded in them were left undamaged. As leaving these items on the axes had no direct technical function, this suggests an aesthetic or symbolic reason for leaving them. Moreover, even though the shell and the echinoid were not added to the hand axes after they were made, the objects were highlighted with these items, and cutting around them required a preconceived mental template. Thus, these axes combine two characteristics: the objects were made special through marking, and there is evidence for an aesthetic reason. Additional evidence for aesthetic expression can be seen in the further development of Acheuléen tools from 1.5 million to 100,000 years ago (Abramiuk, 2012) with respect to their shape. Unlike Oldowan tools (2.7 to 1.5 million years ago), the Acheuléen hand axes were bifacial and showed a finished form. The Acheuléen axes were built more and more symmetrically, with a change in their production around 400,000 years ago (Mithen, 1996; T. Wynn, 2002, 2004), while earlier hand axes show no evidence of being meant to be shaped symmetrically in plan view (McNabb, Binyon, & Hazelwood, 2004). A recent study (Brooks et al., 2018) documenting the pigment use and long-distance stone transport of Homo sapiens around 320,000 years ago provides an additional example of colorful markings on stones. These examples of early aesthetic and structuring expression show the importance of directly analyzing the visual features of early artefacts and how they focus attention.
3 The Evolution of Homo Sapiens and the Species’ Cultural Behavior
There are different approaches to determining the beginning of Homo sapiens and the species’ capabilities. Skeletal remains, stones and other artefacts, and evidence from DNA analyses can be used together to trace back to the place and time of our last common ancestors within the genus Homo. Analyses of mitochondrial DNA (mtDNA) have shown that all present-day humans can be traced to a small group of people living in eastern Africa between roughly 194,000 and 160,000 years ago (Gonder, Mortensen, Reed, de Sousa, & Tishkoff, 2007; Ingman, Kaessmann, Pääbo, & Gyllensten, 2000; Stringer & Andrews, 1988). Y-chromosomal DNA analyses examining the root of all living males have found a common ancestor who lived around 104,000 to 59,000 years ago (Tang, Siegmund, Shen, Oefner, & Feldman, 2002; Underhill et al., 2000). African populations seem to have separated early in our evolution (Behar et al., 2008; Henn et al., 2011), though only 150,000 to 90,000 years ago, and humans with combinations of archaic and modern features persisted in Africa as late as 35,000 years ago (Durvasula & Sankararaman, 2020). Skeletal remains point to almost the same time spans. According to these remains, anatomically modern humans emerged around 200,000 years ago (McDougall, Brown, & Fleagle, 2005; T. D. White et al., 2003). This means that we have a single origin and that Homo sapiens, with the genetic basis for most of the cognitive capabilities we have today, dispersed across the globe with those cognitive capabilities, including the use of spoken language and the ability to create visual symbols. Indeed, Roepstorff (2009) has argued that a cognitive connection exists between language and art production, or symbolic practices, because words and objects both function as entities holding a content and are expressions of internal representations, and this also supports the idea that all characteristically human cognitive outcomes have the same origin.
Homo sapiens began to expand across the globe around 60,000 years ago (Atkinson, Gray, & Drummond, 2009; Mellars, 2006). An alternative opinion also states that around 125,000 to 74,000 years ago, Homo sapiens began to expand into Asia (for a review see: Appenzeller, 2012)—but this is not important to our considerations. Any further development of modern humans could only involve cognitive capabilities, since the postcranial skeleton did not change, which allows identification of only one type of modern human. Some archaeologists believe that Homo sapiens did extend its cognitive abilities (Mithen, 1996); for example, a variety of artefacts in large quantities (such as remnants from burial goods and adornment) from between 60,000 and 30,000 years ago have led researchers to this belief. This proposed fundamental change in behavior is also known as the cultural revolution of the Middle–Upper Paleolithic transition in Europe (Mellars, 1973; R. White et al., 1982). Modern humans’ characteristics include their spoken language and their variety of cultural and social practices. Regarding the implications of changes in Homo sapiens’ “cognitive abilities” compared to other species of Homo, it is also important to mention the evolution of neurocranial globularization during the past 125 ka. A gradual change in anatomically modern humans’ cranial shape compared to archaic humans and Neanderthals suggests enlargement of the parietal cortex, the precuneus (where visual imagery generated in the prefrontal cortex is integrated with motor activity), the cerebellum (which regulates delicate hand–eye coordination for construction), and possible enlargement of the basal ganglia (which regulate motor activity) (Bruner et al., 2014; Heilman, 2016; Kochiyama et al., 2018; Neubauer, Hublin, & Gunz, 2018).
3.1 Cultural Behavior and Artefacts: Long-term Cultural Intensification or Sudden Cultural Revolution?
The proposal of a sudden change in the human mind, the cultural revolution, is supported by the new density of cultural artefacts that emerged around 40,000 years ago. Some of the first evidence of ornamentation (Abramiuk, 2012) in Europe is 43,000 years old (Kozlowski, 2000), and evidence in the form of Late Stone Age ostrich shell beads in East Africa (Ambrose, 1998a) as well as evidence in Asia (Turkey and Israel) are both 41,000 years old (Kuhn, Stiner, Reese, & Güleç, 2001). Promoters of the sudden-change perspective state that something fundamental, such as anatomical or genetic changes shaped by natural selection (referred to as a change in cognitive fluidity), occurred during the Middle–Upper Paleolithic transition and provided modern humans with the ability to have their different types of intelligence function together fluidly (Mithen, 1996, 1998b). This is supposed to have led to the origin of our diverse cultural outcomes through art, science, and religion. According to this perspective, specialized types of intelligence that had previously been reserved for special problem solving were reorganized in the brain, and then a working structure was created that made it possible to combine different intelligence types or modules (although today’s human mind is still described by some researchers as comprising different, separated modules; for example, H. Gardner, 1983; Tooby & Cosmides, 1992). It is argued that, although the brain size in general remained the same across modern humans (from 200,000 years ago to now), ranging between 1200 and 1500 cc (Abramiuk, 2012), the different modules that are responsible for different specialized tasks were reorganized to work together as a network. According to this perspective, a combination of different abilities such as technical, social, and natural history intelligence is necessary for creating material symbols (Mithen, 1996).
Moreover, this perspective views the mind as working in a holistic way rather than in separate modules responsible for different tasks. As described by Fodor, cognition comprises analogical reasoning (1983), creativity, and holism (1985).
In contrast, the perspective of the gradually evolving human mind sees no sudden changes. According to the gradual evolution perspective, the human mind emerged in a series of gradual changes occurring over several hundred thousand years, even though a clearly stronger density of artefacts appeared around 40,000 years ago. Proponents of the gradual evolution perspective rely on findings that date to earlier than 40,000 years ago, of which there are in fact fewer, although they do exist. In addition to the examples already mentioned in the introduction, there have also been findings of nonpurposeful use of ochre from between 130,000 and 120,000 years ago (Bar-Yosef Mayer, Vandermeersch, & Bar-Yosef, 2009), shell beads from about 92,000 to 82,000 years ago (Bouzouggar et al., 2007), pierced and colored shells and a piece of ochre with geometrical incisions from about 75,000 years ago (Henshilwood et al., 2009; Henshilwood et al., 2002), remnants of ochre found in Blombos Cave, South Africa, that are around 100,000 years old (Balter, 2009a), and geometrically ornamented ostrich shells found in the Diepkloof rock shelter in South Africa that are about 60,000 years old (Texier et al., 2010). Very recent findings that are about 540,000 to 430,000 years old suggest that Homo erectus also engraved objects (Joordens et al., 2015). A good overview of the different new cultural achievements during this long period can be found in McBrearty and Brooks (2000). Various researchers regard the use of beads as a sign of modern cognition (Ambrose, 1998b; d’Errico, Henshilwood, & Nilssen, 2001; Henshilwood & Marean, 2003; McBrearty & Brooks, 2000) and the practice of pigment processing as very early symbolic behavior and part of notational systems (Knight, Power, & Watts, 1995).
There is a vigorous debate between proponents of these two perspectives regarding whether the early findings of markings and pigment processing should be seen as the beginning of symbolic behavior. According to Mithen (1996) and Wynn and Coolidge (2009), the most important things that must be identified in order to settle this debate are a cognitive basis for symbolic behavior, a capacity to intentionally create marks, and a common definition of what exactly symbolic behavior is. For symbolic behavior, we suggest the definition provided earlier: It is the externalization of mental representations in material or immaterial content carriers. Regarding its cognitive basis, the previously mentioned two genetic analyses regarding mtDNA and Y-chromosomal DNA trace us back to ancestors within our own type of neuroanatomically modern humans, which means that the genetic and skeletal findings do not fit well with the theory that a mutation was responsible for the emergence of working memory: “The radical reorganization of gene expression that underwrote the distinctive physical appearance of Homo sapiens was probably also responsible for the neural substrate that permits symbolic cognition. This exaptively acquired potential lay unexploited until it was ‘discovered’ via a cultural stimulus” (Tattersall, 2009, p. 16018). The expression “exaptation” used by Tattersall was first proposed by Gould and Vrba (1982) and refers to nonadaptive co-opting of existing brain plasticity for a new symbolic cognition function.
Modern human anatomy, including the anatomy of the brain, has been apparent since around 200,000 to 150,000 years ago. In this regard, another finding suggests that modern humans should have had at least some sort of cognitive fluidity very early in their evolution: Investigations into the origin of certain amino acids on the modern human FOXP2[2] gene that “have been found to be genetically linked to the advent of language” (Abramiuk, 2012, p. 273) suggest that language had already developed by 200,000 years ago (Enard et al., 2002). Spoken language is known to use a network of different centers in the brain (Deacon, 1998), which confirms that cognitive fluidity is also necessary for material symbols, because both are symbolic outcomes.
As Sterelny (2014) explains, the early use of symbols, which he already sees in markings and decorations, can be carried out without metarepresentational capacities and, in particular, without the advanced theory of mind (ToM) capacities. Anatomically, the early humans of about 100,000 years ago and later humans of about 50,000 years ago are not very different. There is no evidence of genetic changes that led to a cultural revolution. Sterelny goes on to explain that only the social lives of early humans changed, and these changes needed markers for social bonding, as can be seen by the decorations. In this regard, early material symbols were an objectification of social structures. Thus, we need to focus on the effects of markings on visual perception, because we can derive possible advantages for social life from these effects.
Hence, we will argue for the model proposed by Sterelny (2011), which can be seen as a mixture of the two perspectives. This means that the advent of modern behavior did not coincide with the first appearance of anatomically modern humans, although their cognitive abilities might have been the same over time: “There seems to be good evidence that the modern cultural ensemble arose gradually in Africa and that its abrupt appearance in the European record is the signature of migration (and perhaps indigenous response) rather than rapid biocultural evolution” (Sterelny, 2011, p. 811; see also: Klein, 2008; Klein & Edgar, 2002; McBrearty & Brooks, 2000; McBrearty, 2007). In this model, the behavioral changes are built on previous achievements. It should also be mentioned that in later works, many supporters of the old mutational models (e.g. T. G. Wynn & Coolidge, 2017) no longer make such a strong assertion but rather seem to believe that biology is a necessary but not sufficient condition for cognitive and cultural change and that culture remains a critical condition for this goal. Regarding why (sparse) evidence of symbolic use appeared around 100,000 to 80,000 years ago and subsequently disappeared and then reappeared around 50,000 years ago, we argue that all we can infer based on the record are behavioral changes, which are more likely due to changes in social structures and not anatomical or genetic ones. Markings can be seen as signs, already requiring an individual to be receptive to the benefit of their usage and therefore already requiring the individual’s ability to engage in the use of symbols. To a certain extent, nonhuman primates are also able to use signs (R. A. Gardner & Gardner, 1969; Greenfield & Savage-Rumbaugh, 1990, 1991; Rivas, 2005). In the following section, we examine the extent to which they are similar in this regard but also how the symbolic behavior of humans differs.
4 Cultural and Symbolic Behavior in Comparative Psychology
Comparative studies often focus on the question of which working memory capacities are present in great apes, because this is supposed to be connected to higher-order consciousness and the use of symbols, and on whether great apes have the ability to use symbols to communicate, as a kind of materialization of their mental concepts for providing information to others, or for their own representational examination of their environment. To test their cultural behavior, several studies have examined chimpanzees’ ability to use tools and transfer their knowledge about this use (Boesch, 1991, 1993), as well as whether they have mental maps to remember and find the locations of hidden objects (Menzel, 1973, 1978). Many comparative studies on social learning and transmission of cultural traditions in different great ape species have revealed that cultural transmission is much more widespread in ape species than had earlier been suspected (Gruber, 2016; Schuppli, Koops, & van Schaik, 2016; Whiten, 2017; Whiten, Ayala, Feldman, & Laland, 2017) and also depends on whether they are living in captive environments or in the wild (Gruber, Singleton, & van Schaik, 2012; Kendal, 2015; Musgrave & Sanz, 2016). Tool use and transfer of knowledge represent what the tool user must know about the outcomes that using a tool on a target object will have. Both are connected to what we call cultural behavior and are similar to early human behavior.
In this context, the concept of Gibsonian affordances (Gibson, 1966, 2014) is helpful for describing the relation between an organism and the environment, as certain objects or events permit certain behaviors to occur. It is a descriptive term that refers to the functional opportunities that animals have for interacting with their environment or to properties that permit a behavior (e.g., climbing up or hiding in something). Regarding tool use, early hominins would have functionally perceived the sharp edges of fractured stone, including those of the earliest Oldowan choppers, as objects that could slice or penetrate some structure (Coss, 2003), which then led to the development of ever more symmetrical structures for their shape. The affordances portrayed by abstract graphical images, such as technological symbols, would mostly need to be learned. Crosshatch and zig-zag patterns on artefacts by Asian Homo erectus and early modern humans can be viewed as salient representations of ecologically important biological patterns, possibly recognized innately (e.g., macaques and humans are both attuned to snake scales; Isbell & Etting, 2017; Kawai & He, 2016). Despite arguments by d’Errico and Henshilwood, these salient designs are not recognized as being symbolic representations, although engraved notches on artefacts might afford (symbolically characterize) counts (numbers) for record keeping (d’Errico et al., 2018; for a recent Neanderthal example see Majkić, Evans, Stepanchuk, Tsvelykh, & d’Errico, 2017).
Our interpretation is different, however, because mental maps are connected to the functions of working memory. Even though the method of inferring information about our early ancestors using great apes—chimpanzees, bonobos, gorillas, and orangutans—as models has been critiqued (Sayers & Lovejoy, 2008; T. D. White et al., 2009), great apes provide a good way to narrow down the likelihood of a human behavior being unique or shared by a common ancestor (Carvalho & McGrew, 2012). Following De Waal (1999), Tomasello summarizes: “In the absence of evidence our default assumption will be evolutionary continuity” (Tomasello, 2014, p. 15).
Marc Mehu’s (2015) combination of a hybrid information-theory construct of signal transmission (encoding) with the perceiver’s interpretation of the information is relevant to the topic of de Waal’s notion of evolutionary continuity: “Models of information transfer are useful to understand certain aspects of symbolic communication, but they have to be complemented with models that emphasize social influence. Such integration implies that we recognize the different functions associated with the roles of signaler and perceiver in communication” (p. 4). He concludes that research “should pursue questions related to what is achieved by communicative signals and by perceivers’ assessment mechanisms, along with a careful analysis of the contextual factors and interactive consequences of multimodal displays” (p. 4). For a nonhuman primate comparative view of multimodal communication, see Partan and Marler (1999).
Great apes understand many aspects of their physical and social worlds (for a review, see Tomasello & Call, 1997) and the underlying relations to others’ intentions. This means that human characteristics such as language, sociality, and culture are not unique, but are also the great apes’ approach to problem solving (Tomasello, 2014). To a certain degree, for example, chimpanzees understand the goals and intentions of others as well as their perceptions, knowledge, and beliefs (Call & Tomasello, 2008). What makes humans different is the extent to which they are capable of understanding the mental states of others, including mental representations of the world that guide others’ actions (Call & Tomasello, 2008). Several studies have shown that great apes seem to be unable to understand false belief (Call & Tomasello, 1999; Hare, Call, & Tomasello, 2001). In contrast, one- and two-year-old human children do seem to understand false belief to a certain degree (Clements & Perner, 1994; Csibra & Southgate, 2006; Onishi & Baillargeon, 2005; Surian, Caldi, & Sperber, 2007). The larger picture of knowing someone else’s belief–desire system (i.e., having a concept of someone else’s mind, referred to as the Theory of Mind (ToM) construct) can be seen as originating from the shared intentionality and cooperative communication of which great apes are only capable to a certain extent (Leavens & Racine, 2009; Tomasello, 2014; Tomasello, Carpenter, Call, Behne, & Moll, 2005).
Much recent work has examined primates’ abilities regarding shared intentionality and ToM (for a review, see Martin & Santos, 2016). This work has shown that primates are able to track the current and past perceptions of others, but do not represent others’ beliefs or form representational relations in the same way as humans (Call & Tomasello, 1999; Kaminski, Call, & Tomasello, 2008; Krachun, Carpenter, Call, & Tomasello, 2009; Marticorena, Ruiz, Mukerji, Goddu, & Santos, 2011; Martin & Santos, 2014; O’Connell & Dunbar, 2003). Still, there is consistent evidence that primates are aware of other individuals’ perceptions and information about the world (Flombaum & Santos, 2005; Hare, Call, Agnetta, & Tomasello, 2000; Hare et al., 2001; Hare, Call, & Tomasello, 2006; MacLean & Hare, 2012; Melis, Call, & Tomasello, 2006; Santos, Nissen, & Ferrugia, 2006; Schmelz, Call, & Tomasello, 2011). These studies show that shared intentionality and representation of others’ beliefs exist at different levels of abstraction and that primates (can) only use these to a certain extent.
Regarding the ability to engage in symbolic behavior, Tomasello also views the use of iconic gestures, or pantomime, as the foundation for symbolic behavior, because these gestures symbolize entities, actions, or situations in external icons (Tomasello, 2014), which are then advanced and integrated in further levels of abstraction. Although the emergence of symbolic gestures is not our concern in this review, but rather symbolic markings, there are commonalities. According to Tomasello (2014, p. 68), “Joint goals and attention, as the shared aspect, and individual roles and perspectives, as the individual aspect” unite two levels of cognitive abilities, the two different concepts of the two communicative partners, and their specific perspectives. This ability to do things together did not require language but was rather its prerequisite. Markings on an object make it salient for oneself, and they change the marked object into something different for others. A marking becomes a sign by virtue of being an identification mark that can be remembered, in contrast to the ordinary object before it was marked. Thus, the marking already functions as a concept of an object on which one can fall back as a new object of special interest.
Nonhuman primates do not use iconic gestures (in the sense of higher-order communicative entities that transmit information against a shared culturally-agreed-upon background) or vocalizations (Tomasello, 2014), and they also do not understand other signs, for example, as markers that indicate someone else’s communicative intention (Herrmann, Melis, & Tomasello, 2006; Tomasello et al., 1997). In addition, to our knowledge, there is no reported evidence that nonhuman primates actively use their own signs or markings for relating specific objects to their experiences and memories or for actively representing their mental representations of their surroundings. Regarding symbolic gestural usage, we know that great apes that were raised by humans and trained to use symbols for communication did so almost exclusively to request something (Greenfield & Savage-Rumbaugh, 1990, 1991; Rivas, 2005; Tomasello, 2014). The expressions were always from their own perspective and did not show any constructions that were meant to refer to the recipient’s knowledge and expectations (Tomasello, 2014).
Other studies with trained chimpanzees, bonobos, and gorillas (R. A. Gardner & Gardner, 1969, 1978; Patterson, 1980; Savage-Rumbaugh, 1986) have shown that they have difficulty inventing new words and that the structure of their sentences is simple (extending to only a few words). More recent studies have focused on intentional communication with innate signals (Byrne et al., 2017), though the communicational radius remains the same (Jensvold, 2016; Pika, 2015; Rumbaugh & Massel, 2018; Tomasello & Call, 2019; Zebrowitz & Rhodes, 2004). There is a large body of literature on pointing gestures in primates, also viewed as sign language, but there are “substantive critiques of how to interpret pointing or ‘pointing-like’ gestures in animals [and whether these gestures are rather used] in a way that communicates intent (declarative) rather than motivational states (imperative)” (M. A. Krause, Udell, Leavens, & Skopos, 2018, p. 326).
Donald (1991) believes that the primates’ difficulties stem from the absence or near absence of semantic memory, which “consists of impersonal information, such as general concepts that are socially agreed upon” (Abramiuk, 2012, p. 162). Studies on the presence of episodic-like memory in great apes (Dere, Kart-Teke, Huston, & Silva, 2006; Martin-Ordas, Haun, Colmenares, & Call, 2010; Templer & Hampton, 2013) show that it is unlikely that only humans are capable of episodically remembering, but it seems that great apes do not have mental states whose contents are propositionally structured (Sant’Anna, 2018). The ability to conceptualize symbols for signs, spoken language, and symbolic actions (such as rituals) should rely on the same cognitive structures. Many studies on concept and category learning in animals have shown that other species are capable of differentiating between classes of categories (Zentall, Wasserman, Lazareva, Thompson, & Rattermann, 2008) and have demonstrated strong continuities with humans in categorization, but they have also shown that the major difference is the extent to which humans express their concepts and categorizations for others (for a review, see Smith, Zakrzewski, Johnson, Valleau, & Church, 2016). Regarding the reasons for the differences in the cognitive structures of primates, including humans, Barsalou (2005, p. 311) states that “Humans represent situations that are completely unrelated to the current situation. … This system [greater frontal control plus mechanisms that support social coordination] might also allow humans to focus on mental states and their relations to events, thereby supporting the semantics of abstract concepts (Barsalou & Wiemer-Hastings, 2005).” This aligns with our previous description of the evolution of neurocranial globularization during the past 125 ka in anatomically modern humans compared to archaic humans and Neanderthals.
To summarize, great apes exhibit social and communicative approaches to problem solving similar to those of humans, but their cultural and symbolic behaviors still occur in different forms than those of humans. This is because humans can combine different perspectives detached from space and time. Moreover, the extent to which great apes can conceptualize (regarding other individuals as well as objects or situations) determines the differences in their mechanisms for handling challenges in their environment. Great apes conceptualize in relation to their direct surroundings, which can be seen in the symbolic meanings they represented in the aforementioned studies, but they also differ in the symbols they use, insofar as gestures do not persist over time as material objects do. Humans conceptualize reality to a deeper extent; the conceptualization relates not only to objects, but also to relations between different objects and the conditions and essential characteristics of their existence or their localization in time and space.
Cultural comparisons can also shed light on the variability in humans’ use of their cognitive capacities to deal with their ecological environment and how their sociocultural structures correspond to this. Combined with species comparisons, cultural comparisons can also help determine the likelihood of a certain behavior being culturally shaped or having a deeper evolutionary background. In the following section, we present the results of three empirical studies that we conducted for the purpose of cultural and species comparison.
5 Empirical Studies on the Effects of Visual Symbols
Markings restructure objects, changing them from ordinary objects into new ones with their own individuality, and they make specific parts of an individual’s environment salient. The concepts that the new objects represent can vary from individual to individual. The first modifications of objects could have been executed without any significance, but due to the highlighting effects, it is likely that they were subsequently used to carry a meaning to be communicated, such as a marker for ownership, group identity, or personal identity. In what follows, we show how eye tracking combined with a cultural and species comparative approach can lead to reliable information about the relation between viewing behavior, cognitive information processing, and visual adaptation to the living environment.
5.1 Visual Perception and the Eye-Tracking Method
For many mammals, the visual perception channel is one of the most important for processing information from the environment. The analysis of visual attention in psychological research began around one hundred years ago (Duchowski, 2007), when Dodge and Cline (1901) used the first noninvasive technique to measure eye movements via corneal reflection (Jacob & Karn, 2003). The method advanced from mounting the apparatus on the head or in front of the eyes to corneal reflection techniques, with the first of these developed specifically for experiments with young children (Gredebäck, Johnson, & von Hofsten, 2009; Haith, 1969; for a review, see Jacob & Karn, 2003; Salapatek & Kessen, 1966). Most eye-tracking studies analyze the fixations and saccades of the eyes and combine these to construct the scan path that the eyes build on a given stimulus (Poole & Ball, 2006). The fixations are not random, but are rather centered on the object (Buswell, 1935). In a first scan, the rough structure of the object is detected, and then the eyes rest on the object in longer fixations, which can indicate that there is greater interest in the area fixated upon or that the area is more difficult to encode, as formulated in Just and Carpenter’s eye–mind hypothesis (1976, 1980, 1984). Eye movements can thus reveal underlying cognitive processes (Just & Carpenter, 1984; Rayner, 1995, 1998). The duration of a fixation on a specific part of a stimulus can be viewed as an indication of neural information processing or cognitive activity (Loftus & Mackworth, 1978; Salthouse & Ellis, 1980), and regressions of fixations back to previously fixated parts reflect the difficulty of processing and the amount of interest a subject has in the visual information (Goldinger, He, & Papesh, 2009; Just & Carpenter, 1984; Mak, Vonk, & Schriefers, 2002; Radach, 1998; Reichle, Pollatsek, Fisher, & Rayner, 1998).
Thus, the intensity of a subject’s information processing can be inferred using the combined measurements of fixations and saccades. It is also possible to analyze spatial and temporal information about the viewing behavior (provided, respectively, by the scan path and by the duration of the fixation on the stimulus).
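The two measures described above, fixation duration per stimulus region and regressions back to a region that was fixated earlier, can be illustrated with a minimal Python sketch. It is not the analysis pipeline of any study cited here. The `Fixation` record, the region labels ("marking", "background", "edge"), and the duration values are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Fixation:
    """One fixation in a scan path (hypothetical record layout)."""
    aoi: str            # label of the stimulus region that was fixated
    duration_ms: float  # how long the eyes rested there


def dwell_time_by_aoi(scan_path):
    """Sum fixation durations per region (temporal information)."""
    totals = defaultdict(float)
    for fix in scan_path:
        totals[fix.aoi] += fix.duration_ms
    return dict(totals)


def count_regressions(scan_path):
    """Count returns to a region that was fixated earlier and then left."""
    seen = set()
    previous = None
    regressions = 0
    for fix in scan_path:
        if fix.aoi != previous and fix.aoi in seen:
            regressions += 1
        seen.add(fix.aoi)
        previous = fix.aoi
    return regressions


# Illustrative scan path; the values are invented, not measured data.
scan_path = [
    Fixation("marking", 220.0),
    Fixation("background", 90.0),
    Fixation("marking", 310.0),
    Fixation("edge", 120.0),
    Fixation("marking", 180.0),
]

print(dwell_time_by_aoi(scan_path))
print(count_regressions(scan_path))
```

In this toy scan path, the longer summed dwell time on the "marking" region and the repeated returns to it are the kind of pattern that the eye–mind hypothesis would read as greater interest or greater processing effort.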
Eye tracking is a relatively new method for studying the cognitive processes of great apes and comparing these to those of different primate species (Hattori, Kano, & Tomonaga, 2010; Kano, Hirata, Call, & Tomonaga, 2011; Kano & Tomonaga, 2009, 2010, 2011a, 2011b). Assuming that the eye-tracking techniques can be reasonably applied even though the animals live in captivity, have different visual skill development than humans, and may also differ in other respects, a comparison of the eye-tracking patterns of different cultural groups and species allows inferences to be made about their basic visual organization.
We conducted three eye-tracking studies in a cultural and species comparison to analyze the characteristics of early markings. First, we studied the general visual effects of the markings, how they are used in visual processing, and whether they are really given greater attention, which would mean that they are highlighted in contrast to their background (Mühlenbeck, Jacobsen, Pritsch, & Liebal, 2017). Second, since the structures of early markings were often made in a symmetrical way and the overall shapes of hand axes and other tools became more and more symmetrical, we analyzed the visual effects of symmetric structures and whether symmetry could have had an attention-seeking effect (Mühlenbeck, Liebal, Pritsch, & Jacobsen, 2016). Third, we studied the perception of colors to determine whether humans and nonhuman primates share an avoidance or approach reaction to specific colors (Mühlenbeck, Liebal, Pritsch, & Jacobsen, 2015). Thus, the first study concerned the general visual effect of markings, while the second (symmetry) and third (color) studies concerned how the markings or highlighting were done. In each case, it could be argued that the markings would be fixated upon longer because marked objects contain more complex information to be processed. Our environment contains an endless amount of information that must be filtered, and in our processing we choose what we pay attention to. Markings can also be viewed as reducing the information to be processed by providing orientation points. Analysis of different cultural groups and species reflects the different ways these groups solve this information-processing task. No nonhuman primate species have been reported to use or produce highlighting of objects, as Homo sapiens does (although, as noted above, other species of the genus Homo have decorated objects; Majkić et al., 2017).
Fixation times and patterns can be used to analyze whether other species fail to perceive markings as more complex information. In turn, symmetry can offer an ordering that helps in filtering information, as the ordered structure makes the information easier to process. If spectators pay more attention to symmetry than to other patterns, symmetric markings would be a good choice for highlighting objects.
Assuming that symmetry attracts attention, previous studies have examined the preference for symmetry in other animals. For example, Rensch (1964) conducted a study with capuchin monkeys, vervet monkeys, jackdaws, and crows that tested their preference for symmetrical versus asymmetrical shapes, and Morris (1962) trained different primate species to paint and draw and examined whether they were able to tag predetermined patterns and balance asymmetrical shapes. Both studies found clearly positive effects of symmetry, but the tests were only conducted with individual subjects, so it is not clear whether the tests reflected individual or general preferences. Similarly, testing color preferences in cultural and species comparisons holds interest because the capability of trichromatic color vision has evolved in many primates, including humans and other apes, as well as in Old World monkeys (Buchanan-Smith, 2005; Wells, McDonald, & Ringland, 2008) and one genus of New World monkeys (Dominy & Lucas, 2001). When many species are found to have this capability, the question of whether colors are connected to specific information, for example, hazards or fertility, arises. If so, colors could be used to deliver a certain kind of information or provide a certain signal, and the color with which objects are marked could already contain information that the producer intended to communicate. Thus, markings and highlighting with symmetry and colors could have been the basis for building content carriers because members of Homo sapiens developed the ability to agree with others on using these to draw someone else’s attention and also to use these for themselves, to re-identify objects to which they attached a certain importance.
For our studies, we chose three groups of primates based on the distinctiveness of their habitats and/or sociocultural backgrounds. For humans, two populations were selected, Namibian hunter–gatherers and German town-dwellers. They are different in many ways, but their living environments have been shown to particularly influence their visual perception (e.g., Haun, Rapold, Call, Janzen, & Levinson, 2006). The Namibian group is special insofar as, like other southern African hunter–gatherers, its members have outstanding orientation skills. Widlok (1997, p. 328) describes the orientation strategy of the ≠Akhoe Hai//om, who live in the Northern Namibian Savannah: “Unlike those associated with Indo-European languages it does not rely primarily on the intersecting body-centred axes of left/right and front/back. And, unlike western maps, Hai//om orientation is not based on a grid of latitudes and longitudes.” These orientation skills represent a more holistic approach to dealing with the challenges of the environment and locating oneself within this environment. The ≠Akhoe Hai//om perceive humans and their senses as part of the environment and not separate from it. Widlok (2008, p. 378) explains that the “senses participate in the ‘environment’ and have evolved with the general evolution of the body and the landscape” and further that “there are indications that the insistence to separate out ‘the landscape’ from human practices, including the naming of places as well as the moving through space, is not found in ≠Akhoe Hai//om cultural practice”. Western European humans’ living environment is characterized by a high population density and a mixture of rural, industrialized, and urban landscapes. In contrast to Hai//om children, who live and play outside most of the day, German schoolchildren spend most of the day inside buildings and are therefore confronted with different dimensions in depth perception.
The industrialization of cities is also significant, because these cities feature more buildings and an infrastructure net, and sight of the horizon is restricted.
As a nonhuman primate, we selected orangutans, whose natural habitat differs even more from those of the two human groups. Orangutans live in the high canopy of the rainforests of Sumatra and Borneo and use all available vertical and horizontal space when climbing trees, something that is not always possible when they are housed in captive environments such as zoos (Hebert & Bard, 2000; Perkins, 1992; Wilson, 1982). Zoos provide fewer opportunities for apes to move upwards because the enclosures often do not include many trees. Therefore, orangutans in zoos are frequently seen sitting on the floor, although they have been shown to prefer using the upper levels of vertical space when they have the opportunity to climb upwards (Hebert & Bard, 2000). The possibility of climbing and using vertical space could influence orangutans’ visual perception, although we do not know to what extent. Many orangutans that were born in captive environments and have never been exposed to the visual conditions of a dense canopy nevertheless use all spatial dimensions to climb in trees. The Wolfgang Köhler Primate Research Center at Leipzig Zoo, where the orangutans in our studies were tested, offers many opportunities for the apes to climb and hide in the higher levels of trees. Thus, the orangutans in our studies were familiar with the three-dimensional use of climbing space and therefore lived in an environment that differs significantly from that of humans.
5.2 Cultural and Species-specific Differences in Visual Perception
The purpose of our three studies was to test three characteristics of marked objects: the marks that make objects salient (marking), the form in which objects can be modified (symmetry), and the use of specific colors that can reflect an associated meaning through a shared preference for or aversion to these colors. The studies also aimed to determine whether marking behavior could have been based on underlying aesthetic universals.
The study on markings (Mühlenbeck et al., 2017) showed that, regardless of their cultural background, humans paid more attention to marked objects and used the markings in their visual processing of the objects, but the orangutans did not. The orangutan group showed a trend toward preferring marked sticks over unmarked ones, which indicates that they also responded to the markings to some extent. However, their overall viewing behavior seemed to be completely different since they generally paid more attention to the background of the objects than the humans did. This suggests that human perception is trained in finding signs and signals, in the sense of identification marks, whereas orangutans’ perception is not. Considering markings as basic symbolic representations of the structure of our environment, our studies showed that the difference between the humans and orangutans was that the orangutans only responded to the markings on the objects they knew; that is, they perceived the markings only in the context of the known objects, and not as a general abstraction of an object marker common to the other objects that were presented. Hence, for the humans, a structural abstraction emerged as a commonality among all marked objects, whereas for the orangutans, no abstraction among the objects apparently occurred.
Our study on symmetry (Mühlenbeck et al., 2016) showed that the same result also holds for symmetric structures. The humans preferred symmetry over asymmetry and used the ordered structures in their visual processing by sustaining their fixation on them after briefly scanning two patterns, one symmetric and the other asymmetric. In this regard, it is worth noting that preschool children’s early artistic expression is dominated by pattern symmetry (Kellogg, 1969). In contrast, the orangutans did not differentiate between the two types of structures.
Our study on colors (Mühlenbeck et al., 2015) showed that there were no shared color preferences between orangutans and humans, and also that the visual perception of colors was not influenced by a simultaneously heard auditory stimulus. In the human group an aversion to the color yellow was found, but not among the orangutans, which suggests that the use of color for markings has no predetermined connected information.
One explanation of the ability to attend to markings could be the ability to respond to signs and use them in the structural processing of one’s surroundings and as a prerequisite for creating symbols for the representation of one’s surroundings. While we can assume that the two species under investigation in the three studies share overlapping mental representations in their long-term memory, i.e., concepts, it is not clear whether orangutans share the basic processes of early symbol use found in Homo sapiens, as reviewed above. Our hypothesis that markings and symmetry are used in human visual processing was confirmed. (Regarding the results of the aesthetic preference for these structures, we refer the reader to our studies.) However, we confirmed neither a shared preference nor a shared fixation avoidance when colors were combined with negatively or positively valenced auditory information.
As described above, the living environment, among other factors, influences how individuals perceive their surroundings. Separation of the self from the surrounding environment could be the reason why the German participants in the studies on markings and symmetry concentrated completely on the center of the objects and ignored the objects’ background. In contrast, in the ≠Akhoe Hai//om culture, there is no strong separation between subject and environment, which could explain why the Hai//om always perceived objects as part of the background. Spatial cognition systematically varies with language and culture, as found by Haun et al. (2006), who examined four different genera—Pongo, Gorilla, Pan, and Homo—regarding their processing of spatial relations, and found that all four genera preferred allocentric over egocentric spatial orientations. This means that they linked themselves to a reference frame based on their external environment rather than their own position in this environment. This shows that the preference for allocentric coding of spatial relations can be overridden by cultural preferences, as in our own Western European culture, where we have a more egocentric orientation.
Biological mechanisms could also explain the differences in visual perception. People who live in their original environmental niche, as hunter–gatherers do, develop almost no myopia (Cordain, Eaton, Brand Miller, Lindeberg, & Jensen, 2002). Moreover, several studies involving people living in industrialized cities, not only in Western Europe but also, for example, in China (Angle & Wissmann, 1980; Lu et al., 2009; Park & Congdon, 2004), have found that these people develop more myopia. However, the extents to which genetic predisposition and habituation of the eyes to a near focal distance during close work have an impact are still under discussion. A strong connection has only been found between indoor activities and myopia, while outdoor activities have been shown to reduce the prevalence of myopia in children (Dirani et al., 2009; Jones et al., 2007; Rose et al., 2008). Thus, for the ≠Akhoe Hai//om, spending most of their life outside could result in better depth perception and hence in different attention being paid to the object–background relation. The connection between attention in visual perception and ecological-sociocultural backgrounds should be tested in future studies, including a broader variety of cultures.
The distinctiveness of the three different groups regarding their spatial orientation and their perception of their environment played a major role in our eye-tracking studies. The studies showed that, though ≠Akhoe Hai//om children are very different from German children in terms of their culture, their social life, and how they perceive their surroundings and locate themselves within them (their scanning patterns represented these differences in perception), both groups nevertheless preferred the markings and the symmetric patterns in their fixations. The main difference between humans and orangutans was that the orangutans scanned the stimuli much more quickly and with a wider radius than did the human participants. This is consistent with the findings of Kano et al. (2011), who explain the different scanning behaviors by different adaptations to the respective ecological environments. We agree with their argument that “it may be more beneficial to scan visual fields more quickly … in the context of arboreal living, where objects and animals tend to appear in an unexpected manner, as may be the case for chimpanzees and orangutans,” and “rather than constantly retrieving new information, humans may keep their gaze stationary and thereby promote time-consuming internal processing (e.g., for the sake of categorical and language processing)” (Kano et al., 2011, p. 2354). Although it is only an interpretation of Kano and colleagues’ findings, this statement reveals very clearly how differences between species and cultures can be explained by the surroundings that they live in and to which they are adapted. The orangutans we studied were living in captivity, so it is possible that other orangutans living in a natural habitat would show different viewing behaviors.
We do not think, however, that the extent to which the viewing behaviors might differ would have a significant influence on the presented results, because the environmental influence did not reveal itself to be significant regarding the viewing architecture in our cultural comparison. For the three groups tested, we showed how visual perception and attention capturing can be understood relative to the participants’ ecological and sociocultural environments, which revealed commonalities that resisted a cultural override.
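The contrast drawn above between quick, wide scanning and slower, more stationary viewing can be made concrete with a small Python sketch. It assumes two simple operationalizations that are ours for illustration and are not taken from the cited studies: scanning speed is approximated by mean fixation duration, and scanning radius by the mean distance of fixations from their centroid. The function names, the coordinate units, and the two example scan paths are hypothetical.

```python
import math
from statistics import mean


def mean_fixation_duration(fixations):
    """Average duration (ms) over (x, y, duration_ms) fixation tuples."""
    return mean(duration for _, _, duration in fixations)


def scan_radius(fixations):
    """Mean Euclidean distance of fixations from their centroid.

    This is one possible way to quantify how widely a stimulus is scanned;
    it is an assumption made for this sketch.
    """
    cx = mean(x for x, _, _ in fixations)
    cy = mean(y for _, y, _ in fixations)
    return mean(math.hypot(x - cx, y - cy) for x, y, _ in fixations)


# Invented scan paths in arbitrary screen units, not measured data.
stationary_viewer = [(0, 0, 300), (2, 0, 300), (0, 2, 300), (2, 2, 300)]
wide_scanner = [(0, 0, 150), (6, 0, 150), (0, 8, 150), (6, 8, 150)]

for label, path in [("stationary", stationary_viewer), ("wide", wide_scanner)]:
    print(label, mean_fixation_duration(path), round(scan_radius(path), 2))
```

Under these assumptions, a viewer with shorter mean fixation durations and a larger radius would correspond to the quick, wide scanning pattern described for the orangutans, and the opposite profile to the more stationary human pattern.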
6 The Invention of Visual Symbols
6.1 A Theory Regarding the Origins of Visual Symbols
According to Mithen (1996), technical skills, knowledge of natural history, and social intelligence are the three types of intelligence that represent the three basic mental attributes involved in creating and reading visual symbols. These should work together smoothly: 1) planning and execution of a preconceived mental template or construct (technical intelligence); 2) intentional communication that is not limited in terms of time and space and easily recalled in memory (social intelligence, language intelligence); and 3) attribution of symbolic meaning independent of the object (natural history intelligence as, for example, in the attribution of meaning to hoof prints as natural signs). As we have argued, we should not focus on the materials used or the types of intelligence involved, but rather on the basic requirements of symbols, which are any form of content and any form of content carrier. Therefore, improvement in the cognitive abilities required for producing visual symbols should not be inferred from technical advancements, but rather from the intention to engage in symbolic behavior. As Mithen (1996, p. 160) explains, “What we need to find in the mind of Early Humans is a capacity to intentionally create marks or objects of a preconceived form.” Since incidentally produced incisions and marks from tooth scratches on bones and the like can be excluded, our findings indicate that the markings were applied intentionally, because as signs they guide the attention of other humans and because information about the mental representation of the perceived structure of the environment is provided by the visual-scanning architecture.
We suggest that the ability to already use manipulated objects in this way as information carriers represents the most crucial step in cultural evolution and does not have to be connected to cognitive or genetic changes shaped by natural selection. Although cognitive changes would be inherent in any brain plasticity subserving cultural evolution, these changes also function in accordance with the aforementioned nonadaptive construct of “exaptation” (Gould & Vrba, 1982). Concepts have their foundation in memory, and markings address this characteristic insofar as they highlight objects and thereby stand for the mental representation of the environment. In addition, concepts make it possible to invest markings with other information. Mithen (1996) and Wynn and Coolidge (2009) outlined additional requirements for symbols: first, a mental template (a concept) and intentional manipulation of an object to represent this concept; second, intentional communication with others; and third, that they stand for something else (a meaning, a content). These apply to marking behavior without needing anatomical changes such as brain modifications with genetic effects. The intentional modification of an object already requires a mental template: the distinction between an ordinary object and the highlighted version of it, which also includes a mental template of a higher aesthetic value if the marking was only for personal use. The other two requirements address the social structures in which symbols are used. The communicative dimension of symbols does not mean that information will eventually be delivered to others. Symbols are defined as entities that stand for something else. We can decorate or highlight things as symbols for beauty or value or to carry information that we only use personally without ever communicating it to others. But still, with highlighting, there is the possibility that the attention of others will be driven to the marking.
Hence, the attention-guiding effect of markings has a communicative role, although being a symbol does not necessitate use of this role.
As we have argued, spoken language must have already appeared by the time of the earliest use of material symbols, since Homo sapiens was anatomically modern. Because modern humans were already able to use specific forms of symbols (auditory symbols), the use of material symbols can be understood as a cultural change or cultural intensification, rather than as based on genetic changes. As mentioned earlier, very recent findings indicate that marking behavior was also conducted by hominin groups other than Homo sapiens, namely Homo erectus some 530,000 years ago (Joordens et al., 2015) and Neanderthals more than 39,000 years ago (Rodríguez-Vidal et al., 2014), neither of which was assumed by cultural revolution theory to be capable of symbolic behavior. Additionally, these authors claim that accidental manipulation of the objects could be ruled out. This shows that there existed other large-brained hominids who already used markings, which supports the thesis that the early use of symbols should be understood as a cultural advancement rather than a cognitive one. The fact that humans visually respond to markings differently than orangutans (and possibly other primates) suggests that early markings were created to represent certain information, even if directed only to oneself. Thus, our three studies did not prove that cognitive fluidity (i.e., different types of intelligence working fluently together, such as technical, communicative, and conceptual intelligence) was necessary for the early markings. Instead, they showed that markings solicit attention and that only humans responded to them. That is, cognitive fluidity is not a necessary characteristic for symbolic behavior; rather, attention guidance and structural representation are already important characteristics that lead to symbolic behavior.
There is another reason why we should view the invention of visual symbols as the result of cultural transmission rather than of a genetic change. As Tomasello notes, genetic and anatomical changes would have required more time than was available to produce the cognitive skills needed “to invent and maintain complex tool-use industries and technologies, complex forms of symbolic communication and representation, and complex social organizations and institutions” (1999, p. 2). The proposal of a sudden change is even more surprising when we consider that for many millions of years there should not have been anything other than “typical great ape cognitive skills” (Tomasello, 1999, p. 4), and then these suddenly changed into human cognitive skills. Tomasello maintains that the only solution to this problem is “social or cultural transmission, which works on time scales many orders of magnitude faster than those of organic evolution” (1999, p. 4). Seen this way, the invention of visual symbols does not seem as complex as has been assumed. The catalyst, social or cultural transmission, stands in sharp contrast to the mutation assumed by Wynn and Coolidge (2004, 2007). When such a significant development as symbolic behavior intensified about 30,000 years ago, there had to exist prototypes on which to build and which could be further developed. This is why we must consider that the cognitive abilities necessary for creating symbols should have been present before the cultural revolution. Still, cognitive architectures can also transform themselves over time due to cultural learning, and differences in our cognitive architectures can depend on the complexity of the external symbolic storage a culture has produced and builds upon (for the hypothesis that biological selection increased over time due to culture, see Cochran & Harpending, 2009; and for a detailed analysis of the dependence between mind and external symbolic storage, see Donald, 1991). But there must be a difference between the genetic changes that are assumed to have caused a cultural revolution and the steady transformation that takes place when the mind remains in interchange with different forms of material and nonmaterial symbols. The amount of symbolic storage that is produced and used in today’s living cultures varies widely, yet all living humans still belong to the same species, Homo sapiens, for which we would not assume saltational genetic changes of the kind that have been assumed for the Upper Paleolithic period.
Given what our study on markings has shown, namely, that markings are treated differently in visual processing, it is more likely that the development of marking behavior built the foundation for symbolic behavior, because markings represent a prototype of abstract signs. They do this through the pointing character to which humans are receptive, which shows that humans at that time should already have possessed the capacity to understand abstract signs as a hint that their attention should be drawn to something. As we have noted, the orangutans in our study most likely did not perceive the markings as signs or references to a certain kind of peculiarity of the object.
Regarding the question of why material symbols were not used earlier in human history, which Zilhão (2007, p. 72) described as the “sapiens paradox”, Sterelny (2011, p. 813) explains that this can only be seen as a paradox “if it is conjoined with a ‘simple-reflection model’ of the relations between cultures, minds and genes: a model in which cultures reflect the intrinsic capacities of human minds, and these in turn reflect our evolved genetic endowment”. This model can be rejected because it is not inevitable that human cultures should mirror the innate capabilities of the human mind. Humans react based on their material and informational environments (Sterelny, 2011, p. 813), and material symbols, which emerge in environments where they are supported, enhance memory (Clark, 2008). Technological advancements have appeared and disappeared again over the last 300,000 years, and they have become the foundations for later technologies (Conard, 2007; Hiscock & O’Connor, 2006). Hence, the emergence of modern behavior was due to cultural learning, as supported by several facts. First, technological advancements do not develop in a linear fashion; rather, they appear and then disappear again. The genetic change model seems to predict a change in cultural complexity around 60,000 to 50,000 years ago (Sterelny, 2011), but the archaeological data do not support such a single change. Second, tool production changed around 300,000 to 250,000 years ago (McBrearty & Brooks, 2000), and the changes included the use of different materials, such as bone and ivory, which were likely valued because of their aesthetic properties. Humans expanded their range of resources (O’Connell, 2006), and the extension of handicrafts could only have been due to cultural learning.
6.2 Limitations
Inferences about the cognitive abilities of our early human ancestors are always subject to uncertainties since these ancestors are extinct, which makes it impossible to directly test such inferences. However, as we have outlined, there are reasons that make it likely that the cognitive abilities needed for symbol use were already present in early humans, such as the temporal problem described by Tomasello and the cultural transmission solution to this problem. In addition, Sterelny has argued that the cultural development of Homo sapiens should not be seen as a genetic inheritance alone because this would predict a single sudden emergence of cultural artefacts. The archaeological record shows different peaks of cultural inventions, which poses a challenge that must be solved by the cultural revolution perspective.
By analyzing the visual perception found in different living cultures and species, we can only address two dimensions of the cultural-transmission hypotheses described by Tomasello and Sterelny: the attention-driving dimension of intentional communication and the mental representation of the structure of one’s surroundings, which is the symbolic dimension. The scanning paths of the human participants in our studies reflected recognition of these mental representations and the functional use of the markings; in contrast, the orangutans did not indicate such recognition. Cultural and species comparative approaches have the advantage of including the evolutionary developmental status of the compared groups within their specific habitats, and a direct experimental approach to the behavioral dimension of markings provides missing information about how our visual perception and attention are linked to symbolic behavior.
However, there is a need to test more species and analyze their visual attention in a wider context. German city dwellers and Namibian hunter–gatherers differ culturally to a great extent, but other environmental niches that people live in should be included in the comparison to support the conclusions. There are many respects in which people vary in their cultural lives, their ecological niches, and their social backgrounds, and all of these can have different influences. Since all habitats, and with them the testing surroundings, differ to some extent, a general question arises as to whether other species and humans from different cultures are similar enough to be comparable. To compare them fruitfully, one must ultimately assume that, for eye-tracking methods, their eyes, their eye movements, and their basic attentional behavior are similar. To some extent, this is not the case. For example, orangutans have a much shorter attention span than humans. In addition, to reflect the visual adaptation of the orangutans to their natural habitat, it would be necessary to test subjects in the field, which is impossible with eye trackers. Although vigilance in wild species has been quantified using head orientation as a proxy for visual fixation on something important (see, for example, Fernández-Juricic, Beauchamp, Treminio, & Hoover, 2011; Okamoto et al., 2002), it is not possible to use head orientation to accurately study the duration of single fixations and the number of saccades. This means that testing in a species comparison will inevitably carry some limitations that cannot be overcome.
7 Conclusions
While we can assume that the two species under investigation in our three studies share overlapping mental representations in their long-term memory, namely the concepts of many natural categories, it appears that orangutans do not share the basic processes of early symbol use with Homo sapiens. We consider markings, engraving, and coloring of objects to be good candidates for such early abstract symbols. Using the perspective of cognitive archaeology, Duilio Garofoli (2015, p. 7) provides a cautionary review, with a broad view of perceptual symbols: “Perceptual symbols can coexist with amodal representations, so that abstract concepts can be represented by the classic amodal theories (definitions, prototypes, exemplars, theories), while concrete, highly-imageable entities can be represented in the form of perceptual tokens”. Under the assumption that the employed eye-tracking techniques can be reasonably applied to orangutans (even though the animals live in captivity, have different visual skill development than humans, and may also differ in other factors), a comparison of the scanning patterns between cultural and species groups allows us to draw inferences about basic visual organization. The data we collected show that the orangutans did not exhibit the basic processes of visual organization that underlie the construction of a mental representation of signs serving as symbols, in our case, the markings on objects. Despite all of the given limitations, one inference that can be drawn from our three studies is that the orangutans did not use basic visual symbols (i.e., basic abstract representations of the given structure), because their viewing did not mirror construction of the respective mental representation. In contrast, the two human participant groups did not differ in their basic viewing behavior, despite the fact that they had vastly different life-history experiences. In this way, cultural and species comparisons can address fundamental questions about the evolution of human cognition.
Steven Mithen (1998a, p. 181) wrote that the invention of language “provided the means by which one could explore one’s own conceptual spaces, and, by creating a network of minds, the extent of this exploration and transformation was exponentially increased”. We believe that this also applies perfectly to visual symbols, because they made it possible to externalize information by using any available material to build new signifiers for content and to attach value and meaning to them. In this way, humans began to actively structure their surroundings, communicate their own impressions of how the world is structured, and build their identity in relation to it.
Acknowledgments
We thank Merrie Bergmann for her helpful comments on earlier versions of the manuscript.
References
Abramiuk, M. A. (2012). The foundations of cognitive archaeology. Cambridge, MA: MIT Press.
Appenzeller, T. (2012). Human migrations: Eastern odyssey. Nature, 485(7396), 24-26.
Balter, M. (2009b). On the origin of art and symbolism. Science, 323, 709-711.
Boesch, C. (1991). Teaching among wild chimpanzees. Animal Behaviour, 41(3), 530-532.
Buswell, G. T. (1935). How people look at pictures. Chicago, IL: University of Chicago Press.
Cordain, L., Eaton, S. B., Brand Miller, J., Lindeberg, S., & Jensen, C. (2002). An evolutionary analysis of the aetiology and pathogenesis of juvenile-onset myopia. Acta Ophthalmologica Scandinavica, 80(2), 125-135.
Csibra, G., & Southgate, V. (2006). Evidence for infants’ understanding of false beliefs should not be dismissed. Trends in Cognitive Sciences, 10(1), 4-5.
d’Errico, F., Doyon, L., Colagé, I., Queffelec, A., Le Vraux, E., Giacobini, G., . . . Maureille, B. (2018). From number sense to number symbols: An archaeological perspective. Philosophical Transactions of the Royal Society B: Biological Sciences, 373(1740), 20160518.
Duchowski, A. (2007). Eye Tracking Methodology: Theory and Practice (2nd ed.). London: Springer.
Fodor, J. A. (1985). Precis of the modularity of mind. Behavioral and Brain Sciences, 8(1), 1-5.
Gardner, H. (1983). Frames of Mind: The Theory of Multiple Intelligences. New York, NY: Basic Books.
Hebert, P. L., & Bard, K. (2000). Orangutan use of vertical space in an innovative habitat. Zoo Biology, 19(4), 239-251.
Ingman, M., Kaessmann, H., Pääbo, S., & Gyllensten, U. (2000). Mitochondrial genome variation and the origin of modern humans. Nature, 408(6813), 708-713.
Jones, L. A., Sinnott, L. T., Mutti, D. O., Mitchell, G. L., Moeschberger, M. L., & Zadnik, K. (2007). Parental history of myopia, sports and outdoor activities, and future myopia. Investigative Ophthalmology & Visual Science, 48(8), 3524-3532.
Joordens, J. C., d’Errico, F., Wesselingh, F. P., Munro, S., De Vos, J., Wallinga, J., . . . Kuiper, K. F. (2015). Homo erectus at Trinil on Java used shells for tool production and engraving. Nature, 518(7538), 228-231.
Kellogg, R. (1969). Analyzing children’s art. Palo Alto, CA: National Press Books.
Klein, R. G., & Edgar, B. (2002). The Dawn of Human Culture. New York, NY: John Wiley & Sons.
Malafouris, L. (2013). How things shape the mind. Cambridge, MA; London, England: MIT Press.
Maricic, T., Günther, V., Georgiev, O., Gehre, S., Ćurlin, M., Schreiweis, C., . . . Lalueza-Fox, C. (2012). A recent evolutionary change affects a regulatory element in the human FOXP2 gene. Molecular Biology and Evolution, 30(4), 844-852.
Menzel, E. W. (1973). Chimpanzee spatial memory organization. Science, 182(4115), 943-945.
Morris, D. (1962). The Biology of Art. London, England: Methuen.
Mühlenbeck, C., Liebal, K., Pritsch, C., & Jacobsen, T. (2015). Gaze duration biases for colours in combination with dissonant and consonant sounds: A comparative eye-tracking study with orangutans. PLoS ONE, 10(10), e0139894.
Mühlenbeck, C., Liebal, K., Pritsch, C., & Jacobsen, T. (2016). Differences in the visual perception of symmetric patterns in orangutans (Pongo pygmaeus abelii) and two human cultural groups: A comparative eye-tracking study. Frontiers in Psychology, 7.
Partan, S., & Marler, P. (1999). Communication goes multimodal. Science, 283(5406), 1272-1273.
Preucel, R. W. (2008). Archaeological semiotics (Vol. 4). Malden, MA: John Wiley & Sons.
Reichle, E. D., Pollatsek, A., Fisher, D. L., & Rayner, K. (1998). Toward a model of eye movement control in reading. Psychological Review, 105(1), 125-157.
Santos, L. R., Nissen, A. G., & Ferrugia, J. A. (2006). Rhesus monkeys, Macaca mulatta, know what others can and cannot hear. Animal Behaviour, 71(5), 1175-1181.
Templer, V. L., & Hampton, R. R. (2013). Episodic memory in nonhuman animals. Current Biology, 23(17), R801-R806.
Tomasello, M. (2014). A Natural History of Human Thinking. Cambridge, MA: Harvard University Press.
Tomasello, M., & Call, J. (1997). Primate Cognition. New York: Oxford University Press.
Tomasello, M., & Call, J. (2019). Thirty years of great ape gestures. Animal Cognition, 22, 461-469.
[1] “Carrier” is used here in the sense of “bearer” and does not convey the idea of transportation.
[2] The human FOXP2 gene is different from that of other animals and has subsequences that can also be differentiated from those found in Homo neanderthalensis (J. Krause et al., 2007; Maricic et al., 2012).