Proc Natl Acad Sci U S A. 1999 July 6; 96(14): 8028–8033.
PMCID: PMC22182
Evolution
The evolution of language
Martin A. Nowak* and David C. Krakauer
Institute for Advanced Study, Princeton, NJ 08540
*To whom reprint requests should be addressed. e-mail: nowak@ias.edu.
Communicated by Robert May, University of Oxford, Oxford, United Kingdom
Received March 12, 1999; Accepted May 7, 1999.
The emergence of language was a defining moment in the evolution of modern humans. It was an innovation that changed radically the character of human society. Here, we provide an approach to language evolution based on evolutionary game theory. We explore the ways in which protolanguages can evolve in a nonlinguistic society and how specific signals can become associated with specific objects. We assume that early in the evolution of language, errors in signaling and perception would be common. We model the probability of misunderstanding a signal and show that this limits the number of objects that can be described by a protolanguage. This “error limit” is not overcome by employing more sounds but by combining a small set of more easily distinguishable sounds into words. The process of “word formation” enables a language to encode an essentially unlimited number of objects. Next, we analyze how words can be combined into sentences and specify the conditions for the evolution of very simple grammatical rules. We argue that grammar originated as a simplified rule system that evolved by natural selection to reduce mistakes in communication. Our theory provides a systematic approach for thinking about the origin and evolution of human language.
Language remains in the minds of many philosophers, linguists, and biologists a quintessentially human trait (1–3). Attempts to shed light on the evolution of human language have come from many areas including studies of primate social behavior (4–6), the diversity of existing human languages (7, 8), the development of language in children (9–11), and the genetic and anatomical correlates of language competence (12–16), as well as theoretical studies of cultural evolution (17–21) and of learning and lexicon formation (22). Studies of bees, birds, and mammals have shown that complex communication can evolve without the need for a human grammar or for large vocabularies of symbols (23, 24). All human languages are thought to possess the same general structure and permit an almost limitless production of information for communication (25). This limitlessness has been described as “making infinite use of finite means” (45). The lack of obvious formal similarities between human language and animal communication has led some to propose that human language is not a product of evolution but a side-effect of a large and complex brain evolved for nonlinguistic purposes (1, 26). Others suggest that language represents a mix of organic and cultural factors and, as such, can only be understood fully by investigating its cultural history (16, 27). One problem in the study of language evolution has been the tendency to identify contemporary features of human language and suggest scenarios in which these would be selectively advantageous. This approach ignores the fact that if language has evolved, it must have done so from a relatively simple precursor (28, 29). We are therefore required to provide an explanation that proposes an advantage for a very simple language in a population that is prelinguistic (30–32). This work can be seen as part of a recent program to understand language evolution based on mathematical and computational modeling (33–37).
The Evolution of Signal–Object Associations.
We assume that language evolved as a means of communicating information between individuals. In the basic “evolutionary language game,” we imagine a group of individuals (early hominids) that can produce a variety of sounds. Information shall be transferred about a number of “objects.” Suppose there are m sounds and n objects. The matrix P contains the entries $p_{ij}$, denoting the probability that for a speaker object i is associated with sound j. The matrix Q contains the entries $q_{ji}$, which denote the probability that for a listener sound j is associated with object i. P is called “active matrix,” whereas Q is called “passive matrix.” A similar formalism was used by Hurford (22).
Imagine two individuals, A and B, that use slightly different languages L (given by P and Q) and L′ (given by P′ and Q′). For individual A, $p_{ij}$ denotes the probability of making sound j when seeing object i, whereas $q_{ji}$ denotes the probability of inferring object i when hearing sound j. For individual B, these probabilities are given by $p'_{ij}$ and $q'_{ji}$. If A sees object i and signals, B will infer object i with probability $\sum_{j=1}^{m} p_{ij} q'_{ji}$. A measure of A’s ability to convey information to B is given by summing this probability over all n objects. The overall payoff for communication between A and B is taken as the average of A’s ability to convey information to B, and B’s ability to convey information to A. Thus,
$$F(L, L') = \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{m} \left( p_{ij} q'_{ji} + p'_{ij} q_{ji} \right) \tag{1}$$
In this equation, both L and L′ are treated once as listener and once as speaker, leading to the intrinsic symmetry of the language game: F(L,L′) = F(L′,L). Language L obtains from L′ the same payoff as L′ obtains from L. If two individuals use the same language, L, the payoff is $F(L, L) = \sum_{i=1}^{n} \sum_{j=1}^{m} p_{ij} q_{ji}$.
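As a minimal sketch of the payoff described above (the average of A's ability to inform B and B's ability to inform A), the sum can be evaluated directly from the two pairs of matrices. The function name `payoff` and the list-of-lists representation are assumptions made for illustration, not part of the original model description.

```python
def payoff(P, Q, P2, Q2):
    """Payoff F(L, L') for languages L = (P, Q) and L' = (P2, Q2).

    P and P2 are n x m active matrices (object -> sound);
    Q and Q2 are m x n passive matrices (sound -> object).
    """
    n, m = len(P), len(P[0])
    # Probability mass that B correctly infers the objects A signals.
    a_to_b = sum(P[i][j] * Q2[j][i] for i in range(n) for j in range(m))
    # Probability mass that A correctly infers the objects B signals.
    b_to_a = sum(P2[i][j] * Q[j][i] for i in range(n) for j in range(m))
    return 0.5 * (a_to_b + b_to_a)


# Two objects, two sounds: a one-to-one language and the same language
# with the two sounds swapped.
identity = [[1.0, 0.0], [0.0, 1.0]]
swapped = [[0.0, 1.0], [1.0, 0.0]]

same_language = payoff(identity, identity, identity, identity)
mismatched = payoff(identity, identity, swapped, swapped)
```

With these toy matrices, two speakers of the one-to-one language understand each other on both objects, whereas the speaker of the swapped language is never understood, so the payoff depends on shared conventions rather than on either language in isolation.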
Hence, we assume that both speaker and listener receive a reward for mutual understanding. If for example only the listener receives a benefit, then the evolution of language requires cooperation.
In each round of the game, every individual communicates with every other individual, and the accumulated payoffs are summed up. The total payoff for each player represents the ability of this player to communicate information with other individuals of the community. Following the central assumption of evolutionary game theory (38), the payoff from the game is interpreted as fitness: individuals with a higher payoff have a higher survival chance and leave more offspring who learn the language of their parents by sampling their responses to individual objects.
Fig. 1 shows a computer simulation of a group of 100 individuals. Initially, all individuals have different random entries in both active and passive matrices. After some rounds, specific sounds begin to associate with specific objects. Eventually each object is exactly associated with one signal. The simulation shows how a protolanguage can emerge in an originally prelinguistic society.
Figure 1. Emergence of a protolanguage in an initially prelinguistic society. The population consists of 100 individuals. Each of them starts with a randomly chosen P and Q matrix. There are five objects and five signals (sounds). In one round of the game, …
For m = n, the evolutionary optimum is reached if each object is associated with one specific sound and vice versa. Evolution does not always lead to the optimum solution, but certain suboptimum solutions, in which the same signal is used for two (or more) objects, can be evolutionarily stable.
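The game described in this section can be sketched as a small simulation. This is an illustration under stated assumptions, not the authors' program: the learning rule (each child draws `k` samples of its parent's response to every object and derives both P and Q from the resulting count matrix), the population size, and all function names are hypothetical choices made here.

```python
import random


def normalize(row):
    total = sum(row)
    return [v / total for v in row]


def random_matrix(rows, cols, rng):
    return [normalize([rng.random() for _ in range(cols)]) for _ in range(rows)]


def payoff(a, b):
    """Average of a informing b and b informing a; a and b are (P, Q) pairs."""
    (pa, qa), (pb, qb) = a, b
    n, m = len(pa), len(pa[0])
    total = sum(pa[i][j] * qb[j][i] + pb[i][j] * qa[j][i]
                for i in range(n) for j in range(m))
    return 0.5 * total


def learn(parent, k, rng):
    """Assumed learning rule: sample k parental responses per object."""
    p_parent, _ = parent
    n, m = len(p_parent), len(p_parent[0])
    counts = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in rng.choices(range(m), weights=p_parent[i], k=k):
            counts[i][j] += 1
    p_child = [normalize(row) for row in counts]
    q_child = []
    for j in range(m):
        column = [counts[i][j] for i in range(n)]
        # Sounds never heard by the child are interpreted uniformly at random.
        q_child.append(normalize(column) if sum(column) > 0 else [1.0 / n] * n)
    return p_child, q_child


def next_generation(pop, k, rng):
    # Every individual talks to every other individual; payoffs are summed.
    fitness = [sum(payoff(a, b) for b in pop if b is not a) for a in pop]
    # Offspring are produced in proportion to payoff (tiny offset avoids all-zero weights).
    parents = rng.choices(pop, weights=[f + 1e-9 for f in fitness], k=len(pop))
    return [learn(parent, k, rng) for parent in parents]


def mean_self_payoff(pop):
    return sum(payoff(a, a) for a in pop) / len(pop)


def simulate(size=30, n=5, m=5, k=4, generations=30, seed=0):
    rng = random.Random(seed)
    pop = [(random_matrix(n, m, rng), random_matrix(m, n, rng)) for _ in range(size)]
    history = [mean_self_payoff(pop)]
    for _ in range(generations):
        pop = next_generation(pop, k, rng)
        history.append(mean_self_payoff(pop))
    return history
```

Calling `simulate()` returns the mean payoff F(L,L) per generation. Under these assumptions one would expect the value to rise from about 1 toward, but not necessarily reach, the optimum of 5, since suboptimal languages that reuse a signal for two objects can also persist.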
A Linguistic Error Limit.
Below, we discuss two essential extensions of the basic model. First, we include the possibility of errors in perception: early in the evolution of communication, signals are likely to have been noisy and can therefore be mistaken for each other (39). We denote the probability of interpreting sound i as sound j by $u_{ij}$. The payoff for L communicating with L′ is now given by
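$$F(L, L') = \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{m} \sum_{k=1}^{m} \left( p_{ij} u_{jk} q'_{ki} + p'_{ij} u_{jk} q_{ki} \right) \tag{2}$$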
The probabilities, $u_{ij}$, can be expressed in terms of similarities between sounds. We denote the similarity between sounds i and j by $s_{ij}$. We obtain $u_{ij} = s_{ij} / \sum_{k=1}^{m} s_{ik}$. As a simple example, we assume the similarity between two different sounds is constant and given by $s_{ij} = \varepsilon$, whereas $s_{ii} = 1$. In this case, the probability of correct understanding is $u_{ii} = 1/[1 + (m-1)\varepsilon]$. The maximum payoff for a language with m sounds (when communicating with another individual who is using the same language) is given by $F(m) = \sum_{i=1}^{m} u_{ii}$, and therefore $F(m) = m/[1 + (m-1)\varepsilon]$. The fitness, F, is an increasing function of m converging to a maximum value of $1/\varepsilon$ for large values of m. Without error, we would have $F(m) = m$. Thus, in the presence of error, the maximum capacity of information transfer is limited and equivalent to what could be achieved by $1/\varepsilon$ sounds without error.
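The saturation of the payoff can be checked numerically. The sketch below builds the confusion probabilities from a similarity matrix by row normalization, as in the text, and sums the diagonal; the function names and the chosen similarity value of 0.1 are illustrative assumptions.

```python
def confusion_matrix(similarity):
    """Row-normalize a similarity matrix: u_ij = s_ij / sum_k s_ik."""
    return [[s / sum(row) for s in row] for row in similarity]


def max_payoff(m, eps):
    """Maximum payoff for m sounds with constant pairwise similarity eps."""
    similarity = [[1.0 if i == j else eps for j in range(m)] for i in range(m)]
    u = confusion_matrix(similarity)
    return sum(u[i][i] for i in range(m))


# The payoff grows with m but stays below the bound 1 / eps.
eps = 0.1
payoffs = {m: max_payoff(m, eps) for m in (1, 5, 20, 100, 1000)}
```

For this similarity value the bound is 10: adding sounds keeps increasing the payoff, but with rapidly diminishing returns.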
Next, we assume that objects can have different values, $a_i$. (For example when a leopard represents a higher risk than a python, the word “leopard” may be more valuable than “python.”) We have $F(m) = [1 + (m-1)\varepsilon]^{-1} \sum_{i=1}^{m} a_i$, where the objects are ranked according to their value, $a_1 > a_2 > \ldots$. This fitness function can adopt a maximum value for a certain number m and decline if the value of m becomes too big. In this case, natural selection will limit the number of sounds used in the language and consequently also limit the number of objects described. Fig. 2 shows a computer simulation of this extended evolutionary language game. The final outcome is a language that uses only a subset of all available sounds to describe the most valuable objects.
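A short numerical example shows how the value-weighted payoff can peak at an intermediate number of sounds. The object values and the similarity 0.3 below are made-up numbers chosen only for illustration, and the function names are hypothetical.

```python
def value_weighted_payoff(values, m, eps):
    """Payoff when the m most valuable objects each get their own sound."""
    ranked = sorted(values, reverse=True)
    return sum(ranked[:m]) / (1 + (m - 1) * eps)


def best_repertoire_size(values, eps):
    """Number of sounds m that maximizes the value-weighted payoff."""
    sizes = range(1, len(values) + 1)
    return max(sizes, key=lambda m: value_weighted_payoff(values, m, eps))


# Hypothetical object values, for example leopard = 1.0 down to a minor object = 0.1.
values = [1.0, 0.8, 0.6, 0.4, 0.2, 0.1]
optimum = best_repertoire_size(values, eps=0.3)
```

With these numbers the payoff is highest when only the three most valuable objects are named; naming the fourth adds less value than it costs in extra confusion.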
Figure 2. Evolution of protolanguage in the context of misunderstanding. There are 20 objects and 40 sounds, but evolution leads to a language that uses only 9 sounds to describe 11 objects. Sounds are represented on a linear spectrum by numbers between 0 and …
The principal result of the extended model, including misunderstanding, is that of a “linguistic error limit”: the number of distinguishable sounds in a protolanguage, and therefore the number of objects that can be accurately described by this language, is limited. Adding new sounds increases the number of objects that can be described but at the cost of an increased probability of making mistakes; the overall ability to transfer information does not improve. This obstacle in the evolution of language has interesting parallels with the error-threshold concept of molecular evolution (40). The origin of life has been described as a passage from limited to unlimited hereditary replicators, whereas the origin of language as a transition from limited to unlimited semantic representation (41).
Word Formation.
The way to overcome the error limit is by combining sounds into words. Words are strings of sounds. As before, we define the fitness of a language as the total amount of successful information transfer. The maximum fitness is obtained by summing over all probabilities of correct understanding of words. For a language with m sounds (phonemes) and a word-length l, the maximum payoff is given by $F(m, l) = m^l / [1 + (m-1)\varepsilon]^l$, which converges to $1/\varepsilon^l$ for large values of m, thus allowing a much greater potential for communication. This equation assumes that understanding of a word is based on the correct understanding of each individual sound.
More realistically, we may assume that correct understanding of a word is based (to some extent) on matching the perceived string of phonemes to known words of the language. Consider a language with N words, $w_i$, which are strings of phonemes: $w_i = (x_{i1}, x_{i2}, \ldots, x_{il})$. For m different phonemes there are $m^l$ possible words. A particular language will contain a subset of these words, $N \le m^l$. We define the similarity between two words as the product of the similarities between individual phonemes in corresponding positions. The similarity between word $w_i$ and $w_j$ is $S_{ij} = \prod_{k=1}^{l} s_{ij}^{(k)}$, where $s_{ij}^{(k)}$ denotes the similarity between the k-th phonemes of words $w_i$ and $w_j$. The probability of correctly understanding word $w_i$ is $P_i = 1/\sum_{j=1}^{m^l} S_{ij}\sigma_j$, where $\sigma_j = 1$ if word $w_j$ is part of the language, and $\sigma_j = \sigma$ if word $w_j$ is not part of the language. The parameter σ is a number between 0 and 1 and specifies the degree to which word recognition is based on correct understanding of every phoneme versus understanding of the whole word. If σ = 0, then each word is only compared with every other word that is a part of the language; correct understanding of a word consists in comparing the perceived word with all other words that are part of the lexicon. An implicit assumption here is that individuals have perfect knowledge of the whole lexicon. If σ = 1, then every word is compared with every other possible word that can be formed by combining the phonemes. Correct understanding of a word requires a correct identification of each individual phoneme. The listener does not need to have a list of the lexicon. A value of σ between 0 and 1 blends these two possibilities. In this case, recognition of a word is to some extent based on identification of each individual phoneme and to some extent on identification of the word selected from the list of all words that are contained in the language. The maximum payoff for such a language is given by $F = \sum_{i=1}^{N} P_i$ (Fig. 3).
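The word-recognition probabilities can be computed by brute force for small alphabets. The sketch below assumes, as in the earlier simple example, that any two different phonemes have the same similarity; phonemes are represented as integers and words as tuples, and all names are hypothetical.

```python
from itertools import product


def word_similarity(w1, w2, eps):
    """Product of phoneme similarities at corresponding positions."""
    s = 1.0
    for a, b in zip(w1, w2):
        s *= 1.0 if a == b else eps
    return s


def lexicon_payoff(lexicon, m, length, eps, sigma):
    """Sum of the probabilities of correctly understanding each lexicon word.

    Words outside the lexicon are weighted by sigma (0 <= sigma <= 1).
    """
    lex = set(lexicon)
    all_words = list(product(range(m), repeat=length))
    total = 0.0
    for w in lex:
        denom = sum(word_similarity(w, v, eps) * (1.0 if v in lex else sigma)
                    for v in all_words)
        total += 1.0 / denom
    return total


# Three phonemes, words of length two, and a lexicon of two dissimilar words.
lexicon = [(0, 0), (1, 1)]
phoneme_based = lexicon_payoff(lexicon, m=3, length=2, eps=0.2, sigma=1.0)
lexicon_based = lexicon_payoff(lexicon, m=3, length=2, eps=0.2, sigma=0.0)
```

In this toy case the lexicon-based listener (σ = 0) does better than the phoneme-based listener (σ = 1), because the two words differ in both positions and are rarely confused with each other. When the lexicon contains every possible word, the result reduces to the closed form for F(m, l) regardless of σ.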
Figure 3. Word formation can overcome the error limit. Suppose there are n = 100 objects with values $a_i$ uniformly distributed between 0 and 1. (a) Without word formation, each object is described by one sound. The similarity between different sounds is $\varepsilon$. …
Combining sounds into words leads to an essentially unlimited potential for different words. This step in language evolution can be seen as a transition from an analogue to a digital system. The repertoire is not increased by adding more sounds, but by combining a set of easily distinguishable sounds into words. In all existing human languages, only a small subset of the sounds producible by the vocal apparatus are employed to generate a large number of words. These words are then used to construct an unlimited number of sentences. The crucial difference between word and sentence formation is that the first consists essentially of memorizing all (relevant) words of a language, whereas the second is based on grammatical rules. We do not memorize a list of all possible sentences.
The Evolution of Basic Grammatical Rules.
The next step in language evolution is the emergence of a basic syntax or grammar. Recall that by combining sounds into words, the protolanguage achieves an almost limitless potential for generating words with the power of describing a large number of objects or actions. Grammar emerges in the attempt to convey more information by combining these words into phrases or sentences. Simply naming an object will be less valuable than naming it and describing its action. (A leopard can be stalking, in which case it is a serious risk, or merely sleeping and thereby posing a lesser risk.) There is an obvious advantage to describing both objects and actions. Suppose there are n objects and h actions; there are nh possible combinations, but only a fraction, $\varphi$, of them may be relevant (for example: leopard runs; monkey runs; but not banana runs). A “nongrammatical” approach would be to conceive $N = \varphi n h$ different words for all combinations. A “grammatical” approach would be to have n words for objects (i.e., nouns) and h words for actions (i.e., verbs). Let us compare the fitness of grammar and nongrammar.
Again, we will include errors, this time as a probability of mistaking words, which can include acoustic misunderstanding and/or incorrect assignment of meaning. The maximum fitness of a nongrammatical language with N different words is $F_{ng} = N/[1 + (N-1)\xi]$. The maximum fitness for a grammatical language is $F_g = N/\{[1 + (n-1)\xi][1 + (h-1)\xi]\}$. Here, $\xi$ is the similarity between words. In the nongrammatical language, each event is described by one word, and correct communication requires that this word is distinguished from N − 1 other words. The grammatical language uses two words for every event: we can say that nouns describe objects and verbs describe actions. Each noun has to be distinguished from n − 1 other nouns, and each verb from h − 1 other verbs. Whether grammar wins in the evolutionary language game depends on the number of combinations of nouns and verbs that describe relevant events. $F_g > F_{ng}$ leads to
$$\xi < \xi_{\max} = \frac{N + 1 - n - h}{(n-1)(h-1)} \tag{3}$$
If there is no possibility of mistakes ($\xi = 0$), then there is no difference between grammar and nongrammar. If there are too many mistakes ($\xi > \xi_{\max}$), then grammar is disadvantageous. Between these two limits, there is a “grammar zone” ($0 < \xi < \xi_{\max}$), where grammar has a higher fitness than nongrammar.
From Eq. 3, it follows that a necessary condition for grammar to win is
$$N \ge n + h \tag{4}$$
The number of events must exceed (or equal) the sum of nouns and verbs that can be constructed to describe these events. In other words, a grammatical system is favored only if the number of relevant sentences (that individuals want to communicate to each other) exceeds the number of words that make up these sentences. Note that the main difference between grammar and nongrammar is not to use one or two words for each event, but the number (and types) of rules that need to be remembered for correct communication. Grammar can be seen as a simplified rule system that reduces the chances of mistakes in implementation and comprehension and is therefore favored by natural selection in a world where mistakes are possible.
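The comparison between the two fitness expressions can be made concrete with a few lines of code. Rearranging the inequality "grammatical fitness exceeds nongrammatical fitness" gives an upper bound on the word similarity of (N + 1 − n − h)/[(n − 1)(h − 1)], which is what `xi_max` computes below. The function names and the example numbers (5 nouns, 4 verbs, 12 relevant events) are illustrative assumptions.

```python
def fitness_nongrammar(N, xi):
    """Maximum fitness with one word per event; N words, similarity xi."""
    return N / (1 + (N - 1) * xi)


def fitness_grammar(N, n, h, xi):
    """Maximum fitness with n nouns and h verbs describing N events."""
    return N / ((1 + (n - 1) * xi) * (1 + (h - 1) * xi))


def xi_max(N, n, h):
    """Upper edge of the grammar zone, from solving F_g > F_ng for xi."""
    return (N + 1 - n - h) / ((n - 1) * (h - 1))


# 5 nouns, 4 verbs, and 12 of the 20 noun-verb combinations are relevant.
n, h, N = 5, 4, 12
threshold = xi_max(N, n, h)
inside_zone = fitness_grammar(N, n, h, 0.2) > fitness_nongrammar(N, 0.2)
outside_zone = fitness_grammar(N, n, h, 0.5) > fitness_nongrammar(N, 0.5)
```

Here the threshold is 1/3, so grammar wins at a similarity of 0.2 and loses at 0.5. If only 8 events were relevant, fewer than n + h = 9, the threshold would drop to zero and no grammar zone would exist.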
Thus far, we have specified only those conditions conducive for grammar to have a higher fitness than nongrammar. We can also formulate a model describing how grammar can evolve gradually by natural selection (see Fig. 4 and Appendix).
Figure 4. Grammar can evolve by natural selection. (a) Imagine a simple protolanguage describing two objects, $O_1$ and $O_2$, by two words, $W_1$ and $W_2$. Suppose each object can occur with two actions, $A_1$ and $A_2$. Thus, there are four events, $O_1A_1$, $O_2A_1$, $O_1A_2$ …
The model can be extended in many ways. For example, events can consist of one action and several objects. Objects may be associated with properties, giving rise to adjectives. Events can have similar associations, giving rise to adverbs. The essential result is that a grammatical language that has words for each component of an event receives a higher payoff in the evolutionary language game than a nongrammatical language that has words (or a string of words) for the whole event. In this context, the grammar of human languages evolved to reflect the “grammar of the real world” (that is, the underlying logic of how objects relate to actions and other objects).
Conclusions.
In this paper, we have outlined simple mathematical models that provide new insights into how natural selection can guide three fundamental, early steps in the evolution of human language.
The question concerning why only humans evolved language is hard to answer. Interestingly, however, our models do not suggest that a protolanguage will evolve under all circumstances but outline several obstacles impeding its emergence. (i) In the simplest model (Fig. 1), signal–object associations form only when information transfer is beneficial to both speaker and listener. Otherwise, the evolution of communication requires cooperation between individuals. Thus, cooperation may represent an important prerequisite for the evolution of language. (ii) In the presence of errors, only a very limited communication system describing a small number of objects can evolve by natural selection (Fig. 2). We believe that this error limit is where most animal communication came to a stop. The obvious means to overcome this limit would be to use a larger variety of sounds, but this approach leads into a cul-de-sac. A completely different approach is to restrict the system to a subset of all possible sounds and to combine them into “words” (Fig. 3). (iii) Finally, although grammar can be an advantage for small systems (Fig. 4), it may become necessary only if the language refers to many events. Thus, the need for grammar arises only if communication about many different events is required: a language must have more relevant sentences than words. It is likely that for most animal communication systems, the inequality (4) is not fulfilled.
We view this paper as a contribution toward formalizing the laws that governed the evolution of the primordial human languages. There are, of course, many important and more complex properties of human language that we have not considered here and that should ultimately be part of an evolutionary theory of language. We argue, however, that any such theory has to address the basic questions of signal–object association, word formation, and the emergence of a simple syntax or grammar, for these are the atomic units that make up the edifice of human language.
Acknowledgments
Thanks to Dominic Welsh, Sebastian Bonhoeffer, Lindi Wahl, Nick Grassly, and Robert May for stimulating discussion. Support from The Alfred P. Sloan Foundation, The Florence Gould Foundation, The Ambrose Monell Foundation, and the J. Seward Johnson Trust is gratefully acknowledged.
Appendix
Consider two objects, $O_1$ and $O_2$, that can co-occur with two actions, $A_1$ and $A_2$. Thus, there are four events, $O_1A_1$, $O_2A_1$, $O_1A_2$, and $O_2A_2$. The nongrammatical approach is to describe each event with a separate word, $W_1$–$W_4$. The grammatical approach is to have separate words for objects, $N_1$ and $N_2$, and actions, $V_1$ and $V_2$. Consider mixed strategies that use the grammatical system with probability x. The active matrix, P, is given by
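$$P = \begin{pmatrix}
1-x & 0 & 0 & 0 & x & 0 & 0 & 0 \\
0 & 1-x & 0 & 0 & 0 & x & 0 & 0 \\
0 & 0 & 1-x & 0 & 0 & 0 & x & 0 \\
0 & 0 & 0 & 1-x & 0 & 0 & 0 & x
\end{pmatrix}$$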
The rows correspond to the four events: $O_1A_1$, $O_2A_1$, $O_1A_2$, and $O_2A_2$. The columns correspond to the eight signals: $W_1$, $W_2$, $W_3$, $W_4$, $N_1V_1$, $N_2V_1$, $N_1V_2$, and $N_2V_2$. The pure strategies, x = 0 and x = 1, describe nongrammar and grammar, respectively. The passive matrix, Q, is obtained by replacing all nonzero entries in P by 1 (and transposing this matrix). Note that mixed strategies, 0 < x < 1, have eight nonzero entries, whereas pure strategies have only four nonzero entries in both P and Q. Thus, mixed strategies have the possibility to understand both grammar and nongrammar, whereas the two pure strategies do not understand each other. Finally, we include the possibility of errors, either in implementation or comprehension. The error matrix is given by
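$$U = \begin{pmatrix}
\eta_1 & \xi\eta_1 & \xi\eta_1 & \xi\eta_1 & 0 & 0 & 0 & 0 \\
\xi\eta_1 & \eta_1 & \xi\eta_1 & \xi\eta_1 & 0 & 0 & 0 & 0 \\
\xi\eta_1 & \xi\eta_1 & \eta_1 & \xi\eta_1 & 0 & 0 & 0 & 0 \\
\xi\eta_1 & \xi\eta_1 & \xi\eta_1 & \eta_1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & \eta_2 & \xi\eta_2 & \xi\eta_2 & \xi^2\eta_2 \\
0 & 0 & 0 & 0 & \xi\eta_2 & \eta_2 & \xi^2\eta_2 & \xi\eta_2 \\
0 & 0 & 0 & 0 & \xi\eta_2 & \xi^2\eta_2 & \eta_2 & \xi\eta_2 \\
0 & 0 & 0 & 0 & \xi^2\eta_2 & \xi\eta_2 & \xi\eta_2 & \eta_2
\end{pmatrix}$$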
Here, $\xi$ is the similarity between words or the fraction of times a word is mistaken or misimplemented for another. We used $\eta_1 = 1/(1 + 3\xi)$ and $\eta_2 = 1/(1 + \xi)^2$. We assume that the nongrammatical one-word sentences are not confused with the grammatical two-word sentences. The error matrix specifies the crucial difference between grammar and nongrammar.
The system can be completely understood in analytic terms. The payoff for language x communicating with language y is given (with Eq. 2) by: $F(x, y) = (2 - x - y) f_1 + (x + y) f_2$, where $f_1 = 4/(1 + 3\xi)$ and $f_2 = 4/(1 + \xi)^2$. These equations hold for x and y strictly between 0 and 1. Otherwise, we have $F(x, 0) = F(0, x) = (2 - x) f_1$ and $F(x, 1) = F(1, x) = (1 + x) f_2$. The payoffs for nongrammar and grammar are, respectively, $F(0, 0) = 2 f_1$ and $F(1, 1) = 2 f_2$. Because $f_1 < f_2$ and $f_2 < 2 f_1$, we have the following interesting dynamics: both x = 0 and x = 1 are evolutionarily stable strategies that cannot invade any other strategy, but every mixed strategy, x, is invaded and replaced by every other strategy, y, if x < y < 1. Thus, the adaptive dynamics flow toward grammar. Alternatively, one can also assume that the pure strategies can understand each other, that is, the passive matrices of all strategies are the same; in this case, grammar (x = 1) is the only evolutionarily stable strategy and can beat every other strategy.
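The closed-form payoffs can be cross-checked by building the matrices explicitly. The sketch below rests on two assumptions that should be read as such: the error matrix is block-structured, with similarity ξ between any two one-word signals and ξ raised to the number of differing words between two-word sentences (normalized by the two η factors given above), and the payoff is taken as the sum of the two speaking directions, because that is the normalization under which F(0,0) = 2f₁ and F(1,1) = 2f₂ come out as stated. All function names are hypothetical.

```python
def active(x):
    """4 events x 8 signals: W1..W4, then N1V1, N2V1, N1V2, N2V2."""
    return [[(1 - x) if j == i else (x if j == i + 4 else 0.0)
             for j in range(8)] for i in range(4)]


def passive(x):
    """Replace nonzero entries of the active matrix by 1 and transpose."""
    p = active(x)
    return [[1.0 if p[i][j] > 0 else 0.0 for i in range(4)] for j in range(8)]


def error_matrix(xi):
    eta1 = 1 / (1 + 3 * xi)
    eta2 = 1 / (1 + xi) ** 2
    u = [[0.0] * 8 for _ in range(8)]
    for a in range(4):
        for b in range(4):
            # One-word signals: any two different words have similarity xi.
            u[a][b] = eta1 * (1.0 if a == b else xi)
            # Two-word sentences: bit 0 encodes the noun, bit 1 the verb.
            differing_words = bin(a ^ b).count("1")
            u[4 + a][4 + b] = eta2 * xi ** differing_words
    return u


def payoff(x, y, xi):
    """Sum over both directions of the probability of being understood."""
    u = error_matrix(xi)

    def one_way(p, q):
        return sum(p[i][j] * u[j][k] * q[k][i]
                   for i in range(4) for j in range(8) for k in range(8))

    return one_way(active(x), passive(y)) + one_way(active(y), passive(x))


def closed_form(x, y, xi):
    """Closed form for mixed strategies, 0 < x, y < 1."""
    f1 = 4 / (1 + 3 * xi)
    f2 = 4 / (1 + xi) ** 2
    return (2 - x - y) * f1 + (x + y) * f2
```

Under these assumptions, `payoff(0.3, 0.7, 0.2)` agrees with `closed_form(0.3, 0.7, 0.2)`, and `payoff(y, x, xi) > payoff(x, x, xi)` whenever x < y < 1, which is the invasion condition behind the flow toward grammar.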
1. Chomsky N. Language and Mind. New York: Harcourt Brace Jovanovich; 1972.
2. Pinker S. The Language Instinct. New York: Morrow; 1994.
3. Eco U. The Search for the Perfect Language. London: Fontana; 1995.
4. Seyfarth R, Cheney D, Marler P. Science. 1980;210:801–803.
5. Burling R. Curr Anthropol. 1989;34:25–53.
6. Cheney D, Seyfarth R. How Monkeys See the World. Chicago: Univ. of Chicago Press; 1990.
7. Greenberg J H. Language, Culture and Communication. CA: Stanford Univ. Press; 1971.
8. Cavalli-Sforza L L, Cavalli-Sforza F. The Great Human Diasporas. Reading, MA: Addison–Wesley; 1995.
9. Newport E. Cogn Sci. 1990;14:11–28.
10. Bates E. Curr Opin Neurobiol. 1992;2:180–185.
11. Hurford J R. Cognition. 1991;40:159–201.
12. Lieberman P. The Biology and Evolution of Language. Cambridge, MA: Harvard Univ. Press; 1984.
13. Nobre A, Allison T, McCarthy G. Nature (London). 1994;372:260–263.
14. Aboitiz F, Garcia R. Brain Res Rev. 1997;25:381–396.
15. Hutsler J J, Gazzaniga M S. Neuroscientist. 1997;3:61–72.
16. Deacon T. The Symbolic Species. London: Penguin; 1997.
17. Cavalli-Sforza L L, Feldman M W. Cultural Transmission and Evolution: A Quantitative Approach. Princeton: Princeton Univ. Press; 1981.
18. Yasuda N, Cavalli-Sforza L L, Skolnick M, Moroni A. Theor Popul Biol. 1974;5:123–142.
19. Aoki K, Feldman M W. Proc Natl Acad Sci USA. 1987;84:7164–7168.
20. Aoki K, Feldman M W. Theor Popul Biol. 1989;35:181–194.
21. Cavalli-Sforza L L. Proc Natl Acad Sci USA. 1997;94:7719–7724.
22. Hurford J R. Lingua. 1989;77:187–222.
23. Von Frisch K. The Dance Language and Orientation of Bees. Cambridge, MA: Harvard Univ. Press; 1967.
24. Hauser M D. The Evolution of Communication. Cambridge, MA: Harvard Univ. Press; 1996.
25. Chomsky N. Rules and Representations. New York: Columbia Univ. Press; 1980.
26. Bickerton D. Language and Species. Chicago: Univ. of Chicago Press; 1990.
27. de Saussure F. Cours de Linguistique Generale. Paris: Payot; 1916.
28. Pinker S, Bloom P. In: The Adapted Mind: Evolutionary Psychology and the Generation of Culture. Barkow J, Cosmides L, Tooby J, editors. London: Oxford Univ. Press; 1992. pp. 451–493.
29. Dunbar R. Grooming, Gossip and the Evolution of Language. Cambridge, MA: Harvard Univ. Press; 1997.
30. MacLennan B. In: Artificial Life II: SFI Studies in the Sciences of Complexity. Langton C G, Taylor C D F, Rasmussen S, editors. Redwood City, CA: Addison–Wesley; 1992. pp. 631–658.
31. Hutchins E, Hazelhurst B. How to Invent a Lexicon: The Development of Shared Symbols in Interaction. London: UCL; 1995.
32. Akmajian A, Demers R A, Farmer A K, Harnish R M. Linguistics: An Introduction to Language and Communication. Cambridge, MA: MIT Press; 1997.
33. Hurford J R, Studdert-Kennedy M, Knight C. Approaches to the Evolution of Language. Cambridge, U.K.: Cambridge Univ. Press; 1998.
34. Parisi D. Brain Cogn. 1997;34:160–184.
35. Steels L. Evol Commun J. 1997;1(1):1–34.
36. Oliphant M. BioSystems. 1996;37:31–38.
37. Maynard Smith J, Szathmary E. The Major Transitions in Evolution. New York: Freeman; 1995.
38. Maynard Smith J. Evolution and the Theory of Games. Cambridge, U.K.: Cambridge Univ. Press; 1982.
39. Smith W J. The Behavior of Communicating. Cambridge, MA: Harvard Univ. Press; 1977.
40. Eigen M, Schuster P. The Hypercycle: A Principle of Natural Self-Organisation. Berlin: Springer; 1979.
41. Szathmary E, Maynard Smith J. Nature (London). 1995;374:227–232.
42. Nowak M A, Sigmund K. Acta Appl Math. 1990;20:247–265.
43. Metz J A J, Geritz S A H, Meszena F G, Jacobs F J A, van Heerwaarden J S. In: Stochastic and Spatial Structures of Dynamical Systems. Van Strien S J, Verduyn Lunel S M, editors. Amsterdam: North Holland; 1996. pp. 183–231.
44. Hofbauer J, Sigmund K. Evolutionary Games and Replicator Dynamics. Cambridge, U.K.: Cambridge Univ. Press; 1998.
45. von Humboldt W. Ueber die Verschiedenheit des Menschlichen Sprachbaus. Bonn: Dummlers; 1836.
