
# Least effort and the origins of scaling in human language

^{†}Complex Systems Lab, Universitat Pompeu Fabra, Doctor Aiguader 80, 08003 Barcelona, Spain; and

^{§}Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501

^{‡}To whom correspondence should be addressed. E-mail: ramon.ferrer@cexs.upf.es.

## Abstract

The emergence of a complex language is one of the fundamental events of human evolution, and several remarkable features suggest the presence of fundamental principles of organization. These principles seem to be common to all languages. The best known is the so-called Zipf's law, which states that the frequency of a word decays as a (universal) power law of its rank. The possible origins of this law have been controversial, and its meaningfulness is still an open question. In this article, the early hypothesis of Zipf of a principle of least effort for explaining the law is shown to be sound. Simultaneous minimization in the effort of both hearer and speaker is formalized with a simple optimization process operating on a binary matrix of signal–object associations. Zipf's law is found in the transition between referentially useless systems and indexical reference systems. Our finding strongly suggests that Zipf's law is a hallmark of symbolic reference and not a meaningless feature. The implications for the evolution of language are discussed. We explain how language evolution can take advantage of a communicative phase transition.

Beyond their specific differences, all known human languages exhibit two fully developed traits that distinguish them from animal communication systems: syntax (1) and symbolic reference (2). Trying to explain the complexity gap between humans and other species, different authors have adopted different views, from gradual evolution (3) to non-Darwinian positions (4). Arguments are often qualitative in nature and sometimes ad hoc. Only recently have mathematical models explicitly addressed these questions (5, 6).

It seems reasonable to assume that our human ancestors started off with a communication system capable of rudimentary referential signaling, which subsequently evolved into a system with a massive lexicon supported by a recursive system that could combine entries in the lexicon into an infinite variety of meaningful utterances (7). In contrast, nonhuman repertoires of signals are generally small (8, 9). We aim to provide new theoretical insights into the absence of intermediate stages between animal communication and language (9).

Here we adopt the view that the design features of a communication system are the result of interaction between the constraints of the system and demands of the job required (7). More precisely, we will understand the demands of a task such as providing easy-to-decode messages for the receiver. Our system will be constrained by the limitations of a sender trying to code such an easy-to-decode message.

Many authors have pointed out that tradeoffs of utility concerning hearer and speaker needs appear at many levels. At the phonological level, speakers want to minimize articulatory effort and hence encourage brevity and phonological reduction. Hearers want to minimize the effort of understanding and hence desire explicitness and clarity (3, 10). At the lexical level (10, 11), the effort for the hearer has to do with determining what the word actually means. The higher the ambiguity (i.e., the number of meanings) of a word, the higher the effort for the hearer. The speaker, in turn, will tend to choose the most frequent words, because the availability of a word is positively correlated with its frequency. The phenomenon known as the *word-frequency effect* (12) supports this. The most frequent words tend to be the most ambiguous ones (13). Therefore, the speaker tends to choose the most ambiguous words, which is opposed to the least effort for the hearer. Zipf referred to the lexical tradeoff as the *principle of least effort*. He pointed out that it could explain the pattern of word frequencies, but he did not give a rigorous proof of its validity (11). Word frequencies obey Zipf's law: if the words of a sample text are ordered by decreasing frequency, the frequency of the *k*th word, *P*(*k*), is given by *P*(*k*) ∝ *k*^{−α}, with α ≈ 1 (11). This pattern is robust and widespread (14).
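As an illustration of the rank-frequency statement of Zipf's law, the following minimal sketch ranks token frequencies and estimates α from the slope of log *P*(*k*) versus log *k*. The function names `rank_frequency` and `zipf_exponent` are hypothetical, and ordinary least squares on log-log data is only a rough estimator chosen here for brevity; it is not a method taken from the article.

```python
import math
from collections import Counter

def rank_frequency(tokens):
    """Relative frequencies P(k), sorted so that rank k = 1 comes first."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return sorted((c / total for c in counts.values()), reverse=True)

def zipf_exponent(freqs):
    """Estimate alpha in P(k) ~ k**(-alpha) by least squares on log-log data.

    Assumption: a plain linear fit; it is a rough diagnostic, not a
    rigorous power-law test.
    """
    xs = [math.log(k) for k in range(1, len(freqs) + 1)]
    ys = [math.log(p) for p in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx

# Synthetic frequencies built to follow P(k) proportional to 1/k.
ideal = [1.0 / k for k in range(1, 101)]
norm = sum(ideal)
ideal = [p / norm for p in ideal]
alpha = zipf_exponent(ideal)  # close to 1 for this synthetic input
```

On real text the fitted value depends on the sample size and on which ranks are included in the fit, so it should be read as a rough indicator only.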

Here we show that such a lexical compromise can be made explicit in a simple form of language game where minimization of speaker and hearer needs is introduced in an explicit fashion. As a consequence of this process and once a given threshold is reached, Zipf's law, a hallmark of human language, emerges spontaneously.

## The Model

To define explicitly the compromise between speaker and hearer needs, a cost function must be introduced. Given the nature of our systems, information theory provides the adequate mathematical framework (15). We consider a system involving a set of *n* signals *S* = {*s*_{1},…, *s*_{i},…, *s*_{n}} and a set of *m* objects of reference *R* = {*r*_{1},…, *r*_{j},…, *r*_{m}}. The interactions between signals and objects of reference (hereafter objects) can be modeled with a binary matrix **A** = {*a*_{ij}}, where 1 ≤ *i* ≤ *n* and 1 ≤ *j* ≤ *m*. If *a*_{ij} = 1, then the *i*th signal refers to the *j*th object, and *a*_{ij} = 0 otherwise. We define *p*(*s*_{i}) and *p*(*r*_{j}) as the probability of *s*_{i} and *r*_{j}, respectively. If synonymy were forbidden, we would have

$$p(s_i) = \sum_{j} a_{ij}\, p(r_j) \tag{1}$$

because signals are used for referring to objects. We assume *p*(*r*_{j}) = 1/*m* in what follows. If synonymy is allowed, the frequency of an object has to be distributed among all its signals. The frequency of a signal, *p*(*s*_{i}), is defined as

$$p(s_i) = \sum_{j} p(s_i, r_j). \tag{2}$$

According to the Bayes theorem we have

$$p(s_i, r_j) = p(r_j)\, p(s_i | r_j). \tag{3}$$

*p*(*s*_{i}|*r*_{j}) is defined as

$$p(s_i | r_j) = \frac{a_{ij}}{\omega_j}, \tag{4}$$

where ω_{j} = Σ_{i} *a*_{ij} is the number of synonyms of *r*_{j}. Substituting Eq. 4 into Eq. 3 we get

$$p(s_i, r_j) = \frac{a_{ij}\, p(r_j)}{\omega_j}. \tag{5}$$

The effort for the speaker will be defined in terms of the diversity of signals, here measured by means of the signal entropy, i.e.

$$H_n(S) = -\sum_{i=1}^{n} p(s_i) \log_n p(s_i). \tag{6}$$

If a single word is used for all objects, the effort is minimal and *H*_{n}(*S*) = 0. When all signals have the smallest (nonzero) possible frequency, then the frequency effect is in the worst case for all signals. Consistently, *H*_{n}(*S*) = 1.

The effort for the hearer when *s*_{i} is heard is defined as

$$H_m(R | s_i) = -\sum_{j=1}^{m} p(r_j | s_i) \log_m p(r_j | s_i), \tag{7}$$

where *p*(*r*_{j}|*s*_{i}) = *p*(*r*_{j}, *s*_{i})/*p*(*s*_{i}) (by the Bayes theorem). The effort for the hearer is defined as the average noise for the hearer, that is

$$H_m(R | S) = \sum_{i=1}^{n} p(s_i)\, H_m(R | s_i). \tag{8}$$

An energy function combining the effort for the hearer and the effort for the speaker is defined as

$$\Omega(\lambda) = \lambda H_m(R | S) + (1 - \lambda) H_n(S), \tag{9}$$

where 0 ≤ λ, *H*_{n}(*S*), *H*_{m}(*R*|*S*) ≤ 1. The cost function depends on a single parameter λ, which weights the contribution of each term.
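The quantities defined in this section can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' code: it assumes the cost has the form Ω(λ) = λ *H*_{m}(*R*|*S*) + (1 − λ) *H*_{n}(*S*), which is the reading consistent with the text (small λ favours the speaker, large λ the hearer), uniform object probabilities *p*(*r*_{j}) = 1/*m*, and *n*, *m* ≥ 2 so that the logarithm bases are valid. All function names are hypothetical.

```python
import math

def joint_probabilities(A):
    """p(s_i, r_j) = a_ij * p(r_j) / omega_j, with p(r_j) = 1/m."""
    n, m = len(A), len(A[0])
    # omega_j: number of signals (synonyms) attached to object j.
    omega = [sum(A[i][j] for i in range(n)) for j in range(m)]
    if min(omega) == 0:
        raise ValueError("every object needs at least one signal")
    return [[A[i][j] / (m * omega[j]) for j in range(m)] for i in range(n)]

def entropy(probs, base):
    """Shannon entropy in the given base; zero-probability terms are skipped."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

def speaker_effort(A):
    """Signal entropy H_n(S), normalized by using logarithms in base n."""
    joint = joint_probabilities(A)
    return entropy([sum(row) for row in joint], len(A))

def hearer_effort(A):
    """Average conditional entropy H_m(R|S), using logarithms in base m."""
    m = len(A[0])
    total = 0.0
    for row in joint_probabilities(A):
        p_s = sum(row)
        if p_s > 0:  # unused signals contribute nothing
            total += p_s * entropy([p / p_s for p in row], m)
    return total

def cost(A, lam):
    """Assumed cost: lam * H_m(R|S) + (1 - lam) * H_n(S)."""
    return lam * hearer_effort(A) + (1 - lam) * speaker_effort(A)

# Two extreme configurations for n = m = 3.
one_signal = [[1, 1, 1], [0, 0, 0], [0, 0, 0]]  # one signal names every object
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]    # one-to-one map
```

For `one_signal` the speaker effort is (numerically close to) 0 and the hearer effort close to 1; for `identity` the two values are swapped, which shows the conflict between the two terms of the cost.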

## Methods

Ω(λ) is minimized with the following algorithm, summarized in Fig. 1. At each step, the signal–object graph defined by **A** is modified by randomly changing the state of some pairs of vertices (that is, by flipping some entries of **A**), and the new **A** matrix is accepted if the cost is lowered. Configurations in which an object has no signals assigned are forbidden [in that case, Ω(λ) = ∞]. The algorithm stops when the modifications on **A** are not accepted *T* = 2*nm* times in a row.

Fig. 1. … **A** (here *n* = *m* = 3), the algorithm performs a change in a small number of bits (specifically, with probability ν, each *a*_{ij} can flip). …

If Zipf's hypothesis were valid, a Zipfian distribution of signal frequencies should appear for λ ≈ 1/2, where the efforts for the speaker and the hearer have a similar contribution to the cost function. Notice that Ω(1/2) = *H*_{n•m}(*S*, *R*)/2.
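The minimization procedure described in this section can be sketched as follows. This is an illustrative reading, not the authors' implementation: the cost is assumed to be Ω(λ) = λ *H*_{m}(*R*|*S*) + (1 − λ) *H*_{n}(*S*) with *p*(*r*_{j}) = 1/*m*, the random initial matrix and the flip probability `nu` used in the example are assumptions, and the names `cost` and `minimize` are hypothetical.

```python
import math
import random

def cost(A, lam):
    """Assumed cost lam * H_m(R|S) + (1 - lam) * H_n(S); inf if an object is unnamed."""
    n, m = len(A), len(A[0])
    omega = [sum(A[i][j] for i in range(n)) for j in range(m)]
    if min(omega) == 0:
        return math.inf  # forbidden configuration
    h_speaker = 0.0
    h_hearer = 0.0
    for i in range(n):
        joint = [A[i][j] / (m * omega[j]) for j in range(m)]  # p(s_i, r_j)
        p_s = sum(joint)
        if p_s == 0:
            continue  # unused signal
        h_speaker -= p_s * math.log(p_s, n)
        h_hearer -= sum(p * math.log(p / p_s, m) for p in joint if p > 0)
    return lam * h_hearer + (1 - lam) * h_speaker

def minimize(n, m, lam, nu, rng):
    """Flip each a_ij with probability nu; keep the change only if the cost drops.

    Stops after T = 2*n*m consecutive rejections, as stated in the text.
    """
    # Assumption: random initial matrix, repaired so every object has a signal.
    A = [[rng.randint(0, 1) for _ in range(m)] for _ in range(n)]
    for j in range(m):
        if not any(A[i][j] for i in range(n)):
            A[rng.randrange(n)][j] = 1
    best = cost(A, lam)
    rejected = 0
    limit = 2 * n * m
    while rejected < limit:
        B = [[1 - a if rng.random() < nu else a for a in row] for row in A]
        candidate = cost(B, lam)
        if candidate < best:
            A, best, rejected = B, candidate, 0
        else:
            rejected += 1
    return A, best

# Small example; nu = 2/(n*m) is an illustrative choice, not a value from the text.
rng = random.Random(0)
A_final, omega_final = minimize(4, 4, 0.9, 2 / 16, rng)
```

Because only strictly lower costs are accepted and the number of configurations is finite, the loop always terminates, although it may stop in a local minimum.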

## Results

Two key quantities have been analyzed for different values of λ: the mutual information,

$$I_n(S, R) = H_n(S) - H_n(S | R), \tag{10}$$

which measures the accuracy of the communication, and the (effective) lexicon size, *L*, defined as

$$L = \frac{|\{i \mid \mu_i > 0\}|}{n}, \tag{11}$$

where μ_{i} = Σ_{j} *a*_{ij} is the number of objects of *s*_{i}.
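Both quantities can be computed directly from **A**. The sketch below is a hedged illustration: it assumes *p*(*r*_{j}) = 1/*m*, evaluates the mutual information in its standard joint-distribution form with logarithms in base *n*, and takes *L* to be the fraction of signals linked to at least one object, which is the reading consistent with *L* ≈ 1/*n* for single-signal systems. The function names are hypothetical.

```python
import math

def mutual_information(A):
    """I_n(S, R) = sum_ij p(s_i, r_j) * log_n[ p(s_i, r_j) / (p(s_i) p(r_j)) ]."""
    n, m = len(A), len(A[0])
    omega = [sum(A[i][j] for i in range(n)) for j in range(m)]
    if min(omega) == 0:
        raise ValueError("every object needs at least one signal")
    joint = [[A[i][j] / (m * omega[j]) for j in range(m)] for i in range(n)]
    p_s = [sum(row) for row in joint]
    p_r = 1.0 / m  # assumed uniform object probabilities
    return sum(
        p * math.log(p / (p_s[i] * p_r), n)
        for i, row in enumerate(joint)
        for p in row
        if p > 0
    )

def lexicon_size(A):
    """Effective lexicon size: fraction of signals with mu_i > 0."""
    return sum(1 for row in A if sum(row) > 0) / len(A)

one_signal = [[1, 1, 1], [0, 0, 0], [0, 0, 0]]  # no-communication extreme
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]    # one-to-one map
```

For `one_signal` the mutual information is 0 and the lexicon size is 1/3; for `identity` both are (up to rounding) 1, matching the two phases described in the text.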

Three domains can be distinguished in the behavior of *I*_{n}(*S*, *R*) versus λ, as shown in Fig. 2*A*. First, *I*_{n}(*S*, *R*) grows smoothly for λ < λ* ≈ 0.41. *I*_{n}(*S*, *R*) explodes abruptly for λ = λ* ≈ 0.41. An abrupt change in *L* versus λ (Fig. 2*B*) is also found for λ = λ*. Single-signal systems (*L* ≈ 1/*n*) dominate for λ < λ*. Because every object has at least one signal, one signal stands for all the objects. The value of *I*_{n}(*S*, *R*) indicates that the system is unable to convey information in this domain. Rich vocabularies (*L* ≈ 1) are found for λ > λ*. Full vocabularies are attained beyond λ ≈ 0.72. The maximal value of *I*_{n}(*S*, *R*) indicates that the associations between signals and objects are one-to-one maps.

Fig. 2. (*A*) *I*_{n}(*S*, *R*), the average mutual information as a function of λ. λ* = 0.41 divides *I*_{n}(*S*, *R*) into no-communication and perfect-communication phases. (*B*) Average (effective) lexicon size, *L*, …

As for the signal frequency distribution in every domain, very few signals have nonzero frequency for λ < λ* (Fig. 3*A*), scaling consistent with Zipf's law appears for λ = λ* (Fig. 3*B*), and an almost uniform distribution is obtained for λ > λ* (Fig. 3*C*). As occurs with other complex systems (16), the presence of a phase transition is associated with the emergence of power laws (17).

Fig. 3. … *P*(*k*), versus rank, *k*, for λ = 0.3 (*A*), λ = λ* = 0.41 (*B*), and λ = 0.5 (*C*) (averages over 30 replicas: *n* = *m* = 150 and *T* = 2*nm*). The dotted lines show the distribution that would be obtained …

Knowing that *I*_{n}(*S*, *R*) = *I*_{n}(*R*, *S*) and using Eq. 10, minimizing Eq. 9 is equivalent to minimizing

$$\Omega(\lambda) = -\lambda I_n(S, R) + (1 - \lambda) H_n(S). \tag{12}$$

Other functions could be proposed. Interestingly, the symmetric version of Eq. 9 with conditional entropies in both terms of the right side,

$$\Omega(\lambda) = \lambda H_m(R | S) + (1 - \lambda) H_n(S | R), \tag{13}$$

will help us to understand the origins of the sharp transition. The global minimum of *H*_{n}(*S*) (one signal for all objects) is a maximum of *H*_{m}(*R*|*S*), and the global minimum of *H*_{m}(*R*|*S*) (signal–object one-to-one maps with *n* = *m*) is a maximum of *H*_{n}(*S*) in Eq. 9. Thus both terms of Eq. 9 are in conflict. In contrast, the global minimum of *H*_{n}(*S*|*R*) is a subset of the global minimum of *H*_{m}(*R*|*S*) in Eq. 13. Consistently, numerical optimization of Eq. 13 shows no evidence of scaling. Not surprisingly, the minimization of Eq. 13 is equivalent to the minimization of

$$\Omega(\lambda) = -I_n(S, R) + (1 - \lambda) H_n(S). \tag{14}$$

Notice that λ is present in only one of the terms of the right side of the previous equation. Zipf's hypothesis was based on a tension between unification and diversification forces (11) that Eq. 13 does not accomplish. Eq. 9 does.
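The equivalences invoked in this section follow from the standard identity relating mutual information and conditional entropy. The derivation sketch below rests on stated assumptions rather than on the article's own wording: *n* = *m*, so that all entropies share one logarithm base and the subscripts can be dropped, and *p*(*r*_{j}) = 1/*m*, so that *H*(*R*) = 1 is a constant. The labels Ω_{9} and Ω_{13} are introduced here only to distinguish the cost of Eq. 9 from its symmetric version in Eq. 13.

```latex
% Assumptions: n = m (single logarithm base), p(r_j) = 1/m so that H(R) = 1.
\begin{align*}
I(S,R) &= H(S) - H(S|R) = H(R) - H(R|S) = I(R,S), \\[4pt]
\Omega_{9}(\lambda) &= \lambda H(R|S) + (1-\lambda) H(S) \\
  &= \lambda H(R) - \lambda I(S,R) + (1-\lambda) H(S), \\[4pt]
\Omega_{13}(\lambda) &= \lambda H(R|S) + (1-\lambda) H(S|R) \\
  &= \lambda \bigl[ H(R) - I(S,R) \bigr] + (1-\lambda) \bigl[ H(S) - I(S,R) \bigr] \\
  &= \lambda H(R) + (1-\lambda) H(S) - I(S,R).
\end{align*}
```

For fixed λ the term λ*H*(*R*) does not depend on **A**, so dropping it leaves the minimizer unchanged. In the first case the mutual information keeps its weight λ and competes with the speaker term; in the symmetric case it enters with weight 1, so λ appears only in front of *H*(*S*).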

## Discussion

Theoretical models support the emergence of complex language as the result of overcoming error limits (5) or thresholds in the number of objects of reference that can be handled (6). Despite their power, these models make little use of some well known quantitative regularities displayed by most human languages such as Zipf's law (11, 18). Most authors, however, make use of Zipf's law as a null hypothesis with no particular significance (6). As far as we know, there is no compelling explanation for Zipf's law, although many have been proposed (19–23). Random texts (random combinations of letters and blanks) reproduce Zipf's law (19, 24–26) and are generally regarded as a null hypothesis (18). Although random texts and real texts differ in many aspects (26, 27), the possibility that Zipf's law results from a simple process (not necessarily a random text) has not been soundly denied. Our results show that Zipf's law is the outcome of the nontrivial arrangement of word–concept associations adopted for complying with hearer and speaker needs. Sudden changes in Fig. 2 and the presence of scaling (Fig. 3*B*) strongly suggest that a phase transition is taking place at λ = λ* (17).

Maximal mutual information (that is, one-to-one signal–object maps) beyond the transition is the general outcome of artificial-life language models (28, 29) and the case of animal communication (2), where small repertoires of signals are found (8, 9). On the one hand, speaker constraints (λ < λ*) are likely to cause species with a powerful articulatory system (providing them with a big potential vocabulary) to have a referentially useless communication system (8). On the other hand (λ > λ*), least effort for the hearer forces a species to have a different signal for each object, at the expense of maximum effort for the speaker, which allows us to make the following predictions. First, nonhuman repertoires must be small to cope with maximum speaker costs. Consistently, their size is on the order of 20–30 signals for the larger repertoires (8). Second, the large lexicons used by humans cannot be one-to-one maps because of the word-frequency effect (12), which makes evident that lexical access-retrieval cost is at play in humans. Third, large lexicons with one-to-one maps can be obtained only under idealized conditions when effort for the speaker is neglected. This is the case of artificial-language communication models, which reach maximal values of *I*_{n}(*S*, *R*), making use of fast memory access and the (theoretically) unlimited memory storage of computers (28, 29).

λ > λ* implies not taking into account the effort of the speaker. Getting the right word for a specific object may become unaffordable beyond a certain vocabulary size. Furthermore, a one-to-one map implies that the number of signals has to grow accordingly as the number of objects to describe increases (when *m* → ∞) and leads to a referential catastrophe. A referential catastrophe is supported by the statistics of human–computer interactions, where the largest vocabularies follow Zipf's law (30) and are associated with a higher degree of expertise of the computer user. As the repertoire of potential signals is exhausted, strategies based on the combination of simple units are encouraged. Such a catastrophe could have motivated word formation from elementary syllables or phonemes but also syntax through word combinatorics. In a different context, some authors have shown that natural selection favors word formation or syntax when the number of required signals exceeds a threshold value (6). We show that arranging signals according to Zipf's law is the optimal solution for maximizing the referential power under constraints on the effort for the speaker. Moreover, almost the best *I*_{n}(*S*, *R*) is achieved before being forced to use one-to-one signal–object maps (Fig. 2). Although other researchers have shown how overcoming phase transitions could have been the origin of the emergence of syntax (5), our results suggest that early human communication could have benefited from remaining in a referential phase transition. There, communication is optimal with regard to the tradeoff between speaker and hearer needs. An evolutionary prospect is that the number of objects to describe can grow, keeping the size of the lexicon relatively small at the transition.

Having determined the only three optimal configurations resulting from tuning speaker and hearer requirements, the path toward human language can be traced hypothetically: (*i*) a transition from a no-communication phase (λ < λ*) to a perfect-communication phase providing some kind of rudimentary referential signaling (λ > λ*); (*ii*) a transition from a communication phase to the edge of the transition (λ = λ*), where vocabularies can grow affordably (in terms of the speaker's effort) when *m* → ∞. The latter step is motivated by the positive correlation between brain size and cognitive skills in primates (where *m* can be seen as a simple measure of them) (31). Humans may have had a pressure for economical signaling systems (given by large values of *m*) that other species did not have. The above-mentioned emergence of Zipf's law in the usage of computer commands (the only evidence known of evolution toward Zipf's law, although the context is not human–human interactions) is associated with larger repertoires (30), suggesting that there is a minimum vocabulary size and a minimum number of objects encouraging Zipf's law arrangements.

The relationship between both is straightforward if the hearer imposes its needs, because the number of signals must be exactly the number of objects (when *n* = *m*) in that case. Our results predict that no natural intermediate communication system can be found between small-sized lexica and rich lexica unless Zipf's law is used (Fig. 2*B*). This might explain why human language is unique with regard to other species but not only so. One-to-one maps between signals and objects are the distinguishing feature of index reference (2). Symbolic communication is a higher-level reference in which reference results basically from interactions between signals (2). Zipf's law appears on the edge of the indexical communication phase and implies polysemy. The latter is the necessary (but not sufficient) condition for symbolic reference (2). Our results strongly suggest that Zipf's law is required by symbolic systems.

## Acknowledgments

We thank P. Fernández, R. Köhler, P. Niyogi, and M. Nowak for helpful comments. This work was supported by the Institució Catalana de Recerca i Estudis Avançats, the Grup de Recerca en Informàtica Biomèdica, the Santa Fe Institute (to R.V.S.), Generalitat de Catalunya Grant FI/2000-00393 (to R.F.i.C.), and Ministerio de Ciencia y Tecnología Grant BFM 2001-2154 (to R.V.S.).

## Footnotes

This paper was submitted directly (Track II) to the PNAS office.

## References

*Human Behaviour and the Principle of Least Effort: An Introduction to Human Ecology* (Hafner, New York), 1st ed., pp. 19–55.

**National Academy of Sciences**


Least effort and the origins of scaling in human language. Proceedings of the National Academy of Sciences of the United States of America. Feb 4, 2003; 100(3): 788.
