Quantifying the evolutionary dynamics of language

¹Program for Evolutionary Dynamics, Department of Organismic and Evolutionary Biology, Department of Mathematics, Harvard University, Cambridge, MA 02138, USA
²Department of Applied Mathematics, Harvard University, Cambridge, MA 02138, USA
³Harvard-MIT Division of Health Sciences and Technology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
⁴Department of Systems Biology, Harvard Medical School, Boston, MA 02115, USA

*These authors contributed equally to this work.

Reprints and permissions information is available at npg.nature.com/reprintsandpermissions. The authors declare no competing financial interests. Correspondence and requests for materials should be addressed to M. A. N. (Email: martin_nowak/at/harvard.edu)

Abstract

Human language is based on grammatical rules¹⁻⁴. Cultural evolution allows these rules to change over time⁵. Rules compete with each other: as new rules rise to prominence, old ones die away. To quantify the dynamics of language evolution, we studied the regularization of English verbs over the last 1200 years. Although an elaborate system of productive conjugations existed in English’s proto-Germanic ancestor, modern English uses the dental suffix, -ed, to signify past tense⁶. Here, we describe the emergence of this linguistic rule amidst the evolutionary decay of its exceptions, known to us as irregular verbs. We have generated a dataset of verbs whose conjugations have been evolving for over a millennium, tracking inflectional changes to 177 Old English irregulars. Of these irregulars, 145 remained irregular in Middle English and 98 are still irregular today. We study how the rate of regularization depends on the frequency of word usage. The half-life of an irregular verb scales as the square root of its usage frequency: a verb that is 100 times less frequent regularizes 10 times as fast.
Our study provides a quantitative analysis of the regularization process by which ancestral forms gradually yield to an emerging linguistic rule.

Natural languages comprise elaborate systems of rules which enable one speaker to communicate with another⁷. These rules serve to simplify the production of language and enable an infinite array of comprehensible formulations⁸⁻¹⁰. Yet each rule has exceptions, and even the rules themselves wax and wane over centuries and millennia¹¹,¹².

Verbs which obey standard rules of conjugation in their native language are called regular verbs¹³. In the modern English language, regular verbs are conjugated into the simple past and past participial forms by appending the dental suffix -ed to the root (for instance, talk/talked/talked). Irregular verbs obey antiquated rules (sing/sang/sung) or in some cases, no rule at all (go/went)¹⁴,¹⁵. New verbs entering English universally obey the regular conjugation (google/googled/googled), and many irregular verbs eventually regularize. Regular verbs become irregular much more rarely: for every sneak that snuck in¹⁶, there are many more flews that flied out.

Although less than 3% of modern verbs are irregular, the ten most common verbs are all irregular (be, have, do, go, say, can, will, see, take, get). The irregular verbs are heavily biased towards high frequencies of occurrence¹⁷,¹⁸. Linguists have suggested an evolutionary hypothesis underlying the frequency distribution of irregular verbs: uncommon irregular verbs tend to disappear more rapidly because they are less readily learned, and more rapidly forgotten¹⁹,²⁰.

In order to study this phenomenon quantitatively, we studied verb inflection beginning with Old English (the language of Beowulf, spoken circa 800 CE), continuing through Middle English (the language of Chaucer’s Canterbury Tales, spoken circa 1200 CE), and ending with Modern English, the language as it is spoken today.
The modern -ed rule descends from Old English ‘weak’ conjugation, which applied to 3/4 of all Old English verbs²¹. The exceptions - ancestors of the modern irregulars - were mostly members of the so-called ‘strong’ verbs. There are 7 different classes of strong verbs with exemplars among the modern English irregulars, each with distinguishing markers that often include characteristic vowel shifts. Though stable coexistence of multiple rules is one possible outcome of rule dynamics, this is not what occurred in English verb inflection²². We therefore define regularity with respect to the modern -ed rule, and call all these exceptional forms ‘irregular’.

We consulted a large collection of grammar textbooks describing verb inflection in these earlier epochs, and hand annotated every irregular verb they described. (See Supplementary Information.) This provided us with a list of irregular verbs from ancestral forms of English. Eliminating verbs which were no longer part of Modern English, we compiled a list of 177 Old English irregular verbs which remain part of the language to this day. Of these 177 Old English irregulars, 145 remained irregular in Middle English, and 98 are still irregular in Modern English. Verbs such as help, grip, and laugh, which were once irregular, have become regular with the passing of time.

Next we obtained frequency data for all verbs by using the CELEX corpus, which contains 17.9 million words from a wide variety of textual sources²³. For each of our 177 verbs we calculated the frequency of occurrence among all verbs. We subdivided the frequency spectrum into six logarithmically spaced bins from 10⁻⁶ to 1.

[Figure 1a]
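The binning step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the decision to clamp out-of-range values into the end bins, and the sample frequencies are all hypothetical and are not taken from the study's own code or data.

```python
import math
from collections import Counter

LO_EXP, HI_EXP = -6, 0          # bins span 10**-6 .. 10**0, one bin per decade
N_BINS = HI_EXP - LO_EXP        # six bins in total


def log_bin_index(freq):
    """Return the decade-bin index (0..5) for a relative frequency in (0, 1]."""
    if not 0 < freq <= 1:
        raise ValueError("frequency must lie in (0, 1]")
    idx = math.floor(math.log10(freq)) - LO_EXP
    # Assumption: values below 10**-6, or exactly 1, are clamped into the end bins.
    return min(max(idx, 0), N_BINS - 1)


def bin_counts(freqs):
    """Count how many frequencies fall into each of the six bins."""
    counts = Counter(log_bin_index(f) for f in freqs)
    return [counts.get(i, 0) for i in range(N_BINS)]


# Hypothetical relative frequencies, for illustration only.
sample = [4.2e-6, 5e-4, 3e-4, 0.05]
print(bin_counts(sample))
```

Bin 0 covers 10⁻⁶ to 10⁻⁵ and bin 5 covers 10⁻¹ to 1, so each index corresponds to one decade of usage frequency.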
Plotting the number of irregular verbs against their frequency generates a unimodal distribution with a peak between 10⁻⁴ and 10⁻³. This unimodal distribution again demonstrates that irregular verbs are not an arbitrary subset of all verbs, because a random subset of verbs (such as all verbs that contain the letter ‘m’) would follow a power law distribution with a slope of minus three-fourths²⁴,²⁵.

Four of our six frequency bins, those between 10⁻⁶ and 10⁻², allow us to estimate the relative regularization rates of irregular verbs. Calculating the relative regularization rates of verbs of different frequencies is independent of time, which makes the dating of Old and Middle English irrelevant for this calculation. We can plot regularization rate versus frequency and fit a straight line in a log-log plot (Figure 1b).

[Figure 2a]
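A straight-line fit in a log-log plot amounts to ordinary least squares on the logarithms of both variables. The sketch below shows that procedure on synthetic rates generated from an assumed power law with exponent −0.5, chosen to match the square-root scaling stated in the abstract. The synthetic values and the function name `loglog_slope` are assumptions for illustration; they are not the measured regularization rates.

```python
import math


def loglog_slope(freqs, rates):
    """Least-squares slope and intercept of log10(rate) against log10(freq)."""
    xs = [math.log10(f) for f in freqs]
    ys = [math.log10(r) for r in rates]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Geometric centres of the four usable bins, 10**-6 .. 10**-2.
bin_centres = [10 ** (e + 0.5) for e in (-6, -5, -4, -3)]

# Synthetic rates from an assumed law rate = freq**-0.5 (not real data).
synthetic_rates = [f ** -0.5 for f in bin_centres]

slope, intercept = loglog_slope(bin_centres, synthetic_rates)
print(f"fitted slope: {slope:.3f}")
```

With real per-bin rates in place of the synthetic ones, the fitted slope is the exponent relating regularization rate to usage frequency.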
We cannot directly determine the regularization rate for frequency bins above 10⁻², because regularization is so slow that no event was observed in the time span of our data. But we can extrapolate. For instance, the half-life of verbs with frequencies between 10⁻² and 10⁻¹ should be 14,400 years. For these bins, the population is so small and the half-life so long that we may not see a regularization event in the lifetime of the English language.

To test whether the dynamics within individual competing rules were captured by our global analysis, we studied the decay of individual classes of strong verbs (e.g., hit/hit/hit, hurt/hurt/hurt; draw/drew/drawn, grow/grew/grown)²⁶. Although our resolution is limited by the small sample size, exponential decay is once again observed, with similar exponents. (See Supplementary Figure S1.) Like a Cheshire cat, dying rules vanish one instance at a time, leaving behind a unimodal frown.

Because adequate corpora of Old and Middle English do not exist, we have estimated the frequency of an irregular verb of Old and Middle English by the frequency of the corresponding (regular or irregular) verb of Modern English.²⁷ A large fraction of verbs would have had to change frequency by several orders of magnitude in order to interfere with the effects observed. To verify that large changes in frequency are rare, we compared frequency data from CELEX with frequencies drawn from the largest available corpus of Middle English texts²⁸. Out of fifty verbs, only five had frequency changes greater than a factor of 10. (See Supplementary Figure S2.)

Our analysis covers a vast period, spanning the Norman invasion and the invention of the printing press, but these events did not upset the dynamics of English regularization. Therefore, it is possible to retrospectively trace the evolution of the irregular verbs, moving backwards in time from the observed Modern distribution and up through Middle and Old English.
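The extrapolation can be illustrated with a short sketch that takes the 14,400-year half-life quoted for the 10⁻² to 10⁻¹ bin as a reference and rescales it for other bins. It assumes an exact square-root law and uses the geometric centre of each bin; both are simplifications introduced here, so the outputs are illustrative and should not be read as the fitted values from the study.

```python
REF_BIN = (-2, -1)               # reference bin: 10**-2 .. 10**-1 (from the text)
REF_HALF_LIFE_YEARS = 14_400     # half-life quoted for the reference bin


def half_life_for_bin(lo_exp, hi_exp, exponent=0.5):
    """Scale the reference half-life by (frequency ratio) ** exponent.

    Frequencies are compared at the geometric centres of the bins, so the
    ratio is 10 ** (centre - ref_centre). exponent=0.5 is the assumed
    square-root law.
    """
    ref_centre = (REF_BIN[0] + REF_BIN[1]) / 2
    centre = (lo_exp + hi_exp) / 2
    return REF_HALF_LIFE_YEARS * 10 ** (exponent * (centre - ref_centre))


def fraction_still_irregular(years, half_life):
    """Expected surviving fraction after `years` under exponential decay."""
    return 0.5 ** (years / half_life)


for lo in range(-6, 0):
    hl = half_life_for_bin(lo, lo + 1)
    print(f"10^{lo} .. 10^{lo + 1}: half-life about {hl:,.0f} years")
```

Under this assumption a bin two decades lower in frequency has a half-life ten times shorter, which is the "100 times less frequent, 10 times as fast" relation stated in the abstract.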
Going still further back in time allows us to explore the effects of completely undoing the frequency-dependent selective process which the irregular verbs have undergone. Eventually, the shape of the curve changes from unimodal to a power law decline with slope nearly −3/4 (Figure 3).
We can also make predictions about the future of the past tense. By the time one verb from the set {begin, break, bring, buy, choose, draw, drink, drive, eat, fall} regularizes, five verbs from the set {bid, dive, heave, shear, shed, slay, slit, sow, sting, stink} will have regularized. If the current trends continue, only 83 of the 177 verbs studied will be irregular in 2500.

What will be the next irregular verb to regularize? Most likely it will be wed/wed/wed. Wed’s frequency is only 4.2 uses per million verbs, ranking at the very bottom of the modern irregulars. Indeed, it is already being replaced in many contexts by wed/wedded/wedded. Now is your last chance to be a newly-wed. The married couples of the future can only hope for wedded bliss.

In prior millennia, many rules vied for control of English language conjugation, and fossils of those rules remain to this day. Yet from this primordial soup of conjugations, the dental suffix -ed emerged triumphant. The competing rules are long dead, and unfamiliar even to well-educated native speakers. These rules disappeared because of the gradual erosion of their instances by a process we, from a privileged vantage, call regularization. But regularity is not the default state of a language. A rule is the tombstone of a thousand exceptions.

Methods Summary

We searched 11 reference works on Old and Middle English, compiling a list of every irregular verb which we found. We determined whether each verb was still present in Modern English. For all those Old English verbs whose descendants remained in the English language, we checked whether they were still irregular using a complete listing of the Modern irregular verbs. If they had regularized, we determined when regularization had occurred based on the last time period in which we found a positive annotation listing the verb as irregular. A list of sources used, and the entire resulting annotation, are provided in the Supplementary Information.
We determined usage frequencies for all the verbs using the CELEX database. We then binned the Old English irregular verbs using a standard logarithmic binning algorithm in Python. We used the resulting binning to determine regularization rates for verbs of differing frequencies. Regularization rates (Figure 1b) …
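One plausible way to turn survival counts into a regularization rate is to assume exponential decay, N(t) = N₀·exp(−kt), and solve for k. The sketch below applies that idea to the aggregate counts given in the text (177 Old English irregulars, 98 still irregular today) over an assumed interval of 1200 years. The per-bin counts are not reproduced in this text, so the aggregate figure only illustrates the arithmetic; the function names are hypothetical.

```python
import math


def decay_rate(n_start, n_end, years):
    """Per-year rate k, assuming N(t) = N0 * exp(-k * t)."""
    if not 0 < n_end <= n_start:
        raise ValueError("need 0 < n_end <= n_start")
    if years <= 0:
        raise ValueError("years must be positive")
    return math.log(n_start / n_end) / years


def half_life(rate):
    """Half-life in years; infinite when no regularization was observed."""
    return math.inf if rate == 0 else math.log(2) / rate


# Aggregate counts from the text; the 1200-year span is an assumption
# based on the stated period of study.
overall_rate = decay_rate(177, 98, 1200)
overall_half_life = half_life(overall_rate)
print(f"aggregate half-life: about {overall_half_life:.0f} years")
```

Because the time span cancels when two rates over the same interval are divided, the ratio of rates between bins depends only on the counts, which is consistent with the statement that relative regularization rates are independent of the dating of Old and Middle English.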
Supplementary Online Information (2.5M, pdf)

Supplementary English Verbs (5.3K, txt)

Acknowledgements

The Program for Evolutionary Dynamics is sponsored by J. Epstein. E.L. was supported by the National Defense Science and Engineering Graduate Fellowship and the National Science Foundation Graduate Fellowship. We are indebted to S. Pinker, J. Rau, D. Donoghue, and A. Presser for discussions. We thank J. Saragosti for help with visualization.

Footnotes

Supplementary Information is linked to the online version of the paper at www.nature.com/nature.

References

1. Chomsky N. Aspects of the Theory of Syntax. Cambridge: The MIT Press; 1965.
2. Lightfoot D. The Development of Language: Acquisition, Change and Evolution. Oxford: Blackwell; 1999.
3. Clark R, Roberts I. A computational model of language learnability and language change. Linguist. Inq. 1993;24:299–345.
4. Abrams D, Strogatz S. Modelling the dynamics of language death. Nature. 2003;424:900.
5. Nowak MA, Komarova NL, Niyogi P. Computational and evolutionary aspects of language. Nature. 2002;417:611–617.
6. Hooper J. In: Current Progress in Historical Linguistics. Christie W, editor. Amsterdam: North-Holland; 1976. pp. 95–105.
7. Hauser MD, Chomsky N, Fitch WT. The faculty of language: what is it, who has it, and how did it evolve? Science. 2002;298:1569–1579.
8. Chomsky N, Lasnik H. In: Syntax: An International Handbook of Contemporary Research. Jacobs J, editor. Berlin: de Gruyter; 1993. pp. 506–569.
9. Dougherty RC. Natural Language Computing. Hillsdale: Lawrence Erlbaum; 1994.
10. Stabler EP, Keenan EL. Structural similarity within and among languages. Theor. Comput. Sci. 2003;293:345–363.
11. Niyogi P. The Computational Nature of Language Learning and Evolution. Cambridge: The MIT Press; 2006.
12. Labov W. Transmission and Diffusion. Language. 2007;83:344–387.
13. Pinker S. Words and Rules: The Ingredients of Language. New York: Basic Books; 1999.
14. Kroch A. Reflexes of grammar in patterns of language change. Lang. Variation Change. 1989;1:199–244.
15. Kroch A. In: Beals K, et al., editors. Papers from the 30th Regional Meeting of the Chicago Linguistics Society: Parasession on Variation and Linguistic Theory; Chicago. CLS; 1994. pp. 180–201.
16. Pinker S. The irregular verbs. Landfall. 2000 March;
17. Bybee J. Morphology: A Study of Relation Between Meaning and Form. Amsterdam: John Benjamins; 1985.
18. Greenberg J. In: Current Trends in Linguistics III. Sebeok TA, et al., editors. The Hague: Mouton; 1966. pp. 61–112.
19. Bybee J. From usage to grammar: the mind’s response to repetition. Language. 2006;82:711–733.
20. Corbett G, Hippisley A, Brown D, Marriott P. In: Frequency and the Emergence of Linguistic Structure. Bybee J, Hopper P, editors. Amsterdam: John Benjamins; 2001. pp. 201–226.
21. Hare M, Elman J. Learning and morphological change. Cognition. 1995;56:61–98.
22. Marcus G, Brinkmann U, Clahsen H, Wiese R, Pinker S. German inflection: the exception that proves the rule. Cognit. Psychol. 1995;29:189–256.
23. Van der Wouden T. In: Magay T, Zigány J, editors. Papers from the 3rd International EURALEX Congress; Budapest. Akadémiai Kiadó; 1988. pp. 363–373.
24. Zipf GK. Human Behavior and the Principle of Least Effort. Cambridge: Addison-Wesley; 1949.
25. Miller GA. Some effects of intermittent silence. Am. J. Psychol. 1957;70:311–314.
26. Yang C. Knowledge and Learning in Natural Language. New York: Oxford University Press; 2002.
27. Glushko M. Towards the quantitative approach to studying evolution of English verb paradigm; Proceedings of the 19th Scandinavian Conference of Linguistics; 2003. pp. 30–45.
28. Kroch A, Taylor A. Penn-Helsinki Parsed Corpus of Middle English. second edition. 2000.