(*a*) Power-law distribution of word frequencies obtained from corpus data, consisting of *N* = 33 399 word tokens. The horizontal axis corresponds to word frequency (*x*_{k}) on a log scale, and the vertical axis corresponds to the probability *p*(*x*_{k}) that a given word type falls within the bin at that frequency level. A power-law distribution is indicated by a linear relationship with slope *γ* = 1.70 (Bernstein-Ratner corpus). (*b*) Iterated learning using a two-parameter Poisson–Dirichlet distribution as a prior on distributions over infinitely many variants also produces a power-law relationship, with *γ* = 1.74. Simulations were implemented by sampling over a population of 33 399 arbitrarily assigned numerical word tokens, matching the size of the corpus. Frequencies were initialized by assigning all word tokens to a single type. The frequency distribution stabilized after 10 000 iterations of learning, and the result shown here reflects the distribution produced by a single learner after 20 000 iterations. We ran the simulations across a range of values of *δ* (from 0.1 to 1, in steps of 0.1), with *α* fixed at 10 (see the electronic supplementary material for details). Simulations with *δ* = 0.3 produced the closest match to the corpus data, and this is the case shown in the figure. (*c*) Initial lexical frequency *x*_{k} plotted against the replacement rate, estimated as *r* = 1/*t*, where *t* is the number of iterations before absorption (i.e. *x*_{k} = 0). For each frequency value, time of absorption was measured directly over 5000 iterations after frequencies reached a steady state. The resulting linear relationship on a log–log plot reflects an underlying power law with *γ* = 0.8 (the correlation between log frequency and log replacement rate is *r* = −0.81, *p* < 0.00001).
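The two-parameter Poisson–Dirichlet prior used in panel (*b*) can be sampled with its Chinese-restaurant (Pitman–Yor) representation, which is how the heavy-tailed type–frequency distribution arises. The following is a minimal sketch of that sampling step only, not the full iterated-learning chain described in the caption; the function name and seeding are illustrative assumptions.

```python
import random

def pitman_yor_sample(n_tokens, alpha=10.0, delta=0.3, seed=0):
    """Draw a partition of n_tokens tokens under the two-parameter
    Poisson-Dirichlet (Pitman-Yor) process, using its Chinese-restaurant
    representation. Returns a list of per-type token counts.

    alpha (concentration) and delta (discount) match the caption's
    parameter settings (alpha = 10, delta = 0.3)."""
    rng = random.Random(seed)
    counts = []  # counts[k] = number of tokens assigned to type k
    for i in range(n_tokens):
        # Existing type k is reused with probability (counts[k] - delta) / (i + alpha);
        # a new type is created with the remaining probability (alpha + delta*K) / (i + alpha).
        u = rng.random() * (i + alpha)
        acc = 0.0
        for k, c in enumerate(counts):
            acc += c - delta
            if u < acc:
                counts[k] += 1
                break
        else:
            counts.append(1)  # seat the token at a new type
    return counts

# Example: a population the size of the corpus; sorting the counts in
# decreasing order yields the heavy-tailed (power-law-like) profile.
counts = pitman_yor_sample(33399, alpha=10.0, delta=0.3)
ranked = sorted(counts, reverse=True)
```

The discount parameter *δ* controls the tail exponent: larger *δ* spreads probability over more low-frequency types, which is why the fit to the corpus was tuned by sweeping *δ* while holding *α* fixed.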