GBS pan-genome. The number of specific genes is plotted as a function of the number $n$ of strains sequentially added (see Materials and Methods).
For each $n$, circles are the $8!/[(n-1)!\,(8-n)!]$ values obtained for the different strain combinations; squares are the averages of these values. The blue curve is the least-squares fit of the function $F_s(n) = \kappa_s \exp(-n/\tau_s) + \mathrm{tg}(\theta)$ (see Eq. 2 in Supporting Text) to the data. The best fit was obtained with correlation $r^2 = 0.995$ for $\kappa_s = 476 \pm 62$, $\tau_s = 1.51 \pm 0.15$, and $\mathrm{tg}(\theta) = 33 \pm 3.5$. The extrapolated average number $\mathrm{tg}(\theta)$ of strain-specific genes is shown as a dashed line. (Inset) Size of the GBS pan-genome as a function of $n$. The red curve is the calculated pan-genome size (see Eq. 4 in Supporting Text), with values of the parameters obtained from the fit of $F_s(n)$ (see Eq. 2 in Supporting Text).
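As an illustration only, the sketch below evaluates the fitted model $F_s(n)$ with the best-fit values reported in this caption (uncertainties ignored) and counts the number of strain combinations plotted at each $n$. It does not reproduce the least-squares fit itself, which would require the per-combination data. The function names `n_combinations` and `f_s` are hypothetical. Note that $8!/[(n-1)!\,(8-n)!] = n\binom{8}{n}$, i.e. each $n$-strain subset contributes once for every choice of which strain is added last.

```python
import math

# Best-fit parameters as reported in the caption (uncertainties omitted).
KAPPA_S = 476.0   # amplitude of the exponential term
TAU_S = 1.51      # decay constant, in units of strains added
TG_THETA = 33.0   # asymptote: extrapolated number of strain-specific genes


def n_combinations(n: int, total: int = 8) -> int:
    """Number of values plotted at n: total! / [(n - 1)! (total - n)!]."""
    return math.factorial(total) // (
        math.factorial(n - 1) * math.factorial(total - n)
    )


def f_s(n: float, kappa: float = KAPPA_S, tau: float = TAU_S,
        tg_theta: float = TG_THETA) -> float:
    """Fitted model F_s(n) = kappa_s * exp(-n / tau_s) + tg(theta)."""
    return kappa * math.exp(-n / tau) + tg_theta


if __name__ == "__main__":
    # Tabulate combinations and the model value for n = 2..8 (illustrative range).
    for n in range(2, 9):
        print(n, n_combinations(n), round(f_s(n), 1))
```

As $n$ grows, the exponential term vanishes and $F_s(n)$ approaches $\mathrm{tg}(\theta) \approx 33$, which is the dashed-line asymptote described above.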