MOIRE: A software package for the estimation of allele frequencies and effective multiplicity of infection from polyallelic data

Malaria parasite genetic data can provide insight into parasite phenotypes, evolution, and transmission. However, estimating key parameters such as allele frequencies, multiplicity of infection (MOI), and within-host relatedness from genetic data has been challenging, particularly in the presence of multiple related coinfecting strains. Existing methods often rely on single nucleotide polymorphism (SNP) data and do not account for within-host relatedness. In this study, we introduce a Bayesian approach called MOIRE (Multiplicity Of Infection and allele frequency REcovery), designed to estimate allele frequencies, MOI, and within-host relatedness from genetic data subject to experimental error. Importantly, MOIRE is flexible in accommodating both polyallelic and SNP data, making it adaptable to diverse genotyping panels. We also introduce a novel metric, the effective MOI (eMOI), which integrates MOI and within-host relatedness, providing a robust and interpretable measure of genetic diversity. Using extensive simulations and real-world data from a malaria study in Namibia, we demonstrate the superior performance of MOIRE over naive estimation methods, accurately estimating MOI up to 7 with moderate sized panels of diverse loci (e.g. microhaplotypes). MOIRE also revealed substantial heterogeneity in population mean MOI and mean relatedness across health districts in Namibia, suggesting detectable differences in transmission dynamics. Notably, eMOI emerges as a portable metric of within-host diversity, facilitating meaningful comparisons across settings, even when allele frequencies or genotyping panels are different. MOIRE represents an important addition to the analysis toolkit for malaria population dynamics. Compared to existing software, MOIRE enhances the accuracy of parameter estimation and enables more comprehensive insights into within-host diversity and population structure. Additionally, MOIRE's adaptability to diverse data sources and potential for future improvements make it a valuable asset for research on malaria and other organisms, such as other eukaryotic pathogens. MOIRE is available as an R package at https://eppicenter.github.io/moire/.


Introduction 1
Genetic data can be a powerful source of information for 2 understanding malaria parasite phenotype and transmission without consideration of strain composition from polyclonal samples results in a consistent overestimation of heterozygosity, leading to potentially faulty inference about population diversity.
Additionally, naive estimation offers no principled way to address genotyping error beyond heuristics, further biasing estimates of diversity in ways that depend on choices made during initial interpretation of genotyping data.Alternatively, considering only monoclonal samples is potentially problematic, as this may require a substantial number of samples to be discarded when collected from regions where multiple infection is the rule rather than the exception.Further, the monoclonal subset of samples are fundamentally different from the larger population of interest, as they preclude the possibility of within-host relatedness between strains.This ignores a potentially important source of information about transmission dynamics, as within-host relatedness may be indicative of co-transmission or persistent local transmission (Wong et al., 2017;Nkhoma et al., 2020;Wong et al., 2018).
To address these issues and make full use of available data, Chang et al. (2017) developed a Bayesian approach (THE REAL McCOIL) to estimate allele frequencies and MOI in the context of polygenomic infections from single nucleotide polymorphism (SNP) based data.More recently, coiaf (Paschalidis et al., 2023) and SNP-Slice (Ju et al., 2023) have been developed to further improve computational efficiency and resolving power.
Briefly, coiaf takes user provided allele frequencies and SNP read count data and applies an optimization procedure to estimate either discrete or continuous values for MOI.SNP-Slice also takes SNP read count data and uses a non-parametric Bayesian approach to simultaneously estimate phased strain identity and within-host strain composition.While Paschalidis et al. suggest within-host relatedness as a possible explanation for continuous values of MOI, and the method by Ju et al. may provide a way to interrogate within-host relatedness through phased strain composition, none of these methods directly consider or estimate within-host relatedness.Further, these methods are all tailored to SNP based data and are unable to accommodate more diverse polyallelic loci, such as microsatellites, which have been widely used in population genetic studies (Anderson et al., 2000;Tessema et al., 2019;Roh et al., 2019;Pringle et al., 2019).
Other methods that infer within-host relatedness (Zhu et al., 2019), in contrast, rely on whole genome sequencing (WGS) data.WGS based approaches, however, frequently have poor sensitivity for detecting minority strains and low density infections (Tessema et al., 2022).In recent years, the declining cost of DNA sequencing and development of high throughput, high diversity, targeted sequencing panels have made polyallelic data even more attractive for genomic based studies of malaria (Tessema et al., 2022;LaVerriere et al., 2022;Kattenberg et al., 2023).Genetic analysis methods leveraging polyallelic loci have the potential for substantially increased resolving power over their SNP based counterparts, particularly in the context of related polyclonal infections in malaria (Taylor et al., 2019;Inna Gerlovina et al., 2022).Unfortunately, there are limited tools available to analyze these types of data.
We present here a new Bayesian approach, Multiplicity Of Infection and allele frequency REcovery from noisy polyallelic data (MOIRE ), that, like THE REAL McCOIL, enables the estimation of allele frequencies and MOI from genomic data that are subject to experimental error.In addition, MOIRE estimates and accounts for within-host relatedness of parasites, a common occurrence due to the inbreeding of parasites serially 81 co-transmitted by mosquitoes (Nkhoma et al., 2020(Nkhoma et al., , 2012)).an interpretable quantity that is comparable across genotyping 98 panels and transmission settings.We contrast this with the within-99 host infection fixation index, F W S , a frequently used metric 100 of within-host diversity and signal of inbreeding and population 101 sub-structure (Manske et al., 2012;Auburn et al., 2012), and 102 demonstrate the inherent shortcomings of F W S as a non-portable 103 metric.

106
Consider observed genetic data X = (X1, . . ., Xn) from n samples 107 indexed by i, where each Xi is a collection of vectors indexed 108 by l of possibly differing length, representing the varying number 109 of alleles possible at each locus, e.g.polyallelic loci.Each vector 110 is binary, with 1 representing the allele was observed or 0 111 representing the allele went unobserved at locus l for sample i. 112 From this data, we wish to estimate MOI for each individual 113 (µ = [µ1, . . ., µn]), within host relatedness (r = [r1, . . ., rn]), 114 defined as the average proportion of the genome that is identical by 115 descent across all strains, individual specific genotyping error rates 116 ), and population allele 117 frequencies at each locus (π = [π1, . . ., π l ]).Similar to Chang et al. 118 (2017), we applied a Bayesian approach and looked to estimate the 119 posterior distribution of µ, r, ϵ + , ϵ − and π as 120 using MOIRE with default priors and settings, using 40 parallel tempered chains for 5000 burn-in steps, followed by 10,000 samples which were thinned to 1000 total samples.

Estimation of multiplicity of infection, within-host relatedness, and allele frequencies
We simulated collections of 100 samples under varied combinations of population mean MOI, average within-host relatedness, false positive and false negative rates, and different genotyping panels (details of our simulation procedure may be found in the supplement section 4).Individual MOIs were drawn from zero truncated Poisson (ZTP) distributions with rate parameters 1, 3, and 5, resulting in mean MOIs of 1.58, 3.16, and 5.03 respectively.Within-host relatedness was simulated from settings with low, moderate, and high relatedness.False positive and false negative rates were varied from 0 to 0.1.We first simulated synthetic genomic loci with prespecified diversity: 100 SNPs, 30 loci with 5 alleles (moderate diversity), 30 loci with 10 alleles (high diversity), and 30 loci with 20 alleles (very high diversity) with frequencies drawn from the uniform Dirichlet distribution.We also assessed potential real world performance of MOIRE by simulating data for 5 currently used genotyping panels from 12 regional populations characterized by the MalariaGEN Pf7 dataset (Abdel Hamid et al., 2023) as described in the supplementary material (section 7, Supplementary Figure 4).Genetic loci were selected according to a 24 SNP panel (Daniels et al., 2008), a 101 SNP panel (Chang et al., 2019), and 3 recently developed amplicon sequencing panels consisting of 128 (LaVerriere et al., 2022), 165 (Aranda-Diaz and Neubauer Vickers, 2022), and 233 (Kattenberg et al., 2023) diverse microhaplotypes respectively.Like the fully synthetic simulations, these simulations were varied over a range of MOI and withinhost relatedness, however error rates were fixed at moderate false positive and false negative rates of .01 and .1 respectively for the purposes of computational feasibility due to the extensive number of simulations required.We chose these levels as we believe they are reflective of the most likely situation of higher levels of false negatives and relatively low rates of false positives from a typical bioinformatics pipeline.We then ran MOIRE and calculated summary statistics of interest on the sampled posterior distributions.
We estimated allele frequencies, heterozygosity, MOI, within-host relatedness, and eMOI using the mean or median of the posterior distribution output by MOIRE.It should be noted that withinhost relatedness is only defined for polyclonal infections, so the posterior distribution of within-host relatedness is conditional on the MOI being greater than 1.We contrasted these with naive estimates of allele frequency and MOI by assuming that an observed allele was contributed by a single strain, and estimated MOI as equal to the second-highest number of alleles observed across loci.We calculated ground truth allele frequencies using the true number of strains contributing each allele.
Under moderate false positive and false negative rates of 0.01 and 0.1 respectively, MOIRE accurately recovered parameters of interest across a range of genotyping panels, population MOI, and within-host relatedness (Figure 1, Table 1).Allele frequencies estimated by MOIRE were unbiased across genotyping panels (Figure 1B), leading to unbiased estimates of heterozygosity (Figure 1C).Naive estimation exhibited substantial bias that

247
MOI was also well estimated by MOIRE, with accuracy increasing substantially in the presence of more diverse loci (Figure 1D).
In the context of SNPs, MOIRE recovered MOI accurately up to approximately 4 strains, and then began to exhibit limited ability to resolve.More diverse panels enabled greatly improved resolving power, allowing for the accurate recovery of MOI up to approximately 7 strains.Naive estimation substantially underestimated MOI in comparison, due in part to the limited capacity of low diversity loci to discriminate MOI, as well as the presence of related strains that deflate the observed number of distinct alleles.This bias was particularly prominent for low diversity markers such as SNPs which can only resolve up to 2 strains.
MOIRE was generally able to recover within-host relatedness, particularly for moderate and high diversity markers in the context of high relatedness (Figure 1E).SNP based panels had difficulty resolving individual level within-host relatedness and were sensitive to the uniform prior.It should be noted that in the circumstance that a monoclonal infection has an inferred MOI greater than 1, MOIRE will likely classify these infections with very high relatedness (Figure 1E).This is due to the presence of false positives that MOIRE will sometimes infer as an infection consisting of highly related strains rather than being explained by observation error.Therefore, within-host relatedness should be interpreted in the context of the probability of the infection being polyclonal.A more robust metric is eMOI, since it is a metric of diversity that integrates MOI and within-host relatedness.
MOIRE recovered eMOI with high accuracy under all conditions using polyallelic panels (Figure 1F).SNP panels exhibited a larger degree of bias at higher eMOI, but still performed relatively well for eMOI of up to 4. This demonstrates that while identifiability of MOI or within-host relatedness may be challenging in some situations, eMOI is a reliably identifiable quantity when estimated using highly polymorphic markers.
All simulations were also conducted without any relatedness present.MOIRE was still able to accurately recover allele frequencies, heterozygosity, and MOI, indicating that minimal bias or uncertainty are introduced by attempting to estimate relatedness (Supplementary Figure 1).
These patterns held across the range of false positive and false negative rates simulated with the fully synthetic simulations.
Allele frequencies and heterozygosity remained well estimated by MOIRE across settings, however bias was elevated for individual level estimates of MOI, within-host relatedness, and eMOI when false positive rates were increased and panel diversity was low.
Increased false negative rates did not result in any additional bias within the range of tested values (Supplementary Figure 2).

Population inference
MOIRE is a probabilistic approach providing a full posterior distribution over model parameters, allowing estimation of credible intervals for model parameters as well as functions thereof.While sample level parameters estimated by the model are useful, it may also be useful to estimate population level summary statistics for reporting and comparison purposes.We thus calculated the posterior distribution of population level summaries of interest, such as mean MOI, mean within-host relatedness, and mean eMOI.We note that mean within-host relatedness is defined only for samples with MOI greater than 1, therefore the posterior distribution of mean within-host relatedness was calculated across samples with MOI greater than 1 at each iteration of the MCMC algorithm.MOIRE accurately estimated these quantities across a range of conditions (Supplementary Figure 3), with the best performance seen for polyallelic data.
Population mean MOI was accurately estimated across all panels, with improved precision at lower levels (Supplementary Figure 3A, Table 1).Credible interval (CI) coverage in general was poor, likely due to the challenge of identifiability in conjunction with within-host relatedness.SNP panels were largely unable to resolve population level mean within-host relatedness and exhibited poor CI coverage and substantial sensitivity to the uniform prior specification due to the low relative information contained in these markers.Polyallelic panels in contrast had improved precision as more diverse panels were used, although CI coverage was also poor due to persistent sensitivity to the uniform prior as indicated by slightly overestimating within-host relatedness below .5 and underestimating within-host relatedness above .5.
Population mean eMOI was remarkably accurate for low and medium mean MOI when using SNP based panels, with bias only becoming apparent at higher mean MOI (Supplementary Figure 3C, Table 1).Polyallelic panels had substantially improved precision across a wide range of values, further demonstrating that while population mean within-host relatedness or mean MOI may be challenging to identify, mean eMOI remains a highly identifiable quantity when genetic markers with sufficient diversity are used.

Metric stability across genetic backgrounds
Population metrics of genetic diversity enable researchers to make comparisons across space and time, and to answer questions relating to differences in transmission dynamics.In order for a metric to be useful for these purposes, it must be sensitive to changes in transmission dynamics while remaining insensitive to other factors that vary and may confound interpretation, such as the genotyping panel used, or the local allele frequencies for a given panel.For example, if we were to compare two populations that exhibit the same transmission dynamics, we would want the metric to be the same, uninfluenced by differing population allele frequencies.It would be even better if the metric is insensitive to the genotyping panel used, allowing for comparisons across studies that are independent of the technology utilized.For each level of relatedness (low and high), we simulated 100 infections with a mean MOI of 1.51 and 3.16, for a total of 400 infections across 4 conditions.Keeping the MOI and relatedness fixed for each sample, we varied the genetic diversity of the panel used to genotype each sample.We then calculated the mean eMOI from MOIRE, mean MOI using the naive estimator, and mean F W S using a naive estimate of allele frequencies for each simulation to assess the sensitivity of each metric to varying the genetic diversity of the panel.True mean eMOI and mean MOI are fixed values within levels of within-host relatedness and are annotated by dashed lines.Mean F W S is not fixed within levels of within-host relatedness and MOI because it is a function of the genetic diversity of the panel.
To explore the performance of eMOI across varying transmission 346 settings, we simulated 100 samples with MOI drawn from a ZTP 347 distribution with either λ = 1 or λ = 3.For each sample, we 348 then simulated either low or high within-host relatedness.For each 349 individual level simulation, we then observed simulated genetics 350 parameterized by each of the 12 regional populations previously 351 described using the 5 genotyping panels, followed by the previously 352 described observation process with false positive and false negative 353 rates of .01 and .1 respectively.We then fit MOIRE on each 354 simulation independently.
For each simulation, we calculated mean eMOI, mean naive MOI, 356 and the within-host infection fixation index (F W S ) (Roh et al., 357 2019;Manske et al., 2012), a frequently used metric of within-358 host diversity that relates genetic diversity of the individual 359 infection to diversity of the parasite population.Mean MOI was 360 calculated using the second-highest number of observed alleles, 361 and F W S used the observed genetics, assuming all alleles were 362 equifrequent within hosts, and naive estimates of allele frequencies 363 to estimate heterozygosity.For these metrics to be most useful in 364 characterizing transmission dynamics, they should be the same for 365 all simulations with the same degree of within-host relatedness and 366 mean MOI, no matter the panel used nor the genetic background 367 of the population.We found that mean eMOI was stable across all 368 genetic backgrounds using microhaplotype based panels, yielding 369 accurate estimates of mean eMOI despite substantial variability in 370 local diversity of alleles, as shown by heterozygosity, and differing 371 genomic loci (Figure 2A).Interestingly, while the SNP panels exhibited reduced precision and downward bias as expected, they were consistently biased with respect to the true eMOI, even across different panels.This suggests that SNP panels, while limited in resolving power, may still have utility in estimating relative ordering of eMOI.These results also demonstrate that eMOI may be readily used and compared across transmission settings and is relatively insensitive to other factors such as heterozygosity that may vary across settings.In contrast, mean naive MOI and F W S were sensitive to genetic background and genotyping panel in confounded ways.Mean naive MOI, only useful with polyallelic markers, exhibits an inherent upward bias as mean heterozygosity increases that is most severe at higher mean MOI.This bias also varied with the genotyping panel used, making it difficult to interpret and compare across settings (Figure 2B).F W S is also sensitive to genetic background and panel used, exhibiting an upward trend as heterozygosity increases and a bias that varies across panels.This is inherent to the construction of the metric, as it is coupled to an estimate of the true heterozygosity of genetic loci being used (Figure 2C).This simulation demonstrates limitations in the utility of F W S as a metric of within-host diversity for a population as it is inherently uncomparable across settings due to its high sensitivity to varying genetic background and genotyping panel used.Mean eMOI, in contrast, is a stable metric of genetic diversity that is insensitive to genetic background and genotyping panel, and is thus readily comparable across settings.

Application to a study in Northern Namibia
We next used MOIRE to reanalyze data from a previously conducted study carried out in northeastern Namibia consisting of 2585 samples from 29 health facilities across 4 health districts genotyped at 26 microsatellite loci (Tessema et al., 2019).We ran MOIRE across samples collected from each of the 4 health districts independently.Running MOIRE in this way implies that we are assuming that all samples from each health district come from a shared population with the same allele frequencies.We then calculated summary statistics of interest on the sampled posterior distributions.
We compared our results to the naive estimation conducted in the original study and found that overall relative ordering of mean MOI was maintained, with Andara and Rundu exhibiting the highest MOI, Zambezi the lowest, and Nyangana in between, consistent with contemporary estimates of transmission intensity (Tessema et al., 2019).However, similar to our simulations, naive estimation substantially underestimated mean MOI across health districts compared to MOIRE (Figure 3A and C).Individual within-host relatedness was estimated to be very high across sites (IQR: .61-.91) with no differences between sites (Figure 3B).
This suggests substantial inbreeding which may be indicative of persistent local transmission, consistent with the original findings by Tessema et al. (2019) We also found that heterozygosity across loci estimated by MOIRE was generally lower (IQR: .55 -.85), consistent with the previously described simulations demonstrating that naive estimation overestimates heterozygosity, and that previously detected statistically significant differences in heterozygosity between the Zambezi region and the other three regions may have been an artifact of biased estimation (Figure 3D).
We also ran MOIRE independently across each of the 29 health facilities, excluding 2 health facilities from the Zambezi region Fig. 3: Estimated MOI, relatedness, eMOI and heterozygosity in Northern Namibia.MOIRE was run on data from 2585 samples from 29 clinics genotyped at 26 microsatellite loci, subset across four health districts.Each point represents the posterior mean or median for each sample or locus level parameter.The black circle represents the population mean with 95% credible interval for each health district and the black triangle indicates the naive estimate where applicable.In the case of eMOI (C), the naive estimate is simply the MOI.Opacity was used to accommodate overplotting in A, C and D, however opacity in B is reflective of the posterior probability of a particular sample being polyclonal to emphasize that an observation's contribution to the posterior distribution of mean within-host relatedness is weighted by its probability of being polyclonal.This is due to the fact that mean within-host relatedness is only defined for samples with MOI greater than 1, and thus the posterior distribution of within-host relatedness was calculated by taking the mean withinhost relatedness across samples with MOI greater than 1 at each iteration of the MCMC algorithm.Therefore, the opacity of each point in B is reflective of the contribution of that sample to the posterior distribution of mean within-host relatedness.due to low total number of samples (n = 9 in each).Stratifying 430 by health facility revealed substantial heterogeneity in mean MOI, 431 within-host relatedness, and consequently eMOI, also consistent 432 with the findings by Tessema et al. (2019) (Figure 4).Interestingly, 433 Tessema et al. (2019) identified Rundu district hospital as having 434 exceptionally high within-host diversity as measured by F W S , 435 which was posited to be due to a large fraction of the patients 436 having traveled or resided in Angola.We found that Rundu district 437 hospital had the highest mean eMOI and greatest spread across 438 observations ], IQR = 4.88).This 439 was mainly driven by much higher mean MOI (7 [95% CI: 6.5-440 7.5]), and low mean within-host relatedness (.47 [95% CI: .43-441 0.51]).The combination of high MOI and relatively low within-442 host relatedness, translating into high population mean eMOI, 443

454
In particular, naive estimation systematically overestimates 455 measures of allelic diversity such as heterozygosity and 456 systematically underestimates MOI.State-of-the-art methods diversity than naive MOI or F W S , and is insensitive to other factors that may vary across settings such as allele frequencies of given genetic markers.Further, by decomposing the genetic state of an infection into components of within-host relatedness and the number of distinct strains present, we have enabled the characterization of these quantities independently, which may be of interest in their own right.For example, within-host relatedness may be of interest in the context of understanding the role of inbreeding and co-transmission in the parasite population (Wong et al., 2022;Nkhoma et al., 2020), and the number of distinct strains may be of interest in the context of understanding superinfection dynamics.
While we have demonstrated the utility of polyallelic data, MOIRE is still compatible with SNP based data and can offer benefits over other approaches.When using SNP based panels, eMOI is still well characterized up to moderate levels, and while the reduced capacity of SNPs generally results in biased estimates, the estimates recovered reflect changes in within-host relatedness yet are stable across genetic backgrounds.Thus, these data may be useful for comparing relative ordering of eMOI across settings and providing inference.In contrast, existing analytical approaches are likely to be sensitive to model misspecification by not considering within-host relatedness and varying genetic backgrounds, and may be biased in ways that are difficult to interpret and compare across settings.
We also note that while increasing the number of loci genotyped is always beneficial, the largest gains in recovering estimates of interest are through using sufficiently diverse loci.Our simulations demonstrate that, even with a modest number of very diverse loci such as our synthetic simulations using 30 loci, eMOI can be recovered with a high accuracy and precision.Marginal increases in complexity of incorporating several highly diverse loci, for example in the context of drug resistance monitoring, may be outweighed by the substantial insights obtained from jointly understanding transmission dynamics, population structure, and drug resistance through increased accuracy of estimating resistance marker allele frequency.Modern amplicon sequencing panels have been developed precisely for these contexts, combining high diversity targets with comprehensive coverage of known resistance markers (LaVerriere et al., 2022;Aranda-Diaz and Neubauer Vickers, 2022;Kattenberg et al., 2023).MOIRE provides a powerful tool for leveraging polyallelic data to understand malaria epidemiology, and there are multiple avenues for future work to further improve inference.First, the observation model does not currently fully leverage the information in sequencing based data where the actual number of reads may be available.This may provide additional information, e.g. to inform false positive rates by considering the number of reads attributable to an allele, as well as false negative rates by considering the total number of reads at a locus which may be indicative of sample quality.Second, we currently consider only a single, well mixed, background population parameterized by allele frequencies at each locus.However, it may be the case that there are multiple distinct populations with their own allele frequencies, and that the observed data is a mixture of these populations.This may be particularly relevant in the context of malaria transmission where there may be multiple distinct populations of parasites circulating in a region.Future work may consider a mixture model over allele frequencies, where the number of populations

82
Critically, MOIRE takes as input genetic data of arbitrary83 diversity, allowing for estimation of allele frequencies, MOI, 84 and within-host relatedness from polyallelic as well as biallelic 85 data.MOIRE is able to fully utilize polyallelic data, yielding 86 joint estimates of allele frequencies, sample specific MOIs and 87 within-host relatedness along with probabilistic measures of 88 uncertainty.We demonstrate through simulations and applications 89 to empirical data the ability of MOIRE to leverage a variety of 90 polyallelic markers.Polyallelic markers can greatly improve jointly 91 estimating sample MOI, within-host relatedness, and population 92 allele frequencies, resulting in reduced bias and increased power 93 for understanding population dynamics from genetic data.We also 94 introduce a new metric of diversity, the effective MOI (eMOI), a 95 continuous value that combines estimates of the true MOI and 96 the degree of within-host relatedness in a single sample, providing 97

Fig. 1 :
Fig.1: True vs. estimated values of parameters across panels of varying genetic diversity.Panel A summarizes the distribution of heterozygosity across each panel used.Each symbol represents the estimated value of the parameter for a single simulated dataset, with the true value of the parameter on the x-axis and the estimated value on the y-axis.Simulations were pooled across mean MOIs and levels of relatedness.False positive and false negatives rates were fixed to 0.01 and 0.1 respectively.Opacity was set to accommodate overplotting, except in the case of withinhost relatedness where opacity reflects the estimated probability that a sample is polyclonal, calculated as the posterior probability of the sample MOI being greater than 1, as individual withinhost relatedness is only defined for samples with MOI greater than 1.MOIRE accurately recovered parameters of interest with increasing accuracy as panel diversity increased, while naive estimation exhibited substantial bias where such estimators exist.

Fig. 2 :
Fig.2: Comparison of mean eMOI to other summary measures of diversity across varying levels of within-host relatedness.For each level of relatedness (low and high), we simulated 100 infections with a mean MOI of 1.51 and 3.16, for a total of 400 infections across 4 conditions.Keeping the MOI and relatedness fixed for each sample, we varied the genetic diversity of the panel used to genotype each sample.We then calculated the mean eMOI from MOIRE, mean MOI using the naive estimator, and mean F W S using a naive estimate of allele frequencies for each simulation to assess the sensitivity of each metric to varying the genetic diversity of the panel.True mean eMOI and mean MOI are fixed values within levels of within-host relatedness and are annotated by dashed lines.Mean F W S is not fixed within levels of within-host relatedness and MOI because it is a function of the genetic diversity of the panel.

Fig. 4 : 448 Translating
Fig.4: Estimated MOI, relatedness, eMOI and heterozygosity in Northern Namibia, stratified by health facility.MOIRE was run independently on data from each health facility.Two health facilities from the Zambezi region were excluded due to only having 9 samples present in each subset.Health facilities are plotted in geographic order from West to East.Plotting conventions are the same as in Figure3. ) 166within-host relatedness decreases, however the contribution to the 167 estimate of eMOI from within-host relatedness also decreases, and 168 thus the overall precision of eMOI is maintained.169InferenceandImplementation170 We fit our model to observed genetic data using a Markov 171 Chain Monte Carlo (MCMC) approach using the Metropolis-Hastings algorithm with a variety of update kernels.Details of 173 sampling and implementation are described in the supplementary 174 material (section 5).MOIRE is implemented as an R package 175 and is available with tutorials and usage guidance at https: 176 //eppicenter.github.io/moire/.All sampling procedures were 177 implemented using Rcpp (Eddelbuettel and Francois, 2011) for 178 efficiency.Substantial effort was placed on ease of use and 179 limiting the amount of tuning required by the user by leveraging 180 adaptive sampling methods.We provide weak default priors for 181 all parameters and recommend that users only modify priors if 182 they have strong prior knowledge about the parameters, such 183 as experimentally derived estimates of false positive and false 184 negative rates using samples with known parasite compositions 185 and densities.All analysis conducted in this paper was done 186

Table 1 .
Mean absolute deviation (MAD) of estimates of MOI, heterozygosity, within-host relatedness, and eMOI across simulations using synthetic (top) and real-world (bottom) genotyping panels.The MAD of estimates of MOI were calculated by taking the mean of the MAD for each stratum of true MOI between 1 and 10.MOI Within-host relatedness accuracy is only considered for samples with a true MOI > 1. Coverage rates of 95% credible intervals are shown in parentheses for estimates by MOIRE.