Demography, education, and research trends in the interdisciplinary field of disease ecology

Abstract Micro‐ and macroparasites are a leading cause of mortality for humans, animals, and plants, and there is great need to understand their origins, transmission dynamics, and impacts. Disease ecology formed as an interdisciplinary field in the 1970s to fill this need and has recently grown rapidly in size and influence. Because interdisciplinary fields integrate diverse scientific expertise and training experiences, understanding their composition and research priorities is often difficult. Here, for the first time, we quantify the composition and educational experiences of a subset of disease ecology practitioners and identify topical trends in published research. We combined a large survey of self‐declared disease ecologists with a literature synthesis involving machine‐learning topic detection of over 18,500 disease ecology research articles. The number of graduate degrees earned by disease ecology practitioners has grown dramatically since the early 2000s. Similar to other science fields, we show that practitioners in disease ecology have diversified in the last decade in terms of gender identity and institution, with weaker diversification in race and ethnicity. Topic detection analysis revealed how the frequency of publications on certain topics has declined (e.g., HIV, serology), increased (e.g., the dilution effect, infectious disease in bats), remained relatively common (e.g., malaria ecology, influenza, vaccine research and development), or consistently remained relatively infrequent (e.g., theoretical models, field experiments). Other topics, such as climate change, superspreading, emerging infectious diseases, and network analyses, have recently come to prominence. This study helps identify the major themes of disease ecology and demonstrates how publication frequency corresponds to emergent health and environmental threats. More broadly, our approach provides a framework to examine the composition and publication trends of other major research fields that cross traditional disciplinary boundaries.


KEYWORDS
host-pathogen interaction, infectious disease ecology, machine learning, questionnaire, research trends

| INTRODUCTION
Parasites and the diseases they can cause are an important component of ecosystems and can shape population dynamics, food web structure, and ecosystem health (Hudson et al., 1998, 2006; Lafferty et al., 2006). Parasites have both positive and negative impacts on the ecosystems in which they occur, yet their negative impacts can be extremely serious, such that infectious diseases are a leading source of human, domestic animal, and wildlife mortality, killing an estimated 17 million people each year (Brand, 2013; World Health Organization, 1996), threatening economic security through crop and production animal losses (Haseeb et al., 2019; Benavides et al., 2017), and causing declines of endangered species (Scheele et al., 2019). Moreover, infectious disease outbreaks are predicted to be exacerbated by contemporary issues such as climate change, high human population density, and fragmentation of natural environments (Altizer et al., 2013; Daszak et al., 2001; Plowright et al., 2021). It is important to establish a strong foundation and specialization for research on infectious diseases in their ecological and evolutionary context to promote a high standard of living and enhance wildlife and ecosystem health; for example, we continue to face challenges such as emerging pathogens (e.g., SARS-CoV-2; Andersen et al., 2020), pathogen evolution (van Boeckel et al., 2019), and the need for innovative interventions (Sokolow et al., 2019).
Disease ecology is the study of how micro- and macroparasites move through and are distributed across host populations, landscapes, and ecosystems, considering both abiotic and biotic factors, as well as the consequences of their infections. It is a relatively new and rapidly expanding research focus within ecology and evolutionary biology that draws heavily on early foundations in population biology and vector-borne disease (e.g., zooprophylaxis as a precursor to the dilution effect literature; Hess & Hayes, 1970; Schmidt & Ostfeld, 2001).
Further, disease ecology integrates many fields that cross multiple levels of biological organization including but not limited to parasitology, microbiology, immunology, and epidemiology (Grenfell et al., 1995; Hudson et al., 2002; Wilson et al., 2019). Disease ecologists investigate a range of practical and fundamental questions relevant to humans, other animals, and plants, such as the natural origins of disease outbreaks; heterogeneities in pathogen susceptibility, transmission, and impact; and the effectiveness of intervention strategies (Condeso & Meentemeyer, 2007; Hudson et al., 1998; Joseph et al., 2013; Olival et al., 2017; Vanderwaal & Ezenwa, 2016).
Disease ecology, in part, adapted and developed population biology theory to address societal needs (Johnson et al., 2015; Koprivnikar & Johnson, 2016; Scheiner & Rosenthal, 2006). Key among these is the urgency to understand and address novel disease threats, which are rooted in natural systems but are often exacerbated by societal inequalities (Carlson & Mendenhall, 2019). For example, the impacts of habitat degradation on pathogen spillover are an expanding area of research that can be used to guide risk assessments and environmental policy (Plowright et al., 2021). At the same time, infrastructure has developed around disease ecology, including journals and associated organizations (e.g., Wildlife Disease Association, American Society of Tropical Medicine and Hygiene), and a specialized National Science Foundation and National Institutes of Health funding program and conference series (Scheiner & Rosenthal, 2006), which have helped to direct research effort and create networks among researchers.
Still, many questions remain as to the composition of disease ecology practitioners, the core research foci of the field, and whether research trends are associated with widespread disease outbreaks. Answering these questions could help improve recruitment and retention and prioritize future research directions. However, understanding these complex and interrelated factors as they apply to an interdisciplinary research field requires diverse and innovative approaches.
Here, we characterize the field of disease ecology and a subset of its practitioners by addressing the following questions: (1) Who comprises the field in terms of education, demographics, and the type of research they conduct? (2) Which scientific articles and journals have been the most influential? (3) And significantly, how has the frequency of research topics emerged and changed in the literature over time? For example, do the topics in publications follow global health events such as disease outbreaks?
However, high volumes of published research make theme synthesis very difficult and require innovative approaches (Lajeunesse, 2016; Nunez-Mir et al., 2016). Following recent adoptions of data mining approaches to systematic reviews (Han & Ostfeld, 2019), we apply topic detection using non-negative matrix factorization to characterize the research core and trajectory of disease ecology. More broadly, our approach can provide a quantitative synthesis framework to examine the frequency with which topics are published in other fields that cross traditional disciplinary boundaries.

| Survey
We developed a survey questionnaire to quantify the demographics and research core of disease ecology (Pennsylvania State University Institutional Review Board Study 00010582; Appendix S1). The survey was disseminated on disease ecology email listservs such as conference attendees (e.g., the past five years of Ecology and Evolution of Infectious Disease conferences, Ecological Society of America), scientific organizations and networks (e.g., VectorBiTE, American Society of Parasitologists, Ecological Society of America disease ecology section), and institutional research centers (e.g., Pennsylvania State University Center for Infectious Disease Dynamics, University of Georgia Center for the Ecology of Infectious Diseases). We also distributed the survey to prominent non-USA research groups (in, e.g., South America, Europe, Australia) to diversify our survey participants. However, we acknowledge that survey reach was heavily biased toward established research centers and active researchers in the disciplinary community, primarily in North America and Europe, and surely missed certain individuals and groups, particularly those who conduct relevant research on infectious disease but may not necessarily identify as disease ecologists (e.g., medical entomologists and historians).
The survey was open from November 2018 until January 2019, closing once the response rate dropped below two new responses per day for one consecutive week. All survey participants were self-declared disease ecology practitioners, who were informed about the potential use of results in a consent statement. The survey asked participants questions on their demographics, institution, education, types and topics of current research, and influential scientific articles and journals. It included a combination of multiple choice and short answer response questions. A full copy of the survey and a description of the data cleaning procedure are available in Appendix S1.

| Literature search
Our objective was to compile an extensive corpus robustly representative of publications in disease ecology rather than to include every article per se. Literature search terms are often generated by the authors, which may impose bias. To generate a list of search terms with reduced author bias, we compiled a set of papers that cited the foundational paper in disease ecology that was considered highly influential by the survey participants (Table 1). Using this set of papers, we ran topic detection algorithms (nltk library, Python 2.7; Bird et al., 2009) to generate the list of 13 base keywords (e.g., the word parasite could have multiple prefixes and suffixes) that were used to search the wider literature (see Appendix S1: Literature Search Methods). In this way, our literature search terms emerged from the disease ecology literature itself and then were refined through the process described below and in Figure 1.
The final literature search was conducted in Web of Science for the years 1975 to 2018. Each article had to meet specific criteria using Boolean filters, including a focus on studying a pathogen or parasite, host infections (to distinguish from solely environmental persistence of microorganisms), and individual-level or higher-order dynamics (e.g., not cellular processes, with the exception of those analyzed as a population-level process). The full list of search terms is provided in Appendix S1, alongside a set of exclusionary terms to remove similar but non-disease ecology articles. Web of Science categories were used to narrow our search and also reduce false-positive inclusions. To reduce bias, both search terms and included journals were based on survey results. We included journals that were listed by at least four survey participants as significant to the field (n = 42), as well as Nature and Science. Finally, articles with fewer than four citations were removed as a form of quality control.
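The journal and citation thresholds described above can be illustrated with a minimal sketch. It covers only the two numeric cut-offs stated in the text (journals named by at least four participants, plus Nature and Science, and a minimum of four citations); the Boolean search terms are not modeled. The function names, record fields, and toy data are hypothetical and are not taken from the authors' code.

```python
from collections import Counter

MIN_JOURNAL_MENTIONS = 4  # journal must be named by at least four participants
MIN_CITATIONS = 4         # articles with fewer than four citations are removed
ALWAYS_INCLUDED = {"Nature", "Science"}


def select_journals(survey_responses):
    """Return journals named by enough participants, plus Nature and Science.

    survey_responses: one list of journal names per participant
    (a hypothetical input format).
    """
    # set(response) counts each participant at most once per journal
    mentions = Counter(j for response in survey_responses for j in set(response))
    chosen = {j for j, n in mentions.items() if n >= MIN_JOURNAL_MENTIONS}
    return chosen | ALWAYS_INCLUDED


def filter_articles(articles, journals):
    """Keep articles from selected journals that meet the citation threshold."""
    return [a for a in articles
            if a["journal"] in journals and a["citations"] >= MIN_CITATIONS]


# Hypothetical survey responses and article records, for illustration only.
responses = [["Journal A", "Journal B"]] * 4 + [["Journal C"]] * 2
articles = [
    {"title": "kept", "journal": "Journal A", "citations": 12},
    {"title": "too few citations", "journal": "Journal B", "citations": 3},
    {"title": "journal not selected", "journal": "Journal C", "citations": 40},
    {"title": "kept via Nature", "journal": "Nature", "citations": 4},
]
journals = select_journals(responses)
corpus = filter_articles(articles, journals)
```

Counting each participant once per journal is an assumption about how repeated mentions were handled; the text does not specify this.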
To evaluate false positives, two authors (DJB and KMF) independently evaluated the same 100 randomly selected articles and classified them as "disease ecology" or "outside the field." Papers that fell outside the field predominantly described pathogenesis, bacterial communities, or genetics/genomics (Figure S5). Within-host studies were accepted if they focused on population-level processes (Cressler et al., 2014) or parasite manipulation. Seventy-five percent of the articles in the final corpus were classified as disease ecology, and consensus was strong among evaluators (94% agreement, Cohen's κ = 0.84). Within false-positive papers, there was no association between topical and temporal trends (χ² = 72.84, p = .29; Appendix S1).

To evaluate false negatives, we cross-validated our corpus using our survey data. Specifically, we assessed whether articles that were identified by at least two survey participants as influential were present in our corpus. We calculated the proportion of papers that were included in our corpus out of the list of such articles, with the requirement that at least 70% of papers had to be included. Of the influential articles identified by survey participants (written in ≥2 times) restricted to journals used in building the corpus, approximately 71% (50/70) were present in the corpus. The "most influential" articles had a higher probability of being included: the corpus included 85% of articles written in four or more times, 75% of articles written in three times, and 63% of articles written in twice. We adjusted the search and exclusion terms twice using the workflow described in Figure 1 (unfilled arrow) to obtain a corpus with high classification and cross-validation success.
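The two validation checks above can be sketched as follows. The first function computes Cohen's κ for two raters; the second part checks the false-negative inclusion rate against the 70% requirement. The per-article labels are hypothetical: the text does not report the raters' confusion table, so the counts below were simply chosen to be consistent with the reported 94% agreement and κ = 0.84.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters corrected for chance."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of each rater's marginal label frequencies
    expected = sum(freq_a[c] * freq_b[c] for c in set(freq_a) | set(freq_b)) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical labels for 100 articles ("in" = disease ecology, "out" = outside).
# 72 both "in", 22 both "out", 3 + 3 disagreements: an assumed table, not the authors'.
rater_a = ["in"] * 72 + ["out"] * 22 + ["in"] * 3 + ["out"] * 3
rater_b = ["in"] * 72 + ["out"] * 22 + ["out"] * 3 + ["in"] * 3

agreement = sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)
kappa = cohens_kappa(rater_a, rater_b)

# False-negative check: counts are those reported in the text (50 of 70 articles).
REQUIRED_INCLUSION = 0.70
inclusion_rate = 50 / 70
passes_cross_validation = inclusion_rate >= REQUIRED_INCLUSION
```

With these assumed labels, observed agreement is 0.94 and chance agreement is 0.625, which gives κ = (0.94 − 0.625) / (1 − 0.625) = 0.84.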
TABLE 1 Five highest ranking scientific journals (left) and articles (right) based on survey responses. Survey participants were asked to pick journals other than Science or Nature

| Literature analysis
We conducted topic detection on the validated corpus using non-negative matrix factorization. Topic clusters represent a set of co-occurring words that can be used to define an area of research. The number of topic clusters (i) and words per topic (j) were the only parameters imposed on the literature analysis. To select appropriate values for i and j, we ran topic detection for a range of values and combinations of i and j and assessed outputs. If i was too small or too large, we were unable to detect temporal variation in that topic. If j was too small or too large, the topics were not clearly defined. For example, a topic with only five words may not be interpretable; similarly, a topic with 30 words may be too broad to assign meaning. We used i = 15 and j = 15, so our corpus was analyzed for 15 topics with 15 words each. A "topic" therefore describes the frequency of co-occurring words or phrases in the literature corpus; topics could span any number of interests, methods, taxa, or set of words/phrases that emerged from the literature or are of interest to researchers.
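As a rough illustration of how the two parameters i (number of topics) and j (words per topic) enter a non-negative matrix factorization, the sketch below factors a small document-by-word count matrix with the standard multiplicative-update rule. This is an assumed, simplified stand-in written with NumPy, not the authors' pipeline (which the text attributes to the nltk library); the toy abstracts, function names, and iteration count are all hypothetical.

```python
import numpy as np


def build_term_matrix(docs):
    """Return (X, vocab): rows are documents, columns are word counts."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: k for k, w in enumerate(vocab)}
    X = np.zeros((len(docs), len(vocab)))
    for row, d in enumerate(docs):
        for w in d.lower().split():
            X[row, index[w]] += 1.0
    return X, vocab


def nmf(X, n_topics, n_iter=200, seed=0, eps=1e-9):
    """Factor X ~ W @ H with W, H >= 0 using multiplicative updates."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], n_topics)) + eps  # document-by-topic weights
    H = rng.random((n_topics, X.shape[1])) + eps  # topic-by-word weights
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def top_words(H, vocab, n_words):
    """List the n_words highest-weighted vocabulary terms for each topic."""
    return [[vocab[k] for k in np.argsort(row)[::-1][:n_words]] for row in H]


# Hypothetical toy abstracts; the corpus in the text held 18,695 articles.
docs = [
    "malaria mosquito vector transmission",
    "mosquito vector malaria parasite",
    "hiv patient therapy drug",
    "hiv drug treatment patient",
]
X, vocab = build_term_matrix(docs)
W, H = nmf(X, n_topics=2)                # the text reports i = 15 topics
topics = top_words(H, vocab, n_words=3)  # the text reports j = 15 words per topic
```

Each row of H is one topic; choosing n_topics and n_words corresponds to choosing i and j. Real analyses typically also remove stop words and weight terms (e.g., by TF-IDF), which this sketch omits.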
We used K-means clustering from the nltk Python library to construct topics, where each topic comprised 15 commonly co-occurring words. We assigned a name to each topic to describe its theme. For example, we named a topic containing immunodeficiency, HIV, patient, therapy, drug, AIDS, background, treatment, and risk an HIV topic. We gave each topic name a "confidence" measurement of 1-3, from high to low confidence in identifying the topic (Appendix S1: Topic detection). In addition to topics that emerged from the literature, we also generated and assessed our own topic lists based on key research areas: climate change, dilution effect, superspreaders, network analysis, emerging infectious diseases (EIDs), infectious diseases in bats and rodents, chytrid fungus, theoretical modeling, and field experiments (Figure 4; full topic lists are in Appendix S1). To ensure topic trends were not confounded by an increase in the total number of published articles through time, we constructed a baseline topic using neutral words that should be in all disease ecology articles: analysis, study, and paper. We evaluated temporal trends in publications for each theme using generalized additive models (GAMs) fit using the mgcv package in R (Wood, 2006). The proportion of words in each topic relative to all words was modeled as a binomial response using thin-plate splines with shrinkage for publication year. Lastly, to assess covariation among topics, we estimated Spearman's rank correlation coefficients (ρ) at the zero-year lag.

[...] (1.2%, n = 5), master's student (2.9%, n = 12), PhD student (24.5%, n = 100), postdoctoral researcher (21.1%, n = 86), faculty (39.5%, n = 161), researcher (9.1%, n = 37), and other (1.7%, n = 7). Participants identifying as women comprised most of each academic position except master's student and faculty (Table S1). In general, most PhD students and postdoctoral researchers were young and identified as women. Most master's students were young and identified as men, and most faculty were middle-aged and identified as men (Tables S1-S3). Non-binary participants were distributed across age (<50) and position categories (n = 3), as were participants who preferred not to provide a gender identity (n = 3).

| Survey
[...] The least common areas included behavioral ecology, bioinformatics, field and laboratory techniques, movement ecology, virology, landscape ecology, and zoology.
In brief, most participants fell into a few distinguishable categories based on location, study taxa, and research type (Figure 3; see Appendix S1 for more). Of the participants, 87.1% were currently employed or studying at a university (Figure 3a), primarily in the United States (74.7%) or United Kingdom (15.1%, Figure 3c). Most participants studied wildlife hosts, microparasites, and/or vectors and macroparasites (Figure 3e); wildlife-microparasite, ectoparasite/vector-wildlife, and human-vector were the most common co-occurring pairs of study taxa selected by individual participants.
Survey participants were asked to write in scientific journals and articles that they believed were the most influential in disease ecology (Table 1). See Appendix S1 for full lists of both articles and journals.

| Literature search and analyses
We compiled a list of 42 journals that at least four survey participants said were the most important in disease ecology, plus Science and Nature. We searched these 44 journals for relevant articles in the field using the algorithm described above, and our final corpus comprised 18,695 articles. Our validation processes demonstrated that at least 75% of these articles were properly classified, and we did not detect any systematic bias in false-positive articles. Articles span from 1975 to 2018, with most published after 2000, indicating a rapid and considerable expansion of the field since early foundational work in population biology and vector-borne disease (Hess & Hayes, 1970). However, some journals were not available in Web of Science until the 1980s-1990s, so article availability may slightly bias our corpus in the early years.

FIGURE 3 Summary of disease ecology demographics and research topics/types from survey participants. Pie charts display (a) current position or institution type, (b) self-declared race or ethnicity, and (c) country of residence; research topics and types are categorized by (d) field of PhD thesis, (e) current study taxa, and (f) type of primary research

Topic clusters were classified into two categories: (1) those that emerged and were identifiable from the literature and (2) those that we deliberately searched for using key term searches. Of emergent topics, malaria and mosquito-borne pathogens appeared most frequently in the topic clusters (3/15), followed by experimental infection trials (2/15). Other clear topics included HIV, influenza, vaccine research, and host-pathogen coevolution. Some topics were more ambiguous but still identifiable, such as wildlife pathogens, tick-borne pathogens with rodent hosts, and serological analyses.
Overall, we had high confidence assigning names to topic clusters emerging from the literature, indicating defined areas of research in the corpus (see Appendix S1).
Many of the topics that emerged from the disease ecology literature, such as malaria, influenza, and vaccination research and de-

| DISCUSSION
Interdisciplinary research fields can rapidly grow to address important societal needs, and retrospective analysis of their evolution can help improve their future trajectory and growth. By combining a survey with a powerful quantitative literature synthesis, we demonstrate the increasing gender and institutional diversity of disease ecology practitioners alongside the breadth of research activities.
Certain topical themes that emerged from our literature corpus, such as influenza, malaria, and vaccine research and development, have remained prominent foci of disease ecology, whereas an increase in the frequency of publication of a priori selected topics such as emerging infectious diseases, climate change, and effects of biodiversity loss emphasizes how this expanding field has mirrored global events and priorities.
Self-declared disease ecology practitioners are becoming more diverse in terms of country of education, gender identity, and institution (Figures 2 and 3). The gender trends identified here are echoed in engineering, computer science, and mathematics/statistics where the proportion of women earning graduate degrees has increased over the past two decades (20%-43% of master's and doctorates earned in 2014), yet remains low in physics (18.7% of doctorates earned in 2014) (National Science Foundation, 2020).
Women authorship has increased in ecology and evolution literature every year from 2009 to 2015 (Fox et al., 2018), and the proportion of women journal editors also increased over that time but was still low relative to men (Fox et al., 2019). In terms of race/ethnicity, the rate of people identifying as Hispanic earning bachelor's degrees in science and engineering has increased slowly since the 1990s, but remained approximately constant for people identifying as black, African American, or Asian (National Science Foundation, 2020).
Similarly, persons who identify as women and those who do not identify as white remain underrepresented in a prominent ecological organization, the Ecological Society of America; representation has improved for women in this group over the past 30 years, but not for most racial/ethnic minorities (Beck et al., 2014). Therefore, our findings are largely reflected in other scientific and mathematical fields such that gender representation is improving at a more rapid pace than racial/ethnic representation.
Diversity in the workplace and educational institutions is fundamentally important and increases performance, cooperation, problem-solving, and student retention (Drury et al., 2011; Milem, 2003; Roberge & van Dick, 2010). The highest demographic and institutional diversity we identified was in younger age groups (<36 years old), graduate programs, and postdoctoral positions (Tables S2 and S3). This may be due to increasing levels of education globally (Group of Eight, 2013; UNESCO Institute for Statistics, 2020), or targeted programs to increase diversity in science and mathematics, particularly focused on recruiting women (Burke et al., 2007; Huntoon & Lane, 2007). Another non-mutually exclusive driver of these trends could be the failed retention of minorities and women in later career stages (Blickenstaff, 2005; Diekman et al., 2010; Shaw & Stanton, 2012). Yet significantly, although we identified some relative increases in diversity within disease ecology, the field as a whole remains quite homogeneous in terms of gender, race and ethnicity, and geography, and marginalized groups face considerable inequities and discrimination in science fields; for example, they experience harassment and exclusion, are less likely to have grants funded, hold fewer faculty positions, and have limited access to academic experiences and resources (Allen et al., 2000; Jones & Solomon, 2019; Rissler et al., 2020). Concerted efforts to improve equity must continue and explicitly address recruitment and retention, especially for fostering racial and ethnic diversity.
We acknowledge that surveys can be a biased source of information because researchers rely on voluntary participation. For example, studies of academic survey participation have shown that people who identify as women are more likely to respond to surveys, while academic rank (e.g., tenured versus tenure track) had little influence on response rate (Saleh & Bista, 2017; Smith, 2008). Additionally, as our survey was shared via email listservs, it is likely that many people did not see or receive our request for participation. Because the field of disease ecology is relatively new and multidisciplinary, it is more challenging to identify smaller research groups, whether at universities (i.e., individual laboratory groups) or in decentralized working groups (e.g., Bat One Health Research Network; BOHRN). Our survey dissemination and participation likely reflect broader geographic biases in ecology research and publishing (Nuñez et al., 2021), which could subsequently affect the influential literature identified (Table 1); however, we were unable to assess these limitations. While surveys have shortcomings, they remain a widely used method of data collection, and the survey developed here provides the first description of the composition of disease ecologists and important literature, which we hope will be built upon in the future.
The second part of our study comprised an extensive literature synthesis. Literature reviews can be compromised by author bias when search terms are subjectively selected (reviewed in Okoli, 2015). There is a trade-off between scope and error when constructing a literature corpus. For example, narrower ecological literature reviews usually consist of a corpus of <2000 articles, and often far fewer (Han & Ostfeld, 2019; Lowry et al., 2013; Poff & Zimmerman, 2010; Wortley et al., 2013), and papers may be individ- [...] and true positives in our corpus, which is rarely accounted for or reported in ecological literature reviews (Haddaway & Watson, 2016).
False positives are inevitable in such a large body of literature, but the false-positive papers identified were unbiased with respect to year or topic. Our true-positive rate was high: 85% of articles written in by participants four or more times were present in our corpus.
Large-scale quantitative reviews are imperfect and, even with the development and implementation of our robust corpus formulation and validation (Figure 1) [...] (Suh et al., 2020), especially in the context of emerging human pathogens (e.g., Plasmodium knowlesi, Lee et al., 2011).
Mosquito-borne pathogens and influenza have been defining topics over the entire time series, which is reflected in human-vector research being the third most commonly studied disease system by participants; we expect this trend to persist for the foreseeable future. We observed exceptions for theoretical and field experimental approaches to disease ecology. While publications with these approaches have remained constant over time, publication frequency was rare relative to other themes in disease ecology. This could signal that these approaches are relatively uncommon, but we suspect that publications using theoretical modeling or field experiments may not use the same set of co-occurring words, thus making them harder to identify as distinct approaches using topic detection methods.
We also identified broader concept-based trends in disease ecology literature. In particular, the frequency of published research on the dilution effect has undergone several spikes following key findings (Civitello et al., 2015; Keesing et al., 2006) [...] (Andersen et al., 2020).
In general, published research on epidemics tended to lag rather than precede events such that we observed a spike in the frequency of publications on high-profile pathogens followed by a decline or plateau (e.g., chytrid fungus). Emergent topics were remarkably stable through time, with the exception of HIV and host-pathogen coevolution, which have, respectively, decreased and increased. The frequency of published research focusing on concepts (e.g., the dilution effect, superspreaders, coevolution) or approaches (e.g., network analyses) rather than specific hosts or pathogens tended to rise more gradually and remain a notable proportion of the literature. Interestingly, self-declared disease ecologists performed experimental, observational, and computational research equally (Figure 3f); however, computational research such as methodological development (e.g., network analyses), epidemiology, and mathematical modeling was popular in the literature and among survey participants. We suspect that disease ecologists have broad skillsets that intersect multiple types of research, such as performing experiments to calibrate mathematical models, which may be a defining feature of practitioners.
Although our analysis of cross-correlation between the topic frequency time series is associative, we observed several interesting relationships. The frequency of publications on bat disease, chytrid fungus, climate change, the dilution effect, superspreaders, and emerging infectious diseases all increased over time, suggesting a general increase in research on emergent disease risks to wildlife and humans in relation to anthropogenic change and heterogeneities in pathogen transmission (Daszak et al., 2001; Jones et al., 2008; Lloyd-Smith et al., 2005). From another perspective, the frequency of publications on serology has steadily declined over time, in contrast to topics such as bat disease and emerging infectious diseases.
This may indicate advances in rapid and affordable sequencing efforts to quantify pathogen diversity (Lipkin, 2013). These associations provide testable hypotheses for a future analysis that examines the combination of concepts and methods in published literature.
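The kind of zero-lag association discussed here can be sketched in a few lines. The example converts yearly topic word counts into proportions of all words (the response modeled in the Methods) and then computes Spearman's ρ between two topics in the same years. All counts are invented for illustration, the helper names are hypothetical, and the GAM smoothing step is not reproduced.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based positions pos+1..end+1
        for k in order[pos:end + 1]:
            ranks[k] = shared
        pos = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks (zero-year lag)."""
    return pearson(average_ranks(x), average_ranks(y))


def topic_proportions(topic_counts, total_counts):
    """Share of all words in each year that belong to the topic."""
    return [t / n for t, n in zip(topic_counts, total_counts)]


# Invented yearly word counts for five consecutive years (illustration only).
total_words = [1000, 1200, 1500, 2000, 2600]
bat_words = [2, 4, 9, 16, 26]
serology_words = [30, 30, 30, 32, 39]

rho_proportions = spearman(topic_proportions(bat_words, total_words),
                           topic_proportions(serology_words, total_words))
rho_raw_counts = spearman(bat_words, serology_words)
```

In this toy data the raw counts of both topics rise because the literature as a whole grows, so their rank correlation is positive, whereas the proportions move in opposite directions. This is why trends are expressed relative to all words rather than as raw counts.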
Using a survey and quantitative literature synthesis, this study demonstrates that disease ecology is a rapidly growing field, albeit one that will require continued efforts to enhance recruitment and retention to improve diversity. Our analysis identified trends and publication patterns of research topics addressed by disease ecologists. More broadly, our quantitative synthesis framework could help examine the composition and trends of other major research topics that cross traditional disciplinary boundaries.

ACKNOWLEDGMENTS
Our survey was considered exempt under the Pennsylvania State University Institutional Review Board (STUDY00010582). We thank all participants who completed the online survey and are grateful for helpful feedback from participants at the 2019 Ecology and Evolution of Infectious Diseases conference. We are also grateful to Vanessa Ezenwa and Peter Hudson for constructive comments on an earlier version of this report.

CONFLICT OF INTEREST
The authors declare no conflicts of interest.

DATA AVAILABILITY STATEMENT
Additional data and code files are available on the Dryad Digital Repository: https://doi.org/10.5061/dryad.c2fqz619f.