![]() | ![]() |
Formats:
|
||||||||||||||||||
Copyright Davies et al. This is an open-access article distributed under the
terms of the Creative Commons Attribution License, which permits unrestricted use,
distribution, and reproduction in any medium, provided the original author and
source are credited. Classification of the Universe of Immune Epitope Literature:
Representation and Knowledge Gaps La Jolla Institute for Allergy and Immunology, Vaccine Discovery, La
Jolla, California, United States of America Lisa F. P. Ng, Editor Singapore Immunology Network, Singapore * E-mail: kvaughan/at/liai.org Conceived and designed the experiments: AS. Performed the experiments: VD RD.
Analyzed the data: VD KCV AS. Contributed reagents/materials/analysis tools:
BP. Wrote the paper: VD KCV AS. Contributed to the writing of the paper: KCV
VD. Received June 5, 2009; Accepted August 1, 2009. Abstract Background A significant fraction of the more than 18 million scientific articles
currently indexed in the PubMed database are related to immune responses to
various agents, including infectious microbes, autoantigens, allergens,
transplants, cancer antigens and others. The Immune Epitope Database (IEDB)
is an online repository that catalogs immune epitope reactivity data derived
from articles listed in the National Library of Medicine PubMed database.
The IEDB is maintained and continually updated by monitoring PubMed for new,
potentially relevant references. Methodology Herein we detail the classification of all epitope-specific literature in
over 100 different immunological domains representing Infectious Diseases
and Microbes, Autoimmunity, Allergy, Transplantation and Cancer. The
relative number of references in each category reflects past and present
areas of research on immune reactivities. In addition to describing the
overall landscape of data distribution, this particular characterization of
the epitope reference data also allows for the exploration of possible
correlations with global disease morbidity and mortality data. Conclusions/Significance While in most cases diseases associated with high morbidity and mortality
rates were amongst the most studied, a number of high impact diseases such
as dengue, Schistosoma, HSV-2, B. pertussis and Chlamydia trachoma, were
found to have very little coverage. The data analyzed in this fashion
represents the first estimate of how reported immunological data corresponds
to disease-related morbidity and mortality, and confirms significant
discrepancies in the overall research foci versus disease burden, thus
identifying important gaps to be pursued by future research. These findings
may also provide a justification for redirecting a portion of research funds
into some of the underfunded, critical disease areas. Introduction More than 18 million scientific articles are currently indexed in the PubMed
database. A significant number of these relate to epitopes associated with immune responses to
various agents, including infectious microbes, autoantigens, allergens, transplants,
cancer antigens and others. The relative number of references in each category
reflects the current and past research focus on immune reactivities. Here we analyze
the data from these different domains to evaluate overall epitope data coverage
therein, and highlight the strengths and weakness in our overall knowledge base. The Immune Epitope Database (IEDB) is an online repository of manually curated data,
which catalogs immune epitope reactivity data [www.immuneepitope.org]. The PubMed database is constantly
queried for potentially relevant references. To capture the highest possible
fraction of references describing immune epitope data, these queries are
intentionally broad. To date more than 21,000 references of potential interest
(>1% of PubMed) have been identified for inclusion into the IEDB.
Their potential relevance is determined first by an automated text classifier based
on the information contained within their abstract, title and journal title [1], and then
manually by subject matter experts. The criteria for passing this initial selection
process require that i) the reference contains experimental data
describing adaptive immune responses, ii) the data is original (for
example, review papers and use of epitopes as a mere marker or tag are excluded),
and iii) the epitope molecular structure is described in sufficient
detail [2].
Following these preliminary determinations, a systematic review of the paper is
performed, and if the study is still deemed to contain curatable information, the
data are extracted, entered into the database and the curated record then becomes
available to the public. Results A broad-net strategy to capture epitope-related literature Currently, four different immunological domains are recognized and prioritized
for curation within the IEDB. As first priority, Infectious Diseases and
Microbes have been targeted, followed by Allergy, Autoimmunity and
Transplantation. Accordingly, the curation of all references from Allergy and
Infectious Diseases is nearly complete (>80% and
90%, respectively). In order to enhance the curation prioritization
process, we recently further categorized all potential references into six main
immunological domains, or classes. These classes include the four original
domains: Infectious Disease and Microbes (excluding HIV), Allergy, Autoimmunity
and Transplantation, plus HIV (including SIV and related viruses) and Cancer.
References containing immune epitope data, but not pertinent to any of the main
classes were categorized as “Other”. This detailed
classification allows not only for enhanced prioritization, but also enables a
more accurate/thorough description and accounting of the IEDB reference data. The relative number of references classified in each of these main classes is
shown in Figure 1
Next, Autoimmunity and “Other” references make up nearly
20% each. Cancer references represent nearly 12% of the
references, and finally, the least references are available for the Allergies
and Transplantation classes (less than 5% each). Each main class was
next parsed in a finer set of categories. A detailed breakdown of the pathogens associated with Infectious Diseases/and
Microbes references Using the established NCBI taxonomy, eight major categories were defined within
the Infectious Diseases and Microbes class: 1) negative strand RNA viruses, 2)
positive strand RNA viruses, 3) retro-transcribing viruses
[Retroviridae, Hepadnaviridae], 4) double-stranded DNA (dsDNA)
viruses, 5) Other viruses, 6) Actinobacteria and Proteobacteria 7) Other
bacteria, and 8) eukaryotic organisms. Each of these categories corresponds to
different phylogenetic groupings designed to encompass approximately all related
references, with the exception of the “Other viruses”
category, which contains approximately 109 references (Figure 2a
The overall greatest representation was from viral pathogens (59.2%),
followed by bacterial pathogens (22.4%) and parasites
(18.4%). Looking at the individual categories, the majority of
references within negative-strand RNA viruses, are derived from influenza
viruses (52%), followed by viruses within the Paramyxoviridae family
(RSV, Measles, Mumps; 25.5%) [Table
S1]. Within positive-strand RNA viruses, the hepatitis C virus
(HCV) subcategory accounts for about 34.1% of the references,
followed by Picornaviruses (poliovirus, coxsackievirus, foot and mouth disease
virus; 26.3%). The Flaviviridae (West Nile, yellow fever virus and
dengue virus) and Nidovirales (SARS and other coronaviruses) families are less
well represented, representing 8.5 and 13.2%, respectively. In the
retro-transcribing virus category, excluding Lentiviruses (HIV, SIV) which are
separately classified by another group in the HIV Los Alamos database
[www.hiv.lanl.gov], the delta retrovirus (HTLV) and
hepatitis B virus (HBV) represent the bulk of the category, with
22.1% and 53.5%, respectively. In the category of dsDNA
viruses, Herpesviridae are by far the most represented. Specifically, there are
185, 208 and 233 references each for alpha, beta and gamma herpesviruses,
respectively, accounting for 57% of the category. Another 315
references are related to human papilloma virus (HPV). Finally, relatively few
(45) references are related to poxviruses. Within the Actinobacteria/Proteobacteria category, references related to
Mycobacteria are by far the most numerous (39.6%), followed by
Enterobacteria (E.coli, Salmonella, Yersinia, Shigella and Proteus). Fewer
references are related to other classes containing important human pathogens,
such as bacteria in the genera Vibrio, Haemophilus, Pseudomonas, Anaplasmas,
Neisseria and Bordetella. Likewise, in the category of Firmicutes/Other
Bacteria, well-represented categories include Listeria, Streptococci and
Clostridiales. References describing epitopes from Staphylococcus, Chlamydia,
Spirochaetes and other bacteria are less well represented. Finally, in terms of
parasitic eukaryotic organisms, the most references by far are represented as
relating to Plasmodium [malaria] (48.1%); followed
by Eukaryotic invertebrates such as Nematodes, Platyhelminthes, and Schistosomas
(18.8%). Fewer references are represented by other eukaryotes in this
category, such as Entamoeba, Theliera and Babesia. A classification of Autoimmunity references based on a combination of
associated disease and autoantigens Epitope-specific references that were broadly related to autoimmunity were
classified in seven categories, based on the specific type of autoimmune
manifestations: 1) diabetes 2) rheumatoid arthritis, 3) multiple sclerosis, 4)
lupus, 5) myasthenia gravis, 6) beta amyloid reactions/Alzheimer's
disease, and 7) other autoimmune diseases. A pie chart representing the relative
distribution of these categories is shown in Figure 2b Within the MS references, those relating to myelin oligodendrocyte glycoprotein
(MOG), proteolipid protein (PLP) and myelin basic protein (MBP) in aggregate
represent 84% of the references, and that references presenting data
related to other target antigens only encompass ~16% of the
total. More specifically, three subcategories relate to myelin basic protein
(MBP), and these collectively encompass ~68% of the MS
related references. References relating to the most well-characterized
epitopes/regions (MBP 78, MBP 1–9 and PLP 139–151)
out-numbered references describing all other epitopes derived from the same two
MS target antigens. We have chosen to classify these references separately, and
only a few representative references might be curated in the IEDB for each
epitope; references presenting largely redundant data may be excluded or
considered of lower priority. A different picture emerges in the case of lupus-associated references, where a
myriad of different targets are reported. Here, epitopes are derived from
antigens of different chemical types (proteins, lipids and DNA). With the
exception of antiphospolipid and anti-cardiolipin associated references, which
represent ~44% of the references, no other class of antigens
appear to be greatly over-represented in comparison to the others. Within diabetes references, two subcategories encompass the majority of the
references, namely those related to proinsulin/insulin and glutamic acid
decarboxilase (GAD), respectively. Similarly, in the case of rheumatoid
arthritis, eight different reference subcategories were defined, with most
references belonging to collagen. The next most represented subcategories were
citrullinated epitopes, heat shock proteins and rheumatoid factors/antibodies. Finally in the broad class of ‘other’ autoimmune references,
nine main subcategories were defined ranging from reactions against reproductive
antigens (investigated as potential contraceptive measures), anti-interferon and
anti-von Willebrand factor reactions (mostly associated with reactivity
resulting from treatment with protein therapeutics) and organ-specific
autoimmune manifestations (thyroid, liver, and uveitis). This class constitutes
a large percentage of the total autoimmune class, representing some 1248
references. The Classification of Allergy references Allergy references were classified in 3 main categories based on the source of
the allergen itself: plants (288 references), animals and fungi (217
references), and other (458 references). A pie chart representing the relative
distribution of these categories is shown in Figure 2c Within the Plant category, we further distinguish eight main subcategories.
Approximately one quarter of these references are related to gluten and celiac
disease. The second most populated subcategories are those related to Betulaceae (the
birch family), Cupressaceae (the cypress and cedar family) and Poaceae (mostly
timothy grass). Less frequently populated are the subcategories corresponding to
latex and to the Fabaceae family (soybean and peas), as well as ‘other
trees’ and ‘other flowering plants.’ The animal and fungi category is further broken down in various subcategories,
corresponding to allergies to arachnids (mites and ticks) 29.5%, and
insects 18.4%, and fungi 12%. A significant number of
references are also identified corresponding to allergens derived from
vertebrates, such as animal dander or food products (i.e. milk). Here, mammals
and birds account for 26.7% and 6%, respectively, of the
references in the Eukaryote (Non-plant) category. Finally, within the ‘other allergens’ category, most
references relate to small, non-peptidic experimental allergens involved in
hypersensitivity reactions and/or commonly utilized as model small molecule
allergens (haptens). Of these, nearly half involve dinitrophenol (DNP) and
related molecules, and a much smaller fraction relates to metals (nickel,
beryllium and others). Other haptens encompass the remaining 40% or
so of the references. The classification of Alloantigen and Transplantation references The main class of Transplantation/allorecognition represents the smallest class
and a fairly heterogeneous set of references. A relative large fraction of these
references is related to galactose (15.1%), an important determinant
recognized by rejection-associated antibodies. Other related categories included
xenotranplantation (10.1%) and blood groups (8%). Also
very prominent is the category of references related to MHC molecules
(18.1%), which represent important targets, either as a whole
protein, or as a source of allorecognized peptides. Other allopeptides include
the minor histocompatibility antigen (HY protein) category (4.3%),
generic allopeptides (20%), and minor antigens (9.7%).
Finally, two additional categories represent epitopes associated with specific
disease settings such as thrombocytopenia and graft-versus-host disease (graft
rejection). A pie chart representing the relative distribution of these
categories is shown in Figure
2d The ‘other’ references To avoid duplication of ongoing efforts at the HIV Los Alamos database
[www.hiv.lanl.gov], references related to HIV, SIV and
other Lentiviruses are presently not included within the scope of the IEDB.
Likewise, cancer references involving specific immune epitopes are not currently
considered within NIAID's priority, and accordingly, their
categorization is not described herein. The final main class,
“Other,” was designed as a catchall for those references not
conforming to any of the above organism- or disease-based groupings. However,
perusal of the references within this class helps complete the picture of
molecular targets related to adaptive immune responses. These 3,765 references were further sub-dived in eight categories (Table 1). The most numerous
categories were related to non-disease-related, non-peptidic antigens, such as
the DNP hapten (7.8%), carbohydrate epitopes (2.5%),
gangliosides (5.1%), and other small molecular/haptens
(24%). References related to the definition of epitopes recognized by
monoclonal antibodies (20.8%), and by other non-monoclonal B cell
responses (8.8%) are also rather numerous. Additional categories
capture model antigens, like cytochrome C (Cyt C), hen egg white lysozyme (HEL)
and class II-associated invariant chain peptides (CLIP). A large number of
papers also exist in the scientific literature relating to ovalbumin (OVA)
epitopes, specifically OVA 257–264 (SIINFEKL) and OVA
323–339 (ISQAVHAAHAEINEAGR). Representative references for these two
epitopes have been, or are in the process of being captured. In most other
instances these epitopes are utilized as ‘tags’ and thereby
are likely to be excluded in the database. Finally, other categories in this
class relate to naturally processed ligands eluted from MHC molecules,
(5.8%), epitopes defined by X-ray crystallography, NMR structures
(3.3%), and definition of MHC binding motifs (4.6%).
Time course of reference deposition Having completed the categorization of all references, we investigated the time
course of publication. Figure
3a
In general, all categories follow a similar trend, with the exception of the
‘other’ class, which appears to start accumulating
references sooner than the others, but also appears to level off and even
decrease in rate in more recent times. To investigate the reason for this
profile, we plotted the rate of publication of each of the subcategory, as shown
in Figure 3b Correlation with societal impact The categorization of immune epitope data presented herein relates to domains
such as Infectious Diseases and Microbes, Autoimmunity, Allergy and
Transplantation. It may therefore be taken as representative of the degree to
which the molecular targets associated with these diseases have been scrutinized
and defined. An immune epitope is defined by the IEDB as the molecular structure
interacting with receptors of the immune system (T cell and B cell/antibody).
Accordingly this definition excludes, for example, structures involved in
lectin-carbohydrate interactions, or structures recognized by receptors involved
in innate immunity, such as NK cells. Immune epitopes are defined in the
literature with varying level of resolution. In some cases the minimal/optimal
residues of a T or B cell determinant are exactly defined, while in other cases
only certain key components are mapped. Finally, in certain instances broad
regions are pinpointed as containing the epitope, but the exact boundaries and
components are not defined. No structures >50aa are considered for
inclusion in the IEDB. Because it is difficult to adequately compare the
societal impact (i.e. disease burden in morbidity and mortality) of vastly
different diseases categories such as Allergies, Cancer, Autoimmunity or
Infectious Diseases, we have elected to investigate the overall impact of
disease burden within only the infectious and autoimmune disease classes. For
this purpose, we examined global morbidity (prevalence) and mortality data for a
representative list of pathogen/diseases, and then compared these data to the
number of references in each category. A total of thirty infectious diseases were selected based on the most prevalent
disease categories for which sound epidemiological data existed. Table 2 shows the ranking of
each pathogen/disease according to the total number of immune epitope references
and the estimated morbidity (prevalence) and mortality for each. The ten
pathogens associated with the most references are highlighted in gray; the
numbers in bold highlight the top ten pathogens/diseases with respect to
prevalence and mortality. The ranking of disease by prevalence or mortality is
also indicated in parentheses (1–10). The top ten diseases by overall
reference abundance are HIV/AIDS, influenza, Malaria, HCV, HBV, HPV, TB, group A
Streptococcus, Measles and RSV.
In general, the majority of pathogens listed in the top ten most prevalent in
mortality have more than 100 references each, and represents diseases
responsible for morbidity in hundreds of millions of people per year. While it
is perhaps not surprising that the epitope reference coverage would be high for
prominent infectious diseases [3], the length of the list of diseases for which
coverage was poor (<80 references) is surprising, as it includes five
very high impact diseases (dengue, Schistosoma, HSV-2, B.
pertussis and Chlamydia trachoma). Indeed, B.
pertussis stands out prominently, with some 60 million people
infected per year and an estimated 400,000 deaths, and a mere 16 references. Figure 4
In the past [4], we have also investigated overall epitope
coverage at the genomic level by calculating the total number of proteins from
which epitopes have been derived and the percent of the genome by antigen
targeted for epitope identification. Applying this approach to the current
analysis, we calculated coverage for high relevance/low reference coverage
pathogens and compare these to high profile pathogens (Table 3). In the majority of cases, pathogens
well-represented in the literature have greater epitope coverage, whereas those
less well-represented have lower epitope coverage. While the epitope coverage
was greater for high profile pathogens like TB and Malaria, the actual
percentage itself was rather low (ex. 7% for TB and 0.9%
for Malaria). Furthermore, high overall coverage does not necessarily translate
in well-balanced coverage. In many cases, the majority of the identified
epitopes come from very few antigens. This phenomenon was observed and
previously discussed in detail for high profile pathogens such as P.
falciparum, TB and influenza [4]–[6].
These results further emphasize how the current epitope literature is far from
complete and how numerous gaps and opportunities for further investigation still
exist.
Discussion The data presented herein for the first time allows for a glimpse of the balance of
references contained in the scientific literature, related to epitope data and to
all associated immunological domains, including Infectious Diseases and Microbes,
Autoimmunity, Allergy, Transplantation and Cancer. Several conclusions emerged from
the analysis of the data gathered so far. First, it is of interest to note the relative distribution of the references among
the domains, or classes. By far the majority of references are related to Infectious
Diseases and Autoimmunity. Cancer is a distant third, and relatively fewer
references are available for Transplantation and allergies. It is unclear at this
stage whether this reflects a differential focus of the scientific community, or
inherent difficulties in defining the molecular targets (epitopes) recognized by
adaptive immune responses in those settings. Interestingly, within the Infectious
Disease class, we found that references representing viral pathogens outnumber
bacterial and parasitic pathogens by about 3 to 1. This is likely due to biological
factors relating to genome size and/or antigenic complexity. The autoimmune class is
dominated by MS references and, not surprisingly, the Allergy class shows a majority
of references defining plant allergens. References related to Transplantation are
mostly focused on allo-antigens/major histocompatibility complexes. In addition to describing the overall distribution of data within the six classes,
this characterization of the epitope reference data has also allowed us to explore
possible correlations with global morbidity and mortality. Here, we investigated the
relationship between epitope coverage and overall disease burden by comparing the
top ten pathogens/diseases in terms of the total number of references with the total
number of cases and deaths per year worldwide. This analysis showed that while most
of the high impact infectious diseases (in terms of prevalence and deaths) were
well-represented in the top ten, several very significant diseases were represented
by far fewer references, thus highlighting a significant imbalance in the coverage.
Examples of these high relevance/low coverage infectious diseases are dengue,
Schistosoma, HSV-2, B. pertussis and Chlamydia
trachoma. Similar observations were made for diseases within the autoimmune
domain, where the relatively lower impact diseases MS and lupus have much greater
epitope coverage than type 1 diabetes and RA. These gaps are even more significant
in light of our recent findings following in-depth meta-analyses of some of the most
well studied diseases, such as influenza, TB and malaria [4]–[7],
which revealed significant gaps and imbalances within the immunological knowledge
associated with each of these pathogens. A sub-analysis of several representative infectious diseases undertaken to examine
epitope coverage at the genomic level showed similar results. In the majority of
cases, pathogens well-represented in the literature had greater epitope coverage, as
measured by the number of ORFs targeted for epitope identification, whereas those
less well-represented had lower epitope coverage. However, while the epitope
coverage was greater for high profile pathogens like TB and Malaria, the actual
percentage itself was rather low. This overall lack of coverage for these pathogens
is likely due to a combination of factors, including genome size and organism
complexity. Indeed, the genome of M. tuberculosis is ~4,000
ORFs, and the etiological agent of malaria, P. falciparum, has an
even larger genome and a complex 3-stage life cycle. In addition, we found that good
coverage did not necessarily translate in well-balanced coverage, as in many
instances, the bulk of epitope identification focuses on just a few of the total
antigens. This observation has been made previously by our group, the best example
of which is malaria. Here we found that the vast majority of epitope identification
was focused on the circumsporozoite surface protein (CSP), despite the
5,000+ ORFs in existence [6]. These results have relevance to the ongoing debate as to whether the direction of
biomedical research accurately reflects the reality of the global disease burden
[8]. A
recent report [9] analyzing the global burden of major infectious
diseases indicated that R&D funders have focused overwhelmingly on TB,
Malaria and HIV/AIDS, while other diseases with an even bigger burden get little
R&D attention. Indeed, numerous studies have already highlighted this trend
[10],
[11]. According to another report from an international
research foundation [12], less than 10% of the world's
research budget is spent on conditions that account for 90% of global
disease. Our data represents the first estimate of how knowledge in terms of
immunological data, rather than R&D funding corresponds to disease morbidity
and mortality, and confirms significant discrepancies in the overall research focus
versus disease burden, thus identifying important gaps to be pursued by future
research. One important issue that became apparent upon embarking in the classification of the
overall “universe” of epitope-specific references is that not
all classification schema are applicable to the various broad domains of
immunological literature. In the case of infectious diseases, we utilized the NCBI
taxonomy as a guide in the categorization. However, in the case of autoimmunity, the classification is not related to the
organism from which the epitope is derived, but rather from a broad classification
of the associated autoimmune disease manifestation. Furthermore, within each
autoimmune category, sub-categories were mostly organized on the basis of the
protein or molecule recognized. This approach was taken to reflect the classification utilized by most scientists
operating in this field. The classification of allergy references also required
development of a similar approach, because a classification based on the organism
source of the allergen would not conform to the established classification in use in
the allergy community. Here we were able to rely on the backbone of the newly
formalized immunological ontology to help establish meaningful categories and
sub-categories. Finally, we would like to point out how the current data is by necessity preliminary,
as the accuracy of the classification of the references is dependent on the stage of
the curation process. For example, in our experience about 30% of the
references deemed potentially acceptable on the basis of the abstract, are
eventually disqualified from curation because they fail to meet all inclusion
criteria. Thus approximately 70% of targeted references become
incorporated into the database. We would also like to underscore that the work
presented herein is tightly linked to other efforts currently ongoing with the IEDB.
Specifically, the present work benefits from the development of formal ontologies
and classification of immune epitope data [13], and at the same
time informs further development of ontologies and controlled vocabularies, as new
reference categories are curated. Most importantly, the current work has crucially
relied on the development of automated text classifiers [1], and in turn the present
data can be utilized to further develop new methods of automated text classification
and characterization. Materials and Methods Worldwide morbidity and mortality figures were obtained, in large part, from the WHO
using their publically available data. The WHO draws on a wide range of data sources
to quantify global and regional effects of diseases, the details of which are not
discussed here, however are available on their websites [Tables S5 and
S6] (www.who.int). When statistics were not available through the WHO, we
used the peer-reviewed literature and/or personal communication with experts in the
respective fields. Further validation was sought for these figures by contacting
subject matter experts (SMEs) to confirm the accuracy of estimates and timeframes.
Because morbidity and mortality data for individual pathogens/diseases are acquired
at different yearly intervals, data derived from a single year could not be
determined for all of the pathogens included herein. The majority of the figures
were available for years 2004–2008, however, some figures were older than
2004 and have been noted. Accurate and reliable burden of disease estimates are
currently difficult to assess as a result of inconsistent data collection and/or
reporting, especially in the developing world. Therefore, the majority of the data
shown presented herein are likely underestimates. The majority of the prevalence
data represents case estimates by the WHO, unless otherwise noted as actual reported
cases. To accomplish this sub-analysis, we first assembled a consolidated list of 30
pathogens to target from all the infectious disease categories described in Table 1. These were selected
based on the most prevalent disease categories for which sound epidemiological data
existed. For example, within the broader categories such as single stranded,
positive sense RNA viruses, we selected prominent pathogens such as measles virus,
RSV, Mumps, rabies virus, Ebola virus and all influenza viruses as representative of
the group. Preliminary prevalence and mortality figures were then assigned to each
pathogen using the WHO web page (WHO fact sheets). In order to verify that our data
corresponds with what is currently accepted within the scientific community a list
of subject matter experts (SME) was generated using the ISI web of knowledge site
[apps.isiknowledge.com]. Under the Web of Science tab, the terms
“morbidity or prevalence or mortality and disease x” were
applied in order to flag the top 5 experts within each disease-specific field based
on their total number of publications. Subject matter experts were then formally
contacted by email to seek independent validation of the preliminary figures.
Prevalence and mortality numbers were then adjusted, if warranted, to those provided
by the SMEs. On average, three authors were emailed for each disease in order to
obtain validation. Diseases, for which reliable figures could not be found and/or
were not available through the SMEs, were excluded from this part of the
analysis. Table S1 This represents a summary of each major infectious disease category,
excluding HIV, showing subtype designations to the specificity of the source
organism's genus. There are 6,567 infectious disease references,
which are distributed across the main infectious disease categories. The
percentage column indicates each category as a percent of the total amount
of infectious disease references. (0.09 MB DOC) Click here for additional data file.(90K, doc) Table S2 This represents a summary of the various autoimmune main categories, as well
as their specific subcategory designations. These subcategories were
organized on the basis of the actual protein or molecular structure
recognized by immune responses. The percentage column indicates each
category as a percent of the total amount of autoimmune references. (0.08 MB DOC) Click here for additional data file.(83K, doc) Table S3 There are three main allergy categories and these were further classified
into subcategories. The main plants category contains trees, plants and
grasses. The non-plant eukaryotes contain insects, mammals, birds,
invertebrates and fungi. The other allergens mainly consist of low molecular
weight, non-peptidic chemicals and haptens, as well as metals. The
percentage column indicates each category as a percent of the total amount
of allergy references. (0.05 MB DOC) Click here for additional data file.(47K, doc) Table S4 Table S4 describes the breakdown of the various alloantigen/transplant
subcategories. There are a total of 701 transplant-related journal
publications, constituting the lowest represented class. Each subcategory is
presented as a percentage of the total. (0.03 MB DOC) Click here for additional data file.(33K, doc) Table S5 Worldwide morbidity and mortality figures were obtained, in large part, from
the WHO using their publically available data. The WHO draws on a wide range
of data sources to quantify global and regional effects of diseases, the
details of which are not discussed here, however are available on their
websites. When statistics were not available through the WHO, we used the
peer-reviewed literature and/or personal communication with experts in the
respective fields. Morbidity and mortality data for individual
pathogens/diseases are acquired at different yearly intervals, data derived
from a single year could not be determined for all of the pathogens included
herein. The majority of the figures were available for years
2004–2008, however, some figures were older than 2004 and have
been noted. (0.06 MB DOC) Click here for additional data file.(60K, doc) Table S6 Worldwide morbidity and mortality figures were obtained, in large part, from
the WHO using their publically available data. The WHO draws on a wide range
of data sources to quantify global and regional effects of diseases, the
details of which are not discussed here, however are available on their
websites. When statistics were not available through the WHO, we used the
peer-reviewed literature and/or personal communication with experts in the
respective fields. Morbidity and mortality data for individual
pathogens/diseases are acquired at different yearly intervals, data derived
from a single year could not be determined for all of the pathogens included
herein. The majority of the figures were available for years
2004–2008, however, some figures were older than 2004 and have
been noted. (0.04 MB DOC) Click here for additional data file.(38K, doc) Acknowledgments We gratefully acknowledge the helpful contributions of the following subject matter
experts for their validation of morbidity and mortality statistics: Douglas Fleming,
Peter Aaby, Robert M. Jacobsen, John Rose, Charles Rupprecht, Lone Simonsen,
Stanislas Pol, P. Cacoub, Duane J. Gubler, M.S. Diamond, Eduardo Massad, Bruce
Aylward, Oliver Rosenbauer, Yun-Fan Liaw, Anna Wald, Martien Borgdorff, Jan
Holmgren, Ray Borrow, Charlotte Gaydos, Victor Nizet, Tom Quinn, Juerg Utzinger,
Robert Snow, Ricardo Gurtler, Gradoni Luigi and Vincent Soriano. Footnotes Competing Interests: The authors have declared that no competing interests exist. Funding: The La Jolla Institute of Allergy and Immunology is supported by NIH/NIAID,
contract number: HHSN2662004000 0 6C, under the Immune Epitope Database and
Analysis Program. The funders had no role in study design, data collection and
analysis, decision to publish, or preparation of the manuscript. References 1. Wang P, Morgan AA, Zhang Q, Sette A, Peters B. Automating document classification for the Immune Epitope
Database. BMC Bioinformatics. 2007;8:269. [PubMed] 2. Vita R, Vaughan K, Zarebski L, Salimi N, Fleri W, et al. High-throughput curation of complex, context-dependent
immunological data. BMC Bioinformatics. 2006;7:341. [PubMed] 3. WHO: www.who.int/mediacentre/factsheets/fs310/en/index.html.
‘Top Ten causes of death, 2004.’. 4. Blythe M, Zhang Q, Vaughan K, de Castro R, Jr, Salimi N, et al. An analysis of the epitope knowledge related to Mycobacteria. Immunome Res. 2007;3:10. [PubMed] 5. Bui HH, Peters B, Assarsson E, Mbawuike I, Sette A. Ab and T cell epitopes of influenza A virus, knowledge and
opportunities. Proc Natl Acad Sci U S A. 2007;104:246–51. [PubMed] 6. Vaughan K, Blythe M, Greenbaum J, Zhang Q, Peters B, et al. Meta-Analysis of immune epitope data for all Plasmodia: overview
and applications for malarial immunobiology and vaccine-related issues. Parasite Immunol. 2009;32:78–97. [PubMed] 7. Assarsson E, Bui HH, Sidney J, Zhang Q, Glenn J, et al. Immunomic analysis of the repertoire of T-cell specificities for
influenza A virus in humans. J Virol. 2008;82:1224–1251. 8. Flory JH, Kitcher P. Global health and the scientific research agenda. Philos Public Aff. 2004;32:36–65. [PubMed] 9. Enserink M. Some neglected disease are more neglected than others. Science. 2009;323:700. [PubMed] 10. McKenna MT, Zohrabian A. U.S. Burden of disease – Past, present and future. Ann Epidemiol. 2009;19:212–219. [PubMed] 11. Stuckler D, King L, Robinson H, McKee M. WHO's budgetary allocations and burden of disease. Lancet. 2008;372:1563–1569. [PubMed] 12. Research does not reflect global disease burden. Western J Med. 2000;173:9. 13. Sathiamurthy M, Peters B, Bui H-H, Sidney J, Mokili J, et al. An ontology for immune epitopes: application to the design of a
broad scope database of immune reactivities. Immunome Res. 2005;1:2. [PubMed] |
PubMed related articles
Your browsing activity is empty. Activity recording is turned off. |
|||||||||||||||||
BMC Bioinformatics. 2007 Jul 26; 8():269.
[BMC Bioinformatics. 2007]BMC Bioinformatics. 2006 Jul 12; 7():341.
[BMC Bioinformatics. 2006]Immunome Res. 2007 Dec 14; 3():10.
[Immunome Res. 2007]Parasite Immunol. 2009 Feb; 31(2):78-97.
[Parasite Immunol. 2009]Immunome Res. 2007 Dec 14; 3():10.
[Immunome Res. 2007]Parasite Immunol. 2009 Feb; 31(2):78-97.
[Parasite Immunol. 2009]Philos Public Aff. 2004 Winter; 32(1):36-65.
[Philos Public Aff. 2004]Science. 2009 Feb 6; 323(5915):700.
[Science. 2009]Ann Epidemiol. 2009 Mar; 19(3):212-9.
[Ann Epidemiol. 2009]Lancet. 2008 Nov 1; 372(9649):1563-9.
[Lancet. 2008]Immunome Res. 2005 Sep 20; 1(1):2.
[Immunome Res. 2005]BMC Bioinformatics. 2007 Jul 26; 8():269.
[BMC Bioinformatics. 2007]