BMC Genomics. 2011; 12: 486.
Published online Oct 5, 2011. doi:  10.1186/1471-2164-12-486
PMCID: PMC3217955

New resources for functional analysis of omics data for the genus Aspergillus

Abstract

Background

Detailed and comprehensive genome annotation can be considered a prerequisite for effective analysis and interpretation of omics data. As such, Gene Ontology (GO) annotation has become a well accepted framework for functional annotation. The genus Aspergillus comprises fungal species that are important model organisms, plant and human pathogens as well as industrial workhorses. However, GO annotation based on both computational predictions and extended manual curation has so far only been available for one of its species, namely A. nidulans.

Results

Based on protein homology, we mapped 97% of the 3,498 GO annotated A. nidulans genes to at least one of seven other Aspergillus species: A. niger, A. fumigatus, A. flavus, A. clavatus, A. terreus, A. oryzae and Neosartorya fischeri. GO annotation files compatible with diverse publicly available tools have been generated and deposited online. To further improve their accessibility, we developed a web application for GO enrichment analysis named FetGOat and integrated GO annotations for all Aspergillus species with public genome sequences. Both the annotation files and the web application FetGOat are accessible via the Broad Institute's website (http://www.broadinstitute.org/fetgoat/index.html). To demonstrate the value of those new resources for functional analysis of omics data for the genus Aspergillus, we performed two case studies analyzing microarray data recently published for A. nidulans, A. niger and A. oryzae.

Conclusions

We mapped the A. nidulans GO annotation to seven other Aspergilli. By depositing the newly mapped GO annotation online as well as integrating it into the web tool FetGOat, we provide new, valuable and easily accessible resources for omics data analysis and interpretation for the genus Aspergillus. Furthermore, we have given a general example of how a well annotated genome can help improve the GO annotation of related species and thereby facilitate the interpretation of omics data.

Background

Gene Ontology (GO) is a framework for functional annotation of gene products aiming to provide a unique vocabulary for living systems [1]. It comprises Biological Process (BP), Molecular Function (MF) and Cellular Component (CC) ontologies. GO terms are organized as directed acyclic graphs (DAG) meaning that GO terms are connected as nodes by directed edges defining hierarchical parent-child relationships. As a consequence, the specificity of GO terms increases with increasing distance from their root node. Enrichment analysis of GO terms is a well accepted approach to dissecting omics data in a non-biased manner. It has been used in many studies to highlight major trends in genomic, transcriptomic or proteomic datasets and describe them with a controlled vocabulary [2-5]. If the frequency of specific GO terms in a list of genes or proteins is higher than expected by chance, it is likely that these enriched GO terms are related to the biological processes under investigation.

The genus Aspergillus covers a group of filamentous fungi that includes saprophytes, human and plant pathogens as well as species being exploited in biotechnology. Whereas A. nidulans has been comprehensively studied and used as model organism, A. niger, A. oryzae and A. terreus are important industrial workhorses for the production of various enzymes and organic acids. In medical research, A. fumigatus and Neosartorya fischeri are intensively studied because of their importance as allergens and pathogens of immunocompromised patients. The aflatoxin producing fungus A. flavus is well known to cause spoilage of a great variety of agricultural goods. With genome sequences publicly available for eight of its species, the genus Aspergillus provides an important group of related fungal species for comparative genomics [6]. The exceptional role of this genus in the genomics of filamentous fungi is further emphasized by a community sequencing project (CSP#350), which has recently been initiated by the DOE Joint Genome Institute (JGI), aiming to sequence nine additional Aspergillus species. However, despite the importance of the genus Aspergillus, A. nidulans has so far been the only species with a genome-scale GO annotation inferred from both orthology mapping and intense manual curation [7-9], thus providing a valuable resource for the analysis of omics data.

In this work, we have generated a new central repository for functional analysis of omics data for the genus Aspergillus using GO annotation. Firstly, we extended the GO annotation of A. nidulans to all Aspergillus species with publicly available genome sequences and generated annotation files compatible with diverse publicly available tools for GO enrichment analysis. Secondly, we further improved the accessibility of the GO annotation for the genus Aspergillus by integrating it into a web tool for GO enrichment analysis and graph visualization named Fisher's exact test Gene Ontology annotation tool (FetGOat). Finally, we performed two case studies to demonstrate the value and flexibility of the newly generated resources for functional analysis of omics data for the genus Aspergillus.

Results

Mapping of GO annotation

A. nidulans is the only Aspergillus species for which comprehensive GO annotation based on both computational prediction and extended manual curation of gene-specific literature is available [9]. It constitutes a valuable resource for GO enrichment analysis, which has proven to be a powerful tool for dissecting omics data, for example sets of differentially expressed genes. The GO annotation of A. nidulans available at the Aspergillus Genome Database (AspGD) [9] covers 33% (3,498) of its predicted transcripts and associates them with 3,340 GO terms. Including all parental nodes, the list of GO terms extends to 5,508 comprising 3,061 (55%) BP, 1,753 MF (32%) and 694 (13%) CC terms.
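
The step from directly associated GO terms to the list "including all parental nodes" can be illustrated with a minimal sketch. The toy graph and the term names below are hypothetical placeholders, not real GO identifiers, and the traversal is only one straightforward way to collect ancestors in a directed acyclic graph; it is not taken from the AspGD or FetGOat code.

```python
from collections import deque

# Hypothetical toy DAG: child term -> list of parent terms.
# The identifiers are placeholders, not real GO accessions.
PARENTS = {
    "TERM:C": ["TERM:A", "TERM:B"],
    "TERM:B": ["TERM:A"],
    "TERM:A": ["TERM:ROOT"],
    "TERM:ROOT": [],
}

def with_ancestors(terms, parents):
    """Return the given terms plus every term reachable via parent edges."""
    seen = set(terms)
    queue = deque(terms)
    while queue:
        term = queue.popleft()
        for parent in parents.get(term, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen

direct = {"TERM:C"}                       # terms directly attached to a gene
expanded = with_ancestors(direct, PARENTS)  # direct terms plus all ancestors
```

Because every child term implies its parents, the expanded set is always at least as large as the set of directly associated terms, which is why the term count grows once parental nodes are included.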

To extend this valuable resource to other species of its genus, we mapped the A. nidulans GO annotation to all Aspergillus strains with published genome sequences (see Table 1). Groups of orthologous and close paralogous proteins were compiled with the Sybil comparative analysis package [10], which applies a modified reciprocal best-hit approach comprising two clustering cycles. Roughly 89% (99,679) of all predicted proteins from the ten analyzed Aspergillus strains constituted 13,179 Jaccard orthologous clusters. For A. nidulans, 9,250 of its predicted proteins were organized in Jaccard orthologous clusters, meaning that roughly 80% of all A. nidulans proteins were linked to at least one ortholog of another Aspergillus species. Of the 3,498 GO annotated A. nidulans genes, 97% were contained in Jaccard orthologous clusters, meaning that their associated annotations could be mapped to at least one other Aspergillus species (see Figure 1). Overall, mapping resulted in an average of 3,484 GO annotated transcripts per genome ranging from 3,403 (A. clavatus) to 3,574 (A. flavus). On average, their GO annotations comprise 5,436 terms (see Table 1). These numbers correspond well to the GO annotation of A. nidulans and indicate that the majority (97%) of the A. nidulans GO annotated genes could be efficiently mapped to the other Aspergilli.

Table 1
Mapping of A. nidulans GO annotation
Figure 1
Mapping A. nidulans GO annotation to Jaccard orthologous clusters. Area-proportional Venn diagram [39] showing fractions of all A. nidulans transcripts (red) annotated by GO (green) and/or associated with Jaccard orthologous protein clusters (blue). The ...

Availability of GO resources for the genus Aspergillus

The newly mapped GO annotations were deposited at the Broad Institute's website (http://www.broadinstitute.org/fetgoat/index.html). Different annotation file formats were generated that can be used with diverse public tools for GO enrichment analysis, such as the Gene Set Enrichment Analysis tool (GSEA) [11], the functional annotation suite Blast2GO [12], the Cytoscape plug-in BiNGO [13] and the Bioconductor package TopGO [14]. To further improve the accessibility of the annotations, we have implemented Fisher's exact test [15], a well-accepted approach for GO enrichment analysis, in the web application FetGOat and integrated the newly mapped GO annotations. FetGOat can be accessed via a web interface at the Broad Institute's website (http://www.broadinstitute.org/fetgoat/index.html). It combines GO annotations for all Aspergillus species with public genome sequences and a widely used statistical methodology to identify overrepresented GO terms. Via the web interface, a list of gene identifiers can be uploaded to the server, and statistical parameters can easily be adjusted, requiring no more than end-user computational skills. After completion of the analysis on the server side, the enrichment results are sent by email. The results consist of plain text and spreadsheet files as well as scalable vector graphics representing graphs of enriched GO terms.

Case studies

To demonstrate the flexibility and value of the newly generated resources for omics data analysis, we performed two case studies analyzing transcriptomic datasets recently published for the genus Aspergillus. In the first case study, we demonstrate that the generated resources can be used with various methods for enrichment analysis. We analyze a set of maltose-induced genes from A. niger using FetGOat and two alternative tools for enrichment analysis to subsequently compare their results. In the second case study, we highlight the advantage of having GO annotations that are as comprehensive as possible available for different species. We use FetGOat to analyze sets of glycerol-induced genes derived from a three-species microarray study to highlight major differences in the transcriptional responses for A. nidulans, A. niger and A. oryzae.

Maltose-induced genes

The first dataset reflects the transcriptomic responses of A. niger to growth in maltose and xylose-limited chemostat cultures at identical growth rates. From manual analysis of roughly 700 upregulated genes, Jørgensen et al. [16] concluded a concerted induction of secretory pathway genes in maltose compared to xylose-limited cultures.

Using three alternative approaches, we repeated the analysis of the maltose-induced genes in an automated and unbiased manner to subsequently compare their enrichment results. First, we performed the analysis using the web application FetGOat. We identified 73 enriched GO terms, which were reduced to the 19 most specific GO terms by removing redundant higher hierarchy terms with less detailed annotations. In correspondence to the findings by Jørgensen et al., the enriched GO terms are related to important steps involved in protein secretion: translocation to the endoplasmic reticulum, glycosylation and transport between the endoplasmic reticulum and the Golgi apparatus (see Table 2).

Table 2
FetGOat enrichment analysis of maltose-induced genes

For comparison of FetGOat with alternative programs, we used the generated annotation files and repeated the enrichment analysis with two publicly available tools, Blast2GO [12] and GSEA [11]. The numbers of enriched GO terms found with Blast2GO and GSEA are in the same range as the results from FetGOat: they identified 76 and 47 enriched GO terms, respectively. To compare the enrichment results from the three tools, we computed semantic similarity scores with the G-SESAME tool [17]. For both FetGOat and Blast2GO, the enrichment statistic is based on Fisher's exact test, and thus their results are theoretically expected to be identical, resulting in a semantic similarity score of 1. A similarity score of 0.983 confirms that their results are virtually identical, with minor differences that are likely due to differences in their implementations. In contrast to FetGOat (and Blast2GO), the GSEA results are based on running-sum statistics computed from the complete expression data set. Therefore, the similarity between their results can be expected to be lower. Accordingly, G-SESAME determined a smaller semantic similarity score of 0.863 for the results obtained with FetGOat and the GSEA tool.

In addition to the GO terms identified by both Fisher's exact test based tools, GSEA computed an enrichment of GO terms related to oxidative phosphorylation (GO:0006119), carbohydrate transport (GO:0008643) and glucosidase activity (GO:0015926). Comparing maltose to xylose limitation, an enrichment of those GO terms fits our expectations. Under maltose-limitation, A. niger breaks down the disaccharide into its monomer glucose by enzymes having glucosidase activity. Subsequently, glucose is taken up by carbohydrate transporters, which can be expected to be different from those required for the uptake of xylose. Finally, 1 mole of glucose yields more ATP than 1 mole of xylose, thereby explaining an induction of oxidative phosphorylation.

These differences in the enrichment results potentially stem from the statistics applied by Jørgensen et al. to define the set of maltose-induced genes. In contrast to the GSEA tool, which analyzes the complete expression data, FetGOat and Blast2GO depend on statistics that were applied a priori to generate subsets of genes or proteins of interest. Jørgensen et al. used the Affymetrix MAS 5.0 algorithm for data pre-processing in combination with Student's t-test to define their set of maltose-induced genes. This approach has been critically discussed in the literature [18,19]. To assess the effect of those a priori applied statistics on the differences between the results from FetGOat and the GSEA tool, we generated an alternative set of maltose-induced genes. We computed RMA expression data [18] from the raw data (CEL files) and subsequently applied a moderated t-statistic [20] to identify upregulated genes (data not shown). Interestingly, FetGOat also identified enriched GO terms related to glucosidase activity and carbohydrate transport for this alternative set of maltose-induced genes. However, no enrichment of genes related to oxidative phosphorylation was found. Genes annotated with the GO term oxidative phosphorylation were only marginally induced and their FDR values were rather high (data not shown). Similar differences between Fisher's exact test based methods and the GSEA tool were reported in another study. In muscle tissue from diabetics, the GSEA tool identified a joint downregulation of genes related to oxidative phosphorylation compared to healthy controls, while no enrichment was found in the set of downregulated genes [21]. For tightly regulated essential cellular processes that show only minor fold changes, the GSEA tool seems to be superior to gene-by-gene differential expression studies.
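
The idea behind a running-sum statistic on the complete ranked gene list can be sketched as follows. This is a simplified, unweighted Kolmogorov-Smirnov-like variant written for illustration only; the GSEA tool itself weights the steps by each gene's correlation with the phenotype and assesses significance by permutation, neither of which is shown here. The gene names are hypothetical.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum score over a ranked gene list.

    Walk down the list, stepping up for genes in the set and down for
    genes outside it; return the deviation from zero with the largest
    magnitude (positive: set concentrated at the top of the ranking).
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(hits) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")
    running = 0.0
    best = 0.0
    for is_hit in hits:
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best

# Hypothetical ranking from most induced to most repressed.
ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]
top_score = enrichment_score(ranked, {"g1", "g2"})
bottom_score = enrichment_score(ranked, {"g5", "g6"})
```

Because every gene contributes to the running sum, a set whose members are all shifted slightly in the same direction can reach a large score even if no single member passes a gene-by-gene significance cutoff, which is the behavior described above for oxidative phosphorylation.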

Glycerol-induced genes

In the second case study, we used FetGOat to analyze transcriptomic data generated by Salazar et al. [22]. With a three-species microarray, the authors studied the transcriptomic responses of A. nidulans, A. niger and A. oryzae to growth in glycerol and glucose-limited batch cultures. The authors identified 4,139 glycerol-induced genes comprising 679, 2,240 and 1,040 genes from A. nidulans, A. niger and A. oryzae, respectively. Based on tri-directional best blast hits, 81 orthologous gene clusters were shown to be upregulated in each of the species. Using the A. niger (strain ATCC 1015) GO annotation, Salazar et al. analyzed the set of conserved upregulated genes and identified enriched BP terms, which are related to amino acid metabolism, gluconeogenesis, hexose and alcohol biosynthetic processes.

First, we repeated the enrichment analysis similar to Salazar et al. on the set of 81 upregulated and conserved genes. With the web application FetGOat, we individually performed enrichment analysis using GO annotations of A. nidulans, A. niger (strain ATCC 1015) and A. oryzae. FetGOat identified 58, 57 and 54 enriched BP terms, respectively. To summarize the enrichment results for the three Aspergilli and compare them with each other, we mapped the GO terms to a GO Slim annotation and counted the occurrences of related GO terms. As expected from analyzing orthologous gene sets, the counts for the GO Slim terms were nearly identical, independent of which of the three Aspergilli the enrichment analysis was performed for (see Figure 2). To further assess the similarity of the three lists of enriched GO terms, we used the G-SESAME tool [17] and computed pair-wise semantic similarity scores for A. nidulans vs. A. niger, A. nidulans vs. A. oryzae and A. niger vs. A. oryzae of 0.991, 0.992 and 0.993, respectively. The similarity of the three enrichment results indicates that the newly mapped GO annotations for A. niger and A. oryzae are well comparable with each other and the A. nidulans GO annotation.

Figure 2
Comparative GO enrichment analysis of conserved glycerol-induced orthologous gene sets. Comparative enrichment analysis of 81 glycerol-induced and conserved genes of the three species A. nidulans, A. niger and A. oryzae [22]. Using FetGOat, enrichment ...

Corresponding to the enrichment results from Salazar et al., FetGOat identified enriched GO terms that are related to pyruvate and (aromatic) amino acid metabolism. Unlike Salazar et al., FetGOat did not identify BP terms related to gluconeogenesis. This difference can be explained by an improvement of the GO annotation. While only three genes were annotated with the BP term gluconeogenesis (GO:0006094) in the GO annotation used by Salazar et al., a total of 28 genes carry this term in the newly mapped GO annotation for A. niger (ATCC 1015 strain). In both annotations, one of the upregulated conserved genes is annotated with the BP term gluconeogenesis. A single hit is unlikely to occur by chance when only three genes carry the term, but unremarkable when 28 do, which explains why Salazar et al. identified it as an enriched BP term and FetGOat did not.
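
The effect of the annotation size on this result can be made concrete with a small calculation. The sketch below computes the one-tailed probability of finding at least one gluconeogenesis gene among 81 sampled genes when either 3 or 28 genes in the background carry the term. The background size of 11,000 genes is an assumption chosen purely for illustration; it is not taken from either study, and the uncorrected probabilities shown here ignore the multiple-testing correction applied in the actual analyses.

```python
from math import comb

def prob_at_least_one(annotated, drawn, background):
    """P(X >= 1) for a hypergeometric draw.

    `drawn` genes are sampled without replacement from `background`
    genes, of which `annotated` carry the GO term of interest.
    """
    return 1.0 - comb(background - annotated, drawn) / comb(background, drawn)

BACKGROUND = 11_000  # ASSUMPTION: illustrative background size, not from the study
GENE_SET = 81        # conserved glycerol-induced genes (from the text)

p_three = prob_at_least_one(3, GENE_SET, BACKGROUND)         # older annotation
p_twenty_eight = prob_at_least_one(28, GENE_SET, BACKGROUND)  # newly mapped annotation

print(f"3 annotated genes:  p = {p_three:.3f}")
print(f"28 annotated genes: p = {p_twenty_eight:.3f}")
```

Under these assumptions the single annotated gene in the set is a rare event with three annotated genes (p of about 0.02) but an expected one with 28 (p of about 0.19), consistent with the term being reported as enriched only under the sparser annotation.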

Next, we aimed to identify differences in the tendencies of the transcriptional responses to glycerol for the three Aspergilli. With FetGOat, we individually performed enrichment analysis on each of the complete sets of upregulated genes and found 35, 100 and 65 enriched BP terms for A. nidulans, A. niger and A. oryzae, respectively. The differences in the number of enriched BP terms correspond to the differences in the number of upregulated genes. To summarize and compare the results with each other, we mapped the GO terms to a GO Slim annotation and counted their occurrences (see Figure 3). This summary clearly shows different tendencies in the transcriptomic responses of the three Aspergilli. Most strikingly, a number of GO Slim terms were identified as being enriched for A. niger but not for the other two Aspergilli. Many of the associated GO terms are directly or indirectly related to nutrient limitation, such as conidiation, secondary metabolic processes and cell death. Furthermore, FetGOat found an enrichment of the BP term response to nutrient levels (GO:0031667) for A. nidulans (nine upregulated genes) and A. niger (30 upregulated genes) but not A. oryzae. In contrast, GO terms related to energy generation and peroxisomal organization were enriched for A. oryzae but not for the other two Aspergilli. FetGOat further computed an enrichment of the BP term carbohydrate transport (GO:0008643) specifically for A. oryzae. Interestingly, the different transcriptional trends correspond well with the physiological data. The capacities to grow on glycerol differ significantly for the three Aspergilli. With a maximum specific growth rate of 0.05 h⁻¹, which is one-fourth of its maximum specific growth rate on glucose, A. niger grew most slowly on glycerol. In contrast, A. oryzae showed the fastest growth (0.30 h⁻¹), which is equal to approximately 80% of its maximum specific growth rate on glucose. A. nidulans is in between and grew at roughly 50% of its maximum specific growth rate on glucose.

Figure 3
Comparative GO enrichment analysis of individual glycerol-induced gene sets. FetGOat was used for comparative enrichment analysis of the complete glycerol-induced gene sets of the three species A. nidulans, A. niger and A. oryzae [22]. For comparison, ...

Discussion

A detailed and comprehensive genome annotation can be considered a prerequisite for the analysis and interpretation of omics data. GO provides a framework for functional annotation and has been proven to be a valuable tool for omics data analysis, especially in combination with enrichment statistics. Currently, the GO reference genome project [23] provides the most comprehensive manually curated GO annotation for twelve model organisms and is intended to serve as a reference for automated mapping of GO annotation to organisms other than these major models. From the reference genome projects, Saccharomyces cerevisiae and Schizosaccharomyces pombe are most closely related to the genus Aspergillus.

A. nidulans has so far been the only Aspergillus species with comprehensive genome scale GO annotation based on both orthology mapping to S. cerevisiae and extensive manual curation [9] of gene-specific literature. We have thus mapped the A. nidulans GO annotation to all other Aspergillus species (see Table 1) with published genomes. With 79% of all A. nidulans genes being organized in Jaccard orthologous clusters covering 97% of all its GO annotated genes, we demonstrated that this approach is promising for mapping GO annotation between closely related genomes such as those of the genus Aspergillus. Nevertheless, the newly generated GO annotations have exclusively been inferred by computational analysis and thus their quality can be expected to be lower compared to the extensively manually curated A. nidulans GO annotation. The ortholog clustering approach as implemented in the Sybil comparative analysis package [10] has worked well for a number of comparative genome studies [24-33], but does have limitations, especially when there is a large number of strains and/or a high percentage of repetitive proteins. Additionally, we recognize that the optimal choice of an ortholog detection method depends on the purpose of the analysis. This graph based approach is robust when looking at closely related species, but may not be the best choice when considering large numbers of more distantly related genomes.

The GO annotations for ten Aspergillus strains (see Table 1) have been made available at the Broad Institute's website (http://www.broadinstitute.org/fetgoat/index.html) and will be updated regularly as the GO annotations for the various Aspergillus species continue to improve through manual and computational efforts. To improve the applicability of the GO annotations, they are provided in different file formats that can be used with various freely available GO enrichment tools, e.g. Blast2GO [12], TopGO [14], GSEA [11] and BiNGO [13]. Thereby, functional analysis of Aspergillus omics data by GO enrichment analysis is strongly facilitated. The availability of different annotation file formats makes it feasible to use different tools and compare them with each other.

To further improve the accessibility of the extended annotations, we developed the web application FetGOat and integrated the GO annotation for all Aspergillus species with public genome sequences. FetGOat basically resembles the functionality of other publicly available enrichment tools. However, for the Aspergillus research community, FetGOat is a valuable addition to existing programs because it uniquely combines an intuitive web interface, GO annotations for all Aspergilli with public genome sequences and a frequently applied statistical method for the identification of enriched GO terms.

To demonstrate the use of those newly generated resources for functional analysis of omics data, we applied them in two case studies to re-analyze recently published microarray data in an automated and unbiased manner. As shown for the first dataset, the enrichment results are in correspondence to the main conclusions from Jørgensen et al. [16]. We found an induction of processes related to secretion, glycosylation and starch degradation (see Table 2). In addition, we used the dataset from Jørgensen et al. to compare the enrichment results of FetGOat to those obtained with two well established publicly available tools, Blast2GO and GSEA. The three tools apply two different methods for enrichment analysis. While Blast2GO and FetGOat compute a Fisher's exact test statistic to identify GO terms that are over-represented in subsets of genes derived e.g. from transcriptomic or proteomic data, the GSEA tool computes running-sum statistics on (non-filtered) expression data to identify a priori defined groups of genes that show joint differential expression. The results from FetGOat are virtually identical to the results obtained with Blast2GO, demonstrating the correctness of FetGOat. As expected, the similarity between the results from FetGOat and the GSEA tool is lower, while their results are still well comparable. For a large part, both tools highlight the same transcriptional trends. However, the GO term oxidative phosphorylation was exclusively identified as being enriched by the GSEA tool. Taking into account that 1 mole of glucose yields more ATP than 1 mole of xylose, an induction of the oxidative phosphorylation machinery during growth in maltose-limited cultures can be expected. Because the fold-changes of the corresponding genes were very small and their statistical significances were low, no enrichment could be found in the set of maltose-induced genes as assessed by Fisher's exact test. Similar results were found in another study, in which the GSEA tool detected a joint transcriptional downregulation of genes related to oxidative phosphorylation in tissue from diabetics vs. control [21]. For tightly regulated essential genes, which show only marginal differential expression, the GSEA tool seems to be superior to gene-by-gene differential expression approaches. However, we would like to emphasize that this is caused by the statistics performed a priori rather than by Fisher's exact test itself. Clustering based on gene expression profiles combined with Fisher's exact test enrichment statistics may allow similar conclusions to be drawn as with the GSEA tool. The causality between an increased ATP yield for maltose and an upregulation of secretion related genes remains to be investigated. However, it is an interesting new hypothesis for further investigations.

For the second dataset from Salazar et al. [22], we first performed GO enrichment analysis on the set of 81 conserved and glycerol-induced genes used in the original study. We could partly reproduce the enrichment results. However, we did not find an enrichment of genes annotated with the GO term gluconeogenesis. A comparison of the GO annotation used by Salazar et al. and our newly mapped GO annotation revealed that this is due to an improvement of the newly mapped GO annotation, which includes many more genes annotated with the GO term gluconeogenesis. As expected from analyzing orthologous gene sets, we showed that the enrichment results are nearly identical, independent of which of the three Aspergilli they were obtained for. Furthermore, we separately performed enrichment analysis for the three Aspergilli analyzing their complete sets of upregulated genes and highlighted major differences in their responses to glycerol vs. glucose limitation. Thereby, we were able to draw additional conclusions explaining their different capabilities to grow on glycerol. Especially for the three-species microarray platform, FetGOat in combination with the newly mapped GO annotation forms a new, valuable and flexible resource for omics data analysis. Applied at an early stage of data analysis, GO enrichment analysis can thus strongly facilitate subsequent manual data interpretation.

While GSEA is an attractive alternative to Fisher's exact test based tools such as FetGOat and Blast2GO, it lacks flexibility because it is restricted to transcriptomic data and can only compare two conditions at a time. Furthermore, its application is more involved, because microarray specific chip annotation files as well as phenotypic labels have to be provided for analysis. Tools such as FetGOat and Blast2GO can be applied to any set of genes or proteins deriving from genomic, transcriptomic or proteomic studies. They can for example be used to perform GO enrichment analysis on a set of proteins commonly secreted under certain conditions. Improving the power of the statistics applied to obtain gene sets of interest will consequently improve the strength of Fisher's exact test based enrichment analysis. For transcriptomic data analysis, moderated statistics or non-specific filtering have for example been shown to improve the statistical power [19].

The choice of a tool for GO enrichment analysis depends on the type of data, the available resources and personal preferences. Certainly, most of the enrichment results will be redundant between the tools. With the different GO annotation files generated in this study, various freely available tools can easily be used and compared with each other. Especially for the genus Aspergillus, FetGOat stands out with respect to the ease of use and the integration of comprehensive and regularly updated GO annotations. The power of FetGOat lies in its flexibility. Any set of genes/proteins from any Aspergillus strain with published genome sequence can be investigated for enrichment of GO terms. FetGOat is not restricted to the genus Aspergillus as it can be extended to include GO annotations from any organism of interest.

Conclusions

We have mapped the A. nidulans GO annotation to the genomes of seven other Aspergillus species and made the GO annotations available in different file formats. We furthermore developed the web tool FetGOat, which can be used for GO enrichment analysis of omics data from all Aspergillus strains with published genome sequences. Both the mapped GO annotations and FetGOat were successfully applied in two case studies and are available at the Broad Institute's website (http://www.broadinstitute.org/fetgoat/index.html). Moreover, we have given a general example of how a well annotated genome can help improve the GO annotation of related species and thereby facilitate the interpretation of omics data.

Methods

Ortholog and paralog identification

Clusters of orthologous proteins from ten Aspergillus strains (see Table 1) were generated with Sybil [10]. The Sybil comparative analysis package currently utilizes the following two-step clustering method, which is a modification of the standard reciprocal best match approach. First, an all-vs-all protein similarity matrix is computed by searching each of the predicted polypeptides within the genomes being compared against all polypeptides. BLASTP is currently used for these searches, with an E-value cut off of 1E-5. Polypeptides from each individual species are clustered independently using only BLASTP hits that had a sequence identity score of at least 80%. The BLASTP matches that meet these criteria are used to compute a Jaccard similarity coefficient [34] for each distinct pair of polypeptides in the same genome. Given two polypeptides P1 and P2 the Jaccard similarity coefficient is defined as:

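$$J(P_1, P_2) = \frac{\left|\,\text{matches to } P_1 \,\cap\, \text{matches to } P_2\,\right|}{\left|\,\text{matches to } P_1 \,\cup\, \text{matches to } P_2\,\right|}$$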

Using default parameters, any pair of polypeptides with a Jaccard coefficient > 0.6 is connected in a graph representation. The connected components of this graph are referred to as "Jaccard Clusters" and are analogous to paralogous protein clusters within each species. Subsequently, the reciprocal best-hit phase of the clustering algorithm identifies pairs of Jaccard clusters such that: (1) The clusters are from different genomes. (2) The highest-scoring BLASTP match of at least one polypeptide in each of the clusters is to a polypeptide in the other cluster. A graph is constructed, with an edge drawn between two nodes (Jaccard clusters) if and only if they are bidirectional best BLASTP matches of each other. The connected components of this graph are considered ortholog groups in downstream analysis and will be referred to as "Jaccard orthologous clusters".
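
The first clustering cycle can be sketched as follows. This is a minimal illustration, not the Sybil implementation: the input dictionary of filtered BLASTP matches, the polypeptide names and the helper functions are all hypothetical, and whether a polypeptide counts as a match to itself is an assumption made here for simplicity. The reciprocal best-hit phase that links clusters across genomes is not shown.

```python
from itertools import combinations

def jaccard(matches_a, matches_b):
    """Jaccard similarity coefficient of two sets of BLASTP matches."""
    union = matches_a | matches_b
    return len(matches_a & matches_b) / len(union) if union else 0.0

def connected_components(nodes, edges):
    """Group nodes into connected components using union-find."""
    parent = {node: node for node in nodes}

    def find(node):
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for left, right in edges:
        parent[find(left)] = find(right)

    groups = {}
    for node in nodes:
        groups.setdefault(find(node), set()).add(node)
    return sorted(groups.values(), key=sorted)

def jaccard_clusters(matches, cutoff=0.6):
    """Connect polypeptide pairs with J > cutoff and return the components."""
    names = sorted(matches)
    edges = [
        (p, q)
        for p, q in combinations(names, 2)
        if jaccard(matches[p], matches[q]) > cutoff
    ]
    return connected_components(names, edges)

# Hypothetical within-genome matches that already passed the E-value
# and identity filters (ASSUMPTION: self-matches are included).
matches = {
    "p1": {"p1", "p2", "p3"},
    "p2": {"p1", "p2", "p3"},
    "p3": {"p1", "p2", "p3", "p4"},
    "p4": {"p4"},
}
clusters = jaccard_clusters(matches)
```

In this toy input, p1, p2 and p3 share most of their matches and end up in one cluster, while p4 remains a singleton because its overlap with p3 (one shared match out of four) falls below the 0.6 cutoff.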

Mapping A. nidulans GO annotation

GO annotation for A. nidulans (gene_association.aspgd version: 1.256) was obtained from the Aspergillus Genome Database (AspGD: http://www.aspgd.org) [9] and is based on orthology mapping between A. nidulans and S. cerevisiae as well as extensive manual curation based on gene-specific A. nidulans literature. GO terms for Jaccard orthologous clusters and their associated proteins were inferred from A. nidulans GO annotation such that each protein belonging to the same Jaccard orthologous cluster shares identical GO terms. For each of the analyzed strains (see Table 1), individual GO annotation files were generated in different formats.
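The inference rule described above (all members of a Jaccard orthologous cluster share identical GO terms) can be sketched as follows. This is an illustration only, not the pipeline used in this study: the function name and the protein identifiers are hypothetical placeholders, and the handling of clusters that contain more than one annotated A. nidulans protein (here, the union of their terms) is an assumption.

```python
def propagate_go(clusters, reference_go):
    """Assign reference GO terms to every member of each ortholog cluster.

    clusters:     iterable of sets of protein IDs, one set per cluster
    reference_go: A. nidulans protein ID -> set of GO term IDs
    Returns protein ID -> set of GO terms for all proteins in clusters that
    contain at least one annotated reference protein.
    """
    mapped = {}
    for cluster in clusters:
        terms = set()
        for protein in cluster:
            # Assumption: terms of several annotated members are pooled.
            terms |= reference_go.get(protein, set())
        if terms:
            for protein in cluster:
                mapped[protein] = set(terms)
    return mapped


# Hypothetical identifiers; the GO IDs only serve as example values.
clusters = [{"NID_1", "NIG_1", "FUM_1"}, {"NIG_2", "FUM_2"}]
reference_go = {"NID_1": {"GO:0006096"}}
mapped = propagate_go(clusters, reference_go)
```

The second cluster has no annotated A. nidulans member, so its proteins receive no GO terms. Such clusters are the ones that remain unannotated after mapping.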

Enrichment analysis

GO enrichment analyses were performed applying two different statistical tests: Fisher's exact test [15] and Kolmogorov-Smirnov statistics [11,35]. Unless stated otherwise, p-values were corrected according to Benjamini & Hochberg [36] and a critical False Discovery Rate (FDR) q-value of 0.05 was applied. For the enrichment analysis of GO terms based on Fisher's exact test, we developed the web application FetGOat, which calculates one-tailed p-values and corrects them for multiple hypothesis testing according to the Benjamini & Hochberg method. In addition to FetGOat, Blast2GO [12] was used to compute enriched GO terms via Fisher's exact test as implemented in GOSSIP [37]. For the identification of enriched GO terms based on the Kolmogorov-Smirnov statistic, the GSEA tool [11] was used. The corresponding GO annotation files for Blast2GO and the GSEA tool were generated in this study and are available at the Broad Institute's website (http://www.broadinstitute.org/fetgoat/index.html).
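For readers unfamiliar with the two statistical steps, the following Python sketch shows a one-tailed Fisher's exact test for over-representation (computed as a hypergeometric upper tail) and the Benjamini & Hochberg adjustment. It is a generic textbook formulation using only the standard library, not the FetGOat source code; the function and parameter names are chosen for illustration.

```python
from math import comb


def fisher_one_tailed(k, n, K, N):
    """Upper-tail p-value P(X >= k) of the hypergeometric distribution.

    k: genes in the query list annotated with the GO term
    n: size of the query list
    K: genes in the background annotated with the GO term
    N: size of the background
    """
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(n, K) + 1))
    return tail / total


def benjamini_hochberg(pvalues):
    """Return Benjamini & Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value so that adjusted
    # values are monotone in the rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy example: 2 of 2 query genes carry a term found in 2 of 4 genes overall.
p = fisher_one_tailed(k=2, n=2, K=2, N=4)
q = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
significant = [value <= 0.05 for value in q]
```

With a critical FDR of 0.05, a GO term would be reported as enriched when its adjusted value does not exceed 0.05, as in the last line of the sketch.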

Mapping to GO slim annotation

To summarize GO enrichment results, we mapped the enriched GO terms to a GO Slim annotation [1], which is a reduced version of the complete annotation with less detailed high-level GO terms, and counted the occurrences (single occurrence option) of GO Slim terms as well as related lower hierarchy terms using the CateGOrizer tool [38].
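The idea behind summarizing enriched terms by GO Slim categories can be sketched in a few lines of Python. The sketch does not reproduce CateGOrizer or its counting options: the toy hierarchy, the term names and the rule used here (each distinct enriched term is counted once for every GO Slim term that is the term itself or one of its ancestors) are assumptions made for illustration.

```python
from collections import Counter


def ancestors(term, parents):
    """Return the term itself plus all of its ancestors in the hierarchy."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def slim_counts(enriched_terms, parents, slim_terms):
    """Count, per GO Slim term, the distinct enriched terms it subsumes."""
    counts = Counter()
    for term in set(enriched_terms):
        for slim in ancestors(term, parents) & slim_terms:
            counts[slim] += 1
    return counts


# Hypothetical toy hierarchy (child -> list of parents).
parents = {
    "t_glycolysis": ["t_carbohydrate"],
    "t_carbohydrate": ["t_metabolism"],
    "t_sugar_transport": ["t_transport"],
}
slim_terms = {"t_metabolism", "t_transport"}
counts = slim_counts(
    ["t_glycolysis", "t_carbohydrate", "t_sugar_transport"],
    parents, slim_terms)
```

In this toy example two enriched terms fall under the hypothetical slim term "t_metabolism" and one under "t_transport".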

Authors' contributions

BMN implemented a data analysis pipeline to map GO annotation to Jaccard clusters and generate annotation files, developed FetGOat and performed further data analysis. JC and JRW generated Jaccard clusters. GCC integrated FetGOat at the Broad Institute's website. JRW, AFJR and VM were involved in writing the manuscript. All authors read and approved the final manuscript.

Acknowledgements

This work was supported by grants of the SenterNovem IOP Genomics project (IGE07008) and the National Institute of Allergy and Infectious Diseases at the US National Institutes of Health (R01 AI077599). Part of this work was carried out within the research programme of the Kluyver Centre for Genomics of Industrial Fermentation, which is part of the Netherlands Genomics Initiative/Netherlands Organization for Scientific Research.

References

  • Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000;25:25–9. doi: 10.1038/75556. [PMC free article] [PubMed] [Cross Ref]
  • Jorgensen TR, Nitsche BM, Lamers GE, Arentshorst M, van den Hondel CA, Ram AF. Transcriptomic insights into the physiology of Aspergillus niger approaching a specific growth rate of zero. Appl Environ Microbiol. 2010;76(16):5344–55. doi: 10.1128/AEM.00450-10. [PMC free article] [PubMed] [Cross Ref]
  • Lin MK, Lee YJ, Lough TJ, Phinney BS, Lucas WJ. Analysis of the pumpkin phloem proteome provides insights into angiosperm sieve tube function. Mol Cell Proteomics. 2009;8(2):343–56. [PubMed]
  • Twine NA, Janitz K, Wilkins MR, Janitz M. Whole transcriptome sequencing reveals gene expression and splicing differences in brain regions affected by Alzheimer's disease. PLoS ONE. 2011;6:e16266. doi: 10.1371/journal.pone.0016266. [PMC free article] [PubMed] [Cross Ref]
  • Ryan JC, Morey JS, Bottein MY, Ramsdell JS, Van Dolah FM. Gene expression profiling in brain of mice exposed to the marine neurotoxin ciguatoxin reveals an acute anti-inflammatory, neuroprotective response. BMC Neurosci. 2010;11:107. doi: 10.1186/1471-2202-11-107. [PMC free article] [PubMed] [Cross Ref]
  • Jones MG. The first filamentous fungal genome sequences: Aspergillus leads the way for essential everyday resources or dusty museum specimens? Microbiology. 2007;153:1–6. doi: 10.1099/mic.0.2006/001479-0. [PubMed] [Cross Ref]
  • Wortman JR, Gilsenan JM, Joardar V, Deegan J, Clutterbuck J, Andersen MR, Archer D, Bencina M, Braus G, Coutinho P, von Dohren H, Doonan J, Driessen AJ, Durek P, Espeso E, Fekete E, Flipphi M, Estrada CG, Geysens S, Goldman G, de Groot PW, Hansen K, Harris SD, Heinekamp T, Helmstaedt K, Henrissat B, Hofmann G, Homan T, Horio T, Horiuchi H, James S, Jones M, Karaffa L, Karanyi Z, Kato M, Keller N, Kelly DE, Kiel JA, Kim JM, van der Klei IJ, Klis FM, Kovalchuk A, Krasevec N, Kubicek CP, Liu B, Maccabe A, Meyer V, Mirabito P, Miskei M, Mos M, Mullins J, Nelson DR, Nielsen J, Oakley BR, Osmani SA, Pakula T, Paszewski A, Paulsen I, Pilsyk S, Pocsi I, Punt PJ, Ram AF, Ren Q, Robellet X, Robson G, Seiboth B, van Solingen P, Specht T, Sun J, Taheri-Talesh N, Takeshita N, Ussery D, van Kuyk PA, Visser H, van de Vondervoort PJ, de Vries RP, Walton J, Xiang X, Xiong Y, Zeng AP, Brandt BW, Cornell MJ, van den Hondel CA, Visser J, Oliver SG, Turner G. The 2008 update of the Aspergillus nidulans genome annotation: a community effort. Fungal Genet Biol. 2009;46(Suppl 1):S2–13. [PMC free article] [PubMed]
  • Galagan JE, Calvo SE, Cuomo C, Ma LJ, Wortman JR, Batzoglou S, Lee SI, Basturkmen M, Spevak CC, Clutterbuck J, Kapitonov V, Jurka J, Scazzocchio C, Farman M, Butler J, Purcell S, Harris S, Braus GH, Draht O, Busch S, D'Enfert C, Bouchier C, Goldman GH, Bell-Pedersen D, Griffiths-Jones S, Doonan JH, Yu J, Vienken K, Pain A, Freitag M, Selker EU, Archer DB, Penalva MA, Oakley BR, Momany M, Tanaka T, Kumagai T, Asai K, Machida M, Nierman WC, Denning DW, Caddick M, Hynes M, Paoletti M, Fischer R, Miller B, Dyer P, Sachs MS, Osmani SA, Birren BW. Sequencing of Aspergillus nidulans and comparative analysis with A. fumigatus and A. oryzae. Nature. 2005;438(7071):1105–15. doi: 10.1038/nature04341. [PubMed] [Cross Ref]
  • Arnaud MB, Chibucos MC, Costanzo MC, Crabtree J, Inglis DO, Lotia A, Orvis J, Shah P, Skrzypek MS, Binkley G, Miyasato SR, Wortman JR, Sherlock G. The Aspergillus Genome Database, a curated comparative genomics resource for gene, protein and sequence information for the Aspergillus research community. Nucleic Acids Res. 2010. pp. D420–7. [PMC free article] [PubMed]
  • Crabtree J, Angiuoli SV, Wortman JR, White OR. Sybil: methods and software for multiple genome comparison and visualization. Methods Mol Biol. 2007;408:93–108. doi: 10.1007/978-1-59745-547-3_6. [PubMed] [Cross Ref]
  • Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, Mesirov JP. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci USA. 2005;102(43):15545–50. doi: 10.1073/pnas.0506580102. [PMC free article] [PubMed] [Cross Ref]
  • Conesa A, Gotz S, Garcia-Gomez JM, Terol J, Talon M, Robles M. Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics. 2005;21(18):3674–6. doi: 10.1093/bioinformatics/bti610. [PubMed] [Cross Ref]
  • Maere S, Heymans K, Kuiper M. BiNGO: a Cytoscape plugin to assess overrepresentation of gene ontology categories in biological networks. Bioinformatics. 2005;21(16):3448–9. doi: 10.1093/bioinformatics/bti551. [PubMed] [Cross Ref]
  • Alexa A, Rahnenfuhrer J. topGO: Enrichment analysis for Gene Ontology. 2010.
  • Fisher R. On the Interpretation of χ2 from Contingency Tables, and the Calculation of P. Journal of the Royal Statistical Society. Series B (Methodological) 1922;85:87–94. doi: 10.2307/2340521. [Cross Ref]
  • Jorgensen TR, Goosen T, Hondel CA, Ram AF, Iversen JJ. Transcriptomic comparison of Aspergillus niger growing on two different sugars reveals coordinated regulation of the secretory pathway. BMC Genomics. 2009;10:44. doi: 10.1186/1471-2164-10-44. [PMC free article] [PubMed] [Cross Ref]
  • Du Z, Li L, Chen CF, Yu PS, Wang JZ. G-SESAME: web tools for GO-term-based gene similarity analysis and knowledge discovery. Nucleic Acids Res. 2009. pp. W345–9. [PMC free article] [PubMed]
  • Irizarry RA, Bolstad BM, Collin F, Cope LM, Hobbs B, Speed TP. Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Res. 2003;31(4):e15. doi: 10.1093/nar/gng015. [PMC free article] [PubMed] [Cross Ref]
  • Bourgon R, Gentleman R, Huber W. Independent filtering increases detection power for high-throughput experiments. Proc Natl Acad Sci USA. pp. 9546–51. [PMC free article] [PubMed]
  • Smyth GK. Linear models and empirical bayes methods for assessing differential expression in microarray experiments. Stat Appl Genet Mol Biol. 2004;3 Article3. [PubMed]
  • Mootha VK, Lindgren CM, Eriksson KF, Subramanian A, Sihag S, Lehar J, Puigserver P, Carlsson E, Ridderstrale M, Laurila E, Houstis N, Daly MJ, Patterson N, Mesirov JP, Golub TR, Tamayo P, Spiegelman B, Lander ES, Hirschhorn JN, Altshuler D, Groop LC. PGC-1alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat Genet. 2003;34(3):267–73. doi: 10.1038/ng1180. [PubMed] [Cross Ref]
  • Salazar M, Vongsangnak W, Panagiotou G, Andersen MR, Nielsen J. Uncovering transcriptional regulation of glycerol metabolism in Aspergilli through genome-wide gene expression data analysis. Mol Genet Genomics. 2009;282(6):571–86. doi: 10.1007/s00438-009-0486-y. [PubMed] [Cross Ref]
  • The Reference Genome Group of the Gene Ontology Consortium. The Gene Ontology's Reference Genome Project: a unified framework for functional annotation across species. PLoS Comput Biol. 2009;5(7):e1000431. doi: 10.1371/journal.pcbi.1000431. [PMC free article] [PubMed] [Cross Ref]
  • Carlton JM, Adams JH, Silva JC, Bidwell SL, Lorenzi H, Caler E, Crabtree J, Angiuoli SV, Merino EF, Amedeo P, Cheng Q, Coulson RM, Crabb BS, Del Portillo HA, Essien K, Feldblyum TV, Fernandez-Becerra C, Gilson PR, Gueye AH, Guo X, Kang'a S, Kooij TW, Korsinczky M, Meyer EV, Nene V, Paulsen I, White O, Ralph SA, Ren Q, Sargeant TJ, Salzberg SL, Stoeckert CJ, Sullivan SA, Yamamoto MM, Hoffman SL, Wortman JR, Gardner MJ, Galinski MR, Barnwell JW, Fraser-Liggett CM. Comparative genomics of the neglected human malaria parasite Plasmodium vivax. Nature. 2008;455(7214):757–63. doi: 10.1038/nature07327. [PMC free article] [PubMed] [Cross Ref]
  • McDonagh A, Fedorova ND, Crabtree J, Yu Y, Kim S, Chen D, Loss O, Cairns T, Goldman G, Armstrong-James D, Haynes K, Haas H, Schrettl M, May G, Nierman WC, Bignell E. Sub-telomere directed gene expression during initiation of invasive aspergillosis. PLoS Pathog. 2008;4(9):e1000154. doi: 10.1371/journal.ppat.1000154. [PMC free article] [PubMed] [Cross Ref]
  • Fedorova ND, Khaldi N, Joardar VS, Maiti R, Amedeo P, Anderson MJ, Crabtree J, Silva JC, Badger JH, Albarraq A, Angiuoli S, Bussey H, Bowyer P, Cotty PJ, Dyer PS, Egan A, Galens K, Fraser-Liggett CM, Haas BJ, Inman JM, Kent R, Lemieux S, Malavazi I, Orvis J, Roemer T, Ronning CM, Sundaram JP, Sutton G, Turner G, Venter JC, White OR, Whitty BR, Youngman P, Wolfe KH, Goldman GH, Wortman JR, Jiang B, Denning DW, Nierman WC. Genomic islands in the pathogenic filamentous fungus Aspergillus fumigatus. PLoS Genet. 2008;4(4):e1000046. doi: 10.1371/journal.pgen.1000046. [PMC free article] [PubMed] [Cross Ref]
  • Brayton KA, Lau AO, Herndon DR, Hannick L, Kappmeyer LS, Berens SJ, Bidwell SL, Brown WC, Crabtree J, Fadrosh D, Feldblum T, Forberger HA, Haas BJ, Howell JM, Khouri H, Koo H, Mann DJ, Norimine J, Paulsen IT, Radune D, Ren Q, Smith JRK, Suarez CE, White O, Wortman JR, Knowles JDP, McElwain TF, Nene VM. Genome sequence of Babesia bovis and comparative analysis of apicomplexan hemoprotozoa. PLoS Pathog. 2007;3(10):1401–13. [PMC free article] [PubMed]
  • Ghedin E, Wang S, Spiro D, Caler E, Zhao Q, Crabtree J, Allen JE, Delcher AL, Guiliano DB, Miranda-Saavedra D, Angiuoli SV, Creasy T, Amedeo P, Haas B, El-Sayed NM, Wortman JR, Feldblyum T, Tallon L, Schatz M, Shumway M, Koo H, Salzberg SL, Schobel S, Pertea M, Pop M, White O, Barton GJ, Carlow CK, Crawford MJ, Daub J, Dimmic MW, Estes CF, Foster JM, Ganatra M, Gregory WF, Johnson NM, Jin J, Komuniecki R, Korf I, Kumar S, Laney S, Li BW, Li W, Lindblom TH, Lustigman S, Ma D, Maina CV, Martin DM, McCarter JP, McReynolds L, Mitreva M, Nutman TB, Parkinson J, Peregrin-Alvarez JM, Poole C, Ren Q, Saunders L, Sluder AE, Smith K, Stanke M, Unnasch TR, Ware J, Wei AD, Weil G, Williams DJ, Zhang Y, Williams SA, Fraser-Liggett C, Slatko B, Blaxter ML, Scott AL. Draft genome of the filarial nematode parasite Brugia malayi. Science. 2007;317(5845):1756–60. doi: 10.1126/science.1145406. [PMC free article] [PubMed] [Cross Ref]
  • Dunning Hotopp JC, Lin M, Madupu R, Crabtree J, Angiuoli SV, Eisen JA, Seshadri R, Ren Q, Wu M, Utterback TR, Smith S, Lewis M, Khouri H, Zhang C, Niu H, Lin Q, Ohashi N, Zhi N, Nelson W, Brinkac LM, Dodson RJ, Rosovitz MJ, Sundaram J, Daugherty SC, Davidsen T, Durkin AS, Gwinn M, Haft DH, Selengut JD, Sullivan SA, Zafar N, Zhou L, Benahmed F, Forberger H, Halpin R, Mulligan S, Robinson J, White O, Rikihisa Y, Tettelin H. Comparative genomics of emerging human ehrlichiosis agents. PLoS Genet. 2006;2(2):e21. doi: 10.1371/journal.pgen.0020021. [PMC free article] [PubMed] [Cross Ref]
  • El-Sayed NM, Myler PJ, Blandin G, Berriman M, Crabtree J, Aggarwal G, Caler E, Renauld H, Worthey EA, Hertz-Fowler C, Ghedin E, Peacock C, Bartholomeu DC, Haas BJ, Tran AN, Wortman JR, Alsmark UC, Angiuoli S, Anupama A, Badger J, Bringaud F, Cadag E, Carlton JM, Cerqueira GC, Creasy T, Delcher AL, Djikeng A, Embley TM, Hauser C, Ivens AC, Kummerfeld SK, Pereira-Leal JB, Nilsson D, Peterson J, Salzberg SL, Shallom J, Silva JC, Sundaram J, Westenberger S, White O, Melville SE, Donelson JE, Andersson B, Stuart KD, Hall N. Comparative genomics of trypanosomatid parasitic protozoa. Science. 2005;309(5733):404–9. doi: 10.1126/science.1112181. [PubMed] [Cross Ref]
  • El-Sayed NM, Myler PJ, Bartholomeu DC, Nilsson D, Aggarwal G, Tran AN, Ghedin E, Worthey EA, Delcher AL, Blandin G, Westenberger SJ, Caler E, Cerqueira GC, Branche C, Haas B, Anupama A, Arner E, Aslund L, Attipoe P, Bontempi E, Bringaud F, Burton P, Cadag E, Campbell DA, Carrington M, Crabtree J, Darban H, da Silveira JF, de Jong P, Edwards K, Englund PT, Fazelina G, Feldblyum T, Ferella M, Frasch AC, Gull K, Horn D, Hou L, Huang Y, Kindlund E, Klingbeil M, Kluge S, Koo H, Lacerda D, Levin MJ, Lorenzi H, Louie T, Machado CR, McCulloch R, McKenna A, Mizuno Y, Mottram JC, Nelson S, Ochaya S, Osoegawa K, Pai G, Parsons M, Pentony M, Pettersson U, Pop M, Ramirez JL, Rinta J, Robertson L, Salzberg SL, Sanchez DO, Seyler A, Sharma R, Shetty J, Simpson AJ, Sisk E, Tammi MT, Tarleton R, Teixeira S, Van Aken S, Vogt C, Ward PN, Wickstead B, Wortman J, White O, Fraser CM, Stuart KD, Andersson B. The genome sequence of Trypanosoma cruzi, etiologic agent of Chagas disease. Science. 2005;309(5733):409–15. doi: 10.1126/science.1112631. [PubMed] [Cross Ref]
  • Gardner MJ, Bishop R, Shah T, de Villiers EP, Carlton JM, Hall N, Ren Q, Paulsen IT, Pain A, Berriman M, Wilson RJ, Sato S, Ralph SA, Mann DJ, Xiong Z, Shallom SJ, Weidman J, Jiang L, Lynn J, Weaver B, Shoaibi A, Domingo AR, Wasawo D, Crabtree J, Wortman JR, Haas B, Angiuoli SV, Creasy TH, Lu C, Suh B, Silva JC, Utterback TR, Feldblyum TV, Pertea M, Allen J, Nierman WC, Taracha EL, Salzberg SL, White OR, Fitzhugh HA, Morzaria S, Venter JC, Fraser CM, Nene V. Genome sequence of Theileria parva, a bovine pathogen that transforms lymphocytes. Science. 2005;309(5731):134–7. doi: 10.1126/science.1110439. [PubMed] [Cross Ref]
  • Joardar V, Lindeberg M, Jackson RW, Selengut J, Dodson R, Brinkac LM, Daugherty SC, Deboy R, Durkin AS, Giglio MG, Madupu R, Nelson WC, Rosovitz MJ, Sullivan S, Crabtree J, Creasy T, Davidsen T, Haft DH, Zafar N, Zhou L, Halpin R, Holley T, Khouri H, Feldblyum T, White O, Fraser CM, Chatterjee AK, Cartinhour S, Schneider DJ, Mansfield J, Collmer A, Buell CR. Whole-genome sequence analysis of Pseudomonas syringae pv. phaseolicola 1448A reveals divergence among pathovars in genes involved in virulence and transposition. J Bacteriol. 2005;187(18):6488–98. doi: 10.1128/JB.187.18.6488-6498.2005. [PMC free article] [PubMed] [Cross Ref]
  • Jaccard P. Nouvelles recherches sur la distribution florale. Bulletin de la Société Vaudoise des Sciences Naturelles. 1908;44:223–270.
  • Hollander M, Wolfe D. Nonparametric Statistical Methods. New York:Wiley; 1999.
  • Benjamini Y, Hochberg Y. Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society. Series B (Methodological) 1995;57:289–300.
  • Bluthgen N, Brand K, Cajavec B, Swat M, Herzel H, Beule D. Biological profiling of gene groups utilizing Gene Ontology. Genome Inform. 2005;16:106–15. [PubMed]
  • Hu ZL, Bao J, Reecy JM. CateGOrizer: A Web-Based Program to Batch Analyze Gene Ontology Classification Categories. Online Journal of Bioinformatics. 2008;9(2):108–112.
  • Hulsen T, de Vlieg J, Alkema W. BioVenn - a web application for the comparison and visualization of biological lists using area-proportional Venn diagrams. BMC Genomics. 2008;9:488. doi: 10.1186/1471-2164-9-488. [PMC free article] [PubMed] [Cross Ref]
  • Mabey JE, Anderson MJ, Giles PF, Miller CJ, Attwood TK, Paton NW, Bornberg-Bauer E, Robson GD, Oliver SG, Denning DW. CADRE: the Central Aspergillus Data REpository. Nucleic Acids Res. 2004. pp. D401–5. [PMC free article] [PubMed]

Articles from BMC Genomics are provided here courtesy of BioMed Central