Gene Expr Patterns. Author manuscript; available in PMC 2009 Feb 1.
Published in final edited form as:
PMCID: PMC2253684

A Web Based Resource Characterizing the Zebrafish Developmental Profile of over 16,000 transcripts


Using a spotted 65-mer oligonucleotide microarray, we have characterized the developmental expression profile from mid-gastrulation (75% epiboly) to 5 days post-fertilization (dpf) for >16,000 unique transcripts in the zebrafish genome. Microarray profiling data sets are often immense, and one challenge is validating the results and prioritizing genes for further study. The purpose of the current study was to address these issues and to generate a publicly available resource in which investigators can examine the developmental expression profile of any of the more than 16,000 zebrafish genes on the array. The chips carry 16,459 printed spots, corresponding to 16,288 unique transcripts and 172 β-actin (AF025305) spots distributed throughout the chip as positive controls. We have collected 55 microarray gene expression profiles from several zebrafish laboratories and created a Perl/CGI-based software tool (http://serine.umdnj.edu/~ouyangmi/cgi-bin/zebrafish/profile.htm) with which researchers can look up the expression pattern of their gene of interest. Users search by accession number or nucleotide sequence, and the expression profile is reported as a time-course graph of expression intensity. To validate the web tool, we compared the expression profiles of 74 genes from the web tool with the in situ hybridization results of Thisse et al. (2004) and with the results reported by Mathavan et al. (2005). The expression patterns in our web resource agree 80% with the in situ database (Thisse et al., 2004) and 75% with those reported by Mathavan et al. (2005). Genes that conflict between our web tool and the in situ database either have high sequence similarity to other genes or show weak in situ signal or unvalidated in situ probes. Among the genes that disagree between our web tool and Mathavan et al. (2005), 93% agree between our web tool and the in situ database, indicating that our web tool results are quite reliable. Thus, this resource provides a user-friendly web-based platform for researchers to determine the developmental profile of their gene of interest and to prioritize genes identified in microarray analyses by their developmental expression profile.

1. Introduction

Microarray profiling is a powerful technology that enables researchers to compare gene expression across samples (Lee and Saeed, 2007). To study gene expression in zebrafish embryos at various developmental stages, we printed zebrafish DNA microarray chips at the Kimmel Cancer Center (Thomas Jefferson University) comprising more than 16,000 unique transcripts from the zebrafish oligo library of Compugen/Sigma-Genosys. The Kimmel Cancer Center microarray profiling service has been used by many investigators in the zebrafish community to investigate a variety of biological problems and has proven to be a useful resource for the community (Linney et al., 2004; Sumanas et al., 2005; Covassin et al., 2006; Kassen et al., 2007; Kreiling et al., 2007; Robu et al., 2007). Using a spotted 65-mer oligonucleotide microarray, we have characterized the developmental expression profile from mid-gastrulation (75% epiboly) to 5 days post-fertilization (dpf) for 16,288 unique transcripts in the zebrafish genome, representing about a third to one half of all genes in the genome according to current predictions (zebrafish genome project Zv7, http://www.sanger.ac.uk/Projects/D_rerio/Zv7_assembly_information.shtml). We have collected expression profiles from 55 independent RNA samples spanning 75% epiboly to 5 dpf from several zebrafish laboratories (Amacher, Farber, Ho) and created a web resource (http://serine.umdnj.edu/~ouyangmi/cgi-bin/zebrafish/profile.htm) for researchers. One of the challenges in analyzing microarray data is prioritizing and validating positives for further study; our web resource allows researchers to refine their microarray data more quickly. In addition, the database provides “virtual data” for those interested in studying groups of genes with distinct developmental expression profiles.

2. Results and Discussion

Our Perl/CGI-based software tool (http://serine.umdnj.edu/~ouyangmi/cgi-bin/zebrafish/profile.htm) provides an easy-to-use web-based platform for users to determine the developmental profile of their gene of interest. The tool has three basic functions. First, the user can determine which probe on the microarray chip corresponds to a DNA sequence of interest: the web tool looks up the closest matching probe(s) and reports the corresponding GenBank Accession Number(s). Second, the user can examine the developmental profile of a transcript by entering its GenBank Accession Number. The example in Figure 1 shows the developmental expression profile of BE201395, a gene with similarity to myosin regulatory light chain 2. The tool displays three plots: the raw intensities, the normalized intensities, and the intensities relative to β-actin. The scanned and digitized data are processed as follows: the local background intensity is subtracted from each spot intensity, giving the “Raw Intensities” (Figure 1A). The raw intensities are divided by the median intensity of the chip, giving the “Normalized Intensities” (Figure 1B). The normalized intensities are divided by the median of the chip’s β-actin intensities, giving the “Intensities Relative to β-actin” (Figure 1C). All three kinds of intensity are reported as the mean and standard error of the replicates in a time-course graphical display. Because β-actin expression increases slightly after 24 hpf, the intensities relative to β-actin differ slightly from the raw and normalized intensities. The search results also list the 30 most positively and the 30 most negatively correlated transcripts (Fig. 1 D). Third, the tool can perform a BLAST search of the NCBI nonredundant (nr) database (data not shown).
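The three intensity types and the replicate summary can be illustrated with a minimal Python sketch. The input format (dictionaries keyed by spot ID), the function names, and the choice to take the β-actin median on the normalized scale are assumptions made for illustration; this is not the code behind the web tool.

```python
import math
from statistics import mean, median, stdev

def process_chip(spot, background, actin_spots):
    """Convert one chip's spot and local-background intensities into the three
    intensity types displayed by the web tool.

    spot, background: dicts mapping a spot ID to its intensity (hypothetical format).
    actin_spots: IDs of the beta-actin control spots on the chip.
    """
    # Raw intensity: spot intensity minus its local background.
    raw = {s: spot[s] - background[s] for s in spot}
    # Normalized intensity: raw intensity divided by the chip median.
    chip_median = median(raw.values())
    normalized = {s: v / chip_median for s, v in raw.items()}
    # Intensity relative to beta-actin: normalized intensity divided by the median
    # of the beta-actin spots (assumed here to be taken on the normalized scale).
    actin_median = median(normalized[s] for s in actin_spots)
    relative = {s: v / actin_median for s, v in normalized.items()}
    return raw, normalized, relative

def mean_and_se(replicates):
    """Mean and standard error of one transcript's values across replicate chips."""
    m = mean(replicates)
    se = stdev(replicates) / math.sqrt(len(replicates)) if len(replicates) > 1 else 0.0
    return m, se
```

For one transcript, the per-chip values at a single time point (for example, the 13 chips at 48 hpf) would then be passed to `mean_and_se` to obtain the plotted mean and error bar.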

Figure 1
Web Tool “output” for zebrafish microarray analyses. To demonstrate the functions of the web resource, the expression profile of zgc:103639 (BE201395) retrieved from the web (http://serine.umdnj.edu/~ouyangmi/cgi-bin/zebrafish/profile.htm ...

In addition to obtaining expression profiles of individual genes, users can also use the web interface to obtain information about groups of genes with similar developmental expression profiles. By comparing the profiles of all genes on the microarray, genes expressed at similar times during development were clustered into 10 groups (called “nodes”) with the program Cluster 3.0 (de Hoon et al., 2004; Eisen et al., 1998), using K-means clustering with 1000 runs and an uncentered Pearson correlation metric. The results were visualized with the program Maple Tree (http://rana.lbl.gov/EisenSoftware.htm) and are displayed on the website (Fig. 2). For instance, genes in Node 2 are expressed at high levels during gastrulation and decline to basal levels after the first day of development, whereas genes in Node 5, such as the amylase α2a gene (BM103972), which encodes a pancreas-enriched enzyme (Fig. 3), are not expressed until late larval stages. BE201395, the myosin regulatory light chain 2-like gene profiled in Fig. 1A–C, belongs to Node 8, which represents genes with peak expression at 48 hpf. We anticipate that users will use node information to prioritize genes within their microarray datasets for further study.
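K-means clustering with an uncentered Pearson correlation metric can be sketched in NumPy. This is a simplified reimplementation under stated assumptions (random initial assignments, centroids taken as the mean profile of each cluster, and the run with the smallest summed within-cluster distance kept), not Cluster 3.0 itself; the function names are hypothetical, and every profile is assumed to have a nonzero norm.

```python
import numpy as np

def corr_distance(X, C):
    """1 - uncentered Pearson correlation between each row of X and each row of C.

    The uncentered correlation omits mean subtraction, so it equals the cosine
    similarity of the two profiles."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    Cn = C / np.linalg.norm(C, axis=1, keepdims=True)
    return 1.0 - Xn @ Cn.T

def centroids(X, labels, k, rng):
    """Mean profile of each cluster; an empty cluster is re-seeded with a random gene."""
    return np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                     else X[rng.integers(len(X))] for j in range(k)])

def kmeans_uncentered(X, k=10, n_runs=1000, max_iter=100, seed=0):
    """Repeat k-means n_runs times from random assignments and keep the run with
    the smallest summed within-cluster distance. X is genes x time points."""
    rng = np.random.default_rng(seed)
    best_labels, best_cost = None, np.inf
    for _ in range(n_runs):
        labels = rng.integers(k, size=len(X))          # random initial assignment
        for _ in range(max_iter):
            C = centroids(X, labels, k, rng)
            new_labels = corr_distance(X, C).argmin(axis=1)
            if np.array_equal(new_labels, labels):      # assignments stable: converged
                break
            labels = new_labels
        C = centroids(X, labels, k, rng)                # centroids matching final labels
        cost = corr_distance(X, C)[np.arange(len(X)), labels].sum()
        if cost < best_cost:
            best_labels, best_cost = labels, cost
    return best_labels
```

With a genes × time-points matrix of mean intensities, `kmeans_uncentered(X, k=10, n_runs=1000)` would mirror the 10-node, 1000-run setting described above. Because the uncentered correlation does not subtract each profile's mean, it is insensitive to overall scale but not to a constant offset, unlike the centered Pearson correlation.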

Figure 2
Gene Clusters Display. To identify genes expressed at similar times during development, K-means clustering was performed on the expression profiles of each gene in the array. The intensity values for each spot were normalized to the median intensity on ...
Figure 3
The expression profile of pancreatic enzyme gene, amylase α2a (BM103972) retrieved from the web (http://serine.umdnj.edu/~ouyangmi/cgi-bin/zebrafish/profile.htm). A. BM103972 raw intensities display. B. BM103972 normalized intensities display. ...

Investigators can use the website to obtain gene expression levels at developmental stages from 75% epiboly to 5 dpf. To validate the gene expression profiling, we chose 74 genes (34 genes whose names start with the letter a, and 40 genes chosen at random) and compared their expression profiles on the microarray with the in situ hybridization expression data reported by Thisse et al. (2004) and archived by ZFIN (http://zfin.org/cgi-bin/webdriver?MIval=aa-pubview2.apg&OID=ZDB-PUB-040907-1). We assigned a score of 1 to time points at which the expression profiles correlate, a score of 0 to those at which they differ, and no score (n) to time points with no in situ hybridization data (Table 1). For each gene, the summed scores were divided by the total number of time points. Of the genes in this survey, 80% show agreement between our microarray data and the in situ hybridization results of Thisse et al. (2004). Among these 74 genes, 10 have an average score less than or equal to 0.5. Further analysis of these 10 low-scoring genes revealed that 5 of them have more than one hit on the microarray chip, with E-values ranging from 0 to e−6, suggesting that these 5 genes (zgc:92191, zgc:56709, acvr2b, zgc:64120, zgc:77800) have high sequence similarity to other genes. The other 5 low-scoring genes (zgc:113518, zgc:101040, zgc:92530, zgc:92313, zgc:85659) either show weak expression in the in situ hybridization results or have probes that have not been validated. Of the 64 high-scoring genes, 34 (53%) have developmental expression profiles that correlate fully between the microarray and in situ hybridization data. For example, the microarray data indicate that a muscle-expressed gene related to myosin regulatory light chain 2 (zgc:103639; BE201395) is not expressed before the 18-somite stage (Fig. 1 A–C).
The in situ hybridization results also indicate that BE201395 is not expressed before 20 somites (http://zfin.org/cgi-bin/webdriver?MIval=aa-fxallfigures.apg&OID=ZDB-PUB-051025-1&fxallfig_probe_zdb_id=ZDB-EST-051107-63). In our array data, expression of BE201395 peaks at 2 dpf and is lower at 5 dpf (Fig. 1 A–C), which correlates well with the in situ hybridization data showing robust expression from late segmentation stages (20–25 somites) through the high-pec (42 hpf) and long-pec (48 hpf) stages, with potentially weaker expression at 5 dpf. Likewise, the array data indicate that the enzyme amylase α2a (BM103972) is first expressed at 4 dpf and increases substantially by 5 dpf (Fig. 3 A–C), which agrees with in situ hybridization data showing pancreas-specific expression at 5 dpf and no expression at early developmental stages. We chose these two examples because their expression patterns have been documented across all developmental stages; for many of the genes with ‘perfect’ scores in Table 1, the in situ hybridization database lacks expression patterns for later developmental stages (i.e., 5 dpf). We also compared the expression patterns of these 74 genes with those reported by Mathavan et al. (2005), who used the same Compugen oligo set for their microarrays, and found that 75% of the genes are well matched between the two data sets (Table 2). We then compared the 15 low-scoring genes (≤ 0.5) in Table 2 with those in Table 1; only one gene (acvr2b) scored low in both tables, suggesting that our web tool results are relatively reliable. In addition, we performed quantitative RT-PCR on a randomly selected set of genes and compared the results with data obtained using the web tool. Quantitative RT-PCR of β-actin (AF025305) showed a pattern similar to that of the web data, with slightly increased levels after 24 hpf (Fig. 4A).
The quantitative RT-PCR profiles of the selected Node 2 gene (BM104515, s100 calcium binding protein A1), Node 3 gene (AW421995, fat-free gene), Node 4 gene (BG738899, vacuolar protein sorting 26 homolog B), and Node 5 gene (AF254642, fatty acid binding protein 10, liver type) also parallel the microarray data, further validating our web resource (Fig. 4B–E). Furthermore, our web resource provides additional time points at later developmental stages (3 to 5 dpf) and displays easy-to-read virtual data for a gene of interest. We hope to add more data to this resource, and we invite other users of the Compugen oligo set to contact us by email so that we can determine whether their data can be added. In conclusion, we have established a user-friendly web-based resource that enables researchers to obtain information on zebrafish gene expression patterns during early development.

Figure 4
Comparison of array results with quantitative RT-PCR. Using the expression level of the 75% epiboly stage as reference, (A) the quantitative RT-PCR of β-actin (AF025305 ...
Table 1
Comparison of Microarray Data with ZFIN Expression Data
Table 2
Comparison of Microarray Data with Mathavan et al. (2005)
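The per-gene scoring behind Tables 1 and 2 can be expressed as a short sketch. The text does not make clear whether time points scored "n" count toward the denominator, so the hypothetical `count_missing` flag covers both readings (excluded by default); the function and gene names are illustrative only.

```python
from typing import Dict, List, Optional, Sequence

def agreement_score(scores: Sequence[Optional[int]],
                    count_missing: bool = False) -> Optional[float]:
    """Average agreement for one gene across developmental time points.

    Each entry is 1 (array and in situ profiles agree), 0 (they differ) or
    None (no in situ data, shown as 'n' in the tables). By default 'n' time
    points are left out of the denominator; count_missing=True includes them."""
    scored = [s for s in scores if s is not None]
    denominator = len(scores) if count_missing else len(scored)
    if denominator == 0:
        return None
    return sum(scored) / denominator

def low_scoring(genes: Dict[str, Sequence[Optional[int]]],
                cutoff: float = 0.5) -> List[str]:
    """Genes whose average score is at or below the cutoff (0.5 in the text)."""
    low = []
    for name, scores in genes.items():
        avg = agreement_score(scores)
        if avg is not None and avg <= cutoff:
            low.append(name)
    return low
```

For example, `low_scoring({"geneA": [1, 1, 1], "geneB": [1, 0, 0, None]})` flags only the hypothetical `geneB`, whose three scored time points average to 1/3.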

3. Experimental Procedure

3.1. Total RNA isolation

Zebrafish (Danio rerio) embryos were collected and staged from 75% epiboly to 5 days post-fertilization (dpf). Total RNA was extracted with TRIzol (Invitrogen).

3.2. Zebrafish Microarray

The custom gene expression microarrays are designed to monitor the mRNA levels of many genes in parallel. The zebrafish microarray platform used by the KCC/TJU for expression profiling is a single-color system similar to Affymetrix technology. The microarrays rely on hybridization of a biotin-labeled complementary RNA (cRNA) target to DNA oligonucleotide probes attached to a gel matrix.

3.3. Array Design

The commercial zebrafish oligo library from Compugen/Sigma-Genosys consists of 16,399 oligos (65-mers) representing 16,288 unique gene clusters. The oligos are modified with a 5′ amino C6 linker. The library includes 172 zebrafish β-actin oligos distributed over the entire library as internal controls (approximately 4 per 384-well plate). Amersham CodeLink™ activated glass slides were prepared by the vendor with a hydrophilic polymer containing N-hydroxysuccinimide ester groups, covalently attached to the silane basecoat. The oligos were resuspended in 50 mM phosphate buffer, pH 8.0, at 20 mM concentration. The individual oligo probes were printed on CodeLink™ activated slides at 45% humidity with a GeneMachine OmniGrid™ 100 Microarrayer, using a 4 × 12 pin configuration and a 20 × 20 spot layout in each subarray. The spot diameter was 100 μm, and the center-to-center spacing was 200 μm. The printed zebrafish microarrays were further immobilized by covalent coupling overnight at 70% humidity. Samples were hybridized after additional blocking with 100 mM Taurine/Bicine and washing in 4X SSC/0.1% SDS.
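The print layout implies a quick capacity check, sketched below under the assumption (usual for contact printers, but not stated in the text) that each of the 48 pins deposits one 20 × 20 subarray.

```python
# Capacity check for the printing layout described above.
# Assumption: each pin of the 4 x 12 print head deposits one 20 x 20 subarray.
PINS = 4 * 12
SPOTS_PER_SUBARRAY = 20 * 20
PITCH_UM = 200          # center-to-center spacing
SPOT_DIAMETER_UM = 100

capacity = PINS * SPOTS_PER_SUBARRAY
# 20 spots in a row span 19 pitches between the first and last centers,
# plus half a spot diameter beyond each outer center.
subarray_edge_um = (20 - 1) * PITCH_UM + SPOT_DIAMETER_UM

print(f"{capacity} spot positions; subarray edge about {subarray_edge_um / 1000} mm")
```

Under this assumption the slide offers 19,200 spot positions, which leaves room for every oligo in the 16,399-member library.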

3.4. Target Preparation Protocol

The biotin-labeled cRNA target is prepared by a linear amplification method. Poly(A) RNA from 5 μg of total RNA is primed for reverse transcription by a DNA oligonucleotide containing a T7 RNA polymerase promoter 5′ to a d(T)24 sequence. After second-strand cDNA synthesis, the cDNA serves as the template in an in vitro transcription (IVT) reaction to produce the target cRNA, and the IVT is performed in the presence of biotinylated nucleotides to label the cRNA. In detail, total RNA (5 μg) was incubated with T7 primer at 70°C for 10 minutes before addition of the Superscript reverse transcriptase reaction mix and further incubation at 37°C for 1 hour. The first-strand cDNA was added to the polymerase mix and incubated in a TropiCooler incubator at 16°C for 2 hours. Double-stranded cDNA was precipitated in isopropanol at −80°C for 20 min and centrifuged at 14,000 rpm for 20 min; the pellet was air-dried for 30 minutes and resuspended in ddH2O. In vitro transcription was performed using the Ambion MEGAscript T7 kit: biotinylated UTP, NTPs, and enzyme reaction mix were added to the dried cDNA and incubated at 37°C overnight. Synthesized cRNA was purified using the Qiagen RNeasy kit, and its concentration and quality were measured with an Eppendorf Biophotometer. Target labeling by the KCC/TJU standard procedure yields a 50- to 200-fold linear amplification of the input poly(A) RNA.

3.5. Hybridization Procedures and Parameters

For hybridization, 10 μg of each biotin-labeled cRNA target was applied to a KCC/TJU 17K Zebrafish oligo expression microarray on a TECAN HS4800 Hybridization Station. The microarrays were hybridized in 6X SSPE with 50% formamide at 37°C for 20 hours and washed in 0.75X TNT at 46°C for 1 hour, followed by blocking in TNB at room temperature for 30 minutes. The biotin-containing transcripts were then detected directly with a Streptavidin-Alexa 647 conjugate in TNB buffer (1:500) at room temperature for 30 minutes. Stained chips were washed in 1X TNT wash buffer at room temperature for 60 minutes, with the TNT buffer replaced with fresh buffer every 15 minutes.

3.6. Array Data processing

Processed chips were scanned with a PerkinElmer ScanArray® XL5000 (software version 3.1), with the laser set to 635 nm, power 90, a PMT setting of 70-50, and a scan resolution of 10 μm. Images were quantified with PerkinElmer QuantArray® Software 3.0 using the fixed-circle quantification method, with output given as total intensities; confidence was calculated as a weighted average. The median-normalized data from control and test samples were compared as fold changes in Excel spreadsheets to identify differentially expressed genes.

3.7. Web-based resource establishment

The profiling of zebrafish development was conducted at seven time points: 75% epiboly, 18 somites, 24 hpf, 48 hpf, 3 dpf, 4 dpf, and 5 dpf. The numbers of independent RNA samples hybridized to microarrays at these time points were 3, 4, 8, 13, 11, 8, and 8, respectively. For each of these 55 microarrays, the scanned and digitized data were processed as follows. First, the local background intensity was subtracted from each spot intensity, giving the raw intensities. Second, the raw intensities were divided by the median intensity of the chip, giving the normalized intensities. Third, the normalized intensities were divided by the median of the β-actin intensities, giving the intensities relative to β-actin. Finally, the replicates at each time point were combined by calculating the mean and standard error for each transcript. We constructed a web tool to facilitate access to and visualization of the data. The tool is implemented in HTML/Perl/CGI and is available at http://serine.umdnj.edu/~ouyangmi/cgi-bin/zebrafish/profile.htm. In addition to cluster analysis, we computed Pearson correlations between gene time-course profiles to capture co-regulation across developmental stages. Positively correlated genes have expression patterns similar to that of the query gene, whereas negatively correlated genes have opposite expression patterns.
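The ranking of positively and negatively correlated transcripts can be sketched as follows. The input (a dictionary mapping accession numbers to seven-point mean-intensity profiles), the function name, and the handling of flat profiles are assumptions; the text does not specify which intensity type is correlated.

```python
import numpy as np

def top_correlated(profiles, query_id, n=30):
    """Rank transcripts by Pearson correlation with a query time-course profile.

    profiles: dict mapping an accession number to its mean intensities at the seven
    time points (75% epiboly, 18 somites, 24 hpf, 48 hpf, 3 dpf, 4 dpf, 5 dpf).
    Returns the n most positively and the n most negatively correlated transcripts
    as lists of (accession, r).
    """
    ids = [i for i in profiles if i != query_id]
    X = np.array([profiles[i] for i in ids], dtype=float)
    q = np.asarray(profiles[query_id], dtype=float)
    # Pearson r = dot product of the mean-centered profiles divided by their norms.
    Xc = X - X.mean(axis=1, keepdims=True)
    qc = q - q.mean()
    with np.errstate(divide="ignore", invalid="ignore"):
        r = (Xc @ qc) / (np.linalg.norm(Xc, axis=1) * np.linalg.norm(qc))
    # Flat profiles give an undefined r (0/0); drop them before ranking.
    ranked = sorted(((i, float(v)) for i, v in zip(ids, r) if np.isfinite(v)),
                    key=lambda pair: pair[1])
    return ranked[::-1][:n], ranked[:n]
```

Calling `top_correlated(profiles, "BE201395")` with a suitable `profiles` dictionary would return two lists of 30 entries, analogous to the lists shown in Fig. 1D.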

3.8. Quantitative PCR

The SYBR Green master mix reagents (Applied Biosystems) were used on the ABI PRISM 7900HT or the DNA Engine OPTICON according to the manufacturers’ instructions. The transcripts were amplified using the following forward and reverse primers: AF025305: 5′-ATGGATGATGAAATTGCCGC-3′ and 5′-AGTTGGTGACAATACCGTGC-3′; BM104515: 5′-TACAAGCTAAGCAAATCTGAGCT-3′ and 5′-ATTGCATGCCACAGTGAGA-3′; AW421995: 5′-ATGAGTTCAGCCACAACAC-3′ and 5′-AAAGTCATTCTTCATTTTTCTG-3′; BG738899: 5′-ACCTCATTGTTCATCAGCTG-3′ and 5′-ACCTCATTGTTCATCAGCTG-3′; AF254642: 5′-ATCCAGCAGAACGGCAGC-3′ and 5′-ACGATGCACTTGAGCTTCTT-3′. The threshold cycle (CT value) was determined for each transcript at each designated time point. The relative expression level of each transcript was calculated by the comparative CT method (Livak and Schmittgen, 2001).
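The comparative CT calculation can be sketched as below, with 75% epiboly as the calibrator as in Figure 4. The text does not name an endogenous reference gene, so `ct_reference` is a hypothetical optional input: with it the sketch computes 2^−ΔΔCT, and without it only the difference from the calibrator is used. The method assumes similar, near-100% amplification efficiencies.

```python
from typing import Dict, Optional

def relative_expression(ct_target: Dict[str, float],
                        calibrator: str,
                        ct_reference: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Fold change of a transcript at each time point relative to the calibrator.

    ct_target: mean CT of the transcript at each time point.
    ct_reference: mean CT of an endogenous control (hypothetical) at each time point.
    """
    if ct_reference is None:
        delta = dict(ct_target)                                         # no reference gene
    else:
        delta = {t: ct_target[t] - ct_reference[t] for t in ct_target}  # dCT
    # ddCT = dCT(time point) - dCT(calibrator); fold change = 2^-ddCT
    return {t: 2.0 ** -(delta[t] - delta[calibrator]) for t in delta}
```

Each cycle of CT difference corresponds to a twofold change, so a transcript whose normalized CT falls by 2 cycles relative to 75% epiboly is reported as a 4-fold increase.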


The authors thank the Microarray Facility at the Kimmel Cancer Center (Thomas Jefferson University) for the microarray experiments (Drs. Chang-Gong Liu and Jennifer Ma) and Jennifer Anderson at the Carnegie Institution for helpful comments on the manuscript. This work was supported by an American Heart Association SDG award (0730083N) to S.Y.H., a US National Institutes of Health grant (DK060369) to S.A.F., and a March of Dimes grant (1FY05-118) to S.L.A. T.M.H. was supported by an American Cancer Society Postdoctoral Fellowship, and A.T.G. was supported by the UC Berkeley Center for Integrative Genomics.


Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.


  • Covassin L, Amigo JD, Suzuki K, Teplyuk V, Straubhaar J, Lawson ND. Global analysis of hematopoietic and vascular endothelial gene expression by tissue specific microarray profiling in zebrafish. Dev Biol. 2006;299:551–62.
  • de Hoon MJL, Imoto S, Nolan J, Miyano S. Open Source Clustering Software. Bioinformatics. 2004;20(9):1453–1454.
  • Eisen MB, Spellman PT, Brown PO, Botstein D. Cluster analysis and display of genome-wide expression patterns. PNAS. 1998;95:14863–14868.
  • Kassen SC, Ramanan V, Montgomery JE, TBurket C, Liu CG, Vihtelic TS, Hyde DR. Time course analysis of gene expression during light-induced photoreceptor cell death and regeneration in albino zebrafish. Dev Neurobiol. 2007;67:1009–31.
  • Kreiling JA, Creton R, Reinisch C. Early Embryonic Exposure to Polychlorinated Biphenyls Disrupts Heat-Shock Protein 70 Cognate Expression in Zebrafish. J Toxicol Environ Health A. 2007;70:1005–1013.
  • Lee NH, Saeed AI. Microarray: an overview. Methods Mol Biol. 2007;353:265–300.
  • Linney E, Dobbs-McAuliffe B, Sajadi H, Malek RL. Microarray gene expression profiling during the segmentation phase of zebrafish development. Comp Biochem Physiol C Toxicol Pharmacol. 2004;138(3):351–62.
  • Livak KJ, Schmittgen TD. Analysis of relative gene expression data using real-time quantitative PCR and the 2(-ΔΔC(T)) Method. Methods. 2001;25:402–408.
  • Mathavan S, Lee SG, mark A, Miller LD, Murthy KR, Tong Y, Wu YL, Lam SH, Yang H, Ruan Y, Korzh V, Gong Z, Liu ET, Lufkin T. Transcriptome analysis of zebrafish embryogenesis using microarrays. PLoS Genet. 2005;1:260–76.
  • Robu ME, Larson JD, Nasevicius A, Beiraghi S, Brenner C, Farber SA, Ekker SC. p53 activation by knockdown technologies. PLoS Genet. 2007;3(5):e78.
  • Sumanas S, Jorniak T, Lin S. Identification of novel vascular endothelial-specific genes by the microarray analysis of the zebrafish cloche mutants. Blood. 2005;106:534–541.
  • Thisse B, Heyer V, Lux A, Alunni A, Degrave A, Seiliez I, Kirchner J, Parkhill JP, Thisse C. Spatial and Temporal Expression of the Zebrafish Genome by Large-Scale In Situ Hybridization Screening. Meth Cell Biol. 2004;77:505–519.