Expression Resources


Sample size and confidence in expression data


Sample User Question

Having used the Testis vs. Whole Brain data from UniGene (previous questions in this exercise set are accessible from the exercises page), we may ask whether we have enough independent samples in our data to be confident in our results. Do we? A similar data set, generated on the single-channel Affymetrix Whole Mouse Genome (MG-U74A) chip and comparing mouse testis to mouse brain, can be found in GEO DataSet GDS182. Are there enough independent samples of these two tissues in this so-called Large-Scale Mouse Transcriptome experiment to be confident of the statistical significance of our results?

Step By Step Guide

In the UniGene DDD experiment, our comparison uses 7 cDNA pools for Testis and 27 cDNA pools for Whole Brain. Thus, df = (7 Testis samples + 27 Whole Brain samples) - 2 tissue types (Testis and Brain) = 34 - 2 = 32. With df = 32, we have more than enough independent samples to give us a high level of confidence in our findings.

The Large-Scale Mouse Transcriptome data described in GEO are another matter, however. In GDS182 there are no more than 2 independent samples for each of 45 tissues. A pairwise comparison of Testis and Brain (here the cerebellum, as no whole-brain sample is available in this experiment) therefore has df = (2 + 2) - 2 = 2, clearly not a large enough value. In fact, by this simple rule of thumb the value of the whole experiment is questionable. Nonetheless, examining some key gene profiles, such as that of Protamine 1 (Prm1), shows what appears to be tissue-specific expression, along with a number of unexplained data calls for absent signals in the other tissues.
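The degrees-of-freedom arithmetic above can be sketched in Python. This is a minimal illustration that assumes the comparison is a pooled (equal-variance) two-sample Student's t-test, which is the test the (n1 + n2) - 2 formula corresponds to; the page itself does not name a specific test. The helper names (`pooled_df`, `t_pdf`, `t_cdf`, `t_critical`) are hypothetical. To stay within the standard library, the sketch finds the two-sided 5% critical value of t by numerically integrating the t density and bisecting, rather than calling a statistics package. The comparison makes the weakness of df = 2 concrete: the observed difference must be roughly twice as large, in standard-error units, to reach significance.

```python
import math


def pooled_df(n1: int, n2: int) -> int:
    """Degrees of freedom for a pooled two-sample t-test: (n1 + n2) - 2."""
    return n1 + n2 - 2


def t_pdf(t: float, df: int) -> float:
    """Student's t probability density with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def t_cdf(t: float, df: int, steps: int = 2000) -> float:
    """P(T <= t) for t >= 0: 0.5 plus the density integrated from 0 to t (Simpson's rule)."""
    h = t / steps
    area = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + area * h / 3


def t_critical(df: int, alpha: float = 0.05) -> float:
    """Two-sided critical value: the t with P(T <= t) = 1 - alpha/2, found by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < 1 - alpha / 2:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    comparisons = [
        ("UniGene DDD: testis vs. whole brain", 7, 27),
        ("GDS182: testis vs. cerebellum", 2, 2),
    ]
    for label, n1, n2 in comparisons:
        df = pooled_df(n1, n2)
        print(f"{label}: df = {df}, critical |t| = {t_critical(df):.2f}")
    # UniGene DDD: testis vs. whole brain: df = 32, critical |t| = 2.04
    # GDS182: testis vs. cerebellum: df = 2, critical |t| = 4.30
```

With only two replicates per tissue, a gene must differ by more than four standard errors to be called significant at the 5% level, versus about two with df = 32. In practice one would use a statistics library (for example SciPy) instead of hand-rolled integration.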



Revised 07/24/2007