GEO DataSet (GDS) A vs B query tool


Purpose: To help identify gene profiles that display marked differences in expression level between two subsets of experimental factors (e.g., tissue, strain, time, dose).

Caveat:
  • The t-test is a well-established statistical method for assessing whether the means of two sets of data really differ. Please refer to any basic statistics textbook or search the web for more detail on the method. The t-test makes basic assumptions about the data, so results may be wrong or misleading when those assumptions do not hold.
  • The "mean group A vs B" comparison is perhaps the most rudimentary means of filtering data. Retrievals may have no statistical significance, and compared subsets may be too small to provide any statistical value (e.g., singletons).


Selecting groups: As a first step, it is extremely helpful to view a graphic representation of the subset groupings. To do this, click on the 'Value distribution' chart for this DataSet (from the Analysis button); the bars along the bottom of the chart provide an overview of the subset groupings. Then select the subsets for group A (left) to compare with those for group B (right). There are a few ways to select elements (Samples) in each group:
  • Selecting subsets that have no intersecting elements will create a union of the elements into one group.
  • Selecting subsets that share Samples will use only the Samples they have in common (the intersection), not all elements.
  • You may further limit Samples in groups by unchecking the boxes for any groups or Samples you do not wish to include.
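The grouping rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tool's actual implementation: the function name build_group, the sample labels, and the handling of a mix of overlapping and non-overlapping subsets (here, the intersection is used only when all selected subsets share Samples) are hypothetical.

```python
def build_group(selected_subsets, unchecked=frozenset()):
    """Combine the selected subsets (sets of sample labels) into one group.

    Assumption: if all selected subsets share Samples, only the shared
    Samples are used (intersection); otherwise all Samples are pooled (union).
    """
    subsets = [set(s) for s in selected_subsets]
    if not subsets:
        return set()
    shared = set.intersection(*subsets)
    group = shared if shared else set.union(*subsets)
    # Samples whose boxes were unchecked are removed last.
    return group - set(unchecked)


# Hypothetical sample labels, for illustration only.
disjoint = build_group([{"S1", "S2"}, {"S3"}])
overlapping = build_group([{"S1", "S2"}, {"S2", "S3"}])
limited = build_group([{"S1", "S2"}, {"S3"}], unchecked={"S3"})
```

Here disjoint pools all three Samples, overlapping keeps only the shared Sample, and limited drops the unchecked Sample from the pooled group.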
Method: The user sets the criterion as either a t-test significance level or a mean group fold difference between log values or rank values. The t-test score or the mean value for each group is then calculated. Null or absent elements are ignored. If both groups are empty, the profile is skipped. If only one group is empty, its value is assumed to be zero for the mean group fold calculation (i.e., the profile still participates in the criterion). The t-test requires at least 2 samples in each group. Only profiles that pass the user criteria are presented. There is no way to know a priori which filter will provide meaningful results, or whether meaningful results will be obtained at all. The result set may be empty if no profiles pass the criteria.
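The handling of null values and empty groups described above can be sketched as follows. This is a minimal illustration, not the tool's code. Several points are assumptions: the function names are hypothetical, nulls are represented as None or NaN, the fold criterion is approximated as an absolute difference between group means of log values, and Welch's form of the t statistic is used because this page does not say which variant the tool applies.

```python
import math
from statistics import mean, variance


def clean(values):
    """Drop null or absent measurements (None or NaN)."""
    return [v for v in values if v is not None and not math.isnan(v)]


def mean_difference(group_a, group_b):
    """Difference of group means, or None if the profile must be skipped."""
    a, b = clean(group_a), clean(group_b)
    if not a and not b:
        return None  # both groups empty: the profile is skipped
    mean_a = mean(a) if a else 0.0  # an empty group is treated as zero
    mean_b = mean(b) if b else 0.0
    return mean_a - mean_b


def passes_mean_filter(group_a, group_b, min_difference):
    """True if the absolute mean difference reaches the user threshold."""
    diff = mean_difference(group_a, group_b)
    return diff is not None and abs(diff) >= min_difference


def welch_t(group_a, group_b):
    """Welch's t statistic, or None if either group has fewer than 2 values."""
    a, b = clean(group_a), clean(group_b)
    if len(a) < 2 or len(b) < 2:
        return None
    std_err = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if std_err == 0:
        return None  # no variation in either group: statistic is undefined
    return (mean(a) - mean(b)) / std_err
```

Turning the t statistic into a significance level requires the t distribution, which is left out here to keep the sketch dependent on the standard library only.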

Additional tools: You might also consider using the hierarchical and k-means clustering tools to identify profiles of interest - you can find them on this page under the "Analysis" button. In addition, the "Profile neighbors" link (found at the top right of retrievals) will return other profiles within that DataSet that exhibit similar or opposite expression patterns, as measured by Pearson correlation coefficients.
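To illustrate how a Pearson correlation coefficient separates similar from opposite expression patterns, here is a small sketch. It is not the "Profile neighbors" implementation; the function name and the example values are hypothetical, and both profiles are assumed to be measured over the same Samples in the same order.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles, or None if undefined."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if dev_x == 0 or dev_y == 0:
        return None  # a flat profile has no defined correlation
    return cov / (dev_x * dev_y)


# Made-up values for illustration only.
similar = pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])   # close to +1
opposite = pearson([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])  # close to -1
```

Coefficients near +1 indicate profiles that rise and fall together, and coefficients near -1 indicate profiles that move in opposite directions.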

We plan to add more mining features in the future.
If you have any questions regarding this feature, please e-mail geo@ncbi.nlm.nih.gov.