
MINiML (MIAME Notation in Markup Language)

What is MINiML?

MINiML (MIAME Notation in Markup Language, pronounced 'minimal') is a data exchange format optimized for microarray gene expression data, as well as many other types of high-throughput molecular abundance data. MINiML assumes only very basic relations between objects: Platform (e.g., array), Sample (e.g., hybridization), and Series (experiment). MINiML captures all components of the MIAME checklist, as well as any additional information that the submitter wants to provide. MINiML uses XML Schema as syntax.
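
The three object types and the references between them can be sketched in a few lines of Python. This is a minimal illustration, not a valid submission: the namespace declaration and most required elements are omitted, the `iid` values are made up, and the `ref` attribute name on `Platform-Ref` and `Sample-Ref` is an assumption that should be checked against the Schema definition.

```python
import xml.etree.ElementTree as ET

def build_skeleton():
    """Build a tiny MINiML-like tree: one Platform, one Sample, one Series."""
    root = ET.Element("MINiML")  # namespace declaration omitted for brevity
    platform = ET.SubElement(root, "Platform", {"iid": "P1"})
    ET.SubElement(platform, "Title").text = "FHCRC Mouse 15K v1.0"
    # A Sample (hybridization) points at the Platform it was run on.
    sample = ET.SubElement(root, "Sample", {"iid": "S1"})
    ET.SubElement(sample, "Title").text = "Muscle_exercised_60min_rep2"
    ET.SubElement(sample, "Platform-Ref", {"ref": "P1"})  # "ref" is assumed
    # A Series (experiment) groups the Samples that belong to one study.
    series = ET.SubElement(root, "Series", {"iid": "E1"})
    ET.SubElement(series, "Title").text = "Example study"
    ET.SubElement(series, "Sample-Ref", {"ref": "S1"})  # "ref" is assumed
    return root

print(ET.tostring(build_skeleton(), encoding="unicode"))
```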

The MINiML XML Schema definition is available.

GEO supports both data submissions and data retrievals in MINiML:

  • All GEO data can be downloaded as MINiML files from our FTP site.
  • MINiML files can be uploaded using our batch submission page.

Why another data exchange format?

GEO has been using SOFT (Simple Omnibus Format in Text) as a data exchange format. An advantage of SOFT is its simplicity, which makes it suitable for parsing and generation by virtually any text-manipulating language. However, excellent tools now exist that support XML formats programmatically and provide better document structure, syntax definitions, and data rendering. MINiML is effectively an XML rendering of SOFT.

GEO fully supports both SOFT and MINiML.

MAGE-ML (Microarray Gene Expression - Markup Language) is another data exchange format. MINiML and MAGE-ML are both defined in XML and allow capture of MIAME information, but are not otherwise directly related. Most notably, MINiML is a stand-alone XML Schema definition, whereas MAGE-ML is a DTD definition generated automatically from an object model (MAGE-OM). MAGE-ML can structure data in a variety of ways and is most suitable when MAGE-OM is the object model of the underlying database.

We welcome all comments and feedback (please email geo@ncbi.nlm.nih.gov).

MINiML Elements and Content Guidelines

The table below provides content guidelines and constraints for most MINiML elements; it is not exhaustive.

Updates: Every required element must be included in update files, even if it has not changed. You can safely omit optional elements that have not changed.
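
Before uploading an update file, the rule above can be checked mechanically. The sketch below lists the required top-level elements for Platform and Series as given in the guidelines on this page, and reports any that are absent. It assumes the elements are direct children of the entity and that the file is parsed without a namespace; both are simplifications, and the function name `missing_required` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Required elements as listed in the content guidelines on this page.
REQUIRED = {
    "Platform": ["Title", "Distribution", "Technology", "Organism",
                 "Manufacturer", "Manufacture-Protocol", "Data-Table"],
    "Series": ["Title", "Summary", "Type", "Overall-Design", "Sample-Ref"],
}

def missing_required(entity):
    """Return the required child element names absent from a Platform or Series."""
    present = {child.tag for child in entity}
    return [name for name in REQUIRED.get(entity.tag, []) if name not in present]

# An update that changes only the Title still has to repeat the other
# required elements; this one leaves two of them out.
update = ET.fromstring(
    "<Series iid='E1'>"
    "<Title>Example study, revised title</Title>"
    "<Summary>Unchanged summary, repeated because it is required.</Summary>"
    "<Sample-Ref ref='S1'/>"
    "</Series>"
)
print(missing_required(update))  # ['Type', 'Overall-Design']
```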

Please refer to the complete Schema definition, or the following example MINiML submission files (tables truncated to 20 rows) for more information:

Human Genomic Data Submitted to Unrestricted-Access Repositories

NIH-funded studies: If you plan to submit large-scale human genomic data, as defined by the NIH Genomic Data Sharing (GDS) Policy, to be maintained in an unrestricted-access NCBI database, NIH expects you to 1) submit an Institutional Certification to assure that the data submission and expectations defined in the NIH GDS Policy have been met, and 2) register the study in NCBI BioProject regardless of where the data will ultimately reside (e.g., GenBank, SRA, GEO). If you have any questions about whether your research is subject to the NIH GDS Policy, please contact the relevant NIH Program Official and/or the Genomic Program Administrator. If you plan to submit genomic data from human specimens that would not be considered large-scale, it is your responsibility to ensure that the submitted information does not compromise participant privacy and is in accord with the original consent, in addition to all applicable laws, regulations, and institutional policies.

Non-NIH-funded studies: If your study is not NIH-funded, you are not required to comply with the GDS Policy, but you must have the appropriate consent/permission to submit the data to a public database like GEO. GEO is not able to help interpret your consent forms; consult your IRB on that. It is your responsibility to ensure that the submitted information does not compromise participant privacy and is in accord with the original consent, in addition to all applicable laws, regulations, and institutional policies. If you do not have consent to make the data fully public in a database like GEO, you can apply to the NIH Office of Science Policy to find an NIH Institute that will sponsor your study in NCBI's dbGaP database. dbGaP has controlled-access mechanisms and is an appropriate resource for hosting sensitive patient data. The sponsor would create a Data Access Request and Use Certification and define the use restrictions applied when approving data access requests.

Platform

Element name | Number of allowed labels | Allowed values and constraints | Content guidelines for submitters
Title | required | string of length 1-120 characters; must be unique within the local file and across all Platforms previously submitted by that submitter | Provide a unique title that describes your Platform. We suggest that you use the system '[institution/lab][species][number of features][version]', e.g., "FHCRC Mouse 15K v1.0".
Distribution | required | commercial, non-commercial, custom-commercial, or virtual | Microarrays are 'commercial', 'non-commercial', or 'custom-commercial' in accordance with how the array was manufactured. Use 'virtual' only if creating a virtual definition for MS, MPSS, SARST, or RT-PCR data.
Technology | required | spotted DNA/cDNA, spotted oligonucleotide, in situ oligonucleotide, antibody, tissue, SARST, RT-PCR, MS, or MPSS | Select the category that best describes the Platform technology.
Organism | required and unbounded | use standard NCBI Taxonomy nomenclature | Identify the organism(s) from which the features on the Platform were designed or derived.
Manufacturer | required | any | Provide the name of the company, facility, or laboratory where the array was manufactured or produced.
Manufacture-Protocol | required | any | Describe the array manufacture protocol. Include as much detail as possible, e.g., clone/primer set identification and preparation, strandedness/length, arrayer hardware/software, spotting protocols. Please provide complete protocol descriptions within your submission.
Catalog-Number | optional | any | Provide the manufacturer catalog number for commercially available arrays.
Web-Link | optional and unbounded | valid URL | Specify a Web link that directs users to supplementary information about the array. Please restrict to Web sites that you know are stable.
Support | optional | any | Provide the surface type of the array, e.g., glass, nitrocellulose, nylon, silicon, unknown.
Coating | optional | any | Provide the coating of the array, e.g., aminosilane, quartz, polysine, unknown.
Description | optional | any | Provide any additional descriptive information not captured in another field, e.g., array and/or feature physical dimensions, element grid system.
Contributor-Ref | optional and unbounded | | List all people associated with this array design.
Pubmed-ID | optional and unbounded | an integer | Specify a valid PubMed identifier (PMID) that references a published article that describes the array.
Data-Table | required | a plain text (ASCII) tab-delimited table | Data-Tables can be supplied either within the MINiML file (Internal-Data) or as external files (External-Data). External-Data files should be zipped or tarred together with the MINiML file at the time of submission. A full description of Platform data tables, required columns, content, and restrictions is provided in the Platform data table guidelines. One difference to note is that data tables do not have headers in MINiML files; table columns are defined by position.
Supplementary-Data | optional and unbounded | a link or path to supplementary data | Examples of Platform supplementary data include original GAL and CSV files. Supplementary files can be zipped or tarred together with the MINiML file at the time of submission.
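
Several of the Platform constraints above are simple enough to check before submission. The following sketch tests the Title length and uniqueness rule and the enumerated values for Distribution and Technology. It works on a plain dictionary of field values rather than on a parsed MINiML file, and the function name `check_platform` and the `existing_titles` parameter are hypothetical; it does not replace validation against the Schema definition.

```python
# Allowed values copied from the Platform guidelines above.
DISTRIBUTIONS = {"commercial", "non-commercial", "custom-commercial", "virtual"}
TECHNOLOGIES = {
    "spotted DNA/cDNA", "spotted oligonucleotide", "in situ oligonucleotide",
    "antibody", "tissue", "SARST", "RT-PCR", "MS", "MPSS",
}

def check_platform(fields, existing_titles=()):
    """Return a list of human-readable problems found in Platform field values."""
    problems = []
    title = fields.get("Title", "")
    if not 1 <= len(title) <= 120:
        problems.append("Title must be 1-120 characters long")
    if title in existing_titles:
        problems.append("Title must be unique among this submitter's Platforms")
    if fields.get("Distribution") not in DISTRIBUTIONS:
        problems.append("Distribution is not one of the allowed values")
    if fields.get("Technology") not in TECHNOLOGIES:
        problems.append("Technology is not one of the allowed values")
    return problems

platform = {
    "Title": "FHCRC Mouse 15K v1.0",
    "Distribution": "non-commercial",
    "Technology": "spotted DNA/cDNA",
}
print(check_platform(platform))  # []
```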

Sample

Element name | Number of allowed labels | Allowed values and constraints | Content guidelines for submitters
Title | required | string of length 1-120 characters; must be unique within the local file and across all Samples previously submitted by that submitter | Provide a unique title that describes this Sample. We suggest that you use the system [biomaterial]-[condition(s)]-[replicate number], e.g., Muscle_exercised_60min_rep2.
Channel-Count | required | nomenclature | State the number of channels in the experiment, e.g., two-color hybridizations are typically 2-channel, Affymetrix hybridizations are typically 1-channel.
Source | required per channel | any | Briefly identify the biological material and the experimental variable(s) for this Sample, e.g., vastus lateralis muscle, exercised, 60 min.
Organism | required and unbounded per channel | use standard NCBI Taxonomy nomenclature | Identify the organism(s) from which the biological material was derived.
Characteristics | required per channel | any | List all available characteristics of the biological source, e.g., Strain: C57BL/6; Gender: female; Age: 45 days; Tissue: bladder tumor; Tumor stage: Ta.
Biomaterial-Provider | optional per channel | any | Specify the name of the company, laboratory, or person that provided the biological material.
Treatment-Protocol | optional per channel | any | Describe any treatments applied to the biological material prior to extract preparation. Please provide complete protocol descriptions within your submission.
Growth-Protocol | optional per channel | any | Describe the conditions that were used to grow or maintain organisms or cells prior to extract preparation. Please provide complete protocol descriptions within your submission.
Molecule | required per channel | total RNA, polyA RNA, cytoplasmic RNA, nuclear RNA, genomic DNA, protein, or other | Specify the type of molecule that was extracted from the biological material.
Extract-Protocol | optional per channel | any | Describe the protocol used to isolate the extract material. Please provide complete protocol descriptions within your submission.
Label | required per channel | any | Specify the compound used to label the extract, e.g., biotin, Cy3, Cy5, 33P.
Label-Protocol | optional per channel | any | Describe the protocol used to label the extract. Please provide complete protocol descriptions within your submission.
Hybridization-Protocol | optional | any | Describe the protocols used for hybridization, blocking, and washing, and any post-processing steps such as staining. Please provide complete protocol descriptions within your submission.
Scan-Protocol | optional | any | Describe the scanning and image acquisition protocols, hardware, and software. Please provide complete protocol descriptions within your submission.
Data-Processing | required | any | Provide details of how data in the VALUE column of your table were generated and calculated, i.e., normalization method, data selection procedures and parameters, transformation algorithm, and scaling parameters (e.g., MAS5.0, scaled to 100).
Description | required | any | Include any additional information not provided in the other fields, or paste in broad descriptions that cannot be easily dissected into the other fields.
Platform-Ref | required | a valid Platform identifier | Reference the Platform iid upon which this hybridization was performed.
Data-Table | required | a plain text (ASCII) tab-delimited table | Data-Tables can be supplied either within the MINiML file (Internal-Data) or as external files (External-Data). External-Data files should be zipped or tarred together with the MINiML file at the time of submission. A full description of Sample data tables, required columns, content, and restrictions is provided in our Web submission guide. One difference to note is that data tables do not have headers in MINiML files; table columns are defined by position.
Supplementary-Data | required | a reference to supplementary data, or type="none" | Examples of Sample supplementary data include original GPR, CEL, EXP, RPT, CAB, and TIFF files. Supplementary files should be zipped or tarred together with the MINiML file at the time of submission. Provision of supplementary raw data files facilitates the unambiguous interpretation of data and potential verification of conclusions, as set forth in the MIAME guidelines.
Anchor | required for SAGE Samples | NlaIII or Sau3A | Supply for SAGE submissions only. State the enzyme anchor.
Type | required for SAGE Samples | RNA, genomic, protein, SAGE, MPSS, SARST, mixed | Supply for SAGE submissions only (this field is derived automatically for other Sample types using the Molecule field).
Tag-Count | required for SAGE Samples | an integer | Supply for SAGE submissions only. State the total number of tags quantified in this Sample.
Tag-Length | required for SAGE Samples | an integer | Supply for SAGE submissions only. State the base pair length of the SAGE tags, excluding the anchor sequence.
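
Because MINiML data tables carry no header row, a reader has to supply the column names itself and match them to fields by position. The sketch below shows one way to do that for a tab-delimited table. The column names are passed in by the caller; `VALUE` is the column mentioned in the guidelines above, while `ID_REF` and the function name `read_positional_table` are assumptions made for the example.

```python
import csv
import io

def read_positional_table(text, column_names):
    """Parse a headerless tab-delimited table, naming columns by position."""
    reader = csv.reader(io.StringIO(text, newline=""),
                        delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for line_number, fields in enumerate(reader, start=1):
        if not fields:  # skip blank lines
            continue
        if len(fields) != len(column_names):
            raise ValueError(
                f"line {line_number}: expected {len(column_names)} columns, "
                f"found {len(fields)}"
            )
        rows.append(dict(zip(column_names, fields)))
    return rows

table_text = "1\t0.52\n2\t-1.30\n"
rows = read_positional_table(table_text, ["ID_REF", "VALUE"])
print(rows[1]["VALUE"])  # -1.30
```

Values are kept as strings here; converting `VALUE` to a number is left to the caller, since tables may contain empty or non-numeric cells.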

Series

Element name | Number of allowed labels | Allowed values and constraints | Content guidelines for submitters
Title | required | string of length 1-120 characters; must be unique within the local file and across all Series previously submitted by that submitter | Provide a unique title that describes the overall study.
Summary | required | any | Summarize the goals and objectives of this study. The abstract from the associated publication may be suitable.
Type | required | any | Enter keyword(s) that generally describe the type of study. Examples include: time course, dose response, comparative genomic hybridization, ChIP-chip, cell type comparison, disease state analysis, stress response, genetic modification, etc.
Overall-Design | required | any | Provide a brief description of the experimental design. Indicate how many Samples are analyzed, whether replicates are included, and whether there are control and/or reference Samples, dye-swaps, etc.
Pubmed-ID | optional and unbounded | an integer | Specify a valid PubMed identifier (PMID) that references a published article describing this study. Most commonly, this information is not available at the time of submission; it can be added later once the data are published.
Web-Link | optional and unbounded | valid URL | Specify a Web link that directs users to supplementary information about the study. Please restrict to Web sites that you know are stable.
Contributor-Ref | optional and unbounded | | List all people associated with this study.
Sample-Ref | required and unbounded | valid Sample identifiers | Reference the Sample iids that make up this experiment.
Variable (with Factor, Description, Sample-Ref) | optional and unbounded | Allowed 'Factors' include: dose, time, tissue, strain, gender, cell line, development stage, age, agent, cell type, infection, isolate, metabolism, shock, stress, temperature, specimen, disease state, protocol, growth protocol, genotype/variation, species, individual, or other | Indicate and describe the variable type(s) investigated in this study. NOTE: this information does not appear in Series records or downloads, but will be used to assemble corresponding GEO DataSet records.
Repeats (with Factor, Sample-Ref) | optional and unbounded | Allowed 'Factors' include: biological replicate, technical replicate - extract, technical replicate - labeled-extract | Indicate the repeat type(s). NOTE: this information does not appear in Series records or downloads, but will be used to assemble corresponding GEO DataSet records.
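
Two of the Series rules lend themselves to an automated check: every Sample-Ref should point at a Sample defined in the file, and Variable and Repeats factors should come from the allowed lists. The sketch below is illustrative only. It assumes that references use a `ref` attribute, that Factor is a child element of Variable and Repeats, and that the file is parsed without a namespace; none of these details is stated on this page, so check them against the Schema definition. The function name `check_series` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Allowed factors copied from the Series guidelines above.
VARIABLE_FACTORS = {
    "dose", "time", "tissue", "strain", "gender", "cell line",
    "development stage", "age", "agent", "cell type", "infection", "isolate",
    "metabolism", "shock", "stress", "temperature", "specimen",
    "disease state", "protocol", "growth protocol", "genotype/variation",
    "species", "individual", "other",
}
REPEAT_FACTORS = {
    "biological replicate",
    "technical replicate - extract",
    "technical replicate - labeled-extract",
}

def check_series(root):
    """Return problems with Sample references and factor names in each Series."""
    sample_ids = {sample.get("iid") for sample in root.findall("Sample")}
    problems = []
    for series in root.findall("Series"):
        # iter() also visits Sample-Ref elements nested in Variable/Repeats.
        for ref in series.iter("Sample-Ref"):
            if ref.get("ref") not in sample_ids:
                problems.append(f"unknown Sample {ref.get('ref')!r}")
        for tag, allowed in (("Variable", VARIABLE_FACTORS),
                             ("Repeats", REPEAT_FACTORS)):
            for element in series.findall(tag):
                factor = element.findtext("Factor", default="").strip()
                if factor not in allowed:
                    problems.append(f"{tag} factor {factor!r} is not allowed")
    return problems

document = ET.fromstring(
    "<MINiML><Sample iid='S1'/>"
    "<Series iid='E1'><Sample-Ref ref='S1'/><Sample-Ref ref='S2'/>"
    "<Variable><Factor>time</Factor><Sample-Ref ref='S1'/></Variable>"
    "<Repeats><Factor>dye swap</Factor></Repeats></Series></MINiML>"
)
for problem in check_series(document):
    print(problem)
```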
Last modified: October 4, 2017