MIMAG: metagenome-assembled genome, host-associated; version 6.0 Package

You can download package details in XML format or as an Excel spreadsheet.



Environmental Package



Use for metagenome-assembled genome (MAG) sequences produced using computational binning tools that group sequences into individual organism genome assemblies starting from metagenomic data sets. The organism name cannot contain the term 'metagenome'. Use the MIUVIG package for virus genomes. Before creating BioSamples for prokaryotic and eukaryotic MAGs, please read and follow the MAG submission instructions at https://www.ncbi.nlm.nih.gov/genbank/wgsfaq/#metagen.

Mandatory Attributes


isolate

  • Harmonized name: isolate
  • Description: identification or description of the specific individual from which this sample was obtained

collection date

  • Harmonized name: collection_date
  • Description: the date on which the sample was collected; date/time ranges are supported by providing two dates from among the supported value formats, delimited by a forward-slash character; collection times are supported by adding "T", then the hour and minute after the date, and must be in Coordinated Universal Time (UTC), otherwise known as "Zulu Time" (Z); supported formats include "DD-Mmm-YYYY", "Mmm-YYYY", "YYYY" or ISO 8601 standard "YYYY-mm-dd", "YYYY-mm", "YYYY-mm-ddThh:mm:ss"; e.g., 30-Oct-1990, Oct-1990, 1990, 1990-10-30, 1990-10, 21-Oct-1952/15-Feb-1953, 2015-10-11T17:53:03Z; valid non-ISO dates will be automatically transformed to ISO format
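
As an illustration of the formats listed above, a submitter might pre-check collection_date values before upload. The following is a minimal sketch, not NCBI's validator: the function names are hypothetical, the month abbreviations assume an English (C) locale, and Python's strptime is slightly more lenient than the stated patterns (for example, it accepts single-digit days and months).

```python
from datetime import datetime

# Format strings mirroring the patterns named in the description.
# The variants with and without "Z" and seconds are an assumption
# based on the examples given, not an official list.
_FORMATS = (
    "%d-%b-%Y",            # 30-Oct-1990
    "%b-%Y",               # Oct-1990
    "%Y",                  # 1990
    "%Y-%m-%d",            # 1990-10-30
    "%Y-%m",               # 1990-10
    "%Y-%m-%dT%H:%M:%S",   # 2015-10-11T17:53:03
    "%Y-%m-%dT%H:%M:%SZ",  # 2015-10-11T17:53:03Z
    "%Y-%m-%dT%H:%MZ",     # 2015-10-11T17:53Z
)


def is_single_date(value: str) -> bool:
    """Return True if value matches one of the supported date formats."""
    for fmt in _FORMATS:
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            continue
    return False


def is_collection_date(value: str) -> bool:
    """Accept a single date or a range of two dates separated by '/'."""
    parts = value.split("/")
    if len(parts) > 2:
        return False
    return all(is_single_date(part) for part in parts)
```

For example, is_collection_date("21-Oct-1952/15-Feb-1953") is accepted as a range, while a value such as "10/30/1990" is rejected because it splits into three parts.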

broad-scale environmental context

  • Harmonized name: env_broad_scale
  • Description: Add terms that identify the major environment type(s) where your sample was collected. Recommend subclasses of biome [ENVO:00000428]. Multiple terms can be separated by one or more pipes, e.g.: mangrove biome [ENVO:01000181]|estuarine biome [ENVO:01000020]

local-scale environmental context

  • Harmonized name: env_local_scale
  • Description: Add terms that identify environmental entities having causal influences upon the entity at time of sampling. Multiple terms can be separated by pipes, e.g.: shoreline [ENVO:00000486]|intertidal zone [ENVO:00000316]

environmental medium

  • Harmonized name: env_medium
  • Description: Add terms that identify the material displaced by the entity at time of sampling. Recommend subclasses of environmental material [ENVO:00010483]. Multiple terms can be separated by pipes, e.g.: estuarine water [ENVO:01000301]|estuarine mud [ENVO:00002160]
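
The three environmental context fields above share the same "label [ENVO:ID]" syntax with pipe separators. The sketch below shows one way such a value could be split into (label, ID) pairs. The function name and the regular expression are assumptions for illustration; the code checks only the syntax and does not confirm that a term exists in ENVO.

```python
import re
from typing import List, Tuple

# One term: free-text label, then an ontology ID in square brackets.
_TERM = re.compile(r"^(?P<label>[^\[\]|]+?)\s*\[(?P<id>[A-Za-z]+:\d+)\]$")


def parse_terms(value: str) -> List[Tuple[str, str]]:
    """Split a pipe-delimited value into (label, ontology ID) pairs."""
    terms = []
    for chunk in value.split("|"):
        chunk = chunk.strip()
        if not chunk:
            continue  # tolerate repeated pipes between terms
        match = _TERM.match(chunk)
        if match is None:
            raise ValueError(f"not in 'label [PREFIX:ID]' form: {chunk!r}")
        terms.append((match.group("label"), match.group("id")))
    return terms
```

Applied to the env_medium example, parse_terms("estuarine water [ENVO:01000301]|estuarine mud [ENVO:00002160]") yields two pairs, one per term.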

geographic location

  • Harmonized name: geo_loc_name
  • Description: Geographical origin of the sample; use the appropriate name from this list: http://www.insdc.org/documents/country-qualifier-vocabulary. Use a colon to separate the country or ocean from more detailed information about the location, e.g., "Canada: Vancouver" or "Germany: halfway down Zugspitze, Alps"
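
The colon convention for geo_loc_name can be illustrated with a small helper. This is a sketch under stated assumptions: the function name is hypothetical, and the code only separates the two parts; it does not check the country or ocean against the INSDC vocabulary.

```python
from typing import Optional, Tuple


def split_geo_loc_name(value: str) -> Tuple[str, Optional[str]]:
    """Split 'Country: detail' into (country_or_ocean, detail_or_None)."""
    # Only the first colon separates the country from the detail.
    country, _, detail = value.partition(":")
    country = country.strip()
    if not country:
        raise ValueError("geo_loc_name must start with a country or ocean name")
    detail = detail.strip()
    return country, (detail or None)
```

A value with no colon, such as "Canada", is returned with None as the detail.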

isolation source

  • Harmonized name: isolation_source
  • Description: Describes the physical, environmental and/or local geographical source of the biological sample from which the sample was derived.

latitude and longitude

  • Harmonized name: lat_lon
  • Description: The geographical coordinates of the location where the sample was collected. Specify as degrees latitude and longitude in format "d[d.dddd] N|S d[dd.dddd] W|E", e.g., 38.98 N 77.11 W
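
To make the lat_lon format concrete, the following sketch parses a value into signed decimal degrees. The function name, the regular expression, and the range checks (latitude up to 90, longitude up to 180) are assumptions for illustration, not the rules NCBI applies.

```python
import re
from typing import Tuple

# Unsigned decimal degrees followed by a hemisphere letter, twice.
_LAT_LON = re.compile(
    r"^(\d{1,2}(?:\.\d+)?) ([NS]) (\d{1,3}(?:\.\d+)?) ([WE])$"
)


def parse_lat_lon(value: str) -> Tuple[float, float]:
    """Convert 'd[d.dddd] N|S d[dd.dddd] W|E' to signed (lat, lon)."""
    match = _LAT_LON.match(value.strip())
    if match is None:
        raise ValueError(f"unexpected lat_lon format: {value!r}")
    lat = float(match.group(1))
    lon = float(match.group(3))
    if lat > 90 or lon > 180:
        raise ValueError(f"coordinates out of range: {value!r}")
    # South and West are represented as negative values.
    if match.group(2) == "S":
        lat = -lat
    if match.group(4) == "W":
        lon = -lon
    return lat, lon
```

With the example from the description, parse_lat_lon("38.98 N 77.11 W") returns (38.98, -77.11).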

Optional Attributes


altitude

  • Harmonized name: altitude
  • Description: The altitude of the sample is the vertical distance between Earth's surface above Sea Level and the sampled position in the air.

ancestral data

  • Harmonized name: ances_data
  • Description: Information about pedigree or other ancestry, e.g., parental variety in case of mutant or selection, A/3*B (meaning [(A x B) x B] x B)

biological status

  • Harmonized name: biol_stat
  • Description: The level of genome modification, e.g., wild, natural, semi-natural, inbred line, breeder's line, hybrid, clonal selection, mutant

chemical administration

  • Harmonized name: chem_administration
  • Description: list of chemical compounds administered to the host or site where sampling occurred, and when (e.g. antibiotics, N fertilizer, air filter); can include multiple compounds. For Chemical Entities of Biological Interest ontology (CHEBI) (v1.72), please see http://bioportal.bioontology.org/visualize/44603

collection method

  • Harmonized name: collection_method
  • Description: Process used to collect the sample, e.g., bronchoalveolar lavage (BAL)


depth

  • Harmonized name: depth
  • Description: Depth is defined as the vertical distance below surface, e.g. for sediment or soil samples depth is measured from sediment or soil surface, respectively. Depth can be reported as an interval for subsurface samples.

derived from

  • Harmonized name: derived_from
  • Description: Indicates when one BioSample was derived from another BioSample. Value should include BioSample accession number(s) (SAMNxxxxxxxx).
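
Because derived_from may hold one or more accessions inside free text, a submitter might extract them as sketched below. The function name is hypothetical, the pattern (the prefix SAMN followed by digits, with no fixed length enforced) is an assumption based on the placeholder shown above, and the accession numbers in the usage note are made up for illustration.

```python
import re
from typing import List

# Assumed shape of a BioSample accession: "SAMN" followed by digits.
_SAMN = re.compile(r"\bSAMN\d+\b")


def extract_biosample_accessions(value: str) -> List[str]:
    """Return every SAMN-style accession found in the value, in order."""
    return _SAMN.findall(value)
```

For instance, the hypothetical value "SAMN01234567, SAMN07654321" yields both accessions, and a value containing none yields an empty list.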


elevation

  • Harmonized name: elev
  • Description: The elevation of the sampling site as measured by the vertical distance from mean sea level.

experimental factor

  • Harmonized name: experimental_factor
  • Description: Variable aspect of experimental design

genetic modification

  • Harmonized name: genetic_mod
  • Description: Genetic modifications of the genome of an organism, which may occur naturally by spontaneous mutation, or be introduced by some experimental means, e.g. specification of a transgene or the gene knocked-out or details of transient transfection


gravidity

  • Harmonized name: gravidity
  • Description: whether or not the subject is gravid and, if so, the due date or the date post-conception, specifying which is used

host age

  • Harmonized name: host_age
  • Description: Age of host at the time of sampling

host blood pressure diastolic

  • Harmonized name: host_blood_press_diast
  • Description: resting diastolic blood pressure of the host, measured in millimeters of mercury (mmHg)

host blood pressure systolic

  • Harmonized name: host_blood_press_syst
  • Description: resting systolic blood pressure of the host, measured in millimeters of mercury (mmHg)

host body habitat

  • Harmonized name: host_body_habitat
  • Description: original body habitat where the sample was obtained from

host body product

  • Harmonized name: host_body_product
  • Description: substance produced by the host, e.g. stool, mucus, where the sample was obtained from

host body temperature

  • Harmonized name: host_body_temp
  • Description: core body temperature of the host when sample was collected

host color

  • Harmonized name: host_color
  • Description: the color of host

host common name

  • Harmonized name: host_common_name
  • Description: The natural language (non-taxonomic) name of the host organism, e.g., mouse

host diet

  • Harmonized name: host_diet
  • Description: type of diet, depending on the sample: for animals, omnivore, herbivore, etc.; for humans, high-fat, Mediterranean, etc.; can include multiple diet types

host disease

  • Harmonized name: host_disease
  • Description: Name of relevant disease, e.g. Salmonella gastroenteritis. Controlled vocabulary, http://bioportal.bioontology.org/ontologies/1009 or http://www.ncbi.nlm.nih.gov/mesh

host dry mass

  • Harmonized name: host_dry_mass
  • Description: measurement of dry mass

host family relationship

  • Harmonized name: host_family_relationship
  • Description

host genotype

  • Harmonized name: host_genotype
  • Description

host growth conditions

  • Harmonized name: host_growth_cond
  • Description: literature reference giving growth conditions of the host

host height

  • Harmonized name: host_height
  • Description: the height of subject

host last meal

  • Harmonized name: host_last_meal
  • Description: content of last meal and time since feeding; can include multiple values

host length

  • Harmonized name: host_length
  • Description: the length of subject

host life stage

  • Harmonized name: host_life_stage
  • Description: description of host life stage

host phenotype

  • Harmonized name: host_phenotype
  • Description

host sex

  • Harmonized name: host_sex
  • Description: Gender or physical sex of the host

host shape

  • Harmonized name: host_shape
  • Description: morphological shape of host

host subject id

  • Harmonized name: host_subject_id
  • Description: a unique identifier by which each subject can be referred to, de-identified, e.g. #131

host subspecific genetic lineage

  • Harmonized name: host_subspecf_genlin
  • Description: Information about the genetic distinctness of the host organism below the subspecies level e.g., serovar, serotype, biotype, ecotype, variety, cultivar, or any relevant genetic typing schemes like Group I plasmid. Subspecies should not be recorded in this term, but in the NCBI taxonomy. Supply both the lineage name and the lineage rank separated by a colon, e.g., biovar:abc123
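
The rank and name pairing for host_subspecf_genlin could be checked as sketched below. The function name is hypothetical, and the order (rank before the colon, name after it) is an assumption taken from the example biovar:abc123.

```python
from typing import Tuple


def parse_lineage(value: str) -> Tuple[str, str]:
    """Split 'rank:name' into (rank, name); both parts are required."""
    rank, sep, name = value.partition(":")
    rank, name = rank.strip(), name.strip()
    if not sep or not rank or not name:
        raise ValueError(f"expected 'rank:name', got {value!r}")
    return rank, name
```

Here parse_lineage("biovar:abc123") returns ("biovar", "abc123"), and a value without a colon raises ValueError.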

host substrate

  • Harmonized name: host_substrate
  • Description: the growth substrate of the host

observed host symbionts

  • Harmonized name: host_symbiont
  • Description: The taxonomic name of the organism(s) found living in mutualistic, commensalistic, or parasitic symbiosis with the specific host

host taxonomy ID

  • Harmonized name: host_taxid
  • Description: NCBI taxonomy ID of the host, e.g. 9606

host tissue sampled

  • Harmonized name: host_tissue_sampled
  • Description: name of body site where the sample was obtained from, such as a specific organ or tissue, e.g., tongue, lung. For Foundational Model of Anatomy ontology (FMA) (v 4.11.0) or Uber-anatomy ontology (UBERON) (v releases/2014-06-15) terms, please see http://purl.bioontology.org/ontology/FMA or http://purl.bioontology.org/ontology/UBERON

host total mass

  • Harmonized name: host_tot_mass
  • Description: total mass of the host at collection; the unit depends on the host

metagenome source

  • Harmonized name: metagenome_source
  • Description: describes the original source of a metagenome-assembled genome (MAG). Examples: soil metagenome, gut metagenome

miscellaneous parameter

  • Harmonized name: misc_param
  • Description: any other measurement performed or parameter collected that is not listed here

negative control type

  • Harmonized name: neg_cont_type
  • Description: The substance or equipment used as a negative control in an investigation, e.g., distilled water, phosphate buffer, empty collection device, empty collection tube, DNA-free PCR mix, sterile swab, sterile syringe

Omics Observatory ID

  • Harmonized name: omics_observ_id
  • Description: A unique identifier of the omics-enabled observatory (or comparable time series) your data derives from. This identifier should be provided by the OMICON ontology; if you require a new identifier for your time series, contact the ontology's developers. Information is available here: https://github.com/GLOMICON/omicon. This field is only applicable to records which derive from an omics time-series or observatory.

organism count

  • Harmonized name: organism_count
  • Description: total count of any organism per gram or volume of sample; should include name of organism followed by count; can include multiple organism counts

oxygenation status of sample

  • Harmonized name: oxy_stat_samp
  • Description: oxygenation status of sample


perturbation

  • Harmonized name: perturbation
  • Description: type of perturbation, e.g. chemical administration, physical disturbance, etc., coupled with time that perturbation occurred; can include multiple perturbation types

positive control type

  • Harmonized name: pos_cont_type
  • Description: The substance, mixture, product, or apparatus used to verify that a process which is part of an investigation delivers a true positive

reference for biomaterial

  • Harmonized name: ref_biomaterial
  • Description: Primary publication or genome report

relationship to oxygen

  • Harmonized name: rel_to_oxygen
  • Description: Is this organism an aerobe or an anaerobe? Please note that aerobic and anaerobic are valid descriptors for microbial environments, e.g., aerobe, anaerobe, facultative, microaerophilic, microanaerobe, obligate aerobe, obligate anaerobe, missing, not applicable, not collected, not provided, restricted access
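
The values listed for rel_to_oxygen lend themselves to a simple membership check. The sketch below assumes that the list in the description is the complete permitted set and that matching is case-sensitive; neither is confirmed by this page, and the names used are hypothetical.

```python
# Values copied from the rel_to_oxygen description; treating them as the
# full controlled vocabulary is an assumption.
REL_TO_OXYGEN_VALUES = frozenset({
    "aerobe",
    "anaerobe",
    "facultative",
    "microaerophilic",
    "microanaerobe",
    "obligate aerobe",
    "obligate anaerobe",
    "missing",
    "not applicable",
    "not collected",
    "not provided",
    "restricted access",
})


def is_valid_rel_to_oxygen(value: str) -> bool:
    """Return True if the value is one of the listed terms."""
    return value.strip() in REL_TO_OXYGEN_VALUES
```

Under these assumptions, "obligate anaerobe" passes, while the adjective "anaerobic" does not.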

sample capture status

  • Harmonized name: samp_capt_status
  • Description: Reason for the sample, e.g., active surveillance in response to an outbreak, active surveillance not initiated by an outbreak, farm sample, market sample

sample collection device or method

  • Harmonized name: samp_collect_device
  • Description: Method or device employed for collecting sample

sample disease stage

  • Harmonized name: samp_dis_stage
  • Description: Stage of the disease at the time of sample collection, e.g., dissemination, growth and reproduction, infection, inoculation, penetration

sample material processing

  • Harmonized name: samp_mat_process
  • Description: Processing applied to the sample during or after isolation

sample salinity

  • Harmonized name: samp_salinity
  • Description

sample size

  • Harmonized name: samp_size
  • Description: Amount or size of sample (volume, mass or area) that was collected

sample storage duration

  • Harmonized name: samp_store_dur
  • Description

sample storage location

  • Harmonized name: samp_store_loc
  • Description

sample storage temperature

  • Harmonized name: samp_store_temp
  • Description

sample volume or weight for DNA extraction

  • Harmonized name: samp_vol_we_dna_ext
  • Description: volume (mL) or weight (g) of sample processed for DNA extraction

size fraction selected

  • Harmonized name: size_frac
  • Description: Filtering pore size used in sample preparation, e.g., 0-0.22 micrometer

source material identifiers

  • Harmonized name: source_material_id
  • Description: unique identifier assigned to a material sample used for extracting nucleic acids, and subsequent sequencing. The identifier can refer either to the original material collected or to any derived sub-samples.


temperature

  • Harmonized name: temp
  • Description: temperature of the sample at time of sampling