Introduction to Patches
- What are patches?
- What types of patches are there?
- Do patches result in changes to chromosome coordinates?
- What is a patch release?
- How often does the GRC release patches?
- Why does the GRC release patches?
- How can I tell if an assembly update is a patch release or a major release?
- How should I refer to patches in a publication?
- Where can I find the list of assembly regions that have been patched?
- What implications do patches have for my analyses?
What are patches?
Patches are accessioned scaffold sequences that represent assembly updates. They add information to the assembly without disrupting the chromosome coordinates. Patches are given chromosome context via alignment to the current assembly. Together, the scaffold sequence and alignment define the patch. Patch sequences and alignments can be downloaded from the GenBank FTP site.
What types of patches are there?
- FIX patches: Fix patches represent changes to existing assembly sequences. These are generally error corrections (addressed by approaches such as base changes, component replacements/updates, switch point updates or tiling path changes) or assembly improvements, such as the extension of sequence into gaps. A fix patch scaffold represents a preview of what the assembly will look like at the next major (coordinate changing) release. When the next major release occurs, the accessions for the fix patch scaffolds will be deprecated and the changes will be found in the chromosomes. An example of a fix patch is shown in Figure 1.
Figure 1: Example of a fix patch. The patch fixes a problem with the ABO region on chr. 9 (detailed in HG-2030). The switch points for the clones used to represent this locus on GRCh38 chromosome 9 generate a mixed haplotype that has never been observed in the human population. The top panel shows the chromosome representation in GRCh38. The blue lines represent the clones in the tiling path, the green lines represent genes and the gray line with the red (mismatches) and blue (insertion/deletions) stripes shows the alignment of the patch to the chromosome. The bottom panel shows the same information, but with respect to the patch. In the top panel, ABO aligns across a clone junction. In the fix patch, ABO aligns to a single component due to a change in the switch points between the components.
- NOVEL patches: Novel patches represent the addition of new alternate loci to the assembly. These are alternate sequence representations of sequence found on the chromosomes. When the next major release occurs, the accessions for the novel patch scaffolds will persist, and the scaffolds will be known as alternate loci. At that time, the scaffolds will move from the PATCHES assembly unit to the relevant alternate loci assembly unit. An example of a novel patch is shown in Figure 2.
Figure 2: Example of a novel patch. The patch provides representation for a duplication allele at the CYP2D6 locus on chr. 22 (detailed in HG-2205). The top panel shows the chromosome representation in GRCh38. The blue lines represent the clones in the tiling path, the green lines represent genes and the gray line shows the alignment of the patch to the chromosome. The bottom panel shows the same information, but with respect to the patch. The thin line in the alignment in the bottom panel is an insertion in the patch. The two components highlighted in red in the top panel have been replaced by the three highlighted components in the novel patch. The duplicated CYP2D6 gene is circled.
Figure 3 illustrates the two patch types and their relationship to major assembly releases.
Figure 3: Cartoon illustration of the two patch types and their relationship to major assembly releases.
Do patches result in changes to chromosome coordinates?
No. Chromosome coordinates are unchanged by patches.
What is a patch release?
All GRC assemblies use the NCBI Genome Assembly model. A patch release is a form of minor assembly release in which the only sequence content that updates belongs to the PATCHES assembly unit. Patch releases are cumulative. Patch releases do not change chromosome coordinates.
GRC assemblies are given versioned accessions in INSDC databases (e.g. GRCh37 = GCA_000001405.1; GRCh38 = GCA_000001405.15). Each patch release results in a version update to the assembly accession (e.g. GRCh37.p1 = GCA_000001405.2, GRCh38.p1 = GCA_000001405.16). The assembly accession.version is an unambiguous identifier for the assembly and should always be included in publications.
Patch releases are also reflected in the assembly name. Each patch release results in the addition of the suffix “.p$” to the assembly name, where $ = the patch release. For example: GRCh38.p1 = the first patch release version of the GRCh38 assembly.
A patch release includes all of the sequences and alignments in the corresponding major assembly release, plus the cumulative collection of patch sequences and alignments. For example, download of GRCh38.p1 (GCA_000001405.16) includes all of the chromosomes, unlocalized and unplaced scaffolds and alternate loci scaffolds found in GRCh38 (GCA_000001405.15), plus the patch scaffolds.
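The naming and versioning conventions above can be sketched in a few lines of Python. This is only an illustration, not a GRC or NCBI tool: the function names `parse_assembly_name` and `split_accession` are hypothetical, and the name pattern (letters, digits, optional ".p" plus a number) is an assumption based on the examples given here.

```python
import re
from typing import NamedTuple, Tuple


class AssemblyName(NamedTuple):
    major: str   # e.g. "GRCh38"
    patch: int   # 0 for the unpatched major release


def parse_assembly_name(name: str) -> AssemblyName:
    """Split a name such as 'GRCh38.p1' into its major name and patch number."""
    # Assumed pattern: letters, digits, then an optional '.p<number>' suffix.
    match = re.fullmatch(r"(?P<major>[A-Za-z]+\d+)(?:\.p(?P<patch>\d+))?", name)
    if match is None:
        raise ValueError(f"unrecognized assembly name: {name!r}")
    return AssemblyName(match["major"], int(match["patch"] or 0))


def split_accession(accession: str) -> Tuple[str, int]:
    """Split an accession.version such as 'GCA_000001405.16' into its parts."""
    base, _, version = accession.partition(".")
    if not base or not version.isdigit():
        raise ValueError(f"expected accession.version, got {accession!r}")
    return base, int(version)


# Values taken from the examples in this section.
print(parse_assembly_name("GRCh38.p1"))
print(split_accession("GCA_000001405.16"))
```

Note that the version number of the assembly accession is not the same as the patch number (GRCh38.p1 is version 16), which is why the full accession.version should be quoted rather than inferred from the name.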
How often does the GRC release patches?
Patches to the human and mouse reference assemblies are reviewed for release on a quarterly cycle. The GRC is not currently providing patches to the zebrafish reference assembly.
Why does the GRC release patches?
Patches are a means to provide updated information for a particular genomic region without changing chromosome coordinates. Researchers performing whole genome analysis typically require stability of chromosome coordinates due to the time and effort it takes to do an experiment, and the effort involved in remapping data to new coordinates. Researchers interested in a particular locus prefer to have the most recent information. Patches allow us to serve both communities better.
How can I tell if an assembly update is a patch release or a major release?
GRC patch releases are always indicated in the assembly name (e.g. Genome Reference Consortium Human Build 38 patch release 1 (GRCh38.p1)). In addition, you can check the accession.version of the primary assembly unit of the corresponding major release. The accession.version of the primary assembly unit only updates in a major (chromosome coordinate changing) assembly release. For example:
- GRCh37 primary assembly unit = GCA_000001305.1
- GRCh37.p13 primary assembly unit = GCA_000001305.1
- GRCh38 primary assembly unit = GCA_000001305.2
- GRCh38.p1 primary assembly unit = GCA_000001305.2
Assembly unit accessions are provided in the full sequence reports, which you can access from the corresponding NCBI Assembly pages (e.g. GRCh38.p1).
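As a minimal sketch of the check described above, the following Python compares the primary assembly unit accession.version of two releases. The `Release` record and `classify_update` function are hypothetical names introduced for illustration; the accessions are the ones quoted in this document and would normally be read by hand from the sequence reports.

```python
from typing import NamedTuple


class Release(NamedTuple):
    name: str
    assembly_accession: str      # accession.version of the full assembly
    primary_unit_accession: str  # accession.version of the primary assembly unit


def classify_update(old: Release, new: Release) -> str:
    """Describe how `new` relates to `old`, using accession.version values only."""
    if old.assembly_accession == new.assembly_accession:
        return "same release"
    # The primary assembly unit only gets a new version in a major release.
    if old.primary_unit_accession == new.primary_unit_accession:
        return "patch release"
    return "major release"


# Accessions as listed in this document.
grch37 = Release("GRCh37", "GCA_000001405.1", "GCA_000001305.1")
grch38 = Release("GRCh38", "GCA_000001405.15", "GCA_000001305.2")
grch38_p1 = Release("GRCh38.p1", "GCA_000001405.16", "GCA_000001305.2")

print(classify_update(grch38, grch38_p1))
print(classify_update(grch37, grch38))
```

The comparison does not depend on which release is passed first; it only reports whether the two releases share a primary assembly unit, and therefore chromosome coordinates.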
How should I refer to patches in a publication?
The assembly accession.version is an unambiguous identifier for the assembly and should always be included in publications to describe the particular assembly version used for analysis. In addition, if you are using a patched version of an assembly, for clarity you should indicate whether your analysis included the patch scaffolds. The accession.version of individual patch scaffolds should be provided when describing analyses on a particular patch. Likewise, all analyses and annotations on a patch scaffold should be described using the sequence coordinates of the accessioned patch scaffold. Use of "pseudo-chromosome" coordinates (i.e. providing new chromosome coordinates to reflect a user-defined instance of a chromosome into which one or more patches have been inserted to replace the original chromosome sequence) is strongly discouraged by the GRC, as there are no corresponding INSDC-accessioned sequences.
Where can I find the list of assembly regions that have been patched?
Please see the "Patches and Alternate loci" tables on the GRC website human and mouse pages. In addition, patch placements are available in the alt_scaffold_placement.txt report, which can be downloaded from the corresponding PATCHES directory in the GenBank FTP site (e.g. GRCh38.p1).
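Once alt_scaffold_placement.txt has been downloaded, the patched regions can be listed with a short script. The sketch below assumes the report is tab-delimited with a header line beginning with "#", and that it contains columns named alt_scaf_acc, parent_name, parent_start and parent_stop. These names are assumptions and should be checked against the header of the actual file. The helper names `read_placements` and `patches_overlapping` are hypothetical.

```python
from typing import Dict, Iterable, List


def read_placements(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse a tab-delimited report whose header line starts with '#'."""
    header = None
    rows = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue
        if line.startswith("#"):
            # If several '#' lines are present, the last one is used as the header.
            header = line.lstrip("#").strip().split("\t")
            continue
        if header is None:
            raise ValueError("no header line found before data rows")
        rows.append(dict(zip(header, line.split("\t"))))
    return rows


def patches_overlapping(rows, chrom: str, start: int, end: int):
    """Return placements on `chrom` that overlap the interval start..end.

    Column names below are assumed, not guaranteed by the report format.
    """
    return [
        row for row in rows
        if row["parent_name"] == chrom
        and int(row["parent_start"]) <= end
        and int(row["parent_stop"]) >= start
    ]
```

A typical use would be to open the downloaded file, pass the file object to `read_placements`, and then call `patches_overlapping(rows, "9", start, end)` for a locus of interest, reporting each hit by its patch scaffold accession.version.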
What implications do patches have for my analyses?
Fix patches represent sequence updates; they will replace the current chromosome sequences in the next major assembly release. Thus, when interpreting data, the fix patches should take precedence over the chromosomes. Users interested in a particular locus should consider the fix patch the most accurate representation of the region.
Novel patches can be treated like alternate loci. They are alternate sequence representations for specified genomic regions. The existence of a novel patch (or alternate locus) does not imply an error in the chromosome. When interpreting data, the novel patches should be treated as sequence variants.
All human patches contain some sequence that is redundant with the current assembly sequence. Inclusion of patch scaffolds in alignment target sets therefore introduces allelic duplication. Currently, many commonly used aligners cannot distinguish allelic variation from paralogous variation. As a result, reads mapping to patches (and alternate loci) may receive depressed mapping scores and be excluded from downstream analyses. Likewise, variant analysis tools do not have a robust mechanism for handling patch sequences. Thus, while patches are valuable additions to the assembly (correcting errors and providing variant representations), users should be aware of these potential complications.