NGS tips: Converting Genome Coordinates From One Genome Version To Another (Ucsc Liftover, Ncbi Remap, Ensembl Api)

Some recent posts reminded me that it might be useful for us to review the options for converting between genome coordinate systems.
This comes up in several contexts. Probably the most common is that you have some coordinates for a particular version of a reference genome and you want to determine the corresponding coordinates on a different version of the reference genome for that species. For example, you have a bed file with exon coordinates for human build GRC36 (hg18) and wish to update to GRC37 (hg19). By the way, for a nice summary of genome versions and their release names refer to the Assembly Releases and Versions FAQ
Or perhaps you have coordinates of a gene and wish to determine the corresponding coordinates in another species. For example, you have coordinates of a gene in human hg19 and wish to determine corresponding coordinates in mouse mm10.
Finally you may wish to convert coordinates between coordinate systems within a single assembly. For example, you have the coordinates of a series of exons and you want to determine the position of these exons with respect to the transcript, gene, contig, or entire chromosome.
There are at least three well known tools that can help you with these kinds of tasks:
  1. UCSC liftOver. This tool is available through a simple web interface or it can be downloaded as a standalone executable. To use the executable you will also need to download the appropriate chain file. Each chain file describes conversions between a pair of genome assemblies. Liftover can be used through Galaxy as well. There is a python implementation of liftover called pyliftover that does conversion of point coordinates only.
  2. NCBI Remap. This tool is conceptually similar to liftOver in that in manages conversions between a pair of genome assemblies but it uses different methods to achieve these mappings. It is also available through a simple web interface or you can use the API for NCBI Remap.
  3. The Ensembl API. The final example I described above (converting between coordinate systems within a single genome assembly) can be accomplished with the Ensembl core API. Many examples are provided within the installationoverviewtutorial and documentationsections of the Ensembl API project. In particular, refer to these sections of the tutorial: 'Coordinates', 'Coordinate systems', 'Transform', and 'Transfer'. Ensembl also has a simple web service for coordinate conversions.
  4. Bioconductor rtracklayer package. For R users, Bioconductor has an implementation of UCSC liftOver in the rtracklayer package. To see documentation on how to use it, open an R session and run the following commands.
  5. CrossMap. A standalone open source program for convenient conversion of genome coordinates (or annotation files) between different assemblies. It supports most commonly used file formats including SAM/BAM, Wiggle/BigWig, BED, GFF/GTF, VCF. CrossMap is designed to liftover genome coordinates between assemblies. It’s not a program for aligning sequences to reference genome. Not recommended for converting genome coordinates between species.
    source("http://bioconductor.org/biocLite.R")
    biocLite("rtracklayer")
    library(rtracklayer)
    ?liftOver
  6. Flo. A liftover pipeline for different reference genome builds of the same species. It describes the process as follows: "align the new assembly with the old one, process the alignment data to define how a coordinate or coordinate range on the old assembly should be transformed to the new assembly, transform the coordinates."
Reference:
https://www.biostars.org/p/65558/

Comments

Popular posts from this blog

gspread error:gspread.exceptions.SpreadsheetNotFound

Miniconda installation problem: concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.

转载:彻底搞清楚promoter, exon, intron, and UTR