Posts

Showing posts from February, 2018

Annotatr: include extra columns

library(annotatr)

read_annotations(con = "test.bed", genome = 'xxx', name = 'test', format = 'bed',
                 extraCols = c(gene_id = "character", symbol = "character", tx_id = "character"))
print(annotatr_cache$get('xxx_custom_test'))

Contents of test.bed (tab-separated):

chr1    199     240     gene1   abc     mm10
chr2    200     400     gene2   def     mm10
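The call above needs a tab-separated test.bed on disk. As a minimal sketch, the two rows listed in the post can be written out with base R and read back to confirm that the three extra columns are present. The temporary path and the column labels chrom, start and end are assumptions made for this illustration; gene_id, symbol and tx_id are the names passed to extraCols.

```r
# Rows copied from the test.bed listing in the post, joined with tabs.
bed_lines <- c(
  paste("chr1", 199, 240, "gene1", "abc", "mm10", sep = "\t"),
  paste("chr2", 200, 400, "gene2", "def", "mm10", sep = "\t")
)

# Assumption: write to a temporary directory so no existing file is overwritten.
bed_path <- file.path(tempdir(), "test.bed")
writeLines(bed_lines, bed_path)

# Read the file back with base R to check the layout:
# three standard BED columns followed by the three extra columns.
bed <- read.table(
  bed_path, sep = "\t", header = FALSE, stringsAsFactors = FALSE,
  col.names = c("chrom", "start", "end", "gene_id", "symbol", "tx_id")
)
str(bed)
```

If the file looks right, bed_path can be passed as the con argument in place of "test.bed".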

Create stacked barplot where each stack is scaled to sum to 100%

dat <- read.table(text = "    ONE TWO THREE
1   23  234 324
2   34  534 12
3   56  324 124
4   34  234 124
5   123 534 654", sep = "", header = TRUE)

# Add an id variable for the filled regions
library(reshape)
datm <- melt(cbind(dat, ind = rownames(dat)), id.vars = c('ind'))

library(ggplot2)
library(scales)
ggplot(datm, aes(x = variable, y = value, fill = ind)) +
  geom_bar(position = "fill", stat = "identity") +
  # or:
  # geom_bar(position = position_fill(), stat = "identity")
  scale_y_continuous(labels = percent_format())

Reference: https://stackoverflow.com/questions/9563368/create-stacked-barplot-where-each-stack-is-scaled-to-sum-to-100
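To see what position = "fill" does to the numbers, the same shares can be computed by hand. This is a sketch in base R, not part of the original answer: prop.table() divides every value by its column total, so each bar (ONE, TWO, THREE) sums to 1. The variable name shares is chosen here for illustration.

```r
dat <- read.table(text = "    ONE TWO THREE
1   23  234 324
2   34  534 12
3   56  324 124
4   34  234 124
5   123 534 654", header = TRUE)

# margin = 2 scales within columns: each value divided by its column total.
shares <- prop.table(as.matrix(dat), margin = 2)

# Shares as percentages, one column per bar in the plot.
print(round(100 * shares, 1))

# Every column total is 1, which is why each bar reaches 100%.
print(colSums(shares))
```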

How to filter SAM/BAM files

Very often we need to extract unmapped reads from a SAM/BAM file. You google it, and someone says samtools view -f 4 test.bam extracts all the unmapped reads. Sometimes you need the singleton reads, where only one read of a pair is mapped to the reference genome and the other is not, and you use samtools view -f 9 test.bam. But without a Google search, how could you know and understand all this? This post shows you how.

Imagine you want to extract unmapped reads. How do you know which FLAG value to use? First, go to Decoding SAM flags. To find out the SAM flag value for a given combination of properties, tick the boxes for those you'd like to include. The flag value is shown in the SAM Flag field above the checkboxes.

If you want to know the SAM flag value for unmapped reads, you …
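The flag values 4 and 9 used above are sums of single-bit properties defined in the SAM specification, which is what the checkbox page computes. A minimal sketch of the same arithmetic in R follows; the function names decode_sam_flag and encode_sam_flag and the short property labels are hypothetical, chosen for this illustration only.

```r
# Bit values from the SAM specification; the labels are shortened for this sketch.
sam_flag_bits <- c(
  paired = 1L, proper_pair = 2L, unmapped = 4L, mate_unmapped = 8L,
  reverse = 16L, mate_reverse = 32L, first_in_pair = 64L, second_in_pair = 128L,
  secondary = 256L, qc_fail = 512L, duplicate = 1024L, supplementary = 2048L
)

# Hypothetical helper: list the properties whose bit is set in a flag value.
decode_sam_flag <- function(flag) {
  flag <- rep(as.integer(flag), length(sam_flag_bits))
  names(sam_flag_bits)[bitwAnd(flag, sam_flag_bits) != 0L]
}

# Hypothetical helper: add up the bits for a set of properties.
encode_sam_flag <- function(properties) {
  sum(sam_flag_bits[properties])
}

print(decode_sam_flag(4))   # the value passed to -f for unmapped reads
print(decode_sam_flag(9))   # 9 = 1 (paired) + 8 (mate unmapped)
print(encode_sam_flag(c("paired", "mate_unmapped")))
```

samtools view -f keeps reads that have all of the given bits set, so -f 9 selects paired reads whose mate is unmapped.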

What are the best ways to start learning bioinformatics for a wet lab biologist?

I have come across this question several times online, and many times in real life too. Friends often ask me: “Hey, can I learn Bioinformatics with you? Can you give me some materials?” At the beginning, I would say: “Sure. Let me know if you need any help.” Then, mostly, nothing happened after I sent them the web links or e-books. Now if someone asks me the same question, I usually say: “Are you sure? If you take it seriously, I’ll teach you.” My point here is: take it seriously.

What is Bioinformatics?

Bioinformatics is the science of collecting and analyzing complex biological data such as genetic codes. This is the definition given by the first result of a Google search. Maybe it’s too abstract to understand. Based on my own understanding, I want to talk about Bioinformatics from the following aspects:

Software development

Maybe the most popular tool is blast. When I was in college, I took a course called “生物信息学”. The English name is “Bioinfo …

How To Set Up RStudio On A CentOS Cloud Server | DigitalOcean

Check Linux release

ls /etc/*release
## /etc/centos-release /etc/os-release /etc/redhat-release /etc/system-release
cat /etc/redhat-release
## CentOS Linux release 7.3.1611 (Core)

## /etc/centos-release
## /etc/os-release
## /etc/redhat-release
## /etc/system-release
## CentOS Linux release 7.4.1708 (Core)

Install R

# Install the Extra Packages for Enterprise Linux repository configuration
# R belongs to RHEL Extra Packages for Enterprise Linux (EPEL)
# Extra Packages for Enterprise Linux (or EPEL) is a Fedora Special Interest Group that creates, maintains, and manages a high quality set of additional packages for Enterprise Linux
sudo yum install -y epel-release
sudo yum update -y
sudo yum install -y R

Install RStudio Server

## https://www.rstudio.com/products/rstudio/download-server/
wget https://download2.rstudio.org/rstudio-server-rhel-1.0.143-x86_64.rpm
sudo yum install --nogpgcheck rstudio-server-rhel-1.0.143-x86_64.rpm
## By default RStudio Server runs on port 8787 a …

system in R

In R, system invokes the OS command specified by command.

# Without 'intern', the output of the command is written to 'stdout'
# and the return value is the exit status of the command
system_return_val <- system("ls ../")
cat("Return value of system is: ", system_return_val, "\n")
## Return value of system is: 0

# Use the parameter 'intern' to capture the output of the command as an R character vector
system_return_val <- system("ls ../", intern = TRUE)
cat("Return value of system with 'intern=T' is: ", system_return_val, "\n")
## Return value of system with 'intern=T' is: donate home post publications_selected tags

intern: a logical (not ‘NA’) which indicates whether to capture the output of the command as an R character vector.
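The difference between the two return values is easiest to see with a command whose output is known in advance. The sketch below assumes a Unix-like shell where echo is available; the variable names are chosen for illustration. It also shows system2(), a base R alternative that takes the command and its arguments separately.

```r
# Without intern, the output goes to the console and the exit status is returned.
status <- system("echo hello")

# With intern = TRUE, the output is returned as a character vector,
# one element per line of output.
captured <- system("echo hello", intern = TRUE)

# system2() captures output when stdout = TRUE.
captured2 <- system2("echo", args = "hello", stdout = TRUE)

print(status)
print(captured)
print(identical(captured, captured2))
```

Because intern = TRUE returns one element per output line, cat() in the post prints the directory entries separated by spaces.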

Error when installing htslib on Ubuntu

Error message when running `make`:

gcc -g -Wall -O2 -I.  -c -o cram/cram_io.o cram/cram_io.c
cram/cram_io.c:57:19: fatal error: bzlib.h: No such file or directory
compilation terminated.
Makefile:120: recipe for target 'cram/cram_io.o' failed

Solved by:

sudo apt-get install libbz2-1.0 libbz2-dev libbz2-ocaml libbz2-ocaml-dev