How to filter SAM/BAM files

We often need to extract unmapped reads from a SAM/BAM file. You google it, and someone says that samtools view -f 4 test.bam will extract all the unmapped reads. Sometimes you need to extract singleton reads, where only one read of a pair is mapped to the reference genome and the other is not, and you are told to use samtools view -f 9 test.bam. (Note that -f 9 keeps every paired read whose mate is unmapped; adding -F 4 restricts this to reads that are themselves mapped.)
But without a Google search, how could you know and understand all these things? This post will show you how.
Imagine you want to extract unmapped reads. How will you know what FLAG value to use?
First you need to go to Decoding SAM flags.
Here is a screenshot of the web page:
To find out what the SAM flag value would be for a given combination of properties, tick the boxes for those that you’d like to include. The flag value will be shown in the SAM Flag field above.
Here is a GIF showing how to do so:
[GIF: decodesam]
If you want to know the SAM flag value for unmapped reads, you can tick read unmapped. Then you will get a value 4 in the SAM flag field.
If you want to get the SAM flag for first in pair, you can tick first in pair and read paired. This gives a value of 65 (64 + 1).
If you want to get the SAM flag for mate unmapped, you can tick mate unmapped and read paired. This gives a value of 9 (8 + 1).
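Ticking boxes on the page amounts to adding up the bit values of the selected properties. As a rough sketch of that arithmetic in plain shell (combine_flags is a hypothetical helper name used only for illustration, not part of samtools; the bit values are the ones defined in the SAM specification):

```shell
# Bit values of the properties used in this post (from the SAM specification).
READ_PAIRED=1
READ_UNMAPPED=4
MATE_UNMAPPED=8
FIRST_IN_PAIR=64

# combine_flags: OR together any number of bit values and print the result.
# Hypothetical helper for illustration only; it is not a samtools command.
combine_flags() {
    total=0
    for bit in "$@"; do
        total=$(( total | bit ))
    done
    echo "$total"
}

combine_flags "$READ_UNMAPPED"                  # prints 4
combine_flags "$FIRST_IN_PAIR" "$READ_PAIRED"   # prints 65
combine_flags "$MATE_UNMAPPED" "$READ_PAIRED"   # prints 9
```

If samtools is installed, its flags subcommand should do a similar conversion between flag names and numbers, for example samtools flags PAIRED,MUNMAP.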
Then, depending on whether you want to include or exclude the reads with the corresponding SAM flag, you can use -f, -F, or -G to do the filtering.
For example, if you want to extract unmapped reads:
samtools view -f 4 test.bam
If you want to extract mapped reads (that is, exclude the unmapped ones):
samtools view -F 4 test.bam
  • -f INT only include reads with all of the FLAGs in INT present [0]
  • -F INT only include reads with none of the FLAGS in INT present [0]
  • -G INT only EXCLUDE reads with all of the FLAGs in INT present [0]
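To make the three option descriptions concrete, here is a minimal sketch of the bit tests they describe, written in plain shell so it runs without a BAM file. passes_filter is a hypothetical helper invented for this illustration; it is my reading of the help text above, not the actual samtools code.

```shell
# passes_filter FLAG REQUIRE_ALL EXCLUDE_ANY EXCLUDE_ALL
# Hypothetical helper mirroring the -f / -F / -G descriptions above.
# Returns 0 (success) if a read with FLAG would be kept, 1 otherwise.
passes_filter() {
    flag=$1
    require_all=$2   # plays the role of -f
    exclude_any=$3   # plays the role of -F
    exclude_all=$4   # plays the role of -G

    # -f: every bit in require_all must be set in the read's flag
    [ $(( flag & require_all )) -eq "$require_all" ] || return 1
    # -F: none of the bits in exclude_any may be set
    [ $(( flag & exclude_any )) -eq 0 ] || return 1
    # -G: drop the read only when every bit in exclude_all is set
    # (the default of 0 is treated here as "no filter")
    if [ "$exclude_all" -ne 0 ] && [ $(( flag & exclude_all )) -eq "$exclude_all" ]; then
        return 1
    fi
    return 0
}

# 77 = read paired + read unmapped + mate unmapped + first in pair
# 73 = read paired + mate unmapped + first in pair (the read itself is mapped)
passes_filter 77 4 0 0  && echo "77 kept by -f 4"
passes_filter 73 0 4 0  && echo "73 kept by -F 4"
passes_filter 77 0 0 12 || echo "77 dropped by -G 12"
passes_filter 73 0 0 12 && echo "73 kept by -G 12"
```

The options can also be combined on one command line. For instance, samtools view -f 8 -F 4 test.bam should keep reads that are mapped themselves but whose mate is unmapped, while -G 12 drops only the reads for which both the read and its mate are unmapped.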