How to filter SAM/BAM files
Very often we need to extract unmapped reads from a SAM/BAM file. You google it, and someone says
samtools view -f 4 test.bam
can help you extract all the unmapped reads. Sometimes you need to extract the singleton reads, where only one read of a pair is mapped to the reference genome and its mate is not, and you end up using
samtools view -f 9 test.bam
Note that -f 9 keeps every paired read whose mate is unmapped, including reads that are unmapped themselves; add -F 4 if you only want the mapped read of each such pair.
But without a Google search, how could you know and understand all these things? This post will show you how.
Imagine you want to extract unmapped reads. How do you know which FLAG value to use?
First you need to go to Decoding SAM flags.
Here is a screenshot of the web page:
To find out what the SAM flag value would be for a given combination of properties, tick the boxes for those that you’d like to include. The flag value will be shown in the SAM Flag field above.
Here is a GIF animation to show you how to do so:
If you want to know the SAM flag value for unmapped reads, you can tick read unmapped. Then you will get a value of 4 in the SAM flag field.
If you want to get the SAM flag for first in pair, you can tick first in pair and read paired, which gives 65 (64 + 1).
If you want to get the SAM flag for mate unmapped, you can tick mate unmapped and read paired, which gives 9 (8 + 1).
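The web page is doing simple arithmetic: each property is one bit, and the flag value is the sum (bitwise OR) of the ticked properties. Here is a minimal shell sketch of that calculation. The bit values come from the SAM specification, but the short property names and the function names flag_value and sam_flag are made up for this example and are not part of samtools.

```shell
# Bit value of each SAM flag property (values from the SAM specification).
# The short names on the left are hypothetical labels used only in this sketch.
flag_value() {
  case "$1" in
    paired)         echo 1 ;;
    proper_pair)    echo 2 ;;
    unmapped)       echo 4 ;;
    mate_unmapped)  echo 8 ;;
    reverse)        echo 16 ;;
    mate_reverse)   echo 32 ;;
    first_in_pair)  echo 64 ;;
    second_in_pair) echo 128 ;;
    secondary)      echo 256 ;;
    qc_fail)        echo 512 ;;
    duplicate)      echo 1024 ;;
    supplementary)  echo 2048 ;;
    *) echo "unknown property: $1" >&2; return 1 ;;
  esac
}

# Combine any number of properties into a single flag value with bitwise OR.
sam_flag() {
  local total=0 name value
  for name in "$@"; do
    value=$(flag_value "$name") || return 1
    total=$(( total | value ))
  done
  echo "$total"
}

sam_flag unmapped                # prints 4
sam_flag paired first_in_pair    # prints 65
sam_flag paired mate_unmapped    # prints 9
```

Recent samtools releases also include a samtools flags subcommand that converts between flag values and property names, if you prefer to stay on the command line.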
Then, depending on whether you want to include or exclude the reads with the corresponding SAM flag, you can use -f, -F or -G to do the filtering.
For example, if you want to extract unmapped reads:
samtools view -f 4 test.bam
If you want to extract mapped reads (that is, exclude the unmapped ones):
samtools view -F 4 test.bam
-f INT only include reads with all of the FLAGs in INT present [0]
-F INT only include reads with none of the FLAGS in INT present [0]
-G INT only EXCLUDE reads with all of the FLAGs in INT present [0]
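To make the three options concrete, here is a rough shell sketch of the bit tests they describe. The function name keeps is hypothetical, and this is only an approximation of the documented behavior, not samtools' actual code. It assumes that a value of 0 (the default) disables each filter.

```shell
# keeps FLAG f F G
# Succeeds (exit status 0) if a read with this FLAG would pass
# "samtools view -f f -F F -G G" under the rules quoted above.
keeps() {
  local flag=$1 f=$2 F=$3 G=$4
  # -f: every bit in f must be set in the read's flag
  [ $(( flag & f )) -eq "$f" ] || return 1
  # -F: none of the bits in F may be set
  [ $(( flag & F )) -eq 0 ] || return 1
  # -G: drop the read only when all bits in G are set (assumed: 0 disables it)
  if [ "$G" -ne 0 ] && [ $(( flag & G )) -eq "$G" ]; then
    return 1
  fi
  return 0
}

# Example flags:
#   4  = read unmapped
#   73 = read paired + mate unmapped + first in pair (the read itself is mapped)
#   77 = read paired + read unmapped + mate unmapped + first in pair
# Which of them would "-f 9 -F 4" keep?
for flag in 4 73 77; do
  if keeps "$flag" 9 4 0; then
    echo "$flag kept"
  else
    echo "$flag dropped"
  fi
done
```

Only 73 is kept here: 4 lacks the paired and mate unmapped bits required by -f 9, and 77 is rejected by -F 4 because the read itself is unmapped. With -G 12 instead, 77 would be dropped (both bits 4 and 8 are set) while 73 would stay, because -G requires all of its bits to be present before it excludes a read.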