NGS tips: Picard MarkDuplicates - whole bam file marked as duplicates?
If you used Picard MarkDuplicates to mark duplicate reads in your BAM files, you may notice that every record carries the tag "PG:Z:MarkDuplicates".
This can be confusing. If every read has this tag, does it mean that every read is a duplicate?
The short answer is NO. The PG tag only records which program processed the read; it says nothing about whether the read is a duplicate.
In a SAM/BAM file, whether a read is a duplicate is determined by the FLAG field (the 2nd column), specifically by its 0x400 (1024) bit.
You can decode SAM flags at this link: https://broadinstitute.github.io/picard/explain-flags.html
For example, I have a read which has the tag "PG:Z:MarkDuplicates" and a FLAG of 83. You can type 83 in the box and press the "Explain" button. Then you can see the summary shown on the left.
Each FLAG encodes 12 properties, one per bit. The 11th (value 1024) indicates whether the read is a duplicate. As you can see, for 83 (= 64 + 16 + 2 + 1) the 11th property is not set, meaning that this read is not a duplicate.
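If you would rather check this without the web page, the same decoding can be sketched in a few lines of Python. This is only a minimal illustration: the function names (decode_flag, is_duplicate, sam_line_is_duplicate) are made up for this post, the property names are paraphrased from the SAM specification, and the code assumes a plain tab-separated SAM text line rather than a binary BAM record.

```python
# The 12 FLAG properties, in bit order (bit 1 = value 1 ... bit 12 = value 2048).
# Descriptions are paraphrased from the SAM specification.
FLAG_BITS = [
    (0x1, "read paired"),
    (0x2, "read mapped in proper pair"),
    (0x4, "read unmapped"),
    (0x8, "mate unmapped"),
    (0x10, "read reverse strand"),
    (0x20, "mate reverse strand"),
    (0x40, "first in pair"),
    (0x80, "second in pair"),
    (0x100, "not primary alignment"),
    (0x200, "read fails platform/vendor quality checks"),
    (0x400, "read is PCR or optical duplicate"),
    (0x800, "supplementary alignment"),
]

DUPLICATE_BIT = 0x400  # the 11th property, decimal 1024


def decode_flag(flag):
    """Return the names of all properties that are set in a FLAG value."""
    return [name for bit, name in FLAG_BITS if flag & bit]


def is_duplicate(flag):
    """True if the duplicate bit (0x400) is set in the FLAG value."""
    return bool(flag & DUPLICATE_BIT)


def sam_line_is_duplicate(line):
    """Check one tab-separated SAM alignment line; FLAG is the 2nd column."""
    fields = line.rstrip("\n").split("\t")
    return is_duplicate(int(fields[1]))


print(decode_flag(83))
print(is_duplicate(83))    # False: 83 = 64 + 16 + 2 + 1, no 1024
print(is_duplicate(1107))  # True: 1107 = 1024 + 83
```

A read with FLAG 1107 would be the same kind of read as the FLAG 83 example, but with the duplicate bit set. The PG tag plays no part in the decision.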
You can also use REMOVE_DUPLICATES=