R: DESeq2 analysis: outliers and refitting

September 16, 2016

While running DESeq2, I got the messages shown below:

converting counts to integer mode
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
-- replacing outliers and refitting for 47 genes
-- DESeq argument 'minReplicatesForReplace' = 7
-- original counts are preserved in counts(dds)
estimating dispersions
fitting model and testing

I had not encountered this message before. Here is the explanation:

Answers:

Flagging count outliers is useful when outliers are a small minority in the dataset, but, as you have noted, something else is going on when this many genes are flagged. There are two likely explanations: either the method for flagging outliers does not suit the distribution of counts in your data and should be turned off (by setting minReplicatesForReplace=Inf in DESeq() and cooksCutoff=FALSE in results()), or one sample is a count outlier in almost every gene (which you can check with plotPCA, as described in the vignette). My recommendation: if you don't find an obvious outlier sample that accounts for most of the flagged genes, turn off the filtering and inspect the top genes with plotCounts.
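The checks described above can be sketched in R roughly as follows. This is a minimal illustration, not the original poster's analysis: it assumes simulated data from makeExampleDESeqDataSet() in place of a real dataset, a design column named "condition", and the variable names dds_default, dds_off, res_off and top_gene, all of which are made up for the example.

```r
# Sketch under assumptions: simulated data stands in for your own DESeqDataSet.
library(DESeq2)

set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 16)  # 8 samples per condition

# 1. Look for a single sample that behaves as an outlier across most genes.
vsd <- varianceStabilizingTransformation(dds, blind = TRUE)
plotPCA(vsd, intgroup = "condition")

# 2. Fit with the default outlier handling and count the replaced genes.
#    mcols(...)$replace is only present when replacement was attempted.
dds_default <- DESeq(dds)
n_replaced <- sum(mcols(dds_default)$replace, na.rm = TRUE)
message("genes with replaced outlier counts: ", n_replaced)

# 3. Refit with outlier replacement turned off, and skip Cook's-distance
#    filtering when extracting the results table.
dds_off <- DESeq(dds, minReplicatesForReplace = Inf)
res_off <- results(dds_off, cooksCutoff = FALSE)

# 4. Inspect the per-sample counts of the top gene by p-value.
top_gene <- rownames(res_off)[which.min(res_off$pvalue)]
plotCounts(dds_off, gene = top_gene, intgroup = "condition")
```

If step 1 shows one sample sitting far from the rest, removing or investigating that sample is probably a better first move than switching off the filtering for every gene.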

Also note that the columns used for modeling cannot contain NA values; DESeq2 needs complete covariate information.
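One way to check for missing covariates before building the dataset is sketched below in base R. The sample table, its column names and the design_vars vector are hypothetical; in a real analysis the table would be your own sample sheet, and the count matrix columns would need to be subset with the same index.

```r
# Hypothetical sample table; replace with your own sample sheet.
coldata <- data.frame(
  condition = factor(c("ctrl", "ctrl", "treat", "treat", "treat")),
  batch     = factor(c("a", NA, "a", "b", "b")),
  row.names = paste0("sample", 1:5)
)

# Columns that appear in the design formula (assumed here: ~ batch + condition).
design_vars <- c("condition", "batch")

# TRUE for samples with no NA in any design column.
keep <- complete.cases(coldata[, design_vars, drop = FALSE])

# Report the samples that would have to be dropped.
print(rownames(coldata)[!keep])

# Keep complete samples only and drop unused factor levels.
coldata_complete <- droplevels(coldata[keep, , drop = FALSE])
# The count matrix must be subset the same way, e.g. counts[, keep].
```

Whether to drop the incomplete samples or to leave the incomplete covariate out of the design is an analysis decision that this sketch does not settle.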

References:

https://www.biostars.org/p/149031/
