My concerns about rRNA-depleted RNA-seq data

Recently I analyzed some rRNA-depleted RNA-seq data and found that a lot of reads came from intronic and intergenic regions.

Here is one example from Illumina's commercial samples:
How you can get the figure above:
Go to basespace.illumina.com, log in, and go to the Public Data tab.
Select HiSeq 2000: TruSeq Stranded Total RNA (MAQC) and import the project into your account.
Select RNA-Seq alignment and scroll down to the alignment distribution. You can see that ~50% intronic and intergenic reads is typical for commercial samples in our internal workflow.
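To make it concrete what an "alignment distribution" measures, here is a minimal Python sketch. It is not how BaseSpace computes its numbers; the toy gene model, the read positions, and the function names `classify_read` and `alignment_distribution` are all made up for illustration. It assumes each read is summarized by a single position and that coordinates are 0-based, half-open.

```python
from collections import Counter


def classify_read(pos, genes):
    """Classify one read position as exonic, intronic, or intergenic.

    genes: list of dicts with 'start' and 'end' (the gene span) and
    'exons' (a list of (start, end) tuples). Hypothetical toy format.
    """
    in_gene = False
    for gene in genes:
        if gene["start"] <= pos < gene["end"]:
            in_gene = True
            # Exonic in any overlapping gene takes priority over intronic.
            if any(s <= pos < e for s, e in gene["exons"]):
                return "exonic"
    return "intronic" if in_gene else "intergenic"


def alignment_distribution(positions, genes):
    """Return the fraction of reads in each of the three categories."""
    counts = Counter(classify_read(p, genes) for p in positions)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads to classify")
    return {k: counts[k] / total for k in ("exonic", "intronic", "intergenic")}


# Toy example: one gene with two exons and one intron between them.
genes = [{"start": 100, "end": 500, "exons": [(100, 200), (400, 500)]}]
positions = [150, 450, 250, 300, 600, 50, 120, 350]
dist = alignment_distribution(positions, genes)
print(dist)
```

With real data you would take the read coordinates from a BAM file and the gene model from a GTF annotation, but the tallying logic is the same idea.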

What are the possible sources of these intronic and intergenic reads?
"Some reads are definitely expected as this kit sequences both coding and noncoding RNA. The library prep uses an rRNA depletion, so everything else should be present aside from small RNA <200 nt or so. This means the reads we sequenced could be from mRNA, pre-mRNA, nascent RNA and/or degraded mRNA, which is also the reason that we saw many reads in the introns. " This is the answer from Illumina Tech Support. 

My two concerns from the standpoint of analysis:

1) Researchers usually start from exonic reads to identify differentially expressed genes (DEGs). I think the reads we want should come from mature mRNA, because mature mRNA is what gets translated into protein to carry out its function. But pre-mRNA, nascent RNA and/or degraded mRNA also contain exons, so they contribute exonic reads too. How will these reads affect DEG identification? I don't know.

2) Researchers who perform rRNA-depleted RNA-seq will also want to identify lncRNAs. Some lncRNAs lie within introns (intronic lncRNAs). But if many of the intronic reads come from pre-mRNA, nascent RNA and/or degraded mRNA, how can we use them to identify "real" lncRNAs?
The same applies to intergenic lncRNAs (lincRNAs), especially those that themselves contain introns.
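
For concern 1, one rough way to get a feeling for how much of a gene's exonic signal might come from unspliced transcripts is to use the intronic read density as a background estimate. The sketch below is only a back-of-the-envelope illustration, not an established method or anything Illumina recommends. It assumes that pre-mRNA and nascent RNA cover exons and introns at the same density, which is a strong simplification, and the function name `estimate_mature_exonic_reads` and the numbers are hypothetical.

```python
def estimate_mature_exonic_reads(exon_reads, exon_length,
                                 intron_reads, intron_length):
    """Estimate exonic reads attributable to mature mRNA for one gene.

    ASSUMPTION (not from any kit documentation): unspliced transcripts
    cover exons and introns at the same per-base read density, so the
    intronic density approximates the unspliced background in exons.
    """
    if exon_length <= 0:
        raise ValueError("exon_length must be positive")
    if intron_length <= 0:
        # Single-exon gene: no intronic background can be estimated.
        return float(exon_reads)
    intron_density = intron_reads / intron_length
    background = intron_density * exon_length
    # Clamp at zero so a noisy intron estimate never gives negative counts.
    return max(0.0, exon_reads - background)


# Hypothetical gene: 900 reads over 3,000 exonic bases,
# 1,000 reads over 20,000 intronic bases.
mature = estimate_mature_exonic_reads(900, 3000, 1000, 20000)
print(round(mature))
```

In this made-up example about 150 of the 900 exonic reads would be attributed to unspliced transcripts. Whether such a correction actually changes the DEG list is exactly the open question above.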



