Omics Academy


Showing posts from February, 2017

python

February 05, 2017

I have many Excel files with multiple sheets in which the first row contains data, not headers. How can I parse each sheet without a header row, instead of the default header row 0? Having the first row used as my headers is a pain. Failing that, what is the best way to insert a column index above the first row of data?

Set header=None in your parse() call. pandas will then use default integer column names (0, 1, 2, ...). You can then change the names before or after your concat.
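A minimal pandas sketch of that answer, assuming each workbook is opened with `pd.ExcelFile` and every sheet is read with `parse(..., header=None)`. The helper names `read_all_sheets` and `combine_sheets`, and the extra `sheet` column that records where each row came from, are hypothetical choices for illustration, not part of the original answer.

```python
import pandas as pd


def read_all_sheets(path):
    """Read every sheet of a workbook, treating the first row as data."""
    xls = pd.ExcelFile(path)
    # header=None stops pandas from consuming row 1 as column names;
    # columns are labelled 0, 1, 2, ... instead.
    return {name: xls.parse(name, header=None) for name in xls.sheet_names}


def combine_sheets(sheets, column_names=None):
    """Rename columns (optional) and concatenate sheets into one DataFrame."""
    frames = []
    for name, df in sheets.items():
        df = df.copy()
        if column_names is not None:
            # Rename before the concat; renaming after it works just as well.
            df.columns = column_names
        # Hypothetical bookkeeping column: which sheet each row came from.
        df["sheet"] = name
        frames.append(df)
    return pd.concat(frames, ignore_index=True)
```

Usage would look like `combine_sheets(read_all_sheets("data.xlsx"), ["id", "value"])`, where the file name and column names are placeholders. Keeping reading and combining separate makes the renaming step easy to test on in-memory DataFrames.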

In RNA-seq, it is generally recommended not to use short genes (< 300 bp).

February 03, 2017

In RNA-seq analysis, it is generally recommended not to use genes that are short (< 300 bp). For details, see:
http://onetipperday.sterding.com/2012/11/dont-trust-cufflinks-fpkm-for-short.html
http://seqanswers.com/forums/archive/index.php/t-17404.html

Here is how Cole (author of Cufflinks) responded to the observation of "very high RPKM values from Cufflink":

This issue has been discussed elsewhere on this board. As Nicholas points out, RNA-Seq really isn't reliable for very short transcripts. The reason is that all the fragments that map to these transcripts come from the "tail" of the distribution of library fragment lengths. That is, fragments that map to microRNAs are much, much shorter than most fragments in the library - by design in the RNA-Seq protocol, which size selects away very short inserts. Thus, Cufflinks infers that even though relatively few fragments actually mapped to the microRNAs, there were probably TONS of individual microRNA molecules in the transcriptome before all of the various size...
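To see why short transcripts can end up with inflated expression values, here is a minimal Python sketch under a simplifying assumption: effective length = transcript length - mean fragment length + 1 (floored at 1). This is not Cufflinks' actual likelihood-based estimator, which uses the full fragment-length distribution; it only illustrates how a small effective length inflates a length-normalized value. The function names and the `min_length=300` cutoff (taken from the recommendation above) are illustrative.

```python
def effective_length(length, mean_frag_len):
    """Simplified effective length: positions where a mean-length fragment fits."""
    # Floor at 1 so transcripts shorter than a fragment do not divide by <= 0.
    return max(length - mean_frag_len + 1, 1)


def fpkm(count, length, mean_frag_len, total_mapped):
    """Fragments per kilobase of effective length per million mapped fragments."""
    eff = effective_length(length, mean_frag_len)
    return count * 1e9 / (eff * total_mapped)


def drop_short_genes(gene_lengths, min_length=300):
    """Keep only genes whose length (bp) is at least min_length."""
    return {gene: n for gene, n in gene_lengths.items() if n >= min_length}


if __name__ == "__main__":
    # Same 10 fragments, 1e7 mapped fragments, mean fragment length 200 bp.
    short = fpkm(10, 220, 200, 1e7)   # effective length 21
    long_ = fpkm(10, 2000, 200, 1e7)  # effective length 1801
    print(f"220 bp: {short:.2f}  2000 bp: {long_:.2f}")
```

With identical counts, the 220 bp transcript gets a value roughly 86 times higher than the 2000 bp one in this toy model, which is why a length cutoff such as 300 bp is a common precaution.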

NotFromMe: Simulating genes and counts for DESeq2 analysis

February 02, 2017

Sometimes it is helpful to simulate gene expression data to test code or to see how your results look with simulated values from a particular probability distribution. Here I am going to show you how to simulate RNA-seq expression counts from a uniform distribution with a minimum of 0 and a maximum of 1200.

```r
# Get all human gene symbols from biomaRt
library("biomaRt")
mart <- useMart(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")
my_results <- getBM(attributes = c("hgnc_symbol"), mart = mart)
head(my_results)
# Simulate 100...
```
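For readers working in Python, a rough analogue of the simulation step is sketched below. It assumes gene identifiers are already available (the biomaRt query above is not reproduced), and the function name `simulate_counts`, the `sample_N` column labels, and the sample count are hypothetical. Counts are drawn uniformly from the integers 0 to 1200 inclusive, matching the range stated above.

```python
import numpy as np
import pandas as pd


def simulate_counts(gene_ids, n_samples, low=0, high=1200, seed=None):
    """Simulate a genes x samples count matrix from a discrete uniform distribution."""
    rng = np.random.default_rng(seed)
    # endpoint=True makes the upper bound inclusive, so counts lie in [low, high].
    counts = rng.integers(low, high, size=(len(gene_ids), n_samples), endpoint=True)
    columns = [f"sample_{i + 1}" for i in range(n_samples)]
    # Rows are genes and columns are samples, the layout DESeq2 expects for counts.
    return pd.DataFrame(counts, index=gene_ids, columns=columns)


if __name__ == "__main__":
    genes = [f"gene_{i}" for i in range(1, 6)]  # placeholder gene IDs
    print(simulate_counts(genes, n_samples=4, seed=42))
```

Passing a fixed `seed` makes the matrix reproducible between runs. Note that DESeq2 models counts with a negative binomial distribution, so uniform counts are mainly useful for exercising code rather than for judging statistical behaviour.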