What are the best ways to start learning bioinformatics for a wet lab biologist?

I have come across this question several times online, and I have been asked it many times in real life too.
Many times, my friends have asked me: “Hey, can I learn Bioinformatics with you? Can you give me some materials?” At first, I would say: “Sure. Let me know if you need any help.” Then, in most cases, nothing happened after I sent them the web links or e-books. Now if someone asks me the same question, I usually say: “Are you sure? If you take it seriously, I’ll teach you.” My point here is: take it seriously.

What is Bioinformatics?

Bioinformatics is the science of collecting and analyzing complex biological data such as genetic codes. This is the definition given by the first result of a Google search, but it may be too abstract to understand.
Based on my own understanding, I’ll describe Bioinformatics in terms of the following aspects:
  • Software development
Maybe the most popular tool is BLAST. When I was in college, I took a course called “生物信息学”, which is “Bioinformatics” in English. But what I learned there was blastp, blastx, PSI-BLAST, etc., so I thought Bioinformatics was all about alignments. That’s it. Then I heard of NGS (next-generation sequencing), though I had no idea what NGS was. Speaking of NGS, BWA is another very popular tool in the NGS field. The BWA paper (published in 2009) had been cited 11,117 times as of 5/4/2017.
My friends and I also developed a tool called ViewBS, which helps users analyze BS-seq (bisulfite sequencing) data and generate publication-ready figures.
[Photo: the author of BWA]
  • Web applications are another product of Bioinformatics. AgriGO is a web app for GO (Gene Ontology) analysis, and there are many similar web apps, such as DAVID and KOBAS. In fact, the future plan for ViewBS is to develop a web app or a desktop app for its users.
  • Database development is another type of Bioinformatics project. For example, DroughtDB is a database of drought stress-related genes. There are many other databases, such as AraPort.
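GO analysis web apps such as AgriGO and DAVID essentially ask whether a GO term appears in your gene list more often than expected by chance. One common way to test this is the hypergeometric test; the exact statistics, background gene sets, and multiple-testing corrections differ between tools. Below is a minimal Python sketch of that idea, not the implementation used by any of these tools. The function name and the numbers in the example are hypothetical.

```python
from math import comb


def go_enrichment_pvalue(k, n, K, N):
    """One-sided hypergeometric p-value, P(X >= k), for a single GO term.

    N: number of genes in the background (e.g. all annotated genes)
    K: number of background genes annotated with the GO term
    n: number of genes in your list (e.g. differentially expressed genes)
    k: number of genes in your list annotated with the GO term
    """
    # Sum the probabilities of seeing k or more annotated genes by chance
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


if __name__ == "__main__":
    # Hypothetical numbers: 15 of 500 listed genes carry a term that
    # annotates 200 of 20,000 background genes (about 5 expected by chance)
    p = go_enrichment_pvalue(k=15, n=500, K=200, N=20000)
    print(f"p = {p:.2e}")
```

A real analysis runs a test like this for every GO term and then corrects for multiple testing, which is why enrichment tools typically report adjusted p-values rather than raw ones.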

Data mining

First I’ll talk about NGS data analysis. Different NGS methods generate different types of data and answer questions at different layers of biology. For example, several colleagues and I worked on RNA-seq and MethylC-seq data to understand the mechanisms of imprinted genes in maize. The questions we were trying to answer were how many imprinted genes there are in maize, what their expression patterns are across different tissues, how epigenetics regulates these imprinted genes, and so on.
Besides sequencing data, there are other types of data, such as proteomics and metabolomics data.
There are many different areas, and I can’t mention them all here. But the most important thing is what contribution your Bioinformatics project can make, or what questions it can address.

How to learn Bioinformatics?

As I explained in the section What is Bioinformatics, there are many different fields within Bioinformatics. So the best way to learn Bioinformatics really depends on which field you plan to work in.
I started analyzing NGS data in 2010, almost 7 years ago. So here I’ll tell you how to learn Bioinformatics for NGS data analysis.
The metaphor I like best is that Bioinformatics is a house with a door. For newbies, it’s like standing outside the house: you don’t know what’s inside, and you need to open the door. The problem is that a lot of people just walk around the door wondering what it looks like inside, but never try to open it.
So for a wet lab biologist, my first suggestion is to stick with it once you start learning Bioinformatics. Do NOT panic. In my experience, there will be a difficult period after you start. You’ll feel bored. You won’t even know what you can do with what you’re learning. You’ll feel helpless. But don’t worry, and stick with it. Everyone goes through this.
My second suggestion is to join a group. As a wet lab biologist, you are very likely the only person learning Bioinformatics in your lab. So do NOT learn it alone. It’s very important to have someone supervising you. If there isn’t a group, why not create one and invite other people to join?
My third suggestion is to learn what you can use in your own project right away. Using what you have learned in your own project is probably the best and fastest way to grow your Bioinformatics skills.
What to learn?
For NGS data analysis, here is what I suggest learning:
Linux command line: First, learn the basic commands, which will help you handle data in a Linux environment. You can learn them within a few hours, but the important thing is to practice and use them frequently, even every day.
Later, as your Bioinformatics skills grow, you can learn to write shell scripts in more depth.
Programming languages: Python, Perl, R, etc. Choosing which programming language to learn is a headache. My suggestion is to choose the one used most by the people around you, because as a newbie you’ll run into a lot of problems while programming. Sometimes it may take you a day, two days, or even a week to solve a problem on your own, while an experienced programmer may need just a minute. Again, stick with it. Don’t panic.
Learn the basic principles of the different applications of NGS technology and the corresponding data analysis workflows. NGS technology is now applied in many different ways, such as RNA-seq and ChIP-seq, and each application serves different purposes. For example, RNA-seq can be used to measure gene expression levels, identify DNA variants, etc. Knowing the details of each application helps you use these data more efficiently in your project.
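To make “measure gene expression levels” a bit more concrete: after reads are aligned and counted per gene, raw counts are usually normalized so that samples sequenced to different depths, and genes of different lengths, can be compared. Below is a minimal Python sketch of two common normalizations, counts per million (CPM) and transcripts per million (TPM). The gene names, counts, and lengths are made up, and differential expression tools such as DESeq2 and edgeR use their own normalization methods rather than these.

```python
def cpm(counts):
    """Counts per million: corrects for sequencing depth only."""
    total = sum(counts.values())
    return {gene: c / total * 1e6 for gene, c in counts.items()}


def tpm(counts, lengths_bp):
    """Transcripts per million: corrects for gene length, then depth."""
    # Reads per kilobase of gene length
    rpk = {gene: c / (lengths_bp[gene] / 1000) for gene, c in counts.items()}
    # Rescale so that the values in one sample sum to one million
    scale = sum(rpk.values()) / 1e6
    return {gene: v / scale for gene, v in rpk.items()}


if __name__ == "__main__":
    # Hypothetical read counts and gene lengths for a single sample
    counts = {"geneA": 100, "geneB": 300, "geneC": 600}
    lengths_bp = {"geneA": 1000, "geneB": 3000, "geneC": 2000}
    print(cpm(counts))
    print(tpm(counts, lengths_bp))
```

In this toy example geneB has three times as many reads as geneA but is also three times longer, so the two genes end up with the same TPM even though their CPM values differ.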
Try some real data: If you already have a project, try some real data related to it. Usually, similar data are publicly available. Download them and run some tests.
In summary, learning Bioinformatics is not easy. It’s a tough process, especially if you want to become a highly skilled Bioinformatician or Bioinformatics scientist. Keep the passion alive when you feel bored or find it hard, and ask your friends in your Bioinformatics group for help.
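A good first test on downloaded read data (for example, FASTQ files from a public repository such as NCBI SRA) is simply to check how many reads there are, how long they are, and what their GC content is. Dedicated tools such as FastQC do this and much more; the Python sketch below only illustrates the idea. It assumes the common four-lines-per-record FASTQ layout, and the function names are hypothetical.

```python
import gzip


def read_fastq(lines):
    """Yield (read_id, sequence, quality) from 4-line FASTQ records."""
    it = (line.rstrip() for line in lines)
    for header in it:
        if not header:
            continue  # tolerate blank lines, e.g. at the end of a file
        seq, plus, qual = next(it, None), next(it, None), next(it, None)
        if (qual is None or not header.startswith("@")
                or not plus.startswith("+") or len(seq) != len(qual)):
            raise ValueError(f"Malformed FASTQ record near: {header}")
        yield header[1:], seq, qual


def fastq_stats(lines):
    """Count reads and report mean read length and overall GC fraction."""
    n_reads = n_bases = n_gc = 0
    for _, seq, _ in read_fastq(lines):
        n_reads += 1
        n_bases += len(seq)
        n_gc += sum(1 for base in seq.upper() if base in "GC")
    return {
        "reads": n_reads,
        "mean_length": n_bases / n_reads if n_reads else 0.0,
        "gc_fraction": n_gc / n_bases if n_bases else 0.0,
    }


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path)


if __name__ == "__main__":
    # Tiny in-memory example with two 4-base reads
    demo = ["@read1", "ACGT", "+", "IIII", "@read2", "GGCC", "+", "IIII"]
    print(fastq_stats(demo))
```

On a downloaded file, something like `with open_fastq("my_sample.fastq.gz") as fh: print(fastq_stats(fh))` would work, where the file name is only a placeholder.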
====================================================================
If you have any questions or comments, please leave a comment under this blog post. I’ll update the post later based on the comments.
Also, I plan to start a YouTube channel to teach researchers how to analyze NGS data. I’ll let you know once it’s ready.
If you like this blog, please follow it. I’ll post new blog posts every week or every other week.
