Preparing for a Biostatistics Internship

Mar 11, 2017 · 900 words · 5 minutes read bioinformatics cancer

I recently completed an eight-month stint at a cancer research institute in Toronto. It was a great experience that opened my eyes to the entire field of bioinformatics, the joys of working on collaborative projects, and many interesting problems in medical research.

From the moment I got the offer I knew this was an amazing opportunity, and wanted to make the most of it. I started preparing months in advance – in part because I didn’t want to screw up, but mostly because I couldn’t contain my excitement.

In this blogpost I’ll briefly summarize the steps I took to prepare, how each of them helped, and how each could have been improved.

1. Gain a Basic Understanding of Cancer as a Disease

Like most of us, I already had some idea of what cancer entails from seeing family members and others suffer through the disease. I wanted to supplement this with more comprehensive accounts, and started picking up books about the history of cancer.

This type of preparation provided the lowest return on investment, but I still found it to be valuable. The books were enjoyable to read, and provided a gentle introduction to questions such as “how do viruses cause cancer?” and “what’s the difference between chemotherapy and targeted therapies?”

In particular, I read:

The Death of Cancer by Vincent T. DeVita and Elizabeth DeVita-Raeburn
Vincent DeVita was one of the pioneers of combination chemotherapy regimens, working at the National Cancer Institute in Maryland. While his account is obviously biased, the book provides a fascinating overview of advances in cancer treatments from the 1960s onwards.

The Emperor of All Maladies by Siddhartha Mukherjee
This is an absolute masterpiece that I cannot recommend enough. Mukherjee goes through the history of cancer from the Babylonians to the present, and intersperses historical accounts with personal stories of cancer patients he has treated. My only issue with this book is that I couldn’t give it to anyone for Christmas, as my family tends to have a lower tolerance for death and disease than I do!

2. Learn Some Cancer Biology

Of all the steps I took to prepare, this was by far the most important one. While it’s technically possible to analyze data you don’t understand, it requires more hand-holding, and I find it to be less rewarding. As I was going to a large lab where the principal investigator would not have time to provide detailed instructions, I knew I had to put in some work myself.

Having quit biology after Grade 10, I had limited knowledge of genes, chromosomes and cells. I used an online course developed by the lab to get up to speed as fast as possible. The course covered basic biology and cancer biology, and allowed me to go into the internship with a basic understanding of the types of data that I would be working with.

3. Start Reading Papers

In the months leading up to the start of my placement, I read papers from the lab I would be joining every morning over breakfast. I didn’t understand much of them at first, but I highlighted and looked up all of the statistical tests.

Going through papers allowed me to get a sense for the kinds of problems the lab was working on, and what statistical methods they used to solve them. Many of the statistical techniques in biology papers are too specific to be given much coverage in statistics master’s programs. For example, multiple testing correction has never been mentioned as more than a side note in any of my classes, but is crucial in genomics research.
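To make the multiple testing point concrete, here is a minimal sketch of one common correction, the Benjamini-Hochberg procedure, in plain Python. The p-values are made up for illustration (imagine one test per gene), the function name is hypothetical, and nothing here is meant to suggest which correction the lab actually used.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values
    # never increase as the raw p-values get smaller.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values, e.g. one statistical test per gene.
p_values = [0.001, 0.008, 0.039, 0.041, 0.20]
q_values = benjamini_hochberg(p_values)

# Compare a naive 0.05 cutoff on raw p-values with the same
# cutoff applied to the adjusted values.
naive_hits = sum(p < 0.05 for p in p_values)
significant = [q <= 0.05 for q in q_values]
corrected_hits = sum(significant)
```

With these made-up numbers, fewer tests pass the cutoff after adjustment than before it. When thousands of genes are tested at once, that gap is what keeps a results table from filling up with false positives.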

If I had to do it over again, there are two things I would change about my paper-reading strategy.

First off, I would try to pick papers that were a better representation of the lab’s work. One week in March I forced myself through a paper by The Cancer Genome Atlas on the molecular characteristics of prostate cancer. The paper was far too specific for me to understand, and the lab had not been heavily involved in the work. As a general rule, papers with lots of lab members on the author list and the PI as last author give the best indication of the lab’s research.

Secondly, I tended to focus on the wrong parts of papers. In my experience, genomics papers are written around the figures, with the main text statistics being more of an afterthought. This is different from statistics papers, where the figures are added to back up the points made in the main text. After I realized this, I flipped my approach to reading papers – focusing on the figures as the story and seeing the main text as glue.


Taking the time to prepare really helped with the first few months of work. While I imagine I would have eventually learned the same things anyway, having the head start made me feel less lost.

Of course, there were still a lot of chances to feel lost – and a number of things I didn’t learn beforehand. For example, I had no idea how sequencing technologies work, and didn’t know how to write shell scripts. Both would have been useful, but I only had so much time to prepare. Thankfully, learning on the job was highly encouraged, and I now know much more about both sequencing and shell scripting!