Your genome is your personal DNA sequence. DNA is a very long molecule that we represent as a string of letters, each of which can be one of four types: A, T, G or C. The human genome has about 3 billion of these ‘letters’.
Now let me give you a feel for how big this really is. Imagine Tolstoy’s very thick novel War and Peace.
It has around 3 million letters in it, so the human genome has as many letters as 1,000 copies of War and Peace. If you were to pile this many copies of the paperback novel on top of each other you would form a stack about as high as an 18-storey building.
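The comparison is simple arithmetic, and the short Python sketch below checks it. The paperback thickness (about 5.5 cm) and storey height (about 3 m) are assumptions chosen for illustration. They are not figures given in this article.

```python
GENOME_LETTERS = 3_000_000_000      # about 3 billion letters in the human genome
WAR_AND_PEACE_LETTERS = 3_000_000   # about 3 million letters in the novel

# Assumptions for illustration only (not from the article):
PAPERBACK_THICKNESS_M = 0.055       # roughly 5.5 cm per paperback copy
STOREY_HEIGHT_M = 3.0               # roughly 3 m per storey

copies = GENOME_LETTERS // WAR_AND_PEACE_LETTERS   # how many novels fit in one genome
stack_height_m = copies * PAPERBACK_THICKNESS_M    # height of the pile in metres
storeys = stack_height_m / STOREY_HEIGHT_M         # the same height in building storeys

print(copies)          # 1000
print(round(storeys))  # 18
```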
There are a lot of letters in that stack, and a lot of information that we are trying to understand. A genetic disease is like having a typo in one of those copies of War and Peace. To find that typo we first have to read every letter, and to do this we shatter the DNA into millions of small pieces and then sequence (read) all of these fragments simultaneously.
So this is like taking our 18-storey stack of War and Peace books, ripping out all the pages, shredding them into pieces that only contain a few hundred letters and simultaneously ‘reading’ the letters on each tiny piece of paper.
As a result we have a huge pile of shredded words and letters, and the trick is to put them back together correctly to recreate the original books. This is a hard task and really a computational and statistical challenge: you cannot do it by hand, and you cannot even do it with an Excel spreadsheet!
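The shred-and-reassemble idea can be sketched in a few lines of Python. This is a toy greedy overlap assembler. It assumes error-free reads of equal length that tile the text at a fixed step, and a text with no long repeated stretches. The names `shred`, `overlap` and `greedy_assemble` are hypothetical. Real reads start at random positions and contain errors, and the human genome is full of repeats, so practical pipelines use far more sophisticated methods. For a human genome, they often align reads to an existing reference sequence rather than assembling it from scratch.

```python
import random


def shred(genome, read_len, step):
    """Cut the text into overlapping pieces of read_len letters, one every `step` letters."""
    starts = list(range(0, len(genome) - read_len + 1, step))
    if starts[-1] != len(genome) - read_len:
        starts.append(len(genome) - read_len)  # make sure the last letters are covered
    return [genome[s:s + read_len] for s in starts]


def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b (at least min_len), else 0."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)  # candidate spot where b might begin inside a
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads, min_overlap):
    """Repeatedly merge the two pieces with the longest overlap until none overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads
    # Toy assumption: no read is contained in the middle of another read.
    while len(contigs) > 1:
        best_len, best_pair = 0, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_overlap)
                    if olen > best_len:
                        best_len, best_pair = olen, (i, j)
        if best_pair is None:
            break  # no overlaps left: the pieces cannot be joined further
        i, j = best_pair
        merged = contigs[i] + contigs[j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]
    return contigs


genome = "ACGTTGCAAGGCTTACCGAT"
reads = shred(genome, read_len=8, step=4)
random.Random(1).shuffle(reads)  # the pieces arrive in no particular order
print(greedy_assemble(reads, min_overlap=3))  # ['ACGTTGCAAGGCTTACCGAT']
```

Each merge joins the pair of pieces with the longest overlap, so the shuffled order does not matter in this tiny example. With real data, repeats longer than the overlap would lead this greedy strategy to join the wrong pieces. That is part of why reassembly is a statistical problem and not just bookkeeping.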
My job as a bioinformatician is to work with these data computationally and statistically in order to make sense of the DNA and understand what it means. We try to find the ‘typo’ in the genome that is related to disease. I see my job as the interface between our ability to generate data and our ability to understand what it is telling us about biology.
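Finding the ‘typo’ can be illustrated in a similar way. Each of a patient's reads is placed where it best matches a reference text, and the sketch then reports positions where several reads agree on a different letter. This is a minimal sketch under strong assumptions: short error-free reads, brute-force placement by counting mismatches, and a hypothetical `min_support` threshold. The function names are hypothetical, and this is not the author's method. Real variant callers rely on fast alignment tools and statistical models of sequencing error.

```python
from collections import Counter


def place_read(reference, read):
    """Try every offset; return (best_start, mismatch offsets within the read)."""
    best_start, best_mm = 0, None
    for start in range(len(reference) - len(read) + 1):
        window = reference[start:start + len(read)]
        mm = [i for i, (r, q) in enumerate(zip(window, read)) if r != q]
        if best_mm is None or len(mm) < len(best_mm):  # keep the first placement with fewest mismatches
            best_start, best_mm = start, mm
    return best_start, best_mm


def call_typos(reference, reads, min_support=2):
    """Report (position, reference letter, read letter) seen in at least min_support reads."""
    support = Counter()
    for read in reads:
        start, mismatches = place_read(reference, read)
        for i in mismatches:
            pos = start + i  # 0-based position in the reference
            support[(pos, reference[pos], read[i])] += 1
    return sorted(v for v, n in support.items() if n >= min_support)


reference = "ACGTTGCAAGGCTTACCGAT"
patient = "ACGTTGCAAGTCTTACCGAT"  # one letter changed at position 10 (G -> T)
reads = [patient[s:s + 8] for s in (0, 4, 8, 12)]
print(call_typos(reference, reads))  # [(10, 'G', 'T')]
```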
Dr Alicia Oshlack is Head of Bioinformatics at the Murdoch Childrens Research Institute and an NHMRC Career Development Fellow.