By Shian Su, Walter and Eliza Hall Institute of Medical Research
I present to you something I like to call the “Layman’s Conjecture”. It is stated as follows: “There is probably a program for that.” It is a conjecture that is frequently stated, and many expect it to be true.
The reality is that there really are a great number of programs for a great deal of things, but not in the way you’d expect. To a programmer, a program is simply a bunch of code that does something useful, but most of us are spoiled by “programs” with buttons and pretty interfaces.
Genetic sequencing produces tens of gigabytes of data; a preliminary study of half a dozen mice can easily fill up a Blu-ray disc! So to make sense of the data coming out of these studies, we have to make use of a lot of computing power. Unfortunately, at such cutting-edge regions of research, the software isn’t as nice and polished as we’re used to. In fact, most of the analysis is run through complicated computer scripts or terminal commands, which to the untrained observer might as well look like one of those screens from ‘The Matrix’.
This is how programs secretly do everything in the background.
The statistical analysis for biology experiments is usually done by a statistician, who in turn usually uses a program written by another statistician who is an expert in that particular area. For very simple experiments it’s actually possible for biologists to do their own analysis, or rather it would be if they could program, because to make use of the kinds of programs statisticians are used to they would need to write little computer scripts. So I wanted to try and make little tools that let biologists do some simple analysis using state-of-the-art statistical methods. This saves biologists trouble because they don’t have to go find a statistician; it also saves statisticians trouble because they don’t get swamped by biologists looking to have their data analysed. So I ended up making some web-based tools for biologists to statistically determine whether or not the genes in two different experimental groups are expressed differently.
The tool I made! Looks like some boring old form rather than an actual program, but filling out forms is still easier than writing computer code.
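For comparison, here is a rough idea of the kind of little script a biologist would otherwise have to write. It is only a sketch in Python, not the tool described here and not the state-of-the-art method behind it: it swaps in a simple exact permutation test on the difference in group means. The gene names, the expression values and the function name `permutation_p_value` are all made up for illustration.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test on the difference in group means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    total = 0
    extreme = 0
    # Try every way of relabelling the samples into two groups of the same sizes.
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in range(len(pooled)) if i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance so floating-point rounding does not hide ties.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Made-up log2 expression values, for illustration only: gene -> (control, treated).
expression = {
    "geneA": ([5.1, 5.3, 4.9, 5.2], [7.8, 8.1, 7.9, 8.3]),
    "geneB": ([6.0, 6.4, 5.8, 6.1], [6.2, 5.9, 6.3, 6.0]),
}

for gene, (control, treated) in expression.items():
    change = mean(treated) - mean(control)
    p = permutation_p_value(control, treated)
    print(f"{gene}: log2 change = {change:+.2f}, p = {p:.3f}")
```

With four samples per group there are only 70 possible relabellings, so the smallest p-value this test can ever give is 2/70 (about 0.029), which is what the clearly separated "geneA" gets. That limitation is one reason real analyses lean on more specialised statistical methods, and one reason a form to fill in is the friendlier option.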
Once it’s all well tested and completed, biologists can take the samples from their two experimental groups and work out whether genes in a mouse or human have become more or less active under some conditions. This means they can work out which genes are responsible for which disease and find drugs that can target those genes to fight the disease. Right now, some decisions on experimental outcomes are made based on rules unrelated to statistics, but mathematics is precise and powerful, and I think you’d prefer a doctor to use a scalpel rather than a kitchen knife. Hopefully my tools make it into the hands of researchers everywhere.
Shian Su was one of the recipients of a 2013/14 AMSI Vacation Research Scholarship.