Dr. Baggerly is best known for his work on “forensic bioinformatics”, in which reexamination of raw data shows the need for careful experimental design, preprocessing, and documentation – the careful application of basic statistics and sanity checks. Keith took some time out of his busy schedule to answer questions about his field of research; we look forward to hearing him speak at this year’s BioInfoSummer.
What do you think are the most interesting “big questions” in your field?
Only some patients respond well to chemotherapy, and we only get one shot at picking a treatment. How can we predict patient response to chemotherapy from single measurements of patient tumors and perturbation studies on associated cell lines?
Most chemotherapies kill lots of cells. To what extent can we use genomic information to better choose targeted therapies? How much of an improvement in clinical outcome is likely to result? How can this be balanced against cost and assay stability? To what extent can we use roundabout paths (targeting second-line players) to slow down previously untreatable disease?
Recent work has shown that resensitizing patients’ immune systems can produce positive responses in many patients, and durable responses (lasting years) in about 20%. What determines which patients will respond to these therapies?
Please tell us about your research interests and what you are currently working on.
I’m interested in applying mathematical techniques to improve patient care. Given the present deluge of big data, that’s largely translated into trying to distill usable information from huge swaths of data, and dealing with the first-order processing and design questions that need to be addressed before anything useful can be inferred.
I’m further interested in the extent to which some questions can simply be overwhelmed by collecting and organizing large volumes of relevant data; examining the mutational profiles of thousands of cancer samples is a case in point. Once the data are moderately clean, there’s the additional challenge of grasping enough of the underlying biology to think of natural interrelationships that can be checked between different types of measurements (e.g., DNA, RNA, protein). Most recently, I’ve been exploring these issues in the context of perturbation studies where we kick one gene at a time.
Do you have favourite applications of your work and what is the impact of these applications?
My favorite applications vary over time. Those that give me opportunities to learn more biology and have the potential to yield less “scorched earth” therapies in the cancer context are the general leaders. Much of the impact of my work thus far has been associated with clarifying the basic processing pipelines for working with several types of data, so we’re in agreement as to reasonable summaries of what the data actually are as we try to understand the underlying stories.
Why did you choose this career?
When I first started graduate school I wasn’t really sure exactly what I wanted to do. I drifted into statistics because I was good at mathematics and I liked seeing how such techniques could be applied. After a few years at Los Alamos and Rice University, I chose to go into bioinformatics because it was (a) an applied area with (b) lots of novel types of data and (c) clear relevance to patient health.
Can you tell us about the highlight of your career so far?
There have been a few. Identifying and describing relevant processing procedures for new types of data (microarrays, mass spectrometry, reverse phase protein arrays) has been exciting and interesting. That said, what I’m best known for (by far) involves attempts to prevent incorrect data from directing patient care. People are understandably excited by the potential for big data to supply useful information, but either aren’t aware of or drastically underestimate the potential for artifacts or simple mistakes to drive patterns that “look cool”. In a few instances now, we’ve explored exciting results reported in the literature to see whether they could be implemented at MD Anderson to improve patient care, only to discover that the “results” arose from active errors, not biology, and needed to be actively blocked from implementation so they wouldn’t hurt patients. In one case, things had already gone too far and patients were being incorrectly treated, and we had to fight to get the institutions involved to act according to what the data actually showed, as opposed to what the investigators wanted to believe. Unfortunately, this took years, which highlighted the need for higher standards before things could get that far.