
By Daniel Mahar

Consider how much time and money go into research that enables us to live longer and healthier lives, and how much we invest in understanding what happens inside our bodies and minds, and what makes us happy.

Because of this, it’s critical that, after following the progress of a group of people for an extended period of time and finally gathering all the data they can offer, we use the best possible methods to analyse that data. Anything less would be a disservice not only to all the people who volunteered their time or underwent treatments, but also to everyone involved in that field of research and, indeed, to all the other statisticians out there.

During my AMSI Vacation Research Project, I have been working on various methods of analysing time-dependent data, looking at the strengths and weaknesses of the different methods and at what happens when the wrong method is chosen to analyse the data. With the improvements in technology and computing power over the past 15 years, as well as the availability of open-source statistical software such as R, we can now use more advanced methods, such as Generalised Least Squares (GLS) and Linear Mixed Effects (LME) models, to improve upon more commonly used methods such as Repeated Measures ANOVA (RMANOVA).

An important aspect of these newer methods is the ability to choose a more customised correlation structure: one that accounts more accurately for the natural errors that exist when taking random samples, and that better reflects the nature of people and how they change over time.
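To make the idea of a correlation structure concrete, here is a minimal sketch in Python of two commonly used choices for repeated measurements on the same person. The article does not name specific structures, so both are illustrative assumptions: compound symmetry (every pair of time points is equally correlated, a structure that satisfies the sphericity assumption behind RMANOVA) and first-order autoregressive, or AR(1), where correlation fades as measurements get further apart in time. The function names and the value of `rho` are hypothetical. In R's nlme package, the corresponding structures are `corCompSymm` and `corAR1`.

```python
def compound_symmetry(n_times, rho):
    """Correlation matrix in which every pair of distinct time points
    shares the same correlation rho."""
    return [[1.0 if i == j else rho for j in range(n_times)]
            for i in range(n_times)]


def ar1(n_times, rho):
    """Correlation matrix in which the correlation between time points
    i and j decays as rho ** |i - j|."""
    return [[rho ** abs(i - j) for j in range(n_times)]
            for i in range(n_times)]


if __name__ == "__main__":
    # Illustrative values only: 4 measurement occasions, rho = 0.5.
    for name, matrix in [("compound symmetry", compound_symmetry(4, 0.5)),
                         ("AR(1)", ar1(4, 0.5))]:
        print(name)
        for row in matrix:
            print("  ", row)
```

Under compound symmetry, a person's first and last measurements are assumed to be as strongly related as two consecutive ones. AR(1) instead lets that relationship weaken over time, which is often a more plausible description of how people change.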

GLS and LME are both relatively recent developments. My testing showed that, although both attempt to estimate the same coefficients and confidence intervals, they go about it in different ways. Consequently, whilst there was little or no difference in their estimates of the coefficients, there was a significant difference between the two methods in their estimates of the confidence intervals. This turned out to be the most significant difference of all, more so than the choice of correlation structure within each model.

The question remains of which model to choose and, more importantly, why. Sadly, there is no definitive answer here. Each model may well be more suitable under different circumstances and, indeed, the best model may be one not yet known that lies somewhere between GLS and LME. What is important is that, as the statistician, you’re able to justify why you’re choosing a particular model, why it works best under the given circumstances and, consequently, why your results are the most accurate possible and can be relied upon for decision making.

 

Daniel Mahar was one of the recipients of a 2013/14 AMSI Vacation Research Scholarship.