Keywords: Statistics, data analysis, public policy, current affairs

Title: The Cult Of Statistical Significance

Authors: Stephen Ziliak and Deirdre McCloskey

Publisher: University of Michigan Press

ISBN: 0472050079

 

What are 'statistical significance' and 'standard error', and why should we care? For economists Deirdre McCloskey and Stephen Ziliak, these are fundamentally important questions, and we should all care. As they put it:

Crossing frantically a busy street to save your child from certain death is a good gamble. Crossing frantically to get another mustard packet for your hot dog is not. The size of the potential loss if you don't hurry to save your child is larger, most will agree, than the potential loss if you don't get the mustard.

In other words, it's not just the probability of success or failure that's important in the real world; it's the size of the pay-off that counts. Translate this probability of success or failure into 'statistical significance' and you have the heart of the problem.

It would be fortunate if we could simply blame innumerate journalists and commentators for mistaking 'statistical significance' for actual significance in the real world. Unfortunately there is more to the confusion over the meaning of 'significance' than journalistic ignorance or laziness. It is, argue the authors, increasingly common in the sciences too - and with potentially dire consequences.

A test of statistical significance asks whether a result found in a sample could plausibly have arisen by chance alone. By convention, a result is called significant if chance would produce something at least as extreme less than 1 time in 20. The test doesn't ask whether the result is important in any way. Results can have high levels of statistical significance and yet be utterly trivial or uninteresting. And, let's be honest, the press (popular and scientific) is increasingly filled with reports of findings that are statistically significant and yet totally pointless. Such research is sizeless, as the authors describe it: there is no measure of how big or important an effect is, and the only thing reported is statistical significance.
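To see how a trivial effect can still come out as 'significant', here is a minimal Python sketch. Every number in it is invented for illustration (an effect of one hundredth of a standard deviation, two equal-sized groups, a simple z-test with a known spread); none of it comes from the book.

```python
from math import erfc, sqrt


def two_sided_p_value(effect, sd, n_per_group):
    """Two-sided z-test p-value for a difference in means between two equal groups."""
    # Standard error of the difference between two group means.
    standard_error = sd * sqrt(2 / n_per_group)
    z = effect / standard_error
    # For a standard normal, P(|Z| >= |z|) equals erfc(|z| / sqrt(2)).
    return erfc(abs(z) / sqrt(2))


TINY_EFFECT = 0.01  # hypothetical: one hundredth of a standard deviation
SD = 1.0            # hypothetical spread of individual measurements

p_small_sample = two_sided_p_value(TINY_EFFECT, SD, 100)
p_huge_sample = two_sided_p_value(TINY_EFFECT, SD, 1_000_000)

print(f"100 per group:       p = {p_small_sample:.3f}")
print(f"1,000,000 per group: p = {p_huge_sample:.2e}")
```

The effect is the same size in both runs; only the sample size changes. With a large enough sample the tiny effect clears the 1-in-20 threshold easily, which is exactly why a significance test alone says nothing about importance.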

The danger, as the authors take care to point out, is not that our newspapers and web pages are full of insignificant 'significant' results, but that tests of statistical significance are increasingly becoming an end in themselves. This now risks distorting not just the 'soft' sciences (including economics), but the 'harder' sciences too, particularly epidemiology. And this is significant in the real sense, way beyond the rarefied world of academic publishing.

One example that McCloskey and Ziliak highlight is the case of Vioxx, the anti-inflammatory drug marketed and later withdrawn by Merck. During clinical trials of the drug five people died while taking Vioxx, compared to one person in the control group taking generic anti-inflammatory drugs. This 5 to 1 ratio was not statistically significant, so the conclusion was drawn that there was no difference in the effects of the drugs. This is just one example of many, though a particularly dramatic one.
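A rough Python sketch shows how a 5 to 1 split can fail the conventional test. It rests on a loud assumption that is not in the review: that the two trial groups were the same size, so that under "no difference" each of the six deaths was equally likely to fall in either group. The trial's actual group sizes and analysis are not given here, so treat this purely as an illustration.

```python
from math import comb


def two_sided_p_value(events_a, events_b):
    """Exact two-sided p-value for a split of events between two arms.

    ASSUMPTION (not from the review): both arms are the same size, so under
    the null hypothesis each event lands in either arm with probability 1/2.
    """
    total = events_a + events_b
    extreme = max(events_a, events_b)
    # Probability that one given arm receives `extreme` or more of the events.
    one_tail = sum(comb(total, k) for k in range(extreme, total + 1)) / 2 ** total
    # Double it to count a split this lopsided in either direction.
    return min(1.0, 2 * one_tail)


p = two_sided_p_value(5, 1)
print(f"p = {p:.3f}")  # well above the conventional 0.05 cut-off
```

Under that assumption the p-value is about 0.22, nowhere near the 1-in-20 threshold, even though the observed death rate is five times higher. The test result reflects how few events there were, not how small the difference is.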

As well as outlining the issue itself, the authors highlight research that looks at particular fields (economics, for example) to see how widespread the problem is. The results are singularly depressing - not only does sizeless research make up the major part of what is published in some of the most respected journals in the field, but its share is growing ever closer to 100%. The problem is getting worse, not better.

How did this situation come about? The history is outlined in some detail, and we are introduced to both the villain of the piece (RA Fisher) and an unsung and unremembered hero (William Gosset). It was Gosset who pioneered many statistical techniques that are used daily in science and industry. He worked for the Guinness brewery and was precluded from publishing under his own name. Instead he published his work using the pseudonym Student, a typically modest choice of name. Fisher, on the other hand, suffered from a huge ego and worked assiduously to promote himself and his ideas. Where Gosset was concerned that tests of significance did not translate into real world importance, Fisher had no such qualms. He extended Gosset's work in the sizeless direction, and in some cases even wrote Gosset completely out of the story. The account casts an interesting light on the history of statistics - particularly the relationship between early statistical analysis and eugenics.

Despite appearing to be a book of limited appeal - it is, after all, a book that looks at a set of statistical techniques - it is one that has immense social implications. We live in an age where ideologies have largely been cast aside and instead we are governed increasingly by a class of politicians and civil servants who aim for 'evidence-based' policy-making. When that evidence rests on statistically significant results that ignore the size of the effects they describe, then we all have reason to pay attention.

Contents © London Book Review 2008. Published 23 December 2008