Sifting the Evidence: or how the blog got its name.
I like statistics. Sometimes they elude me, but the feeling when a concept or technique finally reveals itself is one of the joys of my studies. Which is a good job, as observational epidemiology isn't light on analyses. The geeky joy of stats was instilled in me by my boss Marcus, who made his lab group read papers about statistical techniques and gave us lectures about power calculations and suchlike. Back when I was a newbie, he brought us a paper called 'Sifting the Evidence: what's wrong with significance tests?', by Jonathan Sterne and George Davey Smith. Little did I know that less than a year later I'd be doing a PhD in their department, with a blog named after their paper.
So what was it about the paper that inspired me?* I suppose it was the first time my eyes were opened to the problems with p values, or at least to the problem with treating any p value below the dreaded .05 as 'significant'. Psychology often gets a rap on the knuckles for over-reliance on p values. I know the feeling of waiting for SPSS to spit out its results and dreading a p value of .055 or .06. While I knew .05 was an arbitrary cutoff, I also knew of journals whose author instructions advised that if your p value was higher than .05 you should not claim an association, even if p=.051.
And why is this a problem? Surely we have to put a cutoff somewhere? The paper explains the problems beautifully, with a brief history of the development of the p value, and a comparison of its initial purpose with its current abuse.
I'm not going to go into the paper in too much detail; it's worth a read and open access, so seek it out. But it's important to consider that this paper was written in 2001, over a decade ago. I started my undergraduate degree (Psychology) in 2001, and it feels like a long time ago (although the realisation that it's that long ago slightly depresses me), but this article is just as relevant today as it was when it was written. Publication bias is still a problem. P values are still poorly understood and even more poorly reported.
There are 5 guidelines that Sterne and Davey Smith suggest journal editors should issue as instructions for authors: banning the use of 'significant' to describe findings, reporting confidence intervals and focussing on the clinical implications, reporting exact p values rather than arbitrary cutoffs, scepticism of subgroup analyses, and careful consideration of confounding and bias. But perhaps their most important suggestion is to consider null hypotheses more carefully before running studies. As they point out, if 100 randomised trials of useless treatments are conducted, then all the 'significant' findings will be spurious. Epidemiology is in danger of not being taken seriously if its findings are contradictory. If random, poorly powered, data-dredging studies become more commonplace, there will be more spurious findings and more studies that disagree with each other, leading to a confused message presented to the public and, in the extreme, distrust in research.
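The '100 trials of useless treatments' point is easy to see in a quick simulation. What follows is only a rough sketch of my own, not anything from the paper: the function names, the 50 people per arm, the normally distributed outcome with a known standard deviation of 1, and the simple z-test are all assumptions made for illustration. Both arms are drawn from the same distribution, so any trial that comes out below .05 is a false alarm by construction.

```python
import math
import random


def two_sided_p(z):
    """Two-sided p value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_null_trial(rng, n_per_arm=50):
    """One trial of a useless treatment: both arms share the same distribution."""
    treated = [rng.gauss(0.0, 1.0) for _ in range(n_per_arm)]
    control = [rng.gauss(0.0, 1.0) for _ in range(n_per_arm)]
    diff = sum(treated) / n_per_arm - sum(control) / n_per_arm
    # Standard error of the difference in means, assuming a known SD of 1 per arm.
    se = math.sqrt(2.0 / n_per_arm)
    return two_sided_p(diff / se)


def count_significant(n_trials=100, alpha=0.05, seed=2001):
    """Count how many null trials fall below the cutoff purely by chance."""
    rng = random.Random(seed)
    p_values = [simulate_null_trial(rng) for _ in range(n_trials)]
    return sum(p < alpha for p in p_values)


if __name__ == "__main__":
    print(count_significant(), "of 100 useless treatments looked 'significant'")
```

On average about 5 in every 100 such trials will slip under the .05 cutoff, although any single run of 100 can give a few more or a few fewer. Every one of them is spurious, which is exactly why the prior plausibility of the hypothesis matters as much as the p value.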
Reading this paper was a revelation. These ideas are now second nature to me, but when talking to others or peer reviewing papers, it still surprises me how poorly understood they are. When Neil, Dylan and I originally set up this blog on blogspot, we had a big debate about names. But for me, it was this paper that set me on a track of questioning what was presented to me, and not taking findings at face value. And that's how the blog got its name!
* NB I should say I didn't move to the department because of the paper; in fact I had no idea the authors were there, so much so that when I went for my PhD interview I referenced the paper without realising I was talking to Jonathan and George's colleagues!