Quality over quantity

9 July 2012 by Pete Etchells, posted in Uncategorized

Darwin’s theory of Natural Selection. Einstein’s theory of general relativity. Watson and Crick’s description of the DNA molecule. All of these, and many more, are easily classed as some of Science’s Greatest Discoveries. But with modern pressures to publish high-impact papers as often as possible, are the opportunities to make the next Great Scientific Discovery being stifled?

As I’ve mentioned before, last year one of the academic heavyweights of social psychology, Diederik Stapel, was found guilty of faking data in a lot of his research. Quite a lot of data, actually – over 30 scientific papers and numerous PhD theses. In the aftermath, a lot of difficult questions have been asked about how and why something like this could possibly happen. Some of the reasoning behind his actions comes from Stapel himself:

“…I did not withstand the pressure to score, to publish, the pressure to get better in time. I wanted too much, too fast. In a system where there are few checks and balances, where people work alone, I took the wrong turn.”

Stapel’s behaviour is quite clearly inexcusable, but the pressure to publish may feel familiar to many other researchers – perhaps most acutely to the one hundred academics who recently lost their jobs at the University of Sydney for not publishing frequently enough.

In principle, the idea of publishing as much as possible doesn’t seem too bad – if you’re running lots of experiments and doing lots of work, it means that we might get to see those Great Scientific Discoveries sooner, right? But this sort of mentality can cause (and has already caused) a number of undesirable side effects. Perhaps the best known is ‘publication bias’. This can manifest itself in different ways, but generally refers to the tendency for positive results (in other words, those in which a hypothesis is confirmed) to be much more likely to be published than negative or inconclusive results. Put another way, if you run a perfectly good, well-designed experiment but your analysis comes up with a null result, you’re much less likely to get it published, or even to submit it for publication in the first place. This is bad, because it means that the total body of research that does get published on a particular topic might be completely unrepresentative of what’s actually going on.

It can be a particular issue for medical science. Say, for example, I run a trial for a new behavioural therapy that’s supposed to completely cure anxiety. My design is perfectly robust, but my results suggest that the therapy doesn’t work. That’s a bit boring, and I don’t think it will get published anywhere that’s considered prestigious, so I don’t bother writing it up; the results just get stashed away in my lab, and maybe I’ll come back to them in a few years. But what if labs in other institutions run the same experiment? They don’t know I’ve already done it, so they just carry on with it. Most of them find what I found, and again don’t bother to publish their results – it’s a waste of time. Except a couple of labs did find that the therapy works. They report their experiments, and now it looks like we have good evidence for a new and effective anxiety therapy, despite the large body of (unpublished) evidence to the contrary.
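
To put some rough numbers on that scenario, here’s a minimal sketch of it as a toy simulation – every parameter (a hundred labs, thirty participants per group, the 0.05 threshold) is made up purely for illustration, not taken from any real trial:

    # Toy simulation of the scenario above: a therapy with no real effect is
    # tested independently by many labs, but only 'significant' positive
    # results get written up. All parameters are invented for illustration.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)
    n_labs, n_per_group = 100, 30

    published, file_drawer = [], 0
    for lab in range(n_labs):
        treated = rng.normal(0.0, 1.0, n_per_group)   # therapy: zero true effect
        control = rng.normal(0.0, 1.0, n_per_group)
        t, p = stats.ttest_ind(treated, control)
        if p < 0.05 and t > 0:        # only 'positive' findings make it to a journal
            published.append(lab)
        else:
            file_drawer += 1

    print(f"Labs publishing an apparently effective therapy: {len(published)}")
    print(f"Null results left in the file drawer: {file_drawer}")

Run something like that and, purely by chance, a handful of ‘successes’ typically reach the literature while the contradictory null results stay invisible – which is exactly the distortion described above.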

This leads into all sorts of issues rooted in precisely how we statistically analyse our work, but there’s another, simpler problem that this phenomenon causes: it wastes a lot of time for a lot of people. There must be countless scientific studies out there that, whilst methodologically sound, simply didn’t produce a result deemed interesting enough to publish. And because they weren’t published, we have no measure of how many times they’ve inadvertently been replicated elsewhere. Compounding this problem, if a particularly time-intensive experiment doesn’t work out, researchers might find themselves under pressure to quickly publish something else instead; something that might not be particularly interesting or useful, but is quick, easy, and likely to have a positive outcome. The end result is that we’re sacrificing scientific creativity and research diversity for safe options, science in small increments, and administrative box-ticking. In Psychology, projects like Psychfiledrawer are starting to address this issue, but clearly more needs to be done.

We have to accept, and be comfortable with, the fact that theories and ideas need time to fully develop. As scientists, we need to be okay with things when they don’t work out the way we thought they would. The world is a big, noisy, messy place, and not only is that absolutely fine, it’s also exciting. Darwin’s theories on evolution were almost 23 years in the making; would the modern-day pressure to publish have meant that On the Origin of Species ended up confined to a dusty lab drawer, in favour of a quick and easy, but perhaps mediocre, paper?


29 Responses to “Quality over quantity”

  1. Chris Chambers

    Thanks for this post, Pete. Spot on as usual.

    We’ve got a lot of problems to solve in the coming years to combat publication bias, bean-counting measures that favour quantity of papers over quality, and fraud.

    Some possible solutions to mull over:

    1) Journals to archive all raw data upon submission of manuscripts. The raw data to be publicly available.

    2) We forever abandon the flawed impact factor as a proxy measure of journal/article quality, for the REF, and in assessing applicants for jobs.

    3) I quite like the idea of pre-registering scientific studies, as Neuroskeptic has suggested: http://neuroskeptic.blogspot.co.uk/2012/04/fixing-science-systems-and-politics.html
    There are some kinks to iron out, but something like this would help combat publication bias and is definitely on the right track (it has helped reduce this problem in clinical trials).

    4) A shift toward Bayesian statistical analyses, which allow researchers to directly test the null hypothesis (a rough sketch of what this can look like follows this list)

    5) We derive some kind of replicability index for scientific studies and give this more weight than quantity of pubs. I like many of Brian Nosek et al’s ideas, e.g. arxiv.org/pdf/1205.1055 and arxiv.org/pdf/1205.4251

    6) We move toward standardisation of analysis approaches in psychology and neuroimaging, to avoid the problem of having too much flexibility and consequently reporting spurious results following data mining (this could be helped by instituting (3) and having reviewers assess methods, including analysis approach, even before the results are in). Then authors are bound to their decisions – or must at least report the original outcome + any additional analyses.
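
    To give a flavour of (4), here is a minimal sketch on made-up data, using the BIC approximation to a Bayes factor (Wagenmakers, 2007) rather than any particular package – the point being that the output can quantify support for the null, rather than just failing to reject it:

        # Sketch: BIC approximation to a Bayes factor (Wagenmakers, 2007) for
        # H0: mean = 0 vs H1: mean free, on simulated data with no true effect.
        import numpy as np

        def bic_fixed_mean(data, mu=0.0):
            """BIC for a normal model with the mean fixed at mu (variance estimated)."""
            n = len(data)
            sigma2 = np.mean((data - mu) ** 2)               # ML variance estimate
            loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
            return -2 * loglik + 1 * np.log(n)               # one free parameter

        def bic_free_mean(data):
            """BIC for a normal model with mean and variance both estimated."""
            n = len(data)
            sigma2 = np.mean((data - data.mean()) ** 2)
            loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
            return -2 * loglik + 2 * np.log(n)               # two free parameters

        scores = np.random.default_rng(1).normal(0.0, 1.0, size=60)   # no real effect

        # BF01 > 1 means the data favour the null; BF01 < 1 favours the alternative.
        bf01 = np.exp((bic_free_mean(scores) - bic_fixed_mean(scores)) / 2)
        print(f"Approximate Bayes factor in favour of the null: {bf01:.2f}")

    With a conventional t-test, the best you can say about a null result is ‘not significant’; a Bayes factor at least puts a number on the relative support for the null versus the alternative.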

  2. Pete Etchells

    Thanks for the comments, Chris. I really like the idea of pre-registering – it works in other domains, so it seems like the kinks aren’t insurmountable.

    Requiring archival of all raw data is also an option, but I worry that in practice it wouldn’t really be manageable. Does it mean that we end up having to require reviewers to reanalyse the data as well? I’m sure many would be resistant to that idea. But you’re right, maybe just simply having data publicly available, to be scrutinised at any point, would deter a lot of bad behaviour.

  3. Chris Chambers

    Yeah, I don’t think we could realistically expect reviewers to reanalyse data – but uploading it at the submission stage will deter outright fraud (e.g. of the Stapel kind), and will also make authors who commit lesser but still serious offences (e.g. data massaging) think twice. Even for honest scientists it will encourage us to pay that extra bit of attention to our data and make absolutely sure we haven’t made an embarrassing error that a dedicated reviewer could check for themselves.

    Making raw data freely available also allows later meta-analyses to be undertaken with minimum fuss.

    The big problem with pre-registration, as I see it, is avoiding having people game the system by ‘pre’ registering after a study is complete. But that’s potentially solvable too.

  4. Mr Clean

    Interesting article, but unfortunately you seem to mix the quite different concepts ‘publish or perish’ and ‘publication bias’. While I can agree that both are, more or less, a problem, their specific nature – and potential solution – is completely different.

  5. Pete Etchells

    They are two different things, but they aren’t mutually exclusive – and I’m making the argument that one can impact upon the other, not confusing them. Data don’t always behave in the way we expect them to, and when this happens, it can be a bad thing for your end-of-grant report. It’s quite difficult to get null results published (the publication bias bit), so you might be pushed to do a quick and dirty experiment off the back of the project that isn’t that interesting, but you know is low-risk and likely to produce a positive outcome (due to the publish or perish mentality). The outcome is that scientific creativity is stifled.

  6. Mr Clean

    As I understand it from your initial article, you argue that ‘publish or perish’ causes ‘publication bias’.

    When you argue that both concepts together lead to sloppy science I can fully agree, but as far as I know there’s no causal relationship between them.

  7. Pete Etchells

    I’m not saying that it’s the sole cause. There’s a route through which they can be linked, though.

  8. Khalil A. Cassimally

    Essentially, in an ideal world, all data amassed from well-designed experiments would be available in some way or another online. There are a number of services/startups currently trying to enable just that. Figshare(.com), for instance, wants to be a repository for all your data to allow for more effective sharing and meta-analyses, amongst other things, further down the line.

    This is the way to go for scientific publishing, surely. Especially when taking into account the amount of data we produce today (and will in the future), the traditional model(s) of scientific publishing can no longer keep up.

  9. Pete Etchells

    One potential problem with making data available online is participant consent. I can envisage certain types of experiment in which researchers might wish to collect detailed personal data – not necessarily anything that could result in the participant being identified, but data which might be deemed sensitive by them (even if it’s just things like sexual behaviour preferences). Researchers will have to get consent from participants to make this data freely available for others to see, and it won’t always be possible to obtain this. Moreover, there is potential for the abuse of this data – for example, if I (as a participant) consent to you using my data for your specific experiment only, I would be very unhappy to find out that someone had obtained the data from a website and used it for other experiments without my consent. I’m not sure how we get around those issues.

  10. Mike Fowler

    Shameless plug for a journal that was started to deal with many of the important issues you raise, Pete:

    The Journal of Negative Results – Ecology & Evolutionary Biology
    http://www.jnr-eeb.org/

    (apologies, but there’s a problem formatting links in the Preview, at least)

    I’m an associate editor there, and we always welcome submissions that are scientifically rigorous, but don’t rely on a p-value to make a point. There are no publication fees!

  11. Pete Etchells

    Cool, thanks for the information, Mike.

  12. Jenny Bryant

    Was just about to mention the Journal of Negative Results before seeing that previous comment. Such a good and necessary Journal! I'm interested as to whether people will spend the time to write up negative results. If everyone did, I can imagine it would be one hefty journal!

    I feel a lot of pressure to get experiments to show something special but I've seen what happens to people who do that and also have a conscience! As my boss says, "At least I know you haven't faked these results!"

    I've been guilty of making a genuine mistake that produced some really good-looking results, and when I found out, it was horrible to have to admit it. I can imagine others keeping quiet, and that makes me sad.

    Great point, well raised Pete!

    • Pete Etchells

      Hey Jenny, thanks for the comment! I think the difficulty with new journals (no offence, Mike) is that they take quite a while to build up momentum, and at the moment that means there's not much incentive for people to write up null results for them. Why write up a failed experiment for a low-impact journal, when you can run another that might have a better chance of getting into somewhere more established? But hopefully that mentality will change.

      The other option, that's been mentioned in Chris' blog and elsewhere, is to enforce pre-registration of studies before they're run - that way, if they don't work and don't get written up, we'll at least know of their existence.

      I've also been in the position where I've screwed up some results and had to admit to it. Thankfully, the results looked too perfect to fit the argument, so it was clear early on that I'd buggered up. It was very embarrassing though, and I'm with you on understanding how easy it would be for someone not to mention it and hope it doesn't get noticed. Sadly, I don't think the systems we have in place to flag this sort of thing are working the way they should be.

