It’s not a failure when you fail to replicate

3 May 2013 by Pete Etchells

Let’s get this out there to begin with, so it’s absolutely clear in everyone’s minds. ‘Failure to replicate’ a study does not mean that the original study was wrong, poor, or fraudulently conducted. It does not call into question an entire field of science. It does not call into question the integrity of any scientists involved. It means that the results of the replication did not match the original study, which could be for a number of reasons. It is simply a part of the scientific process, and a good part at that.

Which is why I was completely flummoxed by a recent Nature headline screaming “Disputed results a fresh blow for social psychology”. The article relates to a recent study published in PLOS ONE, which looked at the concept of ‘intelligence priming’ - that thinking about someone perceived to be smart (or stupid) can affect your performance on an intelligence test. Over the course of 9 experiments, the study attempted to replicate the findings of a 1998 paper by Dijksterhuis and van Knippenberg, and the results all pointed towards intelligence priming providing no advantage in subsequent intelligence tests (i.e. contradicting the original results). I’m not going to go into the nitty-gritty of the specific studies - if you’re interested, it’s worth reading the paper along with responses in the comments. But there are two points to note here. One, as already mentioned, is that replication studies are important, and should be forming a huge part of scientific research - so it’s a good thing that this study was conducted (and published). The second is that one failure to replicate does not constitute a death blow for a particular theory.

To echo Gary Marcus’ recent post on the matter, social psychology does not equal priming, and priming does not equal social psychology. To say that one failure to replicate one particular phenomenon is a blow for the entire field is disingenuous, and tarnishes the many admirable attempts currently being made to not just turn psychology around, but also to lead the way in reforming scientific research practices. Initiatives like the Reproducibility Project, Cortex’s Registered Reports (which went live this week), and BMC Psychology’s open access approach to reviewing are all shining examples of the positive and beneficial moves currently being made.

Perhaps another, more worrying problem is the association between failures to replicate and fraud. Again, to be clear on this, there are two completely separate conversations to be had. One is the need to replicate psychological studies to determine whether the effects we see are genuine and robust. The other is whether questionable research practices (QRPs) are leading to over-inflated and erroneous results. Again, others have gone into excellent detail on these matters recently, but it’s worth remembering that this isn’t a two-way street. Replication is part of the answer to preventing or discouraging QRPs. QRPs are not an inherent part of failures to replicate. In my opinion, to discuss any failure to replicate with specific reference to the fraud of Stapel and Smeesters, as the Nature article did, unfairly and unnecessarily calls into question the integrity of honest researchers.

In short, I don’t think every failure to replicate a study is news-worthy, and I certainly don’t think it helps anyone to persistently link such failures to extreme cases of fraud. Many psychologists are actively trying to reform the field in innovative and interesting ways, for the benefit of everyone. Let’s concentrate on that being a positive thing.

15 Responses to “It’s not a failure when you fail to replicate”

  1. Khalil A. Cassimally | Permalink

    You say that you "don’t think every failure to replicate a study is news-worthy". While this inherently is fair enough, do you think that every published study is news-worthy? Better still, would you like for all of your published papers to be covered by the media? If you answer yes to either of those questions, which I anticipate most scientists will, then surely studies that fail to replicate findings deserve to be in the news as well.

    Regardless, the fact remains that news coverage is based on demand. Right now, replication problems in social psychology are in the limelight and media entities will milk as much traffic (hence revenue) as they possibly can from them. But what will happen when people are sick of hearing about those studies? Swept under the carpet, restricted to obscure journals, snubbed by mainstream media? This, I think, is the question we should ask.

    • Daniel Lakens | Permalink

      "If you answer yes to either of those questions, which I anticipate most scientists will"

      Then you really don't know scientists. The media should write about results that are relevant for a general public. Highly novel findings, of which it is unclear why they occur, or how reliable they are, can be shared among scientists, to allow people to examine the topic, but have no place in newspapers. Science needs time to figure out what is the signal, and what is the noise. Personally, I'm only annoyed when I see a single study paper presented in the media as the 'truth'.

      • Khalil A. Cassimally | Permalink

        "Highly novel findings, of which it is unclear why they occur, or how reliable they are, can be shared among scientists, to allow people to examine the topic, but have no place in newspapers."

        I have to disagree with you here, Daniel. If we keep thinking that certain papers have no place in newspapers, then we're only contributing to the single-study-paper problem you mention. All papers have the potential to be in the newspaper, if not on their own accounts, for lack of a better word, then as part of a larger theme or argument which a journalist is conveying.

    • Pete Etchells | Permalink

      Thanks for the comment Khalil. With regards to the first bit (i.e. is everything news-worthy?); no, I don't think every published study is, in and of itself, news-worthy. Sure, there are angles you can take that bring a few studies together, but it depends whether you're talking about reporting an individual paper, or a number along a theme. Obviously I'm biased in thinking my own papers are interesting, but I'm not necessarily convinced that they would make for a compelling news story, and I'm sure a lot of other people would think the same.

      With regards to the second bit (i.e. news coverage/demand), I agree with what you're saying. I think the issue is what we mean by 'replication problems'. A failure to replicate a study is not a replication problem. Having no replications at all is a replication problem. In that sense, the Nature piece in question misunderstood what the issue is.

    • Eric Rietzschel | Permalink

      Failures to replicate are newsworthy under the same condition that holds for other findings: if they give us solid answers to questions about how things work. The importance of non-replications often (usually?) lies in the fact that they force us to ask (or re-ask) questions. Questions usually don't make for very popular news stories.

  2. Rolf Zwaan (@RolfZwaan) | Permalink

    Great post. I agree 100%. In fact, I just made the very same points in an email to the Nature journalist. One additional point though. Smeesters has never admitted to having committed fraud. The scientific integrity committee I chaired was able to show that some of his findings were too-good-to-be-true and that he had lost most of his raw data. We decided that based on this, we had no confidence in the findings of those studies, which were subsequently retracted. Given that we were not able to prove fraud, I think it's even unfair (or premature) to say that Smeesters has committed fraud.

  3. Eric Rietzschel | Permalink

    Excellent piece, and I couldn't agree more. The notion that non-replication is a cause for suspicion will be counterproductive, because (a) it will stain the general reputation of replication research (turning it into detective work rather than science), and (b) it will make researchers react defensively to non-replications of their results (turning it into an argument, rather than a discussion).

  4. Elisabeth | Permalink

    A failure to replicate can be driven by small changes in the experimental design in some cases. I agree that this gives clues to how robust the finding is, but doesn't mean that under the original test conditions, with the original participants, the finding didn't happen. Contradictory results lead to more questions - not about the integrity of the scientists, but about what small aspect of the changes in experimental procedures may have led to the divergent findings - which may lead to important changes in how we think about the world. If the finding can't hold up to seemingly minor modifications, then it's something important to know, but if the media is going to call "fraud" on every study that isn't replicated, then that would be good enough reason to not WANT to try to replicate things - and that alone would set back our science in many ways.

  5. Roman | Permalink

    The problem with priming studies is that for many years psychologists treated them as proven facts. For example, Daniel Kahneman, in his book Thinking, Fast and Slow, wrote (page 61 in the electronic version):
    "The idea you should focus on, however, is that disbelief is not an option. The results are not made up, nor are they statistical flukes. You have no choice but to accept that the major conclusions of these studies are true. More important, you must accept that they are true about you."

  6. Patricia J Hawkins | Permalink

    I agree that the headlines were bad. However, as I understand it, the original researchers apparently refused to supply their methods to other researchers when asked, saying that small factors would affect reproducibility. If true, I have a problem with that. That's why scientists keep lab notebooks. That's why there's a "methods" section in scientific papers -- so other scientists can attempt to reproduce your results, and at the very least figure out where you went wrong. (Also see "cold fusion" -- which IMO was scientific success, where success was actually the failure to replicate!)

    If a researcher cannot do that *by some means*, perhaps by videotaping everything, I do not think that is science; I think that is a fuzzy area of scientific attempt and anecdote.

    I know that psychology is going through a process of becoming more rigorous, and all of science is pushing back against the fraud and, ah, "data tweaking" which result from academic and commercial pressures, and this sort of report is ultimately healthy -- but IMO the problem here is not just failure to replicate, but failure of the original researchers to give sufficiently good methods so that others could *either* reproduce OR fail to reproduce their results.

  7. Keith Laws | Permalink

    Nice post, Pete, and I absolutely agree that 'failure' to replicate (and I have said before, we may need a new term for this) is - usually - unrelated to issues of QRPs. Although some failures to replicate will be related to QRPs, these will be a minority - to think that 'failure' to replicate itself indicates QRPs have occurred is a typical piece of reverse inferencing (that sadly occurs not just in journalists, but is common in psychologists) - worse still - it impugns folk with absolutely no 'evidence' and should be roundly condemned!
    My view is that a 'failure' to replicate is an 'opportunity' - to investigate moderator variables - what factors influence the phenomenon - make new predictions, and test new hypotheses. Failed replications are rarely the end of the story, but the start - if a 'failure' to replicate didn't suggest new hypotheses to test, then we might rightly call it a failure!

  8. Gerard te Meerman | Permalink

    There is a statistical aspect of replication relating to the power to replicate a result. It is not likely that studies had high power to find the results they ended up with. What we see as published results is a selection from all experiments done, where the negative results are much less likely to get published. If there was, say, 50% chance to end up with the results obtained, one should do a study with about 4 times the number of subjects, to have 95% probability of replication. This makes replication expensive.
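The arithmetic behind that kind of estimate can be sketched as follows. This is a rough illustration only, not the commenter's own calculation: it assumes a normal approximation, a two-sided test at alpha = 0.05, and that the true effect is exactly the one implied by 50% power in the original study. The function name `sample_size_multiplier` is made up for this example.

```python
from statistics import NormalDist


def sample_size_multiplier(original_power=0.50, target_power=0.95, alpha=0.05):
    """Factor by which to scale n so power rises from original_power to target_power.

    Assumptions (illustrative): normal approximation, two-sided test,
    and the negligible opposite tail of the test is ignored.
    """
    z = NormalDist().inv_cdf
    z_crit = z(1 - alpha / 2)
    # Power is approximately Phi(delta - z_crit), so delta = z_crit + z(power).
    delta_original = z_crit + z(original_power)
    delta_target = z_crit + z(target_power)
    if delta_original <= 0:
        raise ValueError("original_power is too low for this approximation")
    # delta grows with sqrt(n), so n scales with the squared ratio.
    return (delta_target / delta_original) ** 2


multiplier = sample_size_multiplier()
print(f"{multiplier:.2f}")
```

Under these assumptions the factor comes out a little under 3.5, in the same ballpark as the "about 4 times" quoted above. If the published effect overestimates the true one, as selective publication would suggest, the required factor would be larger.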

  9. Stephan Schleim | Permalink

    I agree with the general notion that a failure to replicate a result does not mean that the original result was wrong (or even intentionally falsified), but the respective report in Nature is not just about the failure to replicate, but also about the investigator's alleged lack of collaboration:

    [A colleague] is disappointed that Dijksterhuis has declined “repeated requests” to help to generate a definitive answer.
    Dijksterhuis says that “focusing on a single phenomenon is not that helpful and won’t solve the problem”. He adds that social psychology needs to get more rigorous, but that the rigour should be applied to future, not historical, experiments. (Nature 497, p. 16)

    It is hard to judge this without knowing all the details and before also learning more about Dijksterhuis's point of view, but this part of the news report clearly hints at a credibility problem.

    Assume that researchers do not publish their original data and/or do not provide the necessary information for colleagues to double-check the results, how should we ever be able to trust that the results are sound?

  10. Ap Dijksterhuis | Permalink

    Just for the record:
    1. The Nature article is strangely biased, and Nature is now investigating matters.
    2. The intelligence priming effect has been replicated in about 25 different experiments, in 12 different labs, in 7 or 8 different countries. The journalist knew this but ignored the information.
    3. With my Nijmegen colleagues, we are actually preparing a protocol for skeptical colleagues to work with. Again, the journalist knew this, but ignored the information.

