Yoel Inbar, Marcel Zeelenberg, and I published an article in which we try to replicate the finding that a reminder of health insurance makes people think health-related risks are less likely. We report 3 very close replications of the original study (total N = 451) and 2 conceptual replications (total N = 404). The initial study had 40 participants. In other papers, the ratio of replication-to-initial sample size easily exceeds 10 : 1 too, and that’s a good thing.
If you want to read our paper, you can, because it is published in the open-access Journal of Judgment and Decision Making (JDM). Unfortunately, the study we replicated was published in the Personality and Social Psychology Bulletin (PSPB), so you might not be able to access it. Therefore, a brief description of the initial study:
In Tykocinski (2008) Study 1, Israeli commuters on the train were reminded of their health insurance plan either before or after they rated the probabilities of three health-related risks (i.e., the probability that they would require surgery, physiotherapy, or comprehensive nursing care within the next 5 years). The people who were reminded of their health insurance before answering the questions, on average, rated those probabilities lower.
We thought this was a cool effect. If the “protection effect”—insurance makes risks seem less likely—were real, it might explain part of the moral hazard effect (insurance-induced risk-taking). We ran studies that built on this idea but consistently did not find the results we predicted. After a while, we decided we should try to replicate the original finding. It turns out that we were unable to replicate the protection effect in the Netherlands (and the U.S.)*.
Replication-to-initial sample size ratio
Only counting the replications in which we tried to stay as close to the original materials as possible, we have three (2-condition) studies with a total N of 451, versus 40 in the original study. One of our studies has 95% power to detect an effect as large as the one originally reported, and in another we have more than 95% power to detect an effect half that size. We do not replicate the finding in any of our studies. Counting only the replications that mimic the experimental procedure of Tykocinski (2008), the replication-to-initial sample size ratio is 11.28 : 1.
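For readers who want to check numbers like these themselves, here is a minimal Python sketch. It uses a normal approximation to two-sample t-test power, and the effect size (d = 0.8) and per-condition n (150) are hypothetical illustrations, not the estimates from our paper:

```python
from math import sqrt
from statistics import NormalDist

def two_sample_power(d, n1, n2, alpha=0.05):
    """Approximate power of a two-sided, two-sample t-test
    (normal approximation; close enough at these sample sizes)."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    ncp = d * sqrt(n1 * n2 / (n1 + n2))  # noncentrality parameter
    return nd.cdf(ncp - z_crit) + nd.cdf(-ncp - z_crit)

# Replication-to-initial sample size ratio from this post
print(451 / 40)  # 11.275, i.e. roughly 11.28 : 1

# Power for a hypothetical effect of d = 0.8 with 150 per condition,
# and for an effect half that size (d = 0.4)
print(two_sample_power(0.8, 150, 150))  # very close to 1
print(two_sample_power(0.4, 150, 150))  # roughly 0.93
```

With the same hypothetical d = 0.8 but only 20 participants per condition, as in a typical 40-person study, the function gives power of roughly 0.7, which is one way to see why small initial studies are fragile.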
Other replication papers
Two recent replication attempts have even greater ratios. Galak et al. (2012) report 7 replication attempts (total N = 3,289) of two studies on precognition reported in Bem (2011; total N = 150).
Ratio = 21.93 : 1.
The other is Matthews (2012), who reports replications of the incidental-values studies of Ungemach, Stewart, and Reimers (2011).
In both cases, the initial studies’ sample sizes pale in comparison to those of the replication attempts. This is partly because the replication attempts are properly powered and the initial studies less so. Small sample sizes are almost never good (one reason is described here), but the above made me think of an argument I’ve heard multiple people make.
What if authors of new papers were required to report properly powered replications of their new ‘initial’ studies? Possibly in an appendix. Admittedly, that would not be feasible for every study, but for simple lab and scenario studies I don’t see why it could not be implemented. Such a requirement would increase confidence in published papers and could prevent the need for replication papers with 20:1 ratios.
Finally, I am certain the three replication attempts in this post are not the only ones out there. I probably only know about these papers because I am an author on one of them, and because the other two have unusually large numbers of highly powered studies. Papers I haven’t read might have much less power. So, if you know about other published replication attempts (so not on the OSF or PsychFileDrawer), please let me know!
References (with links, but sorry, sometimes paywalled)
Bem, D. J. (2011). Feeling the future: Experimental evidence for anomalous retroactive influences on cognition and affect. Journal of Personality and Social Psychology, 100, 407–425. doi:10.1037/a0021524 link or link on Bem’s page
Matthews, W. J. (2012). How much do incidental values affect the judgment of time? Psychological Science, 23, 1432–1434. doi:10.1177/0956797612441609 link
Tykocinski, O. E. (2008). Insurance, risk, and magical thinking. Personality and Social Psychology Bulletin, 34, 1346–1356. doi:10.1177/0146167208320556 link
Ungemach, C., Stewart, N., & Reimers, S. (2011). How incidental values from the environment affect decisions about money, risk, and delay. Psychological Science, 22, 253–260. doi:10.1177/0956797610396225 link or pdf!
van Wolferen, J., Inbar, Y., & Zeelenberg, M. (2013). Magical thinking in predictions of negative events: Evidence for tempting fate but not for a protection effect. Judgment and Decision Making, 8, 45–54. link
* In the paper we discuss why we might not have been able to replicate the protection effect.
That last reference is me; our replication paper is the first paper I ever published, yeah!
EDIT: added after comments (see below)
Zwaan et al. replicate 3 findings, and do so twice for each. The sample sizes and ratios are 336/40 = 8.40 : 1, 352/42 = 8.38 : 1, and 304/(42 + 60) = 2.98 : 1.
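(Just to double-check that arithmetic, a quick sketch using the Ns listed above:)

```python
# Sample-size ratios for the three Zwaan et al. replications:
# (replication N, original N) pairs as listed above
pairs = [(336, 40), (352, 42), (304, 42 + 60)]
for rep_n, orig_n in pairs:
    print(f"{rep_n}/{orig_n} = {rep_n / orig_n:.2f} : 1")
# prints 8.40 : 1, 8.38 : 1, and 2.98 : 1
```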
It seems that my earlier comment, that I only know about the replications mentioned above because of their unusually large numbers of highly powered studies, has some truth to it. Still, it seems that these replication attempts are properly powered (I did not do the calculations), so that’s good! Thanks Ellen and Rolf!