We all know that correlation does not equal causation. Many websites use clickbait in their headlines to attract readers, and headlines involving science are no exception. But what would you say if you knew that many of the incredible studies cited aren’t even reproducible? Surely, as a good science student, you would reject them, right? Well…
According to the linked article, only about 36% of psychology studies ultimately produce replicable results, even when parameters like sample size are kept the same.
Think of shooting a gun. Your first time, you manage to hit the center of the target. Is this enough to make you a sharpshooter? What about if you managed to hit the center of the target 10 or 15 times? The more often you hit the center, the more likely it is that you have genuine aim and skill at firing a gun. Similarly, if an experiment produces replicable results, it is more likely to withstand scrutiny, apply to the general public, and carry broader implications. (There is some variance between fields; after all, a study on black holes is hardly going to coincide with the effects of the Mediterranean diet on pre-diabetic individuals.) A p-value speaks to part of this: a low p-value indicates that a result this extreme would be unlikely if chance alone were at work. Crucially, though, a low p-value does not guarantee that the result will replicate.
According to the referenced study, nearly two-thirds of studies were found to not have replicable results, even when they reported low p-values. This indicates that many results reported as significant may be due to chance. Furthermore, these findings held even when the replications used sample groups similar to those of the original studies. Psychology, along with other fields that study human behavior, attempts to generalize its findings to larger and larger groups. What then does it say if results don't even hold between similar groups?
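To see how a "significant" result can still fail to replicate, here is a minimal simulation sketch. It assumes a hypothetical underpowered setup (a small true effect of 0.3 standard deviations and only 20 subjects per group; both numbers are illustrative, not taken from the study): each original study that reaches p < 0.05 is rerun once with the same parameters, and we count how many reruns also reach significance.

```python
import random
from math import erf, sqrt
from statistics import mean, stdev

random.seed(42)

def run_study(effect=0.3, n=20):
    """One hypothetical two-group study: returns a two-sided p-value
    from a normal-approximation test (effect size and n are illustrative)."""
    control = [random.gauss(0.0, 1.0) for _ in range(n)]
    treated = [random.gauss(effect, 1.0) for _ in range(n)]
    se = sqrt(stdev(control) ** 2 / n + stdev(treated) ** 2 / n)
    z = (mean(treated) - mean(control)) / se
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

# "Original" studies that happened to reach significance...
originals = [run_study() for _ in range(2000)]
hits = sum(p < 0.05 for p in originals)

# ...each replicated once with the same true effect and sample size.
replications = [run_study() for _ in range(hits)]
replicated = sum(p < 0.05 for p in replications)

print(f"{hits} of 2000 originals reached p < 0.05")
print(f"{replicated / hits:.0%} of the replications also did")
```

With a small true effect and small samples, only a minority of the replications come out significant, even though every original "passed" the p < 0.05 bar. Nothing here was due to pure chance; the studies were simply too underpowered for significance to be a reliable signal.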
Do replicable results prove the conclusion of an experiment?
“Prove” is kind of a nasty word to use. Usually, the only thing “proven” is that a hypothesis is either correct or incorrect for the subset of the population tested. “Proof” can be further complicated depending on the discipline and the study. For example, antagonists for the NK1 receptor were shown to reduce a pain phenotype in mice, but not in humans. However, results that are easily replicable do provide stronger evidence for a theory. As an example, preclinical studies involving the NK1 receptor were replicated extensively in animal models, with a Google Scholar search pulling over 600 results.
Should we focus on replicable results instead of novel experiments?
Not exactly. I feel that the team behind “Estimating the reproducibility of psychological science” says it best:
Innovation points out paths that are possible; replication points out paths that are likely; progress relies on both.
If we stop designing new experiments, then progress is halted. However, if we do not ensure that these experiments are valid, then the results are meaningless.
Take, for example, the NK1 receptor findings discussed previously. Antagonists had to be tested multiple times in multiple models just to make sure they were safe for humans. If they had failed any preclinical trial, they wouldn’t have moved on. However, we ultimately didn’t learn anything until we tested them in humans. It’s unfortunate that the antagonists proved ineffective in clinical trials, but personally, I wouldn’t call those trials useless. We learned what didn’t work, and now we don’t need to waste time on something that is ineffective. Progress happened only because innovation and replication were working alongside each other.
Just keep replicability in mind before you start raving about a new study that shows that “Coconut Oil and Chocolate are the Definitive Cure for Everything.”
Editor: Rachel Levy