Questionable Research Practices

Week 11

Jenny Terry & Martina Sladekova

Part 1: How science got broken

Long ago, psychological scientists lived together in harmony

1879 - world’s first psychology lab established by Wilhelm Wundt

  • Defining point for Psychology as a separate discipline

  • Prior to that - part of philosophy

  • Use of experimental methods for the first time - smooth(ish) sciencing followed

Black and white photograph of bearded Wilhelm Wundt sitting in a chair, surrounded by 5 more white men of science, looking at very scientific instruments that have nothing to do with psychology.

…Everything changed when the Fire Nation Open Science Collaboration attacked

Open Science Collaboration (2015): Estimating the reproducibility of psychological science

  • Attempted to replicate findings published in high-profile journals

  • Replicated effect sizes were about half the size of those originally reported

  • 97% of the original studies reported statistically significant results (p < 0.05)

  • Only 36% of the replications did

Vocabulary

Replication: The process of repeating (re-running) the same study using identical methodology.

…Everything changed when the Fire Nation Open Science Collaboration attacked

Some famous findings that we can’t replicate:

  • Smiling will make you feel happier

  • Power posing will make you act bolder

  • Self-control is a limited resource

  • Revising after your exams can improve your earlier performance (Daryl Bem’s “pre-cognition” experiments)

  • Babies are born with the power to imitate

Decorative gif of Captain Raymond Holt from the series Brooklyn Nine-Nine saying "Time for the next stage: Forced laughter" and grinning maniacally.

Read more here: https://www.bps.org.uk/research-digest/ten-famous-psychology-findings-have-been-difficult-replicate

Poll time!

Two teams of researchers set out to study the effect of mindfulness meditation on well-being. Both teams apply the same intervention using an identical protocol, and each team recruits 200 participants. The teams analyse their own data using a t-test, comparing the well-being of the intervention group against that of the control group.

Team 1

Team 1 finds a statistically significant difference in well-being between the two groups (p = .034)

Team 2

Team 2 finds a non-significant difference in well-being between the two groups (p = .27)

Both teams write up their results as a scientific paper. Which paper should get published?
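Notice that both results can come from the exact same reality. Here is a minimal R sketch (the true effect size of d = 0.25 and the split of 100 participants per group are our own illustrative assumptions, not from any real study):

```r
# A minimal sketch, not a re-analysis: same true effect, identical protocols,
# but random sampling alone can push one team's p below .05 and the other's above.
set.seed(1)

run_study <- function() {
  intervention <- rnorm(100, mean = 0.25)  # well-being scores with a small true benefit
  control      <- rnorm(100, mean = 0)     # no benefit in the control group
  t.test(intervention, control)$p.value    # Welch two-sample t-test
}

p_values <- replicate(1000, run_study())
mean(p_values < 0.05)  # roughly 0.4 under these assumptions:
                       # identical teams routinely disagree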

Publication bias

  • Some results are more interesting than others. But are they more important?

  • The publication record is over-populated with statistically significant findings

  • Papers reporting significant p-values are 9 times more likely to be published than papers reporting non-significant findings

  • Problem for evidence synthesis - publishing only (or mostly) significant findings might make it seem like an effect exists when in reality it doesn’t

    • Example: if there is no effect of mindfulness on well-being, we’ll still find a statistically significant result in 5% of replications because of random sampling (assuming alpha of 0.05)
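That 5% figure is easy to verify with a quick simulation. A minimal R sketch (the group sizes and t-test setup are our own illustrative choices):

```r
# Minimal sketch: 10,000 studies of a non-existent effect.
# Both groups come from the same population, so any "significant"
# result is a false positive driven purely by random sampling.
set.seed(42)

p_values <- replicate(10000, {
  intervention <- rnorm(100)  # no true difference between the groups
  control      <- rnorm(100)
  t.test(intervention, control)$p.value
})

mean(p_values < 0.05)  # close to 0.05, i.e. the alpha we chose
```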

Vocabulary

Publication bias, or the “file drawer effect”, is a bias in the publication system whereby statistically significant results are favoured for publication over non-significant findings. Papers reporting non-significant findings often end up in researchers’ file drawers, never to be seen again.

A day in the life of a scientist

  • Conduct research (requires funding)

  • Train research students (requires funding)

  • Publish papers (requires funding)

  • Teach (it’s free! 🤩)

  • Secure funding (apply for grants)

Funding bodies look at researchers’ publication records (among other things) to decide who gets awarded a grant.

failure to publish |> failure to secure funding |> failure to conduct further research |> job insecurity

Questionable Research Practices (QRPs)

Tight deadlines + high-pressure, competitive environment + “publish or perish” + lack of training + personal investment -> QRPs

Vocabulary

Questionable research practices: a range of practices that distort results (intentionally or unintentionally), often motivated by the desire to find support for hypotheses and make research more publishable.

Questionable Research Practices (QRPs)

From the UK Research Integrity Office:

Graphic describing the different dimensions of QRPs. On the left side are errors, sloppiness, misunderstanding, incompetence, mistakes and time-pressure. On the right side is falsification, criminality, fabrication, deliberate actions and financial pressure.

Hanlon’s Razor: “Never attribute to malice that which is adequately explained by ignorance or incompetence.”

The Garden of Forking Paths

  • Each analysis has many decision points: which tests to run, which participants to exclude, which steps to take during data cleaning, etc.

  • Each decision results in a unique analysis “path”

  • Different analysts might not necessarily take the same path and arrive at the same conclusion (Silberzahn et al., 2018)

  • Some “paths” can seem more sensible depending on your motivations

Illustrative image of a garden with a forking path

Diagram showing a line splitting into two directions. Then each line splits into two more, which split into two more... generating many possible paths.

Some QRPs

p-hacking

  • Selective inclusion/removal of cases

  • Subsetting/combining groups

  • Variable dichotomisation

  • Data transformation

  • Collecting more data (“data peeking”; see the sketch below)

80s hackerman gif, but with Martina’s face on it.

NOTE: None of these are “questionable” in their own right. Motivation matters!
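Here is the data-peeking sketch promised in the list above. It is a minimal illustration with no true effect at all; the starting sample of 20 per group, the step of 10, and the cap of 100 are all our own arbitrary choices:

```r
# Minimal sketch of "data peeking" when the true effect is zero:
# start with 20 per group, run a t-test, add 10 more per group,
# test again... stop as soon as p < .05 or at 100 per group.
set.seed(42)

peek_until_significant <- function() {
  a <- rnorm(20)
  b <- rnorm(20)
  while (length(a) < 100) {
    if (t.test(a, b)$p.value < 0.05) return(TRUE)  # "found" an effect, stop early
    a <- c(a, rnorm(10))
    b <- c(b, rnorm(10))
  }
  t.test(a, b)$p.value < 0.05  # final test at the maximum sample size
}

mean(replicate(2000, peek_until_significant()))  # well above 0.05
```

Each individual test still uses alpha = .05, but taking many looks and stopping on the first “hit” inflates the overall false positive rate well beyond the nominal 5%.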

Vocabulary

p-hacking: taking specific analytic steps in order to achieve statistical significance rather than (pre-planned) steps that are more appropriate to answer the research question.

Some QRPs

p-hacking

Masicampo, E. J., & Lalande, D. R. (2012). A peculiar prevalence of p values just below .05.

A graph showing the frequency of different p-values found in the literature. The y-axis shows counts; the x-axis shows p-values from 0.01 on the left to 0.10 on the right. The frequencies decline steadily from left to right, except for one spike just under the value of 0.05.
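Why is that spike suspicious? A minimal R sketch (our own illustration, nothing to do with the authors’ analysis): when no effect exists, p-values are uniformly distributed, and when a real effect exists, smaller p-values become more common. Neither process produces a bump just below .05.

```r
# Minimal sketch: under the null hypothesis, p-values are uniform,
# so their histogram between 0 and .10 should be roughly flat,
# with no special pile-up just below .05.
set.seed(11)

p_null <- replicate(10000, t.test(rnorm(30), rnorm(30))$p.value)

hist(p_null[p_null < 0.10],
     breaks = seq(0, 0.10, by = 0.005),
     main   = "p-values when no effect exists",
     xlab   = "p-value")
```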

Some QRPs

HARKing

  • NOT the same as exploratory research

  • NOT the same as discussing explanations for your (surprising) results

Vocabulary

HARKing: Hypothesizing After the Results are Known. Often involves collecting data without a clear hypothesis, deciding on a hypothesis based on what’s significant, and then presenting that hypothesis as if it had been decided on before running any analyses.

Some QRPs

Selective reporting

  • Often goes hand-in-hand with HARKing

  • Collecting a lot of variables and only reporting statistically significant relationships (without making it clear that you’ve also collected other data)

Selective/inaccurate citing

  • Picking and choosing which papers to cite in a way that fits your narrative

  • Citing papers as supporting a specific point when they don’t

Salami slicing

  • Splitting relevant analyses from a single dataset into multiple papers to increase publication count

Do Psychologists Engage in QRPs? (John et al., 2012)

Self-admission rate for each item (%):

  • Failing to report all of a study’s dependent measures: 63.4%
  • Deciding whether to collect more data after looking to see whether the results were significant: 55.9%
  • Failing to report all of a study’s conditions: 27.7%
  • Stopping collecting data earlier than planned because one found the result that one had been looking for: 16.6%
  • “Rounding off” a p value (e.g., reporting that a p value of .054 is less than .05): 22.2%
  • Selectively reporting studies that “worked”: 45.8%
  • Deciding whether to exclude data after looking at the impact of doing so on the results: 38.2%
  • Reporting an unexpected finding as having been predicted from the start: 27.0%
  • Claiming that results are unaffected by demographic variables (e.g., gender) when one is actually unsure (or knows that they do): 3.0%
  • Falsifying data: 0.6%

Why should you care?

YOU are the future

  • researchers (academic or industry)

  • practitioners

  • educators

Understanding how research works (and how the environment can affect the research process) allows you to:

  1. Be critical when you’re reading about research, so you don’t fall for every “Psychology says” or “Science says” click-bait, and
  2. Make science better!

How can we make science better?

Part 2: From “Replication Crisis” to “Credibility Revolution”

The Credibility Revolution

A photo of Simine Vazire, a woman with dark hair wearing a red jumper taken outside on a sunny day.

Simine Vazire (University of Melbourne)

  • Coined the term “Credibility Revolution” to describe a more optimistic move towards improving research, as opposed to just pointing out what was wrong with it
  • Founder of the Society for the Improvement of Psychological Science (SIPS) in 2016
  • SIPS aims to “bring together scholars working to improve methods and practices in psychological science” and has played a key role in the development and proliferation of Open Science

Open Science (OS)

A graphic showing all the different aspects of Open Science. The point of the image is the number and variety of aspects, rather than what those aspects are.

Openness (transparency) prevents researchers from being able to hide their QRPs

The Open Science movement has inspired many innovations in transparent research, only a few of which we’ll explore today:

  • Preregistration
  • Registered reports
  • Open materials/data/code

Preregistration

  • Preregistration involves publicly sharing a time-stamped research plan (e.g., on the OSF) that includes:

    • Precise hypotheses to prevent HARKing

    • Information about all your variables and how they’re operationalised to prevent selective reporting

    • A detailed data analysis plan to prevent p-hacking

  • Other benefits include making sure you are collecting data you can actually analyse and front-loading a lot of the work

The preregistered Open Science badge

Registered Reports

A graphic showing the difference between traditional publishing and registered reports, as described further in the audio.

Registered Reports

Registered Reports are like preregistration in that you specify what you’re going to do in advance, but they have additional benefits.

  • Like preregistration, they aim to reduce QRPs such as HARKing, p-hacking, and selective reporting

  • They also improve research quality more generally as the methods are reviewed by peers before data collection commences

  • They are also more likely to reduce publication bias as journals agree to publish the study based on the quality of the methods, regardless of the results

Sharing of Materials, Data, & Code

  • Open materials allow other researchers to inspect and more easily replicate your research study
  • Open code and data allow other researchers to inspect and reproduce your research findings
  • Open code is the main reason why we’re teaching you R!

The open materials, open code, and open data open science badges.

So, has Open Science improved Psychology?

New Evidence that OS is Improving Psychological Science!

  • Protzko et al. (2023) claimed that OS efforts are improving replicability

  • The methods are complex, but they essentially compared the replication rate of studies that had used OS practices with that of studies that hadn’t

  • Concluded that the 86% replication rate in the OS sample was due to “rigor-enhancing practices” such as preregistration, methodological transparency, confirmatory tests, and large sample sizes.

A screenshot of the journal article, including the title, author list, and an Editor's note warning that there are some issues with the paper.

Wait, what’s that Editor’s Note all about?

Photos of Berna Devezer and Joe Bak-Coleman.

Bak-Coleman & Devezer (2023) wrote a response, arguing (amongst other things):

  • The definition of ‘replication’ was less conservative in the OS sample than in the non-OS sample

  • The OS group of studies were chosen because they were more likely to replicate

  • The hypotheses and analyses reported in the study deviated from what was preregistered

The story unfolds further in this blogpost.

Metabias

  • Original reviews pointed out the flaws in this study

  • One of those reviewers spoke out on Twitter, saying the original authors’ response was that they were aware of the flaws, but thought the ends justified the means

  • In other words, it was okay to fudge the figures a bit to get more people on the OS bandwagon

  • This (meta)bias is the kind of QRP that OS set out to eradicate

  • Read the full thread here on Bluesky

A screenshot of some tweets from one of the original reviewers describing what they saw as metabias.

Is Preregistration Worthwhile?

Photos of Aba Szollosi, Danielle Navarro, and Iris van Rooij.

Szöllősi, Navarro, van Rooij, and colleagues (2021) wrote a paper called “Is Preregistration Worthwhile?”, which argued:

  • Preregistration isn’t sufficient for good science

  • You can still preregister bad science from weak theories

  • Strong theories generate very specific hypotheses that are less susceptible to HARKing

  • Psychology needs stronger theories, not preregistration

It caused quite the Twitter storm!

Questionable Metascience Practices

Rubin (2023) argued that many OS proponents engage in questionable metascience practices, for example:

  • Promoting OS practices despite a lack of evidence of their efficacy

  • Metabias towards blaming researcher bias (QRPs) for the replication crisis, when there are other explanations (e.g., weak theory, poor measurement practice)

  • Rejecting or ignoring criticisms of metascience and/or science reform

  • Quick, superficial, dismissive, and/or mocking style of criticising others, predominantly from those in positions of power and privilege (bropen science)

Bropen Science

Whitaker & Guest (2020): “#bropenscience is a tongue-in-cheek expression but also has a serious side, shedding light on the narrow demographics and off-putting behavioural patterns seen in open science.”

“Not all bros are men. And that’s true, but they are more likely to be from one or more of the following dominant social groups: male, white, cisgender, heterosexual, able-bodied, neurotypical, high socioeconomic status, English-speaking. That’s because structural privileges exist that benefit certain groups of people.”

What do these OS proponents have in common?

An image with 12 photos of Open Science proponents. They are mostly white and mostly presenting as men.

What do these OS critics have in common?

An image with 12 photos of open science critics. They vary in terms of gender and ethnicity.

Glimmers of Hope

Lecture Summary

The Anakin/Padme meme, with Anakin saying, "Open Science will change psychological science" and Padme asking, "for the better, right?!" with a look of realisation that it might not be for the better!

  • The Replication Crisis in Psychology triggered a chain of events, intended to improve Psychological Science

  • Initially, the discipline focused on identifying problems and a list of Questionable Research Practices emerged

  • In response, Open Science initiatives were developed that attempted to prevent QRPs

  • However, OS has been subject to Questionable Metascience Practices and must be held to the same standards as the research it is trying to improve

  • OS is showing signs of becoming more humble, more reflective, more inclusive, and will have no choice but to provide robust evidence of its effectiveness

Further Information

More references:

  • John, L. K., Loewenstein, G., & Prelec, D. (2012). Measuring the prevalence of questionable research practices with incentives for truth telling. Psychological Science, 23(5), 524-532.
  • Masicampo, E. J., & Lalande, D. R. (2012). A peculiar prevalence of p values just below .05. Quarterly Journal of Experimental Psychology, 65(11), 2271-2279. https://doi.org/10.1080/17470218.2012.711335
  • Silberzahn, R., Uhlmann, E. L., Martin, D. P., Anselmi, P., Aust, F., Awtrey, E., … & Nosek, B. A. (2018). Many analysts, one data set: Making transparent how variations in analytic choices affect results. Advances in Methods and Practices in Psychological Science, 1(3), 337-356.
  • Kolstoe, S. (2023). Defining the Spectrum of Questionable Research Practices (QRPs). UKRIO. https://doi.org/10.37672/UKRIO.2023.02.QRPs