Questionable Research Practices

Week 11

Jenny Terry & Martina Sladekova

Part 1: How science got broken

Long ago, psychological scientists lived together in harmony

1879 - world’s first psychology lab established by Wilhelm Wundt

  • Defining point for Psychology as a separate discipline

  • Prior to that - part of philosophy

  • Use of experimental methods for the first time - smooth(ish) sciencing followed

Black and white photograph of bearded Wilhelm Wundt sitting in a chair, surrounded by 5 more white men of science, looking at very scientific instruments that have nothing to do with psychology.

…Everything changed when the Fire Nation Open Science Collaboration attacked

Open Science Collaboration (2015): Estimating the reproducibility of psychological science

  • Attempted to replicate findings published in high-profile journals

  • Replicated effect sizes were about half the size of those originally reported

  • 97% of the original studies reported statistically significant results (p < 0.05)

  • Only 36% of the replications did

Vocabulary

Replication: The process of repeating (re-running) the same study using identical methodology.

…Everything changed when the Fire Nation Open Science Collaboration attacked

Some famous findings that we can’t replicate:

  • Smiling will make you feel happier

  • Power posing will make you act bolder

  • Self-control is a limited resource

  • Revising after your exams can improve your earlier performance (Daryl Bem’s “pre-cognition” experiments)

  • Babies are born with the power to imitate

Decorative gif of Captain Raymond Holt from the series Brooklyn Nine-Nine saying "Time for the next stage: Forced laughter" and grinning maniacally.

Read more here: https://www.bps.org.uk/research-digest/ten-famous-psychology-findings-have-been-difficult-replicate

Poll time!

Two teams of researchers set out to study the effect of mindfulness meditation on well-being. Both teams apply the same intervention using an identical protocol, and each team recruits 200 participants. The teams analyse their own data using a t-test, comparing the well-being of the intervention group against that of the control group.

Team 1

Team 1 finds a statistically significant difference in well-being between the two groups (p = .034)

Team 2

Team 2 finds a non-significant difference in well-being between the two groups (p = .27)

Both teams write up their results as a scientific paper. Which paper should get published?
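Notice that both results can come from the exact same reality. Here is a minimal R sketch (the true effect size of d = 0.25 and the split of 100 participants per group are our own illustrative assumptions, not from any real study):

```r
# A minimal sketch, not a re-analysis: same true effect, identical protocols,
# but random sampling alone can push one team's p below .05 and the other's above.
set.seed(1)

run_study <- function() {
  intervention <- rnorm(100, mean = 0.25)  # well-being scores with a small true benefit
  control      <- rnorm(100, mean = 0)     # no benefit in the control group
  t.test(intervention, control)$p.value    # Welch two-sample t-test
}

p_values <- replicate(1000, run_study())
mean(p_values < 0.05)  # roughly 0.4 under these assumptions:
                       # identical teams routinely disagree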

Publication bias

  • Some results are more interesting than others. But are they more important?

  • The publication record is over-populated with statistically significant findings

  • Papers reporting significant p-values are 9 times more likely to be published than papers reporting non-significant findings

  • Problem for evidence synthesis - publishing only (or mostly) significant findings might make it seem like an effect exists when in reality it doesn’t

    • Example: if there is no effect of mindfulness on well-being, we’ll still find a statistically significant result in 5% of replications because of random sampling (assuming alpha of 0.05)
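That 5% figure is easy to verify with a quick simulation. A minimal R sketch (the group sizes and t-test setup are our own illustrative choices):

```r
# Minimal sketch: 10,000 studies of a non-existent effect.
# Both groups come from the same population, so any "significant"
# result is a false positive driven purely by random sampling.
set.seed(42)

p_values <- replicate(10000, {
  intervention <- rnorm(100)  # no true difference between the groups
  control      <- rnorm(100)
  t.test(intervention, control)$p.value
})

mean(p_values < 0.05)  # close to 0.05, i.e. the alpha we chose
```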

Vocabulary

Publication bias, or the “file drawer effect”, is a bias in the publication system whereby statistically significant results are favoured for publication over non-significant findings. Papers reporting non-significant findings often end up in researchers’ file drawers, never to be seen again.

A day in the life of a scientist

  • Conduct research (requires funding)

  • Train research students (requires funding)

  • Publish papers (requires funding)

  • Teach (it’s free! 🤩)

  • Secure funding (apply for grants)

Funding bodies look at researchers’ publication records (among other things) to decide who gets awarded a grant.

failure to publish |> failure to secure funding |> failure to conduct further research |> job insecurity

Questionable Research Practices (QRPs)

Tight deadlines + high-pressure, competitive environment + “publish or perish” + lack of training + personal investment -> QRPs

Vocabulary

Questionable research practices: a range of practices that distort results (intentionally or unintentionally), often motivated by the desire to find support for hypotheses and make research more publishable.

Questionable Research Practices (QRPs)

From the UK Research Integrity Office:

Graphic describing the different dimensions of QRPs. On the left side are errors, sloppiness, misunderstanding, incompetence, mistakes and time-pressure. On the right side is falsification, criminality, fabrication, deliberate actions and financial pressure.

Hanlon’s Razor: “Never attribute to malice that which is adequately explained by ignorance or incompetence.”

The Garden of Forking Paths

  • Each analysis has many decision points: which tests to run, which participants to exclude, which steps to take during data cleaning, etc.

  • Each decision results in a unique analysis “path”

  • Different analysts might not necessarily take the same path and arrive at the same conclusion (Silberzahn et al., 2018)

  • Some “paths” can seem more sensible depending on your motivations

Illustrative image of a garden with a forking path

Diagram showing a line splitting into two directions. Then each line splits into two more, which split into two more... generating many possible paths.

Some QRPs

p-hacking

  • Selective inclusion/removal of cases

  • Subsetting/combining groups

  • Variable dichotomisation

  • Data transformation

  • Collecting more data (“data peeking”; see the sketch below)

80s hackerman gif, but with Martina’s face on it.

NOTE: None of these are “questionable” in their own right. Motivation matters!
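Here is the data-peeking sketch promised in the list above. It is a minimal illustration with no true effect at all; the starting sample of 20 per group, the step of 10, and the cap of 100 are all our own arbitrary choices:

```r
# Minimal sketch of "data peeking" when the true effect is zero:
# start with 20 per group, run a t-test, add 10 more per group,
# test again... stop as soon as p < .05 or at 100 per group.
set.seed(42)

peek_until_significant <- function() {
  a <- rnorm(20)
  b <- rnorm(20)
  while (length(a) < 100) {
    if (t.test(a, b)$p.value < 0.05) return(TRUE)  # "found" an effect, stop early
    a <- c(a, rnorm(10))
    b <- c(b, rnorm(10))
  }
  t.test(a, b)$p.value < 0.05  # final test at the maximum sample size
}

mean(replicate(2000, peek_until_significant()))  # well above 0.05
```

Each individual test still uses alpha = .05, but taking many looks and stopping on the first “hit” inflates the overall false positive rate well beyond the nominal 5%.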

Vocabulary

p-hacking: taking specific analytic steps in order to achieve statistical significance rather than (pre-planned) steps that are more appropriate to answer the research question.

Some QRPs

p-hacking

Masicampo, E. J., & Lalande, D. R. (2012). A peculiar prevalence of p values just below .05.

A graph showing the frequency of different p-values found in the literature. The y-axis shows counts; the x-axis shows p-values from 0.01 on the left to 0.10 on the right. The frequencies decline steadily from left to right, except for one spike just under the value of 0.05.
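Why is that spike suspicious? A minimal R sketch (our own illustration, nothing to do with the authors’ analysis): when no effect exists, p-values are uniformly distributed, and when a real effect exists, smaller p-values become more common. Neither process produces a bump just below .05.

```r
# Minimal sketch: under the null hypothesis, p-values are uniform,
# so their histogram between 0 and .10 should be roughly flat,
# with no special pile-up just below .05.
set.seed(11)

p_null <- replicate(10000, t.test(rnorm(30), rnorm(30))$p.value)

hist(p_null[p_null < 0.10],
     breaks = seq(0, 0.10, by = 0.005),
     main   = "p-values when no effect exists",
     xlab   = "p-value")
```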

Some QRPs

HARKing

  • NOT the same as exploratory research

  • NOT the same as discussing explanations for your (surprising) results

Vocabulary

HARKing: Hypothesizing After the Results are Known. Often involves collecting data without a clear hypothesis, deciding on a hypothesis based on what’s significant, and then presenting that hypothesis as if it had been decided on before running any analyses.

Some QRPs

Selective reporting

  • Often goes hand-in-hand with HARKing

  • Collecting a lot of variables and only reporting statistically significant relationships (without making it clear that you’ve also collected other data)

Selective/inaccurate citing

  • Picking and choosing which papers to cite in a way that fits your narrative

  • Citing papers as supporting a specific point when they don’t

Salami slicing

  • Splitting relevant analyses from a single dataset into multiple papers to increase publication count

Do Psychologists Engage in QRPs? (John et al., 2012)

Self-admission rate for each item (%):

  • Failing to report all of a study’s dependent measures: 63.4%
  • Deciding whether to collect more data after looking to see whether the results were significant: 55.9%
  • Failing to report all of a study’s conditions: 27.7%
  • Stopping collecting data earlier than planned because one found the result that one had been looking for: 16.6%
  • “Rounding off” a p value (e.g., reporting that a p value of .054 is less than .05): 22.2%
  • Selectively reporting studies that “worked”: 45.8%
  • Deciding whether to exclude data after looking at the impact of doing so on the results: 38.2%
  • Reporting an unexpected finding as having been predicted from the start: 27.0%
  • Claiming that results are unaffected by demographic variables (e.g., gender) when one is actually unsure (or knows that they do): 3.0%
  • Falsifying data: 0.6%

Why should you care?

YOU are the future

  • researchers (academic or industry)

  • practitioners

  • educators

Understanding how research works (and how the environment can affect the research process) allows you to:

  1. Be critical when you’re reading about research, so you don’t fall for every “Psychology says” or “Science says” click-bait, and
  2. Make science better!

How can we make science better?

Part 2: From “Replication Crisis” to “Credibility Revolution”

The Credibility Revolution

A photo of Simine Vazire, a woman with dark hair wearing a red jumper taken outside on a sunny day.

Simine Vazire (University of Melbourne)

  • Coined the term “Credibility Revolution” to describe a more optimistic move towards improving research, as opposed to just pointing out what was wrong with it
  • Founder of the Society for the Improvement of Psychological Science (SIPS) in 2016
  • SIPS aims to “bring together scholars working to improve methods and practices in psychological science” and has played a key role in the development and proliferation of Open Science

Open Science (OS)

A graphic showing all the different aspects of Open Science. The point of the image is the number and variety of aspects, rather than what those aspects are.

Openness (transparency) prevents researchers from being able to hide their QRPs

The Open Science movement has inspired many innovations in transparent research, only a few of which we’ll explore today:

  • Preregistration
  • Registered reports
  • Open materials/data/code

Preregistration

  • Preregistration involves publicly sharing a time-stamped research plan (e.g., on the OSF) that includes:

    • Precise hypotheses to prevent HARKing

    • Information about all your variables and how they’re operationalised to prevent selective reporting

    • A detailed data analysis plan to prevent p-hacking

  • Other benefits include making sure you are collecting data you can actually analyse and front-loading a lot of the work

The preregistered Open Science badge

Registered Reports

A graphic showing the difference between traditional publishing and registered reports, as described further in the audio.

Registered Reports

Registered Reports are like preregistration in that you specify what you’re going to do in advance, but they have additional benefits.

  • Like preregistration, they aim to reduce QRPs such as HARKing, p-hacking, and selective reporting

  • They also improve research quality more generally as the methods are reviewed by peers before data collection commences

  • They are also more likely to reduce publication bias as journals agree to publish the study based on the quality of the methods, regardless of the results

Sharing of Materials, Data, & Code

  • Open materials allow other researchers to inspect and more easily replicate your research study
  • Open code and data allow other researchers to inspect and reproduce your research findings
  • Open code is the main reason why we’re teaching you R!

The open materials, open code, and open data open science badges.

So, has Open Science improved Psychology?

New Evidence that OS is Improving Psychological Science!

  • Protzko et al. (2023) claimed that OS efforts are improving replicability

  • The methods are complex, but they essentially compared the replication rate of studies that had used OS practices with that of studies that hadn’t

  • Concluded that the 86% replication rate in the OS sample was due to “rigor-enhancing practices” such as preregistration, methodological transparency, confirmatory tests, and large sample sizes.

A screenshot of the journal article, including the title, author list, and an Editor's note warning that there are some issues with the paper.

Wait, what’s that Editor’s Note all about?

Photos of Berna Devezer and Joe Bak-Coleman.

Bak-Coleman & Devezer (2023) wrote a response, arguing (amongst other things):

  • The definition of ‘replication’ was less conservative in the OS sample than in the non-OS sample

  • The OS group of studies were chosen because they were more likely to replicate

  • The hypotheses and analyses reported in the study deviated from what was preregistered

The story unfolds further in this blogpost.

Metabias

  • Original reviews pointed out the flaws in this study

  • One of those reviewers spoke out on Twitter, saying the original authors’ response was that they were aware of the flaws, but thought the ends justified the means

  • In other words, it was okay to fudge the figures a bit to get more people on the OS bandwagon

  • This (meta)bias is the kind of QRP that OS set out to eradicate

  • Read the full thread here on Bluesky

A screenshot of some tweets from one of the original reviewers describing what they saw as metabias.

Is Preregistration Worthwhile?

Photos of Aba Szollosi, Danielle Navarro, and Iris van Rooij.

Szöllősi, Navarro, van Rooij, and colleagues (2021) wrote a paper called “Is Preregistration Worthwhile?”, which argued:

  • Preregistration isn’t sufficient for good science

  • You can still preregister bad science from weak theories

  • Strong theories generate very specific hypotheses that are less susceptible to HARKing

  • Psychology needs stronger theories, not preregistration

It caused quite the Twitter storm!

Questionable Metascience Practices

Rubin (2023) argued that many OS proponents engage in questionable metascience practices, for example:

  • Promoting OS practices despite a lack of evidence of their efficacy

  • Metabias towards blaming researcher bias (QRPs) for the replication crisis, when there are other explanations (e.g., weak theory, poor measurement practice)

  • Rejecting or ignoring criticisms of metascience and/or science reform

  • Quick, superficial, dismissive, and/or mocking style of criticising others, predominantly from those in positions of power and privilege (bropen science)

Bropen Science

Whitaker & Guest (2020): “#bropenscience is a tongue-in-cheek expression but also has a serious side, shedding light on the narrow demographics and off-putting behavioural patterns seen in open science.”

“Not all bros are men. And that’s true, but they are more likely to be from one or more of the following dominant social groups: male, white, cisgender, heterosexual, able-bodied, neurotypical, high socioeconomic status, English-speaking. That’s because structural privileges exist that benefit certain groups of people.”

What do these OS proponents have in common?

An image with 12 photos of Open Science proponents. They are mostly white and mostly presenting as men.

What do these OS critics have in common?

An image with 12 photos of open science critics. They vary in terms of gender and ethnicity.

Glimmers of Hope

Lecture Summary

The Anakin/Padme meme, with Anakin saying, "Open Science will change psychological science" and Padme asking, "for the better, right?!" with a look of realisation that it might not be for the better!

  • The Replication Crisis in Psychology triggered a chain of events, intended to improve Psychological Science

  • Initially, the discipline focused on identifying problems and a list of Questionable Research Practices emerged

  • In response, Open Science initiatives were developed that attempted to prevent QRPs

  • However, OS has been subject to Questionable Metascience Practices and must be held to the same standards as the research it is trying to improve

  • OS is showing signs of becoming more humble, more reflective, more inclusive, and will have no choice but to provide robust evidence of its effectiveness

Further Information

More references:

  • John, L. K., Loewenstein, G., & Prelec, D. (2012). Measuring the prevalence of questionable research practices with incentives for truth telling. Psychological Science, 23(5), 524-532.
  • Masicampo, E. J., & Lalande, D. R. (2012). A peculiar prevalence of p values just below .05. Quarterly Journal of Experimental Psychology, 65(11), 2271-2279. https://doi.org/10.1080/17470218.2012.711335
  • Silberzahn, R., Uhlmann, E. L., Martin, D. P., Anselmi, P., Aust, F., Awtrey, E., … & Nosek, B. A. (2018). Many analysts, one data set: Making transparent how variations in analytic choices affect results. Advances in Methods and Practices in Psychological Science, 1(3), 337-356.
  • Kolstoe, S. (2023). Defining the Spectrum of Questionable Research Practices (QRPs). UKRIO. https://doi.org/10.37672/UKRIO.2023.02.QRPs