Questionable Research Practices

Week 11

Dr Jenny Terry & Dr Martina Sladekova

Part 1: How science got broken

Long ago, psychological scientists lived together in harmony

1879 - world’s first psychology lab established by Wilhelm Wundt

  • Defining point for Psychology as a separate discipline

  • Prior to that - part of philosophy

  • Use of experimental methods for the first time - smooth psychological science followed

Black and white photograph of bearded Wilhelm Wundt sitting in a chair, surrounded by 5 more white men of science, looking at very scientific instruments that have nothing to do with psychology.

…Everything changed when the Fire Nation Open Science Collaboration attacked

Open Science Collaboration (2015): Estimating the reproducibility of psychological science

  • Attempted to replicate findings published in high profile journals

  • Replicated effect sizes were half the size of those originally reported

  • 97% of the original studies reported statistically significant results (p < 0.05)

  • Only 36% of the replications were statistically significant

Vocabulary

Replication: The process of repeating (re-running) the same study using identical methodology.

…Everything changed when the Fire Nation Open Science Collaboration attacked

Some famous findings that we can’t replicate:

  • Smiling will make you feel happier

  • Power posing will make you act bolder

  • Self-control is a limited resource (ego-depletion)

  • Revising after your exams can improve your earlier performance (Daryl Bem’s “pre-cognition” experiments)

  • Babies are born with the ability to imitate

Decorative gif of Captain Raymond Holt from the series Brooklyn Nine-Nine saying “Time for the next stage: Forced laughter” and grinning maniacally.

Read more here: https://www.bps.org.uk/research-digest/ten-famous-psychology-findings-have-been-difficult-replicate

PollEverywhere

Two teams of researchers set out to study the effect of mindfulness meditation on well-being. Both teams deliver the same intervention using an identical protocol, and each team collects 200 participants. Each team analyses its own data using a t-test, comparing the well-being of the intervention group against the control group.

Team 1

Team 1 finds a statistically significant difference in well-being between the two groups (p = .034)

Team 2

Team 2 finds a non-significant difference in well-being between the two groups (p = .27)

Both teams write up their reports as a scientific paper. Which paper should get published? linktr.ee/analysingdata
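To see how this can happen, here is a minimal R sketch with simulated data (not the teams’ real results). It assumes each team’s 200 participants split evenly into two groups of 100, a small true effect (Cohen’s d = 0.2), and an arbitrary seed. Both simulated “teams” follow the identical protocol, yet random sampling alone can hand one team a significant p-value and the other a non-significant one.

```r
# Minimal sketch: two identical studies, two different p-values.
set.seed(11)  # arbitrary seed so the sketch is reproducible

run_study <- function(n_per_group = 100, d = 0.2) {
  control      <- rnorm(n_per_group, mean = 0, sd = 1)  # control group
  intervention <- rnorm(n_per_group, mean = d, sd = 1)  # small true effect
  t.test(intervention, control)$p.value
}

run_study()  # "Team 1"
run_study()  # "Team 2" - same protocol, yet a different p-value
```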

Publication bias

  • Some results are more “interesting” to publish (more readers and citations = more money)

  • The publication record is over-populated with statistically significant findings

  • Papers with significant p-values are 9 times more likely to get published

  • Problem for evidence synthesis/meta-analysis

    • A weakness of statistical research: even if there is no effect of mindfulness on well-being, we’ll still find a statistically significant result in about 5% of replications because of random sampling alone (assuming an alpha of 0.05)
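A minimal R sketch of that 5% figure, under the assumption of no true effect at all (the sample sizes and seed are arbitrary): simulate many replications of a two-group study where both groups come from the same population, and count how often the t-test comes out “significant”.

```r
# Minimal sketch: false positive rate under the null with alpha = .05.
set.seed(123)  # arbitrary seed for reproducibility

p_values <- replicate(10000, {
  control      <- rnorm(100)  # both groups drawn from the same population,
  intervention <- rnorm(100)  # so the true effect is exactly zero
  t.test(intervention, control)$p.value
})

mean(p_values < .05)  # close to 0.05: ~5% "significant" results by chance alone
```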

Vocabulary

Publication bias, or the “file drawer effect”: the bias in the publication system whereby statistically significant results are favoured for publication over non-significant findings. Papers reporting non-significant findings often end up in researchers’ file drawers, never to be seen again.

A day in the life of a scientist

  • Conduct research (requires funding)

  • Train research students (requires funding)

  • Publish papers (requires funding)

  • Teach (it’s free! 🤩)

  • Secure funding (apply for grants)

Funding bodies look at researchers’ publication records (among other things) to decide who gets awarded a grant.

Failure to publish |> failure to secure funding |> failure to conduct further research |> job insecurity

Questionable Research Practices (QRPs)

Tight deadlines +

high-pressure competitive environment +

“publish or perish” +

lack of training +

personal investment -> QRPs

Vocabulary

Questionable research practices: a range of practices that (intentionally or unintentionally) distort results, often motivated by the desire to find support for hypotheses and make research more publishable.

Questionable Research Practices (QRPs)

From UK Research Integrity Office:

Graphic describing the different dimensions of QRPs. On the left side are errors, sloppiness, misunderstanding, incompetence, mistakes and time-pressure. On the right side is falsification, criminality, fabrication, deliberate actions and financial pressure.

Hanlon’s Razor: “Never attribute to malice that which is adequately explained by ignorance or incompetence.”

The Garden of Forking Paths

  • Each analysis has many decision points - which tests to run, which participants to exclude, which steps to take during data cleaning, etc.

  • Each decision results in a unique analysis “path”

  • Different analysts might not necessarily take the same path and arrive at the same conclusion (Silberzahn et al., 2018)

  • Some “paths” can seem more sensible depending on your motivations

Illustrative image of a garden with a forking path

Diagram showing a line splitting into two directions, then each line splitting into two more, which split into two more… generating many possible paths.
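To get a feel for how quickly the paths multiply, here is a minimal R sketch; the four decision points and their options are invented for illustration, not taken from any particular study.

```r
# Minimal sketch: four invented decision points already give 24 analyses.
paths <- expand.grid(
  outliers = c("keep all", "remove > 2 SD", "remove > 3 SD"),
  outcome  = c("raw score", "log-transformed"),
  sample   = c("all participants", "exclude non-completers"),
  test     = c("t-test", "Mann-Whitney")
)
nrow(paths)  # 3 * 2 * 2 * 2 = 24 unique analysis "paths"
```

With ten binary decisions the count is already 2^10 = 1,024 paths, which is how two careful analysts can honestly reach different conclusions from the same data.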

Some QRPs

p-hacking

  • Selective inclusion/removal of cases

  • Subsetting/combining groups

  • Variable dichotomisation

  • Data transformation

  • Collecting more data (“data peeking”)

NOTE: None of these are “questionable” in their own right. Motivation matters!

Vocabulary

p-hacking: taking specific analytic steps in order to achieve statistical significance rather than (pre-planned) steps that are more appropriate to answer the research question.
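To see why motivation matters, here is a minimal R sketch of “data peeking” (all numbers are illustrative assumptions): a simulated researcher tests after every batch of 10 participants per group, up to 100 per group, and stops as soon as p < .05. Even though there is no true effect, peeking inflates the false positive rate well beyond the nominal 5%.

```r
# Minimal sketch: optional stopping ("data peeking") under a true null effect.
set.seed(42)  # arbitrary seed for reproducibility

peeking_study <- function(batch = 10, max_n = 100) {
  control <- intervention <- numeric(0)
  while (length(control) < max_n) {
    control      <- c(control,      rnorm(batch))  # true effect = 0
    intervention <- c(intervention, rnorm(batch))
    if (t.test(intervention, control)$p.value < .05) return(TRUE)  # stop early
  }
  FALSE  # reached max_n without a "significant" result
}

mean(replicate(5000, peeking_study()))  # well above the nominal 0.05
```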

Some QRPs

p-hacking

Masicampo, E. J., & Lalande, D. R. (2012). A peculiar prevalence of p values just below .05.


A graph showing the frequency of different p-values found in the literature. The y-axis shows counts; the x-axis shows p-values from 0.01 on the left to 0.10 on the right. The counts fall steadily from left to right, except for one spike just under 0.05.

Some QRPs

HARKing

  • NOT the same as exploratory research

  • NOT the same as discussing explanations for your (surprising) results

Vocabulary

HARKing: Hypothesizing After The Results are Known. Often involves collecting data without a clear hypothesis, deciding on a hypothesis based on what’s significant, and then presenting that hypothesis as if it had been decided on before running any analyses.

Some QRPs

Selective reporting

  • Often goes hand-in-hand with HARKing

  • Collecting a lot of variables and only reporting statistically significant relationships (without making it clear that you’ve also collected other data)

Selective/inaccurate citing

  • Picking and choosing which papers to cite in a way that fits your narrative

  • Citing papers as supporting a specific point when they don’t

Salami slicing

  • Splitting relevant analyses from a single dataset into multiple papers to increase publication count

Do Psychologists Engage in QRPs? (John et al., 2012)

Item | Self-admission rate (%)
Failing to report all of a study’s dependent measures | 63.4
Deciding whether to collect more data after looking to see whether the results were significant | 55.9
Failing to report all of a study’s conditions | 27.7
Stopping collecting data earlier than planned because one found the result that one had been looking for | 16.6
“Rounding off” a p value (e.g., reporting that a p value of .054 is less than .05) | 22.2
Selectively reporting studies that “worked” | 45.8
Deciding whether to exclude data after looking at the impact of doing so on the results | 38.2
Reporting an unexpected finding as having been predicted from the start | 27.0
Claiming that results are unaffected by demographic variables (e.g., gender) when one is actually unsure (or knows that they do) | 3.0
Falsifying data | 0.6

There is yet hope…

PhD Researchers’ beliefs about null results (Pownall et al., 2023):

Statement | Percent agree (%)
If someone does good scientific work, they produce significant results. | 7.98
I believe that null effects are generally caused by sloppy experimental work. | 1.60
A null effect tells me that I did not do good work. | 1.60
Whether I get positive feedback on my work depends on whether my results are significant. | 22.34
If my thesis does not produce significant results, I will get more negative feedback. | 25.00
I think my supervisor gives more positive feedback on significant results than non-significant ones. | 22.87

The future of QRPs?

Potential risks of AI for research integrity (Chen et al., 2024):

  • Generation of fictitious clinical trial data

  • Data fabrication and falsification

  • Inappropriate application of statistical models

  • AI-assisted academic plagiarism and automatic content generation

  • Lack of transparency and disclosure

Why should you care?

YOU are the future

  • researchers (academic or industry)

  • practitioners

  • educators

Understanding how research works (and how the environment can affect the research process) allows you to:

  1. Be critical when you’re reading about research and not fall for every “Psychology says” or “Science says” click-bait, and
  2. Make science better!

How can we make science better?

Part 2: From “Replication Crisis” to “Credibility Revolution”

The Credibility Revolution

A photo of Simine Vazire, a woman with dark hair wearing a red jumper taken outside on a sunny day.

Simine Vazire (University of Melbourne)

  • Coined the term “Credibility Revolution” to describe a more optimistic move towards improving research, as opposed to just pointing out what was wrong with it
  • Co-Founder of the Society for the Improvement of Psychological Science (SIPS) in 2016
  • SIPS aims to “bring together scholars working to improve methods and practices in psychological science” and has played a key role in the development and proliferation of Open Science

Open Science (OS)

A graphic showing all the different aspects of Open Science. The point of the image is the number and variety of aspects, rather than what those aspects are.

Openness (transparency) prevents researchers from being able to hide their QRPs

The Open Science movement has inspired many innovations in transparent research, only a few of which we’ll explore today:

  • Preregistration
  • Registered reports
  • Open materials/data/code

Preregistration

  • Preregistration involves publicly sharing a time-stamped research plan (e.g., on the OSF) that includes:

    • Precise hypotheses to prevent HARKing

    • Information about all your variables and how they’re operationalised to prevent selective reporting

    • A detailed data analysis plan to prevent p-hacking

  • Other benefits include making sure you are collecting data you can actually analyse and front-loading a lot of the work

The preregistered Open Science badge

Registered Reports

A graphic showing the difference between traditional publishing and registered reports, as described further in the audio.

Registered Reports

Registered Reports are like preregistration in that you specify in advance what you’re going to do, but they have additional benefits.

  • Like preregistration, they aim to reduce QRPs such as HARKing, p-hacking, and selective reporting

  • They also improve research quality more generally as the methods are reviewed by peers before data collection commences

  • They also help reduce publication bias, as journals agree to publish the study based on the quality of the methods, regardless of the results

Sharing of Materials, Data, & Code

  • Open materials allow other researchers to inspect and more easily replicate your research study
  • Open code and data allow other researchers to inspect and reproduce your research findings
  • Open code is the main reason why we’re teaching you R!

The open materials, open code, and open data open science badges.

So, has Open Science improved Psychology?

New Evidence that OS is Improving Psychological Science!

  • Protzko et al. (2023) claimed that OS efforts are improving replicability

  • The methods are complex, but essentially they compared the replication rate of studies that had used OS practices with that of studies that hadn’t

  • Concluded that the 86% replication rate in the OS sample was due to “rigor-enhancing practices” such as preregistration, methodological transparency, confirmatory tests, and large sample sizes.

Wait, what’s that Retraction Notice all about?

Photos of Berna Devezer and Joe Bak-Coleman.

Bak-Coleman & Devezer (2023) wrote a response, arguing (amongst other things):

  • The definition of ‘replication’ was less conservative in the OS sample than in the non-OS sample

  • The OS group of studies were chosen because they were more likely to replicate

  • The hypotheses and analyses in the study were not preregistered as claimed

The story unfolds further in this blogpost.

Wait, what’s that Retraction Notice all about?!

A Bluesky post by Brian Nosek saying: "We made an embarrassing error claiming that *all* analyses in the "high replicability" were preregistered. The origins of the error is explained here: osf.io/4k5sf  I am grateful for the critiques and identification of our errors; that work is essential for a healthy, always improving science."

Wait, what’s that Retraction Notice all about?!

Bluesky post from Joe Bak-Coleman which reads: Brian, this is a misleading description of the journal's motivations for retraction--which extended far beyond this single sentence. Beyond what is mentioned in the note. Even if you disagree with them, it is important to represent the journal's rationale and the nature of the concerns accurately. If you are grateful for the critique, you'll hopefully understand that framing it as a limited technicality when in reality the concerns were much more expansive reflects poorly on us for raising the concerns and misleadingly implies we were making a mountain out of a molehill.

Metabias

  • Original reviewers also pointed out the flaws in this study

  • One of those reviewers spoke out on Twitter, saying the original authors’ response was that they were aware of the flaws, but thought the ends justified the means

  • In other words, it was okay to fudge the figures a bit to get more people on the OS bandwagon

  • This (meta)bias is the kind of QRP that OS set out to eradicate

  • Read the full thread here on Bluesky

A screenshot of a tweet from Tal Yarkoni saying, "...the authors’ response, in a letter appealing the (original) rejection decision, was basically to say “ok yes dear pedant reviewer, you’re right that this paper _shouldn’t_ be necessary… …but people aren’t always convinced by basic logic, so sometimes bad empirical evidence is necessary for persuasion, even if it doesn’t _really_ carry any informational content.”

Questionable Metascience Practices

Rubin (2023) argued that many OS proponents engage in questionable metascience practices, for example:

  • A lack of evidence of its efficacy

  • Metabias towards blaming researcher bias (QRPs) for the replication crisis, when there are other explanations (e.g., weak theory, poor measurement practice)

  • Rejecting, undermining, and ignoring criticisms of metascience and/or science reform

  • Quick, superficial, dismissive, and/or mocking style of criticising others, predominantly from those in positions of power and privilege (bropen science)

What do (most of) these OS proponents have in common?

An image with 12 photos of Open Science proponents. They are mostly white and mostly presenting as men.

What do (most of) these OS critics have in common?

An image with 12 photos of open science critics. They vary in terms of gender and ethnicity.

Bropen Science

Whitaker & Guest (2020)

“Not all bros are men… but they are more likely to be from one or more of the following dominant social groups: male, white, cisgender, heterosexual, able-bodied, neurotypical, high socioeconomic status, English-speaking. That’s because structural privileges exist that benefit certain groups of people.”

“#bropenscience is a tongue-in-cheek expression but also has a serious side, shedding light on the narrow demographics and off-putting behavioural patterns seen in open science.”

Glimmers of Hope

Lecture Summary

The Anakin/Padme meme, with Anakin saying, "Open Science will change psychological science" and Padme asking, "for the better, right?!" with a look of realisation that it might not be for the better!

  • The Replication Crisis in Psychology triggered a chain of events intended to improve Psychological Science

  • Initially, the discipline focused on identifying problems and a list of Questionable Research Practices emerged

  • In response, Open Science initiatives were developed that attempted to prevent QRPs

  • However, OS has been subject to Questionable Metascience Practices and must be held to the same standards as the research it is trying to improve

  • OS is showing signs of becoming more humble, more reflective, more inclusive, and will have no choice but to provide robust evidence of its effectiveness

Further Reading

More references:

  • John, L. K., Loewenstein, G., & Prelec, D. (2012). Measuring the prevalence of questionable research practices with incentives for truth telling. Psychological Science, 23(5), 524–532.
  • Masicampo, E. J., & Lalande, D. R. (2012). A peculiar prevalence of p values just below .05. Quarterly Journal of Experimental Psychology, 65(11), 2271–2279. https://doi.org/10.1080/17470218.2012.711335
  • Pownall, M., Terry, J., Collins, E., Sladekova, M., & Jones, A. (2023). UK Psychology PhD researchers’ knowledge, perceptions, and experiences of open science. Cogent Psychology, 10(1), 2248765.
  • Silberzahn, R., Uhlmann, E. L., Martin, D. P., Anselmi, P., Aust, F., Awtrey, E., … & Nosek, B. A. (2018). Many analysts, one data set: Making transparent how variations in analytic choices affect results. Advances in Methods and Practices in Psychological Science, 1(3), 337–356.
  • Kolstoe, S. (2023). Defining the spectrum of questionable research practices (QRPs). UKRIO. https://doi.org/10.37672/UKRIO.2023.02.QRPs
  • Chen, Z., Chen, C., Yang, G., He, X., Chi, X., Zeng, Z., & Chen, X. (2024). Research integrity in the era of artificial intelligence: Challenges and responses. Medicine, 103(27), e38811. https://doi.org/10.1097/md.0000000000038811