t-tests

Week 05

Jennifer Mankin

Looking Ahead (and Behind)

So far: Fundamental grammar
- Amo, amas, amat…
- Next up: Starting to converse!

This week: t-test

Coming up: Correlation, \(\chi^2\)
Further ahead: The Linear Model

Take-Away Paper

You will have some data, which you must use to:
- Ensure the data contains the necessary measures and values
- Decide on and create a summary and visualisation
- Perform an appropriate statistical test

You should be prepared to explain and justify your analytic decisions!

Use Your Resources!

Everything we ask you to do in the TAP, you will have practised multiple times in tutorials, skills labs, and worksheets.

To prep or get help, come to Skills Labs and practicals!

Take-Away Paper

Technicalities:

Only available for 48 hours from Monday to Wednesday of Week 7
You can work on it as long as you want in the time available
Submit the rendered HTML file on Canvas

Read the TAP Information page on Canvas!

Important

THERE IS NO LATE SUBMISSION PERIOD!!!

Other Fun Stuff

Participants Needed!

Invitation to participate in final-year dissertation research

Hybrid teaching and disability support: rebrand.ly/hybrid_ds
ChatGPT and AI at University: rebrand.ly/gpt_uni

Awards

Education Award

Nominate a member of staff who has inspired or made a difference to you

Nominate for the Education Awards here!

SavioR Award

Nominate a fellow student on the course who has helped you with R

Nominate for the SavioR Award here!

Objectives

After this lecture you will understand:

The concepts behind comparing two means
- Independent and paired samples t-tests
Where the t-statistic comes from
How to read histograms and means plots
How to interpret and report the results of t-tests

Comparing Two Means

Extremely common and fundamental testing paradigm

The Fundamental Question

Do people in one group score, perform, react, behave (etc.) differently than in another group?

Two types (for today!)
- Independent: different entities/participants in each group
- Paired: same entities/participants in both groups
Very similar logic and interpretation, slightly different maths!

Taste the Rainbow: Synaesthesia (Redux)

People with synaesthesia have unusual sensory experiences
- Experience colours for words, shapes for music, personalities for numbers, etc.

All of the letters of the English alphabet, each coloured a different colour

Drawing of a human figure in the centre of a circle made up of coloured segments, each labeled with a month of the year in order

Grapheme-Colour Synaesthesia

Association between letters/words and particular colours
- Tends to be consistent throughout life, beginning in childhood
- So, synaesthetes might tend to notice language/spelling more often

Research Question

Do synaesthetes have a different cognitive style, compared to non-synaesthetes?

Conceptual hypothesis

Grapheme-colour synaesthetes have a more language-oriented cognitive style than non-synaesthetes

Data and Design: SCSQ

Mealor et al (2016): Sussex Cognitive Styles Questionnaire
- Includes measures of imagery, language ability, and more
- Example items: “I tend to notice if a word has the same letter repeated in its spelling”; “I enjoy learning new languages”
- Validated on people with and without synaesthesia

Operational hypothesis

Synaesthetes will, on average, have a different score on the Language subscale of the SCSQ than non-synaesthetes

What is the null hypothesis?

Summary of Design

Research Question

Do synaesthetes have a different cognitive style, compared to non-synaesthetes?

Conceptual hypothesis

Grapheme-colour synaesthetes have a more language-oriented cognitive style than non-synaesthetes

Operational hypothesis

Grapheme-colour synaesthetes will have, on average, a different score on the Language subscale of the SCSQ than non-synaesthetes

Null hypothesis

Grapheme-colour synaesthetes will have, on average, the same score on the Language subscale of the SCSQ as non-synaesthetes

Having a Look

Let’s begin with all scores on the Language subscale together

Figure 1: SCSQ Language scores

Having a Look

Figure 2: SCSQ Language scores by synaesthesia group

Having a Look

Figure 3: SCSQ Language scores by synaesthesia group, with group means

Sorted!

The mean Language score for synaesthetes is higher than for non-synaesthetes
Are we done?

Of course not 😁
How different are these mean scores, accounting for how much scores vary?
- How strong is the signal (the difference in means)…
- Compared to the noise (the variation in mean differences)?

Steps of the Analysis

Calculate the test statistic (signal-to-noise ratio)

Compare that test statistic to its distribution under the null hypothesis

Obtain the probability p of encountering a test statistic of the size we have, or larger, assuming the null hypothesis is true

Vocabulary: Ratio

A number that captures the relative size of two quantities, expressed as how many times bigger the first quantity is than the second. Calculated as the first divided by the second.

e.g., if the seagull-to-human ratio in Brighton is 3, then there are 3 seagulls for every 1 human, or 3 times as many seagulls as humans.

A gif of a cat partially obscured by visual noise or 'snow' that make it a bit harder to interpret the image

Calculating the Test Statistic: The Signal

The “signal” is the relationship of interest
- The variation in scores explained by group membership
- Here: the relationship between being a synaesthete (or not) and Language score

Calculate the mean of each group
Subtract one mean from the other
The size of the difference in means is the signal

Having a Closer Look

Figure 4: Mean SCSQ Language scores by synaesthesia group, with 95% CIs

Calculating the Signal

Let’s get some numbers to work with 🤓
- Mean in synaesthete group: 4.29
- Mean in nonsynaesthete group: 3.55

Difference in the means (the “signal”) = 4.29 - 3.55 = 0.74 (or 0.73 when calculated from the unrounded means)

Is this a big difference, compared to how different we might expect any two sample means to be from the same population?
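A minimal sketch of calculating the signal in base R. It uses simulated scores rather than the lecture's `syn_data`: the data frame `sim_data`, its values, and the group sizes are invented for illustration, and only the column names mirror the ones used in the lecture.

```r
# Simulated stand-in for the real data: values and group sizes are invented
set.seed(42)
sim_data <- data.frame(
  syn_graph_col = rep(c("Yes", "No"), times = c(40, 60)),
  scsq_language = c(rnorm(40, mean = 4.3, sd = 0.6),
                    rnorm(60, mean = 3.6, sd = 0.75))
)

# Mean Language score in each group
group_means <- tapply(sim_data$scsq_language, sim_data$syn_graph_col, mean)
group_means

# The "signal": the difference between the two group means
signal <- group_means[["Yes"]] - group_means[["No"]]
signal
```

Which mean is subtracted from which only changes the sign of the signal, not its size.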

Calculating the Test Statistic: The Noise

The “noise” is the error, the variation NOT explained by group membership
The differences in means have a sampling distribution!
- Exactly analogous to the sampling distribution of the mean

So, the “noise” is the standard error of the difference in means between synaesthetes and non-synaesthetes
- Estimate of how different we expect any two sample means to be from the same population

Side Note: Why “from the same population”?

Remember: the null hypothesis was that on average, there’s no difference between synaesthetes and non-synaesthetes on Language score
- Basically: group membership doesn’t matter

We’re sampling from the same population of scores; synaesthesia is irrelevant
- Very small differences in means quite likely
- Very large differences in means quite unlikely

Random sampling of dots in a population resulting in a distribution centred at 0 to illustrate that when you sample two groups randomly from the same population, larger differences in group means are increasingly unlikely to occur
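The idea in this side note can be sketched with a small simulation. Everything here is an assumption chosen for illustration (a population mean of 4, SD of 0.7, and 50 people per group); the point is only that when both groups come from the same population, the differences in means pile up around 0.

```r
set.seed(2024)

# Draw two groups from the SAME population and record the difference in means
one_difference <- function(n_per_group = 50) {
  group_a <- rnorm(n_per_group, mean = 4, sd = 0.7)
  group_b <- rnorm(n_per_group, mean = 4, sd = 0.7)
  mean(group_a) - mean(group_b)
}

# Repeat the "study" many times to build up the sampling distribution
null_diffs <- replicate(5000, one_difference())

mean(null_diffs)                # close to 0
sd(null_diffs)                  # close to the theoretical standard error
sqrt(0.7^2 / 50 + 0.7^2 / 50)   # theoretical standard error of the difference
mean(abs(null_diffs) > 0.1)     # proportion of small-ish differences exceeded
mean(abs(null_diffs) > 0.4)     # proportion of large differences exceeded
```

The standard deviation of `null_diffs` is the "noise": an estimate of how different two sample means tend to be when group membership does not matter.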

Calculating the Noise

The maths get a bit more complicated here!

We can derive the standard error of the difference in means from s and N…

\[s_{p}^2 = \frac{(n_{1} - 1)s_{1}^2 + (n_{2} - 1)s_{2}^2}{n_{1} + n_{2} - 2}\]

\[\widehat{SE}_{M_{diff}} = \sqrt{\frac{s_{p}^2}{n_{1}} + \frac{s_{p}^2}{n_{2}}}\]

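# Language scores for synaesthetes
syn_lang_syn <- syn_data |>
  dplyr::filter(syn_graph_col == "Yes") |>
  dplyr::pull(scsq_language)

# Language scores for non-synaesthetes
syn_lang_nonsyn <- syn_data |>
  dplyr::filter(syn_graph_col == "No") |>
  dplyr::pull(scsq_language)

# Group sizes, counting only non-missing scores (to match na.rm = TRUE below)
n_syn <- sum(!is.na(syn_lang_syn))
n_nonsyn <- sum(!is.na(syn_lang_nonsyn))

sd_syn <- sd(syn_lang_syn, na.rm = TRUE)
sd_nonsyn <- sd(syn_lang_nonsyn, na.rm = TRUE)

# Pooled variance: the denominator is n1 + n2 - 2, as in the formula above
var_pooled <- (((n_syn - 1)*(sd_syn^2)) + ((n_nonsyn - 1)*(sd_nonsyn^2)))/(n_syn + n_nonsyn - 2)

# Standard error of the difference in means
mdiff_se <- sqrt((var_pooled/n_syn) + (var_pooled/n_nonsyn))
round(mdiff_se, 4)

[1] 0.1154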
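The same calculation can be wrapped in a small function and checked against R's own t-test. This is a sketch, not the lecture's code: the function name `se_mean_diff()` is hypothetical and the scores are simulated with invented values.

```r
# Standard error of the difference in means, using the pooled variance
se_mean_diff <- function(x, y) {
  x <- x[!is.na(x)]  # drop missing values BEFORE counting
  y <- y[!is.na(y)]
  n1 <- length(x)
  n2 <- length(y)
  var_pooled <- ((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2)
  sqrt(var_pooled / n1 + var_pooled / n2)
}

# Simulated scores (invented values, not the lecture data)
set.seed(123)
group_1 <- rnorm(30, mean = 4.3, sd = 0.6)
group_2 <- rnorm(45, mean = 3.6, sd = 0.75)

se_by_hand <- se_mean_diff(group_1, group_2)
t_by_hand <- (mean(group_1) - mean(group_2)) / se_by_hand

# Cross-check against R's equal-variance (pooled) t-test
t_from_r <- unname(t.test(group_1, group_2, var.equal = TRUE)$statistic)
all.equal(t_by_hand, t_from_r)
```

Dropping missing values before counting matters because `length()` counts `NA`s, whereas `sd(..., na.rm = TRUE)` ignores them, so the two could otherwise disagree about the group size.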

Steps of the Analysis, Redux

Calculate the (standardised) difference between mean scores
- Divide the signal (difference in the unrounded means) = 0.7313, or 0.73 to two decimal places…
- By the noise (standard error of the difference in means) = 0.1154, or 0.12 to two decimal places

This is our “test statistic” t, or signal-to-noise ratio

\[t = \frac{signal}{noise} = \frac{M_{diff}}{SE_{M_{diff}}} = \frac{0.7313}{0.1154} \approx 6.34 \]

Vocabulary: Test Statistic

A number that captures the relationship or comparison of interest. Common examples include t, r, \(\chi^2\), F. Typically has the general form:

\[\frac{\text{estimate of the relationship or difference}}{\text{estimate of the variation in estimates}} \ \text{or} \ \frac{signal}{noise} \]

We Did It!!

…Or did we?

What does this number mean? How (un)likely is it?

Steps of the Analysis, Redux

Compare the test statistic to its distribution under the null hypothesis
Obtain the probability p of encountering a test statistic of the size we have, or larger, if the null hypothesis is true

What’s the Point?

The test statistic t is the difference in group means divided by their standard error
- So, t represents how big the signal is compared to the noise

Each test statistic has its own known distribution under the null hypothesis
- Known probability of encountering any given test statistic under the null hypothesis
- Including t, which is (surprise) t-distributed

Important

Larger values of t are increasingly unlikely under the null hypothesis (i.e. when the “signal” is, in reality, 0)
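A short sketch of this point using R's built-in t-distribution functions, `pt()` and `qt()`. The t values are arbitrary examples, and the degrees of freedom (1209) are borrowed from the synaesthesia example reported later in the lecture.

```r
df <- 1209                      # degrees of freedom (N1 + N2 - 2)
t_values <- c(0.5, 1, 2, 3, 6)  # arbitrary example values of t

# Two-tailed probability of a t at least this large (in either direction),
# assuming the null hypothesis is true
p_values <- 2 * pt(-abs(t_values), df = df)
data.frame(t = t_values, p = signif(p_values, 3))

# Critical value: the size of t beyond which p < .05 (two-tailed)
qt(0.975, df = df)
```

The probabilities shrink as t grows, which is exactly why a large t counts as surprising under the null hypothesis.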

What’s the Point?

Density plot of a bell-shaped t-distribution, with 3 degrees of freedom. Shape is similar to normal, but tails are longer. Critical value displayed on the plot for 3 degrees of freedom is 3.184

Would You Like Some t?

Naturally, t is t-distributed (here, with N₁ + N₂ - 2 degrees of freedom)
What can we conclude, given an \(\alpha\) level of .05?

If p > .05

Our results are likely to occur under the null hypothesis

We have no evidence that the null hypothesis is not true

Conclusion: RETAIN THE NULL

If p < .05

Our results are unlikely to occur under the null hypothesis

It may in fact be the case that the null hypothesis is not true

Conclusion: REJECT THE NULL

Side Note: It’s Exactly Backwards

The word “significant” implies

Big
Noticeable
Having a large impact

So we might expect that “significant” means bigger than 0.05

It is the exact opposite of that!

Statistical significance expresses surprisingness - the probability of unlikely events

So, small p-values are “significant”, not large ones

That’s the t

After all of that work…R can do it for us instantaneously!


    Two Sample t-test

data:  scsq_language by syn_graph_col
t = -6.3394, df = 1209, p-value = 3.25e-10
alternative hypothesis: true difference in means between group No and group Yes is not equal to 0
95 percent confidence interval:
 -0.9576637 -0.5049972
sample estimates:
 mean in group No mean in group Yes 
         3.554949          4.286279
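The call that produced this output is not shown. Judging by the "data: ... by ..." line and the "Two Sample t-test" title, it was probably the formula interface with `var.equal = TRUE`, but that is an assumption. The sketch below runs such a call on simulated data (invented values and group sizes, with the lecture's column names) and pulls out the individual pieces.

```r
# Simulated stand-in for syn_data: values and group sizes are invented
set.seed(7)
sim_data <- data.frame(
  syn_graph_col = rep(c("No", "Yes"), times = c(60, 40)),
  scsq_language = c(rnorm(60, mean = 3.6, sd = 0.75),
                    rnorm(40, mean = 4.3, sd = 0.6))
)

# Formula interface: outcome ~ grouping variable
# var.equal = TRUE requests the pooled-variance "Two Sample t-test";
# R's default (var.equal = FALSE) would run Welch's t-test instead
result <- t.test(scsq_language ~ syn_graph_col, data = sim_data, var.equal = TRUE)
result

# Individual pieces of the result
result$statistic  # t
result$parameter  # degrees of freedom
result$p.value    # p
result$conf.int   # 95% CI for the difference (group No minus group Yes)
result$estimate   # the two group means
```

Because R subtracts the second group ("Yes") from the first ("No"), a higher mean for synaesthetes shows up as a negative t and a negative confidence interval.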

What do you think?

What should we conclude?

Reporting the Results

“On average, grapheme-colour synaesthetes scored higher on the Language subscale of the SCSQ (M = 4.29, SD = 0.61) than non-synaesthetes (M = 3.55, SD = 0.75). An independent samples t-test indicated that this difference was statistically significant (t(1209) = -6.34, p < .001, M_diff = -0.73, 95% CI [-0.96, -0.50]).”
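As a check on how the reported numbers fit together, the sketch below rebuilds t and p from the group means, confidence interval, and degrees of freedom printed in the t-test output. The variable names are made up for illustration; the numbers are copied from the output.

```r
# Values copied from the t-test output
mean_no  <- 3.554949
mean_yes <- 4.286279
ci       <- c(-0.9576637, -0.5049972)
df       <- 1209

# R reports the difference as group No minus group Yes
m_diff <- mean_no - mean_yes

# The CI is m_diff plus/minus (critical t x SE), so the SE can be recovered
se <- diff(ci) / (2 * qt(0.975, df))

# Signal-to-noise ratio and its two-tailed probability under the null
t_val <- m_diff / se
p_val <- 2 * pt(-abs(t_val), df)

round(c(m_diff = m_diff, se = se, t = t_val), 2)
p_val < .001
```

Working from the unrounded values avoids the small discrepancies that appear when already-rounded means are subtracted.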

Interim Summary

Independent samples t-tests

Tests the null hypothesis that two samples come from the same population (i.e. M_diff = 0)
Calculate test statistic t, which expresses signal-to-noise ratio
- Size of the difference in means divided by the standard error of the difference in means
Then, evaluate the probability p of obtaining t of this size (or larger) under the null hypothesis
If p < \(\alpha\), we might conclude that group membership is associated with some difference

No Maths Required

You will not need to memorise, and will not be assessed on, the equations for pooled SD or SE_Mdiff!

Interlude

Consider This

Does this constitute evidence that group membership (i.e. being a synaesthete or not) CAUSES a difference in language ability?

Next up: paired samples t-test

Do You Want Some Synaesthesia?

Being a synaesthete is super cool and a lot of fun
- See cool colours all the time!
- Have (very mundane and mostly unremarkable) superpowers!

What if everyone could be a synaesthete?
- Can you train people to have synaesthesia?

Paired (Repeated) Design

Simplified version of Bor et al. (2014)
Train people to associate colours with letters
Test success of the training with a modified Stroop task
- Outcome: naming speed pre- vs post-training

Three boxes illustrating the experimental design. The first is labeled 'Pre-test' and contains a green capital letter E, with the word 'green' beneath it. The second is labeled 'Training', and contains a capital letter E on a green background. The third is labeled 'Post-test' and contains a green capital letter E, with the word 'green' beneath it, identical to the first box.

Paired (Repeated) Design

Key difference: the same people participate in both conditions

id	pre	post
HZB89D	954.3002	895.3002
14994L	605.1047	544.1047
UIM6YD	830.1126	768.1126
TBR8Z7	697.5568	640.5568
C143ZP	523.6873	465.6873
ALR3HB	831.0576	771.0576

The data are paired
- Both columns (pre and post) contain the same thing (here, reaction time)
- Each row contains data from the same person

Example Output


    Paired t-test

data:  syn_train$pre and syn_train$post
t = 100.22, df = 13, p-value < 2.2e-16
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
 57.65824 60.19890
sample estimates:
mean difference 
       58.92857

“There was a significant difference in mean colour naming times between pre- and post-training (t(13) = 100.22, p < .001, M_diff = 58.93, 95% CI [57.66, 60.20]).”
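A sketch of a paired t-test using only the six rows shown in the table earlier. The full dataset has more participants (df = 13 in the output), so these results will not match the output above. The data frame name `syn_train_head` is made up for illustration.

```r
# The six rows shown in the table (not the full dataset)
syn_train_head <- data.frame(
  id   = c("HZB89D", "14994L", "UIM6YD", "TBR8Z7", "C143ZP", "ALR3HB"),
  pre  = c(954.3002, 605.1047, 830.1126, 697.5568, 523.6873, 831.0576),
  post = c(895.3002, 544.1047, 768.1126, 640.5568, 465.6873, 771.0576)
)

# Paired t-test: works on the within-person differences (pre minus post)
paired_result <- t.test(syn_train_head$pre, syn_train_head$post, paired = TRUE)
paired_result

# Equivalent by hand: the mean difference divided by its standard error
d <- syn_train_head$pre - syn_train_head$post
t_by_hand <- mean(d) / (sd(d) / sqrt(length(d)))
all.equal(t_by_hand, unname(paired_result$statistic))
```

The signal-to-noise logic is the same as for independent samples. The difference is that the noise is the standard error of each person's own pre-to-post change, so stable differences between people (some are simply faster than others) do not count as noise.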

Causality

In our first example, could we conclude that having synaesthesia causes you to pay more attention to language?
In our second example, could we conclude that having training causes you to associate colours with letters?

Why is this?

That’s the t

The t-test quantifies the size of the difference of two means (signal) compared to the error (noise)
Independent samples t-test
- Tests means from different entities/participants
- Independent or “between-subjects” design
Paired samples t-test
- Tests means from the same entities/participants
- Repeated or “within-subjects” design
Establishing causality is a function of study design, not statistics!

Have a great day!

Gif of Kermit the Frog sipping a cup of tea.