Fundamentals of Statistical Testing

The last petrol station before the Highway to Hell

Dr. Martina Sladekova

A reminder image so that I don't forget to record the lecture on Zoom. Again.

Housekeeping

Register your Kahoot username:

https://canvas.sussex.ac.uk/courses/31714/quizzes

The R-Helpdesk is up and running:

https://canvas.sussex.ac.uk/courses/31714/pages/module-contacts

Where are we going?

Research questions and hypotheses

  • We want to answer a research question or be able to make a decision about a hypothesis

Research question:

Is CBT (Cognitive Behavioural Therapy) effective for treating social anxiety?


Hypothesis:

Participants who receive the CBT intervention will show lower social anxiety levels than participants who don’t receive an intervention.

PollEverywhere:

What are some other examples of research questions?



Poll link:

PollEv.com/martinasladek

Research questions and hypotheses

Some other examples [made up data]:

Hypothesis: The more we procrastinate, the more stressed we feel.

Research questions and hypotheses

Some other examples [made up data]:

Research question: Is there a relationship between caffeine consumption and productivity?

Research questions and hypotheses

Some other examples [made up data, though some studies have found this pattern]:

Hypothesis: The relationship between happiness and marriage is moderated by gender in heterosexual relationships.

Research questions and hypotheses

  • Often the data will not show a clear-cut difference
  • The “p-value”:
    • a hypothesis-testing tool
    • a value that we calculate to “formally” decide whether our hypothesis is supported
  • The next three weeks cover the building blocks of hypothesis testing

Analysing Data Roadmap

Roadmap for the module. Top row contains boxes "Introduction and distributions", "Standard error and confidence intervals" and "null hypothesis significance testing". Middle row is "t-test", "correlation" and "chi-square". Bottom row is "equation of a straight line", "linear model with one predictor", "linear model with multiple predictors"

Where did we come from?

Quantitative research

  • In quantitative research, we often (but not always):

    • Start with a theory

    • Devise an experiment to test that theory

    • Collect data

    • Describe our sample <- last term

    • Test hypotheses <- this term

📌Attendance pin🧷

Cats or dogs

A study

  • Forman and Leavens (2024) - The Effect of Transparency on Unsolvable Task Engagement in Domestic Cats (Felis catus) using Citizen Science

  • A study of social behaviours - e.g. looking at the owner while completing an unsolvable puzzle

  • Sample of 21 cats (each cat completed multiple trials)

Fluffy white and grey cat holding a fetch toy. The cat's irises are giant and playful. She's a good cat.

What can we say about this sample?

  • On average, how long do cats spend on a task before looking at their owner?

  • What is the shortest and longest time?

  • What is the variance of scores in our sample? How do scores in our sample differ from each other?

  • Are there any “unusual” cats in our sample?

The mean, the median and the mode

Measures of central tendency:

  • Mean: the average value \(\frac{\sum{X_i}}{n}\)

  • Median: the value exactly in the middle

  • Mode: the most common value

The mean, the median and the mode

Measures of central tendency:

  • Mean: the average value \(\frac{\sum{X_i}}{n}\) = 16.56

  • Median: the value exactly in the middle = 16.1

  • Mode: the most common value (around 15)
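
A minimal R sketch of these three measures, assuming the cat_sample data frame (with its look_latency column) used later in these slides; the mode helper is purely illustrative, since R has no built-in mode function for numeric data:

mean(cat_sample$look_latency)    # average value, 16.56 in our sample
median(cat_sample$look_latency)  # middle value, 16.1

# one rough way to find the most common value: round, tabulate, take the peak
rounded <- round(cat_sample$look_latency)
as.numeric(names(which.max(table(rounded))))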

Minimum, maximum, and variance shenanigans

  • Minimum - smallest value

  • Maximum - largest value

Minimum, maximum, and variance shenanigans

  • Minimum - smallest value: 7.4

  • Maximum - largest value: 25.9
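
The same idea in R, again assuming the cat_sample data frame used later in these slides:

min(cat_sample$look_latency)    # smallest value: 7.4
max(cat_sample$look_latency)    # largest value: 25.9
range(cat_sample$look_latency)  # both at once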

Minimum, maximum, and variance shenanigans

  • Variance - a measure of how much scores differ from each other: the average squared deviation from the mean, \(\frac{\sum{(X_i - \bar{X})^2}}{n-1}\). Alternatively, in R:
var(cat_sample$look_latency)
[1] 22.17848

Minimum, maximum, and variance shenanigans

  • Standard deviation - on average, by how much do scores differ from the mean? \(\sqrt{\frac{\sum{(X_i - \bar{X})^2}}{n-1}}\). Alternatively, in R:
sd_sample <- sd(cat_sample$look_latency)
sd_sample
[1] 4.709403

Unusual cases?

  • Given our sample, how unusual is a cat who took longer than 25 seconds to look at their owner?
nrow(cat_sample)
[1] 21

There are 21 cats in our sample

dplyr::filter(
  cat_sample, 
  look_latency > 25
)
cat_name look_latency
Bubbles 25.3
Commodore 25.9

Two cats out of 21 represent a proportion of 2/21 ≈ 0.095.

The empirical probability that a cat takes more than 25 seconds to look at their owner is therefore 0.095, or 9.5%.
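
As a small R sketch (assuming the same cat_sample data), this proportion is just the count of cats above the cut-off divided by the sample size:

n_slow <- nrow(dplyr::filter(cat_sample, look_latency > 25))
n_slow / nrow(cat_sample)
[1] 0.0952381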

Populations vs samples

Sample ≠ Population

Population distribution

Describes the frequency with which scores of a variable occur in the population.

Sample distribution

Describes the frequency with which scores of a variable occur in the sample.

Two cornerstones of statistical research:

  1. A distribution of a sample from a given population will resemble the shape of that population.
  2. A lot of variables have population distributions with a predictable shape.

Populations vs samples

  1. A distribution of a sample from a given population will resemble the shape of that population.
  • The larger our sample, the closer it will resemble the population distribution.

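A minimal simulation sketch of this idea (not from the lecture), drawing samples from a hypothetical normal population with mean 16.6 and SD 4.7:

set.seed(1)
small_sample <- rnorm(10,   mean = 16.6, sd = 4.7)
large_sample <- rnorm(1000, mean = 16.6, sd = 4.7)

# the larger sample's mean and SD will typically sit much closer to the
# population values, and its histogram will look closer to the bell shape
c(mean(small_sample), sd(small_sample))
c(mean(large_sample), sd(large_sample))
hist(large_sample)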

Known distributions

  1. A lot of variables have population distributions with a predictable shape.

Bottom row shows histograms of 5 distributions - normal, chi-square, t, beta, and uniform. The top row shows their equivalent density plots.

One shape to rule them all

  • The normal distribution is:

    • Symmetrical (skewness of 0)

    • Bell-shaped

    • Unimodal (only has one mode)

    • Defined by mean and standard deviation

  • Mean, median, and mode converge on one value
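
A quick simulation check of these properties (illustrative, not lecture code): in a large normal sample, the mean and median land close together and the skewness is close to 0.

set.seed(1)
x <- rnorm(100000, mean = 0, sd = 1)

mean(x)                            # close to 0
median(x)                          # close to the mean
mean((x - mean(x))^3) / sd(x)^3    # sample skewness, roughly 0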

Sneaky distributions

  • There are infinitely many possible combinations of means and standard deviations

  • Therefore there are infinitely many possible normal distributions

  • But not every bell-shaped distribution is a normal distribution

  • Proportions matter
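
An illustrative check (not lecture code): a t-distribution with 3 degrees of freedom is also symmetric and bell-shaped, but the proportion of scores falling within 1 SD of its mean differs from the normal distribution's.

set.seed(1)
x_norm <- rnorm(100000)
x_t    <- rt(100000, df = 3)   # bell-shaped, but heavier tails

mean(abs(x_norm - mean(x_norm)) < sd(x_norm))  # ~0.68 for the normal
mean(abs(x_t    - mean(x_t))    < sd(x_t))     # noticeably higher than 0.68: same bell shape, different proportions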

How to tell whether something is actually normally distributed

  • We know that normal distribution has:

    • More scores in the middle

    • Fewer and fewer scores in the tails, the further away we get from the mean - the centre of the distribution


Proportions matter

  • We expect certain proportions of scores at certain distances away from the mean:

    • ~68% of scores will be within 1 standard deviation of the mean

    • ~95% of scores will be within 1.96 standard deviations of the mean

    • ~99% of scores will be within 2.58 standard deviations of the mean
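
These proportions can be verified with pnorm() on the standard normal distribution (mean 0, SD 1):

pnorm(1)    - pnorm(-1)      # ~0.68
pnorm(1.96) - pnorm(-1.96)   # ~0.95
pnorm(2.58) - pnorm(-2.58)   # ~0.99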

How to tell whether something is actually normally distributed

~68% of scores will be within 1 standard deviation of the mean:

This means that the shaded area contains approximately 68% of the scores.

How to tell whether something is actually normally distributed

~95% of scores will be within 1.96 standard deviations of the mean:

This means that the shaded area contains approximately 95% of the scores.

How to tell whether something is actually normally distributed

~99% of scores will be within 2.58 standard deviations of the mean:

This means that the shaded area contains approximately 99% of the scores. The remaining ~1% will be in the unshaded tails.

Critical Values

Proportions to Probability

  • The proportions are always the same in a normal distribution
  • If we know that a particular quantity is normally distributed…
    • We know something about the probability of observing a particular value!

This allows us to quantify whether something is “unusual” or “surprising” with reference to the population.

What’s “unusual”?

  • Let’s assume that the cat population has the same mean and SD (for looking latency) as our sample.

  • Mean = 16.6

  • SD = 4.7

Then:

16.6 - 4.7 = 11.9
16.6 + 4.7 = 21.3

So about 68% of cats take between 11.9 and 21.3 seconds to look at their owner when completing a task.
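
The same interval in R, using the assumed population mean and SD:

16.6 + c(-1, 1) * 4.7
[1] 11.9 21.3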

What’s “unusual”?

Black cat chilling in a pumpkin, looking about as content as a cat would in a box.

  • Pumpkin is a cat from our sample

  • Pumpkin only spent 7 seconds on a task before turning to his owner

  • Is Pumpkin unusual?

What’s “unusual”?

  • How common is Pumpkin’s score of 7?

  • Shaded area: the proportion of the cat population that spends more time on a task than Pumpkin

  • Non-shaded area: the proportion of cats who spend less time on a task

Working out probabilities

Option 1: Good old Z-scores

\[ Z = \frac{X_i - Mean}{SD} = \frac{7 - 16.56}{4.71} = -2.03 \]

Reminder

Transforming into Z-scores is called standardisation. If we transform the whole distribution, it will have (1) the same shape, (2) a mean of exactly 0, and (3) an SD of exactly 1.
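
A small check of this reminder, assuming the cat_sample data from earlier:

z <- (cat_sample$look_latency - mean(cat_sample$look_latency)) /
       sd(cat_sample$look_latency)
mean(z)   # 0 (give or take rounding error)
sd(z)     # exactly 1

# scale(cat_sample$look_latency) does the same standardisation in one step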

Working out probabilities

Option 1: Good old Z-scores

\[ Z = \frac{X_i - Mean}{SD} = \frac{7 - 16.56}{4.71} = -2.03 \]

A table of cumulative probabilities associated with Z-scores (rows give the first decimal place of Z, columns the second):
Z    0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.50000 0.50399 0.50798 0.51197 0.51595 0.51994 0.52392 0.52790 0.53188 0.53586
0.1 0.53983 0.54380 0.54776 0.55172 0.55567 0.55962 0.56356 0.56749 0.57142 0.57535
0.2 0.57926 0.58317 0.58706 0.59095 0.59483 0.59871 0.60257 0.60642 0.61026 0.61409
0.3 0.61791 0.62172 0.62552 0.62930 0.63307 0.63683 0.64058 0.64431 0.64803 0.65173
0.4 0.65542 0.65910 0.66276 0.66640 0.67003 0.67364 0.67724 0.68082 0.68439 0.68793
0.5 0.69146 0.69497 0.69847 0.70194 0.70540 0.70884 0.71226 0.71566 0.71904 0.72240
0.6 0.72575 0.72907 0.73237 0.73565 0.73891 0.74215 0.74537 0.74857 0.75175 0.75490
0.7 0.75804 0.76115 0.76424 0.76730 0.77035 0.77337 0.77637 0.77935 0.78230 0.78524
0.8 0.78814 0.79103 0.79389 0.79673 0.79955 0.80234 0.80511 0.80785 0.81057 0.81327
0.9 0.81594 0.81859 0.82121 0.82381 0.82639 0.82894 0.83147 0.83398 0.83646 0.83891
1.0 0.84134 0.84375 0.84614 0.84849 0.85083 0.85314 0.85543 0.85769 0.85993 0.86214
1.1 0.86433 0.86650 0.86864 0.87076 0.87286 0.87493 0.87698 0.87900 0.88100 0.88298
1.2 0.88493 0.88686 0.88877 0.89065 0.89251 0.89435 0.89617 0.89796 0.89973 0.90147
1.3 0.90320 0.90490 0.90658 0.90824 0.90988 0.91149 0.91309 0.91466 0.91621 0.91774
1.4 0.91924 0.92073 0.92220 0.92364 0.92507 0.92647 0.92785 0.92922 0.93056 0.93189
1.5 0.93319 0.93448 0.93574 0.93699 0.93822 0.93943 0.94062 0.94179 0.94295 0.94408
1.6 0.94520 0.94630 0.94738 0.94845 0.94950 0.95053 0.95154 0.95254 0.95352 0.95449
1.7 0.95543 0.95637 0.95728 0.95818 0.95907 0.95994 0.96080 0.96164 0.96246 0.96327
1.8 0.96407 0.96485 0.96562 0.96638 0.96712 0.96784 0.96856 0.96926 0.96995 0.97062
1.9 0.97128 0.97193 0.97257 0.97320 0.97381 0.97441 0.97500 0.97558 0.97615 0.97670
2.0 0.97725 0.97778 0.97831 0.97882 0.97932 0.97982 0.98030 0.98077 0.98124 0.98169
2.1 0.98214 0.98257 0.98300 0.98341 0.98382 0.98422 0.98461 0.98500 0.98537 0.98574
2.2 0.98610 0.98645 0.98679 0.98713 0.98745 0.98778 0.98809 0.98840 0.98870 0.98899
2.3 0.98928 0.98956 0.98983 0.99010 0.99036 0.99061 0.99086 0.99111 0.99134 0.99158
2.4 0.99180 0.99202 0.99224 0.99245 0.99266 0.99286 0.99305 0.99324 0.99343 0.99361
2.5 0.99379 0.99396 0.99413 0.99430 0.99446 0.99461 0.99477 0.99492 0.99506 0.99520
2.6 0.99534 0.99547 0.99560 0.99573 0.99585 0.99598 0.99609 0.99621 0.99632 0.99643
2.7 0.99653 0.99664 0.99674 0.99683 0.99693 0.99702 0.99711 0.99720 0.99728 0.99736
2.8 0.99744 0.99752 0.99760 0.99767 0.99774 0.99781 0.99788 0.99795 0.99801 0.99807
2.9 0.99813 0.99819 0.99825 0.99831 0.99836 0.99841 0.99846 0.99851 0.99856 0.99861
3.0 0.99865 0.99869 0.99874 0.99878 0.99882 0.99886 0.99889 0.99893 0.99896 0.99900
3.1 0.99903 0.99906 0.99910 0.99913 0.99916 0.99918 0.99921 0.99924 0.99926 0.99929
3.2 0.99931 0.99934 0.99936 0.99938 0.99940 0.99942 0.99944 0.99946 0.99948 0.99950
3.3 0.99952 0.99953 0.99955 0.99957 0.99958 0.99960 0.99961 0.99962 0.99964 0.99965
3.4 0.99966 0.99968 0.99969 0.99970 0.99971 0.99972 0.99973 0.99974 0.99975 0.99976
3.5 0.99977 0.99978 0.99978 0.99979 0.99980 0.99981 0.99981 0.99982 0.99983 0.99983
3.6 0.99984 0.99985 0.99985 0.99986 0.99986 0.99987 0.99987 0.99988 0.99988 0.99989
3.7 0.99989 0.99990 0.99990 0.99990 0.99991 0.99991 0.99992 0.99992 0.99992 0.99992
3.8 0.99993 0.99993 0.99993 0.99994 0.99994 0.99994 0.99994 0.99995 0.99995 0.99995
3.9 0.99995 0.99995 0.99996 0.99996 0.99996 0.99996 0.99996 0.99996 0.99997 0.99997

Working out probabilities

Option 1: Good old Z-scores

Looking up a Z of 2.03 in the table (row 2.0, column 0.03) gives 0.97882: roughly 97.88% of scores lie above Z = -2.03, leaving about 2.12% at or below Pumpkin's score.

Working out probabilities

Alternatively…

pnorm(-2.03, lower.tail = FALSE)   # probability of a Z-score greater than -2.03
[1] 0.9788217

  • Shaded area: 97.88%

  • Unshaded area: 2.12%

Assuming our cat population has the shape we specified, there’s only a 0.021 probability (about 2.12%) of finding a cat who looks at their owner as quickly as Pumpkin did, or quicker. He’s quite unusual!
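
Equivalently, pnorm() can be given the (assumed) population mean and SD directly, skipping the manual Z-score step; using the rounded sample values from earlier:

pnorm(7, mean = 16.56, sd = 4.71)   # ~0.021, matching the Z-score approach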

Black cat, holding a little knife, eyes big, looking up.

Another example

  • Oreo looked at his owner after 22.8 seconds. Is Oreo unusual - e.g. in the top 5%? 👍 👎

pnorm(22.8, mean = 16.6, sd = 4.7)   # probability of a latency below 22.8 s in the assumed population
[1] 0.9064403

Shaded area: 90.64%

Unshaded area: 9.36%

Oreo waits longer than about 90.6% of cats, putting him in the top ~9.4%, but not in the top 5%.
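
The same answer via a Z-score: Oreo's time is about 1.32 SDs above the assumed population mean.

z_oreo <- (22.8 - 16.6) / 4.7
z_oreo
[1] 1.319149
pnorm(z_oreo)   # ~0.906, matching pnorm(22.8, mean = 16.6, sd = 4.7)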

The other way around …

  • How long would Oreo have to wait before looking at his owner to be in top 5%?

  • We can reverse the math (or use a different R function) and calculate a critical cut-off for a specific probability, assuming some population mean and SD:

qnorm(p = 0.05, mean = 16.6, sd = 4.7, lower.tail = FALSE)
[1] 24.33081
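
As a quick sanity check, feeding the cut-off back into pnorm() recovers the 5%:

cut_off <- qnorm(p = 0.05, mean = 16.6, sd = 4.7, lower.tail = FALSE)
pnorm(cut_off, mean = 16.6, sd = 4.7, lower.tail = FALSE)   # 0.05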

Assuming that…

We calculated these probabilities assuming a certain shape of the population distribution - we don’t know whether this assumption is reasonable.

Summary

  • Samples resemble populations - larger samples resemble them better

  • This allows us to make assumptions about what the population might look like and to calculate probabilities.

  • We assume some version of reality (the population) and then check whether what we observed is sufficiently unusual/interesting in that version of reality.

  • The normal distribution is one of many “mathematically defined” distributions - we’ll meet more in the weeks to come.

NEXT WEEK:

  • Quantifying uncertainty

References

Forman, Jemma, and David Leavens. 2024. “The Effect of Transparency on Unsolvable Task Engagement in Domestic Cats (Felis Catus) Using Citizen Science.” http://dx.doi.org/10.21203/rs.3.rs-3834933/v1.