Fundamentals of Statistical Testing

Recap, Z-scores, and Unusual Cats.

Dr. Martina Sladekova

A reminder image so that I don't forget to record the lecture on Zoom. Again.

Housekeeping

Register your Kahoot username:

https://canvas.sussex.ac.uk/courses/35783/quizzes

The R-Helpdesk is up and running:

https://canvas.sussex.ac.uk/courses/35783/pages/module-contacts

Where are we going?

Research questions and hypotheses

  • We want to answer a research question or be able to make a decision about a hypothesis

Research question:

Is CBT (Cognitive Behavioural Therapy) effective for treating social anxiety?


Hypothesis:

Participants who receive the CBT intervention will show lower social anxiety levels than participants who don’t receive an intervention.

Research questions and hypotheses

Some other examples [made up data]:

Hypothesis: The more we procrastinate, the more stressed we feel.

Research questions and hypotheses

Some other examples [made up data]:

Research question: Is there a relationship between caffeine consumption and productivity?

Research questions and hypotheses

  • Often the data will not show a clear cut difference
  • “p-value”:
    • a hypothesis testing tool
    • a value that we calculate to “formally” decide whether our hypothesis is supported
  • The next three weeks - building blocks of hypothesis testing

Analysing Data Roadmap

Roadmap on the module. Top row contains boxes "Introduction and distributions", "Standard error and confidence intervals" and "null hypothesis significance testing". Middle row is "t-test", "correlation" and "chi-square". Bottom row is "equation of a straight line", "linear model with one predictor", "linear model with multiple predictors"

Where did we come from?

Quantitative research

  • In quantitative research, we often (but not always):

    • Start with a theory,

    • Devise an experiment (or a cross-sectional study) to test that theory

    • Collect the data

    • Describe our sample <- last term

    • Test hypotheses <- this term

A study

  • Forman and Leavens (2024) - The Effect of Transparency on Unsolvable Task Engagement in Domestic Cats (Felis catus) using Citizen Science

  • A study of social behaviours - e.g. looking at the owner while completing an unsolvable puzzle

  • Sample of 21 cats (each cat completed multiple trials)

Fluffy white and grey cat holding a fetch toy. The cat's irises are giant and playful. She's a good cat.

What can we say about this sample?

  • On average, how long do cats spend on a task before looking at their owner?

  • What is the shortest and longest time?

  • What is the variance of scores in our sample? How do scores in our sample differ from each other?

  • Are there any “unusual” cats in our sample?

The mean, the median and the mode

Measures of central tendency:

  • Mean: the average value \(\frac{\sum{x_i}}{n}\)

  • Median: the value exactly in the middle

  • Mode: the most common value

The mean, the median and the mode

Measures of central tendency:

  • Mean: the average value \(\frac{\sum{x_i}}{n}\) = 16.56

  • Median: the value exactly in the middle = 16.1

  • Mode: the most common value (around 15)
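
As a rough sketch, the three measures can be computed in R as follows. The `latency` vector is made up for illustration (it is not the study data), and base R has no built-in function for the statistical mode, so the most frequent value is found by tabulating.

```r
# Hypothetical looking latencies in seconds (NOT the study data)
latency <- c(12.4, 15.0, 15.0, 16.1, 18.3, 21.7, 25.3)

mean_latency   <- mean(latency)    # sum of scores divided by n
median_latency <- median(latency)  # middle value of the sorted scores

# Mode: tabulate the values and pick the most frequent one
counts <- table(latency)
mode_latency <- as.numeric(names(counts)[which.max(counts)])

mean_latency
median_latency
mode_latency
```

With continuous measurements, exact repeats are rare, which is why the mode is often read off a histogram instead ("around 15").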

Variance

  • Variance - how scores differ from each other \(\frac{\sum{(X_i - \bar{X})^2}}{n-1}\), where \(\bar{X}\) is the sample mean. Alternatively:
var(cat_sample$look_latency)
[1] 22.17848

Variance

  • Standard deviation - on average, by how much do scores differ from the mean? \(\sqrt{\frac{\sum{(X_i - \bar{X})^2}}{n-1}}\), where \(\bar{X}\) is the sample mean. Alternatively:
sd_sample <- sd(cat_sample$look_latency)
sd_sample
[1] 4.709403
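
To connect the formulas to the functions, here is a minimal sketch that computes the variance and standard deviation by hand and compares them with `var()` and `sd()`. The `latency` vector is hypothetical and only stands in for `cat_sample$look_latency`.

```r
# Hypothetical looking latencies in seconds (NOT the study data)
latency <- c(12.4, 15.0, 15.0, 16.1, 18.3, 21.7, 25.3)

n <- length(latency)
deviations <- latency - mean(latency)   # X_i minus the sample mean

var_by_hand <- sum(deviations^2) / (n - 1)
sd_by_hand  <- sqrt(var_by_hand)

# Both comparisons should return TRUE
all.equal(var_by_hand, var(latency))
all.equal(sd_by_hand, sd(latency))
```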

Unusual cases?

  • Given our sample, how unusual is a cat who took longer than 25 seconds to look at their owner?
nrow(cat_sample)
[1] 21

There are 21 cats in our sample

dplyr::filter(
  cat_sample, 
  look_latency > 25
)
cat_name look_latency
Bubbles 25.3
Commodore 25.9

2 cats out of 21 represents a proportion of 2/21 = 0.095.

The empirical probability that a cat takes more than 25s to look at owner is 0.095, or 9.5 percent.
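
The same proportion can be computed in one step, because a logical comparison in R is treated as 0/1 when summed or averaged. A small sketch with a made-up `latency` vector (not the study data):

```r
# Hypothetical looking latencies in seconds (NOT the study data)
latency <- c(12.4, 15.0, 15.0, 16.1, 18.3, 21.7, 25.3)

n_over    <- sum(latency > 25)   # how many scores exceed 25 seconds
prop_over <- mean(latency > 25)  # the same count divided by n

n_over
prop_over
```

For the lecture sample, the equivalent call would be `mean(cat_sample$look_latency > 25)`, which corresponds to 2/21.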

Populations vs samples

Sample =/= Population

Population distribution

Describes the frequency with which scores of a variable occur in the population.

Sample distribution

Describes the frequency with which scores of a variable occur in the sample.

Two cornerstones of statistical research:

  1. A distribution of a sample from a given population will resemble the shape of that population.
  2. A lot of variables have population distributions with a predictable shape.

Populations vs samples

  1. A distribution of a sample from a given population will resemble the shape of that population's distribution.
  • The larger our sample, the more closely it will resemble the population distribution.
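
A quick simulation can illustrate this. The sketch below assumes a normally distributed population with a mean of 16.6 and an SD of 4.7 (values chosen to echo the cat sample, purely as an assumption) and draws one small and one large sample from it.

```r
set.seed(1)  # make the random draws reproducible

# Assumed population parameters (illustrative only)
pop_mean <- 16.6
pop_sd   <- 4.7

small_sample <- rnorm(20,    mean = pop_mean, sd = pop_sd)
large_sample <- rnorm(10000, mean = pop_mean, sd = pop_sd)

# Compare each sample's mean and SD with the population values
c(mean = mean(small_sample), sd = sd(small_sample))
c(mean = mean(large_sample), sd = sd(large_sample))

# Histograms show the large sample tracing the bell shape more closely
# hist(small_sample); hist(large_sample)
```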

Known distributions

  1. A lot of variables have population distributions with a predictable shape.

Bottom row shows histograms of 5 distributions - normal, chi-square, t, beta, and uniform. The top row shows their equivalent density plots.

One shape to rule them all

  • The normal distribution is:

    • Symmetrical (skewness of 0)

    • Bell-shaped

    • Unimodal (only has one mode)

    • Defined by mean and standard deviation

  • Mean, median, and mode converge on one value

Sneaky distributions

  • There are infinite possible combinations of means and standard deviations

  • Therefore there are infinite possible normal distributions

  • But not every bell-shaped distribution is a normal distribution

  • Proportions matter

How to tell whether something is actually normally distributed

  • We know that a normal distribution has:

    • More scores in the middle

    • Fewer and fewer scores in the tails, the further away we get from the mean - the centre of the distribution

Proportions matter

  • We expect certain proportions of scores at certain distances away from the mean:

    • ~68% of scores will be within 1 standard deviation of the mean

    • ~95% of scores will be within 1.96 standard deviations of the mean

    • ~99% of scores will be within 2.58 standard deviations of the mean
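
These proportions can be checked against the standard normal distribution with `pnorm()`. In the sketch below, `within_sd()` is a helper name introduced here for illustration.

```r
# Proportion of a normal distribution lying within k SDs of the mean:
# area below +k minus area below -k on the standard normal curve
within_sd <- function(k) pnorm(k) - pnorm(-k)

p_1    <- within_sd(1)     # roughly 0.68
p_1.96 <- within_sd(1.96)  # roughly 0.95
p_2.58 <- within_sd(2.58)  # roughly 0.99

round(c(p_1, p_1.96, p_2.58), 3)
```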

How to tell whether something is actually normally distributed

~68% of scores will be within 1 standard deviation of the mean:

This means that the shaded area contains approximately 68% of the scores.

How to tell whether something is actually normally distributed

~95% of scores will be within 1.96 standard deviations of the mean:

This means that the shaded area contains approximately 95% of the scores.

How to tell whether something is actually normally distributed

~99% of scores will be within 2.58 standard deviations of the mean:

This means that the shaded area contains approximately 99% of the scores. The remaining <1% will be in the unshaded tails.

Z-scores

Z-scores

A way of assessing how unusual/uncommon a score is with reference to the mean

  • Mean of our sample: 16.56

  • SD sample: 4.71

\[ Z = \frac{X-M}{SD} \]

We are converting scores into standard deviation units.

Z-scores - what’s “unusual”?


  • Dracula is a cat from our sample

  • Dracula only spent 7 seconds on a task before turning to his owner

  • Is Dracula unusual?

Z-scores - what’s “unusual”?

Calculate Dracula’s Z-score:

\[ Z = \frac{X-M}{SD} \]

\[ Z = \frac{7-16.56}{4.71} \]

\[ Z = -2.03 \]

Dracula’s looking latency is 2.03 standard deviations smaller than the mean.
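
The same calculation as a short R sketch, using the sample mean and SD quoted above. `z_score()` is a helper name introduced here for illustration.

```r
# Convert a raw score into standard deviation units
z_score <- function(x, m, s) (x - m) / s

z_dracula <- z_score(x = 7, m = 16.56, s = 4.71)
round(z_dracula, 2)
```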

Z-scores - what’s “unusual”?

We can convert the whole sample into Z-scores:

Standardisation - the shape of the distribution remains the same but mean and SD change.

  • Mean = 0

  • SD = 1
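
A minimal sketch of standardising a whole vector, using made-up latencies rather than the study data. Base R's `scale()` does the same job but returns a one-column matrix, so it is converted back to a plain vector here.

```r
# Hypothetical looking latencies in seconds (NOT the study data)
latency <- c(12.4, 15.0, 15.0, 16.1, 18.3, 21.7, 25.3)

# Standardise by hand: subtract the mean, divide by the SD
z <- (latency - mean(latency)) / sd(latency)

mean(z)  # 0, up to floating-point rounding
sd(z)    # 1

# Equivalent result with scale()
z_scaled <- as.numeric(scale(latency))
all.equal(z, z_scaled)
```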

Z-scores - what’s “unusual”?

We can convert the whole sample into Z-scores:

Standardisation allows us to work with probabilities:

  • ~68% of scores will be within Z-scores of -1 to +1

  • ~95% of scores will be within Z-scores -1.96 to + 1.96

  • ~99% of scores will be within Z-scores -2.58 to + 2.58

Z-scores - what’s “unusual”?

We can convert the whole sample into Z-scores:

Standardisation allows comparisons:

  • Across measurement scales

  • Accounting for the characteristics of a normal distribution

  • With reference to the population

Critical Values

Unusual in a population…

  • Problem: we don’t know what the population looks like - i.e. what is the mean and the SD?
  • We can assume a standard normal population distribution
    • Simplest normal distribution - makes minimal assumptions:
    • \(\mu\) = 0 (population mean)
    • \(\sigma\) = 1 (population SD)
  • A distribution of Z-scores from a sample can be (1) compared against a standard normal population distribution and (2) used to calculate the probability of a score, based on the proportions expected at given standard deviations.

This allows us to quantify whether something is “unusual” or “surprising” with reference to the population.

Is Dracula a strange cat?

  1. Place Dracula’s score on a standard normal distribution
  2. Work out the probability of the shaded area
  3. Compare against some defined “unusualness” cut-off

Is Dracula a strange cat?

  1. Place Dracula’s score on a standard normal distribution

Is Dracula a strange cat?

  2. Work out the probability of the shaded area

  • Shaded area: proportion of the cat population that spends more time on a task than Dracula

Is Dracula a strange cat? Working out probabilities

  2. Work out the probability of the shaded area

Option 1: Use a “Z-table” - ancient tablets from 1800 BC. Pre-calculated and printed in special books or at the end of textbooks.

Option 1: Z-tables

A table of probabilities associated with Z-scores
Z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.50000 0.50399 0.50798 0.51197 0.51595 0.51994 0.52392 0.52790 0.53188 0.53586
0.1 0.53983 0.54380 0.54776 0.55172 0.55567 0.55962 0.56356 0.56749 0.57142 0.57535
0.2 0.57926 0.58317 0.58706 0.59095 0.59483 0.59871 0.60257 0.60642 0.61026 0.61409
0.3 0.61791 0.62172 0.62552 0.62930 0.63307 0.63683 0.64058 0.64431 0.64803 0.65173
0.4 0.65542 0.65910 0.66276 0.66640 0.67003 0.67364 0.67724 0.68082 0.68439 0.68793
0.5 0.69146 0.69497 0.69847 0.70194 0.70540 0.70884 0.71226 0.71566 0.71904 0.72240
0.6 0.72575 0.72907 0.73237 0.73565 0.73891 0.74215 0.74537 0.74857 0.75175 0.75490
0.7 0.75804 0.76115 0.76424 0.76730 0.77035 0.77337 0.77637 0.77935 0.78230 0.78524
0.8 0.78814 0.79103 0.79389 0.79673 0.79955 0.80234 0.80511 0.80785 0.81057 0.81327
0.9 0.81594 0.81859 0.82121 0.82381 0.82639 0.82894 0.83147 0.83398 0.83646 0.83891
1.0 0.84134 0.84375 0.84614 0.84849 0.85083 0.85314 0.85543 0.85769 0.85993 0.86214
1.1 0.86433 0.86650 0.86864 0.87076 0.87286 0.87493 0.87698 0.87900 0.88100 0.88298
1.2 0.88493 0.88686 0.88877 0.89065 0.89251 0.89435 0.89617 0.89796 0.89973 0.90147
1.3 0.90320 0.90490 0.90658 0.90824 0.90988 0.91149 0.91309 0.91466 0.91621 0.91774
1.4 0.91924 0.92073 0.92220 0.92364 0.92507 0.92647 0.92785 0.92922 0.93056 0.93189
1.5 0.93319 0.93448 0.93574 0.93699 0.93822 0.93943 0.94062 0.94179 0.94295 0.94408
1.6 0.94520 0.94630 0.94738 0.94845 0.94950 0.95053 0.95154 0.95254 0.95352 0.95449
1.7 0.95543 0.95637 0.95728 0.95818 0.95907 0.95994 0.96080 0.96164 0.96246 0.96327
1.8 0.96407 0.96485 0.96562 0.96638 0.96712 0.96784 0.96856 0.96926 0.96995 0.97062
1.9 0.97128 0.97193 0.97257 0.97320 0.97381 0.97441 0.97500 0.97558 0.97615 0.97670
2.0 0.97725 0.97778 0.97831 0.97882 0.97932 0.97982 0.98030 0.98077 0.98124 0.98169
2.1 0.98214 0.98257 0.98300 0.98341 0.98382 0.98422 0.98461 0.98500 0.98537 0.98574
2.2 0.98610 0.98645 0.98679 0.98713 0.98745 0.98778 0.98809 0.98840 0.98870 0.98899
2.3 0.98928 0.98956 0.98983 0.99010 0.99036 0.99061 0.99086 0.99111 0.99134 0.99158
2.4 0.99180 0.99202 0.99224 0.99245 0.99266 0.99286 0.99305 0.99324 0.99343 0.99361
2.5 0.99379 0.99396 0.99413 0.99430 0.99446 0.99461 0.99477 0.99492 0.99506 0.99520
2.6 0.99534 0.99547 0.99560 0.99573 0.99585 0.99598 0.99609 0.99621 0.99632 0.99643
2.7 0.99653 0.99664 0.99674 0.99683 0.99693 0.99702 0.99711 0.99720 0.99728 0.99736
2.8 0.99744 0.99752 0.99760 0.99767 0.99774 0.99781 0.99788 0.99795 0.99801 0.99807
2.9 0.99813 0.99819 0.99825 0.99831 0.99836 0.99841 0.99846 0.99851 0.99856 0.99861
3.0 0.99865 0.99869 0.99874 0.99878 0.99882 0.99886 0.99889 0.99893 0.99896 0.99900
3.1 0.99903 0.99906 0.99910 0.99913 0.99916 0.99918 0.99921 0.99924 0.99926 0.99929
3.2 0.99931 0.99934 0.99936 0.99938 0.99940 0.99942 0.99944 0.99946 0.99948 0.99950
3.3 0.99952 0.99953 0.99955 0.99957 0.99958 0.99960 0.99961 0.99962 0.99964 0.99965
3.4 0.99966 0.99968 0.99969 0.99970 0.99971 0.99972 0.99973 0.99974 0.99975 0.99976
3.5 0.99977 0.99978 0.99978 0.99979 0.99980 0.99981 0.99981 0.99982 0.99983 0.99983
3.6 0.99984 0.99985 0.99985 0.99986 0.99986 0.99987 0.99987 0.99988 0.99988 0.99989
3.7 0.99989 0.99990 0.99990 0.99990 0.99991 0.99991 0.99992 0.99992 0.99992 0.99992
3.8 0.99993 0.99993 0.99993 0.99994 0.99994 0.99994 0.99994 0.99995 0.99995 0.99995
3.9 0.99995 0.99995 0.99996 0.99996 0.99996 0.99996 0.99996 0.99996 0.99997 0.99997

Option 1: Z-tables

\[ Z = \frac{X - Mean}{SD} = \frac{7 - 16.56}{4.71} = -2.03 \]

The table lists positive Z-scores only. Because the normal distribution is symmetrical, the area below Z = 2.03 equals the area above Z = -2.03.

Z-table lookup (row 2.0, column 0.03): 0.97882

Is Dracula a strange cat?

  2. Work out the probability of the shaded area

Option 1: Z-tables

  • Probability of a score from the shaded area: 0.97882

  • Probability of a score from the non-shaded area: 1 - 0.97882 = 0.02118 (a little over 2%)


Is Dracula a strange cat? Working out the probability

Option 2: Modern technology

pnorm(-2.03, lower.tail = FALSE)
[1] 0.9788217

  • Shaded area: 0.9788217 or 97.88%

  • Unshaded area: 0.0211783 or 2.12%

Assuming our cat population comes from a normal distribution, there’s only a 0.021 probability (or 2.12%) of finding a cat that looks at their owner as quickly as Dracula, or quicker. He’s quite unusual!
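
The `lower.tail` argument controls which side of the score `pnorm()` returns. As a sketch, the two tails can be computed directly from the unrounded Z-score, so the last decimals may differ slightly from values based on a rounded Z:

```r
# Dracula's Z-score from the sample mean and SD, without rounding
z <- (7 - 16.56) / 4.71

p_above <- pnorm(z, lower.tail = FALSE)  # P(Z > z): cats slower than Dracula
p_below <- pnorm(z)                      # P(Z <= z): cats as quick or quicker

p_above
p_below
p_above + p_below  # the two tails always sum to 1
```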

Black cat, holding a little knife, eyes big, looking up.

Another example

  • Oreo looked at his owner after 22.8 seconds. Is Oreo unusual - e.g. in the top 5% ? 👍 👎

Another example

pnorm(22.8, mean = 16.6, sd = 4.7, lower.tail = FALSE) # we can skip manual Z-score conversion
[1] 0.09355966

Shaded area: 0.094 or 9.36%

Unshaded area: 0.906 or 90.64%
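
Passing `mean` and `sd` to `pnorm()` is equivalent to converting to a Z-score first. A short sketch confirming this for Oreo, using the same assumed population mean (16.6) and SD (4.7) as the call above:

```r
# Upper-tail probability on the raw scale
p_raw <- pnorm(22.8, mean = 16.6, sd = 4.7, lower.tail = FALSE)

# The same probability via a manual Z-score
z_oreo <- (22.8 - 16.6) / 4.7
p_z    <- pnorm(z_oreo, lower.tail = FALSE)

all.equal(p_raw, p_z)  # TRUE

# Oreo is in the top 5% only if fewer than 5% of cats take longer
in_top_5 <- p_raw < 0.05
in_top_5
```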

The other way around …

  • What is the critical value cutting off the top 5%?

  • We can reverse the math (or use a different R function) and calculate a critical cut-off for a specific probability, assuming some population mean and SD:

qnorm(p = 0.05, mean = 16.6, sd = 4.7, lower.tail = FALSE)
[1] 24.33081
  • Decide in advance (e.g. top 5%, top 10%, etc…), then compare scores against this cut-off

3. Compare against some defined “unusualness” cut-off
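
`qnorm()` and `pnorm()` are inverses of each other, which the sketch below checks using the same assumed mean (16.6) and SD (4.7). It also shows the cut-off rebuilt "by hand" from the standard normal critical value.

```r
# Critical value cutting off the top 5%
cutoff <- qnorm(p = 0.05, mean = 16.6, sd = 4.7, lower.tail = FALSE)

# Feeding the cut-off back into pnorm() recovers the 5%
p_check <- pnorm(cutoff, mean = 16.6, sd = 4.7, lower.tail = FALSE)

# Same cut-off from the standard normal critical value:
# mean + Z * SD, where Z leaves 95% of the distribution below it
cutoff_from_z <- 16.6 + qnorm(0.95) * 4.7

cutoff
p_check
all.equal(cutoff, cutoff_from_z)
```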

The other way around …


A researcher is interested in the cats that spent the longest on the puzzle. He calculates the critical cut-off point for the top 10% of looking latency as 22.62 seconds. He then identifies two cats: Goose, who spent 19.45 seconds on the task, and Pringles, who spent 22.90 seconds on the task.

Are Goose and Pringles in the top 10%?

The other way around …

  • Goose: 19.45 seconds

  • Pringles: 22.90 seconds

  • Critical cut-off: 22.62 👍👎
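
The comparison itself is a simple "greater than" check against the pre-decided cut-off. A sketch using the values from the example (the vector and variable names are introduced here for illustration):

```r
# Cut-off for the top 10% and the two cats' latencies, as given above
cutoff  <- 22.62
latency <- c(Goose = 19.45, Pringles = 22.90)

# TRUE means the cat's score lies beyond the cut-off
in_top_10 <- latency > cutoff
in_top_10
```

Only a score above the cut-off counts as being in the top 10%, so Pringles is (just) in and Goose is not.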

Assuming that…

We calculated these probabilities assuming a certain shape of the population distribution - we don’t know whether this assumption is reasonable.

Summary

  • Samples resemble populations - larger samples resemble them better

  • This allows us to make assumptions about what the population might look like and calculate probabilities.

  • We assume some version of the reality (the population) and then we check whether what we observed is sufficiently unusual/interesting in this version of the reality.

  • Comparing against a pre-decided critical cut-off from some population distribution is a core principle in hypothesis testing - more in the weeks to come.

NEXT WEEK:

  • Quantifying uncertainty with standard errors and confidence intervals

References

Forman, Jemma, and David Leavens. 2024. “The Effect of Transparency on Unsolvable Task Engagement in Domestic Cats (Felis Catus) Using Citizen Science.” http://dx.doi.org/10.21203/rs.3.rs-3834933/v1.