Week 07
We did it!!! 🎉
This afternoon I will post the Set Analysis on the TAP Information page under “TAP Materials”
The Set Analysis is the output you must use for your Psychobiology report
You MUST NOT use the output from your own TAP!
Important
The numbers and output on the Set Analysis WILL BE DIFFERENT from what you produced for the TAP.
DO NOT PANIC!
After this lecture you will understand:
The concepts behind tests of goodness-of-fit and association
How to read tables and figures of counts
How to calculate the \(\chi^2\) statistic
How to interpret and report significance tests of \(\chi^2\)
The relationship between association and causation
To start, a dicey 🎲 example to get the key ideas
Refresh concepts of probability, frequencies, and counts
Introduction to the \(\chi^2\) test statistic
I want to know if my four-sided die (d4) is fair
If it is, each number should come up with equal probability
So, if I roll the die 100 times, each number should come up (approximately) 25 times
Dice Roll | Observed Count |
---|---|
1 | 25 |
2 | 29 |
3 | 24 |
4 | 22 |
These numbers are not exactly 25 each
The Fundamental Question
Are these proportions what we would expect if the die were fair? Are they different enough to believe the die is not fair?
How different are the observed counts from the expected counts?
Dice Roll | Obs. Count | Exp. Count |
---|---|---|
1 | 25 | 25 |
2 | 29 | 25 |
3 | 24 | 25 |
4 | 22 | 25 |
\(\chi^2 = \frac{(25-25)^2}{25} + \frac{(29-25)^2}{25} + \frac{(24-25)^2}{25} + \frac{(22-25)^2}{25}\)
\(\chi^2 = \frac{0}{25} + \frac{16}{25} + \frac{1}{25} + \frac{9}{25}\)
\(\chi^2 = 0 + 0.64 + 0.04 + 0.36\)
\(\chi^2 = 1.04\)
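The same arithmetic can be checked in R. This is a minimal sketch using the counts from the table above; the variable names (`observed`, `expected`, `contributions`, `chi_sq`) are illustrative and not part of the lecture materials.

```r
# Observed counts for rolls 1-4 (from the table above)
observed <- c(25, 29, 24, 22)

# Under the null hypothesis of a fair d4, each face is expected 100 / 4 = 25 times
expected <- rep(sum(observed) / length(observed), length(observed))

# Squared difference for each face, scaled by its expected count
contributions <- (observed - expected)^2 / expected
contributions

# The test statistic is the sum of those contributions
chi_sq <- sum(contributions)
chi_sq
```

The final value should match the 1.04 worked out by hand.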
Vocabulary: Observed counts
The number of occurrences in each category observed in the sample.
Vocabulary: Expected counts
The number of occurrences in each category expected under the null hypothesis.
Vocabulary: \(\chi^2\)
The test statistic \(\chi^2\) represents the sum of the squared (and scaled) differences between observed and expected counts.
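Written generally, the statistic takes a standard form. The symbols \(O_i\), \(E_i\), and \(k\) are notation introduced here for illustration, not taken from the slides: \(O_i\) is the observed count and \(E_i\) the expected count in category \(i\), out of \(k\) categories.

\[\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}\]

For the die example, \(k = 4\) and every \(E_i = 25\).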
We’ve calculated a test statistic, \(\chi^2\), that represents the thing we are trying to test
Compare our test statistic to the distribution of that statistic
IMPORTANT: These distributions assume that the null hypothesis is true!
Here, our null hypothesis is that the die IS fair
Unfortunately, test statistics like the one we have are not normally distributed
No problem - we just have to use a different distribution!
Meet the \(\chi^2\) distribution
The distribution of a sum of squared (independent, standard) normal variables
See this excellent Khan Academy explainer for more!
Degrees of freedom determine the distribution’s shape and proportions
At base, they are the number of values that are free to vary
Consider your module quiz scores: let’s say you want an overall quiz mark of 5/7 (or 71.43%)
Week of Term | Quiz Score | Rolling Mean |
---|---|---|
Week 3 | 6 | 6 |
Week 4 | 4 | 5 |
Week 5 | 2.5 | 4.17 |
Week 6 | 6.5 | 4.75 |
Week 8 | 4 | 4.6 |
Week 9 | 7 | 5 |
Week 10 | 3 | 4.71 |
Week 11 | ??? | ??? |
The last value must have a particular value in order to work out to the desired mean
\[\frac{6 + 4 + 2.5 + 6.5 + 4 + 7 + 3 + ???}{8} = 5\]
Here, df = 7
One less than the number of scores
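A short R sketch of why the last score is not free to vary, using the quiz scores from the table above. The variable names are illustrative assumptions.

```r
# Quiz scores so far (Weeks 3-6 and 8-10, from the table above)
scores <- c(6, 4, 2.5, 6.5, 4, 7, 3)

target_mean <- 5  # desired overall mean
n_quizzes   <- 8  # seven known scores plus Week 11

# Once seven scores are known, the eighth is fixed by the target mean
week_11 <- target_mean * n_quizzes - sum(scores)
week_11

# Seven values were free to vary; the last one was not
degrees_of_freedom <- n_quizzes - 1
degrees_of_freedom
```

With these scores, Week 11 would need to be 7 to reach an overall mean of 5.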
Other degrees of freedom have a similar idea, just calculated differently
Important
You do NOT need to know how to calculate degrees of freedom!
You must know how to report it from the output, and have some idea of what it does.
Look at the distribution for 3 degrees of freedom
What percentage of the distribution is greater than or equal to 1.04?
The sum of squared differences between our expected and observed counts ( \(\chi^2\) ) was 1.04
For a \(\chi^2\) distribution with 3 degrees of freedom, this value is extremely common under the null hypothesis!
If our die is fair, data at least this different from the expected counts are extremely likely
To believe that the die was not fair, we would have to observe a test statistic greater than ~7.8 (\(\alpha\) = .05)
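Both numbers can be read off the \(\chi^2\) distribution with base R's `pchisq()` and `qchisq()`. This is a sketch for illustration; the variable names are assumptions, and you are not expected to do this by hand.

```r
chi_sq <- 1.04

# Proportion of the chi-squared distribution (df = 3) at or above our statistic
p_value <- pchisq(chi_sq, df = 3, lower.tail = FALSE)
p_value

# Smallest statistic that would be significant at alpha = .05
critical_value <- qchisq(0.95, df = 3)
critical_value
```

Here `p_value` comes out at roughly 0.79 and `critical_value` at roughly 7.81.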
If only there were an easier way to do this…!
\(\chi^2\) quantifies how different a set of observed frequencies are from expected frequencies
We can follow the usual steps of our analysis:
Obtain data
Calculate test statistic
Compare to distribution
Obtain p-value
Evaluate hypotheses
Vocabulary: \(\chi^2\) Goodness of Fit Test
Tests whether a sample of frequency data came from a population with a specific, known distribution.
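One way to run a goodness-of-fit test in R is to give `chisq.test()` a vector of counts and the probabilities expected under the null hypothesis. This is a minimal sketch with the die counts from earlier; the object name `fair_die_test` is an assumption.

```r
# Observed counts for rolls 1-4 of the four-sided die
observed <- c(25, 29, 24, 22)

# p gives the probabilities expected under the null hypothesis (a fair d4)
fair_die_test <- chisq.test(observed, p = rep(1 / 4, 4))
fair_die_test
```

The printed statistic, degrees of freedom, and p-value should agree with the values worked out step by step above.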
Next, let’s look at a test of association, or independence
Vocabulary: Continuous data
Represent some measurement or score on a scale.
Examples: Reaction time to press a button, mean anxiety score
Answers the question: how much?
Vocabulary: Categorical data
Represent membership in a particular group or condition.
Examples: control vs experimental group, year of uni
Answers the question: which one?
This time we will have two variables, both categorical
Data: counts of how many observations fall into each combination of categories
Spatial orientation of sequences, such as numbers, months, or days of the week
“Calendars” of spatial orientations of months of the year
Brang et al. (2011): Is the orientation of the calendar related to the synaesthete’s handedness?
Orientation: months progress clockwise or counterclockwise in space
Handedness: left or right handed
Each synaesthete has one value for orientation and one value for handedness
Orientation | Handedness |
---|---|
clock | right |
clock | right |
clock | right |
anti | right |
anti | left |
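To go from one row per synaesthete to counts per combination, the rows can be cross-tabulated with `table()`. The sketch below uses only the five example rows shown above, not the full dataset; the data frame name `seq_space` and its column names follow the output later in this lecture, and `counts` is an illustrative name.

```r
# Only the five example rows shown above, not the full dataset
seq_space <- data.frame(
  orientation = c("clock", "clock", "clock", "anti", "anti"),
  handedness  = c("right", "right", "right", "right", "left")
)

# Cross-tabulate: one cell per combination of orientation and handedness
counts <- table(seq_space$orientation, seq_space$handedness)
counts
```

Each cell of `counts` is the number of synaesthetes with that combination of orientation and handedness.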
Our study is investigating the relationship between handedness (right or left) and direction of a synaesthete’s spatial orientation (clockwise or counterclockwise)
What is the alternative hypothesis?
What is the null hypothesis for this study?
What do you think we will find?
Alternative hypothesis
Clockwise and anticlockwise calendar orientations will occur in different proportions in left- and right-handed synaesthetes
Slight rephrase: Calendar orientation is associated with synaesthete handedness
Null hypothesis
Both calendar orientations will occur in equal proportions in left- and right-handed synaesthetes
Slight rephrase: Calendar orientation is not associated with synaesthete handedness
Prediction
From the Brang et al. paper:
Right-handed synaesthetes will tend to have a clockwise calendar
Left-handed synaesthetes will tend to have an anticlockwise calendar
Left-handed synaesthetes have more anti-clockwise than clockwise
Right-handed synaesthetes have the reverse
Do our results indicate that there may be an association between orientation and handedness?
Pearson's Chi-squared test with Yates' continuity correction
data: seq_space$orientation and seq_space$handedness
X-squared = 9.7798, df = 1, p-value = 0.001764
Interpretation
What can you conclude from this result?
“There was a significant association between calendar orientation and handedness ( \(\chi^2\)(1) = 9.78, p = .002).”
Our hypothesis is supported by the data
Furthermore, the association is in the direction we predicted
One of the assumptions of \(\chi^2\) is that all expected frequencies are greater than 5
Otherwise this test can give you a drastically wrong answer 😱
We can get these easily out of R!
Orientation | Left | Right |
---|---|---|
Anti-Clockwise | 3.53 | 8.47 |
Clockwise | 6.47 | 15.53 |
😬😬😬😬😬
In this case, use Fisher’s exact test (fisher.test()) instead
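A sketch of how the expected frequencies and Fisher's exact test might be obtained in R. Important assumption: the observed counts below are not given in this lecture. They were back-calculated to be consistent with the expected frequencies and the X-squared value shown above, so treat them as illustrative rather than as the original data. The object names `tab`, `chi_result`, and `fisher_result` are also illustrative.

```r
# ASSUMED observed counts, back-calculated from the lecture's expected
# frequencies and X-squared value; not taken from the original dataset
tab <- matrix(
  c(8, 2, 4, 20),
  nrow = 2,
  dimnames = list(
    orientation = c("Anti-Clockwise", "Clockwise"),
    handedness  = c("Left", "Right")
  )
)

# chisq.test() warns when expected counts are small; the warning is
# suppressed here only so that the expected counts can be inspected
chi_result <- suppressWarnings(chisq.test(tab))
round(chi_result$expected, 2)

# Fisher's exact test does not rely on the expected-count assumption
fisher_result <- fisher.test(tab)
fisher_result$p.value
```

If any expected count falls below the threshold, report the result from `fisher.test()` rather than from `chisq.test()`.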
We have just had our first glimpse of statistical assumptions
Vocabulary: Statistical Assumption
A precondition that must be true in order for a statistical test to work as expected. If these assumptions are violated (i.e. not true), then the test may give inaccurate or misleading estimates or results.
The \(\chi^2\) test quantifies the difference between observed and expected frequencies
Goodness of Fit
Test of Association/Independence
As with correlation, association is not causation
For quizzes/exam:
You will not be expected to calculate \(\chi^2\) by hand!
You will be expected to interpret the output of chisq.test() for tests of association
More in the tutorial!