Correlation

Week 06

Jennifer Mankin

Announcements/Reminders

Studies ongoing

Hybrid teaching and disability support: rebrand.ly/hybrid_ds
ChatGPT and AI at University: rebrand.ly/gpt_uni

Nominate faculty for a University Education Award!

Nominate staff who have inspired you, or made a positive difference to your experience
Nominated staff see all nominations - makes a huge difference!
Deadline: Friday 8 March (tomorrow)

Nominate classmates for a SavioR Award!

Thank someone who has helped you with R

The Take-Away Paper

Preparing for the TAP
- Read the Take-Away Paper Information page carefully!
- Attempt the sample take-away paper (on the Cloud)
- Tonight’s Skills Lab will talk through some portions of the paper, do Q&A

Practicals Next Week

Practice-only Quiz
- We will still run practicals and quizzes as normal (including taking attendance)
- Next week’s quiz mark will not count toward your module mark.
Short (Fun and Completely Optional) Study
- Possible difference in the code revision portion of the session
- Invitation to complete a short survey after the quiz
- Help us understand what helps you learn!

Looking Ahead

This week: Correlation

Week 7: Chi-Square (\(\chi^2\))

Week 8: The Linear Model

Week 9: The Linear Model

Week 10: The Linear Model…
- (You get the idea)

Today’s Objectives

After this lecture you will understand:

The concepts behind statistical correlation
How to interpret the values of the correlation coefficient r
How to read a correlation matrix
How to interpret and report significance tests of r
The relationship between correlation and causation 👀

Distributions, Test Statistics, and NHST

Putting our statistical “grammar” into practice

For each statistical analysis, we will have the same elements:

Data, from which we calculate…

A test statistic that represents the relationship of interest, which we compare to…

The distribution of that test statistic under the null hypothesis to get…

The probability p of getting a test statistic as large as the one we have (or larger) if the null hypothesis is true so that we can…

Evaluate our competing hypotheses using a previously decided \(\alpha\) level.

Overall Reminder

We want to believe true things about the world, and disbelieve false things
- More accurately: we should believe things that are well-founded in reliable evidence, and disbelieve things that are not
Statistics is a system to help us make decisions about whether, and to what degree, we believe something is supported by evidence

The Core of Correlation

Quantifies how two quantities change in relation to each other
When one variable changes, does the other…
- Change in a similar way?
- Change in the opposite way?
- Not change very much at all?

The Fundamental Question

To what degree do two variables behave the same way - do they covary?

Putting the Co in Covariance

Variance should be familiar already!
- Check out PAAS Lecture 7 for a refresher
Covariance (“co” = with) is a similar idea - and calculated in a similar way

Vocabulary: Variance

How much scores deviate from the mean, on average

Calculate how far each data point is from the mean of x, multiply the deviations by themselves (i.e. square them), add them all together, divide by N - 1

\[\text{variance} = s^2 = \frac {\sum\nolimits_{i = 1}^N {(x_{i} - \bar{x})(x_{i} - \bar{x})}}{N - 1}\]
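To make the recipe concrete, here is a minimal sketch in R that follows it step by step. The ratings in `fem` are made up for illustration (they are not the questionnaire data), and `variance_by_hand` is just a name chosen for this example.

```r
# Hypothetical femininity ratings for eight people (made up for illustration)
fem <- c(8, 7, 9, 2, 3, 6, 1, 8)

# How far is each score from the mean?
deviations <- fem - mean(fem)

# Multiply the deviations by themselves, add them up, divide by N - 1
variance_by_hand <- sum(deviations * deviations) / (length(fem) - 1)

variance_by_hand
var(fem)  # R's built-in function should give the same value
```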

Vocabulary: Covariance

How much pairs of scores deviate from their (respective) means in the same way, on average

Calculate how far each data point is from the mean for both x and y, multiply each pair of deviations together, add all those products together, divide by N - 1

\[\text{covariance}_{xy} = \frac {\sum\nolimits_{i = 1}^N {(x_{i} - \bar{x})(y_{i} - \bar{y})}}{N - 1}\]
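The same steps can be sketched in R for covariance. Again, `fem` and `masc` are made-up ratings for eight hypothetical people, not the real questionnaire data.

```r
# Hypothetical paired ratings (made up for illustration)
fem  <- c(8, 7, 9, 2, 3, 6, 1, 8)
masc <- c(2, 3, 1, 8, 7, 5, 9, 3)

# Deviations from each variable's own mean
dev_fem  <- fem - mean(fem)
dev_masc <- masc - mean(masc)

# Multiply each pair of deviations, add the products, divide by N - 1
covariance_by_hand <- sum(dev_fem * dev_masc) / (length(fem) - 1)

covariance_by_hand
cov(fem, masc)  # R's built-in function should give the same value
```

The only change from the variance calculation is that each deviation is multiplied by its partner's deviation instead of by itself.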

Let’s look at an example to make this more concrete!

Gender and Sexuality Questionnaire

Psychology very frequently collects gender as a variable
- Typically categorical, e.g. “woman”, “man”, “non-binary/third gender”
Is this a useful way to categorise people?
- Do these discrete categories capture gender as people perceive it?

Gender and Sexuality Questionnaire about gender and attraction
- Completed by two previous cohorts of Psychology first years

Research Question

Are femininity and masculinity actually dichotomous? What is the nature of the relationship between them?

Visualisation: Gender Ratings

Plot of ratings of femininity vs masculinity

What’s r Got To Do With It?

People who rated their femininity high tended to rate their masculinity low, and vice versa
We might like to know:
- What is the nature of this relationship?
- How strong is it? What direction does it go?
- Should we believe that it’s real (i.e. representative of people/first-year psychology students in general)?

Covariance to Correlation

We talked a moment ago about covariance
- How much pairs of scores deviate from their (respective) means in the same way, on average

\[\text{covariance}_{xy} = \frac {\sum\nolimits_{i = 1}^N {(x_{i} - \bar{x})(y_{i} - \bar{y})}}{N - 1}\]

This is the signal - the effect or relationship of interest (sound familiar…??)

What’s r Got to Do With It?

Same problem as last week with the difference in the means
- Is the covariance big? Small? In comparison to what?

Let’s standardise by dividing by an estimate of the noise
- Here: the product of the two variables’ standard deviations

\[r = \frac{\text{covariance}_{xy}}{s_{x}s_{y}}\]

What do you know! It’s a signal-to-noise ratio: Pearson’s correlation coefficient r
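A quick sketch of that ratio in R, using made-up ratings rather than the real questionnaire data. The name `r_by_hand` is chosen for this example only.

```r
# Hypothetical paired ratings (made up for illustration)
fem  <- c(8, 7, 9, 2, 3, 6, 1, 8)
masc <- c(2, 3, 1, 8, 7, 5, 9, 3)

# Signal: the covariance. Noise: the product of the two standard deviations.
r_by_hand <- cov(fem, masc) / (sd(fem) * sd(masc))

r_by_hand
cor(fem, masc)  # Pearson's r from R's built-in function should match
```

Dividing by the standard deviations removes the original units, which is why r can be compared across different pairs of variables.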

Understanding r

Typically used with two (or more) continuous variables
- Can be used when one is categorical!
r quantifies the strength and direction of the relationship
- ALWAYS has a value between -1 and 1

Strength

Absolute value of r between 0 and 1
- 0: no relationship at all
- 1: perfect relationship

Direction

The sign of r (positive or negative)
- Positive: as one variable increases, the other tends to increase
- Negative: as one variable increases, the other tends to decrease

Let’s Try It!

r	95% CI
-0.76	[-0.8, -0.7]

So, our correlation coefficient \(r = -.76\)
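One way to get r together with a 95% confidence interval in base R is `cor.test()`. The slides do not say which function produced the table above, so treat this as a sketch under that assumption. The ratings are made up, so the numbers will not match the slide.

```r
# Hypothetical paired ratings (made up for illustration)
fem  <- c(8, 7, 9, 2, 3, 6, 1, 8)
masc <- c(2, 3, 1, 8, 7, 5, 9, 3)

# Pearson correlation test (the default method) with a 95% CI
result <- cor.test(fem, masc, conf.level = 0.95)

result$estimate  # the correlation coefficient r
result$conf.int  # lower and upper bounds of the 95% CI
```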

Pop Quiz

How can we interpret this value of r?

Correlation: Interpretation

The negative sign (-) means as femininity increases, masculinity tends to decrease (and vice versa)
The absolute value of .76 indicates a very strong relationship!

Correlation: Significance

We now have our data, from which we calculated…
Our test statistic r (-.76)
We also know the distribution of r with different degrees of freedom
- Or, rather…of t, for Reasons (TM)

How likely is an r of -.76 (or one even further from 0) if in fact femininity and masculinity have a true r of 0?
- i.e. the null hypothesis is in fact true
- We will again use \(\alpha\) = .05 in this case

Correlation: Significance

Parameter1	Parameter2	r	95% CI	t(304)	p
gender_fem	gender_masc	-0.76	[-0.8, -0.7]	-20.1	< .001
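For the curious, r can be converted to t using the standard formula t = r × √(N − 2) / √(1 − r²), and p then comes from the t distribution. The sketch below plugs in the rounded r and the degrees of freedom from the table. It lands close to, but not exactly on, the reported t, presumably because the table was computed from the unrounded r.

```r
r  <- -0.76  # rounded value from the table above
df <- 304    # degrees of freedom (N - 2)

# Convert r to t
t_value <- r * sqrt(df) / sqrt(1 - r^2)

# Two-tailed p: probability of a t this far from 0 (in either direction)
# if the true correlation is 0
p_value <- 2 * pt(-abs(t_value), df = df)

t_value
p_value < .05  # compare p to our alpha level
```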

Reporting Correlation

There was a significant negative correlation between femininity and masculinity, r(304) = -.76, p < .001.

Leading 0s

We reported both r and p without leading 0s (e.g. as -.76 and not -0.76). The rule is this:

Statistics that can have a value greater than 1 get a leading 0 when they are less than 1 (e.g. t, F)
Statistics that cannot have a value greater than 1 do not (e.g. r, p)
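If you want R to do this formatting for you, here is one possible sketch. `drop_leading_zero()` is a hypothetical helper written for this example, not a function from any package.

```r
# Hypothetical helper: format a number to a fixed number of decimal places,
# then remove the 0 before the decimal point (keeping any minus sign)
drop_leading_zero <- function(x, digits = 2) {
  formatted <- formatC(x, format = "f", digits = digits)
  sub("^(-?)0\\.", "\\1.", formatted)
}

drop_leading_zero(-0.76)              # for r
drop_leading_zero(0.034, digits = 3)  # for p
```

Only use it for statistics that cannot be greater than 1, such as r and p. Statistics like t and F keep their leading 0.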

Correlation Matrices

Correlations are often presented in matrices
Each cell contains the correlation coefficient r for the variables in that row and column

                   gender_comfortable gender_masc gender_fem gender_stability
gender_comfortable               1.00       -0.31       0.17             0.61
gender_masc                     -0.31        1.00      -0.76            -0.28
gender_fem                       0.17       -0.76       1.00             0.18
gender_stability                 0.61       -0.28       0.18             1.00
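A matrix like this can be produced by giving `cor()` a whole data frame of numeric variables. The sketch below uses simulated data: the column names are borrowed from the slide, but the values are randomly generated, so the correlations will not match the ones above.

```r
set.seed(6)
n <- 100

# Simulated ratings (NOT the real questionnaire data)
gender_fem       <- rnorm(n)
gender_masc      <- -0.7 * gender_fem + rnorm(n, sd = 0.7)
gender_stability <- rnorm(n)

ratings <- data.frame(gender_fem, gender_masc, gender_stability)

# One r for every pair of columns
cor_matrix <- cor(ratings)
round(cor_matrix, 2)
```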

Pop Quiz

Why is there a diagonal line of 1s?

Correlation Matrices

More useful version with GGally::ggscatmat()
Scatterplots, distributions, and r values
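A minimal sketch of calling `GGally::ggscatmat()`, assuming the GGally package is installed. The data are simulated stand-ins with column names borrowed from the slides, so the plot will not match the one shown in the lecture.

```r
set.seed(6)
n <- 100

# Simulated ratings (NOT the real questionnaire data)
gender_fem       <- rnorm(n)
gender_masc      <- -0.7 * gender_fem + rnorm(n, sd = 0.7)
gender_stability <- rnorm(n)

ratings <- data.frame(gender_fem, gender_masc, gender_stability)

# Only draw the plot if GGally is available on this machine
if (requireNamespace("GGally", quietly = TRUE)) {
  scatter_matrix <- GGally::ggscatmat(ratings, columns = 1:ncol(ratings))
  print(scatter_matrix)
}
```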

Correlation = Causation?

Our analysis showed that higher ratings of femininity tended to correspond to lower ratings of masculinity, and vice versa
Can we conclude from this that being more feminine causes you to be less masculine?

❌🙅💥🚨 ABSOLUTELY NOT!!! 🚨💥🙅❌

Correlation ≠ Causation!

Why not? :(

No distinction between cause and effect
- Which is the chicken and which is the egg?
- Which came first: femininity or masculinity?

No experimental manipulation (randomisation)

The problem of tertium quid

Consider This…

A new study shows a rise in depression and stress among young people parallels the growth in smartphone and social media use. https://t.co/AxyseUyBxn

— NPR (@NPR) March 14, 2019

How many quid?

“Parallels”, “linked to”, etc. are common-language synonyms for “correlated with”
This study says that they looked at changes in mental health between 2005 and 2017

What third thing might have influenced negative mental health outcomes during this time?

Vocabulary: Tertium quid

An unmeasured third variable that influences two other measured quantities
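A small simulation can show how this plays out. In this sketch, all names and numbers are invented: `x` and `y` never influence each other, but both are partly driven by the same `third_thing`.

```r
set.seed(2019)
n <- 1000

third_thing <- rnorm(n)  # the tertium quid, unmeasured in a real study

# The part of each variable that has nothing to do with the third thing
x_own <- rnorm(n)
y_own <- rnorm(n)

# What we would actually measure: shared influence plus each variable's own part
x <- third_thing + x_own
y <- third_thing + y_own

cor(x, y)          # clearly positive, even though x and y do not affect each other
cor(x_own, y_own)  # close to 0 once the shared influence is left out
```

If we only had `x` and `y`, the correlation alone could not tell us that a third variable was behind it.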

How many quid?

As it turns out…

The original study didn’t measure or have access to data on social media and smartphone use
- They did measure changes in mental health outcomes in different groups
- They then suggested this could be due to social media

Nurture a healthy skepticism of claims that two things are “linked”
- What evidence do they have? Or NOT have?
- What other explanations have not been considered or accounted for?

Correlation: VOCAB ALERT!

In everyday language, “correlated” means “related to in some way, usually causally”
- In statistics, it has a very specific, technical definition

Vocabulary: Correlation

The (standardised) degree to which two variables covary. Calculated as covariance divided by the product of the standard deviations. Takes a value between -1 and 1 that quantifies both the strength (absolute value) and direction (sign) of the relationship.

“Correlation” is a technical term!
- Do not say two things are “correlated” unless you report r as evidence!
- Instead: variables “have a relationship”/“are related to each other”

More Examples

Website that collects examples of spurious correlations
- Can you suggest a “third thing” that might influence both?
- Content warning: examples involve death rates, self-harm rates
More practice with interpreting r with this fun little game

Say It With Me

❌🙅🚨 CORRELATION DOES NOT IMPLY CAUSATION!🚨🙅❌

Correlation: Summary

The correlation coefficient r quantifies the strength and direction of relationships between variables
The p-value associated with r is the probability of encountering a value of r as large as the one we have, or larger, if in fact the true value of r in the population is 0
Correlation DOES NOT IMPLY CAUSATION!!!!!!!

Reminders!

Hybrid teaching and disability support study: rebrand.ly/hybrid_ds
ChatGPT and AI at University study: rebrand.ly/gpt_uni
Nominate someone for a SavioR award
Nominate staff for the Education Awards
Prepare for the TAP!
- Read the TAP Information page
- Sample TAP in Skills Lab TONIGHT
Next week’s (07) practicals will:
- Contain a short and optional study
- Have a quiz that is practice only (i.e. will not contribute to your quiz mark)!
- Be based only on this week’s lecture and tutorial

✨Good luck!!!✨