Correlation

Week 06

Jennifer Mankin

Announcements/Reminders

Studies ongoing

Nominate faculty for a University Education Award!

  • Nominate staff who have inspired you, or made a positive difference to your experience

  • Nominated staff see all nominations - makes a huge difference!

  • Deadline: Friday 8 March (tomorrow)

Nominate classmates for a SavioR Award!

  • Thank someone who has helped you with R

The Take-Away Paper

  • Preparing for the TAP

    • Read the Take-Away Paper Information page carefully!

    • Attempt the sample take-away paper (on the Cloud)

    • Tonight’s Skills Lab will talk through some portions of the paper and include a Q&A

Practicals Next Week

  • Practice-only Quiz
    • We will still run practicals and quizzes as normal (including taking attendance)
    • Next week’s quiz mark will not count toward your module mark.
  • Short (Fun and Completely Optional) Study
    • Possible difference in the code revision portion of the session
    • Invitation to complete a short survey after the quiz
    • Help us understand what helps you learn!

Looking Ahead

  • This week: Correlation
  • Week 7: Chi-Square (\(\chi^2\))
  • Week 8: The Linear Model
  • Week 9: The Linear Model
  • Week 10: The Linear Model…
    • (You get the idea)

Today’s Objectives

After this lecture you will understand:

  • The concepts behind statistical correlation

  • How to interpret the values of the correlation coefficient r

  • How to read a correlation matrix

  • How to interpret and report significance tests of r

  • The relationship between correlation and causation 👀

Distributions, Test Statistics, and NHST

Putting our statistical “grammar” into practice

For each statistical analysis, we will have the same elements:

  • Data, from which we calculate…
  • A test statistic that represents the relationship of interest, which we compare to…
  • The distribution of that test statistic under the null hypothesis to get…
  • The probability p of getting a test statistic as large as the one we have (or larger) if the null hypothesis is true so that we can…
  • Evaluate our competing hypotheses using a previously decided \(\alpha\) level.

Overall Reminder

  • We want to believe true things about the world, and disbelieve false things

    • More accurately: we should believe things that are well-founded in reliable evidence, and disbelieve things that are not
  • Statistics is a system to help us make decisions about whether, and to what degree, we believe something is supported by evidence

The Core of Correlation

  • Quantifies how two quantities change in relation to each other

  • When one variable changes, does the other…

    • Change in a similar way?

    • Change in the opposite way?

    • Not change very much at all?

The Fundamental Question

To what degree do two variables behave the same way - do they covary?

Putting the Co in Covariance

  • Variance should be familiar already!

  • Covariance (“co” = with) is a similar idea - and calculated in a similar way

Vocabulary: Variance

How much scores deviate from the mean, on average

Calculate how far each data point is from the mean of x, multiply the deviations by themselves (i.e. square them), add them all together, divide by N - 1

\[\text{variance} = s^2 = \frac {\sum\nolimits_{i = 1}^{N} {(x_{i} - \bar{x})(x_{i} - \bar{x})}}{N - 1}\]
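A minimal R sketch of the recipe above, assuming a small set of made-up scores (the vector x is hypothetical, not data from this lecture). It works through the calculation step by step and then compares the result to base R's var().

```r
# Hypothetical scores, made up purely for illustration
x <- c(2, 4, 4, 5, 7, 8)

# How far is each score from the mean of x?
deviations <- x - mean(x)

# Multiply the deviations by themselves, add them up, divide by N - 1
variance_by_hand <- sum(deviations * deviations) / (length(x) - 1)

# Base R's var() uses the same N - 1 formula, so the two should agree
variance_by_hand
var(x)
```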

Vocabulary: Covariance

How much pairs of scores deviate from their (respective) means in the same way, on average

Calculate how far each data point is from the mean for both x and y, multiply them together, add all those together, divide by N - 1

\[\text{covariance}_{xy} = \frac {\sum\nolimits_{i = 1}^{N} {(x_{i} - \bar{x})(y_{i} - \bar{y})}}{N - 1}\]
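The same idea as a short R sketch, again with made-up paired scores (x and y are hypothetical). The only change from variance is that each x deviation is multiplied by the matching y deviation instead of by itself.

```r
# Hypothetical paired scores, made up purely for illustration
x <- c(2, 4, 4, 5, 7, 8)
y <- c(9, 7, 8, 5, 4, 3)

# Deviations of each score from its own variable's mean
x_deviations <- x - mean(x)
y_deviations <- y - mean(y)

# Multiply the pairs of deviations, add them up, divide by N - 1
covariance_by_hand <- sum(x_deviations * y_deviations) / (length(x) - 1)

# Base R's cov() should give the same answer
covariance_by_hand
cov(x, y)
```

A negative result here means that scores above the mean on x tend to go with scores below the mean on y.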

  • Let’s look at an example to make this more concrete!

Gender and Sexuality Questionnaire

  • Psychology very frequently collects gender as a variable

    • Typically categorical, e.g. “woman”, “man”, “non-binary/third gender”
  • Is this a useful way to categorise people?

    • Do these discrete categories capture gender as people perceive it?
  • Gender and Sexuality Questionnaire about gender and attraction

    • Completed by two previous cohorts of Psychology first years

Research Question

Are femininity and masculinity actually dichotomous? What is the nature of the relationship between them?

Visualisation: Gender Ratings

Plot of ratings of femininity vs masculinity

What’s r Got To Do With It?

  • People who rated their femininity high tended to rate their masculinity low, and vice versa

  • We might like to know:

    • What is the nature of this relationship?

    • How strong is it? What direction does it go?

    • Should we believe that it’s real (i.e. representative of people/first-year psychology students in general)?

Covariance to Correlation

  • We talked a moment ago about covariance

    • How much pairs of scores deviate from their (respective) means in the same way, on average

\[\text{covariance}_{xy} = \frac {\sum\nolimits_{i = 1}^{N} {(x_{i} - \bar{x})(y_{i} - \bar{y})}}{N - 1}\]

  • This is the signal - the effect or relationship of interest (sound familiar…??)

What’s r Got to Do With It?

  • Same problem as last week with the difference in the means

    • Is the covariance big? Small? In comparison to what?
  • Let’s standardise by dividing by an estimate of the noise

    • Here: the product of the two variables’ standard deviations

\[r = \frac{\text{covariance}_{xy}}{s_{x}s_{y}}\]

  • What do you know! It’s a signal-to-noise ratio: Pearson’s correlation coefficient r
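A minimal R sketch of this standardisation, assuming made-up paired scores (x and y are hypothetical). It divides the covariance by the product of the two standard deviations and checks the result against base R's cor().

```r
# Hypothetical paired scores, made up purely for illustration
x <- c(2, 4, 4, 5, 7, 8)
y <- c(9, 7, 8, 5, 4, 3)

# Signal: the covariance. Noise: the product of the standard deviations.
r_by_hand <- cov(x, y) / (sd(x) * sd(y))

# cor() computes Pearson's r by default, so the two should agree
r_by_hand
cor(x, y)
```

Because the covariance is divided by the spread of both variables, the result no longer depends on the units either variable was measured in.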

Understanding r

  • Typically used with two (or more) continuous variables

    • Can be used when one is categorical!
  • r quantifies the strength and direction of the relationship

    • ALWAYS has a value between -1 and 1

Strength

  • Absolute value of r between 0 and 1

    • 0: no relationship at all
    • 1: perfect relationship

Direction

  • The sign of r (positive or negative)

    • Positive: as one variable increases, the other tends to increase
    • Negative: as one variable increases, the other tends to decrease

Let’s Try It!


r      95% CI
-0.76  [-0.8, -0.7]
  • So, our correlation coefficient \(r = -.76\)
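The slides do not show which function produced this output. One base R option is cor.test(), sketched here on simulated ratings (the data, variable names, and seed are all made up, so the numbers will not match the questionnaire results).

```r
# Simulated ratings, NOT the questionnaire data
set.seed(1)  # arbitrary seed so the example is reproducible
femininity  <- rnorm(100, mean = 5, sd = 2)
masculinity <- 10 - femininity + rnorm(100, sd = 1.5)

# Pearson's correlation with its confidence interval and significance test
result <- cor.test(femininity, masculinity)

result$estimate   # the correlation coefficient r
result$conf.int   # the 95% confidence interval around r
result$parameter  # degrees of freedom, N - 2
```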

Pop Quiz

How can we interpret this value of r?

Correlation: Interpretation

  • The negative sign (-) means as femininity increases, masculinity tends to decrease (and vice versa)

  • The absolute value of .76 is very strong!

Correlation: Significance

  • We now have our data, from which we calculated…

  • Our test statistic r (-.76)

  • We also know the distribution of r with different degrees of freedom

    • Or, rather…of t, for Reasons (TM)
  • How likely is an r of -.76 (or larger) if in fact femininity and masculinity have a true r of 0?

    • i.e. the null hypothesis is in fact true

    • We will again use \(\alpha\) = .05 in this case

Correlation: Significance

Parameter1  Parameter2   r      95% CI        t(304)  p
gender_fem  gender_masc  -0.76  [-0.8, -0.7]  -20.1   < .001
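A sketch of where the t in this output comes from, assuming the standard conversion for Pearson's r: t = r × √df / √(1 - r²), with df = N - 2. The values of r and df are taken from the output above; the variable names are only for illustration.

```r
r  <- -0.76  # rounded correlation from the output above
df <- 304    # degrees of freedom from the output above (N - 2)

# Convert r to t
t_value <- r * sqrt(df) / sqrt(1 - r^2)

# Two-tailed p: probability of a t at least this extreme if the true r is 0
p_value <- 2 * pt(-abs(t_value), df)

t_value
p_value
```

With the rounded r this gives a t of roughly -20.4 rather than the reported -20.1. The gap is most likely because the output was computed from the unrounded r.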

Reporting Correlation

There was a significant negative correlation between femininity and masculinity, r(304) = -.76, p < .001.

Leading 0s

We reported both r and p without leading 0s (e.g. as -.76 and not -0.76). The rule is this:

  • Statistics that can have a value greater than 1 get a leading 0 when they are less than 1 (e.g. t, F)
  • Statistics that cannot have a value greater than 1 do not (e.g. r, p)
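If you want R to do this formatting for you, here is one possible helper. The function drop_leading_zero() is hypothetical (it is not part of base R or of any package used in this module); it rounds to two decimal places and removes a 0 that sits directly before the decimal point.

```r
# Hypothetical helper: format a value like r or p without its leading 0
drop_leading_zero <- function(value, digits = 2) {
  # Fixed number of decimal places, e.g. -0.76 becomes "-0.76"
  formatted <- formatC(value, format = "f", digits = digits)
  # Remove the 0 before the decimal point, keeping any minus sign
  sub("^(-?)0\\.", "\\1.", formatted)
}

drop_leading_zero(-0.76)
drop_leading_zero(0.17)
```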

Correlation Matrices

  • Correlations are often presented in matrices

  • Each cell contains the correlation coefficient r for the variables in that row and column

                   gender_comfortable gender_masc gender_fem gender_stability
gender_comfortable               1.00       -0.31       0.17             0.61
gender_masc                     -0.31        1.00      -0.76            -0.28
gender_fem                       0.17       -0.76       1.00             0.18
gender_stability                 0.61       -0.28       0.18             1.00
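A matrix like this can be produced by passing a data frame of numeric variables to base R's cor(). The sketch below uses simulated data with made-up column names (ratings, a, b, and c are hypothetical), so the values are arbitrary.

```r
# Simulated data, made up purely for illustration
set.seed(6)  # arbitrary seed so the example is reproducible
ratings <- data.frame(
  a = rnorm(50),
  b = rnorm(50),
  c = rnorm(50)
)

# cor() on a data frame returns r for every pair of columns
r_matrix <- round(cor(ratings), 2)
r_matrix

# Look up one cell by row and column name
r_matrix["a", "b"]

# The same value appears on the other side of the diagonal
r_matrix["b", "a"]
```

Because the correlation of a with b is the same as the correlation of b with a, the matrix is a mirror image across the diagonal, so you only need to read one half of it.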

Pop Quiz

Why is there a diagonal line of 1s?

Correlation Matrices

  • More useful version with GGally::ggscatmat()

  • Scatterplots, distributions, and r values

Correlation = Causation?

  • Our analysis showed that higher ratings of femininity tended to correspond to lower ratings of masculinity, and vice versa

  • Can we conclude from this that being more feminine causes you to be less masculine?



❌🙅💥🚨 ABSOLUTELY NOT!!! 🚨💥🙅❌

Correlation ≠ Causation!

  • Why not? :(
  • No distinction between cause and effect

    • Which is the chicken and which is the egg?

    • Which came first: femininity or masculinity?

  • No experimental manipulation (randomisation)
  • The problem of tertium quid

Consider This…

How many quid?

  • “Parallels”, “linked to”, etc. are common-language synonyms for “correlated with”
  • This study says that they looked at changes in mental health between 2005 and 2017
  • What third thing might have influenced negative mental health outcomes during this time?

Vocabulary: Tertium quid

An unmeasured third variable that influences two other measured quantities
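A small simulation sketch of this idea, under made-up assumptions: a third variable influences both x and y, but x and y never influence each other. All names and numbers are hypothetical. The last step uses lm(), which belongs to The Linear Model later in the module, only to remove the part of each variable that the third variable explains.

```r
set.seed(2024)  # arbitrary seed so the example is reproducible
n <- 1000

third_variable <- rnorm(n)      # the unmeasured "tertium quid"
x <- third_variable + rnorm(n)  # influenced by the third variable
y <- third_variable + rnorm(n)  # also influenced by it, but never by x

# x and y are clearly correlated even though neither causes the other
r_xy <- cor(x, y)

# Remove what the third variable explains from each, then correlate the rest
x_leftover <- resid(lm(x ~ third_variable))
y_leftover <- resid(lm(y ~ third_variable))
r_leftover <- cor(x_leftover, y_leftover)

r_xy
r_leftover
```

In this setup the correlation between x and y should be clearly positive, while the correlation between the leftovers should sit close to 0: the apparent relationship was carried entirely by the third variable.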

How many quid?

As it turns out…

  • The original study didn’t measure or have access to data on social media and smartphone use

    • They did measure changes in mental health outcomes in different groups
    • They then suggested this could be due to social media
  • Nurture a healthy skepticism of claims that two things are “linked”

    • What evidence do they have? Or NOT have?

    • What other explanations have not been considered or accounted for?

Correlation: VOCAB ALERT!

  • In everyday language, “correlated” means “related to in some way, usually causally”

    • In statistics, it has a very specific, technical definition

Vocabulary: Correlation

The (standardised) degree to which two variables covary. Calculated as the covariance divided by the product of the two variables’ standard deviations. Quantifies both the strength (absolute value) and direction (sign) of the relationship, and always falls between -1 and 1.

  • “Correlation” is a technical term!

    • Do not say two things are “correlated” unless you report r as evidence!

    • Instead: variables “have a relationship”/“are related to each other”

More Examples

Say It With Me







❌🙅🚨 CORRELATION DOES NOT IMPLY CAUSATION!🚨🙅❌

Correlation: Summary

  • The correlation coefficient r quantifies the strength and direction of relationships between variables

  • The p-value associated with r is the probability of encountering a value of r as large as the one we have, or larger, if in fact the true value of r in the population is 0

  • Correlation DOES NOT IMPLY CAUSATION!!!!!!!

Reminders!







✨Good luck!!!✨