Week 08
The story so far…
This week:
Coming up after the break:
“Is it bad if I don’t understand anything from the lectures?”
After this lecture, you will (begin to) understand:
What a statistical model is and why they are useful
The equation for a linear model with one predictor
\(b_0\) (the intercept)
\(b_1\) (the slope)
How to use the equation to predict an outcome
How to read scatterplots and lines of best fit
Talk to Me!
Open the Lecture Google Doc: bit.ly/and24_lecture08
Vocabulary: The General Model Equation
A conceptual representation of all statistical models, with the following form:
\[outcome = model + error\]
We can use models to predict the outcome for a particular case
The model itself is a mathematical formula that makes assumptions about the properties of a population
This is always subject to some degree of error
Why might predictive models like this be useful? Can you think of any recent examples?
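Though we haven't fitted a real model yet, a minimal R sketch of this identity, using made-up scores, may help:

```r
# Conceptual sketch with made-up scores: outcome = model + error
outcome <- c(5, 7, 9)          # observed scores for three cases
model   <- c(5.5, 7.0, 8.5)    # what a model predicts for each case
error   <- outcome - model     # whatever the model gets wrong
all(outcome == model + error)  # TRUE: the identity always holds
```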
The linear model is a fundamental and extremely common statistical testing paradigm
Most statistical tests (e.g., t-tests and chi-squared) are some form of linear model
Allows us to predict an outcome from one or more predictor variables
Our first (explicit) contact with statistical modeling
… and our variables merely players!
Vocabulary: Predictor(s)
The variable(s) that you hypothesise will predict the outcome.
In experimental studies, this is usually the treatment variable (the thing that is manipulated).
In observational studies (e.g., cross-sectional surveys), this will be one of the variables you’ve measured.
Also - and commonly in experimental research - called the independent variable, or IV.
Vocabulary: Outcome
The variable that we hypothesise will vary, depending on the predictor.
In experimental studies, this is what you think will change because of your manipulation.
In observational studies (e.g., cross-sectional surveys), this will also be one of the variables you’ve measured!
Also - and commonly in experimental research - called the dependent variable, or DV.
The linear model is the equation for a straight line:
\[ y = mx + b \]
It is usually written like this:
\[y_{i} = b_{0} + b_{1} x_{1i} + e_{i}\]
I will write it in full for now, so you can get used to it:
\[y_{i} = b_{0} + b_{1}\times x_{1i} + e_{i}\]
| Term | Meaning |
|------|---------|
| \(y_i =\) | The outcome (\(y\)) for an individual's actual score (\(i\)) is equal to (or, is predicted by)… |
| \(b_0\) | … the value of beta-zero (the model's intercept)… |
| \(+\) | … plus… |
| \(b_1\) | … the value of beta-one (the model's slope)… |
| \(\times\) | … multiplied by… |
| \(x_{1i}\) | … the value of the predictor (\(x_1\)) for an individual's actual score (\(i\))… |
| \(+\) | … plus… |
| \(e_i\) | … the error (\(e\)) for the individual's actual score (\(i\)). |
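To see how the right-hand side of the equation behaves, here is a small sketch that writes its deterministic part as an R function (the names predict_y, b0, and b1 are illustrative, not from the lecture):

```r
# The deterministic part of the linear model as an R function.
# The error e_i is whatever is left over after prediction, so it is
# not something we compute in advance.
predict_y <- function(x, b0, b1) {
  b0 + b1 * x  # intercept plus slope times the predictor
}

# For example, with an intercept of 2 and a slope of 0.5:
predict_y(x = 4, b0 = 2, b1 = 0.5)  # 2 + 0.5 * 4 = 4
```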
Example 1: Predicting Masculinity from Femininity
A recognisable example (from your correlation lecture)
Visual, approximate representation (so you can get a sense of where the numbers come from)
Computational, precise calculations (where we actually get the numbers from)
Example 2: Predicting Better Sleep from Positive Psychology
Dr Mankin was interested in the relationship between femininity and masculinity
Participants took part in a cross-sectional, self-report survey that asked them to rate their:
Femininity
Masculinity
& a bunch of other things not relevant for today’s example
ChallengR: Why Not Correlation?
You have already seen this data, and this relationship, in the correlation analysis you did in a previous lecture.
Why are we doing something different? What do we get from the linear model that we don’t get from our correlation analysis?
We can make some guesses based on the plot:
The line would cross the y-axis (i.e., where femininity = 0) somewhere between 8 and 9
For every unit increase on the femininity (predictor, \(x\)) scale, masculinity (outcome, \(y\)) decreases by a little less than one point
We can then plug those values into our model…
We have already plugged in our outcome (masculinity) and predictor (femininity):
\[ Masculinity_i = b_0 + b_1\times Femininity_{1i} + e_i \]
We also know the intercept (aka \(b_0\), aka the predicted value of masculinity when femininity is 0) is \(\approx\) 8.5, so we can plug that in:
\[ Masculinity_i = \hat{8.5} + b_1\times Femininity_{1i} + e_i \]
We also know the slope (aka \(b_1\), aka the change in masculinity associated with a unit change in femininity) is \(\approx\) -0.8, so we can plug that in (note the sign change):
\[ Masculinity_i = \hat{8.5} - \hat{0.8}\times Femininity_{1i} + e_i \]
Before we use the equation to predict masculinity, let’s get the real beta values from R…
```
Call:
lm(formula = gender_masc ~ gender_fem, data = gensex)

Coefficients:
(Intercept)   gender_fem
     8.8246      -0.7976
```
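The Call: line above shows where this output comes from; assuming the gensex data are already loaded, the fitting code looks like this (masc_lm is an illustrative name):

```r
# Fit a linear model predicting masculinity from femininity
masc_lm <- lm(gender_masc ~ gender_fem, data = gensex)
masc_lm        # prints the Call and Coefficients shown above
coef(masc_lm)  # or extract just b0 (intercept) and b1 (slope)
```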
Adapt our equation to include the real \(b\) values:
Intercept (\(b_{0}\)): the predicted value of masculinity when femininity is 0
Slope (\(b_{1}\)): change in masculinity associated with a unit change in femininity
\[Masculinity_i = \hat{8.82} - \hat{0.8}\times Femininity_{1i} + e_i\]
For someone with a fairly low femininity rating of 3 (on a scale of 1-9):
\[Masculinity_i = \hat{8.82} - \hat{0.8}\times 3 + e_i\]
\[Masculinity_i = 6.42 + e_i\]
For someone with a fairly high femininity rating of 8 (on a scale of 1-9):
\[Masculinity_i = \hat{8.82} - \hat{0.8}\times 8 + e_i\]
\[Masculinity_i = 2.42 + e_i\]
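As a check, here is the same arithmetic in R, alongside R's own predict() function (using the hypothetical masc_lm object sketched earlier):

```r
# By hand, with the rounded coefficients from the slides:
8.82 - 0.8 * 3  # 6.42
8.82 - 0.8 * 8  # 2.42

# Or let predict() use the full-precision coefficients:
predict(masc_lm, newdata = data.frame(gender_fem = c(3, 8)))
# ~6.43 and ~2.44; slightly different because nothing was rounded
```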
ChallengR: Why Not Correlation?
You have already seen this data, and this relationship, in the correlation analysis you did in a previous lecture.
Why are we doing something different? What do we get from the linear model that we don’t get from our correlation analysis?
Tout et al. (2023) were interested in the effect of positive psychology upon sleep
Participants took part in a cross-sectional, self-report survey that asked them to rate their:
Positive psychology attributes (a composite of gratitude, optimism, self-compassion, and mindfulness)
Sleep quality and quantity (a composite of subjective sleep quality, sleep latency, sleep duration, sleep efficiency, sleep disturbances, sleep medication, daytime dysfunction)
& a bunch of other things not relevant for today’s example
Hypothesis: Positive psychology attributes are associated with better sleep
Predictor (\(x_1\)): Positive psychology attributes
Outcome (\(y\)) : Sleep quality & quantity
Where would our predictor and outcome fit into the linear model equation: \(y_{i} = b_{0} + b_{1}\times x_{1i} + e_{i}\)?
Talk to Me!
Open the Lecture Google Doc: bit.ly/and24_lecture08
Model: \(Sleep_{i} = b_{0} + b_{1}\times PositivePsychology_{1i} + e_{i}\)
What about \(b_0\) and \(b_1\)…?
The individual scores (data points) tend to be higher up on the right and lower down on the left
As the variable on \(x\) (here, positive psychology attributes) increases, the variable on \(y\) (here, sleep) also increases
\[ Sleep_i = b_0 + b_1\times PositivePsychology_{1i} + e_i \]
Predictor (\(x_1\)): Positive psychology attributes
Outcome (\(y\)) : Sleep quality & quantity
Intercept (\(b_{0}\)): the predicted value of sleep when positive psychology is 0
Slope (\(b_{1}\)): change in sleep associated with a unit change in positive psychology
Before we plug the intercept and slope into the equation, let’s get the more precise beta values from R…
```
Call:
lm(formula = sleep ~ pos_psy, data = sleep_tib)

Coefficients:
(Intercept)      pos_psy
      3.520        2.287
```
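As before, the Call: line shows where this output comes from; assuming sleep_tib is loaded, the fitting code looks like this (sleep_lm is an illustrative name):

```r
# Fit a linear model predicting sleep from positive psychology
sleep_lm <- lm(sleep ~ pos_psy, data = sleep_tib)
sleep_lm  # prints the Call and Coefficients shown above
```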
Can you adapt our equation to include the real \(b\) values?
\[ Sleep_i = b_0 + b_1\times PositivePsychology_{1i} + e_i \]
Intercept (\(b_{0}\)): the predicted value of sleep when positive psychology is 0
Slope (\(b_{1}\)): change in sleep associated with a unit change in positive psychology
\[Sleep_i = \hat{3.52} + \hat{2.29}\times PositivePsychology_{1i} + e_i\]
For someone with a fairly low positive psychology rating of 1.5 (on a scale of 1-5):
\[Sleep_i = \hat{3.52} + \hat{2.29}\times 1.5 + e_i\]
\[Sleep_i = 6.95 + e_i\]
For someone with a fairly high positive psychology rating of 4 (on a scale of 1-5):
\[Sleep_i = \hat{3.52} + \hat{2.29}\times 4 + e_i\]
\[Sleep_i = 12.68 + e_i\]
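The same predictions in R, by hand and via predict() (using the hypothetical sleep_lm object sketched earlier):

```r
# By hand, with the rounded coefficients from the slides:
3.52 + 2.29 * 1.5  # 6.955, i.e. ~6.95
3.52 + 2.29 * 4    # 12.68

# Or with the full-precision coefficients:
predict(sleep_lm, newdata = data.frame(pos_psy = c(1.5, 4)))
# ~6.95 and ~12.67; tiny differences are just rounding
```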
Vocabulary: The Linear Model
\[y_{i} = b_{0} + b_{1}\times x_{1i} + e_{i}\]
A statistical model representing the linear relationship between a predictor and outcome
\(y_{i}\): the (actual value of the) outcome
\(x_{1i}\): the (actual value of the) predictor
\(b_{0}\): the intercept - the value of the outcome when the predictor is 0
\(b_{1}\): the slope - the change in the outcome for every unit change in the predictor
\(e_{i}\): the (unknown and unknowable) error in prediction
lm()
The linear model (lm()) will be crucial for the rest of your degree
If that was a bit of a blur to you, it’s highly recommended that you spend some time working through it slowly, until it clicks.
These sources may be helpful:
Learning Statistics with R - see Section V, Chapter 15, Linear Regression
Andy Field’s statistics textbooks (SPSS version is fine, edition 4 onwards)
Next week (after the break):
Recap of the Linear Model
How do we know if it is a good model?
How do we know if it is a good prediction?