Linear Model 3: Return of the yi

Week 10

Jenny Terry

Survey Study on Statistics Attitudes & Career Goals

Interested in a Research Career?

Looking Ahead (and Behind)

  • The story so far…

    • The Linear Model - Equation of a Line

    • The Linear Model - Evaluating the Model with p-values, CIs, \(F\), & \(R^2\)

  • This week:

    • The Linear Model - Adding predictors; Comparing models; Comparing predictors
  • Coming up:

    • Questionable Research Practices

Objectives

After this lecture, you will (begin to) understand:

  • How to extend the linear model equation to two, three, or more predictors

  • How to compare linear models using the \(R^2\)-change and \(F\)-change statistics

  • How to interpret the relationship of each predictor with the outcome

  • How to compare predictors using standardised beta coefficients

Talk to Me!

Open the Lecture Google Doc: bit.ly/and24_lecture10

Three Types of Research Questions

When we are using linear models with more than one predictor (aka “multiple regression”), there are usually three stages we go through, each of which aligns with a slightly different research question:

  1. Which model is better (e.g., a model with one predictor vs. a model with three predictors)?
  2. Does each predictor in the better model have a statistically significant relationship with the outcome (and which direction will those relationships be in)?
  3. Which predictor in the better model has the biggest impact upon the outcome?

We’ll look at these in turn, but first, let’s just see what the linear model looks like with more than one predictor…

Adding Predictors to the Linear Model

Extending the Equation

The equation:

  • One-predictor model: \(y_{i} = b_{0} + b_{1}\times x_{1i} + e_{i}\)

    • Predicts the outcome \(y\) based on a predictor \(x_1\)
  • Two-predictor model: \(y_{i} = b_{0} + b_{1}\times x_{1i} + b_{2}\times x_{2i} + e_{i}\)

    • Predicts the outcome \(y\) based on a predictor \(x_1\) and another predictor \(x_2\)
  • Three–predictor model: \(y_{i} = b_{0} + b_{1}\times x_{1i} + b_{2}\times x_{2i} + b_{3}\times x_{3i} + e_{i}\)

    • Predicts the outcome \(y\) based on a predictor \(x_1\) and \(x_2\) and \(x_3\)
  • \(n\)-predictor model: \(y_{i} = b_{0} + b_{1}\times x_{1i} + ... + b_{n}\times x_{ni} + e_{i}\)

    • Predicts the outcome \(y\) based on as many predictors as you like!
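In R, this extension is just a matter of adding each new predictor to the model formula with +. A minimal sketch, using hypothetical variable names (y, x1, x2, x3) and a hypothetical data frame (dat):

```r
# One-, two-, and three-predictor models differ only in the formula
m1 <- lm(y ~ x1, data = dat)
m2 <- lm(y ~ x1 + x2, data = dat)
m3 <- lm(y ~ x1 + x2 + x3, data = dat)
```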

1. Comparing Linear Models

Comparing Linear Models

  • We can compare linear models with different numbers of predictors as long as they are hierarchical (aka nested)

  • Hierarchical models must be the same except for the addition of something new

    • The model with one predictor is nested in the two and three predictor models because they are the same except for the addition of the extra predictor(s)

    • Similarly, a two-predictor model would be nested in a three-predictor model

    • However, we couldn’t remove a variable from the two-predictor model and replace it with a different one to create a different two-predictor model and compare those

    • In that case, changing the variable would mean it is no longer the same and, therefore, no longer nested

What is a ‘good’ model?

  • In Lecture 09, we learned that a good model will:

    • Fit the data better than the simplest possible model (\(R^2\) & \(F\)-statistic)

    • Explain a lot of variance in the outcome (\(R^2\))

    • Explain an amount of variance that significantly differs from zero (\(F\)-statistic)

  • When we are comparing models, we can use the \(R^2\) and \(F\) values to instead ask, which is the better fitting model?

It’s all Greek to me!

Vocabulary: \(\Delta\)

\(\Delta\) just means “change”

\(R^2\) Change

How do we interpret \(R^2\)? 🤔

Vocabulary: \(R^2\)

\(R^2\) = percentage of the variance in the outcome explained by the model

  • A larger value means better fit (more variance is explained)

Vocabulary: \(R^2\) Change

\(R^2 \Delta = R^2_\text{Model 2} - R^2_\text{Model 1}\)

  • The difference in \(R^2\) values between two models

  • The model with the larger \(R^2\) value is the better fitting model (more variance is explained)

  • The larger the \(R^2\Delta\) value, the greater the improvement in the better fitting model
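As a sketch of how this might look in R (assuming two nested models, here called model1 and model2, have already been fitted with lm()):

```r
# Extract each model's R^2 with broom::glance(), then take the difference
r2_model1 <- broom::glance(model1)$r.squared
r2_model2 <- broom::glance(model2)$r.squared

r2_change <- r2_model2 - r2_model1  # positive if Model 2 explains more variance
```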

\(F\) Change

How do we interpret \(F\)? 🤔

Vocabulary: \(F\)-statistic

\(F\) = whether the variance explained significantly differs from zero

  • Compares a model with predictor(s) to the null model

  • An \(F\)-statistic with a p-value lower than 0.05 means that it is statistically significant, so we can reject the null hypothesis that the null model fits as well as the predictor model

Vocabulary: \(F\) Change

\(F\Delta\) = an \(F\)-statistic testing whether the extra variance explained by the larger model (the \(R^2\Delta\)) significantly differs from zero

  • Compares nested models with predictors to one another

  • It is not simply the difference between the two models’ \(F\)-statistics; it is calculated from the change in residual variance between the two models (R’s anova() function does this for us)

  • An \(F\Delta\) with a p-value lower than 0.05 means that the larger model is statistically significantly better, so we can reject the null hypothesis that one model fits as well as the other model
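In R, we don’t calculate \(F\Delta\) by hand; passing two nested models (here the hypothetical model1 and model2) to anova() does it for us:

```r
# anova() compares the nested models and reports the F-change
# statistic and its associated p-value
anova(model1, model2)
```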

Let’s look at an example to see how this looks in practice…

Example: Predicting Sleep from 1 vs. 3 Predictors

  • Tout et al. (2023) were interested in the effect of positive psychology and emotional regulation upon sleep

  • Participants took part in a cross-sectional, self-report survey that asked them to rate their:

    • Positive psychology attributes (a composite of gratitude, optimism, self-compassion, and mindfulness)

    • Adaptive emotional regulation strategies (a composite of acceptance, positive refocusing, refocus on planning, positive reappraisal, perspective taking)

    • Maladaptive emotional regulation strategies (a composite of self-blame, rumination, catastrophising, other-blame)

    • Sleep quality and quantity (a composite of subjective sleep quality, sleep literacy, sleep duration, sleep efficiency, sleep disturbances, sleep medication, daytime dysfunction)

Research Question & Hypothesis

Research Question

  1. Which model is better, a model with just positive psychology attributes as a predictor or a model with positive psychology attributes and adaptive emotional regulation strategies and maladaptive emotional regulation?

Hypotheses

  • The model with positive psychology attributes and adaptive emotional regulation strategies and maladaptive emotional regulation will fit the data better than a model with just positive psychology attributes.

Operationalisation - Model 1

  • Predictors:

    1. Positive psychology attributes (\(PosPsych\))
  • Outcome: Sleep quality & quantity (\(Sleep\))

  • Model 1: \(Sleep_{i} = b_{0} + b_{1}\times PosPsych_{1i} +e_{i}\)

Operationalisation - Model 2

  • Predictors:

    1. Positive psychology attributes (\(PosPsych\))

    2. Adaptive emotion regulation attributes (\(AdaptEmoReg\))

    3. Maladaptive emotion regulation attributes (\(MalEmoReg\))

  • Outcome: Sleep quality & quantity (\(Sleep\))

  • So, what will our 3-predictor model equation look like? 🤔

  • Model 2: \(Sleep_{i} = b_{0} + b_{1}\times PosPsych_{1i} + b_2\times AdaptEmoReg_{2i} + b_3\times MalEmoReg_{3i} +e_{i}\)

Running the Analysis

We run the analysis in a series of stages:

  1. Fit Models

    1. Fit Model 1

    2. Fit model 2

  2. Compare \(R^2\) Values

    1. Calculate \(R^2\) for Models 1 & 2

    2. Calculate \(R^2\Delta\)

  3. Calculate \(F\Delta\)

Running the Analysis

Step 1a: Fit Model 1

Model 1

sleep_lm1 <- sleep_tib |>   
  lm(sleep ~ pos_psy, data = _)  

summary(sleep_lm1)

Call:
lm(formula = sleep ~ pos_psy, data = sleep_tib)

Residuals:
    Min      1Q  Median      3Q     Max 
-12.526  -1.706   0.408   1.978   5.389 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   3.5202     1.1502   3.060  0.00239 ** 
pos_psy       2.2872     0.3149   7.262 2.74e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.855 on 331 degrees of freedom
Multiple R-squared:  0.1374,    Adjusted R-squared:  0.1348 
F-statistic: 52.74 on 1 and 331 DF,  p-value: 2.744e-12

Running the Analysis

Step 1b: Fit Model 2

Model 2

sleep_lm2 <- sleep_tib |>   
  lm(sleep ~ pos_psy + adapt_er + mal_er, data = _)  

summary(sleep_lm2)

Call:
lm(formula = sleep ~ pos_psy + adapt_er + mal_er, data = sleep_tib)

Residuals:
     Min       1Q   Median       3Q      Max 
-12.2262  -1.5042   0.3768   2.0303   4.7376 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   8.5671     2.0735   4.132 4.57e-05 ***
pos_psy       2.1531     0.4154   5.183 3.82e-07 ***
adapt_er     -0.5004     0.3316  -1.509  0.13229    
mal_er       -0.9886     0.3690  -2.679  0.00776 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.823 on 329 degrees of freedom
Multiple R-squared:  0.1621,    Adjusted R-squared:  0.1545 
F-statistic: 21.22 on 3 and 329 DF,  p-value: 1.371e-12

Running the Analysis

Step 2a: Get the \(R^2\) values for each model

Model 1

broom::glance(sleep_lm1)
r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC deviance df.residual nobs
0.1374355 0.1348296 2.855112 52.73944 0 1 -820.8574 1647.715 1659.139 2698.2 331 333

Model 2

broom::glance(sleep_lm2)
r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC deviance df.residual nobs
0.1621142 0.1544739 2.822512 21.21831 0 3 -816.0243 1642.049 1661.089 2621.002 329 333

Running the Analysis

Step 2b: Calculate the \(R^2\Delta\) value from the \(R^2\) values

  • \(R^2 \Delta = R^2_\text{Model 2} - R^2_\text{Model 1}\)

  • \(R^2 \Delta = 0.162 - 0.137\)

  • \(R^2 \Delta = 0.025\)

  • \(R^2 \Delta = 2.5\%\)

  • Model 2 accounts for 2.5% more of the variance in sleep, indicating Model 2 is the better fitting model

Running the Analysis

Step 3: Get & interpret the \(F\Delta\) and associated p-value

anova(sleep_lm1, sleep_lm2) |> broom::tidy()
term                                df.residual      rss df    sumsq statistic   p.value
sleep ~ pos_psy                             331 2698.200 NA       NA        NA        NA
sleep ~ pos_psy + adapt_er + mal_er         329 2621.002  2 77.19761  4.845095 0.0084371


\(F\Delta\) = 4.85, p < 0.01, so we can conclude that Model 2 accounts for statistically significantly more variance and is, therefore, the better fitting model

2. Relationships Between the Predictors & the Outcome

Three Types of Research Questions

  • Now we know that Model 2 is the better fitting model, we can turn to examining the relationships between the predictors and the outcome

  • We can do this by examining the slope values (\(b_n\)) for each predictor and asking whether they are statistically significantly different from 0 (see Lecture 09 for a refresher)

  • This is very similar to how we interpret a linear model with one predictor - let’s take a look…

Interpreting the Model Intercept & Slope

Intercept (\(b_0\)):

  • One predictor model: value of \(y\) when \(x_1\) is 0
  • Two predictor model: value of \(y\) when \(x_1\) and \(x_2\) are 0

  • Three predictor model: value of \(y\) when \(x_1\) and \(x_2\) and \(x_3\) are 0

Slopes (\(b_1\), \(b_2\), ..., \(b_n\)):

  • One predictor model: \(b_1\) = change in \(y\) for every unit change in \(x_1\)
  • Two predictor model: \(b_2\) = change in \(y\) for every unit change in \(x_2\) … when the other predictor in the model is held constant (i.e., when the other variables do not change)

  • Three predictor model: \(b_3\) = change in \(y\) for every unit change in \(x_3\) … when the other predictors in the model are held constant

Interpreting the Model Intercept & Slope

  • The interpretation of \(b_0\) (the intercept) doesn’t change

    • It is always the value of \(y\) (the outcome) when the predictor(s) are at 0
  • The interpretation of \(b_n\) coefficients (a given slope) changes a little

    • They always represent the change in \(y\) (the outcome) for every unit change in \(x_n\) (a given predictor)…
    • … but when there are other predictors in the model, the relationship between the outcome and predictor assumes the other predictors are held constant
  • It doesn’t matter if there are two, five, ten, or fifty predictors - the \(b\)-values will always be interpreted in this same way

  • Let’s have a look at our example…

Research Question & Hypothesis

Research Question

  1. Does each predictor in the better model have a statistically significant relationship with sleep (and which direction will those relationships be in)?

Hypotheses

  • Positive psychology attributes and adaptive emotional regulation strategies have a positive relationship with sleep quality and quantity
  • Maladaptive emotional regulation strategies would have a negative relationship with sleep quality and quantity

Operationalisation - Model 2

Predictors:

  • \(x_1\) Positive psychology attributes (\(PosPsych\))

  • \(x_2\) Adaptive emotion regulation attributes (\(AdaptEmoReg\))

  • \(x_3\) Maladaptive emotion regulation attributes (\(MalEmoReg\))

  • Outcome (\(y\)): Sleep quality & quantity (\(Sleep\))

  • Model: \(Sleep_{i} = b_{0} + b_{1}\times PosPsych_{1i} + b_2\times AdaptEmoReg_{2i} + b_3\times MalEmoReg_{3i} +e_{i}\)

Running the Analysis

term         estimate std.error statistic   p.value  conf.low  conf.high
(Intercept) 8.5670973 2.0734842  4.131740 0.0000457  4.488138 12.6460569
pos_psy     2.1530694 0.4154330  5.182711 0.0000004  1.335829  2.9703095
adapt_er   -0.5003627 0.3316101 -1.508889 0.1322869 -1.152706  0.1519809
mal_er     -0.9885696 0.3690450 -2.678724 0.0077616 -1.714555 -0.2625841


  • \(b_0\) (intercept) = 8.57 (the value of sleep when all predictors are at 0)

  • \(b_1\) (slope for \(PosPsych\)) = 2.15, p < .001, 95% CI [1.34, 2.97]

  • \(b_2\) (slope for \(AdaptEmoReg\)) = -0.50, p = .132, 95% CI [-1.15, 0.15]

  • \(b_3\) (slope for \(MalEmoReg\)) = -0.99, p < .001, 95% CI [-1.71, -0.26]

  • Each slope represents the relationship between the predictor and the outcome when the other predictors in the model are held constant

3. Comparing Predictors

Three Types of Research Questions

  • First, we determined that the model with 3 predictors was the better fitting model

  • Second, we determined that positive psychology and maladaptive emotional regulation are statistically significant predictors of sleep

  • We also learned that positive psychology had a positive relationship with sleep and maladaptive emotional regulation had a negative relationship with sleep

  • Now we can ask, which is the best predictor?

Comparing Predictors

  • We can compare predictors to ascertain which is the “best” predictor

  • However, we cannot compare the beta values in their current (raw, unstandardised) form, because they are in different units

  • These different units reflect how the predictor variables are measured

    • e.g., seconds, kilograms, centimetres, pounds sterling, etc.

    • In psychology, we often use self-report scales where the units are Likert scale points…

Likert Scales - Maths Anxiety

Likert Scales - Trait Anxiety

Comparing Predictors

  • The two Likert Scales use different units of measurement, so cannot be directly compared

  • Similarly, we couldn’t directly compare reaction time in seconds with the amount of money someone earns, or the distance someone walks with their Likert Scale responses, etc.

  • However, we can make them comparable by standardising the slope values

Unstandardised vs Standardised Betas

Vocabulary: Unstandardised Betas

  • Change in the outcome for each unit change in the predictor

  • Depends on original scale of measurement

  • Usually denoted by \(b\)

Vocabulary: Standardised Betas

  • Change in the outcome (in standard deviations) for each standard deviation change in the predictor

  • Does not depend on original scale of measurement

  • Usually denoted by \(β\)
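One way to obtain standardised betas is to z-score every variable (mean 0, SD 1) before fitting the model. A sketch, assuming a hypothetical all-numeric data frame dat:

```r
# scale() converts each column to z-scores; refitting the model on the
# z-scored data gives slopes in standard-deviation units (i.e., betas)
dat_z <- data.frame(scale(dat))
lm(y ~ x1 + x2 + x3, data = dat_z)
```

In practice, packages such as parameters can produce standardised coefficients directly from an existing model (e.g., parameters::standardize_parameters()).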

Research Question & Hypothesis

Research Question

  1. Which predictor in the better model has the biggest impact upon sleep?

Hypotheses

  • Positive psychology attributes will predict sleep quality and quantity better than maladaptive emotional regulation strategies
  • Note that we’ve dropped the non-significant predictor, adaptive emotional regulation strategies
  • Note we’re not saying anything about the direction of the effect here, just the magnitude

Operationalisation - Model 2

  • Predictors:

    • \(x_1\) Positive psychology attributes (\(PosPsych\))

    • \(x_2\) Adaptive emotion regulation attributes (\(AdaptEmoReg\))

    • \(x_3\) Maladaptive emotion regulation attributes (\(MalEmoReg\))

  • Outcome (\(y\)): Sleep quality & quantity (\(Sleep\))

  • Model: \(Sleep_{i} = b_{0} + b_{1}\times PosPsych_{1i} + b_2\times AdaptEmoReg_{2i} + b_3\times MalEmoReg_{3i} +e_{i}\)

Running the Analysis

Statistical Interpretation


Parameter   Coefficient    SE   CI CI_low CI_high      t df_error     p
(Intercept)       0.000 0.050 0.95 -0.099   0.099  0.000      329 1.000
pos_psy           0.349 0.067 0.95  0.217   0.481  5.183      329 0.000
adapt_er         -0.092 0.061 0.95 -0.212   0.028 -1.509      329 0.132
mal_er           -0.154 0.057 0.95 -0.267  -0.041 -2.679      329 0.008


  • Look at the absolute values of \(β\) (ignore the positive/negative sign): which is the ‘bigger’ predictor of sleep? 🤔

  • Remember, we can assume non-statistically significant \(β\) values are not important predictors of the outcome (we’d report them, but not interpret them as meaningful)

Running the Analysis

Applied Interpretation

What should people focus on for better sleep quality/quantity? 🤔

  • This analysis suggests that the best thing we can do to improve sleep is to focus on increasing positive psychology attributes

  • Decreasing maladaptive emotional regulation strategies will also help, but not as much

  • Working on adaptive emotional regulation strategies is unlikely to improve sleep (its relationship with sleep was not statistically significant)

Lecture Summary

  • You can extend the linear model to include multiple predictors: \(y_{i} = b_{0} + b_{1}\times x_{1i} + ... + b_{n}\times x_{ni} + e_{i}\)
  • We can ask three main types of question to evaluate multiple-predictor linear models:
  1. Hierarchical (nested) models can be compared with \(R^2\Delta\) and \(F\Delta\) to determine which model is best
  2. We can then examine the statistical significance of each predictor to establish whether it is an important predictor of the outcome
  3. We can also interpret the relative importance of the predictors in the best model using standardised betas (\(β\))

Next Week

  • Next week, there are unfortunately no statistics! 😭

  • Martina and I are going to tell you all about Questionable Research Practices (QRPs)

  • QRPs are “a range of activities that intentionally or unintentionally distort data in favour of a researcher’s own hypotheses…” (Fortt 2021)

  • At best, bad science. At worst, academic misconduct and outright fraud! 😈

  • We’ll explore some different QRPs, look at some famous examples, tell you how the field is trying to solve the problems (i.e., Open Science), and consider some emerging critiques of that movement.