Uncertainty, standard errors and confidence intervals

Dr. Martina Sladekova

A reminder image so that I don't forget to record the lecture on Zoom. Again.

Session links

linktr.ee/analysingdata

Recap

Distributions describe how often different values occur - in a sample or a population.
“Mathematically defined” distributions are useful - we know how probable scores are.
Normal distribution - defined by mean, standard deviation, and proportions of scores expected above/below critical values

Density plot of a normal distribution with shaded proportions. The X axis shows standard deviations. Middle portion is shaded from -1 to +1 SD from the mean, representing 68.2 percent of scores. Outer portion is shaded from -2 to +2 SD from the mean, representing 95.4% of scores
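The shaded proportions can be checked with R's `pnorm()`, which returns the proportion of a normal distribution that falls below a given value. This minimal sketch uses the standard normal distribution (mean 0, SD 1), so the values on the x axis are in SD units:

```r
# Proportion of scores within 1 SD of the mean
within_1sd <- pnorm(1) - pnorm(-1)
# Proportion of scores within 2 SDs of the mean
within_2sd <- pnorm(2) - pnorm(-2)

round(within_1sd, 3)  # 0.683
round(within_2sd, 3)  # 0.954
```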

Recap

Key idea

Assuming a distribution of a particular shape, how common is a given value?

The average individual attends 127 social events per year.
Assume a population with M = 127 and SD = 40 (e.g. based on collected sample)
Is an individual who attends 57 social events per year unusual?
- In the assumed normal distribution, only 4% of people attend fewer than 57 events per year.

# proportion of the distribution below 57 (the lower tail)
pnorm(57, mean = 127, sd = 40)

[1] 0.04005916

density plot of a normal distribution centred at 127 with standard deviation of 40. Vertical line crosses x axis at the value 57

Recap

Key idea

Assuming a distribution of a particular shape, is the value we’re interested in above or below a specific cut-off point?

Assuming the same distribution (M = 127, SD = 40), is an individual who attends 190 social events per year among the top 10% of event-goers?
- In the assumed normal distribution, an individual would have to attend 178 events or more to be in the top 10% of event-goers.
- Therefore a person who attends 190 events is in the top 10%.

qnorm(p = 0.9, mean = 127, sd = 40)

[1] 178.2621

density plot of a normal distribution centred at 127 with standard deviation of 40. Vertical line crosses x axis at the value 190

How many social events cut off the top 5%?

qnorm(p = 0.05, mean = 127, sd = 40, lower.tail = FALSE)

[1] 192.7941

PollEverywhere:

Let’s practice probability and critical values

Link tree:

linktr.ee/analysingdata

Let’s practice (1)

Example 1: The average individual drinks 730 cups of coffee per year. Jennifer drinks 1100 cups of coffee per year. Is Jennifer in the top 5% of the distribution (shaded)?

density plot of a normal distribution centred at 730 with standard deviation of 200. Top 5% of the distribution are shaded.
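Assuming the distribution in the plot (mean 730, SD 200, taken from the plot description), Example 1 can be checked in two equivalent ways: find the cut-off for the top 5% with `qnorm()`, or find the proportion of people who drink more than Jennifer with `pnorm()`. A minimal sketch:

```r
# Assumed population: mean = 730 cups, SD = 200 cups (from the plot description)
cutoff_top5 <- qnorm(p = 0.05, mean = 730, sd = 200, lower.tail = FALSE)
cutoff_top5  # about 1059 cups

# Proportion of people drinking more than Jennifer's 1100 cups
prop_above_jennifer <- pnorm(1100, mean = 730, sd = 200, lower.tail = FALSE)
prop_above_jennifer  # about 0.032

# Jennifer is in the top 5% if her value exceeds the cut-off
jennifer_top5 <- 1100 > cutoff_top5
jennifer_top5  # TRUE
```

Both routes agree: Jennifer is above the cut-off, and fewer than 5% of people drink more coffee than she does.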

Let’s practice (2)

Example 2: A critical value for the bottom 5% on an anxiety scale is 7.08. A study participant receives a score of 8. Are they in the bottom 5%?

density plot of a normal distribution centred at 15.3 with standard deviation of 5.

density plot of a normal distribution centred at 15.3 with standard deviation of 5. Top 5% are shaded. Vertical line crosses the x axis at 19
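Assuming the anxiety scores follow the plotted normal distribution (mean 15.3, SD 5, taken from the plot description), the critical value of 7.08 can be reproduced with `qnorm()` and the participant's score compared against it. A minimal sketch:

```r
# Assumed population: mean = 15.3, SD = 5 (from the plot description)
cutoff_bottom5 <- qnorm(p = 0.05, mean = 15.3, sd = 5)
round(cutoff_bottom5, 2)  # 7.08

# Proportion of people scoring below the participant's 8
prop_below_8 <- pnorm(8, mean = 15.3, sd = 5)
prop_below_8  # about 0.072

# The participant is in the bottom 5% only if their score is below the cut-off
participant_bottom5 <- 8 < cutoff_bottom5
participant_bottom5  # FALSE
```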

Today

Sampling from populations - the broader picture
Uncertainty in research and estimation
Sampling distributions and the Central Limit Theorem
Standard error of the mean
Confidence intervals: what they are, and what they are not

Where are we?

Roadmap on the module. Top row contains boxes "Introduction and distributions", "Standard error and confidence intervals" and "null hypothesis significance testing". Second box is labelled as "We're here!". Middle row is "t-test", "correlation" and "chi-square". Bottom row is "equation of a straight line", "linear model with one predictor", "linear model with multiple predictors"

Uncertainty in estimation

Samples and populations

So far we:

Described a sample
Assumed a mean and standard deviation for a population
Looked at individuals
- How unusual are they?
- Are they far from the mean?
- Are they above or below some critical value?

Samples and populations

In research, we often want to know:

How well does our sample represent the population?
Is our sample unusual (under certain assumptions we make about the population)?

Samples and populations

THE PROBLEM: Samples are not perfect representations of populations
There is uncertainty about how closely the sample mean matches the true population value.
Standard errors and confidence intervals are tools we use to quantify that uncertainty.

histogram of stress scores ranging from roughly 30 to 70. Orange dot on the distribution represents the mean of about 55. There are error bars surrounding the dot.

histogram of stress scores in a population ranging from roughly 30 to 70. Orange dot on the distribution represents the mean of about 50.

Samples and populations

The sample estimate is our best guess
The population value we’re trying to estimate is called the parameter

Parameter estimates

What is an estimate?

A parameter estimate can take many different forms. We might want to:

Estimate a typical value of a single variable in a population
Estimate group differences for a variable
Estimate strength of association between two or more variables (e.g. a correlation coefficient, or slope of a straight line).

The average person…

drinks 730 cups of coffee per year (twice as much for academics, incl. students) ☕
spends 192 minutes a day watching TV 📺
eats 250 cloves of garlic per year 🧄
takes 3500 steps each day 🚶
falls asleep in 7 minutes 😴

Today’s example

Doomscrolling

“… refers to a unique media habit where social media users persistently attend to negative information in their newsfeeds about crises, disasters, and tragedies.”

- Sharma, Lee, and Johnson (2022)

A gif of a person scrolling endlessly on their phone

Research question…

How much does an average person doomscroll?

Doomscrolling poll

Group 1: Right side of the room

Group 2: Middle of the room

Group 3: Left side of the room

linktr.ee/analysingdata

A group of stick figures representing the population

A group of stick figures representing the population. Below is a sample of 4 stick figures, labeled with mean of 101 minutes per day.

A group of stick figures representing the population. Below are two samples of stick figures drawn from the population, each showing a different value of the mean

A group of stick figures representing the population. Below are three samples of stick figures drawn from the population, each showing a different value of the mean

Uncertainty in research and estimation

Each time we take a sample, we get a different estimate
This is because of random sampling

How do we know if our estimate is accurate and close to the real population value?

dot plot of average time spent scrolling (x axis) and 3 sample IDs (y axis). The dot at each value of the y axis represents the mean of each sample. Top dot is at the value of 93, middle dot is at the value of 109, lower dot is at the value of 101 minutes.

We can never be certain - we need to be able to quantify this uncertainty

Dot plot representing the distribution of means of 30 samples. Dots are scattered across a range of doomscrolling values.

Sampling distributions

We can plot the sample estimates in a histogram to see how they’re distributed
We’re now working with sample means, not the scores of individual people. Therefore the plot below shows a sampling distribution.

histogram showing sampling distribution of sample means. 3 previously sampled means are highlighted in orange.

The Central Limit Theorem

Describes how sampling distributions arise
Imagine we repeat the following process thousands of times (in theory, an infinite number of times):
1. Collect a sample of individuals
2. Calculate mean doom-scrolling time in that sample
3. Save this mean value and put it on the plot
4. Repeat
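The four steps above can be sketched as a small simulation. The population is hypothetical: a right-skewed (exponential) distribution of doomscrolling times with a mean of 100 minutes, chosen only to show that the sampling distribution of the mean comes out roughly normal even when individual scores are not. The sample size of 30 and the 10,000 repetitions are arbitrary choices standing in for "large enough" and "infinite".

```r
set.seed(42)  # reproducible random numbers

pop_mean <- 100      # hypothetical population mean (minutes per day)
n <- 30              # participants per sample (arbitrary)
n_samples <- 10000   # number of repetitions (stands in for "infinite")

# Steps 1-4: collect a sample, compute its mean, save it, repeat
sample_means <- replicate(n_samples, {
  one_sample <- rexp(n, rate = 1 / pop_mean)  # skewed population with mean 100
  mean(one_sample)
})

# The centre of the sampling distribution is close to the population mean
mean(sample_means)

# Histogram of the saved means: roughly bell-shaped despite the skewed population
hist(sample_means, breaks = 50,
     main = "Simulated sampling distribution of the mean",
     xlab = "Sample mean (minutes)")
```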

The Central Limit Theorem

sampling distribution of many means, forming a perfect normal distribution.

The Central Limit Theorem

For many types of estimates, including means, the sampling distribution will be normal
The centre of the sampling distribution will be the population parameter
This is true regardless of the shape of the population distribution (and of the distribution of scores within any single sample).
As long as the samples that we’re taking are large enough
- If each sample has only 3 participants, the sampling distribution might not end up being normal
- Textbooks often say that 30 is enough, but there are situations when we might need more
- Let’s put a pin in this and we’ll get back to it later 📌

📌Attendance pin🧷

Standard error

Normal distribution - what we already know

We can describe every normal distribution using:

Mean - the central value
Standard deviation (SD) - the average difference from the mean
Proportions of scores at cut-off points
- Around 68% of scores are within 1 SD of the mean
- 95% of scores are within \(\pm\) 1.96 SDs of the mean
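The value 1.96 is not arbitrary: it is the point below which 97.5% of a standard normal distribution falls, which leaves 2.5% in each tail. A short check in R, using `qnorm()` and `pnorm()`:

```r
# Critical value leaving 2.5% in the upper tail (and, by symmetry, 2.5% in the lower tail)
z_crit <- qnorm(0.975)
z_crit  # 1.959964

# Check: proportion of scores within +/- 1.96 SDs of the mean
pnorm(z_crit) - pnorm(-z_crit)  # 0.95
```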

Normal sampling distribution

Same rules apply!
The mean of a sampling distribution will be centered on the population value
LANGUAGE CHANGE: when talking about standard deviation in the context of a sampling distribution, we call it standard error.

Standard deviation

The average difference between each score and the sample mean

Standard error

Standard deviation of sample means
The average difference between each sample mean and the population value

Normal sampling distribution

We know that 95% of sample means will fall within 1.96 standard errors from the population mean
We can use this knowledge to construct an interval around the mean - 95% of sample means will fall within this interval.

sampling distribution of many means, forming a perfect normal distribution. Distance of 1.96 standard errors from the mean is highlighted in orange

Standard error

Standard error is a useful metric for quantifying uncertainty in estimates - it describes the extent to which sample means differ from each other in a sampling distribution
We can use it to construct an interval within which a certain percentage of sample means will fall
However…

Estimating the standard error from the sample

Sampling distributions don’t exist “in the wild”. They are a hypothetical statistical concept.
Remember: standard error refers to the standard deviation of the sampling distribution (created by re-sampling and computing the mean an infinite number of times), but we only have access to one sample with one mean.
Therefore, if we want to use the standard error to construct an interval, we need to estimate it from our sample.

Estimating the standard error from the sample

Equation:

\[ SE = \frac{SD}{\sqrt N} \]

Translation:

\[ \text{standard error} = \frac{\text{sample standard deviation}}{\text{(the square root of) the sample size}} \]

In R:

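# standard error of the mean: sample SD divided by the square root of the sample size
se <- sd(data$variable) / sqrt(length(data$variable))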

Note that the SE will be smaller for larger samples (because we’re dividing by a larger number).
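A minimal simulation can connect the formula back to the sampling distribution. It assumes a hypothetical normal population of doomscrolling times (mean 100, SD 12 minutes, both invented for illustration). The standard deviation of many simulated sample means sits close to the population SD divided by the square root of N, and it shrinks as N grows. The last lines show the realistic case: estimating the SE from a single sample.

```r
set.seed(1)  # reproducible random numbers

pop_mean <- 100  # hypothetical population mean (minutes per day)
pop_sd <- 12     # hypothetical population SD

# SD of many simulated sample means, i.e. the "true" standard error for size n
simulated_se <- function(n, n_samples = 10000) {
  means <- replicate(n_samples, mean(rnorm(n, mean = pop_mean, sd = pop_sd)))
  sd(means)
}

se_n4 <- simulated_se(4)    # close to 12 / sqrt(4)  = 6
se_n16 <- simulated_se(16)  # close to 12 / sqrt(16) = 3

# In practice we only have one sample, so we estimate the SE from it
# (with only 4 people this estimate can be noticeably off)
one_sample <- rnorm(4, mean = pop_mean, sd = pop_sd)
se_estimate <- sd(one_sample) / sqrt(length(one_sample))
```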

Estimating the standard error from the sample

Example:

We collect a sample of 4 individuals.
Each person reports their daily doomscrolling time (in minutes): 86, 114, 97, 107
The mean for the sample is 101 minutes
The standard deviation is:

\[ SD = \sqrt{\frac{\sum(x_i - \bar{x})^2}{N - 1}} = \sqrt{\frac{(86-101)^2 + (114-101)^2 + (97 - 101)^2+(107-101)^2}{4 - 1}} = 12.19 \]

Which makes the standard error:

\[ SE = \frac{SD}{\sqrt{N}} = \frac{12.19}{\sqrt{4}} = 6.095 \]
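The same arithmetic in R, using the four doomscrolling times from the example. R's `sd()` divides by N - 1, which is what produces the 12.19 above. The unrounded SE is about 6.096; the 6.095 above comes from rounding the SD to 12.19 before dividing.

```r
# Daily doomscrolling times (minutes) for the four sampled individuals
scroll_time <- c(86, 114, 97, 107)

n <- length(scroll_time)          # 4
sample_mean <- mean(scroll_time)  # 101
sample_sd <- sd(scroll_time)      # about 12.19 (divides by n - 1)
se <- sample_sd / sqrt(n)         # about 6.096

round(c(mean = sample_mean, sd = sample_sd, se = se), 3)
```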

Confidence intervals

Average doomscrolling time for the sample: 101 minutes

Standard deviation: 12.19

Standard error: 6.095

\[ \text{Lower CI limit} = \text{sample mean} - 1.96 \times\text{SE} \\ \text{Upper CI limit} = \text{sample mean} + 1.96 \times\text{SE} \]

\[ \text{Lower CI limit} = 101 - 1.96 \times6.095 = 89.054\\ \text{Upper CI limit} = 101 + 1.96 \times6.095 = 112.946 \]
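The interval above can be reproduced in R from the raw scores. This sketch uses `qnorm(0.975)` rather than the rounded 1.96, so the limits differ from the hand calculation only in the third decimal place:

```r
scroll_time <- c(86, 114, 97, 107)

sample_mean <- mean(scroll_time)
se <- sd(scroll_time) / sqrt(length(scroll_time))

# 95% CI using the normal critical value (about 1.96)
z_crit <- qnorm(0.975)
ci_z <- c(lower = sample_mean - z_crit * se,
          upper = sample_mean + z_crit * se)
round(ci_z, 2)  # lower 89.05, upper 112.95
```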

illustrative image of a confidence interval. Dot in the middle represents the mean. The left and right edges of the error bar around the mean represent lower and upper limits of the confidence interval.

You might see in a paper…

“The average doomscrolling time in our sample was 101 minutes (SD = 12.19) 95% CI [89.05, 112.95].”

Confidence intervals

In papers:

“Error bars” on plots will often represent confidence intervals - labelled as “CI”
Sometimes they might represent standard errors - labelled as “SE” or “SEM” - always check the plot description.
Other times the authors just keep it a secret

Confidence intervals for small samples

📌 The sampling distribution of the mean will have a normal shape as long as the sample size is large enough
Smaller samples don’t approximate the normal sampling distribution very well. Because of this, we can’t rely on the value 1.96 to give us accurate intervals.

The t-distribution

Instead, we can use the t-distribution
- Looks like the normal distribution, but isn’t.
- Defined by degrees of freedom (df) - calculated as N-1 (number of observations minus 1)
- The “critical t value” will change for different degrees of freedom.
  - It’s the value we use instead of 1.96 to calculate 95% confidence intervals

Density plot of a bell-shaped t-distribution, with 3 degrees of freedom. Shape is similar to normal, but tails are longer. Critical value displayed on the plot for 3 degrees of freedom is 3.182

The t-distribution

Instead of multiplying the standard error by 1.96, we multiply by the critical t value.
Critical t gets closer to 1.96 with larger samples, because the t-distribution itself approximates the normal distribution more closely as the degrees of freedom increase.
For example, in our sample of 4, the df is 4 - 1 = 3, and the critical t value for 3 df is 3.182.
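Critical t values come from `qt()`, the t-distribution counterpart of `qnorm()`. Asking for the 97.5th percentile again leaves 2.5% in each tail; the df values other than 3 are just illustrative:

```r
# Critical t values for a 95% interval (2.5% in each tail)
t_crit_3 <- qt(0.975, df = 3)      # 3.182446 (our sample of 4)
t_crit_29 <- qt(0.975, df = 29)    # 2.045230
t_crit_100 <- qt(0.975, df = 100)  # 1.983972

# Compare with the normal critical value
qnorm(0.975)                       # 1.959964
```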

t-based confidence intervals:

Average doomscrolling time for the sample: 101 minutes

Standard error: 6.095

Critical t value: 3.182

\[ \text{CI Limits} = \text{mean} \pm3.182 \times\text{SE} \\ \text{CI Limits} = 101 \pm3.182 \times\text{6.095} \\ \text{CI Limits} = [81.606, 120.394] \]
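The t-based interval can be computed the same way in R, and base R's `t.test()` reports the same interval for a one-sample mean. Without intermediate rounding the limits come out at about 81.60 and 120.40, a hair away from the hand-calculated values:

```r
scroll_time <- c(86, 114, 97, 107)

n <- length(scroll_time)
sample_mean <- mean(scroll_time)
se <- sd(scroll_time) / sqrt(n)
t_crit <- qt(0.975, df = n - 1)  # about 3.182 for df = 3

ci_t <- c(lower = sample_mean - t_crit * se,
          upper = sample_mean + t_crit * se)
round(ci_t, 2)  # about 81.60 and 120.40

# Built-in check: a one-sample t-test reports the same 95% interval
t.test(scroll_time)$conf.int
```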

t-based confidence intervals

Compare the intervals:
- Original (1.96): [89.05, 112.95]
- t-based: [81.61, 120.39]
The new CI is wider!

two confidence intervals. First one is the one calculated using the Z score of 1.96. The second one is calculated using t-distribution. The second interval is wider.

t-based confidence intervals:

CIs will generally be wider in smaller samples - more uncertainty
t-distribution additionally accounts for the fact that small samples don’t always generate normal sampling distributions.
The larger the sample, the narrower the confidence intervals
Note how t approaches 1.96 as the sample size (df) increases

Density plot of a bell-shaped t-distribution, with 100 degrees of freedom. Shape is now closer to normal, with lighter tails. Critical value displayed on the plot for 100 degrees of freedom is 1.98

Confidence intervals across samples

We take samples over and over again, compute the mean for each, and construct confidence intervals around that mean - 95% of them will contain the population value, the remaining 5% will not.
This is known as an interval with 95% coverage. 95% is the most common value that we choose, but it can take on other values as well (e.g. 50%, 90%, 99%).
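Coverage can be checked by simulation. This minimal sketch assumes a hypothetical normal population of doomscrolling times (mean 100, SD 12, invented for illustration) and samples of 4 people, like ours. Each repetition builds a t-based 95% CI and records whether it contains the population mean; the proportion of "hits" should be close to 0.95.

```r
set.seed(2024)  # reproducible random numbers

pop_mean <- 100    # hypothetical population mean
pop_sd <- 12       # hypothetical population SD
n <- 4             # sample size, as in the lecture example
n_samples <- 10000

captured <- replicate(n_samples, {
  x <- rnorm(n, mean = pop_mean, sd = pop_sd)
  se <- sd(x) / sqrt(n)
  t_crit <- qt(0.975, df = n - 1)
  lower <- mean(x) - t_crit * se
  upper <- mean(x) + t_crit * se
  lower <= pop_mean && pop_mean <= upper  # TRUE if this CI contains the parameter
})

# Proportion of intervals containing the population mean
coverage <- mean(captured)
coverage
```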

dot plot of means from our 3 samples, now with confidence intervals. There's a red line going through the middle of the plot representing the population mean. Only 3 out of 4 intervals cross this line.

gif of many confidence intervals being generated for many samples. 95% of confidence intervals include the population value. 5% miss it entirely.

Confidence intervals across samples

If we use the wrong critical value for calculation - e.g. assuming normal sampling distribution when it’s not there - the coverage will be inaccurate
I.e. we might expect 95% of CIs to contain the population value, when in reality the coverage is lower.
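For normally distributed data, the coverage of an interval built as mean plus or minus (multiplier times SE) can be computed directly: the interval contains the population mean exactly when the sample's t statistic falls between minus and plus the multiplier, where t has N - 1 degrees of freedom. A short sketch under that normality assumption, comparing 1.96 with the correct critical t (`coverage_for()` is a helper defined here for illustration):

```r
# Coverage of "mean +/- multiplier * SE" when the data are normal:
# the interval contains the population mean exactly when |t| < multiplier,
# where t follows a t-distribution with n - 1 degrees of freedom
coverage_for <- function(multiplier, n) {
  2 * pt(multiplier, df = n - 1) - 1
}

coverage_for(1.96, n = 4)               # about 0.855, not 0.95
coverage_for(qt(0.975, df = 3), n = 4)  # 0.95
coverage_for(1.96, n = 100)             # slightly below 0.95
```

With only 4 observations, using 1.96 quietly turns a nominal 95% interval into one that misses the population value roughly 1 time in 7.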

How to interpret confidence intervals

“The average doomscrolling time in our sample was 101 minutes (SD = 12.19) 95% CI [81.61, 120.39].”

Correct interpretation

ASSUMING THAT our sample is one of the 95% of samples producing confidence intervals that contain the population value, then the population value for time spent doomscrolling per day falls somewhere between 81.61 and 120.39 minutes.

However…

There is no guarantee that the assumption above is correct! And we just have to live our lives not knowing…

image of many confidence intervals generated for many samples. 95% of confidence intervals include the population value. 5% miss it entirely.

Researchers (mis)interpreting confidence intervals

Hoekstra et al. (2014):

Both researchers and students endorsed, on average, more than three [incorrect] statements [about confidence intervals], indicating a gross misunderstanding of CIs. Self-declared experience with statistics was not related to researchers’ performance […] Researchers hardly outperformed the students, even though the students had not received any education on statistical inference whatsoever.

How not to interpret confidence intervals

No:

“We can be 95% confident that the population value falls between 81.61 and 120.39.”

“95%” in the name refers to coverage, not to how confident we’re feeling.

Also no:

“There is 95% probability that the population value falls between 81.61 and 120.39.”

Think back to the “ladder” of confidence intervals - each of the intervals on that plot shows different limits. So the probability cannot be 95% for every single one of them.

How to interpret confidence intervals

Correct interpretation

More general correct interpretation

ASSUMING THAT our sample is one of the 95% of samples producing confidence intervals that contain the population value, then the population value for the estimate of interest falls somewhere between the lower limit and the upper limit of the interval we’ve computed for our sample.

Memorise and practice!

The bigger picture…

When interpreting estimates and confidence intervals for your sample - always consider them as just one of many different possible estimates
This is why replication is important in science - our sample could easily be the one that misses the population value
Always be wary of studies placing too much certainty on a single finding

Summary

We want to use our sample to estimate some population value (e.g. an average value - a mean)
Confidence intervals help us quantify uncertainty around that estimate
To construct a confidence interval, we use the standard error which we can estimate as:

\[ SE = \frac{SD}{\sqrt N} \]

Lower and upper limits of a 95% confidence interval can be estimated as (replacing 1.96 with critical t for small samples):

\[ \text{CI limits} = \text{mean} \pm (1.96 \times \text{SE}) \]

When sampling repeatedly, 95% of samples produce confidence intervals that contain the true population value. We don’t know if our sample is one of them - we only (rightly or wrongly) assume that it is.

Next week:

Putting it all into practice:

Research questions
Good and less good hypotheses
Testing hypotheses with Null Hypothesis Significance Testing
A disappointing answer to why we’re so obsessed with the value 95%.

References

Hoekstra, Rink, Richard D. Morey, Jeffrey N. Rouder, and Eric-Jan Wagenmakers. 2014. “Robust Misinterpretation of Confidence Intervals.” Psychonomic Bulletin & Review 21 (5): 1157–64. https://doi.org/10.3758/s13423-013-0572-3.

Sharma, Bhakti, Susanna S. Lee, and Benjamin K. Johnson. 2022. “The Dark at the End of the Tunnel: Doomscrolling on Social Media Newsfeeds.” Technology, Mind, and Behavior 3 (1). https://doi.org/10.1037/tmb0000059.
