Identifying Outliers in Data

Contributor: Lynn Ellis. Lesson ID: 13758

One of these things is not like the others. One of these things just doesn't belong. Learn how to identify an outlier in a set of data!

Categories: Measurement and Data, Statistics and Probability
Subject: Math
Learning Style: Auditory
Personality Style: Lion
Grade Level: High School (9-12)
Lesson Type: Quick Query

Lesson Plan - Get It!

[Image: tall giraffe, short kids]

  • Does it look like the giraffe does not belong with the people in that picture?

The giraffe's height is so much greater than the height of the people that he would be called an outlier.

  • What is an outlier?

Keep reading to learn how statisticians define an outlier and how to determine if a data point is an outlier.

An outlier is a data point significantly higher or lower than most of the data.

You need to identify outliers because they can significantly influence statistical analysis.

"Significantly higher or lower" is not precise, so that idea needs to be formalized further.

It would help to have a way to determine how much higher or lower a data point must be to be labeled an outlier.

John Tukey devised a more formal method for labeling a point an outlier. Tukey created fences based on the Interquartile Range (IQR). This leads to a more formal definition of an outlier.

An outlier is a data point that lies more than one-and-a-half times the interquartile range below quartile 1 or above quartile 3.
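If you like to see a definition as code, here is a minimal Python sketch of the fences. The function names `tukey_fences` and `is_outlier` are made up for this illustration, and the quartiles in the usage lines are invented numbers, not data from the lesson.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower_fence, upper_fence) for the given quartiles."""
    iqr = q3 - q1  # interquartile range
    return q1 - k * iqr, q3 + k * iqr


def is_outlier(value, q1, q3):
    """True if value lies more than 1.5 x IQR below Q1 or above Q3."""
    lower, upper = tukey_fences(q1, q3)
    return value < lower or value > upper


# Invented quartiles, for illustration only: Q1 = 10 and Q3 = 20, so IQR = 10.
print(tukey_fences(10, 20))    # (-5.0, 35.0)
print(is_outlier(36, 10, 20))  # True
print(is_outlier(35, 10, 20))  # False: a value exactly on the fence is not beyond it
```

The comparison is strict because the definition says "more than" one-and-a-half times the IQR.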

Explore some examples of finding outliers.

Example 1

In a statistics class, the ages of everyone in the room, including the teacher, are as follows.

16, 17, 16, 15, 18, 16, 17, 52, 14, 18, 17, 16, 19, 14, 15

  • Is the teacher's age an outlier?
  1. Find the Interquartile Range (IQR). Remember that IQR = Q3 - Q1.

If you put the data in order, you can find the five-number summary (min, Q1, median, Q3, max). For this data, the five-number summary is 14, 15, 16, 18, 52.

IQR = 18 - 15 = 3

  2. Next, multiply the IQR by 1.5.

3 x 1.5 = 4.5

  3. Add that value to Q3 to get the upper fence.

18 + 4.5 = 22.5

This tells you that any value above 22.5 is an outlier.

  4. Subtract that value from Q1 to get the lower fence.

15 - 4.5 = 10.5

This tells you that any value below 10.5 is an outlier.

52 is above 22.5, so the teacher's age is an outlier in this data set.
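You can check the steps of Example 1 with a short Python sketch. It assumes the quartiles are found as the medians of the lower and upper halves of the ordered data, leaving the middle value out of both halves when the count is odd. That convention reproduces the five-number summary given above, but other software may compute quartiles differently. The helper name `five_number_summary` is made up for this illustration.

```python
from statistics import median


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max).

    Assumption: Q1 and Q3 are the medians of the lower and upper halves,
    and the middle value is excluded from both halves when the count is odd.
    """
    values = sorted(data)
    half = len(values) // 2
    lower_half = values[:half]
    upper_half = values[-half:]
    return (values[0], median(lower_half), median(values),
            median(upper_half), values[-1])


ages = [16, 17, 16, 15, 18, 16, 17, 52, 14, 18, 17, 16, 19, 14, 15]
minimum, q1, med, q3, maximum = five_number_summary(ages)

iqr = q3 - q1                   # step 1
step = 1.5 * iqr                # step 2
upper_fence = q3 + step         # step 3
lower_fence = q1 - step         # step 4
outliers = [age for age in ages if age < lower_fence or age > upper_fence]

print((minimum, q1, med, q3, maximum))  # (14, 15, 16, 18, 52)
print(lower_fence, upper_fence)         # 10.5 22.5
print(outliers)                         # [52]
```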

Example 2

The following data is a set of test scores.

92, 62, 98, 76, 97, 85, 87, 69, 45, 89, 90, 88

  • Are there any outliers in this data?
  1. Find the Interquartile Range (IQR). For this data, the five-number summary is 45, 72.5, 87.5, 91, 98.

IQR = 91 - 72.5 = 18.5

  2. Multiply the IQR by 1.5.

18.5 x 1.5 = 27.75

  3. Add that value to Q3 to get the upper fence.

91 + 27.75 = 118.75

This tells you that any value above 118.75 is an outlier.

  4. Subtract that value from Q1 to get the lower fence.

72.5 - 27.75 = 44.75

This tells you that any value below 44.75 is an outlier.

There are no values above the upper fence or below the lower fence, so there are no outliers in this data set.

Note that a single low test score of 45 may seem extreme. However, it does not fall below the lower fence of 44.75, so it is not an outlier.
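Example 2 can be checked the same way. The sketch below wraps the four steps in one hypothetical function, `find_outliers`, again assuming quartiles are the medians of the lower and upper halves of the ordered data. It also prints how far the lowest score sits from the lower fence, which shows how close 45 comes to being labeled an outlier.

```python
from statistics import median


def find_outliers(data, k=1.5):
    """Return (lower_fence, upper_fence, outliers) using fences at k x IQR.

    Assumption: Q1 and Q3 are the medians of the lower and upper halves
    of the sorted data.
    """
    values = sorted(data)
    half = len(values) // 2
    q1 = median(values[:half])
    q3 = median(values[-half:])
    step = k * (q3 - q1)
    lower, upper = q1 - step, q3 + step
    return lower, upper, [v for v in values if v < lower or v > upper]


scores = [92, 62, 98, 76, 97, 85, 87, 69, 45, 89, 90, 88]
lower, upper, outliers = find_outliers(scores)

print(lower, upper)         # 44.75 118.75
print(outliers)             # []
print(min(scores) - lower)  # 0.25
```

The lowest score clears the lower fence by only 0.25 points, which is why calculating the fences matters more than judging by eye.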

Recap what you have learned in this lesson.

  An outlier is any value that falls above the upper fence or below the lower fence.
     
  To calculate the upper fence, add 1.5 x IQR to Q3.
     
  To calculate the lower fence, subtract 1.5 x IQR from Q1.
     
  Remember that values can appear extreme without technically being outliers. It is essential to calculate the fences to identify outliers.

 

  • Are you ready to practice your new skills?

Move to the Got It? section to calculate fences and identify outliers.
