# Identifying Outliers in Data

Contributor: Lynn Ellis. Lesson ID: 13758

One of these things is not like the others. One of these things just doesn't belong. Learn how to identify an outlier in a set of data!

Categories: Measurement and Data, Statistics and Probability

Subject: Math

Learning style: Auditory

Personality style: Lion

Grade level: High School (9-12)

Lesson type: Quick Query

## Lesson Plan - Get It!

• Does it look to you like the giraffe does not belong with the people in that picture?

Is the height of the giraffe so much more than the height of the people that we would call him an outlier?

In this lesson, we will talk about how a statistician defines an outlier, and how he or she determines if a data point is an outlier.

• An outlier is a data point that is significantly higher or lower than the majority of the data.

We need to identify outliers because they can greatly influence a statistical analysis.

"Significantly higher or lower" is not a very precise definition, so we need to formalize the idea a little more.

It would be helpful to have a way to determine how much higher or lower a particular data point must be to be labeled an outlier.

John Tukey devised a more formal method for labeling a point an outlier. Tukey created "fences" based on the Interquartile Range (IQR). This leads to a more formal definition of an outlier.

An outlier is a data point that lies more than one-and-a-half times the interquartile range below quartile 1 or above quartile 3.
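As a minimal sketch, the definition above can be written as two small Python functions. The names `tukey_fences` and `is_outlier` are made up for this illustration, and the quartiles in the usage lines are hypothetical values, not data from the lesson.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower_fence, upper_fence) for the given quartiles."""
    iqr = q3 - q1                      # interquartile range
    return q1 - k * iqr, q3 + k * iqr


def is_outlier(value, q1, q3):
    """True when value lies strictly outside the fences."""
    lower, upper = tukey_fences(q1, q3)
    return value < lower or value > upper


# Hypothetical quartiles, chosen only for illustration
print(tukey_fences(10, 20))      # (-5.0, 35.0)
print(is_outlier(40, 10, 20))    # True
print(is_outlier(30, 10, 20))    # False
```

The multiplier is left as a parameter `k` so the 1.5 in the definition is visible in one place.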

Let me show you a couple of examples of finding outliers.

Example 1:

In a statistics class, the ages of all of the people in the room, including the teacher, are as follows:

16, 17, 16, 15, 18, 16, 17, 52, 14, 18, 17, 16, 19, 14, 15

• Is the teacher's age an outlier?

1. Find the Interquartile Range (IQR). Remember that IQR = Q3 - Q1.

If you put the data in order, you can find the five-number summary (min, Q1, med, Q3, max). For this data, the five-number summary is 14, 15, 16, 18, 52.

IQR = 18 - 15 = 3

2. Next, multiply the IQR by 1.5.

3 x 1.5 = 4.5

3. Add that value to Q3 to get the upper fence.

18 + 4.5 = 22.5

This tells us that any value above 22.5 is an outlier.

4. Subtract that value from Q1 to get the lower fence.

15 - 4.5 = 10.5

This tells us that any value below 10.5 is an outlier.

52 is above 22.5, so the teacher's age is an outlier in this data set.
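The steps in Example 1 can be checked with a short Python sketch. It assumes quartiles are found by the median-of-halves method (for an odd count, the middle value is left out of both halves), which reproduces the five-number summary stated above. The helper name `five_number_summary` is made up for this illustration.

```python
from statistics import median


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method.

    Assumption: for an odd count, the middle value is left out of both
    halves. Needs at least two values.
    """
    values = sorted(data)
    half = len(values) // 2
    lower_half = values[:half]       # values below the median position
    upper_half = values[-half:]      # values above the median position
    return (values[0], median(lower_half), median(values),
            median(upper_half), values[-1])


ages = [16, 17, 16, 15, 18, 16, 17, 52, 14, 18, 17, 16, 19, 14, 15]

minimum, q1, med, q3, maximum = five_number_summary(ages)
step = 1.5 * (q3 - q1)                           # 1.5 x IQR
lower_fence, upper_fence = q1 - step, q3 + step
outliers = [age for age in ages if age < lower_fence or age > upper_fence]

print((minimum, q1, med, q3, maximum))   # (14, 15, 16, 18, 52)
print(lower_fence, upper_fence)          # 10.5 22.5
print(outliers)                          # [52]
```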

Example 2:

The following data is a set of test scores:

92, 62, 98, 76, 97, 85, 87, 69, 45, 89, 90, 88

• Are there any outliers in this data?

1. Find the Interquartile Range (IQR). For this data, the five-number summary is 45, 72.5, 87.5, 91, 98.

IQR = 91 - 72.5 = 18.5

2. Multiply the IQR by 1.5.

18.5 x 1.5 = 27.75

3. Add that value to Q3 to get the upper fence.

91 + 27.75 = 118.75

This tells us that any value above 118.75 is an outlier.

4. Subtract that value from Q1 to get the lower fence.

72.5 - 27.75 = 44.75

This tells us that any value below 44.75 is an outlier.

There are no values above the upper fence or below the lower fence, so there are no outliers in this data set.

Note that there is a single low test score, 45, that may seem extreme to us. However, it does not fall below the lower fence of 44.75, so it is not an outlier.
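To see how close the score of 45 comes to the lower fence, here is a small Python sketch of Example 2. The quartiles are typed in from the five-number summary given in the lesson rather than computed, because software quartile routines can follow different conventions. The variable names are chosen only for this illustration.

```python
scores = [92, 62, 98, 76, 97, 85, 87, 69, 45, 89, 90, 88]

# Q1 and Q3 are taken from the five-number summary stated in the lesson
q1, q3 = 72.5, 91

step = 1.5 * (q3 - q1)                       # 1.5 x IQR
lower_fence, upper_fence = q1 - step, q3 + step

outliers = [s for s in scores if s < lower_fence or s > upper_fence]
margin = min(scores) - lower_fence           # gap between lowest score and the fence

print(lower_fence, upper_fence)   # 44.75 118.75
print(outliers)                   # []
print(margin)                     # 0.25
```

The lowest score clears the fence by only a quarter of a point, which is why it looks extreme without being labeled an outlier.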

Let's recap what we have learned in this lesson:

⇒ An outlier is any value that falls above the upper fence or below the lower fence.

⇒ To calculate the upper fence, add 1.5 x IQR to Q3.

⇒ To calculate the lower fence, subtract 1.5 x IQR from Q1.

⇒ Remember that values can appear extreme but not, technically, be outliers. It is important to calculate the fences in order to identify outliers.