Contributor: Lynn Ellis. Lesson ID: 13758
One of these things is not like the others. One of these things just doesn't belong. Learn how to identify an outlier in a set of data!
Picture a giraffe standing among a group of people. The giraffe's height is so much greater than the heights of the people that it would be called an outlier.
Keep reading to learn how statisticians define an outlier and how to determine if a data point is an outlier.
An outlier is a data point significantly higher or lower than most of the data.
You need to identify outliers because they can significantly influence statistical analysis.
"Significantly higher or lower" is not precise, so that idea needs to be formalized further.
It would help to have a way to determine how much higher or lower a particular data point must be to be labeled an outlier.
John Tukey devised a more formal method for labeling a point an outlier. Tukey created fences based on the interquartile range (IQR), the distance between quartile 1 (Q1) and quartile 3 (Q3). This leads to a more formal definition of an outlier.
An outlier is a data point that lies more than one-and-a-half times the interquartile range below quartile 1 or above quartile 3.
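The definition above can be sketched as a small Python function. This is a minimal illustration, not part of the original lesson: the name `tukey_fences` and the parameter `k` (the 1.5 multiplier) are assumptions chosen for this example, and the quartile values in the usage line are made up.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower_fence, upper_fence) for the given quartiles.

    k is the multiplier applied to the IQR; 1.5 matches the definition above.
    """
    iqr = q3 - q1                      # interquartile range
    return q1 - k * iqr, q3 + k * iqr  # lower fence, upper fence


# Hypothetical quartiles, used only to show the arithmetic.
lower, upper = tukey_fences(10, 20)
print(lower, upper)  # -5.0 35.0
```

Any data point below the first number or above the second would be labeled an outlier.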
Explore some examples of finding outliers.
Example 1
In a statistics class, the ages of everyone in the room, including the teacher, are as follows.
16, 17, 16, 15, 18, 16, 17, 52, 14, 18, 17, 16, 19, 14, 15
If you put the data in order, you can find the five-number summary (min, Q1, med, Q3, max). For this data, the five-number summary is 14, 15, 16, 18, 52.
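One way to reproduce this five-number summary is sketched below. It assumes the "median of each half" convention for quartiles (the overall median is left out of both halves when the number of values is odd), which agrees with the numbers in this lesson. The function name `five_number_summary` is an assumption made for illustration, and other quartile conventions used by software can give slightly different Q1 and Q3 values.

```python
from statistics import median


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower_half = values[:half]
    # For an odd count, skip the middle value so it belongs to neither half.
    upper_half = values[half + 1:] if n % 2 else values[half:]
    return (values[0], median(lower_half), median(values),
            median(upper_half), values[-1])


ages = [16, 17, 16, 15, 18, 16, 17, 52, 14, 18, 17, 16, 19, 14, 15]
print(five_number_summary(ages))  # (14, 15, 16, 18, 52)
```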
IQR = 18 - 15 = 3
3 x 1.5 = 4.5
18 + 4.5 = 22.5
This tells you that any value above 22.5 is an outlier.
15 - 4.5 = 10.5
This tells you that any value below 10.5 is an outlier.
52 is above 22.5, so the teacher's age is an outlier in this data set.
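The same check can be written as a short Python sketch. It takes Q1 = 15 and Q3 = 18 directly from the five-number summary in this example rather than computing them, and the variable names are assumptions chosen for illustration.

```python
ages = [16, 17, 16, 15, 18, 16, 17, 52, 14, 18, 17, 16, 19, 14, 15]

# Quartiles taken from the five-number summary in Example 1.
q1, q3 = 15, 18
iqr = q3 - q1                  # 3
lower_fence = q1 - 1.5 * iqr   # 15 - 4.5
upper_fence = q3 + 1.5 * iqr   # 18 + 4.5

# Keep only the values that fall outside the fences.
outliers = [age for age in ages if age < lower_fence or age > upper_fence]

print(lower_fence, upper_fence)  # 10.5 22.5
print(outliers)                  # [52]
```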
Example 2
The following data is a set of test scores.
92, 62, 98, 76, 97, 85, 87, 69, 45, 89, 90, 88
Putting the data in order gives the five-number summary 45, 72.5, 87.5, 91, 98.
IQR = 91 - 72.5 = 18.5
18.5 x 1.5 = 27.75
91 + 27.75 = 118.75
This tells you that any value above 118.75 is an outlier.
72.5 - 27.75 = 44.75
This tells you that any value below 44.75 is an outlier.
There are no values above the upper fence or below the lower fence, so there are no outliers in this data set.
Note that a single low test score of 45 may seem extreme. However, it is not extreme enough to fall below the lower fence, so it is not an outlier.
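The whole procedure for this example, from raw scores to the list of outliers, might look like the following Python sketch. The function name `find_outliers` and its parameter `k` are assumptions made for illustration, and the quartiles are again computed as the medians of the lower and upper halves of the sorted data, which matches the values used in this lesson.

```python
from statistics import median


def find_outliers(data, k=1.5):
    """Return (lower_fence, upper_fence, outliers) for a list of numbers."""
    if len(data) < 2:
        raise ValueError("need at least two values to compute quartiles")
    values = sorted(data)
    half = len(values) // 2
    q1 = median(values[:half])    # median of the lower half
    q3 = median(values[-half:])   # median of the upper half
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return lower_fence, upper_fence, outliers


scores = [92, 62, 98, 76, 97, 85, 87, 69, 45, 89, 90, 88]
print(find_outliers(scores))  # (44.75, 118.75, [])
```

The empty list confirms that the score of 45, sitting just above the lower fence of 44.75, is not flagged.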
Recap what you have learned in this lesson.
⇒ An outlier is any value that falls above the upper fence or below the lower fence.
⇒ To calculate the upper fence, add 1.5 x IQR to Q3.
⇒ To calculate the lower fence, subtract 1.5 x IQR from Q1.
⇒ Remember that values can appear extreme without technically being outliers. It is essential to calculate the fences to identify outliers.
Move to the Got It? section to calculate fences and identify outliers.