Contributor: Lynn Ellis. Lesson ID: 13758
One of these things is not like the others. One of these things just doesn't belong. Learn how to identify an outlier in a set of data!
Is the height of the giraffe so much more than the height of the people that we would call him an outlier?
In this lesson, we will talk about how a statistician defines an outlier and determines whether a data point is one.
We need to identify outliers because they can influence our statistical analysis a great deal.
Informally, an outlier is a data point that is significantly higher or lower than the rest of the data. "Significantly higher or lower" is not a very precise definition, so we need to formalize the idea a little bit more.
What would be helpful is a way to determine how much higher or lower a particular data point must be to be labeled an outlier.
John Tukey devised a more formal method for labeling a point an outlier. Tukey created "fences" based on the Interquartile Range (IQR). This leads to a more formal definition of an outlier.
An outlier is a data point that lies more than one-and-a-half times the interquartile range below quartile 1 (Q1) or above quartile 3 (Q3), where the interquartile range is Q3 minus Q1.
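The definition above can be sketched in a few lines of Python. This is only an illustration: the function names `tukey_fences` and `is_outlier` are made up for this sketch, and the quartiles in the usage note are invented values, not data from the lesson.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower_fence, upper_fence) for the given quartiles."""
    iqr = q3 - q1       # interquartile range
    step = k * iqr      # one-and-a-half times the IQR by default
    return q1 - step, q3 + step


def is_outlier(value, q1, q3):
    """True if value lies below the lower fence or above the upper fence."""
    lower, upper = tukey_fences(q1, q3)
    return value < lower or value > upper
```

For instance, with hypothetical quartiles Q1 = 10 and Q3 = 20, `tukey_fences(10, 20)` gives fences of -5.0 and 35.0, so a value of 40 would be labeled an outlier while a value of exactly 35 would not.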
Let me show you a couple of examples of finding outliers.
Example 1:
In a statistics class, the ages of all of the people in the room, including the teacher, are as follows:
16, 17, 16, 15, 18, 16, 17, 52, 14, 18, 17, 16, 19, 14, 15
If you put the data in order, you can find the five-number summary (min, Q1, med, Q3, max). For this data, the five-number summary is 14, 15, 16, 18, 52.
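If you would like to check the five-number summary with code, here is a minimal sketch. The lesson does not say which quartile method it uses, so this assumes the common classroom approach: Q1 is the median of the lower half of the ordered data and Q3 is the median of the upper half, leaving out the middle value when the count is odd. The function name `five_number_summary` is just an illustrative choice, and other quartile conventions may give slightly different values.

```python
from statistics import median


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method.

    Assumes the data contains at least two values.
    """
    values = sorted(data)
    half = len(values) // 2
    lower_half = values[:half]                 # values below the middle position
    upper_half = values[len(values) - half:]   # values above the middle position
    return (values[0], median(lower_half), median(values),
            median(upper_half), values[-1])


ages = [16, 17, 16, 15, 18, 16, 17, 52, 14, 18, 17, 16, 19, 14, 15]
print(five_number_summary(ages))  # (14, 15, 16, 18, 52)
```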
IQR = 18 - 15 = 3
3 x 1.5 = 4.5
18 + 4.5 = 22.5
This tells us that any value above 22.5 is an outlier.
15 - 4.5 = 10.5
This tells us that any value below 10.5 is an outlier.
52 is above 22.5, so the teacher's age is an outlier in this data set.
Example 2:
The following data is a set of test scores:
92, 62, 98, 76, 97, 85, 87, 69, 45, 89, 90, 88
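Put in order, the scores are 45, 62, 69, 76, 85, 87, 88, 89, 90, 92, 97, 98, so the five-number summary (min, Q1, med, Q3, max) is 45, 72.5, 87.5, 91, 98.
IQR = 91 - 72.5 = 18.5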
18.5 x 1.5 = 27.75
91 + 27.75 = 118.75
This tells us that any value above 118.75 is an outlier.
72.5 - 27.75 = 44.75
This tells us that any value below 44.75 is an outlier.
There are no values above the upper fence or below the lower fence, so there are no outliers in this data set.
Note that there is a single low test score, a score of 45, that may seem extreme to us. However, it does not fall below the lower fence of 44.75, so it is not an outlier.
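Both examples can be checked from start to finish with a short Python sketch. It makes the same assumption as the worked examples appear to make, namely that Q1 and Q3 are the medians of the lower and upper halves of the ordered data. The name `find_outliers` is hypothetical and chosen only for this illustration.

```python
from statistics import median


def find_outliers(data, k=1.5):
    """Return (lower_fence, upper_fence, outliers) for a list of numbers."""
    values = sorted(data)
    half = len(values) // 2
    q1 = median(values[:half])                 # median of the lower half
    q3 = median(values[len(values) - half:])   # median of the upper half
    step = k * (q3 - q1)                       # 1.5 x IQR by default
    lower, upper = q1 - step, q3 + step
    outliers = [v for v in values if v < lower or v > upper]
    return lower, upper, outliers


scores = [92, 62, 98, 76, 97, 85, 87, 69, 45, 89, 90, 88]
print(find_outliers(scores))  # (44.75, 118.75, [])

ages = [16, 17, 16, 15, 18, 16, 17, 52, 14, 18, 17, 16, 19, 14, 15]
print(find_outliers(ages))    # (10.5, 22.5, [52])
```

The score of 45 sits just above the lower fence of 44.75, which is why the list of outliers for the test scores comes back empty.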
Let's recap what we have learned in this lesson:
⇒ An outlier is any value that falls above the upper fence or below the lower fence.
⇒ To calculate the upper fence, add 1.5 x IQR to Q3.
⇒ To calculate the lower fence, subtract 1.5 x IQR from Q1.
⇒ Remember that values can appear extreme but not, technically, be outliers. It is important to calculate the fences in order to identify outliers.
Move on to the Got It? section to calculate fences and identify outliers.