# What Can You Do with Outliers?

Contributor: Lynn Ellis. Lesson ID: 13759

Outliers can disproportionately affect your analysis of your data. Can you just remove them? What happens if you do? In this lesson, we will explore the answers to those questions.

Categories: Measurement and Data, Statistics and Probability
Subject: Math
Learning style: Auditory
Personality style: Lion, Beaver
Grade level: High School (9-12)
Lesson type: Quick Query

## Lesson Plan - Get It!


Today, we are going to be data detectives!

We have to be good detectives because our detective work will tell us whether we can eliminate an outlier or not.

(If you need a refresher on what an outlier is, check out our lesson under the Additional Resources in the right-hand sidebar.)

Outliers can cause problems when evaluating data because they can pull the mean of the data set toward themselves and give misleading information.

For example, nine students in a statistics class take all of the change out of their pockets and put it on their desks. Here are the amounts of money (in cents) that each student had:

50, 67, 0, 97, 76, 87, 65, 85, 75

When we calculate the upper and lower fences, we find that anything above 128.75 or below 14.75 is an outlier. So zero is an outlier in my data set.
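If you want to check the fences yourself, here is a minimal Python sketch. It assumes the usual 1.5 × IQR rule and a quartile convention that leaves the median out of each half of the data (the `method="exclusive"` option of the standard library's `statistics.quantiles`). The lesson does not say which convention it used, so treat that choice as an assumption; it does reproduce the upper fence of 128.75. The variable names are just for illustration.

```python
import statistics

# Change (in cents) from the nine students in the example
cents = [50, 67, 0, 97, 76, 87, 65, 85, 75]

# Quartiles, assuming the "exclusive" convention (median left out of each half)
q1, median, q3 = statistics.quantiles(cents, n=4, method="exclusive")
iqr = q3 - q1

# Fences, assuming the 1.5 * IQR rule
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Any value outside the fences is flagged as an outlier
outliers = [x for x in cents if x < lower_fence or x > upper_fence]

print(q1, q3, iqr)               # 57.5 86.0 28.5
print(lower_fence, upper_fence)  # 14.75 128.75
print(outliers)                  # [0]
```

A different quartile convention would move the fences somewhat, so it is worth knowing which one your calculator or software uses.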

If I take the mean of the data set with the zero in it, I get a mean of 66.89 cents. However, if I take the mean without the zero in the data set, I get a mean of 75.25 cents.
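The two means can be checked with a few lines of Python. This is only a sketch using the standard library's `statistics.mean`; the variable names are made up for illustration.

```python
import statistics

cents = [50, 67, 0, 97, 76, 87, 65, 85, 75]

# Same data with the zero left out
without_zero = [x for x in cents if x != 0]

mean_all = statistics.mean(cents)
mean_without_zero = statistics.mean(without_zero)

print(round(mean_all, 2))                      # 66.89
print(round(mean_without_zero, 2))             # 75.25
print(round(mean_without_zero - mean_all, 2))  # 8.36
```

Dropping a single value out of nine shifts the mean by a little over 8 cents.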

• If I want a single number that describes the average amount of money that a student in the class has in their pockets, which is a better representation?

That one outlier has had a significant influence: it pulls the mean down by more than 8 cents.

• Can I just eliminate it and call the mean of 75.25 a better representation?

The answer is, it depends. This is where we have to start being detectives.

A good detective wants to answer specific questions. For us as data detectives, those questions are:

• Is there a mistake in the recording of the data?

If so, we should remove that data point.

• Was there a mistake in measurement?

If so, we should remove that data point.

• Is the outlier from a member of the population that is in our sample?

If it is, we should not remove it.

• Is the extreme value of the outlier due to natural variability?

If it is, we should not remove it.
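One way to picture the questions above is as a small decision rule. The sketch below is only an illustration, not a formal statistical procedure. The function name `outlier_decision` and its parameters are hypothetical, and each argument stands for something you have actually confirmed through detective work, not something you merely suspect.

```python
def outlier_decision(recording_error, measurement_error, in_population):
    """Return "remove" or "keep" for an outlier, following the questions above.

    Each argument is True only if the detective work has confirmed it.
    This is a hypothetical helper for illustration.
    """
    # A confirmed recording or measurement mistake means the value is not real data
    if recording_error or measurement_error:
        return "remove"
    # A value from outside the population we are studying does not belong
    if not in_population:
        return "remove"
    # Otherwise the value is a real member of the population,
    # so its extremeness is treated as natural variability
    return "keep"


print(outlier_decision(False, False, True))   # keep
print(outlier_decision(True, False, True))    # remove
print(outlier_decision(False, False, False))  # remove
```

Notice that an unconfirmed suspicion counts as `False` here: without further information, we cannot assume a mistake was made.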

Let's apply these questions to our example above.

• Is there a mistake in the recording of the data?

It's possible that the student had 50 cents in his or her pocket, and it got recorded without the 5. But there are people who do not carry change in their pockets, so that is an assumption that we cannot make without further information.

• Was there a mistake in measurement?

Since the number is a zero, it is doubtful that someone counted incorrectly. Again, we can't assume a mistake here without further information.

• Is the outlier from a member of the population that is in our sample?

If the person with no change in their pockets was not a member of the statistics class, then they would not be a member of the population we are looking at.

For instance, if a visitor came to class that day and had no change but everyone in the class did, that outlier would not be from a member of the population. In that case, we would want to remove the outlier from the data set.

If the person with no money in his or her pocket was part of the class, we must keep the outlier.

• Is the extreme value of the outlier due to natural variability?

In this case, it likely is due to natural variability. Enough people do not carry money in their pockets that we can see the zero as a naturally occurring variation in the data. For this reason, we would not want to remove the data point.

As you can see, removing an outlier is a matter of being a detective and making an informed judgment.