Lesson Plan - Get It!
Take out a piece of paper and draw a random shape.
Now, try to figure out where the center of that random shape is. Sometimes, the idea of center is a little tricky.
- How do you figure out where the center of a set of data is?
- How do you talk about how spread out it is?
The answer to both of these questions is that it depends.
- Upon what exactly does it depend?
Imagine that 11 people walk into the room. Your goal is to describe the heights of the whole group with a single number.
Here are their heights (in inches):
67, 64, 68, 69, 72, 67, 69, 68, 63, 66, 71
One way to describe the heights of these 11 people is with the mean.
The mean is the arithmetic average of all of the numbers. We add all of the numbers together and divide by how many there are, which here is 11. Take a minute to do that yourself.
The mean of these heights is 67.6 inches (rounded to the nearest tenth).
This makes sense as a description of the heights of the group. The individual heights are all fairly close together (only 9 inches separate the shortest from the tallest), and 67.6 inches is close to most of them.
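If you would like to check your arithmetic, here is a minimal Python sketch of the same calculation. The variable names are just illustrative choices, not part of the lesson.

```python
# Heights (in inches) of the 11 people from the lesson.
heights = [67, 64, 68, 69, 72, 67, 69, 68, 63, 66, 71]

total = sum(heights)          # add all of the numbers together
count = len(heights)          # how many numbers there are
mean_height = total / count   # divide the total by the count

print(total, count)           # 744 11
print(round(mean_height, 1))  # 67.6
```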
Another way to describe the heights of these 11 people is with the median.
The median is the middle number when the numbers are in order from least to greatest. Let's put them in order and see which one is in the middle:
63, 64, 66, 67, 67, 68, 68, 69, 69, 71, 72
Now that they are in order, we can see that 68 inches is the middle number (the 6th of the 11 values), so it is the median. Again, this makes sense as a description of the heights of the group. In fact, it is very close to the mean.
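The same steps can be sketched in Python: sort the list, then pick the value in the middle position. The built-in `statistics.median` function is shown only as a cross-check, and the variable names are illustrative.

```python
from statistics import median

heights = [67, 64, 68, 69, 72, 67, 69, 68, 63, 66, 71]

ordered = sorted(heights)             # least to greatest
middle_position = len(ordered) // 2   # index 5, which is the 6th of 11 values

print(ordered)                   # [63, 64, 66, 67, 67, 68, 68, 69, 69, 71, 72]
print(ordered[middle_position])  # 68
print(median(heights))           # 68
```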
OK, now imagine that Tacko Fall, the tallest player in the NBA, walks into the room. Tacko Fall is 7'5" tall, which is 89 inches (7 × 12 + 5 = 89).
- What does that do to the mean?
- How about the median?
Go ahead and calculate them for yourself.
The mean is now 69.4 inches (rounded to the nearest tenth). Tacko Fall's presence in the room increased the mean by almost 2 inches.
The median is the middle number. You may have noticed when you tried to find the median yourself that there is no single number in the middle, because the list now has an even number of values (12). When that happens, you take the average of the two middle numbers.
In this case you would take the average of the 6th and 7th numbers in the ordered list. Both of them are 68, so the median remains at 68. Tacko Fall's presence in the room does not change the median at all.
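Here is a short Python sketch that repeats both calculations with the 89-inch height added, so you can compare the results with your own. The list names `original` and `with_fall` are made up for this illustration.

```python
from statistics import mean, median

original = [67, 64, 68, 69, 72, 67, 69, 68, 63, 66, 71]
with_fall = original + [89]   # add Tacko Fall's height in inches

# The mean moves noticeably when one extreme value is added.
mean_change = mean(with_fall) - mean(original)
print(round(mean(with_fall), 1))  # 69.4
print(round(mean_change, 1))      # 1.8

# With 12 values there is no single middle number,
# so average the 6th and 7th values of the ordered list.
ordered = sorted(with_fall)
sixth, seventh = ordered[5], ordered[6]
median_by_hand = (sixth + seventh) / 2
print(median_by_hand, median(with_fall))  # 68.0 68.0
```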
The mean and the median are both called measures of central tendency.
Let's return to our original question.
- Which descriptor is better -- the mean or the median?
In the first case, with the 11 original people, both numbers provide good descriptions of the entire group of heights.
In the second case, when Tacko Fall has joined the group, the median is a better description of the entire group because, unlike the mean, it is not pulled upward by one very tall man.
Here is a visual of the first group of heights:
Notice that it is fairly symmetric.
While the mean and the median are both excellent measures of central tendency for a symmetric set of data, statisticians will normally use the mean for symmetric data.
As you continue to study statistics, you will understand more about the power of using the mean for symmetric data.
Here is a visual of the second group of heights:
Notice that Tacko Fall's extreme height made the data skew right (the data trails out to the right).
The median is generally the better measure of central tendency for skewed data because it is not pulled toward the extreme individual data points.
We have answered our first question about the center of the data, but describing an entire set of data with a single number does not give us enough information.
We could have a median or a mean that is 68 when all the heights in the entire data set are 68 inches. We could also have a mean or a median of 68 when the shortest height is 50 inches and the tallest height is 80 inches.
That is why we also need to consider how spread out the data is.
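To see this concretely, here is a small Python sketch with two made-up data sets. Both lists are hypothetical and chosen only so that each has a mean and a median of 68. The range (tallest minus shortest) is used here as the simplest possible look at spread.

```python
from statistics import mean, median

# Hypothetical heights, invented for illustration only.
all_same = [68, 68, 68, 68, 68]
spread_out = [50, 62, 68, 80, 80]

for name, data in [("all_same", all_same), ("spread_out", spread_out)]:
    data_range = max(data) - min(data)   # tallest minus shortest
    print(name, mean(data), median(data), data_range)

# all_same 68 68 0
# spread_out 68 68 30
```

The centers match exactly, yet one group has no variation at all and the other spans 30 inches.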
When we use the mean for our measure of central tendency, we use standard deviation for our measure of spread. Standard deviation tells us roughly how far a typical data point lies from the mean.
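When we use the median for our measure of central tendency, we use interquartile range for our measure of spread. The interquartile range is the distance between the first and third quartiles, so it measures how spread out the middle half of the data is.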
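The sketch below computes both measures of spread for the two groups of heights from this lesson. It rests on two assumptions that the lesson does not specify: it uses the population standard deviation (`statistics.pstdev`; the sample version, `statistics.stdev`, divides by one less than the count and gives slightly larger values), and the hypothetical helper `iqr` finds quartiles as the medians of the lower and upper halves of the ordered data. Other quartile conventions give slightly different numbers.

```python
from statistics import median, pstdev

def iqr(values):
    """Interquartile range, assuming the median-of-halves convention."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]    # values below the middle position
    upper = ordered[-half:]   # values above the middle position
    return median(upper) - median(lower)

original = [67, 64, 68, 69, 72, 67, 69, 68, 63, 66, 71]
with_fall = original + [89]

print(round(pstdev(original), 2), iqr(original))    # 2.57 3
print(round(pstdev(with_fall), 2), iqr(with_fall))  # 6.4 3.5
```

Under these assumptions, adding one extreme height more than doubles the standard deviation while the interquartile range barely moves, which mirrors how the mean and the median behaved.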
In the next section, you will practice identifying the best measures of central tendency and spread for various data sets.