© 2024 Khan Academy
Clusters in scatter plots
Learn what a cluster in a scatter plot is!
What are clusters in scatter plots?
Sometimes the data points in a scatter plot form distinct groups. These groups are called clusters.
Consider the scatter plot above, which shows nutritional information for brands of hot dogs in . (Each point represents a brand.) The points form two clusters, one on the left and another on the right.
The left cluster is of brands that tend to be .
The right cluster is of brands that tend to be .
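One way to make the idea of "distinct groups" concrete is to look for a wide empty stretch between points. The sketch below is only an illustration, not a method from this article: it sorts points by their horizontal value and splits them at the largest gap. The function name `split_at_largest_gap` and the sample points are made up for the example and are not the hot dog data.

```python
def split_at_largest_gap(points):
    """Split (x, y) points into two groups at the widest gap in x.

    Illustrative helper (hypothetical name); assumes there are exactly
    two clusters and that they are separated along the x-axis.
    """
    if len(points) < 2:
        raise ValueError("need at least two points to form two groups")
    ordered = sorted(points)  # sorts by x first, then by y
    # Distance in x between each point and the next one to its right.
    gaps = [ordered[i + 1][0] - ordered[i][0] for i in range(len(ordered) - 1)]
    cut = gaps.index(max(gaps))  # position of the widest gap
    return ordered[:cut + 1], ordered[cut + 1:]


# Made-up points for illustration only (not real data).
points = [(1.0, 5.2), (1.4, 4.8), (1.2, 5.5), (6.1, 2.0), (6.5, 2.4), (5.9, 1.7)]
left, right = split_at_largest_gap(points)
print("left cluster:", left)
print("right cluster:", right)
```

This simple rule only works when the groups are separated left to right. Real data sets often need to be judged by eye or with a dedicated clustering method.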
Practice problems
To better wrap our minds around the idea of clusters, let's try a couple of practice problems.
Problem 1: Male and female fish
Adult male Lamprologus callipterus (a type of fish) are much bigger than their female counterparts. They weigh about times as much. Also, while females reach a length of centimeters, males reach a length of centimeters.
Problem 2: SAT test scores
Some high school students in the U.S. take a test called the SAT before applying to colleges. The scatter plot below shows what percent of each state's college-bound graduates participated in the SAT in , along with that state's average score on the math section.
There is a cluster of states with , and a cluster of states with .
Why do clusters exist in data?
Explaining why clusters exist in a particular data set can be difficult. This article presented three data sets, each using data from the real world. Only the fish data set had a clear explanation for its clusters: adult males are much bigger than adult females, so the two sexes form separate groups.
If you have a theory that explains the clusters in either of the other data sets, please share your thoughts in the comments below.