This lesson explores more important vocabulary for statistics and probability.
- sample
- survey
- bias
- estimate
- sampling method
- subgroup
- simple random sampling
- overrepresentation
- stratified sampling
- skew
- strata
Samples and surveys
Imagine you want to find out how often Americans eat fast food. It would be impossible to ask every single American about their eating habits. Instead, you can estimate. An estimate is a guess based on evidence. The evidence we need for a good estimate is a sample.
A sample is a smaller set of data taken from a larger population, ideally chosen so that it represents that population well. For example, instead of asking each of the 321 million Americans about fast food, you could ask 10,000 Americans. Collecting data by asking a sample of people questions in this way is called a survey.
An important question arises, however: when you begin your survey, which 10,000 Americans do you ask? Do you ask the first 10,000 people you see? If so, the answers might reflect your area but not the United States as a whole. Do you ask 200 people in each state? If so, does it matter that California has a large population and Rhode Island has a small population?
The answers to all of these questions depend on your sampling method. There are many different sampling methods. Each method is useful for different situations or goals. Below are two common sampling methods.
Simple random sampling
In this method, every member of the population has an equal chance of being selected for the sample. For example, imagine there were 250 students in a public school, and every student's name was assigned a number, 1-250. If forty numbers from 1-250 were randomly selected, this would be an example of simple random sampling. Every student could end up in the sample, and every student has the same chance of being selected.
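The school example above can be sketched in a few lines of Python. This is only an illustration: the function name `simple_random_sample` and the `seed` parameter are made up for this sketch, and the seed is there only so the same draw can be repeated.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Draw sample_size distinct student numbers from 1..population_size."""
    rng = random.Random(seed)
    # rng.sample selects without replacement, so no student is picked twice,
    # and every student number is equally likely to be chosen.
    return rng.sample(range(1, population_size + 1), sample_size)

# Forty of the 250 student numbers, as in the example above.
chosen = simple_random_sample(250, 40, seed=1)
print(sorted(chosen))
```

Running it with a different seed (or none) gives a different set of forty students, which is the point: no student is favored over another.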
- Benefits: Because the selection is random, bias is minimized. In sampling, bias means that the selection systematically favors some members of the population over others, so the sample no longer represents the population. Also, a simple random sample tends to reflect the variation within the total population fairly well.
- Drawbacks: Because the selection is random, it is possible that certain subgroups could be overrepresented. In the example above, if about half of the 250 students are women, you would expect about half of the forty selected students to be women, a subgroup of the population. However, it is possible that simple random sampling could produce a sample that is 80% women, which could skew some kinds of surveys. When data is skewed, it is distorted in one direction and no longer reflects the population accurately. Also, this kind of sampling is impractical for extremely large populations because it requires a complete list of the population. For example, there is no list of every single person in the United States that we could use to make a simple random sample for America.
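To get a feel for how much a subgroup's share can vary by chance, the sketch below repeats the forty-student draw many times. It assumes, purely for illustration, that exactly 125 of the 250 students are women; the function name `share_of_women` and the choice of 1,000 repetitions are also assumptions of this sketch.

```python
import random

def share_of_women(rng, n_women=125, n_men=125, sample_size=40):
    """Return the fraction of women in one simple random sample."""
    # Hypothetical school: "W" marks a woman, "M" marks a man.
    population = ["W"] * n_women + ["M"] * n_men
    sample = rng.sample(population, sample_size)
    return sample.count("W") / sample_size

rng = random.Random(0)
# Repeat the draw 1,000 times and record the share of women each time.
shares = [share_of_women(rng) for _ in range(1000)]

print("smallest share:", min(shares))
print("largest share:", max(shares))
print("average share:", sum(shares) / len(shares))
```

The average should sit close to one half, while the smallest and largest shares show how far a single sample can drift from it.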
Stratified sampling
In stratified sampling, the population is first divided into specific subgroups. These subgroups are called strata (singular: stratum). A random sample is then taken from each stratum. For example, imagine you want a sample of 40 students from the 250 public school students described above, but you want to ensure that all four levels of students (freshmen, sophomores, juniors, and seniors) are proportionally represented. Below is a list of how many students are in each level:
- 50 freshmen
- 50 sophomores
- 100 juniors
- 50 seniors
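In this case, you would randomly select 8 students from the 50 freshmen, 8 students from the 50 sophomores, 16 students from the 100 juniors, and 8 students from the 50 seniors. These selections add up to a sample of 40, and each level contributes 16% of its students, which matches the 40 out of 250 students sampled overall.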
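The proportional split above can be worked out and applied with a short Python sketch. The names `proportional_allocation` and `stratified_sample`, and the made-up student labels such as "juniors-17", are assumptions for illustration only. The whole-number division works here because the class sizes divide evenly; other numbers would need a rounding rule.

```python
import random

# Number of students in each stratum, taken from the list above.
strata_sizes = {"freshmen": 50, "sophomores": 50, "juniors": 100, "seniors": 50}

def proportional_allocation(strata_sizes, sample_size):
    """Give each stratum a share of the sample equal to its share of the population."""
    total = sum(strata_sizes.values())
    return {name: sample_size * size // total for name, size in strata_sizes.items()}

def stratified_sample(strata_sizes, sample_size, seed=None):
    """Take a simple random sample inside each stratum."""
    rng = random.Random(seed)
    allocation = proportional_allocation(strata_sizes, sample_size)
    sample = {}
    for name, size in strata_sizes.items():
        # Hypothetical labels standing in for the students' names.
        members = [f"{name}-{i}" for i in range(1, size + 1)]
        sample[name] = rng.sample(members, allocation[name])
    return sample

allocation = proportional_allocation(strata_sizes, 40)
print(allocation)

sample = stratified_sample(strata_sizes, 40, seed=1)
for level, students in sample.items():
    print(level, len(students))
```

Note that the selection inside each stratum is still random. Only the number of students taken from each level is fixed in advance.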
- Benefits: Because each stratum is represented in proportion to its size, no subgroup is overrepresented or underrepresented by chance, and more information can be gathered about subgroups.
- Drawbacks: Strata must be carefully selected. It would not be useful, for example, to make "tall students" and "short students" strata when trying to research test scores, because height is unlikely to be related to test scores. Additionally, using strata can make surveys and analysis more costly and difficult.
Exercise
Open the exercise to begin the activity. Follow the instructions in the document.