
3.1: Basic Concepts

Learning Objectives

Upon completion of this section, you should be able to:

  • Identify the population of a study
  • Determine whether a value calculated from a group is a statistic or a parameter
  • Determine whether a measurement is categorical (qualitative) or quantitative data

Populations and Samples

Video Introduction to Population and Sample (4 mins 47 secs – CC)

Statistics deals with the collection, analysis, interpretation, and presentation of data. We are usually interested in understanding a measurement or observation for a specific group. This group (whether it consists of people, animals, plants, etc.) is known as the population of interest, or simply the population. The population is the collection of all members that share some characteristic; it can be as broad as “all people” if we have a very general research question about human psychology, or it can be extremely narrow, such as “all nursing major students at Pima Community College” if we have a more specific group in mind. The challenge with studying the group as a whole is that, in many cases, examining every member of the population and collecting data from each one would take too much time or cost too much. In that situation we instead collect data from a smaller part of the population, called a sample, which reduces the cost of gathering the information of interest. From that sample we then make inferences about the population. If we are able to collect information from the entire population, we call that a census.

Population, Sample, Census

The population is the group the collected data is intended to describe. A sample is a smaller subset of the entire population, ideally one that is fairly representative of the whole population.

A census is a survey of the entire population (including relevant details of interest).
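To make the three terms concrete, here is a minimal Python sketch. The population is a hypothetical list of 5,000 student ID numbers invented for illustration; the sample is drawn with the standard library's `random.sample`, which picks members without repeating any; and a census simply keeps every member.

```python
import random

# Hypothetical population: ID numbers for 5,000 students (illustrative only).
population = list(range(1, 5001))

# A sample: a smaller subset of the population, chosen at random.
# The seed is fixed only so the sketch gives the same sample every run.
rng = random.Random(2021)
sample = rng.sample(population, k=100)

# A census: data collected from every member of the population.
census = list(population)

print(len(population), len(sample), len(census))  # 5000 100 5000
```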

Example 1

The Pew Research Center asked 1000 women in the U.S. what their highest attained education level was.

What is the Population? What is the Sample?

Solution

Population: The collection of the highest education levels of all women in the U.S.

Sample: The collection of the highest education levels of the 1000 women surveyed in the U.S.

Before we begin gathering and analyzing data, we need to characterize the population we are studying. If we want to study the amount of money spent on textbooks by a typical first-year college student, our population might be all first-year students at your college. Or it might be:

  • All first-year community college students in the state of Washington.
  • All first-year students at public colleges and universities in the state of Washington.
  • All first-year students at all colleges and universities in the state of Washington.
  • All first-year students at all colleges and universities in the entire United States.
  • And so on.

Sometimes the population is called the intended population or target population, since if we design our study badly, the collected data might not actually be representative of that intended (target) population.

Intended or Target Population

The intended population or target population is the theoretical population of interest.

Why is it important to specify the population? Suppose we are interested in the average cost of textbooks for first-year students in higher education in Arizona. Our intended population for this study may be all first-year students in Arizona, but if our sampling focuses only on students at UofA, ASU, and NAU, the results may not apply to first-year students at Arizona’s various other colleges (like Pima Community College). If that average were shared as the general textbook cost for first-year students, it could be higher than the average cost at a college that emphasizes the use of OER (open educational resources), like this textbook, and it could deter students who may not be able to afford that perceived higher cost. Particularly when conveying our results to others, we want to be clear about the population we are describing with our data.
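A minimal sketch of the textbook-cost scenario, using made-up dollar amounts (these are not real data, only an assumption for illustration): if only university students are reached, the average describes that group rather than the intended population of all first-year students.

```python
from statistics import mean

# Hypothetical textbook costs (dollars) for first-year students.
# These numbers are invented purely to illustrate the idea.
university_students = [520, 610, 480, 575, 530]
community_college_students = [150, 90, 210, 120, 180]

# Intended (target) population: first-year students at all colleges.
target_population = university_students + community_college_students

# Real (effective) population if data are gathered only at universities.
real_population = university_students

print(mean(target_population))  # 346.5, the value we meant to describe
print(mean(real_population))    # 543, the value we actually measured
```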

Example 2

We are interested in examining how many years it took, on average, to earn a bachelor’s degree by looking at graduating seniors at American colleges and universities. The entire graduating class from UofA was examined to determine the average number of years.

What is the intended target population? What is the real population?

Solution

While the target (intended) population may have been all graduating seniors of American colleges and universities, the real population of the survey is only graduating seniors from UofA. This would be considered a census of the graduating seniors from UofA, since all of them were examined.

Example 3

A newspaper website contains a poll asking local people in the city for their opinion on a recent news article.

What is the intended target population of this poll? What is the real population of this poll?

Solution

While the target (intended) population may have been all people in the area, the real population of the poll is readers of the website.

Try It Now 1

To determine the average length of trout in a lake, researchers catch 20 fish and measure them. What is the sample and population in this study?

Hint 1

The population of a study is the group the collected data is intended to describe.

A sample is a smaller subset of the entire population, ideally one that is fairly representative of the whole population.

Answer

The sample is the 20 fish caught. The population is all fish in the lake. The sample may be somewhat unrepresentative of the population, since some fish may be too small to take the bait.

Parameter and Statistic

Suppose we want to know the average amount of money spent on textbooks by first-year students at your college during the last academic year (we will define “average” more carefully in a subsequent section). If we were able to gather data on every member of our population, the resulting number would be called a parameter. The problem is that reaching every student and getting the total each one spent would be very costly in money and time (we can’t just pull records from the campus bookstore, since students also purchase textbooks from other sources). Instead, we may take a sample of first-year students and calculate the average from that sample alone. The resulting number from the sample would be called a statistic.

Parameter and Statistic

A parameter is a measurement or value (average, percentage, etc.) calculated using all the data from a population.

A statistic is a measurement or value (average, percentage, etc.) calculated using the data from a sample.

To help remember the difference, notice that the first letters correspond: a Parameter is a value from a Population, and a Statistic is a value from a Sample.

For very large populations we seldom know the value of a parameter of interest, since surveying an entire population is usually very time-consuming and expensive. This problem is really where the study of statistics grew from: statistics gives us the tools to estimate an unknown population parameter by using a well-designed sample to calculate a statistic. Keep in mind that a parameter is a fixed value for the population, but a statistic can change from one sample to another drawn from that same population. The study of statistics includes looking at how well a statistic from a sample estimates the population parameter.
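The difference between a fixed parameter and a changing statistic can be seen in a small simulation. This sketch assumes a hypothetical population of 2,000 textbook-spending amounts generated at random; it computes the parameter once from every value, then computes the same average from five different random samples of 50 students. Each sample typically gives a somewhat different statistic, while the parameter stays the same.

```python
import random
from statistics import mean

rng = random.Random(7)  # fixed seed so the sketch is reproducible

# Hypothetical population: textbook spending (dollars) for 2,000 first-year students.
population = [rng.randint(0, 800) for _ in range(2000)]

# The parameter uses ALL of the population, so it is one fixed number.
parameter = mean(population)

# Each statistic uses only a sample; a new sample usually gives a new value.
statistics_from_samples = [mean(rng.sample(population, k=50)) for _ in range(5)]

print(f"parameter: {parameter:.2f}")
for i, stat in enumerate(statistics_from_samples, start=1):
    print(f"statistic from sample {i}: {stat:.2f}")
```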

We will discuss sampling methods in greater detail in the next section. For now, let us assume that samples are chosen in an appropriate manner if we say a sample is taken.

Example 4

A researcher wanted to know how citizens of Tucson felt about a voter initiative. To study this, she goes to the Park Place Mall and randomly selects 500 shoppers and asks them their opinion. 60% indicate they are supportive of the initiative. What is the sample and population? Is the 60% value a parameter or a statistic?

Solution

The sample is the 500 shoppers questioned. The population is less clear. While the intended population of this survey was Tucson citizens, the effective population was mall shoppers. There is no reason to assume that the 500 shoppers questioned would be representative of all Tucson citizens.

The 60% value was based on the sample, so it is a statistic.

Example 5

A 2021 Gallup poll of 547 adults aged 18 and older living in the U.S. found that 42% of the respondents were satisfied with the nation’s gun laws and 56% were dissatisfied. Is the 42% a parameter or a statistic?

Solution

The 42% would represent a statistic as it is a value calculated from a sample.

Try It Now 2

A college reports that the average age of its students is 28 years old. Is this a statistic or a parameter?

Hint 1

A parameter is a value (average, percentage, etc.) calculated using all the data from a population.

A statistic is a value (average, percentage, etc.) calculated using the data from a sample.

Answer

This is a parameter, since the college would have access to data on all of its students (the population).

Try It Now 3

A study was conducted at TUSD to analyze the average cumulative GPA of students who graduated last year, using a randomly selected group of those graduates. Identify the Population, Sample, Parameter, and Statistic.

Answer

The population is all students who graduated from TUSD last year.

The sample is a group of randomly selected students who graduated from TUSD last year.

The parameter is the average cumulative GPA of all students who graduated from TUSD last year.

The statistic is the average cumulative GPA of the students in the study who graduated from TUSD last year.

Categorizing Data

Video Introduction to Categorizing Data (3 mins 29 secs – CC)

Once we have gathered data, we need to classify it by its properties so that we know which tools we can use to analyze it. The two classifications we are going to look at are categorical (also called qualitative) data and quantitative data.

Quantitative and categorical data

Categorical (qualitative) data are pieces of information that allow us to classify the objects under investigation into various categories.

Quantitative data are responses that are numerical in nature and with which we can perform meaningful arithmetic calculations.

Example 6

We might conduct a survey to determine the name of the favorite movie that each person in a math class saw in a movie theater.

When we conduct such a survey, the responses would look like: Frozen II, Black Panther, or Jurassic World. We might count the number of people who give each answer, but the answers themselves do not have any numerical values: we cannot perform computations with an answer like “Finding Nemo.” Is this categorical or quantitative data?

Solution

This would be categorical data.

Example 7

A survey could ask the number of movies you have seen in a movie theater in the past 12 months. The responses to that survey would look like: 4, 2, 12, 1, 5. Is this data categorical or quantitative?

Solution

This would be quantitative data, since the responses are numbers and calculations such as adding them make sense.

Other examples of quantitative data would be the running time of the movie you saw most recently (104 minutes, 137 minutes, 104 minutes, …) or the amount of money you paid for a movie ticket the last time you went to a movie theater ($5.50, $7.75, $9, …).
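Examples 6 and 7 call for different tools. In this sketch the movie titles come from Example 6 and the counts from Example 7, but the particular mix of repeated titles is hypothetical. Counting how often each category occurs (here with Python’s `collections.Counter`) suits categorical data, while an average (here `statistics.mean`) is meaningful only for quantitative data.

```python
from collections import Counter
from statistics import mean

# Categorical responses (Example 6): hypothetical favorite-movie answers.
favorite_movie = ["Frozen II", "Black Panther", "Jurassic World",
                  "Black Panther", "Frozen II", "Black Panther"]

# Quantitative responses (Example 7): movies seen in a theater in 12 months.
movies_seen = [4, 2, 12, 1, 5]

# For categorical data, counting how often each category appears makes sense.
print(Counter(favorite_movie).most_common())
# [('Black Panther', 3), ('Frozen II', 2), ('Jurassic World', 1)]

# For quantitative data, arithmetic such as an average is meaningful.
print(mean(movies_seen))  # 4.8
```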

Sometimes, determining whether data is categorical or quantitative can be a bit trickier.

Example 8

Suppose we gather respondents’ ZIP codes in a survey to track their geographical location. Is this data categorical or quantitative?

Solution

ZIP codes are numbers, but we can’t do any meaningful mathematical calculations with them. It doesn’t make sense to say that 85005 is “706 less than” 85711; that’s like saying that Phoenix, AZ is “706 less than” Tucson, AZ, which doesn’t make sense at all. So ZIP codes are really categorical data.
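Software will happily do arithmetic on anything stored as a number, which is one reason to store ZIP codes as text. In this sketch (the list of responses is hypothetical), the subtraction runs but tells us nothing about location; a count of respondents per ZIP code is the kind of summary that does make sense for categorical data.

```python
from collections import Counter

# Hypothetical survey responses; ZIP codes stored as text, not numbers.
zip_codes = ["85711", "85005", "85711", "85719", "85711"]

# The computer can subtract the numbers, but the result has no meaning
# as a statement about location.
print(int("85711") - int("85005"))  # 706

# The meaningful summary for categorical data is a count per category.
print(Counter(zip_codes))
```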

Example 9

A survey about the movie you most recently attended includes the question “How would you rate the movie you just saw?” with these possible answers:

  1. it was awful
  2. it was just OK
  3. I liked it
  4. it was great
  5. best movie ever!

Is this data categorical or quantitative?

Solution

Again, there are numbers associated with the responses, but we can’t really do any calculations with them. A movie that rates a 4 is not necessarily twice as good as a movie that rates a 2, whatever that would mean. And if two people see the movie, and one of them thinks it stinks while the other thinks it’s the best ever, it doesn’t necessarily make sense to say that “on average they liked it.” So this data is categorical.

There are cases where data like this, whose values can be put in order, may be treated like quantitative data. Examples are the star-rating systems you see on Amazon and other sites where people review products. The problem is that going from five stars to four stars may not be a large jump, while a drop below three stars may reflect a much more extreme feeling about the product. The rating system can’t capture those differences, so averages of these values may not be appropriate.
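The problem with averaging ratings can be shown with two hypothetical pairs of viewers using the 1 to 5 scale from Example 9. Both pairs have the same average, yet one pair disagrees completely while the other agrees; counting the responses in each category reveals the difference that the average hides.

```python
from collections import Counter
from statistics import mean

# Ratings on the 1-5 scale from Example 9
# (1 = "it was awful", 3 = "I liked it", 5 = "best movie ever!").
# Two hypothetical pairs of viewers:
split_opinions = [1, 5]   # one hated it, one loved it
same_opinion = [3, 3]     # both "liked it"

# The averages are identical...
print(mean(split_opinions), mean(same_opinion))  # 3 3

# ...but counting each category shows how different the responses really are.
print(Counter(split_opinions), Counter(same_opinion))
```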

As we study movie-going habits and preferences, we shouldn’t forget to specify the population under consideration. If we survey 3- to 7-year-olds, the runaway favorite might be Finding Nemo. Teenagers aged 13 to 17 might prefer Terminator 3. And 33- to 37-year-olds might prefer…well, Finding Nemo.

Try It Now 4

Classify each measurement as categorical or quantitative:

  1. Eye color of a group of people
  2. Daily high temperature of a city over several weeks
  3. Annual income
  4. Phone number
Hint 1

Categorical (qualitative) data are pieces of information that allow us to classify the objects under investigation into various categories.

Quantitative data are responses that are numerical in nature and with which we can perform meaningful arithmetic calculations.

Answer
  1. Categorical
  2. Quantitative
  3. Quantitative
  4. Categorical

Exercises


  1. A political scientist surveys 28 of the current 106 representatives in a state’s congress. Of them, 14 said they were supporting a new education bill, 12 said they were not supporting the bill, and 2 were undecided.
    1. What is the population of this survey?
    2. What is the size of the population?
    3. What is the size of the sample?
    4. Give the sample statistic for the proportion of representatives surveyed who said they were supporting the education bill.
    5. Based on this sample, how many of the 106 representatives might we expect to support the education bill?
    Answer
    1. The population is the current representatives in the state’s congress.
    2. 106
    3. 28 (the representatives surveyed)
    4. 14 out of 28 = ½ = 50%
    5. We might expect 50% of the 106 representatives = 53 representatives
  2. The city of Raleigh has 9500 registered voters. There are two candidates for city council in an upcoming election: Brown and Feliz. The day before the election, a telephone poll of 381 randomly selected registered voters was conducted. 112 said they’d vote for Brown, 238 said they’d vote for Feliz, and 31 were undecided.
    1. What is the population of this survey?
    2. What is the size of the population?
    3. What is the size of the sample?
    4. Give the sample statistic for the proportion of voters surveyed who said they’d vote for Brown.
    5. Based on this sample, how many of the 9500 voters might we expect to vote for Brown?
    Answer
    1. The registered voters in Raleigh
    2. 9500 registered voters.
    3. The sample was the 381 randomly selected registered voters.
    4. The sample statistic is 112/381 or 29.40%
    5. We would expect 29.40% of the 9500 registered voters to vote for Brown, or $\frac{112}{381} \times 9500 \approx 2{,}793$.
  3. In a study, you ask the subjects their age in years. Is this data categorical or quantitative?
    Answer
    Quantitative
  4. In a study, you ask the subjects their gender. Is this data categorical or quantitative?
    Answer
    Categorical

  5. In a study, you ask subjects their social security number. Is this data categorical or quantitative?
    Answer
    Categorical
  6. In a study, you ask subjects their gross income. Is this data categorical or quantitative?
    Answer
    Quantitative
  7. A survey of Pima Community College students, conducted by randomly selecting students taking classes at the Downtown Campus, found that 45 out of 97 students surveyed have taken an online course.
    1. What is the population of this survey?
    2. What is the intended population of this survey?
    3. What is the size of the sample?
    4. Give the sample statistic for the proportion of students who said they have taken an online course.
    Answer
    1. Students from Pima Community College at the Downtown Campus.
    2. All students at Pima Community College.
    3. The sample size was 97 students.
    4. The sample statistic is 45/97, or about 46.39%.
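The arithmetic in Exercises 1, 2, and 7 follows one pattern: divide the number of responses of interest by the sample size to get the sample statistic, then, to estimate a count for the whole population, multiply that proportion by the population size. The helper names `sample_proportion` and `population_estimate` below are hypothetical, chosen only for this sketch.

```python
def sample_proportion(successes, sample_size):
    """Sample statistic: the fraction of the sample giving a particular answer."""
    return successes / sample_size


def population_estimate(successes, sample_size, population_size):
    """Scale the sample proportion up to the whole population."""
    return sample_proportion(successes, sample_size) * population_size


# Exercise 1: 14 of 28 representatives support the bill; 106 representatives in all.
print(sample_proportion(14, 28))                   # 0.5
print(population_estimate(14, 28, 106))            # 53.0

# Exercise 2: 112 of 381 voters chose Brown; 9500 registered voters in all.
print(round(sample_proportion(112, 381), 4))       # 0.294
print(round(population_estimate(112, 381, 9500)))  # 2793

# Exercise 7: 45 of 97 students have taken an online course.
print(round(sample_proportion(45, 97), 4))         # 0.4639
```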

Attributions

This page contains modified content from “Psychology”, OpenStax College. Licensed under CC BY 4.0. Download for free at http://cnx.org/content/col11629/latest

This page contains modified content from “Collecting Data” by Foster et al., LibreTexts. Licensed under CC BY-NC-SA.

This page contains modified content from “OpenStax Introductory Statistics” by Barbara Illowsky, Susan Dean. Licensed under CC BY 4.0.

This page contains content by Robert Foth, Math Faculty, Pima Community College, 2021. Licensed under CC BY 4.0.

License


Topics in Mathematics Copyright © by Robert Foth is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.
