A computer monitor with a bar and line charts being displayed.

4.1: Presenting Categorical Data Graphically

Learning Objectives

Upon completion of this section, you should be able to

  • Create a frequency table
  • Create bar graphs
  • Create pie charts
  • Identify common graphical mistakes

Frequency Table

Categorical, or qualitative, data are pieces of information that allow us to classify the objects under investigation into various categories. For example, this could be the color of the car a person drives, the zip code where they reside, or the education level they attained. We usually begin working with categorical data by summarizing the data in a frequency table. The frequency table is organized by identifying the categories found in the data and then counting how many observations there are for each category.

Frequency Table

A frequency table is a table with two columns or two rows. One column (or row) lists the categories, and the other lists the frequencies with which the items in the categories occur (how many items fit into each category).

Example 1

The earned grade of 21 randomly selected students enrolled in mathematics courses at PCC is given below. Organize the data into a frequency table.

B B C C C B I D B B A C B W B D B F D A C


Solution

In this frequency table we will use two rows (often done in textbooks and articles as it uses less vertical space on a page).

The first row will represent the grade classification and the second row will count how many students in the sample of 21 earned that grade.

Grade A B C D F I W
Frequency 2 8 5 3 1 1 1
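The tallying step can also be done with software. The following is a minimal sketch in Python, not part of the original example; the variable names are illustrative. It counts each grade in the Example 1 data with `collections.Counter` from the standard library.

```python
from collections import Counter

# Grades of the 21 students from Example 1, one letter per student.
grades = list("BBCCCBIDBBACBWBDBFDAC")

# Counter tallies how many times each category occurs.
frequency = Counter(grades)

# Print the table in the same category order used in the solution.
for grade in "ABCDFIW":
    print(grade, frequency[grade])
```

The printed counts should agree with the frequency row in the table above, and the counts should add up to the sample size of 21.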

Alternatively, we could have made the table with two columns, where the first column would be the grade and the second column would be the frequency.

In the example below, if we had listed the original data, the colors would appear as a long jumble of words in no particular order, and it would be hard to see how often each color was observed. A frequency table makes that summary easy to read.

Example 2

An insurance company determines vehicle insurance premiums based on known risk factors. If a person is considered a higher risk, their premiums will be higher. One potential factor is the color of your car. The insurance company believes that people with some color cars are more likely to get in accidents. To research this, they examine police reports for recent total-loss collisions. The data is summarized in the frequency table below.

Color Frequency
Blue 25
Green 52
Red 41
White 36
Black 39
Grey 23

I would caution you against drawing conclusions from this data at this point. One thing we did not examine was the number of cars of each color on the road. Hypothetically speaking, if there were twice as many green cars as blue cars on the road, the difference between the number of green cars and blue cars in accidents might be explained by that difference in the number of cars of each color on the road.

Bar Graphs

Sometimes we need an even more intuitive way of displaying data. This is where charts and graphs come in. There are many, many ways of displaying data graphically, but we will concentrate on one very useful type of graph called a bar graph. In this section we will work with bar graphs that display categorical data; the next section will be devoted to bar graphs that display quantitative data.

Bar graph

A bar graph is a graph that displays a bar for each category with the length of each bar indicating the frequency of that category.

To construct a bar graph, we need to draw a vertical axis and a horizontal axis. The vertical direction will have a scale and measure the frequency of each category; the horizontal axis has no scale in this instance. On the horizontal axis we will create bars to represent each category. It is important that the bars are evenly spaced and each have the same width. The construction of a bar chart is most easily described by use of an example.

Example 3

You would start the process with identifying the frequency as the vertical axis values and the horizontal axis being made up of the categories (color of car). On the vertical axis we look at our data to determine the largest value as we will need to make sure the bar graph can be displayed up to that height. Using our car data above we see the highest frequency is 52, so our vertical axis needs to go from 0 to 52, but we might as well use 0 to 55, so that we can put a hash mark every 5 units. For each category you will create a bar going from 0 to the frequency value in the table (as shown below):

Bar Graph for data on number of accidents for each color. Horizontal Axis: 'Vehicle color involved in total-loss collision.' Vertical Axis: 'Frequency.' First bar is labeled Blue with height of 25. Second bar is labeled Green with height of 52. Third bar is labeled Red with height of 41. Fourth bar is labeled White with height of 36. Fifth bar is labeled Black with height of 39. Sixth bar is labeled Grey with a height of 23.
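The choice of vertical scale described in Example 3 can be sketched in a few lines of Python. This is only an illustration: the helper name `axis_top` and the default step of 5 are assumptions made here, not part of the original text. It rounds the largest frequency up to the next multiple of 5 and lists the tick marks.

```python
import math

# Frequencies from the car color table in Example 2.
frequencies = {"Blue": 25, "Green": 52, "Red": 41,
               "White": 36, "Black": 39, "Grey": 23}

def axis_top(values, step=5):
    """Round the largest value up to the next multiple of `step`."""
    return math.ceil(max(values) / step) * step

top = axis_top(frequencies.values())      # largest frequency is 52
ticks = list(range(0, top + 1, 5))        # one tick every 5 units
print(top, ticks)

# A rough text version of the bar graph: one '#' per collision.
for color, count in frequencies.items():
    print(f"{color:<6}{'#' * count} {count}")
```

With the car data the axis runs from 0 to 55, matching the scale chosen in the example.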

Notice that the height of each bar is determined by the frequency of the corresponding color. The horizontal gridlines are a nice touch, but not necessary. In practice, you will find it useful to draw bar graphs using graph paper, so the gridlines will already be in place, or using technology. Instead of gridlines, we might also list the frequencies at the top of each bar, like this:

Bar Graph for data on number of accidents for each color with height of each bar labeled on top of each bar. Horizontal Axis: 'Vehicle color involved in total-loss collision.' Vertical Axis: 'Frequency.' First bar is labeled Blue with height of 25. Second bar is labeled Green with height of 52. Third bar is labeled Red with height of 41. Fourth bar is labeled White with height of 36. Fifth bar is labeled Black with height of 39. Sixth bar is labeled Grey with a height of 23.

These types of graphs are typically easy to create in a spreadsheet program, like Excel or Google Sheets.

In the above example, our chart might benefit from being reordered from largest to smallest frequency values. This arrangement can make it easier to compare similar values in the chart, even without gridlines. When we arrange the categories in decreasing frequency order like this, it is called a Pareto chart.

Video Summary of Examples (4 mins 19 secs – CC)

Pareto chart

A Pareto chart is a bar graph ordered from highest to lowest frequency.

Example 4

Transforming our bar graph from earlier into a Pareto chart, we get:

Bar Graph for data on number of accidents for each color with height of each bar labeled on top of each bar and bars are in decreasing order based on Frequency. Horizontal Axis: 'Vehicle color involved in total-loss collision.' Vertical Axis: 'Frequency.' First bar is labeled Green with height of 52. Second bar is labeled Red with height of 41. Third bar is labeled Black with height of 39. Fourth bar is labeled White with height of 36. Fifth bar is labeled Blue with height of 25. Sixth bar is labeled Grey with a height of 23.
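The reordering that turns a bar graph into a Pareto chart is just a sort by frequency. Here is a small Python sketch of that step using the car color table (variable names are illustrative):

```python
# Frequencies from the car color table in Example 2.
frequencies = {"Blue": 25, "Green": 52, "Red": 41,
               "White": 36, "Black": 39, "Grey": 23}

# Sort the (category, frequency) pairs from highest to lowest frequency.
pareto = sorted(frequencies.items(), key=lambda item: item[1], reverse=True)

for color, count in pareto:
    print(color, count)
```

The sorted list gives the left-to-right order of the bars in the Pareto chart.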

Video Solution Example 3 (1 mins 58 secs – CC)

Example 5

In some cases the bar graph categories may have an implied order (like dates or quarters), and we would not want to rearrange those categories because the graphic would lose its meaning. Take the example below from Statista about the revenue for Zoom. As you read this graph, you see the growth of Zoom's revenue over time. Reordering from largest to smallest would hide that progression and the huge increase in the 2021 fiscal year report (Zoom’s 2021 fiscal year started in February 2020). That huge increase in growth was explained by the start of the Covid-19 pandemic.

Infographic: Zoom's Revenue Skyrockets On Pandemic Boost | Statista

Example 6

In a survey, adults were asked whether they personally worried about a variety of environmental concerns.  The number (out of 1012 surveyed) indicating that they worried “a great deal” about some selected concerns is summarized below.

Environmental Issue Frequency
Pollution of drinking water 597
Contamination of soil and water by toxic waste 526
Air pollution 455
Global warming 354

Construct a bar graph for the data.


Solution

Bar Graph for data on number of responses on an environmental concern. Horizontal Axis: 'Environmental Worries.' Vertical Axis: 'Frequency.' First bar is labeled Water Pollution with height of 597. Second bar is labeled Toxic Waste with height of 526. Third bar is labeled Air Pollution with height of 455. Fourth bar is labeled Global Warming with height of 354.

Since the bars are ordered by frequency from greatest to least, we can call this a Pareto chart.

To show relative sizes, it is common to use a pie chart. In a pie chart a circle is divided into wedges, where each wedge represents a category. The size of each wedge relative to the whole circle matches the frequency of that category relative to all of the data. If one category represents 25% of the data, then the wedge for that category would be 25% of the circle.

Relative Frequency

A relative frequency refers to the proportion of times we observe that category item within a collection of data compared to the total number of observations in that data.

Relative Frequency of a category item = (# of times an observation from a category is observed) / (total number of observations)

Values between 0 and 1: Relative frequency is a proportion and must lie between 0 (never happens) and 1 (always happens).
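The definition above translates directly into a one-line computation. The sketch below is an illustration only; the function name `relative_frequency` is chosen here and the counts are the grade frequencies from Example 1. It also checks the two properties just stated: each value lies between 0 and 1, and together the values account for all of the data.

```python
def relative_frequency(count, total):
    """Proportion of observations that fall in one category."""
    if total <= 0:
        raise ValueError("total must be positive")
    return count / total

# Grade counts from the Example 1 frequency table (21 students).
grade_counts = {"A": 2, "B": 8, "C": 5, "D": 3, "F": 1, "I": 1, "W": 1}
total = sum(grade_counts.values())

proportions = {grade: relative_frequency(count, total)
               for grade, count in grade_counts.items()}

for grade, p in proportions.items():
    print(f"{grade}: {p:.3f}")
```

Multiplying a proportion by 100 expresses the same relative frequency as a percent.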

Pie Chart

A pie chart is a circle with wedges that represent each category's relative frequency.

Example 7

In the insurance company example from above construct a relative frequency table along with a pie chart to represent the car color data that was provided.

Color Frequency
Blue 25
Green 52
Red 41
White 36
Black 39
Grey 23

Solution

To create a relative frequency table for the vehicle color data, we start by adding a new column to our original frequency table and title it Relative Frequency. The relative frequency for each car color is then found by:

$\frac{\text{Category Frequency}}{\text{Total of all Frequencies}} \cdot 100\%$.

For example, to find the Blue relative frequency, first find the total of our frequency column (25+52+41+36+39+23=216) and then calculate the relative frequency:

$\frac{25}{216} \cdot 100\% \approx 11.6\%$

Now do this for each vehicle color:

Color Frequency Relative Frequency
Blue 25 11.6%
Green 52 24.1%
Red 41 19.0%
White 36 16.7%
Black 39 18.1%
Grey 23 10.6%

To find the pie chart we divide a circle into wedges for each color (category) where the relative frequency would be the percent of the circle that is filled up for that color. For our vehicle color data, a pie chart might look like this:

Pie chart labeled 'Vehicle color involved in total-loss collisions' where each slice of the pie is the same color as the vehicle color it represents. The size of the slice is the relative size of each color's count of the whole. Blue has 11.6% of the total area. Green has 24.1% of the total area. Red has 19.0% of the total area. White has 16.7% of the total area. Black has 18.1% of the total area. Grey has 10.6% of the total area. A legend is included which indicates which color on the pie chart is related to each color of the vehicle in a total-loss collision.

When looking at the above pie chart you may have a hard time determining which wedge is the largest, 2nd largest, and so on. Pie charts can often benefit from including frequencies or relative frequencies (percent) in the chart next to the pie slices. Having the category names in a legend next to the pie is often helpful, and most programs also let us attach the names directly to the slices (as seen below).

Pie chart labeled 'Vehicle color involved in total-loss collisions' where each slice of the pie is the same color as the vehicle color it represents. Next to each slice of pie the frequency is also given. The size of the slice is the relative size of each color's count of the whole. Blue has 11.6% of the total area with a frequency of 25. Green has 24.1% of the total area with a frequency of 52. Red has 19.0% of the total area with a frequency of 41. White has 16.7% of the total area with a frequency of 36. Black has 18.1% of the total area with a frequency of 39. Grey has 10.6% of the total area with a frequency of 23.

Video Solution Example 5 (4 mins 49 secs – CC)

The pie chart below shows the percentage of voters supporting each candidate running for a local senate seat. If there are 20,000 voters in the district, the pie chart shows that about 11% of those, about 2,200 voters, support Reeves.

Take note that without the percentages labeled on the graph it would be hard to determine if Ellison had indeed received more votes than Douglas. By including the percentages we give both a visual and numeric way to compare different groups in the pie chart.

Video Explanation (1 mins 1 secs – CC)

Pie charts look nice, but are harder to draw by hand than bar charts since to draw them accurately we would need to compute the angle each wedge cuts out of the circle, then measure the angle with a protractor. Computers are much better suited to drawing pie charts. Common software programs like Excel or Google Sheets are able to create bar graphs, pie charts, and other graph types.
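To see what the hand computation would involve, here is a short Python sketch that finds the angle of each wedge for the vehicle color data from Example 7. It assumes the usual convention that a full circle is 360 degrees, so each wedge gets its relative frequency times 360; the variable names are illustrative.

```python
# Frequencies from the vehicle color table in Example 7.
frequencies = {"Blue": 25, "Green": 52, "Red": 41,
               "White": 36, "Black": 39, "Grey": 23}

total = sum(frequencies.values())

# Each wedge angle is the category's share of the full 360 degrees.
angles = {color: 360 * count / total for color, count in frequencies.items()}

for color, angle in angles.items():
    print(f"{color}: {angle:.1f} degrees")
```

The angles add up to 360 degrees, which is a quick check that no category was left out.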

Try it Now 1

Create a bar graph and a pie chart to illustrate the grades on a history exam below.
A: 12 students, B: 19 students, C: 14 students, D: 4 students, F: 5 students

Answer (click to Show/Hide)

Start with creating a frequency table and adding the relative frequency column (for the pie chart).

Grade Frequency Relative Frequency
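A 12 $\frac{12}{54} \cdot 100\% \approx 22.2\%$
B 19 $\frac{19}{54} \cdot 100\% \approx 35.2\%$
C 14 $\frac{14}{54} \cdot 100\% \approx 25.9\%$
D 4 $\frac{4}{54} \cdot 100\% \approx 7.4\%$
F 5 $\frac{5}{54} \cdot 100\% \approx 9.3\%$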

Both charts are given below.

Two graphs displayed. First is a bar graph with Grade on horizontal axis and Frequency on vertical axis. First bar represents A with a frequency of about 12, second bar represents B with a frequency of about 19, third bar represents C with a frequency of about 14, fourth bar represents D with a frequency of about 4, fifth bar represents F with a frequency of 5. The second graph is a pie chart. Each letter grade represents a different region and has the percentage of students labeled next to the letter grade: A 22%, B 35%, C 26%, D 7%, and F 9%.

Be aware we only did the relative frequency as an exercise to show where the numbers in the pie chart came from. Typical software packages do not require you to do that step.

Common Mistakes on Graphs

Video Summary of Bad Graphs (3 mins 3 secs – CC)

Don’t get fancy with graphs! People sometimes add features to graphs that don’t help to convey their information. For example, 3-dimensional bar charts like the one shown on the right are usually not as effective as their two-dimensional counterparts. This chart makes it very challenging to determine the heights of the bars due to the horizontal axis being skewed. It would be really challenging to determine if there were more blue or more black cars involved in a total-loss collision.

3-dimensional bar graph for number of vehicles of a particular color involved in a total-loss collision.

Here is another way that fanciness can lead to trouble. Instead of plain bars, it is tempting to substitute meaningful images. This type of graph is called a pictogram.

Pictogram

A pictogram is a statistical graphic in which the size of the picture is intended to represent the frequencies or size of the values being represented.

Example 8

A labor union might produce the graph to the right to show the difference between the average manager salary and the average worker salary.

Two money bags are in the picture. The first money bag is labeled Manager Salaries and looks to be four times as large as the second money bag labeled Workers Salaries.

Looking at the picture, it would be reasonable to guess that the manager salaries are 4 times as large as the worker salaries, since the area of the manager bag looks about 4 times as large. However, the manager salaries are in fact only twice as large as the worker salaries, which was reflected in the picture by making the manager bag twice as tall.
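The arithmetic behind this distortion can be written out as a short worked equation. It assumes the picture was enlarged by the same factor $k$ in both height and width so that the bag keeps its shape, which is what the description of the graphic suggests; the symbols $k$, $A_{\text{manager}}$, and $A_{\text{worker}}$ are introduced here only for illustration.

```latex
% Scaling a picture by a factor k in both directions multiplies its area by k^2.
\[
  \frac{A_{\text{manager}}}{A_{\text{worker}}} = k^2 = 2^2 = 4
\]
% The salaries differ by a factor of 2, but the areas differ by a factor of 4.
```

A reader who judges the bags by area therefore sees a difference twice as large as the one in the data.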

Try it Now 2

Carefully examine the 2011 State of the Union address graphic given below. Does anything seem wrong? What caused the error?

State of the Union speech graphic from 2011 where circles are used to represent the size of the Gross Domestic Product. United States $14.6 Trillion, China $5.7 Trillion, Japan $5.3 Trillion, Germany $3.3 Trillion, and France $2.5 Trillion. The circles use the diameter to represent the GDP, which visually causes the area to be much larger than it should be. For instance, it appears you could fit four to five China circles inside the United States circle, yet United States GDP is less than 3 times the size of China's GDP.

Hint 1 (click to Show/Hide)

Visually something is not right. Look at the circles that are created. Does anything seem odd based on the size of the circle and the sizes of the numbers being compared?

Answer (click to Show/Hide)

This type of distortion can be intentional or unintentional, as in the 2011 State of the Union Address shown above. [Image Source dy/dan blog] The error in the image shown in the State of the Union is based upon the diameter being used as the “height” of the graphic, causing the area to be disproportionately larger than it should have been. A rough estimate shows that we could fit six circles the size of China's in the United States region, yet we can see from the values given that United States GDP is not six times the size of China's (it is in fact less than three times the size of China's).
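The estimate of about six China-sized circles can be checked numerically. The Python sketch below uses the GDP figures listed in the graphic and assumes, as the answer describes, that each circle's diameter was drawn proportional to GDP. The last line shows the diameter ratio that would make the areas proportional instead; that correction is a suggestion added here, not something stated in the original graphic.

```python
import math

# GDP values from the 2011 graphic, in trillions of dollars.
gdp = {"United States": 14.6, "China": 5.7, "Japan": 5.3,
       "Germany": 3.3, "France": 2.5}

# How many times larger the United States value really is.
value_ratio = gdp["United States"] / gdp["China"]

# If diameter is proportional to GDP, area grows with the square of the ratio.
area_ratio = value_ratio ** 2

# To make area proportional to GDP, diameters should follow the square root.
fair_diameter_ratio = math.sqrt(value_ratio)

print(f"value ratio:         {value_ratio:.2f}")
print(f"area ratio as drawn: {area_ratio:.2f}")
print(f"fair diameter ratio: {fair_diameter_ratio:.2f}")
```

The values differ by a factor of roughly 2.6, while the drawn areas differ by a factor of roughly 6.6.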

Another distortion in bar charts results from setting the baseline to a value other than zero. The baseline is the bottom of the vertical axis, representing the least number of cases that could have occurred in a category. Normally, this number should be zero. There are times where setting this number higher is needed to show differences in values on the graph, but other times this change dramatically changes the message from the data as shown in the next example.

Example 9

Compare the two graphs below showing support for same-sex marriage rights from a poll taken in December 2008. The truncated vertical scale on the second graph suggests a different story than the true difference in percentages: it makes it look like twice as many people oppose marriage rights as support it.

First graph displays a vertical axis going from 0 to 100, showing the difference in support to be narrow, around 10%. The second graph has a vertical axis going from 40 to 60, so oppose appears three times as large as support, yet the difference is still around 10% when you compare the values on the vertical axis.

In the above example we saw that by changing this vertical axis we are allowing for a different story to be told. On the flip side it is sometimes helpful to do this to allow for the examination of close differences between groups.
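The effect of moving the baseline can be made concrete with a small calculation. The following Python sketch is an illustration under stated assumptions: the percentages 55 and 45 are hypothetical values chosen to be about 10 points apart, since the actual poll values are not listed in the text, and the function name `apparent_ratio` is made up for this example.

```python
def apparent_ratio(larger, smaller, baseline=0):
    """Ratio of the drawn bar heights when the axis starts at `baseline`."""
    if smaller <= baseline:
        raise ValueError("baseline must be below both values")
    return (larger - baseline) / (smaller - baseline)

# Hypothetical poll percentages, for illustration only.
oppose, support = 55, 45

ratio_from_zero = apparent_ratio(oppose, support)               # axis starts at 0
ratio_from_forty = apparent_ratio(oppose, support, baseline=40) # axis starts at 40

print(f"baseline 0:  oppose bar is {ratio_from_zero:.2f} times as tall")
print(f"baseline 40: oppose bar is {ratio_from_forty:.2f} times as tall")
```

The data are identical in both cases; only the starting point of the vertical axis changes how large the gap appears.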

Try it Now 3

A poll was taken asking people if they agreed with the positions of the 4 candidates for a county office. The poll found that 42% agreed with Nguyen’s position, 35% agreed with McKee’s position, 52% agreed with Brown’s position, and 64% agreed with Jones’s position.

Pie Chart with the following regions: 42% agreed with Nguyen's position, 35% agreed with McKee's position, 52% agreed with Brown's position, and 64% agreed with Jones's position. Each slice of the pie has the candidate's name and percent who agreed listed next to it.

Does the pie chart above present a good representation of this data? Explain.

Answer (click to Show/Hide)

A pie chart is inappropriate when a respondent can give an answer that falls into multiple categories (as in this case). You can see this is incorrect since the percentages do not add to 100%. A better approach for this visual would be a bar chart (where you can put the relative frequency on the vertical axis).

Bar chart showing the percent approving candidates position. Google Sheet for Try it Now 3 data.
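The check used in this answer, whether the percentages add to 100%, is easy to automate. The Python sketch below is a minimal illustration; the function name `could_be_pie_chart` and the half-point tolerance for rounding are assumptions made here. Passing the check is necessary for a pie chart but not sufficient, since the categories must also not overlap.

```python
def could_be_pie_chart(percentages, tolerance=0.5):
    """True when the percentages add to about 100 (allowing for rounding)."""
    return abs(sum(percentages) - 100) <= tolerance

# Percent agreeing with each candidate, from Try it Now 3.
agree = {"Nguyen": 42, "McKee": 35, "Brown": 52, "Jones": 64}
poll_total = sum(agree.values())

# Relative frequencies (percent) from the vehicle color table in Example 7.
car_colors = [11.6, 24.1, 19.0, 16.7, 18.1, 10.6]

print(poll_total, could_be_pie_chart(agree.values()))
print(could_be_pie_chart(car_colors))
```

The poll percentages fail the check because each respondent could agree with more than one candidate, while the car color percentages pass it.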

Exercises


Please work on all the problems listed below for homework. You may ask questions in the discussion forum (it is also a great place to compare answers with your classmates).

  1. The table below shows scores on a Math test.
    80 50 50 90 70 70 100 60 70 80 70 50
    90 100 80 70 30 80 80 70 100 60 60 50
    1. Treat the scores 30, 40, 50, 60, 70, 80, 90, and 100 as categories. Complete the frequency table for the Math test scores.
      Test Score
      Score Frequency
      30
      40
      50
      60
      70
      80
      90
      100
    2. Construct a bar graph of the data
    3. Construct a pie chart of the data
    Answer (click to Show/Hide)
    1. Test Score
      Score Frequency
      30 1
      40 0
      50 4
      60 3
      70 6
      80 5
      90 2
      100 3
    2. This is technically a histogram (something you will see in a later section).
      Bar graph math test scores. Google Sheet for Exercise 1
    3. Pie Chart
      Pie chart math test scores Google Sheet for Exercise 1
  2. A group of adults were asked what type (model) of cars they had in their household.
    1. Complete the frequency table for the car model data
    2. Construct a bar graph of the data
    3. Construct a pie chart of the data
    Type (model) of cars in your household
    Ford Kia Jeep Ford Toyota Toyota Chevy Honda Ford Toyota Honda Chevy
    Kia Chrysler Honda Jeep Ford Ford Toyota Kia Ford Toyota Chevy Toyota
    Answer (click to Show/Hide)
    1. Type (model) of cars in your household
      Model Frequency
      Ford 6
      Kia 3
      Jeep 2
      Toyota 6
      Chevy 3
      Honda 3
      Chrysler 1
    2. Bar Graph
      Bar graph car model Google Sheet for HW Exercise 2
    3. Pie Chart
      Pie chart car model Google Sheet for HW Exercise 2
  3. A group of adults were asked how many children they have in their families. The bar graph below shows the number of adults who indicated each number of children.

    Frequency on vertical axis. Number of Children on horizontal axis. 0 Children has frequency 5, 1 Children has frequency 3, 2 Children has frequency 4, 3 Children has frequency 2, 4 Children has frequency 0, 5 Children has frequency 1.

    1. How many adults were questioned?
    2. What percentage of the adults questioned had 0 children?
    Answer (click to Show/Hide)
    1. The total number of adults from the graph:
      5+3+4+2+0+1=15
    2. 5 of the 15 adults had 0 children:
      $\frac{5}{15} \cdot 100\% = 33\frac{1}{3}\% \approx 33.33\%$
  4. Jasmine was interested in how many days it would take an order of a single movie from Netflix to arrive at her door. The graph below shows the data she collected. The frequency represents orders of a single movie.

    Frequency on vertical axis. Shipping time in days on horizontal axis. Shipping time of 1 day has frequency of 4, shipping time of 2 days has frequency of 8, shipping time of 3 days has frequency of 6, shipping time of 4 days has frequency of 0, shipping time of 5 days has frequency of 1.

    1. How many movies in all did she order?
    2. What percentage of the movies arrived in one day? Round to the nearest tenth.
    Answer (click to Show/Hide)
    1. The total number of movies ordered:
      4+8+6+1=19
    2. The percentage of movies that arrived in one day:
      $\frac{4}{19} \cdot 100\% \approx 21.1\%$
  5. The bar graph below shows the percentage of students who received each letter grade on their last English paper. The class contains 20 students.  What number of students earned an A on their paper?

    Vertical axis is frequency in percentage. Horizontal axis has bars representing grade earned. Bar representing an A grade has frequency at 25%. Bar representing a B grade has frequency at 35%. Bar representing C grade has frequency at 25%. Bar representing a D grade has frequency at 15%.

    Answer (click to Show/Hide)

    The graph shows the percent of the class earning a given grade. Looking at the bar for A, we see that 25% of the class earned an A. The class size was 20, so 25% of the 20 students earned an A:

    $\frac{25}{100} \cdot 20 = 5$. This shows 5 students earned an A.

  6. Kori categorized her spending for this month into four categories: Rent, Food, Fun, and Other. The percents she spent in each category are pictured here.  If she spent a total of $2600 this month, how much did she spend on rent?

    Pie Chart. Food represents 24% of the region, Rent represents 26% of the region, Fun represents 16% of the region, and Other represent 34% of the region.

    Answer (click to Show/Hide)

    From the pie chart we see rent represents 26% of her spending. To find the total spent on rent find 26% of 2600 (total spent):

    $\frac{26}{100} \cdot \$2600 = \$676$. Kori spent $676 on rent.

  7. A graph appears below showing the number of adults and children who prefer each type of soda. There were 130 adults and kids surveyed. Discuss some ways in which the graph below could be improved.

    Side by Side bar chart using 3D. Each type of soda has both the Adult and Kids bar shown. No label on Vertical or Horizontal axis. Bars labeled as Coke, Diet Coke, Sprite, and Cherry Coke. Somewhat difficult to read height of bars quickly as the front and back appear at different heights when compared to grid in background.

    Answer (click to Show/Hide)

    It is hard to make comparisons on 3-d graphs because it can be difficult to determine the height of each bar. It would be better to turn this into a 2-d bar graph. The graph is also misleading in that the y-axis values do not start at 0, so a height difference between two bars is magnified and seems larger than it actually is numerically.

  8. The graph below shows the number of complaints for six different airlines as reported to the US Department of Transportation in February 2013. Alaska, Pinnacle, and Airtran Airlines have far fewer complaints reported than American, Delta, and United. Can we conclude that American, Delta, and United are the worst airline carriers since they have the most complaints?

    This is a bar graph with 6 different airlines on the x-axis, and number of complaints on y-axis. The graph is titled Total Passenger Complaints. Data is from an April 2013 DOT report.

    Answer (click to Show/Hide)

    You cannot assume that the numbers of complaints reflect the quality of the airlines. The airlines shown with the greatest number of complaints could be the ones with the most passengers. You must consider the appropriateness of methods for presenting data; in this case displaying totals is misleading as the categories where the data was pulled from (airlines) are not of equal sizes. A more appropriate choice would be to compare the percent of complaints for an airline as it takes into consideration the total number of passengers to compute that percent.

  9. Below is a frequency table that shows the number of covid cases, in thousands, in some Arizona counties on May 7, 2021 (source: azdhs.gov/covid19/data/index.php).
    Arizona Covid Cases
    Arizona County Covid Cases in Thousands
    Maricopa 540
    Pima 115
    Pinal 52
    Yuma 37
    Mohave 23
    Yavapai 19
    1. Construct a bar graph to represent the data for the number of covid cases (in thousands) for the Arizona Counties.
    2. What danger is there to compare the values for each county directly against each other?
    3. From the bar graph it seems clear Maricopa has many more cases of covid when compared to Yuma (about 15 times as many). If you factor in the population for each county we can get a better understanding of the penetration of Covid-19. According to the recent 2019 census, Maricopa has a population of 4,485,414 and Yuma has a population of 209,468. Which county has a higher percent of covid cases?

Attributions

This page contains modified content from David Lippman, “Math In Society, 2nd Edition.” Licensed under CC BY-SA 4.0.

This page contains modified content from “Collecting Data” by Foster et al., LibreTexts is licensed under CC BY-NC-SA 4.0.

This page contains modified content from “OpenStax Introductory Statistics” by Barbara Illowsky, Susan Dean. Licensed under CC BY 4.0.

This page contains content by Robert Foth, Math Faculty, Pima Community College, 2021. Licensed under CC BY 4.0.

The survey data for Example 6 is from Gallup Poll. March 5-8, 2009. http://www.pollingreport.com/enviro.htm

The survey data for Example 9 is from CNN/Opinion Research Corporation Poll. Dec 19-21, 2008, from http://www.pollingreport.com/civil.htm

License


Topics in Mathematics Copyright © by Robert Foth is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.
