Chapter 3 Exploring Data with Tables and Graphs

Learning Outcome:


Interpret quantitative data using tables and graphs, and descriptive statistics using histograms.


This chapter introduces techniques, such as frequency tables and histograms, for organizing quantitative data and exploring its important characteristics.

3.1 Frequency Distribution

A frequency distribution (or frequency table) shows how data are partitioned among several categories (or classes) by listing the categories along with the number (frequency) of data values in each of them.

Example: A retailer accepts four types of credit cards and records the type used by each of the last \(50\) customers.

The frequency distribution below presents the frequency of each type of credit card.

\[ \bbox[white,4px] { \color{black} { \begin{array}{c|c} \text{Type of Credit Card} & \text{Frequency} \\ \hline \text{MasterCard} & 11 \\ \text{Visa} & 23 \\ \text{American Express} & 9 \\ \text{Discover} & 7 \\ \hline \text{Total} & 50 \end{array} } } \]

Relative Frequency Distribution

A relative frequency distribution (or percentage frequency distribution) is a variation of the basic frequency distribution in which a class frequency is replaced by a relative frequency (or proportion).

\[ \begin{align} \text{relative frequency for a class} &= \dfrac{\text{frequency for a class}}{\text{sum of all frequencies}} \\ \text{percentage of a class} &= \dfrac{\text{frequency for a class}}{\text{sum of all frequencies}} \times 100\%\\ \end{align} \]

\[ \bbox[white,4px] { \color{black} { \begin{array}{c|c|c} \text{Type of Credit Card} & \text{Frequency} & \text{Relative Frequency} \\ \hline \text{MasterCard} & 11 & 11/50 = 0.22 \\ \text{Visa} & 23 & 23/50 = 0.46 \\ \text{American Express} & 9 & 9/50 = 0.18 \\ \text{Discover} & 7 & 7/50 = 0.14 \\ \hline \text{Total} & 50 & 50/50 = 1 \end{array} } } \]
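The relative frequency formula can be checked with a short sketch. Python is our choice of tool here (the chapter does not prescribe one), and the helper name `relative_frequencies` is hypothetical. It divides each class frequency by the sum of all frequencies, exactly as in the formula above.

```python
# Credit card counts taken from the frequency table above.
card_counts = {
    "MasterCard": 11,
    "Visa": 23,
    "American Express": 9,
    "Discover": 7,
}

def relative_frequencies(counts):
    """Return {category: frequency / sum of all frequencies}."""
    total = sum(counts.values())
    return {category: freq / total for category, freq in counts.items()}

rel = relative_frequencies(card_counts)

# Print each proportion and the matching percentage.
for category, proportion in rel.items():
    print(f"{category:<18}{proportion:.2f}  ({proportion * 100:.0f}%)")
```

Because every class frequency is divided by the same total, the relative frequencies always add up to 1 (or 100%).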

3.1.1 Bar Graphs

A bar graph is a graphical representation of a frequency distribution. It consists of rectangles of equal width, with one rectangle for each category. The heights of the rectangles represent the frequencies or relative frequencies. Following are the frequency and relative frequency bar graphs for the credit card data.
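A plotting library would normally draw the bar graph. As a dependency-free sketch of the same idea, the code below prints one horizontal bar per category, with the bar length equal to the frequency; every bar is one line thick, which plays the role of the equal-width rectangles. The helper `text_bar_graph` is a hypothetical name used only for illustration.

```python
card_counts = {"MasterCard": 11, "Visa": 23, "American Express": 9, "Discover": 7}

def text_bar_graph(counts, symbol="#"):
    """Return one text line per category; bar length equals the frequency."""
    width = max(len(category) for category in counts)  # align the bars
    return [f"{category:<{width}} | {symbol * freq} {freq}"
            for category, freq in counts.items()]

for line in text_bar_graph(card_counts):
    print(line)
```

Replacing each frequency with its relative frequency (scaled to a fixed bar length) would give the relative frequency bar graph, which has the same shape.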

3.2 Frequency Distribution for Quantitative Data

\(\text {Table: Drive-Through Service Times (seconds) for McDonald's Lunches}\)

\[ \bbox[white,4px] { \color{black} { \begin{array}{c|c} \text{Time (seconds)} & \text{Frequency} \\ \hline \text{75-124} & \text{11} \\ \text{125-174} & \text{24} \\ \text{175-224} & \text{10} \\ \text{225-274} & \text{3} \\ \text{275-324} & \text{2} \\ \hline \text{Total} & 50 \end{array} } } \]

Lower class limits: \({75, 125, 175, 225, 275}\)
Upper class limits: \({124, 174, 224, 274, 324}\)
Class boundaries: \({74.5, 124.5, 174.5, 224.5, 274.5, 324.5}\)
Class midpoints: \({99.5, 149.5, 199.5, 249.5, 299.5}\)
Class width: \(125 - 75 = 50\)
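The quantities above follow simple rules: the class width is the difference between consecutive lower class limits, a midpoint is the average of a class's lower and upper limits, and a boundary lies halfway between one class's upper limit and the next class's lower limit. The sketch below applies those rules, assuming equally spaced classes and times recorded to the nearest whole second (so adjacent classes are separated by a gap of 1).

```python
lower_limits = [75, 125, 175, 225, 275]
upper_limits = [124, 174, 224, 274, 324]

# Class width: difference between consecutive lower class limits.
class_width = lower_limits[1] - lower_limits[0]

# Midpoint: average of each class's lower and upper limits.
midpoints = [(lo + hi) / 2 for lo, hi in zip(lower_limits, upper_limits)]

# Boundaries: halfway across the gap between classes (124 -> 125 is a gap of 1).
gap = lower_limits[1] - upper_limits[0]
boundaries = [lower_limits[0] - gap / 2] + [hi + gap / 2 for hi in upper_limits]

print("class width:", class_width)
print("midpoints:  ", midpoints)
print("boundaries: ", boundaries)
```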

\[ \bbox[white,4px] { \color{black} { \begin{array}{c|c} \text{Time (seconds)} & \text{Relative Frequency} \\ \hline \text{75-124} & \text{22}\% \\ \text{125-174} & \text{48}\% \\ \text{175-224} & \text{20}\% \\ \text{225-274} & \text{6}\% \\ \text{275-324} & \text{4}\% \\ \hline \text{Total} & 100\% \end{array} } } \]

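Cumulative Distribution

A cumulative frequency distribution lists, for each class, the sum of the frequencies of that class and all classes before it. Dividing each cumulative frequency by the total (50) gives the cumulative percentage.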

\[ \bbox[white,4px] { \color{black} { \begin{array}{c|c|c} \text{Time (seconds)} & \text{Cumulative Frequency} & \text{Cumulative Percent} \\ \hline \text{75-124} & 11 & \text{22}\% \\ \text{125-174} & 35 & \text{70}\% \\ \text{175-224} & 45 & \text{90}\% \\ \text{225-274} & 48 & \text{96}\% \\ \text{275-324} & 50 & \text{100}\% \\ \hline \end{array} } } \]
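The cumulative column is a running total of the class frequencies. A minimal Python sketch using the standard library's `itertools.accumulate` reproduces both cumulative columns from the drive-through frequencies:

```python
from itertools import accumulate

classes = ["75-124", "125-174", "175-224", "225-274", "275-324"]
frequencies = [11, 24, 10, 3, 2]

total = sum(frequencies)
cumulative = list(accumulate(frequencies))            # running totals
cumulative_pct = [100 * c / total for c in cumulative]

for label, c, pct in zip(classes, cumulative, cumulative_pct):
    print(f"{label:>8}  {c:>3}  {pct:5.1f}%")
```

The last cumulative frequency always equals the total number of observations, so the last cumulative percentage is always 100%.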

3.2.1 Histogram

A histogram is a graph consisting of bars of equal width drawn adjacent to each other. The horizontal scale represents classes of quantitative data values, and the vertical scale represents frequencies.

A relative frequency histogram has the same shape and horizontal scale as a histogram, but the vertical scale uses relative frequencies (as percentages) instead of actual frequencies.

Importance of Histograms

  • Visually displays the shape of the distribution of the data
  • Shows the location of the center of the data
  • Shows the spread of the data
  • Helps identify outliers

Density Histogram

\[ \begin{align} \textbf{Density} &= \dfrac{\textbf{relative frequency}}{\textbf{class width}} \\ \text {Density of class (75-124)} &= \dfrac{\text{rel. freq of class (75-124)}}{\text{class width}} \\ &= \dfrac{0.22}{50} \\ &= 0.0044 \end{align} \]

In a density histogram, the area of each bar (density \(\times\) class width) equals the relative frequency of its class. \(\textbf{The total area of a density histogram is equal to 1.}\)
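The density formula and the total-area property can be verified numerically. This Python sketch uses the drive-through class boundaries and frequencies from the tables above; it computes each class's density and then confirms that the bar areas (density times width) add up to 1.

```python
lower_boundaries = [74.5, 124.5, 174.5, 224.5, 274.5]
upper_boundaries = [124.5, 174.5, 224.5, 274.5, 324.5]
frequencies = [11, 24, 10, 3, 2]

n = sum(frequencies)
widths = [hi - lo for lo, hi in zip(lower_boundaries, upper_boundaries)]
rel_freqs = [f / n for f in frequencies]
densities = [rf / w for rf, w in zip(rel_freqs, widths)]

# Area of each bar = density * width = relative frequency, so the areas sum to 1.
total_area = sum(d * w for d, w in zip(densities, widths))

print("densities:", [round(d, 4) for d in densities])
print("total area:", round(total_area, 6))
```

Here every class has the same width, so the density histogram has the same shape as the relative frequency histogram; densities matter most when class widths differ.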

Practice - Construct a Density Histogram

The accompanying frequency distribution summarizes data on the number of times smokers attempted to quit before their final, successful attempt.

\[ \bbox[yellow,5px] { \color{black} { \begin{array}{r|c|c|c} \text{Number of Attempts} & \text{Frequency} & \text{Relative Frequency} & \text{Density} \\ \hline \textbf{0-10} & 778 & & \\ \textbf{10-20} & 306 & & \\ \textbf{20-30} & 274 & & \\ \textbf{30-40} & 221 & & \\ \textbf{40-50} & 238 & & \end{array} } } \]
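After completing the table by hand, you can check your relative frequencies and densities with the Python sketch below. It assumes each class such as "0-10" means 0 up to (but not including) 10, so every class has width 10; that reading is an assumption, since the table does not say which endpoint belongs to which class.

```python
# Classes from the practice table; each is ASSUMED to have width 10.
classes = ["0-10", "10-20", "20-30", "30-40", "40-50"]
frequencies = [778, 306, 274, 221, 238]
width = 10

n = sum(frequencies)                        # total number of smokers
rel_freqs = [freq / n for freq in frequencies]
densities = [rf / width for rf in rel_freqs]

for label, freq, rf, d in zip(classes, frequencies, rel_freqs, densities):
    print(f"{label:>6}  {freq:>4}  {rf:.3f}  {d:.4f}")

# Sanity check: the bar areas (density * width) should add up to 1.
print("total area:", round(sum(d * width for d in densities), 6))
```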


Pie Charts

The distribution of a categorical variable can be described by a pie chart, which is a disk where slices represent the categories. The proportion of the total area for one slice is equal to the relative frequency for the category represented by the slice. The relative frequencies are usually written as percentages.

Example 1: Construct and Interpret a Pie Chart

A total of 273 children were surveyed about what job they would want to do. The jobs and the percentages of the children who voted for them are shown in the table.

\[ \bbox[yellow,5px] { \color{black} { \begin{array}{r|c} \text{Job} & \text{Percent} \\ \hline \text{Spy/Agent} & 16 \\ \text{Veterinarian} & 13 \\ \text{Professional Athlete} & 12 \\ \text{Movie Star} & 10 \\ \text{Video Game Designer} & 8 \\ \text{Doctor} & 6 \\ \text{Other} & 35 \end{array} } } \]

Questions:

  1. Find the proportion of the observations that fall in the spy category.

  2. Find the proportion of the observations that do NOT fall in the spy category.

  3. Find the proportion of the observations that fall in the athlete category OR fall in the movie-star category.
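To construct the pie chart, each slice's central angle is its relative frequency times 360 degrees. The three questions then reduce to arithmetic on proportions: the complement rule for "NOT spy", and simple addition for "athlete OR movie star", which is valid because each child is counted in exactly one category (the percentages sum to 100). A Python sketch of both steps:

```python
job_percent = {
    "Spy/Agent": 16, "Veterinarian": 13, "Professional Athlete": 12,
    "Movie Star": 10, "Video Game Designer": 8, "Doctor": 6, "Other": 35,
}

# Each slice's central angle is its share of the full 360-degree disk.
angles = {job: pct * 360 / 100 for job, pct in job_percent.items()}

p_spy = job_percent["Spy/Agent"] / 100
p_not_spy = 1 - p_spy                                   # complement rule
# The two categories do not overlap, so their proportions add.
p_athlete_or_star = (job_percent["Professional Athlete"]
                     + job_percent["Movie Star"]) / 100

for job, angle in angles.items():
    print(f"{job:<22}{angle:6.1f} degrees")
print(p_spy, round(p_not_spy, 2), p_athlete_or_star)
```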


Interpreting a Multiple Bar Graph

In a 2012 survey, 1960 adults were asked the following question: “Generally speaking, do you usually think of yourself as a Republican, Democrat, Independent, or other?” The results of the survey are described by the multiple bar graph.

  1. What proportion of women thought of themselves as Democrats?

  2. Which political party did the greatest proportion of men choose?

  3. Compare the proportion of women who thought of themselves as Independents to the proportion of men who thought of themselves as Independents.

  4. A total of 1081 women and 879 men responded to the survey. Were there more women or men who thought of themselves as Independents? How is this possible, given there was a smaller proportion of women who thought of themselves as Independents than men?


Two-Way (Contingency) Table

The table summarizes the responses from all 42 students who participated in the survey about whether they had read a novel in the past year.

\[ \bbox[yellow,5px] { \color{black} { \begin{array}{l|c|c|c} \text{Gender} & \text{Did Not Read Novel} & \text{Read Novel} & \text{Total} \\ \hline \text{Female} & 6 & 19 & 25 \\ \text{Male} & 6 & 11 & 17 \\ \hline \text{Total} & 12 & 30 & 42 \\ \hline \end{array} } } \]

  1. How many of the students read a novel in the past year?
  2. What proportion of the students did not read a novel in the past year?
  3. What proportion of the women read a novel in the past year?
  4. What proportion of the students are men AND read a novel in the past year?
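The four questions use three kinds of proportion from a two-way table: a marginal proportion (a row or column total divided by the grand total), a conditional proportion (one cell divided by its own row total), and a joint proportion (one cell divided by the grand total). A Python sketch with the novel-reading counts:

```python
table = {
    "Female": {"Did Not Read": 6, "Read": 19},
    "Male":   {"Did Not Read": 6, "Read": 11},
}

grand_total = sum(sum(row.values()) for row in table.values())
read_total = sum(row["Read"] for row in table.values())

# Marginal: column total over the grand total.
p_not_read = (grand_total - read_total) / grand_total
# Conditional: restrict attention to the Female row only.
p_read_given_female = table["Female"]["Read"] / sum(table["Female"].values())
# Joint: a single cell over the grand total.
p_male_and_read = table["Male"]["Read"] / grand_total

print(read_total)
print(round(p_not_read, 3), p_read_given_female, round(p_male_and_read, 3))
```

Note that question 3 divides by the number of women (25), not by all 42 students; that change of denominator is what makes it a conditional proportion.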

3.3 Graphical Summaries for Small Data Sets

Dotplot

A dotplot uses dots to show the frequency, or number of occurrences, of the values in a data set. The higher the stack of dots, the greater the number of occurrences there are of the corresponding value.
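A dotplot can be sketched in plain text by counting how often each value occurs and printing one mark per occurrence. In this Python sketch the stacks run sideways (one row per distinct value), the helper `text_dotplot` is a hypothetical name, and the input list is a small made-up data set used only to illustrate stacking.

```python
from collections import Counter

def text_dotplot(data, mark="o"):
    """Return one line per distinct value, with one mark per occurrence."""
    counts = Counter(data)
    return [f"{value:>4} | {mark * counts[value]}" for value in sorted(counts)]

# Hypothetical data, for illustration only.
sample = [3, 5, 5, 6, 6, 6, 7, 9]
for line in text_dotplot(sample):
    print(line)
```

The longest row (value 6) corresponds to the highest stack of dots in an ordinary dotplot.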


Stem-and-leaf Plot

Stem-and-leaf plots are a simple way to display small data sets. In a stem-and-leaf plot, the rightmost digit is the leaf, and the remaining digits form the stem.


Data:

\[ 1.2, 1.5, 1.6, 2.1, 2.4, 2.9, 3.0, 3.1, 3.2, 3.3, \\ 4.5, 4.6, 4.9, 5.0, 5.2, 5.8, 6.0, 6.3, 6.4, 6.5 \]



  The decimal point is at the |

  1 | 256
  2 | 149
  3 | 0123
  4 | 569
  5 | 028
  6 | 0345
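The plot above can be reproduced with a short Python sketch. It assumes nonnegative values with one decimal place: each value is multiplied by 10 to make an integer, the last digit becomes the leaf, and the remaining digits form the stem, matching the rule stated above. The helper name `stem_and_leaf` is hypothetical.

```python
from collections import defaultdict

data = [1.2, 1.5, 1.6, 2.1, 2.4, 2.9, 3.0, 3.1, 3.2, 3.3,
        4.5, 4.6, 4.9, 5.0, 5.2, 5.8, 6.0, 6.3, 6.4, 6.5]

def stem_and_leaf(values, scale=10):
    """Return plot lines; stem = all but the last digit, leaf = last digit.

    `scale` turns values into integers first (scale=10 suits one decimal
    place, so 2.4 -> 24 -> stem 2, leaf 4). Assumes nonnegative values.
    """
    plot = defaultdict(list)
    for v in sorted(values):
        n = round(v * scale)          # round() guards against float error
        plot[n // 10].append(n % 10)
    # Include every stem in range, even empty ones, as a stem plot should.
    return [f"{stem} | {''.join(str(leaf) for leaf in plot[stem])}"
            for stem in range(min(plot), max(plot) + 1)]

for line in stem_and_leaf(data):
    print(line)
```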