Presentation is loading. Please wait.

Presentation is loading. Please wait.

Chapter 1: Exploring Data

Similar presentations


Presentation on theme: "Chapter 1: Exploring Data"— Presentation transcript:

1 Chapter 1: Exploring Data
Sec. 1.1 Analyzing Categorical Data

2 Data Analysis: Making Sense of Data
IDENTIFY the individuals and variables in a set of data CLASSIFY variables as categorical or quantitative DISPLAY categorical data with a bar graph IDENTIFY what makes some graphs of categorical data deceptive CALCULATE and DISPLAY the marginal distribution of a categorical variable from a two-way table CALCULATE and DISPLAY the conditional distribution of a categorical variable for a particular value of the other categorical variable in a two-way table DESCRIBE the association between two categorical variables

3 Data Analysis Statistics is the science of data. Data Analysis is the process of organizing, displaying, summarizing, and asking questions about data.

4 Data Analysis Individuals
objects described by a set of data (don’t necessarily have to be people) Variable any characteristic of an individual Categorical Variable places an individual into one of several groups or categories. Quantitative Variable takes numerical values for which it makes sense to find an average. Some number values can be considered categorical – for instance, area codes, floors of a building Some quantitative variables can be converted to categorical – Giving letter grades instead of numerical grades, meeting standards

5 Example, p. 3 CensusAtSchool is an international project that collects data about primary and secondary school students using surveys. Hundreds of thousands of students from Australia, Canada, New Zealand, South Africa, and the United Kingdom have taken part in the project since Data from the surveys are available at the project’s Web site ( We used the site’s “Random Data Selector” to choose 10 Canadian students who completed the survey in a recent year. The table displays the data.

6 Example, p. 3 Who are the individuals in this data set?
What variables were measured? Identify each as categorical or quantitative. Describe the individual in the highlighted row. 10 randomly selected Canadian students who participated in the CensusAtSchool survey. 7 variables: Province, Gender, Dominant Hand, and Preferred Communication are Categorical, and Language spoken, Height, and Wrist Circumference are Quantitative. Student lives in Ontario and is male. He speaks 4 languages, is left-handed, is cm tall with a wrist circumference of 147 mm and prefers text messaging. Notice that the female who is 166 cm tall has a wrist circumference of 65 mm. This is probably wrong. Always check to see if your data makes sense!

7 Data Analysis Distribution
tells us what values a variable takes and how often it takes those values. How to Explore Data Start by determining the variable of interest. Display the data in a graph. Add Numerical Summaries.

8 Categorical Variables
Categorical variables place individuals into one of several groups or categories. Frequency Table Format Count of Stations Adult Contemporary 1556 Adult Standards 1196 Contemporary Hit 569 Country 2066 News/Talk 2179 Oldies 1060 Religious 2014 Rock 869 Spanish Language 750 Other Formats 1579 Total 13838 Relative Frequency Table Format Percent of Stations Adult Contemporary 11.2 Adult Standards 8.6 Contemporary Hit 4.1 Country 14.9 News/Talk 15.7 Oldies 7.7 Religious 14.6 Rock 6.3 Spanish Language 5.4 Other Formats 11.4 Total 99.9 Variable Count Percent “Values” do not necessarily mean numbers. A categorical variable for instance can have a value of “red” if you are talking about colors. Values

9 Count vs. Percent Frequency Table – displays counts in each category
Relative Frequency Table – displays the percent in each category (% found by count in category / total count) Usually percents are rounded. If the rounded percents do not add up to exactly 100%, it may be due to round-off error.

10 Displaying Categorical Data – Bar Graph
Bars don’t touch and must be equally wide Can rearrange categories Scale may be in counts or percents Bar graphs compare several quantities by comparing the heights of bars that represent those quantities. Our eyes, however, react to the area of the bars as well as to their height. Must make bars equally wide because of this.

11 Displaying Categorical Data – Pie Chart
Must be Out of a Whole (100%) Read more about Pie Charts on p. 8. You may get data where they give the % out of that category. This data cannot be displayed in a pie chart.

12 Graphs: Good and Bad There are two important lessons to keep in mind:
Who Buys iMacs? There are two important lessons to keep in mind: beware the pictograph, and watch those scales. It is tempting to replace the bars with pictures for greater eye appeal. Don’t do it!

13 Two-Way Tables and Marginal Distributions
A two-way table describes two categorical variables, organizing counts according to a row variable and a column variable. Typically, we put the explanatory variables in the columns and the response variables in the rows.

14 Example, p. 12 What are the variables described by this two-way table?
I’m Gonna Be Rich! A survey of 4826 randomly selected young adults (aged 19 to 25) asked, “What do you think the chances are you will have much more than a middle-class income at age 30?” The table below shows the responses. What are the variables described by this two-way table? How many young adults were surveyed? Describes two categorical variables: Gender and opinion about becoming rich young adults were surveyed.

15 Two-Way Tables and Marginal Distributions
The marginal distribution of one of the categorical variables in a two-way table of counts is the distribution of values of that variable among all individuals described by the table. Note: Percents are often more informative than counts, especially when comparing groups of different sizes. How to examine a marginal distribution: Use the data in the table to calculate the marginal distribution (in percents) of the row or column totals. (row or column total / table total) Make a graph to display the marginal distribution.

16 Example, p. 13 Examine the marginal distribution of chance of getting rich. Response Percent Almost no chance 194/4826 = 4.0% Some chance 712/4826 = 14.8% A chance 1416/4826 = 29.3% A good chance 1421/4826 = 29.4% Almost certain 1083/4826 = 22.4%

17 Example, p. 13 Response Percent Almost no chance 194/4826 = 4.0%
Some chance 712/4826 = 14.8% A chance 1416/4826 = 29.3% A good chance 1421/4826 = 29.4% Almost certain 1083/4826 = 22.4%

18 Relationships Between Categorical Variables
A conditional distribution of a variable describes the values of that variable among individuals who have a specific value of another variable. How to examine or compare conditional distributions: Select the row(s) or column(s) of interest. Use the data in the table to calculate the conditional distribution (in percents) of the row(s) or column(s). Make a graph to display the conditional distribution. Use a bar graph to compare distributions.

19 Example, p. 15 Calculate the conditional distribution of opinion among males. Examine the relationship between gender and opinion. Response Male Almost no chance 98/2459 = 4.0% Some chance 286/2459 = 11.6% A chance 720/2459 = 29.3% A good chance 758/2459 = 30.8% Almost certain 597/2459 = 24.3% Female 96/2367 = 4.1% 426/2367 = 18.0% 696/2367 = 29.4% 663/2367 = 28.0% 486/2367 = 20.5% Segmented bar graph

20 We can also compare categories with a side-by-side bar graph.

21 Example, p. 15 Can we say there is an association between gender and opinion in the population of young adults? Making this determination requires formal inference, which will have to wait a few chapters. Caution! Even a strong association between two categorical variables can be influenced by other variables lurking in the background.

22 Data Analysis: Making Sense of Data
DISPLAY categorical data with a bar graph IDENTIFY what makes some graphs of categorical data deceptive CALCULATE and DISPLAY the marginal distribution of a categorical variable from a two-way table CALCULATE and DISPLAY the conditional distribution of a categorical variable for a particular value of the other categorical variable in a two-way table DESCRIBE the association between two categorical variables A dataset contains information on individuals. For each individual, data give values for one or more variables. Variables can be categorical or quantitative. The distribution of a variable describes what values it takes and how often it takes them. Inference is the process of making a conclusion about a population based on a sample set of data.

23 Homework – Due Block Day
P. 6 # 3 & 6 P. 20 # 9, 16, 19, 21


Download ppt "Chapter 1: Exploring Data"

Similar presentations


Ads by Google