Chapter 1: Exploring Data Sec. 1.1 Analyzing Categorical Data.

1 Chapter 1: Exploring Data Sec. 1.1 Analyzing Categorical Data

2 Learning Objectives
The Practice of Statistics, 5th Edition
Data Analysis: Making Sense of Data
After this section, you should be able to:
- IDENTIFY the individuals and variables in a set of data
- CLASSIFY variables as categorical or quantitative
- DISPLAY categorical data with a bar graph
- IDENTIFY what makes some graphs of categorical data deceptive
- CALCULATE and DISPLAY the marginal distribution of a categorical variable from a two-way table
- CALCULATE and DISPLAY the conditional distribution of a categorical variable for a particular value of the other categorical variable in a two-way table
- DESCRIBE the association between two categorical variables

3 Data Analysis Statistics is the science of data. Data Analysis is the process of organizing, displaying, summarizing, and asking questions about data.

4 Data Analysis
Individuals: objects described by a set of data (they don’t necessarily have to be people).
Variable: any characteristic of an individual.
Categorical Variable: places an individual into one of several groups or categories.
Quantitative Variable: takes numerical values for which it makes sense to find an average.

5 Example, p. 3 CensusAtSchool is an international project that collects data about primary and secondary school students using surveys. Hundreds of thousands of students from Australia, Canada, New Zealand, South Africa, and the United Kingdom have taken part in the project since 2000. Data from the surveys are available at the project’s Web site (www.censusatschool.com). We used the site’s “Random Data Selector” to choose 10 Canadian students who completed the survey in a recent year. The table displays the data.

6 Example, p. 3 (a) Who are the individuals in this data set? (b) What variables were measured? Identify each as categorical or quantitative. (c) Describe the individual in the highlighted row.

7 Data Analysis
Distribution: tells us what values a variable takes and how often it takes those values.
How to Explore Data
1. Start by determining the variable of interest.
2. Display the data in a graph.
3. Add numerical summaries.

8 Categorical Variables
Categorical variables place individuals into one of several groups or categories.

Frequency Table
Format                Count of Stations
Adult Contemporary    1556
Adult Standards       1196
Contemporary Hit      569
Country               2066
News/Talk             2179
Oldies                1060
Religious             2014
Rock                  869
Spanish Language      750
Other Formats         1579
Total                 13838

Relative Frequency Table
Format                Percent of Stations
Adult Contemporary    11.2
Adult Standards       8.6
Contemporary Hit      4.1
Country               14.9
News/Talk             15.7
Oldies                7.7
Religious             14.6
Rock                  6.3
Spanish Language      5.4
Other Formats         11.4
Total                 99.9

Slide labels: Format is the variable, the format names are its values, and the second column of each table gives the count or the percent.

9 Count vs. Percent
- Frequency Table: displays the count in each category.
- Relative Frequency Table: displays the percent in each category (percent = count in category / total count × 100).
- Percents are usually rounded. If the rounded percents do not add up to exactly 100%, the difference is due to round-off error.
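The count-to-percent calculation can be sketched in a few lines of Python. This is only an illustration: the counts are copied from the radio station frequency table on slide 8, while the function name `relative_frequencies` and the choice to round to one decimal place are assumptions made here, not something the slides prescribe.

```python
# Counts of radio stations by format, copied from the frequency table on slide 8.
counts = {
    "Adult Contemporary": 1556,
    "Adult Standards": 1196,
    "Contemporary Hit": 569,
    "Country": 2066,
    "News/Talk": 2179,
    "Oldies": 1060,
    "Religious": 2014,
    "Rock": 869,
    "Spanish Language": 750,
    "Other Formats": 1579,
}


def relative_frequencies(counts, digits=1):
    """Return each category's percent of the total, rounded to `digits` places.

    Hypothetical helper for illustration: percent = count / total count * 100.
    """
    total = sum(counts.values())
    return {
        category: round(100 * n / total, digits)
        for category, n in counts.items()
    }


percents = relative_frequencies(counts)

# Print the relative frequency table, then the sum of the rounded percents.
for category, pct in percents.items():
    print(f"{category:<20}{pct:>6.1f}")
print(f"{'Total':<20}{sum(percents.values()):>6.1f}")
```

Because each percent is rounded before the percents are added, the printed total falls slightly short of 100, which is the round-off error the slide mentions.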

10 Displaying Categorical Data – Bar Graph
- Used for categorical data
- Bars don’t touch and must be equally wide
- Categories can be rearranged
- Scale may be in counts or percents
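As a rough sketch of these rules, the Python below draws a text-only bar graph of the radio station percents from slide 8. The helper `text_bar_graph`, the `#` bar character, and the 40-character maximum bar length are all assumptions chosen for illustration; a plotting library would normally be used instead.

```python
# Percent of stations by format, copied from the relative frequency table on slide 8.
percents = {
    "Adult Contemporary": 11.2,
    "Adult Standards": 8.6,
    "Contemporary Hit": 4.1,
    "Country": 14.9,
    "News/Talk": 15.7,
    "Oldies": 7.7,
    "Religious": 14.6,
    "Rock": 6.3,
    "Spanish Language": 5.4,
    "Other Formats": 11.4,
}


def text_bar_graph(values, width=40, mark="#"):
    """Return one text line per category (hypothetical helper).

    Every bar is one line tall (equal width) and starts at zero, so bar
    length is proportional to the value; the largest value gets `width` marks.
    """
    largest = max(values.values())
    label_width = max(len(label) for label in values)
    lines = []
    for label, value in values.items():
        bar = mark * round(width * value / largest)
        lines.append(f"{label:<{label_width}} | {bar} {value}")
    return lines


# Categories have no natural order, so they can be rearranged;
# here they are sorted from largest to smallest percent.
ordered = dict(sorted(percents.items(), key=lambda item: item[1], reverse=True))
lines = text_bar_graph(ordered)
print("\n".join(lines))
```

Starting every bar at zero keeps bar length proportional to the percent, which avoids the misleading-scale problem discussed under "Graphs: Good and Bad".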

11 Displaying Categorical Data – Pie Chart
- Used for categorical data
- Categories must be parts of a whole (100%)
Read more about pie charts on p. 8.

12 Graphs: Good and Bad
There are two important lessons to keep in mind: (1) beware the pictograph, and (2) watch those scales.
Example graph: Who Buys iMacs?

13 Two-Way Tables and Marginal Distributions A two-way table describes two categorical variables, organizing counts according to a row variable and a column variable. Typically, we put the explanatory variables in the columns and the response variables in the rows.

14 Example, p. 12 I’m Gonna Be Rich!
A survey of 4826 randomly selected young adults (aged 19 to 25) asked, “What do you think the chances are you will have much more than a middle-class income at age 30?” The table below shows the responses.
What are the variables described by this two-way table? How many young adults were surveyed?

15 Two-Way Tables and Marginal Distributions
The marginal distribution of one of the categorical variables in a two-way table of counts is the distribution of values of that variable among all individuals described by the table.
Note: Percents are often more informative than counts, especially when comparing groups of different sizes.
How to examine a marginal distribution:
1) Use the data in the table to calculate the marginal distribution (in percents) of the row or column totals (row or column total / table total).
2) Make a graph to display the marginal distribution.

16 Example, p. 13
Examine the marginal distribution of chance of getting rich.

Response            Percent
Almost no chance    194/4826 = 4.0%
Some chance         712/4826 = 14.8%
A 50-50 chance      1416/4826 = 29.3%
A good chance       1421/4826 = 29.4%
Almost certain      1083/4826 = 22.4%
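The marginal distribution in this example can be reproduced with a short Python sketch. The row totals are the numerators shown on the slide; the variable names and the rounding to one decimal place are assumptions made for illustration.

```python
# Row totals for each response, taken from the numerators on slide 16.
row_totals = {
    "Almost no chance": 194,
    "Some chance": 712,
    "A 50-50 chance": 1416,
    "A good chance": 1421,
    "Almost certain": 1083,
}

# The table total is the number of young adults surveyed.
table_total = sum(row_totals.values())

# Marginal distribution: row total / table total, expressed as a percent.
marginal = {
    response: round(100 * n / table_total, 1)
    for response, n in row_totals.items()
}

for response, pct in marginal.items():
    print(f"{response:<18}{row_totals[response]:>5}/{table_total} = {pct:.1f}%")
```

Summing the row totals also answers the earlier question of how many young adults were surveyed.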

18 Relationships Between Categorical Variables
A conditional distribution of a variable describes the values of that variable among individuals who have a specific value of another variable.
How to examine or compare conditional distributions:
1) Select the row(s) or column(s) of interest.
2) Use the data in the table to calculate the conditional distribution (in percents) of the row(s) or column(s).
3) Make a graph to display the conditional distribution. Use a bar graph to compare distributions.

19 Example, p. 15
Calculate the conditional distribution of opinion among males. Examine the relationship between gender and opinion.

Response            Male                 Female
Almost no chance    98/2459 = 4.0%       96/2367 = 4.1%
Some chance         286/2459 = 11.6%     426/2367 = 18.0%
A 50-50 chance      720/2459 = 29.3%     696/2367 = 29.4%
A good chance       758/2459 = 30.8%     663/2367 = 28.0%
Almost certain      597/2459 = 24.3%     486/2367 = 20.5%
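A minimal Python sketch of the same calculation follows. The counts are the numerators from this slide; the nested-dictionary layout and the helper name `conditional_distribution` are assumptions chosen for illustration.

```python
# Two-way table of counts (response by gender), from the numerators on slide 19.
two_way = {
    "Almost no chance": {"Male": 98, "Female": 96},
    "Some chance": {"Male": 286, "Female": 426},
    "A 50-50 chance": {"Male": 720, "Female": 696},
    "A good chance": {"Male": 758, "Female": 663},
    "Almost certain": {"Male": 597, "Female": 486},
}


def conditional_distribution(table, column, digits=1):
    """Return the percent of each response within one column (hypothetical helper).

    Each count is divided by that column's total, not by the table total.
    """
    column_total = sum(row[column] for row in table.values())
    return {
        response: round(100 * row[column] / column_total, digits)
        for response, row in table.items()
    }


male = conditional_distribution(two_way, "Male")
female = conditional_distribution(two_way, "Female")

# Print the two conditional distributions side by side for comparison.
print(f"{'Response':<18}{'Male':>8}{'Female':>8}")
for response in two_way:
    print(f"{response:<18}{male[response]:>7.1f}%{female[response]:>7.1f}%")
```

Dividing by each gender's own total is what makes the two columns comparable even though the groups differ in size (2459 males, 2367 females).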

20

21 Example, p. 15
Can we say there is an association between gender and opinion in the population of young adults? Making this determination requires formal inference, which will have to wait a few chapters.
Caution! Even a strong association between two categorical variables can be influenced by other variables lurking in the background.

22 Section Summary
In this section, we learned that…
- A dataset contains information on individuals. For each individual, data give values for one or more variables.
- Variables can be categorical or quantitative.
- The distribution of a variable describes what values it takes and how often it takes them.
- Inference is the process of making a conclusion about a population based on a sample set of data.
We also learned how to:
- DISPLAY categorical data with a bar graph
- IDENTIFY what makes some graphs of categorical data deceptive
- CALCULATE and DISPLAY the marginal distribution of a categorical variable from a two-way table
- CALCULATE and DISPLAY the conditional distribution of a categorical variable for a particular value of the other categorical variable in a two-way table
- DESCRIBE the association between two categorical variables

23 Homework – Due Friday
- P. 6 # 3 & 6
- P. 20 # 9, 16, 19, 21

