Chapter 1: Exploring Data

Slides:



Advertisements
Similar presentations
Chapter 1: Exploring Data
Advertisements

CHAPTER 1 Exploring Data 1.1 Analyzing Categorical Data.
CHAPTER 1 Exploring Data
The Practice of Statistics, 5th Edition Starnes, Tabor, Yates, Moore Bedford Freeman Worth Publishers CHAPTER 1 Exploring Data 1.1 Analyzing Categorical.
Chapter 1: Exploring Data
+ The Practice of Statistics, 4 th edition - For AP* STARNES, YATES, MOORE Chapter 1: Exploring Data Introduction Data Analysis: Making Sense of Data.
CHAPTER 1 Exploring Data Introduction Data Analysis: Making Sense of Data.
Chapter 1: Exploring Data Sec. 1.1 Analyzing Categorical Data.
The Practice of Statistics, 5th Edition Starnes, Tabor, Yates, Moore Bedford Freeman Worth Publishers CHAPTER 1 Exploring Data 1.0 Introduction Data Analysis:
+ Chapter 1: Exploring Data Section 1.1 Analyzing Categorical Data The Practice of Statistics, 4 th edition - For AP* STARNES, YATES, MOORE.
+ Chapter 1: Exploring Data Section 1.1 Analyzing Categorical Data The Practice of Statistics, 4 th edition - For AP* STARNES, YATES, MOORE Introduction:
+ The Practice of Statistics, 4 th edition - For AP* STARNES, YATES, MOORE Chapter 1: Exploring Data Introduction Data Analysis: Making Sense of Data.
+ Chapter 1: Exploring Data Section 1.1 Analyzing Categorical Data The Practice of Statistics, 4 th edition - For AP* STARNES, YATES, MOORE.
+ Chapter 1: Exploring Data Section 1.1 Analyzing Categorical Data The Practice of Statistics, 4 th edition - For AP* STARNES, YATES, MOORE.
+ Warm Up Which of these variables are categorical? Which are quantitative?
+ Chapter 1: Exploring Data Section 1.1 Analyzing Categorical Data The Practice of Statistics, 4 th edition - For AP* STARNES, YATES, MOORE Introduction:
+ Analyzing Categorical Data Categorical Variables place individuals into one of several groups or categories The values of a categorical variable are.
+ Chapter 1: Exploring Data Section 1.1 Analyzing Categorical Data The Practice of Statistics, 4 th edition - For AP* STARNES, YATES, MOORE.
+ Chapter 1: Exploring Data Section 1.1 Analyzing Categorical Data The Practice of Statistics, 4 th edition - For AP* STARNES, YATES, MOORE.
Sections TAKE OUT YOUR NOTES, Book & Do Page 8 #7-8
CHAPTER 1 Exploring Data
Chapter 1: Exploring Data
CHAPTER 1 Exploring Data
CHAPTER 1 Exploring Data
CHAPTER 1 Exploring Data
Chapter 1: Exploring Data
CHAPTER 1 Exploring Data
Take out a piece of paper and open book to pg. 6
Analyzing Categorical Data
Chapter 1: Exploring Data
Chapter 1: Exploring Data
Chapter 1: Exploring Data
Introduction & 1.1: Analyzing categorical data
CHAPTER 1 Exploring Data
1.1 Analyzing Categorical Data.
Do Now: Mr. Buckley gathered some information on his class
CHAPTER 1 Exploring Data
Chapter 1: Exploring Data
1.1: Analyzing Categorical Data
CHAPTER 1 Exploring Data
CHAPTER 1 Exploring Data
Chapter 1: Exploring Data
CHAPTER 1 Exploring Data
CHAPTER 1 Exploring Data
Chapter 1: Exploring Data
Chapter 1: Exploring Data
Section 1.1 Analyzing Categorical Data
Chapter 1: Exploring Data
CHAPTER 1 Exploring Data
CHAPTER 1 Exploring Data
Chapter 1: Exploring Data
CHAPTER 1 Exploring Data
CHAPTER 1 Exploring Data
CHAPTER 1 Exploring Data
Warmup A teacher is compiling information about his students. He asks for name, age, student ID, GPA and whether they ride the bus to school. For.
CHAPTER 1 Exploring Data
Chapter 1: Exploring Data
Chapter 1: Exploring Data
CHAPTER 1 Exploring Data
Chapter 1: Exploring Data
CHAPTER 1 Exploring Data
Chapter 1: Exploring Data
CHAPTER 1 Exploring Data
Chapter 1: Exploring Data
Chapter 1: Exploring Data
Chapter 1: Exploring Data
CHAPTER 1 Exploring Data
Chapter 1: Exploring Data
CHAPTER 1 Exploring Data
CHAPTER 1 Exploring Data
Presentation transcript:

Chapter 1: Exploring Data Sec. 1.1 Analyzing Categorical Data

Data Analysis: Making Sense of Data IDENTIFY the individuals and variables in a set of data CLASSIFY variables as categorical or quantitative DISPLAY categorical data with a bar graph IDENTIFY what makes some graphs of categorical data deceptive CALCULATE and DISPLAY the marginal distribution of a categorical variable from a two-way table CALCULATE and DISPLAY the conditional distribution of a categorical variable for a particular value of the other categorical variable in a two-way table DESCRIBE the association between two categorical variables

Data Analysis Statistics is the science of data. Data Analysis is the process of organizing, displaying, summarizing, and asking questions about data.

Data Analysis Individuals objects described by a set of data (don’t necessarily have to be people) Variable any characteristic of an individual Categorical Variable places an individual into one of several groups or categories. Quantitative Variable takes numerical values for which it makes sense to find an average. Some number values can be considered categorical – for instance, area codes, floors of a building Some quantitative variables can be converted to categorical – Giving letter grades instead of numerical grades, meeting standards

Example, p. 3 CensusAtSchool is an international project that collects data about primary and secondary school students using surveys. Hundreds of thousands of students from Australia, Canada, New Zealand, South Africa, and the United Kingdom have taken part in the project since 2000. Data from the surveys are available at the project’s Web site (www.censusatschool.com). We used the site’s “Random Data Selector” to choose 10 Canadian students who completed the survey in a recent year. The table displays the data.

Example, p. 3 Who are the individuals in this data set? What variables were measured? Identify each as categorical or quantitative. Describe the individual in the highlighted row. 10 randomly selected Canadian students who participated in the CensusAtSchool survey. 7 variables: Province, Gender, Dominant Hand, and Preferred Communication are Categorical, and Language spoken, Height, and Wrist Circumference are Quantitative. Student lives in Ontario and is male. He speaks 4 languages, is left-handed, is 157.5 cm tall with a wrist circumference of 147 mm and prefers text messaging. Notice that the female who is 166 cm tall has a wrist circumference of 65 mm. This is probably wrong. Always check to see if your data makes sense!

Data Analysis Distribution tells us what values a variable takes and how often it takes those values. How to Explore Data Start by determining the variable of interest. Display the data in a graph. Add Numerical Summaries.

Categorical Variables Categorical variables place individuals into one of several groups or categories. Frequency Table Format Count of Stations Adult Contemporary 1556 Adult Standards 1196 Contemporary Hit 569 Country 2066 News/Talk 2179 Oldies 1060 Religious 2014 Rock 869 Spanish Language 750 Other Formats 1579 Total 13838 Relative Frequency Table Format Percent of Stations Adult Contemporary 11.2 Adult Standards 8.6 Contemporary Hit 4.1 Country 14.9 News/Talk 15.7 Oldies 7.7 Religious 14.6 Rock 6.3 Spanish Language 5.4 Other Formats 11.4 Total 99.9 Variable Count Percent “Values” do not necessarily mean numbers. A categorical variable for instance can have a value of “red” if you are talking about colors. Values

Count vs. Percent Frequency Table – displays counts in each category Relative Frequency Table – displays the percent in each category (% found by count in category / total count) Usually percents are rounded. If the rounded percents do not add up to exactly 100%, it may be due to round-off error.

Displaying Categorical Data – Bar Graph Bars don’t touch and must be equally wide Can rearrange categories Scale may be in counts or percents Bar graphs compare several quantities by comparing the heights of bars that represent those quantities. Our eyes, however, react to the area of the bars as well as to their height. Must make bars equally wide because of this.

Displaying Categorical Data – Pie Chart Must be Out of a Whole (100%) Read more about Pie Charts on p. 8. You may get data where they give the % out of that category. This data cannot be displayed in a pie chart.

Graphs: Good and Bad There are two important lessons to keep in mind: Who Buys iMacs? There are two important lessons to keep in mind: beware the pictograph, and watch those scales. It is tempting to replace the bars with pictures for greater eye appeal. Don’t do it!

Two-Way Tables and Marginal Distributions A two-way table describes two categorical variables, organizing counts according to a row variable and a column variable. Typically, we put the explanatory variables in the columns and the response variables in the rows.

Example, p. 12 What are the variables described by this two-way table? I’m Gonna Be Rich! A survey of 4826 randomly selected young adults (aged 19 to 25) asked, “What do you think the chances are you will have much more than a middle-class income at age 30?” The table below shows the responses. What are the variables described by this two-way table? How many young adults were surveyed? Describes two categorical variables: Gender and opinion about becoming rich. 4826 young adults were surveyed.

Two-Way Tables and Marginal Distributions The marginal distribution of one of the categorical variables in a two-way table of counts is the distribution of values of that variable among all individuals described by the table. Note: Percents are often more informative than counts, especially when comparing groups of different sizes. How to examine a marginal distribution: Use the data in the table to calculate the marginal distribution (in percents) of the row or column totals. (row or column total / table total) Make a graph to display the marginal distribution.

Example, p. 13 Examine the marginal distribution of chance of getting rich. Response Percent Almost no chance 194/4826 = 4.0% Some chance 712/4826 = 14.8% A 50-50 chance 1416/4826 = 29.3% A good chance 1421/4826 = 29.4% Almost certain 1083/4826 = 22.4%

Example, p. 13 Response Percent Almost no chance 194/4826 = 4.0% Some chance 712/4826 = 14.8% A 50-50 chance 1416/4826 = 29.3% A good chance 1421/4826 = 29.4% Almost certain 1083/4826 = 22.4%

Relationships Between Categorical Variables A conditional distribution of a variable describes the values of that variable among individuals who have a specific value of another variable. How to examine or compare conditional distributions: Select the row(s) or column(s) of interest. Use the data in the table to calculate the conditional distribution (in percents) of the row(s) or column(s). Make a graph to display the conditional distribution. Use a bar graph to compare distributions.

Example, p. 15 Calculate the conditional distribution of opinion among males. Examine the relationship between gender and opinion. Response Male Almost no chance 98/2459 = 4.0% Some chance 286/2459 = 11.6% A 50-50 chance 720/2459 = 29.3% A good chance 758/2459 = 30.8% Almost certain 597/2459 = 24.3% Female 96/2367 = 4.1% 426/2367 = 18.0% 696/2367 = 29.4% 663/2367 = 28.0% 486/2367 = 20.5% Segmented bar graph

We can also compare categories with a side-by-side bar graph.

Example, p. 15 Can we say there is an association between gender and opinion in the population of young adults? Making this determination requires formal inference, which will have to wait a few chapters. Caution! Even a strong association between two categorical variables can be influenced by other variables lurking in the background.

Data Analysis: Making Sense of Data DISPLAY categorical data with a bar graph IDENTIFY what makes some graphs of categorical data deceptive CALCULATE and DISPLAY the marginal distribution of a categorical variable from a two-way table CALCULATE and DISPLAY the conditional distribution of a categorical variable for a particular value of the other categorical variable in a two-way table DESCRIBE the association between two categorical variables A dataset contains information on individuals. For each individual, data give values for one or more variables. Variables can be categorical or quantitative. The distribution of a variable describes what values it takes and how often it takes them. Inference is the process of making a conclusion about a population based on a sample set of data.

Homework – Due Block Day P. 6 # 3 & 6 P. 20 # 9, 16, 19, 21