CHAPTER 1.1 Analyzing Categorical Data
INDIVIDUALS AND VARIABLES Individuals are the objects described by a set of data. Individuals may be people, animals, or things. A variable is any characteristic of an individual. A variable can take different values for different individuals. Categorical Variable: Places an individual into one of several groups or categories. It does NOT make sense to find the “average” of a categorical variable. Quantitative Variable: Takes numerical values for which it makes sense to find an average.
In practice, any set of data is accompanied by background information that helps us understand the data. When you first meet a new data set, ask yourself the following questions: Who are the individuals described by the data? How many individuals are there? What are the variables? In what units is each variable recorded? Weights, for example, might be recorded in grams, pounds, thousands of pounds, or kilograms. We could follow a newspaper reporter’s lead and extend our list of questions to include Why, When, Where, and How were the data produced? For now, we’ll focus on the first two questions.
We used the site censusatschool.com’s “Random Data Selector” to choose 10 Canadian students who completed a survey in a recent year. The table below displays the data. (a) Who are the individuals in this data set? (b) What variables were measured? Identify each as categorical or quantitative. In what units were the quantitative variables measured? (c) Describe the individual in the highlighted row.
The distribution of a variable tells us what values the variable takes and how often it takes these values. Graphs are often used to see the “shape” of a distribution where we can identify patterns or deviations in the data. HOW TO EXPLORE DATA Begin by examining each variable by itself. Then move on to study relationships among the variables. Start with a graph or graphs. Then add numerical summaries.
CHECK YOUR UNDERSTANDING Jake is a car buff who wants to find out more about the vehicles that students at his school drive. He gets permission to go to the student parking lot and record some data. Later, he does some research about each model of car on the Internet. Finally, Jake makes a spreadsheet that includes each car’s model, year, color, number of cylinders, gas mileage, weight, and whether it has a navigation system. Who are the individuals in Jake’s study? What variables did Jake measure? Identify each as categorical or quantitative.
ANALYZING CATEGORICAL DATA Frequency vs. Relative Frequency Frequency is the count of observations that lie in each category. To be meaningful, it is important that the total be given so the counts can be put into perspective. Relative Frequency is the percent of the total that lies in each category. Knowing the total number of individuals in the data is helpful but not necessary to understand the meaning of the data.
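To make the difference between frequency and relative frequency concrete, here is a minimal Python sketch. The hair-color list and the helper name `frequency_table` are made up for illustration and are not part of the original slides.

```python
from collections import Counter

# Hypothetical data: hair color recorded for 10 students (not from the slides).
hair_colors = ["brown", "black", "brown", "blond", "brown",
               "black", "red", "brown", "blond", "black"]

def frequency_table(values):
    """Return {category: (count, percent of total)} for a categorical variable."""
    counts = Counter(values)          # frequency: how many fall in each category
    total = sum(counts.values())      # total number of individuals
    return {cat: (n, 100 * n / total) for cat, n in counts.items()}

table = frequency_table(hair_colors)
for category, (count, percent) in table.items():
    print(f"{category:<6} {count:>2}  {percent:5.1f}%")
```

The counts only become meaningful next to the total of 10, while the percents can be read on their own, which is the point the slide makes.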
GRAPHICAL DISPLAYS FOR CATEGORICAL DATA: PIE CHART, BAR GRAPH
Pie charts are used to show the parts of a single group that fall into different categories. If the data are about separate groups, then a pie chart is NOT appropriate. A pie chart of the percentage of OHS students with different colored hair would be appropriate because the categories come from one group (OHS students). A pie chart of the number of people in different age groups who owned an iPod would NOT be appropriate because the age groups are separate groups, not parts of one whole. Bar graphs are always appropriate for categorical data and are easier to read and understand than pie charts. When choosing your graphical display for categorical data, always use a bar graph unless instructed to make a pie chart.
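As a rough illustration of a bar graph for one categorical variable, the sketch below draws a text-only version in Python. The counts and the function name `text_bar_graph` are assumptions made up for this example. A plotting library would normally be used instead.

```python
# Hypothetical counts of hair color among students at one school (made-up data).
counts = {"brown": 4, "black": 3, "blond": 2, "red": 1}

def text_bar_graph(counts):
    """Return one line per category, with a bar whose length equals the count."""
    width = max(len(cat) for cat in counts)  # pad labels so the bars line up
    return [f"{cat:<{width}} | {'#' * n} {n}" for cat, n in counts.items()]

lines = text_bar_graph(counts)
print("\n".join(lines))
```

Each category gets its own bar, so the display works whether or not the categories are parts of a single whole.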
TWO-WAY TABLES A two-way table shows the counts of individuals that fall into each combination of the values of two categorical variables.
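A two-way table can be built by counting how many individuals share each pair of values. The Python sketch below assumes a tiny made-up survey. The data and the name `two_way_table` are hypothetical and do not come from the slides.

```python
from collections import Counter

# Hypothetical survey: (gender, opinion) recorded for each individual.
responses = [
    ("female", "good chance"), ("male", "good chance"),
    ("female", "almost certain"), ("male", "almost certain"),
    ("male", "almost certain"), ("female", "good chance"),
]

def two_way_table(pairs):
    """Count the individuals in each (column value, row value) combination."""
    return Counter(pairs)

counts = two_way_table(responses)
rows = sorted({opinion for _, opinion in responses})
cols = sorted({gender for gender, _ in responses})

# Print the table with opinions as rows and genders as columns.
print(f"{'':<16}" + "".join(f"{c:>8}" for c in cols))
for r in rows:
    print(f"{r:<16}" + "".join(f"{counts[(c, r)]:>8}" for c in cols))
```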
MARGINAL DISTRIBUTIONS The marginal distribution of one of the categorical variables in a two-way table of counts is the distribution of values of that variable among all individuals described by the table. In other words, the share for each value is the row or column total for that value divided by the table total. For example, the row “almost certain” has a total of 1083 and there were 4826 total individuals counted in the data, so its share in the marginal distribution is 1083/4826, or about 22.4%.
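The calculation can be sketched in Python as follows. Only the figures 1083 and 4826 come from the slide. The small table and the function name `marginal_distribution` are hypothetical, chosen to keep the arithmetic easy to check by hand.

```python
# The one calculation the slide gives: 1083 "almost certain" out of 4826 total.
almost_certain_share = 100 * 1083 / 4826
print(f"{almost_certain_share:.1f}%")

# Hypothetical two-way table of counts: rows are responses, columns are groups.
table = {
    "good chance":    {"female": 30, "male": 30},
    "almost certain": {"female": 10, "male": 30},
}

def marginal_distribution(table):
    """Percent of all individuals in each row (row total / table total)."""
    row_totals = {row: sum(cells.values()) for row, cells in table.items()}
    grand_total = sum(row_totals.values())
    return {row: 100 * t / grand_total for row, t in row_totals.items()}

marginal = marginal_distribution(table)
print(marginal)
```

Note that the column variable plays no role here: the marginal distribution uses only the row totals and the table total.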
If we calculate this share for every row, we get the following marginal distribution. A bar graph of the marginal distribution is also shown:
CONDITIONAL DISTRIBUTION A conditional distribution of a variable describes the values of that variable among individuals who have a specific value of another variable. There is a separate conditional distribution for each value of the other variable.
Women are the condition here. This is the conditional distribution of responses ONLY AMONG WOMEN. Therefore the “total” becomes the total number of WOMEN (2367).
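The change of denominator can be sketched in Python. The table below is hypothetical (it is not the survey data behind the slide), and `conditional_distribution` is an illustrative name. The only idea taken from the slide is that each count is divided by the size of the conditioning group rather than by the table total.

```python
# Hypothetical two-way table of counts (not the survey data from the slides).
table = {
    "good chance":    {"female": 30, "male": 30},
    "almost certain": {"female": 10, "male": 30},
}

def conditional_distribution(table, condition):
    """Distribution of the row variable among individuals in one column only."""
    column = {row: cells[condition] for row, cells in table.items()}
    column_total = sum(column.values())  # the "total" is now this group's size
    return {row: 100 * n / column_total for row, n in column.items()}

women = conditional_distribution(table, "female")
print(women)
```

Here the 40 women are the whole group being described, so the 60 men in the table do not enter the calculation at all.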
We say that there is an association between two variables if particular values of one variable tend to occur together with particular values of the other variable. For example, if higher values of variable x tend to occur with higher values of variable y, we say there is an association between x and y.
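For two categorical variables, one common way to look for an association is to compare the conditional distributions of one variable across the groups of the other. The Python sketch below assumes a made-up table, and the helper name `largest_gap` is hypothetical. It is an informal check, not a formal test.

```python
# Hypothetical two-way table of counts (made-up numbers for illustration).
table = {
    "good chance":    {"female": 30, "male": 30},
    "almost certain": {"female": 10, "male": 30},
}

def conditional_distributions(table):
    """Conditional distribution (in percent) of the row variable for each column."""
    groups = list(next(iter(table.values())).keys())
    result = {}
    for g in groups:
        total = sum(cells[g] for cells in table.values())
        result[g] = {row: 100 * cells[g] / total for row, cells in table.items()}
    return result

def largest_gap(dists, group_a, group_b):
    """Largest difference, in percentage points, between two groups' distributions."""
    return max(abs(dists[group_a][row] - dists[group_b][row]) for row in dists[group_a])

dists = conditional_distributions(table)
gap = largest_gap(dists, "female", "male")
print(dists)
print(f"largest gap: {gap:.1f} percentage points")
```

If the conditional distributions were nearly identical for every group, the table would show little evidence of an association. A large gap, as in this made-up table, suggests that the response depends on the group.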
4 STEPS TO ORGANIZING A STATS PROBLEM 1. State: What’s the question that you’re trying to answer? 2. Plan: How will you go about answering the question? What statistical techniques does this problem call for? 3. Do: Make graphs and carry out needed calculations. 4. Conclude: Give your practical conclusion in the setting of the real-world problem. To keep the four steps straight, just remember: Statistics Problems Demand Consistency!