Quantitative vs. Categorical Data

Name: Quantitative vs. Categorical Data
Uploaded: 2017-08-26T10:01:04+00:00
Duration: PTM13S8
Channel: Darleen Harrison
Description: Quantitative vs. Categorical Data

Quantitative vs. Categorical Data
Quantitative data consist of numbers that represent counts or measurements. All quantitative data are numerical, but not all numerical data are quantitative. Data with a unit of measurement (seconds, feet, pounds, dollars, etc.) are quantitative. Numerical data used as a label or range of values (Student ID Number, years) are not quantitative.

Examples: Quantitative Data
The University keeps the following quantitative data about each student: Grade Point Average, Number of Credit Hours Completed, Age, and Amount of money owed for tuition. Other examples?

Categorical Data
Non-quantitative data are called categorical. Non-numerical data must be categorical. Numerical data that serve to label or identify individuals are categorical (Example: Social Security Number). A useful guide: Would it make sense to consider an average value? If not, treat the data as categorical.

Examples: Categorical Data
The University keeps the following categorical data about each student: Name, Laker ID Number, Date of Birth, Gender, and Residency (“in-state” or “out-of-state”). Other?

Basic Practice of Statistics - 3rd Edition
Chapter One, Part 2
1.4: Critical Thinking in Statistics
1.5: Collecting Sample Data

Quick Review
We typically have a VERY LARGE set of individuals (called the Population), but we cannot obtain data from every individual. A parameter is a numerical value that describes the population. The actual value is not known. We choose a subset of the Population (this subset is called a Sample), and gather data from those individuals. A statistic is a numerical value, computed from the Sample data. We often use this to estimate the unknown value of some parameter.
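The distinction between a parameter and a statistic can be sketched in a few lines of Python. This is only an illustration: the population of student ages below is simulated (made-up values), which is the only reason its mean can be computed at all. In a real study the parameter would stay unknown.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population: ages of 10,000 students (made-up values).
population = [rng.randint(17, 60) for _ in range(10_000)]

# Parameter: describes the whole population. In practice it is unknown;
# we can compute it here only because the population is simulated.
parameter = statistics.mean(population)

# Statistic: computed from a sample and used to estimate the parameter.
sample = rng.sample(population, 100)
statistic = statistics.mean(sample)

print(f"population mean (parameter): {parameter:.2f}")
print(f"sample mean (statistic):     {statistic:.2f}")
```

The two printed values will usually be close but not equal, which is the sense in which a statistic estimates a parameter.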

Population vs. Sample (not to scale!!)

Sampling Bias
Ideally, we want our sample to be representative of the overall population. If the way we choose the sample and/or gather data from the chosen individuals is more or less likely to include a certain type of individual or produce a certain type of response, then the conclusions we draw from the sample might be inaccurate for the intended population. This is called bias. Examples to follow.
08/15/11

Examples of Bias
Estimate average class height using a sample of students from the front row. Estimate average class height using a sample of male students. Study the effectiveness of a weight-loss diet using a sample of professional athletes. Estimate what percent of Americans approve of the president using a sample of voters from only one political party.

Common Types of Sampling Bias
A voluntary response sample occurs when the individuals to be studied have control over whether or not they are included in the sample. This is also called “self-selection bias.” A convenience sample occurs when the researcher is more likely to choose individuals for whom it is easier to obtain data. The researcher might be unaware of this! Small sample: Using too few individuals increases the chance of getting a sample that consists only of “unusual” individuals.
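The small-sample problem can be illustrated with a rough simulation. The sketch below assumes a made-up population of heights and a hypothetical helper, share_of_unusual_samples, that counts how often a randomly chosen sample of size n has a mean more than 5 cm away from the population mean. The cutoff and the sample sizes are arbitrary choices for illustration.

```python
import random
import statistics

rng = random.Random(5)  # fixed seed so the run is repeatable

# Hypothetical population: 5,000 heights in cm (simulated, mean 170, sd 10).
population = [rng.gauss(170, 10) for _ in range(5_000)]
mu = statistics.mean(population)

def share_of_unusual_samples(n, trials=2_000, cutoff=5.0):
    """Fraction of random samples of size n whose mean misses mu by more than cutoff."""
    misses = 0
    for _ in range(trials):
        sample = rng.sample(population, n)
        if abs(statistics.mean(sample) - mu) > cutoff:
            misses += 1
    return misses / trials

results = {n: share_of_unusual_samples(n) for n in (2, 5, 30)}
for n, share in results.items():
    print(f"n = {n:2d}: {share:.1%} of samples are off by more than 5 cm")
```

The share of "unusual" samples should shrink as n grows, which is why very small samples are risky even when they are chosen at random.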

Example: Voluntary Response
Ratemyprofessors.com is a website that collects information about college professors from their students. The ratings come from students volunteering to create an account and submit information. Question: What kind of students are likely to volunteer?

Voluntary Response Bias
Answer: Students with stronger opinions are more likely to volunteer a response. In many “customer satisfaction” surveys, those with a strong negative opinion are most likely to volunteer. Those with a neutral opinion are least likely to volunteer. There is potential bias: those in the sample are more likely to have a negative opinion than the entire population.
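A small simulation can make this bias visible. Everything in the sketch below is assumed for illustration: the ratings are made up, and the response_chance values are invented numbers chosen so that strong negative opinions respond most often and neutral opinions least often.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical population: 20,000 satisfaction ratings from 1 (bad) to 5 (good).
population = [rng.choice([1, 2, 3, 4, 5]) for _ in range(20_000)]

# Assumed chance that a person with each rating volunteers a response:
# strong negative opinions respond most, neutral opinions least.
response_chance = {1: 0.50, 2: 0.15, 3: 0.02, 4: 0.10, 5: 0.25}

# Each person decides independently whether to respond.
volunteers = [r for r in population if rng.random() < response_chance[r]]

print(f"population mean rating: {statistics.mean(population):.2f}")
print(f"volunteer mean rating:  {statistics.mean(volunteers):.2f}")
```

Under these assumptions the volunteers' average rating comes out noticeably lower than the population's, even though nobody reported a false rating.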

Example: Convenience Sample
I want to determine the average age of all current Clayton State students. For my sample data, I choose five students from the class and compute their average age. Why might this lead to inaccurate results? Although my intended population is all CSU students, I picked only from a small part of the population (the part that was most convenient for me).

Other Common Problems
Some types of bias occur not in choosing the sample, but in gathering data from the chosen individuals. Misreported data: Individuals may give inaccurate results (perhaps unintentionally) when asked a certain question. Example: How much do you weigh today? Example: How many hours per week do you study? Question wording: Variations in the wording of a question can greatly influence people’s responses. Compare: Should the government spend more money on public education? Should the government spend more of your tax dollars on public education?

Good Ways to Sample
We usually want to have some degree of randomness when choosing our sample. Randomly-chosen samples reduce the potential for biased results, but complete randomness is not always possible. We’ll talk more about what “random” actually means in Chapters 4 and 5.

*** Simple Random Sample ***
All of the statistical inference in this course will assume that data comes from a Simple Random Sample (SRS). This means: Before choosing individuals, we decide how many we want to use. This is called the sample size, usually denoted by the letter n. We choose the sample so that each group of n individuals (from the overall population) is equally likely to be picked.
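As a minimal sketch, Python's standard library can draw a simple random sample with random.sample, which selects n distinct items without replacement. The population of ID labels below is hypothetical, and the seed is fixed only to make the run repeatable.

```python
import random

# Hypothetical population: student ID labels 1..500.
population = list(range(1, 501))

n = 10  # sample size, decided before any individuals are chosen
rng = random.Random(42)

# random.sample draws n distinct individuals so that every group of
# n individuals is equally likely to be the one chosen.
srs = rng.sample(population, n)
print(sorted(srs))
```

Note that the labels here are ID numbers, so they identify individuals; the data of interest would still have to be collected from each selected student.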

Other “Good” Samples
Random Sample: Each individual from the population has an equal chance of being chosen for the sample (every SRS is a Random Sample, but not every Random Sample is an SRS). Probability Sample: For each individual, we know the chance that he/she will be chosen for the sample, but different individuals may have different chances (Random Samples and SRS’s are special cases of this).

Other “Good” Samples
Stratified Sample: Divide the population into mutually exclusive groups (strata), and choose a sample (often an SRS) from within each group. This is often done if we want to account for some kind of population demographics. Example: If our population is 60% women and 40% men, we might choose a sample of 30 women and 20 men. The sample demographics then match those of the population.
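The 60/40 example can be sketched as follows. The population and the helper stratified_sample are hypothetical, and the helper simply takes an SRS inside each stratum with a size proportional to that stratum's share of the population. Rounding happens to work out exactly here; in general the rounded stratum sizes might not add up to the requested total.

```python
import random

rng = random.Random(7)  # fixed seed so the run is repeatable

# Hypothetical population of 1,000 people: 60% women ("W"), 40% men ("M").
population = [("W", i) for i in range(600)] + [("M", i) for i in range(400)]

def stratified_sample(people, total_n, rng):
    """Take an SRS within each stratum, sized in proportion to the stratum."""
    strata = {}
    for person in people:
        strata.setdefault(person[0], []).append(person)
    sample = []
    for members in strata.values():
        # Proportional allocation; rounding may not always sum to total_n.
        k = round(total_n * len(members) / len(people))
        sample.extend(rng.sample(members, k))
    return sample

sample = stratified_sample(population, 50, rng)
women = sum(1 for label, _ in sample if label == "W")
men = len(sample) - women
print(women, men)  # 30 20
```

Which individuals are chosen within each stratum is random, but the 30/20 split is fixed by design.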

Other “Good” Samples
Cluster Sampling: Divide the population into mutually exclusive groups (clusters), and randomly choose a set of clusters. For each selected cluster, gather data from all individuals within that cluster. Example: There are 13 sections of Math currently offered at Clayton State. Randomly choose 3 sections, and survey all students from each of those 3 sections.
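The section example can be sketched in two steps: pick whole clusters at random, then include everyone in them. The rosters below are invented (section names, student labels, and class sizes are all assumptions), so only the procedure is meaningful.

```python
import random

rng = random.Random(3)  # fixed seed so the run is repeatable

# Hypothetical rosters: 13 sections, each with 20 to 35 made-up student labels.
sections = {
    f"section_{s}": [f"s{s}_student_{i}" for i in range(rng.randint(20, 35))]
    for s in range(1, 14)
}

# Step 1: randomly choose 3 whole clusters (sections).
chosen = rng.sample(sorted(sections), 3)

# Step 2: include every student from each chosen section.
sample = [student for name in chosen for student in sections[name]]

print(chosen, len(sample))
```

Unlike a stratified sample, most clusters contribute nobody to the sample, while the chosen clusters contribute all of their members.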

Other Types of Samples?
Real-world statistics often uses very complex sampling methods. See the text for a survey of some of these. The trade-off: A simple method (like an SRS) is easier to analyze mathematically, but often more difficult to achieve in practice.

How are the data obtained?
In addition to how the individuals are chosen for a sample, we distinguish between the following two scenarios: Observational Study: Simply observe and/or measure individuals, without attempting to modify their characteristics or behavior. Experiment: Deliberately impose a specific set of conditions (a treatment) on each individual. A valid experiment has more than one possible treatment, and we can compare results.

Example: Experiment vs. Observational Study
Question: Is there any sort of relationship between caffeine use and exam scores? Here is an Observational Study: Just before the exam, record how much caffeine each student has consumed today. Record each student’s exam score. It will probably be the case that different students consume different amounts of caffeine, but we do not deliberately try to create this difference.

Example: Experiment vs. Observational Study
Question: Is there any sort of relationship between caffeine use and exam scores? Here is an Experiment: Require students to consume no caffeine on exam day. 15 minutes before the exam, give each student a cup of coffee. Some students will get regular coffee (with caffeine), others will get decaffeinated coffee. Record each student’s exam score. It will certainly be the case that different students consume different amounts of caffeine, because we deliberately created such a difference.

Questions for Discussion
In the Experiment, why not give all students a cup of (caffeinated) coffee? In the Experiment, why not use “regular coffee” versus “no coffee”? What are some advantages/disadvantages of the Study versus the Experiment? Can you think of a scenario where it would not be possible to do an Experiment?
