Types of Data and Sampling

Once you have determined the population for your study, the next step is obtaining a sample that best represents that population. Sample selection is one of the key factors that determine whether your survey is valid and will produce legitimate conclusions.

Types of Data
Raw Data
This is the name given to data that has been collected but not yet analyzed.

Discrete Data
There is a limit to the categories that the data can be placed in.
Ex: The soft drink size at the movie theatre. There are only 4 categories, and it is not possible to fall between them.
Continuous Data
The data can take on any value within a range, including decimal values to any number of places.

Discrete Data: population numbers; counts of physical objects where fractions don't make sense (people)
Continuous Data: time (a race can be won in 3 seconds, 3.4 seconds, 3.148 seconds, etc.); length; mass

4 Types of Data
Nominal Data
Ordinal Data
Interval Data
Ratio Data

Nominal Data (Discrete)
This is data that can be sorted into categories, but those categories cannot be ranked or quantified.
Ex: A survey asks what type of food you prefer: Chinese, Italian, American or Indian.

Ordinal Data (Discrete)
Data is organized into rankings.
Ex: Rank your top five favourite movies. Matrix = 1, Batman Begins = 2, etc.
The numbers assigned don't matter as long as they keep the data ranked the way that you want it to be.
Ex: Matrix = 100, Batman Begins = 300

Interval Data (Discrete)
Data is categorized into numerical groupings in which the distance between groupings is the same.
The initial or zero point is arbitrary.
Ex: The interval 2006-2007 is the same size as the interval 2005-2006.
Ex: IQ intervals

Ratio Data (Continuous)
In this lesson, all continuous data is treated as ratio data.
Ratio data is measured from a true zero, so ratios of values are meaningful (a time of 20 s is twice as long as a time of 10 s).
Ex: Your time in the 100 m dash

Sampling
The method used to collect sample data from a population is very important and can mean the difference between a credible conclusion and a biased one.

Simple Random Sampling
Gives all the elements of the population an equal chance of being a part of the sample.
Must be as impartial as possible, not favouring one element over another.
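A minimal sketch of a simple random sample in Python. The population of student ID numbers and the function name simple_random_sample are hypothetical, chosen only for illustration; random.sample from the standard library draws without replacement, so every element has the same chance of selection.

```python
import random


def simple_random_sample(population, sample_size, seed=None):
    """Return sample_size distinct elements, each equally likely to be chosen."""
    rng = random.Random(seed)  # optional seed makes the draw reproducible
    return rng.sample(population, sample_size)


# Hypothetical population: student ID numbers 1 to 880
students = list(range(1, 881))
sample = simple_random_sample(students, 20, seed=1)
print(sample)
```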

Systematic Sample
Selecting a sample from a population is done systematically, through a constant counting process.
Ex: Picking every 100th person from a phone book

To determine whether you should choose every 5th or every 100th item, find the ratio of the population size to the sample size.
If you wanted a tenth of the population, then you would select every 10th item.

Ex: A telephone company is planning a marketing survey of its 760 000 customers. For budget reasons, the company wants a sample size of about 250.
a) Determine the interval that should be used for a systematic sample.
Interval = population size ÷ sample size = 760 000 ÷ 250 = 3040
Therefore the company should select every 3040th customer for its survey.
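The interval calculation can be sketched in Python as follows. The function names and the numbered customer list are hypothetical, and the random starting point within the first interval is an added assumption (the example above only gives the interval).

```python
import random


def sampling_interval(population_size, sample_size):
    """Interval = population size / sample size, rounded down to a whole number."""
    return population_size // sample_size


def systematic_sample(items, interval, seed=None):
    """Start at a random position in the first interval, then take every interval-th item."""
    start = random.Random(seed).randrange(interval)  # assumption: random start
    return items[start::interval]


k = sampling_interval(760_000, 250)
customers = range(760_000)  # hypothetical customer numbers 0 to 759 999
chosen = systematic_sample(customers, k, seed=0)
print(k, len(chosen))
```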

Stratified Sample
Takes into account that a population is made up of many demographics that tend to react differently.
Ex: If a population of turtles has more females than males, and the sample is purposely weighted with more females than males in proportion to the population, it is a stratified sample.

To determine how many subjects to select from each subgroup, find the percent of the population that the subgroup makes up and multiply it by the number desired in the sample.

Ex: Before booking bands for the school dances, the students' council at Statsville H.S. wants to survey the music preferences of the student body. The following table shows the enrolment at the high school.
a) Design a stratified sample for a survey of 25% of the student body.

Grade    # Students
9        255
10       232
11       209
12       184
Total    880

25% of the student body is 880 x 0.25 = 220

Complete this step for each grade (multiply the grade's enrolment by 0.25 and round to the nearest whole student). You should get:
64 Grade 9s
58 Grade 10s
52 Grade 11s
46 Grade 12s
To check, they should add up to 220: 64 + 58 + 52 + 46 = 220
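The allocation for the Statsville example can be checked with a short Python sketch. The function name stratified_allocation is hypothetical. Rounding each stratum separately happens to sum to 220 here, but in general the rounded counts may be off by one or two from the target and need a manual adjustment.

```python
def stratified_allocation(strata_sizes, fraction):
    """Number to sample from each stratum: stratum size x fraction, rounded."""
    return {group: round(size * fraction) for group, size in strata_sizes.items()}


# Enrolment by grade, taken from the table in the example
enrolment = {9: 255, 10: 232, 11: 209, 12: 184}
allocation = stratified_allocation(enrolment, 0.25)
print(allocation, sum(allocation.values()))
```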

Cluster Sample
Takes advantage of groups whose characteristics are similar to those of the other groups in the population.
Ex: Randomly selecting whole classes, assuming students were placed in classes randomly
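A minimal sketch of cluster sampling in Python, assuming the clusters are school classes. The class names, student names and the function cluster_sample are all hypothetical. Whole classes are chosen at random, and every member of a chosen class is included.

```python
import random


def cluster_sample(clusters, num_clusters, seed=None):
    """Randomly choose whole clusters and keep every member of each one."""
    rng = random.Random(seed)
    chosen_names = rng.sample(sorted(clusters), num_clusters)
    return {name: clusters[name] for name in chosen_names}


# Hypothetical classes and students
classes = {
    "9A": ["Ava", "Ben", "Cara"],
    "9B": ["Dev", "Elle", "Finn"],
    "10A": ["Gia", "Hugo", "Ivy"],
    "10B": ["Jon", "Kim", "Leo"],
}
selected = cluster_sample(classes, 2, seed=3)
print(selected)
```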

Multi-Stage Sample
Uses compound randomization.
Ex: A study that determines passenger safety in cars randomly picks a car manufacturer (stage 1), then randomly picks a vehicle type such as van, compact or truck (stage 2), then randomly picks a type of car in that class (stage 3).

Ex: Suppose that your population consisted of all Ontario households. How would you create a multi-stage sample?
You could first randomly select from the different towns/cities in Ontario.
Then randomly select a sample of blocks or subdivisions within the selected cities.
Finally, you could randomly select individual homes on each selected block.
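The three stages in the Ontario example can be sketched in Python like this. The nested dictionary of cities, blocks and homes is made-up illustration data, and the function name and stage sizes are assumptions, not part of the example.

```python
import random


def multi_stage_sample(region, n_cities, n_blocks, n_homes, seed=None):
    """Stage 1: pick cities. Stage 2: pick blocks in each city. Stage 3: pick homes on each block."""
    rng = random.Random(seed)
    selected = []
    for city in rng.sample(sorted(region), n_cities):
        blocks = region[city]
        for block in rng.sample(sorted(blocks), n_blocks):
            for home in rng.sample(blocks[block], n_homes):
                selected.append((city, block, home))
    return selected


# Hypothetical data: 4 cities, 5 blocks per city, 10 homes per block
region = {
    city: {
        f"Block {b}": [f"Home {b}-{h}" for h in range(1, 11)]
        for b in range(1, 6)
    }
    for city in ["Ottawa", "Sudbury", "Toronto", "Windsor"]
}
homes = multi_stage_sample(region, n_cities=2, n_blocks=2, n_homes=3, seed=5)
print(homes)
```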

Voluntary-Response Sample
Depends on the initiative of the sample itself.
Ex: Internet and mail polls
Elements selected for the sample may or may not respond.
This creates a potential bias.

Convenience Sample
Samples elements that are nearby or that are accessible at little or no cost.
Ex: Telephone or internet

Bias
Statistical bias is any factor that favours certain outcomes or responses and thus skews the results of the data collection.
Sometimes the bias is unintentional, and sometimes it is deliberate.
Ex: “4 out of 5 dentists recommend”

Sampling Bias
Results from a sampling frame that does not reflect the characteristics of the entire population.
Due to the sampling technique or data collection.

Ex: Identify the bias in the following.
a) A survey asked students at a high school football game whether a fund for extra-curricular activities should be used to buy new equipment for the football team or for the school band.
Since the sample includes only football fans, it is not representative of the whole student body. A poor choice of sample created an invalid survey. The sample for this question should be drawn from the entire student body.
Answer: Sampling Bias

Non-Response Bias
Occurs when particular groups are under-represented in a survey because they choose not to participate.
To help avoid this bias, researchers can include a question that identifies respondents as members of a particular group.

Ex: A science class asks every 5th student entering the cafeteria to answer a survey on environmental issues. Less than half agree to do the survey. The completed questionnaires show that a high proportion of the respondents are concerned about the environment and are well-informed. What bias could affect these results?
Answer: Non-Response Bias

Measurement Bias
Results from a data collection method that over-estimates or under-estimates a characteristic of the population.
Arises from the data collection process.
Ex: Leading questions or loaded questions

Ex: Identify the bias in the following survey.
A highway engineer suggests that an economical way to survey traffic speeds on an expressway would be to have the police officers who patrol the highway record the speed of traffic every 30 min.
People tend to slow down when they are around police cars, so this will not give accurate data. The data will underestimate the average speed.
Answer: Measurement Bias

Response Bias
Occurs when participants in a survey deliberately give false or misleading answers.
Could occur because respondents want to skew the results of the survey on purpose, or because they are afraid or embarrassed to answer honestly.

Ex: A teacher has just explained a particularly difficult concept to her class and wants to check that everyone is ‘with’ her. She asks those who understand to put up their hands. What is the bias in this survey?
Students who don't understand may be too embarrassed to keep their hands down, so more hands go up than there are students who understand.
Answer: Response Bias

! Remember !
Bias can invalidate the results.
Intentional bias can manipulate stats in favour of a certain point of view.
Unintentional bias can be introduced if sampling or data collection is not done properly.
Careful wording of survey questions is essential for avoiding bias (no leading or loaded questions).

