A focus on Sampling and Sampling Methods

Menu
Measures of Centre, Measures of Spread, Definitions, Assessment Tips, Practice Tasks, On Your Calculator.
The example used throughout this presentation is finding the mean height of WBHS pupils.

Sampling Methods
In this presentation you will see a number of sampling methods, with their benefits and drawbacks: Simple Random Sample, Cluster Sampling, Systematic Sampling, Stratified Sampling.

Measures of Central Tendency
In this presentation you will learn how to calculate a number of measures of average or centre, as well as their benefits and drawbacks: Mean, Median, Mode.

Measures of Spread
In this presentation you will learn how to find a number of measures of spread, as well as their drawbacks and advantages. You will also need to decide which measure of spread and which measure of centre go together: Standard Deviation, Interquartile Range, Range.

Simple Random Sample
The simplest unbiased sample.
1. Number the entire population.
2. Generate random numbers.
3. Proceed until you have as many as you need, ignoring any repeats.
Example (Heights of WBHS students)
1. Get a copy of the School Roll.
2. Number every person.
3. Generate random numbers from 1 to the maximum you need.
4. Proceed until you have the desired sample size, ignoring repeats.
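The steps above can be sketched in Python. This is only an illustration: the function name `simple_random_sample` and the sample size of 30 are assumptions made for the example, while the roll of 529 comes from the later slides.

```python
import random

def simple_random_sample(population_size, sample_size, rng=None):
    """Pick distinct roll numbers from 1..population_size, redrawing repeats."""
    if sample_size > population_size:
        raise ValueError("sample cannot be larger than the population")
    rng = rng or random.Random()
    chosen = []
    while len(chosen) < sample_size:
        number = rng.randint(1, population_size)  # inclusive at both ends
        if number not in chosen:                  # ignore any repeats
            chosen.append(number)
    return chosen

# 30 students from a roll numbered 1 to 529 (seeded so the run is repeatable)
roll_numbers = simple_random_sample(529, 30, random.Random(1))
print(sorted(roll_numbers))
```

The loop mirrors the "ignore repeats" instruction directly; Python's `random.sample(range(1, 530), 30)` would give an equivalent result in one call.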

Simple Random Sample
Advantages: Cheap. Easy to carry out. Unbiased.
Disadvantages: May not represent strata. Needs an entire population list.

Cluster Sampling
The easiest unbiased sample.
1. Sort your data into clusters based on location.
2. Randomly choose the cluster.
3. Perform a simple random sample on the chosen cluster.
Example (Heights of WBHS students)
1. Get a copy of the School Roll.
2. Sort into clusters, e.g. year levels.
3. Randomly select the cluster.
4. Randomly generate a sample from the chosen cluster.
Take care with clusters: Juniors are much shorter than Seniors, so a single year level may not represent the whole school.
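A minimal sketch of cluster sampling in Python follows. The year-level sizes are invented for illustration (only the Year 10 count of 115 and the total of 529 appear in these slides), and `cluster_sample` is a hypothetical helper name.

```python
import random

def cluster_sample(clusters, sample_size, rng=None):
    """Randomly choose one cluster, then take a simple random sample from it."""
    rng = rng or random.Random()
    chosen_name = rng.choice(sorted(clusters))   # step 2: pick the cluster
    members = clusters[chosen_name]
    return chosen_name, rng.sample(members, sample_size)  # step 3: sample within it

# Hypothetical year-level sizes; only 115 (Year 10) and the total 529 are from the slides.
sizes = {"Year 9": 120, "Year 10": 115, "Year 11": 110, "Year 12": 100, "Year 13": 84}
clusters, next_number = {}, 1
for year, size in sizes.items():
    clusters[year] = list(range(next_number, next_number + size))
    next_number += size

year, sample = cluster_sample(clusters, 30, random.Random(7))
print(year, sorted(sample))
```

Every sampled roll number comes from a single year level, which is exactly why the height estimate can be biased when juniors and seniors differ.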

Cluster Sampling
Advantages: Very cheap. Very easy to carry out. Unbiased.
Disadvantages: Needs an entire population list. Can be biased if clusters strongly affect the statistics.

Systematic Sampling
A relatively quick way to pick an unbiased sample.
1. List the entire population.
2. Decide on your step size (Total ÷ Sample size = n).
3. Randomly generate a starting point.
4. Step to every nth data point until you have your sample.
Example (Heights of WBHS students)
1. Get an alphabetical copy of the School Roll.
2. Step size = Total ÷ Sample size.
3. Randomly generate a starting point.
4. Starting from that point, use the step size to pick the rest of the sample.

Systematic Sampling
Advantages: Cheap. Easy to choose sample. Unbiased.
Disadvantages: Needs an entire population list. If the population list is ordered then the sample can become biased.

Stratified Sampling
The most reliable sampling method.
1. Sort the data into strata based on information you already know.
2. Calculate the proportion for each stratum.
3. Perform a Simple Random Sample on each of the strata.
Example (Heights of WBHS students)
1. Get a copy of the School Roll separated into year levels.
2. Calculate the sample size for each year group (stratum).
3. Perform a simple random sample on each year group to its specific sample size.

Stratified Sampling
Advantages: Unbiased. Completely representative of each of the strata. Most reliable estimates.
Disadvantages: Needs an entire population list. Information about the entire population needs to be known beforehand. Time consuming.

Generate a Random Number
1. Decide on the starting number (in this case 1).
2. Decide how many you need (in the case of the school, 529 students).
3. Choose your calculator: Casio FX-82, Casio Graphic, Texas.

Random Number on a Casio Graphics Calculator
1. Decide on the starting number (in this case 1).
2. Decide how many you need (in the case of the school, 529 students).
3. In Run Mode, Intg is OPTN – F6 – F4 – F5 and Ran# is OPTN – F6 – F3 – F4.
On screen: Intg(529 × Ran# + 1)
Here 529 is the population size or stratum size, and 1 is the starting value.

Random Number on a Casio FX-82
1. Decide on the starting number (in this case 1).
2. Decide how many you need (in the case of the school, 529 students).
3. Ran# is the 2nd function (shift) of the · key.
4. On screen: Ran# × 529 + 1 =
Note: Ignore any decimal in the answer.
Here 529 is the population size or stratum size, and 1 is the starting value.

Random Number on a Texas
1. Decide on the starting number (in this case 1).
2. Decide how many you need (in the case of the school, 529 students).
Keys: RANDI is PRB → RANDI; the comma is 2nd Function ).
3. On screen: RANDI(1, 529)
Here 1 is the starting value and 529 is the end value.
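For comparison, the same calculation can be sketched in Python. `random.random()` plays the role of Ran# (a value from 0 up to but not including 1) and `math.floor` plays the role of Intg; the function name `random_roll_number` is just an illustrative choice.

```python
import math
import random

def random_roll_number(population_size, start=1, rng=None):
    """Equivalent of Intg(population_size x Ran# + start) on the calculator."""
    rng = rng or random.Random()
    return math.floor(population_size * rng.random() + start)

rng = random.Random(3)
draws = [random_roll_number(529, rng=rng) for _ in range(1000)]
print(min(draws), max(draws))   # every draw lies between 1 and 529

# The Texas-style RANDI(1, 529) corresponds to randint, inclusive at both ends.
print(rng.randint(1, 529))
```

Dropping the decimal, as the FX-82 note says, is the same as applying Intg: both leave a whole number between the starting value and the population size.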

Simple Random Sample
The simplest unbiased sample.
1. Number the entire population.
2. Generate random numbers.
3. Proceed until you have as many as you need, ignoring any repeats.
Example (Heights of WBHS students)
1. Get a copy of the School Roll.
2. Number every person from 1 (to 529).
3. Generate random numbers from 1 to the maximum you need (529).
4. Proceed until you have the desired sample size, ignoring repeats.

Strata Proportions
1. Number of people in the stratum divided by the total in the population.
2. Multiplied by the number of people wanted in the total sample.
Example (Heights of WBHS students)
1. 529 people on the School Roll.
2. 115 Year 10s.
3. Sample size of 30.
4. So the Year 10 sample size is 115 ÷ 529 × 30 = 6.52, so take 7 Year 10 students.
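The proportion calculation, followed by a simple random sample within each stratum, might look like the sketch below. The helper names are hypothetical, and all year-level sizes except Year 10 (115) are invented so that they total 529.

```python
import random

def stratum_sample_sizes(strata_sizes, total_sample):
    """Stratum size / population size x total sample size, rounded to a whole number."""
    population = sum(strata_sizes.values())
    return {name: round(size / population * total_sample)
            for name, size in strata_sizes.items()}

def stratified_sample(strata, total_sample, rng=None):
    """Simple random sample within each stratum, sized by its proportion."""
    rng = rng or random.Random()
    sizes = stratum_sample_sizes({n: len(m) for n, m in strata.items()}, total_sample)
    return {name: rng.sample(strata[name], sizes[name]) for name in strata}

# Only 115 (Year 10) and the total of 529 come from the slides; the rest is assumed.
year_sizes = {"Year 9": 120, "Year 10": 115, "Year 11": 110, "Year 12": 100, "Year 13": 84}
allocation = stratum_sample_sizes(year_sizes, 30)
print(allocation["Year 10"])      # 115 / 529 * 30 = 6.52, rounded to 7
print(sum(allocation.values()))   # rounding each stratum separately can miss 30

strata, next_number = {}, 1
for year, size in year_sizes.items():
    strata[year] = list(range(next_number, next_number + size))
    next_number += size
sample = stratified_sample(strata, 30, random.Random(5))
```

With these assumed sizes the rounded allocations add up to 31 rather than 30, so in practice one stratum may need adjusting by a student to hit the intended total.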

Systematic Step Sizes
1. Number of people in the population divided by the sample size.
Example (Heights of WBHS students)
1. 529 people on the School Roll.
2. Sample size of 30.
3. So the step size is 529 ÷ 30 = 17.63333, so take every 17th student from the starting position.

Systematic Stepping
1. Starting at the random start point, step out until you get the desired sample size.
Example (Heights of WBHS students)
1. Random starting point 803, step size 29. (These figures imply a roll of 875 students rather than the 529 used elsewhere, since 890 − 875 = 15.)
2. The 803rd student on the alphabetical list is where we start.
3. Then the 832nd student, then the 861st student. We have now reached the end of the roll, so start again at the beginning: 890 becomes the 15th student, then the 44th student, and so on.
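Stepping with wrap-around can be sketched as below. The function name is hypothetical, and the roll size of 875 used to reproduce the slide's example is inferred from 890 becoming the 15th student, not stated in the source. Under that assumption the student after the 15th is 15 + 29 = 44.

```python
import random

def systematic_positions(population_size, step, start, sample_size):
    """Step through a 1-based list, wrapping to the beginning past the end."""
    return [((start - 1 + i * step) % population_size) + 1
            for i in range(sample_size)]

# The slide's example: start at 803, step 29, on an (inferred) roll of 875.
example = systematic_positions(875, 29, 803, 5)
print(example)   # [803, 832, 861, 15, 44]

# The WBHS roll: 529 students, sample of 30, step size rounded down to 17.
step = 529 // 30
start = random.Random(11).randint(1, 529)
positions = systematic_positions(529, step, start, 30)
print(step, start, positions)
```

Rounding the step size down matters here: 29 steps of 17 cover 493 places, which is less than 529, so the wrap-around never lands on a student who has already been chosen.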

Mean
1. Add up all of the values in the sample.
2. Divide by the sample size.
Advantages: Easy to calculate for large samples. Accurate and well understood.
Disadvantages: Affected by outliers.

Median
1. List all the values in order.
2. Find the central value.
Advantages: Accurate. Not affected much by outliers.
Disadvantages: Not so widely known as an average. Time consuming to list a large sample in order.

Mode
1. List all the values.
2. Find the most common item.
Advantages: Can calculate the mode for data that is not numeric or ordered. Not affected much by outliers. Very easy to calculate.
Disadvantages: Can be inaccurate for numeric data or data that can be ordered.
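A short Python sketch of the three measures, using the standard `statistics` module. The eight heights are the ones used in the quartiles example later in this presentation; the mis-recorded value of 250 and the list of year levels are made up purely to illustrate outliers and non-numeric data.

```python
import statistics

heights = [165, 170, 173, 180, 182, 183, 191, 192]   # from the quartiles example

mean_height = statistics.mean(heights)       # add up, divide by sample size
median_height = statistics.median(heights)   # central value of the ordered list
print(mean_height, median_height)            # 179.5 181.0

# Hypothetical outlier: suppose 192 had been mis-recorded as 250.
with_outlier = heights[:-1] + [250]
print(statistics.mean(with_outlier))         # 186.75, pulled up by the outlier
print(statistics.median(with_outlier))       # 181.0, unchanged

# Mode works on non-numeric data too (these year levels are invented).
year_levels = ["Year 9", "Year 10", "Year 10", "Year 11"]
modes = statistics.multimode(year_levels)
print(modes)                                 # ['Year 10']
```

The outlier moves the mean by more than 7 cm while the median does not move at all, which is the trade-off listed in the advantages and disadvantages above.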

Statistics on a Calculator
Choose your calculator: Casio FX-82, Casio Graphic, Texas.

Statistics on a Casio Graphics Calculator
1. In Stat Mode.
2. In List 1 enter all data values.
3. In List 2 enter their frequencies.
4. F2 (CALC)
5. F6 (SET). The settings should read:
   1Var XList :List1
   1Var Freq :List2
   2Var XList :List3
   2Var YList :List4
   2Var Freq :List5
6. EXIT
7. F1 (1VAR). All statistics are listed: x̄ is the mean, xσn is the standard deviation.

Entering Data on Casio Graphics Calculator
Enter each data value in List 1, followed by EXE.
Enter the frequency of each data value in List 2, followed by EXE.
Note: If all of the frequencies are 1 then you don’t need to enter the frequencies. In the Set Menu, change the 1Var Freq to 1 instead of List 2.

Statistics on a Casio FX 82 Calculator
1. Put your calculator into statistics mode: Mode 2
2. Clear the statistics memory: Shift Mode 1
3. Enter the data carefully, e.g. for 180 cm: 180 M+
4. Calculate desired statistics: Shift 2, then
   1. x̄ mean
   2. xσn standard deviation

Entering Data on Casio FX 82 Calculator
Enter each data value followed by M+.
‘n’ is the number of data values that you have entered, so after the first value the screen shows n = 1.
Note: Be very careful entering the data values, as you cannot review them later to make sure that they are correct.

Statistics on a Texas Calculator
1. Put your calculator into statistics mode: 2nd Function DATA, then 1-VAR.
2. Enter the data carefully: DATA.
3. Calculate desired statistics: STATVAR, then shift between the statistics (n, x̄, Sx, σx) with the arrow keys.
   1. n number of data values
   2. x̄ mean
   3. σx standard deviation

Entering Data on a Texas Calculator
Press the DATA key to begin entering data.
X1 is the data value (e.g. X1 = 180), followed by the down arrow.
Freq1 is that data value’s frequency, followed by the down arrow.
X2 is next, then Freq2.
To check data, use the up arrow.

Definitions
Population: The entire list of those people or things that you wish to sample.
Census: A survey of an entire population.
Sample: A small group of a population.
Parameters: Facts about an entire population gained from a census. (Notation: mean ‘μ’ or standard deviation ‘σ’)
Statistics: Estimates of population parameters calculated from a sample. (Notation: mean ‘x̄’ or standard deviation ‘s’)
Representative: A sample that appears to represent all elements of the population in the correct proportions.
Bias: A sampling method that does not give every element of the population an equal chance of selection.

Standard Deviation
This is a calculation of the average difference between the data values and the mean. This measure of spread applies to the mean.
Advantages: Easy to calculate for large samples on a calculator. Accurate. Very useful for certain types of data.
Disadvantages: Affected by outliers. Possibly not so well understood.
It can be calculated with a calculator or with a table.

Interquartile Range
1. Calculate the upper and lower quartiles.
2. Upper quartile minus lower quartile.
3. This measure of spread applies to the median.
Advantages: Well understood. Unaffected by outliers.
Disadvantages: Not easy to calculate for large samples.

Range
1. Find the highest and lowest value.
2. Highest value minus the lowest value.
3. This measure of spread applies to all measures of centre.
Advantages: Well understood. Easy to calculate.
Disadvantages: Strongly affected by outliers, since it uses only the highest and lowest values.

Standard Deviation by Table
          x      x̄     x – x̄    (x – x̄)²
         180    165     15       225
         150    165    -15       225
         165    165      0         0
         170    165      5        25
         160    165     -5        25
Total    825             0       500
Mean     165                     100
x: Data values, from your sample or census.
x̄: Mean, calculated as usual; it doesn’t change.
x – x̄: Data values minus the mean.
(x – x̄)²: Square of each of the values to the left.
The final standard deviation is the square root of the mean of the squares (100), so s = 10.
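The table can be reproduced column by column in Python. This sketch follows the slide in dividing by the number of values (the calculator's σn form); the variable names are illustrative only.

```python
import math
import statistics

values = [180, 150, 165, 170, 160]                 # data values from the table

mean = sum(values) / len(values)                   # 825 / 5 = 165
deviations = [x - mean for x in values]            # data values minus the mean
squares = [d ** 2 for d in deviations]             # square of each deviation
mean_square = sum(squares) / len(squares)          # 500 / 5 = 100
std_dev = math.sqrt(mean_square)                   # square root of 100

print(mean, sum(deviations), sum(squares), std_dev)   # 165.0 0.0 500.0 10.0

# The standard library gives the same divide-by-n result.
print(statistics.pstdev(values))                      # 10.0
```

Dividing by n − 1 instead (`statistics.stdev`) would give a slightly larger value of about 11.18; the table above uses n.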

Calculating Quartiles
1. List all the values in order.
2. Find the central value.
3. Discard that central value.
4. Find the central value of each of the remaining two halves.
5. These 2 numbers are the upper and lower quartiles.
Example (Heights of WBHS students)
1. Data values: 165, 170, 173, 180, 182, 183, 191, 192
2. The central value is the middle of 180 and 182, so the median is 181.
3. Discard 181 and calculate the middle of each half.
4. 165, 170, 173, 180 // 182, 183, 191, 192
Lower quartile: middle of 170 and 173, so 171.5. Upper quartile: middle of 183 and 191, so 187.
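The halving method described here can be sketched as follows; `quartiles` is a hypothetical helper name. On the eight example heights the lower quartile works out to 171.5, the midpoint of 170 and 173, and the upper quartile to 187.

```python
from statistics import median

def quartiles(values):
    """Median of each half of the ordered data, dropping the middle value if n is odd."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if len(ordered) % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)

heights = [165, 170, 173, 180, 182, 183, 191, 192]
lower_q, med, upper_q = quartiles(heights)
iqr = upper_q - lower_q
print(lower_q, med, upper_q, iqr)   # 171.5 181.0 187.0 15.5
```

Other quartile conventions exist (calculators and software do not all agree), so results on small data sets can differ slightly from this method.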

Things to Consider Is my sample representative of the population? Need to consider whether any strata present in the data are represented in approximately the correct proportions. Need to consider the presence of any apparent outliers in the sample chosen, and the effect they will have on estimates of population parameters.

Things to Consider Is my sample representative of the population? Estimates are more reliable when taken from a large sample as the effects of outliers are lessened. Consider the size of the s.d. A larger value of s suggests considerable variation in the data values. Thus taking another sample could produce quite different statistics. Ask yourself, “If I were to repeat this sampling process, would I get the same results?”

Things to Consider How could I improve my sampling method? Need to choose a sampling method which eliminates bias, and which gives the best chance of choosing a representative sample. (Bias exists when some of the population members have greater or lesser chance of being included in the sample.) Need to discuss which statistics would give the best estimates of population parameters, including the effect of outliers.

Things to Consider Would I get the same or similar results if I repeated the same process? Are there outliers or extreme values that may affect the sample statistics? If so, then I probably wouldn’t get similar results. Is the standard deviation (or other measure of spread) large when compared to the mean? If it is, then getting the same results again is unlikely.
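One way to explore the "would I get the same results?" question is to repeat the sampling many times on a computer. The sketch below uses an entirely synthetic population of 529 heights (generated at random, not real WBHS data), and the helper name and sample sizes are assumptions chosen for illustration.

```python
import random
import statistics

def spread_of_sample_means(population, sample_size, repeats, rng):
    """Take many simple random samples and measure how much their means vary."""
    means = [statistics.mean(rng.sample(population, sample_size))
             for _ in range(repeats)]
    return statistics.pstdev(means)

rng = random.Random(2024)
# Synthetic heights in cm: an assumption for illustration only.
population = [rng.gauss(170, 10) for _ in range(529)]

small = spread_of_sample_means(population, 10, 200, rng)
large = spread_of_sample_means(population, 100, 200, rng)
print(round(small, 2), round(large, 2))
```

The means from samples of 100 vary much less from one repeat to the next than the means from samples of 10, which matches the point that estimates are more reliable when taken from a large sample.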

Things to Consider When answering questions or stating conclusions: Answers need to be precise and refer to actual data values present in the sample and/or population. Strata must be clearly defined. Answers cannot be vague or rote-learnt without referring specifically to the context of the assessment. Students must be very clear that the sample statistics are ESTIMATES of the population parameters. They must NOT state that the population mean is … unless they have taken a census of the whole population!

On Your Calculator
In this part of the presentation you can check exactly how to use your calculator effectively to help with Statistics: Generating Random Numbers, Entering Data, Calculating Statistics.

Entering Data on a Calculator
Choose your calculator: Casio FX-82, Casio Graphic, Texas.
