Presentation is loading. Please wait.

# Sampling Design for International Surveys in Education Guide to the PISA Data Analysis ManualPISA Data Analysis Manual.

## Presentation on theme: "Sampling Design for International Surveys in Education Guide to the PISA Data Analysis ManualPISA Data Analysis Manual."— Presentation transcript:

Sampling Design for International Surveys in Education Guide to the PISA Data Analysis ManualPISA Data Analysis Manual

Finite versus Infinite – Most human populations can be listed but other types of populations (e.g. mosquitoes) cannot; however their sizes can be estimated from sample If a sample from a finite population is drawn from a finite population with replacement, then the population is assimilated to an infinite population Costs of a census Time to collect, code or mark, enter the data into electronic files and analyze the data Delaying the publication of the results, delay incompatible with the request of the survey sponsor The census will not necessarily bring additional information Why drawing a sample, but not a census

Let us assume a population of N cases. To draw a simple random sample of n cases: – Each individual must have a non zero probability of selection (coverage, exclusion); – All individuals must have the same probability of selection, i.e. a equi-probabilistic sample and self-weighted sample – Cases are drawn independently each others What is a simple random sample (SRS)?

SRS is assumed by most statistical software packages (SAS, SPSS, Statistica, Stata, R…) for the computation of standard errors (SE); If the assumption is not correct (i.e. cases were not drawn according to a SRS design) – estimates of SE will be biased; – therefore P values and inferences will be incorrect – In most cases, null hypothesis will be rejected while it should have been accepted What is a simple random sample (SRS)?

There are several ways to draw a SRS: – The N members of the population are numbered and n of them are selected by random numbers, without replacement; or – N numbered discs are placed in a container, mixed well, and n of them are randomly selected; or – The N population members are arranged in a random order, and every N/n member is then selected or the first n individuals are selected. How to draw a simple random sample

Randomness : use of inferential statistics – Probabilistic sample – Non-probabilistic sample Convenience sample, quota sample Single-stage versus multi-stage samples – Direct or indirect draws of population members Selection of schools, then classes, then students Criteria for differentiating samples

Probability of selection – Equiprobabilistic samples – Samples with varying probabilities Selection of farms according to the livestock size Selection of schools according to the enrolment figures (PPS: Probability Proportional to Size) Stratification – Explicit stratification dividing the population into different subpopulations and drawing independent samples within each stratum Criteria for differentiating samples

Stratification – Explicit stratification – Implicit stratification sorting the data according to one or several criteria and then applying a systematic sampling procedure Estimating the average weight of a group of students – sorting students according to their height – Defining the sampling interval (N/n) – Selecting every (N/n) th students Criteria for differentiating samples

The target population (population of inference): a single grade cohort (IEA studies) versus age cohort, typically a twelve-month span (PISA) – Grade cohort In a particular country, meaningful for policy makers and easy to define the population and to sample it How to define at the international level grades that are comparable? – Average age – Educational reform that impact on age average Criteria for designing a sample in education

Extract from the J.E. Gustafsson in Loveless, T (2007) TIMSS grade 8 : Change in performance between 1995 and 2003 Criteria for designing a sample in education

– Age cohort Same average age, same one year age span Varying grades Not so interesting at the national level for policy makers Administration difficulties Difficulties for building the school frame Criteria for designing a sample in education

Multi-stage sample – Grade population Selection of schools Selection of classes versus students of the target grade – Student sample more efficient but impossible to link student data with teacher / class data, – Age population Selection of schools and then selection of students across classes and across grades Criteria for designing a sample in education

School / Class / Student Variance

Criteria for designing a sample in education School / Class / StudentVariance

Criteria for designing a sample in education School / Class / StudentVariance

Criteria for designing a sample in education School / Class / Student Variance

OECD (2010). PISA 2009 Results: What Makes a School Successfull? Ressources, Policies and Practices. Volume IV. Paris: OECD. Criteria for designing a sample in education

19 Variance Decomposition Reading Literacy PISA 2000 Criteria for designing a sample in education

What is the best representative sample: – 100 schools and 10 students per school; OR – 20 schools and 50 students per school? Systems with very low school variance – Each school SRS – Equally accurate for student level estimates – Not equally accurate for school level estimates In Belgium, about 60 % of the variance lies between schools: – Each school is representative of a narrow part of the population only – Better to sample 100 schools, even for student level estimates

Data collection procedures – Test Administrators External Internal – Online data collection procedures Cost of the survey Accuracy – IEA studies: effective sample size of 400 students – Maximizing accuracy with stratification variables Criteria for designing a sample in education

Weights Simple Random Sample

Weights Simple Random Sample (SRS)

Weights Multi-Stage Sample : SRS & SRS Population of – 10 schools with exactly – 40 students per school SRS Samples of – 4 schools – 10 students per school

Weights Multi-Stage Sample : SRS & SRS

Sch IDSizePiPi WiWi P j|i W j|i P ij W ij Sum(W ij ) 140 2 0.42.50.2540.110100 340 4 5 0.42.50.2540.110100 640 7 0.42.50.2540.110100 840 9 10400.42.50.2540.110100 Total10400 Weights Multi-Stage Sample : SRS & SRS

Sch IDSizePiPi WiWi P j|i W j|i P ij W ij Sum(W ij ) 1 10 2 150.42.50.661.50.273.7537.5 3 20 4 25 5 300.42.50.3330.137.575 6 35 7 400.42.50.2540.110100 8 45 9 80 10 1000.42.50.1100.0425250 Total 40010462.5 Weights Multi-Stage Sample : SRS & SRS

Sch IDSizePiPi WiWi P j|i W j|i P ij W ij Sum(W ij ) 1 100.42.5110.42.525 2 150.42.50.661.50.273.7537.5 3 200.42.50.520.2550 4 250.42.50.42.50.166.2562.5 Total 10175 Weights Multi-Stage Sample : SRS & SRS Sch IDSizePiPi WiWi P j|i W j|i P ij W ij Sum(W ij ) 7400.42.50.25040.1010.00100.0 8450.42.50.2224.50.8811.25112.5 9800.42.50.12580.0520.00200.0 101000.42.50.100100.0425.00250.0 Total10662.5

Weights Multi-Stage Sample : PPS & SRS

Sch IDSizePiPi WiWi P j|i W j|i P ij W ij Sum(W ij ) 110 215 3200.25.000.5002.00.110100 425 530 635 7400.42.500.2504.00.110100 845 9800.81.250.1258.00.110100 1010011.000.10010.00.110100 Total4009.75400 Weights Multi-Stage Sample : PPS & SRS

Sch IDSizePiPi WiWi P j|I W j|i P ij W ij Sum(W ij ) 1 100.1010.001.00 0,1010100 2 150.156,670.671.500,1010100 3 200,205.000.502.000,1010100 4 250.254.000.402.500,1010100 Total 25.67 400 Weights Multi-Stage Sample : PPS & SRS Sch IDSizePiPi WiWi P j|i W j|i P ij W ij Sum(W ij ) 7400.402.500.254.000,1010100 8450.452.220.224.500,1010100 9800.801.250.138.000,1010100 101001.00 0.1010.000,1010100 Total 6.97 400

Several steps – 1. Data cleaning of school sample frame; – 2. Selection of stratification variables; – 3. Computation of the school sample size per explicit stratum; – 4. Selection of the school sample. How to draw a Multi-Stage Sample : PPS & SRS

Step 1:data cleaning: – Missing data School ID Stratification variables Measure of size – Duplicate school ID – Plausibility of the measure of size: Age, grade or total enrolment Outliers (+/- 3 STD) Gender distribution … How to draw a Multi-Stage Sample : PPS & SRS

Step 2: selection of stratification variables – Improving the accuracy of the population estimates Selection of variables that highly correlate with the survey main measures, i.e. achievement – % of over-aged students (Belgium) – School type (Gymnasium, Gesantschule, Realschule, Haptschule) – Reporting results by subnational level Provinces, states, Landers Tracks Linguistics entities How to draw a Multi-Stage Sample : PPS & SRS

Step 3: computation of the school sample size for each explicit stratum – Proportional to the number of students schools How to draw a Multi-Stage Sample : PPS & SRS

StratumSchool IDSize 1120 12 13 14 15 2660 27 28 29 21060 5 schools and 100 students How to draw a Multi-Stage Sample : PPS & SRS 5 schools and 100 students

Proportional to the number of schools (i.e. 2 schools per stratum and 10 students per school) StratumSchool IDSizeWiWi W j|i W ij 1120 12 2.5025 1320 14 2.5025 1520 2660 27 2.50615 2860 29 2.50615 21060 How to draw a Multi-Stage Sample : PPS & SRS

Proportional to the number of students How to draw a Multi-Stage Sample : PPS & SRS Stratum Number of schools Number of students % Schools to be sampled WiWi W j|i W ij 1510025%15210 2530075%35/3610 This is an example as it is required to have at least 2 schools per explicit stratum

Step 4: selection of schools – Distributing as many lottery tickets as students per school and then SRS of n tickets A school can be drawn more than once Important sampling variability for the sum of school weights – From 6.97 to 25.67 in the example How to draw a Multi-Stage Sample : PPS & SRS Sch IDSizePiPi WiWi Sch IDSizePiPi WiWi 1 100.1010.00 7 400.402.50 2 150.156.67 8 450.452.22 3 200.205.00 9 800.801.25 4 250.254.00 10 1001.00 Total 25.67 Total 6.97

Step 4: selection of schools – Use of a systematic procedure for minimizing the sampling variability of the school weights Sorting schools by size Computation of a school sampling interval Drawing a random number from a uniform distribution [0,1] Application of a systematic procedure – Impossibility of selecting the n sc smallest schools or the n sc biggest schools How to draw a Multi-Stage Sample : PPS & SRS

IDSizeFromToSAMPLED 115 1 1 220 16350 325 36600 430 61900 535 911251 640 1261650 745 1662100 850 2112601 960 2613200 1080 3214001 Total400 1.Computation of the sampling interval, i.e. 2.Random draw from a uniform distribution [0,1], i.e. 0.125 3.Multiplication of the random number by the sampling interval 4.The school that contains 12 is selected 5.Systematic application of the sampling interval, i.e. 112, 212, 312 How to draw a Multi-Stage Sample : PPS & SRS

IDSizePiPi WiWi 1 100.1010.00 2 150.156.67 3 200.205.00 4 250.254.00 5 300.303.33 6 350.352.86 7 400.402.50 8 450.452.22 9 500.502.00 10 1301.300.77 Total400 IDSizePiPi WiWi 1 1 100.119.00 2 150.176.00 3 200.224.50 4 250.283.60 5 300.333.00 6 350.392.57 7 400.442.25 8 450.502.00 9 500.561.80 Total270 210130 11 4 3 Certainty schools How to draw a Multi-Stage Sample : PPS & SRS

CountryMeanP5P95STDCV AUS 16.63.129.19.054.3 AUT 18.310.233.46.636.0 BEL 13.91.122.36.345.5 CAN 16.41.166.21.5131.5 CHE 7.41.020.87.196.8 CZE 21.72.249.814.566.8 DEU 184.7127.4273.346.125.0 DNK 12.67.720.13.729.3 ESP 19.52.183.126.8137.5 FIN 13.010.915.82.216.6 FRA 156.8136.7193.319.112.2 GBR 55.77.0152.956.3101.2 GRC 19.811.533.16.432.4 HUN 23.615.439.57.230.6 IRL 12.010.015.21.815.2 ISL 1.21.01.50.112.2 ITA 23.91.293.527.7116.1 Weight variability (w_fstuwt) OECD (PISA 2006)

Why do weights vary at the end? – Oversampling (Ex: Belgium, PISA 2009) Weight variability Belgian Communities Sample sizeAverage weightSum of weights Flemish 459614.3365847 French 310916.8752453 German 7961.05839 – Non-response adjustment – Lack of accuracy of the school sample frame – Changes in the Measure of Size (MOS)

Lack of accuracy / changes. – PISA 2009 main survey School sample drawn in 2008; MOS of 2006 Ex: 4 schools with the same p i, selection of 20 students ID Old Size PiPi W New size P j|i W j|i P ij W ij Sum(W ij ) 11000.2052000.10100.020501000 21000.2051400.1470.02835700 31000.205800.2540.05020400 41000.205400.5020.10010200 Weight variability Larger risk with small or very small schools

StratumIDSizeWiWi Parti.W i _adW j|i W ij Parti.W ij _adSum 1 120 2 5.001 2.0010812.5100 320 4 5 Total100 2 6601.6612.506.0015818.75150 760 8 1.660 9601.6612.506.00151015150 1060 Total3005 Non-response adjustment (school / student ) : ratio between the number of units that should have participated and the number of units that actually participated Weight variability

3 types of weight: TOTAL weight: the sum of the weights is an estimate of the target population size CONSTANT weight : the sum of the weights for each country is a constant (for instance 1000) – Used for scale (cognitive and non cognitive) standardization SAMPLE weight : the sum of the weights is equal to the sample size Different types of weight

Similar presentations

Ads by Google