The Weighting Strategy of the Canadian Community Health Survey Cathlin Sarafin Methodologist Statistics Canada March 25, 2008.

1 The Weighting Strategy of the Canadian Community Health Survey Cathlin Sarafin Methodologist Statistics Canada March 25, 2008

2 Outline Introduction Methodology The Canadian Community Health Survey (CCHS) The Multiple Frames The Weighting Strategy of the CCHS Methodology Recruitment Process

3 Introduction Methodology Structure: You Recruits are called Junior Methodologists Your Unit 2 to 7 Methodologists supervised by one Senior Methodologist Your Section 3 to 6 units working on related projects, managed by a Chief Your Division A division has roughly 100 people, usually all together on one floor of the building

4 Introduction Every person has their own responsibilities Senior Methodologist outlines tasks Discuss options and approaches as a team

5 Introduction Variance estimation Data quality indicators Record linkage Time series Data analysis Disclosure control Research and development Survey Methodology: Frame creation Sampling Questionnaire design Data collection methods Data processing Edit and imputation Weighting and estimation

6 The CCHS Collects general health information on the Canadian population Estimates produced for more than 120 Health Regions (HRs) across Canada Produces estimates on: Health Risk Factors Health Status Health Care Services

7 The CCHS The CCHS was introduced in 2000 Data was collected every second year for a total sample size of 130,000 per year It was redesigned in 2007 Data is now collected continuously for a total sample size of ≈ 65,000 respondents per year Annual files are released Multi-year files will be produced starting in 2009

8 The CCHS A cross-sectional survey Survey a specific population for a given period of time A longitudinal survey Survey a specific population repeatedly over time

9 The CCHS Target population: Individuals aged 12 years and over living in private dwellings Exclusions: those living on Indian Reserves and Crown Lands, residents of institutions, full-time members of the Canadian Forces and residents of some remote areas CCHS covers ~98% of the Canadian population

10 The CCHS Has a complex, multi-stage, dual frame design Area frame (49%) Telephone list frame (50%) Random digit dialing (RDD) frame (1%) The telephone frames complement the area frame in most HRs

11 The Area Frame Units are geographical areas Target sampling units are not listed Based on Labour Force Survey (LFS) design 6 rotation groups Stratified probability proportional to size sample of clusters Systematic sample of dwellings Random selection of a start Probabilistic sample of one individual per household

12 The Area Frame LFS Sample Selection (diagram: Province XYZ divided into Stratum #1 and Stratum #2) 1. Each province is divided into geographic strata 2. Clusters selected within strata (PPS sampling): 1st stage 3. Dwellings selected within clusters (systematic sampling): 2nd stage 4. People selected within responding dwellings: 3rd stage
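
The second-stage selection described above (a systematic sample of dwellings after a random start) can be sketched as follows. This is a minimal Python illustration, not the LFS production procedure; the function name `systematic_sample`, the cluster of 20 dwellings and the sampling interval of 5 are all hypothetical.

```python
import random

def systematic_sample(units, interval, rng):
    """Take every `interval`-th unit after a random start in [0, interval)."""
    start = rng.randrange(interval)   # the random "start"
    return units[start::interval]

# Hypothetical cluster of 20 dwellings, sampled at an interval of 5.
dwellings = list(range(1, 21))
sample = systematic_sample(dwellings, interval=5, rng=random.Random(0))
```

Each possible start (0 to 4 here) defines one non-overlapping systematic sample, which is why a cluster can supply several distinct starts.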

13 The Area Frame Why use such a design? Stratification: Better coverage of the entire region of interest Increases precision Clustering: Efficient for interviewing (less travel, less costly) Decreases precision

14 The Area Frame The CCHS selection process: The LFS provides a list of available starts (systematic samples) within each cluster The clusters are mapped to the CCHS HRs A random selection of starts is chosen within a HR Probabilistic sample of one individual per household

15 The Area Frame 2-phase sample 1st phase is the LFS sample of starts within the LFS strata 2nd phase is the CCHS sample of starts within the HRs

16 The Area Frame Why use the LFS? No adequate list of addresses available Costly to create and maintain such a frame LFS has good coverage of target population It is a monthly sample conducted at Statistics Canada Continually updated

17 The Telephone Frame List of telephone numbers from across Canada Created using InfoDirect © files Stratified by HR SRSWOR sample of phone numbers Probabilistic sample of one individual per household

18 The RDD Frame Phone numbers are grouped into banks Banks are assigned to a HR Computer randomly generates the last 2 numbers Probabilistic sample of one individual per household
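
As a rough sketch of the RDD step above, the snippet below treats a bank as a phone number without its last two digits and draws those two digits at random. The function name `rdd_numbers`, the bank `61355501` and the sample size of 5 are hypothetical, and drawing without replacement within a bank is an assumption.

```python
import random

def rdd_numbers(bank, k, rng):
    """Generate k distinct phone numbers in one bank by drawing the last two digits."""
    suffixes = rng.sample(range(100), k)      # 00..99, without replacement
    return [f"{bank}{suffix:02d}" for suffix in suffixes]

# Hypothetical bank: every number of the form 613-555-01xx.
rng = random.Random(2008)
numbers = rdd_numbers("61355501", 5, rng)
```

Because the digits are generated rather than taken from a list, many of the resulting numbers are unassigned or non-residential, which is the inefficiency noted for this frame.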

19 Dual Frame Design Multiple frames are used to: Improve the coverage of the target population Reduce costs Area Frame Covers target population Costly to implement Listing costs Face-to-face interview costs

20 Dual Frame Design Telephone Frame Only covers population with listed phone numbers Undercoverage may bias the estimates Growing problem with the increasing popularity of cell phones Less costly to implement Calls made from regional offices

21 Dual Frame Design RDD Frame Inefficient Results in a large amount of out-of-scope numbers Used alone for 2 northern regions LFS is not adequate for these 2 regions Used as a complement to the area frame in Whitehorse and Yellowknife Quality of telephone frame is considered poor in these regions

22 The Weighting Strategy of the CCHS Area Frame: A0 - Initial weight; A1 - Sub-cluster adjustment; A2 - Stabilization; A3 - Out-of-scope dwellings; A4 - Household nonresponse. Telephone Frame: T0 - Initial weight; T1 - Number of collection periods; T2 - Out-of-scope numbers; T3 - Household nonresponse; T4 - Multiple phone lines. Combined Frame: I1 - Integration; I2 - Person selection; I3 - Person nonresponse; I4 - Winsorization; I5 - Calibration. Final CCHS Weight

23 Sampling Weights Number of people in the population represented by the interviewed person Ex: w_i = 500 Can be broken down into 3 major steps: Design weights Nonresponse adjustment Calibration

24 Design Weights Weights determined by the design of the survey They are the inverse of the inclusion probability A person selected according to a sampling fraction of 1% will have a weight of 1/0.01 = 100 The design weights in the CCHS are calculated separately for each frame Sampling fractions differ between HRs, therefore design weights are not uniform
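
The inverse-probability rule above is short enough to show directly. In this sketch the function name `design_weight` and the per-HR sampling fractions are hypothetical; only the 1% example comes from the slide.

```python
def design_weight(inclusion_probability):
    """Design weight = 1 / inclusion probability."""
    if not 0 < inclusion_probability <= 1:
        raise ValueError("inclusion probability must be in (0, 1]")
    return 1.0 / inclusion_probability

# The slide's example: a 1% sampling fraction gives a weight of 100.
example_weight = design_weight(0.01)

# Hypothetical sampling fractions for two HRs, showing why weights differ by HR.
fractions = {"HR_A": 0.01, "HR_B": 0.002}
weights_by_hr = {hr: design_weight(f) for hr, f in fractions.items()}
```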

25 List Frame Design Weights The sample is stratified by HR, so weights are calculated within HR It is an SRSWOR of phone numbers Probability of selection within HR g is
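
For a simple random sample without replacement, the selection probability has a standard form. Assuming, for illustration only, that $n_g$ is the number of phone numbers sampled in HR $g$ and $N_g$ is the number of phone numbers on the list frame in HR $g$ (these symbols are not taken from the slides), the probability and the resulting design weight would be:

```latex
\pi_g = \frac{n_g}{N_g}, \qquad w_g = \frac{1}{\pi_g} = \frac{N_g}{n_g}
```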

26 Area Frame Design Weights The LFS is redesigned every 10 years A 20-year sample plan is created The LFS provides a list of available starts Typically consists of 40 columns and 6 rows per LFS stratum Each row represents a rotation group Each column represents a monthly LFS sample

27 Area Frame Design Weights LFS Stratum | Rotation | Cluster | Start | Cluster | Start | Cluster | Start 501111213 2242536 37879710 504616243 5949596 6516512513 One LFS sample

28 Area Frame Design Weights The LFS provides a weight for one LFS sample A weight for every start in one column This weight is used to assign a weight to all available starts The weights are then redistributed to the CCHS selected starts within each HR

29 Nonresponse Adjustments The design weights are corrected for total nonresponse (NR) All the variables for the respondent are missing Complete refusal Unable to contact the respondent Respondent absent for the duration of the survey Language barrier Information obtained is unusable

30 Nonresponse Adjustments There are 2 types of NR in the CCHS Household level Person level The weights of the nonrespondents have to be redistributed to the respondents Form groups based on auxiliary information

31 NR Adjustments There are several methods available for the creation of response homogeneity groups (RHGs) The CCHS uses the scoring method Logistic regression is used to obtain a predicted probability of response for every unit Groups are formed based on the values of these predicted probabilities

32 NR Adjustments Logistic Regression Models Variables include geographic information, process data and socio-economic indicators Variables derived from process data include: Number of attempts Time/day of attempt Called on weekday/weekend

33 NR Adjustments Initial groups are formed using a clustering algorithm in SAS These groups are then collapsed to ensure: A response rate of at least 50% At least 20 observations The adjustment within each RHG is
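
A common form of the within-RHG adjustment, assumed here for illustration, multiplies each respondent's weight by the ratio of the total weight of all units in the group to the total weight of its respondents. The function name `nonresponse_adjust`, the record layout and the numbers below are hypothetical.

```python
from collections import defaultdict

def nonresponse_adjust(units):
    """Redistribute nonrespondent weights to respondents within each RHG."""
    total = defaultdict(float)        # weight of all units, by RHG
    responding = defaultdict(float)   # weight of respondents, by RHG
    for u in units:
        total[u["rhg"]] += u["weight"]
        if u["responded"]:
            responding[u["rhg"]] += u["weight"]
    adjusted = []
    for u in units:
        if u["responded"]:
            factor = total[u["rhg"]] / responding[u["rhg"]]
            adjusted.append({**u, "weight": u["weight"] * factor})
    return adjusted

# Hypothetical units in two RHGs.
units = [
    {"rhg": 1, "weight": 100.0, "responded": True},
    {"rhg": 1, "weight": 100.0, "responded": False},
    {"rhg": 2, "weight": 50.0, "responded": True},
    {"rhg": 2, "weight": 50.0, "responded": True},
    {"rhg": 2, "weight": 100.0, "responded": False},
]
adjusted = nonresponse_adjust(units)
```

The total weight is preserved: respondents end up carrying the weight of the nonrespondents in their own group, which is why the groups are collapsed until each has enough respondents.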

34 Integration of Frames Area Frame Telephone Frame No phone line Unlisted phone number Listed phone number

35 Integration of Frames Area Frame Population = A Sample = S_A Telephone Frame Population = B Sample = S_B

36 Integration Integration factor: A number between 0 and 1 For CCHS it is based on sample size

37 Integration Parameter of interest: Unbiased estimates

38 Integration Composite estimation
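
Composite estimation with an integration factor can be sketched as below. The sketch assumes the textbook form in which area frame weights are multiplied by a factor alpha and telephone frame weights by 1 - alpha, with alpha set to the area frame's share of the combined sample size; it also assumes both frames cover the same population. The function names and numbers are hypothetical.

```python
def integration_factor(n_area, n_phone):
    """Assumed factor: the area frame's share of the combined sample size."""
    return n_area / (n_area + n_phone)

def integrate(weights_area, weights_phone, alpha):
    """Scale each frame's weights so the combined sample estimates the population once."""
    return [alpha * w for w in weights_area] + [(1 - alpha) * w for w in weights_phone]

# Hypothetical samples: each frame on its own estimates a population of 2000.
weights_area = [1000.0, 1000.0]
weights_phone = [500.0, 500.0, 500.0, 500.0]
alpha = integration_factor(len(weights_area), len(weights_phone))
combined = integrate(weights_area, weights_phone, alpha)
```

Without this step, pooling the two samples would count the shared population twice.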

39 Integration of Frames Possible to integrate only the overlapping populations covered by the 2 frames Problem identifying the overlapping portion for the area frame due to nonresponse Possible to impute these cases

40 Integration of Frames Diagram: Area Frame and Telephone Frame, with samples S_B, S_AB, S_A and S_AU

41 Integration of Frames Logistic regression is used to assign a probability of belonging to the non-common part S_A The final integration method is

42 Calibration Weights are adjusted to match population projection counts Based on the Census Adjusted to account for births, deaths, immigration and emigration The rounded average of the monthly projection counts is used within each post-stratum

43 Calibration Why is calibration used? Gives confidence when estimating totals Improves precision of the estimates If auxiliary variables are well correlated to the survey variables Adjusts for coverage inadequacies when the survey population differs from the target population

44 Calibration In the CCHS All post-strata with at least 20 observations are calibrated at the HR by age by sex level HR: 120 across Canada Age groups: 12-19, 20-29, 30-44, 45-64 and 65 + Sex: Male and Female
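
Calibration to projection counts within post-strata can be illustrated with a simple post-stratification factor: the projection count divided by the sum of weights in the post-stratum. This is a sketch of that general technique rather than the CCHS production code; the function name `calibrate`, the record layout and the projection counts are hypothetical.

```python
from collections import defaultdict

def calibrate(records, projections):
    """Scale weights so each post-stratum sums to its projection count."""
    weight_sum = defaultdict(float)
    for r in records:
        weight_sum[r["post_stratum"]] += r["weight"]
    return [
        {**r, "weight": r["weight"] * projections[r["post_stratum"]] / weight_sum[r["post_stratum"]]}
        for r in records
    ]

# Hypothetical post-strata (HR by age by sex) and projection counts.
records = [
    {"post_stratum": ("HR2", "20-29", "F"), "weight": 300.0},
    {"post_stratum": ("HR2", "20-29", "F"), "weight": 500.0},
    {"post_stratum": ("HR2", "20-29", "M"), "weight": 400.0},
    {"post_stratum": ("HR2", "20-29", "M"), "weight": 400.0},
]
projections = {("HR2", "20-29", "F"): 1000.0, ("HR2", "20-29", "M"): 600.0}
calibrated = calibrate(records, projections)
```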

45 Calibration Example: HR 2 Number of observations by age group. Females: 12-19: 15; 20-29: 40; 30-44: 53; 45-64: 18; 65+: 31. Males: 12-19: 25; 20-29: 40; 30-44: 53; 45-64: 22; 65+: 31. Post-strata = HR by age by sex; Post-strata = HR by sex; Post-strata = Prov by age by sex
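
Applying the 20-observation rule to the HR 2 example can be sketched as follows. The counts come from the slide, but assigning the first table to females and the second to males is an assumption based on the order of the labels; the variable names are hypothetical.

```python
MIN_OBS = 20  # minimum observations for calibrating at the HR by age by sex level

# Observation counts for HR 2 (sex labels assumed from the order on the slide).
counts = {
    ("Females", "12-19"): 15, ("Females", "20-29"): 40, ("Females", "30-44"): 53,
    ("Females", "45-64"): 18, ("Females", "65+"): 31,
    ("Males", "12-19"): 25, ("Males", "20-29"): 40, ("Males", "30-44"): 53,
    ("Males", "45-64"): 22, ("Males", "65+"): 31,
}

# Post-strata too small to be calibrated at the HR by age by sex level.
small = [cell for cell, n in counts.items() if n < MIN_OBS]
```

Under that assumption, two female post-strata fall below the threshold, which is presumably where the coarser post-strata listed on the slide (HR by sex, province by age by sex) come in.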

46 Final Weights Master: Contains all variables for all respondents Share: Contains all variables for the subset of people who agreed to share (subset of records) PUMF: Contains a subset of variables for all respondents (subset of variables) Dummy: Contains a subset of records from the master file. Scrambled data used for testing and remote access purposes Bootstrap: Created for variance estimation purposes Special Requests: linkage, different geographies, etc.

47 Methodology Typical tasks: Write computer programs to solve problems or explore data Attend meetings Write documentation Present our work at seminars Work on different committees

48 Methodology Working Conditions Permanent job Continuous learning: Computer courses Statistics and methodology courses Language courses Seminars, conferences and publications

49 Methodology All methodologists work at the Head Office in Ottawa

50 Recruitment Our recruitment campaign takes place each fall Detailed presentations at the Universities by early October It is a 3 step process: On-line application Starts in September Deadline in mid-October Written Exam Early November Interview January

51 Recruitment Who can apply? Persons residing in Canada and Canadian citizens residing abroad Preference will be given to Canadian citizens Bilingualism No preference is given to those who speak both English and French

52 For more information please contact www.statcan.ca Under: About Us Employment opportunities Mathematical statisticians (MA) Email: MA-recruitment@statcan.ca Telephone: 1-888-321-3089

53 Thank you Cathlin.sarafin@statcan.ca Canadian Community Health Survey cchs-escc@statcan.ca

