1 Statistics Canada Research Data Centre Program*
Facilities across Canada housing detailed confidential microdata and documentation files from Statistics Canada
Statistics Canada releases data that would otherwise not be available into “secure” sites.
2 About Statistics Canada data:
Canadian Community Health Survey (CCHS)
Ethnic Diversity Survey (EDS)
General Social Survey (GSS, selected cycles)
–Access to and Use of Information Communication Technology
–Education, Work and Retirement
–Family
–Health
–Social Engagement
–Social Support and Aging
–Time Use
–Victimization
Longitudinal Survey of Immigrants to Canada (LSIC)
National Longitudinal Survey of Children and Youth (NLSCY)
National Population Health Survey (NPHS)
Survey of Labour and Income Dynamics (SLID)
Workplace and Employee Survey (WES)
Youth in Transition Survey and the Programme for International Student Assessment (YITS-PISA)
3 Stats Canada data are released to universities through the “Data Liberation Initiative”
Most Canadian universities are part of this
Data are housed in the university data library (at UVic: Kathleen Matthews, library) and copies are made available to any researcher requesting them, as long as they:
–a) agree to terms (no re-dissemination, etc.)
–b) are a bona fide member of the university community
4 DLI Restrictions
No longitudinal data (in some cases, cross-sectional waves, not linked and with unique identifiers stripped, are available, but in other cases the survey is not available at all)
Many variables are treated as “confidential” and deleted from the dataset or coarsely categorized
5 Censored variables
Full versions of datasets with censored variables + datasets not otherwise available can be worked on in a “Research Data Centre”
7 There are RDCs across Canada at most major universities with doctoral programs:
New Brunswick, Dalhousie, Moncton
Toronto (York has a “branch” which will soon develop into a full-blown centre)
Waterloo (Guelph, WLU participate + a “branch” at Laurentian)
McMaster (Brock participates)
Western (Windsor participates and will soon have a branch)
Queen’s (part-time site)
Carleton (U of Ottawa participates)
Manitoba
U of Saskatchewan
2 Alberta sites: U of Alberta; Calgary (various Prairie universities participate)
Quebec consortium (U de Montreal) with branches at UQAM, Sherbrooke, Laval
McGill
BC universities consortium: UBC, SFU, UVic, Vancouver Island Univ.
8 The UVic branch works within the British Columbia Interuniversity Research Data Centre network
“Main” site is at UBC; open 9-5 M-F
UVic site has more restrictive hours (arranged term-by-term in consultation with researchers).
–Currently 15.5 hours/week (sometimes a bit less in summer)
–Exact hours worked out in consultation with users
9 Support:
Capital costs:
–Canada Foundation for Innovation
–Office of the Provost
Operating costs:
–Dean of Social Science
–Vice-President, Research
–Dean of Graduate Studies
Past support & seeking support for present year:
–Dean of Humanities; Dean of Graduate Studies; Assoc. Dean, Island Med. Pgm.
–Dean of Human and Social Development
–Dean of Education
10 What is the relationship between the RDC network and the “Data Liberation Initiative”?
Users often work with the DLI version of a dataset before progressing to work using the RDC
StatCan will only approve projects if it can be demonstrated that DLI data are insufficient or there are no DLI files for the survey of interest
Contact person on campus for DLI: Kathleen Matthews
11 RDC vs. DLI
Files available
–RDC: those listed on RDC site; other files if arrangements can be made
–DLI: those listed on DLI site (see UVic library web page under Data Acquisition)
Files not available
–DLI: any linked longitudinal dataset; recent waves of NLSCY; some newly available surveys
Information on files not available
–DLI: cluster numbers, geographic detail, demographic detail, other
12 RDC vs. DLI
Who may access
–RDC: faculty with approved projects; graduate students with approved projects (+ faculty co-investigator)
–DLI: any member of UVic community with NetLink ID
Where files may be worked on
–RDC: in Data Centre only
–DLI: may be downloaded to be used anywhere, with agreement not to redistribute
Initial contact
–RDC: Doug Baer or Lee Grenon (StatCan Analyst located in Vancouver at UBC)
–DLI: Kathleen Matthews, Data Librarian, UVic
13 Data used most frequently at RDCs
Survey of Labour and Income Dynamics (SLID)
Victimization GSS cycle 3, 8, 13, 18
Participation and Activity Limitation
Family history GSS cycle 5, 10, 15 and 20
Homicide data-pilot
National Graduate Survey / Follow-up Survey of 2000 Graduates
Canadian Health survey 1.2
Canadian Health survey 2.2
Canadian Health survey.1
14 Major StatCan surveys: (ALL VERY WELL SUPPORTED AT RDCs)
Workplace and Employee Survey
Canadian Community Health Survey
Health Services Access Survey
Longitudinal:
–National Population Health Survey
–Survey of Labour & Income Dynamics
–National Longitudinal Survey of Children and Youth
–Longitudinal Survey of Immigrants to Canada
–Youth in Transition Survey
–Workplace & Employee Survey
Census (presently 1991, 1996, 2001, 2006)
15 Other “supported” surveys
Ethnic diversity survey
Social engagement GSS cycle 17
Family expenditure survey
Housing facilities by income
Survey of consumer finances
Survey of household spending
Labour force survey
Education and work and technology GSS cycle 4 and 9, 14
Adult education and training
International adult literacy and skills survey
Survey of approaches to education planning
Food expenditure survey
Survey of financial security
National survey on giving volunteering and participating cycle 1 and 6
16 Other surveys
Adult literacy and lifestyles survey
Canada's alcohol and other drug survey
Cross national equivalency file
Consumer price survey
Canadian tobacco use monitoring survey
Employment Insurance Coverage Survey
Foreign direct investment
Health Promotion Survey
Information and Communications Technologies in Schools Survey
Labour market activity survey
Statistics Canada Survey of Literacy Skills Used in Daily Activities
National Private Vehicle Use Survey
Ontario Adult Literacy Survey
Ontario Child Health Study
Ontario first nation regional health survey
Post-Secondary Education Participation Survey
Residential care facilities survey
Survey of displaced workers
Survey of independent workers
School Leavers Follow-up Survey
School Leavers Survey
Public service employees survey
Survey of repeat users of EI
Status vector file
US health and retirement survey
United States national health interview survey
17 & other data can be arranged
There is presently a project involving BC Administrative Health data (to be linked to Stats Can survey data)
For a very large list of StatCan surveys, see the DLI website (UVic library); click on “DLI collection”
Future plans: see below
18 What is the process for gaining access?
19 Application process works through SSHRC
Graduate students must have faculty member as co-investigator
20 Project proposal
Proposal evaluation by SSHRC peer review and Statistics Canada
Very few are turned down… though must establish that confidential data are required to complete project
–Does project have scientific merit? Is access to confidential microdata necessary? Does researcher have expertise to conduct research?
–Takes 6-8 weeks
Proposals that are part of SSHRC or CIHR grants forgo the SSHRC peer review process
–Approvals typically 3-4 weeks
21 Process:
Submit proposal
Proposal approved
Security check on applicant
Oath; investigator becomes “deemed employee” of Statistics Canada
Orientation session at UVic
Issued access card for card reader
22 UVic facilities:
6-workstation lab with room for expansion to up to 10 workstations
Workstations now have widescreen monitors or dual-screen configuration
Server for data
Most commonly used statistical software packages
Some highly specialized software packages
Hours are worked out to suit the needs of active researchers.
Fall 2008 hours: Monday & Thursday 10am-3pm
5 additional hours to be worked out in consultation with users
23 Software
Standard stats packages: SPSS (17), SAS (9.1), Stata (10)** (Stata/SE on one machine)
Open-source stats: R
Multilevel models: HLM, LISREL, MPlus
SEM models: LISREL, MPlus
Specialized (Bayesian, MCMC etc.): WinBUGS
Other software can be obtained if demand exists.
24 Security process
No output or notes can be taken out of the room
Users have file drawers and access to a printer inside the centre
Output listings and notes (if typed into a computer file) can be released after they are “vetted” by a Statistics Canada Analyst at the main BC site
Files are sent via encrypted CD to Vancouver (2-3 days)
Files that are approved for release are e-mailed back to the researcher
Pass card works only during centre hours (swipe in, swipe out protocol)
25 Can I work at other RDCs too? Can I work with other researchers? What about other researchers at other universities?
Access is “network wide”
Files are stored on a “project” basis (researchers, RAs, etc. have their own accounts but access to shared files)
UVic researchers are part of the BC consortium and could go to the UBC site if more intense periods of research are required (35 hrs/week vs. 15); project files can be sent to and from the branch (3-6 days)
26 Preparation:
Check to see if dataset is one of standard RDC datasets: check
–Extensive data documentation provided for listed datasets
–If what you are interested in is not on the list, check with Doug Baer or Lee Grenon
Is a public use file available? Check with Kathleen Matthews or on library web site.
Verify that variables needed for research are not on public use file. If possible, use public use file to explore data, etc. http://gateway.uvic.ca/data/default.html
If further dataset documentation is required, ask Doug Baer or Lee Grenon
Go to SSHRC web page to put together application. Don’t hesitate to consult Doug Baer for help.
Be prepared to specify variables to be used. Where a public use version of the dataset is available, be prepared to make clear why RDC access is needed (e.g., “a needed variable is suppressed on the public use file”).
27 Statistics Training
Summer Institutes:
–SPIDA (York University)
–ICPSR (U Michigan)
–Prairie school? (Calgary)
–Possible BC initiatives
–Seminar at the Congress for the Humanities & Social Sciences (this year: multilevel models)
Special workshops and seminars (Baer):
–Possible: SEM, survival/event history models, longitudinal data, multi-level data
28 Contact information:
Doug Baer, Academic Director (Sociology), (721) – 7581, Cornett A365
RDC (Assistant Lorraine Dame): (853) 3196
RDC Analyst at UBC: Lee Grenon
Centre web site (shows hours): web.uvic.ca/rdc
29 Future: Plans are in development to add the following to the RDC dataset collection:
–Cancer Registry (pilot project in progress at BCIRDC)
–HRSDC administrative data
–CPP-disability data
–Homicide data (Cdn. Centre for Justice Statistics) [under review: pilots only]
–Census
–Business data (selected datasets from Small Business & Special Surveys Division)