DLI Orientation: Concepts

Presentation transcript:

DLI Orientation: Concepts A Framework for Thinking about Statistical Information Chuck Humphrey Data Library University of Alberta April 2004

Statistical Information Two models for identifying and selecting appropriate statistical information. First, a chart of statistical information: distinguishing statistics & data; distinguishing aggregate data & microdata.

Statistical Information Second, a continuum of access: matching dissemination channels with desired products.

Statistics or Data Statistics: numeric facts/figures; created from data, i.e., already processed; presentation-ready. Data: numeric files created and organized for analysis; requires processing; not ready for display.

Statistics or Data

Chart of Statistical Information

Chart of Statistical Information This is a typology of the categories or classes of statistical information. Remember, however, that the relationship between statistics and data is causal: statistics are created from data.

Chart of Statistical Information

Chart of Statistical Information An overlap occurs in this chart between Statistics: Databases and Data: Aggregate, which will be discussed below.

Chart of Statistical Information In print

In Print Rely on yearbooks, statistical abstracts, catalogues, and indexes to locate statistics in print. Examples of online indexes to print resources: Statistical Universe and Tablebase. Example of an online catalogue that includes print resources: Statistics Canada’s Online Catalogue.

Chart of Statistical Information Online

Online Statistics Example of e-publications: Statistics Canada Downloadable Publications (DSP). Example of e-tables: Canadian Statistics (STC Website). Example of statistical databases: CANSIM II (STC Website, E-STAT, CHASS).

E-Publications Tend to be available in PDF format. Can use the “Select Text” Tool in the Adobe Reader and copy columns to another application.

Statistical Information

E-Tables Tend to be displayed in HTML. May provide a pull-down list to view other categories in the table. Some e-tables will provide an alternate format for the table that can be downloaded (e.g., the Census tables are available in comma-separated ASCII, IVT, and print-friendly formats).

Databases Often use HTML forms to define the statistics to be retrieved. May offer a variety of output formats for the retrieved statistics (e.g., E-STAT provides IVT format for Beyond 20/20, graphs, charts, maps, and ASCII formats for spreadsheets and databases).

Chart of Statistical Information Aggregate Data

Aggregate Data Aggregate data consist of statistics that are organized into a data structure and stored in a database or in a data file. The data structure is based on tabulations organized by time, geography, or social content.

Aggregate Data Example: CANSIM II Data Structure Time Geography Social Content

Aggregate Data Time series data have long fueled econometric models based on macro-economic indicators. Comma-separated values (CSV) has become an important format for time series data, which are often manipulated, if not analyzed, in a spreadsheet such as Excel.

Aggregate Data Example: CENSUS Data Structure Time Geography Social Content

Aggregate Data Increased availability of GIS software has created greater demand for Census statistics organized as aggregate data. Beyond 20/20 has become a popular tool for reshaping census statistics from 1996 and 2001 for use with GIS software. DBF is the most commonly used format to share census statistics with GIS software.

Aggregate Data A map from E-STAT of Montreal Census Tracts

Aggregate Data “Small area statistics” are a special category of aggregate data. These data files consist of statistics for small geographic areas usually calculated from a population or manufacturing census or an administrative database with enough cases to create accurate summaries for small areas.

Aggregate Data Example: Cause of Death (HID) Data Structure Time Geography Social Content

Aggregate Data Also known as “cross-classified” tables, these files tend to be made of statistics constructed from social-content variables. Examples of cross-classified tables in DLI are found in education and justice.

Chart of Statistical Information Microdata

Microdata This is raw data organized in a file where each line represents a specific unit of observation and the information on each line consists of the values of variables. There are different types of microdata files, which will now be discussed.
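The relationship between microdata and aggregate data can be illustrated with a minimal sketch. The records below are invented for illustration and are not drawn from any Statistics Canada file; the variable names and the helper `tabulate` are hypothetical. Each record stands for one unit of observation, and counting the records in each cell defined by time, geography, and social content produces a small aggregate table.

```python
from collections import Counter

# Hypothetical microdata: one record per respondent (all values invented).
microdata = [
    {"year": 2001, "province": "AB", "sex": "F", "age": 34},
    {"year": 2001, "province": "AB", "sex": "M", "age": 51},
    {"year": 2001, "province": "NB", "sex": "F", "age": 27},
    {"year": 2001, "province": "AB", "sex": "F", "age": 62},
]


def tabulate(records, dimensions):
    """Count the records falling in each cell defined by the dimension variables."""
    return Counter(tuple(record[d] for d in dimensions) for record in records)


# Time (year), geography (province), and social content (sex) define the cells.
aggregate = tabulate(microdata, ["year", "province", "sex"])
for cell, count in sorted(aggregate.items()):
    print(cell, count)
```

The individual ages disappear in the tabulation, which is one reason aggregate data cannot be turned back into microdata.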

Confidential Microdata Master files: these files contain the fullness of detail captured about each case of the unit of observation. This detail is specific enough that the identity of a case can often be disclosed easily. Therefore, these files are treated as confidential.

Confidential Microdata Share files: these are confidential files in which the participants in the survey have signed a consent form permitting Statistics Canada to allow access to their information for approved research. These files consist of a subset of the cases in the master file.

Confidential Microdata In summary, confidential microdata get grouped into two types: master files and share files.

Public Use Microdata These microdata are specially prepared to minimize the possibility of disclosing or identifying any of the cases in a file, i.e., participants in a survey. The original data from the master file are edited to create a public use microdata file.

Public Use Microdata Steps in Anonymizing Microdata: Remove all personal identification information (names, addresses, etc.); Include only gross levels of geography; Collapse detailed information into a smaller number of general categories; Cap the upper range of values of variables with rare cases; Suppress the values of a variable; or Suppress entire cases.
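A few of these steps can be sketched on a single invented record. This is only an illustration under stated assumptions: the field names, the age groups, and the income cap of 100,000 are hypothetical and do not reflect the rules Statistics Canada actually applies when producing a PUMF.

```python
def anonymize(record, income_cap=100_000):
    """Apply some of the listed steps to one hypothetical master-file record."""
    public = dict(record)
    # Remove personal identification information.
    for field in ("name", "address"):
        public.pop(field, None)
    # Include only a gross level of geography: keep province, drop city.
    public.pop("city", None)
    # Collapse detailed ages into a smaller number of general categories.
    age = public.pop("age")
    public["age_group"] = "under 25" if age < 25 else "25-64" if age < 65 else "65+"
    # Cap the upper range of a variable with rare high values.
    public["income"] = min(public["income"], income_cap)
    return public


# Invented record for illustration only.
master_record = {
    "name": "Jane Doe",
    "address": "1 Main St",
    "city": "Edmonton",
    "province": "AB",
    "age": 47,
    "income": 250_000,
}
public_record = anonymize(master_record)
print(public_record)
```

Suppressing a whole variable or an entire case would follow the same pattern: the variable is dropped from every record, or the record is left out of the public file.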

Public Use Microdata Statistics Canada PUMFs Only available for select social surveys that undergo a review by the Data Release Committee, an internal Statistics Canada committee. No ‘enterprise’ public use microdata.

Public Use Microdata Statistics Canada PUMFs Almost all PUMFs consist of cross-sectional samples, that is, samples where the data have been collected from respondents at one point in time. Longitudinal samples, where data are collected from the same individuals two or more times, are difficult to anonymize while still retaining useful information.

Synthetic Microdata These data files have been created by author divisions to assist with the analysis of confidential data files. The files provide the full variable structure of the confidential microdata but do not contain any real cases. They are intended to be used by researchers wanting to submit a file of commands in a statistical package’s language for remote job submission.

Synthetic Microdata They are also being used by those with approved projects in Research Data Centres to help prepare their analysis strategies prior to working in an RDC. Synthetic files are also commonly referred to as “dummy files,” although in its more technical sense this term refers to one specific type of synthetic file.

Synthetic Microdata A variety of synthetic file types are being created and tested by author divisions. One type has no real data but does contain a complete set of real variables. This type is what “dummy file” means in the more technical sense. Another type has a mix of real data but no real cases. The purpose of this type is to provide, in the aggregate, results that should be close to an analysis of the real microdata file.

Synthetic Microdata Users of these files must be advised that none of the analytic results from these files should ever be reported. Their only purpose is to help researchers construct their statistical analysis programs to guard against syntax errors that might exist in their setup. The DLI FTP site clearly distinguishes synthetic files from real microdata files.
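The idea of a dummy file in the technical sense (a complete variable structure with no real data) can be sketched as follows. The codebook, variable names, and codes are invented for illustration, and the function `make_dummy_file` is hypothetical; author divisions may build their synthetic files quite differently.

```python
import random

# Hypothetical codebook: variable name -> valid codes (invented for illustration).
codebook = {
    "SEX": [1, 2],
    "AGEGRP": [1, 2, 3, 4, 5],
    "PROV": [10, 11, 12, 13, 24, 35, 46, 47, 48, 59],
}


def make_dummy_file(codebook, n_cases, seed=0):
    """Build cases with the full variable structure but randomly drawn values."""
    rng = random.Random(seed)
    return [
        {variable: rng.choice(codes) for variable, codes in codebook.items()}
        for _ in range(n_cases)
    ]


dummy = make_dummy_file(codebook, n_cases=5)
for case in dummy:
    print(case)
```

Because the values are random, a command file can be tested against such a file for syntax errors, but any statistic computed from it is meaningless.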

Summary: First Model

Summary: First Model This first model provides a way of thinking about the types of statistical information that exist. Is the information Statistics or Data? If Statistics, is the information in print or online? If online, is it in an e-pub, e-table, or database? If Data, is the information aggregate data or microdata?
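The questions in this first model can be written out as a small decision function. This is a sketch only; the function name `classify` and the label strings are hypothetical and simply mirror the categories in the chart.

```python
def classify(kind, detail, online_form=None):
    """Walk the first model's questions and return a path through the chart."""
    if kind == "statistics":
        if detail == "print":
            return "Statistics > In print"
        if detail == "online":
            if online_form not in ("e-publication", "e-table", "database"):
                raise ValueError("online statistics need a form: e-publication, e-table, or database")
            return f"Statistics > Online > {online_form}"
        raise ValueError("statistics are either in print or online")
    if kind == "data":
        if detail not in ("aggregate data", "microdata"):
            raise ValueError("data are either aggregate data or microdata")
        return f"Data > {detail}"
    raise ValueError("information is either statistics or data")


print(classify("statistics", "online", "database"))
print(classify("data", "microdata"))
```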

The Second Model It is one thing to know about the variety of statistical information that exists, but access to this information is a separate issue. The second model describes the various dissemination channels through which access is provided to statistical information by Statistics Canada.

Continuum of Access Statistics Canada provides access to its statistical information through a variety of services and initiatives that function as dissemination channels. Think of this variety as constituting a continuum along which levels of access are provided.

Continuum of Access There are three characteristics that make up this continuum: Cost, which runs from free to expensive; Restrictions or conditions, which run from open or no restrictions to very restricted; and Type of Information, which runs from statistics to data.

Continuum of Access Access channels, arranged along a continuum from Open / Free / Statistics to Restricted / Expensive / Data: Statistics Canada Website, Depository Service Program, Data Liberation Initiative, Custom Tabulations, Remote Job Submission, Research Data Centres.

Statistics Canada Website Free, Open, Statistics The Daily is an important source of publicly-released official statistics. It has been available on the Website for several years and was the primary source for free statistics in the early years of the Statistics Canada website.

Statistics Canada Website Free, Open, Statistics With the introduction of Community Profiles from the 1996 Census in 2000 and more recent offerings from the Health Statistics Division, this dissemination channel has had a big increase in the amount of statistics available at the national, provincial, CMA, CSD, and Health Region levels.

Depository Service Program Free, Open, Statistics The Depository Service Program (DSP) has provided public access to government information for over 75 years. Through a network of public, special, and academic libraries, the Treasury Board has paid Federal Departments to release publications to the public through the DSP.

Depository Service Program Free, Open, Statistics Statistics Canada has a large series of publications that it makes available through the DSP. Many of these titles are available online in PDF format and are part of the Statistics Canada Downloadable Publication series. While these statistical publications are free, the public is required to go to a DSP library to access them.

Data Liberation Initiative Fee, Licenced, Conditional Access, Data and Statistics DLI provides a wider range of statistical information than the Statistics Canada Website or the DSP, but access is no longer free and rules apply to who is eligible to use these materials. This is a move away from free-&-open to fees-&-conditional access.

Data Liberation Initiative Fee, Licenced, Conditional Access, Data and Statistics DLI provides member institutions in the post-secondary educational sector with access to all “standard data products,” which consist of the statistical databases, public use microdata files, and geography files listed for sale in the Statistics Canada Online Catalogue.

Data Liberation Initiative Fee, Licenced, Conditional Access, Data and Statistics Patrons of this service must hold a current affiliation with a member institution and are restricted to using these materials for teaching, scholarly research, or institutional planning. Furthermore, secondary redistribution of DLI materials is not allowed.

Customized Tabulations Pay-per-view Access A long-term dissemination channel within Statistics Canada has been custom tabulation services. This is a contract service with Statistics Canada to produce tables from surveys or the Census that have not been produced for public release. Each customized product comes with its own licence.

Remote Job Submission AKA, Remote Data Access (RDA) This is a relatively new service for a select number of surveys. The terms of access vary among the author divisions offering this service. Some charge a fee (e.g., access to YITS and PISA is $75 a run), while other divisions do not charge. The Health Statistics Division requires a proposal to access the surveys for which it provides remote job submission.

Remote Job Submission AKA, Remote Data Access (RDA) Synthetic files have been created to assist with the preparation of the statistical command files that are submitted for remote processing. An analysis is prepared in the command language of a statistical package supported by the author division (e.g., SAS or SPSS) and submitted via email to the division.

Remote Job Submission AKA, Remote Data Access (RDA) All results are screened by the author division for disclosure issues prior to the output being sent to the researcher who submitted the job. This dissemination channel provides a means of producing analysis from confidential data files with conditional approval and in some instances for a fee.

Research Data Centres Restricted Access to Confidential Data Research Data Centres house select confidential data files in a controlled Statistics Canada office environment. Access is provided on a project-by-project basis. A SSHRC-administered application process is used to evaluate the proposed use of the confidential data.

Research Data Centres Restricted Access to Confidential Data Furthermore, a security clearance with Statistics Canada must be passed. With approval from both the SSHRC peer review and the security clearance, the members of a research project must undergo an orientation to the RDC, swear an oath to the Statistics Act, and sign a contract with Statistics Canada.

Research Data Centres Restricted Access to Confidential Data The advantage of RDC access over Remote Job Submission is that researchers get to work directly with the confidential data source.

Statistical Information available through Statistics Canada: Different Services
Access runs from Open / Free / Statistics to Restricted / Expensive / Data.
Service: Statistics Canada Website; Depository Service Program; Data Liberation Initiative; Cu$tomized Tabulations & Pay per View; Remote Job Submission; Research Data Centres.
Who is Eligible & Conditions: General Public: available on the Internet at www.statcan.ca; Designated DSP Libraries & their Users: available on site; Post-secondary Academic: restricted to teaching and research purposes; Individuals: contract between STC and individual; Approved Researchers: SSHRC peer review & deemed STC employee.
Products: Statistics Canada Website: The Daily, Canadian, Census, Statistical profiles of Canadian communities, Downloadable publications. Depository Service Program: Paper publications; Electronic publications, which includes priced downloadable publications & select CD ROMS. Data Liberation Initiative: Standard data products: aggregate data bases, microdata files and geography files. Cu$tomized Tabulations & Pay per View: Tables from confidential files that are specially produced by Statistics Canada for a fee and access to specialized databases. Remote Job Submission: “Dummy” or synthetic files to build analysis setups that must then be submitted to Stats Can for processing. Research Data Centres: Confidential data files from the longitudinal surveys begun in the 1990’s.
Notes: Statistics Canada Website: Warning: some parts of the Website are fee-based. Depository Service Program: Some DSP libraries provide off-site access to authenticated users. Data Liberation Initiative: Interface to CANSIM I and Trade Analyzer available through CHASS (University of Toronto) by subscription. Cu$tomized Tabulations & Pay per View: Specialized databases include CANSIM II and Trade Analyzer. Remote Job Submission: Services available for selected titles. Remote job submission is most developed for NPHS. Research Data Centres: Applications can now be submitted through the SSHRC Web site.

Using the Two Models Combining these two models should assist you in identifying and selecting appropriate statistical information. The types of statistical information should help you identify an appropriate product, while the continuum of access should help you locate the channel or channels through which the statistical information is disseminated.

Using the Two Models Hopefully, you will find this framework useful in your data reference interviews, which is a separate topic in this orientation, and in navigating the DLI FTP site for various statistical information.

Warning Remember that while Statistics Canada is an important source of statistical information in our country, it is not the only source. Other important sources include other federal government and provincial departments, data libraries and archives, non- & inter-governmental agencies, and commercial vendors.