The Data Liberation Initiative Orientation Session Statistics Canada / Statistique Canada University of Alberta December 5, 2001 Chuck Humphrey.


Products and Services: Establishing Perspectives
– statistical information: statistics and data
– statistics & data sources: national and international
– continuum of access: DLI

Statistical Information
– Statistics: numeric facts/figures created from data, i.e., already processed; presentation-ready
– Data: numeric files organized for analysis; require processing; not ready for display

Statistical Information: The lines are blurring...
– the past: if it was on paper, it was statistics; if it was digital, it was data
– the present: dynamic tables retrievable from online databases; e-journal publications with tables

Statistical Information Statistics...and a map!

Statistical Information: Product Implications
± there may be no ‘published’ product; instead, users are forced to work with dynamically generated tables from databases
± toward this end, expect more Web retrieval of statistics and processing of data; examples: STC Community Profiles and ICPSR Data Analysis System

Statistical Information Product Implications ± may only see graphical displays of statistics or data without the numbers or data example: Web map servers

Statistical Information Service Implications + spend less time providing technical services and more time doing extended reference and consulting ± the move to disintermediate products, that is, make them self-serve

Statistical Information Service Implications - need to deal with an even wider variety of retrieval or software tools and possibly formats - may be more difficult to get at the actual statistics or data that are wanted (especially historical data)

Statistics & Data Sources Financial & Stock Data Academic Research Data Statistics Canada Other Canadian Gov’t & Non-gov’t Sources

Statistics & Data Sources: Statistics Canada
– Surveys: cross-sectional & longitudinal
– Aggregate databases: time-series & cross-classified
– Geography files
– Supporting documentation: SIC, SOC

Statistics & Data Sources: Other Governmental & Non-Governmental
– Health Canada: HBSC & Heart Health
– CIC: LIDS & IMDB
– CIHI
– GDSourcing
– Statistical Universe

Statistics & Data Sources: Academic Research Data
– ICPSR: ISSP, World Values, Eurobarometers
– ISR-York: CNES
– Data Libraries: AAS

Statistics & Data Sources: Financial & Stock Data
– Datastream
– Financial Post Corporate Database
– Compustat
– CRSP
– DRI Basic Economics

Statistics & Data Sources Statistics Canada is an important source for statistics and data, but not the only source.

Continuum of Access Turning to Statistics Canada, access to statistics and data is through a variety of services and initiatives. Think of this as a continuum along which levels of access are provided.

Continuum of Access Characteristics of this continuum are: – cost : which runs from free to expensive – restrictions : which runs from open to very restricted – information : which runs from statistics to data

Continuum of Access [diagram]: access runs from open, free statistics at one end to restricted, expensive data at the other. Callouts on the diagram: CANSIM II and Trade Analyzer; services available for selected titles; remote job submission is the most developed for NPHS; applications can now be submitted through the SSHRC Web site.

Products and Services: Summary
– statistical information: traditional ways of handling print statistics are now challenged by online statistics and data
– statistics & data sources: Statistics Canada is an important source but not the only source
– continuum of access: several points of access may be needed when dealing with Statistics Canada

Product Types The DLI license provides post-secondary institutions with access to “standard data products”, which consist of public use microdata, aggregate databases, and geography files listed in the Statistics Canada Catalogue.

Product Types: Think of this as the stuff that is sold, excluding publications and services. STC Online Catalogue medium categories: Tape, CD-ROM, Diskette.

Product Types Aggregate data – statistics organized in databases or as data files – tabulations structured by time, geography, and social content
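As a rough illustration of what "structured by time, geography, and social content" can mean, the sketch below stores each statistic as a cell identified by those three dimensions. The figures, place names, and the `slice_table` helper are invented for illustration; they are not taken from CANSIM or any Statistics Canada product.

```python
# Hypothetical aggregate table: each cell is a statistic identified by
# three dimensions (time, geography, social content). All values are invented.
aggregate = {
    (1996, "Alberta", "employed"): 1400,
    (1996, "Alberta", "unemployed"): 100,
    (1996, "Ontario", "employed"): 5200,
    (1996, "Ontario", "unemployed"): 500,
    (2001, "Alberta", "employed"): 1600,
    (2001, "Alberta", "unemployed"): 80,
}


def slice_table(table, year=None, geography=None, category=None):
    """Return the cells that match the requested dimension values."""
    return {
        key: value
        for key, value in table.items()
        if (year is None or key[0] == year)
        and (geography is None or key[1] == geography)
        and (category is None or key[2] == category)
    }


# A time series: one geography and one category followed across years.
alberta_employed = slice_table(aggregate, geography="Alberta", category="employed")

# A cross-classification: one year, all geographies and categories.
table_1996 = slice_table(aggregate, year=1996)
```

Fixing two dimensions and letting time vary gives a time series; fixing time and letting the others vary gives a cross-classified table.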

Aggregate Data Structure – Time – Geography – Social Content Example: CANSIM

Aggregate Data Structure – Time – Geography – Social Content Example: Census

Aggregate Data Structure – Time – Geography – Social Content Example: Small Area Statistics

Aggregate Data Structure – Time – Geography – Social Content Example: HID

Product Types Microdata – raw data organized in a file where the records or lines in the file are observations of a specific unit of analysis and the information on the lines are the values of variables – requires some form of processing or analysis to be used
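A small sketch of the processing step that turns microdata into statistics. Each record is one observation of the unit of analysis (here, a hypothetical survey respondent) and each field is the value of a variable. The records, variable names, and the `crosstab` helper are invented for illustration.

```python
from collections import Counter

# Hypothetical microdata file: one record per respondent, one value per variable.
microdata = [
    {"id": 1, "province": "AB", "smoker": "yes"},
    {"id": 2, "province": "AB", "smoker": "no"},
    {"id": 3, "province": "ON", "smoker": "no"},
    {"id": 4, "province": "ON", "smoker": "no"},
    {"id": 5, "province": "AB", "smoker": "yes"},
]


def crosstab(records, row_var, col_var):
    """Count observations for each combination of two variables."""
    return Counter((record[row_var], record[col_var]) for record in records)


# The raw records are not ready for display; the tabulation is.
table = crosstab(microdata, "province", "smoker")
```

The resulting counts are statistics in the sense used earlier in the session: numeric facts created from data.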

Public Use Microdata Anonymized Microdata – these are microdata prepared to minimize the possibility of disclosing or identifying any of the cases or observations – the original data (or master file) are edited to create a public use microdata file

Public Use Microdata: Steps in Anonymizing Microdata
– remove all personal identification information (names, addresses, etc.)
– include only gross levels of geography
– collapse detailed information into a smaller number of general categories
– suppress the values of a variable
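The anonymizing steps above can be sketched on a single record as follows. This is only an illustration under invented assumptions: the field names, the age bands, and the choice of which variable to suppress are hypothetical and do not describe Statistics Canada's actual disclosure-control procedures.

```python
# Hypothetical field groupings; a real master file would define its own.
DIRECT_IDENTIFIERS = {"name", "address"}
DETAILED_GEOGRAPHY = {"postal_code", "city"}


def age_group(age):
    """Collapse a detailed age into a small number of general categories."""
    if age < 25:
        return "under 25"
    if age < 45:
        return "25-44"
    if age < 65:
        return "45-64"
    return "65+"


def anonymize(record, suppress=("income",)):
    """Build a public-use version of one master-file record."""
    # Steps 1 and 2: drop personal identifiers and detailed geography,
    # keeping only a gross level of geography (province).
    public = {
        key: value
        for key, value in record.items()
        if key not in DIRECT_IDENTIFIERS and key not in DETAILED_GEOGRAPHY
    }
    # Step 3: collapse detailed information into general categories.
    public["age_group"] = age_group(public.pop("age"))
    # Step 4: suppress the values of selected variables.
    for variable in suppress:
        if variable in public:
            public[variable] = None
    return public


master_record = {
    "name": "Example Person",
    "address": "1 Example Street",
    "postal_code": "A1A 1A1",
    "city": "Exampleville",
    "province": "AB",
    "age": 34,
    "income": 52000,
}
public_record = anonymize(master_record)
```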

Public Use Microdata: Statistics Canada PUMFs
– only available for select social surveys that undergo a review by the Data Release Committee, an internal Statistics Canada committee
– no enterprise public use microdata

Public Use Microdata Statistics Canada PUMFs – almost all are cross-sectional, that is, represent data collected at one point in time – longitudinal data are difficult to anonymize and maintain useful information

Public Use Microdata Statistics Canada PUMFs – how do you recognize a PUMF? Statistics Canada calls them public use microdata files in the Daily.

Statistics Canada Microdata Other Microdata in Statistics Canada – Master files: these are the confidential files from which public use microdata are created. They contain the fullness of the data captured about the unit of observation.

Statistics Canada Microdata Other Microdata in Statistics Canada – Share files: these are confidential files in which the respondents have signed a consent form permitting Statistics Canada to allow access for approved research to their information.

Product Types Geography Files – Census digital boundary and cartographic files in two proprietary formats: ArcView and MapInfo – correspondence tables for linking between Postal Code geography and Census geography
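A minimal sketch of how a correspondence table links records carrying a postal code to a census geography. The postal codes, area codes, and function names below are made up for illustration; the actual correspondence files have their own layout and documentation.

```python
# Hypothetical correspondence table: postal code -> census area code.
correspondence = {
    "A1A1A1": "AREA-001",
    "B2B2B2": "AREA-001",
    "C3C3C3": "AREA-002",
}


def normalize(postal_code):
    """Remove spaces and upper-case so that 'a1a 1a1' matches 'A1A1A1'."""
    return postal_code.replace(" ", "").upper()


def attach_census_area(records, table):
    """Split records into those linked to a census area and those unmatched."""
    linked, unmatched = [], []
    for record in records:
        area = table.get(normalize(record["postal_code"]))
        if area is None:
            unmatched.append(record)
        else:
            linked.append({**record, "census_area": area})
    return linked, unmatched


survey = [
    {"id": 1, "postal_code": "a1a 1a1"},
    {"id": 2, "postal_code": "C3C 3C3"},
    {"id": 3, "postal_code": "Z9Z 9Z9"},
]
linked, unmatched = attach_census_area(survey, correspondence)
```

Keeping the unmatched records separate makes it easy to see how many cases could not be placed in a census area.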

Product Types Digital Copies of Standardized Code Lists and Concordances – Files containing standardized codes for industry, goods, and occupations – correspondence tables between versions of standardized codes for industry and occupations

Data Service Models: Service models were presented as a continuum during the 1997 DLI workshop:
– “Order & Pass-through” Service
– Install Data and Provide Access
– Treat as a Collection and Provide Reference

Data Service Models Choose a model that matches your staff and computing resources

Acquisition: Fill a Request
– locate data
– order data & documentation
Collection Development
– select & locate data
– order data & documentation
– catalogue data & documentation
– install & store (data & documentation)
Reference
– search for data
– interpret documentation
– retrieve or download data
– process data: change formats, subset cases or variables, aggregate cases, merge files
– analyze data

Find a referral partner on campus
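The "process data" tasks named above (change formats, subset cases or variables, aggregate cases, merge files) can be sketched with a few lines of standard-library Python. The file contents and variable names are invented; in practice this work is often done in a statistical package instead.

```python
import csv
import io
import json
from collections import defaultdict

# Hypothetical microdata delivered as CSV text.
raw = "id,province,age,income\n1,AB,34,52000\n2,AB,61,38000\n3,ON,45,61000\n"
cases = list(csv.DictReader(io.StringIO(raw)))

# Subset cases (Alberta only) and variables (id and income only).
alberta = [
    {"id": case["id"], "income": int(case["income"])}
    for case in cases
    if case["province"] == "AB"
]

# Aggregate cases: mean income by province.
incomes = defaultdict(list)
for case in cases:
    incomes[case["province"]].append(int(case["income"]))
mean_income = {province: sum(v) / len(v) for province, v in incomes.items()}

# Merge files: attach province names from a second (hypothetical) lookup file.
province_names = {"AB": "Alberta", "ON": "Ontario"}
merged = [{**case, "province_name": province_names[case["province"]]} for case in cases]

# Change formats: write the merged cases out as JSON instead of CSV.
as_json = json.dumps(merged)
```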

The Inventory Model In the traditional inventory model, roughly half of the support goes to putting items on the shelf, while the other half goes to finding and getting the items off the shelf. Source: Darlene Fichter

The Access Model With the access model, support is split between getting information into a deliverable state and finding appropriate ways of retrieving and disseminating the information.

Access Models: The access models for data and statistics are not really that different from the models employed with bibliographic and full-text databases.
– stand-alone workstation
– local area network CD server
– campus network server
– Internet server

Examples of Access Models Let’s look at some technology-based examples of access models divided between: – statistics and aggregate data, and – microdata.

Stand-alone Workstation Advantages – install once with usually fewer problems – usually fewer license issues Disadvantages – patron must come to the service – queues may develop to use the workstation

Stand-alone Workstation DLI Examples – Statistics and Aggregate Data  1996 Census CD-ROMs, Industrial Monitor, Inter-corporate Ownership, Canadian Business Patterns – Microdata  1996 Census Public Use Microdata Files  a download station for data services staff to write files onto removable media

LAN CD Server Advantages – access to a wider number of concurrent users – products not as ghettoized Disadvantages – patron may still have to come to the service – LANs increase installation difficulties

LAN CD Server DLI Examples – Statistics and Aggregate Data  1996 Census CD-ROMs, Industrial Monitor, Inter-corporate Ownership, Canadian Business Patterns (same examples) – Microdata  place on a shared disk drive copies of microdata files for patrons to analyze or to write files onto removable media

Campus Network Server Advantages – access to largest number of concurrent users – patron does not have to come to the service Disadvantages – licensing issues tend to increase – helper apps must be widely installed

Campus Network Server DLI Examples – Statistics and Aggregate Data  Beyond 20/20 files from the 1996 Census or Health Indicators (serve files not necessarily applications) – Microdata  place on an institutional file server copies of microdata files for patrons to analyze or to write files onto removable media  use of data extraction tools

Internet Server Advantages – possible to integrate local and remote services through a common (seemingly seamless) point of access – increases flexibility in the use of local hardware & storage – creates sharing opportunities between institutions

Internet Server Disadvantages – increases dependence on the agenda of others to enhance and fix problems – often must pay a subscription fee to use – may increase licensing obligations

Internet Server DLI Examples – Statistics and Aggregate Data  access to Internet database applications such as E-STAT and CHASS CANSIM II – Microdata  access to Internet data extraction tools such as IDSL, LANDRU, ISLAND, QWIFS, Sherlock, TDR

A Mixed Access Model Many of us employ a mix of the above access methods. This depends upon: – our institution’s technology mix – our access to technology on our campus – ways that we’ve handled different formats

Access/Dissemination Issues Regardless of the access method used, certain issues apply in all instances. – managing licenses – determining dissemination options

Managing Licenses  What are the conditions of use specified in the license?  What type of identification or authentication is required?

Managing Licenses DLI License – must be an authorized user  need to identify type of user – has only conditional use of material  need to restrict to non-commercial uses of material – permits sharing among DLI member institutions

Managing Licenses Product Licenses – may restrict the use of the product  e.g., Beyond 20/20: educational use only – may restrict the number of copies that can be disseminated – may prevent the distribution of a specific format for a product  e.g., Oracle & World Trade Analyzer

Managing Licenses: Special Vendor Licenses
– may require a content license separate from the license for the access method, e.g., with CHASS CANSIM, the DLI license covers access to the content in CANSIM, while a separate CHASS license is required to use their Internet access tool

Dissemination Options: Determining how to disseminate DLI products
– what are the finding tools for locating DLI products at your institution?
– what are the access formats needed for your institution?

Dissemination Options Finding Tools – will the product be catalogued? – will the product be associated with a specific service and/or workstation?  e.g., located in Data Services or Reference – will the product be listed on the library web site?

Dissemination Options: Access formats
– is there a format that is commonly requested at your institution? e.g., do most patrons want microdata in SPSS .sav files?
– is there a dissemination format that is required as part of your service? e.g., a format for a data extractor

Products, Service, Access This concludes the discussion on DLI products, data service models, and access models. More will be said about reference and technical services for data later today.