The Data Liberation Initiative Orientation Session


1 The Data Liberation Initiative Orientation Session
University of Alberta, December 5, 2001. Statistics Canada / Statistique Canada
Title page. Greetings: Good morning, everyone. We are glad to be with you for this workshop. If you have questions at any point during the presentation, please ask and we will elaborate.

2 Products and Services Establishing Perspectives
statistical information: statistics and data
statistics & data sources: national and international
continuum of access: DLI

3 Statistical Information
Statistics: numeric facts/figures created from data, i.e., already processed; presentation-ready
Data: numeric files organized for analysis; require processing; not ready for display

4 Statistical Information
The lines are blurring ...
the past: if it was on paper, it was statistics; if it was digital, it was data
the present: dynamic tables retrievable from online databases; e-journal publications with tables

5 Statistical Information
Statistics ... and a map!

6 Statistical Information
Product Implications: we won’t have a ‘published’ product but will instead be forced to work with dynamically generated tables from databases. Toward this end, we will see more Web retrieval of statistics and processing of data. Examples: STC Community Profiles and ICPSR Data Analysis System

7 Statistical Information
Product Implications: we may only see graphical displays of statistics or data, without the underlying numbers. Example: Web map servers

8 Statistical Information
Service Implications: spend less time providing technical services and more time doing extended reference and consulting; the move to disintermediate products, that is, to make them self-serve

9 Statistical Information
Service Implications: need to deal with an even wider variety of retrieval or software tools and possibly formats; it may be more difficult to get at the actual statistics or data that are wanted (especially historical data)

10 Statistics & Data Sources
Statistics Canada; Academic Research Data; Other Canadian Gov’t & Non-gov’t Sources; Financial & Stock Data

11 Statistics & Data Sources
Categories: Statistics Canada; Other Governmental & Non-Governmental; Academic Research Data; Financial & Stock Data
Statistics Canada: surveys (cross-sectional & longitudinal); aggregate databases (time-series & cross-classified); geography files; supporting documentation (SIC, SOC)

12 Statistics & Data Sources
Categories: Statistics Canada; Other Governmental & Non-Governmental; Academic Research Data; Financial & Stock Data
Other Governmental & Non-Governmental: Health Canada (HBSC & Heart Health); CIC (LIDS & IMDB); CIHI; GDSourcing; Statistical Universe

13 Statistics & Data Sources
Categories: Statistics Canada; Other Governmental & Non-Governmental; Academic Research Data; Financial & Stock Data
Academic Research Data: ICPSR; ISSP; World Values; Eurobarometers; ISR-York; CNES; Data Libraries; AAS

14 Statistics & Data Sources
Categories: Statistics Canada; Other Governmental & Non-Governmental; Academic Research Data; Financial & Stock Data
Financial & Stock Data: Datastream; Financial Post Corporate Database; Compustat; CRSP; DRI Basic Economics

15 Statistics & Data Sources
Statistics Canada is an important source for statistics and data, but not the only source.

16 Continuum of Access Turning to Statistics Canada, access to statistics and data is through a variety of services and initiatives. Think of this as a continuum along which levels of access are provided.

17 Continuum of Access Characteristics of this continuum are:
cost: runs from free to expensive
restrictions: run from open to very restricted
information: runs from statistics to data

18 Statistical Information Available through Statistics Canada
Different Services
ACCESS continuum: from Open / Free / Statistics to Restricted / Expensive / Data

Service:
Statistics Canada Website
Depository Service Program
Data Liberation Initiative
Cu$tomized Tabulations & Pay per View
Remote Job Submission
Research Data Centres

Who is Eligible & Conditions:
General Public: available on the Internet
Designated DSP Libraries & their Users: available on site
Post-secondary Academic: restricted to teaching and research purposes
Individuals: contract between STC and individual
Approved Researchers: SSHRC peer review & deemed STC employee

Products:
The Daily; Canadian; Census; Statistical profiles of Canadian communities; Downloadable publications
Paper publications; Electronic publications, which include priced downloadable publications & select CD-ROMs
Standard data products: aggregate databases, microdata files and geography files
Tables from confidential files that are specially produced by Statistics Canada for a fee, and access to specialized databases
“Dummy” or synthetic files to build analysis setups that must then be submitted to Stats Can for processing
Confidential data files from the longitudinal surveys begun in the 1990s

Notes:
Warning: some parts of the Website are fee-based
Some DSP libraries provide off-site access to authenticated users
Interface to CANSIM I and Trade Analyzer available through CHASS (University of Toronto) by subscription
Specialized databases include CANSIM II and Trade Analyzer
Services available for selected titles. Remote job submission is the most developed for NPHS.
Applications can now be submitted through the SSHRC Web site.

19 Products and Services Summary statistical information
statistical information: traditional ways of handling print statistics are now challenged by online statistics and data
statistics & data sources: Statistics Canada is an important source but not the only source
continuum of access: several points of access may be needed when dealing with Statistics Canada

20 Product Types The DLI license provides post-secondary institutions with access to “standard data products”, which consist of public use microdata, aggregate databases, and geography files listed in the Statistics Canada Catalogue.

21 Product Types Think of this as the stuff that is sold, excluding publications and services. Tape CD-ROM Diskette STC Online Catalogue Medium Categories

22 Product Types Think of this as the stuff that is sold, excluding publications. Tape CD-ROM Diskette

23 Product Types Aggregate data
statistics organized in databases or as data files; tabulations structured by time, geography, and social content
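To make the three-dimensional structure concrete, here is a minimal sketch in Python. It assumes an aggregate table can be modelled as cells keyed by (time, geography, social content). The figures, category names, and the helper names `slice_cells` and `time_series` are invented for illustration and do not come from any Statistics Canada product.

```python
# Hypothetical aggregate table: each cell is a statistic indexed by
# (time, geography, social content). All figures are made up.
cells = {
    (1996, "Alberta", "employed"): 1400,
    (1996, "Alberta", "unemployed"): 100,
    (1996, "Ontario", "employed"): 5300,
    (1996, "Ontario", "unemployed"): 500,
    (2001, "Alberta", "employed"): 1600,
    (2001, "Alberta", "unemployed"): 80,
}


def slice_cells(cells, time=None, geography=None, content=None):
    """Return the cells that match whichever dimensions are specified."""
    wanted = (time, geography, content)
    return {
        key: value
        for key, value in cells.items()
        if all(w is None or w == k for w, k in zip(wanted, key))
    }


def time_series(cells, geography, content):
    """Extract a (year, value) series for one geography and one category."""
    selected = slice_cells(cells, geography=geography, content=content)
    return sorted((key[0], value) for key, value in selected.items())


alberta_employed = time_series(cells, "Alberta", "employed")
cells_1996 = slice_cells(cells, time=1996)
```

Fixing geography and content while letting time vary yields a time series; fixing time and letting the other two vary yields a cross-classified table for one period.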

24 Aggregate Data Structure Time Geography Social Content Example: CANSIM

25 Aggregate Data Structure Time Geography Social Content Example: CANSIM

26 Aggregate Data Structure Time Geography Social Content Example: Census

27 Aggregate Data Structure
Time Geography Social Content Example: Small Area Statistics (SABAL cancelled)

28 Aggregate Data Structure Time Geography Social Content Example: HID

29 Product Types Microdata
raw data organized in a file where the records or lines in the file are observations of a specific unit of analysis and the information on the lines are the values of variables; requires some form of processing or analysis to be used
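As a rough illustration of this record-and-variable layout, the sketch below models a tiny microdata file in Python and applies the simplest kind of processing, a frequency count. The records, variable names, and the `tabulate` helper are all hypothetical.

```python
from collections import Counter

# Hypothetical microdata: one record per respondent (the unit of analysis),
# one key per variable. All values are invented for illustration.
records = [
    {"id": 1, "age_group": "15-24", "province": "AB", "employed": True},
    {"id": 2, "age_group": "25-44", "province": "AB", "employed": True},
    {"id": 3, "age_group": "25-44", "province": "ON", "employed": False},
    {"id": 4, "age_group": "15-24", "province": "ON", "employed": True},
]


def tabulate(records, variable):
    """Count how many records take each value of one variable."""
    return Counter(record[variable] for record in records)


by_province = tabulate(records, "province")
by_age_group = tabulate(records, "age_group")
```

The counts produced here are statistics; the records they were computed from are the data.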

30 Public Use Microdata Anonymized Microdata
these are microdata prepared to minimize the possibility of disclosing or identifying any of the cases or observations the original data (or master file) are edited to create a public use microdata file

31 Public Use Microdata: Steps in Anonymizing Microdata
removal of all personal identification information (names, addresses, etc.); include only gross levels of geography; collapse detailed information into a smaller number of general categories; suppress the values of a variable
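The anonymizing steps on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names, the age bands, and the income threshold are hypothetical, and real disclosure control is considerably more involved than this.

```python
# Hypothetical threshold above which an income value is suppressed.
INCOME_SUPPRESSION_THRESHOLD = 200_000


def anonymize(record):
    """Apply the four anonymizing steps to one hypothetical master-file record."""
    public = dict(record)  # work on a copy; the master record is untouched

    # Step 1: remove personal identification information.
    for field in ("name", "address"):
        public.pop(field, None)

    # Step 2: keep only a gross level of geography (province, not city).
    public.pop("city", None)

    # Step 3: collapse detailed information into a few general categories.
    age = public.pop("age")
    if age < 25:
        public["age_group"] = "under 25"
    elif age < 65:
        public["age_group"] = "25-64"
    else:
        public["age_group"] = "65+"

    # Step 4: suppress a value that is unusual enough to identify someone.
    if public["income"] >= INCOME_SUPPRESSION_THRESHOLD:
        public["income"] = None

    return public


master_record = {
    "name": "Jane Doe",
    "address": "1 Main St",
    "city": "Edmonton",
    "province": "AB",
    "age": 37,
    "income": 250_000,
}
public_record = anonymize(master_record)
```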

32 Public Use Microdata: Statistics Canada PUMFs
only available for select social surveys that undergo a review by the Data Release Committee, an internal Statistics Canada committee; no enterprise public use microdata

33 Public Use Microdata: Statistics Canada PUMFs
almost all are cross-sectional, that is, they represent data collected at one point in time; longitudinal data are difficult to anonymize while still retaining useful information

34 Public Use Microdata: Statistics Canada PUMFs
how do you recognize a PUMF? Statistics Canada calls them public use microdata files in the Daily.

35 Statistics Canada Microdata
Other Microdata in Statistics Canada Master files: these are the confidential files from which public use microdata are created. They contain the fullness of the data captured about the unit of observation.

36 Statistics Canada Microdata
Other Microdata in Statistics Canada Share files: these are confidential files in which the respondents have signed a consent form permitting Statistics Canada to allow access for approved research to their information.

37 Product Types Geography Files
Census digital boundary and cartographic files in two proprietary formats: ArcView and MapInfo; correspondence tables for linking between Postal Code geography and Census geography
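A correspondence table of this kind is, in effect, a lookup from one geography to another. The following Python sketch shows the general idea of attaching a census geography code to records that carry only a postal code. The postal codes, the `CT-0001` style area codes, and the function names are invented and do not reflect the layout of any actual Statistics Canada file.

```python
# Hypothetical correspondence table: postal code -> census geography code.
correspondence = {
    "A1A 1A1": "CT-0001",
    "A1A 1A2": "CT-0001",
    "B2B 2B2": "CT-0002",
}


def normalize(postal_code):
    """Uppercase a postal code and format it as 'A1A 1A1'."""
    compact = postal_code.replace(" ", "").upper()
    return f"{compact[:3]} {compact[3:]}"


def attach_census_geography(records, correspondence):
    """Add a census geography code to each record; None when there is no match."""
    linked = []
    for record in records:
        code = correspondence.get(normalize(record["postal_code"]))
        linked.append({**record, "census_area": code})
    return linked


survey = [
    {"id": 1, "postal_code": "a1a1a1"},
    {"id": 2, "postal_code": "Z9Z 9Z9"},
]
linked = attach_census_geography(survey, correspondence)
```

Unmatched postal codes are kept with a missing area code rather than dropped, so the analyst can see how many records failed to link.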

38 Product Types: Digital Copies of Standardized Code Lists and Concordances
files containing standardized codes for industry, goods, and occupations; correspondence tables between versions of standardized codes for industry and occupations

39 Data Service Models
Service models were presented as a continuum during the 1997 DLI workshop: “Order & Pass-through” Service; Install Data and Provide Access; Treat as a Collection and Provide Reference

40 Data Service Models Choose a model that matches your staff and computing resources

41 Acquisition → Fill a Request → Locate data → Order data & documentation Collection Development Select & Locate data Order data & documentation Catalogue data & documentation Install & Store (data & documentation) Reference Search for data Interpret documentation Retrieve or download data Process data change formats subset cases or variables aggregate cases merge files analyze data

42 Acquisition → Fill a Request ✓ Locate data ✓ Order data & documentation Collection Development Select & Locate data Order data & documentation Catalogue data & documentation ✓ Install & Store (data & documentation) Reference Search for data Interpret documentation Retrieve or download data Process data change formats subset cases or variables aggregate cases merge files analyze data

43 Acquisition → Fill a Request ✓ Locate data ✓ Order data & documentation Collection Development Select & Locate data Order data & documentation Catalogue data & documentation ✓ Install & Store (data & documentation) Reference ✓ Search for data ✓ Interpret documentation ✓ Retrieve or download data Process data change formats subset cases or variables aggregate cases merge files analyze data

44 Acquisition → Fill a Request ✓ Locate data ✓ Order data & documentation → Collection Development ✓ Select & Locate data ✓ Catalogue data & documentation ✓ Install & Store (data & documentation) Reference ✓ Search for data ✓ Interpret documentation ✓ Retrieve or download data ✓ Process data ✓ change formats ✓ subset cases or variables aggregate cases merge files analyze data

45 Find a referral partner on campus
Acquisition → Fill a Request ✓ Locate data ✓ Order data & documentation → Collection Development ✓ Select & Locate data ✓ Catalogue data & documentation ✓ Install & Store (data & documentation) Reference ✓ Search for data ✓ Interpret documentation ✓ Retrieve or download data ✓ Process data ✓ change formats ✓ subset cases or variables aggregate cases merge files analyze data
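The "Process data" tasks named on these slides (change formats, subset cases or variables, aggregate cases, merge files) can be sketched with the Python standard library alone. The person and household records and the helper names are hypothetical; a data service would more likely use a statistical package, so treat this as an outline of what each task does rather than a recommended toolchain.

```python
import csv
import io
from collections import defaultdict

# Hypothetical person-level and household-level files, linked by hh_id.
people = [
    {"hh_id": 1, "age": 34, "income": 40000},
    {"hh_id": 1, "age": 36, "income": 52000},
    {"hh_id": 2, "age": 71, "income": 18000},
]
households = [
    {"hh_id": 1, "province": "AB"},
    {"hh_id": 2, "province": "ON"},
]


def subset(records, keep_case, variables):
    """Subset cases (rows passing keep_case) and variables (named columns)."""
    return [{v: r[v] for v in variables} for r in records if keep_case(r)]


def aggregate(records, key, variable):
    """Aggregate cases: sum one variable within each value of key."""
    totals = defaultdict(int)
    for r in records:
        totals[r[key]] += r[variable]
    return dict(totals)


def merge(left, right, key):
    """Merge files: attach the matching right-hand record to each left record."""
    lookup = {r[key]: r for r in right}
    return [{**l, **lookup[l[key]]} for l in left if l[key] in lookup]


def to_csv(records):
    """Change formats: write a list of records out as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


working_age = subset(people, lambda r: r["age"] < 65, ["hh_id", "income"])
household_income = aggregate(working_age, "hh_id", "income")
merged = merge(people, households, "hh_id")
csv_text = to_csv(working_age)
```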

46 The Inventory Model In the traditional inventory model, roughly half of the support goes to putting items on the shelf, while the other half goes to finding and getting the items off the shelf. Source: Darlene Fichter

47 The Access Model With the access model, support is split between getting information into a deliverable state and finding appropriate ways of retrieving and disseminating the information.

Access Models The access models for data and statistics are not really that different from the models employed with bibliographic and full-text databases: stand-alone workstation; local area network CD-server; campus network server; Internet server

49 Examples of Access Models
Let’s look at some technology-based examples of access models divided between: statistics and aggregate data, and microdata.

50 Stand-alone Workstation
Advantages: install once, usually with fewer problems; usually fewer license issues
Disadvantages: patron must come to the service; queues may develop to use the workstation

51 Stand-alone Workstation
DLI Examples
Statistics and Aggregate Data: 1996 Census CD-ROMs, Industrial Monitor, Inter-corporate Ownership, Canadian Business Patterns
Microdata: 1996 Census Public Use Microdata Files; a download station for data services staff to write files onto removable media

52 LAN CD Server
Advantages: access for a larger number of concurrent users; products not as ghettoized
Disadvantages: patron may still have to come to the service; LANs increase installation difficulties

53 LAN CD Server: DLI Examples
Statistics and Aggregate Data: 1996 Census CD-ROMs, Industrial Monitor, Inter-corporate Ownership, Canadian Business Patterns (same examples)
Microdata: place copies of microdata files on a shared disk drive for patrons to analyze or to write onto removable media

54 Campus Network Server
Advantages: access for the largest number of concurrent users; patron does not have to come to the service
Disadvantages: licensing issues tend to increase; helper apps must be widely installed

55 Campus Network Server: DLI Examples
Statistics and Aggregate Data: Beyond 20/20 files from the 1996 Census or Health Indicators (serve files, not necessarily applications)
Microdata: place copies of microdata files on an institutional file server for patrons to analyze or to write onto removable media; use of data extraction tools

56 Internet Server Advantages
possible to integrate local and remote services through a common (seemingly seamless) point of access; increases flexibility in the use of local hardware & storage; creates sharing opportunities between institutions

57 Internet Server Disadvantages
increases dependence on the agenda of others for enhancements and fixes; often must pay a subscription fee to use; may increase licensing obligations

58 Internet Server: DLI Examples
Statistics and Aggregate Data: access to Internet database applications such as E-STAT and CHASS CANSIM II
Microdata: access to Internet data extraction tools such as IDSL, LANDRU, ISLAND, QWIFS, Sherlock, TDR

59 A Mixed Access Model Many of us employ a mix of the above access methods. This depends upon: our institution’s technology mix; our access to technology on our campus; the ways that we’ve handled different formats. Some products are bound to operating systems that were not intended for a LAN (e.g., C91, XV tables in CBP). Other products provide files in a format that can be delivered over a LAN, such as Beyond 20/20’s IVT.

60 Access/Dissemination Issues
Regardless of the access method used, certain issues apply in all instances: managing licenses and determining dissemination options

61 Managing Licenses What are the conditions of use specified in the license? What type of identification or authentication is required?

62 Managing Licenses: DLI License
must be an authorized user (need to identify type of user); has only conditional use of material (need to restrict to non-commercial uses of material); permits sharing among DLI member institutions
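One way to think about these conditions is as a checklist applied to each request. The Python sketch below is a loose illustration only. The user-type labels, the purpose labels, and the `check_request` function are assumptions made for the example, and the actual DLI license text is the authority on who counts as an authorized user.

```python
from dataclasses import dataclass

# Assumed labels, for illustration only; the license defines the real categories.
AUTHORIZED_USER_TYPES = {"student", "faculty", "staff"}
NON_COMMERCIAL_PURPOSES = {"teaching", "research"}


@dataclass(frozen=True)
class AccessRequest:
    user_type: str                   # who is asking
    institution_is_dli_member: bool  # is the home institution a DLI member?
    purpose: str                     # what the data will be used for


def check_request(request):
    """Return the reasons to refuse; an empty list means the request passes."""
    problems = []
    if not request.institution_is_dli_member:
        problems.append("institution is not a DLI member")
    if request.user_type not in AUTHORIZED_USER_TYPES:
        problems.append("user type is not authorized")
    if request.purpose not in NON_COMMERCIAL_PURPOSES:
        problems.append("purpose is not teaching or research")
    return problems


ok = check_request(AccessRequest("student", True, "research"))
refused = check_request(AccessRequest("faculty", True, "commercial"))
```

Returning the reasons rather than a bare yes/no lets service staff explain a refusal to the patron.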

63 Managing Licenses: Product Licenses
may restrict the use of the product (e.g., Beyond 20/20: educational use only); may restrict the number of copies that can be disseminated; may prevent the distribution of a specific format for a product (e.g., Oracle & World Trade Analyzer)

64 Managing Licenses: Special Vendor Licenses
may require a content license separate from the access method; e.g., CHASS’ CANSIM access rests on two licenses: the DLI license provides access to the content in CANSIM, and the CHASS license is required to use their Internet access tool

65 Dissemination Options
Determining how to disseminate DLI products: what are the finding tools for locating DLI products at your institution? what are the access formats needed for your institution?

66 Dissemination Options
Finding Tools: will the product be catalogued? will the product be associated with a specific service and/or workstation (e.g., located in Data Services or Reference)? will the product be listed on the library web site?

67 Dissemination Options
Access formats: is there a format that is commonly requested at your institution (e.g., do most patrons want microdata in SPSS .sav files)? is there a dissemination format that is required as part of your service (e.g., a format for a data extractor)?

68 Products, Service, Access
This concludes the discussion on DLI products, data service models, and access models. More will be said about reference and technical services for data later today.

