Data archiving in Canada: problems and prospects. Presentation to NRF by Laine GM Ruus, University of Toronto, Data Library Service, 16/05/2002.
Outline. Why archive data? Current problems in Canada (and a personal view of some solutions). Current trends, in problems and solutions.
Why archive data? Promotes research ethics and research integrity. Enables replication analysis. Allows re-analysis with refined or new social theories. Allows re-analysis with new techniques (e.g. bootstrapping).
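As an illustration of re-analysis with a newer technique, here is a minimal sketch of a percentile bootstrap confidence interval for a mean, using only the Python standard library. The function name `bootstrap_ci`, its parameters, and the `incomes` values are made up for this example and do not come from any archived data file.

```python
import random
import statistics


def bootstrap_ci(sample, stat=statistics.mean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for `stat` (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(sample)
    # Resample with replacement, recompute the statistic each time, then sort.
    estimates = sorted(
        stat([rng.choice(sample) for _ in range(n)]) for _ in range(n_resamples)
    )
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Made-up values standing in for a variable from an archived survey file.
incomes = [21, 25, 27, 30, 31, 34, 38, 42, 55, 90]
low, high = bootstrap_ci(incomes)
print(f"mean = {statistics.mean(incomes):.1f}, 95% bootstrap CI = ({low:.1f}, {high:.1f})")
```

The point of the sketch is that the technique needs the original microdata: it cannot be applied to published summary tables alone.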
Why archive data? (cont’d) Allows comparative analysis: similar data with new universes. Enables analysis of change: similar data at different points in time. Maximizes return on initial investment in data collection. Enables training of policy makers with local data.
Why archive data? (cont’d) Promotes increased numeracy in the population. Preserves for future generations an aspect of our culture not measured by other means.
Current climate for data archiving in Canada. Three major data producers: government, academia, commercial sector. Copyright Act: under Crown copyright, government-produced data and information products belong to the Queen. Government information policy: set by Treasury Board for government, without consultation with non-government sectors.
Government sector as a data producer. Statistics Canada is the major collector of socio-economic data. Government has no data archive. National Archives has not collected data over the last 15 years. Treasury Board policy since ca. 1984 treats government data and information products as a commodity.
Academia as a data producer. SSHRCC has data deposit regulations, but no enforcement. CIHR (formerly MRC) and NSERC have no data deposit regulations. Individual university research funds have no data deposit regulations. No national data archive.
Academia as a data producer (cont’d). No tradition in the academic sector of depositing research data. Attitude of individual ownership vis-à-vis data files. History of using US data. No tradition of citing data files in publications.
Academia as a data producer (cont’d). Few periodical editors require citation of data files. No tradition among tenure boards of treating creation of a data file as equivalent to publication. Only the commercial value of software and applications is awakening universities to their rights under the Copyright Act.
Commercial sector as a data producer. Subject to the Copyright Act and new Personal Information Acts. No national data archive. No uniformity in approach to archiving their data. No national body with which to negotiate arrangements.
Solutions to the Canadian problems (a very personal view). Canada needs a national information policy. Canada needs a national data archive. Government (all sectors) needs government data archives. Need to promote a culture of data deposit and data sharing in the academic sector (SSHRCC, CIHR, NSERC, etc.).
Solutions to the Canadian problems (a very personal view) (cont’d). Need to educate hiring bodies and tenure boards that data file creation is a valuable academic activity. Need to sell the commercial sector on the benefits of data archiving and data sharing. Need to promote numeracy in the population.
Recent trends in the data archiving/data service sector: longitudinal data; research data centres; DDI/DTD; WWW data extractor interfaces; GIS; proliferating formats.
Recent trends: longitudinal data. Data producers are increasingly collecting longitudinal/panel data. Enhanced capability to test theories of social change over time. Increased problems of preserving privacy and confidentiality. Requires more sophisticated research techniques.
Recent trends: research data centres. Secure access to more detailed or sensitive data. Creates segregation of research capabilities (data haves vs. data have-nots). Data producers less likely to produce public use microdata files.
Recent trends: DDI/DTD. Data Documentation Initiative (DDI) Document Type Definition (DTD). A standard format for metadata describing microdata files. Being expanded to encompass aggregate and time-series data.
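To make the idea of machine-readable metadata concrete, the following sketch parses a small DDI-style XML fragment with Python's standard `xml.etree.ElementTree`. The element names (`codeBook`, `stdyDscr`, `titl`, `dataDscr`, `var`, `labl`) follow the DDI Codebook vocabulary as best recalled, but the fragment is heavily simplified and has not been validated against the DTD; the study title, variable names, and the `summarize_codebook` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified, unvalidated DDI-style fragment; study and variables are hypothetical.
DDI_FRAGMENT = """\
<codeBook>
  <stdyDscr>
    <citation>
      <titlStmt>
        <titl>Hypothetical Household Survey, 2001</titl>
      </titlStmt>
    </citation>
  </stdyDscr>
  <dataDscr>
    <var name="AGE">
      <labl>Age of respondent in years</labl>
    </var>
    <var name="PROV">
      <labl>Province of residence</labl>
    </var>
  </dataDscr>
</codeBook>
"""


def summarize_codebook(xml_text):
    """Return the study title and a {variable name: label} mapping."""
    root = ET.fromstring(xml_text)
    title = root.findtext("stdyDscr/citation/titlStmt/titl")
    variables = {
        var.get("name"): var.findtext("labl")
        for var in root.iterfind("dataDscr/var")
    }
    return title, variables


title, variables = summarize_codebook(DDI_FRAGMENT)
print(title)
for name, label in variables.items():
    print(f"{name}: {label}")
```

Because study-level and variable-level descriptions sit in predictable elements, software such as a web data extractor can list studies and variables without knowing anything else about the data file.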
Recent trends: data extractors. Data extractors provide access to data and analyses of data via Internet protocols. Enabled by development of DDI/DTD. Selected data extractors linked at:
Recent trends: data extractors (cont’d). Two major data extractor developments: NESSTAR and the Virtual Data Center project.
Recent trends: GIS. Geographic information systems. New theoretical models based on spatial analysis. New software capable of spatial analysis (ArcGIS). Increasing demand for geocoded aggregate and microdata.
The nice thing about standards is that there are so many to choose from!
Recent trends: proliferation of formats. Data archiving is becoming more difficult. Many new proprietary formats and flavours to deal with. Increasing number of formats for which we have not yet developed preservation formats, e.g. GIS shape files, relational databases, etc.
To finish... Without data archives, we will lose about 50 years of our culture. Successful long-term preservation of our electronic culture will partly depend on bringing copyright legislation, internationally, into the 21st century.
Data archiving is not a national problem, nor a problem that is unique to any one country. The problems and solutions are similar in all countries. We can all learn which solutions are best and/or worst from each other.
IASSIST (International Association for Social Science Information Service & Technology) is one of the venues in which we learn from each other.