Metadata: why and how for social science Louise Corti Online Resources Day 15 November 2005, London.

1 Metadata: why and how for social science Louise Corti Online Resources Day 15 November 2005, London

2 What Do Social Researchers Want? Discover available datasets (globally, not just in their own country) and related research literature Understand in detail the origin, methodology and structure of datasets (social sciences datasets are modest in size but big in complexity) Compare and Link data from different sources Model the social phenomena underlying the data Publish their findings with all the supporting evidence (no ‘iceberg’ publishing) and Reproduce published results Connect to other experts and Share informal comments and advice Enforce confidentiality and intellectual property rights while maintaining accuracy and access to data sources. … and more

3 How? through rich and systematic description – through a language that humans and computers can both understand using commonly agreed or mappable vocabularies and standards which must be flexible and adaptable metadata

4 What are metadata? Metadata are structured data which describe the characteristics of an object or resource. They share many similar characteristics to the cataloguing that takes place in libraries, museums and archives. The term "meta" derives from the Greek word denoting a nature of a higher order or more fundamental kind. A metadata record typically consists of a number of pre-defined elements representing specific attributes of a resource, and each element can have one or more values.

5 Grasshopper

6 Metadata schema
Element name    Value
Title           Web UKDA Catalogue
Creator         Louise Corti
Publisher       UK Data Archive
Identifier      http://www.data-archive.ac.uk/
Format          Text/html
Relation        Data Archive Web site
Each metadata schema will usually have the following characteristics: a limited number of elements; the name of each element; the meaning of each element
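The record on this slide can be sketched in a few lines of Python. This is only an illustration of the idea that a schema fixes a limited set of named elements and that each element can hold one or more values. The class name `MetadataRecord` and the wording of the element meanings are assumptions made for the example, not part of the presentation.

```python
from typing import Dict, List


class MetadataRecord:
    """One record: values keyed by the element names that a schema defines."""

    def __init__(self, schema: Dict[str, str]) -> None:
        self.schema = schema                      # element name -> meaning
        self.values: Dict[str, List[str]] = {}    # element name -> one or more values

    def add(self, element: str, value: str) -> None:
        # Reject anything the schema does not define: the element set is limited.
        if element not in self.schema:
            raise KeyError(f"element {element!r} is not defined in the schema")
        self.values.setdefault(element, []).append(value)


# Element names come from the slide; the meanings are illustrative wording only.
SCHEMA = {
    "Title": "name given to the resource",
    "Creator": "person or body responsible for the content",
    "Publisher": "body making the resource available",
    "Identifier": "unambiguous reference to the resource",
    "Format": "file format or media type",
    "Relation": "a related resource",
}

record = MetadataRecord(SCHEMA)
record.add("Title", "Web UKDA Catalogue")
record.add("Creator", "Louise Corti")
record.add("Publisher", "UK Data Archive")
record.add("Identifier", "http://www.data-archive.ac.uk/")
record.add("Format", "Text/html")
record.add("Relation", "Data Archive Web site")

for element, values in record.values.items():
    print(f"{element}: {'; '.join(values)}")
```

Storing each value in a list keeps the "one or more values" property explicit, so a second creator could be added without changing the structure.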

7 International standards for metadata schema to ensure that every element of information pertaining to the lifecycle of an object (collection) can be captured: –creation, appraisal, accessioning, conservation, preservation, availability and access; must be dynamic and must be open to amendment; aim to be consistent, appropriate and self-explanatory description; facilitate the retrieval and exchange of information; enable the sharing of authority data; enable the integration of descriptions from different locations into a unified information system

8 Common metadata schemas
Dublin Core: minimum number of elements required to facilitate the discovery of document-like objects in a networked environment (e.g. the Internet). Currently 15:
–Content: Title, Subject, Description, Source, Language, Relation, Coverage
–Intellectual Property: Author/Creator, Publisher, Contributor, Rights
–Electronic/Physical Manifestation: Date, Type, Format, Identifier
ISAD(G): General International Standard Archival Description
E-GIF: E-Government Interoperability Framework
OAIS: Open Archival Information System Reference Model
OAI: Open Archives Initiative Protocol for Metadata Harvesting

9 No shortage of statistical metadata standards The Common Warehouse Metamodel (CWM) from OMG – data warehousing and business intelligence ISO 11179 – data elements in a metadata repository SDMX – multidimensional data and time-series IQML, AskXML and Triple-S – questionnaire data The Data Documentation Initiative (DDI) – a general metadata standard for statistical data (micro as well as aggregated) And many other related standards. e-Social Science requires more than simple “data” metadata: –Thesauri, Classifications

10 Encoding schemes HTML (Hyper-Text Markup Language in Web pages, version 3.2 or 4.0) SGML (Standard Generalised Markup Language) XML (eXtensible Markup Language) RDF (Resource Description Framework) MARC (MAchine Readable Cataloging) MIME (Multipurpose Internet Mail Extensions) Z39.50 (protocol for distributed information retrieval) LDAP (Lightweight Directory Access Protocol)

11 Example of deploying metadata for a simple web resource
embedding the metadata in a Web page by the creator using META tags in the HTML coding of the page
as a separate document (e.g. XML) linked to the web resource it describes
in a database linked to the web resource. The records may either have been directly created within the database or extracted from another source, such as Web pages
but what about complex social science data?
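The first option, embedding metadata with META tags, might look roughly like the sketch below. It assumes the common convention of prefixing Dublin Core element names with `DC.` and declaring the namespace with a `schema.DC` link. The function name `dc_meta_tags` and the sample record are illustrative and reuse the values from slide 6.

```python
from html import escape


def dc_meta_tags(record):
    """Render a simple element -> value mapping as HTML META tags.

    Assumes the 'DC.<element>' naming convention for Dublin Core in HTML.
    """
    lines = ['<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">']
    for element, value in record.items():
        # Escape the value so quotes or ampersands cannot break the attribute.
        content = escape(value, quote=True)
        lines.append(f'<meta name="DC.{element.lower()}" content="{content}">')
    return "\n".join(lines)


# Values taken from the example record on slide 6.
page_metadata = {
    "Title": "Web UKDA Catalogue",
    "Creator": "Louise Corti",
    "Publisher": "UK Data Archive",
    "Identifier": "http://www.data-archive.ac.uk/",
    "Format": "Text/html",
}

print(dc_meta_tags(page_metadata))
```

The generated lines would sit inside the `<head>` of the page. The other two options (a separate XML document or a database) keep the same element and value pairs and change only where they are stored.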

12 Stepping back: The Standard Study Description devised in 1970s to describe academically created sociological/political science datasets recommended key bibliographic elements informally ‘adopted’ by CESSDA in 1980s often adapted to suit local needs

13 The Standard Study Description recommended elements: subject category; title; depositor; principal investigator; abstract and main topics; kind of data; dimensions of dataset; universe sampled; sampling procedures; method of data collection; dates of coverage, fieldwork and deposit; availability and access conditions; references to reports and related datasets. Controlled vocabulary adopted for some elements –e.g. sampling, kind of data. Subject and geographical key words from broad social science Thesaurus (HASSET)

14 The first step towards interoperability driven by the need to search across European Data Archive holdings development of a core element set for the Integrated Data Catalogue (IDC) catalogue records marked with standard tags for inclusion into WAIS indexes (Wide Area Information Servers) enabled multi-site searching via WAIS protocol simplistic and excluded - links to additional metadata, documentation, thesaurus help, and browsing

15 the DDI is widely adopted by social sciences data archives all over the world that provide many of the datasets used by social scientists for secondary analysis initiated and organised by the Inter-University Consortium for Political and Social Research (USA) in 1995 to create a metadata standard for the social science community members coming from social science data archives and libraries in USA, Canada and Europe and from major producers of statistical data first in SGML then in XML DDI 1.0 published in 2000. Currently at version 2. Version 3 is being designed and is scheduled for 2006

16 The Structure of a DDI Codebook
Document Description
–Description of the codebook document itself (author, sources, etc.)
Study Description
–Information about the entire study or data collection (content, collection methods, processing, sources, access conditions, etc.)
File Description
–Description of each single file of the data collection (formats, dimensions, processing information, etc.)
Data Description
–Description of each single variable in a datafile (format, variable and value labels, definitions, question texts, imputations, etc.)
Other Study-related Materials
–References to reports and publications and other machine readable documentation
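As a rough illustration of the five sections, the sketch below builds an empty codebook skeleton with Python's standard `xml.etree.ElementTree`. The element names (`codeBook`, `docDscr`, `stdyDscr`, `fileDscr`, `dataDscr`, `otherMat`, `var`, `labl`) follow DDI Codebook 2.x as best recalled and should be checked against the published DTD or schema. The function name, study title and variable labels are hypothetical, and the output is not claimed to be a valid DDI instance.

```python
import xml.etree.ElementTree as ET


def build_codebook(title, variables):
    """Build a skeleton with the five top-level sections of a DDI codebook."""
    root = ET.Element("codeBook")

    # 1. Document Description: the codebook document itself (left empty here).
    ET.SubElement(root, "docDscr")

    # 2. Study Description: information about the whole study.
    study = ET.SubElement(root, "stdyDscr")
    citation = ET.SubElement(study, "citation")
    title_stmt = ET.SubElement(citation, "titlStmt")
    ET.SubElement(title_stmt, "titl").text = title

    # 3. File Description: one per data file (left empty here).
    ET.SubElement(root, "fileDscr")

    # 4. Data Description: one <var> per variable, with a label.
    data = ET.SubElement(root, "dataDscr")
    for name, label in variables:
        var = ET.SubElement(data, "var", name=name)
        ET.SubElement(var, "labl").text = label

    # 5. Other Study-related Materials (left empty here).
    ET.SubElement(root, "otherMat")
    return root


# Hypothetical study title and variables, for illustration only.
codebook = build_codebook(
    "Example study (hypothetical)",
    [("sex", "Sex of respondent"), ("age", "Age of respondent")],
)
print(ET.tostring(codebook, encoding="unicode"))
```

Because every section has a fixed place in the tree, software can address a piece of documentation by path, for example `dataDscr/var` for all variable descriptions.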

17 Data description - variables
000001 1 1 44 123 9 5 4 5
000002 1 3 47 003 1 3 3 3
000003 2 5 43 155 1 1 2 3
000004 1 3 36 012 2 5 5 5
000005 9 4 24 207 9 1 4 5
Variable labels: Case Number, Sex, Age, Country, Occupation, Question Responses
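The raw records on this slide mean little without a description of each variable. The sketch below shows how such a description lets software turn codes into labelled values. The transcript does not say which column carries which variable, so the column order, the variable names and the category labels for sex (1 = male, 2 = female, 9 = missing) are all assumptions made for illustration.

```python
RAW = """\
000001 1 1 44 123 9 5 4 5
000002 1 3 47 003 1 3 3 3
000003 2 5 43 155 1 1 2 3
000004 1 3 36 012 2 5 5 5
000005 9 4 24 207 9 1 4 5
"""

# Hypothetical data description: (variable name, category labels or None).
# Both the column order and the labels are assumed, not taken from the slide.
VARIABLES = [
    ("case_number", None),
    ("sex", {"1": "male", "2": "female", "9": "missing"}),
    ("v3", None),
    ("age", None),
    ("v5", None),
    ("q1", None),
    ("q2", None),
    ("q3", None),
    ("q4", None),
]


def decode(line):
    """Map one whitespace-separated record onto the described variables."""
    tokens = line.split()
    if len(tokens) != len(VARIABLES):
        raise ValueError(f"expected {len(VARIABLES)} fields, got {len(tokens)}")
    row = {}
    for (name, categories), token in zip(VARIABLES, tokens):
        # Replace a code with its label when the variable has categories.
        row[name] = categories.get(token, token) if categories else token
    return row


rows = [decode(line) for line in RAW.splitlines()]
for row in rows:
    print(row["case_number"], row["sex"], row["age"])
```

In a DDI codebook the equivalent of `VARIABLES` lives in the Data Description section, so the data file and its meaning can travel together.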

18 DDI in XML

19 Understanding Statistical Metadata Different approaches to understanding: what is it for? –statistical metadata has no value in itself, it is just a means to an end. Its progress should be measured by the extent that it facilitates social research what is it like? –Anything familiar we can relate it to? Form of communication might be a good choice

20 Benefits interoperability –homogeneous exchangeable documents richer content –comprehensive set of elements providing the potential data analyst with broader knowledge single document - multiple purposes –repurposed for different needs and applications – preservation, discovery, and dissemination on-line subsetting and analysis –standard uniform structure and content for variables ensures easy import into on-line analysis systems precision in searching –field-specific searches across documents are enabled and more … – human-readable and computer actionable – essential foundation for E-science and the Grid

21

22

23

24

25 EU Madiera Portal Meta(data) Browsing Search Multilingual Browsing

26 Summary - the DDI The DDI can serve as the foundation for content, distribution, use and preservation of data collections in the social and behavioural sciences, across institutions, countries, and disciplines cooperation from both data producers and statistical software manufacturers, so that the DDI specification can readily become the basis for the entire research process, from generation of a data collection instrument to production of research articles serves the social science community well with a specification that produces quality metadata with multiple purposes. It fully documents the details of datasets, it is user friendly and accessible, it integrates into the infrastructure of the Web and it supports automatic generation of statistical software system files. the widespread adoption of the DDI will vastly improve access to a range of varied datasets. Expanded use will greatly enhance comparative research; the ability to harmonize datasets over time and geography will lead to significant improvement in our understanding of societies

27 The future Statistical metadata is here and it is already changing the way people locate and make sense of data but it does not yet support most use cases of interest to social scientists. What we will need to move forward is: Grammar, a standard Semantic infrastructure (e.g. as provided by the Semantic Web): –semantic extensibility –ability to integrate (merge and override) descriptions from different sources large Vocabulary, by integrating different flavours of metadata: –unique identifiers for data and research literature –statistical data metadata (full life cycle) –Ontologies, Thesauri and Classifications (and mappings among them) –statistical processing metadata –“Secondary metadata”: annotations, quality assessment, links to research literature –experts metadata (FOAF)

28 Not Even Half Way There.. DDI StandardTEI for QD RDF Semantic Web Nesstar – Data Web ELSST Integrated Data Catalogue USI Cooperative Markup Annotations Comparable variables Unified Authentication Mappings References Extraction Future developments: Progress in metadata and technical standardisation Latent knowledge capture and extraction Grid

29 Qualitative data and the DDI
in October 2001 ESDS Qualidata formally adopted the DDI to describe data
in 2000, began to explore standards for archiving and web representation of qualitative data
expertise from the text processing/arts and humanities communities - TEI
ESDS Qualidata Online shows the basic potential of what can be achieved by a common standard
need to catch up with the statistical community!
working model that will be presented today

