Text, Data and People – How to Represent Earth System Science. Hans Pfeiffenberger, Ana Macario, Alfred Wegener Institut, Helmholtz Association. OAI4, CERN, 2005-10-20.


1 Text, Data and People – How to Represent Earth System Science
Hans Pfeiffenberger, Ana Macario
Alfred Wegener Institut, Bremerhaven

2 Introduction
- Earth System Science (ESS) is an interdisciplinary, global collaboration.
- ESS output is heavily data-centric: data come from observations and from simulations (“in silico” experiments).
- ESS work is organized around expeditions or campaigns and around coupled models of the earth’s sub-systems.
- Logistics and system costs are extremely high:
  - one ship may cost up to 500 G€;
  - the “Earth Simulator” was the fastest computer in the world two years ago.
- ESS data are potentially of extremely long-term value.

3 An important, typical experiment
- EISENEX / EIFEX: conducted during two expeditions of “Polarstern”, with a four-year pause between them
- EIFEX (2004): 54 scientists (and students) from 14 institutes and 3 companies in 7 European countries and South Africa
- Oceanographers, biologists, chemists, …: “Biogeochemistry”

4 Collaboration’s data needs
- Need to work from a common understanding of what is known about the subject
- Need to plan expeditions and coordinate with the ships’ operators (general plan 5 or more years in advance)
- Need to coordinate instrument design, operation and interfacing before the ship’s departure
- Meet aboard, sail and work for 8 weeks or so
- Do the evaluation back at the home institutes, exchanging individual results
- Publish text; PhD students dump the data somewhere if nobody is watching, or keep it “private”

5 Data Publishing
There is reason enough to publish data thoroughly:
- potential reuse in many more contexts than foreseen
- peer reviewers can take a critical look at data quality
Problem: metadata
- ISO is a metadata standard (with ~1000 attributes) for georeferenced data.
- Almost no data producer knows how to write ISO metadata for his/her data (nor wishes to know).
- There is no reward system in place (comparable to counting peer-reviewed papers) to motivate individuals.
There should be a solution for well-curated datasets and databases.

6 Data Management
- Metadata are needed even for “work in progress” or auxiliary datasets; both need to be “archived”, or managed:
  - even if they may never reach the level of “published” data;
  - they need to be available to a distributed project group during the project, long before publication.
- There are too many datasets to produce correct and complete ISO metadata “manually”:
  - find ways for each instrument to produce ISO metadata automatically at the time of data creation;
  - use context or relationships instead of descriptive metadata.
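The idea of letting each instrument emit metadata when a file is created, and of inheriting context instead of writing descriptions by hand, can be sketched as follows. This is a minimal illustration under stated assumptions: `ExpeditionContext`, `DatasetRecord` and `record_for_new_file` are hypothetical names, the field names are not ISO 19115 element names, and the sample values (expedition id, PI) are invented.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class ExpeditionContext:
    """Hypothetical expedition-level facts, entered once per cruise."""
    expedition_id: str
    ship: str
    principal_investigator: str
    project: str


@dataclass
class DatasetRecord:
    """Metadata assembled automatically when an instrument writes a file."""
    file_name: str
    instrument: str
    created: str                      # UTC timestamp, ISO 8601
    bbox: tuple                       # (west, south, east, north), decimal degrees
    relations: dict = field(default_factory=dict)


def record_for_new_file(ctx, file_name, instrument, positions):
    """Build a record from the instrument's own position fixes plus context.

    `positions` is a list of (lon, lat) pairs logged by the instrument.
    The bounding box is computed, not typed in; this simple min/max ignores
    tracks that cross the antimeridian.
    """
    lons = [lon for lon, _ in positions]
    lats = [lat for _, lat in positions]
    return DatasetRecord(
        file_name=file_name,
        instrument=instrument,
        created=datetime.now(timezone.utc).isoformat(),
        bbox=(min(lons), min(lats), max(lons), max(lats)),
        # Relationships are inherited from context instead of being described.
        relations={
            "isResultOf": ctx.expedition_id,
            "isPartOf": ctx.project,
            "principalInvestigator": ctx.principal_investigator,
            "platform": ctx.ship,
        },
    )


ctx = ExpeditionContext("EXP-2004-01", "Polarstern", "Jane Doe", "EIFEX")
rec = record_for_new_file(ctx, "ctd_001.tab", "CTD", [(2.0, -49.5), (2.5, -50.1)])
print(rec.bbox)                     # (2.0, -50.1, 2.5, -49.5)
print(rec.relations["isResultOf"])  # EXP-2004-01
```

A production system would map such a record onto a real ISO 19115 encoding; the sketch only shows where the information could come from without manual input.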

7 Relating all relevant objects
…but, today, for AWI expeditions only

8 Current PANGAEA relationship encoding
[Diagram labels: resource item; OAI-PMH records in Dublin Core, PANGAEA-specific and ISO formats; OAI-PMH identifier – “DOI”; descriptive + administrative metadata (twice); descriptive metadata; DC metadata; locator for content; locator for publication(s)]
- Dataset-to-publication relationship metadata should be expressed in RDF/XML and placed in the “Relations datastream”.
- Identifiers are needed (in addition to locators).
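A dataset-to-publication relation serialized as RDF/XML, in the spirit of a Fedora “Relations datastream” (Fedora calls it RELS-EXT), might look like the output of this sketch. The PIDs and the `rel:` namespace are hypothetical; the predicate names are borrowed from the tentative relationship list on slide 14, not from Fedora’s own relations vocabulary.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Hypothetical namespace for ESS relationships; not a published ontology.
REL_NS = "http://example.org/ess-relations#"

ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("rel", REL_NS)


def relations_datastream(subject_pid, relations):
    """Serialize (predicate, object_pid) pairs as RDF/XML about one object.

    PIDs are written as info:fedora/<pid> URIs, one rdf:Description per object.
    """
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF_NS}}}Description",
                         {f"{{{RDF_NS}}}about": f"info:fedora/{subject_pid}"})
    for predicate, object_pid in relations:
        ET.SubElement(desc, f"{{{REL_NS}}}{predicate}",
                      {f"{{{RDF_NS}}}resource": f"info:fedora/{object_pid}"})
    return ET.tostring(root, encoding="unicode")


xml_text = relations_datastream(
    "pangaea:dataset-1",
    [("isDescribedBy", "ir:publication-7"), ("isResultOf", "awi:expedition-1")],
)
print(xml_text)
```

Using `rdf:resource` URIs rather than plain locators is what makes the slide’s point about identifiers concrete: the relation survives even if the landing page moves.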

9 Goals
- Transfer concepts and content from “homegrown”, internal repositories to federations of standards-based institutional repositories (IRs) around the world
- Harvest, e.g., Polarstern-expedition-related text and data from the IRs of all participants
- Display / sort / analyze / rank the maze of material by all meaningful criteria
- Find key networks of people, projects, text, …
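Harvesting expedition-related records from many IRs would typically go through the OAI-PMH `ListRecords` verb with the `oai_dc` format. The sketch below builds request URLs and filters one response page offline; the base URL, the keyword filter and the sample record are assumptions, and a real harvester would also fetch pages over HTTP, follow resumption tokens in a loop and handle errors.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, token=None, set_spec=None):
    """Build a ListRecords request: first page, or a continuation via token."""
    if token:
        # resumptionToken is an exclusive argument: only the verb may accompany it.
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
        if set_spec:
            params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def parse_page(xml_text, keyword):
    """Return ([(identifier, relations)], resumption token or None) for one page.

    A record is kept if `keyword` occurs (case-insensitively) in any
    dc:title, dc:subject or dc:relation value.
    """
    root = ET.fromstring(xml_text)
    hits = []
    for rec in root.iterfind(".//oai:record", NS):
        ident = rec.findtext("oai:header/oai:identifier", namespaces=NS)
        dc = rec.find("oai:metadata/oai_dc:dc", NS)
        if dc is None:  # deleted records have a header but no metadata
            continue
        values = [e.text or "" for tag in ("title", "subject", "relation")
                  for e in dc.iterfind(f"dc:{tag}", NS)]
        if any(keyword.lower() in v.lower() for v in values):
            hits.append((ident, [e.text for e in dc.iterfind("dc:relation", NS)]))
    token = root.findtext(".//oai:resumptionToken", namespaces=NS)
    return hits, (token or None)


SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repo.example.org:42</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>CTD profiles, Polarstern expedition</dc:title>
          <dc:relation>doi:10.1234/abc.567</dc:relation>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("https://repo.example.org/oai"))
print(parse_page(SAMPLE, "polarstern"))
```

Keeping the `dc:relation` values alongside each hit is what later allows harvested text and data to be linked into one network.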

10

11

12 Types of object, in order of appearance (1)
- (Institutions)
- Person
  - represented by splash page (personal home page)
  - uid: eduPersonPrincipalName
  - primary encoding: eduPerson schema
- (Informal group)
- Project
  - represented by splash page (project home page)
  - uid: maybe a specific encoding of the funders’ project number
  - primary encoding: eduPerson/eduOrg schema
- Expedition, campaign
  - represented by splash page (expedition home page)
  - treat it as a project; generate the project number from the expedition identifier
  - primary encoding: eduPerson/eduOrg schema
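One way to read the uid rules above is as a few functions that turn existing identifiers into URIs. Everything here is a hypothetical sketch: the `urn:example:` prefix (reserved for documentation by RFC 6963), the function names and the sample values are illustrative, and only the `user@scope` shape of eduPersonPrincipalName comes from the eduPerson schema.

```python
import re
from urllib.parse import quote

PREFIX = "urn:example:"  # hypothetical namespace; urn:example is for documentation only


def person_uid(eppn):
    """Person uid from an eduPersonPrincipalName of the form user@scope."""
    user, sep, scope = eppn.partition("@")
    if not (user and sep and scope):
        raise ValueError(f"not a scoped principal name: {eppn!r}")
    return f"{PREFIX}person:{user}@{scope}"


def project_uid(issuer, project_number):
    """Project uid: an encoding of the issuer's (e.g. funder's) project number."""
    return f"{PREFIX}project:{quote(issuer, safe='')}:{quote(project_number, safe='')}"


def expedition_uid(expedition_id):
    """Treat an expedition as a project; derive its number from the expedition label."""
    # Illustrative normalization: runs of slashes and whitespace become one hyphen.
    number = re.sub(r"[\s/]+", "-", expedition_id.strip())
    return project_uid("expedition", number)


print(person_uid("jdoe@example.org"))   # urn:example:person:jdoe@example.org
print(expedition_uid("ANT-XXI/3"))      # urn:example:project:expedition:ANT-XXI-3
```

Routing expeditions through `project_uid` mirrors the slide’s suggestion to treat an expedition as a project, so both kinds of object share one identifier space.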

13 Types of object, in order of appearance (2)
- Datasets
  - represented by splash page
  - uid: maybe the same kind as for publications
  - primary encoding: community-specific (e.g. ISO 19115)
- Publications
  - represented by splash page containing
    - abstract, etc.
    - pointer to the article at the publisher’s site
    - pointer to the article in the IR
    - the publisher’s word on which version is the “original”, etc.
  - uid: DOI, permanent URL, etc.
  - primary encoding: the repository’s (proprietary) format (e.g. Fedora’s); it must be possible to map this unambiguously to METS, MPEG21-DIDL, …
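The publication splash page described above could be modelled as a record whose uid is a validated DOI. In this sketch the class and field names are hypothetical and the sample DOI is invented; the validation regex is the pattern Crossref has published for modern DOIs, which deliberately does not cover every legacy DOI.

```python
import re
from dataclasses import dataclass, field

# Crossref's suggested pattern for modern DOIs; some older DOIs will not match.
DOI_RE = re.compile(r"^10\.\d{4,9}/[-._;()/:A-Z0-9]+$", re.IGNORECASE)
PREFIXES = ("doi:", "https://doi.org/", "http://doi.org/",
            "https://dx.doi.org/", "http://dx.doi.org/")


def normalize_doi(value):
    """Strip a 'doi:' or resolver-URL prefix and validate what remains."""
    v = value.strip()
    for prefix in PREFIXES:
        if v.lower().startswith(prefix):
            v = v[len(prefix):]
            break
    if not DOI_RE.match(v):
        raise ValueError(f"not a DOI: {value!r}")
    return v


@dataclass
class PublicationSplashPage:
    """Hypothetical model of the splash-page contents listed on the slide."""
    doi: str
    abstract: str
    publisher_url: str = ""            # pointer to the article at the publisher's site
    repository_url: str = ""           # pointer to the article in the IR
    original_version_note: str = ""    # publisher's word on which copy is the "original"
    datasets: list = field(default_factory=list)  # uids of related datasets

    def __post_init__(self):
        self.doi = normalize_doi(self.doi)

    @property
    def resolver_url(self):
        return f"https://doi.org/{self.doi}"


page = PublicationSplashPage("doi:10.1234/abc.567", "Abstract text")
print(page.resolver_url)  # https://doi.org/10.1234/abc.567
```

Normalizing on input means the same publication is recognized whether a harvested record cites it as `doi:…` or as a resolver URL.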

14 Object relationships (tentative)
[Diagram: objects Person, Group, Project, Expedition / Experiment / Campaign, Publication and Dataset, connected by the relations IsMemberOf, IsPIOf, IsPartOf, IsAuthorOf, IsBasedOn, IsDescribedBy and IsResultOf]
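Because the slide only names the object types and relations, the edges below are an assumed reading of the diagram. They serve to show how a typed relationship network could be stored and queried, e.g. “which publications describe datasets that resulted from this expedition?”. All identifiers and the class name are hypothetical.

```python
from collections import defaultdict

# (subject, relation, object) triples: an assumed reading of the tentative diagram
TRIPLES = [
    ("person:a", "IsMemberOf", "group:biogeochemistry"),
    ("person:b", "IsPIOf", "project:p1"),
    ("expedition:e1", "IsPartOf", "project:p1"),
    ("dataset:d1", "IsResultOf", "expedition:e1"),
    ("person:a", "IsAuthorOf", "dataset:d1"),
    ("dataset:d1", "IsDescribedBy", "publication:x"),
    ("person:b", "IsAuthorOf", "publication:x"),
]


class RelationGraph:
    """Typed edges indexed in both directions for simple lookups."""

    def __init__(self, triples):
        self.out = defaultdict(set)  # (subject, relation) -> objects
        self.inc = defaultdict(set)  # (object, relation) -> subjects
        for s, r, o in triples:
            self.out[(s, r)].add(o)
            self.inc[(o, r)].add(s)

    def objects(self, subject, relation):
        return self.out.get((subject, relation), set())

    def subjects(self, relation, obj):
        return self.inc.get((obj, relation), set())


def publications_for_expedition(g, expedition):
    """Publications that describe any dataset resulting from `expedition`."""
    pubs = set()
    for ds in g.subjects("IsResultOf", expedition):
        pubs |= g.objects(ds, "IsDescribedBy")
    return pubs


g = RelationGraph(TRIPLES)
print(publications_for_expedition(g, "expedition:e1"))  # {'publication:x'}
print(g.subjects("IsAuthorOf", "publication:x"))        # {'person:b'}
```

Indexing edges in both directions lets the same structure answer “what does X point to?” and “what points to X?”, which is what cross-referencing text and data requires.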

15 Conclusion 1 – Text with Data
- (Text) publications and the related primary data have to be cross-referenced.
- We need ontology and schema designs to express the relationships (to solve the reuse/aggregation problem).
- Extensive descriptive metadata (e.g. ISO 19115) are useful only to big repositories of well-curated datasets with similar content.
- The full text of publications (and its relation to datasets) may be the best “metadata” you will get for the datasets.
- The primary hit in a (Google-like) search may be a publication that refers to primary data.
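The claim that a publication’s full text may be the best “metadata” for its datasets can be illustrated with a tiny inverted index in which each dataset inherits the words of the publications that reference it. The tokenizer, identifiers and sample texts are assumptions for illustration; a real service would use a proper search engine.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case alphanumeric tokens; deliberately naive."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(publications, cites):
    """Index datasets by the full text of the publications that reference them.

    publications: {publication_id: full_text}
    cites: iterable of (publication_id, dataset_id) cross-references
    """
    index = defaultdict(set)
    for pub_id, dataset_id in cites:
        for word in tokenize(publications.get(pub_id, "")):
            index[word].add(dataset_id)
    return index


def search(index, query):
    """Datasets whose linked publications contain every query word."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for w in words[1:]:
        result &= index.get(w, set())
    return result


pubs = {
    "pub:1": "Phytoplankton bloom after iron fertilization in the Southern Ocean",
    "pub:2": "Sea ice thickness from airborne surveys",
}
cites = [("pub:1", "ds:chl-a"), ("pub:1", "ds:ctd"), ("pub:2", "ds:ice")]
idx = build_index(pubs, cites)
print(sorted(search(idx, "iron fertilization")))  # ['ds:chl-a', 'ds:ctd']
```

Neither dataset carries any descriptive metadata here; they become findable only through the cross-reference to the publication.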

16 Conclusions 2 – Full Relation Network
- Service providers should make use of the network of all relevant objects (people, projects, …, datasets, text):
  - harvest relationship metadata
  - harvest descriptive metadata (Dublin Core quality)
  - enable new search paradigms
- Data providers need to expose the relationships between objects; this
  - will require a “complex” metadata format
  - will require an ontology for relationships
  - will require unique identifiers for people etc. (from the eduPerson schema, ~ address)
- Introduce identifiers for projects and “experiments”.
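On the service-provider side, relationship metadata harvested from several data providers can be merged into one network once every object carries a unique identifier. This sketch assumes each provider delivers (subject, relation, object) triples using shared uids, deduplicates them, and ranks objects by their number of distinct neighbours. Degree ranking is only an illustrative stand-in for finding “key networks”, not a method from the slides, and all identifiers are hypothetical.

```python
from collections import defaultdict


def merge_providers(*provider_triples):
    """Union of (subject, relation, object) triples from several providers."""
    merged = set()
    for triples in provider_triples:
        merged.update((s.strip(), r, o.strip()) for s, r, o in triples)
    return merged


def rank_by_degree(triples, top=3):
    """Objects ordered by number of distinct neighbours (ties broken by uid)."""
    neighbours = defaultdict(set)
    for s, _, o in triples:
        neighbours[s].add(o)
        neighbours[o].add(s)
    ranked = sorted(neighbours, key=lambda n: (-len(neighbours[n]), n))
    return [(n, len(neighbours[n])) for n in ranked[:top]]


awi = [("person:a", "IsAuthorOf", "dataset:1"),
       ("dataset:1", "IsResultOf", "expedition:x")]
partner = [("person:a", "IsAuthorOf", "dataset:1"),  # duplicate of an AWI triple
           ("person:b", "IsAuthorOf", "dataset:2"),
           ("dataset:2", "IsResultOf", "expedition:x"),
           ("dataset:3", "IsResultOf", "expedition:x"),
           ("person:a", "IsAuthorOf", "publication:p")]

net = merge_providers(awi, partner)
print(len(net))                # 6
print(rank_by_degree(net)[0])  # ('expedition:x', 3)
```

The deduplication step only works because both providers use the same uid for the same object, which is why the slides stress unique identifiers for people, projects and experiments.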