Information Types and Registries. Giridhar Manepalli, Corporation for National Research Initiatives. Strategies for Discovering Online Data, BRDI Symposium.

Presentation transcript:

Information Types and Registries
Giridhar Manepalli, Corporation for National Research Initiatives
Strategies for Discovering Online Data, BRDI Symposium, Feb 26, 2013

Research Data Interoperability
[Figure: scientists, data curators, end users, and applications rely on enabling technologies for discovery, access, interpretation, and reuse of identified (ID) datasets, which are accessed via repositories.]

Research Data Interoperability (cont.)
Interoperability of research data allows discovery, access, interpretation, and reuse of datasets by researchers.
Examples:
- Discovery: A scientist in the US "discovers" datasets from research in Germany, in a related or even unrelated domain.
- Reuse: A scientist in the US "re-uses" or processes datasets from the discovered research in Germany.
For interpretation of accessible datasets, Types and Type Registries play a significant role.

Information Types – Our Definition
What they are not:
- Programmatic data types (string, integer, double, etc.)
- MIME types as normally used (text/xml, application/rdf)
Types are identifiers that, with the help of associated metadata, characterize data structures used for managing information.
Data structures could be at multiple levels of granularity: individual observations, sets of observations within a time series, or multiple time-series sets that explain a phenomenon.
These data structures are usually:
- Spread across multiple files (each with a specific MIME type)
- Distributed on the network (managed by various repositories)
We call such data structures used for managing information digital objects.
Types (aka type identifiers) are unique across their user base.
Types are associated with machine-readable metadata to support interpretation of information.
CNRI's focus is to support infrastructure for enabling inter-discipline types.
[Figure: a typed digital object on the network, shown with its type ID, machine-readable metadata, and files.]
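One way to picture the definition above is as two small records: a type (an identifier plus machine-readable metadata) and a digital object that references the type and may span several files. The following is only a minimal sketch; the class names, field names, identifiers, and repository locations are all hypothetical and do not come from the presentation.

```python
from dataclasses import dataclass, field


@dataclass
class TypeRecord:
    """A type identifier plus the machine-readable metadata registered for it."""
    type_id: str                                   # assumed unique across the user base
    metadata: dict = field(default_factory=dict)   # supports interpretation


@dataclass
class DigitalObject:
    """A data structure for managing information, possibly spanning many files."""
    object_id: str
    type_id: str                                   # reference to a TypeRecord
    files: dict = field(default_factory=dict)      # file location -> MIME type


def mime_types(obj: DigitalObject) -> list:
    """Return the distinct MIME types of the files that make up the object."""
    return sorted(set(obj.files.values()))


# Hypothetical identifiers and locations, for illustration only.
series_type = TypeRecord(
    type_id="example/time-series-set",
    metadata={"granularity": "sets of observations within a time series"},
)
obj = DigitalObject(
    object_id="example/object-1",
    type_id=series_type.type_id,
    files={
        "repo-a.example.org/observations.xml": "text/xml",
        "repo-b.example.org/observations.rdf": "application/rdf",
    },
)
print(obj.type_id, mime_types(obj))
```

The point of the sketch is that the object carries one information type while its individual files keep their own MIME types, which is why the two notions are kept separate.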

Value Proposition of Info. Types
Typing allows:
- Grouping of digital objects generated at different times and in different domains, for reasoning and establishing correlations between different types of objects. Grouping is fundamental to how humans reason about things.
- Creation of services that can automate information processing based on information types. Advanced information processing can be performed to find unforeseen correlations, trends, etc. This type of advanced processing goes by different names: data-intensive science, fourth paradigm, big-data analytics, etc.
[Figure: a collection of typed digital objects of Type A, Type B, and Type C.]

Value Proposition of Info. Types (cont.)
[Figure: a suite of services (visualization, rights with terms agreement, data set dissemination, data processing) shown alongside the Type Registry and the digital objects, with the interaction steps listed below.]
Interaction:
1. User requests the Type from a Digital Object of interest.
2. Type ID is returned to the user.
3. User requests the Type info from the Type Registry.
4. Type Info, containing Services Info, is returned to the user.
5. User requests a Service for processing.
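The interaction on this slide can be sketched as a chain of lookups. This is a rough illustration only, with in-memory dictionaries standing in for the repository, the Type Registry, and the services; every name, identifier, and the "mean" service itself is an assumption made for the example, not part of any CNRI interface.

```python
# Hypothetical stand-ins for a repository and a type registry.
OBJECT_TYPES = {"obj-1": "type-A"}           # digital object ID -> type ID
OBJECT_DATA = {"obj-1": [2.0, 4.0, 6.0]}     # digital object ID -> payload


def mean(values):
    """Example processing service registered for type-A (an assumption)."""
    return sum(values) / len(values)


TYPE_REGISTRY = {
    "type-A": {"description": "numeric observations", "services": {"mean": mean}},
}


def get_type_id(object_id):
    """Steps 1-2: ask the digital object for its type; the type ID comes back."""
    return OBJECT_TYPES[object_id]


def get_type_info(type_id):
    """Steps 3-4: ask the registry for type info, which includes services info."""
    return TYPE_REGISTRY[type_id]


def process(object_id, service_name):
    """Step 5: request one of the listed services to process the object."""
    type_info = get_type_info(get_type_id(object_id))
    service = type_info["services"][service_name]
    return service(OBJECT_DATA[object_id])


result = process("obj-1", "mean")
print(result)
```

The user never needs to know in advance which services exist; they are discovered through the type.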

Info. Typing Challenges
Challenge: When are two digital objects assigned the same type?
- When the bit-level encoding matches? Or when the higher-level structures and intent match?
- If two observations are made by two similar instruments at the same time on the same entity, would the data generated by those two observations be considered to be of the same type?
- Even if the data generated by each observation, similar in concept, has a different format (e.g., JPEG vs. PNG)?
Our approach: Intent wins over optics (formats, encodings, etc.). The metadata associated with the type could list possible formats, encodings, etc.
Alternative approach: Establish a base type and then sub-types to accommodate variations. Our experience was that it was too cumbersome to deal with multiple formats and encodings at the type definition level.
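As a small illustration of the "intent wins" approach, the sketch below gives one type a list of permitted formats in its metadata, so that a JPEG observation and a PNG observation share the same type. The type name, the dictionary layout of the observations, and the helper functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InfoType:
    type_id: str
    formats: frozenset   # formats and encodings listed in the type's metadata


def accepts(info_type: InfoType, mime_type: str) -> bool:
    """True if the format is one of those listed in the type's metadata."""
    return mime_type in info_type.formats


def same_type(a: dict, b: dict) -> bool:
    """Objects share a type when their type IDs match, whatever their format."""
    return a["type_id"] == b["type_id"]


# Hypothetical type covering one kind of observation in two encodings.
IMAGE_OBSERVATION = InfoType(
    type_id="example/image-observation",
    formats=frozenset({"image/jpeg", "image/png"}),
)
obs_a = {"type_id": IMAGE_OBSERVATION.type_id, "format": "image/jpeg"}
obs_b = {"type_id": IMAGE_OBSERVATION.type_id, "format": "image/png"}

print(same_type(obs_a, obs_b), accepts(IMAGE_OBSERVATION, obs_b["format"]))
```

Under the alternative approach, each format would instead need its own sub-type of a base type, which is the overhead the slide describes as cumbersome.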

Info. Typing Challenges (cont.)
Challenge: Can the same digital object be assigned multiple types?
- If so, how do we deal with duplicate types?
- If not, how do we manage multiple types assigned by several domains?
Our approach: An object is assigned an inter-discipline type only once. Any domain-specific types are listed in its metadata.
[Figure: a typed digital object collection carrying inter-discipline Type I, whose machine-readable metadata lists Type α and Type β, the domain-specific types used by a biologist and a computer scientist.]
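The single-assignment rule might be enforced along the following lines. This sketch assumes that the domain-specific types are recorded in the metadata record of the inter-discipline type, which is one reading of the slide; the function names, domain labels, and identifiers are invented for illustration.

```python
object_types = {}    # object ID -> its single inter-discipline type ID
type_metadata = {}   # type ID -> metadata, including listed domain-specific types


def assign_inter_discipline_type(object_id, type_id):
    """Assign the inter-discipline type; a second assignment is rejected."""
    if object_id in object_types:
        raise ValueError(
            f"{object_id} already has inter-discipline type {object_types[object_id]}"
        )
    object_types[object_id] = type_id
    type_metadata.setdefault(type_id, {"domain_types": {}})


def list_domain_type(type_id, domain, domain_type_id):
    """Record a domain-specific type in the inter-discipline type's metadata."""
    record = type_metadata.setdefault(type_id, {"domain_types": {}})
    record["domain_types"][domain] = domain_type_id


# Hypothetical identifiers mirroring the labels in the slide's figure.
assign_inter_discipline_type("example/object-1", "type-I")
list_domain_type("type-I", "biology", "type-alpha")
list_domain_type("type-I", "computer-science", "type-beta")
print(type_metadata["type-I"]["domain_types"])
```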

Info. Typing Challenges (cont.)
Challenge: How can existing information be typed under this new scheme? A lot of information exists already.
One approach: Start with domain-specific types, if any, generate domain-neutral types from them, and list the domain-specific types in the metadata records of the domain-neutral types.

Info. Types – Machine-readable Metadata
Machine-readable metadata for Info. Types is still an area of research for us.
Type interdependence:
- It is clear that sub-typing is needed for building on previously defined types.
- Our experience shows that sub-typing based on variations in formats and encodings is a cumbersome process.
- Instead, an exhaustive list of possible formats and encodings may be specified in the metadata.
Domain-specific Types:
- Cross-domain Types could list or point at domain-specific types, of which there may be several for a given object, and which might define detailed semantics for interpretation.
Metadata for automated interpretation:
- For the few types of information we prototyped, defining metadata that helps services process datasets is open-ended and sometimes impractical.
- A parsing language or pseudo-code that transforms datasets into domain-specific ontologies or semantics may be captured instead.

Info. Type Registries
Info. Type Registries are metadata registries that:
- Support recording of information types and associated metadata records
- Perform federation across other registries
- De-duplicate (or match types) to control registration requests for existing types
- Include a manual moderation and/or crowdsourcing function for spotting redundant registrations (optional)
Cross-domain Type Registries may optionally link to domain-specific Type Registries.
Type Registries may manage or reference services that process information of certain types.
CNRI has vast experience building metadata registries.
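Recording, de-duplication, and federation could be sketched roughly as below. This is a toy in-memory model, not a description of any CNRI registry: the class, its methods, and the rule that two types match when their metadata is identical are all assumptions, and real matching would likely need fuzzier comparison or human moderation.

```python
import json


class TypeRegistry:
    """Toy registry: records types, matches duplicates, federates lookups."""

    def __init__(self, name, peers=()):
        self.name = name
        self.records = {}          # type ID -> metadata record
        self.peers = list(peers)   # linked registries; assumed to form no cycles

    @staticmethod
    def _fingerprint(metadata):
        # Assumed matching rule: identical metadata means the same type.
        return json.dumps(metadata, sort_keys=True)

    def register(self, type_id, metadata):
        """Record a type, or return the ID of an already registered match."""
        wanted = self._fingerprint(metadata)
        for existing_id, existing in self.records.items():
            if self._fingerprint(existing) == wanted:
                return existing_id
        if type_id in self.records:
            raise ValueError(f"{type_id} is registered with different metadata")
        self.records[type_id] = metadata
        return type_id

    def lookup(self, type_id):
        """Look locally first, then ask linked registries (federation)."""
        if type_id in self.records:
            return self.records[type_id]
        for peer in self.peers:
            found = peer.lookup(type_id)
            if found is not None:
                return found
        return None


# Hypothetical registries and type IDs.
domain = TypeRegistry("domain-specific")
domain.register("d/1", {"name": "spectrum"})
cross = TypeRegistry("cross-domain", peers=[domain])
first = cross.register("c/1", {"name": "time series", "formats": ["text/csv"]})
second = cross.register("c/2", {"formats": ["text/csv"], "name": "time series"})
print(first, second, cross.lookup("d/1"))
```

Here the second registration request is answered with the existing type ID rather than creating a redundant entry, and the cross-domain registry resolves a type it does not hold by consulting the linked domain-specific registry.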

Next Steps
- Received Sloan Foundation funding to research Type Registries within scientific and financial communities.
- CNRI employees lead and participate in a Type Registry working group within the Research Data Alliance.
- The technical goal is to define the scope of 'Information Type' by working in the aforementioned projects, and to build and release an open-source Type Registry in the next 18 months.