
1 Data Type Registries Breakout
Co-chairs: Larry Lannom, Tobias Weigel
P10, Montreal, September 2017

2 Agenda
11:30 - 11:35 Welcome & Intros, Agenda Bashing
11:35 - 11:40 Larry Lannom, State of the WG & Brief DTR Overview
11:40 - 11:50 Tobias Weigel, Climate Data Processing
11:50 - 12:00 Ulrich Schwardmann, ePIC DTR
12:00 - 12:10 Wo Chang, Common Access Protocol, IEEE BDGMM
12:10 - 12:20 Rob Quick, RPID Test Bed
12:20 - 12:25 Steve Richard, EarthCube (remote)
12:25 - 12:30 Andres Ferreyra, AgGateway (remote)
12:30 - 12:40 Giridhar Manepalli, ISO WG plus Data Models (remote)
12:40 - 13:00 Tobias Weigel, Discussion: Next Steps, Goals for P11

3 What is the Issue?
Data sharing requires that data can be parsed, understood, and reused by people and applications other than those that created the data.
How do we do this now?
For documents – formats are enough, e.g., PDF, and then the document explains itself to humans.
This doesn’t work well with data – numbers are not self-explanatory.
What does the number 7 mean in cell B27?
Data producers may not have explicitly specified certain details in the data: measurement units, coordinate systems, variable names, etc.
We need a way to precisely characterize those assumptions so that they can be identified by humans and machines that were not closely involved in the data's creation.

4 DTR Usage Example
[Diagram labels: Users; Federated Set of Type Registries; Typed Data (ID, Type, Payload); Services; Visualization; Rights ("I Agree", Terms:…); Data Processing; Data Set Dissemination]
1 Client (process or people) encounters data of an unknown type.
2 Client resolves the type to a Type Registry.
3 Response includes type definitions, relationships, properties, and possibly service pointers. The response can be used locally for processing, or, optionally,
4 Typed data or a reference to typed data can be sent to a service provider.
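The four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the WG prototype: the registries are in-memory stand-ins, and every name here (`TypeRecord`, `TypeRegistry`, `resolve_type`, `plan_handling`, the `example/...` identifiers, and the `example.org` service URL) is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TypeRecord:
    """What a registry returns for a type (step 3)."""
    identifier: str
    definition: str
    properties: dict = field(default_factory=dict)
    service_pointers: list = field(default_factory=list)


@dataclass
class TypedData:
    """Typed data as labeled on the slide: ID, Type, Payload."""
    data_id: str
    type_id: str
    payload: bytes


class TypeRegistry:
    """Hypothetical in-memory stand-in for one registry instance."""

    def __init__(self) -> None:
        self._records: dict = {}

    def register(self, record: TypeRecord) -> None:
        self._records[record.identifier] = record

    def lookup(self, type_id: str) -> Optional[TypeRecord]:
        return self._records.get(type_id)


def resolve_type(type_id: str, federation: list) -> TypeRecord:
    """Step 2: ask each registry in the federation until one knows the type."""
    for registry in federation:
        record = registry.lookup(type_id)
        if record is not None:
            return record
    raise KeyError(f"type {type_id!r} not found in any registry")


def plan_handling(data: TypedData, federation: list, use_service: bool = False):
    """Steps 3 and 4: use the record locally, or pick a service to send data to."""
    record = resolve_type(data.type_id, federation)
    if use_service and record.service_pointers:
        return ("send_to_service", record.service_pointers[0])
    return ("process_locally", record.properties)


# Step 1: a client encounters data whose type it does not know.
registry_a = TypeRegistry()
registry_b = TypeRegistry()
registry_b.register(TypeRecord(
    identifier="example/air-temperature",           # hypothetical identifier
    definition="Air temperature reading",
    properties={"unit": "degree Celsius", "encoding": "ASCII decimal"},
    service_pointers=["https://example.org/visualize"],  # hypothetical service
))
federation = [registry_a, registry_b]
sample = TypedData("example/dataset-1", "example/air-temperature", b"7")

local_plan = plan_handling(sample, federation)
service_plan = plan_handling(sample, federation, use_service=True)
```

With the type record in hand, the bare payload `7` can be read as a temperature in degrees Celsius, which is the kind of implicit assumption the registry is meant to make explicit.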

5 Goal of the WG
Evaluate and identify a few assumptions in data that can be codified and shared in order to…
Produce a functioning Registry system that can easily be evaluated by organizations before adoption
Highly configurable for changing scope of captured and shared assumptions depending on the domain or organization
Supports several Type record dissemination variations
Designed to allow federation between multiple Registry instances
The emphasis is not on:
Identifying every possible assumption and data characteristic applicable for all domains
Technology

6 Status of the WG A prototype is at: http://typeregistry.org/
Multiple other implementations/projects, including multiple schemas
Implementation supports notions of primitives and derived types
Primitives are fundamental types that we expect humans and software to parse and understand
Derived types depend on primitives to describe something complex
Registered types are assigned unique identifiers
Initial WG output published as ICT Technical Standard
ISO Study Group in process
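The relationship between primitives and derived types can be illustrated with a small Python sketch. It is an assumption-laden toy, not the schema used by the prototype: the class names, the `members` layout, and the use of random UUIDs as stand-ins for registry-assigned identifiers are all hypothetical.

```python
import uuid
from dataclasses import dataclass, field


def _new_identifier() -> str:
    # Assumption: a random UUID stands in for a registry-assigned identifier.
    return str(uuid.uuid4())


@dataclass(frozen=True)
class PrimitiveType:
    """A fundamental type that humans and software are expected to understand."""
    name: str
    identifier: str = field(default_factory=_new_identifier)


@dataclass(frozen=True)
class DerivedType:
    """A complex type described in terms of primitives or other derived types."""
    name: str
    members: tuple  # tuple of (member_name, PrimitiveType or DerivedType) pairs
    identifier: str = field(default_factory=_new_identifier)


def primitives_of(type_def) -> set:
    """Recursively collect the names of the primitives a type depends on."""
    if isinstance(type_def, PrimitiveType):
        return {type_def.name}
    found = set()
    for _, member_type in type_def.members:
        found |= primitives_of(member_type)
    return found


FLOAT = PrimitiveType("float")
STRING = PrimitiveType("string")

# A derived type built from primitives only.
measurement = DerivedType("measurement", (("value", FLOAT), ("unit", STRING)))

# A derived type that nests another derived type.
station_reading = DerivedType(
    "station_reading",
    (("station_name", STRING), ("reading", measurement)),
)

dependencies = primitives_of(station_reading)
```

Here `dependencies` contains `"float"` and `"string"`: however deeply a derived type is nested, it bottoms out in primitives that a client is expected to be able to parse.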

7 Initial Adopters
EarthCube – Steve Richard
Vermont Monitoring Cooperative – Mike Finnegan
DKRZ – Tobias Weigel
ePIC – Ulrich Schwardmann
NIST, Common Access Platform – Wo Chang
CNRI – multiple projects
Ongoing ISO Study Group

8 Expected Impact of the Deliverable
Best case scenario: agreed upon set of standard schemas; ISO standard
Wide use of types for data sharing and workflow automation
Significant use of federation of distributed set of type registries
Extended use of typed attribute/value pairs in PID resolution
Worst case scenario: no agreed upon set of schemas, no further standardization
General concept influences multiple communities in the direction of clearer data syntax and semantics
ICT Tech Standard remains
Existing use of typed attribute/value pairs in PID resolution

9 Expected Impact of the Deliverable
Before:
Data sets are difficult to impossible to parse, understand, and re-use unless you created them, know who did, or there exists detailed public documentation.
Search criteria for data sets are restricted to keywords and sources.
Standardization across data sets is fairly arbitrary, concentrated in small groups and narrow communities.
After:
Data sets can be typed at a fine level of granularity, those types can be registered in a public registry, and those type records can contain sufficient information to make detailed and accurate use of the data sets so typed.
Search criteria for data sets can include type information, yielding easier comparisons and mash-ups.
Greater chance of standards developing across data set construction.

