Data Catalog Project: A Browsable, Searchable Metadata System

1 Data Catalog Project: A Browsable, Searchable Metadata System
J. Stillerman, T. Fredian, M. Greenwald, G. Manduchi

2 The Problem Modern experiments record very large sets of heterogeneous measurements. These measurements are hard to find, understand and access for a variety of consumers: local users other than the primary producers, new users, visitors, remote users, and users analyzing data from more than one experiment.

3 Local Users / New Users / Visitors
Modern experiments are so complicated that even experienced local users have trouble navigating the stored data. New users need to be able to find and understand the available measurements from an experiment: What measurements are available? What do they mean, and how are they defined? Who do I talk with about them? How do I access them? How do I display them? Visitors have these questions, and need these answers, even more. Off-site collaborators cannot even 'just ask someone'.

4 The Solution Create a data catalog and a corresponding data store.
Associate a standard set of metadata with each measurement, using an ontology to make metadata names unambiguous and easily searchable. Associate usage/type tags with measurements, also based on an ontology, and associate additional metadata germane to these tags. Store these metadata in a relational database for browsing and searching. Include data access URIs in the catalog, specifying where and how to access the data referred to.
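The last three steps (relational storage, usage tags, access URIs) can be illustrated with a toy relational layout. This is a minimal sketch, not the project's schema: the table names, columns and sample entries are hypothetical, and Python's built-in sqlite3 stands in for whatever database is eventually chosen. Usage tags live in their own table so that a search can filter on them.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical toy schema: one row per measurement, plus a tag table.
SCHEMA = """
CREATE TABLE measurement (
    id          INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE,
    description TEXT,
    units       TEXT,
    uri         TEXT NOT NULL          -- where and how to fetch the data
);
CREATE TABLE usage_tag (
    measurement_id INTEGER REFERENCES measurement(id),
    tag            TEXT NOT NULL       -- a term from the usage-tag ontology
);
"""


def build_catalog() -> sqlite3.Connection:
    """Create an in-memory catalog holding two illustrative entries."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    rows = [
        # (name, description, units, uri, tags): illustrative values only
        ("ne_core", "Core electron density", "m^-3",
         "mdsplus://server/tree/shot/path-in-shot", ["Time-Series"]),
        ("te_profile", "Electron temperature profile", "keV",
         "hdfstore://myfile.h5::/data/path", ["Profile", "Time-Series"]),
    ]
    for name, desc, units, uri, tags in rows:
        cur = conn.execute(
            "INSERT INTO measurement (name, description, units, uri) "
            "VALUES (?, ?, ?, ?)",
            (name, desc, units, uri),
        )
        conn.executemany(
            "INSERT INTO usage_tag (measurement_id, tag) VALUES (?, ?)",
            [(cur.lastrowid, tag) for tag in tags],
        )
    return conn


def search(conn: sqlite3.Connection, tag: str, text: str = "") -> List[Tuple[str, str]]:
    """Return (name, uri) for entries carrying `tag` whose description contains `text`."""
    return conn.execute(
        """SELECT m.name, m.uri FROM measurement m
           JOIN usage_tag t ON t.measurement_id = m.id
           WHERE t.tag = ? AND m.description LIKE ?
           ORDER BY m.name""",
        (tag, f"%{text}%"),
    ).fetchall()
```

Here `search(conn, "Time-Series", "density")` returns only the `ne_core` entry and its URI; a browsing interface could issue the same query with the tag picked from a menu built from the vocabulary.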

5 Use Restricted Vocabularies
Ontologies or ‘dictionaries’ define the namespaces for the metadata in the system: general metadata, usage tag names, and usage-specific metadata. These dictionaries provide the terms used to build user interfaces for searching and browsing. Fixed vocabularies reduce user confusion between similar terms.
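A restricted vocabulary also lets an interface reject or correct near-miss terms instead of silently accepting a new one. The sketch below assumes a small, hypothetical usage-tag dictionary taken from the examples on these slides and uses the standard library's `difflib.get_close_matches` to suggest the canonical term; the real ontology and matching policy are not specified here.

```python
import difflib
from typing import Sequence

# Hypothetical excerpt of a usage-tag dictionary; the real ontology is
# defined by the project, not here.
USAGE_TAGS = ["Time-Series", "Profile", "Image", "Image-Sequence"]


def normalize_tag(term: str, vocabulary: Sequence[str] = USAGE_TAGS) -> str:
    """Return the canonical spelling of `term`, or raise with suggestions."""
    # An exact match, ignoring case, maps to the canonical spelling.
    lookup = {t.lower(): t for t in vocabulary}
    if term.lower() in lookup:
        return lookup[term.lower()]
    # Otherwise refuse the term and offer the closest dictionary entries.
    suggestions = difflib.get_close_matches(term, vocabulary, n=3, cutoff=0.6)
    raise ValueError(f"unknown term {term!r}; did you mean {suggestions}?")
```

For example, `normalize_tag("time-series")` returns `"Time-Series"`, while `normalize_tag("Timeseries")` raises a `ValueError` whose message suggests `'Time-Series'`.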

6 Catalog Entries Coupled to Datastore
The data catalog allows users to locate and understand recorded data from experiments. The URI details how to access the data described. URI (Uniform Resource Identifier): a character string that specifies how to retrieve the data. The data catalog and the data store are only loosely coupled. Examples: mdsplus://server/tree/shot/path-in-shot, hdfstore://myfile.h5::/data/path*
*from
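Because the catalog stores only a URI string, a client needs a small parser to decide which data store to contact. The following sketch assumes the two URI shapes on this slide, with a numeric shot number in the `shot` position and `::` separating an HDF5 file name from the dataset path. The `DataLocation` type and `parse_data_uri` function are hypothetical names, not part of MDSplus or any HDF5 library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataLocation:
    scheme: str            # which data store, e.g. "mdsplus" or "hdfstore"
    server: Optional[str]  # data server host (None for a local file)
    container: str         # MDSplus tree name or HDF5 file name
    shot: Optional[int]    # shot number (MDSplus only)
    path: str              # node path or dataset path inside the container


def parse_data_uri(uri: str) -> DataLocation:
    """Split a catalog URI into its parts (assumed grammar, see lead-in)."""
    scheme, sep, rest = uri.partition("://")
    if not sep:
        raise ValueError(f"not a URI: {uri!r}")
    if scheme == "mdsplus":
        # mdsplus://server/tree/shot/path-in-shot; maxsplit=3 keeps any '/'
        # that appears inside the node path.
        parts = rest.split("/", 3)
        if len(parts) != 4:
            raise ValueError(f"expected server/tree/shot/path: {uri!r}")
        server, tree, shot, path = parts
        return DataLocation(scheme, server, tree, int(shot), path)
    if scheme == "hdfstore":
        # hdfstore://file.h5::/dataset/path
        filename, sep, path = rest.partition("::")
        if not sep:
            raise ValueError(f"expected file::path: {uri!r}")
        return DataLocation(scheme, None, filename, None, path)
    raise ValueError(f"unsupported scheme {scheme!r}")
```

An unknown scheme fails loudly rather than being guessed at. That keeps the coupling loose: supporting another store means adding a branch to the parser, not rewriting catalog entries.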

7 Measurement Details Catalog entries describe collections of recorded data. These collections can be heterogeneous. Measurements are composites of one or more traces. Who / What / When. Metadata tags: Name, Owners, Description, URL, Short Label, Long Label, Units, Geometry, View(s),... Usage tags: Time-Series, Profile, Image, Image-Sequence,...
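The measurement/trace relationship can be sketched as two record types. The field names follow the metadata tags listed above, but the classes themselves are hypothetical. A measurement owns its descriptive metadata and usage tags, while each trace carries its own URI, so one measurement can draw on more than one data store.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Trace:
    """One retrievable quantity; its URI says where and how to fetch it."""
    name: str
    units: str
    uri: str


@dataclass
class Measurement:
    """A catalog entry: descriptive metadata plus one or more traces."""
    name: str
    owners: List[str]
    description: str
    usage_tags: Set[str]  # terms from the usage-tag vocabulary
    traces: List[Trace] = field(default_factory=list)

    def uris(self) -> List[str]:
        """Every data location a client must read to rebuild the measurement."""
        return [t.uri for t in self.traces]
```

A profile measurement whose density trace lives in MDSplus and whose temperature trace lives in an HDF5 file would, for instance, return both URIs from `uris()`, in trace order.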

8 Trace Details Traces have a similar list of metadata but refer to specific retrievable data. A URI specifies where and how to access the data. These URIs could refer to any persistent data store. They will likely contain references to MDSplus branches. The MDSplus branches could all be stored in a subtree constructed for the purpose.

9 Homogeneous MDSplus Branches
Regularity of the metadata in the database facilitates searching, browsing, and understanding. Regularity of the metadata in the MDSplus data store facilitates data-driven application development. Each trace referred to by the database will be represented in MDSplus as a node with associated metadata in the tree. There will be a mechanism to refer to required and optional metadata in the database, and a mechanism to refer to required and optional tag-specific metadata in the database. Metadata could be referenced as nodes under the referenced URI or by some other mechanism, and could be stored as values or expressions, as long as they are accessible as ‘properties’ of the trace node. This same scheme can (and should) be used throughout the experiment.
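Once trace properties are regular, the required/optional mechanism can be checked mechanically. In this sketch a trace node's properties are modelled as a plain Python dict rather than read through the MDSplus API, and the required property sets, both general and per usage tag, are hypothetical placeholders, since the slides name the mechanism but not its contents.

```python
from typing import Dict, Set

# Hypothetical requirements: properties every trace node must expose...
REQUIRED: Set[str] = {"name", "owners", "description", "units"}
# ...plus extra properties required by particular usage tags.
TAG_REQUIRED: Dict[str, Set[str]] = {
    "Time-Series": {"time_units"},
    "Profile": {"radial_coordinate"},
    "Image": {"width", "height"},
}


def missing_properties(properties: dict, usage_tags: Set[str]) -> Set[str]:
    """Return the required property names that a trace node lacks."""
    needed = set(REQUIRED)
    for tag in usage_tags:
        # Tags with no extra requirements contribute nothing.
        needed |= TAG_REQUIRED.get(tag, set())
    # Iterating a dict yields its keys, i.e. the properties present.
    return needed.difference(properties)
```

A check like this could run whenever a branch is written, so that every node the catalog points to is guaranteed to expose the properties that display applications rely on.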

10 Database and MDSplus Locations
Sites can host the database and corresponding MDSplus trees locally so that visitors and collaborators can search, browse and view their measurements. For consumers of data from sites not using MDSplus, the database and an MDSplus tree can be hosted on the consumer’s site. These ‘local’ trees can contain references to network APIs for the remote experiments, can contain data extracted from the remote experiment’s data store, and can be augmented with locally produced results.

11 Applications The combination of the catalog (database) and a homogeneous data representation (MDSplus) makes it easy to build data-driven, high-level display applications. The database can drive signal/image selection, and the data store (referenced by URIs in the database) provides the data to display.
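The division of labour described here, with the catalog selecting and the URIs retrieving, can be sketched as a reader registry keyed by URI scheme. Everything in the block is hypothetical: the catalog is a list of dicts standing in for database rows, and the two readers return placeholder numbers where real ones would call the MDSplus or HDF5 client libraries.

```python
from typing import Callable, Dict, List

Reader = Callable[[str], List[float]]

# Hypothetical registry mapping a URI scheme to the function that reads it.
READERS: Dict[str, Reader] = {}


def reader(scheme: str) -> Callable[[Reader], Reader]:
    """Decorator that registers a reader for one URI scheme."""
    def register(fn: Reader) -> Reader:
        READERS[scheme] = fn
        return fn
    return register


@reader("mdsplus")
def read_mdsplus(uri: str) -> List[float]:
    return [0.0, 1.0, 2.0]  # placeholder for a real MDSplus fetch


@reader("hdfstore")
def read_hdf(uri: str) -> List[float]:
    return [3.0, 4.0]  # placeholder for a real HDF5 read


def fetch(uri: str) -> List[float]:
    """Dispatch on the URI scheme to the registered reader."""
    scheme = uri.partition("://")[0]
    fn = READERS.get(scheme)
    if fn is None:
        raise ValueError(f"no reader for scheme {scheme!r}")
    return fn(uri)


def panels_for(catalog: List[dict], tag: str) -> Dict[str, List[float]]:
    """Select catalog entries by usage tag, then load each one via its URI."""
    return {e["name"]: fetch(e["uri"]) for e in catalog if tag in e["tags"]}
```

With this structure a display application needs no per-diagnostic code: choosing the usage tag `Time-Series` selects the entries, and each entry's URI determines which reader loads its data.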

12 MIT’s VIDEO_DISPLAY

13 General Atomics’ ReviewPlus

14 πScope The πScope main screen is a development environment equipped with Shell, Editor, Data Browser and Debugger. Screenshot of MDSplus data scope.

15 Implementation This project is currently in a design phase.
The database will likely be implemented using Django’s ORM, which abstracts the underlying database as a set of Python classes. The GUI for searching and browsing will likely also be implemented using Django. There will be a RESTful API to the database, implemented using Django REST Framework.

16 Related Project One of the key constituents of the Metadata Provenance Ontology Project (MPO) [See presentation Abla] is ‘data objects’. These data objects are references to persistent data: files, database records, records in files, or MDSplus. Eventually, the URIs that refer to results from experiments should be data catalog entries.

17 Conclusions A browsable, searchable data catalog is being created.
It will provide an index for homogeneous branches stored in MDSplus. New users and visitors will be able to use this to find, access and understand the recorded measurements from an experiment. This also applies to local users of diagnostics. This tool can be used to homogenize the APIs from remote experiments for users working as collaborators.

18 Thank you... Many of the ideas for this project come from:
Matt Reinke (University of York); the JET JPF/PPF system; the ITER data archive assessment (2011), provided by (Tessella, Nakanshi) under contract for ITER; and local users at Alcator C-Mod. Questions?

