Meta Data
Larry, Stirling: metadata on data access – data types, domain metadata discovery
Scott, Ohio State: caBIG metadata-driven architecture, semantic metadata
Alexander, Vienna: metadata tailored to optimise data integration – do we need one data source or another in integration, performance capability
Elias: OGSA-DAI – any metadata aspects
Leena: schema integration – add semantics to schema elements to aid finding elements and integrating them – for matching
Mario: OGSA-DAI – extracting metadata, 3rd party registries for OGSA-DAI – capabilities? OGSA-DAI does not build registries but uses Globus registries
Jessie: domain semantics in metadata, discovery, integration

Meta Data
For What?
–Discovery
–Data Access
–Data Integration
–Optimisation
–Service composition?
What?
–Ontologies – most conceptual
–Schema - data types
–Content -
–Capability

MetaData for Discovery & Integration
caBIG - Scott
Content
–Structure (XML schema)
–Semantics (relationship to ontology)
Data Registry
–Describe data model (UML), no constraints
–Review – curated semantic ontology (EVS, UMLS – proprietary)
–Bind UML model: map to ontology
–ISO standard metadata repositories – class, attribute, value domain, semantic concept
–If data type doesn't exist – add concept to ontology
–UML objects -> XML (GME, data type in XML schemas)
Issues
–Hard to get new types into ontology: review process – restrictive to certain users
–What aspects of this could be relaxed without collapsing the whole system? e.g. remove requirement for centralised ontology
–Currently the ontology resides external to the registry and the registry doesn't understand the ontology. What added functionality would you get if you added semantics to the registry?

MetaData for Optimisation
SemDIG (GRIDMiner) - Alex
Lots of data sources – want to choose the right one for data integration based on the data (does it contribute to the answer?)
–If several sources contribute, which one(s) would be the best?
Technical information required, not semantics as such
–Information on distribution, ranges etc. (summary data)
Defining a common metadata set for data sources
Solved –
–the metadata to be collected (data statistics),
–the collection of the data
Questions:
–Can we uniformly present histograms and data required? PMML? Predictive Model Markup Language (XML schema for describing decision trees, dictionaries etc.)
–What is the architecture for presenting it? Can OGSA-DAI, DataCutter etc. help?
–Possibly integrate ideas in OGSA-DAI
–Requires background threads

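The idea of choosing sources from summary data (ranges and histograms) can be illustrated with a minimal Python sketch. It is not taken from SemDIG or GRIDMiner: the `SourceSummary` record, the equi-width histogram layout, and the uniform-within-bucket assumption are all hypothetical choices made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SourceSummary:
    """Hypothetical summary metadata a source might publish for one attribute."""
    name: str
    minimum: float
    maximum: float
    bucket_counts: list[int]  # equi-width histogram over [minimum, maximum]


def estimate_matches(summary: SourceSummary, low: float, high: float) -> float:
    """Estimate rows with a value in [low, high].

    Assumes values are spread uniformly inside each histogram bucket.
    """
    if high < summary.minimum or low > summary.maximum:
        return 0.0  # ranges do not overlap: the source cannot contribute
    width = (summary.maximum - summary.minimum) / len(summary.bucket_counts)
    if width == 0:
        return float(sum(summary.bucket_counts))
    total = 0.0
    for i, count in enumerate(summary.bucket_counts):
        bucket_low = summary.minimum + i * width
        bucket_high = bucket_low + width
        overlap = min(high, bucket_high) - max(low, bucket_low)
        if overlap > 0:
            total += count * overlap / width
    return total


def rank_sources(summaries: list[SourceSummary], low: float, high: float) -> list[str]:
    """Return names of contributing sources, best estimated contributor first."""
    scored = [(estimate_matches(s, low, high), s.name) for s in summaries]
    return [name for score, name in sorted(scored, reverse=True) if score > 0]
```

A range check alone answers "does it contribute to the answer?"; the histogram is what lets several contributing sources be ordered against each other.
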
MetaData for Discovery
GEODE: (Larry)
Has syntactic discovery based on DB schema
–Supported by OGSA-DAI: really data access but not discovery
–Doesn't tell the users what the data that is being exposed is about; good if you know what you want
Use OGSA-DAI to expose the semantics - for domain specific discovery
–Is this possible or sensible? (Japanese team - RDF + OMII UK team - GRIMOIRES)
How?
–Representation mechanism
  Mark up of the semantics of the domain in whatever the external ontology uses
–Storage of the semantics for discovery
  Need a metadata registry: we'll record terms bound to the local data structure, which point to the external ontology
  If multiple ontologies used – need to refer to the specific ontology used
–Reason about the concepts for discovery - external
  Requires a reasoning engine
  Return possible concepts for checking in the data registry
Issues
–Creation/association of data with the ontology terms (similar to schema mapping)
–How can OGSA-DAI help – using the registry? Are there plans for OGSA-DAI to support 1 schema (to insert semantics within the existing provision)? How do you tie the term to the data item?
–(Similar to AutoMed schema to RDFS ontology mapping)
–Look at WSDL-S (W3C) for inserting refs in a document that point to an external ontology.

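The "How?" steps (a registry that binds ontology terms to local data structures, plus an external reasoning step that returns candidate concepts to check in the registry) might look roughly like the Python sketch below. Nothing here is an OGSA-DAI or GEODE API: `MetadataRegistry`, `expand_concept`, and `discover` are hypothetical names, and the "reasoner" is only a stand-in that follows subclass links.

```python
from collections import defaultdict


class MetadataRegistry:
    """Hypothetical registry: (ontology, term) -> local data items bound to it."""

    def __init__(self):
        self._bindings = defaultdict(set)

    def bind(self, ontology, term, data_item):
        # The ontology identifier is part of the key, so the same term name
        # from two different ontologies is never confused.
        self._bindings[(ontology, term)].add(data_item)

    def lookup(self, ontology, term):
        return sorted(self._bindings.get((ontology, term), set()))


def expand_concept(term, subclass_of):
    """Stand-in for an external reasoner.

    `subclass_of` maps each child term to its parent; the result is the term
    itself plus every transitively narrower term.
    """
    result = {term}
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for child, parent in subclass_of.items():
            if parent == current and child not in result:
                result.add(child)
                frontier.append(child)
    return result


def discover(registry, ontology, term, subclass_of):
    """Return data items bound to the term or to any narrower concept."""
    items = set()
    for concept in expand_concept(term, subclass_of):
        items.update(registry.lookup(ontology, concept))
    return sorted(items)
```

Keeping the reasoning outside the registry mirrors the slide: the registry only stores bindings, and a separate component proposes the concepts to look up.
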
MetaData for Access
Dave
Maintaining metadata separately from data
–e.g. a metadata catalogue linked to files
Issue
–How can we maintain the metadata?
  Have service controlling both
  Notification system

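One way to read "have service controlling both" together with "notification system" is a single service through which every change to the data also updates the catalogue and informs subscribers. The Python sketch below is only an illustration under that reading: `ManagedStore` and its methods are hypothetical, and in-memory dictionaries stand in for the files and the metadata catalogue.

```python
class ManagedStore:
    """Hypothetical service that owns both the data and its metadata catalogue."""

    def __init__(self):
        self._files = {}       # path -> bytes (stand-in for real files)
        self._catalogue = {}   # path -> metadata dict
        self._listeners = []   # callbacks invoked as callback(event, path)

    def subscribe(self, callback):
        self._listeners.append(callback)

    def put(self, path, content, **metadata):
        # Data and metadata change in the same call, so they cannot drift apart.
        self._files[path] = content
        self._catalogue[path] = {"size": len(content), **metadata}
        self._notify("put", path)

    def delete(self, path):
        del self._files[path]
        del self._catalogue[path]
        self._notify("delete", path)

    def describe(self, path):
        # Return a copy so callers cannot edit the catalogue behind the service.
        return dict(self._catalogue[path])

    def _notify(self, event, path):
        for callback in self._listeners:
            callback(event, path)
```

If the data can also be changed outside such a service, the notification path alone would have to carry the update, which is the harder case the slide raises.
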
Generation of MetaData & Ontologies
All preceding technologies rely on good metadata & ontologies
–How do we encourage users/communities to create metadata or ontologies
–Know they are correct
–What are the limits - performance etc.
Service infrastructure for representing controlled vocabularies
–LEXGRID
–API for graph operations etc.
Point at which size of ontologies makes reasoning too expensive
–Survey of scalability techniques for reasoning with ontologies
  What they are doing about it…