Presentation is loading. Please wait.

Presentation is loading. Please wait.

How to integrate data Barry Smith. The problem: many, many silos DoD spends more than $6B annually developing a portfolio of more than 2,000 business.

Similar presentations


Presentation on theme: "How to integrate data Barry Smith. The problem: many, many silos DoD spends more than $6B annually developing a portfolio of more than 2,000 business."— Presentation transcript:

1 How to integrate data Barry Smith

2 The problem: many, many silos DoD spends more than $6B annually developing a portfolio of more than 2,000 business systems and Web services these systems are poorly integrated deliver redundant capabilities, make data hard to access, foster error and waste prevent secondary uses of data https://ditpr.dod.mil/https://ditpr.dod.mil/ Based on FY11 Defense Information Technology Repository (DITPR) data 2

3 what is missing here 3

4 Syntactic and semantic interoperability Syntactic interoperability = systems can exchange messages (realized by XML). Semantic interoperability = messages are interpreted in the same way by senders and receivers. When meanings are specified via natural- language strings, experience shows that this is not a viable route to achieving semantic interoperability. 4

5 Instance data vs. data about types instances: Bill Clinton Bill Clinton’s dog the planet Earth types: human being dog plant

6 DoD Enterprise Ontology http://www.youtube.com/watch?v=OzW3Gc_yA9A Dennis Wisnosky, Chief Architect & Chief Technical Officer, Business Mission Area, Office of the Deputy Chief Management Officer, US Department of Defense

7 Instance data vs. data about types instances: Iraq Basra Abu Ghraib types: country city prison

8 compare: legends for maps maps vs. legends for maps 8

9 compare: legends for maps common legends allow (cross-border) integration 9

10 The Gene Ontology MouseEcotope GlyProt DiabetInGene GluChem sphingolipid transporter activity 10

11 The Gene Ontology MouseEcotope GlyProt DiabetInGene GluChem Holliday junction helicase complex 11

12 The Gene Ontology MouseEcotope GlyProt DiabetInGene GluChem sphingolipid transporter activity 12

13 Common legends help human beings use and understand complex representations of reality help human beings create useful complex representations of reality help computers process complex representations of reality help glue data together But common legends serve these purposes only if the legends are developed in a coordinated, non-redundant fashion 13

14 International System of Units 14

15 RELATION TO TIME GRANULARITY CONTINUANTOCCURRENT INDEPENDENTDEPENDENT ORGAN AND ORGANISM Organism (NCBI Taxonomy) Anatomical Entity (FMA, CARO) Organ Function (FMP, CPRO) Phenotypic Quality (PaTO) Biological Process (GO) CELL AND CELLULAR COMPONENT Cell (CL) Cellular Component (FMA, GO) Cellular Function (GO) MOLECULE Molecule (ChEBI, SO, RnaO, PrO) Molecular Function (GO) Molecular Process (GO) The Open Biomedical Ontologies (OBO) Foundry 15

16 CONTINUANTOCCURRENT INDEPENDENTDEPENDENT ORGAN AND ORGANISM Organism (NCBI Taxonomy) Anatomical Entity (FMA, CARO) Organ Function (FMP, CPRO) Phenotypic Quality (PaTO) Organism-Level Process (GO) CELL AND CELLULAR COMPONENT Cell (CL) Cellular Component (FMA, GO) Cellular Function (GO) Cellular Process (GO) MOLECULE Molecule (ChEBI, SO, RNAO, PRO) Molecular Function (GO) Molecular Process (GO) rationale of OBO Foundry coverage GRANULARITY RELATION TO TIME 16

17 RELATION TO TIME GRANULARITY CONTINUANTOCCURRENT INDEPENDENTDEPENDENT COMPLEX OF ORGANISMS Family, Community, Deme, Population Organ Function (FMP, CPRO) Population Phenotype Population Process ORGAN AND ORGANISM Organism (NCBI Taxonomy) Anatomical Entity (FMA, CARO) Phenotypic Quality (PaTO) Biological Process (GO) CELL AND CELLULAR COMPONENT Cell (CL) Cellular Component (FMA, GO) Cellular Function (GO) MOLECULE Molecule (ChEBI, SO, RnaO, PrO) Molecular Function (GO) Molecular Process (GO) Population-level ontologies 17

18 RELATION TO TIME GRANULARITY CONTINUANTOCCURRENT INDEPENDENTDEPENDENT ORGAN AND ORGANISM Organism (NCBI Taxonomy) Anatomical Entity (FMA, CARO) Organ Function (FMP, CPRO) Phenotypic Quality (PaTO) Biological Process (GO) CELL AND CELLULAR COMPONENT Cell (CL) Cellular Component (FMA, GO) Cellular Function (GO) MOLECULE Molecule (ChEBI, SO, RnaO, PrO) Molecular Function (GO) Molecular Process (GO) Environment Ontology environments 18

19 19 RELATION TO TIME GRANULARITY CONTINUANTOCCURRENT INDEPENDENTDEPENDENT COMPLEX OF ORGANISMS Family, Community, Deme, Population Organ Function (FMP, CPRO) Population Phenotype Population Process ORGAN AND ORGANISM Organism (NCBI Taxonomy) (FMA, CARO) Phenotypic Quality (PaTO) Biological Process (GO) CELL AND CELLULAR COMPONENT Cell (CL) Cell Com- ponent (FMA, GO) Cellular Function (GO) MOLECULE Molecule (ChEBI, SO, RnaO, PrO) Molecular Function (GO) Molecular Process (GO) http://obofoundry.org E N V I R O N M E N T

20 OBO Foundry approach being applied in the following biology domains 20 NIF StandardNeuroscience Information Framework ISF OntologiesIntegrated Semantic Framework OGMS and ExtensionsOntology for General Medical Science IDO ConsortiumInfectious Disease Ontology cROPCommon Reference Ontologies for Plants

21 What do ontology annotations do? make data retrievable even by those not involved in their creation allow integration of data deriving from heterogeneous sources break down the walls of roach motels 21

22 Applying the annotations approach to military data via Semantic Enhancement data remain in their original state (is treated at ‘arms length’) ‘tagged’ using interoperable ontologies created in coordinated way allows flexible response to new needs, adjustable in real time can be as complete as needed, lossless, long-lasting because flexible and responsive big bang for buck – measurable benefit even from first small investments The strategy works only to the degree that it rests on shared governance and training 22

23 Benefits of the Approach Does not interfere with the source content Enables content to evolve in a cumulative fashion as it accommodates new kinds of data Does not depend on the data resources and can be developed independently from them in an incremental and distributed fashion Provides a more consistent, homogeneous, and well- articulated presentation of the content which originates in multiple internally inconsistent and heterogeneous systems 23

24 Benefits of the Approach Makes management and exploitation of the content more cost-effective Allows graceful integration with other government initiatives and brings the system closer to the federally mandated net-centric data strategy Creates incrementally an integrated content that is effectively searchable and that provides content to which more powerful analytics can be applied 24

25 Building the Shared Semantic Resource Methodology of distributed incremental development Training Governance Common Architecture of Ontologies to support consistency, non-redundancy, modularity  Upper Level Ontology (BFO)  Mid-Level Ontologies  Low Level Ontologies 25

26

27

28 Goal: To realize Horizontal Integration(HI) of intelligence data HI =Def. the ability to exploit multiple data sources as if they are one  Problem: the data coming onstream are out of our control  Any strategy for HI must be agile in the sense that it can be quickly extended to new zones of emerging data according to need 28

29 Army Intelligence and Information Warfare Directorate (I2WD) Create an agile strategy for building ontologies within a Shared Semantic Resource (SSR) and apply and extend these ontologies to annotate new source data as they come onstream Problem: Given the immense and growing variety of data sources, the development methodology must be applied by multiple different groups: How to manage collaboration? 29

30 Why do large-scale ontology projects fail? focus on vocabularies, lexicons, with no logical structure, no attention to life cycle failure of housekeeping yields redundancy and therefore forking the same data is annotated in different ways by users of different ontology fragments data is siloed as before HOW TO BUILD THE NEEDED LOGIC INTO THE ARCHITECTURE OF THE ONTOLOGIES? 30

31 MeSH (Medical Subject Headings) MeSH Descriptors Anthropology, Education, Sociology and Social Phenomena Social Sciences Political Systems National Socialism National Socialism is_a Political Systems National Socialism is_a Anthropology...

32 Examples of Principles All terms in all ontologies should be singular nouns Same relations between terms should be reused in every ontology Reference ontologies should be based on single inheritance All definitions should be of the form an S = Def. a G which Ds where ‘G’ (for: genus) is the parent term of S (for: species) in the corresponding reference ontology

33 Anatomy Ontology (FMA*, CARO) Environment Ontology (EnvO) Infectious Disease Ontology (IDO*) Biological Process Ontology (GO*) Cell Ontology (CL) Cellular Component Ontology (FMA*, GO*) Phenotypic Quality Ontology (PaTO) Subcellular Anatomy Ontology (SAO) Sequence Ontology (SO*) Molecular Function (GO*) Protein Ontology (PRO*) Extension Strategy + Modular Organization 33 top level mid-level domain level Information Artifact Ontology (IAO) Ontology for Biomedical Investigations (OBI) Spatial Ontology (BSPO) Basic Formal Ontology (BFO)

34 Ontologies are built as orthogonal modules which form an incrementally evolving network developers and SMEs are motivated to commit to developing ontologies because they will need in their own work ontologies that fit into this network users are motivated by the assurance that the ontologies they turn to are maintained by experts 34

35 More benefits of orthogonality helps those new to ontology to find what they need to find models of good practice ensures mutual consistency of ontologies (trivially) and thereby ensures additivity of annotations 35

36 More benefits of orthogonality No need to reinvent the wheel for each new domain Can profit from storehouse of lessons learned Can more easily reuse what is made by others Can more easily reuse training Can more easily inspect and criticize results of others’ work Leads to innovations (e.g. Mireot, Ontofox) in strategies for combining ontologies 36


Download ppt "How to integrate data Barry Smith. The problem: many, many silos DoD spends more than $6B annually developing a portfolio of more than 2,000 business."

Similar presentations


Ads by Google