Stanford University, CSD


Scalable Knowledge Composition (SKC)
September 2001
Gio Wiederhold, Shrish Agarwal, Stefan Decker, Jan Jannink, Prasenjit Mitra, et al.
Stanford University, CSD
12/7/2018 SKC Synopsis

Data + Knowledge → Information
  - Apply relevant Knowledge to relevant Data
  - Diagram labels: Analyses; Composition of source information (SKC focus); Aggregation of instances; Selection; Observations; Quality Filters
  - to obtain: Information for decision-making

Many sources, disciplines, people
  - Extraction of actionable information, so that future benefits can accrue, requires broad-based knowledge
  - Areas make deep progress in isolation
  - Benefits are possible when
      - solid results are available
      - the results are Integrated or Composed
  - Broad base leads to heterogeneity and inconsistency of terminologies

Language differences inhibit integration
  - An essential feature of science
      - autonomy of fields
      - differing granularity and scope of focus
      - growth of fields requires new terms
  - A feature of technological process
      - standards require stability
      - yesterday’s innovations are today’s infrastructure
      - today’s innovations are tomorrow’s infrastructure
  - Must be dealt with explicitly
      - sharing, integration, and aggregation are essential
      - large quantities of data require precision

Semantic Mismatches
  - Autonomous sources in all domains have differing viewpoints (by source)
      - differing terms for similar items: { lorry, truck }
      - same terms for dissimilar items: trunk (luggage, car)
      - differing coverage: vehicles (DMV, police, AIA)
      - differing granularity: trucks (shipper, manuf.)
      - different scope: student (museum fee, Stanford)
      - different hierarchical structures: supplier vs. usage
  - Hinders use of information from disjoint sources
      - missed linkages: loss of information, opportunities
      - irrelevant linkages: overload on user or application program
  - Poor precision when merged: OK for web browsing, poor for business & science

Heterogeneity among Domains is natural
  - Interoperation creates mismatch
  - Autonomy conflicts with consistency
      - Local Needs have Priority
      - Outside uses are a Byproduct
  - Heterogeneity must be addressed
      - Platform and Operating Systems
      - Data Representation and Access Conventions
      - Metadata: Naming and Ontology
  - needed to share data from distinct sources

Two Mismatch Solutions
  - A Single, Globally consistent Ontology (Your Hope)
      - wonderful for users and their programs
      - too many interacting sources
      - long time to achieve: 2 sources (UAL, LH), 3 (+ trucks), 4, … all?
      - costly maintenance, since all sources evolve
      - no world-wide authority to dictate conformance
  - Domain-specific ontologies (XML DTD assumption)
      - Small, focused, cooperating groups
      - high quality; some examples: arthritis, Shakespeare plays
      - allows sharable, formal tools
      - ongoing, local maintenance affecting users: annual updates
      - poor interoperation; users still face inter-domain mismatches

Our approach (SKC project)
  1. Define Terminology in a domain precisely
      - Schemas, XML DTDs → Ontologies
  2. Develop methods to permit interoperation among differing domains (not integration)
      - Articulation: support the limited interoperation needed to solve problems in an application domain
      - Ontology Algebra: enable scalability to as many sources as are needed to support applications
  3. Develop tools to support the methods
      - Ontology matching

An Ontology Algebra
A knowledge-based algebra for ontologies
  - Intersection: create a subset ontology, keep sharable entries
  - Union: create a joint ontology, merge entries
  - Difference: create a distinct ontology, remove shared entries
The Articulation Ontology (AO) consists of matching rules that link domain ontologies
  - The glue that holds the bricks together
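The three operations can be sketched in a few lines of Python. This is only an illustration under strong simplifying assumptions, not the SKC implementation: an ontology is reduced to a set of qualified term names (relationships are ignored), a matching rule is a plain pair of terms, and the names `intersection`, `difference`, and `union` as well as the sample terms are chosen here for the example.

```python
# A matching rule links a term of ontology A to a term of ontology B.
Rule = tuple[str, str]

def intersection(a: set[str], b: set[str], rules: set[Rule]) -> set[Rule]:
    """Subset ontology: keep only the rules whose terms exist in both sources."""
    return {(x, y) for (x, y) in rules if x in a and y in b}

def difference(a: set[str], b: set[str], rules: set[Rule]) -> set[str]:
    """Distinct ontology: terms of A that no rule shares with B."""
    matched = {x for (x, _) in intersection(a, b, rules)}
    return a - matched

def union(a: set[str], b: set[str], rules: set[Rule]) -> set[frozenset[str]]:
    """Joint ontology: linked terms merged into one entry, all others kept."""
    links = intersection(a, b, rules)
    merged = {frozenset(pair) for pair in links}
    linked_terms = {t for pair in links for t in pair}
    singles = {frozenset([t]) for t in (a | b) - linked_terms}
    return merged | singles

# Hypothetical sample ontologies and rules, for illustration only.
store = {"store.size", "store.color", "store.customers"}
factory = {"factory.size", "factory.colcode", "factory.machinery"}
rules = {("store.size", "factory.size"), ("store.color", "factory.colcode")}

shared = intersection(store, factory, rules)   # the articulation
local = difference(store, factory, rules)      # under the store's control
joint = union(store, factory, rules)           # merged entries plus the rest
```

In this sketch the intersection is the set of rules itself, which mirrors the slide's point that the articulation ontology consists of matching rules rather than copied terms.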

Sample Operation: INTERSECTION
  - Source Domain 1: Owned and maintained by Store
  - Source Domain 2: Owned and maintained by Factory
  - Result contains shared terms
      - Articulates the two domains
      - Terms useful for purchasing

Sample Intersections
  - Shoe Factory: Material inventory { . . . }, Employees { . . . }, Machinery { . . . }, Processes { . . . }, Shoes { . . . }
  - Shoe Store: Shoes { . . . }, Customers { . . . }, Employees { . . . }, { . . . }
  - Articulation ontology matching rules: size = size; color = table(colcode); style = style
  - Other ontologies shown: Anatomy, Hardware, Department Store
  - Other links shown: foot = foot; Employees, Employees
  - Nail (toe, foot) vs. Nail (fastener)
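The three matching rules on this slide can be read as a small translation table. The sketch below is one possible reading, not the SKC rule language: it assumes that `colcode` belongs to the factory and `color` to the store (the slide does not say which side owns which term), and the color table entries, the sample record, and the function name `to_store_terms` are hypothetical.

```python
# Hypothetical lookup table; the slide only states color = table(colcode).
COLOR_TABLE = {"01": "black", "02": "brown"}

# store term -> (factory term, conversion from factory value to store value)
RULES = {
    "size": ("size", lambda v: v),
    "color": ("colcode", lambda code: COLOR_TABLE[code]),
    "style": ("style", lambda v: v),
}

def to_store_terms(factory_record: dict) -> dict:
    """Translate the articulated fields of a factory record into store terms.

    Fields without a matching rule (here: machine_id) stay local to the
    factory and are not carried over.
    """
    return {
        store_term: convert(factory_record[factory_term])
        for store_term, (factory_term, convert) in RULES.items()
        if factory_term in factory_record
    }

# Hypothetical factory record.
shoe = {"size": 42, "colcode": "02", "style": "loafer", "machine_id": "M7"}
translated = to_store_terms(shoe)
```

Only the articulated terms cross the boundary; everything else remains under the control of its own domain.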

Within a Domain, Terms have clear Meanings
  - a domain will contain many objects
  - the object configuration is consistent
  - within a domain all terms are consistent & relationships among objects are consistent
  - context is implicit
  - No committee is needed to forge compromises* within a domain
  * Compromises hide valuable details
  (Figure label: Domain Ontology)

SKC grounded definition
  - Ontology: a set of terms and their relationships
  - Term: a reference to real-world and abstract objects
  - Relationship: a named and typed set of links between objects
  - Reference: a label that names objects
  - Abstract object: a concept which refers to other objects
  - Real-world object: an entity instance with a physical manifestation (or its representation in a factual database)

Grounding enables implementation
  - We use many abstract terms in our work
      - Needed because we are dealing with many objects
      - Human thinking is limited to short-term memory
  - Someone must be able to translate them into code reliably
      - Each abstract term must have a path to reality
      - One must provide that path for students and coders
  - Without a clear path that is not possible
      - Not automatically at all – machines need specs
      - Not reliably by human programmers – failures occur
  - Without implementation there is no benefit
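The grounded definitions and the "path to reality" requirement map naturally onto a few data types. The following is a minimal sketch, assuming one interpretation: an abstract object counts as grounded if at least one chain of references ends at a real-world object. The class names, field names, and sample objects are invented for this example.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Obj:
    """An object named by a label (the Reference)."""
    label: str
    real_world: bool                    # True: entity instance; False: abstract concept
    refers_to: tuple["Obj", ...] = ()   # an abstract object refers to other objects

@dataclass(frozen=True)
class Relationship:
    """A named and typed set of links between objects."""
    name: str
    link_type: str
    links: frozenset[tuple[Obj, Obj]] = frozenset()

@dataclass
class Ontology:
    """A set of terms and their relationships."""
    terms: dict[str, Obj] = field(default_factory=dict)
    relationships: list[Relationship] = field(default_factory=list)

def is_grounded(obj: Obj) -> bool:
    """True if some chain of references ends at a real-world object."""
    if obj.real_world:
        return True
    return any(is_grounded(target) for target in obj.refers_to)

def ungrounded_terms(ontology: Ontology) -> list[str]:
    """Terms that a coder could not yet translate into an implementation."""
    return sorted(t for t, obj in ontology.terms.items() if not is_grounded(obj))

# Hypothetical sample objects.
shoe_17 = Obj("shoe #17", real_world=True)
footwear = Obj("footwear", real_world=False, refers_to=(shoe_17,))
quality = Obj("quality", real_world=False)   # no path to reality yet
demo = Ontology(terms={"shoe #17": shoe_17, "footwear": footwear, "quality": quality})
```

A check such as `ungrounded_terms(demo)` gives a concrete list of the abstract terms that still need a specification before they can be implemented.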

INTERSECTION support
  - Sources: Store Ontology, Factory Ontology
  - Articulation ontology: Matching rules that use terms from the 2 source domains
  - Terms useful for purchasing

Other Basic Operations
  - UNION: merging entire ontologies
  - DIFFERENCE: material fully under local control
  - Articulation ontology: typically prior intersections

Features of an algebra
  - The record of past operations can be kept and reused (experience: 3 months → 1 week for Webster's annual update, → 2 weeks for OED (6 x size) [Jannink:01])
  - Maintenance is enabled by using remote, deep domain expertise
      - rapid recomposition for application domain
  - Expect also that
      - Operations can be composed
      - Operations can be rearranged
      - Alternate arrangements can be evaluated
      - Optimization is enabled

Sample Processing in the DARPA HPKB challenge
Question: What is the most recent year an OPEC member nation was on the UN security council (SC)?
  - SKC resolves 3 Sources
      - CIA Factbook ‘96 (nation)
      - OPEC (members, dates)
      - UN (SC members, years)
  - SKC obtains the Correct Answer: 1996 (Indonesia)
  - Other groups obtained more, but factually wrong answers; they relied on one global source, the CIA factbook.
  - Problems resolved by SKC
      - Factbook, a secondary source, has out-of-date OPEC & UN SC lists: Indonesia not listed; Gabon (left OPEC 1994)
      - different country names: Gambia => The Gambia
      - historical country names: Yugoslavia
      - UN lists future security council members: Gabon 1999
      - needed ancillary data
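The shape of this query can be illustrated with a toy join across the two primary sources. The sketch below uses only the facts quoted on the slide, so the records are far from complete, and it makes two assumptions of its own: join years are omitted, and a nation counts as an OPEC member up to and including the year it left. All names in the code are hypothetical.

```python
# Different country names across sources (from the slide).
ALIASES = {"Gambia": "The Gambia"}

# OPEC source: nation -> year it left OPEC (None = no departure recorded).
opec_left = {"Indonesia": None, "Gabon": 1994}

# UN source: (nation, year on the Security Council), including future terms.
un_sc = [("Indonesia", 1996), ("Gabon", 1999)]

def canonical(name: str) -> str:
    """Map a source-specific country name to one agreed spelling."""
    return ALIASES.get(name, name)

def was_opec_member(nation: str, year: int) -> bool:
    """Check membership in the OPEC source for the given year."""
    if nation not in opec_left:
        return False
    left = opec_left[nation]
    return left is None or year <= left

def most_recent_opec_on_sc():
    """Join the UN and OPEC sources and return the latest (year, nation) hit."""
    hits = [
        (year, canonical(nation))
        for nation, year in un_sc
        if was_opec_member(canonical(nation), year)
    ]
    return max(hits) if hits else None

answer = most_recent_opec_on_sc()
```

With these records, Gabon's 1999 seat is rejected because the OPEC source shows it had already left, which is the kind of check a single secondary source cannot provide.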

Interoperation via Articulation
Process phases:
  - At application definition time
      - Match relevant ontologies where needed
      - Establish articulation rules among them
      - Record the process
  - At execution time
      - Perform query rewriting to get to sources
      - Optimize based on the ontology algebra
  - For maintenance
      - Regenerate rules using the stored formulation
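The three phases can be sketched as follows. This is a simplified illustration, not the SKC tooling: the "stored formulation" is assumed to be an ordered list of matching steps, query rewriting is reduced to renaming requested terms, value conversion and optimization are left out, and every function and term name is hypothetical.

```python
def exact_name(a_terms: set, b_terms: set) -> set:
    """Heuristic step: link terms spelled identically in both sources."""
    return {(t, t) for t in a_terms & b_terms}

def expert_rule(a_term: str, b_term: str):
    """Expert-supplied step: link one named pair if both terms still exist."""
    def step(a_terms: set, b_terms: set) -> set:
        if a_term in a_terms and b_term in b_terms:
            return {(a_term, b_term)}
        return set()
    return step

def generate_rules(a_terms: set, b_terms: set, steps: list) -> set:
    """Run the recorded steps in order and collect the articulation rules."""
    rules = set()
    for step in steps:
        rules |= step(a_terms, b_terms)
    return rules

def rewrite(requested: list, rules: set) -> list:
    """Execution time: rename requested terms from A's vocabulary to B's."""
    a_to_b = dict(rules)
    return [a_to_b[term] for term in requested]

# Definition time: match, establish rules, and record the process.
store = {"size", "color", "style"}
factory = {"size", "colcode", "style", "machinery"}
formulation = [exact_name, expert_rule("color", "colcode")]
rules = generate_rules(store, factory, formulation)

# Execution time: a store-side request is rewritten for the factory source.
factory_request = rewrite(["size", "color"], rules)

# Maintenance: both sources gain a term; replay the stored formulation.
new_rules = generate_rules(store | {"width"}, factory | {"width"}, formulation)
```

Because the formulation is stored as steps rather than as finished rules, an update to either source only requires replaying it.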

Generation of the articulation rules
  - Provide library of automatic match heuristics
      - Lexical Methods: spelling similarity (commonly used by others)
      - Structural Methods: relative graph position
      - Reasoning-based Methods
      - Nexus: a graph we derive from the OED / Webster's; links terms based on definitions, not lexical similarity
      - Hybrid Methods
  - Iteratively, with an expert in control
      - GUI tool to display matches and verify generated matches using the human expert
      - expert can also supply matching rules
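A lexical heuristic with an expert in the loop might look like the sketch below. It substitutes Python's standard `difflib.SequenceMatcher` for whatever spelling-similarity measure the project used, and the threshold, function names, and callback interface are assumptions made for this example.

```python
from difflib import SequenceMatcher

def lexical_candidates(a_terms, b_terms, threshold=0.8):
    """Propose (a, b, score) pairs whose spelling similarity meets the threshold."""
    candidates = []
    for a in sorted(a_terms):
        for b in sorted(b_terms):
            score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
            if score >= threshold:
                candidates.append((a, b, score))
    # Best matches first, so the expert reviews the likeliest pairs early.
    return sorted(candidates, key=lambda c: -c[2])

def articulate(a_terms, b_terms, expert_accepts, expert_rules=()):
    """Keep only the candidates the expert verifies, plus expert-supplied rules."""
    verified = {
        (a, b)
        for a, b, _ in lexical_candidates(a_terms, b_terms)
        if expert_accepts(a, b)
    }
    return verified | set(expert_rules)

# Store vs. factory: the expert accepts the proposals and adds one rule
# that spelling similarity alone does not find at this threshold.
store = {"size", "color", "style"}
factory = {"size", "colcode", "style"}
shoe_rules = articulate(store, factory, lambda a, b: True,
                        expert_rules=[("color", "colcode")])

# Anatomy vs. hardware: "nail" matches lexically, but the expert rejects it
# because Nail (toe, foot) and Nail (fastener) are different concepts.
nail_rules = articulate({"nail", "foot"}, {"nail"}, lambda a, b: False)
```

The second call shows why the slide keeps the expert in control: identical spelling is evidence for a match, not proof of one.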

Articulation Generator
Being built by Prasenjit Mitra
  - Inputs: Ont1, Ont2
  - Components shown: Driver; Thesaurus; Context-based Word Relator; Phrase Relator; Semantic Network (Nexus); Structural Matcher; Human Expert
  - Output: OntA

Principle of Knowledge Composition
  - Knowledge resources: A, B, C, D, E
  - Articulation knowledge: (A ∩ B), (B ∩ C), (C ∩ D), (C ∩ E)
  - Composed knowledge for applications using A, B, C, E: (A ∩ B) ∪ (B ∩ C) ∪ (C ∩ E)
  - Legend: ∪ : union; ∩ : intersection

Exploiting the result (future plans)
  - Avoid n² problem of interpreter mapping [Swartout HPKB year 1]
  - Result has links to source
  - Processing & query evaluation is best performed within Source Domains & by their engines

Support Domain Specialization
  - Knowledge Acquisition (20% effort) & Knowledge Maintenance (80% effort*) to be performed by
      - Domain specialists (SMEs)
      - Professional organizations
      - Field teams of modest size
  - Empowerment: autonomously maintainable
  * based on experience with software

Domain-specific Expertise
  - Knowledge needed is huge
  - Partition into natural domains
  - Determine domain responsibility and authority
  - Empower domain owners
  - Exploit domain-specific expertise
  - Provide computer-science tools
  - Consider interaction
  (Figure labels: Our Ontology; Society of specialists)

SKC Project Synopsis
  - Research Objective: Precise information for applications from heterogeneous, imperfect, scalably many data sources
  - Sources for Ontologies used currently:
      - General: CIA World Factbook ‘96, www.UN, www.OPEC, Webster’s Dictionary, Thesaurus, Oxford English Dictionary
      - Topical: NATO, BattleSpace Sensors, Logistics Servers
  - Theory:
      - Domain autonomy and exploitation
      - Rule-based algebra over ontologies
      - Translation & Composition primitives
  - Sponsors and collaboration: AFOSR; DARPA DAML program; W3C; Stanford KSL and SMI; Univ. of Karlsruhe, Germany; others.