Heterogeneous Information Management


1 Heterogeneous Information Management
Prepared for CERN seminar, June 2000. Gio Wiederhold, Stanford University. 11/19/2018 Gio - CERN

2 Abstract Information is created by applying knowledge (encoded as programs or rules) to collected data and messages received. Data and computation resources are provided by a variety of suppliers, public and private. The autonomy of the suppliers causes heterogeneity and inconsistencies. The number of potential suppliers and their autonomy also create information overload. To cope with these issues, novel intermediate services are needed, opening up new opportunities. Many traditional relationships between consumers and vendors will change. We will present the concepts and status of such services. Collaboration, security, and payment schemes are some of the considerations.

3 Outline
Background for mediated systems; motivation and functions needed; architecture; current status; resolving semantic heterogeneity; research directions; background; maintenance; research projects; integration of simulation information.

4 Evolution of mediation
[Figure: evolution of mediation in stages a to e, showing applications (A1 to A6), integrators (I1, I2), mediators (M1, M2), a network, wrappers (W1 to W3), and data sources (D1 to D6).]

5 Transforming Data to Information
[Figure: users at workstations; Application Layer; Mediation Layer (value-added services); Foundation Layer (data and simulation resources).]

6 Data and Knowledge
[Figure: a knowledge loop and a data loop, with labels including education, experience, abstraction, recording, storage, selection, integration, summarization, decision-making, action, and state changes.]
Information is created at the confluence of data (the state) and knowledge (the ability to select and project the state into the future).

7 Definition* A mediator is a software module that exploits encoded knowledge about certain sets or subsets of data to create information for a higher layer of applications. It should be small and simple, so that it can be maintained by one expert or, at most, a small and coherent group of experts. * Wiederhold: IEEE Computer March 1992

8 Information overload, data starvation
More databases, public and corporate; faster communication through digital packeting (TCP/IP, ATM); world-wide connectivity through the Internet, intranets, and the World Wide Web; disintermediation through ubiquitous publishing.

9 Change in Supply vs Demand
What information consumes is rather obvious, it consumes the attention of its recipients. Hence a wealth of information creates a poverty of attention, and a need to allocate that attention efficiently among the overabundance of information sources that might consume it. [Herbert Simon]

10 Function of Mediation
Apply domain-specific specialist knowledge to add value: to locate data sources, to convert for consistency, to integrate from diverse sources, to describe data for processing, to abstract for insight and models, to extrapolate to new situations, and to summarize for presentation. The result is INFORMATION.
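A minimal sketch of three of these functions (convert for consistency, integrate from diverse sources, summarize for presentation), assuming two hypothetical wrapped sources that report temperatures with different units and field names. The source functions, field names, and data are invented for illustration; a real mediator would also locate its sources and encode far more domain knowledge.

```python
from statistics import mean

# Hypothetical wrapped source A: Celsius, field "site"
def source_a():
    return [{"site": "north", "temp_c": 20.0}, {"site": "south", "temp_c": 24.0}]

# Hypothetical wrapped source B: Fahrenheit, field "location"
def source_b():
    return [{"location": "north", "temp_f": 71.6}]

def convert_a(rec):
    # Already in the mediator's conventions
    return {"site": rec["site"], "temp_c": rec["temp_c"]}

def convert_b(rec):
    # Convert for consistency: rename the field and change units
    return {"site": rec["location"], "temp_c": round((rec["temp_f"] - 32) * 5 / 9, 2)}

def mediate(sources):
    """Integrate converted records by site, then summarize one value per site."""
    by_site = {}
    for fetch, convert in sources:
        for rec in fetch():
            r = convert(rec)
            by_site.setdefault(r["site"], []).append(r["temp_c"])
    return {site: round(mean(vals), 2) for site, vals in sorted(by_site.items())}

summary = mediate([(source_a, convert_a), (source_b, convert_b)])
print(summary)  # {'north': 21.0, 'south': 24.0}
```

The conversion functions carry the source-specific knowledge, while `mediate` only sees records in one consistent form, which keeps each piece small enough for one maintainer.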

11 Interfaces
[Figure: interface layers labeled user interface and human-computer interaction; application-specific code; service interface; MEDIATION (domain-specific code); resource access; source-specific code; real world.]

12 Making data relevant
Data reduction and data abstraction through level changing: summarization; exception search; level change to integrate with other data sources. Follow the customer model: hierarchical, divide-and-conquer, a common paradigm.
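The level change, summarization, and exception-search steps can be sketched as follows, assuming hypothetical daily readings keyed by ISO date. The month as the coarser level and the 1.5 times threshold are arbitrary choices for illustration.

```python
from collections import defaultdict

# Hypothetical daily readings: (ISO date, value)
daily = [
    ("2000-05-30", 10.0), ("2000-05-31", 12.0),
    ("2000-06-01", 11.0), ("2000-06-02", 40.0), ("2000-06-03", 9.0),
]

def level_change(readings):
    """Abstract daily readings to the month level: mean value per month."""
    buckets = defaultdict(list)
    for day, value in readings:
        buckets[day[:7]].append(value)  # "YYYY-MM" is the coarser level
    return {month: sum(v) / len(v) for month, v in sorted(buckets.items())}

def exceptions(readings, monthly, factor=1.5):
    """Exception search: days whose value exceeds factor times their month's mean."""
    return [day for day, value in readings if value > factor * monthly[day[:7]]]

monthly = level_change(daily)
print(monthly)                     # {'2000-05': 11.0, '2000-06': 20.0}
print(exceptions(daily, monthly))  # ['2000-06-02']
```

The summary reduces the data a customer must read, and the exception search keeps the one reading the summary would otherwise hide.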

13 Functions inside Mediation
[Figure: selection, summarization, transformation, and integration (articulation) inside the mediator, drawing on heterogeneous resources.]

14 Status of Mediation Technology
Today: handcrafted. An expert consults with a programmer; the programmer codes the knowledge needed; resource changes require advice and a program update. Future: generated from models. The domain expert maintains the models; the specification determines the functions; resource changes trigger regeneration.

15 Coverage of Current DARPA I3 Efforts
[Table: coverage rated as good progress, active research, related work, or poor coverage, across databases, web, text, and simulation, for: facilitation (auto linking); maintenance (rule technology?); discovery (web, schema searching); wrapping (syntactic heterogeneity); integration over sources; abstraction for relevance to the customer; mediators for multiple domains; caching and history; security for cooperation.]

16 Mediator Design Principle
Transform data into information: match the customer model (hierarchical) to the resource model (general network), and maintain the models.

17 Heterogeneity among Domains
If interoperation involves distinct domains, mismatch ensues. Autonomy conflicts with consistency: local needs have priority, and outside uses are a byproduct. Heterogeneity must be addressed at the levels of platform and operating systems, representation and access conventions, and naming and ontology.

18 Unsolved problem in Interoperation
Common assumption in assembling and integrating distributed information resources: the language used by the resources is the same, or the sublanguages used by the resources are subsets of a globally consistent language. This assumption is provably false. Working towards the goal of global consistency is 1. naïve: the goal cannot be achieved; 2. inefficient: languages are efficient in local contexts.

19 Ontology: components
We represent the contents and structure of a language by its ontology: a set of well-defined terms, which delimit the domain of discourse; relationships among those terms, chosen from a limited set; a formalizable subset of expert knowledge.

20 SKC’s grounded definition
Ontology: a set of terms and their relationships. Term: a reference to real-world and abstract objects. Relationship: a named and typed set of links between objects. Reference: a label that names objects. Real-world object: an entity instance with a physical manifestation. Abstract object: a concept which refers to other objects.
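One possible encoding of this definition, sketched under assumptions: each term carries a label (its reference) and a flag separating abstract objects from real-world objects, and each relationship is a named set of links whose names come from a small fixed set. The relationship names, the example terms, and the type rule for `instance-of` are hypothetical choices, not part of SKC.

```python
from dataclasses import dataclass, field

# Assumed limited set of relationship names
RELATIONSHIP_NAMES = {"subclass-of", "part-of", "instance-of"}

@dataclass(frozen=True)
class Term:
    label: str      # the reference: a label that names objects
    abstract: bool  # True for a concept, False for a real-world object

@dataclass
class Ontology:
    name: str
    terms: dict = field(default_factory=dict)  # label -> Term
    links: set = field(default_factory=set)    # (relationship, from_label, to_label)

    def add_term(self, label, abstract=True):
        self.terms[label] = Term(label, abstract)

    def relate(self, relationship, a, b):
        if relationship not in RELATIONSHIP_NAMES:
            raise ValueError(f"unknown relationship: {relationship}")
        src, dst = self.terms[a], self.terms[b]  # KeyError if a term is undefined
        # Hypothetical type rule: instance-of links a real-world object to a concept
        if relationship == "instance-of" and (src.abstract or not dst.abstract):
            raise TypeError("instance-of must link a real-world object to a concept")
        self.links.add((relationship, a, b))

    def related(self, relationship, label):
        return sorted(b for r, a, b in self.links if r == relationship and a == label)

store = Ontology("ShoeStore")
for label in ("Footwear", "Shoe", "Heel"):
    store.add_term(label)
store.add_term("stock item 17", abstract=False)  # hypothetical physical instance
store.relate("subclass-of", "Shoe", "Footwear")
store.relate("part-of", "Heel", "Shoe")
store.relate("instance-of", "stock item 17", "Shoe")
print(store.related("subclass-of", "Shoe"))  # ['Footwear']
```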

21 Where are Ontologies found?
Ontologies allow communication among partners in enterprises (rarely in machine-readable form); relationships determine meaning (parent, school, company). Variable and class names in software. Databases use ontologies during design in their E-R diagrams (implicitly) and to represent the leaf nodes in their schemas. Knowledge bases use term ontologies (often explicitly), and add class definitions (to hold instances), constraints, and operations among the terms.

22 Establishing Ontologies
Top-down: commonly acceptable UPPER layers; domain-specific analysis and sharing tools; model- and object-type based. Bottom-up: wordlist creation from task-specific collections; database models, schemas, and contents.

23 Large Ontologies: good or bad?
Having all the knowledge together is simple for customers of KBs but hard for owners of KBs, who must synchronize with many others; in the limit, everybody must be globally consistent. A large KB will cover multiple or all domains: it is created by a committee (slow) and maintained by a committee (costly). Differences in level of abstraction matter for efficiency: a homeowner says nail; a carpenter distinguishes sinker, brad, boxnail, . . .

24 Domain ontology assumption
A domain will contain known objects. The object configuration is consistent within a domain: all terms are consistent and relationships among objects are consistent. Context is implicit in use; explicit context is needed for external use. Domain ontology: no committee is needed to forge compromises* within a domain. * Compromises hide valuable details.

25 SKC Objective: Provide for Maintainable Ontologies
Devolve maintenance onto many domain-specific experts and authorities; provide an algebra to compute composed ontologies that are limited to their articulation terms; enable interpretation within the source contexts.

26 Conservative assumption!
When dealing with multiple ontologies, one can never be sure that identically or similarly spelled words mean the same thing, i.e., refer to exactly the same set of real-world objects under all current and future conditions. Common, optimistic assumption: meaning is identical. This gets worse when terms are stemmed. SKC's conservative, or pessimistic, assumption: meaning never matches unless there is a match rule. The number of matching rules is reduced by focusing on the articulation.
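The difference between the two assumptions shows up on two small term sets. The terms loosely follow the shoe store and factory example of slide 30; the sets and the rule list are hypothetical, and the pair `color`/`colcode` stands in for the table-lookup rule color = table(colcode).

```python
# Hypothetical term sets from two domain ontologies
store = {"size", "color", "style", "employees"}
factory = {"size", "colcode", "style", "employees"}

# Optimistic assumption: identical spelling means identical meaning.
# This wrongly matches store and factory employees (different people)
# and misses color/colcode entirely.
optimistic = sorted(store & factory)
print(optimistic)  # ['employees', 'size', 'style']

# Conservative (SKC) assumption: terms match only through an explicit rule
rules = [("size", "size"), ("color", "colcode"), ("style", "style")]

def conservative_matches(left, right, rules):
    """Return only the rule-backed pairs whose terms occur in both ontologies."""
    return sorted((a, b) for a, b in rules if a in left and b in right)

print(conservative_matches(store, factory, rules))
# [('color', 'colcode'), ('size', 'size'), ('style', 'style')]
```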

27 An Ontology Algebra
A knowledge-based algebra for ontologies. The Articulation Ontology (AO) consists of matching rules that link domain ontologies. Intersection: create a subset ontology; keep sharable entries. Union: create a joint ontology; merge entries. Difference: create a distinct ontology; remove shared entries.
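A minimal sketch of the three operations, assuming ontologies are plain sets of terms and each articulation rule is a triple (term in O1, term in O2, articulation term). Qualifying unmatched terms with their ontology name in the union is an assumption made here so that identically spelled but unmatched terms (such as `employees`) stay distinct, in line with the conservative assumption; SKC's actual rules are richer than simple term pairs.

```python
def intersection(o1, o2, rules):
    """Subset ontology: articulation terms whose source terms exist in both."""
    return {art for a, b, art in rules if a in o1 and b in o2}

def difference(o1, o2, rules):
    """Terms of o1 that take part in no articulation with o2 (fully local)."""
    shared = {a for a, b, _ in rules if a in o1 and b in o2}
    return o1 - shared

def union(o1, o2, rules, names=("store", "factory")):
    """Joint ontology: articulation terms plus qualified unmatched terms."""
    shared2 = {b for a, b, _ in rules if a in o1 and b in o2}
    left = {f"{names[0]}.{t}" for t in difference(o1, o2, rules)}
    right = {f"{names[1]}.{t}" for t in o2 - shared2}
    return intersection(o1, o2, rules) | left | right

# Hypothetical ontologies and articulation rules
store = {"shoes", "size", "color", "style", "customers", "employees"}
factory = {"size", "colcode", "style", "machinery", "processes", "employees"}
rules = [("size", "size", "size"), ("color", "colcode", "color"), ("style", "style", "style")]

print(sorted(intersection(store, factory, rules)))  # ['color', 'size', 'style']
print(sorted(difference(store, factory, rules)))    # ['customers', 'employees', 'shoes']
```

Because the operations are plain functions over sets and rules, they compose: the output of one intersection can be fed into a union with another.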

28 Sample Operation: INTERSECTION
The result contains shared terms, the terms useful for purchasing. Source domain 1 is owned and maintained by the Store; source domain 2 is owned and maintained by the Factory.

29 INTERSECTION support
The articulation ontology holds matching rules that use terms from the 2 source domains (the Store ontology and the Factory ontology) to yield the terms useful for purchasing.

30 Sample Intersections
[Figure: ontologies for a Shoe Store (Shoes, Customers, Employees), a Shoe Factory (Material inventory, Machinery, Processes, Employees), Anatomy, Hardware, and a Department Store. Articulation-ontology matching rules include size = size, color = table(colcode), style = style, and foot = foot; Nail (toe, foot) in Anatomy is distinct from Nail (fastener) in Hardware.]

31 Other Basic Operations
UNION: merging entire ontologies, typically prior intersections. DIFFERENCE: material fully under local control.

32 Features of an algebra
Operations can be composed. Operations can be rearranged. Alternate arrangements can be evaluated. Optimization is enabled. The record of past operations can be kept and reused.

33 Knowledge Composition
[Figure: knowledge resources A to E; articulation knowledge is built for intersections such as (A ∩ B), (B ∩ C), (C ∩ D), and (C ∩ E), and composed by union, e.g., (A ∩ B) ∪ (B ∩ C) ∪ (C ∩ E), into composed knowledge for applications using A, B, C, E.]

34 Sample Processing in HPKB
Query: What is the most recent year an OPEC member nation was on the UN Security Council? (Related to a DARPA HPKB Challenge Problem.) SKC resolves 3 sources: CIA Factbook '96 (nations), OPEC (members, dates), and UN (Security Council members, years). SKC obtains the correct answer, 1996 (Indonesia); other groups obtained more answers, but factually wrong ones. Problems resolved by SKC: the Factbook has out-of-date OPEC and UN Security Council lists (Indonesia not listed; Gabon, which left OPEC in 1994); different country names (Gambia => The Gambia); historical country names (Yugoslavia); the UN lists future Security Council members (Gabon, 1999), against the intent of the original question; temporal variants.
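The temporal and naming issues on this slide can be illustrated with a toy query. The facts below are only those stated on the slide (Indonesia's 1996 seat, Gabon leaving OPEC in 1994, Gabon listed for 1999) and are not complete source data; the question date `AS_OF = 1997`, the open-ended membership entry for Indonesia, and the alias table are assumptions for illustration. This is not the SKC implementation.

```python
# Toy facts taken from the slide only; not complete source data.
# OPEC membership as (first_year, last_year); None means open-ended (assumed).
opec = {"Indonesia": (None, None), "Gabon": (None, 1994)}
# Security Council seats as a source lists them, including a future seat.
un_sc = [("Indonesia", 1996), ("Gabon", 1999)]
# Hypothetical name articulation: source spelling -> canonical name
aliases = {"Gambia": "The Gambia"}
AS_OF = 1997  # assumed date of the question; later seats are not yet history

def canonical(name):
    return aliases.get(name, name)

def opec_member(country, year):
    if country not in opec:
        return False
    first, last = opec[country]
    return (first is None or first <= year) and (last is None or year <= last)

def latest_opec_year_on_sc():
    hits = [(year, canonical(c)) for c, year in un_sc
            if year <= AS_OF and opec_member(canonical(c), year)]
    return max(hits) if hits else None

print(latest_opec_year_on_sc())  # (1996, 'Indonesia')
```

Both filters matter: the date cut-off removes the future seat, and the membership interval removes a country that had already left OPEC.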

35 Tools to create articulations
A graph matcher for the articulation-creating expert takes two ontologies, e.g., a Vehicle ontology and a Transport ontology, and produces suggestions for articulations.

36 Continue from initial point
Also suggest similar terms for further articulation: by spelling similarity, by graph position, and by the term match repository. Expert response: 1. Okay; 2. False; 3. Irrelevant to this articulation. All results are recorded; Okays are converted into articulation rules.
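A sketch of the suggest-and-confirm loop, using only spelling similarity (the ratio from Python's `difflib.SequenceMatcher`) as a stand-in for the tool's matchers; graph position and the match repository are not modeled. The term lists, the 0.8 threshold, and the expert verdicts are hypothetical.

```python
from difflib import SequenceMatcher

vehicle = ["car", "truck", "wheel", "engine"]        # hypothetical Vehicle ontology terms
transport = ["cars", "trucks", "wheels", "freight"]  # hypothetical Transport ontology terms

def suggest(left, right, threshold=0.8):
    """Suggest candidate pairs whose spelling similarity reaches the threshold."""
    out = []
    for a in left:
        for b in right:
            score = SequenceMatcher(None, a, b).ratio()
            if score >= threshold:
                out.append((a, b, round(score, 2)))
    return out

# Hypothetical expert verdicts: "okay", "false", or "irrelevant"
verdicts = {("car", "cars"): "okay", ("truck", "trucks"): "okay",
            ("wheel", "wheels"): "irrelevant"}

log, articulation_rules = [], []
for a, b, score in suggest(vehicle, transport):
    verdict = verdicts.get((a, b), "false")
    log.append((a, b, score, verdict))     # all results are recorded
    if verdict == "okay":
        articulation_rules.append((a, b))  # okays become articulation rules

print(articulation_rules)  # [('car', 'cars'), ('truck', 'trucks')]
```

Keeping the full log, not just the accepted pairs, lets a later session skip suggestions the expert has already rejected.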

37 Candidate Match Repository
Term linkages were automatically extracted from the 1912 Webster's dictionary* by processing headwords and their definitions using the algebra primitives. Notice the presence of 2 domains: chemistry and transport. * Free; other sources have also been processed.

38 Using the match repository

39 Navigating the match repository

40 Primitive Operations
Model and instance operations. Unary: Summarize (structure up); Glossarize (list terms); Filter (reduce instances); Extract (circumscription). Binary: Match (data corroboration); Difference (distance measure); Intersect (schema discovery); Blend (schema extension). Constructors: create object, create set. Connectors: match object, match set. Editors: insert value, edit value, move value, delete value. Converters: object-value, object indirection, reference indirection.

41 Future: exploiting the result
Avoid the n² problem of interpreter mapping, stated by Swartout as an issue in HPKB year 1. The result has links to the sources. Processing and query evaluation are best performed within the source domains and by their engines.
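To make the n² concern concrete, assume for illustration that every pair of n source ontologies needs its own interpreter mapping. The count then grows quadratically:

$$
N_{\text{map}}(n) = \binom{n}{2} = \frac{n(n-1)}{2} \in O(n^2)
$$

For n = 10 sources that is 45 mappings, or n(n-1) = 90 if each direction must be mapped separately; adding an eleventh source adds 10 more mappings, since 55 - 45 = 10.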

42 SKC Synopsis
Research: reliable query answers from heterogeneous, imperfect data sources. Sources: general (CIA World Factbook '96, UN www, OPEC www; Webster's Dictionary, Thesaurus, Oxford English Dictionary); topical (OPEC, BattleSpace Sensors, Logistics Servers). Client: DARPA High Performance Knowledge Base (HPKB) project. Theory: rule-based algebra; translation and composition primitives.

43 Innovation in SKC
No need to harmonize full ontologies. Focus on what is critical for interoperation: rules specific to the articulation, with potentially many sets of articulation rules. Maintenance is distributed to n sources and m articulation agents; whether m < n², depending on architecture density, is a research question.

44 Domain Specialization
Knowledge acquisition (20% of effort) and knowledge maintenance (80% of effort*) are to be performed by domain specialists: professional organizations and field teams of modest size, autonomously maintainable. Empowerment. * Based on experience with software.

45 SKC Summary
The algebra enables interoperation by dealing explicitly with differences by knowledge, identifying maintenance domains, and keeping sources autonomous. It assumes each domain has a common ontology; composing domain ontologies requires the algebra to manage the linkages where articulation occurs, while processes are best executed within the domains. Knowledge about articulation is disjoint: it allows integration specialists to work independently and supports multiple intersections and views. Maintenance is structured and partitioned.

46 Current SKC Directions
Experience with real-world (imperfect) data confirms the validity of our approach: expert sources are better maintained than general sources; rules applied to multiple sources provide more reliable and accurate query results; a component architecture enables scalable, maintainable knowledge base development. Porting the concepts to the DARPA Agent Markup Language (DAML) setting.

47 Mediation Research Topics
Mediator management and maintenance; representation of knowledge and customer models; balancing dynamic and warehouse solutions; formalization of semantic heterogeneities (many levels and types); roles for wrappers vs. mediators vs. applications; scalability by partitioning (make it simple!); domain ontologies (tools, validation, . . .); effect of the object paradigm and method-based access; service and business models; new types of information systems.

48 Long Range Science Vision
[Figure: Integration Science drawing on Artificial Intelligence (knowledge management, domain expertise, uncertainty), Databases (access, storage, algebras), Systems Engineering (analysis, documentation, costing), integration methods, and GIS.] Spatial is special.

49 Background Material: Technology, Sources, Maintenance, Projects
Information about the Future

50 Interfaces
Human ↔ Computer {x-widgets, HTML}; Application ↔ Mediator {OQL, KQML, ...}; Mediator ↔ Data sources {SQL, TQL, XML, … }; Data ← real world {sensors, clerks, … }

51 Support for KB-Algebra
Ontolingua [Gruber, Stanford KSL]: repository for domain terminologies, used for mechanical design, bibliographies, catalogs. LOOM [USC ISI]: classification-based expert system; helps in structuring and processing ontologies. PROTÉGÉ [Stanford MIS]: reuse. Penguin [Barsalou, Stanford MIS, CIFE]: object manipulation based on relational algebra, used for a genetics laboratory and building design.

52 Getting there: Available Technology/Science
Web search tools, multimedia interfaces, agents, database models, security filters, domain ontologies, object bases, temporal algebras, uncertainty algebras, customer models, constraint management, circumscription, communication standards, GIS, case-based reasoning, active databases, Internet billing, knobots, simulation access, DB views, wrappers, caching, text & speech processing, public databases, distributed storage systems, high-performance communication.

53 Fat versus thin mediators
[Figure: mediators positioned by service scope and domain scope.] Too thin: insufficient added value. Too fat: hard to compose. Too narrow: few customers. Too broad: hard to maintain, needs a committee. Just right lies between these extremes.

54 Maintenance is good for you
[Chart: relative annual maintenance cost (10% to 100%) versus lifetime (1 to 13 years) for automobile, hardware, and software, with depreciation = 1 / lifetime.]

55 Client-Server Architecture
[Figure: client systems connected directly to data and simulation resources.] Clients can be built fast by reusing resources, but changes (x) to a resource are difficult, since they can affect many clients.

56 Systems with Mediators
[Figure: applications on top, mediators in the middle, and data resources below.]

57 Growth through Reuse
[Figure: a new application added on top of prior and revised mediators, which access extended data resources.]

58 Linear O(n) Cost of Growth (now O(n²))
Data changes only affect some mediators, and only in their domain. Mediators can 1. supply old information to the n-1 prior applications; 2. provide better information to the new application; 3. be partially or completely reused. New applications, using the new data, can be developed and inserted dynamically.

59 A mediator is not just static software: Knowledge ages
[Figure: a mediator's models, programs, rules, and caches, maintained by software and people (owner/creator, maintainer, lessor/seller, advertiser), sit between an application interface subject to changes of user needs and resource interfaces subject to resource changes; the domain also changes.]

60 Roles
Computer scientists: provide tools (adaptation, integration, matching, composing); assess standards; assure scalability. Domain experts: learn to use the tools; select resources; assess their value; rank their quality; resolve semantics; get client feedback; provide feedback.

61 Assigning maintenance responsibility
a. Source data quality: supplier database, files, or web pages
b. Interface to the source: wrapper, supplier, or vendor for the supplier
c. Source selection: expert specialist in the mediator
d. Source quality assessment: customer input to the mediator
e. Semantic interoperation: specialist group providing input to the mediator
f. Consistency and metadata information: mediator service operation or warehouse
g. Informal, pragmatic integration: client services with customer input
h. User presentation formats

62 Sample projects
TSIMMIS at Stanford; E-Commerce in Digital Libraries; INEEL: information integration for environmental restoration; MIFT: feedback for training; Civil Engineering and Architecture; F-22; SimQL; Security.

63 Projects at Stanford DB group
Topics: data mining; mediator & wrapper generation; warehousing; security mediators; megaprogramming; simulation access; changes, consistency, and configurations. Projects: MIDAS, WHIPS, TSIMMIS, TIHI, SimQL, CHAIMS, C3.

64 The TSIMMIS Project (Ramana Yerneni, Yannis Papakonstantinou, ...)
Objective: support mediation technology for integrated access to distributed, autonomous, heterogeneous data sources, using object fusion. Wrapper toolkit: rapidly create wrappers, based on a source specification, that give a uniform interface to heterogeneous sources. Mediator toolkit: rapidly construct mediators, based on a mediator specification, that integrate data from a set of wrappers.

65 Investors Need to Fuse Information from Multiple Sources
Sources reached over the network: WWW, ticker tape, personal database. Fusion must group together information about the same real-world entity, remove redundancies, and resolve conflicts.
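The three fusion steps can be sketched as follows. The records, the `ticker` entity key, and the per-attribute source-priority policy for resolving conflicts are assumptions for illustration; the TSIMMIS toolkit builds mediators from declarative specifications rather than hand-written code like this.

```python
# Hypothetical records about one company from three wrapped sources
web = [{"ticker": "ACME", "name": "Acme Corp", "price": 10.5}]
ticker_tape = [{"ticker": "ACME", "price": 10.75, "time": "10:02"}]
personal_db = [{"ticker": "ACME", "name": "Acme Corporation", "shares": 100}]

# Assumed conflict policy: per-attribute source priority, most trusted first
PRIORITY = {"price": ["ticker_tape", "web"], "name": ["personal_db", "web"]}

def fuse(sources):
    """Group by entity, drop redundant values, resolve conflicts by priority."""
    grouped = {}  # entity key -> attribute -> {source: value}
    for source_name, records in sources.items():
        for rec in records:
            entity = grouped.setdefault(rec["ticker"], {})
            for attr, value in rec.items():
                entity.setdefault(attr, {})[source_name] = value
    fused = {}
    for key, attrs in grouped.items():
        obj = {}
        for attr, values in attrs.items():
            if len(set(values.values())) == 1:  # redundancy: all sources agree
                obj[attr] = next(iter(values.values()))
            else:  # conflict: first listed source present, else alphabetical
                order = PRIORITY.get(attr, []) + sorted(values)
                obj[attr] = values[next(s for s in order if s in values)]
        fused[key] = obj
    return fused

result = fuse({"web": web, "ticker_tape": ticker_tape, "personal_db": personal_db})
print(result["ACME"]["price"], result["ACME"]["name"])  # 10.75 Acme Corporation
```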

66 An Integration Architecture
[Figure: a client application (portfolios for each company) queries a mediator, which draws through wrappers on stock market prices, business reports (Dialog), and a ticker tape.]

67 Additional Challenge: Sources Without a Well-Structured Schema
Characteristics: semistructured, irregular, deeply nested, incomplete schema knowledge, autonomous, dynamic. Examples: World Wide Web, SGML documents, genome and chemical structures, bibliographic information files.

68 Wrappers & Mediators from High-Level Specifications
[Figure: a client queries a mediator produced by a Mediator Specification Interpreter from a declarative mediator specification; the mediator draws on wrappers produced by a Wrapper Specification Interpreter from declarative source specifications, each wrapper fronting a source.]

69 E-money: Services must be paid for
Payment is the incentive for creation and improvement. Price is proportional to value added, often small; profit = f(cost, market, price, overhead). Price is low per item, so overhead must be low: simple payment (no credit accounts, checks), enabled through secure signatures.

70 E-Commerce in the Digital Library
Steven Ketchpel & DL Economics Group. Delivery: Cryptolope, DigiBox, HTTP. Payment: CyberCash, DigiCash, First Virtual, SET. Major integration problem. Shopping models: pay-per-view, subscription, session, shareware, auctions, site license, gift certificate, layaway, pre-paid vouchers, … .

71 Shopping model: merchant-independent logic controlling the flow of a business model
Example shopping models: Order, Pay, (Deliver 52 times); (1 month; Order, Deliver) Pay.
[Figure: customer and merchant, each with event handlers and state information, exchange events such as Order Complete, Bill, Payment Complete, and Start Transfer $.]
An abstract API allows an application to interact with many different services in a consistent way. Proxy event handlers translate from native applications to the protocols defined by the shopping model, for payment, delivery, and other services.
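A sketch of the separation this slide describes: the shopping model is a merchant-independent sequence of events, and proxy handlers map each event onto a particular service. The event names, handler strings, and the `ShoppingSession` class are hypothetical; no real payment or delivery API is called.

```python
def order_pay_deliver(deliveries):
    """Shopping model 'Order, Pay, (Deliver n times)'."""
    return ["order", "pay"] + ["deliver"] * deliveries

def orders_then_pay(orders):
    """Shopping model '(period; Order, Deliver) Pay': settle once at the end."""
    return ["order", "deliver"] * orders + ["pay"]

class ShoppingSession:
    """Runs a merchant-independent event sequence through pluggable handlers."""
    def __init__(self, steps, handlers):
        self.steps = list(steps)
        self.handlers = handlers  # event name -> callable (proxy for a service)

    def run(self):
        return [self.handlers[step]() for step in self.steps]

# Hypothetical proxy handlers; a real one would call a payment or delivery service
handlers = {
    "order": lambda: "order complete",
    "pay": lambda: "payment complete",
    "deliver": lambda: "transfer started",
}

log = ShoppingSession(order_pay_deliver(52), handlers).run()
print(len(log), log[:3])  # 54 ['order complete', 'payment complete', 'transfer started']
```

Swapping a handler changes which service is used without touching the shopping model, and swapping the model changes the business flow without touching any service.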

72 TSIMMIS Status
Mediator Specification Interpreter running on Ultrix, AIX, OSF: 9000 lines of C/C++ code, plus 4000 C++ lines of server/client support libraries. Integration of three disparate bibliographic sources: legacy-system flat BibTeX files, a relational DB, and Web files.

73 Mediator Specification Interpreter Architecture
[Figure: a query and the mediator specification enter the Query Rewriter, which produces a logical datamerge program; a Cost-Based Optimizer turns it into a plan; the Datamerge Engine sends queries to wrappers and assembles their results into the result.]

74 Environmental Restoration at INEL: Undoing 50 years of messes ….
[Figure: many projects, many sources: mediators and other mediators, using MSL [Stanford], OQL [ODMG], and MQL [ISX], exchange QEM and OEM messages with wrappers, including a CORBA wrapper and wrappers for ERIS and IEDMS.] Idaho National Engineering Laboratory; Lockheed Martin; ISX; Stanford Univ.

75 Mediation to Implement Feedback in Training (David Maluf, Priya Panchapagesan, Ted Linden)
Another task of mediators, prior to integration: abstraction to match levels of granularity (MIFT).

76 Mediation Feedback: Playback or Graph
[Figure: layered architecture. User interface (UI in Java) for commanders, training developers, trainees, analysts, and observers; application layer with standards in KQML and objectives; mediation layers with Stanford mediators using rules in CLIPS, and tasks; I.D.A. wrappers in C/C++; wrapped simulation resources Janus and SimNet.]

77 MIFT Result
Analyses: force ratio, losses, area gain. Exercise; simulator type.

78 Control Valve Sizing, Future
From Andrew Arnold, Civ. Eng. Qualification Exam. [Figure: interpretation, programmatic analysis, integrated evaluation, transformation; automated.]

79 F-22 IWSDB Phase 6
[Figure: layers of user interfaces, integration services, wrappers, and databases. Components include PD, DS, an application, PRIDE, change notification, Sybase, an index, suppliers, a provisioner, query reformulation, a WAIS server, a matchmaker, a domain model, an engineer's IWSDB client, SQL access, and a domain-matching GUI.]

80 Simulation services
1. Continuously executing (weather prediction): SimQL reports the best-matching samples.
2. Execution specific to the query (what-if assessment, spreadsheets): may require HPC power for adequate response.
3. Complementing base data (materials data, assembly): performs interpolation or extrapolation to match query parameters.
4. Combinations of 2. and 3.: a top-layer simulation using stored partial lower-level results (weapon performance in a setting).
5. Human-in-the-loop, mediated by an agent program (SAFs).
Note: a simulation service program can be written in any language, but a simulation service must comply with the interface.
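Service type 3 can be sketched as a lookup that inter- or extrapolates stored base data to the query parameter. The table values and the `query` function are hypothetical, and the code does not reproduce SimQL's actual interface; linear interpolation over the nearest segment is one simple choice among many.

```python
from bisect import bisect_left

# Hypothetical stored base data: (temperature, property value), sorted by temperature
TABLE = [(0.0, 200.0), (100.0, 190.0), (200.0, 175.0)]

def query(temperature):
    """Linear inter-/extrapolation on the segment nearest the query parameter."""
    xs = [x for x, _ in TABLE]
    i = bisect_left(xs, temperature)
    i = min(max(i, 1), len(TABLE) - 1)  # clamp to a valid segment index
    (x0, y0), (x1, y1) = TABLE[i - 1], TABLE[i]
    return y0 + (y1 - y0) * (temperature - x0) / (x1 - x0)

print(query(50.0), query(150.0), query(300.0))  # 195.0 182.5 160.0
```

Queries inside the table interpolate between neighbors; queries outside it extrapolate from the end segment, which is where a real service would need to report reduced confidence.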

81 SimQL: Simulation Access Service
Information systems should also deal with the future. [Figure: timeline with SQL covering the past, the present, and SimQL covering the future.] Decision-making requires dealing with the future as well as the past. Databases deal well with the past; sensors can provide current status; spreadsheets and simulations deal with likely futures. Information systems should be able to combine all three.

