Semantic Precision for Web-based Interoperation

Presentation on theme: "Semantic Precision for Web-based Interoperation"— Presentation transcript:

1 Semantic Precision for Web-based Interoperation
RIACS / NASA AMES
Gio Wiederhold, Stanford University, July 2001
www-db.stanford.edu/people/gio.html
Thanks to Jan Jannink, Shrish Agarwal, Prasenjit Mitra, & Stefan Decker.
11/28/2018 Gio ICEIS 1

2 Outline
Setting: VG 3 - VG 5
Precision: VG 6 - VG 8
Lack of precision: VG 9 - VG 11
SKC solution: VG 12, VG 20 - VG 29
Ontologies: VG 13 - VG 19
Early results: VG 30
Interoperation: VG 31 - VG 32
Tool & examples: VG 33 - VG 41
Composition and execution: VG 42 - VG 43
Evolution and maintenance: VG 44 - VG 46
Summary, SKC to general science: VG 47 - VG 49

3 Trends: 1999
Users of the Internet: % → 52% of U.S. population
Growth of Net sites (now 2.2M public sites with 288M pages)
Expected growth in E-commerce by Internet users [BW, 6 Sep. 1999], by segment:
  books: % → 16.0%
  music & video: 6.3% → 16.4%
  toys: 3.1% → 10.3%
  travel: % → 4.0%
  tickets: % → 4.2%
  Overall: % → 33.0% = $9.5 billion
An unsustainable trend cannot be sustained [Herbert Stein] → new services
[Figure: E-penetration by year, in %; toys centroid, in 1999 ~1% of total market]

4 Growth and Perception
E-commerce
  Gartner: prediction for 2004: 7.3 T$
  Revision: 2001 prediction for 2004: 5.9 T$ (drastic loss?)
  50 companies, each after 20% of the market
Examples: Artificial Intelligence, Databases, Neural networks, E-commerce
[Figure: perception level over time; curves: extrapolated growth, combinatorial growth, realistic growth, perceived growth; phases: invisible growth, perceived initial growth, disappointment, failures]

5 Our* Information Environment
* B2B, B2C, G2G, G2C, ...
In the past: Scarcity
  Customers needed more information to make better decisions
Today: Excess
  The web provides more information than customers can digest
Effect: confusion in decision-making
  Must I look at all possibly relevant information?
  What is the penalty for missing something?
  What is the cost of looking at everything?
  I am confused; best defer making any decision

6 Need for precision
Precision: few wrong or irrelevant results
More precision is needed as data volume increases: a small error rate still leads to too many errors
[Figure, adapted from Warren Powell, Princeton Univ.: data error rate vs. information quantity; labels: acceptable limit, human limit (hard to move), human with tools?, Information Wall]
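The claim that a small error rate still yields too many errors at scale can be checked with simple arithmetic. The sketch below is illustrative only: the function names, the volumes, and the review budget of 100 results are assumptions, not figures from the talk.

```python
def expected_errors(volume: int, error_rate: float) -> int:
    """Expected number of wrong or irrelevant results among `volume` results."""
    return round(volume * error_rate)


def required_error_rate(review_budget: int, volume: int) -> float:
    """Largest error rate that keeps expected errors within `review_budget`."""
    return review_budget / volume


if __name__ == "__main__":
    # Hypothetical volumes: the same 1% error rate at growing data volumes.
    for volume in (1_000, 100_000, 10_000_000):
        print(volume, expected_errors(volume, 0.01))
    # If a person can inspect about 100 bad results (assumed limit),
    # the tolerable error rate shrinks as the volume grows.
    print(required_error_rate(100, 10_000_000))
```

With a fixed human limit, the acceptable error rate falls in inverse proportion to the information quantity, which is the pressure toward higher precision the slide describes.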

7 Relationships among parameters
p = v.relevant / v.retrieved (precision)
r = v.relevant / v.available (recall)
[Figure: volume retrieved relative to volume available (0%, 50%, 100%) across the space of methods, with the percentage actually relevant; labels: perfect recall, perfect precision, Type 1 errors (recall), Type 2 errors (precision)]
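A minimal sketch of how the two parameters can be computed for one query. It uses the standard information-retrieval definitions, assuming they match the slide's p and r; the function name and the example sets are hypothetical.

```python
def precision_recall(retrieved: set, relevant: set) -> tuple[float, float]:
    """Return (precision, recall) for one query.

    retrieved: items the search returned
    relevant:  items that are actually relevant among everything available
    """
    hits = retrieved & relevant  # relevant items that were retrieved
    # Excess information (false positives) lowers precision;
    # missed valid information (false negatives) lowers recall.
    precision = len(hits) / len(retrieved) if retrieved else 1.0
    recall = len(hits) / len(relevant) if relevant else 1.0
    return precision, recall


if __name__ == "__main__":
    retrieved = {"a", "b", "c", "d"}
    relevant = {"c", "d", "e", "f"}
    print(precision_recall(retrieved, relevant))
```

Retrieving more items can only raise recall, but it usually lowers precision, which is the trade-off across the "space of methods" in the figure.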

8 Cost of Error types differs
Missed Valid Information (False Negatives) causes lost opportunities
  cheapest shovel, ...
  suboptimal decision-making by x valid suppliers
Excess Information (False Positives) has to be investigated
  attractive-looking supplier - makes toys
Having many cases of excess information costs more than some missing information
[Figure: cost-benefit over the space of results, ordered; regions labeled 1 and 2]

9 A Major Cause of Errors
Searches extend over many domains
Domains have their own terminologies
  Need autonomy to deal with knowledge growth
The usage of terms in a domain is efficient
  Appropriate granularity: mechanic working on a truck vs. logistics manager
  Shorthand notations: PSU vs. PSU
Functions differ in scope
  Payroll versus Personnel: getting paid vs. available (includes contract staff)

10 Semantic Mismatches
Information comes from many autonomous sources
Differing viewpoints (by source)
  differing terms for similar items: { lorry, truck }
  same terms for dissimilar items: trunk (luggage, car)
  differing coverage: vehicles (DMV, police, AIA)
  differing granularity: trucks (shipper, manuf.)
  different scope: student (museum fee, Stanford)
  different hierarchical structures: supplier vs. usage
→ Hinders use of information from disjoint sources
  missed linkages: loss of information, opportunities
  irrelevant linkages: overload on user or application program
Poor precision when merged
Still OK for web browsing, poor for business & science

11 Structural Heterogeneity
Same concept? If incorporated in differing structures?

12 Approach (SKC project)
Scalable Knowledge Composition (Stanford Univ. DB group)
Define terminology in a domain precisely
  Schemas, XML DTDs → Ontologies
Develop methods to permit interoperation among differing domains (not integration)
  Articulation
  Ontology Algebra
Develop tools to support the methods
  Ontology matching

13 Functions of Ontologies
Enable Precision in Understanding
  People = designers, implementors, users, maintainers
  Systems = sources, mediators, applications
Share the Cost of Knowledge Acquisition & Maintenance
  reuse encoded knowledge, remain up-to-date as domains change
Enable Information Interoperation
  * Define the terms that link domains

14 Ancestors of Ontologies
Lexicons: collect terms used in information systems
Taxonomies: categorize, abstract, classify terms
Schemas of databases: attributes, ranges, constraints
Data dictionaries: systems with multiple files, owners
Object libraries: grouped attributes, inheritance, methods
Symbol tables: terms bound to implemented programs
Domain object models (XML DTD): interchange terms
[Arrow label: more knowledge formalized]

15 Data and Knowledge
Information is created at the confluence of
  data: the state, and
  knowledge: the ability to select and project the state into the future
[Figure: a Data Loop (much data) and a Knowledge Loop (diverse knowledge); labels: Storage, Education, Selection, Recording, Integration, Abstraction, Experience, State changes, Decision-making, Action; bound to an application]

16 Two Mismatch Solutions
A single, globally consistent ontology (Your Hope)
  wonderful for users and their programs
  too many interacting sources
  long time to achieve: 2 sources (UAL, LH), 3 (+ trucks), 4, ... all?
  costly maintenance, since all sources evolve
  no world-wide authority to dictate conformance
Domain-specific ontologies (XML DTD assumption)
  small, focused, cooperating groups
  high quality; some examples: arthritis, Shakespeare plays
  allows sharable, formal tools
  ongoing, local maintenance affecting users: annual updates
  poor interoperation; users still face inter-domain mismatches

17 Global consistency: Hope, but ...
Common assumptions in assembling and integrating distributed information resources
  The language used by the resources is the same
  Sublanguages used by the resources are subsets of a globally consistent language
These assumptions are provably false
Working towards the goal of global consistency is
  1. naïve: the goal cannot be achieved
  2. inefficient: languages are efficient in local contexts
  3. unmaintainable: terminology evolves with progress

18 Domain-specific Expertise
Knowledge needed is huge
Partition into natural domains
Determine domain responsibility and authority
Empower domain owners
Provide tools
Consider interaction
Society of specialists

19 Domains and Consistency
a domain will contain many objects
the object configuration is consistent
within a domain all terms are consistent & relationships among objects are consistent
context is implicit
No committee is needed to forge compromises * within a domain
* Compromises hide valuable details
[Figure label: Domain Ontology]

20 SKC grounded definition
Ontology: a set of terms and their relationships
Term: a reference to real-world and abstract objects
Relationship: a named and typed set of links between objects
Reference: a label that names objects
Abstract object: a concept which refers to other objects
Real-world object: an entity instance with a physical manifestation (or its representation in a factual database)

21 Grounding enables implementation
We use many abstract terms in our work
  Needed because we are dealing with many objects
  Human thinking is limited to short-term memory
Someone must be able to translate them into code reliably
  Each abstract term must have a path to reality
  You must provide that path for students and coders
Without a clear path that is not possible
  Not automatically at all: machines need specs
  Not reliably by human programmers: failures occur
Without implementation there is no benefit
[Side label: Advice]

22 An Ontology Algebra
A knowledge-based algebra for ontologies
  Intersection: create a subset ontology, keep sharable entries
  Union: create a joint ontology, merge entries
  Difference: create a distinct ontology, remove shared entries
The Articulation Ontology (AO) consists of matching rules that link domain ontologies
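The three operations can be sketched over a deliberately simplified model. This is not the SKC implementation: it assumes an ontology is just a named set of terms and an articulation rule is a pair of matching terms, and the store and factory terms are toy examples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ontology:
    name: str
    terms: frozenset


def intersection(o1, o2, rules):
    """Sharable entries: rule pairs whose terms exist in both sources."""
    return {(a, b) for (a, b) in rules if a in o1.terms and b in o2.terms}


def difference(o1, o2, rules):
    """Entries of o1 that no articulation rule shares with o2."""
    shared = {a for (a, _) in intersection(o1, o2, rules)}
    return o1.terms - shared


def union(o1, o2, rules):
    """Joint ontology: linked pairs become one merged entry, the rest stay qualified."""
    linked = intersection(o1, o2, rules)
    merged = {f"{o1.name}.{a}={o2.name}.{b}" for (a, b) in linked}
    merged |= {f"{o1.name}.{t}" for t in difference(o1, o2, rules)}
    merged |= {f"{o2.name}.{t}" for t in o2.terms - {b for (_, b) in linked}}
    return merged


# Toy ontologies and rules (assumed for illustration).
store = Ontology("store", frozenset({"size", "color", "style", "customers"}))
factory = Ontology("factory", frozenset({"size", "colcode", "style", "machinery"}))
rules = {("size", "size"), ("color", "colcode"), ("style", "style")}

if __name__ == "__main__":
    print(sorted(intersection(store, factory, rules)))
    print(sorted(difference(store, factory, rules)))
    print(sorted(union(store, factory, rules)))
```

In this model the articulation rules are the only place where knowledge about both domains lives; each source ontology stays untouched and under local control.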

23 Sample Operation: INTERSECTION
Source Domain 1: owned and maintained by Store
Source Domain 2: owned and maintained by Factory
Result contains shared terms
  Terms useful for purchasing

24 Sample Intersections
Shoe Store: Shoes { }, Customers { }, Employees { }, {. . . }
Shoe Factory: Material inventory {...}, Employees { }, Machinery { }, Processes { }, Shoes { }
Articulation ontology matching rules:
  size = size
  color = table(colcode)
  style = style
[Further figure labels: Anatomy, Hardware, Department Store; foot = foot; Employees, Employees; Nail (toe, foot), Nail (fastener)]

25 Other Basic Operations
UNION: merging entire ontologies
DIFFERENCE: material fully under local control
[Figure label: articulation ontology, typically prior intersections]

26 Features of an algebra
Operations can be composed
Operations can be rearranged
Alternate arrangements can be evaluated
Optimization is enabled
The record of past operations can be kept and reused
  (experience: 3 months → 1 week for Webster's annual update, → 2 weeks for OED (6 x size) [Jannink:01])

27 INTERSECTION support
[Figure: Store ontology and Factory ontology linked by an articulation ontology]
Matching rules that use terms from the 2 source domains
Terms useful for purchasing

28 Primitive Operations
Model and Instance ...
Constructors: create object, create set
Connectors: match object, match set
Editors: insert value, edit value, move value, delete value
Converters: object - value, object indirection, reference indirection
Unary
  Summarize: structure up
  Glossarize: list terms
  Filter: reduce instances
  Extract: circumscription
Binary
  Match: data corroboration
  Difference: distance measure
  Intersect: schema discovery
  Blend: schema extension
...

29 Matching rules (in process)
A equals U
A equals {u, ...}
A equals Union (U, V)
A equals Union (U, {v, ...})
A equals Intersection (U, W)
A equals Intersection (U, {w, ...})
A equals Difference (U, X)
A equals Difference (U, {x, ...})
must obey algebraic properties

30 Sample Processing in HPKB
What is the most recent year an OPEC member nation was on the UN security council (SC)? (A DARPA HPKB Challenge Problem)
SKC resolves 3 sources
  CIA Factbook ‘96 (nations)
  OPEC (members, dates)
  UN (SC members, years)
SKC obtains the correct answer: 1996 (Indonesia)
Other groups obtained more, but factually wrong, answers; they relied on one global source, the CIA Factbook.
Problems resolved by SKC
  Factbook, a secondary source, has out-of-date OPEC & UN SC lists: Indonesia not listed; Gabon (left OPEC 1994)
  different country names: Gambia => The Gambia
  historical country names: Yugoslavia
  UN lists future security council members: Gabon 1999
  needed ancillary data
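The shape of this query can be sketched as a join across the OPEC and UN sources with name normalization. Everything here is a toy: the dictionaries contain only the cases named on the slide (not the real source contents), and the `as_of` cutoff and the rule that a nation stops counting as a member in the year it leaves are assumptions made for illustration.

```python
# Toy data restricted to the examples named on the slide.
NAME_ALIASES = {"Gambia": "The Gambia"}  # differing country names across sources

# OPEC source: nation -> year it left OPEC (None = no departure recorded).
OPEC_LEFT = {"Indonesia": None, "Gabon": 1994}

# UN source: nation -> years listed on the Security Council (may include future years).
UN_SC_YEARS = {"Indonesia": [1996], "Gabon": [1999]}


def canonical(name):
    """Map a source-specific country name to one shared spelling."""
    return NAME_ALIASES.get(name, name)


def was_opec_member(nation, year):
    """True if the OPEC source lists the nation and it had not left by `year`."""
    if nation not in OPEC_LEFT:
        return False
    left = OPEC_LEFT[nation]
    return left is None or year < left  # assumption: membership ends in the leaving year


def most_recent_opec_year_on_sc(as_of):
    """Latest (year, nation) with an OPEC member on the SC, ignoring years after `as_of`."""
    candidates = [
        (year, canonical(nation))
        for nation, years in UN_SC_YEARS.items()
        for year in years
        if year <= as_of and was_opec_member(canonical(nation), year)
    ]
    return max(candidates) if candidates else None


if __name__ == "__main__":
    print(most_recent_opec_year_on_sc(as_of=1996))
```

The membership dates come from the primary OPEC source rather than a secondary summary, which is what lets the sketch reject Gabon's 1999 listing.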

31 Interoperation via Articulation
At application definition time
  Match relevant ontologies where needed
  Establish articulation rules among them
  Record the process
At execution time
  Perform query rewriting to get to sources
  Optimize based on the ontology algebra
For maintenance
  Regenerate rules using the stored process formulation
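The execution-time step, rewriting a query phrased in articulation terms into each source's own vocabulary, might look like the sketch below. The rule table layout, the `rewrite_query` function, and the colour-code values are hypothetical; only the idea that a colour maps to a code through a table echoes the earlier store and factory example.

```python
# Hypothetical conversion table standing in for table(colcode).
COLOR_TO_CODE = {"red": 1, "blue": 2}

# Articulation rules recorded at definition time (assumed layout):
# articulation term -> {source name: (source term, value converter)}.
ARTICULATION_RULES = {
    "size": {
        "store": ("size", lambda v: v),
        "factory": ("size", lambda v: v),
    },
    "color": {
        "store": ("color", lambda v: v),
        "factory": ("colcode", lambda v: COLOR_TO_CODE[v]),
    },
}


def rewrite_query(query, source):
    """Rewrite a query over articulation terms into one source's own terms."""
    rewritten = {}
    for term, value in query.items():
        source_term, convert = ARTICULATION_RULES[term][source]
        rewritten[source_term] = convert(value)
    return rewritten


if __name__ == "__main__":
    query = {"size": 9, "color": "red"}
    print(rewrite_query(query, "store"))
    print(rewrite_query(query, "factory"))
```

Because the rules are stored as data, they can be regenerated or edited during maintenance without touching the sources or the application.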

32 Generation of the rules
Provide library of automatic match heuristics
  Lexical Methods: spelling
  Structural Methods: relative graph position
  Reasoning-based Methods: Nexus
  → Hybrid Methods
Iteratively, with an expert in control
GUI tool to
  - display matches and
  - verify generated matches using the human expert
  - expert can also supply matching rules
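A lexical heuristic of the kind listed above can be sketched with Python's standard `difflib`. This is a stand-in, not the SKC matcher: the threshold of 0.8, the one-entry thesaurus, and the function names are assumptions, and the output is only a list of suggestions for the expert to verify.

```python
from difflib import SequenceMatcher

# Hypothetical thesaurus; a real one would come from a lexical resource.
THESAURUS = {("truck", "lorry")}


def spelling_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] based on matching character blocks."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def suggest_matches(terms1, terms2, threshold=0.8):
    """Propose candidate term matches, each tagged with the heuristic that found it."""
    suggestions = []
    for t1 in terms1:
        for t2 in terms2:
            if (t1, t2) in THESAURUS or (t2, t1) in THESAURUS:
                suggestions.append((t1, t2, "thesaurus"))
            elif spelling_similarity(t1, t2) >= threshold:
                suggestions.append((t1, t2, "spelling"))
    return suggestions


if __name__ == "__main__":
    for suggestion in suggest_matches(["truck", "car"], ["trucks", "lorry"]):
        print(suggestion)
```

Spelling alone cannot relate truck to lorry, which is why the slide combines lexical, structural, and reasoning-based methods and keeps a human expert in control.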

33 Articulation Generator
Being built by Prasenjit Mitra
[Figure labels: Driver, Thesaurus, Context-based Word Relator, Phrase Relator, Semantic Network (Nexus), Structural Matcher, Ont1, Ont2, OntA, Human Expert]

34 Lexical Methods
Preprocessing rules
  - Expert-generated seed rules, e.g., (Match O1.President O2.PrimeMinister)
  - Context-based preprocessing directives
Thesaurus: synonyms, generalizations
  yellow → ochre, canary
Nexus: term relationship graph
  Owner = buyer (distance of words as measure of relatedness)

35 Tools to create articulations
Graph matcher for articulation-creating expert
[Figure: Vehicle ontology and Transport ontology, with suggestions for articulations]

36 continue from initial point
Also suggest similar terms for further articulation:
  by spelling similarity,
  by graph position,
  by term match repository
Expert response:
  1. Okay
  2. False
  3. Irrelevant to this articulation
All results are recorded
Okays are converted into articulation rules
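The review loop described here (suggest, collect one of three expert responses, record everything, keep the Okays as rules) can be sketched as follows. The `Response` enum, the `review` function, and the scripted expert are hypothetical stand-ins for the GUI tool.

```python
from enum import Enum


class Response(Enum):
    OKAY = 1
    FALSE = 2
    IRRELEVANT = 3  # irrelevant to this articulation


def review(suggestions, ask_expert):
    """Record every expert response; convert only the Okays into articulation rules."""
    log = []
    rules = []
    for pair in suggestions:
        response = ask_expert(pair)
        log.append((pair, response))
        if response is Response.OKAY:
            rules.append(pair)
    return log, rules


if __name__ == "__main__":
    # Scripted answers stand in for the human expert (assumed for illustration).
    scripted = {
        ("O1.President", "O2.PrimeMinister"): Response.OKAY,
        ("O1.Nail", "O2.Nail"): Response.FALSE,
        ("O1.Employees", "O2.Employees"): Response.IRRELEVANT,
    }
    log, rules = review(list(scripted), scripted.get)
    print(rules)
    print(len(log))
```

Keeping the rejected and irrelevant answers in the log means the same suggestions need not be shown again when the ontologies are re-matched later.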

37 Candidate Match Nexus
Term linkages automatically extracted from 1912 Webster’s dictionary *
  * free; we also have an OED-based nexus
Based on processing headwords → definitions using algebra primitives
Notice presence of 2 domains: chemistry, transport

38 Using the nexus

39 Navigating the match repository

40 Example: NATO Country Graphs
Austria: bundestag ....

41 To be matched to
Great Britain: parliament ....
70% of documented matches found automatically
Remainder required human interaction with our tools.

42 Broader Applications: Compose
[Figure: ontologies for resources A, B, C, D, and E; articulation ontologies for (A ∩ B), (B ∩ C), (C ∩ D), and (C ∩ E); composed ontology for applications using A, B, C, E: (A ∩ B) ∪ (B ∩ C) ∪ (C ∩ E)]
Legend: ∪ : union, ∩ : intersection

43 Exploiting the result
Future work
Result has links to source
Processing & query evaluation is best performed within source domains & by their engines

44 Architectural development
[Figure: architecture over time; datasources (D1-D6), wrappers (W1-W3), mediators (M1, M2), integrators (I1, I2), applications (A1-A6), connected by network middleware]

45 Transform Data to Information
Application Layer: decision-makers at workstations
Mediation Layer: value-added services
Foundation Layer: data and simulation resources

46 Mediation and maintenance
Application interface: changes of user needs
Domain ontology changes
Software & People: models, programs, rules, caches, ...
  Owner / Creator, Maintainer, Lessor - Seller, Advertiser
Resource interfaces: resource functional & ontological changes
Tools can help, but changes have to be dealt with rapidly.
Automated learning is typically too slow; it requires many instances

47 Domain Specialization
Knowledge Acquisition (20% effort) & Knowledge Maintenance (80% effort *) to be performed by
  Domain specialists
  Professional organizations
  Field teams of modest size
autonomously maintainable
Empowerment
* based on experience with software

48 SKC Synopsis
Research Objective: precise answers from heterogeneous, imperfect, scalably many data sources
Sources for Ontologies:
  General: CIA World Factbook ‘96, UN-www, OPEC-www, Webster’s Dictionary, Thesaurus, Oxford English Dictionary
  Topical: NATO, BattleSpace Sensors, Logistics Servers
Theory:
  Rule-based algebra over ontologies
  Translation & Composition primitives
Sponsor and collaboration: AFOSR; DARPA DAML program; W3C; Stanford KSL and SMI; Univ. of Karlsruhe, Germany; others.

49 Innovation in SKC
No need to harmonize full ontologies
Focus on what is critical for interoperation
  Rules specific for articulation
  Tools for creation and maintenance
Maintenance is distributed
  to n sources
  to m articulation agents in mediators
Potentially many sets of articulation rules
  is m < n^2? depends on semantic architecture density
  a research question: density
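The question of how m relates to n can be made concrete under one assumption that the talk does not state: that each set of articulation rules links exactly two distinct sources. Then at most n(n-1)/2 sets exist, and "density" can be read as the fraction of those pairs actually articulated. The function names are hypothetical.

```python
def max_pairwise_articulations(n: int) -> int:
    """Upper bound on articulation rule sets if each links two distinct sources."""
    return n * (n - 1) // 2


def articulation_density(m: int, n: int) -> float:
    """Fraction of possible source pairs that actually have an articulation."""
    pairs = max_pairwise_articulations(n)
    return m / pairs if pairs else 0.0


if __name__ == "__main__":
    # Example: 5 resources joined by 4 pairwise articulations.
    print(max_pairwise_articulations(5))
    print(articulation_density(4, 5))
```

A sparse architecture keeps m close to n, so maintenance effort grows roughly linearly with the number of sources instead of quadratically.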

50 Conclusion
High precision is important for enterprise applications
  cost of overload versus opportunity loss
Semantic differences cause problems
  Today solved by human intermediate experts
  Will need automation support
  Tools so that expert knowledge is captured and maintainable
Scalability requires a thorough foundation
  Algebra provides composition, formal basis, delegation
  Formal composition supports maintenance
  Delegation of responsibility and authority enhances quality
Many research tasks left

51 Long Range Science Vision
Artificial Intelligence: knowledge mgmt, domain expertise, uncertainty
Databases: access, storage, algebras
Systems Engineering: analysis, documentation, costing
. . . . . .
Integration Science

