CoopIS2001 Trento, Italy
The Use of Machine-Generated Ontologies in Dynamic Information Seeking
Giovanni Modica, Avigdor Gal, Hasan M. Jamil
Motivating example
Preliminaries
Definition: An ontology is an explicit representation of a conceptualization (Gruber, 1993).
Conjecture I: Applications in a given domain base their information exchange on some (shared) underlying ontology.
Observation: Applications in a given domain use different ontology representations.
Conjecture II: Given an application A that utilizes an ontology representation $O_A$, and an ontology $O$, there exists an invertible mapping $f_A$ such that $f_A(O_A) = O$.
Problem description
Given two applications A and B, such that A utilizes an ontology representation $O_A$ and B utilizes an ontology representation $O_B$, introduce a mapping $f_{BA}$ such that $f_{BA}(O_B) = O_A$.
In a perfect world:
– $O$ is known.
– $f_A$ is known.
– $f_B$ is known.
$O_A = f_A^{-1}(f_B(O_B))$
Alas:
– $O$ is unknown. At best, an approximation of $O$ exists, in the form of a standard.
– $f_A$ and $f_B$ are unknown: lack of documentation, the mental state of a designer, etc.
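The perfect-world composition $f_{BA} = f_A^{-1} \circ f_B$ can be illustrated with a toy sketch. The term names and the dictionaries `f_A` and `f_B` below are hypothetical and stand in for mappings that, as the slide notes, are unknown in practice.

```python
# Toy illustration of O_A = f_A^{-1}(f_B(O_B)).
# All term names are made up for the example; real f_A and f_B are unknown.

# f_A and f_B map each application's terms to terms of the shared ontology O.
f_A = {"Pick-up Date": "pickup_date", "Return Date": "return_date"}
f_B = {"Pickup Day": "pickup_date", "Dropoff Day": "return_date"}


def invert(mapping):
    """Return the inverse mapping, or fail if the mapping is not invertible."""
    inverse = {value: key for key, value in mapping.items()}
    if len(inverse) != len(mapping):
        raise ValueError("mapping is not invertible")
    return inverse


def compose_b_to_a(f_a, f_b):
    """Build f_BA by sending each term of B through O and back into A."""
    f_a_inv = invert(f_a)
    return {term_b: f_a_inv[shared]
            for term_b, shared in f_b.items()
            if shared in f_a_inv}


f_BA = compose_b_to_a(f_A, f_B)
print(f_BA)
```

The sketch only works because both mappings are given; the rest of the talk addresses the case where they are not.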
Proposed solution
Given two applications A and B, such that A utilizes an ontology representation $O_A$ and B utilizes an ontology representation $O_B$, introduce a mapping $f_{BA}$ such that $f_{BA}$ depends on the ontology representation.
Each matching is associated with a degree of confidence:
– 0 identifies non-matching terms.
– 1 identifies a crisp matching.
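One way to picture a matching with a degree of confidence is a table of candidate terms with scores between 0 and 1. The sketch below is an illustration only: the function name `best_match` and the cut-off `threshold` are assumptions, not part of the presented tool.

```python
def best_match(candidates, threshold=0.5):
    """Pick the candidate term with the highest confidence.

    candidates: dict mapping a candidate term to a confidence in [0, 1].
    threshold: hypothetical cut-off below which no match is reported.
    Returns (term, confidence), or (None, 0.0) when nothing qualifies.
    """
    for term, confidence in candidates.items():
        if not 0.0 <= confidence <= 1.0:
            raise ValueError(f"confidence for {term!r} is outside [0, 1]")
    term, confidence = max(candidates.items(),
                           key=lambda item: item[1],
                           default=(None, 0.0))
    if term is None or confidence < threshold:
        return (None, 0.0)
    return (term, confidence)


# Confidence values here are illustrative.
scores = {"Pick-up location": 2 / 3, "Return Date": 0.0}
print(best_match(scores))
```

A confidence of 1 corresponds to a crisp matching and 0 to non-matching terms; intermediate values leave the decision to a cut-off or to the user.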
Ontology representation
Dynamic information seeking:
– HTML forms
  Labels
  Input fields
  Scripts
– Assumptions:
  Labels represent terms in an ontology (e.g., Pick-up Date).
  Input fields provide constraints on the value domains (e.g., {Day, 1, …, 31}).
  Scripts, among other things, suggest a precedence relationship (e.g., Pick-up Locations is required before selecting a Car Type).
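The assumption that labels carry terms and input fields carry value domains can be sketched with Python's standard `html.parser` module. This is a minimal illustration, not the tool's extraction logic: the class `FormExtractor` and the sample form are hypothetical, and the sketch relies on explicit `<label>` elements, which many real forms lack.

```python
from html.parser import HTMLParser


class FormExtractor(HTMLParser):
    """Collect label texts and the value domains of named form fields."""

    def __init__(self):
        super().__init__()
        self.labels = []          # candidate ontology terms
        self.fields = {}          # field name -> list of allowed values
        self._in_label = False
        self._label_text = []
        self._current_select = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "label":
            self._in_label = True
            self._label_text = []
        elif tag == "select":
            self._current_select = attrs.get("name", "")
            self.fields[self._current_select] = []
        elif tag == "option" and self._current_select is not None:
            value = attrs.get("value")
            if value is not None:
                self.fields[self._current_select].append(value)
        elif tag == "input":
            # Free-text inputs impose no enumerated value domain.
            self.fields.setdefault(attrs.get("name", ""), [])

    def handle_endtag(self, tag):
        if tag == "label":
            self._in_label = False
            self.labels.append("".join(self._label_text).strip())
        elif tag == "select":
            self._current_select = None

    def handle_data(self, data):
        if self._in_label:
            self._label_text.append(data)


SAMPLE_FORM = """
<form>
  <label for="day">Pick-up Date</label>
  <select name="day">
    <option value="1">1</option>
    <option value="2">2</option>
    <option value="31">31</option>
  </select>
  <label for="loc">Pick-up Location</label>
  <input type="text" name="loc">
</form>
"""

extractor = FormExtractor()
extractor.feed(SAMPLE_FORM)
print(extractor.labels)
print(extractor.fields)
```

Scripts and the precedence relationships they suggest are outside the scope of this sketch.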
Ontology representation
Conceptual modeling approach
Based on Bunge:
– Terms (things)
– Values
– Composition
– Precedence
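The four constructs above could be held in a small data structure such as the following sketch. The class `Term` and its field names are assumptions chosen for illustration; the talk does not specify an internal representation.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """A term (thing) with its values, composition, and precedence."""
    name: str
    values: list = field(default_factory=list)    # allowed value domain
    parts: list = field(default_factory=list)     # composition: sub-terms
    precedes: list = field(default_factory=list)  # names of terms that must follow


def term_names(term):
    """Yield the name of a term and, depth first, of all its sub-terms."""
    yield term.name
    for part in term.parts:
        yield from term_names(part)


# Hypothetical car-rental terms, following the examples in the slides.
pickup_date = Term("Pick-up Date",
                   parts=[Term("Day", values=list(range(1, 32)))])
pickup_location = Term("Pick-up Locations", precedes=["Car Type"])

print(list(term_names(pickup_date)))
```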
Ontology extraction and matching
[Figure: processing pipeline starting from a URL. Phase 1: Parsing. Phase 2: Labeling. Phase 3: Ontology Creation. Phase 4: Merging. Other labels in the diagram: HTML Parsing, DOM Tree, HTML Elements, Label Identification, FORM Elements, rules, Form Rendering, Submission, KB, Matching Algorithms, Thesaurus, Target/Candidate Ontology, Target Ontology, Candidate Ontology, Refined Ontology.]
Phase 1: Parsing
Phase 2: Labeling
Merging
Heuristics for the ontology merging (Frakes and Baeza-Yates, 1992):
Textual matching: Date ↔ date; Pickup ↔ pickup
Ignorable characters removal: *Country ↔ country
De-hyphenation: Pick-up ↔ Pickup; Pickup ↔ Pick up
Stop terms removal: Date of Return ↔ Return Date
  Stop terms: a, to, do, does, the, in, or, and, this, those, that, … etc.
Substring matching: Pickup Location Code ↔ Pick-up location (66%)
Content matching: Dropoff Day (1,..,31) ↔ Return Day (1,..,31) (100%)
  Dropoff ↔ Return
Thesaurus matching: Dropoff Location ↔ Return Location (100%)
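Several of the heuristics above can be sketched as a label normalizer followed by a word-overlap score. The scoring rule (shared words divided by the larger word count) is an assumption chosen because it reproduces the 66% figure in the slide; the tool's actual formula is not given. The stop term "of" is added because the "Date of Return" example implies it, and the one-entry thesaurus is hypothetical. Content matching on value domains and the "Pickup" ↔ "Pick up" case are not covered.

```python
import re

# Stop terms from the slide, plus "of" (implied by the "Date of Return" example).
STOP_TERMS = {"a", "to", "do", "does", "the", "in", "or", "and",
              "this", "those", "that", "of"}

# Hypothetical thesaurus: maps a word to its preferred synonym.
THESAURUS = {"dropoff": "return"}


def normalize(label):
    """Lowercase, de-hyphenate, drop ignorable characters and stop terms."""
    text = label.lower().replace("-", "")      # textual matching, de-hyphenation
    text = re.sub(r"[^a-z0-9 ]", " ", text)    # ignorable characters removal
    words = [THESAURUS.get(word, word) for word in text.split()]
    return [word for word in words if word not in STOP_TERMS]


def confidence(label_a, label_b):
    """Return a word-overlap score in [0, 1] for two labels."""
    words_a, words_b = set(normalize(label_a)), set(normalize(label_b))
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / max(len(words_a), len(words_b))


for pair in [("Date of Return", "Return Date"),
             ("*Country", "country"),
             ("Pickup Location Code", "Pick-up location"),
             ("Dropoff Location", "Return Location")]:
    print(pair, round(confidence(*pair), 2))
```

Using word sets ignores word order, which is what lets "Date of Return" and "Return Date" match fully once the stop term is removed.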
Phase 4: Merging
Preliminary Results
Two metrics are used for performance analysis (Frakes and Baeza-Yates, 1992):
– Recall (completeness)
– Precision (soundness)
Parameters:
– $t_r$: number of terms retrieved
– $t_m$: number of terms matched
– $t_e$: number of terms effectively matched
Recall: $R = t_m / t_r$
Precision: $P = t_e / t_m$
Preliminary Results
Example:
# of terms in Ontology1: 20
# of matches identified: 15
Recall: 75% (15/20)
# of effective matches: 10
Precision: 67% (10/15)
A third metric is used to combine recall and precision. For a precision value P, a recall value R and an importance measure b, the combined metric E is calculated as (Frakes and Baeza-Yates, 1992):
$E = 1 - \frac{(1 + b^2) P R}{b^2 P + R}$
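The example figures can be checked with a short sketch. It assumes recall is matched over retrieved terms, precision is effective matches over matched terms, and E is the standard E measure, $E = 1 - (1 + b^2)PR / (b^2 P + R)$; the function names and the choice $b = 1$ are illustrative.

```python
def recall(t_m, t_r):
    """Fraction of retrieved terms for which a match was identified."""
    return t_m / t_r


def precision(t_e, t_m):
    """Fraction of identified matches that are effective matches."""
    return t_e / t_m


def e_measure(p, r, b=1.0):
    """Standard E measure; lower is better, b weights recall against precision."""
    if p == 0 or r == 0:
        return 1.0
    return 1 - ((1 + b ** 2) * p * r) / (b ** 2 * p + r)


# Figures from the example: 20 terms, 15 matches identified, 10 effective.
R = recall(t_m=15, t_r=20)
P = precision(t_e=10, t_m=15)
E = e_measure(P, R, b=1.0)
print(f"recall={R:.2%} precision={P:.2%} E={E:.3f}")
```

With $b = 1$ the measure weights precision and recall equally; other values of b shift the emphasis between them.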
Summary and Future Work
We have introduced:
– Automatic ontology creation
– Automatic matching process
– Preliminary results
Future work is oriented towards:
– Incorporation of query facilities into the tool
– Automatic navigation of web sites for ontology extraction
– Dynamic translation of queries against the target ontology into queries against the multiple candidate ontologies