Algorithm to populate Telecom domain OWL-DL ontology with A-box object properties derived from Technical Support Documents 1 Kouznetsov A, 2 Shoebottom.

1 Algorithm to populate Telecom domain OWL-DL ontology with A-box object properties derived from Technical Support Documents
Kouznetsov A (1), Shoebottom B (2), Baker CJO (1)
(1) Department of Computer Science and Applied Statistics, University of New Brunswick, Saint John, Canada
(2) Innovatia, Inc, Saint John, Canada

2 Motivation: Why Ontology-Centric?
Problem: To respond to information requests in a timely manner, contact center workers need to search through many types of knowledge resources.
Challenge: Increase quality of service while decreasing contact center costs.
Solution: Use an ontology-centric platform:
- less escalation to more experienced workers
- less time spent resolving cases
- greatly reduced training time

3 Motivation: Why Text Mining?
Problem: Highly educated experts spend significant time populating the ontology.
Challenge: Reduce the workload.
Solution: Apply text mining, a semi-automatic method for extracting information (specifically named entities and their relations) from texts and populating a domain ontology.

4 Focus We are focused on the problem of accurately extracting and populating relations between the named entities and presenting them as object properties between A-box individuals in an OWL-DL ontology.

5 Populate A-box Object Property: Single Property
T-Box: Domain Class Man, Object Property hasSister, Range Class Woman
A-Box: Domain Instance Samuel, Range Instance Mary, hasSister ?

6 Populate A-box Object Property: Multi-properties
T-Box: Domain Class Man, Range Class Woman, Object Property hasSister, Object Property hasMother
A-Box: Domain Instance Samuel, Range Instance Mary, hasSister ? hasMother ?

7 A more complicated case…
Domain Instance Samuel, Range Instance Mary: hasSister ? hasMother ? hasSameLastName ?
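The population problem on the preceding slides can be sketched as enumerating every A-box candidate that the T-box permits. This is a minimal Python illustration, not the authors' Java implementation; `ObjectProperty` and `enumerate_candidates` are hypothetical names, and the class and instance data are the Man/Woman example from the slides.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class ObjectProperty:
    """A T-box object property with its domain and range classes."""
    name: str
    domain_class: str
    range_class: str


def enumerate_candidates(properties, instances_by_class):
    """Return (domain instance, property, range instance) triples allowed by the T-box."""
    candidates = []
    for prop in properties:
        domains = instances_by_class.get(prop.domain_class, [])
        ranges = instances_by_class.get(prop.range_class, [])
        for d, r in product(domains, ranges):
            candidates.append((d, prop.name, r))
    return candidates


tbox = [
    ObjectProperty("hasSister", "Man", "Woman"),
    ObjectProperty("hasMother", "Man", "Woman"),
]
abox_instances = {"Man": ["Samuel"], "Woman": ["Mary"]}
candidates = enumerate_candidates(tbox, abox_instances)
```

Each triple is only a candidate: whether Samuel hasSister Mary or hasMother Mary actually holds is what the scoring stage has to decide.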

8 Methodology
- Ontology-based information retrieval applies natural language processing (NLP) to link text segments, named entities, and relations between named entities to existing ontologies.
- The algorithm leverages a customized gazetteer list, including lists specific to object property synonyms.
- A-box property candidates are scored using functions of the distance between co-occurring terms.
- A-box property prediction and population are based on these scores (thresholds, fuzzy approach).

9 Main Implementation Tools
- Java
- GATE/JAPE
- OWLAPI

10 Semi-Automatic Ontology Populating Pipeline
Diagram labels: Source Documents (XML); Pre processing; Synonyms Lists; Text Segments Processing; Text Segments Separation (Sentences, Tables, Other Text Segments); Ontology unpopulated (OWL); Term List (Excel); Ontology Population (Named Entities, Single Relations, Multi Relations); Populated Ontology; Using Ontology (Reasoning, Visualizing, Visual Queries, Connecting Resources)

11 Populating Ontology Scoring Framework Co-occurrence Based Scores generator Relation Framework for A-box candidates extraction Candidate Decision Framework Decision module Reasoning Ontology Scores Focus Labelled Data Tres

12 Co-occurrence Based Scores Generator (Light version)
Diagram labels: A-box Candidate; All related content; Scores; Relations Framework; Relation Object; Tokenizer; Gazetteer; Score calculator; Integrator; Fragments Processor; Synonyms List

13 Generation of Scores
Relation Collection: framework to process Relation objects.
Relation Object: integrates an object property with:
- all types of related text fragments
- ontology objects
- score processing intermediate and final results
Relations are identified as: Domain Class : Domain Instance : Object Property : Range Class : Range Instance
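The five-part relation identifier can be sketched as a small record type. This is an illustrative Python sketch; `RelationKey` is a hypothetical name, and the sample string is the colon-separated identifier shown on the Example Sentence Type 2 slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelationKey:
    """Domain Class : Domain Instance : Object Property : Range Class : Range Instance."""
    domain_class: str
    domain_instance: str
    object_property: str
    range_class: str
    range_instance: str

    @classmethod
    def parse(cls, text):
        # Assumes the five fields are separated by colons and contain none themselves.
        parts = [part.strip() for part in text.split(":")]
        if len(parts) != 5:
            raise ValueError(f"expected 5 colon-separated fields, got {len(parts)}")
        return cls(*parts)


key = RelationKey.parse(
    "Telecommunications_Chassis:Chassis:hasChassis_Components:"
    "Telecommunications_Chassis_Power_Supply:Power_Supply"
)
```

A frozen dataclass is hashable, so such keys could index a dictionary of collected evidence and scores per candidate.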

14 Scores Generator: Details
Score Calculator: calculates scores for the text fragments associated with the Relation.
- The current version is based on the distance between co-occurring entities and the number of text fragments with co-occurrence.
- Works with the Text Fragments Processor and Integrator.

15 2-terms and 3-terms scoring system
Diagram labels: Tokenizer; Gazetteer; Score Processor; Domain Synonyms list; Range Synonyms list; Object Property Synonyms list; Tokenized sentence; sentence score
Legend: Legacy (2 terms) system; Modified/Added in new (3 terms) system

16 Multiple Formats Score Generation
Technical documentation contains knowledge displayed in multiple formats, each requiring different processing subroutines:
- Table Processing
- Sentence Processing
- Other segments

17 Extensible Data Model
Corpus; Document (Doc ID); Segment
Table Segment: Data Cell (ID, Content), Row Header (ID, Content), Column Header (ID, Content), Table Header (ID, Content)
Text Segment: Sentence (ID, Content)
Options: Sections, Paragraphs, Bullet lists, Headings
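The data model on this slide can be sketched with plain record types. The class names below mirror the slide labels, but the nesting (a corpus holds documents, a document holds table and text segments) is an assumption read off the flattened diagram, and the sample content is invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    """An (ID, Content) pair, as attached to every element on the slide."""
    id: str
    content: str


@dataclass
class TableSegment:
    data_cell: Item
    row_header: Item
    column_header: Item
    table_header: Item


@dataclass
class TextSegment:
    sentence: Item


@dataclass
class Document:
    doc_id: str
    table_segments: List[TableSegment] = field(default_factory=list)
    text_segments: List[TextSegment] = field(default_factory=list)


@dataclass
class Corpus:
    documents: List[Document] = field(default_factory=list)

    def sentences(self):
        """Flatten every sentence in the corpus into one list of strings."""
        return [seg.sentence.content
                for doc in self.documents
                for seg in doc.text_segments]


# Hypothetical sample data.
doc = Document(doc_id="doc-1")
doc.text_segments.append(TextSegment(Item("s1", "The chassis has two power supplies.")))
doc.table_segments.append(TableSegment(
    data_cell=Item("c1", "2"),
    row_header=Item("r1", "Power supplies"),
    column_header=Item("h1", "Quantity"),
    table_header=Item("t1", "Chassis components"),
))
corpus = Corpus([doc])
```

Further segment kinds (sections, paragraphs, bullet lists, headings) would be added as extra lists on `Document`, which is one way to read "extensible" here.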

18 A-Box Prop. Population
Inputs: Text Mining corpus; Gazetteer List; Ontology processing
T-Box Obj. Properties (102)
A-Box property candidates list: A-Box Obj. Properties (399)
- Properties with occurrence of domain or range Individuals (256)
- Properties with co-occurrence of domain and range Individuals (143)
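The split of candidates by evidence type can be sketched as follows. The counts on the slide (256 + 143 = 399) suggest the two groups are disjoint, which this sketch assumes; the substring matching, the function names, and the sample candidates are all hypothetical simplifications of the gazetteer lookup.

```python
def occurs(synonyms, text):
    """True if any synonym appears in the text (crude case-insensitive substring match)."""
    lowered = text.lower()
    return any(s.lower() in lowered for s in synonyms)


def partition_candidates(candidates, segments):
    """Split candidates into co-occurrence, single-occurrence, and no-evidence groups."""
    co_occurrence, single_occurrence, no_evidence = [], [], []
    for name, domain_synonyms, range_synonyms in candidates:
        if any(occurs(domain_synonyms, s) and occurs(range_synonyms, s) for s in segments):
            co_occurrence.append(name)
        elif any(occurs(domain_synonyms, s) or occurs(range_synonyms, s) for s in segments):
            single_occurrence.append(name)
        else:
            no_evidence.append(name)
    return co_occurrence, single_occurrence, no_evidence


segments = [
    "The chassis has two power supplies.",
    "Install the fan tray.",
    "Use the screws provided.",
]
candidates = [
    ("chassis-hasComponent-powerSupply", ["chassis"], ["power supply", "power supplies"]),
    ("chassis-hasAccessory-screws", ["chassis"], ["screws"]),
    ("router-hasPort-port", ["router"], ["port"]),
]
co, single, none_found = partition_candidates(candidates, segments)
```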

19 A-Box scoring: Evidences for A-box Obj. Property candidates
Current A-box Object Property Candidate
Evidences for Current A-box (co-occurrence of Domain and Range):
- Text Segments: Sentence (ID, Content)
- Table Segments: Data Cell (ID, Content), Row Header (ID, Content), Column Header (ID, Content), Table Header (ID, Content)
Evidences for Current A-box (occurrence of Domain or Range):
- Text Segments: Sentence (ID, Content)
- Table Segments: Data Cell (ID, Content), Row Header (ID, Content), Column Header (ID, Content), Table Header (ID, Content)

20 Table Segments: Primary Scoring
Table Segment: Data Cell (ID, Content), Row Header (ID, Content), Column Header (ID, Content), Table Header (ID, Content)
A-Box scoring: Current A-box Object Property Candidate (Domain, Property, Range)

21 Table Segments: Secondary Scoring
Table Segment: Data Cell (ID, Content), Row Header (ID, Content), Column Header (ID, Content), Table Header (ID, Content)
A-Box scoring: Current A-box Object Property Candidate (Domain, Property, Range)

22 Sentence Scoring
A-box object property score for a sentence:
SentenceScore = 1/(distance+1) + Bonus
Integrated object property score over all related sentences:
IntegratedScore = SUM(SentenceScore)
The integrated score is then summed with the table scores.
Normalized object property score:
NormalizedScore = IntegratedScore/Norm
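The three formulas on this slide translate directly into code. This is a minimal Python sketch with hypothetical function names; how the table scores are added and how Norm is obtained are left to the caller.

```python
def sentence_score(distance, bonus=0.0):
    """SentenceScore = 1/(distance+1) + Bonus."""
    return 1.0 / (distance + 1.0) + bonus


def integrated_score(evidence):
    """IntegratedScore = sum of SentenceScore over (distance, bonus) pairs."""
    return sum(sentence_score(distance, bonus) for distance, bonus in evidence)


def normalized_score(integrated, norm):
    """NormalizedScore = IntegratedScore / Norm."""
    return integrated / norm


# Two sentences with distances 4 and 19 and no bonus: 0.2 + 0.05.
total = integrated_score([(4, 0.0), (19, 0.0)])
```

The score decays quickly with distance, so without a bonus a single sentence contributes at most 1.0 (at distance 0), while any property-synonym bonus dominates the distance term.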

23 Sentence scoring
Score = 1/(distance+1) + Bonus
Legend: D = Domain Synonym, R = Range Synonym, P = Object Property Synonym
Type 1 (D R): Distance: 1000, Bonus = 0, Score = 1/(1000+1)+0 = 0.000999
Type 2 (123D4R): Distance: 4, Bonus = 0, Score = 1/(4+1)+0 = 0.2
Type 3 (123D4R6P): Distance: 6, Bonus = 3, Score = 1/(6+1)+3 = 3.14
Type 4 (12PD4R): Distance: 4, Bonus = 10, Score = 1/(4+1)+10 = 10.2

24 Example Sentence Type 1
Pattern 1 (D R): Distance: 1000, Bonus = 0, Score = 1/(1000+1)+0 = 0.000999
Sentence before cleaning: [" Rotate the insert/extract levers to eject the 8660 SDM from the chassis.]
Final Score=9.99000999000999E-4, Best Bonus=0.0, Final Distance=1000.0
Relation: Telecommunications_Chassis:8010co_Chassis:hasChassis_Shipping_Accessories:Telecommunications_Chassis_Screws:Screws
Property Synonyms: need have require has
Domain Synonyms: 8010co chassis 8010co Chassis 8010 CO chassis 8010co 8010CO chassis
Range Synonyms: Screws screws

25 Example Sentence Type 2
Pattern 2: 123D4R
Sentence after cleaning: In a chassis that includes two power supplies in a non redundant power configuration, you must start both restrictions dual power supplies power supply units within 2 seconds of each other.
Final Score=0.05, Best Bonus=0.0, Final Distance=19
Relation: Telecommunications_Chassis:Chassis:hasChassis_Components:Telecommunications_Chassis_Power_Supply:Power_Supply
Property Synonyms: have has
Domain Synonyms: chassis switch chassis 8000 series Chassis CO chassis
Range Synonyms: Power Supply transformer power supply power module Power supply
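The slides do not state how distance is counted. One plausible reading, sketched below in Python, is the number of tokens lying between the nearest domain-synonym match and range-synonym match. The tokenizer, the function names, and the way the synonym lists are split into phrases are all assumptions. Under this reading the example sentence yields a distance of 19 and a score of 0.05, which matches the values reported on the slide.

```python
import re


def tokenize(text):
    """Lower-case and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def find_phrase(tokens, phrase):
    """Return (start, end) token indices for each occurrence of the phrase."""
    words = tokenize(phrase)
    if not words:
        return []
    n = len(words)
    return [(i, i + n - 1) for i in range(len(tokens) - n + 1)
            if tokens[i:i + n] == words]


def min_gap(tokens, domain_synonyms, range_synonyms):
    """Smallest count of tokens between a domain match and a range match, or None."""
    domain_spans = [s for syn in domain_synonyms for s in find_phrase(tokens, syn)]
    range_spans = [s for syn in range_synonyms for s in find_phrase(tokens, syn)]
    gaps = []
    for d_start, d_end in domain_spans:
        for r_start, r_end in range_spans:
            if d_end < r_start:
                gaps.append(r_start - d_end - 1)
            elif r_end < d_start:
                gaps.append(d_start - r_end - 1)
    return min(gaps) if gaps else None


sentence = ("In a chassis that includes two power supplies in a non redundant power "
            "configuration, you must start both restrictions dual power supplies "
            "power supply units within 2 seconds of each other.")
# Assumed segmentation of the synonym lists shown on the slide.
domain_synonyms = ["chassis", "switch chassis", "8000 series Chassis", "CO chassis"]
range_synonyms = ["Power Supply", "transformer", "power supply", "power module"]

distance = min_gap(tokenize(sentence), domain_synonyms, range_synonyms)
score = 1.0 / (distance + 1.0)
```

Note that the plural "power supplies" does not match the singular synonym here, so the nearest match is the later "power supply"; a stemming gazetteer would give a much smaller distance.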

26 Example Sentence Type 4
Pattern 4: 12PD4R
Sentence after cleaning: In a chassis that includes two power supplies in a non redundant power configuration, you must start both restrictions dual power supplies power supply units within 2 seconds of each other.
Final Score=10.05, Best Bonus=10.0, Final Distance=19
Relation: Telecommunications_Chassis_Power_Supply:Power_Supply:isPart_of_Chassis:Telecommunications_Chassis:Chassis
Property Synonyms: used in include
Domain Synonyms: Power Supply transformer power supply power module Power supply
Range Synonyms: chassis switch chassis 8000 series Chassis CO chassis

27 Bonus Calculation
Bonus = Bonus Constant * Number of tokens in property
Patterns shown: 12PD4R6, 123DR6P
Distance: 6, Bonus Constant = 10, Tokens in Property = 2, Score = 1/(6+1)+2*10 = 20.14
Distance: 6, Bonus Constant = 10, Tokens in Property = 1, Score = 1/(6+1)+1*10 = 10.14
Sentence Example: Device X does not support Device Y
Object Property | Number of Tokens | Obtained Score
Support | 1 | 1/(3+1)+1*10 = 10.25
Not Support | 2 | 1/(3+1)+2*10 = 20.25
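The bonus rule can be checked against the worked numbers on this slide. This is a minimal Python sketch; the function names are hypothetical, and counting tokens by splitting on whitespace is an assumption.

```python
BONUS_CONSTANT = 10.0  # value used in the slide's examples


def property_bonus(property_synonym, bonus_constant=BONUS_CONSTANT):
    """Bonus = Bonus Constant * number of tokens in the matched property synonym."""
    return bonus_constant * len(property_synonym.split())


def score_with_property(distance, property_synonym=None):
    """Score = 1/(distance+1) + Bonus, with no bonus when no property synonym matched."""
    bonus = property_bonus(property_synonym) if property_synonym else 0.0
    return 1.0 / (distance + 1.0) + bonus


# "Device X does not support Device Y", distance 3 as on the slide.
support = score_with_property(3, "support")
not_support = score_with_property(3, "not support")
```

Because longer property synonyms earn a larger bonus, "not support" outscores "support" for the same sentence, which lets the more specific (here, negated) property win.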

28 Normalization
Norm coefficient for A-box object property:
Norm = Log(1.0 + (NSD + 1.0/Cd) * (NSR + 1.0/Cr))
NSD: number of sentences in which the domain occurred
Cd: domain synonyms list cardinality
NSR: number of sentences in which the range occurred
Cr: range synonyms list cardinality
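The norm coefficient can be written as a one-line function. Two points are assumptions, since the slide does not settle them: the logarithm is taken as natural, and the expression is read with ordinary operator precedence, that is NSD + (1.0/Cd) rather than (NSD + 1.0)/Cd.

```python
import math


def norm_coefficient(nsd, cd, nsr, cr):
    """Log(1.0 + (NSD + 1.0/Cd) * (NSR + 1.0/Cr)), natural log assumed.

    nsd, nsr: number of sentences in which the domain / range occurred.
    cd, cr:   cardinality of the domain / range synonym lists (must be > 0).
    """
    return math.log(1.0 + (nsd + 1.0 / cd) * (nsr + 1.0 / cr))


# Hypothetical counts: domain seen in 1 sentence, range in 1, one synonym each.
example_norm = norm_coefficient(nsd=1, cd=1, nsr=1, cr=1)
```

Under this reading the coefficient grows with how often the domain and range terms appear at all, so frequently mentioned entities need more co-occurrence evidence to reach the same normalized score.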

29 Gold Standard and Evaluation Framework
Population pipeline labels: Source Documents (XML); Pre processing; Synonyms Lists; Text Segments Processing; Text Segments Separation (Sentences, Tables, Bullet Lists); Ontology unpopulated (OWL); Term List (Excel); Ontology Population (Named Entities, Single Relations, Multi Relations); Populated Ontology; Using Ontology (Reasoning, Visualizing, Visual Queries, Connecting Resources)
Evaluation labels: A-Box Ontology; T-Box Ontology; Labels; Evaluation Report; Populate Ontology; Prediction evaluation Framework; Evaluate predicted Properties / Update DB; Golden Standard Database; Import labels; Knowledge Engineer

30 Thresholds: Decision Boundary
- All scores for each A-box property candidate are summed over the eligible sources of evidence for the A-box in question
- Threshold in use
- Trade-off: Recall vs. Precision
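The threshold decision and its recall/precision trade-off can be sketched as below. The candidate names, scores, and gold labels are invented toy data, and the function names are hypothetical; only the idea of accepting candidates whose summed score reaches a threshold comes from the slide.

```python
def predict(scores, threshold):
    """Accept a candidate as a positive when its summed score reaches the threshold."""
    return {candidate: score >= threshold for candidate, score in scores.items()}


def precision_recall(predicted, gold):
    """Positive-class precision and recall against gold-standard labels."""
    tp = sum(1 for c, p in predicted.items() if p and gold[c])
    fp = sum(1 for c, p in predicted.items() if p and not gold[c])
    fn = sum(1 for c, p in predicted.items() if not p and gold[c])
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy data, not taken from the slides.
scores = {"a": 10.2, "b": 3.1, "c": 0.2, "d": 0.05}
gold = {"a": True, "b": False, "c": True, "d": False}

strict = precision_recall(predict(scores, threshold=5.0), gold)
lenient = precision_recall(predict(scores, threshold=0.1), gold)
```

On this toy data the strict threshold gives precision 1.0 with recall 0.5, while the lenient one reaches recall 1.0 at precision 2/3, which is the same kind of trade-off the results slides report.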

31 Results for Tables: Baseline result
Focus on Positive class Recall and Positive class Precision
- Class of interest (Positive class)
- Recall = 0.80
- Precision = 0.85

32 Results for Tables: Continued
Focus on Positive class Precision
- Class of interest (Positive class)
- Recall = 0.25
- Precision = 1.0

33 Results for Tables: Continued
Focus on Positive class Recall
- Class of interest (Positive class)
- Recall = 1.0
- Precision = 0.775

34 Results for Sentences
Focus on Positive class Precision
- Class of interest (Positive class)
- Recall = 0.14
- Precision = 1.0

35 Results for Sentences and Tables
Focus on Positive class Precision
- Class of interest (Positive class)
- Recall = 0.4
- Precision = 1.0
- Synergetic effect of using Sentences and Tables (with respect to Precision = 1.0):
  Recall (sentences) = 0.14
  Recall (tables) = 0.25
  Recall (sentences & tables) = 0.4

36 Advantages
- Improve Quality of Knowledge Base
- Managing the argumentation process KB vs KE
- Iterative improvement of accuracy
- Tier 1 doing Tier 2 task (improve service)
- Tier 1 (high precision): KB query
- Tier 2 (high recall): knowledge integration
- Facilitate information processing without KE
- Reduce workload (saving)

37 Improve Quality of Knowledge Base
Offline task by Knowledge Engineer
Disambiguation: the expert can pay special attention to any significant inconsistency between human and machine outputs, such as highly scored A-box candidates labeled as negatives.
Human Expert & Machine Committee vs. single human expert

38 Real Time Integration of New Evidence
Online, by call centre worker, at knowledge use stage:
- Extracting additional object properties from new documents for emergency cases
- High Positive Precision focused scenario
Offline, by senior call centre worker, at knowledge use stage:
- Extracting additional object properties from new documents for questions not answered online
- High Positive Recall focused scenario

39 Reduce Workload
Online and Offline
Automatically extracted, evidence-ranked solutions with a stated level of confidence

40 Gold Standard Corpus and Evaluation Framework
Population pipeline labels: Source Documents (XML); Pre processing; Synonyms Lists; Text Segments Processing; Text Segments Separation (Sentences, Tables, Bullet Lists); Ontology unpopulated (OWL); Term List (Excel); Ontology Population (Named Entities, Single Relations, Multi Relations); Populated Ontology; Using Ontology (Reasoning, Visualizing, Visual Queries, Connecting Resources)
Evaluation labels: A-Box Ontology; T-Box Ontology; Labels; Evaluation Report; Populate Ontology; Prediction evaluation Framework; Evaluate predicted Properties / Update DB; Golden Standard Database; Import labels; Knowledge Engineer

41 Future Work: Extend Literature Scheme
- Sections
- Paragraphs
- Bullet Lists
- Connect with Headings and Topics

