Data-Extraction Ontology Generation by Example Yuanqiu (Joe) Zhou Data Extraction Group Brigham Young University Sponsored by NSF.

1 Data-Extraction Ontology Generation by Example Yuanqiu (Joe) Zhou Data Extraction Group Brigham Young University Sponsored by NSF

2 Motivation
- Semi-structured Web data must be extracted before it can be manipulated further.
- In contrast to other wrapper-generation techniques, the BYU ontology-based data-extraction technique is resilient to changes in page layout.
- The by-example approach lets ordinary users generate extraction ontologies easily.

3 Web-based System GUI [screenshot; form values shown: Canon, PowerShot S40, 4.0, 1600 x 1200, 1024 x 768, 640 x 480]

4 Architecture [diagram; components: Data Frame Library, User Defined Form, System GUI, Sample Pages, Ontology Generator, Extraction Engine, Test Pages, Populated Database, Extraction Ontology]

5 Extraction Ontology
- Object and Relationship Sets and Constraints
- Extraction Patterns
- Keywords and Context Expressions

6 Ontology Generation: Object and Relationship Sets and Constraints
Base [0:1] A [1:*]; Base [0:2] B [1:*]; Base [0:2] D1 [1:*] D2 [1:*]; Base [0:*] C [1:*]; Base [0:*] E1 [1:*] E2 [1:*]

7 Ontology Generation: Object and Relationship Sets and Constraints
A [0:1] F [1:*]; B1 [0:1] G [1:*]; B2 [0:1] H [1:*] I [1:*]; … B1, B2 : B

8 Ontology Generation: Extraction Patterns
- Data Frame Library
  - Lexicons
  - Synonym dictionaries or thesauri
  - Regular expressions
- Matching extraction patterns:
  - Only one
  - More than one (use extraction pattern filters)
  - None (create one)
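The three matching outcomes on this slide can be illustrated with a small sketch. Everything here is an assumption for illustration: the miniature `DATA_FRAME_LIBRARY`, the function names, and the fallback of escaping the sample value literally when no pattern matches. The slides do not say how filtering or pattern creation is actually done.

```python
import re

# Hypothetical miniature data frame library: pattern name -> regular expression.
DATA_FRAME_LIBRARY = {
    "Number": r"\d+(\.\d+)?",
    "Decimal": r"\d+\.\d+",
    "Resolution": r"\d{3,4} x \d{3,4}",
    "Brand": r"Nikon|Canon|Olympus|Minolta|Sony",
}

def matching_patterns(sample_value):
    """Names of all library patterns that match the whole sample value."""
    return [name for name, pattern in DATA_FRAME_LIBRARY.items()
            if re.fullmatch(pattern, sample_value)]

def choose_pattern(sample_value):
    """Classify a user-supplied sample value into the slide's three cases."""
    matches = matching_patterns(sample_value)
    if len(matches) == 1:
        return ("use", matches[0])        # only one pattern matches
    if matches:
        return ("filter", matches)        # more than one: needs a filter step
    # None: as a stand-in for "create one", fall back to a literal pattern.
    return ("create", re.escape(sample_value))

print(choose_pattern("Canon"))          # one match
print(choose_pattern("4.0"))            # several matches
print(choose_pattern("PowerShot S40"))  # no match
```

With this toy library, "Canon" falls in the first case, "4.0" in the second (it is both a Number and a Decimal), and "PowerShot S40" in the third.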

9 Ontology Generation: Keywords and Context Expressions
- 3.5x optical zoom (2.5x digital)
- a superior 4x Optical Zoom Nikkor lens, plus 4x stepless digital zoom
- optical 3X /digital 6X zoom
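The three sample phrases show why a bare number pattern is not enough: each phrase contains two zoom values, and only the nearby keyword tells them apart. The sketch below is one possible way to combine a context expression with a keyword, picking the value closest to the keyword. The nearest-keyword heuristic, the case-insensitive matching, and the names `gap` and `extract_zoom` are assumptions, not the method used by the BYU extraction engine.

```python
import re

# Context expression in the spirit of the OpticalZoom rule, made
# case-insensitive (an assumption) so that "3X" matches as well as "3x".
VALUE_IN_CONTEXT = re.compile(r"\b(\d(?:\.\d)?)x\b", re.IGNORECASE)

def gap(a, b):
    """Number of characters between two (start, end) spans; 0 if they overlap."""
    if a[1] <= b[0]:
        return b[0] - a[1]
    if b[1] <= a[0]:
        return a[0] - b[1]
    return 0

def extract_zoom(text, keyword):
    """Return the zoom value closest to any occurrence of the keyword, or None."""
    keyword_spans = [m.span() for m in
                     re.finditer(rf"\b{keyword}\b", text, re.IGNORECASE)]
    candidates = list(VALUE_IN_CONTEXT.finditer(text))
    if not keyword_spans or not candidates:
        return None
    best = min(candidates,
               key=lambda m: min(gap(m.span(), k) for k in keyword_spans))
    return best.group(1)

samples = [
    "3.5x optical zoom (2.5x digital)",
    "a superior 4x Optical Zoom Nikkor lens, plus 4x stepless digital zoom",
    "optical 3X /digital 6X zoom",
]
for s in samples:
    print(extract_zoom(s, "optical"), extract_zoom(s, "digital"))
# prints:
# 3.5 2.5
# 4 4
# 3 6
```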

10 User Defined Forms
Object and Relationship Sets and Constraints:
DigitalCamera [-> object]; DigitalCamera [0:1] Brand [1:*]; DigitalCamera [0:1] Model [1:*]; DigitalCamera [0:1] CCDResolution [1:*]; DigitalCamera [0:1] ImageResolution [1:*]; DigitalCamera [0:1] Zoom [1:*]; Zoom [0:1] DigitalZoom [1:*]; Zoom [0:1] OpticalZoom [1:*]
Sample Web Page values: PowerShot G2, Canon, 4.0, 2272 x 1074, 3, 2
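To make the constraint notation concrete, the sketch below parses a declaration such as `DigitalCamera [0:1] Brand [1:*]` into a small data structure. It assumes that `[min:max]` is a participation constraint on the object set written just before it, with `*` meaning unbounded; the class and function names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class Participation:
    object_set: str
    minimum: int
    maximum: Optional[int]  # None stands for "*", i.e. no upper bound

# One "Name [min:max]" item inside a relationship-set declaration.
ITEM = re.compile(r"(\w+)\s*\[(\d+):(\d+|\*)\]")

def parse_relationship(declaration: str) -> List[Participation]:
    """Split a declaration into its object sets and their constraints."""
    return [
        Participation(name, int(lo), None if hi == "*" else int(hi))
        for name, lo, hi in ITEM.findall(declaration)
    ]

for part in parse_relationship("DigitalCamera [0:1] Brand [1:*]"):
    print(part)
```

Under this reading, a DigitalCamera takes part in the relationship at most once (it has at most one Brand), while each Brand is associated with one or more cameras.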

11 Extraction Ontology
DigitalCamera [-> object];
DigitalCamera [0:1] Brand [1:*];
DigitalCamera [0:1] ImageResolution [1:*];
DigitalCamera [0:1] Zoom [1:*];
DigitalCamera [0:1] CCDResolution [1:*];
Zoom [0:1] OpticalZoom [1:*];
Brand matches [10]
  constant
    { extract "\bNikon\b"; },
    { extract "\bCanon\b"; },
    { extract "\bOlympus\b"; },
    { extract "\bMinolta\b"; },
    { extract "\bSony\b"; };
end;
CCDResolution matches [20]
  constant { extract "\b\d(\.\d{1,2})?\b"; };
  keyword "\bMegapixel\b", "\bCCD\b", "\bCCD Resolution\b";
end;
OpticalZoom matches [10]
  constant { extract "\b\d(\.\d)?"; context "\b\d(\.\d)?(x)\b"; };
  keyword "\boptical\b";
end;
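As a rough illustration of how the Brand and CCDResolution rules behave, the sketch below carries their regular expressions over to Python's `re` module. The sample sentences, the function name `extract_record`, and the choice to treat keywords as a simple gate (a value is accepted only if some keyword appears anywhere in the text) are assumptions; the slides do not describe how the extraction engine weighs keywords.

```python
import re

# Regular expressions carried over from the Brand and CCDResolution rules.
BRAND = re.compile(r"\b(?:Nikon|Canon|Olympus|Minolta|Sony)\b")
CCD_VALUE = re.compile(r"\b\d(\.\d{1,2})?\b")
CCD_KEYWORDS = [re.compile(p) for p in
                (r"\bMegapixel\b", r"\bCCD\b", r"\bCCD Resolution\b")]

def extract_record(text):
    """Apply both rules to a text snippet and return the values found."""
    record = {}
    brand = BRAND.search(text)
    if brand:
        record["Brand"] = brand.group(0)
    # Assumed gating: only look for a CCD value if a keyword is present.
    if any(k.search(text) for k in CCD_KEYWORDS):
        value = CCD_VALUE.search(text)
        if value:
            record["CCDResolution"] = value.group(0)
    return record

print(extract_record("Canon PowerShot G2 with a 4.0 Megapixel CCD"))
print(extract_record("Canon PowerShot S40"))
```

The word boundaries keep the digits inside "G2" and "S40" from being mistaken for a resolution, and the second sentence yields no CCDResolution because it contains none of the keywords.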

12 Measurements
- How much of the ontology was generated, relative to how much could have been generated?
- How many of the generated components should not have been generated?
- How do the precision and recall of data extracted with a system-generated ontology compare with those of an expert-generated ontology?
- How many sample pages are necessary for acceptable system performance?
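The first two questions correspond to recall and precision over ontology components, and the third uses the same ratios over extracted values. A minimal sketch of the standard definitions follows; the example sets are made up for illustration and are not results from the project.

```python
def precision_recall(extracted, expected):
    """Precision and recall of a set of extracted items against a reference set."""
    correct = len(extracted & expected)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(expected) if expected else 0.0
    return precision, recall

# Illustrative data only: (object set, value) pairs for a single page.
expected = {("Brand", "Canon"), ("Model", "PowerShot G2"),
            ("OpticalZoom", "3"), ("DigitalZoom", "2")}
extracted = {("Brand", "Canon"), ("Model", "PowerShot G2"),
             ("OpticalZoom", "2")}

precision, recall = precision_recall(extracted, expected)
print(f"precision={precision:.2f} recall={recall:.2f}")
# prints: precision=0.67 recall=0.50
```

Running the same function once on the output of a system-generated ontology and once on that of an expert-generated ontology gives the comparison the third question asks for.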

13 Contributions
- Proposes a by-example approach to semi-automatically generate data-extraction ontologies
- Constructs a Web-based tool to generate data-extraction ontologies

