Data-Extraction Ontology Generation by Example Yuanqiu (Joe) Zhou Data Extraction Group Brigham Young University Sponsored by NSF.


1 Data-Extraction Ontology Generation by Example Yuanqiu (Joe) Zhou Data Extraction Group Brigham Young University Sponsored by NSF

2 Motivation  Semi-structured Web data need to be extracted for further manipulation.  In contrast to other wrapper-generation techniques, the BYU ontology-based data-extraction technique is resilient.  The by-example approach helps ordinary users generate ontologies easily.

3 Web-based System GUI [Screenshot: form with sample values Canon, PowerShot S40, 4.0, 1600 x 1200, 1024 x 768, 640 x 480]

4 Architecture [Diagram components: Data Frame Library, User-Defined Form, System GUI, Sample Pages, Ontology Generator, Extraction Ontology, Extraction Engine, Test Pages, Populated Database]

5 Extraction Ontology  Object and Relationship Sets and Constraints  Extraction Patterns  Keywords  Context Expressions

6 Ontology Generation: Object and Relationship Sets and Constraints [Diagram labels: Base, A, B, C, D1, D2, E1, E2] Base [0:1] A [1:*]; Base [0:2] B [1:*]; Base [0:*] C [1:*]; Base [0:2] D1 [1:*] D2 [1:*]; Base [0:*] E1 [1:*] E2 [1:*]
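
The constraints on this slide suggest a simple reading: the upper bound on the Base side follows how many values a form field accepts (one, two, or unlimited), and grouped columns such as D1 D2 share one relationship set. The Python sketch below encodes that reading. The `FormField` type and `constraint_for` helper are hypothetical names, and the mapping is inferred from the slide's examples rather than taken from the actual ontology generator.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class FormField:
    """Hypothetical description of one form element attached to a base object."""
    names: Tuple[str, ...]      # one name, or several for grouped columns
    max_values: Optional[int]   # None means the field accepts any number of values


def constraint_for(base: str, field: FormField) -> str:
    """Render one relationship set with participation constraints."""
    upper = "*" if field.max_values is None else str(field.max_values)
    parts = [f"{base} [0:{upper}]"]
    parts += [f"{name} [1:*]" for name in field.names]
    return " ".join(parts)


# The five field shapes that appear on the slide (assumed interpretation).
FIELDS = [
    FormField(("A",), 1),
    FormField(("B",), 2),
    FormField(("C",), None),
    FormField(("D1", "D2"), 2),
    FormField(("E1", "E2"), None),
]

if __name__ == "__main__":
    for f in FIELDS:
        print(constraint_for("Base", f))
```

Run as a script, it prints one constraint per field in the same notation the slide uses.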

7 Ontology Generation: Object and Relationship Sets and Constraints [Diagram labels: Base, A, B, …, A, B1, B2, F, G, H, I] B1, B2 : B; A [0:1] F [1:*]; B1 [0:1] G [1:*]; B2 [0:1] H [1:*] I [1:*]

8 Object and Relationship Sets and Constraints [Figure: sample Web page and user-created form. Form fields: Digital Camera, Brand, Model, CCD Resolution, Image Resolution, Zoom (Optical Zoom, Digital Zoom). Sample values: PowerShot G2, Canon, 4.0, 2272 x 1074, 3, 2] DigitalCamera [-> object]; DigitalCamera [0:1] Brand [1:*]; DigitalCamera [0:1] Model [1:*]; DigitalCamera [0:1] CCDResolution [1:*]; DigitalCamera [0:1] ImageResolution [1:*]; DigitalCamera [0:1] Zoom [1:*]; Zoom [0:1] DigitalZoom [1:*]; Zoom [0:1] OpticalZoom [1:*]
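
One way to picture how a nested form becomes the constraint list on this slide is a recursive walk over the form's fields. The sketch below is only an illustration: the `FORM` dictionary and the `constraints` function are hypothetical, and it assumes every field is single-valued (hence `[0:1]` everywhere), which matches this slide's example but not forms in general.

```python
from typing import Dict, List

# Hypothetical nested form: each key is a field, each value holds its sub-fields.
FORM: Dict[str, dict] = {
    "DigitalCamera": {
        "Brand": {},
        "Model": {},
        "CCDResolution": {},
        "ImageResolution": {},
        "Zoom": {"DigitalZoom": {}, "OpticalZoom": {}},
    }
}


def constraints(form: Dict[str, dict]) -> List[str]:
    """Emit the object-set marker plus one binary relationship per form field."""
    root, children = next(iter(form.items()))
    lines = [f"{root} [-> object]"]

    def walk(parent: str, fields: Dict[str, dict]) -> None:
        for name, sub_fields in fields.items():
            # Assumption: every field holds at most one value per parent.
            lines.append(f"{parent} [0:1] {name} [1:*]")
            walk(name, sub_fields)

    walk(root, children)
    return lines


if __name__ == "__main__":
    print(";\n".join(constraints(FORM)) + ";")
```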

9 Ontology Generation: Extraction Patterns  Data Frame Library: lexicons; synonym dictionaries or thesauri; regular expressions  Matching extraction patterns: only one (bingo!); more than one (use extraction pattern filters); no matching extraction pattern (create one)
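
The three matching outcomes on this slide can be sketched as a lookup against a small pattern library. Everything in the block is an assumption made for illustration: `LIBRARY` is a stand-in for the data frame library with made-up entries, and `matching_patterns` only reports which case applies. It does not implement the extraction pattern filters or the pattern creation step, which the slides do not describe.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical stand-in for a data frame library: pattern name -> regular expression.
LIBRARY: Dict[str, str] = {
    "Integer": r"\d+",
    "Number": r"\d+(\.\d+)?",
    "PixelDimensions": r"\d+\s*x\s*\d+",
}


def matching_patterns(samples: List[str], library: Dict[str, str]) -> Tuple[str, List[str]]:
    """Return which library patterns match every user-supplied sample value."""
    names = [
        name
        for name, pattern in library.items()
        if all(re.fullmatch(pattern, sample) for sample in samples)
    ]
    if len(names) == 1:
        return "unique", names       # only one: use it directly
    if names:
        return "ambiguous", names    # more than one: a filter must choose
    return "none", names             # nothing matches: a new pattern is needed


if __name__ == "__main__":
    print(matching_patterns(["1600 x 1200", "640 x 480"], LIBRARY))
    print(matching_patterns(["3", "6"], LIBRARY))
    print(matching_patterns(["Canon"], LIBRARY))
```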

10 Ontology Generation: Keywords  Features a high-quality 4.0 Megapixel Resolution CCD  The new Nikon Coolpix 995 boasts of a 3.34 Megapixel CCD  3 effective megapixel
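
The three snippets share the word "megapixel" next to the value, which hints at how keywords might be proposed from examples. The sketch below counts words that recur near the sample value across snippets. It is a guess at the idea, not the tool's algorithm: the `candidate_keywords` function, the window size, the support threshold, and the crude three-letter minimum used to drop words like "a" and "of" are all assumptions.

```python
import re
from collections import Counter
from typing import List, Tuple


def candidate_keywords(
    samples: List[Tuple[str, str]], window: int = 3, min_support: int = 2
) -> List[str]:
    """Rank words that occur within `window` tokens of the value in several snippets."""
    support: Counter = Counter()
    for snippet, value in samples:
        # Split into numbers (with optional decimals) and alphabetic words.
        tokens = re.findall(r"\d+(?:\.\d+)?|[A-Za-z]+", snippet)
        if value not in tokens:
            continue
        i = tokens.index(value)
        nearby = tokens[max(0, i - window): i] + tokens[i + 1: i + 1 + window]
        # Count each word once per snippet; skip very short words (assumed stopwords).
        support.update({t.lower() for t in nearby if t.isalpha() and len(t) >= 3})
    ranked = sorted(support.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, count in ranked if count >= min_support]


SNIPPETS = [
    ("Features a high-quality 4.0 Megapixel Resolution CCD", "4.0"),
    ("The new Nikon Coolpix 995 boasts of a 3.34 Megapixel CCD", "3.34"),
    ("3 effective megapixel", "3"),
]

if __name__ == "__main__":
    print(candidate_keywords(SNIPPETS))
```

With the slide's snippets, "megapixel" is supported by all three and "ccd" by two, so both survive the default threshold.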

11 Ontology Generation: Context Expressions  3.5x optical zoom (2.5x digital)  a superior 4x Optical Zoom Nikkor lens, plus 4x stepless digital zoom  optical 3X /digital 6X zoom
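
In these snippets the zoom factor is always followed by "x", so a context expression can locate the surrounding text and a narrower extract expression can pull out just the number. The sketch below shows that two-step idea with Python's `re` module. The `extract_with_context` helper is hypothetical, the two patterns are written for this example, and case-insensitive matching is an assumption added so that "3X" is found. The function returns every zoom factor; telling optical from digital would need keywords as well.

```python
import re
from typing import List

# Assumed patterns for this example: a one-digit factor with an optional decimal.
CONTEXT = r"\b\d(\.\d)?x\b"   # the value together with its trailing "x"
EXTRACT = r"\d(\.\d)?"        # just the numeric part inside the context


def extract_with_context(text: str, context: str, extract: str) -> List[str]:
    """Find each context match, then take the extract match inside it."""
    values = []
    for ctx in re.finditer(context, text, flags=re.IGNORECASE):
        inner = re.search(extract, ctx.group(0))
        if inner:
            values.append(inner.group(0))
    return values


SNIPPETS = [
    "3.5x optical zoom (2.5x digital)",
    "a superior 4x Optical Zoom Nikkor lens, plus 4x stepless digital zoom",
    "optical 3X /digital 6X zoom",
]

if __name__ == "__main__":
    for snippet in SNIPPETS:
        print(extract_with_context(snippet, CONTEXT, EXTRACT))
```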

12 Extraction Ontology
DigitalCamera [-> object];
DigitalCamera [0:1] Brand [1:*];
DigitalCamera [0:1] ImageResolution [1:*];
DigitalCamera [0:1] Zoom [1:*];
DigitalCamera [0:1] CCDResolution [1:*];
Zoom [0:1] OpticalZoom [1:*];
Brand matches [10]
  constant { extract "\bNikon\b"; },
           { extract "\bCanon\b"; },
           { extract "\bOlympus\b"; },
           { extract "\bMinolta\b"; },
           { extract "\bSony\b"; };
end;
CCDResolution matches [20]
  constant { extract "\b\d(\.\d{1,2})?\b"; };
  keyword "\bMegapixel\b", "\bCCD\b", "\bCCD Resolution\b";
end;
OpticalZoom matches [10]
  constant { extract "\b\d(\.\d)?"; context "\b\d(\.\d)?(x)\b"; };
  keyword "\boptical\b";
end;

13 Measurements  How much of the ontology was generated, relative to how much could have been generated?  How many of the generated components should not have been generated?  How do the precision and recall of data extracted with a system-generated ontology compare with those of an expert-generated ontology?  How many sample pages are necessary for acceptable system performance?
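
The precision and recall comparison can be made concrete with the standard definitions: precision is the share of extracted items that are correct, and recall is the share of expected items that were extracted. The sketch below applies them to sets of (attribute, value) facts. The `precision_recall` function and the sample facts are illustrative assumptions, not results from the study. The same function could score generated ontology components against an expert's list, which is one way to read the first two questions.

```python
from typing import Hashable, Set, Tuple


def precision_recall(extracted: Set[Hashable], expected: Set[Hashable]) -> Tuple[float, float]:
    """Standard precision and recall over two sets of items."""
    correct = len(extracted & expected)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(expected) if expected else 0.0
    return precision, recall


# Made-up facts for illustration only.
EXPECTED = {
    ("Brand", "Canon"),
    ("Model", "PowerShot G2"),
    ("CCDResolution", "4.0"),
    ("OpticalZoom", "3"),
}
EXTRACTED = {
    ("Brand", "Canon"),
    ("CCDResolution", "4.0"),
    ("OpticalZoom", "2"),   # a wrong value counts against precision
}

if __name__ == "__main__":
    p, r = precision_recall(EXTRACTED, EXPECTED)
    print(f"precision={p:.2f} recall={r:.2f}")
```

Running the same computation once with the system-generated ontology's output and once with the expert-generated ontology's output gives the two pairs of ratios to compare.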

14 Contributions  Proposes a by-example approach to semi-automatically generate data-extraction ontologies  Constructs a Web-based tool to generate data-extraction ontologies

