

Semi-Automatically Generating Data-Extraction Ontology Yihong Ding March 6, 2001.


1 Semi-Automatically Generating Data-Extraction Ontology Yihong Ding March 6, 2001

2 Extract information from Web document
-------------------------------------------------------------------------
-- Cars Application Ontology
--
-- $Revision: 1.2 $
--
-- $Log: cars.osm,v $
-- Revision 1.2 1998/02/20 00:15:55 liddl
-- Cleaned up header
--
-- Revision 1.1 1998/02/20 00:14:14 liddl
-- Initial revision
--
Car [-> object];
Car [0:1] has Year [1:*];
Year matches [4] constant
  { extract "\d{2}"; context "([^\$\d]|^)[4-9]\d[^,\dkK]"; substitute "^" -> "19"; },
  { extract "\d{2}"; context "([^\$\d]|^)[4-9]\d,[^\d]"; substitute "^" -> "19"; },
  { extract "\d{2}"; context "\b'[4-9]\d\b"; substitute "^" -> "19"; },
  { extract "\d{2}"; context "([^\$\d]|^)0\d[^,\dkK]"; substitute "^" -> "20"; },
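The Year rules above can be read as a three-step recipe: find a context match, pull the two digits out of it, then prepend a century. The following is a minimal Python sketch of that reading, using the first and last rules from the slide. The function name `extract_years`, the tuple layout, and the interpretation of `substitute "^" -> "19"` as "prepend 19" are assumptions made for illustration, not the actual extraction engine.

```python
import re

# Hypothetical rendering of two Year rules from the slide:
# (context pattern, extract pattern, century prefix).
YEAR_RULES = [
    (r"([^\$\d]|^)[4-9]\d[^,\dkK]", r"\d{2}", "19"),
    (r"([^\$\d]|^)0\d[^,\dkK]", r"\d{2}", "20"),
]


def extract_years(text):
    """Return four-digit years found by the context/extract/substitute rules."""
    years = []
    for context, extract, prefix in YEAR_RULES:
        # The context expression decides where a year may appear ...
        for ctx in re.finditer(context, text):
            # ... and the extract expression picks the value inside that context.
            value = re.search(extract, ctx.group(0))
            if value:
                # Assumed meaning of substitute "^" -> prefix: prepend the century.
                years.append(prefix + value.group(0))
    return years
```

Under this reading, the context expressions keep two-digit prices, mileages, and model numbers followed by a comma from being mistaken for years, which is why "80," in a car ad is not extracted by the first rule.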

3 Ontology
a computational entity, a resource containing knowledge about what “concepts” exist in the world and how they relate to one another
Components
  Concepts
    Domain dependent
      Context free
      Context sensitive
    Domain independent
      Context free
      Context sensitive
  Relationship (relational schema between the concepts)
  Constraints
Car [-> object];
Car [0:1] has Make [1:*];
Make matches [10] constant
  { extract "\baudi\b"; };
end;
Car [0:1] has Model [1:*];
Model matches [25] constant
  { extract "80"; context "\baudi\S*\s*80\b"; };
end;
Car [0:1] has Mileage [1:*];
Mileage matches [8] constant
  { extract "\b[1-9]\d{0,2}k"; substitute "[kK]" -> "000"; };
end;
Car [0:1] has Price [1:*];
Price matches [8] constant
  { extract "[1-9]\d{3,6}"; context "\$[1-9]\d{3,6}"; };
end;
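To make the split between context-free and context-sensitive concepts concrete, the four Car concepts above can be held in a small data structure and run against a sample ad. This is only a sketch: the `Rule` class, `CAR_RULES`, `extract_car`, and the choice of case-insensitive matching are assumptions for illustration and are not taken from the slides.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Rule:
    extract: str                                   # pattern for the value itself
    context: Optional[str] = None                  # wider pattern the value must sit in
    substitute: Optional[Tuple[str, str]] = None   # (pattern, replacement) applied to the value

    def apply(self, text):
        values = []
        # Context-free rules search the whole text with the extract pattern.
        for span in re.finditer(self.context or self.extract, text, re.IGNORECASE):
            match = re.search(self.extract, span.group(0), re.IGNORECASE)
            if match is None:
                continue
            value = match.group(0)
            if self.substitute:
                value = re.sub(self.substitute[0], self.substitute[1], value)
            values.append(value)
        return values


# The four concepts from the slide, transcribed as hypothetical Rule objects.
CAR_RULES = {
    "Make": Rule(extract=r"\baudi\b"),
    "Model": Rule(extract=r"80", context=r"\baudi\S*\s*80\b"),
    "Mileage": Rule(extract=r"\b[1-9]\d{0,2}k", substitute=(r"[kK]", "000")),
    "Price": Rule(extract=r"[1-9]\d{3,6}", context=r"\$[1-9]\d{3,6}"),
}


def extract_car(text):
    """Apply every rule to the text and collect the values per concept."""
    return {name: rule.apply(text) for name, rule in CAR_RULES.items()}
```

Here Make and Mileage are context free (the extract pattern alone is enough), while Model and Price are context sensitive: "80" only counts as a model next to "audi", and a digit run only counts as a price after a dollar sign.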

4 My work
Pre-assumptions
  Given an information knowledge base that already contains domain-dependent and domain-independent concepts
    Pre-defined ontologies: Mikrokosmos, Gene, our ontologies, etc.
    Component recognizers: date, time, price, phone number, etc.
  Given sample training Web documents
Semi-automatically generate the ontology

5 Architecture
[Diagram. Box and arrow labels: Information knowledge base; Training Web documents; Classify related concepts for the sample documents; Partial completed ontology; Pattern learning & updating; Raw completed ontology; User Control Interface; Need modification; Satisfied; Output final ontology]
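One plausible reading of the architecture labels is a loop: classify concepts, learn patterns, show the result to the user, and repeat until the user is satisfied. The sketch below expresses only that control flow. The function `generate_ontology` and its callable parameters `classify`, `learn`, and `review` are hypothetical placeholders, and the ordering of the stages is inferred from the labels rather than stated in the slides.

```python
def generate_ontology(knowledge_base, documents, classify, learn, review):
    """Run the assumed classify -> learn -> review loop until the user is satisfied.

    classify(knowledge_base, documents) -> partial completed ontology
    learn(ontology, documents, feedback) -> raw completed ontology
    review(ontology) -> user feedback, or None when the user is satisfied
    """
    partial = classify(knowledge_base, documents)
    raw = learn(partial, documents, feedback=None)
    while True:
        feedback = review(raw)          # User Control Interface
        if feedback is None:            # "Satisfied"
            return raw                  # Output final ontology
        # "Need modification": feed the change back into pattern learning & updating.
        raw = learn(raw, documents, feedback=feedback)
```

Passing the three stages in as callables keeps the sketch independent of how each stage is actually implemented.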

6 Example: CIA Factbook
Country: China
Location: Eastern Asia, bordering the East China Sea, Korea Bay, Yellow Sea, and South China Sea, between North Korea and Vietnam
Geographic coordinates: 35 00 N, 105 00 E
Map references: Asia
Area:
  total: 9,596,960 sq km
  land: 9,326,410 sq km
  water: 270,550 sq km

7 Partial completed ontology
CountryName matches [30] constant
  { extract "\bChina\b"; },
  { extract "\bUnited States\b"; };
  …
end;
Location matches [50] constant
  { extract "\bAsia\b"; },
  { extract "\bEurope\b"; },
  …
  { extract "\bYellow Sea\b"; },
  …
end;
Latitude matches [10] constant
  { extract "\b[1-9]\d{0,2}\b[1-9]\d{0,1}(N|S)"; },
end;
Longitude matches [10] constant
  { extract "\b[1-9]\d{0,2}\b[1-9]\d{0,1}(E|W)"; },
end;
Number matches [6] constant
  { extract "[1-9]\d{0,5}"; },
  { extract "[1-9]\d{0,2},\d{3}"; },
end;
Country: China
Location: Eastern Asia, bordering the East China Sea, Korea Bay, Yellow Sea, and South China Sea, between North Korea and Vietnam
Geographic coordinates: 35 00 N, 105 00 E
Map references: Asia
Area:
  total: 9,596,960 sq km
  land: 9,326,410 sq km
  water: 270,550 sq km

8 Raw completed ontology
Country [-> object];
Country [0:1] has CountryName [1:1];
Country [0:1] has Location1 [1:*];
...
Country [0:1] has Location8 [1:*];
Country [0:1] has Latitude [1:*];
Country [0:1] has Longitude [1:*];
Country [0:1] has Number1 [1:*];
Country [0:1] has Number2 [1:*];
Country [0:1] has Number3 [1:*];
-- ** Generalization/Specializations
Location1 : Location
...
Location8 : Location
Number1 : Number
Number2 : Number
Number3 : Number
Country: China
Location: Eastern Asia, bordering the East China Sea, Korea Bay, Yellow Sea, and South China Sea, between North Korea and Vietnam
Geographic coordinates: 35 00 N, 105 00 E
Map references: Asia
Area:
  total: 9,596,960 sq km
  land: 9,326,410 sq km
  water: 270,550 sq km
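The step from recognized concepts to the raw completed ontology can be sketched as a numbering pass: a concept that matches more than once gets numbered attributes (Location1, Location2, ...) plus a specialization line back to the base concept. The function `raw_ontology`, the `(concept, text)` match format, and the `single` parameter for choosing `[1:1]` cardinality are assumptions for illustration. The slides do not say how cardinalities are decided, and this sketch lists attributes in document order rather than grouped as on the slide.

```python
from collections import Counter


def raw_ontology(root, matches, single=("CountryName",)):
    """Build raw ontology declarations from (concept, matched_text) pairs."""
    totals = Counter(concept for concept, _ in matches)
    seen = Counter()
    relations, specializations = [], []
    for concept, _ in matches:
        seen[concept] += 1
        if totals[concept] == 1:
            name = concept                      # unique concept keeps its name
        else:
            name = f"{concept}{seen[concept]}"  # repeated concept gets a number
            specializations.append(f"{name} : {concept}")
        # Assumption: the caller lists which concepts are single-valued.
        cardinality = "[1:1]" if concept in single else "[1:*]"
        relations.append(f"{root} [0:1] has {name} {cardinality};")
    return [f"{root} [-> object];"] + relations + specializations
```

For example, calling `raw_ontology("Country", [("CountryName", "China"), ("Location", "Eastern Asia"), ("Location", "East China Sea")])` yields a `CountryName` relation, two numbered `Location` relations, and two specialization lines.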

9 User control interface
Output to user
  raw completed ontology
  tagged training web pages
  the query results
User may
  modify attribute name
  combine attributes
  delete useless attributes
  change relationships
  add new attributes, new relations, and constraints
  …
When satisfied, output the final ontology

Country: China {CountryName}
Location: Eastern Asia {Location1}, bordering the East China Sea {Location2}, Korea Bay {Location3}, Yellow Sea {Location4}, and South China Sea {Location5}, between North Korea {Location6}, and Vietnam {Location7}
Geographic coordinates: 35 00 N {Latitude}, 105 00 E {Longitude}
Map references: Asia {Location8}
Area:
  total: 9,596,960 {Number1} sq km
  land: 9,326,410 {Number2} sq km
  water: 270,550 {Number3} sq km

Country: China {CountryName}
Location: Eastern Asia {Location1}, bordering the East China Sea {Location2}, Korea Bay {Location3}, Yellow Sea {Location4}, and South China Sea {Location5}, between North Korea {Location6}, and Vietnam {Location7}
Geographic coordinates: 35 00 N {Latitude}, 105 00 E {Longitude}
Map references: Asia {MapReference}
Area:
  total: 9,596,960 {TotalArea} sq km
  land: 9,326,410 {LandArea} sq km
  water: 270,550 {WaterArea} sq km

Country: China {CountryName}
Location: Eastern Asia, bordering the East China Sea, Korea Bay, Yellow Sea, and South China Sea, between North Korea, and Vietnam {Location}
Geographic coordinates: 35 00 N {Latitude}, 105 00 E {Longitude}
Map references: Asia {MapReference}
Area:
  total: 9,596,960 {TotalArea} sq km
  land: 9,326,410 {LandArea} sq km
  water: 270,550 {WaterArea} sq km
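Two of the user modifications, renaming an attribute and combining attributes, can be illustrated directly on the tagged text. The helpers `rename_tags` and `combine_tags` below are hypothetical and operate only on the `{Tag}` markup shown in the slide; how the actual interface propagates such changes back into the ontology is not described in the source.

```python
import re


def rename_tags(tagged, renames):
    """Replace {Old} with {New} for every entry in the renames mapping."""
    return re.sub(
        r"\{(\w+)\}",
        lambda m: "{" + renames.get(m.group(1), m.group(1)) + "}",
        tagged,
    )


def combine_tags(tagged, names, new_name):
    """Merge consecutive tags into one by dropping all but the last and renaming it."""
    for name in names[:-1]:
        tagged = tagged.replace(" {" + name + "}", "")
    return tagged.replace("{" + names[-1] + "}", "{" + new_name + "}")
```

Renaming `Location8` to `MapReference` and `Number1` to `TotalArea` corresponds to the move from the first tagged page to the second, and combining `Location1` through `Location7` into a single `Location` tag corresponds to the move from the second to the third.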

10 Problems
Obtain knowledge base
Classify related concepts for the sample documents
Refine
Tag the document based on the raw completed ontology
User interface design and control
Update strategy to raw completed ontology based on user modification

11 Contribution
Exploit existing knowledge
Semi-automatically generate an extraction ontology

