Ontology Based Extraction of RDF Data from the World Wide Web. Tim Chartrand, Master's Thesis. Research supported by NSF.

1 Ontology Based Extraction of RDF Data from the World Wide Web
  Tim Chartrand
  Master's Thesis
  Research Supported By NSF

2 Introduction
  World Wide Web
    Has a huge amount of existing information
    Designed primarily for human consumption
  Semantic Web
    Is an extension of the WWW
    Gives information a well-defined meaning
    Allows automation of tasks
  DEG contribution: extract data from the WWW
  Solution
    Extract Semantic Web data from the WWW
    Superimpose extracted data

3 Research Overview
  [Diagram of the research pipeline. Surviving labels: DAML Ontology, User, Extraction Ontology, Extraction Engine, HTML Page, Relational Data, RDF Data, RDF Browser.]

4 RDF – What is it?
  Resource Description Framework
  Language of the Semantic Web
  Set of triples
  [Example graph. Surviving labels: nodes mailto:tim@cs.byu.edu and mailto:tyler@thechartrands.com, the literal "25", and arcs genealogy:age and genealogy:fatherOf.]
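To make "set of triples" concrete, here is a minimal Python sketch that stores the slide's example as (subject, predicate, object) tuples. The arc directions are an assumption: the transcript preserves only the labels, so treating mailto:tim@cs.byu.edu as the subject of both arcs is a guess, and the helper name `objects` is purely illustrative.

```python
# A tiny in-memory RDF graph: a set of (subject, predicate, object) triples.
# Assumption: arc directions are guessed from the labels that survive
# in the slide transcript.
TIM = "mailto:tim@cs.byu.edu"
TYLER = "mailto:tyler@thechartrands.com"

triples = {
    (TIM, "genealogy:age", "25"),        # object is a literal
    (TIM, "genealogy:fatherOf", TYLER),  # object is another resource
}


def objects(graph, subject, predicate):
    """Return every object o such that (subject, predicate, o) is in graph."""
    return {o for (s, p, o) in graph if s == subject and p == predicate}


if __name__ == "__main__":
    for s, p, o in sorted(triples):
        print(s, p, o)
```

Because the graph is a plain set, adding the same statement twice has no effect, which matches the idea of RDF data as a set of statements.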

5 DAML Core Concepts
  daml:Class: defines a class
  daml:Property: defines a binary relation, has a value
  rdfs:domain: specifies the class to which a property applies
  rdfs:range: specifies the possible values of a property
  daml:UniqueProperty, daml:UnambiguousProperty: specify cardinality constraints for a property

6 Example Ontology
  [DAML markup not preserved in this transcript. Surviving fragments: ... Program OperatingSystem ...]

7 DAML → OSM
  Class → Non-lexical object set
  Property → Binary relationship set between object sets
  Literal property → Lexical object set and binary relationship set between non-lexical and lexical object sets
  Cardinality restriction → Participation constraint
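The four mapping rules above can be sketched as a small conversion function. This is an illustration under stated assumptions, not the thesis algorithm: the dictionary input format, the names `OSMModel` and `daml_to_osm`, the `runsOn` property, and the "0:1" / "0:*" notation for participation constraints are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class OSMModel:
    nonlexical: set = field(default_factory=set)
    lexical: set = field(default_factory=set)
    # Each relationship set: (name, from_object_set, to_object_set, constraint)
    relationships: list = field(default_factory=list)


def daml_to_osm(classes, properties):
    """Apply the slide's mapping rules to a simplified ontology description.

    classes: iterable of class names.
    properties: list of dicts with keys name, domain, range, literal (bool)
    and an optional unique (bool). This input shape is an assumption.
    """
    model = OSMModel()
    # Class -> non-lexical object set
    model.nonlexical.update(classes)
    for prop in properties:
        if prop["literal"]:
            # Literal property -> lexical object set plus a relationship set
            model.lexical.add(prop["name"])
            target = prop["name"]
        else:
            # Property -> binary relationship set between object sets
            target = prop["range"]
        # Cardinality restriction -> participation constraint
        # (assumed notation: a unique property allows at most one value)
        constraint = "0:1" if prop.get("unique") else "0:*"
        model.relationships.append(
            (prop["name"], prop["domain"], target, constraint)
        )
    return model


model = daml_to_osm(
    classes=["Program", "OperatingSystem"],
    properties=[
        {"name": "name", "domain": "Program", "range": "string",
         "literal": True, "unique": True},
        {"name": "runsOn", "domain": "Program", "range": "OperatingSystem",
         "literal": False},
    ],
)
```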

8 DAML → OSM
  [Example markup and diagram not preserved in this transcript. Surviving fragments: Program OperatingSystem ...]

9 Data Frames
  Lexical object sets need data frames
  Use data-frame library
  Match lexical object sets with data frames
    Compare stemmed names and aliases
      Levenshtein edit distance
      Soundex
      Longest common subsequence
      Weighted average
    Specialization heuristic
    Choose most similar data frame (above a threshold)
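The name-matching step can be sketched as follows. This is a rough illustration, not the thesis implementation: the weights, the 0.6 threshold, the example library, and all function names are assumptions, and stemming and the specialization heuristic are left out for brevity.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def lcs_length(a, b):
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if ca == cb else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


_GROUPS = {"1": "bfpv", "2": "cgjkqsxz", "3": "dt", "4": "l", "5": "mn", "6": "r"}
_CODES = {ch: digit for digit, letters in _GROUPS.items() for ch in letters}


def soundex(word):
    """Four-character American Soundex code, or '' if word has no letters."""
    word = "".join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ""
    code = word[0].upper()
    last = _CODES.get(word[0], "")
    for ch in word[1:]:
        digit = _CODES.get(ch, "")
        if digit and digit != last:
            code += digit
        if ch not in "hw":  # h and w do not separate repeated codes
            last = digit
    return (code + "000")[:4]


def similarity(name, candidate, weights=(0.4, 0.2, 0.4)):
    """Weighted average of three name-similarity measures, in [0, 1]."""
    a, b = name.lower(), candidate.lower()
    longest = max(len(a), len(b)) or 1
    edit_sim = 1 - levenshtein(a, b) / longest
    sound_sim = 1.0 if soundex(a) and soundex(a) == soundex(b) else 0.0
    lcs_sim = lcs_length(a, b) / longest
    w_edit, w_sound, w_lcs = weights
    return w_edit * edit_sim + w_sound * sound_sim + w_lcs * lcs_sim


def best_data_frame(object_set, library, threshold=0.6):
    """Pick the most similar data frame, or None if nothing reaches threshold.

    library maps a data-frame name to a list of aliases (assumed shape).
    """
    best_name, best_score = None, 0.0
    for frame, aliases in library.items():
        score = max(similarity(object_set, n) for n in [frame, *aliases])
        if score > best_score:
            best_name, best_score = frame, score
    return best_name if best_score >= threshold else None


library = {"Phone": ["Telephone", "PhoneNumber"], "Price": ["Cost", "RentalRate"]}
```

With this hypothetical library, `best_data_frame("PhoneNum", library)` selects the `Phone` frame through its `PhoneNumber` alias, while a name with no close candidate falls below the threshold and yields `None`.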

10 User Modification
  Provide graphical ontology editor
    Automate graph layout
    Allow the user to edit participation constraints
    Allow the user to edit data-frame mapping
  Provide data frame editor

11 Extracting the Data

12 Pointing to the Data
  Source text on the HTML page:
    ... Stick Death 1.0 Advance in levels, grab weapons, and unlock new levels and characters. OS: Windows 3.x/95/98/Me/NT/2000/XP File Size: 2.66MB License: Free 05/14/2002 new 2,235 Download now ...
  XPointer into that text:
    xpointer(string-range(/html[1]/body[1]/table[1]/tr[1], '', 10, 3))

13 Convert to RDF
  [RDF graph for the resource http://www.deg.byu.edu/software.html#Program1001.
   Surviving node labels: software:Program, Stick Death, 1.0, Windows, 3.x/95/98/Me/NT/2000/X, 2.66MB, software:OperatingSystem, software:Size.
   Surviving arc labels: rdf:type, software:name, software:version, software:OperatingSystem, software:OSName, software:OSVersion, software:ProgSize, software:SizeVal, software:SizeUnit.]
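A minimal sketch of this conversion step, turning one extracted record into triples using the property names from the slide. The graph shape is an assumption reconstructed from the surviving labels: the function name, the record keys, the URI patterns for the operating-system and size nodes, and the split of "2.66MB" into a value and a unit are all hypothetical.

```python
BASE = "http://www.deg.byu.edu/software.html#"


def program_to_triples(row):
    """Convert one extracted record (a dict) into (s, p, o) triples.

    Assumed keys: id, name, version, os_name, os_version, size_val, size_unit.
    """
    prog = f"{BASE}Program{row['id']}"
    os_node = f"{BASE}OperatingSystem{row['id']}"  # hypothetical URI pattern
    size = f"{BASE}Size{row['id']}"                # hypothetical URI pattern
    return [
        (prog, "rdf:type", "software:Program"),
        (prog, "software:name", row["name"]),
        (prog, "software:version", row["version"]),
        (prog, "software:OperatingSystem", os_node),
        (prog, "software:ProgSize", size),
        (os_node, "rdf:type", "software:OperatingSystem"),
        (os_node, "software:OSName", row["os_name"]),
        (os_node, "software:OSVersion", row["os_version"]),
        (size, "rdf:type", "software:Size"),
        (size, "software:SizeVal", row["size_val"]),
        (size, "software:SizeUnit", row["size_unit"]),
    ]


# Values taken from the Stick Death listing shown on the "Pointing to the Data" slide.
row = {
    "id": 1001,
    "name": "Stick Death",
    "version": "1.0",
    "os_name": "Windows",
    "os_version": "3.x/95/98/Me/NT/2000/XP",
    "size_val": "2.66",
    "size_unit": "MB",
}
program_triples = program_to_triples(row)
```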

14 Superimposed Data

15 Results
  RDF Data Extraction and Viewing
    Built 4 data-extraction ontologies
      3 from DAML ontologies for data extraction
      1 from an existing DAML ontology
    Most existing DAML ontologies not good for data extraction
  Data Frame Matcher
    8 'training ontologies', 16 test ontologies
    128 lexical object sets, 40 correct matches, 12 incorrect matches
    Precision: 77%  Recall: 89%
  Experiment (apartment rentals): 6 students, 3 data frames
    Phone: 2.8 min
    RentalRate: 16.5 min
    Bedrooms: 17.5 min
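The reported precision follows directly from the match counts; the short check below reproduces it. The recall denominator is not stated on the slide, so the value of 45 expected matches is an assumption, chosen only because it is consistent with the reported 89%.

```python
correct, incorrect = 40, 12

# Precision: correct matches out of all matches the matcher proposed.
precision = correct / (correct + incorrect)

# Recall: correct matches out of all matches that should have been found.
# Assumption: 45 expected matches; the slide does not give this number.
expected_matches = 45
recall = correct / expected_matches

if __name__ == "__main__":
    print(f"precision = {precision:.1%}, recall = {recall:.1%}")
```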

16 Contributions
  Advancement of Semantic Web
  Application of Information Extraction to building Semantic Web content
  Semantic Web data as superimposed information
  Algorithm for ontology conversion

17 Future Work
  Data extraction
    Enhance name matcher with data values
    Support n-ary relationship sets
  RDF data generation
    Generate only one URI for an object
    Associate concepts from DAML ontologies to well-known DAML ontologies

