Roi Adadi David Ben-David.  Semantic Web Document (SWD) ◦ A web page that serializes an RDF graph. ◦ Uses one of the recommended RDF syntax languages,


1 Roi Adadi David Ben-David

2  Semantic Web Document (SWD) ◦ A web page that serializes an RDF graph. ◦ Uses one of the recommended RDF syntax languages, i.e. RDF/XML, N-Triples or N3.  Semantic Web Term (SWT) ◦ An RDF resource that represents an instance of rdfs:Class or rdf:Property and can be universally referenced by its URI reference (URIref).  Semantic Web Ontology (SWO) ◦ An SWD is considered an SWO when a significant proportion of its statements define new SWTs.  Semantic Web Database (SWDB) ◦ An SWD that does not define or extend a significant number of terms. ◦ Introduces individuals and makes assertions about them. ◦ Makes assertions about individuals defined in other SWDs.

3 FOAF ontology: http://xmlns.com/foaf/spec/index.rdf ◦ Contains 12 classes and 51 properties (in 466 triples) and no individuals. ◦ Terms shown in the figure: class Document, class Organization, property mbox.

4 FOAF description for Tim Finin: www.cs.umbc.edu/~finin//foaf.rdf ◦ Defines three individuals and makes statements about them (no classes or properties). ◦ Statements shown in the figure: a name statement and a nick name statement.

5  Current form of the Semantic Web ◦ web of Semantic Web Documents (SWD)  Navigating the Semantic Web is difficult ◦ Paucity of explicit hyperlinks (beyond NS in URIrefs). ◦ Relations such as rdfs:seeAlso and owl:imports are rare.  There is a need for a search engine customized for SWD ◦ Find and analyze SWDs on the web. ◦ Suggest a measure for SWDs’ importance (ranking).

6  Semantic Web researchers ◦ Search for SWTs and SWOs for publishing their knowledge.  Software Agents ◦ Search SWDs for external knowledge. ◦ Retrieve SWOs to fully understand SWTs. Find the most popular ontology to publish a personal profile

7  Conventional web navigation and ranking models are not suitable for the Semantic Web.  They do not differentiate SWDs from other web pages.  They do not parse or use the internal structure of SWDs or the external semantic links among SWDs. ◦ They are designed to work with natural language (NL) and unstructured text.  Example: the FOAF ontology is not among the top 10 Google search results for “person ontology”.

8  Finding appropriate ontologies ◦ Qualified search (Terms + Types) ◦ Ontologies are sorted by their popularity.  Finding instance data ◦ Querying SWDs with constraints on the classes and properties used by them. ◦ Helps to integrate Semantic Web data on the web.  Characterizing the Semantic Web ◦ Structural properties

9  Ontology Based Annotation Systems ◦ SHOE, Ontobroker, webKB, QuizRDF, CREAM, … ◦ Annotating online documents. ◦ Document indexes based on the annotations, but not on the entire document. ◦ Use their own ontologies that might not suit some SWDs

10  Ontology Repositories ◦ DAML Ontology Library, SemWebCentral, Schema Web, … ◦ Collect ontologies (simply store the entire RDF document). ◦ Do not automatically discover SWDs but rather require people to submit URLs. ◦ Constitute a small portion of the Semantic Web.

11  Semantic Web Browsers ◦ W3C’s Ontaria  A searchable and browsable directory of RDF documents developed by the W3C. ◦ Does not automatically discover SWDs. ◦ Stores the full RDF graphs. ◦ Indexes individuals of well-known classes  e.g. foaf:Person, rss:Item  Experiments show: outperforms them all!

12  A crawler-based indexing and retrieval system for the Semantic Web.  Discovers Semantic Web documents.  Computes relations between documents.  Stores and reasons over extracted metadata. ◦ The system is designed to scale up to tens of millions of documents.  Enables rich query constraints on semantic relations.

13

14  Collects candidate URLs to find and cache SWDs ◦ Submitted URLs. ◦ A Web crawler. ◦ A customized meta-crawler (using conventional search engines). ◦ SwoogleBot Semantic Web Crawler.  Analyzes SWDs to produce new candidates. Up until now Swoogle has found over 1.7M SWDs with more than 1G triples!

15  Analyzes the discovered SWDs.  Generates the bulk of Swoogle’s metadata about the Semantic Web. ◦ Characterizes features associated with SWDs and SWTs. ◦ Tracks relations among SWDs and SWTs: How do SWDs use, define or populate a given SWT? How are two SWTs associated? …

16  Analyzes the generated metadata. ◦ Classification of SWOs and SWDBs.  Hosts the modular ranking mechanisms. ◦ Ontology Rank.

17  Provides search services to software agents and users, allowing them to access metadata and navigate the Semantic Web. ◦ Swoogle Search – searches SWDs using constraints on URLs, SWTs being used or defined, etc. ◦ Ontology Dictionary – searches ontologies at the term level and offers more navigational paths.

18  SWD metadata is collected to make SWD search more efficient and effective.  Derived from the content of SWD as well as the relations among SWDs  3 categories of metadata: ◦ Basic metadata ◦ Relations among SWDs ◦ Analytical results

19  Language Features – properties describing the syntactic or semantic features of an SWD. ◦ Encoding – syntactic encoding of an SWD.  “RDF/XML”, “N-TRIPLE” and “N3”. ◦ Language – the language used by an SWD.  “OWL”, “DAML+OIL”, “RDFS” and “RDF”. ◦ OWL Species – the language species of an SWD written in OWL.  “OWL-LITE”, “OWL-DL” and “OWL-FULL”

20  RDF Statistics – properties summarizing the node distribution of the RDF graph of an SWD. ◦ How an SWD defines new classes, properties and individuals. ◦ Let foo be an SWD and let C(foo), P(foo), I(foo) be the sets of classes, properties and individuals defined in foo, respectively. The ontology-ratio R(foo) is calculated by:

$$R(foo) = \frac{|C(foo)| + |P(foo)|}{|C(foo)| + |P(foo)| + |I(foo)|}$$

◦ R(foo) ranges from 0 to 1, where 0 implies that foo is a pure SWDB and 1 implies that foo is a pure SWO.
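A minimal sketch of the ontology-ratio, assuming R(foo) is the share of classes and properties among all classes, properties and individuals that an SWD defines (this matches the stated endpoints: 0 for a pure SWDB, 1 for a pure SWO). The function name and the zero-node handling are illustrative assumptions; the counts come from the two FOAF examples on the earlier slides.

```python
def ontology_ratio(num_classes: int, num_properties: int, num_individuals: int) -> float:
    """Share of defined nodes that are terms (classes or properties)."""
    terms = num_classes + num_properties
    total = terms + num_individuals
    if total == 0:
        # Assumption: the ratio is undefined for an SWD that defines nothing.
        raise ValueError("SWD defines no classes, properties, or individuals")
    return terms / total


# FOAF specification: 12 classes, 51 properties, no individuals.
foaf_ontology = ontology_ratio(12, 51, 0)
# Tim Finin's FOAF file: three individuals, no classes or properties.
finin_profile = ontology_ratio(0, 0, 3)
# A hypothetical mixed document.
mixed = ontology_ratio(1, 1, 2)
```

Under this assumption the FOAF specification scores as a pure SWO and the personal profile as a pure SWDB.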

21  Ontology Annotations – properties that describe an SWD as an ontology. ◦ The SWD has an instance of owl:Ontology. ◦ Swoogle records the following properties:  label (rdfs:label)  comment (rdfs:comment)  versionInfo (owl:versionInfo/daml:versionInfo)

22  Capturing and analyzing relations at the RDF node level is hard.  Swoogle generalizes RDF node level relations and focuses on SWD level relations.  Swoogle captures the following SWD level relations: ◦ TM/IN – an SWD uses terms defined by other SWDs. ◦ IM – an ontology imports another ontology. ◦ EX – an ontology extends another ontology. ◦ PV – an ontology is a prior version of another. ◦ CPV – an ontology is a prior version of another and is compatible with it. ◦ IPV – an ontology is a prior version of another and is incompatible with it.
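One way to picture how SWD level relations could be read off a document's triples is sketched below. The mapping from OWL properties to relation codes is an assumption made for illustration (the properties exist in the OWL vocabulary, but the transcript does not preserve which indicators Swoogle uses), and EX and TM/IN are left out because they depend on term-level analysis. The function name and example URIs are hypothetical.

```python
OWL = "http://www.w3.org/2002/07/owl#"

# Assumed indicator properties for four of the relations listed above.
INDICATORS = {
    OWL + "imports": "IM",
    OWL + "priorVersion": "PV",
    OWL + "backwardCompatibleWith": "CPV",
    OWL + "incompatibleWith": "IPV",
}


def swd_relations(triples):
    """Return (relation, subject, object) for each triple whose predicate is an indicator."""
    return [
        (INDICATORS[predicate], subject, obj)
        for subject, predicate, obj in triples
        if predicate in INDICATORS
    ]


example_triples = [
    ("http://example.org/v2", OWL + "imports", "http://xmlns.com/foaf/0.1/"),
    ("http://example.org/v2", OWL + "priorVersion", "http://example.org/v1"),
    ("http://example.org/v2", "http://www.w3.org/2000/01/rdf-schema#label", "Example v2"),
]
relations = swd_relations(example_triples)
```

In each returned tuple the subject is the document making the statement, so a PV entry reads as "the subject names the object as its prior version".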

23 Indicators of inter-ontology relation

24  OntologyRank inspired by Google’s PageRank algorithm.  Underlying Random Surfing Model: ◦ Surfer jumps to a random URL ◦ With probability d randomly chooses a link to follow. ◦ With probability 1-d jumps to another random URL.

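25  Given a document A, A’s PageRank is computed by:

$$PR(A) = (1 - d) + d \left( \frac{PR(T_1)}{C(T_1)} + \cdots + \frac{PR(T_n)}{C(T_n)} \right)$$

where $T_1, \dots, T_n$ are the web documents that link to A; $C(T_i)$ is the total number of outlinks of $T_i$; and d is a damping factor, typically set to 0.85.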
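The computation can be sketched as a fixed-point iteration, assuming the standard non-normalized PageRank form in which a document receives (1 - d) plus d times the sum of PR(T)/C(T) over the documents T that link to it. The function name, the iteration count and the toy graph are illustrative assumptions, and the sketch assumes every document has at least one outlink.

```python
def pagerank(links, d=0.85, iterations=50):
    """links maps each document to the set of documents it links to."""
    docs = set(links) | {t for targets in links.values() for t in targets}
    rank = {doc: 1.0 for doc in docs}
    for _ in range(iterations):
        new_rank = {}
        for doc in docs:
            # Each linking document T contributes PR(T) / C(T).
            incoming = sum(
                rank[src] / len(targets)
                for src, targets in links.items()
                if doc in targets
            )
            new_rank[doc] = (1 - d) + d * incoming
        rank = new_rank
    return rank


# Hypothetical three-page web: A links to B and C, B links to C, C links to A.
toy_web = {"A": {"B", "C"}, "B": {"C"}, "C": {"A"}}
ranks = pagerank(toy_web)
```

In this toy graph C ends up ranked above B, because C receives rank from both A and B while B receives only half of A's.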

26

27  The graph formed by SWDs has a richer set of relations. ◦ The edges have explicit semantics.  Users can navigate the Semantic Web within or across the web and the RDF graph through 7 groups of navigational paths.

28

29  The semantics of links lead to a non-uniform probability of following a particular outgoing link.  Given SWDs A and B, Swoogle classifies inter-SWD links into four categories: ◦ imports(A,B) – A imports all content of B. ◦ uses-term(A,B) – A uses some of the terms defined by B (without importing B). ◦ extends(A,B) – A extends the definitions of terms defined by B. ◦ asserts(A,B) – A makes assertions about the individuals defined by B.  Each category is assigned a different weight, which represents the probability of following that kind of link.

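30  Given an SWD a, Swoogle computes its raw rank by:

$$rawPR(a) = (1 - d) + d \sum_{x \in L(a)} rawPR(x) \, \frac{f(x, a)}{f(x)}$$

$$f(x, a) = \sum_{l \in links(x, a)} weight(l), \qquad f(x) = \sum_{y \in T(x)} f(x, y)$$

where L(a) is the set of SWDs that link to a, T(x) is the set of SWDs that x links to, f(x, a) is the total weight of the links from x to a, and f(x) is the total weight of all links leaving x.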
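A rough sketch of a weighted raw rank, assuming each SWD x passes its rank to the SWDs it links to in proportion to the summed weights of the links between them, with the same damping factor d as PageRank. The weight values below are hypothetical (the slides state only that each link category has its own weight), as are the function name and the toy documents.

```python
from collections import defaultdict

# Hypothetical weights; no actual values are given in the slides.
WEIGHTS = {"imports": 4.0, "extends": 3.0, "uses-term": 2.0, "asserts": 1.0}


def raw_rank(links, d=0.85, iterations=50):
    """links is a list of (source, target, category) tuples between SWDs."""
    # link_weight[x][a]: total weight of all links from x to a.
    link_weight = defaultdict(lambda: defaultdict(float))
    docs = set()
    for source, target, category in links:
        link_weight[source][target] += WEIGHTS[category]
        docs.update((source, target))
    # out_weight[x]: total weight of every link leaving x.
    out_weight = {x: sum(targets.values()) for x, targets in link_weight.items()}

    rank = {doc: 1.0 for doc in docs}
    for _ in range(iterations):
        new_rank = {doc: 1 - d for doc in docs}
        for x, targets in link_weight.items():
            for a, weight in targets.items():
                new_rank[a] += d * rank[x] * weight / out_weight[x]
        rank = new_rank
    return rank


toy_links = [
    ("profile", "foaf", "uses-term"),
    ("profile", "people", "asserts"),
    ("myonto", "foaf", "imports"),
]
raw = raw_rank(toy_links)
```

Here "profile" sends two thirds of its rank to "foaf" and one third to "people", because a uses-term link is weighted twice as heavily as an asserts link under the assumed weights.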

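31  Then, Swoogle computes the rank for SWDBs and SWOs by:

$$PR(a) = \begin{cases} rawPR(a) & \text{if } a \text{ is an SWDB} \\ \sum_{x \in TC(a)} rawPR(x) & \text{if } a \text{ is an SWO} \end{cases}$$

where TC(a) is the transitive closure of SWOs imported by a.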
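The final step can be sketched as follows, assuming an SWDB keeps its raw rank while an SWO sums raw ranks over the transitive closure of the SWOs it imports. Counting the ontology's own raw rank in that sum is an assumption of this sketch, as are the function names and the toy values.

```python
def imported_closure(doc, imports):
    """All SWOs reachable from doc by following imports edges (doc itself excluded)."""
    seen = set()
    stack = list(imports.get(doc, ()))
    while stack:
        current = stack.pop()
        if current in seen or current == doc:
            continue
        seen.add(current)
        stack.extend(imports.get(current, ()))
    return seen


def onto_rank(doc, raw, imports, ontologies):
    """Final rank: raw rank for an SWDB, closure sum for an SWO."""
    if doc not in ontologies:
        return raw[doc]
    # Assumption: the ontology's own raw rank is part of the sum.
    return raw[doc] + sum(raw[x] for x in imported_closure(doc, imports))


# Hypothetical raw ranks and imports graph: a imports b, b imports c.
raw = {"a": 1.0, "b": 0.5, "c": 0.25, "db": 0.4}
imports = {"a": ["b"], "b": ["c"]}
ontologies = {"a", "b", "c"}
rank_a = onto_rank("a", raw, imports, ontologies)
rank_db = onto_rank("db", raw, imports, ontologies)
```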

32  The problem of indexing and searching SWDs ◦ Significant semantic information is encoded in marked-up documents. ◦ Reasoning over a large collection of documents can be expensive.  Traditional information retrieval techniques ◦ Faster (coarse view of the text). ◦ Can quickly retrieve a set of SWDs based on similarities of the source text alone.

33  SWDs are not entirely markup. ◦ Search should be applied to both structured and unstructured components of the document.  We may want SWDs to be available to commonly used search engines. ◦ Documents must be transformed to a form that a standard IR engine can understand and manipulate.  There are well-researched methods for ranking matches, computing similarities between documents and employing relevance feedback.

34  Look at a document as a collection of either tokens or N-Grams.  URIrefs of classes, properties and individuals correspond to words in natural languages.  Apply the following process to an SWD ◦ Reduce it to triples. ◦ Extract URIrefs (with duplicates). ◦ Discard URIrefs of blank nodes. ◦ Hash each URI to a token. ◦ Index the document.  The document is indexed by either N-Grams or URIrefs.  Example: matching “time” to: http://foo.com/timeont.owl#timeInterval http://foo.com/timeont.owl#calendarClockInterval http://purl.org/upper/temporal/t13.owl#timeThing
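A small sketch of the N-Gram variant of this process: it extracts URIrefs from triples, skips blank nodes and literals, and builds a character N-gram index so that a query such as "time" matches every URIref containing it. The representation of triples as string tuples, the `_:` prefix for blank nodes, the 4-character gram length and all function names are assumptions of this sketch; the hashing step is omitted for brevity.

```python
from collections import defaultdict


def uriref_tokens(triples):
    """Yield URIrefs (with duplicates), skipping blank nodes and literals."""
    for triple in triples:
        for node in triple:
            if node.startswith(("http://", "https://")):
                yield node


def char_ngrams(text, n):
    """Set of lowercase character n-grams of text."""
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def build_ngram_index(triples, n=4):
    """Map each character n-gram to the set of URIrefs that contain it."""
    index = defaultdict(set)
    for uri in uriref_tokens(triples):
        for gram in char_ngrams(uri, n):
            index[gram].add(uri)
    return index


RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_CLASS = "http://www.w3.org/2002/07/owl#Class"
example_triples = [
    ("http://foo.com/timeont.owl#timeInterval", RDF_TYPE, OWL_CLASS),
    ("http://foo.com/timeont.owl#calendarClockInterval", RDF_TYPE, OWL_CLASS),
    ("http://purl.org/upper/temporal/t13.owl#timeThing", RDF_TYPE, OWL_CLASS),
    ("_:b0", RDF_TYPE, "http://foo.com/timeont.owl#timeInterval"),
]
index = build_ngram_index(example_triples)
matches = index["time"]
```

With the three example URIrefs from the slide, the gram "time" retrieves all three, including calendarClockInterval, which matches only through the "timeont" part of its namespace.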

35

