Presentation on theme: "1 Introduction to KR and Semantic Web Many slides are based on tutorial by Ivan Herman (W3C) reproduced here with kind permission. Additionally, many of."— Presentation transcript:
1 Introduction to KR and Semantic Web Many slides are based on tutorial by Ivan Herman (W3C) reproduced here with kind permission. Additionally, many of the intro slides are based on tutorial by Sean Bechhofer, Ian Horrocks and Peter F. Patel-Schneider, with additions by Suresh Manandhar and Dimitar Kazakov.
2 History of the Semantic Web Web was “invented” by Tim Berners-Lee (amongst others), a physicist working at CERN TBL’s original vision of the Web was much more ambitious than the reality of the existing (syntactic) Web: TBL (and others) have since been working towards realising this vision, which has become known as the Semantic Web E.g., article in May 2001 issue of Scientific American… “... a goal of the Web was that, if the interaction between person and hypertext could be so intuitive that the machine- readable information space gave an accurate representation of the state of people's thoughts, interactions, and work patterns, then machine analysis could become a very powerful management tool, seeing patterns in our work and facilitating our working together through the typical problems which beset the management of large organizations.”
3 Realising the complete “vision” is too hard for now (probably) But we can make a start by adding semantic annotation to web resources Scientific American, May 2001:
4 Where we are Today: the Syntactic Web [Hendler & Miller 02]
5 The Syntactic Web is… A hypermedia, a digital library A library of documents called (web pages) interconnected by a hypermedia of links A database, an application platform A common portal to applications accessible through web pages, and presenting their results as web pages A platform for multimedia BBC Radio 4 anywhere in the world! Terminator 3 trailers! A naming scheme Unique identity for those documents A place where computers do the presentation (easy) and people do the linking and interpreting (hard). Why not get computers to do more of the hard work? [Goble 03]
6 Hard Work using the Syntactic Web… Find images of Peter Patel-Schneider, Frank van Harmelen and Alan Rector… Rev. Alan M. Gates, Associate Rector of the Church of the Holy Spirit, Lake Forest, Illinois
7 Impossible (?) using the Syntactic Web… Complex queries involving background knowledge Find information about “animals that use sonar but are not either bats or dolphins” Locating information in data repositories Travel enquiries Prices of goods and services Results of human genome experiments Finding and using “web services” Visualise surface interactions between two proteins Delegating complex tasks to web “agents” Book me a holiday next weekend somewhere warm, not too far away, and where they speak French or English, e.g., Barn Owl
8 What is the Problem? Consider a typical web page: Markup consists of: –rendering information (e.g., font size and colour) –Hyper-links to related content Semantic content is accessible to humans but not (easily) to computers…
9 What information can we see… WWW2002 The eleventh international world wide web conference Sheraton waikiki hotel Honolulu, hawaii, USA 7-11 may location 5 days learn interact Registered participants coming from australia, canada, chile denmark, france, germany, ghana, hong kong, india, ireland, italy, japan, malta, new zealand, the netherlands, norway, singapore, switzerland, the united kingdom, the united states, vietnam, zaire Register now On the 7 th May Honolulu will provide the backdrop of the eleventh international world wide web conference. This prestigious event … Speakers confirmed Tim berners-lee Tim is the well known inventor of the Web, … Ian Foster Ian is the pioneer of the Grid, the next generation internet …
10 What information can a machine see…
14 Need to Add “Semantics” External agreement on meaning of annotations E.g., Dublin Core Agree on the meaning of a set of annotation tags Problems with this approach Inflexible Limited number of things can be expressed Use Ontologies to specify meaning of annotations Ontologies provide a vocabulary of terms New terms can be formed by combining and extending existing ones Meaning (semantics) of such terms is formally specified Can also specify relationships between terms in multiple ontologies
15 An ontology is an engineering artifact: It is constituted by a specific vocabulary used to describe a certain reality, plus a set of explicit assumptions regarding the intended meaning of the vocabulary. Thus, an ontology describes a formal specification of a certain domain: Shared understanding of a domain of interest Formal and machine manipulable model of a domain of interest “An explicit specification of a conceptualisation” [Gruber93] Ontology in Computer Science
16 Structure of an Ontology Ontologies typically have two distinct components: Names for important concepts in the domain Elephant is a concept whose members are a kind of animal Herbivore is a concept whose members are exactly those animals who eat only plants or parts of plants Adult_Elephant is a concept whose members are exactly those elephants whose age is greater than 20 years Background knowledge/constraints on the domain Adult_Elephants weigh at least 2,000 kg All Elephants are either African_Elephants or Indian_Elephants No individual can be both a Herbivore and a Carnivore
17 Example Ontology
18 A Semantic Web — First Steps Extend existing rendering markup with semantic markup Metadata annotations that describe content/funtion of web accessible resources Use formal Ontologies to provide vocabulary for annotations “Formal specification” is accessible to machines A prerequisite is a standard web ontology language Need to agree common syntax before we can share semantics Syntactic web based on standards such as HTTP and HTML Make web resources more accessible to automated processes
19 Ontology Design and Deployment Given key role of ontologies in the Semantic Web, it will be essential to provide tools and services to help users: Design and maintain high quality ontologies, e.g.: Meaningful — all named classes can have instances Correct — captured intuitions of domain experts Minimally redundant — no unintended synonyms Richly axiomatised — (sufficiently) detailed descriptions Store (large numbers) of instances of ontology classes, e.g.: Annotations within web pages Answer queries over ontology classes and instances, e.g.: Find more general/specific classes Retrieve annotations/pages matching a given description Integrate and align multiple ontologies
20 Lets have a go towards building a semantic website
21 HTML and XML Book Title: AI introduction Author: Frank Russell ISBN: X HTML: AI introduction Frank Russell X XML:
22 XML documents are trees over textbox nodes XML Schema: grammar to describe legal trees Problem: No agreed semantics book authorisbntitle
23 Sharing data across the web Requires Shared Syntax: XML provides this Shared Semantics : XML does not provide this
24 But this is a difficult task Web documents also contain unstructured pages Words are ambiguous bank – river bank or financial institution (context dependent, so cannot do dictionary lookup) Even though sentences may be unambiguous. Mostly, but not always: I saw a man with a telescope But the web is a collection of documents
25 But hang on The web is not just a collection of documents The web is lot more: It’s a collection of links between documents It’s a collection of structured documents Hence, we can gain a lot by attaching shared semantics to: the links the structure within documents At least, this is a start!!
26 But how do we do it?
27 Example: The Music site of the BBC 27
29 Site editors roam the Web for new facts may discover further links while roaming They update the site manually And the site gets soon out-of-date How to build such a site 1.
30 Editors roam the Web for new data published on Web sites “Scrape” the sites with a program to extract the information Ie, write some code to incorporate the new data Easily gets out of date again… Content changes How to build such a site 2.
31 Editors roam the Web for new data via API-s Understand those… input, output arguments, datatypes used, etc Write some code to incorporate the new data Easily get out of date again… Format changes How to build such a site 3.
32 Use external, public datasets Wikipedia, MusicBrainz, … They are available as data not API-s or hidden on a Web site data can be extracted using, e.g., HTTP requests or standard queries standardised formats/semantics The choice of the BBC
33 Use the Web of Data as a Content Management System Use the community at large as content editors Get others to do the work In short…
34 And this is no secret…
35 There are more an more data on the Web government data, health related data, general knowledge, company information, flight information, restaurants,… More and more applications rely on the availability of that data Data on the Web
36 But… data are often in isolation, “silos” Photo credit “nepatterson”, Flickr
37 A “Web” where documents are available for download on the Internet but there would be no hyperlinks among them Imagine…
38 And the problem is real…
39 We need a proper infrastructure for a real Web of Data data is available on the Web accessible via standard Web technologies data are interlinked over the Web ie, data can be integrated over the Web This is where Semantic Web technologies come in We want machines to do the integration. But this requires machines to understand data Data on the Web is not enough…
40 Lets have a go i.e. connect the silos Photo credit “kxlly”, Flickr
41 We will use a simplistic example to introduce the main Semantic Web concepts In what follows…
42 Map the various data onto an abstract semantically grounded data representation Merge the resulting representations Start making queries on the whole! queries not possible on the individual data sets The rough structure of data integration
43 We start with a book...
44 A simplified bookstore data (dataset “A”)
45 1 st : export your data as a set of relations …isbn/ X Ghosh, Amitav The Glass Palace 2000 London Harper Collins a:title a:year a:city a:p_name a:name a:homepage a:author a:publisher
46 Relations form a graph the nodes refer to the “real” data or contain some literal how the graph is represented in machine is immaterial for now Some notes on the exporting the data
47 Same book in French…
48 Another bookstore data (dataset “F”) ABCD 1 IDTitreTraducteurOriginal 2 ISBN Le Palais des Miroirs$A12$ISBN X IDAuteur 7 ISBN X $A11$ Nom 11 Ghosh, Amitav 12 Besse, Christianne
49 2 nd : export your second set of data …isbn/ X Ghosh, Amitav Besse, Christianne Le palais des miroirs f:original f:nom f:traducteur f:auteur f:titre …isbn/ f:nom
50 3 rd : start merging your data …isbn/ X Ghosh, Amitav Besse, Christianne Le palais des miroirs f:original f:nom f:traducteur f:auteur f:titre …isbn/ f:nom …isbn/ X Ghosh, Amitav The Glass Palace 2000 London Harper Collins a:title a:year a:city a:p_name a:name a:homepage a:author a:publisher
51 3 rd : start merging your data (cont) …isbn/ X Ghosh, Amitav Besse, Christianne Le palais des miroirs f:original f:nom f:traducteur f:auteur f:titre …isbn/ f:nom …isbn/ X Ghosh, Amitav The Glass Palace 2000 London Harper Collins a:title a:year a:city a:p_name a:name a:homepage a:author a:publisher Same URI!
52 3 rd : start merging your data a:title Ghosh, Amitav Besse, Christianne Le palais des miroirs f:original f:nom f:traducteur f:auteur f:titre …isbn/ f:nom Ghosh, Amitav The Glass Palace 2000 London Harper Collins a:year a:city a:p_name a:name a:homepage a:author a:publisher …isbn/ X
53 User of data “F” can now ask queries like: “give me the title of the original” well, … « donnes-moi le titre de l’original » This information is not in the dataset “F”… …but can be retrieved by merging with dataset “A”! Start making queries…
54 We “feel” that a:author and f:auteur should be the same But an automatic merge does not know that! Let us add some extra information to the merged data: a:author same as f:auteur both identify a “Person” a term that a community may have already defined: a “Person” is uniquely identified by his/her name and, say, homepage it can be used as a “category” for certain type of resources However, more can be achieved…
55 3 rd revisited: use the extra knowledge Besse, Christianne Le palais des miroirs f:original f:nom f:traducteur f:auteur f:titre …isbn/ f:nom Ghosh, Amitav The Glass Palace 2000 London Harper Collins a:title a:year a:city a:p_name a:name a:homepage a:author a:publisher …isbn/ X …foaf/Person r:type
56 User of dataset “F” can now query: “donnes-moi la page d’accueil de l’auteur de l’original” well… “give me the home page of the original’s ‘auteur’” The information is not in datasets “F” or “A”… …but was made available by: merging datasets “A” and datasets “F” adding three simple extra statements as an extra “glue” Start making richer queries!
57 Using, e.g., the “Person”, the dataset can be combined with other sources For example, data in Wikipedia can be extracted using dedicated tools e.g., the “dbpedia” project can already extract the “infobox” information from Wikipedia (check out the dbpedia ontology)dbpediadbpedia ontology Combine with different datasets
58 Merge with Wikipedia data Besse, Christianne Le palais des miroirs f:original f:nom f:traducteur f:auteur f:titre …isbn/ f:nom Ghosh, Amitav The Glass Palace 2000 London Harper Collins a:title a:year a:city a:p_name a:name a:homepage a:author a:publisher …isbn/ X …foaf/Person r:type r:type foaf:namew:reference
59 Merge with Wikipedia data Besse, Christianne Le palais des miroirs f:original f:nom f:traducteur f:auteur f:titre …isbn/ f:nom Ghosh, Amitav The Glass Palace 2000 London Harper Collins a:title a:year a:city a:p_name a:name a:homepage a:author a:publisher …isbn/ X …foaf/Person r:type r:type foaf:namew:reference w:author_of w:isbn
60 Merge with Wikipedia data Besse, Christianne Le palais des miroirs f:original f:nom f:traducteur f:auteur f:titre …isbn/ f:nom Ghosh, Amitav The Glass Palace 2000 London Harper Collins a:title a:year a:city a:p_name a:name a:homepage a:author a:publisher …isbn/ X …foaf/Person r:type r:type foaf:namew:reference w:author_of w:born_in w:isbn w:long w:lat
61 It may look like it but, in fact, it should not be… What happened via automatic means is done every day by Web users! The difference: a bit of extra rigour so that machines could do this, too Is that surprising?
62 We could add extra knowledge to the merged datasets e.g., a full classification of various types of library data geographical information etc. This is where ontologies, extra rules, etc, come in ontologies/rule sets can be relatively simple and small, or huge, or anything in between… Even more powerful queries can be asked as a result It could become even more powerful
63 What did we do? Data in various formats Data represented in abstract format Applications Map, Expose, … Manipulate Query …
64 What did we do? (alternate view) Inferencing Query and Update Web of Data Applications Browser Applications Stand Alone Applications Common “Graph” Format & Common Vocabularies “Bridges” Data on the Web
65 … the graph representation is independent of the exact structures … a change in local database schemas, XHTML structures, etc, does not affect the whole “schema independence” … new data, new connections can be added seamlessly The abstraction pays off because…
66 Through URI-s we can link any data to any data The “network effect” is extended to the (Web) data “Mashup on steroids” become possible The network effect
67 Ontology Languages for the Semantic Web
68 Ontology Languages Wide variety of languages for “Explicit Specification” Graphical notations Semantic networks Topic Maps (see UML RDF Logic based Description Logics (e.g., OIL, DAML+OIL, OWL) Rules (e.g., RuleML, LP/Prolog) First Order Logic (e.g., KIF) Conceptual graphs (Syntactically) higher order logics (e.g., LBase) Non-classical logics (e.g., Flogic, Non-Mon, modalities) Probabilistic/fuzzy Degree of formality varies widely Increased formality makes languages more amenable to machine processing (e.g., automated reasoning)
69 Objects/Instances/Individuals Elements of the domain of discourse Equivalent to constants in FOL Types/Classes/Concepts Sets of objects sharing certain characteristics Equivalent to unary predicates in FOL Relations/Properties/Roles Sets of pairs (tuples) of objects Equivalent to binary predicates in FOL Such languages are/can be: Well understood Formally specified (Relatively) easy to use Amenable to machine processing Many languages use “object oriented” model based on:
70 Web “Schema” Languages Existing Web languages extended to facilitate content description XML XML Schema (XMLS) RDF RDF Schema (RDFS) XMLS not an ontology language Changes format of DTDs (document schemas) to be XML Adds an extensible type hierarchy Integers, Strings, etc. Can define sub-types, e.g., positive integers RDFS is recognisable as an ontology language Classes and properties Sub/super-classes (and properties) Range and domain (of properties)
71 RDF and RDFS RDF stands for Resource Description Framework It is a W3C candidate recommendation (http://www.w3.org/RDF) RDF is graphical formalism ( + XML syntax + semantics) for representing metadata for describing the semantics of information in a machine- accessible way RDFS extends RDF with “schema vocabulary”, e.g.: Class, Property type, subClassOf, subPropertyOf range, domain
72 URIs URI = Uniform Resource Identifier "The generic set of all names/addresses that are short cuts referring to resources" URLs (Uniform Resource Locators) are a particular type of URI, used for resources that can be accessed on the WWW (e.g., web pages) In RDF, URIs typically look like “normal” URLs, often with fragment identifiers to point at specific parts of a document:
73 Web Ontology Language Requirements Desirable features identified for Web Ontology Language: Extends existing Web standards Such as XML, RDF, RDFS Easy to understand and use Should be based on familiar KR idioms Formally specified Of “adequate” expressive power Possible to provide automated reasoning support
74 From RDFS to OWL Two languages developed to satisfy above requirements OIL: developed by group of (largely) European researchers (several from EU OntoKnowledge project) DAML-ONT: developed by group of (largely) US researchers (in DARPA DAML programme) Efforts merged to produce DAML+OIL Development was carried out by “Joint EU/US Committee on Agent Markup Languages” Extends (“DL subset” of) RDF DAML+OIL submitted to W3C as basis for standardisation Web-Ontology (WebOnt) Working Group formed WebOnt group developed OWL language based on DAML+OIL OWL language now a W3C Candidate Recommendation Will soon become Proposed Recommendation
75 OWL Axioms Axioms (mostly) reducible to inclusion (v) C ´ D iff both C v D and D v C
76 Semantic Web Wedding Cake
77 The Semantic Web provides technologies to make such integration possible! Hopefully you get a full picture at the end … So where is the Semantic Web?
78 Books – Semantic Web + OWL
79 Books – Semantic Web + OWL
80 Papers – Description Logics A Description Logic Primer Markus Krötzsch, Frantisek Simancik, Ian Horrocks This paper provides a self-contained first introduction to description logics (DLs). Attributive concept descriptions with complements Manfred Schmidt-Schauß, Gert Smolka The description logic described in this paper is closely related to the one taught in ARIN.