Introduction to the Semantic Web Charlie Abela Department of Artificial Intelligence charlie.abela@um.edu.mt
Lecture Outline Course organisation Today’s Web limitations Machine-processable data The Semantic Web Impact Semantic Web Technologies The Layered Approach CSA 3210 Introduction
Organisation This part of the course: approx. 2 ECTS = 14 hrs Lectures: usually Tuesday 15:00-16:00 Assignment: intended to combine all aspects of this course
Course Material Slides & Additional Reading: http://www.cs.um.edu.mt/~cabe2/lectures/sw/course_material.html Textbooks: A Semantic Web Primer by Grigoris Antoniou and Frank van Harmelen, ISBN 9780262012102; Semantic Web: concepts, technologies and applications by Karin K. Breitman, Marco A. Casanova and Walter Truszkowski, ISBN 9781846285813
The Web What are the main components of the Web? HTTP (how to transfer data): GET /index.html URI (how to address data): http://www.cs.um.edu.mt/.... HTML (how to mark up data for human readers): <html><head><title>.....
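A minimal sketch of how HTTP and URIs fit together: split a URI into host and path, then write out the request text a client would send. The helper name `build_get_request` is hypothetical, and pairing the path `/index.html` with the host `www.cs.um.edu.mt` is an assumption made only for illustration. Nothing is sent over the network.

```python
from urllib.parse import urlsplit


def build_get_request(uri: str) -> str:
    """Build the text of a minimal HTTP/1.1 GET request for a URI."""
    parts = urlsplit(uri)
    path = parts.path or "/"  # an empty path means the root document
    return f"GET {path} HTTP/1.1\r\nHost: {parts.netloc}\r\n\r\n"


# Assumed example URI: host and path are combined for illustration only.
request = build_get_request("http://www.cs.um.edu.mt/index.html")
print(request)
```

The response to such a request would typically be an HTML document, i.e. markup aimed at a human reader.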
The core problem of the Web Information Overload which leads to problems when Retrieving documents Extracting relevant data from retrieved documents Combining information from different sources to achieve a particular goal
Retrieve a document Querying for “jaguar” returns various types of results: cars, felines, an operating system, and who knows what else
Extracting information
Aggregating information Find me the cheapest price for the book “Semantic Web Primer”
Personal Software Agents Let a personal assistant handle all the web-related tasks. Cool!! However….
Today’s Web Today’s Web content is suitable for human consumption. However, to a machine the same page must look like this (image on the original slide): Crazy!!!
Current Web Content
HTML:
<h1>Department of AI</h1>
Welcome to the Department of Artificial Intelligence.
<h2>Students’ hours</h2>
Mon 10am – 11.30am<br>
Tue 11am – 12.30pm<br>
Wed 3pm - 4pm<br>
Thu 11am – 12.30pm<br>
Fri 10am – 11.30am<p>
Students are urged to contact us during these slots
<a href=". . .">Staff Pages</a>
Web content is currently formatted for human readers rather than programs. HTML is the predominant language in which Web pages are written. This leads to problems where machines are involved: How to distinguish staff pages? How to determine exact contact hours? If links are to be followed, how will the agent find the correct one?
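To see why such a page is hard for a program, here is a rough sketch that parses an abridged copy of the HTML above with Python's standard `html.parser`. The class name `TextCollector` is hypothetical. The markup only tells the program which text is a heading or a link, so the contact hours come back as plain strings with nothing marking them as hours.

```python
from html.parser import HTMLParser

# Abridged copy of the slide's page; the elided href is kept as-is.
PAGE = """<h1>Department of AI</h1>
Welcome to the Department of Artificial Intelligence.
<h2>Students' hours</h2>
Mon 10am - 11.30am<br>
Tue 11am - 12.30pm<br>
<a href=". . .">Staff Pages</a>"""


class TextCollector(HTMLParser):
    """Collect (enclosing tag, text) pairs: all that the markup reveals."""

    TRACKED = {"h1", "h2", "a"}

    def __init__(self):
        super().__init__()
        self.current = None  # tracked tag we are inside, if any
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.TRACKED:
            self.current = tag

    def handle_endtag(self, tag):
        if tag in self.TRACKED:
            self.current = None

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append((self.current, text))


collector = TextCollector()
collector.feed(PAGE)
collector.close()
for tag, text in collector.chunks:
    print(tag, "->", text)
```

An agent could guess that the lines after the "Students' hours" heading are time slots, but that guess breaks as soon as the page layout changes.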
Possible solution Apart from making content human-readable, also make it machine-processable! Ask queries that are machine-understandable, i.e. machines must be capable of understanding all the terms involved
The Semantic Web Approach The Semantic Web is specifically a web of machine-readable information whose meaning is well-defined by standards. It is not artificial intelligence: no magic is involved; rather, we need to find ways in which our machines can access and use machine-processable information to ease our day-to-day activities. Nor is it a separate kind of Web: rather, it is an extension, i.e. Web + machine-processable information
Impact of the Semantic Web Knowledge Management: concerns itself with acquiring, accessing, and maintaining knowledge within an organization. It is a key activity of large businesses, which view internal knowledge as an intellectual asset. B2C Electronic Commerce: a typical scenario is that a user visits one or several online shops, browses their offers, then selects and orders products. Browsing multiple stores is too time-consuming, so make use of shopbots. B2B Electronic Commerce: currently relies mostly on EDI (complex, difficult to use), but B2B is not well supported by Web standards
Semantic Web Technologies Explicit metadata Ontologies to standardise concepts and relations between them Logic and Inference: languages founded in various flavours of logic Software Agents: make use of all the above to help us in our tasks
Explicit Metadata Metadata: data about data. Metadata captures part of the meaning of data; it is structured data which describes the characteristics of a resource. Used in HTML: the <meta> tag. It shares many characteristics with the cataloguing that takes place in libraries, museums and archives. E.g. the Dublin Core schema can be used to define a “virtual card”
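As a small illustration of explicit metadata, the sketch below reads Dublin Core style `<meta>` elements out of an HTML head and collects them into a "virtual card". The `DC.` name prefix follows the usual convention for embedding Dublin Core in HTML; the class name `MetaReader` and the sample values are assumptions made for this example.

```python
from html.parser import HTMLParser

# Hypothetical page head: the values are invented for illustration.
HEAD = """<head>
<meta name="DC.title" content="Introduction to the Semantic Web">
<meta name="DC.creator" content="Charlie Abela">
<meta name="DC.language" content="en">
<meta name="viewport" content="width=device-width">
</head>"""


class MetaReader(HTMLParser):
    """Collect Dublin Core <meta> name/content pairs into a dictionary."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        if name.startswith("DC."):
            self.metadata[name] = attributes.get("content")


reader = MetaReader()
reader.feed(HEAD)
reader.close()
print(reader.metadata)
```

Unlike the body text of a page, each value here carries a label (title, creator, language) that a program can rely on.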
A more Comprehensive Representation
XML based:
<department>
  <departmentName>Artificial intelligence</departmentName>
  <hod>
    <name>Roger Right</name>
    <room>312</room>
    <telephone>23400007</telephone>
    <contactHr>11:30am-13:30pm</contactHr>
  </hod>
  <staff>
    <lecturer>Steve Runner</lecturer>
    <lecturer>George Cool</lecturer>
    <secretary>Mary Nice</secretary>
  </staff>
</department>
XML-based representations are more easily processable by machines, since they are more structured
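A short sketch of what "more easily processable" means in practice, using Python's standard `xml.etree.ElementTree` on the department document above. The variable names are chosen for this example only.

```python
import xml.etree.ElementTree as ET

DEPARTMENT_XML = """<department>
  <departmentName>Artificial intelligence</departmentName>
  <hod>
    <name>Roger Right</name>
    <room>312</room>
    <telephone>23400007</telephone>
    <contactHr>11:30am-13:30pm</contactHr>
  </hod>
  <staff>
    <lecturer>Steve Runner</lecturer>
    <lecturer>George Cool</lecturer>
    <secretary>Mary Nice</secretary>
  </staff>
</department>"""

root = ET.fromstring(DEPARTMENT_XML)

# Each question becomes a path lookup instead of guesswork over free text.
hod_name = root.findtext("hod/name")
contact_hours = root.findtext("hod/contactHr")
lecturers = [element.text for element in root.findall("staff/lecturer")]

print(hod_name, contact_hours, lecturers)
```

Note that the program still has to know in advance what the tag names `hod` and `contactHr` mean; XML gives structure, not meaning.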
Ontologies The term ontology originates from philosophy: The study of the nature of existence Ontology is the study of the categories of things that exist or may exist in some domain…it is a catalogue of the types of things that are assumed to exist in a domain D from the perspective of a person who uses a language L to talk about D. (Sowa 1997) Think of an ontology as a vocabulary used to describe things (Guarino 1998) Ontologies are used to facilitate knowledge sharing and reuse by formally defining a shared conceptualization
Components of Ontologies An ontology formally describes a domain of discourse and includes the following components. Terms denote important concepts (or classes of objects) in the domain, e.g. professors, staff, students, courses, departments. Relationships between these terms: the most typical is a taxonomy relation (is-A); a class C is a subclass of another class C' if every object in C is also included in C', e.g. all professors are staff members
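The is-A relation can be sketched as a lookup table from each class to its direct superclass. The class names beyond professor and staff, the table `SUBCLASS_OF` and the helper `is_subclass` are hypothetical and only serve the illustration.

```python
# Hypothetical taxonomy: each class maps to its direct superclass.
SUBCLASS_OF = {
    "professor": "staff",
    "lecturer": "staff",
    "staff": "person",
    "student": "person",
}


def is_subclass(cls, ancestor):
    """Return True if cls equals ancestor or reaches it via is-A links."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)  # None once we run out of superclasses
    return False


print(is_subclass("professor", "staff"))
print(is_subclass("professor", "person"))
print(is_subclass("student", "staff"))
```

Following the links upwards is what lets a program conclude that every professor is also a person, even though that was never stated directly.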
Other Ontology Components Properties: e.g. X teaches Y Value restrictions e.g. only faculty members can teach courses Disjointness statements e.g. faculty members and general staff are disjoint Logical relationships between objects e.g. every department must include at least 10 faculty members
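A rough sketch of how these components could be checked over plain Python sets. All individuals, the course code and the function names are invented for illustration; real ontology languages express such constraints declaratively rather than as hand-written checks.

```python
# Hypothetical individuals, for illustration only.
faculty_members = {"roger", "steve", "george"}
general_staff = {"mary"}
teaches = {("steve", "CSA3210"), ("mary", "CSA3210")}  # property: X teaches Y


def teaching_violations(teaches, faculty_members):
    """Value restriction: only faculty members can teach courses."""
    return sorted(p for p, _course in teaches if p not in faculty_members)


def are_disjoint(first, second):
    """Disjointness: no individual belongs to both classes."""
    return first.isdisjoint(second)


def has_enough_faculty(faculty_members, minimum=10):
    """Logical relationship: a department needs at least `minimum` faculty."""
    return len(faculty_members) >= minimum


violations = teaching_violations(teaches, faculty_members)
disjoint = are_disjoint(faculty_members, general_staff)
enough = has_enough_faculty(faculty_members)
print(violations, disjoint, enough)
```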
Ontologies on the Web Ontologies are ideal to provide a shared understanding of a domain: enable semantic interoperability overcome differences in terminology issue: mappings between ontologies Ontologies are useful for the organization and navigation of Web sites Ontologies are useful for improving the accuracy of Web searches search engines can look for pages that refer to a precise concept in an ontology
Semantic Web Languages We need languages to define ontologies. Initially there were RDF and RDF Schema (RDF: Resource Description Framework); then came DAML and OIL; now we have a W3C recommendation for OWL, the Web Ontology Language (listed in order of increasing expressiveness)
Logic and Inference Logic is the discipline that studies the principles of reasoning Formal languages for expressing knowledge Well-understood formal semantics Declarative knowledge: we describe what holds without caring about how it can be deduced Automated reasoners can deduce (infer) conclusions from the given knowledge
Machine understandable… A first look at what machine understandable means. Published facts: B related-to A; C related-to A; D related-to C. Query: return all entities related to A (?x related-to A). Result: B, C
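The query on this slide can be sketched as simple pattern matching over (subject, predicate, object) triples. The function name `subjects` is hypothetical, and a real system would use a query language rather than a list comprehension.

```python
# The published facts, as (subject, predicate, object) triples.
FACTS = [
    ("B", "related-to", "A"),
    ("C", "related-to", "A"),
    ("D", "related-to", "C"),
]


def subjects(facts, predicate, obj):
    """Answer '?x predicate obj' by matching against the stated facts only."""
    return [s for s, p, o in facts if p == predicate and o == obj]


result = subjects(FACTS, "related-to", "A")
print(result)  # ['B', 'C']
```

D is not returned because no published fact links it directly to A.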
Machine understandable + inference Published facts: B related-to A; C related-to A; D related-to C. Also declare that related-to is transitive: ?x related-to ?y and ?y related-to ?z => ?x related-to ?z. Query: return all entities related to A (?x related-to A). Result: B, C, D. How is this possible? Do people need to learn logic? How are we going to specify what A, B, C and D are? How are we going to specify related-to?
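One way to picture the inference step is to apply the transitivity rule repeatedly until no new pairs appear, then run the same query. This is a naive sketch; the function name `transitive_closure` is chosen for this example, and real reasoners are considerably more sophisticated.

```python
def transitive_closure(pairs):
    """Apply 'x R y and y R z => x R z' until nothing new can be added."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for x, y in list(closure):
            for y2, z in list(closure):
                if y == y2 and (x, z) not in closure:
                    closure.add((x, z))
                    changed = True
    return closure


# The published related-to facts as (subject, object) pairs.
PAIRS = {("B", "A"), ("C", "A"), ("D", "C")}

inferred = transitive_closure(PAIRS)
result = sorted(x for x, y in inferred if y == "A")
print(result)  # ['B', 'C', 'D']
```

D now appears in the answer because D related-to C and C related-to A together yield D related-to A.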
Software Agents Software agents work autonomously and proactively. They evolved out of object-oriented and component-based programming. A personal agent on the Semantic Web will: receive some tasks and preferences from the person; seek information from Web sources and communicate with other agents; compare information about user requirements and preferences and suggest certain choices; recommend answers to the user
Semantic Web Layered Approach
In the following lectures… We will explore some of the technologies mentioned in the SW layered approach, particularly those in the lower layers: present an overview of these technologies walk through examples and discuss their importance vis-à-vis application areas
Suggested reading… Textbook: Semantic Web Primer, Chapter 1. T. Berners-Lee, J. Hendler, O. Lassila, The Semantic Web. http://www.cs.um.edu.mt/~cabe2/lectures/sw/papers/The_Semantic_Web.pdf J. Hendler, Agents and the Semantic Web. http://www.cs.umd.edu/users/hendler/AgentWeb.html Further reading: E. Dumbill, The Semantic Web: A Primer. http://www.xml.com/pub/a/2000/11/01/semanticweb/ S. Palmer, The Semantic Web: An Introduction. http://infomesh.net/2001/swintro/
Next lecture Introduction to XML DTD XML Schema Comparison
Extra slides
Another typical Example Rules: prof(X) → facultyMember(X); facultyMember(X) → staffMember(X). Fact: prof(michael). We can deduce the following conclusions: facultyMember(michael); staffMember(michael); prof(michael) → staffMember(michael)
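The deduction above can be sketched as naive forward chaining: keep applying the rules to the known facts until nothing new is derived. The data layout (rules as premise/conclusion pairs, facts as predicate/individual pairs) and the function name `forward_chain` are assumptions made for this illustration.

```python
# Rules of the form premise(X) -> conclusion(X).
RULES = [
    ("prof", "facultyMember"),
    ("facultyMember", "staffMember"),
]

# Known facts as (predicate, individual) pairs.
FACTS = {("prof", "michael")}


def forward_chain(facts, rules):
    """Derive every fact that follows from the given facts and rules."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            for predicate, individual in list(derived):
                new_fact = (conclusion, individual)
                if predicate == premise and new_fact not in derived:
                    derived.add(new_fact)
                    changed = True
    return derived


derived = forward_chain(FACTS, RULES)
for predicate, individual in sorted(derived):
    print(f"{predicate}({individual})")
```

The knowledge is stated declaratively (what holds), and the loop is one possible way for a program to work out the consequences.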