February 22, 2011. COMS E6125 Web-enHanced Information Management (WHIM). Prof. Gail Kaiser, Spring 2011.


1 COMS E6125 Web-enHanced Information Management (WHIM)
Prof. Gail Kaiser
Spring 2011
February 22, 2011, COMS 6125

2 Today’s Topic:
Introduction to the Semantic Web
RDF
Ontologies

3 Simplicity is Good
The World Wide Web contains huge amounts of information created by many different organizations, communities and individuals for many different reasons
Web users can easily access this information by specifying a known URL or using a search engine, and following links to find other related resources
This simplicity is a key aspect that made the Web so popular

4 Simplicity is Bad
The simplicity of the current Web has a price
It is very easy to get lost, or discover irrelevant or unrelated information
For instance, if we search for courses taught by a person named “Gail Kaiser”, we might find all kinds of other information
http://www.google.com/search?hl=&q=course+taught+by+gail+kaiser&sourceid=navclient-ff&rlz=1B3GGGL_enUS253US253&ie=UTF-8
The problem is that the search engine does not know what “courses” or “taught” means

5 Machine accessible meaning (What it’s like to be a machine)
[Figure: a document labeled CV, name, education, work, private]

6 So what does this mean?
What’s a “CV”? What’s a “name”? Etc.
Need semantics

7 What to do?
Develop enabling standards and technologies
–to help machines understand more information on the Web
–so that they can support richer discovery, data integration, navigation and automation of tasks

8 Add Metadata
Associate semantically rich, descriptive information with any resource
For instance, add metadata about teaching, so we can search for documents that have metadata specifying “Gail Kaiser” as a “teacher” (or “instructor”)

9 The Semantic Web
Provides a common framework that allows data to be shared and reused across application, enterprise and community boundaries
Provides URLs not only for documents, but also for people, concepts and relationships
By giving unique identifiers to the person, the role “teacher” and the concept of “course”, we make very clear who the person is and what relation holds between this person and a particular document

10 What’s the difference?
Most Web content today is designed for humans to read, not for computer programs to manipulate meaningfully
Computers can adeptly parse Web pages for layout and routine processing (here a header, there a link to another page), but in general, computers have no reliable way to process the semantics
The Semantic Web brings structure to the meaningful content of Web pages, creating an environment where software agents roaming from page to page can carry out sophisticated tasks for users

11 What’s the difference?
The Semantic Web is not a separate web but an extension of the current web, in which information is given well-defined meaning, better enabling computers and people to work in co-operation. [Berners-Lee et al., 2001]

12 Wasn’t that what XML was supposed to do?
Yes and no
For the Semantic Web to function, computers must have access to structured collections of information and to sets of inference rules that they can use to conduct automated reasoning

13 Isn’t that just Knowledge Representation?
Traditional knowledge representation systems typically have been centralized, requiring everyone to share exactly the same definition of common concepts such as “parent” or “vehicle”
But central control is stifling, and doesn’t scale
This is why centralized hypertext link servers were abandoned for the WWW

14 What about Web Services?
Web services are computational programs accessed using Web technologies
They may or may not operate on Web pages as data
When they do, the semantics are implied by WSDL descriptions but basically hidden inside the code
There is no way for an arbitrary Web service or other program to “understand” the semantics of Web pages

15 Semantic Web Layers (T. Berners-Lee)

16 Start with XML, not HTML
HTML (markup not preserved): WHIM, Instructor: Gail Kaiser, Students: Donald Duck
XML (tags not preserved): WHIM, Gail Kaiser, Donald Duck

17 XML document = labeled tree
node = label + attr/values + contents
XML Schema: grammars for describing legal trees and datatypes
[Figure: tree rooted at course, with nodes instructor, title, students, name, http...]
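The labeled-tree view can be sketched with Python's standard xml.etree.ElementTree module. The markup below is a hypothetical reconstruction: the element names (course, title, instructor, students, name) come from the tree on the slide, but the nesting and the walk helper are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: element names follow the slide's tree,
# the exact nesting is assumed.
COURSE_XML = """
<course>
  <title>WHIM</title>
  <instructor><name>Gail Kaiser</name></instructor>
  <students><name>Donald Duck</name></students>
</course>
"""

def walk(node, depth=0):
    """Yield (depth, label, text) for every node of the labeled tree."""
    text = (node.text or "").strip()
    yield depth, node.tag, text
    for child in node:
        yield from walk(child, depth + 1)

root = ET.fromstring(COURSE_XML)
tree_rows = list(walk(root))

# Print the tree with indentation showing parent/child structure.
for depth, label, text in tree_rows:
    print("  " * depth + label + (": " + text if text else ""))
```

A parser recovers the structure this way, but nothing in the tree says what "title" or "instructor" means; that gap is what the following slides address.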

18 Why not use XML Tags to represent Semantics?
Syntax: the structure of your data
Semantics: the meaning of your data
Two conditions necessary for interoperability:
–Adopt a common syntax: enables applications to parse the data
–Adopt a means for understanding the semantics: enables applications to use the data

19 XML and Semantics?
… But what does “title” mean?
If we ask google, we get (on the 1st page)
–Boxing and martial arts equipment
–Prefix or suffix added to person’s name
–HTML tag
–Women’s underwear
–US Laws
–Home purchase insurance
–Library search

20 XML Limitations for Semantic Markup
XML makes no commitment on:
(1) Domain-specific vocabulary
(2) Modeling primitives
Requires pre-arranged agreement on (1) & (2)
Only feasible for closed collaboration
–agents in a small & stable community
–pages on a small & stable intranet
Not suited for sharing Web resources

21 XML ≠ machine accessible meaning
[Figure: a document tagged CV, name, education, work, private]

22 Beyond XML
XML lets everyone create their own tags
Scripts, or programs, can make use of these tags in sophisticated ways, but the programmer has to know what the page writer uses each tag for
XML allows users to add structure to their documents but says nothing about what the structures mean

23 Semantic Web Layers

24 Add RDF = Resource Description Framework
Encodes meaning in sets of triples (subject, predicate and object), analogous to the subject, verb and object of an elementary sentence
Makes assertions that particular things (people, Web pages or whatever) have properties (such as “is a sister of”, “is the author of”) with certain values (another person, another Web page)
This structure can describe much of the data processed by machines

25 Example
Imagine that we want to state the fact that someone named Gail Kaiser wrote a particular Web page
A straightforward way to state this in English would be in the form of a simple statement such as:
http://www.cs.columbia.edu/~kaiser/index.html has an author whose value is Gail Kaiser

26 Making Statements about Resources
We need a way to identify the thing we want to describe (the Web page)
We need a way to identify a specific property (author) of the thing that we want to describe
We need a way to identify the thing we want to assign as the value of this property (who the author is), for the thing we want to describe

27 Making Statements about Resources
In the example, we used the Web page's URL (Uniform Resource Locator) to identify it: the subject
We used the word “author” to identify the property we want to talk about: the predicate
And the phrase “Gail Kaiser” to identify the thing (a person) we want to say is the value of this property: the object

28 Many Statements can be made
We could state other properties of this Web page by writing additional English statements of the same general form
http://www.cs.columbia.edu/~kaiser/index.html has a modification-date whose value is January 07, 2011
http://www.cs.columbia.edu/~kaiser/index.html has a size whose value is 18,985 bytes
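The three English statements about the Web page can be sketched as (subject, predicate, object) tuples. This is only an illustration in plain Python: the names statements, values_of and describe are hypothetical, and real RDF would identify the predicates with URIs rather than bare words.

```python
PAGE = "http://www.cs.columbia.edu/~kaiser/index.html"

# Each statement is a (subject, predicate, object) triple.
statements = [
    (PAGE, "author", "Gail Kaiser"),
    (PAGE, "modification-date", "January 07, 2011"),
    (PAGE, "size", "18,985 bytes"),
]

def values_of(subject, predicate):
    """Return every object asserted for the given subject and predicate."""
    return [o for s, p, o in statements if s == subject and p == predicate]

def describe(statement):
    """Render a triple back into the English form used on the slides."""
    s, p, o = statement
    return f"{s} has a property {p} whose value is {o}"

for statement in statements:
    print(describe(statement))
```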

29 But what do these Statements actually mean?
Subject and object can each be identified by a URL, just as used in a link on a Web page
The verbs (predicates) can also be identified by URLs, which enables anyone to define a new concept, a new predicate, just by defining a URL for it somewhere on the Web (a “Web resource”)
The URLs ensure that concepts are not just words in a document, but are tied to a unique definition that everyone can find on the Web

30 Web Resources
RDF is a language for representing information about resources on the World Wide Web
It is particularly intended for representing metadata about Web resources, such as the title, author, modification date and size of a Web page

31 Generalized Resources
By generalizing the concept of a “Web resource”, RDF can be used to represent information about things that can be identified on the Web, even when they can't be directly retrieved on the Web
One example is the author of a Web page

32 Reconsider Example
http://www.cs.columbia.edu/~kaiser/index.html has an author whose value is Gail Kaiser
Neither the notion of an “author” nor Gail Kaiser can be retrieved from the Web
Thus we need URIs in addition to URLs

33 Concept Graphs
RDF is based on the idea of identifying things using URIs
And describing resources (subjects) in terms of simple properties (verbs or predicates) and property values (objects)
This enables RDF to represent related concepts as a graph of nodes and arcs representing the resources, their properties and values

34 Concept Graph Example
XML syntax
Chained triples form a graph
[Figure: graph with nodes http://bank.cs.columbia.edu/classes/cs6125/, Kaiser, kaiser+6125@..., W3C and http://www.w3.org/RDF, and arcs labeled site-owner, email and describes]

35 Information Exchange
RDF provides a common framework for expressing this information so it can be exchanged between applications without loss of meaning
The ability to exchange information between different applications means that the information may be made available to applications other than those for which it was originally created
Application designers can leverage the availability of common RDF parsers and processing tools
RDF can be written in XML format, further leveraging XML tools and experience

36 What is RDF (again)?
RDF is a data model
–the model is domain-neutral and application-neutral
–the model can be viewed as directed, labeled graphs or as an object-oriented model (object/attribute/value)
RDF data model is an abstract, conceptual layer independent of XML
–consequently, XML is a transfer syntax for RDF, not a component of RDF
–RDF data might never occur in XML form

37 RDF Model
RDF “statements” consist of
–resources (= nodes) = subject
–which have properties = predicate
–which have values (= nodes, strings) = object

38 RDF Model
“http://www.w3.org/TR/REC-rdf-syntax/ has the editor Dave Beckett”
–resource: http://www.w3.org/TR/REC-rdf-syntax/
–property: editor
–value: “Dave Beckett”

39 RDF Model Example
[Figure: the resource http://www.w3.org/TR/REC-rdf-syntax/ with three properties]
–dc:Creator “Dave Beckett”
–dc:Date “2004-02-10”
–dc:Publisher “W3C”

40 Complex Values
So far, values of properties have been strings
A graph node (corresponding to a resource) also can be the value of a property
–arbitrarily complex tree and graph structures are possible
–syntactically, values can be embedded (i.e., lexically in-line) or referenced (linked)

41 Complex Values
[Figure: http://www.w3.org/TR/REC-rdf-syntax/ has dc:Creator pointing to an intermediate node, which has p:Name “Dave Beckett” and p:EMail “mailto:dave@dajobe.org”]

42 Complex Values
Corresponding triples
{ “http://www.w3.org/TR/REC-rdf-syntax/”, dc:Creator, x }
{ x, p:Name, “Dave Beckett” }
{ x, p:EMail, “dave@dajobe.org” }
[Figure: the same graph, with the intermediate node x between the resource and the values “Dave Beckett” and “mailto:dave@dajobe.org”]
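A minimal sketch of how the intermediate node x lets triples chain into a graph. The blank-node label "_:x" and the helpers objects and follow are assumptions for illustration; the e-mail value uses the mailto: form shown in the figure.

```python
SPEC = "http://www.w3.org/TR/REC-rdf-syntax/"
X = "_:x"  # assumed label for the unnamed intermediate node

triples = {
    (SPEC, "dc:Creator", X),
    (X, "p:Name", "Dave Beckett"),
    (X, "p:EMail", "mailto:dave@dajobe.org"),
}

def objects(subject, predicate):
    """All values of `predicate` for `subject`, in sorted order."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)

def follow(start, *path):
    """Walk a chain of predicates, node by node, starting from `start`."""
    frontier = [start]
    for predicate in path:
        frontier = [o for node in frontier for o in objects(node, predicate)]
    return frontier

creator_names = follow(SPEC, "dc:Creator", "p:Name")
creator_emails = follow(SPEC, "dc:Creator", "p:EMail")
print(creator_names, creator_emails)
```

Because the value of dc:Creator is itself a node, any number of further properties can hang off it without changing the statement about the spec.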

43 Containers
Containers are collections: they allow grouping of resources (or literal values)
It is possible to make statements about the container (as a whole) or about its members individually
Different types of containers
–bag - unordered collection
–seq - ordered collection (= “sequence”)
–alt - represents alternatives
It is possible to create collections based on URI patterns, e.g., all files in a particular web site
Duplicate values are permitted: there is no mechanism to enforce unique value constraints

44 Containers
[Figure: http://www.w3.org/TR/REC-rdf-syntax has dc:Creator pointing to a container node of rdf:type rdf:Seq, with members rdf:_1 “Dave Beckett” and rdf:_2 “Brian McBride”]
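The ordered container in the figure can be sketched as triples whose membership properties rdf:_1, rdf:_2, ... carry the order. The node label "_:creators" and the helper seq_members are hypothetical names chosen for this sketch.

```python
SPEC = "http://www.w3.org/TR/REC-rdf-syntax"
SEQ = "_:creators"  # assumed label for the container node

container_triples = [
    (SPEC, "dc:Creator", SEQ),
    (SEQ, "rdf:type", "rdf:Seq"),
    # Deliberately listed out of order: position comes from the property name.
    (SEQ, "rdf:_2", "Brian McBride"),
    (SEQ, "rdf:_1", "Dave Beckett"),
]

def seq_members(triples, container):
    """Return the members of a container ordered by their rdf:_n index."""
    prefix = "rdf:_"
    indexed = [
        (int(p[len(prefix):]), o)
        for s, p, o in triples
        if s == container and p.startswith(prefix)
    ]
    return [member for _, member in sorted(indexed)]

creators = seq_members(container_triples, SEQ)
print(creators)
```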

45 Higher-order Statements
One can make RDF statements about other RDF statements
Example: “The Library of Congress affiliates Dave Beckett as the author of the RDF Syntax spec”
Allow us to express beliefs (and other modalities)
Important for trust models, digital signatures, etc.
Constitute metadata about metadata
Represented by modeling RDF in RDF itself

46 Reification
[Figure: the statement http://www.w3.org/TR/REC-rdf-syntax dc:Creator “Dave Beckett” inside a dotted box, which itself has dc:Creator “Library of Congress”]
The dotted box corresponds to the following statements
{ x, rdf:predicate, “dc:creator” }
{ x, rdf:subject, “http://www.w3.org/TR/REC-rdf-syntax” }
{ x, rdf:object, “Dave Beckett” }
{ x, rdf:type, “rdf:Statement” }
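A short sketch of reification in plain Python: a statement becomes a node described by four triples, and a further triple can then be made about that node. The helper reify and the label "_:x" are hypothetical; the property names follow the slide.

```python
def reify(statement_id, subject, predicate, obj):
    """Describe one statement as four triples about the node `statement_id`."""
    return [
        (statement_id, "rdf:type", "rdf:Statement"),
        (statement_id, "rdf:subject", subject),
        (statement_id, "rdf:predicate", predicate),
        (statement_id, "rdf:object", obj),
    ]

# The statement inside the dotted box.
reified = reify(
    "_:x",
    "http://www.w3.org/TR/REC-rdf-syntax",
    "dc:creator",
    "Dave Beckett",
)

# A statement about the statement: who made the claim.
graph = reified + [("_:x", "dc:creator", "Library of Congress")]

claimed_by = [o for s, p, o in graph if s == "_:x" and p == "dc:creator"]
print(claimed_by)
```

Note that the graph records the claim without asserting it: the triple saying Dave Beckett is the creator of the spec is not itself in the graph unless it is added separately.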

47 Reification
Reification allows a computer to process an abstraction as if it were any other datum
RDF is not really second-order
But it does provide a built-in predicate vocabulary for reification

48 Reification
Any statement can be an object (graphs can be nested)
[Figure: NYT claims the statement pers05 Author-of ISBN...]

49 RDF Schema
Defines small vocabulary for RDF:
–Class, subClassOf, type
–Property, subPropertyOf
–domain, range
Organizes this vocabulary in a typed hierarchy
Vocabulary can be used to define other vocabularies for your application domain
[Figure: classes Person, Student and Researcher linked by subClassOf; property hasSuperVisor with domain and range; instances Swap and Gail linked by type and hasSuperVisor]
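How subClassOf, domain and range support simple type inference can be sketched as follows. The schema is an assumed reading of the figure (Student and Researcher as subclasses of Person, hasSuperVisor with domain Student and range Researcher); infer_types is a hypothetical helper, not part of any RDF toolkit.

```python
schema = [
    ("Student", "rdfs:subClassOf", "Person"),
    ("Researcher", "rdfs:subClassOf", "Person"),
    ("hasSuperVisor", "rdfs:domain", "Student"),     # assumed from the figure
    ("hasSuperVisor", "rdfs:range", "Researcher"),   # assumed from the figure
]
data = [("Swap", "hasSuperVisor", "Gail")]

def infer_types(schema, data):
    """Derive the classes of each resource from domain, range and subClassOf."""
    parents, domain, range_ = {}, {}, {}
    for s, p, o in schema:
        if p == "rdfs:subClassOf":
            parents.setdefault(s, set()).add(o)
        elif p == "rdfs:domain":
            domain[s] = o
        elif p == "rdfs:range":
            range_[s] = o

    types = {}

    def add(resource, cls):
        known = types.setdefault(resource, set())
        if cls in known:
            return  # already recorded; also guards against cycles
        known.add(cls)
        for parent in parents.get(cls, ()):
            add(resource, parent)  # membership propagates up the hierarchy

    for s, p, o in data:
        if p == "rdf:type":
            add(s, o)
        if p in domain:
            add(s, domain[p])   # the subject belongs to the property's domain
        if p in range_:
            add(o, range_[p])   # the object belongs to the property's range
    return types

inferred = infer_types(schema, data)
print(inferred)
```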

50 RDF Schema syntax in XML

51 Conclusions about RDF
Next step up from plain XML
–modeling primitives
–possible to define vocabulary
However:
–no precisely described meaning
–no inference model
Problematic examples:
–“Columbus believed that the world is flat”
–“Gloria believes that the Web should be delivered on CD-ROM”

52 Where do we get the precisely defined meaning?
Two databases may use different identifiers for the same concept, such as zip code vs. postal code
A program that wants to compare or combine information across the two databases has to know that these two terms mean the same thing
The program must have a way to discover such common meanings for whatever databases it encounters
A solution to this problem is provided by collections of information called ontologies

53 Semantic Web Layers

54 What is an Ontology?
In philosophy, an ontology is a theory about the nature of existence, of what types of things exist; ontology as a discipline studies such theories
Semantic Web researchers (and various other communities) have co-opted the term for their own jargon
For Semantic Web researchers, an ontology is a document or file that formally defines the relationships among terms
The most typical kind of ontology for the Web has a taxonomy and a set of inference rules

55 What is a Taxonomy?
Taxonomy = segmentation, classification and ordering of elements into a classification system according to the relationships between each other
[Figure: taxonomy tree with nodes Object, Person, Topic, Document, Researcher, Student, Semantics, Ontology, Doctoral Student, PhD Student, F-Logic]

56 Taxonomies
A taxonomy defines classes of objects and relations among them
For example, an address may be defined as a type of location, and city codes may be defined to apply only to locations
If city codes must be of type city and cities generally have Web sites, we can discuss the Web site associated with a city code even if no database links a city code directly to a Web site

57 An Ontology also provides a form of Thesaurus
Terminology for specific domain
Graph with primitives, fixed relationships (similar, synonym)
[Figure: the taxonomy (Object, Person, Topic, Document, Researcher, Student, Semantics, PhD Student, Doctoral Student, Ontology, F-Logic) with arcs labeled similar and synonym]

58 An Ontology also provides a Topic Map
Topics (nodes), relationships and occurrences (to documents)
Useful for navigation and visualization
[Figure: the taxonomy and thesaurus graph with added nodes Affiliation and Tel and arcs labeled knows, described_in and writes]

59 The Taxonomy is Augmented by Inference Rules
[Figure: the topic map extended with is_a and instance_of arcs (instance: Swapneel Sheth) and a Rules box relating writes, is_about, knows and described_in over the variables P, D and T]

60 Inference Rules
An ontology may express the rule “If a city code is associated with a state code, and an address uses that city code, then that address has the associated state code”
A program could then deduce, for instance, that a Columbia University address, being in New York City, must be in New York State, which is in the U.S., and therefore should be formatted to U.S. standards
The computer doesn't truly “understand” any of this information
But it can now manipulate the terms much more effectively in ways that are useful and meaningful to the human user
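The city-code rule quoted above can be sketched as a single forward-inference step over triples. The predicate names (uses_city_code, associated_with_state_code, has_state_code) and the sample facts are made up for illustration.

```python
FACTS = {
    ("columbia_address", "uses_city_code", "NYC"),
    ("NYC", "associated_with_state_code", "NY"),
    ("albany_address", "uses_city_code", "ALB"),  # no state known for ALB
}

def infer_state_codes(facts):
    """Apply: city code has a state code AND address uses that city code
    => address has that state code."""
    city_to_state = {
        s: o for s, p, o in facts if p == "associated_with_state_code"
    }
    derived = {
        (address, "has_state_code", city_to_state[city])
        for address, p, city in facts
        if p == "uses_city_code" and city in city_to_state
    }
    return facts | derived

all_facts = infer_state_codes(FACTS)
print(sorted(all_facts - FACTS))
```

The program never "understands" what a state is; it only matches patterns, which is the point made on the slide.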

61 Solution to Terminology Problems
The meaning of terms or XML tags used on a Web page can be defined by pointers from the page to an ontology
The same problems as before now arise if I point to an ontology that defines addresses as containing a zip code and you point to one that uses postal code
This can be resolved if ontologies (or other Web services) provide equivalence relations: one or both of our ontologies may contain the information that my zip code is equivalent to your postal code
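A minimal sketch of how an equivalence relation lets a program combine records that use different terms. The mapping EQUIVALENT_TERMS, the record layout and the sample values are all assumptions for illustration; in practice the equivalence would come from one of the ontologies.

```python
# Assumed equivalence published by an ontology: postal_code means zip_code.
EQUIVALENT_TERMS = {"postal_code": "zip_code"}

def canonical(record):
    """Rewrite a record's keys to the canonical term for each concept."""
    return {EQUIVALENT_TERMS.get(key, key): value for key, value in record.items()}

my_record = {"name": "Columbia University", "zip_code": "10027"}
your_record = {"name": "Columbia Univ.", "postal_code": "10027"}

def same_area(a, b):
    """Compare two address records on the shared zip/postal code concept."""
    return canonical(a).get("zip_code") == canonical(b).get("zip_code")

print(same_area(my_record, your_record))
```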

62 Using Ontologies
Ontologies can be used in a simple fashion to improve the accuracy of Web searches
The search program can look for only those pages that refer to a precise concept instead of all the ones using ambiguous keywords
More advanced applications could use ontologies to relate the information on a page to the associated knowledge structures and inference rules

63 Example
Suppose you wish to find the Ms. Cook you met at a trade conference last year
You don't remember her first name, but you remember that she worked for one of your clients and that her brother was a student at your alma mater

64 Example
An intelligent search program can sift through all the pages of people whose name is “Cook”
Sidestep all the pages relating to cooks, cooking, the Cook Islands and so forth
Find the person named Cook who works for a company that's on your client list
And follow links to Web pages of their relatives to track down if any are in school at the right place
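The Ms. Cook search can be sketched as a query over triples rather than keywords. All identifiers, predicates and data below are invented for illustration; the point is that "surname Cook" is matched as a typed property, so pages about cooking or the Cook Islands never enter the candidate set.

```python
TRIPLES = [
    ("person:1", "surname", "Cook"),
    ("person:1", "worksFor", "Acme Corp"),
    ("person:1", "brother", "person:2"),
    ("person:2", "studiesAt", "Columbia University"),
    ("person:3", "surname", "Cook"),
    ("person:3", "worksFor", "Other Co"),
    ("page:9", "topic", "Cook Islands"),  # keyword match only, not a person
]

def find_ms_cook(triples, clients, alma_mater):
    """People named Cook who work for a client and have a brother at the school."""
    def objects(subject, predicate):
        return {o for s, p, o in triples if s == subject and p == predicate}

    candidates = sorted(s for s, p, o in triples if p == "surname" and o == "Cook")
    matches = []
    for person in candidates:
        works_for_client = bool(objects(person, "worksFor") & clients)
        brother_at_school = any(
            alma_mater in objects(brother, "studiesAt")
            for brother in objects(person, "brother")
        )
        if works_for_client and brother_at_school:
            matches.append(person)
    return matches

found = find_ms_cook(TRIPLES, {"Acme Corp"}, "Columbia University")
print(found)
```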

65 Agents
The real power of the Semantic Web will be realized when people create (many) programs that collect Web content from diverse sources, process the information and exchange the results with other programs
The effectiveness of such software agents will increase exponentially as more machine-readable Web content and automated services (including other agents) become available

66 Proofs
The Semantic Web promotes this synergy: even agents that were not expressly designed to work together can transfer data among themselves when the data comes with semantics
An important facet of agents' functioning will be the exchange of “proofs”

67 Example
Suppose Ms. Cook's contact information has been located by an online service, which places her in Baghdad
You want to check this, so your computer asks the service for a proof of its answer
An inference engine on your computer verifies this proof, i.e., that this Ms. Cook indeed matches the one you were seeking, and it can show you the relevant Web pages if you still have doubts

68 Service Discovery
Many automated Web-based services already exist without semantics
But current service discovery initiatives attack the problem at a structural or syntactic level, and rely heavily on standardization of a predetermined set of functionality descriptions

69 Service Discovery
Other programs such as agents have no way to locate a service that will perform a specific function
Service discovery can happen only when there is a common language to describe a service in a way that lets other agents “understand” both the function offered and how to take advantage of it
The consumer and producer agents can reach a shared understanding by exchanging ontologies, which provide the vocabulary needed for discussion
Semantics also makes it easier to take advantage of a service that only partially matches a request

70 Non-Web Applications
The Semantic Web can extend into our physical world
URIs can point to anything, including physical entities, which means we can use RDF to describe devices such as cell phones and TVs
Such devices can advertise their functionality (what they can do and how they are controlled), much like software agents
Semantic descriptions of device capabilities and functionality will let us achieve “home automation” with minimal human intervention

71 Examples
When you answer your phone, other sound is automatically turned down
–Instead of having to program each specific appliance, you could program such a function once and for all to cover every local device that advertises having a volume control: the TV, the DVD player, the media players on the laptop, …
Your Web-enabled microwave oven consults the frozen-food manufacturer's Web site for optimal cooking parameters
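The phone example can be sketched as one rule written against an advertised capability rather than against specific appliances. The Device class, the capability name "volume_control" and the volume scale are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Device:
    name: str
    capabilities: frozenset  # what the device advertises it can do
    volume: int = 10

def on_phone_answered(devices, quiet_level=1):
    """Turn down every device that advertises a volume control."""
    lowered = []
    for device in devices:
        if "volume_control" in device.capabilities:
            device.volume = min(device.volume, quiet_level)
            lowered.append(device.name)
    return lowered

home = [
    Device("TV", frozenset({"volume_control", "display"})),
    Device("DVD player", frozenset({"volume_control"}), volume=7),
    Device("Microwave", frozenset({"heating"})),
]

lowered_devices = on_phone_answered(home)
print(lowered_devices)
```

A new device that advertises the same capability is covered by the rule without any reprogramming.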

72 OWL Delivers Ontologies that Work on the Web
What's needed next is a way to develop domain specific vocabularies
An ontology defines the terms used to describe and represent an area of knowledge
Ontologies include computer-usable definitions of basic concepts in the domain and the relationships among them, making that knowledge reusable

73 OWL = Web Ontology Language
For defining structured, Web-based ontologies enabling richer integration and interoperability of data among descriptive communities
Uses URIs for naming
Uses RDF and RDF Schema for description
Adds vocabulary for describing relations between classes (e.g. disjointness), cardinality (e.g. "exactly one"), characteristics of properties (e.g. symmetry)
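Two of the additions listed above, symmetric properties and disjoint classes, can be illustrated in plain Python. This is a sketch of what a reasoner could do with such declarations, not OWL syntax; the property siblingOf, the class names and both helpers are assumptions.

```python
def symmetric_closure(triples, symmetric_properties):
    """If p is symmetric and (s, p, o) holds, then (o, p, s) holds too."""
    closed = set(triples)
    for s, p, o in triples:
        if p in symmetric_properties:
            closed.add((o, p, s))
    return closed

def disjointness_violations(types, disjoint_pairs):
    """Resources typed with two classes that were declared disjoint."""
    return sorted(
        resource
        for resource, classes in types.items()
        for a, b in disjoint_pairs
        if a in classes and b in classes
    )

facts = {("Alice", "siblingOf", "Bob"), ("Alice", "authorOf", "page1")}
closed_facts = symmetric_closure(facts, {"siblingOf"})

types = {"Alice": {"Person"}, "page1": {"Document", "Person"}}
violations = disjointness_violations(types, [("Person", "Document")])
print(sorted(closed_facts), violations)
```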

74 Semantic Web Layers

75 Semantic Web Layers
The Unicode and URI layers make sure that we use international character sets and provide means for identifying the objects in the Semantic Web
The XML layer with namespaces and schema definitions makes sure we can integrate the Semantic Web definitions with other XML-based standards

76 Semantic Web Layers
RDF and RDF Schema make it possible to make statements about objects with URIs and define vocabularies that can be referred to by URIs
RDF Schema defines the XML vocabulary for defining classes, subclasses, properties and subproperties
The Ontology layer (OWL) supports the evolution of vocabularies as it can define relations between the different concepts

77 Semantic Web Layers
The top layers, Logic, Proof and Trust, are “under development”
The Logic layer will enable the writing of rules
The Proof layer will execute the rules
The Trust layer together with the Digital Signature layer will provide mechanisms for applications to determine whether to trust the given proof or not

78 Semantic Web Layers
[Figure legend: RFC, Standard, Work in Progress]

79 Next Assignments
Full paper due Tuesday March 8th
Project Proposal due Tuesday March 8th

80 COMS E6125 Web-enHanced Information Management (WHIM)
Prof. Gail Kaiser
Spring 2011

