Adding Semantics to the Web Zachary G. Ives University of Pennsylvania CIS 650 – Database & Information Systems April 11, 2005.


1 Adding Semantics to the Web Zachary G. Ives University of Pennsylvania CIS 650 – Database & Information Systems April 11, 2005

2 Administrivia
- Next readings and summaries:
  - Wednesday – First two sections of the Piazza paper
  - Summarize the goals, key ideas, and challenges
  - Reduced reading so you can work on the project!

3 Today’s Trivia Question

4 Last Time…
- We were discussing Google
- Main features:
  - Commodity hardware
  - Fast, transparent failover
  - Replication and partitioning
  - No requirement that all of the replicas be consistent with one another, as long as each user only sees a consistent image
    - Allows them to update one replica set, then transition in the others
- The major components of computation:
  - PageRank (run offline)
  - Ranking of queries
    - Relatively parallelizable

5 Google’s Search Algorithm
1. Parse the query
2. Convert words into wordIDs
3. Seek to the start of the doclist in the short barrel for every word
4. Scan through the doclists until there is a document that matches all of the search terms
5. Compute the rank of that document
6. If we’re at the end of the short barrels, start at the doclists of the full barrel, unless we have enough
7. If not at the end of any doclist, go to step 4
8. Sort the documents by rank; return the top K
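The steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not Google's implementation: the `lexicon`, `short_barrel`, and `full_barrel` dictionaries, the `rank` callback, and the `enough` threshold are hypothetical stand-ins for structures the slide only names.

```python
from heapq import nlargest


def intersect(doclists):
    """Yield docIDs that appear in every sorted doclist (step 4)."""
    if not doclists:
        return
    positions = [0] * len(doclists)
    while all(p < len(dl) for p, dl in zip(positions, doclists)):
        current = [dl[p] for p, dl in zip(positions, doclists)]
        target = max(current)
        if all(c == target for c in current):
            yield target
            positions = [p + 1 for p in positions]
        else:
            # Advance only the lists that are behind the largest docID.
            positions = [p + (c < target) for p, c in zip(positions, current)]


def search(query, lexicon, short_barrel, full_barrel, rank, k=10, enough=10):
    """Return up to k docIDs matching every query word, best rank first."""
    words = query.lower().split()                      # step 1
    if not words or any(w not in lexicon for w in words):
        return []
    word_ids = [lexicon[w] for w in words]             # step 2
    scores = {}
    for barrel in (short_barrel, full_barrel):         # steps 3 and 6
        doclists = [barrel.get(w, []) for w in word_ids]
        for doc in intersect(doclists):                # steps 4 and 7
            if doc not in scores:
                scores[doc] = rank(doc, word_ids)      # step 5
        if len(scores) >= enough:                      # "unless we have enough"
            break
    return nlargest(k, scores, key=scores.get)         # step 8
```

The full barrel is consulted only when the short barrel yields fewer than `enough` matches, which mirrors the "unless we have enough" clause in step 6.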

6 Ranking in Google
- Considers many types of information:
  - Position, font size, capitalization
  - Anchor text
  - PageRank
    - Done offline, in a non-query-sensitive way
  - Count of occurrences (basically, TF) in a way that tapers off
  - Multi-word queries consider proximity also

7 Could We Build a DBMS for Google?
- What would a DBMS for Google-like environments look like?
- What would it be useful for, other than Google?

8 Beyond Google
- What if we wanted to:
  - Add on-the-fly query capabilities to Google?
    - e.g., query over up-to-the-second stock market results
  - Use WordNet or some thesaurus to supplement Google?
  - Do PageRank in a topic-specific way?
  - Supplement Google with “ontology” info?
  - Do some sort of XML path matching along with keywords?
  - Allow for OLAP-style analysis?
  - Do a cooperative, e.g., P2P, Google?
    - Benefits of this?

9 Beyond the Web
- The Web is mostly human-readable
  - … With some exceptions due to the adoption of XML, plus proprietary formats
- Ideally, we’d like to be able to pose questions that go way beyond text matching, exploiting machine-readable data
  - What are the 5 tallest mountains?
  - How much has the stock market dropped since January?
  - What traits are known to be recessive in rats?
  - etc.
- In a sense, the goal is to meet Vannevar Bush’s “Personal Memex” idea from 1945

10 The Semantic Web
- The basic ideas:
  - Semantically annotated data (RDF)
  - Knowledge of concepts and relationships (ontologies, e.g., OWL)
  - Inferencing systems (based on KR tools)
- Goal: allow very complex queries to be expressed; give best effort in answering them
  - “We make the language for the rules as expressive as needed to allow the Web to reason as widely as desired” (p. 38)
- Berners-Lee, 2003: “The Semantic Web is data integration”
- “The challenge of the Semantic Web … is to provide a language that expresses both data and rules for reasoning about the data and that allows rules from any existing knowledge-representation system to be exported onto the Web”

11 RDF: Resource Description Framework
- Not too dissimilar in goal or style to XML
  - “Machine processable” data format
- A ternary data model: everything is a 3-ary relation
  - (Resource, Property, Value)
- Resources are given unique URIs (typically URLs) as global keys
- Serialized in XML, in one of several formats
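A minimal Python sketch of the ternary model: every fact is one (resource, property, value) triple, and a query is a pattern with some fields left open. The `Triple` type, the `match` helper, and the sample facts are illustrative assumptions (the property names follow the labels on the next slide's diagram); real RDF stores and serializations are considerably richer.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    resource: str
    prop: str
    value: str


# Illustrative facts; the URLs act only as opaque global keys.
TRIPLES = [
    Triple("http://www.a.com/~joe", "named", "Joe"),
    Triple("http://www.a.com/~joe", "hasHeight", "5'10\""),
    Triple("http://www.a.com/~joe", "plays", "http://www.usahockey.com"),
]


def match(triples, resource=None, prop=None, value=None):
    """Return the triples that agree with every field that is not None."""
    pattern = (resource, prop, value)
    return [
        t for t in triples
        if all(want is None or want == got for want, got in zip(pattern, t))
    ]
```

For example, `match(TRIPLES, prop="plays")` returns the single triple linking Joe's URL to the hockey URL, and `match(TRIPLES, resource="http://www.a.com/~joe")` returns everything known about that resource.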

12 “It’s More Semantic”
[Figure: graph diagrams. Recoverable node labels: Joe, www.a.com/~joe, 5’10”, hockey, www.usahockey.com. Recoverable edge labels: hasHeight, plays, named.]

13 RDF vs. XML
- What’s more semantic about RDF?
  - It requires us to specify entities and relationships, which we can omit in XML
    - Though someone who understands the XML can specify what the relationships are!
  - It encodes a number of concepts by default:
    - Universal identity
    - Reification
      - Basically, the specific class or statement becomes something that can be described at a meta-level
      - e.g., the name “Joe” is only true up to a particular point in time
      - How: we give the RDF description an ID
    - A number of default concepts (e.g., some types, descriptions, titles)

14 Ontologies: The Basis of the SW
- Basically, a very fancy class hierarchy
  - “An explicit, shared, formal specification of the terms in the domain and relations among them”
  - Focus is on structural properties of a class, not methods
- Elements of an ontology:
  - classes (aka concepts)
  - properties (aka slots, roles)
  - facets (aka role restrictions)

15 Classes, Properties, Facets, Reification
- Classes are generally familiar
- Properties may include:
  - intrinsic properties (of the object)
  - relationships to other entities (e.g., your parents)
  - parts (if structured)
- Facets are basically the properties’ domains
  - value type, cardinality, …
  - “RDF Schema” describes these; think XML Schema in RDF, for RDF
- Reification takes a statement (or class definition) and makes it into an object that other statements can refer to:
  - (i, think, (mcintoshapple, has-color, red))
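The reification example can be sketched in Python by giving each statement an ID, so that a later statement can use that ID as its value. `StatementStore` and its methods are hypothetical names for illustration; this is not RDF's actual reification vocabulary.

```python
from itertools import count


class StatementStore:
    """Stores triples under generated IDs so statements can be described."""

    def __init__(self):
        self.statements = {}
        self._next_id = count(1)

    def add(self, subject, prop, value):
        """Record a triple and return the ID that now names it."""
        statement_id = f"stmt{next(self._next_id)}"
        self.statements[statement_id] = (subject, prop, value)
        return statement_id

    def statements_about(self, target):
        """Return the triples whose value is the given ID or literal."""
        return [t for t in self.statements.values() if t[2] == target]


store = StatementStore()
apple_is_red = store.add("mcintoshapple", "has-color", "red")
belief = store.add("i", "think", apple_is_red)  # a statement about a statement
```

Because the inner triple has its own ID, further meta-level facts (who asserted it, until when it holds) can be attached without changing the triple itself.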

16 Description Logics (Borgida survey)
- A class of languages based on FOL, like Datalog and Prolog
- Key questions: subsumption of classes, recognition of members of classes
- Prolog allows us to reason about instances:
  - ParentOf(liz, andy). Male(andy).
  - Child(_x) :- ParentOf(_z, _x).
  - Son(_y) :- Male(_y), ParentOf(_w, _y).
- DLs allow us to make further inferences – that andy is a Child, i.e., they realize:
  - Child(x) ⇔ (∃z) ParentOf(z, x)
  - Son(y) ⇔ (∃w) Male(y) ∧ ParentOf(w, y)
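A small Python sketch of the instance-level reasoning in the two rules above, using sets in place of a Prolog engine. The function names are assumptions made for this illustration.

```python
parent_of = {("liz", "andy")}   # ParentOf(liz, andy).
male = {"andy"}                 # Male(andy).


def children(parent_of):
    """Child(x) holds when some z satisfies ParentOf(z, x)."""
    return {child for _parent, child in parent_of}


def sons(parent_of, male):
    """Son(y) holds when y is male and some w satisfies ParentOf(w, y)."""
    return {child for _parent, child in parent_of if child in male}
```

On this data `sons(parent_of, male) <= children(parent_of)` holds, but that only checks one set of facts. A DL reasoner would conclude that Son is subsumed by Child from the two definitions alone, for every possible set of facts.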

17 Syntax and Semantics
- Build variable-free composite terms from atoms using term constructors (e.g., at-most, all)
  - COURSE and at-most(10, takers) and all(takers, GRADS)
  - (:and COURSE (:at-most 10 takers) (:all takers GRADS))
  - COURSE ∩ ≤ 10 takers ∩ ∀ takers:GRADS
- Can be expressed in FOPC (at-most 10 means there are no 11 pairwise-distinct takers):
  - COURSE(a) ∧ ¬(∃ x1 … x11) (takers(a, x1) ∧ … ∧ takers(a, x11) ∧ x1 ≠ x2 ∧ x1 ≠ x3 ∧ … ∧ x10 ≠ x11) ∧ (∀ y) (takers(a, y) ⇒ GRADS(y))
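To make the semantics of the term constructors concrete, here is a minimal Python sketch that evaluates the COURSE description against one small relational structure. The constructor functions, the `interp` layout, and the sample individuals are assumptions for illustration; this evaluates membership in a single structure and is not a DL reasoner.

```python
def fillers(interp, role, x):
    """All y such that role(x, y) holds in the interpretation."""
    return {y for (a, y) in interp["roles"].get(role, set()) if a == x}


def atom(name):
    return lambda x, interp: x in interp["classes"].get(name, set())


def at_most(n, role):
    return lambda x, interp: len(fillers(interp, role, x)) <= n


def all_of(role, concept):
    return lambda x, interp: all(concept(y, interp) for y in fillers(interp, role, x))


def and_(*concepts):
    return lambda x, interp: all(c(x, interp) for c in concepts)


# COURSE and at-most(10, takers) and all(takers, GRADS)
small_grad_course = and_(
    atom("COURSE"), at_most(10, "takers"), all_of("takers", atom("GRADS"))
)

# A hypothetical structure: cs431 is taken only by graduate students.
interp = {
    "classes": {"COURSE": {"cs431", "cs101"}, "GRADS": {"anna", "bob"}},
    "roles": {"takers": {("cs431", "anna"), ("cs431", "bob"), ("cs101", "carl")}},
}
```

Here `small_grad_course("cs431", interp)` is true, while `small_grad_course("cs101", interp)` is false because carl is not in GRADS. Each constructor returns a predicate, so composite descriptions stay variable-free, as on the slide.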

18 Questions for DLs
- Is a description D consistent and coherent?
  - Not if its extension D^I is empty for every possible relational structure I
- Are D and D’ mutually disjoint?
  - Yes if D^I ∩ D’^I = ∅ for every I
- Are D and D’ equivalent?
  - Yes if D^I = D’^I for every I
- Does D subsume some other description D’?
  - Yes if D^I ⊇ D’^I for every relational structure I
- Inconsistency: and(C, D) ≡ NOTHING
- Equivalence: D subsumes D’ and D’ subsumes D
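The set conditions above can be written down directly in Python when the extensions D^I are given as sets. This is a sketch with hypothetical helper names, and it inspects one structure I only: a failed check refutes the property, but a passed check does not prove it, because the DL questions quantify over every I.

```python
def nonempty_in(ext_d):
    """D has at least one instance in this structure."""
    return bool(ext_d)


def disjoint_in(ext_d, ext_d2):
    """The two extensions share no individual in this structure."""
    return ext_d.isdisjoint(ext_d2)


def subsumes_in(ext_d, ext_d2):
    """D's extension contains D' 's extension in this structure."""
    return ext_d >= ext_d2


def equivalent_in(ext_d, ext_d2):
    """Equivalence as subsumption in both directions."""
    return subsumes_in(ext_d, ext_d2) and subsumes_in(ext_d2, ext_d)


# Hypothetical extensions in one structure I.
PERSON = {"anna", "bob", "einstein"}
STUDENT = {"anna", "bob"}
COURSE = {"cs431"}
```

With these sets, `subsumes_in(PERSON, STUDENT)` and `disjoint_in(STUDENT, COURSE)` hold, while `equivalent_in(PERSON, STUDENT)` fails. Writing equivalence as two subsumption checks mirrors the reduction in the slide's last bullet.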

19 DL Example
- class STUDENT is-a PERSON with studNumber: int, key; level: {1,2,3,4}
  - and(PERSON, all(studNumber, INTEGER), at-least(1, studNumber), at-most(1, studNumber), all(level, one-of(1,2,3,4)), at-least(1, level), at-most(1, level))
  - at-most(1, compose(studNumber, inverse(studNumber)))
- ENROLLMENT := and(all(st, STUDENT), at-least(1, st), at-most(1, st), all(crs, COURSE), at-least(1, crs), at-most(1, crs), all(when, DATE), at-least(1, when), at-most(1, when))
- STUDENT := and(all(inverse(st), ENROLLMENT), at-least(1, inverse(st)), at-most(6, inverse(st)))
- COURSE := and(all(inverse(crs), ENROLLMENT), at-least(1, inverse(crs)), at-most(300, inverse(crs)))
- INSERT-IN(Cs431, COURSE). FILL-WITH(Cs431, taughtBy, Einstein). FILL-WITH(Cs431, takers, Anna)

20 More on DLs
- We can have both primitive classes (equivalent to extensional relations) and virtual ones
  - But we can make assertions over virtual classes that directly impact the primitive ones
  - Contrast with updates to views in databases
- Many different levels of expressiveness in different DLs
- Comparison with Datalog:
  - Both are subsets of FOL, with some limitations
  - DLs allow bidirectional inference; Datalog is unidirectional
  - DLs are at most as expressive as FOL restricted to ≤ 3 variables; Datalog allows an unbounded number of existential variables

21 Coming Back to the SW
- Lots of work on OWL, the Web Ontology Language
- Based on different levels of DLs:
  - OWL Lite – classification hierarchy, simple constraints (cardinalities of 0 or 1 only)
  - OWL DL – maximum expressiveness while retaining computational completeness and decidability (reasoning always terminates)
  - OWL Full – no computational guarantees; allows classes as instances of other classes
- Goal: each community builds an ontology
  - But how to relate ontologies?
  - “equivalentClass”, “equivalentProperty”, “sameAs”
  - Is this enough???
  - (More on this next time…)

