TU/e technische universiteit eindhoven Web Data and Metadata Geert-Jan Houben.


1 TU/e technische universiteit eindhoven Web Data and Metadata Geert-Jan Houben

2 TU/e technische universiteit eindhoven Contents
Evolution in Web data
Techniques and languages for Web data:
  – XML
  – XML querying: XQuery
  – RDF (& RQL)
  – OWL
Note: the focus here is on the context, not the details!

3 TU/e technische universiteit eindhoven Evolution

4 TU/e technische universiteit eindhoven Future of the Web
1. Common syntax: XML
HTML: a fixed set of tags complicates the identification of information elements
XML allows data structures to be defined:
  – Tags with freely chosen names (no predefined tags)
  – Enables definition, transmission, validation and interpretation of data between applications (and organizations)
  – Freely chosen attributes
  – Simple definition: DTD
  – Extended definition: XML Schema

5 TU/e technische universiteit eindhoven Bob Quilt Peter Quilt XML-GL Quilt Karin Alice

6 TU/e technische universiteit eindhoven //person/name[../know-how="Quilt"] $union$ //seminar[topic="Quilt"]/participant/name

7 TU/e technische universiteit eindhoven Future of the Web
2. Specification of meaning: RDF
Resource: denotes an information item, e.g. via a URL
Property type: name of a property of a resource
Value: value for that property
Example:
  – Resource = URL of web page
  – Property type = “author”
  – Value = “John Smith”

8 TU/e technische universiteit eindhoven John Smith smith@home.net Home, Inc.

9 TU/e technische universiteit eindhoven Future of the Web
3. Meaning: ontologies
Ontology = a vocabulary with associated meaning
Possibility to define synonyms, specializations and other relationships
Use of the same ontology = contract on the meaning of words (tags, attributes)
Often industry or domain dependent

10 TU/e technische universiteit eindhoven Future of the Web
4. Logic to derive conclusions
Necessary in electronic commerce: what do the messages exchanged between supplier and customer mean?
5. Goal: trust in the meaning of communication between Web systems, and hence the possibility to automate using agents
Ref: www.w3.org

11 TU/e technische universiteit eindhoven Web Data Integration
WIS repository (back-end) is typically assembled from different heterogeneous sources, e.g. databases, files, WWW
To manage (coordinate) data from different sources, metadata helps to structure the data

12 TU/e technische universiteit eindhoven Metadata Describing the data and its availability Sometimes provided by sources Needed by IS Engineering metadata: –Meaning –Validity –Quality Specifying “logistics” of data

13 TU/e technische universiteit eindhoven XML Semistructured data

14 TU/e technische universiteit eindhoven XML: Complex data
Structure is irregular (missing/extra data)
Schema does not exist or is unknown
Schema is rapidly evolving
Relational and ODB models are too rigid
The standard is a document/hypertext language: HTML
Solution: a semistructured data model, XML
  – data model consists of a type definition language, a query/update language and more

15 TU/e technische universiteit eindhoven XML Environment
Follow-up of SGML (a markup language for documents) and of OO databases
XML: eXtensible Markup Language
  – W3C and most industrial companies [B2B]
  – Main idea: separate content and presentation
  – Use tags to represent structure and semantics
Ref: www-rocq.inria.fr/~abitebou/pub/lics01.ppt

16 TU/e technische universiteit eindhoven HTML = Hypertext Language
Information System:
  Ref   Name    Price
  X23   Camera  359.99
  R2D2  Robot   19350.00
  Z25   PC      1299.99
HTML: The new X23 camera replaces the X22. It comes equipped with a flash (worth 53.99 $ by itself) and provides great quality for only 359.99 $.
Text + presentation
Where is the data? (hard)

17 TU/e technische universiteit eindhoven XML = Semistructured Data
Information System:
  Ref   Name    Price
  X23   Camera  359.99
  R2D2  Robot   19350.00
  Z25   PC      1299.99
  ...
XML: camera 359.99 … Robot 19350 … ...
Data + structure
Semistructured: more flexible (easy)

18 TU/e technische universiteit eindhoven XML Flexibility
no fixed set of tags
no fixed interpretation/rendering of tags
no fixed structure

19 TU/e technische universiteit eindhoven Alice Smith 123 Maple Street Mill Valley CA 90952 Robert Smith 8 Oak Avenue Old Town PA 95819 Hurry, my lawn is going wild! Lawnmower 1 148.95 Confirm this is electric Baby Monitor 1 39.98 1999-05-21

20 TU/e technische universiteit eindhoven XML Documents
elements and attributes
elements are ordered
attribute values are strings
well-formed documents (e.g. proper nesting)
namespaces: vocabularies for tags
valid documents: DTD, Schema
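The difference between a well-formed document and a badly nested one can be checked with any XML parser. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the `order` and `item` element names and the helper `is_well_formed` are made up for illustration and do not come from the slides.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the text parses as XML (proper nesting, one root)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical documents: the second one closes its tags in the wrong order.
good = "<order><item id='1'>Lawnmower</item></order>"
bad = "<order><item>Lawnmower</order></item>"

print(is_well_formed(good))  # properly nested
print(is_well_formed(bad))   # improperly nested

# Attribute values are always strings, even when they look numeric.
item = ET.fromstring(good)[0]
print(type(item.get("id")).__name__)
```

Well-formedness says nothing about validity: checking a document against a DTD or Schema is a separate step.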

21 TU/e technische universiteit eindhoven DTD: a grammar
Catalog → Product*
Product → Name Price? Cat (Part Quantity)*
Part → BasicPart + ComposedPart
BasicPart → Name
ComposedPart → Name (Part Quantity)*
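Reading the DTD as a grammar suggests a simple way to check a document: treat each rule as a regular expression over the sequence of child element names. The sketch below does this in Python under two assumptions that are not stated on the slide: "+" in the Part rule means "either one or the other", and every element without a rule (Name, Price, Cat, Quantity) holds text only. It is an illustration of the idea, not a real DTD validator.

```python
import re
import xml.etree.ElementTree as ET

# Each rule as a regex over child names, every name followed by one space.
CONTENT_MODELS = {
    "Catalog": r"(Product )*",
    "Product": r"Name (Price )?Cat (Part Quantity )*",
    "Part": r"(BasicPart |ComposedPart )",
    "BasicPart": r"Name ",
    "ComposedPart": r"Name (Part Quantity )*",
}


def conforms(element: ET.Element) -> bool:
    """Check an element and its descendants against CONTENT_MODELS."""
    model = CONTENT_MODELS.get(element.tag)
    if model is None:
        # Assumed leaf element: text only, no child elements.
        return len(element) == 0
    children = "".join(child.tag + " " for child in element)
    if re.fullmatch(model, children) is None:
        return False
    return all(conforms(child) for child in element)


# Hypothetical sample documents.
valid = ET.fromstring(
    "<Catalog><Product><Name>Bike</Name><Cat>Sport</Cat>"
    "<Part><BasicPart><Name>Bell</Name></BasicPart></Part>"
    "<Quantity>1</Quantity></Product></Catalog>"
)
invalid = ET.fromstring(
    "<Catalog><Product><Price>10</Price><Name>Bike</Name>"
    "<Cat>Sport</Cat></Product></Catalog>"
)

print(conforms(valid), conforms(invalid))
```

The second document fails because Price appears before Name, which the Product rule does not allow.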

22 TU/e technische universiteit eindhoven XML Schema
to define a class of documents: those conforming to a schema
in XML syntax
built-in types

23 TU/e technische universiteit eindhoven Purchase order schema for Example.com. Copyright 2000 Example.com. All rights reserved....

24 TU/e technische universiteit eindhoven...

25 TU/e technische universiteit eindhoven Typing XML
Not really in the true spirit of the Web, but essential for data management: query optimization, user interfaces, applications
Differences with standard database typing:
  – Collections are sequences instead of sets
  – Types may be very large (e.g., from integration)
  – Data is more irregular, so types should be more permissive
  – Sometimes new issues arise: you have the data and must extract its type, which gives an approximate type

26 TU/e technische universiteit eindhoven More on XML The Database Models course in BIS, given by De Bra and Paredaens, will pay much more attention to the XML data model. Also, look at the W3C site: w3c.org

27 TU/e technische universiteit eindhoven XML Querying XQuery

28 TU/e technische universiteit eindhoven XML query language
XML is used for data exchange on the Web
W3C develops a standard: XML Query Working Group
XML Query Data Model
XPath and XQuery
Ref: www.w3.org/XML/Query

29 TU/e technische universiteit eindhoven XPath
Path expressions in OO databases: /Students/Student/Status
Semistructured:
  – missing parts: /Students//Status
  – conditions: /Students/Student[Status="U4"]
Indexing, wildcards
Selection, string manipulation, aggregation, attribute existence, union
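The three path expressions on this slide can be tried out with the limited XPath subset that Python's standard `xml.etree.ElementTree` supports. This is only a sketch: the sample document, including the `Group` and `Name` elements, is invented for illustration, and paths are written relative to the root element because that is how `findall` works.

```python
import xml.etree.ElementTree as ET

# Hypothetical data; one Student is nested one level deeper than the others.
students = ET.fromstring("""
<Students>
  <Student><Name>Ann</Name><Status>U4</Status></Student>
  <Student><Name>Ben</Name><Status>U2</Status></Student>
  <Group>
    <Student><Name>Cas</Name><Status>U4</Status></Student>
  </Group>
</Students>
""")

# /Students/Student/Status : fixed path, misses the nested student
direct = [s.text for s in students.findall("./Student/Status")]

# /Students//Status : "//" skips over unknown or missing structure
anywhere = [s.text for s in students.findall(".//Status")]

# /Students/Student[Status="U4"] : condition on a child element
u4_names = [s.findtext("Name") for s in students.findall("./Student[Status='U4']")]

print(direct, anywhere, u4_names)
```

The descendant step finds the nested Status that the fixed path misses, which is the point of the "missing parts" bullet.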

30 TU/e technische universiteit eindhoven XSLT
XSL: Extensible Stylesheet Language
  – (XSLT: XSL Transformations)
declarative language for transforming XML documents using an XSLT processor

31 TU/e technische universiteit eindhoven XQuery
http://www.w3.org/XML/Query
“the” standard for XML querying
Goal of the WG: “data model for XML documents, a set of query operators on that data model, and a query language based on these query operators”
General query language (next to XPath + XSLT)

32 TU/e technische universiteit eindhoven XQuery Path Expressions
Based on XPath
In the second chapter of the document named “zoo.xml”, find the figure(s) with caption “Tree Frogs”.
  document("zoo.xml")/chapter[2]//figure[caption="Tree Frogs"]
Find captions of figures that are referenced by figref elements in the chapter of “zoo.xml” with title “Frogs”.
  document("zoo.xml")/chapter[title="Frogs"]//figref/@refid->fig/caption

33 TU/e technische universiteit eindhoven XQuery Element Constructor Generate an element that has an “empid” attribute. The value of the attribute and the content of the subelements are specified by variables that are bound in other parts of the query. {$name} {$job}

34 TU/e technische universiteit eindhoven XQuery FLWR Expression
  FOR var IN expr       (binding clause)
  LET var := expr       (binding clause)
  WHERE expr            (select predicate)
  RETURN expr           (output generation)
List the titles of books published by Morgan Kaufmann in 1998.
  FOR $b IN document("bib.xml")//book
  WHERE $b/publisher = "Morgan Kaufmann" AND $b/year = "1998"
  RETURN $b/title
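For readers more used to a general-purpose language, the FOR / WHERE / RETURN query above maps closely onto a list comprehension. The following Python sketch is an analogy, not XQuery semantics in full; the contents of the `bib` document (titles "Book A" to "Book C", "Other Press") are hypothetical stand-ins for bib.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for bib.xml.
bib = ET.fromstring("""
<bib>
  <book><title>Book A</title><publisher>Morgan Kaufmann</publisher><year>1998</year></book>
  <book><title>Book B</title><publisher>Morgan Kaufmann</publisher><year>1999</year></book>
  <book><title>Book C</title><publisher>Other Press</publisher><year>1998</year></book>
</bib>
""")

titles = [
    b.findtext("title")                               # RETURN $b/title
    for b in bib.iter("book")                         # FOR $b IN ...//book
    if b.findtext("publisher") == "Morgan Kaufmann"   # WHERE ...
    and b.findtext("year") == "1998"
]

print(titles)
```

Each clause of the FLWR expression has a direct counterpart: the iteration binds the variable, the `if` filters the bindings, and the leading expression builds the output.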

35 TU/e technische universiteit eindhoven FLWR Expression
List each publisher and the average price of its books.
  FOR $p IN distinct(document("bib.xml")//publisher)
  LET $a := avg(document("bib.xml")/book[publisher=$p]/price)
  RETURN {$p/text()} {$a}
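The role of LET, which binds a whole sequence of values to one variable per iteration, is perhaps easiest to see in an imperative rendering. This Python sketch mirrors the query above on a hypothetical bib document; the publisher names and prices are invented, and the result is collected in a dictionary rather than constructed as XML elements.

```python
import xml.etree.ElementTree as ET
from statistics import mean

# Hypothetical stand-in for bib.xml.
bib = ET.fromstring("""
<bib>
  <book><publisher>Press One</publisher><price>40.0</price></book>
  <book><publisher>Press Two</publisher><price>30.0</price></book>
  <book><publisher>Press One</publisher><price>60.0</price></book>
</bib>
""")

# distinct(...//publisher): unique publisher names, in document order
publishers = list(dict.fromkeys(p.text for p in bib.iter("publisher")))

averages = {}
for p in publishers:                                   # FOR $p IN distinct(...)
    prices = [float(b.findtext("price"))               # LET $a := avg(...)
              for b in bib.iter("book")
              if b.findtext("publisher") == p]
    averages[p] = mean(prices)                         # RETURN publisher + average

print(averages)
```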

36 TU/e technische universiteit eindhoven Operators and Functions
Find the maximum depth of the document named “partlist.xml”.
  NAMESPACE xsd = "http://www.w3.org/2001/XMLSchema-datatypes"
  FUNCTION depth(ELEMENT $e) RETURNS xsd:integer
  {
    -- An empty element has depth 1
    -- Otherwise, add 1 to max depth of children
    IF empty($e/*) THEN 1
    ELSE max(depth($e/*)) + 1
  }
  depth(document("partlist.xml"))
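The recursive definition of depth carries over almost word for word to other languages. Below is a small Python sketch of the same recursion over an `ElementTree` element; the sample document is hypothetical and stands in for partlist.xml.

```python
import xml.etree.ElementTree as ET


def depth(element: ET.Element) -> int:
    """An element without children has depth 1; otherwise 1 + the deepest child."""
    return 1 + max((depth(child) for child in element), default=0)


# Hypothetical stand-in for partlist.xml: the longest chain is part > part > part.
partlist = ET.fromstring("<part><part><part/></part><part/></part>")

print(depth(partlist))
```

The `default=0` argument plays the role of the IF empty(...) branch in the XQuery function.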

37 TU/e technische universiteit eindhoven Conditional Expression
Make a list of holdings, ordered by title. For journals, include the editor, and for all other holdings, include the author.
  FOR $h IN //holding
  RETURN {$h/title, IF $h/@type="Journal" THEN $h/editor ELSE $h/author }
  SORTBY (title)

38 TU/e technische universiteit eindhoven Quantified Expressions
Find titles of books in which both sailing and windsurfing are mentioned in the same paragraph.
  FOR $b IN //book
  WHERE SOME $p IN $b//para SATISFIES contains($p,"sailing") AND contains($p,"windsurfing")
  RETURN $b/title
Find titles of books in which sailing is mentioned in every paragraph.
  FOR $b IN //book
  WHERE EVERY $p IN $b//para SATISFIES contains($p,"sailing")
  RETURN $b/title
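SOME and EVERY are the existential and universal quantifiers, which correspond to `any` and `all` in Python. The sketch below replays both queries on an invented three-book document; the titles and paragraph texts are hypothetical, and `contains` is approximated by a case-sensitive substring test.

```python
import xml.etree.ElementTree as ET

# Hypothetical library document.
library = ET.fromstring("""
<library>
  <book><title>Book 1</title>
    <para>sailing and windsurfing share the same wind</para>
    <para>sailing is the older sport</para></book>
  <book><title>Book 2</title>
    <para>sailing needs a boat</para>
    <para>windsurfing needs a board</para></book>
  <book><title>Book 3</title>
    <para>sailing at dawn</para>
    <para>more sailing at dusk</para></book>
</library>
""")


def paragraphs(book: ET.Element) -> list:
    """Text of every para element anywhere below the book ($b//para)."""
    return ["".join(p.itertext()) for p in book.iter("para")]


# SOME $p ... SATISFIES both words occur in the same paragraph
both_in_one_para = [
    b.findtext("title") for b in library.iter("book")
    if any("sailing" in p and "windsurfing" in p for p in paragraphs(b))
]

# EVERY $p ... SATISFIES the word occurs in each paragraph
sailing_everywhere = [
    b.findtext("title") for b in library.iter("book")
    if all("sailing" in p for p in paragraphs(b))
]

print(both_in_one_para, sailing_everywhere)
```

Book 2 mentions both words, but never in the same paragraph, so it matches neither query.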

39 TU/e technische universiteit eindhoven Other expressions
Sequence-related expressions
  – Example: ($x,$y,$z)
  – PRECEDES, FOLLOWS
Operators on data types
  – INSTANCEOF
  – CAST
  – TREAT

40 TU/e technische universiteit eindhoven More on XQuery The Database Models course in BIS, given by De Bra and Paredaens, will pay much more attention to XML query languages. Also, look at the W3C site: w3c.org

41 TU/e technische universiteit eindhoven RDF RQL

42 TU/e technische universiteit eindhoven Resource Description Framework
W3C standard for metadata description
Describes the “meaning” of data like Web sites, parts of HTML pages, etc.
Makes data “machine-understandable”: allows automated data processing
Framework that allows you to make simple assertions about anything: distributed and extensible (as is the Web)
“Meaning” expressed via “subclass of”
Ref: www.w3.org/RDF, www.w3.org/TR/rdf-primer

43 TU/e technische universiteit eindhoven Basic RDF Model
Recognizes 3 object types:
  – Resources: always named by a URI, e.g. a web site, part of a web page, others
  – Properties: an attribute of a Resource, its characteristics
  – Statements: Resource + Property + Property Value

44 TU/e technische universiteit eindhoven Basic RDF Model Example
RDF representation of the sentence: “Ora Lassila is the creator of the resource www.w3.org/Home/Lassila.”
Statement:
  – Subject (Resource): www.w3.org/Home/Lassila
  – Predicate (Property): Creator
  – Object (Literal): “Ora Lassila”

45 TU/e technische universiteit eindhoven Basic RDF Model Example
In general: subject HAS predicate object
Here: www.w3.org/Home/Lassila HAS Creator Ora Lassila
Diagram of the statement: [www.w3.org/Home/Lassila] --Creator--> “Ora Lassila”

46 TU/e technische universiteit eindhoven RDF and XML
RDF can be implemented using XML
The complete XML for the previous example is:
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:s="http://description.org/schema/">
    <rdf:Description rdf:about="http://www.w3.org/Home/Lassila">
      <s:Creator>Ora Lassila</s:Creator>
    </rdf:Description>
  </rdf:RDF>
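To make the link between the XML serialization and the statement model concrete, the sketch below reads an RDF/XML document and recovers (subject, predicate, object) triples with Python's standard XML parser. It assumes the usual RDF/XML shape for the Ora Lassila example, with an `rdf:Description` element and an `rdf:about` attribute; the `triples` helper is hypothetical, handles only literal-valued property elements, and is no substitute for a real RDF parser.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Assumed serialization of the statement about www.w3.org/Home/Lassila.
rdf_xml = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:s="http://description.org/schema/">
  <rdf:Description rdf:about="http://www.w3.org/Home/Lassila">
    <s:Creator>Ora Lassila</s:Creator>
  </rdf:Description>
</rdf:RDF>
"""


def triples(text: str) -> list:
    """Extract (subject, predicate, literal object) from simple RDF/XML."""
    root = ET.fromstring(text)
    found = []
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            # ElementTree writes qualified names as "{namespace}local".
            namespace, _, local = prop.tag[1:].partition("}")
            found.append((subject, namespace + local, prop.text))
    return found


for triple in triples(rdf_xml):
    print(triple)
```

The predicate comes out as a full URI because the namespace prefix `s` is only a local abbreviation for the schema that defines Creator.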

47 TU/e technische universiteit eindhoven Structured Value Example
“The employee with ID 85740, Ora Lassila, with Email lassila@w3.org, is the creator of the resource www.w3.org/Home/Lassila”
Diagram:
  www.w3.org/Home/Lassila --Creator--> www.w3.org/staffid/85740
  www.w3.org/staffid/85740 --Name--> Ora Lassila
  www.w3.org/staffid/85740 --Email--> lassila@w3.org
In XML it is: Ora Lassila lassila@w3.org

48 TU/e technische universiteit eindhoven RDF - more
Property value can be a literal or a resource
One subject can have more than one property
It is possible to make statements about statements
It is possible to refer to a collection of resources (containers), of 3 types:
  – Bag: a property has multiple values, order has no significance
  – Sequence: a property has multiple values, order is significant
  – Alternative: list of literals/resources representing alternatives for a single property

49 TU/e technische universiteit eindhoven RDF Schemas and Namespaces
The meaning of terms used in statements, like “Creator”, “Name”, “Email”, is expressed by reference to RDF Schemas (“domain definition”)
An RDF Schema provides information about the interpretation of the statements in a given RDF model
An RDF Schema is usually a separate document
To avoid confusion between different definitions of the same term, RDF Schemas use the namespace facility:
  xmlns:s="http://description.org/schema"
  xmlns:v="http://description.org/differentschema"
  Ora Lassila

50 TU/e technische universiteit eindhoven RDF Query Language
Querying RDF metadata:
  – SQL/XQL style approach, viewing RDF metadata as a relational or XML database [RDF Query Specification (IBM)]
  – viewing Web descriptions by RDF metadata as a knowledge base, applying knowledge representation and reasoning techniques [W3C related]
RQL
Ref: 139.91.183.30:9090/RDF/publications/bda01.PDF, 139.91.183.30:8999/RQLdemo/

51 TU/e technische universiteit eindhoven RQL
  subClassOf(Artist)
  subClassOf^(Artist)
  SELECT $C1, $C2 FROM {$C1}creates{$C2}
  SELECT X, Y FROM {X}last_modified{Y} WHERE Y >= 2000-01-01
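The first two queries differ only in the "^" marker. Assuming, as in the RQL documentation, that subClassOf(Artist) returns all subclasses transitively while subClassOf^(Artist) returns only the direct ones, the distinction can be sketched in Python as follows. The class hierarchy is hypothetical and chosen only to show a two-level chain.

```python
# Hypothetical schema: class name -> list of its direct subclasses.
DIRECT_SUBCLASSES = {
    "Artist": ["Painter", "Sculptor"],
    "Painter": ["Cubist"],
}


def direct_subclasses(cls: str) -> list:
    """Counterpart of subClassOf^(cls): one level down only."""
    return list(DIRECT_SUBCLASSES.get(cls, []))


def all_subclasses(cls: str) -> list:
    """Counterpart of subClassOf(cls): follow the hierarchy transitively."""
    result = []
    for sub in DIRECT_SUBCLASSES.get(cls, []):
        result.append(sub)
        result.extend(all_subclasses(sub))
    return result


print(direct_subclasses("Artist"))
print(all_subclasses("Artist"))
```

Only the transitive version reaches Cubist, which is a subclass of Artist by way of Painter.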

52 TU/e technische universiteit eindhoven OWL

53 TU/e technische universiteit eindhoven OWL
Web Ontology Language
Used to explicitly represent the meaning of terms in vocabularies and the relationships between terms: an ontology
  – ontology engineering
Beyond XML and RDF(S)
Revision of DAML+OIL

54 TU/e technische universiteit eindhoven Stack
XML: surface syntax for structured documents (no semantic constraints on meaning)
XML Schema: restricting the structure of XML documents
RDF: data model for objects (resources) and relationships, provides simple semantics for this data model
RDF Schema: vocabulary for describing properties and classes of RDF resources, with semantics for generalization hierarchies
OWL: adds vocabulary for describing properties and classes, e.g. relations between classes (disjoint), cardinality (exactly one), equality, richer typing of properties, characteristics of properties (symmetry), enumerated classes

55 TU/e technische universiteit eindhoven OWL Sublanguages
OWL Lite: classification hierarchy and simple constraints
OWL DL: maximum expressiveness while retaining computational completeness and decidability (description logics)
OWL Full: maximum expressiveness and syntactic freedom of RDF with no computational guarantees

56 TU/e technische universiteit eindhoven OWL Lite
RDF Schema features: Class, rdf:Property, rdfs:subClassOf, rdfs:subPropertyOf, rdfs:domain, rdfs:range, Individual
(In)Equality: equivalentClass, equivalentProperty, sameIndividualAs, differentFrom, allDifferent
Property characteristics: inverseOf, TransitiveProperty, SymmetricProperty, FunctionalProperty, InverseFunctionalProperty
Property type restrictions: allValuesFrom, someValuesFrom
Restricted cardinality: minCardinality (0/1), maxCardinality (0/1), cardinality (0/1)
Class intersection: intersectionOf
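Property characteristics are what allow a reasoner to derive statements that were never asserted. As a rough illustration of what declaring a property a SymmetricProperty or a TransitiveProperty licenses, the Python sketch below closes a set of (subject, object) pairs under each rule. The property readings and the individuals "a", "b", "c" are invented, and this is plain set manipulation, not an OWL reasoner.

```python
def symmetric_closure(pairs: set) -> set:
    """If P is symmetric, P(a, b) implies P(b, a)."""
    return pairs | {(b, a) for (a, b) in pairs}


def transitive_closure(pairs: set) -> set:
    """If P is transitive, P(a, b) and P(b, c) imply P(a, c)."""
    closure = set(pairs)
    while True:
        derived = {
            (a, d)
            for (a, b) in closure
            for (c, d) in closure
            if b == c
        } - closure
        if not derived:
            return closure
        closure |= derived


# Hypothetical asserted facts for one property.
asserted = {("a", "b"), ("b", "c")}

print(sorted(symmetric_closure(asserted)))
print(sorted(transitive_closure(asserted)))
```

Read with a transitive property such as "located in", the pair ("a", "c") follows from the two asserted pairs without being stated anywhere.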

57 TU/e technische universiteit eindhoven OWL DL and Full
Class axioms: oneOf, disjointWith, equivalentClass, rdfs:subClassOf (both applied to class expressions)
Boolean combinations of class expressions: unionOf, intersectionOf, complementOf
Arbitrary cardinality: minCardinality, maxCardinality, cardinality

58 TU/e technische universiteit eindhoven References There is a lot of information available through the W3C site. Depending on your background, have a close look at some of the languages and the ideas behind them.
