1 Berendt: Gegevensbanken, 2nd semester 2011/2012, 1 Gegevensbanken Outlook – The Semantic Web, XML, RDF,

1 Berendt: Gegevensbanken, 2nd semester 2011/2012
Gegevensbanken Outlook – The Semantic Web, XML, RDF, Linked (Open) Data, NoSQL
Bettina Berendt, Katholieke Universiteit Leuven, Department of Computer Science

2 Where are we?
Lesson  Who  What
1       ED   Intro, ER
2       ED   EER, (E)ER to relational schema
2       ED   Relational model
3       KV   Relational algebra & relational calculus
4,5     KV   SQL
6       KV   Connecting programs to databases
7       KV   Functional dependencies & normalisation
8       KV   PHP
10      BB   Database security
11      BB   Storage and file organisation
12      BB   External hashing
13      BB   Index structures
14      BB   Query processing
15-17   BB   Transaction processing and concurrency control
18      BB   Data mining and Information Retrieval
9       BB   XML (and more about the Web as a database), NoSQL
New topics / outlook. How does data become powerful? Analysis & combination.

3 A motivation
Q: About the internet in general: should it be regarded as one big unordered chaos of websites, or rather as a collection of separate databases (for example, one with all web pages from Belgium, or one with all web pages of an internet provider such as Telenet) that together form the internet (and thus allow one large, general database to distribute its tasks)?

4 For example: SIG.MA

6 The original vision
The entertainment system was belting out the Beatles' "We Can Work It Out" when the phone rang. When Pete answered, his phone turned the sound down by sending a message to all the other local devices that had a volume control. His sister, Lucy, was on the line from the doctor's office: "Mom needs to see a specialist and then has to have a series of physical therapy sessions. Biweekly or something. I'm going to have my agent set up the appointments." Pete immediately agreed to share the chauffeuring. At the doctor's office, Lucy instructed her Semantic Web agent through her handheld Web browser. The agent promptly retrieved information about Mom's prescribed treatment from the doctor's agent, looked up several lists of providers, and checked for the ones in-plan for Mom's insurance within a 20-mile radius of her home and with a rating of excellent or very good on trusted rating services. It then began trying to find a match between available appointment times (supplied by the agents of individual providers through their Web sites) and Pete's and Lucy's busy schedules. (The emphasized keywords indicate terms whose semantics, or meaning, were defined for the agent through the Semantic Web.)
Source: Tim Berners-Lee, James Hendler and Ora Lassila (2001). The Semantic Web. A new form of Web content that is meaningful to computers will unleash a revolution of new possibilities. Scientific American. http://www.sciam.com/article.cfm?articleID= D2-1C70- 84A9809EC588EF21

7 The Semantic Web layer cake (T. Berners-Lee, talk at XML 2000)
RDF: W3C Rec; OWL: W3C Rec; OWL2: W3C Rec
URI = Uniform Resource Identifier, e.g.:
- URL (Uniform Resource Locator): where to find it (~ a person's address)
- URN (Uniform Resource Name): identity (~ a person's name, a book's ISBN)

9 You have data … How should you structure it?
Here's some data about an aircraft:
- medium-altitude, long-endurance unmanned aerial vehicle
- 14.8 meters
- 512 kilograms
- 70 knots
- 400 nautical miles

10 The XML approach is to "wrap" each data item in start/end tags
File RQ-1.xml, with one element per data item: medium-altitude, long-endurance unmanned aerial vehicle; 14.8 meters; 512 kilograms; 70 knots; 400 nautical miles
... and define this data schema in a DTD or XML Schema.

11 XML Terminology
An element consists of a start tag, the data (here: 14.8 meters), and an end tag.
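
The wrapping idea and the start tag / data / end tag terminology can be sketched in Python with the standard library. The element names (aircraft, description, wingspan, weight, speed, range) are hypothetical: the transcript preserved only the values, so the names and the meaning assigned to each value are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Element names are hypothetical: the transcript kept only the values.
aircraft = ET.Element("aircraft")
for name, value in [
    ("description", "medium-altitude, long-endurance unmanned aerial vehicle"),
    ("wingspan", "14.8 meters"),
    ("weight", "512 kilograms"),
    ("speed", "70 knots"),
    ("range", "400 nautical miles"),
]:
    # Each data item becomes the text content of its own child element.
    ET.SubElement(aircraft, name).text = value

xml_text = ET.tostring(aircraft, encoding="unicode")

# Parse the text back and inspect one element.
wingspan = ET.fromstring(xml_text).find("wingspan")
tag_name = wingspan.tag   # name shared by the start tag and the end tag
data = wingspan.text      # the data between the two tags

print(xml_text)
print(tag_name, "->", data)
```

Note that nothing in this document says what "wingspan" means; that is the gap the later slides address.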

12 Why use XML?
- It is a universally accepted standard way of structuring data (syntax).
- It is a W3C recommendation (W3C = World Wide Web Consortium).
- The marketplace supports it with a lot of free/inexpensive tools.
- The alternative to using XML is to define your own proprietary data syntax, and then build your own proprietary tools to support the proprietary syntax (not a very appealing idea).

13 But: What is this XML snippet talking about, i.e., what are the semantics? … What is a Predator?

14 Predator - which one?
- Predator: a medium-altitude, long-endurance unmanned aerial vehicle system.
- Predator: one that victimizes, plunders, or destroys, especially for one's own gain.
- Predator: an organism that lives by preying on other organisms.
- Predator: a company which specializes in camouflage attire.
- Predator: a video game.
- Predator: software for machine networking.
- Predator: a chain of paintball stores.

15 A little more flexibility through namespaces
Example values: OL231-b, 14.8 metres; Panthera, antelopes
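
A minimal sketch of how namespaces keep two meanings of "Predator" apart, using Python's standard library. Both namespace URIs and all element names below are made up for illustration; the original markup did not survive in the transcript.

```python
import xml.etree.ElementTree as ET

# Both namespace URIs and all element names are hypothetical.
doc = """
<catalog xmlns:uav="http://example.org/aircraft"
         xmlns:zoo="http://example.org/animals">
  <uav:Predator><uav:wingspan>14.8 metres</uav:wingspan></uav:Predator>
  <zoo:Predator><zoo:genus>Panthera</zoo:genus></zoo:Predator>
</catalog>
"""

root = ET.fromstring(doc)
ns = {"uav": "http://example.org/aircraft", "zoo": "http://example.org/animals"}

# The same local name, Predator, resolves to two different qualified names.
aircraft = root.findall("uav:Predator", ns)
animals = root.findall("zoo:Predator", ns)
qualified_tags = [child.tag for child in root]

print(qualified_tags)
```

A namespace only disambiguates names. It still does not tell a machine what an aircraft or an animal is, which is why the slide speaks of "a little more" flexibility.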

16 Querying XML
Several query languages exist, e.g. XPath and XQuery.
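
As a rough illustration of path-based querying, the sketch below uses the limited XPath subset supported by Python's xml.etree.ElementTree (it is not a full XPath or XQuery engine). The document, its element names, and the second aircraft "X-2" are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; only the RQ-1 values echo the earlier slides.
doc = """
<fleet>
  <aircraft id="RQ-1"><wingspan unit="m">14.8</wingspan><speed unit="kn">70</speed></aircraft>
  <aircraft id="X-2"><wingspan unit="m">20.1</wingspan><speed unit="kn">120</speed></aircraft>
</fleet>
"""
root = ET.fromstring(doc)

# Path step with a predicate on an attribute value.
rq1_wingspan = root.find("./aircraft[@id='RQ-1']/wingspan").text

# Descendant search: every <speed> element anywhere below the root.
speeds = [int(e.text) for e in root.findall(".//speed")]

print(rq1_wingspan, speeds)
```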

24 Problems of XML
1. What does nesting mean?
2. What do syntactical variations mean?
3. What do linguistic variations mean?
4. How can we extend our knowledge?

25 What does nesting mean?
Schema 1 allows for expressions like: Peter Parker ... -> name being an XML element of Person means: the person HAS-A ...
Schema 2 allows for expressions like: Comic-book hero ... -> type being an XML element of Person means: the person IS-A ...
Problems:
a) We don't know what nesting means.
b) Even if we do know, we can't express this in a machine-readable way (at most we can build it into an application that uses these XML statements, but that would bury meaning in procedures!).

26 What do syntactical variations mean?
Schema 1 allows for expressions like: Peter Parker
Schema 2 allows for expressions like: Comic-book hero ...
Problems:
a) What does it mean for some information to be an XML element vs. an XML attribute?
b) Even if we do know that they are the same, we can't express this in a machine-readable way, for example to combine the information from the two sources (same remark about applications as in 1.).

27 What do linguistic variations mean?
Schema 1 allows for expressions like: Peter Parker ...
Schema 2 allows for expressions like: Peter Parker ...
Problems:
a) We do not know whether elements from different data sources that differ by, e.g., natural language are the same or not.
b) Even if we do know that they are the same, we can't express this in a machine-readable way, for example to combine the information from the two sources (same remark about applications as in 1.).

28 How can we extend our knowledge?
Schema 1 allows for expressions like: Picture Peter Parker ...
Schema 2 allows for expressions like: CreativeCommons ...
Problems:
a) We cannot refine our schema information with that provided by another source.
b) Even if we can be sure that the sources are linkable in principle (here: via the URL), we can't express this in a machine-readable way, for example to combine the information from the two sources (same remark about applications as in 1.).

29 Summary: XML is not well suited for conceptual modelling and therefore not suited for truly semantic markup
XML makes no commitment on:
1. Domain-specific ontological vocabulary
2. Ontological modeling primitives
It therefore requires pre-arranged agreement on 1 and 2. This is only feasible for closed collaboration:
- agents in a small & stable community
- pages on a small & stable intranet
Not suited for sharing Web resources.

30 Solution approach of the "higher levels" of the Semantic Web
1. Break down information into atomic statements: subject-predicate-object.
2. Define (with formal semantics) what each component of each statement means:
   a. Give it a URI (uniform resource identifier) to enable uniform meaning specification.
   b. Define languages to say more about (specify) the meaning, by relating it to other units of meaning (cf. a dictionary in which each word is explained by other words).
3. The languages mentioned in 2.b. each add more expressivity:
   a. RDF: subject-predicate-object statements (in RDF terminology: a resource has a property with a certain value).
   b. RDFS: simple ontology building blocks: class, subclass-of relation; use RDF's type to denote that (e.g.) an individual is an instance of a class (= make it possible to define a schema and its instances).
   c. OWL: more advanced ontology building blocks: a class (= concept) is disjoint with another one, is the same as another one; a property is functional, symmetric, the inverse of another one; ...

31 Semantic Web vs. Database
Advantages of using RDF/RDFS/OWL to define an ontology:
- Extensible: much easier to add new properties. Contrast with a database, where adding a new column may break a lot of applications.
- Portable: much easier to move an OWL document than to move a database.
Advantages of using a database to define an ontology:
- Mature: database technology has been around a long time and is very mature.

33 RDF model
RDF "statements" consist of resources (= nodes) which have properties which have values (= nodes, strings).
Example: "http://www.w3.org/TR/REC-rdf-syntax/ has the author Ora Lassila"
- subject (resource): http://www.w3.org/TR/REC-rdf-syntax/
- predicate (property): author
- object (value): "Ora Lassila"
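
One simple way to picture the statement model in code is a three-field tuple plus a wildcard lookup. This is a plain-Python sketch, not an RDF library; the names Triple and match are made up for illustration, and the property is kept as the bare name "author" as on the slide.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    subject: str    # resource (a URI)
    predicate: str  # property
    obj: str        # value: a literal here, but it may also be a URI

# The statement from the slide.
statement = Triple("http://www.w3.org/TR/REC-rdf-syntax/", "author", "Ora Lassila")
graph = {statement}

def match(graph, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [t for t in graph
            if (s is None or t.subject == s)
            and (p is None or t.predicate == p)
            and (o is None or t.obj == o)]

authors = [t.obj for t in match(graph, p="author")]
print(authors)
```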

34 RDF Model Example
One resource with three properties: dc:Creator "Ora Lassila"; dc:Date " "; dc:Publisher "W3C"

35 Complex values
So far, values of properties have been strings. A graph node (corresponding to a resource) can also be the value of a property:
- arbitrarily complex tree and graph structures are possible
- syntactically, values can be embedded (i.e. lexically in-line) or referenced (linked)
Example: dc:Creator points to a node in the p: namespace, which has p:Name "Ora Lassila"

36 RDF in XML
"Ora Lassila"

37 RDF Schema
Defines a small vocabulary for RDF:
- Class, subClassOf, type
- Property, subPropertyOf
- domain, range
The vocabulary can be used to define other vocabularies for your application domain.
Example graph (node and edge labels): Person, Student, Researcher (subClassOf); Jeen, Frank (type); hasSuperVisor (domain, range)
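
To make the effect of subClassOf, domain and range concrete, here is a small plain-Python sketch that applies three RDFS-style inference rules until nothing new is derived. The way the example graph is wired (Student and Researcher as subclasses of Person, hasSuperVisor going from a Student to a Researcher, Jeen supervised by Frank) is an assumed reading of the flattened diagram, and rdfs_closure is a made-up helper, not a standard API.

```python
# Assumed reading of the slide's diagram; no explicit type statements are given,
# so every type below has to be inferred.
triples = {
    ("Student", "subClassOf", "Person"),
    ("Researcher", "subClassOf", "Person"),
    ("hasSuperVisor", "domain", "Student"),
    ("hasSuperVisor", "range", "Researcher"),
    ("Jeen", "hasSuperVisor", "Frank"),
}

def rdfs_closure(triples):
    """Apply three RDFS-style rules until no new triple appears."""
    inferred = set(triples)
    while True:
        new = set()
        for s, p, o in inferred:
            for s2, p2, o2 in inferred:
                # domain: (p domain C), (x p y) => (x type C)
                if p2 == "domain" and s2 == p:
                    new.add((s, "type", o2))
                # range: (p range C), (x p y) => (y type C)
                if p2 == "range" and s2 == p:
                    new.add((o, "type", o2))
                # subclass: (x type C), (C subClassOf D) => (x type D)
                if p == "type" and p2 == "subClassOf" and s2 == o:
                    new.add((s, "type", o2))
        if new <= inferred:
            return inferred
        inferred |= new

closure = rdfs_closure(triples)
print(sorted(t for t in closure if t[1] == "type"))
```

The point of the sketch is that the schema statements carry meaning a machine can act on: from one hasSuperVisor statement it derives that Jeen is a Student and a Person, and that Frank is a Researcher and a Person.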

38 RDF Schema syntax in XML

40 What is this? Can we do something with it?

41 Combined by SIG.MA

42 And how does this work?
Linked Open Data:
- "A way of making the Semantic Web happen" (it is hoped)
- Key concept: leverage the existence of structured data and combine it with the languages and infrastructures of the Web and the Semantic Web
End of 2011: 32 billion triples

43 Data items are identified with HTTP URIs
Graph: pd:cygri rdf:type foaf:Person; pd:cygri foaf:name "Richard Cyganiak"; pd:cygri foaf:based_near dbpedia:Berlin

44 Resolving URIs over the Web
Graph as on the previous slide, extended with labels for dbpedia:Berlin: dp:population; skos:subject dp:Cities_in_Germany

45 Dereferencing URIs over the Web
Graph as on the previous slide, extended with: dbpedia:Hamburg and dbpedia:Muenchen, each with skos:subject dp:Cities_in_Germany

46 What is LOD?
- "A way of making the Semantic Web happen" (it is hoped)
- Key concept: leverage the existence of structured data and combine it with the languages and infrastructures of the Web and the Semantic Web
- Tim Berners-Lee: four principles of Linked Data (http://www.w3.org/DesignIssues/LinkedData)
  1. Use URIs to identify things.
  2. Use HTTP URIs so that these things can be referred to and looked up ("dereferenced") by people and user agents.
  3. Provide useful information about the thing when its URI is dereferenced, using standard formats such as RDF/XML.
  4. Include links to other, related URIs in the exposed data to improve discovery of other related information on the Web.

47 SPARQL: The standard query language for LOD
"What are all the country capitals in Africa?"
PREFIX abc: <…>
SELECT ?capital ?country
WHERE {
  ?x abc:cityname ?capital ;
     abc:isCapitalOf ?y .
  ?y abc:countryname ?country ;
     abc:isInContinent abc:Africa .
}
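
What the WHERE clause does can be sketched as pattern matching over triples: each line is a triple pattern, and variables (terms starting with ?) must be bound consistently across all patterns. The following is a toy matcher in plain Python, not a SPARQL engine; the node identifiers c1, c2, k, b and the tiny data set are invented for the example.

```python
def is_var(term):
    return term.startswith("?")

def solve(patterns, graph, binding=None):
    """Yield variable bindings that satisfy every triple pattern."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        candidate = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if is_var(term):
                # Bind the variable, or check it against an earlier binding.
                if candidate.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from solve(rest, graph, candidate)

# Toy data; node identifiers are hypothetical.
graph = [
    ("c1", "abc:cityname", "Nairobi"),
    ("c1", "abc:isCapitalOf", "k"),
    ("k", "abc:countryname", "Kenya"),
    ("k", "abc:isInContinent", "abc:Africa"),
    ("c2", "abc:cityname", "Brussels"),
    ("c2", "abc:isCapitalOf", "b"),
    ("b", "abc:countryname", "Belgium"),
    ("b", "abc:isInContinent", "abc:Europe"),
]

# The four triple patterns of the query on the slide.
query = [
    ("?x", "abc:cityname", "?capital"),
    ("?x", "abc:isCapitalOf", "?y"),
    ("?y", "abc:countryname", "?country"),
    ("?y", "abc:isInContinent", "abc:Africa"),
]

results = [(b["?capital"], b["?country"]) for b in solve(query, graph)]
print(results)
```

A real triple store would use indexes rather than scanning every triple for every pattern, but the binding logic is the same in spirit.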

48 Connecting to a database … ah … triple store

49 The Linked Open Data Cloud

51 History of the World, Part 1
- Relational databases: the mainstay of business
- Web-based applications caused spikes in load
  - Especially true for public-facing e-commerce sites
- Developers began to front the RDBMS with memcache or to integrate other caching mechanisms within the application (e.g. Ehcache)
From: Perry Hoekstra. NoSQL.

52
SELECT * FROM members WHERE name LIKE '%kirsten%'   ????
Get write lock; update friends table; release write lock   ????

53 Reminder: task for the next lesson
Are all ACID properties equally important for the following types of applications? What can you do if speed is very important for your application?
- Online banking
- An online shop (e.g. books/media)
- A social network site

54 Scaling Up
- Issues with scaling up when the dataset is just too big
- RDBMS were not designed to be distributed
- People began to look at multi-node database solutions, known as "scaling out" or "horizontal scaling"
- Different approaches include:
  - Master-slave
  - Sharding
- All approaches come with their own respective problems
From: Perry Hoekstra. NoSQL.
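
Sharding can be pictured as routing each key to one of several nodes by a stable hash. The sketch below is a toy with four in-memory dictionaries standing in for nodes; shard_for, put and get are hypothetical helper names, and the member names are only sample keys.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard with a stable hash (not Python's salted hash())."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

shards = [dict() for _ in range(4)]   # four in-memory "nodes"

def put(key, value):
    shards[shard_for(key, len(shards))][key] = value

def get(key):
    return shards[shard_for(key, len(shards))].get(key)

for member in ["kirsten", "pete", "lucy"]:
    put(member, {"name": member})

print(get("kirsten"), [len(s) for s in shards])
```

One of the "respective problems" shows up directly in this scheme: with plain modulo hashing, changing the number of shards moves most keys to a different node, and a query that is not by key (such as a LIKE search on names) has to visit every shard.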

55 What is NoSQL?
- Stands for Not Only SQL
- Class of non-relational data storage systems
- These systems usually do not require a fixed table schema, nor do they use the concept of joins
- All NoSQL offerings relax one or more of the ACID properties (see the CAP theorem below)
- NoSQL is best used in large distributed databases!
From: Perry Hoekstra. NoSQL.

56 Why NoSQL?
- For data storage, an RDBMS cannot be the be-all/end-all
- Just as there are different programming languages, we need other data storage tools in the toolbox
- A NoSQL solution is more acceptable to a client now than even a year ago
  - Think about proposing a Ruby/Rails or Groovy/Grails solution now versus a couple of years ago
From: Perry Hoekstra. NoSQL.

57 Dynamo and BigTable
Three major papers were the seeds of the NoSQL movement:
- BigTable (Google)
- Dynamo (Amazon)
  - Gossip protocol (discovery and error detection)
  - Distributed key-value data store
  - Eventual consistency
- CAP Theorem (discussed in a moment)
From: Perry Hoekstra. NoSQL.

58 CAP Theorem
- Three properties of a system: consistency, availability and partition tolerance
- You can have at most two of these three properties for any shared-data system
- To scale out, you have to partition. That leaves either consistency or availability to choose from
  - In almost all cases, you would choose availability over consistency
- Note that this is a slightly different notion of consistency than the one we are used to from transaction systems (ACID)!
From: Perry Hoekstra. NoSQL.

59 Availability
- Traditionally thought of as the server/process being available "five 9's" of the time (99.999 %).
- However, in a system with many nodes, at almost any point in time there is a good chance that a node is down or that there is a network disruption among the nodes.
  - We want a system that is resilient in the face of network disruption
From: Perry Hoekstra. NoSQL.
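
A back-of-the-envelope calculation shows why per-node availability does not carry over to a large cluster. The sketch assumes that nodes fail independently and that each is up 99.999 % of the time; the cluster size of 10,000 nodes is an arbitrary illustrative figure, and prob_all_up is a made-up helper name.

```python
def prob_all_up(per_node_availability: float, nodes: int) -> float:
    """Probability that every node is up, assuming independent failures."""
    return per_node_availability ** nodes

five_nines = 0.99999
single = prob_all_up(five_nines, 1)
cluster = prob_all_up(five_nines, 10_000)   # hypothetical cluster size

print(f"one node up:        {single:.5f}")
print(f"all 10,000 nodes up: {cluster:.4f}")   # about 0.905
```

Under these assumptions there is roughly a one-in-ten chance at any moment that at least one node is down, so the system as a whole has to tolerate failures rather than rely on them being rare.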

60 Consistency Model
A consistency model determines rules for visibility and apparent order of updates. For example:
- Row X is replicated on nodes M and N
- Client A writes row X to node N
- Some period of time t elapses
- Client B reads row X from node M
- Does client B see the write from client A?
- Consistency is a continuum with tradeoffs
- For NoSQL, the answer would be: maybe
- The CAP theorem states: strict consistency can't be achieved at the same time as availability and partition tolerance.
From: Perry Hoekstra. NoSQL.
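
The M/N scenario can be replayed in a few lines. This is a toy simulation with asynchronous replication modelled as a queue of undelivered messages; the class and function names are hypothetical and no real database behaves exactly like this.

```python
class Replica:
    def __init__(self):
        self.rows = {}

node_m, node_n = Replica(), Replica()
node_m.rows["X"] = node_n.rows["X"] = "old"
pending = []   # replication messages not yet delivered to node M

def write_to_n(key, value):
    node_n.rows[key] = value
    pending.append((key, value))   # replicate asynchronously

def deliver_to_m():
    while pending:
        key, value = pending.pop(0)
        node_m.rows[key] = value

write_to_n("X", "new")            # client A writes row X to node N
read_before = node_m.rows["X"]    # client B reads from M before replication
deliver_to_m()                    # time t elapses, replication catches up
read_after = node_m.rows["X"]     # client B reads from M again

print(read_before, read_after)
```

Whether client B sees "old" or "new" depends only on whether the read happens before or after delivery, which is the "maybe" on the slide.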

61 Eventual Consistency
- When no updates occur for a long period of time, eventually all updates will propagate through the system and all the nodes will be consistent
- For a given accepted update and a given node, eventually either the update reaches the node or the node is removed from service
- Known as BASE (Basically Available, Soft state, Eventual consistency), as opposed to ACID
From: Perry Hoekstra. NoSQL.
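
A rough sketch of how replicas might converge once writes stop: pairs of nodes repeatedly exchange state and keep the newer version of each key. The last-write-wins rule with integer timestamps, the random pairing and the cap on rounds are simplifying assumptions for illustration, not a description of any particular system.

```python
import random

def merge(a, b):
    """Last-write-wins: both nodes keep the (timestamp, value) with the larger timestamp."""
    for key in set(a) | set(b):
        newest = max(a.get(key, (0, None)), b.get(key, (0, None)),
                     key=lambda tv: tv[0])
        a[key] = b[key] = newest

nodes = [{"X": (1, "old")} for _ in range(5)]
nodes[0]["X"] = (2, "new")          # one accepted update, then no more writes

rng = random.Random(0)              # fixed seed so the run is repeatable
rounds = 0
while any(n != nodes[0] for n in nodes) and rounds < 1000:
    i, j = rng.sample(range(len(nodes)), 2)   # two nodes exchange state
    merge(nodes[i], nodes[j])
    rounds += 1

print(rounds, nodes)
```

Between the write and the last exchange, different nodes answer reads with different values; only after the update has spread everywhere do all replicas agree.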

62 What kinds of NoSQL
NoSQL solutions fall into two major areas:
- Key/value, or "the big hash table":
  - Amazon S3 (Dynamo)
  - Voldemort
  - Scalaris
- Schema-less, which comes in multiple flavors (column-based, document-based or graph-based):
  - Cassandra (column-based)
  - CouchDB (document-based)
  - Neo4J (graph-based)
  - HBase (column-based)
From: Perry Hoekstra. NoSQL.

63 So, can you now answer:
p. 26, table 2.4: Relational databases come out poorly in the comparison, so why are they used so much?

65 Data mining/information retrieval and Linked Data?
- Crowdsourcing: unstructured / semi-structured information -> structured data
- DM and IR: unstructured / semi-structured information -> structured data
- … and vice versa: LOD as a data source for DM!

66 NoSQL and Linked Data?
"RDF database systems are the only standardized NoSQL solutions available at the moment, being built on a simple, uniform data model and a powerful, declarative query language."
More ideas: processing/

67 NoSQL and Data Mining / Information Retrieval?
Indeed! Scalability is a huge issue. More in Advanced Databases and Text-Based Information Retrieval, where you'll work with such systems (and, if you want, use LOD …)
