Models and languages for semistructured data Bridging documents and databases.


1 Models and languages for semistructured data Bridging documents and databases

2 Lectures
1. Introduction to data models
2. Query languages for relational databases
3. Models and query languages for object databases
4. Models and query languages for semistructured data, XML
5. Embedded query languages
6. Guest lecture on Object Role Modelling

3 Why do we like types?
- Types facilitate understanding
- Types enable compact representations
- Types enable query optimisation
- Types facilitate consistency enforcement

4 Background assumptions for typed data
- Data stable over time
- Organisational body to control data
- Exercise: Give an example of a context where these assumptions do not hold

5 Semistructured data
Semistructured data is schemaless and self-describing: the data and the description of the data are integrated.

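6 An example
{name: {first: "John", last: "Smith"}, tel: 112233, email: "john@123.edu"}
[Figure: the same data as a tree. Edges name, tel and email leave the root, edges first and last leave the name node, and the leaves are "John", "Smith", 112233 and "john@123.edu".]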

7 Another example
[Figure: a graph with two person edges from the root to objects &o1 and &o2. &o1 has edges name ("Eva"), age (40) and child, the child edge pointing to &o2; &o2 has edges name ("Abel") and age (20).]
{person: &o1{name: "Eva", age: 40, child: &o2}, person: &o2{name: "Abel", age: 20}}
An object identifier such as &o1, placed before a structure, binds the identifier to the identity of that structure. The identifier can then be used to refer to the structure.
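A minimal Python sketch of how object identifiers give sharing: the objects bound to &o1 and &o2 are stored once, and the child edge holds a reference rather than a copy. The Ref class, the list-of-pairs encoding and the helpers deref and get are assumptions of this sketch, not part of the ssd-expression notation.

```python
class Ref:
    """Hypothetical helper: a reference to an object identifier such as &o2."""
    def __init__(self, oid):
        self.oid = oid

# Objects bound to identifiers. A complex value is a list of (label, value)
# pairs rather than a dict, because an ssd-expression may repeat a label.
objects = {
    "&o1": [("name", "Eva"), ("age", 40), ("child", Ref("&o2"))],
    "&o2": [("name", "Abel"), ("age", 20)],
}
# {person: &o1{...}, person: &o2{...}}
root = [("person", Ref("&o1")), ("person", Ref("&o2"))]

def deref(value):
    """Follow an object identifier to the structure it is bound to."""
    return objects[value.oid] if isinstance(value, Ref) else value

def get(obj, label):
    """All values reached from obj over one edge carrying the given label."""
    return [deref(v) for (lab, v) in obj if lab == label]

eva, abel = get(root, "person")
child = get(eva, "child")[0]
print(get(child, "name"))  # ['Abel']
print(child is abel)       # True: &o2 denotes one shared object, not a copy
```

The list of pairs matters: the root has two person edges, which a Python dict could not hold.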

8 Terminology
The following is an ssd-expression: &o1{name: "Eva", age: 40, child: &o2}
Here &o1 is an object identifier; name, age and child are labels; "Eva", 40 and &o2 are values.

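9 A database
[Figure: the bibliography database as a tree with root db and an edge biblio. Below biblio are a paper node with author "Crick", author "Wallace", title "The spiral DNA" and date 1956, and two book nodes, one with author "Darwin", title "Origin" and date 1848, the other with author "Marx", title "Kapital" and date 1860. Further entries are elided. Internal nodes are named n1, n2, n3.]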

10 Path expressions
A path expression is a sequence of labels: l1.l2…ln. A path expression evaluates to a set of nodes. Path properties are specified by regular expressions on two levels: on the alphabet of labels, and on the alphabet of characters that make up labels.
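A minimal Python sketch of evaluating a plain path expression l1.l2…ln, one label at a time, over the bibliography database of slide 9 with the elided entries left out. The nested-list encoding (a complex node is a list of (label, child) pairs, a leaf is an atomic value) and the name eval_path are assumptions of this sketch.

```python
# Bibliography database of slide 9, minus the elided entries.
db = [("biblio", [
    ("paper", [("author", "Crick"), ("author", "Wallace"),
               ("title", "The spiral DNA"), ("date", 1956)]),
    ("book", [("author", "Darwin"), ("title", "Origin"), ("date", 1848)]),
    ("book", [("author", "Marx"), ("title", "Kapital"), ("date", 1860)]),
])]

def eval_path(root, labels):
    """Nodes reached from root by following the labels l1, l2, ..., ln in order."""
    current = [root]
    for label in labels:
        current = [child
                   for node in current if isinstance(node, list)  # leaves have no edges
                   for (lab, child) in node if lab == label]
    return current

print(eval_path(db, ["biblio", "book", "author"]))  # ['Darwin', 'Marx']
```

The result is a Python list. In a tree no node is reached twice, so it behaves as the set the slide describes; a graph with shared nodes would need deduplication by node identity.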

11 A path expression
[Figure: the bibliography database of slide 9]
biblio.book.author

12 A path expression
[Figure: the bibliography database of slide 9]
biblio.(book | paper).author

13 Examples of path expressions
- biblio.book.author: authors of books
- biblio.paper.author: authors of papers
- biblio.(book | paper).author: authors of books or papers
- biblio._.author: authors of anything
- biblio._*.author: nodes at the ends of paths that start with biblio, end with author, and have an arbitrary sequence of labels in between
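The path forms above (alternation, the one-label wildcard _ and the any-sequence wildcard _*) can be evaluated recursively. The sketch below is a minimal Python version over the slide-9 data, assuming a tree (on a cyclic graph the _* case would recurse forever). The step representation and the names parse and eval_regular_path are hypothetical.

```python
db = [("biblio", [
    ("paper", [("author", "Crick"), ("author", "Wallace"),
               ("title", "The spiral DNA"), ("date", 1956)]),
    ("book", [("author", "Darwin"), ("title", "Origin"), ("date", 1848)]),
    ("book", [("author", "Marx"), ("title", "Kapital"), ("date", 1860)]),
])]

def parse(expr):
    """Split e.g. 'biblio.(book | paper).author' into steps. A step is
    '_' (any one label), '_*' (any sequence of labels) or a set of labels."""
    steps = []
    for part in expr.split("."):
        if part in ("_", "_*"):
            steps.append(part)
        else:
            steps.append({alt.strip() for alt in part.strip("()").split("|")})
    return steps

def eval_regular_path(node, steps):
    """Nodes reached from node along a label path matching steps (tree assumed)."""
    if not steps:
        return [node]
    step, rest = steps[0], steps[1:]
    edges = node if isinstance(node, list) else []  # leaves have no outgoing edges
    results = []
    if step == "_*":
        results += eval_regular_path(node, rest)      # _* matches zero labels ...
        for _, child in edges:                        # ... or one label, then _* again
            results += eval_regular_path(child, steps)
    else:
        for label, child in edges:
            if step == "_" or label in step:
                results += eval_regular_path(child, rest)
    return results

for expr in ["biblio.book.author", "biblio.(book | paper).author",
             "biblio._.author", "biblio._*.author"]:
    print(expr, eval_regular_path(db, parse(expr)))
```

On this small tree the last three expressions return the same four authors. They differ once the database has other labels or deeper nesting.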

14 Example of a label pattern
((b|B)ook|(a|A)uthor)(s)? matches book, Book, author, Author, books, Books, authors, Authors
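Label patterns are ordinary regular expressions over the characters of a label. As a small illustration, Python's re module can test candidate labels against the pattern above. fullmatch is used because a whole label must match; the candidate labels are made up for the example.

```python
import re

# The label pattern from the slide, with | for alternation.
label_pattern = re.compile(r"((b|B)ook|(a|A)uthor)(s)?")

candidates = ["book", "Book", "author", "Authors", "books",
              "BOOK", "authorship", "title"]
# fullmatch: the whole label must match, not just a prefix of it
print([lab for lab in candidates if label_pattern.fullmatch(lab)])
# ['book', 'Book', 'author', 'Authors', 'books']
```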

15 An exercise
biblio._*.author.("[sS]ection")
Which of the following paths match the path expression above?
1. biblio.author.Section
2. biblio.cat.rat.hat.author.section
3. biblio.author
4. biblio.cat.author.section.Section

16 A simple query
select author: X
from biblio.book.author X
Result: {author: "Darwin", author: "Marx"}
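As a rough analogy, the query can be written as a Python list comprehension. The from clause becomes the generators that bind X, and the select clause builds one labelled pair per binding. The nested-list encoding of the slide-9 data and the helper values are assumptions of this sketch, not part of the query language.

```python
db = [("biblio", [
    ("paper", [("author", "Crick"), ("author", "Wallace"),
               ("title", "The spiral DNA"), ("date", 1956)]),
    ("book", [("author", "Darwin"), ("title", "Origin"), ("date", 1848)]),
    ("book", [("author", "Marx"), ("title", "Kapital"), ("date", 1860)]),
])]

def values(node, label):
    """Children of a complex node reached over edges with the given label."""
    return [child for (lab, child) in node if lab == label]

(biblio,) = values(db, "biblio")
# from biblio.book.author X: bind X to each node at the end of that path
result = [("author", x)
          for book in values(biblio, "book")
          for x in values(book, "author")]
print(result)  # [('author', 'Darwin'), ('author', 'Marx')]
```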

17 A query with a condition
select row: X
from biblio._ X
where "Crick" in X.author
Result: {row: {author: "Crick", author: "Wallace", date: 1956, title: "The spiral DNA"}, …}
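In the same Python analogy, the where clause becomes a filter, and biblio._ X becomes a loop over every child of biblio regardless of its label. The encoding and the helper values are assumptions of this sketch.

```python
db = [("biblio", [
    ("paper", [("author", "Crick"), ("author", "Wallace"),
               ("title", "The spiral DNA"), ("date", 1956)]),
    ("book", [("author", "Darwin"), ("title", "Origin"), ("date", 1848)]),
    ("book", [("author", "Marx"), ("title", "Kapital"), ("date", 1860)]),
])]

def values(node, label):
    """Children of a complex node reached over edges with the given label."""
    return [child for (lab, child) in node if lab == label]

(biblio,) = values(db, "biblio")
# from biblio._ X: X ranges over every child of biblio, whatever its label;
# where "Crick" in X.author: keep X if some author child equals "Crick"
result = [("row", x) for (_, x) in biblio if "Crick" in values(x, "author")]
for label, row in result:
    print(label, values(row, "title"))  # row ['The spiral DNA']
```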

18 Two exercises
select row: {title: Y, date: Z}
from biblio.paper X, X.title Y, X.date Z
select row: {author: Y, date: Z}
from biblio.book X, X.author Y, X.date Z

19 A database
[Figure: the bibliography database of slide 9]
select row: {title: Y, date: Z}
from biblio.paper X, X.title Y, X.date Z

20 A database
[Figure: the bibliography database of slide 9, shown again]

21 Nested queries
select row: (select author: Y from X.author Y)
from biblio.book X
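The nested query builds, for each book X, a row whose contents are the result of the inner query over X. In a Python sketch this is simply a comprehension inside a comprehension. The encoding and the helper values are assumptions.

```python
db = [("biblio", [
    ("paper", [("author", "Crick"), ("author", "Wallace"),
               ("title", "The spiral DNA"), ("date", 1956)]),
    ("book", [("author", "Darwin"), ("title", "Origin"), ("date", 1848)]),
    ("book", [("author", "Marx"), ("title", "Kapital"), ("date", 1860)]),
])]

def values(node, label):
    """Children of a complex node reached over edges with the given label."""
    return [child for (lab, child) in node if lab == label]

(biblio,) = values(db, "biblio")
# outer query: one row per book X; inner query: all authors Y of that X
result = [("row", [("author", y) for y in values(x, "author")])
          for x in values(biblio, "book")]
print(result)  # [('row', [('author', 'Darwin')]), ('row', [('author', 'Marx')])]
```

A book with several authors would give a row with several author entries, keeping the authors of one book grouped together.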

22 Three exercises
- Which authors have written a book or a paper in 1992?
- Which authors have written a book together with Jones?
- Which authors have written both a book and a paper?

23 Expressing relations
Relation r1 with attributes a, b, c and tuples (1, 2, 3), (3, 2, 2), (4, 3, 1); relation r2 with attributes b, d, e and tuples (1, 1, 3), (3, 4, 2), (2, 3, 1).
{r1: {row: {a: 1, b: 2, c: 3}, row: {a: 3, b: 2, c: 2}, row: {a: 4, b: 3, c: 1}},
 r2: {row: {b: 1, d: 1, e: 3}, row: {b: 3, d: 4, e: 2}, row: {b: 2, d: 3, e: 1}}}
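The encoding above is mechanical: each tuple becomes a row edge, and each attribute becomes a labelled edge below it. A minimal Python sketch of that translation and its inverse follows. The function names are hypothetical, and the decoder assumes every row has exactly one value per column, which semistructured data in general does not guarantee.

```python
def relation_to_ssd(columns, tuples):
    """Encode a relation as a list of ('row', [(attribute, value), ...]) pairs."""
    return [("row", list(zip(columns, t))) for t in tuples]

def ssd_to_relation(columns, rows):
    """Decode back to tuples; assumes each row has one value per column."""
    return [tuple(dict(row)[c] for c in columns) for (_, row) in rows]

# illustrative tuples
r1 = relation_to_ssd(["a", "b", "c"], [(1, 2, 3), (3, 2, 2)])
print(r1)
# [('row', [('a', 1), ('b', 2), ('c', 3)]), ('row', [('a', 3), ('b', 2), ('c', 2)])]
print(ssd_to_relation(["a", "b", "c"], r1))  # [(1, 2, 3), (3, 2, 2)]
```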

24 Expressing relational joins
select a: A, d: D
from r1.row X, r2.row Y, X.a A, X.b B, Y.b B', Y.d D
where B = B'
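Read literally, the from clause is a nested loop over all combinations of bindings, and the where clause keeps the combinations with equal b values. The Python sketch below does exactly that on small illustrative relations. Unlike the slide's select clause, which would yield a flat collection of a and d edges, it groups the output per binding for readability. The helper values is an assumption.

```python
def values(node, label):
    """Children of a row reached over edges with the given label."""
    return [child for (lab, child) in node if lab == label]

# Illustrative relations r1(a, b, c) and r2(b, d, e), one list of pairs per row.
r1 = [[("a", 1), ("b", 2), ("c", 3)], [("a", 3), ("b", 2), ("c", 2)],
      [("a", 4), ("b", 3), ("c", 1)]]
r2 = [[("b", 1), ("d", 1), ("e", 3)], [("b", 3), ("d", 4), ("e", 2)],
      [("b", 2), ("d", 3), ("e", 1)]]

# One generator per variable in the from clause; the if is the where clause.
# Output is grouped per binding (like row: {a: A, d: D}) for readability.
result = [[("a", a), ("d", d)]
          for x in r1 for y in r2
          for a in values(x, "a") for b in values(x, "b")
          for b2 in values(y, "b") for d in values(y, "d")
          if b == b2]
print(result)
```

This nested-loop evaluation mirrors the query text. A query engine would be free to use a hash join or any other join strategy instead.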

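25 Label variables
select L: X
from biblio._*.L X
where matches(".*Shakespeare.*", X)
[Figure: L is marked as a label variable. The database has root db and an edge biblio; below it are book nodes n2 (author "Shakespeare", title "Macbeth", date 1622) and n3 (author "Smith", title "Best of Shakespeare", date 1992). Further entries are elided.]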

26 Label variables
select L: X
from biblio._*.L X
where matches(".*Shakespeare.*", X)
Result: {author: "Shakespeare", title: "Best of Shakespeare"}
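A label variable binds to edge labels, not to nodes. The Python sketch below enumerates every edge below biblio, binds L to its label and X to its target, and filters with a regular expression. We assume matches applies only to strings, so non-string values such as dates are skipped. The helper edges_below is hypothetical and assumes a tree.

```python
import re

# The database of slides 25-26, minus the elided entries.
db = [("biblio", [
    ("book", [("author", "Shakespeare"), ("title", "Macbeth"), ("date", 1622)]),
    ("book", [("author", "Smith"), ("title", "Best of Shakespeare"), ("date", 1992)]),
])]

def edges_below(node):
    """Yield every (label, target) edge reachable from node (tree assumed)."""
    if isinstance(node, list):
        for label, child in node:
            yield label, child
            yield from edges_below(child)

(biblio,) = [child for (lab, child) in db if lab == "biblio"]
# biblio._*.L X: X is any node below biblio, L the label of the edge into X.
# Assumption: matches() applies to strings only, so other values are skipped.
result = [(label, x) for label, x in edges_below(biblio)
          if isinstance(x, str) and re.fullmatch(r".*Shakespeare.*", x)]
print(result)  # [('author', 'Shakespeare'), ('title', 'Best of Shakespeare')]
```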

27 Turning labels into data
select publ: {type: L, author: A}
from biblio.L X, X.author A
[Figure: part of the bibliography database of slide 9]
Result: {publ: {type: "paper", author: "Crick"}, publ: {type: "paper", author: "Wallace"}, publ: {type: "book", author: "Darwin"}, …}
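Here the label variable L, bound by biblio.L X, ends up as a value (type) in the output, so schema information becomes data. A minimal Python sketch over the slide-9 data follows. The encoding and the helper values are assumptions.

```python
db = [("biblio", [
    ("paper", [("author", "Crick"), ("author", "Wallace"),
               ("title", "The spiral DNA"), ("date", 1956)]),
    ("book", [("author", "Darwin"), ("title", "Origin"), ("date", 1848)]),
    ("book", [("author", "Marx"), ("title", "Kapital"), ("date", 1860)]),
])]

def values(node, label):
    """Children of a complex node reached over edges with the given label."""
    return [child for (lab, child) in node if lab == label]

(biblio,) = values(db, "biblio")
# biblio.L X binds L to the edge label (paper, book, ...) and X to its target
result = [("publ", [("type", label), ("author", a)])
          for (label, x) in biblio
          for a in values(x, "author")]
for entry in result:
    print(entry)
```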

28 An exercise
- List all publications in 1992, their types, and titles.

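29 Basic XML syntax
XML is a textual representation of data. An element is text bounded by a start-tag and an end-tag.
[Example: an element with content John, with its start-tag, end-tag, content and the whole element labelled; markup not preserved]
An element with empty content can be abbreviated to a single tag ending in />.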

30 Basic XML syntax
Elements may contain subelements.
[Example: an element whose subelements contain John, 112233 and john@123.edu; markup not preserved]

31 XML attributes
An attribute is defined by a name-value pair within a tag.
[Example: elements with contents 500 and 25 that carry attributes; markup not preserved]

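32 XML attributes and elements
[Example: data about a widget (widget, 10) written once with attributes and once with elements; markup not preserved]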

33 XML and ssd-expressions
<person><name>John</name><tel>112233</tel><email>john@123.edu</email></person>
{person: {name: "John", tel: 112233, email: "john@123.edu"}}
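The correspondence can be sketched with Python's standard xml.etree.ElementTree. Each element becomes a labelled edge, a leaf element becomes its text, and an element with subelements becomes a list of pairs. The function to_ssd is hypothetical and ignores attributes and mixed content. The XML string is the person example assumed for this slide.

```python
import xml.etree.ElementTree as ET

def to_ssd(elem):
    """Map an element to an ssd value: a leaf element becomes its text,
    any other element a list of (tag, value) pairs. Attributes and
    mixed content are ignored in this sketch."""
    children = list(elem)
    if not children:
        return elem.text
    return [(child.tag, to_ssd(child)) for child in children]

xml_text = ("<person><name>John</name><tel>112233</tel>"
            "<email>john@123.edu</email></person>")
root = ET.fromstring(xml_text)
ssd = [(root.tag, to_ssd(root))]
print(ssd)
# [('person', [('name', 'John'), ('tel', '112233'), ('email', 'john@123.edu')])]
```

Note that tel comes out as the string '112233', not the number 112233. XML text carries no base types, a point that slide 40 returns to for DTDs.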

34 XML references
[Example: person data for John (112233) and Peter (998877), with an element identifier and a reference attribute labelled; markup not preserved]

35 Document Type Definitions <!DOCTYPE db [ ]>

36 An exercise on DTDs as schemas
[XML data; only the text content survives: a1 b1 a2 b2 a1 b1 c2 d2 a1 b1]
Write down a DTD for the data above!

37 Attributes in DTDs
[XML example; only the text content survives: trumpet 500 25]
<!ATTLIST name language CDATA #REQUIRED department CDATA #IMPLIED>

38 Reference attributes in DTDs
<!DOCTYPE people [
<!ATTLIST person id ID #REQUIRED boss IDREF #REQUIRED friends IDREFS #IMPLIED>
]>
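A rough Python sketch of the reference checks implied by this ATTLIST, using xml.etree.ElementTree. It checks that IDs are present and unique, that boss is present and names exactly one defined ID, and that each token in friends names a defined ID. It is not a DTD validator: element content, XML Name syntax of IDs and whitespace normalisation are not checked. The function name check_person_refs and the sample documents are made up.

```python
import xml.etree.ElementTree as ET

def check_person_refs(root):
    """Report ID/IDREF/IDREFS problems among person elements, following
    id ID #REQUIRED, boss IDREF #REQUIRED, friends IDREFS #IMPLIED."""
    problems, ids = [], set()
    persons = list(root.iter("person"))
    for p in persons:                      # pass 1: collect IDs
        pid = p.get("id")
        if pid is None:
            problems.append("person without required id")
        elif pid in ids:
            problems.append(f"duplicate id {pid!r}")
        else:
            ids.add(pid)
    for p in persons:                      # pass 2: resolve references
        who = p.get("id")
        boss = p.get("boss")
        if boss is None:
            problems.append(f"{who!r} lacks required boss")
        elif len(boss.split()) != 1:
            problems.append(f"{who!r}: boss must be a single IDREF")
        elif boss.strip() not in ids:
            problems.append(f"{who!r}: boss {boss!r} is not a defined id")
        for friend in (p.get("friends") or "").split():  # IDREFS: space-separated
            if friend not in ids:
                problems.append(f"{who!r}: friend {friend!r} is not a defined id")
    return problems

ok = ET.fromstring('<people><person id="ann" boss="ann">Ann</person>'
                   '<person id="bob" boss="ann" friends="ann">Bob</person></people>')
bad = ET.fromstring('<people><person id="cid" friends="dan">Cid</person></people>')
print(check_person_refs(ok))   # []
print(check_person_refs(bad))  # missing boss, undefined friend
```

Only person elements are searched for IDs here, because in this DTD fragment only person declares an ID attribute. In XML generally, an IDREF may point to an ID on any element.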

39 An exercise
<person id="sven" boss="olle">Sven Svensson</person>
<person id="olle" friends="nils eva">Olle Olsson</person>
<person id="pelle" boss="nils eva">Per Persson</person>
Does this XML element conform to the previous DTD?

40 Limitations of DTDs as schemas
- DTDs impose order
- No base types
- The types of IDREFs cannot be constrained

41 XSL: Extensible Stylesheet Language
[Example: surviving text t1 a1 a2 t2 a3 a4 t3 a5 a6; markup not preserved]

42 Template rules and XSL patterns
[Example: a stylesheet with a template rule and its XSL pattern labelled; surviving text t1 t2 t3; markup not preserved]

43 Two exercises
select row: {title: Y, date: Z}
from biblio.paper X, X.title Y, X.date Z
Result: {row: {title: "The spiral DNA", date: 1956}, …}
select row: {author: Y, date: Z}
from biblio.book X, X.author Y, X.date Z

44 Which authors have written a book or a paper in 1992?
select author: X
from biblio.(book | paper) Y, Y.author X
where Y.date = 1992

45 Which authors have written a book together with Jones?
select author: X
from biblio.book Y, Y.author X
where "Jones" in Y.author

46 Which authors have written both a book and a paper?
select author: A
from biblio.book B, biblio.paper P, B.author A
where B.author = P.author
select author: A1
from biblio.book B, biblio.paper P, B.author A1, P.author A2
where A1 = A2

47 List all publications in 1992, their types, and titles.
select publ: {type: L, title: T}
from biblio.L X, X.title T
where X.date = 1992

48 <!DOCTYPE db [ ]>
[XML data; only the text content survives: a1 b1 a2 b2 a1 b1 c2 d2 a1 b1]

