(C) 2000, The University of Michigan 1 Database Application Design Handout #11 March 24, 2000.


1 (C) 2000, The University of Michigan 1 Database Application Design Handout #11 March 24, 2000

2 Course information
Instructor: Dragomir R. Radev (radev@si.umich.edu)
Office: 305A, West Hall
Phone: (734) 615-5225
Office hours: Thursdays 3-4 and Fridays 1-2
Course page: http://www.si.umich.edu/~radev/654w00
Class meets on Fridays, 2:30 - 5:30 PM, 311 WH

3 Web-based databases

4 Types of databases
Textual databases
Semi-structured databases

5 Indexing textual data
Inverted files
Boolean queries
Signature files
Signature S1 matches signature S2 if S2 & S1 = S2
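The two indexing schemes named on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the course's implementation: the toy documents, the 16-bit signature width, and the byte-sum hash in `term_bit` are all assumptions made for the example. The `matches` function is the slide's test, S2 & S1 = S2, with S1 as the document signature and S2 as the query signature.

```python
from collections import defaultdict

# Toy collection; the documents are made up for illustration.
DOCS = {
    1: "xml query languages for the web",
    2: "relational database design",
    3: "web database query processing",
}

def build_inverted_file(docs):
    """Map each term to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.split():
            index[term].add(doc_id)
    return index

def boolean_and(index, *terms):
    """Evaluate `t1 AND t2 AND ...` by intersecting the posting sets."""
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings) if postings else set()

SIG_BITS = 16  # assumed signature width

def term_bit(term):
    """Toy deterministic hash: pick one bit per term from the sum of its bytes."""
    return 1 << (sum(term.encode("utf-8")) % SIG_BITS)

def signature(text):
    """Superimpose (OR together) the bits of all terms in the text."""
    sig = 0
    for term in text.split():
        sig |= term_bit(term)
    return sig

def matches(s1, s2):
    """S1 matches S2 if S2 & S1 == S2, i.e. every bit set in S2 is set in S1."""
    return (s2 & s1) == s2

def signature_candidates(docs, query):
    """Documents whose signature matches the query signature (may include false positives)."""
    query_sig = signature(query)
    return [doc_id for doc_id, text in docs.items() if matches(signature(text), query_sig)]

index = build_inverted_file(DOCS)
exact_hits = boolean_and(index, "web", "query")
candidates = signature_candidates(DOCS, "web query")
```

The inverted file gives exact answers to the Boolean query. The signature test can only rule documents out: because different terms may hash to the same bit, every candidate still has to be checked against the text.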

6 XML-QL

7 XML-QL
WHERE $1 in "www.booklist.com/books.xml"
CONSTRUCT $1
Two slides from Johannes Gehrke, Cornell University

8 XML-QL (continued)
WHERE $b IN "www.booklist.com/books.xml", $n $p in $e
CONSTRUCT $p
  WHERE $l IN $n
  CONSTRUCT $l

9 XML-QL (continued)

10 XML-QL (continued)
WHERE <book>
        <publisher><name>Addison-Wesley</name></publisher>
        <title> $t </title>
        <author> $a </author>
      </book> IN "www.a.b.c/bib.xml"
CONSTRUCT $a

11 XML-QL (continued)
WHERE <book>
        <publisher><name>Addison-Wesley</></>
        <title> $t </>
        <author> $a </>
      </> IN "www.a.b.c/bib.xml"
CONSTRUCT $a

12 XML-QL (continued)
WHERE <book>
        <publisher><name>Addison-Wesley</></>
        <title> $t </>
        <author> $a </>
      </> IN "www.a.b.c/bib.xml"
CONSTRUCT <result>
            <author> $a </>
            <title> $t </>
          </>

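13 XML-QL (continued)
<bib>
  <book year="1995">
    <title> An Introduction to Database Systems </title>
    <author> <lastname> Date </lastname> </author>
    <publisher> <name> Addison-Wesley </name> </publisher>
  </book>
  <book year="1998">
    <title> Foundation for Object/Relational Databases: The Third Manifesto </title>
    <author> <lastname> Date </lastname> </author>
    <author> <lastname> Darwen </lastname> </author>
    <publisher> <name> Addison-Wesley </name> </publisher>
  </book>
</bib>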

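14 XML-QL (continued)
<result>
  <author> <lastname> Date </lastname> </author>
  <title> An Introduction to Database Systems </title>
</result>
<result>
  <author> <lastname> Date </lastname> </author>
  <title> Foundation for Object/Relational Databases: The Third Manifesto </title>
</result>
<result>
  <author> <lastname> Darwen </lastname> </author>
  <title> Foundation for Object/Relational Databases: The Third Manifesto </title>
</result>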
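The WHERE/CONSTRUCT query above (bind $t and $a in books published by Addison-Wesley, then emit one result per author/title pair) can be emulated in Python with the standard `xml.etree.ElementTree` module. This is only a sketch of the semantics, not an XML-QL engine. The element names (`bib`, `book`, `publisher`, `name`, `title`, `author`, `lastname`) and the inline `BIB` string are assumptions about the shape of `bib.xml`.

```python
import xml.etree.ElementTree as ET

# Assumed shape of bib.xml; element names are illustrative.
BIB = """
<bib>
  <book>
    <title>An Introduction to Database Systems</title>
    <author><lastname>Date</lastname></author>
    <publisher><name>Addison-Wesley</name></publisher>
  </book>
  <book>
    <title>Foundation for Object/Relational Databases: The Third Manifesto</title>
    <author><lastname>Date</lastname></author>
    <author><lastname>Darwen</lastname></author>
    <publisher><name>Addison-Wesley</name></publisher>
  </book>
</bib>
"""

def author_title_pairs(bib_xml, publisher="Addison-Wesley"):
    """Return one (author lastname, title) pair per binding of ($a, $t)."""
    root = ET.fromstring(bib_xml.strip())
    results = []
    for book in root.iter("book"):
        # Pattern <publisher><name>Addison-Wesley</name></publisher>
        names = [(n.text or "").strip() for n in book.findall("publisher/name")]
        if publisher not in names:
            continue
        # Every combination of a <title> and an <author> is a separate binding.
        for title in book.findall("title"):
            for author in book.findall("author"):
                lastname = (author.findtext("lastname") or "").strip()
                results.append((lastname, (title.text or "").strip()))
    return results

pairs = author_title_pairs(BIB)
```

Because each author/title combination is its own binding, the second book contributes two pairs, one for each author.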

15 XML-QL (continued)
WHERE <book> $p </> IN "www.a.b.c/bib.xml",
      <title> $t </>,
      <publisher><name>Addison-Wesley</></> IN $p
CONSTRUCT <result>
            <title> $t </>
            WHERE <author> $a </> IN $p
            CONSTRUCT <author> $a </>
          </>

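16 XML-QL (continued)
<result>
  <title> An Introduction to Database Systems </title>
  <author> <lastname> Date </lastname> </author>
</result>
<result>
  <title> Foundation for Object/Relational Databases: The Third Manifesto </title>
  <author> <lastname> Date </lastname> </author>
  <author> <lastname> Darwen </lastname> </author>
</result>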
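The nested query groups all authors under a single result per title, rather than repeating the title for each author. A rough Python equivalent is sketched below. The element names, the inline data, and the third book (with a made-up publisher, included only to show the filter) are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed shape of bib.xml; the last book and its publisher are hypothetical.
BIB = """
<bib>
  <book>
    <title>An Introduction to Database Systems</title>
    <author><lastname>Date</lastname></author>
    <publisher><name>Addison-Wesley</name></publisher>
  </book>
  <book>
    <title>Foundation for Object/Relational Databases: The Third Manifesto</title>
    <author><lastname>Date</lastname></author>
    <author><lastname>Darwen</lastname></author>
    <publisher><name>Addison-Wesley</name></publisher>
  </book>
  <book>
    <title>Hypothetical Title</title>
    <author><lastname>Nobody</lastname></author>
    <publisher><name>Other Press</name></publisher>
  </book>
</bib>
"""

def authors_grouped_by_title(bib_xml, publisher="Addison-Wesley"):
    """Outer query: one result per matching title. Inner query: its authors."""
    root = ET.fromstring(bib_xml.strip())
    results = []
    for book in root.iter("book"):  # $p ranges over the content of each <book>
        names = [(n.text or "").strip() for n in book.findall("publisher/name")]
        if publisher not in names:
            continue
        for title in book.findall("title"):  # outer binding of $t
            # The inner WHERE ... IN $p is evaluated once per outer binding.
            authors = [
                (a.findtext("lastname") or "").strip()
                for a in book.findall("author")
            ]
            results.append(((title.text or "").strip(), authors))
    return results

grouped = authors_grouped_by_title(BIB)
```

Each entry in `grouped` corresponds to one `<result>` element: a title followed by the list of its authors.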

17 XML-QL (continued)
WHERE <article>
        <author>
          <firstname> $f </> // firstname $f
          <lastname> $l </>  // lastname $l
        </>
      </> CONTENT_AS $a IN "www.a.b.c/bib.xml",
      <book year=$y>
        <author>
          <firstname> $f </> // join on same firstname $f
          <lastname> $l </>  // join on same lastname $l
        </>
      </> IN "www.a.b.c/bib.xml",
      $y > 1995
CONSTRUCT <article> $a </>

18 XML-QL (continued)

19 XML-QL (continued)

20 XML-QL (continued)
John Smith...... 1995

21 XML-QL (continued)

22 XML-QL (continued)
WHERE $n IN "abc.xml"
WHERE ELEMENT_AS $t, ELEMENT_AS $l
CONSTRUCT $t $l

23 Scalar values
A Trip to the Moon NOT!
A Trip to the Moon YES

24 Tag variables
WHERE <$p>
        <title> $t </>
        <year> 1995 </>
        <$e> Smith </>
      </> IN "www.a.b.c/bib.xml",
      $e IN {author, editor}
CONSTRUCT <$p>
            <title> $t </>
            <$e> Smith </>
          </>
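Tag variables let a query range over element names as well as element content: here $e must be either `author` or `editor`, and the publication's own tag is carried through to the output. The Python sketch below emulates that behavior. The sample data, the `year` and `title` element names, and the choice to look only at direct children of the root are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical bibliography; titles and tags are made up for illustration.
BIB = """
<bib>
  <book><title>T1</title><year>1995</year><author>Smith</author></book>
  <article><title>T2</title><year>1995</year><editor>Smith</editor></article>
  <book><title>T3</title><year>1996</year><author>Smith</author></book>
  <book><title>T4</title><year>1995</year><publisher>Smith</publisher></book>
</bib>
"""

def smith_publications(bib_xml, year="1995", roles=("author", "editor")):
    """Return (publication tag, title, role tag) for each match of the pattern."""
    root = ET.fromstring(bib_xml.strip())
    results = []
    for pub in root:  # tag variable $p: any element name is allowed
        years = [(y.text or "").strip() for y in pub.findall("year")]
        if year not in years:
            continue
        for child in pub:  # tag variable $e: restricted to the given roles
            if child.tag in roles and (child.text or "").strip() == "Smith":
                for title in pub.findall("title"):
                    results.append((pub.tag, (title.text or "").strip(), child.tag))
    return results

matches_1995 = smith_publications(BIB)
```

T3 is rejected by the year test and T4 because `publisher` is not among the allowed values of $e.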

25 Transforming data

26 Transforming data (cont'd)
WHERE $fn $ln $t IN "www.a.b.c/bib.xml"
CONSTRUCT $fn $ln $t

27 Integrating data from different sources
WHERE <person>
        <name></> ELEMENT_AS $n
        <ssn> $ssn </>
      </> IN "www.a.b.c/data.xml",
      <taxpayer>
        <ssn> $ssn </>
        <income></> ELEMENT_AS $i
      </> IN "www.irs.gov/taxpayers.xml"
CONSTRUCT <result> $n $i </>
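The integration query is a join across two documents on the shared variable $ssn, with ELEMENT_AS binding whole elements ($n and $i) rather than their text. A minimal Python sketch of that join follows. The element names (`person`, `name`, `ssn`, `taxpayer`, `income`) and all sample values are assumptions, and the two inline strings stand in for the remote XML sources.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for the two sources; all values are made up.
PEOPLE = """<people>
  <person><name>Ann</name><ssn>111</ssn></person>
  <person><name>Bob</name><ssn>222</ssn></person>
</people>"""

TAXPAYERS = """<taxpayers>
  <taxpayer><ssn>222</ssn><income>50000</income></taxpayer>
  <taxpayer><ssn>333</ssn><income>70000</income></taxpayer>
</taxpayers>"""

def join_on_ssn(people_xml, taxpayers_xml):
    """Build one <result> per (person, taxpayer) pair that agrees on ssn."""
    people = ET.fromstring(people_xml)
    taxpayers = ET.fromstring(taxpayers_xml)

    # Index the second source by ssn so the join is a dictionary lookup.
    income_by_ssn = {}
    for taxpayer in taxpayers.iter("taxpayer"):
        ssn = (taxpayer.findtext("ssn") or "").strip()
        income = taxpayer.find("income")  # whole element, as with ELEMENT_AS $i
        if income is not None:
            income_by_ssn.setdefault(ssn, []).append(income)

    results = []
    for person in people.iter("person"):
        ssn = (person.findtext("ssn") or "").strip()
        name = person.find("name")  # whole element, as with ELEMENT_AS $n
        if name is None:
            continue
        for income in income_by_ssn.get(ssn, []):
            result = ET.Element("result")
            result.append(name)
            result.append(income)
            results.append(result)
    return results

joined = [ET.tostring(r, encoding="unicode") for r in join_on_ssn(PEOPLE, TAXPAYERS)]
```

Only Bob appears in both sources, so the join yields a single result that combines his `<name>` element with the matching `<income>` element.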

28 Query blocks
WHERE $t 1995 CONTENT_AS $p IN "www.a.b.c/bib.xml"
CONSTRUCT $t
  { WHERE $e = "journal-paper", $m IN $p CONSTRUCT $m }
  { WHERE $e = "book", $q IN $p CONSTRUCT $q }

29 WSQ

30 Web-supported queries
SIGMOD 2000 (Goldman and Widom)
WebPages(SearchExp, T1, T2, …, Tn, URL, Rank, Date)
SELECT NAME, COUNT FROM STATES, WEBCOUNT WHERE NAME = T1 ORDER BY COUNT DESC
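The query on this slide ranks states by how many Web pages mention them, with WEBCOUNT acting as a table whose rows come from a search engine. The sketch below imitates the idea with Python's `sqlite3` module. It is not the WSQ system: the hit counts are invented, `fake_web_count` is a hypothetical stand-in for a search-engine call, and WEBCOUNT is materialized as an ordinary table instead of being evaluated on demand.

```python
import sqlite3

# Invented hit counts; a real system would ask a search engine instead.
HYPOTHETICAL_HITS = {"Michigan": 300, "Ohio": 200, "Texas": 500}

def fake_web_count(term):
    """Hypothetical stand-in for 'number of Web pages that mention term'."""
    return HYPOTHETICAL_HITS.get(term, 0)

def states_by_web_count(state_names):
    """Run the slide's query against an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE STATES (NAME TEXT)")
    # "COUNT" is quoted so that it is unambiguously a column name.
    conn.execute('CREATE TABLE WEBCOUNT (T1 TEXT, "COUNT" INTEGER)')
    conn.executemany("INSERT INTO STATES VALUES (?)", [(n,) for n in state_names])
    # One WEBCOUNT row per search term, filled in by the fake search call.
    conn.executemany(
        "INSERT INTO WEBCOUNT VALUES (?, ?)",
        [(n, fake_web_count(n)) for n in state_names],
    )
    rows = conn.execute(
        'SELECT NAME, "COUNT" FROM STATES, WEBCOUNT '
        'WHERE NAME = T1 ORDER BY "COUNT" DESC'
    ).fetchall()
    conn.close()
    return rows

ranking = states_by_web_count(["Michigan", "Ohio", "Texas"])
```

The join condition NAME = T1 is what supplies the search terms: each state name becomes one lookup, and the ORDER BY then sorts states by the returned counts.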

31 XHTML

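32 Simple example
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <title>Virtual Library</title>
  </head>
  <body>
    <p>Moved to <a href="http://vlib.org/">vlib.org</a>.</p>
  </body>
</html>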
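Because XHTML is XML, a page like the one on this slide can be checked for well-formedness with any XML parser, which is not true of ordinary HTML. The sketch below uses Python's standard `xml.etree.ElementTree`. The markup in `PAGE` is an assumed rendering of the slide's example (the element structure and the `href` value are assumptions), and the check covers well-formedness only, not validity against the DTD.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Assumed markup for the slide's example page.
PAGE = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head><title>Virtual Library</title></head>
  <body><p>Moved to <a href="http://vlib.org/">vlib.org</a>.</p></body>
</html>"""

def is_well_formed(markup):
    """True if the markup parses as XML (tags closed and properly nested)."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

def page_title(markup):
    """Return the text of html/head/title, using the XHTML namespace."""
    root = ET.fromstring(markup)
    return root.findtext(f"{{{XHTML_NS}}}head/{{{XHTML_NS}}}title")

ok = is_well_formed(PAGE)
title = page_title(PAGE)
# HTML-style markup with an unclosed <br> is rejected by the XML parser.
html_style_ok = is_well_formed("<p>Moved to <br></p>")
```

The last line shows the practical difference from HTML: an unclosed `<br>` is acceptable to browsers but makes the document ill-formed as XML.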

33 SI 760 Language and information (Fall 2000)

34 SI 760 (1)
Classes 1-3: Introduction to the course and linguistic background. The study of language. Computational Linguistics and Psycholinguistics.
Classes 4-5: Elementary probability and statistics. Describing data. Measures of central tendency. The z score. Hypothesis testing.
Classes 6-8: Information theory. Entropy, joint entropy, conditional entropy. Relative entropy and mutual information. Chain rules.
Classes 9-10: Data compression and coding. Entropy rate. Language modeling. Examples of codes. Optimal codes. Huffman codes. Arithmetic coding. The entropy of English.

35 SI 760 (2)
Classes 11-12: Clustering. Cluster analysis. Clustering of terms according to semantic similarity. Distributional clustering.
Classes 13-14: Concordancing and collocations. Concordances. Collocations. Syntactic criteria for collocability.
Classes 15-16: Literary detective work. The statistical analysis of writing style. Decipherment and translation.
Classes 17-18: Information extraction. Message understanding. Trainable methods.

36 SI 760 (3)
Classes 19-20: Word sense disambiguation and lexical acquisition. Supervised disambiguation. Unsupervised disambiguation. Attachment ambiguity. Computational lexicography.
Classes 21-22: Part-of-speech tagging. Statistical taggers. Transformation-based learning of tags. Maximum entropy models. Weighted finite-state transducers.
Classes 23-24: Question answering. Semantic representation. Predictive annotation.

37 SI 760 (4)
Classes 25-26: Text summarization. Single-document summarization. Multi-document summarization. Language models. Maximal Marginal Relevance. Cross-document structure theory. Trainable methods. Text categorization.
Classes 27-28 (30): Other topics. Text alignment. Word alignment. Statistical machine translation. Discourse segmentation. Text categorization. Maximum entropy modeling.

38 SI 760 (5)
Manning and Schuetze. Foundations of Statistical Natural Language Processing. MIT Press, 1999.
Jurafsky and Martin. Speech and Language Processing. Prentice-Hall, 2000.
Cover and Thomas. Elements of Information Theory. John Wiley and Sons, 1991.
Baeza-Yates and Ribeiro-Neto. Modern Information Retrieval. Addison-Wesley, 1999.
Oakes. Statistics for Corpus Linguistics. Edinburgh University Press, 1998.

39 Course URL
http://www.si.umich.edu/~radev/760f00

40 Readings for next time
Web-based readings
– Asilomar report: http://www.acm.org/sigmod/record/issues/9812/asilomar.html
– White paper on XML: http://www-db.stanford.edu/~widom/xml-whitepaper.html

