(C) 2000, The University of Michigan 1 Database Application Design Handout #11 March 24, 2000.

Course information
Instructor: Dragomir R. Radev
Office: 305A, West Hall
Phone: (734)
Office hours: Thursdays 3-4 and Fridays 1-2
Course page:
Class meets on Fridays, 2:30 - 5:30 PM, 311 WH

Web-based databases

Types of databases
– Textual databases
– Semi-structured databases

Indexing textual data
– Inverted files
– Boolean queries
– Signature files: signature S1 matches signature S2 if S2 & S1 = S2, i.e., every bit set in S2 is also set in S1
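
As a minimal sketch of the indexing ideas on this slide, the Python below builds an inverted file, answers a Boolean AND query from it, and tests the signature-file match condition S2 & S1 = S2. Several choices are assumptions made for illustration and are not parameters from the course: the signature width (`SIG_BITS`), the bits set per word (`BITS_PER_WORD`), the SHA-256-based hashing, and the whitespace tokenizer.

```python
import hashlib
from collections import defaultdict

SIG_BITS = 64       # assumed signature width in bits
BITS_PER_WORD = 3   # assumed number of bits each word sets


def build_inverted_file(docs):
    """Map each term to the sorted list of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def boolean_and(index, *terms):
    """Boolean AND query: ids of documents that contain every term."""
    postings = [set(index.get(t.lower(), ())) for t in terms]
    return sorted(set.intersection(*postings)) if postings else []


def word_signature(word):
    """Superimposed coding: hash the word onto BITS_PER_WORD bit positions."""
    sig = 0
    for i in range(BITS_PER_WORD):
        digest = hashlib.sha256(f"{i}:{word}".encode()).digest()
        sig |= 1 << (int.from_bytes(digest[:4], "big") % SIG_BITS)
    return sig


def text_signature(text):
    """OR together the signatures of all words in the text."""
    sig = 0
    for word in text.lower().split():
        sig |= word_signature(word)
    return sig


def matches(s1, s2):
    """S1 matches S2 if S2 & S1 == S2 (every bit of S2 is set in S1)."""
    return s2 & s1 == s2


if __name__ == "__main__":
    docs = {1: "xml query languages", 2: "inverted files for text",
            3: "xml and text databases"}
    index = build_inverted_file(docs)
    print(boolean_and(index, "xml", "text"))
    print(matches(text_signature(docs[3]), text_signature("xml text")))
```

Different words can set the same bits, so a signature match only yields a candidate (a "false drop") that still has to be checked against the text. A non-match, by contrast, safely rules the block out.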

XML-QL

XML-QL (two slides from Johannes Gehrke, Cornell University)
WHERE $1 in "…" CONSTRUCT $1
x y 2

XML-QL (continued)
WHERE $b IN "…" $n $p in $e CONSTRUCT $p
WHERE $l IN $n CONSTRUCT $l

XML-QL (continued)

XML-QL (continued)
WHERE <book>
        <publisher><name>Addison-Wesley</name></publisher>
        <title> $t </title>
        <author> $a </author>
      </book> IN "…"
CONSTRUCT $a

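XML-QL (continued)
WHERE <book>
        <publisher><name>Addison-Wesley</name></publisher>
        <title> $t </title>
        <author> $a </author>
      </book> IN "…"
CONSTRUCT <result>
            <author> $a </author>
            <title> $t </title>
          </result>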

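XML-QL (continued)
<bib>
  <book year="1995">
    <title> An Introduction to Database Systems </title>
    <author> <lastname> Date </lastname> </author>
    <publisher> <name> Addison-Wesley </name> </publisher>
  </book>
  <book year="1998">
    <title> Foundation for Object/Relational Databases: The Third Manifesto </title>
    <author> <lastname> Date </lastname> </author>
    <author> <lastname> Darwen </lastname> </author>
    <publisher> <name> Addison-Wesley </name> </publisher>
  </book>
</bib>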

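XML-QL (continued)
<result>
  <author> <lastname> Date </lastname> </author>
  <title> An Introduction to Database Systems </title>
</result>
<result>
  <author> <lastname> Date </lastname> </author>
  <title> Foundation for Object/Relational Databases: The Third Manifesto </title>
</result>
<result>
  <author> <lastname> Darwen </lastname> </author>
  <title> Foundation for Object/Relational Databases: The Third Manifesto </title>
</result>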
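
The following is a hedged Python sketch of what the flat Addison-Wesley query computes over the two sample books: one `<result>` for each binding of an author and a title. It uses the standard `xml.etree.ElementTree` module and is not an XML-QL engine. The element names `bib`, `book`, `title`, `author`, `lastname` and `publisher/name` are assumptions modeled on the sample data, and the function name `author_title_results` is hypothetical.

```python
import copy
import xml.etree.ElementTree as ET

# Sample bibliography: titles and authors as on the slides; element names assumed.
BIB = """<bib>
  <book year="1995">
    <title>An Introduction to Database Systems</title>
    <author><lastname>Date</lastname></author>
    <publisher><name>Addison-Wesley</name></publisher>
  </book>
  <book year="1998">
    <title>Foundation for Object/Relational Databases: The Third Manifesto</title>
    <author><lastname>Date</lastname></author>
    <author><lastname>Darwen</lastname></author>
    <publisher><name>Addison-Wesley</name></publisher>
  </book>
</bib>"""


def author_title_results(bib, publisher):
    """Emulate the flat query: one <result> per ($a, $t) binding."""
    results = []
    for book in bib.findall("book"):
        # The <publisher><name>...</name></publisher> pattern must match.
        if book.findtext("publisher/name") != publisher:
            continue
        for title in book.findall("title"):        # binds $t
            for author in book.findall("author"):  # binds $a
                result = ET.Element("result")
                result.append(copy.deepcopy(author))
                result.append(copy.deepcopy(title))
                results.append(result)
    return results


if __name__ == "__main__":
    for r in author_title_results(ET.fromstring(BIB), "Addison-Wesley"):
        print(ET.tostring(r, encoding="unicode"))
```

The book with two authors produces two `<result>` elements. That is the "one result per variable binding" behavior of a flat WHERE/CONSTRUCT.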

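XML-QL (continued)
WHERE <book> $p </book> IN "…",
      <title> $t </title>,
      <publisher><name>Addison-Wesley</name></publisher> IN $p
CONSTRUCT <result>
            <title> $t </title>
            WHERE <author> $a </author> IN $p
            CONSTRUCT <author> $a </author>
          </result>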

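XML-QL (continued)
<result>
  <title> An Introduction to Database Systems </title>
  <author> <lastname> Date </lastname> </author>
</result>
<result>
  <title> Foundation for Object/Relational Databases: The Third Manifesto </title>
  <author> <lastname> Date </lastname> </author>
  <author> <lastname> Darwen </lastname> </author>
</result>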
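
In the nested query, the inner WHERE/CONSTRUCT runs once per outer binding, so the authors of each book are grouped under a single `<result>`. Below is a minimal Python sketch of that grouping. It assumes the element names `book`, `title`, `author`, `lastname` and `publisher/name`, and the function name `title_with_authors` is hypothetical.

```python
import copy
import xml.etree.ElementTree as ET

# Sample bibliography: titles and authors as on the slides; element names assumed.
BIB = """<bib>
  <book year="1995">
    <title>An Introduction to Database Systems</title>
    <author><lastname>Date</lastname></author>
    <publisher><name>Addison-Wesley</name></publisher>
  </book>
  <book year="1998">
    <title>Foundation for Object/Relational Databases: The Third Manifesto</title>
    <author><lastname>Date</lastname></author>
    <author><lastname>Darwen</lastname></author>
    <publisher><name>Addison-Wesley</name></publisher>
  </book>
</bib>"""


def title_with_authors(bib, publisher):
    """Emulate the nested query: one <result> per book, grouping its authors."""
    results = []
    for book in bib.findall("book"):              # $p: the book's content
        if book.findtext("publisher/name") != publisher:
            continue
        for title in book.findall("title"):       # binds $t
            result = ET.Element("result")
            result.append(copy.deepcopy(title))
            # Inner WHERE/CONSTRUCT: every <author> found inside $p.
            for author in book.findall("author"):
                result.append(copy.deepcopy(author))
            results.append(result)
    return results
```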

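XML-QL (continued)
WHERE <article>
        <author>
          <firstname> $f </firstname>  // firstname $f
          <lastname> $l </lastname>    // lastname $l
        </author>
      </article> CONTENT_AS $a IN "…",
      <book year=$y>
        <author>
          <firstname> $f </firstname>  // join on same firstname $f
          <lastname> $l </lastname>    // join on same lastname $l
        </author>
      </book> IN "…",
      $y > 1995
CONSTRUCT <article> $a </article>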
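
The join query on this slide pairs articles with books whose authors share both first and last name, and it keeps only books with a year after 1995. Below is a hedged nested-loop-join sketch in Python. The sample data (names, titles, years) and the function names are entirely hypothetical.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical data: names, titles and years are invented for illustration.
BIB = """<bib>
  <article>
    <author><firstname>Ann</firstname><lastname>Lee</lastname></author>
    <title>Paper A</title>
  </article>
  <article>
    <author><firstname>Bo</firstname><lastname>Kim</lastname></author>
    <title>Paper B</title>
  </article>
  <book year="1997">
    <author><firstname>Ann</firstname><lastname>Lee</lastname></author>
  </book>
  <book year="1994">
    <author><firstname>Bo</firstname><lastname>Kim</lastname></author>
  </book>
</bib>"""


def name_of(author):
    """The ($f, $l) pair bound by an <author> pattern."""
    return author.findtext("firstname"), author.findtext("lastname")


def articles_by_recent_book_authors(bib, after_year):
    """Nested-loop join on ($f, $l); one result per variable binding."""
    results = []
    for article in bib.findall("article"):
        for a_author in article.findall("author"):
            for book in bib.findall("book"):
                if int(book.get("year")) <= after_year:  # keep only $y > after_year
                    continue
                for b_author in book.findall("author"):
                    if name_of(a_author) == name_of(b_author):
                        # CONSTRUCT <article> $a </>: rewrap the article content.
                        out = ET.Element("article")
                        out.extend([copy.deepcopy(child) for child in article])
                        results.append(out)
    return results
```

A nested loop is the simplest correct join strategy. An index on (firstname, lastname) would avoid rescanning every book for each article author.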

XML-QL (continued)

XML-QL (continued)

XML-QL (continued)
John Smith

XML-QL (continued)

XML-QL (continued)
WHERE $n IN "abc.xml"
WHERE ELEMENT_AS $t, ELEMENT_AS $l CONSTRUCT $t $l

Scalar values
A Trip to the Moon NOT!
A Trip to the Moon YES

Tag variables
WHERE <$p>
        <title> $t </title>
        <year> 1995 </year>
        <$e> Smith </>
      </> IN "…",
      $e IN {author, editor}
CONSTRUCT <$p>
            <title> $t </title>
            <$e> Smith </>
          </>
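
Tag variables let `$p` and `$e` range over element names rather than over content. The Python sketch below mimics this by reading each element's `.tag`. The data and the function `smith_1995` are hypothetical, and stripping whitespace around text values is an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical data mixing element types; $p binds to each child's tag.
BIB = """<bib>
  <book><title>X</title><year>1995</year><author>Smith</author></book>
  <article><title>Y</title><year>1995</year><editor>Smith</editor></article>
  <book><title>Z</title><year>1998</year><author>Smith</author></book>
  <article><title>W</title><year>1995</year><author>Jones</author></article>
</bib>"""


def text(elem):
    """Element text with surrounding whitespace removed."""
    return (elem.text or "").strip()


def smith_1995(bib, roles=("author", "editor")):
    results = []
    for pub in bib:                                  # <$p>: any child element
        if (pub.findtext("year") or "").strip() != "1995":
            continue
        for title in pub.findall("title"):           # binds $t
            for child in pub:
                # <$e> Smith </>, with $e IN {author, editor}
                if child.tag in roles and text(child) == "Smith":
                    out = ET.Element(pub.tag)         # CONSTRUCT <$p> ...
                    ET.SubElement(out, "title").text = text(title)
                    ET.SubElement(out, child.tag).text = "Smith"
                    results.append(out)
    return results
```

The output keeps each input element's own tag (`book` or `article`) and role (`author` or `editor`). A query with fixed tag names could not do this.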

Transforming data

Transforming data (cont’d)
WHERE $fn $ln $t IN "…"
CONSTRUCT $fn $ln $t

Integrating data from different sources
WHERE <person>
        <name></> ELEMENT_AS $n
        <ssn> $ssn </>
      </> IN "…",
      <taxpayer>
        <ssn> $ssn </>
        <income></> ELEMENT_AS $i
      </> IN "…"
CONSTRUCT <result> $n $i </>
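
Integrating the two sources amounts to a join on `$ssn`, with `ELEMENT_AS` binding whole elements (tags included) rather than their contents. Below is a minimal hash-join sketch in Python. It assumes the element names `person`, `name`, `ssn`, `taxpayer` and `income`. The records and the function name `integrate` are invented for illustration.

```python
import copy
import xml.etree.ElementTree as ET

# Two hypothetical sources; names, SSNs and incomes are made up.
PEOPLE = """<people>
  <person><name>Ann Lee</name><ssn>111</ssn></person>
  <person><name>Bo Kim</name><ssn>222</ssn></person>
</people>"""

TAXPAYERS = """<taxpayers>
  <taxpayer><ssn>222</ssn><income>50000</income></taxpayer>
  <taxpayer><ssn>333</ssn><income>70000</income></taxpayer>
</taxpayers>"""


def integrate(people, taxpayers):
    """Hash join on $ssn, producing <result> $n $i </> for each match."""
    by_ssn = {}
    for tp in taxpayers.findall("taxpayer"):
        by_ssn.setdefault(tp.findtext("ssn"), []).append(tp)
    results = []
    for person in people.findall("person"):
        for tp in by_ssn.get(person.findtext("ssn"), []):
            result = ET.Element("result")
            # ELEMENT_AS binds the whole element, tags included.
            result.append(copy.deepcopy(person.find("name")))  # $n
            result.append(copy.deepcopy(tp.find("income")))    # $i
            results.append(result)
    return results
```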

Query blocks
WHERE $t 1995 CONTENT_AS $p IN "…"
CONSTRUCT $t
  { WHERE $e = "journal-paper", $m IN $p CONSTRUCT $m }
  { WHERE $e = "book", $q IN $p CONSTRUCT $q }

WSQ

Web-supported queries (Goldman and Widom, SIGMOD 2000)
WebPages(SearchExp, T1, T2, …, Tn, URL, Rank, Date)
SELECT NAME, COUNT
FROM STATES, WEBCOUNT
WHERE NAME = T1
ORDER BY COUNT DESC
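
In WSQ, `WebCount` is a virtual table whose rows come from sending search expressions to a Web search engine. The sketch below only imitates that with SQLite. A hypothetical stub, `hypothetical_web_count`, returns made-up hit counts, which are materialized into a `WebCount(T1, Count)` table before the slide's query runs. The two-column schema, the stub's numbers and the sample state names are all assumptions.

```python
import sqlite3


def hypothetical_web_count(term):
    """Stand-in for a search engine hit count; the numbers are invented."""
    return {"Michigan": 300, "Ohio": 200, "Utah": 100}.get(term, 0)


def states_by_web_count(states):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE States (Name TEXT)")
    conn.executemany("INSERT INTO States VALUES (?)", [(s,) for s in states])
    # Materialize the WebCount rows the query needs: one per state name.
    conn.execute("CREATE TABLE WebCount (T1 TEXT, Count INTEGER)")
    conn.executemany(
        "INSERT INTO WebCount VALUES (?, ?)",
        [(s, hypothetical_web_count(s)) for s in states],
    )
    # The query from the slide: join States with WebCount on the search term.
    rows = conn.execute(
        "SELECT Name, Count FROM States, WebCount "
        "WHERE Name = T1 ORDER BY Count DESC"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(states_by_web_count(["Utah", "Michigan", "Ohio"]))
```

A real WSQ system issues the searches on demand rather than materializing the whole table up front. Materializing it here is just the simplest way to show the join.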

XHTML

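Simple example
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <title>Virtual Library</title>
  </head>
  <body>
    <p>Moved to <a href="http://vlib.org/">vlib.org</a>.</p>
  </body>
</html>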

SI 760 Language and information (Fall 2000)

SI 760 (1)
Classes 1-3: Introduction to the course and linguistic background. The study of language. Computational Linguistics and Psycholinguistics.
Classes 4-5: Elementary probability and statistics. Describing data. Measures of central tendency. The z score. Hypothesis testing.
Classes 6-8: Information theory. Entropy, joint entropy, conditional entropy. Relative entropy and mutual information. Chain rules.
Classes 9-10: Data compression and coding. Entropy rate. Language modeling. Examples of codes. Optimal codes. Huffman codes. Arithmetic coding. The entropy of English.

SI 760 (2)
Classes: Clustering. Cluster analysis. Clustering of terms according to semantic similarity. Distributional clustering.
Classes: Concordancing and collocations. Concordances. Collocations. Syntactic criteria for collocability.
Classes: Literary detective work. The statistical analysis of writing style. Decipherment and translation.
Classes: Information extraction. Message understanding. Trainable methods.

SI 760 (3)
Classes: Word sense disambiguation and lexical acquisition. Supervised disambiguation. Unsupervised disambiguation. Attachment ambiguity. Computational lexicography.
Classes: Part-of-speech tagging. Statistical taggers. Transformation-based learning of tags. Maximum entropy models. Weighted finite-state transducers.
Classes: Question answering. Semantic representation. Predictive annotation.

SI 760 (4)
Classes: Text summarization. Single-document summarization. Multi-document summarization. Language models. Maximal Marginal Relevance. Cross-document structure theory. Trainable methods. Text categorization.
Classes (30): Other topics. Text alignment. Word alignment. Statistical machine translation. Discourse segmentation. Text categorization. Maximum entropy modeling.

SI 760 (5)
– Manning and Schuetze. Foundations of Statistical Natural Language Processing. MIT Press.
– Jurafsky and Martin. Speech and Language Processing. Prentice-Hall.
– Cover & Thomas. Elements of Information Theory. John Wiley and Sons.
– Baeza-Yates and Ribeiro-Neto. Modern Information Retrieval. Addison-Wesley.
– Oakes. Statistics for Corpus Linguistics. Edinburgh University Press, 1998.

Course URL

Readings for next time
Web-based readings
– Asilomar report:
– White paper on XML: