Information Retrieval and Web Search Lecture 1. Course overview Instructor: Rada Mihalcea Class web page:


1 Information Retrieval and Web Search Lecture 1. Course overview Instructor: Rada Mihalcea Class web page: http://www.cs.unt.edu/~rada/CSCE5300

2 Slide 1 What is this course about? Processing, indexing, retrieving … textual data. It fits in four lines, but it is much more complex and interesting than that.
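To make the three verbs on this slide concrete, here is a minimal sketch in Python (students may use any language; Python is used here only for illustration). The function names `tokenize`, `build_index`, and `retrieve`, and the three sample documents, are assumptions made up for this example and do not come from the course materials. It shows one simple way to process text into tokens, index it, and retrieve documents that contain every query term.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Processing: lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Indexing: map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def retrieve(index, query):
    """Retrieving: return the ids of documents that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    # Copy the first posting set so the index itself is never modified.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Hypothetical toy collection, invented for this sketch.
docs = {
    1: "Information retrieval and web search",
    2: "Web crawlers collect pages for search engines",
    3: "Text classification and topic tracking",
}
index = build_index(docs)
hits = retrieve(index, "web search")
```

With this toy collection, `hits` contains documents 1 and 2, the only ones that mention both "web" and "search". Real systems add much more on top of this, which is what the rest of the course covers.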

3 Slide 2 Need for IR With the growth of the WWW, more than 3 billion documents are indexed on Google. Various needs for information: –Search for documents that fall within a given topic –Search for a specific piece of information –Search for an answer to a question –Search for information in a different language

4 Slide 3 Some definitions of Information Retrieval (IR) Salton (1989): “Information-retrieval systems process files of records and requests for information, and identify and retrieve from the files certain records in response to the information requests. The retrieval of particular records depends on the similarity between the records and the queries, which in turn is measured by comparing the values of certain attributes attached to records and information requests.” Kowalski (1997): “An Information Retrieval System is a system that is capable of storage, retrieval, and maintenance of information. Information in this context can be composed of text (including numeric and date data), images, audio, video, and other multi-media objects.”
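Salton's definition says retrieval depends on a similarity between records and queries, computed by comparing attribute values. As a rough illustration only, the sketch below takes word counts as the "attributes" and cosine similarity as the comparison; both choices are assumptions for this example, not something the definition prescribes. The helper names (`term_counts`, `cosine`, `rank`) and the sample records are hypothetical.

```python
import math
import re
from collections import Counter

def term_counts(text):
    """Represent a record or query by the counts of its lowercase terms."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank(records, query):
    """Return (score, record_id) pairs, most similar first; ties broken by id."""
    q = term_counts(query)
    scored = [(cosine(term_counts(text), q), rid) for rid, text in records.items()]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))

# Hypothetical records, invented for this sketch.
records = {
    "a": "library catalog search by title",
    "b": "image search by color and shape",
    "c": "music retrieval",
}
ranking = rank(records, "image color")
```

Here record "b" ranks first because it shares two terms with the query, while "a" and "c" share none and score 0.0. Other attribute choices and similarity measures fit the same definition.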

5 Slide 4 Examples of IR systems Conventional (library catalog): search by keyword, title, author, etc. E.g., you are probably familiar with www.library.unt.edu Text-based (Lexis-Nexis, Google, FAST): search by keywords; limited search using queries in natural language. Multimedia (QBIC, WebSeek, SaFe): search by visual appearance (shapes, colors, …). Question answering systems (AskJeeves, Answerbus): search in (restricted) natural language. Other: cross-language information retrieval, music retrieval

8 Slide 7 IR systems on the Web Search for Web pages: http://www.google.com Search for images: http://images.google.com Search for image content: http://wang.ist.psu.edu/IMAGE/ Search for answers to questions: http://www.askjeeves.com Search for music?

9 Slide 8 Course information Instructor: Rada Mihalcea Contact info: NTRP 228, 940-369-7630, rada@cs.unt.edu Teaching assistant: TBA Class meets TTh, 2:00-3:20pm Office hours –T, 4:00-5:30pm –Any time electronically –For grading and programming problems, first try to get in touch with the TA.

10 Slide 9 Course resources Textbook: –Modern Information Retrieval, Ricardo Baeza-Yates and Berthier Ribeiro-Neto Recommended: –Readings in Information Retrieval, K. Sparck Jones and P. Willett –See the class website for pointers to places to buy them for less Papers from conferences and journals will be assigned throughout the course. Whenever possible, a copy of the paper will be placed on the class website.

11 Slide 10 Grading Homework: 30% –Start early! Some assignments may be time-consuming –3-day late policy Midterm I: 15% Midterm II: 15% Project: 30% Class participation: 10% Good news! No final: the final is replaced by the project

12 Slide 11 Programming language Students are free to choose the programming language they want to work with. However: –I recommend working with Perl –We’ll have a short Perl tutorial in the next 1-2 lectures –Why Perl? It makes life much easier for text processing problems and for Web-based applications. Information Retrieval involves a lot of text processing, and often involves Web access –Code reusability Regardless of the language, code MUST compile and run on the CSP Linux machines. –No credit will be given for programs that do not compile!

13 Slide 12 Tentative schedule
–Course Overview
–Short Perl Tutorial
–Introduction to IR models and methods
–Text analysis / document preprocessing
–Vectorial model
–Boolean model
–Probabilistic model; other IR models
–IR collections
–IR evaluation
–Query operations
–Query languages
–Natural Language IR (Named Entity recognition)

14 Slide 13 Tentative schedule
–Natural Language IR (Semantic ambiguity, conceptual indexing)
–Natural Language IR (Phrase indexing, other)
–Question Answering: TREC / Web
–Information extraction
–Text classification / Topic tracking and detection
–Web IR: crawlers
–Web IR: search engines
–Web IR: link based / content based
–Web IR: evaluation metrics / Midterm review
–Special topics: Cross Language IR
–Special topics
–Final IR overview, future directions
–…. Midterm I, Midterm II, Project presentations

