CS523 INFORMATION RETRIEVAL COURSE INTRODUCTION YÜCEL SAYGIN SABANCI UNIVERSITY.

1 CS523 INFORMATION RETRIEVAL COURSE INTRODUCTION YÜCEL SAYGIN SABANCI UNIVERSITY

2 Contact Info ysaygin@sabanciuniv.edu http://people.sabanciuniv.edu/~ysaygin Tel: 9576 No specific office hours. You can drop by anytime you like. Email or call me first to make sure I am at the office.

3 Course Info Reference Book: Introduction to Information Retrieval, Authors: Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze Publisher: Cambridge University Press. 2008.

4 Course Info Grading: Homework: 10%; Project: 40%; Paper presentation: 20%; Term paper: 20%; Attendance during paper presentations: 10%

5 Topics that will be covered Document Retrieval Techniques Information Retrieval on the Web Data Mining for Information Retrieval

6 Aim of the course Knowledge: To introduce information retrieval techniques Skills: paper reading and presentation research and/or project work

7 A Rough Schedule October, November: Lectures on various information retrieval techniques Remaining weeks: Paper and research project presentations

8 What I will do Give the basics of information retrieval Supervise projects Give directions and advice on the projects Coordinate the presentations

9 What I expect you to do Understand the basic concepts of Information Retrieval. Choose a specific area and two related papers on the same topic for presentation in class. Attendance is required for paper presentations, and you will lose 2% of your overall grade for each presentation you miss. Write a term paper on the two papers presented. Do a project and write a final report describing what you learned or achieved in the scope of the project.

10 Sources TREC Conference http://trec.nist.gov/ SIGIR Conference http://www.sigir.org/ WWW Conference http://www2004.org/ ACM TOIS Journal SIGMOD, VLDB, ICDE Conferences (database perspective) SIGKDD, ICDM Conferences (data mining perspective)

11 Tools SMART IR (Cornell Univ.) http://www.cs.cornell.edu/Info/Projects/NLP/ Glimpse from Univ. Arizona http://webglimpse.net/ Google Altavista Yahoo

12 Information Retrieval Refers to the retrieval of any type of information such as Structured data (e.g. relational database) Text (We will focus on this) Video Image, sound DNA

13 Document Retrieval (Diagram labels: User Query, Static Document Collection, Ranked Result.) The document collection is indexed in advance. The user query is ad hoc. Results are ranked by their similarity to the user query.
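
The slide does not say which similarity measure is used for ranking. As a rough illustration only, the sketch below assumes TF-IDF weights and cosine similarity, one common choice; the function names (build_index, rank) and the whitespace tokenizer are hypothetical and not from the course material.

```python
import math
from collections import Counter

def tokenize(text):
    # Naive tokenizer (an assumption): lowercase and split on whitespace.
    return text.lower().split()

def build_index(docs):
    """Precompute TF-IDF vectors for a static collection (the indexing step)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log(n / count) + 1.0 for t, count in df.items()}
    vectors = [{t: c * idf[t] for t, c in Counter(toks).items()}
               for toks in tokenized]
    return idf, vectors

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank(query, idf, vectors):
    """Return ids of matching documents, most similar to the query first."""
    q = {t: c * idf.get(t, 0.0) for t, c in Counter(tokenize(query)).items()}
    scored = [(cosine(q, v), i) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for score, i in scored if score > 0.0]

docs = [
    "information retrieval on the web",
    "data mining for retrieval",
    "cooking pasta at home",
]
idf, vectors = build_index(docs)
print(rank("web retrieval", idf, vectors))
```

The index is built once, while each ad hoc query is only vectorized and compared at query time, which mirrors the split between the pre-indexed collection and the ad hoc query on the slide.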

14 Document Routing User profiles are set in advance Incoming documents are directed to relevant users Useful for redirecting corporate emails to relevant departments (sales, marketing, support etc)
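
A minimal sketch of the routing idea, assuming each user profile is just a set of keywords and a document is routed to every profile it shares at least one word with. The profile contents, the route function, and the min_overlap parameter are all hypothetical; real routing systems typically use weighted profiles or trained classifiers.

```python
# Hypothetical department profiles, fixed in advance.
PROFILES = {
    "sales": {"price", "quote", "order", "discount"},
    "marketing": {"campaign", "brand", "advert"},
    "support": {"error", "crash", "help", "broken"},
}

def route(document, profiles, min_overlap=1):
    """Return the names of all profiles that match the incoming document."""
    words = set(document.lower().split())
    overlap = {name: len(words & keywords) for name, keywords in profiles.items()}
    return sorted(name for name, n in overlap.items() if n >= min_overlap)

print(route("please send a quote and discount for our order", PROFILES))
print(route("the app shows an error after the price update", PROFILES))
```

Here the roles are reversed compared with document retrieval: the profiles play the part of standing queries, and the documents arrive one at a time.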

15 Performance Metrics for IR Precision Recall In practice it is hard to achieve both high precision and high recall at once, since improving one tends to lower the other. (Diagram labels: Whole Document Space, Relevant Documents, Retrieved Documents, Relevant and Retrieved Documents.)
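
Using the sets named on the slide, precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that are retrieved. The sketch below computes both; the function name and the example document ids are made up for illustration.

```python
def precision_recall(retrieved, relevant):
    """Compute (precision, recall) from two collections of document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant and retrieved documents
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

relevant = {1, 2, 3, 4}

# A cautious system returns few documents: decent precision, limited recall.
print(precision_recall({1, 2, 5}, relevant))

# Returning the whole 10-document space gives perfect recall but poor precision.
print(precision_recall(range(1, 11), relevant))
```

The second call illustrates the trade-off: retrieving everything guarantees that no relevant document is missed, at the cost of precision.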

16 First Reading for Tomorrow The Anatomy of a Large-Scale Hypertextual Web Search Engine (WWW Conference 1998) paper by Sergey Brin and Lawrence Page www-db.stanford.edu/~backrub/google.html

17 Web Information Retrieval Two possible ways: Use the web structure, starting from a location like Yahoo where things are categorized Use search engines

18 Web Information Retrieval Challenges Scale: Hundreds of millions of queries per day; Web grows, continuous crawling is needed; Obstacles due to OS and disk seek time. Google handles large data sets by indexing and compression Search quality is important Completeness of the index is important But ranking is also of utmost importance due to the size of the Web
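
The slide only says that indexing and compression are used, without details. As a toy illustration of both ideas, and not a description of Google's actual data structures, the sketch below builds an inverted index and stores each postings list as gaps between consecutive document ids, a textbook technique that yields small numbers which compress well. All function names are hypothetical.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the sorted list of ids of documents containing it."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        for term in set(text.lower().split()):
            index[term].append(doc_id)  # doc ids arrive in increasing order
    return dict(index)

def to_gaps(postings):
    """Replace each doc id (after the first) by its distance to the previous one."""
    return [cur - prev for prev, cur in zip([0] + postings[:-1], postings)]

def from_gaps(gaps):
    """Invert to_gaps by accumulating a running sum."""
    postings, total = [], 0
    for gap in gaps:
        total += gap
        postings.append(total)
    return postings

docs = ["web search engine", "web crawling", "search quality", "web ranking"]
index = build_inverted_index(docs)
print(index["web"])
print(to_gaps([3, 7, 8, 20]))
```

A real system would go on to encode the gaps with a variable-length code; that step is omitted here.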

19 Web Information Retrieval Ranking (of Google) The idea is to give importance to pages that have a lot of back links Similar to the notion of citations in academia A link graph of the web was formed and maintained (518 million links in 1998 for the prototype)
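
The back-link idea can be sketched as an iterative score computation over the link graph, in the spirit of PageRank from the Brin and Page paper. This is a simplified sketch rather than Google's implementation: the damping value 0.85, the fixed iteration count, the even redistribution from pages without outgoing links, and the tiny example graph are all assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small baseline score.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = set(links.get(page, []))
            if targets:
                # A page splits its score evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Page without outgoing links: spread its score over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical graph: a and b both link to c, and c links back to a.
links = {"a": ["c"], "b": ["c"], "c": ["a"]}
scores = pagerank(links)
print(sorted(scores, key=scores.get, reverse=True))
```

In this example c has the most back links and ends up ranked highest, while b, which nobody links to, ends up lowest. As with academic citations, a link from a highly ranked page counts for more than one from an obscure page.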

20 Web Mining (focused) Crawling and Indexing Topic Directories Clustering and Classification Hyperlink Analysis Personalization (profiles, preferences)

