CSCI 572: Information Retrieval and Search Engines: Summer 2010


1 CSCI 572: Information Retrieval and Search Engines: Summer 2010
Prof. Chris A. Mattmann

2 The Class
Will give you a complete treatment of the area of search engines and information retrieval
  The fundamental building blocks of the web and search engines
  The Search Engine Architecture proposed by Brin/Page
  Understanding algorithms for ranking pages
  Understanding technologies for characterizing, downloading, parsing, indexing, searching and disseminating web content
  Advanced topics in search engines such as BigData and distributed computation
Will equip you with the necessary skills to design complex, real-world search engines

3 General class information
Lecture, but…
  You can participate
  You should participate
  You will participate, that is, if you want to do well :)
Breakdown of points
  20% participation
  40% research paper presentation
  40% course project

4 General class information
Syllabus/website:
  Visit it often, as the schedule may change!
  This is where all of your course project info and presentation info will be posted
  This site will point you to required reading (research papers), and to lectures that you can download before class

5 What we’ll cover
Theory
  Understanding of basic information retrieval
  Search engine querying
  Search engine ranking
  Architecture of search engines and technologies
  Design Patterns
Practice
  Modern search engine technologies from Apache

6 Course Presentation
Each week, we’ll read a few research papers on search engines
For the first part of the course (5 weeks), I’ll lecture on the general topics that the research papers cover
  The search engine architecture: fetching, parsing, indexing, querying, distributed computation, etc.
For the last part of the course (~5 weeks), each one of you will present on one of the research papers we covered in the first 5 weeks

7 Course Presentation
What I’m looking for (~20 minutes of presentation, with ~5 minutes of questions at the end)
  You understood the paper
  Discussion of related work and background
  Discussion of why I should care about the topic
    And, more importantly, why your fellow classmates should care
  Relation of your paper to the lecture slides I gave on the topic
  Simple summarization and description of the algorithm and/or technology introduced in the paper
  What were the results/contributions/conclusions of the paper
  Your evaluation of Pros of the paper
  Your evaluation of Cons of the paper

8 Course Presentation
What I’m NOT looking for
  Plagiarism
  Repetition
  Cutting/Pasting out of the paper
  Regurgitation
  You to follow the EXACT set of bullets that I gave on the prior slide
You should be looking to be innovative – show the class and me that you really understood what was in the paper
Treat it like a conference presentation

9 Course Project
You will get to leverage one or a combination of several Apache software technologies
  Nutch, Tika, Lucene, Solr, Hadoop, HBase, Hive, Cassandra, etc.
You will make a significant contribution to one or more of the above communities
Deliverables
  A 2 page project proposal
  A 2 page mid-term project report
  Source code and final demonstration to me at end of class

10 Course Project Deliverables
Your project proposal should include:
  Demonstration that you’ve researched your particular idea with pointers to issue trackers and mailing lists
  Objectives section
  Approach section
  Identification of deliverables section
  Timeline/Schedule
Your mid-term report should include:
  Current status
  Blockers to completion
  Planned mitigation to blockers

11 Me
Graduated with my Ph.D. in Computer Science from USC in 2007
  Advisor: Dr. Nenad Medvidovic
Was a student at USC from
  B.S., Computer Science 2001
  M.S., Computer Science 2003
My research interests
  The intersection of software architectures, and large-scale data dissemination
  Software connector selection
  Bayesian decision theory
  Reinforcement learning
  Search Engines

12 So…today
Quick lecture on characterizing the web
Read the papers linked from the syllabus
Be ready for next Tuesday as this is a 10-week course and we are going to dive in

