Course grading
- Project: 75%, broken into several incremental deliverables
- Paper appraisal/evaluation/project tool evaluation in early May: 25%

Paper appraisal
- Read and critically appraise the system/tool your project uses and/or a recent research paper (e.g. from the SIGIR or WWW conferences) that is relevant to your project.
- Propose what you plan to do no later than April 17.
- By April 24, obtain instructor confirmation of the content you will read.
- By May 12, turn in a slide report/web site:
  - Summarize and include relevant content.
  - Compare it to other work in the area.
  - Discuss interesting issues/research directions that arise.
- Get ready for a short presentation.

Project
- Opportunity to devote time to a substantial research project
  - Typically a substantive programming project
  - Or an in-depth analysis/comparison of methods and evaluation
- Work in teams of 1 or 2 students
- The topic can be selected to match your interests
- Meet with me to discuss options

Project
- Due April 14: Project group and project idea
  - Decision on project group
  - Brief description of project area/topic
  - We’ll provide initial feedback
- Due April 21: Project proposal
  - Should break project execution into three phases: Block 1, Block 2 and Block 3
  - Each phase should have a tangible deliverable
- Block 1 delivery due May 5
- Block 2 delivery due May 19
- Block 3 (final project report) due June 9
- Week of June 7: Student project presentations

Project - breakdown
- 10% for the initial project proposal
  - Scope, timeline, cleanliness of measurements
  - The writeup should state the problem being solved, related prior work, the approach you propose and what you will measure.
- 10% each for the Block 1 and Block 2 deliveries
- 30% for the final delivery of Block 3
  - Must turn in a writeup
  - Components measured will be overall scope, writeup, code quality, fit/finish.
  - The writeup should be ~8 pages

Project Presentations
- Project presentations in class (about 10 minutes per group): a great opportunity to get feedback
- April 23/26: Students present project plans
- Week of June 7: Final project presentations

What is next?
- Project examples
- Examples of tools
  - WordNet
  - Google API
  - Amazon Web Services / Alexa
  - Lucene
  - Stanford WebBase

Project examples
- Leveraging existing theory/data/software is encouraged, e.g.:
  - Web services
  - WordNet
  - Algorithms and concepts from research papers
  - Etc.
- Most projects: compare the performance of several methods, or test a new idea against some baseline
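One way to compare a new ranking method against a baseline is to score both rankings on the same judged queries. The sketch below is only an illustration: the document ids and relevance judgments are invented, and precision at k and average precision are two common measures, not ones the course prescribes.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k results that are judged relevant."""
    if k <= 0:
        return 0.0
    top = ranked_ids[:k]
    return sum(1 for doc_id in top if doc_id in relevant_ids) / k


def average_precision(ranked_ids, relevant_ids):
    """Mean of the precision values at each rank where a relevant document appears."""
    if not relevant_ids:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            total += hits / rank
    return total / len(relevant_ids)


# Hypothetical judgments and rankings for a single query.
relevant = {"d1", "d3"}
baseline_ranking = ["d2", "d1", "d4", "d3"]
new_ranking = ["d1", "d3", "d2", "d4"]

for name, ranking in [("baseline", baseline_ranking), ("new", new_ranking)]:
    print(name, precision_at_k(ranking, relevant, 2), average_precision(ranking, relevant))
```

In a real project these scores would be averaged over many queries before drawing any conclusion about which method is better.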

Project Ideas
- Build a search engine for UCSB technical reports. Compare and improve the ranking algorithm.
- Crawl pages on a particular subject and build a specialized database and ranking (e.g. Wikipedia).
- Classify pages based on Wikipedia or DMOZ categories.

Lucene
- Easy-to-use, efficient Java library for building and querying your own text index
- Could use it to build your own search engine, experiment with different strategies for determining document relevance, …
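To make "building and querying your own text index" concrete, here is a minimal sketch of an inverted index with a simple TF-IDF score. It is written in plain Python and does not use or imitate Lucene's API; the class name `TinyIndex`, the tokenizer and the scoring formula are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyIndex:
    """Toy inverted index: term -> {doc_id: term frequency}."""

    def __init__(self):
        self.postings = defaultdict(dict)
        self.doc_count = 0

    def add(self, doc_id, text):
        self.doc_count += 1
        for term, tf in Counter(tokenize(text)).items():
            self.postings[term][doc_id] = tf

    def search(self, query, top_k=10):
        """Rank documents by the sum of tf * idf over the query terms."""
        scores = defaultdict(float)
        for term in tokenize(query):
            docs = self.postings.get(term)
            if not docs:
                continue
            idf = math.log(1 + self.doc_count / len(docs))
            for doc_id, tf in docs.items():
                scores[doc_id] += tf * idf
        # Highest score first; ties broken by document id for stable output.
        return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


index = TinyIndex()
index.add("a", "web search engines rank web pages")
index.add("b", "cooking pasta at home")
index.add("c", "search for pasta recipes")
print(index.search("web search"))
```

Swapping the body of `search` for a different scoring function is the kind of experiment with relevance strategies the slide has in mind.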

Google API
- Web service for querying Google from your software
- ~1M queries per day
- Web search, site search, news search, blog search
- Note: within search requests you can use special commands like link, related, intitle, etc.

WordNet
- Java API available (already installed)
- Useful tool for semantic analysis
- Represents the English lexicon as a graph
  - Each node is a “synset” – a set of words with similar meanings
  - Nodes are connected by various relations such as hypernym/hyponym (X is a kind of Y), troponym, pertainym, etc.
- Could use for query reformulation, document classification, …
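The synset graph and its use for query reformulation can be sketched with a hand-made toy lexicon. Nothing below comes from the real WordNet data or its Java API: the synset ids, words and relations in `SYNSETS` are invented for illustration.

```python
# Hypothetical miniature lexicon: synset id -> member words and hypernym links.
SYNSETS = {
    "car.n": {"words": ["car", "automobile"], "hypernyms": ["vehicle.n"]},
    "bike.n": {"words": ["bicycle", "bike"], "hypernyms": ["vehicle.n"]},
    "vehicle.n": {"words": ["vehicle"], "hypernyms": []},
}


def synsets_for(word):
    """Return the ids of all synsets that contain the word."""
    return [sid for sid, synset in SYNSETS.items() if word in synset["words"]]


def is_kind_of(synset_id, ancestor_id):
    """Follow hypernym links upward to test 'X is a kind of Y'."""
    stack = list(SYNSETS[synset_id]["hypernyms"])
    seen = set()
    while stack:
        current = stack.pop()
        if current == ancestor_id:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(SYNSETS[current]["hypernyms"])
    return False


def expand_query(terms, include_hypernyms=False):
    """Add synonyms (and optionally hypernym words) for each query term."""
    expanded = []
    for term in terms:
        candidates = [term]
        for sid in synsets_for(term):
            candidates.extend(SYNSETS[sid]["words"])
            if include_hypernyms:
                for hid in SYNSETS[sid]["hypernyms"]:
                    candidates.extend(SYNSETS[hid]["words"])
        for word in candidates:
            if word not in expanded:
                expanded.append(word)
    return expanded


print(expand_query(["cheap", "car"]))
print(expand_query(["cheap", "car"], include_hypernyms=True))
```

Adding hypernym words broadens recall but can hurt precision, which is one of the trade-offs a query reformulation project could measure.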

Stanford WebBase
- Offers various relatively small web crawls (the largest is about 100 million pages) with cached pages and link structure data
- Provides code for accessing the data

Recommendation systems
- Data:
  - Ratings of 270K books from 278K users
  - Ratings of 100 jokes from 73K users

“Natural language” search
- Present an interface that invites users to type in queries in natural language
- Find a means of parsing questions from important categories into full-text queries for the engine:
  - What is
  - Why is
  - How to
- Evaluate the relevance of the answers returned for such queries
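A very rough sketch of turning the three question categories above into full-text queries follows. The regular expressions, category labels and stopword list are all assumptions made for illustration; a real project would need much broader coverage and an evaluation of the resulting answers.

```python
import re

# Hypothetical stopword list; a real system would use a larger one.
STOPWORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "for", "do", "does", "i"}

# One pattern per question category named on the slide.
PATTERNS = [
    ("definition", re.compile(r"^what\s+(?:is|are)\s+(.+)$")),
    ("reason", re.compile(r"^why\s+(?:is|are|do|does)\s+(.+)$")),
    ("procedure", re.compile(r"^how\s+(?:to|do\s+i)\s+(.+)$")),
]


def parse_question(question):
    """Return (category, keyword list) for a natural-language question."""
    text = question.strip().lower().rstrip("?!. ")
    for category, pattern in PATTERNS:
        match = pattern.match(text)
        if match:
            body = match.group(1)
            break
    else:
        # No question pattern matched: treat the input as a plain keyword query.
        category, body = "keyword", text
    terms = [t for t in re.findall(r"[a-z0-9]+", body) if t not in STOPWORDS]
    return category, terms


for q in ["What is an inverted index?", "Why is the sky blue?", "How to build a web crawler"]:
    print(parse_question(q))
```

The category could then steer the engine, for example by boosting pages that look like definitions for "What is" questions.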

Text spamming detection

Detecting index spamming
- Example: lots of “invisible” text in the background color
- There is less of that now, as search engines check for it as a sign of spam
- Questions:
  - Can one use term weighting strategies to make an IR system more resistant to spam?
  - Can one detect and filter pages attempting index spamming? E.g. with a language model run over pages
  - [From the other direction, are there good ways to hide spam so it can’t be filtered?]
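As a starting point for the two questions above, the sketch below shows (1) a saturating term-frequency weight, tf / (tf + k), similar in spirit to the term-frequency component of BM25, which limits how much a repeated keyword can gain, and (2) a crude stuffing check based on the share of the page taken up by its most frequent term. This is not a language model; the constant `k`, the 0.5 threshold and the example pages are assumptions for illustration only.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def raw_tf(term, tokens):
    """Plain term count: grows without bound as a term is repeated."""
    return tokens.count(term)


def saturated_tf(term, tokens, k=1.5):
    """Saturating weight tf / (tf + k): always below 1, however often the term repeats."""
    tf = tokens.count(term)
    return tf / (tf + k)


def top_term_share(tokens):
    """Fraction of all tokens accounted for by the single most frequent term."""
    if not tokens:
        return 0.0
    return Counter(tokens).most_common(1)[0][1] / len(tokens)


def looks_stuffed(text, threshold=0.5):
    """Flag a page whose most frequent term dominates the text (assumed threshold)."""
    return top_term_share(tokenize(text)) >= threshold


# Hypothetical pages: one ordinary, one stuffed with a repeated keyword.
honest_page = "cheap flights to paris and rome"
spam_page = "cheap " * 50 + "flights"

for page in (honest_page, spam_page):
    tokens = tokenize(page)
    print(raw_tf("cheap", tokens), round(saturated_tf("cheap", tokens), 3), looks_stuffed(page))
```

With raw counts the stuffed page scores 50 times higher for "cheap" than the ordinary page, while the saturated weight keeps the gap to a small constant factor.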