Aruna Balasubramanian, Yun Zhou, W Bruce Croft, Brian N Levine and Arun Venkataramani Department of Computer Science, University of Massachusetts, Amherst.

Presentation transcript:

Web Search From a Bus
Aruna Balasubramanian, Yun Zhou, W Bruce Croft, Brian N Levine and Arun Venkataramani
Department of Computer Science, University of Massachusetts, Amherst

Why web search from a bus?
- Open access points are commonly available
- Intermittent Internet connectivity from vehicles is possible
  - no subscription cost
  - useful when no other connectivity is available
- Web search is the 2nd most common web activity (survey by pewinternet.org)

Connectivity characteristics of testbeds
Goal: build web search that works despite frequent disconnections and short connection durations

Web search process
[Figure: progress messages "Retrieving web…", "Retrieving images…", "Retrieving…"]

Adapting to vehicular network

Why challenging?
- Interactive
  - several exchanges between user and search engine are needed
- Results are imprecise
  - a response may not be relevant
  - relevance is difficult to measure
Thedu:
- Proxy architecture: sustain interaction
- IR contribution: increase usefulness of returned responses

Thedu proxy
- Sits between the vehicle and the search engine
- When the proxy receives a query request from a vehicle, it
  - retrieves URLs and snippets
  - prefetches URL contents, including images
  - stores responses and maintains state
- When the vehicle connects to the proxy, it
  - downloads pending responses
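The proxy behaviour described on this slide can be sketched roughly as follows. This is a minimal illustration, not the Thedu implementation: the class name `ProxySketch`, the `Response` record, and the injected `search` and `fetch` callables are all hypothetical, and image prefetching and prioritization are left out.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Response:
    query: str
    url: str
    snippet: str
    content: bytes  # prefetched page body


@dataclass
class ProxySketch:
    # Hypothetical interfaces supplied by the caller:
    #   search(query) -> list of (url, snippet) pairs from a search engine
    #   fetch(url)    -> raw bytes of the page at url
    search: Callable[[str], List[Tuple[str, str]]]
    fetch: Callable[[str], bytes]
    # Per-vehicle state: responses waiting for the next contact.
    pending: Dict[str, List[Response]] = field(
        default_factory=lambda: defaultdict(list)
    )

    def on_query(self, vehicle_id: str, query: str) -> None:
        """Retrieve URLs and snippets, prefetch contents, store the results."""
        for url, snippet in self.search(query):
            self.pending[vehicle_id].append(
                Response(query, url, snippet, self.fetch(url))
            )

    def on_connect(self, vehicle_id: str) -> List[Response]:
        """Hand over everything pending for this vehicle and clear it."""
        return self.pending.pop(vehicle_id, [])
```

Because the proxy keeps the pending list between contacts, a vehicle that disconnects mid-query can pick up its responses at the next access point without reissuing the query.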

Client and proxy architecture
[Figure: architecture diagram]
- Client side (vehicle): user, web interface, store query, process response
- Server side (proxy): queries for vehicle, fetch URL/images, prioritize response, pending responses
- External: search engine, web site
- Links over intermittent connectivity: new queries, queries, response bundles, responses

How to prioritize?
- Search engines use relevance scores to rank responses
  - scores are not comparable across queries
- Even if a response is relevant, it may not be useful
  - the query “chants 2007” needs only one response
- Thedu
  - Normalize relevance scores: comparable across queries
  - Classify query type: to capture user intent

Query-type classification
- Query types
  - Homepage query: “cnn”, “chants 2007”
  - Non-homepage query: “Harry potter review”
- Thedu classifies using the URL, snippet and title fields
  - E.g., “chants 2007” on Google: Welcome to the home page of the ACM MobiCom workshop on Challenged Networks (CHANTS 2007). chants workshop

Homepage                                   | Non-homepage
Query terms occur in URL                   | Query is in question form
All query terms occur in title or snippet  | Top URL is Wikipedia
Less than 3 words                          | Length greater than 3 words
URL is root                                |
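One way to turn the listed features into a classifier is a simple vote count, sketched below. The slide does not say how Thedu combines the features, so the voting rule, the function name `classify_query`, and the `QUESTION_WORDS` list are assumptions made only for illustration.

```python
from urllib.parse import urlparse

# Assumed list of question words; not taken from the slides.
QUESTION_WORDS = {"who", "what", "when", "where", "why", "how", "which"}


def classify_query(query, top_url, title, snippet):
    """Return "homepage" or "non-homepage" by counting feature votes."""
    terms = query.lower().split()
    parsed = urlparse(top_url.lower())
    url_text = parsed.netloc + parsed.path
    result_text = (title + " " + snippet).lower()

    homepage_votes = sum([
        any(t in url_text for t in terms),             # query terms occur in URL
        bool(terms) and all(t in result_text for t in terms),  # all terms in title/snippet
        len(terms) < 3,                                # less than 3 words
        parsed.path in ("", "/"),                      # URL is root
    ])
    non_homepage_votes = sum([
        (bool(terms) and terms[0] in QUESTION_WORDS)
        or query.strip().endswith("?"),                # query is in question form
        "wikipedia.org" in parsed.netloc,              # top URL is Wikipedia
        len(terms) > 3,                                # longer than 3 words
    ])
    # Assumed tie-break: ties fall to non-homepage.
    return "homepage" if homepage_votes > non_homepage_votes else "non-homepage"
```

For example, `classify_query("cnn", "http://www.cnn.com/", "CNN - Breaking News", "CNN news")` collects all four homepage votes and no non-homepage votes under this rule.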

Relevance score normalization
- Modified language model framework
- D: document, Q: query, C: collection
- Normalized score
- Kullback-Leibler divergence (distance between Q and D)
  - probability of word occurring in document
  - probability of word occurring in collection
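The formula itself is not preserved in this transcript. As a rough illustration of the idea, the sketch below computes one common collection-normalized language-model score, the sum over query words of P(w|Q) · log(P(w|D) / P(w|C)). This particular form, the Dirichlet smoothing of P(w|D), the parameter `mu`, and the function name are assumptions and may differ from the score used by Thedu.

```python
import math
from collections import Counter


def normalized_score(query_terms, doc_terms, collection_counts,
                     collection_size, mu=2000.0):
    """Assumed form: sum_w P(w|Q) * log(P(w|D) / P(w|C)).

    P(w|D) is Dirichlet-smoothed with the collection model so that
    words missing from the document do not produce log(0).
    """
    q_counts = Counter(query_terms)
    d_counts = Counter(doc_terms)
    d_len = len(doc_terms)
    score = 0.0
    for word, q_count in q_counts.items():
        p_wq = q_count / len(query_terms)          # probability of word in query
        p_wc = collection_counts.get(word, 0) / collection_size  # in collection
        if p_wc == 0:
            continue  # word unseen in the collection: skipped in this sketch
        p_wd = (d_counts[word] + mu * p_wc) / (d_len + mu)  # in document
        score += p_wq * math.log(p_wd / p_wc)
    return score
```

Dividing by the collection probability is what makes scores for different queries roughly comparable: a document scores above zero only when it uses the query words more often than the collection as a whole does.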

Thedu protocol
1. Sort responses in order of normalized score
2. For each response r for query q:
   2a. Update [symbol missing in transcript]
   2b. If q is a homepage query and [condition missing in transcript], do not send
   2c. Else send the response to the vehicle
Definitions (symbols missing in transcript):
- expected relevance of all responses sent for a query q
- probability that r is relevant for q
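The symbols and the stopping condition in this protocol are missing from the transcript, so the following is only one possible reading. It assumes that the expected relevance for a query is the running sum of the relevance probabilities of the responses already sent, and that a homepage query stops receiving responses once that sum reaches a hypothetical `threshold`. The function name and the tuple layout are likewise illustrative.

```python
def select_responses(responses, is_homepage, threshold=0.9):
    """Pick which responses to send to the vehicle for one query.

    responses: list of (response_id, normalized_score, p_relevant) tuples.
    Returns the ids to send, in sending order.
    """
    sent = []
    expected_relevance = 0.0  # assumed: sum of p_relevant over sent responses
    # Step 1: highest normalized score first.
    ordered = sorted(responses, key=lambda r: r[1], reverse=True)
    for response_id, _score, p_relevant in ordered:
        # Step 2b (assumed condition): a homepage query needs few responses,
        # so stop once the responses already sent are likely to suffice.
        # The sum only grows when something is sent, so stopping here is
        # equivalent to skipping every remaining response.
        if is_homepage and expected_relevance >= threshold:
            break
        # Step 2c: send, then step 2a: update the expected relevance.
        sent.append(response_id)
        expected_relevance += p_relevant
    return sent
```

With responses `[("a", 0.9, 0.8), ("b", 0.5, 0.3), ("c", 0.1, 0.1)]`, a homepage query sends `a` and `b` and then stops, while a non-homepage query sends all three.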

Evaluation goals
- What is the delay in getting search results?
- How many results were relevant to the user?

Evaluation tools
- DieselNet
- Indri search engine
- TREC (Text Retrieval Conference)
  - predefined web data collection (10 GB)
  - predefined set of queries (100 homepage + 50 content)
  - relevance judgments (which documents are relevant for a query)
Thedu’s query-type classifier accuracy: 88%

Deployment on DieselNet

Thedu vs proxy-less server
- Thedu (March 26 to March 30)
  - bundles responses
  - returns responses in prioritized order
  - maintains state
- Proxy-less server (April 30 to May 5)
  - bundles responses
  - returns responses in FIFO order
  - no state

Connectivity duration
- Mean connection duration: 35 sec
- Mean disconnection duration: 8 min
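To put these two means in perspective, a back-of-the-envelope estimate of the fraction of time a bus is connected can be computed as below. It assumes connection and disconnection periods simply alternate with the stated means, which the slide does not claim; the function name is illustrative.

```python
def connected_fraction(mean_connected_s, mean_disconnected_s):
    """Long-run fraction of time connected, assuming alternating periods."""
    return mean_connected_s / (mean_connected_s + mean_disconnected_s)


# 35 s connected, 8 min (480 s) disconnected.
fraction = connected_fraction(35, 8 * 60)
print(f"{fraction:.1%}")  # about 6.8% of the time
```

Under that assumption a bus is online for roughly one second in fifteen, which is why the proxy has to do the fetching while the vehicle is out of range.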

Thedu vs proxy-less architecture
[Figure panels: Thedu; Stateless proxy]

Delay until first relevant response

Extending Thedu
- Can we use connectivity among buses to improve throughput?
- Are we limited to academic search engines?
  - convince commercial search providers to provide relevance scores
  - or, assign scores based on ranking
- Are users really happy with search results and delay?
traces.cs.umass.edu

Simulation Results

Inter-meeting times