Aruna Balasubramanian, Yun Zhou, W Bruce Croft, Brian N Levine and Arun Venkataramani Department of Computer Science, University of Massachusetts, Amherst.

1 Web Search From a Bus
Aruna Balasubramanian, Yun Zhou, W Bruce Croft, Brian N Levine and Arun Venkataramani
Department of Computer Science, University of Massachusetts, Amherst

2 Why web search from a bus?
- Open access points are commonly available
- Intermittent Internet connectivity from vehicles is possible
  - no subscription cost
  - useful when no other connectivity is available
- Web search is the 2nd most common web activity (survey by pewinternet.org)

3 Connectivity characteristics of testbeds
Goal: Build web search that works despite frequent disconnections and short connection durations

4 Web search process
[Figure: browser status messages "Retrieving web….", "Retrieving images…", "Retrieving…."]

5 Adapting to vehicular network

6 Why challenging?
- Interactive
  - several exchanges between user and search engine are needed
- Results are imprecise
  - a response may not be relevant
  - relevance is difficult to measure
Thedu:
- Proxy architecture: sustain interaction
- IR contribution: increase usefulness of returned responses

7 Thedu proxy
- Sits between the vehicle and the search engine
- When the proxy receives a query request from a vehicle, it
  - retrieves URLs and snippets
  - prefetches URL contents, including images
  - stores responses and maintains state
- When the vehicle connects to the proxy, it
  - downloads pending responses
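
The proxy behaviour on this slide can be sketched roughly as follows. This is a minimal illustration, not Thedu's actual code: the `Proxy` and `Response` classes, the `search` and `fetch` hooks, and the vehicle identifier are all hypothetical names introduced here.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Response:
    query: str
    url: str
    snippet: str
    content: bytes  # prefetched page contents (images would be bundled similarly)


@dataclass
class Proxy:
    # Hypothetical hooks: `search` returns (url, snippet) pairs,
    # `fetch` returns the raw bytes of a page.
    search: Callable[[str], List[Tuple[str, str]]]
    fetch: Callable[[str], bytes]
    # Per-vehicle state: responses waiting for the next connection.
    pending: Dict[str, List[Response]] = field(
        default_factory=lambda: defaultdict(list)
    )

    def on_query(self, vehicle_id: str, query: str) -> None:
        """Do the search and prefetching; the vehicle may already be offline."""
        for url, snippet in self.search(query):
            self.pending[vehicle_id].append(
                Response(query, url, snippet, self.fetch(url))
            )

    def on_connect(self, vehicle_id: str) -> List[Response]:
        """Hand over everything stored for this vehicle and clear its queue."""
        return self.pending.pop(vehicle_id, [])
```

The point of the design is that all slow work (search, page and image fetches) happens on the wired side, so a short connection window is spent only on downloading stored responses.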

8 Client and proxy architecture
[Diagram]
- Client side (vehicle): User, Web interface, Store query, Process response
- Server side (proxy): Queries for vehicle, Fetch URL/images, Prioritize response, Pending responses
- External: Search engine, Web site
- Edge labels: Intermittent connectivity, New queries, Queries, Response bundles, Responses

9 How to prioritize?
- Search engines use relevance scores to rank responses
  - scores are not comparable across queries
- Even if a response is relevant, it may not be useful
  - the query “chants 2007” needs only one response: http://www.netlab.hut.fi/chants-2007/
- Thedu
  - normalizes relevance scores, so they are comparable across queries
  - classifies query type, to capture user intent

10 Query-type classification
- Query types
  - Homepage query: “cnn”, “chants 2007”
  - Non-homepage query: “Harry potter review”
- Thedu classifies using the URL, snippet and title fields
  - E.g., “chants 2007” on Google:
    http://www.netlab.hut.fi/chants-2007
    Welcome to the home page of the ACM MobiCom workshop on Challenged Networks (CHANTS 2007).
    chants workshop

Homepage                                   | Non-homepage
Query terms occur in URL                   | Query is in question form
All query terms occur in title or snippet  | Top URL is Wikipedia
Less than 3 words                          | Length greater than 3 words
URL is root                                |
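
A rule-based classifier built from the cues listed on this slide might look like the sketch below. The slide does not say how Thedu combines the cues, so the simple vote count, the `classify_query` name, the question-word list, and the choice to require all query terms in the URL are assumptions made for illustration.

```python
from urllib.parse import urlparse

# Assumed list of words that mark a query as a question.
QUESTION_WORDS = {"who", "what", "when", "where", "why", "how", "which"}


def classify_query(query: str, url: str, title: str, snippet: str) -> str:
    """Label a query 'homepage' or 'non-homepage' from its top search result."""
    terms = query.lower().split()
    parsed = urlparse(url)
    url_text = url.lower()
    text = (title + " " + snippet).lower()

    homepage_cues = [
        all(t in url_text for t in terms),   # query terms occur in URL
        all(t in text for t in terms),       # all query terms in title or snippet
        len(terms) < 3,                      # less than 3 words
        parsed.path in ("", "/"),            # URL is a site root
    ]
    non_homepage_cues = [
        query.strip().endswith("?")
        or (bool(terms) and terms[0] in QUESTION_WORDS),  # question form
        "wikipedia.org" in parsed.netloc.lower(),         # top URL is Wikipedia
        len(terms) > 3,                                   # longer than 3 words
    ]
    # Assumed combination rule: whichever side has more cues firing wins.
    if sum(homepage_cues) > sum(non_homepage_cues):
        return "homepage"
    return "non-homepage"


label = classify_query(
    "chants 2007",
    "http://www.netlab.hut.fi/chants-2007",
    "chants workshop",
    "Welcome to the home page of the ACM MobiCom workshop on "
    "Challenged Networks (CHANTS 2007).",
)
print(label)  # homepage
```

For the slide's own example, three homepage cues fire (terms in URL, terms in title or snippet, fewer than 3 words) and no non-homepage cue does.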

11 Relevance score normalization
- Modified language model framework
- D: Document, Q: Query, C: Collection
- Normalized score
- Kullback-Leibler divergence (distance between Q and D)
- Terms: probability of word occurring in document; probability of word occurring in collection
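
The slide names the ingredients of the normalized score without a formula in this transcript. One standard way to combine exactly these ingredients is shown below; it is offered as an assumption for orientation, not as Thedu's exact definition. Here $P(w \mid Q)$, $P(w \mid D)$ and $P(w \mid C)$ denote the probability of word $w$ under the query, document and collection language models.

```latex
% Kullback-Leibler divergence between the query model and the document model
\mathrm{KL}(Q \,\|\, D) = \sum_{w} P(w \mid Q)\, \log \frac{P(w \mid Q)}{P(w \mid D)}

% A collection-normalized score (assumed form): how much closer the query is
% to the document than to the collection as a whole
\mathrm{score}(Q, D)
  = \mathrm{KL}(Q \,\|\, C) - \mathrm{KL}(Q \,\|\, D)
  = \sum_{w} P(w \mid Q)\, \log \frac{P(w \mid D)}{P(w \mid C)}
```

Dividing by the collection probability removes the part of the score that depends only on how common the query words are, which is what makes scores from different queries easier to compare.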

12 Thedu protocol
1. Sort responses in the order of normalized score
2. For response r for query q,
   2a. Update [symbol missing]
   2b. If q is a homepage query and [condition missing], do not send
   2c. Else send response to vehicle
[symbol missing]: expected relevance of all responses sent for a query q
[symbol missing]: probability that r is relevant for q
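
The protocol steps on this slide can be sketched as below. Several details are assumptions, because the slide's symbols and its homepage condition are not available here: the expected relevance for a query is taken to be the sum of the relevance probabilities of responses already sent, the homepage check is taken to be a comparison against a cutoff, and the names `Candidate`, `select_responses` and `threshold` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Candidate:
    query: str
    url: str
    score: float       # normalized relevance score, comparable across queries
    p_relevant: float  # estimated probability that this response is relevant


def select_responses(
    candidates: List[Candidate],
    homepage_queries: Set[str],
    threshold: float = 1.0,  # hypothetical cutoff; not given in the slides
) -> List[Candidate]:
    """Return the responses to send to the vehicle, in sending order."""
    expected: Dict[str, float] = {}  # expected relevance sent so far, per query
    to_send: List[Candidate] = []
    # Step 1: consider responses in decreasing order of normalized score.
    for r in sorted(candidates, key=lambda c: c.score, reverse=True):
        sent_so_far = expected.get(r.query, 0.0)
        # Step 2b (assumed condition): a homepage query needs no more
        # responses once enough expected relevance has already been sent.
        if r.query in homepage_queries and sent_so_far >= threshold:
            continue
        # Steps 2a and 2c: update the running total and send the response.
        expected[r.query] = sent_so_far + r.p_relevant
        to_send.append(r)
    return to_send


candidates = [
    Candidate("cnn", "u1", 0.9, 0.95),
    Candidate("cnn", "u2", 0.8, 0.50),
    Candidate("harry potter review", "u3", 0.7, 0.60),
    Candidate("harry potter review", "u4", 0.6, 0.40),
]
sent = select_responses(candidates, homepage_queries={"cnn"}, threshold=0.9)
print([c.url for c in sent])  # ['u1', 'u3', 'u4']
```

In this toy run the second "cnn" result is suppressed because the first one already carries enough expected relevance, which leaves scarce connection time for the non-homepage query.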

13 Evaluation goals
- What is the delay in getting search results?
- How many results were relevant to the user?

14 Evaluation tools
- DieselNet
- Indri search engine
- TREC (Text Retrieval Conference)
  - predefined web data collection (10G)
  - predefined set of queries (100 homepage + 50 content)
  - relevance judgments (which documents are relevant for each query)
Thedu’s query-type classifier accuracy: 88%

15 Deployment on DieselNet

16 Thedu vs proxy-less server
- Thedu (March 26 to March 30)
  - bundles responses
  - returns responses in prioritized order
  - maintains state
- Proxy-less server (April 30 to May 5)
  - bundles responses
  - returns responses in FIFO order
  - keeps no state

17 Connectivity duration
- Mean connection duration: 35 sec
- Mean disconnection duration: 8 min
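
To put these two means in perspective, the short calculation below estimates the long-run share of time a bus is connected. It assumes connection and disconnection periods simply alternate, so the share is mean-up / (mean-up + mean-down); the function name is introduced here for illustration.

```python
def connected_fraction(mean_up_s: float, mean_down_s: float) -> float:
    """Long-run fraction of time connected, for alternating up/down periods."""
    return mean_up_s / (mean_up_s + mean_down_s)


fraction = connected_fraction(35, 8 * 60)  # 35 s up, 8 min = 480 s down
print(f"{fraction:.1%}")  # 6.8%
```

Under that assumption a bus is online for only about 7% of the time, in windows of roughly half a minute.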

18 Thedu vs Proxy-less architecture
[Figure panels: Thedu; Stateless proxy]

19 Delay until first relevant response

20 Extending Thedu
- Can we use connectivity among buses to improve throughput?
- Are we limited to academic search engines?
  - convince commercial search providers to provide relevance scores
  - or assign scores based on ranking
- Are users really happy with search results and delay?
traces.cs.umass.edu

21 Simulation Results

22 Inter-meeting times

