
CWS: A Comparative Web Search System
Jian-Tao Sun, Xuanhui Wang, Dou Shen§, Hua-Jun Zeng, Zheng Chen
Microsoft Research Asia; University of Illinois at Urbana-Champaign; §Hong Kong University of Science and Technology

Problem to Solve
Massive need for comparing information:
– Products, stores, companies
– People, countries, cities
– General information
Few effective ways to make such comparisons:
– Existing comparison shopping engines (e.g. shopping.com and froogle.google.com): domain-dependent and based on structured information
– Search engines: a single search box and a long list of search result pages

A Scenario for Information Comparison
Comparing Greece and Turkey for a holiday
Method 1:
– Enter "Greece vs. Turkey" into a search engine
– Some results are of low quality
The single-search-box problem

A Scenario for Information Comparison (cont.)
Method 2:
– Enter "Greece" and "Turkey" as separate queries
– Good results for each query, but difficult to compare them
The simple-result-list problem

Our Proposal
Comparative Web Search (CWS):
– Facilitate information comparison using search engines
Features:
1. Multiple search boxes for input
2. Side-by-side comparison of corresponding results
3. Clustering of related results into themes

Related Work
Website comparison [Liu, WWW02; Liu, KDD01]
– Hierarchical clustering of the web pages of two websites
– Pages are displayed in tree form
– Differences are highlighted
Comparative Web browser [Nadamoto, WWW03]
– Concurrently presents multiple Web pages
– After a user selects a page from one site, the system retrieves similar content from the other site

Related Work (cont.)
Comparative text mining [Zhai, KDD04; Zang, Master thesis 2004]
– Mining a set of comparable text collections
– Discovering latent themes common to all collections, as well as collection-specific themes
Product comparison [Hu, KDD04; Liu, WWW05]
– Extracting customers' opinions on product features from a collection of customer reviews
– Both customers and manufacturers can compare products

CWS System Flowchart

CWS Pair-view Interface

CWS Cluster-view Interface
– Common keywords for clusters
– Query-specific keywords

Algorithm for Page Pair Ranking
Input: queries q1 and q2
Output: a ranked list of comparative page pairs
Assumption: a page pair (p1, p2) is a comparative page pair if:
– p1 is relevant to q1
– p2 is relevant to q2
– (p1, p2) contains comparative information about q1 and q2

Algorithm for Page Pair Ranking (cont.)
Function for measuring the comparativeness of a page pair:
– f: comparativeness function
– R: relevance between a query and a page
– S: similarity between two text segments
– SR: search result list
– T: comparative information contained in the page pair
– p_i \ q_i: remaining text content of page p_i after removing q_i from it
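The slide lists the symbols but not how they combine into f. As a minimal sketch of how R, S, SR and p_i \ q_i could fit together, the Python below assumes f(p1, p2) = R(q1, p1) * R(q2, p2) * S(p1 \ q1, p2 \ q2), with R taken as 1/rank in the search result list and S as bag-of-words cosine similarity over the URL path plus snippet. The product form, the 1/rank relevance, the stoplist and every function name are hypothetical choices for illustration, not the authors' formula.

```python
import math
import re
from collections import Counter
from urllib.parse import urlparse

# Hypothetical stoplist; the slides do not say how text is preprocessed.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "for", "to", "vs"}

def tokenize(text):
    """Lowercase alphanumeric tokens, with stopwords dropped."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]

def remove_query(tokens, query):
    """p \\ q: the page's tokens with the query's terms removed."""
    q = set(tokenize(query))
    return [t for t in tokens if t not in q]

def cosine(a, b):
    """S: cosine similarity between two bags of words (0.0 if either is empty)."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def relevance(rank):
    """R: hypothetical rank-based relevance, 1 / rank in the search result list SR."""
    return 1.0 / rank

def page_text(page):
    """URL path plus snippet: the two evidence sources compared on the results slide."""
    return urlparse(page["url"]).path + " " + page["snippet"]

def comparativeness(q1, rank1, page1, q2, rank2, page2):
    """Hypothetical f = R(q1, p1) * R(q2, p2) * S(p1 \\ q1, p2 \\ q2)."""
    t1 = remove_query(tokenize(page_text(page1)), q1)
    t2 = remove_query(tokenize(page_text(page2)), q2)
    return relevance(rank1) * relevance(rank2) * cosine(t1, t2)

def rank_pairs(q1, sr1, q2, sr2, top=10):
    """Score every (p1, p2) in SR1 x SR2 and return the top pairs as (score, url1, url2)."""
    scored = [
        (comparativeness(q1, i, p1, q2, j, p2), p1["url"], p2["url"])
        for i, p1 in enumerate(sr1, start=1)
        for j, p2 in enumerate(sr2, start=1)
    ]
    scored.sort(key=lambda x: x[0], reverse=True)
    return scored[:top]

if __name__ == "__main__":
    sr1 = [{"url": "http://a.example/greece-beaches", "snippet": "Greece beaches and islands"},
           {"url": "http://a.example/greece-history", "snippet": "History of Greece"}]
    sr2 = [{"url": "http://b.example/turkey-beaches", "snippet": "Turkey beaches and resorts"},
           {"url": "http://b.example/turkey-history", "snippet": "History of Turkey"}]
    for score, u1, u2 in rank_pairs("Greece", sr1, "Turkey", sr2):
        print(f"{score:.2f}  {u1}  |  {u2}")
```

With these toy result lists the beaches/beaches pair ranks first (score about 0.80), ahead of the history/history pair (0.25): both pairs share all non-query terms, but the second pair is pulled down by R because both pages sit at rank 2. Removing the query terms first matters, since otherwise "Greece" and "Turkey" pages would look dissimilar simply because the query words differ.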

Algorithm for Clustering and Keyword Extraction
Clustering comparative page pairs:
– Each page pair is treated as a whole
– A probabilistic clustering algorithm based on a simple mixture generative model [Zhai, KDD04]
Representing clusters by keywords
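One reading of "a probabilistic clustering algorithm based on a simple mixture generative model" is the background-smoothed simple mixture model from [Zhai, KDD04], where each word of document d is drawn either from a fixed background unigram model (weight λ_B) or from one of k theme models mixed with document-specific weights π_d. The sketch below treats each page pair as one document (the tokens of both pages concatenated), fits the model with EM, and assigns each pair to its highest-weight theme. λ_B = 0.5, the iteration count, the seeded random initialisation and the function name are assumptions, not details given on the slide.

```python
import random
from collections import Counter

def simple_mixture_cluster(docs, k, lambda_b=0.5, iters=50, seed=0):
    """Cluster token lists with a background-smoothed mixture of k unigram themes (EM)."""
    counts = [Counter(d) for d in docs]
    vocab = sorted({w for c in counts for w in c})
    total = sum(sum(c.values()) for c in counts)
    # Background model: collection-wide word frequencies, kept fixed.
    p_b = {w: sum(c[w] for c in counts) / total for w in vocab}
    rng = random.Random(seed)
    # Random theme-word distributions break the symmetry between themes;
    # uniform initial pi means identical documents stay identical throughout.
    theta = []
    for _ in range(k):
        raw = {w: rng.random() + 0.01 for w in vocab}
        s = sum(raw.values())
        theta.append({w: v / s for w, v in raw.items()})
    pi = [[1.0 / k] * k for _ in docs]
    for _ in range(iters):
        new_theta = [dict.fromkeys(vocab, 0.0) for _ in range(k)]
        new_pi = []
        for d, c in enumerate(counts):
            acc = [0.0] * k
            for w, n in c.items():
                mix = [pi[d][j] * theta[j][w] for j in range(k)]
                theme_mass = sum(mix)
                # E-step: posterior that this word came from some theme, not the background.
                p_theme = (1 - lambda_b) * theme_mass / (
                    lambda_b * p_b[w] + (1 - lambda_b) * theme_mass)
                for j in range(k):
                    share = n * p_theme * mix[j] / theme_mass
                    acc[j] += share
                    new_theta[j][w] += share
            # M-step for this document's theme weights.
            s = sum(acc)
            new_pi.append([a / s for a in acc])
        # M-step for the theme-word distributions.
        theta = []
        for t in new_theta:
            s = sum(t.values())
            theta.append({w: v / s for w, v in t.items()})
        pi = new_pi
    labels = [max(range(k), key=lambda j: row[j]) for row in pi]
    return labels, pi, theta
```

The top words of each fitted theta[j] are a natural source for the "common keywords" shown per cluster in the cluster view, though the slides do not say that is how the system picks them.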

Algorithm for Clustering and Keyword Extraction (cont.)
Extracting query-specific keywords:
– Supervised keyword extraction algorithm: a linear regression model with 4 features
  – PF: phrase frequency
  – ATF: average frequency of all terms in the phrase
  – AIDF: average inverse document frequency
  – OKA: Okapi weighting score
Selecting key-phrases for sub-clusters:
– Entropy-based approach
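The slide names the four regression features but not their exact formulas or the learned weights. The sketch below computes one plausible version of each feature for a candidate phrase in a cluster document, scores the phrase with a linear model, and then applies an entropy test to pick query-specific phrases: a phrase whose occurrences fall almost entirely on the q1 side or the q2 side of a cluster has low entropy. The smoothed IDF, the BM25 parameters k1 = 1.2 and b = 0.75, the entropy threshold, and all function names are assumptions; in a supervised setup like the one described, the weights would be fitted on labeled key-phrases rather than chosen by hand.

```python
import math
from collections import Counter

def phrase_count(tokens, phrase):
    """Number of (possibly overlapping) occurrences of the phrase's token sequence."""
    n = len(phrase)
    return sum(1 for i in range(len(tokens) - n + 1) if tuple(tokens[i:i + n]) == phrase)

def phrase_features(phrase, doc, collection, k1=1.2, b=0.75):
    """Return [PF, ATF, AIDF, OKA] for a phrase (tuple of terms) in one cluster document."""
    tf = Counter(doc)
    n_docs = len(collection)
    avgdl = sum(len(d) for d in collection) / n_docs

    def idf(t):
        df = sum(1 for d in collection if t in d)
        return math.log((n_docs + 1) / (df + 1)) + 1.0  # smoothed IDF (assumption)

    pf = phrase_count(doc, phrase)
    atf = sum(tf[t] for t in phrase) / len(phrase)
    aidf = sum(idf(t) for t in phrase) / len(phrase)
    # OKA: mean Okapi BM25 term weight of the phrase's terms (one plausible reading).
    norm = k1 * (1 - b + b * len(doc) / avgdl)
    oka = sum(idf(t) * tf[t] * (k1 + 1) / (tf[t] + norm) for t in phrase) / len(phrase)
    return [pf, atf, aidf, oka]

def regression_score(features, weights, bias=0.0):
    """Linear regression model: predicted key-phrase salience = w . x + b."""
    return bias + sum(w * x for w, x in zip(weights, features))

def split_entropy(count_q1, count_q2):
    """Entropy (bits) of a phrase's frequency split between the q1-side and q2-side sub-clusters."""
    total = count_q1 + count_q2
    h = 0.0
    for c in (count_q1, count_q2):
        if c:
            p = c / total
            h -= p * math.log2(p)
    return h

def query_specific(phrases, side1, side2, max_entropy=0.5):
    """Keep phrases concentrated on one side; map each to the side (1 or 2) where it is more frequent."""
    out = {}
    for ph in phrases:
        c1, c2 = phrase_count(side1, ph), phrase_count(side2, ph)
        if c1 + c2 and split_entropy(c1, c2) <= max_entropy:
            out[ph] = 1 if c1 >= c2 else 2
    return out
```

For example, with a q1 side ["greek", "islands", "beach"] and a q2 side ["turkish", "coast", "beach"], the phrase ("islands",) has entropy 0 and is kept as specific to q1, while ("beach",) splits evenly (entropy 1 bit) and is left as a common keyword.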

Experiment for Page Pair Ranking
Data set:
– 20 query pairs
– Top 50 pages retrieved from MSN Search for each query

Data Labeling and Evaluation
3 human labelers judge the results in pair-view mode:
– Is the left page relevant to the first query?
– Is the right page relevant to the second query?
– Is the page pair helpful for making comparisons?
Evaluation method: precision at N = (number of correct comparative page pairs in the top N) / N
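The metric above is precision at N. The slide does not say how a "correct" pair is decided from three labelers' answers, so the sketch below assumes a labeler marks a pair correct only when all three questions are answered yes, that the three labelers are combined by strict majority vote, and that per-query-pair precision is averaged over the query pairs. These rules and the function names are hypothetical.

```python
def is_correct(judgement):
    """One labeler's verdict: correct only if all three questions are answered yes (assumption)."""
    left_relevant, right_relevant, helpful = judgement
    return left_relevant and right_relevant and helpful

def majority(votes):
    """Combine labelers' boolean verdicts by strict majority (hypothetical aggregation rule)."""
    return sum(votes) * 2 > len(votes)

def precision_at_n(ranked_pairs, n):
    """ranked_pairs: one entry per ranked pair, each a list of per-labeler (q1?, q2?, helpful?) answers."""
    top = ranked_pairs[:n]
    correct = sum(1 for labels in top if majority([is_correct(j) for j in labels]))
    return correct / n

def mean_precision_at_n(runs, n):
    """Average precision at N over query pairs (one ranked list per query pair)."""
    return sum(precision_at_n(r, n) for r in runs) / len(runs)
```

Under these rules, "top-1 precision" for a single query pair is either 0 or 1, and the reported figure would be its average over the 20 query pairs.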

Precision of Comparative Page Pairs
Conclusions:
– Our algorithm achieves 80% top-1 precision
– Both URLs and snippets are useful for measuring comparativeness
– Combining them gives the best result

Page Pair Ranking Case Study
Comparative page pair examples:
– Canon Sure Shot 130u vs. Olympus Stylus Epic
– Afghanistan War vs. Iraq War

Experiment for Comparative Page Clustering
Example results of comparative page clustering and keyphrase extraction

Conclusions
In this work:
– Proposed and studied a new search problem, comparative Web search
– Implemented a CWS system, characterized by:
  – Allowing users to input two comparative queries
  – Organizing pages into ranked comparative page pairs
  – Grouping page pairs into comparative clusters
  – Extracting keyphrases to summarize comparative information
Future work:
– Adopting other evaluation approaches for larger-scale experiments
– Automatically identifying comparative query pairs

Q&A Microsoft Research Asia

Backup Slides