VISUAL SUMMARIZATION OF WEB PAGES
Binxing Jiao et al. (SIGIR ’10)
Presenter: Lin, Yi-Jhen
Advisor: Dr. Koh, Jia-Ling
Date: 2011/4/25



OUTLINE
- Introduction
- Internal image based visual summarization
- Visual snippet
- External image based visual summarization
- Experimental results
- User study
- Conclusion

INTRODUCTION
- Search and re-finding tasks are among the most typical applications on the Internet.
- Both tasks would become more efficient if web pages were visually summarized.

INTRODUCTION CONT.
[Figure: examples of thumbnails and of sites with images]

INTRODUCTION CONT.
- Difficulties of internal image based visual summarization:
  - For web pages with a complex layout or rich content, users have difficulty making out the content in a small thumbnail.
  - Internal images are unavailable for the large number of web pages that contain no useful images.
- We propose mining images from external sources, which we call external image based visual summarization.
- We investigate how well the different visual summarization approaches apply to different kinds of web pages and to different tasks, including search and re-finding.
- The approaches compared are:
  - Thumbnails
  - Internal images
  - External images
  - Visual snippets

INTERNAL IMAGE BASED VISUAL SUMMARIZATION
- The images contained in a web page usually describe its content. These so-called internal images can be used directly to summarize the page.
- For visual summarization, we need to detect the most dominant image among all the images in the page.
- Pipeline: extract features -> labeled training data -> Ranking SVM
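The slide names the pipeline (features, labeled data, Ranking SVM) but not the features or the training procedure. The following is a minimal sketch of a linear Ranking SVM trained on pairwise differences within each page. The three features (relative area, closeness to the top of the page, text overlap with the title), the hyperparameters, and all function names are assumptions made for illustration, not the authors' setup.

```python
import random

def pairwise_differences(pages):
    """Build (dominant - non-dominant) feature differences within each page.

    pages: list of pages; each page is a list of (feature_vector, is_dominant).
    """
    diffs = []
    for images in pages:
        positives = [f for f, dominant in images if dominant]
        negatives = [f for f, dominant in images if not dominant]
        for p in positives:
            for n in negatives:
                diffs.append([a - b for a, b in zip(p, n)])
    return diffs

def train_ranking_svm(diffs, epochs=200, lr=0.1, reg=0.01, seed=0):
    """Linear Ranking SVM: subgradient descent on the pairwise hinge loss
    max(0, 1 - w.d) plus an L2 penalty (hyperparameters are assumptions)."""
    rng = random.Random(seed)
    diffs = list(diffs)  # copy so the caller's list is not reordered
    w = [0.0] * len(diffs[0])
    for _ in range(epochs):
        rng.shuffle(diffs)
        for d in diffs:
            margin = sum(wi * di for wi, di in zip(w, d))
            for i in range(len(w)):
                hinge_grad = -d[i] if margin < 1.0 else 0.0
                w[i] -= lr * (reg * w[i] + hinge_grad)
    return w

def pick_dominant(w, candidates):
    """Return the index of the highest-scoring candidate image."""
    scores = [sum(wi * fi for wi, fi in zip(w, f)) for f in candidates]
    return scores.index(max(scores))

# Hypothetical features: [relative area, closeness to page top, title overlap]
training_pages = [
    [([0.60, 0.90, 0.8], True), ([0.05, 0.10, 0.0], False), ([0.10, 0.95, 0.1], False)],
    [([0.50, 0.70, 0.6], True), ([0.02, 0.00, 0.0], False)],
]
weights = train_ranking_svm(pairwise_differences(training_pages))
```

Ranking within a page rather than classifying each image independently matches the task as stated: only the most dominant image of a page is needed, so only the relative order of its images matters.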

VISUAL SNIPPET
- A visual snippet enriches the internal dominant image by synthesizing the salient text (e.g., the title), the dominant image, and the logo into a single image.
- Steps:
  - Internal dominant image selection and processing
  - Logo extraction: detected by a trained model using the image's location in the page, size, name, and surrounding text
  - Title extraction: extracted from the HTML source of the web page; only the first 19 characters of the title are kept
  - Image composition
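The title extraction step can be sketched with Python's standard `html.parser` module. This is only an illustration: it assumes the title comes from the page's `<title>` element and that "cropped" means truncated to the first 19 characters. The names `TitleParser` and `snippet_title` are hypothetical.

```python
from html.parser import HTMLParser

class TitleParser(HTMLParser):
    """Collects the text found inside the <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

def snippet_title(html, max_chars=19):
    """Return the page title, whitespace-normalized and cut to max_chars."""
    parser = TitleParser()
    parser.feed(html)
    return " ".join(parser.title.split())[:max_chars]

page = "<html><head><title>Visual Summarization of Web Pages</title></head></html>"
short_title = snippet_title(page)  # "Visual Summarizatio"
```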

EXTERNAL IMAGE BASED VISUAL SUMMARIZATION

EXTERNAL IMAGE BASED VISUAL SUMMARIZATION CONT.: KEY PHRASE EXTRACTION
- We employ the KEX algorithm to extract key phrases.
- In KEX, a model trained by logistic regression calculates the salience score of each candidate phrase.
- The top 4 key phrases are used as queries to the image search engine.
- For each key phrase, 30 results, each comprising an image and its corresponding web page, are retrieved for further processing.
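To make the scoring step concrete, here is a small sketch of how a logistic regression model turns candidate-phrase features into salience scores and keeps the top 4. It is not the KEX implementation. The features (appears in title, normalized term frequency, capitalized), the weights, and the bias are invented for illustration.

```python
import math

def salience(features, weights, bias):
    """Logistic regression score in (0, 1) for one candidate phrase."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def top_key_phrases(candidates, weights, bias, k=4):
    """Rank candidate phrases by salience and keep the k best (k=4 on the slide)."""
    scored = [(salience(f, weights, bias), phrase) for phrase, f in candidates.items()]
    scored.sort(reverse=True)
    return [phrase for _, phrase in scored[:k]]

# Hypothetical features: [appears in title, normalized term frequency, capitalized]
candidates = {
    "visual summarization": [1.0, 0.9, 0.0],
    "web pages": [1.0, 0.7, 0.0],
    "image search": [0.0, 0.6, 0.0],
    "SIGIR": [0.0, 0.3, 1.0],
    "click here": [0.0, 0.1, 0.0],
}
hypothetical_weights = [2.0, 3.0, 0.5]
hypothetical_bias = -2.0
queries = top_key_phrases(candidates, hypothetical_weights, hypothetical_bias)
```

Each returned phrase would then be sent as a query to the image search engine, retrieving 30 results per phrase as described on the slide.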

EXTERNAL IMAGE BASED VISUAL SUMMARIZATION CONT.: IMAGE RANKING AND FILTERING
- Rerank the result images
  - Based on the textual similarity between the web pages containing the images and the target web page
  - TF-IDF vectors -> cosine similarity
- Filter the images
  - The visual importance of each result image is calculated using VisualRank
  - Images whose importance score is below a threshold are filtered out
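The reranking step can be sketched as follows. The slide only says "TFIDF -> cosine similarity", so the whitespace tokenizer and the smoothed IDF formula used here are assumptions, and `rerank` is a hypothetical helper name.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Sparse TF-IDF vectors (dicts) with a smoothed IDF (an assumed variant)."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: c * (math.log((1 + n) / (1 + df[t])) + 1.0)
                        for t, c in tf.items()})
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rerank(target_text, host_page_texts):
    """Order result images by similarity of their host page to the target page.

    Returns indices into host_page_texts, most similar first.
    """
    vectors = tfidf_vectors([target_text] + host_page_texts)
    target, hosts = vectors[0], vectors[1:]
    sims = [cosine(target, h) for h in hosts]
    return sorted(range(len(hosts)), key=lambda i: sims[i], reverse=True)

order = rerank(
    "cheap flights to paris france",
    ["paris france travel flights",
     "python programming tutorial",
     "cheap flights to paris france deals"],
)
```

In this toy example the third host page shares the most vocabulary with the target and is ranked first, while the unrelated page comes last.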

EXTERNAL IMAGE BASED VISUAL SUMMARIZATION CONT.: IMAGE RANKING AND FILTERING CONT.
- VisualRank
  - Extract local SIFT features for each image
  - Similarity(img i, img j): the number of local features shared by the two images divided by the average number of local features in img i and img j
  - A graph is constructed with images as vertices and visual similarities as edge weights
  - PageRank is applied to the graph, and the resulting rank score vector represents the importance of each image
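A minimal sketch of the VisualRank computation described above. SIFT extraction and matching are outside its scope, so each image is assumed to be already reduced to a set of matched feature IDs. The damping factor of 0.85, the uniform handling of isolated images, and the threshold value are conventional choices assumed here, not values from the slides.

```python
def similarity(feats_i, feats_j):
    """Shared local features divided by the average feature count of the pair."""
    average = (len(feats_i) + len(feats_j)) / 2.0
    return len(feats_i & feats_j) / average if average else 0.0

def visual_rank(features, damping=0.85, iterations=50):
    """PageRank by power iteration on the image similarity graph."""
    n = len(features)
    sim = [[similarity(features[i], features[j]) if i != j else 0.0
            for j in range(n)] for i in range(n)]
    col_sums = [sum(sim[i][j] for i in range(n)) for j in range(n)]
    rank = [1.0 / n] * n
    for _ in range(iterations):
        new_rank = []
        for i in range(n):
            incoming = 0.0
            for j in range(n):
                # An image similar to nothing spreads its score uniformly.
                share = sim[i][j] / col_sums[j] if col_sums[j] > 0 else 1.0 / n
                incoming += share * rank[j]
            new_rank.append(damping * incoming + (1.0 - damping) / n)
        rank = new_rank
    return rank

def filter_images(rank, threshold):
    """Keep the indices of images whose importance reaches the threshold."""
    return [i for i, r in enumerate(rank) if r >= threshold]

# Hypothetical feature-ID sets: three visually related images and one outlier.
image_features = [{1, 2, 3, 4}, {2, 3, 4, 5}, {3, 4, 5, 6}, {100, 101}]
ranks = visual_rank(image_features)
kept = filter_images(ranks, threshold=0.2)
```

The outlier image shares no features with the others, receives the lowest score, and is the one removed by the filter.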

EXPERIMENTAL RESULTS: EXPERIMENT SETUP
- 100 queries from KDD CUP 2005
- The Bing Search API is used to retrieve the top 100 search results for each query
- The retrieved web pages were obtained
- All internal images and the retrieved external images were downloaded
- Expert labelers annotated whether each internal image was dominant and whether each external image was relevant for summarizing the corresponding web page

EXPERIMENTAL RESULTS: EXPERIMENT SETUP CONT.
- DIP: web pages with at least one internal image annotated as dominant
- NDIP: web pages without any internal image annotated as dominant
- 41.8% of the web pages in our data set are NDIPs.
- Labelers judge which summary best represents the web page; a labeler can also select “others”.

EXPERIMENTAL RESULTS CONT.: USEFULNESS OF EXTERNAL IMAGES
- External images can be considered a useful source for web page summarization.
- Web pages containing dominant images can be regarded as more appropriate for visual summarization, and hence can also be summarized well by a relevant image retrieved from external sources.
- External images are a valuable complement to internal images for visual summarization of web pages.

EXPERIMENTAL RESULTS CONT.: COMPARISON OF VARIOUS VISUAL SUMMARIES
- Dominant image and visual snippet as the best summarization
- Thumbnail as the best summarization
- External image as the best summarization

EXPERIMENTAL RESULTS CONT.: COMPARISON OF VARIOUS VISUAL SUMMARIES CONT.
Dominant image and visual snippet as the best summarization
- Dominant image based summarizations are the best summarization for over half of the DIPs.
- For DIPs, the visual snippet is worse than the dominant image:
  - Title information is misread
  - The composed visual snippet looks rough

EXPERIMENTAL RESULTS CONT.: COMPARISON OF VARIOUS VISUAL SUMMARIES CONT.
Thumbnail as the best summarization
- These pages are categorized into:
  - Pages of small size
  - Pages with several images
  - Pages with salient and clear image/text
  - Pages with logo
  - Pages with simple page structure
  - Pages in well-known web site
- [Figure: example pages with salient and clear image/text, with logo, and in a well-known web site]

EXPERIMENTAL RESULTS CONT.: COMPARISON OF VARIOUS VISUAL SUMMARIES CONT.
External image as the best summarization
- For NDIPs, external images with a very high relevance score can be adopted as reliable summarizations.
- Otherwise, thumbnails are better, especially for “pages with logo” and “pages in well-known web site”.

EXPERIMENTAL RESULTS CONT.: COMPARISON OF VARIOUS VISUAL SUMMARIES CONT.
Summary: guidance for selecting the summarization type
- For NDIPs
  - External images are useful but suffer from a reliability problem.
  - Thumbnails are also good, especially for “pages with logo” and “pages in well-known web site”.
- For DIPs
  - If the dominant images are contained in the thumbnail and can be recognized there, the thumbnail is the best choice; otherwise dominant images are more reliable.
  - If an exclusive dominant image is found and the title information agrees with its content, the visual snippet is better than a single dominant image.
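The guidance above can be read as a small decision procedure. The sketch below is one possible encoding, not something given in the slides. The order in which the DIP rules are checked, the relevance threshold of 0.8, and the function and parameter names are all assumptions.

```python
def choose_summary(is_dip,
                   dominant_visible_in_thumbnail=False,
                   exclusive_dominant_image=False,
                   title_agrees_with_image=False,
                   external_relevance=0.0,
                   relevance_threshold=0.8):
    """Pick a summarization type for a page following the slide's guidance.

    is_dip: True if the page has at least one dominant internal image.
    relevance_threshold: hypothetical cut-off for a "reliable" external image.
    """
    if not is_dip:
        # NDIP: use an external image only when its relevance is very high.
        if external_relevance >= relevance_threshold:
            return "external image"
        return "thumbnail"
    # DIP: the thumbnail wins when the dominant image is recognizable in it.
    if dominant_visible_in_thumbnail:
        return "thumbnail"
    # A single dominant image plus an agreeing title favors the visual snippet.
    if exclusive_dominant_image and title_agrees_with_image:
        return "visual snippet"
    return "dominant image"

choice = choose_summary(is_dip=True,
                        exclusive_dominant_image=True,
                        title_agrees_with_image=True)
```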

USER STUDY
- The goal of this study is to explore how the different summarization approaches support an understanding task or a re-finding task.
- 51 participants
  - Bachelor’s degree at least
  - Age: [missing] years old
  - 56% male, 44% female
  - Heavy web users
- 11 web pages are chosen from a range of different categories
  - 9 DIP / 2 NDIP

USER STUDY CONT.: Phase 1 -- understanding
- Each participant was given a web page and one of the four visual summaries.
- Step 1: guess the content of the web page based solely on the provided visual summarization.
- Step 2: read the web page and rate the summarization on a scale from -1 to 3:
  - -1: the summarization misleads the participant
  - 0: the participant cannot understand the summarization
  - 1: the summarization helps users grasp a rough idea of what the page is about
  - 2: between 1 and 3
  - 3: the summarization helps users understand the content of the web page exactly

USER STUDY CONT.: Phase 2 -- re-finding
- We explored how the various summarization types affect recall of the web pages visited in phase 1.
- We specify a web page to be re-found, and the participants are asked to select the image that reminds them of that web page.
- To evaluate the performance of each summarization type in this phase, we define the error rate e(s) of a summarization type s in terms of an indicator I{ }, which takes the value 1 or 0.
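The formula for e(s) did not survive in the transcript. The sketch below assumes the most natural reading: the indicator is 1 when the image a participant selects does not belong to the specified page, and e(s) is the fraction of such trials for summarization type s. Both this definition and the name `error_rate` are assumptions, not the authors' stated formula.

```python
def error_rate(trials):
    """Fraction of re-finding trials with a wrong selection.

    trials: list of (selected_page, target_page) pairs collected for one
    summarization type; the indicator is 1 when the two pages differ.
    """
    if not trials:
        raise ValueError("at least one trial is required")
    wrong = sum(1 for selected, target in trials if selected != target)
    return wrong / len(trials)

# Hypothetical trials for one summarization type: one wrong pick out of four.
snippet_trials = [("page3", "page3"), ("page7", "page2"),
                  ("page5", "page5"), ("page9", "page9")]
snippet_error = error_rate(snippet_trials)
```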

USER STUDY CONT.: RESULTS AND DISCUSSION
Understanding task
[Figure: results for DIP and for NDIP]

USER STUDY CONT.: RESULTS AND DISCUSSION CONT.
Re-finding task

USER STUDY CONT.: RESULTS AND DISCUSSION CONT.
Re-finding task
- The visual snippet performs better than the other summarizations.
- The two additional components of the visual snippet, title and logo information, are useful for telling apart the visited web pages.

USER STUDY CONT.: RESULTS AND DISCUSSION CONT.
Re-finding task
- Thumbnails perform well
- Thumbnails are worse than visual snippets
- External image based summarization is not suitable for the re-finding task

CONCLUSION
- An external image based visual summarization is proposed, which selects representative images from the entire Internet.
- We present a careful study comparing the various visual summarization methods for web pages.
- The various summarization approaches have their respective advantages on different types of web pages and for different tasks:
  - Thumbnails are simple and effective for web pages with a simple structure or with clear, large images and text
  - Internal images can summarize a web page well if it includes dominant images
  - External images are useful for web pages without dominant images
  - Visual snippets are effective in helping users identify visited web pages in the re-finding task