By Asef Poormasoomi. Motivation: summaries that are generic in nature do not cater to the user's background and interests.

Presentation transcript:

By Asef Poormasoomi

Motivation: Summaries that are generic in nature do not cater to the user's background and interests. Results show that each person has a different perspective on the same text, so a good summary should change according to the preferences of its reader.

Motivation:
Marcu (1997) found that the percent agreement of 13 judges over 5 texts from Scientific American is 71 percent.
Rath (1961) found that extracts selected by four different human judges had only 25 percent overlap.
Salton (1997) found that the most important 20 paragraphs extracted by 2 subjects have only 46 percent overlap.

User Feedback:
Query history: the most widely used implicit user feedback at present.
Click data: when a user clicks on a document, the document is considered to be of more interest to the user than the unclicked ones.
Attention time: often referred to as display time or reading time.
Other types of implicit user feedback include scrolling, annotation, bookmarking and printing behaviors.

ARTICLE1 : Generating Personalized Summaries Using Publicly Available Web Documents 2008 IEEE Chandan Kumar, Prasad Pingali, Vasudeva Varma

Main idea: extract the personal information of the user from information available on the web.

Generic Sentence Scoring: In general, compute the probability distribution p(w|D) over the words w appearing in the input D. For each sentence S in the input, assign a weight equal to the average probability of the words in the sentence.
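The generic scoring step can be sketched in a few lines of Python. This is only an illustration: it assumes p(w|D) is estimated as a plain relative frequency (the slide does not say how it is estimated), and the function names `word_probabilities` and `generic_score` are made up for this example.

```python
from collections import Counter

def word_probabilities(sentences):
    """Estimate p(w|D) as relative frequency over all tokens (assumed estimator)."""
    counts = Counter(w for s in sentences for w in s)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def generic_score(sentence, p_doc):
    """Weight of a sentence = average probability of its words."""
    if not sentence:
        return 0.0
    return sum(p_doc.get(w, 0.0) for w in sentence) / len(sentence)

# Toy input: two already-tokenized sentences.
docs = [["microsoft", "opens", "lab"], ["lab", "in", "india"]]
p_doc = word_probabilities(docs)
scores = [generic_score(s, p_doc) for s in docs]
```

Frequent words such as "lab" pull up the score of every sentence that contains them, which is why stop words are usually removed before scoring.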

Estimating the User Background Model: A search engine is used to extract the personal information of the user from information available on the web. The person's full name is submitted to a search engine (the name is enclosed in double quotation marks, such as "Albert Einstein"), and the top n documents are retrieved. After stop-word removal and stemming, a unigram language model is learned on the extracted text content. This model can be interpreted as the probability p(w|U) of a word w being related to the person's profile U.
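A rough sketch of the profile-estimation step follows. It starts from page texts that have already been retrieved (the search-engine call is left out), uses a tiny illustrative stop-word list, and replaces real stemming with a placeholder `crude_stem`; all names here are hypothetical and not taken from the article.

```python
import re
from collections import Counter

STOP_WORDS = {"the", "a", "of", "in", "and", "is", "on"}  # tiny illustrative list

def crude_stem(word):
    # Placeholder for a real stemmer such as Porter's; only strips a plural "s".
    return word[:-1] if word.endswith("s") and len(word) > 3 else word

def user_model(page_texts):
    """Unigram model p(w|U) over the stop-word-filtered, stemmed tokens."""
    tokens = []
    for text in page_texts:
        for tok in re.findall(r"[a-z]+", text.lower()):
            if tok not in STOP_WORDS:
                tokens.append(crude_stem(tok))
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

# Stand-ins for the top-n pages returned for the quoted full name.
pages = ["Research on parsing and machine translation",
         "Parsing of Indian languages"]
p_user = user_model(pages)
```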

User-Specific Sentence Scoring: The term probability of the document set D, p(w|D), and that of the user profile U, p(w|U), are merged using a linear weighted combination. The score of a sentence S for user u is computed from this combined term probability.
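One way to read the linear weighted combination is sketched below. The exact formula is not reproduced in this transcript, so the mixing weight `lam`, its default of 0.5, and the averaging over sentence length (carried over from the generic scoring) are assumptions made for illustration.

```python
def personalized_score(sentence, p_doc, p_user, lam=0.5):
    """Average of lam * p(w|D) + (1 - lam) * p(w|U) over the words of the sentence.

    lam is a hypothetical mixing weight; lam = 1.0 falls back to generic scoring.
    """
    if not sentence:
        return 0.0
    total = sum(lam * p_doc.get(w, 0.0) + (1.0 - lam) * p_user.get(w, 0.0)
                for w in sentence)
    return total / len(sentence)

# Toy distributions: the user profile is dominated by "security".
p_doc_demo = {"lab": 0.4, "security": 0.2}
p_user_demo = {"security": 0.5}
sentence = ["lab", "security"]
mixed = personalized_score(sentence, p_doc_demo, p_user_demo, lam=0.5)
generic = personalized_score(sentence, p_doc_demo, p_user_demo, lam=1.0)
```

Lowering `lam` shifts the ranking toward sentences whose words are prominent in the user's profile.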

After sentence scoring, redundancy is eliminated. To identify redundancy, the number of terms overlapping between the already generated summary and the new sentence being considered is used. The selected sentences are then arranged in chronological order between documents (i.e. based on the time stamp) and in order of occurrence within a document.
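The redundancy check and the final ordering might look like the following sketch. The slide only says that term overlap with the summary so far is measured; turning that into a fraction compared against a `max_overlap` threshold, and the `limit` on summary length, are assumptions, as are the dictionary keys used for each candidate.

```python
def select_sentences(candidates, max_overlap=0.5, limit=3):
    """Greedy selection by score, skipping sentences that overlap too much.

    candidates: list of dicts with keys 'id', 'tokens', 'score',
    'timestamp' (of the source document) and 'position' (within it).
    """
    summary, summary_terms = [], set()
    for cand in sorted(candidates, key=lambda c: c["score"], reverse=True):
        terms = set(cand["tokens"])
        overlap = len(terms & summary_terms) / len(terms) if terms else 1.0
        if overlap <= max_overlap:
            summary.append(cand)
            summary_terms |= terms
        if len(summary) == limit:
            break
    # Chronological order between documents, order of occurrence within one.
    return sorted(summary, key=lambda c: (c["timestamp"], c["position"]))

candidates = [
    {"id": "a", "tokens": ["lab", "opens", "india"], "score": 0.9, "timestamp": 2, "position": 0},
    {"id": "b", "tokens": ["lab", "opens", "bangalore"], "score": 0.8, "timestamp": 1, "position": 0},
    {"id": "c", "tokens": ["security", "group"], "score": 0.7, "timestamp": 1, "position": 1},
]
picked = select_sentences(candidates)
```

Here sentence "b" is dropped because two of its three terms already appear in the summary, and the surviving sentences are emitted oldest document first.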


Example: The topic of summary generation is "Microsoft to open research lab in India". Eight articles published in different news sources form the news cluster. The example shows the condensed summary (100 words) for two users: User A is from the NLP domain and User B is from the network security domain. The italic text in the user-specific summaries shows the difference compared to the generic summary.

Generic summary: The New Lab, Called Microsoft Research India, Goes Online In January, And Will Be Part Of A Network Of Five Research Labs That Microsoft Runs Worldwide, Said Padmanabhan Anandan, Managing Director Of Microsoft Research India. Microsoft’s Mission India, Formally Inaugurated Jan. 12, 2005, Is Microsoft’s Third Basic Research Facility Established Outside The United States. In Line With Microsoft’s Research Strategy Worldwide, The Bangalore Lab Will Collaborate With And Fund Research At Key Educational Institutions In India, Such As The Indian Institutes Of Technology, Anandan Said. Although Microsoft Research Doesn’t Engage In Product Development Itself, Technologies Researchers Create Can Make Their Way Into The Products The Company

User A specific summary: The New Lab, Called Microsoft Research India, Goes Online In January, And Will Be Part Of A Network Of Five Research Labs That Microsoft Runs Worldwide, Said Padmanabhan Anandan, Managing Director Of Microsoft Research India. Microsoft’s Mission India, Formally Inaugurated Jan. 12, 2005, Is Microsoft’s Third Basic Research Facility Established Outside The United States. Microsoft Will Collaborate With The Government Of India And The Indian Scientific Community To Conduct Research In Indic Language Computing Technologies, This Will Include Areas Such As Machine Translation Between Indian Languages And English, Search And Browsing And Character Recognition. In Line With Microsoft’s Research Strategy Worldwide, The Bangalore Lab

User B specific summary: The New Lab, Called Microsoft Research India, Goes Online In January, And Will Be Part Of A Network Of Five Research Labs That Microsoft Runs Worldwide, Said Padmanabhan Anandan, Managing Director Of Microsoft Research India. The Newly Announced India Research Group Focuses On Cryptography, Security, Algorithms And Multimedia Security, Ramarathnam Venkatesan, A Leading Cryptographer At Microsoft Research In Redmond, Washington, In The US, Will Head The New Group. Microsoft Research India will conduct a four-week summer school featuring lectures by leading experts in the fields of cryptography, algorithms and security. The program is aimed at senior undergraduate students, graduate students and faculty

Evaluation: The technique was evaluated with five research scholars working in different fields of computer science. News articles from the science and technology domain were considered for summarization. 25 different topics were chosen, each with 5-10 articles. Each researcher was asked to judge the relevance of both versions of the summaries for all 25 topics (on a 1-5 scale). The results show that users prefer profile-based personalized summaries over the generic summary produced by a general automatic summarization system.


Evaluation: The figure shows the scores given by a particular user across different topics. For most topics, the user finds the personalized summaries relevant to him. Personalized summaries for topics strongly related to the user's domain are more relevant to him. For topics that are not closely related to the user's field, the personalized and generic summaries are quite similar. For a few rare topics, the user did not find the personalized summary better.


ARTICLE2 : User-oriented Document Summarization Through Vision-based Eye-tracking 2009 ACM Songhua Xu, Hao Jiang, Francis C.M. Lau

Main Idea: The key idea is to rely on the attention (reading) time that individual users spend on single words in a document. The prediction of user attention over every word in a document is based on the user's attention during his previous reads. The algorithm tracks a user's attention time over individual words using a vision-based commodity eye-tracking mechanism, and the attention time over any arbitrary word is predicted by a data mining process. A simple web camera and an existing eye-tracking algorithm (the "Opengazer" project) are used. The error of the detected gaze location on the screen is between 1 and 2 cm, depending on which area of the screen the user is looking at (on a 19" monitor).

Anchoring Gaze Samples onto Individual Words: The detected gaze central point is positioned at (x, y) in screen space. For each word, compute its central displaying point, denoted (xi, yi). The average width and height of a word's displayed bounding box in the document are also used. Each gaze sample detected by the eye-tracking module is assigned fractionally to the words in the document in this manner. The overall attention that a word in the document receives is the sum of all the fractional gaze samples assigned to it in the above process. Stop words are removed during processing.
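The anchoring formula itself did not survive in this transcript, so the sketch below swaps in a stand-in: each gaze sample is split across words with a Gaussian-shaped weight on the distance from the gaze point (x, y) to each word center (xi, yi), scaled by the average word width and height. The kernel, the normalization to one unit per gaze sample, and the function names are all assumptions, not the authors' method.

```python
import math

def anchor_gaze(gaze, word_centers, avg_w, avg_h):
    """Split one gaze sample into fractions over words (assumed Gaussian kernel)."""
    x, y = gaze
    weights = [math.exp(-(((x - xi) / avg_w) ** 2 + ((y - yi) / avg_h) ** 2) / 2)
               for xi, yi in word_centers]
    total = sum(weights)
    if total == 0:
        return [0.0] * len(weights)
    return [w / total for w in weights]

def accumulate_attention(gazes, word_centers, avg_w, avg_h):
    """Overall attention per word = sum of its fractional gaze samples."""
    attention = [0.0] * len(word_centers)
    for gaze in gazes:
        for i, frac in enumerate(anchor_gaze(gaze, word_centers, avg_w, avg_h)):
            attention[i] += frac
    return attention

centers = [(0.0, 0.0), (100.0, 0.0)]  # two words on one line, in pixels
attention = accumulate_attention([(0.0, 0.0), (10.0, 0.0)], centers, avg_w=50.0, avg_h=20.0)
```

Spreading each sample over neighbouring words is one way to absorb the 1 to 2 cm tracking error mentioned for the web-camera setup.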

Prediction of User Attention over a Word: Attention time prediction for a word is based on the semantic similarity of two words. Sim(wi, wj) denotes the semantic similarity between word wi and word wj, where Sim(wi, wj) ∈ [0, 1]. The similarity is computed with the algorithm proposed in: Y. Li, Z. A. Bandar, and D. McLean. An approach for measuring semantic similarity between words using multiple information sources. IEEE Transactions on Knowledge and Data Engineering. For an arbitrary word w that is not among the sampled words w1, …, wn, calculate the similarity between w and every wi (i = 1, …, n) and then select the k words that share the highest semantic similarity with w (k is set to min(10, n)).
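The neighbour-selection step can be illustrated as below. The source states that the k = min(10, n) most similar sampled words are selected; how their attention times are then combined is not shown in this transcript, so the similarity-weighted average used here is an assumption, and `toy_sim` is a made-up lookup table standing in for the Li, Bandar and McLean measure.

```python
def predict_attention(word, samples, sim, k_max=10):
    """Predict attention for an unsampled word from its k most similar sampled words.

    samples: dict mapping sampled word -> measured attention time.
    sim: function (word, word) -> similarity in [0, 1].
    Combination rule (similarity-weighted average) is an assumption.
    """
    k = min(k_max, len(samples))
    neighbours = sorted(samples, key=lambda w: sim(word, w), reverse=True)[:k]
    weight = sum(sim(word, w) for w in neighbours)
    if weight == 0:
        return 0.0
    return sum(sim(word, w) * samples[w] for w in neighbours) / weight

# Hypothetical similarity table for the demo.
SIM = {("brain", "neuron"): 0.8, ("brain", "goal"): 0.2}

def toy_sim(a, b):
    return SIM.get((a, b), SIM.get((b, a), 0.0))

samples = {"neuron": 2.0, "goal": 0.5}
predicted = predict_attention("brain", samples, toy_sim)
```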

Predicting User Attention for Sentences: The total attention of a certain user on a sentence is estimated as the sum of the user's attention over all the words in the sentence. AT(wi; Uj) is user Uj's attention over the word wi, which is either sampled from the user's previous reading activities via (1) or predicted via (2). Each word carries a weighting coefficient that is 0 if the word wi is a stop word, 0.6 if there is no attention sample for the user Uj over the word wi, and 1 otherwise.
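A small sketch of the sentence-level sum, using the three coefficient values from the slide (0, 0.6 and 1). That the coefficient multiplies each word's attention inside the sum is an assumption, since the formula itself is missing; `predict` is a hypothetical callback for words without samples.

```python
def sentence_attention(words, sampled, predict, stop_words):
    """Sum of coefficient * attention over the words of one sentence.

    sampled: dict word -> attention time measured for this user.
    predict: function word -> predicted attention time (used when no sample exists).
    """
    total = 0.0
    for w in words:
        if w in stop_words:
            continue                      # coefficient 0: stop words add nothing
        if w in sampled:
            total += 1.0 * sampled[w]     # coefficient 1: measured attention
        else:
            total += 0.6 * predict(w)     # coefficient 0.6: predicted attention
    return total

attention_score = sentence_attention(
    ["the", "neuron", "brain"],
    sampled={"neuron": 2.0},
    predict=lambda w: 1.0,
    stop_words={"the"},
)
```

Discounting predicted values by 0.6 keeps measured attention dominant while still letting unseen words contribute.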


A Hybrid Summarization Approach: In early experiments, the authors noticed that the performance of the user-oriented document summarization algorithm depends heavily on the amount of available user attention time samples. To address this issue, the new method is integrated with a conventional automatic document summarization algorithm (MEAD). The combination uses an indicator that equals 1 if sentence si is selected by MEAD in its document summarization result and 0 otherwise. k is a free, user-tunable parameter.

Experiment Results: The document summarization results are compared with those generated by two popular text summarization algorithms. Two sets of articles are used. Articles in the first set are all about science (60 articles from "Science" magazine), and articles in the second set are all about entertainment and leisure (60 articles randomly selected from the travel and sports sections of the "New York Times"). Twelve people with different knowledge backgrounds read selected articles from the two article sets and were then asked to provide a summary of the article they had just read.

Experiment Results: To measure performance, three measurements are introduced: Recall (R), Precision (P) and F-rate (F). SU_e is the human summary result.
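The formulas for the three measurements are missing from this transcript. The sketch below uses the standard set-based definitions, treating the system summary and the human summary SU_e as sets of selected sentence identifiers; whether the article computes them exactly this way is an assumption.

```python
def precision_recall_f(system, human):
    """Standard set-based precision, recall and F-measure (assumed definitions).

    system: sentences selected by the algorithm.
    human: sentences in the human summary (SU_e).
    """
    system, human = set(system), set(human)
    hits = len(system & human)
    p = hits / len(system) if system else 0.0
    r = hits / len(human) if human else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

# Sentence ids chosen by the system vs. by a human reader.
p, r, f = precision_recall_f({1, 2, 3, 4}, {2, 4, 5})
```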


Experiment Results: An experiment evaluates the performance of the hybrid approach under different settings of the parameter k.

ARTICLE3 : WebPage Summarization Using Clickthrough Data 2005 ACM JianTao Sun, Dou Shen, HuaJun Zeng, Qiang Yang, Yuchang Lu, Zheng Chen

Main Idea: Use the extra knowledge in clickthrough data to improve Web-page summarization. A collection of clickthrough data can be represented by a set of triples. Typically, a user's query words reflect the true meaning of the target Web-page content. The new algorithm adapts two text-summarization methods to summarize Web pages. The first approach is based on significant-word selection, adapted from Luhn's method. The second method is based on Latent Semantic Analysis (LSA).

Problems: Web pages may have no associated query words, and the clickthrough data are often very noisy. Solution: a thematic lexicon, built using an annotated hierarchical taxonomy of Web pages such as the one provided by the ODP web site.

Adapted Significant Word (ASW) Method: Each sentence is assigned a significance factor (based on word frequency), and the sentences with high significance factors are selected to form the summary. The method uses a customized significance factor.

Adapted Latent Semantic Analysis (ALSA) Method: The corpus can be represented by a term-document matrix.

Summarizing Web Pages Not Covered by Clickthrough Data: Build a thematic lexicon. TS(c) represents the set of terms associated with category c. The thematic lexicon is the set of all TS, which correspond to categories in ODP. The lexicon is built as follows. First, the TS corresponding to each category is set to empty. For each page covered by the clickthrough data, its query words are added into the TS of its category. If a page belongs to more than one category, its query terms are added into the TS of all its categories. Finally, the term weight in each TS is multiplied by its Inverse Category Frequency (ICF). For each Web page that is not covered by the clickthrough data, first look up the lexicon for the TS according to the page's category; the summarization methods are then applied. When a TS does not have sufficient terms, the TS corresponding to its parent category is used.
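The lexicon-building steps above can be sketched as follows. The slide does not define the initial term weight or the ICF formula, so this example assumes the weight is the number of times a query term was added to a category and that ICF is log(N / number of categories containing the term), by analogy with inverse document frequency. The function name and input shapes are hypothetical.

```python
import math
from collections import defaultdict

def build_lexicon(clicks, page_categories):
    """Build TS(c) for every category c and scale weights by an assumed ICF.

    clicks: iterable of (page, query_terms) pairs from the clickthrough data.
    page_categories: dict page -> list of ODP categories the page belongs to.
    """
    lexicon = defaultdict(lambda: defaultdict(float))
    for page, terms in clicks:
        for cat in page_categories.get(page, []):   # all categories of the page
            for t in terms:
                lexicon[cat][t] += 1.0
    n_cats = len(lexicon)
    cat_freq = defaultdict(int)                     # number of categories per term
    for ts in lexicon.values():
        for t in ts:
            cat_freq[t] += 1
    return {cat: {t: w * math.log(n_cats / cat_freq[t]) for t, w in ts.items()}
            for cat, ts in lexicon.items()}

clicks = [("p1", ["python", "tutorial"]), ("p2", ["python", "snake"])]
page_categories = {"p1": ["Computers"], "p2": ["Computers", "Science/Biology"]}
lexicon = build_lexicon(clicks, page_categories)
```

With this choice of ICF, a term that occurs in every category gets weight zero, so only category-specific query terms influence the summary.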

Experiments: The clickthrough data set contains about 44.7 million records covering 29 days, from Dec 6, 2003 to Jan 3, 2004 (MSN search engine). The 3,074,678 Web pages of the ODP directory were crawled. In the end, 1,125,207 Web pages were obtained, 260,763 of which were clicked by Web users using 1,586,472 different queries. DAT1 consists of 90 pages selected from the browsed pages; three human evaluators were employed to summarize these pages. A relatively large-scale data set, denoted DAT2 (10,000 pages), is also used to evaluate the summarization methods.

Summarization Results on DAT1 (ASW): ROUGE is a software package adopted by DUC for automatic summarization evaluation ( cyl/ROUGE/ ).

Summarization Results on DAT1 (ALSA)

Evaluating the Summarization Method Using the Thematic Lexicon: The clickthrough data contains only 260,763 pages, and the lexicon contains 141,869 categories, which is a subset of the ODP category structure. If the terms under a page's category have more than P% overlap with the distinct terms in the Web page, they are used for summarization. Otherwise, the lexicon terms of its parent category are used. This process continues until a category is found that covers enough query terms or until the root of the thematic lexicon is reached.
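The parent-category fallback can be sketched as a simple walk up the taxonomy. The slide leaves the denominator of the "P% overlap" unspecified; this example assumes it is the share of the page's distinct terms that appear in the category's term set, and the threshold `p`, the `parent` mapping and the function name are hypothetical.

```python
def terms_for_page(page_terms, category, lexicon, parent, p=0.1):
    """Walk up the category tree until a term set overlaps the page enough.

    lexicon: dict category -> dict term -> weight.
    parent: dict category -> parent category (None at the root).
    Returns (category_used, term_set), or (None, {}) if the root is passed.
    """
    distinct = set(page_terms)
    cat = category
    while cat is not None:
        terms = lexicon.get(cat, {})
        overlap = len(distinct & set(terms)) / len(distinct) if distinct else 0.0
        if overlap > p:
            return cat, terms
        cat = parent.get(cat)           # fall back to the parent category
    return None, {}

demo_lexicon = {
    "Top/Science": {"lab": 1.0, "research": 2.0},
    "Top/Science/Biology": {"cell": 1.0},
}
demo_parent = {"Top/Science/Biology": "Top/Science", "Top/Science": "Top", "Top": None}
used, terms = terms_for_page(["research", "lab", "india", "microsoft"],
                             "Top/Science/Biology", demo_lexicon, demo_parent)
```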

Evaluation of the summarization method using the thematic lexicon (ASW)

Evaluation of the summarization method using the thematic lexicon (ALSA)

thanks