
1 By Asef Poormasoomi

2 Motivation: Summaries that are generic in nature do not cater to the user’s background and interests. Results show that each person has a different perspective on the same text, so a good summary should change according to the preferences of its reader.

3 Motivation: Marcu (1997) found that the percent agreement of 13 judges over 5 texts from Scientific American was 71 percent. Rath (1961) found that extracts selected by four different human judges had only 25 percent overlap. Salton (1997) found that the 20 most important paragraphs extracted by 2 subjects had only 46 percent overlap.

4 User Feedback. Query history: currently the most widely used implicit user feedback (http://www.google.com/psearch). Click data: when a user clicks on a document, that document is considered to be of more interest to the user than the unclicked ones. Attention time: often referred to as display time or reading time. Other types of implicit user feedback include display time, scrolling, annotation, bookmarking, and printing behaviors.

5 Article 1: Generating Personalized Summaries Using Publicly Available Web Documents, 2008 IEEE, Chandan Kumar, Prasad Pingali, Vasudeva Varma

6 Article 1 extracts the user’s personal information from information available on the web. Generic sentence scoring: in general, compute the probability distribution p(w|D) over the words w appearing in the input D. For each sentence S in the input, assign a weight equal to the average probability of the words in the sentence.
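
The generic scoring step can be illustrated with a minimal Python sketch. It assumes that p(w|D) is estimated as plain relative frequency and that sentences arrive already tokenized; the function names and the toy input are hypothetical and are not taken from the paper.

```python
from collections import Counter


def word_probabilities(sentences):
    """Estimate p(w|D) as the relative frequency of w over all input sentences."""
    counts = Counter(w for s in sentences for w in s)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def score_sentence(sentence, p_w_d):
    """Sentence weight = average p(w|D) of its words (0.0 for an empty sentence)."""
    if not sentence:
        return 0.0
    return sum(p_w_d.get(w, 0.0) for w in sentence) / len(sentence)


# Toy input (assumption): two tokenized sentences, stop-word removal omitted.
docs = [["microsoft", "opens", "lab"], ["lab", "in", "india"]]
p_w_d = word_probabilities(docs)
scores = [score_sentence(s, p_w_d) for s in docs]
```

Sentences built from frequent words score higher, which is why this baseline produces the same summary for every reader.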

7 Estimating the user background model: a search engine is used to extract the user’s personal information from information available on the web. The person’s full name is submitted to the search engine as a quoted phrase (such as “Albert Einstein”), and the top n documents are retrieved. After stop-word removal and stemming, a unigram language model is learned on the extracted text content. This model can be interpreted as the probability p(w|U) of a word w being related to the person’s profile U.

8 User-specific sentence scoring: the term probabilities of the document set D, p(w|D), and of the user profile U, p(w|U), are merged using a linear weighted combination, which gives the score of a sentence S for user U.
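
The exact formula is not preserved in this transcript, so the following Python sketch shows only one plausible reading of a "linear weighted combination". It assumes a single mixing weight `lam` and keeps the per-sentence averaging of the generic scorer; `lam`, the function name, and the toy probabilities are all hypothetical.

```python
def user_sentence_score(sentence, p_w_d, p_w_u, lam=0.5):
    """Average over the words of lam * p(w|D) + (1 - lam) * p(w|U).

    lam is an assumed mixing weight: lam = 1.0 falls back to generic scoring.
    """
    if not sentence:
        return 0.0
    mixed = (
        lam * p_w_d.get(w, 0.0) + (1.0 - lam) * p_w_u.get(w, 0.0)
        for w in sentence
    )
    return sum(mixed) / len(sentence)


# Toy models (assumption): document-set model and a security-oriented user profile.
p_w_d = {"lab": 0.5, "security": 0.25, "translation": 0.25}
p_w_u = {"security": 0.75, "lab": 0.25}
s1 = ["lab", "security"]
s2 = ["lab", "translation"]
```

With these toy numbers the two sentences tie under generic scoring (`lam=1.0`), while the profile pushes the security sentence ahead once `lam` drops below 1.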

9 After sentence scoring, redundancy is eliminated: redundancy is identified by the number of terms that overlap between the summary generated so far and the new sentence being considered. The selected sentences are then arranged chronologically between documents (i.e. based on the time stamp) and by order of occurrence within a document.
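
A rough Python sketch of this selection and ordering step follows. The greedy loop, the use of an overlap fraction rather than a raw term count, and the `max_overlap` and `max_sentences` parameters are assumptions made for illustration; the slide only states that term overlap with the summary so far is the redundancy measure.

```python
def select_non_redundant(candidates, max_overlap=0.5, max_sentences=3):
    """Greedily pick high-scoring sentences that add enough new terms.

    candidates: list of (score, timestamp, position, tokens) tuples.
    """
    summary, summary_terms = [], set()
    for item in sorted(candidates, key=lambda c: c[0], reverse=True):
        terms = set(item[3])
        # Share of this sentence's terms already present in the summary.
        overlap = len(terms & summary_terms) / len(terms) if terms else 1.0
        if overlap > max_overlap:
            continue  # too redundant, skip it
        summary.append(item)
        summary_terms |= terms
        if len(summary) == max_sentences:
            break
    # Chronological order between documents, then order within a document.
    return sorted(summary, key=lambda c: (c[1], c[2]))


# Toy candidates (assumption): (score, document timestamp, position, tokens).
candidates = [
    (0.9, 2, 0, ["lab", "opens", "india"]),
    (0.8, 1, 1, ["lab", "opens", "bangalore"]),
    (0.7, 1, 0, ["research", "cryptography"]),
]
summary = select_non_redundant(candidates)
```

Here the second candidate shares two of its three terms with the first pick and is dropped, and the two survivors are emitted oldest document first.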


11 Example: the topic for summary generation is “Microsoft to open research lab in India”. Eight articles published in different news sources form the news cluster. The example shows the condensed summary (100 words) for two users: User A is from the NLP domain and User B is from the network security domain. The italic text in a user-specific summary shows the difference compared to the generic summary.

12 Generic summary: The New Lab, Called Microsoft Research India, Goes Online In January, And Will Be Part Of A Network Of Five Research Labs That Microsoft Runs Worldwide, Said Padmanabhan Anandan, Managing Director Of Microsoft Research India. Microsoft’s Mission India, Formally Inaugurated Jan. 12, 2005, Is Microsoft’s Third Basic Research Facility Established Outside The United States. In Line With Microsoft’s Research Strategy Worldwide, The Bangalore Lab Will Collaborate With And Fund Research At Key Educational Institutions In India, Such As The Indian Institutes Of Technology, Anandan Said. Although Microsoft Research Doesn’t Engage In Product Development Itself, Technologies Researchers Create Can Make Their Way Into The Products The Company

13 User A specific summary: The New Lab, Called Microsoft Research India, Goes Online In January, And Will Be Part Of A Network Of Five Research Labs That Microsoft Runs Worldwide, Said Padmanabhan Anandan, Managing Director Of Microsoft Research India. Microsoft’s Mission India, Formally Inaugurated Jan. 12, 2005, Is Microsoft’s Third Basic Research Facility Established Outside The United States. Microsoft Will Collaborate With The Government Of India And The Indian Scientific Community To Conduct Research In Indic Language Computing Technologies, This Will Include Areas Such As Machine Translation Between Indian Languages And English, Search And Browsing And Character Recognition. In Line With Microsoft’s Research Strategy Worldwide, The Bangalore Lab

14 User B specific summary: The New Lab, Called Microsoft Research India, Goes Online In January, And Will Be Part Of A Network Of Five Research Labs That Microsoft Runs Worldwide, Said Padmanabhan Anandan, Managing Director Of Microsoft Research India. The Newly Announced India Research Group Focuses On Cryptography, Security, Algorithms And Multimedia Security, Ramarathnam Venkatesan, A Leading Cryptographer At Microsoft Research In Redmond, Washington, In The US, Will Head The New Group. Microsoft Research India will conduct a four-week summer school featuring lectures by leading experts in the fields of cryptography, algorithms and security. The program is aimed at senior undergraduate students, graduate students and faculty

15 Evaluation: the technique was evaluated with five research scholars working in different fields of computer science. News articles from the science and technology domain were considered for summarization. 25 different topics were chosen, each with 5-10 articles. Each researcher was asked to judge the relevance of both versions of the summaries for all 25 topics (score 1-5). The results show that users prefer profile-based personalized summaries to the generic summary produced by a general automatic summarization system.


17 Evaluation: this figure shows the scores given by a particular user across different topics. For most topics, the user finds the personalized summaries relevant. Personalized summaries for topics strongly related to the user’s domain are more relevant to him. For topics that are not closely related to the user’s field, the personalized and generic summaries are quite similar. For a few rare topics, the user did not find the personalized summary better.

18 Evaluation

19 Article 2: User-Oriented Document Summarization through Vision-Based Eye-Tracking, 2009 ACM, Songhua Xu, Hao Jiang, Francis C.M. Lau

20 Main idea: the key idea is to rely on the attention (reading) time that individual users spend on single words in a document. The algorithm tracks a user’s attention times over individual words using a vision-based commodity eye-tracking mechanism. The prediction of user attention over every word in a document is based on the user’s attention during previous reads: the attention time over any arbitrary word is predicted by a data mining process. The setup uses a simple web camera and an existing eye-tracking algorithm from the Opengazer project. The error of the detected gaze location on the screen is 1-2 cm, depending on which area of the screen the user is looking at (on a 19” monitor).

21 Anchoring gaze samples onto individual words: the detected gaze central point is positioned at (x, y) in screen space. For each word, compute its central displaying point, denoted (xi, yi). The assignment also uses the average width and height of a word’s displayed bounding box in the document. Each gaze sample detected by the eye-tracking module is assigned fractionally to the words in the document in this manner. The overall attention that a word receives is the sum of all the fractional gaze samples assigned to it in this process. Stop words are removed during processing.

22 Prediction of user attention over a sentence: attention time prediction for a word is based on the semantic similarity between words. Sim(wi, wj) denotes the semantic similarity between word wi and word wj, where Sim(wi, wj) ∈ [0, 1]. It is computed with the algorithm proposed in: Y. Li, Z. A. Bandar, and D. McLean. An approach for measuring semantic similarity between words using multiple information sources. IEEE Transactions on Knowledge and Data Engineering. For an arbitrary word w that is not among the sampled words wi (i = 1, …, n), calculate the similarity between w and every wi, and then select the k words that share the highest semantic similarity with w (k is set to min(10, n)).
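
The prediction formula itself is missing from this transcript. The Python sketch below therefore assumes a similarity-weighted average over the k nearest sampled words, which is only one natural way to use the selected neighbors. `toy_sim` is a hypothetical lookup table standing in for the Li, Bandar and McLean measure, and the attention times are made up.

```python
def predict_attention(word, sampled, sim, k_max=10):
    """Predict attention time for a word from its k most similar sampled words.

    sampled: dict mapping a word to its observed attention time.
    sim: callable (w1, w2) -> similarity in [0, 1].
    """
    if word in sampled:
        return sampled[word]  # an observed sample needs no prediction
    k = min(k_max, len(sampled))
    neighbors = sorted(sampled, key=lambda w: sim(word, w), reverse=True)[:k]
    total_sim = sum(sim(word, w) for w in neighbors)
    if total_sim == 0:
        return 0.0
    # Assumed combination rule: similarity-weighted average of neighbor times.
    return sum(sim(word, w) * sampled[w] for w in neighbors) / total_sim


# Hypothetical similarity table, used only to make the sketch runnable.
_TOY = {("encryption", "security"): 0.8, ("encryption", "travel"): 0.2}


def toy_sim(a, b):
    return _TOY.get((a, b), _TOY.get((b, a), 0.0))


sampled = {"security": 2.0, "travel": 1.0}
predicted = predict_attention("encryption", sampled, toy_sim)
```

An unseen word thus inherits most of its predicted attention from the sampled words it is semantically closest to.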

23 Predicting user attention for sentences: the total attention of a certain user on a sentence is estimated as the sum of the user’s attention over all the words in the sentence. AT(wi, Uj) is user Uj’s attention over the word wi, which is either sampled from the user’s previous reading activities via (1) or predicted via (2). Each word’s term carries a coefficient: 0 if the word wi is a stop word; 0.6 if there is no attention sample for user Uj over the word wi; and 1 otherwise.
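
A minimal Python sketch of this weighted sum is given below. It assumes the coefficient multiplies the word's attention time inside the sum, since the formula itself is not preserved here; the stop-word list, the function name, and the toy values are hypothetical.

```python
STOP_WORDS = {"the", "a", "of", "in"}  # toy list (assumption)


def sentence_attention(sentence, sampled, predicted):
    """Sum of coefficient * AT(w) over the words of a sentence.

    sampled: word -> attention time observed for this user.
    predicted: word -> attention time predicted for unsampled words.
    """
    total = 0.0
    for w in sentence:
        if w in STOP_WORDS:
            continue  # coefficient 0: stop words contribute nothing
        if w in sampled:
            total += 1.0 * sampled[w]  # coefficient 1: direct sample exists
        else:
            total += 0.6 * predicted.get(w, 0.0)  # coefficient 0.6: prediction
    return total


sampled = {"security": 2.0}
predicted = {"encryption": 1.5}
attention = sentence_attention(["the", "security", "encryption"], sampled, predicted)
```

The 0.6 coefficient discounts predicted attention relative to attention that was actually observed for the user.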


25 A hybrid summarization approach: early experiments showed that the performance of the user-oriented document summarization algorithm depends heavily on the amount of available user attention time samples. To address this issue, the new method is integrated with a conventional automatic document summarization algorithm (MEAD). The combination uses an indicator that equals 1 if sentence si is selected by MEAD in its document summarization result and 0 otherwise. k is a free parameter and is user tunable.

26 Experiment results: the document summarization results are compared with those generated by two popular text summarization algorithms. Two sets of articles are used: articles in the first set are all about science (60 articles from “Science” magazine), and articles in the second set are all about entertainment and leisure (60 articles randomly selected from the travel and sports sections of the “New York Times”). 12 people with different knowledge backgrounds read selected articles from the two sets and were then asked to provide a summary of each article they had just read.

27 Experiment results: performance is measured with three metrics: Recall (R), Precision (P), and F-rate (F). SUe is the human summary result.
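
The metric definitions are not preserved in this transcript. The Python sketch below uses the standard set-based forms and assumes that summaries are compared as sets of selected sentences and that F-rate is the harmonic mean of P and R; the function name and the toy sentence ids are hypothetical.

```python
def precision_recall_f(system_summary, human_summary):
    """Standard P, R and F over sets of selected sentence ids."""
    system, human = set(system_summary), set(human_summary)
    hits = len(system & human)
    precision = hits / len(system) if system else 0.0
    recall = hits / len(human) if human else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_rate = 2 * precision * recall / (precision + recall)
    return precision, recall, f_rate


# Toy example (assumption): sentence ids chosen by the system and by a reader.
p, r, f = precision_recall_f([1, 2, 3, 4], [2, 3, 5])
```

Here two of the four system sentences match the human summary, so precision is 1/2, recall is 2/3, and the F-rate is 4/7.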

28 Experiment results

29 Experiment results: an experiment evaluates the performance of the hybrid approach under different settings of the parameter k.


31 Article 3: Web-Page Summarization Using Clickthrough Data, 2005 ACM, JianTao Sun, Dou Shen, HuaJun Zeng, Qiang Yang, Yuchang Lu, Zheng Chen. Main idea: use the extra knowledge in clickthrough data to improve Web-page summarization. A collection of clickthrough data can be represented by a set of triples. Typically, a user’s query words reflect the true meaning of the target Web-page content. The new algorithm adapts two text-summarization methods to summarize Web pages: the first is based on significant-word selection adapted from Luhn’s method, and the second is based on Latent Semantic Analysis (LSA).

32 Problems: Web pages may have no associated query words, and the clickthrough data are often very noisy. Solution: a thematic lexicon, built using an annotated hierarchical taxonomy of Web pages such as the one provided by the ODP Web site (http://dmoz.org/).

33 Adapted Significant Word (ASW) method: each sentence is assigned a significance factor (based on word frequency), and the sentences with high significance factors are selected to form the summary. The method uses a customized significance factor. Adapted Latent Semantic Analysis (ALSA) method: the corpus can be represented by a term-document matrix.

34 Summarizing Web pages not covered by clickthrough data: build a thematic lexicon. TS(c) represents the set of terms associated with category c, and the thematic lexicon is the set of TS corresponding to the categories in ODP. The lexicon is built as follows. First, the TS of each category is set to empty. Then, for each page covered by the clickthrough data, its query words are added to the TS of its category; if a page belongs to more than one category, its query terms are added to the TS of all its categories. Finally, the term weights in each TS are multiplied by the Inverse Category Frequency (ICF). For each Web page that is not covered by the clickthrough data, first look up the lexicon for the TS of the page’s category; the summarization methods are then applied. When a TS does not have sufficient terms, the TS of its parent category is used.
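
The lexicon-building steps can be sketched in Python as follows. This is a sketch under two assumptions the slide does not confirm: the initial term weight is the raw count of the query term, and ICF is log(number of categories / number of categories containing the term). The function name and the toy clickthrough records are hypothetical.

```python
import math
from collections import Counter, defaultdict


def build_thematic_lexicon(clicks, page_categories):
    """Build TS(c) for every category from clickthrough records.

    clicks: iterable of (query_terms, page) pairs.
    page_categories: page -> list of categories the page belongs to.
    """
    ts = defaultdict(Counter)  # every TS starts out empty
    for query_terms, page in clicks:
        # A page in several categories feeds the TS of each of them.
        for category in page_categories.get(page, []):
            ts[category].update(query_terms)
    n_categories = len(ts)
    # Number of categories whose TS contains each term.
    category_freq = Counter(t for terms in ts.values() for t in terms)
    return {
        category: {
            t: count * math.log(n_categories / category_freq[t])
            for t, count in terms.items()
        }
        for category, terms in ts.items()
    }


# Toy clickthrough data (assumption).
clicks = [(["python", "tutorial"], "p1"), (["python", "snake"], "p2")]
page_categories = {"p1": ["Computers"], "p2": ["Science", "Computers"]}
lexicon = build_thematic_lexicon(clicks, page_categories)
```

Under this ICF definition a term that appears in every category gets weight zero, so only category-specific terms such as "tutorial" keep a positive weight.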

35 Experiments: the data set contains about 44.7 million records covering 29 days, from Dec 6, 2003 to Jan 3, 2004 (MSN search engine). 3,074,678 Web pages of the ODP directory were crawled; in the end 1,125,207 Web pages were obtained, 260,763 of which were clicked by Web users using 1,586,472 different queries. DAT1 consists of 90 pages selected from the browsed pages; three human evaluators were employed to summarize these pages. A relatively large-scale data set, denoted DAT2 (10,000 pages), is also used to evaluate the summarization methods.

36 Summarization results on DAT1 (ASW). ROUGE is a software package adopted by DUC for automatic summarization evaluation (http://www.isi.edu/~cyl/ROUGE/).

37 Summarization results on DAT1 (ALSA)

38 Evaluation of the summarization method using the thematic lexicon: the clickthrough data contain only 260,763 pages, and the lexicon contains 141,869 categories, which is a subset of the ODP category structure. If the terms under a page’s category have more than P% overlap with the distinct terms in the Web page, they are used for summarization. Otherwise, the lexicon terms of the parent category are used. This process continues until a category that covers enough query terms is found or the root of the thematic lexicon is reached.
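
The fallback walk up the category tree can be sketched in Python as below. It assumes that "overlap" means the share of the page's distinct terms that also appear in the category's term set, and that the root is the category with no entry in the `parent` map; the function name, the threshold value, and the toy taxonomy are hypothetical.

```python
def lexicon_terms_for_page(page_terms, category, lexicon, parent, min_overlap=0.1):
    """Walk up the taxonomy until a category covers enough of the page's terms.

    lexicon: category -> collection of lexicon terms.
    parent: category -> parent category (the root has no entry).
    min_overlap: the P% threshold, expressed as a fraction.
    """
    distinct = set(page_terms)
    current = category
    while True:
        terms = set(lexicon.get(current, ()))
        overlap = len(terms & distinct) / len(distinct) if distinct else 0.0
        # Stop when coverage is sufficient or the root has been reached.
        if overlap > min_overlap or current not in parent:
            return current, terms
        current = parent[current]


# Toy taxonomy and lexicon (assumption).
parent = {"Top/Computers/Security": "Top/Computers", "Top/Computers": "Top"}
lexicon = {
    "Top/Computers/Security": {"firewall"},
    "Top/Computers": {"software", "lab", "research"},
    "Top": {"news"},
}
page_terms = ["lab", "research", "india", "microsoft"]
used_category, used_terms = lexicon_terms_for_page(
    page_terms, "Top/Computers/Security", lexicon, parent
)
```

In this toy case the page's own category covers none of its terms, so the walk stops one level up, where half of the page's distinct terms are covered.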

39 Evaluation of the summarization method using the thematic lexicon (ASW)

40 Evaluation of the summarization method using the thematic lexicon (ALSA)

41 Thanks

