By: Asef Poormasoomi. Supervisor: Dr. Kahani. Autumn 2010, Ferdowsi University of Mashhad.

Presentation transcript:

Introduction
Summary: a brief but accurate representation of the contents of a document.

Is this the best we can do?
Motivation
- Abstracts for scientific and other articles
- News summarization (mostly multiple-document summarization)
- Classification of articles and other written data
- Web pages for search engines
- Web access from PDAs and cell phones
- Question answering and data gathering

- Extract vs. abstract: lists fragments of text vs. re-phrases content coherently. Example: "He ate banana, orange and apple" => "He ate fruit".
- Generic vs. query-oriented: provides the author's view vs. reflects the user's interest. Example: question answering system.
- Personal vs. general: considers the reader's prior knowledge vs. general.
- Single-document vs. multi-document source: based on one text vs. fuses together many texts.
- Input: text, video, image, map
- Genres

Methods
- Statistical scoring methods (Pseudo)
- Higher semantic/syntactic structures
- Network (graph) based methods
- Semantic based methods (LSA, ontology, WordNet)
- Other methods (rhetorical analysis, lexical chains, co-reference chains)
- AI methods

Statistical scoring (Pseudo)
General method:
1. Score each entity (sentence, word)
2. Combine the scores
3. Choose the best sentence(s)
Scoring techniques:
- Word frequencies throughout the text (Luhn 58)
- Position in the text (Edmundson 69, Lin & Hovy 97)
- Title method (Edmundson 69)
- Cue phrases in sentences (Edmundson 69)
- Bayesian classifier (Kupiec et al. 95)
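The general method above (score, combine, choose) can be sketched as follows. This is a minimal illustration, not the implementation used in the presentation: the stop-word list, the equal weights, the 1/(position) heuristic, and all function names are assumptions.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not from the slides).
STOPWORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "it"}


def tokenize(text):
    """Lowercase the text and keep alphabetic words that are not stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def score_sentences(sentences, title="", w_freq=1.0, w_pos=1.0, w_title=1.0):
    """Step 1 and 2: score each sentence with three features and combine them."""
    freq = Counter(w for s in sentences for w in tokenize(s))
    title_words = set(tokenize(title))
    scores = []
    for idx, sentence in enumerate(sentences):
        words = tokenize(sentence)
        # Word-frequency feature: average document frequency of the words.
        freq_score = sum(freq[w] for w in words) / len(words) if words else 0.0
        # Position feature: earlier sentences get a higher score.
        pos_score = 1.0 / (idx + 1)
        # Title feature: number of distinct words shared with the title.
        title_score = len(title_words & set(words))
        scores.append(w_freq * freq_score + w_pos * pos_score + w_title * title_score)
    return scores


def summarize(sentences, n=1, **weights):
    """Step 3: return the n best-scoring sentences in their original order."""
    scores = score_sentences(sentences, **weights)
    top = sorted(range(len(sentences)), key=lambda i: -scores[i])[:n]
    return [sentences[i] for i in sorted(top)]


doc = [
    "Cats chase mice.",
    "Dogs bark.",
    "Cats and mice live in the barn with cats.",
]
print(summarize(doc, n=1))
```

The weights make the combination step explicit; a trained classifier such as the Bayesian approach listed above would replace this fixed linear combination.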

Methods
- Statistical scoring methods. Problems:
  - Synonymy: one concept can be expressed by different words. Example: "cycle" and "bicycle" refer to the same kind of vehicle.
  - Polysemy: one word or concept can have several meanings. Example: "cycle" could mean life cycle or bicycle.
  - Phrases: a phrase may have a meaning different from the words in it. An alleged murderer is not a murderer (Lin and Hovy 1997).
- Higher semantic/syntactic structures
- Network (graph) based methods
- Other methods (rhetorical analysis, lexical chains, co-reference chains)
- AI methods

LSI based summarization (Gong, 2001)
- Make the term-sentence matrix
- Apply SVD to the term-sentence matrix
Problem: TF-ISF cannot show context and relations correctly
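A minimal sketch of the two steps above, following the selection rule usually attributed to Gong and Liu: for each of the leading right singular vectors of the term-sentence matrix, pick the sentence with the largest component. To stay dependency-free, the right singular vectors are approximated by power iteration with deflation on A^T A; a library SVD would normally be used instead. The function names, the use of absolute values (to sidestep the sign ambiguity of singular vectors), and the toy matrix are assumptions for illustration.

```python
import math


def mat_vec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def leading_eigenpair(b, iterations=200):
    """Power iteration on a symmetric positive semi-definite matrix.

    Assumes the all-ones start vector is not orthogonal to the leading
    eigenvector. Returns (eigenvalue, unit eigenvector).
    """
    n = len(b)
    v = [1.0 / math.sqrt(n)] * n
    eigenvalue = 0.0
    for _ in range(iterations):
        w = mat_vec(b, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:  # nothing left to extract
            return 0.0, v
        v = [x / norm for x in w]
        eigenvalue = norm
    return eigenvalue, v


def select_sentences(term_sentence, k):
    """Pick up to k sentence indices, one per leading latent concept.

    term_sentence[t][s] is the weight of term t in sentence s.
    """
    n = len(term_sentence[0])
    # B = A^T A; its eigenvectors are the right singular vectors of A.
    b = [[sum(row[i] * row[j] for row in term_sentence) for j in range(n)]
         for i in range(n)]
    chosen = []
    for _ in range(min(k, n)):
        eigenvalue, v = leading_eigenpair(b)
        if eigenvalue < 1e-9:
            break
        # Sentence with the largest component that is not selected yet.
        ranked = sorted(range(n), key=lambda i: -abs(v[i]))
        chosen.append(next(i for i in ranked if i not in chosen))
        # Deflation: remove the concept just used.
        b = [[b[i][j] - eigenvalue * v[i] * v[j] for j in range(n)]
             for i in range(n)]
    return chosen


# Toy matrix: 4 terms x 3 sentences. Sentences 0 and 1 share a topic,
# sentence 2 covers a different one.
A = [
    [2, 1, 0],
    [2, 1, 0],
    [0, 0, 1],
    [0, 0, 1],
]
print(select_sentences(A, 2))
```

On the toy matrix the first concept is dominated by sentence 0 and the second by sentence 2, so one sentence per topic is selected.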

Proposed Approach
1. Preprocessing (tokenizing, stop-word removal, stemming)
2. Extract context (use LSA on the term-document matrix)
3. Extract perspective (SRL and WordNet)
4. Summary generation

Proposed Approach
Preprocessing
- Tokenize and remove stop words
- Stem and build the term-document matrix A
Extract Context
- Apply SVD to A and use the matrix U (term-concept)
- Calculate the cosine distance between concepts and documents
- Calculate the cosine distance between sentences and the concept of each topic
- Rank sentences
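The last two steps (cosine comparison against a topic's concept vector, then ranking) might look like the sketch below. It assumes sentences and the concept are already expressed as dense vectors in the same term space; the vectors and function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_sentences(sentence_vectors, concept_vector):
    """Return sentence indices ordered from most to least similar to the concept."""
    sims = [cosine(s, concept_vector) for s in sentence_vectors]
    return sorted(range(len(sims)), key=lambda i: -sims[i])


# Hypothetical topic concept (e.g. one column of U) and three sentence vectors.
concept = [1.0, 1.0, 0.0]
sentences = [
    [1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
print(rank_sentences(sentences, concept))
```

Cosine distance is simply one minus this similarity, so ranking by highest similarity and ranking by lowest distance give the same order.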

Proposed Approach
Extract Perspective
- Use SRL and WordNet for sentence similarity
Cosine distance problem:
S1 = United States Army, successfully tested an anti-missile defense system.
S2 = U.S. military projectile interceptor, streaked into space and hit the target.
S3 = Iran's weekend test of a long-range missile underscored the need for a U.S. national missile defense system.
Semantic similarity:
S1 = [subject: United States Army], [AM-MNR: successfully] [verb: tested] [object: an anti-missile defense system].
Summary Generation
- Remove redundancy and rank sentences
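One way to read the cosine distance problem is that a purely lexical measure only rewards shared words. The sketch below computes bag-of-words cosine similarity for the three example sentences; the tokenizer and function names are assumptions, and no stemming or stop-word removal is applied.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Count lowercase alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-count vectors of two texts."""
    a, b = bag_of_words(text_a), bag_of_words(text_b)
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


s1 = "United States Army, successfully tested an anti-missile defense system."
s2 = "U.S. military projectile interceptor, streaked into space and hit the target."
s3 = ("Iran's weekend test of a long-range missile underscored the need "
      "for a U.S. national missile defense system.")

print("S1 vs S2:", cosine_similarity(s1, s2))
print("S1 vs S3:", cosine_similarity(s1, s3))
```

S1 and S2 share no surface tokens, so their lexical similarity is zero even though both describe a U.S. military interceptor test, while S1 and S3 overlap on "missile", "defense", and "system". This is the gap that role labels and WordNet relations are meant to close.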

Evaluation Tools & Summarization Systems
ROUGE: Recall-Oriented Understudy for Gisting Evaluation
- Types: ROUGE-N, ROUGE-L, ROUGE-W, ROUGE-S, ROUGE-SU
Summarization systems:
- MEAD: Chinese, English, Japanese, Dutch
- DMSumm (http://www.icmc.usp.br/~taspardo/DMSumm.htm): Portuguese, English
- SweSum (Martin Hassel): English, German, Italian, Spanish, Greek, ...
- FarsiSum (Nima Mazdak, Martin Hassel)
- SUMMARIST
- PERSIVAL
- GLEANS
- SumUM
- RIPTIDES
- NTT
- GISTSumm
- GISTexter
- DiaSumm
- NeATS
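As an illustration of the ROUGE-N type listed above, the sketch below computes n-gram recall of a candidate summary against a single reference. It is a simplified stand-in, not the official ROUGE toolkit: it assumes whitespace tokenization, one reference, and no stemming or stop-word handling.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=2):
    """Fraction of reference n-grams that also appear in the candidate.

    Counts are clipped, so a repeated n-gram is only credited as often
    as it occurs in the candidate.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


reference = "the cat sat on the mat"
candidate = "the cat sat on a mat"
print(rouge_n_recall(candidate, reference, n=1))
print(rouge_n_recall(candidate, reference, n=2))
```

With n=2 this corresponds to the ROUGE-2 recall idea; ROUGE-SU4 additionally counts skip-bigrams with a gap of up to four words plus unigrams, which this sketch does not cover.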

References
[1] I. Mani. Automatic Summarization. John Benjamins Publishing Company.
[2] Yeh, J. Y., Ke, H. R., Yang, W. P., & Meng, I. H. Text summarization using a trainable summarizer and latent semantic analysis. Information Processing and Management, 41, 75-95.
[3] Gong, Y., & Liu, X. Generic text summarization using relevance measure and latent semantic analysis. In Proceedings of the 24th annual international ACM SIGIR conference on research and development in information retrieval, SIGIR'01, New Orleans.
[4] Steinberger, J., Kabadjov, M. A., Poesio, M., & Sanchez-Graillet, O. Improving LSA-based summarization with anaphora resolution. In Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing.
[5] Yu, H. News summarization based on semantic similarity measure. Ninth International Conference on Hybrid Intelligent Systems, vol. 1.
[6] C. H. Papadimitriou, P. Raghavan, H. Tamaki, and S. Vempala. Latent semantic indexing: A probabilistic analysis. J. Comput. Syst. Sci., 61(2).
[7] C.-Y. Lin and E. Hovy. Automatic evaluation of summaries using n-gram co-occurrence statistics. In Proceedings of HLT-NAACL.
[8] Nomoto, T., & Yuji, M. A new approach to unsupervised text summarization. In Proceedings of the 24th annual international ACM SIGIR conference on research and development in information retrieval, SIGIR'01, New Orleans, Louisiana, United States.
[9] J. Lee, S. Park, C. Ahn, D. Kim. Automatic generic document summarization based on non-negative matrix factorization. Information Processing and Management.
[10] Steinberger, J., Poesio, M., Kabadjov, M. A., & Ježek, K. Two uses of anaphora resolution in summarization. Information Processing and Management: an International Journal, vol. 43, November.
[11] D. Wang, T. Li, S. Zhu, C. Ding. Multi-document summarization via sentence-level semantic analysis and symmetric matrix factorization. SIGIR'08, July 2008, Singapore.
[12] V. Gupta, G. S. Lehal. A Survey of Text Summarization Extractive Techniques. Journal of Emerging Technologies in Web Intelligence, August 2010.

thanks

Dataset Specifications
Document Understanding Conferences (DUC 2007 main task), AQUAINT corpus
- Sources: Associated Press and New York Times ( ) & Xinhua News Agency ( )
- 1125 documents in total: 45 topics, 25 documents in each topic
- Term statistics: terms; terms without stop words (S.W); terms after stemming and without stop words
- Evaluation measures: ROUGE-2, ROUGE-SU4
- 32 system summarizers
- Each topic has 4 human summaries. Ten NIST assessors wrote summaries for the 45 topics in the DUC 2007 main task.

Experimental Results: recall on ROUGE-2 (average result on 3 topics)

Experimental Results: recall on ROUGE-SU4 (average result on 3 topics)

The Best …
Topic: US missile defense system
Table columns: Word, Result, Systems, Evaluation (ROUGE, ROUGE-SU4)