Processing of large document collections Part 3 (Evaluation of text classifiers, applications of text categorization) Helena Ahonen-Myka Spring 2005.

2 Evaluation of text classifiers
evaluation of document classifiers is typically conducted experimentally, rather than analytically
reason: in order to evaluate a system analytically, we would need a formal specification of the problem that the system is trying to solve
–text categorization is non-formalisable

3 Evaluation
the experimental evaluation of a classifier usually measures its effectiveness (rather than its efficiency)
–effectiveness = ability to take the right classification decisions
–efficiency = time and space requirements

4 Evaluation
after a classifier is constructed using a training set, its effectiveness is evaluated using a test set
the following counts are computed for each category c_i:
–TP_i: true positives
–FP_i: false positives
–TN_i: true negatives
–FN_i: false negatives

5 Evaluation
TP_i: true positives w.r.t. category c_i
–the set of documents that both the classifier and the previous judgments (as recorded in the test set) classify under c_i
FP_i: false positives w.r.t. category c_i
–the set of documents that the classifier classifies under c_i, but the test set indicates that they do not belong to c_i

6 Evaluation
TN_i: true negatives w.r.t. c_i
–both the classifier and the test set agree that the documents in TN_i do not belong to c_i
FN_i: false negatives w.r.t. c_i
–the classifier does not classify the documents in FN_i under c_i, but the test set indicates that they should be classified under c_i

7 Evaluation measures
precision w.r.t. c_i: Precision_i = TP_i / (TP_i + FP_i)
recall w.r.t. c_i: Recall_i = TP_i / (TP_i + FN_i)
(TP_i, FP_i, FN_i and TN_i form the contingency table between the classifier's decisions, ”classified c_i”, and the test-set judgments, ”test class c_i”)
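
As a minimal sketch of these definitions (not from the slides; all names are illustrative), the four counts and the two measures can be computed in Python from boolean decision vectors:

def category_counts(predicted, actual):
    """predicted/actual: boolean vectors, one entry per test document,
    telling whether the document is classified under c_i (resp. truly
    belongs to c_i according to the test set)."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(not p and a for p, a in zip(predicted, actual))
    tn = sum(not p and not a for p, a in zip(predicted, actual))
    return tp, fp, fn, tn

def precision(tp, fp):
    # Precision_i = TP_i / (TP_i + FP_i); 0 when nothing was classified under c_i
    return tp / (tp + fp) if tp + fp > 0 else 0.0

def recall(tp, fn):
    # Recall_i = TP_i / (TP_i + FN_i); 0 when c_i has no positive test documents
    return tp / (tp + fn) if tp + fn > 0 else 0.0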

8 Evaluation measures
for obtaining estimates of precision and recall in the collection as a whole, two different methods may be adopted:
–microaveraging: the counts of true positives, false positives and false negatives for all categories are first summed up; precision and recall are then calculated from the global values
–macroaveraging: the average of precision (recall) over the individual categories
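
A sketch of the two averaging methods, reusing category_counts, precision and recall from the previous sketch; decisions is assumed to hold one (predicted, actual) pair of boolean vectors per category:

def microaveraged(decisions):
    # sum the counts over all categories, then compute the measures once
    TP = FP = FN = 0
    for predicted, actual in decisions:
        tp, fp, fn, _ = category_counts(predicted, actual)
        TP, FP, FN = TP + tp, FP + fp, FN + fn
    return precision(TP, FP), recall(TP, FN)

def macroaveraged(decisions):
    # compute the measures per category, then average the measures
    counts = [category_counts(p, a) for p, a in decisions]
    ps = [precision(tp, fp) for tp, fp, _, _ in counts]
    rs = [recall(tp, fn) for tp, _, fn, _ in counts]
    return sum(ps) / len(ps), sum(rs) / len(rs)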

9 Evaluation measures
microaveraging and macroaveraging may give quite different results if the categories differ greatly in generality
–e.g. the ability of a classifier to behave well also on categories with low generality (i.e. categories with few positive training instances) will be emphasized by macroaveraging
the choice between them depends on the application

10 Combined effectiveness measures
neither precision nor recall makes sense in isolation from the other
the trivial acceptor (a classifier that classifies each document under each category) has recall = 1
–in this case, precision would usually be very low
higher levels of precision may be obtained at the price of lower values of recall

11 Trivial acceptor
the trivial acceptor classifies every test document under c_i, so FN_i = 0 and TN_i = 0
–Recall_i = TP_i / (TP_i + FN_i) = 1
–Precision_i = TP_i / (TP_i + FP_i), i.e. the fraction of test documents that actually belong to c_i
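
A small worked example (numbers invented for illustration): if the test set has 100 documents of which 5 belong to c_i, the trivial acceptor yields TP_i = 5, FP_i = 95 and FN_i = 0, so Recall_i = 5/5 = 1 but Precision_i = 5/100 = 0.05.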

12 Combined effectiveness measures
a classifier should be evaluated by means of a measure which combines recall and precision
some combined measures:
–11-point average precision
–the breakeven point
–the F1 measure

13 11-point average precision
in constructing the classifier, the threshold is repeatedly tuned so as to allow recall (for the category) to take the values 0.0, 0.1, …, 0.9, 1.0
precision (for the category) is computed at these 11 values of recall, and the 11 resulting precision values are averaged
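
A sketch of how the 11 values could be computed from per-document classifier scores (an assumed setup; the slides do not fix details such as interpolation, so the common choice of interpolated precision, the maximum precision at any recall at or above each level, is used here):

def eleven_point_average(scores, actual):
    """scores: one classifier score per test document; actual: booleans
    telling whether each document truly belongs to the category."""
    total_pos = sum(actual)
    if total_pos == 0:
        return 0.0
    # sweep the threshold by walking down the score-ranked documents,
    # recording (recall, precision) after each rank
    ranked = sorted(zip(scores, actual), key=lambda pair: -pair[0])
    points, tp = [], 0
    for rank, (_, is_pos) in enumerate(ranked, start=1):
        if is_pos:
            tp += 1
        points.append((tp / total_pos, tp / rank))
    # interpolated precision at recall levels 0.0, 0.1, ..., 1.0
    levels = [i / 10 for i in range(11)]
    precisions = [max((p for r, p in points if r >= level), default=0.0)
                  for level in levels]
    return sum(precisions) / len(precisions)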

15 Recall-precision curve
[figure: precision (y-axis, 0%–100%) plotted against recall (x-axis, 0%–100%)]

16 Breakeven point
the process is analogous to the one used for 11-point average precision
–precision as a function of recall is computed by repeatedly varying the thresholds
the breakeven point is the value at which precision equals recall
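
A sketch of approximating the breakeven point by sweeping the decision threshold over the scores, reusing the helpers from the earlier sketches; when precision and recall never coincide exactly, the mean of the closest pair is returned as an interpolated breakeven:

def breakeven_point(scores, actual):
    best_gap, best_value = None, 0.0
    for t in sorted(set(scores)):
        predicted = [s >= t for s in scores]   # one threshold setting
        tp, fp, fn, _ = category_counts(predicted, actual)
        p, r = precision(tp, fp), recall(tp, fn)
        if best_gap is None or abs(p - r) < best_gap:
            best_gap, best_value = abs(p - r), (p + r) / 2
    return best_value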

17 F1 measure
the F1 measure is defined as the harmonic mean of precision (π) and recall (ρ):
F1 = 2πρ / (π + ρ)
the breakeven point of a classifier is always less than or equal to its F1 value
for the trivial acceptor, π ≈ 0 and ρ = 1, so F1 ≈ 0

18 Effectiveness
once an effectiveness measure is chosen, a classifier can be tuned (e.g. thresholds and other parameters can be set) so that the resulting effectiveness is the best achievable by that classifier

19 Evaluation measures
efficiency (= time and space requirements)
–seldom used, although important for real-life applications
–difficult: environment parameters change
–two parts:
 training efficiency = average time it takes to build a classifier for a category from a training set
 classification efficiency = average time it takes to classify a new document under a category
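
Where timing is wanted, a minimal sketch of measuring both quantities (train and classify stand for whatever the classifier under study provides; all names here are hypothetical):

import time

def measure_efficiency(train, classify, training_set, test_set):
    """Returns (training time, average classification time per document),
    both as wall-clock seconds."""
    t0 = time.perf_counter()
    model = train(training_set)            # hypothetical training call
    training_time = time.perf_counter() - t0
    t0 = time.perf_counter()
    for document in test_set:              # classify every test document
        classify(model, document)
    avg_classification = (time.perf_counter() - t0) / len(test_set)
    return training_time, avg_classification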

20 Conducting experiments
in general, different sets of experiments may be used for cross-classifier comparison only if the experiments have been performed
–on exactly the same collection (same documents and same categories)
–with the same split between training set and test set
–with the same evaluation measure

21 Applications of text categorization
automatic indexing for Boolean information retrieval systems
document organization
text filtering
word sense disambiguation
authorship attribution
hierarchical categorization of Web pages

22 Automatic indexing for information retrieval systems
in an information retrieval system, each document is assigned one or more keywords or keyphrases describing its content
–the keywords belong to a finite set called a controlled dictionary
TC problem: the entries in a controlled dictionary are viewed as categories
–k1 ≤ x ≤ k2 keywords are assigned to each document (see the sketch below)
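
A sketch of the k1 ≤ x ≤ k2 constraint (illustrative names; the exact thresholding policy is an assumption):

def assign_keywords(scores, threshold, k1, k2):
    """scores: dict mapping keyword (category) -> classifier score
    for one document; keep every keyword above the threshold, but
    never fewer than k1 or more than k2."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    chosen = [kw for kw in ranked if scores[kw] >= threshold]
    if len(chosen) < k1:
        chosen = ranked[:k1]   # force at least k1 keywords
    return chosen[:k2]         # and at most k2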

23 Document organization
indexing with a controlled vocabulary is an instance of the general problem of document collection organization
e.g. a newspaper office has to classify incoming ”classified” ads under categories such as Personals, Cars for Sale, Real Estate, etc.
other examples: organization of patents, filing of newspaper articles, ...

24 Text filtering
classifying a stream of incoming documents sent by an information producer to an information consumer
e.g. a newsfeed
–producer: news agency; consumer: newspaper
–the filtering system should block the delivery of documents the consumer is likely not interested in

25 Word sense disambiguation
given an occurrence of an ambiguous word in a text, find the sense of this particular occurrence
e.g.
–bank, sense 1, as in ”Bank of Finland”
–bank, sense 2, as in ”the bank of the river Thames”
–occurrence: ”Last week I borrowed some money from the bank.”

26 Word sense disambiguation
indexing by word senses rather than by words
word sense disambiguation as text categorization (see the sketch below):
–documents: word occurrence contexts
–categories: word senses
the same approach applies to resolving other natural language ambiguities
–context-sensitive spelling correction, part-of-speech tagging, prepositional phrase attachment, word choice selection in machine translation
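
A sketch of turning word occurrences into ”documents” for a text categorizer; the window size and the whitespace tokenization are assumptions:

def occurrence_contexts(tokens, target, window=5):
    """tokens: a tokenized text; yields one context string per
    occurrence of `target`, to be classified into a sense category."""
    for i, tok in enumerate(tokens):
        if tok.lower() == target:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            yield " ".join(left + right)

# e.g. the slide's example occurrence:
# list(occurrence_contexts("Last week I borrowed some money from the bank".split(), "bank"))
# -> ["borrowed some money from the"]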

27 Authorship attribution
task: given a text, determine its author
the author of a text may be unknown or disputed, but some possible candidates and samples of their works exist
literary and forensic applications
–who wrote this sonnet? (literary interest)
–who sent this anonymous letter? (forensics)

28 Hierarchical categorization of Web pages
e.g. Yahoo!-like hierarchical Web catalogues
typically, each category should be populated by ”a few” documents
new categories are added and obsolete ones removed over time
the link structure can be used in classification
the hierarchical structure itself can also be exploited (see the sketch below)
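
A sketch of top-down classification through such a hierarchy; the node structure and the choose_child classifier are hypothetical:

def classify_hierarchically(document, root, choose_child):
    """root: a category node with a .children list (empty at leaves);
    choose_child(document, children) is a local classifier returning
    the most plausible child category. Walks from the root to a leaf."""
    node = root
    while node.children:
        node = choose_child(document, node.children)
    return node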