
1 Queensland University of Technology
Y. Li, A. Algarni, and N. Zhong, "Mining Positive and Negative Patterns for Relevance Feature Discovery," 16th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2010), Washington, DC. Presented by Prof. Yuefeng Li, Queensland University of Technology, Brisbane, QLD 4001, Australia.

2 Outline
Introduction: what relevance feature discovery is, term-based models, why patterns? Are patterns effective for relevance feature discovery? The new solution.
The Deploying Method: the definition; deploying higher-level patterns to low-level terms.
Low-level Features: specificity and exhaustivity; calculating the specificity score; classification rules; revising the weights of the low-level features.
Evaluation
Conclusion

3 Introduction Relevance is a fundamental concept:
Topical relevance: a document's relevance to a given query.
User relevance: a document's relevance to a user's information need.
The objective of relevance feature discovery is to find useful features in a training set for describing what users want.

4 Term Based Models Popular term-based IR models:
The Rocchio algorithm
Probabilistic models, e.g., Okapi BM25
Language models: model-based methods and relevance models
Their advantages: efficient computation and mature theories for term weighting. Phrases have also been used in some IR models, since phrases are more discriminative and carry more "semantics" than words. Many researchers have shown that phrases are useful, even crucial, for query expansion and for building good ranking functions.
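As a concrete point of reference for these baselines, here is a minimal Okapi BM25 scorer in Python (the +1 inside the log is the common non-negative IDF variant; k1 = 1.2 and b = 0.75 are conventional defaults, not values taken from the paper):

```python
import math

def bm25_score(query_terms, doc_terms, doc_freq, num_docs, avg_doc_len,
               k1=1.2, b=0.75):
    """Score one document (a list of terms) against a query with Okapi BM25.

    doc_freq maps a term to the number of documents containing it.
    """
    score, doc_len = 0.0, len(doc_terms)
    for t in query_terms:
        tf = doc_terms.count(t)          # term frequency in this document
        if tf == 0:
            continue
        df = doc_freq.get(t, 0)
        idf = math.log((num_docs - df + 0.5) / (df + 0.5) + 1)
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
    return score
```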

5 Why Patterns? The challenging issue in using phrases:
Finding useful phrases for text mining and classification: phrases naturally have inferior statistical properties to words, and many of them are redundant or noisy.
Patterns can be a promising alternative to phrases:
Like words, patterns enjoy good statistical properties.
Data mining has developed techniques for removing redundant and noisy patterns: maximal patterns, closed patterns, and master patterns (a small illustration of closed patterns follows below).
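To make "closed patterns" concrete, here is a small itemset-based sketch; note the paper mines closed sequential patterns over paragraphs, so this is a simplification:

```python
def closed_patterns(patterns):
    """Keep only closed patterns: frequent patterns with no proper
    super-pattern of equal support.

    patterns maps a frozenset of terms to its support count.
    """
    return {p: sup for p, sup in patterns.items()
            if not any(p < q and sup == sup_q       # proper superset, same support
                       for q, sup_q in patterns.items())}

freq = {frozenset({"a"}): 3, frozenset({"a", "b"}): 3, frozenset({"b"}): 4}
print(closed_patterns(freq))   # {a,b}:3 and {b}:4 survive; {a} is absorbed by {a,b}
```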

6 Are Patterns Effective for Relevance Feature Discovery?
Pattern Taxonomy Models (PTM) use closed sequential patterns and have shown some improvement in effectiveness.
Two challenging issues arise when using patterns in text mining:
The low-support problem: given a topic, large patterns are more specific to the topic but have low supports (low frequency); if we decrease the minimum support, many noisy patterns are discovered.
The misinterpretation problem: the measures used in pattern mining (e.g., "support" and "confidence") turn out to be unsuitable when using discovered patterns to answer what users want. For example, a highly frequent pattern is usually a general pattern for a topic.

7 The New Solution Features at two levels: low-level terms and higher-level patterns, where the higher-level features include both positive and negative patterns. An innovative approach evaluates the weights of terms according to both their specificity and their distributions in the higher-level features.

8 The Deploying Method What deploying is: evaluating the weights (supports) of low-level terms based on their distribution (appearances) in higher-level patterns, where the higher-level patterns are closed patterns that appear frequently in paragraphs.
It is a method for interpreting discovered patterns that provides a new way of weighting terms.
It is an efficient and effective way of using patterns to solve problems, especially large patterns.

9 How Deploying Works Let SP1, SP2, ..., SPn be the sets of discovered closed sequential patterns for all documents di ∈ D+ (i = 1, 2, …, n), where n = |D+|.

10 How Deploying Works cont.
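The formulas from this slide did not survive the transcript. As a stand-in, here is a minimal sketch of deploying under the assumption that each closed pattern spreads its weight evenly over its terms (a common normalization in pattern-taxonomy work; the paper's exact weighting function may differ):

```python
from collections import defaultdict

def deploy(pattern_sets):
    """Map higher-level patterns onto low-level term weights.

    pattern_sets holds SP_1, ..., SP_n: one list of closed sequential
    patterns per positive document, each pattern a tuple of terms.
    Every pattern contributes 1/|p| to each distinct term it contains.
    """
    support = defaultdict(float)
    for sp_i in pattern_sets:
        for p in sp_i:
            for t in set(p):
                support[t] += 1.0 / len(p)
    return dict(support)

# Toy run: two positive documents and their discovered patterns.
sp = [[("java", "jdk"), ("java", "language")],
      [("java", "jdk", "install")]]
print(deploy(sp))   # "java" receives the highest deployed weight
```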

11 Specificity of Low-Level Features
A term's specificity describes the extent to which the term focuses on the topic that users want. For example, "JDK" is more specific than "LIB" for describing "Java Programming Language". Basically, the specificity of a term can be based on its position in a concept hierarchy: terms in the upper part of the LCSH (Library of Congress Subject Headings) hierarchy are more general, and terms lower down are more specific. In many cases, a term's specificity depends on the topic under discussion. For example, "knowledge discovery" is a general term in the data mining community, but it may be a specific term when we talk about information technology as a whole.

12 Definition of Specificity
The concept of relevance is subjective. It is easy for human beings to judge relevance; however, it is very difficult to use such concepts to interpret relevance features in text documents. We define the specificity of a given term t in the training set D = D+ ∪ D- as follows:
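The formula itself appeared as an image on the slide and was lost. A reconstruction consistent with the surrounding definitions, offered as an assumption rather than the paper's exact form, compares how often t covers positive versus negative documents:

spe(t) = ( |{d ∈ D+ : t ∈ d}| - |{d ∈ D- : t ∈ d}| ) / |D+|

Under this form, spe(t) approaches 1 for terms appearing in every positive document and no negative one, is strongly negative for terms concentrated in D-, and stays near zero for general terms that appear on both sides.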

13 Classification Rules Based on the specificity score, the terms can be categorized into three groups using the following classification rules, where G is the set of general terms, T+ the specific positive terms, and T- the specific negative terms, ordered along the specificity axis as T- | G | T+. A sketch follows below.
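The concrete thresholds on the slide were lost; the sketch below uses a single symmetric cut-off theta, which is an assumed simplification (the paper may well use two separate thresholds):

```python
def classify_term(spe, theta=0.5):
    """Assign a term to T+ (specific positive), T- (specific negative),
    or G (general) by its specificity score; theta is an assumed cut-off."""
    if spe > theta:
        return "T+"
    if spe < -theta:
        return "T-"
    return "G"

# jdk is specific to the topic, lib is general, spam pulls the other way
for term, s in [("jdk", 0.8), ("lib", 0.1), ("spam", -0.7)]:
    print(term, "->", classify_term(s))   # jdk -> T+, lib -> G, spam -> T-
```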

14 Negative Feedback In general, negative and positive documents may share some background concepts or noisy knowledge. There are two main issues in using negative relevance feedback:
How to select constructive negative examples, reducing both the noise and the size of the negative example space?
How to use the selected negative examples to refine the knowledge discovered in D+?

15 The Selection of Constructive Negative Examples
Offender documents are negative documents that are most likely to be classified as positive.

16 How to select constructive negative examples?
Re-rank the negative examples using the low-level features extracted from positive feedback.
Select the top-K documents as offenders.
Extract higher-level patterns and low-level terms from the selected negative examples, using the same method as for mining the positive documents. A sketch of the ranking step follows below.
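A sketch of the selection step, with illustrative names that are not from the paper: score each negative document against the deployed positive term weights and keep the top K.

```python
def select_offenders(neg_docs, term_weights, K):
    """Return the K negative documents most likely to be mistaken for
    positive, ranked by the low-level features from positive feedback.

    neg_docs: list of documents, each a list of terms.
    """
    def score(doc):
        return sum(term_weights.get(t, 0.0) for t in doc)
    return sorted(neg_docs, key=score, reverse=True)[:K]
```

Of the three settings reported on slide 20, K = |D+|/2 gives the best MAP.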

17 Weight Revision of Low-level Features
Generally, the specific terms are more important; however, general terms are also necessary for describing what users want. For that reason, the paper increases the weights of the specific positive terms and reduces the weights of the specific negative terms; a sketch of this step follows below.
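The revision function itself was a formula image that did not survive; the sketch below captures only the stated intent (boost T+, penalize T-, leave G alone) with an assumed multiplicative scheme, not the paper's actual function:

```python
def revise_weights(weights, groups, boost=1.5, penalty=1.5):
    """Revise deployed term weights using the groups from the
    classification rules; boost/penalty factors are assumptions."""
    revised = {}
    for t, w in weights.items():
        g = groups.get(t, "G")
        if g == "T+":
            revised[t] = w * boost               # lift specific positive terms
        elif g == "T-":
            revised[t] = w - penalty * abs(w)    # push specific negative terms down
        else:
            revised[t] = w                       # general terms stay unchanged
    return revised
```

Leaving G unchanged is consistent with the statistics on slide 20, where w(tg) is identical before and after revision while w(t-) flips sign.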

18 EVALUATION Text data: Reuters Corpus Volume 1 (RCV1), a total of 806,791 documents, is used to test the effectiveness of the proposed model.
Topics: 50 TREC assessor topics.
The documents are treated as plain text and pre-processed: stop-words are removed according to a given stop-words list, and terms are stemmed with the Porter Stemming algorithm.
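One way to reproduce the described pre-processing, using NLTK (an assumed tool choice; the paper only specifies a stop-words list and the Porter algorithm):

```python
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

nltk.download("stopwords", quiet=True)  # fetch the stop-word list once

def preprocess(text):
    """Lower-case the text, drop stop-words, Porter-stem the rest."""
    stops = set(stopwords.words("english"))
    stemmer = PorterStemmer()
    return [stemmer.stem(w) for w in text.lower().split() if w not in stops]

print(preprocess("Mining positive and negative patterns for relevance discovery"))
# ['mine', 'posit', 'neg', 'pattern', 'relev', 'discoveri']
```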

19 Results Results of the proposed model against the baseline models for all 50 assessing topics

20 Discussions
Results of the proposed model against the baseline models (… marks values missing from the slide):

Model     top-20   MAP      Fscore   p/b      IAP
RFD       0.557    0.493    0.470    0.472    0.513
PTM       0.496    0.444    0.439    0.430    0.464
Rocchio   0.474    0.420    …        …        0.452
SVM       0.453    0.409    0.421    0.408    0.435
BM25      0.445    0.407    …        0.414    0.428
% Change  12.30%   11.18%   6.92%    9.75%    10.44%

Statistical information for RFD with different values of K (average number of training documents: 12.780 positive, 41.300 negative):

K          Offenders  T+      G       T-   w(t+)  w(tg)  w(t-)   MAP
K=|D+|/2   6.540      23.540  22.360  …    4.158  1.400  -0.551  0.493
K=|D-|     39.920     14.200  15.280  …    1.858  0.890  …       0.278
K=|D+|     10.180     20.780  20.740  …    3.060  1.271  -2.965  0.463

Statistical information for RFD and PTM:

RFD, average number of extracted terms: T+ = 23.54, G = 22.36, T- = 231.78
RFD, average weight before revision: w(t+) = 2.842, w(tg) = 1.400, w(t-) = 0.320
RFD, average weight after revision: w(t+) = 4.158, w(tg) = 1.400, w(t-) = -0.551
PTM, terms extracted from D+: T = 156.9, average weight w(t) = 1.452

21 Offenders selection Results of using different values of K on the RCV1 dataset

22 Classification rules Results of using different groups of terms on all 50 assessing topics

23 Weight revision Weight distributions before and after the revision for the extracted features.

24 Conclusion Compared with the state-of-the-art models, the improvements of the proposed model are consistent and statistically significant on all five measures over all 50 assessing topics. The negative (offender) selection approach is satisfactory. The use of negative relevance feedback is very significant for relevance feature discovery: it can balance the proportions of specific and general terms and thereby reduce noise.

25 Questions?

