1 Intelligent Database Systems Lab N.Y.U.S.T. I. M.
An information-pattern-based approach to novelty detection
Presenter: Lin, Shu-Han
Authors: Xiaoyan Li, W. Bruce Croft
Information Processing and Management (2008)

2 Outline
- Motivation
- Objective
- Definition
- Observation
- Methodology
- Experiments
- Conclusion
- Personal Comments

3 Motivation - specific topic
It is very difficult for traditional word-based approaches to separate the two non-relevant sentences (3 and 4) from the two relevant sentences (1 and 2). The two non-relevant sentences are very likely to be identified as novel because they contain many new words that do not appear in previous sentences.

4 Motivation - general topic
It is very difficult for traditional word-based approaches to separate the non-relevant sentence (2) from the relevant sentence (1).

5 Objectives
To address the hard problem above:
- Provide a new and more explicit definition of novelty. Novelty is defined as new answers to the potential questions representing a user's request or information need.
- Propose a new concept in novelty detection: query-related information patterns. Very effective information patterns for novelty detection at the sentence level have been identified.
- Propose a unified pattern-based approach with three steps: query analysis, relevant sentence detection, and new pattern detection. The unified approach works for both specific topics and general topics.

6 Definition - Information Patterns
- Information patterns of specific topics
- Information patterns of general topics
  - Opinion patterns and opinion sentences
  - Event patterns and event sentences
Table. Word patterns for the five types of NE (named entity) questions
Table. Examples of opinion patterns

7 Observation - information patterns
Sentence lengths
- Relevant sentences on average have more words than non-relevant sentences.
- Novel sentences on average have slightly more words than relevant sentences.
Opinion patterns
- There are relatively more opinion sentences among relevant (and novel) sentences than among non-relevant sentences.
- The percentage of opinion sentences among novel sentences is slightly larger than among relevant sentences.
Table. Statistics of sentence lengths
Table. Statistics on opinion patterns for 22 opinion topics (2003)

8 Observation - information patterns (Cont.)
NE (named entity) combinations
- PLD types (PERSON, LOCATION, DATE) are more effective in separating relevant from non-relevant sentences.
- POLD types (PERSON, ORGANIZATION, LOCATION, DATE) will be used in new pattern detection; NEs of the ORGANIZATION type may provide different sources of new information.
- NEs of the PLD types play a more important role in event topics than in opinion topics.
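The idea of using POLD-type named entities for new pattern detection can be illustrated with a minimal sketch. It assumes each sentence already comes with a list of (type, text) named entities; the slides do not say which NER tool was used, so the entities here are hand-written, and the function name `detect_new_entities` is hypothetical. A sentence is flagged as carrying new information if it mentions at least one POLD entity not seen in any earlier sentence, which is a simplification of the approach described in the talk.

```python
# Hypothetical sketch: flag sentences that introduce unseen POLD entities.
POLD_TYPES = {"PERSON", "ORGANIZATION", "LOCATION", "DATE"}


def detect_new_entities(sentences_entities, allowed_types=POLD_TYPES):
    """Return one boolean per sentence: True if it adds a new POLD entity.

    sentences_entities: list of lists of (entity_type, entity_text) pairs,
    in the order the sentences are delivered to the user.
    """
    seen = set()
    flags = []
    for entities in sentences_entities:
        # Keep only POLD entities, normalised to lower case (an assumption).
        current = {(etype, text.lower())
                   for etype, text in entities if etype in allowed_types}
        flags.append(bool(current - seen))
        seen |= current
    return flags


if __name__ == "__main__":
    example = [
        [("PERSON", "Smith"), ("LOCATION", "Boston")],
        [("PERSON", "smith")],                      # nothing new
        [("ORGANIZATION", "FBI"), ("MONEY", "$5")],  # new organization
        [("MONEY", "$10")],                          # not a POLD type
    ]
    print(detect_new_entities(example))
```

Restricting `allowed_types` to PERSON, LOCATION and DATE would give the PLD variant mentioned on the slide.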

9 Methodology
Fig. ip-BAND: a unified information-pattern-based approach to novelty detection.

10 Methodology (Cont.)
(1) Query analysis and question formulation
Figure labels: How many (2), Where (3)
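A rough sketch of how query analysis might map a topic to expected answer types is given here. The cue words in `QUESTION_PATTERNS` are illustrative assumptions, not the word patterns from the paper's table (which is not reproduced in this transcript), and the five type names and the `GENERAL` fallback label are likewise hypothetical.

```python
import re

# Illustrative cue words only; the paper's actual word patterns are not shown here.
QUESTION_PATTERNS = {
    "PERSON": ("who", "person"),
    "ORGANIZATION": ("company", "organization", "agency"),
    "LOCATION": ("where", "location"),
    "NUMBER": ("how many", "how much", "number"),
    "DATE": ("when", "date"),
}


def expected_answer_types(query):
    """Return the NE types a query seems to ask for, or ['GENERAL'] if none."""
    # Lower-case and replace punctuation with spaces so cues match whole words.
    text = " " + " ".join(re.sub(r"[^a-z0-9]+", " ", query.lower()).split()) + " "
    found = [ne_type for ne_type, cues in QUESTION_PATTERNS.items()
             if any(f" {cue} " in text for cue in cues)]
    return found or ["GENERAL"]


if __name__ == "__main__":
    print(expected_answer_types("How many people were injured and where?"))
    print(expected_answer_types("Opinions on gun control"))
```

Queries that match no cue fall back to `GENERAL`, mirroring the talk's split between specific and general topics.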

11 Methodology (Cont.)
(2) Using patterns in relevance re-ranking
- Ranking with TFISF (term frequency - inverse sentence frequency) models
- TFISF with information patterns
  - Sentence lengths
  - Named entities
  - Opinion patterns
(3) Novel sentence extraction
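The TFISF ranking step can be sketched as follows. The slide does not give the formula, so this uses one common formulation as an assumption: for each query term, the product of log-scaled query frequency, log-scaled sentence frequency, and an inverse sentence frequency term. The tokenizer and function names are hypothetical, and the pattern-based adjustments (sentence length, named entities, opinion patterns) are not shown.

```python
import math
from collections import Counter

PUNCT = ".,;:!?\"'()"


def tokenize(text):
    """Very simple whitespace tokenizer with punctuation stripping."""
    tokens = (w.strip(PUNCT).lower() for w in text.split())
    return [t for t in tokens if t]


def tfisf_scores(query, sentences):
    """Score each sentence against the query with an assumed TFISF formula."""
    n = len(sentences)
    tokenized = [tokenize(s) for s in sentences]
    # Sentence frequency: number of sentences containing each term.
    sf = Counter()
    for toks in tokenized:
        sf.update(set(toks))
    q_tf = Counter(tokenize(query))
    scores = []
    for toks in tokenized:
        s_tf = Counter(toks)
        score = 0.0
        for term, qf in q_tf.items():
            if s_tf[term] == 0:
                continue
            isf = math.log((n + 1) / (0.5 + sf[term]))
            score += math.log(qf + 1) * math.log(s_tf[term] + 1) * isf
        scores.append(score)
    return scores


def rank_sentences(query, sentences):
    """Return sentence indices ordered from highest to lowest TFISF score."""
    scores = tfisf_scores(query, sentences)
    return sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    sents = [
        "The earthquake struck the city on Monday.",
        "Officials said the weather was fine.",
        "Earthquake damage in the city was severe.",
    ]
    print(rank_sentences("earthquake city", sents))
```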

12 Experiments
Baseline approaches
- B-NN: initial retrieval ranking
- B-NW: new word detection
- B-NWT: new word detection with a threshold
- B-MMR: Maximal Marginal Relevance (MMR)
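Two of the baselines above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the experimental setup from the paper: sentences are split on whitespace, `new_word_detection` with `threshold=1` stands in for B-NW and larger thresholds for B-NWT, and `mmr_rerank` uses Jaccard overlap as the similarity function (the standard MMR criterion balances query similarity against the maximum similarity to already selected items, weighted by `lam`). All function names are hypothetical.

```python
def new_word_detection(ranked_sentences, threshold=1):
    """Keep sentences with at least `threshold` words unseen in earlier sentences."""
    seen = set()
    novel = []
    for sent in ranked_sentences:
        tokens = set(sent.lower().split())
        if len(tokens - seen) >= threshold:
            novel.append(sent)
        # Assumption: every earlier sentence counts as history, novel or not.
        seen |= tokens
    return novel


def jaccard(a, b):
    """Jaccard overlap between two token collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mmr_rerank(query, sentences, lam=0.5, k=None):
    """Greedy MMR: trade query similarity against redundancy with selected items."""
    q = query.lower().split()
    toks = [s.lower().split() for s in sentences]
    remaining = list(range(len(sentences)))
    selected = []
    k = len(sentences) if k is None else k
    while remaining and len(selected) < k:
        def mmr(i):
            redundancy = max((jaccard(toks[i], toks[j]) for j in selected),
                             default=0.0)
            return lam * jaccard(toks[i], q) - (1 - lam) * redundancy
        best = max(remaining, key=mmr)
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    sents = ["storm hits coast", "storm hits coast", "storm damages coast homes"]
    print(new_word_detection(sents))          # B-NW style
    print(new_word_detection(sents, 3))       # B-NWT style with threshold 3
    print(mmr_rerank("storm coast", sents))   # indices in MMR order
```

Both baselines rely only on word overlap, which is why sentences full of unseen but off-topic words can still be flagged as novel.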

13 Experiments
Performance for specific topics from TREC 2002, 2003, 2004
Table. Performance of novelty detection for 8 specific topics (queries) from TREC 2002
Table. Performance of novelty detection for 15 specific topics (queries) from TREC 2003
Table. Performance of novelty detection for 11 specific topics (queries) from TREC 2004
Note: Results marked * are significant at the 95% confidence level by the Wilcoxon test; results marked ** are significant at the 90% level. Chg%: improvement over the first (B-NN) baseline in %.
Slide annotations (①②③④): 3.4 of 15 novel sentences; 10.1 of 15 novel sentences; 4.6 of 15 novel sentences

14 Experiments
Performance for general topics from TREC 2002, 2003, 2004
Table. Performance of novelty detection for 41 general topics (queries) from TREC 2002
Table. Performance of novelty detection for 35 general topics (queries) from TREC 2003
Table. Performance of novelty detection for 3 general topics (queries) from TREC 2004
Note: Results marked * are significant at the 95% confidence level by the Wilcoxon test; results marked ** are significant at the 90% level. Chg%: improvement over the first (B-NN) baseline in %.
Slide annotations (①④): 3.2 of 15 novel sentences; 7.5 of 15 novel sentences; 3.4 of 15 novel sentences

15 Experiments
Comparison among specific, general and all topics at top 15 ranks
Table. Comparison among specific, general and all topics at top 15 ranks
Note: Chg%: improvement over the first baseline in percentage; Nvl#: number of true novel sentences; Rdd#: number of relevant but redundant sentences; NRl#: number of non-relevant sentences.
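The quantities in the note above relate to each other in a simple way, sketched here under the assumption that precision at rank 15 is the share of the 15 delivered sentences that are truly novel, and that Chg% is the relative change against the baseline. The counts in the usage example are made up for illustration and are not results from the paper.

```python
def precision_at_k(novel, redundant, non_relevant):
    """Precision of novelty detection: Nvl# / (Nvl# + Rdd# + NRl#)."""
    total = novel + redundant + non_relevant
    if total == 0:
        raise ValueError("no sentences were delivered")
    return novel / total


def change_pct(system, baseline):
    """Relative improvement of a system score over a baseline score, in percent."""
    if baseline == 0:
        raise ValueError("baseline score must be non-zero")
    return (system - baseline) / baseline * 100.0


if __name__ == "__main__":
    # Illustrative counts only (they sum to 15 delivered sentences).
    p_system = precision_at_k(novel=6, redundant=4, non_relevant=5)
    p_baseline = 0.32  # hypothetical baseline precision
    print(p_system, change_pct(p_system, p_baseline))
```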

16 Conclusions
Novelty means new answers to the potential questions representing a user's request or information need.
The proposed ip-BAND outperforms all baselines for both specific topics and general topics, and its performance on specific topics is better than on general topics.
It is impossible to collect complete novelty judgments in reality:
- Baseline selection and evaluation measure by human assessors
- Misjudgment of relevance and/or novelty by human assessors, and disagreement between the human assessors' judgments
- Limitation and accuracy of question formulations
- Novelty detection precision will be low since some non-relevant sentences may be treated as novel.

17 Personal Comments
Advantage
- …
Drawback
- …
Application
- …

