

Intelligent Database Systems Lab, N.Y.U.S.T. I. M.
An information-pattern-based approach to novelty detection
Presenter: Lin, Shu-Han
Authors: Xiaoyan Li, W. Bruce Croft
Information Processing and Management (2008)

Outline
- Motivation
- Objective
- Definition
- Observation
- Methodology
- Experiments
- Conclusion
- Personal Comments

Motivation - specific topic
It is very difficult for traditional word-based approaches to separate the two non-relevant sentences (3 and 4) from the two relevant sentences (1 and 2). The two non-relevant sentences are very likely to be identified as novel because they contain many new words that do not appear in previous sentences.

Motivation - general topic
It is very difficult for traditional word-based approaches to separate the non-relevant sentence (2) from the relevant sentence (1).

Objectives
To address the hard problems above:
- Provide a new and more explicit definition of novelty. Novelty is defined as new answers to the potential questions representing a user's request or information need.
- Propose a new concept in novelty detection: query-related information patterns. Very effective information patterns for novelty detection at the sentence level have been identified.
- Propose a unified pattern-based approach with three steps: query analysis, relevant sentence detection, and new pattern detection. The unified approach works for both specific topics and general topics.

Definition - Information Patterns
- Information patterns of specific topics
- Information patterns of general topics
- Opinion patterns and opinion sentences
- Event patterns and event sentences
Table. Word patterns for the five types of NE (named entity) questions
Table. Examples of opinion patterns

Observation - information patterns
Sentence lengths
- Relevant sentences on average have more words than non-relevant sentences.
- Novel sentences on average have slightly more words than relevant sentences.
Opinion patterns
- Relevant (and novel) sentences contain a relatively higher share of opinion sentences than non-relevant sentences.
- The percentage of opinion sentences among novel sentences is slightly larger than among relevant sentences.
Table. Statistics of sentence lengths
Table. Statistics on opinion patterns for 22 opinion topics (2003)

Observation - information patterns (Cont.)
NE (named entity) combinations
- The PLD types (PERSON, LOCATION, DATE) are more effective in separating relevant and non-relevant sentences.
- The POLD types (PERSON, ORGANIZATION, LOCATION, DATE) will be used in new pattern detection; NEs of the ORGANIZATION type may provide different sources of new information.
- NEs of the PLD types play a more important role in event topics than in opinion topics.

Methodology
Fig. ip-BAND: a unified information-pattern-based approach to novelty detection.

Methodology (Cont.)
(1) Query analysis and question formulation
[Figure: example question formulations, including "How many" and "Where" questions]

Methodology (Cont.)
(2) Using patterns in relevance re-ranking
- Ranking with TFISF (term frequency-inverse sentence frequency) models
- TFISF with information patterns
  - Sentence lengths
  - Named entities
  - Opinion patterns
(3) Novel sentence extraction
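The slide names TFISF ranking but does not give the formula. The sketch below shows one common TF-ISF variant as an illustration only: the log-damped weighting, the whitespace tokenizer, and the function names `tfisf_scores` and `rank_sentences` are assumptions, not the authors' exact model.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization, no stemming or stopword removal.
    return text.lower().split()


def tfisf_scores(query, sentences):
    """Score each sentence against the query with an assumed TF-ISF variant."""
    n = len(sentences)
    sent_tokens = [tokenize(s) for s in sentences]

    # Sentence frequency: in how many sentences each term occurs.
    sf = Counter()
    for toks in sent_tokens:
        sf.update(set(toks))

    q_tf = Counter(tokenize(query))
    scores = []
    for toks in sent_tokens:
        s_tf = Counter(toks)
        score = 0.0
        for term, qf in q_tf.items():
            if s_tf[term] == 0:
                continue
            # Inverse sentence frequency, the sentence-level analogue of IDF.
            isf = math.log((n + 1) / (0.5 + sf[term]))
            score += math.log(qf + 1) * math.log(s_tf[term] + 1) * isf ** 2
        scores.append(score)
    return scores


def rank_sentences(query, sentences):
    """Return sentence indices ordered from highest to lowest TF-ISF score."""
    scores = tfisf_scores(query, sentences)
    return sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
```

In a re-ranking setting such as step (2), these base scores would then be adjusted using the information patterns listed above (sentence length, named entities, opinion patterns); how the adjustment is weighted is not stated on the slide.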

Experiments
Baseline approaches
- B-NN: initial retrieval ranking
- B-NW: new word detection
- B-NWT: new word detection with a threshold
- B-MMR: Maximal Marginal Relevance (MMR)
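The two word-based baselines and MMR can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the configuration used in the experiments: the whitespace tokenizer, the Jaccard similarity, the default `lam=0.5`, and the names `new_word_filter` and `mmr_rerank` are all hypothetical choices.

```python
def new_word_filter(ranked_sentences, threshold=1):
    """Keep sentences that contribute at least `threshold` unseen words.

    threshold=1 mimics plain new word detection (B-NW style);
    a larger threshold mimics the thresholded variant (B-NWT style).
    Assumption: every sentence, novel or not, is added to the history.
    """
    seen = set()
    novel = []
    for sentence in ranked_sentences:
        words = set(sentence.lower().split())
        if len(words - seen) >= threshold:
            novel.append(sentence)
        seen |= words
    return novel


def jaccard(a, b):
    # Assumed similarity measure: word-set overlap between two sentences.
    a, b = set(a.lower().split()), set(b.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0


def mmr_rerank(sentences, relevance, lam=0.5, k=None):
    """Greedy Maximal Marginal Relevance over sentence indices.

    Each step picks the sentence maximizing
    lam * relevance - (1 - lam) * (max similarity to already selected).
    """
    k = len(sentences) if k is None else k
    remaining = list(range(len(sentences)))
    selected = []
    while remaining and len(selected) < k:
        def mmr(i):
            redundancy = max(
                (jaccard(sentences[i], sentences[j]) for j in selected),
                default=0.0,
            )
            return lam * relevance[i] - (1 - lam) * redundancy
        best = max(remaining, key=mmr)
        selected.append(best)
        remaining.remove(best)
    return selected
```

Both baselines work purely on surface words, which is why a non-relevant sentence full of unseen words can be flagged as novel, the weakness described in the Motivation slides.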

Experiments
Performance for specific topics from TREC 2002, 2003, 2004
Note: Values marked * pass the Wilcoxon significance test at the 95% confidence level; values marked ** pass at the 90% level. Chg%: improvement over the first (B-NN) baseline, in %.
Table. Performance of novelty detection for 8 specific topics (queries) from TREC 2002
Table. Performance of novelty detection for 15 specific topics (queries) from TREC 2003
Table. Performance of novelty detection for 11 specific topics (queries) from TREC 2004
Callouts on the slide: 3.4 of 15, 10.1 of 15, and 4.6 of 15 novel sentences.

Experiments
Performance for general topics from TREC 2002, 2003, 2004
Note: Values marked * pass the Wilcoxon significance test at the 95% confidence level; values marked ** pass at the 90% level. Chg%: improvement over the first (B-NN) baseline, in %.
Table. Performance of novelty detection for 41 general topics (queries) from TREC 2002
Table. Performance of novelty detection for 35 general topics (queries) from TREC 2003
Table. Performance of novelty detection for 3 general topics (queries) from TREC 2004
Callouts on the slide: 3.2 of 15, 7.5 of 15, and 3.4 of 15 novel sentences.

Experiments
Comparison among specific, general and all topics at top 15 ranks
Note: Chg%: improvement over the first baseline, in percent; Nvl#: number of true novel sentences; Rdd#: number of relevant but redundant sentences; NRl#: number of non-relevant sentences.
Table. Comparison among specific, general and all topics at top 15 ranks
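The three counts in the table partition the top-ranked sentences by judgment. A small sketch of how such a breakdown could be computed from per-sentence labels is shown below; the label strings and the function name `top_k_breakdown` are hypothetical, and precision is taken as novel sentences divided by k.

```python
from collections import Counter


def top_k_breakdown(labels, k=15):
    """Count novel, redundant, and non-relevant sentences in the top k ranks.

    `labels` is the ranked list of judgments, each one of
    "novel", "redundant", or "non_relevant" (assumed label names).
    """
    counts = Counter(labels[:k])
    return {
        "Nvl#": counts["novel"],
        "Rdd#": counts["redundant"],
        "NRl#": counts["non_relevant"],
        # Precision at k: share of the top k ranks that are truly novel.
        "precision": counts["novel"] / k,
    }
```

Averaging the `Nvl#` value over all topics gives figures of the form "x of 15 novel sentences".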

Conclusions
- Novelty means new answers to the potential questions representing a user's request or information need.
- The proposed ip-BAND outperforms all baselines for both specific topics and general topics, and its performance on specific topics is better than on general topics.
- It is impossible to collect complete novelty judgments in reality:
  - Baseline selection and evaluation measures rely on human assessors.
  - Human assessors may misjudge relevance and/or novelty, and their judgments may disagree.
  - Question formulations are limited in coverage and accuracy.
  - Novelty detection precision will be low, since some non-relevant sentences may be treated as novel.

Personal Comments
- Advantage: …
- Drawback: …
- Application: …