Filtering Semi-Structured Documents Based on Faceted Feedback Lanbo Zhang, Yi Zhang, Qianli Xing Information Retrieval and Knowledge Management (IRKM)


Filtering Semi-Structured Documents Based on Faceted Feedback Lanbo Zhang, Yi Zhang, Qianli Xing Information Retrieval and Knowledge Management (IRKM) Lab University of California, Santa Cruz

Outline
Introduction
Faceted Feedback
  – Facet-Value Pair Candidate Selection
  – Learning from Faceted Feedback
Experiments
  – Settings
  – Results
Summary

Personalized Information Filtering
Identify user-desired documents from a document stream
Two families of filtering approaches
  – Collaborative Filtering (CF)
  – Content-Based Filtering (CBF)
Applications: news feeder, spam filter, etc.
[Diagram: news, blogs, emails, … flow into the filtering system, which outputs the passed documents]

Semi-Structured Documents
Increasingly prevalent on the Internet
  – Emails, news, movies, tweets, etc.
Plenty of metadata available

Definitions
Facet: a metadata field
  – Date, Topic, Location, Director, Genre, etc.
Facet-Value Pair (FVP): a metadata field assigned a particular value
  – Topic: Royal wedding
  – Date:
  – Location: London, UK
Lanbo Zhang, Yi Zhang, Qianli Xing. IRKM Lab at University of California, Santa Cruz

Motivation
Existing filtering approaches learn user interests from users' relevance judgments of documents
Users may have prior knowledge of which facet-value pairs are relevant
  – English-only readers: Language: English
  – Social network analysts: Company: Facebook

Can we exploit users' prior knowledge of facet-value pairs for filtering?

A New User Interaction Mechanism: Faceted Feedback
[Diagram: the filtering system shows the user FVP candidates (Lang: …, Topic: …, Date: …); the user returns the relevant FVPs (Topic: …, Lang: …)]

Research Questions
Question 1
  – How to select facet-value pair candidates?
Question 2
  – How to learn user profiles based on faceted feedback?

FVP Selection: Our Approach
In a filtering task
  – A large number of unlabeled documents
  – Possibly a small number of labeled documents
We rank facet-value pairs by a score computed from
  – User-labeled relevant documents
  – Pseudo-relevant (positively classified) documents
Intuition: features that occur frequently among relevant docs but rarely in the whole corpus are very likely to be relevant
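The intuition on this slide can be sketched in Python. The scoring function below (the ratio of an FVP's smoothed frequency among relevant documents to its smoothed frequency in the whole corpus) is an assumption chosen for illustration; the ranking formula actually shown on the slide is not preserved in this transcript. The names `rank_fvps` and `smoothing` are hypothetical.

```python
from collections import Counter


def rank_fvps(relevant_docs, corpus_docs, smoothing=1.0):
    """Rank facet-value pairs: frequent among relevant docs, rare in the corpus.

    Each document is a set of (facet, value) tuples. `relevant_docs` may mix
    user-labeled relevant documents and pseudo-relevant ones.
    The ratio score used here is an illustrative assumption.
    """
    rel_counts = Counter(fvp for doc in relevant_docs for fvp in doc)
    all_counts = Counter(fvp for doc in corpus_docs for fvp in doc)
    n_rel, n_all = len(relevant_docs), len(corpus_docs)

    scores = {}
    for fvp, c_rel in rel_counts.items():
        # Smoothed document frequencies avoid division by zero.
        p_rel = (c_rel + smoothing) / (n_rel + 2 * smoothing)
        p_all = (all_counts[fvp] + smoothing) / (n_all + 2 * smoothing)
        scores[fvp] = p_rel / p_all
    return sorted(scores, key=scores.get, reverse=True)
```

With such a score, a pair like Language: English that appears in nearly every document ranks below a pair that is concentrated in the relevant set.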

Research Questions
Question 1
  – How to select facet-value pair candidates?
Question 2
  – How to learn user profiles based on faceted feedback?

Content-Based Filtering (CBF)
Treated as a binary text classification task
User profile: a feature vector that represents a user's information needs (interests/preferences)
Given the user profile θ, a document can be determined as relevant or not according to: [equation not preserved in this transcript; its annotations: document vector, document label]
The core of CBF is learning the user profile!
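As a rough illustration of treating filtering as binary classification, the sketch below scores a document vector against a profile θ with a logistic model and applies a threshold. The logistic form, the 0.5 threshold, and the names `relevance_probability` and `is_relevant` are assumptions made for this example; the decision rule on the slide itself is not preserved.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def relevance_probability(theta, x):
    """P(relevant | x, theta) under an assumed logistic model.

    `theta` (user profile) and `x` (document vector) are sparse dicts
    mapping feature names to weights / values.
    """
    score = sum(theta.get(name, 0.0) * value for name, value in x.items())
    return sigmoid(score)


def is_relevant(theta, x, threshold=0.5):
    """Deliver the document if its predicted relevance reaches the threshold."""
    return relevance_probability(theta, x) >= threshold
```

Under this view, learning the user profile amounts to fitting `theta` from whatever feedback is available.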

Our Approach
The assumption
  – A user selects a feature because it has a high correlation with the document label (R/NR)
Generalized Constraint Model (GCM)

Correlation Decomposition
Sufficiency
  – The probability of a document being relevant given that the feature has occurred: $P(R^+ \mid f = 1)$
  – $P(R^+ \mid f = 1) = 1$: sufficient features
  – E.g., Company: Facebook for social network analysts
Necessity
  – The probability of the feature having occurred given that a document is relevant: $P(f = 1 \mid R^+)$
  – $P(f = 1 \mid R^+) = 1$: necessary features
  – E.g., Language: English for English-only readers
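The two definitions can be checked on a small labeled sample. The sketch below computes empirical sufficiency and necessity by counting; the function name and the set-based document representation are assumptions for illustration, and the deck itself estimates these quantities differently when labels are scarce.

```python
def sufficiency_and_necessity(docs, labels, feature):
    """Empirical P(R+ | f=1) and P(f=1 | R+) from labeled documents.

    `docs` is a list of feature sets, `labels` a parallel list of 1 (relevant)
    or 0 (non-relevant). Returns None for a quantity whose denominator is zero.
    """
    covered_labels = [y for d, y in zip(docs, labels) if feature in d]
    n_relevant = sum(labels)
    n_both = sum(covered_labels)  # relevant documents that contain the feature

    sufficiency = n_both / len(covered_labels) if covered_labels else None
    necessity = n_both / n_relevant if n_relevant else None
    return sufficiency, necessity
```

In the usage below, a feature present in only relevant documents has sufficiency 1, while a feature present in every relevant document has necessity 1.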

Examples: Highly-Correlated Features
[Diagram: within the whole corpus, the relevant set $R^+$ and the regions $f_1 = 1$, $f_2 = 1$, $f_3 = 1$]
1) $f_1$ is a sufficient feature since $P(R^+ \mid f_1 = 1) = 1$
2) $f_2$ is a necessary feature since $P(f_2 = 1 \mid R^+) = 1$
3) $f_3$ is neither necessary nor sufficient, but both its sufficiency and necessity are high (> 0.5)

Estimating Sufficiency
[Equation not preserved in this transcript. Its annotations: the feature; the document label; the set of documents covered by feature f; the user profile vector; the estimation of the label of document $d_i$]
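One reading of the surviving annotations is that sufficiency is estimated by averaging the model's predicted relevance over the documents covered by feature f. The sketch below implements that reading; it is a guess at the missing equation, not a transcription of it, and the logistic model and the name `estimate_sufficiency` are assumptions.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def estimate_sufficiency(theta, covered_docs):
    """Average predicted relevance over the documents covered by a feature.

    `theta` is the user profile and each document a sparse dict of feature
    values. Assumed form: mean over d_i in D_f of P(y_i = 1 | x_i, theta).
    """
    if not covered_docs:
        raise ValueError("feature covers no documents")
    probs = [
        sigmoid(sum(theta.get(name, 0.0) * value for name, value in x.items()))
        for x in covered_docs
    ]
    return sum(probs) / len(probs)
```

Because the estimate depends on `theta`, it can act as a constraint on the profile during learning rather than as a fixed statistic.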

Estimating Necessity
[Equation not preserved in this transcript. Its annotations: feature sufficiency; prior distribution; Bayes' theorem]
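Bayes' theorem links the two quantities: $P(f = 1 \mid R^+) = P(R^+ \mid f = 1)\,P(f = 1) / P(R^+)$. The sketch below applies that identity directly. How the deck estimates $P(f = 1)$ and the prior $P(R^+)$ is not preserved, so here they are simply passed in as arguments, and the function name is hypothetical.

```python
def necessity_from_sufficiency(sufficiency, p_feature, p_relevant):
    """P(f=1 | R+) from P(R+ | f=1), P(f=1), and P(R+) via Bayes' theorem."""
    if not 0.0 < p_relevant <= 1.0:
        raise ValueError("p_relevant must be in (0, 1]")
    return sufficiency * p_feature / p_relevant
```

For example, a feature with sufficiency 0.75 that occurs in 80% of documents, when 60% of documents are relevant, has necessity 0.75 × 0.8 / 0.6 = 1.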

Reference Distributions
Our assumption
  – A user selects a feature because it has a high sufficiency and/or a high necessity
Reference distributions: two Bernoulli distributions
  – The sufficiency/necessity of a user-selected feature should be close to the reference distribution
  – KL divergence as the similarity measure
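For two Bernoulli distributions the KL divergence has a simple closed form. The sketch below computes KL(reference || estimate); the direction of the divergence and the clipping constant `eps` are assumptions, since the slide does not specify them.

```python
import math


def bernoulli_kl(reference, estimate, eps=1e-12):
    """KL divergence between Bernoulli(reference) and Bernoulli(estimate).

    Both arguments are success probabilities. They are clipped away from
    0 and 1 so the logarithms stay finite.
    """
    p = min(max(reference, eps), 1.0 - eps)
    q = min(max(estimate, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))
```

The divergence is zero when the estimated sufficiency or necessity matches the reference parameter and grows as the two move apart.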

User Profile Learning
The unified loss function combines two types of feedback: user-labeled documents and faceted feedback (sufficient features and necessary features) [equation not preserved in this transcript]
$T_s$, $T_n$: reference distributions
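The structure described on this slide can be sketched as a sum of a document-label term and KL penalties for the user-selected features. Everything specific below is an assumption: the logistic log-loss, the model-based sufficiency and necessity estimates, the reference parameters `t_s` and `t_n` (0.9 is a made-up value), the single `weight`, and all function names. It is a sketch of the idea, not the authors' loss.

```python
import math


def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(theta, x):
    """Assumed logistic model over sparse dict vectors."""
    return sigmoid(sum(theta.get(k, 0.0) * v for k, v in x.items()))


def bernoulli_kl(p, q, eps=1e-9):
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def unified_loss(theta, labeled, unlabeled, sufficient, necessary,
                 t_s=0.9, t_n=0.9, weight=1.0):
    """Log-loss on labeled docs plus KL penalties for selected features."""
    loss = 0.0
    # Term 1: user-labeled documents, given as (vector, label) pairs.
    for x, y in labeled:
        p = min(max(predict(theta, x), 1e-9), 1.0 - 1e-9)
        loss -= y * math.log(p) + (1 - y) * math.log(1.0 - p)

    probs = [predict(theta, x) for x in unlabeled]
    total = sum(probs)  # expected number of relevant documents

    # Term 2: sufficient features, estimated P(R+ | f=1) should match t_s.
    for f in sufficient:
        covered = [p for x, p in zip(unlabeled, probs) if x.get(f, 0)]
        if covered:
            loss += weight * bernoulli_kl(t_s, sum(covered) / len(covered))

    # Term 3: necessary features, estimated P(f=1 | R+) should match t_n.
    for f in necessary:
        mass = sum(p for x, p in zip(unlabeled, probs) if x.get(f, 0))
        if total > 0:
            loss += weight * bernoulli_kl(t_n, mass / total)
    return loss
```

A profile that makes documents covered by a sufficient feature look relevant lowers the second term, which is how faceted feedback can shape the profile even when no documents are labeled.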

User Interaction Mechanisms
Two mechanisms
  – Mechanism 1: ask users to select the features they think are relevant
  – Mechanism 2: ask users to select separately the features they think are sufficient and those they think are necessary

Data Sets
Two data sets from the TREC filtering track
  – TREC 2000: OHSUMED (medical articles) + 63 topics (information needs)
    Metadata field: MeSH (Medical Subject Headings)
  – TREC 2002: RCV1 (~800,000 news articles) + 50 topics defined by human assessors
    Metadata fields: Topic, Industry, Region
Split each topic set into two equal-size subsets
  – One for parameter tuning, the other for testing

Faceted Feedback Collection
Recruit subjects on Mechanical Turk
  – Five subjects per topic
  – Average performance is reported
For each topic, we show subjects
  – The topic description (information need)
  – A group of facet-value pair candidates

Evaluation Metrics
Precision (macro)
Recall (macro)
$T11U = 2 N_{rd} - N_{nd}$
  – $N_{rd}$: the number of relevant docs delivered
  – $N_{nd}$: the number of non-relevant docs delivered
$T11SU = \dfrac{\max(T11U / MaxU,\ MinNU) - MinNU}{1 - MinNU}$
  – $MinNU = -0.5$
  – $MaxU$: the maximum possible utility (T11U)
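The utility metrics are easy to compute from delivery counts. The sketch below assumes the scaled utility takes the TREC 2002 filtering-track form, (max(T11U / MaxU, MinNU) − MinNU) / (1 − MinNU), and that MaxU equals twice the number of relevant documents for the topic (the utility of delivering every relevant document and nothing else). The function names are hypothetical.

```python
def t11u(n_rel_delivered, n_nonrel_delivered):
    """Linear utility: +2 per relevant doc delivered, -1 per non-relevant doc."""
    return 2 * n_rel_delivered - n_nonrel_delivered


def t11su(n_rel_delivered, n_nonrel_delivered, n_rel_total, min_nu=-0.5):
    """Scaled utility in [0, 1], assuming MaxU = 2 * n_rel_total."""
    max_u = 2 * n_rel_total
    if max_u <= 0:
        raise ValueError("topic must have at least one relevant document")
    normalized = t11u(n_rel_delivered, n_nonrel_delivered) / max_u
    # Clamp at min_nu so a flood of non-relevant deliveries cannot dominate.
    return (max(normalized, min_nu) - min_nu) / (1.0 - min_nu)
```

Under these assumptions, a system that delivers nothing scores 1/3, a perfect system scores 1, and a system that delivers many non-relevant documents bottoms out at 0.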

Results 1: With vs. Without Faceted Feedback (FF)
[Plot not preserved; x-axis: number of relevant docs initially known]
Faceted feedback improves filtering performance, especially when fewer relevant documents are initially known.

Summary
Faceted feedback is useful for filtering, especially in cold-start scenarios
The Generalized Constraint Model (GCM) is a robust user profile learning algorithm
In future work, we will evaluate our methods on data sets where faceted features are more important
  – Movies, music, products, etc.

Questions?