Adaptive Information Filtering
Lanbo Zhang (ISSDM fellow), Yi Zhang (UCSC advisor), Carla Kuiken (LANL mentor)



Outline
Introduction
Our Research
– Interactive Retrieval Based on Faceted Feedback (SIGIR 2010)
– Discriminative Factored Prior Models for Personalized Content-Based Recommendation (CIKM 2010)
Future Work

Why Filtering?
In some cases, users want to persistently track certain kinds of information on the Internet
– CDC (Centers for Disease Control and Prevention) personnel: news reports about H1N1
– Physicians: new treatments for a disease
– FBI investigators: potential terrorist threats
– Financial analysts: news that may influence a stock
For these tasks, search engines that require users to actively issue queries are not enough
We need an intelligent system that can PUSH the desired information to us whenever it becomes available!

Adaptive Information Filtering
The central task
– Identify the relevant documents from a document stream

The Cold-Start Problem
Filtering performance for new users is usually poor because these users have not yet provided enough training data (user feedback)
We follow two directions to handle this problem
– Explore new user interaction mechanisms to encourage more user feedback
– Research advanced filtering models that can borrow information from other users for new users

Outline
Introduction
Our Research
– Direction 1: A New User Feedback Mechanism
  Faceted Feedback
– Direction 2: A New Filtering Model
  Discriminative Factored Prior Model
Future Work

Semi-Structured Documents
Semi-structured documents with metadata are proliferating on the Internet
– Authors, Topic, Publisher, Created Time, etc.
– Metadata might be useful for filtering

[Figure: example from the New York Times showing human-assigned metadata and algorithm-generated metadata]

Definitions
Facet
– Each metadata field is called a facet
– E.g., Date, Topic, Location, Author, etc.
Facet-Value Pair
– A metadata field with a specific value is called a facet-value pair
– E.g., Publisher = New York Times

Faceted Feedback
Traditional User Feedback Mechanism
– Allows users to provide feedback on the relevance of documents
  Doc1: Relevant
  Doc2: Non-relevant
Faceted Feedback
– Allows users to provide feedback on facet-value pairs
– Each facet-value pair represents a constraint on the desired documents
  Topic = FIFA World Cup: Yes
  Year = 2010: Yes
  Year = 2006: No

Why Faceted Feedback
Users may have clear ideas on some facets of the target documents
– E.g., Topic = FIFA World Cup, Year = 2010
May encourage user feedback
– Facet-value pairs are short and easy to understand

Research Questions
Question 1
– How to select a small number of facet-value pair candidates?
Question 2
– How to make use of faceted feedback?

Q1: Facet-Value Pair Selection
Four approaches to rank facet-value pairs
– Top Document Frequency (TDF)
  Frequency in the top N ranked documents
– TDF*IDF (Inverse Document Frequency)
– Query Likelihood (QL)
  P(q|f=v)
– TDF+QL
  TDF: P(f=v|q)
  QL: P(q|f=v)
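The first two ranking approaches can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `rank_facet_value_pairs`, the representation of a document as a set of (facet, value) tuples, and the plain log(N / df) form of IDF are all assumptions. The QL and TDF+QL variants need a language model and are left out.

```python
import math
from collections import Counter


def rank_facet_value_pairs(top_docs, collection):
    """Rank facet-value pairs by TDF and by TDF*IDF (hypothetical helper).

    Each document is a set of (facet, value) tuples. top_docs holds the
    top N ranked documents for the query and is assumed to be a subset
    of collection, which holds every document.
    """
    n_docs = len(collection)
    # TDF: how many of the top-ranked documents carry each pair.
    tdf = Counter(pair for doc in top_docs for pair in doc)
    # DF: how many documents in the whole collection carry each pair.
    df = Counter(pair for doc in collection for pair in doc)

    def idf(pair):
        # Assumed IDF form: log(N / df). max() guards against df == 0.
        return math.log(n_docs / max(df[pair], 1))

    # Ties are broken alphabetically so the output is deterministic.
    by_tdf = sorted(tdf, key=lambda p: (-tdf[p], p))
    by_tdf_idf = sorted(tdf, key=lambda p: (-tdf[p] * idf(p), p))
    return by_tdf, by_tdf_idf
```

Under these assumptions, multiplying by IDF pushes down pairs that occur in almost every document (a publisher shared by the whole collection, say), so the candidates shown to the user are more likely to discriminate between documents.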

Q2: How to Use Faceted Feedback?
The commonly used method
– Boolean Model
Problem with the Boolean Model
– Document metadata is not perfect
  Inaccurate / incomplete
– This may badly hurt the retrieval performance
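A small sketch shows why a strict Boolean filter is fragile. The function `boolean_filter` and the toy documents are hypothetical, and the exact Boolean variants evaluated in the experiments are not described on the slides.

```python
def boolean_filter(docs, required, excluded=()):
    """Keep documents that carry every required pair and no excluded pair.

    docs maps a document id to its set of (facet, value) metadata tuples.
    """
    required, excluded = set(required), set(excluded)
    return [
        doc_id
        for doc_id, meta in docs.items()
        if required <= meta and not (excluded & meta)
    ]


docs = {
    "d1": {("Topic", "FIFA World Cup"), ("Year", "2010")},
    "d2": {("Topic", "FIFA World Cup")},  # Year is missing from the metadata
    "d3": {("Topic", "FIFA World Cup"), ("Year", "2006")},
}
kept = boolean_filter(docs, [("Topic", "FIFA World Cup"), ("Year", "2010")])
```

Here `d2` is discarded only because its Year field is missing, even though it may well be about the 2010 tournament. That is the kind of loss incomplete metadata can cause.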

The Soft Model
The basic idea
– Reward documents that carry user-identified facet-value pairs by adding a certain number of credits
– The number of credits for each facet is learnt on training queries
Score(d) = original score + rewards for facet match
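The scoring rule can be sketched as follows. The name `soft_score` and the credit values are made up for illustration; in the slides the per-facet credits are learnt on training queries. The slides also do not say how "No" feedback is used, so this sketch simply ignores it.

```python
def soft_score(original_score, doc_meta, feedback, credits):
    """Score(d) = original score + rewards for facet match.

    doc_meta: set of (facet, value) tuples attached to the document.
    feedback: dict mapping (facet, value) -> True ("Yes") or False ("No").
    credits:  dict mapping facet name -> reward for a match on that facet
              (hypothetical values here; learnt from training queries in
              the original work).
    """
    reward = 0.0
    for (facet, value), confirmed in feedback.items():
        # Only user-confirmed pairs that the document carries earn credits.
        if confirmed and (facet, value) in doc_meta:
            reward += credits.get(facet, 0.0)
    return original_score + reward


feedback = {
    ("Topic", "FIFA World Cup"): True,
    ("Year", "2010"): True,
    ("Year", "2006"): False,
}
credits = {"Topic": 0.5, "Year": 0.25}
```

Unlike a Boolean filter, a document with missing metadata is never removed here. It just earns fewer credits and ranks by its original score.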

Experimental Settings
Datasets
– OHSUMED + queries from the TREC (Text REtrieval Conference) 2000 filtering track
  348,566 medical articles, 63 queries
– RCV1 + queries from the TREC 2002 filtering track
  ~810,000 news articles from Reuters, 50 queries
User Study
– We collected users' faceted feedback on Amazon Mechanical Turk

Chosen Facets
OHSUMED: MeSH (Medical Subject Headings)
RCV1: Region, Industry, Topic

Experimental Results: Overall Performance of Faceted Feedback
Faceted feedback significantly improves the retrieval performance

Experimental Results: Boolean Models vs. Soft Model
[Figure: results on OHSUMED and RCV1]
The Boolean models don't work well and can even hurt performance, while the soft model always performs well

Outline
Introduction
Our Research
– Direction 1: A New User Feedback Mechanism
  Faceted Feedback
– Direction 2: A New Filtering Model
  Discriminative Factored Prior Model
Future Work

Existing Filtering Approaches
Two categories
– Retrieval models + threshold setting methods
  Rocchio, BM25, Language Models, etc.
– Standard machine learning models for binary text classification
  Naïve Bayes, logistic regression, SVM, neural networks, etc.
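As an illustration of the first category, here is a minimal sketch of a Rocchio-style profile update combined with a fixed delivery threshold. The function names, the sparse-dict vector representation, and the default weights are illustrative assumptions. Real systems also adapt the threshold over time, which is not shown.

```python
from collections import defaultdict


def rocchio_update(profile, relevant, non_relevant,
                   alpha=1.0, beta=0.75, gamma=0.15):
    """Return alpha*profile + beta*centroid(relevant) - gamma*centroid(non_relevant).

    All vectors are sparse dicts mapping term -> weight. The default
    weights are common textbook values, not taken from the slides.
    """
    updated = defaultdict(float)
    for term, weight in profile.items():
        updated[term] += alpha * weight
    for docs, coef in ((relevant, beta), (non_relevant, -gamma)):
        for doc in docs:
            for term, weight in doc.items():
                # Dividing by len(docs) turns the sum into a centroid.
                updated[term] += coef * weight / len(docs)
    return dict(updated)


def score(profile, doc):
    """Dot product between the profile and a document vector."""
    return sum(weight * doc.get(term, 0.0) for term, weight in profile.items())


def deliver(profile, doc, threshold):
    """Push the document to the user only if it scores at least the threshold."""
    return score(profile, doc) >= threshold
```

Each round of document feedback shifts the profile toward terms from relevant documents, so a document that scored below the threshold earlier may be delivered later.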

Characteristics of User Interests
For example,
– User 1: Sports, Technology
– User 2: Sports, Politics, Shopping
– User 3: Politics, Technology, Travel
Characteristics
– A single user may have multiple interests
– Different users may have overlapping interests
Existing filtering approaches don't explicitly capture these characteristics

Discriminative Factored Prior Models (DFPM)
[Figure: model definition with the following labeled quantities]
– The hidden factor matrix
– The variance matrix
– The profile/classifier of user m
– The hidden vector of user m
– The feature vector of the j-th training document of user m
– The label of the j-th training document of user m

Advantages
As discriminative models, our models can incorporate any kind of features
– Textual features (words)
– Semantic features (very useful)
  Topic = Lung Cancer
  Source = Cancer Cause and Control
Borrow information from other users when learning profiles for new users
– All user profiles share a common hidden factor matrix
Capture a single user's multiple interests
– Each user profile follows a factored prior distribution

Parameter Estimation
Assume the variance matrix is diagonal and all of its diagonal entries are equal to a constant value c_1, then
[Equation not preserved in the transcript]
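The slide's own formulas are not available, so the following is only one reading consistent with the labeled quantities, stated as an assumption: each user profile has a Gaussian prior centred on a combination of shared hidden factors, and labels follow a logistic likelihood. The symbols $w_m$ (profile of user m), $F$ (hidden factor matrix), $z_m$ (hidden vector of user m), $\Sigma$ (variance matrix), $x_{m,j}$ and $y_{m,j}$ (feature vector and label of the j-th training document), the standard-normal prior on $z_m$, and the $\pm 1$ label coding are all hypothetical choices.

```latex
\begin{aligned}
z_m &\sim \mathcal{N}(0, I), \qquad
w_m \mid z_m \sim \mathcal{N}(F z_m, \Sigma), \\
P(y_{m,j} \mid x_{m,j}, w_m) &= \sigma\!\left(y_{m,j}\, w_m^{\top} x_{m,j}\right),
\qquad y_{m,j} \in \{-1, +1\}. \\[6pt]
\text{With } \Sigma = c_1 I:\quad
\min_{F,\,\{w_m\},\,\{z_m\}} \;
&\sum_{m}\sum_{j} \log\!\left(1 + \exp\!\left(-y_{m,j}\, w_m^{\top} x_{m,j}\right)\right)
+ \frac{1}{2 c_1} \sum_{m} \left\lVert w_m - F z_m \right\rVert^2
+ \frac{1}{2} \sum_{m} \left\lVert z_m \right\rVert^2 .
\end{aligned}
```

Under this reading the objective is the negative log posterior up to constants. The quadratic terms are quadratic in $F$ and in each $z_m$ when the other variables are held fixed, while the logistic term in $w_m$ has no closed-form minimizer and calls for an iterative method.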

Optimization
Use an EM-like iterative algorithm to solve the above optimization problem
1: Initialize
2:
3:
– Closed-form solution!
– Conjugate gradient descent

Experimental Settings
Dataset
– Collected from Digg.com, where users can "digg" news articles they find interesting to promote their rankings
– 15,162 users, 251 relevant documents per user
Details
– 80% (training), 10% (validation), 10% (test)
– Words as features: 35,865 (TF-IDF score)
– Metrics: Precision, Recall, Macro-F1
Baselines
– L2-regularized Logistic Regression (L2LR)
  Learns each user profile separately without borrowing information
– The standard Bayesian Hierarchical model with Logistic Regression (BHLR)
  Uses a standard prior
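For reference, the three metrics can be computed as in the sketch below. The slides do not say how Macro-F1 is averaged; this sketch assumes F1 is computed per user and then averaged over users. The function names are hypothetical.

```python
def precision_recall_f1(predicted, relevant):
    """Precision, recall and F1 for one user.

    predicted: set of document ids the filter delivered.
    relevant:  set of document ids the user actually found relevant.
    """
    true_positives = len(predicted & relevant)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def macro_f1(per_user):
    """Average the per-user F1 scores (assumed macro-averaging over users).

    per_user: list of (predicted, relevant) set pairs, one per user.
    """
    scores = [precision_recall_f1(p, r)[2] for p, r in per_user]
    return sum(scores) / len(scores) if scores else 0.0
```

Averaging per user gives every user equal weight, so users with few relevant documents count as much as very active ones.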

Performance Comparison
Our models outperform the baselines significantly

Outline
Introduction
Our Research
– A New User Interaction Mechanism
  Faceted Feedback
– A New Filtering Approach
  Discriminative Factored Prior Model
Future Work

Future Work
Active learning on facet-value pair selection
– To maximize learning benefits
Integrating multiple types of user feedback
– Feedback on documents
– Feedback on facets
– …

Thanks!
Comments & Questions?