Mianwei Zhou, Kevin Chen-Chuan Chang (University of Illinois at Urbana-Champaign). Unifying Learning to Rank and Domain Adaptation: Enabling Cross-Task Document Scoring.

Presentation transcript:

1 Mianwei Zhou, Kevin Chen-Chuan Chang, University of Illinois at Urbana-Champaign. Unifying Learning to Rank and Domain Adaptation: Enabling Cross-Task Document Scoring

2 Document Scorer Learning with Queries/Domains. Applications: spam detection, information retrieval, sentiment analysis. Handle various user needs: queries. Tackle diverse sources of data: domains.

3 Document Scorer Learning for Different Queries: Learning to Rank. Example application: entity-centric filtering (TREC-KBA 2012). Difficulty: long Wikipedia pages serve as queries, and the queries contain noisy keywords. Training phase: documents are labeled relevant/irrelevant for a given Wikipedia query page. Testing phase: unseen documents must be scored for a new Wikipedia query page.

4 Document Scorer Learning for Different Domains: Domain Adaptation. Example application: cross-domain sentiment analysis (Blitzer et al. 2006). Difficulty: different domains use different sentiment keywords. Training phase (book reviews): "It is a very interesting book..."; "I do not like this book, it is very boring...". Testing phase (kitchen appliance reviews): "This coffee maker has high quality!"; "Do not buy this juice extractor, it is leaking...".

5 Learning to Rank vs. Domain Adaptation. Training phase: keywords such as "boring"/"interesting" (book reviews) and "Bill Gates"/"Microsoft" (talk query). Testing phase: keywords such as "Chicago Bulls"/"basketball" (new query) and "ugly"/"leaking" (kitchen reviews). Common challenge: different keyword importance across training and testing phases.

6 Problem: Cross-Task Document Scoring

7 Cross-Task Document Scoring: unify learning to rank and domain adaptation. 1. A task is a query (learning to rank) or a domain (domain adaptation). 2. Cross-task: training and testing phases tackle different tasks.

8 Challenge: Document Scoring across Different Tasks

9 Document Scoring Principle: given a query and a document, decide whether the document is relevant or not.

10 Document Scoring Principle: the relevance of a document depends on how it contains keywords that are important for the query. Q1: Which keywords are important for the query? Q2: How are the keywords contained in the document? (Example keywords: Bill Gates, Microsoft, talk, Tuesday.)

11 Requirement of Traditional Learning to Rank: Manually Fulfill the Principle in Feature Design. Hand-designed abstractions (BM25, language model, vector space, ...) turn each document into feature scores, which learning-to-rank models (RankSVM, RankBoost, LambdaMART) combine into a document relevance score.

12 Difficult to Manually Fulfill the Principle for Noisy Queries and Complex Documents. Features such as BM25 and language models answer Q1 (which keywords are important?) with high IDF, and Q2 (how are the keywords contained?) with TF. For noisy queries (e.g., a page mentioning Bill Gates, Microsoft, talk, Tuesday) and complex documents, these answers are insufficient.
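To make the coupling concrete, a minimal TF-IDF scorer multiplies Q1's answer (IDF) by Q2's answer (TF); this is a generic illustration of the classic features the slide names, and all function and variable names here are illustrative, not from the talk:

```python
import math

def tf_idf_score(query_terms, doc_tokens, doc_freq, n_docs):
    """Couple Q1 (importance, via IDF) with Q2 (containment, via TF)."""
    score = 0.0
    for term in query_terms:
        tf = doc_tokens.count(term)  # Q2: how is the keyword contained?
        idf = math.log((n_docs + 1) / (doc_freq.get(term, 0) + 1))  # Q1: is it important?
        score += tf * idf
    return score
```

The slide's point is that this fixed IDF-times-TF coupling is too rigid when queries are noisy and documents are complex.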

13 Limitation of Traditional Learning to Rank: the heavy burden of fulfilling the principle is left to feature designers rather than to the learning models (RankSVM, RankBoost, LambdaMART).

14 Proposal: Feature Decoupling

15 Feature Decoupling: Towards Facilitating Feature Design. Decouple the document scoring principle (how does the document contain keywords that are important for the query?) into two questions. Q1: Which keywords are important? Q2: How are the keywords contained?

16 Feature Decoupling for Entity-Centric Document Filtering. Meta-features (about each keyword in the query): general (IDF, IsNoun, InEntity, ...) and structural (PositionInPage, InInfobox, InOpenPara, ...). Intra-features (about each keyword in the document): different positions (TFInURL, TFInTitle, ...) and different representations (LogTF, NormalizedTF, ...).
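A sketch of what decoupled feature extraction could look like. Only the feature names follow the slide; the toy dict representation of the query page and document, and the exact feature definitions, are assumptions for illustration:

```python
import math

def meta_features(keyword, query_page, idf):
    """Meta-features (Q1): query-side evidence that a keyword is important.
    Feature names follow the slide; the definitions are illustrative."""
    return {
        "IDF": idf.get(keyword, 0.0),
        "IsNoun": float(keyword.istitle()),  # crude stand-in for a POS tagger
        "InEntity": float(keyword in query_page["entity_name"]),
        "InOpenPara": float(keyword in query_page["opening"]),
    }

def intra_features(keyword, doc):
    """Intra-features (Q2): document-side evidence of how the keyword occurs."""
    tf = doc["body"].count(keyword)
    return {
        "TFInTitle": float(doc["title"].count(keyword)),
        "LogTF": math.log(1 + tf),
        "NormalizedTF": tf / max(len(doc["body"]), 1),
    }
```

The key design point is that meta-features never look at the candidate document, and intra-features never look at the query, so the two sides can be designed independently.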

17 Feature Decoupling for Cross-Domain Sentiment Analysis. Intra-features: different positions (TFInURL, TFInTitle, ...) and different representations (LogTF, NormalizedTF, ...). Meta-features: built from pivot keywords (good, bad, boring, interesting, tedious, high quality, leaking, broken).
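One plausible reading, following the SCL-style use of pivots in Blitzer et al.'s line of work, is that a keyword's meta-features come from its co-occurrence with pivot keywords, which behave consistently across domains. The exact construction is an assumption; this toy sketch just counts co-occurrence at the review level:

```python
def pivot_meta_features(keyword, corpus, pivots):
    """Count, per pivot keyword, how many reviews contain both the target
    keyword and that pivot (each review is a list of tokens)."""
    counts = {p: 0 for p in pivots}
    for review in corpus:
        if keyword in review:
            for p in pivots:
                if p in review:
                    counts[p] += 1
    return counts
```

A domain-specific keyword such as "leaking" that co-occurs with "bad" would then carry negative-sentiment evidence even in a domain unseen at training time.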

18 To learn a ranker given decoupled features, the model should: 1. "recouple" the features. Per the document scoring principle, document relevance is determined by the contribution of each keyword.

19 Recoupling: combine each keyword's meta-features and intra-features into that keyword's contribution to the document score.

20 To learn a ranker given decoupled features, the model should: 1. "recouple" the features; 2. be noise-aware. Queries contain noisy keywords (e.g., "list", "Mexico", "Jeff" in a Wikipedia query page).

21 Requirement for Noise-Aware Recoupling: Inferred Sparsity. Noisy query keywords (e.g., "list", "Mexico", "Jeff") should contribute nothing to the document score.

22 To Achieve Inferred Sparsity: a Two-Layer Scoring Model. A keyword classifier enforces inferred sparsity for noisy keywords, while important keywords pass their contribution to the document score.
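A minimal sketch of the two-layer idea, as an illustration of inferred sparsity rather than the T-RBM itself: a first-layer keyword classifier scores each keyword from its meta-features, keywords below a threshold are zeroed out, and only the survivors add their intra-feature contributions. The weight vectors u and v and the thresholding rule are assumptions:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def two_layer_score(keywords, meta, intra, u, v, threshold=0.5):
    """Layer 1: a keyword classifier turns meta-features into an importance
    probability; keywords below the threshold are dropped (inferred sparsity).
    Layer 2: surviving keywords add their intra-feature contributions."""
    score = 0.0
    for w in keywords:
        importance = sigmoid(sum(u[f] * meta[w][f] for f in u))
        if importance < threshold:
            continue  # noisy keyword: zero contribution
        score += sum(v[f] * intra[w][f] for f in v)
    return score
```

The hard gate is what makes this non-trivial to train: the zero-or-contribute decision is a latent variable per keyword, which motivates the probabilistic model on the next slides.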

23 Realizing such a Two-Layer Scoring Model is Non-Trivial.

24 Solution: Tree-Structured Restricted Boltzmann Machine

25 Overview of Tree-Structured Restricted Boltzmann Machine (T-RBM).

26-28 [T-RBM structure diagram, built up step by step across slides 26-28; the figure content is not recoverable from the transcript.]

29 Learning Feature Weighting by Likelihood Maximization. Maximize the likelihood of the training labels, and compute the gradient by belief propagation.
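The T-RBM obtains its gradients by belief propagation over the tree structure; as a stand-in showing only the general shape of likelihood maximization, here is gradient ascent on the log-likelihood of a flat logistic scorer (not the T-RBM's actual update):

```python
import math

def train_logistic(examples, lr=0.1, epochs=200):
    """Maximize the log-likelihood of binary labels by stochastic gradient
    ascent; the T-RBM analogue obtains these gradients via belief propagation."""
    dim = len(examples[0][0])
    w = [0.0] * dim
    for _ in range(epochs):
        for x, y in examples:  # y in {0, 1}
            p = 1.0 / (1.0 + math.exp(-sum(wi * xi for wi, xi in zip(w, x))))
            for i in range(dim):
                w[i] += lr * (y - p) * x[i]  # d log-likelihood / d w_i
    return w
```

The (y - p) * x_i term is the exact gradient of the per-example log-likelihood for a logistic model; in the T-RBM, the corresponding expectations over the latent keyword variables are what belief propagation computes.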

30 Experiment

31 Datasets for Two Different Applications. Entity-centric document filtering: TREC-KBA dataset, 29 person entities, 52,238 documents, with Wikipedia pages as ID pages. Cross-domain sentiment analysis: dataset released by Blitzer et al., 4 domains, 8,000 documents.

32 T-RBM Outperforms Other Baselines on Both Applications. Baselines: traditional learning-to-rank/classification frameworks without feature decoupling; a simple linear weighting model combining meta-features and intra-features; a boosting framework combining meta-features and intra-features; Structural Correspondence Learning (SCL), the domain adaptation model proposed by Blitzer et al.

33 Summary. Propose to solve learning to rank and domain adaptation as a unified cross-task document scoring problem. Propose the idea of feature decoupling to facilitate feature design. Propose a noise-aware T-RBM model to "recouple" the features.

34 Thanks! Q&A

