Mutual Information Scheduling for Ranking
Hamza Aftab, Nevin Raj, Paul Cuff, Sanjeev Kulkarni, Adam Finkelstein

Applications of Ranking

Pair-wise Comparisons
Query: A > B?
- Ask a voter whether candidate i is better than candidate j
- Observe the outcome of a match

Scheduling
Design queries dynamically, based on past observations.

Example: Kitten Wars

Example: All Our Ideas (Matthew Salganik, Princeton)

Select Informative Matches
Assume matches are expensive but computation is cheap.
Previous work (Finkelstein): use a ranking algorithm to make better use of information, selecting matches by assigning priority based on two criteria:
- Lack of information: has a team been in a lot of matches already?
- Comparability of the match: are the two teams roughly equal in strength?
Our innovation: select matches based on Shannon's mutual information.

Related Work
Sensor management (tracking), information-driven:
- [Manyika, Durrant-Whyte 1994]
- [Zhao et al. 2002]: Bayesian filtering
- [Aoki et al. 2011]: this session
Learning network topology: [Hayek, Spuckler 2010]
Noisy sort

Ranking Algorithms: Linear Model
Each player has a skill level µ_i; the probability that player i beats player j is a function of the difference µ_i − µ_j, so the model is transitive. Skill levels are fit by maximum likelihood.
- Thurstone-Mosteller model: Q function; each performance has a Gaussian distribution about its mean µ_i.
- Bradley-Terry model: logistic function.
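Both link functions are standard, so a small sketch can make the contrast concrete (my illustration; the deck itself contains no code):

```python
import math

def thurstone_mosteller(mu_i, mu_j, sigma=1.0):
    """P(i beats j) when each performance is Gaussian about its mean:
    the performance difference is N(mu_i - mu_j, 2*sigma^2), so we
    evaluate the Gaussian CDF at the skill difference."""
    z = (mu_i - mu_j) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def bradley_terry(mu_i, mu_j):
    """P(i beats j) as a logistic function of the skill difference."""
    return 1.0 / (1.0 + math.exp(-(mu_i - mu_j)))

print(thurstone_mosteller(1.0, 0.0))  # ~0.76
print(bradley_terry(1.0, 0.0))        # ~0.73
```

Both models are monotone in µ_i − µ_j; they differ only in how quickly the win probability saturates.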

Examples
- Elo's chess rating system (based on the Bradley-Terry model)
- Sagarin's sports rankings

Mutual Information
Mutual information: I(X; Y) = H(X) − H(X | Y)
Conditional mutual information: I(X; Y | Z) = H(X | Z) − H(X | Y, Z)

Entropy
Entropy: H(X) = −Σ_x p(x) log p(x)
Conditional entropy: H(X | Y) = Σ_y p(y) H(X | Y = y)
[Figure: a spread-out distribution has high entropy; a concentrated one has low entropy]
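As a quick numeric check of these definitions (my example, not from the slides), here is the computation for a small joint distribution:

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a probability vector."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Joint distribution of (X, Y); rows index x, columns index y.
joint = np.array([[0.25, 0.25],
                  [0.00, 0.50]])
px = joint.sum(axis=1)
py = joint.sum(axis=0)

H_X = entropy(px)                       # 1.0 bit
H_X_given_Y = entropy(joint.ravel()) - entropy(py)  # chain rule H(X,Y)=H(Y)+H(X|Y)
I_XY = H_X - H_X_given_Y                # ~0.31 bits
print(H_X, H_X_given_Y, I_XY)
```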

Mutual Information Scheduling
Let R be the information we wish to learn (i.e., the ranking or the skill levels), and let O_k be the outcome of the k-th match. At time k, the scheduler chooses the pair

(i_{k+1}, j_{k+1}) = argmax_{(i,j)} I(R; O_{k+1} | O_1, …, O_k)
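A minimal sketch of this greedy step, assuming a Bradley-Terry likelihood and a finite set of hypothesized skill vectors (the hypothesis grid and numbers are mine, for illustration only):

```python
import itertools
import numpy as np

def binary_entropy(p):
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

def schedule_next(hypotheses, posterior):
    """Greedy step: pick the pair (i, j) whose outcome O has maximal
    mutual information I(R; O) with the skill vector R under the
    current posterior over hypotheses."""
    n_players = hypotheses.shape[1]
    best_pair, best_mi = None, -1.0
    for i, j in itertools.combinations(range(n_players), 2):
        p = 1.0 / (1.0 + np.exp(-(hypotheses[:, i] - hypotheses[:, j])))
        p_bar = float(posterior @ p)        # marginal P(i beats j)
        mi = binary_entropy(p_bar) - float(posterior @ binary_entropy(p))
        if mi > best_mi:
            best_pair, best_mi = (i, j), mi
    return best_pair, best_mi

# Toy posterior over three hypothesized skill vectors for three players.
hypotheses = np.array([[1.0, 0.0, -1.0],
                       [0.0, 1.0, -1.0],
                       [0.0, -1.0, 1.0]])
posterior = np.array([0.5, 0.3, 0.2])
print(schedule_next(hypotheses, posterior))
```

After observing the chosen match, the posterior would be updated by multiplying in the outcome's likelihood and renormalizing, and the step repeats.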

Why use Mutual Information?
- Additive property (the chain rule).
- Fano's inequality: relates conditional entropy to the probability of error, so driving H(R | O) down forces the error to be small.
- Continuous distributions: MSE bounds differential entropy.
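The slide's equations were lost in transcription; the standard statements I believe it points to are:

```latex
% Additive property (chain rule for mutual information):
I(R;\, O_1,\dots,O_k) \;=\; \sum_{m=1}^{k} I(R;\, O_m \mid O_1,\dots,O_{m-1})

% Fano's inequality, relating H(R|O) to the probability of error P_e:
H(R \mid O) \;\le\; h_b(P_e) + P_e \log\bigl(|\mathcal{R}| - 1\bigr),
\qquad\text{hence}\qquad
P_e \;\ge\; \frac{H(R \mid O) - 1}{\log |\mathcal{R}|}
```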

Greedy is Not Optimal
Analogy: in source coding, the greedy top-down strategy (always ask the most evenly balanced question first) is not optimal, while Huffman's bottom-up construction is.
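A standard small example of that gap (my illustration, not from the slides): on the distribution below, the greedy top-down split pays about 2.31 expected bits where the Huffman code pays 2.30.

```python
import heapq

def huffman_expected_length(probs):
    """Expected codeword length of the optimal (Huffman) prefix code:
    each merge of two subtrees adds their combined mass to the total."""
    heap = list(probs)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        heapq.heappush(heap, a + b)
        total += a + b
    return total

def greedy_expected_length(probs):
    """Top-down greedy: always split the group as evenly as possible."""
    def split(group):
        if len(group) <= 1:
            return 0.0
        total = sum(group)
        best_k, best_gap = 1, float("inf")
        running = 0.0
        for k in range(1, len(group)):
            running += group[k - 1]
            gap = abs(running - total / 2.0)
            if gap < best_gap:
                best_k, best_gap = k, gap
        return total + split(group[:best_k]) + split(group[best_k:])
    return split(sorted(probs, reverse=True))

p = [0.35, 0.17, 0.17, 0.16, 0.15]
print(greedy_expected_length(p))    # ~2.31 bits
print(huffman_expected_length(p))   # ~2.30 bits
```

The same caveat applies here: maximizing mutual information one match at a time is a tractable heuristic, not a guarantee of the globally optimal schedule.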

Performance (MSE)

Performance (Gambling Penalty)

Identify correct ranking

Find strongest player

Evaluating Goodness-of-Fit
- Ranking: inversions
- Skill level estimates: mean squared error (MSE); Kullback-Leibler (KL) divergence (relative entropy)
- Others: betting risk; sampling inconsistency
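Counting inversions is the one purely combinatorial metric here: the number of pairs ordered one way by the estimate and the other way by the truth (the Kendall tau distance). A small sketch with a made-up four-item example:

```python
def inversions(ranking_a, ranking_b):
    """Count pairs that ranking_a and ranking_b order differently."""
    pos = {item: k for k, item in enumerate(ranking_b)}
    seq = [pos[item] for item in ranking_a]
    return sum(1 for i in range(len(seq))
                 for j in range(i + 1, len(seq))
                 if seq[i] > seq[j])

print(inversions(['A', 'B', 'C', 'D'], ['A', 'C', 'B', 'D']))  # 1
```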

Numerical Techniques
- Calculating mutual information: importance sampling
- Convex optimization (tracking of the ML estimate)
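The Bradley-Terry log-likelihood is concave in the skill levels, so the ML estimate can be tracked with plain gradient ascent; a sketch (my implementation, with a hypothetical step size and win counts):

```python
import numpy as np

def fit_bradley_terry(wins, iters=2000, lr=0.1):
    """ML skill levels under Bradley-Terry by gradient ascent on the
    (concave) log-likelihood. wins[i][j] = times player i beat player j."""
    wins = np.asarray(wins, dtype=float)
    mu = np.zeros(wins.shape[0])
    games = wins + wins.T                      # matches played per pair
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(mu[:, None] - mu[None, :])))
        grad = (wins - games * p).sum(axis=1)  # d(log-likelihood)/d(mu_i)
        mu += lr * grad
        mu -= mu.mean()                        # skills matter only up to a shift
    return mu

print(fit_bradley_terry([[0, 6, 8],
                         [4, 0, 5],
                         [2, 5, 0]]))
```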

Summary of Main Idea
Get the most out of measurements for estimating a ranking: schedule each match to maximize I(S; O_{k+1} | O_1, …, O_k) (greedy, to make the computation tractable).
Flexible: S is any parameter of interest, discrete or continuous (skill levels, best candidate, etc.).
Simple design that competes well with other heuristics.

Ranking Based on Pair-wise Comparisons
Bradley-Terry model: P(i beats j) = λ_i / (λ_i + λ_j).
Examples: a hockey team scores Poisson-distributed goals in a game; two cities compete over which has the tallest person in its population.
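One concrete reading of the hockey example (my gloss, not the slide's exact claim): if each team's goals arrive as a Poisson process with rate λ and the first goal decides a sudden-death game, the winner follows exactly the Bradley-Terry form.

```python
import random

def first_goal_wins(lam_i, lam_j, trials=200_000):
    """Each team's first goal time is Exponential(rate), as in a Poisson
    scoring process; the team that scores first wins."""
    wins = sum(random.expovariate(lam_i) < random.expovariate(lam_j)
               for _ in range(trials))
    return wins / trials

lam_i, lam_j = 3.0, 1.0
print(first_goal_wins(lam_i, lam_j))   # ~0.75
print(lam_i / (lam_i + lam_j))         # Bradley-Terry form: 0.75
```

Setting µ = log λ turns λ_i / (λ_i + λ_j) into the logistic function of µ_i − µ_j, matching the linear model above.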

Computing Mutual Information
Importance sampling handles the multidimensional integral over the probability distributions of the skill level estimates.
Why is it good for estimating skill levels?
- Faster than convex optimization
- Efficient memory use
[Figure: joint density over the skill levels of players 1 and 2]
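A sketch of the self-normalized importance-sampling estimate (my construction under the Bradley-Terry assumption): draw skill vectors from the prior, weight them by the likelihood of the observed history, and average.

```python
import numpy as np

def binary_entropy(p):
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

def mi_by_importance_sampling(samples, log_lik_history, i, j):
    """Estimate I(R; O_{k+1} | history) for a candidate match (i, j).
    `samples` are skill vectors drawn from the prior; weighting each by
    the likelihood of the observed history approximates the posterior
    without re-solving an optimization problem."""
    w = np.exp(log_lik_history - log_lik_history.max())
    w /= w.sum()
    p = 1.0 / (1.0 + np.exp(-(samples[:, i] - samples[:, j])))  # BT win prob
    p_bar = float(w @ p)                    # posterior-mean win probability
    return binary_entropy(p_bar) - float(w @ binary_entropy(p))

rng = np.random.default_rng(0)
samples = rng.normal(size=(5000, 4))        # prior draws for 4 players
log_lik = np.zeros(5000)                    # no matches observed yet
print(mi_by_importance_sampling(samples, log_lik, 0, 1))
```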

Results
(for a 10-player tournament and 100 experiments)

Visualizing the Algorithm
Outcomes (entry = number of times the row player beat the column player):

        A  B  C  D
    A   0  2  3  3
    B   0  0  7  2
    C   0  2  0  5
    D   1  2  2  0

Scheduling: [Figure: from these outcomes, the scheduler picks the next pair to compare, marked "?"]