Sep 25, 2014 Lirong Xia Computational social choice Statistical approaches.

Presentation transcript:

Sep 25, 2014 Lirong Xia Computational social choice Statistical approaches

Announcement
– Report your preferences over papers within a week via !
– Then: meeting 1 before making slides; meeting 2 after making the slides
– Start to think about the topic for project 1

Last class: manipulation
– Various “undesirable” behavior: manipulation, bribery, control
– (Diagram: manipulation → NP-hard)

Example: Crowdsourcing
– (Diagram: Turker 1, Turker 2, …, Turker n each report pairwise comparisons such as a ≻ b and b ≻ c)

Outline: statistical approaches
– Condorcet’s MLE model (history)
– A general framework
– Why MLE? Why Condorcet’s model?
– Random utility models
– Model selection

The Condorcet Jury Theorem [Condorcet 1785]
Given
– two alternatives { a, b }
– 0.5 < p < 1
Suppose
– each agent’s preference is generated i.i.d., such that
– w/p p, it is the same as the ground truth
– w/p 1 − p, it is different from the ground truth
Then, as n → ∞, the majority of agents’ preferences converges in probability to the ground truth.
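To make the theorem concrete, here is a tiny simulation (not from the slides; the function name and the choice p = 0.6 are illustrative) that estimates how often the majority vote recovers the ground truth as n grows.

```python
# Minimal simulation of the Condorcet Jury Theorem: each of n agents votes for the
# ground-truth alternative independently with probability p; we estimate how often
# the majority is correct.
import numpy as np

def majority_accuracy(n, p, trials=10_000, rng=np.random.default_rng(0)):
    votes = rng.random((trials, n)) < p       # votes[t, j]: agent j correct in trial t
    return np.mean(votes.sum(axis=1) > n / 2)

for n in (1, 11, 101, 1001):
    print(n, majority_accuracy(n, p=0.6))     # accuracy approaches 1 as n grows
```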

Parametric ranking models
Composed of three parts
– a parameter space: Θ
– a sample space: S = Rankings(C)^n, where C is the set of alternatives and n = #voters, assuming votes are i.i.d.
– a set of probability distributions over S: { Pr(s | θ) for each s ∈ Rankings(C) and θ ∈ Θ }

Model: M_r
Maximum likelihood estimator (MLE) mechanism. For any profile D = (P_1, …, P_n):
– the likelihood of Θ is L(Θ | D) = Pr(D | Θ) = ∏_{P ∈ D} Pr(P | Θ)
– the MLE mechanism: MLE(D) = argmax_Θ L(Θ | D)
– decision space = parameter space
(Diagram: the “ground truth” Θ generates the votes P_1, P_2, …, P_n.)

Condorcet’s MLE approach [Condorcet 1785]
– Use a statistical model to explain the data (preference profile): Condorcet’s model
– Use likelihood inference to make a decision
– decision space = parameter space
– not necessarily MLE

Condorcet’s model [Condorcet 1785]
Parameterized by an opinion (a simple directed graph). Given a “ground truth” opinion W and p > 1/2, generate each pairwise comparison in V independently as follows (suppose c ≻ d in W):
– c ≻ d in V with probability p
– d ≻ c in V with probability 1 − p
Example: Pr(b ≻ c ≻ a | a ≻ b ≻ c) = (1 − p) × p × (1 − p) = p(1 − p)²
The MLE ranking is the Kemeny rule [Young APSR-88].
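As a concrete instance of the MLE mechanism under Condorcet’s model restricted to full rankings, the sketch below (illustrative names; brute force over all rankings, so only for a handful of alternatives) scores a candidate ranking by p^(#agreeing pairwise comparisons) × (1 − p)^(#disagreeing ones) and returns the maximizer, i.e. a Kemeny ranking.

```python
# Likelihood of a ranking under Condorcet's model (full-ranking votes) and the
# brute-force MLE over rankings, which coincides with the Kemeny rule.
from itertools import combinations, permutations
from math import comb, log

def pairwise_agreements(vote, ranking):
    """Number of alternative pairs ordered the same way by `vote` and `ranking`
    (both are tuples of alternatives, best first)."""
    pos_v = {a: i for i, a in enumerate(vote)}
    pos_r = {a: i for i, a in enumerate(ranking)}
    return sum((pos_v[a] < pos_v[b]) == (pos_r[a] < pos_r[b])
               for a, b in combinations(ranking, 2))

def log_likelihood(profile, ranking, p):
    pairs = comb(len(ranking), 2)
    ll = 0.0
    for v in profile:
        k = pairwise_agreements(v, ranking)
        ll += k * log(p) + (pairs - k) * log(1 - p)
    return ll

def kemeny_mle(profile, alternatives, p=0.6):
    return max(permutations(alternatives),
               key=lambda r: log_likelihood(profile, r, p))

profile = [("a", "b", "c"), ("a", "c", "b"), ("b", "a", "c")]
print(kemeny_mle(profile, ("a", "b", "c")))   # -> ('a', 'b', 'c')
```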

Condorcet’s model for more than 2 alternatives [Young 1988]
Not spelled out very clearly in Young’s paper; ask Lirong for a working note that proves this following Young’s calculations
– message 1: Condorcet’s model is different from the Mallows model
– message 2: Kemeny is not an MLE of Condorcet’s model (but it is an MLE of Mallows)
Fix 0.5 < p < 1. Parameter space: all binary relations over the alternatives
– may contain cycles
Sample space: each vote is a binary relation over the alternatives
Probabilities: given a ground-truth binary relation,
– the comparison between a and b is generated i.i.d. and is the same as the comparison between a and b in the ground truth with probability p
Also studied in [ES UAI-14]

Mallows model [Mallows 1957]
Fix ϕ < 1. Parameter space
– all full rankings over alternatives
– different from Condorcet’s model
Sample space
– i.i.d. generated full rankings over alternatives
– different from Condorcet’s model
Probabilities: given a ground-truth ranking W, generate a ranking V w.p. Pr(V | W) ∝ ϕ^Kendall(V, W)
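A similarly minimal sketch of the Mallows model (brute-force normalization over all rankings, so only for small numbers of alternatives; names are illustrative) computes Kendall-tau distances and the normalized probabilities Pr(V | W) ∝ ϕ^Kendall(V, W).

```python
# Mallows model: Pr(V|W) proportional to phi^Kendall(V, W), normalized by brute force.
from itertools import combinations, permutations

def kendall_tau(v, w):
    """Number of alternative pairs ordered differently by rankings v and w."""
    pos_v = {a: i for i, a in enumerate(v)}
    pos_w = {a: i for i, a in enumerate(w)}
    return sum((pos_v[a] < pos_v[b]) != (pos_w[a] < pos_w[b])
               for a, b in combinations(w, 2))

def mallows_pmf(ground_truth, phi):
    weights = {v: phi ** kendall_tau(v, ground_truth)
               for v in permutations(ground_truth)}
    z = sum(weights.values())
    return {v: wt / z for v, wt in weights.items()}

pmf = mallows_pmf(("a", "b", "c"), phi=0.5)
print(max(pmf, key=pmf.get))   # the mode is the ground truth itself
```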

Statistical decision theory
Given
– a statistical model: Θ, S, Pr(s | θ)
– a decision space: D
– a loss function: L(θ, d) ∈ ℝ
Make a good decision based on data
– decision function f: data → D
– Bayesian expected loss: EL_B(data, d) = E_{θ | data} L(θ, d)
– frequentist expected loss: EL_F(θ, f) = E_{data | θ} L(θ, f(data))
Evaluated w.r.t. the objective ground truth
– different from the approaches evaluated w.r.t. agents’ subjective utilities [BCH+ EC-12]

Statistical decision framework
(Diagram: the ground truth Θ generates data D = (P_1, …, P_n) under model M_r. Step 1, statistical inference: extract information about the ground truth from D, given M_r. Step 2, decision making: turn that information into a decision (winner, ranking, etc.).)

Example: Kemeny
M_r = Condorcet’s model
– Step 1 (MLE): from data D = (P_1, …, P_n), find the most probable ranking
– Step 2: output its top-1 alternative as the winner

Frequentist vs. Bayesian in general (credit: Panos Ipeirotis & Roy Radner)
You have a biased coin: heads w/p p
– you observe 10 heads, 4 tails
– do you think the next two tosses will be two heads in a row?
Frequentist
– there is an unknown but fixed ground truth
– p = 10/14 = 0.714
– Pr(2 heads | p = 0.714) = (0.714)² = 0.51 > 0.5
– Yes!
Bayesian
– the ground truth is captured by a belief distribution
– compute Pr(p | Data) assuming a uniform prior
– compute Pr(2 heads | Data) = 0.485 < 0.5
– No!
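Both numbers are easy to reproduce; the snippet below assumes, as the slide does, a uniform prior for the Bayesian computation (i.e. a Beta(1,1) prior, so the posterior after 10 heads and 4 tails is Beta(11, 5)).

```python
# Frequentist: plug in the MLE p = 10/14 and compute p^2.
# Bayesian: with a Beta(11, 5) posterior, Pr(next two tosses are heads) = E[p^2].
heads, tails = 10, 4

p_mle = heads / (heads + tails)
print(p_mle, p_mle ** 2)                    # 0.714..., 0.510... > 0.5

a, b = heads + 1, tails + 1                 # Beta posterior parameters under a uniform prior
pr_two_heads = a * (a + 1) / ((a + b) * (a + b + 1))   # E[p^2] for Beta(a, b)
print(pr_two_heads)                         # 0.485... < 0.5
```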

Classical Kemeny [Fishburn-77]
M_r = Condorcet’s model
– Step 1 (MLE): from data D = (P_1, …, P_n), find the most probable ranking
– Step 2: output its top-1 alternative as the winner
This is the Kemeny rule (for single winner)!

Example: Bayesian
M_r = Condorcet’s model
– Step 1 (Bayesian update): from data D = (P_1, …, P_n), compute the posterior over rankings
– Step 2: output the most likely top-1 alternative as the winner
This is a new rule!
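A sketch of this new rule under Condorcet’s model restricted to full rankings, with a uniform prior over rankings (the prior choice and all names are illustrative assumptions); it sums the posterior mass of each alternative being ranked first and outputs the alternative with the largest mass.

```python
# Bayesian rule sketch: posterior over full rankings under Condorcet's model with a
# uniform prior, then pick the alternative most likely to be ranked first.
from itertools import combinations, permutations

def bayesian_winner(profile, alternatives, p=0.6):
    def agreements(vote, ranking):
        pv = {a: i for i, a in enumerate(vote)}
        pr = {a: i for i, a in enumerate(ranking)}
        return sum((pv[a] < pv[b]) == (pr[a] < pr[b])
                   for a, b in combinations(ranking, 2))

    rankings = list(permutations(alternatives))
    n_pairs = len(alternatives) * (len(alternatives) - 1) // 2
    # Uniform prior, so the (unnormalized) posterior is just the likelihood
    post = []
    for r in rankings:
        k = sum(agreements(v, r) for v in profile)
        post.append(p ** k * (1 - p) ** (n_pairs * len(profile) - k))
    z = sum(post)

    top_prob = {a: 0.0 for a in alternatives}
    for r, w in zip(rankings, post):
        top_prob[r[0]] += w / z
    return max(top_prob, key=top_prob.get)

profile = [("a", "b", "c"), ("a", "c", "b"), ("b", "a", "c")]
print(bayesian_winner(profile, ("a", "b", "c")))   # -> 'a' on this small profile
```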

Classical Kemeny vs. Bayesian
                            | Anonymity, neutrality, monotonicity | Consistency | Condorcet | Easy to compute
Kemeny (Fishburn version)   |                  Y                  |      N      |     Y     |        N
Bayesian                    |                                     |             |     N     |        Y
Lots of open questions! New paper (one of the discussion papers)

Outline: statistical approaches
– Condorcet’s MLE model (history)
– Why MLE? Why Condorcet’s model?
– A general framework

Classical voting rules as MLEs [Conitzer&Sandholm UAI-05]
When the outcomes are winning alternatives
– MLE rules must satisfy consistency: if r(D_1) ∩ r(D_2) ≠ ∅, then r(D_1 ∪ D_2) = r(D_1) ∩ r(D_2)
– all classical voting rules except positional scoring rules are NOT MLEs; positional scoring rules are MLEs
This is NOT a coincidence!
– all MLE rules that output winners satisfy anonymity and consistency
– positional scoring rules are the only voting rules that satisfy anonymity, neutrality, and consistency! [Young SIAMAM-75]
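For concreteness, here is a minimal positional scoring rule (Borda, i.e. score vector (m−1, m−2, …, 0), chosen purely as an example; names are illustrative). On small profiles it can be used to check the consistency property stated above by comparing the winners of two profiles with the winners of their concatenation.

```python
# A positional scoring rule: each vote gives each alternative the score attached to its
# position, and the winners are the alternatives with the highest total score.
from collections import defaultdict

def positional_scoring_winners(profile, score_vector):
    total = defaultdict(float)
    for vote in profile:                       # vote: tuple of alternatives, best first
        for position, alt in enumerate(vote):
            total[alt] += score_vector[position]
    best = max(total.values())
    return {a for a, s in total.items() if s == best}

profile = [("a", "b", "c"), ("a", "c", "b"), ("b", "a", "c")]
print(positional_scoring_winners(profile, (2, 1, 0)))   # Borda winners: {'a'}
```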

Classical voting rules as MLEs [Conitzer&Sandholm UAI-05]
When the outcomes are winning rankings
– MLE rules must satisfy reinforcement (the counterpart of consistency for rankings)
– all classical voting rules except positional scoring rules and Kemeny are NOT MLEs
This is not (completely) a coincidence!
– Kemeny is the only preference function (that outputs rankings) that satisfies neutrality, reinforcement, and Condorcet consistency [Young&Levenglick SIAMAM-78]

Are we happy?
Condorcet’s model
– not very natural
– computationally hard
Other classical voting rules
– most are not MLEs
– models are not very natural either
– approximately compute the MLE

New mechanisms via the statistical decision framework
Model selection
– How can we evaluate fitness? Frequentist or Bayesian?
Computation
– How can we compute the MLE efficiently?
(Diagram: Data D → inference → information about the ground truth → decision making → decision)

Outline: statistical approaches
– Condorcet’s MLE model (history)
– A general framework
– Why MLE? Why Condorcet’s model?
– Random utility models

Random utility model (RUM) [Thurstone 27]
Continuous parameters: Θ = (θ_1, …, θ_m)
– m: number of alternatives
– each alternative is modeled by a utility distribution μ_i
– θ_i: a vector that parameterizes μ_i
An agent’s perceived utility U_i for alternative c_i is generated independently according to μ_i(U_i). Agents rank alternatives according to their perceived utilities:
– Pr(c_2 ≻ c_1 ≻ c_3 | θ_1, θ_2, θ_3) = Pr_{U_i ∼ μ_i}(U_2 > U_1 > U_3)
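A minimal sketch of sampling votes from a RUM, using normal utility distributions with unit variance as an illustrative choice (the model allows any μ_i; names are illustrative).

```python
# Sample a preference profile from a RUM: each voter draws one utility per alternative
# and reports the alternatives sorted by decreasing utility.
import numpy as np

def sample_rum_profile(theta, n_voters, rng=np.random.default_rng(0)):
    utilities = rng.normal(loc=theta, scale=1.0, size=(n_voters, len(theta)))
    return np.argsort(-utilities, axis=1)     # row i: voter i's ranking, best first

theta = np.array([1.0, 0.5, 0.0])             # alternative 0 tends to be ranked first
print(sample_rum_profile(theta, n_voters=5))
```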

Generating a preference profile
(Diagram: the parameters (θ_1, θ_2, θ_3) generate each agent’s vote independently, e.g. agent 1: P_1 = c_2 ≻ c_1 ≻ c_3, …, agent n: P_n = c_1 ≻ c_2 ≻ c_3.)
Pr(Data | θ_1, θ_2, θ_3) = ∏_{R ∈ Data} Pr(R | θ_1, θ_2, θ_3)

RUMs with Gumbel distributions
μ_i’s are Gumbel distributions
– a.k.a. the Plackett-Luce (P-L) model [BM 60, Yellott 77]
Equivalently, there exist positive numbers λ_1, …, λ_m such that
Pr(c_1 ≻ c_2 ≻ … ≻ c_m) = λ_1/(λ_1 + … + λ_m) × λ_2/(λ_2 + … + λ_m) × … × λ_{m-1}/(λ_{m-1} + λ_m)
(c_1 is the top choice in {c_1, …, c_m}; c_2 is the top choice in {c_2, …, c_m}; …; c_{m-1} is preferred to c_m)
Pros:
– computationally tractable: analytical solution to the likelihood function
– the only RUM that was known to be tractable
– widely applied in economics [McFadden 74], learning to rank [Liu 11], and analyzing elections [GM 06,07,08,09]
Cons: does not seem to fit very well
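The sequential-choice form of the P-L likelihood is short enough to spell out in code; the helper below is illustrative (alternatives as strings, weights as a dict).

```python
# Plackett-Luce probability of a ranking: at each step, the current top alternative is
# chosen with probability lambda(top) / sum of lambdas of the alternatives not yet ranked.
def plackett_luce_prob(ranking, lam):
    """ranking: tuple of alternatives, best first; lam: dict alternative -> positive weight."""
    prob = 1.0
    for i, c in enumerate(ranking[:-1]):
        prob *= lam[c] / sum(lam[d] for d in ranking[i:])
    return prob

lam = {"a": 3.0, "b": 2.0, "c": 1.0}
print(plackett_luce_prob(("a", "b", "c"), lam))   # (3/6) * (2/3) = 1/3
```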

RUM with normal distributions
μ_i’s are normal distributions
– Thurstone’s Case V [Thurstone 27]
Pros:
– intuitive
– flexible
Cons: believed to be computationally intractable
– no analytical solution for the likelihood function Pr(P | Θ) is known; it is a nested integral in which U_m ranges from −∞ to ∞, U_{m-1} from U_m to ∞, …, and U_1 from U_2 to ∞
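There is no closed form here, but the nested integral can be estimated by plain Monte Carlo; the snippet below only illustrates that point (unit-variance normals, illustrative names).

```python
# Crude Monte Carlo estimate of Pr(ranking | theta) under a normal RUM: draw many
# utility vectors and count how often they induce the target ranking.
import numpy as np

def mc_ranking_prob(ranking, theta, samples=200_000, rng=np.random.default_rng(0)):
    u = rng.normal(loc=theta, scale=1.0, size=(samples, len(theta)))
    induced = np.argsort(-u, axis=1)                  # sampled rankings, best first
    return np.mean(np.all(induced == np.array(ranking), axis=1))

print(mc_ranking_prob((0, 1, 2), theta=np.array([1.0, 0.5, 0.0])))
```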

MC-EM algorithm for RUMs [APX NIPS-12]
Utility distributions μ_i’s belong to the exponential family (EF)
– includes normal, Gamma, exponential, Binomial, Gumbel, etc.
In each iteration t:
– E-step: for any set of parameters Θ, compute the expected log-likelihood (ELL), ELL(Θ | Data, Θ_t) = f(Θ, g(Data, Θ_t)), approximately computed by Gibbs sampling
– M-step: choose Θ_{t+1} = argmax_Θ ELL(Θ | Data, Θ_t)
Repeat until |Pr(D | Θ_t) − Pr(D | Θ_{t+1})| < ε
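Below is a heavily simplified, purely illustrative MC-EM skeleton in this spirit, for normal utilities with fixed variance 1 and a single Gibbs sample per vote per iteration. It is not the algorithm of [APX NIPS-12]; it only shows the shape of the loop (Monte Carlo E-step via Gibbs sampling of latent utilities, closed-form M-step).

```python
# Simplified MC-EM for a fixed-variance normal RUM. The Gibbs step resamples each latent
# utility from a normal truncated to lie between the utilities of its ranking neighbors.
import numpy as np
from scipy.stats import truncnorm

def gibbs_utilities(ranking, theta, sweeps=30, rng=np.random.default_rng(0)):
    """One utility vector consistent with `ranking` (tuple of alternative indices, best first)."""
    m = len(theta)
    u = np.empty(m)
    for pos, alt in enumerate(ranking):       # initialize consistently with the ranking
        u[alt] = float(m - pos)
    for _ in range(sweeps):
        for pos, alt in enumerate(ranking):
            hi = u[ranking[pos - 1]] if pos > 0 else np.inf
            lo = u[ranking[pos + 1]] if pos + 1 < m else -np.inf
            a, b = lo - theta[alt], hi - theta[alt]     # truncnorm bounds are standardized
            u[alt] = truncnorm.rvs(a, b, loc=theta[alt], scale=1.0, random_state=rng)
    return u

def mc_em(profile, m, iterations=20):
    theta = np.zeros(m)
    for _ in range(iterations):
        # Monte Carlo E-step: latent utilities for every vote given the current parameters
        samples = np.array([gibbs_utilities(r, theta) for r in profile])
        # M-step for fixed-variance normals: means of the sampled utilities
        theta = samples.mean(axis=0)
        theta -= theta.mean()                 # pin the overall location, which is not identifiable
    return theta

profile = [(0, 1, 2), (0, 2, 1), (1, 0, 2)]   # votes over alternatives 0, 1, 2, best first
print(mc_em(profile, m=3))
```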

Outline: statistical approaches
– Condorcet’s MLE model (history)
– A general framework
– Why MLE? Why Condorcet’s model?
– Random utility models
– Model selection

Model selection (project: model fitness for election data)
Compare RUMs with normal distributions and P-L on
– log-likelihood: log Pr(D | Θ)
– predictive log-likelihood: E log Pr(D_test | Θ)
– Akaike information criterion (AIC): 2k − 2 log Pr(D | Θ)
– Bayesian information criterion (BIC): k log n − 2 log Pr(D | Θ)
Tested on an election dataset
– 9 alternatives, randomly chosen 50 voters
Results, Value(Normal) − Value(PL); entries shown in red on the slide are statistically significant with 95% confidence:
    LL          | Pred. LL    | AIC          | BIC
    44.8 (15.8) | 87.4 (30.5) | −79.6 (31.6) | −50.5 (31.6)
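The two information criteria are one-line formulas; the helpers below just restate them (the numbers in the usage line are toy values, not from the dataset above).

```python
# AIC and BIC as defined on the slide: k = number of free parameters,
# log_lik = log Pr(D | Theta) at the fitted parameters, n = number of observations.
import math

def aic(log_lik, k):
    return 2 * k - 2 * log_lik

def bic(log_lik, k, n):
    return k * math.log(n) - 2 * log_lik

print(aic(log_lik=-120.0, k=9), bic(log_lik=-120.0, k=9, n=50))   # toy numbers
```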