Crowdsourcing with Multi-Dimensional Trust
Xiangyang Liu 1, He He 2, and John S. Baras 1
1 Institute for Systems Research and Department of Electrical and Computer Engineering, University of Maryland, College Park, MD
2 Department of Computer Science, University of Maryland, College Park, MD


Crowdsourcing Background
[Diagram: clients upload tasks to a crowdsourcing assignment engine, which distributes them to Amazon Turkers ranging from malicious workers to more reliable workers and pure experts; trust evaluation and true label inference then return estimated answers to the clients.]

Motivation
Tasks on crowdsourcing markets like Amazon Mechanical Turk often require knowledge in widely ranging domains, and workers have different levels of reliability in different domains.
Goal: design an algorithm that jointly evaluates workers' trust values in each of the domains and, at the same time, estimates the true labels for classification crowdsourcing tasks.
Example (domains: politics, sports, fashion): one worker may be [good, bad, bad] across the three domains, another [bad, good, bad], and a third [bad, bad, good].

Notations
- Domain distribution for question i
- Domain of question i
- True label for question i; takes values in {0, 1}
- Trust vector for worker j
- Answer given by worker j to question i; takes values in {0, 1}
- Hyperparameter of the Dirichlet prior on the domain distribution
- Parameter of the Beta prior on workers' trust
- Probability that question i is associated with the l-th domain
- Trust value for worker j in domain l; takes values in [0, 1]
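The notation above describes a generative process. A minimal NumPy sketch of that process (all sizes, variable names, and the Dirichlet/Beta hyperparameter values below are illustrative assumptions, not values from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes -- none of these come from the slides.
n_questions, n_workers, n_domains = 100, 5, 3

alpha = np.ones(n_domains)           # Dirichlet hyperparameter on the domain distribution
theta = rng.dirichlet(alpha)         # domain distribution over questions
domain = rng.choice(n_domains, size=n_questions, p=theta)  # domain of each question
truth = rng.integers(0, 2, size=n_questions)               # true binary label per question

# Trust vector per worker: one reliability value in [0, 1] per domain,
# drawn from the Beta prior in the slides (Beta(4, 1) is an assumed choice).
trust = rng.beta(4.0, 1.0, size=(n_workers, n_domains))

# Worker j answers question i correctly with probability trust[j, domain[i]].
correct = rng.random((n_questions, n_workers)) < trust[:, domain].T
answers = np.where(correct, truth[:, None], 1 - truth[:, None])
```

This makes the key modeling assumption explicit: a worker's error rate depends on the question's domain, not just on the worker.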

Probabilistic Graphical Model: No Features
Compute the posterior probability of the trust values and the true labels.

Inference and Estimation
Obtain the approximate posterior distributions by maximizing the lower bound of the log-likelihood, then update the trust values and the true labels accordingly.

Probabilistic Graphical Model with Features
Compute the posterior probability of the trust values and the true labels.

Inference and Estimation
Obtain the approximate posterior distributions by maximizing the lower bound of the log-likelihood:
 E-step: given the current model parameter estimates, obtain the approximate posterior q.
 M-step: given the current posterior q, compute new model parameter estimates by maximizing the lower bound.
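The E/M loop can be sketched when each question i carries a feature vector x_i. The slides do not specify the parametric form, so the sigmoid label prior p(y_i = 1 | x_i) = sigmoid(x_i . beta) below is an assumption, as is the single-domain simplification with known worker trust:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def em_with_features(answers, X, trust, n_iter=20, lr=0.1):
    """Sketch of the E/M loop with question features X.
    trust is a fixed per-worker reliability (single domain, assumed known)."""
    n_q, n_w = answers.shape
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        # E-step: label posterior combines the feature-based prior with
        # worker answers weighted by the log-odds of their trust.
        w = np.log(trust / (1 - trust))
        log_odds = X @ beta + ((2 * answers - 1) * w).sum(axis=1)
        q = sigmoid(log_odds)                    # q(y_i = 1)
        # M-step: one gradient step on the expected log-likelihood of beta.
        grad = X.T @ (q - sigmoid(X @ beta)) / n_q
        beta += lr * grad
    return q, beta
```

The gradient X^T(q - p) is the standard logistic-regression gradient with the hard labels replaced by the posterior q, which is exactly what maximizing the lower bound prescribes for this assumed prior.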

Probabilistic Graphical Model with Topic Models
[Diagram: the multi-dimensional trust crowdsourcing model combined with a topic model.]

Inference and Estimation
Alternately update the approximate posterior distributions of the different hidden variables.

Experiments
[Table: per-domain trust (Domain 0, Domain 1) assigned to four simulated worker types (Type 1 through Type 4); most numeric entries were not recovered.]
[Table: error rates on the Pima dataset for MV, SDC, MDFC, and MDC under the settings (1,2,2,1), (2,2,2,1), and (3,2,2,1), each with and without features (NF); the only recovered values are 0.039, 0.043, and 0.041 for the NF settings.]

Experiments

[Table: per-topic error rates of MV, MDC, and MDTC on the scientific-text data; numeric values were not recovered.]
We tested the model on 1000 scientific texts annotated by five workers. Each worker answers whether a given sentence contains contradicting statements. Each sentence has its text data along with the labels provided by the five experts. We simulate D workers in total, where worker j answers questions from topic j perfectly and answers questions from topics other than j close to randomly.
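The simulated-worker setup described above can be sketched directly; the sizes and the 55% off-topic accuracy are assumptions (the slides say only "close to randomly"):

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical sizes; the slides use D workers and D topics.
D, n_questions = 8, 400
topic = rng.integers(0, D, size=n_questions)
truth = rng.integers(0, 2, size=n_questions)

# Worker j: perfect on topic j, near-random (55% here) on other topics.
acc = np.where(np.arange(D)[:, None] == topic[None, :], 1.0, 0.55)  # (D, n_q)
correct = rng.random((D, n_questions)) < acc
answers = np.where(correct, truth[None, :], 1 - truth[None, :]).T   # (n_q, D)
```

Under this design, a model that learns per-topic trust should assign worker j high trust on topic j and near-chance trust elsewhere, which is what the plot on the next slide checks.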

Experiments
To show that MDTC can recover workers' trust in each of the domains, we plot the mean trust values of the eight workers in each of the eight domains.

Conclusions
Formulated a probabilistic graphical model with multi-dimensional trust characteristics and provided a novel inference method based on variational inference (MDC).
The model is flexible and easily extensible to incorporate feature values (MDFC).
Extended MDC with topic discovery based on questions' text descriptions and derived an analytical solution to the corresponding variational inference (MDTC).

Thank you