Matchin: Eliciting User Preferences with an Online Game Severin Hacker, and Luis von Ahn Carnegie Mellon University SIGCHI 2009.

Matchin
A game that asks each of two randomly chosen partners: "Which of these two images do you think your partner prefers?"

Some Findings
It is possible to extract a global "beauty" ranking within a large collection of images.
It is possible to extract an individual player's general image preferences.
The authors' model can determine a player's gender with high probability.

A Taxonomy of Methods
Absolute Versus Relative Judgments
Total Versus Partial Judgments
Random Access Versus Predefined Access
"I Like" Versus "Others Like"
Direct Versus Indirect

Existing Methods
Flickr Interestingness
Voting
Hot or Not

The Mechanism
Matchin is a two-player game that is played over the Internet.
Every game takes two minutes.
–One pair of images usually takes between two and five seconds.
Matchin uses a collection of 80,000 images from Flickr that were gathered in October 2007.

The Scoring Function
Matchin uses a sigmoid function for scoring games.
Constant scoring function
–Players could get many points by quickly picking the images at random.
Exponential scoring function
–The rewards sometimes became too high.
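The trade-off between the three scoring shapes can be sketched as follows. This is only an illustration: the slide does not say what the score depends on, so the sketch assumes it is the current streak of consecutive agreements, and every constant and function name here (`MAX_POINTS`, `midpoint`, `steepness`, the base of the exponential) is hypothetical.

```python
import math

MAX_POINTS = 1000  # hypothetical cap on the reward for one agreement


def constant_score(streak):
    """Same reward regardless of streak: random fast clicking still pays."""
    return 100  # hypothetical flat reward


def exponential_score(streak):
    """Reward doubles with each consecutive agreement: grows without bound."""
    return 10 * 2 ** streak  # hypothetical base and scale


def sigmoid_score(streak, midpoint=5.0, steepness=1.0):
    """Reward rises with the streak but saturates at MAX_POINTS."""
    return MAX_POINTS / (1.0 + math.exp(-steepness * (streak - midpoint)))
```

Under these assumptions the sigmoid keeps early agreements cheap, rewards sustained agreement, and never exceeds the cap, which is consistent with the two problems the slide lists for the other shapes.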

The Data
The game was launched on May 15, 2008.
Within only four months, 86,686 games had been played by 14,993 players.
There have been 3,562,856 individual decisions (clicks) on images.
An individual decision/record is stored in the form: –

Ranking Functions
Empirical Winning Rate (EWR)
ELO Rating
TrueSkill Rating

Empirical Winning Rate (EWR)
Function:
Two problems:
–For images that have a low degree, the empirical winning rate might be artificially high or low.
–It does not take the quality of the competing image into account.
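The formula on this slide did not survive extraction. A minimal sketch, assuming the empirical winning rate of an image is simply the number of comparisons it won divided by the number of comparisons it appeared in (the function name and the `(winner, loser)` input format are illustrative, not taken from the slides):

```python
from collections import Counter


def empirical_winning_rate(decisions):
    """Map each image id to wins / comparisons.

    decisions: iterable of (winner_id, loser_id) pairs, one per click.
    """
    wins = Counter()
    games = Counter()
    for winner, loser in decisions:
        wins[winner] += 1
        games[winner] += 1
        games[loser] += 1
    # Only images that appeared in at least one comparison get a rate.
    return {image: wins[image] / games[image] for image in games}
```

An image that was shown once and won gets a rate of 1.0, which illustrates the first problem listed above: with few comparisons the estimate is extreme and unreliable.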

ELO Rating (1/2)
The ELO rating system was introduced for rating chess players.
Each chess player’s performance in a game is modeled as a normally distributed random variable.
The mean of that random variable should reflect the player’s true skill and is called the player’s ELO rating.

ELO Rating (2/2)
Expected score:
ELO rating:

p       d_p
0.99    +677
0.9     +366
0.8     +240
0.7     +149
0.6     +72
0.5     0
0.4     -72
0.3     -149
0.2     -240
0.1     -366
0.01    -677
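The two formulas on this slide are missing from the transcript. The sketch below is one way to reproduce the p versus d_p values in the table, assuming the expected score is a standard normal CDF of the rating difference scaled by 200·√2 (which matches the table to within rounding) and assuming the usual Elo update with a K-factor. The names `SCALE`, `expected_score`, `update` and the value `k=16` are illustrative assumptions.

```python
from statistics import NormalDist

# Assumed scale: each performance has std 200, so the difference has 200*sqrt(2).
SCALE = 200 * 2 ** 0.5


def expected_score(rating_a, rating_b):
    """Probability that A beats B under the normal-difference assumption."""
    return NormalDist().cdf((rating_a - rating_b) / SCALE)


def update(rating_a, rating_b, score_a, k=16):
    """Return new (rating_a, rating_b) after one comparison.

    score_a is 1.0 if A won and 0.0 if A lost; k is a hypothetical K-factor.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta
```

With these assumptions a rating difference of +366 gives an expected score of about 0.90 and +72 gives about 0.60, in line with the table.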

TrueSkill Rating (1/2)
Every player’s skill s is modeled as a normally distributed random variable centered around a mean μ with per-player variance σ².
A player’s particular performance in a game is then drawn from a normal distribution with mean s and per-game variance β².

TrueSkill Rating (2/2)
Update:
Conservative skill estimate:
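The update and the conservative estimate are missing from the transcript. The sketch below uses the two-player, no-draw update as it is commonly stated for TrueSkill, and μ − 3σ as the conservative estimate. Whether the slides used exactly these forms is an assumption, as are the default values (μ = 25, σ = 25/3, β = 25/6), the multiplier 3, and all names in the code.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import NormalDist

_STD = NormalDist()      # standard normal, used for pdf and cdf
BETA = 25.0 / 6.0        # assumed per-game performance std (beta)


@dataclass
class Rating:
    mu: float = 25.0           # assumed default mean
    sigma: float = 25.0 / 3.0  # assumed default std


def update(winner, loser, beta=BETA):
    """Return updated (winner, loser) ratings after one win, no draws."""
    c2 = 2 * beta ** 2 + winner.sigma ** 2 + loser.sigma ** 2
    c = sqrt(c2)
    t = (winner.mu - loser.mu) / c
    # v and w are the mean and variance correction terms of a truncated normal.
    # Note: cdf(t) can underflow to 0 for extreme upsets; not handled here.
    v = _STD.pdf(t) / _STD.cdf(t)
    w = v * (v + t)
    new_winner = Rating(
        winner.mu + winner.sigma ** 2 / c * v,
        winner.sigma * sqrt(1 - winner.sigma ** 2 / c2 * w),
    )
    new_loser = Rating(
        loser.mu - loser.sigma ** 2 / c * v,
        loser.sigma * sqrt(1 - loser.sigma ** 2 / c2 * w),
    )
    return new_winner, new_loser


def conservative_estimate(rating, k=3.0):
    """Lower bound on skill: mu minus k standard deviations (k=3 assumed)."""
    return rating.mu - k * rating.sigma
```

Ranking images by the conservative estimate rather than by μ penalizes images with few comparisons, since their σ is still large.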

Collaborative Filtering (1/2)
In the collaborative filtering setting, they want to find out about each individual's preferences:
–recommend images to each user based on his/her preferences
–compare users and images with each other
They have developed a new collaborative filtering algorithm they call “Relative SVD”.

Collaborative Filtering (2/2)
The user feature vectors:
The image feature vectors:
The amount by which user i likes image j:
Data: a set D of triplets (i,j,k)
The error for a particular decision:
The total sum of squared errors (SSE):
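The formulas for this slide are missing, so the following is only a sketch of how a factorization over relative judgments could be trained, not the authors' exact Relative SVD. It assumes that the amount by which user i likes image j is the dot product of their feature vectors, that a triplet (i, j, k) means user i preferred image j over image k, and that the per-decision error is the gap between a target margin of 1 and the predicted preference difference. The learning rate, regularization, feature count and all names are hypothetical.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def train_relative_svd(decisions, n_users, n_images, n_features=2,
                       lr=0.05, reg=0.01, epochs=500, seed=0):
    """Fit user and image feature vectors by SGD on squared margin errors.

    decisions: list of (i, j, k) meaning user i preferred image j over k.
    """
    rng = random.Random(seed)

    def init(n):
        return [[rng.uniform(-0.1, 0.1) for _ in range(n_features)]
                for _ in range(n)]

    users, images = init(n_users), init(n_images)
    for _ in range(epochs):
        for i, j, k in decisions:
            u, vj, vk = users[i], images[j], images[k]
            # Assumed error: target margin 1 minus predicted difference.
            err = 1.0 - (dot(u, vj) - dot(u, vk))
            for f in range(n_features):
                uf, vjf, vkf = u[f], vj[f], vk[f]
                u[f] += lr * (err * (vjf - vkf) - reg * uf)
                vj[f] += lr * (err * uf - reg * vjf)
                vk[f] += lr * (-err * uf - reg * vkf)
    return users, images


def sse(decisions, users, images):
    """Total sum of squared errors over all decisions, same assumed error."""
    return sum((1.0 - (dot(users[i], images[j]) - dot(users[i], images[k]))) ** 2
               for i, j, k in decisions)
```

After training, `dot(users[i], images[j]) > dot(users[i], images[k])` can be read as a prediction that user i prefers image j to image k.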

Comparison of the Models

Local Minimum
Do humans learn while playing the game?
They compared the agreement rate of first-time players and other players.
–the first-time players: 69.0%
–the more experienced players: 71.8%
They have also measured whether people learn within a game.
–the first half of the game: 67%
–the second half of the game: 64%

Gender Prediction
The conditional entropy:
The necessary conditional probabilities Pr(G=g|X=x) can be computed with Bayes' rule given the class conditionals Pr(X=x|G=g).
The naïve Bayes classifier will maximize the likelihood of the data:
The total accuracy is 78.3%.
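The naïve Bayes step can be sketched as follows. The slide's formulas are missing, so this is a generic naïve Bayes classifier over binary choices, not the authors' exact model. It assumes each feature X is a player's choice (0 or 1) on one image pair, estimates Pr(X=x|G=g) from counts with add-one smoothing (an assumption), and applies Bayes' rule in log space. The data layout and function names are illustrative.

```python
import math
from collections import Counter


def train(samples):
    """samples: list of (gender, answers); answers maps pair_id -> 0 or 1."""
    gender_counts = Counter()
    choice_counts = Counter()  # keyed by (gender, pair_id, choice)
    for gender, answers in samples:
        gender_counts[gender] += 1
        for pair_id, choice in answers.items():
            choice_counts[(gender, pair_id, choice)] += 1
    return gender_counts, choice_counts


def predict(model, answers):
    """Return the gender g maximizing Pr(G=g) * prod Pr(X=x|G=g)."""
    gender_counts, choice_counts = model
    total = sum(gender_counts.values())
    best_gender, best_logp = None, -math.inf
    for gender, count in gender_counts.items():
        logp = math.log(count / total)  # log prior Pr(G=g)
        for pair_id, choice in answers.items():
            seen = (choice_counts[(gender, pair_id, 0)]
                    + choice_counts[(gender, pair_id, 1)])
            # Add-one smoothing keeps unseen choices from zeroing the product.
            logp += math.log((choice_counts[(gender, pair_id, choice)] + 1)
                             / (seen + 2))
        if logp > best_logp:
            best_gender, best_logp = gender, logp
    return best_gender
```

For example, `predict(train(samples), {1: 0})` returns whichever gender label more often chose option 0 on pair 1 in `samples`, weighted by the prior.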

The Top Ranked Images

Discussion (1/2)
The highest ranked pictures
–sunsets, animals, flowers, churches, bridges, and famous tourist attractions
–neither provocative nor offensive
The worst pictures
–taken indoors and include a person
–blurry or too dark
–screenshots or pictures of documents or text

Discussion (2/2)
There are substantial differences among players in judging images, and taking those differences into account can greatly help in predicting the users’ behavior on new images.
More experienced players had about the same error rate as new players.

Conclusion
The main contribution of this paper is to provide a new method to elicit user preferences.
They compared several algorithms for combining these relative judgments into a total ordering and found that they can correctly predict a user’s behavior in 70% of the cases.
They describe a new algorithm called Relative SVD to perform collaborative filtering on pair-wise relative judgments.
They present a gender test that asks users to make some relative judgments and can predict a random user’s gender in roughly 4 out of 5 cases.
