1 Crowdsourcing Formal Decision Making Using Generalized Semantic Games. Ahmed Abdelmeged

2 We organize formal scientific knowledge in an objectively disputable form and make formal science available to a wider audience.

3 My Thesis: Semantic games of interpreted logic statements provide a useful foundation for building successful crowdsourcing systems for deciding formal science claims.

4 Applications. Formal science Wikipedia. Solving computational problems. Solving hardware and software verification problems. Education in formal science. Wikipedia has a subjective process for disputing claims; here, one can make formal science claims about computational problems and hardware/software verification. Students get feedback on the position they take on formal science claims, with minimal instructor involvement.

5 Outline ‣ Introduction Related Work Proposed Approach Evaluation History and Future Work

6 Deciding Formal Science Claims. A formal science claim family ⟨φ(p), A⟩ is a parameterized logical formula φ(p), interpreted in a "rich", computable structure A. Example: S(c ∈ [0,2]) = ∀x ∈ [0,1]: ∃y ∈ [0,1]: x + y > c. The structure consists of the real numbers with +, >, ...
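
As a concrete illustration (a hypothetical Java encoding, not the system's actual API), a claim family is just a quantifier-free core parameterized by the claim parameter; the quantified variables are supplied later by the players:

class ClaimFamily {
  // Quantifier-free core of a claim, to be instantiated by player moves.
  interface Atom { boolean holds(double x, double y); }

  // S(c) = ∀x ∈ [0,1]: ∃y ∈ [0,1]: x + y > c; for a fixed parameter c,
  // the atomic part is the predicate below.
  static Atom S(double c) { return (x, y) -> x + y > c; }
}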

7 Example Formal Science Claim: Protein Folding (1). Proteins are made of long chains of amino acids (~100s). Some amino acids attract or repel each other; some are hydrophilic and some are hydrophobic. These forces determine the native state, the most stable 3D structure (a.k.a. folding) of a protein.

8 Example Formal Science Claim: Protein Folding (2). nativeState(p ∈ Proteins, f ∈ Foldings(p)) := ∀f2 ∈ Foldings(p): energy(p, f) ≤ energy(p, f2). hasNativeState(p ∈ Proteins) := ∃f ∈ Foldings(p): nativeState(p, f). The logical formula is intended to describe the input to be provided by humans. It is supported by the "rich" structure, implemented in a Turing-complete programming language. For most claim families there is no known (efficient) decision procedure; humans are needed to provide justified decisions. FoldIt! "Predicting protein structures with a multiplayer online game" -- Seth Cooper et al., 2010

9 Example Formal Science Claim: The Hot Spots Conjecture for Acute Triangles (1). Given an insulated flat piece of metal with some initial heat distribution, eventually the hottest point on the metal will lie on its boundary. For an acute-triangle-shaped piece of metal, eventually the hottest point is the corner with the sharpest angle and the coldest point is the corner with the widest angle.

10 Example Formal Science Claim: The Hot Spots Conjecture for Acute Triangles (2). ∃ p ∈ ... [the formal statement did not survive transcription]

11 How can we crowdsource justified decisions of formal science claims?

12 Outline Introduction ‣ Related Work Proposed Approach Evaluation History and Future Work

13 Current Approaches to Deciding Formal Science Claims. Proofs (too challenging for the crowd). Model checkers (don't handle "rich" structures). Semantic games.

14 Deciding Formal Science Claims + Crowdsourcing?

15 Decision Making Using Semantic Games (SGs). A semantic game for a given claim ⟨φ(p_0), A⟩ is a game played by a verifier and a falsifier, denoted SG(φ(p_0), A, verifier, falsifier), such that: A ⊨ φ(p_0) iff the verifier has a winning strategy.

16 Toy Example. S(c ∈ [0,2]) = ∀x ∈ [0,1]: ∃y ∈ [0,1]: x + y > c. S(c) is true for c ∈ [0,1) and false for c ∈ [1,2]. Best strategies: for the falsifier, x = 0; for the verifier, y = 1.
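
A minimal sketch of a single SG play for this family, in hypothetical Java (the provideX/provideY signatures mirror the strategy style used later in this talk):

class ToySG {
  interface Falsifier { double provideX(double c); }
  interface Verifier  { double provideY(double x, double c); }

  // One play of SG(S(c), verifier, falsifier): the falsifier instantiates x,
  // then the verifier instantiates y; the verifier wins iff x + y > c holds.
  static boolean verifierWins(double c, Falsifier f, Verifier v) {
    double x = f.provideX(c);
    double y = v.provideY(x, c);
    return x + y > c;
  }

  public static void main(String[] args) {
    Falsifier bestF = c -> 0.0;       // best falsifier strategy: x = 0
    Verifier  bestV = (x, c) -> 1.0;  // best verifier strategy: y = 1
    // With both players perfect, the verifier wins exactly when c < 1:
    System.out.println(verifierWins(0.5, bestF, bestV)); // true:  S(0.5) holds
    System.out.println(verifierWins(1.5, bestF, bestV)); // false: S(1.5) fails
  }
}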

17 Toy Example: SG Trace.
SG(∀x ∈ [0,1]: ∃y ∈ [0,1]: x + y > 1.5, v, f)
→ falsifier provides 1 for x (weakening, too much!)
SG(∃y ∈ [0,1]: 1 + y > 1.5, v, f)
→ verifier provides 1 for y (strengthening)
SG(1 + 1 > 1.5, v, f)
→ verifier wins.

18 Moves of SG(φ, A, v, f):

φ | Move | Next game
∀x: ψ | f provides x_0 | SG(ψ[x_0/x], A, v, f)
ψ1 ∧ ψ2 | f chooses θ ∈ {ψ1, ψ2} | SG(θ, A, v, f)
∃x: ψ | v provides x_0 | SG(ψ[x_0/x], A, v, f)
ψ1 ∨ ψ2 | v chooses θ ∈ {ψ1, ψ2} | SG(θ, A, v, f)
¬ψ | N/A | SG(ψ, A, f, v)
P(t_0) | v wins if P(t_0) holds, o/w f wins |

"The Game of Language: Studies in Game-Theoretical Semantics and Its Applications" -- Kulas and Hintikka, 1983
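
The move table translates directly into a recursive game loop. Below is a sketch under assumed types (the Formula AST and Player interface are illustrative, not the system's actual classes; uses Java 21 pattern switches):

import java.util.function.DoubleFunction;

sealed interface Formula permits ForAll, Exists, And, Or, Not, Atom {}
record ForAll(DoubleFunction<Formula> body) implements Formula {} // ∀x: ψ
record Exists(DoubleFunction<Formula> body) implements Formula {} // ∃x: ψ
record And(Formula l, Formula r) implements Formula {}            // ψ1 ∧ ψ2
record Or(Formula l, Formula r) implements Formula {}             // ψ1 ∨ ψ2
record Not(Formula body) implements Formula {}                    // ¬ψ
record Atom(boolean holds) implements Formula {}                  // P(t_0)

interface Player {
  double provide();                     // witness for a quantified variable
  Formula choose(Formula a, Formula b); // branch choice for a connective
}

class SG {
  // Winner of SG(phi, A, v, f), following the move table above.
  static Player play(Formula phi, Player v, Player f) {
    return switch (phi) {
      case ForAll a -> play(a.body().apply(f.provide()), v, f); // f provides x_0
      case Exists e -> play(e.body().apply(v.provide()), v, f); // v provides x_0
      case And c    -> play(f.choose(c.l(), c.r()), v, f);      // f picks a conjunct
      case Or d     -> play(v.choose(d.l(), d.r()), v, f);      // v picks a disjunct
      case Not n    -> play(n.body(), f, v);                    // roles swap
      case Atom p   -> p.holds() ? v : f;                       // v wins iff P(t_0) holds
    };
  }
}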

19 Strategies. A strategy is a set of functions, one for each potential move.

21 Example. For the claim ∀x ∈ [0,1]: ∃y ∈ [0,1]: x·y + (1-x)·(1-y²) ≥ 0.62, a potential falsifier strategy is: provideX(c){ return 0.5; }. A potential verifier strategy is: provideY(x, c){ return x; }.

22 Example: SG Trace.
SG(∀x ∈ [0,1]: ∃y ∈ [0,1]: x·y + (1-x)·(1-y²) ≥ 0.62, v, f)
→ falsifier provides 0.5 for x (weakening, too much!)
SG(∃y ∈ [0,1]: 0.5·y + 0.5·(1-y²) ≥ 0.62, v, f)
→ verifier provides 0.5 for y (strengthening)
SG(0.5·0.5 + 0.5·(1-0.5²) ≥ 0.62, v, f)
→ verifier wins (0.625 ≥ 0.62).

23 SG Properties (relevant to our approach). SG winners drive their opponents into contradiction. Faulty verifier (falsifier) actions can produce a false (true) claim from a true (false) one. Faulty actions will be exposed by a perfect opponent, leading to a loss. Winning against a perfect verifier (falsifier) implies that the claim is false (true). Losing an SG implies that you either made a faulty action or took the wrong position.

24 C/S Systems. (Diagram: user contributions flow through a filter and are combined (Σ) for the owners, with encouragement feeding back to the users.) Challenges: Define user contributions. Evaluate users and their contributions. Combine user contributions. Encourage and retain users. "Crowdsourcing systems on the world-wide web" -- Anhai Doan, Raghu Ramakrishnan and Alon Y. Halevy, 2011

25 Example C/S Systems (1). Informally specified tasks: Simple: image labeling (ESP Game) and web page classifiers (Ipeirotis et al.), combined through majority voting. Complex: CrowdForge (Smus et al.) and Wikipedia, combined through manual effort.

26 Example C/S Systems (2). Formally specified tasks: FoldIt! (Cooper et al.), EteRNA (Treuille et al.), PipeJam (Ernst et al.), and algorithm development competitions at TopCoder. We provide a general, collaborative framework.

27 Current Approaches to Crowdsourcing Formally Specified Complex Tasks. Labeling pictures, web page classifiers. Wikipedia. CrowdForge. Crowdsourcing competitions: Harvard Catalyst, TopCoder, Kaggle.

29 Majority voting. Gold standard. Statistical techniques. Objective evaluation.

30 Strategies. provideX(){ return 0.5; } provideY(x){ return x; }

31 Outline Introduction Related Work ‣ Proposed Approach Evaluation History and Future Work

32 Overview. We use SGs to collect evidence of the truth of claims and the skill/strength of users. Egoistic users produce social welfare.

33 SGs and C/S Systems. SGs provide a foundation to: Combine user contributions: the winner's position and moves are assumed to be correct. Evaluate users: the winner is assumed to be more skilled. SGs can help retain users, as they can be fun to play and watch. SGs have a collaborative nature: winners provide information to losers. SGs help "educate" the crowd.

34 How to use SGs to crowdsource decisions of formal science claims?

35 Proposed Approach. Owners provide a claim c; the unreliable users in the crowd provide strategies (a.k.a. avatars) for playing SG(c, -, -). We get the avatars to play numerous SGs, then combine their outcomes to: Estimate the truth likelihood of c. Estimate the strength of avatars. Users update their avatars, and then we iterate.

36 First Shot: Using SGs. Given a claim c, run numerous SG(c, v, f) where v and f are chosen at random from the crowd. The more often the verifiers win, the more likely c is true. Users with more wins have better strategies. Suppose c is true (false) and the falsifier (verifier) wins: this reduces the estimated truth likelihood of c. Suppose c is true (false) and the falsifier (verifier) loses: this reduces the estimated skill level of the falsifier (verifier).
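
A sketch of this first-shot estimator (hypothetical names; verifierWinsSG stands for any SG-playing routine, such as the recursion sketched earlier):

import java.util.List;
import java.util.Random;

class FirstShot {
  interface User { /* strategy callbacks elided */ }

  // Assumed hook: play one SG(claim, v, f) and report whether the verifier won.
  static boolean verifierWinsSG(Object claim, User v, User f) {
    throw new UnsupportedOperationException("SG engine goes here");
  }

  static double truthLikelihood(Object claim, List<User> crowd, int games, Random rnd) {
    int verifierWins = 0;
    for (int i = 0; i < games; i++) {
      User v = crowd.get(rnd.nextInt(crowd.size())); // random verifier
      User f = crowd.get(rnd.nextInt(crowd.size())); // random falsifier
      if (verifierWinsSG(claim, v, f)) verifierWins++;
    }
    // The more often the verifiers win, the more likely the claim is true.
    return (double) verifierWins / games;
  }
}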

37 Generalizing SGs. First ask users for their favorite position. If both choose the same position, force one to play the devil's advocate.

Winner | Forced | Payoff (u, !u) | Truth evidence
u | None | (1, 0) | Pos(u)
u | u | (1, 0) | None
u | !u | (0, 0) | Pos(u)
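
The payoff/evidence table reads directly as code. A sketch (assumed names; "forced" marks whoever was made to play devil's advocate, if anyone):

class GeneralizedSG {
  record Outcome(double winnerPayoff, double loserPayoff, String truthEvidence) {}

  // winnerPos is the winner u's favorite position, i.e. Pos(u).
  static Outcome settle(String winnerPos, boolean winnerForced, boolean loserForced) {
    if (!winnerForced && !loserForced) return new Outcome(1, 0, winnerPos); // Forced = None
    if (winnerForced)                  return new Outcome(1, 0, null);      // Forced = u: no evidence
    return new Outcome(0, 0, winnerPos);  // Forced = !u: no payoff for beating a forced player
  }
}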

39 Generalized Semantic Games. Forced players. The algorithms to handle the unreliable crowd: CIM, user evaluation, claim evaluation, claim family relations.

40 Crowdsourcing (C/S) and SGs

41 Estimating Claim Truth Likelihood. Truth Likelihood = E_v / (E_v + E_f), where E_v (E_f) is the number of times the non-forced verifier (falsifier) wins. [UNW] Each win is weighted by the strength of the opponent. [W8D]
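
A sketch of the two claim evaluators (UNW vs. W8D) under assumed inputs, where each record describes one win by a non-forced player:

import java.util.List;

class ClaimEvaluator {
  record Win(boolean byVerifier, double opponentStrength) {}

  static double truthLikelihood(List<Win> wins, boolean weighted) {
    double ev = 0, ef = 0;
    for (Win w : wins) {
      double weight = weighted ? w.opponentStrength() : 1.0; // W8D weights by opponent strength
      if (w.byVerifier()) ev += weight; else ef += weight;
    }
    return (ev + ef == 0) ? 0.5 : ev / (ev + ef); // E_v / (E_v + E_f); 0.5 when no evidence (assumption)
  }
}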

42 Estimating User Skill: Simple Approach. The fraction of wins against non-forced players. [SM]

Wins_SM(U_i) = Σ_j Payoff(U_i, U_j)
Losses_SM(U_i) = Σ_j Payoff(U_j, U_i)
Str_SM(U_i) = Wins_SM(U_i) / (Wins_SM(U_i) + Losses_SM(U_i))

43 Estimating User Skill: Iterative Approach. Winning against a strong user results in a large gain; losing against a strong user results in a small hit. [IT]

Str_IT^(0)(U_i) = Str_SM(U_i)
Wins_IT^(k)(U_i) = Σ_j Payoff(U_i, U_j) · Str_IT^(k-1)(U_j)
Losses_IT^(k)(U_i) = Σ_j Payoff(U_j, U_i) · (1 - Str_IT^(k-1)(U_j))
Total_IT^(k)(U_i) = Wins_IT^(k)(U_i) + Losses_IT^(k)(U_i)
Str_IT^(k)(U_i) = 0.5 if Total_IT^(k)(U_i) = 0, otherwise Wins_IT^(k)(U_i) / Total_IT^(k)(U_i)
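
Both estimators in one sketch (assumed input: payoff[i][j] accumulates Payoff(U_i, U_j) over all games played):

class UserEvaluator {
  // Str_SM: fraction of (payoff-weighted) wins.
  static double[] strengthSM(double[][] payoff) {
    int n = payoff.length;
    double[] str = new double[n];
    for (int i = 0; i < n; i++) {
      double wins = 0, losses = 0;
      for (int j = 0; j < n; j++) { wins += payoff[i][j]; losses += payoff[j][i]; }
      str[i] = (wins + losses == 0) ? 0.5 : wins / (wins + losses);
    }
    return str;
  }

  // Str_IT: iterate, weighting each game by the opponent's current strength.
  static double[] strengthIT(double[][] payoff, int iterations) {
    int n = payoff.length;
    double[] str = strengthSM(payoff); // Str_IT^(0) = Str_SM
    for (int k = 0; k < iterations; k++) {
      double[] next = new double[n];
      for (int i = 0; i < n; i++) {
        double wins = 0, losses = 0;
        for (int j = 0; j < n; j++) {
          wins   += payoff[i][j] * str[j];       // large gain for beating the strong
          losses += payoff[j][i] * (1 - str[j]); // small hit for losing to the strong
        }
        next[i] = (wins + losses == 0) ? 0.5 : wins / (wins + losses);
      }
      str = next;
    }
    return str;
  }
}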

44 The Crowd Interaction Mechanism (CIM). SGs are binary interaction mechanisms that need to be scaled to the crowd. The CIM decides which SGs to play. There are several tournament options. It should be simple and intuitive for users. We need a fair CIM with minimal effect on estimated user skill levels.

45 Sources of Unfairness. Users u1 and u2, taking the same position on claim c, are not given the same chance if: u1 and u2 play a different number of SGs against any other opponent. Either u1 or u2 is forced more often. There are other players willing to lose on purpose against either u1 or u2.

46 The Contradiction Agreement Game (CAG). If two users choose different positions (contradiction) on a given claim, they play a regular SG. If two users choose the same position (agreement) on a given claim, they play two SGs in which they switch playing devil's advocate. CAGs eliminate the forcing advantage.

47 A Fair CIM. A full round-robin tournament of CAGs eliminates the potential unfairness arising from playing a different number of games against any other opponent or from being forced more often.
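
A sketch of this CIM (assumed User/position/play names; the SG engine itself is elided):

import java.util.List;

class FairCIM {
  interface User { boolean position(Object claim); /* true = verifier side */ }

  // Assumed: plays one SG with `forced` (possibly null) as devil's advocate.
  static void playCAGGame(Object claim, User u1, User u2, User forced) { /* engine */ }

  // Full round robin of CAGs: every pair plays, and agreements are balanced
  // by switching who is forced, so nobody is forced more often than others.
  static void roundRobin(Object claim, List<User> users) {
    for (int i = 0; i < users.size(); i++)
      for (int j = i + 1; j < users.size(); j++) {
        User u1 = users.get(i), u2 = users.get(j);
        if (u1.position(claim) != u2.position(claim)) {
          playCAGGame(claim, u1, u2, null); // contradiction: one regular SG
        } else {
          playCAGGame(claim, u1, u2, u1);   // agreement: two SGs,
          playCAGGame(claim, u1, u2, u2);   // switching devil's advocate
        }
      }
  }
}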

49 Outline Introduction Related Work Proposed Approach ‣ Evaluation History and Future Work

50 Evaluation Approach. We evaluate the system based on the quality of the estimated truth likelihood (E_t) and the estimated user strength in a set of benchmark experiments. Each experiment consists of: A claim with a known truth. A crowd of synthetic users with a predetermined skill distribution. The quality of the estimated truth likelihood is E_t for true claims and (1 - E_t) for false claims. The quality of the estimated user strength is the fraction of pairs of users whose rank is consistent with their predetermined skill.
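
The two quality measures as code (a sketch; inputs are the experiment's ground truth and the system's estimates):

class BenchmarkQuality {
  // Quality of the claim evaluator: E_t for true claims, 1 - E_t for false ones.
  static double truthQuality(double estimatedTruthLikelihood, boolean claimIsTrue) {
    return claimIsTrue ? estimatedTruthLikelihood : 1 - estimatedTruthLikelihood;
  }

  // Quality of the user evaluator: fraction of user pairs ranked consistently
  // with their predetermined skill levels.
  static double rankingQuality(double[] skill, double[] estimatedStrength) {
    int consistent = 0, pairs = 0;
    for (int i = 0; i < skill.length; i++)
      for (int j = i + 1; j < skill.length; j++) {
        pairs++;
        if (Math.signum(skill[i] - skill[j])
            == Math.signum(estimatedStrength[i] - estimatedStrength[j])) consistent++;
      }
    return (double) consistent / pairs;
  }
}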

51 Synthetic Users (1). A synthetic user with skill level p, denoted su_p, makes the perfectly correct action with probability p and the perfectly incorrect action with probability (1 - p).

52 Example Synthetic User. Perfectly correct actions: provideX(c){ return 0.552; } provideY(c, x){ return min(x, x/(2-2*x)); } Perfectly incorrect actions: provideX(c){ return 0; } provideY(c, x){ return x > 0.5 ? 0 : 1; }
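
A sketch of su_p built from these two action sets (the per-move coin flip is the only addition):

import java.util.Random;

class SyntheticUser {
  final double p;                  // skill level
  final Random rnd = new Random();
  SyntheticUser(double p) { this.p = p; }

  double provideX(double c) {      // correct with probability p, else incorrect
    return rnd.nextDouble() < p ? 0.552 : 0.0;
  }
  double provideY(double c, double x) {
    return rnd.nextDouble() < p
        ? Math.min(x, x / (2 - 2 * x)) // perfectly correct action
        : (x > 0.5 ? 0.0 : 1.0);       // perfectly incorrect action
  }
}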

53 Synthetic Users (2). A synthetic user su_p chooses its position on a claim c to be the winning position of SG(c, su_p, su_p). su_1 will always choose the correct position. Otherwise, the probability of choosing the correct position depends on the claim.

54 Winning Probability.

Correct actions | {verifier, falsifier} | {falsifier} | {verifier} | {}
Pr | p_f · p_v | p_f · (1-p_v) | (1-p_f) · p_v | (1-p_f) · (1-p_v)
SP(0.2), true | verifier | falsifier | verifier | falsifier
SP(0.6), true | verifier | falsifier | verifier | falsifier
SP(0.75), false | falsifier | falsifier | verifier | falsifier

55 Winning Probability. Assumptions: The claim is of the form ∀x: ∃y: p(x, y), one move per player. Verifier skill is p_v; falsifier skill is p_f. Incorrect moves are always possible. When the claim is true (false), falsifier (verifier) actions do not matter. Pr{Verifier wins | Claim is true} = p_v. Pr{Falsifier wins | Claim is false} = p_f + (1-p_f)·p_v

56 Winning Probability. For an SG with k_v verifier moves and k_f falsifier moves, there are 2^(k_v + k_f) different scenarios.

57 Winning Probability. For an SG with k moves, there are 2^k different scenarios. Pr{Verifier wins SG} = Σ Pr{Scenario i}, summed over the scenarios i in which the verifier wins.
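
A sketch of this sum by brute-force enumeration (assumed inputs: pCorrect[i] is the probability that move i is played correctly, and winsScenario decides the SG winner once each move's correctness is fixed):

import java.util.function.IntPredicate;

class WinProbability {
  // Pr{verifier wins} = sum of Pr{scenario} over the scenarios the verifier wins.
  static double verifierWins(int k, double[] pCorrect, IntPredicate winsScenario) {
    double total = 0;
    for (int s = 0; s < (1 << k); s++) {  // 2^k correctness scenarios
      double pr = 1;
      for (int i = 0; i < k; i++)         // bit i of s: was move i correct?
        pr *= ((s >> i & 1) == 1) ? pCorrect[i] : 1 - pCorrect[i];
      if (winsScenario.test(s)) total += pr;
    }
    return total;
  }
}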

58 Initial Experiments. Crowd skill distributions: normal, binomial, uniform. Claims: SP(0.2) and SP(0.75). Configurations: CIM: AL vs. TH. UE: SM vs. IT. CE: W8D vs. UNW. (Figure: skill-level frequency histogram for each distribution.)

59 Results. W8D enhances CE quality. The full round robin produces fewer inconsistent rankings than the partial round robin. Surprisingly, we found the simple user evaluator to produce fewer inconsistent rankings than the iterative evaluator. (Uniform, SP(0.2))

Configuration | CE quality | UE quality
AL-SM-UNW | 0.807 | 0.815
AL-SM-W8D | 0.851 | 0.815
AL-IT-UNW | 0.807 | 0.769
AL-IT-W8D | 0.848 | 0.770
TH-SM-UNW | 0.808 | 0.556
TH-SM-W8D | 0.837 | 0.554
TH-IT-UNW | 0.807 | 0.558
TH-IT-W8D | 0.836 | 0.557

60 Outline Introduction Related Work Proposed Approach Evaluation ‣ History and Future Work

61 What We Have Done (2007-2013). [2007-2008] Specker Derivative Game (SDG): a game of financial derivatives for CSP. Supported by GMO. [2009-2011] Specker Challenge Game (SCG): protocols instead of logic sentences; propose claims, then defend, refute, or strengthen them. Supported by Novartis. [2013-] Scientific Community Game (SCG): claim families defined by parameterized logic formulas; defend or refute through semantic games (instead of protocols). "The Specker Challenge Game for Education and Innovation in Constructive Domains" -- keynote paper at Bionetics 2010.

63 We've come a long way... We worked with GMO on a game of financial derivatives. In SCG, we used protocols instead of logic statements.

64 What We Plan To Do (Until End of August '13). Run more experiments with synthetic scholars, with the goal of fine-tuning our system before we give it to humans. Run the system with humans writing avatars: teach the humans; we want humans to be innovative. Claim family to use: minHSR(n, k, q) = there exists a DT(n, k) of minimum depth q.

65 What We Plan To Do (Until End of August '13). Development and evaluation based on synthetic users: build up the benchmarks, fine-tune the system, support more use cases. Bring the system to the web. Experiment with humans writing avatars: the Highest Safe Rung problem; beat the system.

67 Innovation that could happen for minHSR. Baby avatar: k = 2, minimizing n/x + x. minHSR(n, k, q) → MR(k, q, n): reformulation. Modified Pascal's Triangle.

68 Questionnaire before. What did you know before you solved this homework? (sample answer: linear search, binary search) [Note: too general.]

69 Highest Safe Rung. Given a ladder with n rungs and k identical jars, the goal is to discover the highest rung from which a jar can be dropped without breaking. What experimental plan minimizes the total number of experiments? minHSR(n ∈ ℕ, k ∈ ℕ, q ∈ ℕ) := HSR(n, k, q) ∧ ¬HSR(n, k, q-1). HSR(n ∈ ℕ, k ∈ ℕ, q ∈ ℕ) := ∃ d ∈ DT(n, k): [remainder of the formula garbled in the transcript]

70 Potential Innovations. Start with linear search. k = 2. Reformulate: what is the maximum number of rungs that can be handled with k jars in q experiments? Modified Pascal's Triangle (see the sketch below).
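
The reformulation MR(k, q), the maximum number of rungs decidable with k jars and q experiments, satisfies the classic recurrence MR(k, q) = MR(k-1, q-1) + MR(k, q-1) + 1: the first drop either breaks a jar (leaving k-1 jars and q-1 experiments for the rungs below) or survives (leaving k jars and q-1 experiments for the rungs above). Tabulating it is exactly a modified Pascal's triangle. A sketch:

class ModifiedPascal {
  // m[j][e] = MR(j, e); row j = 0 and column e = 0 stay 0.
  static long maxRungs(int k, int q) {
    long[][] m = new long[k + 1][q + 1];
    for (int j = 1; j <= k; j++)
      for (int e = 1; e <= q; e++)
        m[j][e] = m[j - 1][e - 1] + m[j][e - 1] + 1; // modified Pascal's triangle
    return m[k][q];
  }

  public static void main(String[] args) {
    System.out.println(maxRungs(1, 5));  // 5: one jar forces linear search
    System.out.println(maxRungs(2, 14)); // 105: MR(2, q) = q(q+1)/2
  }
}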

71 Questionnaire. How engaging was the experience of writing an avatar that fought on your behalf (scale from 1 to 10)? What did you learn from your peers through the semantic games? Did you know about Pascal's Triangle before? Did you know about linear and binary search before? What kind of change should be made to the system to enhance your learning experience?

73 Questions?

74 Thank You!

75 minVertexBasisSize(...) := ∀ g ∈ Graphs: ∃ n ∈ ℕ: ... := ∃ b ⊆ nodes(g) : basis(g, n, b) [formula largely garbled in the transcript]

76 Teams

77 Playing By Distance

