
1 Outline: Introduction, Client Motivations, Task Categories, Crowd Motivation, Pros & Cons, Quality Management, Scale Up with Machine Learning, Workflows for Complex Tasks, Market Evolution, Reputation Systems. ECCO, March 20, 2011. corina.ciechanow@pobox.com http://bitsofknowledge.waterloohills.com

2 Introduction June 2006: Jeff Howe coined the term in his Wired magazine article "The Rise of Crowdsourcing". Elements: At least 2 actors: - Client/Requester - Crowd or community (an online audience) A Challenge: - What has to be done? Need, task, etc. - Reward: money, prize, other motivators.

3 Ex: "Adult Websites" Classification Large number of sites to label Get people to look at sites and classify them as: – G (general audience) – PG (parental guidance) – R (restricted) – X (porn) [Panos Ipeirotis. WWW2011 tutorial]

4 Ex: "Adult Websites" Classification Large number of hand-labeled sites Get people to look at sites and classify them as: – G (general audience) – PG (parental guidance) – R (restricted) – X (porn) Cost/Speed Statistics: Undergrad intern: 200 websites/hr, cost: $15/hr MTurk: 2500 websites/hr, cost: $12/hr [Panos Ipeirotis. WWW2011 tutorial]

5 Client motivation Need Suppliers:  Mass work, Distributed work, or just tedious work  Creative work  Look for specific talent  Testing  Support  To offload peak demands  Tackle problems that need specific communities or human variety  Any work that can be done cheaper this way.

6 Client motivation Need customers! Need funding! Need backing! Crowdsourcing is your business!

7 Crowd Motivation Money €€€ Self-serving purpose (learning new skills, gaining recognition, avoiding boredom, enjoyment, building a network with other professionals) Socializing, feeling of belonging to a community, friendship Altruism (public good, helping others)

8 Crowd Demography (background defines motivation) The 2008 survey at iStockphoto indicates that the crowd is quite homogeneous and elite. Amazon's Mechanical Turk workers come mainly from 2 countries: a) USA b) India

9 Crowd Demography

10

11 Client Tasks Parameters 3 main goals for a task to be done: 1. Minimize Cost (cheap) 2. Minimize Completion Time (fast) 3. Maximize Quality (good) Client has other goals when the crowd is not just a supplier

12 Pros Quicker: parallelism reduces time Cheap, even free Creativity, Innovation Quality (depends) Availability of scarce resources: taps into the 'long tail' Multiple feedback Allows building a community (followers) Business Agility Scales up!

13 Cons Lack of professionalism: Unverified quality Too many answers No standards No organisation of answers Not always cheap: Added costs to bring a project to conclusion Too few participants if the task or pay is not attractive If workers are not motivated, quality of work is lower

14 Cons Global language barriers. Different laws in each country: adds complexity No written contracts, so no possibility of non-disclosure agreements. Hard to maintain a long term working relationship with workers. Difficulty managing a large-scale, crowdsourced project. Can be targeted by malicious work efforts. Lack of guaranteed investment, thus hard to convince stakeholders.

15 Quality Management Ex: “Adult Website” Classification Bad news: Spammers! Worker ATAMRO447HWJQ labeled X (porn) sites as G (general audience) [Panos Ipeirotis. WWW2011 tutorial]

16 Quality Management Majority Voting and Label Quality Ask multiple labelers and keep the majority label as the "true" label; quality is the probability of being correct
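To make the aggregation step concrete, here is a minimal Python sketch of majority voting over per-worker labels; the site IDs, votes, and the majority_label helper are hypothetical, not part of the original tutorial.

```python
from collections import Counter

def majority_label(labels):
    """Return the most common label among worker votes (ties broken arbitrarily)."""
    return Counter(labels).most_common(1)[0][0]

# Hypothetical worker votes for two sites (labels G, PG, R, X)
votes = {
    "site-001": ["G", "G", "PG"],
    "site-002": ["X", "G", "X"],   # one spammer vote is outvoted
}

aggregated = {site: majority_label(v) for site, v in votes.items()}
print(aggregated)  # {'site-001': 'G', 'site-002': 'X'}
```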

17 Dealing with Quality Majority vote works best when workers have similar quality Otherwise it is better to just pick the vote of the best worker Or model worker qualities and combine → Vote combination studies [Clemen and Winkler, 1999, Ariely et al. 2000] show that complex models work slightly better than a simple average, but are less robust. Spammers try to go undetected Well-meaning workers may have biases → difficult to tell apart.
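One common way to "model worker qualities and combine" is to weight each vote by the log-odds of the worker's estimated accuracy. The sketch below is an illustrative assumption, not the exact scheme used in the cited studies, and the workers and quality values are made up.

```python
import math
from collections import defaultdict

def weighted_vote(votes, quality):
    """Combine worker votes, weighting each worker by the log-odds of estimated accuracy.

    votes:   dict worker_id -> label
    quality: dict worker_id -> estimated accuracy in (0.5, 1.0)
    """
    scores = defaultdict(float)
    for worker, label in votes.items():
        q = quality[worker]
        scores[label] += math.log(q / (1.0 - q))  # higher-quality workers count more
    return max(scores, key=scores.get)

# Hypothetical example: two mediocre workers vs one reliable worker
votes = {"w1": "PG", "w2": "PG", "w3": "R"}
quality = {"w1": 0.55, "w2": 0.55, "w3": 0.95}
print(weighted_vote(votes, quality))  # 'R': the reliable worker outweighs the two weak ones
```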

18 Human Computation Biases Anchoring Effect: "Humans start with a first approximation (anchor) and then make adjustments to that number based on additional information." [Tversky & Kahneman, 1974] Priming: Exposure to one stimulus (such as stereotypes) influences the response to another [Shih et al., 1999] Exposure Effect: Familiarity leads to liking... [Stone and Alonso, 2010] Framing Effect: Presenting the same option in different formats leads to different answers. [Tversky and Kahneman, 1981] → Need to remove sequential effects from human computation data…

19 Dealing with Quality Use this process to improve quality: 1. Initialize by aggregating labels (using majority vote) 2. Estimate error rates for workers (using the aggregated labels) 3. Update aggregate labels (using the error rates, weight worker votes according to quality); Note: keep labels for "example data" unchanged 4. Iterate from Step 2 until convergence Or use an exploration-exploitation scheme: – Explore: learn about the quality of the workers – Exploit: label new examples using the estimated quality → In both cases, a significant advantage under bad conditions such as imbalanced datasets and low-quality workers
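A minimal sketch of the iterate-until-convergence loop above, using a single accuracy score per worker instead of full error-rate matrices (a deliberate simplification); the votes and worker IDs are invented, and the pinning of known "example data" is omitted.

```python
from collections import defaultdict

# Hypothetical votes: item -> {worker: label}
votes = {
    "a": {"w1": "G", "w2": "G", "w3": "X"},
    "b": {"w1": "X", "w2": "X", "w3": "X"},
    "c": {"w1": "G", "w2": "PG", "w3": "X"},
}

def aggregate(votes, weight):
    """Pick, for each item, the label with the highest total worker weight."""
    labels = {}
    for item, wv in votes.items():
        scores = defaultdict(float)
        for w, lab in wv.items():
            scores[lab] += weight[w]
        labels[item] = max(scores, key=scores.get)
    return labels

# Step 1: initialize with an (unweighted) majority vote
weight = {w: 1.0 for wv in votes.values() for w in wv}
labels = aggregate(votes, weight)

for _ in range(10):  # Step 4: iterate until convergence (fixed cap here)
    # Step 2: estimate each worker's accuracy against the current aggregate labels
    for w in weight:
        answered = [(item, wv[w]) for item, wv in votes.items() if w in wv]
        correct = sum(lab == labels[item] for item, lab in answered)
        weight[w] = max(correct / len(answered), 0.01)  # crude accuracy used as a vote weight
    # Step 3: recompute aggregate labels with quality-weighted votes
    new_labels = aggregate(votes, weight)
    if new_labels == labels:
        break
    labels = new_labels

print(labels, weight)
```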

20 Effect of Payment: Quality Payment level does not affect quality [Mason and Watts, 2009, AdSafe] Similar results for bigger tasks [Ariely et al, 2009] [Panos Ipeirotis. WWW2011 tutorial]

21 Effect of Payment: Number of Tasks Payment incentives do increase speed, though (workers complete more tasks) [Panos Ipeirotis. WWW2011 tutorial]

22 Optimizing Quality Quality tends to remain the same, independent of completion time [Huang et al., HCOMP 2010]

23 Scale Up with Machine Learning Build an 'Adult Website' Classifier Crowdsourcing is cheap but not free – cannot scale to the web without help → Build automatic classification models using examples from crowdsourced data

24 Integration with Machine Learning Humans label training data Use training data to build model
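As an illustration of the "humans label, machine learns" hand-off, here is a minimal sketch assuming scikit-learn is available; it is not the actual AdSafe system, and the page texts and crowd labels are invented placeholders.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Aggregated crowd labels (e.g. the output of the majority-vote step above)
pages = [
    "free casino bonus adult content click here",
    "homework help for high school algebra",
    "graphic violence in latest action movie review",
    "kids crafts and coloring pages for toddlers",
]
labels = ["X", "G", "R", "G"]

# Train a text classifier on the crowd-labeled examples
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(pages, labels)

# The trained model can then classify new, unlabeled pages at web scale
print(model.predict(["casino adult click bonus"]))
```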

25 Dealing with Quality in Machine Learning Noisy labels lead to degraded task performance Labeling quality increases → classification quality increases

26 Tradeoffs for Machine Learning Models Get more data → improve model accuracy Improve data quality → improve classification

27 Tradeoffs for Machine Learning Models Get more data: Active Learning, select which unlabeled example to label next [Settles, http://active-learning.net/] Improve data quality: Repeated Labeling, label an already labeled example again [Sheng et al. 2008, Ipeirotis et al, 2010]
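A sketch of the active-learning side of this tradeoff, using simple uncertainty sampling to decide which example to send to the crowd next; it assumes a fitted classifier exposing predict_proba (such as the pipeline sketched under slide 24), and the candidate texts are hypothetical.

```python
import numpy as np

def most_uncertain(model, unlabeled_texts):
    """Uncertainty sampling: pick the unlabeled example the model is least sure about."""
    proba = model.predict_proba(unlabeled_texts)      # shape: (n_examples, n_classes)
    confidence = proba.max(axis=1)                    # confidence in the top class
    return unlabeled_texts[int(np.argmin(confidence))]  # lowest confidence -> send to the crowd

# Usage with the classifier sketched above (hypothetical candidate pool):
# next_to_label = most_uncertain(model, ["poker night tips", "science fair ideas"])
```

Repeated labeling would instead select already-labeled items (for example those with low worker agreement) and send them back to the crowd for additional labels.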

28 Model Uncertainty (MU) Model uncertainty: get more labels for instances that cause model uncertainty – for modeling: why improve training data quality if the model is already certain there? ("Self-healing" process: [Brodley et al, JAIR 1999], [Ipeirotis et al NYU 2010]) – for data quality: low-certainty "regions" may be due to incorrect labeling of the corresponding instances

29 Quality Rule of Thumb With high quality labelers (80% and above): → one worker per case (more data is better) With low quality labelers (~60%): → multiple workers per case (to improve quality) [Sheng et al, KDD 2008; Kumar and Lease, CSDM 2011]
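The rule of thumb can be checked with a small calculation of majority-vote accuracy for n independent workers on a binary task; the two worker-accuracy values below are just the regimes mentioned on the slide, and the independence assumption is mine.

```python
from math import comb

def majority_accuracy(n, q):
    """P(majority of n independent workers is correct) on a binary task, worker accuracy q.
    Assumes odd n so there are no ties."""
    return sum(comb(n, k) * q**k * (1 - q)**(n - k) for k in range((n // 2) + 1, n + 1))

for q in (0.6, 0.8):
    print(q, [round(majority_accuracy(n, q), 3) for n in (1, 3, 5, 7, 9)])
# q=0.6: 0.6, 0.648, 0.683, 0.710, 0.733 -> redundancy helps, but slowly
# q=0.8: 0.8, 0.896, 0.942, 0.967, 0.980 -> single labels are already good; spend the budget on more data
```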

30 Complex Tasks: Handle Answers Through a Workflow Q: "My task does not have discrete answers…." A: Break it into two Human Intelligence Tasks (HITs): – "Create" HIT – "Vote" HIT The vote controls the quality of the Create HIT; redundancy controls the quality of the Vote HIT Catch: if the "create" result is very good, voting workers just vote "yes" – Solution: add some random noise (e.g. add typos)
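A toy sketch of the create-then-vote loop; post_create_hit and post_vote_hit are hypothetical stand-ins for real HIT-posting calls and are simulated here so the example runs offline.

```python
import random

random.seed(0)

def post_create_hit(current_text):
    """'Create/Improve' HIT: one worker extends or rewrites the current description."""
    return current_text + " improved"          # placeholder for a real worker's edit

def post_vote_hit(old, new, n_voters=3):
    """'Vote' HIT: n_voters workers compare versions; return True if the new one wins."""
    votes = [random.random() < 0.7 for _ in range(n_voters)]   # simulated preference for 'new'
    return sum(votes) > n_voters / 2

description = "A calculator and some coins."
for iteration in range(4):                     # a few create -> vote rounds
    candidate = post_create_hit(description)
    if post_vote_hit(description, candidate):  # redundancy (3 voters) controls vote quality
        description = candidate

print(description)
```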

31 Photo description But the free-form answer can be more complex, not just right or wrong… TurKit toolkit [Little et al., UIST 2010]: http://groups.csail.mit.edu/uid/turkit/

32 Description Versions 1. A partial view of a pocket calculator together with some coins and a pen. 2. ... 3. A close-up photograph of the following items: A CASIO multi-function calculator. A ball point pen, uncapped. Various coins, apparently European, both copper and gold. Seems to be a theme illustration for a brochure or document cover treating finance, probably personal finance. 4. … 8. A close-up photograph of the following items: A CASIO multi-function, solar powered scientific calculator. A blue ball point pen with a blue rubber grip and the tip extended. Six British coins; two of £1 value, three of 20p value and one of 1p value. Seems to be a theme illustration for a brochure or document cover treating finance, probably personal finance.

33 Collective Problem Solving Exploration/exploitation tradeoff (independence or not) – Sharing good solutions can accelerate learning – But it can lead to premature convergence on a suboptimal solution [Mason and Watts, submitted to Science, 2011]

34 Independence or Not? Building iteratively (giving up independence) allowed better outcomes for the image description task… In the FoldIt game, workers built on each other's results But lack of independence may cause high dependence on starting conditions and create groupthink [Little et al, HCOMP 2010]

35 Exploration/Exploitation?

36 Exploration/Exploitation?

37 Group Effect Individual search strategies affect group success: players copying each other explore less → lower probability of finding the peak in a given round

38 Workflow Patterns Generate/Create, Find, Improve/Edit/Fix → Creation Vote to accept/reject, Vote up/down to generate a rank, Vote for best/select top-k → Quality Control Split task, Aggregate → Flow Control Iterate → Flow Control

39 AdSafe Crowdsourcing Experience

40

41 Detect pages that discuss swine flu – A pharmaceutical firm had a drug "treating" (off-label) swine flu – The FDA prohibited pharmaceutical companies from displaying drug ads on pages about swine flu → two days to comply! A big fast-food chain does not want its ads to appear: – On pages that discuss the brand (99% negative sentiment) – On pages discussing obesity

42 AdSafe Crowdsourcing Experience Workflow to classify URLs Find URLs for a given topic (hate speech, gambling, alcohol abuse, guns, bombs, celebrity gossip, etc.) http://url-collector.appspot.com/allTopics.jsp Classify URLs into the appropriate categories http://url-annotator.appspot.com/AdminFiles/Categories.jsp Measure the quality of the labelers and remove spammers http://qmturk.appspot.com/ Get humans to "beat" the classifier by providing cases where the classifier fails http://adsafe-beatthemachine.appspot.com/

43 Market Design of Crowdsourcing Aggregators: Create a crowd or community Create a portal to connect clients to the crowd Deal with the workflow of complex tasks, such as decomposition into simpler tasks and recomposition of the answers → Allow anonymity → Consumers can benefit from a crowd without the need to create it

44 Market Design: Crude vs Intelligent Crowdsourcing Intelligent Crowdsourcing uses an organized workflow to tackle the cons of crude crowdsourcing: → The complex task is divided by experts → Pieces are given to relevant crowds, not to everyone → Individual answers are recomposed by experts into a general answer → Usually covert

45 Lack of Reputation and the Market for Lemons "When the quality of a sold good is uncertain and hidden before the transaction, the price drops to the value of the lowest-valued good" [Akerlof, 1970; Nobel prize winner] Market evolution steps: 1. Employer pays $10 to a good worker, $0.10 to a bad worker 2. 50% good workers, 50% bad; indistinguishable from each other 3. Employer offers a price in the middle: $5 4. Some good workers leave the market (pay too low) 5. Employer revises prices downwards as the share of bad workers increases 6. More good workers leave the market… death spiral http://en.wikipedia.org/wiki/The_Market_for_Lemons
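The death spiral in steps 1-6 can be illustrated with a toy simulation; the reservation wages and worker counts below are invented for the example.

```python
# Toy simulation of the "market for lemons" spiral described on this slide.
# Good workers are worth $10 and leave if the offer drops below their (invented)
# reservation wage; bad workers are worth $0.10 and accept any offer.
good = [7.0, 6.0, 5.5, 4.0, 3.0]   # hypothetical reservation wages of good workers
bad = 5                             # number of bad workers (reservation ~ $0)

for round_ in range(6):
    n_good = len(good)
    if n_good == 0:
        print("only lemons left; market collapses")
        break
    # The employer cannot tell workers apart, so it offers the expected value of a random worker
    offer = (10.0 * n_good + 0.10 * bad) / (n_good + bad)
    print(f"round {round_}: {n_good} good workers, offer = ${offer:.2f}")
    # Good workers whose reservation wage exceeds the offer leave the market
    good = [r for r in good if r <= offer]
```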

46 Reputation systems Great number of reputation mechanisms Challenges in the Design of Reputation Systems - Insufficient participation - Overwhelmingly positive feedback - Dishonest reports - Identity changes - Value imbalance exploitation (“milking the reputation”)

47 Reputation systems [Panos Ipeirotis. WWW2011 tutorial]

48 Reputation systems Dishonest Reports 1. eBay: "Riddle for a PENNY! No shipping - Positive Feedback". Sets up an agreement in order to be given unfairly high ratings. 2. "Bad-mouthing": the same scheme, but used to bad-mouth other sellers they want to drive out of the market. Design incentive-compatible mechanisms to elicit honest feedback [Jurca and Faltings 2003: pay the rater if the report matches the next one; Miller et al. 2005: use a proper scoring rule to value the report; Papaioannou and Stamoulis 2005: delay the next transaction over time] [Panos Ipeirotis. WWW2011 tutorial]

49 Reputation systems Identity changes "Cheap pseudonyms": it is easy to disappear and re-register under a new identity at almost zero cost [Friedman and Resnick 2001]. This introduces opportunities to misbehave without paying reputational consequences. → Increase the difficulty of online identity changes → Impose upfront costs on new entrants: allow new identities (forget the past) but make them costly to create

50 Challenges for Crowdsourcing Markets Two-sided opportunistic behavior 1. In e-commerce markets, only sellers are likely to behave opportunistically. 2. In crowdsourcing markets, both sides can be fraudulent. Imperfect monitoring and heavy-tailed participation: verifying the answers is sometimes as costly as providing them. - Sampling often does not work, due to the heavy-tailed participation distribution (lognormal, according to self-reported surveys) [Panos Ipeirotis. WWW2011 tutorial]

51 Challenges for the Crowdsourcing Market Constrained capacity of workers: workers have constrained capacity (cannot do more than xx hours per day) → Machine Learning Techniques No "price premium" for high-quality workers: it is the requester who sets the prices, which are generally the same for all workers, regardless of their reputation or quality.

52 Market is Organizing the Crowd Reputation Mechanisms – Crowd: Ensure worker quality – Employer: Ensure employer trustworthiness Task organization for task discovery (worker finds employer/task) Worker expertise recording for task assignment (employer/task finds worker)

53 Crowdsourcing Market: Possible Evolutions Optimize the allocation of tasks to workers based on completion time and expected quality Recommender system for crowds ("workers like you performed well in…") Create a market with dynamic pricing for tasks, following the pricing model of the stock market (prices for a task increase when the supply of workers is low, and vice versa) [P. Ipeirotis, 2011]

54 References
Wikipedia, 2011.
Dion Hinchcliffe. "Crowdsourcing: 5 Reasons It's Not Just For Start-Ups Anymore", 2009.
Tomoko A. Hosaka, MSNBC. "Facebook asks users to translate for free", 2008.
Daren C. Brabham. "Moving the Crowd at iStockphoto: The Composition of the Crowd and Motivations for Participation in a Crowdsourcing Application", First Monday, 13(6), 2008.
Karim R. Lakhani, Lars Bo Jeppesen, Peter A. Lohse & Jill A. Panetta. "The Value of Openness in Scientific Problem Solving", Harvard Business School Working Paper No. 07-050, 2007.
Klaus-Peter Speidel. "How to Do Intelligent Crowdsourcing", 2011.
Panos Ipeirotis. "Managing Crowdsourced Human Computation", WWW2011 tutorial, 2011.
Omar Alonso & Matthew Lease. "Crowdsourcing 101: Putting the WSDM of Crowds to Work for You", WSDM, Hong Kong, 2011.
Sanjoy Dasgupta. http://videolectures.net/icml09_dasgupta_langford_actl/, 2009.

