
1 Task Search and Recommendation KSE 801 Uichin Lee

2 Task Search in a Human Computation Market Lydia B. Chilton, John J. Horton, Robert C. Miller, Shiri Azenkot KDD-HCOMP 2010

3 Introduction
– In every labor market, information plays a critical role in determining efficiency: buyers and sellers cannot make good choices unless they know their options (e.g., the availability and nature of tasks).
– If workers lack full information about tasks, they are likely to make suboptimal decisions (e.g., accepting inferior offers).
– In large markets, lack of information is a key source of friction (in part because task search is imperfect and costly).

4 Introduction
– The "search problem" is particularly challenging in labor markets because both jobs and workers are unique: there is neither a single prevailing price for a unit of labor nor a commodified unit of labor.
– ICT can reduce search frictions: e.g., monster.com (a job listing site).
– In online labor marketplaces, relatively little information sharing happens compared with traditional markets.
– Online labor marketplaces (e.g., M-Turk) do not impose geographic constraints, so markets can be very large (and made up of micro-tasks) → searching is a very important problem.

5 Introduction
– M-Turk typically offers (1) several features for searching tasks (e.g., by keyword, reward, qualification) and (2) features for sorting tasks (e.g., most available, most recent, highest reward).
– Goal: investigate the search behavior of turkers.
– Two-step procedure:
  – Crawl the "sorted task pages" periodically (every 30 seconds) and analyze the rank data; monitor how fast a task is demoted to lower ranks.
  – Survey turkers to understand their search behavior.

6 M-Turk Search Features
– M-Turk arranges HITs into HIT groups, presented much like traditional web search engine results, with 10 HIT groups listed on each page of search results.
– By default, the list is sorted by "HITs available (most first)".
– HIT types: 1-HIT wonders (single tasks, e.g., surveys) vs. HIT groups (e.g., image labeling, transcription, etc.).
– Each HIT group shows the following information:
  1. Title (e.g., "Choose the best category for this product")
  2. Requester (e.g., "Dolores Labs")
  3. HIT Expiration Date (e.g., "Jan 23, 2011 (38 weeks)")
  4. Time Allotted (e.g., "60 minutes")
  5. Reward (e.g., "$0.02")
  6. HITs Available (e.g., 17110)
  7. Required qualifications (if any)

7 M-Turk Search Features
– Clicking on a HIT title expands the display to show three additional fields:
  1. Description (e.g., "Assign a product category to this product")
  2. Keywords (e.g., categorize, product, kids)
  3. Qualifications (e.g., "Location is US")
– HITs can be sorted in either ascending or descending order by the following (a small sorting sketch follows below):
  1. HIT Creation Date (newest or oldest)
  2. HITs Available (most or fewest) (i.e., how many sub-tasks may be performed)
  3. Reward Amount (highest or lowest)
  4. Expiration Date (soonest or latest)
  5. Title (a-z or z-a)
  6. Time Allotted (shortest or longest)
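To make the six sort orders concrete, here is a minimal sketch, not M-Turk's actual implementation, of applying them to locally stored HIT-group metadata; the field names and example values are illustrative.

```python
# A minimal sketch (not M-Turk's implementation) of the six sort orders
# applied to locally stored HIT-group metadata.
from datetime import datetime

hit_groups = [
    {"title": "Choose the best category for this product",
     "requester": "Dolores Labs",
     "created": datetime(2010, 4, 29), "expires": datetime(2011, 1, 23),
     "time_allotted_min": 60, "reward": 0.02, "hits_available": 17110},
    # ... more HIT groups ...
]

sort_keys = {
    "creation_date": lambda g: g["created"],
    "hits_available": lambda g: g["hits_available"],
    "reward": lambda g: g["reward"],
    "expiration_date": lambda g: g["expires"],
    "title": lambda g: g["title"].lower(),
    "time_allotted": lambda g: g["time_allotted_min"],
}

def sort_hits(groups, key="hits_available", descending=True):
    """Return HIT groups ordered like one of the six sort options."""
    return sorted(groups, key=sort_keys[key], reverse=descending)

# Default M-Turk view: "HITs available (most first)"
for g in sort_hits(hit_groups):
    print(g["title"], g["hits_available"])
```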

8 Rank Data Analysis
– For each sorting mechanism, the top 3 pages were crawled periodically (30 tasks in total, every 30 seconds); a scraping sketch follows below.
– How is rank under each sorting mechanism related to the disappearance rate of a HIT? (to see whether some mechanism is widely used by turkers)
– Disappearance of HIT g = F(rank of g, X_g), where X_g are all the associated HIT attributes, including a time-of-day effect.
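The crawling step can be pictured roughly as follows. This is a minimal sketch: `fetch_sorted_page(sort_key, page)` is a hypothetical helper (not a real M-Turk API) assumed to return the 10 HIT groups shown on one result page for a given sort order.

```python
# Sketch of periodically scraping the top 3 pages of each sort order.
import time

SORT_KEYS = ["hits_available", "creation_date", "reward", "title"]

def scrape_once(fetch_sorted_page):
    """Record (sort_key, rank, hit_group_id) for the top 3 pages of each sort order."""
    snapshot = []
    for key in SORT_KEYS:
        rank = 0
        for page in range(1, 4):                    # top 3 pages = top 30 HIT groups
            for group in fetch_sorted_page(key, page):
                rank += 1
                snapshot.append((key, rank, group["id"]))
    return snapshot

def monitor(fetch_sorted_page, n_iterations, interval_s=30):
    """Scrape every interval_s seconds; demotions are found by diffing snapshots."""
    history = []
    for _ in range(n_iterations):
        history.append((time.time(), scrape_once(fetch_sorted_page)))
        time.sleep(interval_s)
    return history
```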

9 Rank Data Analysis: Model
– Data: April 29, 2010 – May 1, 2010; 2,040 unique HIT groups out of roughly 1M observations.
– For each sorting mechanism:
  – For each HIT group g, the rank change at time slot k is Δy_{k,g} = y_{k,g} − y_{k−1,g}.
  – Outcome variable: Y_{k,g} = I{Δy_{k,g} < 0}; note that negative ranks {−1, −2, …, −30} are used.
  – Multilevel logistic regression: Pr(Y_{k,g} = 1) = logit^{−1}(Σ_{r=1..30} β_r · x^r_{k,g} + γ + τ + ε), where x^r_{k,g} = 1 if HIT group g is at position r at time slot k (otherwise 0).
    – γ: group random effect, reflecting that some groups are more attractive: γ ~ N(0, σ²_γ)
    – τ: time-of-day effect: τ ~ N(0, σ²_τ)
  – β_r can be interpreted (via the inverse logit) as the probability that a HIT occupying position r is demoted to a lower rank.
  – Group random-effects model: γ is estimated; pooled model: γ = 0 (a code sketch of the pooled model follows below).
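A minimal sketch of the pooled model (γ = 0), assuming the scraped data sit in a pandas DataFrame `df` with one row per (scrape k, HIT group g) and two assumed columns: `rank` (position 1..30 at scrape k−1) and `demoted` (1 if the group's rank got worse between scrapes k−1 and k, else 0).

```python
import pandas as pd
import statsmodels.formula.api as smf

def fit_pooled_model(df: pd.DataFrame):
    # Pr(Y = 1) = logit^{-1}(sum_r beta_r * x_r): one dummy per rank position, no intercept.
    model = smf.logit("demoted ~ C(rank) - 1", data=df)
    return model.fit()

# result = fit_pooled_model(df)
# result.params holds one coefficient per rank position; applying the inverse
# logit to each gives the estimated demotion probability at that rank.
# The group-specific and time-of-day random effects of the full model would
# need a mixed-effects logistic regression instead (e.g., statsmodels'
# BinomialBayesMixedGLM), which is omitted from this sketch.
```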

10 Rank Data Analysis: Results
[Figure: expected HIT-disappearing events per scrape iteration vs. rank, for the four sort orders (# of HITs (most), Reward (largest), Time (newest), Title (a-z)); panel (a) group-specific random effect model, panel (b) pooled model]

11 – Group random-effect model (left figure): strong positional effects when sorting by "most" # of HITs.
– Pooled model (right figure): positional effects for both most # of HITs and newest Time.
– Searching by newest HITs?
  – Group random-effect model: rank 1 (newest) appears to have a negative effect, but this does not really match the survey results.
  – This is due to gaming requesters who try their best to push their tasks to higher ranks.
  – For the newest-first sort, the pooled model fits better; yet the coefficient estimates are still biased due to gaming.

12 Worker Survey
– Survey HIT questionnaire:
  – Which of the 12 sorting categories were they presently using?
  – Were they filtering tasks based on keywords or a minimum price?
  – On what page number of the search results did they find this survey HIT?
  – Free-form comments about how easy or hard it was to find HITs

13 Worker Survey
– Survey HIT posting strategies:
  1) Best-case posting: the HIT was configured to rank near the top under 6 primary categories, e.g., newest posting, fewest possible HITs available (one-HIT wonder), least amount ($0.01), soonest expiration (5 hours), "survey" keyword included.
  2) Worst-case posting: made hard to find, e.g., by placing it in the middle of the list (waking the HIT after 6 hours), no "survey" keyword, etc.
  3) Newest-favored or 4) a-z-favored posting: focused on only a single ranking mechanism (newest or a-z).
– All HITs had identical descriptions and were posted very close together in time.
– The best-case posting offered the lowest reward ($0.01 vs. $0.05 for the others).

14 Worker Survey: Results
– Recruited 257 workers in total: 1) 70, 2) 58, 3) 59, 4) 70 workers.
– The newest-favored posting shows a knee (after 1 hour).
– The a-z-favored posting recruits workers steadily (no changes in rank).
[Figure: number of workers vs. time in hours, for the $0.01 and $0.05 reward postings]

15 Worker Survey: Results
– 65%: had not set any minimum reward amount when they found the survey HIT.
– 9%: used a reward filter greater than the reward offered by the survey (cf. M-Turk's filter-reset problem).
– 20%: used keyword searching, e.g., "survey".

16 Worker Survey: Results
– 25%: were browsing pages beyond page 10.
– Some workers are willing to drill dozens of pages deep in order to find tasks.
– Some comments:
  – "scrolling through all 8 pages to find HITs can be a little tiring."
  – "I find it easier to find hits on m-turk if I search for the newest tasks created first. If I don’t find anything up until page 10 then I refresh the page and start over otherwise it becomes too hard to find tasks"
  – "I was on page 25, but as soon as I finish a task, it takes me to page 1!!!"

17 Labor Allocation in Paid Crowdsourcing: Experimental Evidence on Positioning, Nudges and Prices John Horton and Dana Chandler KDD-HCOMP 2011

18 Introduction
– Studies how worker decision making is affected by the type (e.g., monetary) and saliency (e.g., position) of incentives.
– Heuristics, attention, and salience:
  – Bounded rationality (Simon 1955): the rationality of individuals is limited by the information they have, the cognitive limitations of their minds, and the finite amount of time they have to make a decision.
  – People have limited attention; it matters how choices are presented.
  – Salient prices: people respond more to salient prices (as opposed to "rational" responses), e.g., electronic vs. in-person toll collection (Finkelstein 2009).
  – "Setting default options, and other similar seemingly trivial menu-changing strategies, can have huge effects on outcomes, from increasing savings to improving health care to providing organs for lifesaving transplant operations" (Thaler and Sunstein 2008, Nudge: Improving Decisions About Health, Wealth, and Happiness)

19 Experiment Setup

20 Treatment dimensions (a treatment-assignment sketch follows below):
– Type of incentive: monetary, progress bars, or both. The progress bar indicates that images have different levels of completion and nudges workers toward choosing the task most in need of completion.
– Size of monetary incentive: large bonus (5 cents) or small bonus (1 cent).
– Saliency/position of incentive: focal or non-focal (position of each incentive type; focal: top-left, non-focal: bottom-right).
– M-Turk: HITs paying $0.25 to label a single image (plus an extra bonus, if any).
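The sketch below illustrates how a factorial assignment over these dimensions could work. It is an assumption-laden illustration only: the cell structure, names, and crossing (e.g., bonus size is only meaningful when the incentive includes money) may differ from the authors' actual protocol.

```python
# Illustrative factorial treatment assignment (not the paper's exact design).
import itertools
import random

incentive_types = ["monetary", "progress_bar", "both"]
bonus_sizes = ["1_cent", "5_cents"]            # only meaningful when money is involved
positions = ["focal_top_left", "nonfocal_bottom_right"]

treatment_cells = [
    {"incentive": i, "bonus": b, "position": p}
    for i, b, p in itertools.product(incentive_types, bonus_sizes, positions)
]

def assign_treatment(worker_id: str) -> dict:
    """Randomly but reproducibly assign an arriving worker to one treatment cell."""
    random.seed(worker_id)                     # deterministic per worker
    return random.choice(treatment_cells)

print(assign_treatment("worker-42"))
```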

21 Results
– Focal point matters.
– Offering a bonus has a larger effect.
– An imbalanced progress bar makes the bonus more effective.
– A higher bonus is (slightly) more effective: don't overpay! (the small bonus worked about as well as the larger one).
[Figure legend: conditions compared: no bonus; 5¢ bonus on non-focal; 5¢ bonus on focal; 1¢ bonus on non-focal; balanced progress bar; progress bar imbalanced on non-focal; progress bar imbalanced on focal]

22 Towards Task Recommendation in Micro-Task Markets Vamsi Ambati, Stephan Vogel, Jaime Carbonell KDD-HCOMP 2011

23 Some Problems
– Genuinely skilled workers who are better suited to particular tasks may not be able to find them at the right time, before the rest of the crowd, and so cannot contribute.
– Lesser-skilled workers may attempt those tasks and produce sub-standard or noisy output, requiring requesters to put in extra effort to clean and verify the data.
– Less qualified workers may deliver low-quality work, risk being rejected, and in turn hurt their reputation in micro-task markets.
– A vicious cycle leads to a market of lemons: requesters lack trust in workers and do not pay appropriate monetary rewards, attracting low-quality turkers.

24 Task Recommendation
– User data:
  – Static profile information
  – Explicit feedback (explicit ratings a user has made)
  – Implicit feedback (profiling a user's behavior, e.g., search queries, clicks, etc.)
  – Details of the task: description, reward, # HITs available, timestamp, etc.
  – Requester feedback: bonuses, comments, etc.
– User preference modeling and task recommendation (a content-based matching sketch follows below):
  – Content-based matching: bag-of-words (e.g., task description)
  – More sophisticated machine learning methods can be used for better classification
  – Collaborative filtering can be used as well
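A minimal sketch of the content-based matching idea: represent task descriptions as TF-IDF bag-of-words vectors and rank unseen tasks by cosine similarity to tasks the worker completed before. The task texts below are illustrative placeholders, not data from the paper.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

candidate_tasks = [
    "Transcribe a short audio clip about cooking",
    "Label images of street scenes for object detection",
    "Translate product reviews from Spanish to English",
]
worker_history = [
    "Translate a restaurant menu from Spanish to English",
    "Proofread an English translation of a news article",
]

# Fit one vocabulary over all texts, then split back into candidates and history.
vectorizer = TfidfVectorizer(stop_words="english")
vecs = vectorizer.fit_transform(candidate_tasks + worker_history)
cand_vecs, hist_vecs = vecs[:len(candidate_tasks)], vecs[len(candidate_tasks):]

# Score each candidate by its best similarity to anything in the worker's history.
scores = cosine_similarity(cand_vecs, hist_vecs).max(axis=1)
for task, score in sorted(zip(candidate_tasks, scores), key=lambda t: -t[1]):
    print(f"{score:.2f}  {task}")
```

Collaborative filtering would instead score tasks from the accept/complete patterns of similar workers; either signal can feed the same ranked recommendation list.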

25 Summary
– Search (information) is very important in labor marketplaces:
  – Lowering the cognitive burden on workers is critical.
  – Recency of tasks plays quite an important role.
– Non-monetary incentives work well (e.g., progress bars); even games can be designed; "saliency" is very important when designing tasks.
– Task recommendation can help users choose tasks based on their interests and skill sets; it has the potential to improve the overall quality of work in the marketplace.

26 Notes
– KJ/ST:
  – Recommendation
  – Categorization
– YC/MY:
  – Task-based recommendation (like Genius)
– SH/DH/SG:
  – Google's "I'm Feeling Lucky" (keyword search)?
  – Today's task recommendation
  – Requester quality (reputation)
  – Graphical effects (e.g., special marks?)
– JH/SY:
  – Collaborative filtering, pushing
  – TF-IDF-based content-based filtering (task tagging)
  – Divided window: left panel (original) and right panel (personalized)

