
1 Data and text mining workshop: The role of crowdsourcing. Anna Noel-Storr. Wellcome Trust, London, Friday 6th March 2015

2 What is crowdsourcing? “…the practice of obtaining needed services, ideas, or content by soliciting contributions from a large group of people, and especially from an online community, rather than from traditional employees…” Image credit: DesignCareer

3-7 What is crowdsourcing? Brabham's problem-focused crowdsourcing typology: four types
- Knowledge discovery and management
- Broadcast search
- Peer-vetted creative production
- Distributed human intelligence tasking

8 Micro-tasking: process. Breaking a large corpus of data down into smaller units and distributing those units to a large online crowd: "the distribution of small parts of a problem".
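To make the chunking step concrete, here is a minimal sketch, assuming a hypothetical make_microtasks helper and an arbitrary batch size of 25; it illustrates the principle only, not the actual distribution code.

```python
# Hypothetical sketch of micro-tasking: split a large corpus into small
# units and hand each unit to a different crowd member. The batch size
# and names are illustrative assumptions, not the real implementation.

def make_microtasks(records, batch_size=25):
    """Yield small batches of records, one batch per screening task."""
    for i in range(0, len(records), batch_size):
        yield records[i:i + batch_size]

corpus = [f"citation {n}" for n in range(1, 101)]  # stand-in data
for task_id, batch in enumerate(make_microtasks(corpus), start=1):
    print(f"task {task_id}: {len(batch)} citations")  # 4 tasks of 25
```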

9 Human computation. Humans remain better than machines at certain tasks: e.g. identifying pizza toppings from a picture of a pizza, or recognising that the title "preventing obesity without eating like a rabbit".ti. is not an animal study, even though an auto-tagger labels it "Animal study".
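A small example of why such titles trip machines up: a naive keyword auto-tagger (entirely hypothetical, not the real tagging pipeline) flags the title above as an animal study on the strength of the word "rabbit" alone.

```python
# Hypothetical sketch of a naive keyword auto-tagger of the kind that
# mis-fires on figurative language. This is not the actual Cochrane
# tooling; it only illustrates why humans are still needed.
import re

ANIMAL_TERMS = {"rat", "mouse", "mice", "rabbit", "dog", "pig"}

def autotag_animal_study(title: str) -> bool:
    """Tag a record as an animal study if any animal term appears in the title."""
    words = set(re.findall(r"[a-z]+", title.lower()))
    return bool(words & ANIMAL_TERMS)

title = "preventing obesity without eating like a rabbit"
print(autotag_animal_study(title))  # True: the machine tags it 'Animal study'
# A human reader sees instantly that 'like a rabbit' is a figure of speech.
```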

10 Tools and platforms What platforms and tools exist and how do they work? Image credit: ThinkStock

11 The Zooniverse “each project uses the efforts and ability of volunteers to help scientists and researchers deal with the flood of data that confronts them”

12 Classification and annotation: Galaxy Zoo; Operation War Diary

13 Health-related evidence production: trial identification. Can we use crowdsourcing to identify the evidence in a more timely way?
- A known pressure point within review production
- Between 2,000 and 5,000 citations per new review, but can be much more
- A not much loved task

14 The Embase project, feeding Cochrane's Central Register of Controlled Trials (CENTRAL). Step 1: run a very sensitive search for studies in Embase, the largest biomedical database. Step 2: use a crowd to screen the thousands of search results and feed the identified reports of RCTs into CENTRAL (the diagram shows two feeds into CENTRAL: "Embase crowd" and "Embase auto"). How will the crowd do this?

15 The screening tool. Callouts from the interface:
- Three choices
- You are not alone! (and you can't go back)
- Progress bar
- Yellow highlights to indicate a likely RCT
- Red highlights

16 The Embase project: recruitment
- 900+ people have signed up to screen citations in 12 months
- 110,000+ citations have been collectively screened
- 4,000 RCTs/q-RCTs identified by the crowd

17 Why do people do it?
- We made it very easy to participate (and equally easy to stop!)
- They gain experience (bulk up the CV)
- We provide feedback, both to the individual and to the community (people are more likely to come back)
- They want to do something to contribute (healthcare is a strong hook)

18 How accurate is the crowd? Crowd screeners classify each citation as RCT, Reject, or Unsure: agreed RCTs flow into CENTRAL, agreed Rejects go to the bin, and the remainder (around 5%) go to an expert resolver.
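A minimal sketch of that routing logic, assuming unanimous agreement as the acceptance rule; the actual agreement algorithm may differ, so treat the threshold and labels as illustrative.

```python
# Hypothetical sketch of the routing on this slide: each citation collects
# several independent crowd votes; unanimous decisions are acted on, and
# anything else escalates to an expert resolver. The unanimity rule is an
# assumption, not the documented production logic.

def route(votes):
    """votes: list of 'RCT', 'Reject', or 'Unsure' from independent screeners."""
    if votes and all(v == "RCT" for v in votes):
        return "CENTRAL"       # agreed report of a trial
    if votes and all(v == "Reject" for v in votes):
        return "Bin"           # agreed non-trial
    return "Resolver"          # disagreement or uncertainty: expert decides

print(route(["RCT", "RCT", "RCT"]))        # CENTRAL
print(route(["Reject", "Unsure", "RCT"]))  # Resolver
```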

19 Crowd accuracy: the crowd (index test) versus information specialists (reference standard).

Validation 1: TP 1565, FP 9, FN 2, TN 2888. Sensitivity: 99.9%; specificity: 99.7%. Enriched sample; blinded to crowd decision; dual independent screeners as reference standard.

Validation 2: TP 415, FP 5, FN 1, TN 2649. Sensitivity: 99.8%; specificity: 99.8%. Enriched sample; blinded to crowd decision; single independent expert screener (me!) as reference standard; possibility of incorporation bias.

Individual screener accuracy is also carefully monitored.
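The headline figures follow directly from the reported confusion matrices under the standard definitions, sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP), as this short check shows:

```python
# Reproducing the accuracy figures on this slide from the reported
# confusion matrices (TP, FP, FN, TN), using the standard definitions.

def sens_spec(tp, fp, fn, tn):
    return tp / (tp + fn), tn / (tn + fp)

for name, counts in {"Validation 1": (1565, 9, 2, 2888),
                     "Validation 2": (415, 5, 1, 2649)}.items():
    sens, spec = sens_spec(*counts)
    print(f"{name}: sensitivity {sens:.1%}, specificity {spec:.1%}")
# Validation 1: sensitivity 99.9%, specificity 99.7%
# Validation 2: sensitivity 99.8%, specificity 99.8%
```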

20 How fast is the crowd? Length of time to screen one month's worth of records: 6 weeks (Jan 2014), 5 weeks (Jul 2014), 2 weeks (Jan 2015). More screeners, and more screeners screening more quickly.

21 More of the same, and more tasks. As the crowd becomes more efficient, we plan to do two things:
1. Increase the databases we search and feed in more citations from other databases
2. Offer other 'micro-tasks' beyond screen/bin decisions, such as annotation and appraisal (e.g. is the healthcare condition Alzheimer's disease? Y, N, Unsure)
And in these tasks the machine plays a vital and complementary role…

22 Perfect partnership: machine-driven probability + collective human decision-making. It's not one or the other; the ideal is both.
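One way such a partnership is often wired up, sketched here with entirely illustrative thresholds: let the machine's RCT probability handle the clear-cut ends of the spectrum and reserve collective human judgement for the uncertain middle.

```python
# Hypothetical sketch of the machine/crowd partnership: a classifier's
# RCT probability auto-handles the clear-cut records and reserves human
# effort for the uncertain middle band. The thresholds are illustrative
# assumptions, not Cochrane's actual cut-offs.

def triage(rct_probability, low=0.1, high=0.9):
    if rct_probability >= high:
        return "fast-track to crowd confirmation"   # near-certain RCT
    if rct_probability <= low:
        return "auto-reject"                        # near-certain non-RCT
    return "full crowd screening"                   # machine is unsure

for p in (0.95, 0.50, 0.02):
    print(p, "->", triage(p))
```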

23 In summary. Crowdsourcing:
- Effective method for large-scale study identification
- Identifies more studies, more quickly
- No compromise on quality or accuracy
- Offers meaningful ways to contribute
- Feasible to recruit a crowd
- Highly functional tool
- Complements data and text mining
- Enables the move towards the living review

