Crowdsourcing Chemistry for the Community – 5 Years of Experiences Antony Williams NFAIS, February 28th 2012.

1 Crowdsourcing Chemistry for the Community – 5 Years of Experiences Antony Williams NFAIS, February 28th 2012

2 The World of Online Chemistry  Safety data  Toxicity data  Blogs and Wikis  Property databases  Experimental results  Scientific publications  Compound aggregators  Open Notebook Science  Metabolic pathway databases  Encyclopedic articles (Wikipedia)

3 If it was not just about me…

4  We might have a community built encyclopedia  I might know where the best restaurants are  I might get good advice on books to read  I might know which movies to watch  I might know which plumber to call  Data might just be Open

5 If it was not just about me…  We might have a community built encyclopedia  I might know where the best restaurants are  I might get good advice on books to read  I might know which movies to watch  I might know which plumber to call  Data might just be Open

6 Collaborative Knowledge Management

7 QUESTION  Are you involved with assisting chemists, pharmaceutical scientists, etc. in sourcing information about Chemistry?  1. Yes  2. No

8 Chemistry Databases on the Internet  Public databases are “trusted” as primary sources  Trust is granted without investigation of the content  Online data vary dramatically in quality!  Examples…

9 With Great Fanfare…

10 NPC Browser http://tripod.nih.gov/npc/

11

12

13 How many contribute to clean-up?  Fewer than a dozen contributors to data  The majority are project members  The crowd is small …

14 What you might not know about Chemistry Databases on the Internet  Data-sharing between the databases is cyclic – proliferating errors – “Linked Data”

15 What is the Structure of Vitamin K?

16 MeSH  A lipid cofactor that is required for normal blood clotting. Several forms of vitamin K have been identified: VITAMIN K 1 (phytomenadione) derived from plants, VITAMIN K 2 (menaquinone) from bacteria, and synthetic naphthoquinone provitamins, VITAMIN K 3 (menadione).

17 What is the Structure of Vitamin K1?

18 QUESTION  Who has heard of ChemSpider as a chemistry database?  1. Yes  2. No

19 ChemSpider

20 We Want to Answer Questions  Questions a chemist might ask…  What is the melting point of n-heptanol?  What is the chemical structure of Xanax?  Chemically, what is phenolphthalein?  What are the stereocenters of cholesterol?  Where can I find publications about xylene?  What are the different trade names for Ketoconazole?  What is the NMR spectrum of Aspirin?  What are the safety handling issues for Thymol Blue?

21 Available Information…  Linked to vendors, safety data, toxicity, metabolism

22 Available Information…

23 Crowdsourced “Annotations”  Users can add  Descriptions/Syntheses/Commentaries  Links to PubMed articles  Links to articles via DOIs  Add spectral data  Add Crystallographic Information Files  Add photos  Add MP3 files  Add Videos

24

25 QUESTION  Did you know that ChemSpider was OWNED by the Royal Society of Chemistry?  1. Yes  2. No

26 Public Domain Databases  Our databases are a mess…  Non-curated databases are proliferating errors  We source and deposit data between databases  Original sources of errors hard to determine  Curation is time-consuming and challenging

27 Stop Whining – Fix it

28 Crowdsourced Curation  Crowdsourced curation: identify/tag errors, edit names, synonyms, identify records to deprecate

29 Search “Vitamin H”

30 “Curate” Identifiers

31

32 Validated Name-Structure Dictionaries  Chemical name dictionaries are used for:  Text-mining (publications, patents)  Used to index PubMed and link to Google Patents  Linking to other databases – think Biology!  When structures are not available drug names link  Searching the web  Names link to structures link to InChIs

33 Why are Dictionaries important?

34 The Final Search Strategy

35 Many Names, One Structure

36 I want to know about “Vincristine”

37 Vincristine: Identifiers and Properties

38 Vincristine: Patents Linked by Name

39 Text-Mining Depends on Dictionaries

40 Curated Dictionaries Matter

41 Originally 15 compounds “called” Yohimbine  54 Skeletons for Yohimbine

42 Sharing ChemSpider curation

43 Data Curation Sharing - Proof of Concept

44 Identifier Dictionaries  Reciprocal curation processes…share curation  A series of “added” and “removed” synonyms against structures for matching.  Announced 9 months ago – only one consumer  Who will participate???
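The shared curation feed described on this slide can be pictured as a list of “added” and “removed” synonym records that a consuming database replays against its own name-structure dictionary. The following is a minimal sketch under stated assumptions: the record layout, the function name apply_curation, and the structure label "structure-1" are hypothetical and are not taken from ChemSpider; the names are illustrative only.

```python
def apply_curation(dictionary, changes):
    """Return a new name-structure dictionary with curation records applied.

    dictionary: maps a structure identifier to a set of synonyms.
    changes:    iterable of (action, structure_id, synonym) tuples, where
                action is "added" or "removed" (hypothetical record layout).
    """
    # Copy the input so the consumer's original dictionary is left untouched.
    result = {sid: set(names) for sid, names in dictionary.items()}
    for action, structure_id, synonym in changes:
        if action == "added":
            result.setdefault(structure_id, set()).add(synonym)
        elif action == "removed":
            # discard() ignores synonyms the consumer never had.
            result.setdefault(structure_id, set()).discard(synonym)
        else:
            raise ValueError(f"unknown action: {action!r}")
    # Drop structures left with no synonyms at all.
    return {sid: names for sid, names in result.items() if names}


# Illustrative data: one structure carrying a wrong synonym.
local = {"structure-1": {"biotin", "Vitamin H", "Vitamin K"}}
feed = [
    ("removed", "structure-1", "Vitamin K"),   # curated away as an error
    ("added", "structure-1", "Vitamin B7"),    # validated synonym
]
curated = apply_curation(local, feed)
print(sorted(curated["structure-1"]))
```

Replaying the records in order keeps the exchange reciprocal: each database can publish its own feed and consume the others' without sharing the full dictionary.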

45 Community Contribution to ChemSpider

46 www.SpectralGame.com http://www.jcheminf.com/content/1/1/9

47 Curation through “gaming”

48 Data Curation

49 Reversed Spectrum

50 True Curation of Data

51 ChemSpider SyntheticPages

52

53 Submission Process  Simple template-based submission process  Submissions reviewed by editorial board.  Online Peer Review process  Crowdsourced expansion?  A few regular dedicated authors only  Online peer review and feedback small but useful

54 Crowdsourcing – does it work?  192 people have EVER deposited or curated data  ChemSpider SyntheticPages has a small group of authors  Database hosts make the largest contributions  ChemSpider staff tend to do the most curation

55 Contributions

56 Curations  2009 – 8255 curations by 43 people  2010 – 10014 curations by 66 people  2011 – 16025 curations by 116 people  “Crowdsourcing” – the crowd is small!
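As a quick check on how small the crowd is, the figures on this slide can be turned into curations per contributor. This is only arithmetic on the numbers quoted above; the variable names are illustrative.

```python
# (total curations, number of people) per year, as quoted on the slide.
curations = {2009: (8255, 43), 2010: (10014, 66), 2011: (16025, 116)}

# Average curations per contributor, rounded to one decimal place.
per_person = {
    year: round(total / people, 1)
    for year, (total, people) in curations.items()
}

# Growth from 2009 to 2011 in contributors and in total curations.
people_growth = round(curations[2011][1] / curations[2009][1], 2)
curation_growth = round(curations[2011][0] / curations[2009][0], 2)

for year, average in per_person.items():
    print(year, average)
print("contributors grew x", people_growth)
print("curations grew x", curation_growth)
```

On these numbers the contributor count grew faster than the curation count, so the average per person fell from about 192 in 2009 to about 138 in 2011.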

57 www.SciMobileApps.com  8 contributors only…in 7 months

58 www.SciDBs.com  7 contributors only…in 6 months

59 www.ScientistsDB.com  38 contributors …in 6 weeks

60 What encourages participation?  “Interested” parties contribute  Marketing and self-promotion are primary reasons for participation  There are very few “selfless” participants  Relationships garner contributions…

61  Crowdsourcing across drug discovery  Open PHACTS : partnership between European Community and European Pharma Companies  Freely accessible for knowledge discovery and verification.  Data on chemistry and biology  Pharmacological profiles  Proprietary and public data sources.

62

63 How will it improve? Participation and contribution

64 Conclusions  For chemistry, crowdsourced deposition, annotation, and curation work, but engagement has been low to date  Primary challenge – engaging the community to help create what they want. Rewards and recognition?  MORE collaboration can benefit us all  Indicators are good for small but continued growth

65 Thank you Email: williamsa@rsc.org Twitter: ChemConnector Personal Blog: www.chemconnector.com SLIDES: www.slideshare.net/AntonyWilliams

