Scratchpads Virtual Research Environments for taxonomic and biodiversity related data Dr. Vince Smith Informatics Research Leader The Natural History Museum.


1 Scratchpads Virtual Research Environments for taxonomic and biodiversity related data Dr. Vince Smith Informatics Research Leader The Natural History Museum London

2 Where to find and how to cite this presentation: Smith, V.S., Koureas, D., & Livermore, L. 2014. Scratchpads introductory presentation. Slideshare. http://www.slideshare.net/vsmithuk/Scratchpad-2014-Introduction

3 Current taxonomic data production: Publications based on countless specimens, images, maps, keys and datasets; Typically generated by small communities for “local” research projects. Figure from Costello M.J. et al., 2013. doi: 10.1126/science.1230318

4 However… Vast amounts of unpublished taxonomic “knowledge”. Published knowledge cannot easily be mobilised: not publicly accessible; lacks sufficient contextual metadata; published in formats that require time-consuming manual extraction; difficulty in publishing valuable datasets (i.a. local or regional Floras, Faunas)

5 On the other hand: Estimates of 7.5 million species still undescribed [1]. [1] Mora C. et al. How Many Species Are There on Earth and in the Ocean? doi:10.1371/journal.pbio.1001127

6 Expected volume of taxonomic and biodiversity data: Need to extract, aggregate and link data on a global level

7 The four nodes of data cycle 1. We collect and generate data 2. We curate, link and structure data 3. We analyse data 4. We publish data

8 The four nodes of data cycle: Data collection & generation; Data curation; Data analysis; Data publishing. What are the bottlenecks in the workflow?

9 What we need is… a seamless workflow: Data collection & generation; Data curation; Data analysis; Data publishing

10 This requires data, information & knowledge to be… Digital (not printed paper); Openly accessible (not behind barriers, e.g. paywalls); Linked-up (not in silos). To achieve this… “Link together evolutionary data … by developing analytical tools and proper documentation and then use this framework to conduct comparative analyses, studies of evolutionary process and biodiversity analyses” (Cyndy Parr, Rob Guralnick, Nico Cellinese and Rod Page. TREE. doi:10.1016/j.tree.2011.11.001)

11 Scratchpads Virtual Research Environments Making taxonomy digital, open & linked

12 So… what are the Scratchpads?

13 What are Scratchpads? Hosted websites for biodiversity data; Virtual research & publication platform; Completely open access & open source; Modular & flexible

14 What are Scratchpads? A standardized environment for entering and curating data that facilitates the development of online research communities through sharing and interlinking, and that allows dissemination of research products

15 The Scratchpads concept: A Scratchpad is a website that holds data for you and your community. Your data; External data & services

16 The Scratchpads concept

17 Examples of use: Taxa (Classifications, taxon profiles, specimens, literature, images, maps, phenotypic, genotypic & morphometric datasets, keys, phylogenies); Conservation; Projects; Regions; Societies

18 Examples of use: Red List conservation assessments

19 Bulbous monocot genera listed in CITES

20 Examples of use: Global Invasive Alien Species Information Partnership

21 Examples of use: Belgian Network for DNA Barcoding

22 Major integrated projects: Online resource for monocot plants; Collaboration between Kew, Oxford University and NHM; Data to be open and usable by other scientists

23 Major integrated projects: 21+ open community sites and growing; Over 45 internationally collaborating scientists; Site data feeds into a “Portal”. Site List: http://about.e-monocot.org/list-emonocot-scratchpads

24 Major integrated projects: Retrieve information on any Monocot plant; Rich downloadable data; Identification keys; Model example of linked attributed data. eMonocot Portal: http://e-monocot.org/

25 Are Scratchpads sustainable? 665 Scratchpads Communities by 7,334 active registered users covering 162,432 taxa in 735,660 pages. 65,000 unique visitors/month (chart: unique visitors per month to Scratchpads sites). In total more than 1,300,000 visitors. 81 paper citations in 2012

26 Are Scratchpads sustainable? Timeline (2007, 2011, 2014): ViBRANT (Virtual Biodiversity Research & …); Other grants in the pipeline; New Proposals

27 the main features

28 The main features: Classification term oriented system. Biological classifications; Non-biological classifications; Taxonomies; Hierarchical controlled vocabularies

29 The main features: Dynamic Biological Classifications. Manually entered or imported; Auto generated

30 The main features: Taxon pages. Overview of data related to taxon; Generated from tagged content

31 The main features: Bibliography management. An inbuilt Bibliography manager; Faceted browsing; Taxon tagging and free keywords; Import from and export to all major formats

32 The main features: Specimen/Observation data. Annotated full specimen/observation records; Linked to images and georeferenced; Linked to GenBank accession numbers

33

34 The main features: Distribution maps. Google Maps based; Data layers; Occurrence data; Distribution data; TDWG regions; GBIF data

35 The main features: Example regional distribution

36 Create phylogenetic trees Based on Newick/NeXML Different views
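To make the Newick format mentioned on this slide concrete, here is a minimal sketch of a Newick reader in Python. It is not the Scratchpads implementation: the function names `parse_newick` and `leaf_names` are hypothetical, and the sketch assumes unquoted labels and no bracketed comments.

```python
def parse_newick(text):
    """Parse a simple Newick string into nested dicts.

    Each node is {"name": str, "length": float or None, "children": list}.
    Assumption: labels are unquoted and the string has no [comments].
    """
    text = text.strip()
    if not text.endswith(";"):
        raise ValueError("a Newick string must end with ';'")
    pos = 0

    def parse_node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1  # consume '('
            children.append(parse_node())
            while text[pos] == ",":
                pos += 1  # consume ','
                children.append(parse_node())
            if text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # consume ')'
        # Optional node label, read up to the next delimiter.
        start = pos
        while text[pos] not in ",():;":
            pos += 1
        name = text[start:pos].strip()
        # Optional branch length after ':'.
        length = None
        if text[pos] == ":":
            pos += 1
            start = pos
            while text[pos] not in ",();":
                pos += 1
            length = float(text[start:pos])
        return {"name": name, "length": length, "children": children}

    root = parse_node()
    if text[pos] != ";":
        raise ValueError(f"unexpected character at position {pos}")
    return root


def leaf_names(node):
    """Return the names of all tips below a node, left to right."""
    if not node["children"]:
        return [node["name"]]
    names = []
    for child in node["children"]:
        names.extend(leaf_names(child))
    return names


tree = parse_newick("((A:0.1,B:0.2)AB:0.3,C:0.4);")
print(leaf_names(tree))
```

A nested-dict tree like this is enough to drive different views (for example a tip list or an indented outline) from one parsed structure.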

37 The main features: Character matrices – Key construction. Quantitative or qualitative characters; Taxon based matrices [Specimens based character matrices]; Auto generation of keys
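As an illustration of how a key could be generated from a taxon-based character matrix, the sketch below recursively splits taxa on the qualitative character that divides them most evenly. This is an assumed, simplified approach, not the algorithm Scratchpads uses; the function `build_key` and the example taxa and characters are hypothetical.

```python
def build_key(matrix, taxa=None, characters=None):
    """Build a nested identification key from a taxon x character matrix.

    matrix maps taxon name -> {character: state}.
    Returns a list of taxa (a leaf) or
    {"character": name, "branches": {state: subkey}}.
    A missing character is treated as its own state (None).
    """
    if taxa is None:
        taxa = sorted(matrix)
    if characters is None:
        characters = sorted({c for states in matrix.values() for c in states})
    if len(taxa) <= 1:
        return taxa

    best, best_groups = None, None
    for character in characters:
        groups = {}
        for taxon in taxa:
            groups.setdefault(matrix[taxon].get(character), []).append(taxon)
        if len(groups) < 2:
            continue  # this character does not separate the remaining taxa
        largest = max(len(g) for g in groups.values())
        # Prefer the character whose largest group is smallest (most even split).
        if best is None or largest < max(len(g) for g in best_groups.values()):
            best, best_groups = character, groups

    if best is None:
        return taxa  # the remaining taxa cannot be told apart

    remaining = [c for c in characters if c != best]
    return {
        "character": best,
        "branches": {
            state: build_key(matrix, group, remaining)
            for state, group in best_groups.items()
        },
    }


# Hypothetical example matrix with qualitative characters.
example_matrix = {
    "Taxon A": {"wings": "present", "antennae": "long"},
    "Taxon B": {"wings": "present", "antennae": "short"},
    "Taxon C": {"wings": "absent", "antennae": "long"},
}
print(build_key(example_matrix))
```

Quantitative characters would first need to be binned into states before a split like this applies.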

38 The main features: Media handling. Bulk upload; Metadata (EXIF & Audubon Core); Media galleries

39 The main features: Generation of custom pages. Tagged or not; External RSS; Twitter feeds; Media files

40 The main features: Enhanced communication tools. Working groups; Forums; Blog entries; Webforms; Newsletters; RSS syndication; Inbuilt comments

41 The main features: Analytical tools. OBOE service, i.a. Ecological informatics, Phylogenetics, Sequence alignment

42 MCMC methods to estimate the posterior distribution of model parameters; Phylogenies; Sequence alignment; Multiple sequence alignment; Microsatellite repeats finder

43 External services integration: data mobilisation; more on the way…

44 IUCN data integration

45 GBIF data integration

46

47 Help & Support In-site Support Wiki Training Courses (12 in 2012) Ambassadors Programme Embedded Issues Queue Sandbox Site http://help.scratchpads.eu

48 A seamless workflow: Data collection & generation; Data curation; Data analysis; Data publishing

49 The vision: Helping researchers take credit for all research products

50 Publication module

51 The main features: The Publication module. Open-access journal

52 What does the BDJ (Biodiversity Data Journal) publish? Single taxon treatments and nomenclatural acts; Local or regional checklists; Sampling reports and occasional inventories; Habitat-based checklists and inventories; Ecological and biological observations of species and communities; Single identification keys; Biodiversity-related databases, including genomic, ecological and environmental data (data papers); Biodiversity-related software tools

53 How do Scratchpads and the BDJ interact?

54 Allow submission of datasets for publication without reformatting and restructuring; Working in a single environment based on a standardised XML schema

55 Assembling a manuscript: Work on multiple manuscripts; Allocate different people to different manuscripts; Handle permissions

56 Assembling a manuscript: Author names and affiliations. Data included in manuscript in a structured annotated format

57 Assembling a manuscript: Taxon descriptions

58 Assembling a manuscript: Specimen data

59 Figures and Tables

60 Supplementary files Select from existing or upload new

61 Assembling a manuscript: References. Easily cite bibliography; Auto compile list of references

62 Assembling a manuscript: Texts

63 The publication module (XML): Author names and affiliations; Taxon descriptions; Specimen data; Figures and Tables; Keys; Supplementary files; References; Texts
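The idea of collecting manuscript components into one XML document can be sketched as follows. This is only an illustration under stated assumptions: the element names (`manuscript`, `authors`, `taxon_descriptions`, `references`) and the function `assemble_manuscript` are invented for this example and do not reproduce the actual schema used by the publication module or the journal.

```python
import xml.etree.ElementTree as ET


def assemble_manuscript(title, authors, taxon_descriptions, references):
    """Combine manuscript parts into one XML string.

    All element and attribute names here are hypothetical placeholders,
    not the real submission schema.
    """
    root = ET.Element("manuscript")
    ET.SubElement(root, "title").text = title

    authors_el = ET.SubElement(root, "authors")
    for name, affiliation in authors:
        author_el = ET.SubElement(authors_el, "author")
        ET.SubElement(author_el, "name").text = name
        ET.SubElement(author_el, "affiliation").text = affiliation

    taxa_el = ET.SubElement(root, "taxon_descriptions")
    for taxon_name, description in taxon_descriptions:
        taxon_el = ET.SubElement(taxa_el, "taxon", {"name": taxon_name})
        taxon_el.text = description

    refs_el = ET.SubElement(root, "references")
    for number, citation in enumerate(references, start=1):
        ref_el = ET.SubElement(refs_el, "reference", {"id": f"ref{number}"})
        ref_el.text = citation

    return ET.tostring(root, encoding="unicode")


# Placeholder content, purely for illustration.
xml_text = assemble_manuscript(
    title="Example manuscript",
    authors=[("Example Author", "Example Institution")],
    taxon_descriptions=[("Example taxon", "Short description text.")],
    references=["Example reference one", "Example reference two"],
)
print(xml_text)
```

Keeping each component as structured data until the final serialisation step is what lets the same records be reused across manuscripts without retyping.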

64 Previewing your manuscript

65 Submission & enhanced peer review: Manuscript data validation; One-click submission to BDJ; Traditional peer review and optional panel/public review

66 The workflow: SCRATCHPADS → XML submission → PENSOFT JOURNAL SYSTEM (PJS 2.0) → MANUSCRIPT PUBLISHED (XML, PDF). Other diagram labels: Community; Taxon names; Occurrence data; datasets; Archive; Taxon treatments; Plazi; Wiki

67 Scratchpads are an integrated system to Enter, Curate, Mark-up, Link and Publish data: a taxonomic workflow in a single virtual environment

68 Acknowledgements: Scratchpads technical development - Vince Smith, Simon Rycroft, Ben Scott, Ed Baker, Alice Heaton, Katherine Boutton; Scratchpads outreach - Laurence Livermore, Isa van de Velde & Dimitris Koureas; e-Monocot - Paul Wilkin & the Kew team, Charles Godfray & the Oxford team; ViBRANT - Vince Smith, Dave Roberts & Lucy Reeve; Pensoft - Lyubomir Penev and the Pensoft team; Our 7000+ users

69 Thank you. Data collection & generation; Data curation; Data analysis; Data publishing

70


