Knowledge Sharing and Collaborative Problem Solving in Biodiversity Informatics Andrew C. Jones Cardiff University, UK.


1 Knowledge Sharing and Collaborative Problem Solving in Biodiversity Informatics Andrew C. Jones Cardiff University, UK

2 The Species 2000 vision
To enumerate all known species of plants, animals, fungi and microbes on Earth as the baseline dataset for studies of global biodiversity
To provide a simple access point enabling users to link from Species 2000 to other data systems for all groups of organisms, using direct species-links
To enable users worldwide to verify the scientific name, status and classification of any known species through species checklist data drawn from an array of participating databases
(More recently) to provide a “synonymy server” for use as a service by other applications needing to obtain suitable scientific names, e.g. for querying biological data sets

3 Need for a catalogue
Suppose we wished to retrieve all locations where specimens of Caragana arborescens have been collected, from various specimen distribution databases.
A taxonomic checklist might include:
 –Caragana arborescens Lam. [accepted name]
 –Caragana sibirica Medikus [synonym]
Classification of organisms is based on opinion regarding
 –what the groups are
 –identification of individuals
So we need to use both these names as search terms
In practice the problem might be far worse
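
The name-expansion step described on this slide can be sketched in Python. This is a minimal illustration, not SPICE code: the `CHECKLIST` structure and the `expand_search_terms` function are hypothetical names, and the only data used is the Caragana example from the slide.

```python
# Hypothetical checklist: accepted name mapped to its synonyms (slide data).
CHECKLIST = {
    "Caragana arborescens Lam.": ["Caragana sibirica Medikus"],
}


def expand_search_terms(name, checklist):
    """Return every name that refers to the same taxon as `name`."""
    for accepted, synonyms in checklist.items():
        if name == accepted or name in synonyms:
            # Accepted name first, then all of its synonyms.
            return [accepted, *synonyms]
    # Unknown name: fall back to searching for it exactly as given.
    return [name]


terms = expand_search_terms("Caragana sibirica Medikus", CHECKLIST)
print(terms)
```

Each returned name would then be sent as a separate search term to the specimen distribution databases, so records filed under either name are found.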

4 SPICE for Species 2000: Meeting the computing challenges
The SPICE for Species 2000 project aimed to:
 –build a federated ‘registry’ of scientific names organised by taxon (species, etc.)
 –accommodate GSD (Global Species Database) heterogeneity
 –accommodate GSD autonomy & instability
 –ensure scalability
Funding:
 –SPICE was funded by the UK BBSRC/EPSRC Bioinformatics panel
 –EuroCat: new EU-funded project to augment the SPICE catalogue of life & develop/maintain SPICE software

5 SPICE Project Staff
Cardiff: Prof. Alex Gray, Dr. Andrew Jones, Prof. Nick Fiddian, Dr. Xuebiao Xu, (Mr. Nick Pittas). Object and Knowledge-based Systems Group, Department of Computer Science, Cardiff University, PO Box 916, Cardiff CF24 3XF. Email: {W.A.Gray|Andrew.C.Jones|N.Fiddian|X.Xu|N.Pittas}. Telephone +44 (0)29 2087 4812
Reading: Prof. Frank Bisby, Prof. Sir Ghillean Prance and Dr. Sue Brandt. Centre for Plant Diversity & Systematics, The University of Reading, Reading RG6 6AS. Email: {F.A.Bisby|S.M.Brandt}. Telephone +44 (0)118 378 6437
Southampton: Dr. Richard White and Mr. John Robinson. Biodiversity & Ecology Research Division, School of Biological Sciences, University of Southampton, Southampton SO16 7PX. Email: {R.J.White|J.S.Robinson}. Telephone +44 (0)23 8059 2021
Royal Botanic Gardens, Kew: Prof. Peter Crane, Dr. Don Kirkup, Ms. Sally Hinchcliffe, Mr. Graham Christian and others
Natural History Museum, London: Prof. Paul Henderson, Mr. Charles Hussey and others
BIOSIS UK: Mr. Michael Dadd, Ms. Judith Howcroft and others

6 Interactive use of SPICE …


11 Basic uses for the catalogue
User wishes to check taxonomy of some organisms interactively; or
User wishes to access or store data (observations, gene sequences, …) associated with a given species:
 –Catalogue gives information about accepted name/synonyms
 –Can use all names for retrieval, for example
 –May well want to use the accepted name provided by SPICE for storing new data.

12 The “standard data”
Comprises the information about a species which Species 2000 wishes to provide:
 –AVCNameWithRefs
 –SynonymWithRefs
 –CommonNameWithRefs
 –Family
 –Comment
 –Scrutiny
 –DataLink
 –Geography
Minimalistic CDM devised:
 –The basic information needed for a catalogue of life;
 –If a GSD can’t be wrapped to conform, it probably doesn’t contain the required information
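
One way to picture the “standard data” is as a single record type. The sketch below is an assumption-laden illustration: only the eight field names come from the slide, while the Python types, the defaults and the class name `StandardData` are guesses made for the example and are not taken from the SPICE CDM.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StandardData:
    """Hypothetical container for the eight items listed on the slide."""
    avc_name_with_refs: str                                          # AVCNameWithRefs
    synonyms_with_refs: List[str] = field(default_factory=list)      # SynonymWithRefs
    common_names_with_refs: List[str] = field(default_factory=list)  # CommonNameWithRefs
    family: str = ""       # Family
    comment: str = ""      # Comment
    scrutiny: str = ""     # Scrutiny
    data_link: str = ""    # DataLink
    geography: str = ""    # Geography


# Names taken from the Type 1 response extract elsewhere in the talk;
# all other fields are left empty rather than invented.
record = StandardData(
    avc_name_with_refs="Abrus precatorius L.",
    synonyms_with_refs=["Abrus abrus (L.) Wright"],
)
print(record.avc_name_with_refs, record.synonyms_with_refs)
```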

13 Request Types 0-5
Again, a fairly simple set of operations is required:
 –Type 0: Get CDM version compliance for a GSD
 –Type 1: Search for a name in a GSD
 –Type 2: Fetch “standard data” about a chosen species
 –Type 3: Get information about a GSD
 –Type 4: Move up the taxonomic hierarchy
 –Type 5: Move down the taxonomic hierarchy
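
The six request types map naturally onto an enumeration. In this sketch the numeric codes and descriptions come from the slide; the member names (`SEARCH_NAME` and so on) and the `describe` helper are hypothetical and do not reflect the actual SPICE protocol identifiers.

```python
from enum import IntEnum


class RequestType(IntEnum):
    """Request types 0-5; member names are illustrative only."""
    CDM_VERSION = 0          # Get CDM version compliance for a GSD
    SEARCH_NAME = 1          # Search for a name in a GSD
    FETCH_STANDARD_DATA = 2  # Fetch "standard data" about a chosen species
    GSD_INFO = 3             # Get information about a GSD
    MOVE_UP = 4              # Move up the taxonomic hierarchy
    MOVE_DOWN = 5            # Move down the taxonomic hierarchy


def describe(code):
    """Turn a numeric request code into a readable label."""
    return f"Type {code}: {RequestType(code).name}"


print(describe(1))
```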

14 Type 1 response (XML) extract
(XML element tags were lost in extraction; only the values survive.)
Abrus abrus (L.) Wright: synonym
Abrus precatorius L.: accepted
1571
…

15 SPICE architecture
(Architecture diagram; labels only.)
User (Web browser), shown for several users
User Server module (HTTP)
‘Query’ co-ordinator
Common Access System (CAS)
 –CAS knowledge repository (taxonomic hierarchy, annual checklist, genus and other caches, ...)
CORBA
GSD Wrapper
 –(in some cases, generic) CORBA ‘wrapper’ element of GSD Wrapper
 –Wrapper (e.g. JDBC)
 –Wrapper (e.g. CGI/XML + ODBC)
GSD
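
The wrapper idea in this architecture, where each heterogeneous GSD is hidden behind a common interface that the CAS queries, can be sketched as follows. Everything here is hypothetical: the class names, the `search` method and the in-memory data stand in for the real CORBA or CGI/XML wrappers, which the slides do not specify in detail.

```python
from abc import ABC, abstractmethod


class GSDWrapper(ABC):
    """Hypothetical common interface that every wrapped GSD exposes."""

    @abstractmethod
    def search(self, name):
        """Return (gsd_label, full_name, status) tuples matching `name`."""


class InMemoryGSD(GSDWrapper):
    """Stand-in for a real wrapper (e.g. one built on JDBC or CGI/XML)."""

    def __init__(self, label, names):
        self.label = label
        self._names = names  # list of (full_name, status) pairs

    def search(self, name):
        needle = name.lower()
        return [(self.label, full_name, status)
                for full_name, status in self._names
                if needle in full_name.lower()]


class CommonAccessSystem:
    """Fans a query out to every wrapper and merges the answers."""

    def __init__(self, wrappers):
        self.wrappers = wrappers

    def search(self, name):
        hits = []
        for wrapper in self.wrappers:
            hits.extend(wrapper.search(name))
        return hits


cas = CommonAccessSystem([
    InMemoryGSD("gsd_a", [("Abrus abrus (L.) Wright", "synonym"),
                          ("Abrus precatorius L.", "accepted")]),
    InMemoryGSD("gsd_b", [("Caragana arborescens Lam.", "accepted")]),
])
print(cas.search("abrus"))
```

Because the CAS only depends on the abstract interface, a GSD can change its internal storage without affecting the federation, which is the autonomy property the project aimed for.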

16 Why a federation of autonomous, heterogeneous GSDs?
Taxonomists have specialist knowledge of a limited range of organisms, and want to make their data available in various ways
So
 –the hierarchy is divided into sectors, with an individual or group of scientists responsible for each
 –scientists are given control over their databases
 –we accommodate existing heterogeneous GSDs; also new ones built for various purposes
This helps assure taxonomic data quality (peer review of GSDs is also used)

17 Specialist GSDs mean better data quality than non-specialist ones …
… but data quality problems still arise:
 –“Non-overlapping” sectors may, in fact, overlap
 –GSDs may be inconsistent taxonomically
 –GSDs may be formed by merging two or more other, mutually inconsistent, databases

18 LITCHI Project
A rule-based tool for the detection and repair of conflicts and merging of data in taxonomic databases

19 Project Staff
Suzanne Embury, Alex Gray, Andrew Jones, Iain Sutherland: Object and Knowledge-based Systems Group, Department of Computer Science, University of Wales, Cardiff, PO Box 916, Cardiff CF24 3XF
Frank Bisby, Sue Brandt: Centre for Plant Diversity and Systematics, School of Plant Sciences, The University of Reading, Reading RG6 6AS
John Robinson, Richard White: Biodiversity & Ecology Research Division, School of Biological Sciences, University of Southampton, Southampton SO16 7PX

20 Summary
We modelled the knowledge integrity rules in a taxonomic treatment
The knowledge tested is implicit in the assemblage of scientific names and synonyms used to represent each taxon (examples later)
Practical uses include detecting and resolving taxonomic conflicts when merging or linking two databases

21 Example 1
Checklist A
 –Caragana arborescens Lam. [accepted name]
 –Caragana sibirica Medikus [synonym]
Checklist B
 –Caragana sibirica Medikus [accepted name]
 –Caragana arborescens Lam. [synonym]

22 Example 2
In the case of the species Cytisus scoparius
 –Treatment A will list it as Cytisus scoparius (synonym Sarothamnus scoparius)
 –Treatment B will list it as Sarothamnus scoparius (synonym Cytisus scoparius)
Treatment A recognises one genus, Cytisus
Treatment B recognises two genera, Cytisus and Sarothamnus
(Diagram labels: Genus Cytisus; Genus Sarothamnus; Genus Cytisus; Cytisus scoparius / Sarothamnus scoparius; Cytisus striatus / Sarothamnus striatus; Cytisus multiflorus; Cytisus praecox.)

23 Example of a rule
In each of the 2 examples, merging the checklists would lead to violation of:
 –“A full name which is not a pro-parte name may not appear as both an accepted name and a synonym in the same checklist”
(Violations of other rules help the user to distinguish the taxonomic causes; there are various options to repair this violation)
    violation :-
        accepted_name(N,A,C1,L,T1),
        synonym(N,A,C2,L,T2),
        (\+pro_parte(C1); \+pro_parte(C2)).
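
For readers unfamiliar with Prolog, the rule can be approximated in Python. This is a sketch rather than LITCHI's implementation: it assumes that `N` and `A` in the rule are the name and its author, and that `C1`/`C2` identify the records tested for pro-parte status. The `NameRecord` class and `find_violations` function are hypothetical; the data is Example 1 from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NameRecord:
    name: str
    author: str
    status: str              # "accepted" or "synonym"
    pro_parte: bool = False  # True if the name applies only in part


def find_violations(checklist):
    """Pairs where the same full name is both accepted and a synonym,
    and at least one of the two records is not pro parte."""
    accepted = [r for r in checklist if r.status == "accepted"]
    synonyms = [r for r in checklist if r.status == "synonym"]
    return [(a, s)
            for a in accepted
            for s in synonyms
            if (a.name, a.author) == (s.name, s.author)
            and (not a.pro_parte or not s.pro_parte)]


checklist_a = [NameRecord("Caragana arborescens", "Lam.", "accepted"),
               NameRecord("Caragana sibirica", "Medikus", "synonym")]
checklist_b = [NameRecord("Caragana sibirica", "Medikus", "accepted"),
               NameRecord("Caragana arborescens", "Lam.", "synonym")]

merged = checklist_a + checklist_b
conflicting_names = [a.name for a, _ in find_violations(merged)]
print(conflicting_names)
```

Each checklist is consistent on its own; the violations appear only after merging, which is the situation the rule is meant to flag.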

24 Conflict display

25 LITCHI: current status
Good selection of rules (for botanical nomenclature)
A research project, now in need of re-engineering:
 –Implemented in Prolog & Visual Basic; not portable
 –Uses XDF file format for data import/export

26 Some future developments of LITCHI
BiodiversityWorld
 –BiodiversityWorld is not funded to develop LITCHI at all, but will be able to take advantage of LITCHI developments for ‘taxonomically intelligent navigation’
EuroCat
 –Re-engineer LITCHI, to work with GSDs wrapped to SPICE CDM 1.2
 –Use for:
   intra- and inter-GSD consistency checking
   navigation between resources organised according to differing taxonomies, e.g. for access to regional hubs
 –Use in conjunction with, and for generating, ‘cross-maps’

27 LITCHI in (future) use
(Diagram; labels only.)
Inputs: Checklist A, Checklist B, Rules
Processing: Read into system; Conflict detection; Conflict display; Conflict repair (not necessarily used in this context); Write
Outputs: Conflict description; Possible repairs; Cross-map; Taxonomic intelligence

28 BiodiversityWorld
Problem solving environment for biodiversity informatics on the GRID
UK BBSRC-funded
Universities of Reading, Cardiff & Southampton, and The Natural History Museum, London

29 BiodiversityWorld: The Challenge
Some difficult biodiversity questions
How should conservation efforts be concentrated?
 –(example of Biodiversity Richness & Conservation Evaluation)
Where might a species be expected to occur, under present or predicted climatic conditions?
 –(example of Bioclimatic Modelling and Climate Change)
Is geography a good predictor of relationship between lineages? (e.g. are the more closely related species found near each other?)
 –(example of Phylogenetic Analysis & Biogeography)

30 Some relevant resource types
Data sources:
 –Catalogue of life
 –Species Information Sources (SISs): species geography; descriptive data; specimen distribution
 –Geographical: boundaries of geographical & political units; climate surfaces
 –Genetic sequences
Analytic tools:
 –Biodiversity richness assessment: various metrics
 –Bioclimatic modelling: bioclimatic ‘envelope’ generation
 –Phylogenetic analysis (generation of phylogenetic trees)

31 Some challenges …
Finding the resources
Knowing how to use these heterogeneous resources
 –Originally constructed for various reasons
 –Often little thought was given to standards or interoperability
One important specific issue: using the appropriate scientific name for SIS queries (hence SPICE for Species 2000)

32 Our vision
Biodiversity Problem Solving Environment:
 –Heterogeneous, diverse resources
 –Flexible workflows
 –Main challenges centre around metadata, interoperability, etc.
 –High-performance computing secondary (though relevant)
Our previous GRAB demonstrator illustrates some Bioclimatic Modelling elements, with a fixed workflow …

33 Typical GRAB display
(Screenshot; captions only.)
Web browser ‘front-end’ to the GRAB server
Applet monitoring communication between GRAB server and GRAB databases

34 Why the GRID for BiodiversityWorld (or even GRAB)?
HPC; mobility of data & programs
Resource discovery
OGSA (Open Grid Services Architecture), which is not Globus-specific, gives Web Services & life cycle management, etc.
Workflow for orchestrating resources, etc.

35 BiodiversityWorld architecture
(Architecture diagram; labels only.)
User; Local tools; Problem Solving Environment User Interface
Problem Solving Environment:
 –Broker agents
 –Facilitator agents
 –Presentation agents
BioD-GRID Ontology:
 –Metadata
 –Intelligent links
 –Resource & Analytic tool descriptions
 –Maintenance tools
Resources: Taxonomic index (SPICE Catalogue of Life); GSD; Thematic Data source; Abiotic Data source; Analytic tool (two shown); Proxy (two shown)

36 Bioclimatic modelling case study: Leucaena leucocephala
Leucaena leucocephala (Lam.) De Wit
Native of Central America
Widely introduced around the tropics
Widely utilised around the globe for:
 –Wood
 –Forage
 –Soil enrichment and erosion control
Regarded as an invasive weed in some areas

37 Point data from various herbaria

38 Distribution data from ILDIS database

39 GARP prediction of climatic suitability

40 Workflow
Our PSE should provide flexible support for development of complex workflows for:
 –experimental design of in silico biodiversity-related experiments
 –repeatability
 –modification of experiments

41 Typical workflow
(Workflow diagram; labels only.)
START
STAGE 1: Species 2000 Catalogue of Life (distributed array of GSDs)
 –Enquiry: name(s)
 –Returns list of accepted taxa, synonyms and common names
STAGE 2: Distributed array of thematic data sources
 –Enquiry: select ‘data’ for ‘taxon set’
 –Returns dataset composed of homologous responses from multiple thematic data sources
STAGE 3: Analytical Toolbox
 –Reference to abiotic datasets
 –Presentation and storage of results

42 Initial test workflow
(Workflow diagram; each step with its description.)
SPICE: submit scientific name; retrieve accepted name & synonyms for species
Localities: retrieve distribution maps for species of interest
Climate Space Model: model of climatic conditions where species is currently found (input: climate surfaces)
Climate Prediction: prediction of suitable regions for species of interest (input: possibly different climate surfaces, e.g. predicted climate)
Base Maps: world or regional maps
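
The test workflow can be sketched end to end in Python. Two caveats apply. First, the modelling step here is a simple rectangular climate envelope (minimum and maximum of each climate variable at known localities), swapped in for brevity; it is not GARP, which the talk uses for its prediction. Second, all function names, grid cells and climate values are made-up toy data; only the two Caragana names come from the slides.

```python
# Toy inputs (hypothetical): catalogue, localities, and two climate surfaces.
CATALOGUE = {"Caragana arborescens Lam.": ["Caragana sibirica Medikus"]}
LOCALITIES = {"Caragana arborescens Lam.": ["c1"],
              "Caragana sibirica Medikus": ["c2"]}
# cell -> (mean temperature, annual rainfall); values are invented.
CLIMATE_NOW = {"c1": (2.0, 400.0), "c2": (5.0, 550.0),
               "c3": (4.0, 500.0), "c4": (20.0, 1500.0)}
CLIMATE_FUTURE = {"c1": (6.0, 450.0), "c2": (9.0, 600.0),
                  "c3": (4.5, 520.0), "c4": (22.0, 1400.0)}


def resolve_names(name, catalogue):
    """SPICE step: accepted name plus synonyms for the submitted name."""
    for accepted, synonyms in catalogue.items():
        if name == accepted or name in synonyms:
            return [accepted, *synonyms]
    return [name]


def fetch_localities(names, localities):
    """Localities step: cells recorded under any of the names."""
    return sorted({cell for n in names for cell in localities.get(n, [])})


def fit_envelope(cells, climate):
    """Climate space model: per-variable min and max over known cells."""
    values = [climate[c] for c in cells]
    lows = tuple(min(column) for column in zip(*values))
    highs = tuple(max(column) for column in zip(*values))
    return lows, highs


def predict(envelope, climate):
    """Climate prediction: cells whose climate lies inside the envelope."""
    lows, highs = envelope
    return sorted(cell for cell, value in climate.items()
                  if all(lo <= x <= hi
                         for lo, x, hi in zip(lows, value, highs)))


names = resolve_names("Caragana sibirica Medikus", CATALOGUE)
cells = fetch_localities(names, LOCALITIES)
envelope = fit_envelope(cells, CLIMATE_NOW)
suitable_now = predict(envelope, CLIMATE_NOW)
suitable_future = predict(envelope, CLIMATE_FUTURE)
print(suitable_now, suitable_future)
```

Querying with the synonym still finds both localities because the name is resolved first; without that step the envelope would be fitted to only half of the records.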

43 BiodiversityWorld: much more complex than SPICE
Much more heterogeneity
 –diverse kinds of databases and tools
Much greater range of data quality and terminology problems, e.g.
 –accuracy of “point data”
 –country names
 –…

44 Role/use of metadata
Descriptive
Create electronic book for user
Create workflows
 –necessary transformations
 –provenances
 –interoperability
Locate appropriate elements
Rerun processing (possibly with modifications)
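
To illustrate how recorded metadata supports rerunning a workflow with modifications, here is a minimal sketch. It is not BiodiversityWorld code: the `run` function, the provenance log format and the two arithmetic "tools" are hypothetical placeholders for real analytic tools.

```python
def run(steps, tools, value):
    """Apply each (tool_name, params) step in order, logging provenance."""
    provenance = []
    for tool_name, params in steps:
        value = tools[tool_name](value, **params)
        # Record what was run, with which parameters, and what it produced.
        provenance.append({"tool": tool_name,
                           "params": dict(params),
                           "output": value})
    return value, provenance


# Placeholder tools standing in for real analytic tools.
TOOLS = {
    "scale": lambda x, factor: x * factor,
    "offset": lambda x, amount: x + amount,
}

steps = [("scale", {"factor": 2}), ("offset", {"amount": 1})]
result, log = run(steps, TOOLS, 10)

# Rebuild the workflow from the log alone, change one parameter, rerun.
modified = [(entry["tool"], entry["params"]) for entry in log]
modified[0] = ("scale", {"factor": 3})
rerun_result, _ = run(modified, TOOLS, 10)
print(result, rerun_result)
```

The point of the sketch is that the log alone is enough to reconstruct and vary the experiment, which is what repeatability and modification require.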

45 Conclusion
The field of biodiversity informatics presents various challenges including:
 –taxonomic/naming
 –heterogeneity & autonomy
 –data quality
 –need for extensive metadata
