Reliable knowledge discovery in a biodiversity Grid, Part 2: Litchi and ambiguous names, by Richard J. White (COMSC / BB Unit), 10 March 2004.


1 10 March 2004, Richard J. White – COMSC / BB Unit. Reliable knowledge discovery in a biodiversity Grid. Part 2: Litchi and ambiguous names, by Richard J. White. Presented to the Biostatistics & Bioinformatics Unit, Cardiff, Wednesday 10 March 2004.

2 Ambiguous nomenclature. Challenges in creating global biodiversity information systems by merging and linking databases: ambiguities arise from the way scientific names refer to species. For example, if two species are combined, one of the original names must be re-used to refer to the new concept. Conversely, when a species is divided into two, one part must retain the original name.

3 A problem in Biodiversity Informatics. The way species are named may affect the reliability and usability of species information systems. Techniques to handle the problem semi-automatically can be developed. This problem and its potential solutions may in some cases generalise to other naming schemes.

4 Names for species. A new name is published by an author who thinks the species is new and therefore needs a name. Later, others may disagree and merge this species with another. The older name is then re-used to designate the merged species: same name, different meaning (broader circumscription).

5 Names for species. Alternatively, a species may be split in two; one of the new species gets a new name. The older name is re-used to designate the other one: same name, different meaning (narrower circumscription).

6 Example. Locate sequence data for all species of Vicia. Some data may be listed under species of the obsolete genus Orobus. A name such as Vicia narbonensis might be regarded by some as just another name for Vicia faba.

7 Example. You want to discover all there is to know about one species. It may be listed in different sources under different names. These examples show why taxonomists attach great importance to synonyms.
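To make the role of synonyms in these two searches concrete, a query can be expanded to every name recorded for the same species before any source is consulted. The following is a minimal Python sketch under stated assumptions, not part of the original talk: the `SYNONYMS` table, the `records` list and the helper names are hypothetical, and treating Vicia narbonensis as a synonym of Vicia faba reflects only the one opinion mentioned above.

```python
# Hypothetical synonymy table: accepted name -> synonyms (one taxonomic opinion only).
SYNONYMS = {
    "Vicia faba": ["Vicia narbonensis"],
}

def build_index(synonyms):
    """Map every name (accepted or synonym) to the full set of names for its species."""
    index = {}
    for accepted, syns in synonyms.items():
        group = frozenset([accepted, *syns])
        for name in group:
            index[name] = group
    return index

def expand_name(name, index):
    """Return all names known for the species; an unknown name expands to itself."""
    return set(index.get(name, {name}))

def search(name, records, index):
    """Return the records filed under the queried name or any of its synonyms."""
    names = expand_name(name, index)
    return [r for r in records if r["name"] in names]

INDEX = build_index(SYNONYMS)

# Invented records standing in for entries held by two separate sources.
records = [
    {"name": "Vicia faba", "source": "database 1"},
    {"name": "Vicia narbonensis", "source": "database 2"},
    {"name": "Vicia sativa", "source": "database 2"},
]

hits = search("Vicia narbonensis", records, INDEX)
for hit in hits:
    print(hit["source"], hit["name"])
```

A search on either name returns the records filed under both, which is the behaviour a user needs when sources disagree about which name is accepted.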

8 (PDL cover)

9 (PDL page)

10 (ILDIS search results)

11 (ILDIS species page)

12 “Mr Linnaeus”. A web-based mock-up to explore aspects of the user interface of a system for interpreting “taxonomically intelligent links”. Prepared by Helen Bradbrook, an MSc student in the School of Plant Sciences at the University of Reading.


19 Ambiguous nomenclature. The problems are inherent in the subjective nature of the species concept. They cannot be removed by, for example, using numbers instead of names (unless a completely new name or number is invented every time the circumscription changes). Some of these issues were addressed in the LITCHI project.

20 LITCHI Project. A rule-based tool for the detection and repair of conflicts and merging of data in taxonomic databases.

21 Litchi. A BBSRC/EPSRC “Bioinformatics Initiative” project (with Reading). It uses the “conflicts” between species databases that arise from ambiguous nomenclature; the information is implicit in the lists of synonyms accompanying species names. Rule-based (Prolog) definition, detection and resolution of conflicts.

22 Project Staff.
Suzanne Embury, Alex Gray, Andrew Jones, Iain Sutherland: Object and Knowledge-based Systems Group, Department of Computer Science, University of Wales, Cardiff, PO Box 916, Cardiff CF24 3XF.
Frank Bisby, Sue Brandt: Centre for Plant Diversity and Systematics, School of Plant Sciences, The University of Reading, Reading RG6 6AS.
John Robinson, Richard White: Biodiversity & Ecology Research Division, School of Biological Sciences, University of Southampton, Southampton SO16 7PX.

23 Why is LITCHI needed? Species names are the key to biodiversity information. There is a trend towards large biodiversity databases and global systems. Manual merging of taxonomic databases is very time-consuming. Users want to browse “seamlessly” from one web-site to another. Users want to assemble reliable data sets drawn from several sources, but information on naming “conflicts” is hard to find and checking for them is tedious.

24 Example 1.
Checklist A: Caragana arborescens Lam. [accepted name]; Caragana sibirica Medikus [synonym].
Checklist B: Caragana sibirica Medikus [accepted name]; Caragana arborescens Lam. [synonym].
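The disagreement in Example 1 can be found mechanically by asking whether a name accepted in one checklist is listed as a synonym in the other. The Python sketch below is an illustration only: the dictionary layout (accepted name mapped to its synonyms) and the `status_conflicts` helper are assumptions made here, not the LITCHI data model or its rules.

```python
# Example 1 data from the slide: accepted name -> list of synonyms.
checklist_a = {"Caragana arborescens Lam.": ["Caragana sibirica Medikus"]}
checklist_b = {"Caragana sibirica Medikus": ["Caragana arborescens Lam."]}

def status_conflicts(first, second):
    """Find names accepted in `first` that `second` lists as a synonym.

    Returns (name, accepted_name_in_second) pairs.
    """
    # Invert `second` so each synonym points at the accepted name it is filed under.
    synonym_of = {syn: acc for acc, syns in second.items() for syn in syns}
    return [(name, synonym_of[name]) for name in first if name in synonym_of]

a_vs_b = status_conflicts(checklist_a, checklist_b)
b_vs_a = status_conflicts(checklist_b, checklist_a)

for name, other in a_vs_b:
    print(f"{name}: accepted in A, synonym of {other} in B")
for name, other in b_vs_a:
    print(f"{name}: accepted in B, synonym of {other} in A")
```

Both directions report a conflict here, which signals that the two checklists agree on the species concept but disagree on which name is the accepted one.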

25 Example 2.
Checklist A: Caesalpinia crista L. [accepted name].
Checklist B: Caesalpinia crista L. [accepted name]; Caesalpinia bonduc (L.) Roxb. [accepted name]; Caesalpinia crista L., p.p. [synonym].

26 Example 3. In the case of the species Cytisus scoparius:
Treatment A will list it as Cytisus scoparius (synonym Sarothamnus scoparius).
Treatment B will list it as Sarothamnus scoparius (synonym Cytisus scoparius).
Names shown in the diagram, by genus: Cytisus (Cytisus scoparius, Cytisus striatus, Cytisus multiflorus, Cytisus praecox); Sarothamnus (Sarothamnus scoparius, Sarothamnus striatus).
Treatment A recognises one genus, Cytisus. Treatment B recognises two genera, Cytisus and Sarothamnus.

27 What we did. Formulated rules for integrity and conflict, first in English and then in definite clauses of logic. Translated these declarative rules to build and test a Prolog model. Devised and tested algorithms to detect and report conflicts. Devised and tested algorithms to manage the partially-automated correction of the conflicting elements. Built and operated a prototype software system.

28 Integrity and conflict rules. How a scientific name should be composed (Rules of Nomenclature). Rules for citing the assemblage of names and synonyms for one taxon. Rules of integrity and “concept relationships” (overlap etc.) between the taxa in a taxonomic treatment. Rules for detecting conflicts between treatments. Rules for classifying conflicts to determine the action to be taken.

29 Testing the rules. Conflicts were detected in the ILDIS database by Rule 3, which states that a full name may not appear as an accepted name and a synonym in the same checklist:
¬(∃ n,a,l) accepted_name(n,a,_,l,_) ∧ synonym(n,a,_,l,_)
In Prolog form, this rule is expressed:
litchi_rule3 :- accepted_name(N,A,_,L,_), synonym(N,A,_,L,_).

30 A detected conflict. The Prolog conflict detection engine reported:
conflict(3:[Astragalus,variegatus]:[Freyn,&,Bornm,.]:combinedlist)
The conflict report includes the following information:
Astragalus variegatus Freyn & Bornm. (accepted name)
Astragalus sarypulensis B.Fedtsch. (synonym)
Astragalus rufescens Freyn (accepted name)
Astragalus variegatus Freyn & Bornm. (synonym)
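The logic of Rule 3 can be restated outside Prolog as a set intersection: collect the (name, author, checklist) triples that are accepted names, collect those that are synonyms, and report any triple found in both. The Python sketch below is an illustrative restatement, not the project's implementation. The `NameRecord` type and `rule3_conflicts` function are hypothetical, and the two unnamed arguments of the Prolog predicates are simply omitted because the slides do not say what they hold.

```python
from typing import NamedTuple

class NameRecord(NamedTuple):
    """One name entry in a checklist (hypothetical structure for this sketch)."""
    name: str
    author: str
    status: str      # "accepted" or "synonym"
    checklist: str

# The four entries listed in the conflict report above.
records = [
    NameRecord("Astragalus variegatus", "Freyn & Bornm.", "accepted", "combinedlist"),
    NameRecord("Astragalus sarypulensis", "B.Fedtsch.", "synonym", "combinedlist"),
    NameRecord("Astragalus rufescens", "Freyn", "accepted", "combinedlist"),
    NameRecord("Astragalus variegatus", "Freyn & Bornm.", "synonym", "combinedlist"),
]

def rule3_conflicts(records):
    """Return (name, author, checklist) triples that are both accepted and synonym."""
    accepted = {(r.name, r.author, r.checklist) for r in records if r.status == "accepted"}
    synonyms = {(r.name, r.author, r.checklist) for r in records if r.status == "synonym"}
    return sorted(accepted & synonyms)

conflicts = rule3_conflicts(records)
for name, author, checklist in conflicts:
    print(f"conflict(3): {name} {author} in {checklist}")
```

Keying on name, author and checklist together mirrors the shared variables N, A and L in the Prolog rule, so the same full name in two different checklists is not reported by this rule.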

31 Conflict display

32 Repairing violations. The user may wish to look at the context of a violation to determine the appropriate repair. Domain-specific knowledge can be applied to narrow down the set of (taxonomically) valid repairs presented to the user.

33 Conflict repair

34 Implementing LITCHI: major aspects. Design of a suitable architecture. Development of a model for species checklists. Modelling taxonomic practice using constraints. Providing appropriate support to the editor in repairing constraint violations.

35 Summary. We modelled the knowledge integrity rules in a taxonomic treatment. The knowledge tested is implicit in the assemblage of scientific names and synonyms used to represent each taxon (examples later). Practical uses include detecting and resolving taxonomic conflicts when merging or linking two databases.

36 Outcome of project. A prototype tool for merging checklists and checking the integrity of individual checklists was implemented and is freely available (but scarcely usable). We plan to extend this work: a “re-implemented” production version; dynamic linking (so-called “taxonomically intelligent links”).

37 Litchi 2. Solutions to the nomenclature challenges, including Litchi and its interaction with Spice, are being developed further in the course of the new BBSRC “Biodiversity World” Grid demonstrator project and the EU “Species 2000 europa” and ENBI projects (involving the same parties).

38 Litchi 2. “Intelligent linking” is intended to protect users from nomenclatural ambiguities and to explain them. Development of these techniques would be easier if we had an explicit representation of the overlaps between species in different databases. Such “cross-maps” can be constructed automatically using similar rules in the new Litchi version 2.
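As a rough picture of what a cross-map might contain, each taxon in one checklist can be paired with every taxon in another checklist that shares at least one name with it. The Python sketch below applies this to the Example 2 data. It is a deliberately crude stand-in: sharing a name is only a proxy for overlapping circumscriptions, the actual Litchi 2 rules are not described in these slides, and `base_name` and `cross_map` are hypothetical helpers. The synonym is attached to Caesalpinia bonduc on the assumption that the slide's ordering reflects that grouping.

```python
def base_name(name):
    """Drop a trailing pro parte marker (', p.p.') so names can be compared."""
    suffix = ", p.p."
    return name[:-len(suffix)] if name.endswith(suffix) else name

def all_names(accepted, synonyms):
    """Set of comparable names (accepted name plus synonyms) for one taxon."""
    return {base_name(n) for n in [accepted, *synonyms]}

def cross_map(a, b):
    """Pair each taxon in `a` with every taxon in `b` sharing at least one name."""
    result = {}
    for acc_a, syn_a in a.items():
        names_a = all_names(acc_a, syn_a)
        result[acc_a] = sorted(
            acc_b for acc_b, syn_b in b.items()
            if names_a & all_names(acc_b, syn_b)
        )
    return result

# Example 2 data: accepted name -> list of synonyms.
checklist_a = {"Caesalpinia crista L.": []}
checklist_b = {
    "Caesalpinia crista L.": [],
    "Caesalpinia bonduc (L.) Roxb.": ["Caesalpinia crista L., p.p."],
}

mapping = cross_map(checklist_a, checklist_b)
for taxon_a, taxa_b in mapping.items():
    print(taxon_a, "->", taxa_b)
```

Under these assumptions the single taxon in Checklist A maps to two taxa in Checklist B, which is the kind of one-to-many overlap an intelligent link would need to explain to the user.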

39 Future projects. Ambiguous nomenclature: an on-going programme of projects (already involving collaboration with staff here in COMSC) building tools such as Litchi to help bioinformaticians deal with ambiguous nomenclature. These techniques might be extended to other areas of bioinformatics where subjective identification and ambiguous nomenclature occur, such as the names of proteins (as suggested by Andrew Jones), genes, geographical areas, habitat types, etc.

40 An “intelligent” system. It would know about the synonymies and ambiguities existing in various data domains. It would help the user work with such data. It would contain a thesaurus, “knowledge-base” or “ontology”.

41 An “intelligent” system. These are hard to construct by hand. Litchi shows how this might be done by supervised automatic procedures in the case of species names. We want to generalise these ideas and techniques to other data domains, maybe those that you are interested in.

