Standardizing Mansfeld's World Database of Agricultural and Horticultural Crops by Implementing a Concept-Based Data Model Ram Narang and Helmut Knüpffer.


Ram Narang and Helmut Knüpffer
Leibniz Institute of Plant Genetics and Crop Plant Research, D-06466 Gatersleben, Germany
knupffer@ipk-gatersleben.de

Introduction

Integrating species-related information from multiple sources in federated information systems or web portals is complicated by the different taxonomic approaches those sources use. Many global and local taxonomic databases, among them ITIS and Species2000, provide information about species based on a single taxonomic view, in which information is attached to a single accepted (or preferred) name.

Taxonomic opinions and standards vary with time, place and investigator, and depend on many factors, such as the geographical range of study, the interpretation of collected specimens, the fossil record, morphology, genetics and molecular phylogeny. New classifications may arise from more detailed studies of specimens, the discovery of new taxonomic information, or the description of new species and groupings. Consequently, biological taxa often have multiple names, which in turn may have been applied to multiple taxon concepts. Combining such data from diverse sources into a single database or portal requires reconciling those different standards.

In addition, the increasing use of DNA sequence comparison to analyse phylogenetic relationships is accelerating the rate of taxonomic revision, so taxonomy is unlikely to stabilize in the foreseeable future. A data model that represents multiple, alternative taxonomic views is therefore crucial for sound taxonomic information management.
The Berlin Data Model for Taxonomic Information

Various data models have been developed to support the representation of multiple, alternative taxonomic views in taxonomic databases (cf. Kennedy et al. 2006), among them the Berlin Model (Berendsohn et al. 2003), which is based on the IOPI model. The Berlin Model allows species information to be attached to alternative taxonomic concepts (potential taxa). A number of projects, such as the Euro+Med PlantBase, AlgaTerra, MoReTax, the IOPI Global Plant Checklist, the Dendroflora of El Salvador and Med-Checklist, implemented the core of the Berlin Model as a taxonomic backbone for their databases and contributed to its continuous development and optimization (http://www.bgbm.org/biodivinf/docs/bgbm-model/). In addition, the Berlin Model underlies several tools for taxonomic data management tasks such as taxonomic revisions, data import from external sources, data integrity checking and data publishing on the World Wide Web.

The core of the Berlin Model contains four central functional sections: (1) Taxon Names, (2) Potential Taxon (taxonomic concepts), (3) Facts and (4) References. Taxon names are the botanical names according to the International Code of Botanical Nomenclature (ICBN).

Taxonomy Module of the Mansfeld Database

Like many other global taxonomic checklists, the Mansfeld Database represents a single taxonomic view of nomenclatural information. It incorporates classifications that have gained broad acceptance in the taxonomic literature and among taxonomists working with the taxa concerned, and thus offers the opportunity to standardize scientific nomenclature and taxonomy for cultivated plant species. Alternative taxonomic views (reflected by phrases such as sensu, emend., etc.) are presently stored as part of the nomenclatural reference. Similarly, authors and bibliographical references are not yet atomized into individual attributes. These information items need to be parsed and abstracted into the entity-relationship model to allow a conceptual view of the taxon.

Mansfeld’s World Database of Agricultural and Horticultural Crops

The Mansfeld Database (http://mansfeld.ipk-gatersleben.de) is an online database developed at IPK since 1998, initially as a contribution to the project “Federal Information System on Genetic Resources” (BIG, http://www.big-flora.de/). It reflects the contents of “Mansfeld’s Encyclopedia of Agricultural and Horticultural Crops” (Hanelt and IPK 2001) and contains information on ca. 6,100 crop plant species, excluding forestry and ornamental plants. Each species entry provides nomenclature and synonymy, common names in different languages, the distribution of the species in the wild and its regions of cultivation, uses, images and references, as well as the ancestral species and notes on phylogeny, variation and history.

Implementing the Multiple Taxonomic Concepts Model

Originally developed under Microsoft Visual FoxPro, the Mansfeld Database has recently been migrated to the database platform Oracle 10g, and the procedures for the web interface were re-programmed. As a first implementation step, the latest version of the Berlin Core Model, a database model under MS SQL Server, was migrated to Oracle 10g. All database procedures, functions and triggers that implement taxonomic logic were translated into their PL/SQL equivalents. Nomenclatural and bibliographical data of the Mansfeld Database were atomised using Java programmes. The parsed information was tagged and stored in an XML file. The resulting soft-schema XML file was read with JDOM and corrected manually (a time-consuming task) to produce a strict-schema XML file, which was used to populate the tables in the Taxon, Reference and Potential Taxon sections of the Berlin taxonomic model.
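The atomisation step described above can be illustrated with a minimal sketch. The authors used Java and JDOM; this sketch swaps in Python's standard `re` and `xml.etree.ElementTree` modules instead. The element names (`taxonName`, `genus`, `epithet`, `author`, `marker`, `sec`, `raw`), the regular expression and the input string are all assumptions made for illustration, not the schema or data of the Mansfeld Database.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical markers that introduce an alternative taxonomic view.
CONCEPT_MARKERS = ("sensu", "emend.")

# Simplified pattern for a binomial: genus, epithet, author string and,
# optionally, a concept marker followed by the reference it points to.
PATTERN = re.compile(
    r"^(?P<genus>[A-Z][a-z]+)\s+(?P<epithet>[a-z-]+)\s+(?P<author>.+?)"
    r"(?:\s+(?P<marker>" + "|".join(re.escape(m) for m in CONCEPT_MARKERS) + r")"
    r"\s+(?P<sec>.+))?$"
)


def to_soft_xml(citation: str) -> ET.Element:
    """Tag the parts of one citation string as loosely structured XML."""
    match = PATTERN.match(citation.strip())
    record = ET.Element("taxonName", attrib={"parsed": "true" if match else "false"})
    if match is None:
        # Keep the raw string so that it can be corrected by hand later.
        ET.SubElement(record, "raw").text = citation
        return record
    for field, value in match.groupdict().items():
        if value is not None:
            ET.SubElement(record, field).text = value
    return record


# Invented input string; "Example 2001" is not a real reference.
element = to_soft_xml("Hordeum vulgare L. sensu Example 2001")
xml_text = ET.tostring(element, encoding="unicode")
unparsed = to_soft_xml("???")
```

Strings that the pattern cannot handle are kept verbatim and flagged with `parsed="false"`, which mirrors the manual correction pass between the soft and the strict schema.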
After completion of the taxonomic core, the remaining information from the Mansfeld Database, such as textual information on geographical distribution and uses, was linked to the potential taxon as factual data. Finally, the web interface was re-programmed to work with the new data model.

[Diagram labels: Name, Taxon Concept, Reference, Facts, Relation]

In the Berlin Model, the combination of a taxon name with a reference forms a taxonym (also called a potential taxon or taxon concept). An auxiliary section, Authors, assembles author teams for the nomenclatural references. Finally, the Facts section can store any kind of factual information.

Basic data integrity rules in the Berlin Model are implemented at the level of tables, keys and relations within the database model. For example, the rule that every botanical name must have a rank can be enforced with a foreign key to the table that defines the list of valid ranks. More complex rules and functions, e.g. to construct syntactically correct botanical names, are implemented using stored procedures and triggers. Triggers are functions executed automatically when certain database events occur. For example, one of the triggers automatically rebuilds an author team when one of its author names is changed.

The implementation of the Berlin Model in the Mansfeld Database facilitates standardisation and improves the quality of the taxonomic information by increasing its accuracy, resolution and interpretability. In addition, existing standard taxonomy management tools such as web editors can be adapted to the new conceptual Mansfeld Database model for updating the contents of the database. The extensive information on ca. 6,100 species of agricultural and horticultural crop plants will thus become more easily accessible to global portals on biodiversity information.
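The name-plus-reference concept and the two integrity mechanisms described above (a foreign key to a rank table, and a trigger that rebuilds an author team) can be sketched as follows. This is an illustration only: it uses SQLite through Python's `sqlite3` module rather than Oracle 10g and PL/SQL, and every table, column and trigger name, as well as the sample rows, is hypothetical and not taken from the Berlin Model schema.

```python
import sqlite3

SCHEMA = """
CREATE TABLE taxon_rank (rank_id INTEGER PRIMARY KEY, label TEXT NOT NULL UNIQUE);
CREATE TABLE taxon_name (
    name_id   INTEGER PRIMARY KEY,
    full_name TEXT NOT NULL,
    -- Integrity rule: every name must point to a valid rank.
    rank_id   INTEGER NOT NULL REFERENCES taxon_rank(rank_id)
);
CREATE TABLE bibl_reference (ref_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
-- A potential taxon (taxon concept) is a name paired with a reference.
CREATE TABLE potential_taxon (
    name_id INTEGER NOT NULL REFERENCES taxon_name(name_id),
    ref_id  INTEGER NOT NULL REFERENCES bibl_reference(ref_id),
    PRIMARY KEY (name_id, ref_id)
);
CREATE TABLE author (author_id INTEGER PRIMARY KEY, abbreviation TEXT NOT NULL);
CREATE TABLE author_team (team_id INTEGER PRIMARY KEY, team_string TEXT);
CREATE TABLE author_team_member (
    team_id   INTEGER NOT NULL REFERENCES author_team(team_id),
    author_id INTEGER NOT NULL REFERENCES author(author_id),
    position  INTEGER NOT NULL,
    PRIMARY KEY (team_id, position)
);
-- Rebuild the cached team string of every team the changed author belongs to.
CREATE TRIGGER rebuild_author_team AFTER UPDATE OF abbreviation ON author
BEGIN
    UPDATE author_team
    SET team_string = (
        SELECT group_concat(abbreviation, ' & ')
        FROM (SELECT a.abbreviation
              FROM author_team_member m
              JOIN author a ON a.author_id = m.author_id
              WHERE m.team_id = author_team.team_id
              ORDER BY m.position)
    )
    WHERE team_id IN (SELECT team_id FROM author_team_member
                      WHERE author_id = NEW.author_id);
END;
"""


def open_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with foreign-key enforcement switched on."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
    conn.executescript(SCHEMA)
    return conn


conn = open_demo_db()
conn.execute("INSERT INTO taxon_rank VALUES (1, 'species')")
conn.execute("INSERT INTO taxon_name VALUES (1, 'Hordeum vulgare', 1)")

# A name with an unknown rank is rejected by the foreign key.
try:
    conn.execute("INSERT INTO taxon_name VALUES (2, 'Hordeum', 99)")
    rank_rule_enforced = False
except sqlite3.IntegrityError:
    rank_rule_enforced = True

# One name combined with two (invented) references gives two taxon concepts.
conn.executemany("INSERT INTO bibl_reference VALUES (?, ?)",
                 [(1, "Treatment A (hypothetical)"), (2, "Treatment B (hypothetical)")])
conn.executemany("INSERT INTO potential_taxon VALUES (?, ?)", [(1, 1), (1, 2)])
concept_count = conn.execute(
    "SELECT COUNT(*) FROM potential_taxon WHERE name_id = 1").fetchone()[0]

# Correcting a misspelled (invented) author name fires the trigger.
conn.executemany("INSERT INTO author VALUES (?, ?)", [(1, "Smth"), (2, "Jones")])
conn.execute("INSERT INTO author_team VALUES (1, 'Smth & Jones')")
conn.executemany("INSERT INTO author_team_member VALUES (?, ?, ?)",
                 [(1, 1, 1), (1, 2, 2)])
conn.execute("UPDATE author SET abbreviation = 'Smith' WHERE author_id = 1")
team_string = conn.execute(
    "SELECT team_string FROM author_team WHERE team_id = 1").fetchone()[0]
```

Keeping the simple rule in a declarative constraint and the derived string in a trigger follows the split the text describes: keys and relations for basic rules, procedural code for anything that has to be computed.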
[Figure: Implementation steps (I to III). Boxes: Mansfeld Database, XML soft schema, XML strict schema, Conceptual Db model]
[Figure: Web screenshots of the Mansfeld Database before the transformation to the Berlin Model]
[Figure: Mansfeld Database: Taxonomy module]
[Figure: Entity-relationship model of the potential taxon. Entities: Taxon Rank, Taxon Name, Potential Taxon, Reference, Status Assignment, Assigned Status]
[Figure: Concept-oriented database core]

Outlook

The Encyclopedia of Life (http://www.eol.org), launched in 2007, is developing “species pages” for all known organisms, with content to be provided and edited by experts from all over the world using a wiki-like editor. Its initial content is being gathered from existing web resources. The rich information on more than 6,000 of the economically most important plant species documented in the Mansfeld Database was offered for inclusion at the EoL Plant Species Pages Meeting (St. Louis, Missouri), 31 October to 2 November 2007.

The Global Biodiversity Information Facility (http://www.gbif.org) aims to provide free access to biodiversity information on the web, using standardised web services. GBIF has approached the Mansfeld Database developers about making the database's ca. 38,000 common names of crop plant species, in many languages, available to GBIF as a starting point for an interface that would allow the world’s biodiversity data to be queried by common name as well as by scientific name. Integrating the Mansfeld Database fully into GBIF would also make its rich crop species information accessible along with data from other providers of taxon-related data.

References

Berendsohn, W.G., M. Döring, M. Geoffroy, K. Glück, A. Güntsch, A. Hahn, W.-H. Kusber, J.L. Li, D. Röpert and F. Specht, 2003. The Berlin Model: a concept-based taxonomic information model. Pp. 15-26 in Berendsohn, W.G. (ed), MoReTax. Handling Factual Information Linked to Taxonomic Concepts in Biology. Schriftenreihe für Vegetationskunde 39, Bonn.

Hanelt, P. and Institute of Plant Genetics and Crop Plant Research (eds), 2001. Mansfeld’s Encyclopedia of Agricultural and Horticultural Crops (Except Ornamentals). 6 vols. 1st Engl. ed. Springer, Berlin, Heidelberg, New York, etc. (LXX+3645 pp.)

Kennedy, J., R. Hyam, R. Kukla and T. Paterson, 2006. Standard data model representation for taxonomic information. OMICS. A Journal of Integrative Biology 10 (Special Issue on Data Standards), 220-230.

