The BioCASe Technology. Jörg Holetschek, Botanic Garden & Botanical Museum Berlin-Dahlem, Dept. of Biodiversity Informatics and Laboratories.


1 The BioCASe Technology. Jörg Holetschek, Botanic Garden & Botanical Museum Berlin-Dahlem, Dept. of Biodiversity Informatics and Laboratories, Königin-Luise-Straße 6-8, 14195 Berlin. BioCASe Workshop, Melbourne, Feb 4-5th 2010

2 Agenda
1. The BioCASe Architecture: An Overview
2. The BioCASe Provider Software: Features, Requirements, Installation, Configuration
3. The ABCD and HISPID data standards: Intentions, Structure, Elements, Use
4. Preparing the database for BioCASe/ABCD
5. Setting Up Datasources: Database connection, Table Setup, Mapping Process; Testing, Data Backups
6. Other Issues
Workshop Wiki: http://hiscom.chah.org.au/wiki/BioCASe_Workshop

3 1. BioCASe Technology: Motivation, Idea and Architecture

4 Herbaria, Drawings © J. Holstein et al.

5 Preserved Specimens © J. Holstein et al.

6 Living Collections © J. Holstein et al.

7 Culture Collections © J. Holstein et al.

8 Primary Biodiversity Data → Biological Collection Access Service
Primary Biodiversity Data Record = documentation of the occurrence of one species at a given location at a certain point in time

9 Data sources worldwide
- Index Herbariorum: 3,293 herbaria, 400 million herbarium sheets
- 50,000-100,000 natural history collections, 1.5-2 billion specimens
- With observations added, occurrence records number 3+ billion (10 billion?)
Over 75% of biodiversity information is stored in developed countries. An estimated 75% of all species are found in the developing world. Source: BARTHLOTT et al. 1999

10 Accessibility
Stage 0: Only in real world (paper catalogues, just stacks); only meta information available on the web
Stage 1: Online catalogue
Stage 2: Digitization of specimens

11 Biodiversity Data
Stage 3: Networking the databases

12 Architectural Overview
1. Protocols/Data Standards
2. Wrapper Software
3. Applications: Data Portal, Data Quality Checker, Data Mining

13 BioCASe Design Principles
No central database
→ Data remain in the existing DB systems
→ Data provider gets full credit
→ Full control over published data by collection holder
Partial publication possible
→ Collection holder can withhold information from publication (e.g., locality data for endangered species) or exclude records (e.g., until research results are published)
Wrapper principle
→ Data remain in original collection management system
→ No changes in workflow for curator/local users
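One way to picture partial publication is a database view that the wrapper reads instead of the live table. The following is a minimal sketch using Python's built-in sqlite3 module; the table `specimen`, its columns, and the flags `endangered` and `embargoed` are hypothetical and are not part of BioCASe or the slides.

```python
import sqlite3

# Hypothetical collection table; all names here are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE specimen (
    specimen_id INTEGER PRIMARY KEY,
    scientific_name TEXT,
    locality TEXT,
    endangered INTEGER,   -- 1 = locality must not be published
    embargoed INTEGER     -- 1 = record withheld until results are published
);
INSERT INTO specimen VALUES
    (1, 'Marmota marmota', 'Alps, Tyrol', 0, 0),
    (2, 'Cypripedium calceolus', 'Bavaria, exact site', 1, 0),
    (3, 'Closterium sp.', 'Berlin, Dahlem', 0, 1);

-- The published view hides localities of endangered species
-- and leaves out embargoed records entirely.
CREATE VIEW published_specimen AS
SELECT specimen_id,
       scientific_name,
       CASE WHEN endangered = 1 THEN NULL ELSE locality END AS locality
FROM specimen
WHERE embargoed = 0;
""")

published = conn.execute(
    "SELECT specimen_id, scientific_name, locality "
    "FROM published_specimen ORDER BY specimen_id"
).fetchall()
for row in published:
    print(row)
```

The live table stays untouched, so curators keep their usual workflow while only the view is exposed for publication.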

14 2: The BioCASe Provider Software
Protocols/Data Standards
Wrapper: BioCASe Provider Software
Applications: Data Portal, Data Quality Checker, Data Mining

15 BioCASe Provider Software (Wrapper)
Software package that "wraps" around the collection database
→ Equips it with a BioCASe protocol compliant interface
1. Accepts requests from the network (e.g., "Marmota marmota?")
2. Translates queries to the collection database: SELECT * FROM specimen WHERE ScientificName LIKE 'Marmota marmota%'
3. Transforms results into ABCD documents and sends them back
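The translation step can be illustrated with a small sketch. This is not the BPS implementation; the function `like_filter_to_sql`, the column whitelist, and the sample rows are hypothetical, and SQLite stands in for the collection database. It only shows the idea of turning a wildcard filter into a parameterized SQL statement.

```python
import sqlite3

def like_filter_to_sql(column, pattern):
    """Translate a wildcard filter such as 'Marmota marmota*' into SQL.

    Returns a parameterized statement and its parameters; '*' is mapped
    to the SQL wildcard '%'. The column name must come from a trusted
    mapping, because identifiers cannot be passed as parameters.
    """
    allowed_columns = {"ScientificName"}  # hypothetical mapping whitelist
    if column not in allowed_columns:
        raise ValueError(f"unmapped column: {column}")
    sql = f"SELECT * FROM specimen WHERE {column} LIKE ?"
    return sql, (pattern.replace("*", "%"),)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE specimen (specimen_id INTEGER PRIMARY KEY, ScientificName TEXT)"
)
conn.executemany(
    "INSERT INTO specimen (ScientificName) VALUES (?)",
    [("Marmota marmota",), ("Marmota marmota latirostris",), ("Lepus europaeus",)],
)

sql, params = like_filter_to_sql("ScientificName", "Marmota marmota*")
matches = [row[1] for row in conn.execute(sql, params)]
print(sql, params, matches)
```

Passing the search value as a parameter rather than pasting it into the SQL string keeps requests from the network from altering the query.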

16 BioCASe Provider Software (Wrapper)
- Compatible with several protocols (BioCASe, DiGIR) and data schemas (ABCD, ABCD-EFG, ABCD-DNA, DarwinCore)
- Works with all SQL-compliant databases (Access, MySQL, Postgres, SQL Server, Oracle, ...)
- Currently 84 production installations serving 1,315 collections
- Platform independent
- Support available!

17 BioCASe Providers Worldwide
84 production installations serving 1,315 collections

18 Requirements
- SQL-compliant database (with an existing Python connectivity module)
- Web server (preferably Apache) that allows the execution of Python scripts
- Possibility to install additional Python packages

19 Steps
1. Installing Apache
2. Installing Python
3. Downloading BPS (from repository/archive)
4. Installing BPS
5. Creating the link Apache/BPS
6. Testing BPS, setup of additional packages
7. Changing directory permissions

20 1. Installing Apache
http://httpd.apache.org/download

21 2. Installing Python
http://www.python.org/download/

22 3. Downloading BPS
Archive: http://www.biocase.org/products/provider_software/
Subversion repository
- Trunk version: http://ww2.biocase.org/svn/bps2/trunk
- Defined version: http://ww2.biocase.org/svn/bps2/tags/release_2.5.2
Linux: svn co
Windows: Tortoise client

23 4. Installing the BPS
Setup.py: no files are copied, only adapted!

24 5. Linking BPS with Apache
httpd.conf

25 6. Testing BPS, Installing Additional Packages
http://localhost/biocase → Utilities → Library Test

26 6a: mysqldb
http://sourceforge.net/projects/mysql-python/

27 7. Write permissions: .../bps2/configuration

28 Changing the Password: .../bps/configuration.ini

29 3: ABCD, HISPID
Protocols/Data Standards
Wrapper Software
Applications: Data Portal, Data Quality Checker, Data Mining

30 ABCD Data Schema
Access to Biological Collection Data: data schema for all types of primary biodiversity data (living/preserved/observational, botanical/zoological/bacterial/viral, marine/terrestrial)
- XML (eXtensible Markup Language) based → can be consumed by humans and machines
- Highly complex, hierarchical, currently 1,055 data elements → almost every data item will fit in
- Extendable (plug-in slot for additional information)
- Standard (currently version 2.06)

31 ABCD: Structure
Namespace: http://www.tdwg.org/schemas/abcd/2.06

32 ABCD Metadata: Technical/Content Contact

33 ABCD Metadata: Intellectual Property Rights

34 ABCD Metadata: Representation, Owner, ...

35 ABCD: Triple ID, Record Basis

36 ABCD: Identification (multiple)

37 ABCD: Gathering Event

38 ABCD: Multimedia

39 ABCD: Unit Associations

40 ABCD: Specialised Portions
- Specimen Unit: Acquisition, Accession, Preparation, Duplicate Distribution, Type Status
- Herbarium Unit: Loan Information
- Botanical Garden Unit: Location in Garden, Hardiness, Lineage, Cultivation, Planting Date
Other specialised subtrees for:
- Observations
- Culture Collections
- Mycological Units
- Zoological Units
- Paleontological Units
- Plant Genetic Resources

41 ABCD: UnitExtension
Own namespace for extension: http://www.chah.org.au/schemas/hispid/5
Other extensions:
- Extension for Geosciences (ABCD-EFG)
- DNA Bank Network (ABCD-DNA)

42 HISPID
HISPID Gathering
- Coordinates DMS
- PersonCollector
- Substrate/ParentRock
- SoilType, Vegetation
HISPID Unit
- LifeForm, Phenology
- NonComputerisedDataFlag
- DonorTyp, ProvenanceType
HISPID Identification
- HigherTaxon: Additional ranks

43 BioCASe Protocol
Biological Collection Access Service Protocol:
- Manages data exchange between data providers (collections) and applications (data portals)
- Vehicle for transporting requests (data portal → collection) and responses (ABCD documents: collection database → data portal)
- XML based

44 BioCASe Protocol: Capabilities Request

45 BioCASe Protocol: Inventory Request

46 BioCASe Protocol: Search Request

47 4. Preparing the database for BioCASe

48 4. Reasons for not publishing the live DB
1. Publishing the live DB is not desired → creating snapshots for publication
2. DBMS not accessible for the BPS → export into another DBMS
3. Performance considerations (too highly normalized) → partial, controlled denormalization
4. Repeatable elements kept in columns, not in separate rows → moving repeatable elements to separate records

49 Repeatable elements kept in columns
specimen_id  ...  class             order        family
3476         ...  Conjugatophyceae  Desmidiales  Desmidiaceae
3477         ...  Conjugatophyceae  Desmidiales  Desmidiaceae
3478         ...  Conjugatophyceae  Desmidiales  Closteriaceae
Moved to separate records:
specimen_id  ...
3476         ...
3477         ...
3478         ...
sp_id  ht_entry  ht_rank  ht_name
3476   456765    class    Conjugatophyceae
3476   456766    order    Desmidiales
3476   456767    family   Desmidiaceae
3477   456768    class    Conjugatophyceae
3477   456769    order    Desmidiales
3477   456770    family   Desmidiaceae
3478   456771    class    Conjugatophyceae
3478   456772    order    Desmidiales
3478   456773    family   Closteriaceae
Each repeatable element needs its own primary key!

50 Example View
    CREATE VIEW [dbo].[vwHigherTaxa] AS
    SELECT 'k_' + [EDIT_ATBI_RecordID] AS id,
           [EDIT_ATBI_RecordID] AS unit_id,
           [kingdom] AS name,
           'kingdom' AS rank
    FROM unit_data WHERE [kingdom] IS NOT NULL
    UNION
    SELECT 'p_' + [EDIT_ATBI_RecordID], [EDIT_ATBI_RecordID], [phylum], 'phylum'
    FROM unit_data WHERE [phylum] IS NOT NULL
    UNION
    SELECT 'c_' + [EDIT_ATBI_RecordID], [EDIT_ATBI_RecordID], [class], 'class'
    FROM unit_data WHERE [class] IS NOT NULL
    UNION ...
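The same column-to-row technique can be tried out locally. The sketch below re-creates the idea of the view in SQLite through Python's sqlite3 module; it is an adaptation, not the slide's SQL Server code: string concatenation uses `||` instead of `+`, the key column is renamed to a hypothetical `record_id`, and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Source table with higher taxa kept in columns (sample data is illustrative).
CREATE TABLE unit_data (
    record_id TEXT PRIMARY KEY,
    kingdom TEXT, phylum TEXT, class TEXT
);
INSERT INTO unit_data VALUES
    ('3476', 'Plantae', 'Charophyta', 'Conjugatophyceae'),
    ('3477', 'Plantae', NULL, 'Conjugatophyceae');

-- One row per (unit, rank); the prefix makes each id unique across ranks.
CREATE VIEW vwHigherTaxa AS
SELECT 'k_' || record_id AS id, record_id AS unit_id,
       kingdom AS name, 'kingdom' AS rank
FROM unit_data WHERE kingdom IS NOT NULL
UNION ALL
SELECT 'p_' || record_id, record_id, phylum, 'phylum'
FROM unit_data WHERE phylum IS NOT NULL
UNION ALL
SELECT 'c_' || record_id, record_id, class, 'class'
FROM unit_data WHERE class IS NOT NULL;
""")

rows = conn.execute(
    "SELECT id, unit_id, name, rank FROM vwHigherTaxa ORDER BY unit_id, id"
).fetchall()
for row in rows:
    print(row)
```

The sketch uses UNION ALL because the rank prefix already makes every row distinct, so the duplicate elimination performed by plain UNION is not needed here; that is a choice of this sketch, not of the slide.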

51 Commonly used repeatable elements
- Identification
- HigherTaxon
- GatheringSite/NamedArea
- Metadata/Scope/GeoecologicalTerms
- Metadata/Scope/TaxonomicTerms
- MultimediaObjects
- MeasurementsOrFacts
- ...

52 Controlled Denormalization
    INSERT INTO [dbo].[abcd_Object]
    SELECT dbo.CollectionObject.CollectionObjectID,
           ISNULL(dbo.CatalogSeries.SeriesName, '') + '-'
             + ISNULL(CAST(dbo.CollectionObjectCatalog.SubNumber AS nvarchar(20)), '') + '-'
             + ISNULL(CAST(dbo.CollectionObjectCatalog.CatalogNumber AS nvarchar(20)), ''),
           dbo.f_getParentID(dbo.CollectionObject.CollectionObjectID),
           dbo.f_getCollectingEventID(dbo.CollectionObject.CollectionObjectID),
           dbo.f_getFieldNumber(dbo.CollectionObject.CollectionObjectID),
           CAST(dbo.CollectionObjectCatalog.CatalogNumber AS int),
           dbo.CollectionObject.PreparationMethod,
           CASE WHEN Sex = ' ' THEN NULL ELSE Sex END,
           CASE WHEN Stage = ' ' THEN NULL ELSE Stage END,
           CASE WHEN dbo.CollectionObject.Text1 IS NULL THEN ''
                ELSE 'Barcode: ' + dbo.CollectionObject.Text1 + '; ' END
             + CASE WHEN dbo.Accession.Number IS NULL THEN ''
                    ELSE 'Specimen Location: ' + dbo.Accession.Number END
             + CASE WHEN DerivedFrom.Remarks IS NULL THEN ''
                    ELSE ' ' + CAST(DerivedFrom.Remarks AS nvarchar(2000)) END
    FROM dbo.BiologicalObjectAttributes
    RIGHT OUTER JOIN dbo.CollectionObject
      ON dbo.BiologicalObjectAttributes.BiologicalObjectAttributesID = dbo.f_getParentID(dbo.CollectionObject.CollectionObjectID)
    LEFT OUTER JOIN dbo.CollectionObjectCatalog
      LEFT OUTER JOIN dbo.CatalogSeries
        ON dbo.CollectionObjectCatalog.CatalogSeriesID = dbo.CatalogSeries.CatalogSeriesID
      ON dbo.CollectionObject.CollectionObjectID = dbo.CollectionObjectCatalog.CollectionObjectCatalogID
    LEFT JOIN dbo.Accession
      ON Accession.AccessionID = CollectionObjectCatalog.AccessionID
    LEFT JOIN dbo.CollectionObject AS DerivedFrom
      ON CollectionObject.DerivedFromID = DerivedFrom.collectionObjectID
    WHERE (dbo.f_hasChildObjects(dbo.CollectionObject.CollectionObjectID) = 0) AND ...

53 How Do I See That Something Is Wrong?
Errors in ABCD documents:
- Several datasets (one for each unit)
- Several units for one specimen record
Reasons:
- Repeatable elements not in separate tables (no separate PK → several units will be created)
- Several records in DB for non-repeatable elements (several ABCD objects are necessary to create a valid document)
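The second cause, several database records for an element that may occur only once per unit, can be checked directly in the database before mapping. The following is a minimal sketch with Python's sqlite3 module; the tables `unit` and `gathering`, their columns, and the helper `units_with_duplicates` are hypothetical and only illustrate the check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE unit (unit_id INTEGER PRIMARY KEY, catalogue_no TEXT);
-- Assumed to be mapped to a non-repeatable element: at most one row per unit.
CREATE TABLE gathering (unit_id INTEGER, country TEXT);
INSERT INTO unit VALUES (3476, 'B-1'), (3477, 'B-2');
INSERT INTO gathering VALUES (3476, 'Germany'), (3477, 'Austria'), (3477, 'Italy');
""")

def units_with_duplicates(conn, table, fk_column):
    """Return (unit ID, row count) for units with more than one row in `table`.

    `table` and `fk_column` are inserted into the SQL text, so they must
    come from trusted configuration, never from user input.
    """
    sql = (
        f"SELECT {fk_column}, COUNT(*) FROM {table} "
        f"GROUP BY {fk_column} HAVING COUNT(*) > 1 ORDER BY {fk_column}"
    )
    return conn.execute(sql).fetchall()

duplicates = units_with_duplicates(conn, "gathering", "unit_id")
print(duplicates)
```

Any unit reported here would be multiplied when the tables are joined, which matches the symptom of several units appearing for one specimen record.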

54 5. Setting Up a BioCASe Data Source: Database connection, Table Setup, Schema Mapping

55 BPS Datasource
URL for a BioCASe protocol compliant webservice: http://ww3.bgbm.org/biocase/pywrapper.cgi?dsa=AlgenEngels
Example search request (XML markup lost in extraction; surviving values): search, http://www.tdwg.org/schemas/abcd/2.06, http://www.tdwg.org/schemas/abcd/2.06, A*, false
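Since the XML of the request did not survive, the sketch below shows what such a search request might look like when built with Python's xml.etree.ElementTree. Only the values `search`, the ABCD 2.06 namespace, `A*` and `false` come from the slide. The protocol namespace, the element names (`requestFormat`, `responseFormat`, `filter`, `like`, `count`), the `start`/`limit` attributes and the concept path are assumptions recalled from the BioCASe protocol and should be verified against the protocol schema.

```python
import xml.etree.ElementTree as ET

# Assumed namespace and element layout of a BioCASe protocol request;
# verify both against the protocol schema before relying on them.
BIOCASE_NS = "http://www.biocase.org/schemas/protocol/1.3"
ABCD_NS = "http://www.tdwg.org/schemas/abcd/2.06"
# Assumed ABCD concept path for the full scientific name.
NAME_PATH = (
    "/DataSets/DataSet/Units/Unit/Identifications/Identification"
    "/Result/TaxonIdentified/ScientificName/FullScientificNameString"
)

def build_search_request(pattern, start=0, limit=10):
    """Build a search request document as a string (hypothetical helper)."""
    ET.register_namespace("", BIOCASE_NS)

    def tag(name):
        return f"{{{BIOCASE_NS}}}{name}"

    request = ET.Element(tag("request"))
    header = ET.SubElement(request, tag("header"))
    ET.SubElement(header, tag("type")).text = "search"

    search = ET.SubElement(request, tag("search"))
    ET.SubElement(search, tag("requestFormat")).text = ABCD_NS
    response_format = ET.SubElement(
        search, tag("responseFormat"), {"start": str(start), "limit": str(limit)}
    )
    response_format.text = ABCD_NS

    # Filter: scientific name matches the wildcard pattern.
    flt = ET.SubElement(search, tag("filter"))
    ET.SubElement(flt, tag("like"), {"path": NAME_PATH}).text = pattern
    ET.SubElement(search, tag("count")).text = "false"
    return ET.tostring(request, encoding="unicode")

xml_request = build_search_request("A*")
print(xml_request)
```

How the document is transmitted to the datasource URL (parameter name, GET or POST) is not shown on the slide and is therefore left out of the sketch.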

56 BPS QueryTools
Tool for sending Scan, Search and Capabilities requests to a datasource
Choose datasource → "Debug"

57 Steps for Setting Up a Datasource
1. Create a new datasource
2. Configure datasource:
   1. Database connection
   2. Table setup
   3. Create new empty mapping
   4. Edit mapping:
      1. Choose root table
      2. Edit mandatory ABCD elements (red)
      3. Save configuration, test datasource (QueryTools)
      4. Add additional ABCD elements, occasional testing
3. Test datasource

58 Datasource Loglevel
The lower the loglevel, the more information is logged (10=debug, 20=info, 30=warning, 40=error).
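Python's standard logging module numbers its levels DEBUG=10, INFO=20, WARNING=30 and ERROR=40. Assuming the BPS loglevel follows that numbering (the slide does not state this explicitly), the threshold behaviour can be sketched as follows; `is_logged` is a hypothetical helper, not a BPS function.

```python
import logging

# Numeric values of the standard Python logging levels.
levels = {
    name: getattr(logging, name)
    for name in ("DEBUG", "INFO", "WARNING", "ERROR")
}
print(levels)

def is_logged(message_level, configured_level):
    """A message is written if its level is at or above the configured threshold."""
    return message_level >= configured_level

# A low threshold lets every level through; a higher one filters more.
logged_at_debug = [n for n, v in levels.items() if is_logged(v, logging.DEBUG)]
logged_at_warning = [n for n, v in levels.items() if is_logged(v, logging.WARNING)]
print(logged_at_debug)
print(logged_at_warning)
```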

59 How the BPS performs requests
1. Get an ID list of the records matching the filter
2. Load all details for the matching IDs → joining of ALL tables, beginning with the root table (table with UnitID, one record per unit)
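The two-step pattern can be sketched in a few lines. This is an illustration of the idea only, not the BPS code: the tables `unit` (root table) and `higher_taxon`, the function `search`, and the sample rows are hypothetical, and SQLite stands in for the collection database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE unit (unit_id INTEGER PRIMARY KEY, scientific_name TEXT);  -- root table
CREATE TABLE higher_taxon (
    ht_id INTEGER PRIMARY KEY, unit_id INTEGER, ht_rank TEXT, ht_name TEXT
);
INSERT INTO unit VALUES
    (3476, 'Closterium acerosum'),
    (3477, 'Cosmarium botrytis'),
    (3478, 'Marmota marmota');
INSERT INTO higher_taxon VALUES
    (1, 3476, 'family', 'Closteriaceae'),
    (2, 3477, 'family', 'Desmidiaceae'),
    (3, 3477, 'order', 'Desmidiales');
""")

def search(conn, name_pattern):
    # Step 1: fetch only the IDs of root records matching the filter.
    ids = [row[0] for row in conn.execute(
        "SELECT unit_id FROM unit WHERE scientific_name LIKE ? ORDER BY unit_id",
        (name_pattern,),
    )]
    if not ids:
        return {}
    # Step 2: load all details for those IDs, joining from the root table.
    placeholders = ",".join("?" * len(ids))
    rows = conn.execute(
        f"SELECT u.unit_id, u.scientific_name, h.ht_rank, h.ht_name "
        f"FROM unit u LEFT JOIN higher_taxon h ON h.unit_id = u.unit_id "
        f"WHERE u.unit_id IN ({placeholders}) ORDER BY u.unit_id, h.ht_id",
        ids,
    )
    # Group the joined rows back into one entry per unit.
    units = {}
    for unit_id, name, rank, taxon in rows:
        unit = units.setdefault(unit_id, {"name": name, "higher_taxa": []})
        if rank is not None:
            unit["higher_taxa"].append((rank, taxon))
    return units

result = search(conn, "C%")
print(result)
```

Because step 2 joins every mapped table to the root table, the root table has to hold exactly one record per unit; otherwise the grouping would produce several units for one specimen.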

60 Datasources folder: .../bps/configuration/datasources/
- querytool_prefs.xml: Just what its name says.
- xxxx.pick: Temporary files; should be deleted if BPS behaves strangely.
- cmf_xxxxxx.xml: Concept mapping; one for each supported schema.
- provider_setup_file.xml: Database connection, table setup, supported schemas.
Regular backup of the configuration folder is highly recommended!

61 Metadata tables
- If metadata differ for each or some of the records → several records in the metadata table, linked to the unit by a foreign key
- If metadata are identical for all records → possible to hold the data in one record → no reference key is needed → static table

62 Applications
1. Protocols/Data Standards
2. Wrapper Software
3. Applications: Data Portal, Data Quality Checker, Data Mining

63 Local QueryTool

64 Distributed Search vs. Harvesting/Caching
GeoCASe Distributed Search: http://search.biocase.org/geocase

65 GBIF Registration

66 GBIF Data Portal

67 BioCASe European data portal

68 EDIT Specimen Explorer

69 Data Mining: Itineraries Project
Goal: Detect itinerary patterns in geo-referenced primary data presumably collected during a collecting event.
1st step: Try to validate itineraries from well-documented expeditions (literature) against geo-referenced primary biodiversity records with dates/collecting information
2nd step: Try to find itineraries for collecting events with missing expedition diaries

70 Data Mining: Ecological Niche Modelling

71 Jörg Holetschek
Botanic Garden & Botanical Museum, Dept. of Biodiversity Informatics and Laboratories
Königin-Luise-Straße 6-8, 14195 Berlin-Dahlem
j.holetschek@bgbm.org
Tel. +49 30 838 50150, 0448 831 980
www.bgbm.org/biodivinf
www.biocase.org
search.biocase.org
search.biocase.de
http://hiscom.chah.org.au/wiki/BioCASe_Workshop

