OpenUp! BioCASe Workshop. Jörg Holetschek, Gabriele Dröge. Botanic Garden & Botanical Museum Berlin-Dahlem, Dept. of Biodiversity Informatics.


1 OpenUp! BioCASe Workshop. Jörg Holetschek, Gabriele Dröge. Botanic Garden & Botanical Museum Berlin-Dahlem, Dept. of Biodiversity Informatics and Laboratories, Königin-Luise-Straße 6-8, 14195 Berlin. BioCASe Workshop, Berlin, May 30th/31st, 2011

2 Agenda
Monday
11.00 Welcome by Walter Berendsohn, Housekeeping
11.20 – 12.00 The BioCASe Architecture: An Overview
12.00 – 13.00 The BioCASe Provider Software I: An Overview
13.00 – 14.00 Lunch break
14.00 – 15.45 The BioCASe Provider Software II: Installation (Hands-on)
16.00 – 17.00 The ABCD data standard: Intention, Structure, Elements, Use
17.00 – 18.00 Preparing the database for BioCASe/ABCD
19.00 Dinner
Tuesday
09.30 – 12.00 Setting Up Datasources with the BPS (Hands-on): DB connection, Table Setup, Mapping; Testing, Data Backups
12.00 – 13.00 Lunch break
13.00 – 14.30 Setting up Networks with BioCASe (Hands-on)
15.00 – 15.30 A Thematic BioCASe Network: The DNA Bank Network
15.30 – 17.00 Questions (and answers?)

3 Workshop Presentation: http://www.biocase.org/files/BioCASe_Workshop_Berlin_2011.ppt
WiFi network: Conference, key: g59mn3w2

4 1. BioCASe Technology: Motivation, Idea and Architecture

5 Primary Biodiversity Information © Agnes Kirchhoff, J. Holstein et al.

6 Primary Biodiversity Data Items
- Living specimen
- Preserved specimen
- Multimedia document (drawing, photo, video, sound)
- Observation
= Primary Biodiversity Data Record: documentation of the occurrence of one species at a given location at a certain point in time
→ Biological Collection Access Service

7 Data sources worldwide
- Index Herbariorum: 3,293 herbaria, 400 million herbarium sheets
- 50,000-100,000 natural history collections, 1.5-2 billion specimens
- With observations added, more than 3 billion occurrence records (10 billion?)
Over 75% of biodiversity information is stored in developed countries. An estimated 75% of all species are found in the developing world. Source: BARTHLOTT et al. 1999

8 Accessibility
Stage 0: Only in the real world (paper catalogues, just stacks)
Stage 1: Only meta information available on the web
Stage 2: Online catalogue, digitization of specimens

9 Biodiversity Data Level 3: Networking the databases

10 Global Biodiversity Information Facility (GBIF)

11 Biological Collection Access Service (BioCASe)

12 Architecture of Biodiversity Networks
1. Protocols/Data Standards: BioCASe Protocol/ABCD
2. Wrapper Software: BioCASe Provider Software
3. Applications: Data Portal, Data Quality Checker, Data Mining

13 BioCASe Design Principles
No central database
→ Data remain in the existing DB systems
→ Data provider gets full credit
→ Full control over published data by collection holder
Partial publication possible
→ Collection holder can withhold information from publication (e.g., locality data for endangered species) or exclude records (e.g., until research results are published)
Wrapper principle
→ Data remain in original collection management system
→ No changes in workflow for curator/local users

14 2. The BioCASe Provider Software (architecture diagram: Protocols/Data Standards; Wrapper: BioCASe Provider Software; Applications: Data Portal, Data Quality Checker, Data Mining)

15 BioCASe Provider Software (Wrapper)
Software package that "wraps" around the collection database → equips it with a BioCASe protocol compliant interface
1. Accepts requests from the network (e.g., "Marmota marmota?")
2. Translates queries to the collection database: SELECT * FROM specimen WHERE ScientificName LIKE 'Marmota marmota%'
3. Transforms results into ABCD documents and sends them back
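The query translation step of the wrapper can be sketched in a few lines of Python. This is only an illustration, not the BPS implementation: the in-memory SQLite database, the `find_units` helper and the sample records are assumptions, and only the table and column names come from the slide's example query.

```python
import sqlite3

def find_units(conn, scientific_name):
    """Translate a name request into a parameterised SQL prefix search."""
    # Table and column names follow the slide's example query.
    sql = ("SELECT id, ScientificName FROM specimen "
           "WHERE ScientificName LIKE ? ORDER BY id")
    return conn.execute(sql, (scientific_name + "%",)).fetchall()

# Stand-in collection database with three made-up records.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE specimen (id INTEGER PRIMARY KEY, ScientificName TEXT)")
conn.executemany(
    "INSERT INTO specimen (id, ScientificName) VALUES (?, ?)",
    [(1, "Marmota marmota"),
     (2, "Marmota marmota latirostris"),
     (3, "Lepus europaeus")],
)

matches = find_units(conn, "Marmota marmota")
print(matches)
```

Passing the name as a bound parameter rather than concatenating it into the SQL string keeps the request text from being interpreted as SQL.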

16 BioCASe Provider Software (Wrapper)
- Compatible with several protocols (BioCASe, DiGIR) and data schemas (ABCD, DarwinCore, ABCD-EFG, ABCD-DNA)
- Works with most SQL-compliant databases (Access, MySQL, Postgres, SQL Server, ...)
- Currently ~95 production installations serving ~1,500 collections with ~33.5m records to GBIF and BioCASe
- Platform independent
- Support available!

17 BioCASe Providers Worldwide: ~95 production installations serving ~1,500 collections

18 Requirements
1. SQL-compliant database with existing Python connectivity module: MySQL, SQL Server, Postgres, Access, Foxpro, Excel
2. Webserver (preferably Apache), allowing the execution of Python scripts
3. Privileges to install additional Python packages

19 Steps
1. Installing Apache
2. Installing Python
3. Downloading BPS
4. Installing BPS (from repository/archive)
5. Creating the link Apache/BPS
6. Test of installation
7. Changing directory permissions
8. Setup of additional packages (DB connectivity package)

20 1. Installing Apache: http://httpd.apache.org/download

21 2. Installing Python: http://www.python.org/download/

22 3. Downloading BPS
Archive: http://www.biocase.org/products/provider_software/
Subversion repository:
- Latest stable version: http://ww2.biocase.org/svn/bps2/branches/stable
- Defined version: http://ww2.biocase.org/svn/bps2/tags/release_2.5.3
Linux: svn co; Windows: Tortoise client

23 4. Installing the BPS: Setup.py. No files are copied, only adapted!

24 5. Linking BPS with Apache: httpd.conf

25 6. Testing BPS, Installing Additional Packages: http://localhost/biocase → Utilities → Library Test

26 6. Write permissions: …/bps2/configuration, …/bps2/log

27 7a: mysqldb: http://sourceforge.net/projects/mysql-python/
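Whether a database connectivity package is present can also be checked from Python directly. The sketch below is not the BPS Library Test itself; the `DB_MODULES` table and the `library_status` helper are assumptions for illustration, with `MySQLdb` being the module installed by the MySQL-python package linked above.

```python
import importlib.util

# Hypothetical mapping of DBMS to the Python module providing connectivity.
DB_MODULES = {"MySQL": "MySQLdb", "PostgreSQL": "psycopg2", "SQLite": "sqlite3"}

def library_status(modules):
    """Return {dbms: True/False} depending on whether the module can be found."""
    return {dbms: importlib.util.find_spec(name) is not None
            for dbms, name in modules.items()}

status = library_status(DB_MODULES)
for dbms, available in status.items():
    print(f"{dbms}: {'installed' if available else 'missing'}")
```

`find_spec` only locates the module without importing it, so the check has no side effects on the running interpreter.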

28 Changing the Password: ... /bps/configuration.ini

29 3. ABCD Standard (architecture diagram: Protocols/Data Standards; Wrapper Software; Applications: Data Portal, Data Quality Checker, Data Mining)

30 ABCD Data Schema
Access to Biological Collection Data:
- Data schema for all types of primary biodiversity data (living/preserved/observational, botanical/zoological/bacterial/viral, marine/terrestrial)
- XML (eXtensible Markup Language) based → can be consumed by humans and machines
- Highly complex, hierarchical, currently 1,055 data elements → almost every data item will fit in
- Extendable (plug-in slot for additional information)
- Standard (currently version 2.06)

31 ABCD: Structure. Namespace: http://www.tdwg.org/schemas/abcd/2.06

32 ABCD Metadata: Technical/Content Contact

33 ABCD Metadata: Description

34 ABCD Metadata: Coverage

35 ABCD Metadata: Revision/Version

36 ABCD Metadata: Ownership

37 ABCD Metadata: Intellectual Property Rights

38 ABCD Metadata

39 ABCD: Triple ID, Record Basis

40 ABCD: Identification (multiple)

41 ABCD: Gathering Event

42 ABCD: Multimedia. OpenUp: Thumbnails will be created → Always provide link to image file!

43 ABCD: Unit Associations

44 ABCD: Specialised Portions
Specimen Unit: Acquisition, Accession, Preparation, Duplicate Distribution, Type Status
Herbarium Unit: Loan Information
Botanical Garden Unit: Location in Garden, Hardiness, Lineage, Cultivation, Planting Date
Other specialised subtrees for:
- Observations
- Culture Collections
- Mycological Units
- Zoological Units
- Paleontological Units
- Plant Genetic Resources

45 ABCD: UnitExtension
Own namespace for extension: http://www.chah.org.au/schemas/hispid/5
Other extensions:
- Extension for Geosciences (ABCD-EFG)
- DNA Bank Network (ABCD-DNA)

46 BioCASe Protocol
Biological Collection Access Service Protocol:
- Manages data exchange between data providers (collections) and applications (data portals)
- Vehicle for transporting requests (data portal → collection) and responses (ABCD documents; collection database → data portal)
- XML based

47 BioCASe Protocol: Capabilities Request

48 BioCASe Protocol: Inventory Request

49 BioCASe Protocol: Search Request

50 4. Preparing the database for BioCASe

51 4. Reasons for not publishing the live DB
1. Publishing the live DB is not desired → creating snapshots for publication
2. DBMS not accessible for the BPS → export into another DBMS
3. Performance considerations (too highly normalized) → partial, controlled denormalization
4. Repeatable elements kept in columns, not in separate rows → moving repeatable elements to separate records

52 Repeatable elements kept in columns
Each repeatable element needs its own primary key!

Before (one column per rank):
specimen_id | ... | class | order | family
3476 | ... | Conjugatophyceae | Desmidiales | Desmidiaceae
3477 | ... | Conjugatophyceae | Desmidiales | Desmidiaceae
3478 | ... | Conjugatophyceae | Desmidiales | Closteriaceae

After (specimen table plus one row per higher taxon):
specimen_id | ...
3476 | ...
3477 | ...
3478 | ...

sp_id | ht_entry | ht_rank | ht_name
3476 | 456765 | class | Conjugatophyceae
3476 | 456766 | order | Desmidiales
3476 | 456767 | family | Desmidiaceae
3477 | 456768 | class | Conjugatophyceae
3477 | 456769 | order | Desmidiales
3477 | 456770 | family | Desmidiaceae
3478 | 456771 | class | Conjugatophyceae
3478 | 456772 | order | Desmidiales
3478 | 456773 | family | Closteriaceae
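When the publication database is filled from an export, the move from rank columns to separate records can be done in plain Python. The following is a minimal sketch under that assumption; the `unpivot_higher_taxa` helper is hypothetical, while the column names and sample values are taken from the slide.

```python
from itertools import count

RANKS = ("class", "order", "family")

def unpivot_higher_taxa(specimens, first_key=1):
    """Turn one row per specimen into one row per (specimen, rank) pair."""
    keys = count(first_key)  # every repeatable element gets its own primary key
    rows = []
    for spec in specimens:
        for rank in RANKS:
            name = spec.get(rank)
            if name:  # skip ranks that are not filled in
                rows.append({"sp_id": spec["specimen_id"],
                             "ht_entry": next(keys),
                             "ht_rank": rank,
                             "ht_name": name})
    return rows

specimens = [
    {"specimen_id": 3476, "class": "Conjugatophyceae",
     "order": "Desmidiales", "family": "Desmidiaceae"},
    {"specimen_id": 3478, "class": "Conjugatophyceae",
     "order": "Desmidiales", "family": "Closteriaceae"},
]
higher_taxa = unpivot_higher_taxa(specimens, first_key=456765)
for row in higher_taxa:
    print(row)
```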

53 Example View
CREATE VIEW [dbo].[vwHigherTaxa] AS
SELECT 'k_' + [EDIT_ATBI_RecordID] AS id, [EDIT_ATBI_RecordID] AS unit_id, [kingdom] AS name, 'kingdom' AS rank
  FROM unit_data WHERE [kingdom] IS NOT NULL
UNION
SELECT 'p_' + [EDIT_ATBI_RecordID], [EDIT_ATBI_RecordID], [phylum], 'phylum'
  FROM unit_data WHERE [phylum] IS NOT NULL
UNION...
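The same idea can be tried out without a SQL Server installation. The sketch below rebuilds the view in an in-memory SQLite database from Python; the shortened column name `record_id`, the column alias `taxon_rank` and the two sample records are assumptions, and SQLite's `||` replaces the `+` string concatenation of the original view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE unit_data (record_id TEXT PRIMARY KEY, kingdom TEXT, phylum TEXT);
INSERT INTO unit_data VALUES ('A1', 'Plantae', 'Charophyta');
INSERT INTO unit_data VALUES ('A2', 'Animalia', NULL);

-- One SELECT per rank; the prefix keeps the generated IDs unique.
CREATE VIEW vwHigherTaxa AS
SELECT 'k_' || record_id AS id, record_id AS unit_id,
       kingdom AS name, 'kingdom' AS taxon_rank
  FROM unit_data WHERE kingdom IS NOT NULL
UNION
SELECT 'p_' || record_id, record_id, phylum, 'phylum'
  FROM unit_data WHERE phylum IS NOT NULL;
""")

rows = conn.execute(
    "SELECT id, unit_id, name, taxon_rank FROM vwHigherTaxa ORDER BY id"
).fetchall()
for row in rows:
    print(row)
```

Because the view is computed on each query, the rank columns in the source table stay untouched and no data needs to be copied.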

54 Commonly used repeatable elements
- Identification
- HigherTaxon
- GatheringSite/NamedArea
- Metadata/Scope/GeoecologicalTerms
- Metadata/Scope/TaxonomicTerms
- MultimediaObjects
- MeasurementsOrFacts
- ...

55 Controlled Denormalization
insert into [dbo].[abcd_Object]
SELECT dbo.CollectionObject.CollectionObjectID,
  ISNULL(dbo.CatalogSeries.SeriesName, '') + '-' + ISNULL(CAST(dbo.CollectionObjectCatalog.SubNumber AS nvarchar(20)), '') + '-' + ISNULL(CAST(dbo.CollectionObjectCatalog.CatalogNumber AS nvarchar(20)), ''),
  dbo.f_getParentID(dbo.CollectionObject.CollectionObjectID),
  dbo.f_getCollectingEventID(dbo.CollectionObject.CollectionObjectID),
  dbo.f_getFieldNumber(dbo.CollectionObject.CollectionObjectID),
  cast(dbo.CollectionObjectCatalog.CatalogNumber as int),
  dbo.CollectionObject.PreparationMethod,
  case when Sex = ' ' then NULL else Sex end,
  case when Stage = ' ' then NULL else Stage end,
  case when dbo.CollectionObject.Text1 is null then '' else 'Barcode: ' + dbo.CollectionObject.Text1 + '; ' end
    + case when dbo.Accession.Number is null then '' else 'Specimen Location: ' + dbo.Accession.Number end
    + case when DerivedFrom.Remarks is null then '' else ' ' + cast(DerivedFrom.Remarks as nvarchar(2000)) end
FROM dbo.BiologicalObjectAttributes
  RIGHT OUTER JOIN dbo.CollectionObject
    ON dbo.BiologicalObjectAttributes.BiologicalObjectAttributesID = dbo.f_getParentID(dbo.CollectionObject.CollectionObjectID)
  LEFT OUTER JOIN dbo.CollectionObjectCatalog
    LEFT OUTER JOIN dbo.CatalogSeries
      ON dbo.CollectionObjectCatalog.CatalogSeriesID = dbo.CatalogSeries.CatalogSeriesID
    ON dbo.CollectionObject.CollectionObjectID = dbo.CollectionObjectCatalog.CollectionObjectCatalogID
  LEFT JOIN dbo.Accession on Accession.AccessionID = CollectionObjectCatalog.AccessionID
  LEFT JOIN dbo.CollectionObject AS DerivedFrom ON CollectionObject.DerivedFromID = DerivedFrom.collectionObjectID
WHERE (dbo.f_hasChildObjects(dbo.CollectionObject.CollectionObjectID) = 0) AND...

56 How Do I See That Something Is Wrong?
Errors in ABCD documents:
- Several datasets (one for each unit). Reason: metadata field stored in Units table (no separate PK → several datasets need to be created)
- Several units for one specimen record. Reason: several records in DB for non-repeatable elements (several ABCD objects are necessary to create a valid document)

57 5. Setting Up a BioCASe Data Source: Database connection, Table Setup, Schema Mapping

58 BPS Datasource
URL for a BioCASe protocol compliant web service: http://ww3.bgbm.org/biocase/pywrapper.cgi?dsa=AlgenEngels
Example search request (XML markup lost): type search, request and response format http://www.tdwg.org/schemas/abcd/2.06, filter value A*, count false

59 BPS QueryForms
Tool for sending Scan, Search and Capabilities requests to a datasource
Choose Datasource → "Test and Debug"

60 Steps for Setting Up a Datasource
1. Create a new datasource
2. Configure datasource:
   1. Database connection
   2. Table setup
   3. Create new empty mapping
   4. Edit mapping:
      1. Choose root table
      2. Edit mandatory ABCD elements (red)
      3. Save configuration, test datasource (QueryForms)
      4. Add additional ABCD elements, occasional testing
3. Test/debug datasource

61 FloraExsiccataBavarica: Additional Fields
Concept | Table/Column
Metadata/…
  Description/Representation/Details | metadata.description (text)
  IconURI | metadata.logo_url (text)
  Version/Major | metadata.source_version (text)
Metadata/IPRStatements/…
  Citations/Citation/Text | metadata.citationsText (text)
  Copyrights/Copyright/Text | metadata.copyright (text)
  Disclaimers/Disclaimer/Text | metadata.disclaimer (text)
  Acknowledgements/Acknowledgement/Text | metadata.acknowledgement (text)
  TermsOfUseStatements/TermsOfUse/Text | metadata.terms_of_use (text)
Units/Unit/Gathering/…
  Agents/GatheringAgent/Person/FullName | unit.sammler (text)
  Altitude/MeasurementOrFactText | unit.hoehe (text) + "m"
  Altitude/MeasurementOrFactAtomised/LowerValue | unit.hoehe (text)
  Altitude/MeasurementOrFactAtomised/UnitOfMeasurement | "m"
  Country/ISO3166Code | "DE"
  Country/Name | "Germany"
  DateTime/DateText | unit.datum1 (text)
  LocalityText | unit.fundort (text)
  NamedAreas/NamedArea/AreaClass | "State"
  NamedAreas/NamedArea/AreaName | "Bavaria"
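A mapping of this kind pairs each ABCD concept with either a database column or a literal. The sketch below shows that idea in Python for a few of the gathering concepts; it is not how the BPS stores or evaluates its mappings, and the `column`, `literal` and `apply_mapping` helpers as well as the sample record are assumptions.

```python
# Each ABCD concept maps either to a database column or to a literal value.
def column(name, suffix=""):
    """Rule that reads a column and optionally appends a fixed suffix."""
    return lambda record: (None if record.get(name) is None
                           else str(record[name]) + suffix)

def literal(value):
    """Rule that returns the same value for every record."""
    return lambda record: value

GATHERING = "Units/Unit/Gathering/"
MAPPING = {
    GATHERING + "Agents/GatheringAgent/Person/FullName": column("sammler"),
    GATHERING + "Altitude/MeasurementOrFactText": column("hoehe", suffix="m"),
    GATHERING + "Country/ISO3166Code": literal("DE"),
    GATHERING + "Country/Name": literal("Germany"),
    GATHERING + "LocalityText": column("fundort"),
}

def apply_mapping(record, mapping=MAPPING):
    """Return {concept path: value}, leaving out concepts without data."""
    values = {path: rule(record) for path, rule in mapping.items()}
    return {path: value for path, value in values.items() if value is not None}

unit = {"sammler": "A. Example", "hoehe": "520", "fundort": None}  # made-up record
mapped = apply_mapping(unit)
for path, value in mapped.items():
    print(path, "=", value)
```

Concepts whose column is empty are dropped rather than filled with a placeholder, so no "0" or empty values are published for non-existent data.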

62 How the BPS performs requests
1. Get an ID list of records matching the filter
2. Load all details for the matching IDs → joining of ALL tables, beginning with the root table (table with UnitID, one record per Unit)
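The two steps can be illustrated with a small in-memory SQLite database. This is a sketch of the general pattern only, not the SQL the BPS generates; the `unit` and `identification` tables, the `search` helper and the sample data are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE unit (unit_id INTEGER PRIMARY KEY, locality TEXT);
CREATE TABLE identification (id INTEGER PRIMARY KEY, unit_id INTEGER, name TEXT);
INSERT INTO unit VALUES (1, 'Berlin');
INSERT INTO unit VALUES (2, 'Munich');
INSERT INTO identification VALUES (10, 1, 'Abies alba');
INSERT INTO identification VALUES (11, 1, 'Abies sp.');
INSERT INTO identification VALUES (12, 2, 'Betula pendula');
""")

def search(conn, name_pattern):
    # Step 1: collect the IDs of all units matching the filter.
    ids = [row[0] for row in conn.execute(
        "SELECT DISTINCT unit_id FROM identification "
        "WHERE name LIKE ? ORDER BY unit_id", (name_pattern,))]
    if not ids:
        return []
    # Step 2: load all details for those IDs, joining from the root table.
    placeholders = ", ".join("?" for _ in ids)
    return conn.execute(
        "SELECT u.unit_id, u.locality, i.name FROM unit u "
        "LEFT JOIN identification i ON i.unit_id = u.unit_id "
        f"WHERE u.unit_id IN ({placeholders}) ORDER BY u.unit_id, i.id",
        ids).fetchall()

details = search(conn, "Abies alba")
print(details)
```

Note that the second step returns every identification of a matching unit, including those that did not match the filter themselves, since the filter is only used to select the unit IDs.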

63 Typical Mapping Errors
- Incomplete mappings
- Missing explicit mappings for implicit knowledge (e.g., Country = "Germany" for a German collection)
- Abusing the MultimediaObject for non-multimedia documents (e.g., links to taxon pages)
- Providing "0" values for non-existent data

64 Datasource Loglevel
The lower the loglevel, the more information is logged: Debug < Info < Warning < Error
Datasource → Configuration → Settings
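Python's standard `logging` module uses the same ordering of levels, which makes the effect easy to demonstrate. The sketch below is independent of the BPS and does not claim to show how the BPS configures its log files; the `messages_logged` helper and the logger names are assumptions.

```python
import io
import logging

def messages_logged(level):
    """Log one message per severity and return those that pass the threshold."""
    stream = io.StringIO()
    logger = logging.getLogger(f"datasource.{logging.getLevelName(level)}")
    logger.setLevel(level)
    logger.propagate = False  # keep the demo output out of the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s"))
    logger.addHandler(handler)
    logger.debug("query details")
    logger.info("request received")
    logger.warning("unmapped concept")
    logger.error("database unreachable")
    logger.removeHandler(handler)
    return stream.getvalue().split()

for level in (logging.DEBUG, logging.INFO, logging.WARNING, logging.ERROR):
    print(logging.getLevelName(level), messages_logged(level))
```

With the threshold at Debug all four messages are written; at Error only the error remains.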

65 Datasources folder: ... /configuration/datasources/
- querytool_prefs.xml: Just what its name says.
- cmf_xxx.xml: Concept mapping; one for each supported schema.
- provider_setup_file.xml: Database connection, table setup, supported schemas.
Regular backup of the configuration folder is highly recommended!

66 Metadata tables
If metadata differ for each or some of the records → several records in metadata table, linked to unit by foreign key
If metadata are identical for all records → possible to hold data in one record → no reference key is needed → static table

67 Applications (architecture diagram: 1. Protocols/Data Standards; 2. Wrapper Software; 3. Applications: Data Portal, Data Quality Checker, Data Mining)

68 Local QueryTool

69 Distributed Search: BioCASe Simple UI. BioCASe Distributed Search: http://search.biocase.org/simple-ui

70 Harvesting: GBIF Data Portal

71 GBIF Registration

72 GBIF Indexing History

73 EDIT Specimen Explorer: Interactive filters

74 Distributed Search vs. Harvesting
Distributed Search
+ No harvesting application/database required
+ No delay with data updates (instantly visible)
- Dependent on provider availability
- Slow
- No data verification
- No maps, taxon lists, ...
Harvesting
- Need for a harvester/cache database
- Delays when records get updated/added/removed
+ No heavy dependency on provider availability
+ Fast (as long as your portal is)
+ Data verification/improvements/transformation in harvesting process
+ Maps, suggestion lists, interactive filters, ...

75 OpenUp! Harvesting (diagram: BioCASe, OpenUp! Harvester, OAI-PMH Harvester; formats: ABCD, ESE, EDM)

76 Jörg Holetschek, Gabriele Dröge. Botanischer Garten & Botanisches Museum, Abteilung Biodiversitätsinformatik & Labors, Königin-Luise-Straße 6-8, 14195 Berlin-Dahlem. j.holetschek@bgbm.org Tel. +49 30 838 50150 0448 831 980 www.bgbm.org/biodivinf www.biocase.org search.biocase.org search.biocase.de http://www.biocase.org/files/BioCASe_Workshop_Berlin_2011.ppt

