iDigBio Cyberinfrastructure Working Group. Andréa Matsunaga (on behalf of the WG). iDigBio Summit, Gainesville, October 23-24, 2012.


1 iDigBio Cyberinfrastructure Working Group
    Andréa Matsunaga (on behalf of the WG)
    iDigBio Summit, Gainesville, October 23-24, 2012

2 Advanced Computing and Information Systems laboratory
Cyberinfrastructure WG
    https://www.idigbio.org/wiki/index.php/Cyberinfrastructure_Working_Group
    Established as an outcome of the iDigBio IT Standards Workshop
    The initial group focuses on the iDigBio data ingestion procedures: specification, implementation, and testing of the Application Programming Interface (API) and of appliances
    Goals: produce material with concrete data ingestion use cases from Thematic Collections Networks (TCNs), provide input about the existing cyberinfrastructure, produce data ingestion requirements, and help evaluate the implementation of iDigBio services and appliances

3 iDigBio Data Portal v0 API
    Retrieval-only operations (REST GET operations)
    Endpoints:
        List all endpoints: http://api.idigbio.org/v0
        List collections: http://api.idigbio.org/v0/recordsets
        List specimens: http://api.idigbio.org/v0/records
        List media metadata: http://api.idigbio.org/v0/mediarecords
        List media objects: http://api.idigbio.org/v0/mediaaps
    Individual records:
        Example: http://api.idigbio.org/v0/records/eac2e4ec-5dbb-4c34-b56f-231ed28a5bca

        {"idigbio:data":{
            "dwc:county":"Liberty",
            "dwc:recordedBy":"Loran C. Anderson",
            "dwc:scientificNameAuthorship":"(Nees) Small",
            "id":"http://www.morphbank.net/586214",
            "dwc:eventDate":"2009-06-30 00:00:00.0",
            "dwc:scientificName":"Yeatesia viridiflora"},
         "idigbio:etag":"c3113b3aa2612ce8af46cde267c355ba18325719",
         "idigbio:links":{
            "thumbnailurl":"http://api.idigbio.org/v0/mediaaps/f43df2a6-22e8-4783-a998-39a12d7784ef/media",
            "mediarecord":"http://api.idigbio.org/v0/mediarecords/3409722c-9c23-4a62-808b-7ae684ad2046",
            "recordset":"http://api.idigbio.org/v0/recordsets/b4372b49-c7cc-42db-b1a3-f1c001de0f18"},
         "idigbio:uuid":"eac2e4ec-5dbb-4c34-b56f-231ed28a5bca"}
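A minimal Python sketch of how a client might read such a record. It parses the sample document from this slide and follows the "idigbio:links" entries. The helper names (summarize_record, fetch_record) are illustrative assumptions, not part of any iDigBio client library, and fetch_record is defined but not called because the v0 service may not be reachable.

```python
import json
from urllib.request import urlopen

API_ROOT = "http://api.idigbio.org/v0"  # v0 root as listed on the slide

# Sample record copied from the slide.
SAMPLE = """
{"idigbio:data": {
    "dwc:county": "Liberty",
    "dwc:recordedBy": "Loran C. Anderson",
    "dwc:scientificNameAuthorship": "(Nees) Small",
    "id": "http://www.morphbank.net/586214",
    "dwc:eventDate": "2009-06-30 00:00:00.0",
    "dwc:scientificName": "Yeatesia viridiflora"},
 "idigbio:etag": "c3113b3aa2612ce8af46cde267c355ba18325719",
 "idigbio:links": {
    "thumbnailurl": "http://api.idigbio.org/v0/mediaaps/f43df2a6-22e8-4783-a998-39a12d7784ef/media",
    "mediarecord": "http://api.idigbio.org/v0/mediarecords/3409722c-9c23-4a62-808b-7ae684ad2046",
    "recordset": "http://api.idigbio.org/v0/recordsets/b4372b49-c7cc-42db-b1a3-f1c001de0f18"},
 "idigbio:uuid": "eac2e4ec-5dbb-4c34-b56f-231ed28a5bca"}
"""


def summarize_record(doc):
    """Pick out a few fields and the linked resources (hypothetical helper)."""
    data = doc.get("idigbio:data", {})
    links = doc.get("idigbio:links", {})
    return {
        "uuid": doc.get("idigbio:uuid"),
        "scientific_name": data.get("dwc:scientificName"),
        "recorded_by": data.get("dwc:recordedBy"),
        "recordset_url": links.get("recordset"),
        "mediarecord_url": links.get("mediarecord"),
    }


def fetch_record(uuid, api_root=API_ROOT):
    """GET one record by UUID (hypothetical helper; performs a network call)."""
    with urlopen(f"{api_root}/records/{uuid}") as resp:
        return json.load(resp)


summary = summarize_record(json.loads(SAMPLE))
print(summary["scientific_name"], "->", summary["recordset_url"])
```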

4 iDigBio Data Portal v0 Presentation
    http://portal.idigbio.org/record-view.shtml#eac2e4ec-5dbb-4c34-b56f-231ed28a5bca

5 Virtual Private Server (VPS)
    Total: 7 VMs, 17 cores, 39 GB RAM, 1.7 TB storage
    Symbiota: 2 VMs
        1 production: 2 cores, 8 GB RAM, 200 GB disk, 1 public IP; Apache, PHP, Java, MySQL, SVN, Tomcat; 1 user
        1 for FP testing/development: 2 cores, 8 GB RAM, 200 GB disk, 1 public IP; Apache, PHP, Java, MySQL, SVN, Tomcat; 3 users
    FilteredPush: 2 VMs
        1 core, 1024 MB RAM, 40 GB storage; fp-lite
        SCAN testbed: 2 cores, 4 GB RAM, 80 GB storage; MySQL, Apache, PHP, Tomcat; for Symbiota, Morphbank, and FilteredPush
    Vertnet: 1 VM
        2 cores, 2 GB RAM, 500 GB storage, 1 public IP; CentOS 6, Tomcat, IPT; 5 users
    Biogeomancer: 1 VM
        4 cores, 8 GB RAM, 500 GB storage, 1 public IP; Apache, Tomcat, PostgreSQL and PostGIS; 3 users
    aOCR hackathon: 1 VM
        4 cores, 8 GB RAM, 250 GB storage; Linux (Ubuntu 12.04), Java, PHP, Python, Perl, MySQL, Apache HTTP server, FTP server, ImageMagick, Tesseract, OCRopus, GOCR/JOCR, ZBar

6 Databases/DwC-A Examined

    Dataset              Date          Format            Occurrences      Media      Taxon
    TCN-Bryophytes       Jun/01/2012   Symbiota-MySQL        961,881     56,217     49,882
    TCN-Lichens          Jun/01/2012   Symbiota-MySQL        691,967     59,438     10,647
    TCN-Mycology         Jun/01/2012   Symbiota-MySQL        279,529      1,179    415,812
    TCN-InvertNet        Mar/14/2012   DwC-A                 631,388          0          0
    TCN-TTD-AMNH         Jun/21/2012   AMNH-MySQL            785,134      4,195     61,655
    TCN-TTD-NYBG         Apr/26/2012   CSV                 1,469,089        905          0
    TCN-PALEONICHES      Jul/12/2012   Specify-MySQL          96,079          0      6,128
    FLMNH-Ichthyology    Dec/19/2011   DwC-A                 213,361          0          0
    FLMNH-Ichthyology    Apr/27/2012   DwC-A                 214,487          0          0
    Valdosta             Apr/16/2012   Specify-MySQL          14,827     12,291     96,817
    Morphbank            Nov/22/2011   DwC-A                 193,704    250,442          0
    Morphbank            Jun/29/2012   DwC-A                 194,015    252,303          0
    Total                                                  5,338,396    386,528    640,941
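The figures on this slide can be cross-checked with a few lines of Python. The per-column values below are read from the run-together digits in the transcript, so the split is a reconstruction. It is the split under which the stated totals come out exactly, provided that only the most recent snapshot of FLMNH-Ichthyology and of Morphbank is counted. The variable and function names are illustrative.

```python
from datetime import datetime

# (dataset, snapshot date, occurrences, media, taxon) as read from the slide.
ROWS = [
    ("TCN-Bryophytes", "Jun/01/2012", 961881, 56217, 49882),
    ("TCN-Lichens", "Jun/01/2012", 691967, 59438, 10647),
    ("TCN-Mycology", "Jun/01/2012", 279529, 1179, 415812),
    ("TCN-InvertNet", "Mar/14/2012", 631388, 0, 0),
    ("TCN-TTD-AMNH", "Jun/21/2012", 785134, 4195, 61655),
    ("TCN-TTD-NYBG", "Apr/26/2012", 1469089, 905, 0),
    ("TCN-PALEONICHES", "Jul/12/2012", 96079, 0, 6128),
    ("FLMNH-Ichthyology", "Dec/19/2011", 213361, 0, 0),
    ("FLMNH-Ichthyology", "Apr/27/2012", 214487, 0, 0),
    ("Valdosta", "Apr/16/2012", 14827, 12291, 96817),
    ("Morphbank", "Nov/22/2011", 193704, 250442, 0),
    ("Morphbank", "Jun/29/2012", 194015, 252303, 0),
]

STATED_TOTALS = (5338396, 386528, 640941)  # "Total" row on the slide


def latest_snapshots(rows):
    """Keep only the most recent snapshot of each dataset."""
    latest = {}
    for name, date_text, *counts in rows:
        date = datetime.strptime(date_text, "%b/%d/%Y")
        if name not in latest or date > latest[name][0]:
            latest[name] = (date, tuple(counts))
    return [counts for _date, counts in latest.values()]


# Sum each column (occurrences, media, taxon) over the latest snapshots.
totals = tuple(sum(column) for column in zip(*latest_snapshots(ROWS)))
print(totals == STATED_TOTALS)
```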

7 Relational Databases
    [Diagram: how common concepts map to tables in four database schemas. Labels are as they appear on the slide; their grouping by system is inferred from label order.]
    Symbiota: Specimen = omoccurrences; Taxon = taxa; Media = Images; Collection = omcollections; Person = omcollectors; Dataset = Exsiccati (title, number, links)
    TTD-AMNH: Specimen = omoccurrences; Taxon = mnl U flora_mnl; Media = images; Collection = institution; Person = collector; CollectingEvent = colevent; Geography = locality
    Specify: Specimen = collectionobject; Taxon = taxon; Media = attachments; Collection = collection; Person = agent; CollectingEvent = colevent; Geography = Locality U geography; Geology = paleocontext
    EMu: Specimen = ecatalogue; Taxon = etaxonomy; Media = emultimedia; Person = eparties; CollectingEvent = ecollectionevents; Geography = esites; Geology = esites
    Other labels on the diagram: detHis, parent, hybrid, hostFor, subfamily, tribe, genus, species

8 Image ingestion appliance
    First instance of an application built upon the iDigBio APIs
    Enables easy, reliable bulk ingestion of media records
    User selects an image directory; the appliance takes care of
        Traversing sub-directories
        Uploading individual images through the iDigBio API
        Transparently recovering from various failure conditions
        Providing the user with the mapping of file names to iDigBio URLs
    Ingested images can be accessed by the URLs
    Appliance runs a lightweight Web server to expose a Web-based UI to users
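The sub-directory traversal step might look roughly like the following Python sketch. It is not the appliance's actual code: the function name find_images and the set of file extensions are assumptions made for illustration.

```python
import os

# Assumed set of image extensions; the slide does not say which formats are accepted.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def find_images(root):
    """Walk root and all sub-directories, returning image paths in sorted order."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            extension = os.path.splitext(name)[1].lower()
            if extension in IMAGE_EXTENSIONS:
                found.append(os.path.join(dirpath, name))
    return sorted(found)
```

A caller would pass the directory the user selected, for example find_images("/data/herbarium_scans") with a path of their own, and hand each returned path to the upload step.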

9 Image Ingestion APIs Used By Appliance
Create RecordSet
    URL: RecordSet collection-level endpoint
        e.g. POST http://idb-websrv1-dev.acis.ufl.edu:9197/v1/recordsets
    Request content: JSON
        ["idigbio:data"]["ac:variant"]: "IngestionTool"
        ["idigbio:providerId"]: Currently, a client-generated random UUID
    Response content: JSON
        ["idigbio:uuid"]: RecordSet iDigBio UUID
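As a sketch, the request body described here could be assembled in Python as shown below. Only the JSON keys come from the slide. The function names are hypothetical, and the HTTP POST itself is omitted because the development endpoint and its authentication scheme are not specified here.

```python
import json
import uuid


def build_recordset_body(provider_id=None):
    """Build the JSON body for the Create RecordSet call (hypothetical helper)."""
    return {
        "idigbio:data": {"ac:variant": "IngestionTool"},
        # The slide says the client currently generates a random UUID.
        "idigbio:providerId": provider_id or str(uuid.uuid4()),
    }


def recordset_uuid_from_response(response_text):
    """Extract the RecordSet iDigBio UUID from the JSON response."""
    return json.loads(response_text)["idigbio:uuid"]


body = build_recordset_body()
print(json.dumps(body, indent=2))
```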

10 Image Ingestion APIs Used By Appliance
Create MediaRecord
    URL: MediaRecord collection-level endpoint
        e.g. POST http://idb-websrv1-dev.acis.ufl.edu:9197/v1/mediarecords
    Request content: JSON
        ["idigbio:data"]["ac:variant"]: "IngestionTool"
        ["idigbio:data"]["dc:rights"]: One of {"cc0", "cc-by", "cc-by-sa", "cc-by-nc", "cc-by-nc-sa"}
        ["idigbio:data"]["idigbio:localpath"]: Full local path
        ["idigbio:data"]["idigbio:relationships"]["recordset"]: The iDigBio UUID of the RecordSet the media record belongs to, from the previous step
        ["idigbio:providerId"]: User-defined GUID prefix + (full local path or file name)
        ["idigbio:data"]["idigbio:relationships"]["owner"]: (Optional) Organizational owner of the record; if omitted, the signed-in user is saved as the owner
    Response content: JSON
        ["idigbio:uuid"]: iDigBio MediaRecord UUID
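A possible way to assemble the MediaRecord body in Python is sketched below. The keys and the list of licenses come from the slide. The function name, the argument names, and the choice to reject unknown licenses with a ValueError are assumptions for illustration.

```python
ALLOWED_RIGHTS = {"cc0", "cc-by", "cc-by-sa", "cc-by-nc", "cc-by-nc-sa"}


def build_mediarecord_body(local_path, recordset_uuid, rights, guid_prefix, owner=None):
    """Build the JSON body for the Create MediaRecord call (hypothetical helper)."""
    if rights not in ALLOWED_RIGHTS:
        raise ValueError(f"unsupported license: {rights!r}")
    relationships = {"recordset": recordset_uuid}
    if owner is not None:
        # Optional: without it, the signed-in user is saved as the owner.
        relationships["owner"] = owner
    return {
        "idigbio:data": {
            "ac:variant": "IngestionTool",
            "dc:rights": rights,
            "idigbio:localpath": local_path,
            "idigbio:relationships": relationships,
        },
        # User-defined GUID prefix plus the full local path (a file name also works).
        "idigbio:providerId": guid_prefix + local_path,
    }
```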

11 Image Ingestion APIs Used By Appliance
Upload Media Object
    URL: API sub-collection-level endpoint
        e.g. POST http://idb-websrv1-dev.acis.ufl.edu:9197/v1/mediarecords/8c7ae0c3-a3b8-4ddd-b433-ab99141ed405/media
        (the UUID in the middle is the "iDigBio MediaRecord UUID" returned in the previous step)
    Request content: Binary
        multipart/form-data with the image as the "file"
    Response content: JSON
        ["idigbio:links"]["media"]: The URL where the image is accessible online
        ["idigbio:uuid"]: The Media Access Point UUID
        ["idigbio:data"]["idigbio:imageEtag"]: The hash (MD5) of the image stored at the server, which is compared with the MD5 of the local image to verify the success of the upload
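The MD5 check described for "idigbio:imageEtag" can be sketched in Python as follows. This assumes the server reports the hash as a hexadecimal string, which the slide does not state. The function names are illustrative.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Compute the hex MD5 of a local file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def upload_verified(path, response):
    """Compare the server-side image hash with the local file's MD5.

    response is the parsed JSON returned by the upload call.
    Assumption: the etag is a hex-encoded MD5 digest.
    """
    server_hash = response["idigbio:data"]["idigbio:imageEtag"]
    return server_hash.lower() == file_md5(path)
```

A mismatch would indicate a truncated or corrupted transfer, in which case the file is a candidate for a retry.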

12 Image Ingestion v1: Call for beta-testers
Features already implemented
    Reliable uploads
        Automatic retry of failed uploads of individual files
        Keep a local record of unsuccessful transfers; resume after failure of network or service
        Skip already uploaded files when the same directory is uploaded multiple times
    Allow user to specify a license for the media object being uploaded
    Save and export (local path : URL) mappings for individual files in the batch upload
Features to add
    Integrated user authentication
        Currently, application keys can be provided to beta testers
    Improved user interface, error reporting, and best practices encoded in the UI
Feedback from early adopters
    Help guide UI improvements and prioritize features to be incorporated
    Help fine-tune performance (e.g. parallel uploads) and failure handling
Contact: Andréa Matsunaga (ammatsun@ufl.edu)
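The reliability features listed above (retry, a local record that allows resuming, and skipping files already uploaded) could be combined along the lines of this Python sketch. It is an illustration under stated assumptions, not the appliance's implementation: the upload itself is passed in as a callable, failures are assumed to surface as OSError, and the JSON state file format is invented for the example.

```python
import json
import os
import time


def upload_batch(paths, upload_one, state_file, max_attempts=3, delay=0.0):
    """Upload each path once, retrying failures and remembering successes.

    upload_one(path) must return the URL of the uploaded image and raise
    OSError on a network or service failure (an assumption of this sketch).
    Returns (mapping of local path to URL, list of paths that still failed).
    """
    done = {}
    if os.path.exists(state_file):
        with open(state_file) as handle:
            done = json.load(handle)  # results of earlier runs

    failed = []
    for path in paths:
        if path in done:
            continue  # skip files uploaded in a previous run
        for attempt in range(1, max_attempts + 1):
            try:
                done[path] = upload_one(path)
                break
            except OSError:
                if attempt == max_attempts:
                    failed.append(path)
                else:
                    time.sleep(delay)
        # Persist after every file so an interrupted run can resume.
        with open(state_file, "w") as handle:
            json.dump(done, handle)
    return done, failed
```

The returned mapping doubles as the (local path : URL) export, and calling the function again with the same state file retries only the files that are still missing.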

13 Current Members
    Andréa Matsunaga (Co-Lead), iDigBio IT
    Joanna McCaffrey (Co-Lead), iDigBio Program Manager
    Renato Figueiredo, iDigBio IT
    Alex Thompson, iDigBio IT
    Jiangyan Xu, iDigBio IT
    Guillaume Jimenez, iDigBio IT
    Casey McLaughlin, iDigBio IT
    Michael Bevans, NYBG Photographer
    James Beach, Paleoniches TCN IT/Co-PI, University of Kansas, Specify
    Andrew Brown, KE Software
    Edward Gilbert, Lichens & Bryophytes and SCAN TCNs IT, Symbiota
    Corinna Gries, Lichens & Bryophytes TCN, PI
    Tony Kirchgessner, Tri-trophic TCN IT, NYBG
    Nahil Sobh, InvertNet TCN IT, Co-PI
    Omar Sobh, InvertNet TCN IT
    Ex Officio: José Fortes, iDigBio Director for Computational Activities
Thank you!

