Published by Eunice Stone. Modified over 9 years ago.
The e!DAL JAVA API: Sharing & Citing Research Data in Life Sciences

Daniel Arend, Jinbo Chen, Christian Colmsee, Steffen Flemming, Denny Hecht, Uwe Scholz and Matthias Lange
Leibniz Institute of Plant Genetics and Crop Plant Research – IPK, Corrensstraße 3, 06466 Stadt Seeland, OT Gatersleben, Germany

Motivation – Current Challenges in the Life Science Data Publication Process

Data Domains:
- life sciences produce a huge volume of data
- different data domains (-omics data)
→ basis for research & publication process

Some examples:
- PubMed: ca. 23 million articles
- Sequence Read Archive (SRA): over 1000 TB data / 4.5 trillion sequences
- BioModels database: around 140,000 models
- figshare: 600,000 figures, 200,000 datasets, and 6,000 papers
- DRYAD: 5,300 documents

The Data Publication Pyramid, inspired by Gray [1], expresses the different manifestation forms of research data (top to bottom):
- Publications with Data
- Processed Data & Data Representations
- Data Collections & Structured Databases
- Raw Data & Data Sets

►Most of the datasets remain unpublished and are consequently not accessible for later re-analysis.
►Data collections reside either in distributed databases or file services, with individual access policies and accession numbers or URLs.
►Processed data, i.e. supplementary files to articles, link to the published “Record of Science”. But it is not always clear at what level the data is preserved and whether the criteria for discoverability and re-usability are met.
►At the top of the pyramid, the journal, the author or the scientist is fully responsible for the publication and preservation of the embedded data.

Electronic Data Archive Library (e!DAL)

e!DAL – Metadata API [2]:
- stands for electronic Data Archive Library
- file-system-like storage system for any data file
- use embedded as a local API or in a remote client-server mode as a repository

Open source data citation infrastructure:
- long-term stable ISO standard citation by DOI or URL
- metadata annotation (supports the Dublin Core standard)
- manages different versions
- search across files and metadata using Apache Lucene
- supports mounting as a remote file system (WebDAV)
- flexible, homogeneous JAAS authentication and authorisation (e.g. Windows/Unix/Kerberos)
- 100% object-oriented JAVA-API with embedded infrastructure (HTTP(S) server, database, DataCite)
- supports the Apache Maven build system; artifacts are available in the official central Maven repository

http://edal.ipk-gatersleben.de

e!DAL components:
- VERSION MANAGEMENT: H2 database, Hibernate, File system
- METADATA MANAGEMENT: DublinCore
- INFORMATION RETRIEVAL: Apache Lucene, Hibernate Search, Apache Solr
- PERSISTENT IDENTIFIERS: URLs, DOIs
- DATA SECURITY: AspectJ, Java Authentication & Authorisation API
- INTEROPERABILITY: local/remote Java API, GUI components, WebDAV, HTTP(S)-Server

Data Publication & Approval Process

- intuitive graphical publication interface to assign a DOI for research data
- DOI agency: DataCite Consortium (registered prefix required)
- two-level approval process to ensure authorised publications
- eMail notification system
- submit publication request by object owner
- approval by authorised persons (scientific and organizational authorities)
- generate virtual landing pages for published IDs using the embedded HTTP(S) server
- finally assigned DOI

Flowchart:
- request DOI?
  - no: set resource public, create public resolvable e!DAL URL
  - yes: request publication confirmation from scientific & organizational authorities
- publication approved?
  - yes: generate new DOI, migrate e!DAL metadata to DataCite, set resource public, send DOI to user
  - no: send reject message to user

e!DAL in action

To show the manifold features of e!DAL, a demo/install wizard (Java Web Start) shows how to start up a primary data repository from scratch.

Demo - Mode: As a use case, the demo imports a snapshot of NGS data. After this you can use and test the different functions of the API, such as the panel of the FileChooser GUI, the e!DAL WebDAV listener and many more features.

Install - Mode: This mode allows you to create a start script (a *.bat or *.sh script, depending on your operating system) to run a remote e!DAL server with the given parameters.

http://edal.ipk-gatersleben.de

Outlook

In the current release, Java-RMI is supported. The extension towards a RESTful API is planned for the next release. The support of this platform-independent protocol will enable direct access to the e!DAL-API for a wide spectrum of programming languages and infrastructures. Another future e!DAL extension will be the use of scalable, distributed architectures, such as Java clustering solutions, and distributed file systems like Apache HDFS. For this, the next release will virtualize the backend file-system access. One promising framework is the Apache Commons Virtual File System. Using the high number of available drivers, e!DAL will support cloud storage, distributed file systems, and local file systems as the storage backend.

References

[1] Gray, Jim.: Jim Gray on eScience: A Transformed Scientific Method
[2] Arend et al. (2014) e!DAL - a framework to store, share and publish research data. BMC Bioinformatics 2014, 15:214. doi:10.1186/1471-2105-15-214

Acknowledgements

We thank Joscha Joel Benz for the initial WebDAV code and Thomas Münch and Heiko Miehe as administrators of the project website, code, and artifact repositories. This work was supported by the German-Plant-Phenotyping Network, which is funded by the German Federal Ministry of Education and Research (project identification number: 031A053).

contact: arendd@ipk-gatersleben.de
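The decision logic of the data publication and approval process on this poster can be illustrated with a short Java sketch. It is an illustration only, not the e!DAL API: the class PublicationFlowSketch, its Result type, the publish method and its parameters are hypothetical names, the "url:" and "doi:" identifier strings are made up for the example, and 10.5072 serves purely as a placeholder DOI prefix. The branching assumes that a resource without a DOI request is published directly under a resolvable URL, and that a DOI is only generated after approval.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the publication flow; none of these names belong to the e!DAL API. */
final class PublicationFlowSketch {

    /** Outcome of one publication request: final state, identifier, and the steps taken. */
    static final class Result {
        final boolean isPublic;
        final String identifier; // null when the request was rejected
        final List<String> steps;

        Result(boolean isPublic, String identifier, List<String> steps) {
            this.isPublic = isPublic;
            this.identifier = identifier;
            this.steps = steps;
        }
    }

    /**
     * Walks the two decisions of the flowchart: "request DOI?" and "publication approved?".
     * The approved flag stands in for the answer of the scientific and organizational authorities.
     */
    static Result publish(String resourceId, boolean doiRequested, boolean approved) {
        List<String> steps = new ArrayList<>();

        if (!doiRequested) {
            // No DOI wanted: publish directly under a resolvable URL.
            steps.add("set resource public");
            steps.add("create public resolvable URL");
            return new Result(true, "url:" + resourceId, steps);
        }

        steps.add("request publication confirmation from authorities");
        if (!approved) {
            steps.add("send reject message to user");
            return new Result(false, null, steps);
        }

        // Approved: assign the DOI (placeholder prefix), register metadata, publish, notify.
        String doi = "doi:10.5072/" + resourceId;
        steps.add("generate new DOI");
        steps.add("migrate metadata to DataCite");
        steps.add("set resource public");
        steps.add("send DOI to user");
        return new Result(true, doi, steps);
    }

    public static void main(String[] args) {
        Result r = publish("dataset-1", true, true);
        System.out.println(r.identifier + " " + r.steps);
    }
}
```

Keeping the approval answer as a plain boolean parameter keeps the sketch self-contained; in a real repository that answer would arrive asynchronously, for example through the eMail notification system listed on the poster.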