Prof. Dr. Mohan Raj Pradhan Ms. Parbati Pandey


1 Comparative Study of Data Repository Software with Reference to Harvesting Data in the Context of Library and Information Science
Prof. Dr. Mohan Raj Pradhan, Ms. Parbati Pandey

2 Introduction
Data is becoming more important to business decisions. This requires tools that can collect, store and help analyze data. A data repository is a tool that is common in scientific research but is also useful for managing business data. It is also known as a data library or data archive. A data repository is a large database infrastructure that can collect, manage, and store data sets for data analysis, sharing and reporting. A data set is a collection of data. Most commonly, a data set corresponds to the contents of a single database table or a single statistical data matrix, where every column of the table represents a particular variable and each row corresponds to a given member of the data set in question.
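The table view of a data set described above can be sketched in a few lines of Python. This is only an illustration: the survey rows and the variable names (`respondent_id`, `age`, `library_visits_per_month`) are made up for the example and do not come from the presentation.

```python
# Hypothetical data set: each dict is one row (a member of the data set),
# and each key is one column (a variable).
dataset = [
    {"respondent_id": 1, "age": 34, "library_visits_per_month": 4},
    {"respondent_id": 2, "age": 27, "library_visits_per_month": 9},
    {"respondent_id": 3, "age": 45, "library_visits_per_month": 2},
]


def column(rows, variable):
    """Return every value of one variable, i.e. one column of the table."""
    return [row[variable] for row in rows]


visits = column(dataset, "library_visits_per_month")
mean_visits = sum(visits) / len(visits)
print(visits, mean_visits)
```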

3 Data Repository: The What
IT infrastructure (cloud-based/online) set up to manage, share, access, maintain, and archive datasets.
An application database specialized in storing metadata of data files, datasets and databases.
Differs from a publication repository mainly in its ability to store metadata at different levels of a hierarchy, and to store and ingest data files in various formats for long-term preservation.
DSpace is a publication repository/institutional repository, i.e. text-based.

4 Data Repository: The Why
Easy information discovery
Easy and efficient access
More contacts and intensified impact
Persistent access (through a persistent URL), i.e. data become citable through the assignment of a DOI (Digital Object Identifier)
Long-term storage and preservation
Allows unprecedented use, analysis and discovery through interoperability and interlinking with other repositories
Most major US federal grant agencies require data access as a mandatory part of the grant proposal/oversight process (NIH, NSF, NEH, USDA)
Negative data are data that do not enable us to reject our null hypothesis. Such data are often difficult to publish because it is not possible to prove the null hypothesis. Every active research scientist has a large drawer where these data languish.
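As a small illustration of persistent access through a DOI, the sketch below turns a DOI into a resolvable URL by prefixing the public resolver at https://doi.org/. The function name `doi_to_url` and the DOI `10.1234/example.dataset.1` are hypothetical and used only for the example.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"


def doi_to_url(doi: str) -> str:
    """Build a persistent URL from a DOI (helper name is an assumption)."""
    doi = doi.strip()
    # Accept common written forms such as "doi:10.x/..." or a full resolver URL.
    for prefix in ("doi:", "https://doi.org/", "http://doi.org/"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    # Keep the "/" between prefix and suffix; percent-encode unsafe characters.
    return DOI_RESOLVER + quote(doi, safe="/")


# Hypothetical DOI, not a real dataset.
print(doi_to_url("doi:10.1234/example.dataset.1"))
```

A citation can then carry this URL, which keeps working even if the repository later moves the landing page, provided the DOI record is updated.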

5 Data Repository: The Why
Collecting all data in one place
Statistics on downloads and citations

6 What Makes Data Management Repositories Useful?
Makes faculty, departmental and institutional research available
Allows publication of negative data

7 Research Data Repository Software Characteristics
Hosted locally or remotely on a server
Software contains collaborative options
Open-source or proprietary software
Wide variety of data types (Excel to SPSS to various discipline-specific formats)

8 Perceived Benefits of Data Repository
Can share publications and research data
Makes research data more widely available
Statistics available on downloads and citations of data
Saves various versions of a dataset (data lifecycle)
Collects all data in one place

9 Research Data Management Tools
A survey was done to identify currently implemented standards, requirements and features related to research data repositories. Based on this, five well-known platforms were chosen for this study, namely DSpace, CKAN, Zenodo, Figshare and Dataverse. These tools are evaluated according to a set of key aspects: architecture, metadata handling capabilities, interoperability, content dissemination, search features and community acceptance.

10 Architecture
Class, Feature: DSpace, CKAN, Figshare, Zenodo, Dataverse
Deployment: Installation package; Service
Storage location: Local or remote; Remote
Maintenance costs: Infrastructure management; Monthly fee based to free of cost
Open source: ×
Platform customization
Community policies
Embargo period
Private storage
Content versioning
Pre-reserving DOI

11 Architecture…
Class: Metadata. Platforms compared: DSpace, CKAN, Figshare, Zenodo, Dataverse
Required fields: DSpace: Title, Date of issue; CKAN: Title; Figshare: Author, Title, Categories, Description; Zenodo: Type, DOI, Author, Title, Description; Dataverse: Title, Author, Description, Contact, Subject, DOI
Exporting schemas: DSpace: any pre-loaded schema; CKAN: ×; Figshare: DC; Zenodo: DC, MARCXML; Dataverse: XML
Schema flexibility: Flexible; Fixed
Validation: √
Versioning

12 Architecture…
Class: Dissemination. Platforms compared: DSpace, CKAN, Figshare, Zenodo, Dataverse
API
OAI-PMH compliance: CKAN: with the ckanext-harvest installer
Faceted search

13 Architecture
Most of the above-mentioned software is open source and gives users some flexibility. Speedy and simple deployment of the software is crucial for implementation. Open-source software can be installed in house, whereas platforms like Figshare and Zenodo are deployed as services and depend on the developer for installation and implementation. DSpace, Dataverse and CKAN give better control over the recorded data, as they are open source.

14 Architecture…
The hosted services, viz. Figshare and Zenodo, are not viable platforms for researchers and institutions, as they have to rely on the developers. DSpace, CKAN, Dataverse and Zenodo permit customization, with improvements ranging from small interface modifications to the development of new data visualization plugins to satisfy the needs of their users; Zenodo additionally allows parametrization of settings, such as community-level policies, that can be further customized. DSpace, Zenodo and Dataverse permit users to stipulate an embargo period, whereas CKAN and Figshare have options for private storage to let researchers control the data publication mode.

15 Metadata
Zenodo and Figshare can export records that comply with established metadata schemas (Dublin Core for both, plus MARCXML for Zenodo). DSpace goes further by exporting DIPs (Dissemination Information Packages) that include METS metadata records, thus enabling the ingestion of these packages into a long-term preservation workflow.

16 Metadata…
Although CKAN and Dataverse metadata records do not follow any standard schema, these platforms allow the inclusion of a dictionary of key-value pairs that can be used to record domain-specific metadata as a complement to generic metadata descriptions. Neither platform natively supports collaborative validation stages where curators and researchers enforce the correct data and metadata structure, but Zenodo allows users to create a highly curated area within communities, as highlighted in the "validation" feature. If the policy of a particular community specifies manual validation, every deposit has to be validated by the community curator. Tracking content changes is an important issue in data management. CKAN provides an audit trail for each deposited dataset by showing all changes made to it since its deposit.
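The key-value approach can be sketched as follows for CKAN. The sketch assumes CKAN's dataset dictionary, in which free-form metadata travels in an `extras` list of `{"key": ..., "value": ...}` entries; check this against the API documentation of the deployed CKAN version. The dataset name, the metadata keys and the helper `to_ckan_extras` are hypothetical, and nothing is sent to a server.

```python
import json


def to_ckan_extras(domain_metadata):
    """Convert a plain dict into a list of key/value entries (hypothetical helper)."""
    return [
        {"key": key, "value": str(value)}
        for key, value in sorted(domain_metadata.items())
    ]


# Generic description plus domain-specific key-value pairs (all values made up).
dataset = {
    "name": "soil-samples-2019",
    "title": "Soil samples 2019",
    "extras": to_ckan_extras({"sampling_depth_cm": 30, "instrument": "hand auger"}),
}

# JSON body that a deposit client could send; no request is made here.
payload = json.dumps(dataset, indent=2)
print(payload)
```

Because the keys are unrestricted, two depositors can describe the same property under different names, which is the trade-off against a fixed, standards-based schema.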

17 Interoperability and Dissemination
All of the evaluated platforms allow the development of external clients and tools as they already provide their own APIs for exposing metadata records to the outside community, but there are some differences regarding standards compliance. Zenodo and DSpace natively comply with the OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting) protocol. This is a widely-used protocol that promotes interoperability between repositories while also streamlining data dissemination, and is a valuable resource for harvesters to index the contents of the repository.
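A harvester's side of OAI-PMH can be sketched as below: build a `ListRecords` request and read Dublin Core titles and the resumption token from the response. The verb, the `oai_dc` metadata prefix and the XML namespaces come from the OAI-PMH 2.0 specification; the base URL `https://repository.example.org/oai` and the sample response are hypothetical, and the sketch parses a local string instead of calling a real repository.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    if resumption_token:
        # A resumption token is sent on its own, without metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_titles(xml_text):
    """Return (titles, resumption_token) from a ListRecords response."""
    root = ET.fromstring(xml_text)
    titles = [element.text for element in root.findall(".//dc:title", NS)]
    token = root.findtext(".//oai:resumptionToken", default=None, namespaces=NS)
    return titles, token or None


# Hypothetical, trimmed response used only to exercise the parser.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
 <ListRecords>
  <record>
   <header><identifier>oai:repository.example.org:1</identifier></header>
   <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
     <dc:title>Example dataset</dc:title>
    </oai_dc:dc>
   </metadata>
  </record>
  <resumptionToken>page-2</resumptionToken>
 </ListRecords>
</OAI-PMH>"""

print(list_records_url("https://repository.example.org/oai"))
print(parse_titles(SAMPLE))
```

A full harvester would repeat the request with each returned resumption token until the repository sends an empty one.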

18 Advantages of DSpace
Can comply with domain-level metadata schemas
Is open source and has a wide supporting community
Has extensive, community-maintained documentation
Can be fully under the institution's control
Structured metadata representation
Compliant with OAI-PMH
Supports Dublin Core and MARCXML for metadata exporting

19 Advantages of CKAN
Is open source and widely supported by the developer community
Features extensive and comprehensive documentation
Allows deep customization of its features

20 Advantages of CKAN…
Can be fully under the institution's control
Supports unrestricted (non-standards-compliant) metadata
Has faceted search with fuzzy matching
Records dataset change logs and versioning information
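Faceted search can be illustrated with a short, platform-independent sketch: count how many records fall under each value of a field, then narrow the result set by the selected values. The records and the helper names `facet_counts` and `filter_by` are hypothetical; this is not CKAN's implementation, and it uses exact matching only, so it does not cover fuzzy matching.

```python
from collections import Counter

# Hypothetical catalogue records.
records = [
    {"title": "Census 2011 tables", "format": "CSV", "subject": "demography"},
    {"title": "Rainfall 2018", "format": "CSV", "subject": "climate"},
    {"title": "Household survey", "format": "SPSS", "subject": "demography"},
]


def facet_counts(rows, field):
    """Count records per value of one field (the numbers shown beside a facet)."""
    return Counter(row[field] for row in rows)


def filter_by(rows, **selected):
    """Keep the records that match every selected facet value exactly."""
    return [
        row for row in rows
        if all(row.get(field) == value for field, value in selected.items())
    ]


print(facet_counts(records, "format"))
print([r["title"] for r in filter_by(records, subject="demography", format="CSV")])
```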

21 Advantages of Figshare
Gives credit to authors through citations and references
Can export references to Mendeley, DataCite, RefWorks, EndNote, NLM and Reference Manager
Records statistics related to citations and shares
Does not require any maintenance

22 Advantages of Zenodo
Allows creating communities to validate submissions
Supports Dublin Core, MARC and MARCXML for metadata exporting
Can export references to BibTeX, DataCite, DC, EndNote, NLM and RefWorks
Complies with OAI-PMH for data dissemination
Does not require any maintenance
Includes metadata records in the searchable fields

23 Advantages of Dataverse
Is open source and widely supported by the developer community
Data citations are generated automatically
Multiple publishing workflows
Faceted search as well as tags can be used for searches

24 Advantages of Dataverse…
Defines roles out of the box; custom roles can also be designed and assigned to users
Branding, metadata-based facets, sub-dataverses and featured dataverses
Re-formatting, summary statistics and analysis for tabular files; integration with TwoRavens

25 Advantages of Dataverse…
Mapping of geospatial files and integration with WorldMap
Restricted files, with the ability to request access to them
Three levels of metadata: description/citation, domain-specific or custom fields, and file metadata

26 Advantages of Dataverse…
Search API, data deposit API, etc.
Notifications are generated for the user, and also sent by email, for access requests, roles, and when data is published
CC0 waiver by default; terms of use can be customised by the user; download statistics
Can export references to EndNote XML, RIS or BibTeX format
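Reference export of this kind can be sketched as a small formatter that turns citation metadata into a BibTeX entry. This is an illustration, not Dataverse's exporter: the helper `to_bibtex`, the generic `@misc` entry type and every field value below (including the DOI) are assumptions made for the example.

```python
def to_bibtex(key, author, title, year, publisher, doi):
    """Format citation metadata as a BibTeX @misc entry (hypothetical helper)."""
    fields = {
        "author": author,
        "title": title,
        "year": year,
        "publisher": publisher,
        "doi": doi,
    }
    lines = [f"@misc{{{key},"]
    lines += [f"  {name} = {{{value}}}," for name, value in fields.items()]
    lines.append("}")
    return "\n".join(lines)


# All values are made up for the illustration.
entry = to_bibtex(
    key="pandey2019example",
    author="Pandey, Parbati",
    title="Example survey dataset",
    year=2019,
    publisher="Example Dataverse",
    doi="10.1234/example.dataset.1",
)
print(entry)
```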

27 Conclusion
The open-source licenses of Dataverse, CKAN and DSpace were highlighted, as they allow the software to be updated and customized while keeping the core functionalities intact. Live demos of Dataverse, CKAN, DSpace and Zenodo are available. CKAN, with its features, is mainly used by governmental institutions to disclose their data. DSpace enables system administrators to parametrize additional metadata schemas that can be used to describe resources.

28 Conclusion…
DSpace is often compared with Dataverse and is used for storing scientific data. Zenodo and Figshare provide ways to reserve a permanent link and a DOI, even if the actual dataset is under embargo at the time of first citation. DSpace, Dataverse and CKAN can be installed on an institutional server instead of relying on external storage provided by contracted services.

29 Conclusion…
The Dataverse repository software focuses mainly on social science data; its tools for analysis and exploration work only on tabular data. Geospatial data is handled by Dataverse with the help of WorldMap. Dataverse also has features such as the Guestbook template, which records the details of users who download the data.

30 Re-Mix Harvesting XML format
Notification through whenever update is made. LiveDVD-Koha, DSpace, VuFind, SubjectsPlus, and WordPress. Plugin of VuFind.


