Institutional Repositories and the Ontario Scholars Portal

Presentation on theme: "Institutional Repositories and the Ontario Scholars Portal"— Presentation transcript:

1 Institutional Repositories and the Ontario Scholars Portal
OZone Institutional Repositories and the Ontario Scholars Portal Gabriela Mircea

2 Agenda
▬ Project overview
▬ DSpace
▬ Opportunities
▬ CORIL community
This talk gives an overview of the OZone project and the platform used for the IR, looks at the opportunities that IR software offers consortia for resource sharing, and then turns to the CORIL community.

3 Ontario Scholars Portal
▬ Ontario Council of University Libraries
▬ Single point of access
▬ Explore the use of institutional repositories
The Ontario Scholars Portal is a project of the Ontario Council of University Libraries. Scholars Portal is an initiative to give users a single point of access to high-quality research from a broad range of disciplines.

4 OZone
▬ Shared IR environment to develop electronic archives
▬ Share information literacy materials amongst libraries
▬ Host institutional-level IRs
▬ Consortium aggregate for searching
One of the “Ontario Scholars Portal” programs explores the use of institutional repositories as a tool for consortial resource sharing. OZone is a shared IR environment being used to develop archives of electronic resources, to exchange electronic instructional packages among libraries, and to host institutional IRs that can be aggregated at a consortium level for searching.

5 DSpace Captures Describes Distributes Preserves
The platform we used to build the OZone institutional repository is DSpace, which was jointly developed by MIT Libraries and Hewlett-Packard (HP) and is now freely available as open source software. DSpace manages and distributes digital items, made up of digital files (or bitstreams), and allows for the creation, indexing, and searching of associated metadata to locate and retrieve the items. It is designed to support the long-term preservation of the digital material stored in the repository.
▬ Captures: digital research material in any format, directly from creators, using simple web forms
▬ Describes: descriptive, technical, and rights metadata; assigns persistent identifiers
▬ Distributes: searches metadata and full text; Google crawling; delivers via the Web, with necessary access control
▬ Preserves: large-scale, stable, managed long-term storage

6 DSpace Captures Describes Distributes Preserves
▬ Content is provided by faculty, researchers, …
▬ Via simple web forms
▬ Digital research material in any format

7 DSpace Captures Describes Distributes Preserves
▬ Metadata is provided by faculty, researchers, …
▬ Includes descriptive information, technical information, and rights management information

8 DSpace Captures Describes Distributes Preserves
▬ Via secure web server
▬ Access control
▬ Persistent identifiers

9 DSpace Captures Describes Distributes Preserves
▬ Storage: large-scale, stable, long-term, managed
▬ File migration as technology changes

10 DSpace technologies and standards
Hardware and OS: platform independent
▬ UNIX-like OS (Linux, HP/UX, etc.) / Windows
▬ Java 1.4 or later (the standard SDK is fine; you don't need J2EE)
▬ Apache Ant 1.5 or later (a make-like tool for Java)
▬ PostgreSQL 7.3 or later, an open source relational database
▬ Jakarta Tomcat 4.x/5.x
▬ Lucene for search indexing
▬ OAI support: DSpace supports the Open Archives Initiative’s Protocol for Metadata Harvesting (OAI-PMH) v2.0 as a data provider
▬ CNRI Handle System: DSpace uses the Handle System from CNRI to assign and resolve persistent identifiers for each digital item
▬ Dublin Core with qualifiers
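To make the OAI-PMH and Handle items above more concrete, here is a minimal Python sketch of what a harvester's request and a handle link could look like. The repository base URL and the handle "1234.5/67" are made up for illustration; `ListRecords` and `oai_dc` come from the OAI-PMH 2.0 protocol, and `hdl.handle.net` is the public Handle proxy. The helper names are hypothetical and are not part of DSpace.

```python
from urllib.parse import urlencode

# Hypothetical base URL; a real repository publishes its own OAI-PMH endpoint.
BASE_URL = "https://repository.example.org/oai/request"


def oai_request_url(base_url, verb, **arguments):
    """Build an OAI-PMH 2.0 request URL from a verb and its arguments."""
    query = {"verb": verb}
    query.update(arguments)
    return base_url + "?" + urlencode(query)


def handle_url(handle):
    """Return the resolver URL for a handle via the public Handle proxy."""
    return "https://hdl.handle.net/" + handle


# Ask the repository for all records as unqualified Dublin Core.
print(oai_request_url(BASE_URL, "ListRecords", metadataPrefix="oai_dc"))
# "1234.5/67" is a made-up handle used only for illustration.
print(handle_url("1234.5/67"))
```

A consortium-level aggregator would issue requests of this kind against each member repository and index the returned Dublin Core records.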

11 Architecture
The storage layer is responsible for physical storage of metadata and content. The business logic layer deals with managing the content of the archive, users of the archive (e-people), authorization, and workflow. The application layer contains components that communicate with the world outside of the individual DSpace installation, for example the Web user interface and the Open Archives Initiative protocol for metadata harvesting service.
Each layer only invokes the layer below it; the application layer may not use the storage layer directly, for example. Each component in the storage and business logic layers has a defined public API. The union of the APIs of those components is referred to as the Storage API (in the case of the storage layer) and the DSpace Public API (in the case of the business logic layer). These APIs are in-process Java classes, objects and methods.
It is important to note that each layer is trusted. Although the logic for authorising actions is in the business logic layer, the system relies on individual applications in the application layer to correctly and securely authenticate e-people. If a 'hostile' or insecure application were allowed to invoke the Public API directly, it could very easily perform actions as any e-person in the system. The reason for this design choice is that authentication methods will vary widely between different applications, so it makes sense to leave the logic and responsibility for that in these applications.
The source code is organized to adhere very strictly to this three-layer architecture. Also, only methods in a component's public API are given the public access level. This means that the Java compiler helps ensure that the source code conforms to the architecture.
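The layering and trust rules described above can be sketched in a few lines. This is an illustration in Python rather than DSpace's actual Java API; every class and method name below is hypothetical, and the plain-text password check only stands in for whatever authentication a real application would use.

```python
class StorageLayer:
    """Storage layer: physical storage of metadata and content."""

    def __init__(self):
        self._items = {}

    def save(self, item_id, metadata):
        self._items[item_id] = metadata

    def load(self, item_id):
        return self._items[item_id]


class BusinessLogic:
    """Business logic layer: authorisation lives here, authentication does not."""

    def __init__(self, storage, submitters):
        self._storage = storage
        self._submitters = set(submitters)

    def submit(self, eperson, item_id, metadata):
        # This layer trusts that `eperson` was authenticated by the caller.
        if eperson not in self._submitters:
            raise PermissionError(f"{eperson} may not submit items")
        self._storage.save(item_id, metadata)

    def read(self, item_id):
        return self._storage.load(item_id)


class WebApplication:
    """Application layer: authenticates e-people, then calls the layer below."""

    def __init__(self, logic, passwords):
        self._logic = logic
        self._passwords = passwords

    def handle_submit(self, email, password, item_id, metadata):
        # Illustration only: real applications would not compare plain passwords.
        if self._passwords.get(email) != password:
            raise PermissionError("authentication failed")
        # The application never touches the storage layer directly.
        self._logic.submit(email, item_id, metadata)


storage = StorageLayer()
logic = BusinessLogic(storage, submitters=["ana@example.org"])
app = WebApplication(logic, passwords={"ana@example.org": "secret"})
app.handle_submit("ana@example.org", "secret", "item-1", {"title": "Report"})
print(logic.read("item-1"))
```

Note that calling `logic.submit("ana@example.org", ...)` directly would succeed without any password, which is exactly why the business logic layer must only be exposed to trusted applications.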

12 Information Model
▬ Communities and sub-communities: contain collections, which can have individual workflows
▬ Collections (in communities): distinct groupings of like items; contain items
▬ Items (in collections): logical content objects; receive a persistent identifier; contain bundles
▬ Bundles (in items): contain files
▬ Bitstreams (in items): individual files, each with a name, user description, size, and checksum; receive preservation treatment
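The containment hierarchy above maps naturally onto nested data structures. The following Python sketch is only a model of the slide, not DSpace's own classes; the field names are assumptions, and SHA-256 is chosen arbitrarily because the slide only says "checksum".

```python
from dataclasses import dataclass, field
from hashlib import sha256
from typing import List


@dataclass
class Bitstream:
    name: str
    description: str
    size: int
    checksum: str

    @classmethod
    def from_bytes(cls, name, description, data):
        # SHA-256 is an assumption here; the slide only says "checksum".
        return cls(name, description, len(data), sha256(data).hexdigest())


@dataclass
class Bundle:
    bitstreams: List[Bitstream] = field(default_factory=list)


@dataclass
class Item:
    persistent_id: str
    bundles: List[Bundle] = field(default_factory=list)


@dataclass
class Collection:
    name: str
    items: List[Item] = field(default_factory=list)


@dataclass
class Community:
    name: str
    collections: List[Collection] = field(default_factory=list)
    sub_communities: List["Community"] = field(default_factory=list)

    def item_count(self):
        """Count items in this community and all of its sub-communities."""
        own = sum(len(c.items) for c in self.collections)
        return own + sum(s.item_count() for s in self.sub_communities)


pdf = Bitstream.from_bytes("guide.pdf", "Library guide", b"%PDF-")
item = Item("1234.5/67", [Bundle([pdf])])  # made-up persistent identifier
library = Community("Library", [Collection("Guides", [item])])
university = Community("University", sub_communities=[library])
print(university.item_count())
```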

13 An example At the community level

14 And this is what an item looks like

15 Self submission Item templates Batch importing
DSpace is also designed to make submission easy: DSpace communities (such as university departments, labs, and research centers) can adapt the system to meet their individual needs and manage the submission process themselves.
- Submission is permission based and takes 7 simple steps; with the custom submission forms you can have as many or as few steps as you need, and you can use any qualified Dublin Core element in any of the forms.
- Batch tools import and export items in a simple directory structure, where the Dublin Core metadata is stored in an XML file.
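As a rough illustration of the batch directory structure mentioned above, the sketch below writes one item directory containing a Dublin Core XML file, the item's files, and a list of file names. The file names `dublin_core.xml` and `contents` and the `dcvalue` element follow DSpace's simple archive format as best I recall it; treat them as assumptions and check the DSpace documentation before relying on them. The helper `write_item_dir` is hypothetical.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def write_item_dir(item_dir, fields, files):
    """Write one item directory: metadata XML, the files, and a contents list.

    fields: list of (element, qualifier, value) Dublin Core triples.
    files:  dict mapping file name to bytes.
    """
    item_dir = Path(item_dir)
    item_dir.mkdir(parents=True, exist_ok=True)

    root = ET.Element("dublin_core")
    for element, qualifier, value in fields:
        node = ET.SubElement(root, "dcvalue", element=element, qualifier=qualifier)
        node.text = value
    ET.ElementTree(root).write(
        str(item_dir / "dublin_core.xml"), encoding="utf-8", xml_declaration=True
    )

    for name, data in files.items():
        (item_dir / name).write_bytes(data)
    # One file name per line, so an importer knows which bitstreams to load.
    (item_dir / "contents").write_text("\n".join(files) + "\n", encoding="utf-8")
    return item_dir


with tempfile.TemporaryDirectory() as tmp:
    item = write_item_dir(
        Path(tmp) / "item_000",
        [("title", "none", "Search Strategies"), ("date", "issued", "2005")],
        {"tutorial.pdf": b"%PDF-"},
    )
    print(sorted(p.name for p in item.iterdir()))
```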

16 Workflow
▬ Each collection can have its own approval process
▬ Up to 3 steps: reviewer, coordinator, metadata editor
▬ A message is sent to each person at the appropriate step in the workflow
Workflow steps and possible actions:
Reviewer permissions (optional)
- Can review content of all files submitted to collection
- Can accept or reject all submissions to collection
- Can send a message explaining decision
- Rejection will stop submission
- Acceptance will let submission go to next step
- (Cannot edit metadata, or change files)
Coordinator permissions (optional)
- Can edit metadata of all submissions to collection
- Can accept or reject all submissions to collection
- Can send a message explaining rejection
- Rejection will stop submission
- Acceptance will move submission to next step
Metadata editor permissions (optional)
- Can edit metadata of all submissions to collection
- Submission automatically becomes part of DSpace after this step
- (Any approval would have happened before)
One last possibility is that a workflow can be 'aborted' by a DSpace site administrator. This is accomplished using the administration UI.
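The step rules above amount to a small state machine. Here is a minimal Python sketch of it, assuming the three optional steps always run in the order reviewer, coordinator, metadata editor; the class, method, and status names are invented for illustration and do not reflect DSpace's workflow code.

```python
from enum import Enum


class Step(Enum):
    REVIEWER = 1
    COORDINATOR = 2
    METADATA_EDITOR = 3


class Submission:
    """A submission moving through a collection's optional workflow steps."""

    def __init__(self, title, enabled_steps):
        self.title = title
        # Steps always run in the order reviewer, coordinator, metadata editor.
        self.pending = sorted(enabled_steps, key=lambda step: step.value)
        self.status = "in workflow" if self.pending else "archived"
        self.message = None

    @property
    def current_step(self):
        return self.pending[0] if self.status == "in workflow" else None

    def edit_metadata(self, title):
        # Reviewers can only inspect the submission; they cannot edit metadata.
        if self.current_step not in (Step.COORDINATOR, Step.METADATA_EDITOR):
            raise PermissionError("metadata cannot be edited at this step")
        self.title = title

    def accept(self):
        if self.status != "in workflow":
            raise ValueError("submission is not in the workflow")
        self.pending.pop(0)
        if not self.pending:
            # After the last enabled step the item becomes part of the archive.
            self.status = "archived"

    def reject(self, message):
        # Only the reviewer and coordinator steps can reject a submission.
        if self.current_step not in (Step.REVIEWER, Step.COORDINATOR):
            raise PermissionError("submission cannot be rejected at this step")
        self.status = "rejected"
        self.message = message


paper = Submission("Draft title", [Step.REVIEWER, Step.METADATA_EDITOR])
paper.accept()                      # reviewer accepts
paper.edit_metadata("Final title")  # metadata editor fixes the title
paper.accept()                      # metadata editor finishes the step
print(paper.status, paper.title)
```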

17 Content
DSpace accepts all manner of digital formats. Some examples of items that DSpace can accommodate are:
▬ Documents, such as articles, preprints, working papers, technical reports, conference papers
▬ Books
▬ Theses
▬ Data sets
▬ Computer programs
▬ Visualizations, simulations, and other models
▬ Administrative records
▬ Published books
▬ Images
▬ Audio files
▬ Video files
▬ Multimedia publications
▬ Reformatted digital library collections
▬ Learning objects
▬ Web pages

18 Format Support
Because the system will accept any file format, there are three levels of preservation defined for a given format:
▬ Supported: we fully support the format
▬ Known: we can recognize the format, but cannot guarantee full support
▬ Unsupported: we cannot recognize the format; these will be listed as "application/octet-stream", aka Unknown
Supported formats will be functionally preserved using either format migration or emulation techniques. Examples include TIFF, SGML, XML, AIFF, and PDF.
Known formats are those that we can’t promise to preserve, such as proprietary or binary formats, but which are so popular that third-party migration tools will likely emerge to help with format migration. Examples include Microsoft Word and PowerPoint, Lotus 1-2-3, and WordPerfect.
Unsupported formats are those that we don’t know enough about to do any sort of functional preservation. This would include some proprietary formats or a one-of-a-kind software program.
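The three support levels can be read as a simple lookup. The Python sketch below uses only the example formats named in the notes above; a real format registry would be far larger, and the function names are hypothetical.

```python
# Example formats taken from the notes above; a real registry would be larger.
SUPPORTED = {"TIFF", "SGML", "XML", "AIFF", "PDF"}
KNOWN = {"Microsoft Word", "Microsoft PowerPoint", "Lotus 1-2-3", "WordPerfect"}

# Unrecognized formats are listed under this generic MIME type.
UNKNOWN_MIME_TYPE = "application/octet-stream"


def support_level(format_name):
    """Return 'supported', 'known', or 'unsupported' for a format name."""
    if format_name in SUPPORTED:
        return "supported"
    if format_name in KNOWN:
        return "known"
    return "unsupported"


def preservation_plan(format_name):
    """Summarize what the repository commits to for a given format."""
    level = support_level(format_name)
    if level == "supported":
        return "functional preservation by format migration or emulation"
    if level == "known":
        return "no promise of preservation; third-party migration tools expected"
    return "no functional preservation; listed as " + UNKNOWN_MIME_TYPE


for name in ("PDF", "WordPerfect", "One-off lab software"):
    print(name, "->", support_level(name))
```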

19 New features
▬ Thumbnail enhancements
▬ Customizable submission forms
▬ Full text indexing
▬ Sub-communities
▬ Enhanced administrative interface
▬ New roles for collection managers
- Thumbnails in item view can now be switched off/on
- Browse and search thumbnail options
- Improved item importer
- Customisable submission forms added
- Configurable number of index terms in Lucene for full-text indexing
- Submit button on collection pages only appears if user has authorisation
- Community and collection strengths displayed

20 Future steps
▬ DSpace 2.0
▬ Modularity
▬ Internationalisation
▬ Preservation
Given the number and diversity of individuals and organisations interested in using DSpace, one size will certainly not fit all. In addition, the area of digital preservation (as well as other aspects of digital asset management) is very much in the research arena. One of the drivers behind the open source approach is to promote and facilitate experimentation; some of these approaches may turn out not to be great, and may never reach production quality.
Modularity: Although DSpace 1.x is divided into a number of components, modifying or extending functionality could still be simplified further in the long term. Once modifications have been made to the system, it can be difficult to update your modified system in line with the main DSpace source distribution. For instance, if you’ve made a large number of modifications to version 1.1.1, updating your code base to DSpace 1.2 can involve quite a complex merge operation.
Internationalisation: Additionally, the DSpace Web user interface should allow modules to be ‘internationalised’. Internationalisation refers not only to allowing translations of the UI to be made without affecting the underlying DSpace module, but also to allowing DSpace instances to be ‘multi-lingual’, for example to allow a French user to see the UI in French and an English user to see it in English.
Preservation: In the DSpace 1.x design, all metadata is in a relational database, and all content bitstreams are in the file system on the server. This is not an ideal situation from a preservation point of view.
· You need to be careful about backups. If for some reason the database becomes corrupt, all you have is a jumble of bitstreams.
· In any sort of backup restoration, disaster recovery, or situation where an organisation is taking over the contents of a DSpace from another organisation, one must have the hardware and software stack available that runs a compatible version of the relational database software so that the metadata can be restored. In order to be truly long-term, data needs to be independent of the software and hardware stack in which it is stored.
· Another essential digital preservation process is basic auditing, i.e. periodically ensuring that the content in the archive is all present and correct, that storage systems are not failing, and that content has not become corrupt. In DSpace 1.x, this is relatively simple in the case of bitstreams (sizes and checksums are stored for each), but the all-important data in the relational database is not easily auditable in this way.
· Additionally, the existing DSpace 1.x approach forces you to jump through hoops to actually share content, an essential component of many digital preservation strategies that involve keeping multiple copies of content in a variety of geographic locations. To share an object, you need to know how to retrieve the appropriate bits and pieces from the database and file system. However, the content management API does some of this work for you, so this is not as critical a concern as the previous points.
In order to improve a DSpace instance’s resilience, it is proposed that storage of DSpace content is reworked such that both the metadata and content associated with an archival object are stored together, with the metadata in a standardised format that can be read without the need for a particular hardware or software stack. This means the Archival Information Package (AIP) is a more tangible concept in the system, as opposed to being a purely logical concept (“an AIP consists of these files here, together with this row from this database table and these rows from that database table” etc.).
This approach has advantages, from both digital preservation and scalability viewpoints:
· No reliance on a particular set of hardware or software to be able to read the AIP metadata
· No need to understand how the DSpace system works to be able to transfer or rescue data
· Hence it is easier to reconstruct an archive in the event of disaster recovery, or when receiving stewardship of assets from another organisation
· These ready-packaged AIPs are easy to move around, making mirroring and distributed storage strategies easier to implement
In this approach, these AIPs represent the real ‘archive’. For performance or other reasons, various indices or copies of the metadata may be stored in a database or other mechanism, but these are considered a cache of the information in the asset store.
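To illustrate the idea of a self-contained AIP that can be audited without a database, here is a small Python sketch. It stores the metadata and a manifest of sizes and checksums next to the content files and then re-checks them. The file name `aip.json`, the JSON layout, and the use of SHA-256 are all assumptions made for this example; the proposal above only calls for "a standardised format" and does not name one.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def write_aip(aip_dir, metadata, files):
    """Store content files together with their metadata and a manifest."""
    aip_dir = Path(aip_dir)
    aip_dir.mkdir(parents=True, exist_ok=True)
    manifest = []
    for name, data in files.items():
        (aip_dir / name).write_bytes(data)
        manifest.append(
            {"name": name, "size": len(data), "sha256": hashlib.sha256(data).hexdigest()}
        )
    package = {"metadata": metadata, "bitstreams": manifest}
    # "aip.json" is a made-up file name; any plain, documented format would do.
    (aip_dir / "aip.json").write_text(json.dumps(package, indent=2), encoding="utf-8")
    return aip_dir


def audit_aip(aip_dir):
    """Return the names of bitstreams that are missing or corrupt."""
    aip_dir = Path(aip_dir)
    package = json.loads((aip_dir / "aip.json").read_text(encoding="utf-8"))
    problems = []
    for entry in package["bitstreams"]:
        path = aip_dir / entry["name"]
        if not path.is_file():
            problems.append(entry["name"])
            continue
        data = path.read_bytes()
        digest = hashlib.sha256(data).hexdigest()
        if len(data) != entry["size"] or digest != entry["sha256"]:
            problems.append(entry["name"])
    return problems


with tempfile.TemporaryDirectory() as tmp:
    aip = write_aip(Path(tmp) / "aip-0001", {"title": "Thesis"}, {"thesis.pdf": b"%PDF-"})
    print(audit_aip(aip))                         # nothing wrong yet
    (aip / "thesis.pdf").write_bytes(b"damaged")  # simulate corruption
    print(audit_aip(aip))                         # the damaged file is reported
```

Because everything needed to interpret the package sits in one directory, copying that directory to another site is enough to mirror the object, which is the portability argument made above.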

21 CORIL
▬ Increase quantity and availability
▬ Sharing and dissemination
▬ Learning community
▬ Peer-reviewed publication
CORIL (Cooperative Online Repository for Information Literacy) is an initiative to support information literacy instruction among Canadian universities. The aim of the project is to:
- Increase the quantity and availability of a range of high-quality learning materials, including interactive multimedia tutorials, generic and discipline-based guides, and assessment tools
- Facilitate the sharing and dissemination of learning resources
- Create a learning community of Canadian instruction librarians
- Provide a mechanism for peer-reviewed publication of instruction materials

22 One community in OZone

23 Gabriela Mircea

