Fedora An Architecture for Complex Objects and their Relationships Old Dominion University, VA April 7, 2005 Sandy Payette Cornell University.

Presentation transcript:

Fedora An Architecture for Complex Objects and their Relationships Old Dominion University, VA April 7, 2005 Sandy Payette Cornell University

The Fedora Project
Fedora
  –Flexible
  –Extensible
  –Digital
  –Object
  –Repository
  –Architecture
Open source software
  –Not Red Hat!
  –Mozilla Public License

The Digital Problem Space: Manage, Publish, Preserve…
Conventional digital objects
Complex, compound, dynamic objects

Fedora History
Cornell Research (1997-present)
  –DARPA and NSF-funded research
  –First reference implementation developed
  –Distributed, Interoperable Repositories (experiments with CNRI)
  –Policy Enforcement
First Application ( )
  –University of Virginia digital library prototype
  –Technical implementation: adapted to web; RDBMS storage
  –Scale/stress testing for 10,000,000 objects
Open Source Software (2002-present)
  –Andrew W. Mellon Foundation grants
  –Technical implementation: XML and web services
  –Fedora 1.0 (May 2003)
  –Fedora 2.0 (Jan 2005)

Why Fedora? (1)
Digital Object Model
  –Abstraction for heterogeneous digital resources
  –Flexibility for different “content models”
  –Container for content and metadata (both local and remote)
  –Behaviors (extensible service attachments)
Service-Oriented Architecture
  –Core Repository web service (SOAP and REST)
  –Service Framework for collaborating and supporting services
Feature-worthy for archiving and preservation
  –XML object serialization for ingest, storage, and export
  –Content versioning
  –Event history
  –Manage dependencies (e.g., object-to-object relationships)
  –New service framework for plug-in of preservation services

Why Fedora? (2)
Relationships
  –RDF-based index
  –Define and query “the graph” of objects in repository
Content repurposing
  –Reuse digital content in different contexts
  –Re-purpose content via mechanisms for dynamically transforming content to fit new requirements
Web Services
  –SOAP and REST bindings
  –WSDL to define interfaces
  –XML transmission
Easy integration with other apps and systems
  –Does not assume any particular workflow or end-user application
  –Generic repository service as substrate

Fedora Use Cases
Digital Library Collections
Institutional Repository
Educational Software
Information Network Overlay
Digital archives and preservation
Digital Asset Management
Content Management System
Scholarly publishing

Selected Fedora Adopters
University of Virginia (image, EAD, e-texts)
VTLS
Tufts University
OhioLink
Northwestern: Library and Academic Technologies
National Science Digital Library (NSDL): Core Integration
ARROW: National Library of Australia and Monash University
Royal Library Denmark, National Library, and DTU
Rutgers University
Indiana University
American Geophysical Union
Library of Congress: I Hear America Singing
University of Delaware
Hamilton College
Cornell CIT
Tibetan Buddhist Resource Center
Yale University
DISA – South Africa, History of Apartheid resistance

Fedora Digital Object Model

Fedora Digital Object Model: Component View
Persistent ID (PID): Digital object identifier
Reserved Datastreams: Key object metadata
  –Dublin Core (DC)
  –Audit Trail (AUDIT)
  –Relations (RELS-EXT)
Datastreams: Set of content or metadata items
  –Datastream
Disseminators: Pointers to service definitions to provide service-mediated views
  –Default Disseminator
  –Disseminator

The Datastream Component: 4 Classifications for Datastreams
Managed Content: Fedora stores and manages the content bytestream.
External Referenced: Fedora stores a reference (URL) to the content.
External Redirected: Fedora stores a reference (URL) to the content, but will not mediate access to content.
Inline XML: Fedora stores a name-spaced block of XML content within the Fedora digital object XML file.
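
The four datastream classifications above can be sketched as a small data model. This is an illustrative sketch only, not Fedora code: the class names, field names, and example identifiers are hypothetical, and the assumption that External Referenced content is still mediated by the repository is inferred from the contrast the slide draws with External Redirected.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DatastreamKind(Enum):
    """The four classifications named on the slide."""
    MANAGED = "Managed Content"
    EXTERNAL_REFERENCED = "External Referenced"
    EXTERNAL_REDIRECTED = "External Redirected"
    INLINE_XML = "Inline XML"


@dataclass
class Datastream:
    ds_id: str                       # hypothetical field names
    mime_type: str
    kind: DatastreamKind
    content: Optional[bytes] = None  # bytes kept by the repository
    url: Optional[str] = None        # reference kept by the repository

    def stores_bytes(self) -> bool:
        # Managed and inline XML content live inside the repository.
        return self.kind in (DatastreamKind.MANAGED, DatastreamKind.INLINE_XML)

    def mediates_access(self) -> bool:
        # Assumption: only External Redirected bypasses repository mediation.
        return self.kind is not DatastreamKind.EXTERNAL_REDIRECTED


thumb = Datastream("THUMB", "image/gif", DatastreamKind.MANAGED, content=b"GIF89a")
video = Datastream("VIDEO", "video/mpeg", DatastreamKind.EXTERNAL_REDIRECTED,
                   url="http://example.org/video.mpg")
```

Here `thumb.stores_bytes()` is true because the repository holds the bytes, while `video.mediates_access()` is false because the repository only hands out the reference.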

The Disseminator Component
Library Resource (Data Object): Persistent ID (PID), Disseminator, Default Disseminator
Disseminator: Encapsulates references to service definition objects
Behavior Definition (BDEF Object): Persistent ID (PID), Service Definition Metadata
Behavior Mechanism (BMECH Object): Persistent ID (PID), Service Binding Metadata (WSDL); implements the Behavior Definition
External Service

Example Digital Object
Persistent ID (PID)
Datastreams
  –DC (text/xml)
  –AUDIT (text/xml)
  –RELS-EXT (text/xml)
  –THUMB (image/gif)
  –MRSID (image/x-mrsid)
Disseminators
  –Default Disseminator
  –Image Disseminator
  –Metadata Disseminator
Default Disseminations
  –Get Object Profile
  –Get DC
  –Get THUMB
  –Get MRSID
Custom Disseminations (External Service)
  –Get Medium Quality
  –Get High Quality
  –Get MARC Record
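
The example object above can be sketched as a data structure that lists which disseminations each disseminator offers. This is an illustrative sketch, not Fedora code: the PID `demo:1`, the class names, and the grouping of methods under the Image and Metadata disseminators are assumptions read off the diagram labels.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Disseminator:
    name: str
    methods: List[str]


@dataclass
class DigitalObject:
    pid: str
    datastreams: Dict[str, str]                 # datastream ID -> MIME type
    disseminators: List[Disseminator] = field(default_factory=list)

    def list_disseminations(self) -> List[Tuple[str, str]]:
        """Return (disseminator name, method name) pairs in declaration order."""
        return [(d.name, m) for d in self.disseminators for m in d.methods]

    def find_disseminator(self, method: str) -> Optional[str]:
        """Return the name of the first disseminator offering the method."""
        for d in self.disseminators:
            if method in d.methods:
                return d.name
        return None


example = DigitalObject(
    pid="demo:1",  # hypothetical PID
    datastreams={
        "DC": "text/xml", "AUDIT": "text/xml", "RELS-EXT": "text/xml",
        "THUMB": "image/gif", "MRSID": "image/x-mrsid",
    },
    disseminators=[
        Disseminator("Default Disseminator",
                     ["Get Object Profile", "Get DC", "Get THUMB", "Get MRSID"]),
        Disseminator("Image Disseminator",
                     ["Get Medium Quality", "Get High Quality"]),
        Disseminator("Metadata Disseminator", ["Get MARC Record"]),
    ],
)
```

A client that wants a MARC record would look up `example.find_disseminator("Get MARC Record")` and be routed to the Metadata Disseminator, which in Fedora terms is backed by an external service.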

Fedora – XML for digital objects
FOXML (Fedora Object XML)
  –Simple XML format directly expresses Fedora object model
  –Easily adapts to Fedora new and planned features
  –Easily translated to other well-known formats
  –Internal storage format for objects in repository
XML-based Ingest/Export of objects
  –FOXML, METS (Fedora extension)
  –Extensible to accommodate new XML formats
  –Planned: METS 1.4, MPEG21 DIDL
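
To make the idea of an XML object wrapper concrete, here is a minimal sketch that builds a FOXML-like document with one externally referenced datastream. The element and attribute names follow the FOXML schema as best recalled and should be checked against the official schema before use; the PID, label, datastream ID, and URL are hypothetical, and a real FOXML object carries more properties than shown.

```python
import xml.etree.ElementTree as ET

# Namespace believed to be used by FOXML; verify against the schema.
FOXML_NS = "info:fedora/fedora-system:def/foxml#"
ET.register_namespace("foxml", FOXML_NS)


def q(tag: str) -> str:
    """Qualify a tag name with the FOXML namespace."""
    return f"{{{FOXML_NS}}}{tag}"


def build_object(pid: str, label: str, ds_id: str, mime: str, url: str) -> ET.Element:
    root = ET.Element(q("digitalObject"), {"PID": pid})

    # Object properties: only a label is shown here.
    props = ET.SubElement(root, q("objectProperties"))
    ET.SubElement(props, q("property"), {
        "NAME": "info:fedora/fedora-system:def/model#label",
        "VALUE": label,
    })

    # One datastream of type 'E' (External Referenced), one version.
    ds = ET.SubElement(root, q("datastream"),
                       {"ID": ds_id, "CONTROL_GROUP": "E", "STATE": "A"})
    version = ET.SubElement(ds, q("datastreamVersion"),
                            {"ID": f"{ds_id}.0", "MIMETYPE": mime})
    ET.SubElement(version, q("contentLocation"), {"TYPE": "URL", "REF": url})
    return root


obj = build_object("demo:100", "Sample image object", "THUMB",
                   "image/gif", "http://example.org/thumb.gif")
xml_text = ET.tostring(obj, encoding="unicode")
```

Printing `xml_text` shows a single self-contained document, which is the property the slide relies on for ingest, storage, and export.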

FOXML – Object Properties

FOXML – Datastream (type ‘E’)

FOXML – Relationships Datastream

FOXML – Disseminator

Fedora Repository Service

[Diagram labels: RDF, files, RDBMS]

Fedora 2.0 Repository Modules
Management and Validation (via API-M)
  –Object Ingest and Export
  –Object Validation
  –Object Maintenance (create-modify-delete-purge-etc.)
  –Object Versioning
  –Initiates incremental object Indexing
Access and Dissemination (via API-A and API-A-Lite)
  –Get object profile
  –Get Datastream Disseminations
  –Get Service-Mediated Disseminations
Storage
  –Default file system implementation
  –Stores XML object wrappers and datastream byte stream content
  –Relational database registry

Fedora 2.0 Repository Modules
Basic Search
  –Search object properties and DC record of each object
  –Relational database
Resource Index Search
  –Search “graph” of objects with object properties, object relationships, DC
  –Kowari triple-store (RDF-based index of repository)
Authorization
  –XACML policy enforcement
  –Repository policies and object-specific policies
  –Sun XACML Engine
OAI-PMH

Fedora Web Service APIs in a Nutshell
Management Service (API-M)
  –Ingest Object
  –Export Object
  –Get Object XML
  –Purge Object
  –Modify Object
  –Get Next PID
  –Get Datastream(s)
  –Get DatastreamHistory
  –Get DisseminatorHistory
  –Get Disseminator(s)
  –Add/modify/purge Datastream
  –Add/modify/purge Disseminator
  –Set State

Fedora Web Service APIs in a Nutshell
Access Service (API-A and API-A-LITE)
  –Describe Repository
  –Get Object Profile
  –Get Object History
  –Get Datastream
  –Get Dissemination
  –Find Objects
  –Resume Find Objects

Fedora Web Service APIs in a Nutshell
API-A-Lite
  –Repository-level operations:
    fedora/describe – Describe Repository
    fedora/search – methods to locate objects via the default repository index
  –Object-level operations:
    fedora/get – method to get object profile
    fedora/get/.. – method to “disseminate” a view of an object’s content
    fedora/getMethods – method to get information about all disseminations available on an object
OAI-PMH Provider Service
  –All OAI-PMH methods to harvest OAI-DC from each object
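
Because API-A-Lite is a plain URL-based interface, a client mostly needs to assemble request URLs. The sketch below is a hedged illustration: the base URL, the PID `demo:1`, and the convention of appending the PID and further path segments after `fedora/get` are assumptions, not taken from the slide, which leaves the trailing path elided.

```python
from urllib.parse import quote


def describe_url(base: str) -> str:
    """URL for the repository-level 'describe' operation."""
    return f"{base.rstrip('/')}/fedora/describe"


def search_url(base: str) -> str:
    """URL for the repository-level 'search' operation."""
    return f"{base.rstrip('/')}/fedora/search"


def get_url(base: str, pid: str, *segments: str) -> str:
    """URL for an object-level 'get' request.

    Assumption: the PID follows 'fedora/get/', and any further segments
    (left to the caller) select a particular view of the object.
    """
    parts = [quote(pid, safe=":")] + [quote(s, safe=":") for s in segments]
    return f"{base.rstrip('/')}/fedora/get/" + "/".join(parts)


BASE = "http://localhost:8080"  # hypothetical repository location
profile = get_url(BASE, "demo:1")
view = get_url(BASE, "demo:1", "DC")
```

Any HTTP client or a web browser can then fetch these URLs, which is what makes the "Lite" bindings convenient for quick integration.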

Fedora 2.1 (May 2005)
Authentication plug-ins
  –HTTP basic authentication and SSL
  –Plug-in #1: Tomcat user/password file/db
  –Plug-in #2: LDAP tie-in
  –Plug-in #3: Radius Authentication
Authorization module
  –XML-based policies using XACML
  –Fine-grained policy enforcement (API actions X subject attrs X object attrs)
  –Repository-wide policies
  –Object-specific policies
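
The "API actions X subject attrs X object attrs" idea can be sketched as a toy decision function. This is not XACML and not Fedora's authorization module: the class names, the deny-overrides combining rule, and the default-deny fallback are all assumptions chosen to keep the example short.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Request:
    action: str               # API action, e.g. "getDatastream"
    subject: Dict[str, str]   # subject attributes
    obj: Dict[str, str]       # object attributes


@dataclass
class Policy:
    effect: str                          # "permit" or "deny"
    matches: Callable[[Request], bool]   # predicate over the request


def decide(request: Request,
           repo_policies: List[Policy],
           object_policies: List[Policy]) -> str:
    """Combine repository-wide and object-specific policies.

    Assumed rule: any matching deny wins, then any matching permit,
    otherwise deny by default.
    """
    effects = [p.effect for p in repo_policies + object_policies
               if p.matches(request)]
    if "deny" in effects:
        return "deny"
    if "permit" in effects:
        return "permit"
    return "deny"


# Repository-wide: anyone may read. Object-specific: restricted objects
# are closed to subjects who are not staff.
repo = [Policy("permit", lambda r: r.action.startswith("get"))]
obj_specific = [Policy("deny", lambda r: r.obj.get("access") == "restricted"
                       and r.subject.get("role") != "staff")]

student_read = Request("getDatastream", {"role": "student"}, {"access": "restricted"})
staff_read = Request("getDatastream", {"role": "staff"}, {"access": "restricted"})
```

With these policies, `decide(student_read, repo, obj_specific)` yields a deny and `decide(staff_read, repo, obj_specific)` yields a permit, showing how an object-specific policy narrows a repository-wide one.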

Fedora Clients
Fedora Administrator (via SOAP interfaces)
  –Java Swing client
  –Ingest/Export objects
  –Batch and single object creation and modification
  –Wizards for creating BDEF/BMECH objects
Web Browser (via REST interfaces)
  –API-A-LITE: Access, Search
  –OAI
  –RISearch: Resource Index Search
  –API-M-LITE: Selected management operations
Command Line Utilities
  –Ingest, export, purge
  –Migration
Policy Builder (available in 2.1)
  –Graphical user interface to assist in authoring XACML policies

Fedora Service Framework (as of Fedora 2.1)

Phase 2 – Preservation Services
Enhanced ingest/export (for archive transfers)
  –Focus on exchange of self-contained archives among different systems
  –OAI Harvesting of complex objects (e.g., MPEG21-DIDL)
Preservation Integrity Service
  –Check Datastreams on ingest and modification
  –Validate byte stream format integrity (e.g., via JHOVE)
  –Checksums
Preservation Monitoring Service
  –Dependency checking
  –Publication of repository events significant to preservation
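
The checksum part of the planned integrity service can be illustrated with a small registry that records a digest at ingest and re-checks it later. This is a sketch under stated assumptions: the class and method names are hypothetical, SHA-256 is chosen arbitrarily, and the slide does not say how Fedora would store or compare checksums.

```python
import hashlib
from typing import Dict, Tuple


def checksum(data: bytes, algorithm: str = "sha256") -> str:
    """Return the hex digest of a datastream's bytes."""
    return hashlib.new(algorithm, data).hexdigest()


class ChecksumRegistry:
    """Remembers one digest per (PID, datastream ID) pair."""

    def __init__(self) -> None:
        self._known: Dict[Tuple[str, str], str] = {}

    def record(self, pid: str, ds_id: str, data: bytes) -> str:
        """Call on ingest or after an intentional modification."""
        digest = checksum(data)
        self._known[(pid, ds_id)] = digest
        return digest

    def verify(self, pid: str, ds_id: str, data: bytes) -> bool:
        """True only if a digest was recorded and the bytes still match it."""
        expected = self._known.get((pid, ds_id))
        return expected is not None and expected == checksum(data)


registry = ChecksumRegistry()
registry.record("demo:1", "THUMB", b"original bytes")
intact = registry.verify("demo:1", "THUMB", b"original bytes")
corrupted = registry.verify("demo:1", "THUMB", b"original bytez")
```

A monitoring service could run `verify` on a schedule and publish an event whenever it returns false, which ties the integrity and monitoring services together.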

Fedora Service Framework (Year 1 - starting v2.1)

Fedora Service Framework (Year 2)

Fedora Service Framework (Year 3)

Fedora Resource Index: Practical Use of Semantic Web Technologies

Fedora Digital Objects Resource Index View

Fedora Digital Objects Service Relationships – Object to BDef/BMech

Fedora 2.0 and RDF
Object-to-object Relationships
  –Ontology of common relationships (RDF schema)
  –Relationships stored in special datastream (RELS-EXT)
Resource Index (RI)
  –RDF-based index of repository (Kowari triple-store)
  –Graph-based index includes:
    Object properties and Dublin Core
    Object Relationships
    Object Disseminations
RI Search
  –Powerful querying of graph of inter-related objects
  –REST-based query interface (using RDQL or ITQL)
  –Results in different formats (triples, tuples, sparql)
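
The idea of indexing relationships as a graph and querying it by pattern can be sketched with an in-memory set of triples. This stands in for the Kowari triple store purely for illustration: the class name, the `rel:` and `dc:` shorthand, and the object identifiers are hypothetical, and the wildcard matching shown is far simpler than RDQL or iTQL.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class ResourceIndex:
    """A toy triple index supporting wildcard pattern matching."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, s: str, p: str, o: str) -> None:
        self._triples.add((s, p, o))

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> List[Triple]:
        """Return sorted triples where every non-None position matches."""
        return sorted(
            t for t in self._triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        )


ri = ResourceIndex()
# Relationships as they might be asserted in each image object's RELS-EXT.
ri.add("demo:img1", "rel:isMemberOf", "demo:collection")
ri.add("demo:img2", "rel:isMemberOf", "demo:collection")
ri.add("demo:img1", "dc:title", "Brush")

members = [s for s, _, _ in ri.match(p="rel:isMemberOf", o="demo:collection")]
```

The query for `members` is the in-memory analogue of asking the Resource Index for every object that asserts membership in a given collection.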

Uses of Object Relationships
Define collections (e.g., collection objects)
Assert critical relationships among objects for management purposes
Enable network overlay
  –Surrogate objects referring to external entities
  –Assert relationships among them
  –Assert other relationships (e.g., annotations)
Enable navigation of repository (as tree or graph)

Fedora Relationship Ontology (RDFS)
isPartOf / hasPart
isMemberOf / hasMember
isDescriptionOf / hasDescription
hasEquivalent
… others
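
The paired names in the ontology suggest that each relationship can be read in either direction. The sketch below derives the reverse statements from a set of asserted ones. It is an illustration only: treating the slash-separated names as inverses is an assumption, `hasEquivalent` is left without an inverse because the slide lists none, and the object identifiers are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Assumed inverse pairs, taken from the slash-separated names on the slide.
_PAIRS = {
    "isPartOf": "hasPart",
    "isMemberOf": "hasMember",
    "isDescriptionOf": "hasDescription",
}
INVERSES: Dict[str, str] = dict(_PAIRS)
INVERSES.update({v: k for k, v in _PAIRS.items()})


def with_inverses(triples: Iterable[Triple]) -> Set[Triple]:
    """Return the asserted triples plus the inverse of each, where one is known."""
    asserted = set(triples)
    result = set(asserted)
    for s, p, o in asserted:
        inverse = INVERSES.get(p)
        if inverse is not None:
            result.add((o, inverse, s))
    return result


asserted = [
    ("demo:page1", "isPartOf", "demo:book"),
    ("demo:book", "isMemberOf", "demo:collection"),
    ("demo:book", "hasEquivalent", "demo:book-copy"),
]
expanded = with_inverses(asserted)
```

After expansion the graph can be walked downward from a collection as well as upward from a page, which is what tree-style navigation of a repository needs.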

Demo: Collection – Member Relationships
Collection Object [smiley]
  –Datastream containing a query to Resource Index for all members of collection
Image Objects [brush]
  –Use RELS-EXT datastream to assert relationship to collection object

Example: UVA’s Scholarly Text Collections

UVa TEI Book Content Model

TEI Book Example link

Example: UVA’s Quantitative Data Collections

Example: NSDL Network Overlay Architecture Fedora

Phase 2 - Development Plans
New Services (as depicted in Framework)
Federated repositories
Performance – goal of 10 million objects
Web services security and Shibboleth
Event Notification – pub/sub
Code Refactoring
New Client Applications
  –Fedora-Web-IR (for institutional repository use)
  –Advanced Scholarly workbench
  –Workflow
  –Tools for RDF browse and graph traversal

Fedora Software Distribution
Open Source (Mozilla Public License)
100% Java (Sun Java J2SDK1.4)
Supporting Technologies
  –Apache Tomcat
  –Apache Axis (SOAP)
  –Xerces for XML parsing and validation
  –Saxon for XSLT transformation
  –Schematron for validation
  –RDBMS: MySQL, Mckoi, Oracle 9i support
  –Kowari triple store
  –Sun XACML Engine for policy enforcement
Deployment Platforms
  –Windows 2000, NT, XP
  –Solaris
  –Linux
  –Mac OSX

Fedora Development Consortium
Advisory Board
  –University of Virginia
  –Tufts
  –VTLS
  –ARROW (Monash University and Nat’l Lib Australia)
  –Harris Corp.
  –Danish Royal Library and DTU
  –Northwestern University
  –NSDL – Core Integration
Mission
  –Requirements Definition, Specifications. Joint Development
  –Commission of Working Groups
    Content Modeling
    Outreach and Education
    Workflow and Service-Oriented Processes
  –Recommendation for Long-Term sustainability model
    Governance and Funding
    Set Fedora Free – full open source model (e.g., public SourceForge)
    Code Maintenance (UVA until 2012; plan for beyond)

Recent News
Downloads ~20K; 52 countries
Growth – lots of new interest
Fedora Users Conference (May 13-14, Rutgers)
Interesting new adopters
  –OhioLink
  –DISA (South Africa history)
Interesting new proposals
  –Company X finalist for large government contract
  –Cornell Lab of Ornithology (data + tools + documents)
Recent Article
  –XML CoverPages

Finally, a new Fedora Web Site!