The Mellon-Funded Fedora Project A Briefing for the Los Alamos National Laboratory August 26, 2002 Sandy Payette Cornell Information Science.

1 The Mellon-Funded Fedora Project A Briefing for the Los Alamos National Laboratory August 26, 2002 Sandy Payette Cornell Information Science

2 Motivation The Problem of Complex Content

3 Digital Library Content not just documents... Some familiar objects Complex, compound, dynamic objects

4 Key Research Questions How can clients interact with heterogeneous collections of complex objects in a simple and interoperable manner? How can complex objects be designed to be both generic and genre-specific at the same time? How can we hide the complexity of an object’s underlying data structures and relationships from clients? How can we associate services and tools with objects to provide different presentations or transformations of the object content? How can we associate specialized, fine-grained access control policies with specific objects, or with groups of objects?

5 The Flexible Extensible Digital Object Repository Architecture (FEDORA) Developed as a DARPA and NSF-funded research project at Cornell (1997-present) – CORBA-based reference implementation – Extensive interoperability testing – Policy Enforcement Interpreted and re-implemented at University of Virginia (1999) – Simple web-oriented implementation, focused on access to collections – Java servlet and relational db Virginia prototype supported testbed of 10,000,000 digital objects with very good results (1999-2001) Andrew W. Mellon Foundation granted Virginia and Cornell $1,000,000 to develop a full-featured production FEDORA system that is web-based (2002+)

6 FEDORA Original Research Goals Flexibility – object model that fits many different contexts Management - of distributed digital content and services Access – stable interfaces to digital objects; behavior-centric Interoperability – among digital objects and repositories Extensibility – easy evolution of object behaviors Security – rights management and access control Preservation – of content, plus “look and feel”

7 Model for Collaboration Digital Library Research and Real Library Requirements University of Virginia developing extensive digital collections since 1992 Virginia Digital Library R&D Group chartered with finding solution for integration Formal Requirements analysis Search for commercial products Discovery: Cornell research parallels stated requirements

8 Virginia Requirements: Heterogeneous Digital Collections Books, Rare Books, Multimedia, Music, E-texts, Maps, Photographs, Statistics, Video, Art, Manuscripts, Data, Images, 3-D Objects, Journals, Sound Effects

9 Virginia Requirements: Managing the Collections Scalability to support hundreds of millions of objects Persistent unique names for all resources without respect to machine address Support inter-relationships among objects Manage the digital resources and metadata, as well as computer programs, services and tools that support them Enforce appropriate policies for use of Library resources Provide a high level of security Support preservation activities appropriately

10 Virginia Requirements: Delivering the Collections Well-architected, flexible relationships between services/tools and digital content Digital objects, themselves, have ability to provide users with an appropriate launch-pad or tool to use the object content Every resource can be used in any number of contexts Move towards a digital library that is configurable by an “aware” user Provide resource discovery (searching) across the full collection Deep searching in particular collections

11 Shortcomings of commercial digital library products Narrow focus on specific media formats (e.g. image databases, document management) Fail to effectively address interrelationships among digital entities Fail to address interoperability; no open interfaces to facilitate sharing of services; no standard protocols for cross-system interoperability Fail to provide facilities for managing programs and tools that are integral to delivering digital content. Not extensible; do not enable easy integration of new tools and services

12 The Fedora Architecture Overview of Basic Model

13 FEDORA Basic Architectural Abstractions Digital Object – Container for aggregating any digital content – Content disseminations based on behavior definitions – Extensibility of behavior mechanisms Repository – Service layer for “contained” Digital Objects – Object lifecycle management – Access management

14 FEDORA Digital Object Persistent ID (PID): globally unique persistent id. Disseminators (public view): access methods for obtaining “disseminations” of digital object content. System Metadata (internal view): metadata necessary to manage the object. Datastreams (protected view): content that makes up the “basis” of the object.

15 FEDORA Digital Object Architecture Behavior Definition Object: Persistent ID (PID), Service Definition Metadata, System Metadata, Datastreams. Behavior Mechanism Object: Persistent ID (PID), Service Binding Metadata, System Metadata, Datastreams. Data Object: Persistent ID (PID), Disseminators, System Metadata, Datastreams.
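The Data Object parts named on this slide (PID, Disseminators, System Metadata, Datastreams) can be pictured as a small data structure. The following is only a sketch for orientation, not the Fedora implementation: the class names, field names, sample datastream location, MIME type, and mechanism PID are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Datastream:
    ds_id: str       # hypothetical identifier, e.g. "DS1"
    mime_type: str   # hypothetical value; the slide does not give one here
    location: str    # hypothetical pointer to where the content bytes live


@dataclass
class Disseminator:
    bdef_pid: str    # PID of the Behavior Definition Object subscribed to
    bmech_pid: str   # PID of the Behavior Mechanism Object that implements it


@dataclass
class DataObject:
    pid: str
    system_metadata: dict = field(default_factory=dict)
    datastreams: list = field(default_factory=list)
    disseminators: list = field(default_factory=list)

    def behavior_definitions(self):
        """Return the PIDs of the behavior definitions this object subscribes to."""
        return [d.bdef_pid for d in self.disseminators]


image = DataObject(
    pid="101",
    datastreams=[Datastream("DS1", "application/octet-stream", "file:///data/101.sid")],
    disseminators=[Disseminator("Web-default", "mech:web-default")],
)
print(image.behavior_definitions())
```

Keeping the disseminator as a pair of references (definition and mechanism) mirrors the slide's separation between what behaviors an object offers and which service carries them out.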

16 Data Object Association to External Behavior Service

17 Digital Object Interoperability Common Behaviors for Variable Content Functional equivalency

18 Digital Object Extensibility Adding New Behaviors The same underlying content (Digital Object)... can be operated on in novel ways… to create new disseminations not originally conceived of (Book, Photo Collection)

19 Virginia Prototype Content Models and Fedora Demos

20 (Mycenae image example) General Image Content Model

21 (Pavilion III image example) MrSID Image Content Model

22 (Finding Aid example) Finding Aid Content Model

23 (TEI letter example) TEI Letter Content Model

24 (TEI book example) TEI Book Content Model

25 (Mycenae example) GDMS Content Model (lawn example)

26 (ICPSR survey example) Numerical Data Content Model

27 The New FEDORA Technical Specifications – Part I

28 Background Material Overview of Web Service Technologies

29 What is a Web Service? A distributed application that runs over the internet. An addressable network endpoint which receives structured messages and returns structured responses. A web application that publishes an open interface through which clients can send requests and receive responses.

30 How is this different from plain old web applications? Formally defined API (application programming interface) defines a set of abstract operations for a web service Published bindings for client to run operations Standard protocol for invoking operations on the service. XML as standard means of encoding service requests and responses.

31 Why are Web Services important? Interoperability – Web applications can interact and build upon each other – Data is transferred in an interoperable manner (e.g., over HTTP) – Data is encoded in an interoperable format (XML) Works in decentralized, distributed, operating-system-independent environment. Standards-oriented Means to expose complex operations with rich data typing (via XML Schema language typing) Ease of integrating distributed systems via the Web W3C effort to develop this service architecture

32 How are Web Services Implemented? The Simple Object Access Protocol (SOAP) Approach – SOAP is a messaging protocol that can run over different transport protocols (e.g., HTTP, SMTP) – Operation oriented (send a request to an endpoint) – Like CORBA, RMI, DCOM…but for Web and simpler – Application APIs can be defined and published using the Web Service Description Language (WSDL) – Requests and responses sent as XML messages – Supports simple and complex data typing in requests and responses – Supports transmission of binary data within requests or response packages

33 How are Web Services Implemented? The REST (Representational State Transfer) Approach – URI + HTTP + XML – URI/resource driven; message built into a URI (URL) – HTTP GET or POST – Response is XML data – Issues: Not a standard, but a style of doing web apps; arguably it just gives a fancy name to how lots of people do applications on the web by default; nothing really new here; just argues to do things the way we have been, maybe a little more standard by using XML. Fragile service definition (URLs change). No data typing on requests. Limited ability to transmit complex requests on URL. W3C behind SOAP, but only one strong voice out there for REST (Prescod).

34 Example of Web Service using SOAP My Application SOAP/HTTP Google Web Service SOAP Request (XML) SOAP Response (XML) doSpellingSuggestion(payet) payette

35 XML SOAP Request <SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance" xmlns:xsd="http://www.w3.org/1999/XMLSchema"> [inner element markup lost in extraction; surviving values: /e325JlNPASJu, payet]

36 XML SOAP Response <SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance" xmlns:xsd="http://www.w3.org/1999/XMLSchema"> <ns1:doSpellingSuggestionResponse xmlns:ns1="urn:GoogleSearch" SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/"> payette
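Because the inner markup of the request and response slides did not survive, a rough sketch of the exchange may help. It builds a SOAP envelope for doSpellingSuggestion and reads the suggestion out of a response, using only Python's standard library. The envelope namespace and `urn:GoogleSearch` come from the slides; the child element names `key`, `phrase`, and `return`, the placeholder key, and the sample response text are assumptions made for illustration, and nothing is sent over the network.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
GOOGLE_NS = "urn:GoogleSearch"


def build_request(key, phrase):
    """Build a SOAP envelope that calls doSpellingSuggestion(phrase)."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{GOOGLE_NS}}}doSpellingSuggestion")
    # The child element names "key" and "phrase" are assumptions.
    ET.SubElement(call, "key").text = key
    ET.SubElement(call, "phrase").text = phrase
    return ET.tostring(envelope, encoding="unicode")


def parse_response(xml_text):
    """Return the text of the first child of the response element."""
    root = ET.fromstring(xml_text)
    response = root.find(f".//{{{GOOGLE_NS}}}doSpellingSuggestionResponse")
    return response[0].text


# Hand-written stand-in for a service reply (assumed shape).
sample_response = (
    f'<SOAP-ENV:Envelope xmlns:SOAP-ENV="{SOAP_ENV}">'
    "<SOAP-ENV:Body>"
    f'<ns1:doSpellingSuggestionResponse xmlns:ns1="{GOOGLE_NS}">'
    "<return>payette</return>"
    "</ns1:doSpellingSuggestionResponse>"
    "</SOAP-ENV:Body></SOAP-ENV:Envelope>"
)

request_xml = build_request("PLACEHOLDER-KEY", "payet")
suggestion = parse_response(sample_response)
print(suggestion)
```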

37 New Fedora: Key Features Repository system exposed as two related Web services – described using WSDL – both SOAP and HTTP bindings Digital objects encoded and stored as XML using Metadata Encoding and Transmission Standard (METS) Digital object behaviors implemented as linkages to distributed web services (also described using WSDL) Digital objects support versioning of both content and services.

38 New Fedora System

39 Web Service Communication View

40 The New FEDORA Encoding Digital Objects in XML

41 Metadata Encoding and Transmission Standard (METS) XML “standard” for encoding descriptive, administrative, and structural metadata of digital library objects Developed under auspices of the Digital Library Federation METS standard maintained by the Network Development and MARC Standards Office of the Library of Congress http://www.loc.gov/standards/mets/

42 METS Schema METS is written in the XML Schema Language METS defines four sections for an object – Descriptive metadata – Administrative metadata – File group – Structure map METS goals include: – Facilitate management of objects within a repository – Provide a standard format for exchange of objects between repositories – Provide standard format for transmission of objects to users for rendering (via tools or applications)

43 Mapping Fedora to METS (columns: Fedora, METS) Persistent Identifier (PID) Datastreams <METS:file ID="DS1.1" CREATED="2002-05-20T06:32:00" MIMETYPE="image/jpeg" <METS:file ID="DS1.0" CREATED="2002-05-10T02:32:00" MIMETYPE="image/jpeg"
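To make the mapping more concrete, here is a minimal sketch that builds a METS skeleton with the four sections from the previous slide and records two datastream versions as `METS:file` elements. It is illustrative only and is not validated against the METS schema. Placing the PID in the `OBJID` attribute, the section IDs `DMD1`, `AMD1`, and `DS1`, and the MIME type spelling `image/jpeg` are assumptions; the file IDs and timestamps are the ones shown on the slide.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("METS", METS_NS)


def mets_tag(name):
    """Qualify an element name with the METS namespace."""
    return f"{{{METS_NS}}}{name}"


def build_mets(pid, versions):
    """Build a skeleton METS document with one file group of datastream versions."""
    root = ET.Element(mets_tag("mets"), {"OBJID": pid})  # PID placement is an assumption
    ET.SubElement(root, mets_tag("dmdSec"), {"ID": "DMD1"})   # descriptive metadata
    ET.SubElement(root, mets_tag("amdSec"), {"ID": "AMD1"})   # administrative metadata
    file_sec = ET.SubElement(root, mets_tag("fileSec"))
    file_grp = ET.SubElement(file_sec, mets_tag("fileGrp"), {"ID": "DS1"})
    for file_id, created, mime in versions:
        ET.SubElement(
            file_grp,
            mets_tag("file"),
            {"ID": file_id, "CREATED": created, "MIMETYPE": mime},
        )
    ET.SubElement(root, mets_tag("structMap"))                # structure map
    return root


mets_root = build_mets(
    "101",
    [
        ("DS1.1", "2002-05-20T06:32:00", "image/jpeg"),
        ("DS1.0", "2002-05-10T02:32:00", "image/jpeg"),
    ],
)
print(ET.tostring(mets_root, encoding="unicode"))
```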

44 Mapping Fedora to METS (columns: Fedora, METS) System Metadata Disseminator

45 Digital Object Versioning Versioning within Data Objects – Datastream versioning Date/time stamped New version every time datastream is modified – Disseminator versioning Date/time stamped New version if disseminator is modified to reference a different Behavior Mechanism (“better mousetrap”) Versioning within Behavior Definition and Mechanism Objects – New versions of WSDL metadata recorded in these objects (with date/time stamps) – This deserves much more explanation than this slide can offer!
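The datastream versioning rule (a new date/time-stamped version on every modification) can be sketched as an append-only history. This is one possible reading, not the project's design; the class and method names and the sample content strings are hypothetical, while the two timestamps are borrowed from the METS mapping slide.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class VersionedDatastream:
    ds_id: str
    versions: list = field(default_factory=list)  # (timestamp, content) pairs, oldest first

    def modify(self, content, when):
        """Record a new version instead of overwriting the old one."""
        self.versions.append((when, content))

    def as_of(self, when):
        """Return the content that was current at the given time, or None."""
        current = None
        for stamp, content in self.versions:
            if stamp <= when:
                current = content
        return current


ds = VersionedDatastream("DS1")
ds.modify("first scan", datetime(2002, 5, 10, 2, 32))
ds.modify("rescanned image", datetime(2002, 5, 20, 6, 32))

mid_may = ds.as_of(datetime(2002, 5, 15))
latest = ds.as_of(datetime(2002, 6, 1))
print(mid_may, "|", latest)
```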

46 METS : Sample Fedora Object Click here for image digital object

47 Fedora Dissemination Database Alternate form of object storage that will act as a cache of most recent versions of digital objects Ensure high-performance access (disseminations) Repository system replicates from authoritative XML version of objects to relational database Plan to phase out the database in Phase 2-3: – Access sub-system to work completely off the XML storage, as XML tools improve performance-wise. – Pursue different caching strategies as necessary
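The replication idea (authoritative XML object on one side, a relational cache of the most recent version on the other) might look roughly like the sketch below. The slides do not give a schema, so the table name, columns, and `replicate` function are hypothetical, and an in-memory SQLite database stands in for the relational store.

```python
import sqlite3


def replicate(conn, pid, versions):
    """Copy only the most recent version of an object into the cache table.

    `versions` is a list of (ISO timestamp, file id) pairs taken from the
    authoritative XML object; ISO timestamps sort correctly as strings.
    """
    latest_stamp, latest_file = max(versions)
    conn.execute(
        "INSERT OR REPLACE INTO dissemination_cache (pid, created, file_id) "
        "VALUES (?, ?, ?)",
        (pid, latest_stamp, latest_file),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dissemination_cache ("
    "pid TEXT PRIMARY KEY, created TEXT, file_id TEXT)"
)

replicate(
    conn,
    "101",
    [("2002-05-10T02:32:00", "DS1.0"), ("2002-05-20T06:32:00", "DS1.1")],
)
cached = conn.execute(
    "SELECT pid, created, file_id FROM dissemination_cache WHERE pid = ?", ("101",)
).fetchone()
print(cached)
```

Because the primary key is the PID, replicating again after a later modification simply replaces the cached row, which matches the "most recent versions" wording of the slide.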

48 The New FEDORA Repository System Design

49 Fedora Repository System

50 FEDORA Web Service API Definitions “API-M” – interface for management sub-system – Operations necessary to create and maintain objects and their components – Interface directly with authoritative XML version of object “API-A” – interface for access sub-system – Operations necessary for clients to perform disseminations on objects in the repository – No direct access to object internal structure or components – Will work against cached representation of object to optimize performance.

51 Fedora Management Sub-System Implements API-M Object Management Object Component Management Object Validation PID Generation Interacts with Storage Subsystem

52 Other Sub-systems Storage Sub-system – Responsible for all matters pertaining to reading and writing objects from persistent storage – Modular design – can configure different object readers and writers to suit the context. – Modular design – can configure different data store strategies (in phase 1 will have file system and relational database) Security Sub-system – Store access control policies for repository and objects – Store user and group information – Enforcement of policies
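The "modular design" of the storage sub-system suggests a common interface with interchangeable back ends. A minimal sketch under that assumption follows; the interface name, method names, and the file-naming rule are hypothetical, and only a file-system and an in-memory store are shown (the relational option is omitted).

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class ObjectStore(ABC):
    """Hypothetical interface that every storage back end implements."""

    @abstractmethod
    def write(self, pid, xml_text):
        """Persist the XML serialization of an object."""

    @abstractmethod
    def read(self, pid):
        """Return the XML serialization of an object."""


class FileSystemStore(ObjectStore):
    def __init__(self, root):
        self.root = Path(root)

    def _path(self, pid):
        # Replace ":" so the PID is safe to use as a file name.
        return self.root / (pid.replace(":", "_") + ".xml")

    def write(self, pid, xml_text):
        self._path(pid).write_text(xml_text, encoding="utf-8")

    def read(self, pid):
        return self._path(pid).read_text(encoding="utf-8")


class MemoryStore(ObjectStore):
    def __init__(self):
        self._objects = {}

    def write(self, pid, xml_text):
        self._objects[pid] = xml_text

    def read(self, pid):
        return self._objects[pid]


def round_trip(store, pid, xml_text):
    """Write then read back through whichever store was configured."""
    store.write(pid, xml_text)
    return store.read(pid)


with tempfile.TemporaryDirectory() as tmp:
    from_disk = round_trip(FileSystemStore(tmp), "demo:101", "<mets/>")
from_memory = round_trip(MemoryStore(), "demo:101", "<mets/>")
print(from_disk == from_memory)
```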

53 Security Sub-system Access Control Policies General Purpose – “Only repository managers can add new disseminators to digital objects in the repository.” Object-Specific (e.g., Lecture object) – “Guests may view course syllabus and slides 1-10 of Lecture 1, but may not view the lecture video or any other slides.” – “Students may not view Lecture 2 video unless they submit assignment for Lecture 1.” See research at: http://www.cs.cornell.edu/payette/prism/security/policy.htm
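The guest rule for the Lecture object can be expressed as data plus a default-deny check. This is a toy sketch of the idea and not the policy language from the linked research; the `Rule` structure, role and component names, and the `is_permitted` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    role: str
    component: str                  # e.g. "syllabus", "slide", "video"
    slides: Optional[range] = None  # only meaningful when component == "slide"


# "Guests may view course syllabus and slides 1-10 of Lecture 1,
#  but may not view the lecture video or any other slides."
LECTURE_1_POLICY = [
    Rule("guest", "syllabus"),
    Rule("guest", "slide", range(1, 11)),
]


def is_permitted(policy, role, component, slide=None):
    """Default deny: access is allowed only if some rule matches."""
    for rule in policy:
        if rule.role != role or rule.component != component:
            continue
        if rule.slides is None or slide in rule.slides:
            return True
    return False


print(is_permitted(LECTURE_1_POLICY, "guest", "slide", 5))
print(is_permitted(LECTURE_1_POLICY, "guest", "video"))
```

A conditional rule such as the Lecture 2 example (access depends on a submitted assignment) would need state about the user, which this sketch deliberately leaves out.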

54 Fedora Repository System

55 Fedora Access Sub-System Implements API-A Object Reflection – Identify the types of Behavior Definitions to which an object subscribes (via the object’s Disseminators) – Reflect on a Behavior Definition to identify the kinds of disseminations that can be run on the object (i.e., as method requests) Dissemination – Fulfills requests for particular methods (i.e., of a Behavior Definition) to be run on an object – Mediates access to supporting services (i.e., Behavior Mechanisms) used to present or transform datastreams of the object – Returns a view of the object’s content to client

56 API-A: Object Reflection Requests Identify Types of Behavior Definitions Each Disseminator is said to “subscribe” to a Behavior Definition It does this by referencing the PID of a particular Behavior Definition Object. Each Behavior Definition Object contains metadata that describes a set of related behaviors (or operations) Via API-A, clients can send a service request to determine what Behavior Definitions an object subscribes to.

57 API-A: Object Reflection Request Get Behavior Methods Each Disseminator has a Behavior Definition Object associated with it. Each Disseminator has a Behavior Mechanism Object associated with it that describes how to bind to a particular service that complies with the Disseminator’s Behavior Definition. Via API-A, clients can send a service request to obtain the list of method definitions associated with a particular Disseminator of the digital object.

58 API-A: Object Reflection Requests (diagram) MrSID Image Object, PID = 101, in a Repository behind API-A. Disseminators: Web-default, Web-image, Admin; System Metadata; Basis (MrSID-encoded image file). Request GetBehaviorDefinitions?PID=101 returns Web-default, Web-image, Admin. Request GetBehaviorMethods?PID=101&BID=Web-default returns get-as-page; get-in-context.
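The two reflection requests on this slide can be mimicked with a small in-memory lookup. The sketch assumes that the first request answers with the behavior definition names and the second with the method names, as the diagram suggests; the function names and the dictionary layout are hypothetical, and the method lists for Web-image and Admin are left empty because the slide does not show them.

```python
# PID -> behavior definition -> method names (values taken from the slide
# where available; empty lists mean "not shown on the slide").
REPOSITORY = {
    "101": {
        "Web-default": ["get-as-page", "get-in-context"],
        "Web-image": [],
        "Admin": [],
    },
}


def get_behavior_definitions(pid):
    """Mimics GetBehaviorDefinitions?PID=..."""
    return list(REPOSITORY[pid])


def get_behavior_methods(pid, bid):
    """Mimics GetBehaviorMethods?PID=...&BID=..."""
    return list(REPOSITORY[pid][bid])


definitions = get_behavior_definitions("101")
methods = get_behavior_methods("101", "Web-default")
print(definitions)
print(methods)
```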

59 API-A: Dissemination Request Clients can obtain content from a digital object with minimal knowledge about the object. Behavior Definition identifiers and method definitions are the basis for making dissemination requests on digital objects Clients do not need to know particulars of how to attach to the service (Behavior Mechanism) that is operating on their behalf. A dissemination request requires just three things: – Digital Object Identifier (PID) – Behavior Definition Identifier (BID) – Method name (and optional parameters) for a behavior

60 API-A: Dissemination Request (diagram) Request GetDissemination?PID=101&BID=Web-default&method=get-as-page goes through API-A to the Repository holding the MrSID Image Object (Disseminators: Web-default, Web-image, Admin; System Metadata; Basis: MrSID-encoded image file). Client view: Digital Object: 101, Image of bird; Bird Digital Library1, White Birds: Image 1, Image 2, Image 3.
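Since a dissemination request needs only a PID, a BID, and a method name, the request string on this slide can be assembled mechanically. The sketch below does that with the standard library; the host name `repository.example.org` and the function name are assumptions, while the query parameter names follow the slide.

```python
from urllib.parse import urlencode


def dissemination_url(base, pid, bid, method, **params):
    """Build a GetDissemination request from PID, BID, method, and optional parameters."""
    query = {"PID": pid, "BID": bid, "method": method, **params}
    return f"{base}/GetDissemination?{urlencode(query)}"


url = dissemination_url(
    "http://repository.example.org",  # hypothetical repository address
    "101",
    "Web-default",
    "get-as-page",
)
print(url)
```

Nothing in the URL refers to datastream locations or to the Behavior Mechanism, which is the point of the slide: the client addresses the object only through its public behaviors.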

61 Disseminations Benefits Simple access: dissemination requests shield clients from the internal structure of digital objects Stable interface: dissemination requests are like requests against an abstract interface in that they are not tied to object implementation details that may change over time (e.g., storage locations of datastreams) Foster Interoperability: different digital objects can vary in both the format of content and how it is structured, yet we can access them in a consistent manner via disseminations.

62 The New FEDORA Software Deployment

63 Fedora Software Deployment Goals An efficient, scalable, freely distributable FEDORA repository system ASAP Make all software open source Complete basic management and access interfaces with the initial release Add other important digital library functionality in later releases Create multiple testbed repositories to deploy and evaluate the software Interoperability testing, including sharing of content and mechanisms among deployment partner repositories.

64 Deployment Group Indiana University: Digital Library group NYU: Humanities Computing group Tufts: Digital Collections and Archives Department King's College London: Humanities Computing Oxford: Oxford Digital Library and The Refugee Studies Center Library of Congress: Motion Picture and Recorded Sound Division Northwestern University: library/academic computing Los Alamos National Laboratory: Research Library

65 Fedora Project Plan Phase 1: (pre-release Oct 31, 2002; final Jan 2003) –Repository system with management and access subsystems exposed as web services –Storage subsystem with XML object store and replication to relational database cache –Object builder tools (GUI and batch) –Basic set of behavior services Phase 2: Add more production support –Security and policy enforcement –Additional management tools –Optimize performance for accessing XML objects –Object versioning –Collection objects –Advanced disk management Phase 3: Enhance end-user support –New kinds of disseminators, with supporting behavior services –Efficiency and scale optimization

66 FEDORA Web Site: www.fedora.info

67 Questions and Discussion

