UC3 Curation Micro-Services Simplified Repository Ingest UC Curation Center California Digital Library May 20, 2010.

1 UC3 Curation Micro-Services Simplified Repository Ingest UC Curation Center California Digital Library May 20, 2010

2 Agenda Introduction – Welcome and review of objectives – UC3 and digital curation – Landscape, assumptions, and imperatives Curation micro-services – The Merritt project – Design goals – The future of the DPR Simplified repository ingest – Concepts – Implementation – Demonstration Discussion

3 Objectives By the end of this discussion we hope that you will understand – Digital curation and the UC3 mission – The emergent, micro-services approach to curation infrastructure – The Merritt curation environment and the future of the DPR – The Merritt Ingest service and its interactions with the Identity, Storage, and Inventory services – How to incorporate the Ingest service into your workflows

4 University of California Curation Center (UC3) We’ve changed our name, but not our commitment – Ensuring that the information resources supporting, and resulting from, the University’s research, teaching, and learning mission remain authentic, available, and usable UC3 is a Center of Excellence – A creative partnership bringing together the expertise and resources of the CDL, the ten UC campuses, and the broader international curation community

5 Digital curation The set of policies and practices focused on managing and adding value to a body of trusted digital content – Preservation ensures access over time – Access depends upon preservation up to a point in time It can also be seen as facilitating the alignment of the scholarly and information lifecycles

6 Landscape Ever increasing number, size, and diversity of content – More stuff, fewer resources Ever increasing diversity of partners, stakeholders, and expectations – Producers / consumers → prosumers / conducers Inevitability of disruptive change – Technology – User expectation – Institutional mission and resources Problem or opportunity? [Figure labels: $, Work, Time]

7 Assumptions Curated content gains – Safety through redundancy “Lots of copies keeps stuff safe” – Meaning through context “Lots of description keeps stuff meaningful” – Utility through service “Lots of services keeps stuff useful” – Value through use “Lots of uses keeps stuff valuable” Curation is an outcome, not a place – Decentralized curation can be as effective as centralized Curation stewardship is a relay

8 Imperatives Provide innovative, effective, and efficient services Plan for change – Focus on content, not the systems in which that content is managed Systems come and go (but not our system ;-) – Occam’s Razor and Murphy’s Law suggest Favor the small and simple over the large and complex Favor the minimally sufficient over the feature laden Favor the configurable over the prescribed Favor the proven over the (merely) novel Enable curation at the point of use Do more with less

9 Curation micro-services Devolve curation function into a granular set of independent, but interoperable micro-services – Since each is small and self-contained, they are collectively easier to develop, maintain, and enhance – Since the level of investment in, and therefore commitment to, any given service is small, they are easier to replace when they have outlived their usefulness – The scope of each service is limited, but complex behavior emerges from the strategic composition of individual atomistic services

10 Merritt curation micro-services [Diagram: services arranged in four layers labeled Value, Utility, Context, and State, with the side labels Curation and Preservation] – Annotation of content by consumers – Notification of new content availability – Transformation to create derivatives – Search of content and metadata – Index to enable fast search – Characterization to extract content properties – Replication for safety – Fixity to verify bit-level integrity – Ingest of content for curation – Inventory of curated content – Storage for long-term retention – Identity for long-term reference

11 What is the future of the DPR? The DPR will continue to be operated as a core UC3 service – However, the components of the underlying system will be gradually replaced with their new Merritt-based equivalents – All content currently managed in the DPR will be automatically migrated to the new environment Micro-services also can be used to deploy locally-hosted repositories to meet specialized local needs

12 What is the future of the DPR? Continuing stewardship commitment by UC3 regarding managed content – Safety, persistence, efficiency, economy Streamlined workflows for submission, access, and collection management – Easy in, easy out Minimal technical requirements for contribution Great flexibility in deploying customized repository solutions

13 Design goals Policy neutral, protocol and platform independent – We know we can’t foresee all of the contexts in which these services can be usefully deployed Principle of least surprise – Extensive options, but meaningful default behavior Linked data – All entities exist within a web of semantic relations http://linkeddata.org/ The file system is the database – All content and metadata are expressed in the file system – Some subset of this information may be replicated in databases as an optimization for fast query
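The "file system is the database" goal can be illustrated with a minimal sketch in which a query index is rebuilt purely by walking a directory tree. Everything here is an assumption made for illustration: the `object/version/file` directory layout, the `rebuild_index` function, and the use of SQLite are hypothetical and are not taken from the Merritt specifications.

```python
import sqlite3
import tempfile
from pathlib import Path


def rebuild_index(root: Path) -> sqlite3.Connection:
    """Walk root and record every file in a throwaway SQLite index.

    Assumes a hypothetical layout of root/<object>/<version>/<file>.
    """
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE files (object TEXT, version TEXT, name TEXT, size INTEGER)"
    )
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(root).parts
        if len(rel) != 3:
            continue  # skip anything outside the assumed layout
        obj, version, name = rel
        db.execute(
            "INSERT INTO files VALUES (?, ?, ?, ?)",
            (obj, version, name, path.stat().st_size),
        )
    db.commit()
    return db


# Demo on a temporary tree: one object with two versions.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for version, text in (("1", "draft"), ("2", "final text")):
        target = root / "1234" / version
        target.mkdir(parents=True)
        (target / "paper.txt").write_text(text)
    index = rebuild_index(root)
    rows = index.execute(
        "SELECT object, version, name, size FROM files ORDER BY version"
    ).fetchall()
    print(rows)
```

Because the index is derived entirely from the files, it can be discarded and regenerated at any time, which is the sense in which a database is only an optimization here.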

14 Design goals Code to interfaces – Underlying implementations should and will evolve over time without invalidating the public interface “contract” Exploit agile methods – Early prototyping, frequent refactoring – Stakeholder engagement The appropriate benchmark for submission user experience is Flickr

15 Storage concepts Node – A sub-domain of the Storage service established to meet specific policy, administrative, or technical needs Object – Encapsulation in digital form of an abstract intellectual or aesthetic work Version – A set of files representing a discrete state of the object – Any change to object state constitutes a new version File – A formatted bit stream

16 Storage concepts Stable reference – All objects (and their versions, and their files) managed in the Storage service have stable URLs that can be used to retrieve entities or metadata about entities, subject to appropriate access control http://example-store.edu/content/abc/1234 http://example-store.edu/content/abc/1234/3 http://example-store.edu/state/abc/1234/3/xyz [Diagram labels for the URL components, left to right: Storage service, Request type, Storage node, Object, Version, File]
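The example URLs on this slide suggest a pattern of service, request type, node, object, then optional version and file. A small sketch of that reading follows; the helper name `storage_url` and its parameters are hypothetical, and the pattern is inferred from the slide's examples rather than from a documented Merritt API.

```python
from typing import Optional


def storage_url(base: str, request: str, node: str, obj: str,
                version: Optional[int] = None,
                file: Optional[str] = None) -> str:
    """Build a stable reference of the assumed form
    base/request/node/object[/version[/file]].
    """
    if file is not None and version is None:
        raise ValueError("a file reference needs a version")
    parts = [base.rstrip("/"), request, node, obj]
    if version is not None:
        parts.append(str(version))
    if file is not None:
        parts.append(file)
    return "/".join(parts)


base = "http://example-store.edu"
print(storage_url(base, "content", "abc", "1234"))
print(storage_url(base, "content", "abc", "1234", version=3))
print(storage_url(base, "state", "abc", "1234", version=3, file="xyz"))
```

The three calls reproduce the three URLs shown on the slide.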

17 Ingest concepts Queue – Asynchronous processing of submitted material Batch – A set of digital objects submitted together – The unit of notification and reporting Job – The processing of a single digital object Handler – A specific processing stage

18 Ingest concepts Profile – A user-specific set of processing choices – Negotiated as part of the submission agreement Notification – At the time of ingest submission and completion – Our stewardship obligation begins at the time of ingest completion Submit by-value (a file) or by-reference (a URL)
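The relationship between queue, batch, job, and handler can be sketched as a toy data model. This is only an illustration of the concepts as described on the slides: the class names, the two handlers, and the notification string are hypothetical, and processing runs inline rather than in a separate consumer.

```python
from dataclasses import dataclass, field
from queue import Queue
from typing import Callable, List


@dataclass
class Job:
    """The processing of a single digital object."""
    object_name: str
    log: List[str] = field(default_factory=list)


@dataclass
class Batch:
    """A set of objects submitted together; the unit of notification."""
    batch_id: str
    jobs: List[Job]


# A handler is one processing stage applied to a job (hypothetical signature).
Handler = Callable[[Job], None]


def characterize(job: Job) -> None:  # hypothetical handler
    job.log.append("characterized")


def store(job: Job) -> None:  # hypothetical handler
    job.log.append("stored")


def process_batch(batch: Batch, handlers: List[Handler]) -> str:
    """Run every job through each handler, then report once per batch."""
    for job in batch.jobs:
        for handler in handlers:
            handler(job)
    return f"batch {batch.batch_id}: {len(batch.jobs)} object(s) ingested"


# Submissions wait in a queue and are processed later (inline in this demo).
pending: "Queue[Batch]" = Queue()
pending.put(Batch("b1", [Job("obj-1"), Job("obj-2")]))

while not pending.empty():
    print(process_batch(pending.get(), [characterize, store]))
```

In this sketch a profile would amount to choosing which handlers are passed to `process_batch` for a given submitter.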

19 Ingest process flow [Sequence diagram] Participants: Submitting library, Ingest, Inventory, Storage node, Identity. Messages shown: Submit; Create identifier / Identifier; Add version; Get version metadata / Version metadata; Notification

20 Ingest implementation [Architecture diagram] Components: Submitting library, Submitter, Consumer, Ingester, Storage, Queue. Labels: HTML form; Servlet (implicitly multi-threaded), shown twice; Dæmon (explicitly multi-threaded); ZooKeeper dæmon; Job metadata; Job payload; Submission notification; Ingest notification; Batch or single object

21 Demonstration A few caveats… – Still a work in progress! – The final interface style sheets are not yet applied – Inventory and authentication/authorization services still under development – Full error reporting is not complete

22 Development roadmap
First wave: Identity, Storage
Second wave: Inventory, Ingest
Third wave: Index, Fixity
Fourth wave: Search, Replication
Fifth wave: Notification, Characterization
Sixth wave: Annotation, Transformation
Also on the roadmap: Object / collection modeling; Metadata standards; Authentication / authorization; Semantic interoperability; Policy / business model development

23 Early community reaction Collaborative development and integration projects with UC3 partners Independent implementation of key Merritt specifications Presentation/BOF at Open Repositories 2010 Digital curation group and Barcamp http://groups.google.com/group/digital-curation http://groups.google.com/group/digital-curation/web/curation-technology-sig

24 Discussion Will existing workflows continue to work? – Yes, we have a crosswalk from the existing METS-based feeder submission What are the minimal requirements for an acceptable digital object? – A per-object METS file is no longer required – The DPR will accept any content in any form However, the long-term curation service level may vary depending on the object’s formal characteristics, the presence (or absence) of accompanying metadata, the general state of curation understanding, and the availability of appropriate tools

25 Discussion How do I include metadata in my submission? – The Ingest submission form provides an opportunity to specify descriptive Dublin Kernel metadata – Administrative metadata is implied by the user’s profile Name, affiliation, contact information, collection, … – Technical (and, potentially, descriptive) metadata is automatically extracted by the characterization handler – Additional metadata can be expressed in recognized schemas and stored in files with well-known names mrt-dublin-core.txt mrt-mods.xml mrt-creative-commons.rdf …
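A submitter or handler might separate these well-known metadata files from the rest of a submission along the following lines. This is a hedged sketch: the function `split_submission` is hypothetical, and the set of names contains only the three listed on the slide, whose list is open-ended.

```python
import tempfile
from pathlib import Path

# File names listed on the slide; the slide's list is open-ended.
WELL_KNOWN_METADATA = {
    "mrt-dublin-core.txt",
    "mrt-mods.xml",
    "mrt-creative-commons.rdf",
}


def split_submission(directory: Path):
    """Separate recognized metadata files from the remaining content files."""
    metadata, content = [], []
    for path in sorted(directory.iterdir()):
        if not path.is_file():
            continue
        bucket = metadata if path.name in WELL_KNOWN_METADATA else content
        bucket.append(path.name)
    return metadata, content


# Demo on a temporary submission directory with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    submission = Path(tmp)
    for name in ("mrt-mods.xml", "image.tif", "mrt-dublin-core.txt"):
        (submission / name).write_text("")
    metadata, content = split_submission(submission)
    print(metadata)
    print(content)
```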

26 Discussion Isn’t an enterprise storage solution or RDBMS (e.g. Oracle) better than just relying on the file system? – No, we believe that there are a number of important advantages to directly exploiting the file system No vendor lock-in; proprietary systems are difficult to debug Modern file systems have excellent scaling characteristics The ability to re-instantiate the system by walking the file system is significant

27 Discussion Why is there a separate Ingest service? Why can’t I just submit directly to the Storage service? – Merritt embraces the “separation of concerns” principle http://en.wikipedia.org/wiki/Separation_of_concerns The Storage service only “knows” about storage and has strict requirements for the allowable form of submissions The Ingest service was explicitly designed for user-facing operation and imposes minimal constraints on submission forms

28 Discussion (questions for you) What constitutes a “collection”? – Does it have hierarchically-arranged sub-components? What tools do you need to manage your collections effectively? How do you expect to retrieve content from the repository? – Following a saved link? – Search query? If so, what would be the query terms?

29 Discussion (questions for you) What level of access control is necessary? – Bright vs. dark policy – Embargo periods – Redaction Who are the subject populations? – UC affiliates – Non-UC How fine-grained must this control be? – Collection or object – Campus, research group, user

30 Discussion (questions for you) Are there other repository tools or protocols that we should investigate? Please respond to the DPR survey at http://vovici.com/wsb.dll/s/aaeg44ec2

31 For more information
UC Curation Center http://www.cdlib.org/services/uc3
Curation micro-services https://confluence.ucop.edu/display/Curation
DPR survey http://vovici.com/wsb.dll/s/aaeg44ec2
Digital curation group and Barcamp http://groups.google.com/group/digital-curation http://groups.google.com/group/digital-curation/web/curation-technology-sig
UC3: Stephen Abrams, Erik Hetzner, Margaret Low, Mark Reyes, Perry Willett, Patricia Cruse, Greg Janée, John Kunze, Tracy Seneca, Scott Fisher, David Loy, Isaac Rabinovitch, Marisa Strong

