1 British Computer Society North London Branch Major Programmes Richard Boulderstone July 27, 2004.


1 British Computer Society, North London Branch. Major Programmes. Richard Boulderstone, July 27, 2004

2 Agenda
- The British Library
- Vision
- Our Audiences/Customers
- ILS
- Digitisation
- Digital Object Management
- Web Archiving
- Collaboration
- Conclusions
[Image: Magna Carta]

3 What Is The British Library?
- Created by British Library Act; commenced 1973
- Merger of British Museum Library (1753), National Reference Library of Science and Invention (1855), National Central Library (1916), and National Lending Library for Science and Technology (1961)
- Subsequent incorporation of British National Bibliography in 1974, India Office Library and Records in 1982, and British Institute of Recorded Sound in 1983
- Flagship building at St Pancras, the largest public building project in Great Britain in the 20th century, opened in 1998

4 World-Class Research Library: Key Statistics 2002/3
- 150 million items
- 8.2 million items consulted or supplied
- 408,000 reading room visits
- 618,000 catalogue records created
- 554,000 items received on legal deposit
- 651 km shelf capacity, 92% full; add 12 km each year
- 18.5M Web Site Hits (www.bl.uk)
- 2,400 staff
- £85.2 million Grant in Aid and £27.0 million trading income in 2001/2
- Annual report -

5 Outcome Based Vision
'The World's Knowledge': To help people advance knowledge to enrich lives
Pride, Relevance, Innovation
- … by aiding scientific advances
- … by adding commercial value for businesses
- … by contributing to UK "knowledge economy"
- … through the pursuit of academic excellence
- … through the stimulation of ideas
- … by adding to personal and family history
- … through increasing the nation's cultural wellbeing
- … by giving information relevant to their interests
- … by helping to find the next medical breakthrough
- … by creating a link between the past, present and future

6 [Diagram: audiences and the services offered to them]
Audience segments: Researcher, Business, Public, Libraries, Education
Audience groups: High R+D Industries; Prof. Services; Creative Industries; Publishing Industries; SMEs; School Libraries; Teachers; Students 11>18; Lifelong Learner; Visitors (child + adult); Librarians; Public Libraries; Public; H.E. Libraries; Scholars; Postgraduate/Undergraduate; Commercial Researcher; Broadcasting e.g. BBC; Publishing e.g. OED
Services: On-site Visits; School Tours; Web Learning; Reading Rooms; Bespoke Services; Reprographics; Publishing; Document Supply; Searching Tools; Resource Discovery; Training; Best Practice; Research Services; Innovation Centre; Exhibitions; Events; Tours

7 Major Programmes/1: Integrated Library System (ILS) Programme
[Image: Da Vinci Notebook]

8 ILS: Development
- Data migration
  - Due to finish in a few days
  - 16M+ BL records
  - 10M+ records from other sources
- Online ILS software
  - All online changes made (mainly interfaces); final tests
  - Web OPAC configuration: tested by staff, HE, expert
- Batch imports / exports
  - Most done for go-live
  - Rest in priority order

9 ILS: Implementation
- Training
  - Courses for end-users well underway
  - 'Practice' system available
  - 'Search only' training also underway
- Testing
  - Functional testing (end to end) nearly complete
  - Performance poor: OPAC very slow
  - Automated stress testing (LoadRunner scripts)
  - eIS trying to find area of problem
  - Ex Libris experts flying over
  - Some security 'hardening' needed

10 ILS: Cutover from legacy systems
- Now: Temporary Aleph cataloguing
- 7 June: Phase 1, internal processing
  - Staggered take-on of users to ease cutover problems
  - Merge 'temporary' records
- 30 June: Phase 2, reading rooms
  - Reading rooms closed for cutover in June
  - Mainly brand-new PCs etc rather than XP upgrade
- 30 July: Phase 3, remote users
  - Could be delayed if there are major problems

12 Future ILS development (ILS/2)
- Current ILS development seen just as the start
- Extra records, e.g. Sound archive, Manuscripts, Newspaper issues
- Extra functions, e.g. Preservation records
- Links to other new BL systems, e.g. Digital Object Management (images, web pages etc)
- New releases of Ex Libris packages

13 Major Programmes/2: Digitisation Programme
[Image: International Dunhuang Project]

14 Background
- Digitisation is the process of converting existing physical items into digital surrogates.
- Digitisation projects must take into account metadata creation, optical character recognition, navigation, display, archiving, preservation.
- Overall cost of digitisation projects is declining, from £10s/image to sub £1/image.
- Automatic devices to digitise are becoming available but are expensive (~£150,000).
- BL has had a fairly ad hoc approach, driven by:
  - External funding opportunities
  - Curator interest
- Projects have generally created their own approach, IT resources, project management
- BL has created about 1.5M digital images so far…

15 Digitisation Strategy
- Digitisation Strategy Project was formally initiated on February 2, 2004
- Key objectives for the project are to define:
  - Selection Criteria
  - Uniform Approach
  - Communications Plan
  - Sustainability
  - Intellectual Property Rights
  - External Relationship Management
  - Funding
  - Integration with DOMS

16 Definitive Register of Projects
- 19 Complete
- 19 Current
- 20 Planning
- JISC Sound (3,900 Hours)
- JISC Newspapers (2M Pages of 750M Pages)
- Chopin (Collaborative Project)
- Early English Books Online
- Project Status Information

17 Major Programmes/3: Digital Object Management (DOM) Programme
[Image: Gutenberg Bible]

18 DOM Programme vision
Our mission is to enable the United Kingdom to preserve and use its digital intellectual property forever.
Our vision is to create a management system for digital objects that will:
- store and preserve any type of digital material in perpetuity
- provide access to this material to users with appropriate permissions
- ensure that the material is easy to find
- ensure that users can view the material with contemporary applications
- ensure that users can, where possible, experience material with the original look-and-feel

19 Introduction - history
- Digital Library PFI: Mar 1997 – Dec 1998
- Digital Library System: 1999 – early 2002
- Lessons
- DOM Report: Nov 2002
- The DOM Programme: started September 2003

20 Drivers for the BL DOM Programme
- Legal deposit legislation for non-print material was granted royal assent in October 2003
- Existing voluntary deposit scheme operational since 2000
- Storage of digitised masters from early '90s onwards
- New digitisation initiatives: newspapers, sound, etc
- Sound archive receives 12 TB of material per year (with 50 year collection)
- Web archiving
- Cartography and datasets
- Electronic journals, picture library
- … and …
We need a generic and cost-effective approach for the secure long term storage of digital material that is produced by numerous initiatives

21 DOM – many topics to address
- LDEP: Legal Deposit of Electronic Publications
- LDLSE: Legal Deposit Libraries Secure Environment
- RADM: Risk Analysis of Digital Materials
- SDM: Storage of Digitised Masters
- VDEP: Voluntary Deposit of Electronic Publications
[Diagram. Axis labels: HIGH, LOW, COMPONENT AMBIGUITY / COMPLEXITY, ESTIMATED SIZE OF COMPONENT. Legend: Started; Planned; Non-DOM projects; Planned co-operation. Components, in extraction order: TECHNICAL REQUIREMENTS; VDEP; SDM; RADM; STRATEGY DEVELOPMENT; PROTOTYPES; LDEP; RESOURCE DISCOVERY INTERFACES; METADATA DEFINITION; RIGHTS MANAGEMENT; WORKFLOW; FILE CONVERSION UTILITIES; PERSISTENT IDENTIFIERS; LDLSE; FILE FORMAT REGISTRY; WEB ARCHIVING; ILS; AUTHENTICATION; DIGITISATION PROGRAMME]

22 Scope - life cycle of objects
- Collection: Selection, Acquisition, Accession, Description
- Preservation: Storage, Preservation
- Access: Resource discovery, Delivery, Rendering

23 Scope – objects and processes
- Preservation store
  - Preserves the bit stream in perpetuity
- Access store
  - Access versions
  - Limited formats, in the flavour of the era
- Metadata to support resource discovery
  - Descriptive, Administrative
  - Links with existing tools, e.g. Integrated Library System (ILS)
- Workflow
  - Ingest, e.g. Legal Deposit processing

24 [Diagram: content streams feeding DOM storage and access]
Labels, in extraction order: DOM Storage; DOM Resource Discovery; Delivery; ACCESS; DONATIONS; WEB ARCHIVING; Non-Serial Store; Grey Literature; Publishers Archives; Archiving; Operational Stores; DOCUMENT SUPPLY; NSA; Newspapers; St Pancras Studios; DIGITISATION; LDL Secure Environment; Legal Deposit Items; Legal Deposit Processing; LEGAL DEPOSIT; Digital Rights Management
Shared services: Authentication; Metadata; Persistent ID; Signing; Ingest

25 Timeline [diagram]
Release descriptions:
- Prototype will provide a basic preservation-quality digital object storage module
- Consolidate R0 into operational system
- Provide preservation-quality digital store for materials received under Voluntary Deposit of Electronic Publications (VDEP)
- Integrate it with the existing VDEP front-end
- Support ingest for a major content stream
- Integrate with core Library systems as required
- Provide functionality for material covered by LDEP secondary legislation
Timeline labels, in extraction order: R0; R2; R1; BC; Definition. R0; Operat'l Storage Sub System. R; ET approve Business Case & Timeline; 1st Content Stream ingest. R2; R3; Open DOM to new projects. R5+; LDEP - initial format. R3 & R4; R4

26 DOM: Project definition - 1 [diagram]
- Functional Architecture ("What")
- Logical architecture ("how - overall architecture")
- Physical architecture ("how - storage & specifics")
- Example issues:
  - digital rights, file formats, etc
  - allow changes to new suppliers, relationships to ILS, other projects etc
  - how do we build it cost-effectively today, supplier selection criteria
- Prototyping:
  - basic functioning architecture
  - principal solutions and options
  - assessing market solutions
- Cross team workshops: reviewing progress, debating detailed technical issues, planning immediate priorities, risk management & way forward
- Business case
- Planning: incremental implementation phases

27 DOM: Project definition - 2
- Approach is to be incremental and not 'Big Bang'
- We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a good solution
- A principal goal is to define:
  - An overall long term "logical architecture"
  - Within which, there will be successive generations of physical architectures
- We are understanding the storage marketplace, and we will use the knowledge to manage procurement
- We are certain that we will need >500 TB of storage but we are uncertain when; we thus need flexible, scalable procurement

28 DOM architecture - overview [diagram]
Labels, in extraction order: DOM Storage Service; DOM Physical Storage; Unique persistent identifier (DOMID); Integrity; Authenticity; Compound objects/relations; Atomic Objects; Others; Resource Discovery; ILS; Non-cat based RD; Rights Management; LDEP; Doc supply
Mapping tables shown: DOMID to OBJECT; Local resource locator to Object
DOMID is mapped to node/vol/LRL
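The slide says only that a DOMID is mapped to node/vol/LRL. A minimal sketch of that idea follows, assuming the mapping is a simple lookup table; the class names, field names and sample identifiers are hypothetical and are not taken from the DOM design.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Location:
    """Where an object currently lives (field names are hypothetical)."""
    node: str  # storage node
    vol: str   # volume on that node
    lrl: str   # local resource locator within the volume


class DomIdResolver:
    """Maps a persistent DOMID to its current physical location."""

    def __init__(self) -> None:
        self._table: Dict[str, Location] = {}

    def register(self, domid: str, location: Location) -> None:
        self._table[domid] = location

    def relocate(self, domid: str, new_location: Location) -> None:
        # The DOMID never changes; only the mapping does, for example
        # after migration to a new generation of physical storage.
        if domid not in self._table:
            raise KeyError(f"unknown DOMID: {domid}")
        self._table[domid] = new_location

    def resolve(self, domid: str) -> Optional[Location]:
        return self._table.get(domid)


resolver = DomIdResolver()
resolver.register("dom-0001", Location("node-a", "vol-03", "0001.bin"))
resolver.relocate("dom-0001", Location("node-b", "vol-11", "0001.bin"))
```

Keeping the identifier separate from the location is what would let physical storage be replaced without changing any reference held by the ILS or other client systems.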

29 DOM System (release 3) [diagram]
Labels: Mailroom; Administration; Access; Storage subsystem; Shared services; Publishers; Aleph

30 DOM logical architecture – integrity and authenticity
- Integrity:
  - System has capability to continuously monitor the object store to detect object corruption
  - It would then initiate object recovery
- Authenticity:
  - A process is defined to provide long-term assurance that an object that is re-presented is as it was when it was ingested
  - Based on the use of cryptographic signing techniques
  - Each object is signed when it is ingested
  - The signature is verified when required
  - The signing mechanism is "tightly" controlled
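The sign-at-ingest, verify-later cycle described on this slide can be sketched as follows. This is an illustration under stated assumptions, not the DOM implementation: it uses HMAC-SHA256 from the Python standard library as a stand-in for whatever signing technique the programme chose, and the key, class and object names are hypothetical.

```python
import hashlib
import hmac
from typing import Dict, List, Tuple

# Hypothetical key; a real system would keep it under tight control.
SIGNING_KEY = b"demo-key-not-for-production"


def sign(data: bytes, key: bytes = SIGNING_KEY) -> str:
    """Return a keyed SHA-256 digest of the object's bytes."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify(data: bytes, signature: str, key: bytes = SIGNING_KEY) -> bool:
    """Check that the bytes still match the signature made at ingest."""
    return hmac.compare_digest(sign(data, key), signature)


class ObjectStore:
    """Toy in-memory store holding (bytes, signature) per object ID."""

    def __init__(self) -> None:
        self._objects: Dict[str, Tuple[bytes, str]] = {}

    def ingest(self, object_id: str, data: bytes) -> str:
        signature = sign(data)  # each object is signed when ingested
        self._objects[object_id] = (data, signature)
        return signature

    def audit(self) -> List[str]:
        """Return IDs of objects whose bytes no longer match their signature."""
        return [
            oid
            for oid, (data, sig) in self._objects.items()
            if not verify(data, sig)
        ]


store = ObjectStore()
store.ingest("dom-0001", b"first object")
store.ingest("dom-0002", b"second object")

# Simulate silent corruption of one stored object.
_, original_sig = store._objects["dom-0002"]
store._objects["dom-0002"] = (b"sec0nd object", original_sig)

corrupted = store.audit()
```

Running `audit` on a schedule would correspond to the continuous monitoring bullet; recovery of a flagged object from another copy is outside this sketch.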

31 Procuring physical storage in volume
- A major cost is in physical storage
- The market for storage systems is changing rapidly, and this implies that "lock-in" is not sensible
- We thus need flexibility to change supplier over time
- Cost of storage is reducing by 30-40% per year
- Hence procure on rolling basis just ahead of demand
- Replace storage on a rolling basis on expiry of warranty
- The rolling programmes imply the need to be able to support a heterogeneous product solution
- The design of the logical architecture thus supports storage sourced from multiple storage vendors
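The case for buying just ahead of demand can be shown with a small calculation. The sketch below assumes a 35% yearly price decline (the midpoint of the 30-40% range on the slide); the starting price and the yearly demand figures are invented purely for illustration.

```python
import math
from typing import Sequence


def upfront_cost(demand_tb: Sequence[float], price_per_tb: float) -> float:
    """Cost of buying all capacity in year 0 at today's price."""
    return sum(demand_tb) * price_per_tb


def rolling_cost(
    demand_tb: Sequence[float], price_per_tb: float, annual_decline: float
) -> float:
    """Cost of buying each year's capacity at that year's (lower) price."""
    return sum(
        tb * price_per_tb * (1.0 - annual_decline) ** year
        for year, tb in enumerate(demand_tb)
    )


# Hypothetical figures: 100 TB needed in each of five years, 1000 per TB today.
demand = [100.0] * 5
all_at_once = upfront_cost(demand, 1000.0)
year_by_year = rolling_cost(demand, 1000.0, 0.35)
saving_fraction = 1.0 - year_by_year / all_at_once
```

With these assumed inputs the rolling purchase costs roughly half of the up-front purchase, before counting the further benefit of not powering or maintaining idle capacity.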

32 Disaster tolerance and the organisation of storage clusters
- One can obtain commercial disaster recovery (DR) solutions for common equipment configurations
- However one cannot obtain such solutions for systems comprising multi-100 TB systems
- So we must build the need for DR into the design of the system
- A single site solution, subject to a common-mode disaster, would suffer considerable loss of availability after a disaster, and so is not acceptable
- This implies that we need a multi-site solution
- Conventionally these are based on a master-standby arrangement, where only 50% of kit is delivering normal service
- Our design is based on the use of multiple autonomous independent peer clusters that cross-synchronise, so 100% of the kit delivers normal service

33 DOM architecture in the context of the storage solution market
- The dominant segment of the market focuses on delivering performance within a highly resilient single cluster
- However:
  - Many of our objects will be rarely accessed, so we do not want to pay for "maximised" performance we do not need
  - We have resilience by using multiple clusters, hence we have a reduced need for resilience within a cluster, so we do not want to pay for "maximised" resilience we do not need
- We are using these drivers to design a cost-effective large scale resilient solution

34 DOM storage subsystem architecture - overview
[Diagram: DOM Shared Services (Unique ID, Signing, Logging) and two storage clusters, each comprising a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage]

35 DOM storage subsystem architecture - access
[Diagram: DOM central and DOM Shared Services (Unique ID, Signing, Logging) and two storage clusters, each comprising a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage]
Normal access/delivery is from local storage cluster

36 DOM storage subsystem architecture - access
[Diagram: DOM central and DOM Shared Services (Unique ID, Signing, Logging) and two storage clusters, each comprising a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage]
When a cluster is off-line then access/delivery is from a remote storage cluster

37 DOM storage subsystem architecture - ingest
[Diagram: DOM central and DOM Shared Services (Unique ID, Signing, Logging) and two storage clusters, each comprising a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage. Step labels: Signing; Store; Synchronise remote store]
Normal ingest is to the local storage cluster and then the remote cluster is synchronised

38 DOM storage subsystem architecture - ingest
[Diagram: DOM central and DOM Shared Services (Unique ID, Signing, Logging) and two storage clusters, each comprising a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage. Step labels: Signing; Store; Synchronise remote store later]
When a cluster is off-line then ingest is managed by the remote storage cluster and the local cluster is synchronised later
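The access and ingest behaviour on the last four slides can be sketched with two in-memory peer clusters. This is a toy model under stated assumptions: the class names and site names are hypothetical, synchronisation is shown as an immediate copy rather than a background process, and signing and gateways are left out.

```python
from typing import Dict, Set


class Cluster:
    """One storage cluster holding objects by ID."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.objects: Dict[str, bytes] = {}


class PeerPair:
    """Two peer clusters that cross-synchronise (illustrative only)."""

    def __init__(self, a: Cluster, b: Cluster) -> None:
        self.clusters = {a.name: a, b.name: b}
        # Object IDs each cluster still has to receive from its peer.
        self.pending: Dict[str, Set[str]] = {a.name: set(), b.name: set()}

    def _peer(self, name: str) -> Cluster:
        return next(c for n, c in self.clusters.items() if n != name)

    def ingest(self, local_name: str, object_id: str, data: bytes) -> str:
        """Store at the local cluster if it is up, else at the remote one."""
        local, remote = self.clusters[local_name], self._peer(local_name)
        primary, secondary = (local, remote) if local.online else (remote, local)
        if not primary.online:
            raise RuntimeError("no cluster available for ingest")
        primary.objects[object_id] = data
        if secondary.online:
            secondary.objects[object_id] = data  # synchronise the peer now
        else:
            self.pending[secondary.name].add(object_id)  # synchronise later
        return primary.name

    def read(self, local_name: str, object_id: str) -> bytes:
        """Deliver from the local cluster, falling back to the remote one."""
        local, remote = self.clusters[local_name], self._peer(local_name)
        source = local if local.online else remote
        if not source.online:
            raise RuntimeError("no cluster available for access")
        return source.objects[object_id]

    def bring_online(self, name: str) -> None:
        """Mark a cluster as back and copy over what it missed."""
        cluster, peer = self.clusters[name], self._peer(name)
        cluster.online = True
        for object_id in sorted(self.pending[name]):
            cluster.objects[object_id] = peer.objects[object_id]
        self.pending[name].clear()


pair = PeerPair(Cluster("site-a"), Cluster("site-b"))
first = pair.ingest("site-a", "dom-1", b"one")   # normal ingest

pair.clusters["site-a"].online = False           # local cluster goes off-line
second = pair.ingest("site-a", "dom-2", b"two")  # handled by the remote peer
fallback = pair.read("site-a", "dom-2")          # delivered from the remote peer

pair.bring_online("site-a")                      # local cluster catches up
```

Because both clusters accept reads and writes in normal operation, neither sits idle as a standby, which is the 100% utilisation point made on the disaster tolerance slide.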

39 In conclusion
- We plan for generations of physical storage
  - Migration from one generation to the next
  - Allow changes of supplier
  - Purchase incrementally in modest quantities
  - Move quickly when required
  - Be cost conscious
- We provide assurance that an object is held and re-presented as when it was ingested
- We are designing a cost-effective large scale resilient solution
- In summary: we take a long term view

40 Major Programmes/4: Web Archiving Programme

41 Structure of Programme
Web Archiving Programme is a collaborative initiative, roughly implemented across two consortia:
- UK Web Archiving Consortium
  - Developing a selective approach to web archiving, procuring a common web archiving infrastructure and software to begin archiving activities at the earliest
- International Internet Preservation Consortium
  - Developing advanced web archiving technologies for the long-term, large-scale, continuous crawling requirements enabled through legislation

42 UK Web Archiving Consortium
- Developing a selective approach to web archiving
- License for PANDAS about to be signed with NLA
- Sub-licenses with consortium partners and contractor to follow
- ITT concluded with Magus Research winning the contract:
  - Implement a common web archiving infrastructure (lots of Linux machines + PANDAS)
  - Provide customisation/development of PANDAS
  - Provide help desk and support

43 International Internet Preservation Consortium
- Developing advanced web archiving technologies
- Smart Crawler
  - Continuous adaptive crawler, adjusting crawl priority on the fly
  - Based on IA Heritrix
  - Working on requirements now
  - Expect to begin tender process in June
- Content Management
- Archival formats
- Framework
- Metrics and Test Bed
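"Adjusting crawl priority on the fly" can be pictured as a priority queue whose entries may be re-ranked while the crawl runs. The sketch below is a generic illustration using Python's `heapq` with lazy invalidation of stale entries; it is not the Heritrix API, and the class name, URLs and priority values are hypothetical.

```python
import heapq
import itertools
from typing import Dict, List, Optional, Tuple


class CrawlFrontier:
    """Queue of URLs where a lower priority value means crawl sooner."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, str]] = []
        # Latest (priority, sequence) per URL; older heap entries are stale.
        self._current: Dict[str, Tuple[float, int]] = {}
        self._counter = itertools.count()

    def set_priority(self, url: str, priority: float) -> None:
        """Add a URL, or change the priority of one already queued."""
        seq = next(self._counter)
        self._current[url] = (priority, seq)
        heapq.heappush(self._heap, (priority, seq, url))

    def pop(self) -> Optional[str]:
        """Return the most urgent URL, skipping stale entries."""
        while self._heap:
            priority, seq, url = heapq.heappop(self._heap)
            if self._current.get(url) == (priority, seq):
                del self._current[url]
                return url
        return None


frontier = CrawlFrontier()
frontier.set_priority("http://example.org/a", 5.0)
frontier.set_priority("http://example.org/b", 3.0)
# Suppose page "a" is observed to change often: promote it mid-crawl.
frontier.set_priority("http://example.org/a", 1.0)

order = [frontier.pop(), frontier.pop(), frontier.pop()]
```

Re-pushing an entry instead of searching the heap keeps each priority change cheap, at the cost of leaving stale entries that are discarded when popped.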

44 External Collaboration

45 Digital Library Collaborations/Partnerships: Current
- UK Digital Preservation Coalition: Founder Member
- TEL (The European Library Project)
- Web Archiving
  - UK: JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library of Wales
  - International Internet Preservation Consortium: BNF, Library of Congress, Internet Archive, National Archives & Library of Canada, National Library of Australia, National Library of Italy, National Libraries of Nordic Countries
- JISC Funded - Digital Curation Centre
- Persistent Identifiers: DOI Foundation, European National Libraries (KB & DDB)
- Resource Discovery: Union Catalogues (SUNCAT)
- Digital Library Federation

46 Digital Library Collaborations/Partnerships: Potential
- Secure Legal Deposit Network: 6 Legal Deposit Libraries
- Global Digital Format Registry: Potential Partners (National Archives, DLF)
- Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration): KB (Netherlands National Library & Other Partners – FP6 Bid)
- Digital Rights Management: Potential Partners (Publishers, JISC)
- Metadata: Publishers, Others?
- Authentication: JISC?
- Resource Discovery: Search Engine Vendors, Researchers
- Others???

47 Conclusions
- Beautiful Building!
- Market & Outcome Focus
- Huge IT Agenda
- Collaboration Is Critical To Our Success
- Can You Work With Us?

