A Multi-Tiered Architecture for Distributed Data Collection and Centralized Data Delivery Stacy Kowalczyk and James Halliday April 28, 2008.


1 A Multi-Tiered Architecture for Distributed Data Collection and Centralized Data Delivery Stacy Kowalczyk and James Halliday April 28, 2008

2 Project Overview
IN Harmony is
- An IMLS-funded grant
  - Awarded in Fall 2004
  - To be completed in Fall 2008
- A partnership of
  - Indiana University Digital Library Program
  - Indiana University Lilly Library
  - Indiana State Library
  - Indiana State Museum
  - Indiana Historical Society
April 28, 2008 | IN Harmony – DLP Spring Forum 2008

3 Project Goals
1. To provide a model for fostering collaborative digital library development by partnering with institutions with complementary collections;
2. To digitize a portion of the sheet music from these collections and offer access to these materials free of charge on the web;
3. To bring these materials and their attendant metadata together on a single web site, offering both federated searching of the entire collection and searching of one or more selected collections;

4 Deliverables
- Tools to
  - Process the images
  - Capture metadata
  - Provide search and display functions
- 10,000 pieces of sheet music scanned and cataloged
  - 4,000 Indiana University Lilly Library
  - 2,000 Indiana State Library
  - 2,000 Indiana State Museum
  - 2,000 Indiana Historical Society

5 Cataloging and Imaging Workflow Goals
- Data integrity
- Quality of the scans
- Quality of the metadata
- Accuracy of the links between page images
- Accuracy of the links between metadata and images
- Simplicity of use
- Balance of flexibility and constraints

6 Cataloging and Imaging Use Cases
1. Catalog first
2. Scanning first
3. Metadata created in another system and imported into IN Harmony

7 Digitizing Quality Control
Two-phase quality control process
Automated QC process verifies:
- All TIFF tags of every digital file
- That each TIFF is uncompressed
- File names
- Embedded profile appropriate to its bit depth
- Consistency of pixel dimensions within a score
- Appropriate resolution
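The automated checks on this slide could be sketched roughly as follows. This is an illustration in Python, not the project's Perl implementation. It assumes the TIFF tag values have already been extracted into plain dictionaries, and the file name pattern, minimum resolution, profile names, and field names are all hypothetical.

```python
import re

# Hypothetical values for illustration only; the slides do not give the
# project's actual naming convention, resolution, or profile names.
FILENAME_PATTERN = re.compile(r"^[a-z]{3}-\d{5}-\d{3}\.tif$")
MIN_DPI = 400
EXPECTED_PROFILE = {8: "Gray Gamma 2.2", 24: "Adobe RGB (1998)"}
UNCOMPRESSED = 1  # TIFF Compression tag value 1 means no compression


def check_page(page):
    """Return a list of problems found in one page's extracted tag values."""
    problems = []
    if page["compression"] != UNCOMPRESSED:
        problems.append("TIFF is compressed")
    if not FILENAME_PATTERN.match(page["filename"]):
        problems.append("file name does not match convention")
    expected = EXPECTED_PROFILE.get(page["bit_depth"])
    if expected is None or page.get("icc_profile") != expected:
        problems.append("embedded profile does not suit bit depth")
    if page["dpi"] < MIN_DPI:
        problems.append("resolution below minimum")
    return problems


def check_score(pages):
    """Run the per-page checks, then compare pixel dimensions across the score."""
    report = {p["filename"]: check_page(p) for p in pages}
    sizes = {(p["width"], p["height"]) for p in pages}
    if len(sizes) > 1:
        # Assumes "consistency" means identical dimensions on every page.
        report["(score)"] = ["pixel dimensions differ between pages"]
    return report
```

A score passes the automated phase in this sketch when every list in the report is empty and no "(score)" entry is present.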

8 Digitizing Quality Control (2)
Manual QC – at 100% pixel display, verify:
- Correct page orientation and order
- Correct color balance
- Sharp and in-focus scan
- No digital artifacts
When all QC is passed, derivative files are created:
- Large and small JPGs for screen delivery
- PDF sized for 8.5 x 11 printing

9 Digitizing Quality Control Software

13 Designing the metadata model
- User studies
- Work with the partners
- Define fields
- Write cataloging guidelines with partner input
- Representation in MODS

14 Types of fields
- Title elements
- Name elements
- Publication elements
- Subject elements
- Identification elements
- Note elements
- Cover information

15 Metadata Collection Tool

18 Public Search and Discovery System Demo

19 Architecture Overview
Jim Halliday

20 IN Harmony Technical Overview
[Architecture diagram. Components labeled: Fedora, Web Browser, SRU and HTTP, Mass Storage System, Oracle, Cataloging Client, Quality Control, Scanner, Authentication Service, Java Swing, MODS Export, FTP, Perl Web Application]

21 Getting Data Into IN Harmony
- 2 primary data sources
  - Cataloging client
  - Image QC/upload application
- Other data sources
  - XML data exported from other cataloging systems
  - Score images exported from older systems

23 Image QC/upload application
1. User scans scores and uploads to IN Harmony server
2. User accesses Perl-based web application to initiate automated quality control
3. A second user proceeds with manual QC, then uses web application to signal that manual QC is finished
4. The application moves and backs up the files, creates derivatives, and alerts both Fedora and the internal database that the process is complete
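The four steps above can be read as a small state machine. The sketch below is one hedged way to model it in Python; the class, stage names, and the rule that the manual QC user must differ from the uploader are assumptions drawn from the wording "a second user", not details of the actual Perl application.

```python
from enum import Enum


class Stage(Enum):
    UPLOADED = 1          # step 1: scans uploaded to the server
    AUTO_QC_PASSED = 2    # step 2: automated QC run from the web application
    MANUAL_QC_PASSED = 3  # step 3: a second user signs off on manual QC
    COMPLETE = 4          # step 4: files moved, derivatives made, systems alerted


# Each stage has exactly one successor, in the order the slide lists the steps.
NEXT_STAGE = {
    Stage.UPLOADED: Stage.AUTO_QC_PASSED,
    Stage.AUTO_QC_PASSED: Stage.MANUAL_QC_PASSED,
    Stage.MANUAL_QC_PASSED: Stage.COMPLETE,
}


class ScoreBatch:
    """Hypothetical record of one uploaded score moving through QC."""

    def __init__(self, score_id, uploader):
        self.score_id = score_id
        self.uploader = uploader
        self.stage = Stage.UPLOADED

    def advance(self, user):
        """Move to the next stage, enforcing the assumed second-user rule."""
        if self.stage is Stage.COMPLETE:
            raise ValueError("batch is already complete")
        target = NEXT_STAGE[self.stage]
        if target is Stage.MANUAL_QC_PASSED and user == self.uploader:
            raise PermissionError("manual QC needs a second user")
        self.stage = target
        return self.stage
```

Keeping the transitions in a single table makes it hard to skip a phase, which matches the slide's point that derivatives are only created once both QC phases are done.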

24 IN Harmony Derivatives
- Three sizes of JPGs produced per page
  - Full (1200px high)
  - Screen (600px high)
  - Thumb (200px high)
- Multi-page, playable PDF
  - Approx. 1 MB for an average score
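Because each JPG size is defined by height only, the width follows from the master image's aspect ratio. A minimal sketch of that arithmetic, with a hypothetical function name and an assumed rounding rule:

```python
# Target heights in pixels, taken from the slide.
DERIVATIVE_HEIGHTS = {"full": 1200, "screen": 600, "thumb": 200}


def derivative_sizes(master_width, master_height):
    """Return (width, height) for each derivative, preserving aspect ratio."""
    sizes = {}
    for name, height in DERIVATIVE_HEIGHTS.items():
        # Rounding to the nearest pixel is an assumption of this sketch.
        width = round(master_width * height / master_height)
        sizes[name] = (width, height)
    return sizes
```

For a 3600 x 4800 master, `derivative_sizes(3600, 4800)` returns `{'full': (900, 1200), 'screen': (450, 600), 'thumb': (150, 200)}`.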

26 IN Harmony cataloging client
- Standalone Java Swing-based client
- Connects to Oracle database and outputs MODS for Fedora ingestion
- Implemented as a client-server application via web services using Axis
- Specialized UI components (such as ‘smart’ combo boxes) assist with quick, correct data entry

28 Internal IN Harmony database
- Oracle database stores record and user data in our own internal format
- Communicates with upload/QC application and cataloging client
- Cataloging client and internal scripts can output to MODS format for ingestion into Fedora
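The step from internal record to MODS might look something like the sketch below. The internal format is not shown in the slides, so the dictionary keys here are hypothetical, and only a handful of standard MODS elements (titleInfo, name, originInfo, identifier) are emitted. It is written in Python for brevity, not in the project's Java.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


def _child(parent, tag, text=None, **attrs):
    """Append a namespaced MODS element and return it."""
    element = ET.SubElement(parent, f"{{{MODS_NS}}}{tag}", attrs)
    if text is not None:
        element.text = text
    return element


def record_to_mods(record):
    """Serialize a hypothetical internal record (a dict) as a MODS string."""
    mods = ET.Element(f"{{{MODS_NS}}}mods")
    _child(_child(mods, "titleInfo"), "title", record["title"])
    for person in record.get("names", []):
        _child(_child(mods, "name"), "namePart", person)
    origin = _child(mods, "originInfo")
    _child(origin, "dateIssued", record["date_issued"])
    _child(mods, "identifier", record["identifier"], type="local")
    return ET.tostring(mods, encoding="unicode")


# Made-up sample values, not taken from the IN Harmony collections.
sample = {
    "title": "Example Waltz",
    "names": ["Doe, Jane"],
    "date_issued": "1905",
    "identifier": "example-0001",
}
mods_xml = record_to_mods(sample)
```

A real export would cover all of the field types listed on the metadata slides; this only shows the shape of the mapping.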

30 IN Harmony authentication
- CAS (IU’s Central Authentication Service) is used to authenticate all users
- Non-IU users must create IU Guest Accounts to authenticate
- All account/password maintenance in user’s control

32 Fedora and IN Harmony
- Fedora used as a single storage and infrastructure solution for Digital Library Program projects at IU
- Data (score images and metadata) ingested into Fedora and referenced as METS objects
  - Master images sent to IU’s mass storage system
  - Derivatives stored internally
- Objects indexed using Lucene for SRU-based searching

33 Fedora Object Model
[Diagram. Levels labeled: Collection, Sheet music, Copy, Page]
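One plausible reading of the four levels is a simple containment hierarchy: a collection holds sheet music records, each record has one or more copies, and each copy has pages. The Python sketch below models that reading. The containment direction and every field name are assumptions, since the slide lists only the level names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Page:
    order: int
    image_id: str  # hypothetical reference to the page image


@dataclass
class Copy:
    holding_institution: str
    pages: List[Page] = field(default_factory=list)


@dataclass
class SheetMusic:
    title: str
    copies: List[Copy] = field(default_factory=list)


@dataclass
class Collection:
    name: str
    items: List[SheetMusic] = field(default_factory=list)

    def page_count(self):
        """Total pages across every copy of every item in the collection."""
        return sum(len(copy.pages) for item in self.items for copy in item.copies)
```

Separating Copy from SheetMusic would let two partner institutions hold the same piece without duplicating its descriptive record.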

35 IN Harmony end-user interface
- Java Struts-based web application
- Offers searching, browsing, and record display
- Each partner institution is offered a personalized view of their data only
Interaction with Fedora
- Application sends CQL queries to Fedora and retrieves MODS data, which is transformed via XSLT
- PURLs (persistent URLs) are used to access image derivatives
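A CQL query sent over SRU is just a URL with a few standard parameters. The sketch below builds one in Python and shows how a partner-only view could be expressed as an extra CQL clause. The base URL and the index names `title` and `collection` are hypothetical; only the SRU parameter names (`operation`, `version`, `query`, `maximumRecords`) come from the SRU specification.

```python
from urllib.parse import urlencode


def cql_quote(term):
    """Quote a CQL term, escaping backslashes and embedded double quotes."""
    return '"' + term.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_sru_url(base_url, title_words, collection=None, max_records=20):
    """Build an SRU searchRetrieve URL for a title search.

    If `collection` is given, the search is limited to that partner's
    records, mirroring the per-institution view on the slide.
    """
    clauses = [f"title all {cql_quote(title_words)}"]
    if collection:
        clauses.append(f"collection = {cql_quote(collection)}")
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": " and ".join(clauses),
        "maximumRecords": str(max_records),
    }
    return base_url + "?" + urlencode(params)
```

The MODS records in the response would then go through XSLT for display, which this sketch does not cover.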

36 METS Navigator
- METS Navigator is used to page through scores online
- Uses METS structMap to facilitate navigation
- Allows views of multiple sizes of images
- Released by IU as open source – see http://metsnavigator.sourceforge.net
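To illustrate how a structMap can drive paging, the sketch below reads the `div` elements of a small METS document and returns the pages in display order. It is a Python illustration of the idea, not METS Navigator's own code, and the sample document, labels, and file IDs are made up.

```python
import xml.etree.ElementTree as ET

METS_NS = {"mets": "http://www.loc.gov/METS/"}

# Made-up METS fragment; the pages are deliberately listed out of order.
SAMPLE_METS = """<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:structMap TYPE="physical">
    <mets:div TYPE="score">
      <mets:div TYPE="page" ORDER="2" LABEL="Page 1"><mets:fptr FILEID="img002"/></mets:div>
      <mets:div TYPE="page" ORDER="1" LABEL="Cover"><mets:fptr FILEID="img001"/></mets:div>
    </mets:div>
  </mets:structMap>
</mets:mets>"""


def page_sequence(mets_xml):
    """Return (order, label, file_id) tuples sorted by the ORDER attribute."""
    root = ET.fromstring(mets_xml)
    pages = []
    for div in root.findall(".//mets:structMap//mets:div[@ORDER]", METS_NS):
        fptr = div.find("mets:fptr", METS_NS)
        file_id = fptr.get("FILEID") if fptr is not None else None
        pages.append((int(div.get("ORDER")), div.get("LABEL"), file_id))
    return sorted(pages)
```

With the sample above, `page_sequence(SAMPLE_METS)` returns `[(1, 'Cover', 'img001'), (2, 'Page 1', 'img002')]`, so "next page" and "previous page" reduce to moving an index through that list.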

37 IN Harmony Technical Overview
[Architecture diagram. Components labeled: Fedora, Web Browser, SRU and HTTP, Mass Storage System, Oracle, Cataloging Client, Quality Control, Scanner, Authentication Service, Java Swing, MODS Export, FTP, Perl Web Application]

38 IN Harmony Links
- IN Harmony Public Interface
- IN Harmony Project Information
- Cataloging Tool
  - Release date – June 2008

39 Questions?

