Digital Asset Management and Publication with LadyBird Eric James programmer/analyst library IT Yale University Library 12 July 2013.

1 Digital Asset Management and Publication with LadyBird
Eric James, programmer/analyst, Library IT, Yale University Library
12 July 2013

2 What is LadyBird?
Bebop song by Tadd Dameron
First Lady, Lyndon B. Johnson presidency
Old dog from King of the Hill
Digital asset management tool

3 LadyBird - Digital Asset Management Tool
From its origin, LadyBird has been a system that processes metadata and temporarily houses digital assets to be published.
It provides a configurable system for migrating digital objects and collections, normalizing metadata, and preserving and publishing content.
It was initially written in Microsoft .NET and C#, hosted on Windows 2008 using Microsoft SQL Server.
Some work has been done on Java modules (for import).
Wish list: migrate to JRuby/Rails.

4 LadyBird components
Web interface
Job processing engine (imports)
Export processing engine (exports)
Bag creation
Heartbeat monitor
Application cleanup system
This presentation focuses on the workflow and concepts involved in publishing digital objects with metadata to Fedora.

5 LadyBird concepts I
The core of the application is the object table.
Collection: a department within the library and Yale (this comes into play later when discussing the c# tables).
Project: a project specific to a collection.
An object belongs to a project, and a project belongs to a collection.
Currently there are 16 collections with 34 projects and 1.53 million objects.
We call objects oids. Technically, oid is the object id column of the object table, but we tend to use it to describe the whole ball of wax.
User table: each cataloger is registered, and role and permission settings are used throughout the app.

6 LadyBird concepts II
Processing objects is all about the spreadsheet.
Each row is an object.
Each column represents either a function or a metadata field.
Function examples:
  {F1} is the object as identified by oid (primary key of the object table); if left blank, that is the signal to create a new oid.
  {F4} is the parent oid (for complex objects).
  {F40} can have the value PUBLISH, telling LadyBird to auto-publish this object.
Metadata examples: {FDID=58} call number, {FDID=262} Host, creator, etc.
The cataloger can take advantage of Excel functionality (like repeating fields) to quickly create a spreadsheet for batch import.
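The column conventions above can be sketched in Ruby. This is a minimal illustration, not LadyBird's actual importer (which the slides describe as .NET/C#): the helper names `classify_header`, `parse_row`, `new_object?`, and `auto_publish?` are hypothetical, and the sketch assumes a tab-delimited row whose headers use the `{F…}` and `{FDID=…}` forms shown on the slide.

```ruby
# Hypothetical helpers: classify LadyBird-style column headers and read one row.
FUNCTION_HEADER = /\A\{F(\d+)\}\z/
FDID_HEADER     = /\A\{FDID=(\d+)\}\z/

# Decide whether a header names a function column or a metadata (fdid) column.
def classify_header(header)
  case header.strip
  when FUNCTION_HEADER then { kind: :function, id: Regexp.last_match(1).to_i }
  when FDID_HEADER     then { kind: :metadata, id: Regexp.last_match(1).to_i }
  else                      { kind: :unknown, id: nil }
  end
end

# Split one tab-delimited line and file each value under its function or fdid.
def parse_row(headers, line)
  values = line.chomp.split("\t", -1) # -1 keeps trailing empty cells
  row = { functions: {}, metadata: {} }
  headers.each_with_index do |header, i|
    col = classify_header(header)
    value = values[i].to_s
    case col[:kind]
    when :function then row[:functions][col[:id]] = value
    when :metadata then row[:metadata][col[:id]] = value
    end
  end
  row
end

# A blank {F1} signals that a new oid should be created.
def new_object?(row)
  row[:functions][1].to_s.empty?
end

# {F40} = PUBLISH requests automatic publication (compared case-insensitively here).
def auto_publish?(row)
  row[:functions][40].to_s.casecmp?("publish")
end

headers = ["{F1}", "{F4}", "{F40}", "{FDID=58}"]
row = parse_row(headers, "\t\tPUBLISH\tMS 123\n")
puts new_object?(row)   # true: {F1} was left blank
puts auto_publish?(row) # true
puts row[:metadata][58] # MS 123
```

The case-insensitive comparison for `{F40}` is a design choice of this sketch; the slides show both `PUBLISH` and `publish` without stating whether case matters.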

7 LadyBird concepts III
field_definition (fdid) table (230 metadata fields)
  51  Cataloger
  52  Record source
  53  Record date
  54  Record modified date
  55  Record ID
  56  Local record ID
  57  Local record ID, other
  58  Call number
  59  Accession number
  60  Box
The values are either strings or acid values (more on acids later).

8 LadyBird concepts IV
Import tables are all about the spreadsheets, though you can also import MARC or EAD records by bibid, barcode, or handle. In that case the records are deserialized into fdids, and any spreadsheet data overrides the records.
  im_job (1 master row per spreadsheet)
  im_job_exHead (column headers from the spreadsheet)
  im_job_contents (values)
  im_files (for files)
  import_checksum (for files)
  im_job_contents_history
Job tracking (overall tracking associates an imported oid with a specific job):
  trk_project
  trk_job
  trk_job_contents
  trk_oid

9 LadyBird concepts V
The C# tables: c for current, # for each collection.
The metadata home: data imported into the im tables is finally transferred here.
There is a set of tables for each collection. Example: # = 13 (collection: Hydra, project: Hydra Test)
  c13 (master list of oids)
  c13_strings
  c13_longstrings
  c13_acid
Each row basically contains an oid/fdid/value, so given an oid one could get all metadata fields for that object as rows from these tables. Each row also has a favid for additional values associated with the fdid.
There are also corresponding p# tables (p for past) that keep an audit trail of any updates to specific oids.
The c# tables are designed for high volume. Exploring better options, such as hashing.
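To illustrate the oid/fdid/value row layout, here is a small in-memory Ruby sketch. It assumes rows have already been fetched from a collection's tables; the `MetadataRow` struct, the `metadata_for` helper, and the sample oids and values are hypothetical and only stand in for the real schema.

```ruby
# Hypothetical stand-in for one row of a c#_strings-style table.
MetadataRow = Struct.new(:oid, :fdid, :value)

# Collect every metadata value for one oid, grouped by field definition id.
def metadata_for(rows, oid)
  rows.select { |row| row.oid == oid }
      .group_by(&:fdid)
      .transform_values { |matches| matches.map(&:value) }
end

rows = [
  MetadataRow.new(1001, 58, "MS 123"),   # fdid 58 = call number
  MetadataRow.new(1001, 60, "Box 4"),    # fdid 60 = box
  MetadataRow.new(1001, 60, "Box 5"),    # a repeated field yields two rows
  MetadataRow.new(1002, 58, "MS 456")
]

p metadata_for(rows, 1001)
```

Grouping by fdid keeps repeated fields together, which matches the one-row-per-value layout the slide describes.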

10 LadyBird concepts VI
Acid (authority control): a system for using controlled vocabulary for metadata fields.
Fdid 62 = Host, Creator
Sample values from the acid table (columns: acid, fdid, value):
  Luhan, Mabel Dodge,
  Dobbs, Arthur,
  Filson, John, ca
  Thomson, Charles,
  Hutchins, Thomas,
  Adair, James, ca
So if, for an oid row in the spreadsheet, the fdid 62 column is given the corresponding acid value, that field resolves to Adair, James, ca
Currently 155,415 values.
Potential for more sophisticated uses with linked data.
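A rough Ruby sketch of acid resolution follows. The numeric acid ids below are invented for illustration, since the real ids do not appear in the slides, and `resolve_value` is a hypothetical helper rather than LadyBird's own lookup. The value strings are copied as they appear on the slide.

```ruby
# Hypothetical acid rows; the ids 1 and 2 are placeholders, not real LadyBird acids.
AcidEntry = Struct.new(:acid, :fdid, :value)

ACID_ENTRIES = [
  AcidEntry.new(1, 62, "Dobbs, Arthur,"),
  AcidEntry.new(2, 62, "Adair, James, ca")
].freeze

ACID_BY_ID = ACID_ENTRIES.to_h { |entry| [entry.acid, entry] }
ACID_FDIDS = ACID_ENTRIES.map(&:fdid).uniq.freeze

# Resolve a spreadsheet cell: acid-controlled fields look up the controlled
# value by acid id, other fields pass the string through unchanged.
def resolve_value(fdid, raw)
  return raw unless ACID_FDIDS.include?(fdid)

  entry = ACID_BY_ID[Integer(raw, exception: false)]
  return nil if entry.nil? || entry.fdid != fdid # unknown acid or wrong field

  entry.value
end

puts resolve_value(62, "2")      # Adair, James, ca
puts resolve_value(58, "MS 123") # MS 123 (plain string field)
```

Returning `nil` for an unknown acid is one possible policy; a real importer might instead reject the row or create a new authority entry.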

11 LadyBird sample workflow start
A workstation is mounted with a job folder for both import and export.
  Windows: \\\project25\import\
  Mac: SMB://
  Windows: \\\project25\export\
  Mac: SMB://
Project25 corresponds to the project table.
Create a folder in the import directory and drag files into folders or subfolders.
LadyBird will then detect that folder and create a job for it under the Dashboard menu selection.

12 LadyBird dashboard

13 Add digital object to folder

14 Go to dashboard and process this folder

15 Receive confirmation
Subject: LadyBird Import Complete job: test_open_rep
Your import has been processed. test_open_rep
Visit your dashboard in Ladybird for your most recent jobs.
View job:
* A jobcomplete.txt file with the time is added to the import folder so the app knows that directory is complete.
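The marker-file convention can be sketched as follows. This is an assumption-laden illustration, not LadyBird's code: the helper names are hypothetical, and the timestamp format written into jobcomplete.txt (ISO 8601 here) is a guess, since the slide only says the file contains the time.

```ruby
require "time"
require "tmpdir"

MARKER = "jobcomplete.txt"

# A job folder counts as complete once the marker file exists.
def job_complete?(dir)
  File.exist?(File.join(dir, MARKER))
end

# Write the marker with the completion time (format is an assumption).
def mark_complete(dir, now = Time.now)
  File.write(File.join(dir, MARKER), now.iso8601)
end

# List job folders under the import root that still need processing.
def pending_jobs(import_root)
  Dir.children(import_root)
     .map { |name| File.join(import_root, name) }
     .select { |path| File.directory?(path) && !job_complete?(path) }
     .sort
end

# Demonstrate against a temporary directory instead of a real import share.
Dir.mktmpdir do |root|
  job = File.join(root, "test_open_rep")
  Dir.mkdir(job)
  puts pending_jobs(root).length # 1
  mark_complete(job)
  puts pending_jobs(root).length # 0
end
```

A marker file keeps the completion state next to the data, so a folder scanner needs no database round trip to skip finished jobs.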

16 View job

17 View set

18 New object -> Metadata (form)

19 Or, from View Set, Export as Job

20 Receive export confirmation
Subject: LadyBird Export Ready
Your export is ready.
\\birdcage\project25\export\ermadmix_46371_ _ xls

21 Spreadsheet: fill in and save as tab-delimited text file

22 Import

23 Import confirmation
Subject: LadyBird Import Complete job: ermadmix_import_062613_
Your import has been processed. ermadmix_import_062613_
Visit your dashboard in Ladybird for your most recent jobs.
View job:

24 Publish
Publishes automatically if {F40}=publish.
Or use the interface to check the file and metadata and explicitly click the publish button.

25 Publish (behind the scenes)
The oid is added to the hydra table with date (when added) and date published (when processing complete) timestamps.
Columns: id, oid, date, date published

26 oid added to hydra_publish table
Key fields:
  hpid:
  hcmid: 2
  cid: 9
  pid: 27
  oid:
  _oid: 0
  zindex: 0
  hydraID: null
  dateReady: :01:
  dateHydraStart: null

27 Rows for oid added to hydra_publish_path table
Key fields with example:
  hppid:
  hpid:
  type: jp2
  pathHTTP:
  pathUNC: \\\home\ladybird yul\ladybird\project27\publish\dl\ \ _page1.jp2
  md5: 35433b00ca9de2cdaed275c
  controlGroup: M
  mimeType: image/jp2
  dsid: jp2
  ingestMethod: filepath
  oidPointer: null

28 hydra_publish_path: typical files
  xml rights (hydra rights)
  xml metadata (MODS descMetadata)
  xml access (home-grown granular rights)
  pdf (transcript, YIPP)
  pdf2 (annotated transcript, YIPP)
  jp2 (derivative)
  jpg (derivatives)
  tif (master)

29 descMetadata - creation
A service (C# class and methods) is called upon hydra publish. It iterates through all the fdids for an oid and uses the XML DOM to create a MODS file.
This is basically a mapping of field definitions to the MODS schema.
There is the potential to map the fdids to any metadata format.
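The fdid-to-MODS mapping idea can be sketched in Ruby with REXML. The real service is a C# class, so this is only an analogous illustration: the `FDID_TO_MODS` table and `build_mods` helper are hypothetical, and the two element paths chosen for fdid 58 and fdid 62 are assumptions about a plausible mapping, not the mapping LadyBird uses.

```ruby
require "rexml/document"

MODS_NS = "http://www.loc.gov/mods/v3"

# Hypothetical mapping from fdid to a path of nested MODS element names.
FDID_TO_MODS = {
  58 => %w[location shelfLocator], # call number
  62 => %w[name namePart]          # host, creator
}.freeze

# Walk every fdid/value pair for one object and append the mapped elements.
def build_mods(fields)
  doc = REXML::Document.new
  root = doc.add_element("mods", { "xmlns" => MODS_NS })
  fields.each do |fdid, values|
    path = FDID_TO_MODS[fdid]
    next unless path # fdids without a mapping are skipped in this sketch

    Array(values).each do |value|
      # Create the nested elements for this value, then set the leaf text.
      leaf = path.inject(root) { |parent, name| parent.add_element(name) }
      leaf.text = value
    end
  end
  doc
end

doc = build_mods({ 58 => ["MS 123"], 62 => ["Adair, James"] })
puts doc.to_s
```

Because the mapping is a plain lookup table, swapping in a different table would target another metadata format, which is the flexibility the slide points to.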

30 accessMetadata

31 Rights metadata

32 Transition into fedora hydra world
    select * from hydra_content_model
Columns: id, date, uid, contentModel
Rows (contentModel):
  simple
  complexParent
  complexChild
contentModel maps to an ActiveFedora model.

33 Transition into fedora hydra world II
    select * from hydra_content_model_ds
Columns: id, date, uid, hcmid, dsid, ingMethod, required
Rows (dsid, ingMethod, required):
  accessMetadata  pullHTTP  y
  descMetadata    pullHTTP  y
  rightsMetadata  pullHTTP  y
  tif             filepath  y
  jp2             filepath  y
  jpg             filepath  y
  accessMetadata  pullHTTP  y
  descMetadata    pullHTTP  y
  rightsMetadata  pullHTTP  y
  tif             filepath  n
  jp2             filepath  n
  jpg             filepath  n
  accessMetadata  pullHTTP  y
  descMetadata    pullHTTP  y
  rightsMetadata  pullHTTP  y
  tif             filepath  y
  jp2             filepath  y
  jpg             filepath  y
  oidPointer      pointer   n
  pdf             filepath  n
  pdf2            filepath  n
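One use of the required flag is a pre-ingest check that every required datastream has a staged file. The Ruby sketch below assumes the definitions and the staged dsids are already loaded into memory; `DatastreamDef` and `missing_required` are hypothetical names, and the sample definitions only echo a few rows of the table.

```ruby
# Hypothetical in-memory form of one hydra_content_model_ds row.
DatastreamDef = Struct.new(:dsid, :ingest_method, :required)

# Return the dsids that are required by the content model but not staged.
def missing_required(definitions, staged_dsids)
  definitions.select(&:required).map(&:dsid) - staged_dsids
end

definitions = [
  DatastreamDef.new("descMetadata", "pullHTTP", true),
  DatastreamDef.new("rightsMetadata", "pullHTTP", true),
  DatastreamDef.new("tif", "filepath", true),
  DatastreamDef.new("pdf", "filepath", false)
]

p missing_required(definitions, %w[descMetadata tif]) # ["rightsMetadata"]
```

An empty result means the object has everything its content model requires, while optional datastreams such as pdf never block the ingest.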

34 Example - simple content model
    require "active-fedora"

    class Simple < ActiveFedora::Base
      # A simple object is a member of a collection.
      belongs_to :collection, :property => :is_member_of

      # Datastreams, matching the dsids in hydra_content_model_ds.
      has_metadata :name => 'descMetadata', :type => Hydra::Datastream::SimpleMods
      has_metadata :name => 'accessMetadata', :type => Hydra::Datastream::AccessConditions
      has_metadata :name => 'rightsMetadata', :type => Hydra::Datastream::Rights
      has_metadata :name => 'propertyMetadata', :type => Hydra::Datastream::Properties

      # LadyBird identifiers are read from and written to propertyMetadata.
      delegate :oid, :to => "propertyMetadata", :unique => true
      delegate :projid, :to => "propertyMetadata", :unique => true
      delegate :cid, :to => "propertyMetadata", :unique => true
      delegate :zindex, :to => "propertyMetadata", :unique => true
      delegate :parentoid, :to => "propertyMetadata", :unique => true
    end

35 Example - Properties datastream
    require 'active_fedora'

    module Hydra
      module Datastream
        class Properties < ActiveFedora::OmDatastream
          # ERJ note: ladybird pid = projid, ladybird _oid = parentoid
          set_terminology do |t|
            t.root(:path => "root")
            t.oid(:path => "oid")
            t.cid(:path => "cid")
            t.projid(:path => "projid")
            t.zindex(:path => "zindex")
            t.parentoid(:path => "parentoid")
            t.ztotal(:path => "ztotal")
            t.oidpointer(:path => "oidpointer")
          end

          # Add the LadyBird identifiers to the Solr document.
          def to_solr(solr_doc = {})
            super(solr_doc)
            solr_doc['oid_isi'] = oid
            solr_doc['cid_isi'] = cid
            solr_doc['projid_isi'] = projid
            solr_doc['zindex_isi'] = zindex
            solr_doc['parentoid_isi'] = parentoid
            solr_doc['ztotal_isi'] = ztotal
            solr_doc['oidpointer_isi'] = oidpointer
            solr_doc
          end
        end
      end
    end

36 Workflow review
1. Add a folder with files to the import folder.
2. Process the folder. This creates the records in the database (oids, job tracking, c# instances, and file derivatives).
3. Export a spreadsheet. This creates a spreadsheet template for the folder of files in (1).
4. Fill in metadata in the spreadsheet: the main cataloging task.
5. Import the spreadsheet. This ultimately populates the c# tables with metadata from the oid rows of the spreadsheet.
6. Publish to hydra. This populates the hydra tables, creates the serialized metadata files (MODS, access, rights), and stages files in storage for ingest.

37 Ingest task
Set up within a hydra project.
The tiny_tds gem is used to connect to the LadyBird SQL Server database.

38 app/models (objects)
collection.rb: maps to pid (project) in LadyBird; parent to simple.rb and complex_parent.rb
simple.rb: 1 image with derivatives, no hierarchy
complex_parent.rb: parent to a set of images (like a book or image set)
complex_child.rb: 1 image with derivatives (like a page)
These relate to the hydra_content_model table.

39 app/models (datastreams)
coll_properties.rb
properties.rb
rights.rb
access_conditions.rb
simple_mods.rb

40 simple_mods.rb - indexing

41 rake yulhy4:ingest I
Properties:
  SQL Server connection config
  Mount of LadyBird storage
Uses the hydra_publish table as a queue (driven by this query until done):
    select top 1 a.hpid,a.oid,a.cid,,b.contentModel,a._oid
    from dbo.hydra_publish a, dbo.hydra_content_model b
    where a.dateHydraStart is null
      and a.dateReady is not null
      and a._oid=0
      and a.hcmid is not null
      and a.hcmid=b.hcmid
      and a.action='insert'
    order by a.dateReady
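The table-as-queue pattern can be simulated in memory with Ruby. This sketch mirrors the filter and ordering of the query but does not talk to SQL Server; `next_ready` and `drain` are hypothetical names, and it assumes a row leaves the queue when dateHydraStart is set, which the query implies but the slides do not state outright.

```ruby
# Pick the oldest ready row, mirroring the query's WHERE and ORDER BY clauses.
def next_ready(rows)
  rows.select do |row|
    row[:dateHydraStart].nil? &&
      !row[:dateReady].nil? &&
      row[:_oid] == 0 &&
      !row[:hcmid].nil? &&
      row[:action] == "insert"
  end.min_by { |row| row[:dateReady] }
end

# Process rows one at a time until none qualify, like "select top 1" in a loop.
def drain(rows, now: Time.now)
  processed = []
  while (row = next_ready(rows))
    row[:dateHydraStart] = now # assumed: marking the start removes it from the queue
    yield row if block_given?
    processed << row[:hpid]
  end
  processed
end

rows = [
  { hpid: 1, dateReady: Time.at(200), dateHydraStart: nil, _oid: 0, hcmid: 2, action: "insert" },
  { hpid: 2, dateReady: Time.at(100), dateHydraStart: nil, _oid: 0, hcmid: 2, action: "insert" },
  { hpid: 3, dateReady: nil, dateHydraStart: nil, _oid: 0, hcmid: 2, action: "insert" }
]

p drain(rows) # [2, 1]: oldest dateReady first, the row without dateReady is skipped
```

Selecting one row per iteration keeps the loop restartable: if the task stops midway, the remaining rows still satisfy the filter on the next run.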

42 rake yulhy4:ingest II
ActiveFedora ingest
Create a new object based on the content model:
    obj =
    obj =
    obj =

43 rake yulhy4:ingest III
Iterate through all datastreams for the content model:
    select hcmds.dsid as dsid, hcmds.ingestMethod as ingestMethod, hcmds.required as required
    from dbo.hydra_content_model hcm, dbo.hydra_content_model_ds hcmds
    where hcm.contentModel = '#{contentModel}'
      and hcm.hcmid = hcmds.hcmid
For each row from the query above, get the datastream info for the oid:
    select type, pathHTTP, pathUNC, md5, controlGroup, mimeType, dsid, OIDpointer
    from dbo.hydra_publish_path
    where hpid=#{i["hpid"]} and dsid='#{dsid}'
Verify checksums and use ActiveFedora to ingest the datastreams.
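The checksum verification step might look like the following Ruby sketch, which uses the standard Digest library. The `checksum_ok?` helper is hypothetical, and the sketch assumes the md5 column holds a hex digest of the file at the staged path.

```ruby
require "digest"
require "tempfile"

# Compare a file's MD5 hex digest with the value recorded for it.
def checksum_ok?(path, expected_md5)
  Digest::MD5.file(path).hexdigest.casecmp?(expected_md5.strip)
end

# Demonstrate with a temporary file standing in for a staged datastream.
Tempfile.create("datastream") do |file|
  file.write("sample bytes")
  file.flush
  recorded = Digest::MD5.hexdigest("sample bytes")
  puts checksum_ok?(file.path, recorded)       # true
  puts checksum_ok?(file.path, "0" * 32)       # false
end
```

Comparing case-insensitively and stripping whitespace guards against formatting differences in the stored value without weakening the check itself.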

44 rake yulhy4:ingest IV
Add LadyBird-specific info to the properties datastream: oid, cid, pid, zindex, _oid
Add hierarchical info to RELS-EXT:
  simple and complex_parent: is_member_of a collection
  complex_child: is_member_of a complex_parent
Some discussion about adding more linked data.

45 rake yulhy4:ingest V

46 rake yulhy4:ingest VI

47 Blacklight

48 Review

49 Future
hydra_publish: revise already ingested content (action=update, action=insert)
Archivematica (by Artefactual)
Replace the ingest task with a custom workflow
GUI interface
Human decision points and manual processing
Technical metadata generation (FITS)
Provenance (JHOVE)
Issues: how to employ OAIS packages (SIP, AIP, DIP) for objects without a natural package structure?

50 Contributors
Eric James
Lakeisha Robinson
Kalee Sprague
Osman Din
Jay Terray
Rebekeh Irwin
Mike Friscia

51 Thank you
