Toro 1 EMu Hacking at the Peabody Museum. Yale campus.


1 Toro 1 EMu Hacking at the Peabody Museum

2 Yale campus

3 Peabody Collections: Counts & Functional Cataloguing Unit
Collection                    Count        Functional cataloguing unit
Anthropology                  325,000      Lot
Botany                        350,000      Individual
Entomology                    1,000,000    Individual
Invertebrate Paleontology     300,000      Lot
Invertebrate Zoology          300,000      Lot
Mineralogy                    35,000       Individual
Paleobotany                   150,000      Individual
Scientific Instruments        2,000        Individual
Vertebrate Paleontology       125,000      Individual
Vertebrate Zoology            185,000      Lot / Individual
2.7 million database-able units => ~11 million items

4 Peabody Collections: Functional Units Databased
Collection                    Functional units    Databased
Anthropology                  325,000             90 %
Botany                        350,000             1 %
Entomology                    1,000,000           1 %
Invertebrate Paleontology     300,000             55 %
Invertebrate Zoology          300,000             20 %
Mineralogy                    35,000              85 %
Paleobotany                   150,000             60 %
Scientific Instruments        2,000               100 %
Vertebrate Paleontology       125,000             60 %
Vertebrate Zoology            185,000             95 %
940,000 of 2.7 million => 37 % overall

5 Big events: EMu migration in '05 (all disciplines went live simultaneously); physical move in '00-'02 (primarily neontological disciplines)

6 The four YPM buildings: Peabody (YPM); Environmental Science Center (ESC); Geology / Geophysics (KGL); 175 Whitney (Anthropology)

7 VZ Kristof Zyskowski (Vert. Zool. - ESC) Greg Watkins-Colwell (Vert. Zool. - ESC)

8 HSI Shae Trewin (Scientific Instruments – KGL )

9 VP Mary Ann Turner (Vert. Paleo. – KGL / YPM)

10 ANT Maureen DaRos (Anthro. - YPM / 175 Whitney)

11 EMu Hacking at Peabody Hacking – in a laudatory programming sense, not a criminal sense

12 Mitnick Often we tend to think of “hackers” in this mode

13 Mitnick modified cracker A better moniker

14 Mitnick modified w/EMu cracker Crackers often have unnamed accomplices…

15 3 Vignettes of YPM EMu “hacks”
- An issue of functionality (background script)
- An issue of performance (tweaking the catalogue)
- An issue of user behavior & cost (another script…)

16 Hack Vignette #1 Multimedia module - JPEG 2000 support

17 http://www.jpeg.org/jpeg2000 - non-proprietary compression standard - lossless mode (much smaller files) - lossy mode (vastly smaller files) - potential space/bandwidth savings

18 http://www.fnordware.com/j2k

19 JP2 spicebush with J2K and tail target

20 JP2 spicebush tails with file sizes: 1.54 MB (native TIFF), 15 kB (heavily squeezed JP2)

21 HERBIS images: 261 kB – <1%; 1,302 kB – 2%; 5,166 kB – 12%; 62,640 kB – 100%

22 JP2 – no thumbnail In EMu, oops… no thumbnail

23 JP2 – script coding
find imagedir -name '*.jp2' -mtime -2 -print
- loop on the matches and test to see which recently loaded JP2 files are missing a thumbnail JPG, or which JP2 files have been modified more recently than their existing thumbnail JPG
- then build filenames for any qualifying target JPGs
- for each match, convert and resize:
jasper -f match -F tempfile
convert tempfile -resize 90x90 target
- execute script several times per hour from cron
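The selection step on this slide can be sketched in Python. This is a minimal illustration, not the YPM script: the thumbnail naming rule (same stem with a `.jpg` suffix) and the function names are assumptions, and the two command lines are only assembled from the slide's `jasper` and `convert` fragments, never executed.

```python
import time
from pathlib import Path


def thumbnail_path(jp2):
    # Hypothetical naming rule: same directory and stem, ".jpg" suffix.
    return jp2.with_suffix(".jpg")


def jp2_needing_thumbnails(image_dir, max_age_days=2.0, now=None):
    """Return (jp2, thumbnail) pairs that need a thumbnail built or rebuilt."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    pending = []
    for jp2 in sorted(Path(image_dir).rglob("*.jp2")):
        jp2_mtime = jp2.stat().st_mtime
        if jp2_mtime < cutoff:
            continue  # roughly what "find -mtime -2" does: recent files only
        thumb = thumbnail_path(jp2)
        # Qualifies if the thumbnail is missing or older than the JP2.
        if not thumb.exists() or thumb.stat().st_mtime < jp2_mtime:
            pending.append((jp2, thumb))
    return pending


def thumbnail_commands(jp2, thumb, tempfile):
    """Assemble the two command lines from the slide; nothing is run here."""
    return [
        ["jasper", "-f", str(jp2), "-F", str(tempfile)],
        ["convert", str(tempfile), "-resize", "90x90", str(thumb)],
    ]
```

A cron-driven wrapper would call `jp2_needing_thumbnails` on the multimedia directory and pass each command list to `subprocess.run`; keeping the selection separate from the execution makes the "which files qualify" rule easy to check on its own.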

24 JP2 – prior, without script wakes up every 20 minutes…

25 JP2 – now, with makes the thumbnail…

26 JP2 – Tiled View JP2 files now behave just like all other standard multimedia

27 JP2 – Photoshop opens Double click and the Photoshop handler kicks in

28 JP2 – V1 V. 1 – simply generated thumbnails in the background

29 JP2 – V2 V. 2 – also inserted suitable metadata into records via texload (next version, script to be called directly in validation code at file time)

30 Hack Vignette #1 Moral #1 = EMu is extensible, you may be able to implement significant changes yourself in whole or in part, without delay

31 Hack Vignette #2 Catalogue module - performance issues

32 Default EMu “cron” job configuration
[Chart: weekly schedule, Mo to Su, split into late night, workday, and evening; legend: emulutsrebuild, emumaintenance batch, emumaintenance compact]
Orange marks the time EMu is busy running background jobs, interfering with workday work and leaving Sunday processing time idle/unused.

33 The ecatalogue database is a rate limiter
File Name                       Function
~/emu/data/ecatalogue/data      the actual data
~/emu/data/ecatalogue/rec       indexing (part)
~/emu/data/ecatalogue/seg       indexing (part)
At YPM, the combined size of these was >10 GB, with 4 GB in data and 3 GB in each of rec and seg

34 Touch many types of records in EMu…
e.g., Party record: add middle name
e.g., Bibliography record: add author
e.g., Collecting Events record: add collector
…automatic changes subsequently propagate to numerous records in the ecatalogue database
…ecatalogue can grow a lot and slow EMu to varying degrees between maintenance runs

35 How to make ecatalogue go faster ?

36 Make it smaller - trim nulls from Legacy Data ? Maybe save 20+% ?

37 Make it smaller - trim nulls from Legacy Data ?
- Repetitive scripting of texexport & texload jobs
- Conducting around a million re-imports of records
- Manual adjustment of nightly cron jobs to accommodate
- Do the work at nighttime over a month-long period
- Watched ecatalogue closely to keep from exploding disk

38 Starting situation at YPM for ecatalogue [Chart: sizes of data, rec, and seg; GB on y axis]

39 delete nulls from AdmOriginalData [Chart: data, rec, seg]

40 sites – round 2 constant data lengthy prefixes … not satisfied with just that… here are some other things to possibly trim!

41 delete nulls from AdmOriginalData; shorten prefix on AdmOriginalData; selectively delete AdmOriginalData: >55 % ! [Chart: data, rec, seg]

42 catalogue – round 2: What ecatalogue AdmOriginalData looks like post scripting [Chart: data, rec, seg]

43 Default EMu “cron” job configuration (BEFORE)
[Chart: weekly schedule, Mo to Su, split into late night, workday, and evening; legend: emulutsrebuild, emumaintenance batch, emumaintenance compact]

44 Modified EMu “cron” job configuration (AFTER)
[Chart: weekly schedule, Mo to Su, split into late night, workday, and evening; legend: emulutsrebuild, emumaintenance batch, emumaintenance compact; asterisks mark full compaction]
Can now squeeze all maintenance into the wee hours of the night, use Sunday, and fully compact ecatalogue every other day (asterisks)!

45 Quick backup Also, all of YPM EMu can now be squeezed onto a thumbdrive

46 Hack Vignette #2 Moral #2 = know your data, you can put aspects of EMu on a diet and your computer system is likely to thank you

47 Hack Vignette #3 EMu sessions - licensing and user behavior

48 Dreaded email for sysadmins: WARNING! 2 KE EMu user(s) are currently being denied access because all 10 of your KE EMu licenses are in use. For license upgrades, please contact info@kesoftware.com

49 Museum Director: "Go license shopping at KE!" Systems Admin: "VISA or MasterCard?" The conversation you dream of but of course never have…

50 What do you need ? Guaranteed license seat for every potential user ? Cover maximal number of expected concurrent users ? Minimize expenses by minimizing license seats ?
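The second option, covering the maximal number of expected concurrent users, comes down to finding the peak number of overlapping sessions. A minimal sketch of that calculation follows, assuming session start and end times are available from some log; the function name and the input format are hypothetical.

```python
def peak_concurrent_sessions(sessions):
    """Return the largest number of sessions open at the same moment.

    sessions: iterable of (start, end) pairs in any comparable unit,
    with the end time treated as exclusive.
    """
    events = []
    for start, end in sessions:
        events.append((start, 1))   # a license seat is taken
        events.append((end, -1))    # a license seat is released
    # Tuples sort by time, then by delta, so at equal times a release (-1)
    # is processed before a new start (+1) and the freed seat is reused.
    events.sort()
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak
```

Run over a few typical weeks of sessions, the result gives a lower bound on the seats needed to avoid denials, which can then be weighed against cost.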

51 Jess & Lourdes fight (2) My turn to log in ! %}&$ Dream on, loser ! #@^* 3rd option is dangerous… if you have this you probably have too few licenses

52 Even with a moderate number of licenses… … inactive EMu sessions can and will accumulate

53 Critical research VARIANT 1: critical research needed, EMu session put on hold

54 Life intervenes VARIANT 2: both people and computers crash…

55 …enter the EMu Grim Reaper: script seeks out inactive EMu sessions

56 reaper – script coding
texlicstatus
ps -ef
- Grim Reaper wakes up frequently throughout the day
- keeps a running table of statistics about each texserver
- compares each texserver against a countdown timer
- adjusts timer based on activity since last wake up
- if some new activity, resets the countdown timer
- if no activity, increments the countdown timer
- if countdown timer max is reached, kill the texserver
kill -9 texserver_process_id
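The countdown logic on this slide can be sketched as one bookkeeping pass per wake-up. This is an illustration under stated assumptions, not the YPM script: the slide does not say which statistic signals activity, so `activity` here is a hypothetical per-process measure (for example, CPU time parsed from `ps -ef`), and the names `ServerStats` and `reap_pass` are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class ServerStats:
    activity: float        # last observed activity measure (assumed, e.g. CPU seconds)
    idle_checks: int = 0   # consecutive wake-ups with no new activity


def reap_pass(table, observed, max_idle_checks):
    """Update the running table in place; return PIDs whose timer maxed out.

    table:    dict pid -> ServerStats carried over from the previous wake-up
    observed: dict pid -> current activity measure for each live texserver
    """
    # Forget processes that have exited since the last wake-up.
    for pid in list(table):
        if pid not in observed:
            del table[pid]

    to_kill = []
    for pid, activity in observed.items():
        stats = table.get(pid)
        if stats is None:
            table[pid] = ServerStats(activity)  # new session, timer starts at 0
            continue
        if activity != stats.activity:
            stats.activity = activity
            stats.idle_checks = 0               # new activity resets the timer
        else:
            stats.idle_checks += 1              # no activity increments it
        if stats.idle_checks >= max_idle_checks:
            to_kill.append(pid)
    return to_kill
```

The caller would then send the equivalent of `kill -9` to each returned PID, for instance with `os.kill(pid, signal.SIGKILL)`, and persist `table` until the next wake-up.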

57 Tuning the EMu Grim Reaper Script
- Change time between wakeup checks
- Change number of wakeup check intervals
- Tell reaper to ignore certain users
- Amend reaper behavior by time of day
- Alter how much inactivity is considered bad
32 regular YPM users, 13 runtime licenses
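These tuning knobs map naturally onto a small configuration object. The sketch below is hypothetical throughout: the field names, the default values, and the rule of being more lenient outside working hours are assumptions made for illustration; only the 0800-1700 working day echoes the window used in the session charts.

```python
from dataclasses import dataclass, field
from datetime import time


@dataclass
class ReaperConfig:
    check_interval_minutes: int = 5       # time between wake-up checks (assumed)
    max_idle_checks: int = 6              # idle intervals tolerated in work hours (assumed)
    off_hours_max_idle_checks: int = 24   # more lenient limit off hours (assumed)
    workday_start: time = time(8, 0)
    workday_end: time = time(17, 0)
    ignored_users: set = field(default_factory=set)


def idle_limit_minutes(config, user, clock_time):
    """Return the idle minutes allowed for this user now, or None if exempt."""
    if user in config.ignored_users:
        return None  # the reaper never touches these sessions
    in_workday = config.workday_start <= clock_time < config.workday_end
    checks = (config.max_idle_checks if in_workday
              else config.off_hours_max_idle_checks)
    return checks * config.check_interval_minutes
```

With these defaults, a regular user gets 30 idle minutes during the working day and 120 outside it, while any account listed in `ignored_users` is skipped entirely.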

58 New sessions started per hour, 0800-1700 [Chart: y axis 0 to 25] Real data prior two weeks in October 2006

59 Cumulative new sessions started, 0800-1700 [Chart: y axis 0 to 80] Real data prior two weeks in October 2006

60 Active sessions, 0800-1700: three slow days [Chart: y axis 0 to 12] Real data prior two weeks in October 2006

61 Active sessions, 0800-1700: three fast days [Chart: y axis 0 to 12] Real data prior two weeks in October 2006

62 Cope on phone It’s telling me, “Licenses Exceeded?!” No more worries

63 Hack Vignette #3 MORAL = find a licensing balance, but also consider training your users and EMu system

64 Happy Scripting, Happy Campers

