
Presentation on theme: "Toro 1 EMu on a Diet Yale campus Peabody Collections Counts & Functional Cataloguing Unit Anthropology 325,000Lot Botany 350,000Individual Entomology1,000,000Lot."— Presentation transcript:

1

2 Toro 1 EMu on a Diet

3 Yale campus

4

5 Peabody Collections Counts & Functional Cataloguing Unit: Anthropology 325,000 (Lot); Botany 350,000 (Individual); Entomology 1,000,000 (Lot); Invertebrate Paleontology 300,000 (Lot); Invertebrate Zoology 300,000 (Lot); Mineralogy 35,000 (Individual); Paleobotany 150,000 (Individual); Scientific Instruments 2,000 (Individual); Vertebrate Paleontology 125,000 (Individual); Vertebrate Zoology 185,000 (Lot / Individual). 2.7 million database-able units => ~11 million items

6 Peabody Collections Functional Units Databased: Anthropology 325,000, %; Botany 350,000, 1 %; Entomology 1,000,000, 3 %; Invertebrate Paleontology 300,000, %; Invertebrate Zoology 300,000, %; Mineralogy 35,000, %; Paleobotany 150,000, %; Scientific Instruments 2,000, %; Vertebrate Paleontology 125,000, %; Vertebrate Zoology 185,000, %. 990,000 of 2.7 million => 37 % overall

7 The four YPM buildings Peabody (YPM) Environmental Science Center (ESC) Geology / Geophysics (KGL) 175 Whitney (Anthropology)

8 VZ Kristof Zyskowski (Vert. Zool. - ESC) Greg Watkins-Colwell (Vert. Zool. - ESC)

9 HSI Shae Trewin (Scientific Instruments – KGL )

10 VP Mary Ann Turner (Vert. Paleo. – KGL / YPM)

11 ANT Maureen DaRos (Anthro. - YPM / 175 Whitney)

12 % Databased vs. Collection Size (in 1000s of items)

13 Botany Entomology Invertebrate Paleontology Invertebrate Zoology % Databased vs. Collection Size (in 1000s of items)

14 1991 Systems Office created & staffed Peabody Collections Approximate Digital Timeline

15 1991 Systems Office created & staffed 1992 Argus collections databasing initiative started Peabody Collections Approximate Digital Timeline

16 1991 Systems Office created & staffed 1992 Argus collections databasing initiative started 1994 Gopher services launched for collections data Peabody Collections Approximate Digital Timeline

17 1991 Systems Office created & staffed 1992 Argus collections databasing initiative started 1994 Gopher services launched for collections data 1997 Gopher mothballed, Web / HTTP services launched Peabody Collections Approximate Digital Timeline

18 1991 Systems Office created & staffed 1992 Argus collections databasing initiative started 1994 Gopher services launched for collections data 1997 Gopher mothballed, Web / HTTP services launched 1998 Physical move of many collections begins 2002 Physical move of many collections ends Peabody Collections Approximate Digital Timeline

19 1991 Systems Office created & staffed 1992 Argus collections databasing initiative started 1994 Gopher services launched for collections data 1997 Gopher mothballed, Web / HTTP services launched 1998 Physical move of many collections begins 2002 Physical move of many collections ends 2003 Search for Argus successor commences 2003 Informatics Office created & staffed Peabody Collections Approximate Digital Timeline

20 1991 Systems Office created & staffed 1992 Argus collections databasing initiative started 1994 Gopher services launched for collections data 1997 Gopher mothballed, Web / HTTP services launched 1998 Physical move of many collections begins 2002 Physical move of many collections ends 2003 Search for Argus successor commences 2003 Informatics Office created & staffed 2004 KE EMu to succeed Argus, data migration begins 2005 Argus data migration ends, go-live in KE EMu Peabody Collections Approximate Digital Timeline

21 Big events: EMu migration in '05 (all disciplines went live simultaneously); physical move in '98-'02 (primarily neontological disciplines)

22

23 What do you do …

24 … when your EMu is out of shape & sluggish ?

25 What do you do … … when your EMu is out of shape & sluggish ?

26

27

28

29

30

31

32 The Peabody Museum Presents

33 What clued us in that we should put our EMu on a diet ? The Peabody Museum Presents

34 980 megabytes in Argus 10,400 megabytes in EMu Area of Server Occupied by Catalogue

35 ? 980 megabytes in Argus 10,400 megabytes in EMu

36 Default EMu cron maintenance job schedule Mo Tu We Th Fr Sa Su late night workday evening = emulutsrebuild = emumaintenance batch = emumaintenance compact

37 late night workday evening = emulutsrebuild = emumaintenance batch = emumaintenance compact Mo Tu We Th Fr Sa Su Default EMu cron maintenance job schedule

38 late night workday evening = emulutsrebuild = emumaintenance batch = emumaintenance compact Mo Tu We Th Fr Sa Su Default EMu cron maintenance job schedule

39 late night workday evening = emulutsrebuild = emumaintenance batch = emumaintenance compact Mo Tu We Th Fr Sa Su Default EMu cron maintenance job schedule

40 Three Fabulously Easy Steps !

41 1. The Legacy Data Burnoff ( best quick loss plan ever ! )

42 Three Fabulously Easy Steps ! 1. The Legacy Data Burnoff ( best quick loss plan ever ! ) 2. The Darwin Core Binge & Purge ( eat the big enchilada and still end up thin ! )

43 Three Fabulously Easy Steps ! 1. The Legacy Data Burnoff ( best quick loss plan ever ! ) 2. The Darwin Core Binge & Purge ( eat the big enchilada and still end up thin ! ) 3. The Validation Code SlimDing ( your Texpress metabolism is your friend ! )

44 1. The Legacy Data Burnoff: Anatomy of the ecatalogue database. File name and function: ~/emu/data/ecatalogue/data holds the actual data; ~/emu/data/ecatalogue/rec is indexing (part); ~/emu/data/ecatalogue/seg is indexing (part). The combined size of these was 10.4 GB: 4 GB in data and 3 GB in each of rec and seg. (980 MB in Argus vs. 10,400 MB in EMu.)

45 The ecatalogue database was a rate limiter. Typical EMu data directory: 23 files, 2 subdirs.

46 Closer Assessment of Legacy Data. In 2005 we had initially adopted many of the existing formats for data elements from the USNM's EMu client, to allow for rapid development of the Peabody's modules by KE prior to migration; Legacy Data fields were among them

47 Closer Assessment of Legacy Data. In 2005 we had initially adopted many of the existing formats for data elements from the USNM's EMu client, to allow for rapid development of the Peabody's modules by KE prior to migration; Legacy Data fields were among them

48 Closer Assessment of Legacy Data

49 sites – round 2 constant data lengthy prefixes

50 sites – round 2 data of temporary use in migration

51 catalogue – round 2 data rec seg

52 Repetitive scripting of texexport & texload jobs Conducted around a million updates of records Manually adjusted cron jobs to accommodate Did the work at night over a six-month period Watched the process closely to keep from filling server disks How did we do the Legacy Data Burnoff in 2005 ?

53 Repetitive scripting of texexport & texload jobs Conducted around a million updates of records Manually adjusted nightly cron jobs to accommodate Did the work at night over a six-month period Watched the process closely to keep from filling server disks How did we do the Legacy Data Burnoff in 2005 ?

54 ecatalogue data rec seg

55 Crunch 2 data rec seg delete nulls from AdmOriginalData ecatalogue

56 Crunch 3 data rec seg delete nulls from AdmOriginalData shorten labels on AdmOriginalData ecatalogue

57 Crunch 4 data rec seg delete nulls from AdmOriginalData shorten labels on AdmOriginalData delete prefixes on AdmOriginalData ecatalogue

58 Crunch 4 data rec seg delete nulls from AdmOriginalData shorten labels on AdmOriginalData delete prefixes on AdmOriginalData ecatalogue Wow ! 55 % reduction !
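The three crunch steps above can be sketched as a simple text transform. A minimal sketch in Python, assuming a "label: value" line layout for the exported AdmOriginalData entries and a hypothetical "YPM-" prefix (the real layout comes from texexport and will differ):

```python
# Sketch of the three AdmOriginalData crunch steps: delete null values,
# shorten lengthy labels, strip constant prefixes. The "label: value"
# layout and the "YPM-" prefix are illustrative assumptions, not the
# actual texexport record format.

def crunch(lines):
    out = []
    for line in lines:
        label, _, value = line.partition(":")
        value = value.strip()
        if not value:                        # Crunch 2: delete null entries
            continue
        label = label.strip()[:8]            # Crunch 3: shorten lengthy labels
        value = value.removeprefix("YPM-")   # Crunch 4: delete constant prefixes
        out.append(f"{label}: {value}")
    return out

sample = [
    "OriginalCollectorName: ",             # null value: dropped entirely
    "OriginalCatalogNumber: YPM-004321",   # label shortened, prefix stripped
]
print(crunch(sample))  # -> ['Original: 004321']
```

Applied across a few hundred thousand exported records per night, per-field savings like these compound into the 55 % reduction shown on the slide.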

59 2. The Darwin Core Binge & Purge Charles Darwin

60 Natural History Metadata Standard DwC Affords interoperability of different database systems Widely used in collaborative informatics initiatives Circa fields depending on particular version Directly analogous to the Dublin Core standard

61

62

63

64

65 Populate DwC fields at upgrade in 2006… so what ?

66 IZ Department: total characters existing data 43,941,006

67 Populate DwC fields at upgrade in 2006… so what ? IZ Department: total characters existing data 43,941,006 IZ Department: est. new DwC characters 20,000,000

68 Populate DwC fields at upgrade in 2006… so what ? IZ Department: total characters existing data 43,941,006 IZ Department: est. new DwC characters 20,000,000 IZ Department: est. expansion factor 45 %
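The 45 % estimate follows directly from the two character counts above:

```python
# Back-of-envelope check of the estimated Darwin Core expansion factor
# for the IZ Department, using the character counts from the slides.
existing_chars = 43_941_006   # total characters of existing data
new_dwc_chars = 20_000_000    # estimated new DwC characters

expansion = new_dwc_chars / existing_chars * 100
print(f"estimated expansion: {expansion:.1f} %")  # -> estimated expansion: 45.5 %
```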

69 We're about to gain back most of the pounds we just lost in the Legacy Data Burnoff !

70 catalogue – round 2 data rec seg

71 catalogue – round 2 data rec seg action in ecollectionevents

72 catalogue – round 2 data rec seg action in eparties

73 catalogue – round 2 data rec seg action in ecatalogue

74 catalogue – round 2 data rec seg Before actions

75 catalogue – round 2 data rec seg After actions

76

77 ExtendedData

78 SummaryData

79 ExtendedData SummaryData The ExtendedData field is a full duplication of IRN + SummaryData… delete the ExtendedData field and use SummaryData when in thumbnail mode on records
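Because ExtendedData is just the IRN plus SummaryData, deleting it reclaims that entire concatenation on every record. A tiny illustration (field names from the slide; the record values are invented):

```python
# ExtendedData fully duplicates IRN + SummaryData, so dropping it saves
# its entire length per record. The values below are invented examples.
record = {
    "irn": "123456",
    "SummaryData": "YPM IZ 047110, Mytilus edulis, Long Island Sound",
}
record["ExtendedData"] = record["irn"] + " " + record["SummaryData"]

saved = len(record["ExtendedData"])   # characters reclaimed on this record
del record["ExtendedData"]            # thumbnail views fall back to SummaryData
print(f"characters saved: {saved}")
```

Multiply that per-record saving across millions of catalogue records and the duplicated field becomes a significant share of the database's bulk.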

80 Populate DwC fields at upgrade… so what ? IZ Department: total characters existing data 43,941,006 IZ Department: est. new DwC characters 20,000,000 IZ Department: est. expansion factor 45 %

81 Populate DwC fields at upgrade… so what ? IZ Department: total characters modified data 43,707,277 IZ Department: total new DwC characters 22,358,461 IZ Department: actual expansion factor %

82 Populate DwC fields at upgrade… so what ? IZ Department: total characters existing data 43,707,277 IZ Department: total new DwC characters 22,358,461 IZ Department: actual expansion factor % Some pain, but NO weight gain !

83 3. The Validation Code SlimDing We've taken off the easiest pounds… any other fields to trim ? Some sneakily subversive Texpress tricks

84 3. The Validation Code SlimDing Can history of query behavior by users help identify some EMu soft spots ?

85 3. The Validation Code SlimDing Can history of query behavior by users help identify some EMu soft spots ? If so, can we slip EMu a dynamic diet pill into its computer code ?

86 3. The Validation Code SlimDing Can history of query behavior by users help identify some EMu soft spots ? If so, can we slip EMu a dynamic diet pill into its computer code ? texadmin

87 …you make certain common types of changes to any record in any EMu module …and automatic changes then propagate via emuload to numerous records in linked modules …those linked modules can grow a lot and slow EMu significantly between maintenance runs EMu actions in the background you don't see

88

89

90 Why not harness EMu's continuously ravenous appetite for pushing local copies of linked fields into remote modules… and put it to work slimming for us !

91 Why not harness EMu's continuously ravenous appetite for pushing local copies of linked fields into remote modules… and put it to work slimming for us ! First we need to understand how the different EMu queries work

92 Drag and Drop Query

93 checks the link field

94 Straight Text Entry Query instead checks a local copy of the SummaryData from the linked record that has been inserted into the catalogue

95 EMu's audit log: a gigantic activity trail. How often do users employ these two very different query strategies, on what fields, and are there distinctly divergent patterns ?

96 catalogue audit In this one-week sample, only 7 of 52 queries for accessions from inside the catalogue module used text queries; the other 45 were drag & drops

97 Of those 7 text queries, every one asked for the primary id number of the accession, or the numeric piece of that number, but not for any other type of data from within those accessions

98 Over a full year of catalogue audit data, less than 1% of all the queries into accessions used other than the primary id of the accession record as the keyword(s).

99 Over a full year of catalogue audit data, less than 1% of all the queries into accessions used other than the primary id of the accession record as the keyword(s). This is where we gain our SlimDing advantage !

100 Over a full year of catalogue audit data, less than 1% of all the queries into accessions used other than the primary id of the accession record as the keyword(s). This is where we gain our SlimDing advantage ! We don't need more than the primary id of the accession record in the local copy of the accession module data stored in the catalogue module.

101 Over a full year of catalogue audit data, less than 1% of all the queries into accessions used other than the primary id of the accession record as the keyword(s). This is where we gain our SlimDing advantage ! We don't need more than the primary id of the accession record in the local copy of the accession module data stored in the catalogue module. This pattern also held true for queries launched from the catalogue against the bibliography and loans modules !
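A tally like the year-long one above can be produced by classifying each accession query keyword found in the audit log. A sketch, assuming a simplified query-text format and an invented primary-id pattern (EMu's real audit schema and the Peabody's actual accession numbering will differ):

```python
import re

# Classify catalogue-to-accessions query keywords as primary-id lookups
# versus free-text searches. The id pattern "YPM ACC 2004.017" / "2004.017"
# is an illustrative assumption, not the actual numbering scheme.

def classify(query: str) -> str:
    if re.fullmatch(r"(YPM ACC )?\d{4}\.\d{3}", query.strip()):
        return "primary-id"
    return "other"

audit_sample = [
    "YPM ACC 2004.017",          # full primary id
    "2001.233",                  # numeric piece of the id
    "Smith donation, mollusks",  # the rare non-id text query
]

counts = {"primary-id": 0, "other": 0}
for q in audit_sample:
    counts[classify(q)] += 1
print(counts)  # -> {'primary-id': 2, 'other': 1}
```

When the "other" bucket stays under 1 %, the local copy of the linked accession data can safely be trimmed to the primary id alone.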

102

103 Catalogue Database

104

105

106

107 Catalogue module lost another 19% of its bulk over a couple months !

108 Internal Movements Database

109 Internal movements dropped from 550 mbytes down to 200 mbytes… 65% reduction !

110 Internal Movements Database

111

112 late night workday evening = emulutsrebuild = emumaintenance batch = emumaintenance compact Mo Tu We Th Fr Sa Su Default EMu cron maintenance job schedule

113 Mo Tu We Th Fr Sa Su late night workday evening = emulutsrebuild = emumaintenance batch = emumaintenance compact Default EMu cron maintenance job schedule * * *

114 Mo Tu We Th Fr Sa Su late night workday evening = emulutsrebuild = emumaintenance batch = emumaintenance compact Default EMu cron maintenance job schedule * * *

115 Mo Tu We Th Fr Sa Su late night workday evening = emulutsrebuild = emumaintenance batch = emumaintenance compact Default EMu cron maintenance job schedule * * *

116 Quick backup

117 A Happy EMu Means Happy Campers

118 finis

