
1 Rapid digitization of P Herbarium Switching to the fast track: Rapid digitization of the world's largest herbarium TDWG 2011- New Orleans Simon Chagnoux, Henri Michiels

2 The French Museum

3 An old institution
Founded in 1635 (at that time the Royal Garden of Medicinal Plants)
In 1793, the French Revolution turned the garden into the national Museum
Now: 15 locations in France, 2000 people
20 Oct. 2011 TDWG - Orleans

4 Four main founding objectives
The collections, archives of the planet (70 million specimens)
Fundamental and applied research (350 researchers)
Higher education (400 students, Master and PhD)
Dissemination of knowledge (galleries, botanical and zoological parks)

5 Renovating the Herbarium: an opportunity to digitize the entire collection

6 The Paris Herbarium

7 The Renovation Project (1)
Two main drivers for this project:
– the herbarium, designed for 6 million specimens, was packed with 10 million sheets and fitted with old storage
– raising the storage density required reinforcing the floors

8 The Renovation Project (2)
The only way to do this was to move the entire collection out and bring it back into the renovated building once the works were finished
An opportunity for:
– New sorting, from geographic to phylogenetic (APG3)
– Reconditioning
– Digitizing

9 Renovation Calendar
2006: Start of the project
2009: Start of the works
2010 (June): Start of digitization
2011 (Nov): Opening of the first rearranged spaces to researchers
2012: End of the project

10 Budget
Overall project cost: 24.5 million €
– Building renovation: 12 000 000 €
– Movers: 900 000 €
– Attaching specimens: 3 200 000 €
– Reconditioning, digitization and sorting: 6 700 000 €
– Supplies: 1 600 000 €
– Storage: 100 000 €

11 The renovation cycle [diagram; labels: Herbarium, Warehouse, Industrial Partner, Digitization, Reconditioning, Sorting, Floor by floor renovation]

12 Before...

13 ... And after

14 Why digitize?
Because all the parts have to be handled in the course of the project
Digitization gives us:
– a virtual copy of specimens
– the possibility to share and study specimens without touching them
More than an electronic copy of the collection catalog, we’ll have a collaborative tool for managing scientific knowledge inside as well as outside the institution

15 2D digitization is cheap
The cost of digitization is marginal compared to the full project:
– full specimen processing (moving, sorting, reconditioning, new furniture): $1.5
– digitization and name processing: $0.1
Digitization is appealing to funders

16 A new paradigm
For 15 years we entered the full information for a subset of specimens:
– 1 million entries in the database (rich information)
– One fifth (200 000 images) was photographed
Since summer 2010, we have used a mass approach in which digitization precedes data entry:
– 2 million records digitized in one year
– limited information in the database (name and geographic area)
– the scientific information can be added later without handling the specimens themselves

17 The workflow: digitizing, reconditioning and sorting

18 An industrial process (1)
We chose a contractor with industrial know-how
A dedicated site had to be set up and equipped by the contractor
Two teams of 20 workers in two shifts, working from 6am to 9pm
The process had to align with the schedule of the renovation works, floor by floor

19 An industrial process (2)
Planned production rate: 17 000 sheets per day over 24 months, i.e. ca. 15 seconds / sheet
At this rate, a variation of ± 1 second per specimen has an impact of ± 300 k€ on the project cost
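The figures on this slide can be cross-checked with a little arithmetic. The sketch below assumes a total of about 10 million sheets (the figure quoted earlier for the herbarium) and spreads the ± 300 k€ sensitivity over that whole volume. The variable names are illustrative and not taken from the project.

```python
# Back-of-the-envelope check of the planned production rate.
# Assumption: about 10 million sheets in total (figure quoted for the herbarium).
TOTAL_SHEETS = 10_000_000
SHEETS_PER_DAY = 17_000      # planned rate from the slide
PROJECT_MONTHS = 24          # planned duration from the slide
COST_SWING_EUR = 300_000     # impact of +/- 1 second per specimen

# Number of production days needed at the planned daily rate.
working_days = TOTAL_SHEETS / SHEETS_PER_DAY
# Production days this implies per calendar month over the planned duration.
days_per_month = working_days / PROJECT_MONTHS
# Cost attributed to one second of handling on one sheet.
eur_per_sheet_second = COST_SWING_EUR / TOTAL_SHEETS

print(f"working days needed: {working_days:.0f}")
print(f"working days per month: {days_per_month:.1f}")
print(f"implied cost of one second on one sheet: {eur_per_sheet_second:.3f} EUR")
```

Under these assumptions the plan implies close to 590 production days, which leaves very little slack within 24 months, and one second of handling time costs about three euro cents per sheet.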

20 The Bussy-St-Georges site

21 Workflow overview

22 How to alleviate data entry
We take advantage of the physical ordering of specimens
We provide a name list to the contractor (APG 3 classification)
The contractor enriches the list with the information generated during the process and returns a table of consolidated information (image number, barcode numbers, classification, …)

23 1 – Delivery (1)
A carting company transports the specimens to the facility, where they arrive in clearly labeled boxes
Boxes receive a tracking barcode

24 1 – Delivery (2)
The Museum provides two files:
1. a “logistics” file
– number of boxes
– family name and number
– genus name and number
– geographic area
2. a “taxonomy” file
– list of available taxon names with family, genus, species, authors, ID (taxon number)

25 1 – Delivery (3)
This information is ingested by the contractor’s information system and used throughout the industrial process (labeling, sorting, quality assurance)

26 2 – Folder processing
For each folder, the operator:
1. replaces the jacket (color according to region)
2. reads the species name and types its first letters on the computer
3. selects the name in a list
4. prints a label with barcode and identification information, and sticks it on the folder
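Steps 2 and 3 amount to a prefix search in the taxonomy list. The sketch below shows one possible way to do it with a sorted list and a binary search. It is not the contractor's software, and the function name and the sample names are hypothetical.

```python
from bisect import bisect_left

def names_starting_with(sorted_names, prefix):
    """Return all names in a sorted list that start with the typed prefix."""
    # Binary search for the first name that is >= the prefix.
    start = bisect_left(sorted_names, prefix)
    matches = []
    for name in sorted_names[start:]:
        if not name.startswith(prefix):
            break  # the list is sorted, so no later name can match
        matches.append(name)
    return matches

# Hypothetical excerpt of a taxonomy list, sorted alphabetically.
taxon_names = sorted([
    "Abrus aureus",
    "Abrus precatorius",
    "Acacia farnesiana",
    "Albizia lebbeck",
])

# The operator types the first letters and picks from the candidates.
candidates = names_starting_with(taxon_names, "Abr")
print(candidates)
```

Because the operator only selects from names that already exist in the list, misspellings cannot enter the database at this step.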

27 3 – Specimen Digitization (1)
A Datamatrix and a barcode are stuck on each sheet:
– Datamatrix: for tracking purposes
– Barcode: specific to the Muséum and to the int’l herbarium standard
The specimens are placed three by three on a tray

28 3 – Specimen Digitization (2)
The tray is placed on a conveyor belt
The sheet is scanned
The scan is checked (framing and focus)
At the end of the chain, the barcode is read to check that all specimens are back in the folder

29 The Digitization Bench

30 4 – Reconditioning
After scanning, each sheet is inserted in a sulfurized paper liner
The barcode of each specimen is read, allowing the system to check that all specimens are back in the right folder
The folders are stored in a “cut box” before sorting
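The completeness check described here can be pictured as a comparison between the barcodes registered for a folder and those read back after reconditioning. The following is a minimal sketch under that assumption. The function name and the barcode values are made up for illustration.

```python
def check_folder(expected_barcodes, scanned_barcodes):
    """Compare the barcodes registered for a folder with those read back."""
    expected = set(expected_barcodes)
    scanned = set(scanned_barcodes)
    return {
        "missing": sorted(expected - scanned),     # registered but not read back
        "unexpected": sorted(scanned - expected),  # read back but not registered here
        "complete": expected == scanned,
    }

# Hypothetical barcode values, for illustration only.
registered = ["P00000001", "P00000002", "P00000003"]
read_back = ["P00000001", "P00000003", "P00000099"]

report = check_folder(registered, read_back)
print(report)
```

A non-empty "missing" or "unexpected" list would flag the folder for manual inspection before it moves on to sorting.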

31 5 – Sorting 1 (by genus)
This sorting consists of storing specimens by family and genus names
The operator puts the jackets in boxes and places them on shelves according to the family and genus numbers (the shelves are labelled in advance by the contractor)

32 6 – Sorting 2 (by species)
The operator takes a box and reads the barcode on each jacket
The system displays the species name and assigns a number, which is printed on a label
The label is stuck on the folder, which is then stored on the shelf with the same number

33 7 – Packing, transport and final storage
The folders are put in boxes and sent to the Museum
The contractor stores the folders in the Museum’s herbarium

34 How to ensure quality in mass digitization?
60 000 images produced each week
1% of the production checked (ca. 600 images)
Samples are distributed among botanical staff
Checking:
1. Focus
2. Data quality
3. Barcode number
4. Barcode location
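The slide does not say how the 1% sample is drawn or shared out. One plausible approach is a uniform random draw followed by a round-robin split among reviewers, sketched below. The function name, image identifiers and reviewer names are all hypothetical.

```python
import random

def draw_qc_sample(image_ids, fraction, reviewers, seed=None):
    """Draw a random sample of images and share it out among reviewers."""
    rng = random.Random(seed)
    sample_size = round(len(image_ids) * fraction)
    sample = rng.sample(image_ids, sample_size)  # sampling without replacement
    # Round-robin distribution so each reviewer gets a similar workload.
    assignments = {name: [] for name in reviewers}
    for index, image_id in enumerate(sample):
        assignments[reviewers[index % len(reviewers)]].append(image_id)
    return assignments

# One week of production, as quoted on the slide (60 000 images).
weekly_images = [f"img_{n:05d}" for n in range(60_000)]
staff = ["reviewer_a", "reviewer_b", "reviewer_c", "reviewer_d"]

assignments = draw_qc_sample(weekly_images, 0.01, staff, seed=2011)
print({name: len(batch) for name, batch in assignments.items()})
```

With four reviewers, each one receives 150 of the 600 sampled images. Fixing the seed makes a given week's sample reproducible if a check has to be audited later.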

35 Scanning Resolution and Image Format

36 Production of images
The conveyor belt passes the specimens under a bidirectional scanner which produces 11x17” (A3), 300 dpi, 5000 x 3300 pixel images
TIFF files are saved offline (one production day per 1 TB disk)
JPEGs are made for online use
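The pixel dimensions follow from the sheet size and the resolution. The sketch below assumes a scanned area of exactly 11 x 17 inches and uncompressed 24-bit RGB storage. Neither detail is stated in the presentation.

```python
# Assumptions: the scanned area is exactly 11 x 17 inches and images are
# stored as uncompressed 24-bit RGB (3 bytes per pixel).
DPI = 300
WIDTH_IN, HEIGHT_IN = 11, 17
BYTES_PER_PIXEL = 3

width_px = WIDTH_IN * DPI    # pixels across the short side
height_px = HEIGHT_IN * DPI  # pixels along the long side

# Raw size using the pixel dimensions quoted on the slide (5000 x 3300).
raw_bytes = 5000 * 3300 * BYTES_PER_PIXEL

print(width_px, height_px)
print(f"uncompressed size: {raw_bytes / 1e6:.1f} MB")
```

The computed long side (5100 pixels) is close to the 5000 quoted on the slide, and the raw size comes out just under 50 MB per image, in line with the TIFF size given in the presentation.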

37 Scanning resolution and image size
One TIFF image is 50 MB
One JPEG is 5 MB. This compression rate was chosen to keep the same level of detail as the TIFF (only colour is slightly changed)
This choice is a technical and economic trade-off
For 10 million images:
– TIFF represents 500 TB
– JPEG represents 50 TB
– Data represents <100 GB
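The storage totals are simple multiplications, reproduced here as a sanity check. The sketch uses decimal units (1 TB = 1 000 000 MB) and reuses the planned daily rate of 17 000 sheets quoted earlier in the talk. The variable names are illustrative.

```python
# Decimal units are assumed: 1 TB = 1,000,000 MB.
TIFF_MB, JPEG_MB = 50, 5
TOTAL_IMAGES = 10_000_000
SHEETS_PER_DAY = 17_000  # planned daily rate quoted earlier in the talk
MB_PER_TB = 1_000_000

tiff_total_tb = TIFF_MB * TOTAL_IMAGES / MB_PER_TB
jpeg_total_tb = JPEG_MB * TOTAL_IMAGES / MB_PER_TB
# TIFF volume produced in one day at the planned rate.
tiff_per_day_tb = TIFF_MB * SHEETS_PER_DAY / MB_PER_TB

print(f"TIFF total: {tiff_total_tb:.0f} TB")
print(f"JPEG total: {jpeg_total_tb:.0f} TB")
print(f"TIFF per production day: {tiff_per_day_tb:.2f} TB")
```

At the planned rate, one day of TIFF output is about 0.85 TB, which is consistent with storing one production day per 1 TB disk.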

38 Why do we keep TIFF?
Partners seek lossless data (Reflora, Mellon)
Standard for print publishing
Native scan output, which can be used for any future use or transformation

39 Handling TIFF data
We cannot afford “live” storage of 500 TB, let alone 1 PB with redundancy
It would also mean a lot of energy consumption and heat dissipation for rarely accessed images
We are planning to start using tape storage next year, with HSM software
For the time being, USB disks are stored in the collection warehouse

40 Exception for the types
The types are not part of this industrial process
They are manually digitized on-premises at 600 dpi (200 MB in compressed TIFF)
This process was initiated by the Mellon Foundation in 2004
We now have about 100 000 type images

41 What we’ve achieved and learned … after 12 months of collaboration between scientists and industry (out of an anticipated duration of 24 months)

42 Achievements
2.1 million specimens processed between June 2010 and August 2011
Images and data are of good quality
The new premises comply with today’s standards (space, safety, light, air-conditioning, …)

43 Fast but... not fast enough

44 Reasons for being behind schedule
Logisticians underestimated the sorting work
Only two digitization chains are operational, instead of three (due to lack of staff)
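To put "behind schedule" into numbers, the observed throughput can be compared with the plan. The sketch below assumes a collection of about 10 million sheets and counts June 2010 to August 2011 as 15 full months. Both are simplifying assumptions, and the result is a rough projection rather than a figure from the presentation.

```python
PROCESSED = 2_100_000      # specimens processed, June 2010 to August 2011
ELAPSED_MONTHS = 15        # assumption: both end months counted in full
TOTAL_SHEETS = 10_000_000  # assumption: approximate size of the collection
PLANNED_MONTHS = 24        # planned duration of the digitization

observed_per_month = PROCESSED / ELAPSED_MONTHS
planned_per_month = TOTAL_SHEETS / PLANNED_MONTHS
# Time the full collection would take if the observed rate did not improve.
months_at_observed_rate = TOTAL_SHEETS / observed_per_month

print(f"observed: {observed_per_month:,.0f} sheets/month")
print(f"planned:  {planned_per_month:,.0f} sheets/month")
print(f"months needed at observed rate: {months_at_observed_rate:.1f}")
```

Under these assumptions the observed rate is roughly a third of the planned one, which shows how much a third digitization chain and a faster sorting step matter.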

45 Software and quality assurance
More software is needed for ensuring traceability and detecting failures than for data acquisition
Fast web publication of images allows a broader audience to perform quality control
Continuous control is mandatory

46 People
Working under constant time pressure for two years is really difficult in an academic context
The contractor must be treated as a service provider and not just as the team next door (not obvious in an academic context)

47 Working with a contractor
Culture clash (diagram labels: ROI, speed, robustness; quality, exhaustiveness, specificity)
Many parameters were not known at the beginning of the project (processes, numbers, ...)
Quality control is key to making sure that scientific excellence governs the industrial throughput (to be defined upfront)
Write everything down and always refer to the contract

48 Digitizing other objects
Digitizing a herbarium is “easy”:
– same dimensions for all objects
– easy handling and scanning
– the plant itself is not touched, only the paper
Digitizing 3D objects is a lot more complex and generally requires handling the specimen itself

49 Is it over? Digitization is just a very first step…

50 Virtual herbarium
The amount of information available online will lower the number of physical visits to the Herbarium
… but visitors leave post-it notes on the sheets. How to replace this?
– Annotation systems
– “virtual visit” website

51 Spot the differences … AFM FABACEAE Abrus aureus R. Vig. ?

52 Differences are
Occurrence
– occurrenceID | catalogNumber | occurrenceDetails | occurrenceRemarks | recordNumber | recordedBy | individualID | individualCount | sex | lifeStage | reproductiveCondition | behavior | establishmentMeans | occurrenceStatus | preparations | disposition | otherCatalogNumbers | previousIdentifications | associatedMedia | associatedReferences | associatedOccurrences | associatedSequences | associatedTaxa
Event
– eventID | samplingProtocol | samplingEffort | eventDate | eventTime | startDayOfYear | endDayOfYear | year | month | day | verbatimEventDate | habitat | fieldNumber | fieldNotes | eventRemarks
Location
– locationID | higherGeographyID | higherGeography | continent | waterBody | islandGroup | island | country | countryCode | stateProvince | county | municipality | locality | verbatimLocality | verbatimElevation | minimumElevationInMeters | maximumElevationInMeters | verbatimDepth | minimumDepthInMeters | maximumDepthInMeters | minimumDistanceAboveSurfaceInMeters | maximumDistanceAboveSurfaceInMeters | locationAccordingTo | locationRemarks | verbatimCoordinates | verbatimLatitude | verbatimLongitude | verbatimCoordinateSystem | verbatimSRS | decimalLatitude | decimalLongitude | geodeticDatum | coordinateUncertaintyInMeters | coordinatePrecision | pointRadiusSpatialFit | footprintWKT | footprintSRS | footprintSpatialFit | georeferencedBy | georeferenceProtocol | georeferenceSources | georeferenceVerificationStatus | georeferenceRemarks
Identification
– identificationID | identifiedBy | dateIdentified | identificationReferences | identificationRemarks | identificationQualifier | typeStatus
Taxon
– taxonID | scientificNameID | acceptedNameUsageID | parentNameUsageID | originalNameUsageID | nameAccordingToID | namePublishedInID | taxonConceptID | scientificName | acceptedNameUsage | parentNameUsage | originalNameUsage | nameAccordingTo | namePublishedIn | higherClassification | kingdom | phylum | class | order | family | genus | subgenus | specificEpithet | infraspecificEpithet | taxonRank | verbatimTaxonRank | scientificNameAuthorship | vernacularName | nomenclaturalCode | taxonomicStatus | nomenclaturalStatus | taxonRemarks
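The gap between what mass digitization captures (name and geographic area) and the full Darwin Core vocabulary can be illustrated with a sparse record. The sketch below is an assumption-laden example: the subset of terms, the catalog number and the continent value are chosen for illustration only, and the scientific name is the one shown on the previous slide.

```python
# A small, hand-picked subset of the Darwin Core terms, grouped by class.
DWC_TERMS = {
    "Occurrence": ["catalogNumber", "recordedBy", "recordNumber"],
    "Event": ["eventDate", "habitat"],
    "Location": ["continent", "country", "locality"],
    "Identification": ["identifiedBy", "dateIdentified", "typeStatus"],
    "Taxon": ["scientificName", "family", "genus", "specificEpithet"],
}

# Sparse record as it might look right after mass digitization.
# The catalog number and continent are made up for illustration.
record = {
    "catalogNumber": "P00000001",
    "continent": "Africa",
    "family": "Fabaceae",
    "genus": "Abrus",
    "specificEpithet": "aureus",
    "scientificName": "Abrus aureus R. Vig.",
}

def missing_terms(rec, terms_by_class):
    """List, per Darwin Core class, the terms that still have no value."""
    return {
        cls: [term for term in terms if not rec.get(term)]
        for cls, terms in terms_by_class.items()
    }

gaps = missing_terms(record, DWC_TERMS)
for cls, terms in gaps.items():
    print(cls, terms)
```

Everything reported as missing (collector, date, locality, determination history) is information that is still only on the sheet image and has to be transcribed afterwards.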

53 OCR / NLP?

54 Projects to fill the gap
Remote taxonomists
– Yack web tool
Citizen science / crowdsourcing
– “les collecteurs” project
Repatriation project
– Reflora (Brazil)

55 Thank you!
A project managed by:
Direction of Collections
– Michel Guiraud, mguiraud (at) mnhn (.) fr
– Pascale Joannot, joannot (at) mnhn (.) fr
DSI (Information Systems)
– Henri Michiels, michiels (at) mnhn (.) fr
– Simon Chagnoux, chagnoux (at) mnhn (.) fr

