Bridging the Gap between Libraries and Data Archives: Progress Report Roger Revelle, Gulf of California Expedition, 1939 JISC/NSF Digital Libraries Initiative.

1 Bridging the Gap between Libraries and Data Archives: Progress Report Roger Revelle, Gulf of California Expedition, 1939 JISC/NSF Digital Libraries Initiative All Projects Meeting 24-25 June 2002, Edinburgh

2 Two new NSF Projects … “Bridging the Gap between Libraries and Data Archives” NSDL Collections Track “SIOExplorer: Web Exploration of Seagoing Archives” Information Technology Research (ITR) Started October 2001

3 Collaborative effort UCSD Libraries Scripps Institution of Oceanography San Diego Supercomputer Center Advisory Board NOAA US Naval Oceanographic Office Private Industry Other oceanographic institutions

4 Combine … Data 50 years of digital data Growing 200 GB per year Images 99 years of SIO Archives Documents Reports, publications, books … into one digital library

5 Data in the collection …

6 Bathymetry, magnetics, gravity Gathered from worldwide sources 795 SIO cruise legs Swath bathymetry since 1981 Approx. 3000 cruise legs online at SIO

7 Multibeam sonar revolutionizes seafloor understanding Map a wide swath Not just a single profile –SeaBeam Classic, 1981-1992 –16 beams –SeaBeam 2000, 1992- –121 beams –SeaBeam 2100, 1996-2000 –151 beams –Simrad EM120, 2001- –191 beams –150 degree swath width Also backscatter –Determine bottom type –Sediment –Lava flow Realtime swath 20 km across-track

8 SIO Swath Mapping Expeditions 244 swath mapping cruises on vessels, since 1981 Thomas Washington Melville Revelle 600 GB multibeam holdings Adding 200 GB/year

9 Deliver sampling information Sample index, 1968- 100,000 entries 500 types –Dredged rocks, cores –Biological trawls –Water samples –CTD Build on www.EarthRef.org Seamount catalog (Amelia Earhart) Roger Revelle, MidPac, 1950

10 Images in the collection …

11 Access Voyages of Discovery Encourage inquiry “What’s this?” links from image –Data (“What”) –Instruments (“How”) –Other voyages Dual use Research and education Naga Expedition, 1959-61 (artist’s illustrations from logbook)

12 R/V Albatross departed SIO 1904 Sigsbee sounding machine

13 Voyages of Discovery in the Pacific La Perouse 1780’s R/V Revelle “La Perouse Expedition” –Departed June 8 R/V Melville “Cook Expedition” –Returns July 17 Special Collections, UCSD Library James Cook By Nathaniel Dance, 1776

14 Voyages of Discovery in the Pacific 1950’s Ed Hamilton, MidPac, 1950 Samoa, Capricorn, 1952

15 R/V Spencer F. Baird L to R back row: Dick Von Herzen, Roger Revelle, Willard Bascom, Ted Folsom, Alan Jones, Gustaf Arrhenius, Henri Rotschi, Robert Livingston, Russell Raitt. Seated: Dick Blumberg, Ronald Mason, Bob Dill, Art Maxwell, Winter Horton, Walter Munk, Helen Raitt Capricorn Expedition, 1952-53 Query for ideas and careers Not just data Track a scientist’s expeditions and publications

16 Documents in the collection …

17 Full text of publications The Challenger Expedition 30,000 scanned pages Anatomy of an Expedition Bill Menard, 1967 Nova Expedition –Link to 1998 Avon Expedition Exploring the Deep Pacific Helen Raitt, 1952 Capricorn Expedition

18 Cruise reports 50 years available Scan older versions Currently generate .pdf automatically Page with swath bathymetry every 6 hours

19 Bridging the Gap: Progress Report

20 The Problem Archives are search-impaired Content not a problem Material exists in great abundance Data archives Historical archives But it is hard to get Litany of woes …

21 Litany of archive woes Magnetic media at risk Need to migrate to new storage Local access only Some online, but sprawling directories Tapes and CDs in drawers Inconsistent naming over 30 years Home-grown software Pre-database technology Minimal documentation Formal metadata non-existent Creators now retired What to do? Shipboard archives for one recent cruise

22 Steps toward a Solution Seek professional help Computer scientists Advisory Board (Similar problems faced in many fields) Review the problem Seven issues from national workshop Analyze the dataflow Build a prototype Test the prototype New Zealand – Samoa Expedition

23 Search Metadata rarely exist Access Automated management Quality A challenge Display Interactive tools Flexibility Import, export Scalability Interoperate with large projects Stability Curation, beyond end of project Review archive problems NSF/ONR Marine Geology and Geophysics Workshop

24 First, create a conceptual data model Spend time to review with all participants Design a robust model Define common categories –9 basic directories –Specific subdirectories Controlled design document Map existing digital objects to categories Both documents and data Accommodate variations –Data types and names over 50 years –Valid for future developments Result “CCDS” – Canonical Cruise Data Structure Dataflow

25 Second, organize domain-specific content Work inside a “Staging Area” Deal with complexity –Extract from 3 archive levels –Shipboard (tape, CD) –Post-processing lab (tape) –Current online content – (not always “best”) Opportunity for data cleanup –Apply corrections –Weed out intermediate and duplicate versions –Gather information for metadata

26 Third, load the “CCDS” Clear transition in activities Domain specialists final approval IT team takes over Early mistake “Pushed” content from legacy data directories –Complex, vary over years –Revised to “pull” into Canonical Structure IT lesson learned Dataflow needs to be “template-driven” Template can incorporate –Rules for automatic loading –Adaptive choice among multiple alternatives Maintain flexibility as project evolves –Team members negotiate content of template
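The "template-driven" pull described on this slide can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the rule patterns, the canonical directory names, the `.mb41` file extension, and the `pull` function are all hypothetical, since the slides do not name the nine CCDS directories or the template format.

```python
from fnmatch import fnmatchcase

# Hypothetical template: ordered (filename pattern, canonical directory) rules.
# The directory names are illustrative; the slides do not list the CCDS directories.
TEMPLATE = [
    ("*.mb*", "data/multibeam"),
    ("*report*.pdf", "documents/reports"),
    ("*.jpg", "images"),
]

def pull(legacy_paths, template):
    """Assign each legacy path to the first canonical directory whose rule matches.

    Paths that match no rule are returned separately so a domain specialist
    can review them instead of having them loaded silently.
    """
    placed, unmatched = {}, []
    for path in legacy_paths:
        # Match on the lower-cased file name only, ignoring legacy directories.
        name = path.rsplit("/", 1)[-1].lower()
        for pattern, target in template:
            if fnmatchcase(name, pattern):
                placed.setdefault(target, []).append(path)
                break
        else:
            unmatched.append(path)
    return placed, unmatched

# Illustrative legacy paths (made up for this sketch).
legacy = [
    "tape07/AVON02MV/line12.mb41",
    "cd3/cruise_report_final.pdf",
    "misc/notes.txt",
]
placed, unmatched = pull(legacy, TEMPLATE)
print(placed)
print(unmatched)
```

Keeping the rules in a data structure rather than in code is what lets team members negotiate the template's content without touching the loader.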

27 Fourth, load the data Persistent data archive management Use the “Storage Resource Broker” –San Diego Supercomputer Center product Fifth, load the metadata Harvest metadata from data files, automatically Provide tools for metadata editing Load into Oracle

28 Building a Collection Developer’s Toolkit

29 Collection Developer’s Toolkit Make it easy to build, and maintain Not just for IT experts Portable and scalable for other projects Integrate Metadata tools Data tools Interactive search and display console

30 Make use of existing resources Alexandria Digital Library Geospatial content OAI-compliant server Environmental data archive and delivery tools John Helly, http://ceed.sdsc.edu/ Storage Resource Broker http://www.npaci.edu/DICE/SRB/index.html/ Domain-specific toolkits GMT, MB-System, ARC/IMS

31 Build metadata tools Automate Bulk harvesting from data files Bulk loading into Oracle database Use NSDL community standards Dublin Core + “ADN” metadata –Alexandria Digital Library (UCSB) –DLESE (Digital Library for Earth System Education) –NASA Controlled vocabularies –Science themes –Geographic names Embed domain-specific metadata into standards Multibeam, cruise, sampling
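As a rough illustration of the Dublin Core layer mentioned on this slide, the sketch below builds one record with Python's standard library. The `record` wrapper element, the `dublin_core_record` helper, and the field values are assumptions made for the example; the "ADN" extensions and the project's actual Oracle loading step are not shown.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

def dublin_core_record(fields):
    """Build an XML element holding one dc:* child per field.

    The <record> wrapper is a hypothetical container chosen for this sketch.
    """
    root = ET.Element("record")
    for name, value in fields.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return root

# Example values; the cruise identifier and region come from the search slide,
# but the title string is made up for illustration.
record = dublin_core_record({
    "title": "Cruise AVON02MV",
    "type": "Dataset",
    "coverage": "Samoa",
})
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Domain-specific fields (multibeam, cruise, sampling) would sit alongside these elements in a real record; how they are embedded is not specified in the slides.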

32 MOBE Metadata Object Browser and Editor Inherit metadata from –Dublin Core –Cruise Flexible –Expand for projects as needed –Generic ascii metadata interchange format “MIF” –Export to xml Java

33 Search interface Design for alternative approaches Geospatial –Lat, lon Temporal –“1995-2000” Keyword –Region “Samoa” –Vessel “Melville” –Cruise “AVON02MV” –Data type “dredge” –Scientist “Staudigel” Expert-level –Research, teacher, student, public Prototype search interface
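The three search approaches on this slide (geospatial, temporal, keyword) could be combined along the following lines. This is a sketch over in-memory records only; the `CruiseRecord` fields, the `search` function, the second cruise entry, and all coordinates and years are assumptions made for illustration, not values from the SIOExplorer catalog.

```python
from dataclasses import dataclass

@dataclass
class CruiseRecord:
    cruise: str
    vessel: str
    region: str
    year: int
    lat: float   # representative latitude, degrees north
    lon: float   # representative longitude, degrees east

def search(records, bbox=None, years=None, **keywords):
    """Return records passing every filter that was supplied.

    bbox  -- (south, west, north, east); does not handle boxes that cross
             the antimeridian
    years -- (first, last), inclusive
    keywords -- exact, case-insensitive matches on string fields
    """
    hits = []
    for r in records:
        if bbox is not None:
            south, west, north, east = bbox
            if not (south <= r.lat <= north and west <= r.lon <= east):
                continue
        if years is not None and not (years[0] <= r.year <= years[1]):
            continue
        if any(getattr(r, k).lower() != v.lower() for k, v in keywords.items()):
            continue
        hits.append(r)
    return hits

# Illustrative records: the positions, years, and second entry are made up.
records = [
    CruiseRecord("AVON02MV", "Melville", "Samoa", 1998, -14.0, -172.0),
    CruiseRecord("EXAMPLE01", "Revelle", "Tonga", 2001, -21.0, -175.0),
]

hits = search(records, years=(1995, 2000), vessel="melville")
print([r.cruise for r in hits])
```

Treating each filter as optional keeps one code path for all user levels: a public interface might expose only the keyword arguments, while a research interface adds the bounding box and year range.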

34 CruiseViewer Interactive browser and query interface Display tracks and samples Download library objects Java

35 Manage interfaces for multiple projects Both data and metadata

36 Lessons learned (so far)…

37 Make it easier to collaborate Interactions between groups Not just a technology project Diverse goals, vocabularies and audiences Interoperate Each domain has own sphere of responsibility –Don’t engineer someone else’s domain Work through interfaces –Re-negotiate as needed –Avoid long-term maintenance headaches between domains

38 Build tools for collaborative projects 3 “cultures” in this project Oceanographers Computer scientists Librarians Example: bridge vocabularies between separate domains Use metadata “triples,” not “pairs” Reduce phone calls by including narrative label
Parameter name: science_themes
Value: geochemistry, marine geology, marine geophysics, hot spots, mantle plumes, geochronometry, seamount chains
Narrative label: keywords, from controlled vocabulary of science terms, selected from the “SIOExplorer Science Theme” template
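The "triples, not pairs" idea on this slide can be illustrated with a small data structure. This is only a sketch: the `MetadataTriple` class and `describe` helper are hypothetical names, and the slides do not say how the project actually stores or serializes its triples. The example values are the ones shown on the slide.

```python
from typing import NamedTuple

class MetadataTriple(NamedTuple):
    """A name/value pair extended with a human-readable narrative label."""
    name: str    # machine-oriented parameter name
    value: str   # the metadata value itself
    label: str   # narrative explanation for readers from other domains

triple = MetadataTriple(
    name="science_themes",
    value=("geochemistry, marine geology, marine geophysics, hot spots, "
           "mantle plumes, geochronometry, seamount chains"),
    label=("keywords, from controlled vocabulary of science terms, selected "
           "from the \"SIOExplorer Science Theme\" template"),
)

def describe(t: MetadataTriple) -> str:
    """Render a triple so the label travels with the name and value."""
    return f"{t.name} = {t.value}  ({t.label})"

print(describe(triple))
```

Because the label is carried with every parameter, a librarian or computer scientist reading the record does not need to ask an oceanographer what `science_themes` means.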

39 Adding new projects to SIOExplorer Make use of Collection Developer’s Toolkit NSDL server Metadata interchange Query processing SDSC –Managed storage –Web service

40 Test the prototype Melville departs Lyttelton harbor

41 Floating Digital Library Workshop R/V Melville March 7-21, New Zealand to Samoa Realtime acquisition of library objects? Load metadata into swath files –At acquisition time Specify cruise metadata Sensor documentation database Load the CCDS Learn from a common experience

42 A good day at 51° S Renewed appreciation for the collection of field data

43 Common experience Librarians Computer scientists Oceanographers Royal New Zealand Navy Melville in Lyttelton Collaboration between SIO and RNZN

44 Floating Digital Library Workshop Librarian at sea Computer scientist in galley Oceanographer holding onto computer

45 Bollons Gap survey New Zealand Law of the Sea Claim Librarian at sea Visualization of swath bathymetry, looking north

46 Heading for Samoa Crossing the Louisville Ridge Tonga Trench Osbourn Trough (ancient spreading center) Visualization of Global Topography, looking north

47 Relate cruise to SIO holdings Display search results Red –SIO multibeam Black –Other cruises Yellow –SIO dredged rock samples Also –Volcanoes –Earthquakes –Plate boundaries Typical research support product Make it available on web Select cruises for further study Export for ArcView –Related NSF/ITR project

48 Data Publishing Toolkit for Digital Library Interoperability: Integrating the Albatross Cruise Holdings into SIOExplorer NSF Division of Biological Infrastructure Collaboration with Smithsonian Institution Biogeography and Geology of the Oceans: SIO Collections Gateway for the NSDL NSF NSDL Collections Track Track of the Albatross, 1884-1921 Next steps

49 SIOExplorer: Expedition Planner Open research data for student discovery Leverage Digital Library efforts Students design a virtual expedition –Explore relationships –Depth, Sediment thickness, Crustal age –More … –Earthquakes, volcanoes, trenches –Wind, waves, currents –Climate Students publish expedition report –On the web Teacher workshops –At the Birch Aquarium Crustal Age Sediment thickness Global Topography

50 SIO 100th Anniversary September 26, 2003 SIO, 1909 http://SIOExplorer.ucsd.edu R/V Alexander Agassiz, 1907

