CSIRO Marine Research Divisional Data Centre Current and Future Activities Tony Rees, Data Centre Manager April 2004.


Talk outline
- General Divisional context – past and present
- Data Centre approaches and tools – including MarLIN, Data Warehouse & Trawler, CAAB, C-squares, and OBIS
- Data Centre services to CMR projects
- Cleveland-specific issues
Target audience and level of talk
- Introductory / overview level, some examples but not full detail
- Aimed at CMR staff in general, project managers, plus project metadata staff
- Database designers and application developers will find material of interest, but will need separate, more detailed information.

What are the Division’s chief assets?
- Our people (and their intellectual capabilities)
- Our hardware (collecting platforms etc.) and technologies
- Our data – newly collected, plus historic data
How do we manage our data assets?
- Mixture of good, moderately good, and not good at all
- “good” – well documented; details in searchable catalogue; appropriate/current formats; online access (to appropriate users); ongoing curation
- “moderately good” and “not good” depart from the above, to lesser or greater degree
- Data Centre curates selected datasets on behalf of the Division; others reside long-term in projects
- Data Centre also maintains “MarLIN” – the Division’s data catalogue (metadata system)

Overview of metadata, data systems – national context
- Agencies: CMR, NOO, GA, AAD, AIMS, etc.
- Metadata systems (e.g. MarLIN, Neptune) describe / point to the agency data: CMR data, NOO data, GA data, AAD data, AIMS data, etc., plus 3rd party data (CMR copy)
- ASDD: Australian Spatial Data Directory – national cross-agency metadata gateway
- Example search via ASDD – search across multiple agencies, basic functionality
- Search via MarLIN – search only CMR holdings, but extra functionality (also view “CMR internal” records not visible to external users)

The Card Index...

MarLIN Marine Laboratories Information Network Divisional Data Catalogue (metadata system)

What is in MarLIN?
- Descriptions of <2,000 Divisional datasets (including c.1000 held by the Data Centre)
- Individual MarLIN records are searchable by subject, keyword, CMR project, geographic region, time period, biological species, voyage reference, and more
- Records contain metadata (“data about data”) in a common structure (ANZLIC format plus CMR-specific additional fields)
- Records can contain links to images, related documents, data files, and other metadata records
- “Quick maps” (using c-squares data footprints, see later) can indicate the spatial extent of the data
Who creates MarLIN metadata records?
- Records are created/maintained by the data custodians, who best understand the data and associated useful resources, using an online metadata entry form (Data Centre staff can assist with this process)
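As a rough illustration of searching records that share a common structure, the sketch below filters a small in-memory catalogue by several criteria at once. The `MetadataRecord` class, its field names, and the sample entries are all hypothetical; this is not MarLIN's actual schema or search code.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataRecord:
    """Hypothetical, much-simplified stand-in for a catalogue record."""
    title: str
    keywords: set[str] = field(default_factory=set)
    project: str = ""
    region: str = ""
    species: set[str] = field(default_factory=set)


def search(records, *, keyword=None, project=None, region=None, species=None):
    """Return the records that match every criterion that was supplied."""
    hits = []
    for rec in records:
        if keyword is not None and keyword.lower() not in {k.lower() for k in rec.keywords}:
            continue
        if project is not None and rec.project != project:
            continue
        if region is not None and rec.region != region:
            continue
        if species is not None and species not in rec.species:
            continue
        hits.append(rec)
    return hits


# Made-up sample entries, for illustration only.
catalogue = [
    MetadataRecord("Example CTD dataset", {"CTD", "temperature"}, "Project A", "Region 1"),
    MetadataRecord("Example trawl survey", {"catch", "trawl"}, "Project B", "Region 1", {"Species X"}),
]

ctd_hits = search(catalogue, keyword="ctd")
region_hits = search(catalogue, region="Region 1")
```

Because every record has the same fields, one search function can serve all datasets, which is the point of agreeing on a common metadata structure.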

revised Data Centre website (extract)

Sample MarLIN content Alphabetical dataset lists

Sample MarLIN content Alphabetical dataset lists Indexes by keyword, etc.

Sample MarLIN content Alphabetical dataset lists Indexes by keyword, etc. Brief dataset details

example search result... (etc.)

Viewing the full metadata record produces... (etc.) with clickable link to show dataset extent using c-squares:

(Quick look at the ASDD)

What’s in it for me / us?
- Allows CMR staff / others to know what data we have already, what we are collecting (or plan to collect), what we do not have (gap analysis) – facilitates data re-use, avoids duplicate acquisition, fosters collaborations
- Permits inspection of relevant data documentation in order to assess data usefulness / completeness / quality, inspect thumbnails of data coverage, etc.
- Gives a contact person and/or electronic access for the data, via a standard entry point
- Provides dissemination of project scientific activities into a new “information space” – online searching via the ASDD, indexing by web search engines, possible future one-csiro system (only don’t hold your breath for the latter)
- Can be feasible for projects to utilise MarLIN to catalogue / access their own data – use MarLIN’s built-in search capability rather than re-invent.

Data Warehouse and Data Trawler

2000 onwards – databasing of “all” Data Centre holdings into a Divisional Data Warehouse, accessed by a custom “Data Trawler” application
- Historic holdings of Hydrology (bottle chemistry) and CTD data – 200,000 HYD analyses, 10,000 CTD casts, from hundreds of research voyages and coastal stations
- Underway data for 175 research voyages (10 million observations) – depth, position, time, meteorological variables, sea temperature, salinity, fluorescence
- Biological (catch composition) data from 85 voyages – 10,000 trawls, 240,000 individual species records (number or weight caught)
- Currents data from 548 moored current meters (3 million readings)
- ADCP data, some old hydrology data still in archives, awaiting migration to on-line Warehouse system.
- Also note, c. 50% of Divisional catch data is not held by the Data Centre at this time (probably still with original investigators)

example Data Trawler Screens

Current Warehouse content accessible via Data Trawler: HYD and CTD data – all years

Current Warehouse content accessible via Data Trawler: moorings data – all years

Current Warehouse content accessible via Data Trawler: catch data – all years

What’s in it for me / us?
- Provides access to centrally held data on a self-serve basis, via a standard web browser
- Allows queries to be constructed by data type, region, time period, species, voyage...
- Contains the actual data, but not text information (the latter is in MarLIN)
- Permits retrieval of data across multiple projects, as an integrated result set in a common format
- Provides preview / mapping of spatial extents of result sets generated (closer to a true web GIS facility cf. MarLIN, which is more of a quick “thumbnail” facility)
- Data are provided in csv / spreadsheet compatible format, suitable for download to the user’s own machine for further manipulation.
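To show what “further manipulation” of a csv result set might look like on the user's side, here is a minimal sketch that totals catch weight per species. The column names and sample rows are invented for the example and are not the Data Trawler's actual export layout.

```python
import csv
import io
from collections import defaultdict

# Made-up sample standing in for a downloaded result set; the column
# names are assumptions, not the real export format.
SAMPLE_CSV = """voyage,species,weight_kg
V01,Species X,12.5
V01,Species Y,3.0
V02,Species X,7.5
"""


def total_weight_by_species(csv_text):
    """Sum the weight column per species across all voyages in the result set."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["species"]] += float(row["weight_kg"])
    return dict(totals)


totals = total_weight_by_species(SAMPLE_CSV)
```

Because rows from different voyages arrive in one common format, a single pass over the file is enough to combine them.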

Systems considered thus far...
[Diagram labels: Remote Applications; Divisional Systems; “MarLIN” Data Catalogue; Divisional Data Warehouse; “Data Trawler” application; Austr. Spatial Data Directory (ASDD); Hyperlinked documents, graphics, etc.; Project-based data holdings; Off line archived data]

CAAB Codes for Australian Aquatic Biota master taxonomic database

1999-current – upgrading of “CAAB” master taxon management system for the Division
- CAAB (Codes for Australian Aquatic Biota) is a database of species names and codes, now covering >25,000 marine species in Australian waters
- Codes are standardised species identifiers for use in Divisional databases (species names may change, codes are intended to be constant)
- “Quick maps” of all catch data in the Warehouse (by species) have been associated with the relevant CAAB record; also predicted species ranges for c. 3,000 fish species
- Individual maps form clickable interface(s) to retrieve corresponding data items (individual catch records) from the Warehouse and display them in a web page

web-accessible version of CAAB

What’s in it for me / us?
- Codes are a standard storage and interchange format for taxonomic information in CMR and other regional databases
- CAAB website and derived tables allow matching of codes to names, and vice versa
- Check correct spelling of species names, full citation, generate Australian species lists per genus / family / larger category
- Links to pictures and maps of CMR data distribution, where available
- “Quick maps” form clickable front end to Data Warehouse queries
- Also provides access to most recent predicted species range in many cases
- Potentially supports “what lives here” queries from predicted species ranges and specified depths (fishes only, at present time).
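The idea of storing a constant code in data tables and resolving it to the current name only when needed can be sketched as follows. The codes and names are placeholders invented for the example, not real CAAB entries, and the helper functions are hypothetical.

```python
# Placeholder codes and names for illustration only; not real CAAB entries.
taxa = {
    "00000001": "Genus alpha",
    "00000002": "Genus beta",
}

# Catch records store only the code, never the name.
catch_records = [
    {"trawl": "T1", "code": "00000001", "count": 14},
    {"trawl": "T2", "code": "00000002", "count": 3},
]


def name_for(code):
    """Look up the current name for a code."""
    return taxa[code]


def code_for(name):
    """Reverse lookup: find the code whose current name matches."""
    for code, current in taxa.items():
        if current == name:
            return code
    raise KeyError(name)


def rename(code, new_name):
    """A taxonomic revision changes the name; the code stays the same."""
    taxa[code] = new_name


rename("00000001", "Genus gamma")
report = [(name_for(r["code"]), r["count"]) for r in catch_records]
```

After the rename, the catch records are untouched, yet any report built from them shows the revised name.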

C-squares Concise Spatial Query and Representation System spatial indexing and mapping utility

“C-squares” mapping / spatial indexing utility
- Original Data Centre creation, 2001 onwards
- Mainly a developer’s tool
- Permits “lightweight” spatial indexing, queries, and web mapping from a standard text-based system (no GIS required)
- Currently used in 4 CMR and 3 international systems (Tony Rees can supply more details if interested).
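A minimal sketch of text-based spatial indexing in the spirit of c-squares follows. It encodes a position as a grid-square code at 10-degree or 1-degree resolution, using a WMO-square-style layout (global quadrant digit, tens of latitude, tens of longitude, then a sub-quadrant digit plus the unit digits). The exact notation is an assumption here and should be checked against the c-squares specification; the function name and sample stations are made up.

```python
def csquare(lat, lon, resolution=10):
    """Return a grid-square code for a position (10- or 1-degree squares only)."""
    if resolution not in (10, 1):
        raise ValueError("this sketch supports only 10- and 1-degree squares")
    # Global quadrant digit: 1 = NE, 3 = SE, 5 = SW, 7 = NW.
    if lat >= 0:
        quadrant = 1 if lon >= 0 else 7
    else:
        quadrant = 3 if lon >= 0 else 5
    # Clamp so that the poles and the 180-degree meridian fall in the last square.
    alat = min(abs(lat), 89.999999)
    alon = min(abs(lon), 179.999999)
    code = f"{quadrant}{int(alat // 10)}{int(alon // 10):02d}"
    if resolution == 10:
        return code
    lat_unit = int(alat) % 10
    lon_unit = int(alon) % 10
    # Sub-quadrant digit 1-4 says which 5-degree quarter of the 10-degree square.
    inter = 1 + 2 * (lat_unit >= 5) + (lon_unit >= 5)
    return f"{code}:{inter}{lat_unit}{lon_unit}"


# Made-up station positions, for illustration only.
stations = {
    "station A": (-42.9, 147.3),
    "station B": (-43.5, 146.1),
    "station C": (-12.0, 130.5),
}
index = {name: csquare(lat, lon, 1) for name, (lat, lon) in stations.items()}

# A spatial query becomes a plain string match: everything in 10-degree square 3414.
in_square = sorted(name for name, code in index.items() if code.startswith("3414"))
```

Since the codes are ordinary strings, they can be stored in any text column and searched with prefix matching, which is what makes this kind of indexing possible without a GIS.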

OBIS Ocean Biogeographic Information System

OBIS – Ocean Biogeographic Information System
- Operated by an international consortium, including CMR representation
- Like a “super CAAB” for the world, but with names only (not codes)
- Can currently access point data for 20,000 marine species from c. 20 institutions worldwide (2 million records), plus lists of names awaiting data, and returns integrated result sets (like Data Trawler)
- Many aspects similar to CAAB, including “Quick maps”, click-on-map spatial queries, OBIS taxonomic groups, and more (Data Centre staff did the interface and query logic)
- CMR catch data to be visible via the system in due course.

Data Centre Services to CMR projects

Who are we?
- Tony Rees (Hobart) – Data Centre manager; MarLIN, CAAB, C-squares technical support & development; national & international connections; project-level advice (metadata)
- Pamela Brodie, Leanne Wilkes (Hobart) – Data Warehouse, Data Trawler support and data loading; project-level advice (databases)
- Miroslaw Ryba (Hobart) – Oracle support; ships biological data collection suite
- Terry Byrne (Hobart) – National Facility Data Librarian; data requests; data archiving
- Hiski Kippo (Floreat) – project-level liaison, DC representation (WA)
- Steven Edgar (Cleveland) – project-level liaison, DC representation (QLD)

“On the ground” DC services to CMR projects
- Advice and assistance to CMR project staff – metadata entry, database design, general data management issues
- Maintaining the Division’s Oracle systems, and provision of Oracle advice and web-based help
- Servicing/forwarding data requests as appropriate
- Migrating project data to the Data Warehouse, for integration with other relevant data holdings, and archiving data to offline media as required
- Looking at whole-of-Division issues such as data access and exchange policies, engagement with relevant national and international data operations, cross-CSIRO data access, etc.
- New Data Management officers in Floreat (2002) and Cleveland (2004)
- Developing interest in GIS data layers and systems e.g. ArcSDE, ArcIMS
- Continuing to advance existing DC systems on three fronts – tools, content, and connectivity (internally, nationally, internationally).

Cleveland-specific issues...
- Steven Edgar has an advisory role for Data Management in projects at the Cleveland site (project personnel actually do the project-level management); can assist with database design, etc., also some/all Oracle administration needs
- Steve’s time (or portions of it) can be spent on migrating project data to our central Warehouse/Trawler system, also assisting project staff with metadata entry as needed
- Steve brings new expertise in GIS systems to the Data Centre; will take an interest in cross-project / cross-Divisional GIS issues and progress where possible
- Steve can act as conduit for technology/content/expertise transfer in 2 directions (DC systems/tools > CMR projects and vice versa) – also the “eyes and ears” of the Data Centre in Cleveland to bring local issues to Hobart attention as needed
- Additional Hobart-based staff are only an email or phone call away if they can be of assistance.

Summary – an idealised “data life cycle” at CMR
[Diagram labels: Project starts; PSS; administrative details; project overview; “MarLIN” Data Catalogue; interim documents, graphics, etc.; Project-based data holdings; project data; Divisional Data Warehouse; “Data Trawler” application; Persistent project db’s; project data repository; published output; Project completed; Off line archived data]

Towards “best practice” data management at project level...
- Projects should be recording the existence of their data in MarLIN – ideally sooner rather than at end of project
- Data should eventually be migrated off PCs into Divisional systems
- As much relevant data as possible should be in the Warehouse
- Effort should be made to produce a definitive / final version of the data
- Data Centre can help with archiving for closed projects
- Data Warehouse table structure, and other Divisional databases, can provide starting points / examples for project level databases
- Taxonomic / survey data recording should employ CAAB codes as a Divisional standard
... refer to the Data Centre internal website and local Data Centre person/s for additional information.

Some action items / ideas for discussion...
- Upgrade MarLIN content to reflect the true data holdings of the Division (augmented with project descriptions as available)
- Look into migrating more “completed” project datasets into centralised (Data Centre) holdings / systems
- Locate as much as possible of the “missing” catch data, to add to present Warehouse content
- Obtain clearance as needed to make CMR catch data visible to the outside world (currently, it is all intranet-only) via Data Trawler and other linked systems (CAAB, OBIS, others)
- Assist project staff with pressing data management issues and work to ensure good technology transfer for database design, etc.
- Work with key project staff to progress the usefulness of the “new” web-enabled GIS systems across appropriate datasets, for the benefit of multiple users
- Identify needs to digitise important non-digital data holdings (notebooks, field log sheets etc.) and assist in seeking resources to digitise them.

Feedback / discussion time...

Summary of core Data Centre components as at April 2004
[Diagram labels: Remote Applications; Divisional Systems; “C-squares” Spatial Indexing / Mapping System; Divisional Data Warehouse; “MarLIN” Data Catalogue; Hyperlinked documents, graphics, etc.; “CAAB” Taxonomic Database; “Data Trawler” application; Project-based data holdings; Off line archived data; Distributed AODC? OBIS? other?; Austr. Spatial Data Directory (ASDD); external c-squares users – FishBase, OBIS, others; jsp/loginpage.jsp asdd.ga.gov.au/asdd/ (e.g.)]