
1 An On-line Collaborative Data Management System
Roger Curry (1), Cameron Kiddle (1), Rob Simmonds (1) and Gilberto Z. Pastorello Jr. (2)
(1) Grid Research Centre, University of Calgary
(2) Centre for Earth Observation Science, University of Alberta

2 Outline
• Data Challenges
• Related Work
• Data Management System
• Use Case: GeoChronos
• Summary and Future Work
GCE 2010, Nov. 14, 2010

3 Data Challenges - I
• Data Acquisition
  - Much scientific data is stored on off-line media
  - Cumbersome and time-consuming to access
  - Making data available on-line is difficult
  - Insufficient storage and bandwidth
• Sharing of Data
  - Lack of willingness to share data
  - Proprietary data: need for controlled access

4 Data Challenges - II
• Usability of Data
  - Insufficient metadata to describe data
  - Some domains have several metadata standards, but many have none; many scientists use their own metadata format
• Finding Data
  - Difficult to find the data you need
  - Different data is organized and stored differently
  - Tools to browse, search and visualize data are often lacking

5 Related Work - I
• Content Management Systems
  - e.g., Drupal, Joomla!, Microsoft SharePoint, Plone, ...
  - Offer a rich set of features but do not provide:
    - Meaningful support for specific data formats
    - Efficient association of metadata and ancillary files with data sets
    - Access to a variety of data processing tools
    - Uniform handling of outputs from processing tools
• Spectral Libraries
  - e.g., USGS, ASTER, Vegetation Spectral Library (VSL)
  - Are available on-line but lack:
    - The ability to dynamically restructure metadata for browsing
    - Collaboration features enabled by social networking

6 Related Work - II
• Spectral Library Tools
  - e.g., DLR-DFD Spectral Archive, SPECCHIO
  - Flexible in creating and handling metadata but:
    - Have a fixed metadata schema, so they do not support new metadata needs
• Data repositories for other domains
  - e.g., Astrophysics Data System, FLUXNET, European Bioinformatics Institute (EBI) databases
  - Offer a wide range of functionality but:
    - Primarily focus on data that is already validated and structured
    - Do not handle preliminary, intermediate or untested data (i.e., research in progress)
• Digital Libraries
  - e.g., Planetary Data Systems, NCore, SciPort
  - Have flexible functionality but:
    - Most focus on well-defined digital artefacts
    - Limited in handling collaboration on evolving data, metadata and schemas

7 Data Management System - Overview
• Supports the following functionality:
  - On-line access to data
  - Enables scientists to share data while maintaining control of who sees it
  - Ability to add and edit metadata while working with multiple schemas
  - Collaboratively create new schemas to facilitate consistent and accurate recording of metadata
  - Dynamically restructure the way data is browsed

8 Data Management System - Framework
• User & Data:
  - User acquires data from a sensor and uploads it to the portal
  - Direct acquisition of data is also possible
• Elgg Portal:
  - Built on top of Elgg, an open-source social networking platform
  - Fine-grained access control
  - Flexible data model
• Data Storage:
  - Currently local NFS storage
  - Working on a distributed iRODS-based system
• Data Ingestion Service:
  - Creates records, parses metadata, establishes ancillary relationships
  - Deployed on a cloud-based Condor pool
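The three ingestion steps named on this slide (create records, parse metadata, establish ancillary relationships) could look roughly like the Python sketch below. Everything in it is an assumption made for illustration: the slides do not show the service's code, and the `key: value` header format, the stem-matching rule for ancillary files, and the names `parse_header` and `ingest` are hypothetical.

```python
from pathlib import PurePath


def parse_header(text):
    """Parse leading 'key: value' lines; stop at the first line that is not one.

    The header format is an assumption, not a format named in the slides.
    """
    metadata = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep or not key.strip():
            break
        metadata[key.strip()] = value.strip()
    return metadata


def ingest(files, data_suffixes=(".txt",)):
    """Build one record per data file from a mapping of filename -> text content.

    Files that are not data files are attached as ancillary files to the
    record whose data file shares their stem (a hypothetical matching rule).
    """
    records = {}
    for name, text in files.items():
        path = PurePath(name)
        if path.suffix in data_suffixes:
            records[path.stem] = {
                "file": name,
                "metadata": parse_header(text),
                "ancillary": [],
            }
    for name in files:
        path = PurePath(name)
        if path.suffix not in data_suffixes and path.stem in records:
            records[path.stem]["ancillary"].append(name)
    return list(records.values())


# Example upload: one spectrum with a photo, plus an unrelated document.
uploaded = {
    "sample01.txt": "site: Field A\nsensor: ASD\n350 0.041\n351 0.043\n",
    "sample01.jpg": "",
    "notes.pdf": "",
}
ingested = ingest(uploaded)
```

Here `sample01.jpg` is attached to the `sample01` record, while `notes.pdf` matches no data file and is left out.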

9 Data Management System - Data Model
(Source: http://docs.Elgg.org/wiki/File:Elgg_data_model.png)
• Arbitrary metadata can be assigned to any entity
• Annotations allow users to comment on entities not owned by them
• Data management system adds three new types of ElggObjects
  - Schema
  - Collection
  - Record

10 Data Management System - Schemas
• Create schemas
  - Custom or standards-based (e.g., Dublin Core)
  - Individually or as a collaborative team
• Schemas consist of
  - Namespace
  - Description
  - Read/write access permissions
  - Series of metadata keys
• Metadata keys consist of
  - Name
  - Description
  - Type (text, latlong, ancillary)
  - Optionality: required, recommended, optional
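As a rough illustration of the schema structure on this slide, the Python sketch below models a schema and its metadata keys and checks a record's metadata against it. The class names, the `validate` helper and the example keys are assumptions made for illustration. Access permissions are left out for brevity, and the actual system is built on Elgg, whose implementation is not shown in the slides.

```python
from dataclasses import dataclass, field
from enum import Enum


class Optionality(Enum):
    REQUIRED = "required"
    RECOMMENDED = "recommended"
    OPTIONAL = "optional"


@dataclass
class MetadataKey:
    name: str
    description: str
    type: str  # "text", "latlong" or "ancillary", as listed on the slide
    optionality: Optionality = Optionality.OPTIONAL


@dataclass
class Schema:
    namespace: str
    description: str
    keys: list = field(default_factory=list)  # list of MetadataKey

    def validate(self, metadata):
        """Hypothetical helper: report missing and unknown metadata keys."""
        report = {"missing_required": [], "missing_recommended": [], "unknown": []}
        for key in self.keys:
            if key.name in metadata:
                continue
            if key.optionality is Optionality.REQUIRED:
                report["missing_required"].append(key.name)
            elif key.optionality is Optionality.RECOMMENDED:
                report["missing_recommended"].append(key.name)
        known = {key.name for key in self.keys}
        report["unknown"] = sorted(set(metadata) - known)
        return report


# A small schema using three Dublin Core element names.
dc = Schema(
    namespace="dc",
    description="Subset of Dublin Core",
    keys=[
        MetadataKey("title", "Name of the resource", "text", Optionality.REQUIRED),
        MetadataKey("creator", "Who made the resource", "text", Optionality.RECOMMENDED),
        MetadataKey("coverage", "Where the sample was taken", "latlong"),
    ],
)
report = dc.validate({"title": "Lichen sample 12", "sensor": "ASD"})
```

In this example the report lists `creator` as a missing recommended key and `sensor` as a key unknown to the schema, while the absent optional `coverage` key is not flagged.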

11 Data Management System - Collections
• Group of related data
  - e.g., spectral library, set of satellite data
• Collection consists of
  - Name, description, read/write access permissions, metadata, records

12 Data Management System - Records
• Atomic unit of the data management system
  - Usually represents a single file, but does not need to be associated with a file
• Tabbed interface for viewing:
  - Spectral plot, metadata, ancillary data, map, comments
  - Custom tabs based on data type

13 Data Management System - Virtual Directory Structure
• Dynamic restructuring of data for browsing purposes
• Folders based on metadata keys/values
• User can customize the metadata keys used to establish the directory hierarchy
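The idea of folders derived from metadata keys and values can be sketched in Python as a recursive grouping, with one folder level per user-chosen key. This is a minimal illustration under assumed names and data (the `build_virtual_tree` function, the record layout and the sample file names are all hypothetical), not the system's actual implementation.

```python
from collections import defaultdict


def build_virtual_tree(records, keys, missing="(unset)"):
    """Group records into nested folders, one folder level per metadata key.

    Returns nested dicts keyed by metadata value, with a list of record
    names at the deepest level. Records lacking a key go under `missing`.
    """
    if not keys:
        return [record["name"] for record in records]
    first, rest = keys[0], keys[1:]
    groups = defaultdict(list)
    for record in records:
        groups[record["metadata"].get(first, missing)].append(record)
    return {
        value: build_virtual_tree(groups[value], rest, missing)
        for value in sorted(groups)
    }


records = [
    {"name": "oak_01.asd", "metadata": {"species": "oak", "sensor": "ASD"}},
    {"name": "oak_02.spu", "metadata": {"species": "oak", "sensor": "UNISPEC"}},
    {"name": "lichen_01.asd", "metadata": {"species": "lichen", "sensor": "ASD"}},
]

# The same records, browsed under two different hierarchies.
by_species_then_sensor = build_virtual_tree(records, ["species", "sensor"])
by_sensor = build_virtual_tree(records, ["sensor"])
```

Because the tree is computed from metadata on demand, changing the key list restructures the view without moving any stored files.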

14 Use Case - GeoChronos
(http://geochronos.org/)

15 GeoChronos - Overview
• An on-line platform
  - For:
    - Earth Observation Scientists
  - Facilitating:
    - Collaboration between scientists
    - Data access, management and sharing
    - Application access, management and sharing
  - Leveraging:
    - Web 2.0 and social networking technologies
    - Cloud computing technologies
  - Funded by:
    - CANARIE - Network Enabled Platform (NEP-1) program
    - Cybera

16 GeoChronos - Project Team
Principal Investigators
• Dr. Arturo Sanchez-Azofeifa, University of Alberta
• Dr. John Gamon, University of Alberta
• Dr. Benoit Rivard, University of Alberta
• Dr. Rob Simmonds, University of Calgary
Project Coordination, Platform Development, Domain Scientists

17 GeoChronos - Virtual Organization

18 GeoChronos - Spectral Libraries
• Libraries created
  - Ingested some existing on-line libraries: USGS, ASTER, Vegetation Spectral Library (VSL)
  - Many enhanced features as part of the GeoChronos Spectral Library module: improved browsing, dynamic plotting, mapping, annotations, ...
  - Domain scientists have contributed libraries: rock samples, tar sand samples, lichen samples, vegetation samples, alfalfa/barley field samples
• Data formats / parsers supported
  - ENVI, UNISPEC, ASD, several ASCII formats
• Schemas incorporated
  - Library specific: USGS, ASTER, VSL, ...
  - Sensor/format specific: UNISPEC, ENVI, ...
  - Other standards: Dublin Core
• Currently hosting (including MODIS data)
  - 10+ schemas, 20+ collections (libraries), 20,000+ records

19 GeoChronos - MODIS Satellite Data
• Developed an automated workflow service for mosaicking, subsetting, reprojecting and masking MODIS satellite data
• Significantly reduces the time scientists previously spent running such workflows manually
• Data management system is used to store raw MODIS satellite data and data products derived from the workflow
• Parsers and schemas specific to MODIS data have been added to the system
• User is provided with the same powerful interface as for Spectral Libraries for browsing, accessing and viewing data

20 GeoChronos - User Feedback
• Have developed the data management system in an interactive, iterative fashion
• Domain scientists on the project have provided much guidance, testing and feedback
• Have customized and enhanced the data management system based on the feedback received

21 Summary
• Identified data-related challenges facing scientists
• Discussed some related efforts and the shortcomings of these approaches
• Presented an on-line collaborative data management system addressing many data challenges
• Showed example usage of the data management system by GeoChronos

22 Next Steps
• Currently have a single local data repository
  - Working on extending the data management system to work with distributed data repositories using iRODS
• Currently have powerful browsing functionality
  - Need to add search functionality across collections and based on metadata values
• Currently support custom metadata schemas
  - Plan to make use of Semantic Web technologies to better relate data and provide ontological mapping between different metadata schemas / standards
• Currently work with spectral and MODIS satellite data
  - Plan to incorporate other data such as carbon flux data, other satellite data, meteorological data, phenology tower data

23 Contact Information
http://geochronos.org/
info@geochronos.org
http://grid.ucalgary.ca/
http://ceos.ualberta.ca/
http://www.cybera.ca/

