Thomas Huang, PO.DAAC Software System Engineer, Jet Propulsion Laboratory, California Institute of Technology

Presentation transcript:

Thomas Huang PO.DAAC Software System Engineer Jet Propulsion Laboratory California Institute of Technology These activities were carried out at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration. © 2010 California Institute of Technology. Government sponsorship acknowledged.

About me…
- PO.DAAC Software System Engineer and architect of its Data Management and Archive System
- Background in planetary data management and secure near real-time distribution systems

Outline
- Pattern for data ingestion to distribution
- Our legacy data system
- The new PO.DAAC Data Management and Archive System
- Conclusion
- Q&A

Simple Pattern

Can All These Broken Pieces Fit?

Legacy Data Systems … It Works!?
- 3 different data systems according to the simple pattern
- Deployed in multiple instances
- Mostly one-off scripts
- Limited reusability
- Limited portability
- Scalability? Reliability?

Stovepipe Legacy Data Systems

Our New Data Management and Archive System

Software Development Process

Technologies and Standards

Documents

Architecture
- A system of RESTful services
- Standardized message exchange between services
- Unified data model
- Distributed data ingestion services
- Standardized event tracking and notification service

Manager Webservice (RESTful)
- Transaction-oriented
- Load-balanced job assignment
- On-the-fly deployment of engines
- Dynamic support of new data products
- State-driven product management
- Resource management
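The load-balanced job assignment named on this slide can be pictured as a manager that hands each job to whichever registered engine has the most free capacity. The following is only a sketch: the `Engine` and `Manager` classes, the capacity numbers, and the least-loaded policy are illustrative assumptions, not details taken from the presentation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Engine:
    """Hypothetical record the manager keeps for one registered engine."""
    name: str
    max_jobs: int                                   # capacity the engine advertises
    active: List[str] = field(default_factory=list)

    @property
    def free_slots(self) -> int:
        return self.max_jobs - len(self.active)


class Manager:
    """Assigns each job to the engine with the most free slots (assumed policy)."""

    def __init__(self) -> None:
        self.engines: Dict[str, Engine] = {}

    def register(self, engine: Engine) -> None:
        # Engines can be registered at any time, mimicking on-the-fly deployment.
        self.engines[engine.name] = engine

    def assign(self, job_id: str) -> Optional[str]:
        candidates = [e for e in self.engines.values() if e.free_slots > 0]
        if not candidates:
            return None                             # every engine is at capacity
        target = max(candidates, key=lambda e: e.free_slots)
        target.active.append(job_id)
        return target.name

    def complete(self, engine_name: str, job_id: str) -> None:
        # Frees the slot so later jobs can be assigned to this engine again.
        self.engines[engine_name].active.remove(job_id)


manager = Manager()
manager.register(Engine("ingest-1", max_jobs=2))
manager.register(Engine("ingest-2", max_jobs=1))
assignments = [manager.assign(f"granule-{i}") for i in range(4)]
```

With three slots in total, the fourth job is refused (`None`) until `complete` frees a slot; a real manager would presumably queue it instead.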

File Management Engines (RESTful)
- Lightweight RESTful file service
- Supports typical file operations (add, move, delete, etc.)
- A single instance can carry out multiple granule operations in parallel
- Supports various file protocols (FTP, SFTP, FILE, HTTP, etc.)
- Tracks and limits the number of jobs it can handle
- Tracks and limits the number of outbound communications
- Typical instances: ingest, archive, and purge
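An engine that runs granule operations in parallel while tracking and limiting its job count might look roughly like the sketch below. The `FileEngine` class, its method names, and the choice to refuse work when full are assumptions made for illustration; the slides do not describe the actual implementation.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Optional


class FileEngine:
    """Hypothetical engine that runs file operations in parallel up to a limit."""

    def __init__(self, max_jobs: int) -> None:
        self.max_jobs = max_jobs
        self._active = 0
        self._lock = threading.Lock()
        self._pool = ThreadPoolExecutor(max_workers=max_jobs)

    def submit(self, operation: Callable[[], str]) -> Optional[Future]:
        """Accept the job if a slot is free; otherwise return None."""
        with self._lock:
            if self._active >= self.max_jobs:
                return None
            self._active += 1
        future = self._pool.submit(operation)
        future.add_done_callback(self._release)     # free the slot when done
        return future

    def _release(self, _future: Future) -> None:
        with self._lock:
            self._active -= 1

    @property
    def active_jobs(self) -> int:
        with self._lock:
            return self._active

    def close(self) -> None:
        # Waits for running operations (and their callbacks) to finish.
        self._pool.shutdown(wait=True)


gate = threading.Event()


def move_granule() -> str:
    gate.wait(timeout=5)        # stand-in for a slow file transfer
    return "moved"


engine = FileEngine(max_jobs=2)
first = engine.submit(move_granule)
second = engine.submit(move_granule)
rejected = engine.submit(move_granule)   # both slots are busy, so this is refused
gate.set()
results = [first.result(), second.result()]
engine.close()
```

Refusing the third job rather than queueing it keeps the accounting visible to the caller, which fits a design where a separate manager decides where work goes.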

Product Inventory
- Unified metadata data model
- References applicable models (e.g. ISO 19115, DIF, ECHO, GCMD…)
- Extensible to support capturing of collection/dataset/granule-specific data attributes
- Supports geospatial data
- Supports project-specific data archive and distribution policies
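To make the idea of an extensible, geospatially aware metadata model concrete, here is a small sketch using dataclasses. Every class and field name (`BoundingBox`, `retention_days`, the `attributes` dictionaries, the `EXAMPLE-SST` dataset) is hypothetical; the real PO.DAAC data model is not shown in the slides.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class BoundingBox:
    """Geospatial extent in degrees; does not handle antimeridian crossing."""
    west: float
    south: float
    east: float
    north: float

    def contains(self, lon: float, lat: float) -> bool:
        return self.west <= lon <= self.east and self.south <= lat <= self.north


@dataclass
class Granule:
    name: str
    extent: Optional[BoundingBox] = None
    # Open-ended key/value pairs for granule-specific attributes.
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class Dataset:
    short_name: str
    # Hypothetical policy field: days to keep granules online (None = no limit).
    retention_days: Optional[int] = None
    attributes: Dict[str, str] = field(default_factory=dict)
    granules: List[Granule] = field(default_factory=list)

    def granules_covering(self, lon: float, lat: float) -> List[Granule]:
        """Return granules whose recorded extent contains the given point."""
        return [
            g for g in self.granules
            if g.extent is not None and g.extent.contains(lon, lat)
        ]


dataset = Dataset("EXAMPLE-SST", retention_days=30)
dataset.granules.append(
    Granule("granule-a", BoundingBox(-130.0, 20.0, -110.0, 40.0),
            {"processing_level": "L2P"})
)
dataset.granules.append(Granule("granule-b"))   # no spatial extent recorded
matches = [g.name for g in dataset.granules_covering(-118.0, 34.0)]
```

The free-form `attributes` dictionaries are one simple way to get extensibility without changing the schema for each new product.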

Data Handlers
- An application framework
- Plugin interface for product-specific metadata handling and validation
- Transforms product metadata into an internal Submission Information Package (SIP)
- Data discovery
- Local caching of data products
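A plugin interface of the kind described here could be sketched as an abstract base class plus a registry keyed by product name. The names `DataHandler`, `register`, `build_sip`, and the SIP dictionary layout are assumptions for illustration only; the actual framework and SIP format are not given in the slides.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Type


class DataHandler(ABC):
    """Hypothetical plugin contract for product-specific metadata handling."""

    @abstractmethod
    def validate(self, metadata: Dict[str, str]) -> None:
        """Raise ValueError if the product metadata is not acceptable."""

    @abstractmethod
    def to_sip(self, metadata: Dict[str, str]) -> Dict[str, object]:
        """Translate product metadata into an internal SIP-like record."""


HANDLERS: Dict[str, Type[DataHandler]] = {}


def register(product: str) -> Callable[[Type[DataHandler]], Type[DataHandler]]:
    """Class decorator that adds a handler to the registry."""
    def decorator(cls: Type[DataHandler]) -> Type[DataHandler]:
        HANDLERS[product] = cls
        return cls
    return decorator


@register("example-product")
class ExampleHandler(DataHandler):
    REQUIRED = ("granule_name", "start_time")   # assumed required fields

    def validate(self, metadata: Dict[str, str]) -> None:
        missing = [key for key in self.REQUIRED if key not in metadata]
        if missing:
            raise ValueError(f"missing metadata fields: {missing}")

    def to_sip(self, metadata: Dict[str, str]) -> Dict[str, object]:
        self.validate(metadata)
        return {
            "product": "example-product",
            "granule": metadata["granule_name"],
            "metadata": dict(metadata),
        }


def build_sip(product: str, metadata: Dict[str, str]) -> Dict[str, object]:
    """Look up the handler for a product and produce its SIP record."""
    handler = HANDLERS[product]()
    return handler.to_sip(metadata)


sip = build_sip("example-product",
                {"granule_name": "g-001", "start_time": "2010-01-01T00:00:00Z"})
```

Supporting a new product then means writing and registering one more handler class, without touching the code that calls `build_sip`.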

Data Handlers - GHRSST
The Group for High-Resolution Sea Surface Temperature (GHRSST)
- Ingest and maintain interfaces to 52 GHRSST L2P/L3P/L4 datastreams from 10 Regional Data Assembly Centers (RDACs)
- ~25 GB/day
- >5000 granules/day
- Real-time quality checking for data and metadata granules
- Create Federal Geographic Data Committee (FGDC) metadata for daily collection granules
- Distribution via FTP/OPeNDAP/POET
- Maintain interfaces to the LTSRF for 30-day-old data and metadata exchange
Adaptation
- MMR validation and translation
- Data file validation
- Scans local/remote locations for new data
- Integration with back-end RDAC cluster
Inventory
- Full migration from existing MySQL database; port to use the new data model
- FGDC and index generators
- Website

Data Handlers - ASCAT
The Advanced SCATterometer (ASCAT)
- Ingest and maintain interfaces to 2 L2 datastreams (KNMI)
- ~57 MB/day
- ~21 GB/year
Adaptation
- Metadata validation and translation
- Data file validation
- Scans remote locations for new data
Dataset definition and policies
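The two ASCAT volume figures on this slide agree with each other, as a quick check shows. The only assumptions are a 365-day year and decimal units (1 GB = 1000 MB).

```python
MB_PER_GB = 1000        # assumption: decimal units
DAYS_PER_YEAR = 365     # assumption: non-leap year

daily_mb = 57           # "~57 MB/day" from the slide
yearly_gb = daily_mb * DAYS_PER_YEAR / MB_PER_GB

print(f"{yearly_gb:.1f} GB/year")
```

The result, about 20.8 GB, rounds to the slide's "~21 GB/year".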

Significant Event WS

Significant Event Web

DAAC in a Box?

“Premature optimization is the root of all evil.” Donald Knuth, “Structured Programming with go to Statements” (1974)

Sample Performance
- Ingest: 3 (36 parallel jobs)
- Archive: 3 (36 parallel jobs)
- Purge: 2 (20 parallel jobs)
- 21,254 granules/day
- 4 seconds/granule
- Implementation optimization
- Database performance tuning
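The figures of 21,254 granules/day and 4 seconds/granule are consistent if the per-granule time is read as a wall-clock average over one day. That reading is an assumption, since the slide does not say how the time was measured. A quick check:

```python
SECONDS_PER_DAY = 24 * 60 * 60

granules_per_day = 21_254           # figure from the slide
seconds_per_granule = SECONDS_PER_DAY / granules_per_day

print(f"{seconds_per_granule:.2f} seconds per granule")
```

This gives roughly 4.07 seconds per granule, which matches the rounded value on the slide.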

Conclusion
PO.DAAC DMAS
- A system of RESTful webservices
- Scalable
- Portable
- Extensible
- Operationally supports GHRSST and ASCAT
Future work
- New products: Aquarius
- GHRSST GDS 2.0 metadata model
- Migration
- Data subscription
- Administration tools

BACKUP SLIDES

FY ‘09 Highlights Webservice Architecture Data Ingestion and Archive WS Distributed Ingestion/Archive Engines Load Balancing Service Monitoring Significant Event WS Suite of reusable components ECHO publication Dataset and Granule metadata GHRSST ASCAT L2 ASCAT

Product Subscription Enable implementation of value-added services

Archive Tools Metadata Distribution

Our Challenge
… can we build a data system with all these characteristics?
- Scalable
- Simple
- Speed
- Standardize

DMAS – Ingestion and Archive Service
- Load-balanced
- Transaction-oriented
- On-the-fly deployment of engines
- Dynamic support of new data products
- Scalable
- State-driven job management
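State-driven job management can be illustrated with a small state machine in which a job may only move along permitted transitions. The state names and the transition table below are invented for this sketch; the slides do not list the actual DMAS states.

```python
from enum import Enum
from typing import Dict, List, Set


class State(Enum):
    PENDING = "pending"
    ASSIGNED = "assigned"
    INGESTED = "ingested"
    ARCHIVED = "archived"
    FAILED = "failed"


# Hypothetical transition table: which states may follow each state.
TRANSITIONS: Dict[State, Set[State]] = {
    State.PENDING: {State.ASSIGNED},
    State.ASSIGNED: {State.INGESTED, State.FAILED},
    State.INGESTED: {State.ARCHIVED, State.FAILED},
    State.FAILED: {State.PENDING},      # a failed job can be retried
    State.ARCHIVED: set(),              # terminal state
}


class Job:
    """Tracks one granule's progress and rejects illegal state changes."""

    def __init__(self, granule: str) -> None:
        self.granule = granule
        self.state = State.PENDING
        self.history: List[State] = [State.PENDING]

    def advance(self, new_state: State) -> None:
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(
                f"{self.granule}: cannot go from {self.state.value} "
                f"to {new_state.value}"
            )
        self.state = new_state
        self.history.append(new_state)


job = Job("granule-001")
for step in (State.ASSIGNED, State.INGESTED, State.ARCHIVED):
    job.advance(step)
```

Keeping the allowed transitions in one table makes it easy to audit what the system may do with a job and to record a history for each granule.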

DMAS – Significant Event Service

DMAS – Data Subscriber Integration with NAIAD
- Dataset subscriber
- Triggered by newly archived granules
- Dispatches the swath tiling program
- Submits tiling metadata to the NAIAD WS
Diagram labels: Swath Tiler, Metadata Submission
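The subscriber flow on this slide (a newly archived granule triggers a tiling step, whose metadata is then submitted) resembles a publish/subscribe callback. The sketch below assumes a hypothetical `SubscriptionService`; the tiler and the NAIAD submission are replaced by a stand-in function that only records what it would send.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Callback = Callable[[str, str], None]   # receives (dataset, granule)


class SubscriptionService:
    """Hypothetical registry that notifies subscribers of archived granules."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, dataset: str, callback: Callback) -> None:
        self._subscribers[dataset].append(callback)

    def granule_archived(self, dataset: str, granule: str) -> int:
        """Notify every subscriber of the dataset; return how many were called."""
        callbacks = self._subscribers.get(dataset, [])
        for callback in callbacks:
            callback(dataset, granule)
        return len(callbacks)


submitted: List[Dict[str, str]] = []


def tile_and_submit(dataset: str, granule: str) -> None:
    # Stand-in for dispatching the swath tiler and posting its metadata
    # to a web service; here the record is only collected in a list.
    submitted.append({"dataset": dataset, "granule": granule})


service = SubscriptionService()
service.subscribe("example-swath-dataset", tile_and_submit)
notified = service.granule_archived("example-swath-dataset", "granule-001")
ignored = service.granule_archived("other-dataset", "granule-002")
```

Only subscribers of the matching dataset are called, so value-added services can be attached per dataset without changing the archive step.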

DMAS Goals (FY ’10)
- Service tools: administration, product rollout, contact management
- New data subscription capability: making DMAS the data hub (RSS feed, automatic delivery of new granules, thumbnail generation, etc.)
- New dataset search capability: evaluating VODC (ACCESS program)
- New data products
- Legacy migration support
- Planning 4 DMAS releases
- 2 system releases (DMAS + T&S)

Configuration Management (FY ’10 CM?)
- How to manage versions of third-party software: dependency matrix; upgrades to one or more third-party packages
- Standard development process between development teams: change management, software packaging, dependency management
- Standard build and deployment process