
1

2 Thomas Huang PO.DAAC Software System Engineer Jet Propulsion Laboratory California Institute of Technology These activities were carried out at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration. © 2010 California Institute of Technology. Government sponsorship acknowledged.

3 About me… PO.DAAC Software System Engineer and Architect of its Data Management and Archive System; background in planetary data management and secure near-real-time distribution systems. Huang - 01062010

4 Outline: Pattern for data ingestion to distribution; Our legacy data system; The new PO.DAAC Data Management and Archive System; Conclusion; Q&A

5 Simple Pattern

6 Can All These Broken Pieces Fit?

7 Legacy Data Systems … It Works!? Three different data systems according to the simple pattern; deployed in multiple instances; mostly consisting of one-off scripts; limited reusability; limited portability. Scalability? Reliability?

8 Stovepipe Legacy Data Systems

9 Our New Data Management and Archive System

10 Software Development Process

11 Technologies and Standards

12 Documents

13 Architecture: a system of RESTful services; standardized message exchange between services; unified data model; distributed data ingestion services; standardized event tracking and notification service

14 Manager Webservice (RESTful): transaction-oriented; load-balanced job assignment; on-the-fly deployment of engines; dynamic support of new data products; state-driven product management; resource management
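The slide does not say how load-balanced job assignment is decided. One common policy, shown here purely as an illustrative sketch, is to hand each new job to the engine with the most free job slots. The `Engine` record, the `assign_job` function, and the engine names are all hypothetical and not taken from DMAS.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Engine:
    """Hypothetical record a manager might keep for each registered engine."""
    name: str
    max_jobs: int
    active_jobs: int = 0

    def has_capacity(self) -> bool:
        return self.active_jobs < self.max_jobs


def assign_job(engines: List[Engine]) -> Optional[Engine]:
    """Pick the engine with the most free slots; return None if all are full."""
    candidates = [e for e in engines if e.has_capacity()]
    if not candidates:
        return None
    chosen = max(candidates, key=lambda e: e.max_jobs - e.active_jobs)
    chosen.active_jobs += 1  # record the assignment so later calls see the load
    return chosen


# Two made-up engines with three slots in total; the fourth job finds no room.
engines = [Engine("ingest-1", max_jobs=2), Engine("ingest-2", max_jobs=1)]
assigned = [assign_job(engines) for _ in range(4)]
names = [e.name if e else None for e in assigned]
print(names)
```

A real manager would also decrement `active_jobs` when an engine reports completion; that step is omitted to keep the sketch short.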

15 File Management Engines (RESTful): lightweight RESTful file service; supports typical file operations (add, move, delete, etc.); a single instance can carry out multiple granule operations in parallel; supports various file protocols (FTP, SFTP, FILE, HTTP, etc.); tracks and limits the number of jobs it can handle; tracks and limits the number of outbound communications; typical instances: ingest, archive, and purge
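As a rough illustration of an engine that runs granule operations in parallel while tracking and limiting how many jobs it accepts, the sketch below combines a thread pool with a bounded semaphore. The `FileEngine` class, its `submit` method, and the operation labels are assumptions made for this example, not the DMAS implementation.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable


class FileEngine:
    """Hypothetical engine: parallel granule operations with a hard job limit."""

    def __init__(self, max_jobs: int) -> None:
        self._slots = threading.BoundedSemaphore(max_jobs)
        self._pool = ThreadPoolExecutor(max_workers=max_jobs)

    def submit(self, operation: Callable[[], str]) -> Future:
        # Non-blocking acquire: reject immediately when every slot is taken.
        if not self._slots.acquire(blocking=False):
            raise RuntimeError("engine at capacity")
        try:
            future = self._pool.submit(operation)
        except Exception:
            self._slots.release()  # do not leak the slot if scheduling fails
            raise
        # Free the slot once the job finishes, whether it succeeded or failed.
        future.add_done_callback(lambda _f: self._slots.release())
        return future

    def shutdown(self) -> None:
        self._pool.shutdown(wait=True)


gate = threading.Event()


def blocked(label: str) -> Callable[[], str]:
    """Build a stand-in file operation that waits for the gate, then returns."""
    def operation() -> str:
        gate.wait()
        return label
    return operation


engine = FileEngine(max_jobs=2)
first = engine.submit(blocked("add granule A"))
second = engine.submit(blocked("move granule B"))
try:
    engine.submit(blocked("delete granule C"))
    rejected = False
except RuntimeError:
    rejected = True  # both slots are still held by the two waiting jobs

gate.set()  # let the two accepted jobs finish
results = [first.result(), second.result()]
engine.shutdown()
```

Rejecting work outright when full (rather than queueing it) is one possible design; it leaves the decision about where to send the job to whatever assigns work to engines.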

16 Product Inventory: unified metadata data model; references applicable models (e.g. ISO 19115, DIF, ECHO, GCMD…); extensible to support capturing of collection/dataset/granule-specific data attributes; supports geospatial data; supports project-specific data archive and distribution policies
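To make the idea of an extensible, unified metadata model more concrete, here is a minimal sketch using dataclasses. The class names, field names, and sample values are invented for illustration; the actual DMAS schema is not shown in the slides.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class Granule:
    """Hypothetical granule record with an optional geospatial footprint."""
    name: str
    # (west, south, east, north) in degrees
    bounding_box: Optional[Tuple[float, float, float, float]] = None
    # Free-form, granule-specific attributes keep the model extensible.
    attributes: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Dataset:
    """Hypothetical dataset record carrying project-specific policies."""
    short_name: str
    policies: Dict[str, Any] = field(default_factory=dict)
    attributes: Dict[str, Any] = field(default_factory=dict)
    granules: List[Granule] = field(default_factory=list)

    def add_granule(self, granule: Granule) -> None:
        self.granules.append(granule)


# All values below are made up for the example.
dataset = Dataset(short_name="EXAMPLE-L2", policies={"retention_days": 30})
dataset.add_granule(
    Granule(
        "granule-001",
        bounding_box=(-180.0, -90.0, 180.0, 90.0),
        attributes={"pass": "ascending"},
    )
)
```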

17 Data Handlers: an application framework; plugin interface for product-specific metadata handling and validation; transforms product metadata into the internal Submission Information Package (SIP); data discovery; local caching of data products
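A plugin interface of this kind could look roughly like the following sketch: an abstract handler with a validation step and a SIP-building step, plus a registry keyed by product type. Everything here (the `DataHandler` base class, the `register` decorator, the SIP fields) is hypothetical and only illustrates the pattern.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Type


class DataHandler(ABC):
    """Hypothetical plugin contract for product-specific metadata handling."""

    @abstractmethod
    def validate(self, metadata: dict) -> None:
        """Raise ValueError if the product metadata is unusable."""

    @abstractmethod
    def to_sip(self, metadata: dict) -> dict:
        """Translate product metadata into an internal SIP-like structure."""


HANDLERS: Dict[str, Type[DataHandler]] = {}


def register(product_type: str) -> Callable[[Type[DataHandler]], Type[DataHandler]]:
    """Class decorator that adds a handler to the registry."""
    def decorator(cls: Type[DataHandler]) -> Type[DataHandler]:
        HANDLERS[product_type] = cls
        return cls
    return decorator


@register("example")
class ExampleHandler(DataHandler):
    def validate(self, metadata: dict) -> None:
        if "granule_name" not in metadata:
            raise ValueError("missing granule_name")

    def to_sip(self, metadata: dict) -> dict:
        return {
            "product_type": "example",
            "granule": metadata["granule_name"],
            "files": list(metadata.get("files", [])),
        }


def build_sip(product_type: str, metadata: dict) -> dict:
    """Look up the handler for a product type, validate, then translate."""
    handler = HANDLERS[product_type]()
    handler.validate(metadata)
    return handler.to_sip(metadata)


sip = build_sip("example", {"granule_name": "g1", "files": ["g1.nc"]})
```

Supporting a new product then means adding one registered handler class, which matches the framework-plus-plugin split the slide describes.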

18 Data Handlers - GHRSST. The Group for High-Resolution Sea Surface Temperature (GHRSST): ingest and maintain interfaces to 52 GHRSST L2P/L3P/L4 datastreams from 10 Regional Data Assembly Centers (RDACs); ~25 GB/day, >5000 granules/day; real-time quality checking for data and metadata granules; create Federal Geographic Data Committee metadata for daily collection granules; distribution via FTP/OPeNDAP/POET; maintain interfaces to the LTSRF for 30-day-old data and metadata exchange. Adaptation: MMR validation and translation; data file validation; scans local/remote locations for new data; integration with back-end RDAC cluster. Inventory: full migration from existing MySQL database. Port to use the new data model: FGDC and index generators; website.

19 Data Handlers - ASCAT. The Advanced SCATterometer (ASCAT): ingest and maintain interfaces to 2 L2 datastreams (KNMI); ~57 MB/day, ~21 GB/year. Adaptation: metadata validation and translation; data file validation; scans remote locations for new data. Dataset definition and policies.
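The two ASCAT volume figures are consistent with each other, as a quick check shows. The calculation assumes decimal units (1 GB = 1000 MB) and a 365-day year.

```python
# Daily ASCAT volume quoted on the slide, in megabytes.
MB_PER_DAY = 57

# Assumes decimal units (1 GB = 1000 MB) and a 365-day year.
gb_per_year = MB_PER_DAY * 365 / 1000

print(f"{gb_per_year:.1f} GB/year")  # about 20.8, which rounds to the quoted ~21
```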

20 Significant Event WS

21 Significant Event Web

22 DAAC in a Box?

23 “premature optimization is the root of all evil.” Donald Knuth “The Art of Computer Programming”

24 Sample Performance: Ingest3 (36 parallel jobs); Archive3 (36 parallel jobs); Purge2 (20 parallel jobs). 21,254 granules/day; 4 seconds/granule. Implementation optimization; database performance tuning.
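The 4 seconds/granule figure follows from the daily throughput if it is read as the average wall-clock interval between completed granules (an interpretation, since the slide does not define it):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400
GRANULES_PER_DAY = 21_254       # throughput quoted on the slide

# Average wall-clock time per completed granule across the whole system.
seconds_per_granule = SECONDS_PER_DAY / GRANULES_PER_DAY

print(f"{seconds_per_granule:.2f} s/granule")  # about 4.07
```

Because jobs run in parallel, this is a system-level rate; an individual granule may take considerably longer than 4 seconds from start to finish.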

25 Conclusion. PO.DAAC DMAS: a system of RESTful webservices; scalable; portable; extensible; operationally supports GHRSST and ASCAT. Future work: new products: Aquarius; GHRSST GDS 2.0 metadata model; migration; data subscription; administration tools.

26

27 BACKUP SLIDES

28 FY ‘09 Highlights Webservice Architecture Data Ingestion and Archive WS Distributed Ingestion/Archive Engines Load Balancing Service Monitoring Significant Event WS Suite of reusable components ECHO publication Dataset and Granule metadata GHRSST ASCAT L2 ASCAT Huang - 09022009

29 Product Subscription Enable implementation of value-added services

30 Archive Tools Metadata Distribution

31 Our Challenge: … can we build a data system with all these characteristics? Scalable; Simple; Speed; Standardize

32 DMAS – Ingestion and Archive Service: load-balanced; transaction-oriented; on-the-fly deployment of engines; dynamic support of new data products; scalable; state-driven job management

33 DMAS – Significant Event Service

34 DMAS – Data Subscriber Integration with NAIAD: Swath Tiler; Metadata Submission. Dataset subscriber: triggered by newly archived granules; dispatches swath tiling program; submits tiling metadata to NAIAD WS
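The subscriber flow on this slide (a newly archived granule triggers a tiling run, whose metadata is then submitted) resembles a publish/subscribe callback. The sketch below shows that pattern only; `ArchiveNotifier`, `run_swath_tiler`, and the in-memory `submitted` list are hypothetical stand-ins, and no real NAIAD or DMAS interface is used.

```python
from typing import Callable, Dict, List


class ArchiveNotifier:
    """Hypothetical hub that tells dataset subscribers about new granules."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[str], None]]] = {}

    def subscribe(self, dataset: str, callback: Callable[[str], None]) -> None:
        self._subscribers.setdefault(dataset, []).append(callback)

    def granule_archived(self, dataset: str, granule: str) -> None:
        # Trigger every subscriber registered for this dataset.
        for callback in self._subscribers.get(dataset, []):
            callback(granule)


submitted: List[dict] = []  # stand-in for metadata sent to a remote service


def run_swath_tiler(granule: str) -> dict:
    """Stand-in for dispatching a tiling program; returns fake tile metadata."""
    return {"granule": granule, "tiles": [f"{granule}-tile-0"]}


def on_new_granule(granule: str) -> None:
    metadata = run_swath_tiler(granule)
    submitted.append(metadata)  # a real subscriber would submit this remotely


notifier = ArchiveNotifier()
notifier.subscribe("example-dataset", on_new_granule)
notifier.granule_archived("example-dataset", "g42")
notifier.granule_archived("other-dataset", "g99")  # no subscriber, so ignored
```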

35 DMAS Goals. Service tools: administration; product rollout; contact management. New data subscription capability: making DMAS the data hub (RSS feed, automatic delivery of new granules, thumbnail generation, etc.). New dataset search capability: evaluating VODC (ACCESS program). New data products. Legacy migration support. Planning 4 DMAS releases. FY ’10. 2 System Releases (DMAS + T&S).

36 Configuration Management. How to manage versions of third-party software: dependency matrix; upgrades to one or more third-party software packages. Standard development process between development teams: change management; software packaging; dependency management. Standard build and deployment process. FY ’10 CM?

