1 Models for scientific exploitation of EO data * ESRIN * 12.10.2012.


2 Calvalus: Full mission EO cal/val processing and exploitation services

3 Outline
- Objectives and achievements
- Apache Hadoop in five slides
- Calvalus = Hadoop for EO
- Calvalus bulk processing
* Jeffrey Dean and Sanjay Ghemawat, Google, 2004: “MapReduce: Simplified Data Processing on Large Clusters”, Sixth Symposium on Operating System Design and Implementation, San Francisco, CA, 2004

4 There was a dream …
- exploit easily full mission EO archives
- have a powerful and affordable multi-mission processing infrastructure
- generate products using full mission datasets, with new algorithms and algorithm versions
- aggregate results in temporal and spatial dimension
- test new ideas in a rapid prototyping approach
- have a tool to perform calibration and validation on full mission archives as the basis for reliable scientific conclusions
- Robust production

5 Calvalus for Land Cover CCI Pre-processing
Generation of 7-day composites of surface reflectance from full mission MERIS FRS and RR for CCI Land Cover is a data- and compute-intensive automated job that runs for 3 months on a 72-node Calvalus/Hadoop cluster.
Quicklook generation for full mission MERIS FRS and RR reads and processes 150 TB of input data in 10 hours. This is about 50 Gbit/s.
Other full mission processes take between these two times.
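As a rough cross-check of the quicklook data rate, the arithmetic can be sketched as follows. The sketch assumes decimal terabytes (10^12 bytes) and exactly 10 hours of wall-clock time, both of which are rounded figures, and the function name is purely illustrative. Under these assumptions the sustained rate works out to roughly 33 Gbit/s, the same order of magnitude as the figure quoted on the slide.

```python
def throughput_gbit_per_s(terabytes: float, hours: float) -> float:
    """Average data rate in Gbit/s, assuming 1 TB = 10**12 bytes."""
    bits = terabytes * 1e12 * 8      # bytes -> bits
    seconds = hours * 3600           # hours -> seconds
    return bits / seconds / 1e9      # bit/s -> Gbit/s


rate = throughput_gbit_per_s(150, 10)
print(f"{rate:.1f} Gbit/s")
```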

6 Projects using Calvalus
- ESA CoastColour: 6 years MERIS FR, 27 regions
- ESA Land Cover CCI: pre-processing, full mission weekly L3 from MERIS and SPOT VGT
- ESA Ocean Colour CCI: algorithm improvement cycle, MODIS, SeaWiFS, MERIS
- GlobVeg: global FAPAR and LAI from MERIS
- Prevue: MERIS full mission subset extraction
- Fronts: MERIS detection of fronts
- Diversity II: bio-diversity of lakes and drylands

7 Hadoop = HDFS + jobs/tasks + MapReduce
Archive-centric approach (compute cluster, network data archive):
- Network storage
- data are transferred on the network
- risk of network bottleneck
Hadoop approach (direct, data-local processing):
- data-local processing
- tasks are transferred on the network
- good scalability

8 Cluster hardware and network
- standard hardware
- Calvalus additions for I/O and development
[Diagram: master; node 1, node 2, node 3, node 4, ... node n, each with a local disk; feeder; external data source or destination; test server; test 1; vm1]

9 Hadoop Distributed File System
- distributed file system HDFS on local disks of compute nodes
- transparent, optimised data-local access
- data replication
- automated recovery
- continued service

10 Hadoop Job Scheduling
- flexible granularity of inputs defined by split functions (for EO: one file – one split)
- massive parallel processing, task pull
- takes failure into account, automated re-attempt, optional speculative execution
- job queues, priorities, fair sharing among projects
[Diagram: a job over an input set is divided into tasks, one per input split, with data-local processing; annotated "500 .... 50000"]
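The "one file – one split" granularity and the automated re-attempt can be illustrated with a small sketch. It is not Hadoop code: it runs tasks sequentially in one process, and `run_job`, `flaky_processor`, the file names and the `max_attempts` parameter are all hypothetical names chosen for illustration.

```python
from collections import deque


def run_job(input_files, process, max_attempts=3):
    """Run one task per input file (one file = one split), re-attempting failures."""
    pending = deque((name, 1) for name in input_files)  # (split, attempt number)
    done, failed = {}, []
    while pending:
        name, attempt = pending.popleft()  # a free worker pulls the next task
        try:
            done[name] = process(name)
        except Exception:
            if attempt < max_attempts:
                pending.append((name, attempt + 1))  # automated re-attempt
            else:
                failed.append(name)  # give up after max_attempts
    return done, failed


# Hypothetical processor that fails once on one input, then succeeds.
_seen = set()


def flaky_processor(name):
    if name == "b.N1" and name not in _seen:
        _seen.add(name)
        raise IOError("simulated transient failure")
    return name.replace(".N1", ".L2")


done, failed = run_job(["a.N1", "b.N1", "c.N1"], flaky_processor)
print(done, failed)
```

In a real cluster the pull happens concurrently from many nodes and the scheduler prefers nodes that hold the split locally; the sketch only shows the queueing and retry logic.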

11 Parallel aggregation with MapReduce
- data-local access of inputs
- a well-selected sorting and partitioning function
- generation of the output in parts that can be simply concatenated
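Why a well-chosen partitioning function lets the output parts be simply concatenated can be shown with a minimal in-memory sketch. It assumes integer keys (for example bin indices) and a range partitioner, so that every key in part 0 sorts before every key in part 1; `map_reduce`, `range_partition` and `TOTAL_BINS` are illustrative names, not Hadoop or Calvalus API.

```python
from collections import defaultdict

TOTAL_BINS = 8  # assumed size of the key space


def range_partition(key, num_parts):
    """Send contiguous key ranges to the same reducer."""
    return key * num_parts // TOTAL_BINS


def map_reduce(records, map_fn, reduce_fn, partition_fn, num_parts):
    parts = [defaultdict(list) for _ in range(num_parts)]
    for record in records:
        for key, value in map_fn(record):
            parts[partition_fn(key, num_parts)][key].append(value)
    # Each reducer sorts its own keys and writes its own output part.
    return [[(k, reduce_fn(k, vs)) for k, vs in sorted(p.items())] for p in parts]


records = [(5, 2.0), (1, 4.0), (5, 4.0), (7, 1.0), (0, 3.0)]
parts = map_reduce(
    records,
    map_fn=lambda r: [r],                       # identity mapper
    reduce_fn=lambda k, vs: sum(vs) / len(vs),  # mean per key
    partition_fn=range_partition,
    num_parts=2,
)
merged = [item for part in parts for item in part]  # plain concatenation
print(merged)
```

Because the partitioner preserves key order across parts, the concatenated result is already globally sorted and no final merge step is needed.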

12

13 L2 Bulk Processing Realisation
- MERIS RR L1, North Sea, 3 days
- CoastColour NN L2 processor
- 6 minutes (22 nodes)
- output: L2 files
[Diagram: L1 File -> L2 Processor (Mapper Task) -> L2 File, repeated for several files in parallel]

14 Match-up Analysis Realisation
- MERIS RR L1, global, 3 months
- CoastColour C2W processor
- NOMAD in-situ dataset
- 6 minutes (22 nodes)
- Scatter-plots and pixel extraction
[Diagram: L1 File -> L2 Proc. & Matcher (Mapper Task) -> output records, repeated for several files in parallel; input records -> MA Output Gen. (Reducer Task) -> MA Report]
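The mapper/reducer split of the match-up analysis can be sketched as follows. This is a simplified illustration under several assumptions: products are dictionaries with a list of `(lat, lon, value)` pixels, matching uses only a spatial tolerance in degrees (a time criterion is omitted), and the report is reduced to a count and a mean bias. None of the names are taken from Calvalus.

```python
def match_mapper(product, in_situ, max_deg=0.1):
    """Emit (in-situ value, satellite value) pairs for pixels near a measurement."""
    for lat, lon, sat_value in product["pixels"]:
        for site in in_situ:
            if abs(lat - site["lat"]) <= max_deg and abs(lon - site["lon"]) <= max_deg:
                yield site["value"], sat_value


def report_reducer(pairs):
    """Collect the records of all mappers and derive simple report statistics."""
    pairs = list(pairs)
    n = len(pairs)
    bias = sum(sat - ref for ref, sat in pairs) / n if n else float("nan")
    return {"n": n, "bias": bias, "pairs": pairs}


in_situ = [{"lat": 54.0, "lon": 7.0, "value": 1.0}]
products = [
    {"pixels": [(54.05, 7.02, 1.2), (60.0, 0.0, 9.9)]},
    {"pixels": [(53.95, 6.95, 0.9)]},
]

# One mapper call per product; a single reducer sees all emitted records.
records = [pair for p in products for pair in match_mapper(p, in_situ)]
report = report_reducer(records)
print(report["n"], round(report["bias"], 3))
```

The collected pairs are what a scatter-plot would be drawn from; only the small matched records travel to the reducer, not the products themselves.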

15 L2/L3 Processing Realisation
- MERIS RR L1, global, 10-day
- CoastColour C2W processor
- 1.5 hours (22 nodes)
- 1 L3 product
[Diagram: L1 File -> L2 Proc. & Spat. Binning (Mapper Task) -> Spat. Bins, repeated for several files in parallel; Spat. Bins -> L3 Temp. Binning (Reducer Task) -> Temp. Bins -> L3 Formatting (Staging) -> L3 File(s)]
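The division of work between spatial binning in the mappers and temporal binning in the reducers can be illustrated with a small sketch. It assumes a plain latitude/longitude grid with a configurable cell size and a simple mean as the aggregator; the actual binning grid and aggregators used in Calvalus are not described on the slide, and the function names are hypothetical.

```python
from collections import defaultdict


def spatial_bin_mapper(pixels, cell_deg=1.0):
    """Aggregate the pixels of one product into spatial bins: key -> (sum, count)."""
    bins = defaultdict(lambda: [0.0, 0])
    for lat, lon, value in pixels:
        key = (int(lat // cell_deg), int(lon // cell_deg))
        bins[key][0] += value
        bins[key][1] += 1
    return {key: tuple(acc) for key, acc in bins.items()}


def temporal_bin_reducer(spatial_bins_per_product):
    """Merge the spatial bins of all products of the period into one mean per cell."""
    total = defaultdict(lambda: [0.0, 0])
    for bins in spatial_bins_per_product:
        for key, (s, n) in bins.items():
            total[key][0] += s
            total[key][1] += n
    return {key: s / n for key, (s, n) in sorted(total.items())}


day1 = [(10.2, 20.5, 2.0), (10.7, 20.1, 4.0)]
day2 = [(10.4, 20.9, 6.0), (11.5, 20.5, 1.0)]

l3 = temporal_bin_reducer([spatial_bin_mapper(day1), spatial_bin_mapper(day2)])
print(l3)
```

Carrying sums and counts instead of means keeps the partial results mergeable, so the reducer can combine any number of products in any order.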

16 Trend Analysis Realisation
- MERIS RR L1, South Pacific Gyre, 2002-2010, first 4 days of a month
- CoastColour C2W processor
- 30 minutes (22 nodes)
- Time-series plots and data
[Diagram: several parallel L2/L3 chains, each L1 File -> L2 Proc. & Spat. Binning (Mapper Task) -> Spat. Bins -> L3 Temp. Binning (Reducer Task) -> Temp. Bins; all Temp. Bins -> TA Formatting (Staging) -> TA Report]

17 Processor integration
- Adapter for Unix executables (C++, Fortran, Python, ...)
- Adapter for BEAM GPF operators
- Concurrent processor versions in the system
- Automated deployment of processor bundles at runtime
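The idea of an adapter for Unix executables can be sketched as a thin wrapper that calls an external program once per input file. The calling convention `command <input> <output>`, the `.out` suffix and the function name `run_executable` are assumptions made for this illustration, not the Calvalus adapter interface. A Python one-liner stands in for a real processor so that the example is runnable anywhere.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_executable(command, input_path, output_dir):
    """Call an external processor as `command <input> <output>`; return the output path."""
    output_path = Path(output_dir) / (Path(input_path).stem + ".out")
    result = subprocess.run(
        [*command, str(input_path), str(output_path)],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # A non-zero exit code is reported so the scheduler can re-attempt the task.
        raise RuntimeError(f"processor failed: {result.stderr.strip()}")
    return output_path


# Stand-in "processor": copies its input file to the output file.
stand_in = [sys.executable, "-c",
            "import sys, shutil; shutil.copy(sys.argv[1], sys.argv[2])"]

with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "scene.l1"
    src.write_text("demo")
    out = run_executable(stand_in, src, tmp)
    copied = out.read_text()

print(out.name, copied)
```

Because the wrapper only relies on a command line and an exit code, the wrapped program can be written in any language.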

18 Supported by BEAM Graph Processing Framework
- Access to data via reader/writer objects instead of files
- Operator chaining to build processors from modules
- Tile cache and pull principle for in-memory processing
- Hadoop MapReduce for partitioning and streaming
Calvalus + BEAM for data streaming
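Operator chaining with a pull principle can be illustrated by analogy with Python generators: the last operator in the chain requests a tile, which pulls it through the upstream operators, so no intermediate file is written. This is only an analogy and not the BEAM GPF API; the operator names and the list-of-rows raster are hypothetical.

```python
def read_tiles(raster, tile_rows):
    """Source operator: yield the raster tile by tile instead of loading it whole."""
    for start in range(0, len(raster), tile_rows):
        yield raster[start:start + tile_rows]


def scale(tiles, factor):
    """Operator: multiply every value of each incoming tile."""
    for tile in tiles:
        yield [[v * factor for v in row] for row in tile]


def mask_negative(tiles, fill=0.0):
    """Operator: replace negative values by a fill value."""
    for tile in tiles:
        yield [[v if v >= 0 else fill for v in row] for row in tile]


raster = [[1.0, -2.0], [3.0, 4.0], [-5.0, 6.0]]

# Build the processor by chaining operators; nothing is computed yet.
chain = mask_negative(scale(read_tiles(raster, tile_rows=2), factor=0.5))

# Consuming the chain pulls tiles through all operators, one tile at a time.
result = [row for tile in chain for row in tile]
print(result)
```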

19 Quality check in bulk processing workflows
[Workflow diagram. Labels: FRS/RR L1B; autom. QC; inventory; QL gen; 1 day QL; visual QC; black list; GET ASSE ORB ATT; error report; GET ASSE; feedback; AMORGOS; FRG/RRG L1B; L2 proc.; SDR; L3 proc.; 7 day SR compo; QL gen; SR QL; visual QC; black list; autom. QC; inventory]
700 inputs with issues identified in MERIS L1B

20 Bulk production control for full mission reprocessing
[Diagram. Components: Processing Monitor; Request Queue; Workflow engine; Resource management. Labels: start bulk production; concurrent processing steps; progress observation; parameters; sequencing; resources; constraints; report; status; years, increasing two months at a time; processing workflow; processor versions, ...]
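One possible reading of "years, increasing two months at a time" is that a bulk production is split into consecutive two-month requests. The sketch below generates such periods; the interpretation, the function name and the half-open `(start, end)` convention are assumptions made for illustration.

```python
from datetime import date


def two_month_windows(first_year, last_year):
    """Yield (start, end) pairs covering each year in two-month steps; end is exclusive."""
    for year in range(first_year, last_year + 1):
        for month in range(1, 12, 2):  # 1, 3, 5, 7, 9, 11
            start = date(year, month, 1)
            end = date(year + 1, 1, 1) if month == 11 else date(year, month + 2, 1)
            yield start, end


windows = list(two_month_windows(2002, 2003))
for start, end in windows[:3]:
    print(start, "to", end)
```

Each window could then be submitted as one request to the queue, which keeps individual productions small enough to monitor and to repeat if needed.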

21 Jobs and tasks to be managed
Columns: Workflow Step, Bulks, Jobs, Tasks, Inputs, Outputs
Input MERIS FRS+RR 2002-12: 150 TB
auto-QA+inventory: 1 2021002021000020
QL daily: 202173002100007300
QL scenes: 20210000
visual QA screening inputs: 7300+
AMORGOS geocoding: 1 240210000
Level 2 SDR processing: 240210000
Level 3 SR 7-day composites: 1 10402474402100001040000
QL SR: 1040104104010400001040
visual QA screening outputs: 1040+
SR result export: (10) 60 TB
Sum: 326202345800

22 Calvalus portal for on-demand processing
- input set selection
- processor versions
- processing parameters
- in-situ data for matchup analysis
- variables for aggregation
- trend analysis

23 Summary
- Calvalus is a multi-mission, full mission data processing system for bulk (re)processing, data analysis and algorithm validation
- Calvalus is based on the open source middleware Apache Hadoop and implements massive parallel data-local processing
- Calvalus integrates processors of the BEAM GPF processing framework and Unix executables in any programming language
- Calvalus is successfully in use by various projects and will be further developed
Acknowledgement: The initial Calvalus idea was developed and its realisation was funded by the European Space Agency under the SME-LET programme.

24 Reflection points
- The adequate hardware infrastructure for Hadoop is different from the current trend of virtualisation and network storage (transparency vs. knowledge of data location).
- Adapted optimised solutions may have a shorter life cycle than generic, standardised ones (processor interfaces that support data streaming vs. file interface).
- Historical missions (ENVISAT) are not the problem. Are we prepared for Sentinel data?

