Earth Data Science Planning Meeting #1 February 20, 2013.


1 Earth Data Science Planning Meeting #1 February 20, 2013

2 Data Science: An Emerging Discipline for Analyzing Massive Data
Data Science is the intersection of data analysis, statistics, computer science, software engineering, and discipline science for the purpose of learning from massive data.
Recent Activities
– 2009-2011 GC&E investment in CDX resulted in $12M+ of new business in data system technology
– Applied to key 8X science initiatives (IPCC, CO2 data record, etc.)
– JPL leadership in the Earth System Grid
– JPL participated in the NRC Massive Data Analysis study
– Working with the NSF SAMSI on a massive data research study for earth science for 2012-2013
– Acquired new business from DOE, DARPA, NOAA, NSF, and NASA
Plans & Expectations
– Establish a multi-year roadmap for earth data science research and technology for massive data
– Shift technology research from serving data products to providing online, analytic data services
– Hold a 3rd IT for Climate Research Workshop [funded through ROSES]
– Work with HQ Program Managers (e.g., CMAC, AIST, etc.) on formulating ROSES calls around data science
– Establish a Lab for Earth Data Science that integrates infrastructure, tools, and methods
Needs
– Align science and mission roadmaps with emerging data science approaches for massive data
– Support an FY13 advanced study to generate the roadmap and business plan for a JPL data science program

3 Task | Program | Awarded
Coastline Marine Discovery | NASA ACCESS | $583K
Likelihood-based Quantification of Agreement Between Climate Model Output and NASA Data Records | NASA ESDRERR | $1,300K
Water Resource Management | NASA ARRA | $2,000K
Virtual Oceanographic Data Center (VODC) | NASA ACCESS | $650K
Facilitate integration of NASA and ESG | NASA IPP | $250K
NASA NCA/RCMES | NASA | $1,200K
Multivariate Data Fusion & Uncertainty Quantification for Remote Sensing | NASA AIST | $1,500K
RCMES | NASA AIST | $400K
ESG/RCMES Integration | NASA CMAC | $650K
Collaborative Climate Model & Observational Data Services | NASA ACCESS | $700K
CO2 Virtual Science Environment | NASA OCO Mission | $750K
Development of the ESG | DOE | $250K
NOAA Earth Science Grid Integration | NOAA IAA | $450K
DARPA Big Data | DARPA | $5,000K
Total | | $15,683K

4 Data Science Computing Stack
[Diagram: stack layers: Science Data Processing, Storage/Management of Scientific Data; Data Access/Usage – Data Search, Query, Retrieval; Tools/Services/Methods for Massive Earth Data Analysis; Decision Support Tools and Applications. Side labels: competency; Needed competency; Data Science Gap; Mission/Instrument SDS’s, ESG, PO.DAAC, etc.; Computational and Data Services for Earth Data]

5 Traditional NASA Earth Science Pipeline
For JPL Internal Use Only
[Diagram: Data Acquisition and Command (Mission Operations, TDRS Network, On Board Processing); Instrument Operations; EDOS/Ground Data Systems (L0A Processing); Science Data Systems (Science Data Processing: L0B, L1, L2, L3, L4); EOSDIS Data Centers (Science Data Management, Archive & Distribution); Science Teams; Research; Outreach]

6 Addressing the Data Science Gap
In a decade (…many say much sooner), serving scientific data to the community is not going to be sufficient
– Distribution and volume of data will make this obsolete
– Technology will allow for on-demand analysis leveraging enormous computational and data infrastructures
– Science research will depend on these infrastructures for “services”
Online analytic services will create a competitive advantage in the area of “data science”
– The next generation of scientists is growing up with online services; they will go where the services are
– Those with the service will be branded as the leaders
– We should put in place the on-demand analytic capabilities for these measurements

7 Shifting To Online Data Analysis Paradigms
For JPL Internal Use Only
[Diagram: Data Acquisition and Command (Mission Operations, TDRS Network, On Board Processing); Instrument Operations; EDOS/Ground Data Systems (L0A Processing); Science Data Systems (Science Data Processing: L0B, L1, L2, L3, L4); EOSDIS Data Centers (Science Data Management, Archive & Distribution); Science Teams; Outreach; Applications; Analysis, Modeling and Application Environments/Gateways; Decision Support; Research]

8 The Big Picture: Enabling Multi-Disciplinary Analysis through a Systematic Approach
An opportunity to improve the efficiency of data analysis for the world-wide science community
[Diagram labels: Generate, Capture, Analyze (shown twice); Observational Data; Predictive Models, Understanding; Earth Science Projects; Planetary Sciences Projects; Radio Astronomy Projects; Other Disciplines; Compare; Science]

9 Challenges: Moving Towards Exabyte Data Analysis for Earth Science
Growing, distributed, massive record of observational data and climate model output
– CMIP3: ~34 Terabytes
– CMIP5: ~3 Petabytes
– CMIP6: 350 Petabytes to 3 Exabytes (per D. Williams and the 2011 Climate Knowledge Discovery Workshop)
A new paradigm is required to shift focus from data access and independent data analysis to online analysis services for highly distributed, heterogeneous data, in order to
– Fuse data together for long-term records
– Compute higher order data products on request
– Analyze distributed data (e.g., climate model output, satellite data, etc.) with distributed computation
– Establish a scalable computing infrastructure for missions and science projects

10 Example: Data challenge of CMIP3 archive vs. CMIP5 archive
9/5/12, I. Williams, LLNL Climate SFA Review
CMIP3 (archive size: 35 TB)
Modeling Center | Country | Volume (GB)
BCCR | Norway | 862
CCCma | Canada | 2,071
CNRM | France | 999
CSIRO | Australia | 2,088
GFDL | USA | 3,843
GISS | USA | 1,097
IAP | China | 2,868
INGV | Italy | 1,472
INMCM3 | Russia | 368
IPSL | France | 998
MIROC3 | Japan | 3,975
MIUB | Germany/Korea | 477
MPI | Germany | 2,700
MRI | Japan | 1,025
NCAR | USA | 9,173
UKMO | UK | 973
Totals | | 34,989 GB (about 35 TB)
CMIP5 (archive size: currently 1.4 PB; total 3.1 PB by 2013)
Modeling Center | Country | Volume (TB)
BCC | China | 51
CCCma | Canada | 51
CMCC | Europe (Italy) | 158
CNRM | France | 71
CSIRO | Australia | 81
EC-EARTH | Europe (Netherlands) | 97
GCESS | China | 24
INM | Russia | 30
IPSL | France | 121
LASG | China | 100
MIROC | Japan | 350
MOHC | UK | 195
MPI | Germany | 166
MRI | Japan | 269
NASA | USA | 375
NCAR | USA | 739
NCC | Norway | 32
NCEP | USA | 26
NIMR/KMA | Korea | 14
NOAA GFDL | USA | 158
Totals | | 3,108 TB (about 3.1 PB)
CMIP5/CMIP3 ≈ 10^2
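The totals and the CMIP5/CMIP3 ratio on this slide can be re-derived from the per-center volumes. The short Python sketch below is only an illustration: the variable names are ours, and it assumes decimal units (1 TB = 1,000 GB), which the slide does not state.

```python
# Per-center archive volumes copied from the slide.
# CMIP3 volumes are in GB, CMIP5 volumes are in TB.
cmip3_gb = {
    "BCCR": 862, "CCCma": 2071, "CNRM": 999, "CSIRO": 2088,
    "GFDL": 3843, "GISS": 1097, "IAP": 2868, "INGV": 1472,
    "INMCM3": 368, "IPSL": 998, "MIROC3": 3975, "MIUB": 477,
    "MPI": 2700, "MRI": 1025, "NCAR": 9173, "UKMO": 973,
}
cmip5_tb = {
    "BCC": 51, "CCCma": 51, "CMCC": 158, "CNRM": 71, "CSIRO": 81,
    "EC-EARTH": 97, "GCESS": 24, "INM": 30, "IPSL": 121, "LASG": 100,
    "MIROC": 350, "MOHC": 195, "MPI": 166, "MRI": 269, "NASA": 375,
    "NCAR": 739, "NCC": 32, "NCEP": 26, "NIMR/KMA": 14, "NOAA GFDL": 158,
}

GB_PER_TB = 1000  # assumption: decimal units, not stated on the slide

cmip3_total_gb = sum(cmip3_gb.values())
cmip5_total_tb = sum(cmip5_tb.values())

# Express both archives in TB before taking the ratio.
cmip3_total_tb = cmip3_total_gb / GB_PER_TB
ratio = cmip5_total_tb / cmip3_total_tb

print(f"CMIP3 total: {cmip3_total_gb:,} GB ({cmip3_total_tb:.1f} TB)")
print(f"CMIP5 total: {cmip5_total_tb:,} TB")
print(f"CMIP5/CMIP3 ratio: {ratio:.1f}")
```

Under that unit assumption the ratio comes out a little under 90, which the slide rounds to two orders of magnitude.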

11 ESGF: IPCC CMIP5 Data System Credit: D. Williams, LLNL Climate SFA Review

12 Emerging Technologies
– Big Data Analytics/Computation (Hadoop)
– Cloud Computing
– Virtual Systems
– Distributed Computing
– Data Mining
– Large-scale Data Management

13 Scaling the Analysis
[Diagram: Data Acquisition and Command (Mission Operations, TDRS Network, On Board Processing); Instrument Operations; EDOS/Ground Data Systems (L0A Processing); Network w/ Cloud Storage & Computation; NASA Mission/Multi-Mission Data & Science Centers (Science Data Processing, Science Data Manage); Other Data Systems (e.g. NOAA); Applications; Analysis, Modeling and Application Environments/Gateways; Decision Support; Research; Science Teams]

14 Recent Examples VISION ESGF provides support for online sharing of data, not for online analytic services CMIP5 via Earth System Grid Federation – International data sharing (including observations)

15

16 Data Fusion for Remote Sensing Data
Fixed-Rank Filtering: Cressie, N., Shi, T., and Kang, E.L. (2010)
Multiple process multiple source spatial/spatio-temporal data fusion: Nguyen, H., Cressie, N., and Braverman, A. (2012), and Nguyen, H., Katzfuss, M., Cressie, N., and Braverman, A. (2012)

17 Example Research Questions
– What architectural design produces the most efficient system topology for the types of data movement that will be required given scientific objectives? Can we study this as an optimization problem?
– How do we design computational methods that exploit the system topology and its distributed nature? Need algorithms that operate on distributed data to produce statistics of interest, or approximations. Study this trade-off.
– Data analysis choreography: how to assemble algorithms most efficiently given a set of analysis goals? How to optimize the movement of data?
– How can statistics and other disciplines (e.g., computer science) education be better aligned? Statisticians and computer scientists need to work together to plan how system architectures can enable analysis of highly distributed data.
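As one illustration of "algorithms that operate on distributed data to produce statistics of interest", the Python sketch below computes a global mean and variance without moving raw data: each site reduces its own values to a three-number summary, and only the summaries are combined. This is a generic pairwise-merge technique (the parallel variance update usually attributed to Chan et al.), not something taken from the presentation; the `Summary` class, the function names, and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Iterable


@dataclass(frozen=True)
class Summary:
    """Sufficient statistics for mean and variance of one data partition."""
    n: int        # number of values seen
    mean: float   # running mean
    m2: float     # sum of squared deviations from the mean


def summarize(values: Iterable[float]) -> Summary:
    """Runs locally at each site: one pass over that site's data (Welford)."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return Summary(n, mean, m2)


def merge(a: Summary, b: Summary) -> Summary:
    """Combines two summaries exactly; only three numbers cross the network."""
    if a.n == 0:
        return b
    if b.n == 0:
        return a
    n = a.n + b.n
    delta = b.mean - a.mean
    mean = a.mean + delta * b.n / n
    m2 = a.m2 + b.m2 + delta * delta * a.n * b.n / n
    return Summary(n, mean, m2)


def variance(s: Summary) -> float:
    """Sample variance (n - 1 denominator); NaN if fewer than two values."""
    return s.m2 / (s.n - 1) if s.n > 1 else float("nan")


# Hypothetical values held at two separate data centers.
site_a = [1.0, 2.0, 3.0]
site_b = [4.0, 5.0, 6.0, 7.0]

summaries = [summarize(site_a), summarize(site_b)]
total = reduce(merge, summaries, Summary(0, 0.0, 0.0))
print(total.n, total.mean, variance(total))
```

Mean and variance merge exactly, so this case involves no approximation. Statistics such as medians or quantiles do not reduce to a fixed-size summary in the same way, which is where the exact-versus-approximate trade-off mentioned on the slide arises.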

18 Earth Data Science Study
8X retreat discussion
Recommendation to form a working group to explore the opportunities
– Focus initially on Earth Science, but open to other areas
Convene a cross-disciplinary group

19 Study Objective (1)
– Evaluation of the business case of targeting “data science” as a technology growth area in earth data systems research
– Identification of near-term science questions/challenges to address
– Identification of Data Science vs. Big Data synergies and differences
– Development of a capabilities roadmap
– Current state of JPL vs. competitors
– Required staffing needs and gaps

20 Study Objective (2)
– Key partnerships
– Necessary facilities support vs. current state
– Recommendations on how to structure a long-term program
– Identify opportunities to work NASA ESD Program and propose

21 Study Team
Michael Gunson, Earth Science
Duane Waliser, Earth Science
Joe Lazio, Astronomy/Science
Amy Braverman, Statistics and Data Science
Becky Castano, Machine Learning/AI
David Thompson, Machine Learning/AI
Robert Granat, Machine Learning/AI
Michael Turmon, Machine Learning/AI
Liz Kay-Im, Data Systems
Chris Mattmann, IT Data Systems
Tom Soderstrom, OCIO IT Chief Technologist
Jason Hyon, 8X Chief Technologist
Dan Crichton, IT Data Systems
Emily Law, IT Data Systems

22 A Few Plans
– ESTO White Paper on Data Science for Earth Science
– 3rd IT for Climate Research Workshop
– ACCESS proposals due in June
– AIST proposals in 2014 (bigger target)
– We’ve proposed to ESTO that they should fund an architecture study to address data science

23 Discussion

24 Future Meetings

