Data- and Compute-Driven Transformation of Modern Science How e-Infrastructure & Policy Support Paradigm Shifts in Research Edward Seidel Senior Vice President,


1 Data- and Compute-Driven Transformation of Modern Science: How e-Infrastructure & Policy Support Paradigm Shifts in Research. Edward Seidel, Senior Vice President, Research and Innovation, Skolkovo Institute of Science and Technology

2 Part 1: We are in a period of unprecedented change in Science and Society… the crises & opportunities this creates

3 Profound Transformation of Science: Collision of Two Black Holes. 1972: Hawking, 1 person, no computer, 50 KB. 1994: 10 people, NCSA Cray Y-MP, 50 MB. 1998: 15 people, NCSA Origin, 50 GB.

4 Community Einstein Toolkit “Einstein Toolkit: open software for astrophysics to enable new science, facilitate interdisciplinary research and use emerging petascale computers and advanced CI.” • Consortium: 92 members, 46 sites, 15 countries • Whole consortium engaged in directions, support, development • Simulation: Luciano Rezzolla, Max Planck Institut für Gravitationsphysik (AEI) Community + software + algorithms + hardware + … Many groups can do this: field explodes! Major triumph of Computational Science: solving the Einstein equations (EEs)!

5 New Frontiers: Relativistic Matter • Nuclear equations of state • Collab. with astrophysicists • General relativistic magnetohydrodynamics • Some groups have ideal MHD • Radiation Transport (neutrinos/photons) • Expensive and complicated! • Requires opacities/emissivities • Chemical reactions (thermonuclear, chemical) • SN community! • Computation: • Multiphysics!! • GRMHD: petascale problem • Radiation transport beyond this (Schnetter et al., Petascale Computing: Algorithms and Applications, 2007) Zoom in to just this part: post BH formation and evolution of jet

6 Schnetter, et al. Post BH Formation/Evolution of Jet • Multiphysics • GR, Neutrinos, MHD, Nuclear EOS • Computer Science • 10 level AMR, optimized for Blue Waters • Science • 200s evolution, analyze it all • Blue Waters • 6M Pflop @ 1 PF sustained = 70 days! Multiphysics framework needed for fluids in astrophysics and porous media… Rezzolla, et al.
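
The Blue Waters estimate on this slide (6M Pflop at 1 PF sustained, roughly 70 days) can be checked with a short calculation. This is a minimal sketch, assuming "6M Pflop" means 6 million petaflop of total work (6 x 10^21 floating-point operations) and "1 PF sustained" means 10^15 operations per second held for the whole run; the function name `wallclock_days` is hypothetical and not from the talk.

```python
SECONDS_PER_DAY = 86_400


def wallclock_days(total_pflop: float, sustained_pflops: float) -> float:
    """Days needed to execute `total_pflop` petaflop of work
    at a sustained rate of `sustained_pflops` petaflop per second."""
    if sustained_pflops <= 0:
        raise ValueError("sustained rate must be positive")
    seconds = total_pflop / sustained_pflops
    return seconds / SECONDS_PER_DAY


# Assumed reading of the slide: 6 million Pflop of work at 1 Pflop/s sustained.
days = wallclock_days(6e6, 1.0)
print(f"{days:.1f} days")  # prints "69.4 days", consistent with the slide's ~70 days
```

Under the same assumptions, halving the sustained rate doubles the wall-clock time, which is why the sustained figure rather than the peak figure drives the estimate.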

7 Theory and Observation of Universe • Gravitational Waves! • Complex problems in relativistic astrophysics • Relativity, hydrodynamics, nuclear physics, radiation, neutrinos, magnetic fields: globally distributed collab! • Observe (PB), compute (PF) signals • Gravity and general relativity are transformed • 4 centuries of small science, small data culture • 2-3 decades of radical change in both data (factors of 1000 per ~5 years) and collaboration LIGO/VIRGO/GEO New era of science after a century! Data- and compute-dominated gravitational wave astronomy!
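
The data growth rate quoted on this slide (factors of 1000 per roughly 5 years) is easier to picture as a yearly factor and a doubling time. This is a minimal sketch, assuming smooth exponential growth over the period; the function names are hypothetical and the result is only as good as the slide's round numbers.

```python
import math


def annual_growth_factor(total_factor: float, years: float) -> float:
    """Yearly multiplier that compounds to `total_factor` over `years`."""
    return total_factor ** (1.0 / years)


def doubling_time_years(total_factor: float, years: float) -> float:
    """Time for the quantity to double, given `total_factor` growth over `years`."""
    return years * math.log(2) / math.log(total_factor)


# Slide figure: a factor of 1000 in about 5 years.
per_year = annual_growth_factor(1000, 5)
doubling = doubling_time_years(1000, 5)
print(f"about {per_year:.1f}x per year, doubling every {doubling * 12:.0f} months")
```

Under these assumptions the data volume roughly quadruples each year and doubles about every six months.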

8 US Council on Competitiveness • Ping Golf • Moved from workstation to Cray, now make prototype only at last stage of design • Too effective: had to simulate “less effective” design! • Procter & Gamble • Pringles “flying” off manufacturing line, causing significant lost product and revenue. Using the CFD codes that Boeing uses, airflow over the Pringle was modeled and the design changed so Pringles did not “lift off”

9 Part 1a: The Growth of Data. Data Tsunami “I’m still here…” But I’m your new baby big brother… With millions of processors…

10 Going Beyond a Community: Transient & Multi-Messenger Astronomy • New era: seeing events as they occur • Here now • ALMA, EVLA in radio • IceCube neutrinos • On horizon • 24-42m optical? • LSST = SDSS (40TB) every night! • SKA = exabytes • Simulations integrate all physics • Data-intensive = compute-intensive Astronomy 1500-2010 was passive. No longer! Communities need to share data, software, knowledge, in real time. Will require integration across disciplines, end-to-end

11 Big Data vs The Long Tail of Science • Many “Big Data” projects are “special” • Tend to be highly organized, have singular sources of data, professionally curated, a lot of attention paid to them • What about the “Long Tail” (the other 99%)? • Thousands of biologists sequencing communities of organisms • Thousands of chemists and materials scientists developing a “materials genome” • Millions of people “Tweeting”… • Characteristics: Heterogeneous, perhaps hand generated. Not curated, reused, served, etc… How do we harness the power of this long tail? News Flash! NYT 6/3/13: Drug side effects discovered by mining web logs: paroxetine + pravastatin = high blood sugar!

12 Grand Challenge Communities Combine it All... Where is it going to go? Same CI useful for black holes, hurricanes

13 Grand Challenge Communities for Complex Problems • Require many disciplines, all scales of collaborations • Individuals, groups, teams, communities • Multiscale Collaborations: Beyond teams • Are dynamic and highly multidisciplinary • Time domain astronomy, emergency forecasting, metagenomics, materials genome… • Drive sharing technologies and methodologies • Researchers collaborate, work by sharing data. Places requirements on eInfrastructure: • Software, networks, collaborative environments, data, sharing, computing, etc. • Scientific culture, reproducibility, access, university structures • “Publications.” What is a modern publication? Social, behavioral and economic sciences will be critical in helping us understand these issues…

14 Scenarios like this in all fields: NEON+GIS

15 Framing the Challenge: Science and Society Transformed by Data • Modern science • Data- and compute-intensive • Integrative, multiscale • 4 centuries of constancy, 4 decades of 10^9-10^12 change! • Multi-disciplinary Collaborations • Individuals (Galileo!) • Groups, teams, Grand Challenge Communities • Big Data + Long Tail • Sea of Data • Age of Observation We still think like this… …But such radical change cannot be adequately addressed with (current) incremental approach! Students take note!

16 Part 2: Crises, Challenges, Opportunities. Computing, Data, Software, End-to-end Networks, Organizational structures, Education. No, we are not… Cyber Instruments & Facilities

17 Five Crises “CDSE” Community needs to address • Computing Technology • Multicore: processor is new transistor • Programming model, fault tolerance, etc. • New models: clouds, grids, GPUs, … • Data, provenance, and visualization • How do we create “data scientists”? • What is an international data infrastructure? • Software treated as e-Infrastructure • Complex applications on coupled compute-data-networked environments, tools needed • Modern apps: 10^6+ lines, many groups contribute, take decades

18 Five Crises • Organization for Multidisciplinary & Computational Science • “Universities must significantly change organizational structures: multidisciplinary & collaborative research are needed [for US] to remain competitive in global science” • “Itself a discipline, computational science advances all science…inadequate/outmoded structures within Federal government and the academy do not effectively support this critical multidisciplinary field” • Education • The CI environment is running away from us! • How do we develop a workforce to work effectively in this world? • How do universities transition?

19 Scientific Computing and Imaging Institute, University of Utah Data Crisis: Information Big Bang PCAST Digital Data NSF Experts Study Wired, Nature Storage Networking Industry Association (SNIA) 100 Year Archive Requirements Survey Report “there is a pending crisis in archiving… we have to create long-term methods for preserving information, for making it available for analysis in the future.” 80% respondents: >50 yrs; 68% > 100 yrs Industry

20 The Shift Towards a “Sea of Data”: Implications • Science & society are now data-dominated • Experiment, computation, theory • US mobile phone traffic exceeded 1 exabyte! • Classes of data • Collections, observations, experiments, simulations • Software • Publications • Totally new methodologies • Algorithms, mathematics, culture • Data become the medium for • Multidisciplinarity, communication, publication, science, economic development… How do we attribute credit for this new publication form? How are data peer reviewed? What is a publication in the modern data-rich world? What is a business model for OA? Fundamental questions become focused around data: What to curate, how to remove boundaries? How to incentivize sharing? IP?

21 Part 2a: Recommendations

22 ACCI Task Force Reports • Final recommendations presented to the NSF Advisory Committee on Cyberinfrastructure Dec 2010 • More than 25 workshops and Birds of a Feather sessions, 1300 people involved • Final reports on-line “Permanent programmatic activities in Computational and Data-Enabled Science & Engineering (CDS&E) should be established within NSF.” Grand Challenges Task Force “NSF should establish processes to collect community requirements and plan long-term software roadmaps.” Software Task Force “Higher education should adopt criteria for tenure and promotion that reward…the production of digital artifacts of scholarship. Such artifacts include widely used data sets, scholarly services delivered online, and software.” Campus Bridging Task Force. Task forces: Software, Campus Bridging, Data & Viz, Grand Challenge, HPC, Learning

23 Recommendation of NSF Advisory Committee on Cyberinfrastructure (ACCI) "The National Science Foundation should create a program in Computational and Data-Enabled Science and Engineering (CDS&E), based in and coordinated by the NSF Office of Cyberinfrastructure. The new program should be collaborative with relevant disciplinary programs in other NSF directorates and offices."

24 Part 3: Universities attempt to respond. We have to do all this and revolutionize the state/national economy?

25 Skoltech: Example of a 21st Century University in the Making

26 Skoltech at a Glance A unique Russian institution in international context – This decade: a community of 200 faculty, 300 postdocs, 1200 graduate students Focused on science, engineering and technology – Addressing problems and issues in IT, Energy, Biomedicine, Space and Nuclear Interdisciplinary by design; no departments – 15 centers organized around complex problems With strong programs in support of innovation and entrepreneurship – Creating a culture of innovation in every student, professor, staff member Important part of the Skolkovo innovation ecosystem Integrated data, compute, instrumentation infrastructure and policy under development for: interdisciplinary research, accelerating discovery, economic development

27 Part 3a: You can help lead this revolution. Kathryn Gray

28 Modern Research & Education Ecosystem [Figure labels: Software, Data, Campus, Track 2, XSEDE, Blue Waters] Education Crisis: I need all of this to start to solve my problem!

29 The Opportunity (US picture)! • Now have emerging national Integrated, High Performance Research Architecture • Blue Waters and beyond towards exascale: high end Extraordinary science continues to lead at cutting edge Traditional and novel large data applications Few places can house, field, or drive such a facility • XSEDE architecture can connect… Campus Bridging: campus to national CI… – Campus Assets: MRI, Instruments, DNA sequencers… – Facilities: Supercomputers, telescopes, accelerators, light sources, NEON… » “More silicon than steel” – Networks: end-to-end connectivity » Where are those optical network apps?

30 Much to do to build CDSE on this Background: address the “5 Crises” • Education • Many new opportunities and challenges. CSE already has its struggles. Now data: what is a “data scientist”? CDSE emerges • Data opportunities for education and citizen science • Faculty development, curriculum development • Needed on every campus • Talk to NSF, DOE, EC, your national agencies • Recommendations of ACCI, MPSAC, etc. • New programs needed: See NSF CDS&E, CI TraCS, CAREER, “LWD”, etc. • You can help make this happen

31 Key Messages • Astounding rate of change of the “Triple Helix” of Research, Education, and Innovation • Computing and Data radically change methods • Culture of collaboration around complex problems • These create many crises and opportunities • From technology to methodology to culture… • Deep integration required for science • Emergence of Computational and Data-enabled Science and Engineering as a discipline and your role! • A key part of the paradigm shift

32 & Data

