The Future of Scientific Computing at Harvard
Alyssa A. Goodman, Professor of Astronomy; Director, Initiative in Innovative Computing


1 The Future of Scientific Computing at Harvard. Alyssa A. Goodman, Professor of Astronomy; Director, Initiative in Innovative Computing

2 “The Heavy Red Bag” How can computers advance (my) science?

3

4 A new collaborative scientific initiative at Harvard.

5 Computational challenges are common across scientific disciplines
How to:
Acquire, transmit, organize, and query new kinds of data?
Apply distributed computing resources to solve complex problems?
Derive meaningful insight from large datasets?
Share, integrate and analyze knowledge across geographically dispersed researchers?
Visually represent scientific results so as to maximize understanding?
Opportunity to collaborate and apply insights from one field to another

6 Filling the “Gap” between Science and Computer Science
Scientific disciplines: Increasingly, core problems in science require computational solutions. Typically hire or “home grow” computationalists, but often lack the expertise or funding to go beyond the immediate pressing need.
Computer Science departments: Focused on finding elegant solutions to basic computer science challenges. Often see specific, “applied” problems as outside their interests.

7 “Workflow” & “Continuum”

8 Workflow Examples
Stage: Astronomy | Public Health
“Collect”: Telescope | Microscope, Stethoscope, Survey
COLLECT: “National Virtual Observatory”/COMPLETE | CDC Wonder
“Analyze”: Study the density structure of a star-forming glob of gas | Find a link between one factory’s chlorine runoff & disease
ANALYZE: Study the density structure of all star-forming gas in… | Study the toxic effects of chlorine runoff in the U.S.
“Collaborate”: Work with your student
COLLABORATE: Work with 20 people in 5 countries, in real-time
“Respond”: Write a paper for a Journal.
RESPOND: Write a paper, the quantitative results of which are shared globally, digitally.

9 Workflow IIC contact: AG, FAS

10 Workflow a.k.a. The Scientific Method (in the Age of High-Speed Networks, Fast Processors, Mass Storage, and Miniature Devices) IIC contact: Matt Welsh, FAS

11 Workflow: The Harvard Virtual Brain
Establishing a Harvard-wide Neuroscience Infrastructure (Harvard IIC)
Faculty of Arts and Sciences: Harvard College, Division of Engineering
Harvard School of Public Health
Faculty of Medicine: Harvard Medical School, Affiliated Teaching Hospitals
Data Acquisition: MRI, PET, Microscopy, etc.
Distributed Data Storage
Data Processing: Analysis, Visualization, Integration, etc.
Information Access: Query, Statistical Analysis, Knowledge Management, etc.
IIC contact: David Kennedy, HMS/MGH

12 New technologies for measurement and simulation are transforming the “workflow.”
Biomedicine, pre-genomics: Manual/low throughput; Solitary; Limited by two hands; Analog
Biomedicine, genomics era: High throughput; Automated/networked; Highly scalable; Digital

13 Continuum “Pure” Discipline Science (e.g. Galileo) “Pure” Computer Science (e.g. Turing) “Computational Science” Missing at Most Universities

14 Workflow & Continuum For any particular scientific investigation: Where does, and could, “computational science” make improvements in this cycle?

15 Harvard Public Health “NOW” (Oct. 2004)
"In the past, experiments did not involve such large data sets," observed Dyann Wirth, professor of infectious diseases in the Department of Immunology and Infectious Diseases and member of the advisory group for the core. "There has been a dramatic change in the past five to 10 years in the amount and availability of genomic data [or the DNA sequences themselves] and functional genomic data, [or the sequences’ purpose]." In the past five years alone, the genomes of humans, rats, and the malaria parasite Plasmodium falciparum have been published, for example.
"One of the purposes of bioinformatics is to reduce the number of experiments that need to be done to achieve reliable information," said L.J. Wei, professor of biostatistics in the Department of Biostatistics and member of the advisory group for the core. "However, an issue right now is that there are huge data sets that can be run through different kinds of software programs, ending up with many data points. Unless we understand and use bioinformatics well, we may not even know which of those data points are important."

16 Filling the “computational science” gap: IIC
Problem-driven approach …focusing effort on solving problems that will have greatest impact & educational value
Collaborative projects …combining disciplinary knowledge with computer science expertise
Interdisciplinary effort …to ensure that best practices are shared across fields and that new tools and methodologies will be broadly applicable
Links with industry …to draw on and learn from experience in applied computation
Institutional funding …to ensure effort is directed towards key needs and not driven solely by narrow priorities of funding agencies

17 IIC at Harvard

18 Numerical Simulation of Star Formation
Bate, Bonnell & Bromm 2002 (UKAFF)
MHD turbulence gives “t=0” conditions; Jeans mass = 1 M_sun
50 M_sun, 0.38 pc, n_avg = 3 x 10^5 ptcls/cc
forms ~50 objects
T = 10 K
SPH, no B or [?]
movie = 1.4 free-fall times
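The slide quotes the movie length in free-fall times. As a rough illustration of what that unit means, the sketch below evaluates the standard uniform-sphere free-fall time, t_ff = sqrt(3π / (32 G ρ)), for the quoted average density of 3 x 10^5 particles/cc. The mean mass per particle (2.33 hydrogen masses, typical for molecular gas) and the function name are assumptions introduced here, not values from the talk.

```python
import math

G_CGS = 6.674e-8              # gravitational constant, cm^3 g^-1 s^-2
M_H_G = 1.6735e-24            # mass of a hydrogen atom, g
SECONDS_PER_YEAR = 3.15576e7  # Julian year, s


def free_fall_time_yr(n_per_cc, mean_mass_per_particle=2.33):
    """Free-fall time (years) of a uniform sphere with number density n_per_cc.

    mean_mass_per_particle is in units of the hydrogen mass; 2.33 is an
    assumed value for molecular gas, not a number taken from the slide.
    """
    rho = n_per_cc * mean_mass_per_particle * M_H_G  # mass density, g/cm^3
    t_ff_s = math.sqrt(3.0 * math.pi / (32.0 * G_CGS * rho))
    return t_ff_s / SECONDS_PER_YEAR


if __name__ == "__main__":
    t_ff = free_fall_time_yr(3e5)  # density quoted on the slide
    print(f"free-fall time: {t_ff:.3g} yr")
    print(f"1.4 free-fall times: {1.4 * t_ff:.3g} yr")
```

Under these assumptions the free-fall time comes out at roughly 6 x 10^4 years. Because t_ff scales as the inverse square root of density, a different assumed density or particle mass shifts the answer accordingly.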

19 Simulations & Public Health

20

21 Goal: Statistical Comparison of “Real” and “Synthesized” Star Formation Figure based on work of Padoan, Nordlund, Juvela, et al. Excerpt from realization used in Padoan & Goodman 2002.

22 Measuring Motions: Molecular Line Maps

23 Alves, Lada & Lada 1999 Radio Spectral-Line Survey Radio Spectral-line Observations of Interstellar Clouds

24 Velocity from Spectroscopy. Telescope → Spectrometer → Observed Spectrum [Figure: observed spectrum, Intensity vs. "Velocity"]. All thanks to Doppler

25 Velocity from Spectroscopy. Telescope → Spectrometer → Observed Spectrum [Figure: observed spectrum, Intensity vs. "Velocity"]. All thanks to Doppler
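To make "all thanks to Doppler" concrete, here is a minimal sketch of how a spectrometer's frequency channels can be relabeled as velocities using the radio-convention Doppler formula, v = c (f_rest − f_obs) / f_rest. The rest frequency used (the 13CO J=1-0 line near 110.201 GHz), the channel width, and the function names are illustrative assumptions. The slides do not say how the velocity axis was actually computed.

```python
# Radio-convention Doppler shift: v = c * (f_rest - f_obs) / f_rest.
# Positive velocity means the gas is moving away from the observer.

C_KM_S = 299_792.458  # speed of light in km/s

# Assumption for illustration only: rest frequency of the 13CO J=1-0 line (GHz).
REST_FREQ_GHZ = 110.201354


def radio_velocity_km_s(f_obs_ghz, f_rest_ghz=REST_FREQ_GHZ):
    """Line-of-sight velocity (km/s) for one observed frequency."""
    return C_KM_S * (f_rest_ghz - f_obs_ghz) / f_rest_ghz


def velocity_axis(f_first_ghz, channel_width_ghz, n_channels,
                  f_rest_ghz=REST_FREQ_GHZ):
    """Velocity (km/s) of each channel for evenly spaced channel frequencies."""
    return [
        radio_velocity_km_s(f_first_ghz + i * channel_width_ghz, f_rest_ghz)
        for i in range(n_channels)
    ]


if __name__ == "__main__":
    # Hypothetical 5-channel spectrometer centered on the rest frequency.
    width = 25e-6  # 25 kHz expressed in GHz
    for v in velocity_axis(REST_FREQ_GHZ - 2 * width, width, 5):
        print(f"{v:+.3f} km/s")
```

Channels below the rest frequency map to positive (receding) velocities and channels above it to negative ones, which is why a plot of intensity against channel can be read as intensity against velocity.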

26 Barnard’s Perseus; COMPLETE/FCRAO W(13CO)

27 IRAS N_dust; H-α emission, WHAM/SHASSA Surveys (see Finkbeiner 2003); HH; 2MASS/NICER Extinction

28 “Astronomical Medicine” Excerpts from Junior Thesis of Michelle Borkin (Harvard College); IIC Contacts: AG (FAS) & Michael Halle (HMS/BWH/SPL)

29 IC 348

30 “Astronomical Medicine”

31

32 After “Medical Treatment” Before “Medical Treatment”

33 3D Slicer Demo (available after talk) IIC contacts: Michael Halle & Ron Kikinis

34 IIC: Five Research Branches
Visualization: Physically meaningful combination of diverse data types.
Distributed Computing: e-Science aspects of large collaborations. Sharing of data and computational resources and tools in real-time.
Databases/Provenance: Management, and rapid retrieval, of data. “Research reproducibility” …where did the data come from? How?
Analysis & Simulations: Development of efficient algorithms. Cross-disciplinary comparative tools (e.g. statistical).
Instrumentation: Improved data acquisition. Novel hardware approaches (e.g. GPUs, sensors).

35 IIC: Innovative Organizational Model
Culture: No “class” distinctions made between teaching and non-teaching faculty, scientists and engineers, artists and designers working in the visualization program
Staffing: Highly accomplished academics and senior experts whose careers have been primarily in industry, working together
Promotion/career path: Criteria for promotion will give equal weight to scholarly activities, and to technological invention

36 How IIC will Function: Overview
IIC Objectives: Enable new research for specific scientific disciplines. Generate new computational tools for broader application.
Project selection: Identify and fund projects that are likeliest to have the greatest and broadest impact.
Project execution and dissemination of knowledge: Pursue projects in a way that will yield best outcome, enable shared learning, etc.

37 Project Selection
Stage: Project proposals | Program Advisory Committee | IIC Management Team
Role: Submit proposal in response to call for ideas | Evaluate/rank proposals for scientific merit: should this be a priority for IIC? | Evaluate/prioritize proposals according to technical feasibility, assess resource needs
Who participates: Any Harvard researcher (e.g., in genomics, fluid dynamics, epidemiology, neuroscience, nanoscience, comp bio, chemical biology, optics, geology, astronomy, quantum mechanics, etc.) | Harvard researchers representing broad interests of IIC stakeholders, plus IIC Director & Dir. of Research | Consists of IIC Director, Dirs. of Res. & Adm/Ops, Heads of IIC branches

38 Project Execution
Project Manager: Responsible for project execution and metrics for tracking progress/performance; interfaces with IIC branch heads
Discipline scientists: Scientists who “own” the problem and are committed to working with IIC staff to tackle it
IIC staff: IIC staff scientists assigned to work on project by relevant IIC branch heads. The same IIC staff member may serve on multiple IIC project teams
IIC Project Teams A, B, C, etc.: each with a Project Manager, discipline scientists, and IIC staff

39 Dissemination of Knowledge
Internal: Seminars/colloquia; Knowledge management system; Communities of practice
External: Publications; Scientific journals; IIC white papers
New tools; IIC process

40 Education is central to IIC’s mission At Harvard: Undergraduate & graduate courses focused on “data-intensive science” New graduate certificate program, within existing Ph.D. programs Research opportunities at undergraduate, graduate, and postdoctoral levels Beyond Harvard: New museum, highlighting the kind of science done at the IIC

41 IIC organization: research and education
Provost; Assoc Provost; Dean, Physical Sciences; IIC Director
Dir of Research; Dir of Admin & Operations; Dir of Education & Outreach
Assoc Dirs: Instrumentation; Visualization; Analysis & Simulation; Databases/Data Provenance; Distributed Computing
Project 1 (Proj Mgr 1); Project 2 (Proj Mgr 2); Project 3 (Proj Mgr 3); Etc.
CIO (systems); Knowledge mgmt; Education & Outreach staff

42 IIC organization: admin and operations
Provost; Assoc Provost; Dean, Physical Sciences; IIC Director
Dir of Research; Dir of Admin & Operations; Dir of Education & Outreach
Admin; Finance; Development; Facilities; HR
Note: admin roles expected to be played by 1-2 staff members at outset; staff will grow with overall IIC growth

43 IIC: Examples
Visualization: Physically meaningful combination of diverse data types.
Distributed Computing: e-Science aspects of large collaborations. Sharing of data and computational resources and tools in real-time.
Databases/Provenance: Management, and rapid retrieval, of data. “Research reproducibility” …where did the data come from? How?
Analysis & Simulations: Development of efficient algorithms. Cross-disciplinary comparative tools (e.g. statistical).
Instrumentation: Improved data acquisition. Novel hardware approaches (e.g. GPUs, sensors).

44 Visualization: 3D Slicer (BWH Surgical Planning Lab) IIC contacts: Michael Halle & Ron Kikinis

45 IIC contact: Felice Frankel (MIT) Work: Garstecki/Whitesides (FAS) “Image and Meaning” (Visualization)

46 Distributed Computing: Semantics, Ontologies IIC Contact: Tim Clark (HMS/MGH)

47

48 Distributed Computing & Large Databases: Large Synoptic Survey Telescope
Optimized for time domain: scan mode, deep mode
7 square degree field
6.5m effective aperture
24th mag in 20 sec
> 5 Tbyte/night
Real-time analysis
Simultaneous multiple science goals
IIC contact: Christopher Stubbs (FAS)
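For a sense of scale, the "> 5 Tbyte/night" figure can be turned into a sustained data rate with simple arithmetic. The sketch below assumes a 10-hour observing night and decimal units (1 TB = 1,000,000 MB); both are assumptions made here for illustration, and the function names are hypothetical.

```python
SECONDS_PER_HOUR = 3600
MB_PER_TB = 1_000_000  # decimal units: 1 TB = 10^6 MB


def sustained_rate_mb_s(volume_tb, hours):
    """Average rate (MB/s) needed to move volume_tb terabytes in the given hours."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return volume_tb * MB_PER_TB / (hours * SECONDS_PER_HOUR)


def accumulated_volume_tb(rate_mb_s, hours):
    """Data volume (TB) accumulated at a steady rate_mb_s over the given hours."""
    return rate_mb_s * hours * SECONDS_PER_HOUR / MB_PER_TB


if __name__ == "__main__":
    night_hours = 10  # assumed length of an observing night
    rate = sustained_rate_mb_s(5, night_hours)
    print(f"5 TB in {night_hours} h needs about {rate:.0f} MB/s sustained")
```

With these assumptions, 5 TB per night corresponds to a sustained rate of roughly 139 MB/s. A shorter night or a larger nightly volume raises the required rate proportionally.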

49 Relative optical survey power based on AΩ = 270 LSST design

50 Astronomy: LSST, SDSS, 2MASS, MACHO, DLS. High Energy Physics: BaBar, Atlas, RHIC
Columns: LSST | SDSS | 2MASS | MACHO | DLS | BaBar | Atlas | RHIC
First year of operation: 2011 | 1998 | 2001 | 1992 | 1999 | 1998 | 2007 | 1999
Run-time data rate to storage (MB/sec): 5000 Peak, 500 Avg | 8.3 | 1 | 1 | 2.7 | 60 (zero-suppressed), 6* | 540* | 120* ('03), 250* ('04)
Daily average data rate (TB/day): 20 | 0.02 | 0.016 | 0.008 | 0.012 | 0.6 | 60.0 | 3 ('03), 10 ('04)
Annual data store (TB): 2000 | 3.6 | 6 | 1 | 0.25 | 300 | 7000 | 200 ('03), 500 ('04)
Total data store capacity (TB): 20,000 (10 yrs) | 200 | 24.5 | 8 | 2 | 10,000 | 100,000 (10 yrs) | 10,000 (10 yrs)
Peak computational load (GFLOPS): 140,000 | 100 | 11 | 1.00 | 0.600 | 2,000 | 100,000 | 3,000
Average computational load (GFLOPS): 140,000 | 10 | 2 | 0.700 | 0.030 | 2,000 | 100,000 | 3,000
Data release delay acceptable: 1 day moving, 3 months static | 2 months | 6 months | 1 year | 6 hrs (trans), 1 yr (static) | 1 day (max), <1 hr (typ) | Few days | 100 days
Real-time alert of event: 30 sec; none; <1 hour; 1 hr; none (column alignment not recoverable)
Type/number of processors: TBD | 1GHz Xeon, 18 | 450MHz Sparc, 28 | 60-70MHz Sparc, 10 | 500MHz Pentium, 5 | Mixed/5000 | 20GHz/10,000 | Pentium/2500

51 Analysis & Simulations Figure based on work of Padoan, Nordlund, Juvela, et al. Excerpt from realization used in Padoan & Goodman 2002.

52 Analysis & Simulations: Neural Net Models of Intelligence Does Speed of Convergence in Neural Nets Predict Scores on Measures of “General Intelligence”? Select from the lower 8 the one that completes the pattern in the top 9 IIC contact: Stephen Kosslyn (Psychology)

53 (Easier) Analysis of Large Data Sets: Mendelian Disease Genes
OMIM on the genome [Figure: location of every known disease gene on the human genome; axes: Position (MB) vs. Chromosome]
Large data files: reformat, merge, and filter
Can a biologist get from here to there? Without programming?
IIC contact: Eitan Rubin (FAS/CGR)

54 Instrumentation IIC contact: Matt Welsh, FAS

55 IIC: Mission The Initiative in Innovative Computing (IIC) will make Harvard a world leader in the innovative and creative use of computational resources to address forefront scientific problems. We will focus on developing capabilities that are applicable to multiple disciplines, by undertaking specific, well-defined projects, thereby developing tools and approaches that can be generalized and shared. We will foster the flow of ideas and inventions along the continuum from basic science to scientific computation to computational science to computer science. We will train the next generation of creative and computationally capable scientists, build linkages to industry, and communicate with the public at large.

56

57 Why Here?
Diverse group of senior faculty and accomplished scientists…
…spanning a wide range of relevant disciplines, e.g., Computer science; Physics, Chemistry, Astronomy, Statistics, Biology, Medicine, etc.; Psychology, Graphic Design
…with backgrounds in both academia and industry…
…deeply committed to the vision of a collaborative approach to solving the most compelling computing challenges facing scientists today

58 Who are IIC’s “competitors”?
Caltech Center for Advanced Scientific Computing Research
Computation Institute at the University of Chicago
Cornell Theory Center
MIT Media Lab
Scientific Computing and Imaging Institute (University of Utah)
UK National eScience Center of the Universities of Glasgow and Edinburgh
IIC is unique in its collaborative, comprehensive, interdisciplinary approach

59 IIC will evolve over three phases
Timing: Phase I 2005-07 | Phase II 2008-10 | Phase III 2011+
IIC staffing level (combo of new faculty, senior scientists, admin staff): Total ~25 to ~100
Number of projects: ~3 to ~15
Educational mission (new courses offered, outreach programs): New courses to museum
Other key milestones: Evaluation schedule (internal, external committees)

60 Challenges In “Phase I” (Startup)
Result of “Allston” Science & Technology Task Force
IIC intended to be a “University” (not a single school) initiative
FAS Constraints: Faculty Appointments; Non-Faculty Appointments; Startup Space
“Chicken-and-Egg” Problem with Recruiting
Good, but not certain, Funding Prospects
Role of DEAS Computer Science

