Ian Foster Computation Institute Argonne National Lab & University of Chicago Why Computer Science is Fundamental to Almost Everything.

Presentation transcript:

Ian Foster Computation Institute Argonne National Lab & University of Chicago Why Computer Science is Fundamental to Almost Everything

2 “Applied computer science is now playing the role that mathematics did from the 17th through the 20th centuries: providing an orderly, formal framework & exploratory apparatus for other sciences.” —George Djorgovski

3 The Big Questions
- Nature of the universe
- Consciousness
- Life & death
- Future of the planet

4 The Little Questions
- Friends
- Sales
- Entertainment
- Spelling
- Parking

5 How Do We Answer Them? Simulation Data Empirical < Theory

6 Type Ia Supernova: SN 1994D

7

8 Type Ia Supernova Explosion: Gravitationally Confined Detonation (Calder, Plewa, Vladimirova, Lamb, and Truran, 2004)

9 IBM BG/L Computer

10 Challenges Include …
- Multi-scale, multi-physics modeling
  - Adaptive mesh refinement
  - Component architectures
- Scaling to 100K+ processors
  - Scalable parallel libraries
  - Parallel operating systems
- Understanding & validating results
  - Visualization, data mining
  - Quantifying uncertainty

11 How Much Data?
- In 2006:
  - The world created 161 exabytes (1.6 x 10^20 bytes) of digital data
  - There were one billion devices able to capture digital images
- By 2010:
  - Annual data output will reach one zettabyte (1 x 10^21 bytes)
Source: IDC, 2007
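The data volumes quoted on this slide can be checked with a few lines of arithmetic. The sketch below assumes decimal SI prefixes (1 exabyte = 10^18 bytes, 1 zettabyte = 10^21 bytes); the variable names are illustrative and not part of the talk.

```python
# Decimal SI prefixes are assumed here; binary prefixes would give slightly different values.
EXABYTE = 10**18
ZETTABYTE = 10**21

data_2006_bytes = 161 * EXABYTE   # 2006 figure quoted on the slide (IDC, 2007)
data_2010_bytes = 1 * ZETTABYTE   # projected annual output by 2010, also from the slide

# Ratio between the 2010 projection and the 2006 figure.
growth_factor = data_2010_bytes / data_2006_bytes

print(f"2006: {data_2006_bytes:.2e} bytes")
print(f"2010: {data_2010_bytes:.2e} bytes")
print(f"2010 projection is about {growth_factor:.1f} times the 2006 figure")
```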

12 Information Big Bang

13 A Data Deluge

14 A Brain is a Lot of Data! (Mark Ellisman, UCSD)
We need to get to one micron to know the location of every cell. We are starting to get to 10 microns. And comparisons must be made among many.

15 Images courtesy Mark Ellisman, UCSD

16 Images courtesy Mark Ellisman, UCSD

17 Images courtesy Mark Ellisman, UCSD

18 Images courtesy Mark Ellisman, UCSD

19 Understanding increases far more slowly
- Methodological bottlenecks? -> Improved technology
- Human limitations? -> AI-assisted discovery

20 The Problem
- Data ingest
- Managing a petabyte
- Common schema
- How to organize it?
- How to reorganize it?
- Data query & visualization
- Support/training
- Performance: interactivity, scale in data size, analysis complexity, demand
[Diagram labels: Experiments, Instruments; Simulations; Literature; Other Archives; facts; questions; answers]
Jim Gray & Alex Szalay

21 Evidence Integration: Genetics & Disease Susceptibility
[Diagram labels: Identify Genes; Phenotype 1; Phenotype 2; Phenotype 3; Phenotype 4; Predictive Disease Susceptibility; Physiology; Metabolism; Endocrine; Proteome; Immune; Transcriptome; Biomarker Signatures; Morphometrics; Pharmacokinetics; Ethnicity; Environment; Age; Gender]
Source: Terry Magnuson

22 GeneWays as an Info-Grinder
[Diagram labels: On-line Journals; GeneWays; Pathways]
We are currently screening 250,000 journal articles… 2.5M reasoning chains, 4M statements
Andrey Rzhetsky et al., U.Chicago

23 Data Analysis gets Fuzzy
- Global statistics?
  - Correlation functions: N^2
  - Likelihood techniques: N^3
- Best we can do is N or maybe N log N (scale approximate)
Haystack: Jim Gray/Alex Szalay
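To make the N^2 versus N log N contrast concrete, here is a minimal one-dimensional sketch, not taken from the talk: it counts pairs of points that lie within a distance r of each other, the basic operation behind a two-point correlation function. The function names are hypothetical. In this 1-D toy the faster method is still exact; in higher dimensions one typically turns to tree-based or approximate methods to get similar scaling.

```python
import bisect
import random


def count_pairs_naive(xs, r):
    """O(N^2): compare every pair of points directly."""
    n = len(xs)
    return sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if abs(xs[i] - xs[j]) <= r
    )


def count_pairs_sorted(xs, r):
    """O(N log N): sort once, then binary-search the right edge for each point."""
    s = sorted(xs)
    total = 0
    for i, x in enumerate(s):
        # Index just past the last point whose value is <= x + r.
        hi = bisect.bisect_right(s, x + r)
        # Points at indices i+1 .. hi-1 are within r of x (counted once per pair).
        total += hi - i - 1
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    points = [rng.randrange(0, 10_000) for _ in range(2_000)]
    # Both methods should report the same pair count.
    print(count_pairs_naive(points, 3), count_pairs_sorted(points, 3))
```

Integer positions are used in the demo so that both functions agree exactly; with floating-point data the two comparisons can differ at the boundary by rounding.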

24

25 Growth of Sequences and Annotations since 1982 Folker Meyer, Genome Sequencing vs. Moore’s Law: Cyber Challenges for the Next Decade, CTWatch, August 2006.

26 Production Science: Biology
- Public PUMA Knowledge Base: information about proteins analyzed against ~2 million gene sequences
- Back Office Analysis on Grid: millions of BLAST, BLOCKS, etc., on OSG and TeraGrid
Natalia Maltsev et al.,

27 Genome Analysis & DB Update (GADU) CPUs

28

29 Integrated View of Simulation, Experiment, & Bioinformatics
[Diagram labels: Problem Specification; Simulation; Browsing & Visualization; SIMS*; Database; Analysis Tools; LIMS+; Experimental Design; Experiment; Browsing & Visualization]
* SIMS: Simulation Information Management System
+ LIMS: Laboratory Information Management System

30 eScience
Computational science + Informatics = eScience [John Taylor, UK EPSRC]
- “Large-scale science carried out through distributed collaborations – often leveraging access to large-scale data & computing”

31 Seismic Hazard Analysis
Defn: Max. intensity of shaking expected at a site during a fixed time interval
Example: National seismic hazard maps
- Intensity measure: peak ground acceleration
- Interval: 50 yrs
- Probability of exceedance: 2%
T. Jordan et al., Southern California Earthquake Center
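The "2% probability of exceedance in 50 years" convention can be translated into an annual rate and a return period. The sketch below assumes exceedances follow a Poisson process, so that P = 1 - exp(-rate x T); that model is a common convention in hazard analysis but is an assumption here, not something stated on the slide, and the function names are illustrative.

```python
import math


def annual_exceedance_rate(prob, interval_years):
    """Annual rate implied by an exceedance probability over an interval.

    Assumes a Poisson process: prob = 1 - exp(-rate * interval_years).
    """
    return -math.log(1.0 - prob) / interval_years


def return_period(prob, interval_years):
    """Mean time between exceedances, in years (inverse of the annual rate)."""
    return 1.0 / annual_exceedance_rate(prob, interval_years)


if __name__ == "__main__":
    rate = annual_exceedance_rate(0.02, 50)
    print(f"annual exceedance rate: {rate:.6f} per year")
    print(f"return period: {return_period(0.02, 50):.0f} years")
```

Under this assumption, the map values correspond to shaking levels exceeded on average roughly once every 2,500 years.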

32 Seismic Hazard Analysis
[Diagram labels: Seismic Hazard Model; Seismicity; Paleoseismology; Local site effects; Geologic structure; Faults; Stress transfer; Crustal motion; Crustal deformation; Seismic velocity structure; Rupture dynamics]
T. Jordan et al., Southern California Earthquake Center

33 SHA Computational Pathways
1 Standardized Seismic Hazard Analysis
2 Ground motion simulation
3 Physics-based earthquake forecasting
4 Ground-motion inverse problem
[Diagram labels: Earthquake Forecast Model; Attenuation Relationship; Intensity Measures; Unified Structural Representation (Faults, Motions, Stresses, Anelastic model); AWM; SRM; Ground Motions; FSM; RDM; Invert; Other Data (Geology, Geodesy)]
AWP = Anelastic Wave Propagation; SRM = Site Response Model; FSM = Fault System Model; RDM = Rupture Dynamics Model
Legend: Physics-based simulations; Empirical relationships; Improvement of models

34 Access to National Cyberinfrastructure
Sites: SDSC, USC, SCEC, PSC, TeraGrid, ISI
CPU counts shown: 12 CPUs, 1,700 CPUs, 1,200 CPUs, 1 CPU, 4 CPUs
- Prepare input to Pathway2 wave propagation code
- Pathway2PGV converts output into hazard map
- Map is visualized

35 CERN Users
Europe: 4603 users
Elsewhere: 1632 users

36 Cancer Biology

37 Bennett Bertenthal et al.,

38 James Evans, U.Chicago Arabidopsis articles

39 eScience Challenges
- Simulate complex, multi-component systems
- Evaluate accuracy of such simulations
- Integrate evidence to draw conclusions
- Evaluate strength of conclusions
- Automate “experimental” workflows
- Document basis for conclusions (provenance)
- Allow these problems to be tackled by distributed teams using federated resources

40 What is Fundamental? Bits + Algorithms + Complex systems First two, at least © CS

41 Computer Science: A Narrow or Broad View?
- Narrow
  - CS is programming
    -> Other aspects of information are the domain of “statistics,” “bioinformatics,” etc.
- Broad
  - CS is “the systematic study of algorithmic processes that describe and transform information, their theory, analysis, design, efficiency, implementation, and application” (Denning et al., CACM, 1989)
    -> Statistics & bioinformatics are subdisciplines of computer science

42 Pasteur’s Quadrant Research Model
Classic Linear Research Model: Creation of knowledge (basic, curiosity-driven research) -> Application of knowledge
Quadrant axes: Focus on New Knowledge Creation? (No/Yes); Focus on Application? (No/Yes)
Quadrant labels: Bohr, Pasteur, Edison
Effective eScience requires PQ research models
Slide courtesy Dan Atkins, U.Michigan

43 “Applied computer science is now playing the role that mathematics did from the 17th through the 20th centuries: providing an orderly, formal framework & exploratory apparatus for other sciences.” —George Djorgovski “… the branch of computer science that concerns itself with the application of computing knowledge to other domains”?

44 Computation Institute
A joint institute of Argonne and the University of Chicago, focused on advancing system-level science.
Solutions to many grand challenges facing science and society today are dependent upon the analysis and understanding of entire systems, not just individual components. They require not reductionist approaches but the synthesis of knowledge from multiple levels of a system, whether biological, physical, or social (or all three).

45 Thanks!

46 In Memoriam: Jim Gray ( ?) Turing Award, 1998 “for seminal contributions to database & transaction processing and technical leadership in system implementation”