Opportunities and Challenges in e_Science. Fabrizio Gagliardi & Carlos Hulot, Microsoft Corporation.


1 Opportunities and Challenges in e_Science. Fabrizio Gagliardi & Carlos Hulot, Microsoft Corporation

2 Outline: Introductory remarks; Reviewing the emergence of e_Science (the intensive computing side, the massive data side); The opportunity of e_Science; The challenges of e_Science; A Microsoft contribution; Conclusions

3 Introductory remarks. Who am I? A computer scientist who has spent 30 years at CERN (and in other scientific laboratories) developing HPC systems for physics and other sciences. Started in real-time, data acquisition and networking. Pioneered ES, AI, MPP systems, cluster computing and, in the last 7 years, Grid computing. Initiator of EU-DataGrid, EGEE and more than 10 other HPC and Grid projects (mostly within the EU IST programmes). Co-founder of the Global Grid Forum (started in Amsterdam in 2001 together with EU-DataGrid). See my most recent article in IEEE Spectrum Magazine (July 2006).

4 Introductory remarks 2. Joined Microsoft on 1 November 2005. My mission: promoting Microsoft Computing into Science and Science into Microsoft Computing by exploring and building important collaborations with science in Europe, the Middle East, Africa and Latin America. Director in the Technical Computing team led by Tony Hey (Corporate VP).

5 A New Science Paradigm. A thousand years ago: Experimental Science (description of natural phenomena). Last few hundred years: Theoretical Science (Newton’s Laws, Maxwell’s Equations …). Last few decades: Computational Science (simulation of complex phenomena). Today: e-Science or Data-centric Science (unify theory, experiment, and simulation, using massive computing and large data exploration and mining): data captured by instruments, data generated by simulations, data generated by sensor networks. Scientists mostly work on computers. (With thanks to Jim Gray)

6 Life Sciences Multidisciplinary Research New Materials, Technologies & Processes Math and Physical Science Social Sciences Earth Sciences Computer & Information Sciences Accelerating Discovery

7 CERN LHC: 40 million particle collisions every second, reduced by online computers to a few hundred “good” events per second, which are recorded on disk and magnetic tape at 100-1,000 MB/sec. That is ~15 PB per year for all four experiments.
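As a rough consistency check on the figures in this slide, the sketch below computes the online reduction factor and converts ~15 petabytes per year into an average sustained rate. It assumes decimal units (1 PB = 10^15 bytes), a full calendar year of continuous recording, and takes "a few hundred" to mean 200 events per second. None of these assumptions comes from the slide.

```python
# Back-of-the-envelope check of the LHC figures quoted on the slide.
COLLISIONS_PER_SEC = 40_000_000      # from the slide
GOOD_EVENTS_PER_SEC = 200            # assumption: "a few hundred" taken as 200
PETABYTES_PER_YEAR = 15              # from the slide, all four experiments
SECONDS_PER_YEAR = 365 * 24 * 3600   # assumption: recording runs all year


def reduction_factor(collisions_per_sec, kept_per_sec):
    """How many collisions are discarded online for each event kept."""
    return collisions_per_sec / kept_per_sec


def average_rate_mb_per_sec(petabytes_per_year):
    """Average sustained recording rate in MB/s (decimal units)."""
    bytes_per_year = petabytes_per_year * 10**15
    return bytes_per_year / SECONDS_PER_YEAR / 10**6


factor = reduction_factor(COLLISIONS_PER_SEC, GOOD_EVENTS_PER_SEC)
rate = average_rate_mb_per_sec(PETABYTES_PER_YEAR)
print(f"reduction factor: {factor:,.0f} to 1")
print(f"average rate: {rate:.0f} MB/s")
```

Under these assumptions the yearly volume corresponds to an average rate inside the quoted 100-1,000 MB/sec range, so the two figures on the slide are mutually consistent.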

8 Technology evolution has helped…
System | Cray Y-MP C916 | Sun HPC10000 | Small Form Factor PCs
Architecture | 16 x Vector, 4GB, Bus | 24 x 333MHz UltraSPARC II, 24GB, SBus | 4 x 2.2GHz Athlon64, 4GB, GigE
OS | UNICOS | Solaris 2.5.1 | Windows Server 2003 SP1
GFlops | ~10 (shared across the three systems)
Top500 # | 1 | 500 | N/A
Price | $40,000,000 | $1,000,000 (40x drop) | < $4,000 (250x drop)
Customers | Government Labs | Large Enterprises | Every Engineer & Scientist
Applications | Classified, Climate, Physics Research | Manufacturing, Energy, Finance, Telecom | Bioinformatics, Materials Sciences, Digital Media
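The "40x drop" and "250x drop" labels on this slide can be reproduced from the three quoted prices. The sketch below is only that arithmetic. The dictionary and function names are illustrative, and the last price is quoted on the slide as an upper bound ("< $4,000"), so the final ratio is a lower bound on the real drop.

```python
# Prices as quoted on the slide, in US dollars, in the slide's column order.
PRICES_USD = {
    "Cray Y-MP C916": 40_000_000,
    "Sun HPC10000": 1_000_000,
    "Small Form Factor PCs": 4_000,   # slide says "< $4,000"
}


def price_drops(prices):
    """Ratio between each system's price and the next one in the table."""
    names = list(prices)
    return {
        f"{older} -> {newer}": prices[older] / prices[newer]
        for older, newer in zip(names, names[1:])
    }


def overall_drop(prices):
    """Ratio between the first and the last price in the table."""
    values = list(prices.values())
    return values[0] / values[-1]


for step, ratio in price_drops(PRICES_USD).items():
    print(f"{step}: {ratio:.0f}x")
print(f"overall: {overall_drop(PRICES_USD):.0f}x")
```

Taken together, the two steps amount to a price drop of at least four orders of magnitude between the first and the last column.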

9 Top 500 Architectures / Systems

10 Enabling Grids for E-sciencE INFSO-RI. LCG depends on two major science Grid infrastructures (plus regional Grids): EGEE (Enabling Grids for E-Science) and OSG (US Open Science Grid). High Energy Physics (LCG). Scale (June 2006): ~ 200 sites in 40 countries; ~ CPUs; > 10 PB storage; > jobs per day; > 100 Virtual Organizations

11 Enabling Grids for E-sciencE EGEE-II INFSO-RI. Grids in Biomedical Sciences. A multiplication of projects around the world (example: the National Bioinformatics Initiative in Holland). The example of EGEE: more than 20 applications in medical imaging, bioinformatics and drug discovery; large-scale deployment of in silico drug discovery initiatives. [Figure: impact of mutations on drug efficiency against H5N1; energy statistics for T01 and T01 (E119A), docking energy and binding energy in kcal/mol against compound numbers.] In silico docking on malaria on 5 grid infrastructures is breaking the world record for in silico docking throughput.

12 Future ITER Fusion reactor. Applications with distributed calculations: Monte Carlo, separate estimates, … Multiple ray tracing: e.g. TRUBA. Stellarator optimization: VMEC. Transport and kinetic theory: Monte Carlo codes.
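Monte Carlo calculations distribute well because each batch of samples is independent and only the final estimates need to be combined. The sketch below illustrates that pattern with a toy problem (estimating pi), not with any of the fusion codes named on the slide. The function names, seeds and sample counts are all illustrative assumptions, and the batches run one after another here, where a grid would dispatch each to a different node.

```python
import random


def estimate_pi_chunk(seed, samples):
    """One independent work unit: a private random stream and its own estimate."""
    rng = random.Random(seed)          # separate seed per chunk, no shared state
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:       # point falls inside the quarter circle
            hits += 1
    return 4.0 * hits / samples


def combine(estimates):
    """Average the separate estimates (valid because chunks are equal-sized)."""
    return sum(estimates) / len(estimates)


def run_all(num_chunks=8, samples_per_chunk=20_000):
    # Each call below could run on a different machine; only the
    # (seed, samples) pair goes out and a single float comes back.
    estimates = [estimate_pi_chunk(seed, samples_per_chunk)
                 for seed in range(num_chunks)]
    return combine(estimates)


print(f"combined estimate of pi: {run_all():.4f}")
```

Because the work units share nothing, adding nodes increases throughput almost linearly, which is why this class of application suits a grid.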

13 The data deluge. e_Science is now dominated by huge amounts of data. Many discoveries are hidden in those data, but… How to organize, mine and understand the data? How to address these issues in a scientist-friendly environment? This is where commodity computing tools developed by Microsoft for business and industry could help…

14 Data, Data, Data. Courtesy of Carole Goble

15 Let's put it in context… “Six weeks in the laboratory can save you six minutes at the computer” (Jeremy Zucker, Tom Knight). Courtesy of Carole Goble

16 Courtesy of Carole Goble

17 The opportunity in e_Science. Replacing experimental activity (or part of it) with computer simulation and modelling based on large distributed computing infrastructures is what is now called e_Science. Allowing the sharing of resources, not only computing but also data and people’s knowledge, is what motivated the emergence of grid computing and the establishment of international virtual organisations, which replace local resident scientists. This is a major paradigm shift which requires scientists to become expert in complex computing methods.

18 The challenges (still) in e_Science. The applied scientist is obliged to become a computer scientist as well. Far too much time is spent developing often over-engineered computing solutions, distracting applied scientists from their primary mission. This has shifted the conventional scientific computing paradigm and could limit scientific discovery in the future and produce major setbacks.

19 The Problem for the e-Scientist. Data ingest. Managing petabytes. Common schemas. How to organize it? How to reorganize it? How to coexist & cooperate with others? Data query and visualization tools. Support/training. Performance: execute queries in a minute; batch (big) query scheduling. [Diagram: facts come from Experiments & Instruments, Simulations, Literature and Other Archives; questions go in and answers come out.]

20 Can “Here and Now” technologies accelerate discovery? Can “business” tools and techniques for dealing with be used in scientific research to allow researchers to be scientists and not computer scientists…

21 Computational Modeling; Real-world Data; Interpretation & Insight; Persistent Distributed Data; Workflow, Data Mining & Algorithms

22 Computational Modeling; Real-world Data; Interpretation & Insight; Persistent Distributed Data; Workflow, Data Mining & Algorithms

23 Conclusion. We need to make computing easier to use, so that scientists can concentrate their energy on their science rather than on the computing tools. Only in this way will e_Science succeed in accelerating discovery and producing new breakthroughs. Microsoft is making its first significant contributions: a contribution to Grid standards (OGF HPC profile) and its first HPC cluster product, MS CCS.

24 Windows Compute Cluster Server 2003. Launched in June 2006!

25 Microsoft Compute Cluster Server. Vision: a solution for applications with compute-intensive tasks; helps them scale using a cluster of computers. Mission statement: empowering end users by allowing them to easily harness distributed computing resources to solve complex problems. Platform: based on Windows Server 2003 SP1 64-bit Edition; support for Ethernet, InfiniBand and others (better than Winsock Direct). Administration: simplified setup and administration; administration based on images + scripts; security based on Active Directory; job scheduling and resource administration. Development: cluster scheduler via .NET and DCOM; MPI2 stack with better performance and security for parallel applications; Visual Studio 2005 (OpenMP, Parallel Debugger).

26 Topology of WCCS

27 Communication Components. Computers in a cluster can be connected in one of six communication topologies: Star, Crossbar, Ring, 2D Hypercube, Fully Connected, Mesh / Grid.
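To make the difference between these topologies concrete, the sketch below builds the set of links for three of them (star, ring and fully connected) and counts how many links each needs for the same number of nodes. This is a generic graph illustration with hypothetical function names. It does not describe how WCCS itself represents or configures a topology.

```python
# Link sets for three of the topologies named on the slide.
# Nodes are numbered 0..n-1; each link is an (a, b) pair with a < b.

def star(n):
    """Node 0 is the hub; every other node links only to it."""
    return {(0, i) for i in range(1, n)}


def ring(n):
    """Each node links to its successor, wrapping around at the end."""
    return {tuple(sorted((i, (i + 1) % n))) for i in range(n)}


def fully_connected(n):
    """Every pair of distinct nodes has a direct link."""
    return {(a, b) for a in range(n) for b in range(a + 1, n)}


def link_counts(n):
    """Number of links each topology needs for n nodes."""
    return {
        "star": len(star(n)),
        "ring": len(ring(n)),
        "fully connected": len(fully_connected(n)),
    }


print(link_counts(8))   # {'star': 7, 'ring': 8, 'fully connected': 28}
```

The counts show the usual trade-off: a star or ring grows linearly with the number of nodes, while a fully connected layout grows quadratically in exchange for single-hop communication between any two nodes.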

28 Some Details about Security. Permissions on files and folders on the file server that is connected to both the head nodes and the compute nodes. Secure movement of files between personal computers and the secure file server. Authentication of users on compute nodes so that jobs can be run remotely on these computers. User management. Human and programming interfaces. Program run levels: user level, kernel, admin mode. Dynamic access to resources.

29 WCCS Components: Head Node; Compute Node; Job Scheduler; Management Infrastructure; Compute Cluster Administrator and Job Manager; Command Line Interface.

30 Installing and Configuring Head Node

31 Installing and Configuring Head Node: Configuring the Cluster

32 Installing and Configuring Head Node: Selecting Network Topology

33 Services on Nodes. Head Node: Compute Cluster Management Service; Compute Cluster Scheduler Service; Compute Cluster SDM Store Service; Compute Cluster MPI Service; Compute Cluster Node Manager Service. Compute Nodes: Compute Cluster Management Service; Compute Cluster MPI Service; Compute Cluster Node Manager Service.

34 Cluster Control

35 Run a sample code on the Cluster

36 Management of WCCS: Remote Desktop Sessions

37 Management of WCCS: System Monitor. This page displays performance monitoring data for the cluster.

38 Job Activation: state transition during job execution on compute node

39 Job life cycle in WCCS
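The transcript does not preserve the life-cycle diagram, so the sketch below only illustrates the general idea of a job moving through a fixed set of states with restricted transitions. The state names and the allowed transitions are illustrative assumptions, not taken from the slide or from WCCS documentation, and the `Job` class is hypothetical.

```python
# Hypothetical job life cycle: state names and transitions are assumptions.
ALLOWED_TRANSITIONS = {
    "Queued": {"Running", "Cancelled"},
    "Running": {"Finished", "Failed", "Cancelled"},
    "Finished": set(),    # terminal
    "Failed": set(),      # terminal
    "Cancelled": set(),   # terminal
}


class Job:
    """Tracks the current state of one job and the path it took."""

    def __init__(self, name):
        self.name = name
        self.state = "Queued"
        self.history = ["Queued"]

    def transition(self, new_state):
        """Move to new_state, rejecting moves the table does not allow."""
        if new_state not in ALLOWED_TRANSITIONS[self.state]:
            raise ValueError(
                f"{self.name}: cannot go from {self.state} to {new_state}"
            )
        self.state = new_state
        self.history.append(new_state)

    def is_terminal(self):
        return not ALLOWED_TRANSITIONS[self.state]


job = Job("sample")
job.transition("Running")
job.transition("Finished")
print(" -> ".join(job.history))
```

Keeping the allowed transitions in a single table makes it easy to see which states are terminal and to reject an invalid move, such as restarting a job that has already finished.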

40 Create a new Job

41 Windows Compute Cluster Server 2003: Developing using Visual Studio 2005

42 Microsoft Academic Programs. WCCS 2003: access for academia, free for non-commercial use.

43 Windows Compute Cluster Server 2003. Thank you! Carlos Hulot, New Technologies & Platform Manager, Microsoft Brasil. Microsoft HPC website. Public newsgroup: nntp://microsoft.public.windows.hpc. Academic Community | Brazil

