Presentation is loading. Please wait.

Presentation is loading. Please wait.

E-Science and Cyberinfrastructure Tony Hey Corporate VP for Technical Computing Microsoft Corporation.

Similar presentations


Presentation on theme: "E-Science and Cyberinfrastructure Tony Hey Corporate VP for Technical Computing Microsoft Corporation."— Presentation transcript:

1 e-Science and Cyberinfrastructure Tony Hey Corporate VP for Technical Computing Microsoft Corporation

2 Licklider’s Vision “Lick had this concept – all of the stuff linked together throughout the world, that you can use a remote computer, get data from a remote computer, or use lots of computers in your job” “Lick had this concept – all of the stuff linked together throughout the world, that you can use a remote computer, get data from a remote computer, or use lots of computers in your job” Larry Roberts – Principal Architect of the ARPANET

3 What is e-Science? ‘e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it’ ‘e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it’ John Taylor Former Director General of Research Councils Former Director General of Research Councils Office of Science and Technology, UK Office of Science and Technology, UK

4 e-Science  e-Science is about data-driven, multidisciplinary science and the technologies to support such distributed, collaborative scientific research  Many areas of science are now being overwhelmed by a ‘data deluge’ from new high-throughput devices, sensor networks, satellite surveys …  Areas such as bioinformatics, genomics, drug design, engineering and healthcare require collaboration between different domain experts  ‘e-Science’ is a shorthand for a set of technologies to support collaborative networked science  HPC and Information Management are key technologies to support this e-Science revolution

5 The UK e-Science Initiative  DTI and OST Investment of over £250M over 5 years aimed at enabling the next generation of multi-disciplinary collaborative science and engineering  Major collaborative industrial program with participation from the engineering, pharmaceutical, petrochemical, media and financial sectors  Similar national e-Science programs now initiated around the world e.g. China, South Korea, Australia, Germany, Spain, Netherlands … e.g. China, South Korea, Australia, Germany, Spain, Netherlands … … and now Chile … and now Chile

6 A New Science Paradigm  Thousand years ago: Experimental Science - description of natural phenomena - description of natural phenomena  Last few hundred years: Theoretical Science - Newton’s Laws, Maxwell’s Equations … - Newton’s Laws, Maxwell’s Equations …  Last few decades: Computational Science - simulation of complex phenomena - simulation of complex phenomena  Today: e-Science or Data-centric Science - unify theory, experiment, and simulation - unify theory, experiment, and simulation - using data exploration and data mining - using data exploration and data mining Data captured by instruments Data captured by instruments Data generated by simulations Data generated by simulations Data generated by sensor networks Data generated by sensor networks  Scientist analyzes databases/files (With thanks to Jim Gray)

7 http://www.neptune.washington.edu/

8 Undersea Sensor Network Connected & Controllable Over the Internet

9 Visual Programming Persistent Distributed Storage

10 Distributed Computation Interoperability & Legacy Support via Web Services

11 Live Documents Searching & Visualization Reputation & Influence

12 Two examples of e-Science  An Astronomy DataGrid – AstroGrid and the International Virtual Observatory  Chemistry – The Comb-e-Chem Project

13 Powering the Virtual Universe www.astrogrid.ac.uk Multi-wavelength showing the jet in M87: from top to bottom – X-ray, Optical, Infra-Red and Radio

14 The Multiwavelength Crab Nebulae X-ray, optical, infrared, and radio views of the nearby Crab Nebula, which is now in a state of chaotic expansion after a supernova explosion first sighted in 1054 A.D. by Chinese Astronomers. Slide courtesy of Robert Brunner @ CalTech. Crab star 1053 AD

15 International Virtual Observatory  Data has no commercial value  No privacy concerns  Can freely share results with others  Great for experimenting with algorithms  Data is real and well documented  High-dimensional data  Spatial data  Temporal data  Data from many different instruments, places and times instruments, places and times  Federation is a key goal  There is a lot of data (petabytes) With thanks to Jim Gray IRAS 100  ROSAT ~keV DSS Optical 2MASS 2  IRAS 25  NVSS 20cm WENSS 92cm GB 6cm

16 IVO: An Astronomy Data Grid  Working to build world-wide telescope  All astronomy data and literature  online and cross indexed  Tools to analyze it  Built SkyServer.SDSS.org  Built Analysis system  MyDB  CasJobs (batch job)  OpenSkyQuery Federation of ~20 observatories.  Results:  It works and is used every day  Spatial extensions in SQL 2005  A good example of Data Grid  A good example of Web Services

17 The Comb-e-Chem Project National X-Ray Service Data Mining and Analysis Automatic Annotation Combinatorial Chemistry Wet Lab HPC Simulation Video Data Stream Diffractometer Middleware Structures Database

18 National Crystallographic Service Send sample material to NCS service Search materials database and predict properties using Grid computations Download full data on materials of interest Collaborate in e-Lab experiment and obtain structure

19 A digital lab book replacement that chemists were able to use, and liked

20 Monitoring laboratory experiments using a broker delivered over GPRS on a PDA

21 Crystallographic e-Prints Direct Access to Raw Data from scientific papers Raw data sets can be very large - stored at UK National Datastore using SRB software

22 Cyberinfrastructure  Cyberinfrastructure and e-Infrastructure  In the US, Europe and Asia there is a common vision for the ‘cyberinfrastructure’ required to support the e-Science revolution  Set of Middleware Services supported on top of high bandwidth academic research networks  Software, hardware and organizations that support e-Science  Similar to vision of the Grid as a set of services that allows scientists – and industry – to routinely set up ‘Virtual Organizations’ for their research – or business  The ‘Microsoft Grid’ vision is as much about integrating and managing data and information than about compute cycles

23 Grids for Virtual Organizations

24 Service-Orientation for building Distributed Systems

25 Web Services and Interoperability Company A (J2EE) Open Source (OMII) Company C (.NET) Web Services

26 Progress in Grid Standards?  The GGF/EGA merger gives great opportunity for the new Open Grid Forum (OGF) to standardize a small set of basic Grid services based on generally accepted Web Services  Harness the power of the world-wide Grid community to develop robust open source reference implementations  Grid research community needs to propose and explore new features in real experiments  OGF can reassure industry about progress in Grid standards and grow the market for all

27 Key Data Issues for e-Science  Networks  Lambda technology  The Data Life Cycle  From Acquisition to Preservation  Scholarly Communication  Open Access to Data and Publications

28 Computation Starlight (Chicago) Netherlight (Amsterdam) Leeds PSC SDSC UCL Network PoP Service Registry NCSA Manchester UKLight Oxford RAL US TeraGrid UK NGS Steering clients AHM 2004 Local laptops and Manchester vncserver All sites connected by production network (not all shown) An International e-Infrastructure

29 The Problem for the e-Scientist  Data ingest  Managing a petabyte  Common schema  How to organize it?  How to reorganize it?  How to coexist & cooperate with others?  Data Query and Visualization tools  Support/training  Performance  Execute queries in a minute  Batch (big) query scheduling Experiments & Instruments Simulations facts answers questions ? Literature Other Archives facts

30 The e-Science Data Life Cycle  Data Acquisition  Data Ingest  Metadata  Annotation  Provenance  Data Storage  Data Cleansing  Data Mining  Curation  Preservation

31 Scholarly Communication  Global Movement towards permitting ‘Open Access’ to scholarly publications  Libraries can no longer afford publisher subscriptions  Principle that results of publicly funded research should be available to all  Mandates for Open Access  US Proposal – Cornyn-Lieberman Bill  Supported by most top US research universities  EU Proposals  UK, France and German initiatives

32 NSF ‘Atkins’ Report on Cyberinfrastructure  ‘the primary access to the latest findings in a growing number of fields is through the Web, then through classic preprints and conferences, and lastly through refereed archival papers’  ‘archives containing hundreds or thousands of terabytes of data will be affordable and necessary for archiving scientific and engineering information’

33 Interoperable Repositories?  Paul Ginsparg’s arXiv at Cornell has demonstrated a new model of scientific publishing  Electronic version of ‘preprints’ hosted on the Web  David Lipman of the NIH National Library of Medicine has developed PubMedCentral as repository for NIH funded research papers  Microsoft funded development of ‘portable PMC’ now being deployed in UK and other countries  Stevan Harnad’s ‘self-archiving’ EPrints project in Southampton provides a basis for OAI-compliant ‘Institutional Repositories’  JISC-funded TARDis Project at Southampton is hybrid of full-text open access and links to publisher sites

34 The Service Revolution  Web 2.0  Social networks, tagging for sharing e.g. e.g. Flikr, Del.icio.us, MySpace, CiteULike, Connotea …  Wikis, Blogs, RSS, folksonomies …  Software delivered as a service  Microsoft Live services  Office Live  Xbox Live  Windows Live Academic  Mashups  Craigslist + GoogleMap  http://mashupcamp.com

35 id Combine services to give added value e-Science Mashups?

36 Technical Computing at Microsoft  Advanced Computing for Science and Engineering  Application of new algorithms, tools and technologies to scientific and engineering problems  High Performance Computing  Application of high performance clusters and database technologies to industrial and scientific applications  Radical Computing  Research in potential breakthrough technologies

37 Fighting HIV with Computer Science Nebojsa Jojic and David Heckerman  A major problem: Over 40 million infected  Drug treatments are effective but are an expensive life commitment  Vaccine needed for third world countries  Effective vaccine could eradicate disease  Methods from computer science are helping with the design of vaccine  Machine learning: Finding biological patterns that may stimulate the immune system to fight the HIV virus  Optimization methods: Compressing these patterns into a small, effective vaccine

38 Developed Set of Specialist Tools  Chromatogram deconvolution  Pathway analysis/association/causal models  Clustering/Trees (phylo, haplotypes etc.)  Protein binding and folding  Sequence diversity models (epitomes)  Image analysis/classification  Evolution modeling and inference  Epitope prediction

39 Kyril Faenov, Director of HPC Microsoft Corporation http://www.microsoft.com/hpc

40 Top 500 Architectures / Systems

41 HPC: Market Trends Capability, Enterprise $1M+ Divisional $250K-$1M Departmental $50-250K Workgroup <$50K 2004 Systems 1,167 3,915 22,712 127,802 Source: IDC, 2005 <$250K – 97% of systems, 52% of revenue In 2004 clusters grew 96% to 37% by revenue Average cluster size 10-16 nodes

42 Windows Compute Cluster Server 2003 Faster time-to-insight through simplified cluster deployment, job submission and status monitoring Better integration with existing Windows infrastructure allowing customers to leverage existing technology and skill-sets Familiar development environment allows developers to write parallel applications from within the powerful Visual Studio IDE

43 10,0001,000100101 ‘70‘80‘90‘00‘10 Power Density (W/cm 2 ) 4004 8008 8080 8085 8086 286 386 486 Pentium ® Hot Plate Nuclear Reactor Rocket Nozzle Sun’s Surface Intel Developer Forum, Spring 2004 - Pat Gelsinger

44 Radical Computing  Future of silicon chips  “100’s of cores on a chip in 2015” (Justin Rattner, Intel) (Justin Rattner, Intel)  Challenge for IT industry and Computer Science community  Can we make parallel computing on a chip easier than message-passing?  Challenge for the Scientific Community  How will the Multi-Core transition affect scientific computing?

45 Even more Radical Computing?  Quantum Computing  Multidisciplinary Institute at UCSB  Director is Michael Freedman of MSR  Looking at novel material physics and the 2- dimensional Quantum Hall effect  Exploring non-Abelian ‘Anyon’ excitations to protect coherence of qubits and quantum gates  See Scientific American April 2006 for description of ‘Project Q’

46 Multiplication versus Factoring 3490529610327681329911438162575788888766 8476509491326670954992357799761466120102 4784961990961988190818296721242362582561 3898133417 X3446141317 =84293570693524573389 7646384933764298799278305971235639587050 8784399082942539798258989075147599290026 057788533879543541 Prime Factors of the 129-digit number known as RSA-129 Multiplication is classically computationally ‘easy’ but factorization is computationally ‘hard’

47 Peter Shor developed new factorization algorithm for a quantum computer

48 Summary Microsoft wishes to work with the university research and library communities to: Microsoft wishes to work with the university research and library communities to: develop interoperable high-level services, work flows, tools and data services develop interoperable high-level services, work flows, tools and data services accelerate progress in a small number of societally important scientific applications accelerate progress in a small number of societally important scientific applications assist in the development of interoperable repositories and new models of scholarly publishing assist in the development of interoperable repositories and new models of scholarly publishing explore radical new directions in computing and ways and applications to exploit on-chip parallelism explore radical new directions in computing and ways and applications to exploit on-chip parallelism  How can Microsoft best collaborate with the scientific community?

49 © 2005 Microsoft Corporation. All rights reserved. This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary.


Download ppt "E-Science and Cyberinfrastructure Tony Hey Corporate VP for Technical Computing Microsoft Corporation."

Similar presentations


Ads by Google