Presentation is loading. Please wait.

Presentation is loading. Please wait.

E-Science and Cyberinfrastructure: A Middleware Perspective Tony Hey Corporate VP for Technical Computing Microsoft Corporation.

Similar presentations


Presentation on theme: "E-Science and Cyberinfrastructure: A Middleware Perspective Tony Hey Corporate VP for Technical Computing Microsoft Corporation."— Presentation transcript:

1 e-Science and Cyberinfrastructure: A Middleware Perspective Tony Hey Corporate VP for Technical Computing Microsoft Corporation

2 Lickliders Vision Lick had this concept – all of the stuff linked together throughout the world, that you can use a remote computer, get data from a remote computer, or use lots of computers in your job Lick had this concept – all of the stuff linked together throughout the world, that you can use a remote computer, get data from a remote computer, or use lots of computers in your job Larry Roberts – Principal Architect of the ARPANET

3 The e-Science Vision e-Science is about multidisciplinary science and the technologies to support such distributed, collaborative scientific research e-Science is about multidisciplinary science and the technologies to support such distributed, collaborative scientific research Many areas of science are now being overwhelmed by a data deluge from new high-throughput devices, sensor networks, satellite surveys … Many areas of science are now being overwhelmed by a data deluge from new high-throughput devices, sensor networks, satellite surveys … Areas such as bioinformatics, genomics, drug design, engineering and healthcare require collaboration between different domain experts Areas such as bioinformatics, genomics, drug design, engineering and healthcare require collaboration between different domain experts e-Science is a shorthand for a set of technologies to support collaborative networked science e-Science is a shorthand for a set of technologies to support collaborative networked science

4 http://www.neptune.washington.edu/

5 Undersea Sensor Network Connected & Controllable Over the Internet

6 Visual Programming Persistent Distributed Storage

7 Distributed Computation Interoperability & Legacy Support via Web Services

8 Live Documents Searching & Visualization Reputation & Influence

9 Cyberinfrastructure Cyberinfrastructure and e-Infrastructure Cyberinfrastructure and e-Infrastructure In the US, Europe and Asia there is a common vision for the cyberinfrastructure required to support the e-Science revolution In the US, Europe and Asia there is a common vision for the cyberinfrastructure required to support the e-Science revolution Set of Middleware Services supported on top of high bandwidth academic research networks Set of Middleware Services supported on top of high bandwidth academic research networks Software, hardware and organizations to support e- Science Software, hardware and organizations to support e- Science Similar to vision of the Grid as a set of services that allows scientists – and industry – to routinely set up Virtual Organizations for their research – or business Similar to vision of the Grid as a set of services that allows scientists – and industry – to routinely set up Virtual Organizations for their research – or business The Microsoft Grid vision is as much about integrating and managing data and information than about compute cycles The Microsoft Grid vision is as much about integrating and managing data and information than about compute cycles

10 Technical Computing at Microsoft Advanced Computing for Science and Engineering Advanced Computing for Science and Engineering Application of new algorithms, tools and technologies to scientific and engineering problems Application of new algorithms, tools and technologies to scientific and engineering problems High Performance Computing High Performance Computing Application of high performance clusters and database technologies to industrial and scientific applications Application of high performance clusters and database technologies to industrial and scientific applications Radical Computing Radical Computing Research in potential breakthrough technologies Research in potential breakthrough technologies

11 Fighting HIV with Computer Science Nebojsa Jojic and David Heckerman A major problem: Over 40 million infected A major problem: Over 40 million infected Drug treatments are effective but are an expensive life commitment Drug treatments are effective but are an expensive life commitment Vaccine needed for third world countries Vaccine needed for third world countries Effective vaccine could eradicate disease Effective vaccine could eradicate disease Methods from computer science are helping with the design of vaccine Methods from computer science are helping with the design of vaccine Machine learning: Finding biological patterns that may stimulate the immune system to fight the HIV virus Machine learning: Finding biological patterns that may stimulate the immune system to fight the HIV virus Optimization methods: Compressing these patterns into a small, effective vaccine Optimization methods: Compressing these patterns into a small, effective vaccine

12 Developed Set of Specialist Tools Chromatogram deconvolution Chromatogram deconvolution Pathway analysis/association/causal models Pathway analysis/association/causal models Clustering/Trees (phylo, haplotypes etc.) Clustering/Trees (phylo, haplotypes etc.) Protein binding and folding Protein binding and folding Sequence diversity models (epitomes) Sequence diversity models (epitomes) Image analysis/classification Image analysis/classification Evolution modeling and inference Evolution modeling and inference Epitope prediction Epitope prediction

13 HIV: The diabolical virus The train-and-kill mechanism doesnt work for HIV – the virus adapts through rapid mutation. As soon as the killer cells get the upper hand, the epitopes start changing. The train-and-kill mechanism doesnt work for HIV – the virus adapts through rapid mutation. As soon as the killer cells get the upper hand, the epitopes start changing.Strategy: Find peptides or epitopes that occur commonly across a *population* of HIV viruses Find peptides or epitopes that occur commonly across a *population* of HIV viruses Compact the known or potential immune targets into a small vaccine Compact the known or potential immune targets into a small vaccine

14 International Virtual Observatory Data has no commercial value Data has no commercial value No privacy concerns No privacy concerns Can freely share results with others Can freely share results with others Great for experimenting with algorithms Great for experimenting with algorithms Data is real and well documented Data is real and well documented High-dimensional data High-dimensional data Spatial data Spatial data Temporal data Temporal data Data from many different Data from many different instruments, places and times instruments, places and times Federation is a key goal Federation is a key goal There is a lot of data (petabytes) There is a lot of data (petabytes) With thanks to Jim Gray IRAS 100 ROSAT ~keV DSS Optical 2MASS 2 IRAS 25 NVSS 20cm WENSS 92cm GB 6cm

15 The Multiwavelength Crab Nebulae X-ray, optical, infrared, and radio views of the nearby Crab Nebula, which is now in a state of chaotic expansion after a supernova explosion first sighted in 1054 A.D. by Chinese Astronomers. Slide courtesy of Robert Brunner @ CalTech. Crab star 1053 AD

16 SkyServer (http://cas.sdss.org) A modern archive A modern archive Access to Sloan Digital Sky Survey Spectroscopic and Optical surveys Access to Sloan Digital Sky Survey Spectroscopic and Optical surveys Raw Pixel data lives in file servers Raw Pixel data lives in file servers Catalog data (derived objects) lives in Database Catalog data (derived objects) lives in Database Online query to any and all Online query to any and all Interesting things Interesting things Spatial data search Spatial data search Query interface via Java Applet Query interface via Java Applet Query from Emacs, Python, …. Query from Emacs, Python, …. Template design cloned by other Template design cloned by other surveys surveys Web Services are core of it Web Services are core of it

17 SkyQuery ( http://skyquery.net/) http://skyquery.net/ Distributed Query tool using a set of Web Services Distributed Query tool using a set of Web Services Federates many astronomy archives from Pasadena, Chicago, Baltimore, Cambridge UK Federates many astronomy archives from Pasadena, Chicago, Baltimore, Cambridge UK Grown from 4 to 15 archives,becoming international standard Grown from 4 to 15 archives,becoming international standard WebService Poster Child WebService Poster Child Allows queries like: Allows queries like: SELECT o.objId, o.r, o.type, t.objId FROM SDSS:PhotoPrimary o, TWOMASS:PhotoPrimary t WHERE XMATCH(o,t)<3.5 AND AREA(181.3,-0.76,6.5) AND o.type=3 and (o.I - t.m_j)>2

18 IVO: An Astronomy Data Grid Working to build world-wide telescope Working to build world-wide telescope All astronomy data and literature All astronomy data and literature online and cross indexed online and cross indexed Tools to analyze it Tools to analyze it Built SkyServer.SDSS.org Built SkyServer.SDSS.org Built Analysis system Built Analysis system MyDB MyDB CasJobs (batch job) CasJobs (batch job) OpenSkyQuery Federation of ~20 observatories. OpenSkyQuery Federation of ~20 observatories. Results: Results: It works and is used every day It works and is used every day Spatial extensions in SQL 2005 Spatial extensions in SQL 2005 A good example of Data Grid A good example of Data Grid A good example of Web Services A good example of Web Services

19 HPC: Top 500 Trends Industry usage rising Clusters over 50% x86 is winning GigE is gaining

20 HPC: Market Trends Capability, Enterprise $1M+ Divisional $250K-$1M Departmental $50-250K Workgroup <$50K 2004 Systems 1,167 3,915 22,712 127,802 Source: IDC, 2005 <$250K – 97% of systems, 52% of revenue In 2004 clusters grew 96% to 37% by revenue Average cluster size 10-16 nodes

21 Continuing Trend Towards Decentralized, Networked Resources Grids of personal & departmental clusters Personal workstations & departmental servers Minicomputers Mainframes

22 Microsoft Strategy for HPC Reduce barriers to adoption for HPC clusters Reduce barriers to adoption for HPC clusters Easy to deploy, manage and use Easy to deploy, manage and use Provide application support in key HPC verticals Provide application support in key HPC verticals Engagement with the top HPC ISVs Engagement with the top HPC ISVs Leverage a breadth of standard tools Leverage a breadth of standard tools Web Services, SQL, Sharepoint, Infopath, Excel Web Services, SQL, Sharepoint, Infopath, Excel High Volume Market High Volume Market Enable broad HPC adoption Enable broad HPC adoption

23 Todays CPU Architecture Heat becoming an unmanageable problem 10,0001,000100101 7080900010 Power Density (W/cm 2 ) 4004 8008 8080 8085 8086 286 386 486 Pentium ® Hot Plate Nuclear Reactor Rocket Nozzle Suns Surface Intel Developer Forum, Spring 2004 - Pat Gelsinger

24 Radical Computing The end of Moores Law as we know it The end of Moores Law as we know it Number of transistors on a chip will continue to increase Number of transistors on a chip will continue to increase No significant increase in clock speed No significant increase in clock speed Future of silicon chips Future of silicon chips 100s of cores on a chip in 2015 (Justin Rattner, Intel) 100s of cores on a chip in 2015 (Justin Rattner, Intel) 4 cores/Tflop => 25 Tflops/chip 4 cores/Tflop => 25 Tflops/chip Challenge for IT industry and Computer Science community Challenge for IT industry and Computer Science community Can we make parallel computing on a chip easier than message-passing? Can we make parallel computing on a chip easier than message-passing?

25 Service-Orientation for building Distributed Systems

26 The Web Services Magic Bullet Company A (J2EE) Open Source (OMII) Company C (.Net) Web Services

27 Convergence in Web Services Systems Management Different approaches lead to confusion and uncertainty Different approaches lead to confusion and uncertainty WS-DM and WS-Management WS-DM and WS-Management WS-RF and WS-Transfer WS-RF and WS-Transfer WS-Notification and WS-Eventing WS-Notification and WS-Eventing Microsoft, IBM, HP, and Intel agreed to a convergence roadmap Microsoft, IBM, HP, and Intel agreed to a convergence roadmap No specific timeline yet announced No specific timeline yet announced

28 stable profile The Web Services Ecosystem WS-I Standards that have broad industry support and multiple interoperable implementations Specifications that are emerging from standardisation process and are recognised as being useful Specifications that have/will enter a standardisation process but are not stable and are still experimental

29 Web Services and the Grid A Complicated Story: Basic Web Service specifications Basic Web Service specifications WS-I (SOAP, WSDL) from 2001 onwards WS-I (SOAP, WSDL) from 2001 onwards Web Service Grids Web Service Grids G-WSDL and OGSI (2001 – 2003) G-WSDL and OGSI (2001 – 2003) WS-RF, WS-N and WS-DM (2004 - ?) WS-RF, WS-N and WS-DM (2004 - ?) Lesson: Lesson: Build Web Service Grids incrementally only on stable, mature and widely-accepted WS foundations Build Web Service Grids incrementally only on stable, mature and widely-accepted WS foundations

30 Grids for Virtual Organizations

31

32 Premise: The Grid and Web communities could soon deliver some useful specifications for Web Service Grids By focusing on simple Grid services built on accepted Web Services we can reach agreement quickly By focusing on simple Grid services built on accepted Web Services we can reach agreement quickly Look at three key areas for Grids for Virtual Organizations Look at three key areas for Grids for Virtual Organizations Security Security HPC Services HPC Services Data Services Data Services

33 Virtual Organization Security Not yet routine and seamless: many technologies and standards exist in the security space Not yet routine and seamless: many technologies and standards exist in the security space Interoperability only works if proposed solutions are widely accepted by both industry and academia Interoperability only works if proposed solutions are widely accepted by both industry and academia Larger problem than just for the GGF community Larger problem than just for the GGF community IT industry will provide high quality, well documented tooling and services to construct secure Virtual Organizations IT industry will provide high quality, well documented tooling and services to construct secure Virtual Organizations

34 The OGSA HPC Profile Defines a minimalist base interface plus optional extensions Defines a minimalist base interface plus optional extensions Small base interface enables simple interoperability widely and quickly Small base interface enables simple interoperability widely and quickly Common use cases covered by extensions Common use cases covered by extensions Extension model enables principled experimentation and evolution Extension model enables principled experimentation and evolution Defines minimal set of composable, extensible services Defines minimal set of composable, extensible services Job Submission Job Submission Data Staging Data Staging

35 An OGSA Data Profile? An OGSA Data Profile? Guiding principles: Keep profile as simple as possible Keep profile as simple as possible Example of Amazon S3 Example of Amazon S3 DAIS Working Group specifications DAIS Working Group specifications WS-DAI WS-DAI WS-DAIR and WS-DAIX WS-DAIR and WS-DAIX Build on only widely accepted Web Services Build on only widely accepted Web Services WS-I + …. WS-I + ….

36 New Science Paradigms Thousand years ago: Experimental Science Thousand years ago: Experimental Science - description of natural phenomena - description of natural phenomena Last few hundred years: Theoretical Science Last few hundred years: Theoretical Science - Newtons Laws, Maxwells Equations … - Newtons Laws, Maxwells Equations … Last few decades: Computational Science Last few decades: Computational Science - simulation of complex phenomena - simulation of complex phenomena Today: e-Science or Data-centric Science Today: e-Science or Data-centric Science - unify theory, experiment, and simulation - unify theory, experiment, and simulation - using data exploration and data mining - using data exploration and data mining Data captured by instruments Data captured by instruments Data generated by simulations Data generated by simulations Processed by software Processed by software Scientist analyzes databases/files Scientist analyzes databases/files (With thanks to Jim Gray)

37 Key Data Issues for e-Science Networks Networks Lambda technology Lambda technology The Data Life Cycle The Data Life Cycle From Acquisition to Preservation From Acquisition to Preservation Scholarly Communication Scholarly Communication Open Access to Data and Publications Open Access to Data and Publications

38 Computation Starlight (Chicago) Netherlight (Amsterdam) Leeds PSC SDSC UCL Network PoP Service Registry NCSA Manchester UKLight Oxford RAL US TeraGrid UK NGS Steering clients AHM 2004 Local laptops and Manchester vncserver All sites connected by production network (not all shown) An International e-Infrastructure

39 The Problem for the e-Scientist Data ingest Data ingest Managing a petabyte Managing a petabyte Common schema Common schema How to organize it? How to organize it? How to reorganize it? How to reorganize it? How to coexist & cooperate with others? How to coexist & cooperate with others? Data Query and Visualization tools Data Query and Visualization tools Support/training Support/training Performance Performance Execute queries in a minute Execute queries in a minute Batch (big) query scheduling Batch (big) query scheduling Experiments & Instruments Simulations facts answers questions ? Literature Other Archives facts

40 The e-Science Data Life Cycle Data Acquisition Data Acquisition Data Ingest Data Ingest Metadata Metadata Annotation Annotation Provenance Provenance Data Storage Data Storage Data Cleansing Data Cleansing Data Mining Data Mining Curation Curation Preservation Preservation

41 Scholarly Communication Global Movement towards permitting Open Access to scholarly publications Global Movement towards permitting Open Access to scholarly publications Libraries can no longer afford publisher subscriptions Libraries can no longer afford publisher subscriptions Principle that results of publicly funded research should be available to all Principle that results of publicly funded research should be available to all First World/Third World issue First World/Third World issue Open Archive Initiative (OAI) Open Archive Initiative (OAI) Creation of Subject Repositories such as arXiv for physics, astronomy and computer science, and PubMedCentral for Bio-Medical area Creation of Subject Repositories such as arXiv for physics, astronomy and computer science, and PubMedCentral for Bio-Medical area Global network of Institutional Repositories being established using software such as MITs DSpace, Southamptons EPrints and others Global network of Institutional Repositories being established using software such as MITs DSpace, Southamptons EPrints and others

42 NSF Atkins Report on Cyberinfrastructure the primary access to the latest findings in a growing number of fields is through the Web, then through classic preprints and conferences, and lastly through refereed archival papers the primary access to the latest findings in a growing number of fields is through the Web, then through classic preprints and conferences, and lastly through refereed archival papers archives containing hundreds or thousands of terabytes of data will be affordable and necessary for archiving scientific and engineering information archives containing hundreds or thousands of terabytes of data will be affordable and necessary for archiving scientific and engineering information

43 The Service Revolution Web 2.0 Web 2.0 Social networks, tagging for sharing e.g. Flikr, Del.icio.us, MySpace, … Social networks, tagging for sharing e.g. Flikr, Del.icio.us, MySpace, … Wikis, Blogs, RSS … Wikis, Blogs, RSS … Software delivered as a service Software delivered as a service Live services Live services Microsoft Office Live Microsoft Office Live XboxLive XboxLive AcademicLive AcademicLive Mashups Mashups Craigslist + GoogleMap Craigslist + GoogleMap http://mashupcamp.com http://mashupcamp.com

44 id Combine services to give added value An e-Science Mashup

45 The Semantic Grid In 2001, De Roure, Jennings and Shadbolt introduced the notion of the Semantic Grid In 2001, De Roure, Jennings and Shadbolt introduced the notion of the Semantic Grid Advocated the application of Semantic Web technologies both on and in the Grid Advocated the application of Semantic Web technologies both on and in the Grid Argued that users now required interoperability across time as well as space Argued that users now required interoperability across time as well as space Would allow both anticipated and unanticipated reuse of services, information and knowledge Would allow both anticipated and unanticipated reuse of services, information and knowledge In 2005, experience with UK e-Science Projects led them to enumerate requirements for a Semantic Grid In 2005, experience with UK e-Science Projects led them to enumerate requirements for a Semantic Grid

46 The Semantic Grid and Web Science De Roure, Jennings and Shadbolt identified 5 key technologies for building a Semantic Grid: De Roure, Jennings and Shadbolt identified 5 key technologies for building a Semantic Grid: 1) Web Services 1) Web Services 2) Software Agents 2) Software Agents 3) Metadata 3) Metadata 4) Ontologies and Reasoning 4) Ontologies and Reasoning 5) Semantic Web Services 5) Semantic Web Services Web and Grid communities coming together in a common vision for high level semantic services connecting distributed data resources Web and Grid communities coming together in a common vision for high level semantic services connecting distributed data resources

47 Summary Microsoft wishes to work with the Web, Grid and HPC communities: Microsoft wishes to work with the Web, Grid and HPC communities: to utilize open standards and develop interoperable high-level services, work flows, tools and data services to utilize open standards and develop interoperable high-level services, work flows, tools and data services to accelerate progress in a small number of societally important scientific applications to accelerate progress in a small number of societally important scientific applications to assist in the development of interoperable repositories and new models of scholarly publishing to assist in the development of interoperable repositories and new models of scholarly publishing to explore radical new directions in computing and ways and applications to exploit on-chip parallelism to explore radical new directions in computing and ways and applications to exploit on-chip parallelism

48 Acknowledgements With special thanks to Malcolm Atkinson, Neil Chuehong, Geoffrey Fox, Jim Gray, Marty Humphrey, Steven Newhouse, Stuart Ozer, Savas Parastatidis, Norman Paton and Paul Watson With special thanks to Malcolm Atkinson, Neil Chuehong, Geoffrey Fox, Jim Gray, Marty Humphrey, Steven Newhouse, Stuart Ozer, Savas Parastatidis, Norman Paton and Paul Watson


Download ppt "E-Science and Cyberinfrastructure: A Middleware Perspective Tony Hey Corporate VP for Technical Computing Microsoft Corporation."

Similar presentations


Ads by Google