Manchester Computing
Supercomputing, Visualization & eScience
W T Hewitt
Wednesday, April 16, 2014
UCISA Meeting, Edinburgh
What is e-Science & What is the Grid?

Supercomputing, Visualization & e-Science escigriducisa/03
Agenda
What is Grid & eScience?
The Global Programme
The UK eScience Programme
Impacts

What is e-Science & the Grid?

Why Grids?
Large-scale science and engineering are done through
  – the interaction of people,
  – heterogeneous computing resources, information systems, and instruments,
  – all of which are geographically and organizationally dispersed.
The overall motivation for Grids is to facilitate the routine interactions of these resources in order to support large-scale science and engineering.
From Bill Johnston, 27 July 01

The Grid…
"…is the web on steroids."
"…is Napster for Scientists" [of data grids]
"…is the solution to all your problems."
"…is evil." [a system manager, of Globus]
"…is distributed computing re-badged."
"…is distributed computing across multiple administrative domains"
  – Dave Snelling, senior architect of UNICORE

[…provides] "Flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources"
  – From The Anatomy of the Grid: Enabling Scalable Virtual Organizations
"…enables communities (virtual organizations) to share geographically distributed resources as they pursue common goals -- assuming the absence of central location, central control, omniscience, existing trust relationships."

CERN: Large Hadron Collider (LHC)
Raw Data: 1 Petabyte / sec
Filtered: 100 Mbyte / sec = 1 Petabyte / year = 1 Million CD ROMs
Figure: CMS Detector

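The slide's figures can be sanity-checked with a few lines of arithmetic. The sketch below is only an illustration: it assumes decimal units (1 PB = 10^15 bytes), roughly 10^7 seconds of data-taking per year, and a 700 MB CD-ROM. None of these three values appears on the slide.

```python
# Sanity check of the LHC data-volume figures quoted on the slide.
# Assumptions (not from the slide): decimal units, ~1e7 live seconds
# of data-taking per year, and 700 MB per CD-ROM.
MB = 10**6
PB = 10**15

FILTERED_RATE = 100 * MB                      # bytes per second, from the slide
LIVE_SECONDS_PER_YEAR = 10**7                 # assumed running time per year
CALENDAR_SECONDS_PER_YEAR = 365 * 24 * 3600   # a full calendar year
CD_CAPACITY = 700 * MB                        # assumed CD-ROM capacity


def yearly_volume(rate_bytes_per_s: int, seconds: int) -> int:
    """Bytes accumulated at a constant rate over the given number of seconds."""
    return rate_bytes_per_s * seconds


live_volume = yearly_volume(FILTERED_RATE, LIVE_SECONDS_PER_YEAR)
calendar_volume = yearly_volume(FILTERED_RATE, CALENDAR_SECONDS_PER_YEAR)
cds_per_petabyte = PB / CD_CAPACITY

print(f"live year:     {live_volume / PB:.2f} PB")
print(f"calendar year: {calendar_volume / PB:.2f} PB")
print(f"CDs per PB:    {cds_per_petabyte:,.0f}")
```

Under these assumptions, 100 MB/s gives exactly 1 PB over 10^7 live seconds and about 3.15 PB over a full calendar year, and one petabyte fills about 1.4 million CDs. The slide's "1 Million CD ROMs" is therefore best read as an order-of-magnitude figure.
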
Why Grids?
A biochemist exploits 10,000 computers to screen 100,000 compounds in an hour
A biologist combines a range of diverse and distributed resources (databases, tools, instruments) to answer complex questions
1,000 physicists worldwide pool resources for petaop analyses of petabytes of data
Civil engineers collaborate to design, execute, & analyze shake table experiments
From Steve Tuecke, 12 Oct. 01

Why Grids? (contd.)
Climate scientists visualize, annotate, & analyze terabyte simulation datasets
An emergency response team couples real time data, weather model, population data
A multidisciplinary analysis in aerospace couples code and data in four companies
A home user invokes architectural design functions at an application service provider
From Steve Tuecke, 12 Oct. 01

Broader Context
Grid Computing has much in common with major industrial thrusts
  – Business-to-business, Peer-to-peer, Application Service Providers, Storage Service Providers, Distributed Computing, Internet Computing…
Sharing issues not adequately addressed by existing technologies
  – Complicated requirements: run program X at site Y subject to community policy P, providing access to data at Z according to policy Q
  – High performance: unique demands of advanced & high-performance systems

What is the Grid?
"Grid computing [is] distinguished from conventional distributed computing by its focus on large-scale resource sharing, innovative applications, and, in some cases, high-performance orientation ... we review the "Grid problem", which we define as flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources - what we refer to as virtual organizations."
From "The Anatomy of the Grid: Enabling Scalable Virtual Organizations" by Foster, Kesselman and Tuecke

New Book

What is the Grid?
Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations
On-demand, ubiquitous access to computing, data, and all kinds of services
New capabilities constructed dynamically and transparently from distributed services
No central location, No central control, No existing trust relationships, Little predetermination
Uniformity
Pooling Resources

e-Science and the Grid
"e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it."
"e-Science will change the dynamic of the way science is undertaken."
  – John Taylor, Director General of Research Councils, Office of Science and Technology

Why GRID?
VERY VERY IMPORTANT
The GRID is one way to realise the e-Science vision.
WE ARE TRYING TO DO E-SCIENCE!

Grid Middleware
Diverse global services
Grid services
Local OS

Common principles
Single sign-on
  – Often implying Public Key Infrastructure (PKI)
Standard protocols and services
Respect for autonomy of resource owner
Layered architectures
Higher-level infrastructures hiding heterogeneity of lower levels
Interoperability is paramount

Grid Middleware
Middleware
  – Globus
  – UNICORE
  – Legion and Avaki
Scheduling
  – Sun Grid Engine
  – Load Sharing Facility (LSF), from Platform Computing
  – OpenPBS and PBS(Pro), from Veridian
  – Maui scheduler
  – Condor (could also go under middleware)
Data
  – Storage Resource Broker (SRB)
  – Replica Management
  – OGSA-DAI
Web services (WSDL, SOAP, UDDI)
  – IBM WebSphere
  – Microsoft .NET
  – Sun Open Net Environment (Sun ONE)
PC Grids
Peer-to-Peer computing

Data-oriented Grids

Data-oriented middleware
Wide-area distributed file systems (e.g. AFS)
Storage Resource Broker (SRB)
  – UCSD and SDSC
  – Provide transparent access to data storage
  – Centralised architecture
  – Motivated by experiences of HPC users, not database users
  – Little enthusiasm from UK e-Science programme
OGSA-DAI
  – Database Access and Integration
  – Strategic contribution of UK e-Science programme
  – Universities of Edinburgh, Manchester, Newcastle; IBM, Oracle
  – Alpha release January 2003
Globus Replica Management software
  – Next up!

Data Grids for High Energy Physics
[Diagram: tiered data grid; labels recovered below]
Online System; Offline Processor Farm ~20 TIPS; CERN Computer Centre
Regional centres: FermiLab ~4 TIPS, France Regional Centre, Italy Regional Centre, Germany Regional Centre
Tier2 Centres ~1 TIPS, including Caltech ~1 TIPS
Institutes ~0.25 TIPS; Physics data cache; Physicist workstations
Link rates shown: ~PBytes/sec, ~100 MBytes/sec, ~622 Mbits/sec (or Air Freight, deprecated), ~1 MBytes/sec
Tier labels shown: Tier 0, Tier 1, Tier 2, Tier 4
There is a bunch crossing every 25 nsecs.
There are 100 triggers per second.
Each triggered event is ~1 MByte in size.
Physicists work on analysis channels. Each institute will have ~10 physicists working on one or more channels; data for these channels should be cached by the institute server.
1 TIPS is approximately 25,000 SpecInt95 equivalents.

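The rates quoted on this slide fit together, as the short calculation below illustrates. It uses only the slide's own figures and assumes decimal units (1 MByte = 10^6 bytes, 8 bits per byte) with no allowance for protocol overhead.

```python
# Back-of-envelope rates for the tiered data-grid diagram.
# Inputs are the slide's figures; decimal units are an assumption.
BUNCH_SPACING_NS = 25          # one bunch crossing every 25 ns
TRIGGERS_PER_SECOND = 100      # events kept by the trigger
EVENT_SIZE_MB = 1              # ~1 MByte per triggered event
LINK_MBITS_PER_SECOND = 622    # wide-area link rate quoted between tiers

# 1 second = 1e9 ns, so the crossing rate is 1e9 / spacing.
crossings_per_second = 10**9 // BUNCH_SPACING_NS

# Data rate leaving the trigger: events per second times event size.
trigger_output_mb_s = TRIGGERS_PER_SECOND * EVENT_SIZE_MB

# Convert the link rate from megabits to megabytes per second.
link_mb_s = LINK_MBITS_PER_SECOND / 8

# How many bunch crossings occur for each event that is kept.
crossings_per_kept_event = crossings_per_second // TRIGGERS_PER_SECOND

print(f"bunch crossings per second: {crossings_per_second:,}")
print(f"trigger output:             {trigger_output_mb_s} MB/s")
print(f"622 Mbit/s link:            {link_mb_s:.2f} MB/s")
print(f"crossings per kept event:   {crossings_per_kept_event:,}")
```

100 triggers per second at ~1 MByte each reproduces the ~100 MBytes/sec figure in the diagram, while a single 622 Mbits/sec link carries at most about 78 MBytes/sec, which is less than the full trigger output.
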
Data Intensive Issues Include …
Harness [potentially large numbers of] data, storage, network resources located in distinct administrative domains
Respect local and global policies governing what can be used for what
Schedule resources efficiently, again subject to local and global constraints
Achieve high performance, with respect to both speed and reliability
Catalog software and virtual data

Desired Data Grid Functionality
High-speed, reliable access to remote data
Automated discovery of best copy of data
Manage replication to improve performance
Co-schedule compute, storage, network
Transparency wrt delivered performance
Enforce access control on data
Allow representation of global resource allocation policies

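"Automated discovery of best copy of data" combined with "enforce access control" can be pictured as a small selection problem. The following is a minimal sketch under stated assumptions, not any real replica-management API: the `Replica` record, the site names, and the cost model (startup latency plus size divided by bandwidth) are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    """One copy of a dataset (hypothetical record, not a real catalogue entry)."""
    site: str               # made-up site name
    bandwidth_mb_s: float   # estimated throughput to the client
    latency_s: float        # estimated startup latency
    readable: bool          # result of an access-control check done elsewhere


def estimated_transfer_time(replica: Replica, size_mb: float) -> float:
    """Assumed cost model: startup latency plus size over bandwidth."""
    return replica.latency_s + size_mb / replica.bandwidth_mb_s


def best_replica(replicas: list[Replica], size_mb: float) -> Replica:
    """Pick the readable replica with the lowest estimated transfer time."""
    allowed = [r for r in replicas if r.readable]
    if not allowed:
        raise PermissionError("no readable replica")
    return min(allowed, key=lambda r: estimated_transfer_time(r, size_mb))


catalogue = [
    Replica("site-a", bandwidth_mb_s=75.0, latency_s=2.0, readable=True),
    Replica("site-b", bandwidth_mb_s=10.0, latency_s=0.1, readable=True),
    Replica("site-c", bandwidth_mb_s=200.0, latency_s=0.5, readable=False),
]

small = best_replica(catalogue, size_mb=1.0)     # latency dominates
large = best_replica(catalogue, size_mb=1000.0)  # bandwidth dominates
print(small.site, large.site)
```

With these made-up numbers the "best" copy depends on the request: the low-latency site wins for a 1 MB file, the high-bandwidth site wins for a 1000 MB file, and the fastest site overall is never chosen because the access-control check excludes it.
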
Grid Standards
Grid Standards Bodies:
  – IETF: Home of the Network Infrastructure Standards
  – W3C: Home of the Internet
  – GGF: Home of the Grid
GGF Defines the Open Grid Services Architecture
  – OGSI is the Infrastructure part of OGSA
  – OGSI Public comment draft submitted 14 February 2003
Key OGSA Areas of Standards Development
  – Job management interfaces
  – Resources & Discovery
  – Security
  – Grid Economy and Brokering

What is OGSA?
Web Services with Attitude!
Also known as "Open Grid Services Architecture"

Aside: What are Web Services?
Loosely Coupled Distributed Computing
  – Think Java RMI or C remote procedure call
Text Based Serialization
  – XML: Human Readable serialization of objects
IBM and Microsoft lead
  – Web Services Description Language (WSDL)
  – W3C Standardization
Three Parts
  – Messages (SOAP)
  – Definition (WSDL)
  – Discovery (UDDI)

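To make "text based serialization" concrete, here is a hedged sketch of a remote call wrapped in a SOAP 1.1 envelope using only the Python standard library. The envelope namespace is the standard SOAP 1.1 one; the service namespace `urn:example:grid-demo`, the operation name `submitJob`, and its parameters are invented for illustration and belong to no real service.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:grid-demo"                    # hypothetical service namespace


def build_request(operation: str, params: dict[str, str]) -> bytes:
    """Serialize a call as Envelope/Body/<operation> with one child per parameter."""
    ET.register_namespace("soap", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = value
    return ET.tostring(envelope, encoding="utf-8")


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree puts on qualified tags."""
    return tag.split("}", 1)[1] if tag.startswith("{") else tag


def parse_request(message: bytes) -> tuple[str, dict[str, str | None]]:
    """Recover the operation name and parameters from a serialized request."""
    envelope = ET.fromstring(message)
    body = envelope.find(f"{{{SOAP_ENV}}}Body")
    if body is None or len(body) == 0:
        raise ValueError("message has no SOAP Body payload")
    call = body[0]
    return local_name(call.tag), {local_name(c.tag): c.text for c in call}


message = build_request("submitJob", {"executable": "/bin/hostname", "site": "Y"})
print(message.decode("utf-8"))
print(parse_request(message))
```

The message is plain XML text, so any platform with an XML parser can read it; this is the loose coupling the slide contrasts with Java RMI or C remote procedure calls. A WSDL document would describe `submitJob` and its parameters, and UDDI would be where a client looks the service up.
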
Web Services in Action
[Diagram labels] UDDI; Publish/WSDL; Search; Client (Java/C/Browser); https/SOAP; WS Platform (InterStage, WebSphere, J2EE, GLUE, SunOne, .NET); Any protocol; Legacy Enterprise Application; Database; ...

Enter Grid Services
Experiences of Grid computing (and business process integration) suggest similar extensions to Web Services
State
  – Service Data Model
Persistence and Naming
  – Two Level Naming (GSH, GSR)
  – Allows dynamic migration and QoS adaptation
Lifetime Management
  – Self healing and soft garbage collection
Standard PortTypes
  – Guarantee of minimal level of service
  – Beyond P2P is Federation through Mediation
Explicit Semantics
  – Grid Services specify semantics on top of Web Service syntax
  – PortType Inheritance

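Two of the extensions above, two-level naming and soft-state lifetime management, can be illustrated with a toy registry. This is a sketch of the idea only and not the OGSI interfaces: a stable handle (playing the role of a Grid Service Handle, GSH) resolves to a current reference (playing the role of a Grid Service Reference, GSR), and an entry disappears unless its lifetime is extended. The class and method names, handle string, and URLs are hypothetical, and time is passed in explicitly to keep the example deterministic.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class ServiceInstance:
    reference: str            # GSR-like: where the service can be reached right now
    termination_time: float   # soft state: the entry expires unless extended


class HandleRegistry:
    """Toy resolver from a stable handle (GSH-like) to a current reference (GSR-like)."""

    def __init__(self) -> None:
        self._instances: dict[str, ServiceInstance] = {}

    def register(self, handle: str, reference: str, now: float, lifetime: float) -> None:
        self._instances[handle] = ServiceInstance(reference, now + lifetime)

    def keep_alive(self, handle: str, now: float, lifetime: float) -> None:
        """A client that still needs the service extends its termination time."""
        self._instances[handle].termination_time = now + lifetime

    def migrate(self, handle: str, new_reference: str) -> None:
        """The service moves; the handle held by clients stays the same."""
        self._instances[handle].reference = new_reference

    def sweep(self, now: float) -> None:
        """Soft garbage collection: drop every entry whose lifetime has lapsed."""
        expired = [h for h, inst in self._instances.items()
                   if inst.termination_time <= now]
        for handle in expired:
            del self._instances[handle]

    def resolve(self, handle: str, now: float) -> Optional[str]:
        self.sweep(now)
        instance = self._instances.get(handle)
        return instance.reference if instance is not None else None


registry = HandleRegistry()
handle = "gsh:example/counter"  # hypothetical handle
registry.register(handle, "https://host-a.example/counter", now=0.0, lifetime=60.0)
registry.migrate(handle, "https://host-b.example/counter")
at_30 = registry.resolve(handle, now=30.0)    # resolves to the migrated location
registry.keep_alive(handle, now=50.0, lifetime=60.0)
at_100 = registry.resolve(handle, now=100.0)  # still alive thanks to keep_alive
at_200 = registry.resolve(handle, now=200.0)  # nobody renewed it, so it is gone
print(at_30, at_100, at_200)
```

The indirection is what allows "dynamic migration": clients hold the handle and re-resolve it when a reference stops working. The expiry rule is what makes clean-up "self healing", since a service abandoned by a crashed client is reclaimed without anyone sending an explicit destroy request.
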
If one GRID is good then Many GRIDS must be better

US Grid Projects
NASA Information Power Grid
DOE Science Grid
NSF National Virtual Observatory
NSF GriPhyN
DOE Particle Physics Data Grid
NSF DTF TeraGrid
DOE ASCI DISCOM Grid
DOE Earth Systems Grid
DOE FusionGrid
NEESGrid
NIH BIRN
NSF iVDGL

UK e-Science Programme

UK e-Science Programme
[Organisation chart; labels recovered below]
DG Research Councils
E-Science Steering Committee
Grid TAG
Director
  – Director's Management Role
  – Director's Awareness and Co-ordination Role
Generic Challenges: EPSRC (£15m), DTI (£15m)
Industrial Collaboration (£40m)
Academic Application Support Programme: Research Councils (£74m), DTI (£5m)
  – PPARC (£26m)
  – BBSRC (£8m)
  – MRC (£8m)
  – NERC (£7m)
  – ESRC (£3m)
  – EPSRC (£17m)
  – CLRC (£5m)
£80m Collaborative projects
From Tony Hey, 27 July 01

Key Elements
Development of Generic Grid Middleware
Network of Grid Core Programme e-Science Centres
  – National Centre
  – Regional Centres
Grid IRC Grand Challenge Project
Support for e-Science Pilots
Short term funding for e-Science demonstrators
Grid Network Team
Grid Engineering Team
Grid Support Centre
Task Forces
  – Database, led by Norman Paton
  – Architecture, led by Malcolm Atkinson
International Involvement
Adapted from Tony Hey, 27 July 01

National & Regional Centres
[Map labels] Cambridge, Newcastle, Edinburgh, Oxford, Glasgow, Manchester, Cardiff, Southampton, London, Belfast, DL, RAL, Hinxton
Centres donate equipment to make a Grid

Grid Middleware R&D
£16M funding available for industrial collaborative projects
£11M allocated to Centres projects plus £5M for Open Call projects
Set up Task Forces
  – Database Task Force
  – Architecture Task Force
  – Security Task Force

Grid Network Team
Expert group to identify end-to-end network bottlenecks and other network issues
  – e.g. problems with multicast for Access Grid
Identify e-Science project requirements
Funding £0.5M traffic engineering/QoS project with PPARC, UKERNA and CISCO
  – investigating MPLS using SuperJANET network
Funding DataGrid extension project investigating bandwidth scheduling with PPARC
Proposal for UKLight lambda connection to Chicago and Amsterdam

e-Science Centres of Excellence
Birmingham/Warwick – Modelling
Bristol – Media
UCL – Networking
White Rose Grid – Leeds, York, Sheffield
Lancaster – Social Science
Leicester – Astronomy
Reading – Environment

UK e-Science Grid
[Map labels] Cambridge, Newcastle, Edinburgh, Oxford, Glasgow, Manchester, Cardiff, Soton, London, Belfast, DL, RL, Hinxton

UK e-Science Funding
First Phase: 2001 – 2004
Application Projects
  – £74M
  – All areas of science and engineering
Core Programme
  – £15M + £20M (DTI)
  – Collaborative industrial projects
Second Phase: 2003 – 2006
Application Projects
  – £96M
  – All areas of science and engineering
Core Programme
  – £16M
  – Core Grid Middleware
  – DTI follow-on?

EPSRC: Computer Science for e-Science
  – £9M, 18 projects so far
ESRC: National e-Social Science Centre + 3 hubs
  – ~£6M
PPARC
MRC
BBSRC

Core Programme: Phase 2
UK e-Science Grid/Centres and e-Science Institute
Grid Operation Centre and Network Monitoring
Core Middleware engineering
National Data Curation Centre
e-Science Exemplars/New Opportunities
Outreach and International involvement

Other Activities
Security Task Force
  – Jointly fund key security projects with EPSRC & JCSR, and coordinated effort with NSF NMI Internet2 projects
  – JCSR £2M call in preparation
UK Digital Curation Centre
  – £3M, Core e-Science + JCSR
JCSR
  – £3M per annum

SR2004 – e-Science Infrastructure
Persistent UK e-Science Research Grid
Grid Operations Centre
UK Open Middleware Infrastructure Institute
National e-Science Institute
UK Digital Curation Centre
AccessGrid Support Service
e-Science/Grid collaboratories Legal Service
International Standards Activity

Conclusions

Today's Grid
A Single System Image
Transparent wide-area access to large data banks
Transparent wide-area access to applications on heterogeneous platforms
Transparent wide-area access to processing resources
Security, certification, single sign-on authentication, AAA
  – Grid Security Infrastructure
Data access, Transfer & Replication
  – GridFTP, Giggle
Computational resource discovery, allocation and process creation
  – GRAM, UNICORE, Condor-G

Reality Checks!!
The Technology is Ready
  – Not true: it's emerging
Building middleware, Advancing Standards, Developing Dependability
Building demonstrators.
The computational grid is in advance of the data intensive middleware
Integration and curation are probably the obstacles
But!! It doesn't have to be all there to be useful.
We know how we will use grid services
  – No Disruptive technology
Lower the barriers of entry.

Grid Evolution
1st Generation Grid
  – Computationally intensive, file access/transfer
  – Bag of various heterogeneous protocols & toolkits
  – Recognises internet, Ignores Web
  – Academic teams
2nd Generation Grid
  – Data intensive -> knowledge intensive
  – Services-based architecture
  – Recognises Web and Web services
  – Global Grid Forum
  – Industry participation
We are here!

Impacts
It's all about interoperability, really.
Web & Grid Services are creating a new marketplace for components
If you're concerned with systems integration or internet delivery of services, embrace Web Services technologies now. You'll be ready for Grid Services when they're ready for you.
  – If you're a developer, get Web Services on your CV
  – If you're an IT manager, collect Web Service expertise through hiring or training
Software license models must adapt

I don't want to share! Do I need a grid?

In conclusion
The GRID is not, and will not be, free
  – must pay for resources
What have we to show for £250M?

Acknowledgements
Carole Goble
Stephen Pickles
Paul Jeffreys
University of Manchester
Academic collaborators
Industrial collaborators
Funding Agencies: DTI, EPSRC, NERC, ESRC, PPARC

Manchester Computing
World Leading Supercomputing Service, Support and Research
Bringing Science and Supercomputers Together