Manchester Computing: Cross Council ICT Conference for e-Science & GRID, 17-19 May 2004. End to End Services to support an e-Science Community. Professor M J Clark, Director IS, Manchester Computing
2 Agenda End to End Services to support an e-Science Community –How does it relate to institutional strategy? –Myth & magic? –What do we demand? –End-to-end issues! –The challenges! –Economic issues –Should I bury my head in the sand?
The vision central to the University's IS Strategy is: To provide a transparent and seamless interface to teaching, RESEARCH and administrative information services.
4 An IS architecture to provide an environment where IS solutions maximize the efficiency and effectiveness of handling: –routine transactions and access to support –solutions for less routine but essential transactions that enable University staff to provide the highest levels of customer service –whilst maintaining high degrees of job satisfaction, where staff have ready access to the tools they need to do their job efficiently and effectively –with simplified processes and policies, within constraints that acknowledge the risks associated with devolved authority –rich in services, delivered through a single aggregated interface accessible from networked devices
5 The Principles Strive for Simplification –Develop tools that can be flexibly applied to reduce the complexity of University business processes. Enhance Individual Productivity –Provide flexible tools that individuals can use to perform their roles more effectively. Encourage Collaboration and Common Process Approaches –Build alliances with and between stakeholders in process mechanisms in order to further the University's goals. Empower Technologies as an Investment –View IS spending on systems, staff and processes as an investment that will yield a return on up-front expenditure, with full transparency about any assumptions of risk. Focus on Outcomes –Measure and assess projects and teams by what is accomplished.
6 How does this translate into end-to-end support for research? To demonstrably enhance the research process –from idea –through planning and resourcing –supporting access to: data, codes & algorithms, computing through to supercomputing –post-processing & visualization –to results and scientific insight –leading to innovation and to deliver this formidable advantage: –to all researchers, –in the most natural and powerful way possible Adding value to users' research –Collaboration in and through projects
7 e-Science What does it mean to me? We were told: "e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it. e-Science will change the dynamic of the way science is undertaken." John Taylor, Director General of Research Councils, Office of Science and Technology
8 GRIDs […provide] "Flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources" –From The Anatomy of the Grid: Enabling Scalable Virtual Organizations "…enables communities (virtual organizations) to share geographically distributed resources as they pursue common goals -- assuming the absence of central location, central control, omniscience, existing trust relationships."
9 My translation Little problems are no longer good enough. Large-scale research is done through –the interaction of people, –heterogeneous computing resources, information systems, and instruments, –all of which are geographically and organizationally dispersed. The overall motivation for Grids is to facilitate the routine interactions of people & resources in order to support large-scale science(s) and engineering. GRID was a bad noun to choose
10 So then: What & when? Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations On-demand, ubiquitous access to computing, data, and all kinds of services New capabilities constructed dynamically and transparently from distributed services When as a: –Research Project –Pilot Service –Full production service
11 An advanced IT infrastructure & standards An infrastructure that is hidden from the real science When research is facilitated invisibly by the infrastructures –The standards are embedded within the research support infrastructures and are transparent to the science
13 The 5 phases of the Hype Cycle A Hype Cycle is a graphic representation of the maturity, adoption and business application of specific technologies. 1. "Technology Trigger" The first phase of a Hype Cycle is the "technology trigger": a breakthrough, product launch or other event that generates significant press and interest. 2. "Peak of Inflated Expectations" In the next phase, a frenzy of publicity typically generates over-enthusiasm and unrealistic expectations. There may be some successful applications of a technology, but there are typically more failures. 3. "Trough of Disillusionment" Technologies enter the "trough of disillusionment" because they fail to meet expectations and quickly become unfashionable. Consequently, the press usually abandons the topic and the technology. 4. "Slope of Enlightenment" Although the press may have stopped covering the technology, some businesses continue through the "slope of enlightenment" and experiment to understand the benefits and practical application of the technology. 5. "Plateau of Productivity" A technology reaches the "plateau of productivity" as its benefits become widely demonstrated and accepted. The technology becomes increasingly stable and evolves into second and third generations. The final height of the plateau varies according to whether the technology is broadly applicable or benefits only a niche market.
[Figure: Gartner Hype Cycle for Emerging Technologies, July 2003]
15 Gartner on Commercial GRIDs Definition: –A grid formed for non-scientific, non-technical tasks across multiple enterprises to address a single, large-scale purpose. Grids can also be used within one enterprise. The term "grid" is sometimes misused to denote the related technologies of distributed and utility computing. Time to Plateau/Adoption Speed: –Five to 10 years. Justification for Hype Cycle Position/Adoption Speed: –Growing movement by vendors to call products and long-term visions "grid." Confusion over definitions, benefits, maturity and applicability. Little is known about what commercial grid applications might be. Business Impact Areas: –New industry models could replace third-party intermediaries for large, multi-enterprise systems. Joint business opportunities with combined data warehouse and analytics. Distributed computing to increase efficiency and use of IT resources. Some claim grids will transform commercial IT operations. Analysis by Carl Claunch
16 Grids consist of … Computational facilities (supercomputers, clusters, workstations, small processors, …) Access to mass storage (disk drives, tapes, …) Networking (including wireless, distributed, ubiquitous) Digital libraries/data bases Sensors/effectors Software (operating systems, middleware, domain specific tools/platforms for building applications) Services (education, training, consulting, user assistance) With people: All working together in an integrated fashion.
18 What are the issues facing service deliverers? No central coordination –Lack of joined up requirement, commitment, or resource demands Little predetermination –Making it up as we go –No planned investment, based on need as it arises –No shared understanding of the problems created Difficult to say no! –But saying yes is also bad! No experience-based trust relationships –We are asked to support un-trusted third parties –They want to install beta-class software! (and that's generous) –They want ports and security left wide open!!! AND THEY WANT IT FREE!
19 Today's Grid demands: Transparent wide-area access to large data banks; Transparent wide-area access to applications on heterogeneous platforms; Transparent wide-area access to processing resources; Security, certification, single sign-on authentication (AAA) –Grid Security Infrastructure (GSI); Data access, transfer & replication –GridFTP; Computational resource discovery, allocation and process creation –GRAM, Unicore, Condor-G
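As a rough illustration of the "resource discovery, allocation" item, the sketch below filters a pool of resources against a job's requirements and then picks a best fit. It is a hypothetical toy in the spirit of Condor's ClassAd matchmaking, not the interface of GRAM, Unicore or Condor-G; the `Resource`, `JobRequest`, `discover` and `allocate` names and the best-fit policy are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Resource:
    name: str
    platform: str      # hypothetical OS/architecture label, e.g. "linux-x86"
    free_cpus: int
    memory_gb: int


@dataclass(frozen=True)
class JobRequest:
    platform: str
    cpus: int
    memory_gb: int


def discover(resources: list[Resource], job: JobRequest) -> list[Resource]:
    """Discovery: keep only resources that satisfy every job requirement."""
    return [
        r for r in resources
        if r.platform == job.platform
        and r.free_cpus >= job.cpus
        and r.memory_gb >= job.memory_gb
    ]


def allocate(resources: list[Resource], job: JobRequest) -> Optional[Resource]:
    """Allocation: among matches, choose the one leaving the fewest idle CPUs.

    This best-fit rule is only an assumption; production schedulers rank on
    many more attributes (load, cost, policy, data locality).
    """
    candidates = discover(resources, job)
    if not candidates:
        return None
    return min(candidates, key=lambda r: r.free_cpus - job.cpus)
```

With a pool of a 64-CPU and a 16-CPU Linux machine, an 8-CPU Linux job is placed on the 16-CPU machine, keeping the larger one free for bigger work; a request no resource can satisfy yields `None`.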
20 e-Science developments predict shifts in current research practice: Research in many disciplines is being revolutionized by using computers, digital data, and networks to replace and extend traditional efforts. –Do we have enough resources directed at scaling solutions to problems? –HPC is running on legacy codes not designed for big problems –Algorithms are not designed for new paradigms/new science New technology-mediated, distributed work environments relax constraints of distance and time –Do we train people to work in virtual organisations?
21 Challenges to the classical approach The two classic approaches to scientific research, –theoretical/analytical and experimental/observational, have been extended to in silico simulation –and modelling to explore new possibilities and to achieve new precision. Challenged by: –The enormous performance leap of computers (and networks), which enables simulations of far more complex systems and phenomena, as well as visualization of the outputs –Advanced computing is no longer restricted to a few research groups in a few fields such as weather prediction and high-energy physics, but pervades scientific and engineering research, including the biological, chemical, social, and environmental sciences, medicine, and nanotechnology. –The primary access to the latest findings in a growing number of fields is through the Web –Crucial data collections in the social, biological, and physical sciences are online and remotely accessible
22 Tomorrow we might expect (1) Combine raw data and new models from many sources, and utilize the most up-to-date tools to analyze, visualize, and simulate complex interrelations Collect / make information widely available –e.g. the outputs of all major observatories and astronomical satellites, satellite and land-based weather data, three-dimensional images of anthropologically important objects –leading to a qualitative change in the way research is done and the type of science that results. Work across traditional disciplinary boundaries: environmental scientists will take advantage of climate models, physicists will make direct use of astronomical observations, social scientists will analyze interactive behaviour of scientists as well as others
23 Tomorrow we might expect (2) Simulate more complex and exciting systems –E.g. cells and organisms rather than proteins and DNA; –the entire earth system rather than air, water, land, and snow independently Access the entire published record of science online Make publications incorporating rich media (hypertext, video, photographic images) Visualize the results of complex data sets in new and exciting ways, –and create techniques for understanding and acting on these observations Work routinely with colleagues at distant institutions –even ones that are not traditionally considered research universities, and with junior scientists and students as genuine peers, despite differences in age, experience, race, or physical limitations.
24 Knowledge environments for GRID working [Diagram: layered knowledge environment] –Community-specific knowledge environments for researcher communities, customised for specific disciplines/inter-disciplines –Services: HPC service; data, information and knowledge management services; observation, measurement and data-collection services; interfaces; visualisation services; collaboration services –Networking, operating systems, middleware –Infrastructures: computation, storage, communication (the diagram's legend marks the grid-infrastructure layers)
25 NSF Cyberinfrastructure – panel conclusion The Panel's overarching finding is that: A new age has dawned in scientific and engineering research, pushed by continuing progress in computing, information, and communication technology; and pulled by the expanding complexity, scope, and scale of today's research challenges. The capacity of this technology has crossed thresholds that now make possible a comprehensive cyberinfrastructure on which to build new types of scientific and engineering knowledge environments and organizations and to pursue research in new ways and with increased efficacy. The cost of not doing this is high, both in opportunities lost and through increasing fragmentation and balkanization of the research communities.
26 Is There a Definition for Cyberinfrastructure (CI)? Not really: it means different things to different groups, but there are commonalities. Literally, infrastructure composed of cyber elements. Includes High-End Computing (HEC, or supercomputing), grid computing, distributed computing, etc.
27 Is There a Definition of Cyberinfrastructure (CI)? Working definition: an integrated system of interconnected computation/communication/information elements that supports a range of applications Note: We are only at the beginning of infrastructure developments Cyberinfrastructure is the means; e-Science is the result
28 Integrated architectures [Layered diagram] Hardware; Grid services & middleware; Development tools & libraries; Domain-specific tools; Discovery & innovation; Education & training. The layers are grouped into discipline-independent infrastructures and applications.
29 In Ten Years, an infrastructure that is… rich in resources, comprehensive in functionality, and ubiquitous; easily usable by all scientists and engineers; accessible anywhere, any time it is needed, by authenticated users; interoperable, extendable, flexible, tailorable, and robust; funded by multiple agencies, states, campuses, and organizations; supported and utilised by educational programs at all levels.
30 Some characteristics: Built on a broadly accessible, highly capable network: from backbones of 100s of terabits per second down to intermittent, low-speed wireless connectivity. Contains significant and varied computing resources: 100s of petaflops at the high end, with capacity to support most scientific work. Contains significant storage capacity: exabyte collections common; a high degree of DB confederation possible. Allows a wide range of sensors/effectors to be connected: sensor nets of millions of elements attached. Contains a broad variety of intelligent visualization, search, database, programming and other services that are fitted to specific disciplines
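To give a feel for how these figures interact, assume (hypothetically) that a single one-exabyte collection is moved over a 100 Tb/s backbone at full line rate with no protocol overhead, using decimal units:

```latex
\frac{1\,\mathrm{EB}}{100\,\mathrm{Tb/s}}
  = \frac{8 \times 10^{18}\,\mathrm{bit}}{10^{14}\,\mathrm{bit/s}}
  = 8 \times 10^{4}\,\mathrm{s} \approx 22\,\mathrm{h}
```

Even at the top of the envisaged network range, moving a whole exabyte collection is a matter of most of a day rather than seconds.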
31 The initial Challenges Technical Challenges –Computer Science and Engineering broadly –How to build the components? –Networks, processors, storage devices, sensors, software –How to shape the technical architecture? –Pervasive, many cyberinfrastructures, constantly evolving/changing capabilities –How to customize CI to particular Sci & Eng domains Operational Challenges –Data standards –General interoperability –Resource allocation –Security and privacy –Training –Continuous evolution Funding/Ownership Challenges –Cooperation among agencies –Cooperation between federal and state/private levels –Role of campuses –Interaction with private industry –££££s !
32 Computer Services Must run or be extensively involved in Grids –Experimental services for developers –Production service for developers –Production service for users Must be resourced to support the new world
33 The computer centre remit Once at the heart of the Computing/Network research agenda –Now support the core business –Provide plain old internet services –Significantly about quantity rather than quality –Resource limited; minimum risk environment; intolerant user base Manchester Computing is not typical –Staff actively engaged doing research –Success through partnerships –Risk taking within constraints –Entrepreneurial; >50% of funding external
34 Success through internal Partnerships with ESNW –Computer Science and increasingly all Schools –Backed by £3.1m institutional investment for 2004 BioBank (Hub & Spoke) –With Medicine NCeSS –With Social Sciences, Computer Science, Economics, Geography + Essex National Text Mining Centre –UMIST & Manchester + Salford + Liverpool
35 The MANs & Campus Networks The Metropolitan Area Networks will provide GRID capabilities –High Speed access regardless of location Resilience: how important? Who pays and how? –Fairness (location) v cost –Commercial partner issues –Very significant Quality of Service issues to be resolved! Campus Networks –The last mile syndrome –Commodity v research needs (also for MANs) –Security v accessibility –Who has the accountability/responsibility? Who has the sanctions? –Very real threats through providing access
36 Don't forget the Library (knowledge) services Knowledge is premised on the access to information Librarians are professionals at information management –The nature of the medium holding knowledge is changing –The nature of the learned article is changing May contain multimedia Or datasets including access to applications to re-run the experiment and even modify the parameters The data available to the researcher is growing at an alarming rate Understanding of the IPR issues in relation to knowledge, information and data is a professional issue They could be the experts for digital curation support –A growing responsibility for us all!
37 The missing skills Where are the people who are going to develop new codes for new architectures, including data resources? –Optimisation and recovery need to be integral Requires comprehension of the science and understanding of the algorithms Needs to be driven by the demands for efficiency/effectiveness of solutions Needs to understand the associated datasets Note: the US Government is funding both architecture and language development for the 2015 timeframe –The UK must not lose out on future benefits through under-investment
38 Visualisation A picture is worth a thousand words Complex information (data) requires simplification for the human consumer The cost of local visualisation facilities has radically diminished –3d, virtual reality, high definition……… However, it will need to handle complex datasets or real-time processing –The visualisation may cause the need to steer the science dynamically An area of support/requirement expected to expand dramatically, with significant new tools/techniques required
39 The migration from research to production From developer/champion -> don't-care user –From 1st user -> thousands Research project -> computer service –Who does QA? –Who does integration? –Who supports it? –Who promotes it? –Who does the development? The Open Middleware Infrastructure Institute (OMII) will do some –Quality assurance and testing, as the community will not abide bugs Computer services like supported products –Only the best survive?
40 AAA Authentication, Authorisation and Accounting (AAA) Managing access to resources involves a number of processes: –authentication: identifying the person requesting the access –authorisation: determining from that person's identity, and often using other sources of information, what privileges the individual has and hence whether access should be allowed or not –accounting: maintaining logs of events for the purpose of generating management information on resource usage A big challenge: providing certificates for every learner –Is there anyone who is not a learner?
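The three AAA steps can be sketched as one service that authenticates a requester, checks their privileges and logs every attempt. This is a hypothetical toy: the `AAAService` class, its method names and the use of a hashed shared secret are assumptions for illustration (a Grid would authenticate with certificates, and a real password store would use a salted key-derivation function).

```python
import hashlib
import hmac
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AAAService:
    """Toy authentication, authorisation and accounting service (hypothetical)."""
    credentials: dict[str, str] = field(default_factory=dict)      # user -> SHA-256 hex of secret
    privileges: dict[str, set[str]] = field(default_factory=dict)  # user -> permitted resources
    log: list[tuple[str, str, str, bool]] = field(default_factory=list)  # (time, user, resource, allowed)

    def enrol(self, user: str, secret: str, resources: set[str]) -> None:
        # Unsalted SHA-256 keeps the sketch short; it is NOT adequate for real password storage.
        self.credentials[user] = hashlib.sha256(secret.encode()).hexdigest()
        self.privileges[user] = set(resources)

    def authenticate(self, user: str, secret: str) -> bool:
        """Authentication: is the requester who they claim to be?"""
        expected = self.credentials.get(user)
        if expected is None:
            return False
        supplied = hashlib.sha256(secret.encode()).hexdigest()
        return hmac.compare_digest(expected, supplied)  # constant-time comparison

    def authorise(self, user: str, resource: str) -> bool:
        """Authorisation: does this identity hold the privilege for the resource?"""
        return resource in self.privileges.get(user, set())

    def request(self, user: str, secret: str, resource: str) -> bool:
        allowed = self.authenticate(user, secret) and self.authorise(user, resource)
        # Accounting: every attempt is logged, whether granted or refused.
        self.log.append((datetime.now(timezone.utc).isoformat(), user, resource, allowed))
        return allowed

    def usage_report(self) -> dict[str, int]:
        """Management information: number of granted accesses per user."""
        report: dict[str, int] = {}
        for _, user, _, allowed in self.log:
            if allowed:
                report[user] = report.get(user, 0) + 1
        return report
```

Keeping authentication and authorisation as separate checks mirrors the slide's distinction: a correctly identified user can still be refused a resource they hold no privilege for, and both outcomes end up in the accounting log.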
41 The digital certificate An attachment to an electronic message used for security purposes, e.g. to verify that a user sending a message is who he/she claims to be, and to provide the receiver with the means to encrypt a reply. An individual wishing to send an encrypted message applies for a digital certificate from a Certificate Authority (CA). The CA issues a digital certificate, signed with the CA's private key, containing the applicant's public key and a variety of other identification information. The CA makes its own public key readily available through print publicity or on the Internet. The recipient of an encrypted message uses the CA's public key to verify the signature on the digital certificate attached to the message, confirming that it was issued by the CA, and then obtains the sender's public key and identification information held within the certificate. With this information, the recipient can send an encrypted reply.
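The CA's role can be illustrated with textbook RSA: the CA signs a hash of the certificate body with its private exponent, and anyone holding the CA's public key can check that signature. This is a deliberately insecure toy with tiny primes; the `Certificate` class, `ca_issue`, `verify` and the key values are assumptions for illustration, and real certificates (X.509) use much larger keys and standardised signature schemes.

```python
import hashlib
from dataclasses import dataclass

# Toy CA key pair using textbook RSA with tiny primes (insecure; illustration only).
P, Q = 61, 53
N = P * Q                    # public modulus, 3233
PHI = (P - 1) * (Q - 1)      # 3120
E = 17                       # CA public exponent: (E, N) is what the CA publishes
D = pow(E, -1, PHI)          # CA private exponent, 2753 (modular inverse needs Python 3.8+)


def digest(text: str) -> int:
    """Hash the certificate body, then reduce it into the range [0, N)."""
    return int.from_bytes(hashlib.sha256(text.encode()).digest(), "big") % N


@dataclass
class Certificate:
    subject: str             # identification information
    subject_public_key: str  # the applicant's public key (an opaque placeholder here)
    signature: int = 0       # CA signature over subject + key

    def body(self) -> str:
        return f"{self.subject}|{self.subject_public_key}"


def ca_issue(subject: str, subject_public_key: str) -> Certificate:
    """The CA signs the certificate body with its PRIVATE exponent D."""
    cert = Certificate(subject, subject_public_key)
    cert.signature = pow(digest(cert.body()), D, N)
    return cert


def verify(cert: Certificate) -> bool:
    """A recipient checks the signature using only the CA's PUBLIC key (E, N)."""
    return pow(cert.signature, E, N) == digest(cert.body())
```

Once `verify` succeeds, the recipient can trust `subject_public_key` enough to encrypt a reply to the sender. Altering the body changes the digest, so verification fails, except for the roughly 1-in-3233 collision chance that this toy modulus allows.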
42 Digital certificates Digital certificates are required as the means of authenticating individuals in e-science Grid projects; they will become more widespread in normal campus operations. Issues to be investigated: –certificate profiling –life-cycle management of certificates, including revocation mechanisms –key recovery mechanisms –use of certificates on public-access workstations –user mobility (on and off campus) –"mixed economy" working, i.e. use of certificates alongside more traditional forms of electronic credentials –development of open source tools to facilitate deployment of certificates in typical university or college environments
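Life-cycle management and revocation come down to a status check each time a certificate is presented: is it inside its validity period, and has it been withdrawn early? The sketch below assumes a simple set of revoked serial numbers, loosely modelled on the idea of a certificate revocation list; `CertRecord` and `certificate_status` are hypothetical names, not part of any particular toolkit.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class CertRecord:
    serial: int
    subject: str
    not_before: datetime   # start of the validity period
    not_after: datetime    # end of the validity period


def certificate_status(cert: CertRecord, revoked: set[int], now: datetime) -> str:
    """Classify a certificate at time `now` against a set of revoked serials."""
    if cert.serial in revoked:
        return "revoked"        # withdrawn before expiry, e.g. a compromised key or a leaver
    if now < cert.not_before:
        return "not yet valid"
    if now > cert.not_after:
        return "expired"        # the holder needs a renewed certificate
    return "valid"
```

Checking revocation first means a withdrawn certificate is refused even while it is still inside its validity period; distributing an up-to-date revocation list to every public-access workstation is part of the deployment problem the slide lists.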
43 The REAL Challenge Educational Challenges –How to make sure that future generations of scientists and engineers can fully utilize emerging enabling infrastructures New paradigms, methods, objectives How to retrain current scientists and engineers How to make sure that new ideas for extending supporting architectures continue to come from those that are using it
44 Competition v collaboration It is a cultural agenda Assumes we can build virtual organisations We have been indoctrinated to compete! –Why collaborate? Why do I want to open my resources to others –over whom I have little or no control? Who gives me the resources to facilitate collaboration? The Japanese say: We collaborate to compete
45 Reality Checks!! The technology is ready? –Not true: it's emerging and certainly not robust Building middleware, advancing standards, developing dependability, building demonstrators. The computational grid is in advance of the data-intensive middleware Integration and data curation are probably the obstacles But!! It doesn't have to be all there to be useful. We know how we will use grid services? –No: it is a disruptive technology We need to lower the barriers to entry.
46 Grid Evolution 1st Generation Grid –Computationally intensive, file access/transfer –Bag of various heterogeneous protocols & toolkits –Recognises internet, Ignores Web –Academic teams 2nd Generation Grid –Data intensive -> knowledge intensive –Services-based architecture –Recognises Web and Web services –Global Grid Forum –Industry participation We are here!
47 Sharing & Funding The current philosophy is for resource owners to donate some portion of their resources to the Grid. Who will donate resources to the GRID and why? –Will my VC? Quite reasonably, a funding body could argue: –If you only need X units of the resource to do the science you indicated in your case, then that is what you should get. –Alternatively, if you need additional or alternative resources, you should have indicated this in your original request. How should requests be cast? –Should a researcher bid for 110% of what is needed and then put the 10% into the Grid? –Should the user bid for 90% of what's needed and assume the rest comes from the Grid?
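The two bidding options can be made concrete with a little arithmetic. The helper below is a hypothetical illustration (the name `grid_bid` and the use of whole resource units such as CPU hours are assumptions): it returns what is requested and the net flow to the Grid, where a positive value is surplus donated and a negative value is a shortfall the researcher expects the Grid to supply.

```python
def grid_bid(needed_units: int, bid_percent: int) -> tuple[int, int]:
    """Return (units requested, net flow to the Grid) for a bid of bid_percent %.

    Positive net flow: surplus donated to the Grid.
    Negative net flow: shortfall expected from the Grid.
    Integer arithmetic avoids floating-point noise; fractions of a unit are dropped.
    """
    requested = needed_units * bid_percent // 100
    return requested, requested - needed_units
```

For a project needing 1,000 CPU hours, `grid_bid(1000, 110)` gives `(1100, 100)`, i.e. 100 hours put into the Grid, while `grid_bid(1000, 90)` gives `(900, -100)`, i.e. 100 hours expected back from it.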
48 In conclusion The biggest issues to be faced: The real challenge is cultural change in the research community –Getting researchers to see and prepare for the change that is coming –It's not about the infrastructures; they will emerge Resources on a GRID are not, and will not be, free! –Resources have costs –Support for a GRID is equally not free of cost SOMEONE must pay Are we equipping the new graduates and post-graduates for this new world?
Thank you. Prof M.J. Clark, Manchester Computing, The University of Manchester, M13 9PL