Assembling the EVOp Infrastructure
Yehia El-khatib, Gordon S. Blair
School of Computing & Communications, Lancaster University
Outline
EVOp: An introduction
Assembling the infrastructure
Developing the web interface
Why cloud?
Issues influencing cloud uptake
  – What could be done
Summary
EVOp
Environmental Virtual Observatory pilot.
  – 2 years from the start of 2011.
Funded by the UK Natural Environment Research Council (NERC) to help tackle big environmental science questions through:
  – Enabling the integration of a variety of information sources at different granularities and scales.
  – Facilitating the handling of large data sets from different sources.
  – Providing simple access tools to increase engagement from policy makers, local communities and the general public.
Focus on hydrology.
Grant reference NE/I002200/1
EVOp: 4 main user groups
Scientists: Worry less about some of the repetitive tasks. Share and reuse datasets and workflows.
Policy Makers: An open decision support system.
Local Communities: Explore the impact of different practices (farming, water management, etc.).
General Public: Raise awareness of current environmental issues and encourage a wider discussion.
EVOp: 4 main user groups
[Diagram: the four user groups (Scientists, Policy Makers, Local Communities, General Public) alongside the layers Hardware Resources, Virtual Resources, Models, Web Services and Web Interface. Annotation: "Processes, not design."]
Infrastructure
Hybrid model where private resources are normally used, and public resources are used at times of increased load.
Developed our own load balancer to manage resource usage, reducing costs while maintaining user experience.
Might adjust in the future to run experimental services (e.g. tailored workflows) on private resources, and move more streamlined services (e.g. models) to public resources.
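The slides do not describe how the EVOp load balancer decides when to burst to the public cloud. As a minimal sketch, assuming load can be measured as concurrent requests and that every instance, private or public, handles the same hypothetical number of requests, the core provisioning decision could look like this (`plan_capacity` and `Plan` are illustrative names, not part of EVOp):

```python
from typing import NamedTuple


class Plan(NamedTuple):
    """How many instances to run on each cloud (hypothetical structure)."""
    private_instances: int
    public_instances: int


def plan_capacity(demand: int, per_instance: int, max_private: int) -> Plan:
    """Sketch of a cloud-bursting rule: fill the private cloud first,
    then send only the overflow to the paid public cloud.

    demand       -- concurrent requests to serve (assumed measurable)
    per_instance -- requests one instance can handle (assumed equal on both clouds)
    max_private  -- instances the private cloud can host
    """
    if demand < 0 or per_instance <= 0 or max_private < 0:
        raise ValueError("invalid capacity parameters")
    # Ceiling division without floating point: instances needed in total.
    needed = -(-demand // per_instance)
    private = min(needed, max_private)
    # Public instances are only requested once private capacity runs out,
    # so nothing is spent on the public cloud under normal load.
    public = needed - private
    return Plan(private, public)


if __name__ == "__main__":
    print(plan_capacity(demand=35, per_instance=10, max_private=3))
```

With 35 concurrent requests, 10 per instance and room for 3 private instances, this prints `Plan(private_instances=3, public_instances=1)`: one public instance absorbs the overflow. A production balancer would also need some hysteresis so that it does not start and stop public instances on every small fluctuation in load; that is left out of this sketch.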
Infrastructure
Public cloud: Amazon Web Services.
Private cloud managed by Eucalyptus Community Cloud.
  – Provides an open source alternative to EC2 and S3 (similar interfaces).
  – However, moving between Eucalyptus and AWS is not always easy, as images need a lot of preparation beforehand.
  – Moreover, recent versions (1.6+) had stability issues.
  – Also, community support is weak.
Currently testing OpenStack as an alternative, which is also AWS-compatible.
The use of jClouds is very important to us to minimise portability overheads (preventing lock-in to one cloud provider).
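jClouds itself is a Java library, and the following Python sketch neither uses nor reproduces its API. It only illustrates the idea the slide relies on: application code talks to one provider-neutral interface, and the concrete cloud is chosen by configuration. `ComputeProvider`, `FakeProvider`, `PROVIDERS` and `run_job` are hypothetical names, and the in-memory provider stands in for real Eucalyptus or AWS calls.

```python
import itertools
from abc import ABC, abstractmethod
from typing import Callable, Dict


class ComputeProvider(ABC):
    """Provider-neutral interface the application codes against (hypothetical)."""

    @abstractmethod
    def start(self, image: str) -> str:
        """Start an instance from `image` and return its identifier."""

    @abstractmethod
    def stop(self, instance_id: str) -> None:
        """Terminate the given instance."""


class FakeProvider(ComputeProvider):
    """In-memory stand-in for a real cloud back end."""

    def __init__(self, prefix: str):
        self.prefix = prefix
        self.running: Dict[str, str] = {}  # instance id -> image name
        self._ids = itertools.count(1)

    def start(self, image: str) -> str:
        instance_id = f"{self.prefix}-{next(self._ids)}"
        self.running[instance_id] = image
        return instance_id

    def stop(self, instance_id: str) -> None:
        self.running.pop(instance_id, None)


# Switching clouds means changing this key, not the application code.
PROVIDERS: Dict[str, Callable[[], ComputeProvider]] = {
    "private": lambda: FakeProvider("euca"),
    "public": lambda: FakeProvider("aws"),
}


def run_job(provider: ComputeProvider, image: str) -> str:
    """Start an instance, run work on it, then always release it."""
    instance_id = provider.start(image)
    try:
        pass  # the model run would happen here
    finally:
        provider.stop(instance_id)
    return instance_id


if __name__ == "__main__":
    for name in ("private", "public"):
        print(name, run_job(PROVIDERS[name](), "hydrology-model"))
```

The design choice being illustrated is that `run_job` never names a provider; only the registry lookup does, which is the property that keeps a hybrid deployment portable.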
Multifaceted Web Interface
We cater to different user groups of varying backgrounds and experience levels.
  – Users (including scientists) are not IT experts, or at least would rather not be!
  – They do not want to tussle with compatibility issues, security restrictions, stringent rules about citing/sharing, etc.
Developed an intuitive user interface that is tested repeatedly with stakeholders to ensure a low entry barrier for all targeted user groups.
  – Easy to use (find your way around)
  – Easy to understand (comprehend what this offers)
  – Easy to relate to (tweakability, reproducibility, reuse, sharing)
Multifaceted Web Interface
General interface allows users to do things like:
  – Learn about the risk of flooding in their local area.
  – Explore how different farming practices affect such risk.
Authenticated government / local council officials could:
  – Learn about polluting nutrients diffused from different catchments.
  – Examine how policies would affect pollution levels at different scales.
An advanced path allows scientists to compose workflows:
  – A workflow is a pipeline of basic execution units (executables, scripts, web services, etc.).
  – Done in the browser. No programming prerequisites.
  – Allows the sharing of workflows and datasets to promote reuse, citing and collaboration.
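The slide defines a workflow as a pipeline of basic execution units. A minimal sketch of that idea, assuming each unit can be wrapped as a function that takes the previous unit's output (in EVOp the units could be executables, scripts or web services; here they are plain Python functions). `Workflow`, `then`, `describe`, `drop_missing` and `total` are hypothetical names:

```python
from typing import Any, Callable, List

# A unit takes the previous unit's output and returns its own.
Step = Callable[[Any], Any]


class Workflow:
    """A linear pipeline of execution units (hypothetical sketch)."""

    def __init__(self, name: str):
        self.name = name
        self.steps: List[Step] = []

    def then(self, step: Step) -> "Workflow":
        """Append a unit and return the workflow so calls can be chained."""
        self.steps.append(step)
        return self

    def describe(self) -> List[str]:
        """List unit names, e.g. to share or cite the workflow's structure."""
        return [step.__name__ for step in self.steps]

    def run(self, data: Any) -> Any:
        for step in self.steps:
            data = step(data)  # output of one unit feeds the next
        return data


def drop_missing(readings):
    """Remove gaps (None) from a list of gauge readings."""
    return [r for r in readings if r is not None]


def total(readings):
    """Sum the remaining readings."""
    return sum(readings)


if __name__ == "__main__":
    wf = Workflow("total-rainfall").then(drop_missing).then(total)
    print(wf.describe(), wf.run([2, None, 5, 1]))
```

This prints `['drop_missing', 'total'] 8`. Because a workflow here is just an ordered list of units, it can be stored, shared and re-run on another dataset, which is the kind of reuse the slide describes; the browser-based composition itself is not modelled.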
Why Cloud?
Flexibility (Virtualisation)
  – Allows the dynamic provisioning of bespoke environments.
  – Everything from the hardware, platform, libraries, etc. can be customised to suit the exact needs of an application.
  – Very few limitations on what the application can be. Build what we want!
  – To draw a comparison: Grid users are tied to too many specifications of the grid environment: hardware architecture, runtime environment, scheduling interface, and supported application interface. As such, only certain types of jobs can be submitted, and precompiling is sometimes needed to ensure compatibility.
Why Cloud?
Versatile resource management (SOA)
  – All resources have a uniform view.
  – Allows us to support data assets of different origins: in situ gauging stations, warehoused data stores, user-provided data, and external sources.
  – Facilitates sharing and reuse (e.g. of workflows), which promotes a culture of collaboration.
  – Provides abstraction so that data can be used in models and simulations without necessarily giving it away.
  – Provides transparency: details of where and how the data is held are hidden without affecting user experience.
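To make the "uniform view" concrete, here is a minimal sketch, assuming every data asset can be reduced to named numeric series. A model is written once against a single interface, while each origin hides its own storage details. `DataAsset`, `GaugingStation`, `WarehouseTable` and `mean_flow` are hypothetical names, and the in-memory data stands in for real sources.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple


class DataAsset(ABC):
    """Uniform view over data of different origins (hypothetical interface)."""

    @abstractmethod
    def series(self, variable: str) -> List[float]:
        """Return the values for `variable`, however they are stored."""


class GaugingStation(DataAsset):
    """Stand-in for an in situ sensor feed, keyed by variable."""

    def __init__(self, readings: Dict[str, List[float]]):
        self._readings = readings

    def series(self, variable: str) -> List[float]:
        return list(self._readings[variable])


class WarehouseTable(DataAsset):
    """Stand-in for a warehoused store holding (variable, value) rows."""

    def __init__(self, rows: List[Tuple[str, float]]):
        self._rows = rows

    def series(self, variable: str) -> List[float]:
        return [value for name, value in self._rows if name == variable]


def mean_flow(asset: DataAsset) -> float:
    """A 'model' that sees only the interface, not where the data lives."""
    values = asset.series("flow")
    if not values:
        raise ValueError("asset has no flow data")
    return sum(values) / len(values)


if __name__ == "__main__":
    station = GaugingStation({"flow": [1.0, 2.0, 3.0]})
    warehouse = WarehouseTable([("flow", 4.0), ("rain", 9.0), ("flow", 6.0)])
    print(mean_flow(station), mean_flow(warehouse))
```

Adding another origin (for example user-provided uploads) would mean adding one more `DataAsset` subclass, without touching `mean_flow`.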
Why Cloud?
Easy access to resources (IaaS)
  – IaaS: hardware resources as a utility.
  – Allows the infrastructure to scale to meet user demand and maintain quality of service.
  – Peace of mind: issues of reliability, performance, and security at the hardware level are outsourced.
  – Allows us to focus on solving domain-specific problems.
  – No usage quotas (unless you want them).
  – Very few AAA (authentication, authorisation and accounting) hoops to jump through.
…as long as you can pay for that!
Issues Influencing Cloud Uptake
Users see the advantage straight away.
  – Previously a scientist needed to have the data on their computer, develop & calibrate a model, run it, and check the output. Rinse & repeat.
  – They identify ease of use, universal access, and abundance of resources as benefits.
Some data producers are reluctant to provide their data through what they perceive as new, untested means.
  – Some communities are more advanced than others.
Easier to get funding based on the PAYG (pay-as-you-go) economic model.
  – Cut upfront costs. Reduce money spent on unused resources.
Funding bodies still perceive security to be a concern.
  – A cloud is just a computer system.
  – Public cloud service providers have whole teams working on security.
What more could be done?
Introduce national (or even regional) initiatives to regulate cloud service provisioning.
  – This should ease a lot of the worry about trust.
Educate data owners about cloud computing.
  – Difficult.
  – Hopefully success stories (e.g. NGS cloud, EduServ) could alter attitudes.
Educate research communities about available cloud solutions.
  – Teach students cloudnumbers.com rather than MATLAB.
Summary
Cloud resources are easy to steer in order to serve the needs of a scientific application without imposing development restrictions, integration boundaries or deployment difficulties.
Public cloud is convenient, but a hybrid one offers more options.
There are concerns surrounding trust and security (such as data licensing) that affect the uptake of cloud computing in research communities.
  – Some measures could be taken to alleviate such concerns.
Distributed Computing Paradigms
| | HPC | Grid | P2P | Cloud (public) | Cloud (private) |
|---|---|---|---|---|---|
| Ownership (management) | My university | Our universities | Our partners | 3rd party | My university |
| Trust | Very high | High | Trust in partners | Perceived by some as a problem | Very high |
| Reliability | High | Depends on size & partners | Depends on size & partners | Very high | High? |
| Accounting | Individual quotas | Individual / Org. quotas | Difficult… | Pay per use | Homebrewed |
| Access control customisation | Very bad | Bad | Fairly flexible | Very flexible | Very flexible |
| Access | Easy | Complicated | Complicated | Easy | Easy |
| Support | Local sysadmin | Remote sysadmin | Local/Remote sysadmin | 24x7 support | Local sysadmin |
Thank you! Questions?
http://www.evo-uk.org/ @EVOpilot
Yehia El-khatib http://www.comp.lancs.ac.uk/~elkhatib/ @yelkhatib
Gordon S. Blair http://www.comp.lancs.ac.uk/department/staff.php?name=gordon firstname.lastname@example.org