Elastic-R: A cloud platform for web computing, real-time collaboration, rapid applications development and reproducible modelling
Karim Chine, Cloud Era Ltd, firstname.lastname@example.org
BD, 04 February 2011
R
o Open-source (GPL) software environment for statistical computing and graphics.
o Lingua franca of data analysis.
o Repositories of contributed R packages related to a variety of problem domains in life sciences, social sciences, finance, econometrics, chemometrics, etc. are growing at an exponential rate.
o R website: http://www.r-project.org/
o CRAN Task Views: http://cran.r-project.org/web/views/
o CRAN packages: http://cran.cnr.berkeley.edu/
o Bioconductor: http://www.bioconductor.org/
o Rmetrics: https://www.rmetrics.org/
Scientific Computing Environments
o www.scilab.org
o http://root.cern.ch
o www.sagemath.org
o www.sas.com
o office.microsoft.com
o www.mathworks.com
o www.scipy.org
o www.spss.com
o www.wolfram.com
R's Success Story
From: John Fox, Aspects of the Social Organization and Trajectory of the R Project, R Journal, Feb 2009
"Give me a place to stand, and I shall move the earth with a lever" Scientific/Statistical Computing Software, HPC and Usability
Extract from the NetSolve/GridSolve Description Document:
The emergence of Grid computing as the prototype of a next generation cyberinfrastructure for science has excited high expectations for its potential as an accelerator of discovery, but it has also raised questions about whether and how the broad population of research professionals, who must be the foundation of such productivity, can be motivated to adopt this new and more complex way of working. The rise of the new era of scientific modeling and simulation has, after all, been precipitous, and many science and engineering professionals have only recently become comfortable with the relatively simple world of the uniprocessor workstations and desktop scientific computing tools. In that world, software packages such as Matlab and Mathematica represent general-purpose scientific computing environments (SCEs) that enable users, totaling more than a million worldwide, to solve a wide variety of problems through flexible user interfaces that can model in a natural way the mathematical aspects of many different problem domains. Moreover, the ongoing, exponential increase in the computing resources supplied by the typical workstation makes these SCEs more and more powerful, and thereby tends to reduce the need for the kind of resource sharing that represents a major strength of Grid computing. Certainly there are various forces now urging collaboration across disciplines and distances, and the burgeoning Grid community, which aims to facilitate such collaboration, has made significant progress in mitigating the well-known complexities of building, operating, and using distributed computing environments. But it is unrealistic to expect the transition of research professionals to the Grid to be anything but halting and slow if it means abandoning the SCEs that they rightfully view as a major source of their productivity.
We therefore believe that Grid computing’s prospects for success will tend to rise and fall according to its ability to interface smoothly with the general purpose SCEs that are likely to continue to dominate the toolbox of its targeted user base.
Arnold, D., Agrawal, S., Blackford, S., Dongarra, J., Miller, M., Seymour, K., Sagi, K., Shi, Z., and Vadhiyar, S.
Elastic-R is a ubiquitous plug-and-play platform for scientific and statistical computing
o Computational Components: R packages (CRAN, Bioconductor), wrapped C, C++ and Fortran code, Scilab modules, Matlab toolkits, etc. Open source or commercial.
o Computational Resources: hardware- and OS-agnostic computing engines (R, Scilab, ...). Clusters, grids, private or public clouds. Free (academic grids) or pay-per-use (EC2, Azure).
o Computational User Interfaces: workbench within the browser. Built-in views / plugins / spreadsheets. Collaborative views. Open source or commercial.
o Computational Scripts: R / Python / Groovy. On the client side: interactivity, ... On the server side: data transfer, ...
o Generated Computational Web Services: stateful or stateless, automatic mapping of R data objects and functions.
o Computational Application Programming Interfaces: Java / SOAP / REST, stateless and stateful.
o Computational Data Storage: local, NFS, FTP, Amazon S3, Amazon EBS. Free or commercial.
Elastic-R portal: single facade to public and private clouds
Diagram labels: Public Clouds, Private Cloud
Elastic-R is a collaborative Virtual Research Environment. Users can share their machine instances, stateful remote engines, data, etc.
Reproducible research: A scientist can snapshot her computational environment and her data. She can archive the snapshot or share it with others.
Elastic-R Amazon Machine Images
o Elastic-R AMI 1: R 2.10 + BioC 2.5
o Elastic-R AMI 2: R 2.9 + BioC 2.3
o Elastic-R AMI 3: R 2.8 + BioC 2.0
Amazon Elastic Block Stores
o Elastic-R EBS 1: Data Set XXX
o Elastic-R EBS 2: Data Set YYY
o Elastic-R EBS 3: Data Set ZZZ
o Elastic-R EBS 4: Data Set VVV
Combination highlighted in the diagram: Elastic-R AMI 2 (R 2.9 + BioC 2.3) with Elastic-R EBS 4 (Data Set VVV)
Elastic-R.org
Anatomy of an Elastic-R machine instance on Amazon EC2
Diagram labels: HTTPS, SSH, RESTful WS over SSL (three links), SOAP over SSL, Heartbeat
The scientist can control any number of stateful R engines from within an R session on the cloud or on his machine. He can use them for parallel computing.
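The idea on this slide can be illustrated with a small local sketch. It does not use the Elastic-R API: `StatefulEngine` is a hypothetical in-memory stand-in for a remote R engine that keeps its variables between calls, and `parallel_sum_of_squares` is an illustrative name. The sketch scatters slices of a data set to several engines, computes on them concurrently, and gathers the partial results.

```python
from concurrent.futures import ThreadPoolExecutor


class StatefulEngine:
    """Hypothetical stand-in for a remote stateful R engine (not the Elastic-R API)."""

    def __init__(self, name):
        self.name = name
        self.workspace = {}  # variables survive between calls: this is the "state"

    def assign(self, var, value):
        self.workspace[var] = value

    def evaluate(self, func, var):
        # Apply a function to a variable already stored on this engine.
        return func(self.workspace[var])


def sum_of_squares(values):
    return sum(v * v for v in values)


def parallel_sum_of_squares(data, n_engines=3):
    engines = [StatefulEngine(f"engine-{i}") for i in range(n_engines)]
    # Scatter: each engine receives one slice of the data and keeps it.
    for i, engine in enumerate(engines):
        engine.assign("chunk", data[i::n_engines])
    # Compute on every engine concurrently, then gather the partial results.
    with ThreadPoolExecutor(max_workers=n_engines) as executor:
        partials = list(
            executor.map(lambda e: e.evaluate(sum_of_squares, "chunk"), engines)
        )
    return sum(partials)


if __name__ == "__main__":
    print(parallel_sum_of_squares(list(range(10))))  # prints 285
```

Because each engine keeps its slice in its own workspace, later calls could reuse the data without sending it again, which is the practical benefit of stateful engines over stateless calls.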
Software + Services = Applications convergence + ubiquitous collaboration. The server-side toolkit: R + spreadsheet models + virtual GUI widgets.
Elastic-R on Infrastructure-as-a-Service style Cloud
Amazon Virtual Private Cloud
Diagram labels: Subnet 1, Subnet 2, Subnet 3
The Elastic-R portal itself is an EC2 machine instance. Any number of portals can be run on EC2 for decentralized and private collaboration.
Stateful generated Web Services: Elastic-R for workflow workbenches
Workflow steps shown in the diagram:
o LogOn (Login, Pwd, Options): returns a SessionID associated with a reserved Elastic-R Engine
o T1, T2, T3: applied in turn, ES -> ESon1 -> ESon2 -> ESon3, giving f(ES) with f = T3 o T2 o T1
o getData: retrieve data
o logOff: remove ESonx, «clean» the Elastic-R Engine and put it back in the pool, or kill the Elastic-R Engine
Legend:
o T1, T2, T3: generated stateful Web Services for R functions T1, T2 & T3
o LogOn, getData: R-SOAP methods
o ES: ExpressionSet
o ESon1, ESon2, ESon3: ExpressionSet object names
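One possible reading of this workflow, as a self-contained sketch: a session reserves an engine, three services each read one named object on the engine and store their result under a new name, the client retrieves the final object, and logging off cleans the engine and returns it to the pool. Everything here is hypothetical (the `Engine` and `Session` classes, the method names, and the toy transforms `t1`, `t2`, `t3` on plain lists instead of ExpressionSet objects); it is not the R-SOAP interface.

```python
class Engine:
    """Hypothetical in-memory stand-in for a reserved Elastic-R engine."""

    def __init__(self, name):
        self.name = name
        self.objects = {}  # named objects kept on the engine between calls


class Session:
    """Mimics the logOn / getData / logOff life cycle (illustrative names only)."""

    def __init__(self, pool):
        self.pool = pool
        self.engine = None

    def log_on(self):
        self.engine = self.pool.pop()  # reserve an engine for this session

    def put(self, name, value):
        self.engine.objects[name] = value

    def call(self, service, in_name, out_name):
        # A "stateful service": reads one named object, stores the result under another name.
        self.engine.objects[out_name] = service(self.engine.objects[in_name])

    def get_data(self, name):
        return self.engine.objects[name]

    def log_off(self):
        self.engine.objects.clear()    # "clean" the engine
        self.pool.append(self.engine)  # put it back in the pool
        self.engine = None


# Toy stand-ins for the R functions behind T1, T2 and T3.
def t1(es):
    return [x + 1 for x in es]


def t2(es):
    return [x * 2 for x in es]


def t3(es):
    return sum(es)


def run_workflow(pool, es):
    session = Session(pool)
    session.log_on()
    try:
        session.put("ES", es)
        session.call(t1, "ES", "ESon1")
        session.call(t2, "ESon1", "ESon2")
        session.call(t3, "ESon2", "ESon3")
        return session.get_data("ESon3")  # f(ES) with f = T3 o T2 o T1
    finally:
        session.log_off()


if __name__ == "__main__":
    print(run_workflow([Engine("engine-1")], [1, 2, 3]))  # prints 18
```

Only object names travel between the calls; the intermediate objects stay on the engine, which is what makes chaining the services cheap.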
One Amazon account and many users: Elastic-R signed tokens
Diagram labels: AWS Credentials + Private Key; Generate token; Deliver token (XXYYZZ); Activate token; Use token; Launch machine instance; Register machine instance; Use R console; Call R Engine
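The slide does not describe how the tokens are signed, so the following is only a sketch of the general idea: the account owner issues a signed token, and the portal later checks the signature and expiry before acting on the user's behalf, so the user never sees the AWS credentials. HMAC-SHA256 is used here because it is in the Python standard library; the slide's mention of a private key suggests the real system may rely on a public-key signature instead. `SECRET`, the claim names, and both function names are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time

# Hypothetical signing key; stands in for whatever secret the account owner keeps.
SECRET = b"account-owner-signing-key"


def generate_token(user, max_instances, ttl_seconds, now=None):
    """Issue a token carrying the user's allowance, signed with SECRET."""
    now = time.time() if now is None else now
    claims = {"user": user, "max_instances": max_instances, "expires": now + ttl_seconds}
    payload = json.dumps(claims, sort_keys=True).encode()
    signature = hmac.new(SECRET, payload, hashlib.sha256).digest()
    return (
        base64.urlsafe_b64encode(payload).decode()
        + "."
        + base64.urlsafe_b64encode(signature).decode()
    )


def verify_token(token, now=None):
    """Return the claims if the token is authentic and not expired, else None."""
    now = time.time() if now is None else now
    try:
        payload_b64, signature_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        signature = base64.urlsafe_b64decode(signature_b64)
    except ValueError:  # wrong number of parts or invalid base64
        return None
    expected = hmac.new(SECRET, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        return None
    claims = json.loads(payload)
    if claims["expires"] < now:
        return None
    return claims


if __name__ == "__main__":
    token = generate_token("alice", max_instances=2, ttl_seconds=3600)
    print(verify_token(token))
```

A token whose payload has been altered, or that is presented after its expiry time, is rejected before any machine instance is launched.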
Links
Elastic-R Portal: www.elastic-r.org
Articles about the project:
o Chine, K. (2010). Open Science in the Cloud: Towards a Universal Platform for Scientific and Statistical Computing. In Handbook of Cloud Computing (Chapter 19). Springer US.
o Chine, K. (2010). Learning Math and Statistics on the Cloud, Towards an EC2-Based Google Docs-like Portal for Teaching / Learning Collaboratively with R and Scilab. 10th IEEE International Conference on Advanced Learning Technologies (ICALT), pp. 752-753.
o Chine, K. (2009). Scientific Computing Environments in the Age of Virtualization, Toward a Universal Platform for the Cloud. IEEE International Workshop on Open Source Software for Scientific Computation (OSSC), pp. 44-48.
o Chine, K. (2008). Biocep, Towards a Federative, Collaborative, User-Centric, Grid-Enabled and Cloud-Ready Computational Open Platform. Fourth IEEE International Conference on eScience, pp. 321-322.
LinkedIn Group: http://www.linkedin.com/groups?home=&gid=2345405
Elastic-R SOA platform
Diagram labels:
o Nodes: Node 1: Windows XP; Node 2: Mac OS; Node 3: 64-bit server / Linux; Node 4: EC2 virtual machine 1; Node 5: EC2 virtual machine 2
o Infrastructure: Front-end host; Supervisor; Remote Objects Registry; Pool A, Pool B, Pool C; Cloudbursting via Amazon Web Services
o Access protocols: R-HTTP, R-SOAP
o Perl scripts: logOn, use R, logOff
o .NET application: logOn, use R, logOff
o Parallel computing applications: borrow Rs, use Rs, release Rs
o Web application: borrow R, generate graphics/data, release R
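The borrow / use / release cycle named on this slide is a standard resource-pool pattern. A minimal sketch, assuming nothing about how Elastic-R implements its pools: `EnginePool` is a hypothetical class, engines are represented by plain strings, and a context manager guarantees that a borrowed engine is released even if the client code fails.

```python
import queue
from contextlib import contextmanager


class EnginePool:
    """Hypothetical pool of engines; engines are plain names in this sketch."""

    def __init__(self, engine_names):
        self._free = queue.Queue()
        for name in engine_names:
            self._free.put(name)

    @contextmanager
    def borrow(self, timeout=None):
        # Blocks until an engine is free; raises queue.Empty if the timeout expires.
        engine = self._free.get(timeout=timeout)
        try:
            yield engine
        finally:
            self._free.put(engine)  # release, even if the caller raised

    def available(self):
        return self._free.qsize()


if __name__ == "__main__":
    pool = EnginePool(["R-1", "R-2"])
    with pool.borrow() as engine:
        print("using", engine, "with", pool.available(), "engine(s) still free")
    print(pool.available(), "engines free after release")
```

A web application would borrow a single engine per request in this way, while a parallel application would nest or repeat the call to hold several engines at once.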