Revolution Analytics Overview of Revolution R Enterprise

1 Revolution Analytics Overview of Revolution R Enterprise
For the Dallas R User’s Group
Joseph B. Rickert, Marketing Manager

2 Agenda
- Revolution Analytics Today
- Revolution R Enterprise
- Revolution Analytics in the Enterprise
- Big Data with RevoScaleR
- Deploying R Throughout the Enterprise with RevoDeployR

3 Corporate Overview & Quick Facts
“Revolution Analytics is the leading commercial provider of software and support for the open-source R statistical computing language.”
- Founded: 2008 (as REvolution Computing)
- Office locations: Palo Alto (HQ), Seattle (Eng)
- CEO: David Rich
- Number of employees: 40+
- Number of customers: 100+
- Investors: Northbridge Venture Partners, Intel Capital, Presidio Ventures
Our mission: “To make Revolution the world-leading predictive analytics company while continuously improving and contributing to the R experience for the open source community.”

4 Open Source Analytics for the Enterprise
- Most advanced statistical analysis software available
- “The professor who invented analytic software for the experts now wants to take it to the masses”
- Half the cost of commercial alternatives
- Power, Productivity, Enterprise Readiness
- 2M+ users
- 2,500+ applications
- Application areas: Finance, Statistics, Life Sciences, Predictive Analytics, Manufacturing, Retail, Data Mining, Telecom, Social Media, Visualization, Government

5 Revolution R Enterprise
Productivity

6 Revolution R Enterprise has Open-Source R Engine at the core
2,500 community packages and growing exponentially.
Components:
- R Engine
- Language Libraries
- Community Packages
- Multi-Threaded Math Libraries
- Parallel Tools
- Big Data Analysis
- Web Services API
- Developer IDE
- Build Assurance
- Technical Support

7 A network of partners for integrated, large-scale data analysis
Partner categories: Advanced Analytics; Deployment / Consumption; Data Infrastructure

8 Revolution R Enterprise
Performance

9 Performance: Intel MKL Math Libraries
Open Source R compared with Revolution R Enterprise on a 4-core laptop:

| Category | Computation | Open Source R 2.13.2 | Revolution R Enterprise 5.0 | Speedup |
|---|---|---|---|---|
| Linear Algebra (1) | Matrix Multiply | 174.6 sec | 10.4 sec | 15.8x |
| Linear Algebra (1) | Cholesky Factorization | 25.7 sec | 1.4 sec | 17.6x |
| Linear Algebra (1) | Linear Discriminant Analysis | 224.4 sec | 20.1 sec | 7.6x |
| General R Benchmarks (2) | R Benchmarks (Matrix Functions) | 24.9 sec | 3.8 sec | 5.5x |
| General R Benchmarks (2) | R Benchmarks (Program Control) | 4.7 sec | 4.6 sec | Not appreciable |

1. 2.

10 Revolution R Enterprise
Big Data Analysis

11 A common analytic platform across big data architectures
Architectures: Hadoop, file-based, in-database.
- Storage companies are attracting a lot of funding.
- Large enterprises will not choose only one of these architectures; they will likely use several, and each has pros and cons.
- We are the only company that can span all of them, both technologically and relationship-wise.
- Not being forced to silo analysis or rewrite algorithms offers tremendous value to companies.

12 Two Big Data problems: capacity and speed
Capacity: problems handling the size of data sets or models
- Data too big to fit into memory
- Even if it can fit, there are limits on what can be done
- Even simple data management can be extremely challenging
Speed: even without a capacity limit, computation may be too slow to be useful

13 RevoScaleR: Big Data Analysis for Revolution R Enterprise
- Distributed Statistical Algorithms: address performance by distributing computations across cores and computers
- External Memory Programming Framework: addresses capacity through a collection of functions for chunking through massive data files
- XDF File Format: a novel high-speed file format designed specifically to support statistical analyses
- R Language Interface: a familiar, high-productivity programming paradigm for R users

14 The basis for a solution for capacity, speed, distributed and streaming data: PEMAs
- Parallel external memory algorithms (PEMAs) allow solution of both capacity and speed problems, and can deal with distributed and streaming data.
- External memory algorithms are those that allow computations to be split into pieces so that not all of the data has to be in memory at one time.
- It is possible to “automatically” parallelize and distribute such algorithms.
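
The splitting idea described above can be sketched in a few lines. This is only an illustration in Python, not RevoScaleR code: the names `Moments`, `read_chunks`, `summarize`, and `mean_and_variance` are hypothetical, and the sum-of-squares formula is chosen for brevity rather than numerical robustness. Each chunk is reduced to a small intermediate result, and intermediate results are merged with an associative operation, which is what makes it possible to process chunks in any order or on different workers.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple


@dataclass
class Moments:
    """Intermediate result for a chunk: count, sum, and sum of squares."""
    n: int = 0
    total: float = 0.0
    total_sq: float = 0.0

    def merge(self, other: "Moments") -> "Moments":
        # Merging is associative, so chunks may be combined in any order.
        return Moments(self.n + other.n,
                       self.total + other.total,
                       self.total_sq + other.total_sq)


def read_chunks(values: Iterable[float], chunk_size: int) -> Iterator[List[float]]:
    """Yield successive chunks so only one chunk is in memory at a time."""
    chunk: List[float] = []
    for v in values:
        chunk.append(v)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def summarize(chunk: List[float]) -> Moments:
    """Reduce one chunk to its intermediate result."""
    return Moments(len(chunk), sum(chunk), sum(v * v for v in chunk))


def mean_and_variance(values: Iterable[float], chunk_size: int) -> Tuple[float, float]:
    """Compute the mean and sample variance without holding all data in memory."""
    acc = Moments()
    for chunk in read_chunks(values, chunk_size):
        acc = acc.merge(summarize(chunk))
    mean = acc.total / acc.n
    variance = (acc.total_sq - acc.n * mean * mean) / (acc.n - 1)
    return mean, variance
```

Because `summarize` depends only on its own chunk, the calls could be handed to separate threads or machines and the results merged afterwards without changing the answer.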

15 RevoScaleR on a Multicore Server
[Diagram: a multicore processor (4, 8, 16+ cores) running RevoScaleR. Core 0 (Thread 0), Core 1 (Thread 1), Core 2 (Thread 2), ..., Core n (Thread n) share memory; data blocks are read from disk.]
- A RevoScaleR algorithm is provided a data source as input.
- The algorithm loops over the data, reading a block at a time. Blocks of data are read by a separate worker thread (Thread 0).
- Other worker threads (Threads 1..n) process the data block from the previous iteration of the data loop and update intermediate results objects in memory.
- When all of the data has been processed, a master results object is created from the intermediate results objects.
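
The loop described on this slide can be imitated with a small thread pool. The sketch below is an assumption-laden illustration in Python, not the RevoScaleR implementation: `block_sum` and `work` are hypothetical names, the "statistic" is just a sum, and Python threads do not give true CPU parallelism, so only the structure carries over. One task reads the next block while the worker tasks process slices of the block from the previous iteration, each updating its own intermediate result.

```python
from concurrent.futures import ThreadPoolExecutor


def block_sum(blocks, n_workers=3):
    """Sum all values in an iterable of blocks, overlapping reads with processing."""
    partials = [0.0] * n_workers  # one intermediate result per worker

    def work(i, block):
        # Worker i handles every n_workers-th element of the current block.
        partials[i] += sum(block[i::n_workers])

    source = iter(blocks)
    # n_workers processing threads plus one thread for reading.
    with ThreadPoolExecutor(max_workers=n_workers + 1) as pool:
        pending_read = pool.submit(next, source, None)
        while True:
            block = pending_read.result()
            if block is None:  # data source exhausted
                break
            # Start reading the next block before processing the current one.
            pending_read = pool.submit(next, source, None)
            tasks = [pool.submit(work, i, block) for i in range(n_workers)]
            for t in tasks:
                t.result()  # wait for all workers to finish this block

    # Master result built from the intermediate results.
    return sum(partials)
```

A generator that reads blocks from a file can be passed as `blocks`; the read then runs in the pool while the previous block is still being processed.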

16 RevoScaleR for Distributed Computing Clusters
[Diagram: a master node (RevoScaleR) connected to four compute nodes (RevoScaleR), each with its own data partition.]
- Portions of the data source are made available to each compute node.
- RevoScaleR on the master node assigns a task to each compute node.
- Each compute node independently processes its data and returns its intermediate results to the master node.
- The master node aggregates the intermediate results from all compute nodes and produces the final result.
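
As a rough illustration of this master/compute-node pattern, the Python sketch below fits a simple linear regression from per-partition sums. It is not RevoScaleR code: `node_task` and `master_fit` are hypothetical names, and the compute nodes are simulated by ordinary function calls, whereas on a real cluster each call would run on a separate machine against its own data partition.

```python
def node_task(partition):
    """Runs on a compute node: reduce its (x, y) pairs to five sums."""
    n = len(partition)
    sx = sum(x for x, _ in partition)
    sy = sum(y for _, y in partition)
    sxx = sum(x * x for x, _ in partition)
    sxy = sum(x * y for x, y in partition)
    return (n, sx, sy, sxx, sxy)


def master_fit(partitions):
    """Runs on the master node: aggregate node results and solve y = a + b*x."""
    # Stand-in for remote execution: one task per data partition.
    results = [node_task(p) for p in partitions]
    # Aggregation is an element-wise sum of the intermediate results.
    n, sx, sy, sxx, sxy = (sum(column) for column in zip(*results))
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope
```

Only five numbers per node travel back to the master, regardless of how many rows each partition holds.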

17 Platform-agnostic Big Data Analytics
- Set the “compute context” to define the hardware (one line of code)
- The native job scheduler handles distribution, monitoring, failover, etc.
- The same code runs on other supported architectures; just change the compute context
- Supported architectures:
  - Windows: Microsoft HPC Server
  - Linux: Platform Computing LSF (coming 2012)
- 42 seconds instead of 6 minutes

18 R and Hadoop
- Hadoop offers a scalable infrastructure for processing massive amounts of data
  - Storage: HDFS, HBase
  - Distributed computing: MapReduce
- R is a statistical programming language for developing advanced analytic applications
- Currently, writing analytics for Hadoop requires a combination of Java, Pig, Python, …
- The RHadoop project makes it possible to write PEMAs for Hadoop using the R language alone.

19 Massively parallel/distributed analytics: RevoConnectR for Hadoop
Write MapReduce analytics using only R code with these R packages:
- rhdfs: R and HDFS
- rhbase: R and HBase
- rmr: R and MapReduce
[Diagram: a Revolution R client uses rmr to talk to the Job Tracker, rhdfs to reach HDFS, and rhbase (via Thrift) to reach HBase; each Task Node runs R for the Map or Reduce step.]
More information at: bit.ly/r-hadoop
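
For readers unfamiliar with the MapReduce model that these packages target, here is a minimal single-process sketch in Python. It does not use or imitate the rmr API: `map_fn`, `reduce_fn`, and `map_reduce` are hypothetical names, and the (carrier, delay) records are made-up illustrative data. The map step emits key/value pairs, a shuffle step groups values by key, and the reduce step collapses each group to one result.

```python
from collections import defaultdict


def map_fn(record):
    """Map step: emit (key, (value, count)) for one input record."""
    carrier, delay = record
    yield carrier, (delay, 1)


def reduce_fn(key, values):
    """Reduce step: collapse all values for one key into a mean."""
    total = sum(delay for delay, _ in values)
    count = sum(c for _, c in values)
    return key, total / count


def map_reduce(records, map_fn, reduce_fn):
    """Run map, shuffle (group by key), and reduce in a single process."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)  # shuffle: gather values by key
    return dict(reduce_fn(key, values) for key, values in groups.items())
```

On a Hadoop cluster the map and reduce functions would run on many task nodes and the shuffle would move data between them; the functions themselves keep the same shape.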

20 In-Database Execution with IBM Netezza

21 Revolution R Enterprise
Enterprise Deployment

22 Revolution R Web Services: RevoDeployR
[Diagram: on one side, “Data Sources & Creation of Analytics” (Data Analysis, R / Statistical Modeling Expert); in the middle, Revolution “RevoDeployR” (Deployment Expert); on the other side, “Consumption of Analytics & Results” (Business Intelligence, Interactive Web Apps, Cloud / SaaS).]
- Enables models and analytics to be consumed by end users through BI applications, for example risk analysis and sales forecasting
- Implemented as a collection of Web Services that allow easy integration into many different third-party applications
- Same technology that provides the statistical engine for the GUI

23 Twitter: @RevolutionR
Thank you. The leading commercial provider of software and support for the popular open source R statistics language.

