Big Data Hands-On Labs:
Tuesday 3:45pm – 4:45pm, Hotel Nikko – Peninsula
Wednesday 1:15pm – 2:15pm
Thursday 11:30am – 12:30pm
Or download: Big Data Lite Virtual Machine
Oracle Big Data Appliance for Customers and Partners Jean-Pierre Dijcks Oracle Big Data Product Management Paul Kent SAS VP Big Data
Oracle Big Data Appliance for Customers and Partners
1. Big Data Appliance Recap
2. Why You Should Consider Big Data Appliance
3. Driving Business Value with SAS on Big Data Appliance
4. Q&A
Oracle Big Data Management System Oracle Big Data SQL Oracle Database Oracle Industry Models Oracle Advanced Analytics Oracle Spatial & Graph Cloudera Hadoop Oracle NoSQL Database Oracle R Advanced Analytics for Hadoop Oracle R Distribution Oracle Database Oracle Advanced Security Oracle Advanced Analytics Oracle Spatial & Graph Oracle Big Data Connectors Oracle Data Integrator Big Data Appliance Oracle Exadata SOURCES
Recap: Big Data Appliance Overview Big Data Appliance X4-2 Sun Oracle X4-2L Servers with per server: 2 * 8 Core Intel Xeon E5 Processors 64 GB Memory 48TB Disk space Integrated Software: Oracle Linux, Oracle Java VM Oracle Big Data SQL* Cloudera Distribution of Apache Hadoop – EDH Edition Cloudera Manager Oracle R Distribution Oracle NoSQL Database * Oracle Big Data SQL is separately licensed
Recap: Standard and Modular Starter Rack is fully cabled and configured for growth with 6 servers In-Rack Expansion delivers a 6-server modular expansion block Full Rack delivers an optimal blend of capacity and expansion options Grow by adding racks – up to 18 racks without additional switches
Recap: Harness Rapid Evolution
BDA 1.0 – Jan 2012: Initial BDA, Mammoth Install
BDA 2.x – April 2013: Starter Rack, In-Rack Expansion, EM Integration
BDA 3.x – April 2014: CDH 5.0 (MR2 & YARN), AAA Security, Encryption
BDA 4.0 – Sept 2014: Big Data SQL, Node Migration
Core Design Principles for Big Data Appliance Operational Simplicity Simplify Access to ALL Data
Core Design Principles for Big Data Appliance Oracle Big Data SQL Oracle SQL on ALL your data All Native Oracle SQL Operators Smart Scan for Optimized Performance Oracle Security Govern all Data through a Single Set of Security Policies Operational Simplicity Simplify Access to ALL Data
Oracle Big Data SQL – A New Architecture Powerful, high-performance SQL on Hadoop Full Oracle SQL capabilities on Hadoop SQL query processing local to Hadoop nodes Simple data integration of Hadoop and Oracle Database Single SQL point-of-entry to access all data Scalable joins between Hadoop and RDBMS data Optimized hardware Balanced Configurations No bottlenecks Big Data SQL represents a new architecture for querying data in its natural format, wherever it lives, and – when running on Oracle Big Data Appliance and Oracle Exadata – provides a world-class Big Data Management System. Oracle Confidential – Internal/Restricted/Highly Restricted
Big Data SQL 10's of Gigabytes of Data Hadoop Cluster Oracle Database SELECT w.sess_id, c.name FROM web_logs w, customers c WHERE w.source_country = 'Brazil' AND w.cust_id = c.customer_id; Relevant SQL runs on BDA nodes (CUSTOMERS, WEB_LOGS); only the columns and rows needed to answer the query are returned. Big Data SQL's Smart Scan capability radically reduces the cost of joining data with Oracle Database as well. When a join between massive data in Hadoop and smaller data in Oracle occurs, Big Data SQL can process rows using Bloom filters. This ensures that only data from Hadoop that meets the join conditions is transmitted back to the database. As before, this can reduce the amount of data being transmitted and processed by the database by an order of magnitude or more. In this case, Oracle Database is left joining only average-sized row sets. By processing data at the source, whether it is stored in Hadoop or Oracle Database, Big Data SQL ensures the best possible use of all the compute resources in a Big Data Management System.
Big Data SQL: SQL Push Down SELECT w.sess_id, c.name FROM web_logs w, customers c WHERE w.source_country = 'Brazil' AND w.cust_id = c.customer_id; Pushed down to Hadoop: scans on unstructured data; WHERE clause evaluation; column projection; Bloom filters for better join performance; JSON parsing; data mining model evaluation. Relevant SQL runs on BDA nodes (CUSTOMERS, WEB_LOGS); only the columns and rows needed to answer the query are returned.
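The Bloom-filter pruning described above can be sketched in Python. This is a toy illustration, not Oracle's implementation: the BloomFilter class, tables, and ids are invented for the example, and real Big Data SQL builds the filter from the database-side join keys and evaluates it inside the Hadoop-side scan.

```python
# Toy sketch: a Bloom filter built from database-side join keys lets the
# Hadoop side discard non-matching rows before anything crosses the wire.
import hashlib

class BloomFilter:
    def __init__(self, size_bits=1024, hashes=3):
        self.size = size_bits
        self.hashes = hashes
        self.bits = 0                       # bit array packed into one int
    def _positions(self, key):
        for i in range(self.hashes):
            h = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.size
    def add(self, key):
        for p in self._positions(key):
            self.bits |= 1 << p
    def might_contain(self, key):
        # may report false positives, never false negatives
        return all(self.bits >> p & 1 for p in self._positions(key))

# "Database side": build the filter from the customer join keys.
customers = {101: "Ana", 102: "Bruno"}      # customer_id -> name
bf = BloomFilter()
for cust_id in customers:
    bf.add(cust_id)

# "Hadoop side": scan the web logs, ship back only rows that might join.
web_logs = [{"sess_id": 1, "cust_id": 101, "source_country": "Brazil"},
            {"sess_id": 2, "cust_id": 999, "source_country": "Brazil"},
            {"sess_id": 3, "cust_id": 102, "source_country": "Chile"}]
shipped = [r for r in web_logs
           if r["source_country"] == "Brazil" and bf.might_contain(r["cust_id"])]
```

Rows failing the WHERE clause or the filter never leave the scan, which is why the database ends up joining an order of magnitude less data.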
Oracle Communications Data Model Reference Architecture Data Sources Oracle Comms Apps (BSS/OSS) Oracle Comms Ntwk Products (Tekelec & Acme) Other Oracle Apps (CRM, ERP, etc.) Third Party Sources Data Management Big Data Platform (Hadoop/NoSQL) Relational Data Warehouse (OCDM) ETL/ELT Adapters Customer Experience Real-Time Adapters Operations Third Party Monetization Adapters Analytic Apps Feedback Loop To Other Apps
Core Design Principles for Big Data Appliance Operational Simplicity Simplify Access to ALL Data
Core Design Principles for Big Data Appliance No Bottlenecks Full Stack Install and Upgrades Simplified Management Cluster Growth Critical Node Migration Always Highly Available Always Secure Very Competitive Price Point Operational Simplicity Simplify Access to ALL Data
Successful Big Data Systems Grow From Cluster Install with HA to Large Clusters to Dealing with Operational Issues Day 1 12-node BDA for Production Hadoop HA and Security Set-up Ready to Load Data Full install with a single command: ./mammoth -i rck_1 A small example using the NameNodes (HA setup) to show how things change automatically on a BDA and how critical-node migration happens. RCK_1
Successful Big Data Systems Grow From Cluster Install with HA to Large Clusters to Dealing with Operational Issues Day 1 RCK_1 N Example Service: Hadoop Name Nodes
Successful Big Data Systems Grow From Cluster Install with HA to Large Clusters to Dealing with Operational Issues Day 90 Add 12 New Nodes across two Racks Cluster expansion with a single command: mammoth -e newhost1,…,newhostn RCK_1 RCK_2 N N
Successful Big Data Systems Grow From Cluster Install with HA to Large Clusters to Dealing with Operational Issues Cluster expansion with a single command: mammoth -e newhost1,…,newhostn RCK_1 RCK_2 This expansion automatically optimizes HA setup across multiple racks N Because of uniform nodes and IB networking, no data is moved N
Successful Big Data Systems Grow From Cluster Install with HA to Large Clusters to Dealing with Operational Issues Day n Critical Node Failure => Primary Name Node RCK_1 RCK_2 N N
Successful Big Data Systems Grow From Cluster Install with HA to Large Clusters to Dealing with Operational Issues Automatic Failover to other NameNode Automatic Service Request to Oracle for HW Failure RCK_1 RCK_2 N N
Successful Big Data Systems Grow From Cluster Install with HA to Large Clusters to Dealing with Operational Issues Restore HA with a single command: bdacli admin_cluster migrate N1 Reinstate the repaired node with a single command: bdacli admin_cluster reprovision N1 RCK_1 RCK_2 N N
Core Design Principles for Big Data Appliance – Operational Simplicity 30% Quicker to Deploy 21% Cheaper to Buy "Oracle Big Data Appliance is an excellent choice for customers looking to work with the full suite of Cloudera's leading Hadoop-based technology. It's more cost-effective and quicker to deploy than a DIY cluster." – Mike Olson, Cloudera founder, Chief Strategy Officer, and Chairman of the Board It's logical that an engineered system would be quicker to deploy than building your own. It's the cost claim that people don't believe. But we worked with an analyst firm, ESG, who measured list prices and did the comparison: the BDA is at least 20% cheaper than building your own comparable cluster, assuming you have the time and skills to do so. The key word here is comparable – plenty of people tell me they could build one cheaper, but it turns out to have a fraction of the storage. These are large, dense storage nodes; with cheap pizza-box servers you can fill a rack much more cheaply, but you'll have far less storage.
Big Data Initiative @ Oracle Global Support Services Real-time access to better data means better insights, which means better decisions and better business results Integrate data associated with customer telemetry, configurations, service history, diagnostics, knowledge & support information Anticipate Detect Predict Automate Delight
Core Design Principles Enable Success Operational Simplicity Simplify Access to ALL Data
There is one more thing… Business Value = Applications
Big Data Appliance powers instant Business Value Customer Experience Management Communications Data Model Cyber Security Solutions
Introducing Paul Kent - SAS
Big Data and Big Analytics – So Much more Gunpowder! Paul Kent VP BigData, SAS Research and Development
1. Change 2. Safari Pics
[CON8279] Oracle Big Data Appliance: Deep Dive and Roadmap for Customers and Partners Oracle Big Data Appliance is the premier Hadoop appliance in the market. This session describes the roadmap for customers in the areas of high-performance SQL on Hadoop and securing big data, plus overall performance improvements for Hadoop. A special focus in the session is the roadmap and benefits Oracle Big Data Appliance brings to Oracle partners. To illustrate the benefits of running on a standardized and optimized Hadoop platform, SAS presents the findings of its tests of SAS In-Memory Analytics on Oracle Big Data Appliance.
Agenda
SAS & Oracle Partnership
Family Stories: Hadoop, Oracle Engineered Systems Family, SAS Software Family
Deployment Patterns
Reflection on a stronger partnership than ever Both leaders in Big Data – jointly solving the most difficult and demanding Big Data problems Providing simplicity and agility to create flexible configurations Extensive engineering collaboration Can we answer: How does it work? How does it perform?
the tamoxifen dilemma SOURCE: http://commons.wikimedia.org/wiki/File:Tamoxifen-3D-vdW.png
Agenda
SAS & Oracle Partnership
Family Stories: Hadoop, Oracle Engineered Systems Family, SAS Software Family
Deployment Patterns
Elephant :: 3 Good Ideas !! Never forgets Is a good (hard) worker Is a Social Animal (teamwork)
Hadoop – Simplified View Controller Worker Nodes MPP (Massively Parallel) hardware running database-like software "data" is stored in parts, across multiple worker nodes "work" operates in parallel, on the different parts of the table
Idea #1 - HDFS. Never forgets! Head Node Data 1 Data 2 Data 3 Data 4… MYFILE.TXT ..block1 -> block1 ..block2 -> block2 ..block3 -> block3
Idea #1 - HDFS. Never forgets! Head Node Data 1 Data 2 Data 3 Data 4… MYFILE.TXT ..block1 -> block1 block1 copy2 ..block2 -> block2 block2 copy2 ..block3 -> block3 copy2 block3
Idea #1 - HDFS. Never forgets! Head Node Data 1 Data 2 Data 3 Data 4… MYFILE.TXT ..block1 -> block1 block1 copy2 ..block2 -> block2 block2 copy2 ..block3 -> block3 copy2 block3 X X
Redundancy Wins!
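The replication idea behind "Redundancy Wins!" can be sketched as a toy model. The node names and the replication factor of 2 are assumptions to keep the example small; real HDFS defaults to 3 replicas and uses rack-aware placement, and this is in no way the actual HDFS code.

```python
# Toy model of HDFS replication: each block lives on several data nodes,
# so losing one node loses no data, and the name node can re-replicate.
REPLICATION = 2  # HDFS defaults to 3; 2 keeps the example small

nodes = {"data1": set(), "data2": set(), "data3": set(), "data4": set()}

def place_blocks(blocks, replication=REPLICATION):
    """Round-robin placement of each block onto `replication` distinct nodes."""
    names = sorted(nodes)
    for i, blk in enumerate(blocks):
        for r in range(replication):
            nodes[names[(i + r) % len(names)]].add(blk)

def fail_node(name):
    """Simulate a node failure, then re-replicate under-replicated blocks."""
    lost = nodes.pop(name)
    for blk in lost:
        holders = [n for n, blks in nodes.items() if blk in blks]
        if holders and len(holders) < REPLICATION:
            # copy from a surviving replica to a node that lacks the block
            target = next(n for n in sorted(nodes) if blk not in nodes[n])
            nodes[target].add(blk)

place_blocks(["block1", "block2", "block3"])
fail_node("data1")
# every block still has REPLICATION copies after recovery
```

The point of the slide sequence above is exactly this invariant: after the X-marked node dies, every block is still fully replicated somewhere else.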
Idea #2 – MapReduce – Send the work to the Data We Want the Youngest Person in the Room Each Row in the audience is a data node I’ll be the coordinator From outside to center, accumulate MIN Sweep from back to front. Youngest Advances
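The audience demo above maps directly onto MapReduce: each row computes its local minimum (the map side, where the work goes to the data), and the coordinator compares only the per-row winners (the reduce side). A minimal sketch with made-up ages:

```python
# Each inner list is one "row" of the audience: the ages sitting in it.
rows = [
    [34, 29, 41, 52],
    [27, 38, 45],
    [31, 30, 26, 60, 44],
]

def map_phase(row):
    """Each row accumulates its own MIN locally -- work goes to the data."""
    return min(row)

def reduce_phase(partials):
    """The coordinator sees one value per row, never every age."""
    return min(partials)

partials = [map_phase(row) for row in rows]   # -> [29, 27, 26]
youngest = reduce_phase(partials)             # -> 26
```

The coordinator handles three numbers instead of twelve, which is the whole trick: the data never moves, only the tiny per-partition results do.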
Agenda
SAS & Oracle Partnership
Family Stories: Hadoop, Oracle Engineered Systems Family, SAS Software Family
Deployment Patterns
Recap: Standard and Modular Starter Rack is fully cabled and configured for growth with 6 servers In-Rack Expansion delivers a 6-server modular expansion block Full Rack delivers an optimal blend of capacity and expansion options Grow by adding racks – up to 18 racks without additional switches
Oracle Big Data SQL – A New Architecture Powerful, high-performance SQL on Hadoop Full Oracle SQL capabilities on Hadoop SQL query processing local to Hadoop nodes Simple data integration of Hadoop and Oracle Database Single SQL point-of-entry to access all data Scalable joins between Hadoop and RDBMS data Optimized hardware Balanced Configurations No bottlenecks Big Data SQL represents a new architecture for querying data in its natural format, wherever it lives, and – when running on Oracle Big Data Appliance and Oracle Exadata – provides a world-class Big Data Management System.
Diversity. It’s a good thing! Impala Nyala
Agenda
SAS & Oracle Partnership
Family Stories: Hadoop, Oracle Engineered Systems Family, SAS Software Family
Deployment Patterns
4 Important Things #1 Join the Family
SAS/ACCESS to Hadoop SAS SERVER HADOOP HiveQL #2 Be Familiar
SAS / Embedded Process
SAS SERVER | HADOOP
SAS/Scoring Accelerator for Hadoop
SAS/Code Accelerator for Hadoop
SAS/Data Quality Accelerator for Hadoop
SAS Data Step & DS2:

    proc ds2;
       /* thread ~ equivalent to a mapper */
       thread map_program;
          method run();
             set dbmslib.intab;
             /* program statements */
          end;
       endthread;
    run;

    /* program wrapper */
    data hdf.data_reduced;
       dcl thread map_program map_pgm;
       method run();
          set from map_pgm threads=N;
          /* reduce steps */
       end;
    enddata;
    run;
    quit;
SAS / High Performance Analytics HADOOP SAS HPA Procedures SAS SERVER #3 Use the Cluster!
SAS / High Performance Analytics
Prepare: HPDS2, HPDMDB, HPSAMPLE, HPSUMMARY
Explore / Transform: HPCORR, HPREDUCE, HPIMPUTE, HPBIN
Model: HPLOGISTIC, HPREG, HPNEURAL, HPNLIN, HPCOUNTREG, HPMIXED, HPSEVERITY, HPFOREST, HPSVM, HPDECIDE, HPQLIM, HPLSO, HPSPLIT, HPTMINE, HPTMSCORE
Which HPA procs? This is the short list; these run in Hadoop today:
HPDS2 – Parallel execution of DS2
HPDMDB – Metadata definitions and data summarization
HPSAMPLE – Sampling and data partitioning
HPSUMMARY – Summarization and descriptive statistics
HPCORR – Pearson correlation coefficients, three nonparametric measures of association, and the probabilities associated with these statistics
HPREDUCE – Unsupervised variable selection and covariance/correlation analysis
HPIMPUTE – Missing value replacement
HPBIN – Binning
HPLOGISTIC – Logistic regression and variable selection
HPREG – Linear regression and variable selection
HPLMIXED – Linear mixed models
HPNEURAL – Neural networks
HPNLIN – Nonlinear regression and maximum likelihood
HPCOUNTREG – Regression of count variables
HPMIXED – Mixed linear models
HPFOREST – Random forest
HPSVM – Support vector machine
HPDECIDE – Decision / cost
HPLSO – Lasso
HPTMINE – Text mining
HPTMSCORE – Text scoring
PROC HPREG – High-performance combination of REG and GLMSELECT. Supports classical variable selection techniques, modern variable selection techniques (LAR, LASSO), CLASS variables, GLM and reference parameterizations, and the SELECTION statement.
PROC HPNLIN – High-performance combination of NLIN and NLP/NLMIXED: classical nonlinear least squares (Levenberg-Marquardt), maximum likelihood for built-in distributions, maximum likelihood for general user-specified objective functions, boundaries, and linear equality/inequality constraints.
SAS / High Performance Analytics Controller Client Some processes are more complex than fits "nicely" inside the terms & conditions of the container. We can use the embedded process as a data acquisition channel and yet perform the mathematics elsewhere (and in the first generation, elsewhere meant other operating-system processes on the same server – preserving a symmetric, 1:1 balance between data parallelism and mathematics parallelism). 2012 – SAS High Performance appliances for Teradata, Greenplum, Oracle, and Hadoop
#1 Join the Family #2 Be Familiar #3 Use the cluster #4 Have a pretty face!
SAS Visual Analytics Interactive exploration, dashboards and reporting Auto-charting automatically picks the best graph Forecasting, scenario analysis, Decision Trees and other analytic visualizations Text analysis and content categorization Feature-rich mobile apps for iPad® and Android
SAS Visual Statistics July-2014 Interactive, visual application for statistical modeling and classification Multiple methods: Logistic, Regression, GLM, Trees, Forest, Clustering, and more… Model comparison and assessment Group-By Processing
4 Important Things (for cluster friendly software) Join the Family Be Familiar Performance Have a pretty face
Agenda
SAS & Oracle Partnership
Family Stories: Hadoop, Oracle Engineered Systems Family, SAS Software Family
Deployment Patterns
SAS Big Data on Big Data Appliance Flexible architectural options for SAS deployments Can run on Starter, Half, and Full configurations Optionally select nodes "N, N-1, N-2, …" for additional SAS services such as SAS Compute Tier, SAS MidTier Optionally select node subset "N, N-1, N-2, N-3, …" for more dedicated resources for the SAS Analytic Compute Environment by shifting Big Data Appliance roles Option to selectively add more memory on a per-node basis depending on specific workload distribution
STARTER BDA SAS HPA Root Node SAS Visual Analytics Metadata Server SAS Compute SAS Midtier … … SAS Visual Analytics, high-Performance Analytic Compute environment Co-Located with Hadoop
STARTER BDA SAS HPA Root Node SAS Visual Analytics Metadata Server SAS Compute SAS Midtier Consider: Extra Memory for nodes 5, 6? … … SAS Visual Analytics, High-Performance Analytic Compute environment Co-Located with Hadoop
FULL RACK BDA SAS HPA Root Node Metadata Server SAS Compute SAS Midtier LASR Worker 18 HDFS Data 18 LASR Worker 17 … … HDFS Data 17 … SAS Visual Analytics, high-Performance Analytic Compute environment Co-Located with Hadoop
FULL RACK BDA – Assembled in OSC, SYDNEY AUSTRALIA
FULL RACK BDA – Assembled in OSC, SYDNEY AUSTRALIA
Basic Smoke Tests Confirmed:
Interoperate with Hadoop and Map Reduce
Read and write text files to/from HDFS
Read and write tabular files to/from Hive (will confirm Oracle BIGSQL in OSC-SC)
Read and write SAS binary format files to/from HDFS
High Degree of Parallelism (DOP) reads via Map-Only jobs
SAS LASR server co-exists on/with datanodes
SAS HPA tasks scheduled on datanodes
SAS High-Performance Analytics Performance
SAS Format Data (SASHDAT)
               1107 var, 11.795 Mobs   73.744 Mobs           Ratio
               97GB (5.7GB/node)       608GB (35.7GB/node)   (6x the data)
Create         208.79 sec              2284.29 sec           11x
Scan/Count     24.60 sec               259.38 sec            10.5x
HPCORR         295.20                  1410.40               4.7x
HPCNTREG       336.79                  1547.59               4.6x
HPREDUCE (u)   236.55                  2467.76               10.4x
HPREDUCE (s)   219.50                  2037.74               9.3x
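As a sanity check, the ratios on this slide can be recomputed directly from the two timing columns (values copied from the slide; the deck rounds slightly differently in places, e.g. 11 for Create and 4.7 for HPCORR):

```python
# Recompute the ratio column: ~6x more data costs roughly 5-11x more time,
# depending on the procedure. Timings are (97GB run, 608GB run) in seconds.
timings = {
    "Create":       (208.79, 2284.29),
    "Scan/Count":   (24.60,  259.38),
    "HPCORR":       (295.20, 1410.40),
    "HPCNTREG":     (336.79, 1547.59),
    "HPREDUCE (u)": (236.55, 2467.76),
    "HPREDUCE (s)": (219.50, 2037.74),
}
ratios = {k: round(big / small, 1) for k, (small, big) in timings.items()}
# e.g. ratios["Scan/Count"] -> 10.5, ratios["HPREDUCE (s)"] -> 9.3
```

The I/O-bound steps (Create, Scan) scale close to the data growth; the compute-heavy procs (HPCORR, HPCNTREG) scale better than the data growth.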
OSC-AU FullRack BDA: 408 Threads, 600 GB dataset, 17 servers. Your Problem solved ASAP
Exadata Integration SAS Embedded Processing (EP) to Exadata, leveraging Big Data SQL SAS HPA Root Node SAS Visual Analytics Metadata Server SAS Compute SAS Midtier LASR Worker 18 … … HDFS Data 18 SAS EP Big Data SQL
SAS High-Performance Analytics Performance
SAS EP Parallel Data Feeders: DOP=1 vs. DOP=24 (flash cache)
Add(5): 1.25min / 1.5min / .5min
Add(20): 2.5min
Add(100): 13min / .6min
Add(200): 16min / ~2min / 1.25min (10x)
Table 1: Summation of 5/20/100/200 columns; Baseline: DOP=1 (no parallelism); 120M rows, 400 columns, reg_simtbl_400
SAS High-Performance Analytics Performance
SAS EP Parallel Data Feeders
               Access     Access / DBSlice   SAS HPA Using EP
Reg_sim_200    1:01:12    0:28:37            0:08:00
Reg_sim_400    1:49:11    0:55:33            0:16:05 (7x!)
Table 2: Scan times for 2 tables (200 columns, 400 columns, 120M rows); Baseline: SAS/ACCESS vs. HPA EP feeder
SAS High-Performance Analytics Performance
SAS Format Data (SASHDAT) and Oracle EXADATA
               SASHDAT                 EXADATA               SASHDAT
               1107 var, 11.795 Mobs   907 var, 79.7GB       73.744 Mobs, 608GB
               97GB (5.7GB/node)       (4.7GB/node)          (35.7GB/node)
Create         208.79 sec              931.22 sec            2284.29 sec
Scan/Count     24.60 sec               956.16 sec            259.38 sec
HPCORR         295.20                  833.24                1410.40
HPCNTREG       336.79                  756.97                1547.59
HPREDUCE (u)   236.55                  1055.11               2467.76
HPREDUCE (s)   219.50                  1051.93               2037.74
Oracle Engineered Systems Family: ExaData, ExaLogic, SuperCluster, Big Data Appliance, ZFS Storage Appliance, Virtual Compute Appliance, Database Backup, Recovery, Logging Appliance
SAS and Oracle – Working together to create customer value Joint R&D and Product Management teams in Cary and Redwood Shores Focus on driving SAS technology components to run natively in Oracle Database Joint performance engineering optimizations Template physical architectures developed based on use-cases, physically tested and benchmarked together Reduction in physical effort Overall reduction in lifecycle costs Best Practice papers SAS and Oracle engineers provide joint "Sizing and Architecture Analysis and Design"
SAS and Oracle – Better Together
Paul.Kent@sas.com | @hornpolish | paulmkent