1
Big Data Hands-On Labs:
Date Time Location Tuesday 3:45pm – 4:45pm Hotel Nikko - Peninsula Wednesday 1:15pm – 2:15pm Thursday 11:30am – 12:30pm Or download: Big Data Lite Virtual Machine
2
Oracle Big Data Appliance for Customers and Partners
Jean-Pierre Dijcks Oracle Big Data Product Management Paul Kent SAS VP Big Data
3
Oracle Big Data Appliance for Customers and Partners
1 Big Data Appliance Recap Why You Should Consider Big Data Appliance Driving Business Value with SAS on Big Data Appliance Q&A 2 3 4
4
Oracle Big Data Management System
Oracle Big Data SQL Oracle Database Oracle Industry Models Oracle Advanced Analytics Oracle Spatial & Graph Cloudera Hadoop Oracle NoSQL Database Oracle R Advanced Analytics for Hadoop Oracle R Distribution Oracle Database Oracle Advanced Security Oracle Advanced Analytics Oracle Spatial & Graph Oracle Big Data Connectors Oracle Data Integrator Big Data Appliance Oracle Exadata SOURCES
5
Recap: Big Data Appliance Overview
Big Data Appliance X4-2 Sun Oracle X4-2L Servers with per server: 2 * 8 Core Intel Xeon E5 Processors 64 GB Memory 48TB Disk space Integrated Software: Oracle Linux, Oracle Java VM Oracle Big Data SQL* Cloudera Distribution of Apache Hadoop – EDH Edition Cloudera Manager Oracle R Distribution Oracle NoSQL Database * Oracle Big Data SQL is separately licensed
6
Recap: Standard and Modular
The Starter Rack is fully cabled and configured for growth with 6 servers; In-Rack Expansion delivers a 6-server modular expansion block; the Full Rack delivers an optimal blend of capacity and expansion options; grow by adding racks – up to 18 racks without additional switches
7
Recap: Harness Rapid Evolution
BDA 1.0 – Jan 2012: Initial BDA, Mammoth Install
BDA 2.x – April 2013: Starter Rack, In-Rack Expansion, EM Integration
BDA 3.x – April 2014: CDH 5.0 (MR2 & YARN), AAA Security, Encryption
BDA 4.0 – Sept 2014: Big Data SQL, Node Migration
8
Core Design Principles for Big Data Appliance
Operational Simplicity Simplify Access to ALL Data
9
Core Design Principles for Big Data Appliance
Oracle Big Data SQL Oracle SQL on ALL your data All Native Oracle SQL Operators Smart Scan for Optimized Performance Oracle Security Govern all Data through a Single Set of Security Policies Operational Simplicity Simplify Access to ALL Data
10
Oracle Big Data SQL – A New Architecture
Powerful, high-performance SQL on Hadoop Full Oracle SQL capabilities on Hadoop SQL query processing local to Hadoop nodes Simple data integration of Hadoop and Oracle Database Single SQL point-of-entry to access all data Scalable joins between Hadoop and RDBMS data Optimized hardware Balanced Configurations No bottlenecks Big Data SQL represents a new architecture for querying data in its natural format, wherever it lives, and – when running on Oracle Big Data Appliance and Oracle Exadata – provides a world-class Big Data Management System. Oracle Confidential – Internal/Restricted/Highly Restricted
11
Big Data SQL 10’s of Gigabytes of Data Hadoop Cluster Oracle Database
SELECT w.sess_id, c.name FROM web_logs w, customers c WHERE w.source_country = 'Brazil' AND w.cust_id = c.customer_id; Relevant SQL runs on BDA nodes CUSTOMERS WEB_LOGS Big Data SQL Only the columns and rows needed to answer the query are returned Big Data SQL’s Smart Scan capability radically reduces the cost of joining Hadoop data with Oracle Database as well. When a join occurs between massive data in Hadoop and smaller data in Oracle, Big Data SQL can filter rows using Bloom filters, ensuring that only the Hadoop data that meets the join conditions is transmitted back to the database. As before, this can reduce the amount of data transmitted and processed by the database by an order of magnitude or more, leaving Oracle Database to join only modestly sized data sets. By processing data at the source, whether it’s stored in Hadoop or Oracle Database, Big Data SQL ensures the best possible use of all the compute resources in a Big Data Management System. 10’s of Gigabytes of Data Hadoop Cluster Oracle Database
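The Bloom-filter mechanics described above can be sketched in a few lines of Python. This is an illustrative toy, not Big Data SQL's actual implementation: the filter is built from the small side of the join (the customer keys), then applied while scanning the big side (the web logs), so only probable join matches are shipped back. False positives are possible; false negatives are not.

```python
from hashlib import sha256

class BloomFilter:
    """Toy Bloom filter: k hash functions over a fixed-size bit array."""
    def __init__(self, size=1024, hashes=3):
        self.size, self.hashes = size, hashes
        self.bits = [False] * size

    def _positions(self, item):
        for i in range(self.hashes):
            digest = sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item):
        return all(self.bits[pos] for pos in self._positions(item))

# Build the filter from the small side of the join (customers in Oracle)...
customer_ids = {101, 205, 310}
bloom = BloomFilter()
for cid in customer_ids:
    bloom.add(cid)

# ...then scan the massive side (web logs in Hadoop), shipping back only
# rows whose cust_id might satisfy the join condition.
web_logs = [(1, 101), (2, 999), (3, 310), (4, 888)]  # (sess_id, cust_id)
shipped = [row for row in web_logs if bloom.might_contain(row[1])]
```

Every true match is guaranteed to be in `shipped`; the database then performs the exact join on this much smaller set.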
12
Big Data SQL SQL Push Down in Big Data SQL
SELECT w.sess_id, c.name FROM web_logs w, customers c WHERE w.source_country = 'Brazil' AND w.cust_id = c.customer_id; SQL Push Down in Big Data SQL Hadoop Scans on Unstructured Data WHERE Clause Evaluation Column Projection Bloom Filters for Better Join Performance JSON Parsing, Data Mining Model Evaluation Relevant SQL runs on BDA nodes CUSTOMERS WEB_LOGS Big Data SQL Only the columns and rows needed to answer the query are returned 10’s of Gigabytes of Data Hadoop Cluster Oracle Database
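The WHERE-clause evaluation and column projection in the push-down list above can be illustrated with a small Python sketch. The row data and the `smart_scan` helper are hypothetical stand-ins, not Oracle APIs; the point is that filtering and projection happen where the data sits, so only qualifying rows and needed columns travel to the database tier.

```python
# Hypothetical web-log rows; column names follow the query on the slide.
web_logs = [
    {"sess_id": 1, "cust_id": 101, "source_country": "Brazil", "url": "/a"},
    {"sess_id": 2, "cust_id": 205, "source_country": "US",     "url": "/b"},
    {"sess_id": 3, "cust_id": 310, "source_country": "Brazil", "url": "/c"},
]

def smart_scan(rows, predicate, columns):
    """Evaluate the WHERE clause and project columns at the storage side,
    so only qualifying rows and needed columns reach the database tier."""
    return [{c: row[c] for c in columns} for row in rows if predicate(row)]

shipped = smart_scan(
    web_logs,
    predicate=lambda r: r["source_country"] == "Brazil",
    columns=("sess_id", "cust_id"),
)
# Only 2 of 3 rows, and 2 of 4 columns, leave the Hadoop tier.
```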
13
Oracle Communications Data Model
Reference Architecture Data Sources Oracle Comms Apps (BSS/OSS) Oracle Comms Ntwk Products (Tekelec & Acme) Other Oracle Apps (CRM, ERP, etc.) Third Party Sources Data Management Big Data Platform (Hadoop/NoSQL) Relational Data Warehouse (OCDM) ETL/ELT Adapters Customer Experience Real-Time Adapters Operations Third Party Monetization Adapters Analytic Apps Feedback Loop To Other Apps
14
Core Design Principles for Big Data Appliance
Operational Simplicity Simplify Access to ALL Data
15
Core Design Principles for Big Data Appliance
No Bottlenecks Full Stack Install and Upgrades Simplified Management Cluster Growth Critical Node Migration Always Highly Available Always Secure Very Competitive Price Point Operational Simplicity Simplify Access to ALL Data
16
Successful Big Data Systems Grow
From Cluster Install with HA to Large Clusters to Dealing with Operational Issues Day 1 12 node BDA for Production Hadoop HA and Security Set-up Ready to Load Data Full install with a single command: ./mammoth -i rck_1 This small example uses the Name Nodes (HA setup) to show how things change automatically on a BDA and how critical node migration happens. RCK_1
17
Successful Big Data Systems Grow
From Cluster Install with HA to Large Clusters to Dealing with Operational Issues Day 1 RCK_1 N Example Service: Hadoop Name Nodes
18
Successful Big Data Systems Grow
From Cluster Install with HA to Large Clusters to Dealing with Operational Issues Day 90 Add 12 New Nodes across two Racks Cluster expansion with a single command: mammoth -e newhost1,…,newhostn RCK_1 RCK_2 N N
19
Successful Big Data Systems Grow
From Cluster Install with HA to Large Clusters to Dealing with Operational Issues Cluster expansion with a single command: mammoth -e newhost1,…,newhostn RCK_1 RCK_2 This expansion automatically optimizes HA setup across multiple racks N Because of uniform nodes and IB networking, no data is moved N
20
Successful Big Data Systems Grow
From Cluster Install with HA to Large Clusters to Dealing with Operational Issues Day n Critical Node Failure => Primary Name Node RCK_1 RCK_2 N N
21
Successful Big Data Systems Grow
From Cluster Install with HA to Large Clusters to Dealing with Operational Issues Automatic Failover to the other NameNode Automatic Service Request to Oracle for HW Failure RCK_1 RCK_2 N N
22
Successful Big Data Systems Grow
From Cluster Install with HA to Large Clusters to Dealing with Operational Issues Restore HA with a Single Command: bdacli admin_cluster migrate N1 Reinstate the Repaired Node with a Single Command: bdacli admin_cluster reprovision N1 RCK_1 RCK_2 N N
23
Core Design Principles for Big Data Appliance
30% Quicker to Deploy 21% Cheaper to Buy “Oracle Big Data Appliance is an excellent choice for customers looking to work with the full suite of Cloudera’s leading Hadoop-based technology. It’s more cost-effective and quicker to deploy than a DIY cluster.” – Mike Olson, Cloudera founder, Chief Strategy Officer, and Chairman of the Board Operational Simplicity It’s logical that an engineered system would be quicker to deploy than building your own. It’s the price claim that people don’t believe. But we worked with an analyst firm, ESG, which compared list prices and found the BDA is at least 20% cheaper than building your own comparable cluster, assuming you have the time and skills to do so. The key word here is comparable – many people say they could build one cheaper, but it turns out with a fraction of the storage. These are large, dense storage nodes; with cheap pizza-box servers you can fill a rack much more cheaply, but with far less storage.
24
Big Data Initiative @ Oracle Global Support Services
Real-time access to better data means better insights, which means better decisions and better business results Integrate data associated with customer telemetry, configurations, service history, diagnostics, knowledge & support information Anticipate Detect Predict Automate Delight
25
Core Design Principles Enable Success
Operational Simplicity Simplify Access to ALL Data
26
There is one more thing…
Business Value = Applications
27
Big Data Appliance powers instant Business Value
Customer Experience Management Communications Data Model Cyber Security Solutions
28
Introducing Paul Kent - SAS
29
Big Data and Big Analytics – So Much more Gunpowder!
Paul Kent VP BigData, SAS Research and Development
31
[CON8279] Oracle Big Data Appliance: Deep Dive and Roadmap for Customers and Partners
Oracle Big Data Appliance is the premier Hadoop appliance in the market. This session describes the roadmap for customers in the areas of high-performance SQL on Hadoop and securing big data, plus overall performance improvements for Hadoop. A special focus in the session is the roadmap and benefits Oracle Big Data Appliance brings to Oracle partners. To illustrate the benefits of running on a standardized and optimized Hadoop platform, SAS presents the findings of its tests of SAS In-Memory Analytics on Oracle Big Data Appliance.
32
Agenda SAS & Oracle Partnership Family Stories Deployment Patterns
Hadoop Oracle Engineered Systems Family SAS Software Family Deployment Patterns
33
Reflection on a stronger partnership than ever
Both leaders in Big Data – Jointly solving the most difficult and demanding Big Data Problems Providing simplicity and agility to create flexible configurations Extensive engineering collaboration Can we answer: How Does it Work? How Does it Perform? 2014 #SASGF13 Copyright © 2013, SAS Institute Inc. All rights reserved.
34
SOURCE: http://commons.wikimedia.org/wiki/File:Tamoxifen-3D-vdW.png
the tamoxifen dilemma
35
Agenda SAS & Oracle Partnership Family Stories Deployment Patterns
Hadoop Oracle Engineered Systems Family SAS Software Family Deployment Patterns
37
Elephant :: 3 Good Ideas !! Never forgets Is a good (hard) worker
Is a Social Animal (teamwork)
38
Hadoop – Simplified View
Controller Worker Nodes MPP (Massively Parallel) hardware running database-like software “data” is stored in parts, across multiple worker nodes “work” operates in parallel, on the different parts of the table
39
Idea #1 - HDFS. Never forgets!
Head Node Data 1 Data 2 Data 3 Data 4… MYFILE.TXT ..block1 -> block1 ..block2 -> block2 ..block3 -> block3
40
Idea #1 - HDFS. Never forgets!
Head Node Data 1 Data 2 Data 3 Data 4… MYFILE.TXT ..block1 -> block1 block1 copy2 ..block2 -> block2 block2 copy2 ..block3 -> block3 copy2 block3
41
Idea #1 - HDFS. Never forgets!
Head Node Data 1 Data 2 Data 3 Data 4… MYFILE.TXT ..block1 -> block1 block1copy2 ..block2 -> block2 block2 copy2 ..block3 -> block3 copy2 block3 X X
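The recovery behavior in the HDFS slides above – a data node dies, and every block it held is re-replicated from the surviving copies – can be sketched in Python. The node names and the replication factor of 2 follow the diagrams, not HDFS itself (which defaults to 3 copies and uses rack-aware placement).

```python
REPLICATION = 2  # per the diagrams; real HDFS defaults to 3 copies

# block -> set of data nodes currently holding a copy
block_map = {
    "block1": {"data1", "data2"},
    "block2": {"data2", "data3"},
    "block3": {"data3", "data4"},
}
live_nodes = {"data1", "data2", "data3", "data4"}

def handle_node_failure(block_map, live_nodes, failed):
    """Drop the failed node, then re-replicate each under-replicated
    block onto a surviving node that lacks a copy."""
    live_nodes.discard(failed)
    for holders in block_map.values():
        holders.discard(failed)
        while len(holders) < REPLICATION:
            candidates = sorted(live_nodes - holders)
            if not candidates:
                break  # fewer live nodes than the replication factor
            holders.add(candidates[0])

handle_node_failure(block_map, live_nodes, "data3")
# Every block is back at full replication on surviving nodes.
```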
42
Redundancy Wins!
43
Idea #2 – MapReduce – Send the work to the Data
We Want the Youngest Person in the Room Each Row in the audience is a data node I’ll be the coordinator From outside to center, accumulate MIN Sweep from back to front. Youngest Advances
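The audience demo above is exactly the MapReduce shape: each data node computes a local minimum in parallel (map), and the coordinator combines the small per-node results (reduce). A minimal Python sketch, with invented ages:

```python
from functools import reduce

# Each "row of the audience" is a data node holding its part of the ages.
partitions = [
    [34, 29, 51],  # row 1
    [42, 27, 33],  # row 2
    [38, 45, 30],  # row 3
]

# Map: every node finds its local minimum in parallel --
# the work is sent to where the data sits.
local_minima = [min(part) for part in partitions]

# Reduce: the coordinator combines the small per-node results.
youngest = reduce(min, local_minima)  # 27
```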
44
Agenda SAS & Oracle Partnership Family Stories Deployment Patterns
Hadoop Oracle Engineered Systems Family SAS Software Family Deployment Patterns
45
Recap: Standard and Modular
The Starter Rack is fully cabled and configured for growth with 6 servers; In-Rack Expansion delivers a 6-server modular expansion block; the Full Rack delivers an optimal blend of capacity and expansion options; grow by adding racks – up to 18 racks without additional switches
46
Oracle Big Data SQL – A New Architecture
Powerful, high-performance SQL on Hadoop Full Oracle SQL capabilities on Hadoop SQL query processing local to Hadoop nodes Simple data integration of Hadoop and Oracle Database Single SQL point-of-entry to access all data Scalable joins between Hadoop and RDBMS data Optimized hardware Balanced Configurations No bottlenecks Big Data SQL represents a new architecture for querying data in its natural format, wherever it lives, and – when running on Oracle Big Data Appliance and Oracle Exadata – provides a world-class Big Data Management System.
47
Diversity. It’s a good thing!
Impala Nyala
48
Agenda SAS & Oracle Partnership Family Stories Deployment Patterns
Hadoop Oracle Engineered Systems Family SAS Software Family Deployment Patterns
49
4 Important Things #1 Join the Family
50
SAS ACCESS to Hadoop SAS SERVER HADOOP HiveQL #2 Be Familiar
51
SAS / Embedded Process SAS SERVER SAS/Scoring Accelerator for Hadoop
SAS Data Step & DS2 SAS SERVER

proc ds2;
  /* thread ~ equiv to a mapper */
  thread map_program;
    method run();
      set dbmslib.intab;
      /* program statements */
    end;
  endthread;
run;

/* program wrapper */
data hdf.data_reduced;
  dcl thread map_program map_pgm;
  method run();
    set from map_pgm threads=N;
    /* reduce steps */
  end;
enddata;
run;
quit;

SAS/Scoring Accelerator for Hadoop SAS/Code Accelerator for Hadoop SAS/Data Quality Accelerator for Hadoop
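The DS2 thread/data pattern above – a mapper program run once per data partition, and a wrapper that reads from the threads to reduce – has a direct analog in Python, sketched here with a thread pool. The partitioned input and the summation task are invented for illustration; this is the shape of the pattern, not SAS's implementation.

```python
from concurrent.futures import ThreadPoolExecutor

# Invented input table, split as Hadoop would partition it.
partitions = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]

def map_program(rows):
    """Analog of the DS2 'thread' block: runs once per partition."""
    return sum(rows)

def reduce_program(partials):
    """Analog of the DS2 'data' wrapper that reads from the threads."""
    return sum(partials)

with ThreadPoolExecutor() as pool:
    partials = list(pool.map(map_program, partitions))

total = reduce_program(partials)  # 45
```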
52
SAS / High Performance Analytics
HADOOP SAS HPA Procedures SAS SERVER #3 Use the Cluster!
53
SAS / High Performance Analytics
Prepare: HPDS2, HPDMDB, HPSAMPLE, HPSUMMARY
Explore / Transform: HPCORR, HPREDUCE, HPIMPUTE, HPBIN
Model: HPLOGISTIC, HPREG, HPNEURAL, HPNLIN, HPCOUNTREG, HPMIXED, HPSEVERITY, HPFOREST, HPSVM, HPDECIDE, HPQLIM, HPLSO, HPSPLIT, HPTMINE, HPTMSCORE
These HPA procs (the short list) run in Hadoop today:
HPDS2 – parallel execution of DS2
HPDMDB – metadata definitions and data summarization
HPSAMPLE – sampling and data partitioning
HPSUMMARY – summarization and descriptive statistics
HPCORR – Pearson correlation coefficients, three nonparametric measures of association, and the probabilities associated with these statistics
HPREDUCE – unsupervised variable selection; covariance/correlation analysis, variable reduction
HPIMPUTE – missing value replacement
HPBIN – binning
HPLOGISTIC – logistic regression and variable selection
HPREG – linear regression and variable selection; a high-performance combination of REG and GLMSELECT supporting classical variable selection techniques, modern variable selection techniques (LAR, LASSO), CLASS variables, GLM and reference parameterizations, and the SELECTION statement
HPNEURAL – neural networks
HPNLIN – nonlinear regression and maximum likelihood; a high-performance combination of NLIN and NLP/NLMIXED supporting classical nonlinear least squares (Levenberg-Marquardt), maximum likelihood for built-in distributions and for general user-specified objective functions, and boundary and linear equality/inequality constraints
HPCOUNTREG – regression of count variables
HPMIXED – linear mixed models
HPFOREST – random forest
HPSVM – support vector machine
HPDECIDE – decision / cost
HPLSO – lasso
HPTMINE – text mining
HPTMSCORE – text scoring
54
SAS / High Performance Analytics
Controller Client Some processes are more complex than fits “nicely” inside the terms & conditions of the container. We can use the embedded process as a data acquisition channel, and yet perform the mathematics elsewhere (in the first generation, elsewhere meant other operating-system processes on the same server – preserving a symmetric, 1:1 balance between the data parallelism and the mathematics parallelism). 2012 – SAS High Performance appliances for Teradata, Greenplum, Oracle and Hadoop
56
#1 Join the Family #2 Be Familiar #3 Use the cluster #4 Have a pretty face!
57
SAS Visual Analytics Interactive exploration, dashboards and reporting
Auto-charting automatically picks the best graph Forecasting, scenario analysis, Decision Trees and other analytic visualizations Text analysis and content categorization Feature-rich mobile apps for iPad® and Android
59
SAS Visual Statistics July-2014
Interactive, visual application for statistical modeling and classification Multiple methods: logistic regression, GLM, trees, forests, clustering and more… Model comparison and assessment Group-By Processing
61
4 Important Things (for cluster friendly software)
Join the Family Be Familiar Performance Have a pretty face
62
Agenda SAS & Oracle Partnership Family Stories Deployment Patterns
Hadoop Oracle Engineered Systems Family SAS Software Family Deployment Patterns
63
SAS Big Data on Big Data Appliance
Flexible Architectural options for SAS deployments Can run on Starter, Half and Full configurations Optionally select nodes “N, N-1, N-2, …” for additional SAS Services such as SAS Compute Tier, SAS MidTier Optionally select node subset “N, N-1, N-2, N-3, …” for more dedicated resources for the SAS Analytic Compute Environment by shifting Big Data Appliance roles Option to selectively add more memory on a per-node basis depending on specific workload distribution
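The “N, N-1, N-2, …” node-selection convention above can be expressed as a tiny helper. This is a hypothetical sketch of the numbering scheme only, not a BDA or SAS deployment tool:

```python
def sas_service_nodes(cluster_size, service_count):
    """Nodes N, N-1, N-2, ... counted down from the top of the cluster,
    reserved for SAS services; lower nodes keep their Hadoop roles."""
    if service_count > cluster_size:
        raise ValueError("more SAS services than cluster nodes")
    return list(range(cluster_size, cluster_size - service_count, -1))

# On a 6-node starter rack, reserve the top two nodes for SAS tiers:
print(sas_service_nodes(6, 2))  # [6, 5]
```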
64
STARTER BDA SAS HPA Root Node SAS Visual Analytics Metadata Server SAS Compute SAS Midtier … … SAS Visual Analytics, high-Performance Analytic Compute environment Co-Located with Hadoop
65
… … STARTER BDA Consider: Extra Memory for 5,6?
SAS HPA Root Node SAS Visual Analytics Metadata Server SAS Compute SAS Midtier Consider: Extra Memory for 5,6? … … SAS Visual Analytics, high-Performance Analytic Compute environment Co-Located with Hadoop
66
FULL RACK BDA SAS HPA Root Node Metadata Server SAS Compute SAS Midtier LASR Worker 18 HDFS Data 18 LASR Worker 17 … … HDFS Data 17 … SAS Visual Analytics, high-Performance Analytic Compute environment Co-Located with Hadoop
67
Assembled in OSC, SYDNEY AUSTRALIA
FULL RACK BDA Assembled in OSC, SYDNEY AUSTRALIA
70
Assembled in OSC, SYDNEY AUSTRALIA
FULL RACK BDA Assembled in OSC, SYDNEY AUSTRALIA Basic Smoke Tests Confirmed: Interoperate with Hadoop and Map Reduce Read and Write text files to/from HDFS Read and Write Tabular files to/from Hive (will confirm Oracle BIGSQL in OSC-SC) Read and Write SAS binary format files to/from HDFS High Degree Of Parallelism (DOP) reads via Map-Only jobs SAS LASR server co-exists on/with datanodes SAS HPA tasks scheduled on datanodes
71
SAS High-Performance Analytics Performance SAS Format Data (SASHDAT)
SAS High-Performance Analytics Performance – SAS Format Data (SASHDAT)
1107 var: Mobs, 97GB, 5.7GB/node | Mobs, 608GB, 35.7GB/node (6x)
Create: 11
Scan/Count: 24.60 sec, 10.5
HPCORR: 295.20, 4.7
HPCNTREG: 336.79, 4.6
HPREDUCE (u): 236.55, 10.4
HPREDUCE (s): 219.50, 9.3
72
OSC-AU FullRack BDA 408 Threads 600 GB dataset 17 servers
Your Problem solved ASAP
76
… … Exadata IntegraTion SAS Embedded Processing (EP) to Exadata
Leveraging Big Data SQL SAS HPA Root Node SAS Visual Analytics Metadata Server SAS Compute SAS Midtier LASR Worker 18 … … HDFS Data 18 SAS EP Big Data SQL
77
SAS High-Performance Analytics Performance
SAS High-Performance Analytics Performance – SAS EP Parallel Data Feeders
DOP=1 vs. DOP=24 (flash cache):
Add(5): 1.25min, 1.5min, .5min
Add(20): 2.5min
Add(100): 13min, .6min
Add(200): 16min, ~2min, 1.25min (10x)
Table 1: Summation of 5/20/100/200 columns; Baseline: DOP=1 (no parallelism); 120M rows, 400 columns, reg_simtbl_400
78
SAS High-Performance Analytics Performance
SAS High-Performance Analytics Performance – SAS EP Parallel Data Feeders
Access | Access / DBSlice | SAS HPA Using EP
Reg_sim_200: 1:01:12 | 0:28:37 | 0:08:00
Reg_sim_400: 1:49:11 | 0:55:33 | 0:16:05 (7x!)
Table 2: Scan times for 2 tables (200 columns, 400 columns, 120M rows); Baseline: SAS/ACCESS vs. HPA EP feeder
79
SAS High-Performance Analytics Performance
SAS High-Performance Analytics Performance – SAS Format Data (SASHDAT) and Oracle EXADATA
1107 var: Mobs, 97GB, 5.7GB/node (SASHDAT) | 907 var: 79.7GB, 4.7GB/node (EXADATA) | Mobs, 608GB, 35.7GB/node
Scan/Count: 24.60 sec
HPCORR: 295.20 | 833.24
HPCNTREG: 336.79 | 756.97
HPREDUCE (u): 236.55
HPREDUCE (s): 219.50
80
Oracle Engineered Systems for
ExaData ExaLogic SuperCluster Big Data Appliance ZFS Storage Appliance Virtual Compute Appliance Database Backup, Recovery, Logging Appliance
81
Working together to create customer value
SAS and Oracle Working together to create customer value Joint R & D development and Product Management teams in Cary and Redwood Shores Focus on driving SAS technology components to run natively in Oracle database Joint performance engineering optimizations Template physical architectures developed based on use-cases Physically tested and benchmarked together Reduction in physical effort Overall reduction in lifecycle costs Best Practice papers SAS and Oracle Engineers provide joint "Sizing and Architecture Analysis and Design"
82
SAS and Oracle Paul.Kent @ sas.com @hornpolish paulmkent
Better Together
83