SAS Analytic Solutions Running on a Hadoop Cluster using YARN

Slides:



Advertisements
Similar presentations
Chapter 1 Business Driven Technology
Advertisements

Can’t We All Just Get Along? Sandy Ryza. Introductions Software engineer at Cloudera MapReduce, YARN, Resource management Hadoop committer.
The Internet of Riedwaan Bassadien Platform Strategy Manager Microsoft Everything Your things.
South Central SAS Users’ Group November 5, 2012 Philip Easterling
Setting Big Data Capabilities Free How to Make Business on Big Data? Stig Torngaard, Partner Platon.
IBM SPSS Solutions A SELECT INTERNATIONAL COMPANY.
Resource Management with YARN: YARN Past, Present and Future
SAS solutions SAS ottawa platform user society nov 20th 2014.
Advance Analytics Capabilities
Tableau Visual Intelligence Platform
© 2004 Visible Systems Corporation. All rights reserved. 1 (800) 6VISIBLE Holistic View of the Enterprise Business Development Operations.
Workload Management BMO Financial Group Case Study IRMAC, January 2008 Sorina Faur, Database Development Manager.
Copyright © 2011 OpTier Ltd. All rights reserved. Contents subject to change without notice.Confidential. Optimizing Performance and Capacity in Private.
1 Community (Optimize both Yarn & Non Yarn Hadoop clusters)
Hadoop tutorials. Todays agenda Hadoop Introduction and Architecture Hadoop Distributed File System MapReduce Spark 2.
Copyright © 2008 SAS Institute Inc. All rights reserved. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks.
© 2014 IBM Corporation Introduction to Cognos Business Intelligence.
Amadeus Travel Intelligence ‘Monetising’ big data sets
Copyright © 2011 SAS Institute Inc. All rights reserved. Performance & Portfolio Management: SAS ® IT Intelligence for the Air Force Personnel Operations.
Copyright © 2012, SAS Institute Inc. All rights reserved. BIG DATA ANALYTICS FOR DEVELOPMENT.
Copyright © 2009, SAS Institute Inc. All rights reserved. Creating a Practical Social Media Strategy for Authors, Customers, and Colleagues: A SAS Publishing.
Next Generation of Apache Hadoop MapReduce Arun C. Murthy - Hortonworks Founder and Architect Formerly Architect, MapReduce.
This presentation was scheduled to be delivered by Brian Mitchell, Lead Architect, Microsoft Big Data COE Follow him Contact him.
Page 1 © Hortonworks Inc – All Rights Reserved Hortonworks Naser Ali UK Building Energy Management Group Hadoop: A Data platform for businesses.
© 2011 IBM Corporation Smarter Software for a Smarter Planet The Capabilities of IBM Software Borislav Borissov SWG Manager, IBM.
Efficient BI Solution Presented by: Leo Khaskin, PowerCubes Lab Value of Information as Business Asset.
Tyson Condie.
Big Data Adoption Drivers Sources: US Data: IDC 2012 Vertical IT & Communications Survey. N = 4177 LatAm Data: PRELIMINARY RESULTS from IDC Latin America.
©2014 Experian Information Solutions, Inc. All rights reserved. Experian Confidential.
Department of Business Information & Analytics MSMESB: Experience with Adding Analytics to the Academic Program Kellie Keeling University of Denver.
Copyright © 2003, SAS Institute Inc. All rights reserved. Company confidential - for internal use only 1 Know Your Customers SAS® Banking Intelligence.
Big Data. What is Big Data? Big Data Analytics: 11 Case Histories and Success Stories
Almost 4 decades of Advanced Analytics & DM expertise.
Accelerating Development Using Open Source Software Black Duck Software Company Presentation.
Introduction to Hadoop and HDFS
Copyright © 2010, SAS Institute Inc. All rights reserved. Applied Analytics Using SAS ® Enterprise Miner™
Chapter 1 Business Driven Technology MANGT 366 Information Technology for Business Chapter 1: Management Information Systems: Business Driven MIS.
How Companies are Using Spark And where the Edge in Big Data will be Matei Zaharia.
Copyright © 2004, SAS Institute Inc. All rights reserved. SAS ® IT Intelligence Taking IT “Beyond BI ™ ” November 28, 2015.
© 2009 IBM Corporation Smarter Decisions for Optimized Performance IBM Global Executive Forum Panel Discussion Business Analytics and Optimization Fred.
Robert Mahowald August 26, 2015 VP, Cloud Software, IDC
CISC 849 : Applications in Fintech Namami Shukla Dept of Computer & Information Sciences University of Delaware iCARE : A Framework for Big Data Based.
OMIS 694, Big Data Analytics
EXASOL Helps Organizations Drive Profit by Analyzing Their Data and Information Quickly, All on the Microsoft Azure Cloud Platform MICROSOFT AZURE ISV.
Copyright © 2015, SAS Institute Inc. All rights reserved. Business & Analytics unite VS.
Axis AI Solves Challenges of Complex Data Extraction and Document Classification through Advanced Natural Language Processing and Machine Learning MICROSOFT.
Software Engineering Prof. Dr. Bertrand Meyer March 2007 – June 2007 Chair of Software Engineering Lecture #20: Profiling NetBeans Profiler 6.0.
BUSINESS INTELLIGENCE & ADVANCED ANALYTICS DISCOVER | PLAN | EXECUTE JANUARY 14, 2016.
Copyright © 2016 Pearson Education, Inc. Modern Database Management 12 th Edition Jeff Hoffer, Ramesh Venkataraman, Heikki Topi CHAPTER 11: BIG DATA AND.
Teradata Overview. 2 The Teradata Difference What We Do >Establish an enterprise view of the business >Integrate detailed, enterprise-wide data >Provide.
Data Science Hadoop YARN Rodney Nielsen. Rodney Nielsen, Human Intelligence & Language Technologies Lab Outline Classical Hadoop What’s it all about Hadoop.
Apache Hadoop on Windows Azure Avkash Chauhan
Banking Services Solutions. Solutions and Services  Technology enabled business solutions and services to help in growth of Banking services.  Dynamic.
Oracle Exalytics Business Intelligence Machine Eshaanan Gounden – Core Technology Team.
Leverage Big Data With Hadoop Analytics Presentation by Ravi Namboori Visit
Hadoop Introduction. Audience Introduction of students – Name – Years of experience – Background – Do you know Java? – Do you know linux? – Any exposure.
Extreme Scale Infrastructure
Business Insights Play briefing deck.
Organizations Are Embracing New Opportunities
SAS Education Practice
Driving Digital Business with SAP Digital Business Services
Enabling Scalable and HA Ingestion and Real-Time Big Data Insights for the Enterprise OCJUG, 2014.
Software Architecture in Practice
Operationalize your data lake Accelerate business insight
Analytical Software for Fraud Detection
OMIS 665, Big Data Analytics
Smart Team Making a Beautiful software
Big Data Analysis in Digital Marketing
Copyright © JanBask Training. All rights reserved Get Started with Hadoop Hive HiveQL Languages.
Software Defined Perimeter Market Research Report By Forecast to 2023 Industry Survey, Growth, Competitive Landscape and Forecasts to 2023 PREPARED BY.
Presentation transcript:

SAS Analytic Solutions Running on a Hadoop Cluster using YARN James Kochuba

Market Leader in Data & Analytics Revenue Reinvested in R&D SAS Industry Average Great Places to Work® Awards 15 COUNTRIES 2 MULTINATIONAL SAS® Visual Analytics Customer Sites SAS® Cloud Analytics Revenue Growth As Ranked by IDC SAS® - Hadoop visualization and analytics solutions

Customers in 139 countries at 70,000 sites OF EXPERIENCE DECADES Customers in 139 countries at 70,000 sites 3+ Billion 35% MARKETSHARE 2014 REVENUE > SAS customers represent 90% of Fortune Global 500® companies

SAS Advanced Analytics Solution Lines SAS Background Millions of analytical procedures running at 65,000 sites Analytics applied to thousands of business issues 41,000 customers in 135 countries Three-plus decades of experience $650 million annually in advanced analytics revenue Total Yearly Revenues $2.8B IDC ranks SAS No. 1 in advanced analytics with a market share of 36.2% SAS Advanced Analytics Statistics Predictive Modeling Data Mining Text analytics Forecasting & Econometrics Solution Lines Analytics Business Intelligence Customer Intelligence Financial Intelligence Foundation Tools Fraud & Security Intelligence Governance, Risk & Compliance High-Performance Analytics Information Management IT & CIO Enablement OnDemand Solutions Performance Management Risk Management Supply Chain Intelligence Sustainability Management Quality Improvement Operations Research Data Visualization Model Management and Deployment SAS and the Analytic Lifecycle BUSINESS MANAGER BUSINESS ANALYST Domain Expert Makes Decisions Evaluates Processes and ROI Data Exploration Data Visualization Report Creation Author Rule Logic SAS Core Technologies Industries Analytic Lifecycle IT SYSTEMS / MANAGEMENT DATA MINER / STATISTICIAN Model Validation Model Deployment Model Monitoring Data Preparation Exploratory Analysis Descriptive Segmentation Predictive Modeling Speaker Notes: SAS is a global company We have many joint customers SAS is a leader in Advanced Analytics IDC ranks SAS No. 1 in advanced analytics and among top 5 business analytics providers In July 2013, IDC released its Worldwide Business Analytics Software 2013-2017 Forecast and 2012 Vendor Shares report. Based on 2012 revenue, SAS continues to dominate the advanced analytics market with a market share of 36.2%...up almost a percentage point from the previous year and more than twice the market share of IBM, our next closest competitor. SAS provides Businesses with ability to manage the analytic (Decisioning) lifecycle SAS provides tools and business solutions across a wide spectrum of industries Could also point out that SAS has had a long history of integrating with and complimenting with SAP – since the R2 days. Could refer to slides 22 and 23 if speaking to a more technical audience.

THE NEW ANALYTICS EXPERIENCE Kickoff 2015 The New Analytics Experience SAS is uniquely positioned to : Enable and Empower the new Analytics Culture; BRIDGE the gaps between Decision Design, Decision Engineering, and the Data. Decision Design Decision Engineering THE NEW ANALYTICS EXPERIENCE Data

The “ Art ” The “ Process ” Kickoff 2015 The New Analytics Experience DECISION DESIGN DECISION ENGINEERING Data is a Raw Material Data is a finished product Flexible, ad hoc Established, documented process Prototyping Governance (over data, process, technology) Data Scientists, Analysts, Smart Creatives Engineers, DBA, IT Open Source, “whatever works” Approved architecture Departmental, personal Enterprise Innovative, Experimental, Groundbreaking Productionized, Scalable, Repeatable DATA No amount or complexity is unsurmountable

Copyright © 2015, SAS Institute Inc. All rights reserved. ANALYTICS TEXT ANALYTICS Finding treasures in unstructured data like social media or survey tools that could uncover insights about consumer sentiment FORECASTING Leveraging historical data to drive better insight into decision-making for the future INFORMATION MANAGEMENT OPTIMIZATION Analyze massive amounts of data in order to accurately identify areas likely to produce the most profitable results DATA MINING Mine transaction databases for data of spending patterns that indicate a stolen card Currently, most organizations use one or more of these techniques to solve a departmental specific business problem. As such, even though valuable, from an enterprise standpoint, the value is somewhat marginalized or not optimized. What we see is that departments, such as Marketing, Finance or Risk are using one technique to solve a problem but not multiple techniques. So, for instance, in a bank, the risk department will use econometric forecasting to manage the treasury portfolio and overall risk…so they think. But they are not proactively looking at text (emails and chat), to see where the bank could be exposed. Or a retailer uses forecasting to predict what will sell, but they are not looking at on line sentiment and customer segmentation to optimize what product they should offer what customer and how to optimize distribution all as part of the same process. These are examples of why these techniques should be pulled together for the greater good. STATISTICS Copyright © 2015, SAS Institute Inc. All rights reserved. Copyright © 2011, SAS Institute Inc. All rights reserved.

Critical SAS Components for Hadoop SAS Access SAS In-Memory SAS In-Database

(Embedded Process - EP) Architecture Review SAS software with Hadoop High Speed Network SAS® Grid/Server SAS In-memory (TKGrid / LASR) In-memory Private Public SAS Access SAS In-database (Embedded Process - EP) SAN EP Hadoop Cluster

YARN Hadoop SAS AND Hadoop Traditional SAS with Hadoop EP TKGrid SAS® Grid/Server libname hadoop hdfs (hdmd) libname hadoop (hive) libname impala SAS SQL Yarn is effecting: SAS Hive and Impala calls SAS EP (Mapreduce) No yarn effect on HDMD since that goes directly to HDFS SAS ACCESS Scoring Accel calls Code Accel calls Data Quality Accel calls Impala HPA procs/LSAR Hive HDMD TKGrid EP YARN Hadoop Commodity Hardware Interaction model: Hive/Impala interaction using SQL though SAS ACCESS Accelerator calls using Embedded Process in Hadoop HPA proc calls to HPA rack who then use EP to process (can execute some DS2 code before lifting) and lift data into memory on HPA rack Note: HPA and LSAR are very similar action, just stay in memory

YARN Hadoop SAS AND Hadoop sas on hadoop TKGrid SAS ACCESS SAS® Grid/Server libname hadoop hdfs (hdmd) libname hadoop (hive) libname impala SAS reporting (SQL) SAS ACCESS Scoring Accel calls Code Accel calls Data Quality Accel calls Yarn is effecting: SAS Hive calls SAS EP (mapreduce) SAS In-memory To start TKGrid process, we will work with YARN No yarn effect on HDMD since that goes directly to HDFS Impala HPA procs/LSAR Hive HDMD Note: resources for SAS rack should be added into Hadoop cluster. TKGrid EP EP EP EP EP EP EP EP EP EP EP EP EP EP EP EP EP EP EP YARN Note: we don’t recommend LSAR currently in this picture Hadoop

SAS and yarn WHERE does SAS Fit In? SAS Access EP *Picture Created by Arun Murthy - Hortonworks http://blogs.sas.com/content/datamanagement/2014/08/20/sas-high-performance-capabilities-with-hadoop-yarn/

Sas model Sas HPDM example

Yarn view Shared sas and hadoop environment

YARN View SAS vs other work

Compress data ~31 GB (20,000,000 observations, 50 variables) Yarn view small sas application with background Number of Container Memory Usage Compress data ~31 GB (20,000,000 observations, 50 variables)

Compress data ~183 GB (120,000,000 observations, 50 variables) Yarn view Larger sas application with background Number of Container Memory Usage Compress data ~183 GB (120,000,000 observations, 50 variables)

Memory Usage Number of Container Yarn view Simple sas model with background Memory Usage Number of Container

Memory Usage Number of Container Yarn view Sas application loading hive with background Memory Usage Number of Container

Minimum container memory size can product wasted memory resources Client issues Lesson learns Minimum container memory size can product wasted memory resources MapReduce application does not use all memory Smaller applications pushed into large containers (application master to simple applications) MapReduce tuning Dependency jobs require queue to help SAS In-memory using SAS EP to lift data into memory Queue happy craves up cluster too much Monitor real resource usage vs containers Focus on application tuning SAS YARN workshop

New with 9.4M1 you can use SPDE server to write your SAS datasets to a Hadoop cluster in a distributed fashion. Now eliminating SAN storage cost.

Free Software Trial! Leading provider for: Data Preparation Data Visualization Data Analysis sas.com/strata-trial

Enter for a chance to win a GoPro HERO4! Booth 1022

Questions

Yarn tuning Basic yarn settings Property Name Description yarn.nodemanager.resource.memory-mb Amount of physical memory, in MiB, that can be allocated for containers. yarn.scheduler.minimum-allocation-mb The minimum allocation for every container request at the RM, in MBs. Memory requests lower than this won't take effect, and the specified value will get allocated at minimum. yarn.scheduler.maximum-allocation-mb Largest Container allowed. A Multiple of the minimum-allocation-mb above Depending on your setup you may want to allow the entire node for MR, or restrict it to smaller then a node to prevent potential malicious actions. yarn.nodemanager.resource.cpu-vcores Number of virtual CPU cores that can be allocated for containers. This value covers all applications and their containers running on this node and or physical system. yarn.scheduler.minimum-allocation-vcores The smallest number of virtual CPU cores that can be requested per container. yarn.scheduler.maximum-allocation-vcores The largest number of virtual CPU cores that can be requested per container. yarn.resourcemanager.scheduler.class The class used for resource manager (note Hortonworks and Cloudera used different defaults and today, they do prompt writing custom classes)

Yarn tuning Mapreduce settings Property Name Description mapreduce.map.memory.mb The size of the container for the Mapper task mapreduce.map.java.opts The java opts for the Mapper JVM, make sure that the max heap is less then the size of the container. mapreduce.reduce.memory.mb The size of the container for the Reducer task mapreduce.reduce.java.opts The java opts for the Reducer JVM, make sure that the max heap is less then the size of the container. mapreduce.job.reduce.slowstart.completedmaps Fraction of the number of maps in the job which should be complete before reduces are scheduled for the job.

Scheduler queuing Cloudera queuing Yarn tuning queues FairScheduler - CapacityScheduler – queues Cloudera queuing Dynamic Pools