Guavus Confidential – Do Not Distribute © 2014 Guavus, Inc. All rights reserved. Eric Carr (VP Core Systems Group) Spark Summit 2014 BUILDING BIG DATA.


1 BUILDING BIG DATA OPERATIONAL INTELLIGENCE PLATFORM WITH APACHE SPARK. Eric Carr (VP Core Systems Group), Spark Summit 2014.

2 Market & Technology Imperatives: Communication Service Providers & Big Data Analytics

3 Industry Context for Communication Service Providers (CSPs). Big Data is at the heart of two core strategies for CSPs: improve current revenue sources through greater operational efficiencies, and create new revenue sources with the 4th wave. Source: Chetan Consulting

4 CSPs - Industry Value Chain Shift. Source: asymco

5 CSPs - A High Bar for Operational Intelligence. CSPs require solutions engineered to meet very stringent requirements.

6 Different Platforms Target Different Questions. [Diagram: platforms positioned by type of data (Data Lake, Data Streams) and type of analysis (Discovery, Decisioning): Data Warehouses + Business Intelligence; Search Centric; Stream Analytics + Operational Intelligence.]

7 Streaming Analytics & Machine Learning to Action

8 Driving Streaming Analytics to Action. Network Flow Analytics; Usage Awareness; Operational Interactions; Real-Time Actions. NetFlow, Routing Planning; Layer 7 Visibility; Content & CDN Analytics; Policy Profile Triggers; Care & Experience Mgmt; New Service Creation & Monetization; Small Cell / RAN / Backhaul Differentiation; SON / SDN / Virtualization; Content Optimization; Operational Intelligence.

9 Reflex 1.0 Pipeline – Timely Cube Reporting: Collector, Hadoop Compute, Analytics Store, RAM Cache, UI. Reflex 1.5 Pipeline – Spark / YARN: Collector, Spark / YARN, Analytics Store, RAM Cache, UI. Reflex 2.0 Pipeline – Spark Streaming core: Collector, Spark / YARN / HDFS2, UI, ML, Store, SQL, Streams, Cache, Msg Queue.

10 Stream Engine - Operational Intelligence Analytics. Data Streams and Contextual Data Stores feed three engines that lead to Targeted Actions. Feature Engine: data fusion, metrics creation, variable selection, causality inference. Anomaly Engine: multivariate analysis, outlier detection. RCA Engine: statistical learning, clustering, pattern identification, item set mining. Optimized algorithmic support for common stream data processing & fusions. Detect unusual events occurring in stream(s) of data; once detected, isolate the root cause. Anomaly / outlier detection, commonalities / root-cause forensics, prediction / forecasting, actions / alerts / notifications. Record enrichment capabilities, e.g. URL categorization, device ID, etc.

11 Example - State of the art causality analysis. Techniques by RELATIONSHIP TYPE (linear vs. linear + nonlinear) and TIME DEPENDENCY. Random variables (time independent): Correlation (linear); Maximal Information Coefficient (linear + nonlinear). Stochastic processes (time dependent): Granger Causality (linear); Transfer Entropy (linear + nonlinear). Ranking of metrics: 1) Transfer Entropy 2) Maximal Information Coefficient, Granger Causality 3) Correlation

12 Example - Causality Techniques. Transfer entropy from a process X to another process Y is the reduction in uncertainty about future values of Y obtained by knowing the past values of X, given the past values of Y. CONS: challenging joint probability estimation; large amount of data needed for calculation; choice of time lags. Maximal Information Coefficient is a measure of the strength of the relationship between two random variables, based on mutual information. It is estimated empirically by maximizing the mutual information over a set of grids. CONS: no time information.
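
The transfer entropy definition on this slide can be illustrated with a small plug-in estimator. This is only a sketch under stated assumptions, not the method used in the talk: both series are discrete symbols, the history length is fixed at one step, and probabilities are taken as raw frequencies. The function name `transfer_entropy` and the toy data are hypothetical.

```python
import random
from collections import Counter
from math import log2


def transfer_entropy(source, target):
    """Plug-in transfer entropy source -> target in bits (history length 1).

    Computes sum p(y_next, y_prev, x_prev) *
             log2( p(y_next | y_prev, x_prev) / p(y_next | y_prev) ).
    """
    triples = list(zip(target[1:], target[:-1], source[:-1]))
    n = len(triples)
    joint = Counter(triples)                                   # (y_next, y_prev, x_prev)
    past_pair = Counter((yp, xp) for _, yp, xp in triples)     # (y_prev, x_prev)
    self_pair = Counter((yn, yp) for yn, yp, _ in triples)     # (y_next, y_prev)
    self_past = Counter(yp for _, yp, _ in triples)            # y_prev
    te = 0.0
    for (yn, yp, xp), count in joint.items():
        p_joint = count / n
        p_given_both = count / past_pair[(yp, xp)]
        p_given_self = self_pair[(yn, yp)] / self_past[yp]
        te += p_joint * log2(p_given_both / p_given_self)
    return te


# Toy data (assumption): y simply repeats x with a one-step delay.
rng = random.Random(0)
x = [rng.randint(0, 1) for _ in range(5000)]
y = [0] + x[:-1]

te_x_to_y = transfer_entropy(x, y)  # close to 1 bit: past x determines next y
te_y_to_x = transfer_entropy(y, x)  # close to 0: past y says nothing about next x
```

The asymmetry between the two directions is what makes the measure usable for causality ranking; the cons listed on the slide show up here as the need to estimate a three-way joint distribution and to pick the lag (fixed at 1 in this sketch).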

13 Network Operations / Care Example: Identifying Commonalities, Anomalies, RCA. Anomaly Detection; Root-Cause Analysis; Event Drivers; Event Chaining.

14 BinStream Details

15 Use Case. Use IP traffic records to calculate bandwidth usage as a time series (continuously …). This cannot be done based on the time the records are received by Spark. –In general, for any record that has a timestamp, it is important to analyze based on the time of the event rather than the time the event record is received.
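
The event-time idea above can be sketched in plain Python (no Spark API is used here). The record layout `(event_timestamp_seconds, byte_count)`, the 300-second slot size, and the function names are assumptions made for illustration only.

```python
from collections import defaultdict

BIN_SECONDS = 300  # assumed slot size (5 minutes)


def bin_start(event_ts, bin_seconds=BIN_SECONDS):
    """Return the start of the time slot that contains event_ts."""
    return event_ts - event_ts % bin_seconds


def bandwidth_by_event_time(records, bin_seconds=BIN_SECONDS):
    """Sum bytes per slot using each record's own timestamp,
    regardless of the order in which records arrive."""
    totals = defaultdict(int)
    for event_ts, num_bytes in records:
        totals[bin_start(event_ts, bin_seconds)] += num_bytes
    return dict(totals)


# Records arrive out of order; the slot is still chosen by event time.
arrivals = [(1200, 7), (1000, 10), (1199, 5)]
usage = bandwidth_by_event_time(arrivals)
```

Here the late record with timestamp 1000 is still counted in the slot starting at 900, which is what binning by reception time would get wrong.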

16 Challenges. For one dataset: make the timestamp part of the key. For continuously streamed datasets: –You do not know whether you have received all data for a particular time slot, because of event delay or event duration. –An event can span multiple time slots, because of event duration.

17 Data Processing – Timing (Map-Reduce world). [Timing diagram] Stages: Collector / Adapter; Collector / Writer; MR Jobs / Cube Generator; MR Job / Cube Exporter; STORE (columnar storage). Data: events; BINS; HDFS data files; HDFS cubes. Timeline: k-th hour, (k+1)-th hour, (k+2)-th hour, with bins marked past, current, future, and closed. Bin interval: 5 mins. Write on bin interval or size. Delayed 2x bin interval (10 mins) to allow the last bin to be closed. Job duration: 45-55 mins. Available for visualization.
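
One possible reading of the "delayed 2x bin interval" rule is that a bin is treated as closed once two further bin intervals have passed after its end. That interpretation is an assumption, as are the names below; the slide itself only gives the 5-minute interval and the 10-minute delay.

```python
BIN_SECONDS = 5 * 60   # bin interval from the slide
GRACE_BINS = 2         # delay of 2x the bin interval (10 minutes)


def bin_is_closed(bin_start, now, bin_seconds=BIN_SECONDS, grace_bins=GRACE_BINS):
    """A bin is closed once the grace period after its end has elapsed (assumed rule)."""
    bin_end = bin_start + bin_seconds
    return now >= bin_end + grace_bins * bin_seconds


def closed_bins(bin_starts, now):
    """Return the bins that are safe to hand to the batch job, oldest first."""
    return [b for b in sorted(bin_starts) if bin_is_closed(b, now)]


ready = closed_bins([600, 0, 300], now=1200)
```

With `now=1200`, the bins starting at 0 and 300 are closed, while the bin starting at 600 stays open until time 1500.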

18 Proposed Binning & Proration Solution

19 Solution (contd.) The typical solutions are: –For event delay: wait (buffer). –For event duration: prorate events across time slots. Introduce the concept of a BinStream: an abstraction over the DStream, which needs a –Function to extract the time fields from the records. –Function to prorate the records. Note that this can be trivially achieved by using the ‘window’ functionality, with the batch interval equal to the time series interval and the window size equal to the maximum possible delay.
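
The "function to prorate the records" could look roughly like the sketch below. It is plain Python rather than the BinStream API, which is not shown in the deck; the assumption is that an event carries a start time, an end time, and a value that is spread uniformly over its duration. The name `prorate` and its signature are hypothetical.

```python
def prorate(start_ts, end_ts, value, bin_seconds=300):
    """Split value across time slots in proportion to the overlap
    between [start_ts, end_ts) and each slot (uniform-rate assumption)."""
    first_slot = start_ts - start_ts % bin_seconds
    if end_ts <= start_ts:
        # Instantaneous event: everything goes to the slot containing start_ts.
        return {first_slot: float(value)}
    duration = end_ts - start_ts
    shares = {}
    slot = first_slot
    while slot < end_ts:
        overlap = min(end_ts, slot + bin_seconds) - max(start_ts, slot)
        shares[slot] = value * overlap / duration
        slot += bin_seconds
    return shares


# A 400-second flow carrying 400 bytes, spanning three 5-minute slots.
shares = prorate(250, 650, 400)
```

The flow overlaps the first slot for 50 seconds, the second for 300, and the third for 50, so the bytes are split 50 / 300 / 50.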

20 Problems / Solutions. window == wait & buffer. –This has two issues: A. It needs memory for buffering. B. Downstream needs to wait for the result (or any part of it). BinStream provides two additional options: –It removes the delay in getting partial results by regularly sending the latest snapshots for the old time slots. This does not solve the memory issue, and it increases the processing load. –If the client can handle partial results, i.e. if it can aggregate partial results, it can get updates to the old bins. This reduces the memory needed by the Spark Streaming application.
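
The second option, where the client aggregates partial results, can be sketched as follows. This is an illustration in plain Python under assumed names (`batch_deltas`, `PartialResultClient`) and an assumed record layout of `(event_timestamp_seconds, byte_count)`; it is not the BinStream implementation.

```python
from collections import defaultdict


def batch_deltas(batch, bin_seconds=300):
    """Streaming side: per-slot sums for one batch only, so nothing
    has to be buffered across batches."""
    deltas = defaultdict(int)
    for event_ts, num_bytes in batch:
        deltas[event_ts - event_ts % bin_seconds] += num_bytes
    return dict(deltas)


class PartialResultClient:
    """Downstream side: keeps running totals and merges each update,
    including updates to old slots caused by late events."""

    def __init__(self):
        self.totals = defaultdict(int)

    def apply(self, deltas):
        for slot, amount in deltas.items():
            self.totals[slot] += amount


client = PartialResultClient()
batches = [
    [(10, 5), (310, 2)],   # first batch
    [(20, 1), (620, 4)],   # second batch carries a late event for slot 0
]
for batch in batches:
    client.apply(batch_deltas(batch))
final_totals = dict(client.totals)
```

The late event in the second batch updates the old slot 0 on the client, while the streaming side only ever holds one batch worth of state.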

21 Limitations. The number of time series slots for which updates can be generated is fixed, and is governed by the event delay characteristics.

22 THANK YOU!

