REAL-TIME NETWORK ANALYTICS WITH STORM

Mauricio Vacas, Fausto Inestroza, Sonali Parthasarathy

The Team
Mauricio Vacas: Big Data Architect
Anita Mehrotra: Data Scientist
Fausto Inestroza: Big Data Architect
Krista Schnell: Visualization
Sonali Parthasarathy: Real-Time Processing
Susie Lu: Visualization
John Akred: Product Lead
Rick Drushal: Engineering Lead

WHY REAL-TIME?

PROCESS, UNDERSTAND, REACT
Real-Time Data Ingestion, Distributed Analytics, Model Prototyping, Exploratory Analytics, Real-Time Rule Execution

Accenture Cloud Platform Recommender as a Service … Network Analytics Services Big Data Platform

Drivers: consumer devices, video usage
Issues: operational costs, understanding service quality degradation, inefficient capacity planning

VISUALIZE INGEST PROCESS STORE ANALYZE

WHY STORM?

What do we need?
Multiple use cases -> Processing, computation, etc.
Data types, size, velocity -> Scalability
Mission critical data -> Fault-tolerance
Time series / pattern analysis -> Reliability

How do we get this from Storm?
Processing, computation, etc. -> Low-level primitives
Scalability -> Parallelization
Fault-tolerance -> Robust fail-over strategies
Reliability -> Processing guarantees

PRIMITIVES

Topology: suboptimal network speed, geospatial analysis
Stream (of tuples): request info (IP, user-agent, etc.)
Spout: pull messages from distributed queue
Bolt: sessionization, speed calculation
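To make the four primitives concrete, here is a minimal pure-Python sketch. It does not use the Storm API: `queue_spout`, `request_count_bolt`, `run_topology`, and the `ip|user_agent` message format are all hypothetical names chosen for illustration. The spout turns raw messages into tuples, the bolt consumes that stream and emits a new one, and the topology is just the wiring between them.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List


def queue_spout(messages: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Spout: pull raw messages from a source and emit them as tuples.

    The "ip|user_agent" wire format is an assumption made for this sketch.
    """
    for raw in messages:
        ip, user_agent = raw.split("|", 1)
        yield {"ip": ip, "user_agent": user_agent}


def request_count_bolt(stream: Iterable[Dict[str, str]]) -> Iterator[Dict[str, object]]:
    """Bolt: consume a stream of tuples and emit a running request count per IP."""
    counts: Counter = Counter()
    for tup in stream:
        counts[tup["ip"]] += 1
        yield {"ip": tup["ip"], "requests": counts[tup["ip"]]}


def run_topology(messages: Iterable[str]) -> List[Dict[str, object]]:
    """Topology: the graph that wires the spout's stream into the bolt."""
    return list(request_count_bolt(queue_spout(messages)))


results = run_topology([
    "10.0.0.1|Mozilla/5.0",
    "10.0.0.2|curl/7.30",
    "10.0.0.1|Mozilla/5.0",
])
print(results[-1])
```

In a real deployment the spout and bolt would run as separate tasks on different machines; the generator chain here only shows the data flow.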

PARALLELISM

[Diagram: Nimbus, Zookeeper, and two Supervisor nodes, each with a worker (W) and a task (T)]

Topology -> Worker Process -> Executors -> Tasks

FAULT TOLERANCE

[Diagram: Nimbus and three Supervisor nodes with workers (W) and tasks (T)]

RELIABILITY

IP1 IP2 IP2 IP3 IP3 A
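The slide appears to show a tuple tree (IP1 producing IP2 and IP3) tracked by an acker (A). Storm's processing guarantee rests on XOR bookkeeping: every tuple id is XORed into a per-root value once when the tuple is emitted and once when it is acked, so the value returns to zero only when the whole tree has been processed. The sketch below is a simplified pure-Python model of that idea, not Storm's implementation; the `Acker` class, its method names, and the fixed ids are assumptions for illustration (Storm uses random 64-bit ids).

```python
from typing import Dict


class Acker:
    """Simplified model of XOR-based tuple-tree tracking (not the Storm API)."""

    def __init__(self) -> None:
        # One running XOR value per root (spout) tuple.
        self._ack_val: Dict[str, int] = {}

    def _xor(self, root: str, tuple_id: int) -> None:
        self._ack_val[root] = self._ack_val.get(root, 0) ^ tuple_id

    def emitted(self, root: str, tuple_id: int) -> None:
        """Record that a tuple anchored to `root` was emitted."""
        self._xor(root, tuple_id)

    def acked(self, root: str, tuple_id: int) -> None:
        """Record that a tuple anchored to `root` was fully processed."""
        self._xor(root, tuple_id)

    def is_complete(self, root: str) -> bool:
        """The tree is done when every emitted id has also been acked."""
        return self._ack_val.get(root) == 0


# Fixed ids keep the example deterministic.
IP1, IP2, IP3 = 0x1111, 0x2222, 0x4444

acker = Acker()
acker.emitted("req-1", IP1)      # spout emits IP1
acker.emitted("req-1", IP2)      # a bolt emits IP2 and IP3, anchored to IP1
acker.emitted("req-1", IP3)
acker.acked("req-1", IP1)        # the bolt acks its input
acker.acked("req-1", IP2)
before_last_ack = acker.is_complete("req-1")
acker.acked("req-1", IP3)
after_last_ack = acker.is_complete("req-1")
print(before_last_ack, after_last_ack)
```

The appeal of this scheme is constant memory per root tuple regardless of how large the tree grows. If a tuple is never acked, the value never reaches zero and the root tuple can be replayed after a timeout.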


SUBOPTIMAL NETWORK SPEED TOPOLOGY: AN EXAMPLE

Kafka Spout -> Pre-process -> Sessionize -> Calculate N/W Speed per Session -> Update Speed per IP -> Identify Suboptimal Speed -> Store in Cassandra -> Cassandra
Tuple (ip 1) passes from each stage to the next.
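The middle stages of this pipeline can be sketched in plain Python as a batch simplification; a real topology would process tuples one at a time inside bolts. Everything specific here is an assumption: the `Request` fields, the 30-second session gap, speed measured as bytes per second, and the threshold value are illustrative and not taken from the presentation.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List

SESSION_GAP_S = 30.0  # assumed: a longer pause starts a new session


@dataclass
class Request:
    ip: str
    start: float      # seconds
    end: float        # seconds
    bytes_sent: int


@dataclass
class Session:
    ip: str
    start: float
    end: float
    bytes_sent: int


def sessionize(requests: Iterable[Request]) -> List[Session]:
    """Group each IP's requests into sessions separated by idle gaps."""
    sessions: List[Session] = []
    latest: Dict[str, Session] = {}
    for r in sorted(requests, key=lambda r: r.start):
        s = latest.get(r.ip)
        if s is not None and r.start - s.end <= SESSION_GAP_S:
            s.end = max(s.end, r.end)
            s.bytes_sent += r.bytes_sent
        else:
            s = Session(r.ip, r.start, r.end, r.bytes_sent)
            latest[r.ip] = s
            sessions.append(s)
    return sessions


def session_speed(s: Session) -> float:
    """Network speed for one session, in bytes per second."""
    duration = s.end - s.start
    return s.bytes_sent / duration if duration > 0 else 0.0


def speed_per_ip(sessions: Iterable[Session]) -> Dict[str, float]:
    """Average session speed per IP."""
    by_ip: Dict[str, List[float]] = {}
    for s in sessions:
        by_ip.setdefault(s.ip, []).append(session_speed(s))
    return {ip: sum(v) / len(v) for ip, v in by_ip.items()}


def suboptimal_ips(speeds: Dict[str, float], threshold: float) -> List[str]:
    """IPs whose average speed falls below the threshold."""
    return sorted(ip for ip, speed in speeds.items() if speed < threshold)


requests = [
    Request("10.0.0.1", 0.0, 2.0, 4_000_000),
    Request("10.0.0.1", 5.0, 8.0, 4_000_000),
    Request("10.0.0.2", 0.0, 10.0, 1_000_000),
]
speeds = speed_per_ip(sessionize(requests))
slow = suboptimal_ips(speeds, threshold=500_000.0)
print(speeds, slow)
```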

Parallelism
Kafka Spout -> Pre-process -> Sessionize -> Calculate N/W Speed per Session -> Update Speed per IP -> Identify Suboptimal Speed -> Store in Cassandra -> Cassandra
Tuples (ip 1) and tuples (ip 2) flow through the stages in parallel.
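For per-IP stages such as sessionization to work in parallel, all tuples for a given IP have to reach the same task. Storm offers a fields grouping for this, which partitions a stream by the value of chosen fields. The sketch below models the idea in plain Python; `task_for` and `partition` are hypothetical helpers, and CRC32 is used only because it is stable across runs, not because Storm uses it.

```python
import zlib
from typing import Dict, List


def task_for(ip: str, num_tasks: int) -> int:
    """Pick a task index from a stable hash of the grouping field."""
    return zlib.crc32(ip.encode("utf-8")) % num_tasks


def partition(tuples: List[Dict[str, str]], num_tasks: int) -> List[List[Dict[str, str]]]:
    """Route each tuple to the task that owns its IP."""
    buckets: List[List[Dict[str, str]]] = [[] for _ in range(num_tasks)]
    for tup in tuples:
        buckets[task_for(tup["ip"], num_tasks)].append(tup)
    return buckets


tuples = [{"ip": "ip 1"}, {"ip": "ip 2"}, {"ip": "ip 1"}, {"ip": "ip 2"}, {"ip": "ip 1"}]
buckets = partition(tuples, num_tasks=4)
for index, bucket in enumerate(buckets):
    print(index, bucket)
```

Because routing depends only on the field value, each task can keep per-IP state locally without coordinating with the others.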

Branching and Joins
Stream 1: Kafka Spout -> Pre-process -> Sessionize -> Calculate N/W Speed per Session -> Update Speed per IP -> Join -> Compare Speed -> Store in Cassandra -> Cassandra
Stream 2: Kafka Spout -> Speed by Location -> Join
Stream 1 carries tuples (ip 1/NY); Stream 2 carries tuples (NY).
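The join step can be illustrated with a small pure-Python sketch: per-IP speeds from one stream are matched against per-location speeds from the other, keyed on location, and then compared. The field names, the `join_and_compare` helper, and the rule "below half the location's speed counts as suboptimal" are assumptions for illustration only.

```python
from typing import Dict, List


def join_and_compare(
    ip_speeds: List[Dict[str, object]],
    location_speeds: Dict[str, float],
    fraction: float = 0.5,
) -> List[Dict[str, object]]:
    """Join stream 1 (per-IP speed) with stream 2 (speed by location).

    A tuple is flagged when its speed is below `fraction` of the
    location's speed. Tuples with no matching location are dropped.
    """
    joined = []
    for tup in ip_speeds:
        baseline = location_speeds.get(tup["location"])
        if baseline is None:
            continue
        joined.append({
            **tup,
            "baseline": baseline,
            "suboptimal": tup["speed"] < fraction * baseline,
        })
    return joined


stream_1 = [
    {"ip": "ip 1", "location": "NY", "speed": 2.0},
    {"ip": "ip 2", "location": "NY", "speed": 9.0},
    {"ip": "ip 3", "location": "SF", "speed": 5.0},
]
stream_2 = {"NY": 10.0}

joined = join_and_compare(stream_1, stream_2)
for row in joined:
    print(row)
```

In a streaming setting the location table would be updated continuously by the second stream, so the join bolt would hold the latest value per location as state.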

RULE EXECUTION

METHOD 1: Storm
METHOD 2: Storm + Drools

Storm + Drools
Kafka Spout -> Pre-process -> Sessionize -> Calculate N/W Speed per Session -> Update Speed per IP -> Identify Suboptimal Speed -> Store in Cassandra -> Cassandra
Rule execution: Drools
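The difference between the two methods is where the rules live: hard-coded in a bolt, or held outside the topology code and evaluated by a rule engine. The sketch below shows the second idea in plain Python with rules expressed as data. It is not Drools syntax or the Drools API; the rule format, the `speed_mbps` field, the thresholds, and `fire_rules` are all hypothetical.

```python
import operator
from typing import Callable, Dict, List

# Comparison operators a rule may refer to by name.
OPS: Dict[str, Callable[[float, float], bool]] = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
}

# Rules kept as data, so they could be loaded from a file or a service
# and changed without redeploying the topology.
RULES: List[Dict[str, object]] = [
    {"field": "speed_mbps", "op": "<", "value": 5.0, "action": "flag_suboptimal"},
    {"field": "speed_mbps", "op": "<", "value": 1.0, "action": "raise_alert"},
]


def fire_rules(fact: Dict[str, object], rules: List[Dict[str, object]]) -> List[str]:
    """Return the actions of every rule whose condition matches the fact."""
    actions = []
    for rule in rules:
        field = rule["field"]
        if field in fact and OPS[rule["op"]](fact[field], rule["value"]):
            actions.append(rule["action"])
    return actions


slow_actions = fire_rules({"ip": "ip 1", "speed_mbps": 2.0}, RULES)
very_slow_actions = fire_rules({"ip": "ip 2", "speed_mbps": 0.5}, RULES)
ok_actions = fire_rules({"ip": "ip 3", "speed_mbps": 20.0}, RULES)
print(slow_actions, very_slow_actions, ok_actions)
```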

Integration with Cassandra
Cassandra: optimal for time series data, near-linear scalable, low read/write latency
Custom Bolt: uses Hector API to access Cassandra, creates dynamic columns per request, stores relevant network data

Lessons Learned
Rebalance Topology
Tweak Parallelism in bolt
Isolation of Topologies
Use TimeUUIDUtils
Log4j level set to INFO by default

DEMO

Next Steps
Trident
Externalizing Rules
Predictive Models
Real-Time Notifications