Apache Storm: a scalable, distributed & fault-tolerant real-time computation system (free & open source). Shyam Rajendran, 16-Feb-15

Agenda
- History & the whys
- Concepts & Architecture
- Features
- Demo!

History: Before the Storm
- Workers
- Queues

Analyzing Real-Time Data (the old way)
http://www.slideshare.net/nathanmarz/storm-distributed-and-faulttolerant-realtime-computation

History: Why not as one self-contained application?
Problems with the worker/queue approach:
- Cumbersome to build applications (manual and tedious; messages must be serialized/deserialized by hand)
- Brittle: no fault tolerance
- Painful to scale: the same application logic is spread across many workers, each deployed separately
Why not Hadoop? Hadoop is for parallel batch processing; getting real-time behavior out of it requires hacks. Map/Reduce is built to leverage data locality on HDFS to distribute computational jobs, and it works on big data at rest.
2011: Nathan Marz, lead engineer of analytics products at BackType.
http://nathanmarz.com/blog/history-of-apache-storm-and-lessons-learned.html

Enter the Storm!
- BackType (acquired by Twitter): generates big data!
- Nathan Marz + Clojure → Storm!
- Stream-process data in real time, with very low latency!

Features
- Simple programming model: a Topology of Spouts and Bolts
- Programming-language agnostic (Clojure, Java, Ruby, Python supported by default)
- Fault-tolerant
- Horizontally scalable, e.g. 1,000,000 messages per second on a 10-node cluster
- Guaranteed message processing
- Fast: uses the ZeroMQ messaging library
- Local Mode: easy unit testing

Concepts – Streams and Spouts
- Stream: an unbounded sequence of tuples, Storm's data model. A tuple is a <key, value(s)> pair, e.g. <"UIUC", 5>.
- Spouts: sources of streams, e.g. the Twitterhose API (a stream of tweets) or some crawler.
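A minimal spout sketch in Java, using the 0.9.x-era Storm API (backtype.storm packages; newer releases moved to org.apache.storm). The random-sentence source stands in for a real feed like the Twitterhose API; the class and field names here are illustrative, not from the original demo.

```java
import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;

import java.util.Map;
import java.util.Random;

// Hypothetical spout emitting random sentences; a real spout would
// pull from an external source such as a queue or the Twitter API.
public class SentenceSpout extends BaseRichSpout {
    private SpoutOutputCollector collector;
    private Random rand;
    private static final String[] SENTENCES = {
        "the cow jumped over the moon",
        "an apple a day keeps the doctor away"
    };

    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
        this.rand = new Random();
    }

    @Override
    public void nextTuple() {
        // Storm calls nextTuple() in a loop; emit one tuple per call.
        String sentence = SENTENCES[rand.nextInt(SENTENCES.length)];
        // The second argument is a message id: providing one enables
        // the guaranteed-processing machinery described later.
        collector.emit(new Values(sentence), sentence);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("sentence"));
    }
}
```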

Concepts – Bolts
- Bolts process one or more input streams and produce new streams.
- Typical functions: filter, join, apply/transform, etc.
- Parallelize to make it fast: multiple tasks (processes) constitute a bolt.
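A matching bolt sketch under the same assumptions: it transforms the spout's sentences into word tuples. BaseBasicBolt acks each input tuple automatically after execute() returns, so no manual bookkeeping is needed here.

```java
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

// Transform bolt: splits each sentence into words.
public class SplitSentenceBolt extends BaseBasicBolt {
    @Override
    public void execute(Tuple input, BasicOutputCollector collector) {
        for (String word : input.getStringByField("sentence").split("\\s+")) {
            // Emits are implicitly anchored to the input tuple.
            collector.emit(new Values(word));
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }
}
```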

Concepts – Topology & Grouping
- Topology: the graph of computation (it can have cycles), a network of spouts and bolts.
- Spouts and bolts execute as many tasks across the cluster.
- Grouping answers: how are tuples sent between the components/tasks?

Concepts – Grouping
- Shuffle grouping: distribute the stream "randomly" across the bolt's tasks.
- Fields grouping: partition the stream by a subset of its fields (useful for joins).
- All grouping: all tasks of the bolt receive all input tuples.
- Global grouping: the whole stream goes to the bolt task with the lowest id.
A sketch of declaring these groupings follows.
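A minimal topology wiring the two sketches above together with TopologyBuilder, showing shuffle and fields groupings. The nested CountBolt is defined inline only so the example compiles on its own; component names ("spout", "split", "count") are arbitrary.

```java
import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;

import java.util.HashMap;
import java.util.Map;

public class GroupingDemo {
    // Minimal counting bolt so the example is self-contained.
    public static class CountBolt extends BaseBasicBolt {
        private final Map<String, Integer> counts = new HashMap<>();

        @Override
        public void execute(Tuple input, BasicOutputCollector collector) {
            String word = input.getStringByField("word");
            counts.merge(word, 1, Integer::sum);
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) { }
    }

    public static void main(String[] args) {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("spout", new SentenceSpout(), 2);        // 2 spout tasks
        builder.setBolt("split", new SplitSentenceBolt(), 4)
               .shuffleGrouping("spout");                          // random distribution
        builder.setBolt("count", new CountBolt(), 4)
               .fieldsGrouping("split", new Fields("word"));       // same word -> same task
        // Local Mode: run the whole topology in-process for testing.
        new LocalCluster().submitTopology("grouping-demo", new Config(),
                                          builder.createTopology());
    }
}
```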

ZooKeeper
- An open-source server for highly reliable distributed coordination; acts as a replicated synchronization service with eventual consistency.
- Features: robust — persistent data is replicated across multiple nodes; a master node handles writes; reads are served concurrently.
- Comprises a tree of znodes, entities roughly representing file-system nodes.
- Use it only for saving small configuration data.
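A small Java sketch of the znode model using the standard ZooKeeper client API. The ensemble address (localhost:2181) and the /demo-config path are assumptions for illustration; production code would also wait for the connection event before issuing requests.

```java
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ZkConfigDemo {
    public static void main(String[] args) throws Exception {
        // Connect to a ZooKeeper ensemble (address is an assumption).
        ZooKeeper zk = new ZooKeeper("localhost:2181", 3000, event -> {});
        // A persistent znode holding a small piece of configuration,
        // replicated across the ensemble's nodes.
        zk.create("/demo-config", "maxWorkers=4".getBytes(),
                  ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        // Reads can be served concurrently by any node.
        byte[] data = zk.getData("/demo-config", false, null);
        System.out.println(new String(data));
        zk.close();
    }
}
```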

Cluster


Guaranteed Message Processing
When is a message "fully processed"?
- A spout tuple is "fully processed" when its tuple tree has been exhausted and every message in the tree has been processed.
- A tuple is considered failed when its tree of messages fails to be fully processed within a specified timeout.
Storm's reliability API:
- Tell Storm whenever you create a new link in the tree of tuples (anchoring).
- Tell Storm when you have finished processing an individual tuple (acking).

Fault-Tolerance APIs
- Emit(tuple, output): emits an output tuple, perhaps anchored on an input tuple (the first argument).
- Ack(tuple): acknowledge that you (the bolt) have finished processing a tuple.
- Fail(tuple): immediately fail the spout tuple at the root of the tuple tree, e.g. if there is an exception from the database.
You must remember to ack or fail each tuple: every tuple consumes memory, and failure to do so results in memory leaks. A sketch using these calls follows.
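A sketch of those three calls with a plain BaseRichBolt, where anchoring and acking are explicit (unlike BaseBasicBolt, which handles both automatically). The class name and the uppercase transform are illustrative only.

```java
import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

import java.util.Map;

public class ReliableTransformBolt extends BaseRichBolt {
    private OutputCollector collector;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple input) {
        try {
            String word = input.getStringByField("word");
            // Anchored emit: the new tuple becomes a child of `input`
            // in the tuple tree, so Storm can replay on failure.
            collector.emit(input, new Values(word.toUpperCase()));
            collector.ack(input);    // done: remove this tuple from the tree
        } catch (Exception e) {
            collector.fail(input);   // e.g. a database error: replay from the spout
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }
}
```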

Fault Tolerance: Anchoring — What's the Catch?
Anchoring, how?
- Specify a link in the tuple tree by anchoring an output tuple to one or more input tuples at the time you emit it. This is what lets Storm replay one or more tuples.
- Every individual tuple must be acked, otherwise the task will run out of memory!
- Filter bolts ack at the end of execute; join/aggregation bolts use a multi-ack.
"Acker" tasks:
- Track the DAG of tuples for every spout tuple.
- Every tuple (spout or bolt) is given a random 64-bit id.
- Every tuple knows the ids of all the spout tuples for which it exists; when you emit a new tuple in a bolt, the spout-tuple ids from the tuple's anchors are copied into the new tuple.
- When a tuple is acked, it sends a message to the appropriate acker tasks with information about how the tuple tree changed — in effect: "I am now completed within the tree for this spout tuple, and here are the new tuples in the tree that were anchored to me."

Failure Handling
- A tuple isn't acked because its task died: the spout tuples at the root of the trees for the failed tuple will time out and be replayed.
- An acker task dies: all the spout tuples the acker was tracking will time out and be replayed.
- A spout task dies: the source the spout talks to is responsible for replaying the messages. For example, queues like Kestrel and RabbitMQ place all pending messages back on the queue when a client disconnects.

Storm's Genius: The Tracking Algorithm
- The major breakthrough: Storm uses mod hashing to map each spout-tuple id to an acker task.
- An acker task stores a map from spout-tuple id to a pair of values: the id of the task that created the spout tuple, and a 64-bit "ack val" — the XOR of all tuple ids that have been created or acked in the tree.
- The tuple tree is complete exactly when the ack val reaches 0. A toy illustration of this follows.
- Configuring reliability: set Config.TOPOLOGY_ACKERS to 0, or emit tuples as unanchored, to opt out of tracking.
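A toy illustration of the XOR trick (not Storm code). Because each tuple id is XORed into the ack val once when created and once when acked, every id cancels itself out, so ack val == 0 exactly when the tree is fully processed. The specific id values below are made up.

```java
public class AckerXorDemo {
    public static void main(String[] args) {
        // In reality these are random 64-bit ids.
        long spoutTuple = 0x1L, childA = 0x9dL, childB = 0x3fL;
        long ackVal = 0;
        ackVal ^= spoutTuple;          // spout tuple created
        ackVal ^= childA ^ childB;     // two child tuples anchored (created)
        ackVal ^= spoutTuple;          // spout tuple acked
        ackVal ^= childA;              // child A acked
        ackVal ^= childB;              // child B acked
        System.out.println(ackVal == 0);   // true: tuple tree complete
    }
}
```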

Exactly-Once Semantics? Trident
- A high-level abstraction for real-time computing on top of Storm.
- Stateful stream processing with low latency, plus distributed querying.
- Provides exactly-once semantics (avoids over-counting).
How can we do that? State updates are strongly ordered among batches — the state updates for batch 3 won't be applied until the state updates for batch 2 have succeeded — and the transaction id is stored with the count in the database as one atomic value.
https://storm.apache.org/documentation/Trident-state

Exactly-Once Mechanism
A scenario: a count aggregation over your stream, storing the running count in a database and incrementing it after processing each tuple. Then: failure! Was the increment applied or not?
The design:
- Tuples are processed as small batches.
- Each batch of tuples is given a unique id called the "transaction id" (txid); if the batch is replayed, it is given the exact same txid.
- The batch for a txid never changes, and Trident ensures that state updates are ordered among batches.

Exactly-Once Mechanism (contd.)
Processing the batch ["man"] ["man"] ["dog"] with txid = 3, against database state:
man => [count=3, txid=1], dog => [count=4, txid=3], apple => [count=10, txid=2]
- If the stored txid and the current txid are the same, SKIP the update — strong ordering guarantees this batch was already applied (the "dog" case).
- If they're different, increment the count and store the new txid, giving:
man => [count=5, txid=3], dog => [count=4, txid=3], apple => [count=10, txid=2]
https://storm.apache.org/documentation/Trident-state
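A toy model of this update rule, with an in-memory map standing in for the real store. Everything here (class names, the default txid of 0) is illustrative; Trident's actual State implementations do the equivalent against a database, writing count and txid atomically.

```java
import java.util.HashMap;
import java.util.Map;

public class TxidStateDemo {
    static class Entry { long count; long txid; }
    static Map<String, Entry> db = new HashMap<>();   // stands in for a database

    // Apply a batch's delta for one key exactly once.
    static void applyBatch(long txid, String key, long delta) {
        Entry e = db.computeIfAbsent(key, k -> new Entry());
        if (e.txid == txid) return;   // same txid: batch already applied, SKIP
        e.count += delta;             // count and txid must update together
        e.txid = txid;                // (atomically, in a real store)
    }

    public static void main(String[] args) {
        applyBatch(3, "man", 2);
        applyBatch(3, "man", 2);      // replayed batch: skipped, no over-counting
        System.out.println(db.get("man").count);   // prints 2, not 4
    }
}
```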

Improvements and Future Work
- Lax security policies.
- Performance and scalability improvements: presently, SLAs that require processing more than a million records per second are met with just 20 nodes.
- Highly available (HA) Nimbus: though Nimbus is presently not a single point of failure, losing it does degrade functionality.
- Enhanced tooling and language support.
http://hortonworks.com/blog/the-future-of-apache-storm/

DEMO

Demo Topology
Tweet Spout → Parse Tweet Bolt → Count Bolt → Intermediate Ranker Bolt → Total Ranker Bolt → Report Bolt
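A hypothetical wiring of that pipeline with TopologyBuilder. The component classes are assumed to exist in the demo code (they are not defined here), and the grouping and parallelism choices are illustrative guesses, not the demo's actual configuration.

```java
import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.tuple.Fields;

public class RankerTopology {
    public static void main(String[] args) {
        TopologyBuilder b = new TopologyBuilder();
        b.setSpout("tweets", new TweetSpout());
        b.setBolt("parse", new ParseTweetBolt(), 4)
         .shuffleGrouping("tweets");                             // spread tweets across parsers
        b.setBolt("count", new CountBolt(), 4)
         .fieldsGrouping("parse", new Fields("word"));           // same word -> same counter
        b.setBolt("rank", new IntermediateRankerBolt(), 4)
         .fieldsGrouping("count", new Fields("word"));           // partial rankings per partition
        b.setBolt("total", new TotalRankerBolt(), 1)
         .globalGrouping("rank");                                // one task merges all rankings
        b.setBolt("report", new ReportBolt(), 1)
         .globalGrouping("total");
        new LocalCluster().submitTopology("ranker", new Config(), b.createTopology());
    }
}
```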

Downloads
- Download the binaries, then install and configure ZooKeeper.
- Download the code, then build and install ZeroMQ and JZMQ.
- Download the binaries, then install and configure Storm.
http://www.thecloudavenue.com/2013/11/InstallingAndConfiguringStormOnUbuntu.html

References
- https://storm.apache.org/
- http://www.slideshare.net/nathanmarz/storm-distributed-and-faulttolerant-realtime-computation
- http://hortonworks.com/blog/the-future-of-apache-storm/
- http://zeromq.org/intro:read-the-manual
- http://www.thecloudavenue.com/2013/11/InstallingAndConfiguringStormOnUbuntu.html
- https://storm.apache.org/documentation/Setting-up-a-Storm-cluster.html