Towards High Performance Processing of Streaming Data
May 27, 2016
Supun Kamburugamuve, Saliya Ekanayake, Milinda Pathirage and Geoffrey C. Fox
Indiana University Bloomington, USA

Outline
- Research motivation
- Distributed stream processing frameworks (DSPFs)
- Communication improvements
- Results
- Future work

Background & Motivation
Robotics applications (figures: a robot with a laser range finder; a map built from robot data):
- N-body collision avoidance: robots need to avoid collisions when they move
- Simultaneous localization and mapping (SLAM)
- Time series data visualization in real time
Cloud and HPC applications: work in progress

Data Pipeline
- Gateway: sends events to pub-sub message brokers (RabbitMQ, Kafka) and persists them to storage
- Streaming workflows in Apache Storm: a stream application with some tasks running in parallel; multiple streaming workflows can run side by side
- Hosted in the FutureSystems OpenStack cloud, which is accessible through the IU network
- End-to-end delay without any processing is less than 10 ms
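As a concrete illustration of the gateway side, here is a minimal sketch of publishing a timestamped sensor event to one of the brokers named above (Kafka); the broker address, topic name, and event format are assumptions for illustration, not details from the paper.

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.util.Properties;

/** Sketch: the gateway publishes sensor events to a pub-sub broker (Kafka here).
 *  The topic "sensor-events" and the broker address are hypothetical. */
public class GatewaySketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker-node:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            // Each event carries a timestamp so end-to-end delay can be measured downstream
            String event = System.currentTimeMillis() + ",laser-scan-data";
            producer.send(new ProducerRecord<>("sensor-events", "robot-1", event));
        }
    }
}
```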

Improving Communications
- Broadcast communication
- Shared memory communication between workers in a node

Streaming Application
- A user graph (a stream of events plus the event processing logic) is converted to an execution graph
- Communication methods: Broadcast, Round Robin, Direct, Stream Partitioning
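These four methods correspond roughly to Storm's stream groupings. A minimal topology sketch, assuming the Storm 1.x API; the spout and bolt classes (SensorSpout, ProcessBolt) and the field name sensorId are hypothetical stand-ins:

```java
import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;

public class StreamingAppSketch {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("source", new SensorSpout(), 1);          // hypothetical spout
        // Round Robin: shuffleGrouping spreads tuples evenly over the 4 tasks
        builder.setBolt("rr", new ProcessBolt(), 4).shuffleGrouping("source");
        // Broadcast: allGrouping replicates every tuple to all tasks
        builder.setBolt("bcast", new ProcessBolt(), 4).allGrouping("source");
        // Direct: the sender picks the exact destination task (requires emitDirect)
        builder.setBolt("direct", new ProcessBolt(), 4).directGrouping("source");
        // Stream Partitioning: fieldsGrouping keys tuples by a field value
        builder.setBolt("part", new ProcessBolt(), 4).fieldsGrouping("source", new Fields("sensorId"));

        Config conf = new Config();
        conf.setNumWorkers(4); // worker processes that host the tasks above
        StormSubmitter.submitTopology("stream-app", conf, builder.createTopology());
    }
}
```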

Example Applications
- SLAM: particles are distributed among parallel tasks; map building happens periodically
- N-body collision avoidance

Execution Graph Distribution in the Storm Cluster
(Figure: a two-node cluster, each node running two workers; the tasks of the topology are assigned to the workers.)

Communications in Storm
(Figure: worker and task distribution in Storm.) A worker hosts multiple tasks; for example, B-1 is a task of component B and W-1 is a task of component W. Communication links are between workers, and these links are multiplexed among the tasks.

Default Broadcast
(Figure: the same two-node, four-worker layout.) When B-1 wants to broadcast a message to component W, it sends six messages through three TCP communication channels and one message to W-1 via memory.

Optimized Broadcast
- Binary tree: workers are arranged in a binary tree
- Flat tree: broadcast from the origin to one worker in each node sequentially; that worker then broadcasts to the other workers in its node sequentially
- Bidirectional ring: workers are arranged in a line; two broadcasts start from the origin and each traverses half of the line
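To make the flat tree concrete, here is a small sketch (not the paper's implementation) that computes which worker forwards to which. Worker names follow an assumed "n<node>-w<slot>" scheme, and the origin is assumed to be the first worker on its node:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch: forwarding lists for a flat-tree broadcast. */
public class FlatTreeBroadcast {
    static Map<String, List<String>> plan(int nodes, int workersPerNode, String origin) {
        Map<String, List<String>> sends = new LinkedHashMap<>();
        sends.put(origin, new ArrayList<>());
        for (int n = 0; n < nodes; n++) {
            String head = "n" + n + "-w0";                  // one worker per node
            if (!head.equals(origin))
                sends.get(origin).add(head);                // origin -> node heads, sequentially
            List<String> intra = new ArrayList<>();
            for (int w = 1; w < workersPerNode; w++)
                intra.add("n" + n + "-w" + w);              // head -> rest of its node
            sends.merge(head, intra, (a, b) -> { a.addAll(b); return a; });
        }
        return sends;
    }

    public static void main(String[] args) {
        // 4 nodes x 4 workers, matching the experimental layout on the next slide
        plan(4, 4, "n0-w0").forEach((from, to) -> System.out.println(from + " -> " + to));
    }
}
```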

Three Algorithms for Broadcast
(Figure: example broadcast communications for each algorithm in a four-node cluster with four workers per machine. The outer green boxes show cluster machines and the inner small boxes show workers; the top box is the broadcasting worker, and arrows illustrate the communication among the workers.)

Shared Memory Based Communications
- Inter-process communication using shared memory within a single node
- Multiple-writer, single-reader design
- The writer breaks the message into packets and puts them into memory
- The reader reads the packets and assembles the message
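A minimal sketch of this design over a memory-mapped file, assuming Java NIO. A real cross-process version would keep the write offset in the shared memory itself and update it with a compare-and-swap; an in-process AtomicLong stands in for that here, and file rollover and cleanup (described on the next slide) are omitted:

```java
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.util.concurrent.atomic.AtomicLong;

/** Sketch: multiple writers append packets to a memory-mapped file;
 *  a single reader consumes them sequentially. */
public class SharedFileQueue {
    private final MappedByteBuffer map;
    private final AtomicLong writePos = new AtomicLong(0); // stand-in for a shared offset
    private long readPos = 0;

    SharedFileQueue(String path, int size) throws Exception {
        try (RandomAccessFile f = new RandomAccessFile(path, "rw")) {
            map = f.getChannel().map(FileChannel.MapMode.READ_WRITE, 0, size);
        }
    }

    /** Writer: obtain the write location atomically and increment, then copy the packet. */
    void writePacket(byte[] packet) {
        long pos = writePos.getAndAdd(4 + packet.length);  // reserve space
        ByteBuffer b = map.duplicate();
        b.position((int) pos);
        b.putInt(packet.length).put(packet);
    }

    /** Reader: read packet by packet, sequentially. */
    byte[] readPacket() {
        ByteBuffer b = map.duplicate();
        b.position((int) readPos);
        byte[] packet = new byte[b.getInt()];
        b.get(packet);
        readPos += 4 + packet.length;
        return packet;
    }
}
```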

Shared Memory Communications
(Figure: two writers appending packets to a shared file, with a single reader.)
- Write: obtain the write location atomically and increment it
- Read: the reader reads packet by packet, sequentially
- A new file is used when the file size limit is reached; the reader deletes the files after it reads them fully
Packet structure (fields): ID | No. of Packets | Packet No. | Destination Task | Content Length | Source Task | Stream Length | Stream | Content, where Stream and Content are variable-length byte fields
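A sketch of serializing one packet with those fields; the 4-byte integer widths and the encoding are assumptions, since the slide only lists the field names and their order:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Sketch: encode one packet with the header fields listed above. */
public class PacketCodec {
    static ByteBuffer encode(int id, int numPackets, int packetNo, int destTask,
                             int sourceTask, String stream, byte[] content) {
        byte[] s = stream.getBytes(StandardCharsets.UTF_8);
        ByteBuffer b = ByteBuffer.allocate(4 * 7 + s.length + content.length);
        b.putInt(id).putInt(numPackets).putInt(packetNo).putInt(destTask);
        b.putInt(content.length).putInt(sourceTask);
        b.putInt(s.length);          // stream length, then the variable-length fields
        b.put(s).put(content);
        b.flip();
        return b;
    }
}
```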

Experiments
(Figure: the benchmark topology, driven by a RabbitMQ client.)
- 11-node cluster: 1 node for Nimbus & ZooKeeper, 1 node for RabbitMQ, 1 node for the client, and 8 supervisor nodes with 4 workers each
- The client sends messages carrying the current timestamp, and the topology returns a response with the same timestamp
- Latency = current time - timestamp
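A sketch of that measurement loop using the RabbitMQ Java client; the host and queue names are hypothetical, and the topology is assumed to echo the request body back:

```java
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import java.nio.ByteBuffer;

/** Sketch: publish the current timestamp, then compute
 *  latency = current time - timestamp when the echo arrives. */
public class LatencyClient {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("rabbitmq-node");                  // hypothetical broker host
        try (Connection conn = factory.newConnection()) {
            Channel ch = conn.createChannel();
            ch.queueDeclare("requests", false, false, false, null);
            ch.queueDeclare("responses", false, false, false, null);

            // Response handler: subtract the embedded timestamp from the clock
            ch.basicConsume("responses", true, (tag, delivery) -> {
                long sent = ByteBuffer.wrap(delivery.getBody()).getLong();
                System.out.println("latency ms: " + (System.currentTimeMillis() - sent));
            }, tag -> { });

            // Send a message stamped with the current time
            byte[] msg = ByteBuffer.allocate(8).putLong(System.currentTimeMillis()).array();
            ch.basicPublish("", "requests", null, msg);
            Thread.sleep(1000);                            // wait for the echoed response
        }
    }
}
```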

Memory-Mapped Communications
(Figures: latency of the default TCP implementation, and the relative importance of shared memory communication compared to TCP; the Y axis of the second plot shows the latency difference between the default TCP implementation and the shared-memory implementation, i.e. TCP - SHM.)
- Benchmark: a topology with communications going through all the workers, arranged in a line
- About a 25% latency reduction for 32 workers
- No significant difference beyond 30 workers, because at that point all the workers in the cluster are already in use

Broadcast Latency Improvement
(Figures: latency of the binary tree, flat tree, and bidirectional ring implementations compared to the serial implementation, and the corresponding speedup in latency; different lines show varying numbers of parallel tasks, with both TCP and shared memory (SHM) communications.)

Throughput
(Figure: throughput of the serial, binary tree, flat tree, and ring implementations; different lines show varying numbers of parallel tasks, with both TCP and shared memory (SHM) communications.)

Future Work
- Implement the shared memory and communication algorithms for other stream engines (e.g., Twitter Heron)
- Experiment on larger clusters with applications
- An HPC scheduler for streaming applications (Slurm, Torque)
- C++ APIs for data processing
- High-performance interconnects for high-throughput, low-latency communication in HPC clusters
- Scheduling that takes advantage of the shared memory and collective communications

Summary
- The communication algorithms reduce network traffic and increase throughput
- The ring gives the best throughput and the binary tree the best latency
- Shared memory communication reduces latency further, but not throughput, because TCP is the bottleneck

Acknowledgements: This work was partially supported by NSF CIF21 DIBBS and AFOSR FA awards. We would also like to express our gratitude to the staff and system administrators at the Digital Science Center (DSC) at Indiana University for supporting us in this work.

References
Projects:
- SLAM
- Collision Avoidance
- Time series data visualization
- Apache Storm
- Twitter Heron