Parallel Event Processing for Content-Based Publish/Subscribe Systems Amer Farroukh Department of Electrical and Computer Engineering University of Toronto.


Joint work with Elias Ferzli, Naweed Tajuddin, and Hans-Arno Jacobsen
DEBS 2009

Motivation
– Event processing is ubiquitous in enterprise-scale applications (fraud detection, data analysis)
– Network security monitoring and analysis tools, such as application-layer firewalls, require gigabit-per-second speeds
– Selective dissemination of information for Internet-scale applications (RSS, XML, XPath)
– These systems must support thousands of users and process millions of events
– Achieving scalability and high performance under heavy load is a challenging problem
– The matching engine is the most computation-intensive function in event processing

How to support high data-processing rates?
– Choose an existing, powerful matching algorithm
– Leverage chip multiprocessors
– Increase throughput or reduce matching time
– Evaluate multi-threading vs. software transactional memory

Outline
– Related work
– Matching algorithm
– Parallelization techniques
– Implementation and results

Sequential Matching Algorithms
– Single phase: A_TREAT [E.H., 1992]
  – Predicates are compiled into a test network
  – A subscription may appear in one or several leaves
  – Poor locality, space-consuming, hard to maintain
– Two phase: SIFT [T.Y., 2000]
  – Predicates are evaluated in the first phase
  – Subscriptions are matched in the second phase
  – Predicates and subscriptions are indexed
– Algorithm used: Filtering Algorithms [F.F., 2001]

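Matching Algorithm
[Figure: an event E passes through Phase 1, where predicates (P1, P2) over the attributes Price, Color, and Quantity are evaluated, and Phase 2, where subscription clusters (C1, C2, C3) grouped under access predicates Ap 1 to Ap 5 are scanned; matched subscriptions shown include S1, S5, and S9.]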
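The two-phase idea can be illustrated with a short Python sketch. It is a deliberately simplified illustration, not the Filtering Algorithms implementation used in this work: the (attribute, operator, constant) predicate encoding, the helper names, and subscription H1 are hypothetical, and the bit vector is a plain Python list. It omits the predicate tables and subscription clusters that make the real algorithm fast.

```python
import operator

# Hypothetical predicate representation: (attribute, operator, constant).
OPS = {"=": operator.eq, "<": operator.lt, ">": operator.gt, "!=": operator.ne}


def build_index(subscriptions):
    """Give every distinct predicate an id and rewrite subscriptions as id lists."""
    pred_ids = {}
    index = {}
    for sub_id, preds in subscriptions.items():
        index[sub_id] = [pred_ids.setdefault(p, len(pred_ids)) for p in preds]
    return pred_ids, index


def match(event, pred_ids, index):
    """Two-phase matching of one event (a dict of attribute -> value)."""
    # Phase 1: evaluate each distinct predicate once; record results in a bit vector.
    bits = [False] * len(pred_ids)
    for (attr, op, const), pid in pred_ids.items():
        bits[pid] = attr in event and OPS[op](event[attr], const)
    # Phase 2: a subscription matches when all of its predicate bits are set.
    return sorted(s for s, ids in index.items() if all(bits[i] for i in ids))


subs = {"S1": [("quantity", "=", 2), ("price", "<", 30)],  # S1 from the slides
        "H1": [("quantity", ">", 4)]}                      # hypothetical
pred_ids, index = build_index(subs)
print(match({"quantity": 2, "price": 25}, pred_ids, index))  # ['S1']
```

Evaluating each shared predicate only once in Phase 1 is what lets Phase 2 reduce to cheap bit tests.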

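Multiple Events Independent Processing
[Figure: events E1 and E2 are matched by Thread 1 and Thread 2 respectively, each thread processing its own event over the same predicates (P1, P2; attributes Price, Color, Quantity) and subscription clusters (access predicates Ap 1 to Ap 5, clusters C1 to C3); matched subscriptions shown include S1, S2, S7, S8, and S9.]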
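Independent processing can be sketched with a thread pool in which each task matches one whole event. The sketch assumes Python's `concurrent.futures`; the naive `match` function is a stand-in for the two-phase algorithm, and subscription H1 is hypothetical. In standard CPython builds the global interpreter lock keeps CPU-bound threads from running in parallel, so this shows the work decomposition rather than the speedups reported here.

```python
import operator
from concurrent.futures import ThreadPoolExecutor

# Hypothetical predicate representation: (attribute, operator, constant).
OPS = {"=": operator.eq, "<": operator.lt, ">": operator.gt, "!=": operator.ne}


def match(event, subscriptions):
    """Sequential stand-in matcher: a subscription matches if all its predicates hold."""
    return sorted(
        sub_id
        for sub_id, preds in subscriptions.items()
        if all(a in event and OPS[op](event[a], c) for a, op, c in preds)
    )


def match_events_independently(events, subscriptions, num_threads=4):
    """Each worker thread matches whole events on its own.

    The subscription data is only read during matching, so the threads
    need no locks; pool.map returns results in the order of `events`.
    """
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return list(pool.map(lambda e: match(e, subscriptions), events))


subs = {"S1": [("quantity", "=", 2), ("price", "<", 30)],
        "H1": [("quantity", ">", 4)]}  # H1 is hypothetical
events = [{"quantity": 2, "price": 25}, {"quantity": 5, "price": 25}]
print(match_events_independently(events, subs))  # [['S1'], ['H1']]
```

Because threads never share an event, per-event matching time is unchanged while more events complete per second.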

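Single Event Collaborative Processing
[Figure: a single event E is processed collaboratively by Thread 1 and Thread 2 over the same predicates (P1, P2; attributes Price, Color, Quantity) and subscription clusters (access predicates Ap 1 to Ap 5, clusters C1 to C3); matched subscriptions shown include S1.]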
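One way to split a single event across threads is sketched below, assuming the work is divided statically: threads evaluate disjoint slices of the predicates (Phase 1), then check disjoint slices of the subscriptions (Phase 2). The partitioning scheme, helper names, and subscription H1 are hypothetical; the lock guarding the shared result list echoes the lock-based synchronization option, but this is not the evaluated implementation.

```python
import operator
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical predicate representation: (attribute, operator, constant).
OPS = {"=": operator.eq, "<": operator.lt, ">": operator.gt, "!=": operator.ne}


def chunks(items, n):
    """Split a list into n contiguous parts whose sizes differ by at most one."""
    k, r = divmod(len(items), n)
    parts, start = [], 0
    for i in range(n):
        end = start + k + (1 if i < r else 0)
        parts.append(items[start:end])
        start = end
    return parts


def match_collaboratively(event, subscriptions, num_threads=2):
    """Match ONE event with several threads sharing both phases."""
    preds = list(dict.fromkeys(p for ps in subscriptions.values() for p in ps))
    pid = {p: i for i, p in enumerate(preds)}
    bits = [False] * len(preds)  # shared bit vector for this event
    matched, lock = [], threading.Lock()

    def phase1(part):
        # Each thread writes only the bits of its own predicates, so the
        # statically partitioned writes need no lock.
        for attr, op, const in part:
            bits[pid[(attr, op, const)]] = attr in event and OPS[op](event[attr], const)

    def phase2(part):
        local = [s for s in part if all(bits[pid[p]] for p in subscriptions[s])]
        with lock:  # the shared result list is protected by a lock
            matched.extend(local)

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # Consuming the map results waits for every phase-1 task,
        # which acts as a barrier before phase 2 reads the bit vector.
        list(pool.map(phase1, chunks(preds, num_threads)))
        list(pool.map(phase2, chunks(sorted(subscriptions), num_threads)))
    return sorted(matched)


subs = {"S1": [("quantity", "=", 2), ("price", "<", 30)],
        "H1": [("quantity", ">", 4)]}  # H1 is hypothetical
print(match_collaboratively({"quantity": 2, "price": 25}, subs))  # ['S1']
```

Every event still allocates a bit vector over all distinct predicates, which suggests why bit-vector size can dominate as subscriptions grow.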

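Multiple Events Collaborative Processing
[Figure: threads (T1, T2, T3, ...) are organized into groups (Group 1, Group 2); events E1 and E2 are each processed collaboratively by a thread group over the shared predicates (P1, P2; attributes Price, Color, Quantity) and subscription clusters (access predicates Ap 1 to Ap 5); matched subscriptions shown include S2, S3, S4, S7, and S9.]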
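The hybrid scheme can be sketched as nested thread pools: an outer pool keeps several events in flight, and each event is handled by its own group of threads. This is a hypothetical arrangement; for brevity each group only splits the subscription checks (no shared Phase 1 bit vector), and the names `num_groups`, `threads_per_group`, and subscription H1 are illustrative.

```python
import operator
from concurrent.futures import ThreadPoolExecutor

# Hypothetical predicate representation: (attribute, operator, constant).
OPS = {"=": operator.eq, "<": operator.lt, ">": operator.gt, "!=": operator.ne}


def holds(pred, event):
    attr, op, const = pred
    return attr in event and OPS[op](event[attr], const)


def match_in_group(event, subscriptions, threads_per_group):
    """One thread group splits the subscriptions of a single event among its threads."""
    ids = sorted(subscriptions)
    parts = [ids[i::threads_per_group] for i in range(threads_per_group)]

    def check(part):
        return [s for s in part if all(holds(p, event) for p in subscriptions[s])]

    with ThreadPoolExecutor(max_workers=threads_per_group) as group:
        # Each thread returns its own partial list, so no shared state is written.
        return sorted(s for part in group.map(check, parts) for s in part)


def match_multiple_collaborative(events, subscriptions, num_groups=2, threads_per_group=2):
    """Up to num_groups events are in flight; each is handled by its own group."""
    with ThreadPoolExecutor(max_workers=num_groups) as outer:
        return list(outer.map(
            lambda e: match_in_group(e, subscriptions, threads_per_group), events))


subs = {"S1": [("quantity", "=", 2), ("price", "<", 30)],
        "H1": [("quantity", ">", 4)]}  # H1 is hypothetical
events = [{"quantity": 2, "price": 25}, {"quantity": 5, "price": 25}]
print(match_multiple_collaborative(events, subs))  # [['S1'], ['H1']]
```

With a fixed budget of `num_groups * threads_per_group` threads, more groups favor throughput and larger groups favor per-event matching time.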

Implementation Setup
– Synchronization
  – Static
  – Locks
  – Software transactional memory (STM)
– Machine
  – 2.33 GHz quad-core Xeon processors
  – 32 KB L1 cache and 4 MB L2 cache
– Workload
  – Number of subscriptions: 1M to 6M
  – Average predicates per subscription: 10
  – Predicate range:
  – Number of events: 5000
  – Average attributes per event: 50

Multiple Events Independent Processing: Analysis
– Linear throughput scaling and constant average matching time

Single Event Collaborative Processing: Analysis
– The lock implementation performs best
– Bit-vector size limits scalability

Multiple Events Collaborative Processing: Analysis
– Threads can be allocated based on system requirements and load

Conclusions
– A parallel matching engine is a promising solution
– Over 1600 events/s with 6M subscriptions
– Trade-off between matching time and throughput
– The lock-based implementation is more efficient than STM
– Hardware transactional memory (HTM) is a potential candidate for improving speed and easing implementation
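The matching-time versus throughput trade-off can be made concrete with an idealized relation, assuming the independent-processing scheme with T threads that each spend an average time t_match per event and do not contend (T and t_match are symbols introduced here, not values from the slides). Throughput X then grows with T while t_match stays constant. At 1600 events/s, one event completes every 0.625 ms on average; that interval equals the per-event matching time only when a single thread does all the work.

```latex
X \approx \frac{T}{t_{\mathrm{match}}}, \qquad
\frac{1}{X} = \frac{1}{1600\ \text{events/s}} = 0.625\ \text{ms per event}
```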

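Predicate Tables (Phase 1)
[Figure: one predicate table per attribute, e.g. PRICE and QUANTITY, each with rows for the operators EQUAL, LESS, GREATER, and NOT EQUAL; the QUANTITY table is indexed by the values 1 to 5. Example subscriptions: S1: quantity = 2, price < 30; S2: quantity > 4, price = …]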
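A per-attribute predicate table might be organized as below. This is one possible layout, not necessarily the paper's: equality and inequality predicates sit in hash maps keyed by constant, while LESS and GREATER predicates are kept sorted so that binary search finds every satisfied predicate at once. The slide's QUANTITY table appears to be indexed directly by the values 1 to 5; the sketch uses a hash map instead. The class name and predicate ids are hypothetical, and numeric constants are assumed.

```python
from bisect import bisect_left, bisect_right, insort
from collections import defaultdict


class PredicateTable:
    """Hypothetical predicate table for one numeric attribute, indexed by operator."""

    def __init__(self):
        self.equal = defaultdict(list)      # constant -> ids of "attr = constant"
        self.not_equal = defaultdict(list)  # constant -> ids of "attr != constant"
        self.less = []                      # sorted (constant, id) for "attr < constant"
        self.greater = []                   # sorted (constant, id) for "attr > constant"

    def add(self, op, const, pid):
        if op == "=":
            self.equal[const].append(pid)
        elif op == "!=":
            self.not_equal[const].append(pid)
        elif op == "<":
            insort(self.less, (const, pid))
        elif op == ">":
            insort(self.greater, (const, pid))
        else:
            raise ValueError(f"unsupported operator: {op!r}")

    def satisfied(self, value):
        """Return the sorted ids of all predicates on this attribute that hold for value."""
        hits = list(self.equal.get(value, []))
        hits += [pid for c, ids in self.not_equal.items() if c != value for pid in ids]
        # "attr < c" holds for every constant c strictly greater than value.
        i = bisect_right(self.less, (value, float("inf")))
        hits += [pid for _, pid in self.less[i:]]
        # "attr > c" holds for every constant c strictly smaller than value.
        j = bisect_left(self.greater, (value, float("-inf")))
        hits += [pid for _, pid in self.greater[:j]]
        return sorted(hits)


# Predicates from the slide's example (ids are hypothetical):
# 0: quantity = 2, 1: price < 30, 2: quantity > 4
quantity, price = PredicateTable(), PredicateTable()
quantity.add("=", 2, 0)
price.add("<", 30, 1)
quantity.add(">", 4, 2)
print(quantity.satisfied(2), price.satisfied(25))  # [0] [1]
```

The ids returned by `satisfied` are exactly the bits Phase 1 would set for that attribute.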

Subscription Clusters (Phase 2)
[Figure: access predicates Ap 1, Ap 2, …, Ap N, each associated with clusters of subscriptions (S1 to S5) and their predicates (P1 to P4).]
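Clustering by access predicate can be sketched as follows, assuming, as the figure's labels suggest, that each subscription is stored under one of its own predicates (its access predicate). Phase 2 then tests a single bit to skip an entire cluster whose access predicate failed. Choosing the first predicate as the access predicate is a simplification made here; picking a more selective one would shrink the clusters that get scanned. All ids are hypothetical.

```python
from collections import defaultdict


def build_clusters(subscriptions):
    """Group subscriptions by a hypothetical access predicate.

    `subscriptions` maps a subscription id to its list of predicate ids.
    For simplicity the first predicate id serves as the access predicate.
    """
    clusters = defaultdict(list)
    for sub_id, pred_ids in subscriptions.items():
        access, rest = pred_ids[0], pred_ids[1:]
        clusters[access].append((sub_id, rest))
    return clusters


def phase2(bits, clusters):
    """Scan only the clusters whose access predicate held in phase 1."""
    matched = []
    for access, members in clusters.items():
        if not bits[access]:
            continue  # one bit test skips the entire cluster
        for sub_id, rest in members:
            if all(bits[p] for p in rest):
                matched.append(sub_id)
    return sorted(matched)


# Hypothetical subscriptions over predicate ids 0..3, and a phase-1 bit vector.
clusters = build_clusters({"a": [0, 1], "b": [2, 1], "c": [0, 3]})
bits = [True, True, False, False]
print(phase2(bits, clusters))  # ['a']
```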

Time Profiling

Block Size

Subscriptions Effect
[Figure panels: ME-IP (multiple events, independent processing) and SE-CP (single event, collaborative processing)]