Geo-distributed Messaging with RabbitMQ

Presentation transcript:

Geo-distributed Messaging with RabbitMQ: Real World Case Study, by Andriy Shapochka, System Architect @ SoftServe

Context: A security management SaaS solution with availability and scalability issues.

Goal: To build a highly available, geo-distributed clustered solution based on the original implementation. Clients can have their servers in the US, Europe, and elsewhere. Both Amazon and data center deployments are to be supported. The data should always stay consistent in the sense of the CAP theorem.

Architecture Drivers: Nodes communicate over WAN and must account for high latencies and possible connectivity interruptions. The main quality attributes to achieve are: high availability; eventual state consistency on each active node (primary and replicas); and inter-node communication security (transport protocol encryption and authentication). The secondary quality attributes to achieve are: performance and short maintenance windows.

CAP Theorem by Brewer: Consistency, Availability, Partition Tolerance: pick only two. Consistency roughly means that all clients of a data store get responses to requests that "make sense". For example, if Client A writes 1 then 2 to location X, Client B cannot read 2 followed by 1. Availability means that all operations on a data store eventually return successfully; we say that a data store is "available" for, e.g., write operations. Partition tolerance asks: if the network stops delivering messages between two sets of servers, will the system continue to work correctly?
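The consistency example on this slide can be made concrete with a small check. This is only an illustrative sketch: the function name `reads_respect_write_order` is hypothetical, and it assumes every written value is distinct so that a value identifies its position in the write order.

```python
def reads_respect_write_order(writes, reads):
    """Return True if a reader never observes values in an order that
    contradicts the order in which they were written.

    Assumption (not from the slides): written values are distinct.
    """
    position = {value: index for index, value in enumerate(writes)}
    last_seen = -1
    for value in reads:
        if value not in position:
            return False  # the reader saw a value nobody wrote
        if position[value] < last_seen:
            return False  # the reader went "back in time"
        last_seen = position[value]
    return True


# Client A writes 1 then 2; Client B reading 2 then 1 is inconsistent.
print(reads_respect_write_order([1, 2], [1, 2]))  # True
print(reads_respect_write_order([1, 2], [2, 1]))  # False
```

Repeated reads of the same value are accepted, because re-reading a value does not contradict the write order.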

A partition is a property of the network, not our choice. Partition tolerance therefore means choosing a data copying strategy that favors either consistency or availability. In a distributed system that may drop messages we cannot have both consistency and availability of the data, only one of them.

Primary Decision: To use RabbitMQ as the platform for the data bus between the nodes in the cluster. RabbitMQ is an AMQP broker with extensions, implemented in Erlang. Clients exist for Java, Python, .NET, REST, etc. Relevant strengths: performance, HA, federation, clustering, flexible routing, security.

Messaging in RabbitMQ: Exchanges (fanout, direct, topic), which can also be bound to other exchanges. Queues bound to exchanges. Routing keys. Acknowledgements. RPC with correlation ids.
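To illustrate how the three exchange types use routing keys, here is a broker-free sketch in plain Python. It does not use any RabbitMQ client library; `topic_matches` and `route` are hypothetical helper names, and the binding keys and queue names are made up. The topic rules follow AMQP 0-9-1 semantics: `*` matches exactly one dot-separated word and `#` matches zero or more words.

```python
def topic_matches(pattern, routing_key):
    """Match a topic binding pattern against a routing key."""
    p = pattern.split(".")
    k = routing_key.split(".")

    def match(i, j):
        if i == len(p):
            return j == len(k)
        if p[i] == "#":
            # '#' may consume zero or more of the remaining words
            return any(match(i + 1, n) for n in range(j, len(k) + 1))
        if j == len(k):
            return False
        if p[i] == "*" or p[i] == k[j]:
            return match(i + 1, j + 1)
        return False

    return match(0, 0)


def route(exchange_type, bindings, routing_key):
    """Return the queues that would receive a message.

    bindings is a list of (binding_key, queue_name) pairs.
    """
    if exchange_type == "fanout":
        return [queue for _, queue in bindings]  # routing key is ignored
    if exchange_type == "direct":
        return [queue for key, queue in bindings if key == routing_key]
    if exchange_type == "topic":
        return [queue for key, queue in bindings if topic_matches(key, routing_key)]
    raise ValueError("unsupported exchange type: %s" % exchange_type)


bindings = [("tx.*.eu", "eu-replica"), ("tx.#", "audit")]
print(route("topic", bindings, "tx.update.eu"))
print(route("topic", bindings, "tx.update.us"))
```

With these bindings, a message keyed `tx.update.eu` reaches both queues, while `tx.update.us` reaches only `audit`.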

Geo-Distribution in RabbitMQ: Federation plugin to the rescue.

Design Strategy: Components. Application Node: the nodes are geo-distributed and each can play one of two roles. The primary node is the single node serving user requests in the entire cluster. The replicas are all the other nodes, which are updated with the changes made on the primary node. Cluster Controller: a single node controlling the cluster state and assigning the primary and replica roles to the application nodes.

Design Strategy: Decisions - 1. All the application nodes are equivalent in the sense that each of them can become the primary or a replica at runtime. The application node status (active, primary, replica, etc.) is controlled by the cluster monitor. The replicas receive updates from the current primary node by means of exchange federation. Each replica monitors its transaction flow state and validates it against every new incoming primary update.
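The slides do not say how a replica validates its transaction flow state. One plausible approach, shown here purely as an assumption, is for the primary to stamp each update with a monotonically increasing sequence number so that the replica can spot gaps. The class `ReplicaFlowState` and its methods are hypothetical names for this sketch.

```python
class ReplicaFlowState:
    """Tracks the last applied transaction on a replica.

    Assumption (not from the slides): updates carry consecutive
    integer sequence numbers assigned by the primary.
    """

    def __init__(self, last_applied=0):
        self.last_applied = last_applied

    def on_update(self, seq):
        """Classify an incoming update and apply it if it is the next one."""
        if seq == self.last_applied + 1:
            self.last_applied = seq
            return "apply"
        if seq <= self.last_applied:
            return "duplicate"  # already applied, safe to ignore
        return "gap"  # something was missed: the state is inconsistent

    def missing(self, seq):
        """Sequence numbers that must be fetched before seq can be applied."""
        return list(range(self.last_applied + 1, seq))


state = ReplicaFlowState(last_applied=3)
print(state.on_update(4))  # apply
print(state.on_update(7))  # gap
print(state.missing(7))    # [5, 6]
```

A "gap" verdict is the point at which a replica in this sketch would stop applying updates and start recovering the missing transactions.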

Design Strategy: Decisions - 2. When a replica finds that its transaction flow state has become inconsistent, it switches to catch-up mode. To offload additional processing and communication from the primary node, catch-up involves two steps: (1) request the missing transactions from the other replicas; (2) if no replica succeeds in replying with the requested transactions, fall back to a catch-up request toward the primary node. The bus is built on the official RabbitMQ Federation plugin, which works by propagating the messages published in the local upstream exchanges to the federated exchanges owned by the remote brokers.
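The two-step catch-up can be sketched as follows. This is a minimal illustration, not the presenter's implementation: the function `catch_up` is hypothetical, and each peer is modelled as a plain callable that takes a list of sequence numbers and returns a dict mapping them to transactions. In the real system these calls would be RPC requests over the message bus.

```python
def catch_up(missing, ask_replicas, ask_primary):
    """Fetch missing transactions, preferring replicas over the primary.

    Returns (source, transactions), where source is "replica" or "primary".
    """
    # Step 1: try the other replicas first to keep load off the primary.
    for ask in ask_replicas:
        try:
            reply = ask(missing)
        except (ConnectionError, TimeoutError):
            continue  # this replica is unreachable, try the next one
        if reply is not None and all(seq in reply for seq in missing):
            return "replica", [reply[seq] for seq in missing]

    # Step 2: no replica had everything, so fall back to the primary.
    reply = ask_primary(missing)
    return "primary", [reply[seq] for seq in missing]


def unreachable(_missing):
    raise ConnectionError("replica is down")


def partial(_missing):
    return {5: "tx-5"}  # this replica has only part of the history


def primary(missing):
    return {seq: "tx-%d" % seq for seq in missing}


print(catch_up([5, 6], [unreachable, partial], primary))
```

Here both replicas fail to supply the full range, so the request falls through to the primary. A replica that returned both transactions would have answered instead.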

Design Strategy: RPC Catch-up

Design Strategy: Extra Mile. Security: communication is secured with TLS (amqps upstream URIs), using the server and client certificates supported by RabbitMQ Federation. It is configured in the upstream part. Cluster Configuration: runtime roulette. The upstream configuration occurs at deployment time, while exchange creation happens at runtime. The Cluster Controller selects and promotes the new primary and notifies the replicas. Cluster Controller: it is a non-trivial task to build a cluster controller that is itself HA and avoids the split-brain issue. ZooKeeper and other controller distribution options were evaluated. In the end it was decided to build it on top of AWS infrastructure, using Multi-AZ RDS as the configuration storage.
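As a rough sketch of what the deployment-time upstream configuration might look like, the snippet below builds a federation upstream definition and a federation policy as JSON and prints the corresponding `rabbitmqctl` commands. The host name, certificate paths, upstream name, policy name, and exchange pattern are all made-up placeholders. The `uri`, `ack-mode` and `federation-upstream-set` keys and the `cacertfile`, `certfile`, `keyfile` and `verify` URI query parameters are taken from the RabbitMQ Federation and URI documentation, but they should be checked against the RabbitMQ version in use.

```python
import json
from urllib.parse import urlencode


def upstream_uri(host, cacertfile, certfile, keyfile, port=5671):
    """Build an amqps URI that carries the TLS client settings."""
    params = {
        "cacertfile": cacertfile,
        "certfile": certfile,
        "keyfile": keyfile,
        "verify": "verify_peer",  # verify the upstream server certificate
    }
    return "amqps://%s:%d?%s" % (host, port, urlencode(params, safe="/"))


def upstream_definition(uri):
    """Value for the federation-upstream runtime parameter."""
    return {"uri": uri, "ack-mode": "on-confirm"}


def federation_policy():
    """Policy definition that federates matching exchanges with all upstreams."""
    return {"federation-upstream-set": "all"}


# All names and paths below are hypothetical placeholders.
uri = upstream_uri(
    "primary.example.com",
    "/etc/rabbitmq/tls/ca.pem",
    "/etc/rabbitmq/tls/client.pem",
    "/etc/rabbitmq/tls/client.key",
)
print("rabbitmqctl set_parameter federation-upstream primary-upstream '%s'"
      % json.dumps(upstream_definition(uri)))
print("rabbitmqctl set_policy --apply-to exchanges federate-tx '^tx\\.' '%s'"
      % json.dumps(federation_policy()))
```

Because the upstream is a runtime parameter and federation is enabled by policy, exchanges declared later at runtime are federated automatically once their names match the policy pattern.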

Thank You! Questions, please!