
1 SPUD: A Distributed High-Performance Publish-Subscribe Cluster
Uriel Peled and Tal Kol
Guided by Edward Bortnikov
Software Systems Laboratory, Faculty of Electrical Engineering, Technion

2 Project Goal
Design and implement a general-purpose publish-subscribe server.
Push traditional implementations to global-scale performance demands:
- 1 million concurrent clients
- Millions of concurrent topics
- High transaction rate
Demonstrate the server's abilities with a fun client application.

3 What is Pub/Sub?
[Diagram: clients subscribe to the topic topic://traffic-jams/ayalon; a publisher then publishes "accident in hashalom" to that topic, and every subscriber receives the message.]
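To make the publish/subscribe contract concrete, here is a minimal in-process sketch in C++. The Broker class, its method names, and the callback signature are illustrative assumptions, not SPUD's actual API; SPUD's broker is a networked cluster rather than a single object.

```cpp
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical in-process broker, only to illustrate the subscribe/publish contract.
class Broker {
public:
    using Callback = std::function<void(const std::string&)>;

    // Register interest in a topic; every later publish on it invokes cb.
    void subscribe(const std::string& topic, Callback cb) {
        subscribers_[topic].push_back(std::move(cb));
    }

    // Deliver a message to all current subscribers of the topic.
    void publish(const std::string& topic, const std::string& message) {
        for (auto& cb : subscribers_[topic]) cb(message);
    }

private:
    std::map<std::string, std::vector<Callback>> subscribers_;
};

int main() {
    Broker broker;
    broker.subscribe("topic://traffic-jams/ayalon",
                     [](const std::string& m) { /* e.g. show a traffic alert */ });
    broker.publish("topic://traffic-jams/ayalon", "accident in hashalom");
}
```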

4 What Can We Do With It?
Collaborative Web Browsing
others:

5 What Can We Do With It?
Instant Messaging
[Diagram: one client sends "Hi buddy!" and another client receives it.]

6 Seems Easy To Implement, But…
- “I’m behind a NAT, I can’t connect!”: not all client setups are server-friendly.
- “Server is too busy, try again later?!”: 1 million concurrent clients is simply too much.
- “The server is so slow!!!”: service time grows exponentially with load.
- “A server crashed, everything is lost!”: single points of failure will eventually fail.

7 Naïve Implementation (Example 1)
- Simple UDP for client-server communication
- No need for sessions, since we just send messages
- Very low cost per client
Sounds perfect? Not quite: a client behind a NAT cannot receive unsolicited UDP from the server.

8 NAT Traversal
Option 1: UDP hole punching
- The NAT accepts a UDP reply only for a short window (our measurements: 15-30 seconds)
- Keep each client pinging over UDP every 15 s to hold the hole open (see the keepalive sketch below)
Option 2: days-long TCP sessions
- The NAT remembers current sessions for replies: if the WWW works, we should work
- Dramatically increases the cost per client
- Our research: all IMs do exactly this
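A minimal sketch of the keepalive side of UDP hole punching, assuming POSIX sockets; the server address, port, and payload are placeholders, and SPUD's real client protocol is not shown in the slides.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <chrono>
#include <thread>

int main() {
    int sock = socket(AF_INET, SOCK_DGRAM, 0);

    sockaddr_in server{};
    server.sin_family = AF_INET;
    server.sin_port = htons(9000);                            // assumed server port
    inet_pton(AF_INET, "198.51.100.7", &server.sin_addr);     // assumed server address

    const char ping[] = "keepalive";
    for (;;) {
        // Each datagram refreshes the NAT's UDP mapping, so server replies keep
        // reaching us through the "hole" for another 15-30 second window.
        sendto(sock, ping, sizeof(ping), 0,
               reinterpret_cast<sockaddr*>(&server), sizeof(server));
        std::this_thread::sleep_for(std::chrono::seconds(15));
    }
}
```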

9 Naïve Implementation (Example 2)
- Blocking I/O with one thread per client (a minimal sketch follows)
- The basic model for most servers (the Java default)
- Traditional UNIX: fork for every client
Sounds perfect? [Diagram: groups of 500 clients, hinting at the per-box thread limit explained on the next slide.]
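For contrast with the asynchronous design on the next slide, here is a minimal sketch of the naive thread-per-client model, assuming POSIX sockets and an echo-style handler; the port and handler logic are placeholders, not the project's protocol.

```cpp
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <thread>

// Hypothetical blocking handler: one dedicated thread per client, each
// blocking on recv(). This is exactly the naive model the slide describes.
static void serve_client(int client_fd) {
    char buf[1024];
    ssize_t n;
    while ((n = recv(client_fd, buf, sizeof(buf), 0)) > 0)
        send(client_fd, buf, static_cast<size_t>(n), 0);
    close(client_fd);
}

int main() {
    int listener = socket(AF_INET, SOCK_STREAM, 0);
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(9000);                 // assumed port
    bind(listener, reinterpret_cast<sockaddr*>(&addr), sizeof(addr));
    listen(listener, SOMAXCONN);

    for (;;) {
        int client_fd = accept(listener, nullptr, nullptr);
        // Every client costs a full thread (stack, scheduling), which is what
        // caps this design at a few hundred clients per box.
        std::thread(serve_client, client_fd).detach();
    }
}
```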

10 Network I/O Internals
Blocking I/O: one thread per client
- With a 2 MB stack per thread, 1 GB of virtual address space is enough for only 512 threads (!)
Non-blocking I/O: select
- Linear fd searches are very slow
Asynchronous I/O: completion ports
- Thread pool to handle request completions
- Our measurements: 30,000 concurrent clients!
What is the bottleneck?
- The number of locked pages (avoided with zero-byte receives, sketched below)
- TCP/IP kernel driver non-paged pool allocations
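A minimal sketch of the completion-port plus zero-byte-receive approach this slide describes, assuming the Windows Winsock/IOCP API; error handling, connection teardown, and the real request processing are omitted, and the port number is a placeholder.

```cpp
#include <winsock2.h>
#include <windows.h>

struct Connection {
    OVERLAPPED overlapped{};   // first member, so an OVERLAPPED* maps back to the Connection
    SOCKET socket{};
};

// Post a zero-byte receive: its completion signals "data is waiting" without
// keeping a user buffer page-locked while the socket sits idle.
static void post_zero_byte_recv(Connection* conn) {
    WSABUF buf{0, nullptr};
    DWORD flags = 0;
    ZeroMemory(&conn->overlapped, sizeof(conn->overlapped));
    WSARecv(conn->socket, &buf, 1, nullptr, &flags, &conn->overlapped, nullptr);
}

// Worker threads (one per CPU) drain completions from the port.
static DWORD WINAPI worker(LPVOID port) {
    DWORD bytes; ULONG_PTR key; OVERLAPPED* ov;
    while (GetQueuedCompletionStatus(static_cast<HANDLE>(port), &bytes, &key, &ov, INFINITE)) {
        Connection* conn = reinterpret_cast<Connection*>(ov);
        // A real server would now issue a non-zero WSARecv, parse the request,
        // and send notifications; here we just re-arm the socket.
        post_zero_byte_recv(conn);
    }
    return 0;
}

int main() {
    WSADATA wsa;
    WSAStartup(MAKEWORD(2, 2), &wsa);

    HANDLE port = CreateIoCompletionPort(INVALID_HANDLE_VALUE, nullptr, 0, 0);
    SYSTEM_INFO si;
    GetSystemInfo(&si);
    for (DWORD i = 0; i < si.dwNumberOfProcessors; ++i)
        CreateThread(nullptr, 0, worker, port, 0, nullptr);

    SOCKET listener = socket(AF_INET, SOCK_STREAM, 0);
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(9000);                 // assumed port
    bind(listener, reinterpret_cast<sockaddr*>(&addr), sizeof(addr));
    listen(listener, SOMAXCONN);

    for (;;) {
        Connection* conn = new Connection;       // never freed in this sketch
        conn->socket = accept(listener, nullptr, nullptr);
        // Associate the socket with the port, then arm the zero-byte receive.
        CreateIoCompletionPort(reinterpret_cast<HANDLE>(conn->socket), port,
                               reinterpret_cast<ULONG_PTR>(conn), 0);
        post_zero_byte_recv(conn);
    }
}
```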

11 Scalability
Scale up: buy a bigger box. Scale out: buy more boxes. Which one to do? Both!
- Push each box to its hardware maximum: thousands of servers is impractical
- Add relevant boxes as load increases: the Google way (cheap PC server farms)

12 Identify Our Load Factors
Concurrent TCP clients
- Scale up: async I/O, zero-byte receives, a larger non-paged pool (NPP)
- Scale out: dedicate boxes to handling clients => Connection Server (CS)
High transaction throughput (topic load)
- Scale up: software optimizations
- Scale out: dedicate boxes to handling topics => Topic Server (TS)
Design the cluster accordingly.

13 Network Architecture

14 Client Load Balancing
[Diagram: a client sends a request to the Client Load Balancer (CLB), which picks a Connection Server (CS2 in the example) based on user location and each CS's client load; the client then performs login, subscribe, and publish through the given CS, which talks to the Topic Servers (TS1, TS2).]

15 Topic Load Balancing: Static
[Diagram: a subscribe for the topic "traffic" reaches the CS; the topic name is hashed and taken modulo the number of Topic Servers in Room 0, e.g. 923481 % 4 = 1, so the subscribe is routed to TS1. A hashing sketch follows.]
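A minimal sketch of static topic placement, assuming an FNV-1a hash; the hash function SPUD actually uses is not stated in the slides, only that the topic hash modulo the number of Topic Servers selects the target (as in 923481 % 4 = 1).

```cpp
#include <cstdint>
#include <string>

// Hypothetical topic hash (FNV-1a); any stable hash works for static placement.
uint32_t hash_topic(const std::string& topic) {
    uint32_t h = 2166136261u;
    for (char c : topic) {
        h ^= static_cast<uint8_t>(c);
        h *= 16777619u;
    }
    return h;
}

// Static topic load balancing: the topic's hash modulo the number of Topic
// Servers picks the TS, e.g. a hash of 923481 with 4 servers routes to TS1.
size_t pick_topic_server(const std::string& topic, size_t num_topic_servers) {
    return hash_topic(topic) % num_topic_servers;
}
```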

16 Topic Load Balancing: Dynamic
[Diagram: on a subscribe, the CS collects the current load of each room (R0: 345K, R1: 278K, R2: 301K), picks the least-loaded room (R1 with 278K), and forwards the subscribe to that room's Topic Server (TS1) to handle. A selection sketch follows.]
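A minimal sketch of the dynamic variant, assuming the CS already holds each room's reported load as a plain count; how the loads are collected and refreshed over the network is not shown here.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Dynamic topic load balancing: pick the room with the smallest reported load,
// e.g. with loads {345'000, 278'000, 301'000} Room 1 wins the subscribe.
size_t pick_least_loaded_room(const std::vector<size_t>& room_loads) {
    auto it = std::min_element(room_loads.begin(), room_loads.end());
    return static_cast<size_t>(it - room_loads.begin());
}
```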

17 Performance Pitfalls
Data copies
- Single instance with reference counting (REF_BLOCK)
- Multi-buffer messages (MESSAGE: header, body, tail)
Context switches
- Flexible module execution foundation (MODULE)
- Thread pools sized to the number of processors
Memory allocation
- MM: custom memory pools (POOL, POOL_BLOCK) with fine-grained locking, pre-allocation, batching, and single-size blocks (a pool sketch follows)
Lock contention
- EVENT, MUTEX, RW_MUTEX, interlocked API
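A minimal sketch of the single-size, pre-allocated pool idea behind POOL/POOL_BLOCK; the class name, layout, and locking granularity here are assumptions for illustration, not the project's actual memory manager.

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical fixed-size pool: all blocks are pre-allocated up front and a
// free list is reused, so the hot path never touches the general-purpose heap.
class FixedPool {
public:
    FixedPool(size_t block_size, size_t block_count)
        : storage_(block_size * block_count) {
        free_.reserve(block_count);
        for (size_t i = 0; i < block_count; ++i)
            free_.push_back(storage_.data() + i * block_size);
    }

    void* allocate() {
        std::lock_guard<std::mutex> lock(mutex_);   // one short lock per pool, not per heap
        if (free_.empty()) return nullptr;          // caller falls back or waits
        void* block = free_.back();
        free_.pop_back();
        return block;
    }

    void release(void* block) {
        std::lock_guard<std::mutex> lock(mutex_);
        free_.push_back(static_cast<char*>(block));
    }

private:
    std::vector<char> storage_;   // pre-allocated backing memory
    std::vector<char*> free_;     // free list of single-size blocks
    std::mutex mutex_;
};
```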

18 Class Diagram (Application)

19 Class Diagram (TS, CS)

20 Stress Testing
- Measure publish-notify turnaround time: 1 ms resolution using the MM timer, averaged over 30 runs (a measurement sketch follows)
- Increasing client and/or topic load; several room topologies examined
Results:
- Exponential-like climb in turnaround time with load
- Adding Topic Servers: better times
- Adding Connection Servers: higher maximum client count, but times not improved
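A minimal sketch of the turnaround measurement, using std::chrono in place of the Windows MM timer; publish and wait_for_notify are hypothetical stand-ins for the real client calls, stubbed with a fixed delay so the sketch runs on its own.

```cpp
#include <chrono>
#include <iostream>
#include <string>
#include <thread>

// Hypothetical stand-ins for the real client calls.
static void publish(const std::string&, const std::string&) {}
static void wait_for_notify() {
    std::this_thread::sleep_for(std::chrono::milliseconds(2));   // fake server round trip
}

// Measure publish-notify turnaround averaged over 30 runs, as on the slide.
int main() {
    using clock = std::chrono::steady_clock;
    const int runs = 30;
    double total_ms = 0.0;
    for (int i = 0; i < runs; ++i) {
        auto start = clock::now();
        publish("topic://stress/test", "ping");
        wait_for_notify();   // block until our own notification comes back
        total_ms += std::chrono::duration<double, std::milli>(clock::now() - start).count();
    }
    std::cout << "average turnaround: " << total_ms / runs << " ms\n";
}
```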

