www.streambase.com

Stream Processing (Electronic Trading)
- A feed comes out of the wall
- Compute a secret sauce looking for events of interest
- Trade based on the result
- But only if you are more nimble than the next guy…

Traditional RDBMS Model
Outbound Processing
- Store the data before processing!
  - Latency
  - What if the data is not important?
- Too many processes!
- Optimized for business data processing
  - Where you don't trust the app
- Too slow to be interesting!
[Diagram labels: Updates, Queries, Memory, Disk, Processing]

Stream Processing Engine with StreamSQL
Inbound Processing
- Database paradigm (SQL) is a good one
- But need a different architecture
  - Straight-through processing
  - No task switches
  - Lightweight scheduling
[Diagram labels: Event Data, StreamBase Application, Memory, Disk, Queries, Alerts, Actions]

StreamSQL Application Example
Example: every minute, for every stock I am trading:
- Calculate VWAP (volume-weighted average price) for my trades and for all trades
- Alert whenever my personal trading execution is inferior to the market
5 Streambase operators, 30 min to build
- Streams of tuples (time-series data) flow through the query
- Queries run continuously
[Diagram labels: Market_Feeds, My_Buys, Alerts]
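
The VWAP comparison on this slide can be sketched in plain Python. This is not StreamSQL and not the 5-operator Streambase application; it is a batch approximation of what the continuous query computes. The tuple layout `(ts_seconds, symbol, price, volume)`, the function names, and the reading of "inferior" as "my buy VWAP is higher than the market VWAP" are all assumptions made for illustration.

```python
from collections import defaultdict

def vwap_by_minute(trades):
    """Return {(minute, symbol): VWAP} for (ts_seconds, symbol, price, volume) tuples."""
    notional = defaultdict(float)  # running sum of price * volume per window
    volume = defaultdict(float)    # running sum of volume per window
    for ts, symbol, price, vol in trades:
        key = (int(ts // 60), symbol)  # one-minute tumbling window per stock
        notional[key] += price * vol
        volume[key] += vol
    return {k: notional[k] / volume[k] for k in notional if volume[k] > 0}

def inferior_buys(market_feed, my_buys):
    """List (minute, symbol, my_vwap, market_vwap) where my buys cost more than market."""
    market = vwap_by_minute(market_feed)
    mine = vwap_by_minute(my_buys)
    alerts = []
    for key in sorted(mine):
        # Assumption: for buys, "inferior" means paying above the market VWAP.
        if key in market and mine[key] > market[key]:
            alerts.append((key[0], key[1], mine[key], market[key]))
    return alerts

# Hypothetical sample data standing in for Market_Feeds and My_Buys.
market_feed = [(0, "IBM", 100.0, 200), (30, "IBM", 102.0, 200), (70, "IBM", 101.0, 100)]
my_buys = [(10, "IBM", 102.0, 100), (80, "IBM", 100.5, 50)]
alerts = inferior_buys(market_feed, my_buys)
print(alerts)
```

A streaming engine would emit each window's result as the minute closes instead of scanning a stored list, which is the point of the slide.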

StreamSQL Will Dominate Rule Engines
- Essentially all applications entail a mix of stored and real-time data
  - StreamSQL covers both kinds of data in a single paradigm
  - A rule engine must switch paradigms
- StreamSQL amenable to compilation
  - Know what is the next event to process
  - In contrast, hard to figure this out in a rule engine

Performance Benchmark
Financial Services Application: construct a virtual feed of first arrivers on a low-end Linux machine
- Relational DB: 11,000 messages/sec
- Streambase: 300,000 messages/sec
- Another StreamSQL vendor: 20,000 messages/sec
Result: Streambase was a factor of 27 faster than the relational DB (300,000 / 11,000 ≈ 27)
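
The slide does not define "virtual feed of first arrivers". One plausible reading, assumed here, is that the same message arrives on several redundant feeds and only the first copy to arrive is passed downstream. A minimal Python sketch under that assumption follows; the `(feed, msg_id, payload)` layout and the function name are hypothetical.

```python
def first_arrivers(messages):
    """Yield only the first-arriving copy of each message id.

    `messages` is an iterable of (feed, msg_id, payload) tuples in arrival order.
    """
    seen = set()  # ids already forwarded; unbounded here, a real system would expire entries
    for feed, msg_id, payload in messages:
        if msg_id not in seen:
            seen.add(msg_id)
            yield feed, msg_id, payload

# Hypothetical arrivals from two redundant feeds, A and B.
arrivals = [
    ("A", 1, "x"), ("B", 1, "x"),
    ("B", 2, "y"), ("A", 2, "y"),
    ("A", 3, "z"),
]
virtual_feed = list(first_arrivers(arrivals))
print(virtual_feed)
```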

Tick Stores (and Other Warehouse Applications)
- Store all market data for the last 10 years
  - To back-test secret sauce models
  - To answer ad-hoc queries: how many times has X happened?
- Typical size: 100 Tbytes
- Append only

Terminology -- Row Store
[Diagram labels: Record 1, Record 2, Record 3, Record 4]
E.g. DB2, Oracle, Sybase, SQLServer, …

Rotate Your Thinking 90 Degrees
- Column stores read only the columns required
  - Not all of them
- Compression works better
  - By a factor of 2-3 against the elephants
- No record headers
  - Which are big ticket items
- No padding to byte or word boundaries
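
The "rotation" can be illustrated with a toy in-memory layout in Python. This is only a sketch of the idea, not how Vertica or any real column store lays out data on disk; the field names and sample rows are made up for the example.

```python
# Row layout: each record keeps all of its fields together.
rows = [
    ("IBM", 100.0, 200),
    ("MSFT", 30.0, 500),
    ("IBM", 101.0, 100),
]
FIELDS = ("symbol", "price", "volume")

def to_columns(rows, fields):
    """Rotate the row layout 90 degrees: one list per field."""
    return {name: [row[i] for row in rows] for i, name in enumerate(fields)}

columns = to_columns(rows, FIELDS)

# Row store: summing volume walks every whole record, including unused fields.
total_volume_rows = sum(row[2] for row in rows)

# Column store: the same query reads only the "volume" column.
total_volume_cols = sum(columns["volume"])

print(total_volume_rows, total_volume_cols)
```

Because each column holds values of a single type, often with many repeats, it also tends to compress better than interleaved rows, which is the slide's second point.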

Benchmark Summary
- Vertica has been baked off about 30 times
  - Typically against the incumbent
- Has yet to win by less than a factor of 30 against a row store
- Beats most other column stores by around 10X
  - KX is the only system to come within an order of magnitude

Maybe Elephants are Good at OLTP…
- OLTP is a main memory market
  - Not a disk-based one
- Transactions are short and have no I/O or user stalls
  - Run to completion (single threaded)
- Disaster Recovery (and HA) a requirement
  - Build it into the bottom of the system

TPC-C Performance on a Low-end Machine
- Elephant
  - 850 TPS (1/2 the land speed record per processor)
- H-Store (so far, a university prototype)
  - 70,416 TPS (41X the land speed record per processor)
- Factor of 82!!!!!
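
As a quick check using only the figures on this slide, the factor can be reached two ways: from the raw TPS numbers, and from the two "land speed record" multiples.

```latex
\[
\frac{70{,}416\ \text{TPS}}{850\ \text{TPS}} \approx 82.8,
\qquad
\frac{41}{1/2} = 82
\]
```

Both routes agree with the stated factor of 82.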

Implications for the Elephants
- They are selling one size fits all
  - Which is 30-year-old legacy technology that is good at nothing

Pictorially:
[Diagram labels: OLTP, Data Warehouse, Streaming data, DBMS apps]

The DBMS Landscape: Performance Needs
[Diagram labels: OLTP, Data Warehouse, Streaming data; scale: low, high]

One Size Does Not Fit All -- Pictorially
[Diagram labels: Open source, Vertica, H-Store successors, Streambase]
Elephants get only the crevices