Cassandra concepts, patterns and anti-patterns - Dave, ApacheCon EU 2012
Agenda: Choosing NoSQL; Cassandra concepts (Dynamo and Big Table); Patterns and anti-patterns of use
Choosing NoSQL...
1. Find data store that doesn’t use SQL
2. Anything
3. Cram all the things into it
4. Triumphantly blog this success
5. Complain a month later when it bursts into flames
“NoSQL DBs trade off traditional features to better support new and emerging use cases” (solutions-to-hard-problems)
We trade more widely used, tested and documented software (MySQL first OS release 1998) for a relatively immature product (Cassandra first open-sourced in 2008)
We trade ad-hoc querying (SQL join, group by, having, order) for a rich data model with limited ad-hoc querying ability (Cassandra makes you denormalise)
What do we get in return?
Proven horizontal scalability: Cassandra scales reads and writes linearly as new nodes are added
High availability: Cassandra is fault-resistant with tunable consistency levels
Low latency, solid performance: Cassandra has very good write performance
* Add pinch of salt
Operational simplicity: homogeneous cluster, no “master” node, no SPOF
Rich data model: Cassandra is more than simple key-value (columns, composites, counters, secondary indexes)
Choosing NoSQL...
“they say … I can’t decide between this project and this project even though they look nothing like each other. And the fact that you can’t decide indicates that you don’t actually have a problem that requires them.” (computing-and-fast_ip, at 30:15)
Or you haven’t learned enough about them...
What tradeoffs are you making? How is it designed? What algorithms does it use? Are the fundamental design decisions sane?
Concepts...
Amazon Dynamo + Google Big Table
From Dynamo (iles/amazon-dynamo-sosp2007.pdf): Consistent hashing; Vector clocks *; Gossip protocol; Hinted handoff; Read repair
From Big Table (gtable-osdi06.pdf): Columnar SSTable storage; Append-only; Memtable; Compaction
* not in Cassandra
Diagram: Distributed Hash Table (DHT); client; tokens are integers from 0 to
Diagram: client; coordinator node; consistent hashing
Diagram: client; coordinator node; replication factor (RF) 3
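The three diagrams above can be sketched roughly as follows. This is a minimal illustration, not Cassandra's implementation: the `Ring` class, the node names, the MD5-based `token_for` helper and the 128-bit token space are all assumptions made for the example. The key is hashed to a token, the first node at or after that token owns it, and the next nodes clockwise hold the remaining replicas.

```python
import bisect
import hashlib


def token_for(key):
    """Hash a row key to an integer token (MD5 is an illustrative choice)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


class Ring:
    """Hypothetical token ring: each node owns the range ending at its token."""

    def __init__(self, node_tokens):
        # Sort nodes by token so the ring can be walked clockwise.
        self._ring = sorted((tok, name) for name, tok in node_tokens.items())
        self._tokens = [tok for tok, _ in self._ring]

    def replicas(self, key, rf=3):
        """Return the rf nodes that should store key."""
        size = len(self._ring)
        # First node whose token is >= the key's token, wrapping to the start.
        start = bisect.bisect_left(self._tokens, token_for(key)) % size
        return [self._ring[(start + i) % size][1] for i in range(min(rf, size))]


# Four hypothetical nodes spaced evenly over the assumed 128-bit token space.
SPACE = 2 ** 128
ring = Ring({"node%d" % i: i * SPACE // 4 for i in range(4)})
owners = ring.replicas("USERID1234", rf=3)
```

Any node could act as coordinator here, since every node can compute `ring.replicas(key)` for itself and forward the request.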
Consistency Level (CL): How many replicas must respond to declare success?
For read operations
Level          Description
ONE            1st response
QUORUM         N/2 + 1 replicas
LOCAL_QUORUM   N/2 + 1 replicas in local data centre
EACH_QUORUM    N/2 + 1 replicas in each data centre
ALL            All replicas
For write operations
Level          Description
ANY            One node, including hinted handoff
ONE            One node
QUORUM         N/2 + 1 replicas
LOCAL_QUORUM   N/2 + 1 replicas in local data centre
EACH_QUORUM    N/2 + 1 replicas in each data centre
ALL            All replicas
Diagram: client; coordinator node; RF = 3, CL = Quorum
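The quorum arithmetic in the tables can be checked with a small sketch. The function names are hypothetical, and N/2 is taken as integer division. The last helper shows the commonly used rule that a read is guaranteed to overlap the latest successful write when the read and write replica counts together exceed the replication factor.

```python
def quorum(replication_factor):
    """N/2 + 1 replicas, using integer division."""
    return replication_factor // 2 + 1


def required_acks(level, replication_factor):
    """Replica responses needed for the single-data-centre levels."""
    return {
        "ONE": 1,
        "QUORUM": quorum(replication_factor),
        "ALL": replication_factor,
    }[level]


def reads_see_latest_write(read_level, write_level, replication_factor):
    """True when every read set must intersect every write set."""
    r = required_acks(read_level, replication_factor)
    w = required_acks(write_level, replication_factor)
    return r + w > replication_factor
```

With RF = 3, `quorum(3)` is 2, so a QUORUM operation still succeeds with one replica down, and QUORUM reads combined with QUORUM writes overlap on at least one replica.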
Hinted Handoff: A hint is written to the coordinator node when a replica is down
Diagram: client; coordinator node; RF = 3, CL = Quorum; node offline; hint
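A toy model of the hinted handoff diagram might look like this. The `Replica` and `Coordinator` classes are hypothetical and hold everything in memory. The point is only the control flow: a write to an offline replica becomes a hint on the coordinator, and the hint is replayed once the replica is back.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.online = True
        self.data = {}

    def write(self, key, value):
        self.data[key] = value


class Coordinator:
    def __init__(self, replicas):
        self.replicas = replicas
        self.hints = []  # (target replica, key, value) awaiting delivery

    def write(self, key, value, required_acks):
        """Send the write to every replica; store a hint for offline ones."""
        acks = 0
        for replica in self.replicas:
            if replica.online:
                replica.write(key, value)
                acks += 1
            else:
                self.hints.append((replica, key, value))
        return acks >= required_acks

    def replay_hints(self):
        """Deliver hints to replicas that have come back; keep the rest."""
        remaining = []
        for replica, key, value in self.hints:
            if replica.online:
                replica.write(key, value)
            else:
                remaining.append((replica, key, value))
        self.hints = remaining


# RF = 3, CL = Quorum (2 acks), one node offline.
nodes = [Replica("a"), Replica("b"), Replica("c")]
nodes[2].online = False
coordinator = Coordinator(nodes)
succeeded = coordinator.write("USERID1234:job", "Developer", required_acks=2)
hints_pending = len(coordinator.hints)
nodes[2].online = True
coordinator.replay_hints()
```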
Read Repair: Background digest query on-read to find and update out-of-date replicas *
* carried out in the background unless CL:ALL
Diagram: client; coordinator node; RF = 3, CL = One; background digest query, then update out-of-date replicas
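Read repair at CL = One can be sketched as below, under simplifying assumptions: each replica is a plain dict mapping a key to a `(value, timestamp)` pair, the digest is an MD5 of the cell, and the "background" step simply runs inline. The function and variable names are made up for the illustration.

```python
import hashlib


def digest(cell):
    """Short fingerprint of a cell, compared instead of shipping full values."""
    return hashlib.md5(repr(cell).encode("utf-8")).hexdigest()


def read_with_repair(replicas, key):
    """Answer from the first replica, then bring stale replicas up to date."""
    result = replicas[0].get(key)  # CL = One: the first response is returned

    # Digest query: do all replicas agree on this key?
    digests = {digest(replica.get(key)) for replica in replicas}
    if len(digests) > 1:
        # Newest timestamp wins; push it to every replica that differs.
        newest = max(
            (replica[key] for replica in replicas if key in replica),
            key=lambda cell: cell[1],
        )
        for replica in replicas:
            if replica.get(key) != newest:
                replica[key] = newest
    return result


replicas = [{"job": ("Developer", 1)}, {"job": ("Cruft", 2)}, {}]
first_answer = read_with_repair(replicas, "job")
second_answer = read_with_repair(replicas, "job")
```

Note that the first read may return the stale value, because at CL = One the client is answered before the repair takes effect. The next read sees the repaired data.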
Big Table...
Sparse column based data model; SSTable disk storage; Append-only commit log; Memtable (buffer and sort); Immutable SSTable files; Compaction
Column: Name, Value, Timestamp. Timestamp used for conflict resolution (last write wins)
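Last-write-wins resolution over the column triple can be illustrated in a few lines. The `Column` type and `resolve` function are hypothetical. Breaking timestamp ties on the value is an assumption made here so that the result does not depend on argument order.

```python
from typing import NamedTuple


class Column(NamedTuple):
    name: str
    value: str
    timestamp: int


def resolve(a, b):
    """Return the winning version of a column: highest timestamp wins."""
    # Assumption: equal timestamps are broken by comparing values.
    return max(a, b, key=lambda column: (column.timestamp, column.value))


older = Column("job", "Developer", 100)
newer = Column("job", "Cruft", 200)
winner = resolve(older, newer)
```

Because `resolve` gives the same answer whichever replica's copy is passed first, replicas can merge independently and still converge.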
Columns: (Name, Value), (Name, Value), (Name, Value); we can have millions of columns *
* theoretically up to 2 billion
Row: a Row Key followed by its columns (Name, Value), (Name, Value), (Name, Value)
Column Family: many rows, each a Row Key with its columns; we can have billions of rows
Write path: a write goes to the Commit Log (disk) and to the Memtable (memory). The Memtable buffers writes and sorts data, then flushes on a time or size trigger to an immutable SSTable (disk)
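The write path, together with the compaction step listed on the earlier Big Table slide, can be modelled in memory like this. It is a sketch under stated assumptions: the `Store` class is hypothetical, lists and tuples stand in for disk files, and the flush trigger is a simple size threshold.

```python
class Store:
    def __init__(self, flush_threshold=3):
        self.commit_log = []   # append-only; stands in for the on-disk log
        self.memtable = {}     # in-memory write buffer
        self.sstables = []     # immutable sorted tuples; stand in for disk files
        self.flush_threshold = flush_threshold

    def write(self, key, value):
        self.commit_log.append((key, value))  # durability first
        self.memtable[key] = value            # then buffer in memory
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Sort the buffered writes and freeze them as a new SSTable."""
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}

    def compact(self):
        """Merge all SSTables into one; newer tables override older ones."""
        merged = {}
        for table in self.sstables:  # oldest first
            merged.update(table)
        self.sstables = [tuple(sorted(merged.items()))]


store = Store(flush_threshold=2)
store.write("b", 1)
store.write("a", 2)   # size trigger reached: first SSTable written
first_sstable = store.sstables[0]
store.write("a", 3)
store.write("c", 4)   # second SSTable written
store.compact()
```

Nothing is ever updated in place: a newer value for `a` lands in a later SSTable and only replaces the older one when compaction merges the files.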
Sorted data written to disk in blocks. Each “query” can be answered from a single slice of disk. Therefore start from your queries and work backwards.
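To see why sorted storage makes a query a single contiguous slice, here is a small sketch. The column names (dates) and values are invented for the example, and `slice_columns` is a hypothetical helper rather than a client API.

```python
import bisect


def slice_columns(sorted_columns, start, finish):
    """Return the columns whose names fall in [start, finish].

    sorted_columns is a list of (name, value) pairs already sorted by name,
    so the answer is one contiguous run found by two binary searches.
    """
    names = [name for name, _ in sorted_columns]
    lo = bisect.bisect_left(names, start)
    hi = bisect.bisect_right(names, finish)
    return sorted_columns[lo:hi]


row = [
    ("2012-11-01", "login"),
    ("2012-11-03", "update"),
    ("2012-11-05", "logout"),
    ("2012-11-09", "login"),
]
early_november = slice_columns(row, "2012-11-02", "2012-11-06")
```

Working backwards from the query ("events for a user between two dates") is what suggests the layout: one row per user, column names that sort by date.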
Patterns and anti-patterns...
Pattern: Storing entities as individual columns under one row
Pattern: row: USERID1234; name: Dave; job: Developer. One row per user. We can use C* secondary indexes to fetch all users with job=developer
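The benefit of one column per property can be illustrated with a toy inverted index. This is not how Cassandra implements secondary indexes; the `UserColumnFamily` class and the second user are assumptions for the sketch. It only shows that an index on `job` needs `job` to be a separate column.

```python
from collections import defaultdict


class UserColumnFamily:
    def __init__(self, indexed_columns):
        self.rows = {}  # row key -> {column name: value}
        # One inverted index per indexed column: value -> set of row keys.
        self.indexes = {name: defaultdict(set) for name in indexed_columns}

    def insert(self, row_key, columns):
        row = self.rows.setdefault(row_key, {})
        for name, value in columns.items():
            if name in self.indexes:
                old = row.get(name)
                if old is not None:
                    self.indexes[name][old].discard(row_key)  # drop stale entry
                self.indexes[name][value].add(row_key)
            row[name] = value

    def get_indexed(self, column, value):
        """Row keys whose indexed column equals value."""
        return sorted(self.indexes[column].get(value, ()))


users = UserColumnFamily(indexed_columns=["job"])
users.insert("USERID1234", {"name": "Dave", "job": "Developer"})
users.insert("USERID5678", {"name": "Sam", "job": "Developer"})  # hypothetical user
developers = users.get_indexed("job", "Developer")
```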
Anti-pattern: Storing whole entity as single column blob
Anti-pattern: row: USERID1234; data: {"name":"Dave", "job":"Developer"}. One row per user. Now we can’t use secondary indexes, nor easily update safely
Pattern: Mutate just the changes to entities, make use of C* conflict resolution
Pattern: $userCf->insert( "USER1234", array("job" => "Cruft") ); We only update the “job” column, avoiding the race condition of reading all properties and then writing them all back having changed only one
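The race described above can be reproduced in a few lines. This is an in-memory sketch, not client code: the `merge` helper, the timestamps and the second client's rename to "David" are all hypothetical. Two clients change different properties of the same user, once as per-column mutations and once as whole-blob overwrites.

```python
import json


def merge(row, mutation):
    """Apply a mutation with per-column last-write-wins.

    row and mutation map column name -> (value, timestamp).
    """
    for name, (value, timestamp) in mutation.items():
        if name not in row or timestamp > row[name][1]:
            row[name] = (value, timestamp)
    return row


# Pattern: each client sends only the column it changed.
row = {"name": ("Dave", 1), "job": ("Developer", 1)}
merge(row, {"job": ("Cruft", 2)})     # client A
merge(row, {"name": ("David", 3)})    # client B
columns_result = {name: value for name, (value, _) in row.items()}

# Anti-pattern: both clients read the blob, edit a copy, write it all back.
stored = json.dumps({"name": "Dave", "job": "Developer"})
copy_a = json.loads(stored)
copy_b = json.loads(stored)
copy_a["job"] = "Cruft"
copy_b["name"] = "David"
stored = json.dumps(copy_a)
stored = json.dumps(copy_b)           # silently discards client A's change
blob_result = json.loads(stored)
```

With separate columns both changes survive. With the blob, the later write restores the old `job` value, which is why the blob approach pushes people towards lock, read, update.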
Anti-pattern: Lock, read, update
Pattern: Don’t overwrite anything; store as time series data
Pattern: row: USERID1234; column a384cff0-26c1-11e2-81c c9a66 = {"action":"create", "name":"Dave"}; column 10dc4c40-26c2-11e2-81c c9a66 = {"action":"update", "name":"foo"}. Column name is a type 1 UUID (time based). One row per user; many columns (wide row)
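A rough sketch of the time series pattern, using Python's standard `uuid.uuid1()` to generate type 1 UUIDs as column names. The `record`, `in_time_order` and `created_at` helpers are hypothetical, and a dict stands in for the wide row. Events are appended, never overwritten, and read back in time order.

```python
import uuid
from datetime import datetime, timedelta

# Type 1 UUID timestamps count 100 ns intervals from this date.
UUID_EPOCH = datetime(1582, 10, 15)


def record(row, event):
    """Append an event under a fresh time-based column name."""
    column_name = uuid.uuid1()
    row[column_name] = event
    return column_name


def in_time_order(row):
    """Events sorted by the timestamp embedded in each column name."""
    return [row[name] for name in sorted(row, key=lambda name: name.time)]


def created_at(column_name):
    """Recover the creation time encoded in a type 1 UUID."""
    return UUID_EPOCH + timedelta(microseconds=column_name.time // 10)


wide_row = {}  # stands in for row USERID1234
first = record(wide_row, {"action": "create", "name": "Dave"})
second = record(wide_row, {"action": "update", "name": "foo"})
history = in_time_order(wide_row)
```

The current state of the entity is then derived by replaying `history`, and the full audit trail comes for free.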
Pattern: We can store all sorts of stuff as time series
Anti-pattern: Order Preserving Partitioner (OPP) (randompartitioner-vs-orderpreservingpartitioner/)
Pattern: Distributed counters
Anti-pattern: Super Columns (a trap for the unwary) (for-the-unwary/)
In conclusion...
Cassandra is founded on sound design principles
The data model is incredibly powerful
CQL and a new breed of clients are making it easier to use
Lots of tools and integrations exist to expand the feature set
There is a strong community and multiple companies offering professional support
Thanks. Learn more about Cassandra (if you’re ever in London): meetup.com/Cassandra-London. Learn more about the fundamentals. Watch videos from Cassandra SF. s looking for a job?
Extending functionality: Search via Apache Solr and DataStax Enterprise; Batch processing via Apache Hadoop and DataStax Enterprise; Real-time analytics via Acunu Reflex