
1 NoSQL: what’s all the buzz about?

2 http://nosql-database.org/ Next generation databases are:
Non-relational, distributed, open source, horizontally scalable Often more characteristics: schema-free, easy replication support, simple API, eventually consistent / BASE (not ACID), huge amounts of data The NoSQL manifest at nosql-database.org describes next generation databases as mostly addressing some of the points: being non-relational, distributed, open source, and horizontally scalable. Often more characteristics apply, such as: schema-free, easy replication support, simple API, eventually consistent / BASE (Basically Available, Soft-state, Eventual consistency; not ACID), huge amounts of data, and more. So the misleading term “NoSQL” (which the community now mostly translates as “not only SQL”) should be seen as an alias for something like the definition above.

3 List of NoSQL databases [122+]
Wide Column Store / Column Families HBase, Cassandra, Hypertable, Cloudata, Cloudera, Amazon SimpleDB Document Stores CouchDB, MongoDB, Terrastore, ThruDB, OrientDB, RavenDB, Citrusleaf, SisoDB Key Value / Tuple Store Azure Table Storage, MEMBASE, Riak, Redis, Chordless, GenieDB, Scalaris, Tokyo Cabinet / Tyrant, Keyspace, Berkeley DB, MemcacheDB, Faircom C-Tree, Mnesia, LightCloud, Hibari, HamsterDB, STSdb, Pincaster, RaptorDB Eventually Consistent Key Value Stores Amazon Dynamo, Voldemort, Dynomite, KAI Graph Databases Neo4J, Infinite Graph, Sones, InfoGrid, HyperGraphDB, Trinity, AllegroGraph, Bigdata, DEX, OpenLink Virtuoso, VertexDB, FlockDB Object Databases db4o, Versant, Objectivity, Gemstone, Progress, Starcounter, Perst, Caching, ZODB, NEO, PicoLisp, Sterling, and more databases appearing all the time. For comparison, a similar list counts 96+ relational databases. The original intention was modern web-scale databases. The movement began in early 2009 and is growing rapidly.

4 So what’s wrong with relational databases?

5 Main principles of RDBMS
SQL ACID Atomic “all or nothing” Consistent means that data moves from one correct state to another correct state, with no possibility that readers could view different values that don’t make sense together. Isolated means that transactions executing concurrently will not become entangled with each other. Durable means that once a transaction has succeeded, the changes will not be lost. Main principles of RDBMS ACID is an acronym for Atomic, Consistent, Isolated, Durable, which are the gauges we can use to assess that a transaction has executed properly and that it was successful: Atomic Atomic means “all or nothing”; that is, when a statement is executed, every update within the transaction must succeed in order to be called successful. There is no partial failure where one update was successful and another related update failed. The common example here is with monetary transfers at an ATM: the transfer requires subtracting money from one account and adding it to another account. This operation cannot be subdivided; both updates must succeed. Consistent Consistent means that data moves from one correct state to another correct state, with no possibility that readers could view different values that don’t make sense together. For example, if a transaction attempts to delete a Customer and her Order history, it cannot leave Order rows that reference the deleted customer’s primary key; this is an inconsistent state that would cause errors if someone tried to read those Order records. Isolated Isolated means that transactions executing concurrently will not become entangled with each other; they each execute in their own space. That is, if two different transactions attempt to modify the same data at the same time, then one of them will have to wait for the other to complete. Durable Once a transaction has succeeded, the changes will not be lost. This doesn’t imply another transaction won’t later modify the same data; it just means that writers can be confident that the changes are available for the next transaction to work with as necessary.
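To make the ATM-transfer example concrete, here is a minimal sketch using plain JDBC; the table and column names (accounts, balance, id) are hypothetical, and the point is only that both updates commit together or roll back together.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class AtomicTransfer {
    public static void transfer(Connection conn, long from, long to, int amount)
            throws SQLException {
        conn.setAutoCommit(false); // start an explicit transaction
        try (PreparedStatement debit = conn.prepareStatement(
                 "UPDATE accounts SET balance = balance - ? WHERE id = ?");
             PreparedStatement credit = conn.prepareStatement(
                 "UPDATE accounts SET balance = balance + ? WHERE id = ?")) {
            debit.setInt(1, amount);
            debit.setLong(2, from);
            debit.executeUpdate();
            credit.setInt(1, amount);
            credit.setLong(2, to);
            credit.executeUpdate();
            conn.commit();   // both updates become visible together...
        } catch (SQLException e) {
            conn.rollback(); // ...or neither does ("all or nothing")
            throw e;
        }
    }
}
```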

6 Shortcomings of RDBMS Transactions under heavy load
Complexities of vertical scaling 2-phase commit (2PC) protocol Transactions become difficult under heavy load. When you first attempt to horizontally scale a relational database, making it distributed, you must now account for distributed transactions, where the transaction isn’t simply operating inside a single table or a single database, but is spread across multiple systems. In order to continue to honor the ACID properties of transactions, you now need a transaction manager to orchestrate across the multiple nodes. In order to account for successful completion across multiple hosts, the idea of a two-phase commit (often referred to as “2PC”) is introduced. But then, because two-phase commit locks all associated resources, it is useful only for operations that can complete very quickly. Two-phase commit blocks; that is, clients (“competing consumers”) must wait for a prior transaction to finish before they can access the blocked resource. The protocol will wait for a node to respond, even if it has died. It’s possible to avoid waiting forever in this event, because a timeout can be set that allows the transaction coordinator node to decide that the node isn’t going to respond and that it should abort the transaction.
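A toy sketch of the two-phase commit coordination just described; Participant and its methods are hypothetical interfaces, not a real library, and real coordinators must also survive their own crashes and handle timeouts.

```java
import java.util.List;

interface Participant {
    boolean prepare();  // phase 1: vote to commit; resources stay locked
    void commit();      // phase 2a: make the change permanent
    void rollback();    // phase 2b: undo the prepared work
}

class TwoPhaseCommitCoordinator {
    boolean execute(List<Participant> participants) {
        // Phase 1: ask every node to prepare; any "no" (or timeout) aborts.
        for (Participant p : participants) {
            if (!p.prepare()) {
                participants.forEach(Participant::rollback);
                return false;
            }
        }
        // Phase 2: everyone voted yes, so tell everyone to commit.
        // Until this completes, all participants hold their locks, which is
        // why 2PC blocks competing clients as described above.
        participants.forEach(Participant::commit);
        return true;
    }
}
```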

7 Sharding If you can’t split it, you can’t scale it (Randy Shoup, distinguished architect, eBay) Sharding approaches Feature-based shard or functional segmentation Key-based sharding Lookup table Shared-nothing or Cassandra-like sharding Sharding: If you can’t split it, you can’t scale it (Randy Shoup, distinguished architect, eBay) Sharding approaches Feature-based shard or functional segmentation This is the approach taken by Randy Shoup, who in 2006 helped bring eBay’s architecture to maturity so that it could support many billions of queries per day. Using this strategy, the data is split not by dividing records in a single table (as in the customer example discussed earlier), but rather by splitting into separate databases the features that don’t overlap with each other very much. For example, at eBay, the users are in one shard, and the items for sale are in another. At Flixster, movie ratings are in one shard and comments are in another. This approach depends on understanding your domain so that you can segment data cleanly. Key-based sharding In this approach, you find a key in your data that will evenly distribute it across shards. So instead of simply storing one letter of the alphabet for each server as in the (naive and improper) earlier example, you use a one-way hash on a key data element and distribute data across machines according to the hash, as in the sketch after this slide. It is common in this strategy to find time-based or numeric keys to hash on. Lookup table In this approach, one of the nodes in the cluster acts as a “yellow pages” directory and looks up which node has the data you’re trying to access. This has two obvious disadvantages. The first is that you’ll take a performance hit every time you have to go through the lookup table as an additional hop. The second is that the lookup table not only becomes a bottleneck, but a single point of failure. Shared-nothing or Cassandra-like sharding Sharding could be termed a kind of “shared-nothing” architecture that’s specific to databases. A shared-nothing architecture is one in which there is no centralized (shared) state, but each node in a distributed system is independent, so there is no client contention for shared resources. The term was first coined by Michael Stonebraker at the University of California at Berkeley in his 1986 paper “The Case for Shared Nothing.” Shared-nothing was more recently popularized by Google, which has written systems such as its Bigtable database and its MapReduce implementation that do not share state, and are therefore capable of near-infinite scaling. The Cassandra database is a shared-nothing architecture, as it has no central controller and no notion of master/slave; all of its nodes are the same.
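A minimal sketch of the key-based strategy in Java, assuming a fixed shard count and MD5 as the one-way hash; the class and method names are ours, not from any particular product. Real systems typically prefer consistent hashing so that adding a shard moves only a fraction of the keys.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class KeyBasedSharding {
    private final int shardCount;

    public KeyBasedSharding(int shardCount) {
        this.shardCount = shardCount;
    }

    /** Returns the shard index (0..shardCount-1) responsible for this key. */
    public int shardFor(String key) {
        try {
            // One-way hash of the key data element, as described above.
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            int h = ((digest[0] & 0xff) << 24) | ((digest[1] & 0xff) << 16)
                  | ((digest[2] & 0xff) << 8)  |  (digest[3] & 0xff);
            return Math.floorMod(h, shardCount);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // MD5 is always available
        }
    }
}
```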

8 The real question is not “What’s wrong with relational databases?”
The real question is not “What’s wrong with relational databases?” but rather, “What problem do you have?” Relational databases are very good at solving certain data storage problems, but because of their focus, they also can create problems of their own when it’s time to scale. Then, you often need to find a way to get rid of your joins, which means denormalizing the data, which means maintaining multiple copies of data and seriously disrupting your design, both in the database and in your application. Further, you almost certainly need to find a way around distributed transactions, which will quickly become a bottleneck. These compensatory actions are not directly supported in any but the most expensive RDBMS. And even if you can write such a huge check, you still need to carefully choose partitioning keys to the point where you can never entirely ignore the limitation. Perhaps more importantly, as we see some of the limitations of RDBMS and consequently some of the strategies that architects have used to mitigate their scaling issues, a picture slowly starts to emerge. It’s a picture that makes some NoSQL solutions seem perhaps less radical and less scary than we may have thought at first, and more like a natural expression and encapsulation of some of the work that was already being done to manage very large databases.

9 Brewer’s CAP Theorem: Consistency, Availability, Partition Tolerance
While working at University of California at Berkeley, Eric Brewer posited his CAP theorem in 2000 at the ACM Symposium on the Principles of Distributed Computing. The theorem states that within a large-scale distributed data system, there are three requirements that have a relationship of sliding dependency: Consistency, Availability, and Partition Tolerance. Consistency All database clients will read the same value for the same query, even given concurrent updates. Availability All database clients will always be able to read and write data. Partition Tolerance The database can be split into multiple machines; it can continue functioning in the face of network segmentation breaks. CA To primarily support Consistency and Availability means that you’re likely using two-phase commit for distributed transactions. It means that the system will block when a network partition occurs, so it may be that your system is limited to a single data center cluster in an attempt to mitigate this. If your application needs only this level of scale, this is easy to manage and allows you to rely on familiar, simple structures. CP To primarily support Consistency and Partition Tolerance, you may try to advance your architecture by setting up data shards in order to scale. Your data will be consistent, but you still run the risk of some data becoming unavailable if nodes fail. AP To primarily support Availability and Partition Tolerance, your system may return inaccurate data, but the system will always be available, even in the face of network partitioning. DNS is perhaps the most popular example of a system that is massively scalable, highly available, and partition-tolerant.

10 Brewer’s CAP Theorem: placing databases on the triangle
CA (Consistency + Availability), the relational camp: MySQL, Oracle, MSSQL CP (Consistency + Partition Tolerance), Neo4j, Google Bigtable and its derivatives: MongoDB, Redis, Hypertable AP (Availability + Partition Tolerance), Amazon Dynamo and its derivatives: Cassandra, Voldemort, Riak, CouchDB This slide places concrete systems on the three edges of the CAP triangle described on the previous slide.

11 Cassandra in 50 words or less Apache Cassandra is an open source, distributed, decentralized, elastically scalable, highly available, fault-tolerant, tuneably consistent, column-oriented database that bases its distribution design on Amazon’s Dynamo and its data model on Google’s Bigtable. Created at Facebook, it is now used at some of the most popular sites on the Web. Cassandra first started as an incubation project at Apache in January of 2009. Shortly thereafter, the committers, led by Apache Cassandra Project Chair Jonathan Ellis, released version 0.3 of Cassandra, and have steadily made minor releases since that time. Though as of this writing it has not yet reached a 1.0 release, Cassandra is being used in production by some of the biggest properties on the Web, including Facebook, Twitter, Cisco, Rackspace, Digg, Cloudkick, Reddit, and more.

12 Cassandra case studies
• Digg uses it for its primary near-time data store. The recent V4 relaunch is 100% Cassandra. Cassandra is running on multiple clusters internally. • Facebook still uses it for inbox search, though they are using a proprietary fork. • Ooyala uses it to store and serve near real-time video analytics data. • SimpleGeo uses it as the core datastore for providing location-based services and products. • Reddit uses it as a persistent cache. • Mollom, a SaaS for filtering various types of spam from user-generated content (comments, forum posts, blog posts, polls, contact forms, registration forms, and password request forms; it helps protect nearly 45,000 websites from spam), uses it for write-heavy loads and as a caching layer. • Rackspace uses it for its cloud service, monitoring, and logging, for a variety of internal needs. • Cloudkick uses Cassandra for storing the monitoring data of the different metrics its system checks. • Twitter is using Cassandra for analytics, for geolocation and places of interest data, and for data mining over the entire user store. • Mahalo uses it for its primary near-time data store.

13 Cassandra outlines BASE (Basically Available Soft-state Eventual consistency) and not ACID (Atomicity, Consistency, Isolation, Durability) Distributed and decentralized Elastic scalability High availability and fault tolerance Tunable consistency In contrast to the strong consistency used in most relational databases (ACID for Atomicity, Consistency, Isolation, Durability), Cassandra is at the other end of the spectrum (BASE for Basically Available, Soft-state, Eventual consistency). Distributed and Decentralized Cassandra is distributed, which means that it is capable of running on multiple machines while appearing to users as a unified whole. In fact, there is little point in running a single Cassandra node. The fact that Cassandra is decentralized means that there is no single point of failure. All of the nodes in a Cassandra cluster function exactly the same. This is sometimes referred to as “server symmetry”. Because they are all doing the same thing, by definition there can’t be a special host that is coordinating activities, as with the master/slave setup that you see in MySQL, Bigtable, and so many others. Elastic scalability refers to a special property of horizontal scalability. It means that your cluster can seamlessly scale up and scale back down. To do this, the cluster must be able to accept new nodes that can begin participating by getting a copy of some or all of the data and start serving new user requests without major disruption or reconfiguration of the entire cluster. High Availability and Fault Tolerance Cassandra is highly available. You can replace failed nodes in the cluster with no downtime, and you can replicate data to multiple data centers to offer improved local performance and prevent downtime if one data center experiences a catastrophe such as fire or flood. Tunable Consistency Consistency essentially means that a read always returns the most recently written value. Consider two customers attempting to put the same item into their shopping carts on an ecommerce site. If I place the last item in stock into my cart an instant after you do, you should get the item added to your cart, and I should be informed that the item is no longer available for purchase. This is guaranteed to happen when the state of a write is consistent among all nodes that have that data.

14 Use cases for Cassandra
Large deployments Lots of writes, statistics and analysis Geographical distribution Evolving applications Large Deployments You probably don’t drive a semi-truck to pick up your dry cleaning; semis aren’t well suited for that sort of task. Lots of careful engineering has gone into Cassandra’s high availability, tuneable consistency, peer-to-peer protocol, and seamless scaling, which are its main selling points. None of these qualities is even meaningful in a single-node deployment, let alone allowed to realize its full potential. There are, however, a wide variety of situations where a single-node relational database is all we may need. So do some measuring. Consider your expected traffic, throughput needs, and SLAs. There are no hard and fast rules here, but if you expect that you can reliably serve traffic with an acceptable level of performance with just a few relational databases, it might be a better choice to do so, simply because RDBMS are easier to run on a single machine and are more familiar. Lots of Writes, Statistics, and Analysis Consider your application from the perspective of the ratio of reads to writes. Cassandra is optimized for excellent throughput on writes. Many of the early production deployments of Cassandra involve storing user activity updates, social network usage, recommendations/reviews, and application statistics. These are strong use cases for Cassandra because they involve lots of writing with less predictable read operations, and because updates can occur unevenly with sudden spikes. In fact, the ability to handle application workloads that require high performance at significant write volumes with many concurrent client threads is one of the primary features of Cassandra. Geographical Distribution Cassandra has out-of-the-box support for geographical distribution of data. You can easily configure Cassandra to replicate data across multiple data centers. If you have a globally deployed application that could see a performance benefit from putting the data near the user, Cassandra could be a great fit. Evolving Applications If your application is evolving rapidly and you’re in “startup mode,” Cassandra might be a good fit given its schema-free data model. This makes it easy to keep your database in step with application changes as you rapidly deploy.

15 Writes
Memtable, commit log, SSTables No reads, no seeks: fast, sequential disk access Atomic within a column family Any node, always writable (hinted hand-off) ≈ 0.2 ms Flush threshold Cassandra chooses a different architecture to support the operations performed by its users. A typical structure contains the following parts: CommitLog, Memtable, and SSTable. The Memtable is located in memory; one data structure maps to one Memtable object, and it is the place data is first written to. The SSTable is permanent data storage; data is flushed from the Memtable to an SSTable when a specific threshold is reached. The CommitLog is used for recovery purposes, recording changes so they can be replayed in the case of a crash or inconsistency. When you perform a write operation, it’s immediately written to the commit log. The commit log is a crash-recovery mechanism that supports Cassandra’s durability goals. Hinted hand-off Consider the following scenario. A write request is sent to Cassandra, but the node where the write properly belongs is not available due to network partition, hardware failure, or some other reason. In order to ensure general availability of the ring in such a situation, Cassandra implements a feature called hinted handoff. You might think of a hint as a little post-it note that contains the information from the write request. If the node where the write belongs has failed, the Cassandra node that receives the write will create a hint, which is a small reminder that says, “I have the write information that is intended for node B. I’m going to hang onto this write, and I’ll notice when node B comes back online; when it does, I’ll send it the write request.” That is, node A will “hand off” to node B the “hint” regarding the write. Tombstones In the relational world, you might be used to the idea of a “soft delete.” Instead of actually executing a delete SQL statement, the application will issue an update statement that changes a value in a column called something like “deleted”. Programmers sometimes do this to support audit trails, for example. There’s a similar concept in Cassandra called a tombstone. This is how all deletes work and is therefore automatically handled for you. When you execute a delete operation, the data is not immediately deleted. Instead, it’s treated as an update operation that places a tombstone on the value. A tombstone is a deletion marker that is required to suppress older data in SSTables until compaction can run. There’s a related setting called Garbage Collection Grace Seconds. This is the amount of time that the server will wait to garbage-collect a tombstone. By default, it’s set to 864,000 seconds, the equivalent of 10 days. Cassandra keeps track of tombstone age, and once a tombstone is older than GCGraceSeconds, it will be garbage-collected. The purpose of this delay is to give a node that is unavailable time to recover; if a node is down longer than this value, then it is treated as failed and replaced.
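A toy sketch of this write path, assuming a single node and string keys and values; the class and field names mirror the slide’s vocabulary (commit log, Memtable, SSTable, flush threshold), but this is not Cassandra’s actual implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class WritePath {
    private final List<String> commitLog = new ArrayList<>();     // crash recovery
    private TreeMap<String, String> memtable = new TreeMap<>();   // in-memory, sorted
    private final List<TreeMap<String, String>> sstables = new ArrayList<>();
    private final int flushThreshold;

    public WritePath(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    public void write(String key, String value) {
        commitLog.add(key + "=" + value);  // 1. durable append: sequential I/O
        memtable.put(key, value);          // 2. in-memory update: no reads, no seeks
        if (memtable.size() >= flushThreshold) {
            sstables.add(memtable);        // 3. flush: Memtable becomes an SSTable
            memtable = new TreeMap<>();
            commitLog.clear();             // flushed data no longer needs replay
        }
    }
}
```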

16 Reads Memtable and SSTables Bloom filter (Bf) field to determine whether a provided key is in the SSTable Index (Idx) field for quick reads Any node Read repair ≈ 15 ms Reading here is a little slower, because the system needs to search not only the Memtable but also the SSTables. Every piece of written data uses a special key to identify itself in the database, and when the system searches for specific elements it takes advantage of these keys. To make searching efficient, the SSTable carries supporting search structures. There are three integral parts in an SSTable: the Data field, the Bf field, and the Idx field. The Data field holds the real content of the stored data. The Index is responsible for recording each key and its corresponding data address. The filter field, also known as the “Bloom filter”, can quickly determine whether a provided key is in this SSTable or not. With the assistance of these structures, Cassandra can perform much faster read and write operations without sacrificing too much space. Bloom filter Bloom filters are used as a performance booster. They are named for their inventor, Burton Bloom. Bloom filters are very fast, nondeterministic algorithms for testing whether an element is a member of a set. They are nondeterministic because it is possible to get a false-positive read from a Bloom filter, but not a false-negative. Bloom filters work by mapping the values in a data set into a bit array and condensing a larger data set into a digest string. The digest, by definition, uses a much smaller amount of memory than the original data would. The filters are stored in memory and are used to improve performance by reducing disk access on key lookups. Disk access is typically much slower than memory access. So, in a way, a Bloom filter is a special kind of cache. When a query is performed, the Bloom filter is checked first before accessing disk. Because false-negatives are not possible, if the filter indicates that the element does not exist in the set, it certainly doesn’t; but if the filter thinks that the element is in the set, the disk is accessed to make sure.
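A minimal Bloom filter in Java matching the description above: k hash functions set and test bits in a bit array, giving possible false positives but never false negatives. The double-hashing scheme here is a simple stand-in, not Cassandra’s implementation.

```java
import java.util.BitSet;

public class BloomFilter {
    private final BitSet bits;
    private final int size;
    private final int hashes; // k hash functions

    public BloomFilter(int size, int hashes) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashes = hashes;
    }

    private int bitIndex(String key, int i) {
        // Derive the i-th hash from the key's hash code (simple double hashing).
        int h = key.hashCode() ^ (i * 0x9e3779b9);
        return Math.floorMod(h, size);
    }

    public void add(String key) {
        for (int i = 0; i < hashes; i++) bits.set(bitIndex(key, i));
    }

    /** false means the key is definitely absent; true means it *may* be present. */
    public boolean mightContain(String key) {
        for (int i = 0; i < hashes; i++)
            if (!bits.get(bitIndex(key, i))) return false;
        return true;
    }
}
```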

17 The tenets of the column-oriented model
Keyspace Outer container, that contains column families (is sort of like a relational database) Column Family Logical division that associates similar data (very roughly analogous to tables in the relational world) Column Name/value pair (and a client-supplied timestamp of when it was last updated) Super Column Family Container for super columns sorted by their names Super Column Structure with name and set of dependent columns It’s not relational, and it does represent its data structures in sparse multidimensional hashtables. “Sparse” means that for any given row you can have one or more columns, but each row doesn’t need to have all the same columns as other rows like it (as in a relational model). Cassandra requires you to define an outer container, called a keyspace, that contains column families. The keyspace is essentially just a logical namespace to hold column families and certain configuration properties. The column families are names for associated data and a sort order. Beyond that, the data tables are sparse, so you can just start adding data to them, using the columns that you want; there’s no need to define your columns ahead of time. Instead of modeling data up front using expensive data modeling tools and then writing queries with complex join statements, Cassandra asks you to model the queries you want, and then provide the data around them. Keyspace A keyspace is the outermost container for data in Cassandra, corresponding closely to a relational database. Like a relational database, a keyspace has a name and a set of attributes that define keyspace-wide behavior. Although people frequently advise that it’s a good idea to create a single keyspace per application, this doesn’t appear to have much practical basis. It’s certainly an acceptable practice, but it’s perfectly fine to create as many keyspaces as your application needs. Note, however, that you will probably run into trouble creating thousands of keyspaces per application. (Super columns are covered on slide 19.)

18 Column Family \ Column
Column A name/value pair, which also contains a client-supplied timestamp for conflict resolution on the server side: column name : byte[], column value : byte[], timestamp : long Column Family A container for columns sorted by their names. Column Families are referenced and sorted by row keys. Column A column is the most basic unit of data structure in the Cassandra data model. A column is a triplet of a name, a value, and a clock, which you can think of as a timestamp for now. Again, although we’re familiar with the term “columns” from the relational world, it’s confusing to think of them in the same way in Cassandra. Cassandra’s clock was introduced in version 0.7, but its fate is uncertain. Prior to 0.7, it was called a timestamp, and was simply a Java long type. It was changed to support Vector Clocks, which are a popular mechanism for replica conflict resolution in distributed systems, and it’s how Amazon Dynamo implements conflict resolution. That’s why you’ll hear the third aspect of the column referred to both as a timestamp and a clock. Vector Clocks may or may not ultimately become how timestamps are represented in Cassandra 0.7, which is in beta at the time of this writing. Column family There are a few good reasons not to go too far with the idea that a column family is like a relational table. First, Cassandra is considered schema-free because although the column families are defined, the columns are not. You can freely add any column to any column family at any time, depending on your needs. Second, a column family has two attributes: a name and a comparator. The comparator value indicates how columns will be sorted when they are returned to you in a query—according to long, byte, UTF8, or other ordering. A row in a column family: row key → { column name 1 : column value 1, …, column name n : column value n }
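A sketch of that triplet as a Java class, with last-write-wins reconciliation on the timestamp; the class is illustrative, and the reconcile rule is just the simple timestamp comparison described above, not Cassandra’s source.

```java
public class Column {
    final byte[] name;
    final byte[] value;
    final long timestamp; // client-supplied; by convention microseconds since epoch

    public Column(byte[] name, byte[] value, long timestamp) {
        this.name = name;
        this.value = value;
        this.timestamp = timestamp;
    }

    /** Of two versions of the same column, the one with the newer timestamp wins. */
    static Column reconcile(Column a, Column b) {
        return a.timestamp >= b.timestamp ? a : b;
    }
}
```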

19 Super Column Family \ Super Column
A sorted associative array of columns: super column name → { column name 1 : column value 1, …, column name n : column value n } Super Column Family A container for super columns sorted by their names. Like Column Families, Super Column Families are referenced and sorted by row keys. Super Columns A super column is a special kind of column. Both kinds of columns are name/value pairs, but a regular column stores a byte array value, and the value of a super column is a map of subcolumns (which store byte array values). Note that they store only a map of columns; you cannot define a super column that stores a map of other super columns. So the super column idea goes only one level deep, but it can have an unbounded number of columns. The basic structure of a super column is its name, which is a byte array (just as with a regular column), and the columns it stores. Its columns are held as a map whose keys are the column names and whose values are the columns. Fun fact: super columns were one of the updates that Facebook added to Google’s Bigtable data model. A row in a super column family: row key → { super column name 1 : { column name 1 : column value 1, …, column name n1 : column value n1 }, …, super column name m : { …, column name nm : column value nm } }

20 Addressing a Column Family
row key → { column name 1 : column value 1, …, column name n : column value n } A four-dimensional hash: [Keyspace][ColumnFamily][Key][Column] Addressing a Super Column Family row key → { super column name 1 : { column name 1 : column value 1, … }, …, super column name m : { …, column name nm : column value nm } } A five-dimensional hash: [Keyspace][ColumnFamily][Key][SuperColumn][SubColumn]
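The four-dimensional addressing can be pictured as nested maps, keyspace → column family → row key → column name → value; a super column family simply adds one more level of nesting. A purely illustrative Java sketch (Cassandra does not store data as Java HashMaps):

```java
import java.util.HashMap;
import java.util.Map;

public class FourDimensionalHash {
    // [Keyspace][ColumnFamily][Key][Column] -> value
    private final Map<String, Map<String, Map<String, Map<String, byte[]>>>> data =
            new HashMap<>();

    public void put(String keyspace, String columnFamily, String rowKey,
                    String column, byte[] value) {
        data.computeIfAbsent(keyspace, k -> new HashMap<>())
            .computeIfAbsent(columnFamily, k -> new HashMap<>())
            .computeIfAbsent(rowKey, k -> new HashMap<>())
            .put(column, value);
    }

    public byte[] get(String keyspace, String columnFamily, String rowKey,
                      String column) {
        // A super column family would add one more getOrDefault level here.
        return data.getOrDefault(keyspace, Map.of())
                   .getOrDefault(columnFamily, Map.of())
                   .getOrDefault(rowKey, Map.of())
                   .get(column);
    }
}
```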

21 Cassandra client options
Thrift (12 different languages) Avro (data serialization system) Java: Hector (abstraction over Thrift), Pelops (abstraction over Thrift), CQL (a SQL-like language with a JDBC driver, available starting with Cassandra 0.8), Hector JPA (ORM client), Cassandrelle (documentation ???), Kundera (buggy ???) Python: Pycassa, Telephus Grails: grails-cassandra .NET: Aquiles, FluentCassandra Ruby: Cassandra PHP: phpcassa, SimpleCassie Thrift Thrift is the driver-level interface; it provides the API for client implementations in a wide variety of languages. Thrift was developed at Facebook and donated as an Apache project with Incubator status in 2008. It’s available from the Apache Thrift project, though you don’t need to download it separately to use Cassandra. Thrift is a code generation library for clients in C++, C#, Erlang, Haskell, Java, Objective C/Cocoa, OCaml, Perl, PHP, Python, Ruby, Smalltalk, and Squeak. Its goal is to provide an easy way to support efficient RPC calls in a wide variety of popular languages, without requiring the overhead of something like SOAP. To use it, you create a language-neutral service definition file that describes your data types and service interface. This file is then used as input into the engine that generates RPC client code libraries for each of the supported languages. The effect of the static generation design choice is that it is very easy for the developer to use, and the code can perform efficiently because validation happens at compile time instead of runtime. Avro (starting with version 0.7) Avro provides functionality similar to systems such as Thrift, Protocol Buffers, etc. Avro differs from these systems in the following fundamental aspects. Dynamic typing: Avro does not require that code be generated. Data is always accompanied by a schema that permits full processing of that data without code generation, static datatypes, etc. This facilitates construction of generic data-processing systems and languages. Untagged data: Since the schema is present when data is read, considerably less type information need be encoded with data, resulting in smaller serialization size. No manually-assigned field IDs: When a schema changes, both the old and new schema are always present when processing data, so differences may be resolved symbolically, using field names. Hector (Java) Hector is an open source project written in Java using the MIT license. It was created by Ran Tavory of Outbrain (previously of Google) and is hosted at GitHub. It was one of the early Cassandra clients and is used in production at Outbrain. It wraps Thrift and offers JMX, connection pooling, and failover: if your client connects to a node that has gone down, the client will automatically search for another node to use to complete your request. Pelops (Java) Pelops is a free, open source Java client written by Dominic Williams. It is similar to Hector in that it’s Java-based, but it was started more recently. This has become a very popular client. Its goals include the following: • To create a simple, easy-to-use client • To completely separate concerns for data processing from lower-level items such as connection pooling • To act as a close follower to Cassandra so that it’s readily up to date
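As an illustration of the Hector client mentioned above, here is a hedged sketch of one write and one read. It follows Hector’s API of the 0.7/0.8 era from memory, so method names may differ in the version you use; the cluster, keyspace, and column family names are hypothetical.

```java
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.HColumn;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;
import me.prettyprint.hector.api.query.ColumnQuery;

public class HectorExample {
    public static void main(String[] args) {
        // "Test Cluster", "MyKeyspace" and "Users" are hypothetical names.
        Cluster cluster = HFactory.getOrCreateCluster("Test Cluster", "localhost:9160");
        Keyspace keyspace = HFactory.createKeyspace("MyKeyspace", cluster);

        // Write: insert one column into the "Users" column family.
        Mutator<String> mutator = HFactory.createMutator(keyspace, StringSerializer.get());
        mutator.insert("alice", "Users",
                HFactory.createStringColumn("email", "alice@example.com"));

        // Read the column back.
        ColumnQuery<String, String, String> query =
                HFactory.createStringColumnQuery(keyspace);
        HColumn<String, String> column = query.setColumnFamily("Users")
                .setKey("alice").setName("email").execute().get();
        System.out.println(column == null ? "not found" : column.getValue());
    }
}
```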

22 Cassandra\RDBMS query differences
No update query Record-level atomicity on writes No duplicate keys Basic write properties: consistency level (ZERO, ANY, ONE, QUORUM, ALL) Basic read properties: consistency level (ONE, QUORUM, ALL) No Update Query There is no first-order concept of an update in Cassandra, meaning that there is no client query called an “update.” You can readily achieve the same effect, however, by simply performing an insert using an existing row key. If you issue an insert statement for a key that already exists, Cassandra will overwrite the values for any matching columns; if your query contains additional columns that don’t already exist for that row key, then the additional columns will be inserted. This is all seamless. Record-Level Atomicity on Writes Cassandra automatically gives you record-level atomicity on every write operation. In RDBMS, you would have to specify row-level locking. Although Cassandra offers atomicity at the column family level, it does not guarantee isolation. No Duplicate Keys It is possible in SQL databases to insert more than one row with identical values if you have not defined a unique primary key constraint on one of the columns. This is not possible in Cassandra. If you write a new record with a key that already exists in a column family, the values for any existing columns will be overwritten, and any columns that previously were not present for that row will be added to the row.
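The consistency levels above combine with the replication factor N in a simple arithmetic way the slide does not spell out: if the replicas contacted on write (W) plus the replicas contacted on read (R) exceed N, every read overlaps at least one replica holding the latest write. A small illustrative helper, not Cassandra code; ZERO and ANY are omitted because they make no replica guarantee.

```java
public class ConsistencyMath {
    enum Level { ONE, QUORUM, ALL }

    static int replicasContacted(Level level, int n) {
        switch (level) {
            case ONE:    return 1;
            case QUORUM: return n / 2 + 1; // majority of replicas
            case ALL:    return n;
            default:     throw new AssertionError();
        }
    }

    /** True when R + W > N, i.e., reads always see the latest successful write. */
    static boolean stronglyConsistent(Level write, Level read, int n) {
        return replicasContacted(write, n) + replicasContacted(read, n) > n;
    }

    public static void main(String[] args) {
        int n = 3; // replication factor
        // QUORUM writes + QUORUM reads: 2 + 2 > 3, reads see the latest writes.
        System.out.println(stronglyConsistent(Level.QUORUM, Level.QUORUM, n)); // true
        // ONE write + ONE read: 1 + 1 <= 3, reads may be stale.
        System.out.println(stronglyConsistent(Level.ONE, Level.ONE, n));       // false
    }
}
```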

23 Integrating Hadoop Hadoop (http://hadoop.apache.org/) is a set of open source projects that deal with large amounts of data in a distributed way. Hadoop Distributed File System (HDFS): a distributed file system that provides high-throughput access to application data. Hadoop MapReduce: a software framework for distributed processing of large data sets on compute clusters. Other Hadoop-related projects at Apache include: Cassandra™: a scalable multi-master database with no single points of failure. Hive™: a data warehouse infrastructure that provides data summarization and ad hoc querying. Mahout™: a scalable machine learning and data mining library. Pig™: a high-level data-flow language and execution framework for parallel computation. Hadoop’s distributed filesystem (HDFS) and MapReduce subprojects are open source implementations of Google’s GFS and MapReduce. ColumnFamilyInputFormat The main class we’ll use to interact with data stored in Cassandra from Hadoop. It’s an extension of Hadoop’s InputFormat abstract class. ConfigHelper A helper class to configure Cassandra-specific information such as the server node to point to, the port, and information specific to your MapReduce job. ColumnFamilySplit The extension of Hadoop’s InputSplit abstract class that creates splits over our Cassandra data. It also provides Hadoop with the location of the data, so that it may prefer running tasks on nodes where the data is stored. ColumnFamilyRecordReader The layer at which individual records from Cassandra are read. It’s an extension of Hadoop’s RecordReader abstract class.
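To connect those classes, here is a hedged skeleton of a MapReduce job reading from Cassandra, in the style of the word_count example shipped with Cassandra 0.7. The mapper signature matches what ColumnFamilyRecordReader emits (a row key plus its columns); the exact ConfigHelper setter names changed across versions, so they are left as comments rather than guessed.

```java
import java.nio.ByteBuffer;
import java.util.SortedMap;

import org.apache.cassandra.db.IColumn;
import org.apache.cassandra.hadoop.ColumnFamilyInputFormat;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;

public class CassandraWordCount {
    // ColumnFamilyRecordReader hands the mapper a row key plus its columns.
    public static class ColumnMapper
            extends Mapper<ByteBuffer, SortedMap<ByteBuffer, IColumn>, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);

        @Override
        protected void map(ByteBuffer key, SortedMap<ByteBuffer, IColumn> columns,
                           Context context)
                throws java.io.IOException, InterruptedException {
            for (IColumn column : columns.values()) {
                // Emit one count per column value (a trivial "word count").
                ByteBuffer buf = column.value().duplicate();
                byte[] bytes = new byte[buf.remaining()];
                buf.get(bytes);
                context.write(new Text(new String(bytes)), ONE);
            }
        }
    }

    public static void configure(Job job) {
        job.setInputFormatClass(ColumnFamilyInputFormat.class);
        // ConfigHelper is then used to point the job at Cassandra: the
        // keyspace, column family, host, port, and partitioner. The setter
        // names vary by version, so consult the word_count example shipped
        // with your Cassandra release.
    }
}
```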

24 The end Questions?

