Cassandra Database Project Alireza Haghdoost, Jake Moroshek Computer Science and Engineering University of Minnesota-Twin Cities Nov. 17, 2011 News Presentation:


1 Cassandra Database Project Alireza Haghdoost, Jake Moroshek Computer Science and Engineering University of Minnesota-Twin Cities Nov. 17, 2011 News Presentation: Joab Jackson, “New Cassandra Can Pack Two Billion Columns Into a Row”, PCWorld News, January 2011.

2 What Was the Problem?
• Facebook Messages Inbox Search: a feature that lets users search through their Facebook inbox
• Millions of messages are sent every day on Facebook
• Messages are stored in different data centers
• How can all of this information be indexed for Inbox Search?

3 What is Cassandra?
• Distributed storage system
• Designed to manage a NoSQL-style database
  ◦ NoSQL: key-value, schema-less database
• Scales to a very large size across many servers spread across different data centers
  ◦ At that scale, small and large components fail continuously
• No single point of failure
  ◦ Data is replicated at several nodes

4 Cassandra Goals
• High scalability
  ◦ The ability to scale incrementally
• High performance
  ◦ The ability to respond quickly
• High availability
  ◦ The ability to keep data available to users

5 Cassandra Data Model
• Cassandra does not support a full relational data model
• Key-value data model
  ◦ Every row is identified by a unique key
  ◦ Every row can hold a very large number of columns, grouped into column families
  ◦ Up to two billion columns can be packed into a row
• Columns within a row are sorted
  ◦ By name
  ◦ By time (required for inbox search)
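The data model on this slide can be sketched in a few lines of Python. This is only a minimal in-memory illustration, not Cassandra's API: the class names `Row` and `KeyValueStore`, the row key `user:42`, the column family `messages`, and the use of integer timestamps as column names are all assumptions made for the example.

```python
from collections import defaultdict


class Row:
    """A single row: column families, each mapping column name -> value."""

    def __init__(self):
        self.families = defaultdict(dict)

    def insert(self, family, column, value):
        # Writing to an existing column name overwrites the old value.
        self.families[family][column] = value

    def columns(self, family, start=None, end=None):
        """Return (name, value) pairs sorted by column name, optionally
        restricted to start <= name <= end."""
        items = sorted(self.families[family].items())
        return [
            (name, value)
            for name, value in items
            if (start is None or name >= start) and (end is None or name <= end)
        ]


class KeyValueStore:
    """Rows are looked up by a unique row key; there is no fixed schema."""

    def __init__(self):
        self.rows = defaultdict(Row)

    def insert(self, row_key, family, column, value):
        self.rows[row_key].insert(family, column, value)

    def get(self, row_key, family, start=None, end=None):
        return self.rows[row_key].columns(family, start, end)


store = KeyValueStore()
# Hypothetical data: column names are timestamps, values are message text.
store.insert("user:42", "messages", 1321500000, "lunch on Friday?")
store.insert("user:42", "messages", 1321400000, "project meeting notes")
store.insert("user:42", "messages", 1321600000, "slides attached")

# Columns come back in timestamp order regardless of insertion order.
for timestamp, text in store.get("user:42", "messages", start=1321450000):
    print(timestamp, text)
```

Using the timestamp as the column name is one way to get the time ordering that inbox search needs: a range read over the sorted columns returns the messages in a time window without scanning the whole row.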

6 Distribution and Replication
• Data is distributed across the nodes using a consistent hashing function
• High availability is achieved through replication
  ◦ If one storage node fails, the data replicated on other nodes remains available
  ◦ Data is actively replicated at N nodes across data centers
• Replication policies:
  ◦ Rack Unaware
  ◦ Rack Aware
  ◦ Datacenter Aware
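Consistent hashing with N-way replication can be illustrated with a small sketch. This is not Cassandra's implementation: the `Ring` class, the node names, the use of MD5 for ring positions, and the rule of placing replicas on the next distinct nodes clockwise (roughly in the spirit of a rack-unaware policy) are assumptions chosen for illustration.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    """Map a string to a position on the ring (MD5 is an illustrative choice)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    def __init__(self, nodes, replicas=3):
        self.replicas = replicas
        # Sorted (position, node) pairs form the ring.
        self.ring = sorted((ring_position(node), node) for node in nodes)
        self.positions = [position for position, _ in self.ring]

    def nodes_for(self, key):
        """Return up to N distinct nodes that hold copies of the key."""
        if not self.ring:
            return []
        # First node at or after the key's position, wrapping around the ring.
        start = bisect.bisect_left(self.positions, ring_position(key)) % len(self.ring)
        count = min(self.replicas, len(self.ring))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(count)]

    def without(self, failed_node):
        """Return a new ring that no longer contains the failed node."""
        remaining = [node for _, node in self.ring if node != failed_node]
        return Ring(remaining, self.replicas)


ring = Ring(["node-a", "node-b", "node-c", "node-d"], replicas=3)
owners = ring.nodes_for("user:42")
print("replicas:", owners)

# If the first replica fails, the other two replicas still serve the key.
survivors = ring.without(owners[0])
print("after failure:", survivors.nodes_for("user:42"))
```

Because only the keys adjacent to a departed node change owners, removing or adding a node moves a small share of the data, which is what allows the cluster to scale incrementally.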

7 Users of the Cassandra System
• First deployment:
  ◦ 2008 by Facebook, inspired by Google and Amazon
  ◦ Designed for the message inbox search system
  ◦ Stores terabytes of indexes across a cluster of 600+ cores and 120+ TB of disk space
  ◦ Each node can handle over 5,000 requests per second
• Well-known users:

8 References
• Prashant Malik, “Inbox Search”, http://ja-jp.facebook.com/blog.php?post=20387467130
• Joab Jackson, “Apache Cassandra Ready for the Enterprise”, http://www.pcworld.com/businesscenter/article/242111/apache_cassandra_ready_for_the_enterprise.html#tk.mod_rel
• Joab Jackson, “New Cassandra Can Pack Two Billion Columns Into a Row”, http://www.pcworld.com/businesscenter/article/216766/new_cassandra_can_pack_two_billion_columns_into_a_row.html
• Avinash Lakshman and Prashant Malik, “Cassandra: a decentralized structured storage system”, SIGOPS Oper. Syst. Rev. 44, 2 (April 2010), http://doi.acm.org/10.1145/1773912.1773922

9 Thank You

