
Grid Technology, CERN IT Department, CH-1211 Geneva 23, Switzerland, www.cern.ch/it. DBCF GT. IT Monitoring WG: Technology for Storage/Analysis, 28 November 2011.


1 Grid Technology
IT Monitoring WG
Technology for Storage/Analysis
28 November 2011
CERN IT Department, CH-1211 Geneva 23, Switzerland, www.cern.ch/it
DBCF GT

2 NoSQL Overview
Highlights
–Non-relational
–Distributed, easy replication support
–Open-source
–Horizontally scalable, high scalability
–Simple API
Use cases
–Large data volumes
–Extreme query workloads
–Schema evolution
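To make "horizontally scalable" concrete, here is a minimal Python sketch of hash-based partitioning, one common way a distributed store can spread keys across nodes. The node names, the sample keys, and the `node_for_key` helper are hypothetical illustrations and are not taken from any product named in these slides.

```python
import hashlib

def node_for_key(key, nodes):
    """Pick a node for a key by hashing the key (hypothetical helper)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    # Interpret the hex digest as an integer and map it onto the node list.
    return nodes[int(digest, 16) % len(nodes)]

# Assumed node names and keys, for illustration only.
nodes = ["node-a", "node-b", "node-c"]
keys = ["host1.cpu", "host2.cpu", "host3.mem"]
placement = {key: node_for_key(key, nodes) for key in keys}
```

The same key always maps to the same node, so any client can locate data without a central lookup. With plain modulo hashing, adding a node remaps most keys, which is why Dynamo-style systems use consistent hashing instead.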

3 The Zoo of solutions

4 Classification (data model)
NoSQL key-value based
–BerkeleyDB, Dynamo, Voldemort, Redis, Scalaris, etc.
NoSQL column/tabular based
–Hadoop, Cassandra, HBase, Hive, Hypertable, etc.
NoSQL document based
–MongoDB, CouchDB, SimpleDB, Riak, etc.
Relational DBMS
–Oracle, MySQL, etc.
Column based DBMS
–Vertica, Infobright, LucidDB, etc.

5 NoSQL Key-value Store
Data items stored and paired with a key
Data accessible by a hash map
Fast storage/retrieval of simple data by primary key
Complex queries are not straightforward
Modeling applications can get complicated
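The trade-off on this slide can be sketched with a toy in-memory store in Python. The class name `KeyValueStore`, its methods, and the sample monitoring records are assumptions made for illustration; real key-value stores expose their own APIs.

```python
class KeyValueStore:
    """Toy in-memory key-value store backed by a dict (a hash map)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # Storing by primary key is a single hash-map write.
        self._data[key] = value

    def get(self, key, default=None):
        # Retrieval by primary key is a single hash-map lookup.
        return self._data.get(key, default)

    def find_by_value(self, predicate):
        # No secondary index: any query on the value scans every item.
        return [k for k, v in self._data.items() if predicate(v)]

store = KeyValueStore()
store.put("host1", {"cpu": 0.93, "site": "CERN"})
store.put("host2", {"cpu": 0.12, "site": "CERN"})
busy = store.find_by_value(lambda v: v["cpu"] > 0.5)
```

Lookups by key stay cheap as the data grows, while `find_by_value` has to touch every entry, which is the sense in which complex queries are not straightforward.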

6 NoSQL Document Store
More complex and meaningful data structures
Based on versioned structured documents
Values associated with keys are full documents
The documents are stored in formats like JSON
Provides more modeling flexibility
Good for incomplete datasets
Easy to map data from object-oriented software
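As a rough illustration of versioned, schema-free documents, the Python sketch below keeps JSON-like documents in a dict and bumps a revision counter on each update. The field names (including `_rev`), the document IDs, and the `update` helper are hypothetical and do not reproduce any particular product's API.

```python
import json

# Two documents in the same collection with different fields:
# the second one has no "value" yet (an incomplete dataset).
documents = {
    "alert-1": {"_rev": 1, "host": "host1", "metric": "cpu", "value": 0.93},
    "alert-2": {"_rev": 1, "host": "host2", "metric": "disk"},
}

def update(doc_id, **fields):
    """Merge new fields into a document and increment its revision."""
    current = documents[doc_id]
    doc = dict(current, **fields)
    doc["_rev"] = current["_rev"] + 1
    documents[doc_id] = doc
    return doc

# New fields can be added later without declaring a schema change.
update("alert-2", value=0.71, tags=["storage"])
serialized = json.dumps(documents["alert-2"], sort_keys=True)
```

Because each value is a full document, an application object can be stored and read back in one piece instead of being split across several tables.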

7 NoSQL Document Store
Systems compared: MongoDB, CouchDB
Programming Language: MongoDB C++; CouchDB Erlang
HDFS Support: MongoDB No (GridFS); CouchDB No
Document Format: MongoDB BSON; CouchDB JSON
Query Method: MongoDB object-based; CouchDB JavaScript MapReduce
Best Use: MongoDB dynamic queries; CouchDB pre-defined queries, less dynamic data
Supported / Used by: MongoDB Foursquare, SourceForge; CouchDB several websites
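The MapReduce query method listed in this comparison can be sketched in a few lines of Python. This shows only the general map/group/reduce pattern; it is not CouchDB's JavaScript view API, and the function names and sample documents are assumptions.

```python
from collections import defaultdict

# Assumed sample documents, for illustration only.
docs = [
    {"host": "host1", "metric": "cpu", "value": 0.9},
    {"host": "host1", "metric": "cpu", "value": 0.7},
    {"host": "host2", "metric": "cpu", "value": 0.2},
]

def map_doc(doc):
    # Map step: emit one (key, value) pair per document.
    yield doc["host"], doc["value"]

def reduce_values(values):
    # Reduce step: collapse all values for one key into a single result.
    return max(values)

def map_reduce(documents, map_fn, reduce_fn):
    groups = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            groups[key].append(value)
    return {key: reduce_fn(values) for key, values in groups.items()}

peak_by_host = map_reduce(docs, map_doc, reduce_values)
```

A query of this kind is fixed in advance by its map and reduce functions, which fits the "pre-defined queries" use listed for CouchDB; an object-based query interface lets the caller vary the filter at request time.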

8 NoSQL Column Store
Each key is associated with many attributes
Data stored as column families (similar to a namespace for a set of related attributes)
Best known through Google's BigTable implementation
Used by the largest and best supported NoSQL implementations
Store and process very large amounts of data
Very high throughput
Strong partitioning support
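The column-family data model can be pictured as nested maps: row key, then column family, then column name, then value. The Python sketch below is a toy illustration under that assumption; `ColumnFamilyTable` and the family names are hypothetical and do not mirror the API of any system named in these slides.

```python
from collections import defaultdict

class ColumnFamilyTable:
    """Toy table: row key -> column family -> column name -> value."""

    def __init__(self, families):
        # Column families are declared up front; columns inside them are free-form.
        self.families = set(families)
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, family, column, value):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows[row_key][family][column] = value

    def get_family(self, row_key, family):
        # Read all related attributes of one row in a single call.
        return dict(self.rows.get(row_key, {}).get(family, {}))

table = ColumnFamilyTable(["metrics", "meta"])
table.put("host1", "metrics", "cpu", 0.93)
table.put("host1", "metrics", "mem", 0.40)
table.put("host1", "meta", "site", "CERN")
metrics = table.get_family("host1", "metrics")
```

Grouping related attributes under one family is what makes the family behave like a namespace: a row can carry many attributes while a read touches only the family it needs.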

9 NoSQL Column Store
Systems compared: Cassandra, HBase, Hypertable, Hive
Programming Language: Cassandra Java; HBase Java; Hypertable C++; Hive Java
HDFS Support: Yes
Batch Processing: No, Yes
Query Method: Cassandra MapReduce; HBase MapReduce; Hypertable HQL; Hive HiveQL
Best Use: Cassandra real-time write; HBase real-time read/write; Hypertable -; Hive complex queries
Supported / Used by: Cassandra Facebook, Reddit, Digg; HBase Facebook, Adobe, Yahoo, Twitter; Hypertable Baidu; Hive Facebook, Amazon

10 Final Considerations
Start prototyping with a few use cases
–Take a few use cases spanning different groups
–One use case based on a NoSQL document store
–One or two use cases based on a NoSQL column store
–Each use case should involve 2+ groups
–Try to maximize the collaboration between groups
Get feedback from the NoSQL team
–Status of their work
–Plan the next steps together

11 Final Considerations
Do not forget NoSQL distributions (such as Cloudera)
Do not forget (R)DBMS!

