Published by Ann Shelton. Modified over 9 years ago.
1
NoSQL
Or Peles
2
What is NoSQL?
A collection of various technologies meant to work around RDBMS limitations (mostly performance).
Not much of a definition...
3
RDBMS Limitations
Hard to scale horizontally (for updates).
– Distributed ACID requires two-phase commit.
Schemas can be a bitch.
– Hard to change.
– Data normalization can slow down queries (more joins).
4
Web Scale
Some numbers:
– YouTube serves over 100 million videos a day.
– eBay adds over 10 TB of storage every week.
– Facebook holds over 80 billion photos and serves hundreds of thousands of requests per second.
5
Ideal System
Available – can always read and write.
Consistent – reads always return the latest write.
Partition tolerant – the system keeps working even when the network between its machines or datacenters is split (messages between nodes are lost).
6
Starbucks doesn’t use two-phase commit
A great example (see References):
Asynchronous execution.
Correlation.
Exception handling:
– Write-off
– Retry
– Compensation
Two-phase commit would create a choke point.
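The coffee-shop flow described above can be sketched in a few lines. This is a minimal, hypothetical model (the function `run_shop` and its `spill_once`/`broken` parameters are illustrative, not taken from the referenced article): the cashier enqueues orders without waiting, each drink is matched to its order by an ID, and failures are handled per order by retry or compensation instead of a global commit protocol.

```python
from collections import deque


def run_shop(orders, spill_once=(), broken=(), max_retries=2):
    """Hypothetical model of the coffee-shop flow.

    orders      -- list of (drink, price) tuples
    spill_once  -- drinks that fail on the first attempt only
    broken      -- drinks that always fail (e.g. the machine is down)
    Returns (served, refunds), both keyed by order id.
    """
    queue = deque()          # cashier -> barista message queue
    paid = {}                # correlation: order id -> amount charged
    attempts = {}
    served, refunds = {}, {}

    # Cashier: charge, enqueue, and move on without waiting (asynchronous).
    for order_id, (drink, price) in enumerate(orders, start=1):
        paid[order_id] = price
        queue.append((order_id, drink))

    # Barista: drain the queue independently of the cashier.
    while queue:
        order_id, drink = queue.popleft()
        attempts[order_id] = attempts.get(order_id, 0) + 1
        failed = drink in broken or (drink in spill_once and attempts[order_id] == 1)
        if not failed:
            served[order_id] = drink              # the cup is matched by order id
        elif attempts[order_id] <= max_retries:
            queue.append((order_id, drink))       # retry: remake the drink later
        else:
            refunds[order_id] = paid[order_id]    # compensation: give the money back
        # A write-off policy would instead drop the failed drink and absorb the cost.
    return served, refunds
```

For example, `run_shop([("latte", 4.0), ("mocha", 4.5), ("tea", 2.0)], spill_once={"latte"}, broken={"mocha"})` serves the tea and the remade latte and refunds the mocha. No step ever locks the cashier and barista together, which is the point of avoiding two-phase commit.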
7
CAP Theorem
CAP (Eric Brewer, 2000): simply put, of the following three properties, a distributed system can guarantee at most two at the same time:
Consistency
Availability
Partition tolerance
8
CAP in practice
9
CA – Two-phase commit; works best within a single data center. Scaling issues.
CP – Sharding. Data may become unavailable if a shard fails.
AP – May return stale (inaccurate) data. DNS is a prime example.
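The CP vs. AP difference can be illustrated with a toy register copied on two replicas. This is a hypothetical sketch (the class `TwoReplicaStore` and its `partitioned` flag are made up for illustration), not how any particular product behaves: during a partition the CP variant refuses writes so the copies never diverge, while the AP variant accepts them and the other replica serves stale data.

```python
class Unavailable(Exception):
    """Raised when the store refuses a request in order to stay consistent."""


class TwoReplicaStore:
    """Toy register copied on two replicas; illustrates CP vs. AP only."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replicas = [None, None]   # the value held by replica 0 and replica 1
        self.partitioned = False       # True while the replicas cannot talk

    def write(self, value, at=0):
        if not self.partitioned:
            # Connected: replicate synchronously so both copies agree.
            self.replicas = [value, value]
        elif self.mode == "CP":
            # Partitioned CP system: refuse the write rather than let copies diverge.
            raise Unavailable("write rejected during partition")
        else:
            # Partitioned AP system: accept locally; the other copy is now stale.
            self.replicas[at] = value

    def read(self, at):
        return self.replicas[at]
```

Usage: after `ap = TwoReplicaStore("AP"); ap.write("v1"); ap.partitioned = True; ap.write("v2", at=0)`, reading replica 1 still returns `"v1"`, much like a DNS resolver serving a cached record.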
10
Consistency Types
Strict
Eventual
– Causal
– Read your writes
– Session
– Monotonic read
– Monotonic write
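Read-your-writes and monotonic reads can be layered on top of eventually consistent replicas by having each session remember the newest version it has seen. The sketch below is a hypothetical, simplified model (`Replica`, `SessionClient` and a single global version counter per replica are assumptions for illustration): a session skips any replica older than its own floor, while a client without that history may legitimately read stale data.

```python
class Replica:
    """One copy of the data plus the version of the last update it applied."""

    def __init__(self):
        self.version = 0
        self.data = {}

    def apply(self, version, key, value):
        self.version = version
        self.data[key] = value


class SessionClient:
    """Hypothetical client adding read-your-writes and monotonic reads
    on top of eventually consistent replicas."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.seen = 0     # highest version this session has written or read

    def write(self, primary, key, value):
        version = primary.version + 1
        primary.apply(version, key, value)
        self.seen = max(self.seen, version)   # read-your-writes floor
        return version

    def read(self, key):
        for replica in self.replicas:
            # Skip replicas older than anything this session has already seen.
            if replica.version >= self.seen:
                self.seen = replica.version   # monotonic reads: never go back
                return replica.data.get(key)
        raise LookupError("no replica has caught up with this session yet")
```

A session that wrote to the primary keeps reading its own write even if a lagging replica is listed first; a different session reading only the lagging replica sees nothing until replication catches up, which is plain eventual consistency.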
11
Concepts
In-memory vs. disk-based.
Shared everything vs. shared nothing.
Master-slave vs. server symmetry.
Elastic scalability.
MapReduce.
12
Sharding
Split data across machines (database instances).
– Feature-based sharding.
– Key-based sharding.
– Lookup table.
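The three sharding strategies can be sketched as routing functions that decide which database instance owns a piece of data. All shard names, tables and function names below are hypothetical. Key-based sharding hashes the key with a stable hash (Python's built-in `hash` for strings is randomized per process, so it is avoided here); the lookup table is an explicit directory; feature-based sharding gives each feature its own database.

```python
import hashlib

SHARDS = ["db0", "db1", "db2", "db3"]      # hypothetical database instances


def key_based_shard(key: str) -> str:
    """Key-based sharding: a stable hash of the key picks the shard."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]


# Lookup-table sharding: an explicit directory maps each key to a shard,
# so a single key can be moved by editing one entry.
LOOKUP = {"alice": "db2", "bob": "db0"}


def lookup_shard(key: str) -> str:
    return LOOKUP[key]


# Feature-based sharding: each feature (users, photos, ...) gets its own DB.
FEATURES = {"users": "users-db", "photos": "photos-db"}


def feature_shard(feature: str) -> str:
    return FEATURES[feature]
```

The trade-off: the hash needs no extra state but moves many keys when `SHARDS` changes size, while the lookup table can move keys individually at the cost of maintaining (and replicating) the directory itself.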
13
NoSQL Categories
Key-value stores
Document stores
Tabular stores
14
Lean & Mean: The Key-Value In-Memory DBs
In-memory DBs are simpler and faster than their on-disk counterparts.
Key-value stores offer a simple interface with no schema.
Major limitation – data size is limited by RAM size.
Often used as caches in front of on-disk DB systems.
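Using an in-memory key-value store as a cache in front of an on-disk database usually follows a cache-aside pattern: check the cache, fall back to the database on a miss, then fill the cache. The sketch below is a hypothetical single-process stand-in: `capacity` plays the role of the RAM limit, and eviction is least-recently-used, matching how Memcached is described on the next slide. The `db` dict stands in for the slow on-disk system.

```python
from collections import OrderedDict


class LRUCache:
    """Bounded in-memory key-value cache; evicts the least recently used key."""

    def __init__(self, capacity):
        self.capacity = capacity            # stands in for the RAM limit
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)         # mark as most recently used
        return self.items[key]

    def set(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # drop the least recently used key


def cache_aside_get(cache, db, key):
    """Cache-aside read: try the cache, fall back to the DB, then fill the cache."""
    value = cache.get(key)
    if value is None:
        value = db[key]                     # hypothetical slow on-disk lookup
        cache.set(key, value)
    return value
```

With `capacity=2`, reading `"a"`, `"b"`, touching `"a"` again and then reading `"c"` evicts `"b"`, the key used least recently.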
15
Open Source In-Memory DBs
Memcached/MemcacheDB, Redis
– Both are key-value stores that rely on hash partitioning.
– Memcached is an LRU-based cache.
– Redis is more of a data structure server.
– Both use a shared-nothing architecture.
16
Memcached
Really a giant, distributed hash table.
Advantages:
– Relatively simple.
– Practically no server-to-server communication.
– Linear scalability.
Disadvantages:
– Doesn’t understand data – no server-side operations. Keys and values are always strings.
– It’s really meant to be only a cache – no more, no less.
– No recovery, limited elasticity.
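The "no server-to-server talk" and "limited elasticity" points are two sides of the same design: the client alone decides which server holds a key. The sketch below assumes a naive hash-modulo-N client (the function names and server names are hypothetical) and measures how many keys change server when a fourth server joins a pool of three. Each moved key is effectively a cache miss, since the servers never hand data to each other.

```python
import hashlib


def pick_server(key: str, servers: list) -> str:
    """Client-side choice of server; the servers never coordinate."""
    h = int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")
    return servers[h % len(servers)]


def moved_fraction(keys, before, after):
    """Fraction of keys whose server changes when the pool is resized."""
    moved = sum(pick_server(k, before) != pick_server(k, after) for k in keys)
    return moved / len(keys)


keys = [f"user:{i}" for i in range(10_000)]
three = ["cache-a", "cache-b", "cache-c"]
four = three + ["cache-d"]
print(f"{moved_fraction(keys, three, four):.0%} of keys change server")
```

With modulo hashing a key stays put only when `h % 3 == h % 4`, which holds for about a quarter of hash values, so roughly three quarters of the cache is invalidated by adding one server. Many Memcached client libraries use consistent hashing instead to reduce this.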
17
Redis
Like Memcached, it’s an in-memory distributed hash.
Supports lists and sets as well as strings.
Offers limited server-side operations.
Supports a master-slave architecture and data replicas for scalability and high availability.
Also supports a persistent mode that writes to disk.
18
Document Stores
As the name implies, these databases store documents.
Usually schema-free: the same database can store documents with different structures.
Allow indexing based on document content.
Prominent examples: CouchDB, MongoDB.
19
Documents
A document is just a collection of values, usually serialized as JSON.
Many implementations support nested documents.
Example:
{
  "username" : "bob",
  "address" : {
    "street" : "123 Main Street",
    "city" : "Springfield",
    "state" : "NY"
  }
}
20
CouchDB
Written in Erlang.
Offers ACID guarantees based on multi-version concurrency control (MVCC).
Supports replication, but isn’t a truly distributed database.
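Multi-version concurrency control avoids locks by never overwriting a document: each write appends a new revision, and a writer must name the revision it based its change on. CouchDB exposes this through a `_rev` field on documents; the sketch below is only a simplified in-memory model of the idea (the `VersionedStore` class, integer revisions and the `Conflict` exception are hypothetical, not CouchDB's API).

```python
class Conflict(Exception):
    """Raised when a writer's revision is no longer the latest one."""


class VersionedStore:
    """Simplified multi-version store: writers must name the revision they read."""

    def __init__(self):
        self.history = {}                 # doc id -> list of document versions

    def get(self, doc_id):
        """Return (revision, latest version); readers never block writers."""
        versions = self.history[doc_id]
        return len(versions), versions[-1]

    def put(self, doc_id, doc, rev=0):
        versions = self.history.setdefault(doc_id, [])
        if rev != len(versions):
            # Someone else wrote a newer revision since this writer read it.
            raise Conflict(f"{doc_id}: expected rev {len(versions)}, got {rev}")
        versions.append(dict(doc))        # never overwrite: append a new version
        return len(versions)
```

Two writers that both read revision 1 race to write: the first succeeds and creates revision 2, the second gets a `Conflict` and must re-read and retry. Older revisions stay available in `history` until (in a real system) they are compacted away.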
21
MongoDB
Written in C++.
Atomic operations on single documents only.
Excellent scalability based on sharding.
Support for server-side JavaScript and MapReduce.
22
Tabular stores
The original: Google’s BigTable
– Proprietary, not open source.
The open source elephant alternative – Hadoop with HBase.
– A top-level Apache project.
– Large number of users.
– Contains a distributed file system, MapReduce, a database server (HBase), and more.
– Rack aware.
23
Hadoop components
24
Hadoop basic components
At its core, Hadoop is a framework for running MapReduce operations on large data sets.
The data sets are placed as text files on the distributed file system.
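The MapReduce model can be simulated in a single process with the classic word count. This sketch is not Hadoop's Java API; it only mirrors the flow: a mapper turns each line of input text into `(word, 1)` pairs, the framework shuffles and groups the pairs by key, and a reducer combines the values for each key. In Hadoop the mappers and reducers run in parallel across the cluster, over splits of the text files stored on the distributed file system.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit (word, 1) for every word in one line of input text."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle/sort: group all emitted values by key, as the framework does."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key, values):
    """Reducer: combine all values emitted for one key."""
    return key, sum(values)


def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())
```

For example, `word_count(["the quick fox", "The lazy dog", "the end"])` counts `"the"` three times. Because each mapper call looks at one line and each reducer call at one key, both phases can be spread over many machines.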
25
Hadoop MapReduce Flow
26
HBase
A database engine built on top of the Hadoop distributed file system.
Scales up to billions of rows with millions of columns.
Has a Java interface for queries.
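Millions of columns are practical because a BigTable/HBase-style table is sparse: each row stores only the cells it actually has, addressed as `family:qualifier`, and rows are kept sorted by row key so range scans are cheap. The toy class below is a hypothetical illustration of that data model, not HBase's Java client API; real HBase also keeps timestamped versions of each cell, which are omitted here.

```python
from bisect import bisect_left, insort


class SparseTable:
    """Toy BigTable/HBase-style table: row key -> {"family:qualifier": value}.

    Only cells that exist are stored, so unused columns cost nothing,
    and row keys are kept sorted so range scans are cheap.
    """

    def __init__(self):
        self.rows = {}
        self.sorted_keys = []

    def put(self, row, column, value):
        if row not in self.rows:
            insort(self.sorted_keys, row)   # keep row keys in sorted order
            self.rows[row] = {}
        self.rows[row][column] = value

    def get(self, row, column):
        return self.rows.get(row, {}).get(column)

    def scan(self, start, stop):
        """Yield (row, cells) for start <= row < stop, in row-key order."""
        i = bisect_left(self.sorted_keys, start)
        while i < len(self.sorted_keys) and self.sorted_keys[i] < stop:
            row = self.sorted_keys[i]
            yield row, self.rows[row]
            i += 1
```

A row for a heavy user may carry thousands of `clicks:<date>` cells while another row has only `info:name`; neither pays for the other's columns.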
27
The Tradeoff – SQL vs. NoSQL
RDBMS:
– Mature.
– Standard SQL (though DDL and vendor extensions are not standardized).
– Robust tools.
NoSQL:
– Scale.
– Schemaless.
28
References
Eventual Consistency (Werner Vogels, CTO, Amazon)
Starbucks Doesn’t Use Two-Phase Commit
Hadoop: The Definitive Guide (O’Reilly)
MongoDB: The Definitive Guide (O’Reilly)
Many wiki pages
29
Questions