A Real-Time NoSQL DB That Preserves ACID
Citrusleaf
Srini V. Srinivasan, Brian Bulkowski
VLDB, 09/01/11
© 2011 Citrusleaf. All rights reserved.



Citrusleaf  The real-time NoSQL database company – Reliable, Scalable, Exceptionally fast – Immediate consistency (ACID compliant)  Founded 2009  Citrusleaf V2.0 (in production since Sept. 2010) – 200K+ TPS per node – Low latency – Runs on commodity h/w – 24x7 uptime – Several Web scale deployments  Citrusleaf RTA (in production since July 2011) © 2011 Citrusleaf. All rights reserved.2VLDB, 09/01/11

High velocity user data
• Applications
  – Real-time bidding applications
      • Cookie matching
      • Server-side user profiles
      • Frequency capping
  – Online & social game data
      • Retrieval of select user histories in seconds
      • User ID storage & access
  – High-traffic Web sites
      • Session management
• DB requirements
  – High write/read ratio (e.g. 70% reads, 30% writes)
  – Need access to recent data
  – Need low latency (milliseconds)

Real-time matching
[Diagram: Users (millions), Application, Citrusleaf Database]
• Sample user records shown in the diagram
  – Joe Smith, Toronto
  – Kevin Lyon, San Jose
  – Lisa Jing, New York
  – Mike Nolan, Detroit
  – Ashwin Iyer, Chicago
• Citrusleaf Database
  – 500M+ objects
  – 100K+ operations/second
  – 100% uptime
  – Flexible scaling
  – Low latency (< 1 ms)
  – Self management

Citrusleaf 2.0
• Combination of OLTP & distributed technology
• Architecture
  – Client Layer
  – Distribution Layer
  – Data Layer
• Linear scale-out algorithms

Transactions, short and long
• Short transactions with immediate consistency
  – Writes applied synchronously to all copies
• Long-running data rebalancing tasks
  – Prioritized lower than short transactions
• 24x7 uptime considerations
  – Relax availability for brief periods to maintain consistency
  – Relax consistency during partitions to maintain availability
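
The "writes applied synchronously to all copies" bullet can be illustrated with a minimal Python sketch. This is not Citrusleaf's protocol, which the slides do not describe. `Replica`, `write_sync`, and the restore-on-failure policy are hypothetical names and assumptions chosen only to show the idea of acknowledging a write after every copy has accepted it.

```python
class Replica:
    """Toy in-memory copy of the data (hypothetical, for illustration only)."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.store = {}

    def apply(self, key, value):
        # An unreachable replica refuses the write.
        if not self.healthy:
            raise ConnectionError(f"{self.name} unavailable")
        self.store[key] = value


_MISSING = object()  # sentinel: the key did not exist before the write


def write_sync(replicas, key, value):
    """Apply a write to every copy before acknowledging it.

    Returns True only if all replicas accepted the write. On any failure,
    replicas that were already written are restored to their previous
    value (assumed policy), so the copies do not diverge.
    """
    undo = []  # (replica, previous value) pairs for copies already written
    try:
        for replica in replicas:
            previous = replica.store.get(key, _MISSING)
            replica.apply(key, value)
            undo.append((replica, previous))
    except ConnectionError:
        for replica, previous in undo:
            if previous is _MISSING:
                replica.store.pop(key, None)
            else:
                replica.store[key] = previous
        return False
    return True
```

In this sketch, refusing the write while a copy is unreachable is one way to read the slide's "relax availability for brief periods to maintain consistency".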

Client layer
• Parallel query optimization
• Client cluster knowledge
  – Non-stop transactions
  – Efficient transaction routing; higher speed
• Source code available; plugs easily into custom application environments
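
The "client cluster knowledge" and "efficient transaction routing" bullets suggest a client that knows which node owns each piece of data and sends requests there directly. A minimal Python sketch under stated assumptions follows: the partition count, the SHA-1 hash, the round-robin assignment, and the names `partition_of` and `ClusterAwareClient` are all hypothetical and are not taken from the slides or from the Citrusleaf client.

```python
import hashlib

N_PARTITIONS = 16  # arbitrary for this example; the slides give no figure


def partition_of(key: str) -> int:
    """Map a key to a partition with a stable hash (SHA-1 chosen arbitrarily)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % N_PARTITIONS


class ClusterAwareClient:
    """Hypothetical client that caches which node owns each partition."""

    def __init__(self, nodes):
        self.update_nodes(nodes)

    def update_nodes(self, nodes):
        # Rebuild the partition map whenever cluster membership changes.
        # Round-robin assignment is a stand-in for a real balancing algorithm.
        if not nodes:
            raise ValueError("cluster has no nodes")
        self.nodes = list(nodes)
        self.partition_map = {
            p: self.nodes[p % len(self.nodes)] for p in range(N_PARTITIONS)
        }

    def node_for(self, key: str) -> str:
        # One local lookup, so the request can go straight to the owning
        # node without a proxy or an extra network hop.
        return self.partition_map[partition_of(key)]
```

When a node joins or leaves, calling `update_nodes` with the new membership lets the client keep routing without restarting, which is one plausible reading of "non-stop transactions".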

Distribution layer
• Shared nothing
• Automatic load & data balancing
• Distributed transaction commit
• Tunable consistency
• Low-overhead consensus

Data layer
• Optimized for cost-effective hardware combinations
  – DRAM and rotational
  – SSD
  – High capacity rotational indexes
• Real-time eviction
  – Integration with warehousing solutions

Technology
• Distributed index techniques for performance
• Multi-level concurrency control ending in a record lock
• Fast snapshots using mark and sweep
• Schema-free data API
• Dynamically extensible data types
• Multi-language support: C, PHP, Java, Python, Ruby, …
• Self-management
• Ease of upgrading

Example Use Case
• Major real-time advertisement company
  – Applications:
      • User Profile Store
      • Real Time Bidding Infrastructure
  – Environment
      • > 50 servers
      • 3 data centers worldwide
      • 24 x 7 uptime (100% available)
      • Commodity hardware
      • Full support for SSD and DRAM/HDD storage
  – Fast deployment (4-8 weeks)

Benchmarks
• Setup
  – 2-4 node clusters
  – 2 copies of data in cluster
  – Immediate consistency
  – Commodity nodes
• Results
  – Linear scale up
  – Over 200,000 tps per node
  – Sub-millisecond latency
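
Taken together, "linear scale up" and "over 200,000 tps per node" imply a rough lower bound on cluster throughput for the 2-4 node setups. The short Python sketch below works through that arithmetic. It assumes ideal linear scaling and treats 200,000 tps as the per-node floor. The slides do not say how replica writes are counted, so the function `aggregate_tps` is only an illustration, not a reported result.

```python
def aggregate_tps(nodes: int, tps_per_node: int = 200_000) -> int:
    """Cluster throughput under ideal linear scaling (an assumption)."""
    if nodes < 1:
        raise ValueError("a cluster needs at least one node")
    return nodes * tps_per_node


for n in (2, 3, 4):
    # Prints 400000, 600000 and 800000 tps for 2, 3 and 4 nodes.
    print(n, "nodes:", aggregate_tps(n), "tps")
```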

Future Directions
• Cross data center replication
• Real-time analytics/reporting
• Multi-record transactions
• Graph APIs
• SQL support
• ...

Summary  Unique set of functionality – Immediately consistent – Self-managing clusters – High performance: 200K+ TPS per node, low latency (sub millisecond) – Support for billions of objects & high volumes of transaction data – Flexible data storage (DRAM, SSD & Rotational Disk)  High ROI – Low TCO: 2 to 5X less expensive hardware setup cost – Fast deployment (a matter of weeks) – Highly available and self-sustaining © 2011 Citrusleaf. All rights reserved.14VLDB, 09/01/11

Questions