From Startup to Enterprise: A Story of MySQL Evolution. Vidur Apparao, CTO; Stephen O'Sullivan, Manager of Data and Grid Technologies. April 2009.

Data Size Growth
The data volume for a majority of companies increases % every year (IDC).
Common solutions: rely on Moore's Law; spend more money.
But there are other ways…
2009 LiveOps, Inc.

About LiveOps
Technology platform for contact centers: an on-demand, multi-tenanted contact center platform.
Virtual call center of 20,000 independent home agents.
Eight years of continuous growth. Founded in 2000. Profitable since employees

LiveOps Data
Main data classes:
Configuration data: low GBs, slow change.
Logging data: low TBs, ever increasing.
System state: MBs, rapid change.
Customer-specific data: high GBs, versioned.
Largest table has 1.4 billion rows.
Tenant key as a column on all tables.
Multi-site deployment for high availability.
[Diagram: Configuration Data, Transaction Data, Session Data; Configuration Tools, Reporting Tools, Monitoring Tools; Web Applications, Telephony Applications.]
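
Keeping a tenant key on every table only protects tenants from each other if every query filters on it. A minimal sketch of one way to enforce that, assuming the column is called tenant_id and the driver uses the %s parameter style (both assumptions; the helper name tenant_scoped_select is hypothetical):

```python
from typing import Any, Dict, List, Optional, Tuple


def tenant_scoped_select(
    table: str, tenant_id: int, where: Optional[Dict[str, Any]] = None
) -> Tuple[str, List[Any]]:
    """Build a parameterized SELECT that always filters on the tenant key.

    Hypothetical helper: the column name `tenant_id` is an assumption.
    `table` and the keys of `where` are interpolated into the SQL, so they
    must come from trusted code, never from user input.
    """
    conditions = ["tenant_id = %s"]
    params: List[Any] = [tenant_id]
    for column, value in sorted((where or {}).items()):
        if column == "tenant_id":
            # The tenant key is mandatory and supplied separately.
            raise ValueError("pass tenant_id as its own argument")
        conditions.append(f"{column} = %s")
        params.append(value)
    return f"SELECT * FROM {table} WHERE " + " AND ".join(conditions), params
```

For example, tenant_scoped_select("call_log", 42, {"agent_id": 7}) yields the SQL "SELECT * FROM call_log WHERE tenant_id = %s AND agent_id = %s" with parameters [42, 7]; the table and column names here are illustrative only.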

Phase 1: The Basic Model
Application servers connecting to a single DB.
Replication to a slave for backup & load balancing.
[Diagram: Web & Telephony Applications read from and write to the R/W Master; MySQL Replication feeds the Slave/Backup.]

Primary Drivers for Change: Availability, Performance, Scale

Options for Improving (Write) Scale
Sharding: partition data into distinct databases based on a sharding key.
Functional Segmentation: separate functional data classes into distinct databases.
MySQL Partitioning.
LiveOps choice: sharding, functional segmentation.
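
Sharding and functional segmentation can be combined in one routing decision: first pick the database for the data class, then, within the sharded class, pick a shard from the key. A minimal sketch, assuming the tenant key is the sharding key and using hypothetical database names (config-master, session-master, txn-shard-N):

```python
import zlib
from typing import Dict

# Hypothetical functional segments: data class -> dedicated master.
FUNCTIONAL_SEGMENTS: Dict[str, str] = {
    "configuration": "config-master",
    "session": "session-master",
}


def shard_for(tenant_id: int, num_shards: int) -> int:
    """Map a sharding key to a shard index using a stable hash."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    # crc32 returns the same value in every process and on every host,
    # unlike Python's hash() on strings, which is salted per process.
    return zlib.crc32(str(tenant_id).encode("utf-8")) % num_shards


def database_for(data_class: str, tenant_id: int, num_shards: int = 4) -> str:
    """Functional segmentation first, then sharding for transactional data."""
    if data_class in FUNCTIONAL_SEGMENTS:
        return FUNCTIONAL_SEGMENTS[data_class]
    return f"txn-shard-{shard_for(tenant_id, num_shards)}"
```

Simple modulo placement remaps most keys whenever num_shards changes, so a real deployment would usually keep an explicit key-to-shard directory or use a scheme such as consistent hashing if shards are expected to be added.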

Options for Improving (Query) Performance
Replication & Load Balancing: distribute query load across multiple replicants.
Separation of DB Roles: separate fast from slow, OLTP from OLAP.
Caching: reduce dependency on the database for queries.
Consistent query tuning/optimization.
LiveOps choice: load balancing, separation of roles.
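
Load balancing and separation of roles fit together naturally: replicants are grouped by role, and each query is sent to a replicant in the pool for its role. A minimal sketch using round-robin selection; the role names ("oltp", "olap") and host names are hypothetical, and the slides do not say which balancing policy LiveOps used:

```python
import itertools
from typing import Dict, Iterator, List


class ReplicaPool:
    """Round-robin over read-only replicants, grouped by role."""

    def __init__(self, replicas_by_role: Dict[str, List[str]]):
        # One endless iterator per role; roles with no hosts are skipped.
        self._cycles: Dict[str, Iterator[str]] = {
            role: itertools.cycle(hosts)
            for role, hosts in replicas_by_role.items()
            if hosts
        }

    def pick(self, role: str) -> str:
        """Return the next replicant for this role (e.g. fast OLTP vs slow OLAP)."""
        if role not in self._cycles:
            raise KeyError(f"no replicants configured for role {role!r}")
        return next(self._cycles[role])
```

Keeping slow analytical queries on their own pool means a long report cannot occupy the replicants that serve fast application reads.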

Options for Improving Availability
Application Resilience: remove the requirement for direct write access and/or degrade gracefully.
MySQL Cluster.
Multi-master Replication: split tables or databases between masters replicating in a ring.
DRBD or SAN HA.
LiveOps choice: application resilience, multi-master replication.
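
When several masters in a ring accept inserts into the same table, their AUTO_INCREMENT values can collide. A common remedy, not described in the slides and so only an assumption about this setup, is to give each master its own lane with the MySQL system variables auto_increment_increment (number of masters) and auto_increment_offset (the master's position). A minimal sketch that computes those settings and simulates the IDs each master would generate; the host names are hypothetical:

```python
from typing import Dict, List


def ring_settings(masters: List[str]) -> Dict[str, Dict[str, int]]:
    """Assign each master in a replication ring its own auto-increment lane."""
    n = len(masters)
    return {
        host: {"auto_increment_increment": n, "auto_increment_offset": i + 1}
        for i, host in enumerate(masters)
    }


def next_ids(settings: Dict[str, int], count: int) -> List[int]:
    """Simulate the first `count` IDs a master would generate on an empty table."""
    step = settings["auto_increment_increment"]
    start = settings["auto_increment_offset"]
    return [start + k * step for k in range(count)]
```

With three masters, the first master generates 1, 4, 7, … and the second 2, 5, 8, …, so rows replicated around the ring never share a key. Adding a master later means changing the increment on every master.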

Phase 2: A Pure MySQL Solution
1. Data writers use a store-and-forward pattern for fault tolerance.
2. Functional segmentation of data between multiple masters.
3. Multi-master replication and quick recovery processes on master failure.
4. Replication to a farm of read-only replicants within and across data centers.
5. Load balancing using DB monitoring and pushed configuration files.
6. Separation of DB roles based on type and cost of query.
[Diagram: Web & Telephony Applications; Reporting and Analytics; R/W Masters (Logging, Config + Session); Read-only Replicants w/ Roles; DB Monitor/Load Balancer; flows labeled Writes, Queries, Monitoring, Config. Push.]
All of these techniques still don't get us to horizontal data scalability.
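
The store-and-forward pattern for data writers means a write is first stored locally and then forwarded to the master, so a master outage delays writes instead of failing them. A minimal sketch, assuming a hypothetical send callable that raises ConnectionError while the master is unavailable; this version queues in memory, whereas a production writer would persist the queue (for example to local disk) so that it survives a crash:

```python
import collections
from typing import Any, Callable, Deque, Dict

Record = Dict[str, Any]


class StoreAndForwardWriter:
    """Queue writes locally and forward them in order when the master accepts them."""

    def __init__(self, send: Callable[[Record], None]):
        self._send = send  # hypothetical callable that writes one record to the master
        self._pending: Deque[Record] = collections.deque()

    def write(self, record: Record) -> None:
        self._pending.append(record)  # store first...
        self.flush()                  # ...then try to forward

    def flush(self) -> int:
        """Forward queued records in order; return how many were sent."""
        sent = 0
        while self._pending:
            try:
                self._send(self._pending[0])
            except ConnectionError:
                break  # master unavailable: keep the record, retry on the next flush
            self._pending.popleft()
            sent += 1
        return sent

    @property
    def pending(self) -> int:
        return len(self._pending)
```

The record is removed from the queue only after send returns, so a failure never loses it; the trade-off is that a send that fails after the master has already applied the write will be retried, so writes should be idempotent.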

Horizontal Scalability Options
Distributed Storage Systems: DFS for unstructured file storage; BigTable/HBase for structured data storage; various vendors with distributed RDBMSs.
Grid Processing: MapReduce and Hadoop.

Our Approach
Take logging data out of the transactional databases.
Reduce replication load.
Store logging data as text files in a DFS.
Use MapReduce for ETL into OLAP databases.
Leverage open source tools like ActiveMQ and Hadoop.
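
To show the shape of a MapReduce ETL step over text log files, here is a single-process sketch of the map, shuffle and reduce phases. The log format (timestamp,tenant_id,event,duration_seconds) and the call_end event name are assumptions; on Hadoop the mapper and reducer would run as separate distributed tasks (for example via Hadoop Streaming) and the output would be loaded into the OLAP database:

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_log_line(line: str) -> Iterator[Tuple[str, int]]:
    """Map: parse one log line and emit (tenant_id, call duration)."""
    parts = line.strip().split(",")
    # Assumed format: timestamp,tenant_id,event,duration_seconds.
    # Skip lines without four fields and events other than call_end.
    if len(parts) != 4 or parts[2] != "call_end":
        return
    _timestamp, tenant_id, _event, duration = parts
    yield tenant_id, int(duration)


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group mapped values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_durations(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: total call duration per tenant."""
    return key, sum(values)


def run_etl(lines: Iterable[str]) -> Dict[str, int]:
    mapped = (pair for line in lines for pair in map_log_line(line))
    return dict(reduce_durations(k, v) for k, v in shuffle(mapped).items())
```

Because the mapper only parses and the reducer only sums, both steps parallelize across log files and tenants, which is what lets the ETL scale horizontally.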

Phase 3: MySQL w/ Horizontal Scalability
1. MySQL continues as the OLTP solution.
2. Logging data is now written to log files on local disk.
3. Log files are moved via ActiveMQ to a log repository and the DFS.
4. An audit process reconciles data between the log files and the DFS.
5. Hadoop serves as the MapReduce system for ETL.
6. MySQL serves as a data mart.
[Diagram: Web & Telephony Applications; R/W Masters; Read-only Replicants; ActiveMQ Brokers; Repository Process; Audit Process; Hadoop MapReduce over the DFS; Backup Storage Array; Data Marts; Reporting and Analytics.]
Horizontal scalability is now in reach!
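
The audit process that reconciles log files with the DFS can be thought of as comparing a fingerprint of each local file with its copy. A minimal sketch, assuming the audit compares line counts plus a SHA-256 digest and that both sides are available as file-name-to-bytes mappings (the slides do not say how LiveOps actually compared them; function names are hypothetical):

```python
import hashlib
from typing import Dict, List, Tuple


def fingerprint(data: bytes) -> Tuple[int, str]:
    """Line count plus SHA-256 digest of a log file's contents."""
    return data.count(b"\n"), hashlib.sha256(data).hexdigest()


def reconcile(local: Dict[str, bytes], dfs: Dict[str, bytes]) -> Dict[str, List[str]]:
    """Report files missing from the DFS, copied incorrectly, or unexpected."""
    report: Dict[str, List[str]] = {
        "missing_in_dfs": [],
        "mismatched": [],
        "unexpected_in_dfs": [],
    }
    for name in sorted(local):
        if name not in dfs:
            report["missing_in_dfs"].append(name)
        elif fingerprint(local[name]) != fingerprint(dfs[name]):
            report["mismatched"].append(name)
    report["unexpected_in_dfs"] = sorted(set(dfs) - set(local))
    return report
```

Files in missing_in_dfs or mismatched would be resent through the broker; the line count makes truncated copies easy to spot in the report, while the digest catches any other difference.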

Learned Best Practices
Know your data.
Build and enforce a data access layer.
Put your data only where you need it.
Experiment early and often.
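
A data access layer makes "put your data only where you need it" enforceable: application code names the data class, and only the layer decides where that class is stored. A minimal sketch, assuming hypothetical sinks (in Phase 3 terms, configuration going to a MySQL master and logging going to local log files as JSON lines); the class and function names are illustrative, not LiveOps code:

```python
import json
from typing import Any, Callable, Dict, TextIO

Record = Dict[str, Any]


def json_lines_sink(stream: TextIO) -> Callable[[Record], None]:
    """Hypothetical sink: append each record as one JSON line (e.g. a local log file)."""
    def write(record: Record) -> None:
        stream.write(json.dumps(record, sort_keys=True) + "\n")
    return write


class DataAccessLayer:
    """Single entry point for writes; application code never picks a store itself."""

    def __init__(self, sinks: Dict[str, Callable[[Record], None]]):
        self._sinks = sinks  # data class -> function that stores one record

    def save(self, data_class: str, record: Record) -> None:
        try:
            sink = self._sinks[data_class]
        except KeyError:
            raise ValueError(f"unknown data class {data_class!r}") from None
        sink(record)
```

Because every write goes through save, moving a data class (for example, taking logging out of the transactional databases) means changing one entry in the sink map rather than every caller.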

Conclusions
MySQL, other open source technology, and commodity hardware can be used to build a horizontally scalable data solution.
Companies today are left to chart their own evolutionary paths.
Collaboration and communication between companies in this area can help everyone.

Thank you. Vidur Apparao, Stephen O'Sullivan