Migrating from PostgreSQL to MySQL at Cocolog Naoto Yokoyama, NIFTY Corporation Garth Webb, Six Apart Lisa Phillips, Six Apart Credits: Kenji Hirohama,

Presentation transcript:

Migrating from PostgreSQL to MySQL at Cocolog Naoto Yokoyama, NIFTY Corporation Garth Webb, Six Apart Lisa Phillips, Six Apart Credits: Kenji Hirohama, Sumisho Computer Systems Corp.

Agenda
  1. What is Cocolog
  2. History of Cocolog
  3. DBP: Database Partitioning
  4. Migration From PostgreSQL to MySQL

1. What is Cocolog

What is Cocolog
NIFTY Corporation
  Established in 1986
  A Fujitsu Group Company
  NIFTY-Serve (licensed and interconnected with CompuServe)
  One of the largest ISPs in Japan
Cocolog
  First blog community at a Japanese ISP
  Based on TypePad technology by Six Apart
  Several hundred million PV/month
History
  Dec/02/2003: Cocolog for ISP users launch
  Nov/24/2005: Cocolog Free for free launch
  April/05/2007: Cocolog for Mobile Phone launch

2008/ Thousand Users Cocolog (Screenshot of home page)

TypePadCocolog

Cocolog template sets

Cocolog Growth (User) (chart: Cocolog and Cocolog Free, phase 1 to phase 4)

Cocolog Growth (Entry) (chart: Cocolog and Cocolog Free, phase 1 to phase 4)

Technology at Cocolog
Core System
  Linux 2.4/2.6
  Apache 1.3/2.0/2.2, mod_perl
  Perl 5.8 + CPAN
  PostgreSQL 8.1
  MySQL 5.0
  memcached / TheSchwartz / cfengine
Eco System
  LAMP, LAPP, Ruby + ActiveRecord, Capistrano, etc.

Monitoring
Management Tool
  Proprietary in-house development with PostgreSQL, PHP, and Perl
Monitoring points (order of priority)
  response time of each post
  number of spam comments/trackbacks
  number of comments/trackbacks
  source IP address of spam
  number of entries
  number of comments via mobile devices
  page views via mobile devices
  time of batch completion
  amount of API usage
  bandwidth usage
  DB: disk I/O, memory and CPU usage, time of VACUUM analyze
  APP: number of active processes, CPU usage, memory usage
(Diagram layer labels: Hard, DB, Service, APL)

Tips for migration
Troubles with PostgreSQL & Linux 2.4/2.6
  VACUUM
  Data size
  Character set
  Cleaning data
Troubles with MySQL
  convert_tz function
  sort order

2. History of Cocolog

Phase 1, 2003/12 (Entry: 0.04 million)
Before DBP, 10 servers
Diagram components: Register, TypePad, PostgreSQL, NAS, WEB, published static contents

Phase 2, 2004/12 (Entry: 7 million)
Before DBP, 50 servers (2004/ /5)
Diagram components: Register, TypePad, PostgreSQL, NAS, WEB, published static contents, Rich template, Publish Book, Tel Operator Support, Podcast, Portal, Profile, etc.

Phase 2 - Problems
  The system is tightly coupled.
  The database server receives requests from multiple points.
  It is difficult to change the system design and database schema.

Phase 3, 2006/3 (Entry: 12 million)
Before DBP, 200 servers
Diagram components: Register, TypePad, PostgreSQL, NAS, WEB, published static contents, Web-API, memcached, Rich template, Publish Book, Tel Operator Support, Podcast, Portal, Profile, etc.

Phase 4, 2007/4 (Entry: 16 million)
Before DBP, 300 servers
Diagram components: Register, TypePad, PostgreSQL, NAS, WEB, published static contents, Web-API, memcached, Atom, Mobile WEB, Rich template, Publish Book, Tel Operator Support

Now, 2008/4
After DBP, 150 servers
Diagram components: Register, TypePad, Multi MySQL, NAS, WEB, published static contents, Web-API, memcached, Atom, Mobile WEB, Rich template, Publish Book, Tel Operator Support

3. TypePad Database Partitioning

Steps for Transitioning
  Server Preparation: hardware and software setup
  Global Write: write user information to the global DB
  Global Read: read/write user information on the global DB
  Move Sequence: table sequences served by the global DB
  User Data Move: move user data to user partitions
  New User Partition: all new users saved directly to user partition 1
  New User Strategy: decide on a strategy for the new user partition
  Non User Data Move: move all non-user owned data

TypePad Overview (Pre-DBP)
Clients (via Internet): Blog Readers, Blog Owners, Mobile Blog Readers
Servers
  Web Server
  Application Server
  TypeCast Server: dedicated server for TypeCast (via ATOM)
  ATOM Server
  MEMCACHED: data caching servers to reduce DB load
  Mail Server
  ADMIN(CRON) Server: cron server for periodic asynchronous tasks
Storage
  Database (Postgres)
  Static Content (HTML, images, etc.)
Connections: https(443), http(80), http(80) : atom api, memcached(11211), postgres(5432), nfs(2049), smtp(25) / pop(110)

Why Partition?
Current setup
  All inquiries (access) go to one DB (Postgres), which holds the Non-User Role and the User Role (User0).
After DBP
  Inquiries (access) are divided among several DBs (MySQL): Global Role, Non-User Role, User Role (User1), User Role (User2), User Role (User3).

Server Preparation
Current setup
  TypePad with DB (PostgreSQL): Non-User Role and User Role (User0)
New expanded setup: DB (MySQL) for partitioned data
  Global Role: maintains user mapping and primary key generation
  User Role (User1), User Role (User2), User Role (User3): user information is partitioned
  Non-User Role: information that does not need to be partitioned (such as session information)
Asynchronous Job Server
  Job Server + TypePad + Schwartz: server for executing jobs
  Schwartz DB: stores job details
(Grey areas in the diagram are not used in the current step.)

Global Write: creating the user map
Explanation
  For new registrations only, uniquely identifying user data is written to the global DB.
  This same data continues to be written to the existing DB.
(Diagram: PostgreSQL and MySQL roles plus the asynchronous job server. The Global Role maintains user mapping and primary key generation. Grey areas are not used in the current step.)

Global Read: use the user map to find the user partition
Explanation
  Migrate existing user data to the global DB.
  At the start of the request, the application queries the global DB for the location of user data.
  The application then talks to this DB for all queries about this user.
  At this stage the global DB points to the user0 partition in all cases.
(Diagram: PostgreSQL and MySQL roles plus the asynchronous job server. The Global Role maintains user mapping and primary key generation. Step highlighted: migrate existing user data. Grey areas are not used in the current step.)
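
The lookup described in this step can be sketched as follows. This is a minimal illustration, not TypePad's actual code: the names GlobalDB, Router, partition_for and connection_for are hypothetical, the partitions are plain strings standing in for database connections, and falling back to "user0" for unmapped users is an assumption based on the note that the global DB points to user0 at this stage.

```python
class GlobalDB:
    """Hypothetical stand-in for the global role: user id -> partition name."""

    def __init__(self, user_map=None):
        self.user_map = dict(user_map or {})

    def partition_for(self, user_id, default="user0"):
        # Assumption: users without a map entry still live on user0.
        return self.user_map.get(user_id, default)


class Router:
    """Resolves the partition once per request, then reuses it."""

    def __init__(self, global_db, partitions):
        self.global_db = global_db
        self.partitions = partitions  # partition name -> connection-like object

    def connection_for(self, user_id):
        name = self.global_db.partition_for(user_id)
        return name, self.partitions[name]


global_db = GlobalDB({1: "user0", 2: "user1"})
router = Router(global_db, {"user0": "pg-conn", "user1": "mysql-conn-1"})
name, conn = router.connection_for(2)
```

Keeping the map in one place means a user can later be moved by changing a single entry, without touching application queries.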

Move Sequence: migrating primary key generation
Explanation
  Postgres sequences (for generating unique primary keys) are migrated to tables on the global DB that act as pseudo-sequences.
  The application requests new primary keys from the global DB rather than the user partition.
(Diagram: PostgreSQL and MySQL roles plus the asynchronous job server. The Global Role maintains user mapping and primary key generation. Step highlighted: migrate sequence management. Grey areas are not used in the current step.)
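
A table acting as a pseudo-sequence might look like the sketch below. It uses Python's built-in sqlite3 module purely so the example runs anywhere; the table name pseudo_sequence, its columns, and the helper names are assumptions, not the schema used by TypePad. On MySQL the same idea is often written with UPDATE ... SET value = LAST_INSERT_ID(value + 1), but the slides do not say which form was used.

```python
import sqlite3


def create_sequence(conn, name, start=0):
    """Create the (hypothetical) pseudo-sequence table and seed one counter."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pseudo_sequence ("
        "name TEXT PRIMARY KEY, value INTEGER NOT NULL)"
    )
    conn.execute(
        "INSERT INTO pseudo_sequence (name, value) VALUES (?, ?)", (name, start)
    )
    conn.commit()


def next_id(conn, name):
    """Increment the counter and return the new value in one transaction."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "UPDATE pseudo_sequence SET value = value + 1 WHERE name = ?", (name,)
        )
        if cur.rowcount != 1:
            raise KeyError(name)
        (value,) = conn.execute(
            "SELECT value FROM pseudo_sequence WHERE name = ?", (name,)
        ).fetchone()
    return value


conn = sqlite3.connect(":memory:")
create_sequence(conn, "entry_id", start=100)
first = next_id(conn, "entry_id")
second = next_id(conn, "entry_id")
```

Because every partition asks the same global table for keys, ids stay unique across partitions, at the cost of one extra round trip per insert.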

User Data Move: moving user data to the new user-role partitions
Explanation
  Existing users that should be migrated by the Job Server are submitted as new Schwartz jobs.
  User data is then migrated asynchronously.
  If a comment arrives while the user is being migrated, it is saved in the Schwartz DB to be published later.
  After being migrated, all user data will exist on the user-role DB partitions.
  Once all user data is migrated, only non-user data is on Postgres.
(Diagram: PostgreSQL and MySQL roles plus the asynchronous job server. The Schwartz DB stores job details and the Job Server executes jobs. Step highlighted: migrating each user's data. Grey areas are not used in the current step.)
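
The hold-and-replay behaviour for comments that arrive mid-migration can be sketched like this. It is an in-memory illustration only: the class PartitionMigration and its methods are hypothetical, and the "held" dictionary stands in for the rows that the slides say are saved in the Schwartz DB.

```python
class PartitionMigration:
    """Toy model of moving one user at a time while comments keep arriving."""

    def __init__(self, user_map):
        self.user_map = dict(user_map)  # user id -> partition name
        self.migrating = set()          # users whose data is being copied
        self.held = {}                  # user id -> comments waiting to publish
        self.published = []             # (partition, user id, comment)

    def post_comment(self, user_id, comment):
        if user_id in self.migrating:
            # Do not write to either partition while the copy is in flight.
            self.held.setdefault(user_id, []).append(comment)
            return "held"
        self.published.append((self.user_map[user_id], user_id, comment))
        return "published"

    def start_move(self, user_id):
        self.migrating.add(user_id)

    def finish_move(self, user_id, target):
        # Flip the user map first, then replay held comments on the new partition.
        self.user_map[user_id] = target
        self.migrating.discard(user_id)
        for comment in self.held.pop(user_id, []):
            self.post_comment(user_id, comment)


m = PartitionMigration({7: "user0"})
m.start_move(7)
status = m.post_comment(7, "hi")
m.finish_move(7, "user2")
```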

New User Partition: new registrations are created on one user role partition
Explanation
  When new users register, user data is written to a user role partition.
  Non-user data continues to be served off Postgres.
(Diagram: PostgreSQL and MySQL roles plus the asynchronous job server. Grey areas are not used in the current step.)

New User Strategy: pick a scheme for distributing new users
Explanation
  When new users register, user data is written to one of the user role partitions, depending on a set distribution method (round robin, random, etc.).
  Non-user data continues to be served off Postgres.
(Diagram: PostgreSQL and MySQL roles plus the asynchronous job server. Grey areas are not used in the current step.)
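
The two distribution methods named on this slide can be sketched as interchangeable functions. The function names and the shape of the user map are assumptions for illustration; the slides do not say which method Cocolog chose.

```python
import itertools
import random


def round_robin(partitions):
    """Return a picker that cycles through the partitions in order."""
    cycle = itertools.cycle(partitions)
    return lambda: next(cycle)


def random_choice(partitions, rng=None):
    """Return a picker that chooses a partition uniformly at random."""
    rng = rng or random.Random()
    return lambda: rng.choice(partitions)


def register_user(user_id, user_map, pick_partition):
    """Record the chosen partition in the (hypothetical) global user map."""
    partition = pick_partition()
    user_map[user_id] = partition
    return partition


partitions = ["user1", "user2", "user3"]
user_map = {}
pick = round_robin(partitions)
assigned = [register_user(uid, user_map, pick) for uid in range(4)]
```

Because the choice is recorded in the user map, the strategy can be changed later without moving users who are already assigned.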

Non User Data Move: migrate data that cannot be partitioned by user
Explanation
  Migrate non-user role data left on PostgreSQL to the MySQL side.
(Diagram: PostgreSQL and MySQL roles plus the asynchronous job server. The Non-User Role holds information that does not need to be partitioned, such as session information. Step highlighted: migrate non-user data. Grey areas are not used in the current step.)

Data migration done
Explanation
  All data access is now done through MySQL.
  Continue to use TheSchwartz for asynchronous jobs.
(Diagram: PostgreSQL and MySQL roles plus the asynchronous job server. Grey areas are not used in the current step.)

The New TypePad configuration
Clients (via Internet): Blog Readers, Blog Owners (management interface), Mobile Blog Readers
Servers
  Web Server
  Application Server
  TypeCast Server: dedicated server for TypeCast (via ATOM)
  ATOM Server
  MEMCACHED: data caching servers to reduce DB load
  Mail Server
  ADMIN(CRON) Server: cron server for periodic asynchronous tasks
  Job Server: TheSchwartz server for running ad-hoc jobs asynchronously
Storage
  Database (MySQL)
  Static Content (HTML, images, etc.)
Connections: https(443), http(80), http(80) : atom api, memcached(11211), MySQL(3306), nfs(2049), smtp(25) / pop(110)

4. Migration from PostgreSQL to MySQL

DB Node Spec History
History of scale up PostgreSQL server, Before DBP
Time: 2003/ /11

OS (RedHat)      CPU (Xeon)                   MEM     DiskArray
7.4 (2.4.9)      1.8GHz/512k×1                1GB     No
ES2.1 (2.4.9)    3.2GHz/1M×2                  4GB     No
ES2.1 (2.4.9)    3.2GHz/1M×2                  4GB     Yes
AS2.1 (2.4.9)    3.2GHz/1M×4                  12GB    Yes
AS4 (2.6.9)      3.2GHz/1M×4                  12GB    Yes
AS4 (2.6.9)      MP 3.3GHz/1M×4, 2Core×4      16GB    Yes

DB DiskArray Spec
History of scale up PostgreSQL server, Before DBP
[FUJITSU ETERNUS8000]
  Best I/O transaction performance in the world
  146GB (15 krpm) * 32 disks with RAID-10
  MultiPath FibreChannel 4Gbps
  QuickOPC (One Point Copy): OPC copy functions let you create a duplicate copy of any data from the original at any chosen time.
  ducts_storage.shtml?products/storage/fujitsu/ e8000/e8000

Scale out MySQL servers, After DBP
A role configuration
  Each role is configured as an HA cluster
  HA Software: NEC ClusterPro
  Shared Storage

Scale out MySQL servers, After DBP
(Diagram: TypePad Application; PostgreSQL; MySQL Role1, MySQL Role2, MySQL Role3, … linked by heartbeat; FibreChannel SAN; DiskArray)

Scale out MySQL servers, After DBP
Backup
  Replication w/ Hot backup

Scale out MySQL servers, After DBP
(Diagram: TypePad Application; PostgreSQL; MySQL Role1, MySQL Role2, MySQL Role3, … linked by heartbeat; MySQL BackupRole; FibreChannel SAN; DiskArray; labels: mysqld, rep, opc, mysqld)

Troubles with PostgreSQL 7.4 – 8.1
Data size
  Over 100 GB
  40% is index
  Severe data fragmentation
VACUUM
  VACUUM analyze causes performance problems
  It takes too long to VACUUM large amounts of data
  dump/restore is the only solution for de-fragmentation
Auto VACUUM
  We don't use Auto VACUUM since we are worried about latent response time

Troubles with PostgreSQL 7.4 – 8.1
Character set
  PostgreSQL accepts out-of-boundary UTF-8 data, such as Japanese extended character sets and multibyte character sets, which should normally be rejected with an error.

Cleaning data
Removing characters that are outside the boundaries of the UTF-8 character set.
Steps
  PostgreSQL.dumpALL
  Split for Piconv
  UTF8 -> UCS2 -> UTF8 & Merge
  PostgreSQL.restore
(Diagram: dump -> Split -> UTF8->UCS2->UTF8 -> Merge -> restore)
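
The effect of the UTF8 -> UCS2 -> UTF8 round trip can be approximated in Python as below. This is not the piconv pipeline itself; it is a sketch under two assumptions: that invalid byte sequences should simply be dropped, and that "out of boundary" characters are those that do not fit in UCS-2 (code points above U+FFFF). The function names are hypothetical.

```python
def clean_utf8(raw: bytes) -> str:
    """Drop invalid UTF-8 bytes and characters that do not fit in UCS-2."""
    # errors="ignore" silently discards byte sequences that are not valid UTF-8.
    text = raw.decode("utf-8", errors="ignore")
    # UCS-2 can only represent the Basic Multilingual Plane (up to U+FFFF).
    return "".join(ch for ch in text if ord(ch) <= 0xFFFF)


def clean_lines(lines):
    """Clean a dump one line at a time so a large file can be processed in pieces."""
    return [clean_utf8(line) for line in lines]


dirty = "日本語".encode("utf-8") + b"\xff" + "\U0001F600".encode("utf-8")
cleaned = clean_utf8(dirty)
```

Cleaning line by line avoids cutting a multibyte character in half, which is a risk when a dump is split at arbitrary byte offsets.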

TypePad
Migration from PostgreSQL to MySQL using TypePad script
Steps
  PostgreSQL -> PerlObject & tmp publish
  -> MySQL -> PerlObject & last publish
  diff tmp last Object: data check
  diff tmp last publish: file check
(Diagram: PostgreSQL -> Document Object tmp; Document Object last; file check; data check)
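
The file check, comparing what was published from PostgreSQL ("tmp") with what was published from MySQL ("last"), can be sketched with Python's standard difflib module. The function name diff_published and the dict-of-paths input format are assumptions; the original check was done by a TypePad script that the slides do not show.

```python
import difflib


def diff_published(tmp, last):
    """Compare two publish runs given as {path: text}; return diffs per mismatched path."""
    problems = {}
    for path in sorted(set(tmp) | set(last)):
        before = tmp.get(path)
        after = last.get(path)
        if before == after:
            continue
        # A file missing on one side is diffed against empty content.
        problems[path] = list(
            difflib.unified_diff(
                (before or "").splitlines(),
                (after or "").splitlines(),
                "tmp/" + path,
                "last/" + path,
                lineterm="",
            )
        )
    return problems


tmp = {"a.html": "x\ny", "b.html": "same"}
last = {"a.html": "x\nz", "b.html": "same"}
report = diff_published(tmp, last)
```

An empty report means both databases produced identical pages, which is the condition the migration needs before switching reads to MySQL.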

Troubles with MySQL
convert_tz function
  Doesn't support input values outside the scope of Unix time
sort order
  The sort order differs when there is no ORDER BY clause
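
One way to guard against the convert_tz limitation is to check values in the application before relying on the database conversion. The sketch below assumes that "the scope of Unix time" corresponds to the range of MySQL's TIMESTAMP type, 1970-01-01 00:00:01 UTC through 2038-01-19 03:14:07 UTC; the helper name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumed bounds: the range of MySQL's TIMESTAMP type, expressed in UTC.
TS_MIN = datetime(1970, 1, 1, 0, 0, 1, tzinfo=timezone.utc)
TS_MAX = datetime(2038, 1, 19, 3, 14, 7, tzinfo=timezone.utc)


def in_unix_time_range(dt):
    """Return True if a timezone-aware datetime falls inside the assumed range."""
    if dt.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return TS_MIN <= dt <= TS_MAX


jst = timezone(timedelta(hours=9))
ok = in_unix_time_range(datetime(2008, 4, 15, 12, 0, tzinfo=jst))
too_old = in_unix_time_range(datetime(1960, 1, 1, tzinfo=jst))
```

For the sort-order issue, the usual remedy is to add an explicit ORDER BY to every query whose result order matters, since neither database guarantees an order without one.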

Cocolog Future Plans Dynamic Job queue

Consulting by Sumisho Computer Systems Corp.
System Integrator
  First and best partner of MySQL in Japan since 2003
  Provides MySQL consulting, support, and training services
  HA
  Maintenance
  Online backup
  Japanese character support

Questions