From DBA to DPA – Becoming a Data Platform Administrator

1 From DBA to DPA – Becoming a Data Platform Administrator
Lior King Database Team Manager - InterContinental Exchange (ICE)

2 The data world is changing
Once: RDBMS for everything. Today: RDBMS is the default, but it is not good for everything.
Once: RDBMS is the only option. Today: there are also document-based DBs, K/V databases, analytics systems etc.
Once: commercial RDBMS only. Today: commercial RDBMS and also cheap community-edition RDBMS.
Once: DWH on RDBMS only. Today: DWH on RDBMS and also on special analytics platforms.
Once: applicative DBA / infrastructure DBA. Today: full stack DBA.

3 Data Platform Administrator - DPA
Today’s data platforms consist of: RDBMS (sometimes more than one); NoSQL systems (document based, column based, K/V based, graph based); analytical systems. The SQL language is as relevant as ever: most systems can “talk” SQL of some kind. Today’s BI systems can connect to ALL of them. The DPA needs to administer them all.

4 How can you become a Data Platform Administrator?

5 Here are 6 topics worth learning to become a data platform administrator

6 #1: Learn about Analytic Platforms
They can process real “big data” volumes very fast, starting from dozens of TB and up to many PB. They are scalable (some of them have limits…). They are the true “data warehouse” platforms for “big data” projects. Today’s leading platform is Hadoop: The most scalable solution, with no practical limit. Rather cheap (in comparison to the alternatives). Good for all analytics: data scans (M/R), seeks (Impala/Shark), machine learning (Mahout), workflows (Tez/ZooKeeper) etc. Evolves very rapidly; the #1 project of the Apache Software Foundation. Alternatives: PDW, Vertica, GreenPlum, Netezza, TeraData.
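To make the “data scans (M/R)” item concrete, here is a minimal single-process sketch of the map, shuffle and reduce phases in plain Python. It is not Hadoop code and does not use any Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `map_reduce`) are hypothetical and chosen only for illustration. On a real cluster each phase would run in parallel across many nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map: emit one (word, 1) pair for every word in a single input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the list of values for one key into a single count."""
    return key, sum(values)


def map_reduce(records):
    """Run the three phases in sequence over an iterable of text records."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    lines = ["Hadoop scans data", "Impala seeks data"]
    print(map_reduce(lines))
```

The point of the split is that the map and reduce steps only ever see one record or one key at a time, which is what lets a platform distribute a full scan over many machines.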

7 #2: Learn about Document Based DBs
They scale more than RDBMS. Easy to manage and to program with (object insertion/retrieval). Flexible schema design. They are very fast, usually faster than RDBMS. They also have MANY disadvantages compared to RDBMS. 3 recommendations to check out: MongoDB, CouchBase, Azure DocumentDB.
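The “object insertion/retrieval” and “flexible schema” points can be illustrated with a toy in-memory collection. This is only a sketch under stated assumptions: `DocumentCollection`, `insert` and `find` are hypothetical names, not the API of MongoDB, CouchBase or DocumentDB, and nothing here is persisted or indexed.

```python
import copy
import itertools


class DocumentCollection:
    """Toy in-memory document collection (hypothetical, for illustration only)."""

    def __init__(self):
        self._docs = {}
        self._ids = itertools.count(1)

    def insert(self, document):
        """Store a copy of the document under a generated _id and return that id."""
        doc_id = next(self._ids)
        stored = copy.deepcopy(document)
        stored["_id"] = doc_id
        self._docs[doc_id] = stored
        return doc_id

    def find(self, **criteria):
        """Return copies of all documents whose fields equal every given criterion."""
        return [
            copy.deepcopy(doc)
            for doc in self._docs.values()
            if all(k in doc and doc[k] == v for k, v in criteria.items())
        ]


if __name__ == "__main__":
    people = DocumentCollection()
    # Two documents with different fields: no schema change is needed.
    people.insert({"name": "Dana", "skills": ["SQL", "Python"]})
    people.insert({"name": "Avi", "city": "Tel Aviv"})
    print(people.find(city="Tel Aviv"))
```

Note that the two inserted documents do not share the same set of fields, which is the flexibility the slide refers to; the price is that the application, not the database, has to cope with missing fields.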

8 #3: Learn Linux – 10 reasons
It’s free. Evolves very fast, with an update release every 6-9 months. Convenient software repositories. Modifiable and customizable. Runs on any platform. Very secure, with a firewall at the heart of the kernel. Lack of malware (when installing from the repositories). Restart after upgrades? Usually not required. Freedom to choose distribution and GUI. SQL Server on Linux: next summer.

9 #4: Learn one “free” RDBMS platform
Companies are looking to cut costs. “Free” DBs are becoming popular. There is a growing demand for DBAs who can manage them. Which one to learn? MySQL: resembles SQL Server in many ways (check out MariaDB as well). The most popular “cheap” RDBMS. PostgreSQL: resembles Oracle in many ways. Used by GreenPlum, Netezza, ParAccel (used in Amazon “RedShift”), Truviso. Sometimes a cheap RDBMS platform can be “good enough”. SQL Server and MySQL together? Why not?

10 #5: Learn Python
Easy to learn. Very readable. Cross platform. Good for scripts as well as for big development projects. Object oriented. Huge library of packages (~40K packages in 300 topics on the PyPI site). A general purpose language. A leading platform for data analytics (almost as popular as R).
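As a small illustration of the “good for scripts” point, the following sketch totals table sizes per database from a CSV report using only the standard library. The CSV layout (`database,table,size_mb`) and the function name `total_size_by_database` are assumptions made up for this example, not the output of any particular database tool.

```python
import csv
import io
from collections import Counter

# Hypothetical report format, assumed for this example only.
SAMPLE = """database,table,size_mb
sales,orders,1200
sales,customers,300
hr,employees,50
"""


def total_size_by_database(csv_text):
    """Sum the size_mb column per database from CSV text with a header row."""
    totals = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["database"]] += int(row["size_mb"])
    return dict(totals)


if __name__ == "__main__":
    for database, size_mb in sorted(total_size_by_database(SAMPLE).items()):
        print(f"{database}: {size_mb} MB")
```

The same few lines run unchanged on Linux and Windows, which is the “cross platform” property the slide mentions.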

11 #6: Become a “full stack” DBA
Study infrastructure (production) as well as database development. Infra: Study all HA/DR solutions of SQL Server: AlwaysOn/Mirroring, Log Shipping, Replication, Clusters. Master the dynamic management views (DMVs). Learn a scripting language: PowerShell or Python. Master the locking mechanism. Development: Master SQL and T-SQL. Study the optimization engine and deep dive into execution plans. DB DevOps. Learn an OO programming language (C#, Java, Scala, Python etc.).
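The advice to “deep dive into execution plans” can be tried out without a server. The sketch below uses SQLite (via Python’s built-in `sqlite3` module) as a stand-in, which is an assumption of this example: SQL Server plans look different and are read with its own tools. The table, index and helper names (`orders`, `idx_orders_customer`, `query_plan`, `demo`) are hypothetical.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's plan for a statement as one string (detail column of each row)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


def demo():
    """Show the plan of the same query before and after adding an index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
        [(i % 100, i * 1.5) for i in range(1000)],
    )
    sql = "SELECT * FROM orders WHERE customer_id = ?"

    before = query_plan(conn, sql, (42,))  # no index yet: expect a full table scan
    conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
    after = query_plan(conn, sql, (42,))   # the optimizer can now search via the index

    conn.close()
    return before, after


if __name__ == "__main__":
    before, after = demo()
    print("before:", before)
    print("after: ", after)
```

The exact wording of the plan text varies between SQLite versions, so the example prints it rather than assuming it; the thing to look for is a scan turning into an index search.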

12 About InterContinental Exchange (ICE)
ICE owns 11 financial and commodities exchanges worldwide, including the New York Stock Exchange (NYSE), the biggest stock exchange in the world. ICE owns and operates 6 clearing houses for derivatives. ICE is a major global supplier of financial market data. ICE is a fast growing organization, mostly by M&A (mergers and acquisitions):

Year   Revenues (million USD)   Net income (million USD)
2005   155                      53
2010   1,150                    398
2015   3,338                    1,274

13 Data Platforms in ICE
Transaction processing (RDBMS): Oracle (RAC/Exadata), TimesTen, MS SQL Server (2008R2/2012/2014), MySQL, PostgreSQL, Sybase, DB2 (LUW).
Analytics: GreenPlum, Netezza, Hadoop, Cassandra.

14 How do we manage it all?
Learn more than one RDBMS. Learn at least one analytics platform. Learn Linux. Learn shell scripting and/or Python. Share the knowledge. Studying is a part of the job; it NEVER stops. Play around in sandboxes.

15 Summary: Become a Data Platform Administrator
Learn about Analytic Platforms (focus on Hadoop). Learn about Document Based DBs. Learn Linux. Learn one “free” RDBMS platform. Learn Python. Become a “full stack” DBA: infrastructure AND development.
