Database Services Katarzyna Dziedziniewicz-Wojcik On behalf of IT-DB.

Slides:



Advertisements
Similar presentations
Project presentation by Mário Almeida Implementation of Distributed Systems KTH 1.
Advertisements

High Availability Group 08: Võ Đức Vĩnh Nguyễn Quang Vũ
Updates from Database Services at CERN Andrei Dumitru CERN IT Department / Database Services.
Evaluation of NoSQL databases for DIRAC monitoring and beyond
Evaluation of distributed open source solutions in CERN database use cases HEPiX, spring 2015 Kacper Surdy IT-DB-DBF M. Grzybek, D. L. Garcia, Z. Baranowski,
Hadoop tutorials. Todays agenda Hadoop Introduction and Architecture Hadoop Distributed File System MapReduce Spark 2.
Hadoop Ecosystem Overview
SQL on Hadoop. Todays agenda Introduction Hive – the first SQL approach Data ingestion and data formats Impala – MPP SQL.
Scale-out databases for CERN use cases Strata Hadoop World London 6 th of May,2015 Zbigniew Baranowski, CERN IT-DB.
Presented by CH.Anusha.  Apache Hadoop framework  HDFS and MapReduce  Hadoop distributed file system  JobTracker and TaskTracker  Apache Hadoop NextGen.
Hadoop tutorials. Todays agenda Hadoop Introduction and Architecture Hadoop Distributed File System MapReduce Spark Cluster Monitoring 2.
Introduction to Hadoop and HDFS
Contents HADOOP INTRODUCTION AND CONCEPTUAL OVERVIEW TERMINOLOGY QUICK TOUR OF CLOUDERA MANAGER.
CERN IT Department CH-1211 Geneva 23 Switzerland t Daniel Gomez Ruben Gaspar Ignacio Coterillo * Dawid Wojcik *CERN/CSIC funded by Spanish.
Data and SQL on Hadoop. Cloudera Image for hands-on Installation instruction – 2.
CERN IT Department CH-1211 Geneva 23 Switzerland t CF Computing Facilities Agile Infrastructure Monitoring CERN IT/CF.
Hadoop implementation of MapReduce computational model Ján Vaňo.
Hadoop IT Services Hadoop Users Forum CERN October 7 th,2015 CERN IT-D*
Nov 2006 Google released the paper on BigTable.
Maria Girone CERN - IT Tier0 plans and security and backup policy proposals Maria Girone, CERN IT-PSS.
Scalable data access with Impala Zbigniew Baranowski Maciej Grzybek Daniel Lanza Garcia Kacper Surdy.
1 HBASE – THE SCALABLE DATA STORE An Introduction to HBase XLDB Europe Workshop 2013: CERN, Geneva James Kinley EMEA Solutions Architect, Cloudera.
CERN IT Department CH-1211 Genève 23 Switzerland t CERN Agile Infrastructure Monitoring Pedro Andrade CERN – IT/GT HEPiX Spring 2012.
Data Analytics and Hadoop Service in IT-DB Visit of Cloudera - April 19 th, 2016 Luca Canali (CERN) for IT-DB.
Microsoft Ignite /28/2017 6:07 PM
Monitoring Evolution 1 Alberto AIMAR, IT-CM-MM. Outline Mandate Data Centres Monitoring Experiments Dashboards Architecture Plans Status Demo 2.
IT Monitoring Service Status and Progress 1 Alberto AIMAR, IT-CM-MM.
Data Analytics Challenges Some faults cannot be avoided Decrease the availability for running physics Preventive maintenance is not enough Does not take.
1 Gaurav Kohli Xebia Breaking with DBMS and Dating with Relational Hbase.
Eric Grancher CERN IT department Overview of Database Technologies Computing and Astroparticle Physics 2 nd ASPERA Workshop /1.
Hadoop Introduction. Audience Introduction of students – Name – Years of experience – Background – Do you know Java? – Do you know linux? – Any exposure.
Pilot Kafka Service Manuel Martín Márquez. Pilot Kafka Service Manuel Martín Márquez.
Integration of Oracle and Hadoop: hybrid databases affordable at scale
OMOP CDM on Hadoop Reference Architecture
Protecting a Tsunami of Data in Hadoop
Monitoring Evolution and IPv6
Backup and Recovery for Hadoop: Plans, Survey and User Inputs
PROTECT | OPTIMIZE | TRANSFORM
Integration of Oracle and Hadoop: hybrid databases affordable at scale
IT Services Katarzyna Dziedziniewicz-Wojcik IT-DB.
Database Services Katarzyna Dziedziniewicz-Wojcik On behalf of IT-DB.
Hadoop.
Data Analytics and CERN IT Hadoop Service
Introduction to Distributed Platforms
Hadoop and Analytics at CERN IT
Future Database Challenges
INTRODUCTION TO PIG, HIVE, HBASE and ZOOKEEPER
Running virtualized Hadoop, does it make sense?
Chapter 14 Big Data Analytics and NoSQL
Sqoop Mr. Sriram
CLOUDERA TRAINING For Apache HBase
Database Workshop Report
New Big Data Solutions and Opportunities for DB Workloads
Enabling Scalable and HA Ingestion and Real-Time Big Data Insights for the Enterprise OCJUG, 2014.
Data Analytics and CERN IT Hadoop Service
Data Analytics and CERN IT Hadoop Service
Oracle Database Monitoring and beyond
APACHE HAWQ 2.X A Hadoop Native SQL Engine
Central Florida Business Intelligence User Group
Powering real-time analytics on Xfinity using Kudu
Oracle Storage Performance Studies
Data Analytics and CERN IT Hadoop Service
Data Analytics – Use Cases, Platforms, Services
Big Data - in Performance Engineering
Introduction to PIG, HIVE, HBASE & ZOOKEEPER
Introduction to Apache
Overview of big data tools
Pig Hive HBase Zookeeper
Presentation transcript:

Database Services Katarzyna Dziedziniewicz-Wojcik On behalf of IT-DB

Agenda Database service overview Status & plans Oracle Database On Demand Hadoop 20/10/2016 DB Services

Databases 3 main pillars Oracle Database on-demand Hadoop For critical, transactional load Administered by experts (DBAs) Database on-demand Different database engines MySQL, PostgreSQL, Oracle, InfluxDB Instance owners have the DBA rights Hadoop For analysis where DBs are not suitable 20/10/2016 DB Services

Questions? 20/04/2016 DB Services

Oracle in numbers ~100 databases, most of them RAC Mostly NAS storage plus some SAN with ASM More than 600 TB of data files for production DBs in total 20/10/2016 DB Services

20/10/2016 DB Services

Oracle – short term plans Creation of a public database service supporting UNICODE Hardware migration of all experiment and project databases New nodes 512GB of RAM Migration of all non-accelerator APEX installations to APEX 5 20/10/2016 DB Services

Oracle – long term Oracle 11.2 and 12.1 LS2 upgrade Main new feature Supported until 2020 and 2021 respectively LS2 upgrade Oracle 12.2 (recommended) Release date 2016 Main new feature In-Memory database Real-time responsiveness 20/10/2016 DB Services

On-Demand in numbers All databases – 415 By DB Type 25% growth in last 6 monys Production – 276 By DB Type MySQL: 321 PostgreSQL: 70 Oracle: 9 InfluxDB: 17 (pilot) 20/10/2016 DBServices

Time series databases Time series data workload assumptions Billions of individual data points High write and read throughput Mostly an insert/append workload, very few updates/deletes Large deletes to free up disk space 20/10/2016 DB Services

InfluxDB Pilot service, production in Q1 2017 HTTP Write and Query API SQL like language to query aggregated data Schema-less Works with different visualization tools: Grafana, Chronograf, … Authentication Data Retention policies Storage optimization & compression 20/10/2016 DB Services

InfluxDB use cases Cgroup stats for batch jobs Fifemon(scheduler and central manager of batch nodes) Gitlab Stats Network TSM ActiveMQ Monitoring LHCb WLCG DBoD 20/10/2016 DB Services

Databases – On-Demand MySQL CE 5.7 PostgreSQL 9.6 To stay for next 3 years PostgreSQL 9.6 Q4 2016 Oracle as for General Service Upgrades require database reboot 20/10/2016 DB Services

Status for On-Demand SSL for client access encryption and authentication ✔ High availability Replication ✔ App/DB Gateways (coming soon) Clustered Databases (2017) Instance Monitoring Currently studying a hybrid solution: ElasticSearch+Packetbeat for database activity (SQL) 􏰀 InfluxDB+telegraf for instance metrics 20/10/2016 DB Services

RDBMS vs Hadoop RDBMS optimized for OLTP, Hadoop affordable at scale Interconnect Node 1 Node 2 Node n The shared nothing architecture allows to scale for high capacity and throughput on commodity HW Example of Oracle RAC deployed with shared storage 20/10/2016 DB Services

Main components of CERN-IT Hadoop service Data exchange with RDBMS Sqoop Scripting Pig Hive SQL Large scale data proceesing Spark Log data collector Flume Impala SQL NoSql columnar store HBase Data streaming hub Kafka Zookeeper Coordination MapReduce YARN Cluster resource manager HDFS Hadoop Distributed File System 20/10/2016 DB Services

Hadoop usage at CERN 3 production clusters Main use-cases Aggregated capacity ~ 1000 cores, 3TB RAM, 6PB Storage (~2 effective PB) Main use-cases IT Monitoring ATLAS EventIndex Accelerator Logging v2 (pilot phase) Core –physical core 20/10/2016 DB Services

Plans – Apache Kafka Distributed streaming platform Open source project started at LinkedIn in 2011 Stores streams of data in distributed replicated cluster Horizontally scalable (streams partitioning) Fault tolerant (replication of partitions) Allows to access and process the streams of data in real-time Became an industry standard and important component of Hadoop ecosystem 20/10/2016 DB Services

Kafka as Hadoop enhancement 20/10/2016 DB Services

Apache Kafka as a service Core component for mission critical systems at CERN CERN Accelerator Logging System v2 Industrial Controls and Data Acquisition systems IT Monitoring & Security Production quality service needed Pilot service proposed Joined effort of IT-DB and IT-CM groups Collecting inputs from the user communities ... it-kafka-service@cern.ch 20/10/2016 DB Services

Hadoop data backup Introduction aimed at Q1 2017 Cold, incremental copy of files every 24 hours Last 24 hours of stored in Kafka buffers Data owner’s responsibility to insert them 20/10/2016 DB Services

Acknowledgments Colleagues from IT-DB Special thanks: Zbigniew Baranowski Ignacio Coterillo Coz Prasanth Kothuri Antonio Romero Marin 20/10/2016 DB Services