Eric Grancher, CERN IT department. Overview of Database Technologies. Computing and Astroparticle Physics, 2nd ASPERA Workshop. https://edms.cern.ch/document/ /1.

Presentation transcript:

Eric Grancher, CERN IT department. Overview of Database Technologies. Computing and Astroparticle Physics, 2nd ASPERA Workshop

Outline
Experience at CERN
– Current usage and deployment
– Replication
– Lessons from last 15 years
Oracle and MySQL
Trends
– Flash
– Compression
– Open source “relational” databases
– NoSQL databases

CERN IT-DB Services - Luca Canali

CERN databases in numbers
CERN database services
– ~130 databases, most of them database clusters (Oracle RAC technology, 2 – 6 nodes)
– Currently over 3000 disk spindles providing more than ~3 PB raw disk space (NAS and SAN)
– MySQL service started (for Drupal)
Some notable databases at CERN
– Experiments’ databases: 14 production databases, currently between 1 and 12 TB in size, expected growth between 1 and 10 TB / year
– LHC accelerator logging database (ACCLOG): ~70 TB, > rows, expected growth up to 35(+35) TB / year
– ... Several more DBs in the 1-2 TB range
original slide by Luca Canali
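
The figures on this slide lend themselves to a quick back-of-the-envelope check. The sketch below is only an illustration: it takes 1 PB = 1000 TB, treats "over 3000" and "~3 PB" as exact, and assumes growth is linear, none of which the slide states. The function names are hypothetical.

```python
# Figures quoted on the slide (rounded); everything else is an assumption.
RAW_CAPACITY_TB = 3000.0   # "~3 PB raw disk space", assuming 1 PB = 1000 TB
SPINDLES = 3000            # "over 3000 disk spindles", taken as exactly 3000
ACCLOG_SIZE_TB = 70.0      # ACCLOG size quoted on the slide
ACCLOG_GROWTH_LOW = 35.0   # lower figure of "35(+35) TB / year"
ACCLOG_GROWTH_HIGH = 70.0  # upper figure, 35 + 35 TB / year


def average_tb_per_spindle(capacity_tb, spindles):
    """Average raw capacity per disk spindle, in TB."""
    return capacity_tb / spindles


def projected_size_tb(current_tb, growth_tb_per_year, years):
    """Linear projection of database size (linearity is an assumption)."""
    return current_tb + growth_tb_per_year * years


if __name__ == "__main__":
    print("TB per spindle:", average_tb_per_spindle(RAW_CAPACITY_TB, SPINDLES))
    for years in (1, 2, 3):
        low = projected_size_tb(ACCLOG_SIZE_TB, ACCLOG_GROWTH_LOW, years)
        high = projected_size_tb(ACCLOG_SIZE_TB, ACCLOG_GROWTH_HIGH, years)
        print(f"ACCLOG after {years} year(s): {low:.0f} to {high:.0f} TB")
```

Under these assumptions the average is about 1 TB of raw space per spindle, which suggests the spindle count is driven at least as much by IO capacity as by volume.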

Key role for LHC physics data processing
– Online acquisition, offline production, data (re)processing, data distribution, analysis
– SCADA, conditions, geometry, alignment, calibration, file bookkeeping, file transfers, etc.
Grid Infrastructure and Operation services
– Monitoring, Dashboards, User-role management, ...
Data Management Services
– File catalogues, file transfers and storage management, ...
Metadata and transaction processing for custom tape-based storage system of physics data
Accelerator control and logging systems
AMS as well: data/mc production bookkeeping and slow control data
original slide by Luca Canali

Oracle Real Application Cluster
[Figure from "Oracle9i Real Application Clusters Deployment and Performance"]

CERN openlab and Oracle Streams
Worldwide distribution of experimental physics data using Oracle Streams
-> Huge effort, successful outcome
slide by Eva Dafonte Pérez

[Figure with labels: CMS, LHCb, ATLAS, COMPASS, ALICE]
slide by Eva Dafonte Pérez

Lessons learnt
Aiming for high availability (often) adds complexity … and complexity is the enemy of availability
Scalability can be achieved with Oracle Real Application Cluster (150k entries/s for PVSS)
Database / application instrumentation is key for understanding/improving performance
NFS/D-NFS/pNFS are solutions to be considered for stability and scalability (very positive experience with NetApp, snapshots, scrubbing, etc.)
Database independence is very complex if performance is required
Hiding IO errors from the database lets the database handle what it is best at (transactions, query optimisation, coherency, etc.)
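
As an illustration of the instrumentation point, here is a minimal sketch of application-side timing in Python. It is not the instrumentation used at CERN; the class name `Instrumentation` and the label `insert_batch` are hypothetical, and a real deployment would also rely on the database's own instrumentation.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class Instrumentation:
    """Accumulates call counts and elapsed wall-clock time per labelled step."""

    def __init__(self):
        self.calls = defaultdict(int)
        self.elapsed = defaultdict(float)

    @contextmanager
    def measure(self, label):
        """Time the enclosed block and record it under `label`."""
        start = time.perf_counter()
        try:
            yield
        finally:
            # Recorded even if the block raises, so failures stay visible.
            self.calls[label] += 1
            self.elapsed[label] += time.perf_counter() - start

    def report(self):
        """Return (label, calls, total_seconds), slowest label first."""
        rows = [(label, self.calls[label], self.elapsed[label])
                for label in self.calls]
        return sorted(rows, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    inst = Instrumentation()
    for _ in range(3):
        with inst.measure("insert_batch"):  # hypothetical application step
            sum(range(100_000))             # stand-in for real database work
    for label, calls, seconds in inst.report():
        print(f"{label}: {calls} calls, {seconds:.6f} s total")
```

Wrapping each logical step this way makes it possible to see where time goes before changing anything, which is the point the slide makes.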

Incidents review

Evolution
Flash
Large memory systems
Compression
Open source “relational” databases
NoSQL databases

Flash, memory and compression
Flash changes the picture in the database area IO
– Sizing for IO Operations Per Second
– Usage of fast disks for high number of IOPS and low latency
Large amount of memory
– Enables consolidation and virtualisation (fewer nodes)
– Some databases fully in memory
Compression is gaining momentum for databases
– For example Oracle’s hybrid columnar compression
– Tiering of storage
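
"Sizing for IOPS" can be made concrete with a small calculation: the number of devices is set by whichever constraint is tighter, IO rate or capacity. The sketch below is illustrative only; the per-device IOPS and capacity figures in the usage example are assumed round numbers, not measurements from the slides.

```python
import math


def devices_needed(required_iops, iops_per_device, capacity_tb, tb_per_device):
    """Number of devices needed to satisfy both the IOPS and capacity targets."""
    by_iops = math.ceil(required_iops / iops_per_device)
    by_capacity = math.ceil(capacity_tb / tb_per_device)
    return max(by_iops, by_capacity)


if __name__ == "__main__":
    # Assumed workload: 20,000 IOPS over 10 TB of data.
    # Assumed devices: a spinning disk at 200 IOPS and 2 TB,
    # a flash device at 20,000 IOPS and 2 TB.
    print("spinning disks:", devices_needed(20_000, 200, 10, 2))
    print("flash devices: ", devices_needed(20_000, 20_000, 10, 2))
```

With these assumed figures the disk configuration is IOPS-bound while the flash configuration is capacity-bound, which is the sense in which flash "changes the picture".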

Exadata Hybrid Columnar Compression on Oracle 11gR2
[Chart: compression factor]
slide by Svetozar Kapusta

Local analysis of data
“Big Data”: analysis of large amounts of data in a reasonable amount of time
Google MapReduce, Apache Hadoop implementation
Oracle Exadata
– Storage cells perform some of the operations locally (Smart Scans, storage index, column filtering, etc.)
Greenplum
– shared-nothing massively parallel processing architecture for Business Intelligence and analytical processing
Important direction for at least the first level of selection
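
The MapReduce model mentioned above can be sketched in a few lines of plain Python. This is a single-process illustration of the map, shuffle and reduce phases using the usual word-count example; it does not use Hadoop, and the function names are hypothetical.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in record.split():
        yield word.lower(), 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)


def map_reduce(records):
    """Run the three phases in-process over an iterable of text records."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(key, values)
                for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(map_reduce(["flash and memory", "memory and compression"]))
```

In a real framework the map calls run on the nodes that hold the data and only the grouped intermediate pairs travel over the network, which is what makes the analysis "local".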

Open source relational databases
MySQL and PostgreSQL
Some components are not “free”, replacements exist (for example for hot backup Percona XtraBackup)
MySQL default for many applications (OpenNebula, Drupal, etc.)
PostgreSQL has a strong backend with Oracle-like features (stored procedures, write-ahead logging, “standby”, etc.)

NoSQL databases 1/2
Not about SQL limitations/issues
– about scalability
– Big Data scale-out
Feature-rich SQL engines are complex, lead to some unpredictability
Workshop at CERN (needs and experience)
ATLAS Computing Technical Interchange Meeting

NoSQL databases 2/2
Lot of differences
– Data models (Key-Value, Column Family, Key-Document, Graph, etc.)
– Schema free storage
– Queries complexity
– Mostly not ACID (Atomicity, Consistency, Isolation and Durability), data durability relaxed compared to traditional SQL engines
– Performance with sharding
– Compromise on consistency to keep availability and partition tolerance
-> application is at the core, evolution?
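
To make "Key-Value" and "sharding" concrete, here is a minimal in-memory sketch of hash-based sharding. It does not model any specific NoSQL product: the class `ShardedKeyValueStore` is hypothetical, each shard is a plain dictionary standing in for a separate node, and replication, rebalancing and consistency handling are left out entirely.

```python
import hashlib


class ShardedKeyValueStore:
    """Schema-free key-value store spread over a fixed number of shards."""

    def __init__(self, shard_count):
        if shard_count < 1:
            raise ValueError("shard_count must be at least 1")
        # Each dict stands in for one node of a cluster.
        self.shards = [{} for _ in range(shard_count)]

    def shard_index(self, key):
        """Pick a shard from a stable hash of the key."""
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def put(self, key, value):
        self.shards[self.shard_index(key)][key] = value

    def get(self, key, default=None):
        return self.shards[self.shard_index(key)].get(key, default)


if __name__ == "__main__":
    store = ShardedKeyValueStore(shard_count=4)
    # Values need no common schema: any Python object can be stored.
    store.put("run:1001", {"detector": "example", "events": 42})
    store.put("run:1002", ["free", "form", "value"])
    print(store.get("run:1001"))
    print([len(shard) for shard in store.shards])
```

Because a key maps to exactly one shard, single-key reads and writes scale with the number of shards, while queries spanning many keys have to be assembled by the application, which is one reading of "application is at the core".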

Summary
Oracle
– Critical component for LHC accelerator and physics data processing
– Scalable and stable, including data replication
CERN central services run on Oracle, for which we have components and experience to build high availability, guaranteed data, scalability
MySQL is being introduced
– Nice features and light, lacks some scalability and High-Availability features for solid service
NoSQL is being considered
– Ecosystem still in infancy (architecture, designs and interfaces subject to change!)

Oracle
EU director for research, Monica Marinucci
Strong collaboration with CERN and universities
Well known solution, easy to find database administrators and developers, training available
Support and licensing

References
NoSQL ecosystem,
Database workshop at CERN and ATLAS Computing Technical Interchange Meeting
Eva Dafonte Pérez, UKOUG 2009, “Worldwide distribution of experimental physics data using Oracle Streams”
Luca Canali, CERN IT-DB Deployment, Status, Outlook
CERN openlab,
CAP theorem,
ACID,

Backup slides

Incidents review

LHC logging service, > graph by Ronny Billen