Development of a noSQL storage solution for the Panda Monitoring System
“Database Futures” Workshop at CERN, June 7th, 2011
Maxim Potekhin, Brookhaven National Laboratory, Physics Applications Software Group

2 Brief Intro: Panda Workload Management System (1)
Panda is the principal workload management system used in ATLAS. It is a “pilot-based” framework: pilot jobs are started remotely on participating sites, and after they are successfully activated and have validated the environment on the worker node, the payload is acquired and executed.
The job (payload) is matched to a pilot by the central service based on a number of criteria (“brokerage”).
Pilots, jobs and other entities in Panda transition between their various states as this cycle is repeated over and over. The state of each object is recorded in an RDBMS (Oracle), and guaranteed consistency of these data is crucial for the operation of Panda.

3 Brief Intro: Panda Workload Management System (2)
After a Panda job is finished, the data associated with its attributes is preserved for troubleshooting, accounting, optimization and other purposes. It is accessed by the Panda Monitor web service and a few other systems.
We note that this archive-type data is largely de-normalized and does not require most of the features of an RDBMS.
There are classes of queries, in particular those involving time and date ranges, where Oracle typically does not perform too well.
An option to unload significant amounts of archived reference data from Oracle to a highly performing, scalable system based on commodity hardware therefore appears attractive.

4 Brief Intro: Panda Workload Management System (3)

5 Motivation
We need to scale out, as the number of jobs processed daily by Panda is steadily growing, with peaks of about 840k jobs per day.
Using a noSQL solution allows us to solve scalability issues without putting more demands and load on the mission-critical Oracle infrastructure of ATLAS.
In addition, this gives us an opportunity to index the data optimally, with resulting performance improvements (more details below).
Cassandra is one of many noSQL platforms available today; we chose it because of existing expertise in ATLAS and its ease of installation and configuration.

6 Objectives of the project
Ultimately, we aim to create a noSQL-based data store for archived reference data in Panda. In the validation stage, it would be the data source for the Panda Monitor and would run continuously in parallel with the Oracle DB, in order to ascertain the performance characteristics of such a system under real-life data volumes and query loads.
To this end, we are pursuing multiple objectives in the context of Cassandra DB:
to create an optimal design for the data generated by the Panda server, as well as for the indexes
to perform a performance comparison of the recently built cluster at BNL against the existing Oracle facility at CERN
to determine the optimal hardware configuration of that system and its tuning parameters

7 History of the project
Jan-Feb 2011: using the 4-VM Cassandra R&D cluster at CERN (created by the ATLAS DDM team), we gained experience in data design and indexing of Panda data, as well as operational experience with Cassandra. We demonstrated the usefulness of a map-reduce approach to generating date-range indexes into Cassandra data, which results in a marked performance improvement.
March 2011: commissioned a 3-node real-hardware cluster at BNL and loaded a year's worth of real Panda data in a modified format (based on the initial CERN experience). Ran a series of performance tests aimed at a “worst-case performance” estimate, to set the baseline for further optimization.
May 2011: upgraded to SSDs; testing in progress.

8 The Cassandra Cluster at BNL
Currently we have 3 high-spec nodes and have tested the cluster in 2 data storage configurations. Each node:
two Xeon X5650 processors, 6 cores each; with hyper-threading, effectively 24 cores
48 GB RAM on a 1333 MHz bus
Storage Configuration 1: six 500 GB, 7200 RPM SATA disks in RAID0
Storage Configuration 2: five consumer-grade 240 GB SSDs in RAID0 (recent)
We also have a separate client machine from which to run queries against the Cassandra DB.

9 Mapping/indexing of the Data (1)
Note on indexing: the native ability to index individual columns is a relatively new feature in Cassandra. Since our data is mostly write-once, read-many, we can do a better job by indexing the data in map-reduce fashion, and we do not need to create extraneous columns to implement an equivalent of composite indexes (so we save space).
Typically, the indexing process is run at the same time as the data is loaded into Cassandra; however, we can add more indexes later if and when needed.
Proper index design helps to dramatically improve performance. Example: find the number of failed jobs for user XYZ in a range of dates, in cloud ABC. This can take minutes as a standard SQL query vs. tens of milliseconds on Cassandra with map/reduce-built indexes.
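The read side of such an index lookup can be illustrated with a short sketch, assuming the pycassa Python client and hypothetical keyspace and column-family names (the slides do not specify the client or the schema):

```python
# Minimal read-side sketch, assuming pycassa and a hypothetical index column
# family keyed as USER::CLOUD::STATUS::DATE (one row per day).
from datetime import date, timedelta
import pycassa

pool = pycassa.ConnectionPool('PandaArchive', server_list=['cassandra-node:9160'])
job_index = pycassa.ColumnFamily(pool, 'JobIndexByUserCloudStatusDate')

def failed_job_count(user, cloud, day_from, day_to):
    """Count failed jobs for a user/cloud over a date range using index rows only."""
    total = 0
    day = day_from
    while day <= day_to:
        key = '%s::%s::failed::%s' % (user, cloud, day.strftime('%Y%m%d'))
        try:
            # Each index row stores job IDs as column names, so a per-row count
            # replaces a scan over the main job data.
            total += job_index.get_count(key)
        except pycassa.NotFoundException:
            pass  # no failed jobs for this user/cloud on that day
        day += timedelta(days=1)
    return total

print(failed_job_count('XYZ', 'ABC', date(2011, 4, 1), date(2011, 4, 30)))
```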

10 Mapping/indexing of the Data (2)
We identify popular/high-demand queries made against the database and create MAPS. In the above example: USER::CLOUD::STATUS::DATE → LIST of IDs.
An index created this way takes up approximately 1% of the space of the data being indexed, which is affordable.
The above is just one example; in reality we create many maps to reflect the queries made by users.
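A matching write-side sketch of how such a map could be populated while the archive data is loaded; the attribute names, column-family layout and the pycassa client are illustrative assumptions rather than the actual PanDA loader code:

```python
# Sketch of building the USER::CLOUD::STATUS::DATE -> job IDs map at load time
# (hypothetical column families; pycassa client assumed).
import pycassa

pool = pycassa.ConnectionPool('PandaArchive', server_list=['cassandra-node:9160'])
jobs_cf = pycassa.ColumnFamily(pool, 'Jobs')
index_cf = pycassa.ColumnFamily(pool, 'JobIndexByUserCloudStatusDate')

def load_job(job):
    """job: dict of archived job attributes, including a pre-serialized CSV payload."""
    pandaid = str(job['PANDAID'])
    jobs_cf.insert(pandaid, {'data': job['csv_payload']})  # primary store

    # Index row: one row per (user, cloud, status, day); job IDs become column
    # names with empty values, so a row is effectively a compact list of IDs.
    key = '%s::%s::%s::%s' % (job['USER'], job['CLOUD'],
                              job['STATUS'], job['DATE'])
    index_cf.insert(key, {pandaid: ''})
```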

11 Patterns in the Data
In Panda, jobs are often submitted not individually but in batches, which can measure anywhere from 2 to hundreds of jobs. This is demonstrated by a plot of the distribution of lengths of continuous ID ranges as they are found in the database.
Exploit the pattern: use bucketing for optimal performance! One trip to the DB instead of ten or a hundred.
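One possible form of such bucketing is sketched below: contiguous PanDA IDs map to the same row, so a whole batch can be fetched in a single request. The bucket size and column-family name are illustrative assumptions.

```python
# Bucketing sketch: zero-padded job IDs grouped into fixed-size buckets
# (hypothetical 'JobBuckets' column family, pycassa client assumed).
import pycassa

pool = pycassa.ConnectionPool('PandaArchive', server_list=['cassandra-node:9160'])
buckets = pycassa.ColumnFamily(pool, 'JobBuckets')

BUCKET_SIZE = 100

def bucket_key(pandaid):
    return str(pandaid // BUCKET_SIZE)

def store(pandaid, payload):
    # Column name = zero-padded job ID (keeps lexicographic order = numeric order),
    # column value = the serialized job record.
    buckets.insert(bucket_key(pandaid), {'%012d' % pandaid: payload})

def fetch_batch(first_id, last_id):
    # Jobs submitted together have contiguous IDs, so (for a range falling in
    # one bucket) the whole batch comes back from a single get() instead of N gets.
    return buckets.get(bucket_key(first_id),
                       column_start='%012d' % first_id,
                       column_finish='%012d' % last_id)
```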

12 Packaging of the data
Cassandra provides a lot of flexibility in how the data may be formatted and stored. For the application being considered, we could use any of the following:
a simple dictionary of key-value pairs for each attribute (storage penalty, as key names are stored for each object)
JSON strings (same issue)
CSV (low storage overhead, but necessitates parsing in the client)
We have opted for the last option (CSV) in the most recent configuration.
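A rough illustration of the three packaging options for a single job record; the attribute names are made up for the example, and only the serialization tradeoff is the point:

```python
# The same record under the three packaging schemes discussed above
# (illustrative attribute names).
import json

job = {'PANDAID': '1234567890', 'STATUS': 'finished',
       'CLOUD': 'US', 'USER': 'xyz', 'CPUTIME': '5123'}

# Option 1: one Cassandra column per attribute (column names stored per object)
as_columns = job

# Option 2: a single JSON column (attribute names still repeated inside the value)
as_json = {'data': json.dumps(job)}

# Option 3: a single CSV column; the field order lives in the client code,
# which minimizes storage overhead at the cost of parsing on read.
FIELDS = ['PANDAID', 'STATUS', 'CLOUD', 'USER', 'CPUTIME']
as_csv = {'data': ','.join(job[f] for f in FIELDS)}
```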

13 Data sources used to load the Cassandra instance
(Diagram: Oracle DB, automated backup on Amazon S3, local file cache, and the Cassandra instance, connected via HTTP.)
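The load path implied by the diagram (S3 backup pulled into a local file cache, then written to Cassandra) might look roughly as follows; the boto and pycassa clients, bucket name and file layout are assumptions, not details given in the slides:

```python
# Sketch of the loader path: S3 backup -> local file cache -> Cassandra.
import csv
import os
from boto.s3.connection import S3Connection
import pycassa

CACHE_DIR = '/data/panda_cache'   # hypothetical local cache location

def fetch_to_cache(bucket_name, key_name):
    """Download one archived dump from S3 unless it is already cached locally."""
    local_path = os.path.join(CACHE_DIR, os.path.basename(key_name))
    if not os.path.exists(local_path):
        bucket = S3Connection().get_bucket(bucket_name)  # credentials from the environment
        bucket.get_key(key_name).get_contents_to_filename(local_path)
    return local_path

def load_file(local_path):
    """Stream one CSV dump into the Jobs column family using batched mutations."""
    pool = pycassa.ConnectionPool('PandaArchive', server_list=['cassandra-node:9160'])
    jobs_cf = pycassa.ColumnFamily(pool, 'Jobs')
    batch = jobs_cf.batch(queue_size=200)   # mutations flushed in groups of 200
    for row in csv.reader(open(local_path)):
        pandaid, payload = row[0], ','.join(row[1:])
        batch.insert(pandaid, {'data': payload})
    batch.send()   # flush whatever remains in the queue
```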

14 Testing/Benchmarking Comments
Using a multithreaded client for data loading, with Amazon S3 as the current primary source.
Write performance: with proper separation of the commit log and the primary data store, in all of our testing Cassandra was able to handle all the data that we could provide from our network sources; it is hard or impossible to saturate at this point.
Indexed queries perform on par with Oracle or better under moderate load.
Our focus now is the worst-case scenario: simple primary-key queries under moderate to high load (~1000 queries per second), because this will determine the behavior of most applications.
The Oracle cluster that we use in our benchmarks has >100 spindles of the Raptor class. In Phase 1 of testing at BNL, we only had 18.

15 Testing with the BNL cluster: phase 1, rotational media
Case #1: large query load, primary-key (ID) queries. Data load: 1 year's worth of job data.
We look at the throughput of a random query of 10k items over the past year vs. the number of concurrent clients; the numbers are seconds to query completion.
Using iostat and Ganglia, we observe disk-bound I/O behavior and corresponding scaling in Cassandra. The disks are close to saturation at about 10 concurrent clients, and client network bandwidth also becomes a limiting factor.
(Table: seconds to query completion for Cassandra and Oracle vs. number of concurrent clients; Oracle: 500 s.)
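The kind of client used for this test can be sketched as follows: each worker thread issues 10k random primary-key lookups against a year of data, and the elapsed time per worker is recorded. The ID range, column-family name and pycassa client are illustrative assumptions.

```python
# Hedged sketch of a multi-client random primary-key query benchmark.
import random
import threading
import time
import pycassa

ID_MIN, ID_MAX = 1100000000, 1250000000   # hypothetical PanDA ID range for one year
QUERIES_PER_CLIENT = 10000

def worker(results, idx):
    pool = pycassa.ConnectionPool('PandaArchive', server_list=['cassandra-node:9160'])
    jobs_cf = pycassa.ColumnFamily(pool, 'Jobs')
    start = time.time()
    for _ in range(QUERIES_PER_CLIENT):
        key = str(random.randint(ID_MIN, ID_MAX))
        try:
            jobs_cf.get(key, column_count=1)
        except pycassa.NotFoundException:
            pass  # a random ID may fall into a gap between job batches
    results[idx] = time.time() - start

def run(n_clients):
    results = [None] * n_clients
    threads = [threading.Thread(target=worker, args=(results, i))
               for i in range(n_clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print('%d clients, seconds per client: %s' % (n_clients, results))

run(10)
```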

16 Testing with the BNL cluster: phase 2, SSD
Below we list the time to query completion and the disk utilization percentages as reported by iostat. As in the previous case, each client runs 10k random job queries against a year's worth of data.
We weren't able to stress the disk I/O further because of network constraints (work in progress).
Cassandra: 135 s (2% disk utilization), 150 s (15%), 220 s (27%), for increasing numbers of concurrent clients; Oracle: 500 s.

17 Conclusions and plans
With high-spec machines and SSDs we aren't quite scaling horizontally, rather diagonally; however, the cost/benefit of either approach is still to be evaluated, since maintenance and administration of a few dozen low-spec machines doesn't come for free either.
We observe that we have plenty of headroom with the current setup while handling a load on the order of 1500 queries per second, which is higher than the current load on the ATLAS Oracle server used for similar purposes.
At this point, we are satisfied with the characteristics of the Cassandra cluster built for the purpose of serving Panda job archive data, and our focus will be on building a functional, high-performance data store for the Panda Monitor and other related systems. We will serve JSON to clients, most likely using Django.
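A minimal sketch of what the planned JSON-over-Django front end could look like: a view that fetches one job record from Cassandra and returns it as JSON. The URL parameter, column-family name, CSV field order and the use of pycassa are assumptions; this is not the actual Panda Monitor code.

```python
# Hypothetical Django view serving one archived job record as JSON.
import json

import pycassa
from django.http import Http404, HttpResponse

FIELDS = ['PANDAID', 'STATUS', 'CLOUD', 'USER', 'CPUTIME']   # illustrative CSV schema

pool = pycassa.ConnectionPool('PandaArchive', server_list=['cassandra-node:9160'])
jobs_cf = pycassa.ColumnFamily(pool, 'Jobs')

def job_detail(request, pandaid):
    try:
        row = jobs_cf.get(str(pandaid))
    except pycassa.NotFoundException:
        raise Http404('unknown PanDA job id')
    # Unpack the CSV payload back into named attributes for the client.
    record = dict(zip(FIELDS, row['data'].split(',')))
    return HttpResponse(json.dumps(record), content_type='application/json')
```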