Diskpool and cloud storage benchmarks used in IT-DSS

Diskpool and cloud storage benchmarks used in IT-DSS
Geoffray ADDE

Outline
I- A rational approach to storage system evaluation: hypotheses, metrics and experiments; some examples
II- Introduction to the benchmarking framework: why? what? where? who? how?

I- A Rational Approach
A storage system:
▪ Architecture (features, hardware, software)
▪ Physical resources (nodes, CPU, RAM, HD, SSD, network, ...)
▪ User interface (POSIX, GridFTP, S3, ...)
A question:
▪ What is the aggregated capacity of the system for a given operation? How does it scale?
▪ What is the impact of a given setting of the system on a given operation?
▪ How does the system behave in degraded mode? During rolling upgrades?
An experiment:
▪ A workload: a set of clients (number, resources) and a set of tasks (read/write, small/big files, random/patterned/sequential access, combinations)
▪ Selected metrics: host-specific metrics (CPU, memory, IO, network) and client-specific metrics (request processing time, request throughput, ...)
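
To make the client/task/metric decomposition concrete, here is a minimal sketch of one benchmark client, assuming the task is handed in as a callable; the names (Task, run_client) are illustrative and not part of the framework described in part II.

    #include <chrono>
    #include <functional>
    #include <vector>

    // One benchmark client: run a task repeatedly and record the
    // per-request processing time. `Task` stands for any operation
    // against the storage system (read/write, small/big file, ...).
    using Task = std::function<void()>;

    std::vector<double> run_client(const Task& task, int n_requests) {
        std::vector<double> latencies_s;
        latencies_s.reserve(n_requests);
        for (int i = 0; i < n_requests; ++i) {
            auto t0 = std::chrono::steady_clock::now();
            task();  // one request against the storage system
            std::chrono::duration<double> dt =
                std::chrono::steady_clock::now() - t0;
            latencies_s.push_back(dt.count());
        }
        return latencies_s;  // aggregate into distributions/throughput
    }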

I- Metadata Read Performance
Storage system: 7 head nodes, 2x10Gb fiber network, 400 storage nodes, 700TB; no encryption, 1 session per operation.
Question: How does the metadata read performance scale?
Experiment: 20 boxes (1Gb network, 24 cores, 48GB memory), up to 20 processes per box. One repeated task: read an entire, randomly chosen 4kB file. Host-specific metrics (CPU, memory, IO, network) to check for any client-side bound; counting the number of completed requests.
Results: Scaling is almost linear; saturation of the metadata subsystem is not reached.
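
A minimal sketch of the repeated task, assuming a POSIX view of the storage system and a pre-generated list of 4kB file paths (both assumptions made for illustration): because the files are tiny, each iteration is dominated by metadata operations (lookup/open), which is what the experiment measures.

    #include <cstdio>
    #include <cstdlib>
    #include <string>
    #include <vector>

    // One iteration: pick a random 4kB file and read it whole.
    bool read_random_small_file(const std::vector<std::string>& paths) {
        const std::string& p = paths[std::rand() % paths.size()];
        FILE* f = std::fopen(p.c_str(), "rb");
        if (!f) return false;
        char buf[4096];
        size_t n = std::fread(buf, 1, sizeof(buf), f);
        std::fclose(f);
        return n > 0;  // count this as one completed request
    }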

I- Read Throughput
Storage system: 7 head nodes, 2x10Gb fiber network, 400 storage nodes, 700TB; no encryption, 1 session per operation.
Question: How does the read throughput scale?
Experiment: 20 boxes (1Gb network, 24 cores, 48GB memory), up to 20 processes per box. One repeated task: read an entire, randomly chosen 100MB file. Host-specific metrics (CPU, memory, IO, network) to check for any client-side bound; number of completed requests, run time, network.
Results: Scales linearly up to 80% of the maximum bandwidth; beyond that, growth slows down progressively and the maximum is reached at 240 processes.
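
Since each completed request transfers one whole file, the aggregated throughput follows directly from the client-side counters; a trivial sketch, with made-up example numbers in the comment:

    // Aggregated read throughput over all clients: each completed
    // request moved one whole file (here 100MB).
    double aggregated_throughput_MBps(long long completed_requests,
                                      double file_size_MB,
                                      double run_time_s) {
        return completed_requests * file_size_MB / run_time_s;
    }
    // e.g. 4800 completed 100MB reads in 200s -> 2400 MB/s aggregated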

I- S3 plug-in for ROOT
Storage system: 7 head nodes, 2x10Gb fiber network, 400 storage nodes, 700TB; no encryption, 1 session per operation.
Question: How fast is the ROOT S3 plugin compared to other storage plugins?
Experiment: 20 boxes (1Gb network, 24 cores, 48GB memory, up to 20 processes per box), otherwise idle. One real ATLAS ROOT file: 793MB on disk, 2.11GB after decompression, 12K entries, 6K branches, cache size = 30MB. One client reads the entries sequentially in the "physics" tree. Metrics: time to process the task; client CPU to check that the client is not CPU-bound.
Results: Negligible performance gap compared to EOS.
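
For reference, reading through the ROOT S3 plugin looks the same as through any other ROOT protocol handler; a minimal macro sketch, where the endpoint and file name are placeholders and only the tree name "physics" comes from the slide:

    #include "TFile.h"
    #include "TTree.h"

    // Read all entries of a tree through ROOT's S3 plugin. Credentials
    // are typically picked up from the environment (e.g. access/secret
    // key variables), not hard-coded.
    void read_s3() {
        TFile* f = TFile::Open("s3://s3.example.org/bucket/atlas_file.root");
        if (!f || f->IsZombie()) return;
        TTree* t = nullptr;
        f->GetObject("physics", t);
        if (t) {
            Long64_t n = t->GetEntries();
            for (Long64_t i = 0; i < n; ++i)
                t->GetEntry(i);  // sequential read, uses the tree cache
        }
        f->Close();
    }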

I- About metrics...
Evaluation of a storage file system: measuring a global quality of service, looking at aggregated distributions.
Host metrics, process-specific and machine-specific:
▪ CPU (user / kernel / wait)
▪ RAM (mainly remaining RAM)
▪ Network (transmit / receive, machine-specific only)
▪ IO (local storage)
Client metrics:
▪ Request response time
▪ Request throughput
▪ Dedicated user-defined metrics
Server side:
▪ Dedicated server-side metrics
▪ Network metrics
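
On Linux, a low-overhead collector for the host CPU counters can simply sample /proc/stat; a minimal sketch (field layout per proc(5)), where deltas between two samples give the user / kernel / wait shares listed above:

    #include <cstdio>

    // Sample the aggregate CPU counters from /proc/stat (Linux).
    // Values are cumulative clock ticks since boot.
    bool sample_cpu(unsigned long long& user, unsigned long long& kernel,
                    unsigned long long& iowait) {
        FILE* f = std::fopen("/proc/stat", "r");
        if (!f) return false;
        unsigned long long nice, idle;
        int n = std::fscanf(f, "cpu %llu %llu %llu %llu %llu",
                            &user, &nice, &kernel, &idle, &iowait);
        std::fclose(f);
        return n == 5;
    }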

II- A tool for the evaluation of storage systems
Context: metering the raw performance, evaluating the scalability(ies), understanding the bottlenecks.
Need:
▪ Simulate the requests from a pool of clients (1~10000)
▪ Collect metrics from these clients, if possible in real time
▪ Do not interfere with the evaluation (low CPU/network/disk overhead)
▪ Ease of use (call, writing a client, network, IO)
Proposed solution: a C++ client/server framework, based on SSH (Kerberos), using ROOT as the way of compacting, transmitting and presenting the data.
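
The SSH-based design can be pictured as the master starting one remote client per box and streaming its output back through the tunnel; a heavily simplified sketch of that idea, with the host and command as placeholders (uses POSIX popen):

    #include <cstdio>
    #include <string>

    // Spawn one remote benchmark client over SSH and forward its
    // (timestamped) metric lines to the master. A real deployment
    // multiplexes many such channels and manages their lifetimes.
    void run_remote_client(const std::string& host,
                           const std::string& client_cmd) {
        std::string cmd = "ssh " + host + " '" + client_cmd + "'";
        FILE* pipe = popen(cmd.c_str(), "r");
        if (!pipe) return;
        char line[4096];
        while (std::fgets(line, sizeof(line), pipe))
            std::fputs(line, stdout);  // hand metrics to the master
        pclose(pipe);
    }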

II- Framework Overview
[Diagram: the Master, running on the master box, uses SSH to start a Spawner on each of Box 1..Box n; each Spawner runs Slave Clients that exercise the Storage System while reporting disk, CPU and network metrics back over the SSH channel.]

II- Main Features
Architecture:
▪ SSH tunneling avoids any firewall issue (at the cost of some ciphering overhead)
▪ Timestamped data
▪ Built-in CPU, network and disk IO load reporting (configurable)
Master side:
▪ Clean management of remote processes (signals 2, 15 and 9)
▪ Real-time reporting
▪ Built-in graphs for histograms (1D and 2D) and plots
▪ Command-line interface
▪ Some debug facilities
Remote side:
▪ One class, one instance, few methods
▪ Can report histograms or raw data: instant, elapsed, rate (./sec)
▪ A set of thread-safe methods and a SubId identifier to cope with multithreaded clients
▪ A Python wrapper is available
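
The slide gives no API details beyond "one class, one instance, few methods", so the stand-in below is entirely invented (Reporter, reportInstant, reportRate are not the real names); it only illustrates how a multithreaded client could report per-request metrics with a SubId:

    #include <cstdio>

    // Illustrative stand-in for the remote-side reporting class.
    class Reporter {
    public:
        static Reporter& instance() {  // the single instance
            static Reporter r;
            return r;
        }
        // Thread-safe in the real framework; the SubId distinguishes
        // the threads of one multithreaded client.
        void reportInstant(const char* metric, double v, int subId = 0) {
            std::printf("%s[%d] instant %f\n", metric, subId, v);
        }
        void reportRate(const char* metric, double v, int subId = 0) {
            std::printf("%s[%d] rate %f/sec\n", metric, subId, v);
        }
    };

    void on_request_done(double seconds, int threadId) {
        Reporter::instance().reportInstant("req_time_s", seconds, threadId);
        Reporter::instance().reportRate("req_done", 1.0, threadId);
    }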

II- Status
Technical facts: requires ROOT and Boost, built with CMake, about 4600 lines of code.
Level of maturity: successfully tested on Ubuntu x86 and x86_64 and on SLC 6.x; daily runs on SLC 6.x on 5 to 20 boxes at a 1Hz reporting rate.
Concrete usage: evaluation of the Huawei S3 implementation (stress tests and ROOT); evaluation of OpenStack.
Perspectives: a web platform to collect measurements from many storage systems and build up a benchmark.