
1 R. Voicu¹, I. Legrand¹, H. Newman¹, C. Grigoras² — ¹California Institute of Technology, ²CERN. CHEP 2010, Taipei, October 21st, 2010. End to End Storage Performance Measurements

2 Introduction
 Understanding the end-to-end performance of storage systems when moving large amounts of data over the wide area network is critical for data management planning and very useful for debugging such complex and heterogeneous systems.
 In general, each storage system has a set of data servers. The number of data servers varies from a few to hundreds and depends strongly on the type of storage technology used by a site.
 The entire system is highly heterogeneous in terms of network, storage and transfer applications.
 End-to-end monitoring should bring a common view and help in debugging and understanding data transfer performance in a transparent manner.

3 Data moving patterns — diagram: data flows from the storage cluster at Site A through data transfer gateway(s), across the network infrastructure (LAN + WAN), to the data transfer gateway(s) and storage cluster at Site B.

4 Data moving patterns (2) — diagram: data flows between the storage clusters at Site A and Site B directly over the network infrastructure (LAN + WAN), without dedicated transfer gateways.

5 Monitoring metrics — diagram of the transfer path (storage cluster, data transfer gateway(s), network infrastructure LAN + WAN, Site B) annotated with the monitored metrics: application monitoring (CPU usage, memory, disk-to-network queue monitoring); full hardware & OS monitoring (CPU usage, memory, load, Ethernet traffic, disk I/O load, etc.); network monitoring (interface traffic, error counters, NetFlow/sFlow).

6 Motivation
 The MonALISA monitoring system includes:
 Local host monitoring (CPU, memory, network traffic, disk I/O, processes and sockets in each state, LM sensors, APC UPSs), log file tailing
 SNMP generic & specific modules
 Condor, PBS, LSF and SGE (accounting & host monitoring), Ganglia
 Ping, tracepath, traceroute, pathload and other network-related measurements
 TL1, network devices, Ciena, optical switches
 Calling external applications/scripts that return the monitored values as output
 XDR-formatted UDP messages (such as ApMon); a minimal sketch of this style of reporting follows
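For illustration only, a minimal sketch of how an application could push a metric to a listening MonALISA service over UDP. It does not reproduce ApMon's actual XDR encoding, and the host name, port, and metric names are placeholders; a real deployment would use the ApMon client library.

import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

// Conceptual sketch only: a real sender would use the ApMon client library,
// which packs parameters into XDR-formatted UDP datagrams understood by MonALISA.
public class MetricSender {
    public static void main(String[] args) throws Exception {
        String payload = "cluster=TransferGateways node=gw01 param=disk_read_MBps value=450.2";
        byte[] data = payload.getBytes(StandardCharsets.UTF_8);
        try (DatagramSocket socket = new DatagramSocket()) {
            // "monalisa.example.org" and port 8884 are placeholders for the
            // address of the MonALISA service configured to receive such packets.
            DatagramPacket packet = new DatagramPacket(
                    data, data.length, InetAddress.getByName("monalisa.example.org"), 8884);
            socket.send(packet);
        }
    }
}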

7 Motivation (cont.) — diagram of the monitoring architecture in ALICE: ApMon-instrumented AliEn services (CE, SE, TQ, IS, Cluster Monitor, Job Agents, Optimizers, Brokers, MySQL servers, CastorGrid scripts, API services) report to MonALISA services at each site and at CERN, alongside LCG tools at LCG sites; the aggregated data go into the MonALISA repository and long-history DB. Monitored parameters include rss, vsz, CPU time, run time, job slots, free space, number of files, open files, queued JobAgents, CPU kSI2k, job status, disk used, processes, load, network in/out, job and socket status, migrated MBytes, active sessions and MyProxy status. The repository also raises alerts, takes actions and orchestrates the data transfer tool.

8 FDT – Fast Data Transfer
 FDT is an open source application, developed at Caltech, for efficient data transfers
 Easy to use: syntax similar to SCP, iperf/netperf
 Written in Java and runs on all major platforms
 Single .jar file (~800 KB)
 Based on an asynchronous, multithreaded system
 Uses the New I/O (NIO) interface (a minimal NIO sketch follows after this list) and is able to:
 continuously stream a list of files
 use independent threads to read and write on each physical device
 transfer data in parallel on multiple TCP streams, when necessary
 use appropriate buffer sizes for disk I/O and networking
 resume a file transfer session
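A minimal sketch of the NIO technique the slide refers to: reading a file through a FileChannel and pushing it to a SocketChannel with transferTo(), which lets the kernel move the data without copying it through user-space buffers. The host, port and file name are placeholders, and this is not FDT's actual implementation; FDT adds buffer pools, multiple parallel streams and per-device threads on top of this idea.

import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Sketch of a single-stream NIO file-to-socket transfer.
public class NioSend {
    public static void main(String[] args) throws IOException {
        Path file = Paths.get("/data/sample.dat"); // placeholder file
        try (FileChannel in = FileChannel.open(file, StandardOpenOption.READ);
             SocketChannel out = SocketChannel.open(
                     new InetSocketAddress("receiver.example.org", 54321))) { // placeholder endpoint
            long pos = 0, size = in.size();
            while (pos < size) {
                // transferTo() may move fewer bytes than requested, so loop until done.
                pos += in.transferTo(pos, size - pos, out);
            }
        }
    }
}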

9 FDT – Architecture: diagram of the data path — on the sending side, independent threads per physical device read files into a pool of buffers, and the data is pushed through kernel space over the data transfer sockets/channels; on the receiving side a matching pool of buffers is used to restore the files; a separate control connection handles authorization. (A conceptual buffer-pool sketch follows.)
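A conceptual sketch of the buffer-pool pattern in the diagram: a fixed set of direct ByteBuffers shared through blocking queues, with reader threads filling buffers from disk and a network writer draining and recycling them. Class and parameter names are illustrative, not FDT's actual classes.

import java.nio.ByteBuffer;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Illustrative buffer pool: readers take free buffers, fill them from a device,
// and hand them to the network writer; the writer returns drained buffers to the pool.
public class BufferPool {
    private final BlockingQueue<ByteBuffer> free;
    private final BlockingQueue<ByteBuffer> filled;

    public BufferPool(int buffers, int bufferSize) {
        free = new ArrayBlockingQueue<>(buffers);
        filled = new ArrayBlockingQueue<>(buffers);
        for (int i = 0; i < buffers; i++) {
            free.add(ByteBuffer.allocateDirect(bufferSize)); // direct buffers avoid an extra copy
        }
    }

    public ByteBuffer takeFree() throws InterruptedException { return free.take(); }
    public void submitFilled(ByteBuffer b) throws InterruptedException { filled.put(b); }
    public ByteBuffer takeFilled() throws InterruptedException { return filled.take(); }
    public void recycle(ByteBuffer b) throws InterruptedException { b.clear(); free.put(b); }
}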

10 FDT features
 User-defined loadable modules for pre- and post-processing, providing support for dedicated mass storage systems, compression, dynamic circuit setup, …
 Pluggable file system “providers” (e.g. non-POSIX FS)
 Dynamic bandwidth capping (can be controlled by MonALISA)
 Different transport strategies:
 blocking (1 thread per channel)
 non-blocking (selector + pool of threads)
 On-the-fly MD5 checksum on the reader side (see the sketch below)
 Configurable number of streams and threads per physical device (useful for distributed FS)
 Automatic updates
 Can be used as a network testing tool (/dev/zero → /dev/null memory transfers, or the –nettest flag)
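A minimal sketch of the on-the-fly checksum idea: the MD5 digest is updated with each block as the reader streams it, so the checksum is available as soon as the last block has been read, with no separate pass over the data. The file name and block size are placeholders; this is not FDT's code.

import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Sketch: fold each block into the digest as it is read from disk.
public class StreamingMd5 {
    public static void main(String[] args) throws IOException, NoSuchAlgorithmException {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        byte[] block = new byte[1 << 20]; // 1 MB read blocks; size is illustrative
        try (InputStream in = Files.newInputStream(Paths.get("/data/sample.dat"))) {
            int n;
            while ((n = in.read(block)) != -1) {
                md5.update(block, 0, n); // here the block would also be handed to the network writer
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md5.digest()) hex.append(String.format("%02x", b));
        System.out.println("md5: " + hex);
    }
}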

11 FDT Throughput tests – 1 Stream

12 Active End to End Available Bandwidth between all the ALICE grid sites

13 Active End to End Available Bandwidth between all the ALICE grid sites (2) — plot annotations contrast hosts with 1 Gbps network cards, newer kernels and tuned TCP buffers against hosts with 100 Mbps network cards, default kernels and default TCP buffers; different trends in the curves correspond to different kernels.
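The "tuned TCP buffers" annotation refers to enlarging socket buffers so that a single stream can fill a high bandwidth-delay-product path. A minimal sketch of how an application can request larger buffers follows; the 8 MB value and the endpoint are illustrative, and the kernel still caps the granted sizes at its own limits (e.g. net.core.rmem_max / wmem_max on Linux).

import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.StandardSocketOptions;
import java.nio.channels.SocketChannel;

// Sketch: request large send/receive buffers before connecting, then read back
// what the kernel actually granted.
public class TunedSocket {
    public static void main(String[] args) throws IOException {
        int eightMB = 8 * 1024 * 1024; // illustrative; sized with the bandwidth-delay product in mind
        try (SocketChannel ch = SocketChannel.open()) {
            ch.setOption(StandardSocketOptions.SO_SNDBUF, eightMB);
            ch.setOption(StandardSocketOptions.SO_RCVBUF, eightMB);
            ch.connect(new InetSocketAddress("receiver.example.org", 54321)); // placeholder endpoint
            System.out.println("sndbuf=" + ch.getOption(StandardSocketOptions.SO_SNDBUF)
                             + " rcvbuf=" + ch.getOption(StandardSocketOptions.SO_RCVBUF));
        }
    }
}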

14 CPU and Disk I/O performance metrics

15 CPU and Disk I/O performance metrics — plot for a system with 8 fast SAS disks in RAID6 behind a 512 MB RAID controller: clear correlation between the metrics, the expected behavior.

16 CPU and Disk I/O performance metrics — same I/O hardware (8 fast SAS disks, RAID6, 512 MB RAID controller), yet R/W speed is below 8 MBytes/s while CPU idle stays above 85%, CPU iowait is around 10-11% and I/O utilization is at 100%.
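A minimal sketch, assuming a Linux host and a placeholder device name such as sda, of how an I/O utilization figure like the 100% above can be derived: sample the "milliseconds spent doing I/O" counter from /proc/diskstats twice and divide the increase by the elapsed wall time, which is how tools such as iostat report %util.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Sketch: estimate disk utilization from /proc/diskstats (Linux only).
public class DiskUtil {
    static long ioTicksMs(String device) throws IOException {
        for (String line : Files.readAllLines(Paths.get("/proc/diskstats"))) {
            String[] f = line.trim().split("\\s+");
            if (f.length > 12 && f[2].equals(device)) {
                // f[12]: milliseconds the device was busy doing I/O
                // (10th statistics column after major, minor and the device name)
                return Long.parseLong(f[12]);
            }
        }
        throw new IllegalArgumentException("device not found: " + device);
    }

    public static void main(String[] args) throws Exception {
        String device = "sda";                 // placeholder device name
        long busy0 = ioTicksMs(device), t0 = System.currentTimeMillis();
        Thread.sleep(5000);                    // sampling interval
        long busy1 = ioTicksMs(device), t1 = System.currentTimeMillis();
        double util = 100.0 * (busy1 - busy0) / (t1 - t0);
        System.out.printf("%s utilization over %d ms: %.1f%%%n", device, t1 - t0, util);
    }
}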

17 Current status
 MonALISA provides a wide set of monitoring modules
 Full host system monitoring: CPU usage, system load, disk I/O and load, memory, swap, etc.
 Network infrastructure monitoring and topology
 Application monitoring (via ApMon)
 Local and global triggers and alarms
 Powerful correlation framework; a few lines of configuration are enough to send an alert (email, SMS, IM, etc.) or take an action (e.g. restarting a service) — the idea is illustrated in the sketch below
 The security and controlling infrastructure is built in
 Orchestrated end-to-end tests (bandwidth and disk) are scheduled using secure channels
 End-to-end network bandwidth tests (memory to memory)
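Purely to illustrate the trigger idea, a conceptual Java sketch of a threshold-based check that fires an action when the disk stays busy while throughput is low (the pathological pattern on slide 16). In MonALISA itself such a rule is expressed in a few lines of its configuration files, not in user code; the class name, thresholds and the placeholder action here are all hypothetical.

// Conceptual illustration of a threshold trigger; not MonALISA code.
public class IoStallTrigger {
    // Thresholds are illustrative: disk ~100% busy but under 8 MB/s of aggregate R/W traffic.
    private static final double UTIL_THRESHOLD_PCT = 99.0;
    private static final double RW_THRESHOLD_MBPS = 8.0;

    public boolean shouldAlert(double diskUtilPct, double rwMBps) {
        return diskUtilPct >= UTIL_THRESHOLD_PCT && rwMBps < RW_THRESHOLD_MBPS;
    }

    public void evaluate(double diskUtilPct, double rwMBps) {
        if (shouldAlert(diskUtilPct, rwMBps)) {
            // Placeholder action: a real deployment would send e-mail/SMS/IM or restart a service.
            System.err.printf("ALERT: disk busy %.0f%% but only %.1f MB/s of R/W traffic%n",
                    diskUtilPct, rwMBps);
        }
    }
}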

18 Future plans
 Add per-process/task disk I/O statistics
 Add local disk performance tests (only when the disk is idle) to obtain a baseline
 Extend the end-to-end tests to the entire chain:
 Site A disk => network => Site B disk
 Check whether the network and disk-to-disk baselines match
 Raise alarms in case of problems:
 disk I/O utilization to R/W I/O ratio below a certain threshold
 current performance below the established baseline
 Integrate with the network topology measurements (already used to choose the best SE, based on RTT)

19 Q&A http://monalisa.caltech.edu http://alimonitor.cern.ch http://monalisa.cern.ch/FDT

