1 Distributed applications monitoring at system and network level
A. Brunengo (INFN-Ge), A. Ghiselli (INFN-CNAF), L. Luminari (INFN-Roma1), L. Perini (INFN-Mi), S. Resconi (INFN-Mi), M. Sgaravatto (INFN-Pd), C. Vistoli (INFN-CNAF), for the MONARC Collaboration

2 Objectives (7 Feb. 2000, CHEP 2000, C. Vistoli, INFN/CNAF)
Analyse measurements of resource utilization in a distributed environment:
– CPU usage,
– network throughput,
– wall clock time of a single job,
in order to locate
– system and network bottlenecks,
– software and hardware inefficiencies
in different scenarios, and to understand
– client behaviour and system resource requirements,
– the impact of the network on application behaviour, and application efficiency when using the network.

3 Test description
ATLFast++ stress tests: an increasing number of concurrent jobs with read access to the database.
– A single job reads ~3000 events of 40 KB each.
– Single Objectivity federation.
System configuration:
– single workstation (without AMS server);
– one AMS server, one client machine;
– one AMS server, many client machines.
Network configuration:
– LAN (Gigabit Ethernet, Fast Ethernet, Ethernet);
– WAN with different bandwidth capacities (2 Mbps to 8 Mbps);
– QoS/Differentiated Services on a 2 Mbps WAN link.
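To put the job size in perspective, the sketch below multiplies out the per-job data volume and the smallest possible time to move it over each link speed used in the tests. It assumes 1 KB = 1000 bytes and that the link is the only limit (no protocol overhead, CPU bottleneck, or competing traffic); the function names are illustrative only.

```python
# Sketch: data volume of one ATLFast++ job and a lower bound on its transfer time.
# Assumptions (not stated on the slide): 1 KB = 1000 bytes, and the link rate is the
# only limit, i.e. no protocol overhead, no CPU bottleneck, no competing traffic.

EVENTS_PER_JOB = 3000      # "~3000 events" per job (from the slide)
EVENT_SIZE_BYTES = 40_000  # "40 KB each", taken here as 40,000 bytes


def job_volume_bytes(events: int = EVENTS_PER_JOB,
                     event_size: int = EVENT_SIZE_BYTES) -> int:
    """Total bytes a single job reads from the federation."""
    return events * event_size


def min_transfer_seconds(link_mbps: float, jobs: int = 1) -> float:
    """Lower bound on the time to ship `jobs` jobs' data over a `link_mbps` Mbit/s link."""
    total_bits = job_volume_bytes() * 8 * jobs
    return total_bits / (link_mbps * 1_000_000)


if __name__ == "__main__":
    print(f"one job reads about {job_volume_bytes() / 1e6:.0f} MB")
    for mbps in (2, 8, 10, 100, 1000):
        print(f"{mbps:>5} Mbit/s: at least {min_transfer_seconds(mbps):7.1f} s per job")
```

Under these assumptions a job moves about 120 MB, so on the 2 Mbps WAN link it needs at least 480 s of pure transfer time, and concurrent jobs have to share that link.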

4 Application monitoring
System parameters collected:
– client side: CPU use (user and system), job wall clock time;
– server side: CPU use (user and system), network throughput.
Client CPU use is needed to evaluate machine load versus the number of concurrent jobs at different link speeds.
Server CPU use is needed to evaluate the maximum number of client jobs that can be served, and whether this depends on client characteristics and network link capacity.
Job wall clock execution time is needed to evaluate the system's capacity to deliver workload as a function of the number of jobs and network speed.
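A minimal sketch of how the user and system CPU figures could be extracted from captured `vmstat` output. It assumes a Solaris-style layout in which the last three columns are `us`, `sy`, `id`; the sample text is hypothetical and the slides do not say how the output was actually post-processed.

```python
# Sketch: average user/system CPU from captured 'vmstat' output.
# Assumptions: Solaris-style layout where the last three columns are us, sy, id;
# the sample text below is hypothetical, not data from these tests.
from statistics import mean
from typing import Dict, Iterable, List, Optional, Tuple


def parse_cpu_columns(line: str) -> Optional[Tuple[int, int, int]]:
    """Return (user, system, idle) percentages from one data line, or None for headers."""
    fields = line.split()
    if not fields or not all(f.isdigit() for f in fields):
        return None  # header, blank, or malformed line
    user, system, idle = (int(f) for f in fields[-3:])
    return user, system, idle


def summarize_cpu(lines: Iterable[str]) -> Dict[str, float]:
    samples: List[Tuple[int, int, int]] = [
        s for s in map(parse_cpu_columns, lines) if s is not None
    ]
    samples = samples[1:]  # vmstat's first report averages since boot; skip it
    if not samples:
        raise ValueError("no interval samples found")
    return {
        "user": mean(u for u, _, _ in samples),
        "system": mean(s for _, s, _ in samples),
        "busy": mean(u + s for u, s, _ in samples),
    }


SAMPLE = """\
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr s0 s1 s2 s3   in   sy   cs us sy id
 0 0 0 512000 40000  0   2  0  0  0  0  0  0  0  0  0  300  150   90  3  1 96
 2 0 0 498000 38000  0   5  0  0  0  0  0  1  0  0  0  420  900  300 78 20  2
 3 0 0 497000 37500  0   4  0  0  0  0  0  1  0  0  0  430  950  310 80 20  0
"""

if __name__ == "__main__":
    print(summarize_cpu(SAMPLE.splitlines()))
```

Columns are taken by position rather than by header name because the Solaris header contains `sy` twice (system calls under `faults`, system CPU under `cpu`).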

5 Application monitoring
– CPU usage (client and server) is collected by running 'vmstat' periodically.
– The application itself records elapsed time and CPU time.
– Aggregate server throughput is collected by tracing the system calls of the AMS server process: every 2 minutes a script computes and sums the number of bytes read/sent and the number of bytes written/received.
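The 2-minute byte count could look roughly like the sketch below. It assumes trace lines shaped like Solaris `truss` output (`[pid:] name(args) = result`), with failed calls reported without an `= N` result; the read+send and write+recv grouping simply follows the slide's wording. The sample lines and function names are hypothetical, not the authors' script.

```python
# Sketch: per-window byte totals from system-call trace lines of the AMS server.
# Assumptions: lines look roughly like Solaris truss output, "[pid:] name(args) = result";
# the grouping read+send and write+recv mirrors the slide's wording; sample lines are made up.
import re
from collections import Counter
from typing import Dict, Iterable

WINDOW_SECONDS = 120  # the slide's 2-minute interval

# Optional "pid:" or "pid/lwp:" prefix, call name, argument list, "= decimal result".
CALL_RE = re.compile(r"^(?:\d+(?:/\d+)?:\s*)?(\w+)\(.*\)\s*=\s*(\d+)\s*$")


def bytes_per_call(lines: Iterable[str]) -> Counter:
    """Sum the decimal return values (bytes transferred) per system-call name."""
    totals: Counter = Counter()
    for line in lines:
        match = CALL_RE.match(line.strip())
        if match:  # failed calls carry no "= N" result and are skipped
            totals[match.group(1)] += int(match.group(2))
    return totals


def window_summary(lines: Iterable[str], window_s: float = WINDOW_SECONDS) -> Dict[str, float]:
    totals = bytes_per_call(lines)
    read_send = totals["read"] + totals["send"]
    write_recv = totals["write"] + totals["recv"]
    return {
        "read_send_bytes": read_send,
        "write_recv_bytes": write_recv,
        "mbit_per_s": (read_send + write_recv) * 8 / window_s / 1e6,
    }


SAMPLE = [
    '1234:  read(7, "...", 8192)                 = 8192',
    "1234:  send(9, 0x0002A000, 8192, 0)         = 8192",
    "1234:  recv(9, 0x0002C000, 512, 0)          = 64",
    '1234:  write(3, "...", 100)                 = 100',
    "1234:  read(7, 0x0002A000, 8192)            Err#11 EAGAIN",
]

if __name__ == "__main__":
    print(window_summary(SAMPLE))
```

Dividing the window total by the window length gives the average aggregate throughput for that interval; short bursts inside the 2 minutes are not visible at this granularity.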

6 Test results: one client – one server – Gigabit Ethernet
Server sunlab1 and client gsun, connected via 1000BaseSX.
sunlab1, gsun: Sun Ultra5, 333 MHz, 128 MB RAM, Solaris 2.7 (14 SPECint95)

11 Comments:
– Client CPU: 100% used with 5 jobs.
– Server CPU: 100% used with 50 jobs.
– After 40 concurrent jobs, both client CPU use and network throughput decrease.

12 One client – one server – Fast Ethernet (Rome BaBar farm)

17 Comments:
– Client CPU: 60% used up to 30 jobs, then 20%.
– Server CPU: 100% used with 5 jobs (the multi-processor server is used as a single processor).
– After 30 concurrent jobs, job wall clock execution time increases rapidly.
– After 30 concurrent jobs, both client CPU use and network throughput decrease.

18 One server (Bologna) – one client (CERN): QoS/Differentiated Services on a 2 Mbps WAN ATM link
Server: sunlab1, Sun Ultra5, 333 MHz, 128 MB, Solaris 2.7
Client: monarc01, Sun Enterprise 450, 4 × 400 MHz, 512 MB, Solaris 2.6

19 4717 cells/s × 48 payload bytes/cell × 8 bit/byte ≈ 1811 kbit/s.
– The available bandwidth is completely used.
– Server and client CPUs are unloaded.
– Job elapsed time is very high.
– The Differentiated Services mechanism works properly (with precise tuning of network parameters).
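The cell arithmetic above can be checked directly, and adding the 5-byte ATM cell header shows why the link counts as full even though the payload rate is below 2 Mbit/s. This sketch ignores further overhead (AAL5 trailer, IP/TCP headers), so actual application throughput is lower still.

```python
# Sketch: the slide's ATM arithmetic, also showing the cell-level (line) rate.
# Standard ATM cells are 53 bytes: a 5-byte header plus 48 bytes of payload.
# Further overhead (AAL5 trailer, IP/TCP headers) is ignored here, so real
# application throughput is lower than the payload figure.
from typing import Tuple

ATM_CELL_BYTES = 53
ATM_PAYLOAD_BYTES = 48


def atm_rates_kbps(cells_per_s: float) -> Tuple[float, float]:
    """Return (payload rate, cell rate) in kbit/s for a given cell rate."""
    payload = cells_per_s * ATM_PAYLOAD_BYTES * 8 / 1000
    cells = cells_per_s * ATM_CELL_BYTES * 8 / 1000
    return payload, cells


if __name__ == "__main__":
    payload, cells = atm_rates_kbps(4717)
    print(f"payload {payload:.1f} kbit/s, cells {cells:.1f} kbit/s, "
          f"payload share of 2 Mbit/s: {payload / 2000:.1%}")
```

At 4717 cells/s the cell stream itself is about 2000 kbit/s, i.e. the whole 2 Mbps virtual circuit, while the payload is about 90.6% of it.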

20 WAN test
Hardware:
– sunlab1, gsun, cmssun4, atlsun1, atlas4: Sun Ultra5/10, 333 MHz, 128 MB;
– vlsi06: Sun SPARC20, 125 MHz, 128 MB;
– monarc01: Sun Enterprise 450, 4 × 400 MHz, 512 MB.
Layout (from the diagram): server sunlab1; links 1000BaseT, 2 Mbps and 8 Mbps; clients cmssun4, vlsi06, atlsun1, atlas4, monarc01.

21 Comments:
– Client CPU: never 100% used.
– Server CPU: never 100% used.
– Wall clock time for jobs running on the workstation connected via GE is very high (e.g. >1000’ for 10 jobs, versus 400’ without other clients on the WAN): do slow clients degrade performance for fast clients?

22 One server – many clients, Fast Ethernet LAN (Rome BaBar farm)
Server cutter and clients bbfarm01, bbfarm02, bbfarm03, bbfarm04, connected via 100BaseT.
Clients and server: Sun Enterprise 450, 4 × 400 MHz, 512 MB.

23 Comments:
– Throughput decreases when there are more than 30 jobs on the server.
– Server CPU is always 100% used.
– Starting from 40 concurrent jobs on one client, jobs start crashing.

24 Test results
– GE test: client CPU is saturated with 5 jobs.
– FE test: server CPU is saturated with 30 jobs.
– Ethernet test: the network is saturated.

25 Test results: GE utilization is poor
Network link (server host)    Max throughput    Number of jobs
1000 Mbps Gigabit Ethernet    37 Mbps           20
100 Mbps Fast Ethernet        80 Mbps           30
10 Mbps Ethernet              9 Mbps            20
2 Mbps ATM VC                 1.7 Mbps          20
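Dividing each measured maximum by the nominal link speed makes the "GE utilization is poor" remark quantitative. The figures are copied from the table; only the ratio is computed.

```python
# Sketch: link utilization implied by the table (measured max / nominal link speed).
# The numbers are copied from the table; the ratio is the only thing computed.

RESULTS = [
    # (link, nominal Mbit/s, max measured throughput Mbit/s, jobs at max)
    ("Gigabit Ethernet", 1000, 37, 20),
    ("Fast Ethernet", 100, 80, 30),
    ("Ethernet", 10, 9, 20),
    ("ATM VC", 2, 1.7, 20),
]


def utilization(nominal_mbps: float, measured_mbps: float) -> float:
    """Fraction of the nominal link speed actually achieved."""
    return measured_mbps / nominal_mbps


if __name__ == "__main__":
    for link, nominal, measured, jobs in RESULTS:
        print(f"{link:<17} {utilization(nominal, measured):6.1%} of {nominal} Mbit/s with {jobs} jobs")
```

Fast Ethernet, Ethernet and the ATM VC reach 80 to 90% of nominal, while Gigabit Ethernet stays near 3.7%, which fits the observation that in the GE test the client CPU, not the network, saturates first.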

26 Conclusion
Objectivity 5.1 behaviour on different network layouts:
– The application/Objectivity on a powerful client machine uses a lot of CPU.
– AMS is not able to use a multi-CPU machine.
The optimal measured value for the server corresponds to 30 connections from concurrent remote jobs:
– too small for a production environment.
Boundary conditions for efficient running were identified for the specific CPU. Acceptable running conditions:
– server/client link minimum speed 8 Mbps;
– client machine: from 6 to 15 concurrent analysis jobs;
– server: 30 concurrent job requests.
Global performance degrades rapidly when moving away from the optimal conditions.
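The acceptable running conditions can be expressed as a simple check that a monitoring tool might evaluate. This is a sketch under stated assumptions: the three figures are treated as hard limits, a single AMS server is assumed to serve all client jobs, and the `Snapshot` record is hypothetical.

```python
# Sketch: flag departures from the "acceptable running conditions" on the slide.
# Assumptions: the three figures are treated as hard limits, and the snapshot
# type is hypothetical (no such record is described in the slides).
from dataclasses import dataclass
from typing import List

MIN_LINK_MBPS = 8             # server/client link minimum speed
CLIENT_JOBS_RANGE = (6, 15)   # concurrent analysis jobs per client machine
MAX_SERVER_JOBS = 30          # concurrent job requests per AMS server


@dataclass
class Snapshot:
    link_mbps: float
    jobs_per_client: List[int]

    @property
    def server_jobs(self) -> int:
        # Single-server layout assumed: every client job is served by the same AMS.
        return sum(self.jobs_per_client)


def violations(s: Snapshot) -> List[str]:
    problems: List[str] = []
    if s.link_mbps < MIN_LINK_MBPS:
        problems.append(f"link {s.link_mbps} Mbit/s below {MIN_LINK_MBPS} Mbit/s")
    low, high = CLIENT_JOBS_RANGE
    for i, jobs in enumerate(s.jobs_per_client):
        if not low <= jobs <= high:
            problems.append(f"client {i}: {jobs} jobs outside {low}-{high}")
    if s.server_jobs > MAX_SERVER_JOBS:
        problems.append(f"server: {s.server_jobs} jobs above {MAX_SERVER_JOBS}")
    return problems


if __name__ == "__main__":
    print(violations(Snapshot(link_mbps=8, jobs_per_client=[10, 12])))  # [] (within limits)
    print(violations(Snapshot(link_mbps=2, jobs_per_client=[4, 20, 10])))
```

Returning a list of messages rather than a single flag lets a tool report every condition that is out of range at once.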

27 Future work
– Application monitoring tools able to check working conditions in real time and take the necessary actions to keep the system close to the optimal conditions.
– Test Objectivity 5.2 features.
– Test a multi-server configuration using a read/write application.
– Dedicated WAN test-bed with 10 Mbps links.
– LAN and WAN behaviour with equivalent high bandwidth capacity to be investigated in depth:
  – host tuning and RTT could impact performance.
QoS documentation: www.cern.ch/MONARC
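One reason host tuning and RTT matter: a single TCP stream delivers at most one window per round trip, so filling a link needs a window of at least bandwidth × RTT (the bandwidth-delay product). The sketch below computes both quantities; the 20 ms RTT is a hypothetical value, not a measurement from these tests.

```python
# Sketch: why RTT and host (TCP) tuning matter on a high-bandwidth WAN.
# A single TCP stream can carry at most one window per round trip, so
# throughput <= window / RTT, and filling a link needs window >= bandwidth * RTT.
# The RTT values used below are hypothetical, not measurements from these tests.

def bdp_bytes(bandwidth_mbps: float, rtt_ms: float) -> float:
    """Bandwidth-delay product: bytes in flight needed to keep the link full."""
    return bandwidth_mbps * 1e6 / 8 * rtt_ms / 1e3


def max_tcp_mbps(window_bytes: int, rtt_ms: float) -> float:
    """Upper bound on one TCP stream's throughput for a given window and RTT."""
    return window_bytes * 8 / (rtt_ms / 1e3) / 1e6


if __name__ == "__main__":
    for mbps in (10, 100, 1000):
        print(f"{mbps:>5} Mbit/s at 20 ms RTT needs a {bdp_bytes(mbps, 20):,.0f}-byte window")
    # 65535 bytes is the largest window without TCP window scaling.
    print(f"64 KB window at 20 ms RTT: <= {max_tcp_mbps(65535, 20):.1f} Mbit/s")
```

With an unscaled 64 KB window and a 20 ms round trip, one stream tops out near 26 Mbit/s regardless of link capacity, which is why equal nominal bandwidth on LAN and WAN need not give equal performance.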

