MONARC Simulation Framework
Corina Stratan, Ciprian Dobre (UPB); Iosif Legrand, Harvey Newman (CALTECH)



December 2003, I.C. Legrand

The Goals of the Simulation Framework
- The aim of this work is to continue and improve the development of the MONARC simulation framework.
- To perform realistic simulation and modelling of large-scale distributed computing systems, customised for specific HEP applications.
- To offer a dynamic and flexible simulation environment to be used as a design tool for large distributed systems.
- To provide a design framework to evaluate the performance of a range of possible computer systems, as measured by their ability to provide the physicists with the requested data in the required time, and to optimise the cost.

A Global View for Modelling
[Diagram: layers Simulation Engine, Basic Components, Specific Components, Computing Models; elements LAN, WAN, DB, CPU, Scheduler, Job, Catalog, Analysis, Distributed Scheduler, MetaData, Jobs; MONITORING of REAL Systems and Testbeds.]

Design Considerations
This simulation framework is not intended to be a detailed simulator for basic components such as operating systems, database servers or routers. Instead, based on realistic mathematical models and on parameters measured on testbed systems for all the basic components, it aims to correctly describe the performance and limitations of large distributed systems with complex interactions.

Simulation Engine
[Diagram: the global view of the framework, as on the slide "A Global View for Modelling".]

Design Considerations of the Simulation Engine
- A process-oriented approach for discrete event simulation is well suited to describe concurrently running programs.
- "Active objects" (having an execution thread, a program counter, stack...) provide an easy way to map the structure of a set of distributed running programs into the simulation environment.
- The simulation engine supports an "interrupt" scheme. This allows effective and correct simulation of concurrent processes with very different time scales by using a DES approach with a continuous process flow between events.

The Simulation Engine: Tasks and Events
[Diagram: task state transitions among Created, Ready, Running, Waiting and Finished. Transition labels: "Assigned to worker thread", "semaphore.v()", "Event happens or sleeping period is over", "semaphore.p()".]
Task: used for simulating an entity with time-dependent behavior (active object, server, …).
- 5 possible states for a task: CREATED, READY, RUNNING, FINISHED, WAITING
- Each task maintains an internal semaphore necessary for switching between states.
Event: used for communication and synchronization between tasks. When a task must notify another task about something that happened or will happen in the future, it creates an event addressed to that task. The events are queued and sent to the destination tasks by the engine's scheduler.
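The event mechanism described on this slide can be sketched in Java roughly as follows. This is only an illustration under stated assumptions, not the MONARC engine's code: the names `EventSchedulerSketch`, `SimEvent` and `SimTask` are hypothetical, and the sketch runs single-threaded instead of using worker threads and per-task semaphores.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical names throughout; a single-threaded sketch, not the MONARC engine.
final class EventSchedulerSketch {
    // Pending events, ordered by the simulated time at which they are due.
    private final PriorityQueue<SimEvent> queue =
            new PriorityQueue<>(Comparator.comparingDouble((SimEvent e) -> e.time));
    double now = 0.0;   // simulated clock

    void post(SimEvent e) {
        queue.add(e);
    }

    // Delivers queued events in time order, advancing the simulated clock.
    void run() {
        while (!queue.isEmpty()) {
            SimEvent e = queue.poll();
            now = e.time;
            e.target.deliver(e);
        }
    }

    public static void main(String[] args) {
        SimTask server = new SimTask("server");
        EventSchedulerSketch scheduler = new EventSchedulerSketch();
        scheduler.post(new SimEvent(5.0, server, "late"));
        scheduler.post(new SimEvent(2.0, server, "early"));
        scheduler.run();
        System.out.println(server.received + " clock=" + scheduler.now);
    }
}

enum TaskState { CREATED, READY, RUNNING, WAITING, FINISHED }

final class SimEvent {
    final double time;      // simulated delivery time
    final SimTask target;   // task the event is addressed to
    final String payload;

    SimEvent(double time, SimTask target, String payload) {
        this.time = time;
        this.target = target;
        this.payload = payload;
    }
}

final class SimTask {
    final String name;
    TaskState state = TaskState.CREATED;
    final List<String> received = new ArrayList<>();

    SimTask(String name) {
        this.name = name;
    }

    // Called by the scheduler when an event addressed to this task is due.
    void deliver(SimEvent e) {
        state = TaskState.RUNNING;
        received.add(e.payload + "@" + e.time);
        state = TaskState.WAITING;   // wait for the next event
    }
}
```

Events posted out of order are still delivered by increasing simulated time, which is the property the engine's scheduler has to guarantee.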

Tests of the Engine
Processing a TOTAL of simple jobs in 1, 10, 100, 1000, 2 000, 4 000, CPUs using the same number of parallel threads. More tests:

Basic Components
[Diagram: the global view of the framework, as on the slide "A Global View for Modelling".]

Basic Components
- These basic components are capable of simulating the core functionality of general distributed computing systems. They are built on the simulation engine and make efficient use of its interrupt functionality for active objects.
- These components should be considered the base classes from which specific components can be derived and constructed.

Basic Components
- Computing Nodes
- Network Links and Routers, IO protocols
- Data Containers
- Servers
  - Data Base Servers
  - File Servers (FTP, NFS …)
- Jobs
  - Processing Jobs
  - FTP jobs
- Scripts & Graph execution schemes
- Basic Scheduler
- Activities (a time sequence of jobs)

Multitasking Processing Model
Concurrently running tasks share resources (CPU, memory, I/O).
"Interrupt" driven scheme: for each new task, or when a task finishes, an interrupt is generated and all "processing times" are recomputed.
It provides:
- Handling of concurrent jobs with different priorities.
- An efficient mechanism to simulate multitask processing.
- An easy way to apply different load balancing schemes.
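A minimal Java sketch of the interrupt-driven recomputation, for illustration only. The class `SharedCpuSketch` and its methods are hypothetical, and it assumes that all jobs on a CPU share its power equally; the slide also mentions priorities, which this sketch does not model.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: one CPU whose power is shared equally by its running jobs.
final class SharedCpuSketch {
    private final double power;                                   // work units per second
    private final Map<String, Double> remaining = new LinkedHashMap<>();
    private double lastUpdate = 0.0;                              // time of the last interrupt

    SharedCpuSketch(double power) {
        this.power = power;
    }

    // Charge every running job for the work done since the last interrupt.
    private void advanceTo(double now) {
        int n = remaining.size();
        if (n > 0) {
            double done = (now - lastUpdate) * power / n;
            remaining.replaceAll((job, work) -> Math.max(0.0, work - done));
        }
        lastUpdate = now;
    }

    // Interrupt: a new job arrives, so every job's share changes.
    void addJob(double now, String job, double work) {
        advanceTo(now);
        remaining.put(job, work);
    }

    // Interrupt: a job has finished, so the remaining jobs speed up.
    void finishJob(double now, String job) {
        advanceTo(now);
        remaining.remove(job);
    }

    // Estimated completion time, valid until the next interrupt.
    double estimatedFinish(String job) {
        return lastUpdate + remaining.get(job) * remaining.size() / power;
    }

    public static void main(String[] args) {
        SharedCpuSketch cpu = new SharedCpuSketch(100.0);
        cpu.addJob(0.0, "A", 1000.0);
        System.out.println("A alone: " + cpu.estimatedFinish("A"));
        cpu.addJob(4.0, "B", 300.0);
        System.out.println("A with B: " + cpu.estimatedFinish("A"));
        cpu.finishJob(10.0, "B");
        System.out.println("A after B: " + cpu.estimatedFinish("A"));
    }
}
```

Between interrupts nothing is simulated step by step: each job only carries an estimated completion time, which is what lets the approach scale to long-running jobs.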

LAN/WAN Simulation Model
[Diagram: nodes attached through links to three LANs, a ROUTER, and Internet connections.]
"Interrupt" driven simulation: for each new message an interrupt is created, and for all the active transfers the speed and the estimated time to complete the transfer are recalculated.
Continuous flow between events!
An efficient and realistic way to simulate concurrent transfers having different sizes / protocols.

Network Model
- Data traffic is simulated for both local and wide area networks.
- A simulation at the packet level is practically impossible.
- We adopted a larger-scale approach, based on an "interrupt" mechanism.
Components of the network model:
- Network Entity: LAN, WAN, LinkPort
  - main attribute: bandwidth
  - keeps track of the messages that traverse it

Simulating the Network Transfers
[Diagram: CPUs with LinkPorts at CERN and Caltech, connected through CERN LAN, CERN Router, CERN WAN, Caltech WAN, Caltech Router and Caltech LAN. A newMessage enters the route; Message1, Message2 and Message3 each receive an interrupt (INT).]
1. The route and the available bandwidth for the new message are determined.
2. The messages on the route are interrupted and their speeds are recalculated.
- The interrupt mechanism is similar to the one used for job execution simulation.
- The initial speed of a message is determined by evaluating the bandwidth that each entity on the route can offer.
- Different network protocols can be modelled.
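One way to read "the bandwidth that each entity on the route can offer" is sketched below in Java. Everything here is an assumption made for illustration: the class `RouteSpeedSketch`, the equal sharing of an entity's bandwidth among its messages, and the bandwidth figures in `main`. The slides do not give the actual sharing rule.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: the initial speed of a message is the smallest
// bandwidth share offered by any entity (LAN, WAN, LinkPort) on its route.
final class RouteSpeedSketch {

    static final class Entity {
        final String name;
        final double bandwidthMbps;
        int activeMessages;          // messages currently traversing the entity

        Entity(String name, double bandwidthMbps, int activeMessages) {
            this.name = name;
            this.bandwidthMbps = bandwidthMbps;
            this.activeMessages = activeMessages;
        }

        // Assumption: bandwidth is shared equally among all messages.
        double offerToNewMessage() {
            return bandwidthMbps / (activeMessages + 1);
        }
    }

    // Step 1: evaluate what every entity on the route can offer.
    static double initialSpeed(List<Entity> route) {
        double speed = Double.POSITIVE_INFINITY;
        for (Entity e : route) {
            speed = Math.min(speed, e.offerToNewMessage());
        }
        return speed;
    }

    // Step 2: register the message; existing transfers must then be recomputed.
    static void admit(List<Entity> route) {
        for (Entity e : route) {
            e.activeMessages++;
        }
    }

    static double transferSeconds(double sizeMB, double speedMbps) {
        return sizeMB * 8.0 / speedMbps;
    }

    public static void main(String[] args) {
        List<Entity> route = Arrays.asList(
                new Entity("CERN LAN", 1000.0, 1),      // illustrative values only
                new Entity("CERN WAN", 155.0, 0),
                new Entity("Caltech LAN", 1000.0, 3));
        double speed = initialSpeed(route);
        admit(route);
        System.out.println("speed=" + speed + " Mbps, 100 MB takes "
                + transferSeconds(100.0, speed) + " s");
    }
}
```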

Job Scheduling and Execution

Activity1:

    class Activity1 extends Activity {
        …
        public void pushJobs() {
            …
            Job newJob = new Job(…);
            addJob(newJob);
            …
        }
        …
    }

Activity2:

    class Activity2 extends Activity {
        …
    }

[Diagram: three CPU units (CPU 1, CPU 2, CPU 3) running job groups Job 1 (30% CPU) and Job 2 (70% CPU); Job 3 (30% CPU), Job 4 (30% CPU) and Job 5 (40% CPU); Job 6 (50% CPU) and Job 7 (50% CPU). After newJob is scheduled next to Job 6 and Job 7 and an interrupt (INT) is raised, Job 6, Job 7 and newJob each get 33% CPU.]
1. The activity class creates a job and submits it to the farm.
2. The job scheduler sends the new job to a CPU unit. All the jobs executing on that CPU are interrupted.
3. CPU power is reallocated on the unit where the new job was scheduled. The interrupted jobs re-estimate their completion time.

Output of the Simulation
[Diagram: components of the Simulation Engine (Node, DB, Router, User C) send results to Output Listeners with Filters, which feed Log Files, Excel and Graphics.]
- Any component in the system can generate generic result objects.
- Any client can subscribe with a filter and will receive the results it is interested in.
- The structure is very similar to the one in MonALISA. We will soon integrate the output of the simulation framework into MonALISA.
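The subscribe-with-a-filter idea can be sketched in Java as below. The names `ResultBusSketch`, `Result`, `subscribe` and `publish` are hypothetical and are not taken from MONARC or MonALISA; the sketch only shows the pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical sketch of result objects delivered to filtered listeners.
final class ResultBusSketch {

    static final class Result {
        final String source;      // component that produced the value
        final String parameter;   // name of the measured quantity
        final double value;

        Result(String source, String parameter, double value) {
            this.source = source;
            this.parameter = parameter;
            this.value = value;
        }
    }

    private static final class Subscription {
        final Predicate<Result> filter;
        final Consumer<Result> listener;

        Subscription(Predicate<Result> filter, Consumer<Result> listener) {
            this.filter = filter;
            this.listener = listener;
        }
    }

    private final List<Subscription> subscriptions = new ArrayList<>();

    // A client registers a listener together with the filter it cares about.
    void subscribe(Predicate<Result> filter, Consumer<Result> listener) {
        subscriptions.add(new Subscription(filter, listener));
    }

    // Any component can publish; only matching listeners are notified.
    void publish(Result r) {
        for (Subscription s : subscriptions) {
            if (s.filter.test(r)) {
                s.listener.accept(r);
            }
        }
    }

    public static void main(String[] args) {
        ResultBusSketch bus = new ResultBusSketch();
        bus.subscribe(r -> r.source.startsWith("Router"),
                r -> System.out.println(r.source + " " + r.parameter + "=" + r.value));
        bus.publish(new Result("Router1", "throughput", 42.0));
        bus.publish(new Result("Node7", "cpuLoad", 0.8));   // filtered out
    }
}
```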

Specific Components
[Diagram: the global view of the framework, as on the slide "A Global View for Modelling".]

Specific Components
These components should be derived from the basic components and must implement their specific characteristics and the way they will operate.
Major parts:
- Data Model
- Data Flow Diagrams from Production and especially for Analysis Jobs
- Scheduling / pre-allocation policies
- Data Replication Strategies

Data Model
Generic Data Container:
- Size
- Event Type
- Event Range
- Access Count
- INSTANCE
[Diagram: FTP Server Node, DB Server, NFS Server; FILE, Data Base, Custom Data Server; Network FILE; META DATA Catalog; Replication Catalog; Export / Import.]

Data Model (2)
[Diagram: Data Container; JOB; META DATA Catalog; Replication Catalog; Data Request; Data Container; List Of IO Transactions; Data Processing JOB; label "Select from the options".]

Database Functionality
- Client-server model
- Automatic storage management is possible, with data being sent to mass storage units
3 kinds of requests for the database server:
- write
- read
- get (read the data and erase it from the server)
Automatic storage management example:
1. The job wants to write a container into the database DB1, but the server is out of storage space.
2. The least frequently used container is moved to a mass storage unit. The new container is written to the database.
[Diagram: a DatabaseServer holding DContainers in DB1 and DB2, two Mass Storage units holding further DContainers, and a writeData() call.]
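The storage management example can be sketched in Java as follows. This is a simplified illustration, not the framework's `DatabaseServer`: the class `DbServerSketch`, its method names and the tie-breaking rule (the oldest container is evicted when access counts are equal) are assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a database server that moves its least frequently
// used container to mass storage when a write does not fit.
final class DbServerSketch {
    private final double capacity;
    private double used = 0.0;
    private final Map<String, Double> sizes = new LinkedHashMap<>();
    private final Map<String, Integer> accessCount = new LinkedHashMap<>();
    final List<String> massStorage = new ArrayList<>();

    DbServerSketch(double capacity) {
        this.capacity = capacity;
    }

    // write: store a new container, evicting old ones if space is short.
    void write(String id, double size) {
        if (size > capacity || sizes.containsKey(id)) {
            throw new IllegalArgumentException("cannot store " + id);
        }
        while (used + size > capacity) {
            evictLeastFrequentlyUsed();
        }
        sizes.put(id, size);
        accessCount.put(id, 0);
        used += size;
    }

    // read: access the container and keep it on the server.
    boolean read(String id) {
        if (!sizes.containsKey(id)) {
            return false;
        }
        accessCount.merge(id, 1, Integer::sum);
        return true;
    }

    // get: read the container and erase it from the server.
    boolean get(String id) {
        if (!read(id)) {
            return false;
        }
        used -= sizes.remove(id);
        accessCount.remove(id);
        return true;
    }

    boolean holds(String id) {
        return sizes.containsKey(id);
    }

    private void evictLeastFrequentlyUsed() {
        String victim = null;
        for (Map.Entry<String, Integer> e : accessCount.entrySet()) {
            if (victim == null || e.getValue() < accessCount.get(victim)) {
                victim = e.getKey();
            }
        }
        used -= sizes.remove(victim);
        accessCount.remove(victim);
        massStorage.add(victim);
    }

    public static void main(String[] args) {
        DbServerSketch db = new DbServerSketch(100.0);
        db.write("c1", 40.0);
        db.write("c2", 40.0);
        db.read("c1");
        db.write("c3", 40.0);   // does not fit: c2 is moved to mass storage
        System.out.println("mass storage: " + db.massStorage);
    }
}
```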

Data Flow Diagrams for JOBS
[Diagram: Input and Output data connected through Processing 1, Processing 2, Processing 3 and Processing 4; one edge is labelled 10x.]
- Input and output are collections of data. This data is described by type and range.
- A process is described by name.
- A fine-grained decomposition of processes which can be executed independently, together with the way they communicate, can be very useful for optimization and parallel execution!

Job Scheduling: Centralized Scheme
[Diagram: Site A and Site B, each with a CPU FARM and a JobScheduler, plus a GLOBAL Job Scheduler. Label: "Dynamically loadable module".]

Job Scheduling: Distributed Scheme (market model)
[Diagram: sites, each with a CPU FARM and a JobScheduler, exchanging Request, COST and DECISION.]

Computing Models
[Diagram: the global view of the framework, as on the slide "A Global View for Modelling".]

Activities: Arrival Patterns
- A flexible mechanism to define the stochastic process of how users perform data processing tasks.
- Dynamic loading of "Activity" tasks, which are threaded objects and are controlled by the simulation scheduling mechanism.
- Physics activities inject "Jobs": each "Activity" thread generates data processing jobs.

    for (int k = 0; k < jobs_per_group; k++) {
        Job job = new Job(this, Job.ANALYSIS, "TAG", 1, events_to_process);
        farm.addJob(job);   // submit the job
        sim_hold(1000);     // wait 1000 s
    }

[Diagram: a Regional Centre Farm receiving Jobs from Activity objects.]
These dynamic objects are used to model the users' behavior.

Regional Centre Model
Complex composite object.
[Diagram: simplified topology of the centers A, B, C, D, E. Inside a REGIONAL CENTER: Job Activity, Job Scheduler, a FARM of CPUs running AJobs, each with a Link Port; DB Index; DB Servers with Link Ports; LAN and WAN.]

MONARC: Main Classes
AJob, WorkerThread, LinkPort, WAN, LAN, NetworkEntity, UDPMessage, TCPMessage, Message, TCPProtocol, UDPProtocol, Protocol, CPUCluster, CPUUnit, AbstractCPUUnit, Farm, RegionalCenter, Activity, MassStorage, DatabaseServer, DatabaseEntity, Database, DatabaseIndex, MetaJob, JobDatabase, JobFTP, JobProcessData, Job, QScheduler, DistribScheduler, JobScheduler, Scheduler, DContainer, EventQueue, Pool, Task, Event

Monitoring
[Diagram: the global view of the framework, as on the slide "A Global View for Modelling".]

Real Need for Flexible Monitoring Systems
- It is important to measure and monitor the key applications in a well-defined test environment and to extract the parameters we need for modeling.
- Monitor the farms used today, try to understand how they work, and simulate such systems.
- This requires a flexible monitoring system able to dynamically add new parameters and provide access to historical data.
- Interface monitoring tools to get the parameters we need in simulations in a nearly automatic way.
- MonALISA was designed and developed based on the experience with the simulation problems.

EXAMPLES

FTP and NFS Clusters
[Diagram: an FTP (NFS) Server and Client 1, Client 2, Client 3, ..., Client n; the clients request events.]
These examples evaluate the performance of a local area network with a server and several worker stations. The server stores events used by the processing nodes.
- NFS example: the server concurrently delivers the events, one by one, to the clients.
- FTP example: the server sends a whole file with events in a single transfer.

FTP Cluster
- 50 CPU units x 2 jobs per unit
- 100 events per job, event size 1 MB
- LAN bandwidth 1 Gbps, server's effective bandwidth 60 Mbps
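As a rough sanity check on these parameters, a lower bound on the total transfer time can be worked out by hand. The assumptions are not stated on the slide: every job fetches all of its events from the server, the server's effective bandwidth is the only bottleneck, and 1 MB is taken as 8 Mbit.

```latex
\begin{align*}
D &= 50 \times 2 \times 100 \times 1\,\text{MB} = 10\,000\,\text{MB} = 80\,000\,\text{Mbit} \\
T_{\min} &= \frac{D}{60\,\text{Mbit/s}} = \frac{80\,000\,\text{Mbit}}{60\,\text{Mbit/s}} \approx 1333\,\text{s} \approx 22\,\text{min}
\end{align*}
```

The simulated results for this example are not reproduced in the transcript, so this is only an order-of-magnitude estimate, not the authors' figure.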

NFS Cluster

Distributed Scheduling
[Diagram: regional centers CERN, Caltech, KEK and FNAL; jobs are moved between them with export().]
- Job migration: when a regional center is assigned too many jobs, it sends part of them to other centers with more free resources.
- A new job scheduler was implemented, which supports job migration and applies load balancing criteria.
- We tested different configurations, with 1, 2 and 4 regional centers, and with different numbers of CPUs per regional center. The number of jobs submitted is kept constant, while the job arrival rate varies during a day.
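A simple load balancing criterion for job migration might look like the Java sketch below. It is not the scheduler described in the slides: the class `MigrationSketch`, the jobs-per-CPU load measure and the `maxLoad` threshold are assumptions chosen to keep the example small.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: an overloaded center exports jobs, one at a time,
// to the least loaded of the other centers.
final class MigrationSketch {

    static final class Center {
        final String name;
        final int cpus;
        int jobs;

        Center(String name, int cpus, int jobs) {
            this.name = name;
            this.cpus = cpus;
            this.jobs = jobs;
        }

        // Assumed load measure: jobs per CPU.
        double load() {
            return (double) jobs / cpus;
        }
    }

    // Exports jobs while `from` exceeds maxLoad and a move still helps.
    // Returns the number of jobs migrated.
    static int export(Center from, List<Center> others, double maxLoad) {
        int moved = 0;
        while (from.load() > maxLoad) {
            Center best = null;
            for (Center c : others) {
                if (best == null || c.load() < best.load()) {
                    best = c;
                }
            }
            // Stop if the target would end up at least as loaded as the source.
            if (best == null || (double) (best.jobs + 1) / best.cpus >= from.load()) {
                break;
            }
            from.jobs--;
            best.jobs++;
            moved++;
        }
        return moved;
    }

    public static void main(String[] args) {
        Center cern = new Center("CERN", 20, 60);
        Center caltech = new Center("Caltech", 20, 10);
        Center kek = new Center("KEK", 20, 20);
        int moved = export(cern, Arrays.asList(caltech, kek), 1.5);
        System.out.println("moved=" + moved + " CERN=" + cern.jobs
                + " Caltech=" + caltech.jobs + " KEK=" + kek.jobs);
    }
}
```

This sketch ignores the cost of moving a job's data, which is exactly the trade-off the chain-configuration example later in the deck examines.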

Distributed Scheduling (2)
[Plots: average processing time and CPU usage for 1, 2, 4, 6 centers.]
Test case: 4 regional centers, 20 CPUs per center, average job processing time 3 h, approx. 500 jobs per day submitted in a center.

Distributed Scheduling (3): Chain WAN Connection
[Diagram: CERN, Caltech, KEK and FNAL.]
- Similar to the previous example, but the jobs are more complex, involving network transfers.
- Centers are connected in a chain configuration.
Every job submitted to a regional center needs an amount of data located in that center. If the job is exported to another center, would the benefits be great enough to compensate for the cost of the data transfer?

Distributed Scheduling (4)
- The network transfers are more intense in the centers in the middle of the chain (like Caltech).
- The average processing time increases significantly when the bandwidth and the number of CPUs are reduced.

Distributed Scheduling (5)

Local Data Replication
- Evaluates the performance improvements that can be obtained by replicating data.
- We simulated a regional center which has a number of database servers, and another four centers which host jobs that process the data on those database servers.
- Better performance can be obtained if the data from the servers is replicated into the other regional centers.

Local Data Replication (2)

WAN Data Replication
[Diagram: two replicas, each reached over a common link, and worker jobs.]
- Similar to the previous example, but now with two central servers, each holding an equal amount of replicated data, and eight satellite regional centers hosting worker jobs.
- A worker job gets a number of events from one of the central regional centers (one event at a time) and processes them locally.
- Two strategies are compared: workers choose the "best" server to get the data from, using a replication load balancing service (which knows the load of the network and of the servers), versus a server chosen randomly.
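The two server selection strategies can be contrasted with a short Java sketch. The cost estimate used for the "best" choice (event size divided by an equal share of the server's bandwidth) and all names in `ReplicaChoiceSketch` are assumptions; the slides do not say how the load balancing service ranks servers.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hypothetical sketch: choose the replica server by estimated response
// time, or pick one at random as the baseline.
final class ReplicaChoiceSketch {

    static final class Server {
        final String name;
        final double bandwidthMbps;
        final int activeTransfers;

        Server(String name, double bandwidthMbps, int activeTransfers) {
            this.name = name;
            this.bandwidthMbps = bandwidthMbps;
            this.activeTransfers = activeTransfers;
        }

        // Assumed cost model: bandwidth is shared equally with active transfers.
        double estimatedSeconds(double sizeMB) {
            return sizeMB * 8.0 * (activeTransfers + 1) / bandwidthMbps;
        }
    }

    // Load balancing strategy: smallest estimated response time wins.
    static Server chooseBest(List<Server> servers, double sizeMB) {
        Server best = null;
        for (Server s : servers) {
            if (best == null || s.estimatedSeconds(sizeMB) < best.estimatedSeconds(sizeMB)) {
                best = s;
            }
        }
        return best;
    }

    // Baseline strategy: ignore load and pick any server.
    static Server chooseRandom(List<Server> servers, Random rng) {
        return servers.get(rng.nextInt(servers.size()));
    }

    public static void main(String[] args) {
        List<Server> servers = Arrays.asList(
                new Server("replicaA", 100.0, 4),   // fast but busy
                new Server("replicaB", 50.0, 0));   // half the bandwidth, idle
        System.out.println("best: " + chooseBest(servers, 1.0).name);
        System.out.println("random: " + chooseRandom(servers, new Random()).name);
    }
}
```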

WAN Data Replication
Two cases are shown:
- Both servers have the same bandwidth and support the same maximum load.
- One server has half of the other's bandwidth and supports half of its maximum load.
Result: the average response time is better and the total execution time is smaller when decisions are based on load balancing.

Summary
- Modelling and understanding current systems, their performance and limitations, is essential for the design of large-scale distributed processing systems. This will require continuous iterations between modelling and monitoring.
- Simulation and modelling tools must provide the functionality to help in designing complex systems and to evaluate different strategies and algorithms for the decision-making units and the data flow management.