Performance Analysis of HPC with Lmbench
Didem Unat
Supervisor: Nahil Sobh
July 22nd, 2005
netfiles.uiuc.edu/dunat2/www

Lmbench: Micro-Benchmark Suite
- Simple, portable benchmarks
- Compares the performance of different Unix systems
- Measures latency and bandwidth
- Analyzes only the performance of the processor, memory, network, file system, and disk
- Free software

Compiler & optimization issues
- The GNU C compiler was used on all resources except copper, where the IBM xlc compiler was used.
- All benchmarks were compiled with optimization -O, except the benchmarks that calculate clock speed and context-switch times.

Metrics in the Benchmark
Bandwidth
- Pipe/TCP
- Cached file read
- Memory copy
- Memory read/write
Latency
- System call
- Signal handling
- Process creation
- Basic CPU operations
- Context switching
- Inter process communication
- File and VM system
- Memory read latencies

Inter Process Communication Bandwidth
- Transfers 64 MB of data in 64 KB chunks through:
  - Unix pipe
  - Unix sockets
  - TCP/IP sockets
- Results are reported in MB/sec
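The pipe case of this transfer can be sketched in a few lines. This is a minimal Python illustration, not lmbench's own C code; the function name pipe_bandwidth and its parameters are hypothetical, and the interpreter overhead means the figure it reports is not comparable with lmbench results.

```python
import os
import time


def pipe_bandwidth(total_bytes=64 * 1024 * 1024, chunk_bytes=64 * 1024):
    """Move total_bytes through a Unix pipe; return (MB/sec, bytes received)."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: write the data in chunk_bytes pieces, then exit.
        try:
            os.close(read_fd)
            chunk = b"\0" * chunk_bytes
            sent = 0
            while sent < total_bytes:
                # Never send more than what is still owed.
                sent += os.write(write_fd, chunk[: total_bytes - sent])
            os.close(write_fd)
        finally:
            os._exit(0)
    # Parent: read until the writer closes its end of the pipe.
    os.close(write_fd)
    received = 0
    start = time.perf_counter()
    while True:
        data = os.read(read_fd, chunk_bytes)
        if not data:
            break
        received += len(data)
    elapsed = time.perf_counter() - start
    os.close(read_fd)
    os.waitpid(pid, 0)
    return received / (1024 * 1024) / elapsed, received
```

Calling pipe_bandwidth() with its defaults reproduces the 64 MB / 64 KB configuration from the slide; smaller values are handy for a quick check.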

Cached file read
- A reread benchmark, intended to be used on a file that is already in memory
- File reread: copies data from the kernel's file system page cache into the process's buffer
- Mmap reread: maps the entire file (8 MB) into the process's address space
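The two reread paths can be contrasted with a short sketch. It is a Python approximation under stated assumptions, not lmbench's implementation: the name reread_bandwidth is hypothetical, the file is a temporary file written just before the measurement (so it is expected, but not guaranteed, to be in the page cache), and the mmap timing includes the cost of creating the mapping.

```python
import mmap
import os
import tempfile
import time


def reread_bandwidth(size=8 * 1024 * 1024, block=64 * 1024):
    """Return (read MB/sec, mmap MB/sec) for a freshly written temporary file."""
    megabytes = size / (1024 * 1024)
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(b"x" * size)
        tmp.flush()  # the data should now sit in the page cache
        fd = os.open(tmp.name, os.O_RDONLY)
        try:
            # File reread: copy from the kernel into a user buffer with read().
            start = time.perf_counter()
            copied = 0
            while True:
                buf = os.read(fd, block)
                if not buf:
                    break
                copied += len(buf)
            read_seconds = time.perf_counter() - start

            # Mmap reread: map the whole file, then read every byte of it.
            start = time.perf_counter()
            with mmap.mmap(fd, 0, access=mmap.ACCESS_READ) as mapped:
                touched = len(mapped[:])
            mmap_seconds = time.perf_counter() - start
        finally:
            os.close(fd)
    assert copied == size and touched == size
    return megabytes / read_seconds, megabytes / mmap_seconds
```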

Memory copy
- Measures how fast the system can bcopy data
- bcopy copies n bytes from a source string to a destination string
- An 8 MB to 8 MB copy, which does not fit in the cache
- Two variants: kernel bcopy and C library bcopy
- The C library bcopy is shown in the next slide
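A rough user-level analogue of the copy measurement is sketched below. It is an assumption-laden Python stand-in: the name copy_bandwidth is hypothetical, and a bytearray slice assignment takes the place of bcopy, so it exercises whatever bulk copy the interpreter uses rather than the kernel or C library routine the slide refers to.

```python
import time


def copy_bandwidth(size=8 * 1024 * 1024, repeats=5):
    """Return the best observed MB/sec for copying size bytes between buffers."""
    src = bytearray(b"\x5a" * size)
    dst = bytearray(size)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst[:] = src  # bulk copy of the whole source buffer
        best = min(best, time.perf_counter() - start)
    assert dst == src  # the copy must be exact
    # Guard against a zero reading from a coarse timer.
    return size / (1024 * 1024) / max(best, 1e-9)
```

Taking the best of several repeats is a design choice of this sketch to reduce noise from other activity on the machine.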

Memory read/write
- Read: measures the time to read data into the processor, using an unrolled loop that sums up a series of integers
- Write: measures the time to write data to memory, using an unrolled loop that stores a value into an integer
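The shape of the two unrolled loops can be shown as follows. This is only a structural sketch in Python with hypothetical names and an assumed unroll factor of four; in Python the interpreter cost dominates, so it illustrates the loop structure rather than measuring memory bandwidth.

```python
from array import array


def read_sum_unrolled(data):
    """Sum the integers four per iteration; len(data) must be a multiple of 4."""
    total = 0
    for i in range(0, len(data), 4):
        total += data[i] + data[i + 1] + data[i + 2] + data[i + 3]
    return total


def write_unrolled(data, value):
    """Store value into every element, four stores per iteration."""
    for i in range(0, len(data), 4):
        data[i] = value
        data[i + 1] = value
        data[i + 2] = value
        data[i + 3] = value


numbers = array("i", range(1024))
checksum = read_sum_unrolled(numbers)  # read pass
write_unrolled(numbers, 1)             # write pass
```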

Operating System Entry / Signal Handling / Process Creation Costs
- Process-related latencies
- System call: null call, null I/O, stat, open/close
- Signal handling: signal installation, signal handling
- Process creation: fork + exit, fork + execve, fork + /bin/sh -c
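One representative from each group above can be timed with a sketch like the following. It is written in Python under stated assumptions: the function names are hypothetical, os.getppid stands in for the null call, SIGUSR1 is an arbitrary choice of signal, and the reported microseconds include Python call overhead that a C benchmark would not have.

```python
import os
import signal
import time


def null_syscall_latency(iterations=100_000):
    """Microseconds per call of a cheap system call (getppid assumed here)."""
    start = time.perf_counter()
    for _ in range(iterations):
        os.getppid()
    return (time.perf_counter() - start) / iterations * 1e6


def signal_handling_latency(iterations=10_000):
    """Return (microseconds per signal sent to ourselves, handler call count)."""
    caught = 0

    def handler(signum, frame):
        nonlocal caught
        caught += 1

    previous = signal.signal(signal.SIGUSR1, handler)  # signal installation
    try:
        pid = os.getpid()
        start = time.perf_counter()
        for _ in range(iterations):
            os.kill(pid, signal.SIGUSR1)  # signal delivery and handling
        elapsed = time.perf_counter() - start
    finally:
        signal.signal(signal.SIGUSR1, previous)
    return elapsed / iterations * 1e6, caught


def fork_exit_latency(iterations=200):
    """Microseconds per fork + exit, including the parent's wait."""
    start = time.perf_counter()
    for _ in range(iterations):
        pid = os.fork()
        if pid == 0:
            os._exit(0)  # child exits immediately
        os.waitpid(pid, 0)
    return (time.perf_counter() - start) / iterations * 1e6
```

The signal function must run in the main thread, since Python only allows handlers to be installed there.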

Context Switching
- The time to save the state of one process and restore the state of another process
- The processes are connected in a ring of Unix pipes
- A token is passed from process to process
- Each process allocates an array and sums it
- The context-switch time does not include the overhead of doing the work
- Two parameters: the number and the size of the processes
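The ring-of-pipes arrangement can be sketched as below. This is a simplified Python illustration with hypothetical names, not lmbench's code: it passes a one-byte token and omits the per-process array summing, so only the "number of processes" parameter is modelled.

```python
import os
import time


def ring_token_time(num_procs=4, rounds=1000):
    """Microseconds per hop for a one-byte token circulating a ring of pipes."""
    # pipes[i] carries the token from process i to process (i + 1) % num_procs.
    pipes = [os.pipe() for _ in range(num_procs)]
    children = []
    for rank in range(1, num_procs):
        pid = os.fork()
        if pid == 0:
            # Child: forward the token from the previous pipe to the next one.
            try:
                read_fd = pipes[rank - 1][0]
                write_fd = pipes[rank][1]
                for _ in range(rounds):
                    os.write(write_fd, os.read(read_fd, 1))
            finally:
                os._exit(0)
        children.append(pid)
    # Parent is rank 0: inject the token and wait for it to come back.
    write_fd = pipes[0][1]
    read_fd = pipes[num_procs - 1][0]
    start = time.perf_counter()
    for _ in range(rounds):
        os.write(write_fd, b"t")
        os.read(read_fd, 1)
    elapsed = time.perf_counter() - start
    for pid in children:
        os.waitpid(pid, 0)
    for pipe_read, pipe_write in pipes:
        os.close(pipe_read)
        os.close(pipe_write)
    return elapsed / (rounds * num_procs) * 1e6
```

A ring of one process involves no context switch, so ring_token_time(1) gives an estimate of the pipe overhead alone; subtracting it from a multi-process result approximates the idea of excluding the overhead of doing the work.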

Interprocess Communication Latencies
- Passes a small message back and forth between two processes
- The time reported is one round trip
- Message size: a byte or a word
- Metrics: pipe, Unix socket, UDP and TCP, RPC/UDP and RPC/TCP, TCP connection latency
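The back-and-forth exchange for the Unix socket case might look like this. It is a Python sketch with a hypothetical function name, using socket.socketpair and a one-byte message; it is not lmbench's implementation and its timings include interpreter overhead.

```python
import os
import socket
import time


def unix_socket_round_trip(iterations=2000):
    """Microseconds per round trip of a one-byte message over a Unix socket pair."""
    parent_sock, child_sock = socket.socketpair()
    pid = os.fork()
    if pid == 0:
        # Child: echo every byte straight back.
        try:
            parent_sock.close()
            for _ in range(iterations):
                child_sock.sendall(child_sock.recv(1))
        finally:
            os._exit(0)
    child_sock.close()
    start = time.perf_counter()
    for _ in range(iterations):
        parent_sock.sendall(b"x")
        parent_sock.recv(1)  # wait for the echo: one full round trip
    elapsed = time.perf_counter() - start
    parent_sock.close()
    os.waitpid(pid, 0)
    return elapsed / iterations * 1e6
```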

File & VM System
- File create/delete: creates a number of small files in the current working directory and then removes them
- Mmap latency: the cost of mmapping and unmapping files of varying sizes
- Prot fault: the time to catch a protection fault
- Page fault: the cost of faulting in pages from a file
- 100 fd select: the time to do a select on n file descriptors (here n = 100)
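Two of the items above, file create/delete and select on 100 descriptors, can be sketched as follows. This is a Python illustration with hypothetical names; as assumptions of the sketch, the files are created in a temporary directory rather than the current working directory, and the descriptors are duplicates of the read end of one empty pipe.

```python
import os
import select
import tempfile
import time


def create_delete_latency(count=1000):
    """Return (microseconds per create, microseconds per delete) for empty files."""
    with tempfile.TemporaryDirectory() as directory:
        names = [os.path.join(directory, f"file{i}") for i in range(count)]
        start = time.perf_counter()
        for name in names:
            os.close(os.open(name, os.O_CREAT | os.O_WRONLY, 0o600))
        create_seconds = time.perf_counter() - start
        start = time.perf_counter()
        for name in names:
            os.unlink(name)
        delete_seconds = time.perf_counter() - start
    return create_seconds / count * 1e6, delete_seconds / count * 1e6


def select_latency(num_fds=100, iterations=1000):
    """Return (microseconds per select call, number of descriptors reported ready)."""
    read_fd, write_fd = os.pipe()
    fds = [os.dup(read_fd) for _ in range(num_fds)]
    try:
        start = time.perf_counter()
        for _ in range(iterations):
            ready = select.select(fds, [], [], 0)[0]  # zero timeout: poll only
        elapsed = time.perf_counter() - start
    finally:
        for fd in fds + [read_fd, write_fd]:
            os.close(fd)
    return elapsed / iterations * 1e6, len(ready)
```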

Memory Latencies
- Measures memory read latency for varying memory sizes and strides
- The size of the array starts from 512 bytes
- The stride varies from 16 to 1024
- Does not include the instruction execution time
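One common way to measure read latency over sizes and strides is a chain of dependent loads, sketched here. This is an assumption about the general technique, not a description taken from the slides, and the names are hypothetical. Sizes and strides are counted in array elements rather than bytes, and Python's per-step overhead hides most cache effects, so the sketch shows the access pattern rather than real latencies.

```python
import time
from array import array


def build_chain(num_elements, stride):
    """chain[i] holds the index to visit after i, which is stride elements on."""
    return array("l", ((i + stride) % num_elements for i in range(num_elements)))


def chase(chain, steps):
    """Follow the chain for steps loads and return the final index."""
    position = 0
    for _ in range(steps):
        position = chain[position]  # each load depends on the previous one
    return position


def ns_per_load(num_elements, stride, steps=100_000):
    """Nanoseconds per dependent load, interpreter overhead included."""
    chain = build_chain(num_elements, stride)
    start = time.perf_counter()
    chase(chain, steps)
    return (time.perf_counter() - start) / steps * 1e9
```

Sweeping num_elements and stride over a grid and calling ns_per_load for each pair gives the kind of size-by-stride table the slide describes.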

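Conclusion
- IPC bandwidth: the best Co; has problems W, Cu
- Cached I/O bandwidth: the best W; has problems Co, Hg
- Memory R/W bandwidth: the best W; has problems Co, Hg
- Process creation: the best Cu; has problems Co
- CPU ops: the best W, Co, Hg; has problems Cu
- Network latency: the best W; has problems Co, Cu
- Memory latency: the best W, Co; has problems Cu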

Thank you! Have a nice weekend!

References
- “Lmbench – Tools for Performance Analysis”
- Larry McVoy and Carl Staelin, “Lmbench: Portable tools for performance analysis”, sd96/full_papers/mcvoy.pdf
- Carl Staelin, “Lmbench: an extensible micro-benchmark suite”