Measuring Time Alan L. Cox Some slides adapted from CMU 15.213 slides.

Slides:



Advertisements
Similar presentations
Time Measurement Topics Time scales Interval counting Cycle counters K-best measurement scheme time.ppt CS 105 Tour of Black Holes of Computing.
Advertisements

Memory Protection: Kernel and User Address Spaces  Background  Address binding  How memory protection is achieved.
Chapter 12 CPU Structure and Function. CPU Sequence Fetch instructions Interpret instructions Fetch data Process data Write data.
SE-292 High Performance Computing Profiling and Performance R. Govindarajan
CSC 501 Lecture 2: Processes. Von Neumann Model Both program and data reside in memory Execution stages in CPU: Fetch instruction Decode instruction Execute.
OS2-1 Chapter 2 Computer System Structures. OS2-2 Outlines Computer System Operation I/O Structure Storage Structure Storage Hierarchy Hardware Protection.
Processes CSCI 444/544 Operating Systems Fall 2008.
1 Lecture 6 Performance Measurement and Improvement.
Performance D. A. Patterson and J. L. Hennessey, Computer Organization & Design: The Hardware Software Interface, Morgan Kauffman, second edition 1998.
Time Measurement October 25, 2001 Topics Time scales Interval counting Cycle counters K-best measurement scheme class18.ppt.
Introduction to Operating Systems What is an operating system? Examples How do many programs run at the same time, with one processor?
Code Optimization I: Machine Independent Optimizations Sept. 26, 2002 Topics Machine-Independent Optimizations Code motion Reduction in strength Common.
Time Measurement CS 201 Gerson Robboy Portland State University Topics Time scales Interval counting Cycle counters K-best measurement scheme.
Copyright © 1998 Wanda Kunkle Computer Organization 1 Chapter 2.1 Introduction.
Operating Systems 1 K. Salah Module 1.2: Fundamental Concepts Interrupts System Calls.
Lecture 3: Computer Performance
Time Measurement Topics Time scales Interval counting Cycle counters K-best measurement scheme class18.ppt.
Time Measurement December 9, 2004 Topics Time scales Interval counting Cycle counters K-best measurement scheme Related tools & ideas class29.ppt
1 Measuring Performance Chris Clack B261 Systems Architecture.
1 1 Profiling & Optimization David Geldreich (DREAM)
Virtual Memory.
System Calls 1.
Page 19/17/2015 CSE 30341: Operating Systems Principles Optimal Algorithm  Replace page that will not be used for longest period of time  Used for measuring.
Lecture 8. Profiling - for Performance Analysis - Prof. Taeweon Suh Computer Science Education Korea University COM503 Parallel Computer Architecture &
CSC 501 Lecture 2: Processes. Process Process is a running program a program in execution an “instantiation” of a program Program is a bunch of instructions.
Operating Systems Lecture 2 Processes and Threads Adapted from Operating Systems Lecture Notes, Copyright 1997 Martin C. Rinard. Zhiqing Liu School of.
Recall: Three I/O Methods Synchronous: Wait for I/O operation to complete. Asynchronous: Post I/O request and switch to other work. DMA (Direct Memory.
Timing and Profiling ECE 454 Computer Systems Programming Topics: Measuring and Profiling Cristiana Amza.
10/19/2015Erkay Savas1 Performance Computer Architecture – CS401 Erkay Savas Sabanci University.
Operating Systems Process Management.
Performance.
CSE 303 Concepts and Tools for Software Development Richard C. Davis UW CSE – 12/6/2006 Lecture 24 – Profilers.
Multiprogramming. Readings r Silberschatz, Galvin, Gagne, “Operating System Concepts”, 8 th edition: Chapter 3.1, 3.2.
Timing Programs and Performance Analysis Tools for Analysing and Optimising advanced Simulations.
Adv. UNIX: Profile/151 Advanced UNIX v Objectives –introduce profiling based on execution times and line counts Special Topics in Comp.
Operating System Structure A key concept of operating systems is multiprogramming. –Goal of multiprogramming is to efficiently utilize all of the computing.
Introduction to ECE 454 Computer Systems Programming Topics: Lecture topics and assignments Profiling rudiments Lab schedule and rationale Cristiana Amza.
Morgan Kaufmann Publishers
Copyright ©: Nahrstedt, Angrave, Abdelzaher, Caccamo1 Timers and Clocks II.
We will focus on operating system concepts What does it do? How is it implemented? Apply to Windows, Linux, Unix, Solaris, Mac OS X. Will discuss differences.
Operating Systems 1 K. Salah Module 1.2: Fundamental Concepts Interrupts System Calls.
1 Computer Systems II Introduction to Processes. 2 First Two Major Computer System Evolution Steps Led to the idea of multiprogramming (multiple concurrent.
Lecture 2a: Performance Measurement. Goals of Performance Analysis The goal of performance analysis is to provide quantitative information about the performance.
Processor Structure and Function Chapter8:. CPU Structure  CPU must:  Fetch instructions –Read instruction from memory  Interpret instructions –Instruction.
Performance Performance
Computer Organization Instruction Set Architecture (ISA) Instruction Set Architecture (ISA), or simply Architecture, of a computer is the.
Time Management.  Time management is concerned with OS facilities and services which measure real time.  These services include:  Keeping track of.
Time Sources and Timing David Ferry, Chris Gill CSE 522S - Advanced Operating Systems Washington University in St. Louis St. Louis, MO
Performance Evaluation May 2, 2000 Topics Getting accurate measurements Amdahl’s Law class29.ppt.
Interrupts and Exception Handling. Execution We are quite aware of the Fetch, Execute process of the control unit of the CPU –Fetch and instruction as.
© Janice Regan, CMPT 300, May CMPT 300 Introduction to Operating Systems Operating Systems Overview: Using Hardware.
Measuring Performance Based on slides by Henri Casanova.
July 10, 2016ISA's, Compilers, and Assembly1 CS232 roadmap In the first 3 quarters of the class, we have covered 1.Understanding the relationship between.
Processes and threads.
Exceptional Control Flow
Homework Reading Machine Projects Labs
Time Sources and Timing
Mechanism: Limited Direct Execution
Time Sources and Timing
CS2100 Computer Organisation
Operating Systems CPU Scheduling.
Lecture Topics: 11/1 General Operating System Concepts Processes
Morgan Kaufmann Publishers Computer Performance
Time Sources and Timing
System calls….. C-program->POSIX call
Performance.
Code Optimization and Performance
In Today’s Class.. General Kernel Responsibilities Kernel Organization
Chapter 13: I/O Systems “The two main jobs of a computer are I/O and [CPU] processing. In many cases, the main job is I/O, and the [CPU] processing is.
CS2100 Computer Organisation
Presentation transcript:

Measuring Time Alan L. Cox Some slides adapted from CMU slides

Alan L. CoxMeasuring Time2 Computer Time Scales Two Fundamental Time Scales  Processor: ~10 –9 s  External events: ~10 –2 s Keyboard input Disk seek Screen refresh Implication  Can execute many instructions while waiting for external event to occur  Can alternate among processes without anyone noticing Time Scale (1 GHz Machine) 1.E-091.E-061.E-031.E+00 Time (seconds) 1 ns 1  s 1 ms1 s Integer Add FP Multiply FP Divide Keystroke Interrupt Handler Disk Access Screen Refresh Keystroke Microscopic Macroscopic

Alan L. CoxMeasuring Time3 Measurement Challenge How Much Time Does Program X Require?  CPU time How many total seconds are used when executing X? Measure used for most applications Small dependence on other system activities  Actual (“Wall Clock”) Time How many seconds elapse between the start and the completion of X? Depends on system load, I/O times, etc. Confounding Factors  How does time get measured?  Many processes share computing resources Transient effects when switching from one process to another Suddenly, the effects of alternating among processes become noticeable

Alan L. CoxMeasuring Time4 “Time” on a Computer System real (wall clock) time = user time (time executing instructions in the user process) += real (wall clock) time We will use the word “time” to refer to user time = system time (time executing instructions in kernel on behalf of user process) + = some other user’s time (time executing instructions for a different user process) cumulative user time

Alan L. CoxMeasuring Time5 Activity Periods: Light Load  Most of the time spent executing one process  Periodic interrupts every 10ms Interval timer Keep system from executing one process to exclusion of others  Other interrupts Due to I/O activity  Inactivity periods System time spent processing interrupts Activity Periods, Load = Time (ms) Active Inactive

Alan L. CoxMeasuring Time6 Activity Periods: Heavy Load  Sharing processor with one other active process  From perspective of this process, system appears to be “inactive” for ~50% of the time Other process is executing Activity Periods, Load = Time (ms) Active Inactive

Alan L. CoxMeasuring Time7 Interval Counting OS Measures Runtimes Using Interval Timer  Maintain 2 counts per process User time System time  Each timer interrupt, increment counter for executing process User time if running in user mode System time if running in kernel mode

Alan L. CoxMeasuring Time8 Interval Counting Example

Alan L. CoxMeasuring Time9 Unix time Command  0.82 seconds user time 82 timer intervals  0.30 seconds system time 30 timer intervals  1.32 seconds wall clock time  84.8% of total was used running these processes ( )/1.32 =.848 unix% time make osevent gcc -O2 -Wall –Wextra -g -c clock.c gcc -O2 -Wall –Wextra -g -c options.c gcc -O2 -Wall –Wextra -g -c load.c gcc -O2 -Wall –Wextra -g -o osevent u 0.300s 0: %

Alan L. CoxMeasuring Time10 Accuracy of Interval Counting Worst Case Analysis  Timer Interval =   Single process segment measurement can be off by   No bound on error for multiple segments Could consistently underestimate, or consistently overestimate A A Minimum Maximum A A Minimum Maximum Computed time = 70ms Min Actual = 60 +  Max Actual = 80 – 

Alan L. CoxMeasuring Time11 Accuracy of Interval Counting Average Case Analysis  Over/underestimates tend to balance out  As long as total run time is sufficiently large Min run time ~1 second 100 timer intervals  Consistently miss ~4% overhead due to timer interrupts A A Minimum Maximum A A Minimum Maximum Computed time = 70ms Min Actual = 60 +  Max Actual = 80 – 

Alan L. CoxMeasuring Time12 Cycle Counters Most modern systems have built in registers that are incremented every clock cycle  Very fine grained  Often counts elapsed global time On x86 and x86-64 machines:  64 bit counter Cycle counter period: –A 3 GHz machine wraps around every 195 years  Special instruction to access

Alan L. CoxMeasuring Time13 x86 Cycle Counter RDTSC  Assembly instruction to access 64-bit cycle counter  Places low/high 32 bits in two different registers  Expressed as machine cycles, not nanoseconds uint64_t rdtsc(void) { uint32_t low, high; /* Get cycle counter */ asm("rdtsc“ : "=a" (low), "=d" (high)); /* %eax, %edx */ return (low | ((uint64_t)high << 32)); }

Alan L. CoxMeasuring Time14 Measuring Cycles with rdtsc() Idea  Get current cycle counter  Compute something  Get new cycle counter  Perform 64-bit subtraction to get elapsed cycles uint64_t start, end; int i; int iters = 100; start = rdtsc(); for (i = 0; i < iters; i++) getpid(); end = rdtsc(); printf(“getpid(): Average cycles = %ld\n”, (end – start) / iters);

Alan L. CoxMeasuring Time15 Converting Cycles to Seconds Idea  Compute elapsed cycles  Get processor’s clock frequency (cycles/second)  Divide elapsed cycles by the clock frequency How do you get the clock frequency? UNIX% cat /proc/cpuinfo processor : 0 vendor_id : GenuineIntel cpu family : 6 model : 23 model name : Intel(R) Xeon(R) CPU 3.16GHz stepping : 6 cpu MHz : cache size : 6144 KB …

Alan L. CoxMeasuring Time16 Measurement Pitfalls Overhead/resolution  Calling rdtsc() incurs some overhead  Resolution of rdtsc() may not allow very short code sequences to be timed  Want to measure long enough code sequence to compensate Unexpected Cache Effects  artificial hits or misses  e.g., these measurements were taken with the Alpha cycle counter:  foo1(array1, array2, array3); /* 68,829 cycles */  foo2(array1, array2, array3); /* 23,337 cycles */ vs.  foo2(array1, array2, array3); /* 70,513 cycles */  foo1(array1, array2, array3); /* 23,203 cycles */

Alan L. CoxMeasuring Time17 Dealing with Overhead & Cache Effects  Always execute function once to “warm up” cache  Keep doubling number of times execute P() until reach some threshold (i.e., CMIN = 50000) int cnt = 1; int i; uint64_t start, end, tm; do { P(); /* Warm up cache */ start = rdtsc(); for (i = 0; i < cnt; i++) P(); end = rdtsc(); tm = (end – start) / cnt; cnt += cnt; } while ((end – start) < CMIN); /* Make sure long enough */ return (tm);

Alan L. CoxMeasuring Time18 Multitasking Effects Cycle Counter Measures Elapsed Time  Keeps accumulating during periods of inactivity System activity Running other processes Key Observation  Cycle counter never underestimates program run time  Possibly overestimates by large amount

Alan L. CoxMeasuring Time19 High Resolution CPU Time #include struct timespec { time_t tv_sec; /* seconds */ long tv_nsec; /* nanoseconds */ }; int clock_gettime(clockid_t id, struct timespec *tp); struct timespec ts; clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts);  High resolution per-process timer based on the cycle counter However, higher overhead than rdtsc()  CLOCK_PROCESS_CPUTIME_ID is not portable (Linux only)

Alan L. CoxMeasuring Time20 Time of Day Clock  Unix gettimeofday() function Elapsed time since reference time (1/1/1970)  Implementation Uses interval counting on some machines –Coarse grained Uses cycle counter on others –Fine grained, but significant overhead and only 1 microsecond resolution #include struct timeval tstart, tfinish; double tsecs; gettimeofday(&tstart, NULL); P(); gettimeofday(&tfinish, NULL); tsecs = (tfinish.tv_sec - tstart.tv_sec) + 1e-6 * (tfinish.tv_usec - tstart.tv_usec);

Alan L. CoxMeasuring Time21 Measurement Summary Timing is highly case and system dependent  What is overall duration being measured? > 1 second: interval counting is OK << 1 second: must use cycle counters  On what hardware / OS / OS version? Accessing counters ( clock_gettime ? rdtsc ?) Timer interrupt overhead Scheduling policy Devising a Measurement Method  Long durations: use Unix time command  Short durations Use clock_gettime or gettimeofday Work directly with cycle counters

Alan L. CoxMeasuring Time22 Important Tools When Optimizing Observation  Generating assembly code Lets you see what optimizations compiler can make Understand capabilities/limitations of particular compiler Measurement  Accurately compute time taken by code Most modern machines have built in cycle counters Using them to get reliable measurements is tricky  Profile procedure calling frequencies Unix: gprof

Alan L. CoxMeasuring Time23 Profiling Example Task 1. Count word frequencies in text document 2. Produce sorted list of words in descending frequency Steps 1. Convert strings to lowercase 2. Apply hash function 3. Read words and insert into hash table Mostly list operations Maintain counter for each unique word 4. Sort results Data Set  Collected works of Shakespeare  946,596 total words, 26,596 unique  Initial implementation: 9.2 seconds Shakespeare’s most frequent words 29,801the 27,529and 21,029I 20,957to 18,514of 15,370a 14,010you 12,936my 11,722in 11,519that

Alan L. CoxMeasuring Time24 Profiling Example Augment Executable Program with Timing Functions Computes (approximate) amount of time spent in each function  Periodically (~ every 10ms) interrupt program  Determine what function is currently executing  Increment its timer by interval (e.g., 10ms) Counts how many times each function is called gcc –O2 –pg prog.c –o prog prog gprof prog Executes normally, plus generates file gmon.out Generates profile info from gmon.out

Alan L. CoxMeasuring Time25 Profiling Example: Results % cumulative self self total time seconds seconds calls ms/call ms/call name sort_words lower find_ele_rec h_add time and #calls per function Bottleneck: Inefficient sort: 1 call = 87% of CPU time average time per function call (self = function, total = function + children)

Alan L. CoxMeasuring Time26 Profiling Example Optimized: 1 Use library qsort instead

Alan L. CoxMeasuring Time27 Profiling Example Optimized: 2 Use iterative function to insert elements into linked list first or last Last  tends to place most common words at front of list More hash buckets Better hash function Move strlen out of lower loop

Alan L. CoxMeasuring Time28 Profiling Observations Benefits  Helps identify performance bottlenecks  Especially useful with large, complex systems Limitations  Only shows performance for data tested E.g., mostly short test words  linear lower didn’t gain big –Quadratic inefficiency could remain lurking in code  Timing mechanism fairly crude Only accurate for programs that run long enough (> 3 sec) for (i=0; i<strlen(str); i++) { str[i] = tolower(str[i]); }

Alan L. CoxMeasuring Time29 Better profilers Linux OProfile  No special compilation options Profile preexisting, stock applications  Uses cycle counters  Simultaneously profile application and operating system

Alan L. CoxMeasuring Time30 Next Time Virtual Memory