Time Measurement
CS 201, Gerson Robboy, Portland State University
Topics
  Time scales
  Interval counting
  Cycle counters
  K-best measurement scheme



– 2 – Computer Time Scales
Two Fundamental Time Scales
  Processor: ~10^-9 sec.
  External events: ~10^-2 sec.
    Keyboard input
    Disk seek
    Screen refresh
Implication
  Can execute many instructions while waiting for external event to occur
  Can alternate among processes without anyone noticing
[Figure: Time Scale (1 GHz Machine). Log axis of time in seconds: 1 ns, 1 μs, 1 ms, 1 s. Microscopic: Integer Add, FP Multiply, FP Divide. Macroscopic: Keystroke Interrupt Handler, Disk Access, Screen Refresh, Keystroke.]

– 3 – Measurement Challenge
How Much Time Does Program X Require?
  CPU time
    How many total seconds are used when executing X?
    Small dependence on other system activities
  Actual (“Wall”) Time
    How many seconds elapse between the start and the completion of X?
    Depends on system load, I/O times, etc.
Confounding Factors
  How does time get measured?
  Many processes share computing resources

– 4 – “Time” on a Computer System
real (wall clock) time = user time + system time + some other user’s time
  user time: time executing instructions in the user process
  system time: time executing instructions in kernel on behalf of user process
  some other user’s time: time executing instructions in different user’s process
We will use the word “time” to refer to user time.
[Figure: timeline of real (wall clock) time divided into user, system, and other-user segments; the user segments add up to the cumulative user time.]

– 5 – Activity Periods: Light Load
Most of the time spent executing one process
Periodic interrupts every 10ms
  Interval timer
  Schedules processes to run
Other interrupts
  Due to I/O activity
Inactivity periods
  System time spent processing interrupts
  ~250,000 clock cycles
[Figure: Activity Periods. Active and inactive periods plotted against Time (ms).]

– 6 – Activity Periods: Heavy Load
Sharing processor with one other active process
From perspective of this process, system appears to be “inactive” for ~50% of the time
  Other process is executing
[Figure: Activity Periods. Active and inactive periods plotted against Time (ms).]

– 7 – Interval Counting
OS Measures Runtimes Using Interval Timer
  In other words, statistical sampling. Similar to profiling.
  Maintain 2 counts per process
    User time
    System time
  On each timer interrupt, increment counter for executing process
    User time if running in user mode
    System time if running in kernel mode
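The sampling rule on this slide can be sketched as a small simulation. This is only an illustration, not how any particular kernel is written: the types `segment_t` and `tick_counts_t` and the function `interval_count` are hypothetical names, and the timeline of one process is assumed to be given as a list of user-mode and kernel-mode segments.

```c
#include <stddef.h>

/* One contiguous stretch of execution of the measured process
 * (hypothetical type, used only for this illustration). */
typedef struct {
    double start_ms;  /* when the segment begins */
    double end_ms;    /* when the segment ends (exclusive) */
    int    kernel;    /* 0 = user mode, 1 = kernel mode */
} segment_t;

/* The two counts the OS keeps per process. */
typedef struct {
    unsigned user_ticks;
    unsigned system_ticks;
} tick_counts_t;

/* Timer interrupts fire at interval_ms, 2*interval_ms, ... up to total_ms.
 * At each interrupt, charge one tick to the mode the process is in at that
 * instant; charge nothing if the process is not running. */
tick_counts_t interval_count(const segment_t *segs, size_t n,
                             double interval_ms, double total_ms)
{
    tick_counts_t counts = {0, 0};
    for (unsigned k = 1; k * interval_ms <= total_ms; k++) {
        double t = k * interval_ms;
        for (size_t i = 0; i < n; i++) {
            if (t >= segs[i].start_ms && t < segs[i].end_ms) {
                if (segs[i].kernel)
                    counts.system_ticks++;
                else
                    counts.user_ticks++;
                break;
            }
        }
    }
    return counts;
}
```

For a process that runs in user mode for 0 to 12 ms, in the kernel for 12 to 25 ms, and in user mode again for 25 to 40 ms, a 10 ms timer charges 2 user ticks and 1 system tick (20 ms and 10 ms), although the actual times are 27 ms and 13 ms.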

– 8 – Interval Counting Example
[Figure: (a) Interval Timings and (b) Actual Times for two processes A and B, with segments Au, As, Bu, Bs (user and system time of each process). Actual user times shown: A 120.0u, B 73.3u.]
Exercise: What timings does the interval timer give us for Au, As, Bu, and Bs? How far off are they from “actual?”

– 9 – Unix time Command

    time make osevent
    gcc -O2 -Wall -g -march=i486 -c clock.c
    gcc -O2 -Wall -g -march=i486 -c options.c
    gcc -O2 -Wall -g -march=i486 -c load.c
    gcc -O2 -Wall -g -march=i486 -o osevent osevent.c
    0.820u 0.300s 0:01.32 84.8% 0+0k 0+0io 4049pf+0w

0.82 seconds user time
  82 timer intervals
0.30 seconds system time
  30 timer intervals
1.32 seconds wall time
84.8% of total was used running these processes
  (0.82 + 0.30)/1.32 = 0.848
Exactly what process or processes are using that 0.82 seconds of user time?
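A program can also read its own user and system times instead of relying on the shell's time command. The sketch below uses the POSIX getrusage() call; the wrapper name `cpu_seconds` is an assumption made for this example, and whether the values come from interval counting or from finer-grained accounting depends on the OS.

```c
#include <sys/time.h>
#include <sys/resource.h>

/* Read the user and system CPU time charged to the calling process so far.
 * Returns 0 on success, -1 if getrusage() fails.
 * (cpu_seconds is a hypothetical helper name, not a library function.) */
int cpu_seconds(double *user_s, double *system_s)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    *user_s   = ru.ru_utime.tv_sec + 1e-6 * ru.ru_utime.tv_usec;
    *system_s = ru.ru_stime.tv_sec + 1e-6 * ru.ru_stime.tv_usec;
    return 0;
}
```

Calling it before and after a region of code and subtracting gives the user and system time charged to that region.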

– 10 – Accuracy of Interval Counting
Worst Case Analysis
  Single process segment measurement can be off by how much?
  No bound on error for multiple segments
    Could consistently underestimate, or consistently overestimate
    Pretty unlikely
[Figure: process segment A at its minimum and maximum extent. Computed time = 70ms, Min Actual = 60 + ε, Max Actual = 80 – ε.]
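The numbers in the figure follow from simple arithmetic: a segment charged with n ticks of a 10 ms timer covers n interrupts, so it lasted more than (n – 1) intervals and less than (n + 1) intervals. A minimal sketch, with the hypothetical names `interval_bounds_t` and `interval_bounds`:

```c
/* Bounds on the true length of one process segment that was charged
 * `ticks` timer intervals. The min and max are exclusive limits: the
 * actual time lies strictly between them. (Hypothetical names.) */
typedef struct {
    double computed_ms;    /* what interval counting reports */
    double min_actual_ms;  /* actual time is greater than this */
    double max_actual_ms;  /* actual time is less than this */
} interval_bounds_t;

interval_bounds_t interval_bounds(unsigned ticks, double interval_ms)
{
    interval_bounds_t b;
    b.computed_ms   = ticks * interval_ms;
    b.min_actual_ms = ticks > 0 ? (ticks - 1) * interval_ms : 0.0;
    b.max_actual_ms = (ticks + 1) * interval_ms;
    return b;
}
```

With 7 ticks and a 10 ms interval this gives a computed time of 70 ms and limits of 60 ms and 80 ms, matching the figure.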

– 11 – Accuracy of Interval Counting (cont.)
Average Case Analysis
  Over/underestimates tend to balance out
  As long as total run time is sufficiently large
    Min run time ~1 second
    100 timer intervals
  Consistently miss 4% overhead due to timer interrupts
[Figure: process segment A at its minimum and maximum extent. Computed time = 70ms, Min Actual = 60 + ε, Max Actual = 80 – ε.]

– 12 – Cycle Counters
Most modern systems have built-in registers that are incremented every clock cycle
  Very fine grained
  Elapsed global time (“wall clock” time)
Accessed with a special instruction
On (recent model) Intel machines:
  It’s a 64-bit counter called the time stamp counter.
  Accessed with RDTSC instruction
  RDTSC sets %edx to high order 32 bits, %eax to low order 32 bits

– 13 – Cycle Counter Period
Wrap Around Times for 550 MHz machine
  Low order 32 bits wrap around every 2^32 / (550 * 10^6) = 7.8 seconds
  Full 64 bits wrap around every 2^64 / (550 * 10^6) ≈ 3.35 * 10^10 seconds
    Roughly 1065 years
For 2 GHz machine
  Low order 32 bits every 2.1 seconds
  Full 64 bits every 293 years
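The wrap-around times are just the number of counter values divided by the clock rate. A small sketch that reproduces the arithmetic (the function name `wrap_seconds` is made up for this example):

```c
/* Seconds until a counter of `bits` bits, incremented once per clock
 * cycle, wraps around on a clock running at `clock_hz`.
 * (wrap_seconds is a hypothetical helper name.) */
double wrap_seconds(unsigned bits, double clock_hz)
{
    double counts = 1.0;
    for (unsigned i = 0; i < bits; i++)
        counts *= 2.0;          /* counts = 2^bits, exact in a double */
    return counts / clock_hz;
}
```

For example, wrap_seconds(32, 550e6) is about 7.8 and wrap_seconds(32, 2e9) is about 2.1, as on the slide; dividing wrap_seconds(64, ...) by the number of seconds in a year gives the figures in years.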

– 14 – Measuring with Cycle Counter
Main Idea
  Get current value of cycle counter
    Store as pair of unsigneds, cyc_hi and cyc_lo
  Compute something
  Get new value of cycle counter
  Perform double precision subtraction to get elapsed cycles

    /* Keep track of most recent reading of cycle counter */
    static unsigned cyc_hi = 0;
    static unsigned cyc_lo = 0;

    void start_counter()
    {
        /* Get current value of cycle counter */
        access_counter(&cyc_hi, &cyc_lo);
    }

– 15 – So how do you implement access_counter() in C?
Need to do an RDTSC instruction.
One way: Write a function in assembly language, in a separate source file, conforming to the C language call/return conventions.
An easier way with gcc: Use inline assembly code
  When you see the syntax, you may not believe it’s easier

– 16 – Accessing Linux Cycle Counter
GCC allows inline assembly code with mechanism for matching registers with program variables
This only works on an x86 machine compiling with GCC
Emit assembly with rdtsc and two movl instructions

    void access_counter(unsigned *hi, unsigned *lo)
    {
        /* Get cycle counter */
        asm("rdtsc; movl %%edx,%0; movl %%eax,%1"
            : "=r" (*hi), "=r" (*lo)
            : /* No input */
            : "%edx", "%eax");
    }

– 17 – Extended ASM: A Closer Look
Instruction String
  Series of assembly commands
    Separated by “;” or “\n”
    Use “%%” where normally would use “%”

    asm("Instruction String"
        : Output List
        : Input List
        : Clobbers List);

    void access_counter(unsigned *hi, unsigned *lo)
    {
        /* Get cycle counter */
        asm("rdtsc; movl %%edx,%0; movl %%eax,%1"
            : "=r" (*hi), "=r" (*lo)
            : /* No input */
            : "%edx", "%eax");
    }

– 18 – Extended ASM, Cont.
Output List
  Expressions indicating destinations for values %0, %1, …, %j
    Enclosed in parentheses
    Must be lvalue
      Value that can appear on LHS of assignment
  Tag "=r" indicates that symbolic value (%0, etc.) should be replaced by a register

    asm("Instruction String"
        : Output List
        : Input List
        : Clobbers List);

    void access_counter(unsigned *hi, unsigned *lo)
    {
        /* Get cycle counter */
        asm("rdtsc; movl %%edx,%0; movl %%eax,%1"
            : "=r" (*hi), "=r" (*lo)
            : /* No input */
            : "%edx", "%eax");
    }

– 19 – Extended ASM, Cont.
Input List
  Series of expressions indicating sources for values %j+1, %j+2, …
    Enclosed in parentheses
    Any expression returning value
  Tag "r" indicates that symbolic value (%0, etc.) will come from a register

    asm("Instruction String"
        : Output List
        : Input List
        : Clobbers List);

    void access_counter(unsigned *hi, unsigned *lo)
    {
        /* Get cycle counter */
        asm("rdtsc; movl %%edx,%0; movl %%eax,%1"
            : "=r" (*hi), "=r" (*lo)
            : /* No input */
            : "%edx", "%eax");
    }

– 20 – Extended ASM, Cont.
Clobbers List
  List of register names that get altered by the assembly instructions
  Compiler will make sure it doesn’t store something in one of these registers that must be preserved across the asm
    Value set before & used after

    asm("Instruction String"
        : Output List
        : Input List
        : Clobbers List);

    void access_counter(unsigned *hi, unsigned *lo)
    {
        /* Get cycle counter */
        asm("rdtsc; movl %%edx,%0; movl %%eax,%1"
            : "=r" (*hi), "=r" (*lo)
            : /* No input */
            : "%edx", "%eax");
    }

– 21 – Accessing Cycle Counter, Cont.
Emitted Assembly Code
  Used %ecx for *hi (replacing %0)
  Used %ebx for *lo (replacing %1)
  Does not use %eax or %edx for value that must be carried across inserted assembly code

    movl 8(%ebp),%esi       # hi
    movl 12(%ebp),%edi      # lo
    rdtsc; movl %edx,%ecx; movl %eax,%ebx
    movl %ecx,(%esi)        # Store high bits at *hi
    movl %ebx,(%edi)        # Store low bits at *lo
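A common variation, shown here only as a sketch and not as the code from these slides, binds the outputs directly to %eax and %edx with GCC's "=a" and "=d" constraints. The compiler then knows where RDTSC leaves its result, so the two movl instructions and the clobber list are not needed. The name `read_cycle_counter` and the macro `HAVE_CYCLE_COUNTER` are assumptions of this example; on non-x86 targets the fallback simply returns 0.

```c
#include <stdint.h>

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
/* "=a" ties lo to %eax and "=d" ties hi to %edx, which is exactly
 * where RDTSC puts the low and high halves of the counter. */
static inline uint64_t read_cycle_counter(void)
{
    uint32_t lo, hi;
    __asm__ volatile ("rdtsc" : "=a" (lo), "=d" (hi));
    return ((uint64_t)hi << 32) | lo;
}
#define HAVE_CYCLE_COUNTER 1
#else
/* Placeholder for other targets: no cycle counter is read here. */
static inline uint64_t read_cycle_counter(void)
{
    return 0;
}
#define HAVE_CYCLE_COUNTER 0
#endif
```

Returning a single 64-bit value also removes the need for a separate high and low word in the caller.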

– 22 – Accessing the Cycle Counter (cont.)
Now are you convinced you’ll never write one of those?
Relax, no one ever does, from scratch.
  Start with an existing snippet of code that looks pretty much like what you want.
Inline assembly really comes in handy sometimes, once you get used to it.

– 23 – Completing the Measurement
Perform double precision subtraction to get elapsed cycles
Express as double to avoid overflow problems

    double get_counter()
    {
        unsigned ncyc_hi, ncyc_lo;
        unsigned hi, lo, borrow;

        /* Get cycle counter */
        access_counter(&ncyc_hi, &ncyc_lo);

        /* Do double precision subtraction */
        lo = ncyc_lo - cyc_lo;
        borrow = lo > ncyc_lo;
        hi = ncyc_hi - cyc_hi - borrow;
        return (double) hi * (1 << 30) * 4 + lo;
    }

Why do they do this?

    borrow = lo > ncyc_lo;
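One way to convince yourself that the borrow test is right is to compare the two-word subtraction against a plain 64-bit subtraction. The sketch below restates the slide's arithmetic with explicit 32-bit types; the function names `elapsed_two_words` and `elapsed_one_word` are invented for this illustration.

```c
#include <stdint.h>

/* Two-word subtraction, as on the slide. Unsigned subtraction wraps
 * modulo 2^32, so when end_lo < start_lo the result lo comes out larger
 * than end_lo. That is the only case in which lo > end_lo, and it is
 * exactly the case in which 1 must be borrowed from the high word. */
double elapsed_two_words(uint32_t start_hi, uint32_t start_lo,
                         uint32_t end_hi, uint32_t end_lo)
{
    uint32_t lo = end_lo - start_lo;
    uint32_t borrow = lo > end_lo;
    uint32_t hi = end_hi - start_hi - borrow;
    return (double)hi * 4294967296.0 + lo;   /* hi * 2^32 + lo */
}

/* The same computation when the counter is held in one 64-bit integer. */
double elapsed_one_word(uint64_t start, uint64_t end)
{
    return (double)(end - start);
}
```

For a start reading of hi = 1, lo = 0xFFFFFFF0 and an end reading of hi = 2, lo = 0x10, the low words give lo = 0x20, which is greater than 0x10, so a borrow is taken and the result is 32 cycles.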

– 24 – Timing With Cycle Counter
Determine Clock Rate of Processor
  Count number of cycles required for some fixed number of seconds

    double MHZ;
    int sleep_time = 10;
    start_counter();
    sleep(sleep_time);
    MHZ = get_counter()/(sleep_time * 1e6);

Time Function P
  First attempt: Simply count cycles for one execution of P

    double tsecs;
    start_counter();
    P();
    tsecs = get_counter() / (MHZ * 1e6);

– 25 – Measurement Pitfalls
Overhead
  Calling get_counter() incurs small amount of overhead
  Want to measure long enough code sequence to compensate
Unexpected Cache Effects
  Artificial hits or misses
  e.g., these measurements were taken with the Alpha cycle counter:

    foo1(array1, array2, array3);   /* 68,829 cycles */
    foo2(array1, array2, array3);   /* 23,337 cycles */

  vs.

    foo2(array1, array2, array3);   /* 70,513 cycles */
    foo1(array1, array2, array3);   /* 23,203 cycles */

– 26 – Dealing with Cache Effects
Execute function once to “warm up” the cache
  Both data and instructions

    P();                /* Warm up cache */
    start_counter();
    P();
    cmeas = get_counter();

Issue: Some functions don’t execute in cache
  Depends on both the algorithm and the data set
  Need to do measurements with appropriate data

– 27 – Multitasking Effects
Cycle Counter Measures Elapsed Time
  Keeps accumulating during periods of inactivity
    System activity
    Running other processes
Key Observation
  Cycle counter never underestimates program run time
  Possibly overestimates by large amount
K-Best Measurement Scheme
  Perform up to N (e.g., 20) measurements of function
  See if fastest K (e.g., 3) within some relative factor ε (e.g., 0.001)
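The K-best scheme can be sketched as follows. This is one possible reading of the slide, not the course's reference code: the type `kbest_t`, the functions `kbest_init`, `kbest_add`, and `kbest_converged`, and the limit `KBEST_MAX_K` are hypothetical. The convergence test assumed here is that the K-th fastest measurement is at most (1 + ε) times the fastest.

```c
#include <stddef.h>

#define KBEST_MAX_K 16   /* arbitrary upper limit on K for this sketch */

typedef struct {
    size_t k;                   /* how many fastest measurements to keep */
    double epsilon;             /* relative tolerance, e.g. 0.001 */
    size_t count;               /* how many slots of best[] are filled */
    double best[KBEST_MAX_K];   /* fastest measurements, ascending order */
} kbest_t;

void kbest_init(kbest_t *s, size_t k, double epsilon)
{
    if (k < 1) k = 1;
    if (k > KBEST_MAX_K) k = KBEST_MAX_K;
    s->k = k;
    s->epsilon = epsilon;
    s->count = 0;
}

/* Record one measurement, keeping only the K smallest seen so far. */
void kbest_add(kbest_t *s, double value)
{
    size_t pos;
    if (s->count < s->k)
        pos = s->count++;               /* still filling the table */
    else if (value < s->best[s->k - 1])
        pos = s->k - 1;                 /* replaces the current K-th best */
    else
        return;                         /* slower than all K best: ignore */
    /* Insertion step: shift larger entries up to keep ascending order. */
    while (pos > 0 && s->best[pos - 1] > value) {
        s->best[pos] = s->best[pos - 1];
        pos--;
    }
    s->best[pos] = value;
}

/* True when the K fastest measurements agree to within epsilon. */
int kbest_converged(const kbest_t *s)
{
    return s->count == s->k &&
           s->best[s->k - 1] <= (1.0 + s->epsilon) * s->best[0];
}
```

A driver would call kbest_add() after each timed run of the function, stop as soon as kbest_converged() returns true or after N runs, and report best[0]. Taking the minimum is justified by the key observation above: the cycle counter can overestimate but never underestimates.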

– 28 – K-Best Validation
Very good accuracy for < 8ms
  Within one timer interval
  Even when heavily loaded
Less accurate for > 10ms
  Light load: ~4% error
    Interval clock interrupt handling
  Heavy load: Very high error
[Figure: measured error of the K-best scheme with K = 3, ε = 0.001.]

– 29 – Time of Day Clock
Unix gettimeofday() function
  Return elapsed time since reference time (Jan 1, 1970)
Implementation
  Uses interval counting on some machines
    Coarse grained
  Uses cycle counter on others
    Fine grained, but significant overhead and only 1 microsecond resolution

    #include <sys/time.h>

    struct timeval tstart, tfinish;
    double tsecs;

    gettimeofday(&tstart, NULL);
    P();
    gettimeofday(&tfinish, NULL);
    tsecs = (tfinish.tv_sec - tstart.tv_sec) +
            1e-6 * (tfinish.tv_usec - tstart.tv_usec);
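The elapsed-time arithmetic can be wrapped in a small helper so the unit conversion is written only once. This is a sketch; the name `timeval_diff_secs` is made up for the example. Note that the microsecond difference may be negative (when the finish reading has a smaller tv_usec than the start reading), and the formula still gives the right answer because the seconds difference compensates.

```c
#include <sys/time.h>

/* Seconds elapsed between two gettimeofday() readings.
 * tv_usec counts microseconds, so it is scaled by 1e-6, not 1e6.
 * (timeval_diff_secs is a hypothetical helper name.) */
double timeval_diff_secs(const struct timeval *start,
                         const struct timeval *finish)
{
    return (double)(finish->tv_sec - start->tv_sec) +
           1e-6 * (double)(finish->tv_usec - start->tv_usec);
}
```

For a start reading of 1 s + 900000 μs and a finish reading of 3 s + 100000 μs, the helper returns 2 + 1e-6 * (-800000) = 1.2 seconds.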

– 30 – Nice things about gettimeofday()
Portable!
No need to do double-precision integer arithmetic
1 microsecond resolution is pretty good for most purposes

– 31 – K-Best Using gettimeofday
Linux
  As good as using cycle counter
  For times > 10 microseconds
Windows
  Implemented by interval counting
  Too coarse-grained

– 32 – Measurement Summary
Timing is highly case and system dependent
  What is overall duration being measured?
    > 1 second: interval counting is OK
    << 1 second: must use cycle counters
  On what hardware / OS / OS version?
    Accessing counters
      How gettimeofday is implemented
    Timer interrupt overhead
    Scheduling policy
Devising a Measurement Method
  Long durations: use Unix timing functions
  Short durations
    If possible, use gettimeofday
    Otherwise must work with cycle counters
    K-best scheme most successful