Introduction to ECE 454 Computer Systems Programming Topics: Lecture topics and assignments Profiling rudiments Lab schedule and rationale Cristiana Amza.

Slides:



Advertisements
Similar presentations
Code Optimization and Performance Chapter 5 CS 105 Tour of the Black Holes of Computing.
Advertisements

Part IV: Memory Management
Software & Services Group, Developer Products Division Copyright© 2010, Intel Corporation. All rights reserved. *Other brands and names are the property.
ECE 454 Computer Systems Programming Compiler and Optimization (I) Ding Yuan ECE Dept., University of Toronto
OS Fall’02 Virtual Memory Operating Systems Fall 2002.
Fundamentals of Python: From First Programs Through Data Structures
CMPE 421 Parallel Computer Architecture MEMORY SYSTEM.
Fall 2011SYSC 5704: Elements of Computer Systems 1 SYSC 5704 Elements of Computer Systems Optimization to take advantage of hardware.
CMPT 300: Operating Systems I Dr. Mohamed Hefeeda
Introduction to Computer Systems Topics: Staff, text, and policies Lecture topics and assignments Lab rationale and infrastructure F ’08 class01b.ppt.
1 School of Computing Science Simon Fraser University CMPT 300: Operating Systems I Dr. Mohamed Hefeeda.
1 Lecture 6 Performance Measurement and Improvement.
OS Fall ’ 02 Introduction Operating Systems Fall 2002.
Code Optimization I September 24, 2007 Topics Machine-Independent Optimizations Basic optimizations Optimization blockers class08.ppt F’07.
Threads 1 CS502 Spring 2006 Threads CS-502 Spring 2006.
Incremental Path Profiling Kevin Bierhoff and Laura Hiatt Path ProfilingIncremental ApproachExperimental Results Path profiling counts how often each path.
Code Optimization I: Machine Independent Optimizations Sept. 26, 2002 Topics Machine-Independent Optimizations Code motion Reduction in strength Common.
Lecture 3: Computer Performance
1 ES 314 Advanced Programming Lec 2 Sept 3 Goals: Complete the discussion of problem Review of C++ Object-oriented design Arrays and pointers.
CS 213 Introduction to Computer Systems Course Organization David O’Hallaron August 28, 2001 Topics: Staff, text, and policies Lecture topics and assignments.
Cmpt-225 Simulation. Application: Simulation Simulation  A technique for modeling the behavior of both natural and human-made systems  Goal Generate.
1 Performance Measurement CSE, POSTECH 2 2 Program Performance Recall that the program performance is the amount of computer memory and time needed to.
Basics of Operating Systems March 4, 2001 Adapted from Operating Systems Lecture Notes, Copyright 1997 Martin C. Rinard.
Chapter 91 Memory Management Chapter 9   Review of process from source to executable (linking, loading, addressing)   General discussion of memory.
1 1 Profiling & Optimization David Geldreich (DREAM)
Chapter 3 Memory Management: Virtual Memory
Chocolate Bar! luqili. Milestone 3 Speed 11% of final mark 7%: path quality and speed –Some cleverness required for full marks –Implement some A* techniques.
1 COMPSCI 110 Operating Systems Who - Introductions How - Policies and Administrative Details Why - Objectives and Expectations What - Our Topic: Operating.
LOGO OPERATING SYSTEM Dalia AL-Dabbagh
Operating System Review September 10, 2012Introduction to Computer Security ©2004 Matt Bishop Slide #1-1.
Parallel Programming Models Jihad El-Sana These slides are based on the book: Introduction to Parallel Computing, Blaise Barney, Lawrence Livermore National.
Lecture 8. Profiling - for Performance Analysis - Prof. Taeweon Suh Computer Science Education Korea University COM503 Parallel Computer Architecture &
1 Advance Computer Architecture CSE 8383 Ranya Alawadhi.
Timing and Profiling ECE 454 Computer Systems Programming Topics: Measuring and Profiling Cristiana Amza.
Application Profiling Using gprof. What is profiling? Allows you to learn:  where your program is spending its time  what functions called what other.
CSE 303 Concepts and Tools for Software Development Richard C. Davis UW CSE – 12/6/2006 Lecture 24 – Profilers.
University of Amsterdam Computer Systems – optimizing program performance Arnoud Visser 1 Computer Systems Optimizing program performance.
CS 3500 L Performance l Code Complete 2 – Chapters 25/26 and Chapter 7 of K&P l Compare today to 44 years ago – The Burroughs B1700 – circa 1974.
Navigation Timing Studies of the ATLAS High-Level Trigger Andrew Lowe Royal Holloway, University of London.
Introduction to Computer Systems Topics: Staff, text, and policies Lecture topics and assignments Lab rationale CS 213 F ’02 class01b.ppt “The Class.
Memory Hierarchy Adaptivity An Architectural Perspective Alex Veidenbaum AMRM Project sponsored by DARPA/ITO.
Lecture 5: Threads process as a unit of scheduling and a unit of resource allocation processes vs. threads what to program with threads why use threads.
Radix Sort and Hash-Join for Vector Computers Ripal Nathuji 6.893: Advanced VLSI Computer Architecture 10/12/00.
Compiler Optimizations ECE 454 Computer Systems Programming Topics: The Role of the Compiler Common Compiler (Automatic) Code Optimizations Cristiana Amza.
Machine Independent Optimizations Topics Code motion Reduction in strength Common subexpression sharing.
NETW3005 Memory Management. Reading For this lecture, you should have read Chapter 8 (Sections 1-6). NETW3005 (Operating Systems) Lecture 07 – Memory.
Programming for Performance CS 740 Oct. 4, 2000 Topics How architecture impacts your programs How (and how not) to tune your code.
Chapter 9: Web Services and Databases Title: NiagaraCQ: A Scalable Continuous Query System for Internet Databases Authors: Jianjun Chen, David J. DeWitt,
Time Management.  Time management is concerned with OS facilities and services which measure real time.  These services include:  Keeping track of.
Operating Systems: Summary INF1060: Introduction to Operating Systems and Data Communication.
Code Optimization and Performance I Chapter 5 perf01.ppt CS 105 “Tour of the Black Holes of Computing”
Software Engineering Prof. Dr. Bertrand Meyer March 2007 – June 2007 Chair of Software Engineering Lecture #20: Profiling NetBeans Profiler 6.0.
Introduction to Performance Tuning Chia-heng Tu PAS Lab Summer Workshop 2009 June 30,
Measuring Time Alan L. Cox Some slides adapted from CMU slides.
CSE 340 Computer Architecture Summer 2016 Understanding Performance.
July 10, 2016ISA's, Compilers, and Assembly1 CS232 roadmap In the first 3 quarters of the class, we have covered 1.Understanding the relationship between.
Code Optimization.
Chapter 8: Main Memory.
Code Optimization I: Machine Independent Optimizations
Introduction to Computer Systems
Code Optimization I: Machine Independent Optimizations Feb 11, 2003
Programming Languages
Memory Management Overview
Code Optimization I Nov. 25, 2008
Lecture Topics: 11/1 General Operating System Concepts Processes
Introduction to Computer Systems
Machine-Independent Optimization
CSCE156: Introduction to Computer Science II
Code Optimization and Performance
Presentation transcript:

Introduction to ECE 454 Computer Systems Programming Topics: Lecture topics and assignments Profiling rudiments Lab schedule and rationale Cristiana Amza

– 2 – Lecture Topics Module 1 code optimization principles, and profiling, (also - measuring time on a computer) Module 2 the memory hierarchy, caches, locality Module 3 virtual memory, dynamic memory allocation Module 4 threads, parallel programming Module 5 Concurrency and the Architecture, other programming models (if time)

– 3 – Performance (1) Topics code optimization principles, measuring time on a computer and profilingAssignments L1: Code Performance Profiling, bottleneck determination

– 4 – The Memory Hierarchy (2) Topics Code optimization rules of thumb Memory technology, memory hierarchy, caches, locality Includes aspects of architecture and OS.Assignments L2: Optimizing Code Performance

– 5 – Virtual memory (3) Topics Virtual memory, dynamic memory allocation Includes aspects of architecture and OSAssignments L3: Writing your own malloc package

– 6 – Concurrency (4) Topics concurrency, threads. includes some aspects of networking, OS, and architecture.Assignments L4: Code parallelization for improved performance.

– 7 – Concurrency and the Architecture (5) Topics The connection between lock/mutex and cache coherence Various lock-based and lock-free parallelization schemes and standards Other parallel programming standardsAssignments L5: Parallel Code Optimization (game code)

– 8 – Module 1: code optimization principles, and profiling

– 9 – Convention: Cycles Per Element Convenient way to express performance of program that operates on vectors or lists Length = n T = CPE*n + Overhead vsum1 Slope = 4.0 vsum2 Slope = 3.5

– 10 – Clock Cycles Most computers controlled by high frequency clock Examples 100 MHz »10 8 cycles per second »Clock period = 10ns 2 GHz »2 X 10 9 cycles per second »Clock period = 0.5ns

– 11 – Role of Optimizing Compilers Provide efficient mapping of program to machine register allocation code selection and ordering eliminating minor inefficiencies Don’t (usually) improve asymptotic efficiency up to programmer to select best overall algorithm big-O savings are (often) more important than constant factors but constant factors also matter

– 12 – Role of Optimizing Compilers Provide efficient mapping of program to machine register allocation code selection and ordering eliminating minor inefficiencies Don’t (usually) improve asymptotic efficiency up to programmer to select best overall algorithm big-O savings are (often) more important than constant factors but constant factors also matter Have difficulty overcoming “optimization blockers” potential memory aliasing potential procedure side-effects

– 13 – Limitations of Optimizing Compilers Operate Under Fundamental Constraint Must not cause any change in program behavior under any possible condition Often prevents it from making optimizations when would only affect behavior under pathological conditions. Most analysis is performed only within procedures whole-program analysis is too expensive in most cases Most analysis is based only on static information compiler has difficulty anticipating run-time inputs When in doubt, the compiler must be conservative

– 14 – Role of Programmer How should I write my programs, given that I have a good, optimizing compiler? Don’t: Smash Code into Oblivion Hard to read, maintain, & assure correctnessDo: Select best algorithm Write code that’s readable & maintainable Procedures, recursion Even though these factors can slow down code Eliminate optimization blockers Allows compiler to do its job Focus on Inner Loops Do detailed optimizations where code will be executed repeatedly Will get most performance gain here

– 15 – Performance (1) Tools Measurement Accurately compute time taken by code Most modern machines have built in cycle counters Using them to get reliable measurements is tricky Profile procedure calling frequencies Unix tool gprof

– 16 – Code Profiling Example Task Count word frequencies in text document Produce sorted list of words from most frequent to leastSteps Convert strings to lowercase Apply hash function Read words and insert into hash table Mostly list operations Maintain counter for each unique word Sort results Data Set Collected works of Shakespeare 946,596 total words, 26,596 unique Initial implementation: 9.2 seconds 29,801the 27,529and 21,029I 20,957to 18,514of 15,370a 14010you 12,936my 11,722in 11,519that Shakespeare’s most frequent words

– 17 – Code Profiling Augment Executable Program with Instrumentation Computes (approximately) amount of time spent in each function Time computation method Periodically (~ every 10ms) interrupt program Determine what function is currently executing Increment its timer by interval (e.g., 10ms) Also maintains counter for each function indicating number of times called

– 18 – Code Profiling Augment Executable Program with Instrumentation Computes (approximately) amount of time spent in each function Time computation method Periodically (~ every 10ms) interrupt program Determine what function is currently executing Increment its timer by interval (e.g., 10ms) Also maintains counter for each function indicating number of times calledUsing gcc –O2 –pg prog.c –o prog./prog Executes in normal fashion, but also generates file gmon.out gprof prog Generates profile information based on gmon.out

– 19 – Profiling Results Call Statistics Number of calls and cumulative time for each function Performance Limiter Using inefficient sorting algorithm Single call uses 87% of CPU time % cumulative self self total time seconds seconds calls ms/call ms/call name sort_words lower find_ele_rec h_add

– 20 – Code Optimizations First step: Use more efficient sorting function Library function qsort

– 21 – Further Optimizations Iter first: Use iterative function to insert elements into linked list Causes code to slow down Iter last: Iterative function, places new entry at end of list Tend to place most common words at front of list Big table: Increase number of hash buckets Better hash: Use more sophisticated hash function Linear lower: Move strlen out of loop

– 22 – Profiling Observations Benefits Helps identify performance bottlenecks Especially useful when have complex system with many componentsLimitations Only shows performance for data tested E.g., linear lower did not show big gain, since words are short Quadratic inefficiency could remain lurking in code Timing mechanism fairly crude Only works for programs that run for > 3 seconds

– 23 – Lab Rationale Each lab well-defined goal such as determining the bottleneck of a program or a specific optimization. A (small) portion of some labs will be awarded for winning a performance contest or student-driven optimizations. We try to use competition in a fun and healthy way. Set a threshold for full credit. Might post some results (anonymized) on Web page for glory!

– 24 – Lab Rationale Doing a lab should result in new skills and concepts Profiling Lab: basic performance profiling, finding the bottleneck. Programming for cache: profiling, measurement, locality enhancements. Malloc Lab: understanding dynamic memory allocation Code Parallelization: multithreading of simple code. Parallel Code Optimization: putting all techniques together on simple game code parallelization

– 25 – Good Luck!