Chapter 11 System Performance Enhancement

Basic Operation of a Computer
- Program is loaded into memory
- Instruction is fetched from memory
- Operands are decoded and the required data fetched from the specified location (using the addressing mode built into the instruction)
- The operation corresponding to the instruction is executed
- An additional operand determines the return location for the result of the operation
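A minimal sketch, in C, of this fetch-decode-execute loop for a made-up accumulator machine. The opcodes (LOAD, ADD, STORE, HALT), the single-accumulator design, and the memory layout are illustrative assumptions, not the chapter's machine.

```c
#include <stdio.h>
#include <stdint.h>

/* Toy accumulator machine: each instruction is an opcode plus one operand
 * (a memory address). Illustrative only -- not a real ISA. */
enum { HALT = 0, LOAD = 1, ADD = 2, STORE = 3 };

int main(void) {
    /* Program and data share one memory, as in the slide's model. */
    int16_t mem[16] = {
        LOAD, 10,       /* acc = mem[10]       */
        ADD,  11,       /* acc = acc + mem[11] */
        STORE, 12,      /* mem[12] = acc       */
        HALT, 0,
        0, 0,
        7, 5, 0         /* data at addresses 10, 11, 12 */
    };
    int16_t pc = 0, acc = 0;

    for (;;) {
        int16_t op   = mem[pc];      /* fetch: instruction from memory */
        int16_t addr = mem[pc + 1];  /* decode: operand address        */
        pc += 2;
        switch (op) {                /* execute                        */
        case LOAD:  acc = mem[addr];  break;
        case ADD:   acc += mem[addr]; break;
        case STORE: mem[addr] = acc;  break;   /* result returned to memory */
        case HALT:  printf("mem[12] = %d\n", mem[12]); return 0;
        }
    }
}
```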

Performance
- CPU performs program instructions via a sequence of fetch-execute cycles
- Note: the fetch-execute cycle consists of many phases
- Performance is degraded by delays in memory access

Performance Enhancement
- RISC Architecture - Reduced Instruction Set Computing
  - Simple instructions: easier to decode and to run in parallel
  - Limited memory access - only load and store instructions touch memory
  - Many registers, and compilers that optimize their use

Performance Enhancement
- Pipelining - overlap the processing of instructions so that more than one instruction is being worked on at a given time
- While one instruction is being fetched, another may be executing
- Pipelining thus performs the fetch and execute phases in parallel (note: only one instruction at a time actually completes execution)
- Objective: start and finish one instruction per clock cycle, i.e., CPI = 1
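A back-of-the-envelope sketch of why overlapping the phases pushes CPI toward 1. The 5-stage pipeline depth and the instruction count are assumed example values, not figures from the chapter.

```c
#include <stdio.h>

int main(void) {
    long n = 1000000;          /* instructions; arbitrary example value */
    int stages = 5;            /* assumed pipeline depth                */

    long serial_cycles    = (long)stages * n;  /* finish each instruction before starting the next */
    long pipelined_cycles = n + (stages - 1);  /* fill the pipe once, then one completion per cycle */

    printf("serial CPI    = %.2f\n", (double)serial_cycles / n);
    printf("pipelined CPI = %.4f\n", (double)pipelined_cycles / n);  /* approaches 1 for large n */
    return 0;
}
```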

Pipelining (figure)

Performance Enhancement
- Superscalar design - start and finish more than one instruction per clock cycle: CPI < 1
- Executes several operations at once
- Hardware duplication to support parallelism
- CPU may have an instruction fetch unit and several execution units operating in parallel
- Hardware schedules instructions to exploit parallelism

Other Means of Improving Performance
- Multiprocessing
- Faster Clock Speed
- Wider Instructions and Data Paths
- Longer Registers
- Faster Disk Access
- Memory Enhancements

Multiprocessing
- Increase the number of processors
- Multiprocessors - computers that have multiple CPUs within a single system, sharing memory and I/O devices
- Typically 2-4 processors
- Tightly coupled system

Typical Multiprocessing System

Symmetrical Multiprocessing (SMP) Systems
- Each CPU operates independently
- Each CPU has access to all system resources (memory and I/O)
- Any CPU can respond to an interrupt
- A program in memory can be executed by any CPU
- Each CPU has identical access to the OS
- Each CPU performs its own dispatch scheduling - that is, determining which program it will execute
- Very controlled environment - CPUs, memory, I/O devices, and OS are designed to operate together, and communication is built into the system

Increase Clock Speed
- Faster clock speeds improve the overall speed of the system, since instruction cycle time scales with the clock period (i.e., it is inversely proportional to clock speed)
- Limitation - the ability of the CPU, buses, and other components to keep up
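A rough sketch of this relationship using the standard execution-time formula (CPU time = instruction count × CPI ÷ clock rate). The instruction count, CPI, and clock rates below are made-up example values, not figures from the chapter.

```c
#include <stdio.h>

int main(void) {
    double instructions = 2e9;   /* example program size        */
    double cpi          = 1.5;   /* example cycles per instruction */
    double clock_hz     = 2e9;   /* 2 GHz */

    /* The instruction cycle time scales with the clock period (1 / clock rate),
     * so doubling the clock rate halves CPU time, if CPI is unchanged. */
    double seconds = instructions * cpi / clock_hz;
    printf("CPU time at 2 GHz: %.2f s\n", seconds);
    printf("CPU time at 4 GHz: %.2f s\n", instructions * cpi / (2 * clock_hz));
    return 0;
}
```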

Wider Instruction and Data Paths
- The ability to process more bits at a time improves performance
- CPU can fetch or store more data in a single operation
- CPU can fetch more instructions at a time
- Memory accesses are slow compared to CPU operations, so fetching more per access improves performance

Longer Registers
- Longer registers (more bits) within the CPU reduce the number of program steps needed to complete a calculation
- Example - using 16-bit registers for 64-bit addition requires 4 additions, plus steps to handle carries between registers, and 4 moves to transfer the result to memory (see the sketch below)
- With 64-bit registers, only a single addition and a single move to memory via a wider internal bus
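A sketch of the example above: adding two 64-bit values held as four 16-bit pieces with explicit carry handling, versus a single 64-bit addition. The limb layout and the operand values are illustrative.

```c
#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint64_t a = 0x0123456789ABCDEFULL;
    uint64_t b = 0x0FEDCBA987654321ULL;

    /* 16-bit registers: split each operand into four 16-bit limbs
     * (least significant first) and add with carry propagation --
     * four adds plus carry handling, as the slide describes. */
    uint16_t al[4], bl[4], rl[4];
    for (int i = 0; i < 4; i++) {
        al[i] = (uint16_t)(a >> (16 * i));
        bl[i] = (uint16_t)(b >> (16 * i));
    }
    uint32_t carry = 0;
    for (int i = 0; i < 4; i++) {
        uint32_t sum = (uint32_t)al[i] + bl[i] + carry;
        rl[i] = (uint16_t)sum;
        carry = sum >> 16;
    }
    uint64_t r16 = 0;
    for (int i = 0; i < 4; i++) r16 |= (uint64_t)rl[i] << (16 * i);

    /* 64-bit registers: one addition does the same work. */
    uint64_t r64 = a + b;

    printf("16-bit limbs: %016llx\n", (unsigned long long)r16);
    printf("64-bit add:   %016llx\n", (unsigned long long)r64);
    return 0;
}
```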

Faster Disk Access
- Small improvements in disk access time can yield significant improvements in system performance
- Approach - distribute data among multiple devices so that it can be accessed simultaneously from different devices
- Manufacturers continue to produce disk drives that are smaller and more densely packed

Larger/Faster Memory
- Increased amounts of memory provide larger buffers that can hold data and programs transferred from I/O devices
- Reduces the number of disk accesses
- Faster memory reduces the number of wait states that must be inserted into the instruction cycle when memory is accessed
- Memory access time can also be reduced via RISC architecture (more registers) and by providing wider memory data paths (8 bytes)

Memory
- DRAM - dynamic RAM - inexpensive, requires less electrical power, and more compact (more bits of memory in a single integrated circuit); requires periodic refreshing
- SRAM - static RAM - significantly faster, but more expensive and requires more chips
- It is impractical to build all of main memory from SRAM
- Solution - cache memory
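Why a small, fast SRAM cache in front of large, slow DRAM works can be estimated with the usual average-memory-access-time calculation. The latency and hit-rate figures below are assumptions for illustration, not numbers from the chapter.

```c
#include <stdio.h>

int main(void) {
    double hit_time_ns     = 1.0;    /* assumed SRAM cache access time */
    double miss_penalty_ns = 50.0;   /* assumed DRAM access time       */
    double hit_rate        = 0.90;   /* locality of reference keeps this high */

    /* Average memory access time: most references are satisfied by the small
     * fast cache, so average latency stays close to SRAM speed even though
     * almost all of memory is cheap DRAM. */
    double amat = hit_time_ns + (1.0 - hit_rate) * miss_penalty_ns;
    printf("average access time = %.1f ns\n", amat);   /* 6.0 ns with these values */
    return 0;
}
```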

Cache Memory

- Cache memory is organized into blocks of 8-16 bytes each
- Each block holds an exact copy of data stored in main memory
- Each block has a tag that identifies the location in main memory of the data contained in the block
- 64 KB of cache => 8,192 blocks of data (at 8 bytes per block)
- A CPU request for memory is handled by the cache controller, which checks the tags for the desired location: hit => data is in the cache; miss => data is not present
- Read => transfer data from the cache to the CPU; write => store data, with its tag, in cache memory
- On a miss, data is copied from main memory into the cache
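A minimal sketch of the tag check a cache controller performs, assuming a direct-mapped cache with 8-byte blocks and 8,192 of them (64 KB, matching the numbers above). The address split, structures, and simulated main memory are illustrative; a real controller does this in hardware, in parallel with the data access.

```c
#include <stdint.h>
#include <stdbool.h>
#include <string.h>
#include <stdio.h>

#define BLOCK_SIZE 8        /* bytes per block        */
#define NUM_BLOCKS 8192     /* 8192 * 8 bytes = 64 KB */

struct cache_block {
    bool     valid;
    uint32_t tag;               /* identifies which memory block is held */
    uint8_t  data[BLOCK_SIZE];  /* copy of the data in main memory       */
};

static struct cache_block cache[NUM_BLOCKS];
static uint8_t main_memory[1 << 20];   /* 1 MB of simulated main memory */

/* Read one byte through the cache: hit -> serve from the cache,
 * miss -> copy the containing block in from main memory first. */
uint8_t cache_read(uint32_t addr) {
    uint32_t block_no = addr / BLOCK_SIZE;
    uint32_t index    = block_no % NUM_BLOCKS;   /* which cache slot       */
    uint32_t tag      = block_no / NUM_BLOCKS;   /* disambiguates the slot */
    uint32_t offset   = addr % BLOCK_SIZE;

    struct cache_block *b = &cache[index];
    if (!b->valid || b->tag != tag) {            /* miss */
        memcpy(b->data, &main_memory[block_no * BLOCK_SIZE], BLOCK_SIZE);
        b->tag = tag;
        b->valid = true;
    }
    return b->data[offset];                      /* hit path */
}

int main(void) {
    main_memory[12345] = 42;
    printf("%d\n", cache_read(12345));   /* miss, block loaded, then served */
    printf("%d\n", cache_read(12345));   /* hit */
    return 0;
}
```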

Cache Illustration

Cache Situations
- Full cache: use the LRU (Least Recently Used) algorithm - replace the block that has not been accessed for the longest time
- Memory write: if the block to be replaced has been altered, it is first written back to memory before being replaced
- The cache controller manages the entire cache operation; the CPU is unaware of the cache's presence
- Why does cache work? Locality of reference - empirical studies show that most well-written programs confine memory references to a few small regions of memory (e.g., sequential instructions, loops, small procedures, or array data), giving hit ratios of about 90%
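A sketch of the replacement decision described above, assuming a small fully associative cache with LRU replacement and write-back of altered (dirty) blocks. The structure sizes and the timestamp-based LRU bookkeeping are illustrative choices.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define WAYS 4   /* illustrative: a tiny fully associative cache */

struct line {
    bool     valid, dirty;     /* dirty = altered since it was loaded */
    uint32_t tag;
    uint64_t last_used;        /* timestamp for the LRU decision      */
};

static struct line cache[WAYS];
static uint64_t now;           /* monotonically increasing access count */

/* Find a line for 'tag': reuse a hit or an empty line, otherwise evict the
 * least recently used line, writing it back first if it has been altered. */
struct line *lookup(uint32_t tag) {
    struct line *victim = &cache[0];
    for (int i = 0; i < WAYS; i++) {
        if (cache[i].valid && cache[i].tag == tag) {   /* hit */
            cache[i].last_used = ++now;
            return &cache[i];
        }
        if (!cache[i].valid)
            victim = &cache[i];                        /* prefer an empty line */
        else if (victim->valid && cache[i].last_used < victim->last_used)
            victim = &cache[i];                        /* track the LRU line   */
    }
    if (victim->valid && victim->dirty)
        printf("write back block %u before replacing it\n", (unsigned)victim->tag);
    victim->valid = true;
    victim->dirty = false;
    victim->tag = tag;
    victim->last_used = ++now;
    return victim;                                     /* miss: block (re)filled */
}

int main(void) {
    for (uint32_t t = 0; t < 5; t++)
        lookup(t)->dirty = true;   /* fills the cache, then forces one eviction */
    lookup(1);                     /* hit: no write-back needed                 */
    return 0;
}
```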

Two-Level Cache System