Status – Week 274, Victor Moya

Presentation transcript:

Status – Week 274 Victor Moya

Simulator Model
- Boxes.
  - Perform the actual work.
  - A box can only access its own data; external data must come through signals (time!).
  - The box manages its own signals.
  - A box: whatever you do in 1+ cycle (or whatever), or what a piece of hardware does.

Simulator Model
- Signals.
  - Communication between boxes.
  - Carry the simulator time: 1+ cycle latency.
  - Parameters: bandwidth, latency.
  - Storage size: bw * (lat + 1).
  - Read and write with latency 0 are not allowed.
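The signal rules above can be sketched in a few lines. This is a minimal illustration, not the simulator's actual code: the class name `Signal`, its methods, and the ring-buffer layout are assumptions. One plausible reading of the storage rule is that a write at cycle c arrives at cycle c + lat, so lat + 1 distinct cycles can be in flight at once, each holding up to bw items.

```python
class Signal:
    """Hypothetical signal: a fixed-latency, bounded-bandwidth channel between boxes."""

    def __init__(self, name, bandwidth, latency):
        if latency < 1:
            # Assumption: signals always carry at least one cycle of latency.
            raise ValueError("latency 0 is not allowed")
        if bandwidth < 1:
            raise ValueError("bandwidth must be at least 1")
        self.name = name
        self.bandwidth = bandwidth
        self.latency = latency
        # One row per in-flight cycle: bw * (lat + 1) entries in total.
        self._rows = [[] for _ in range(latency + 1)]
        self._row_cycle = [None] * (latency + 1)

    def capacity(self):
        """Total storage, following the slide's rule bw * (lat + 1)."""
        return self.bandwidth * (self.latency + 1)

    def write(self, cycle, data):
        """Send data at `cycle`; it becomes readable at `cycle + latency`."""
        arrival = cycle + self.latency
        index = arrival % (self.latency + 1)
        if self._row_cycle[index] != arrival:
            # Reuse the row: drop anything left over from an older cycle.
            self._rows[index] = []
            self._row_cycle[index] = arrival
        if len(self._rows[index]) >= self.bandwidth:
            raise RuntimeError(f"{self.name}: bandwidth exceeded at cycle {cycle}")
        self._rows[index].append(data)

    def read(self, cycle):
        """Return (and consume) everything that arrives exactly at `cycle`."""
        index = cycle % (self.latency + 1)
        if self._row_cycle[index] != cycle:
            return []
        items = self._rows[index]
        self._rows[index] = []
        self._row_cycle[index] = None
        return items
```

With bandwidth 2 and latency 3, two items written at cycle 0 come back from `read(3)` and from no earlier cycle, and a third write in the same cycle raises an error.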

Simulator Model
- Wiring boxes:
  - Global static object.
  - Creates and binds signals by name.
- Statistics:
  - Global static object.
  - Boxes declare a statistic name.
  - The statistics object manages the different statistics.
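A rough sketch of the two global objects, assuming a simple name-keyed registry. The method names (`bind`, `declare`, `add`, `report`), the stub `Signal` record, and the module-level instances `SIGNALS` and `STATISTICS` are all hypothetical; the slides only say that signals are created and bound by name and that boxes declare statistic names.

```python
from dataclasses import dataclass


@dataclass
class Signal:
    """Stub signal record; only the binding parameters matter here."""
    name: str
    bandwidth: int
    latency: int


class SignalBinder:
    """Creates a signal on first request and returns the same one afterwards."""

    def __init__(self):
        self._signals = {}

    def bind(self, name, bandwidth, latency):
        existing = self._signals.get(name)
        if existing is None:
            existing = self._signals[name] = Signal(name, bandwidth, latency)
        elif (existing.bandwidth, existing.latency) != (bandwidth, latency):
            # Assumption: both ends must agree on the signal parameters.
            raise ValueError(f"signal {name!r} bound with different parameters")
        return existing


class StatisticBinder:
    """Keeps one counter per declared statistic name."""

    def __init__(self):
        self._counters = {}

    def declare(self, name):
        self._counters.setdefault(name, 0)

    def add(self, name, amount=1):
        if name not in self._counters:
            raise KeyError(f"statistic {name!r} was never declared")
        self._counters[name] += amount

    def report(self):
        return dict(self._counters)


# Module-level instances stand in for the "global static object" on the slide.
SIGNALS = SignalBinder()
STATISTICS = StatisticBinder()
```

In this sketch the writer and the reader both call `SIGNALS.bind` with the same name and end up holding the same object, so neither box needs a reference to the other.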

Simulator Model
[Diagram labels: Box, Signal, SignalBinder, StatisticBinder, Statistic]

Simulator Model
[Diagram: Box1 and Box2 connected by a signal (write, read) with bw: 2, lat: 3]
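As a worked instance of the storage-size rule bw * (lat + 1), applied to the bw: 2, lat: 3 values on this slide:

```latex
\[
  \text{storage} = bw \times (lat + 1) = 2 \times (3 + 1) = 8 \ \text{entries}
\]
```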

Problems
- Must 0 latency for wires be forbidden? => NO!
- What happens if two boxes must communicate in the same cycle? => NOT ALLOWED!

Problems
- How to manage multiple instances of the same Box (4 vertex shaders, 8 pixel shaders)?
  - Create each instance as a different class with its own name and signal binding.
  - Box() now has a parameter Name that defines a different name for each instance (how can we guarantee the names are different?).
- How to bind signals in multiply instanced boxes (VS, PS)?
  - Signals are created/bound by the signal emitter and receiver.
  - Prefix the signal name with the instance name.
  - Add new parameters to Box() for the emitters' instance names.
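One possible way to handle the naming scheme above, as a sketch only. The registry that rejects duplicate names is an assumed answer to the open question on the slide, and the `Box` constructor shape, the helper `make_instances`, and the `Name.signal` format are hypothetical.

```python
class Box:
    """Hypothetical box base class that enforces unique instance names."""

    _names = set()  # every instance name handed out so far

    def __init__(self, name, emitters=()):
        if name in Box._names:
            raise ValueError(f"duplicate box name: {name}")
        Box._names.add(name)
        self.name = name
        # Instance names of the boxes this box receives signals from.
        self.emitters = tuple(emitters)

    def output_signal(self, signal):
        """Name of a signal this box emits, prefixed with its instance name."""
        return f"{self.name}.{signal}"

    def input_signals(self, signal):
        """Names of the same signal as emitted by each upstream instance."""
        return [f"{emitter}.{signal}" for emitter in self.emitters]


def make_instances(kind, count, emitters=()):
    """Create `count` boxes named kind0, kind1, ... so names cannot collide."""
    return [Box(f"{kind}{index}", emitters) for index in range(count)]
```

Four vertex shaders created with `make_instances("VertexShader", 4)` emit `VertexShader0.vertexOut` through `VertexShader3.vertexOut`, and a downstream box built with those four names as its emitters can bind to each of them.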

Vertex Shader
- VS 1.0 (NV20) based Vertex Shader model.
  - Multithread (multivertex?) supported.
  - No branching.
  - No texture/vertex buffer load.
  - No vertex kill.

Vertex Shader

- VS 2.0+ (NV30) based Vertex Shader model.
  - Multithreaded?? Implemented with a FP array (3DLabs P10).
  - Dynamic branching.
  - No texture/vertex buffer load.
  - No vertex kill.

Vertex Shader

- VS 3.0 (DX9.1). Not implemented yet.
  - Hardware implementation unknown.
  - Static and dynamic branching.
  - Texture/Vertex Buffer load (and store?).
  - Possible vertex kill?

Vertex Shader Model
- Instruction Fetch
  - Sends the instruction byte code pointed to by the current PC to the Decode/Register box (latency 1).
- Decode/Register
  - Calculates the next PC (sequential, jump, conditional jump, call, return, indirect) and sends it to Instruction Fetch (latency 1).
  - Reads up to three source operands from the register files (Vertex Input, Constant, Temporary, Address) and sends them to Execute with the instruction operation code (latency 1).
  - Gets the incoming result (flags + operation result) from Execute and writes it to the register files (flags, Vertex Output, Temporary).
- Execute
  - Performs an operation on the operands received from the Decode/Register box and sends the result back to the Decode/Register box with 1+ latency.
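The three boxes can be sketched as a small cycle loop. This is a deliberately reduced illustration and not the simulator's code: the `Wire` class, the tuple instruction format, the scalar registers, and the `END` marker are assumptions, only sequential next-PC is modeled, each operation takes two sources, and a single thread is in flight.

```python
class Wire:
    """Hypothetical one-item-per-cycle signal with a fixed latency of 1 or more."""

    def __init__(self, latency=1):
        self.latency = latency
        self.in_flight = {}  # arrival cycle -> payload

    def write(self, cycle, payload):
        self.in_flight[cycle + self.latency] = payload

    def read(self, cycle):
        return self.in_flight.pop(cycle, None)


# Scalar stand-ins for the shader operations (real registers are vectors).
OPS = {
    "MOV": lambda a, b: a,
    "ADD": lambda a, b: a + b,
    "MUL": lambda a, b: a * b,
}


def run(program, regs, cycles=16):
    """Clock Fetch, Decode/Register and Execute for a fixed number of cycles."""
    pc_wire, instr_wire, exec_wire, result_wire = Wire(), Wire(), Wire(), Wire()
    pc_wire.write(-1, 0)  # boot: PC 0 reaches Instruction Fetch at cycle 0
    for cycle in range(cycles):
        # Instruction Fetch: send the instruction at the received PC to Decode.
        pc = pc_wire.read(cycle)
        if pc is not None:
            instr_wire.write(cycle, (pc, program[pc]))

        # Decode/Register: write back the incoming result first, then decode.
        result = result_wire.read(cycle)
        if result is not None:
            dst, value = result
            regs[dst] = value
        fetched = instr_wire.read(cycle)
        if fetched is not None:
            pc, (op, dst, src_a, src_b) = fetched
            if op != "END":
                pc_wire.write(cycle, pc + 1)  # sequential next PC only
                exec_wire.write(cycle, (op, dst, regs[src_a], regs.get(src_b)))

        # Execute: compute and send the result back to Decode/Register.
        work = exec_wire.read(cycle)
        if work is not None:
            op, dst, a, b = work
            result_wire.write(cycle, (dst, OPS[op](a, b)))
    return regs


PROGRAM = [
    ("MUL", "r0", "v0", "c0"),   # temporary = vertex input * constant
    ("ADD", "o0", "r0", "c1"),   # vertex output = temporary + constant
    ("END", None, None, None),
]
```

Because every wire has latency 1 or more, a box never sees data written in the same cycle, so the order in which the boxes are clocked inside the loop does not affect the result. Calling `run(PROGRAM, {"v0": 3, "c0": 2, "c1": 1})` leaves `r0` at 6 and `o0` at 7; the dependent `ADD` sees the `MUL` result because, with one thread, the fetch/decode round trip takes two cycles.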

Vertex Shader Model

DirectX 9
- Almost ready.
- DX 9 RC 0 just released.
- ATI DX9 demos and drivers.
- GDC presentations are already available.
  - Introduction to VS/PS 3.0 and beyond.

NV30
- Product ‘release’.
  - Cards in February.
  - Reviews in late December.
- [...] MHz.
- [...] um, 125 M transistors.
- FP array implements the vertex shader.
- 8 pixel pipes, 1 TMU.
- 128-bit 500 MHz DDRII.