ESE534: Computer Organization

Slides:



Advertisements
Similar presentations
Taxanomy of parallel machines. Taxonomy of parallel machines Memory – Shared mem. – Distributed mem. Control – SIMD – MIMD.
Advertisements

Caltech CS184a Fall DeHon1 CS184a: Computer Architecture (Structures and Organization) Day6: October 11, 2000 Instruction Taxonomy VLSI Scaling.
Caltech CS184a Fall DeHon1 CS184a: Computer Architecture (Structures and Organization) Day17: November 20, 2000 Time Multiplexing.
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 21: April 2, 2007 Time Multiplexing.
CS294-6 Reconfigurable Computing Day 6 September 10, 1998 Comparing Computing Devices.
Caltech CS184a Fall DeHon1 CS184a: Computer Architecture (Structures and Organization) Day8: October 18, 2000 Computing Elements 1: LUTs.
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 9: February 7, 2007 Instruction Space Modeling.
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 13: February 26, 2007 Interconnect 1: Requirements.
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 11: February 14, 2007 Compute 1: LUTs.
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 4: January 22, 2007 Memories.
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 23: April 9, 2007 Control.
Penn ESE Spring DeHon 1 FUTURE Timing seemed good However, only student to give feedback marked confusing (2 of 5 on clarity) and too fast.
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 14: February 28, 2007 Interconnect 2: Wiring Requirements and Implications.
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 5: January 24, 2007 ALUs, Virtualization…
Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 13: February 4, 2005 Interconnect 1: Requirements.
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 11: March 3, 2014 Instruction Space Modeling 1.
Caltech CS184b Winter DeHon 1 CS184b: Computer Architecture [Single Threaded Architecture: abstractions, quantification, and optimizations] Day3:
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 9: February 24, 2014 Operator Sharing, Virtualization, Programmable Architectures.
FPGA-Based System Design: Chapter 3 Copyright  2004 Prentice Hall PTR FPGA Fabric n Elements of an FPGA fabric –Logic element –Placement –Wiring –I/O.
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 7: February 6, 2012 Memories.
Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 7: January 24, 2003 Instruction Space.
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 5: February 1, 2010 Memories.
Penn ESE370 Fall DeHon 1 ESE370: Circuit-Level Modeling, Design, and Optimization for Digital Systems Day 28: November 7, 2014 Memory Overview.
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 21: April 4, 2012 Lossless Data Compression.
Penn ESE370 Fall DeHon 1 ESE370: Circuit-Level Modeling, Design, and Optimization for Digital Systems Day 28: November 8, 2013 Memory Overview.
Penn ESE534 Spring DeHon 1 ESE534 Computer Organization Day 9: February 13, 2012 Interconnect Introduction.
Caltech CS184 Winter DeHon CS184a: Computer Architecture (Structure and Organization) Day 4: January 15, 2003 Memories, ALUs, Virtualization.
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 11: February 20, 2012 Instruction Space Modeling.
ESE Spring DeHon 1 ESE534: Computer Organization Day 20: April 9, 2014 Interconnect 6: Direct Drive, MoT.
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 8: February 19, 2014 Memories.
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 17: March 29, 2010 Interconnect 2: Wiring Requirements and Implications.
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 22: April 16, 2014 Time Multiplexing.
ESE532: System-on-a-Chip Architecture
Auburn University COMP8330/7330/7336 Advanced Parallel and Distributed Computing Parallel Hardware Dr. Xiao Qin Auburn.
ESE534: Computer Organization
ESE534: Computer Organization
ESE534: Computer Organization
ESE534: Computer Organization
ESE532: System-on-a-Chip Architecture
ESE534: Computer Organization
ESE534: Computer Organization
ESE532: System-on-a-Chip Architecture
ESE532: System-on-a-Chip Architecture
ESE534 Computer Organization
ESE532: System-on-a-Chip Architecture
ESE532: System-on-a-Chip Architecture
ESE532: System-on-a-Chip Architecture
ESE532: System-on-a-Chip Architecture
CS184a: Computer Architecture (Structure and Organization)
Day 26: November 11, 2011 Memory Overview
ESE534: Computer Organization
ESE534: Computer Organization
ESE534: Computer Organization
ESE534: Computer Organization
ESE534: Computer Organization
ESE534: Computer Organization
ESE534: Computer Organization
ESE534: Computer Organization
ESE534: Computer Organization
ESE534: Computer Organization
ESE534: Computer Organization
Chapter 4 Multiprocessors
ESE534: Computer Organization
ESE534: Computer Organization
ESE532: System-on-a-Chip Architecture
ESE532: System-on-a-Chip Architecture
ESE532: System-on-a-Chip Architecture
Presentation transcript:

ESE534: Computer Organization Day 10: February 15, 2012 Instruction Space Penn ESE534 Spring2012 -- DeHon

Previously Temporally Programmable Architectures Spatially Programmable Architectures Instructions Penn ESE534 Spring2012 -- DeHon

Today Instructions Requirements Taxonomy Penn ESE534 Spring2012 -- DeHon

Computing Requirements (review) Penn ESE534 Spring2012 -- DeHon

Instruction Control What needs to be controlled per ALU-memory-bank? Penn ESE534 Spring2012 -- DeHon

Instruction Taxonomy Penn ESE534 Spring2012 -- DeHon

Instructions Distinguishing feature of programmable architectures: Instructions -- bits which tell the device how to behave Compute 0000 00 010 11 0110 net0 add mem slot#6 Penn ESE534 Spring2012 -- DeHon

Focus on Instructions Instruction organization has a large effect on: size or compactness of an architecture realm of efficient utilization for an architecture Penn ESE534 Spring2012 -- DeHon

Terminology Primitive Instruction (pinst) Collection of bits that tell a single bit-processing element what to do on each cycle Includes: select compute operation input sources in space (interconnect) input sources in time (retiming) Compute 0000 00 010 11 0110 net0 add mem slot#6 Penn ESE534 Spring2012 -- DeHon

Preclass How big is pinst for preclass? (problem 2) Penn ESE534 Spring2012 -- DeHon

Computational Array Model Collection of computing elements compute operator local storage/retiming Interconnect Instruction Penn ESE534 Spring2012 -- DeHon

“Ideal” Instruction Control Issue a new instruction to every computational bit operator on every cycle Penn ESE534 Spring2012 -- DeHon

“Ideal” Instruction Distribution Why don’t we do this? Penn ESE534 Spring2012 -- DeHon

Preclass How many total instruction bits? How many pins on a chip? Penn ESE534 Spring2012 -- DeHon

Preclass How wide is instruction distribution? (F units) For F=32nm? Penn ESE534 Spring2012 -- DeHon

“Ideal” Instruction Distribution Problem: Instruction bandwidth (and storage area) quickly dominates everything else Compute Block ~ 256K F2 (512F x 512F) Instruction ~ 64 bits Wire Pitch ~ 4F Memory bit ~ 300F2 Penn ESE534 Spring2012 -- DeHon

Instruction Distribution How many instructions Across side?? 64x4F=256F Two instructions in 512F Penn ESE534 Spring2012 -- DeHon

Instruction Distribution Distribute from both sides = 2x Penn ESE534 Spring2012 -- DeHon

Instruction Distribution Distribute X and Y = 2x Penn ESE534 Spring2012 -- DeHon

Instruction Distribution Room to distribute 2 instructions across PE per metal layer (512F = 2644F) Feed top and bottom (left and right) = 2 Two complete metal layers = 2 How many instructions per PE side?  8 instructions / PE Side Penn ESE534 Spring2012 -- DeHon

Instruction Distribution Maximum of 8 instructions per PE side When saturate wire channels? Saturate wire channels at 8N = N What N?  at 64 PE beyond this: instruction distribution dominates area Penn ESE534 Spring2012 -- DeHon

Instruction Distribution Perimeter = 42N ≤ N Saturate wire channels at 8N = N  at 64 PE beyond this: instruction distribution dominates area Instruction consumption goes with area Instruction bandwidth goes with perimeter Penn ESE534 Spring2012 -- DeHon

Instruction Distribution Beyond 64 PE, instruction bandwidth dictates PE size = N (644F) PEarea 4N How PEarea grow? PEarea =4KF2N As we build larger arrays processing elements become less dense Penn ESE534 Spring2012 -- DeHon

Avoid Instruction BW Saturation? How might we avoid this? Penn ESE534 Spring2012 -- DeHon

Instruction Memory Requirements Idea: put instruction memory in array Problem: Instruction memory can quickly dominate area, too Memory Area = 64300F2/instruction PEarea = 256K F2 + (Instructions)  20K F2 When instruction memory dominate? Penn ESE534 Spring2012 -- DeHon

Instruction Pragmatics Instruction requirements could dominate array size. Standard architecture trick: Look for structure to exploit in “typical computations” Penn ESE534 Spring2012 -- DeHon

Typical Structure? What structure do we usually expect? Penn ESE534 Spring2012 -- DeHon

One Extreme SIMD (Single Instruction Multiple Data) e.g. microprocessors,GPUs Instruction/cycle share instruction across array of PEs uniform operation in space operation variance in time Penn ESE534 Spring2012 -- DeHon

SIMD Penn ESE534 Spring2012 -- DeHon

Another Extreme FPGA (Field-Programmable Gate Array) Instruction/PE assume temporal locality of instructions (same) operation variance in space uniform operations in time Penn ESE534 Spring2012 -- DeHon

Spatially Programmable Penn ESE534 Spring2012 -- DeHon

VLIW VLIW = Very Long Instruction Word Few pinsts/cycle Share instruction across w bits Penn ESE534 Spring2012 -- DeHon

Architectural Differences What differentiates a VLIW from a multicore? E.g. 4-issue VLIW vs. 4 single-issue processors Penn ESE534 Spring2012 -- DeHon

Architectural Differences What differentiates a VLIW from a multicore? Penn ESE534 Spring2012 -- DeHon

SIMD, Spatial, VLIW, Multicore Penn ESE534 Spring2012 -- DeHon

Basis Vectors In practice, mix together: E.g. Modern Multicore MIMD (multiple cores, one PC per core) VLIW within core (superscalar, multiple pinst issue from each core) Word-wide SIMD for Integer operations Perhaps even with explicit SIMD operations Penn ESE534 Spring2012 -- DeHon

Placing Architectures What programmable architectures (organizations) are you familiar with? Penn ESE534 Spring2012 -- DeHon

Gross Parameters Instruction sharing width Instruction depth SIMD width granularity Instruction depth Instructions stored locally per compute element pinsts per control thread E.g. VLIW width Penn ESE534 Spring2012 -- DeHon

Architecture Taxonomy PCs Pints/PC depth width Architecture N 1 FPGA N (48,640) 8 Tabula ABAX (A1EC04) 1024 32 Scalar Processor (RISC) D W VLIW (superscalar) Small W*N SIMD, GPU, Vector MIMD 16 1 (4?) 2048 64 16-core Penn ESE534 Spring 2010 -- DeHon

Instruction Message Architectures fall out of: general model too expensive structure exists in common problems exploit structure to reduce resource requirements Architectures can be viewed in a unified design space Penn ESE534 Spring2012 -- DeHon

Admin Reading on blackboard HW5 Class Monday No class next Wednesday Problem 1 due Monday Should be able to do all of Problem 1 now Day11/Monday relevant to Problem 2 Class Monday No class next Wednesday Penn ESE534 Spring2012 -- DeHon

Big Ideas Basic elements of a programmable computation Compute Interconnect (space and time, outside system [IO]) Instructions Instruction resources can be significant dominant/limiting resource Penn ESE534 Spring2012 -- DeHon