Penn ESE534 Spring2014 -- DeHon 1 ESE534: Computer Organization Day 14: March 19, 2014 Compute 2: Cascades, ALUs, PLAs.

Penn ESE534 Spring DeHon 2 Last Time LUTs –area –structure –big LUTs vs. small LUTs with interconnect –design space –optimization

Penn ESE534 Spring DeHon 3 Today ALUs Cascades PLAs

Penn ESE534 Spring DeHon 4 Last Time Larger LUTs –Less interconnect delay +General: Larger compute blocks –Minimize interconnect crossings Large LUTs –Not efficient for typical logic structure

Preclass How does addition delay compare between ALU-based and LUT-based architectures? What is the source of the advantage? Penn ESE534 Spring DeHon 5

Preclass Advantages of ALU bitslice compute block over LUT? Disadvantages? Penn ESE534 Spring DeHon 6

7 Implications from LUT-size study ALU has more restricted logic functions –Will require more blocks

Penn ESE534 Spring DeHon 8 Structure in subgraphs Small LUTs capture structure What structure does a small-LUT-mapped netlist have?

Penn ESE534 Spring DeHon 9 Structure LUT sequences ubiquitous

Penn ESE534 Spring DeHon 10 Hardwired Logic Blocks Single Output

Penn ESE534 Spring DeHon 11 Hardwired Logic Blocks Two outputs

Penn ESE534 Spring DeHon 12 Delay Model Tcascade =T(3LUT) + 2T(mux) Don’t pay –General interconnect –Full 4-LUT delay

Penn ESE534 Spring DeHon 13 Options

Chung & Rose Study Which most help: –Ripple adder? –Associative reduce tree? –Multiplier? Penn ESE534 Spring DeHon 14 [Chung & Rose, DAC ’92]

Penn ESE534 Spring DeHon 15 Chung & Rose Study [Chung & Rose, DAC ’92]

Penn ESE534 Spring DeHon 16 Cascade LUT Mappings [Chung & Rose, DAC ’92]

Penn ESE534 Spring DeHon 17 Energy Impact? XC4003A data from Eric Kusse (UCB MS 1997) [Virtex II, Shang et al., FPGA 2002] What’s the likely energy impact?

Penn ESE534 Spring DeHon 18 ALU vs. Cascaded LUT?

Penn ESE534 Spring DeHon 19 Datapath Cascade ALU/LUT (datapath) Cascade –Long “serial” path w/out general interconnect –Pay only Tmux and nearest-neighbor interconnect

Penn ESE534 Spring DeHon 20 4-LUT Cascade ALU

Penn ESE534 Spring DeHon 21 ALU vs. LUT ? ALU –Only subset of ops available –Denser coding for those ops –Smaller –…but interconnect area dominates –[Datapath width orthogonal to function]

Penn ESE534 Spring DeHon 22 Accelerating LUT Cascade? We know addition can be computed in O(log(N)) time. Can we do better than N×Tmux for LUT cascade? Can we compute LUT cascade in O(log(N)) time? Can we compute mux cascade using parallel prefix? Can we make mux cascade associative?

Penn ESE534 Spring DeHon 23 Parallel Prefix Mux cascade How can mux transform S → mux-out? –A=0, B=0 → mux-out=0 –A=1, B=1 → mux-out=1 –A=0, B=1 → mux-out=S –A=1, B=0 → mux-out=/S

Penn ESE534 Spring DeHon 24 Parallel Prefix Mux cascade How can mux transform S → mux-out? –A=0, B=0 → mux-out=0 (Stop = S) –A=1, B=1 → mux-out=1 (Generate = G) –A=0, B=1 → mux-out=S (Buffer = B) –A=1, B=0 → mux-out=/S (Invert = I)
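The four cases on this slide can be sketched in a few lines of Python. This is an illustration only: the helper names `mux` and `classify` are hypothetical, and the select polarity (S=0 picks A, S=1 picks B) is an assumption inferred from the A=0, B=1 case giving mux-out=S.

```python
def mux(a, b, s):
    """2:1 mux. Assumed polarity: returns a when s == 0 and b when s == 1."""
    return b if s else a

def classify(a, b):
    """Name the function S -> mux-out induced by fixed data inputs (a, b)."""
    # Record the mux output for s = 0 and for s = 1.
    behavior = (mux(a, b, 0), mux(a, b, 1))
    names = {
        (0, 0): "S",  # Stop: output is always 0
        (1, 1): "G",  # Generate: output is always 1
        (0, 1): "B",  # Buffer: output follows S
        (1, 0): "I",  # Invert: output is /S
    }
    return names[behavior]
```

With data inputs fixed, a mux can only realize one of the four possible functions of its select bit, which is why four names are enough.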

Penn ESE534 Spring DeHon 25 Parallel Prefix Mux cascade How can 2 muxes transform input? Can I compute 2-mux transforms from 1 mux transforms?

Penn ESE534 Spring DeHon 26 Two-mux transforms
SS → S   SG → G   SB → S   SI → G
GS → S   GG → G   GB → G   GI → S
BS → S   BG → G   BB → B   BI → I
IS → S   IG → G   IB → I   II → B
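The two-mux table can be reproduced by treating each transform as a function of one bit and composing them. In this sketch the pair XY is read as "apply X first, then Y", which is the reading that matches entries such as SI → G; the tuple encoding and the name `compose` are illustrative choices, not from the slides.

```python
# Each transform is stored as (output when input is 0, output when input is 1).
TRANSFORMS = {"S": (0, 0), "G": (1, 1), "B": (0, 1), "I": (1, 0)}
NAMES = {behavior: name for name, behavior in TRANSFORMS.items()}

def compose(first, second):
    """Return the name of the transform 'first, then second'."""
    f, g = TRANSFORMS[first], TRANSFORMS[second]
    # Feed each possible input through f, then through g.
    return NAMES[(g[f[0]], g[f[1]])]

# Regenerate all sixteen entries of the two-mux table.
TWO_MUX_TABLE = {x + y: compose(x, y) for x in "SGBI" for y in "SGBI"}
```

For example, `TWO_MUX_TABLE["SI"]` is `"G"`: a Stop forces 0, and the following Invert turns it into a constant 1.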

Penn ESE534 Spring DeHon 27 Generalizing mux-cascade How can N muxes transform the input? Is mux transform composition associative?
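One way to answer the associativity question is a brute-force check over all triples of transforms. The sketch below assumes the same encoding as before (a transform is the pair of outputs for inputs 0 and 1); the function names are hypothetical.

```python
from itertools import product

# (output when input is 0, output when input is 1)
TRANSFORMS = {"S": (0, 0), "G": (1, 1), "B": (0, 1), "I": (1, 0)}

def compose(f, g):
    """Apply f first, then g."""
    return (g[f[0]], g[f[1]])

def is_associative():
    """Check (f.g).h == f.(g.h) for all 4^3 = 64 triples."""
    ts = list(TRANSFORMS.values())
    return all(
        compose(compose(f, g), h) == compose(f, compose(g, h))
        for f, g, h in product(ts, repeat=3)
    )

def is_closed():
    """Composing two transforms always yields one of the same four."""
    ts = set(TRANSFORMS.values())
    return all(compose(f, g) in ts for f, g in product(ts, repeat=2))
```

Both checks pass, as expected: these are ordinary function compositions, and function composition is always associative. That property is what lets N muxes be grouped in any order.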

Penn ESE534 Spring DeHon 28 Parallel Prefix Mux-cascade Can be hardwired, no general interconnect

LUT Cascade Implications We can compute any LUT cascade in logarithmic time –Not just addition carry cascade –Can be different functions in each LUT –Does not demand a SIMD op Penn ESE534 Spring DeHon 29
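A minimal software sketch of the idea: every cascade stage is an arbitrary one-bit transform (so stages may differ), and a scan combines them in a logarithmic number of rounds. The scan used here is a Kogge-Stone style scan chosen for brevity; it is an assumption of this sketch, not necessarily the structure drawn on the slides, and it does O(N log N) compositions rather than the O(N) a more area-efficient prefix network would need.

```python
def compose(f, g):
    """Apply f first, then g; each is (output for input 0, output for input 1)."""
    return (g[f[0]], g[f[1]])

def serial_cascade(stages, c_in):
    """Reference model: ripple the cascade bit through every stage in order."""
    outs = []
    c = c_in
    for t in stages:
        c = t[c]
        outs.append(c)
    return outs

def prefix_cascade(stages, c_in):
    """Return (cascade outputs, number of scan rounds)."""
    pref = list(stages)
    rounds = 0
    d = 1
    while d < len(pref):
        # After this round pref[i] covers the 2*d stages ending at i.
        pref = [
            pref[i] if i < d else compose(pref[i - d], pref[i])
            for i in range(len(pref))
        ]
        d *= 2
        rounds += 1
    # Each prefix is a transform of the cascade input; apply it once.
    return [p[c_in] for p in pref], rounds
```

For a cascade of 8 stages the scan finishes in 3 rounds, while the serial model walks all 8 stages one after another.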

Penn ESE534 Spring DeHon 30 “ALU”s Unpacked Traditional/Datapath ALUs combine 3 ideas: 1. SIMD/Datapath Control –Architecture variable w 2. Long Cascade –Typically also w, but can be shorter/longer –Amenable to parallel prefix implementation in O(log(w)) time w/ O(w) space 3. Restricted function –Reduces instruction bits –Reduces expressiveness

Penn ESE534 Spring DeHon 31 Commercial Devices

Virtex 6 & 7 Carry Prop=A xor B Otherwise –Generate –Squash Sum=A xor B xor Cin Penn ESE534 Spring DeHon 32
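The carry rule on this slide can be modeled bit by bit. The sketch below is a behavioral illustration only, not a description of the actual Virtex circuit: when Prop is 1 the carry-in passes through, otherwise the carry-out equals A (which is 1 for generate and 0 for squash, since A equals B in that case). The helper names are hypothetical.

```python
def to_bits(x, width):
    """LSB-first list of bits."""
    return [(x >> i) & 1 for i in range(width)]

def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))

def carry_chain_add(a_bits, b_bits, c_in=0):
    """Ripple through a mux-based carry chain; returns (sum bits, carry out)."""
    sums = []
    c = c_in
    for a, b in zip(a_bits, b_bits):
        prop = a ^ b                # Prop = A xor B
        sums.append(prop ^ c)       # Sum = A xor B xor Cin
        c = c if prop else a        # propagate, else generate (1) or squash (0)
    return sums, c
```

Usage: `carry_chain_add(to_bits(5, 4), to_bits(6, 4))` returns the bits of 11 with a carry-out of 0.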

Virtex 6 V7 similar Penn ESE534 Spring DeHon 33

Penn ESE534 Spring DeHon 34 Programmable Logic Arrays (PLAs)

Penn ESE534 Spring DeHon 35 PLA Directly implement flat (two-level) logic –O=a*b*c*d + !a*b*!d + b*!c*d Exploit substrate properties that allow wired-OR
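As a sketch of what "flat (two-level) logic" means in practice, the example function can be written as an AND plane of product terms feeding an OR plane. The data layout and the names `AND_PLANE`, `OR_PLANE`, and `eval_pla` are illustrative assumptions; a real PLA stores this as programmed crosspoints.

```python
# AND plane: one row per product term, mapping input name -> required value.
AND_PLANE = [
    {"a": 1, "b": 1, "c": 1, "d": 1},   # a*b*c*d
    {"a": 0, "b": 1, "d": 0},           # !a*b*!d
    {"b": 1, "c": 0, "d": 1},           # b*!c*d
]

# OR plane: each output lists the product-term rows it ORs together.
OR_PLANE = {"O": [0, 1, 2]}

def eval_pla(inputs):
    """Evaluate both planes for a dict of input values, e.g. {"a": 1, ...}."""
    pterms = [
        all(inputs[name] == value for name, value in row.items())
        for row in AND_PLANE
    ]
    return {
        out: int(any(pterms[i] for i in rows))
        for out, rows in OR_PLANE.items()
    }
```

For instance, `eval_pla({"a": 0, "b": 1, "c": 1, "d": 0})` gives `{"O": 1}` because the second product term fires.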

Penn ESE534 Spring DeHon 36 Wired-or Connect series of inputs to wire Any of the inputs can drive the wire high

Penn ESE534 Spring DeHon 37 Wired-or Implementation with Transistors

Penn ESE534 Spring DeHon 38 Programmable Wired-or Use some memory function to programmably connect (disconnect) wires to OR Fuse:

Penn ESE534 Spring DeHon 39 Programmable Wired-or Gate-memory model

Penn ESE534 Spring DeHon 40 Diagram Wired-or

Penn ESE534 Spring DeHon 41 Wired-or array Build into array –Compute many different or functions from set of inputs

Preclass 5 Penn ESE534 Spring DeHon 42 What function (MSP) ?

Penn ESE534 Spring DeHon 43 Combined or-arrays to PLA Combine two or (nor) arrays to produce PLA (and-or array) Programmable Logic Array
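The step from two nor arrays to an and-or array rests on De Morgan's law: a NOR of complemented literals is an AND of the literals. A small sketch, assuming both polarities of every input are available and assuming an inverter on each output to turn the second NOR back into an OR (neither detail is stated on the slide):

```python
def nor(bits):
    return int(not any(bits))

def complement(lit):
    """'a' <-> '!a'."""
    return lit[1:] if lit.startswith("!") else "!" + lit

def nor_nor_pla(inputs, product_rows, output_rows):
    """product_rows: lists of literals to AND; output_rows: lists of row indices to OR."""
    lits = {}
    for name, value in inputs.items():
        lits[name] = value
        lits["!" + name] = 1 - value
    # First NOR plane: AND(x, y) == NOR(!x, !y).
    pterms = [nor(lits[complement(l)] for l in row) for row in product_rows]
    # Second NOR plane plus assumed output inverter gives the OR of product terms.
    return [1 - nor(pterms[i] for i in row) for row in output_rows]
```

Usage: with `product_rows = [["a", "!b"], ["!a", "b"]]` and `output_rows = [[0, 1]]` the array computes the two-input xor.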

Penn ESE534 Spring DeHon 44 PLA

Penn ESE534 Spring DeHon 45 PLA Can implement each and on single line in first array Can implement each or on single line in second array

Penn ESE534 Spring DeHon 46 PLA Efficiency questions: –Each and/or is linear in total number of potential inputs (not actual) –How many product terms between arrays?

Preclass 6 How many product terms in S[w] in MSP? Penn ESE534 Spring DeHon 47

Carries C[1]=A[0]*B[0] –P[1]=1 C[2]=A[1]*B[1]+C[1]*A[1]+C[1]*B[1] –P[2]=1+2*P[1]=3 C[3]=A[2]*B[2]+C[2]*A[2]+C[2]*B[2] –P[3]=1+2*P[2]=7 P[4]=15, P[5]=31, P[6]=63 P[w]=2^w-1 Penn ESE534 Spring DeHon 48
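The recurrence can be checked by actually flattening the carry expression. The sketch below expands C[w] into product terms (each term a set of positive literals) and counts them; it counts the terms of the flattened expansion and does not by itself prove that no smaller cover exists. Function names are hypothetical.

```python
def carry_terms(w):
    """Product terms of C[w], the carry into bit w, as frozensets of literal names."""
    terms = {frozenset({"a0", "b0"})}            # C[1] = A[0]*B[0]
    for k in range(1, w):
        a, b = f"a{k}", f"b{k}"
        # C[k+1] = A[k]*B[k] + C[k]*A[k] + C[k]*B[k], distributed over C[k]'s terms.
        terms = (
            {frozenset({a, b})}
            | {t | {a} for t in terms}
            | {t | {b} for t in terms}
        )
    return terms

def eval_sop(terms, env):
    """env maps literal name -> 0/1; a term fires when all its literals are 1."""
    return int(any(all(env[lit] for lit in t) for t in terms))
```

`len(carry_terms(w))` gives 1, 3, 7, 15, 31, 63 for w = 1..6, matching P[w] = 2^w - 1.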

Sum S[w]=A[w] xor B[w] xor C[w] S[w]=(A[w]*B[w]+/A[w]*/B[w])*C[w] +(/A[w]*B[w]+A[w]*/B[w])*/C[w] Product terms = 2*P[w]+2*Pterms(/C[w]) Penn ESE534 Spring DeHon 49

S[2] in MSP a2*/a0*/b2*/b1+ /a2*/a0*b2*/b1+ a2*/a1*/a0*/b2+ /a2*/a1*/a0*b2+ a2*/b2*/b1*/b0+ /a2*b2*/b1*/b0+ a2*/a1*/b2*/b0+ /a2*/a1*b2*/b0+ /a2*a0*/b2*b1*b0+ /a2*a1*a0*/b2*b0+ a2*a0*b2*b1*b0+ a2*a1*a0*b2*b0+ a2*/a1*/b2*/b1+ /a2*/a1*b2*/b1+ /a2*a1*/b2*b1+ a2*a1*b2*b1 Penn ESE534 Spring DeHon 50
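The product-term formula from the Sum slide can be checked for w=2 by expanding it mechanically. In this sketch the terms of C[2] follow the Carries slide, while the five terms of /C[2] were derived by hand by complementing the majority function; that derivation and all names below are assumptions of the example.

```python
# Each product term maps a variable name to the value it requires.
C2 = [
    {"a1": 1, "b1": 1},
    {"a1": 1, "a0": 1, "b0": 1},
    {"b1": 1, "a0": 1, "b0": 1},
]
NOT_C2 = [
    {"a1": 0, "b1": 0},
    {"a1": 0, "a0": 0}, {"a1": 0, "b0": 0},
    {"b1": 0, "a0": 0}, {"b1": 0, "b0": 0},
]
XOR_A2_B2 = [{"a2": 1, "b2": 0}, {"a2": 0, "b2": 1}]
XNOR_A2_B2 = [{"a2": 1, "b2": 1}, {"a2": 0, "b2": 0}]

def and_terms(xs, ys):
    """Distribute an AND over two sums of products (variables are disjoint here)."""
    return [{**x, **y} for x in xs for y in ys]

# S[2] = xnor(A[2], B[2])*C[2] + xor(A[2], B[2])*/C[2]
S2 = and_terms(XNOR_A2_B2, C2) + and_terms(XOR_A2_B2, NOT_C2)

def eval_sop(terms, env):
    return int(any(all(env[n] == v for n, v in t.items()) for t in terms))
```

`len(S2)` is 16, which agrees with 2*P[2] + 2*Pterms(/C[2]) = 2*3 + 2*5.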

Penn ESE534 Spring DeHon 51 PLA Product Terms Can be exponential in number of inputs E.g. n-input xor (parity function) –When flattened to two-level logic, requires an exponential number of product terms –a*!b+!a*b –a*!b*!c+!a*b*!c+!a*!b*c+a*b*c …and additions, as we just saw…
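The parity case can be checked directly: an n-input xor has 2^(n-1) true minterms, and no two of them differ in exactly one bit, so no pair can be merged into a larger product term. A brute-force sketch (function names are hypothetical):

```python
from itertools import product

def parity_minterms(n):
    """All input assignments for which n-input xor is 1."""
    return [bits for bits in product((0, 1), repeat=n) if sum(bits) % 2 == 1]

def any_adjacent(minterms):
    """True if two minterms differ in exactly one bit (and so could be merged)."""
    seen = set(minterms)
    for m in seen:
        for i in range(len(m)):
            flipped = m[:i] + (1 - m[i],) + m[i + 1:]
            if flipped in seen:
                return True
    return False
```

For n = 2 and n = 3 this gives 2 and 4 minterms, matching the two expressions on the slide, and `any_adjacent` is false for every n because flipping one bit always changes the parity.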

Penn ESE534 Spring DeHon 52 PLAs Fast Implementations for large ANDs or ORs Number of P-terms can be exponential in number of input bits –most complicated functions –not exponential for many functions Can use arrays of small PLAs –to exploit structure –like we saw arrays of small memories last time

Penn ESE534 Spring DeHon 53 PLAs vs. LUTs? Look at Inputs, Outputs, P-Terms –minimum area (one study, see paper) –K=10, N=12, M=3 A(PLA 10,12,3) comparable to 4-LUT? –80-130%? –300% on ECC (structure LUT can exploit) Delay? –Claim 40% fewer logic levels (4-LUT) (general interconnect crossings) Not comparing to hardwired cascades [Kouloheris & El Gamal/CICC’92]

Penn ESE534 Spring DeHon 54 PLA

Penn ESE534 Spring DeHon 55 PLA and Memory

Penn ESE534 Spring DeHon 56 PAL = Programmable Array Logic PLA and PAL

Penn ESE534 Spring DeHon 57 Conventional/Commercial FPGA Altera 9K (from databook)

Penn ESE534 Spring DeHon 58 Conventional/Commercial FPGA Altera 9K (from databook) Like PAL

Penn ESE534 Spring DeHon 59 Big Ideas [MSB Ideas] Programmable Interconnect allows us to exploit structure in logic –want to match to application structure –Prog. interconnect delay expensive Hardwired Cascades –key technique to reducing delay in programmables (both ALUs and LUTs) PLAs –canonical two level structure –hardwire portions to get Memories, PALs

Penn ESE534 Spring DeHon 60 Admin HW5.2 due today HW6 out Reading for Monday on web