Lecture 10: Logic Emulation October 8, 2013 ECE 636 Reconfigurable Computing Lecture 13 Logic Emulation.

Overview
- Background
- Rent’s Rule
- Overcoming pin limitations through scheduling
- “Virtual wires” implementation
- Veloce 2
- DEEP

The Challenge
- Making a large multi-FPGA system is easy. Making it programmable is hard.
- The new approach is a software technology that facilitates hardware implementation.
- It effectively makes a large number of discrete devices look like one large one.
- This leads to a low-cost, scalable multi-FPGA substrate.

Logic Emulation
- Emulation takes a sizable amount of resources.
- Compilation time can be large due to FPGA compiles.
- Emulation is one application; it also has direct ties to other FPGA computing applications.

Are Meshes Realistic?
- The number of wires leaving a partition grows according to Rent’s Rule: $P = K G^{B}$, where $P$ is the number of wires (pins), $G$ is the number of gates in the partition, and $K$ and $B$ are constants.
- Perimeter grows as $G^{0.5}$, but unfortunately the wiring demand of most circuits grows as $G^{B}$ with $B > 0.5$.
- Effectively, devices are highly pin limited.
- What does this mean for meshes?
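The gap between pin demand and perimeter supply can be sketched numerically. The constants below (K = 2.5, B = 0.6, 4 pins per unit of edge) are illustrative assumptions, not values from the lecture, and the function names are hypothetical.

```python
def rent_pins(gates, k, b):
    """Rent's Rule estimate of wires leaving a partition: P = K * G**B."""
    return k * gates ** b


def perimeter_pins(gates, edge_pins):
    """Pins a region can offer if supply scales with its perimeter, G**0.5.

    edge_pins is an assumed scale factor, not a figure from the slides.
    """
    return edge_pins * gates ** 0.5


def demand_to_supply(gates, k=2.5, b=0.6, edge_pins=4.0):
    """Ratio of Rent's Rule demand to perimeter supply; grows as G**(B - 0.5)."""
    return rent_pins(gates, k, b) / perimeter_pins(gates, edge_pins)


if __name__ == "__main__":
    for g in (1_000, 10_000, 100_000):
        print(g, round(demand_to_supply(g), 3))
```

Under these assumptions the ratio rises with every increase in gate count whenever B > 0.5, which is the pin-limited trend the slide describes. With B = 0.5 the ratio stays constant.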

Possible Device Scenarios
- Rent’s Rule indicates that the pin-limited situation is getting worse.
- Frequently some logic must be left unused, leading to limited utilization.
- Perhaps this logic can be “reclaimed”.

Partition vs FPGA Pin Count
- FPGAs don’t have enough pins.
- The problem may or may not get worse, depending on “structured” design.

Virtual Wires
- Overcome pin limitations by multiplexing pins and signals.
- Schedule when communication will take place.
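A minimal sketch of the multiplexing idea follows: several logical signals share a smaller number of physical pins by taking turns in successive microcycles. The function names and the round-robin slot assignment are assumptions made for illustration. A real scheduler would also respect signal dependencies, which this sketch ignores.

```python
from math import ceil


def multiplex(signals, physical_pins):
    """Assign each logical signal a (microcycle, pin) slot on a shared link.

    Signals are placed round-robin in the order given; ordering by
    dependency is deliberately left out of this sketch.
    """
    if physical_pins <= 0:
        raise ValueError("physical_pins must be positive")
    return {
        name: (index // physical_pins, index % physical_pins)
        for index, name in enumerate(signals)
    }


def microcycles_needed(n_signals, physical_pins):
    """Microcycles required to move n_signals across physical_pins pins."""
    return ceil(n_signals / physical_pins)


if __name__ == "__main__":
    slots = multiplex(["a", "b", "c", "d", "e"], physical_pins=2)
    for name, (cycle, pin) in slots.items():
        print(f"signal {name}: microcycle {cycle}, pin {pin}")
```

Five logical signals over two pins need three microcycles in this scheme, so the pin count stops being a hard limit and becomes a cost in emulation speed.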

Virtual Wires Software Flow
- Global router enhanced to include scheduling and embedding.
- Multiplexing logic synthesized from FPGA logic.

A Simple Example
[Figure: four FPGAs, labeled FPGA 1 through FPGA 4]

Clocking Strategy
- Evaluation and communication split into phases.
- Longest dependency path determines number of phases.
  - Overall emulation performance
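One way to picture how the longest dependency path sets the number of phases is to compute the depth of a dependency graph of partition-crossing signals. This is a sketch under assumptions: the `deps` mapping and the function name are hypothetical, and the graph is assumed to be acyclic.

```python
from functools import lru_cache


def phase_count(deps):
    """Number of signals on the longest dependency chain.

    deps maps each inter-FPGA signal to the signals it depends on.
    The graph is assumed to contain no cycles.
    """
    @lru_cache(maxsize=None)
    def depth(signal):
        # A signal with no predecessors sits at depth 1.
        return 1 + max((depth(d) for d in deps.get(signal, ())), default=0)

    return max((depth(s) for s in deps), default=0)


if __name__ == "__main__":
    example = {"a": [], "b": ["a"], "c": ["b"], "d": ["a"]}
    print(phase_count(example))
```

In the example, `c` depends on `b`, which depends on `a`, so the chain has three links and the sketch reports three phases.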

Example Scheduling
- Initial phase requires one uClk for computation and one for communication.
- Second phase requires two communication uClks due to a through hop.
- Note: this example assumes the needed bandwidth is available.

Routing Algorithm
- For each phase, only some internal signals are ready for routing.
- Routing resources between FPGAs may be considered channels.
- Solution: route the ready signals with a maze router in each phase.
- If the required bandwidth is not available, delay signals until later phases.
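The routing idea can be sketched as a greedy scheduler: find a path for each ready signal, reserve channel capacity hop by hop, and push a hop to a later microcycle when its channel is full. This is an illustration, not the lecture's algorithm. Breadth-first search stands in for the maze router, each hop is assumed to take one microcycle, and all names are hypothetical.

```python
from collections import defaultdict, deque


def shortest_path(adj, src, dst):
    """Breadth-first search; a stand-in for maze routing on the FPGA graph."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nxt in adj[node]:
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    if dst not in prev:
        raise ValueError(f"no route from {src} to {dst}")
    path = []
    node = dst
    while node is not None:
        path.append(node)
        node = prev[node]
    return path[::-1]


def schedule_signals(adj, capacity, signals):
    """Return {name: arrival microcycle} for (name, src, dst, ready) tuples.

    capacity is the number of wires per directed channel per microcycle.
    A hop waits for a later microcycle when its channel is already full.
    """
    used = defaultdict(int)  # (u, v, microcycle) -> wires reserved
    arrival = {}
    for name, src, dst, ready in signals:
        cycle = ready
        path = shortest_path(adj, src, dst)
        for u, v in zip(path, path[1:]):
            while used[(u, v, cycle)] >= capacity:
                cycle += 1  # bandwidth not available: delay the signal
            used[(u, v, cycle)] += 1
            cycle += 1  # assumed: one microcycle per hop
        arrival[name] = cycle
    return arrival


if __name__ == "__main__":
    mesh = {1: [2, 3], 2: [1, 4], 3: [1, 4], 4: [2, 3]}  # 2x2 mesh of FPGAs
    ready = [("a", 1, 2, 0), ("b", 1, 2, 0), ("c", 1, 4, 0)]
    print(schedule_signals(mesh, capacity=1, signals=ready))
```

With one wire per channel, signal `b` waits a microcycle behind `a`, and `c` waits behind both before making its through hop at FPGA 2. A router that weighed congestion would send `c` through FPGA 3 instead; the greedy choice here keeps the sketch short.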

Worst Case Microcycle Count
- Most designs are dominated by the latency bound.
- If the original design has been pipelined, this is less of an issue.
- $V \ge \max\left(L \cdot D,\ P_C / P_f\right)$, where
  - $V$ = microcycle count
  - $L$ = critical path length
  - $D$ = network diameter
  - $P_C$ = max circuit partition pin count
  - $P_f$ = FPGA pin count
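The bound is easy to evaluate for a candidate design. The sketch below rounds the pin-count term up because a microcycle count is a whole number; that rounding and the sample values are assumptions added for illustration.

```python
from math import ceil


def microcycle_lower_bound(L, D, P_c, P_f):
    """Evaluate V >= max(L * D, P_C / P_f).

    L   : critical path length
    D   : network diameter
    P_c : max circuit partition pin count
    P_f : FPGA pin count
    """
    latency_bound = L * D
    pin_bound = ceil(P_c / P_f)  # rounded up: an assumption of this sketch
    return max(latency_bound, pin_bound)


if __name__ == "__main__":
    # Hypothetical numbers, not taken from the lecture's benchmarks.
    print(microcycle_lower_bound(L=5, D=3, P_c=400, P_f=100))
```

For these hypothetical values the latency term (15) exceeds the pin term (4), matching the slide's remark that most designs are latency bound.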

Improved Scheduling
- Overlap computation and communication.
- Effectively create a “data flow” of information.
- Schedule communication to happen as soon as possible.
  - No need for phases.
[Figure label: uCLK]

Physical Implementation
- A small finite state machine is encoded and placed in each FPGA.
- The current implementation uses one-hot encoding.
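A one-hot sequencer can be modeled in a few lines: exactly one state bit is set, and the set bit rotates by one position each microcycle to select the next multiplexing slot. This is a software model under assumptions, not the hardware described in the lecture, and the function names are hypothetical.

```python
def next_state(state, n):
    """Rotate the single hot bit left by one, wrapping after n states."""
    mask = (1 << n) - 1
    return ((state << 1) | (state >> (n - 1))) & mask


def active_slot(state):
    """Index of the hot bit; raises if the state is not one-hot."""
    if state <= 0 or state & (state - 1):
        raise ValueError("state is not one-hot")
    return state.bit_length() - 1


if __name__ == "__main__":
    n = 4
    state = 1  # slot 0 active
    for _ in range(n + 1):
        print(f"{state:0{n}b} -> slot {active_slot(state)}")
        state = next_state(state, n)
```

One-hot encoding spends one flip-flop per state but needs no decoder to find the active slot, which suits FPGA logic blocks where flip-flops are plentiful.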

System Implementation
- Low-cost hardware.
- So simple a graduate student can build it.

Benchmark Designs
- Sparcle: modified SPARC processor
  - 17K gates
  - 4,352 bits of memory
  - Emulated in circuit
- CMMU: cache controller for scalable multiprocessor system
  - 85K gates
  - Designed as gate array and optimized with SIS
- Palindrome
  - 14K gates
  - Systolic

Emulation Results
- At least 31 FPGAs needed for HW full connectivity (>100 for torus).
- Some degradation in overall system performance.

Device Utilization
- Approximately 45% of CLBs used for design logic.
- ~10% virtual wires overhead.

Utilizations
- As devices scale, projected utilization increases.
- Hardwired approach doesn’t scale.
[Equation on slide]

Veloce 2
- Emulates up to 2 billion logic gates using 128 boards of FPGAs.
- VirtuaLAB environment
  - Allows emulator sharing across a company (128 users)
  - Allows for emulation of popular interfaces (Ethernet, USB, PCIe)
- Significant company investment
- 40 MG per hour compile
- 11 kW power consumption
[Figure: Veloce 2 Quattro]
Source: Mentor Graphics Veloce2 brochure

Veloce 2 Execution Environment
- Emulation environment contains three parts:
  - Generation of test vectors and other stimulus
  - Design under test (DUT) and other support tools
  - Design checking resources (response checker)
Source: Mentor Graphics Veloce2 brochure

Veloce 2
- Emulator must interact with a number of interfaces.
Source: Mentor Graphics Veloce2 brochure

A Contemporary Example: DEEP
- Developed at U. Delaware in 2011
- Ten FPGA processor boards
- Backplane used for communication in conjunction with switch boards
- Ethernet interface
- Somewhat limited scalability

A Contemporary Example: DEEP
- Used to emulate a multi-threaded processor
- Subblocks assigned to specific FPGAs
- The same logic is used repeatedly for different threads
- Hand-partitioning of subblocks
- Bottom line: 80K cycles per second

Summary
- Virtual wires overcome pin limitations by intelligently multiplexing I/O signals.
- Key CAD step is scheduling. Simplifies routing and partitioning.
- Latest push is towards incremental compilation.
- Commercialized by Mentor Graphics (Veloce2).
- Contemporary multi-FPGA systems often contain many boards.
  - Some require multiplexing of subblocks.