Presentation is loading. Please wait.

Presentation is loading. Please wait.

Lecture 10: Logic Emulation October 8, 2013 ECE 636 Reconfigurable Computing Lecture 13 Logic Emulation.

Similar presentations


Presentation on theme: "Lecture 10: Logic Emulation October 8, 2013 ECE 636 Reconfigurable Computing Lecture 13 Logic Emulation."— Presentation transcript:

1 Lecture 10: Logic Emulation October 8, 2013 ECE 636 Reconfigurable Computing Lecture 13 Logic Emulation

2 Lecture 10: Logic Emulation October 8, 2013 Overview Background Rent’s Rule Overcoming pin limitations through scheduling “Virtual wires” implementation Veloce 2 DEEP

3 Lecture 10: Logic Emulation October 8, 2013 The Challenge Making a large multi-FPGA system is easy. Making it programmable is hard. New approach is a software technology that facilitates hardware implementation. Effectively make a large number of discrete devices look like one large one. Leads to low-cost, scalable multi-FPGA substrate.

4 Lecture 10: Logic Emulation October 8, 2013 Logic Emulation Emulation takes a sizable amount of resources Compilation time can be large due to FPGA compiles One application: also direct ties to other FPGA computing applications.

5 Lecture 10: Logic Emulation October 8, 2013 Are Meshes Realistic? The number of wires leaving a partition grows with Rent’s Rule P = KG B Perimeter grows as G 0.5 but unfortunately most circuits grow at G B where B > 0.5 Effectively devices highly pin limited What does this mean for meshes?

6 Lecture 10: Logic Emulation October 8, 2013 Possible Device Scenarios Rent’s Rule indicates that pin limited situation is getting worse. Frequently some logic must be left unused leading to limited utilization Perhaps this logic can be “reclaimed”

7 Lecture 10: Logic Emulation October 8, 2013 Partition vs FPGA Pin Count FPGAs don’t have enough pins Problem may or may not get worse depending on “structured” design.

8 Lecture 10: Logic Emulation October 8, 2013 Virtual Wires Overcome pin limitations by multiplexing pins and signals Schedule when communication will take place.

9 Lecture 10: Logic Emulation October 8, 2013 Virtual Wires Software Flow Global router enhanced to include scheduling and embedding. Multiplexing logic synthesized from FPGA logic.

10 Lecture 10: Logic Emulation October 8, 2013 A Simple Example FPGA 1FPGA 2 FPGA 3FPGA 4

11 Lecture 10: Logic Emulation October 8, 2013 Clocking Strategy Evaluation and communication split into phases Longest dependency path determines number of phases -Overall emulation performance

12 Lecture 10: Logic Emulation October 8, 2013 Example Scheduling Initial phase requires one uClk for computation, one for communication. Second phase requires 2 communication uClks due to through hop. Note this example assumed needed bandwidth was available.

13 Lecture 10: Logic Emulation October 8, 2013 Routing Algorithm For each phase, only some internal signals are ready for routing. Routing resources between FPGAs may be considered channels. Solution: Route signals use maze route for each phase. If available bandwidth not present, delay signals until later phases.

14 Lecture 10: Logic Emulation October 8, 2013 Worst Case Microcycle Count Most designs dominated by latency bound. If original design has been pipelined this is less of an issue V >= max ( L*D, P C /P f ) L = critical path length D = network diameter P C = max circuit partition pin count P f = FPGA pin count

15 Lecture 10: Logic Emulation October 8, 2013 Improved Scheduling Overlap computation and communication. Effectively create a “data flow” of information Schedule communication to happen as soon as possible -No need for phases. uCLK

16 Lecture 10: Logic Emulation October 8, 2013 Physical Implementation Small finite state machine encoded and placed in each FPGA Current implementation is one-hot encoding.

17 Lecture 10: Logic Emulation October 8, 2013 System Implementation Low cost hardware So simple a graduate student can build it

18 Lecture 10: Logic Emulation October 8, 2013 Benchmark Designs Sparcle – modified Sparc processor -17K gates -4,352 bits of memory -Emulated in circuit. CMMU – cache controller for scalable multiprocessor system -85K gates -Designed as gate array and optimized with SIS Palindrome -14K gates -systolic

19 Lecture 10: Logic Emulation October 8, 2013 Emulation Results At least 31 FPGAs needed for HW full connectivity (>100 for torus) Some degradation in overall system performance.

20 Lecture 10: Logic Emulation October 8, 2013 Device Utilization Approximately 45% of CLBs used for design logic. ~10% virtual wires overhead

21 Lecture 10: Logic Emulation October 8, 2013 Utilizations As devices scale projected utilization increases Hardwired approach doesn’t scale Equation ->

22 Lecture 10: Logic Emulation October 8, 2013 Veloce 2 Emulates up to 2 billion logic gates using 128 boards of FPGAs VirtuaLAB environment -Allows emulator sharing across a company (128 users) -Allows for emulation of popular interfaces (Ethernet, USB, PCIe) Significant company investment 40 MG per hour compile 11 KW power consumption Veloce 2 Quattro Source: Mentor Graphics Veloce2 brochure

23 Lecture 10: Logic Emulation October 8, 2013 Veloce 2 Execution Environment Emulation environment contains three parts -Generation of test vectors and other stimulus -Design under test (DUT) and other support tools -Design checking resources (response checker) Source: Mentor Graphics Veloce2 brochure

24 Lecture 10: Logic Emulation October 8, 2013 Veloce 2 Emulator must interact with a number of interfaces Source: Mentor Graphics Veloce2 brochure

25 Lecture 10: Logic Emulation October 8, 2013 A Contemporary Example: DEEP Developed at U. Delaware in 2011 Ten FPGA processor boards Backplane used for communication in conjunction with switch boards Ethernet interface Somewhat limited scalability

26 Lecture 10: Logic Emulation October 8, 2013 A Contemporary Example: DEEP Used to emulate a multi-threaded processor Subblocks assigned to specific FPGAs The same logic is repetitively used for different threads Hand-partitioning of subblocks Bottom line: 80K cycles per second

27 Lecture 10: Logic Emulation October 8, 2013 Summary Virtual wires overcome pin limitations by intelligently multiplexing I/O signals Key CAD step is scheduling. Simplifies routing and partitioning. Latest push is towards incremental compilation Commercialized by Mentor Graphics – Veloce2 Contemporary multi-FPGA systems often contain many boards -Some require multiplexing of subblocks


Download ppt "Lecture 10: Logic Emulation October 8, 2013 ECE 636 Reconfigurable Computing Lecture 13 Logic Emulation."

Similar presentations


Ads by Google