
Presentation transcript:

1 Heterogeneous Logic Blocks
1. Mixture of two different sizes of LUTs:
   - Larger LUT and cluster sizes: higher speed
   - Smaller sizes: more area-efficient
     - It is up to the CAD tool to select the resource
2. Mixture of PAL-like LBs and LUT-based LBs:
   - PAL blocks: improved circuit speed
   - LUT blocks: area efficiency
3. Mixture of "specific-purpose logic" (SP) and general-purpose (GP) LBs:
   - SP LBs: superior area, speed, and power consumption
   - If the function is not used, the silicon area is wasted

2 Heterogeneous Logic Blocks
Key questions:
1. Which kinds of SP functions?
2. What should the SP/GP ratio be?
3. What can be done about SP LBs not used in a specific application?
   - Rose's golden rule: "build structures that are always useful, even if that use is less than perfectly efficient."
   - "The more useful a hard structure is, across a wider range of applications, then the greater its net benefit - provided the cost of the extra functionality is not excessive."
   - Rose. Hard vs. Soft: The Central Question of Pre-Fabricated Silicon. In Proceedings of the 34th International Symposium on Multiple-Valued Logic (ISMVL'04), 2004.

3 Hard Blocks
Common hard blocks in modern FPGAs:
- Memory
- Multipliers
- MAC (multiply-accumulate) units for DSP applications
- Microprocessors

Embedded Memories

5 Memory in Altera Flex10K

6 Memory in FLEX 10K

7

8 Heterogeneous Logic Blocks
Each EAB (embedded array block):
- 2048 bits if used as memory
  - Dual-port RAM, ROM, FIFO, …
- gates if used as logic

9 Configuration as Memory
One 2048-bit EAB can be configured as:
- 2048x1: A[10..0], D0
- 1024x2: A[9..0], D[1..0]
- 512x4: A[8..0], D[3..0]
- 256x8: A[7..0], D[7..0]
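The relation between data width, depth, and address width for one 2048-bit block can be checked with a few lines of Python. This is only an illustrative sketch; the function name `eab_config` is made up here and is not part of any vendor tool.

```python
EAB_BITS = 2048  # capacity of one EAB when used as memory (from the slide)

def eab_config(data_width):
    """Return (depth, address_bits) for a 2048-bit block of the given data width."""
    if data_width <= 0 or EAB_BITS % data_width != 0:
        raise ValueError("data width must divide the block capacity")
    depth = EAB_BITS // data_width
    # Number of address lines needed to select one of `depth` words.
    address_bits = (depth - 1).bit_length()
    return depth, address_bits

for width in (1, 2, 4, 8):
    depth, abits = eab_config(width)
    print(f"{depth}x{width}: {abits} address lines, {width} data lines")
```

Halving the depth doubles the data width and removes one address line, so the 2-bit-wide configuration has 2048 / 2 = 1024 words and 10 address lines.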

10 Configuration as Memory
- EABs can be used independently
- EABs can be combined to form a larger memory
  - Example: two 512x4 blocks sharing the address lines A[8..0], each supplying D[3..0], form one 512x8 memory with data lines D[7..0]
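A small behavioral model may help to see how two narrow blocks form one wider memory. The class names below are hypothetical, and the choice of which block holds the upper four data bits is an assumption made for this sketch.

```python
class Ram512x4:
    """Behavioral model of one block configured as 512 words of 4 bits."""
    def __init__(self):
        self.cells = [0] * 512

    def write(self, addr, nibble):
        self.cells[addr] = nibble & 0xF  # only 4 data bits are stored

    def read(self, addr):
        return self.cells[addr]


class Ram512x8:
    """Two 512x4 blocks driven by the same address A[8..0]."""
    def __init__(self):
        self.high = Ram512x4()  # assumed to hold D[7..4]
        self.low = Ram512x4()   # assumed to hold D[3..0]

    def write(self, addr, byte):
        self.high.write(addr, byte >> 4)
        self.low.write(addr, byte)

    def read(self, addr):
        return (self.high.read(addr) << 4) | self.low.read(addr)


mem = Ram512x8()
mem.write(300, 0xA7)
print(hex(mem.read(300)))
```

Widening the word this way needs no extra address decoding, because both blocks see the same address.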

11 Altera Cyclone III Architecture

12 Cyclone III

13 Configuration as a Logic Function
- An EAB can be used as a LUT, e.g. a square-root unit (one EAB with 8 inputs and 8 outputs).
- Advantage over an implementation with several LEs: predictable delay and higher speed.
- EABs can be used independently, or several EABs can be combined to implement a more complex function.
- Remember key question 3: What can be done about SP LBs not used in a specific application?
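As an illustration of the square-root example, the following sketch generates the 256 x 8 contents such a block could hold. The slide does not say how the 8 output bits are interpreted; the 4.4 fixed-point format (4 integer bits, 4 fraction bits) is an assumption of this sketch.

```python
import math

FRACTION_BITS = 4  # assumed output format: 4 integer bits, 4 fraction bits

# 256 entries of 8 bits each = 2048 bits, the capacity of one block.
# isqrt(x * 256) equals floor(16 * sqrt(x)), the root scaled by 2**FRACTION_BITS.
SQRT_TABLE = [math.isqrt(x << (2 * FRACTION_BITS)) for x in range(256)]

def sqrt_lookup(x):
    """Look up the square root of an 8-bit value; the input acts as the address."""
    return SQRT_TABLE[x & 0xFF]

raw = sqrt_lookup(16)
print(raw, raw / 2**FRACTION_BITS)
```

Every result takes one memory access regardless of the input value, which is the source of the predictable delay mentioned above.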

14 Cyclone III M9K

15 Memory Modes
- Embedded shift register mode
- ROM mode
- FIFO buffer
- Single/dual-port

16 Memory Volume in Cyclone III

17 Memory Modes
Simple dual-port mode:
- Supports simultaneous read and write operations to different locations.
True dual-port mode:
- Supports any combination of two-port operations:
  - two reads,
  - two writes,
  - one read and one write,
  at two different clock frequencies.

18 Memory Block Megafunctions
- Memory blocks can be instantiated with the Quartus MegaWizard.
- They can also be instantiated directly in your VHDL/Verilog code.
  - Refer to the "RAM Megafunction User Guide," 2007.

19 Altera Stratix II Embedded Memory

20 TriMatrix Memory Structure

21 Stratix II RAM Blocks

22 Stratix IV RAM Blocks

23 Applications of Embedded Memory
- 4x4 multiplier (or any complex mathematical function, e.g. the B-th root of A).
- For larger multipliers, several 4x4 multipliers and several adders are used.
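The idea can be sketched in Python: a 256-entry table stands in for the memory-based 4x4 multiplier, and an 8x8 multiplier is assembled from four table lookups plus shifted additions. The operands are assumed to be unsigned, and the address layout (first operand in the upper four address bits) is a choice made for this sketch.

```python
# 4x4 multiplier as a 256 x 8 table: address = {a[3..0], b[3..0]}, data = a * b.
MUL4_TABLE = [(addr >> 4) * (addr & 0xF) for addr in range(256)]

def mul4(a, b):
    """Multiply two unsigned 4-bit values with one table lookup."""
    return MUL4_TABLE[((a & 0xF) << 4) | (b & 0xF)]

def mul8(a, b):
    """Unsigned 8x8 multiply built from four 4x4 lookups and adders."""
    a_hi, a_lo = (a >> 4) & 0xF, a & 0xF
    b_hi, b_lo = (b >> 4) & 0xF, b & 0xF
    return ((mul4(a_hi, b_hi) << 8)
            + ((mul4(a_hi, b_lo) + mul4(a_lo, b_hi)) << 4)
            + mul4(a_lo, b_lo))

print(mul8(200, 123), 200 * 123)
```

The largest 4x4 product is 15 x 15 = 225, so each table entry fits in 8 data bits.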

24 Applications of Embedded Memory
Constant multiplier (in DSP and control systems):
- The constant value determines the pattern of the EAB contents.
- If the constant changes at run time, a new pattern can be loaded into the EAB.
- The multiplier precision can be adjusted through the number of output bits (to save resources).
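A minimal sketch of how such a table might be generated is shown below. The function name, the unsigned operands, and the policy of keeping the most significant product bits when the output is narrowed are all assumptions of this example.

```python
def constant_multiplier_table(constant, input_bits=8, constant_bits=8, output_bits=16):
    """Memory contents that multiply an unsigned input by a fixed constant.

    Only the top `output_bits` of the full product are kept, so a smaller
    output width trades precision for memory width.
    """
    full_bits = input_bits + constant_bits
    if not 0 <= constant < (1 << constant_bits):
        raise ValueError("constant does not fit in constant_bits")
    if not 1 <= output_bits <= full_bits:
        raise ValueError("output_bits out of range")
    shift = full_bits - output_bits
    return [(x * constant) >> shift for x in range(1 << input_bits)]

full = constant_multiplier_table(200)                   # 16-bit results
coarse = constant_multiplier_table(200, output_bits=8)  # 8-bit results
print(full[100], coarse[100])
```

Changing the constant at run time corresponds to calling the generator again with the new value and reloading the memory with the result.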

25 Applications of Embedded Memory
- FSMs with complex state transitions
- General-purpose FSM:
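A common way to map an FSM onto a memory is to use the current state and the inputs as the address and to read the next state and the outputs from the data word. The sketch below assumes that arrangement; the detector for the bit pattern 1-0-1 is an invented example, not taken from the slides.

```python
# (state, input bit) -> (next state, output bit) for a Mealy "101" detector.
# State 0: nothing matched, 1: seen "1", 2: seen "10".
TRANSITIONS = {
    (0, 0): (0, 0), (0, 1): (1, 0),
    (1, 0): (2, 0), (1, 1): (1, 0),
    (2, 0): (0, 0), (2, 1): (1, 1),
}

# ROM with address = {state[1..0], input}, data = {next_state[1..0], output}.
# Addresses of the unused state 3 keep the default word 0 (go to state 0).
ROM = [0] * 8
for (state, bit), (next_state, out) in TRANSITIONS.items():
    ROM[(state << 1) | bit] = (next_state << 1) | out

def run(bits):
    """Clock the FSM once per input bit and collect the output bits."""
    state, outputs = 0, []
    for bit in bits:
        word = ROM[(state << 1) | bit]
        state, out = word >> 1, word & 1
        outputs.append(out)
    return outputs

print(run([1, 0, 1, 0, 1, 1]))
```

Changing the behavior of the machine only requires new memory contents, which is what makes the structure general purpose.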

26 Applications of Memory

27 Applications of Embedded Memory
Transcendental functions (sine, ..., logarithm, ...), which are hard to compute algorithmically and hard to implement in hardware:
- Function argument: applied to the address lines.
- Result: read from the data outputs.
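A sine table is a typical instance. The sketch below assumes an 8-bit address that covers one full period and a signed result scaled to the range -127..127; both choices are illustrative and not taken from the slides.

```python
import math

ADDRESS_BITS = 8
AMPLITUDE = 127  # assumed scaling so that results fit in a signed 8-bit word

# Address i represents the angle 2*pi*i/256; the data word is the scaled sine.
SINE_TABLE = [
    round(AMPLITUDE * math.sin(2 * math.pi * i / (1 << ADDRESS_BITS)))
    for i in range(1 << ADDRESS_BITS)
]

def sine_lookup(phase):
    """Return the scaled sine for an 8-bit phase value (the memory address)."""
    return SINE_TABLE[phase & 0xFF]

print(sine_lookup(0), sine_lookup(64), sine_lookup(192))
```

The table is computed once offline, so the hardware only performs a memory read per evaluation.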

28 Applications of Embedded Memory
Large code converters:
- Converter from an 8-bit code to a 10-bit code
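The slide does not name the two codes. As one possible illustration, an 8-bit binary number converts to a 10-bit BCD word (2 bits for the hundreds digit, 4 bits each for tens and ones); this choice of codes is an assumption of the sketch below.

```python
def to_bcd10(value):
    """Pack an 8-bit binary value (0..255) into a 10-bit BCD word."""
    hundreds, rest = divmod(value, 100)  # hundreds digit is 0, 1 or 2
    tens, ones = divmod(rest, 10)
    return (hundreds << 8) | (tens << 4) | ones

# Converter table: the 8-bit input code is the address, the 10-bit code is the data.
BCD_TABLE = [to_bcd10(v) for v in range(256)]

print(hex(BCD_TABLE[255]), hex(BCD_TABLE[99]))
```

At 256 x 10 = 2560 bits, this table is larger than a single 2048-bit block, so it would presumably be spread over more than one block.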

29 Xilinx Virtex II Pro (Digital Clock Manager)

30 Xilinx Virtex II Pro

31 Xilinx Virtex 4

32 Virtex 5

Computation-Oriented Tiles

34 Virtex Family

35 18x18 Multipliers for Computation and DSP

36 Virtex II Pro Family Devices (Digital Clock Manager)

37 18x18 Multipliers
In Virtex 5: DSP48E slices
- 25 x 18 two's complement multiplication
- One adder, one subtracter, and an accumulator
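A rough behavioral sketch of a multiply-accumulate step with these operand widths is given below. It models only the arithmetic, not the actual slice; the helper names are invented, and the 48-bit accumulator width is an assumption based on the slice name.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def mac(acc, a, b, acc_bits=48):
    """One multiply-accumulate step: acc + a * b with 25-bit and 18-bit operands."""
    product = to_signed(a, 25) * to_signed(b, 18)
    return to_signed(acc + product, acc_bits)  # wrap to the accumulator width

acc = 0
for a, b in [(3, 4), (-2, 5), (100, -1)]:
    acc = mac(acc, a, b)
print(acc)
```

Negative Python integers are masked to their two's complement bit pattern first, so the same helper accepts either signed values or raw bit patterns.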

38 Multipliers in Altera Cyclone III

39 Embedded Multipliers

40 Embedded Multipliers
- Each embedded multiplier can be configured as
  - one 18 × 18 multiplier, or
  - two 9 × 9 multipliers.
- For operands wider than 18 × 18, the Quartus II software cascades multipliers.
- There is no restriction on the data width,
  - but the greater the data width, the slower the multiplication.
- Soft multipliers can also be implemented using Cyclone III M9K memory blocks.
  - This increases the number of multipliers available to a design.
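Cascading can be pictured as splitting each wide operand into 18-bit halves and summing shifted partial products. The sketch below shows this for unsigned 36 × 36 multiplication; it illustrates the arithmetic only and is not a description of how Quartus II maps the logic.

```python
MASK18 = (1 << 18) - 1

def mul18(a, b):
    """Stand-in for one embedded 18 x 18 multiplier (unsigned operands assumed)."""
    if not (0 <= a <= MASK18 and 0 <= b <= MASK18):
        raise ValueError("operands must fit in 18 bits")
    return a * b

def mul36(a, b):
    """Unsigned 36 x 36 multiply from four 18 x 18 partial products."""
    a_hi, a_lo = a >> 18, a & MASK18
    b_hi, b_lo = b >> 18, b & MASK18
    return ((mul18(a_hi, b_hi) << 36)
            + ((mul18(a_hi, b_lo) + mul18(a_lo, b_hi)) << 18)
            + mul18(a_lo, b_lo))

print(mul36(123456789, 987654321) == 123456789 * 987654321)
```

Each doubling of the operand width needs four times as many multipliers plus extra adders, which is consistent with wider multiplications being slower.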

41 Number of Multipliers

42 Multiplier Block Architecture

43 9-Bit Mode

44 Multiplier Megafunctions
For instantiating multipliers, refer to:
- Quartus User Guide, Synthesis

45 Stratix II DSP Blocks

46 Stratix II DSP Blocks

47 Stratix II DSP Blocks

48 Stratix II DSP Blocks

49

50 Stratix Architecture

51 Ratio-Based Architectures
- If an application does not need multipliers, the multipliers provide little benefit.
- One solution: multiple sub-families within a device family, each with a different ratio of soft logic to hard logic.
- The designer can select the device with the most appropriate soft/hard ratio:
  - → minimizes "wasted" area
  - → the FPGA vendor must support a larger number of devices

52 Ratio-Based Architectures
Virtex 4/Virtex 5 sub-families:
1. LX: focus on soft logic and memory
2. SX: focus on arithmetic computational units
3. FX: focus on a processor and high-speed serial interfaces
Virtex 6 sub-families:
1. LXT: high-performance logic with advanced serial connectivity
2. SXT: highest signal processing capability with advanced serial connectivity
3. HXT: highest bandwidth serial connectivity

53 Xilinx Virtex 4

54 Virtex 5

Embedded Processors

56 System-Level Design
- Until recently, the CPU and its peripherals were implemented as discrete chips.
Two scenarios:
- Memory connected to the CPU via a general-purpose processor bus
- Tightly coupled memory (TCM) connected to the processor via a dedicated bus

57 Embedded System Design
- Dedicated chips for the CPU and peripherals lead to:
  - high area cost,
  - low reliability.
- When only a relatively small amount of memory is needed, the memory integrated in the FPGA is used.

58 Challenges
- Deciding on the hardware/software partitioning.
- The design environment must support hardware/software co-verification.

59 SoPC
SoC:
- A chip that integrates the major functional elements of a complete end product.
Complex FPGAs contain:
- CPU
- Memory
- Arithmetic units (multipliers, …)
- Peripheral modules
- Logic
→ A whole system on a programmable chip (SoPC)

60 Microprocessor Cores
Two types:
- Hard core
  - Implemented as a hardwired component
  - E.g. PowerPC in Xilinx
  - E.g. ARM in Altera
  - E.g. MIPS in QuickLogic
- Soft core
  - Logic blocks are configured to act as microprocessor(s)
  - E.g. MicroBlaze in Xilinx
  - E.g. Nios II in Altera
  - E.g. Q90C1 in QuickLogic

61 Hard Microprocessor Cores
Two scenarios:
1. Locate the core in a strip to the side of the FPGA fabric.
   - Easier for the tools, because the main FPGA fabric is identical for devices with or without the hard core.
   - The FPGA vendor can embed many additional functions in the strip to complement the microprocessor.
   - Altera: ARM in Excalibur

62 Hard Microprocessor Cores
Two scenarios:
2. Embed the core(s) directly into the main FPGA fabric.
   - Design tools must take the presence of these blocks in the fabric into account.
   - Memory used by the core comes from embedded RAM blocks.
   - Speed advantage from proximity to the main FPGA fabric.
   - Xilinx: PowerPC in Virtex II-Pro, Virtex 4, and Virtex 5.

63 Hard Microprocessor Cores
2. (cont.) Embed the core(s) directly into the main FPGA fabric.
   - No dedicated processor bus or peripheral bus.
   - These buses must be implemented using FPGA logic.
   - Advantage: flexibility to define the architecture of the embedded system.
   - Disadvantage: the processor cannot perform useful work until the FPGA logic is configured.

64 Soft Processor Core
Disadvantages:
- Generally slower
- Larger
Advantage:
- Can often be customized to suit the needs of the application exactly
  - → gains back some of the lost performance and area efficiency.

65 Soft Microprocessor Cores
Firm or soft:
- Soft: delivered as an RTL netlist that will be synthesized
- Firm: already placed and routed
Peripherals in soft or firm form:
- E.g. memory controllers, interrupt controllers, communication functions, timer counters.
- Refer to the FPGA vendor's library.
Xilinx:
- MicroBlaze: 32-bit microprocessor (~1000 logic cells)
- PicoBlaze: 8-bit microprocessor (~150 logic cells)
Altera:
- Nios II: 32-bit microprocessor

66 References [Xilinx] [Altera]