ALTERA FPGAs and NIOSII


ALTERA FPGAs and NIOSII
ELG6158 Computer Systems Architecture
Miodrag Bolic

Presentation Outline
- Basic description of Stratix Altera Devices
- NIOS II processor architecture
- How to design a system using NIOS II processor

Stratix EP1S10 [2]

TriMatrix™ Memory [1]
Dedicated External Memory Interface
- M512 Blocks: 512 bits per block + parity. Uses: Small FIFOs, Shift Register, Rake Receiver Correlator, FIR Filter Delay Line
- M4K Blocks: 4 Kbits per block + parity. Uses: Header / Cell Storage, Channelized Functions, ATM cell–packet processing, Nios Program Memory
- M-RAM: 512 Kbits per block + parity. Uses: Packet / Data Storage, Nios Program Memory, System Cache, Video Frame Buffers, Echo Canceller Data Storage
- Also listed on the slide: Look-Up Schemes, Packet & Cell Buffering, Cache
More Bits for Larger Memory Buffering (toward M-RAM); More Data Ports for Greater Memory Bandwidth (toward M512)

Memory Bandwidth Summary: Stratix Device Family [1]
Device   Total RAM Bits   M-RAM Blocks   M4K Blocks   M512 Blocks   Maximum Bandwidth (Mbps)
EP1S10   920,448          1              60           94            1,245,024
EP1S20   1,669,248        2              82           194           2,096,928
EP1S25   1,944,576        2              138          224           2,894,400
EP1S30   3,317,184        4              171          295           3,750,192
EP1S40   3,423,744        4              183          384           4,384,800
EP1S60   5,215,104        6              292          574           6,762,528
EP1S80   7,427,520        9              364          767           8,784,720

Logic Array Blocks (LAB) [2]
- 10 LEs per LAB (LE1 to LE10)
- Local Interconnect
- LAB-Wide Control Signals
[Figure: LAB block diagram showing the control signals and the local interconnect feeding the 10 LEs]

LAB Arrangement
LABs Communicate Directly to Each Other & Other Blocks Both Horizontally & Vertically
[Figure: LABs and M512 blocks arranged in LAB rows and LAB columns]

Logic Elements
Stratix™ LE
- Smallest Units of Logic
- Used for Combinatorial/Registered Logic
[Figure: Stratix™ LE with Carry-In, Register Chain Input and LUT Chain Input; General Routing & Local Routing; Carry-Out, LUT Chain Output and Register Chain Output]

Total LE Resources
Device   Total LEs
EP1S10   10,570
EP1S20   18,460
EP1S25   25,660
EP1S30   32,470
EP1S40   41,250
EP1S60   57,120
EP1S80   79,040

LE Datasheet Image

LE Features
- 4-Input Look-Up Table (LUT)
- Configurable Register
- 2 Operation Modes
- Dynamic Add/Subtract Control
- Carry-Select Chain Logic
- Performance-Enhancing Features: LUT & Register Chain
- Area-Enhancing Features: Register Packing & Feedback
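The 4-input LUT listed above can be pictured in software as a 16-entry truth table: the four data inputs form an index, and the bit stored at that index is the output. The following is a minimal sketch under that assumption. The names `lut4_eval`, `LUT4_AND4` and `LUT4_XOR4` are illustrative and do not come from the Altera documentation.

```c
#include <stdint.h>

/* Hypothetical model of a 4-input LUT: bit i of `mask` holds the output
   for the input combination whose binary value is i (d1 is the LSB). */
static unsigned lut4_eval(uint16_t mask, unsigned d1, unsigned d2,
                          unsigned d3, unsigned d4)
{
    unsigned index = (d1 & 1u) | ((d2 & 1u) << 1) |
                     ((d3 & 1u) << 2) | ((d4 & 1u) << 3);
    return (mask >> index) & 1u;
}

/* Example configurations (illustrative names). */
#define LUT4_AND4 0x8000u  /* output is 1 only when all four inputs are 1 */
#define LUT4_XOR4 0x6996u  /* output is the parity of the four inputs */
```

Changing only the mask changes the logic function, which is why one LUT can implement any function of its four inputs.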

LE Inputs/Outputs
Inputs
- 4 Data
- 2 LE Carry-Ins & 1 LAB Carry-In
- 1 Dynamic Addition/Subtraction Control
- Register Controls
Outputs
- 2 LE Carry-Outs
- 2 Row/Column/DirectLink Outputs
- 1 Local Output
- 1 LUT Chain & 1 Register Chain

Operation Modes
Normal
- General Combinatorial or Registered Logic
Dynamic Arithmetic
- Used for Adders, Counters, Accumulators, Comparators
- Uses Carry Chain for Faster Operation
Chosen Automatically by Quartus® II & NativeLink® Synthesis Tools Based on Design & Design Constraints

LE Register Controls
- Clock/Clock Enable
- Synchronous & Asynchronous Clear
- Synchronous & Asynchronous Load & Data
- Asynchronous Preset (Preset Function Loads a ‘1’)
[Figure: register symbol with ports ALD/PRE, ADATA, D, Q, ENA, CLRN]

Normal Mode
[Figure: functional diagram of the LE in normal mode. Inputs: LUT Chain Input, Register Chain Input, Register Control Signals, addnsub, cin, data1 to data4. Blocks: 4-Input LUT, Sync Load & Clear Logic, register (D), Register Feedback. Outputs: Row, Column & DirectLink Routing, Local Routing, LUT Chain Output, Register Chain Output]
Note: Functional Diagram Only. Please See Datasheet for more Details. addnsub & data1 connected via XOR logic.

Combinatorial Logic Only
[Figure: the same LE functional diagram as for Normal Mode, with the same note]

Sequential Logic Only
[Figure: the same LE functional diagram as for Normal Mode, with the same note]

Dynamic Arithmetic Mode
[Figure: functional diagram of the LE in dynamic arithmetic mode. Inputs: LAB Carry-In, Carry-In0, Carry-In1, Register Chain Input, Register Control Signals, addnsub, data1 to data3. Blocks: Carry-In Logic, Sum Calculator, Carry Calculator, Carry-Out Logic, Sync Load & Clear Logic, register (D). Outputs: Row, Column & DirectLink Routing, Local Routing, Register Chain Output, Carry-Out0, Carry-Out1]
Note: Functional Diagram Only. Please See Datasheet for more Details.
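The dynamic add/subtract control relies on a standard two's-complement identity: XOR-ing one operand with an all-ones mask inverts it, and injecting a carry-in of 1 turns a + ~b + 1 into a - b. The sketch below shows that idea only. The function name `add_or_sub` and the polarity of its `subtract` flag are assumptions for illustration and do not model the actual polarity or wiring of the `addnsub` signal.

```c
#include <stdint.h>

/* Hypothetical model: subtract != 0 selects a - b, otherwise a + b.
   The XOR with an all-ones mask inverts b, and the same control bit is
   used as carry-in, so the subtract case computes a + ~b + 1. */
static uint32_t add_or_sub(uint32_t a, uint32_t b, int subtract)
{
    uint32_t mask = subtract ? 0xFFFFFFFFu : 0u;
    uint32_t carry_in = subtract ? 1u : 0u;
    return a + (b ^ mask) + carry_in;
}
```

Because the control is an ordinary input, the same adder hardware can switch between the two operations at run time.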

Carry-Select Logic
- Each Cell Pre-Calculates Sum & Carry-Out for Carry = 1 & Carry = 0
- Carry-In Selects which Pre-Calculation Is Used
[Figure: a single LUT computes A0+B0+1 and A0+B0+0; CIN selects SUMOUT and COUT from the pre-calculated results COUT1 and COUT0]
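The carry-select idea can be sketched in C as follows: each one-bit cell computes both candidate results up front, and the incoming carry only picks between them. This is an illustrative software model, not Altera's implementation. The names `cell_out`, `carry_select_cell` and `carry_select_add` are hypothetical.

```c
#include <stdint.h>

/* Result of one cell: a sum bit and a carry-out bit. */
typedef struct { unsigned sum; unsigned cout; } cell_out;

/* One carry-select cell. `a`, `b` and `cin` must each be 0 or 1. */
static cell_out carry_select_cell(unsigned a, unsigned b, unsigned cin)
{
    /* Pre-calculated results for carry-in = 0 and carry-in = 1. */
    unsigned sum0  = a ^ b;         /* a + b + 0 */
    unsigned cout0 = a & b;
    unsigned sum1  = (a ^ b) ^ 1u;  /* a + b + 1 */
    unsigned cout1 = a | b;

    /* The carry-in acts purely as a multiplexer select. */
    cell_out out;
    out.sum  = cin ? sum1 : sum0;
    out.cout = cin ? cout1 : cout0;
    return out;
}

/* Chain `width` cells (at most 32) to add the low `width` bits of a and b. */
static uint32_t carry_select_add(uint32_t a, uint32_t b,
                                 unsigned width, unsigned cin)
{
    uint32_t result = 0;
    for (unsigned i = 0; i < width; i++) {
        cell_out c = carry_select_cell((a >> i) & 1u, (b >> i) & 1u, cin);
        result |= (uint32_t)c.sum << i;
        cin = c.cout;
    }
    return result;
}
```

With `width` set to 10, the loop mirrors a chain through the 10 LEs of one LAB. The point of the technique is that the late-arriving carry has to pass through only a multiplexer rather than the full sum logic.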

Carry Chain Details
- Carry Chains Begin & End in Any LE
- 2 Carry Chains Can Exist in Any LAB
- Carry-Select Generated in LEs 5 & 10
- Every LE Not in Critical Timing Path
[Figure: carry chain through a LAB, from LAB Carry-In through LE1 to LE10 (inputs A1/B1 to A10/B10, outputs Sum1 to Sum10) to LAB Carry-Out]

LUT & Register Chains
LUT Chain
- Output of LUT Connects Directly to LUT Below
- Available Only in Normal Mode
- Ex. Wide Fan-In Functions
Register Chain
- Output of Register Connects Directly to Register Below (Shift Register)
- LUT Can Be Used for Unrelated Function
- Ex. LE Shift Register
Both Chains End at LAB Boundary
[Figure: LE1 and LE2, each with a LUT and a D/Q register, linked by the LUT chain and the register chain, continuing through LEs 3 - 10]
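The register chain behaves like a shift register: on each clock edge, every register takes the value of the one above it. A minimal software model is sketched below, assuming a chain of 10 registers (one per LE in a LAB). The names `CHAIN_LENGTH` and `register_chain_clock` are illustrative only.

```c
#define CHAIN_LENGTH 10  /* one register per LE in a LAB */

/* Hypothetical model of one clock edge on a register chain: every
   register loads the value of the register above it, and the first
   register loads `din`. Returns the bit that leaves the last register. */
static unsigned register_chain_clock(unsigned regs[CHAIN_LENGTH], unsigned din)
{
    unsigned dout = regs[CHAIN_LENGTH - 1];
    for (int i = CHAIN_LENGTH - 1; i > 0; i--)
        regs[i] = regs[i - 1];
    regs[0] = din & 1u;
    return dout;
}
```

A bit written at the top appears at the bottom register after ten clocks. In the hardware, the LUTs of those LEs remain free for unrelated logic, as the slide notes.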

Stratix Interconnects
- Global Signals
- LE & Register Chains
- Carry Chains
- Local Interconnect
- DirectLink™
- MultiTrack Interconnects: Row Interconnects, Column Interconnects

Local Interconnect
- Groups 10 LEs Together
- Provides Input Signals to Blocks (LABs, Memory, DSP Blocks)
- # of Local Lines Depends on Block
[Figure: local interconnects of an M512 block and a LAB]

DirectLink
Allows Blocks to Drive Local Interconnects of Neighboring Blocks in the Same Row
[Figure: two LABs (LE1 to LE10 each) and an M512 block, each with its own local interconnect]

DirectLink (cont.)
- Provides Fast Communication between Neighboring Blocks
- One LE Has Fast Access to Up to 29 Other LEs in Area
- Saves Row Resources

MultiTrack Interconnect Architecture
- Provides Connections between All Device Blocks
- Series of 3 Types of Continuous Row & Column Interconnects
- Each Has a Fixed Speed and Length
- Constant Performance Across Family Members within Given Area
- Simplifies Block Design
- Same Routing Resources Available Regardless of Location

Row Resources
3 Row Interconnect Lengths
- R4: 160 Lines Wide, 4 LABs
- R8: 48 Lines Wide
- R24: 24 Lines Wide

Row Resources (cont.)
Each Block Has Own Row Resource to Drive Right and Left
[Figure: R4 Routing Line Driving Left and R4 Routing Line Driving Right]

Row Resource Details
[Figure labels: R4, R8, R24]
- Terminate at M-RAM
- Only Connect to Local & R8/C8 Interconnects
- Faster than 2 R4s
R24
- Do Not Interface with Blocks Directly
- Can Cross M-RAM
- Fastest Resource for Long Connections (Ex. Design Block to Design Block)

Column Resources
- 3 Interconnect Lengths
- Features Similar to Row Interconnects
- Each Block Has Column Resource to Drive Up and Down
- Interconnects Are Staggered
- Interconnects Can Drive End-to-End
[Figure: C4 and C8 column interconnects; 4 LABs]

Presentation Outline
- Basic description of Stratix Altera Devices
- NIOS II processor architecture
- How to design a system using NIOS II processor

NIOS II Overview [3]
- Soft IP Core: a soft-core processor is a microprocessor fully described in software, usually in an HDL, which can be synthesized in programmable hardware, such as FPGAs.
- Reduced Instruction Set Computer (RISC)
- Configurations with no pipeline or with a 5- or 6-stage pipeline
- Full 32-bit instruction set, data path, and address space
- 32 general-purpose registers
- 32 external interrupt sources
- Access to a variety of on-chip peripherals, and interfaces to off-chip memories and peripherals
- Software development environment based on the GNU C/C++ tool chain and Eclipse IDE

NIOS II Scalability
Powerful multiprocessing systems can be built

NIOS II Processor Core [3] How do we build

Implementation
The functional units of the Nios II architecture form the foundation for the Nios II instruction set. The Nios II architecture describes an instruction set, not a particular hardware implementation.
Trade-offs:
- More or less of a feature, e.g. the amount of instruction cache memory
- Inclusion or exclusion of a feature, e.g. the JTAG debug module
- Hardware implementation or software emulation, e.g. the divider

Types of Processors

Memory Organization
What is the name of the technique for accessing peripherals?

Cache Performance
Memory   I-Cache   D-Cache   Normalised Performance
SDRAM    No        No        40.2%
SDRAM    No        Yes       55.2%
SDRAM    Yes       No        64.3%
SDRAM    Yes       Yes       96.4%
OnChip   No        No        100.0%
OnChip   No        Yes       98.0%
OnChip   Yes       No        110.2%
OnChip   Yes       Yes       105.6%
Performance is relative to on-chip RAM with no cache, running dhry.c modified for unbuffered I/O.
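As a worked reading of the SDRAM figures above, the ratio of two normalised scores gives the speedup from enabling caches. The sketch below uses only numbers from the slide. The names `speedup`, `SDRAM_NO_CACHE` and `SDRAM_BOTH_CACHES` are illustrative.

```c
/* Normalised Dhrystone scores from the slide, in percent of the
   on-chip, no-cache baseline. */
static const double SDRAM_NO_CACHE    = 40.2;
static const double SDRAM_BOTH_CACHES = 96.4;

/* Ratio of two normalised scores: how many times faster `after` is. */
static double speedup(double before, double after)
{
    return after / before;
}
```

For example, `speedup(SDRAM_NO_CACHE, SDRAM_BOTH_CACHES)` comes out at roughly 2.4. With both caches enabled, the SDRAM system runs this benchmark about 2.4 times faster and lands within a few percent of uncached on-chip RAM.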

Tightly Coupled Memory
- Fast data buffers
- Fast sections of code
- Fast interrupt handler
- Critical loop
- Constant access time; guaranteed not to have arbitration delays
- Up to 4 tightly coupled memories
Software Guidelines
- Software accesses tightly-coupled memory addresses just like any other addresses.
- Cache operations have no effect when targeting tightly-coupled memory.

Pipelining
Static branch prediction is implemented using the branch offset direction:
- a negative offset is predicted as taken
- a positive offset is predicted as not-taken
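The static prediction rule above reduces to a sign test on the branch offset. A minimal sketch follows. The function name `predict_taken` is hypothetical, and treating a zero offset as not-taken is an assumption, since the slide does not mention that case.

```c
#include <stdbool.h>
#include <stdint.h>

/* Static prediction from the sign of the branch offset: backward
   branches (negative offset, typically loop back-edges) are predicted
   taken; forward branches are predicted not taken.
   Assumption: an offset of zero is treated as not taken. */
static bool predict_taken(int32_t branch_offset)
{
    return branch_offset < 0;
}
```

The rule needs no history table: a loop's closing branch jumps backward and is taken on every iteration except the last, so predicting it as taken is usually right.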

Presentation Outline
- Basic description of Stratix Altera Devices
- NIOS II processor architecture
- Review pipelining techniques
- Review memory access techniques
- How to design a system using NIOS II processor

Hardware Abstraction Layer (HAL) [4]
- Isolates the application software from hardware modifications.
- Applications are device-independent because the HAL abstracts the details of devices such as:
  - Character mode devices: UART core, JTAG UART core, LCD display controller
  - Flash memory devices
  - Timer devices
  - DMA controller core
  - Ethernet MAC/PHY Controller
- The HAL application program interface (API) is integrated with the ANSI C standard library.

Layers of HAL API [4]
HAL library generation:
- SOPC Builder generates a hardware system
- Nios II IDE generates a custom HAL system library to match the hardware configuration
- Changes in the hardware configuration automatically propagate to the HAL device driver configuration
NIOS II is programmed in C

Programming NIOS II Processor [4]
Programming UART: Standard Input, Standard Output routines in C
---------------------------------------------------
#include <stdio.h>
#include <string.h>

int main(void)
{
    const char *msg = "hello world";
    FILE *fp;

    /* Open the UART as a character-mode device through the HAL. */
    fp = fopen("/dev/uart1", "w");
    if (fp) {
        fprintf(fp, "%s", msg);
        fclose(fp);   /* close only if the open succeeded */
    }
    return 0;
}

References
[1] Altera Corp., Stratix & Stratix II Module 3: Using TriMatrix Memories, 2004.
[2] Altera Corp., Stratix Module 2: Logic Structure & MultiTrack Interconnect, 2004.
[3] Altera Corp., Nios II Processor Reference Handbook, 2005.
[4] Altera Corp., Nios II Software Developer's Handbook, 2005.