ALTERA FPGAs and NIOSII

1 ALTERA FPGAs and NIOSII
ELG6158 Computer Systems Architecture Miodrag Bolic

2 Presentation Outline
- Basic description of Stratix Altera Devices
- NIOS II processor architecture
- How to design a system using NIOS II processor

3 Stratix EP1S10 [2]

6 TriMatrix™ Memory [1]
Three embedded block sizes plus a dedicated external memory interface:
- M512 blocks (512 bits per block + parity): small FIFOs, shift registers, rake receiver correlators, FIR filter delay lines
- M4K blocks (4 Kbits per block + parity): header / cell storage, channelized functions, ATM cell–packet processing, Nios program memory
- M-RAM blocks (512 Kbits per block + parity): packet / data storage, Nios program memory, system cache, video frame buffers, echo canceller data storage
- Further uses listed: look-up schemes, packet & cell buffering, cache
Larger blocks give more bits for larger memory buffering; smaller blocks give more data ports for greater memory bandwidth.

7 Memory Bandwidth Summary: Stratix Device Family [1]
Device   Total RAM Bits   M-RAM Blocks   M4K Blocks   M512 Blocks   Maximum Bandwidth (Mbps)
EP1S10          920,448              1           60            94                  1,245,024
EP1S20        1,669,248              2           82           194                  2,096,928
EP1S25        1,944,576              2          138           224                  2,894,400
EP1S30        3,317,184              4          171           295                  3,750,192
EP1S40        3,423,744              4          183           384                  4,384,800
EP1S60        5,215,104              6          292           574                  6,762,528
EP1S80        7,427,520              9          364           767                  8,784,720

9 Logic Array Blocks (LAB) [2]
[Figure: a LAB containing 10 LEs (LE1 to LE10), fed by the local interconnect and by LAB-wide control signals.]

10 LAB Arrangement
LABs communicate directly with each other and with other blocks, both horizontally and vertically.
[Figure: grid of LABs arranged in LAB rows and LAB columns, with M512 blocks in their own column.]

11 Logic Elements
- Smallest units of logic
- Used for combinatorial/registered logic
[Figure: Stratix™ LE. Inputs: carry-in, register chain input, LUT chain input. Outputs: general routing & local routing, carry-out, LUT chain output, register chain output.]

12 Total LE Resources
Device   Total LEs
EP1S10      10,570
EP1S20      18,460
EP1S25      25,660
EP1S30      32,470
EP1S40      41,250
EP1S60      57,120
EP1S80      79,040

13 LE Datasheet Image

14 LE Features
- 4-input look-up table (LUT)
- Configurable register
- 2 operation modes
- Dynamic add/subtract control
- Carry-select chain logic
- Performance-enhancing features: LUT & register chain
- Area-enhancing features: register packing & feedback

15 LE Inputs/Outputs
Inputs:
- 4 data
- 2 LE carry-ins & 1 LAB carry-in
- 1 dynamic addition/subtraction control
- Register controls
Outputs:
- 2 LE carry-outs
- 2 row/column/DirectLink outputs
- 1 local output
- 1 LUT chain & 1 register chain

16 Operation Modes
Normal:
- General combinatorial or registered logic
Dynamic arithmetic:
- Used for adders, counters, accumulators, comparators
- Uses carry chain for faster operation
Chosen automatically by Quartus® II & NativeLink® synthesis tools based on design & design constraints.

17 LE Register Controls
- Clock / clock enable
- Synchronous & asynchronous clear
- Synchronous & asynchronous load & data
- Asynchronous preset: the preset function loads a '1'
[Figure: register with ports ALD/PRE, ADATA, D, Q, ENA, CLRN.]

18 Normal Mode
[Figure: LE in normal mode. Inputs: addnsub, cin, data1 to data4, LUT chain input, register chain input, register control signals. Blocks: 4-input LUT, sync load & clear logic, register with register feedback. Outputs: row, column & DirectLink routing; local routing; LUT chain output; register chain output.]
Note: functional diagram only; see the datasheet for more details. addnsub and data1 are connected via XOR logic.

19 Combinatorial Logic Only
[Figure: the same LE functional diagram as on the Normal Mode slide, shown for combinatorial-only use.]

20 Sequential Logic Only
[Figure: the same LE functional diagram as on the Normal Mode slide, shown for sequential-only use.]

21 Dynamic Arithmetic Mode
[Figure: LE in dynamic arithmetic mode. Inputs: LAB carry-in, carry-in0, carry-in1, addnsub, data1 to data3, register chain input, register control signals. Blocks: carry-in logic, sum calculator, carry calculator, carry-out logic, sync load & clear logic, register. Outputs: row, column & DirectLink routing; local routing; carry-out0 and carry-out1; register chain output.]
Note: functional diagram only; see the datasheet for more details.

22 Carry-Select Logic
- Each cell pre-calculates sum & carry-out for carry = 1 & carry = 0
- Carry-in selects which pre-calculation is used
[Figure: a single LUT computes A0+B0+1 and A0+B0+0; CIN selects SUMOUT and COUT from COUT1 and COUT0.]

23 Carry Chain Details
- Carry chains begin & end in any LE
- 2 carry chains can exist in any LAB
- Carry-select generated in LEs 5 & 10
- Not every LE lies in the critical timing path
[Figure: a carry chain through one LAB, from LAB carry-in to LAB carry-out; LE1 to LE10 add operand bits A1..A10 and B1..B10 to produce Sum1..Sum10.]

24 LUT & Register Chains
LUT chain:
- Output of LUT connects directly to LUT below
- Available only in normal mode
- Ex. wide fan-in functions
Register chain:
- Output of register connects directly to register below (shift register)
- LUT can be used for unrelated function
- Ex. LE shift register
Both chains end at the LAB boundary.
[Figure: LE1 and LE2, each with a LUT and a D/Q register, linked by the LUT chain and the register chain.]

25 Stratix Interconnects
- Global signals
- LE & register chains
- Carry chains
- Local interconnect
- DirectLink™
- MultiTrack interconnects
  - Row interconnects
  - Column interconnects

26 Local Interconnect
- Groups 10 LEs together
- Provides input signals to blocks (LABs, memory, DSP blocks)
- Number of local lines depends on block
[Figure: local interconnect feeding an M512 block and local interconnect feeding a LAB.]

27 DirectLink
Allows blocks to drive the local interconnects of neighboring blocks in the same row.
[Figure: two LABs (LE1 to LE10 each) and an M512 block, each with its own local interconnect.]

28 DirectLink (cont.)
- Provides fast communication between neighboring blocks
- One LE has fast access to up to 29 other LEs in its area (the 9 others in its own LAB plus 10 in each adjacent LAB)
- Saves row resources

29 MultiTrack Interconnect Architecture
- Provides connections between all device blocks
- Series of 3 types of continuous row & column interconnects
  - Each has a fixed speed and length
- Constant performance across family members within a given area
- Simplifies block design
  - Same routing resources available regardless of location

30 Row Resources
3 row interconnect lengths:
- R4: 160 lines wide
- R8: 48 lines wide
- R24: 24 lines wide
[Figure: an R4 line spanning 4 LABs.]

31 Row Resources (cont.)
Each block has its own row resource to drive right and left.
[Figure: R4 routing line driving left; R4 routing line driving right.]

32 Row Resource Details
R4:
- Terminate at M-RAM
R8:
- Only connect to local & R8/C8 interconnects
- Faster than 2 R4s
R24:
- Do not interface with blocks directly
- Can cross M-RAM
- Fastest resource for long connections (ex. design block to design block)

33 Column Resources
- 3 interconnect lengths
- Features similar to row interconnects
- Each block has a column resource to drive up and down
- Interconnects are staggered
- Interconnects can drive end-to-end
[Figure: C4 and C8 column interconnects; 4 LABs.]

34 Presentation Outline
- Basic description of Stratix Altera Devices
- NIOS II processor architecture
- How to design a system using NIOS II processor

36 NIOS II Overview [3]
Soft IP core: a soft-core processor is a microprocessor fully described in software, usually in an HDL, which can be synthesized in programmable hardware such as FPGAs.
- Reduced Instruction Set Computer (RISC)
- Unpipelined, 5-stage, or 6-stage pipeline configurations
- Full 32-bit instruction set, data path, and address space
- 32 general-purpose registers
- 32 external interrupt sources
- Access to a variety of on-chip peripherals, and interfaces to off-chip memories and peripherals
- Software development environment based on the GNU C/C++ tool chain and Eclipse IDE

37 NIOS II Scalability Powerful multiprocessing systems can be built

38 NIOS II Processor Core [3]
How do we build

39 Implementation
The functional units of the Nios II architecture form the foundation for the Nios II instruction set. The Nios II architecture describes an instruction set, not a particular hardware implementation.
Trade-offs:
- More or less of a feature, e.g. the amount of instruction cache memory
- Inclusion or exclusion of a feature, e.g. the JTAG debug module
- Hardware implementation or software emulation, e.g. the divider

40 Types of Processors

41 Memory Organization What is the name of the technique for accessing peripherals?

42 Cache Performance
Memory   I-Cache   D-Cache   Normalised Performance
SDRAM    No        No        40.2%
SDRAM    No        Yes       55.2%
SDRAM    Yes       No        64.3%
SDRAM    Yes       Yes       96.4%
OnChip   No        No        (not given)
OnChip   No        Yes       98.0%
OnChip   Yes       No        (not given)
OnChip   Yes       Yes       (not given)
Performance relative to on-chip RAM with no cache, running dhry.c modified for unbuffered I/O.

43 Tightly Coupled Memory
- Fast data buffers
- Fast sections of code
  - Fast interrupt handler
  - Critical loop
- Constant access time; guaranteed not to have arbitration delays
- Up to 4 tightly coupled memories
Software guidelines:
- Software accesses tightly-coupled memory addresses just like any other addresses.
- Cache operations have no effect when targeting tightly-coupled memory.

44 Pipelining
Static branch prediction is implemented using the branch offset direction:
- a negative offset is predicted as taken
- a positive offset is predicted as not taken

46 Presentation Outline
- Basic description of Stratix Altera Devices
- NIOS II processor architecture
  - Review pipelining techniques
  - Review memory access techniques
- How to design a system using NIOS II processor

48 Hardware Abstraction Layer (HAL) [4]
- Isolates the application software from hardware modifications.
- Applications are device-independent because the HAL abstracts devices such as:
  - Character mode devices: UART core, JTAG UART core, LCD display controller
  - Flash memory devices
  - Timer devices
  - DMA controller core
  - Ethernet MAC/PHY controller
- The HAL application program interface (API) is integrated with the ANSI C standard library.

49 Layers of HAL API [4]
HAL library generation:
- SOPC Builder generates a hardware system
- Nios II IDE generates a custom HAL system library to match the hardware configuration
- Changes in the hardware configuration automatically propagate to the HAL device driver configuration
NIOS II is programmed in C.

50 Programming NIOS II Processor [4]
Programming the UART with the standard input/output routines in C:

    #include <stdio.h>
    #include <string.h>

    int main (void)
    {
      char* msg = "hello world";
      FILE* fp;

      /* The HAL exposes the UART as a device file. */
      fp = fopen ("/dev/uart1", "w");
      if (fp)
      {
        fprintf (fp, "%s", msg);
        fclose (fp);   /* close only if the open succeeded */
      }
      return 0;
    }

51 References
[1] Altera Corp., Stratix & Stratix II Module 3: Using TriMatrix Memories, 2004.
[2] Altera Corp., Stratix Module 2: Logic Structure & MultiTrack Interconnect, 2004.
[3] Altera Corp., Nios II Processor Reference Handbook, 2005.
[4] Altera Corp., Nios II Software Developer's Handbook, 2005.

