1 SHARC: ‘S’uper ‘H’arvard ‘ARC’hitecture. Nagendra Doddapaneni


2 Overview
Harvard Architecture
Super Harvard Architecture
TigerSHARC processor

3 Outline
Background
Harvard Architecture
  −Why?
  −What?
Modern CPU Chip Design
Super Harvard Architecture
TigerSHARC Processor


5 Background
von Neumann Architecture
  −Single storage for instructions and data
Digital Signal Processors
  −Specialized microprocessors designed specifically for digital signal processing, generally in real time


7 Why Harvard Architecture?
von Neumann bottleneck (‘memory bound’)
DSP applications
In a von Neumann architecture the CPU, at any given moment, is
  −Either reading an instruction
  −Or reading/writing data from/to memory
  but never both at once

8 Harvard Architecture (cont…)


10 What is Harvard Architecture?
Physically separate storage and signal pathways for instructions and data
The next instruction can be fetched while the current instruction is executing
Program memory can be small and wide
Data memory can be large and narrower


12 Modern CPU chip design
Incorporates features from both architectures
‘On chip’ cache memory is divided into an instruction cache and a data cache.
Harvard architecture is used when the CPU accesses cache memory.
On a cache miss, ‘off chip’ main memory is accessed using von Neumann architecture.
Main memory is not separated into data and instruction sections.
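The split-cache arrangement on this slide can be sketched in a few lines of Python. This is a toy model, not a description of any real CPU: the class name `SplitCacheCPU`, the unbounded dictionary caches, and the absence of eviction are all assumptions made for illustration.

```python
class SplitCacheCPU:
    """Toy model: separate instruction/data caches over one unified memory."""

    def __init__(self, main_memory):
        # Unified (von Neumann style) main memory: code and data share it.
        self.main_memory = main_memory
        # Separate (Harvard style) on-chip caches. Assumption: unbounded,
        # so nothing is ever evicted.
        self.icache = {}
        self.dcache = {}
        self.misses = 0

    def _read(self, cache, address):
        if address not in cache:
            # Cache miss: go to the shared off-chip memory.
            self.misses += 1
            cache[address] = self.main_memory[address]
        return cache[address]

    def fetch_instruction(self, address):
        return self._read(self.icache, address)

    def load_data(self, address):
        return self._read(self.dcache, address)


memory = ["ADD", "SUB", 10, 20]  # instructions and data in one address space
cpu = SplitCacheCPU(memory)
cpu.fetch_instruction(0)  # miss, filled from main memory
cpu.fetch_instruction(0)  # hit in the instruction cache
cpu.load_data(2)          # miss in the separate data cache
```

After these three accesses `cpu.misses` is 2: the repeated instruction fetch was served from the instruction cache without touching main memory.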


14 Super Harvard Architecture
A cache is used to store instructions, leaving both the instruction bus and the data bus free to fetch operands
Harvard Architecture + cache = Extended Harvard Architecture or Super Harvard Architecture
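A rough way to see the benefit is to count bus cycles for one FIR filter tap, which needs one instruction and two operands (a sample and a coefficient). The sketch below is a simplified model under stated assumptions: every memory access takes one bus cycle, the coefficient lives in program memory on the Harvard machines, and the instruction cache always hits. The function name `bus_cycles_per_tap` is made up for this example.

```python
def bus_cycles_per_tap(architecture: str) -> int:
    """Bus cycles for one FIR tap: 1 instruction fetch + 2 operand fetches."""
    instruction_fetches = 1
    operand_fetches = 2  # one sample and one coefficient

    if architecture == "von_neumann":
        # One shared bus: all three accesses are serialized.
        return instruction_fetches + operand_fetches
    if architecture == "harvard":
        # Two buses work in parallel. The program bus carries the
        # instruction and the coefficient; the data bus carries the sample.
        program_bus = instruction_fetches + 1
        data_bus = 1
        return max(program_bus, data_bus)
    if architecture == "super_harvard":
        # Assumption: the instruction comes from the cache, so each bus
        # only has to carry one operand.
        program_bus = 1
        data_bus = 1
        return max(program_bus, data_bus)
    raise ValueError(f"unknown architecture: {architecture}")


for name in ("von_neumann", "harvard", "super_harvard"):
    print(name, bus_cycles_per_tap(name))
```

Under these assumptions the model gives 3, 2 and 1 bus cycles per tap respectively.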


16 TigerSHARC Processor
Processor Architecture
Instruction Parallelism and SIMD Operation
Integer ALU
Computational blocks
  −X and Y Register File
  −X and Y ALU
  −Multiplier
  −Shifter
  −CLU
Program Sequencer
I, J and K buses
DMA Controller
Applications


18 TigerSHARC Processor Architecture
3 128-bit data buses
2 IALUs
2 Computational Blocks
  −ALU (float and integer)
  −Shifter
  −Multiplier
  −CLU


20 TigerSHARC Instruction Parallelism and SIMD Operation
The core can execute one to four 32-bit instructions simultaneously, encoded in a single instruction line (VLIW).
Whether instructions can execute in parallel depends on
  −The instruction line resources each requires
  −The source and destination registers used
Supports SIMD operations through the use of both Computational Blocks in parallel.
Each Computational Block can execute four 16-bit or eight 8-bit SIMD computations in parallel.
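The idea that instructions share a line only when their resources and registers do not collide can be illustrated with a small check. This is a deliberately simplified sketch, not the actual TigerSHARC issue rules: the `Instr` type, the unit names such as `"X_ALU"`, and the two conditions tested (one instruction per unit, one writer per destination register) are assumptions chosen for illustration.

```python
from typing import NamedTuple, Sequence


class Instr(NamedTuple):
    text: str  # illustrative assembly text only
    unit: str  # hypothetical name of the resource the instruction needs
    dest: str  # destination register


def can_share_line(instrs: Sequence[Instr]) -> bool:
    """Return True if the instructions could sit in one instruction line.

    Simplified rules (assumptions): 1 to 4 instructions, no two need the
    same unit, and no two write the same destination register.
    """
    if not 1 <= len(instrs) <= 4:
        return False
    units = [ins.unit for ins in instrs]
    dests = [ins.dest for ins in instrs]
    return len(set(units)) == len(units) and len(set(dests)) == len(dests)


add_x = Instr("XR0 = R1 + R2", "X_ALU", "XR0")
mul_y = Instr("YR0 = R1 * R2", "Y_MUL", "YR0")
sub_x = Instr("XR3 = R4 - R5", "X_ALU", "XR3")

print(can_share_line([add_x, mul_y]))  # different units and destinations
print(can_share_line([add_x, sub_x]))  # both need the same unit
```

The first pair passes the check and the second fails, because both instructions compete for the same (hypothetical) X block ALU.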


22 TigerSHARC Integer ALU
31 32-bit general registers + 1 status register + 8 dedicated registers for circular buffers
Performs integer ALU operations and data addressing
ALU instructions: ADD, SUB, ARS, LRS (right shifts only), ROT (left and right), AND NOT, NOT, OR, XOR, ABS, MIN, MAX, CMP
Status flags: zero (Z), negative (N), overflow (V), carry (C)
Instruction conditions: EQ, LT, LE, NEQ, NLT, NLE
Instruction options: unsigned (U), circular buffer (CB), bit reverse (BR), computed jump (CJMP)
Address-related operations: data address generation, circular buffers, bit reverse, UREG moves, DAB control.
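Two of the addressing modes listed here, circular buffers and bit reverse, are easy to show in software. The functions below are a generic sketch of the two techniques, not the IALU's exact semantics; the names `circular_next` and `bit_reverse` and their parameters are assumptions for this example.

```python
def circular_next(address: int, modifier: int, base: int, length: int) -> int:
    """Advance an address by `modifier`, wrapping inside [base, base + length)."""
    return base + (address - base + modifier) % length


def bit_reverse(value: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of `value` (as used for FFT reordering)."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


# Walk a 4-entry circular buffer that starts at address 0x100.
address = 0x103
address = circular_next(address, 2, base=0x100, length=4)  # wraps to 0x101

# Bit-reversed index order for an 8-point FFT.
order = [bit_reverse(i, 3) for i in range(8)]  # [0, 4, 2, 6, 1, 5, 3, 7]
```

Hardware support for both modes matters in DSP code because delay lines and FFT reordering would otherwise need explicit wrap and shuffle instructions inside the inner loop.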


24 TigerSHARC Computational Blocks X and Y Register File
Register File Syntax
  −Each block has 32x32-bit data registers
  −Each register can store 4x8-bit, 2x16-bit or 1x32-bit words.
  −Registers can be combined into dual or quad groups. These groups can store 8-, 16-, 32-, 40- or 64-bit words.

25 TigerSHARC Computational Blocks X and Y Register File Register File Syntax

26 Volatile registers in each block
24 volatile data registers in each block
  −XR0 – XR23
  −YR0 – YR23
2 ALU summation registers in each block
  −XPR0, XPR1, YPR0, YPR1
5 MAC accumulate registers in each block
  −XMR0 – XMR3, YMR0 – YMR3
  −XMR4, YMR4 – overflow registers


28 TigerSHARC X and Y ALU
2x64-bit input paths
2x64-bit output paths
8-, 16-, 32-, or 64-bit addition/subtraction (fixed-point)
32- or 64-bit logical operations (fixed-point)
32- or 40-bit floating-point operations

29 Sample ALU Instruction
Example of 16-bit addition
XYSR1:0 = R31:30 + R25:24
Performs the addition in both the X and Y Compute Blocks
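What a 16-bit SIMD addition on a 64-bit register pair does can be mimicked in Python by treating each 64-bit value as four independent 16-bit lanes. This is a sketch under assumptions: lanes wrap around on overflow (the hardware may instead saturate depending on instruction options), and the helper names are invented for the example.

```python
LANE_BITS = 16
LANES = 4
LANE_MASK = (1 << LANE_BITS) - 1


def unpack_lanes(word64: int) -> list:
    """Split a 64-bit value into four 16-bit lanes, lowest lane first."""
    return [(word64 >> (LANE_BITS * i)) & LANE_MASK for i in range(LANES)]


def pack_lanes(lanes) -> int:
    """Combine four 16-bit lanes back into one 64-bit value."""
    word = 0
    for i, lane in enumerate(lanes):
        word |= (lane & LANE_MASK) << (LANE_BITS * i)
    return word


def simd_add16(a: int, b: int) -> int:
    """Add lane by lane; a carry never crosses into the next lane.

    Assumption: wraparound arithmetic inside each lane.
    """
    sums = [(x + y) & LANE_MASK for x, y in zip(unpack_lanes(a), unpack_lanes(b))]
    return pack_lanes(sums)


result = simd_add16(0x0001_0002_0003_0004, 0x0010_0020_0030_0040)
# result == 0x0011_0022_0033_0044
```

Running the same operation in both compute blocks, as the XY prefix on the slide indicates, would perform two such four-lane additions in one instruction.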


31 TigerSHARC Multiplier
Operates on fixed-point, floating-point and complex numbers.
Fixed-point numbers
  −32x32 bit with 32- or 64-bit results
  −4 (16x16 bit) with 4x16- or 4x32-bit results
Floating-point numbers
  −32x32 bit with 32-bit result
  −40x40 bit with 40-bit result
Complex numbers
  −32x32 bit with 32-bit result
  −Fixed-point only
Results stored in MR register

32 TigerSHARC Multiplier
XR0 = R1*R2;;
XR1:0 = R3*R5;;
XMR1:0 = R3*R5;; //uses XMR4 overflow
XR2 = MR3:2, XMR3:2 = R3*R5;;
XR3:2 = MR1:0, XMR1:0 = R3*R5;;
XFR0 = R1*R2;;
XFR1:0 = R3:2*R5:4;; //40 bit multiply //32 bit mantissa
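The difference between a 32-bit and a 64-bit result from a 32x32-bit fixed-point multiply can be shown with plain integers. The sketch below assumes signed integer (not fractional) operands and simply splits the full product into the two 32-bit halves that a register pair would hold; `mul32_to_pair` is a name made up for this example.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def mul32_to_pair(a: int, b: int):
    """Multiply two signed 32-bit integers; return (high, low) 32-bit halves.

    The halves are returned as unsigned bit patterns, the way they would
    sit in a pair of 32-bit registers.
    """
    product = (a * b) & MASK64  # two's complement 64-bit product
    return (product >> 32) & MASK32, product & MASK32


high, low = mul32_to_pair(0x10000, 0x10000)
# 0x10000 * 0x10000 = 2**32, which does not fit in 32 bits:
# high == 1, low == 0
```

A single-register destination would keep only part of this product, which is why the slide shows a register pair as the destination when the full 64-bit result is wanted.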


34 TigerSHARC Shifter
Operates on one 64-bit, one or two 32-bit, two or four 16-bit, and four or eight 8-bit fixed-point operands
Shifts and rotates
Bit manipulation operations, like bit set, clear, toggle and test
Bit FIFO operations to support bit streams
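The bit manipulation operations named on this slide correspond to familiar mask operations. The Python functions below are a generic sketch on 32-bit words, not the shifter's instruction syntax; all function names are assumptions for illustration.

```python
WORD_BITS = 32
WORD_MASK = (1 << WORD_BITS) - 1


def bit_set(word: int, n: int) -> int:
    return (word | (1 << n)) & WORD_MASK


def bit_clear(word: int, n: int) -> int:
    return word & ~(1 << n) & WORD_MASK


def bit_toggle(word: int, n: int) -> int:
    return (word ^ (1 << n)) & WORD_MASK


def bit_test(word: int, n: int) -> bool:
    return bool((word >> n) & 1)


def rotate_left(word: int, n: int) -> int:
    """Rotate a 32-bit word left; bits leaving the top re-enter at the bottom."""
    n %= WORD_BITS
    return ((word << n) | (word >> (WORD_BITS - n))) & WORD_MASK


flags = bit_set(0, 3)                  # 0x00000008
flags = bit_toggle(flags, 0)           # 0x00000009
rotated = rotate_left(0x80000001, 1)   # 0x00000003
```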


36 TigerSHARC CLU
CLU instructions are designed to support different algorithms used for communications applications
Algorithms supported are
  −Viterbi Decoding (minimal distance decoding algorithm)
  −Turbo-code Decoding (variant of Viterbi decoding)
  −De-spreading for Code Division Multiple Access (CDMA) systems (used for recovering a signal that was spread over a wide bandwidth by a Pseudo Noise code)
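De-spreading is the simplest of these algorithms to show in code: the receiver correlates the incoming chips with the same pseudo-noise (PN) code that was used to spread the data. The sketch below is a textbook-style illustration in software, not the CLU's instruction set; the `spread` and `despread` helpers, the +1/-1 chip encoding, and the 8-chip code are assumptions for the example.

```python
def spread(bits, pn_code):
    """Spread each data bit over len(pn_code) chips (+1/-1 encoding)."""
    chips = []
    for bit in bits:
        symbol = 1 if bit else -1
        chips.extend(symbol * p for p in pn_code)
    return chips


def despread(chips, pn_code):
    """Recover data bits by correlating each chip block with the PN code."""
    n = len(pn_code)
    bits = []
    for start in range(0, len(chips) - n + 1, n):
        block = chips[start:start + n]
        correlation = sum(c * p for c, p in zip(block, pn_code))
        bits.append(1 if correlation > 0 else 0)
    return bits


pn_code = [1, -1, 1, 1, -1, -1, 1, -1]  # hypothetical 8-chip PN sequence
data = [1, 0, 1]
received = spread(data, pn_code)
received[0] = -received[0]              # one chip corrupted by noise
recovered = despread(received, pn_code)  # still [1, 0, 1]
```

The inner loop is a run of multiply-and-accumulate steps over short values, which is the kind of work dedicated hardware instructions can perform on many chips at once.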


38 TigerSHARC Program Sequencer
Supplies instruction addresses to memory
The instruction alignment buffer (IAB) caches up to five fetched instruction lines waiting to execute
Extracts an instruction line from the IAB and distributes it to the appropriate core component for execution
Determines flow control for instructions like JMP and CALL
Reduces branch delays using branch prediction and a branch target buffer (BTB)


40 TigerSHARC architecture at a glance

41 TigerSHARC Buses
DRAM divided into 6 blocks of 4 Mbits
The 6 blocks connect to four 128-bit wide internal buses through a crossbar connection
Internal bus architecture provides a total memory bandwidth of 32 Gbytes/sec
Per cycle, the core and I/O can access
  −twelve 32-bit data words
  −four 32-bit instructions
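The figures on this slide can be cross-checked with a little arithmetic. The sketch below assumes one transfer per bus per clock cycle and 1 Gbyte = 10^9 bytes; the clock rate it prints is only what those assumptions imply, not a number stated in the presentation.

```python
BUS_COUNT = 4
BUS_WIDTH_BITS = 128
TOTAL_BANDWIDTH_BYTES_PER_S = 32e9  # 32 Gbytes/sec, taking 1 Gbyte = 10**9 bytes

# Bytes moved per cycle if every bus transfers once per cycle.
bytes_per_cycle = BUS_COUNT * BUS_WIDTH_BITS // 8

# The per-cycle word counts from the slide add up to the same amount.
data_words, instruction_words, word_bytes = 12, 4, 4
bytes_from_word_counts = (data_words + instruction_words) * word_bytes

# Clock rate implied by the bandwidth figure under these assumptions.
implied_clock_hz = TOTAL_BANDWIDTH_BYTES_PER_S / bytes_per_cycle

print(bytes_per_cycle, bytes_from_word_counts, implied_clock_hz / 1e6, "MHz")
```

Both routes give 64 bytes per cycle, and 32 Gbytes/sec divided by 64 bytes corresponds to a 500 MHz clock under the stated assumptions.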


43 TigerSHARC DMA Controller
On-chip, with 14 DMA channels
Provides zero-overhead data transfers
Operates independently of, and invisibly to, the DSP’s core


45 TigerSHARC Applications

46 References
ANALOG DEVICES
  −http://www.analog.com/processors/processors/tigersharc/index.html
  −http://www.analog.com/processors/processors/sharc/index.html
  −http://www.analog.com/processors/resources/teachingResources.html
ECE-ADI-PROJECT HOME PAGE
  −http://www.enel.ucalgary.ca/People/Smith/ECE-ADI-PROJECT/Index/index.html
  −http://www.enel.ucalgary.ca/People/Smith/ECE-ADI-PROJECT/Index/otherschoolsFrame.htm

47 Summary
What is Harvard Architecture?
What is Super Harvard Architecture?
TigerSHARC processor architecture
How is TigerSHARC ‘faster’ for targeted DSP applications?

48 Questions? Thank You.
